Wind.model.Model

class Wind.model.Model(model_function: str='cb', params: dict={}, prediction_type: str='one_shot') [source]

Methods

train(self, train_x: pd.DataFrame, train_y: pd.DataFrame, val_x: pd.DataFrame, val_y: pd.DataFrame, multioutput: bool=False, verbose: int=500) [source]

Train the model.

Parameters: train_x (pd.DataFrame) :

Training input data.

train_y (pd.DataFrame) :

Training target data.

val_x (pd.DataFrame) :

Validation input data.

val_y (pd.DataFrame) :

Validation target data.

multioutput (bool, optional) :

Flag indicating whether the model is trained on multi-output targets (several target columns). Defaults to False.

verbose (int, optional) :

Verbosity level during training. Defaults to 500.

Returns: None

predict(self, X: pd.DataFrame, horizon: int=1) [source]

Make predictions with the trained model. How predictions are produced depends on the prediction_type chosen when the model was created. Recursive prediction only supports univariate data whose features are the previous time steps of the target.

Parameters: X (pd.DataFrame) :

Input data for prediction.

horizon (int, optional) :

Number of steps to predict into the future. Defaults to 1.

Returns: np.ndarray :

Predicted values.
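Recursive prediction can be illustrated with a small, self-contained sketch. It assumes a one-step model that maps a window of the last n_lags values to the next value, and feeds each prediction back in as the newest lag until horizon steps have been produced. The helper names make_lag_features and recursive_forecast, and the toy trend model, are hypothetical and not part of Wind.

```python
from typing import Callable, List, Sequence, Tuple


def make_lag_features(series: Sequence[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a univariate series into (X, y) pairs: each row of X holds the
    previous n_lags values (oldest first) and y is the value that follows."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))
        y.append(series[t])
    return X, y


def recursive_forecast(model: Callable[[List[float]], float],
                       last_window: Sequence[float],
                       horizon: int) -> List[float]:
    """Predict `horizon` steps ahead by predicting one step at a time and
    appending each prediction to the lag window."""
    window = list(last_window)
    preds = []
    for _ in range(horizon):
        next_value = model(window)
        preds.append(next_value)
        # Slide the window: drop the oldest lag, append the new prediction.
        window = window[1:] + [next_value]
    return preds


# Hypothetical stand-in for a trained one-step model:
# next value = last value + last observed difference.
def linear_trend_model(window: List[float]) -> float:
    return window[-1] + (window[-1] - window[-2])


series = [1.0, 2.0, 3.0, 4.0, 5.0]
X, y = make_lag_features(series, n_lags=2)
print(X)  # [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(y)  # [3.0, 4.0, 5.0]
print(recursive_forecast(linear_trend_model, series[-2:], horizon=3))  # [6.0, 7.0, 8.0]
```

Because each step consumes earlier predictions, errors can accumulate over the horizon. This is also why the recursive approach needs features that are purely past values of the single target: a prediction can only be fed back into a feature that represents that same variable.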

model_summarizer(self, val_x: pd.DataFrame, val_y: pd.DataFrame, test_x: pd.DataFrame, test_y: pd.DataFrame, plots: bool=True, plot_steps: int=2000, feat_importance: bool=True, feat_steps: int=15, feat_names: list=None, horizon: int=1) [source]

Generate a summary of the model's performance.

Parameters: val_x (pd.DataFrame) :

Validation input data.

val_y (pd.DataFrame) :

Validation target data.

test_x (pd.DataFrame) :

Test input data.

test_y (pd.DataFrame) :

Test target data.

plots (bool, optional) :

Flag indicating if plots should be generated. Defaults to True.

plot_steps (int, optional) :

Number of steps to include in the plots. Defaults to 2000.

feat_importance (bool, optional) :

Flag indicating if feature importance should be calculated and plotted. Defaults to True.

feat_steps (int, optional) :

Number of top features to display in the feature importance plot. Defaults to 15.

feat_names (list, optional) :

List of feature names. Defaults to None.

horizon (int, optional) :

Number of steps to predict into the future. Defaults to 1.

Returns: tuple :

Tuple containing scores (MAE, RMSE, R2) and feature importances (if enabled).
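The scores in the returned tuple follow the standard definitions of MAE, RMSE and R². The sketch below computes them in plain Python and shows one plausible way to pick the top feat_steps features by importance. The helper names regression_scores and top_features are hypothetical; how Wind computes these values internally is not documented here.

```python
import math
from typing import List, Sequence, Tuple


def regression_scores(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MAE, RMSE, R2) for two equal-length, non-empty sequences."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R2 is undefined for a constant target; report NaN in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mae, rmse, r2


def top_features(importances: Sequence[float], names: Sequence[str], k: int) -> List[Tuple[str, float]]:
    """Return the k (name, importance) pairs with the highest importance."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


mae, rmse, r2 = regression_scores([3.0, 5.0, 7.0], [2.0, 5.0, 8.0])
print(round(mae, 4), round(rmse, 4), r2)  # 0.6667 0.8165 0.75
print(top_features([0.1, 0.5, 0.3], ["a", "b", "c"], k=2))  # [('b', 0.5), ('c', 0.3)]
```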

hyp_op(self, val_x: pd.DataFrame, val_y: pd.DataFrame, train_x: pd.DataFrame, train_y: pd.DataFrame, horizon: int=1, trial=30, task_type='GPU') [source]

Perform hyperparameter optimization for a machine learning model using Optuna.

Parameters: val_x (pd.DataFrame) :

Validation dataset features.

val_y (pd.DataFrame) :

Validation dataset labels.

train_x (pd.DataFrame) :

Training dataset features.

train_y (pd.DataFrame) :

Training dataset labels.

horizon (int, optional) :

Prediction horizon for the model. Defaults to 1.

trial (int, optional) :

Number of optimization trials. Defaults to 30.

task_type (str, optional) :

Task type for CatBoost ('CPU' or 'GPU'). Defaults to 'GPU'.

Returns: tuple :

The best hyperparameters (dict) and the corresponding best RMSE (float).

This method uses Optuna to tune the hyperparameters of a CatBoost model. It searches for the best hyperparameters within the parameter ranges and training settings defined inside the method; these ranges are not set by the caller.

The optimization objective is to minimize the Root Mean Squared Error (RMSE) on the validation dataset. The best hyperparameters and their corresponding RMSE are returned as a tuple.

Example (where model is a Model instance):

    best_params, best_rmse = model.hyp_op(val_x, val_y, train_x, train_y, horizon=2, trial=50, task_type='GPU')
    print("Best Hyperparameters:", best_params)
    print("Best RMSE:", best_rmse)
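The tuning loop described above (fit on the training split, score RMSE on the validation split, keep the best configuration) can be sketched without Optuna or CatBoost. In this stand-in, plain random search replaces Optuna's sampler (Optuna's default sampler is TPE, not random search) and a one-parameter ridge model replaces CatBoost. All names here are hypothetical; the sketch only mirrors the (best_params, best_rmse) return shape.

```python
import math
import random
from typing import Dict, Sequence, Tuple


def fit_ridge_through_origin(x: Sequence[float], y: Sequence[float], alpha: float) -> float:
    """Closed-form weight w for y ~ w * x with an L2 penalty alpha on w."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + alpha)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def random_search(train_x: Sequence[float], train_y: Sequence[float],
                  val_x: Sequence[float], val_y: Sequence[float],
                  n_trials: int = 30, seed: int = 0) -> Tuple[Dict[str, float], float]:
    """Fit on the training split, score RMSE on the validation split, and
    return (best_params, best_rmse) over n_trials sampled configurations."""
    rng = random.Random(seed)
    best_params: Dict[str, float] = {}
    best_rmse = math.inf
    for _ in range(n_trials):
        # Sample alpha log-uniformly between 1e-3 and 1e3.
        params = {"alpha": 10 ** rng.uniform(-3, 3)}
        w = fit_ridge_through_origin(train_x, train_y, params["alpha"])
        score = rmse(val_y, [w * v for v in val_x])
        if score < best_rmse:  # the objective is minimized
            best_params, best_rmse = params, score
    return best_params, best_rmse


best_params, best_rmse = random_search([1, 2, 3, 4], [2, 4, 6, 8], [5, 6], [10, 12], n_trials=30)
print("Best Hyperparameters:", best_params)
print("Best RMSE:", best_rmse)
```

Scoring on the validation split rather than the training split is what keeps the search from simply favouring the least-regularized model that fits the training data best.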

feat_select(self, val_x: pd.DataFrame, val_y: pd.DataFrame, train_x: pd.DataFrame, train_y: pd.DataFrame, num_feats=20, num_steps=3, plot=True) [source]

Perform feature selection using CatBoost's select_features method.

Parameters: val_x (pd.DataFrame) :

Validation dataset features.

val_y (pd.DataFrame) :

Validation dataset labels.

train_x (pd.DataFrame) :

Training dataset features.

train_y (pd.DataFrame) :

Training dataset labels.

num_feats (int, optional) :

Number of features to select. Defaults to 20.

num_steps (int, optional) :

Number of feature selection steps. Defaults to 3.

plot (bool, optional) :

Whether to plot feature selection results. Defaults to True.

Returns: catboost.FeatureSelectionSummary :

A summary of the feature selection process.

This method uses CatBoost's select_features method to perform feature selection on the given datasets. It keeps num_feats features, eliminating the less important ones over num_steps steps, and returns a summary of the process.

Example (where model is a Model instance):

    summary = model.feat_select(val_x, val_y, train_x, train_y, num_feats=15, num_steps=4, plot=True)
    print(summary)
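CatBoost's select_features works by recursive elimination: it repeatedly removes the least useful features, over a given number of steps, until the requested number remains (CatBoost offers several elimination algorithms, such as RecursiveByShapValues). The plain-Python sketch below illustrates that general idea only. The even split of removals across steps, the importance_fn callback and the "selected"/"eliminated" keys are assumptions, not CatBoost's exact schedule, scoring or return format.

```python
import math
from typing import Callable, Dict, List, Sequence


def recursive_elimination(features: Sequence[str],
                          importance_fn: Callable[[List[str]], Dict[str, float]],
                          num_feats: int,
                          num_steps: int) -> Dict[str, List[str]]:
    """Remove features over num_steps rounds until num_feats remain.

    importance_fn scores the currently kept features (in practice, by
    refitting a model on them); the weakest ones are dropped each round.
    """
    if not 1 <= num_feats <= len(features):
        raise ValueError("num_feats must be between 1 and len(features)")
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    kept = list(features)
    eliminated: List[str] = []
    for step in range(num_steps):
        # Spread the remaining removals as evenly as possible over the remaining rounds.
        n_drop = math.ceil((len(kept) - num_feats) / (num_steps - step))
        scores = importance_fn(kept)
        weakest = sorted(kept, key=lambda name: scores[name])[:n_drop]
        kept = [name for name in kept if name not in weakest]
        eliminated.extend(weakest)
    return {"selected": kept, "eliminated": eliminated}


# Hypothetical importance: the numeric suffix of each name (f9 is the most important).
def toy_importance(kept: List[str]) -> Dict[str, float]:
    return {name: float(name[1:]) for name in kept}


result = recursive_elimination([f"f{i}" for i in range(10)], toy_importance, num_feats=4, num_steps=3)
print(result["selected"])    # ['f6', 'f7', 'f8', 'f9']
print(result["eliminated"])  # ['f0', 'f1', 'f2', 'f3', 'f4', 'f5']
```

Re-scoring the kept features at every round, instead of ranking once up front, lets importances shift as correlated features are removed; that is the main reason to eliminate in several steps rather than one.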