Wind.dataset.Dataset

class Wind.dataset.Dataset(df: pd.DataFrame)

Methods

fill_nan(self, fields: list)

Fill missing values (NaN) in the specified fields/columns of the DataFrame.

Parameters: fields (list) :

List of fields/columns in which missing values are filled.

Returns: None :
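
This reference does not state which fill strategy is used. As a minimal sketch of the kind of operation described, the example below uses plain pandas and assumes forward-fill followed by back-fill. The frame and its column names are hypothetical, and this is not the Dataset implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column names are illustrative only.
df = pd.DataFrame({
    "wind_speed": [5.0, np.nan, 7.0, np.nan],
    "active_power_total": [np.nan, 120.0, 130.0, 140.0],
})

fields = ["wind_speed", "active_power_total"]

# Assumed strategy: carry the last valid value forward, then back-fill
# any NaN left at the start of a column.
df[fields] = df[fields].ffill().bfill()
```

Because the method returns None, it presumably updates the DataFrame held by the instance rather than returning a copy.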

drop_nan(self, fields: list)

Drop rows that have NaN values in the specified fields/columns of the DataFrame.

Parameters: fields (list) :

List of fields/columns to check for NaN values.

Returns: None :
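
A rough pandas equivalent of this behaviour, assuming it maps onto DataFrame.dropna with a column subset (the frame and column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column names are illustrative only.
df = pd.DataFrame({
    "wind_speed": [5.0, np.nan, 7.0],
    "temperature": [np.nan, 11.0, 12.0],
})

# Only NaN values in the listed fields cause a row to be dropped;
# the NaN in "temperature" (row 0) is left untouched.
cleaned = df.dropna(subset=["wind_speed"])
```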

sample(self, n: int)

Sample every nth row from the DataFrame.

Parameters: n (int) :

Sampling interval.

Returns: None :
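
Taking every nth row can be sketched in plain pandas with a strided slice. Whether the class starts at the first row is an assumption, and the column name is hypothetical.

```python
import pandas as pd

# Hypothetical frame with ten consecutive readings.
df = pd.DataFrame({"wind_speed": range(10)})

n = 3  # sampling interval

# Keep rows 0, n, 2n, ... by position.
sampled = df.iloc[::n]
```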

apply_rolling_window(self, df: pd.DataFrame, data: str, roll_time: int, window_function: callable)

Apply a rolling window function to the specified data column in the DataFrame.

Parameters: df (pd.DataFrame) :

DataFrame to which the rolling window function will be applied.

data (str) :

Name of the column to which the rolling window function is applied.

roll_time (int) :

Window size for the rolling window.

window_function (callable) :

Callable function to apply as the rolling window function.

Returns: None :
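
As an illustration of the parameters above, a minimal pandas sketch that applies a callable over a fixed-size rolling window. The output column name "wind_speed_roll" is hypothetical; how the class stores the result is not stated in this reference.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column name is illustrative only.
df = pd.DataFrame({"wind_speed": [1.0, 2.0, 3.0, 4.0, 5.0]})

data = "wind_speed"
roll_time = 3              # window size in rows
window_function = np.mean  # any callable that reduces an array to a scalar

# raw=True passes each window to the callable as a NumPy array.
# The first roll_time - 1 rows have no full window and come out as NaN.
df[f"{data}_roll"] = (
    df[data].rolling(window=roll_time).apply(window_function, raw=True)
)
```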

add_last_t(self, df: pd.DataFrame, data: str, step: int=2)

Add lagged versions of a column to the DataFrame.

Parameters: df (pd.DataFrame) :

DataFrame to which the lagged columns will be added.

data (str) :

Column name to create lagged versions of.

step (int, optional) :

Number of lagged steps to add. Defaults to 2.

Returns: None :
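
Lagged columns of this kind are commonly built with Series.shift. The sketch below assumes one new column per lag from 1 to step; the "_t-k" naming is hypothetical and may differ from what the class produces.

```python
import pandas as pd

# Hypothetical frame; the column name matches the default target used elsewhere.
df = pd.DataFrame({"active_power_total": [10, 20, 30, 40, 50]})

data = "active_power_total"
step = 2

# Column "<data>_t-k" holds the value observed k rows earlier.
# The first k rows of each lagged column are NaN.
for lag in range(1, step + 1):
    df[f"{data}_t-{lag}"] = df[data].shift(lag)
```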

add_seasonal_feat(self, df: pd.DataFrame, time_col)

Add seasonal features based on a time column.

Parameters: df (pd.DataFrame) :

DataFrame to which the seasonal features will be added.

time_col :

Time column to extract seasonal features from.

Returns: None :
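
This reference does not list which seasonal features are generated. One common choice, shown here purely as an assumption, is a sine/cosine encoding of the hour of day and the month, so that the end of a cycle sits next to its start. All column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps: two in January, two in July.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2023-01-01 00:00",
        "2023-01-01 06:00",
        "2023-07-01 12:00",
        "2023-07-01 18:00",
    ]),
})

time_col = "time"
hour = df[time_col].dt.hour    # 0..23
month = df[time_col].dt.month  # 1..12

# Map each cyclic quantity onto the unit circle.
df["hour_sin"] = np.sin(2 * np.pi * hour / 24)
df["hour_cos"] = np.cos(2 * np.pi * hour / 24)
df["month_sin"] = np.sin(2 * np.pi * (month - 1) / 12)
df["month_cos"] = np.cos(2 * np.pi * (month - 1) / 12)
```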

create_dataset(self, df: pd.DataFrame, window_size: int, prediction_horizon: int, test_split: float=0.2, val_split: float=0.2, univariate: bool=False, target_col: str='active_power_total', shuffle: bool=False)

Create a dataset for training and evaluation.

Parameters: df (pd.DataFrame) :

Input DataFrame containing the data.

window_size (int) :

Size of the input window.

prediction_horizon (int) :

Number of steps to predict into the future.

test_split (float, optional) :

Ratio of test data split. Defaults to 0.2.

val_split (float, optional) :

Ratio of validation data split. Defaults to 0.2.

univariate (bool, optional) :

Flag indicating if the data is univariate. Defaults to False.

target_col (str, optional) :

Name of the target column. Defaults to "active_power_total".

shuffle (bool, optional) :

Flag indicating whether the data is shuffled. Defaults to False.

Returns: tuple : Tuple containing the train, validation, and test data and labels, as well as the feature names.
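
To make window_size, prediction_horizon, and the split ratios concrete, here is a minimal sliding-window sketch in NumPy. It is not the Dataset implementation: the helper make_windows is hypothetical, the split is assumed to be chronological (no shuffling), and both ratios are assumed to be fractions of the total number of windows.

```python
import numpy as np
import pandas as pd

def make_windows(values, target, window_size, prediction_horizon):
    """Hypothetical helper: pair each input window with the targets that follow it."""
    X, y = [], []
    last_start = len(values) - window_size - prediction_horizon
    for start in range(last_start + 1):
        end = start + window_size
        X.append(values[start:end])                     # window_size past rows
        y.append(target[end:end + prediction_horizon])  # next target values
    return np.array(X), np.array(y)

# Hypothetical multivariate frame with 20 rows.
df = pd.DataFrame({
    "wind_speed": np.arange(20, dtype=float),
    "active_power_total": np.arange(20, dtype=float) * 10,
})

window_size, prediction_horizon = 4, 2
test_split, val_split = 0.2, 0.2
target_col = "active_power_total"

X, y = make_windows(
    df.to_numpy(), df[target_col].to_numpy(), window_size, prediction_horizon
)

# Chronological split: earliest windows for training, latest for testing.
n = len(X)
n_test = int(n * test_split)
n_val = int(n * val_split)
n_train = n - n_val - n_test

X_train, y_train = X[:n_train], y[:n_train]
X_val, y_val = X[n_train:n_train + n_val], y[n_train:n_train + n_val]
X_test, y_test = X[n_train + n_val:], y[n_train + n_val:]
feature_names = list(df.columns)
```

With 20 rows, a window of 4, and a horizon of 2, this yields 20 - 4 - 2 + 1 = 15 windows, each of shape (4, 2), paired with 2 target values.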