stlflib 1.0.1__tar.gz

stlflib-1.0.1/PKG-INFO ADDED
@@ -0,0 +1,96 @@
1
+ Metadata-Version: 2.1
2
+ Name: stlflib
3
+ Version: 1.0.1
4
+ Summary: Short-Term Forecasting of Regional Electrical Load Based on CatBoost Model
5
+ Home-page: https://github.com/caapel/ForecastPowerEnergy
6
+ Author: caapel
7
+ Author-email: caapel@mail.ru
8
+ License: KSPEU License
9
+ Project-URL: GitHub, https://github.com/caapel/ForecastPowerEnergy
10
+ Keywords: STLF
11
+ Classifier: Programming Language :: Python :: 3.10
12
+ Classifier: License :: Other/Proprietary License
13
+ Classifier: Operating System :: OS Independent
14
+ Requires-Python: >=3.10
15
+ Description-Content-Type: text/markdown
16
+ Requires-Dist: ephem
17
+ Requires-Dist: requests
18
+ Requires-Dist: beautifulsoup4
19
+ Requires-Dist: tqdm
20
+ Requires-Dist: numpy
21
+ Requires-Dist: pandas
22
+ Requires-Dist: seaborn
23
+ Requires-Dist: matplotlib
24
+ Requires-Dist: catboost
25
+ Requires-Dist: graphviz
26
+ Requires-Dist: scikit-learn
27
+
28
+ # Short-Term Load Forecasting Based on CatBoost Model Library (STLFLib)
29
+
30
+ This is a Python STLF machine learning library designed for generating energy consumption bids for the DAM (day-ahead market).
31
+ The library is distributed under the KSPEU license ([RU 2025688100](https://new.fips.ru/registers-doc-view/fips_servlet?DB=EVM&DocNumber=2025688100&TypeFile=html)). For commercial use, please contact the author: caapel@mail.ru.
32
+
33
+ ----------
34
+
35
+ ## How to install ##
36
+ To install, you can use the command:
37
+
38
+ pip install stlflib
39
+
40
+ Or download the repository from [GitHub](https://github.com/caapel/ForecastPowerEnergy) (private access)
41
+
42
+ ----------
43
+
44
+ ## Usage ##
45
+ The essence of this project and its library is described in detail in the study [***Short-Term Forecasting of Regional Electrical Load Based on XGBoost Model***](https://doi.org/10.3390/en18195144)
46
+ > This file does not contain detailed instructions on how to work with the library;
47
+ > it only describes each of the basic library modules.
48
+
49
+ ### Dependency ###
50
+ The **dependency** module (located in the `dependency.py` file) contains a complete list of dependencies.
51
+ The module has only one function:
52
+ - *print_dependency()* - prints the versions of installed dependencies and the current version of STLFLib
53
+
54
+ ### ServiceDB ###
55
+ The **serviceDB** module (located in the `serviceDB.py` file) contains a set of tools for working with the database:
56
+ <br>-------------------------------create---------------------------------<br>
57
+ - *generate_volume_df(path)* - generate a dataframe with archived energy consumption data from prepared .xls-files located at `path`
58
+ - *get_weather(date)* - generate a weather archive/forecast (outside air temperature) for the specified date with a sampling frequency of 1 hour
59
+ - *get_br_feature(date)* - load a BEM (Balancing Energy Market) archive/forecast for the specified date
60
+ - *get_RSV_rate(date)* - load the unregulated DAM price for the specified date (per month)
61
+ - *updating_or_create_df(get_function, filename, start=datetime(2013, 1, 1).date())* - create a new database (starting from the specified date) or replenish an existing one (`filename`.xlsx) with missing data up to the end of the previous month; returns the resulting dataframe.
62
+ - *merge_and_export_DB(total_volume_df, df_weather, df_br_feature, filename='DataBase.xlsx')* - merge dataframes total_volume_df (`Volume.xlsx`), df_weather (`Weather.xlsx`), and df_br_feature (`br_feature.xlsx`) by the 'Date' column into one common database (by default, `DataBase.xlsx`)
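The merge step can be illustrated with a minimal pandas sketch; the frames, column names and values below are made up for the example and stand in for `Volume.xlsx`, `Weather.xlsx` and `br_feature.xlsx`:

```python
import pandas as pd

# Three toy frames sharing a 'Date' column (illustrative data only)
dates = pd.date_range("2024-01-01", periods=3, freq="h")
total_volume_df = pd.DataFrame({"Date": dates, "Volume": [1.0, 2.0, 3.0]})
df_weather = pd.DataFrame({"Date": dates, "Temperature": [-5.0, -4.0, -3.0]})
df_br_feature = pd.DataFrame({"Date": dates, "ActCons": [10, 11, 12]})

# Inner-merge all three on the shared 'Date' column into one common database
db = total_volume_df.merge(df_weather, on="Date").merge(df_br_feature, on="Date")
# db.to_excel("DataBase.xlsx", index=False)  # export step, as in merge_and_export_DB
```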
63
+ <br>-------------------------------service--------------------------------<br>
64
+ - *get_empty_daily_df(date)* - Creates an empty dataframe (25 rows: from 0:00 to 24:00) for the specified date (for full temperature interpolation)
65
+ - *add_date_scalar(df)* - Adds additional categorical features to the dataframe: Day, Month, Year, WeekDay
66
+ - *is_check_DataBase(df)* - Checks database integrity
67
+ - *act_pred_reverse(df_br_feature)* - Replaces missing actual (Act) consumption and BR generation values with planned (Pred) values. This function is used to generate a forecast for the current day, when the actual values of `ActCons` and `ActGen` are not available for the entire day.
68
+ - *get_files_from_path(path='_raw_Data_TatEnergosbyt')* - Retrieving operational data from the directory (by default, `/_raw_Data_TatEnergosbyt`)
69
+ - *update_DataBase(total_oper_df, filename='DataBase.xlsx')* - Updating the database by adding operational data from `total_oper_df`
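As an illustration of two of the service helpers above, here is a simplified stand-in (not the library's actual code) for the 25-row daily frame and the calendar scalars:

```python
import pandas as pd

def empty_daily_df(day: str) -> pd.DataFrame:
    """25 hourly rows from 0:00 through 24:00 inclusive, so the outside
    temperature can be interpolated across the full day boundary."""
    start = pd.Timestamp(day)
    hours = pd.date_range(start, start + pd.Timedelta(hours=24), freq="h")
    return pd.DataFrame({"Date": hours})

def add_date_scalar(df: pd.DataFrame) -> pd.DataFrame:
    """Add the Day, Month, Year and WeekDay categorical features."""
    df = df.copy()
    df["Day"] = df.Date.dt.day
    df["Month"] = df.Date.dt.month
    df["Year"] = df.Date.dt.year
    df["WeekDay"] = df.Date.dt.weekday
    return df

daily = add_date_scalar(empty_daily_df("2024-05-12"))
```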
70
+
71
+ ### Preprocessing ###
72
+ The **preprocessing** module (located in the `preprocessing.py` file) contains data preprocessing tools for subsequent transfer of this data to the **core** functions (CatBoostRegressor):
73
+ - *get_type_day(df)* - encodes the day type (`TypeDay`) based on the `df.Date` column in `datetime` format. The encoding follows the industrial calendar of the Republic of Tatarstan.
74
+ - *get_light(df)* - encodes the light interval (`Light`) based on the `df.Date` column in `datetime` format. The encoding is based on the geographic location of the city of Kazan.
75
+ - *get_season(df)* - encodes seasonality based on the `df.Date` column in `datetime` format.
76
+ - *prepareData(df, lag_start=1, lag_end=7)* - data preprocessing function. Preprocessing includes: adding day type, light interval, seasonality, and energy consumption lag (default 1...7 days)
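The lag-feature idea behind *prepareData* can be sketched as follows (illustrative data and a simplified implementation, not the library's own):

```python
import pandas as pd

# Hourly consumption series (made-up data for the example)
df = pd.DataFrame({
    "Date": pd.date_range("2024-01-01", periods=24 * 9, freq="h"),
    "Volume": range(24 * 9),
})

# Add daily lags 1..7: the value observed at the same hour N days earlier
for lag in range(1, 8):
    df[f"lag-{lag}"] = df["Volume"].shift(24 * lag)

# Rows without a full week of history are dropped before training
df = df.dropna().reset_index(drop=True)
```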
77
+
78
+ ### Core ###
79
+ The **core** module (located in the `core.py` file) contains the library's main class. It is built around `CatBoostRegressor` and provides a number of functions:
80
+ - *predict_volume(df_general, df_predict, max_depth, learn_period)* - trains the model and forecasts energy consumption volumes
81
+ - *get_df_predicted(df_general, max_depth, learn_period, model, date_start, date_end)* - generates a data frame with the predicted energy consumption volume for the specified planning horizon
82
+ - *date_str_format(df_predicted)* - generates a date string for the exported xlsx file
83
+ - *get_DAM_order(df_general, max_depth, learn_period, model, date_start, date_end)* - a function that generates a DAM order and exports it to xlsx format
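The date string used in the exported file name (see *date_str_format*) can be reproduced with plain `strftime`; the function below is a simplified stand-in, not the library code:

```python
from datetime import date

def date_range_str(date_start: date, date_end: date) -> str:
    """Format a forecast date range like '12.11.2025', '12-13.11.2025'
    or '30.11.2025-01.12.2025' (zero-padded, as in the exported file name)."""
    if date_start == date_end:
        return date_start.strftime("%d.%m.%Y")
    if (date_start.year, date_start.month) == (date_end.year, date_end.month):
        return f"{date_start:%d}-{date_end:%d}.{date_start:%m}.{date_start.year}"
    return f"{date_start:%d.%m.%Y}-{date_end:%d.%m.%Y}"

print(date_range_str(date(2025, 11, 12), date(2025, 11, 13)))  # 12-13.11.2025
```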
84
+
85
+ ### Validating ###
86
+ The **validating** helper module (located in the `validating.py` file) is designed to validate the **core** functions:
87
+ - *get_df_val_predicted(df_general, df_general_date_index, max_depth, learn_period, model, date_start, date_end)* - function for generating a dataframe with predicted energy consumption volumes for the specified planning horizon, adapted for validation calculations (simulating the absence of 'ActCons' and 'ActGen' data after 7 AM, offline access to the weather forecast and BEM data)
88
+ - *get_df_validate(df_general, df_general_date_index, max_depth, learn_period, model, date_start, date_end, logging=True)* - function for validating the model for the specified time interval. Returns a validation dataframe with predicted values.
89
+ - *get_df_validate_with_loss(df_validate_result, df_RSV_vs_BR_rate)* - adds a 'loss' column with BEM losses to the resulting dataframe.
90
+ - *diff_predict_vs_fact(df_validate_result)* - outputs validation results in table and graph form (Matplotlib object).
91
+ - *Grid_Search(df_general, df_general_date_index, max_depth_grid, learn_period_grid, model, date_start, date_end)* - grid search for optimal training period and tree depth.
92
+ - *search_result_highlighting(df_search_result)* - highlights the search results of the Grid_Search() function.
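Conceptually, *Grid_Search* scores every (max_depth, learn_period) pair and keeps the best one. A toy sketch with a dummy scoring function in place of real model validation:

```python
from itertools import product

def grid_search(score, max_depth_grid, learn_period_grid):
    """Score every (max_depth, learn_period) pair and return the best pair.
    `score` is a placeholder for a real validation metric such as MAPE."""
    results = {(d, p): score(d, p)
               for d, p in product(max_depth_grid, learn_period_grid)}
    best = min(results, key=results.get)  # lowest error wins
    return best, results

# Dummy error surface with a minimum at depth 6 and a 7-year period
best, results = grid_search(lambda d, p: abs(d - 6) + abs(p - 7) * 0.1,
                            max_depth_grid=[4, 6, 8],
                            learn_period_grid=[5, 7, 12])
```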
93
+
94
+ ### EDA ###
95
+ The **EDA** graphics module (located in the `EDA.py` file) is designed to display the results of exploratory data analysis. In development.
96
+ - *draw_learning_curve(df_general, max_depth, model, fontsize=15)* - calculates and plots the learning curve.
@@ -0,0 +1,69 @@
1
+ # Short-Term Load Forecasting Based on CatBoost Model Library (STLFLib)
2
+
3
+ This is a Python STLF machine learning library designed for generating energy consumption bids for the DAM (day-ahead market).
4
+ The library is distributed under the KSPEU license ([RU 2025688100](https://new.fips.ru/registers-doc-view/fips_servlet?DB=EVM&DocNumber=2025688100&TypeFile=html)). For commercial use, please contact the author: caapel@mail.ru.
5
+
6
+ ----------
7
+
8
+ ## How to install ##
9
+ To install, you can use the command:
10
+
11
+ pip install stlflib
12
+
13
+ Or download the repository from [GitHub](https://github.com/caapel/ForecastPowerEnergy) (private access)
14
+
15
+ ----------
16
+
17
+ ## Usage ##
18
+ The essence of this project and its library is described in detail in the study [***Short-Term Forecasting of Regional Electrical Load Based on XGBoost Model***](https://doi.org/10.3390/en18195144)
19
+ > This file does not contain detailed instructions on how to work with the library;
20
+ > it only describes each of the basic library modules.
21
+
22
+ ### Dependency ###
23
+ The **dependency** module (located in the `dependency.py` file) contains a complete list of dependencies.
24
+ The module has only one function:
25
+ - *print_dependency()* - prints the versions of installed dependencies and the current version of STLFLib
26
+
27
+ ### ServiceDB ###
28
+ The **serviceDB** module (located in the `serviceDB.py` file) contains a set of tools for working with the database:
29
+ <br>-------------------------------create---------------------------------<br>
30
+ - *generate_volume_df(path)* - generate a dataframe with archived energy consumption data from prepared .xls-files located at `path`
31
+ - *get_weather(date)* - generate a weather archive/forecast (outside air temperature) for the specified date with a sampling frequency of 1 hour
32
+ - *get_br_feature(date)* - load a BEM (Balancing Energy Market) archive/forecast for the specified date
33
+ - *get_RSV_rate(date)* - load the unregulated DAM price for the specified date (per month)
34
+ - *updating_or_create_df(get_function, filename, start=datetime(2013, 1, 1).date())* - create a new database (starting from the specified date) or replenish an existing one (`filename`.xlsx) with missing data up to the end of the previous month; returns the resulting dataframe.
35
+ - *merge_and_export_DB(total_volume_df, df_weather, df_br_feature, filename='DataBase.xlsx')* - merge dataframes total_volume_df (`Volume.xlsx`), df_weather (`Weather.xlsx`), and df_br_feature (`br_feature.xlsx`) by the 'Date' column into one common database (by default, `DataBase.xlsx`)
36
+ <br>-------------------------------service--------------------------------<br>
37
+ - *get_empty_daily_df(date)* - Creates an empty dataframe (25 rows: from 0:00 to 24:00) for the specified date (for full temperature interpolation)
38
+ - *add_date_scalar(df)* - Adds additional categorical features to the dataframe: Day, Month, Year, WeekDay
39
+ - *is_check_DataBase(df)* - Checks database integrity
40
+ - *act_pred_reverse(df_br_feature)* - Replaces missing actual (Act) consumption and BR generation values with planned (Pred) values. This function is used to generate a forecast for the current day, when the actual values of `ActCons` and `ActGen` are not available for the entire day.
41
+ - *get_files_from_path(path='_raw_Data_TatEnergosbyt')* - Retrieving operational data from the directory (by default, `/_raw_Data_TatEnergosbyt`)
42
+ - *update_DataBase(total_oper_df, filename='DataBase.xlsx')* - Updating the database by adding operational data from `total_oper_df`
43
+
44
+ ### Preprocessing ###
45
+ The **preprocessing** module (located in the `preprocessing.py` file) contains data preprocessing tools for subsequent transfer of this data to the **core** functions (CatBoostRegressor):
46
+ - *get_type_day(df)* - encodes the day type (`TypeDay`) based on the `df.Date` column in `datetime` format. The encoding follows the industrial calendar of the Republic of Tatarstan.
47
+ - *get_light(df)* - encodes the light interval (`Light`) based on the `df.Date` column in `datetime` format. The encoding is based on the geographic location of the city of Kazan.
48
+ - *get_season(df)* - encodes seasonality based on the `df.Date` column in `datetime` format.
49
+ - *prepareData(df, lag_start=1, lag_end=7)* - data preprocessing function. Preprocessing includes: adding day type, light interval, seasonality, and energy consumption lag (default 1...7 days)
50
+
51
+ ### Core ###
52
+ The **core** module (located in the `core.py` file) contains the library's main class. It is built around `CatBoostRegressor` and provides a number of functions:
53
+ - *predict_volume(df_general, df_predict, max_depth, learn_period)* - trains the model and forecasts energy consumption volumes
54
+ - *get_df_predicted(df_general, max_depth, learn_period, model, date_start, date_end)* - generates a data frame with the predicted energy consumption volume for the specified planning horizon
55
+ - *date_str_format(df_predicted)* - generates a date string for the exported xlsx file
56
+ - *get_DAM_order(df_general, max_depth, learn_period, model, date_start, date_end)* - a function that generates a DAM order and exports it to xlsx format
57
+
58
+ ### Validating ###
59
+ The **validating** helper module (located in the `validating.py` file) is designed to validate the **core** functions:
60
+ - *get_df_val_predicted(df_general, df_general_date_index, max_depth, learn_period, model, date_start, date_end)* - function for generating a dataframe with predicted energy consumption volumes for the specified planning horizon, adapted for validation calculations (simulating the absence of 'ActCons' and 'ActGen' data after 7 AM, offline access to the weather forecast and BEM data)
61
+ - *get_df_validate(df_general, df_general_date_index, max_depth, learn_period, model, date_start, date_end, logging=True)* - function for validating the model for the specified time interval. Returns a validation dataframe with predicted values.
62
+ - *get_df_validate_with_loss(df_validate_result, df_RSV_vs_BR_rate)* - adds a 'loss' column with BEM losses to the resulting dataframe.
63
+ - *diff_predict_vs_fact(df_validate_result)* - outputs validation results in table and graph form (Matplotlib object).
64
+ - *Grid_Search(df_general, df_general_date_index, max_depth_grid, learn_period_grid, model, date_start, date_end)* - grid search for optimal training period and tree depth.
65
+ - *search_result_highlighting(df_search_result)* - highlights the search results of the Grid_Search() function.
66
+
67
+ ### EDA ###
68
+ The **EDA** graphics module (located in the `EDA.py` file) is designed to display the results of exploratory data analysis. In development.
69
+ - *draw_learning_curve(df_general, max_depth, model, fontsize=15)* - calculates and plots the learning curve.
@@ -0,0 +1,83 @@
1
+ from .dependency import *
2
+ from .validating import *
3
+
4
+ def draw_learning_curve(df_general, max_depth, model, fontsize=15):
5
+
6
+ '''
7
+ SYNOPSIS: Computes and plots the learning curve
8
+
9
+ KEYWORD ARGUMENTS:
10
+ df_general -- the full dataset being validated
11
+ max_depth -- maximum depth of the regressor's decision trees
12
+ model -- model type
13
+ fontsize -- font size used on the plot
14
+
15
+ RETURNS:
16
+ None
17
+
18
+ EXAMPLE:
19
+ >>> draw_learning_curve(df_general, max_depth=6, model=model)
20
+ '''
21
+
22
+ def X_y_split(df_general, learn_period):
23
+
24
+ '''
25
+ SYNOPSIS: Splits the sample into target and input features
26
+ '''
27
+
28
+ last_date = df_general.Date.iloc[-1]  # renamed from `datetime` to avoid shadowing the datetime class
29
+
30
+ df_train = df_general[(df_general.Date > last_date - timedelta(days=round(365.25*learn_period))) &
31
+ (df_general.Date <= last_date)]
32
+
33
+ return df_train.drop(columns=['Date', 'Volume']), df_train.Volume
34
+
35
+ if model == 'br3_act': # from 30 November 2025, switch to 8 years for br3 and 13 years for br2
36
+ df_general = df_general.drop(columns=['PredCons', 'PredGen'])
37
+ learn_period = 7
38
+ elif model == 'br2_act':
39
+ df_general = df_general.drop(columns=['PredCons', 'PredGen', 'Price'])
40
+ learn_period = 12
41
+
42
+ X, y = X_y_split(df_general, learn_period)
43
+
44
+ engine = catboost.CatBoostRegressor(silent=True,
45
+ n_estimators = 200,
46
+ max_depth = max_depth)
47
+ common_params = {
48
+ "X": X,
49
+ "y": y,
50
+ "train_sizes": np.linspace(1/learn_period, 1.0, learn_period),
51
+ "cv": TimeSeriesSplit(n_splits=5),
52
+ "scoring": 'neg_mean_absolute_percentage_error',
53
+ "n_jobs": -1,
54
+ "line_kw": {"marker": "o"},
55
+ "std_display_style": "fill_between",
56
+ "score_name": "neg MAPE",
57
+ }
58
+
59
+ plt.figure(figsize=(10, 5))
60
+ plt.rcParams['axes.prop_cycle'] = plt.cycler(color=['#1f77b4', '#8ec489'])
61
+ plt.rcParams['font.family'] = 'Palatino Linotype'
62
+ plt.rcParams['font.size'] = fontsize
63
+
64
+ LCD = LearningCurveDisplay.from_estimator(engine, **common_params)
65
+ train_sizes = LCD.train_sizes
66
+ test_scores = LCD.test_scores
67
+ mean_test_scores = [test_scores[i].mean() for i in range(len(test_scores))]
68
+ opt_y = max(mean_test_scores)
69
+ opt_x = train_sizes[mean_test_scores.index(opt_y)]
70
+
71
+ for i in range(len(mean_test_scores)):
72
+ plt.text(train_sizes[i], mean_test_scores[i], f'{i+1}', ha='center', va='bottom')
73
+
74
+ plt.annotate('Optimal period', xy=(opt_x, opt_y),
75
+ xycoords='data', xytext=(opt_x-4000, opt_y-0.015),
76
+ textcoords='data', fontsize=fontsize,
77
+ arrowprops=dict(arrowstyle='-|>'))
78
+
79
+ plt.title(f'Learning Curve for {model} (max_depth={max_depth})', fontsize=fontsize)
80
+ plt.legend(['Training score', 'Test score'], loc='lower right')
81
+ plt.tight_layout()
82
+ #plt.savefig(f'pictures/br2_Learning_Curve(max_depth={max_depth}).png', dpi = 300, transparent = True)
83
+ plt.show()
@@ -0,0 +1,14 @@
1
+ '''
2
+
3
+ Short-Term Forecasting of Regional Electrical Load Based on CatBoost Model library (STLFLib) v. 1.0.0.0
4
+
5
+ The KSPEU License. Copyright © 2025 caapel
6
+
7
+ '''
8
+
9
+ from .dependency import *
10
+ from .serviceDB import *
11
+ from .preprocessing import *
12
+ from .core import *
13
+ from .validating import *
14
+ from .EDA import *
@@ -0,0 +1,242 @@
1
+ from .dependency import *
2
+ from .preprocessing import *
3
+ from .serviceDB import *
4
+
5
+ def predict_volume(df_general, df_predict, max_depth, learn_period):
6
+
7
+ '''
8
+ SYNOPSIS: Trains the model and forecasts energy consumption volumes
9
+
10
+ KEYWORD ARGUMENTS:
11
+ df_general -- dataframe with the full population of data (X_train, y_train)
12
+ df_predict -- daily dataframe with the input data for the forecast (x_pred)
13
+ max_depth -- maximum depth of the regressor's decision trees
14
+ learn_period -- model training period (size of the training sample)
15
+
16
+ RETURNS:
17
+ df_predict : pandas.core.frame.DataFrame
18
+
19
+ EXAMPLES:
20
+ >>> predict_volume(df_general, df_predict, max_depth, learn_period).info()
21
+
22
+ <class 'pandas.core.frame.DataFrame'>
23
+ RangeIndex: 24 entries, 0 to 23
24
+ Data columns (total 21 columns):
25
+ # Column Non-Null Count Dtype
26
+ --- ------ -------------- -----
27
+ 0 Date 24 non-null datetime64[ns]
28
+ 1 Year 24 non-null int64
29
+ 2 Month 24 non-null int64
30
+ 3 Day 24 non-null int64
31
+ 4 Hour 24 non-null int64
32
+ 5 Weekday 24 non-null int64
33
+ 6 Volume 24 non-null float64
34
+ 7 Temperature 24 non-null float64
35
+ 8 ActCons 24 non-null int64
36
+ 9 ActGen 24 non-null int64
37
+ 10 Price 24 non-null int64
38
+ 11 TypeDay 24 non-null int64
39
+ 12 Light 24 non-null int64
40
+ 13 Season 24 non-null int64
41
+ 14 lag-1 24 non-null float64
42
+ 15 lag-2 24 non-null float64
43
+ 16 lag-3 24 non-null float64
44
+ 17 lag-4 24 non-null float64
45
+ 18 lag-5 24 non-null float64
46
+ 19 lag-6 24 non-null float64
47
+ 20 lag-7 24 non-null float64
48
+ dtypes: datetime64[ns](1), float64(9), int64(11)
49
+ memory usage: 4.1 KB
50
+ '''
51
+
52
+ # build the optimal training sample
53
+ df_train = df_general[df_general.Date > df_predict.iloc[0].Date - timedelta(days=365*learn_period+1, hours=1)]
54
+
55
+ # train the model
56
+ model = catboost.CatBoostRegressor(silent=True,
57
+ n_estimators = 200,
58
+ max_depth = max_depth)
59
+
60
+ model.fit(df_train.drop(columns=['Date', 'Volume']), df_train.Volume)
61
+
62
+ # forecast consumption volumes for the coming day
63
+ df_predict.Volume = model.predict(df_predict.drop(columns=['Date', 'Volume']))
64
+
65
+ return df_predict
66
+
67
+
68
+ def get_df_predicted(df_general,
69
+ max_depth,
70
+ learn_period,
71
+ model,
72
+ date_start = datetime.now().date(),
73
+ date_end = datetime.now().date() + timedelta(days=1),
74
+ ):
75
+
76
+ '''
77
+ SYNOPSIS: Generates a dataframe with forecast energy consumption volumes for the specified planning horizon
78
+
79
+ KEYWORD ARGUMENTS:
80
+ df_general -- the full dataset, current as of the forecast day
81
+ max_depth -- maximum depth of the regressor's decision trees
82
+ learn_period -- model training period (size of the training sample)
83
+ model -- model type (default br3_act: [ActCons, ActGen, Price])
84
+ date_start -- forecast start date (default: the current day)
85
+ date_end -- forecast end date (default: one day ahead)
86
+
87
+ RETURNS:
88
+ df_predict : pandas.core.frame.DataFrame
89
+
90
+ EXAMPLE:
91
+ # forecast electricity consumption volumes one day ahead
92
+ >>> get_df_predicted(df_general, max_depth, learn_period, model)
93
+
94
+ # forecast electricity consumption volumes from the specified date, one day ahead
95
+ >>> get_df_predicted(df_general, max_depth, learn_period, model, date_start=datetime(2024, 5, 12).date())
96
+
97
+ # forecast electricity consumption volumes from the current day up to the specified date
98
+ >>> get_df_predicted(df_general, max_depth, learn_period, model, date_end=datetime(2024, 6, 15).date())
99
+
100
+ # forecast electricity consumption volumes between the two specified dates
101
+ >>> get_df_predicted(df_general, max_depth, learn_period, model, date_start=datetime(2024, 5, 14).date(),
102
+ date_end=datetime(2024, 5, 16).date())
103
+
104
+ # forecast electricity consumption volumes two days ahead
105
+ >>> get_df_predicted(df_general, max_depth, learn_period, model, date_end=datetime.now().date() + timedelta(days=2))
106
+ '''
107
+
108
+ # create an empty dataframe for the final result
109
+ df_predicted = pd.DataFrame()
110
+
111
+ #for date in trange((date_end - date_start).days + 1, desc="days progress"): # per-day progress widget
112
+ for date in range((date_end - date_start).days + 1):
113
+
114
+ # create an empty daily dataframe with weather and calendar features
115
+ df_predicted_daily = add_date_scalar(get_weather(date_start + timedelta(days=date)))
116
+
117
+ # split the training logic (the current day uses balancing-market data, subsequent days do not)
118
+ if date == 0:
119
+
120
+ if model == 'br3_act':
121
+ # drop the columns with SO UES balancing-market forecast generation and consumption values
122
+ df_general = df_general.drop(columns=['PredCons', 'PredGen'])
123
+ # download balancing-market data (forecasts plus partial actuals) for the current day
124
+ df_predicted_daily = df_predicted_daily.merge(get_br_feature(date_start), on='Date')
125
+ elif model == 'br2_act':
126
+ # drop the SO UES balancing-market forecast columns plus one more feature: `Price`
127
+ df_general = df_general.drop(columns=['PredCons', 'PredGen', 'Price'])
128
+ # download balancing-market data (forecasts plus partial actuals) for the current day
129
+ df_predicted_daily = df_predicted_daily.merge(get_br_feature(date_start).drop(columns='Price'), on='Date')
130
+
131
+ # fill gaps in ActCons and ActGen with values from the PredCons and PredGen columns
132
+ df_predicted_daily = act_pred_reverse(df_predicted_daily)
133
+
134
+ elif date == 1:
135
+
136
+ if model == 'br3_act':
137
+ df_general = df_general.drop(columns=['ActCons', 'ActGen', 'Price'])
138
+ elif model == 'br2_act':
139
+ df_general = df_general.drop(columns=['ActCons', 'ActGen'])
140
+
141
+ # prepend the last 168 (24 · 7) rows of df_general to generate the time lag
142
+ df_predicted_daily = pd.concat([df_general.tail(168), df_predicted_daily])
143
+
144
+ # generate the time lag (one full week)
145
+ df_predicted_daily = prepareData(df_predicted_daily)
146
+
147
+ # get the forecast values for the current day
148
+ df_predicted_daily = predict_volume(df_general,
149
+ df_predicted_daily,
150
+ max_depth,
151
+ learn_period)
152
+
153
+ # append the obtained forecast values to the final forecast frame
154
+ df_predicted = pd.concat([df_predicted, df_predicted_daily])
155
+
156
+ # extend the full dataset with the current day (+24 rows) for planning the next day
157
+ df_general = pd.concat([df_general, df_predicted_daily])
158
+
159
+ df_predicted.rename(columns = {'Volume':'Predicted'}, inplace = True)
160
+
161
+ # remove .tail(24) if a forecast for several days is needed
162
+ return df_predicted[['Date', 'Predicted']].tail(24*(date_end - date_start).days)
163
+
164
+
165
+ def date_str_format(df_predicted):
166
+
167
+ '''
168
+ SYNOPSYS: генерация строки с датой для экспортируемого xlsx-файла
169
+
170
+ KEYWORD ARGUMENTS:
171
+ date -- дата, на которую генерируется прогноз
172
+
173
+ RETURNS: str
174
+
175
+ EXAMPLE:
176
+ >>> date_str_format(df_predicted)
177
+ '12.11.2025'
178
+
179
+ >>> date_str_format(df_predicted)
180
+ '12-13.11.2025'
181
+
182
+ >>> date_str_format(df_predicted)
183
+ '30.11.2025-01.12.2025'
184
+ '''
185
+
186
+ def single_date_str_format(date):
187
+ return f'{0 if date.day < 10 else ""}{date.day}.{0 if date.month < 10 else ""}{date.month}.{date.year}'
188
+
189
+ date_start = df_predicted.iloc[1].Date.date()
190
+ date_end = df_predicted.iloc[-2].Date.date()
191
+
192
+ if df_predicted.shape[0] == 24:
193
+ return single_date_str_format(date_start)
194
+ else:
195
+ if date_start.month != date_end.month: # the start and end dates fall in different months
196
+ return f'{single_date_str_format(date_start)}-{single_date_str_format(date_end)}'
197
+ else: # the start and end dates fall within the same month
198
+ str_day = f'{0 if date_start.day < 10 else ""}{date_start.day}-{0 if date_end.day < 10 else ""}{date_end.day}'
199
+ str_month = f'{0 if date_start.month < 10 else ""}{date_start.month}'
200
+ return f'{str_day}.{str_month}.{date_start.year}'
201
+
202
+
203
+ def get_DAM_order(df_general,
204
+ max_depth,
205
+ learn_period,
206
+ model = 'br3_act',
207
+ date_start = datetime.now().date(),
208
+ date_end = datetime.now().date() + timedelta(days=1),
209
+ ):
210
+
211
+ '''
212
+ SYNOPSIS: function that generates a DAM bid and exports it to an xlsx form
213
+
214
+ KEYWORD ARGUMENTS:
215
+ df_general -- the full dataset, current as of the forecast day
216
+ max_depth -- maximum depth of the regressor's decision trees
217
+ learn_period -- model training period (size of the training sample)
218
+ model -- model type (default br3_act: [ActCons, ActGen, Price])
219
+ date_start -- forecast start date (default: the current day)
220
+ date_end -- forecast end date (default: one day ahead)
221
+
222
+ RETURNS: *.xlsx
223
+
224
+ EXAMPLE:
225
+ >>> get_DAM_order(df_general, max_depth, learn_period)
226
+
227
+ >>> get_DAM_order(
228
+ df_general, max_depth, learn_period, model,
229
+ date_start=datetime.now().date(), date_end=datetime.now().date() + timedelta(days=2),
230
+ )
231
+ '''
232
+
233
+ df_predicted = get_df_predicted(df_general, max_depth, learn_period, model, date_start, date_end)
234
+ return_days = (date_end - date_start).days
235
+
236
+ if df_predicted.shape[0] == 24*return_days:
237
+ df_predicted.to_excel(f'Predicted({date_str_format(df_predicted)}).xlsx', index=False)
238
+ print(f'Forecast results saved to Predicted({date_str_format(df_predicted)}).xlsx')
239
+ else:
240
+ print("\033[1;31m{}".format('WARNING: No data in exported dataframe'))
241
+
242
+ return df_predicted
@@ -0,0 +1,46 @@
1
+ """
2
+ import all dependencies
3
+ """
4
+
5
+ import os
6
+ import re
7
+ import ephem
8
+ import locale
9
+ import requests
10
+ import warnings
11
+ import calendar
12
+ from bs4 import BeautifulSoup
13
+ from tqdm.notebook import trange
14
+
15
+ import notebook
16
+ import numpy as np
17
+ import pandas as pd
18
+ import seaborn as sns
19
+ import matplotlib
20
+ import matplotlib.pyplot as plt
21
+ import matplotlib.ticker as ticker
22
+ from platform import python_version
23
+ import catboost
24
+ import graphviz
25
+ from datetime import time
26
+ from datetime import datetime
27
+ from datetime import timedelta
28
+ import sklearn
29
+ from sklearn.metrics import mean_absolute_error as MAE
30
+ from sklearn.metrics import mean_absolute_percentage_error as MAPE
31
+ from sklearn.model_selection import ShuffleSplit
32
+ from sklearn.model_selection import GridSearchCV
33
+ from sklearn.model_selection import TimeSeriesSplit
34
+ from sklearn.model_selection import LearningCurveDisplay
35
+
36
+ def print_dependency():
37
+ print(f"python: v {python_version()}")
38
+ print(f"Jupyter Notebook: v {notebook.__version__}")
39
+ print(f"numpy: v {np.__version__}")
40
+ print(f"pandas: v {pd.__version__}")
41
+ print(f"seaborn: v {sns.__version__}")
42
+ print(f"graphviz: v {graphviz.__version__}")
43
+ print(f"matplotlib: v {matplotlib.__version__}")
44
+ print(f"sklearn: v {sklearn.__version__}")
45
+ print(f"CatBoost: v {catboost.__version__}")
46
+ print("STLFLib: v 1.0.0")