stlflib 1.0.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- stlflib-1.0.1/PKG-INFO +96 -0
- stlflib-1.0.1/README.md +69 -0
- stlflib-1.0.1/STLFLib/EDA.py +83 -0
- stlflib-1.0.1/STLFLib/__init__.py +14 -0
- stlflib-1.0.1/STLFLib/core.py +242 -0
- stlflib-1.0.1/STLFLib/dependency.py +46 -0
- stlflib-1.0.1/STLFLib/preprocessing.py +244 -0
- stlflib-1.0.1/STLFLib/serviceDB.py +716 -0
- stlflib-1.0.1/STLFLib/validating.py +420 -0
- stlflib-1.0.1/setup.cfg +4 -0
- stlflib-1.0.1/setup.py +42 -0
- stlflib-1.0.1/stlflib.egg-info/PKG-INFO +96 -0
- stlflib-1.0.1/stlflib.egg-info/SOURCES.txt +15 -0
- stlflib-1.0.1/stlflib.egg-info/dependency_links.txt +1 -0
- stlflib-1.0.1/stlflib.egg-info/requires.txt +11 -0
- stlflib-1.0.1/stlflib.egg-info/top_level.txt +1 -0
stlflib-1.0.1/PKG-INFO
ADDED
@@ -0,0 +1,96 @@
Metadata-Version: 2.1
Name: stlflib
Version: 1.0.1
Summary: Short-Term Forecasting of Regional Electrical Load Based on CatBoost Model
Home-page: https://github.com/caapel/ForecastPowerEnergy
Author: caapel
Author-email: caapel@mail.ru
License: KSPEU License
Project-URL: GitHub, https://github.com/caapel/ForecastPowerEnergy
Keywords: STLF
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: ephem
Requires-Dist: requests
Requires-Dist: beautifulsoup4
Requires-Dist: tqdm
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: seaborn
Requires-Dist: matplotlib
Requires-Dist: catboost
Requires-Dist: graphviz
Requires-Dist: scikit-learn

# Short-Term Load Forecasting Based on CatBoost Model Library (STLFLib)

This is a Python STLF machine learning library designed for generating energy consumption bids for the DAM (day-ahead market).
The library is distributed under the KSPEU license ([RU 2025688100](https://new.fips.ru/registers-doc-view/fips_servlet?DB=EVM&DocNumber=2025688100&TypeFile=html)). For commercial use, please contact the author: caapel@mail.ru.

----------

## How to install ##
To install, you can use the command:

    pip install stlflib

Or download the repository from [GitHub](https://github.com/caapel/ForecastPowerEnergy) (private access).

----------

## Using ##
The essence of this project and its library is described in detail in the study [***Short-Term Forecasting of Regional Electrical Load Based on XGBoost Model***](https://doi.org/10.3390/en18195144).
> This file does not contain detailed instructions on how to work with the library;
> it only describes each of the basic library modules.

### Dependency ###
The **dependency** module (located in the `dependency.py` file) contains the complete list of dependencies.
The module has only one function:
- *print_dependency()* - prints the versions of installed dependencies and the current version of STLFLib

### ServiceDB ###
The **serviceDB** module (located in the `serviceDB.py` file) contains a set of tools for working with the database:
<br>-------------------------------create---------------------------------<br>
- *generate_volume_df(path)* - generates a dataframe with archived energy consumption data from prepared .xls files located at `path`
- *get_weather(date)* - generates a weather archive/forecast (outside air temperature) for the specified date with a sampling frequency of 1 hour
- *get_br_feature(date)* - loads a BEM (Balancing Energy Market) archive/forecast for the specified date
- *get_RSV_rate(date)* - loads the unregulated DAM price for the specified date (per month)
- *updating_or_create_df(get_function, filename, start=datetime(2013, 1, 1).date())* - creates a new database (from the specified date) or replenishes an existing one (filename.xlsx) with missing data up to the end of the previous month, returning the resulting dataframe
- *merge_and_export_DB(total_volume_df, df_weather, df_br_feature, filename='DataBase.xlsx')* - merges the dataframes total_volume_df (`Volume.xlsx`), df_weather (`Weather.xlsx`), and df_br_feature (`br_feature.xlsx`) by the 'Date' column into one common database (by default, `DataBase.xlsx`)
<br>-------------------------------service--------------------------------<br>
- *get_empty_daily_df(date)* - creates an empty dataframe (25 rows: from 0:00 to 24:00) for the specified date (for full temperature interpolation)
- *add_date_scalar(df)* - adds additional categorical features to the dataframe: Day, Month, Year, WeekDay
- *is_check_DataBase(df)* - checks database integrity
- *act_pred_reverse(df_br_feature)* - replaces missing actual (Act) consumption and BR generation values with planned (Pred) values. This function is used to generate a forecast for the current day, when the actual `ActCons` and `ActGen` values are not yet available for the entire day
- *get_files_from_path(path='_raw_Data_TatEnergosbyt')* - retrieves operational data from the directory (by default, `/_raw_Data_TatEnergosbyt`)
- *update_DataBase(total_oper_df, filename='DataBase.xlsx')* - updates the database by adding operational data from `total_oper_df`

### Preprocessing ###
The **preprocessing** module (located in the `preprocessing.py` file) contains data preprocessing tools that prepare the data for the **core** functions (CatBoostRegressor):
- *get_type_day(df)* - encodes the day type (`TypeDay`) based on the `df.Date` column in DateTime format. The encoding follows the industrial calendar of the Republic of Tatarstan.
- *get_light(df)* - encodes the light interval (`Light`) based on the `df.Date` column in DateTime format. The encoding follows the geographic location of the city of Kazan.
- *get_season(df)* - encodes seasonality based on the `df.Date` column in DateTime format.
- *prepareData(df, lag_start=1, lag_end=7)* - data preprocessing function. Preprocessing includes adding day type, light interval, seasonality, and energy consumption lag (default 1...7 days)

### Core ###
The **core** module (located in the `core.py` file) is the main class in the library. It is based on `CatBoostRegressor` and provides a number of functions:
- *predict_volume(df_general, df_predict, max_depth, learn_period)* - trains the model and predicts energy consumption volumes for one day
- *get_df_predicted(df_general, max_depth, learn_period, model, date_start, date_end)* - generates a dataframe with the predicted energy consumption volume for the specified planning horizon
- *date_str_format(df_predicted)* - generates a date string for the exported xlsx file
- *get_DAM_order(df_general, max_depth, learn_period, model, date_start, date_end)* - generates a DAM order and exports it to xlsx format

### Validating ###
The **validating** helper module (located in the `validating.py` file) is designed to validate the **core** functions:
- *get_df_val_predicted(df_general, df_general_date_index, max_depth, learn_period, model, date_start, date_end)* - generates a dataframe with predicted energy consumption volumes for the specified planning horizon, adapted for validation calculations (simulating the absence of 'ActCons' and 'ActGen' data after 7 AM, and offline access to the weather forecast and BEM data)
- *get_df_validate(df_general, df_general_date_index, max_depth, learn_period, model, date_start, date_end, logging=True)* - validates the model over the specified time interval. Returns a validation dataframe with predicted values.
- *get_df_validate_with_loss(df_validate_result, df_RSV_vs_BR_rate)* - adds a 'loss' column with BEM losses to the resulting dataframe.
- *diff_predict_vs_fact(df_validate_result)* - outputs validation results in table and graph form (Matplotlib object).
- *Grid_Search(df_general, df_general_date_index, max_depth_grid, learn_period_grid, model, date_start, date_end)* - grid search for the optimal training period and tree depth.
- *search_result_highlighting(df_search_result)* - highlights the search results of the Grid_Search() function.

### EDA ###
The **EDA** graphics module (located in the `EDA.py` file) displays the results of exploratory data analysis. In development.
- *draw_learning_curve(df_general, max_depth, model, fontsize=15)* - calculates and plots the learning curve.
stlflib-1.0.1/README.md
ADDED
@@ -0,0 +1,69 @@
# Short-Term Load Forecasting Based on CatBoost Model Library (STLFLib)

This is a Python STLF machine learning library designed for generating energy consumption bids for the DAM (day-ahead market).
The library is distributed under the KSPEU license ([RU 2025688100](https://new.fips.ru/registers-doc-view/fips_servlet?DB=EVM&DocNumber=2025688100&TypeFile=html)). For commercial use, please contact the author: caapel@mail.ru.

----------

## How to install ##
To install, you can use the command:

    pip install stlflib

Or download the repository from [GitHub](https://github.com/caapel/ForecastPowerEnergy) (private access).

----------

## Using ##
The essence of this project and its library is described in detail in the study [***Short-Term Forecasting of Regional Electrical Load Based on XGBoost Model***](https://doi.org/10.3390/en18195144).
> This file does not contain detailed instructions on how to work with the library;
> it only describes each of the basic library modules.

### Dependency ###
The **dependency** module (located in the `dependency.py` file) contains the complete list of dependencies.
The module has only one function:
- *print_dependency()* - prints the versions of installed dependencies and the current version of STLFLib

### ServiceDB ###
The **serviceDB** module (located in the `serviceDB.py` file) contains a set of tools for working with the database:
<br>-------------------------------create---------------------------------<br>
- *generate_volume_df(path)* - generates a dataframe with archived energy consumption data from prepared .xls files located at `path`
- *get_weather(date)* - generates a weather archive/forecast (outside air temperature) for the specified date with a sampling frequency of 1 hour
- *get_br_feature(date)* - loads a BEM (Balancing Energy Market) archive/forecast for the specified date
- *get_RSV_rate(date)* - loads the unregulated DAM price for the specified date (per month)
- *updating_or_create_df(get_function, filename, start=datetime(2013, 1, 1).date())* - creates a new database (from the specified date) or replenishes an existing one (filename.xlsx) with missing data up to the end of the previous month, returning the resulting dataframe
- *merge_and_export_DB(total_volume_df, df_weather, df_br_feature, filename='DataBase.xlsx')* - merges the dataframes total_volume_df (`Volume.xlsx`), df_weather (`Weather.xlsx`), and df_br_feature (`br_feature.xlsx`) by the 'Date' column into one common database (by default, `DataBase.xlsx`)
<br>-------------------------------service--------------------------------<br>
- *get_empty_daily_df(date)* - creates an empty dataframe (25 rows: from 0:00 to 24:00) for the specified date (for full temperature interpolation)
- *add_date_scalar(df)* - adds additional categorical features to the dataframe: Day, Month, Year, WeekDay
- *is_check_DataBase(df)* - checks database integrity
- *act_pred_reverse(df_br_feature)* - replaces missing actual (Act) consumption and BR generation values with planned (Pred) values. This function is used to generate a forecast for the current day, when the actual `ActCons` and `ActGen` values are not yet available for the entire day
- *get_files_from_path(path='_raw_Data_TatEnergosbyt')* - retrieves operational data from the directory (by default, `/_raw_Data_TatEnergosbyt`)
- *update_DataBase(total_oper_df, filename='DataBase.xlsx')* - updates the database by adding operational data from `total_oper_df`
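The "up to the end of the previous month" boundary that *updating_or_create_df* replenishes toward can be sketched with stdlib dates alone (an illustration only, not code from the package):

```python
from datetime import date, timedelta

def end_of_previous_month(today: date) -> date:
    """Last day of the month before `today` -- the upper bound
    up to which the archive database would be replenished."""
    return today.replace(day=1) - timedelta(days=1)

print(end_of_previous_month(date(2025, 11, 12)))  # 2025-10-31
print(end_of_previous_month(date(2024, 3, 1)))    # 2024-02-29 (leap year)
```

Stepping back to the first of the month and subtracting one day handles month lengths and leap years without any calendar table.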
### Preprocessing ###
The **preprocessing** module (located in the `preprocessing.py` file) contains data preprocessing tools that prepare the data for the **core** functions (CatBoostRegressor):
- *get_type_day(df)* - encodes the day type (`TypeDay`) based on the `df.Date` column in DateTime format. The encoding follows the industrial calendar of the Republic of Tatarstan.
- *get_light(df)* - encodes the light interval (`Light`) based on the `df.Date` column in DateTime format. The encoding follows the geographic location of the city of Kazan.
- *get_season(df)* - encodes seasonality based on the `df.Date` column in DateTime format.
- *prepareData(df, lag_start=1, lag_end=7)* - data preprocessing function. Preprocessing includes adding day type, light interval, seasonality, and energy consumption lag (default 1...7 days)
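The daily consumption lag that *prepareData* adds (1...7 days back, i.e. 24...168 hourly steps on hourly data) can be illustrated on a plain list; this is a hypothetical sketch of the idea, not the library's pandas-based implementation:

```python
def add_lags(volume, lag_start=1, lag_end=3):
    """For each hourly sample with a full history, collect the volumes
    from 24*lag hours earlier (one lag feature per day back).
    Samples without a full history are skipped: their lags are undefined."""
    rows = []
    for i in range(lag_end * 24, len(volume)):
        lags = [volume[i - lag * 24] for lag in range(lag_start, lag_end + 1)]
        rows.append((volume[i], lags))
    return rows

# four days of synthetic hourly volumes: 0..95
rows = add_lags(list(range(96)))
print(rows[0])  # (72, [48, 24, 0]): hour 72 paired with its day-1, day-2, day-3 values
```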
### Core ###
The **core** module (located in the `core.py` file) is the main class in the library. It is based on `CatBoostRegressor` and provides a number of functions:
- *predict_volume(df_general, df_predict, max_depth, learn_period)* - trains the model and predicts energy consumption volumes for one day
- *get_df_predicted(df_general, max_depth, learn_period, model, date_start, date_end)* - generates a dataframe with the predicted energy consumption volume for the specified planning horizon
- *date_str_format(df_predicted)* - generates a date string for the exported xlsx file
- *get_DAM_order(df_general, max_depth, learn_period, model, date_start, date_end)* - generates a DAM order and exports it to xlsx format
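The planning horizon that *get_df_predicted* covers is an inclusive day-by-day walk from `date_start` to `date_end`, forecasting one day per step. The iteration itself can be sketched with the stdlib (an illustration only, not the package code):

```python
from datetime import date, timedelta

def horizon_days(date_start: date, date_end: date):
    """Every date in the planning horizon, inclusive of both endpoints --
    one model run (24 hourly forecasts) per returned day."""
    return [date_start + timedelta(days=d)
            for d in range((date_end - date_start).days + 1)]

days = horizon_days(date(2024, 5, 14), date(2024, 5, 16))
print(days)  # [datetime.date(2024, 5, 14), datetime.date(2024, 5, 15), datetime.date(2024, 5, 16)]
```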
### Validating ###
The **validating** helper module (located in the `validating.py` file) is designed to validate the **core** functions:
- *get_df_val_predicted(df_general, df_general_date_index, max_depth, learn_period, model, date_start, date_end)* - generates a dataframe with predicted energy consumption volumes for the specified planning horizon, adapted for validation calculations (simulating the absence of 'ActCons' and 'ActGen' data after 7 AM, and offline access to the weather forecast and BEM data)
- *get_df_validate(df_general, df_general_date_index, max_depth, learn_period, model, date_start, date_end, logging=True)* - validates the model over the specified time interval. Returns a validation dataframe with predicted values.
- *get_df_validate_with_loss(df_validate_result, df_RSV_vs_BR_rate)* - adds a 'loss' column with BEM losses to the resulting dataframe.
- *diff_predict_vs_fact(df_validate_result)* - outputs validation results in table and graph form (Matplotlib object).
- *Grid_Search(df_general, df_general_date_index, max_depth_grid, learn_period_grid, model, date_start, date_end)* - grid search for the optimal training period and tree depth.
- *search_result_highlighting(df_search_result)* - highlights the search results of the Grid_Search() function.
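*Grid_Search* scores every (max_depth, learn_period) pair and keeps the best one. The exhaustive-search pattern looks roughly like this; `score` is a hypothetical stand-in for the library's validation error, not a real STLFLib function:

```python
def grid_search(max_depth_grid, learn_period_grid, score):
    """Try every (depth, period) pair; keep the one with the lowest error."""
    best = None
    for depth in max_depth_grid:
        for period in learn_period_grid:
            err = score(depth, period)
            if best is None or err < best[0]:
                best = (err, depth, period)
    return best

# toy error surface with its minimum at depth=6, period=7
best = grid_search([4, 6, 8], [5, 7, 12],
                   lambda d, p: abs(d - 6) + abs(p - 7))
print(best)  # (0, 6, 7)
```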
### EDA ###
The **EDA** graphics module (located in the `EDA.py` file) displays the results of exploratory data analysis. In development.
- *draw_learning_curve(df_general, max_depth, model, fontsize=15)* - calculates and plots the learning curve.
stlflib-1.0.1/STLFLib/EDA.py
ADDED
@@ -0,0 +1,83 @@
from .dependency import *
from .validating import *

def draw_learning_curve(df_general, max_depth, model, fontsize=15):

    '''
    SYNOPSIS: Compute and plot the learning curve

    KEYWORD ARGUMENTS:
    df_general -- the general population being validated
    max_depth -- maximum decision tree depth of the regressor
    model -- model type
    fontsize -- font size on the plot

    RETURNS:
    None

    EXAMPLE:
    >>> draw_learning_curve(df_general, max_depth=6, model=model)
    '''

    def X_y_split(df_general, learn_period):

        '''
        SYNOPSIS: Split the sample into the target and the input features
        '''

        last_date = df_general.Date.iloc[-1]  # renamed from `datetime` to avoid shadowing the imported class

        df_train = df_general[(df_general.Date > last_date - timedelta(days=round(365.25*learn_period), hours=0)) &
                              (df_general.Date <= last_date)]

        return df_train.drop(columns=['Date', 'Volume']), df_train.Volume

    if model == 'br3_act':  # from 30 November 2025, switch to 8 years for br3 and 13 years for br2
        df_general = df_general.drop(columns=['PredCons', 'PredGen'])
        learn_period = 7
    elif model == 'br2_act':
        df_general = df_general.drop(columns=['PredCons', 'PredGen', 'Price'])
        learn_period = 12

    X, y = X_y_split(df_general, learn_period)

    engine = catboost.CatBoostRegressor(silent=True,
                                        n_estimators=200,
                                        max_depth=max_depth)
    common_params = {
        "X": X,
        "y": y,
        "train_sizes": np.linspace(1/learn_period, 1.0, learn_period),
        "cv": TimeSeriesSplit(n_splits=5),
        "scoring": 'neg_mean_absolute_percentage_error',
        "n_jobs": -1,
        "line_kw": {"marker": "o"},
        "std_display_style": "fill_between",
        "score_name": "neg MAPE",
    }

    plt.figure(figsize=(10, 5))
    plt.rcParams['axes.prop_cycle'] = plt.cycler(color=['#1f77b4', '#8ec489'])
    plt.rcParams['font.family'] = 'Palatino Linotype'
    plt.rcParams['font.size'] = fontsize

    LCD = LearningCurveDisplay.from_estimator(engine, **common_params)
    train_sizes = LCD.train_sizes
    test_scores = LCD.test_scores
    mean_test_scores = [test_scores[i].mean() for i in range(len(test_scores))]
    opt_y = max(mean_test_scores)
    opt_x = train_sizes[mean_test_scores.index(opt_y)]

    for i in range(len(mean_test_scores)):
        plt.text(train_sizes[i], mean_test_scores[i], f'{i+1}', ha='center', va='bottom')

    plt.annotate('Optimal period', xy=(opt_x, opt_y),
                 xycoords='data', xytext=(opt_x-4000, opt_y-0.015),
                 textcoords='data', fontsize=fontsize,
                 arrowprops=dict(arrowstyle='-|>'))

    plt.title(f'Learning Curve for {model} (max_depth={max_depth})', fontsize=fontsize)
    plt.legend(['Training score', 'Test score'], loc='lower right')
    plt.tight_layout()
    #plt.savefig(f'pictures/br2_Learning_Curve(max_depth={max_depth}).png', dpi=300, transparent=True)
    plt.show()
stlflib-1.0.1/STLFLib/__init__.py
ADDED
@@ -0,0 +1,14 @@
'''

Short-Term Forecasting of Regional Electrical Load Based on CatBoost Model library (STLFLib) v. 1.0.0.0

The KSPEU License Copyright © 2025 caapel

'''

from .dependency import *
from .serviceDB import *
from .preprocessing import *
from .core import *
from .validating import *
from .EDA import *
stlflib-1.0.1/STLFLib/core.py
ADDED
@@ -0,0 +1,242 @@
from .dependency import *
from .preprocessing import *
from .serviceDB import *

def predict_volume(df_general, df_predict, max_depth, learn_period):

    '''
    SYNOPSIS: Train the model and predict energy consumption volumes

    KEYWORD ARGUMENTS:
    df_general -- dataframe with the general population of data (X_train, y_train)
    df_predict -- daily dataframe with the input data for the forecast (x_pred)
    max_depth -- maximum decision tree depth of the regressor
    learn_period -- model training period (training sample size)

    RETURNS:
    df_predict : pandas.core.frame.DataFrame

    EXAMPLES:
    >>> predict_volume(df_general, df_predict, max_depth, learn_period).info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 24 entries, 0 to 23
    Data columns (total 21 columns):
     #   Column       Non-Null Count  Dtype
    ---  ------       --------------  -----
     0   Date         24 non-null     datetime64[ns]
     1   Year         24 non-null     int64
     2   Month        24 non-null     int64
     3   Day          24 non-null     int64
     4   Hour         24 non-null     int64
     5   Weekday      24 non-null     int64
     6   Volume       24 non-null     float64
     7   Temperature  24 non-null     float64
     8   ActCons      24 non-null     int64
     9   ActGen       24 non-null     int64
     10  Price        24 non-null     int64
     11  TypeDay      24 non-null     int64
     12  Light        24 non-null     int64
     13  Season       24 non-null     int64
     14  lag-1        24 non-null     float64
     15  lag-2        24 non-null     float64
     16  lag-3        24 non-null     float64
     17  lag-4        24 non-null     float64
     18  lag-5        24 non-null     float64
     19  lag-6        24 non-null     float64
     20  lag-7        24 non-null     float64
    dtypes: datetime64[ns](1), float64(9), int64(11)
    memory usage: 4.1 KB
    '''

    # build the optimal training sample
    df_train = df_general[df_general.Date > df_predict.iloc[0].Date - timedelta(days=365*learn_period+1, hours=1)]

    # train the model
    model = catboost.CatBoostRegressor(silent=True,
                                       n_estimators=200,
                                       max_depth=max_depth)

    model.fit(df_train.drop(columns=['Date', 'Volume']), df_train.Volume)

    # forecast consumption volumes for the next day
    df_predict.Volume = model.predict(df_predict.drop(columns=['Date', 'Volume']))

    return df_predict


def get_df_predicted(df_general,
                     max_depth,
                     learn_period,
                     model,
                     date_start=datetime.now().date(),
                     date_end=datetime.now().date() + timedelta(days=1),
                     ):

    '''
    SYNOPSIS: Generate a dataframe with forecast energy consumption volumes for the specified planning horizon

    KEYWORD ARGUMENTS:
    df_general -- general population current as of the forecast day
    max_depth -- maximum decision tree depth of the regressor
    learn_period -- model training period (training sample size)
    model -- model type (default br3_act: [ActCons, ActGen, Price])
    date_start -- forecast start date (default: the current day)
    date_end -- forecast end date (default: one day ahead)

    RETURNS:
    df_predict : pandas.core.frame.DataFrame

    EXAMPLE:
    # generate forecast consumption volumes for one day ahead
    >>> get_df_predicted(df_general, max_depth, learn_period, model)

    # generate forecast consumption volumes for one day ahead from the specified date
    >>> get_df_predicted(df_general, max_depth, learn_period, model, date_start=datetime(2024, 5, 12).date())

    # generate forecast consumption volumes from the current day up to the specified date
    >>> get_df_predicted(df_general, max_depth, learn_period, model, date_end=datetime(2024, 6, 15).date())

    # generate forecast consumption volumes between two specified dates
    >>> get_df_predicted(df_general, max_depth, learn_period, model, date_start=datetime(2024, 5, 14).date(),
                                                                     date_end=datetime(2024, 5, 16).date())

    # generate forecast consumption volumes for two days ahead
    >>> get_df_predicted(df_general, max_depth, learn_period, model, date_end=datetime.now().date() + timedelta(days=2))
    '''

    # create an empty dataframe for the final result
    df_predicted = pd.DataFrame()

    #for date in trange((date_end - date_start).days + 1, desc=f"days progress"):  # per-day progress widget
    for date in range((date_end - date_start).days + 1):

        # create an empty daily dataframe with weather and calendar features
        df_predicted_daily = add_date_scalar(get_weather(date_start + timedelta(days=date)))

        # separate the training logic (the current day uses BM data, the following days do not)
        if date == 0:

            if model == 'br3_act':
                # drop the columns with SO UES BM forecast generation and consumption values
                df_general = df_general.drop(columns=['PredCons', 'PredGen'])
                # load balancing market data (forecast plus partial actuals) for the current day
                df_predicted_daily = df_predicted_daily.merge(get_br_feature(date_start), on='Date')
            elif model == 'br2_act':
                # drop the columns with SO UES BM forecast generation and consumption values, plus the `Price` feature
                df_general = df_general.drop(columns=['PredCons', 'PredGen', 'Price'])
                # load balancing market data (forecast plus partial actuals) for the current day
                df_predicted_daily = df_predicted_daily.merge(get_br_feature(date_start).drop(columns='Price'), on='Date')

            # fill gaps in ActCons and ActGen with data from the PredCons and PredGen columns
            df_predicted_daily = act_pred_reverse(df_predicted_daily)

        elif date == 1:

            if model == 'br3_act':
                df_general = df_general.drop(columns=['ActCons', 'ActGen', 'Price'])
            elif model == 'br2_act':
                df_general = df_general.drop(columns=['ActCons', 'ActGen'])

        # prepend the last 168 (24 · 7) rows of df_general to generate the time lag
        df_predicted_daily = pd.concat([df_general.tail(168), df_predicted_daily])

        # generate the time lag (one full week)
        df_predicted_daily = prepareData(df_predicted_daily)

        # get forecast values for the current day
        df_predicted_daily = predict_volume(df_general,
                                            df_predicted_daily,
                                            max_depth,
                                            learn_period)

        # append the obtained forecast values to the final forecast frame
        df_predicted = pd.concat([df_predicted, df_predicted_daily])

        # extend the general population with the current day (+24 rows) to plan the next day
        df_general = pd.concat([df_general, df_predicted_daily])

    df_predicted.rename(columns={'Volume': 'Predicted'}, inplace=True)

    # remove .tail(24) if a multi-day forecast is needed
    return df_predicted[['Date', 'Predicted']].tail(24*(date_end - date_start).days)


def date_str_format(df_predicted):

    '''
    SYNOPSIS: Generate the date string for the exported xlsx file

    KEYWORD ARGUMENTS:
    df_predicted -- dataframe with the dates the forecast is generated for

    RETURNS: str

    EXAMPLE:
    >>> date_str_format(df_predicted)
    '12.11.2025'

    >>> date_str_format(df_predicted)
    '12-13.11.2025'

    >>> date_str_format(df_predicted)
    '30.11.2025-01.12.2025'
    '''

    def single_date_str_format(date):
        return f'{date.day:02d}.{date.month:02d}.{date.year}'

    date_start = df_predicted.iloc[1].Date.date()
    date_end = df_predicted.iloc[-2].Date.date()

    if df_predicted.shape[0] == 24:
        return single_date_str_format(date_start)
    else:
        if date_start.month != date_end.month:  # the start and end dates cross a month boundary
            return f'{single_date_str_format(date_start)}-{single_date_str_format(date_end)}'
        else:  # the start and end dates stay within the same month
            str_day = f'{date_start.day:02d}-{date_end.day:02d}'
            str_month = f'{date_start.month:02d}'
            return f'{str_day}.{str_month}.{date_start.year}'


def get_DAM_order(df_general,
                  max_depth,
                  learn_period,
                  model='br3_act',
                  date_start=datetime.now().date(),
                  date_end=datetime.now().date() + timedelta(days=1),
                  ):

    '''
    SYNOPSIS: Generate a DAM order and export it to an xlsx form

    KEYWORD ARGUMENTS:
    df_general -- general population current as of the forecast day
    max_depth -- maximum decision tree depth of the regressor
    learn_period -- model training period (training sample size)
    model -- model type (default br3_act: [ActCons, ActGen, Price])
    date_start -- forecast start date (default: the current day)
    date_end -- forecast end date (default: one day ahead)

    RETURNS: *.xlsx

    EXAMPLE:
    >>> get_DAM_order(df_general, max_depth, learn_period)

    >>> get_DAM_order(
            df_general, max_depth, learn_period, model,
            date_start=datetime.now().date(), date_end=datetime.now().date() + timedelta(days=2))
    '''

    df_predicted = get_df_predicted(df_general, max_depth, learn_period, model, date_start, date_end)
    return_days = (date_end - date_start).days

    if df_predicted.shape[0] == 24*return_days:
        df_predicted.to_excel(f'Predicted({date_str_format(df_predicted)}).xlsx', index=False)
        print(f'Forecast results saved to the file Predicted({date_str_format(df_predicted)}).xlsx')
    else:
        print("\033[1;31m{}".format('WARNING: No data in exported dataframe'))

    return df_predicted
stlflib-1.0.1/STLFLib/dependency.py
ADDED
@@ -0,0 +1,46 @@
"""
import all dependencies
"""

import os
import re
import ephem
import locale
import requests
import warnings
import calendar
from bs4 import BeautifulSoup
from tqdm.notebook import trange

import notebook
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from platform import python_version
import catboost
import graphviz
from datetime import time
from datetime import datetime
from datetime import timedelta
import sklearn
from sklearn.metrics import mean_absolute_error as MAE
from sklearn.metrics import mean_absolute_percentage_error as MAPE
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.model_selection import LearningCurveDisplay

def print_dependency():
    print(f"python: v {python_version()}")
    print(f"Jupyter Notebook: v {notebook.__version__}")
    print(f"numpy: v {np.__version__}")
    print(f"pandas: v {pd.__version__}")
    print(f"seaborn: v {sns.__version__}")
    print(f"graphviz: v {graphviz.__version__}")
    print(f"matplotlib: v {matplotlib.__version__}")
    print(f"sklearn: v {sklearn.__version__}")
    print(f"CatBoost: v {catboost.__version__}")
    print(f"STLFLib: v 1.0.0")