stationaritytoolkit 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,24 @@
1
+ Custom Learning-Only License
2
+
3
+ This software is provided under the terms of the Custom Learning-Only License (CLOL). By using this software,
4
+ you agree to the following terms:
5
+
6
+ 1. Free for Personal and Educational Use:
7
+ You are free to use, modify, and distribute this software for personal and educational purposes, including
8
+ learning, research, and non-commercial projects.
9
+
10
+ 2. Commercial/Business Use Prohibited:
11
+ You are not allowed to use this software for any commercial or business purposes, including but not limited
12
+ to selling, licensing, or using it within a for-profit organization. If you intend to use this software for any commercial or business-related activities, you must obtain explicit permission from the author.
13
+
14
+ 3. No Warranty:
15
+ This software is provided "as is" without any warranty. The author shall not be held liable for any damages
16
+ or losses resulting from the use of this software.
17
+
18
+ 4. Attribution:
19
+ If you distribute or use this software in any way, you must provide clear attribution to the original author.
20
+
21
+ 5. Changes to License:
22
+ The author reserves the right to change the terms of this license at any time.
23
+
24
+ For inquiries or permission for commercial use, please contact the author at [Your Email Address].
@@ -0,0 +1,103 @@
1
+ Metadata-Version: 2.4
2
+ Name: stationaritytoolkit
3
+ Version: 0.1.0
4
+ Summary: Test and Convert non-stationary time-series to stationary
5
+ Author-email: Bhanu Suraj Malla <bsmalla@asu.edu>
6
+ Project-URL: Homepage, https://github.com/mbsuraj/stationarityToolkit
7
+ Project-URL: Bug Tracker, https://github.com/mbsuraj/stationarityToolkit/issues
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: Other/Proprietary License
10
+ Classifier: Operating System :: OS Independent
11
+ Requires-Python: >=3.7
12
+ Description-Content-Type: text/markdown
13
+ License-File: LICENSE
14
+ Dynamic: license-file
15
+
16
+ # 📊 StationarityToolkit
17
+
18
+ **StationarityToolkit** is a Python library designed to help you **analyze and transform time series data for stationarity**. It offers a suite of statistical tests and automated transformations to detect and handle both **trend** and **variance** non-stationarity.
19
+
20
+ Whether you're building a forecasting model or preparing data for analysis, this toolkit makes your preprocessing easier and more reliable.
21
+
22
+ ## 🚀 Features
23
+
24
+
25
+ ### ✅ 1. Test for Variance Non-Stationarity
26
+ - Use the **Phillips-Perron test** to detect variance instability.
27
+
28
+ ### ✅ 2. Test for Trend Non-Stationarity
29
+ - Use both **ADF (Augmented Dickey-Fuller)** and **KPSS (Kwiatkowski-Phillips-Schmidt-Shin)** tests to check for trend-based non-stationarity.
30
+
31
+ ### 🔧 3. Remove Trend Non-Stationarity
32
+ - Automatically apply:
33
+ - **Trend differencing**
34
+ - **Seasonal differencing**
35
+ - Or a combination of both
36
+ - Optimized for **weekly seasonal data**.
37
+
38
+ ### 🔧 4. Remove Variance Non-Stationarity
39
+ - Automatically apply transformations like:
40
+ - **Logarithmic**
41
+ - **Square root**
42
+ - **Box-Cox**
43
+ - Selects the best transformation based on statistical significance.
44
+ - Skips transformation if variance is already stationary.
45
+
46
+ ### 🧹 5. Remove All Non-Stationarity
47
+ - Combine both variance and trend removal in one pipeline:
48
+ - Detect and remove variance issues first
49
+ - Then proceed to handle trend non-stationarity
50
+
51
+ ---
52
+
53
+ ## 🛠️ Installation
54
+ pip install StationarityToolkit
55
+
56
+ ## 🧪 Quick Start:
57
+
58
+ 1. **Import the toolkit:**
59
+ ```python
60
+ from stationarity_toolkit.stationarity_toolkit import StationarityToolkit
+ ```
61
+ 2. **Initialize the Toolkit:**
62
+ ```python
63
+ toolkit = StationarityToolkit(alpha=0.05)
+ ```
64
+
65
+ ## ⚙️ Usage Guide
66
+ 1. **✅ Test for Stationarity:**
67
+ ```python
68
+ toolkit.perform_pp_test(ts) # Phillips-Perron test for variance non-stationarity
69
+ toolkit.adf_test(ts) # Augmented Dickey-Fuller test for trend
70
+ toolkit.kpss_test(ts) # KPSS test for trend
+ ```
71
+ 2. **🔧 Remove Variance Non-Stationarity**
72
+ ```python
73
+ toolkit.remove_var_nonstationarity(ts_as_a_dataframe)
+ ```
74
+ - Checks if variance non-stationarity exists.
75
+ - Applies log, square root, and Box-Cox transformations.
76
+ - Selects the transformation that produces the lowest p-value.
77
+ - Skips transformation if unnecessary.
78
+
79
+ 3. **🔧 Remove Trend Non-Stationarity**
80
+ ```python
81
+ toolkit.remove_trend_nonstationarity(ts_as_a_dataframe)
+ ```
82
+ - Applies differencing techniques:
83
+ - Lag differencing
84
+ - Seasonal differencing
85
+ - Combination of both
86
+ - Tries them in that order and keeps the first result that passes both the ADF and KPSS tests.
87
+ - ⚠️ Currently supports only weekly data with a 52-period seasonal lag.
88
+
89
+ 4. **🧹 Remove All Non-Stationarity**
90
+ ```python
91
+ toolkit.remove_nonstationarity(ts_as_a_dataframe)
+ ```
92
+ - Runs both variance and trend checks/removal:
93
+ - Removes variance non-stationarity (if present)
94
+ - Then removes trend non-stationarity
95
+
96
+ ## 💡 Why Stationarity Matters
97
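+ - Many classical time series models (for example ARMA and VAR) assume that the data is stationary; ARIMA reaches stationarity by differencing, and models such as Prophet or LSTMs do not strictly require it but can be easier to fit on a stabilized series. Non-stationary data can lead to: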
98
+ - Spurious regressions
99
+ - Poor model accuracy
100
+ - Invalid statistical inferences
101
+
102
+ StationarityToolkit helps you automate this critical preprocessing step with minimal manual intervention.
103
+
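The two trend tests have opposite null hypotheses: ADF assumes a unit root, while KPSS assumes stationarity. The toolkit's source accepts a series as stationary only when ADF rejects its null and KPSS does not. A minimal sketch of that decision rule, assuming the p-values have already been computed; the function name `is_trend_stationary` is hypothetical and not part of the package:

```python
def is_trend_stationary(adf_p_value: float, kpss_p_value: float, alpha: float = 0.05) -> bool:
    """Combine ADF and KPSS p-values into a single verdict.

    ADF null hypothesis: the series has a unit root (non-stationary).
    KPSS null hypothesis: the series is stationary.
    """
    adf_rejects_unit_root = adf_p_value <= alpha
    kpss_keeps_stationarity = kpss_p_value >= alpha
    return adf_rejects_unit_root and kpss_keeps_stationarity


# Hypothetical p-values, for illustration only.
print(is_trend_stationary(0.01, 0.10))  # ADF rejects, KPSS does not reject
print(is_trend_stationary(0.30, 0.10))  # ADF fails to reject
```

When the two tests disagree, this rule treats the series as non-stationary, which matches the `or` condition the source uses before differencing.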
@@ -0,0 +1,88 @@
1
+ # 📊 StationarityToolkit
2
+
3
+ **StationarityToolkit** is a Python library designed to help you **analyze and transform time series data for stationarity**. It offers a suite of statistical tests and automated transformations to detect and handle both **trend** and **variance** non-stationarity.
4
+
5
+ Whether you're building a forecasting model or preparing data for analysis, this toolkit makes your preprocessing easier and more reliable.
6
+
7
+ ## 🚀 Features
8
+
9
+
10
+ ### ✅ 1. Test for Variance Non-Stationarity
11
+ - Use the **Phillips-Perron test** to detect variance instability.
12
+
13
+ ### ✅ 2. Test for Trend Non-Stationarity
14
+ - Use both **ADF (Augmented Dickey-Fuller)** and **KPSS (Kwiatkowski-Phillips-Schmidt-Shin)** tests to check for trend-based non-stationarity.
15
+
16
+ ### 🔧 3. Remove Trend Non-Stationarity
17
+ - Automatically apply:
18
+ - **Trend differencing**
19
+ - **Seasonal differencing**
20
+ - Or a combination of both
21
+ - Optimized for **weekly seasonal data**.
22
+
23
+ ### 🔧 4. Remove Variance Non-Stationarity
24
+ - Automatically apply transformations like:
25
+ - **Logarithmic**
26
+ - **Square root**
27
+ - **Box-Cox**
28
+ - Selects the best transformation based on statistical significance.
29
+ - Skips transformation if variance is already stationary.
30
+
31
+ ### 🧹 5. Remove All Non-Stationarity
32
+ - Combine both variance and trend removal in one pipeline:
33
+ - Detect and remove variance issues first
34
+ - Then proceed to handle trend non-stationarity
35
+
36
+ ---
37
+
38
+ ## 🛠️ Installation
39
+ pip install StationarityToolkit
40
+
41
+ ## 🧪 Quick Start:
42
+
43
+ 1. **Import the toolkit:**
44
+ ```python
45
+ from stationarity_toolkit.stationarity_toolkit import StationarityToolkit
+ ```
46
+ 2. **Initialize the Toolkit:**
47
+ ```python
48
+ toolkit = StationarityToolkit(alpha=0.05)
+ ```
49
+
50
+ ## ⚙️ Usage Guide
51
+ 1. **✅ Test for Stationarity:**
52
+ ```python
53
+ toolkit.perform_pp_test(ts) # Phillips-Perron test for variance non-stationarity
54
+ toolkit.adf_test(ts) # Augmented Dickey-Fuller test for trend
55
+ toolkit.kpss_test(ts) # KPSS test for trend
+ ```
56
+ 2. **🔧 Remove Variance Non-Stationarity**
57
+ ```python
58
+ toolkit.remove_var_nonstationarity(ts_as_a_dataframe)
+ ```
59
+ - Checks if variance non-stationarity exists.
60
+ - Applies log, square root, and Box-Cox transformations.
61
+ - Selects the transformation that produces the lowest p-value.
62
+ - Skips transformation if unnecessary.
63
+
64
+ 3. **🔧 Remove Trend Non-Stationarity**
65
+ ```python
66
+ toolkit.remove_trend_nonstationarity(ts_as_a_dataframe)
+ ```
67
+ - Applies differencing techniques:
68
+ - Lag differencing
69
+ - Seasonal differencing
70
+ - Combination of both
71
+ - Tries them in that order and keeps the first result that passes both the ADF and KPSS tests.
72
+ - ⚠️ Currently supports only weekly data with a 52-period seasonal lag.
73
+
74
+ 4. **🧹 Remove All Non-Stationarity**
75
+ ```python
76
+ toolkit.remove_nonstationarity(ts_as_a_dataframe)
+ ```
77
+ - Runs both variance and trend checks/removal:
78
+ - Removes variance non-stationarity (if present)
79
+ - Then removes trend non-stationarity
80
+
81
+ ## 💡 Why Stationarity Matters
82
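+ - Many classical time series models (for example ARMA and VAR) assume that the data is stationary; ARIMA reaches stationarity by differencing, and models such as Prophet or LSTMs do not strictly require it but can be easier to fit on a stabilized series. Non-stationary data can lead to: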
83
+ - Spurious regressions
84
+ - Poor model accuracy
85
+ - Invalid statistical inferences
86
+
87
+ StationarityToolkit helps you automate this critical preprocessing step with minimal manual intervention.
88
+
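For the variance step, the Box-Cox transform and its inverse fit in a few lines. The sketch below uses only the standard library and takes `lam` as given (the package estimates it with `scipy.stats.boxcox`). The `shift` argument mirrors the constant of 1 that the source adds before transforming, and the function names are illustrative rather than part of the package:

```python
import math


def boxcox_shifted(values, lam, shift=1.0):
    """Box-Cox transform of (value + shift); every shifted value must be positive."""
    out = []
    for v in values:
        x = v + shift
        if x <= 0:
            raise ValueError("Box-Cox needs strictly positive inputs after shifting")
        out.append(math.log(x) if lam == 0 else (x ** lam - 1.0) / lam)
    return out


def inv_boxcox_shifted(transformed, lam, shift=1.0):
    """Undo boxcox_shifted, including the lam == 0 (log) case."""
    out = []
    for y in transformed:
        x = math.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)
        out.append(x - shift)
    return out


series = [0.0, 3.0, 8.0, 15.0]
roundtrip = inv_boxcox_shifted(boxcox_shifted(series, lam=0.5), lam=0.5)
print(roundtrip)
```

Handling `lam == 0` separately matters because the power form divides by `lam`; in that limit the transform is the natural logarithm.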
@@ -0,0 +1,22 @@
1
+ [build-system]
2
+ requires = ["setuptools>=61.0", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "stationaritytoolkit"
7
+ version = "0.1.0"
8
+ authors = [
9
+ { name="Bhanu Suraj Malla", email="bsmalla@asu.edu" },
10
+ ]
11
+ description = "Test and Convert non-stationary time-series to stationary"
12
+ readme = "README.md"
13
+ requires-python = ">=3.7"
14
+ classifiers = [
15
+ "Programming Language :: Python :: 3",
16
+ "License :: Other/Proprietary License",
17
+ "Operating System :: OS Independent",
18
+ ]
19
+
20
+ [project.urls]
21
+ "Homepage" = "https://github.com/mbsuraj/stationarityToolkit"
22
+ "Bug Tracker" = "https://github.com/mbsuraj/stationarityToolkit/issues"
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
File without changes
@@ -0,0 +1,14 @@
1
+ LICENSE
2
+ README.md
3
+ pyproject.toml
4
+ src/__init__.py
5
+ src/stationarityToolkit.egg-info/PKG-INFO
6
+ src/stationarityToolkit.egg-info/SOURCES.txt
7
+ src/stationarityToolkit.egg-info/dependency_links.txt
8
+ src/stationarityToolkit.egg-info/top_level.txt
9
+ src/stationarity_toolkit/__init__.py
10
+ src/stationarity_toolkit/stationarity_toolkit.py
11
+ src/stationaritytoolkit.egg-info/PKG-INFO
12
+ src/stationaritytoolkit.egg-info/SOURCES.txt
13
+ src/stationaritytoolkit.egg-info/dependency_links.txt
14
+ src/stationaritytoolkit.egg-info/top_level.txt
@@ -0,0 +1,2 @@
1
+ __init__
2
+ stationarity_toolkit
@@ -0,0 +1,470 @@
1
+ import logging
2
+ import pickle
3
+ import numpy as np
4
+ import pandas as pd
5
+ from arch.unitroot import PhillipsPerron
6
+ from matplotlib import pyplot as plt
7
+ from statsmodels.tsa.stattools import kpss
8
+ from statsmodels.tsa.stattools import adfuller
9
+ from scipy.stats import boxcox
10
+
11
+
12
+ class StationarityToolkit:
13
+ def __init__(self, alpha, timeseries=None):
14
+ """
15
+ Constructor for the StationarityToolkit class.
16
+
17
+ Parameters:
18
+ alpha (float): The significance level for hypothesis testing. A value between 0 and 1.
19
+ timeseries (pandas.DataFrame): The input time series data in DataFrame format.
20
+ """
21
+ self._initiate_logger()
22
+ self.alpha = alpha
23
+ self.timeseries = timeseries
24
+ self._recurse_cnt = 0
25
+ self._differencing = None
26
+ self._trend_initial_value = None
27
+ self._seasonality_initial_value = None
28
+ self._index = self._get_index()
29
+ self._var_nonstationarity_removed = False
30
+ self.df = None
31
+
32
+ def _initiate_logger(self):
33
+ self.logger = logging.getLogger(__name__)
34
+ self.logger.setLevel(logging.INFO)
35
+
36
+ # Create a formatter with a custom format
37
+ formatter = logging.Formatter("[%(levelname)s] - %(message)s")
38
+
39
+ # Create a console handler and set the formatter
40
+ console_handler = logging.StreamHandler()
41
+ console_handler.setFormatter(formatter)
42
+
43
+ # Add the console handler to the logger
44
+ self.logger.addHandler(console_handler)
45
+
46
+ def _get_index(self):
47
+ if self.timeseries is not None:
48
+ return self.timeseries.index
49
+ else:
50
+ return None
51
+
52
+ def perform_pp_test(self, ts):
53
+ """
54
+ Perform the Phillips-Perron test on the given time series to test for variance stationarity.
55
+
56
+ Parameters:
57
+ ts (pandas.Series): The input time series data as a pandas Series.
58
+
59
+ Returns:
60
+ float: The p-value obtained from the Phillips-Perron test.
61
+ """
62
+ # Check if the logger has any handlers already
63
+ if self.logger.hasHandlers():
64
+ # If the logger already has handlers, remove them
65
+ self.logger.handlers.clear()
66
+ # Perform the Phillips-Perron test:
67
+ self.logger.info("PHILLIPS-PERRON TEST FOR VARIANCE STATIONARITY")
68
+ pp_test = PhillipsPerron(ts)
69
+ p_value = pp_test.pvalue
70
+ self.logger.debug(p_value)
71
+ if p_value <= self.alpha:
72
+ self.logger.info("The time series is likely variance stationary.")
73
+ else:
74
+ self.logger.info("The time series is likely not variance stationary.")
75
+ return p_value
76
+
77
+ def inv_boxcox(self, y_box, lambda_, index):
78
+ """
79
+ Apply the inverse Box-Cox transformation to the given Box-Cox transformed data.
80
+
81
+ Parameters:
82
+ y_box (numpy.ndarray): The Box-Cox transformed data.
83
+ lambda_ (float): The lambda value used in the Box-Cox transformation.
84
+ index (pandas.Index): The index of the original time series.
85
+
86
+ Returns:
87
+ pandas.DataFrame: The time series in its original scale.
88
+ """
89
+ # Check if the logger has any handlers already
90
+ if self.logger.hasHandlers():
91
+ # If the logger already has handlers, remove them
92
+ self.logger.handlers.clear()
93
+ # Convert to numpy array
94
+ y_box = np.array(y_box)
95
+ ts = pd.DataFrame(np.power((y_box * lambda_) + 1, 1 / lambda_) - 1, index=index)
96
+ return ts
97
+
98
+ def remove_var_nonstationarity(self, ts=None):
99
+ """
100
+ Attempt to remove variance non-stationarity from the given time series.
101
+
102
+ Parameters:
103
+ ts (pandas.DataFrame, optional): The input time series data in DataFrame format.
104
+ If None, the class's 'timeseries' attribute is used.
105
+
106
+ Returns:
107
+ pandas.DataFrame: DataFrame containing the original and transformed time series along with
108
+ the transformation name, parameters, and inverse transformation function.
109
+ """
110
+ if self.logger.hasHandlers():
111
+ self.logger.handlers.clear()
112
+
113
+ ts = ts if ts is not None else self.timeseries
114
+ self._index = self._get_index()
115
+
116
+ self.logger.info("Test Variance Stationarity: ")
117
+ if self.perform_pp_test(ts) > self.alpha:
118
+ transformations = {
119
+ "Original": (ts, None),
120
+ "Log-Transformed": (np.log(ts), np.exp),
121
+ "Square Root Transformed": (np.sqrt(ts), np.square),
122
+ }
123
+
124
+ # Box-Cox requires positive data, so we add a constant to the data to make it positive
125
+ constant = (
126
+ 1 # Choose a positive constant (can be adjusted based on the data)
127
+ )
128
+ bc_ts = ts.values.flatten() + constant
129
+ if np.all(bc_ts > 0):
130
+ boxcox_transformed_data, lam = boxcox(bc_ts)
131
+ transformations["Box-Cox Transformed"] = (
132
+ boxcox_transformed_data,
133
+ self.inv_boxcox,
134
+ )
135
+
136
+ # Test variance stationarity for each transformed series
137
+ best_transformation = None
138
+ best_p_value = np.inf
139
+
140
+ for name, (transformed_data, _) in transformations.items():
141
+ self.logger.info(f"\n\ntesting {name}")
142
+ p_value = self.perform_pp_test(transformed_data)
143
+ if p_value <= best_p_value:
144
+ best_p_value = p_value
145
+ best_transformation = (name, transformed_data)
146
+
147
+ # Extract the best transformation information
148
+ best_transformation_name, best_transformed_data = best_transformation
149
+ var_transformed = best_transformed_data if best_transformation_name == "Box-Cox Transformed" \
150
+ else best_transformed_data.to_numpy().flatten()
151
+
152
+ self.logger.info(f"\n\nBest Transformation: {best_transformation_name}")
153
+ self.logger.info(f"P-Value: {best_p_value}")
154
+
155
+ parameters = (
156
+ {}
157
+ if best_transformation_name != "Box-Cox Transformed"
158
+ else {"constant": constant, "lam": lam}
159
+ )
160
+
161
+ # Serialize the inverse function using pickle
162
+ inv_function_serialized = pickle.dumps(
163
+ transformations[best_transformation_name][1]
164
+ )
165
+
166
+ # Collect the original and best-transformed series, then plot them
167
+ self.df = pd.DataFrame(
168
+ {
169
+ "original": ts.to_numpy().flatten(),
170
+ "var_transformed": var_transformed,
171
+ "var_transformation_name": best_transformation_name,
172
+ "var_transformation_par": parameters,
173
+ "var_inverse_function": inv_function_serialized,
174
+ },
175
+ index=ts.index,
176
+ )
177
+ plt.figure(figsize=(10, 6))
178
+ plt.plot(self.df.index, self.df.original, label="Original Plot")
179
+ plt.plot(self.df.index, self.df.var_transformed,
180
+ label=f"{best_transformation_name} Variance Stationary Plot")
181
+ plt.title("Variance Stationarity Plot")
182
+ # self.df.plot(
183
+ # title="Variance Stationarity with Best Transformation", figsize=(10, 6)
184
+ # )
185
+ plt.legend()
186
+ plt.xlabel("Time")
187
+ plt.ylabel("Values")
188
+ plt.show()
189
+ else:
190
+ self.df = pd.DataFrame(
191
+ {
192
+ "original": ts.to_numpy().flatten(),
193
+ "var_transformed": ts.to_numpy().flatten(),
194
+ "var_transformation_name": None,
195
+ "var_transformation_par": None,
196
+ "var_inverse_function": None,
197
+ },
198
+ index=ts.index,
199
+ )
200
+ self._var_nonstationarity_removed = True
201
+ return self.df
202
+
203
+ def load_inverse_function(self, row):
204
+ """
205
+ Load the inverse transformation function from the serialized form in the DataFrame.
206
+
207
+ Parameters:
208
+ row (pandas.Series): A row from the DataFrame containing the serialized inverse function.
209
+
210
+ Returns:
211
+ Callable: The deserialized inverse transformation function.
212
+ """
213
+ inv_function_serialized = row["var_inverse_function"]
214
+ if pd.notna(inv_function_serialized):
215
+ return pickle.loads(inv_function_serialized)
216
+ return None
217
+
218
+ def adf_test(self, timeseries):
219
+ """
220
+ Perform the Augmented Dickey-Fuller (ADF) test on the given time series to test for trend stationarity.
221
+
222
+ Parameters:
223
+ timeseries (pandas.Series): The input time series data as a pandas Series.
224
+
225
+ Returns:
226
+ pandas.Series: Series containing the test results.
227
+ """
228
+ # Check if the logger has any handlers already
229
+ if self.logger.hasHandlers():
230
+ # If the logger already has handlers, remove them
231
+ self.logger.handlers.clear()
232
+ self.logger.info("DICKEY-FULLER TEST FOR TREND STATIONARITY")
233
+ dftest = adfuller(timeseries, autolag="AIC")
234
+ dfoutput = pd.Series(
235
+ dftest[0:4],
236
+ index=[
237
+ "Test Statistic",
238
+ "p-value",
239
+ "#Lags Used",
240
+ "Number of Observations Used",
241
+ ],
242
+ )
243
+ for key, value in dftest[4].items():
244
+ dfoutput["Critical Value (%s)" % key] = value
245
+
246
+ self.logger.debug(dfoutput)
247
+ if dfoutput["p-value"] <= self.alpha:
248
+ self.logger.info("Reject Null Hypothesis: Series is stationary")
249
+ else:
250
+ self.logger.info(
251
+ "Fail to reject Null Hypothesis: Series is non-stationary."
252
+ )
253
+ return dfoutput
254
+
255
+ def kpss_test(self, timeseries):
256
+ """
257
+ Perform the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test on the given time series to test for trend stationarity.
258
+
259
+ Parameters:
260
+ timeseries (pandas.Series): The input time series data as a pandas Series.
261
+
262
+ Returns:
263
+ pandas.Series: Series containing the test results.
264
+ """
265
+ # Check if the logger has any handlers already
266
+ if self.logger.hasHandlers():
267
+ # If the logger already has handlers, remove them
268
+ self.logger.handlers.clear()
269
+ self.logger.info("KPSS TEST FOR TREND STATIONARITY")
270
+ kpsstest = kpss(timeseries, regression="c")
271
+ kpss_output = pd.Series(
272
+ kpsstest[0:3], index=["Test Statistic", "p-value", "Lags Used"]
273
+ )
274
+ for key, value in kpsstest[3].items():
275
+ kpss_output["Critical Value (%s)" % key] = value
276
+ self.logger.debug(kpss_output)
277
+ if kpss_output["p-value"] <= self.alpha:
278
+ self.logger.info("Reject Null Hypothesis: Series is non-stationary")
279
+ else:
280
+ self.logger.info(
281
+ "Fail to reject Null Hypothesis: Series is trend-stationary."
282
+ )
283
+ return kpss_output
284
+
285
+ def remove_trend_nonstationarity(self, ts=None):
286
+ """
287
+ Attempt to remove trend-based non-stationarity from the given time series.
288
+
289
+ Parameters:
290
+ ts (pandas.DataFrame, optional): The input time series data in DataFrame format.
291
+ If None, the class's 'timeseries' attribute is used.
292
+
293
+ Returns:
294
+ pandas.DataFrame or None: The original and differenced series, the type of differencing applied,
295
+ and the stored initial values, or None if the series could not be made stationary.
296
+ """
297
+
298
+ def seasonal_inv(diff_ts, seasonal_initial_values):
299
+ val_len = len(diff_ts)
300
+ for i in range(val_len):
301
+ # print(i)
302
+ # print(diff_ts.iloc[i, 0])
303
+ if i < 52:
304
+ diff_ts.iloc[i] = seasonal_initial_values.iloc[i]
305
+ else:
306
+ # Calculate the sum of every 52nd value
307
+ sum_52nd = diff_ts.iloc[i - 52] + diff_ts.iloc[i]
308
+ diff_ts.iloc[i] = sum_52nd
309
+ return diff_ts
310
+
311
+ def season_trend_inv(diff_ts2, seasonal_initial_values, trend_initial_value):
312
+ inv_diff_ts2 = diff_ts2.cumsum() + trend_initial_value
313
+ inv_diff_ts2.iloc[52] = trend_initial_value
314
+ inv_inv_diff_ts2 = seasonal_inv(inv_diff_ts2, seasonal_initial_values)
315
+ return inv_inv_diff_ts2
316
+
317
+ if self._recurse_cnt == 0:
318
+ self.timeseries = ts = ts if ts is not None else self.timeseries  # keep `ts` usable below when the caller passes None
319
+ self._trend_initial_value = self.timeseries.iloc[0].copy()
320
+ self._index = self._get_index()
321
+
322
+ if self._recurse_cnt == 0:
323
+ # Check if the logger has any handlers already
324
+ if self.logger.hasHandlers():
325
+ # If the logger already has handlers, remove them
326
+ self.logger.handlers.clear()
327
+ self.logger.info("REMOVE TREND NON-STATIONARITY")
328
+ self.logger.info("-----------------------LOG-------------------------")
329
+ self.logger.info("INITIAL STATISTICAL TESTS")
330
+ # Perform ADF and KPSS tests
331
+ try:
332
+ dfoutput = self.adf_test(ts.dropna())
333
+ kpss_output = self.kpss_test(ts.dropna())
334
+ except ValueError as e:
335
+ self.logger.error(f"Error occurred during ADF or KPSS test: {e}")
336
+ return None
337
+ self._recurse_cnt += 1
338
+ self.logger.info("----------------------------------------------------")
339
+ self.logger.info(f"Recurse Count: {self._recurse_cnt}")
340
+ self.logger.info("----------------------------------------------------")
341
+
342
+ if (
343
+ (dfoutput["p-value"] >= self.alpha)
344
+ or (kpss_output["p-value"] <= self.alpha)
345
+ ) and (self._recurse_cnt == 1):
346
+ self.logger.info(
347
+ "At least one test indicates that the series is not stationary -> Removing trend"
348
+ )
349
+ self._differencing = "trend"
350
+ self._trend_initial_value = ts.iloc[0].copy()
351
+ ts_dif = ts - ts.shift(1)
352
+ if self.remove_trend_nonstationarity(ts_dif) is None:
353
+ self.logger.info("Trend Removal didn't work. Removing Seasonality")
354
+ self._differencing = "seasonality"
355
+ self._seasonality_initial_value = ts.iloc[0:52].copy()
356
+ ts_seasonal_diff = ts - ts.shift(52)
357
+ if self.remove_trend_nonstationarity(ts_seasonal_diff) is None:
358
+ self._differencing = "seasonal_trend"
359
+ if len(ts_seasonal_diff) > 52:
360
+ self._trend_initial_value = ts_seasonal_diff.iloc[52]
361
+ else:
362
+ self.logger.error("ts_seasonal_diff doesn't have enough elements.")
363
+ return None
364
+ self.logger.info(
365
+ "Seasonality Removal didn't work. Removing Trend on top of Seasonal Differencing"
366
+ )
367
+ ts_seasonal_trend_diff = ts_seasonal_diff - ts_seasonal_diff.shift(
368
+ 1
369
+ )
370
+ ts_seasonal_trend_diff.index = self._index
371
+ result = self.remove_trend_nonstationarity(ts_seasonal_trend_diff)
372
+ self._recurse_cnt = 0
373
+ return result
374
+ else:
375
+ result = self.remove_trend_nonstationarity(ts_seasonal_diff)
376
+ self._recurse_cnt = 0
377
+ else:
378
+ result = self.remove_trend_nonstationarity(ts_dif)
379
+ self._recurse_cnt = 0
380
+ return result
381
+ return result  # seasonal differencing alone succeeded; the original fell through here and returned None
382
+ elif (
383
+ (dfoutput["p-value"] >= self.alpha)
384
+ or (kpss_output["p-value"] <= self.alpha)
385
+ ) and (self._recurse_cnt == 2):
386
+ self._recurse_cnt = 1
387
+ return None
388
+ elif (dfoutput["p-value"] <= self.alpha) and (
389
+ kpss_output["p-value"] >= self.alpha
390
+ ):
391
+ self.logger.info(
392
+ f"**After {self._recurse_cnt} iterations - Both tests now conclude that the series is stationary**"
393
+ )
394
+ df = pd.DataFrame(
395
+ {
396
+ "original": self.timeseries.to_numpy().flatten(),
397
+ "trend_transformed": ts.to_numpy().flatten(),
398
+ "trend_transformation_name": self._differencing,
399
+ "trend_initial_value": self._trend_initial_value,
400
+ "seasonal_initial_values": [self._seasonality_initial_value.values] if self._seasonality_initial_value is not None else None,
401
+ "trend_inverse_function": None, # Placeholder for storing inverse function
402
+ },
403
+ index=ts.index,
404
+ )
405
+ if self.df is None:
406
+ self.df = df
407
+ elif (self.df is not None) and ("trend_transformed" not in self.df.columns):
408
+ df.drop(columns=["original"], inplace=True)
409
+ self.df = pd.merge(self.df, df, left_index=True, right_index=True)
410
+ else:
411
+ pass
412
+ # Generate and save the inverse function
413
+ inv_function_serialized = None
414
+ if self._differencing == "trend":
415
+ inv_function_serialized = (
416
+ lambda ts_diff: ts_diff.fillna(0).cumsum() + self._trend_initial_value
417
+ )
418
+ elif self._differencing == "seasonality":
419
+ inv_function_serialized = lambda ts_diff: seasonal_inv(
420
+ ts_diff, self._seasonality_initial_value
421
+ )
422
+ elif self._differencing == "seasonal_trend":
423
+ inv_function_serialized = lambda ts_diff: season_trend_inv(
424
+ ts_diff, self._seasonality_initial_value, self._trend_initial_value
425
+ )
426
+
427
+ self.df["trend_inverse_function"] = inv_function_serialized
428
+ return self.df
429
+
430
+ def _inverse_difference_fn(self, ts_diff, initial_value):
431
+ """
432
+ Inverse function for first-order (lag-1) differencing ts - ts.shift(1).
433
+
434
+ Parameters:
435
+ ts_diff (pandas.Series): Differenced time series to be inverted.
436
+ initial_value (float): The initial value of the original time series before differencing.
437
+
438
+ Returns:
439
+ pandas.Series: The inverted time series.
440
+ """
441
+ ts_inv = ts_diff.cumsum() + initial_value
442
+ return ts_inv
443
+
444
+ # # Serialize the inverse function using pickle
445
+ # inv_function_serialized = pickle.dumps(lambda ts_diff: inverse_seasonal_difference(ts_diff, initial_value_example))
446
+ def _plot_trend_stationary_series(self):
447
+ plt.figure(figsize=(10, 6))
448
+ plt.plot(self.df.index, self.df.original, label="Original")
449
+ plt.plot(self.df.index, self.df.trend_transformed, label=f"{self._differencing}-Transformed")
450
+ plt.title("Trend Stationarity")
451
+ plt.legend()
452
+ plt.xlabel("Time")
453
+ plt.ylabel("Values")
454
+ plt.show()
455
+
456
+ def remove_nonstationarity(self, ts):
457
+ """
458
+ Perform both variance and trend-based non-stationarity removal from the input time series.
459
+
460
+ Returns:
461
+ pandas.DataFrame: DataFrame containing the original time series, the best variance-transformed time series,
462
+ the differenced time series, the type of differencing applied, and the initial value of the time series.
463
+ """
464
+ self._recurse_cnt = 0
465
+ self.timeseries = ts if ts is not None else self.timeseries
466
+ self._index = self._get_index()
467
+ df = self.remove_var_nonstationarity(ts)
468
+ df2 = self.remove_trend_nonstationarity(df[["var_transformed"]])
469
+ self._plot_trend_stationary_series()
470
+ return df2
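The differencing steps in this file hard-code a lag of 52. A minimal sketch of seasonal differencing and its inverse with the lag as a parameter, written with plain lists so it runs without pandas. The names `seasonal_diff`, `seasonal_undiff`, and `period` are illustrative and not part of the package; a period of 1 gives ordinary first differencing:

```python
def seasonal_diff(values, period):
    """Return values[i] - values[i - period] for every i >= period."""
    return [values[i] - values[i - period] for i in range(period, len(values))]


def seasonal_undiff(diffs, initial_values):
    """Rebuild the original series from its seasonal differences.

    initial_values holds the first `period` observations of the original series.
    """
    period = len(initial_values)
    restored = list(initial_values)
    for d in diffs:
        # The value one full season back is always `period` positions from the end.
        restored.append(restored[-period] + d)
    return restored


weekly = [10, 12, 9, 11, 13, 10, 14, 15]
period = 4  # the package uses 52; 4 keeps the example short
diffs = seasonal_diff(weekly, period)
assert seasonal_undiff(diffs, weekly[:period]) == weekly
```

Keeping the first `period` observations is what makes the transform invertible, which is why the source stores `ts.iloc[0:52]` before seasonal differencing.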