direl-ts-tool-kit 0.4.8__tar.gz → 0.5.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (19)
  1. direl_ts_tool_kit-0.5.0/PKG-INFO +123 -0
  2. direl_ts_tool_kit-0.5.0/README.md +90 -0
  3. {direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/direl_ts_tool_kit/plot/plot_style.py +1 -0
  4. {direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/direl_ts_tool_kit/plot/plot_ts.py +109 -2
  5. direl_ts_tool_kit-0.5.0/direl_ts_tool_kit.egg-info/PKG-INFO +123 -0
  6. {direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/direl_ts_tool_kit.egg-info/requires.txt +2 -0
  7. {direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/setup.py +3 -1
  8. direl_ts_tool_kit-0.4.8/PKG-INFO +0 -487
  9. direl_ts_tool_kit-0.4.8/README.md +0 -456
  10. direl_ts_tool_kit-0.4.8/direl_ts_tool_kit.egg-info/PKG-INFO +0 -487
  11. {direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/LICENCE +0 -0
  12. {direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/direl_ts_tool_kit/__init__.py +0 -0
  13. {direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/direl_ts_tool_kit/plot/__init__.py +0 -0
  14. {direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/direl_ts_tool_kit/utilities/__init__.py +0 -0
  15. {direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/direl_ts_tool_kit/utilities/data_prep.py +0 -0
  16. {direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/direl_ts_tool_kit.egg-info/SOURCES.txt +0 -0
  17. {direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/direl_ts_tool_kit.egg-info/dependency_links.txt +0 -0
  18. {direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/direl_ts_tool_kit.egg-info/top_level.txt +0 -0
  19. {direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/setup.cfg +0 -0
direl_ts_tool_kit-0.5.0/PKG-INFO
@@ -0,0 +1,123 @@
+Metadata-Version: 2.4
+Name: direl-ts-tool-kit
+Version: 0.5.0
+Summary: A toolbox for time series analysis and visualization.
+Home-page: https://gitlab.com/direl/direl_tool_kit
+Author: Diego Restrepo-Leal
+Author-email: diegorestrepoleal@gmail.com
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Intended Audience :: Science/Research
+Classifier: Topic :: Scientific/Engineering :: Visualization
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENCE
+Requires-Dist: pandas>=1.0.0
+Requires-Dist: numpy>=1.18.0
+Requires-Dist: matplotlib>=3.0.0
+Requires-Dist: openpyxl
+Requires-Dist: seaborn
+Requires-Dist: scipy
+Dynamic: author
+Dynamic: author-email
+Dynamic: classifier
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: home-page
+Dynamic: license-file
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+
+# direl-ts-tool-kit
+> A Toolbox for Time Series Analysis and Visualization
+
+A lightweight Python library developed to streamline common tasks in time series processing, including data preparation,
+visualization with a consistent aesthetic style, and handling irregular indices.
+
+## Key features and functions
+
+The library provides the following key functionalities, primarily centered around data preparation and plotting.
+
+### Data preparation and index management
+#### parse_datetime_index
+`parse_datetime_index(df_raw, date_column="date", format=None)`
+
+Parses a specified column into datetime objects and sets it as the DataFrame index.
+
+This function prepares raw data for time series analysis by ensuring the
+DataFrame is indexed by the correct datetime type.
+
+#### generate_dates
+`generate_dates(df_ts, freq="MS")`
+
+Generates a continuous DatetimeIndex covering the time span of the input DataFrame.
+
+The function determines the start and end dates from the existing DataFrame index
+and creates a new, regular date sequence based on the specified frequency.
+
+#### reindex_and_aggregate
+`reindex_and_aggregate(df_ts, column_name, freq="MS")`
+
+Re-indexes a time series DataFrame to a regular frequency, aggregates values,
+and introduces NaN for missing time steps.
+
+This function first identifies the time range from the original (potentially irregular)
+index, aggregates data if necessary (e.g., if multiple entries exist per time step),
+and then merges the data onto a complete date range, effectively filling gaps
+with NaN values.
+
+#### remove_outliers_by_threshold
+`remove_outliers_by_threshold(df_ts, column_name, lower_bound, upper_bound)`
+
+Replaces values in a specified column with NaN if they fall outside a defined range (outlier removal).
+
+This function identifies data points that are either below the lower
+bound or above the upper bound and treats them as missing data.
+
+
+### Visualization and styling
+
+#### plot_time_series
+`plot_time_series(df_ts, variable, units="", color="BLUE_LINES", time_unit="Year", rot=90, auto_format_label=True)`
+
+Plots a time series with custom styling and dual-level grid visibility.
+
+This function automatically sets major and minor time-based locators
+on the x-axis based on the specified time unit, and formats the y-axis
+to use scientific notation.
+
+#### save_figure
+`save_figure(fig, file_name, variable_name="", path="./")`
+
+Saves a Matplotlib figure in three common high-quality formats (PNG, PDF, SVG).
+
+The function creates a consistent file name structure:
+{path}/{file_name}_{variable_name}.{extension}.
+
+#### heat_map
+`heat_map(X, y, colors="Blues")`
+
+Generates a correlation heatmap plot for a set of features and a target variable.
+
+This function concatenates the feature DataFrame (X) and the target Series (y)
+to compute and visualize the full pairwise correlation matrix using Seaborn.
+
+#### pair_plot
+`pair_plot(X, y)`
+
+Generates a cornered pair plot (scatterplot matrix) to visualize relationships
+between features and the target variable.
+
+The function combines the feature DataFrame (X) and the target Series (y)
+and uses seaborn.pairplot to create a matrix of scatter plots and histograms.
+It focuses on the lower triangular part (corner=True) and includes a
+regression line for trend visualization.
+
+
+# Examples
+- [Example 1](https://gitlab.com/direl/direl_tool_kit/-/blob/main/example/example_01.md?ref_type=heads)
+- [Example 2](https://gitlab.com/direl/direl_tool_kit/-/blob/main/example/example_02.md?ref_type=heads)
+
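The README describes `parse_datetime_index` and `generate_dates` only in prose, and `data_prep.py` is unchanged in this release, so its code does not appear in the diff. The sketch below shows one way the described behavior maps onto standard pandas calls. Both functions are hypothetical re-creations for illustration, not the package's implementation; the `sort_index()` step and the sample data are assumptions.

```python
import pandas as pd


def parse_datetime_index(df_raw, date_column="date", format=None):
    """Hypothetical sketch: parse `date_column` and use it as the index."""
    df = df_raw.copy()  # avoid mutating the caller's DataFrame
    df[date_column] = pd.to_datetime(df[date_column], format=format)
    # Sorting is an assumption; it keeps min()/max() and plotting predictable.
    return df.set_index(date_column).sort_index()


def generate_dates(df_ts, freq="MS"):
    """Hypothetical sketch: a regular DatetimeIndex spanning df_ts.index."""
    return pd.date_range(start=df_ts.index.min(), end=df_ts.index.max(), freq=freq)


raw = pd.DataFrame(
    {"date": ["2020-01-01", "2020-03-01", "2020-04-01"], "value": [1.0, 3.0, 4.0]}
)
df_ts = parse_datetime_index(raw, date_column="date", format="%Y-%m-%d")
dates = generate_dates(df_ts, freq="MS")  # "MS" = month start
print(list(dates.strftime("%Y-%m")))  # ['2020-01', '2020-02', '2020-03', '2020-04']
```

The February timestamp that is missing from `raw` shows up in `dates`, which is the gap that a later re-indexing step would fill with NaN.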
direl_ts_tool_kit-0.5.0/README.md
@@ -0,0 +1,90 @@
+# direl-ts-tool-kit
+> A Toolbox for Time Series Analysis and Visualization
+
+A lightweight Python library developed to streamline common tasks in time series processing, including data preparation,
+visualization with a consistent aesthetic style, and handling irregular indices.
+
+## Key features and functions
+
+The library provides the following key functionalities, primarily centered around data preparation and plotting.
+
+### Data preparation and index management
+#### parse_datetime_index
+`parse_datetime_index(df_raw, date_column="date", format=None)`
+
+Parses a specified column into datetime objects and sets it as the DataFrame index.
+
+This function prepares raw data for time series analysis by ensuring the
+DataFrame is indexed by the correct datetime type.
+
+#### generate_dates
+`generate_dates(df_ts, freq="MS")`
+
+Generates a continuous DatetimeIndex covering the time span of the input DataFrame.
+
+The function determines the start and end dates from the existing DataFrame index
+and creates a new, regular date sequence based on the specified frequency.
+
+#### reindex_and_aggregate
+`reindex_and_aggregate(df_ts, column_name, freq="MS")`
+
+Re-indexes a time series DataFrame to a regular frequency, aggregates values,
+and introduces NaN for missing time steps.
+
+This function first identifies the time range from the original (potentially irregular)
+index, aggregates data if necessary (e.g., if multiple entries exist per time step),
+and then merges the data onto a complete date range, effectively filling gaps
+with NaN values.
+
+#### remove_outliers_by_threshold
+`remove_outliers_by_threshold(df_ts, column_name, lower_bound, upper_bound)`
+
+Replaces values in a specified column with NaN if they fall outside a defined range (outlier removal).
+
+This function identifies data points that are either below the lower
+bound or above the upper bound and treats them as missing data.
+
+
+### Visualization and styling
+
+#### plot_time_series
+`plot_time_series(df_ts, variable, units="", color="BLUE_LINES", time_unit="Year", rot=90, auto_format_label=True)`
+
+Plots a time series with custom styling and dual-level grid visibility.
+
+This function automatically sets major and minor time-based locators
+on the x-axis based on the specified time unit, and formats the y-axis
+to use scientific notation.
+
+#### save_figure
+`save_figure(fig, file_name, variable_name="", path="./")`
+
+Saves a Matplotlib figure in three common high-quality formats (PNG, PDF, SVG).
+
+The function creates a consistent file name structure:
+{path}/{file_name}_{variable_name}.{extension}.
+
+#### heat_map
+`heat_map(X, y, colors="Blues")`
+
+Generates a correlation heatmap plot for a set of features and a target variable.
+
+This function concatenates the feature DataFrame (X) and the target Series (y)
+to compute and visualize the full pairwise correlation matrix using Seaborn.
+
+#### pair_plot
+`pair_plot(X, y)`
+
+Generates a cornered pair plot (scatterplot matrix) to visualize relationships
+between features and the target variable.
+
+The function combines the feature DataFrame (X) and the target Series (y)
+and uses seaborn.pairplot to create a matrix of scatter plots and histograms.
+It focuses on the lower triangular part (corner=True) and includes a
+regression line for trend visualization.
+
+
+# Examples
+- [Example 1](https://gitlab.com/direl/direl_tool_kit/-/blob/main/example/example_01.md?ref_type=heads)
+- [Example 2](https://gitlab.com/direl/direl_tool_kit/-/blob/main/example/example_02.md?ref_type=heads)
+
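The README's `reindex_and_aggregate` and `remove_outliers_by_threshold` are also described only in prose. The sketch below is a minimal, hypothetical re-creation. It swaps in pandas `resample(freq).mean()`, which buckets timestamps into regular periods, aggregates duplicates and leaves empty periods as NaN in one step. The README instead describes a merge onto a complete date range, so the package's result may differ in details. In particular, using the mean as the aggregation rule is an assumption.

```python
import numpy as np
import pandas as pd


def reindex_and_aggregate(df_ts, column_name, freq="MS"):
    """Hypothetical sketch: one row per period, mean of duplicates, NaN for gaps."""
    # resample() builds the full regular range; .mean() collapses multiple
    # entries per period (assumed rule) and empty periods come out as NaN.
    return df_ts[[column_name]].resample(freq).mean()


def remove_outliers_by_threshold(df_ts, column_name, lower_bound, upper_bound):
    """Hypothetical sketch: values strictly outside [lower_bound, upper_bound] -> NaN."""
    df = df_ts.copy()  # leave the input untouched
    outside = (df[column_name] < lower_bound) | (df[column_name] > upper_bound)
    df.loc[outside, column_name] = np.nan
    return df


idx = pd.to_datetime(["2020-01-05", "2020-01-20", "2020-03-10"])
df_ts = pd.DataFrame({"value": [1.0, 3.0, 5.0]}, index=idx)

monthly = reindex_and_aggregate(df_ts, "value", freq="MS")
print(monthly["value"].tolist())  # [2.0, nan, 5.0]

clean = remove_outliers_by_threshold(monthly, "value", lower_bound=0.0, upper_bound=4.0)
print(clean["value"].tolist())  # [2.0, nan, nan]
```

The two January readings collapse to their mean, February becomes an explicit NaN, and the March value of 5.0 is masked because it exceeds the upper bound. Values equal to a bound are kept, matching the README's "below the lower bound or above the upper bound" wording.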
{direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/direl_ts_tool_kit/plot/plot_style.py
@@ -1,3 +1,4 @@
+import seaborn as sns
 import matplotlib.pyplot as plt
 import matplotlib.dates as mdates
 
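`plot_style.py` imports `matplotlib.dates` as `mdates`, and `plot_time_series` in `plot_ts.py` uses it to choose x-axis locators from `time_unit`. The `Day` branch pairs `DayLocator` with `HourLocator`, and the `Hour` branch pairs `HourLocator` with `MinuteLocator`. The table-driven helper below is a hypothetical restructuring of those two branches only. The other units (such as the default `"Year"`) are not visible in the diff, so they are left out rather than guessed.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# (major, minor) locator classes per time_unit; only the branches shown in
# plot_ts.py are reproduced here.
LOCATORS = {
    "Day": (mdates.DayLocator, mdates.HourLocator),
    "Hour": (mdates.HourLocator, mdates.MinuteLocator),
}


def set_time_locators(ax, time_unit):
    """Hypothetical helper: apply the major/minor locators for `time_unit`."""
    major_cls, minor_cls = LOCATORS[time_unit]
    ax.xaxis.set_major_locator(major_cls())
    ax.xaxis.set_minor_locator(minor_cls())


fig, ax = plt.subplots()
set_time_locators(ax, "Day")
print(type(ax.xaxis.get_major_locator()).__name__)  # DayLocator
print(type(ax.xaxis.get_minor_locator()).__name__)  # HourLocator
plt.close(fig)
```

With a dictionary, an unsupported `time_unit` raises `KeyError` instead of silently leaving Matplotlib's default locators in place. Whether that suits the package depends on how its other branches behave.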
{direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/direl_ts_tool_kit/plot/plot_ts.py
@@ -1,8 +1,16 @@
+import pandas as pd
 from .plot_style import *
+from scipy.stats import pearsonr
 
 
 def plot_time_series(
-    df_ts, variable, units="", color="BLUE_LINES", time_unit="Year", rot=90, auto_format_label=True
+    df_ts,
+    variable,
+    units="",
+    color="BLUE_LINES",
+    time_unit="Year",
+    rot=90,
+    auto_format_label=True,
 ):
     """
     Plots a time series with custom styling and dual-level grid visibility.
@@ -100,7 +108,7 @@ def plot_time_series(
     if time_unit == "Day":
         ax.xaxis.set_major_locator(mdates.DayLocator())
         ax.xaxis.set_minor_locator(mdates.HourLocator())
-
+
     if time_unit == "Hour":
         ax.xaxis.set_major_locator(mdates.HourLocator())
         ax.xaxis.set_minor_locator(mdates.MinuteLocator())
@@ -151,3 +159,102 @@ def save_figure(
     fig.savefig(f"{base_name}.png")
     fig.savefig(f"{base_name}.pdf")
     fig.savefig(f"{base_name}.svg")
+
+
+def heat_map(X, y, colors="Blues"):
+    """
+    Generates a correlation heatmap plot for a set of features and a target variable.
+
+    This function concatenates the feature DataFrame (X) and the target Series (y)
+    to compute and visualize the full pairwise correlation matrix using Seaborn.
+
+    Parameters
+    ----------
+    X : pd.DataFrame
+        The DataFrame containing the feature variables.
+    y : pd.Series or pd.DataFrame
+        The target variable (must be concatenable with X).
+    colors : str or matplotlib.colors.Colormap, optional
+        The colormap to use for the heatmap, passed to the 'cmap' argument
+        in seaborn.heatmap. Defaults to "Blues".
+
+        Note: For standard correlation matrices (which include negative values),
+        a diverging colormap (e.g., "coolwarm", "vlag") is usually recommended.
+
+    Returns
+    -------
+    matplotlib.figure.Figure
+        The generated Matplotlib figure object containing the heatmap.
+
+    Notes
+    -----
+    The heatmap displays the Pearson correlation coefficient rounded to two
+    decimal places and includes annotations for improved readability.
+    """
+    fig, ax = plt.subplots()
+    Z = pd.concat([X, y], axis=1)
+
+    ax = sns.heatmap(
+        Z.corr(),
+        cmap=colors,
+        annot=True,
+        linewidths=0.5,
+        fmt=".2f",
+        annot_kws={"size": 10},
+    )
+
+    return fig
+
+
+def corrfunc(x, y, ax=None, **kws):
+    """Plot the correlation coefficient in the top left hand corner of a plot."""
+    r, _ = pearsonr(x, y)
+    ax = ax or plt.gca()
+    ax.annotate(f"R = {r:.2f}", xy=(0.1, 0.9), fontsize=25, xycoords=ax.transAxes)
+
+
+def pair_plot(X, y):
+    """
+    Generates a cornered pair plot (scatterplot matrix) to visualize relationships
+    between features and the target variable.
+
+    The function combines the feature DataFrame (X) and the target Series (y)
+    and uses seaborn.pairplot to create a matrix of scatter plots and histograms.
+    It focuses on the lower triangular part (corner=True) and includes a
+    regression line for trend visualization.
+
+    Parameters
+    ----------
+    X : pd.DataFrame
+        The DataFrame containing the feature variables.
+    y : pd.Series or pd.DataFrame
+        The target variable (must be concatenable with X).
+
+    Returns
+    -------
+    matplotlib.figure.Figure
+        The generated Matplotlib Figure object containing the cornered pair plot.
+
+    Notes
+    -----
+    1. **Dependency:** This function requires a previously defined custom function
+       `corrfunc` to be available in the local namespace, as it is used via
+       `svm.map_lower()`. This custom function is typically used to display
+       correlation coefficients (e.g., Pearson's r) in the lower panel.
+    2. **Aesthetics:** Uses a regression line (`kind="reg"`) with custom color
+       (RED_LINES) to highlight linear relationships.
+    3. **Output:** The returned Figure object can be manipulated further
+       or saved using methods like `fig.savefig()`.
+    """
+    Z = pd.concat([X, y], axis=1)
+    svm = sns.pairplot(
+        Z,
+        corner=True,
+        kind="reg",
+        plot_kws={"line_kws": {"color": paper_colors["RED_LINES"]}},
+    )
+    svm.map_lower(corrfunc)
+
+    fig = svm.fig
+
+    return fig
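Both `heat_map` and `pair_plot` start from `Z = pd.concat([X, y], axis=1)`. The pandas-only sketch below, with made-up illustrative data, shows the matrix that `heat_map` hands to `sns.heatmap`. Two details matter. First, `pd.concat(..., axis=1)` aligns rows on the index, so `X` and `y` need matching indices. Second, an unnamed `y` becomes a column labelled `0`. Naming the target before calling either function is a suggestion here, not something the package does for you.

```python
import pandas as pd

# Illustrative data: "a" rises with the target, "b" falls with it.
X = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [4.0, 3.0, 2.0, 1.0]})
y = pd.Series([2.0, 4.0, 6.0, 8.0])  # no name set

Z_unnamed = pd.concat([X, y], axis=1)
print(list(Z_unnamed.columns))  # ['a', 'b', 0]

# Same concatenation as heat_map, with a readable label for the target.
Z = pd.concat([X, y.rename("target")], axis=1)
corr = Z.corr().round(2)  # DataFrame.corr() defaults to Pearson
print(corr.loc["a", "target"], corr.loc["b", "target"])  # 1.0 -1.0
```

Because `heat_map` passes no `vmin`, `vmax` or `center` to `sns.heatmap`, the colour scale follows the data range. A diverging map such as `colors="coolwarm"`, which the docstring already recommends, therefore is not guaranteed to be centred on zero.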
direl_ts_tool_kit-0.5.0/direl_ts_tool_kit.egg-info/PKG-INFO
@@ -0,0 +1,123 @@
+Metadata-Version: 2.4
+Name: direl-ts-tool-kit
+Version: 0.5.0
+Summary: A toolbox for time series analysis and visualization.
+Home-page: https://gitlab.com/direl/direl_tool_kit
+Author: Diego Restrepo-Leal
+Author-email: diegorestrepoleal@gmail.com
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Intended Audience :: Science/Research
+Classifier: Topic :: Scientific/Engineering :: Visualization
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENCE
+Requires-Dist: pandas>=1.0.0
+Requires-Dist: numpy>=1.18.0
+Requires-Dist: matplotlib>=3.0.0
+Requires-Dist: openpyxl
+Requires-Dist: seaborn
+Requires-Dist: scipy
+Dynamic: author
+Dynamic: author-email
+Dynamic: classifier
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: home-page
+Dynamic: license-file
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+
+# direl-ts-tool-kit
+> A Toolbox for Time Series Analysis and Visualization
+
+A lightweight Python library developed to streamline common tasks in time series processing, including data preparation,
+visualization with a consistent aesthetic style, and handling irregular indices.
+
+## Key features and functions
+
+The library provides the following key functionalities, primarily centered around data preparation and plotting.
+
+### Data preparation and index management
+#### parse_datetime_index
+`parse_datetime_index(df_raw, date_column="date", format=None)`
+
+Parses a specified column into datetime objects and sets it as the DataFrame index.
+
+This function prepares raw data for time series analysis by ensuring the
+DataFrame is indexed by the correct datetime type.
+
+#### generate_dates
+`generate_dates(df_ts, freq="MS")`
+
+Generates a continuous DatetimeIndex covering the time span of the input DataFrame.
+
+The function determines the start and end dates from the existing DataFrame index
+and creates a new, regular date sequence based on the specified frequency.
+
+#### reindex_and_aggregate
+`reindex_and_aggregate(df_ts, column_name, freq="MS")`
+
+Re-indexes a time series DataFrame to a regular frequency, aggregates values,
+and introduces NaN for missing time steps.
+
+This function first identifies the time range from the original (potentially irregular)
+index, aggregates data if necessary (e.g., if multiple entries exist per time step),
+and then merges the data onto a complete date range, effectively filling gaps
+with NaN values.
+
+#### remove_outliers_by_threshold
+`remove_outliers_by_threshold(df_ts, column_name, lower_bound, upper_bound)`
+
+Replaces values in a specified column with NaN if they fall outside a defined range (outlier removal).
+
+This function identifies data points that are either below the lower
+bound or above the upper bound and treats them as missing data.
+
+
+### Visualization and styling
+
+#### plot_time_series
+`plot_time_series(df_ts, variable, units="", color="BLUE_LINES", time_unit="Year", rot=90, auto_format_label=True)`
+
+Plots a time series with custom styling and dual-level grid visibility.
+
+This function automatically sets major and minor time-based locators
+on the x-axis based on the specified time unit, and formats the y-axis
+to use scientific notation.
+
+#### save_figure
+`save_figure(fig, file_name, variable_name="", path="./")`
+
+Saves a Matplotlib figure in three common high-quality formats (PNG, PDF, SVG).
+
+The function creates a consistent file name structure:
+{path}/{file_name}_{variable_name}.{extension}.
+
+#### heat_map
+`heat_map(X, y, colors="Blues")`
+
+Generates a correlation heatmap plot for a set of features and a target variable.
+
+This function concatenates the feature DataFrame (X) and the target Series (y)
+to compute and visualize the full pairwise correlation matrix using Seaborn.
+
+#### pair_plot
+`pair_plot(X, y)`
+
+Generates a cornered pair plot (scatterplot matrix) to visualize relationships
+between features and the target variable.
+
+The function combines the feature DataFrame (X) and the target Series (y)
+and uses seaborn.pairplot to create a matrix of scatter plots and histograms.
+It focuses on the lower triangular part (corner=True) and includes a
+regression line for trend visualization.
+
+
+# Examples
+- [Example 1](https://gitlab.com/direl/direl_tool_kit/-/blob/main/example/example_01.md?ref_type=heads)
+- [Example 2](https://gitlab.com/direl/direl_tool_kit/-/blob/main/example/example_02.md?ref_type=heads)
+
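The README documents the file naming used by `save_figure` as `{path}/{file_name}_{variable_name}.{extension}`, and the `plot_ts.py` hunk shows it writing `.png`, `.pdf` and `.svg` from a `base_name`. The sketch below re-creates that scheme under assumptions. `save_figure_sketch` is a hypothetical name, and building `base_name` with `os.path.join` is a guess, because the line that builds it is not in the diff. The sketch writes into a temporary directory so it leaves no files behind.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # headless backend; PDF and SVG writers ship with Matplotlib
import matplotlib.pyplot as plt


def save_figure_sketch(fig, file_name, variable_name="", path="./"):
    """Hypothetical re-creation of {path}/{file_name}_{variable_name}.{extension}."""
    base_name = os.path.join(path, f"{file_name}_{variable_name}")
    saved = []
    for ext in ("png", "pdf", "svg"):  # same three formats as save_figure
        target = f"{base_name}.{ext}"
        fig.savefig(target)
        saved.append(target)
    return saved


fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
with tempfile.TemporaryDirectory() as tmp:
    paths = save_figure_sketch(fig, "series", variable_name="temperature", path=tmp)
    names = [os.path.basename(p) for p in paths]
    all_written = all(os.path.exists(p) for p in paths)
plt.close(fig)
print(names)  # ['series_temperature.png', 'series_temperature.pdf', 'series_temperature.svg']
```

In this sketch, leaving `variable_name` empty produces names like `series_.png`. Whether the package strips that trailing underscore is not visible in the diff.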
{direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/direl_ts_tool_kit.egg-info/requires.txt
@@ -2,3 +2,5 @@ pandas>=1.0.0
 numpy>=1.18.0
 matplotlib>=3.0.0
 openpyxl
+seaborn
+scipy
{direl_ts_tool_kit-0.4.8 → direl_ts_tool_kit-0.5.0}/setup.py
@@ -2,7 +2,7 @@ from setuptools import setup, find_packages
 
 setup(
     name="direl-ts-tool-kit",
-    version="0.4.8",
+    version="0.5.0",
     description="A toolbox for time series analysis and visualization.",
     long_description=open("README.md", encoding="utf-8").read(),
     long_description_content_type="text/markdown",
@@ -15,6 +15,8 @@ setup(
         "numpy>=1.18.0",
         "matplotlib>=3.0.0",
         "openpyxl",
+        "seaborn",
+        "scipy"
     ],
     classifiers=[
         "Programming Language :: Python :: 3",