dragon-ml-toolbox 1.2.0__py3-none-any.whl → 1.3.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

This version of dragon-ml-toolbox might be problematic.

@@ -0,0 +1,88 @@
+ Metadata-Version: 2.4
+ Name: dragon-ml-toolbox
+ Version: 1.3.0
+ Summary: A collection of tools for data science and machine learning projects
+ Author-email: Karl Loza <luigiloza@gmail.com>
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/DrAg0n-BoRn/ML_tools
+ Project-URL: Changelog, https://github.com/DrAg0n-BoRn/ML_tools/blob/master/CHANGELOG.md
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ License-File: LICENSE-THIRD-PARTY.md
+ Requires-Dist: numpy<2.0
+ Requires-Dist: scikit-learn
+ Requires-Dist: openpyxl
+ Requires-Dist: miceforest
+ Requires-Dist: matplotlib
+ Requires-Dist: seaborn
+ Requires-Dist: pandas
+ Requires-Dist: polars
+ Requires-Dist: imblearn
+ Requires-Dist: statsmodels
+ Requires-Dist: ipython
+ Requires-Dist: joblib
+ Requires-Dist: xgboost
+ Requires-Dist: lightgbm
+ Requires-Dist: shap
+ Provides-Extra: pytorch
+ Requires-Dist: torch; extra == "pytorch"
+ Requires-Dist: Pillow; extra == "pytorch"
+ Requires-Dist: torchvision; extra == "pytorch"
+ Dynamic: license-file
+
+ # dragon-ml-tools
+
+ A collection of Python utilities for data science and machine learning, structured as a modular package for easy reuse and installation.
+
+ ## Features
+
+ - Modular scripts for data exploration, logging, machine learning, and more.
+ - Designed for seamless integration as a Git submodule or installable Python package.
+
+ ## Installation
+
+ **Python 3.9+ recommended.**
+
+ ### Via PyPI
+
+ Install the latest stable release from PyPI:
+
+ ```bash
+ pip install dragon-ml-tools
+ ```
+
+ ### Via conda-forge
+
+ Install from the conda-forge channel:
+
+ ```bash
+ conda install -c conda-forge dragon-ml-toolbox
+ ```
+
+ #### Optional dependencies
+
+ ```bash
+ pip install dragon-ml-tools[pytorch]
+ ```
+
+ ### Via GitHub (Editable)
+
+ Clone the repository and install in editable mode with optional dependencies:
+
+ ```bash
+ git clone https://github.com/DrAg0n-BoRn/ML_tools.git
+ cd ML_tools
+ pip install -e '.[pytorch]'
+ ```
+
+ ## Usage
+
+ After installation, import modules like this:
+
+ ```python
+ from ml_tools.utilities import sanitize_filename
+ from ml_tools.logger import custom_logger
+ ```
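The 1.3.0 METADATA above moves almost every dependency out of per-module extras into unconditional `Requires-Dist` entries and keeps a single `pytorch` extra for `torch`, `Pillow`, and `torchvision`. Core metadata uses an email-header format, so a minimal sketch can split an excerpt of those headers into core and extra-gated requirements with the standard library's `email.parser`. The `split_requirements` helper is hypothetical, and its marker check is a plain string match, not a full PEP 508 parser.

```python
from email.parser import HeaderParser

# Excerpt of the 1.3.0 METADATA headers shown in the diff above.
METADATA_EXCERPT = """\
Metadata-Version: 2.4
Name: dragon-ml-toolbox
Version: 1.3.0
Requires-Dist: numpy<2.0
Requires-Dist: scikit-learn
Requires-Dist: pandas
Provides-Extra: pytorch
Requires-Dist: torch; extra == "pytorch"
Requires-Dist: Pillow; extra == "pytorch"
Requires-Dist: torchvision; extra == "pytorch"
"""


def split_requirements(metadata_text: str) -> tuple[list[str], dict[str, list[str]]]:
    """Hypothetical helper: separate unconditional requirements from extra-gated ones.

    Only markers of the exact form `extra == "name"` are recognized; a real tool
    should use a PEP 508 parser instead of this string match.
    """
    msg = HeaderParser().parsestr(metadata_text)
    core: list[str] = []
    # Start with every declared extra so extras without requirements still appear.
    extras: dict[str, list[str]] = {name: [] for name in msg.get_all("Provides-Extra", [])}
    for req in msg.get_all("Requires-Dist", []):
        spec, _, marker = req.partition(";")
        marker = marker.strip()
        if marker.startswith("extra =="):
            # Take the right-hand side of the marker and drop its quotes.
            name = marker.split("==", 1)[1].strip().strip("\"'")
            extras.setdefault(name, []).append(spec.strip())
        else:
            core.append(spec.strip())
    return core, extras


core, extras = split_requirements(METADATA_EXCERPT)
print(core)
print(extras)
```

Run against the full header block, the same split would put the 15 unconditional packages in `core` and leave only the three PyTorch-related packages behind the `pytorch` extra.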
@@ -1,9 +1,10 @@
- dragon_ml_toolbox-1.2.0.dist-info/licenses/LICENSE,sha256=2uUFNy7D0TLgHim1K5s3DIJ4q_KvxEXVilnU20cWliY,1066
+ dragon_ml_toolbox-1.3.0.dist-info/licenses/LICENSE,sha256=2uUFNy7D0TLgHim1K5s3DIJ4q_KvxEXVilnU20cWliY,1066
+ dragon_ml_toolbox-1.3.0.dist-info/licenses/LICENSE-THIRD-PARTY.md,sha256=e1Hg5ZtaBpDV7ZvxhLe1ac28l7nMjvi1MSE5YvB1s-o,1472
  ml_tools/MICE_imputation.py,sha256=Xvupj6w4NJ7d8gcJbpp1y3LVVnWEfvx-It7oEksuT5I,7349
  ml_tools/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
- ml_tools/data_exploration.py,sha256=AMQ5XLrRhV6dLhptjl2Jppgk9JJ06ZjXEuvqkjC3gt0,26998
+ ml_tools/data_exploration.py,sha256=laTNbN5_xlhqWiKfF-cJ9yMZ8zAM2a-AryqgiIQBBLg,26649
  ml_tools/datasetmaster.py,sha256=VUneKshnmjOGbtqVVGTFcIMRKF3s6ZDYrosIYKDjD80,28956
- ml_tools/ensemble_learning.py,sha256=uA7A94CLv8o2l125oTEi0cjHusZkB-7Mnrtn7SGTfjs,29714
+ ml_tools/ensemble_learning.py,sha256=5UmlXI3Orm5zL0P07Ub_Y0gwjruH-REHY-cFWQpJWb0,29085
  ml_tools/handle_excel.py,sha256=IR0VQc3hYdmjwC31E5YxDnRcWig4jSIx7Y_7to-KZz4,11969
  ml_tools/logger.py,sha256=XwSpCUzw2Le24fJHyljBxNLgw63SwjZ0pMjTJqf0ylI,4622
  ml_tools/particle_swarm_optimization.py,sha256=jpkje4OETC9fyISxxUTx4XGrImSU6gDEcwz46ZDs2bQ,19250
@@ -11,7 +12,7 @@ ml_tools/pytorch_models.py,sha256=Oykw02sOZLCjvSadQd64UGesBN7kq0x1EGXHusvYiQI,99
  ml_tools/trainer.py,sha256=Zd7AaHeoNd8dEas2JChWoHaCUpWUVRDUMybuHaKJ0XY,16740
  ml_tools/utilities.py,sha256=mG_--EFplfI9H7OhrWI8VkdNJtTbs4Wbz32xvcFWps8,5518
  ml_tools/vision_helpers.py,sha256=lBAW6dzAK-HOswAt1fU_tfP9hkNLY5D8c_I_7hhEXno,7528
- dragon_ml_toolbox-1.2.0.dist-info/METADATA,sha256=LmlbpETQETUcZuGatEtnP6JttrkN7kVObxjzvl5INfk,5128
- dragon_ml_toolbox-1.2.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
- dragon_ml_toolbox-1.2.0.dist-info/top_level.txt,sha256=wm-oxax3ciyez6VoO4zsFd-gSok2VipYXnbg3TH9PtU,9
- dragon_ml_toolbox-1.2.0.dist-info/RECORD,,
+ dragon_ml_toolbox-1.3.0.dist-info/METADATA,sha256=NAQJFp18GJcSVs4W1zMjzLlxuS79Xs7gZgEru-n-2o8,2169
+ dragon_ml_toolbox-1.3.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ dragon_ml_toolbox-1.3.0.dist-info/top_level.txt,sha256=wm-oxax3ciyez6VoO4zsFd-gSok2VipYXnbg3TH9PtU,9
+ dragon_ml_toolbox-1.3.0.dist-info/RECORD,,
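Each RECORD line above has the form `path,sha256=<digest>,<size in bytes>`. Following the wheel format's RECORD convention, the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed. A minimal sketch of building one entry with the standard library follows; `record_entry` is a hypothetical name. The empty `ml_tools/__init__.py` is a convenient check, since its listed digest is the SHA-256 of zero bytes.

```python
import base64
import hashlib


def record_entry(path: str, content: bytes) -> str:
    """Hypothetical helper: build a wheel RECORD line for a file's bytes."""
    digest = hashlib.sha256(content).digest()
    # URL-safe alphabet ('-' and '_' instead of '+' and '/'), '=' padding stripped.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(content)}"


# The empty __init__.py listed in the RECORD diff above.
print(record_entry("ml_tools/__init__.py", b""))
```

Because the digest depends only on file contents, identical entries across the two versions, as for `licenses/LICENSE`, `WHEEL`, and `top_level.txt`, mean those files are byte-identical between 1.2.0 and 1.3.0.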
@@ -0,0 +1,23 @@
+ # Third-Party Licenses
+
+ This project depends on the following third-party packages. Each is governed by its own license, linked below.
+
+ - [pandas](https://github.com/pandas-dev/pandas/blob/main/LICENSE)
+ - [numpy](https://github.com/numpy/numpy/blob/main/LICENSE.txt)
+ - [matplotlib](https://github.com/matplotlib/matplotlib/blob/main/LICENSE/LICENSE)
+ - [seaborn](https://github.com/mwaskom/seaborn/blob/main/LICENSE)
+ - [statsmodels](https://github.com/statsmodels/statsmodels/blob/main/LICENSE.txt)
+ - [ipython](https://github.com/ipython/ipython/blob/main/COPYING.rst)
+ - [torch](https://github.com/pytorch/pytorch/blob/main/LICENSE)
+ - [scikit-learn](https://github.com/scikit-learn/scikit-learn/blob/main/COPYING)
+ - [imblearn](https://github.com/scikit-learn-contrib/imbalanced-learn/blob/main/LICENSE)
+ - [Pillow](https://github.com/python-pillow/Pillow/blob/main/LICENSE)
+ - [joblib](https://github.com/joblib/joblib/blob/main/LICENSE.txt)
+ - [xgboost](https://github.com/dmlc/xgboost/blob/main/LICENSE)
+ - [lightgbm](https://github.com/microsoft/LightGBM/blob/master/LICENSE)
+ - [shap](https://github.com/shap/shap/blob/master/LICENSE)
+ - [openpyxl](https://github.com/chronossc/openpyxl/blob/main/LICENSE)
+ - [miceforest](https://github.com/AnotherSamWilson/miceforest/blob/main/LICENSE)
+ - [polars](https://github.com/pola-rs/polars/blob/main/LICENSE)
+ - [torchvision](https://github.com/pytorch/vision/blob/main/LICENSE)
+ - [pyswarm](https://pythonhosted.org/pyswarm/#license)
@@ -15,8 +15,7 @@ from ml_tools.utilities import sanitize_filename
 
 
  # Keep track of all available functions, show using `info()`
- __all__ = ["load_dataframe",
- "summarize_dataframe",
+ __all__ = ["summarize_dataframe",
  "drop_rows_with_missing_data",
  "split_features_targets",
  "show_null_columns",
@@ -33,21 +32,6 @@ __all__ = ["load_dataframe",
  "drop_vif_based"]
 
 
- def load_dataframe(df_path: str) -> pd.DataFrame:
- """
- Loads a DataFrame from a CSV file.
-
- Args:
- df_path (str): Path to the CSV file.
-
- Returns:
- pd.DataFrame: Loaded DataFrame.
- """
- df = pd.read_csv(df_path, encoding='utf-8')
- print(f"DataFrame shape {df.shape}")
- return df
-
-
  def summarize_dataframe(df: pd.DataFrame, round_digits: int = 2):
  """
  Returns a summary DataFrame with data types, non-null counts, number of unique values,
@@ -21,6 +21,8 @@ from sklearn.preprocessing import StandardScaler, MaxAbsScaler, MinMaxScaler
  from sklearn.metrics import accuracy_score, classification_report, ConfusionMatrixDisplay, mean_absolute_error, mean_squared_error, r2_score, roc_curve, roc_auc_score
  import shap
 
+ from .utilities import yield_dataframes_from_dir
+
  import warnings # Ignore warnings
  warnings.filterwarnings('ignore', category=DeprecationWarning)
  warnings.filterwarnings('ignore', category=FutureWarning)
@@ -28,23 +30,6 @@ warnings.filterwarnings('ignore', category=UserWarning)
 
 
  ###### 1. Dataset Loader ######
- #Load imputed datasets as a generator
- def yield_imputed_dataframe(datasets_dir: str):
- '''
- Yields a tuple `(dataframe, dataframe_name)`
- '''
- dataset_filenames = [dataset for dataset in os.listdir(datasets_dir) if dataset.endswith(".csv")]
- if not dataset_filenames:
- raise IOError(f"No imputed datasets have been found at {datasets_dir}")
-
- for dataset_filename in dataset_filenames:
- full_path = os.path.join(datasets_dir, dataset_filename)
- df = pd.read_csv(full_path)
- #remove extension
- filename = os.path.splitext(os.path.basename(dataset_filename))[0]
- print(f"Working on dataset: {filename}")
- yield (df, filename)
-
  #Split a dataset into features and targets datasets
  def dataset_yielder(df: pd.DataFrame, target_cols: list[str]):
  '''
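In 1.3.0, `ensemble_learning.py` drops its private `yield_imputed_dataframe` and imports `yield_dataframes_from_dir` from `ml_tools.utilities`, while `run_pipeline` still unpacks `(dataframe, dataframe_name)` pairs from it. The new helper's body is not part of this diff, so the following is only a standard-library sketch of the removed generator's behavior: it yields one pair per `.csv` file in a directory and raises `IOError` when none are found. It assumes `csv.DictReader` rows as a stand-in for `pd.read_csv`, and the name `yield_csv_rows_from_dir` is hypothetical.

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterator


def yield_csv_rows_from_dir(datasets_dir: str) -> Iterator[tuple[list[dict], str]]:
    """Hypothetical stand-in for the removed yield_imputed_dataframe.

    Yields (rows, file name without extension) for every .csv file in datasets_dir,
    where rows is the list of dicts produced by csv.DictReader.
    """
    # Files only, sorted for a deterministic order; the original used os.listdir() order.
    csv_paths = sorted(p for p in Path(datasets_dir).iterdir() if p.is_file() and p.suffix == ".csv")
    if not csv_paths:
        raise IOError(f"No imputed datasets have been found at {datasets_dir}")
    for path in csv_paths:
        with path.open(newline="", encoding="utf-8") as fh:
            rows = list(csv.DictReader(fh))
        print(f"Working on dataset: {path.stem}")
        yield rows, path.stem


# Usage: one CSV in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "imputed_1.csv").write_text("a,b\n1,2\n", encoding="utf-8")
    for rows, name in yield_csv_rows_from_dir(tmp):
        print(name, rows)
```

Keeping the loader as a generator, as both versions of the package do, means only one dataset is held in memory at a time while `run_pipeline` iterates.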
@@ -543,7 +528,7 @@ def get_shap_values(model, model_name: str,
  plot_size=figsize,
  max_display=max_display_features,
  alpha=0.7,
- color=plt.get_cmap('viridis')
+ color=plt.get_cmap('viridis') # type: ignore
  )
 
  # Add professional styling
@@ -674,7 +659,7 @@ def run_pipeline(datasets_dir: str, save_dir: str, target_columns: list[str], ta
  #Check paths
  _check_paths(datasets_dir, save_dir)
  #Yield imputed dataset
- for dataframe, dataframe_name in yield_imputed_dataframe(datasets_dir):
+ for dataframe, dataframe_name in yield_dataframes_from_dir(datasets_dir):
  #Yield features dataframe and target dataframe
  for df_features, df_target, feature_names, target_name in dataset_yielder(df=dataframe, target_cols=target_columns):
  #Dataset pipeline
@@ -1,140 +0,0 @@
- Metadata-Version: 2.4
- Name: dragon-ml-toolbox
- Version: 1.2.0
- Summary: A collection of tools for data science and machine learning projects
- Author-email: Karl Loza <luigiloza@gmail.com>
- License-Expression: MIT
- Project-URL: Homepage, https://github.com/DrAg0n-BoRn/ML_tools
- Project-URL: Changelog, https://github.com/DrAg0n-BoRn/ML_tools/blob/master/CHANGELOG.md
- Classifier: Programming Language :: Python :: 3
- Classifier: Operating System :: OS Independent
- Requires-Python: >=3.9
- Description-Content-Type: text/markdown
- License-File: LICENSE
- Requires-Dist: numpy
- Requires-Dist: pandas
- Requires-Dist: matplotlib
- Requires-Dist: scikit-learn
- Provides-Extra: data-exploration
- Requires-Dist: pandas; extra == "data-exploration"
- Requires-Dist: numpy; extra == "data-exploration"
- Requires-Dist: matplotlib; extra == "data-exploration"
- Requires-Dist: seaborn; extra == "data-exploration"
- Requires-Dist: statsmodels; extra == "data-exploration"
- Requires-Dist: ipython; extra == "data-exploration"
- Provides-Extra: datasetmaster
- Requires-Dist: torch; extra == "datasetmaster"
- Requires-Dist: pandas; extra == "datasetmaster"
- Requires-Dist: numpy; extra == "datasetmaster"
- Requires-Dist: scikit-learn; extra == "datasetmaster"
- Requires-Dist: imblearn; extra == "datasetmaster"
- Requires-Dist: Pillow; extra == "datasetmaster"
- Requires-Dist: matplotlib; extra == "datasetmaster"
- Provides-Extra: ensemble-learning
- Requires-Dist: pandas; extra == "ensemble-learning"
- Requires-Dist: numpy; extra == "ensemble-learning"
- Requires-Dist: seaborn; extra == "ensemble-learning"
- Requires-Dist: matplotlib; extra == "ensemble-learning"
- Requires-Dist: joblib; extra == "ensemble-learning"
- Requires-Dist: imblearn; extra == "ensemble-learning"
- Requires-Dist: scikit-learn; extra == "ensemble-learning"
- Requires-Dist: xgboost; extra == "ensemble-learning"
- Requires-Dist: lightgbm; extra == "ensemble-learning"
- Requires-Dist: shap; extra == "ensemble-learning"
- Provides-Extra: handle-excel
- Requires-Dist: openpyxl; extra == "handle-excel"
- Requires-Dist: pandas; extra == "handle-excel"
- Provides-Extra: logger
- Requires-Dist: pandas; extra == "logger"
- Requires-Dist: openpyxl; extra == "logger"
- Provides-Extra: mice-imputation
- Requires-Dist: pandas; extra == "mice-imputation"
- Requires-Dist: miceforest; extra == "mice-imputation"
- Requires-Dist: matplotlib; extra == "mice-imputation"
- Requires-Dist: numpy; extra == "mice-imputation"
- Provides-Extra: particle-swarm-optimization
- Requires-Dist: numpy; extra == "particle-swarm-optimization"
- Requires-Dist: joblib; extra == "particle-swarm-optimization"
- Requires-Dist: xgboost; extra == "particle-swarm-optimization"
- Requires-Dist: lightgbm; extra == "particle-swarm-optimization"
- Requires-Dist: scikit-learn; extra == "particle-swarm-optimization"
- Requires-Dist: polars; extra == "particle-swarm-optimization"
- Provides-Extra: pytorch-models
- Requires-Dist: torch; extra == "pytorch-models"
- Provides-Extra: trainer
- Requires-Dist: numpy; extra == "trainer"
- Requires-Dist: torch; extra == "trainer"
- Requires-Dist: matplotlib; extra == "trainer"
- Requires-Dist: scikit-learn; extra == "trainer"
- Provides-Extra: vision-helpers
- Requires-Dist: Pillow; extra == "vision-helpers"
- Requires-Dist: torch; extra == "vision-helpers"
- Requires-Dist: torchvision; extra == "vision-helpers"
- Provides-Extra: full
- Requires-Dist: pandas; extra == "full"
- Requires-Dist: numpy; extra == "full"
- Requires-Dist: matplotlib; extra == "full"
- Requires-Dist: seaborn; extra == "full"
- Requires-Dist: statsmodels; extra == "full"
- Requires-Dist: ipython; extra == "full"
- Requires-Dist: torch; extra == "full"
- Requires-Dist: scikit-learn; extra == "full"
- Requires-Dist: imblearn; extra == "full"
- Requires-Dist: Pillow; extra == "full"
- Requires-Dist: joblib; extra == "full"
- Requires-Dist: xgboost; extra == "full"
- Requires-Dist: lightgbm; extra == "full"
- Requires-Dist: shap; extra == "full"
- Requires-Dist: openpyxl; extra == "full"
- Requires-Dist: miceforest; extra == "full"
- Requires-Dist: polars; extra == "full"
- Requires-Dist: torchvision; extra == "full"
- Dynamic: license-file
-
- # dragon-ml-tools
-
- A collection of Python utilities and machine learning tools, structured as a modular package for easy reuse and installation.
-
- ## Features
-
- - Modular scripts for data exploration, logging, machine learning, and more.
- - Optional dependencies grouped by functionality for lightweight installs.
- - Designed for seamless integration as a Git submodule or installable Python package.
-
-
- ## Installation
-
- Python 3.9+ recommended.
-
- ### Via PyPI (Stable Releases)
-
- Install the latest stable release from PyPI with optional dependencies:
-
- ```bash
- pip install dragon-ml-tools[logger,trainer]
- ```
-
- To install dependencies from all modules
-
- ```bash
- pip install dragon-ml-tools[full]
- ```
-
- ### Via GitHub (Editable)
-
- Clone the repository and install in editable mode with optional dependencies:
-
- ```bash
- git clone https://github.com/DrAg0n-BoRn/ML_tools.git
- cd ML_tools
- pip install -e '.[logger]'
- ```
-
- ## Usage
-
- After installation, import modules like this:
-
- ```python
- from ml_tools.utilities import sanitize_filename
- from ml_tools.logger import custom_logger
- ```
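Comparing the `Provides-Extra` headers of the two METADATA files shows which extras disappear: every per-module extra from 1.2.0, plus `full`, is gone, and 1.3.0 declares only `pytorch`. One practical consequence visible in the diff: the old `trainer`, `datasetmaster`, `pytorch-models`, and `vision-helpers` extras pulled in `torch`, but in 1.3.0 `torch` is reachable only through `[pytorch]`. A minimal sketch computing the difference with `email.parser` on header excerpts copied from the diff; `provided_extras` is a hypothetical helper name.

```python
from email.parser import HeaderParser

# Provides-Extra headers copied from the 1.2.0 and 1.3.0 METADATA files in this diff.
OLD_HEADERS = """\
Provides-Extra: data-exploration
Provides-Extra: datasetmaster
Provides-Extra: ensemble-learning
Provides-Extra: handle-excel
Provides-Extra: logger
Provides-Extra: mice-imputation
Provides-Extra: particle-swarm-optimization
Provides-Extra: pytorch-models
Provides-Extra: trainer
Provides-Extra: vision-helpers
Provides-Extra: full
"""
NEW_HEADERS = "Provides-Extra: pytorch\n"


def provided_extras(metadata_text: str) -> set[str]:
    """Hypothetical helper: collect the extras declared in core-metadata headers."""
    return set(HeaderParser().parsestr(metadata_text).get_all("Provides-Extra", []))


old_extras = provided_extras(OLD_HEADERS)
new_extras = provided_extras(NEW_HEADERS)
removed = sorted(old_extras - new_extras)  # extras that 1.3.0 no longer defines
added = sorted(new_extras - old_extras)    # extras introduced in 1.3.0
print("removed:", removed)
print("added:", added)
```

Install commands written for 1.2.0, such as `[logger,trainer]` or `[full]`, therefore name extras that 1.3.0 does not define and should be switched to `[pytorch]` where PyTorch support is needed.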