dragon-ml-toolbox 11.1.0__py3-none-any.whl → 12.0.0__py3-none-any.whl

This diff shows the published contents of two public releases of the package, as they appear in their respective registries, and is provided for informational purposes only.


@@ -1,14 +1,14 @@
 Metadata-Version: 2.4
 Name: dragon-ml-toolbox
-Version: 11.1.0
+Version: 12.0.0
 Summary: A collection of tools for data science and machine learning projects.
-Author-email: Karl Loza <luigiloza@gmail.com>
+Author-email: "Karl L. Loza Vidaurre" <luigiloza@gmail.com>
 License-Expression: MIT
 Project-URL: Homepage, https://github.com/DrAg0n-BoRn/ML_tools
 Project-URL: Changelog, https://github.com/DrAg0n-BoRn/ML_tools/blob/master/CHANGELOG.md
 Classifier: Programming Language :: Python :: 3
 Classifier: Operating System :: OS Independent
-Requires-Python: >=3.10
+Requires-Python: ==3.12
 Description-Content-Type: text/markdown
 License-File: LICENSE
 License-File: LICENSE-THIRD-PARTY.md
@@ -47,9 +47,6 @@ Requires-Dist: lightgbm<=4.5.0; extra == "mice"
 Requires-Dist: shap; extra == "mice"
 Requires-Dist: colorlog; extra == "mice"
 Requires-Dist: pyarrow; extra == "mice"
-Provides-Extra: pytorch
-Requires-Dist: torch; extra == "pytorch"
-Requires-Dist: torchvision; extra == "pytorch"
 Provides-Extra: excel
 Requires-Dist: pandas; extra == "excel"
 Requires-Dist: openpyxl; extra == "excel"
@@ -68,9 +65,6 @@ Requires-Dist: lightgbm; extra == "gui-boost"
 Provides-Extra: gui-torch
 Requires-Dist: numpy; extra == "gui-torch"
 Requires-Dist: FreeSimpleGUI>=5.2; extra == "gui-torch"
-Provides-Extra: plot
-Requires-Dist: matplotlib; extra == "plot"
-Requires-Dist: seaborn; extra == "plot"
 Provides-Extra: pyinstaller
 Requires-Dist: pyinstaller; extra == "pyinstaller"
 Provides-Extra: nuitka
@@ -90,7 +84,7 @@ A collection of Python utilities for data science and machine learning, structur
 
 ## Installation
 
-**Python 3.10+**
+**Python 3.12**
 
 ### Via PyPI
 
@@ -100,22 +94,22 @@ Install the latest stable release from PyPI:
 pip install dragon-ml-toolbox
 ```
 
-### Via GitHub (Editable)
+### Via conda-forge
 
-Clone the repository and install in editable mode with optional dependencies:
+Install from the conda-forge channel:
 
 ```bash
-git clone https://github.com/DrAg0n-BoRn/ML_tools.git
-cd ML_tools
-pip install -e .
+conda install -c conda-forge dragon-ml-toolbox
 ```
 
-### Via conda-forge
+### Via GitHub (Editable)
 
-Install from the conda-forge channel:
+Clone the repository and install in editable mode:
 
 ```bash
-conda install -c conda-forge dragon-ml-toolbox
+git clone https://github.com/DrAg0n-BoRn/ML_tools.git
+cd ML_tools
+pip install -e .
 ```
 
 ## Modular Installation
@@ -128,13 +122,7 @@ Installs a comprehensive set of tools for typical data science workflows, includ
 pip install "dragon-ml-toolbox[ML]"
 ```
 
-To install the standard CPU-only versions of Torch and Torchvision:
-
-```Bash
-pip install "dragon-ml-toolbox[pytorch]"
-```
-
-⚠️ To make use of GPU acceleration (highly recommended), follow the official instructions: [PyTorch website](https://pytorch.org/get-started/locally/)
+⚠️ PyTorch is required; follow the official instructions: [PyTorch website](https://pytorch.org/get-started/locally/)
 
 #### Modules:
 
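Regarding the ⚠️ note above: a minimal CPU-only install is sketched below for reference. The exact command depends on OS and CUDA version, so the selector on the PyTorch website remains authoritative.

```Bash
pip install torch --index-url https://download.pytorch.org/whl/cpu
```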
@@ -147,6 +135,7 @@ ensemble_inference
 ensemble_learning
 ETL_cleaning
 ETL_engineering
+math_utilities
 ML_callbacks
 ML_datasetmaster
 ML_evaluation_multi
@@ -156,10 +145,12 @@ ML_models
 ML_optimization
 ML_scaler
 ML_trainer
+ML_utilities
 optimization_tools
 path_manager
 PSO_optimization
 RNN_forecast
+serde
 SQL
 utilities
 ```
@@ -179,7 +170,9 @@ pip install "dragon-ml-toolbox[mice]"
 ```Bash
 constants
 custom_logger
+math_utilities
 MICE_imputation
+serde
 VIF_factor
 path_manager
 utilities
@@ -208,16 +201,12 @@ path_manager
 
 ### 🎰 GUI for Boosting Algorithms (XGBoost, LightGBM) [gui-boost]
 
-For GUIs that include plotting functionality, you must also install the [plot] extra.
+GUI tools for running inference with XGBoost and LightGBM models.
 
 ```Bash
 pip install "dragon-ml-toolbox[gui-boost]"
 ```
 
-```Bash
-pip install "dragon-ml-toolbox[gui-boost,plot]"
-```
-
 #### Modules:
 
 ```Bash
@@ -226,22 +215,19 @@ custom_logger
 GUI_tools
 ensemble_inference
 path_manager
+serde
 ```
 
 ---
 
 ### 🤖 GUI for PyTorch Models [gui-torch]
 
-For GUIs that include plotting functionality, you must also install the [plot] extra.
+GUI tools for running inference with PyTorch models.
 
 ```Bash
 pip install "dragon-ml-toolbox[gui-torch]"
 ```
 
-```Bash
-pip install "dragon-ml-toolbox[gui-torch,plot]"
-```
-
 #### Modules:
 
 ```Bash
@@ -273,6 +259,6 @@ pip install "dragon-ml-toolbox[nuitka]"
 After installation, import modules like this:
 
 ```python
-from ml_tools.utilities import serialize_object, deserialize_object
+from ml_tools.serde import serialize_object, deserialize_object
 from ml_tools import custom_logger
 ```
dragon_ml_toolbox-12.0.0.dist-info/RECORD ADDED
@@ -0,0 +1,40 @@
+dragon_ml_toolbox-12.0.0.dist-info/licenses/LICENSE,sha256=L35WDmmLZNTlJvxF6Vy7Uy4SYNi6rCfWUqlTHpoRMoU,1081
+dragon_ml_toolbox-12.0.0.dist-info/licenses/LICENSE-THIRD-PARTY.md,sha256=iy2r_R7wjzsCbz_Q_jMsp_jfZ6oP8XW9QhwzRBH0mGY,1904
+ml_tools/ETL_cleaning.py,sha256=PLRSR-VYnt1nNT9XrcWq40SE0VzHCw7DQ8v9czfSQsU,20366
+ml_tools/ETL_engineering.py,sha256=l0I6Og9o4s6EODdk0kZXjbbC-a3vVPYy1FopP2BkQSQ,54909
+ml_tools/GUI_tools.py,sha256=Va6ig-dHULPVRwQYYtH3fvY5XPIoqRcJpRW8oXC55Hw,45413
+ml_tools/MICE_imputation.py,sha256=eNN7JuT43bydAJ5E2k2A5sDjYDu3X8kCHtMdFBkzjR0,11699
+ml_tools/ML_callbacks.py,sha256=-XRIZEy3CPJWTHcoReyIw53FZlTs3pWcTVVnncTQQSc,13909
+ml_tools/ML_datasetmaster.py,sha256=t6q6mU9lz2rYKTVPKjA7yZ5ImV7_NykiciHaYnqIEpA,30822
+ml_tools/ML_evaluation.py,sha256=tLswOPgH4G1KExSMn0876YtNkbxPh-W3J4MYOjomMWA,16208
+ml_tools/ML_evaluation_multi.py,sha256=6OZyQ4SM9ALh38mOABmiHgIQDWcovsD_iOo7Bg9YZCE,12516
+ml_tools/ML_inference.py,sha256=ymFvncFsU10PExq87xnEj541DKV5ck0nMuK8ToJHzVQ,23067
+ml_tools/ML_models.py,sha256=pSCV6KbmVnPZr49Kbyg7g25CYaWBWJr6IinBHKgVKGw,28042
+ml_tools/ML_optimization.py,sha256=r1lAQiztTtRuh13rWj1iqbXvWO0LCqbzlkRdy3gEWo4,18124
+ml_tools/ML_scaler.py,sha256=tw6onj9o8_kk3FQYb930HUzvv1zsFZe2YZJdF3LtHkU,7538
+ml_tools/ML_trainer.py,sha256=_g48w5Ak-wQr5fGHdJqlcpnzv3gWyL1ghkOhy9VOZbo,23930
+ml_tools/ML_utilities.py,sha256=35DfZzAwfDwVwfRECD8X_2ynsU2NCpTdNJSmza6oAzQ,8712
+ml_tools/PSO_optimization.py,sha256=fVHeemqilBS0zrGV25E5yKwDlGdd2ZKa18d8CZ6Q6Fk,22961
+ml_tools/RNN_forecast.py,sha256=Qa2KoZfdAvSjZ4yE78N4BFXtr3tTr0Gx7tQJZPotsh0,1967
+ml_tools/SQL.py,sha256=vXLPGfVVg8bfkbBE3HVfyEclVbdJy0TBhuQONtMwSCQ,11234
+ml_tools/VIF_factor.py,sha256=dizjK0zmgOMuLBnJ66y5Sll5do6wjGWhAPVzJF1uwhQ,10404
+ml_tools/__init__.py,sha256=q0y9faQ6e17XCQ7eUiCZ1FJ4Bg5EQqLjZ9f_l5REUUY,41
+ml_tools/_logger.py,sha256=dlp5cGbzooK9YSNSZYB4yjZrOaQUGW8PTrM411AOvL8,4717
+ml_tools/_script_info.py,sha256=21r83LV3RubsNZ_RTEUON6RbDf7Mh4_udweNcvdF_Fk,212
+ml_tools/constants.py,sha256=3br5Rk9cL2IUo638eJuMOGdbGQaWssaUecYEvSeRBLM,3322
+ml_tools/custom_logger.py,sha256=OZqG7FR_UE6byzY3RDmlj08a336ZU-4DzNBMPLr_d5c,5881
+ml_tools/data_exploration.py,sha256=qpRUCQEVUmkxjx7DAztT6yIdI___xNV5NVPMBqCp3Mk,38870
+ml_tools/ensemble_evaluation.py,sha256=FGHSe8LBI8_w8LjNeJWOcYQ1UK_mc6fVah8gmSvNVGg,26853
+ml_tools/ensemble_inference.py,sha256=0yLmLNj45RVVoSCLH1ZYJG9IoAhTkWUqEZmLOQTFGTY,9348
+ml_tools/ensemble_learning.py,sha256=aTPeKthO4zRWBEaQJOUj8jEqVHiHjjOMXuiEWjI9NxM,21946
+ml_tools/handle_excel.py,sha256=pfdAPb9ywegFkM9T54bRssDOsX-K7rSeV0RaMz7lEAo,14006
+ml_tools/keys.py,sha256=FDpbS3Jb0pjrVvvp2_8nZi919mbob_-xwuy5OOtKM_A,1848
+ml_tools/math_utilities.py,sha256=CUkyBuExFOnEHp9J1Xsh6H4xILwYOBilwFccM9J_Dxo,7870
+ml_tools/optimization_tools.py,sha256=P3I6lIpvZ8Xf2kX5FvvBKBmrK2pB6idBpkTzfUJxTeE,5073
+ml_tools/path_manager.py,sha256=CyDU16pOKmC82jPubqJPT6EBt-u-3rGVbxyPIZCvDDY,18432
+ml_tools/serde.py,sha256=k0qAwfMf13lVBQSgq5u9MSXEoo31iOA2-Ncm8XgMCMI,3974
+ml_tools/utilities.py,sha256=gef62GLK7ev5BWkkQekeJoVZqwf2mIuOlOfyCw6WdtE,13882
+dragon_ml_toolbox-12.0.0.dist-info/METADATA,sha256=piCOJTB5V7QKGXqbYiu3GjdNLeyrpzV-42tIxVxBRBU,6166
+dragon_ml_toolbox-12.0.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+dragon_ml_toolbox-12.0.0.dist-info/top_level.txt,sha256=wm-oxax3ciyez6VoO4zsFd-gSok2VipYXnbg3TH9PtU,9
+dragon_ml_toolbox-12.0.0.dist-info/RECORD,,
ml_tools/ETL_cleaning.py CHANGED
@@ -2,6 +2,7 @@ import polars as pl
 import pandas as pd
 from pathlib import Path
 from typing import Union, List, Dict
+
 from .path_manager import sanitize_filename, make_fullpath
 from .data_exploration import drop_macro
 from .utilities import save_dataframe, load_dataframe
ml_tools/ETL_engineering.py CHANGED
@@ -2,6 +2,7 @@ import polars as pl
 import re
 from pathlib import Path
 from typing import Literal, Union, Optional, Any, Callable, List, Dict, Tuple
+
 from .utilities import load_dataframe, save_dataframe
 from .path_manager import make_fullpath
 from ._script_info import _script_info
@@ -370,8 +371,20 @@ class AutoDummifier:
         Column names are auto-generated by Polars as
         '{original_col_name}_{category_value}'.
         """
-        # Ensure the column is treated as a string before creating dummies
-        return column.cast(pl.Utf8).to_dummies(drop_first=self.drop_first)
+        # Store the original column name to construct the potential null column name
+        col_name = column.name
+
+        # Create the dummy variables from the series
+        dummies = column.cast(pl.Utf8).to_dummies(drop_first=self.drop_first)
+
+        # Define the name of the column that Polars creates for null values
+        null_col_name = f"{col_name}_null"
+
+        # Check if the null column exists and drop it if it does
+        if null_col_name in dummies.columns:
+            return dummies.drop(null_col_name)
+
+        return dummies
 
 
 class MultiBinaryDummifier:
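The `AutoDummifier` change above works around Polars' `to_dummies`, which emits a dedicated indicator column for nulls. A minimal standalone sketch of that behavior (illustrative, not package code):

```python
import polars as pl

# A string column containing a null: to_dummies() adds a "{name}_null" indicator
s = pl.Series("color", ["red", "blue", None])
dummies = s.cast(pl.Utf8).to_dummies()
assert "color_null" in dummies.columns

# Dropping that column reproduces what the patched AutoDummifier returns
cleaned = dummies.drop("color_null")
```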
@@ -388,7 +401,7 @@ class MultiBinaryDummifier:
         A list of strings, where each string is a keyword to search for. A separate
         binary column will be created for each keyword.
         case_insensitive (bool):
-            If True, keyword matching ignores case. Defaults to True.
+            If True, keyword matching ignores case.
     """
     def __init__(self, keywords: List[str], case_insensitive: bool = True):
         if not isinstance(keywords, list) or not all(isinstance(k, str) for k in keywords):
@@ -531,7 +544,7 @@ class NumberExtractor:
         round_digits (int | None):
             If the dtype is 'float', you can specify the number of decimal
             places to round the result to. This parameter is ignored if
-            dtype is 'int'. Defaults to None (no rounding).
+            dtype is 'int'.
     """
     def __init__(
         self,
@@ -657,7 +670,7 @@ class MultiNumberExtractor:
             # Define the core extraction logic for the i-th number
             extraction_expr = (
                 column.str.extract_all(self.regex_pattern)
-                .list.get(i)
+                .list.get(i, null_on_oob=True)
                 .cast(self.polars_dtype, strict=False)
             )
 
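The `null_on_oob=True` argument makes an out-of-range list index yield null instead of raising. A standalone sketch of the difference (illustrative, not package code):

```python
import polars as pl

df = pl.DataFrame({"raw": ["10x20", "5"]})
# "5" contains only one number, so index 1 is out of bounds;
# null_on_oob=True turns that into a null instead of an error.
out = df.select(
    pl.col("raw").str.extract_all(r"\d+")
    .list.get(1, null_on_oob=True)
    .cast(pl.Float64, strict=False)
    .alias("second_number")
)
```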
@@ -944,8 +957,7 @@ class RatioCalculator:
 
 class TriRatioCalculator:
     """
-    A transformer that handles three-part ("A:B:C") and two-part ("A:C")
-    ratios, enforcing a strict output structure.
+    A transformer that handles three-part ("A:B:C") ratios, enforcing a strict output structure.
 
     - Three-part ratios produce A/B and A/C.
     - Two-part ratios are assumed to be A:C and produce None for A/B.
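A worked example of the documented contract (illustrative values; the class internals are not shown in this diff):

```python
# "12:3:4" -> A/B = 12/3 = 4.0 and A/C = 12/4 = 3.0
# "12:4"   -> treated as A:C, so A/B = None and A/C = 12/4 = 3.0
```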
ml_tools/GUI_tools.py CHANGED
@@ -4,8 +4,9 @@ import traceback
 import FreeSimpleGUI as sg
 from functools import wraps
 from typing import Any, Dict, Tuple, List, Literal, Union, Optional, Callable
-from ._script_info import _script_info
 import numpy as np
+
+from ._script_info import _script_info
 from ._logger import _LOGGER
 from .keys import _OneHotOtherPlaceholder
 
ml_tools/MICE_imputation.py CHANGED
@@ -3,13 +3,16 @@ import miceforest as mf
 from pathlib import Path
 import matplotlib.pyplot as plt
 import numpy as np
-from .utilities import load_dataframe, merge_dataframes, save_dataframe, threshold_binary_values
-from .path_manager import sanitize_filename, make_fullpath, list_csv_paths
 from plotnine import ggplot, labs, theme, element_blank  # type: ignore
 from typing import Optional, Union
+
+from .utilities import load_dataframe, merge_dataframes, save_dataframe
+from .math_utilities import threshold_binary_values
+from .path_manager import sanitize_filename, make_fullpath, list_csv_paths
 from ._logger import _LOGGER
 from ._script_info import _script_info
 
+
 __all__ = [
     "apply_mice",
     "save_imputed_datasets",
ml_tools/ML_callbacks.py CHANGED
@@ -1,13 +1,13 @@
 import numpy as np
 import torch
 from tqdm.auto import tqdm
+from typing import Union, Literal, Optional
+from pathlib import Path
+
 from .path_manager import make_fullpath, sanitize_filename
 from .keys import PyTorchLogKeys
 from ._logger import _LOGGER
-from typing import Optional
 from ._script_info import _script_info
-from typing import Union, Literal
-from pathlib import Path
 
 
 __all__ = [
ml_tools/ML_datasetmaster.py CHANGED
@@ -10,6 +10,7 @@ from torchvision.datasets import ImageFolder
 from torchvision import transforms
 import matplotlib.pyplot as plt
 from pathlib import Path
+
 from .path_manager import make_fullpath, sanitize_filename
 from ._logger import _LOGGER
 from ._script_info import _script_info
ml_tools/ML_evaluation.py CHANGED
@@ -18,9 +18,10 @@ from sklearn.metrics import (
 import torch
 import shap
 from pathlib import Path
+from typing import Union, Optional, List
+
 from .path_manager import make_fullpath
 from ._logger import _LOGGER
-from typing import Union, Optional, List
 from ._script_info import _script_info
 from .keys import SHAPKeys
 
ml_tools/ML_evaluation_multi.py CHANGED
@@ -25,6 +25,7 @@ from .path_manager import make_fullpath, sanitize_filename
 from ._logger import _LOGGER
 from ._script_info import _script_info
 
+
 __all__ = [
     "multi_target_regression_metrics",
     "multi_label_classification_metrics",
ml_tools/ML_inference.py CHANGED
@@ -11,6 +11,7 @@ from ._logger import _LOGGER
 from .path_manager import make_fullpath
 from .keys import PyTorchInferenceKeys
 
+
 __all__ = [
     "PyTorchInferenceHandler",
     "PyTorchInferenceHandlerMulti",
ml_tools/ML_models.py CHANGED
@@ -3,6 +3,7 @@ from torch import nn
 from typing import List, Union, Tuple, Dict, Any
 from pathlib import Path
 import json
+
 from ._logger import _LOGGER
 from .path_manager import make_fullpath
 from ._script_info import _script_info
@@ -155,6 +156,7 @@ class _BaseAttention(_BaseMLP):
     def __init__(self, *args, **kwargs):
         super().__init__(*args, **kwargs)
         # By default, models inheriting this do not have the flag.
+        self.attention = None
         self.has_interpretable_attention = False
 
     def forward(self, x: torch.Tensor) -> torch.Tensor:
@@ -165,7 +167,7 @@ class _BaseAttention(_BaseMLP):
     def forward_attention(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
         """Returns logits and attention weights."""
         # This logic is now shared and defined in one place
-        x, attention_weights = self.attention(x)
+        x, attention_weights = self.attention(x)  # type: ignore
         x = self.mlp(x)
         logits = self.output_layer(x)
         return logits, attention_weights
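The new `self.attention = None` default is what forces the `# type: ignore` above: the base class now declares the attribute, and concrete subclasses are expected to overwrite it with a real module. A hypothetical sketch of the pattern (`AttentionMLP` and `_AttentionLayer` are illustrative names, not from this diff):

```python
class AttentionMLP(_BaseAttention):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Replace the None placeholder set by _BaseAttention.__init__
        self.attention = _AttentionLayer(self.in_features)  # hypothetical layer
        self.has_interpretable_attention = True
```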
ml_tools/ML_optimization.py CHANGED
@@ -18,7 +18,8 @@ from .ML_inference import PyTorchInferenceHandler
 from .keys import PyTorchInferenceKeys
 from .SQL import DatabaseManager
 from .optimization_tools import _save_result
-from .utilities import threshold_binary_values, save_dataframe
+from .utilities import save_dataframe
+from .math_utilities import threshold_binary_values
 
 
 __all__ = [
ml_tools/ML_scaler.py CHANGED
@@ -2,14 +2,17 @@ import torch
 from torch.utils.data import Dataset, DataLoader
 from pathlib import Path
 from typing import Union, List, Optional
+
 from ._logger import _LOGGER
 from ._script_info import _script_info
 from .path_manager import make_fullpath
 
+
 __all__ = [
     "PytorchScaler"
 ]
 
+
 class PytorchScaler:
     """
     Standardizes continuous features in a PyTorch dataset by subtracting the
ml_tools/ML_utilities.py ADDED
@@ -0,0 +1,219 @@
+import pandas as pd
+from pathlib import Path
+from typing import Union, Any
+
+from .path_manager import make_fullpath, list_subdirectories, list_files_by_extension
+from ._script_info import _script_info
+from ._logger import _LOGGER
+from .keys import DatasetKeys, PytorchModelArchitectureKeys, PytorchArtifactPathKeys, SHAPKeys
+from .utilities import load_dataframe
+
+
+__all__ = [
+    "find_model_artifacts",
+    "select_features_by_shap"
+]
+
+
+def find_model_artifacts(target_directory: Union[str, Path], load_scaler: bool, verbose: bool = False) -> list[dict[str, Any]]:
+    """
+    Scans subdirectories to find paths to model weights, target names, feature names, and model architecture; optionally a scaler path if `load_scaler` is True.
+
+    This function operates on a specific directory structure. It expects the
+    `target_directory` to contain one or more subdirectories, where each
+    subdirectory represents a single trained model result.
+
+    The expected directory structure for each model is as follows:
+    ```
+    target_directory
+    ├── model_1
+    │   ├── *.pth
+    │   ├── scaler_*.pth    (Required if `load_scaler` is True)
+    │   ├── feature_names.txt
+    │   ├── target_names.txt
+    │   └── architecture.json
+    └── model_2/
+        └── ...
+    ```
+
+    Args:
+        target_directory (str | Path): The path to the root directory that contains model subdirectories.
+        load_scaler (bool): If True, the function requires and searches for a scaler file (`.pth`) in each model subdirectory.
+        verbose (bool): If True, enables detailed logging during the artifact search.
+
+    Returns:
+        (list[dict[str, Path]]): A list of dictionaries, where each dictionary
+            corresponds to a model found in a subdirectory. The dictionary
+            maps standardized keys to the absolute paths of the model's
+            artifacts (weights, architecture, features, targets, and scaler).
+            The scaler path will be `None` if `load_scaler` is False.
+    """
+    # validate directory
+    root_path = make_fullpath(target_directory, enforce="directory")
+
+    # store results
+    all_artifacts: list[dict] = list()
+
+    # find model directories
+    result_dirs_dict = list_subdirectories(root_dir=root_path, verbose=verbose)
+    for dir_name, dir_path in result_dirs_dict.items():
+        # find files
+        model_pth_dict = list_files_by_extension(directory=dir_path, extension="pth", verbose=verbose)
+
+        # restriction
+        if load_scaler:
+            if len(model_pth_dict) != 2:
+                _LOGGER.error(f"Directory {dir_path} should contain exactly 2 '.pth' files: scaler and weights.")
+                raise IOError()
+        else:
+            if len(model_pth_dict) != 1:
+                _LOGGER.error(f"Directory {dir_path} should contain exactly 1 '.pth' file: weights.")
+                raise IOError()
+
+        ##### Scaler and Weights #####
+        scaler_path = None
+        weights_path = None
+
+        # load weights and scaler if present
+        for pth_filename, pth_path in model_pth_dict.items():
+            if load_scaler and pth_filename.lower().startswith(DatasetKeys.SCALER_PREFIX):
+                scaler_path = pth_path
+            else:
+                weights_path = pth_path
+
+        # validation
+        if not weights_path:
+            _LOGGER.error(f"Error parsing the model weights path from '{dir_name}'")
+            raise IOError()
+
+        if load_scaler and not scaler_path:
+            _LOGGER.error(f"Error parsing the scaler path from '{dir_name}'")
+            raise IOError()
+
+        ##### Target and Feature names #####
+        target_names_path = None
+        feature_names_path = None
+
+        # load feature and target names
+        model_txt_dict = list_files_by_extension(directory=dir_path, extension="txt", verbose=verbose)
+
+        for txt_filename, txt_path in model_txt_dict.items():
+            if txt_filename == DatasetKeys.FEATURE_NAMES:
+                feature_names_path = txt_path
+            elif txt_filename == DatasetKeys.TARGET_NAMES:
+                target_names_path = txt_path
+
+        # validation
+        if not target_names_path or not feature_names_path:
+            _LOGGER.error(f"Error parsing features path or targets path from '{dir_name}'")
+            raise IOError()
+
+        ##### load model architecture path #####
+        architecture_path = None
+
+        model_json_dict = list_files_by_extension(directory=dir_path, extension="json", verbose=verbose)
+
+        for json_filename, json_path in model_json_dict.items():
+            if json_filename == PytorchModelArchitectureKeys.SAVENAME:
+                architecture_path = json_path
+
+        # validation
+        if not architecture_path:
+            _LOGGER.error(f"Error parsing the model architecture path from '{dir_name}'")
+            raise IOError()
+
+        ##### Paths dictionary #####
+        parsing_dict = {
+            PytorchArtifactPathKeys.WEIGHTS_PATH: weights_path,
+            PytorchArtifactPathKeys.ARCHITECTURE_PATH: architecture_path,
+            PytorchArtifactPathKeys.FEATURES_PATH: feature_names_path,
+            PytorchArtifactPathKeys.TARGETS_PATH: target_names_path,
+            PytorchArtifactPathKeys.SCALER_PATH: scaler_path
+        }
+
+        all_artifacts.append(parsing_dict)
+
+    return all_artifacts
+
+
+def select_features_by_shap(
+    root_directory: Union[str, Path],
+    shap_threshold: float,
+    verbose: bool = True) -> list[str]:
+    """
+    Scans subdirectories to find SHAP summary CSVs, then extracts feature
+    names whose mean absolute SHAP value meets a specified threshold.
+
+    This function is useful for automated feature selection based on feature
+    importance scores aggregated from multiple models.
+
+    Args:
+        root_directory (Union[str, Path]):
+            The path to the root directory that contains model subdirectories.
+        shap_threshold (float):
+            The minimum mean absolute SHAP value for a feature to be included
+            in the final list.
+
+    Returns:
+        list[str]:
+            A single, sorted list of unique feature names that meet the
+            threshold criteria across all found files.
+    """
+    if verbose:
+        _LOGGER.info(f"Starting feature selection with SHAP threshold >= {shap_threshold}")
+    root_path = make_fullpath(root_directory, enforce="directory")
+
+    # --- Step 2: Directory and File Discovery ---
+    subdirectories = list_subdirectories(root_dir=root_path, verbose=False)
+
+    shap_filename = SHAPKeys.SAVENAME + ".csv"
+
+    valid_csv_paths = []
+    for dir_name, dir_path in subdirectories.items():
+        expected_path = dir_path / shap_filename
+        if expected_path.is_file():
+            valid_csv_paths.append(expected_path)
+        else:
+            _LOGGER.warning(f"No '{shap_filename}' found in subdirectory '{dir_name}'.")
+
+    if not valid_csv_paths:
+        _LOGGER.error(f"Process halted: No '{shap_filename}' files were found in any subdirectory.")
+        return []
+
+    if verbose:
+        _LOGGER.info(f"Found {len(valid_csv_paths)} SHAP summary files to process.")
+
+    # --- Step 3: Data Processing and Feature Extraction ---
+    master_feature_set = set()
+    for csv_path in valid_csv_paths:
+        try:
+            df, _ = load_dataframe(csv_path, kind="pandas", verbose=False)
+
+            # Validate required columns
+            required_cols = {SHAPKeys.FEATURE_COLUMN, SHAPKeys.SHAP_VALUE_COLUMN}
+            if not required_cols.issubset(df.columns):
+                _LOGGER.warning(f"Skipping '{csv_path}': missing required columns.")
+                continue
+
+            # Filter by threshold and extract features
+            filtered_df = df[df[SHAPKeys.SHAP_VALUE_COLUMN] >= shap_threshold]
+            features = filtered_df[SHAPKeys.FEATURE_COLUMN].tolist()
+            master_feature_set.update(features)
+
+        except (ValueError, pd.errors.EmptyDataError):
+            _LOGGER.warning(f"Skipping '{csv_path}' because it is empty or malformed.")
+            continue
+        except Exception as e:
+            _LOGGER.error(f"An unexpected error occurred while processing '{csv_path}': {e}")
+            continue
+
+    # --- Step 4: Finalize and Return ---
+    final_features = sorted(list(master_feature_set))
+    if verbose:
+        _LOGGER.info(f"Selected {len(final_features)} unique features across all files.")
+
+    return final_features
+
+
+def info():
+    _script_info(__all__)
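A short usage sketch for the two new helpers, assuming a `results/` directory laid out as in the `find_model_artifacts` docstring (the path and threshold are placeholders):

```python
from ml_tools.ML_utilities import find_model_artifacts, select_features_by_shap

# Collect per-model artifact paths (weights, architecture, features, targets, scaler)
artifacts = find_model_artifacts("results", load_scaler=True, verbose=True)

# Union of features whose mean absolute SHAP value is >= 0.01 across all summaries
selected = select_features_by_shap("results", shap_threshold=0.01)
print(selected)
```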
ml_tools/PSO_optimization.py CHANGED
@@ -4,18 +4,17 @@ import xgboost as xgb
 import lightgbm as lgb
 from typing import Literal, Union, Tuple, Dict, Optional
 from copy import deepcopy
-from .utilities import (
-    threshold_binary_values,
-    threshold_binary_values_batch,
-    deserialize_object)
-from .path_manager import sanitize_filename, make_fullpath, list_files_by_extension
 import torch
 from tqdm import trange
+from contextlib import nullcontext
+
+from .serde import deserialize_object
+from .math_utilities import threshold_binary_values, threshold_binary_values_batch
+from .path_manager import sanitize_filename, make_fullpath, list_files_by_extension
 from ._logger import _LOGGER
 from .keys import EnsembleKeys
 from ._script_info import _script_info
 from .SQL import DatabaseManager
-from contextlib import nullcontext
 from .optimization_tools import _save_result
 