dragon-ml-toolbox 1.4.7.tar.gz → 2.0.0.tar.gz

Files changed (25)

{dragon_ml_toolbox-1.4.7 → dragon_ml_toolbox-2.0.0}/LICENSE-THIRD-PARTY.md RENAMED

@@ -5,10 +5,10 @@ This project depends on the following third-party packages. Each is governed by
 - [pandas](https://github.com/pandas-dev/pandas/blob/main/LICENSE)
 - [numpy](https://github.com/numpy/numpy/blob/main/LICENSE.txt)
 - [matplotlib](https://github.com/matplotlib/matplotlib/blob/main/LICENSE/LICENSE)
-- [seaborn](https://github.com/mwaskom/seaborn/blob/main/LICENSE)
+- [seaborn](https://github.com/mwaskom/seaborn/blob/master/LICENSE.md)
 - [statsmodels](https://github.com/statsmodels/statsmodels/blob/main/LICENSE.txt)
-- [ipython](https://github.com/ipython/ipython/blob/main/COPYING.rst)
-- [ipykernel](https://github.com/ipython/ipykernel/blob/main/COPYING.rst)
+- [ipython](https://github.com/ipython/ipython/blob/main/LICENSE)
+- [ipykernel](https://github.com/ipython/ipykernel/blob/main/LICENSE)
 - [notebook](https://github.com/jupyter/notebook/blob/main/LICENSE)
 - [jupyterlab](https://github.com/jupyterlab/jupyterlab/blob/main/LICENSE)
 - [ipywidgets](https://github.com/jupyter-widgets/ipywidgets/blob/main/LICENSE)
@@ -24,5 +24,6 @@ This project depends on the following third-party packages. Each is governed by
 - [openpyxl](https://github.com/chronossc/openpyxl/blob/main/LICENSE)
 - [miceforest](https://github.com/AnotherSamWilson/miceforest/blob/main/LICENSE)
 - [polars](https://github.com/pola-rs/polars/blob/main/LICENSE)
-- [plotnine](https://github.com/has2k1/plotnine/blob/main/LICENSE.txt)
+- [plotnine](https://github.com/has2k1/plotnine/blob/main/LICENSE)
 - [pyswarm](https://pythonhosted.org/pyswarm/#license)
+- [tqdm](https://github.com/tqdm/tqdm/blob/master/LICENSE)

{dragon_ml_toolbox-1.4.7/dragon_ml_toolbox.egg-info → dragon_ml_toolbox-2.0.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: dragon-ml-toolbox
-Version: 1.4.7
+Version: 2.0.0
 Summary: A collection of tools for data science and machine learning projects
 Author-email: Karl Loza <luigiloza@gmail.com>
 License-Expression: MIT
@@ -8,7 +8,7 @@ Project-URL: Homepage, https://github.com/DrAg0n-BoRn/ML_tools
 Project-URL: Changelog, https://github.com/DrAg0n-BoRn/ML_tools/blob/master/CHANGELOG.md
 Classifier: Programming Language :: Python :: 3
 Classifier: Operating System :: OS Independent
-Requires-Python: >=3.9
+Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
 License-File: LICENSE-THIRD-PARTY.md
@@ -32,9 +32,10 @@ Requires-Dist: joblib
 Requires-Dist: xgboost
 Requires-Dist: lightgbm<=4.5.0
 Requires-Dist: shap
+Requires-Dist: tqdm>=4.0
+Requires-Dist: Pillow
 Provides-Extra: pytorch
 Requires-Dist: torch; extra == "pytorch"
-Requires-Dist: Pillow; extra == "pytorch"
 Requires-Dist: torchvision; extra == "pytorch"
 Dynamic: license-file
@@ -49,7 +50,7 @@ A collection of Python utilities for data science and machine learning, structur
 ## Installation
-**Python 3.9+ recommended.**
+**Python 3.10+ recommended.**
 ### Via PyPI
@@ -59,6 +60,16 @@ Install the latest stable release from PyPI:
 pip install dragon-ml-tools
 ```
+### Via GitHub (Editable)
+Clone the repository and install in editable mode with optional dependencies:
+```bash
+git clone https://github.com/DrAg0n-BoRn/ML_tools.git
+cd ML_tools
+pip install -e .
+```
 ### Via conda-forge
 Install from the conda-forge channel:
@@ -66,22 +77,21 @@ Install from the conda-forge channel:
 ```bash
 conda install -c conda-forge dragon-ml-toolbox
 ```
+**Note:** This version is outdated or broken due to dependency incompatibilities.
-#### Optional dependencies
+## Optional dependencies
+**PyTorch**, which provides different builds depending on the **platform** and **hardware acceleration** (e.g., CUDA for NVIDIA GPUs on Linux/Windows, or MPS for Apple Silicon on macOS).
+Install the default CPU-only version with
 ```bash
 pip install dragon-ml-tools[pytorch]
 ```
-### Via GitHub (Editable)
-Clone the repository and install in editable mode with optional dependencies:
+To make use of GPU acceleration use the official PyTorch installation instructions:
-```bash
-git clone https://github.com/DrAg0n-BoRn/ML_tools.git
-cd ML_tools
-pip install -e .
-```
+[PyTorch Instructions](https://pytorch.org/get-started/locally/)
 ## Usage
@@ -101,7 +111,7 @@ ensemble_learning
 handle_excel
 logger
 MICE_imputation
-particle_swarm_optimization
+PSO_optimization
 trainer
 utilities
 VIF_factor

{dragon_ml_toolbox-1.4.7 → dragon_ml_toolbox-2.0.0}/README.md RENAMED

@@ -9,7 +9,7 @@ A collection of Python utilities for data science and machine learning, structur
 ## Installation
-**Python 3.9+ recommended.**
+**Python 3.10+ recommended.**
 ### Via PyPI
@@ -19,6 +19,16 @@ Install the latest stable release from PyPI:
 pip install dragon-ml-tools
 ```
+### Via GitHub (Editable)
+Clone the repository and install in editable mode with optional dependencies:
+```bash
+git clone https://github.com/DrAg0n-BoRn/ML_tools.git
+cd ML_tools
+pip install -e .
+```
 ### Via conda-forge
 Install from the conda-forge channel:
@@ -26,22 +36,21 @@ Install from the conda-forge channel:
 ```bash
 conda install -c conda-forge dragon-ml-toolbox
 ```
+**Note:** This version is outdated or broken due to dependency incompatibilities.
-#### Optional dependencies
+## Optional dependencies
+**PyTorch**, which provides different builds depending on the **platform** and **hardware acceleration** (e.g., CUDA for NVIDIA GPUs on Linux/Windows, or MPS for Apple Silicon on macOS).
+Install the default CPU-only version with
 ```bash
 pip install dragon-ml-tools[pytorch]
 ```
-### Via GitHub (Editable)
-Clone the repository and install in editable mode with optional dependencies:
+To make use of GPU acceleration use the official PyTorch installation instructions:
-```bash
-git clone https://github.com/DrAg0n-BoRn/ML_tools.git
-cd ML_tools
-pip install -e .
-```
+[PyTorch Instructions](https://pytorch.org/get-started/locally/)
 ## Usage
@@ -61,7 +70,7 @@ ensemble_learning
 handle_excel
 logger
 MICE_imputation
-particle_swarm_optimization
+PSO_optimization
 trainer
 utilities
 VIF_factor

{dragon_ml_toolbox-1.4.7 → dragon_ml_toolbox-2.0.0/dragon_ml_toolbox.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: dragon-ml-toolbox
-Version: 1.4.7
+Version: 2.0.0
 Summary: A collection of tools for data science and machine learning projects
 Author-email: Karl Loza <luigiloza@gmail.com>
 License-Expression: MIT
@@ -8,7 +8,7 @@ Project-URL: Homepage, https://github.com/DrAg0n-BoRn/ML_tools
 Project-URL: Changelog, https://github.com/DrAg0n-BoRn/ML_tools/blob/master/CHANGELOG.md
 Classifier: Programming Language :: Python :: 3
 Classifier: Operating System :: OS Independent
-Requires-Python: >=3.9
+Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
 License-File: LICENSE-THIRD-PARTY.md
@@ -32,9 +32,10 @@ Requires-Dist: joblib
 Requires-Dist: xgboost
 Requires-Dist: lightgbm<=4.5.0
 Requires-Dist: shap
+Requires-Dist: tqdm>=4.0
+Requires-Dist: Pillow
 Provides-Extra: pytorch
 Requires-Dist: torch; extra == "pytorch"
-Requires-Dist: Pillow; extra == "pytorch"
 Requires-Dist: torchvision; extra == "pytorch"
 Dynamic: license-file
@@ -49,7 +50,7 @@ A collection of Python utilities for data science and machine learning, structur
 ## Installation
-**Python 3.9+ recommended.**
+**Python 3.10+ recommended.**
 ### Via PyPI
@@ -59,6 +60,16 @@ Install the latest stable release from PyPI:
 pip install dragon-ml-tools
 ```
+### Via GitHub (Editable)
+Clone the repository and install in editable mode with optional dependencies:
+```bash
+git clone https://github.com/DrAg0n-BoRn/ML_tools.git
+cd ML_tools
+pip install -e .
+```
 ### Via conda-forge
 Install from the conda-forge channel:
@@ -66,22 +77,21 @@ Install from the conda-forge channel:
 ```bash
 conda install -c conda-forge dragon-ml-toolbox
 ```
+**Note:** This version is outdated or broken due to dependency incompatibilities.
-#### Optional dependencies
+## Optional dependencies
+**PyTorch**, which provides different builds depending on the **platform** and **hardware acceleration** (e.g., CUDA for NVIDIA GPUs on Linux/Windows, or MPS for Apple Silicon on macOS).
+Install the default CPU-only version with
 ```bash
 pip install dragon-ml-tools[pytorch]
 ```
-### Via GitHub (Editable)
-Clone the repository and install in editable mode with optional dependencies:
+To make use of GPU acceleration use the official PyTorch installation instructions:
-```bash
-git clone https://github.com/DrAg0n-BoRn/ML_tools.git
-cd ML_tools
-pip install -e .
-```
+[PyTorch Instructions](https://pytorch.org/get-started/locally/)
 ## Usage
@@ -101,7 +111,7 @@ ensemble_learning
 handle_excel
 logger
 MICE_imputation
-particle_swarm_optimization
+PSO_optimization
 trainer
 utilities
 VIF_factor

{dragon_ml_toolbox-1.4.7 → dragon_ml_toolbox-2.0.0}/dragon_ml_toolbox.egg-info/SOURCES.txt RENAMED

@@ -8,14 +8,15 @@ dragon_ml_toolbox.egg-info/dependency_links.txt
 dragon_ml_toolbox.egg-info/requires.txt
 dragon_ml_toolbox.egg-info/top_level.txt
 ml_tools/MICE_imputation.py
+ml_tools/PSO_optimization.py
 ml_tools/VIF_factor.py
 ml_tools/__init__.py
+ml_tools/_particle_swarm_optimization.py
 ml_tools/data_exploration.py
 ml_tools/datasetmaster.py
 ml_tools/ensemble_learning.py
 ml_tools/handle_excel.py
 ml_tools/logger.py
-ml_tools/particle_swarm_optimization.py
 ml_tools/pytorch_models.py
 ml_tools/trainer.py
 ml_tools/utilities.py

{dragon_ml_toolbox-1.4.7 → dragon_ml_toolbox-2.0.0}/dragon_ml_toolbox.egg-info/requires.txt RENAMED

@@ -18,8 +18,9 @@ joblib
 xgboost
 lightgbm<=4.5.0
 shap
+tqdm>=4.0
+Pillow
 [pytorch]
 torch
-Pillow
 torchvision
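
The hunk above moves `Pillow` out of the `[pytorch]` extra and into the unconditional requirements, alongside the new `tqdm>=4.0`. As a rough illustration of how a `requires.txt` in this layout separates core requirements from extras, here is a minimal sketch. The helper name `split_requires` and the `"core"` key are hypothetical, and the sample text contains only the lines visible in this hunk, not the full file.

```python
def split_requires(text: str) -> dict[str, list[str]]:
    """Group requirement lines by section; lines before any [section] are 'core'."""
    groups: dict[str, list[str]] = {"core": []}
    current = "core"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate sections
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # e.g. "[pytorch]" -> "pytorch"
            groups.setdefault(current, [])
        else:
            groups[current].append(line)
    return groups


# Only the lines shown in the 2.0.0 side of the hunk (the real file is longer).
NEW_REQUIRES = """joblib
xgboost
lightgbm<=4.5.0
shap
tqdm>=4.0
Pillow

[pytorch]
torch
torchvision
"""

groups = split_requires(NEW_REQUIRES)
print(groups["core"])
print(groups["pytorch"])
```

Under this reading, a plain `pip install` of 2.0.0 would pull in `Pillow` and `tqdm`, while `torch` and `torchvision` remain opt-in through the `pytorch` extra.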

{dragon_ml_toolbox-1.4.7 → dragon_ml_toolbox-2.0.0}/ml_tools/MICE_imputation.py RENAMED

@@ -3,7 +3,7 @@ import miceforest as mf
 import os
 import matplotlib.pyplot as plt
 import numpy as np
-from ml_tools.utilities import load_dataframe, list_csv_paths, sanitize_filename, _script_info, merge_dataframes, save_dataframe, threshold_binary_values
+from .utilities import load_dataframe, list_csv_paths, sanitize_filename, _script_info, merge_dataframes, save_dataframe, threshold_binary_values
 from plotnine import ggplot, labs, theme, element_blank # type: ignore
 from typing import Optional

dragon_ml_toolbox-2.0.0/ml_tools/PSO_optimization.py ADDED

@@ -0,0 +1,490 @@
+import numpy as np
+import os
+import xgboost as xgb
+import lightgbm as lgb
+from sklearn.ensemble import HistGradientBoostingRegressor
+from sklearn.base import ClassifierMixin
+from typing import Literal, Union, Tuple, Dict, Optional
+import pandas as pd
+from copy import deepcopy
+from .utilities import _script_info, threshold_binary_values, threshold_binary_values_batch, deserialize_object, list_files_by_extension, save_dataframe
+import torch
+from tqdm import trange
+__all__ = [
+    "ObjectiveFunction",
+    "multiple_objective_functions_from_dir",
+    "run_pso"
+]
+class ObjectiveFunction():
+    """
+    Callable objective function designed for optimizing continuous outputs from tree-based regression models.
+    The target serialized file (joblib) must include a trained tree-based 'model'. Additionally 'feature_names' and 'target_name' will be parsed if present.
+    Parameters
+    ----------
+    trained_model_path : str
+        Path to a serialized model (joblib) compatible with scikit-learn-like `.predict`.
+    add_noise : bool
+        Whether to apply multiplicative noise to the input features during evaluation.
+    task : (Literal["maximization", "minimization"])
+        Whether to maximize or minimize the target.
+    binary_features : int
+        Number of binary features located at the END of the feature vector. Model should be trained with continuous features first, followed by binary.
+    """
+    def __init__(self, trained_model_path: str, add_noise: bool, task: Literal["maximization", "minimization"], binary_features: int) -> None:
+        self.binary_features = binary_features
+        self.is_hybrid = False if binary_features <= 0 else True
+        self.use_noise = add_noise
+        self._artifact = deserialize_object(trained_model_path, verbose=False, raise_on_error=True)
+        self.model = self._get_from_artifact('model')
+        self.feature_names: Optional[list[str]] = self._get_from_artifact('feature_names') # type: ignore
+        self.target_name: Optional[str] = self._get_from_artifact('target_name') # type: ignore
+        self.task = task
+        self.check_model() # check for classification models and None values
+    def __call__(self, features_array: np.ndarray) -> np.ndarray:
+        """
+        Batched evaluation for PSO. Accepts 2D array (n_samples, n_features).
+        Applies optional noise and hybrid binary thresholding.
+        Returns
+        -------
+        np.ndarray
+            1D array with length n_samples containing predicted target values.
+        """
+        assert features_array.ndim == 2, f"Expected 2D array, got shape {features_array.shape}"
+        # Apply noise if enabled
+        if self.use_noise:
+            features_array = self.add_noise(features_array)
+        # Apply binary thresholding if enabled
+        if self.is_hybrid:
+            features_array = threshold_binary_values_batch(features_array, self.binary_features)
+        # Ensure correct type
+        features_array = features_array.astype(np.float32)
+        # Evaluate
+        result = self.model.predict(features_array) # type: ignore
+        # Flip sign if maximizing
+        if self.task == "maximization":
+            return -result
+        return result
+    def add_noise(self, features_array: np.ndarray) -> np.ndarray:
+        """
+        Apply multiplicative noise to input feature batch (2D).
+        Binary features (if present) are excluded from noise injection.
+        Parameters
+        ----------
+        features_array : np.ndarray
+            Input array of shape (batch_size, n_features)
+        Returns
+        -------
+        np.ndarray
+            Noised array of same shape
+        """
+        assert features_array.ndim == 2, "Expected 2D array for batch noise injection"
+        if self.binary_features > 0:
+            split_idx = -self.binary_features
+            cont_part = features_array[:, :split_idx]
+            bin_part = features_array[:, split_idx:]
+            noise = np.random.uniform(0.95, 1.05, size=cont_part.shape)
+            cont_noised = cont_part * noise
+            return np.hstack([cont_noised, bin_part])
+        else:
+            noise = np.random.uniform(0.95, 1.05, size=features_array.shape)
+            return features_array * noise
+    def check_model(self):
+        if isinstance(self.model, ClassifierMixin) or isinstance(self.model, xgb.XGBClassifier) or isinstance(self.model, lgb.LGBMClassifier):
+            raise ValueError(f"[Model Check Failed] ❌\nThe loaded model ({type(self.model).__name__}) is a Classifier.\nOptimization is not suitable for standard classification tasks.")
+        if self.model is None:
+            raise ValueError("Loaded model is None")
+    def _get_from_artifact(self, key: str):
+        if self._artifact is None:
+            raise TypeError("Load model error")
+        val = self._artifact.get(key)
+        if key == "feature_names":
+            result = val if isinstance(val, list) and val else None
+        else:
+            result = val if val else None
+        return result
+    def __repr__(self):
+        return (f"<ObjectiveFunction(model={type(self.model).__name__}, use_noise={self.use_noise}, is_hybrid={self.is_hybrid}, task='{self.task}')>")
+def multiple_objective_functions_from_dir(directory: str, add_noise: bool, task: Literal["maximization", "minimization"], binary_features: int):
+    """
+    Loads multiple objective functions from serialized models in the given directory.
+    Each `.joblib` file is loaded and wrapped as an `ObjectiveFunction` instance. Returns a list of such instances along with their corresponding names.
+    Parameters:
+        directory (str) : Path to the directory containing `.joblib` files (serialized models).
+        add_noise (bool) : Whether to apply multiplicative noise to the input features during evaluation.
+        task (Literal["maximization", "minimization"]) : Defines the nature of the optimization task.
+        binary_features (int) : Number of binary features expected by each objective function.
+    Returns:
+        (tuple[list[ObjectiveFunction], list[str]]) : A tuple containing:
+            - list of `ObjectiveFunction` instances.
+            - list of corresponding filenames.
+    """
+    objective_functions = list()
+    objective_function_names = list()
+    for file_name, file_path in list_files_by_extension(directory=directory, extension='joblib').items():
+        current_objective = ObjectiveFunction(trained_model_path=file_path,
+                                              add_noise=add_noise,
+                                              task=task,
+                                              binary_features=binary_features)
+        objective_functions.append(current_objective)
+        objective_function_names.append(file_name)
+    return objective_functions, objective_function_names
+def _set_boundaries(lower_boundaries: list[float], upper_boundaries: list[float]):
+    assert len(lower_boundaries) == len(upper_boundaries), "Lower and upper boundaries must have the same length."
+    assert len(lower_boundaries) >= 1, "At least one boundary pair is required."
+    lower = np.array(lower_boundaries)
+    upper = np.array(upper_boundaries)
+    return lower, upper
+def _set_feature_names(size: int, names: Union[list[str], None]):
+    if names is None:
+        return [str(i) for i in range(1, size+1)]
+    else:
+        assert len(names) == size, "Length of the feature names list does not match the number of features"
+        return names
+def _save_results(*dicts, save_dir: str, target_name: str):
+    combined_dict = dict()
+    for single_dict in dicts:
+        combined_dict.update(single_dict)
+    df = pd.DataFrame(combined_dict)
+    save_dataframe(df=df, save_dir=save_dir, filename=f"Optimization_{target_name}")
+def run_pso(lower_boundaries: list[float],
+            upper_boundaries: list[float],
+            objective_function: ObjectiveFunction,
+            save_results_dir: str,
+            auto_binary_boundaries: bool=True,
+            target_name: Union[str, None]=None,
+            feature_names: Union[list[str], None]=None,
+            swarm_size: int=200,
+            max_iterations: int=1000,
+            random_state: int=101,
+            post_hoc_analysis: Optional[int]=3) -> Tuple[Dict[str, float | list[float]], Dict[str, float | list[float]]]:
+    """
+    Executes Particle Swarm Optimization (PSO) to optimize a given objective function and saves the results as a CSV file.
+    Parameters
+    ----------
+    lower_boundaries : list[float]
+        Lower bounds for each feature in the search space (as many as features expected by the model).
+    upper_boundaries : list[float]
+        Upper bounds for each feature in the search space (as many as features expected by the model).
+    objective_function : ObjectiveFunction
+        A callable object encapsulating a tree-based regression model.
+    save_results_dir : str
+        Directory path to save the results CSV file.
+    auto_binary_boundaries : bool
+        Use `ObjectiveFunction.binary_features` to append as many binary boundaries as needed to `lower_boundaries` and `upper_boundaries` automatically.
+    target_name : str or None, optional
+        Name of the target variable. If None, attempts to retrieve from the ObjectiveFunction object.
+    feature_names : list[str] or None, optional
+        List of feature names. If None, attempts to retrieve from the ObjectiveFunction or generate generic names.
+    swarm_size : int
+        Number of particles in the swarm.
+    max_iterations : int
+        Maximum number of iterations for the optimization algorithm.
+    post_hoc_analysis : int or None
+        If specified, runs the optimization multiple times to perform post hoc analysis. The value indicates the number of repetitions.
+    Returns
+    -------
+    Tuple[Dict[str, float | list[float]], Dict[str, float | list[float]]]
+        If `post_hoc_analysis` is None, returns two dictionaries:
+            - feature_names: Feature values (after inverse scaling) that yield the best result.
+            - target_name: Best result obtained for the target variable.
+        If `post_hoc_analysis` is an integer, returns two dictionaries:
+            - feature_names: Lists of best feature values (after inverse scaling) for each repetition.
+            - target_name: List of best target values across repetitions.
+    Notes
+    -----
+    - PSO minimizes the objective function by default; if maximization is desired, it should be handled inside the ObjectiveFunction.
+    """
+    # Select device
+    if torch.cuda.is_available():
+        device = torch.device("cuda")
+    elif torch.backends.mps.is_available():
+        device = torch.device("mps")
+    else:
+        device = torch.device("cpu")
+    print(f"[PSO] Using device: '{device}'")
+    # set local deep copies to prevent in place list modification
+    local_lower_boundaries = deepcopy(lower_boundaries)
+    local_upper_boundaries = deepcopy(upper_boundaries)
+    # Append binary boundaries
+    binary_number = objective_function.binary_features
+    if auto_binary_boundaries and binary_number > 0:
+        local_lower_boundaries.extend([0] * binary_number)
+        local_upper_boundaries.extend([1] * binary_number)
+    # Set the total length of features
+    size_of_features = len(local_lower_boundaries)
+    lower, upper = _set_boundaries(local_lower_boundaries, local_upper_boundaries)
+    # feature names
+    if feature_names is None and objective_function.feature_names is not None:
+        feature_names = objective_function.feature_names
+    names = _set_feature_names(size=size_of_features, names=feature_names)
+    # target name
+    if target_name is None and objective_function.target_name is not None:
+        target_name = objective_function.target_name
+    if target_name is None:
+        target_name = "Target"
+    arguments = {
+            "func":objective_function,
+            "lb": lower,
+            "ub": upper,
+            "device": device,
+            "swarmsize": swarm_size,
+            "maxiter": max_iterations,
+            "particle_output": False,
+    }
+    os.makedirs(save_results_dir, exist_ok=True)
+    if post_hoc_analysis is None or post_hoc_analysis == 1:
+        arguments.update({"seed": random_state})
+        best_features, best_target, *_ = _pso(**arguments)
+        # best_features, best_target, _particle_positions, _target_values_per_position = _pso(**arguments)
+        # flip best_target if maximization was used
+        if objective_function.task == "maximization":
+            best_target = -best_target
+        # threshold binary features
+        best_features_threshold = threshold_binary_values(best_features, binary_number)
+        # name features
+        best_features_named = {name: value for name, value in zip(names, best_features_threshold)}
+        best_target_named = {target_name: best_target}
+        # save results
+        _save_results(best_features_named, best_target_named, save_dir=save_results_dir, target_name=target_name)
+        return best_features_named, best_target_named
+    else:
+        all_best_targets = list()
+        all_best_features = [[] for _ in range(size_of_features)]
+        for _ in range(post_hoc_analysis):
+            best_features, best_target, *_ = _pso(**arguments)
+            # best_features, best_target, _particle_positions, _target_values_per_position = _pso(**arguments)
+            # flip best_target if maximization was used
+            if objective_function.task == "maximization":
+                best_target = -best_target
+            # threshold binary features
+            best_features_threshold = threshold_binary_values(best_features, binary_number)
+            for i, best_feature in enumerate(best_features_threshold):
+                all_best_features[i].append(best_feature)
+            all_best_targets.append(best_target)
+        # name features
+        all_best_features_named = {name: list_values for name, list_values in zip(names, all_best_features)}
+        all_best_targets_named = {target_name: all_best_targets}
+        # save results
+        _save_results(all_best_features_named, all_best_targets_named, save_dir=save_results_dir, target_name=target_name)
+        return all_best_features_named, all_best_targets_named # type: ignore
+def info():
+    _script_info(__all__)
+def _pso(func: ObjectiveFunction,
+         lb: np.ndarray,
+         ub: np.ndarray,
+         device: torch.device,
+         swarmsize=100,
+         maxiter=100,
+         omega = 0.729,     # Clerc and Kennedy’s constriction coefficient
+         phip = 1.49445,    # Clerc and Kennedy’s constriction coefficient
+         phig = 1.49445,    # Clerc and Kennedy’s constriction coefficient
+         tolerance = 1e-8,
+         particle_output=False,
+         seed: Optional[int] = None):
+    """
+    Internal PSO implementation using PyTorch tensors for acceleration on CUDA or MPS devices.
+    Parameters
+    ----------
+    func : callable
+        Callable objective function with batched evaluation support. Must accept a 2D NumPy array
+        of shape (n_particles, n_features) and return a 1D NumPy array of shape (n_particles,).
+    lb : np.ndarray
+        Lower bounds for each feature (1D array of length n_features).
+    ub : np.ndarray
+        Upper bounds for each feature (1D array of length n_features).
+    swarmsize : int
+        Number of particles in the swarm (i.e., batch size per iteration).
+    maxiter : int
+        Number of iterations to perform (i.e., optimization steps).
+    omega : float
+        Inertia weight controlling velocity retention across iterations.
+        - Typical range: [0.4, 0.9]
+        - Lower values encourage convergence, higher values promote exploration.
+        - The default value (0.729) comes from Clerc & Kennedy's constriction method.
+    phip : float
+        Cognitive acceleration coefficient.
+        - Controls how strongly particles are pulled toward their own best-known positions.
+        - Typical range: [0.5, 2.5]
+        - Default from Clerc & Kennedy's recommended setting.
+    phig : float
+        Social acceleration coefficient.
+        - Controls how strongly particles are pulled toward the swarm's global best.
+        - Typical range: [0.5, 2.5]
+        - Default from Clerc & Kennedy's recommended setting.
+    particle_output : bool, default=False
+        If True, returns the full history of particle positions and objective scores at each iteration.
+    seed : int or None, default=None
+        Random seed for reproducibility. If None, no seed is set and runs are not reproducible.
+    Returns
+    -------
+    best_position : np.ndarray
+        1D array of shape (n_features,) representing the best solution found.
+    best_score : float
+        Objective value at `best_position`.
+    history_positions : list[np.ndarray], optional
+        Only returned if `particle_output=True`. List of particle positions per iteration.
+        Each element has shape (swarmsize, n_features).
+    history_scores : list[np.ndarray], optional
+        Only returned if `particle_output=True`. List of objective scores per iteration.
+        Each element has shape (swarmsize,).
+    """
+    if seed is not None:
+        torch.manual_seed(seed)
+    ndim = len(lb)
+    lb_t = torch.tensor(lb, dtype=torch.float32, device=device, requires_grad=False)
+    ub_t = torch.tensor(ub, dtype=torch.float32, device=device, requires_grad=False)
+    # Initialize positions and velocities
+    r = torch.rand((swarmsize, ndim), device=device, requires_grad=False)
+    positions = lb_t + r * (ub_t - lb_t)  # shape: (swarmsize, ndim)
+    velocities = torch.zeros_like(positions, requires_grad=False)
+    # Initialize best positions and scores
+    personal_best_positions = positions.clone()
+    personal_best_scores = torch.full((swarmsize,), float('inf'), device=device, requires_grad=False)
+    global_best_score = float('inf')
+    global_best_position = torch.zeros(ndim, device=device, requires_grad=False)
+    # History (optional)
+    if particle_output:
+        history_positions = []
+        history_scores = []
+    # Main loop
+    previous_best_score = float('inf')
+    progress = trange(maxiter, desc="PSO", unit="iter", leave=True) #tqdm bar
+    with torch.no_grad():
+        for i in progress:
+            # Evaluate objective for all particles
+            positions_np = positions.detach().cpu().numpy() # shape: (swarmsize, n_features)
+            scores_np = func(positions_np)  # shape: (swarmsize,)
+            scores = torch.tensor(scores_np, device=device, dtype=torch.float32)
+            # Update personal bests
+            improved = scores < personal_best_scores
+            personal_best_scores = torch.where(improved, scores, personal_best_scores)
+            personal_best_positions = torch.where(improved[:, None], positions, personal_best_positions)
+            # Update global best
+            min_score, min_idx = torch.min(personal_best_scores, dim=0)
+            if min_score < global_best_score:
+                global_best_score = min_score.item()
+                global_best_position = personal_best_positions[min_idx].clone()
+                # Early stopping criteria
+                if abs(previous_best_score - global_best_score) < tolerance:
+                    progress.set_description(f"PSO (early stop at iteration {i+1})")
+                    break
+                previous_best_score = global_best_score
+            # Optional: track history for debugging/visualization
+            if particle_output:
+                history_positions.append(positions.detach().cpu().numpy())
+                history_scores.append(scores_np)
+            # Velocity update
+            rp = torch.rand((swarmsize, ndim), device=device, requires_grad=False)
+            rg = torch.rand((swarmsize, ndim), device=device, requires_grad=False)
+            cognitive = phip * rp * (personal_best_positions - positions)
+            social = phig * rg * (global_best_position - positions)
+            velocities = omega * velocities + cognitive + social
+            # Position update
+            positions = positions + velocities
+            # Clamp to search space bounds
+            positions = torch.max(positions, lb_t)
+            positions = torch.min(positions, ub_t)
+    # Move to CPU and convert to NumPy
+    best_position = global_best_position.detach().cpu().numpy()
+    best_score = global_best_score
+    if particle_output:
+        return best_position, best_score, history_positions, history_scores
+    else:
+        return best_position, best_score
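
The core of `_pso` above is the batched personal-best/global-best bookkeeping followed by the velocity and position updates. The following is a minimal NumPy sketch of that loop, not the package's implementation: it drops the PyTorch device handling, the `tqdm` progress bar, the tolerance-based early stop, and the history output. The function name `pso_minimize` and the `sphere` test objective are hypothetical; the default coefficients are the ones shown in the diff.

```python
import numpy as np


def pso_minimize(func, lb, ub, swarmsize=50, maxiter=200,
                 omega=0.729, phip=1.49445, phig=1.49445, seed=0):
    """Minimise `func`, which maps (n_particles, n_features) -> (n_particles,)."""
    rng = np.random.default_rng(seed)
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    ndim = lb.size

    # Uniform initial positions inside the bounds, zero initial velocities.
    positions = lb + rng.random((swarmsize, ndim)) * (ub - lb)
    velocities = np.zeros_like(positions)
    pbest_pos = positions.copy()
    pbest_score = np.full(swarmsize, np.inf)
    gbest_pos = positions[0].copy()
    gbest_score = np.inf

    for _ in range(maxiter):
        scores = func(positions)  # one batched evaluation per iteration

        # Personal bests: keep the better of old and new, per particle.
        improved = scores < pbest_score
        pbest_score = np.where(improved, scores, pbest_score)
        pbest_pos = np.where(improved[:, None], positions, pbest_pos)

        # Global best: best personal best seen so far.
        best_idx = int(np.argmin(pbest_score))
        if pbest_score[best_idx] < gbest_score:
            gbest_score = float(pbest_score[best_idx])
            gbest_pos = pbest_pos[best_idx].copy()

        # Velocity update: inertia + cognitive pull + social pull.
        rp = rng.random((swarmsize, ndim))
        rg = rng.random((swarmsize, ndim))
        velocities = (omega * velocities
                      + phip * rp * (pbest_pos - positions)
                      + phig * rg * (gbest_pos - positions))

        # Position update, clamped to the search-space bounds.
        positions = np.clip(positions + velocities, lb, ub)

    return gbest_pos, gbest_score


def sphere(x):
    """Toy objective with its minimum (0) at the origin."""
    return np.sum(x ** 2, axis=1)


best_pos, best_score = pso_minimize(sphere, lb=[-5.0, -5.0], ub=[5.0, 5.0])
print(best_pos, best_score)
```

Because the loop only minimises, a maximisation task has to negate the objective before it reaches the optimiser and negate the best score afterwards, which is what `ObjectiveFunction.__call__` and `run_pso` do in the diff.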

dragon_ml_toolbox-1.4.7/ml_tools/particle_swarm_optimization.py → dragon_ml_toolbox-2.0.0/ml_tools/_particle_swarm_optimization.py RENAMED

@@ -1,6 +1,10 @@
+"""
+DEPRECATED
+"""
 import numpy as np
 import os
-import joblib
 import xgboost as xgb
 import lightgbm as lgb
 from sklearn.ensemble import HistGradientBoostingClassifier, HistGradientBoostingRegressor

{dragon_ml_toolbox-1.4.7 → dragon_ml_toolbox-2.0.0}/ml_tools/data_exploration.py RENAMED

@@ -7,7 +7,7 @@ from IPython.display import clear_output
 import time
 from typing import Union, Literal, Dict, Tuple, List
 import os
-from ml_tools.utilities import sanitize_filename, _script_info
+from .utilities import sanitize_filename, _script_info
 import re

{dragon_ml_toolbox-1.4.7 → dragon_ml_toolbox-2.0.0}/ml_tools/ensemble_learning.py RENAMED

@@ -7,7 +7,6 @@ from matplotlib import rcdefaults
 import os
 from typing import Literal, Union, Optional, Iterator, Tuple
-import joblib
 from imblearn.over_sampling import ADASYN, SMOTE, RandomOverSampler
 from imblearn.under_sampling import RandomUnderSampler

{dragon_ml_toolbox-1.4.7 → dragon_ml_toolbox-2.0.0}/ml_tools/handle_excel.py RENAMED

@@ -2,7 +2,7 @@ import os
 from openpyxl import load_workbook, Workbook
 import pandas as pd
 from typing import List, Optional
-from utilities import _script_info, sanitize_filename
+from .utilities import _script_info, sanitize_filename
 __all__ = [

{dragon_ml_toolbox-1.4.7 → dragon_ml_toolbox-2.0.0}/ml_tools/logger.py RENAMED

@@ -5,7 +5,7 @@ import pandas as pd
 from openpyxl.styles import Font, PatternFill
 import traceback
 import json
-from ml_tools.utilities import sanitize_filename, _script_info
+from .utilities import sanitize_filename, _script_info
 __all__ = [

{dragon_ml_toolbox-1.4.7 → dragon_ml_toolbox-2.0.0}/ml_tools/utilities.py RENAMED

@@ -21,6 +21,7 @@ __all__ = [
     "normalize_mixed_list",
     "sanitize_filename",
     "threshold_binary_values",
+    "threshold_binary_values_batch",
     "serialize_object",
     "deserialize_object",
     "distribute_datasets_by_target"
@@ -356,6 +357,39 @@ def threshold_binary_values(
         return tuple(result)
     else:
         return result
+def threshold_binary_values_batch(
+    input_array: np.ndarray,
+    binary_values: int
+) -> np.ndarray:
+    """
+    Threshold the last `binary_values` columns of a 2D NumPy array to binary {0,1} using 0.5 cutoff.
+    Parameters
+    ----------
+    input_array : np.ndarray
+        2D array with shape (batch_size, n_features).
+    binary_values : int
+        Number of binary features located at the END of each row.
+    Returns
+    -------
+    np.ndarray
+        Thresholded array, same shape as input.
+    """
+    assert input_array.ndim == 2, f"Expected 2D array, got {input_array.ndim}D"
+    batch_size, total_features = input_array.shape
+    assert 0 <= binary_values <= total_features, "binary_values out of valid range"
+    if binary_values == 0:
+        return input_array.copy()
+    cont_part = input_array[:, :-binary_values] if binary_values < total_features else np.empty((batch_size, 0))
+    bin_part = input_array[:, -binary_values:] > 0.5
+    bin_part = bin_part.astype(np.int32)
+    return np.hstack([cont_part, bin_part])
 def serialize_object(obj: Any, save_dir: str, filename: str, verbose: bool=True, raise_on_error: bool=False) -> Optional[str]:
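
Note on the hunk above: the behavior of the new `threshold_binary_values_batch` is easiest to see on a small array. The sketch below restates the added function's logic in a self-contained form (copied from the diff, with comments added) and runs it on made-up sample data. One detail worth noticing, assuming NumPy's usual type promotion in `np.hstack`: the binary columns are cast to `int32`, but the stacked result takes the floating dtype of the continuous part.

```python
import numpy as np

def threshold_binary_values_batch(input_array: np.ndarray, binary_values: int) -> np.ndarray:
    """Threshold the last `binary_values` columns of a 2D array to {0, 1} at 0.5."""
    assert input_array.ndim == 2, f"Expected 2D array, got {input_array.ndim}D"
    batch_size, total_features = input_array.shape
    assert 0 <= binary_values <= total_features, "binary_values out of valid range"
    if binary_values == 0:
        return input_array.copy()
    # Leading columns are left untouched; empty when every column is binary
    cont_part = (input_array[:, :-binary_values]
                 if binary_values < total_features
                 else np.empty((batch_size, 0)))
    # Strictly greater than 0.5 maps to 1, everything else to 0
    bin_part = (input_array[:, -binary_values:] > 0.5).astype(np.int32)
    return np.hstack([cont_part, bin_part])

# Sample data (illustrative): one continuous column followed by two binary columns
batch = np.array([[0.2, 0.7, 0.4],
                  [1.5, 0.5, 0.9]])
out = threshold_binary_values_batch(batch, binary_values=2)
# out is [[0.2, 1.0, 0.0], [1.5, 0.0, 1.0]] with dtype float64
```

A value of exactly 0.5 maps to 0 because the comparison is strict.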

{dragon_ml_toolbox-1.4.7 → dragon_ml_toolbox-2.0.0}/pyproject.toml RENAMED

@@ -1,12 +1,12 @@
 [project]
 name = "dragon-ml-toolbox"
-version = "1.4.7"
+version = "2.0.0"
 description = "A collection of tools for data science and machine learning projects"
 authors = [
     { name = "Karl Loza", email = "luigiloza@gmail.com" }
 ]
 readme = "README.md"
-requires-python = ">=3.9"
+requires-python = ">=3.10"
 license = "MIT"
 classifiers = [
     "Programming Language :: Python :: 3",
@@ -32,7 +32,9 @@ dependencies = [
     "joblib",
     "xgboost",
     "lightgbm<=4.5.0",
-    "shap"
+    "shap",
+    "tqdm>=4.0",
+    "Pillow"
 ]
 [project.urls]
@@ -42,7 +44,6 @@ Changelog = "https://github.com/DrAg0n-BoRn/ML_tools/blob/master/CHANGELOG.md"
 [project.optional-dependencies]
 pytorch = [
     "torch",
-    "Pillow",
     "torchvision"
 ]