PyPI - panelsplit - Versions diffs - 1.1.1__tar.gz → 2.0.4.dev0__tar.gz - Mend

panelsplit-1.1.1.tar.gz → panelsplit-2.0.4.dev0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (53)

{panelsplit-1.1.1 → panelsplit-2.0.4.dev0}/.github/workflows/ci.yml RENAMED

@@ -14,7 +14,7 @@ jobs:
       strategy:
         matrix:
-          python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "3.14"]
+          python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
         fail-fast: true
       steps:

{panelsplit-1.1.1 → panelsplit-2.0.4.dev0}/.github/workflows/lint.yml RENAMED

@@ -16,7 +16,7 @@ jobs:
       - name: Setup Python
         uses: actions/setup-python@v4
         with:
-          python-version: '3.9'
+          python-version: '3.11'
       - name: Checkout
         uses: actions/checkout@v3
       - name: Install uv
@@ -26,8 +26,35 @@ jobs:
       - name: Install dependencies
         run: uv sync --dev
+      - name: Install mypy (match pre-commit)
+        run: uv run pip install mypy==1.18.2
       - name: Run mypy
-        run: uv run mypy panelsplit
+        run: |
+          uv run mypy panelsplit \
+            --disallow-untyped-defs \
+            --disallow-incomplete-defs \
+            --ignore-missing-imports
+  numpydoc:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Setup Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.11'
+      - name: Checkout
+        uses: actions/checkout@v3
+      - name: Install uv
+        run: |
+          curl -LsSf https://astral.sh/uv/install.sh | sh
+          echo "$HOME/.local/bin" >> $GITHUB_PATH
+      - name: Install dependencies
+        run: uv sync --dev
+      - name: Run numpydoc
+        run: uv run pre-commit run numpydoc-validation --all-files
   ruff:
     runs-on: ubuntu-latest
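The lint job now runs mypy (pinned to 1.18.2 to match the pre-commit hook) with `--disallow-untyped-defs` and `--disallow-incomplete-defs`; `--ignore-missing-imports` only silences errors about third-party packages that ship no type information. A minimal illustration of what the two strictness flags reject and accept; the function names are hypothetical and not part of panelsplit:

```python
from typing import Optional

# Rejected by --disallow-untyped-defs: no annotations at all.
#     def scale(x, factor):
#         return x * factor
#
# Rejected by --disallow-incomplete-defs (and by --disallow-untyped-defs):
# only some parameters are annotated and the return type is missing.
#     def scale(x: float, factor):
#         return x * factor


def scale(x: float, factor: float = 2.0) -> float:
    """Fully annotated, so it passes both checks."""
    return x * factor


def first_or_none(values: list[int]) -> Optional[int]:
    """Return the first element, or None for an empty list."""
    return values[0] if values else None
```

Uncommenting either rejected variant and running mypy with the same flags as the workflow should produce a missing-annotation error for it.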

panelsplit-2.0.4.dev0/.gitignore ADDED

@@ -0,0 +1,28 @@
+# Ignore compiled Python files
+__pycache__/
+*.pyc
+*.pyo
+*.pyd
+*.pyirc
+*.mypy_cache
+*.pytest_cache
+*.ruff_cache
+*docs/
+*typecov/
+dist/
+.DS_Store
+debug*
+panelsplit.egg-info
+.venv
+.python-version
+# drafts
+examples/Intro to PanelSplit.ipynb
+examples/PanelSplit explanation.ipynb
+htmlcov
+.coverage
+.vscode/

{panelsplit-1.1.1 → panelsplit-2.0.4.dev0}/.pre-commit-config.yaml RENAMED

@@ -23,3 +23,18 @@ repos:
         args: [--fix]
       - id: ruff-format
         types_or: [python, pyi]
+  - repo: https://github.com/numpy/numpydoc
+    rev: v1.10.0
+    hooks:
+      - id: numpydoc-validation
+        files: ^panelsplit/.*\.py$
+        exclude: ^panelsplit/(_|.*/_)
+  - repo: https://github.com/pre-commit/mirrors-mypy
+    rev: v1.18.2  # Use the sha / tag you want to point at
+    hooks:
+      -   id: mypy
+          files: panelsplit
+          args: [--disallow-untyped-defs, --disallow-incomplete-defs, --ignore-missing-imports]
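The new numpydoc hook only validates files whose path matches `files: ^panelsplit/.*\.py$` and does not match `exclude: ^panelsplit/(_|.*/_)`, so modules and subpackages whose names start with an underscore (including `__init__.py`) are skipped. The sketch below evaluates the two patterns on repository-relative paths, assuming pre-commit applies them with Python's `re.search`; the paths are illustrative, not a listing of the package:

```python
import re

# Patterns copied from the numpydoc-validation hook in .pre-commit-config.yaml.
FILES = re.compile(r"^panelsplit/.*\.py$")
EXCLUDE = re.compile(r"^panelsplit/(_|.*/_)")


def is_validated(path: str) -> bool:
    """True if the hook would run on this repository-relative path."""
    return FILES.search(path) is not None and EXCLUDE.search(path) is None


for path in [
    "panelsplit/application.py",
    "panelsplit/utils/utils.py",
    "panelsplit/__init__.py",
    "panelsplit/utils/_helpers.py",
    "tests/test_application.py",
]:
    print(f"{path}: {is_validated(path)}")
```

Only the first two paths are validated: the third and fourth hit the underscore exclusion, and the last is outside `panelsplit/`.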

panelsplit-2.0.4.dev0/CHANGELOG.md ADDED

@@ -0,0 +1,40 @@
+# Changelog
+All notable changes to panelsplit will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [2.0.0] - 2025-12-31
+### Changed
+- Switched from `pydoclint` to `numpydoc` for better docstring readability.
+- `SequentialCVPipeline` made more sklearn-like (e.g. added functionality for `set_params` and `get_params`). Its arguments were also made to match sklearn; `steps` is a list of tuples of length 2 (name, estimator) instead of (name, estimator, cv). CVs were moved to a separate argument, `cv_steps`
+### Added
+- `panelsplit.model_selection`. This includes `RandomizedSearch` and `GridSearch` in order to allow for hyper-parameter searching with `SequentialCVPipeline`
+- `panelsplit.metrics`. This module includes metrics which work with the `model_selection` module.
+## [1.1.2] - 2025-10-28
+### Added
+- Consistent type hints with more restrictions (E.g. `--disallow-untyped-defs` `--disallow-incomplete-defs`), addressing [#85](https://github.com/4Freye/panelsplit/issues/85)
+- Consistent docstrings addressing [#94](https://github.com/4Freye/panelsplit/issues/94)
+- mypy and pydoclint checks on `pre-commit-config.yaml` and `.github/workflows/lint.yml`
+## [1.1.1] - 2025-10-23
+### Changed
+- Migrated from boolean indexing to purely integer-based indexing, as mentioned in [#86](https://github.com/4Freye/panelsplit/issues/86)
+### Added
+- Consistent type hints throughout the Python codebase, addressing [#85](https://github.com/4Freye/panelsplit/issues/85)
+- mypy to CI, addressing [#85](https://github.com/4Freye/panelsplit/issues/85)
+## [1.1.0] - 2025-10-21
+### Added
+- Support for more DataFrame types (e.g. polars) via narwhals
+## [1.0.4] - 2025-10-16
+### Added
+- `CHANGELOG.md` - marking changes to the project
+- Automation of publishing to pypi
+- Dynamic versioning
+- Automation of GitHub Releases via `CHANGELOG.md`
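Per the 2.0.0 entry, `SequentialCVPipeline` now takes `steps` as `(name, estimator)` pairs, with the cross-validators moved to a separate `cv_steps` argument. A hypothetical helper for migrating 1.x-style three-element steps might look like this. It assumes `cv_steps` is a list aligned one-to-one with `steps`, carrying over whatever each step had in its third slot; that format is not spelled out in the changelog and should be checked against the panelsplit documentation:

```python
from typing import Any, List, Sequence, Tuple


# Hypothetical migration helper; not part of panelsplit.
def split_legacy_steps(
    legacy_steps: Sequence[Tuple[Any, ...]],
) -> Tuple[List[Tuple[str, Any]], List[Any]]:
    """Split 1.x (name, estimator, cv) triples into 2.x-style steps and cv_steps."""
    steps: List[Tuple[str, Any]] = []
    cv_steps: List[Any] = []
    for step in legacy_steps:
        if len(step) != 3:
            raise ValueError(f"Expected (name, estimator, cv), got {step!r}")
        name, estimator, cv = step
        steps.append((name, estimator))
        cv_steps.append(cv)  # kept in the same order as steps
    names = [name for name, _ in steps]
    if len(set(names)) != len(names):
        raise ValueError(f"Step names must be unique, got {names}")
    return steps, cv_steps
```

For example, `split_legacy_steps([("scaler", StandardScaler(), None), ("model", Ridge(), ps)])` would yield the two lists to pass as `steps` and `cv_steps`.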

{panelsplit-1.1.1 → panelsplit-2.0.4.dev0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: panelsplit
-Version: 1.1.1
+Version: 2.0.4.dev0
 Summary: A tool for panel data analysis.
 Project-URL: Homepage, https://github.com/4Freye/panelsplit
 Project-URL: Repository, https://github.com/4Freye/panelsplit
@@ -11,14 +11,16 @@ License-File: LICENSE
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Operating System :: OS Independent
 Classifier: Programming Language :: Python :: 3
-Requires-Python: >=3.8
+Requires-Python: >=3.10
 Requires-Dist: joblib>=1.0.1
 Requires-Dist: matplotlib>=3.4.3
 Requires-Dist: narwhals>=1.42.1
 Requires-Dist: numpy>=1.21.0
 Requires-Dist: pandas>=1.3.0
 Requires-Dist: scikit-learn>=0.24.2
+Requires-Dist: scipy>=1.10.1
 Requires-Dist: tqdm>=4.67.1
+Requires-Dist: typing-extensions>=4.13.2
 Description-Content-Type: text/markdown
 ![PyPI - Version](https://img.shields.io/pypi/v/panelsplit)
@@ -30,7 +32,7 @@ panelsplit is a Python package designed to facilitate time series cross-validati
 ## Installation
-panelsplit is tested for compatibility with python versions >= 3.8. You can install panelsplit using pip:
+panelsplit is tested for compatibility with python versions >= 3.10. You can install panelsplit using pip:
 ```bash
 pip install panelsplit

{panelsplit-1.1.1 → panelsplit-2.0.4.dev0}/README.md RENAMED

@@ -7,7 +7,7 @@ panelsplit is a Python package designed to facilitate time series cross-validati
 ## Installation
-panelsplit is tested for compatibility with python versions >= 3.8. You can install panelsplit using pip:
+panelsplit is tested for compatibility with python versions >= 3.10. You can install panelsplit using pip:
 ```bash
 pip install panelsplit

{panelsplit-1.1.1 → panelsplit-2.0.4.dev0}/panelsplit/__init__.py RENAMED

@@ -42,6 +42,21 @@ Explore the modules in detail by clicking on the links below to see full documen
 ### `panelsplit.plot`
 - Visualize time series splits easily.
+### `panelsplit.model_selection`
+- **Hyperparameter tuning:** Provides GridSearch and RandomizedSearch classes for optimizing model parameters using panel data cross-validation.
+- **Efficient search:** Supports parallel processing and integrates with panelsplit's cross-validation framework.
+### `panelsplit.metrics`
+- **Scoring functions:** Offers a range of metrics for evaluating model performance on panel data.
+- **Sequential CV scorers:** Specialized scorers designed for sequential cross-validation splits.
 """
-__all__ = ["application", "cross_validation", "pipeline", "plot"]
+__all__ = [
+    "application",
+    "cross_validation",
+    "metrics",
+    "model_selection",
+    "pipeline",
+    "plot",
+]
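`__all__` now also lists `metrics` and `model_selection`. For a package, names in `__all__` that refer to submodules are imported by `from package import *`, so this change makes the two new modules part of a star import. The self-contained demonstration below builds a throwaway package in a temporary directory to show that behavior; the package name `demo_pkg` and the helper are hypothetical:

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import List


def star_import_names(all_list: List[str]) -> List[str]:
    """Create a temp package whose __all__ lists submodules, then star-import it."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text(f"__all__ = {all_list!r}\n")
        for name in all_list:
            (pkg / f"{name}.py").write_text(f"NAME = {name!r}\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            namespace: dict = {}
            exec("from demo_pkg import *", namespace)
            # Drop __builtins__, which exec adds to the namespace.
            return sorted(k for k in namespace if not k.startswith("__"))
        finally:
            sys.path.remove(tmp)
            for mod in [m for m in sys.modules if m.split(".")[0] == "demo_pkg"]:
                del sys.modules[mod]
```

Calling `star_import_names(["metrics", "model_selection"])` returns both names, even though `__init__.py` never imports the submodules explicitly.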

{panelsplit-1.1.1 → panelsplit-2.0.4.dev0}/panelsplit/application.py RENAMED

@@ -1,11 +1,15 @@
 import inspect
-from typing import Tuple, List, Union, Optional
+from typing import Tuple, List, Optional, Iterable, Union
+from numpy.typing import NDArray
 import narwhals as nw
 import numpy as np
 from joblib import Parallel, delayed
 from narwhals.typing import IntoDataFrame, IntoSeries
-from sklearn.base import clone, BaseEstimator
+from sklearn.base import clone
+from .utils.typing import ArrayLike, EstimatorLike
+from .cross_validation import PanelSplit
+from typing import Literal
 from .utils.utils import _split_wrapper
 from .utils.validation import (
@@ -17,20 +21,22 @@ from .utils.validation import (
 )
-def _get_non_null_mask(data):
+def _get_non_null_mask(data: IntoSeries) -> IntoSeries:
     """Get non-null mask for any data type."""
     return ~nw.from_native(data, series_only=True).is_null()
-def _predict_split(model, X_test: IntoDataFrame, method: str = "predict") -> np.ndarray:
+def _predict_split(
+    model: EstimatorLike, X_test: ArrayLike, method: str = "predict"
+) -> np.ndarray:
     """
     Perform predictions for a single split.
     Parameters
     ----------
-    model : object
+    model : EstimatorLike
         The machine learning model used for prediction.
-    X_test : IntoDataFrame
+    X_test : ArrayLike
         The input features for testing.
     method : str, optional
         The method to use for prediction. It can be 'predict', 'predict_proba',
@@ -46,34 +52,34 @@ def _predict_split(model, X_test: IntoDataFrame, method: str = "predict") -> np.
 def _fit_split(
-    estimator,
+    estimator: EstimatorLike,
     X: IntoDataFrame,
     y: Optional[IntoSeries],
-    train_indices: np.ndarray,
-    sample_weight: Optional[Union[IntoSeries, np.ndarray]] = None,
+    train_indices: NDArray,
+    sample_weight: Optional[Union[IntoSeries, NDArray]] = None,
     drop_na_in_y: bool = False,
-):
+) -> EstimatorLike:
     """
     Fit a cloned estimator on the given training indices.
     Parameters
     ----------
-    estimator : object
+    estimator : EstimatorLike
         The machine learning model to be fitted.
     X : IntoDataFrame
         The input features for the estimator.
-    y : IntoSeries or None
-        The target variable for the estimator.
-    train_indices : np.ndarray
+    y : Optional[IntoSeries]
+        The target variable for the estimator. Default is None.
+    train_indices : NDArray
         Integer indices indicating the training data.
-    sample_weight : IntoSeries or np.ndarray, optional
+    sample_weight : Optional[Union[IntoSeries, NDArray]]
         Sample weights for the training data. Default is None.
-    drop_na_in_y : bool, default=False
-        Whether to drop rows with null values in y.
+    drop_na_in_y : bool
+        Whether to drop rows with null values in y. Default is False
     Returns
     -------
-    object
+    EstimatorLike
         A fitted estimator.
     """
     local_estimator = clone(estimator)
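Many signatures now use `EstimatorLike` and `ArrayLike` from `panelsplit.utils.typing`, whose definitions are not part of this diff. One plausible way to express an estimator type is a structural `typing.Protocol`. The sketch below is an assumption, named `EstimatorLikeSketch` to make clear it is not panelsplit's definition, and requires only `fit` plus `get_params` (which `sklearn.base.clone` reads when copying an estimator):

```python
from typing import Any, Dict, Optional, Protocol, runtime_checkable


@runtime_checkable
class EstimatorLikeSketch(Protocol):
    """Hypothetical structural type for scikit-learn-style estimators."""

    def fit(self, X: Any, y: Optional[Any] = None, **fit_params: Any) -> Any: ...

    def get_params(self, deep: bool = True) -> Dict[str, Any]: ...


class MeanRegressor:
    """Toy estimator: no inheritance needed to satisfy the protocol."""

    def __init__(self) -> None:
        self.mean_: Optional[float] = None

    def fit(
        self, X: Any, y: Optional[Any] = None, **fit_params: Any
    ) -> "MeanRegressor":
        values = list(y) if y is not None else []
        self.mean_ = sum(values) / len(values) if values else 0.0
        return self

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        return {}
```

Note that `isinstance` against a runtime-checkable protocol only checks that the methods exist, not their signatures; mypy performs the full structural check statically.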
@@ -152,41 +158,41 @@ def _prediction_order_to_original_order(indices: List[np.ndarray]) -> np.ndarray
 def cross_val_fit(
-    estimator,
+    estimator: EstimatorLike,
     X: IntoDataFrame,
     y: IntoSeries,
-    cv,
+    cv: Union[PanelSplit, Iterable],
     sample_weight: Optional[Union[IntoSeries, np.ndarray]] = None,
     n_jobs: int = 1,
     progress_bar: bool = False,
     drop_na_in_y: bool = False,
-) -> List[BaseEstimator]:
+) -> List[EstimatorLike]:
     """
     Fit the estimator using cross-validation.
     Parameters
     ----------
-    estimator : object
+    estimator : EstimatorLike
         The machine learning model to be fitted.
     X : IntoDataFrame
         The input features for the estimator.
     y : IntoSeries
         The target variable for the estimator.
-    cv : object or iterable
+    cv : Union[PanelSplit, Iterable]
         Cross-validation splitter; either an object that generates train/test splits (e.g., an instance of PanelSplit)
         or an iterable of splits.
-    sample_weight : IntoSeries or np.ndarray, optional
+    sample_weight : Optional[Union[IntoSeries, np.ndarray]]
         Sample weights for the training data. Default is None.
-    n_jobs : int, optional
+    n_jobs : int
         The number of jobs to run in parallel. Default is 1.
-    progress_bar : bool, optional
+    progress_bar : bool
         Whether to display a progress bar. Default is False.
-    drop_na_in_y : bool, optional
+    drop_na_in_y : bool
         Whether to drop observations where y is na. Default is False.
     Returns
     -------
-    list
+    List[EstimatorLike]
         List containing fitted models for each split.
     Examples
@@ -195,14 +201,11 @@ def cross_val_fit(
     >>> from sklearn.linear_model import LinearRegression
     >>> from panelsplit.cross_validation import PanelSplit
     >>> # Create sample data
-    >>> df = pd.DataFrame({
-    ...     'feature': [1, 2, 3, 4, 5, 6],
-    ...     'period': [1, 1, 2, 2, 3, 3]
-    ... })
-    >>> X = df[['feature']]
+    >>> df = pd.DataFrame({"feature": [1, 2, 3, 4, 5, 6], "period": [1, 1, 2, 2, 3, 3]})
+    >>> X = df[["feature"]]
     >>> y = pd.Series([2, 4, 6, 8, 10, 12])
     >>> # Create a PanelSplit instance for cross-validation
-    >>> ps = PanelSplit(periods=df['period'], n_splits=2)
+    >>> ps = PanelSplit(periods=df["period"], n_splits=2)
     >>> fitted_models = cross_val_fit(LinearRegression(), X, y, ps)
     >>> len(fitted_models)
     2
@@ -223,142 +226,136 @@ def cross_val_fit(
 def cross_val_predict(
-    fitted_estimators,
+    fitted_estimators: List[EstimatorLike],
     X: IntoDataFrame,
-    cv,
+    cv: Union[PanelSplit, Iterable],
     method: str = "predict",
     n_jobs: int = 1,
-    return_train_preds: bool = False,
-) -> Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]:
+    return_group: Literal["test", "train"] = "test",
+) -> np.ndarray:
     """
     Perform cross-validated predictions using a given predictor model.
     Parameters
     ----------
-    fitted_estimators : list
+    fitted_estimators : List[EstimatorLike]
         List of fitted machine learning models used for prediction.
     X : IntoDataFrame
         The input features for prediction.
-    cv : object or iterable
+    cv : Union[PanelSplit, Iterable]
         Cross-validation splitter; either an object that generates train/test splits or an iterable of splits.
-    method : str, optional
-        The method to use for prediction. It can be whatever methods are available to the estimator
+    method : str
+        The method to use for prediction. It can be whatever methods are available to the estimator.
         (e.g. predict_proba in the case of a classifier or transform in the case of a transformer). Default is 'predict'.
-    n_jobs : int, optional
+    n_jobs : int
         The number of jobs to run in parallel. Default is 1.
-    return_train_preds : bool, optional
-        If True, return predictions for the training set as well. Default is False.
+    return_group : {"test","train"}
+        Whether to return the train or test predictions. Default is "test".
     Returns
     -------
-    test_preds : np.ndarray
-        Array containing test predictions made by the model during cross-validation.
-    train_preds : np.ndarray, optional
-        Array containing train predictions made by the model during cross-validation.
-        Returned only if `return_train_preds` is True.
+    np.ndarray
+        Predictions (either train or test depending on return_group).
+    Examples
+    --------
+    >>> from sklearn.linear_model import LinearRegression
+    >>> import numpy as np
+    >>> from panelsplit.cross_validation import PanelSplit,
+    >>> from panelsplit.application import cross_val_predict, cross_val_fit
+    >>> X = np.arange(12).reshape(6, 2)
+    >>> y = np.array([1, 2, 3, 4, 5, 6])
+    >>> ps = PanelSplit(periods=np.array([1, 1, 2, 2, 3, 3]), n_splits=2)
+    >>> estimators = cross_val_fit(LinearRegression(), X, y, ps)
+    >>> preds = cross_val_predict(estimators, X, ps)
+    >>> preds.shape
+    (4,)
     """
     check_fitted_estimators(fitted_estimators)
     splits = check_cv(cv)
+    if return_group not in ["train", "test"]:
+        raise ValueError(
+            f"return_group must be train or test. Got {return_group} instead."
+        )
-    test_splits = [split[1] for split in splits]
-    test_indices = _prediction_order_to_original_order(test_splits)
+    group = 0 if return_group == "train" else 1
+    group_splits = [split[group] for split in splits]
+    group_indices = _prediction_order_to_original_order(group_splits)
     # Use narwhals for dataframe-agnostic operations
     X_nw = nw.from_native(X, pass_through=True)
-    test_preds = Parallel(n_jobs=n_jobs)(
+    preds = Parallel(n_jobs=n_jobs)(
         delayed(_predict_split)(
             fitted_estimators[i],
             _safe_indexing(X_nw, test_idx, to_native=True),
             method,
         )
-        for i, test_idx in enumerate(test_splits)
+        for i, test_idx in enumerate(group_splits)
     )
-    if return_train_preds:
-        train_splits = [split[0] for split in splits]
-        train_indices = _prediction_order_to_original_order(train_splits)
-        train_preds = Parallel(n_jobs=n_jobs)(
-            delayed(_predict_split)(
-                fitted_estimators[i],
-                _safe_indexing(X_nw, train_idx, to_native=True),
-                method,
-            )
-            for i, train_idx in enumerate(train_splits)
-        )
-        return np.concatenate(test_preds, axis=0)[test_indices], np.concatenate(
-            train_preds, axis=0
-        )[train_indices]
-    else:
-        return np.concatenate(test_preds, axis=0)[test_indices]
+    return np.concatenate(preds, axis=0)[group_indices]
 def cross_val_fit_predict(
-    estimator,
+    estimator: EstimatorLike,
     X: IntoDataFrame,
     y: IntoSeries,
-    cv,
+    cv: Union[PanelSplit, Iterable],
     method: str = "predict",
     sample_weight: Optional[Union[IntoSeries, np.ndarray]] = None,
     n_jobs: int = 1,
-    return_train_preds: bool = False,
-    drop_na_in_y=False,
-) -> Union[
-    Tuple[np.ndarray, List[BaseEstimator]],
-    Tuple[np.ndarray, np.ndarray, List[BaseEstimator]],
-]:
+    return_group: Literal["test", "train"] = "test",
+    drop_na_in_y: bool = False,
+) -> Tuple[np.ndarray, List[EstimatorLike]]:
     """
     Fit the estimator using cross-validation and then make predictions.
     Parameters
     ----------
-    estimator : object
+    estimator : EstimatorLike
         The machine learning model to be fitted.
     X : IntoDataFrame
         The input features for the estimator.
     y : IntoSeries
         The target variable for the estimator.
-    cv : object
+    cv : Union[PanelSplit, Iterable]
         Cross-validation splitter; an object that generates train/test splits.
-     method : str, optional
-        The method to use for prediction. It can be whatever methods are available to the estimator
-        (e.g. predict_proba in the case of a classifier or transform in the case of a transformer). Default is 'predict'.
-    sample_weight : IntoSeries or np.ndarray, optional
+    method : str
+        The method to use for prediction. It can be any method available on the estimator
+        (e.g., ``predict_proba`` for classifiers or ``transform`` for transformers). Default is predict.
+    sample_weight : Optional[Union[IntoSeries, np.ndarray]]
         Sample weights for the training data. Default is None.
-    n_jobs : int, optional
+    n_jobs : int
         The number of jobs to run in parallel. Default is 1.
-    return_train_preds : bool, optional
-        If True, return predictions for the training set as well. Default is False.
-    drop_na_in_y : bool, optional
-        Whether to drop observations where y is na. Default is False.
+    return_group : {"test","train"}
+        Whether to return the train or test predictions. Default is test.
+    drop_na_in_y : bool
+        Whether to drop observations where ``y`` is NA. Default is False.
     Returns
     -------
-    tuple
-        If `return_train_preds` is False, returns a tuple of:
-            - preds (np.ndarray): Array containing predictions made by the model during cross-validation.
-            - fitted_estimators (list): List containing fitted models for each split.
-        If `return_train_preds` is True, returns a tuple of:
-            - preds (np.ndarray): Array containing test predictions made by the model during cross-validation.
-            - train_preds (np.ndarray): Array containing train predictions made by the model during cross-validation.
-            - fitted_estimators (list): List containing fitted models for each split.
+    Tuple[np.ndarray, List[EstimatorLike]]
+        (predictions (either train or test depending on return_group), fitted_estimators).
+    Raises
+    ------
+    TypeError
+        If the provided estimator does not implement the specified ``method`` or has invalid type.
     Examples
     --------
     >>> import pandas as pd
     >>> from sklearn.linear_model import LinearRegression
-    >>> from panelsplit.cross_validation import PanelSplit  # assuming PanelSplit is imported from your module
+    >>> from panelsplit.cross_validation import (
+    ...     PanelSplit,
+    ... )  # assuming PanelSplit is imported from your module
     >>> # Create sample data
-    >>> df = pd.DataFrame({
-    ...     'feature': [1, 2, 3, 4, 5, 6],
-    ...     'period': [1, 1, 2, 2, 3, 3]
-    ... })
-    >>> X = df[['feature']]
+    >>> df = pd.DataFrame({"feature": [1, 2, 3, 4, 5, 6], "period": [1, 1, 2, 2, 3, 3]})
+    >>> X = df[["feature"]]
     >>> y = pd.Series([2, 4, 6, 8, 10, 12])
     >>> # Create a PanelSplit instance for cross-validation
-    >>> ps = PanelSplit(periods=df['period'], n_splits=2)
+    >>> ps = PanelSplit(periods=df["period"], n_splits=2)
     >>> # Get test predictions and fitted models
     >>> preds, models = cross_val_fit_predict(LinearRegression(), X, y, ps)
     >>> preds.shape
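With `return_group`, `cross_val_predict` takes position 0 (train) or 1 (test) of each `(train, test)` split, predicts each group with that split's fitted estimator, concatenates the results, and indexes them with the output of `_prediction_order_to_original_order`. That helper's body is not shown in this diff; the sketch below assumes it is an argsort of the concatenated indices, which restores original row order when the selected index sets do not overlap. When train sets overlap across splits (for example, growing training windows), this sketch returns one prediction per (row, split) pair, sorted by row; the real helper may behave differently. The function names are hypothetical:

```python
from typing import List, Literal, Sequence, Tuple

import numpy as np

Split = Tuple[np.ndarray, np.ndarray]


def reorder_to_original(indices: List[np.ndarray]) -> np.ndarray:
    """Assumed behaviour: positions that sort the concatenated indices."""
    return np.argsort(np.concatenate(indices), kind="stable")


def assemble_predictions(
    splits: Sequence[Split],
    split_preds: Sequence[np.ndarray],
    return_group: Literal["test", "train"] = "test",
) -> np.ndarray:
    """Combine per-split predictions the way the 2.x cross_val_predict appears to."""
    if return_group not in ("train", "test"):
        raise ValueError(
            f"return_group must be train or test. Got {return_group} instead."
        )
    group = 0 if return_group == "train" else 1  # position in the (train, test) tuple
    group_splits = [split[group] for split in splits]
    order = reorder_to_original(group_splits)
    return np.concatenate(split_preds, axis=0)[order]
```

For test indices `[3, 1]` and `[2, 0]` with per-split predictions `[30, 10]` and `[20, 0]`, the result is `[0, 10, 20, 30]`, one value per original row.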
@@ -369,22 +366,6 @@ def cross_val_fit_predict(
         estimator, X, y, cv, sample_weight, n_jobs, drop_na_in_y=drop_na_in_y
     )
-    res = cross_val_predict(
-        fitted_estimators, X, cv, method, n_jobs, return_train_preds
-    )
+    preds = cross_val_predict(fitted_estimators, X, cv, method, n_jobs, return_group)
-    if return_train_preds:
-        # res should be Tuple[np.ndarray, np.ndarray]
-        if isinstance(res, tuple):
-            preds, train_preds = res
-        else:
-            # defensive: unexpected type at runtime
-            raise TypeError("cross_val_predict returned ndarray but expected tuple")
-        return preds, train_preds, fitted_estimators
-    else:
-        # res should be np.ndarray
-        if isinstance(res, tuple):
-            preds = res[0]
-        else:
-            preds = res
-        return preds, fitted_estimators
+    return preds, fitted_estimators
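The `return_train_preds` flag is gone in 2.x: `cross_val_predict` and `cross_val_fit_predict` now return a single prediction array, and `return_group` selects train or test. Code that relied on the old `(test_preds, train_preds)` tuple can fit once and call `cross_val_predict` twice. The shim below is a hypothetical compatibility helper, not part of panelsplit; it takes the prediction function as a parameter so it can be exercised without the package installed, and calls it positionally as `(fitted_estimators, X, cv, method, n_jobs, return_group)`, the order `cross_val_fit_predict` uses in this diff:

```python
from typing import Any, Callable, Sequence, Tuple, Union

# Positional signature of panelsplit 2.x cross_val_predict, as called in this diff:
# (fitted_estimators, X, cv, method, n_jobs, return_group)
PredictFn = Callable[[Sequence[Any], Any, Any, str, int, str], Any]


def legacy_cross_val_predict(
    predict_fn: PredictFn,
    fitted_estimators: Sequence[Any],
    X: Any,
    cv: Any,
    method: str = "predict",
    n_jobs: int = 1,
    return_train_preds: bool = False,
) -> Union[Any, Tuple[Any, Any]]:
    """Reproduce the 1.x return shape on top of the 2.x return_group API."""
    test_preds = predict_fn(fitted_estimators, X, cv, method, n_jobs, "test")
    if not return_train_preds:
        return test_preds
    train_preds = predict_fn(fitted_estimators, X, cv, method, n_jobs, "train")
    return test_preds, train_preds
```

With panelsplit 2.x installed, a call might look like `legacy_cross_val_predict(cross_val_predict, models, X, ps, return_train_preds=True)`, where `models` comes from `cross_val_fit`. The trade-off is two prediction passes instead of one, and `cv` must yield the same splits on both calls, so a reusable splitter such as a `PanelSplit` instance or a list of splits is safer than a one-shot generator.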
