PyPI - r-scikit-learn - Versions diffs - 0.1.0__tar.gz → 0.1.2__tar.gz - Mend

r-scikit-learn 0.1.0__tar.gz → 0.1.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (94)

r_scikit_learn-0.1.2/CHANGELOG.md ADDED

@@ -0,0 +1,35 @@
+# Changelog
+All notable changes to r-scikit-learn are documented here. Release tags and
+published package versions are immutable.
+## Unreleased
+## 0.1.2 - 2026-06-24
+- Added dense brute-force `KNeighborsClassifier` with Rust-backed neighbor
+  search, class voting, `predict`, `predict_proba`, and `kneighbors`.
+- Added scikit-learn parity tests and benchmarks for nearest-neighbor
+  classification.
+- Optimized the dense Euclidean neighbor search path with blocked dot products,
+  reusable work buffers, and macOS Accelerate/CBLAS acceleration with a portable
+  `matrixmultiply` fallback.
+- Added sparse `StandardScaler(with_mean=False)` and `MaxAbsScaler` with
+  Rust-backed CSR/CSC reductions and column scaling.
+## 0.1.1 - 2026-06-15
+- Added wheel and source-distribution installation testing across supported
+  operating systems and Python versions.
+- Added a numerical-safety fallback for ill-conditioned tall least-squares
+  problems.
+- Added TestPyPI, cross-platform benchmark, and immutable manual release
+  workflows.
+## 0.1.0
+- Added Rust-powered preprocessing, categorical encoding, sparse
+  infrastructure, composition, metrics, model selection, and linear models.
+- Added Linux, macOS, and Windows wheel builds for Python 3.10 through 3.13.
+- Added Rust-native tall-matrix least squares and multinomial logistic
+  optimization.
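The 0.1.2 entry about blocked dot products refers to a standard brute-force technique: expanding ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2 turns the distance computation for a block of queries into a single matrix product. The NumPy sketch below illustrates only that technique. `blocked_kneighbors` and `block_size` are hypothetical names, and the sketch does not reproduce the package's Rust kernels, its Accelerate/CBLAS or `matrixmultiply` dispatch, or its tie-breaking rules.

```python
import numpy as np


def blocked_kneighbors(
    X_train: np.ndarray, X_query: np.ndarray, n_neighbors: int, block_size: int = 256
) -> tuple[np.ndarray, np.ndarray]:
    """Brute-force Euclidean k-nearest neighbors, one block of queries at a time.

    Hypothetical sketch: ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, so the costly
    part of each block is one (block, features) @ (features, n_train) product.
    """
    train_sq = np.einsum("ij,ij->i", X_train, X_train)  # ||x||^2 for every training row
    n_query = X_query.shape[0]
    distances = np.empty((n_query, n_neighbors))
    indices = np.empty((n_query, n_neighbors), dtype=np.intp)
    for start in range(0, n_query, block_size):
        block = X_query[start : start + block_size]
        query_sq = np.einsum("ij,ij->i", block, block)
        # Squared distances for this block; clip small negatives caused by rounding.
        sq = query_sq[:, None] - 2.0 * (block @ X_train.T) + train_sq[None, :]
        np.maximum(sq, 0.0, out=sq)
        # Select the k smallest per row without a full sort, then order only those k.
        part = np.argpartition(sq, n_neighbors - 1, axis=1)[:, :n_neighbors]
        part_sq = np.take_along_axis(sq, part, axis=1)
        order = np.argsort(part_sq, axis=1, kind="stable")
        stop = start + block.shape[0]
        indices[start:stop] = np.take_along_axis(part, order, axis=1)
        distances[start:stop] = np.sqrt(np.take_along_axis(part_sq, order, axis=1))
    return distances, indices
```

`block_size` bounds the temporary `(block_size, n_train)` distance matrix, which is the memory-versus-throughput trade-off that blocking exists to control; the returned arrays have the same `(distances, indices)` layout as a scikit-learn-style `kneighbors` call.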

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.2}/Cargo.lock RENAMED

@@ -998,9 +998,10 @@ dependencies = [
 [[package]]
 name = "r-scikit-learn-core"
-version = "0.1.0"
+version = "0.1.2"
 dependencies = [
  "faer",
+ "matrixmultiply",
  "nalgebra",
  "numpy",
  "pyo3",

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.2}/Cargo.toml RENAMED

@@ -1,6 +1,6 @@
 [package]
 name = "r-scikit-learn-core"
-version = "0.1.0"
+version = "0.1.2"
 edition = "2021"
 license = "MIT"
 description = "Rust computational core for r-scikit-learn"
@@ -9,6 +9,7 @@ repository = "https://github.com/rishib42/r-scikit-learn"
 include = [
   "/Cargo.lock",
   "/Cargo.toml",
+  "/CHANGELOG.md",
   "/LICENSE",
   "/README.md",
   "/benches/*.py",
@@ -28,6 +29,7 @@ crate-type = ["cdylib", "rlib"]
 [dependencies]
 faer = { version = "0.24", default-features = false, features = ["std", "rayon", "linalg"] }
+matrixmultiply = "0.3"
 nalgebra = { version = "0.34", default-features = false, features = ["std"] }
 numpy = "0.28"
 pyo3 = "0.28"

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: r-scikit-learn
-Version: 0.1.0
+Version: 0.1.2
 Classifier: Development Status :: 3 - Alpha
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3
@@ -12,6 +12,7 @@ Classifier: Programming Language :: Rust
 Classifier: Typing :: Typed
 Requires-Dist: numpy>=1.23
 Requires-Dist: scipy>=1.10
+Requires-Dist: hypothesis>=6.100,<7 ; extra == 'dev'
 Requires-Dist: maturin>=1.9,<2.0 ; extra == 'dev'
 Requires-Dist: pytest>=8 ; extra == 'dev'
 Requires-Dist: ruff>=0.11 ; extra == 'dev'
@@ -25,6 +26,7 @@ Author: r-scikit-learn contributors
 License-Expression: MIT
 Requires-Python: >=3.10
 Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
+Project-URL: Changelog, https://github.com/rishib42/r-scikit-learn/blob/main/CHANGELOG.md
 Project-URL: Homepage, https://github.com/rishib42/r-scikit-learn
 Project-URL: Issues, https://github.com/rishib42/r-scikit-learn/issues
 Project-URL: Repository, https://github.com/rishib42/r-scikit-learn
@@ -34,7 +36,7 @@ Project-URL: Repository, https://github.com/rishib42/r-scikit-learn
 Fast, familiar machine-learning building blocks powered by safe Rust. 🦀
 `r-scikit-learn` combines a Rust computational core with lightweight,
-scikit-learn-style Python estimators. Version 0.1.0 includes:
+scikit-learn-style Python estimators. Version 0.1.1 includes:
 - Preprocessing, categorical encoding, and missing-value imputation
 - Pipelines and column transformers
@@ -124,6 +126,13 @@ encoder = OneHotEncoder(handle_unknown="ignore")
 X_one_hot = encoder.fit_transform([["small"], ["large"], ["small"]])
 ```
+```python
+from rsklearn.preprocessing import MaxAbsScaler, StandardScaler
+X_sparse_scaled = StandardScaler(with_mean=False).fit_transform(X_one_hot)
+X_sparse_maxabs = MaxAbsScaler().fit_transform(X_one_hot)
+```
 ```python
 import numpy as np
 from rsklearn.impute import SimpleImputer
@@ -193,7 +202,10 @@ probabilities = classifier.predict_proba(X_test)
 - Uses float64 fitted statistics and native float32 kernels where supported.
 - Ignores NaNs while fitting, preserves them while transforming, and rejects
   infinity.
-- Supports incremental `partial_fit` for `StandardScaler` and `MinMaxScaler`.
+- Supports incremental `partial_fit` for `StandardScaler`, `MaxAbsScaler`, and
+  `MinMaxScaler`.
+- Supports CSR/CSC sparse `StandardScaler(with_mean=False)` and `MaxAbsScaler`
+  without densifying input.
 - Supports L1, L2, and max row normalization.
 - Provides quantile-based `RobustScaler` fitting and inverse transforms.
@@ -274,8 +286,6 @@ The core implemented behavior is tested and packaged across Linux, macOS, and
 Windows, but the project remains alpha software. Before a stable 1.0 release,
 the following compatibility and operational work remains:
-- Sparse-aware estimator behavior, including non-centering `StandardScaler`
-  operation. Shared CSR/CSC validation and Rust kernels are implemented.
 - `sample_weight` support for `StandardScaler.partial_fit`.
 - Comprehensive `get_feature_names_out` support and configurable output
   containers across estimators.
@@ -327,14 +337,22 @@ Substantial numerical loops release the Python GIL.
 ## Release
-1. Run all development checks and build a release wheel.
-2. Install the wheel into a clean virtual environment and run the import smoke
-   test.
-3. Verify the distribution name on PyPI.
-4. Tag the release as `v0.1.0` and push the tag.
-5. Approve the GitHub Actions Trusted Publishing environment.
-The release workflow uses PyPI Trusted Publishing and contains no API token.
+1. Update the matching versions in `pyproject.toml`, `Cargo.toml`, and
+   `python/rsklearn/__init__.py`, then update `CHANGELOG.md`.
+2. Push the release commit and wait for CI, including manylinux and sdist
+   installation checks, to pass.
+3. Run the manual TestPyPI workflow and verify its distributions.
+4. Run the manual Release workflow with the version number without a `v`
+   prefix.
+5. Approve the PyPI environment if required.
+The release workflow refuses existing versions, installs every wheel on
+Python 3.10-3.13 across Linux, macOS, and Windows, verifies sdist installation,
+publishes through PyPI Trusted Publishing, creates the immutable GitHub tag and
+release, attaches artifacts, and verifies installation from PyPI. No API token
+is stored in the repository. Configure separate `pypi` and `testpypi` GitHub
+environments and matching Trusted Publishers for `release.yml` and
+`test-pypi.yml`, respectively.
 ## Roadmap

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.2}/README.md RENAMED

@@ -3,7 +3,7 @@
 Fast, familiar machine-learning building blocks powered by safe Rust. 🦀
 `r-scikit-learn` combines a Rust computational core with lightweight,
-scikit-learn-style Python estimators. Version 0.1.0 includes:
+scikit-learn-style Python estimators. Version 0.1.1 includes:
 - Preprocessing, categorical encoding, and missing-value imputation
 - Pipelines and column transformers
@@ -93,6 +93,13 @@ encoder = OneHotEncoder(handle_unknown="ignore")
 X_one_hot = encoder.fit_transform([["small"], ["large"], ["small"]])
 ```
+```python
+from rsklearn.preprocessing import MaxAbsScaler, StandardScaler
+X_sparse_scaled = StandardScaler(with_mean=False).fit_transform(X_one_hot)
+X_sparse_maxabs = MaxAbsScaler().fit_transform(X_one_hot)
+```
 ```python
 import numpy as np
 from rsklearn.impute import SimpleImputer
@@ -162,7 +169,10 @@ probabilities = classifier.predict_proba(X_test)
 - Uses float64 fitted statistics and native float32 kernels where supported.
 - Ignores NaNs while fitting, preserves them while transforming, and rejects
   infinity.
-- Supports incremental `partial_fit` for `StandardScaler` and `MinMaxScaler`.
+- Supports incremental `partial_fit` for `StandardScaler`, `MaxAbsScaler`, and
+  `MinMaxScaler`.
+- Supports CSR/CSC sparse `StandardScaler(with_mean=False)` and `MaxAbsScaler`
+  without densifying input.
 - Supports L1, L2, and max row normalization.
 - Provides quantile-based `RobustScaler` fitting and inverse transforms.
@@ -243,8 +253,6 @@ The core implemented behavior is tested and packaged across Linux, macOS, and
 Windows, but the project remains alpha software. Before a stable 1.0 release,
 the following compatibility and operational work remains:
-- Sparse-aware estimator behavior, including non-centering `StandardScaler`
-  operation. Shared CSR/CSC validation and Rust kernels are implemented.
 - `sample_weight` support for `StandardScaler.partial_fit`.
 - Comprehensive `get_feature_names_out` support and configurable output
   containers across estimators.
@@ -296,14 +304,22 @@ Substantial numerical loops release the Python GIL.
 ## Release
-1. Run all development checks and build a release wheel.
-2. Install the wheel into a clean virtual environment and run the import smoke
-   test.
-3. Verify the distribution name on PyPI.
-4. Tag the release as `v0.1.0` and push the tag.
-5. Approve the GitHub Actions Trusted Publishing environment.
-The release workflow uses PyPI Trusted Publishing and contains no API token.
+1. Update the matching versions in `pyproject.toml`, `Cargo.toml`, and
+   `python/rsklearn/__init__.py`, then update `CHANGELOG.md`.
+2. Push the release commit and wait for CI, including manylinux and sdist
+   installation checks, to pass.
+3. Run the manual TestPyPI workflow and verify its distributions.
+4. Run the manual Release workflow with the version number without a `v`
+   prefix.
+5. Approve the PyPI environment if required.
+The release workflow refuses existing versions, installs every wheel on
+Python 3.10-3.13 across Linux, macOS, and Windows, verifies sdist installation,
+publishes through PyPI Trusted Publishing, creates the immutable GitHub tag and
+release, attaches artifacts, and verifies installation from PyPI. No API token
+is stored in the repository. Configure separate `pypi` and `testpypi` GitHub
+environments and matching Trusted Publishers for `release.yml` and
+`test-pypi.yml`, respectively.
 ## Roadmap
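The README's new snippet applies `StandardScaler(with_mean=False)` and `MaxAbsScaler` to the sparse one-hot matrix. Neither scaler centers the data, so both reduce to dividing each column by a per-column factor. That division keeps zeros at zero, which is why the input never needs to be densified. The NumPy sketch below shows those column reductions on raw CSR arrays. It is a hypothetical illustration, not the package's Rust kernels: the function names are invented, NaN handling and `partial_fit` are omitted, and a column with zero scale is simply left unscaled (scale 1).

```python
import numpy as np


def csr_column_scales(
    data: np.ndarray, indices: np.ndarray, n_rows: int, n_cols: int
) -> tuple[np.ndarray, np.ndarray]:
    """Per-column standard deviation (ddof=0) and max-abs, from CSR nonzeros only.

    In CSR, `indices[i]` is the column of `data[i]`, so column reductions are
    scatter operations; the implicit zeros are accounted for analytically.
    """
    col_sum = np.bincount(indices, weights=data, minlength=n_cols)
    col_nnz = np.bincount(indices, minlength=n_cols)
    mean = col_sum / n_rows
    # Two-pass variance: squared deviations of stored entries plus the
    # (n_rows - nnz) implicit zeros, each contributing mean**2.
    stored_sq = np.bincount(indices, weights=(data - mean[indices]) ** 2, minlength=n_cols)
    std = np.sqrt((stored_sq + (n_rows - col_nnz) * mean**2) / n_rows)
    max_abs = np.zeros(n_cols)
    np.maximum.at(max_abs, indices, np.abs(data))
    # Columns with zero scale keep their values unchanged.
    std[std == 0.0] = 1.0
    max_abs[max_abs == 0.0] = 1.0
    return std, max_abs


def csr_scale_columns(data: np.ndarray, indices: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Divide each stored value by its column's scale; `indices`/`indptr` are reused as-is."""
    return data / scale[indices]
```

For CSC input, the same reductions become per-column segment reductions over `indptr`, because each column's stored values are already contiguous.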

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.2}/benches/benchmark_linear_models.py RENAMED

@@ -10,6 +10,8 @@ from collections.abc import Callable
 import numpy as np
 import rsklearn.linear_model as rlinear
+import scipy
+import sklearn
 import sklearn.linear_model as slinear
 from rsklearn import _core
@@ -65,6 +67,10 @@ def main() -> None:
         )
     print(f"Python: {sys.executable}")
     print(f"Rust extension: {_core.__file__} ({profile})")
+    print(
+        f"Dependencies: numpy {np.__version__}, scipy {scipy.__version__}, "
+        f"scikit-learn {sklearn.__version__}"
+    )
     rng = np.random.default_rng(20260614)
     X = rng.normal(size=(args.samples, args.features))
     coefficients = rng.normal(size=args.features)

r_scikit_learn-0.1.2/benches/benchmark_neighbors.py ADDED

@@ -0,0 +1,124 @@
+"""Compare r-scikit-learn and scikit-learn nearest-neighbor performance."""
+from __future__ import annotations
+import argparse
+import statistics
+import sys
+import time
+from collections.abc import Callable
+import numpy as np
+import rsklearn.neighbors as rneighbors
+import scipy
+import sklearn
+import sklearn.neighbors as sneighbors
+from rsklearn import _core
+def measure(
+    function: Callable[[], object], repetitions: int, warmups: int
+) -> tuple[float, float]:
+    for _ in range(warmups):
+        function()
+    values = []
+    for _ in range(repetitions):
+        started = time.perf_counter()
+        function()
+        values.append(time.perf_counter() - started)
+    return statistics.mean(values), statistics.stdev(values) if repetitions > 1 else 0
+def report(
+    name: str,
+    ours: Callable[[], object],
+    theirs: Callable[[], object],
+    repetitions: int,
+    warmups: int,
+) -> None:
+    ours_mean, ours_stdev = measure(ours, repetitions, warmups)
+    theirs_mean, theirs_stdev = measure(theirs, repetitions, warmups)
+    improvement = (theirs_mean - ours_mean) / theirs_mean * 100
+    print(
+        f"{name:<32} r-scikit-learn {ours_mean:9.6f}s ± {ours_stdev:9.6f}s  "
+        f"scikit-learn {theirs_mean:9.6f}s ± {theirs_stdev:9.6f}s  "
+        f"impr. {improvement:+7.2f}%"
+    )
+def main() -> None:
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--train-samples", type=int, default=20_000)
+    parser.add_argument("--query-samples", type=int, default=1_000)
+    parser.add_argument("--features", type=int, default=20)
+    parser.add_argument("--classes", type=int, default=5)
+    parser.add_argument("--neighbors", type=int, default=5)
+    parser.add_argument("--repetitions", type=int, default=5)
+    parser.add_argument("--warmups", type=int, default=2)
+    parser.add_argument(
+        "--allow-debug",
+        action="store_true",
+        help="run even when r-scikit-learn's Rust extension is a debug build",
+    )
+    args = parser.parse_args()
+    profile = _core.build_profile()
+    if profile != "release" and not args.allow_debug:
+        raise SystemExit(
+            "Refusing to benchmark a debug Rust extension. Install a release build "
+            "with `maturin develop --release`, then rerun. Pass --allow-debug only "
+            "when intentionally measuring debug code."
+        )
+    print(f"Python: {sys.executable}")
+    print(f"Rust extension: {_core.__file__} ({profile})")
+    print(
+        f"Dependencies: numpy {np.__version__}, scipy {scipy.__version__}, "
+        f"scikit-learn {sklearn.__version__}"
+    )
+    rng = np.random.default_rng(20260616)
+    X_train = rng.normal(size=(args.train_samples, args.features))
+    X_query = rng.normal(size=(args.query_samples, args.features))
+    y = rng.integers(0, args.classes, size=args.train_samples, dtype=np.int64)
+    options = {
+        "n_neighbors": args.neighbors,
+        "weights": "uniform",
+        "algorithm": "brute",
+        "metric": "euclidean",
+    }
+    print(
+        f"Train matrix: {args.train_samples:,} x {args.features:,}; "
+        f"query matrix: {args.query_samples:,} x {args.features:,}"
+    )
+    report(
+        "KNeighborsClassifier fit",
+        lambda: rneighbors.KNeighborsClassifier(**options).fit(X_train, y),
+        lambda: sneighbors.KNeighborsClassifier(**options).fit(X_train, y),
+        args.repetitions,
+        args.warmups,
+    )
+    ours = rneighbors.KNeighborsClassifier(**options).fit(X_train, y)
+    theirs = sneighbors.KNeighborsClassifier(**options).fit(X_train, y)
+    report(
+        "KNeighborsClassifier kneighbors",
+        lambda: ours.kneighbors(X_query),
+        lambda: theirs.kneighbors(X_query),
+        args.repetitions,
+        args.warmups,
+    )
+    report(
+        "KNeighborsClassifier predict",
+        lambda: ours.predict(X_query),
+        lambda: theirs.predict(X_query),
+        args.repetitions,
+        args.warmups,
+    )
+    report(
+        "KNeighborsClassifier proba",
+        lambda: ours.predict_proba(X_query),
+        lambda: theirs.predict_proba(X_query),
+        args.repetitions,
+        args.warmups,
+    )
+if __name__ == "__main__":
+    main()

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.2}/benches/benchmark_preprocessing.py RENAMED

@@ -13,6 +13,7 @@ from rsklearn.base import BaseEstimator
 from rsklearn.impute import SimpleImputer
 from rsklearn.preprocessing import (
     LabelEncoder,
+    MaxAbsScaler,
     MinMaxScaler,
     Normalizer,
     OneHotEncoder,
@@ -27,6 +28,7 @@ from sklearn.impute import SimpleImputer as ScikitSimpleImputer
 # The scikit-learn distribution intentionally exposes the `sklearn` import package.
 from sklearn.preprocessing import LabelEncoder as ScikitLabelEncoder
+from sklearn.preprocessing import MaxAbsScaler as ScikitMaxAbsScaler
 from sklearn.preprocessing import MinMaxScaler as ScikitMinMaxScaler
 from sklearn.preprocessing import Normalizer as ScikitNormalizer
 from sklearn.preprocessing import OneHotEncoder as ScikitOneHotEncoder
@@ -88,6 +90,7 @@ def benchmark_matrix(rows: int, columns: int, repetitions: int) -> None:
     )
     for name, ours, theirs in [
         ("StandardScaler", StandardScaler, ScikitStandardScaler),
+        ("MaxAbsScaler", MaxAbsScaler, ScikitMaxAbsScaler),
         ("MinMaxScaler", MinMaxScaler, ScikitMinMaxScaler),
         ("Normalizer", Normalizer, ScikitNormalizer),
         ("RobustScaler", RobustScaler, ScikitRobustScaler),
@@ -294,6 +297,34 @@ def benchmark_sparse(repetitions: int) -> None:
         scikit_scale,
         repetitions,
     )
+    ours_standard = StandardScaler(with_mean=False).fit(matrix)
+    theirs_standard = ScikitStandardScaler(with_mean=False).fit(matrix)
+    report_comparison(
+        "Sparse StandardScaler fit",
+        lambda: StandardScaler(with_mean=False).fit(matrix),
+        lambda: ScikitStandardScaler(with_mean=False).fit(matrix),
+        repetitions,
+    )
+    report_comparison(
+        "Sparse StandardScaler transform",
+        lambda: ours_standard.transform(matrix),
+        lambda: theirs_standard.transform(matrix),
+        repetitions,
+    )
+    ours_maxabs = MaxAbsScaler().fit(matrix)
+    theirs_maxabs = ScikitMaxAbsScaler().fit(matrix)
+    report_comparison(
+        "Sparse MaxAbsScaler fit",
+        lambda: MaxAbsScaler().fit(matrix),
+        lambda: ScikitMaxAbsScaler().fit(matrix),
+        repetitions,
+    )
+    report_comparison(
+        "Sparse MaxAbsScaler transform",
+        lambda: ours_maxabs.transform(matrix),
+        lambda: theirs_maxabs.transform(matrix),
+        repetitions,
+    )
 def main() -> None:

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.2}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "maturin"
 [project]
 name = "r-scikit-learn"
-version = "0.1.0"
+version = "0.1.2"
 description = "High-performance scikit-learn-style machine learning powered by safe Rust"
 readme = "README.md"
 requires-python = ">=3.10"
@@ -26,6 +26,7 @@ dependencies = ["numpy>=1.23", "scipy>=1.10"]
 [project.optional-dependencies]
 dev = [
+  "hypothesis>=6.100,<7",
   "maturin>=1.9,<2.0",
   "pytest>=8",
   "ruff>=0.11",
@@ -36,6 +37,7 @@ dev = [
 Homepage = "https://github.com/rishib42/r-scikit-learn"
 Repository = "https://github.com/rishib42/r-scikit-learn"
 Issues = "https://github.com/rishib42/r-scikit-learn/issues"
+Changelog = "https://github.com/rishib42/r-scikit-learn/blob/main/CHANGELOG.md"
 [tool.maturin]
 python-source = "python"

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.2}/python/rsklearn/__init__.py RENAMED

@@ -10,6 +10,7 @@ from .base import (
 from .compose import ColumnTransformer, make_column_transformer
 from .impute import SimpleImputer
 from .linear_model import ElasticNet, Lasso, LinearRegression, LogisticRegression, Ridge
+from .neighbors import KNeighborsClassifier
 from .pipeline import Pipeline, make_pipeline
 from .preprocessing import (
     LabelEncoder,
@@ -26,6 +27,7 @@ __all__ = [
     "ClassifierMixin",
     "ColumnTransformer",
     "ElasticNet",
+    "KNeighborsClassifier",
     "LabelEncoder",
     "Lasso",
     "LinearRegression",
@@ -45,4 +47,4 @@ __all__ = [
     "make_column_transformer",
     "make_pipeline",
 ]
-__version__ = "0.1.0"
+__version__ = "0.1.2"

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.2}/python/rsklearn/linear_model/_least_squares.py RENAMED

@@ -12,6 +12,26 @@ from rsklearn.base import BaseEstimator, RegressorMixin
 from ._base import LinearModel, validate_regression_fit
+# Normal equations square the condition number. This cutoff limits the
+# resulting float64 error amplification before selecting the fast Gram path.
+_GRAM_MIN_SINGULAR_RATIO = np.finfo(np.float64).eps ** 0.25
+_GRAM_RANK_RESOLUTION = np.sqrt(np.finfo(np.float64).eps)
+def _tall_solution_is_stable(singular: np.ndarray, rank: int, tolerance: float) -> bool:
+    """Return whether normal-equation accuracy is reliable for this spectrum."""
+    if rank == 0 or singular.size == 0 or not np.isfinite(singular).all():
+        return False
+    if rank < singular.size and tolerance < _GRAM_RANK_RESOLUTION:
+        return False
+    largest = singular[0]
+    smallest_retained = singular[rank - 1]
+    return (
+        largest > 0
+        and smallest_retained > 0
+        and smallest_retained / largest >= _GRAM_MIN_SINGULAR_RATIO
+    )
 def _fit_lstsq(
     X: np.ndarray,
@@ -22,7 +42,9 @@ def _fit_lstsq(
 ) -> tuple[np.ndarray, np.ndarray, int, np.ndarray]:
     """Solve unregularized least squares through a shape-aware dense backend."""
     if X.shape[0] >= 4 * X.shape[1]:
-        return _core.linear_fit_tall(X, y, weights, fit_intercept, tolerance)
+        tall_fit = _core.linear_fit_tall(X, y, weights, fit_intercept, tolerance)
+        if _tall_solution_is_stable(tall_fit[3], tall_fit[2], tolerance):
+            return tall_fit
     uniform_weights = np.all(weights == weights[0])
     if fit_intercept:
         if uniform_weights:
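The new `_tall_solution_is_stable` gate keeps the fast normal-equations (Gram) result only when the smallest retained singular value of X is at least eps^(1/4) times the largest. Here eps ≈ 2.2e-16 is float64 machine epsilon, so the cutoff is about 1.2e-4. Because cond(XᵀX) = cond(X)², this caps cond(X) near 8.2e3 and cond(XᵀX) near 6.7e7, which keeps the Gram solve's error amplification on the order of cond(XᵀX)·eps ≈ 1.5e-8. The NumPy sketch below mirrors that idea under simplifying assumptions. `fit_tall_least_squares` is a hypothetical name, and the sketch drops weights, intercept, and the rank tolerance. It also computes singular values with `np.linalg.svd` rather than receiving them from `_core.linear_fit_tall`.

```python
import numpy as np

EPS = np.finfo(np.float64).eps
MIN_SINGULAR_RATIO = EPS**0.25  # cond(X) <= ~8.2e3, so cond(X^T X) <= ~6.7e7


def fit_tall_least_squares(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, str]:
    """Solve min ||X b - y||_2, using normal equations only when they are safe.

    Hypothetical sketch of the 0.1.2 fallback: unweighted, no intercept, and the
    full-rank smallest singular value stands in for the retained-rank check.
    """
    singular = np.linalg.svd(X, compute_uv=False)  # descending order
    if singular.size and singular[0] > 0 and singular[-1] / singular[0] >= MIN_SINGULAR_RATIO:
        # Gram path: solve (X^T X) b = X^T y; error scales with cond(X)**2.
        return np.linalg.solve(X.T @ X, X.T @ y), "gram"
    # Fallback: SVD-based lstsq works on X directly; error scales with cond(X).
    coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefficients, "lstsq"
```

Computing a full SVD only to choose the fast path would cost more than it saves. In the package, the singular values appear to come back from the Rust tall solver itself (`tall_fit[3]` in the diff), which is what makes the check inexpensive.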

r_scikit_learn-0.1.2/python/rsklearn/neighbors/__init__.py ADDED

@@ -0,0 +1,5 @@
+"""Nearest-neighbor estimators."""
+from ._classification import KNeighborsClassifier
+__all__ = ["KNeighborsClassifier"]
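The changelog describes `KNeighborsClassifier` as combining neighbor search with class voting. With `weights="uniform"`, the setting used in the new benchmark, a scikit-learn-style `predict_proba` is the fraction of the k neighbors in each class, and `predict` returns the most probable class. The sketch below assumes those semantics. `vote` is a hypothetical helper that takes neighbor indices such as those returned by `kneighbors`; it is not the package's implementation.

```python
import numpy as np


def vote(
    neighbor_indices: np.ndarray, y_train: np.ndarray
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Uniform-weight class voting from an (n_query, k) array of neighbor indices.

    Returns (classes, probabilities, predictions), where probabilities[i, c] is
    the fraction of query i's neighbors whose label is classes[c].
    """
    classes, y_encoded = np.unique(y_train, return_inverse=True)
    neighbor_labels = y_encoded[neighbor_indices]  # (n_query, k) encoded labels
    n_query, k = neighbor_labels.shape
    counts = np.zeros((n_query, classes.size))
    # Scatter-add one vote per (query row, neighbor label) pair.
    rows = np.repeat(np.arange(n_query), k)
    np.add.at(counts, (rows, neighbor_labels.ravel()), 1.0)
    probabilities = counts / k
    # argmax breaks ties toward the first class in sorted order.
    predictions = classes[np.argmax(probabilities, axis=1)]
    return classes, probabilities, predictions
```

Encoding labels once with `np.unique` lets the vote run on small integers, so string or non-contiguous class labels cost nothing extra during prediction.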
