
r-scikit-learn 0.1.0.tar.gz → 0.1.1.tar.gz


Files changed (84)

r_scikit_learn-0.1.1/CHANGELOG.md ADDED

@@ -0,0 +1,23 @@
+# Changelog
+All notable changes to r-scikit-learn are documented here. Release tags and
+published package versions are immutable.
+## Unreleased
+## 0.1.1 - 2026-06-15
+- Added wheel and source-distribution installation testing across supported
+  operating systems and Python versions.
+- Added a numerical-safety fallback for ill-conditioned tall least-squares
+  problems.
+- Added TestPyPI, cross-platform benchmark, and immutable manual release
+  workflows.
+## 0.1.0
+- Added Rust-powered preprocessing, categorical encoding, sparse
+  infrastructure, composition, metrics, model selection, and linear models.
+- Added Linux, macOS, and Windows wheel builds for Python 3.10 through 3.13.
+- Added Rust-native tall-matrix least squares and multinomial logistic
+  optimization.

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.1}/Cargo.lock RENAMED

@@ -998,7 +998,7 @@ dependencies = [
 [[package]]
 name = "r-scikit-learn-core"
-version = "0.1.0"
+version = "0.1.1"
 dependencies = [
  "faer",
  "nalgebra",

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.1}/Cargo.toml RENAMED

@@ -1,6 +1,6 @@
 [package]
 name = "r-scikit-learn-core"
-version = "0.1.0"
+version = "0.1.1"
 edition = "2021"
 license = "MIT"
 description = "Rust computational core for r-scikit-learn"
@@ -9,6 +9,7 @@ repository = "https://github.com/rishib42/r-scikit-learn"
 include = [
   "/Cargo.lock",
   "/Cargo.toml",
+  "/CHANGELOG.md",
   "/LICENSE",
   "/README.md",
   "/benches/*.py",

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: r-scikit-learn
-Version: 0.1.0
+Version: 0.1.1
 Classifier: Development Status :: 3 - Alpha
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3
@@ -12,6 +12,7 @@ Classifier: Programming Language :: Rust
 Classifier: Typing :: Typed
 Requires-Dist: numpy>=1.23
 Requires-Dist: scipy>=1.10
+Requires-Dist: hypothesis>=6.100,<7 ; extra == 'dev'
 Requires-Dist: maturin>=1.9,<2.0 ; extra == 'dev'
 Requires-Dist: pytest>=8 ; extra == 'dev'
 Requires-Dist: ruff>=0.11 ; extra == 'dev'
@@ -25,6 +26,7 @@ Author: r-scikit-learn contributors
 License-Expression: MIT
 Requires-Python: >=3.10
 Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
+Project-URL: Changelog, https://github.com/rishib42/r-scikit-learn/blob/main/CHANGELOG.md
 Project-URL: Homepage, https://github.com/rishib42/r-scikit-learn
 Project-URL: Issues, https://github.com/rishib42/r-scikit-learn/issues
 Project-URL: Repository, https://github.com/rishib42/r-scikit-learn
@@ -34,7 +36,7 @@ Project-URL: Repository, https://github.com/rishib42/r-scikit-learn
 Fast, familiar machine-learning building blocks powered by safe Rust. 🦀
 `r-scikit-learn` combines a Rust computational core with lightweight,
-scikit-learn-style Python estimators. Version 0.1.0 includes:
+scikit-learn-style Python estimators. Version 0.1.1 includes:
 - Preprocessing, categorical encoding, and missing-value imputation
 - Pipelines and column transformers
@@ -327,14 +329,22 @@ Substantial numerical loops release the Python GIL.
 ## Release
-1. Run all development checks and build a release wheel.
-2. Install the wheel into a clean virtual environment and run the import smoke
-   test.
-3. Verify the distribution name on PyPI.
-4. Tag the release as `v0.1.0` and push the tag.
-5. Approve the GitHub Actions Trusted Publishing environment.
-The release workflow uses PyPI Trusted Publishing and contains no API token.
+1. Update the matching versions in `pyproject.toml`, `Cargo.toml`, and
+   `python/rsklearn/__init__.py`, then update `CHANGELOG.md`.
+2. Push the release commit and wait for CI, including manylinux and sdist
+   installation checks, to pass.
+3. Run the manual TestPyPI workflow and verify its distributions.
+4. Run the manual Release workflow with the version number without a `v`
+   prefix.
+5. Approve the PyPI environment if required.
+The release workflow refuses existing versions, installs every wheel on
+Python 3.10-3.13 across Linux, macOS, and Windows, verifies sdist installation,
+publishes through PyPI Trusted Publishing, creates the immutable GitHub tag and
+release, attaches artifacts, and verifies installation from PyPI. No API token
+is stored in the repository. Configure separate `pypi` and `testpypi` GitHub
+environments and matching Trusted Publishers for `release.yml` and
+`test-pypi.yml`, respectively.
 ## Roadmap

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.1}/README.md RENAMED

@@ -3,7 +3,7 @@
 Fast, familiar machine-learning building blocks powered by safe Rust. 🦀
 `r-scikit-learn` combines a Rust computational core with lightweight,
-scikit-learn-style Python estimators. Version 0.1.0 includes:
+scikit-learn-style Python estimators. Version 0.1.1 includes:
 - Preprocessing, categorical encoding, and missing-value imputation
 - Pipelines and column transformers
@@ -296,14 +296,22 @@ Substantial numerical loops release the Python GIL.
 ## Release
-1. Run all development checks and build a release wheel.
-2. Install the wheel into a clean virtual environment and run the import smoke
-   test.
-3. Verify the distribution name on PyPI.
-4. Tag the release as `v0.1.0` and push the tag.
-5. Approve the GitHub Actions Trusted Publishing environment.
-The release workflow uses PyPI Trusted Publishing and contains no API token.
+1. Update the matching versions in `pyproject.toml`, `Cargo.toml`, and
+   `python/rsklearn/__init__.py`, then update `CHANGELOG.md`.
+2. Push the release commit and wait for CI, including manylinux and sdist
+   installation checks, to pass.
+3. Run the manual TestPyPI workflow and verify its distributions.
+4. Run the manual Release workflow with the version number without a `v`
+   prefix.
+5. Approve the PyPI environment if required.
+The release workflow refuses existing versions, installs every wheel on
+Python 3.10-3.13 across Linux, macOS, and Windows, verifies sdist installation,
+publishes through PyPI Trusted Publishing, creates the immutable GitHub tag and
+release, attaches artifacts, and verifies installation from PyPI. No API token
+is stored in the repository. Configure separate `pypi` and `testpypi` GitHub
+environments and matching Trusted Publishers for `release.yml` and
+`test-pypi.yml`, respectively.
 ## Roadmap
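
Step 1 of the revised release checklist depends on three files carrying the same version string. The sketch below is not part of the package. It is a minimal, hypothetical pre-release check (the helper names `read_version` and `versions_match` are assumptions) that reads the first `version = "..."` line from each TOML file and the `__version__` assignment with regular expressions, which is a simplification that assumes the layout shown in this diff.

```python
import re

# Simplified patterns: assume the first top-level `version = "..."` line in each
# TOML file is the package version, as in the files shown in this diff.
TOML_VERSION = r'^version\s*=\s*"([^"]+)"'
INIT_VERSION = r'^__version__\s*=\s*"([^"]+)"'


def read_version(text: str, pattern: str) -> str:
    """Return the first version string matched by `pattern` (hypothetical helper)."""
    match = re.search(pattern, text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no version string found")
    return match.group(1)


def versions_match(pyproject: str, cargo: str, init: str) -> bool:
    """Return True when all three files declare the same version."""
    found = {
        read_version(pyproject, TOML_VERSION),
        read_version(cargo, TOML_VERSION),
        read_version(init, INIT_VERSION),
    }
    return len(found) == 1


# Inline stand-ins for pyproject.toml, Cargo.toml, and python/rsklearn/__init__.py.
pyproject_text = '[project]\nname = "r-scikit-learn"\nversion = "0.1.1"\n'
cargo_text = '[package]\nname = "r-scikit-learn-core"\nversion = "0.1.1"\n'
init_text = '__version__ = "0.1.1"\n'

print(versions_match(pyproject_text, cargo_text, init_text))
```

In a real checkout the three strings would come from reading the files on disk. A full TOML parser would be more robust than the regular expressions used here.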

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.1}/benches/benchmark_linear_models.py RENAMED

@@ -10,6 +10,8 @@ from collections.abc import Callable
 import numpy as np
 import rsklearn.linear_model as rlinear
+import scipy
+import sklearn
 import sklearn.linear_model as slinear
 from rsklearn import _core
@@ -65,6 +67,10 @@ def main() -> None:
         )
     print(f"Python: {sys.executable}")
     print(f"Rust extension: {_core.__file__} ({profile})")
+    print(
+        f"Dependencies: numpy {np.__version__}, scipy {scipy.__version__}, "
+        f"scikit-learn {sklearn.__version__}"
+    )
     rng = np.random.default_rng(20260614)
     X = rng.normal(size=(args.samples, args.features))
     coefficients = rng.normal(size=args.features)

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.1}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "maturin"
 [project]
 name = "r-scikit-learn"
-version = "0.1.0"
+version = "0.1.1"
 description = "High-performance scikit-learn-style machine learning powered by safe Rust"
 readme = "README.md"
 requires-python = ">=3.10"
@@ -26,6 +26,7 @@ dependencies = ["numpy>=1.23", "scipy>=1.10"]
 [project.optional-dependencies]
 dev = [
+  "hypothesis>=6.100,<7",
   "maturin>=1.9,<2.0",
   "pytest>=8",
   "ruff>=0.11",
@@ -36,6 +37,7 @@ dev = [
 Homepage = "https://github.com/rishib42/r-scikit-learn"
 Repository = "https://github.com/rishib42/r-scikit-learn"
 Issues = "https://github.com/rishib42/r-scikit-learn/issues"
+Changelog = "https://github.com/rishib42/r-scikit-learn/blob/main/CHANGELOG.md"
 [tool.maturin]
 python-source = "python"

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.1}/python/rsklearn/__init__.py RENAMED

@@ -45,4 +45,4 @@ __all__ = [
     "make_column_transformer",
     "make_pipeline",
 ]
-__version__ = "0.1.0"
+__version__ = "0.1.1"

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.1}/python/rsklearn/linear_model/_least_squares.py RENAMED

@@ -12,6 +12,26 @@ from rsklearn.base import BaseEstimator, RegressorMixin
 from ._base import LinearModel, validate_regression_fit
+# Normal equations square the condition number. This cutoff limits the
+# resulting float64 error amplification before selecting the fast Gram path.
+_GRAM_MIN_SINGULAR_RATIO = np.finfo(np.float64).eps ** 0.25
+_GRAM_RANK_RESOLUTION = np.sqrt(np.finfo(np.float64).eps)
+def _tall_solution_is_stable(singular: np.ndarray, rank: int, tolerance: float) -> bool:
+    """Return whether normal-equation accuracy is reliable for this spectrum."""
+    if rank == 0 or singular.size == 0 or not np.isfinite(singular).all():
+        return False
+    if rank < singular.size and tolerance < _GRAM_RANK_RESOLUTION:
+        return False
+    largest = singular[0]
+    smallest_retained = singular[rank - 1]
+    return (
+        largest > 0
+        and smallest_retained > 0
+        and smallest_retained / largest >= _GRAM_MIN_SINGULAR_RATIO
+    )
 def _fit_lstsq(
     X: np.ndarray,
@@ -22,7 +42,9 @@ def _fit_lstsq(
 ) -> tuple[np.ndarray, np.ndarray, int, np.ndarray]:
     """Solve unregularized least squares through a shape-aware dense backend."""
     if X.shape[0] >= 4 * X.shape[1]:
-        return _core.linear_fit_tall(X, y, weights, fit_intercept, tolerance)
+        tall_fit = _core.linear_fit_tall(X, y, weights, fit_intercept, tolerance)
+        if _tall_solution_is_stable(tall_fit[3], tall_fit[2], tolerance):
+            return tall_fit
     uniform_weights = np.all(weights == weights[0])
     if fit_intercept:
         if uniform_weights:
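
A note on the cutoffs added in this hunk: float64 machine epsilon is `2**-52`, so `_GRAM_MIN_SINGULAR_RATIO` evaluates to about `2**-13` (roughly 1.22e-4) and `_GRAM_RANK_RESOLUTION` to about `2**-26` (roughly 1.49e-8). The fast path is therefore kept only when the retained singular values span a condition number of at most about 8192. The sketch below is not package code. It is a minimal illustration, using an arbitrary synthetic matrix and hypothetical variable names, of the comment's claim that forming the Gram matrix `X.T @ X` squares the condition number.

```python
import numpy as np

eps = np.finfo(np.float64).eps
min_ratio = eps ** 0.25          # same expression as _GRAM_MIN_SINGULAR_RATIO
rank_resolution = np.sqrt(eps)   # same expression as _GRAM_RANK_RESOLUTION

# Build a tall matrix with prescribed singular values (illustrative sizes only).
rng = np.random.default_rng(0)
left, _ = np.linalg.qr(rng.normal(size=(200, 4)))   # orthonormal columns
right, _ = np.linalg.qr(rng.normal(size=(4, 4)))    # orthogonal matrix
singular = np.geomspace(1.0, 1e-3, 4)               # condition number 1e3 by construction
X = (left * singular) @ right.T

cond_X = np.linalg.cond(X)
cond_gram = np.linalg.cond(X.T @ X)

print(f"cutoff ratio: {min_ratio:.3e}, rank resolution: {rank_resolution:.3e}")
print(f"cond(X): {cond_X:.3e}, cond(X.T @ X): {cond_gram:.3e}")
# This spectrum has ratio 1e-3 >= min_ratio, so the gate above would accept it.
print("passes ratio gate:", singular[-1] / singular[0] >= min_ratio)
```

With a condition number near the 8192 limit, the squared value is about 6.7e7, which keeps the error amplification of the normal equations near `sqrt(eps)` in float64.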

r_scikit_learn-0.1.1/tests/release_smoke.py ADDED

@@ -0,0 +1,28 @@
+"""Minimal installed-distribution smoke test used by release workflows."""
+from __future__ import annotations
+import numpy as np
+import rsklearn
+from rsklearn.linear_model import LinearRegression, LogisticRegression
+from rsklearn.preprocessing import OneHotEncoder, StandardScaler
+def main() -> None:
+    X = np.asarray([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
+    regression = LinearRegression().fit(X, [3.0, 3.0, 7.0, 7.0])
+    np.testing.assert_allclose(regression.predict(X), [3.0, 3.0, 7.0, 7.0])
+    classification = LogisticRegression(max_iter=500).fit(X, [0, 0, 1, 1])
+    np.testing.assert_array_equal(classification.predict(X), [0, 0, 1, 1])
+    scaled = StandardScaler().fit_transform(X)
+    np.testing.assert_allclose(scaled.mean(axis=0), 0.0, atol=1e-12)
+    encoded = OneHotEncoder().fit_transform([["a"], ["b"], ["a"]])
+    assert encoded.shape == (3, 2)
+    assert rsklearn.__version__
+if __name__ == "__main__":
+    main()

{r_scikit_learn-0.1.0 → r_scikit_learn-0.1.1}/tests/test_linear_model_parity.py RENAMED

@@ -1,12 +1,16 @@
 import numpy as np
 import pytest
+from hypothesis import given, settings
+from hypothesis import strategies as st
 from rsklearn.linear_model import (
     ElasticNet,
     Lasso,
     LinearRegression,
     LogisticRegression,
     Ridge,
+    _least_squares,
 )
+from rsklearn.linear_model._least_squares import _tall_solution_is_stable
 from scipy import linalg
 sklearn_linear = pytest.importorskip("sklearn.linear_model")
@@ -60,6 +64,84 @@ def test_tall_linear_regression_matches_svd_near_rank_deficiency(perturbation):
     np.testing.assert_allclose(ours.predict(X), expected, rtol=1e-7, atol=5e-9)
+@given(
+    rows=st.integers(min_value=80, max_value=240),
+    columns=st.integers(min_value=2, max_value=12),
+    log_condition=st.floats(
+        min_value=0.0, max_value=12.0, allow_nan=False, allow_infinity=False
+    ),
+    weighted=st.booleans(),
+    fit_intercept=st.booleans(),
+)
+@settings(max_examples=40, deadline=None)
+def test_linear_regression_matches_svd_across_condition_numbers(
+    rows, columns, log_condition, weighted, fit_intercept
+):
+    rows = max(rows, 4 * columns)
+    rng = np.random.default_rng(
+        rows * 10_000 + columns * 100 + int(log_condition * 10) + int(weighted)
+    )
+    left, _ = np.linalg.qr(rng.normal(size=(rows, columns)))
+    right, _ = np.linalg.qr(rng.normal(size=(columns, columns)))
+    singular = np.geomspace(1.0, 10.0**-log_condition, columns)
+    X = np.ascontiguousarray((left * singular) @ right.T)
+    y = rng.normal(size=rows)
+    weights = rng.uniform(0.1, 2.0, size=rows) if weighted else None
+    ours = LinearRegression(tol=1e-10, fit_intercept=fit_intercept).fit(
+        X, y, sample_weight=weights
+    )
+    reference_weights = np.ones(rows) if weights is None else weights
+    if fit_intercept:
+        x_mean = np.average(X, axis=0, weights=reference_weights)
+        y_mean = np.average(y, weights=reference_weights)
+    else:
+        x_mean = np.zeros(columns)
+        y_mean = 0.0
+    root_weights = np.sqrt(reference_weights)
+    coefficients, _, _, _ = linalg.lstsq(
+        (X - x_mean) * root_weights[:, None],
+        (y - y_mean) * root_weights,
+        cond=1e-10,
+        check_finite=False,
+        lapack_driver="gelsd",
+    )
+    expected = X @ coefficients + y_mean - coefficients @ x_mean
+    np.testing.assert_allclose(ours.predict(X), expected, rtol=1e-7, atol=1e-9)
+def test_tall_solution_stability_gate_rejects_unsafe_spectra():
+    assert _tall_solution_is_stable(np.asarray([10.0, 1.0]), 2, 1e-6)
+    assert not _tall_solution_is_stable(np.asarray([10.0, 1e-10]), 2, 1e-6)
+    assert _tall_solution_is_stable(np.asarray([10.0, 1.0, 0.0]), 2, 1e-6)
+    assert not _tall_solution_is_stable(np.asarray([10.0, 1.0, 0.0]), 2, 1e-10)
+    assert not _tall_solution_is_stable(np.asarray([0.0, 0.0]), 0, 1e-6)
+@pytest.mark.parametrize(
+    "log_condition, expects_fallback", [(2.0, False), (10.0, True)]
+)
+def test_tall_linear_regression_falls_back_only_when_numerically_unsafe(
+    monkeypatch, log_condition, expects_fallback
+):
+    rng = np.random.default_rng(1234)
+    left, _ = np.linalg.qr(rng.normal(size=(1_000, 5)))
+    right, _ = np.linalg.qr(rng.normal(size=(5, 5)))
+    X = np.ascontiguousarray(
+        (left * np.geomspace(1.0, 10.0**-log_condition, 5)) @ right.T
+    )
+    original = _least_squares.linalg.lstsq
+    calls = 0
+    def tracked_lstsq(*args, **kwargs):
+        nonlocal calls
+        calls += 1
+        return original(*args, **kwargs)
+    monkeypatch.setattr(_least_squares.linalg, "lstsq", tracked_lstsq)
+    LinearRegression(tol=1e-10).fit(X, rng.normal(size=X.shape[0]))
+    assert bool(calls) is expects_fallback
 @pytest.mark.parametrize("alpha", [0.0, 0.1, 10.0])
 @pytest.mark.parametrize("fit_intercept", [True, False])
 def test_ridge_matches_scikit_learn_svd(alpha, fit_intercept):
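
The reference solution in `test_linear_regression_matches_svd_across_condition_numbers` above handles sample weights and the intercept by centering `X` and `y` on their weighted means and scaling rows by the square root of the weights. The sketch below is not taken from the test suite. It is a small check, on arbitrary random data with hypothetical variable names, that this centered formulation agrees with the more direct formulation that appends an explicit column of ones.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
weights = rng.uniform(0.1, 2.0, size=50)
root_weights = np.sqrt(weights)

# Centered formulation, mirroring the reference computation in the test.
x_mean = np.average(X, axis=0, weights=weights)
y_mean = np.average(y, weights=weights)
coefficients, _, _, _ = linalg.lstsq(
    (X - x_mean) * root_weights[:, None],
    (y - y_mean) * root_weights,
)
intercept = y_mean - coefficients @ x_mean

# Direct formulation: explicit intercept column, same square-root row scaling.
design = np.column_stack([X, np.ones(X.shape[0])])
direct, _, _, _ = linalg.lstsq(
    design * root_weights[:, None],
    y * root_weights,
)

print(np.allclose(direct[:3], coefficients), np.isclose(direct[3], intercept))
```

Centering keeps the intercept column out of the system being solved, so the conditioning of the solve depends on the centered features alone.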