PyPI - ezmsg-learn - Versions diffs - 1.3.0__tar.gz → 1.4.0__tar.gz - Mend

ezmsg-learn 1.3.0__tar.gz → 1.4.0__tar.gz

Files changed (75)

{ezmsg_learn-1.3.0 → ezmsg_learn-1.4.0}/PKG-INFO RENAMED

@@ -1,13 +1,14 @@
 Metadata-Version: 2.4
 Name: ezmsg-learn
-Version: 1.3.0
+Version: 1.4.0
 Summary: ezmsg namespace package for machine learning
 Author-email: Chadwick Boulay <chadwick.boulay@gmail.com>
 License-Expression: MIT
 License-File: LICENSE
 Requires-Python: >=3.10.15
-Requires-Dist: ezmsg-baseproc>=1.3.0
-Requires-Dist: ezmsg-sigproc>=2.15.0
+Requires-Dist: ezmsg-baseproc>=1.5.1
+Requires-Dist: ezmsg-sigproc>=2.17.0
+Requires-Dist: ezmsg>=3.7.3
 Requires-Dist: river>=0.22.0
 Requires-Dist: scikit-learn>=1.6.0
 Requires-Dist: torch>=2.6.0
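The raised minimums above can be checked against an environment before upgrading. The following is a minimal sketch, not part of the package: the helper names `parse_version`, `meets_minimum` and `check_environment` are hypothetical, and the parser only handles plain dotted numeric versions such as `1.5.1` (no pre-release or local tags).

```python
from importlib import metadata

# Minimum versions taken from the 1.4.0 Requires-Dist lines above.
MINIMUMS = {
    "ezmsg": "3.7.3",
    "ezmsg-baseproc": "1.5.1",
    "ezmsg-sigproc": "2.17.0",
}


def parse_version(text):
    """Turn '2.17.0' into (2, 17, 0). Assumes purely numeric dotted parts."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment(minimums=MINIMUMS):
    """Map each distribution name to True, False, or None when unknown."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # distribution is not installed
            continue
        try:
            report[name] = meets_minimum(installed, minimum)
        except ValueError:
            report[name] = None  # non-numeric version tag; outside this sketch
    return report
```

Comparing integer tuples rather than strings matters here: as text, `"2.9.0"` sorts after `"2.17.0"`, while as tuples `(2, 9, 0)` is correctly older than `(2, 17, 0)`.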

ezmsg_learn-1.4.0/docs/source/guides/array_api.rst ADDED

@@ -0,0 +1,246 @@
+Array API Compatibility
+=======================
+ezmsg-learn uses the `Array API standard <https://data-apis.org/array-api/latest/>`_
+to allow processors to operate on arrays from different backends — NumPy, CuPy,
+PyTorch, and others — without code changes.
+.. contents:: On this page
+   :local:
+   :depth: 2
+How It Works
+------------
+Modules that support the Array API derive the array namespace from their input
+data using ``array_api_compat.get_namespace()``:
+.. code-block:: python
+   from array_api_compat import get_namespace
+   def process(self, data):
+       xp = get_namespace(data)       # numpy, cupy, torch, etc.
+       result = xp.linalg.inv(data)   # dispatches to the right backend
+       return result
+This means that if you pass a CuPy array, all computation stays on the GPU.
+If you pass a NumPy array, it behaves exactly as before.
+Helper utilities from ``ezmsg.sigproc.util.array`` handle device placement
+and creation functions portably:
+- ``array_device(x)`` — returns the device of an array, or ``None``
+- ``xp_create(fn, *args, dtype=None, device=None)`` — calls creation
+  functions (``zeros``, ``eye``) with optional device
+- ``xp_asarray(xp, obj, dtype=None, device=None)`` — portable ``asarray``
+Module Compatibility
+--------------------
+The tables below summarise the Array API status of each module.
+Fully compatible
+^^^^^^^^^^^^^^^^
+These modules perform all computation in the source array namespace.
+.. list-table::
+   :header-rows: 1
+   :widths: 35 65
+   * - Module
+     - Notes
+   * - ``process.ssr``
+     - LRR / self-supervised regression. Full Array API.
+   * - ``model.cca``
+     - Incremental CCA. Replaced ``scipy.linalg.sqrtm`` with an
+       eigendecomposition-based inverse square root using only Array API ops.
+   * - ``process.rnn``
+     - PyTorch-native; operates on ``torch.Tensor`` throughout.
+Mostly compatible (with NumPy boundaries)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+These modules use the Array API for data manipulation but fall back to NumPy
+at specific points where a dependency requires it.
+.. list-table::
+   :header-rows: 1
+   :widths: 25 35 40
+   * - Module
+     - NumPy boundary
+     - Reason
+   * - ``model.refit_kalman``
+     - ``_compute_gain()``
+     - ``scipy.linalg.solve_discrete_are`` has no Array API equivalent.
+       Matrices are converted to NumPy for the DARE solver, then converted back.
+   * - ``model.refit_kalman``
+     - ``refit()`` mutation loop
+     - Per-sample velocity remapping uses ``np.linalg.norm`` on small vectors
+       and scalar element assignment.
+   * - ``process.refit_kalman``
+     - Inherits boundaries from model
+     - State init and output arrays use the source namespace.
+   * - ``process.slda``
+     - ``predict_proba``
+     - sklearn ``LinearDiscriminantAnalysis`` requires NumPy input.
+   * - ``process.adaptive_linear_regressor``
+     - ``partial_fit`` / ``predict``
+     - sklearn and river models require NumPy / pandas input.
+   * - ``dim_reduce.adaptive_decomp``
+     - ``partial_fit`` / ``transform``
+     - sklearn ``IncrementalPCA`` and ``MiniBatchNMF`` require NumPy input.
+Not converted
+^^^^^^^^^^^^^
+These modules use NumPy directly. Conversion would provide little benefit
+because the underlying estimator is the bottleneck.
+.. list-table::
+   :header-rows: 1
+   :widths: 25 75
+   * - Module
+     - Reason
+   * - ``process.linear_regressor``
+     - Thin wrapper around sklearn ``LinearModel.predict``.
+       Could be made compatible if sklearn's ``array_api_dispatch`` is enabled
+       (see below).
+   * - ``process.sgd``
+     - sklearn ``SGDClassifier`` has no Array API support.
+   * - ``process.sklearn``
+     - Generic wrapper for arbitrary models; cannot assume Array API support.
+   * - ``dim_reduce.incremental_decomp``
+     - Delegates to ``adaptive_decomp``; trivial numpy usage (``np.prod`` on
+       Python tuples).
+sklearn Array API Dispatch
+--------------------------
+scikit-learn 1.8+ has experimental support for Array API dispatch on a subset
+of estimators.  Two estimators used in ezmsg-learn are on the supported list:
+.. list-table::
+   :header-rows: 1
+   :widths: 30 30 40
+   * - Estimator
+     - Used in
+     - Constraint
+   * - ``LinearDiscriminantAnalysis``
+     - ``process.slda``
+     - Requires ``solver="svd"`` (the ``"lsqr"`` solver with ``shrinkage``
+       is not supported)
+   * - ``Ridge``
+     - ``process.linear_regressor``
+     - Requires ``solver="svd"``
+To use dispatch, enable it before creating the estimator:
+.. code-block:: python
+   from sklearn import set_config
+   set_config(array_api_dispatch=True)
+.. warning::
+   - ``array_api_dispatch`` is marked **experimental** in sklearn.
+   - Solver constraints (``solver="svd"``) may produce slightly different
+     numerical results compared to other solvers.
+   - Enabling dispatch globally may affect other sklearn estimators in the
+     same process.
+   - ezmsg-learn does **not** enable dispatch by default.
+Estimators that do **not** support Array API dispatch:
+- ``IncrementalPCA``, ``MiniBatchNMF`` — only batch ``PCA`` is supported
+- ``SGDClassifier``, ``SGDRegressor``, ``PassiveAggressiveRegressor``
+- All river models
+Writing Array API Compatible Code
+----------------------------------
+When adding or modifying processors in ezmsg-learn, follow these patterns.
+Deriving the namespace
+^^^^^^^^^^^^^^^^^^^^^^
+Always derive ``xp`` from the input data, not from a hardcoded ``numpy``:
+.. code-block:: python
+   from array_api_compat import get_namespace
+   from ezmsg.sigproc.util.array import array_device, xp_create
+   def _process(self, message):
+       xp = get_namespace(message.data)
+       dev = array_device(message.data)
+Transposing matrices
+^^^^^^^^^^^^^^^^^^^^
+The Array API standard defines ``.T`` only for two-dimensional arrays.  Use
+``xp.linalg.matrix_transpose()``, which also works on stacks of matrices:
+.. code-block:: python
+   # Before (numpy-only)
+   result = A.T @ B
+   # After (Array API)
+   _mT = xp.linalg.matrix_transpose
+   result = _mT(A) @ B
+Creating arrays
+^^^^^^^^^^^^^^^
+Use ``xp_create`` to handle device placement portably:
+.. code-block:: python
+   # Before
+   I = np.eye(n)
+   z = np.zeros((m, n), dtype=np.float64)
+   # After
+   I = xp_create(xp.eye, n, device=dev)
+   z = xp_create(xp.zeros, (m, n), dtype=xp.float64, device=dev)
+Handling sklearn boundaries
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+When calling into sklearn (or other NumPy-only libraries), convert at the
+boundary and convert back:
+.. code-block:: python
+   from array_api_compat import is_numpy_array
+   # Convert to numpy for sklearn
+   X_np = np.asarray(X) if not is_numpy_array(X) else X
+   result_np = estimator.predict(X_np)
+   # Convert back to source namespace
+   result = xp.asarray(result_np) if not is_numpy_array(X) else result_np
+Checking for NaN
+^^^^^^^^^^^^^^^^
+Use ``xp.isnan`` instead of ``np.isnan``:
+.. code-block:: python
+   if xp.any(xp.isnan(message.data)):
+       return
+Norms
+^^^^^
+Use ``xp.linalg.matrix_norm`` (Frobenius by default) instead of
+``np.linalg.norm`` for matrices.  For vectors, use ``xp.linalg.vector_norm``.
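The Norms advice at the end of the guide relies on `matrix_norm` defaulting to the Frobenius norm. As a quick check using plain NumPy only, the sketch below computes that norm from elementwise operations and compares it with NumPy's own default. The `frobenius` helper is hypothetical and is not part of ezmsg-learn; it takes the namespace as `xp` only to mirror the guide's style.

```python
import numpy as np


def frobenius(xp, A):
    """Frobenius norm from elementwise ops: sqrt of the sum of squared entries."""
    return xp.sqrt(xp.sum(A * A))


A = np.array([[3.0, 0.0], [0.0, 4.0]])

fro = float(frobenius(np, A))
# NumPy's default norm for a 2-D input is also Frobenius.
ref = float(np.linalg.norm(A))
# Equivalent view: the vector 2-norm of the flattened matrix.
flat = float(np.linalg.norm(A.reshape(-1)))
```

All three values agree, which is why a matrix norm and a vector norm of the flattened data are interchangeable for the change-magnitude style of computation, while the two functions differ for stacked or higher-rank inputs.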

{ezmsg_learn-1.3.0 → ezmsg_learn-1.4.0}/docs/source/guides/classification.rst RENAMED

@@ -125,7 +125,8 @@ For models that support ``partial_fit``, you can update them during streaming:
 .. code-block:: python
    from ezmsg.learn.process.sklearn import SklearnModelProcessor, SklearnModelSettings
-   from ezmsg.sigproc.sampler import SampleMessage
+   from ezmsg.baseproc import SampleTriggerMessage
+   from ezmsg.util.messages.util import replace
    # Create processor with online learning support
    processor = SklearnModelProcessor(
@@ -137,9 +138,9 @@ For models that support ``partial_fit``, you can update them during streaming:
    )
    # Training with labeled samples
-   sample_msg = SampleMessage(
-       sample=feature_array,  # AxisArray with features
-       trigger=label_value,   # The class label
+   sample_msg = replace(
+       feature_array,  # AxisArray with features
+       attrs={"trigger": SampleTriggerMessage(value=label_value)}
    )
    processor.partial_fit(sample_msg)
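The change above moves the label from a dedicated `SampleMessage` wrapper into the `attrs` of the feature message itself. The pattern can be sketched with the standard library alone. `FakeAxisArray` and `FakeTrigger` below are hypothetical stand-ins for `AxisArray` and `SampleTriggerMessage`, and `dataclasses.replace` stands in for `ezmsg.util.messages.util.replace`, so none of this is the ezmsg API itself.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class FakeTrigger:
    """Stand-in for a trigger message carrying the class label."""
    value: object


@dataclass(frozen=True)
class FakeAxisArray:
    """Stand-in for an array message with free-form attributes."""
    data: tuple
    attrs: dict = field(default_factory=dict)


features = FakeAxisArray(data=(0.1, 0.7, 0.2))

# Copy the message and attach the label under attrs["trigger"].
# The original message object is left untouched.
labelled = replace(
    features,
    attrs={**features.attrs, "trigger": FakeTrigger(value=1)},
)

label = labelled.attrs["trigger"].value
```

Merging the existing `attrs` into the new dict is a choice made in this sketch; the documentation snippet above passes a fresh dict containing only the trigger.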

{ezmsg_learn-1.3.0 → ezmsg_learn-1.4.0}/docs/source/index.rst RENAMED

@@ -54,6 +54,7 @@ For general ezmsg tutorials and guides, visit `ezmsg.org <https://www.ezmsg.org>
    :caption: Contents:
    guides/classification
+   guides/array_api
    api/index

{ezmsg_learn-1.3.0 → ezmsg_learn-1.4.0}/pyproject.toml RENAMED

@@ -9,8 +9,9 @@ license = "MIT"
 requires-python = ">=3.10.15"
 dynamic = ["version"]
 dependencies = [
-    "ezmsg-baseproc>=1.3.0",
-    "ezmsg-sigproc>=2.15.0",
+    "ezmsg>=3.7.3",
+    "ezmsg-baseproc>=1.5.1",
+    "ezmsg-sigproc>=2.17.0",
     "river>=0.22.0",
     "scikit-learn>=1.6.0",
     "torch>=2.6.0",
@@ -73,5 +74,4 @@ known-third-party = ["ezmsg", "ezmsg.baseproc", "ezmsg.sigproc"]
 [tool.uv.sources]
 # Uncomment to use development version of ezmsg from git
-#ezmsg = { git = "https://github.com/ezmsg-org/ezmsg.git", branch = "feature/profiling" }
-#ezmsg-sigproc = { path = "../ezmsg-sigproc", editable = true }
+#ezmsg = { git = "https://github.com/ezmsg-org/ezmsg.git", branch = "feature/profiling" }

{ezmsg_learn-1.3.0 → ezmsg_learn-1.4.0}/src/ezmsg/learn/__version__.py RENAMED

@@ -28,7 +28,7 @@ version_tuple: VERSION_TUPLE
 commit_id: COMMIT_ID
 __commit_id__: COMMIT_ID
-__version__ = version = '1.3.0'
-__version_tuple__ = version_tuple = (1, 3, 0)
+__version__ = version = '1.4.0'
+__version_tuple__ = version_tuple = (1, 4, 0)
 __commit_id__ = commit_id = None

{ezmsg_learn-1.3.0 → ezmsg_learn-1.4.0}/src/ezmsg/learn/dim_reduce/adaptive_decomp.py RENAMED

@@ -1,7 +1,18 @@
+"""Adaptive decomposition transformers (PCA, NMF).
+.. note::
+    This module supports the Array API standard via
+    ``array_api_compat.get_namespace()``.  Reshaping and output allocation
+    use Array API operations; a NumPy boundary is applied before sklearn
+    ``partial_fit``/``transform`` calls.
+"""
+import math
 import typing
 import ezmsg.core as ez
 import numpy as np
+from array_api_compat import get_namespace, is_numpy_array
 from ezmsg.baseproc import (
     BaseAdaptiveTransformer,
     BaseAdaptiveTransformerUnit,
@@ -128,6 +139,8 @@ class AdaptiveDecompTransformer(
         if in_dat.shape[ax_idx] == 0:
             return self._state.template
+        xp = get_namespace(in_dat)
         # Re-order axes
         sorted_dims_exp = [iter_axis] + off_targ_axes + targ_axes
         if message.dims != sorted_dims_exp:
@@ -137,16 +150,20 @@ class AdaptiveDecompTransformer(
             pass
         # fold [iter_axis] + off_targ_axes together and fold targ_axes together
-        d2 = np.prod(in_dat.shape[len(off_targ_axes) + 1 :])
-        in_dat = in_dat.reshape((-1, d2))
+        d2 = math.prod(in_dat.shape[len(off_targ_axes) + 1 :])
+        in_dat = xp.reshape(in_dat, (-1, d2))
         replace_kwargs = {
             "axes": {**self._state.template.axes, iter_axis: message.axes[iter_axis]},
         }
-        # Transform data
+        # Transform data — sklearn needs numpy
         if hasattr(self._state.estimator, "components_"):
-            decomp_dat = self._state.estimator.transform(in_dat).reshape((-1,) + self._state.template.data.shape[1:])
+            in_np = np.asarray(in_dat) if not is_numpy_array(in_dat) else in_dat
+            decomp_dat = self._state.estimator.transform(in_np)
+            # Convert back to source namespace
+            decomp_dat = xp.asarray(decomp_dat) if not is_numpy_array(in_dat) else decomp_dat
+            decomp_dat = xp.reshape(decomp_dat, (-1,) + self._state.template.data.shape[1:])
             replace_kwargs["data"] = decomp_dat
         return replace(self._state.template, **replace_kwargs)
@@ -165,6 +182,8 @@ class AdaptiveDecompTransformer(
         if in_dat.shape[ax_idx] == 0:
             return
+        xp = get_namespace(in_dat)
         # Re-order axes if needed
         sorted_dims_exp = [iter_axis] + off_targ_axes + targ_axes
         if message.dims != sorted_dims_exp:
@@ -172,11 +191,12 @@ class AdaptiveDecompTransformer(
             pass
         # fold [iter_axis] + off_targ_axes together and fold targ_axes together
-        d2 = np.prod(in_dat.shape[len(off_targ_axes) + 1 :])
-        in_dat = in_dat.reshape((-1, d2))
+        d2 = math.prod(in_dat.shape[len(off_targ_axes) + 1 :])
+        in_dat = xp.reshape(in_dat, (-1, d2))
-        # Fit the estimator
-        self._state.estimator.partial_fit(in_dat)
+        # Fit the estimator — sklearn needs numpy
+        in_np = np.asarray(in_dat) if not is_numpy_array(in_dat) else in_dat
+        self._state.estimator.partial_fit(in_np)
 class IncrementalPCASettings(AdaptiveDecompSettings):
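The switch from `np.prod` to `math.prod` in the hunks above keeps the folded dimension a plain Python `int` computed without any array library, which suits a shape argument passed to `xp.reshape`. A small sketch of the folding arithmetic follows; the shape and the `n_off_targ` count are made-up values for illustration.

```python
import math

# Hypothetical input shape ordered as [iter_axis] + off_targ_axes + targ_axes:
# 10 time steps, one off-target axis of length 3, target axes of length 4 and 5.
shape = (10, 3, 4, 5)
n_off_targ = 1

# Fold the target axes together, as in the hunks above.
d2 = math.prod(shape[n_off_targ + 1:])

# The leading -1 in reshape((-1, d2)) absorbs iter_axis and the off-target axes.
d1 = math.prod(shape) // d2
folded_shape = (d1, d2)

# math.prod of an empty tuple is 1, so an input with no target axes
# still yields a valid second dimension.
empty = math.prod(())
```

Here the four-dimensional input folds to a `(30, 20)` matrix, the two-dimensional layout that sklearn's `partial_fit` and `transform` expect.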

ezmsg_learn-1.4.0/src/ezmsg/learn/model/cca.py ADDED

@@ -0,0 +1,163 @@
+"""Incremental Canonical Correlation Analysis (CCA).
+.. note::
+    This module supports the Array API standard via
+    ``array_api_compat.get_namespace()``.  All linear algebra uses Array API
+    operations; ``scipy.linalg.sqrtm`` is replaced by an eigendecomposition-
+    based inverse square root (:func:`_inv_sqrtm_spd`).
+"""
+import numpy as np
+from array_api_compat import get_namespace
+from ezmsg.sigproc.util.array import array_device, xp_create
+def _inv_sqrtm_spd(xp, A):
+    """Inverse matrix square root for symmetric positive-definite matrices.
+    Computes ``inv(sqrtm(A)) = Q @ diag(1/sqrt(lambda)) @ Q^T`` using the
+    eigendecomposition.  This is more numerically stable than computing
+    ``inv(sqrtm(...))`` separately and uses only Array API operations.
+    """
+    eigenvalues, eigenvectors = xp.linalg.eigh(A)
+    eigenvalues = xp.clip(eigenvalues, 1e-12, None)  # avoid div-by-zero
+    inv_sqrt_eig = 1.0 / xp.sqrt(eigenvalues)
+    # Q @ diag(v) == Q * v (broadcasting), then @ Q^T
+    return (eigenvectors * inv_sqrt_eig) @ xp.linalg.matrix_transpose(eigenvectors)
+class IncrementalCCA:
+    def __init__(
+        self,
+        n_components=2,
+        base_smoothing=0.95,
+        min_smoothing=0.5,
+        max_smoothing=0.99,
+        adaptation_rate=0.1,
+    ):
+        """
+        Parameters:
+        -----------
+        n_components : int
+            Number of canonical components to compute
+        base_smoothing : float
+            Base smoothing factor (will be adapted)
+        min_smoothing : float
+            Minimum allowed smoothing factor
+        max_smoothing : float
+            Maximum allowed smoothing factor
+        adaptation_rate : float
+            How quickly to adjust smoothing factor (between 0 and 1)
+        """
+        self.n_components = n_components
+        self.base_smoothing = base_smoothing
+        self.current_smoothing = base_smoothing
+        self.min_smoothing = min_smoothing
+        self.max_smoothing = max_smoothing
+        self.adaptation_rate = adaptation_rate
+        self.initialized = False
+    def initialize(self, d1, d2, *, ref_array=None):
+        """Initialize the necessary matrices.
+        Args:
+            d1: Dimensionality of the first dataset.
+            d2: Dimensionality of the second dataset.
+            ref_array: Optional reference array to derive array namespace
+                and device from.  If ``None``, defaults to NumPy.
+        """
+        self.d1 = d1
+        self.d2 = d2
+        if ref_array is not None:
+            xp = get_namespace(ref_array)
+            dev = array_device(ref_array)
+        else:
+            xp, dev = np, None
+        # Initialize correlation matrices
+        self.C11 = xp_create(xp.zeros, (d1, d1), dtype=xp.float64, device=dev)
+        self.C22 = xp_create(xp.zeros, (d2, d2), dtype=xp.float64, device=dev)
+        self.C12 = xp_create(xp.zeros, (d1, d2), dtype=xp.float64, device=dev)
+        self.initialized = True
+    def _compute_change_magnitude(self, C11_new, C22_new, C12_new):
+        """Compute magnitude of change in correlation structure."""
+        xp = get_namespace(self.C11)
+        # Frobenius norm of differences
+        diff11 = xp.linalg.matrix_norm(C11_new - self.C11)
+        diff22 = xp.linalg.matrix_norm(C22_new - self.C22)
+        diff12 = xp.linalg.matrix_norm(C12_new - self.C12)
+        # Normalize by matrix sizes
+        diff11 = diff11 / (self.d1 * self.d1)
+        diff22 = diff22 / (self.d2 * self.d2)
+        diff12 = diff12 / (self.d1 * self.d2)
+        return float((diff11 + diff22 + diff12) / 3)
+    def _adapt_smoothing(self, change_magnitude):
+        """Adapt smoothing factor based on detected changes."""
+        # If change is large, decrease smoothing factor
+        target_smoothing = self.base_smoothing * (1.0 - change_magnitude)
+        target_smoothing = max(self.min_smoothing, min(target_smoothing, self.max_smoothing))
+        # Smooth the adaptation itself
+        self.current_smoothing = (
+            1 - self.adaptation_rate
+        ) * self.current_smoothing + self.adaptation_rate * target_smoothing
+    def partial_fit(self, X1, X2, update_projections=True):
+        """Update the model with new samples using adaptive smoothing.
+        Assumes X1 and X2 are already centered and scaled."""
+        xp = get_namespace(X1, X2)
+        _mT = xp.linalg.matrix_transpose
+        if not self.initialized:
+            self.initialize(X1.shape[1], X2.shape[1], ref_array=X1)
+        # Compute new correlation matrices from current batch
+        C11_new = _mT(X1) @ X1 / X1.shape[0]
+        C22_new = _mT(X2) @ X2 / X2.shape[0]
+        C12_new = _mT(X1) @ X2 / X1.shape[0]
+        # Detect changes and adapt smoothing factor
+        if bool(xp.any(self.C11 != 0)):  # Skip first update
+            change_magnitude = self._compute_change_magnitude(C11_new, C22_new, C12_new)
+            self._adapt_smoothing(change_magnitude)
+        # Update with current smoothing factor
+        alpha = self.current_smoothing
+        self.C11 = alpha * self.C11 + (1 - alpha) * C11_new
+        self.C22 = alpha * self.C22 + (1 - alpha) * C22_new
+        self.C12 = alpha * self.C12 + (1 - alpha) * C12_new
+        if update_projections:
+            self._update_projections()
+    def _update_projections(self):
+        """Update canonical vectors and correlations."""
+        xp = get_namespace(self.C11)
+        dev = array_device(self.C11)
+        _mT = xp.linalg.matrix_transpose
+        eps = 1e-8
+        C11_reg = self.C11 + eps * xp_create(xp.eye, self.d1, dtype=self.C11.dtype, device=dev)
+        C22_reg = self.C22 + eps * xp_create(xp.eye, self.d2, dtype=self.C22.dtype, device=dev)
+        inv_sqrt_C11 = _inv_sqrtm_spd(xp, C11_reg)
+        inv_sqrt_C22 = _inv_sqrtm_spd(xp, C22_reg)
+        K = inv_sqrt_C11 @ self.C12 @ inv_sqrt_C22
+        U, self.correlations_, Vh = xp.linalg.svd(K, full_matrices=False)
+        self.x_weights_ = inv_sqrt_C11 @ U[:, : self.n_components]
+        self.y_weights_ = inv_sqrt_C22 @ _mT(Vh)[:, : self.n_components]
+    def transform(self, X1, X2):
+        """Project data onto canonical components."""
+        X1_proj = X1 @ self.x_weights_
+        X2_proj = X2 @ self.y_weights_
+        return X1_proj, X2_proj
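The eigendecomposition-based inverse square root used by `_inv_sqrtm_spd` above can be sanity-checked with plain NumPy. This sketch re-implements the same formula outside the package (the name `inv_sqrtm_spd`, the `floor` parameter and the test matrix are illustrative only) and checks the defining property W A W = I for a symmetric positive-definite A.

```python
import numpy as np


def inv_sqrtm_spd(A, floor=1e-12):
    """Return A^(-1/2) for a symmetric positive-definite matrix A."""
    # A = Q diag(eigenvalues) Q^T, with orthonormal columns in Q.
    eigenvalues, Q = np.linalg.eigh(A)
    # Guard against tiny or slightly negative eigenvalues before dividing.
    eigenvalues = np.clip(eigenvalues, floor, None)
    scale = 1.0 / np.sqrt(eigenvalues)
    # Q * scale multiplies column j of Q by scale[j], i.e. Q @ diag(scale).
    return (Q * scale) @ Q.T


rng = np.random.default_rng(0)
B = rng.standard_normal((50, 4))
# Sample covariance plus a small ridge gives a well-conditioned SPD matrix.
A = B.T @ B / B.shape[0] + 1e-3 * np.eye(4)

W = inv_sqrtm_spd(A)
residual = np.max(np.abs(W @ A @ W - np.eye(4)))
```

The residual should sit near floating-point round-off for a well-conditioned input. For nearly singular covariance matrices the eigenvalue floor dominates the result, which is one reason `_update_projections` adds `eps` times the identity before calling the helper.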
