PyPI - pystatistics - Versions diffs - 2.0.1__tar.gz → 2.1.0__tar.gz - Mend

pystatistics 2.0.1tar.gz → 2.1.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (544)

{pystatistics-2.0.1 → pystatistics-2.1.0}/CHANGELOG.md RENAMED

@@ -1,5 +1,218 @@
 # Changelog
+## 2.1.0
+- **`mom_mcar_test`: new method-of-moments MCAR test**
+  (``pystatistics/mvnmle/mcar_test.py``). A separate function — not
+  a new mode on ``little_mcar_test`` — because the method-of-moments
+  variant **is not Little's test**. Little (1988) specifically calls
+  for MLE plug-in estimators; swapping in pairwise-deletion sample
+  moments gives a statistic of the same shape with different
+  asymptotic properties, and calling it Little's test would be a
+  polite but concrete lie. The separate function preserves the
+  ``little_mcar_test`` contract exactly (matches R ``mvnmle`` bit-
+  for-bit as before) while giving users a documented fast alternative.
+  End-to-end timings at 15 % MCAR:
+  | dataset        | shape     | little_mcar_test | mom_mcar_test |
+  |----------------|-----------|------------------|---------------|
+  | iris           | 150 × 4   | 2.9 ms           | 0.31 ms       |
+  | wine           | 178 × 13  | 60.9 ms          | 2.17 ms       |
+  | breast_cancer  | 569 × 30  | 1491 ms          | 28.7 ms       |
+  For repeated-diagnostic workflows (e.g. an MCAR sweep over 3410
+  datasets), this is **1.6 minutes vs ~50 minutes** end-to-end. The
+  statistical trade-off is asymptotic efficiency: MoM is consistent
+  under the MCAR null but not asymptotically efficient, and the
+  finite-sample distribution deviates more from chi-square than
+  Little's does. The docstring spells out when to use which:
+  diagnostic screens → MoM; regulated submissions or anywhere the
+  exact asymptotic distribution matters → Little's.
+  Implementation details:
+    - ``_pairwise_deletion_moments``: O(n v^2) pairwise mean and
+      covariance via a single matmul. No per-column loop.
+    - ``chi_square_mcar_batched_np`` / ``_torch``: fully batched
+      chi-square assembly (batched SVD for conditioning,
+      well-conditioned patterns through batched solve, ill-conditioned
+      patterns through batched pinv as separate groups — no
+      per-pattern Python loop).
+    - ``backend`` parameter with same size-heuristic + visible-warning
+      discipline as the EM path. GPU is supported but does not
+      out-perform CPU on any tested shape — MoM's compute is small
+      enough that transfer + launch overhead loses to CPU numpy.
+      Auto-dispatch warns when this is the case.
+    - Honesty: ``MCARTestResult`` gained a ``method`` field so
+      downstream code knows which test produced a given result.
+      ``little_mcar_test`` reports ``"Little (MLE plug-in)"``;
+      ``mom_mcar_test`` reports ``"Method-of-moments
+      (pairwise-deletion plug-in)"``.
+  New tests (``tests/mvnmle/test_mom_mcar.py``, 10 tests):
+  name-honesty, MLE-vs-MoM agreement on MCAR data, correct rejection
+  on non-MCAR data, all-missing-row handling, speed guard of
+  ≥ 10× over MLE on breast_cancer.
+- **Fully-batched device-resident EM on GPU** (``_em_batched.py``
+  / ``_run_em_loop_gpu``). Pre-2.1.0 the "GPU EM" path set up a
+  torch device in the constructor but none of the per-iteration
+  work actually ran on-device — the numpy E-step ran for every
+  backend, which is why pre-2.1.0 benchmarks showed identical CPU
+  and GPU timings. This release implements the real thing: one
+  batched Cholesky + one batched solve over patterns for the
+  regression betas, one batched gather + bmm over all N
+  observations for the filled data, two dense gemms for the
+  sufficient-statistic accumulators, all on-device. SQUAREM also
+  runs fully on-device.
+  EM-only timings (without the MCAR-assembly wrapper):
+  | dataset        | shape     | CPU EM   | GPU EM   | speedup |
+  |----------------|-----------|----------|----------|---------|
+  | wine           | 178 × 13  | 38 ms    | 24 ms    | 1.6×    |
+  | breast_cancer  | 569 × 30  | 2142 ms  | 147 ms   | 14.6×   |
+  Small-data cases (apple, iris, missvals) lose on GPU because
+  transfer + launch overhead exceeds the per-iteration work.
+  Empirically calibrated heuristic: GPU is worth it when
+  ``n_obs * n_vars > 1500``.
+- **Size-heuristic dispatch with Rule-1 visibility** for both EM and
+  MoM backends. When ``backend='auto'`` makes a non-obvious choice
+  (e.g. picking CPU despite GPU availability because the data are
+  too small), a ``UserWarning`` explains the decision and tells
+  users how to override. When ``backend='gpu'`` is explicitly
+  requested on small data, the request is honored (user knows best)
+  but a warning notes that CPU would likely be faster. No silent
+  fallbacks anywhere. New tests pin these behaviours.
+- **Monotone-missingness closed-form MLE** (Anderson 1957; new
+  ``pystatistics.mvnmle._monotone``). When the missingness pattern
+  is monotone — when variables can be ordered such that each
+  observation's missing entries form a contiguous suffix — the MVN
+  MLE has a closed form via a chain of OLS regressions, with no
+  iteration. Common on longitudinal data with attrition, panel
+  surveys with dropout, and most sequentially-administered
+  instruments. New public helpers:
+    - ``pystatistics.mvnmle.is_monotone(data) -> bool``
+    - ``pystatistics.mvnmle.monotone_permutation(data) -> ndarray | None``
+    - ``pystatistics.mvnmle.mlest_monotone_closed_form(data) -> (mu, sigma, n)``
+    - ``mlest(data, algorithm='monotone')`` routes through the
+      closed-form; raises ``ValidationError`` if the data are not
+      monotone (Rule 1: no silent dispatch). Users who want
+      "use the closed form when applicable, fall back otherwise"
+      should call ``is_monotone`` first and branch explicitly.
+  The closed-form is the exact MLE (no tolerance-bounded
+  approximation), matches R ``mvnmle`` reference output on both
+  ``apple`` and ``missvals`` to machine precision, and is
+  dramatically faster than iterative algorithms at larger v
+  (a 1500 × 20 monotone dataset completes in ~2 ms vs EM's
+  ~40 ms). For non-monotone random MCAR data (the common case
+  in MCAR diagnostic use), detection is cheap (~O(v² n)) and
+  correctly returns False so iterative algorithms run.
+  New tests (``tests/mvnmle/test_monotone.py``, 12 tests):
+  detection true-positive / true-negative on several canonical
+  shapes; closed-form vs EM agreement; permutation invariance;
+  non-monotone data raises; performance guard at v=20.
+- **EM MLE: substantial real-data speedup via batched per-pattern
+  linear algebra + SQUAREM acceleration** (Project Lacuna-driven).
+  End-to-end ``little_mcar_test`` wall-clock at 15 % MCAR, seed 0:
+  | dataset        | shape     | 2.0.1    | 2.1.0    | speedup |
+  |----------------|-----------|----------|----------|---------|
+  | apple          | 18 × 2    |  1.9 ms  |  2.0 ms  |  flat   |
+  | missvals       | 13 × 5    | 19.9 ms  |  9.5 ms  |  2.1×   |
+  | iris           | 150 × 4   |  2.8 ms  |  2.8 ms  |  flat   |
+  | wine           | 178 × 13  | 79.4 ms  | 41.5 ms  |  1.9×   |
+  | breast_cancer  | 569 × 30  | 3278 ms  | 2089 ms  |  1.6×   |
+  For workloads that run MCAR repeatedly over many datasets
+  (e.g. a 3410-entry MCAR sweep), this is roughly a 1-hour reduction
+  per full pass at Lacuna's current scale.
+  Three changes stack:
+  1. **Batched per-pattern conditional parameters** (new
+     ``pystatistics.mvnmle.backends._em_batched``). The E-step used
+     to loop in Python over missingness patterns, issuing a scalar
+     Cholesky + triangular solve per pattern. It now stacks all P
+     pattern-sigma submatrices into a single
+     ``(P, v_max, v_max)`` tensor (identity-padded in the unused
+     slots so the Cholesky stays well-defined) and runs one batched
+     Cholesky + one batched solve for the whole iteration. The
+     accumulator loop over patterns remains in Python because
+     ``n_k`` varies and full observation-level padding hurt more
+     than it helped on the representative shapes we benchmarked.
+  2. **SQUAREM acceleration** (Varadhan & Roland 2008; new
+     ``pystatistics.mvnmle.backends._squarem``). EM's linear
+     convergence is sped up by a Steffensen-style extrapolation of
+     three consecutive EM iterates, safeguarded by a monotonicity
+     check on the observed-data log-likelihood. Typical effect on
+     well-behaved EM problems: 2–4× reduction in underlying
+     EM-step-equivalents. Preserves the MLE — the convergence
+     point is unchanged, only the path is shorter. On by default
+     via a new ``accelerate=True`` kwarg on ``EMBackend.solve``;
+     pass ``accelerate=False`` for the plain-EM reference path.
+  3. **Fully batched observed-data log-likelihood**
+     (``compute_loglik_batched_np``). The SQUAREM monotonicity
+     safeguard calls the log-likelihood often, so that path
+     needed to be cheap. The implementation now does one batched
+     Cholesky over all patterns for log-determinants and one
+     batched solve across all N observations for the quadratic-
+     form contribution — no per-pattern Python loop.
+- **Benchmark harness** (``benchmarks/mvnmle_bench.py``). Runs the
+  five reference shapes (apple, missvals, iris, wine,
+  breast_cancer) across the (algorithm, backend) matrix and
+  prints wall-clock / iteration counts per case. Use
+  ``--quick`` to skip the BFGS cases that don't converge on
+  high-$v$ data; ``--tag`` labels a run for diff against prior
+  baselines.
+- **Documented why direct-BFGS is not always the right default.**
+  Internal notes and the 2.0.0 / 2.0.1 release narrative already
+  covered why ``algorithm='em'`` became the ``little_mcar_test``
+  default; this release adds the story of why batching helps EM
+  significantly but does *not* rescue direct-BFGS on realistic
+  high-$v$ data (layer-3 Hessian conditioning is parameterization-
+  invariant; batching only addresses layer-1 launch overhead).
+  See ``GPU_BACKEND_CONVENTION.md`` Section 0 for the "when to
+  add a GPU backend and when not" rule that drove the 2.0.1
+  cleanup; this release extends that logic with
+  "accelerating the algorithm by reducing iteration count
+  (SQUAREM) is cheaper than accelerating each iteration."
+- **Finding: the "GPU EM" backend was never actually running on
+  GPU.** The ``device='cuda'`` / ``'mps'`` constructor flag set
+  up ``self._torch`` but none of ``_e_step`` / ``_m_step`` /
+  ``_compute_loglik`` used it — the numpy path ran for every
+  backend. That's why pre-2.1.0 benchmarks showed identical
+  CPU and GPU EM timings. We attempted to implement a real
+  device-resident EM loop and found it was slower than CPU for
+  all the shapes we care about (per-pattern kernel-launch
+  overhead dominates the small per-pattern matrix work). The
+  honest answer for now is that GPU EM stays CPU-equivalent
+  by design; a future release may revisit with fully N-parallel
+  observation-level batching if a workload appears where the
+  GPU can actually win. This behaviour is unchanged from prior
+  releases — we're just documenting what was already true.
+- **SQUAREM test coverage** (new ``tests/mvnmle/test_squarem.py``).
+  Four tests pinning the invariants: same MLE as plain EM on
+  apple; substantially fewer EM-equivalent steps on missvals;
+  monotonicity of log-likelihood preserved across iteration
+  caps; same MLE as plain EM on a realistic shape (sklearn
+  wine with 15 % MCAR).
 ## 2.0.1
 - **GPU Backend Convention: codified when NOT to add a GPU backend**
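Note on `_pairwise_deletion_moments` (CHANGELOG, 2.1.0): the changelog describes an O(n v^2) pairwise mean and covariance computed by matmul with no per-column loop, but this diff view does not show that function's source. The following is only a minimal NumPy sketch of the general technique, assuming available-case column means and at least two jointly observed rows for every pair of variables. The function name and the second matmul used for the pair counts are assumptions, not the package's implementation.

```python
import numpy as np


def pairwise_deletion_moments(X):
    """Available-case means and pairwise-deletion covariance (illustrative).

    X is an (n, v) float array with NaN marking missing entries.
    Hypothetical helper; not pystatistics' `_pairwise_deletion_moments`.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)              # (n, v) mask of observed cells
    M = observed.astype(float)

    # Column means over the observed entries of each variable.
    counts = M.sum(axis=0)
    mean = np.where(observed, X, 0.0).sum(axis=0) / counts

    # Centre, then zero the missing cells so they drop out of every product.
    Xc = np.where(observed, X - mean, 0.0)

    # One matmul sums cross-products over jointly observed rows;
    # a second counts the jointly observed rows for each pair.
    cross = Xc.T @ Xc
    n_pair = M.T @ M

    cov = cross / (n_pair - 1.0)
    return mean, cov
```

On complete data this reduces to the ordinary sample mean and `np.cov(X, rowvar=False)`. With missing data the resulting matrix is symmetric but not guaranteed to be positive definite, which is one reason a conditioning check before inversion (as the changelog mentions) is useful.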

{pystatistics-2.0.1 → pystatistics-2.1.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pystatistics
-Version: 2.0.1
+Version: 2.1.0
 Summary: GPU-accelerated statistical computing for Python
 Project-URL: Homepage, https://sgcx.org/technology/pystatistics/
 Project-URL: Documentation, https://sgcx.org/docs/pystatistics/
@@ -51,6 +51,108 @@ GPU-accelerated statistical computing for Python.
 ## What's New
+### 2.1.0 — Real-data EM speedup + monotone closed-form MLE
+Dogfooding via Project Lacuna surfaced that ``little_mcar_test`` on
+realistic tabular data (sklearn's iris / wine / breast_cancer with
+random MCAR injection) was bottlenecked by EM: the E-step was a
+Python loop over missingness patterns, and each SQUAREM-style
+safeguard pass re-ran a per-pattern log-likelihood. This release
+batches both and adds Varadhan & Roland's SQUAREM acceleration.
+End-to-end ``little_mcar_test`` wall-clock at 15 % MCAR, seed 0:
+| dataset        | shape     | 2.0.1    | 2.1.0    | speedup |
+|----------------|-----------|----------|----------|---------|
+| missvals       | 13 × 5    | 19.9 ms  |  9.5 ms  |  2.1×   |
+| wine           | 178 × 13  | 79.4 ms  | 41.5 ms  |  1.9×   |
+| breast_cancer  | 569 × 30  | 3278 ms  | 2089 ms  |  1.6×   |
+For repeated-diagnostic workflows (e.g. an MCAR sweep over several
+thousand datasets), this turns a 3-hour run into a 2-hour run.
+Three stacked improvements, all preserving bit-equivalence on the R
+mvnmle reference cases (apple, missvals):
+- **Batched per-pattern conditional parameters.** The E-step's
+  per-pattern Cholesky + triangular solve now runs as a single
+  batched kernel pair across all missingness patterns. The
+  unused padding slots are identity-filled so the Cholesky stays
+  well-defined.
+- **SQUAREM acceleration on top of EM.** Three EM steps + one
+  Steffensen-style extrapolation, safeguarded by a monotonicity
+  check on the observed-data log-likelihood. Typical effect:
+  2–4× fewer EM-step equivalents to convergence. Convergence
+  point is the same MLE — only the path is shorter. On by
+  default; ``EMBackend.solve(..., accelerate=False)`` recovers
+  the plain-EM reference.
+- **Fully batched log-likelihood.** The SQUAREM monotonicity
+  check calls ``loglik`` often, so it was batched too — one
+  Cholesky over all patterns, one solve across all N
+  observations, no per-pattern Python loop.
+**`mom_mcar_test`: fast method-of-moments MCAR test.** A new *separate
+function* (not a mode on ``little_mcar_test``, because the MoM variant
+is not Little's test) that uses pairwise-deletion sample moments
+instead of MLE plug-in. The test is consistent under MCAR but not
+asymptotically efficient, trading a small amount of statistical
+efficiency for dramatic speed. At 15 % MCAR on sklearn demos:
+| dataset        | shape     | little_mcar_test | mom_mcar_test |
+|----------------|-----------|------------------|---------------|
+| iris           | 150 × 4   | 2.9 ms           | 0.31 ms       |
+| wine           | 178 × 13  | 60.9 ms          | 2.17 ms       |
+| breast_cancer  | 569 × 30  | 1491 ms          | 28.7 ms       |
+For a 3410-dataset MCAR sweep: **~50 minutes → ~1.6 minutes**. Use
+``little_mcar_test`` when you need Little 1988's asymptotic
+distribution exactly (regulated submissions, citing R reference);
+use ``mom_mcar_test`` for high-throughput diagnostic screens. The
+``MCARTestResult.method`` field records which test produced a given
+result so downstream code can disambiguate without tracking the
+calling function.
+**Fully-batched device-resident EM on GPU.** Pre-2.1.0 the
+``device='cuda'`` EM path set up a torch device but never used it —
+numpy ran for every backend. This release implements a real
+device-resident loop with fully batched E-step / M-step / log-
+likelihood, SQUAREM acceleration on top, all on device. On breast-
+cancer-scale (569 × 30) EM drops from 2142 ms CPU to 147 ms GPU
+(14.6×). Small data remains CPU-faster; an empirical size heuristic
+(``n * v >= 1500``) with visible dispatch warnings keeps this
+correct in user-facing behaviour.
+**Monotone-missingness closed-form MLE** (Anderson 1957). Longitudinal
+cohorts with attrition, panel surveys with dropout, and most
+sequentially-administered instruments produce *monotone* missingness
+— the variables can be ordered such that each observation's missing
+entries form a contiguous suffix. When the pattern is monotone, the
+MVN MLE has a closed form via a chain of OLS regressions, with no
+iteration. New helpers: ``mvnmle.is_monotone(data)``,
+``mvnmle.monotone_permutation(data)``, and
+``mlest(data, algorithm='monotone')``. The closed-form matches R
+``mvnmle`` bit-for-bit on canonical datasets and is orders of
+magnitude faster than EM on larger-v longitudinal data. Per Rule 1
+the algorithm raises on non-monotone input rather than silently
+falling back — call ``is_monotone`` first if you want conditional
+dispatch.
+Also in this release:
+- **Benchmark harness** under ``benchmarks/mvnmle_bench.py`` for
+  tracking wall-clock and iteration counts across the reference
+  shapes; use the ``--tag`` flag to label a baseline for diff
+  against future changes.
+- **Documented finding**: the ``device='cuda'`` EM path was never
+  actually running on the GPU prior to this release — it stored
+  a torch device but never used it. We tried to wire up a real
+  device-resident loop and found GPU is slower than CPU for all
+  shapes we tested (per-pattern launch overhead still dominates
+  the tiny per-pattern matrix work). GPU EM therefore remains
+  CPU-equivalent by design; a future release will revisit if a
+  workload appears where full observation-level batching makes
+  GPU actually win.
 ### 2.0.1 — GPU-backend exposure gaps and a convention rule
 Two public functions had GPU-capable inner calls but no `backend=`
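Note on monotone detection (PKG-INFO, 2.1.0 section): the text above defines a monotone pattern as one where the variables can be ordered so that each observation's missing entries form a contiguous suffix. The source of `is_monotone` and `monotone_permutation` is not part of this diff view, so the sketch below is only one plausible way to test that definition. It relies on the fact that in a monotone pattern the sets of rows missing each variable are nested, so sorting variables by missing count yields a valid order whenever one exists. The function name `monotone_order` is hypothetical.

```python
import numpy as np


def monotone_order(X):
    """Return a column order that makes the missingness monotone, or None.

    Hypothetical stand-in for the idea behind `monotone_permutation`;
    the package's actual implementation may differ.
    """
    missing = np.isnan(np.asarray(X, dtype=float))      # (n, v) mask

    # Nested missing-row sets imply non-decreasing missing counts, so the
    # count-sorted order is (up to ties) the only candidate to check.
    order = np.argsort(missing.sum(axis=0), kind="stable")
    m = missing[:, order]

    # Monotone iff, within each row, a missing entry is never followed
    # by an observed one in the candidate order.
    ok = np.all(m[:, :-1] <= m[:, 1:])
    return order if ok else None
```

Because `mlest(data, algorithm='monotone')` is described as raising on non-monotone input, a caller wanting conditional dispatch would run a check like this (or the package's own `is_monotone`) first and branch explicitly.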

{pystatistics-2.0.1 → pystatistics-2.1.0}/README.md RENAMED

@@ -4,6 +4,108 @@ GPU-accelerated statistical computing for Python.
 ## What's New
+### 2.1.0 — Real-data EM speedup + monotone closed-form MLE
+Dogfooding via Project Lacuna surfaced that ``little_mcar_test`` on
+realistic tabular data (sklearn's iris / wine / breast_cancer with
+random MCAR injection) was bottlenecked by EM: the E-step was a
+Python loop over missingness patterns, and each SQUAREM-style
+safeguard pass re-ran a per-pattern log-likelihood. This release
+batches both and adds Varadhan & Roland's SQUAREM acceleration.
+End-to-end ``little_mcar_test`` wall-clock at 15 % MCAR, seed 0:
+| dataset        | shape     | 2.0.1    | 2.1.0    | speedup |
+|----------------|-----------|----------|----------|---------|
+| missvals       | 13 × 5    | 19.9 ms  |  9.5 ms  |  2.1×   |
+| wine           | 178 × 13  | 79.4 ms  | 41.5 ms  |  1.9×   |
+| breast_cancer  | 569 × 30  | 3278 ms  | 2089 ms  |  1.6×   |
+For repeated-diagnostic workflows (e.g. an MCAR sweep over several
+thousand datasets), this turns a 3-hour run into a 2-hour run.
+Three stacked improvements, all preserving bit-equivalence on the R
+mvnmle reference cases (apple, missvals):
+- **Batched per-pattern conditional parameters.** The E-step's
+  per-pattern Cholesky + triangular solve now runs as a single
+  batched kernel pair across all missingness patterns. The
+  unused padding slots are identity-filled so the Cholesky stays
+  well-defined.
+- **SQUAREM acceleration on top of EM.** Three EM steps + one
+  Steffensen-style extrapolation, safeguarded by a monotonicity
+  check on the observed-data log-likelihood. Typical effect:
+  2–4× fewer EM-step equivalents to convergence. Convergence
+  point is the same MLE — only the path is shorter. On by
+  default; ``EMBackend.solve(..., accelerate=False)`` recovers
+  the plain-EM reference.
+- **Fully batched log-likelihood.** The SQUAREM monotonicity
+  check calls ``loglik`` often, so it was batched too — one
+  Cholesky over all patterns, one solve across all N
+  observations, no per-pattern Python loop.
+**`mom_mcar_test`: fast method-of-moments MCAR test.** A new *separate
+function* (not a mode on ``little_mcar_test``, because the MoM variant
+is not Little's test) that uses pairwise-deletion sample moments
+instead of MLE plug-in. The test is consistent under MCAR but not
+asymptotically efficient, trading a small amount of statistical
+efficiency for dramatic speed. At 15 % MCAR on sklearn demos:
+| dataset        | shape     | little_mcar_test | mom_mcar_test |
+|----------------|-----------|------------------|---------------|
+| iris           | 150 × 4   | 2.9 ms           | 0.31 ms       |
+| wine           | 178 × 13  | 60.9 ms          | 2.17 ms       |
+| breast_cancer  | 569 × 30  | 1491 ms          | 28.7 ms       |
+For a 3410-dataset MCAR sweep: **~50 minutes → ~1.6 minutes**. Use
+``little_mcar_test`` when you need Little 1988's asymptotic
+distribution exactly (regulated submissions, citing R reference);
+use ``mom_mcar_test`` for high-throughput diagnostic screens. The
+``MCARTestResult.method`` field records which test produced a given
+result so downstream code can disambiguate without tracking the
+calling function.
+**Fully-batched device-resident EM on GPU.** Pre-2.1.0 the
+``device='cuda'`` EM path set up a torch device but never used it —
+numpy ran for every backend. This release implements a real
+device-resident loop with fully batched E-step / M-step / log-
+likelihood, SQUAREM acceleration on top, all on device. On breast-
+cancer-scale (569 × 30) EM drops from 2142 ms CPU to 147 ms GPU
+(14.6×). Small data remains CPU-faster; an empirical size heuristic
+(``n * v >= 1500``) with visible dispatch warnings keeps this
+correct in user-facing behaviour.
+**Monotone-missingness closed-form MLE** (Anderson 1957). Longitudinal
+cohorts with attrition, panel surveys with dropout, and most
+sequentially-administered instruments produce *monotone* missingness
+— the variables can be ordered such that each observation's missing
+entries form a contiguous suffix. When the pattern is monotone, the
+MVN MLE has a closed form via a chain of OLS regressions, with no
+iteration. New helpers: ``mvnmle.is_monotone(data)``,
+``mvnmle.monotone_permutation(data)``, and
+``mlest(data, algorithm='monotone')``. The closed-form matches R
+``mvnmle`` bit-for-bit on canonical datasets and is orders of
+magnitude faster than EM on larger-v longitudinal data. Per Rule 1
+the algorithm raises on non-monotone input rather than silently
+falling back — call ``is_monotone`` first if you want conditional
+dispatch.
+Also in this release:
+- **Benchmark harness** under ``benchmarks/mvnmle_bench.py`` for
+  tracking wall-clock and iteration counts across the reference
+  shapes; use the ``--tag`` flag to label a baseline for diff
+  against future changes.
+- **Documented finding**: the ``device='cuda'`` EM path was never
+  actually running on the GPU prior to this release — it stored
+  a torch device but never used it. We tried to wire up a real
+  device-resident loop and found GPU is slower than CPU for all
+  shapes we tested (per-pattern launch overhead still dominates
+  the tiny per-pattern matrix work). GPU EM therefore remains
+  CPU-equivalent by design; a future release will revisit if a
+  workload appears where full observation-level batching makes
+  GPU actually win.
 ### 2.0.1 — GPU-backend exposure gaps and a convention rule
 Two public functions had GPU-capable inner calls but no `backend=`
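Note on the batched per-pattern solve (README, 2.1.0 section): the README describes replacing a per-pattern Cholesky and triangular solve with one batched pair, with unused slots identity-filled so the factorization stays well-defined. The `_em_batched` source is not shown in this diff view. The NumPy sketch below illustrates the padding idea under a simplifying assumption: each pattern's observed block is kept in place inside a full `(v, v)` matrix rather than compacted to `v_max`. All names are hypothetical.

```python
import numpy as np


def batched_conditional_betas(sigma, patterns):
    """Solve Sigma_oo @ beta = Sigma_om for every pattern in one batch.

    sigma:    (v, v) symmetric positive-definite covariance.
    patterns: (P, v) boolean array, True where a variable is observed.
    Returns (P, v, v); for pattern p, entries at (observed row,
    missing column) hold beta and everything else is zero.
    Illustrative layout only, not the package's `_em_batched` code.
    """
    patterns = np.asarray(patterns, dtype=bool)
    v = sigma.shape[0]
    obs_row = patterns[:, :, None]      # (P, v, 1)
    obs_col = patterns[:, None, :]      # (P, 1, v)

    # Sigma_oo in place, plus 1.0 on the diagonal of each unobserved slot,
    # so every padded matrix is still positive definite.
    lhs = np.where(obs_row & obs_col, sigma, 0.0) + np.eye(v) * ~obs_row

    # Sigma_om in place: observed rows, missing columns, zero elsewhere.
    rhs = np.where(obs_row & ~obs_col, sigma, 0.0)

    # One batched Cholesky, then forward and back substitution on the stack.
    chol = np.linalg.cholesky(lhs)
    y = np.linalg.solve(chol, rhs)
    return np.linalg.solve(np.swapaxes(chol, -1, -2), y)
```

The identity padding leaves the observed block's solution untouched because the padded system is block-diagonal: the unobserved rows solve `I @ x = 0` and contribute nothing.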

pystatistics-2.1.0/benchmarks/mvnmle_bench.py ADDED

@@ -0,0 +1,194 @@
+"""Benchmark harness for pystatistics.mvnmle.
+Measures wall-clock + iteration count across a representative
+spectrum of shapes, for every (algorithm, backend) combination.
+Shapes:
+    apple        2-var, 18 obs — R-mvnmle reference case
+    missvals     5-var, 13 obs — R-mvnmle reference case
+    iris         4-var, 150 obs, synthetic MCAR — sklearn demo
+    wine         13-var, 178 obs, synthetic MCAR — sklearn demo
+                 (the Project Lacuna canary: 100+ patterns)
+    breast       30-var, 569 obs, synthetic MCAR — sklearn demo
+                 (Lacuna's real per-entry workload)
+Run:
+    python benchmarks/mvnmle_bench.py            # full sweep
+    python benchmarks/mvnmle_bench.py --quick    # skip slow BFGS cases
+    python benchmarks/mvnmle_bench.py --tag baseline > baseline.txt
+"""
+from __future__ import annotations
+import argparse
+import sys
+import time
+from dataclasses import dataclass
+import numpy as np
+SEED = 0
+MISSING_RATE = 0.15
+@dataclass
+class Case:
+    name: str
+    data_fn: object
+    shape_hint: str
+    slow_bfgs: bool
+def _load_apple():
+    from pystatistics.mvnmle.datasets import apple
+    return apple.copy()
+def _load_missvals():
+    from pystatistics.mvnmle.datasets import missvals
+    return missvals.copy()
+def _load_sklearn(loader_name):
+    from sklearn import datasets as sk
+    X = getattr(sk, f"load_{loader_name}")().data.astype(float).copy()
+    rng = np.random.default_rng(SEED)
+    X[rng.random(X.shape) < MISSING_RATE] = np.nan
+    X = X[~np.all(np.isnan(X), axis=1)]
+    return X
+CASES = [
+    Case("apple",   _load_apple,                          "18 x 2",   False),
+    Case("missvals", _load_missvals,                      "13 x 5",   False),
+    Case("iris",    lambda: _load_sklearn("iris"),        "150 x 4",  False),
+    Case("wine",    lambda: _load_sklearn("wine"),        "178 x 13", True),
+    Case("breast",  lambda: _load_sklearn("breast_cancer"),"569 x 30", True),
+]
+def _gpu_available():
+    try:
+        import torch
+        return torch.cuda.is_available()
+    except ImportError:
+        return False
+def bench_one(data, algorithm, backend, max_iter, repeat=1):
+    """Return dict with time_ms, n_iter, converged, loglik, err.
+    Never raises — any exception becomes err=type-name.
+    """
+    from pystatistics.mvnmle import mlest
+    # Warmup
+    try:
+        _ = mlest(data, algorithm=algorithm, backend=backend,
+                  max_iter=max_iter, verbose=False)
+    except Exception as e:
+        return {"time_ms": None, "n_iter": None, "converged": False,
+                "loglik": None, "err": type(e).__name__}
+    times = []
+    n_iter = None
+    converged = None
+    loglik = None
+    for _ in range(repeat):
+        t = time.perf_counter()
+        try:
+            r = mlest(data, algorithm=algorithm, backend=backend,
+                      max_iter=max_iter, verbose=False)
+            times.append(time.perf_counter() - t)
+            n_iter = r.n_iter
+            converged = r.converged
+            loglik = r.loglik
+        except Exception as e:
+            return {"time_ms": None, "n_iter": None, "converged": False,
+                    "loglik": None, "err": type(e).__name__}
+    median_ms = 1000 * float(np.median(times))
+    return {"time_ms": median_ms, "n_iter": n_iter, "converged": converged,
+            "loglik": loglik, "err": None}
+def bench_mcar_one(data, backend, repeat=1):
+    """Benchmark little_mcar_test end-to-end (what Lacuna actually calls)."""
+    from pystatistics.mvnmle import little_mcar_test
+    import warnings
+    try:
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            _ = little_mcar_test(data, backend=backend)
+    except Exception as e:
+        return {"time_ms": None, "err": type(e).__name__, "stat": None}
+    times = []
+    stat = None
+    for _ in range(repeat):
+        t = time.perf_counter()
+        try:
+            with warnings.catch_warnings():
+                warnings.simplefilter("ignore")
+                r = little_mcar_test(data, backend=backend)
+            times.append(time.perf_counter() - t)
+            stat = r.statistic
+        except Exception as e:
+            return {"time_ms": None, "err": type(e).__name__, "stat": None}
+    return {"time_ms": 1000 * float(np.median(times)),
+            "err": None, "stat": stat}
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--quick", action="store_true",
+                    help="skip BFGS on known-slow cases")
+    ap.add_argument("--tag", default="",
+                    help="label printed with each row (e.g. 'baseline')")
+    ap.add_argument("--repeat", type=int, default=1,
+                    help="repetitions per case; reports median")
+    ap.add_argument("--max-iter", type=int, default=500)
+    args = ap.parse_args()
+    gpu = _gpu_available()
+    backends = ["cpu"] + (["gpu"] if gpu else [])
+    print(f"# GPU available: {gpu}")
+    print(f"# Missing rate: {MISSING_RATE}, seed: {SEED}")
+    print(f"# Tag: {args.tag!r}")
+    print()
+    header = f"{'case':<10} {'shape':<10} {'algo':<7} {'backend':<4} {'time_ms':>10} {'n_iter':>7} {'conv':>5} {'err':<15}"
+    print(header)
+    print("-" * len(header))
+    for case in CASES:
+        data = case.data_fn()
+        for algorithm in ("em", "direct"):
+            for backend in backends:
+                if args.quick and algorithm == "direct" and case.slow_bfgs:
+                    continue
+                r = bench_one(data, algorithm, backend,
+                              max_iter=args.max_iter, repeat=args.repeat)
+                t = f"{r['time_ms']:.1f}" if r["time_ms"] is not None else "--"
+                ni = r["n_iter"] if r["n_iter"] is not None else "--"
+                cv = "y" if r["converged"] else ("--" if r["converged"] is None else "n")
+                err = r["err"] or ""
+                print(f"{case.name:<10} {case.shape_hint:<10} {algorithm:<7} "
+                      f"{backend:<4} {t:>10} {ni:>7} {cv:>5} {err:<15}")
+    print()
+    print("# little_mcar_test end-to-end timings:")
+    print(f"{'case':<10} {'shape':<10} {'backend':<4} {'time_ms':>10}")
+    print("-" * 40)
+    for case in CASES:
+        data = case.data_fn()
+        for backend in backends:
+            r = bench_mcar_one(data, backend, repeat=args.repeat)
+            t = f"{r['time_ms']:.1f}" if r["time_ms"] is not None else "--"
+            err = r["err"] or ""
+            print(f"{case.name:<10} {case.shape_hint:<10} {backend:<4} "
+                  f"{t:>10}   {err}")
+if __name__ == "__main__":
+    main()
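
Note on backend selection: the harness above enumerates `"cpu"` and `"gpu"` explicitly, whereas the release notes describe a `backend='auto'` mode that picks a backend by data size and warns about non-obvious choices. The dispatch code is not shown in this diff view, so the following is a hedged sketch of that rule only. The function name, the warning wording, and raising when a GPU is requested but unavailable are assumptions. The threshold follows the changelog's `n_obs * n_vars > 1500`; the README states it as `n * v >= 1500`.

```python
import warnings

GPU_MIN_CELLS = 1500  # threshold quoted in the release notes (assumed strict >)


def choose_backend(n_obs, n_vars, backend="auto", gpu_available=False):
    """Resolve 'auto' to 'cpu' or 'gpu', warning on non-obvious choices.

    Hypothetical helper illustrating the described behaviour.
    """
    if backend not in ("auto", "cpu", "gpu"):
        raise ValueError(f"unknown backend: {backend!r}")
    large_enough = n_obs * n_vars > GPU_MIN_CELLS

    if backend == "gpu":
        if not gpu_available:
            # Assumption: no silent fallback, so an impossible request fails.
            raise RuntimeError("backend='gpu' requested but no GPU is available")
        if not large_enough:
            warnings.warn(
                "backend='gpu' honoured, but CPU is likely faster at this size",
                UserWarning, stacklevel=2)
        return "gpu"

    if backend == "cpu":
        return "cpu"

    # backend == "auto"
    if gpu_available and not large_enough:
        warnings.warn(
            "auto-dispatch chose CPU despite an available GPU because the "
            "data are small; pass backend='gpu' to override",
            UserWarning, stacklevel=2)
        return "cpu"
    return "gpu" if gpu_available else "cpu"
```

With this rule, iris (150 × 4 = 600 cells) resolves to CPU with a warning when a GPU is present, while breast_cancer (569 × 30 = 17070 cells) resolves to GPU silently.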

{pystatistics-2.0.1 → pystatistics-2.1.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "pystatistics"
-version = "2.0.1"
+version = "2.1.0"
 description = "GPU-accelerated statistical computing for Python"
 readme = "README.md"
 license = "MIT"

{pystatistics-2.0.1 → pystatistics-2.1.0}/pystatistics/__init__.py RENAMED

@@ -16,7 +16,7 @@ Usage:
     result = fit(design)
 """
-__version__ = "2.0.1"
+__version__ = "2.1.0"
 __author__ = "Hai-Shuo"
 __email__ = "contact@sgcx.org"
