PyPI - FastLSQ - Versions diffs - 0.2.1__tar.gz → 0.2.3__tar.gz - Mend

FastLSQ 0.2.1__tar.gz → 0.2.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (118)

{fastlsq-0.2.1 → fastlsq-0.2.3}/CHANGELOG.md RENAMED

@@ -2,6 +2,86 @@
 All notable changes to FastLSQ will be documented in this file.
+## [0.2.3] - 2026-06-04
+### Added
+- **Householder-QR least-squares back-end** `solve_lstsq(..., method="qr")`:
+  backward-stable and working with `cond(A)` directly (ridge is applied via the
+  `[A; sqrt(mu) I]` augmentation, not the normal equations, so nothing is
+  squared). It gives SVD-grade accuracy (~1e-14 on the Helmholtz random-feature
+  benchmark) at QR cost; on the CPU/no-ridge path it is also faster than the
+  rank-revealing `gelsd` driver used by `"svd"`, and far more accurate than the
+  normal-equations `"cholesky"` (no `cond(A)` squaring, no required ridge).
+  Assumes the system is numerically full column rank; `"svd"` remains the
+  rank-deficient-safe reference.
+- **`solve_linear(..., method=...)`**: the linear solve back-end is now
+  selectable from the high-level API (`"auto"`, `"qr"`, `"svd"`, `"cholesky"`,
+  `"rsvd"`; defaults to `"auto"`).
+### Changed
+- **`method="auto"` now tries QR before SVD.** After the Cholesky conditioning
+  probe rejects the fast path, `auto` uses the faster, more accurate QR solve and
+  falls back to the rank-revealing SVD only when QR's solution is non-finite or
+  blows up (`||x|| / (1 + ||b||)` above a generous guard of `1e6`). Real PDE
+  systems measure `<= 0.3` and keep QR; genuinely rank-deficient *inconsistent*
+  systems (e.g. a random RHS) measure ~3e14 and route to SVD. Net: the default
+  solve is faster and at least as accurate on real problems, with minimum-norm
+  SVD preserved exactly where it is needed.
+- **N-scaled collocation defaults.** `solve_linear` and `solve_nonlinear` now
+  default `n_pde`/`n_bc` to `None` and derive them from the feature count
+  (`n_pde = max(3000, 3 * n_blocks * hidden_size)`, `n_bc = max(800, n_pde // 5)`),
+  replacing the fixed `10000`/`2000` (`solve_linear`) and `5000`/`1000`
+  (`solve_nonlinear`) defaults; the former over-sampled the default 1500
+  features (`3 * 500`) by ~6.7x. The default configuration now uses
+  `n_pde = 4500`, `n_bc = 900`, which is faster; passing explicit
+  `n_pde`/`n_bc` still overrides.
+## [0.2.2] - 2026-06-03
+### Fixed
+- **Learnable bandwidth now trains.** `LearnableFastLSQ.solve_inner` replaced the
+  backprop-through-`torch.linalg.svd` inner solve (which returned NaN gradients
+  w.r.t. the bandwidth on the clustered singular values of random-feature
+  matrices) with the SVD-based `gelsd` rank-revealing least-squares driver, so
+  `train_bandwidth` / `fit` no longer stall at step 0.
+- **Default-solve accuracy.** Tightened the `_auto_solve` Cholesky-acceptance
+  probe from `rcond**0.5` to `rcond**0.25` (with the default `rcond=1e-12`, the
+  minimum accepted pivot ratio rises from 1e-6 to 1e-3), so `method="auto"` falls
+  back to SVD before the normal-equations Cholesky loses half its float64 digits
+  (previously, a system with cond(A) ~ 1e7 returned an answer accurate only to ~1e-3).
+- **Newton convergence and robustness.** The solver now stops when *either* the
+  relative residual criterion (`res_norm < tol_res * R0`, with `R0` the residual
+  norm at the first iteration) or the relative solution change
+  (`||Δu||/||u|| < tol_du`) is met; the previous absolute residual tolerance was
+  unreachable and forced every nonlinear solve to run the full `max_iter`. When
+  no backtracked step satisfies the Armijo condition, the line search now keeps
+  the previous iterate instead of committing a worse point. `solve_nonlinear`
+  default tolerances loosened to `tol_res=1e-8`, `tol_du=1e-10`.
+- **Continuation guard.** `solve_nonlinear` no longer raises `TypeError` when a
+  problem sets `use_continuation=True` without a `nu_target`.
+- **Regression problems solvable via the public API.** Their `get_train_data`
+  now accepts the `n_pde`/`n_bc` signature used by `solve_linear`,
+  `auto_select_scale`, and `check_problem` (was `n_samples`, raising
+  `TypeError`); `auto_select_scale` now raises when every trial fails instead of
+  silently returning the first scale.
+- **Float32 inputs.** `SinusoidalBasis.cache` promotes inputs to the basis
+  dtype/device, so float32 collocation points no longer raise `float != double`.
+- **Checkpoint reload.** `load_checkpoint` passes `weights_only=False`, fixing
+  `UnpicklingError` on torch >= 2.6 (checkpoints store NumPy arrays).
+- **Vector per-component scale.** `VectorFastLSQSolver.add_block` accepts a NumPy
+  array of per-component bandwidths (previously only a list/tuple was
+  recognised; a NumPy array was silently misread as per-dimension bandwidths).
+- **ElasticWave2D operator.** Scaled the spatial and cross terms by `t_max²`
+  (time normalisation), consistent with `Wave2D_MS`.
+### Changed
+- Problem modules (`nonlinear.py`, `regression.py`) resolve the device via the
+  live `get_device()` rather than an import-time snapshot.
+- Packaging: the source distribution no longer ships the `misc/` images (the
+  sdist was ~14 MB); project URLs point to `github.com/sulcantonin/FastLSQ`;
+  README images use absolute URLs so they render on PyPI.
+- `examples/orbit_hill.py` now solves via rank-revealing `lstsq` rather than a
+  normal-equations Cholesky.
 ## [0.2.1] - 2026-06-02
 ### Added

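The N-scaled collocation defaults from the 0.2.3 entry are plain arithmetic on the feature count. A minimal sketch, using a hypothetical helper `default_collocation` (not part of FastLSQ) that mirrors the formula quoted in the changelog; explicit arguments override the computed values, as they do in `solve_linear`:

```python
def default_collocation(n_blocks=3, hidden_size=500, n_pde=None, n_bc=None):
    """Hypothetical helper reproducing the 0.2.3 n_pde/n_bc defaults."""
    n_feat = n_blocks * hidden_size
    if n_pde is None:
        n_pde = max(3000, 3 * n_feat)  # ~3x over-sampling, floored at 3000
    if n_bc is None:
        n_bc = max(800, n_pde // 5)    # one fifth of n_pde, floored at 800
    return n_pde, n_bc


n_feat_default = 3 * 500
print(default_collocation())            # (4500, 900) for the default 3 x 500 features
print(default_collocation(n_blocks=1))  # (3000, 800): both floors are active
print(f"old solve_linear over-sampling: {10000 / n_feat_default:.2f}x")  # 6.67x
```

With the defaults, the new 4500 collocation points are 3x the 1500 features, against roughly 6.7x for the old `solve_linear` default of 10000.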
{fastlsq-0.2.1 → fastlsq-0.2.3}/FastLSQ.egg-info/PKG-INFO RENAMED

@@ -1,14 +1,14 @@
 Metadata-Version: 2.4
 Name: FastLSQ
-Version: 0.2.1
+Version: 0.2.3
 Summary: One-shot PDE solving via Fourier features with exact analytical derivatives; rank-revealing solvers, learnable anisotropic bandwidth, and CPU/CUDA/MPS support
 Author: Antonin Sulc
 License-Expression: MIT
-Project-URL: Homepage, https://github.com/asulc/FastLSQ
-Project-URL: Repository, https://github.com/asulc/FastLSQ
+Project-URL: Homepage, https://github.com/sulcantonin/FastLSQ
+Project-URL: Repository, https://github.com/sulcantonin/FastLSQ
 Project-URL: Paper, https://arxiv.org/abs/2602.10541
-Project-URL: Bug Tracker, https://github.com/asulc/FastLSQ/issues
-Project-URL: Changelog, https://github.com/asulc/FastLSQ/blob/main/CHANGELOG.md
+Project-URL: Bug Tracker, https://github.com/sulcantonin/FastLSQ/issues
+Project-URL: Changelog, https://github.com/sulcantonin/FastLSQ/blob/main/CHANGELOG.md
 Keywords: pde,partial-differential-equations,fourier-features,least-squares,scientific-computing,neural-network,physics-informed,newton-raphson
 Classifier: Development Status :: 4 - Beta
 Classifier: Intended Audience :: Science/Research
@@ -45,7 +45,7 @@ Dynamic: license-file
 <p align="center">
-  <img src="misc/fastlsq_teaser.png" alt="FastLSQ method overview" width="400"/>
+  <img src="https://raw.githubusercontent.com/sulcantonin/FastLSQ/main/misc/fastlsq_teaser.png" alt="FastLSQ method overview" width="400"/>
 </p>
 **Solving PDEs in one shot via Fourier features with exact analytical derivatives.**
@@ -235,8 +235,8 @@ python examples/learnable_helmholtz.py
 The analytical derivatives enable gradients through the pre-factored solve, making inverse problems tractable. Example: recovering 4 anisotropic Gaussian heat sources (24 parameters) from 4 sparse sensors. The heat equation is solved in space-time; L-BFGS-B optimises source positions and shapes to match sensor time-series. *(Click image for animation.)*
 <p align="center">
-  <a href="misc/inverse_heat_source.gif">
-    <img src="misc/inverse_heat_source.png" alt="Inverse heat source localisation" width="700"/>
+  <a href="https://raw.githubusercontent.com/sulcantonin/FastLSQ/main/misc/inverse_heat_source.gif">
+    <img src="https://raw.githubusercontent.com/sulcantonin/FastLSQ/main/misc/inverse_heat_source.png" alt="Inverse heat source localisation" width="700"/>
   </a>
 </p>

{fastlsq-0.2.1 → fastlsq-0.2.3}/FastLSQ.egg-info/SOURCES.txt RENAMED

@@ -95,16 +95,6 @@ fastlsq/problems/__init__.py
 fastlsq/problems/linear.py
 fastlsq/problems/nonlinear.py
 fastlsq/problems/regression.py
-misc/fastlsq_teaser.png
-misc/ideal_quadrupole.png
-misc/inverse_heat_source.gif
-misc/inverse_heat_source.png
-misc/inverse_magnetostatics.png
-misc/inverse_magnetostatics_convergence.png
-misc/quadrupole_convergence.png
-misc/quadrupole_optimization.png
-misc/tutorial_nlpoisson_convergence.png
-misc/tutorial_nlpoisson_solution.png
 tests/test_basic.py
 tests/test_block.py
 tests/test_derivatives.py

{fastlsq-0.2.1 → fastlsq-0.2.3}/MANIFEST.in RENAMED

@@ -2,7 +2,6 @@ include LICENSE
 include README.md
 include CHANGELOG.md
 include requirements.txt
-recursive-include misc *.png *.gif
 recursive-include examples *.py
 recursive-include tests *.py
 recursive-exclude * __pycache__

{fastlsq-0.2.1 → fastlsq-0.2.3}/PKG-INFO RENAMED

@@ -1,14 +1,14 @@
 Metadata-Version: 2.4
 Name: FastLSQ
-Version: 0.2.1
+Version: 0.2.3
 Summary: One-shot PDE solving via Fourier features with exact analytical derivatives; rank-revealing solvers, learnable anisotropic bandwidth, and CPU/CUDA/MPS support
 Author: Antonin Sulc
 License-Expression: MIT
-Project-URL: Homepage, https://github.com/asulc/FastLSQ
-Project-URL: Repository, https://github.com/asulc/FastLSQ
+Project-URL: Homepage, https://github.com/sulcantonin/FastLSQ
+Project-URL: Repository, https://github.com/sulcantonin/FastLSQ
 Project-URL: Paper, https://arxiv.org/abs/2602.10541
-Project-URL: Bug Tracker, https://github.com/asulc/FastLSQ/issues
-Project-URL: Changelog, https://github.com/asulc/FastLSQ/blob/main/CHANGELOG.md
+Project-URL: Bug Tracker, https://github.com/sulcantonin/FastLSQ/issues
+Project-URL: Changelog, https://github.com/sulcantonin/FastLSQ/blob/main/CHANGELOG.md
 Keywords: pde,partial-differential-equations,fourier-features,least-squares,scientific-computing,neural-network,physics-informed,newton-raphson
 Classifier: Development Status :: 4 - Beta
 Classifier: Intended Audience :: Science/Research
@@ -45,7 +45,7 @@ Dynamic: license-file
 <p align="center">
-  <img src="misc/fastlsq_teaser.png" alt="FastLSQ method overview" width="400"/>
+  <img src="https://raw.githubusercontent.com/sulcantonin/FastLSQ/main/misc/fastlsq_teaser.png" alt="FastLSQ method overview" width="400"/>
 </p>
 **Solving PDEs in one shot via Fourier features with exact analytical derivatives.**
@@ -235,8 +235,8 @@ python examples/learnable_helmholtz.py
 The analytical derivatives enable gradients through the pre-factored solve, making inverse problems tractable. Example: recovering 4 anisotropic Gaussian heat sources (24 parameters) from 4 sparse sensors. The heat equation is solved in space-time; L-BFGS-B optimises source positions and shapes to match sensor time-series. *(Click image for animation.)*
 <p align="center">
-  <a href="misc/inverse_heat_source.gif">
-    <img src="misc/inverse_heat_source.png" alt="Inverse heat source localisation" width="700"/>
+  <a href="https://raw.githubusercontent.com/sulcantonin/FastLSQ/main/misc/inverse_heat_source.gif">
+    <img src="https://raw.githubusercontent.com/sulcantonin/FastLSQ/main/misc/inverse_heat_source.png" alt="Inverse heat source localisation" width="700"/>
   </a>
 </p>

{fastlsq-0.2.1 → fastlsq-0.2.3}/README.md RENAMED

@@ -4,7 +4,7 @@
 <p align="center">
-  <img src="misc/fastlsq_teaser.png" alt="FastLSQ method overview" width="400"/>
+  <img src="https://raw.githubusercontent.com/sulcantonin/FastLSQ/main/misc/fastlsq_teaser.png" alt="FastLSQ method overview" width="400"/>
 </p>
 **Solving PDEs in one shot via Fourier features with exact analytical derivatives.**
@@ -194,8 +194,8 @@ python examples/learnable_helmholtz.py
 The analytical derivatives enable gradients through the pre-factored solve, making inverse problems tractable. Example: recovering 4 anisotropic Gaussian heat sources (24 parameters) from 4 sparse sensors. The heat equation is solved in space-time; L-BFGS-B optimises source positions and shapes to match sensor time-series. *(Click image for animation.)*
 <p align="center">
-  <a href="misc/inverse_heat_source.gif">
-    <img src="misc/inverse_heat_source.png" alt="Inverse heat source localisation" width="700"/>
+  <a href="https://raw.githubusercontent.com/sulcantonin/FastLSQ/main/misc/inverse_heat_source.gif">
+    <img src="https://raw.githubusercontent.com/sulcantonin/FastLSQ/main/misc/inverse_heat_source.png" alt="Inverse heat source localisation" width="700"/>
   </a>
 </p>

{fastlsq-0.2.1 → fastlsq-0.2.3}/examples/orbit_hill.py RENAMED

@@ -31,7 +31,6 @@ import sys
 import time
 import numpy as np
 import torch
-from scipy.linalg import cho_factor, cho_solve
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
 from fastlsq.basis import SinusoidalBasis  # noqa: E402
@@ -166,10 +165,13 @@ def assemble(basis: SinusoidalBasis, pts_int: torch.Tensor):
 def solve(A, b):
     A64 = A.astype(np.float64, copy=False)
     b64 = b.astype(np.float64, copy=False)
-    AtA = A64.T @ A64 + MU_REG * np.eye(A64.shape[1])
-    Atb = A64.T @ b64
-    cho = cho_factor(AtA)
-    return cho_solve(cho, Atb)
+    # Rank-revealing least squares. Forming the normal equations A^T A (+ridge)
+    # and Cholesky-factoring them squares the condition number of this
+    # random-feature system, which made cho_factor fail ("not positive
+    # definite"); lstsq solves min ||A x - b|| directly via SVD and needs no
+    # positive-definiteness.
+    beta, *_ = np.linalg.lstsq(A64, b64, rcond=None)
+    return beta
 # ---------------------------------------------------------------------------

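The new comment in `solve` explains why the Cholesky route broke: forming `A^T A` squares the condition number. A small NumPy sketch on a synthetic matrix with prescribed singular values (not the orbit problem itself) makes the gap visible; `make_ill_conditioned`, `normal_equations_solve`, and `relative_errors` are illustrative names, and the exact error values depend on the BLAS/LAPACK build.

```python
import numpy as np
from numpy.linalg import LinAlgError


def make_ill_conditioned(m=200, n=50, cond=1e8, seed=0):
    """Synthetic A = U diag(s) V^T with singular values spanning [1/cond, 1]."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = np.logspace(0, -np.log10(cond), n)
    return U @ np.diag(s) @ V.T


def normal_equations_solve(A, b, mu=0.0):
    """The old approach: Cholesky of A^T A (+ ridge); cond(A^T A) = cond(A)^2."""
    AtA = A.T @ A + mu * np.eye(A.shape[1])
    L = np.linalg.cholesky(AtA)  # raises LinAlgError if AtA is not numerically SPD
    y = np.linalg.solve(L, A.T @ b)
    return np.linalg.solve(L.T, y)


def relative_errors(cond=1e8, seed=0):
    A = make_ill_conditioned(cond=cond, seed=seed)
    x_true = np.random.default_rng(seed + 1).standard_normal(A.shape[1])
    b = A @ x_true  # consistent system, so x_true is the exact solution
    x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)  # SVD-based, no squaring
    err_ls = np.linalg.norm(x_ls - x_true) / np.linalg.norm(x_true)
    try:
        x_ne = normal_equations_solve(A, b)
        err_ne = np.linalg.norm(x_ne - x_true) / np.linalg.norm(x_true)
    except LinAlgError:
        err_ne = np.inf  # Cholesky failed outright, as in the orbit example
    if not np.isfinite(err_ne):
        err_ne = np.inf
    return err_ls, err_ne


print(relative_errors())
```

At `cond=1e8` the normal-equations matrix has condition number ~1e16, at the edge of float64, whereas `lstsq` only has to cope with `cond(A)` itself.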
{fastlsq-0.2.1 → fastlsq-0.2.3}/fastlsq/__init__.py RENAMED

@@ -44,7 +44,7 @@ from fastlsq.export import (
 )
 from fastlsq import viz
-__version__ = "0.2.1"
+__version__ = "0.2.3"
 __all__ = [
     # Device selection (CPU / CUDA / Apple-MPS, dtype-aware)
     "resolve_device",

{fastlsq-0.2.1 → fastlsq-0.2.3}/fastlsq/api.py RENAMED

@@ -35,10 +35,11 @@ def solve_linear(
     scale: Optional[float] = None,
     n_blocks: int = 3,
     hidden_size: int = 500,
-    n_pde: int = 10000,
-    n_bc: int = 2000,
+    n_pde: Optional[int] = None,
+    n_bc: Optional[int] = None,
     n_test: int = 5000,
     mu: float = 0.0,
+    method: str = "auto",
     auto_scale: bool = True,
     auto_scale_trials: int = 5,
     return_solver: bool = False,
@@ -65,12 +66,17 @@ def solve_linear(
         Number of feature blocks.
     hidden_size : int
         Features per block.
-    n_pde, n_bc : int
-        Number of collocation and boundary points.
+    n_pde, n_bc : int, optional
+        Number of collocation and boundary points. If None, scaled with the
+        feature count: n_pde = max(3000, 3 * n_blocks * hidden_size),
+        n_bc = max(800, n_pde // 5).
     n_test : int
         Number of test points for error evaluation.
     mu : float
         Tikhonov regularisation parameter (0 = no regularisation).
+    method : str
+        Linear solve back-end passed to ``solve_lstsq`` ("auto", "qr", "svd",
+        "cholesky", "rsvd"). Default "auto".
     auto_scale : bool
         If True and scale=None, automatically select scale via grid search.
     auto_scale_trials : int
@@ -93,6 +99,12 @@ def solve_linear(
     """
     t0 = time.time()
+    n_feat = n_blocks * hidden_size
+    if n_pde is None:
+        n_pde = max(3000, 3 * n_feat)   # ~3x oversampling; fixed 10000 was 6x for default N
+    if n_bc is None:
+        n_bc = max(800, n_pde // 5)
     # Auto-select scale if needed
     if scale is None and auto_scale:
         if verbose:
@@ -127,7 +139,7 @@ def solve_linear(
     # Assemble and solve
     A, b = problem.build(solver, x_pde, *build_args)
-    beta_raw = solve_lstsq(A, b, mu=mu)
+    beta_raw = solve_lstsq(A, b, mu=mu, method=method)
     n_outputs = getattr(problem, "n_outputs", 1)
     solver.beta = unpack_beta(beta_raw, solver.n_features, n_outputs)
@@ -170,12 +182,12 @@ def solve_nonlinear(
     scale: Optional[float] = None,
     n_blocks: int = 3,
     hidden_size: int = 500,
-    n_pde: int = 5000,
-    n_bc: int = 1000,
+    n_pde: Optional[int] = None,
+    n_bc: Optional[int] = None,
     n_test: int = 5000,
     max_iter: int = 30,
-    tol_res: float = 1e-12,
-    tol_du: float = 1e-13,
+    tol_res: float = 1e-8,
+    tol_du: float = 1e-10,
     damping: float = 1.0,
     mu: float = 1e-10,
     auto_scale: bool = True,
@@ -202,8 +214,10 @@ def solve_nonlinear(
         Number of feature blocks.
     hidden_size : int
         Features per block.
-    n_pde, n_bc : int
-        Number of collocation and boundary points.
+    n_pde, n_bc : int, optional
+        Number of collocation and boundary points. If None, scaled with the
+        feature count: n_pde = max(3000, 3 * n_blocks * hidden_size),
+        n_bc = max(800, n_pde // 5).
     n_test : int
         Number of test points for error evaluation.
     max_iter : int
@@ -239,6 +253,12 @@ def solve_nonlinear(
     """
     t0 = time.time()
+    n_feat = n_blocks * hidden_size
+    if n_pde is None:
+        n_pde = max(3000, 3 * n_feat)   # ~3x oversampling; previous fixed default here was 5000
+    if n_bc is None:
+        n_bc = max(800, n_pde // 5)
     # Auto-select scale if needed
     if scale is None and auto_scale:
         if verbose:
@@ -264,9 +284,11 @@ def solve_nonlinear(
     # Check for continuation
     if getattr(problem, "use_continuation", False):
         schedule = list(problem.continuation_schedule)
-        if schedule[-1] != getattr(problem, "nu_target", None):
-            schedule.append(getattr(problem, "nu_target", None))
-        schedule = [v for v in schedule if v >= getattr(problem, "nu_target", 0.0)]
+        nu_target = getattr(problem, "nu_target", None)
+        if nu_target is not None:
+            if schedule[-1] != nu_target:
+                schedule.append(nu_target)
+            schedule = [v for v in schedule if v >= nu_target]
         history = continuation_solve(
             solver, problem, x_pde, bcs, f_pde,

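The continuation fix is easiest to see in isolation. In the old code a problem without `nu_target` appended `None` to the schedule and then evaluated `None >= 0.0`, which is the `TypeError` the changelog mentions. A minimal sketch with a hypothetical standalone helper `build_schedule` (not a FastLSQ function) that copies the new logic; `SimpleNamespace` stands in for a problem object, and the reading that continuation lowers `nu` towards `nu_target` is inferred from the `v >= nu_target` filter:

```python
from types import SimpleNamespace


def build_schedule(problem):
    """Hypothetical helper mirroring the 0.2.3 continuation guard."""
    schedule = list(problem.continuation_schedule)
    nu_target = getattr(problem, "nu_target", None)
    if nu_target is not None:
        # End the schedule at the target and drop values below it.
        if schedule[-1] != nu_target:
            schedule.append(nu_target)
        schedule = [v for v in schedule if v >= nu_target]
    return schedule  # without a target, the schedule is used as given


with_target = SimpleNamespace(continuation_schedule=[1.0, 0.1, 0.01], nu_target=0.05)
without_target = SimpleNamespace(continuation_schedule=[1.0, 0.1, 0.01])
print(build_schedule(with_target))     # [1.0, 0.1, 0.05]
print(build_schedule(without_target))  # [1.0, 0.1, 0.01]
```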
{fastlsq-0.2.1 → fastlsq-0.2.3}/fastlsq/basis.py RENAMED

@@ -172,6 +172,11 @@ class SinusoidalBasis:
     def cache(self, x: torch.Tensor) -> BasisCache:
         """Create a cache for the given collocation points."""
+        # Accept inputs in any dtype/device (e.g. float32 from user code) and
+        # promote to the basis's own dtype/device so ``x @ self.W`` never trips
+        # a float32-vs-float64 mismatch.
+        if x.dtype != self.W.dtype or x.device != self.W.device:
+            x = x.to(dtype=self.W.dtype, device=self.W.device)
         return BasisCache(x @ self.W + self.b)
     # ------------------------------------------------------------------

{fastlsq-0.2.1 → fastlsq-0.2.3}/fastlsq/export.py RENAMED

@@ -164,7 +164,10 @@ def load_checkpoint(
     solver : FastLSQSolver
     metadata : dict, optional
     """
-    state = torch.load(path, map_location=device)
+    # weights_only=False: save_checkpoint writes NumPy arrays (see to_dict),
+    # which torch>=2.6's default weights_only=True refuses to unpickle. The
+    # file is produced by this library, so it is trusted.
+    state = torch.load(path, map_location=device, weights_only=False)
     metadata = state.pop("metadata", None)
     solver = from_dict(state, device=device)
     return solver, metadata

{fastlsq-0.2.1 → fastlsq-0.2.3}/fastlsq/learnable.py RENAMED

@@ -180,19 +180,26 @@ class LearnableFastLSQ(nn.Module):
                     rcond: float = 1e-12):
         """Differentiable rank-revealing inner solve.
-        Solves ``beta* = argmin ||A beta - b||^2 + mu ||beta||^2`` through a
-        rank-revealing truncated SVD of ``A``, so gradients still flow back to
-        ``L`` *and* the solve is stable when ``A`` is rank-deficient.  (The plain
-        ``torch.linalg.lstsq`` used previously amplifies the near-null space and
-        makes the outer AdamW loop diverge.)
+        Solves ``beta* = argmin ||A beta - b||^2 + mu ||beta||^2`` through the
+        SVD-based ``gelsd`` least-squares driver with ``rcond`` truncation, so
+        gradients still flow back to ``L`` *and* the solve is stable when ``A``
+        is rank-deficient.  (The ``rcond`` cut suppresses the near-null space,
+        and ``gelsd``'s backward uses the stable pseudoinverse formula rather
+        than per-singular-vector derivatives -- which is what keeps the outer
+        AdamW loop's gradients finite.  A plain ``torch.linalg.lstsq`` *without*
+        ``rcond`` is what amplifies the null space.)
         For ``n_outputs > 1`` the system is block-stacked: the flat solution is
         kept as ``self._beta_flat`` (shape-compatible with ``A``) for residual
         losses, while ``self.beta`` is reshaped to ``(N, k)`` for prediction.
         """
-        U, S, Vh = torch.linalg.svd(A, full_matrices=False)
-        filt = torch.where(S > rcond * S[0], S / (S * S + mu), torch.zeros_like(S))
-        beta_flat = Vh.transpose(-2, -1) @ (filt.unsqueeze(-1) * (U.transpose(-2, -1) @ b))
+        if mu and mu > 0.0:
+            n = A.shape[-1]
+            A_aug = torch.cat([A, (mu ** 0.5) * torch.eye(n, dtype=A.dtype, device=A.device)], dim=0)
+            b_aug = torch.cat([b, torch.zeros(n, b.shape[-1], dtype=b.dtype, device=b.device)], dim=0)
+            beta_flat = torch.linalg.lstsq(A_aug, b_aug, rcond=rcond, driver="gelsd").solution
+        else:
+            beta_flat = torch.linalg.lstsq(A, b, rcond=rcond, driver="gelsd").solution
         self._beta_flat = beta_flat
         if self.n_outputs > 1:
             self.beta = unpack_beta(beta_flat, self.n_features, self.n_outputs)

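The `mu > 0` branch of `solve_inner` relies on the identity `||A x - b||^2 + mu ||x||^2 = ||[A; sqrt(mu) I] x - [b; 0]||^2`, which turns ridge regression into an ordinary least-squares problem without forming the normal equations. A NumPy sketch of that identity (the library itself calls `torch.linalg.lstsq(..., driver="gelsd")`; `ridge_via_augmentation` and `ridge_closed_form` are illustrative names), checked against the closed form `(A^T A + mu I)^{-1} A^T b` on a well-conditioned matrix:

```python
import numpy as np


def ridge_via_augmentation(A, b, mu):
    """Solve min ||A x - b||^2 + mu ||x||^2 as least squares on [A; sqrt(mu) I]."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(mu) * np.eye(n)])
    # Zero rows appended to b; works for a single RHS or several columns.
    b_aug = np.concatenate([b, np.zeros((n,) + b.shape[1:])])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x


def ridge_closed_form(A, b, mu):
    """Reference normal-equations solution; fine for a well-conditioned A."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ b)


rng = np.random.default_rng(0)
A = rng.standard_normal((60, 20))
b = rng.standard_normal((60, 2))  # two right-hand sides, like n_outputs > 1
x_aug = ridge_via_augmentation(A, b, mu=1e-2)
print(np.abs(x_aug - ridge_closed_form(A, b, mu=1e-2)).max())
```

The augmented matrix has the same singular vectors as `A` with singular values `sqrt(s_i^2 + mu)`, so the ridge also bounds how small the effective singular values can get.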
{fastlsq-0.2.1 → fastlsq-0.2.3}/fastlsq/linalg.py RENAMED

@@ -11,17 +11,26 @@ condition number -- leaving several orders of magnitude of accuracy on the floor
 ``solve_lstsq`` therefore exposes several back-ends via ``method=``:
+* ``"qr"``       -- Householder-QR least squares (ridge via ``[A; sqrt(mu) I]``
+                    augmentation).  Backward-stable at ``cond(A)`` -- SVD-grade
+                    accuracy with no normal-equations squaring and no required
+                    ridge, at ~QR cost (cheaper than SVD).  Assumes (numerically)
+                    full column rank; ``"svd"`` is the rank-deficient-safe choice
+                    (and ``"auto"``'s ultimate fallback if QR blows up).
 * ``"svd"``      -- rank-revealing truncated SVD of ``A`` (LAPACK ``gelsd`` fast
-                    path on CPU; explicit SVD elsewhere).  The accuracy reference.
+                    path on CPU; explicit SVD elsewhere).  The accuracy reference;
+                    use for a genuinely rank-deficient ``A``.
 * ``"cholesky"`` -- normal-equations ``(A^T A + mu I)`` Cholesky.  Fast, but only
                     safe when ``A`` is well-conditioned.
 * ``"rsvd"``     -- randomized SVD (range-finder + power iterations).  ``O(MNk)``
                     for a target ``rank`` k << N -- the cheap option for strongly
                     low-rank systems.
 * ``"auto"`` (default) -- try Cholesky; if the system is ill-conditioned (a
-                    cheap pivot-ratio test) fall back to ``"svd"``.  Recovers the
-                    fast path on well-conditioned problems **without** sacrificing
-                    accuracy on the rest.
+                    cheap pivot-ratio test) use the faster ``"qr"``, and fall back
+                    to rank-revealing ``"svd"`` only if QR's solution blows up (the
+                    feature matrices can be rank-deficient).  Fast path when
+                    well-conditioned, QR speed/accuracy on the rest, SVD as the
+                    safety net.
 All back-ends are device/dtype-aware.  Apple-MPS lacks a robust ``svd``/``lstsq``,
 so the factorization is run on CPU and the result moved back (one-time warning).
@@ -33,6 +42,13 @@ import torch
 _MPS_WARNED = False
+# In ``method="auto"``: above this ``||x|| / (1 + ||b||)`` ratio the unpivoted-QR
+# solve is treated as a rank-deficiency blow-up and handed to the rank-revealing
+# SVD instead.  Real PDE systems measure <= 0.3 here; the degenerate inconsistent
+# (random-RHS) rank-deficient case measures ~3e14 -- so the guard is generous and
+# a false positive only costs speed, never correctness.
+_QR_AUTO_NORM_GUARD = 1e6
 def _maybe_cpu(A, b):
     """MPS has no robust svd/lstsq -- factorize on CPU, remember to move back."""
@@ -86,16 +102,37 @@ def _rsvd_solve(A, b, mu, rcond, rank, oversample, n_iter):
     return Vh.transpose(-2, -1) @ (filt.unsqueeze(-1) * (U.transpose(-2, -1) @ b))
+def _qr_solve(A, b, mu):
+    """Householder-QR least squares (ridge via [A; sqrt(mu) I] augmentation).
+    Backward-stable at cond(A): SVD-grade accuracy with NO normal-equations
+    squaring and no required ridge, at ~QR cost (cheaper than SVD).  Assumes
+    (numerically) full column rank; use method='svd' for a rank-deficient A."""
+    if mu:
+        n = A.shape[-1]
+        A = torch.cat([A, (mu ** 0.5) * torch.eye(n, dtype=A.dtype, device=A.device)], dim=-2)
+        b = torch.cat([b, torch.zeros(n, b.shape[-1], dtype=b.dtype, device=b.device)], dim=-2)
+    Q, R = torch.linalg.qr(A, mode="reduced")
+    return torch.linalg.solve_triangular(R, Q.transpose(-2, -1) @ b, upper=True)
 def _auto_solve(A, b, mu, rcond):
     # Cheap conditioning probe: cond(A) ~ max/min Cholesky pivot.  If well within
-    # float64's reach use the fast Cholesky; otherwise fall back to the SVD.
+    # float64's reach use the fast Cholesky.
     try:
         x, L = _cholesky_solve(A, b, mu)
         d = torch.diagonal(L).abs()
-        if torch.isfinite(d).all() and d.min() > (rcond ** 0.5) * d.max():
+        if torch.isfinite(d).all() and d.min() > (rcond ** 0.25) * d.max():
             return x
     except torch.linalg.LinAlgError:
         pass
+    # Ill-conditioned: try the faster, backward-stable QR.  On a genuinely
+    # rank-deficient *inconsistent* A unpivoted QR can return a wildly
+    # non-minimum-norm solution, so fall back to the rank-revealing SVD when the
+    # QR solution blows up (or is non-finite).  See _QR_AUTO_NORM_GUARD.
+    x = _qr_solve(A, b, mu)
+    nx = torch.linalg.vector_norm(x)
+    if torch.isfinite(nx) and nx <= _QR_AUTO_NORM_GUARD * (1.0 + torch.linalg.vector_norm(b)):
+        return x
     return _svd_solve(A, b, mu, rcond)
@@ -112,7 +149,7 @@ def solve_lstsq(A, b, mu=0.0, rcond=1e-12, method="auto",
         an unstable add-on).
     rcond : float
         Relative singular-value / pivot threshold for rank determination.
-    method : {"auto", "svd", "cholesky", "rsvd"}
+    method : {"auto", "qr", "svd", "cholesky", "rsvd"}
         Solve back-end (see module docstring).  Default "auto".
     rank, oversample, n_iter : int
         Randomized-SVD parameters (``method="rsvd"`` only).  Set ``rank`` << N for
@@ -127,11 +164,13 @@ def solve_lstsq(A, b, mu=0.0, rcond=1e-12, method="auto",
         x = _auto_solve(A2, b2, mu, rcond)
     elif method == "svd":
         x = _svd_solve(A2, b2, mu, rcond)
+    elif method == "qr":
+        x = _qr_solve(A2, b2, mu)
     elif method == "cholesky":
         x = _cholesky_solve(A2, b2, mu)[0]
     elif method == "rsvd":
         x = _rsvd_solve(A2, b2, mu, rcond, rank, oversample, n_iter)
     else:
         raise ValueError(f"Unknown method {method!r}; "
-                         "choose 'auto', 'svd', 'cholesky', or 'rsvd'.")
+                         "choose 'auto', 'qr', 'svd', 'cholesky', or 'rsvd'.")
     return x.to(mps_dev) if mps_dev is not None else x

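The three-stage `auto` dispatch (Cholesky pivot probe, then QR with a norm guard, then SVD) can be mirrored outside FastLSQ in a few lines of NumPy. This is a sketch with `mu = 0` on CPU arrays only; `auto_solve` and `QR_NORM_GUARD` are illustrative names rather than library API, and the real `_auto_solve` works on torch tensors and returns only the solution.

```python
import numpy as np
from numpy.linalg import LinAlgError

QR_NORM_GUARD = 1e6  # same value as _QR_AUTO_NORM_GUARD in the diff


def auto_solve(A, b, rcond=1e-12):
    """Sketch of the 0.2.3 'auto' dispatch (mu = 0). Returns (x, path)."""
    # 1) Normal-equations Cholesky, accepted only if min/max of diag(L)
    #    exceeds rcond**0.25 (= 1e-3 for rcond = 1e-12).
    try:
        L = np.linalg.cholesky(A.T @ A)
        d = np.abs(np.diag(L))
        if np.all(np.isfinite(d)) and d.min() > rcond ** 0.25 * d.max():
            y = np.linalg.solve(L, A.T @ b)
            return np.linalg.solve(L.T, y), "cholesky"
    except LinAlgError:
        pass
    # 2) Householder QR (unpivoted), then solve R x = Q^T b.
    try:
        Q, R = np.linalg.qr(A, mode="reduced")
        x = np.linalg.solve(R, Q.T @ b)
        nx = np.linalg.norm(x)
        if np.isfinite(nx) and nx <= QR_NORM_GUARD * (1.0 + np.linalg.norm(b)):
            return x, "qr"
    except LinAlgError:
        pass  # R exactly singular
    # 3) Rank-revealing SVD least squares: the minimum-norm solution.
    x, *_ = np.linalg.lstsq(A, b, rcond=rcond)
    return x, "svd"


rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
b = rng.standard_normal(200)
print(auto_solve(A, b)[1])      # well-conditioned: "cholesky"

A_def = A.copy()
A_def[:, -1] = A_def[:, 0]      # duplicated column: exactly rank-deficient
print(auto_solve(A_def, b)[1])  # inconsistent RHS: QR blows up, so "svd"
```

Because the Cholesky pivot ratio can only underestimate `cond(A)`, the probe errs on the side of accepting; that is why the 0.2.2 entry tightened its threshold.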
{fastlsq-0.2.1 → fastlsq-0.2.3}/fastlsq/newton.py RENAMED

@@ -87,10 +87,13 @@ def newton_solve(solver, problem, x_pde, bcs, f_pde,
     history = []
     n_outputs = getattr(problem, "n_outputs", 1)
     N = solver.n_features
+    R0 = None
     for it in range(max_iter):
         J, neg_R = problem.build_newton_step(solver, x_pde, bcs, f_pde)
         res_norm = torch.norm(neg_R).item()
+        if R0 is None:
+            R0 = max(res_norm, 1e-30)
         delta_beta_raw = solve_lstsq(J, neg_R, mu=mu)
         delta_beta = unpack_beta(delta_beta_raw, N, n_outputs)
@@ -116,7 +119,10 @@ def newton_solve(solver, problem, x_pde, bcs, f_pde,
                 break
             alpha *= 0.5
         else:
-            solver.beta = beta_old + alpha * delta_beta
+            # No backtracked step satisfied the Armijo condition; reject the
+            # step and keep the previous iterate rather than committing a
+            # point that may be worse than where we started.
+            solver.beta = beta_old
         history.append({
             "iter": it, "residual": res_norm,
@@ -128,7 +134,7 @@ def newton_solve(solver, problem, x_pde, bcs, f_pde,
             print(f"  Newton {it:2d}: |R|={res_norm:.2e}  "
                   f"|du|/|u|={rel_du:.2e}  alpha={alpha:.3f}")
-        if res_norm < tol_res and rel_du < tol_du:
+        if res_norm < tol_res * R0 or rel_du < tol_du:
             if verbose:
                 print(f"  Converged in {it + 1} iterations "
                       f"(|R|={res_norm:.1e}, |du|/|u|={rel_du:.1e})")

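The new stopping rule (relative residual *or* small update) and the reject-on-failure line search can be sketched on a small nonlinear system in NumPy. This is not FastLSQ's `newton_solve`, which assembles `J` and `-R` from a problem object and calls `solve_lstsq`; the `residual`/`jacobian` callables and the Armijo constant `c_armijo` are assumptions of the sketch, which also tests the residual *after* each accepted step rather than before it.

```python
import numpy as np


def newton_solve(residual, jacobian, u0, max_iter=30, tol_res=1e-8,
                 tol_du=1e-10, c_armijo=1e-4, max_backtracks=20):
    """Damped Newton sketch with a relative stop test. Returns (u, converged, iters)."""
    u = np.array(u0, dtype=float)
    R0 = None
    for it in range(max_iter):
        r = residual(u)
        res_norm = np.linalg.norm(r)
        if R0 is None:
            R0 = max(res_norm, 1e-30)  # reference for the relative residual test
        du, *_ = np.linalg.lstsq(jacobian(u), -r, rcond=None)
        alpha = 1.0
        for _ in range(max_backtracks):
            u_try = u + alpha * du
            # Armijo on phi = 0.5 ||R||^2, assuming J du = -r (square, nonsingular J):
            # ||R(u + alpha du)||^2 <= (1 - 2 c alpha) ||R(u)||^2
            if np.linalg.norm(residual(u_try)) ** 2 <= (1.0 - 2.0 * c_armijo * alpha) * res_norm ** 2:
                break
            alpha *= 0.5
        else:
            # No step passed: keep the previous iterate instead of a worse one.
            return u, False, it + 1
        rel_du = alpha * np.linalg.norm(du) / max(np.linalg.norm(u_try), 1e-30)
        u = u_try
        if np.linalg.norm(residual(u)) < tol_res * R0 or rel_du < tol_du:
            return u, True, it + 1
    return u, False, max_iter


# Solve u^3 = 2 starting from u = 1.
u, ok, n_iter = newton_solve(lambda u: u ** 3 - 2.0, lambda u: np.diag(3.0 * u ** 2), [1.0])
print(u, ok, n_iter)
```

Scaling the residual by a constant leaves the relative test unchanged, which is the point of measuring against `R0` rather than an absolute tolerance.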
{fastlsq-0.2.1 → fastlsq-0.2.3}/fastlsq/problems/linear.py RENAMED

@@ -392,13 +392,13 @@ class ElasticWave2D:
         # t is normalised to [0,1]; physical d²/dt² = (1/t_max)² d²/dτ²
         t_scale = self.t_max ** 2
-        # PDE1: u_x_tt - c_p² u_x_xx - c_s² u_x_yy - (c_p² - c_s²) u_y_xy = 0
-        A1_x = t_scale * u_tt - self.c_p2 * u_xx - self.c_s2 * u_yy
-        A1_y = -self.c_cross * u_xy
+        # PDE1: u_x_ττ = t_max²·(c_p² u_x_xx + c_s² u_x_yy + (c_p²-c_s²) u_y_xy)
+        A1_x = u_tt - t_scale * (self.c_p2 * u_xx + self.c_s2 * u_yy)
+        A1_y = -t_scale * self.c_cross * u_xy
-        # PDE2: u_y_tt - c_p² u_y_yy - c_s² u_y_xx - (c_p² - c_s²) u_x_xy = 0
-        A2_x = -self.c_cross * u_xy
-        A2_y = t_scale * u_tt - self.c_p2 * u_yy - self.c_s2 * u_xx
+        # PDE2: u_y_ττ = t_max²·(c_p² u_y_yy + c_s² u_y_xx + (c_p²-c_s²) u_x_xy)
+        A2_x = -t_scale * self.c_cross * u_xy
+        A2_y = u_tt - t_scale * (self.c_p2 * u_yy + self.c_s2 * u_xx)
         A_pde = torch.cat([
             torch.cat([A1_x, A1_y], dim=1),
