diffcb 0.1.1.tar.gz → 0.1.3.tar.gz

Files changed (23)

{diffcb-0.1.1 → diffcb-0.1.3}/PKG-INFO +58 -21
diffcb-0.1.3/README.md +129 -0
{diffcb-0.1.1 → diffcb-0.1.3}/dcb/__init__.py +1 -1
diffcb-0.1.3/dcb/fft_kde.py +262 -0
{diffcb-0.1.1 → diffcb-0.1.3}/dcb/layer.py +14 -5
{diffcb-0.1.1 → diffcb-0.1.3}/dcb/solver.py +9 -2
{diffcb-0.1.1 → diffcb-0.1.3}/pyproject.toml +1 -1
diffcb-0.1.1/README.md +0 -92
diffcb-0.1.1/dcb/fft_kde.py +0 -144
{diffcb-0.1.1 → diffcb-0.1.3}/.gitignore +0 -0
{diffcb-0.1.1 → diffcb-0.1.3}/.zenodo.json +0 -0
{diffcb-0.1.1 → diffcb-0.1.3}/LICENSE +0 -0
{diffcb-0.1.1 → diffcb-0.1.3}/dcb/diagnostics.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.3}/dcb/kde.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.3}/dcb/utils.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.3}/notebooks/.gitkeep +0 -0
{diffcb-0.1.1 → diffcb-0.1.3}/tests/test_kde.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.3}/tests/test_layer.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.3}/tests/test_r18c_denom_audit.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.3}/tests/test_r18c_deprecation_warn.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.3}/tests/test_r19_default_fft.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.3}/tests/test_r19_diagnostics.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.3}/tests/test_solver.py +0 -0

{diffcb-0.1.1 → diffcb-0.1.3}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: diffcb
-Version: 0.1.1
+Version: 0.1.3
 Summary: Differentiable Critical Bandwidth: Silverman's modality test as a differentiable PyTorch layer with IFT backward pass.
 Project-URL: Homepage, https://github.com/ryZhangHason/differentiable-critical-bandwidth
 Project-URL: Repository, https://github.com/ryZhangHason/differentiable-critical-bandwidth
@@ -71,10 +71,10 @@ The critical bandwidth `h_crit` is the minimum KDE bandwidth at which a distribu
 import torch
 from dcb import DCBLayer
-X = torch.randn(256, requires_grad=True)   # 1D samples
+X = torch.randn(1000, requires_grad=True)   # 1D samples
 layer = DCBLayer(target_modes=1)
-h_crit = layer(X)                          # differentiable scalar
-h_crit.backward()                          # exact IFT gradients
+h_crit = layer(X)                           # differentiable scalar
+h_crit.backward()                           # exact IFT gradients
 ```
 ## Installation
@@ -91,34 +91,72 @@ cd differentiable-critical-bandwidth
 pip install -e ".[dev]"
 ```
-## Paper
+## Accuracy vs R's `bw.crit`
-> Ruiyu Zhang. "Differentiable Critical Bandwidth: Making Silverman's Modality Test End-to-End Trainable." *Journal of Machine Learning Research*, 2026 (in preparation).
+DCB is validated against R's `multimode::bw.crit(data, mod0=1)` — the standard reference implementation of Hall & York (2001). On **identical data**:
+| n | DCB vs R (same sample) | DCB vs R (independent samples) |
+|---|---|---|
+| 100K | **0.004%** | ~0.5% (MC noise from independent RNG) |
+| 1M | **0.005%** | ~0.2% |
+| 10M | **0.004%** | ~0.1% |
+The independent-sample figures reflect natural sampling variability (two unbiased estimators drawing different data), not algorithmic error. On identical data, DCB agrees with R to within **0.005%** at all tested n. DCB is 43× faster than R at n=100M (1.1 s vs 50 s) and handles n=2B in 24 s while R OOMs.
+## Key Parameters
+```python
+DCBLayer(
+    target_modes=1,       # target number of modes
+    G=512,                # IFT evaluation grid points
+    use_fft=True,         # FFT forward (default); eliminates subsampling bias for n>50K
+    max_n_exact=1_000_000,# sketch to sketch_size when n exceeds this (None = always exact)
+    sketch_size=500_000,  # sketch target; 500K matches full-n accuracy (O(n^{-2/9}) rate)
+    safe_backward=False,  # clamp IFT denominator near bifurcations
+)
+```
 ## Confirmed Experimental Results
-All results produced on Kaggle GPU (T4 / P100) — see `experiments/` and `outputs/`.
+All GPU results produced on Kaggle (T4 / P100) — see `experiments/` and `outputs/`.
 | Experiment | Result | Criterion |
 |---|---|---|
-| **Validation (m≥2)** | R²=0.91, MAE=0.07, Spearman ρ=0.89 | R²≥0.85, MAE≤0.10 ✓ |
-| **Speedup vs scipy (n=8192)** | **10.5×** on T4 | ≥3× ✓ |
+| **Accuracy vs R (same data, n=100K)** | **0.004%** | < 0.01% ✓ |
+| **Validation (m≥2, Marron-Wand)** | R²=0.91, MAE=0.07, ρ=0.89 | R²≥0.85 ✓ |
+| **Speedup vs scipy (CUDA T4, n=8192)** | **10.5×** | ≥3× ✓ |
 | **GAN mode preservation** | h_crit=1.232 >> 0.3 | h_crit>0.3 ✓ |
 | **Anomaly AUC (KDDCup99)** | DCB=**0.9982** vs IF=0.9867 | DCB≥IF ✓ |
+## Changelog
+### v0.1.1 (2026-05-29)
+- **MPS fix:** `torch.histc` on MPS allocated an n×bins intermediate (OOM at n≥5M). Replaced with `bucketize+bincount` on CPU — MPS-safe and numerically identical.
+- **Sketch API:** `DCBLayer(max_n_exact=1_000_000, sketch_size=500_000)` — silently sketches to 500K when n exceeds threshold. Justified by O(n⁻²/⁹) convergence of h_crit; 500K sketch matches full-n accuracy.
+- **Consistent bisection domain:** Pre-computed domain passed to all `fft_mode_count` calls in a single bisection, eliminating per-step drift.
+- **Bias warning direction:** Corrected "expected upward bias" to "expected downward bias" on legacy `use_fft=False` path.
+- **Test fixes:** Updated 8 pre-existing test failures (tuple unpacking, bounds, deprecation API).
+### v0.1.0 (2026-05-28)
+- Initial PyPI release. FFT forward (O(n + G log G)), IFT backward, MPS support.
 ## Repository Structure
 ```
-dcb/            Core PyTorch package (layer.py, solver.py, kde.py, utils.py)
+dcb/            Core PyTorch package
+  layer.py        DCBLayer nn.Module + DCBFunction autograd
+  solver.py       IFT root-finder and backward pass
+  fft_kde.py      FFT-based mode counter (MPS-safe, float64, G=16384)
+  kde.py          Direct KDE derivatives (small-n path)
+  utils.py        Grid, Silverman bandwidth, sg() stabilizer
 experiments/    Reproduction scripts for all paper figures and tables
-  phase1_validation.py   Figure 1: DCB vs reference h_crit scatter
-  phase1_speedup.py      Figure 2: GPU speedup benchmark
-  phase1_ablation.py     Figures S1–S2: ε/τ sensitivity heatmaps
-  phase2_gan.py          Figure 3: GAN mode-collapse prevention
-  phase3_anomaly.py      Table 2 + Figure 5: anomaly detection benchmark
-tests/          Unit tests (pytest, 35/35 passing)
+  phase1_*.py     Validation, speedup, ablation (Figures 1–2, S1–S2)
+  phase2_gan.py   GAN mode-collapse prevention (Figure 3)
+  phase3_anomaly.py  Anomaly detection (Table 2, Figure 5)
+  round20_*.py    Large-n R comparison and streaming benchmarks
+  round21_*.py    Accuracy improvement experiments
+tests/          Unit tests (pytest, 45 passed, 1 xfailed)
 outputs/        All generated figures and tables (PDFs, PNGs, CSVs)
-notebooks/      Quickstart and demo notebooks
 ```
 ## Reproducing Paper Results
@@ -127,7 +165,6 @@ notebooks/      Quickstart and demo notebooks
 # Phase 1: validation, speedup, ablation
 python experiments/phase1_validation.py
 python experiments/phase1_speedup.py
-python experiments/phase1_ablation.py
 # Phase 2: GAN mode collapse experiment
 python experiments/phase2_gan.py
@@ -136,13 +173,13 @@ python experiments/phase2_gan.py
 python experiments/phase3_anomaly.py
 ```
-For GPU runs, use the provided Kaggle kernels:
+For GPU runs use the Kaggle kernels:
 - Phase 1–2: `hsingle/dcb-full-experiments`
 - Phase 3: `hsingle/dcb-phase-3-anomaly-detection`
-## Kaggle GPU Notes
+## Paper
-Kaggle may assign a P100 (sm_60) instead of T4. The Phase 3 kernel handles this automatically by installing `torch==2.2.2+cu118` (the earliest PyTorch release with both Python 3.12 and sm_60 support) when P100 is detected.
+> Ruiyu Zhang. "Differentiable Critical Bandwidth: Making Silverman's Modality Test End-to-End Trainable." *Journal of Machine Learning Research*, 2026 (in preparation).
 ## License

diffcb-0.1.3/README.md ADDED

@@ -0,0 +1,129 @@
+# DCB — Differentiable Critical Bandwidth
+[![PyPI](https://img.shields.io/pypi/v/diffcb.svg)](https://pypi.org/project/diffcb/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/)
+A PyTorch package that makes **Silverman's critical bandwidth test (1981)** fully differentiable, enabling end-to-end gradient-based optimization over the modal structure of continuous distributions.
+## Overview
+The critical bandwidth `h_crit` is the minimum KDE bandwidth at which a distribution appears to have at most `m` modes — a classical nonparametric statistic for modality testing. DCB replaces every non-differentiable operation in its computation with a smooth surrogate, then uses the **Implicit Function Theorem** to compute exact gradients through the root-finding step at O(1) memory cost.
+```python
+import torch
+from dcb import DCBLayer
+X = torch.randn(1000, requires_grad=True)   # 1D samples
+layer = DCBLayer(target_modes=1)
+h_crit = layer(X)                           # differentiable scalar
+h_crit.backward()                           # exact IFT gradients
+```
+## Installation
+```bash
+pip install diffcb
+```
+Or from source:
+```bash
+git clone https://github.com/ryZhangHason/differentiable-critical-bandwidth
+cd differentiable-critical-bandwidth
+pip install -e ".[dev]"
+```
+## Accuracy vs R's `bw.crit`
+DCB is validated against R's `multimode::bw.crit(data, mod0=1)` — the standard reference implementation of Hall & York (2001). On **identical data**:
+| n | DCB vs R (same sample) | DCB vs R (independent samples) |
+|---|---|---|
+| 100K | **0.004%** | ~0.5% (MC noise from independent RNG) |
+| 1M | **0.005%** | ~0.2% |
+| 10M | **0.004%** | ~0.1% |
+The independent-sample figures reflect natural sampling variability (two unbiased estimators drawing different data), not algorithmic error. On identical data, DCB agrees with R to within **0.005%** at all tested n. DCB is 43× faster than R at n=100M (1.1 s vs 50 s) and handles n=2B in 24 s while R OOMs.
+## Key Parameters
+```python
+DCBLayer(
+    target_modes=1,       # target number of modes
+    G=512,                # IFT evaluation grid points
+    use_fft=True,         # FFT forward (default); eliminates subsampling bias for n>50K
+    max_n_exact=1_000_000,# sketch to sketch_size when n exceeds this (None = always exact)
+    sketch_size=500_000,  # sketch target; 500K matches full-n accuracy (O(n^{-2/9}) rate)
+    safe_backward=False,  # clamp IFT denominator near bifurcations
+)
+```
+## Confirmed Experimental Results
+All GPU results produced on Kaggle (T4 / P100) — see `experiments/` and `outputs/`.
+| Experiment | Result | Criterion |
+|---|---|---|
+| **Accuracy vs R (same data, n=100K)** | **0.004%** | < 0.01% ✓ |
+| **Validation (m≥2, Marron-Wand)** | R²=0.91, MAE=0.07, ρ=0.89 | R²≥0.85 ✓ |
+| **Speedup vs scipy (CUDA T4, n=8192)** | **10.5×** | ≥3× ✓ |
+| **GAN mode preservation** | h_crit=1.232 >> 0.3 | h_crit>0.3 ✓ |
+| **Anomaly AUC (KDDCup99)** | DCB=**0.9982** vs IF=0.9867 | DCB≥IF ✓ |
+## Changelog
+### v0.1.1 (2026-05-29)
+- **MPS fix:** `torch.histc` on MPS allocated an n×bins intermediate (OOM at n≥5M). Replaced with `bucketize+bincount` on CPU — MPS-safe and numerically identical.
+- **Sketch API:** `DCBLayer(max_n_exact=1_000_000, sketch_size=500_000)` — silently sketches to 500K when n exceeds threshold. Justified by O(n⁻²/⁹) convergence of h_crit; 500K sketch matches full-n accuracy.
+- **Consistent bisection domain:** Pre-computed domain passed to all `fft_mode_count` calls in a single bisection, eliminating per-step drift.
+- **Bias warning direction:** Corrected "expected upward bias" to "expected downward bias" on legacy `use_fft=False` path.
+- **Test fixes:** Updated 8 pre-existing test failures (tuple unpacking, bounds, deprecation API).
+### v0.1.0 (2026-05-28)
+- Initial PyPI release. FFT forward (O(n + G log G)), IFT backward, MPS support.
+## Repository Structure
+```
+dcb/            Core PyTorch package
+  layer.py        DCBLayer nn.Module + DCBFunction autograd
+  solver.py       IFT root-finder and backward pass
+  fft_kde.py      FFT-based mode counter (MPS-safe, float64, G=16384)
+  kde.py          Direct KDE derivatives (small-n path)
+  utils.py        Grid, Silverman bandwidth, sg() stabilizer
+experiments/    Reproduction scripts for all paper figures and tables
+  phase1_*.py     Validation, speedup, ablation (Figures 1–2, S1–S2)
+  phase2_gan.py   GAN mode-collapse prevention (Figure 3)
+  phase3_anomaly.py  Anomaly detection (Table 2, Figure 5)
+  round20_*.py    Large-n R comparison and streaming benchmarks
+  round21_*.py    Accuracy improvement experiments
+tests/          Unit tests (pytest, 45 passed, 1 xfailed)
+outputs/        All generated figures and tables (PDFs, PNGs, CSVs)
+```
+## Reproducing Paper Results
+```bash
+# Phase 1: validation, speedup, ablation
+python experiments/phase1_validation.py
+python experiments/phase1_speedup.py
+# Phase 2: GAN mode collapse experiment
+python experiments/phase2_gan.py
+# Phase 3: anomaly detection benchmark
+python experiments/phase3_anomaly.py
+```
+For GPU runs use the Kaggle kernels:
+- Phase 1–2: `hsingle/dcb-full-experiments`
+- Phase 3: `hsingle/dcb-phase-3-anomaly-detection`
+## Paper
+> Ruiyu Zhang. "Differentiable Critical Bandwidth: Making Silverman's Modality Test End-to-End Trainable." *Journal of Machine Learning Research*, 2026 (in preparation).
+## License
+MIT — see [LICENSE](LICENSE).

{diffcb-0.1.1 → diffcb-0.1.3}/dcb/__init__.py RENAMED

@@ -19,4 +19,4 @@ __all__ = [
     "DCBLayer", "DifferentiableCriticalBandwidth",
     "anneal_eps_tau", "soft_mode_count_cross", "soft_mode_count",
 ]
-__version__ = "0.1.1"
+__version__ = "0.1.3"

diffcb-0.1.3/dcb/fft_kde.py ADDED

@@ -0,0 +1,262 @@
+"""
+dcb.fft_kde — FFT-based KDE Mode Counter
+Implements mode counting via FFT convolution of the histogram with a
+Gaussian derivative kernel. Complexity is O(n + G log G), avoiding the
+O(n × G) cost of the direct KDE approach and — crucially — requiring NO
+subsampling. This eliminates the (brentq_n_max / n)^{-1/5} upward bias
+that affects the standard bisection path when n > brentq_n_max.
+Round 18b: forward kernel only. The IFT backward is unchanged (still uses
+the analytical chunked KDE derivatives on all n points).
+"""
+from __future__ import annotations
+import math
+import torch
+from torch import Tensor
+def fft_mode_count(
+    X: Tensor,
+    h: float,
+    G: int = 4096,
+    pad_factor: int = 4,
+    domain: tuple[float, float] | None = None,
+) -> int:
+    """Count KDE modes via FFT convolution — O(n + G log G), no subsampling.
+    Bins X into G histogram bins, zero-pads to pad_factor*G, convolves with
+    the Gaussian derivative kernel in the frequency domain (applying iω·exp(−½(ωh)²)),
+    back-transforms, and counts positive-to-negative sign changes of the
+    resulting f' estimate.
+    Parameters
+    ----------
+    X : Tensor, shape (n,)
+        1D data tensor (may be on CPU or CUDA).
+    h : float
+        Bandwidth for the Gaussian kernel.
+    G : int
+        Number of histogram bins. Must satisfy h > 8 * (data_range / G) for
+        reliable derivative estimation. Use `adaptive_fft_G` to choose G
+        automatically before bisection.
+    pad_factor : int
+        Zero-padding multiplier (default 4). Mandatory ≥ 2 for circular-wrap
+        correctness; 4 is recommended at the largest h encountered.
+    domain : (lo, hi) or None
+        If provided, use this as the histogram domain instead of computing
+        X.min() - 3σ … X.max() + 3σ. Allows the caller to align the domain
+        with the bisection bracket (e.g., X.min() - 2*h_hi … X.max() + 2*h_hi)
+        so every fft_mode_count call in a bisection loop uses an identical grid.
+    Returns
+    -------
+    int
+        Number of KDE modes (downward zero-crossings of f').
+    """
+    with torch.no_grad():
+        if domain is not None:
+            lo, hi = domain
+        else:
+            # Domain: extend 3σ beyond data range to avoid boundary effects
+            sigma = X.std().item()
+            if sigma == 0.0:
+                sigma = 1.0  # degenerate case: all points identical
+            lo = X.min().item() - 3 * sigma
+            hi = X.max().item() + 3 * sigma
+        data_range = hi - lo
+        if data_range == 0.0:
+            return 1  # single-point distribution has 1 mode
+        # Histogram (O(n)) — MPS-safe via bucketize+bincount on CPU.
+        # torch.histc on MPS allocates an n × bins float32 intermediate (PyTorch
+        # MPS bug); at n=5M, bins=512 this is ~9.5 GiB → OOM.  Moving to CPU for
+        # the binning step avoids the intermediate and is numerically identical
+        # for data within [lo, hi] (guaranteed by the 3σ domain extension above).
+        X_cpu = X.float().cpu()
+        edges = torch.linspace(lo, hi, G + 1)                       # (G+1,) CPU
+        bin_idx = torch.bucketize(X_cpu, edges, right=True).clamp(1, G) - 1  # 0-indexed
+        counts = torch.bincount(bin_idx, minlength=G).float().to(X.device)   # back to device
+        # Zero-pad to pad_factor*G — promote to float64 for FFT precision
+        N = pad_factor * G
+        counts_padded = torch.zeros(N, dtype=torch.float64, device=X.device)
+        counts_padded[:G] = counts.double()
+        # FFT of histogram (float64)
+        C = torch.fft.rfft(counts_padded)
+        # Derivative kernel in frequency domain (float64)
+        bin_width = data_range / G
+        k = torch.arange(N // 2 + 1, device=X.device, dtype=torch.float64)
+        omega = 2 * math.pi * k / (N * bin_width)
+        K_deriv = 1j * omega * torch.exp(-0.5 * (omega * h) ** 2)
+        # Convolve and back-transform; cast result back to float32
+        f_prime_padded = torch.fft.irfft(C * K_deriv, n=N).float()
+        # Trim to original G grid (discard zero-padded tail)
+        f_prime = f_prime_padded[:G]
+        # Count (+→-) sign changes = number of modes
+        # A mode is a local max of f, i.e., f' crosses zero from + to -
+        # Remove zeros (flat segments) — carry forward last nonzero sign
+        nonzero_mask = f_prime != 0
+        if not nonzero_mask.any():
+            return 0
+        s = f_prime[nonzero_mask]
+        transitions = int(((s[:-1] > 0) & (s[1:] < 0)).sum().item())
+        return transitions
+def _refine_hcrit(
+    X: Tensor,
+    h_lo: float,
+    h_hi: float,
+    G: int,
+    domain: tuple[float, float],
+    target_modes: int = 1,
+    pad_factor: int = 4,
+) -> float:
+    """Sub-bin quadratic refinement of h_crit after bisection converges.
+    Identifies the f′ lobe that disappears at the mode-merging bandwidth and
+    fits a quadratic in h to that lobe's peak value, returning the root — the
+    h where that peak exactly reaches zero.  Reduces the bin-width-limited
+    systematic from ~bin_width/h_crit to well below 1e-4.
+    When the incoming bracket [h_lo, h_hi] is tighter than one histogram bin
+    width (the common case after 50-step bisection), the function expands the
+    bracket outward from h_hi by up to 4× the bin width while maintaining the
+    invariant that fft_mode_count > target at the left endpoint and
+    <= target at the right endpoint, so the disappearing f′ lobe is visible
+    across the bracket.
+    Parameters
+    ----------
+    X : Tensor  — data (may be on any device)
+    h_lo, h_hi : float  — final bisection bracket; fft_mode_count(X,h_lo) > target,
+                          fft_mode_count(X,h_hi) <= target
+    G, domain, target_modes, pad_factor — same as fft_mode_count
+    Returns
+    -------
+    float  — refined h_crit, guaranteed to lie in [h_lo, h_hi] of the
+             (possibly expanded) bracket used for fitting.
+    """
+    import numpy as np
+    lo_d, hi_d = domain
+    data_range = hi_d - lo_d
+    if data_range == 0.0:
+        return h_hi
+    bin_width = data_range / G
+    N = pad_factor * G
+    bw = bin_width  # histogram bin width
+    # Pre-compute histogram once; reuse C (FFT of counts) for all h evaluations.
+    with torch.no_grad():
+        X_cpu = X.float().cpu()
+        edges = torch.linspace(lo_d, hi_d, G + 1)
+        bin_idx = torch.bucketize(X_cpu, edges, right=True).clamp(1, G) - 1
+        counts = torch.bincount(bin_idx, minlength=G).float()
+        counts_padded = torch.zeros(N, dtype=torch.float64)
+        counts_padded[:G] = counts.double()
+        C = torch.fft.rfft(counts_padded)
+        k = torch.arange(N // 2 + 1, dtype=torch.float64)
+        omega_base = 2 * math.pi * k / (N * bw)
+    def fprime(h: float) -> Tensor:
+        """Compute f′ array (shape G,) for bandwidth h using cached C (float64)."""
+        K_deriv = 1j * omega_base * torch.exp(-0.5 * (omega_base * h) ** 2)
+        return torch.fft.irfft(C * K_deriv, n=N).float()[:G]
+    with torch.no_grad():
+        # If the bracket is tighter than bin_width, expand it so that the
+        # disappearing f′ lobe crosses zero somewhere inside the bracket.
+        # Expand the left endpoint leftward by up to 4 bin widths.
+        ref_lo = h_lo
+        ref_hi = h_hi
+        if (ref_hi - ref_lo) < bw:
+            # Try expanding leftward until we find a bin where fp crosses zero
+            for mult in [1, 2, 3, 4]:
+                cand_lo = max(ref_hi - mult * bw, ref_hi * 0.9)
+                fp_cand = fprime(cand_lo)
+                fp_hi_  = fprime(ref_hi)
+                cm = (fp_cand > 0) & (fp_hi_ <= 0)
+                if cm.any():
+                    ref_lo = cand_lo
+                    break
+            # If still no candidates found, return bisection result unchanged
+            fp_lo_ = fprime(ref_lo)
+            fp_hi_ = fprime(ref_hi)
+            candidate_mask = (fp_lo_ > 0) & (fp_hi_ <= 0)
+            if not candidate_mask.any():
+                return h_hi
+        else:
+            fp_lo_ = fprime(ref_lo)
+            fp_hi_ = fprime(ref_hi)
+            candidate_mask = (fp_lo_ > 0) & (fp_hi_ <= 0)
+            if not candidate_mask.any():
+                return h_hi
+        # Pick the bin with the largest positive value at ref_lo that crossed zero
+        masked_fp_lo = fp_lo_.clone()
+        masked_fp_lo[~candidate_mask] = -float('inf')
+        j = int(masked_fp_lo.argmax().item())
+        h_mid = (ref_lo + ref_hi) / 2.0
+        # Evaluate fp[j] at three bandwidths for quadratic fit
+        y_lo  = fp_lo_[j].item()
+        y_mid = fprime(h_mid)[j].item()
+        y_hi  = fp_hi_[j].item()
+        # Fit quadratic y = a*h² + b*h + c through the three (h, y) pairs
+        # and solve for the root in [ref_lo, ref_hi].
+        coeffs = np.polyfit([ref_lo, h_mid, ref_hi], [y_lo, y_mid, y_hi], 2)
+        roots = np.roots(coeffs)
+        real_roots = [
+            r.real for r in roots
+            if abs(r.imag) < 1e-10 * abs(r.real + 1e-30)
+            and ref_lo <= r.real <= ref_hi
+        ]
+        if real_roots:
+            return float(min(real_roots, key=lambda r: abs(r - h_mid)))
+        return h_hi
+def adaptive_fft_G(data_range: float, h_hi: float, G_min: int = 16384) -> int:
+    """Choose FFT grid size G so that the derivative kernel is well-resolved.
+    Requires h > 8 * bin_width = 8 * data_range / G, equivalently
+    G > 8 * data_range / h_hi. We use a factor of 16 for safety margin,
+    then round up to the next power of 2 for efficient FFT.
+    Parameters
+    ----------
+    data_range : float
+        hi - lo of the data domain (typically X.max() - X.min() + 6σ).
+    h_hi : float
+        Upper bracket of the bisection (smallest h needing resolution).
+    G_min : int
+        Minimum returned G (default 16384).
+    Returns
+    -------
+    int
+        Grid size G, a power of 2, at least G_min.
+    """
+    needed = 16 * math.ceil(data_range / h_hi)
+    # Round up to next power of 2
+    p = 1
+    while p < needed:
+        p <<= 1
+    return max(G_min, p)
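
The forward pass described in the `fft_mode_count` docstring can be tried outside PyTorch with a small NumPy sketch. This is an illustrative re-implementation under stated assumptions, not the package's code: the name `fft_mode_count_np`, the relative tolerance `tol` (used here to discard round-off noise in the empty tails), and the deterministic logistic-quantile test data are all introduced for this example only.

```python
import numpy as np


def fft_mode_count_np(x, h, G=4096, pad_factor=4, tol=1e-9):
    """Count KDE modes by FFT-convolving a histogram with a Gaussian derivative.

    Hypothetical NumPy sketch of the approach in the docstring above; `tol` is
    an assumption of this example and is not a parameter of the package.
    """
    # Histogram domain: data range extended by 3 standard deviations per side.
    sigma = float(x.std()) or 1.0
    lo, hi = float(x.min()) - 3 * sigma, float(x.max()) + 3 * sigma
    counts, _ = np.histogram(x, bins=G, range=(lo, hi))

    # Zero-pad so the circular convolution does not wrap into the data.
    N = pad_factor * G
    padded = np.zeros(N)
    padded[:G] = counts

    # Multiply by i*omega*exp(-0.5*(omega*h)^2): Gaussian smoothing + derivative.
    bin_width = (hi - lo) / G
    omega = 2 * np.pi * np.arange(N // 2 + 1) / (N * bin_width)
    kernel = 1j * omega * np.exp(-0.5 * (omega * h) ** 2)
    f_prime = np.fft.irfft(np.fft.rfft(padded) * kernel, n=N)[:G]

    # Drop near-zero values, then count positive-to-negative sign changes.
    keep = np.abs(f_prime) > tol * np.abs(f_prime).max()
    s = f_prime[keep]
    return int(((s[:-1] > 0) & (s[1:] < 0)).sum())


def logistic_quantiles(center, scale, n):
    """Deterministic, evenly spaced quantiles of a logistic distribution."""
    p = (np.arange(n) + 0.5) / n
    return center + scale * np.log(p / (1 - p))


# Two well-separated clusters, built without any random number generator.
x = np.concatenate([logistic_quantiles(-4.0, 0.5, 2000),
                    logistic_quantiles(+4.0, 0.5, 2000)])
modes_small_h = fft_mode_count_np(x, h=1.0)
modes_large_h = fft_mode_count_np(x, h=6.0)
print(modes_small_h, modes_large_h)
```

The two clusters should register as two modes at the small bandwidth and merge into one at the large bandwidth, which is the monotone behaviour the bisection in `dcb/solver.py` relies on.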

{diffcb-0.1.1 → diffcb-0.1.3}/dcb/layer.py RENAMED

@@ -35,13 +35,13 @@ class DCBFunction(torch.autograd.Function):
     @staticmethod
     def forward(ctx, X, grid, eps, tau, target_modes, delta, formula, chunk_size,
-                brentq_n_max, g_brentq, use_hard_bisection, safe_backward, use_fft):
+                brentq_n_max, g_brentq, use_hard_bisection, safe_backward, use_fft, fft_G_min):
         """Locate h_crit and save state for the backward pass."""
         h_crit, cond_num = find_h_crit(
             X, grid, eps, tau, target_modes,
             formula=formula, brentq_n_max=brentq_n_max, chunk_size=chunk_size,
             g_brentq=g_brentq, use_hard_bisection=use_hard_bisection,
-            use_fft=use_fft,
+            use_fft=use_fft, G_min=fft_G_min,
         )
         ctx.save_for_backward(X, grid)
         ctx.h_crit = h_crit
@@ -67,8 +67,8 @@ class DCBFunction(torch.autograd.Function):
         ctx.denom_abs       = ift_gradient.last_denom_abs
         # Gradients for: X, grid, eps, tau, target_modes, delta, formula,
         #                chunk_size, brentq_n_max, g_brentq, use_hard_bisection,
-        #                safe_backward, use_fft
-        return grad_X, None, None, None, None, None, None, None, None, None, None, None, None
+        #                safe_backward, use_fft, fft_G_min
+        return grad_X, None, None, None, None, None, None, None, None, None, None, None, None, None
 class DCBLayer(nn.Module):
@@ -133,6 +133,13 @@ class DCBLayer(nn.Module):
         Number of points to sketch when n > max_n_exact. Default 500_000.
         A 500K sketch achieves the same mean accuracy as streaming 100M points
         (validated in Round 20 reservoir experiment).
+    fft_G_min : int
+        Minimum FFT histogram grid size for the bisection solver (default 16384).
+        Controls accuracy of the FFT path (n > 50K). Larger values reduce
+        discretisation error at a modest cost: G=16384 gives ~0.004% err vs R;
+        G=32768 gives ~0.001% at +9% cost; G=65536 reaches the R-matching floor
+        (~0.001%) with no further gain beyond that. Ignored for n ≤ 50K (direct
+        KDE path).
     Examples
     --------
@@ -162,6 +169,7 @@ class DCBLayer(nn.Module):
         use_fft: bool = True,
         max_n_exact: int | None = 1_000_000,
         sketch_size: int = 500_000,
+        fft_G_min: int = 16384,
     ):
         super().__init__()
         self.target_modes = target_modes
@@ -180,6 +188,7 @@ class DCBLayer(nn.Module):
         self.use_fft = use_fft
         self.max_n_exact = max_n_exact
         self.sketch_size = sketch_size
+        self.fft_G_min = fft_G_min
         if use_fft and brentq_n_max != 50_000:
             raise TypeError(
                 f"brentq_n_max={brentq_n_max} is meaningless when use_fft=True: the FFT path "
@@ -250,7 +259,7 @@ class DCBLayer(nn.Module):
         return DCBFunction.apply(
             X, grid, eps_eff, tau_eff, self.target_modes, self.delta, self.formula,
             self.chunk_size, self.brentq_n_max, self.g_brentq, self.use_hard_bisection,
-            self.safe_backward, self.use_fft,
+            self.safe_backward, self.use_fft, self.fft_G_min,
         )
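
To make the role of `fft_G_min` concrete: per the `adaptive_fft_G` code in `dcb/fft_kde.py`, the solver rounds `16 * ceil(data_range / h_hi)` up to a power of two and then applies `G_min` as a floor. The sketch below restates that rule in plain Python to show when the floor is the binding term. The name `grid_size` and the sample ranges are illustrative assumptions, not part of the package.

```python
import math


def grid_size(data_range: float, h_hi: float, g_min: int = 16384) -> int:
    """Restatement of the grid-size rule: next power of two, floored at g_min."""
    needed = 16 * math.ceil(data_range / h_hi)
    p = 1
    while p < needed:
        p <<= 1
    return max(g_min, p)


# Wide bracket: 16 * 20 = 320 bins needed (512 after rounding),
# so the floor decides the result.
coarse_default = grid_size(40.0, 2.0)
coarse_raised = grid_size(40.0, 2.0, g_min=32768)

# Narrow bracket: 16 * 4000 = 64000 bins needed, next power of two is 65536,
# which already exceeds the default floor.
fine_default = grid_size(40.0, 0.01)

print(coarse_default, coarse_raised, fine_default)
```

In other words, raising `fft_G_min` only changes the grid when the bracket-derived requirement is smaller than the floor.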

{diffcb-0.1.1 → diffcb-0.1.3}/dcb/solver.py RENAMED

@@ -74,6 +74,7 @@ def find_h_crit_hard(
     eps: float = 0.1,
     tau: float = 0.2,
     use_fft: bool = False,
+    G_min: int = 16384,
 ) -> tuple[float, float]:
     """Find h_crit via hard-mode-count bisection (monotone, no false roots).
@@ -151,7 +152,7 @@ def find_h_crit_hard(
             lo_domain = X.min().item() - 3 * sigma
             hi_domain = X.max().item() + 3 * sigma
             data_range = hi_domain - lo_domain
-        G_fft = adaptive_fft_G(data_range, h_hi)
+        G_fft = adaptive_fft_G(data_range, h_hi, G_min=G_min)
         _domain = (lo_domain, hi_domain)
         with torch.no_grad():
@@ -188,6 +189,11 @@ def find_h_crit_hard(
             h_crit = float(hi)  # smallest h with count <= target_modes
+            # Sub-bin refinement: quadratic interpolation on the disappearing f′ lobe
+            # to locate h_crit below the bin-width precision limit.
+            from dcb.fft_kde import _refine_hcrit
+            h_crit = _refine_hcrit(X, lo, hi, G_fft, _domain, target_modes)
     else:
         with torch.no_grad():
             # Verify bracket: need count > target at h_lo, count <= target at h_hi.
@@ -290,6 +296,7 @@ def find_h_crit(
     g_brentq: int = 128,
     use_hard_bisection: bool = True,
     use_fft: bool = True,
+    G_min: int = 16384,
 ) -> tuple[float, float]:
     """Find h_crit and return (h_crit, condition_number).
@@ -343,7 +350,7 @@ def find_h_crit(
         return find_h_crit_hard(
             X, grid, target_modes, chunk_size, brentq_n_max,
             h_lo, h_hi, formula=formula, eps=eps, tau=tau,
-            use_fft=use_fft,
+            use_fft=use_fft, G_min=G_min,
         )
     from scipy.optimize import brentq
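
The sub-bin refinement that `find_h_crit_hard` now calls (`_refine_hcrit`) ends with a three-point quadratic fit and a root search inside the bracket. A minimal sketch of just that final step follows, assuming a stand-in function for the lobe's peak value. The names `quadratic_root_in_bracket` and `lobe_peak` are hypothetical, and this version returns `None` where the package falls back to the bisection result `h_hi`.

```python
import numpy as np


def quadratic_root_in_bracket(h_lo, h_hi, y):
    """Fit y = a*h^2 + b*h + c through three samples and return the root
    inside [h_lo, h_hi] closest to the midpoint, or None if there is none.

    `y` is any callable giving the lobe's peak value at bandwidth h; in the
    package this comes from the FFT derivative estimate, here it is a stand-in.
    """
    h_mid = 0.5 * (h_lo + h_hi)
    hs = [h_lo, h_mid, h_hi]
    coeffs = np.polyfit(hs, [y(h) for h in hs], 2)
    candidates = [
        float(r.real) for r in np.roots(coeffs)
        if abs(r.imag) < 1e-10 and h_lo <= r.real <= h_hi
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: abs(r - h_mid))


def lobe_peak(h):
    """Stand-in lobe height: positive below h = 0.31, negative just above."""
    return (0.31 - h) * (2.0 - h)


h_refined = quadratic_root_in_bracket(0.25, 0.35, lobe_peak)
print(h_refined)
```

Because the stand-in is itself a quadratic, the fit recovers its root near 0.31 and discards the second root at 2.0, which lies outside the bracket.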

{diffcb-0.1.1 → diffcb-0.1.3}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "diffcb"
-version = "0.1.1"
+version = "0.1.3"
 description = "Differentiable Critical Bandwidth: Silverman's modality test as a differentiable PyTorch layer with IFT backward pass."
 readme = "README.md"
 license = { file = "LICENSE" }

diffcb-0.1.1/README.md DELETED

@@ -1,92 +0,0 @@
-# DCB — Differentiable Critical Bandwidth
-[![PyPI](https://img.shields.io/pypi/v/diffcb.svg)](https://pypi.org/project/diffcb/)
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
-[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/)
-A PyTorch package that makes **Silverman's critical bandwidth test (1981)** fully differentiable, enabling end-to-end gradient-based optimization over the modal structure of continuous distributions.
-## Overview
-The critical bandwidth `h_crit` is the minimum KDE bandwidth at which a distribution appears to have at most `m` modes — a classical nonparametric statistic for modality testing. DCB replaces every non-differentiable operation in its computation with a smooth surrogate, then uses the **Implicit Function Theorem** to compute exact gradients through the root-finding step at O(1) memory cost.
-```python
-import torch
-from dcb import DCBLayer
-X = torch.randn(256, requires_grad=True)   # 1D samples
-layer = DCBLayer(target_modes=1)
-h_crit = layer(X)                          # differentiable scalar
-h_crit.backward()                          # exact IFT gradients
-```
-## Installation
-```bash
-pip install diffcb
-```
-Or from source:
-```bash
-git clone https://github.com/ryZhangHason/differentiable-critical-bandwidth
-cd differentiable-critical-bandwidth
-pip install -e ".[dev]"
-```
-## Paper
-> Ruiyu Zhang. "Differentiable Critical Bandwidth: Making Silverman's Modality Test End-to-End Trainable." *Journal of Machine Learning Research*, 2026 (in preparation).
-## Confirmed Experimental Results
-All results produced on Kaggle GPU (T4 / P100) — see `experiments/` and `outputs/`.
-| Experiment | Result | Criterion |
-|---|---|---|
-| **Validation (m≥2)** | R²=0.91, MAE=0.07, Spearman ρ=0.89 | R²≥0.85, MAE≤0.10 ✓ |
-| **Speedup vs scipy (n=8192)** | **10.5×** on T4 | ≥3× ✓ |
-| **GAN mode preservation** | h_crit=1.232 >> 0.3 | h_crit>0.3 ✓ |
-| **Anomaly AUC (KDDCup99)** | DCB=**0.9982** vs IF=0.9867 | DCB≥IF ✓ |
-## Repository Structure
-```
-dcb/            Core PyTorch package (layer.py, solver.py, kde.py, utils.py)
-experiments/    Reproduction scripts for all paper figures and tables
-  phase1_validation.py   Figure 1: DCB vs reference h_crit scatter
-  phase1_speedup.py      Figure 2: GPU speedup benchmark
-  phase1_ablation.py     Figures S1–S2: ε/τ sensitivity heatmaps
-  phase2_gan.py          Figure 3: GAN mode-collapse prevention
-  phase3_anomaly.py      Table 2 + Figure 5: anomaly detection benchmark
-tests/          Unit tests (pytest, 35/35 passing)
-outputs/        All generated figures and tables (PDFs, PNGs, CSVs)
-notebooks/      Quickstart and demo notebooks
-```
-## Reproducing Paper Results
-```bash
-# Phase 1: validation, speedup, ablation
-python experiments/phase1_validation.py
-python experiments/phase1_speedup.py
-python experiments/phase1_ablation.py
-# Phase 2: GAN mode collapse experiment
-python experiments/phase2_gan.py
-# Phase 3: anomaly detection benchmark
-python experiments/phase3_anomaly.py
-```
-For GPU runs, use the provided Kaggle kernels:
-- Phase 1–2: `hsingle/dcb-full-experiments`
-- Phase 3: `hsingle/dcb-phase-3-anomaly-detection`
-## Kaggle GPU Notes
-Kaggle may assign a P100 (sm_60) instead of T4. The Phase 3 kernel handles this automatically by installing `torch==2.2.2+cu118` (the earliest PyTorch release with both Python 3.12 and sm_60 support) when P100 is detected.
-## License
-MIT — see [LICENSE](LICENSE).
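The removed README defines `h_crit` as the smallest KDE bandwidth at which the density shows at most `m` modes. A minimal pure-Python sketch of that definition follows. It is an illustration only, not the package's implementation: it is not differentiable, and `kde_mode_count` and `critical_bandwidth` are hypothetical helper names. It assumes a Gaussian kernel, for which the mode count does not increase as the bandwidth grows, and it assumes the data range is a wide enough upper bracket.

```python
import math


def kde_mode_count(xs, h, grid=200):
    """Count local maxima of an (unnormalized) Gaussian KDE on a uniform grid."""
    lo, hi = min(xs) - 3 * h, max(xs) + 3 * h
    step = (hi - lo) / (grid - 1)
    dens = [
        sum(math.exp(-0.5 * ((lo + i * step - x) / h) ** 2) for x in xs)
        for i in range(grid)
    ]
    # A grid point is a mode if it rises from the left and does not rise to the right.
    return sum(
        1 for i in range(1, grid - 1) if dens[i - 1] < dens[i] >= dens[i + 1]
    )


def critical_bandwidth(xs, m=1, h_lo=1e-3, h_hi=None, iters=30):
    """Bisect for the smallest h whose KDE has at most m modes."""
    if h_hi is None:
        h_hi = max(xs) - min(xs)  # assumption: this bracket already gives <= m modes
    for _ in range(iters):
        mid = 0.5 * (h_lo + h_hi)
        if kde_mode_count(xs, mid) <= m:
            h_hi = mid  # few enough modes: try a smaller bandwidth
        else:
            h_lo = mid  # too many modes: more smoothing is needed
    return h_hi


# Two well-separated clusters around -2 and +2 (deterministic toy data).
xs = [c + 0.1 * i for c in (-2.0, 2.0) for i in range(-5, 6)]
h_crit_1 = critical_bandwidth(xs, m=1)  # bandwidth needed to merge the two clusters
h_crit_2 = critical_bandwidth(xs, m=2)  # smaller, since two modes are already allowed
```

For this toy data the `m=1` value is the larger of the two, because the clusters only merge into a single mode once the bandwidth is comparable to their separation.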

diffcb-0.1.1/dcb/fft_kde.py DELETED

@@ -1,144 +0,0 @@
-"""
-dcb.fft_kde — FFT-based KDE Mode Counter
-Implements mode counting via FFT convolution of the histogram with a
-Gaussian derivative kernel. Complexity is O(n + G log G), avoiding the
-O(n × G) cost of the direct KDE approach and — crucially — requiring NO
-subsampling. This eliminates the (brentq_n_max / n)^{-1/5} upward bias
-that affects the standard bisection path when n > brentq_n_max.
-Round 18b: forward kernel only. The IFT backward is unchanged (still uses
-the analytical chunked KDE derivatives on all n points).
-"""
-from __future__ import annotations
-import math
-import torch
-from torch import Tensor
-def fft_mode_count(
-    X: Tensor,
-    h: float,
-    G: int = 4096,
-    pad_factor: int = 4,
-    domain: tuple[float, float] | None = None,
-) -> int:
-    """Count KDE modes via FFT convolution — O(n + G log G), no subsampling.
-    Bins X into G histogram bins, zero-pads to pad_factor*G, convolves with
-    the Gaussian derivative kernel in the frequency domain (applying iω·exp(−½(ωh)²)),
-    back-transforms, and counts positive-to-negative sign changes of the
-    resulting f' estimate.
-    Parameters
-    ----------
-    X : Tensor, shape (n,)
-        1D data tensor (may be on CPU or CUDA).
-    h : float
-        Bandwidth for the Gaussian kernel.
-    G : int
-        Number of histogram bins. Must satisfy h > 8 * (data_range / G) for
-        reliable derivative estimation. Use `adaptive_fft_G` to choose G
-        automatically before bisection.
-    pad_factor : int
-        Zero-padding multiplier (default 4). Mandatory ≥ 2 for circular-wrap
-        correctness; 4 is recommended at the largest h encountered.
-    domain : (lo, hi) or None
-        If provided, use this as the histogram domain instead of computing
-        X.min() - 3σ … X.max() + 3σ. Allows the caller to align the domain
-        with the bisection bracket (e.g., X.min() - 2*h_hi … X.max() + 2*h_hi)
-        so every fft_mode_count call in a bisection loop uses an identical grid.
-    Returns
-    -------
-    int
-        Number of KDE modes (downward zero-crossings of f').
-    """
-    with torch.no_grad():
-        if domain is not None:
-            lo, hi = domain
-        else:
-            # Domain: extend 3σ beyond data range to avoid boundary effects
-            sigma = X.std().item()
-            if sigma == 0.0:
-                sigma = 1.0  # degenerate case: all points identical
-            lo = X.min().item() - 3 * sigma
-            hi = X.max().item() + 3 * sigma
-        data_range = hi - lo
-        if data_range == 0.0:
-            return 1  # single-point distribution has 1 mode
-        # Histogram (O(n)) — MPS-safe via bucketize+bincount on CPU.
-        # torch.histc on MPS allocates an n × bins float32 intermediate (PyTorch
-        # MPS bug); at n=5M, bins=512 this is ~9.5 GiB → OOM.  Moving to CPU for
-        # the binning step avoids the intermediate and is numerically identical
-        # for data within [lo, hi] (guaranteed by the 3σ domain extension above).
-        X_cpu = X.float().cpu()
-        edges = torch.linspace(lo, hi, G + 1)                       # (G+1,) CPU
-        bin_idx = torch.bucketize(X_cpu, edges, right=True).clamp(1, G) - 1  # 0-indexed
-        counts = torch.bincount(bin_idx, minlength=G).float().to(X.device)   # back to device
-        # Zero-pad to pad_factor*G (4× mandatory for circular wrap correctness at h_hi)
-        N = pad_factor * G
-        counts_padded = torch.zeros(N, dtype=torch.float32, device=X.device)
-        counts_padded[:G] = counts
-        # FFT of histogram
-        C = torch.fft.rfft(counts_padded)
-        # Derivative kernel in frequency domain: iω * exp(-0.5*(ω*h)²)
-        # ω_k = 2π*k / (N * bin_width), bin_width = data_range / G
-        bin_width = data_range / G
-        k = torch.arange(N // 2 + 1, device=X.device, dtype=torch.float32)
-        omega = 2 * math.pi * k / (N * bin_width)
-        K_deriv = 1j * omega * torch.exp(-0.5 * (omega * h) ** 2)
-        # Convolve and back-transform
-        f_prime_padded = torch.fft.irfft(C * K_deriv, n=N)
-        # Trim to original G grid (discard zero-padded tail)
-        f_prime = f_prime_padded[:G]
-        # Count (+→-) sign changes = number of modes
-        # A mode is a local max of f, i.e., f' crosses zero from + to -
-        # Remove zeros (flat segments) — carry forward last nonzero sign
-        nonzero_mask = f_prime != 0
-        if not nonzero_mask.any():
-            return 0
-        s = f_prime[nonzero_mask]
-        transitions = int(((s[:-1] > 0) & (s[1:] < 0)).sum().item())
-        return transitions
-def adaptive_fft_G(data_range: float, h_hi: float, G_min: int = 4096) -> int:
-    """Choose FFT grid size G so that the derivative kernel is well-resolved.
-    Requires h > 8 * bin_width = 8 * data_range / G, equivalently
-    G > 8 * data_range / h_hi. We use a factor of 16 for safety margin,
-    then round up to the next power of 2 for efficient FFT.
-    Parameters
-    ----------
-    data_range : float
-        hi - lo of the data domain (typically X.max() - X.min() + 6σ).
-    h_hi : float
-        Upper bracket of the bisection. G is sized for this bandwidth, so
-        smaller h values in the bracket get fewer bins per bandwidth.
-    G_min : int
-        Minimum returned G (default 4096).
-    Returns
-    -------
-    int
-        Grid size G, a power of 2, at least G_min.
-    """
-    needed = 16 * math.ceil(data_range / h_hi)
-    # Round up to next power of 2
-    p = 1
-    while p < needed:
-        p <<= 1
-    return max(G_min, p)
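The sizing rule in `adaptive_fft_G` is easiest to follow with numbers. The sketch below restates the removed function as shown in the diff (without type hints) and works two cases by hand. The `resolves` helper is a hypothetical name added here only to check the docstring's condition `h > 8 * bin_width`, and the input values are made up for illustration.

```python
import math


def adaptive_fft_G(data_range, h_hi, G_min=4096):
    """Restatement of the sizing rule from the removed module, for illustration."""
    needed = 16 * math.ceil(data_range / h_hi)
    p = 1
    while p < needed:  # round up to the next power of 2
        p <<= 1
    return max(G_min, p)


def resolves(data_range, h, G):
    """Hypothetical check of the docstring's condition h > 8 * bin_width."""
    return h > 8 * data_range / G


# Wide domain, narrow bandwidth: 100 / 0.05 = 2000, times 16 = 32000,
# and the next power of 2 is 32768.
G_fine = adaptive_fft_G(100.0, 0.05)

# Narrow domain, wide bandwidth: 16 * 8 = 128, so the G_min floor of 4096 applies.
G_floor = adaptive_fft_G(12.0, 1.5)

# On the same grid, a bandwidth 8x smaller than h_hi falls below the 8-bin threshold.
ok_at_h_hi = resolves(100.0, 0.05, G_fine)
ok_at_small_h = resolves(100.0, 0.05 / 8, G_fine)
```

Because the grid is sized from `h_hi` alone, the last two lines show that the resolution condition holds at `h_hi` but not at a much smaller bandwidth on the same grid.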