PyPI - diffcb - Versions diffs - 0.1.1.tar.gz → 0.1.4.tar.gz - Mend

diffcb 0.1.1.tar.gz → 0.1.4.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23)

{diffcb-0.1.1 → diffcb-0.1.4}/PKG-INFO +58 -21
diffcb-0.1.4/README.md +129 -0
{diffcb-0.1.1 → diffcb-0.1.4}/dcb/__init__.py +1 -1
diffcb-0.1.4/dcb/fft_kde.py +339 -0
{diffcb-0.1.1 → diffcb-0.1.4}/dcb/layer.py +17 -5
{diffcb-0.1.1 → diffcb-0.1.4}/dcb/solver.py +37 -9
{diffcb-0.1.1 → diffcb-0.1.4}/pyproject.toml +1 -1
diffcb-0.1.1/README.md +0 -92
diffcb-0.1.1/dcb/fft_kde.py +0 -144
{diffcb-0.1.1 → diffcb-0.1.4}/.gitignore +0 -0
{diffcb-0.1.1 → diffcb-0.1.4}/.zenodo.json +0 -0
{diffcb-0.1.1 → diffcb-0.1.4}/LICENSE +0 -0
{diffcb-0.1.1 → diffcb-0.1.4}/dcb/diagnostics.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.4}/dcb/kde.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.4}/dcb/utils.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.4}/notebooks/.gitkeep +0 -0
{diffcb-0.1.1 → diffcb-0.1.4}/tests/test_kde.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.4}/tests/test_layer.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.4}/tests/test_r18c_denom_audit.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.4}/tests/test_r18c_deprecation_warn.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.4}/tests/test_r19_default_fft.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.4}/tests/test_r19_diagnostics.py +0 -0
{diffcb-0.1.1 → diffcb-0.1.4}/tests/test_solver.py +0 -0

{diffcb-0.1.1 → diffcb-0.1.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: diffcb
-Version: 0.1.1
+Version: 0.1.4
 Summary: Differentiable Critical Bandwidth: Silverman's modality test as a differentiable PyTorch layer with IFT backward pass.
 Project-URL: Homepage, https://github.com/ryZhangHason/differentiable-critical-bandwidth
 Project-URL: Repository, https://github.com/ryZhangHason/differentiable-critical-bandwidth
@@ -71,10 +71,10 @@ The critical bandwidth `h_crit` is the minimum KDE bandwidth at which a distribu
 import torch
 from dcb import DCBLayer
-X = torch.randn(256, requires_grad=True)   # 1D samples
+X = torch.randn(1000, requires_grad=True)   # 1D samples
 layer = DCBLayer(target_modes=1)
-h_crit = layer(X)                          # differentiable scalar
-h_crit.backward()                          # exact IFT gradients
+h_crit = layer(X)                           # differentiable scalar
+h_crit.backward()                           # exact IFT gradients
 ```
 ## Installation
@@ -91,34 +91,72 @@ cd differentiable-critical-bandwidth
 pip install -e ".[dev]"
 ```
-## Paper
+## Accuracy vs R's `bw.crit`
-> Ruiyu Zhang. "Differentiable Critical Bandwidth: Making Silverman's Modality Test End-to-End Trainable." *Journal of Machine Learning Research*, 2026 (in preparation).
+DCB is validated against R's `multimode::bw.crit(data, mod0=1)` — the standard reference implementation of Hall & York (2001). On **identical data**:
+| n | DCB vs R (same sample) | DCB vs R (independent samples) |
+|---|---|---|
+| 100K | **0.004%** | ~0.5% (MC noise from independent RNG) |
+| 1M | **0.005%** | ~0.2% |
+| 10M | **0.004%** | ~0.1% |
+The independent-sample figures reflect natural sampling variability (two unbiased estimators drawing different data), not algorithmic error. On identical data, DCB agrees with R to within **0.005%** at all tested n. DCB is 43× faster than R at n=100M (1.1 s vs 50 s) and handles n=2B in 24 s while R OOMs.
+## Key Parameters
+```python
+DCBLayer(
+    target_modes=1,       # target number of modes
+    G=512,                # IFT evaluation grid points
+    use_fft=True,         # FFT forward (default); eliminates subsampling bias for n>50K
+    max_n_exact=1_000_000,# sketch to sketch_size when n exceeds this (None = always exact)
+    sketch_size=500_000,  # sketch target; 500K matches full-n accuracy (O(n^{-2/9}) rate)
+    safe_backward=False,  # clamp IFT denominator near bifurcations
+)
+```
 ## Confirmed Experimental Results
-All results produced on Kaggle GPU (T4 / P100) — see `experiments/` and `outputs/`.
+All GPU results produced on Kaggle (T4 / P100) — see `experiments/` and `outputs/`.
 | Experiment | Result | Criterion |
 |---|---|---|
-| **Validation (m≥2)** | R²=0.91, MAE=0.07, Spearman ρ=0.89 | R²≥0.85, MAE≤0.10 ✓ |
-| **Speedup vs scipy (n=8192)** | **10.5×** on T4 | ≥3× ✓ |
+| **Accuracy vs R (same data, n=100K)** | **0.004%** | < 0.01% ✓ |
+| **Validation (m≥2, Marron-Wand)** | R²=0.91, MAE=0.07, ρ=0.89 | R²≥0.85 ✓ |
+| **Speedup vs scipy (CUDA T4, n=8192)** | **10.5×** | ≥3× ✓ |
 | **GAN mode preservation** | h_crit=1.232 >> 0.3 | h_crit>0.3 ✓ |
 | **Anomaly AUC (KDDCup99)** | DCB=**0.9982** vs IF=0.9867 | DCB≥IF ✓ |
+## Changelog
+### v0.1.1 (2026-05-29)
+- **MPS fix:** `torch.histc` on MPS allocated an n×bins intermediate (OOM at n≥5M). Replaced with `bucketize+bincount` on CPU — MPS-safe and numerically identical.
+- **Sketch API:** `DCBLayer(max_n_exact=1_000_000, sketch_size=500_000)` — silently sketches to 500K when n exceeds threshold. Justified by O(n⁻²/⁹) convergence of h_crit; 500K sketch matches full-n accuracy.
+- **Consistent bisection domain:** Pre-computed domain passed to all `fft_mode_count` calls in a single bisection, eliminating per-step drift.
+- **Bias warning direction:** Corrected "expected upward bias" to "expected downward bias" on legacy `use_fft=False` path.
+- **Test fixes:** Updated 8 pre-existing test failures (tuple unpacking, bounds, deprecation API).
+### v0.1.0 (2026-05-28)
+- Initial PyPI release. FFT forward (O(n + G log G)), IFT backward, MPS support.
 ## Repository Structure
 ```
-dcb/            Core PyTorch package (layer.py, solver.py, kde.py, utils.py)
+dcb/            Core PyTorch package
+  layer.py        DCBLayer nn.Module + DCBFunction autograd
+  solver.py       IFT root-finder and backward pass
+  fft_kde.py      FFT-based mode counter (MPS-safe, float64, G=16384)
+  kde.py          Direct KDE derivatives (small-n path)
+  utils.py        Grid, Silverman bandwidth, sg() stabilizer
 experiments/    Reproduction scripts for all paper figures and tables
-  phase1_validation.py   Figure 1: DCB vs reference h_crit scatter
-  phase1_speedup.py      Figure 2: GPU speedup benchmark
-  phase1_ablation.py     Figures S1–S2: ε/τ sensitivity heatmaps
-  phase2_gan.py          Figure 3: GAN mode-collapse prevention
-  phase3_anomaly.py      Table 2 + Figure 5: anomaly detection benchmark
-tests/          Unit tests (pytest, 35/35 passing)
+  phase1_*.py     Validation, speedup, ablation (Figures 1–2, S1–S2)
+  phase2_gan.py   GAN mode-collapse prevention (Figure 3)
+  phase3_anomaly.py  Anomaly detection (Table 2, Figure 5)
+  round20_*.py    Large-n R comparison and streaming benchmarks
+  round21_*.py    Accuracy improvement experiments
+tests/          Unit tests (pytest, 45 passed, 1 xfailed)
 outputs/        All generated figures and tables (PDFs, PNGs, CSVs)
-notebooks/      Quickstart and demo notebooks
 ```
 ## Reproducing Paper Results
@@ -127,7 +165,6 @@ notebooks/      Quickstart and demo notebooks
 # Phase 1: validation, speedup, ablation
 python experiments/phase1_validation.py
 python experiments/phase1_speedup.py
-python experiments/phase1_ablation.py
 # Phase 2: GAN mode collapse experiment
 python experiments/phase2_gan.py
@@ -136,13 +173,13 @@ python experiments/phase2_gan.py
 python experiments/phase3_anomaly.py
 ```
-For GPU runs, use the provided Kaggle kernels:
+For GPU runs use the Kaggle kernels:
 - Phase 1–2: `hsingle/dcb-full-experiments`
 - Phase 3: `hsingle/dcb-phase-3-anomaly-detection`
-## Kaggle GPU Notes
+## Paper
-Kaggle may assign a P100 (sm_60) instead of T4. The Phase 3 kernel handles this automatically by installing `torch==2.2.2+cu118` (the earliest PyTorch release with both Python 3.12 and sm_60 support) when P100 is detected.
+> Ruiyu Zhang. "Differentiable Critical Bandwidth: Making Silverman's Modality Test End-to-End Trainable." *Journal of Machine Learning Research*, 2026 (in preparation).
 ## License

diffcb-0.1.4/README.md ADDED

@@ -0,0 +1,129 @@
+# DCB — Differentiable Critical Bandwidth
+[![PyPI](https://img.shields.io/pypi/v/diffcb.svg)](https://pypi.org/project/diffcb/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/)
+A PyTorch package that makes **Silverman's critical bandwidth test (1981)** fully differentiable, enabling end-to-end gradient-based optimization over the modal structure of continuous distributions.
+## Overview
+The critical bandwidth `h_crit` is the minimum KDE bandwidth at which a distribution appears to have at most `m` modes — a classical nonparametric statistic for modality testing. DCB replaces every non-differentiable operation in its computation with a smooth surrogate, then uses the **Implicit Function Theorem** to compute exact gradients through the root-finding step at O(1) memory cost.
+```python
+import torch
+from dcb import DCBLayer
+X = torch.randn(1000, requires_grad=True)   # 1D samples
+layer = DCBLayer(target_modes=1)
+h_crit = layer(X)                           # differentiable scalar
+h_crit.backward()                           # exact IFT gradients
+```
+## Installation
+```bash
+pip install diffcb
+```
+Or from source:
+```bash
+git clone https://github.com/ryZhangHason/differentiable-critical-bandwidth
+cd differentiable-critical-bandwidth
+pip install -e ".[dev]"
+```
+## Accuracy vs R's `bw.crit`
+DCB is validated against R's `multimode::bw.crit(data, mod0=1)` — the standard reference implementation of Hall & York (2001). On **identical data**:
+| n | DCB vs R (same sample) | DCB vs R (independent samples) |
+|---|---|---|
+| 100K | **0.004%** | ~0.5% (MC noise from independent RNG) |
+| 1M | **0.005%** | ~0.2% |
+| 10M | **0.004%** | ~0.1% |
+The independent-sample figures reflect natural sampling variability (two unbiased estimators drawing different data), not algorithmic error. On identical data, DCB agrees with R to within **0.005%** at all tested n. DCB is 43× faster than R at n=100M (1.1 s vs 50 s) and handles n=2B in 24 s while R OOMs.
+## Key Parameters
+```python
+DCBLayer(
+    target_modes=1,       # target number of modes
+    G=512,                # IFT evaluation grid points
+    use_fft=True,         # FFT forward (default); eliminates subsampling bias for n>50K
+    max_n_exact=1_000_000,# sketch to sketch_size when n exceeds this (None = always exact)
+    sketch_size=500_000,  # sketch target; 500K matches full-n accuracy (O(n^{-2/9}) rate)
+    safe_backward=False,  # clamp IFT denominator near bifurcations
+)
+```
+## Confirmed Experimental Results
+All GPU results produced on Kaggle (T4 / P100) — see `experiments/` and `outputs/`.
+| Experiment | Result | Criterion |
+|---|---|---|
+| **Accuracy vs R (same data, n=100K)** | **0.004%** | < 0.01% ✓ |
+| **Validation (m≥2, Marron-Wand)** | R²=0.91, MAE=0.07, ρ=0.89 | R²≥0.85 ✓ |
+| **Speedup vs scipy (CUDA T4, n=8192)** | **10.5×** | ≥3× ✓ |
+| **GAN mode preservation** | h_crit=1.232 >> 0.3 | h_crit>0.3 ✓ |
+| **Anomaly AUC (KDDCup99)** | DCB=**0.9982** vs IF=0.9867 | DCB≥IF ✓ |
+## Changelog
+### v0.1.1 (2026-05-29)
+- **MPS fix:** `torch.histc` on MPS allocated an n×bins intermediate (OOM at n≥5M). Replaced with `bucketize+bincount` on CPU — MPS-safe and numerically identical.
+- **Sketch API:** `DCBLayer(max_n_exact=1_000_000, sketch_size=500_000)` — silently sketches to 500K when n exceeds threshold. Justified by O(n⁻²/⁹) convergence of h_crit; 500K sketch matches full-n accuracy.
+- **Consistent bisection domain:** Pre-computed domain passed to all `fft_mode_count` calls in a single bisection, eliminating per-step drift.
+- **Bias warning direction:** Corrected "expected upward bias" to "expected downward bias" on legacy `use_fft=False` path.
+- **Test fixes:** Updated 8 pre-existing test failures (tuple unpacking, bounds, deprecation API).
+### v0.1.0 (2026-05-28)
+- Initial PyPI release. FFT forward (O(n + G log G)), IFT backward, MPS support.
+## Repository Structure
+```
+dcb/            Core PyTorch package
+  layer.py        DCBLayer nn.Module + DCBFunction autograd
+  solver.py       IFT root-finder and backward pass
+  fft_kde.py      FFT-based mode counter (MPS-safe, float64, G=16384)
+  kde.py          Direct KDE derivatives (small-n path)
+  utils.py        Grid, Silverman bandwidth, sg() stabilizer
+experiments/    Reproduction scripts for all paper figures and tables
+  phase1_*.py     Validation, speedup, ablation (Figures 1–2, S1–S2)
+  phase2_gan.py   GAN mode-collapse prevention (Figure 3)
+  phase3_anomaly.py  Anomaly detection (Table 2, Figure 5)
+  round20_*.py    Large-n R comparison and streaming benchmarks
+  round21_*.py    Accuracy improvement experiments
+tests/          Unit tests (pytest, 45 passed, 1 xfailed)
+outputs/        All generated figures and tables (PDFs, PNGs, CSVs)
+```
+## Reproducing Paper Results
+```bash
+# Phase 1: validation, speedup, ablation
+python experiments/phase1_validation.py
+python experiments/phase1_speedup.py
+# Phase 2: GAN mode collapse experiment
+python experiments/phase2_gan.py
+# Phase 3: anomaly detection benchmark
+python experiments/phase3_anomaly.py
+```
+For GPU runs use the Kaggle kernels:
+- Phase 1–2: `hsingle/dcb-full-experiments`
+- Phase 3: `hsingle/dcb-phase-3-anomaly-detection`
+## Paper
+> Ruiyu Zhang. "Differentiable Critical Bandwidth: Making Silverman's Modality Test End-to-End Trainable." *Journal of Machine Learning Research*, 2026 (in preparation).
+## License
+MIT — see [LICENSE](LICENSE).
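The overview above defines `h_crit` as the smallest bandwidth at which the KDE shows at most `m` modes. As a plain-NumPy illustration of that definition (not DCB's smooth surrogate, and with no IFT backward), the sketch below counts downward zero-crossings of a Gaussian KDE derivative on a grid and bisects on `h`. Bisection is valid because, for the Gaussian kernel, the number of modes is non-increasing in `h` (Silverman, 1981). The function names, grid size, bracket `[0.1, 10]` and the two-cluster test sample are assumptions of this sketch.

```python
from statistics import NormalDist

import numpy as np


def kde_mode_count(x, h, grid):
    """Count modes of a Gaussian KDE on `grid` (hypothetical helper, direct O(n*G) path)."""
    # f'(t) is proportional to -sum_i (t - x_i) * exp(-(t - x_i)^2 / (2 h^2)); constants dropped.
    d = grid[:, None] - x[None, :]
    fprime = -(d * np.exp(-0.5 * (d / h) ** 2)).sum(axis=1)
    s = fprime[fprime != 0.0]                       # drop exact zeros (exp underflow far from data)
    return int(np.sum((s[:-1] > 0) & (s[1:] < 0)))  # + to - sign changes = modes


def critical_bandwidth(x, target_modes=1, h_lo=0.1, h_hi=10.0, tol=1e-4, G=2048):
    """Smallest h (to within tol) whose KDE has at most target_modes modes."""
    x = np.asarray(x, dtype=float)
    sigma = x.std()
    grid = np.linspace(x.min() - 3 * sigma, x.max() + 3 * sigma, G)
    count = lambda h: kde_mode_count(x, h, grid)
    if count(h_lo) <= target_modes or count(h_hi) > target_modes:
        raise ValueError("bracket [h_lo, h_hi] does not straddle h_crit")
    while h_hi - h_lo > tol:
        mid = 0.5 * (h_lo + h_hi)
        if count(mid) <= target_modes:
            h_hi = mid   # invariant: count(h_hi) <= target_modes
        else:
            h_lo = mid   # invariant: count(h_lo) > target_modes
    return h_hi


# Hypothetical test data: two clusters of 200 standard-normal quantiles centred at -3 and +3.
base = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
x = np.array([b - 3.0 for b in base] + [b + 3.0 for b in base])
h_crit = critical_bandwidth(x)
```

For an equal mixture of two unit-variance clusters at ±3, smoothing with bandwidth `h` gives components with standard deviation `sqrt(1 + h²)`, and the mixture stays bimodal while that is below half the separation (3), so `h_crit` should come out near `sqrt(8) ≈ 2.83`. The quantile-based sample has slightly less than unit variance, which moves the value only marginally.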

{diffcb-0.1.1 → diffcb-0.1.4}/dcb/__init__.py RENAMED

@@ -19,4 +19,4 @@ __all__ = [
     "DCBLayer", "DifferentiableCriticalBandwidth",
     "anneal_eps_tau", "soft_mode_count_cross", "soft_mode_count",
 ]
-__version__ = "0.1.1"
+__version__ = "0.1.4"
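The `_histogram_on_device` helper in the new `dcb/fft_kde.py` computes bin indices two ways: `torch.bucketize(..., right=True)` against `linspace` edges on CPU, and a scaled-and-truncated index on MPS. The NumPy sketch below mirrors both index rules (`np.searchsorted(..., side="right")` uses the same half-open `[edge_i, edge_i+1)` convention as `bucketize(right=True)`) and checks that they give the same counts, including the clamping of points that sit exactly on `lo` or `hi`. It runs in float64; the library's MPS path works in float32, where a point within rounding distance of an edge could in principle land in the neighbouring bin.

```python
import numpy as np


def counts_bucketize_style(x, G, lo, hi):
    # Mirrors the CPU path: linspace edges, right-closed search, clamp to [1, G], shift to 0-based.
    edges = np.linspace(lo, hi, G + 1)
    idx = np.clip(np.searchsorted(edges, x, side="right"), 1, G) - 1
    return np.bincount(idx, minlength=G)


def counts_floor_style(x, G, lo, hi):
    # Mirrors the MPS path: scale to [0, G], truncate toward zero, clamp to [0, G - 1].
    idx = np.clip(((x - lo) * (G / (hi - lo))).astype(np.int64), 0, G - 1)
    return np.bincount(idx, minlength=G)


rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
lo, hi, G = float(x.min()) - 1.0, float(x.max()) + 1.0, 512
a = counts_bucketize_style(x, G, lo, hi)
b = counts_floor_style(x, G, lo, hi)
```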

diffcb-0.1.4/dcb/fft_kde.py ADDED

@@ -0,0 +1,339 @@
+"""
+dcb.fft_kde — FFT-based KDE Mode Counter
+Implements mode counting via FFT convolution of the histogram with a
+Gaussian derivative kernel. Complexity is O(n + G log G), avoiding the
+O(n × G) cost of the direct KDE approach and — crucially — requiring NO
+subsampling. This eliminates the (brentq_n_max / n)^{-1/5} upward bias
+that affects the standard bisection path when n > brentq_n_max.
+Round 18b: forward kernel only. The IFT backward is unchanged (still uses
+the analytical chunked KDE derivatives on all n points).
+"""
+from __future__ import annotations
+import math
+import torch
+from torch import Tensor
+# Worker 2: device-native histogram
+def _histogram_on_device(X: Tensor, G: int, lo: float, hi: float) -> Tensor:
+    """Compute a G-bin histogram of X on the same device as X."""
+    device = X.device
+    if device.type == 'cuda':
+        return torch.histc(X.float(), bins=G, min=lo, max=hi)
+    elif device.type == 'mps':
+        bin_idx = ((X.float() - lo) * (G / (hi - lo))).long().clamp_(0, G - 1)
+        counts = torch.zeros(G, dtype=torch.float32, device=device)
+        counts.scatter_add_(0, bin_idx, torch.ones(X.shape[0], dtype=torch.float32, device=device))
+        return counts
+    else:  # cpu
+        X_cpu = X.float()
+        edges = torch.linspace(lo, hi, G + 1)
+        bin_idx = torch.bucketize(X_cpu, edges, right=True).clamp(1, G) - 1
+        return torch.bincount(bin_idx, minlength=G).float()
+def precompute_fft(
+    X: Tensor,
+    G: int = 4096,
+    domain: tuple[float, float] | None = None,
+    pad_factor: int = 2,  # Worker 5: pad_factor=2 (was 4) — safe for h ≤ 3σ, halves irfft size
+    fft_dtype: torch.dtype = torch.float32,  # Worker 3: float32 FFT
+) -> tuple[Tensor, Tensor, tuple[float, float]]:
+    """Precompute the FFT of the zero-padded histogram of X.
+    This is the bandwidth-independent work shared across a bisection loop on
+    h: build the histogram, zero-pad, take rfft, and build the frequency grid
+    omega.  The per-step kernel K(omega, h) = i*omega*exp(-0.5*(omega*h)**2)
+    must be combined with C inside `mode_count_from_C`.
+    Parameters
+    ----------
+    X : Tensor, shape (n,)
+    G : int
+        Number of histogram bins.
+    domain : (lo, hi) or None
+        If provided, use as histogram domain; otherwise computed from X
+        with a 3*sigma margin.
+    pad_factor : int
+        Zero-padding multiplier (default 2).
+    Returns
+    -------
+    C : Tensor, shape (N//2+1,), complex128
+        rfft of the zero-padded float64 histogram.  Empty tensor (degenerate
+        zero-range domain) signals the caller to short-circuit to 1 mode.
+    omega : Tensor, shape (N//2+1,), float64
+        Angular frequency grid for the FFT.
+    domain : (lo, hi)
+        Domain tuple actually used.
+    """
+    with torch.no_grad():
+        if domain is not None:
+            lo, hi = domain
+        else:
+            sigma = X.std().item()
+            if sigma == 0.0:
+                sigma = 1.0
+            lo = X.min().item() - 3 * sigma
+            hi = X.max().item() + 3 * sigma
+        data_range = hi - lo
+        if data_range == 0.0:
+            complex_dtype = torch.complex64 if fft_dtype == torch.float32 else torch.complex128
+            empty = torch.zeros(0, dtype=complex_dtype, device=X.device)
+            empty_omega = torch.zeros(0, dtype=fft_dtype, device=X.device)
+            return empty, empty_omega, (lo, hi)
+        # Histogram (O(n)) — device-native dispatch.
+        counts = _histogram_on_device(X, G, lo, hi)
+        N = pad_factor * G
+        counts_padded = torch.zeros(N, dtype=fft_dtype, device=X.device)
+        counts_padded[:G] = counts.to(fft_dtype)
+        C = torch.fft.rfft(counts_padded)
+        bin_width = data_range / G
+        k = torch.arange(N // 2 + 1, device=X.device, dtype=fft_dtype)
+        omega = 2 * math.pi * k / (N * bin_width)
+    return C, omega, (lo, hi)
+def mode_count_from_C(
+    C: Tensor,
+    omega: Tensor,
+    h: float,
+    G: int,
+    N: int,
+) -> int:
+    """Per-step mode count: apply Gaussian derivative kernel and count sign changes.
+    Cheap inner loop body for bisection — only the kernel depends on h.
+    Parameters
+    ----------
+    C : Tensor, shape (N//2+1,), complex
+        rfft of the zero-padded histogram (from `precompute_fft`).
+    omega : Tensor, shape (N//2+1,), float64
+        Frequency grid (from `precompute_fft`).
+    h : float
+        Bandwidth.
+    G : int
+        Histogram bin count.
+    N : int
+        Padded FFT length (pad_factor * G).
+    Returns
+    -------
+    int
+        Number of KDE modes.
+    """
+    if C.numel() == 0:
+        return 1  # degenerate single-point distribution
+    K_deriv = 1j * omega * torch.exp(-0.5 * (omega * h) ** 2)
+    f_prime_padded = torch.fft.irfft(C * K_deriv, n=N).real
+    f_prime = f_prime_padded[:G]
+    nonzero_mask = f_prime != 0
+    if not nonzero_mask.any():
+        return 0
+    s = f_prime[nonzero_mask]
+    transitions = int(((s[:-1] > 0) & (s[1:] < 0)).sum().item())
+    return transitions
+def fft_mode_count(
+    X: Tensor,
+    h: float,
+    G: int = 4096,
+    pad_factor: int = 2,  # Worker 5: pad_factor=2 (was 4) — safe for h ≤ 3σ, halves irfft size
+    domain: tuple[float, float] | None = None,
+) -> int:
+    """Count KDE modes via FFT convolution — O(n + G log G), no subsampling.
+    Bins X into G histogram bins, zero-pads to pad_factor*G, convolves with
+    the Gaussian derivative kernel in the frequency domain (applying iω·exp(−½(ωh)²)),
+    back-transforms, and counts positive-to-negative sign changes of the
+    resulting f' estimate.
+    Parameters
+    ----------
+    X : Tensor, shape (n,)
+        1D data tensor (may be on CPU or CUDA).
+    h : float
+        Bandwidth for the Gaussian kernel.
+    G : int
+        Number of histogram bins. Must satisfy h > 8 * (data_range / G) for
+        reliable derivative estimation. Use `adaptive_fft_G` to choose G
+        automatically before bisection.
+    pad_factor : int
+        Zero-padding multiplier (default 2). Mandatory ≥ 2 for circular-wrap
+        correctness; 4 is recommended at the largest h encountered.
+    domain : (lo, hi) or None
+        If provided, use this as the histogram domain instead of computing
+        X.min() - 3σ … X.max() + 3σ. Allows the caller to align the domain
+        with the bisection bracket (e.g., X.min() - 2*h_hi … X.max() + 2*h_hi)
+        so every fft_mode_count call in a bisection loop uses an identical grid.
+    Returns
+    -------
+    int
+        Number of KDE modes (downward zero-crossings of f').
+    """
+    with torch.no_grad():
+        C, omega, _ = precompute_fft(X, G=G, domain=domain, pad_factor=pad_factor)
+        N = pad_factor * G
+        return mode_count_from_C(C, omega, h, G, N)
+def _refine_hcrit(
+    X: Tensor,
+    h_lo: float,
+    h_hi: float,
+    G: int,
+    domain: tuple[float, float],
+    target_modes: int = 1,
+    pad_factor: int = 2,  # Worker 5: pad_factor=2 (was 4) — safe for h ≤ 3σ, halves irfft size
+) -> float:
+    """Sub-bin quadratic refinement of h_crit after bisection converges.
+    Identifies the f′ lobe that disappears at the mode-merging bandwidth and
+    fits a quadratic in h to that lobe's peak value, returning the root — the
+    h where that peak exactly reaches zero.  Reduces the bin-width-limited
+    systematic from ~bin_width/h_crit to well below 1e-4.
+    When the incoming bracket [h_lo, h_hi] is tighter than one histogram bin
+    width (the common case after 50-step bisection), the function expands the
+    bracket outward from h_hi by up to 4× the bin width while maintaining the
+    invariant that fft_mode_count > target at the left endpoint and
+    <= target at the right endpoint, so the disappearing f′ lobe is visible
+    across the bracket.
+    Parameters
+    ----------
+    X : Tensor  — data (may be on any device)
+    h_lo, h_hi : float  — final bisection bracket; fft_mode_count(X,h_lo) > target,
+                          fft_mode_count(X,h_hi) <= target
+    G, domain, target_modes, pad_factor — same as fft_mode_count
+    Returns
+    -------
+    float  — refined h_crit, guaranteed to lie in [h_lo, h_hi] of the
+             (possibly expanded) bracket used for fitting.
+    """
+    import numpy as np
+    lo_d, hi_d = domain
+    data_range = hi_d - lo_d
+    if data_range == 0.0:
+        return h_hi
+    bin_width = data_range / G
+    N = pad_factor * G
+    bw = bin_width  # histogram bin width
+    # Pre-compute histogram once; reuse C (FFT of counts) for all h evaluations.
+    with torch.no_grad():
+        counts = _histogram_on_device(X, G, lo_d, hi_d).cpu()
+        counts_padded = torch.zeros(N, dtype=torch.float64)
+        counts_padded[:G] = counts.double()
+        C = torch.fft.rfft(counts_padded)
+        k = torch.arange(N // 2 + 1, dtype=torch.float64)
+        omega_base = 2 * math.pi * k / (N * bw)
+    def fprime(h: float) -> Tensor:
+        """Compute f′ array (shape G,) for bandwidth h using cached C (float64)."""
+        K_deriv = 1j * omega_base * torch.exp(-0.5 * (omega_base * h) ** 2)
+        return torch.fft.irfft(C * K_deriv, n=N).float()[:G]
+    with torch.no_grad():
+        # If the bracket is tighter than bin_width, expand it so that the
+        # disappearing f′ lobe crosses zero somewhere inside the bracket.
+        # Expand the left endpoint leftward by up to 4 bin widths.
+        ref_lo = h_lo
+        ref_hi = h_hi
+        if (ref_hi - ref_lo) < bw:
+            # Try expanding leftward until we find a bin where fp crosses zero
+            for mult in [1, 2, 3, 4]:
+                cand_lo = max(ref_hi - mult * bw, ref_hi * 0.9)
+                fp_cand = fprime(cand_lo)
+                fp_hi_  = fprime(ref_hi)
+                cm = (fp_cand > 0) & (fp_hi_ <= 0)
+                if cm.any():
+                    ref_lo = cand_lo
+                    break
+            # If still no candidates found, return bisection result unchanged
+            fp_lo_ = fprime(ref_lo)
+            fp_hi_ = fprime(ref_hi)
+            candidate_mask = (fp_lo_ > 0) & (fp_hi_ <= 0)
+            if not candidate_mask.any():
+                return h_hi
+        else:
+            fp_lo_ = fprime(ref_lo)
+            fp_hi_ = fprime(ref_hi)
+            candidate_mask = (fp_lo_ > 0) & (fp_hi_ <= 0)
+            if not candidate_mask.any():
+                return h_hi
+        # Pick the bin with the largest positive value at ref_lo that crossed zero
+        masked_fp_lo = fp_lo_.clone()
+        masked_fp_lo[~candidate_mask] = -float('inf')
+        j = int(masked_fp_lo.argmax().item())
+        h_mid = (ref_lo + ref_hi) / 2.0
+        # Evaluate fp[j] at three bandwidths for quadratic fit
+        y_lo  = fp_lo_[j].item()
+        y_mid = fprime(h_mid)[j].item()
+        y_hi  = fp_hi_[j].item()
+        # Fit quadratic y = a*h² + b*h + c through the three (h, y) pairs
+        # and solve for the root in [ref_lo, ref_hi].
+        coeffs = np.polyfit([ref_lo, h_mid, ref_hi], [y_lo, y_mid, y_hi], 2)
+        roots = np.roots(coeffs)
+        real_roots = [
+            r.real for r in roots
+            if abs(r.imag) < 1e-10 * abs(r.real + 1e-30)
+            and ref_lo <= r.real <= ref_hi
+        ]
+        if real_roots:
+            return float(min(real_roots, key=lambda r: abs(r - h_mid)))
+        return h_hi
+def adaptive_fft_G(data_range: float, h_hi: float, G_min: int = 16384) -> int:
+    """Choose FFT grid size G so that the derivative kernel is well-resolved.
+    Requires h > 8 * bin_width = 8 * data_range / G, equivalently
+    G > 8 * data_range / h_hi. We use a factor of 16 for safety margin,
+    then round up to the next power of 2 for efficient FFT.
+    Parameters
+    ----------
+    data_range : float
+        hi - lo of the data domain (typically X.max() - X.min() + 6σ).
+    h_hi : float
+        Upper bracket of the bisection (smallest h needing resolution).
+    G_min : int
+        Minimum returned G (default 16384).
+    Returns
+    -------
+    int
+        Grid size G, a power of 2, at least G_min.
+    """
+    needed = 16 * math.ceil(data_range / h_hi)
+    # Round up to next power of 2
+    p = 1
+    while p < needed:
+        p <<= 1
+    return max(G_min, p)
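`fft_mode_count` replaces the O(n·G) direct evaluation with a histogram, one real FFT, a multiplication by the derivative-of-Gaussian transfer function `iω·exp(-½(ωh)²)`, and an inverse FFT. The NumPy sketch below follows the same steps (3σ domain margin, zero-padding by `pad_factor`, downward sign changes of f') so the pipeline can be checked without PyTorch. One deliberate difference, an assumption of this sketch rather than library behaviour: values with |f'| below `rel_tol` times the maximum are discarded before counting, because in float64 the far tails of the domain are pure round-off and could otherwise produce spurious sign changes; the library drops only exact zeros.

```python
import math
from statistics import NormalDist

import numpy as np


def fft_mode_count_sketch(x, h, G=4096, pad_factor=2, rel_tol=1e-10):
    """Count KDE modes via histogram + FFT (NumPy illustration of dcb.fft_kde)."""
    x = np.asarray(x, dtype=np.float64)
    sigma = float(x.std()) or 1.0                           # guard a zero-spread sample
    lo, hi = float(x.min()) - 3 * sigma, float(x.max()) + 3 * sigma
    counts, _ = np.histogram(x, bins=G, range=(lo, hi))     # O(n) binning
    N = pad_factor * G
    padded = np.zeros(N)
    padded[:G] = counts                                     # zero-pad against circular wrap
    C = np.fft.rfft(padded)                                 # independent of h: reusable across a bisection
    bin_width = (hi - lo) / G
    omega = 2 * math.pi * np.arange(N // 2 + 1) / (N * bin_width)
    K_deriv = 1j * omega * np.exp(-0.5 * (omega * h) ** 2)  # transform of the Gaussian derivative
    fprime = np.fft.irfft(C * K_deriv, n=N)[:G]             # smoothed f', up to a positive scale
    scale = np.abs(fprime).max()
    if scale == 0.0:
        return 0
    s = fprime[np.abs(fprime) > rel_tol * scale]            # sketch-only: drop round-off-level values
    return int(np.sum((s[:-1] > 0) & (s[1:] < 0)))          # + to - sign changes = modes


# Hypothetical test data: two clusters of standard-normal quantiles centred at -3 and +3.
base = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
x = np.array([b - 3.0 for b in base] + [b + 3.0 for b in base])
modes_narrow = fft_mode_count_sketch(x, 0.5)
modes_wide = fft_mode_count_sketch(x, 4.0)
```

Because `C` does not depend on `h`, hoisting it out of the loop (as the 0.1.4 solver does with `precompute_fft` and `mode_count_from_C`) leaves only the kernel multiply and one `irfft` per bisection step.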

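`adaptive_fft_G` turns the "h > 8 bin widths" requirement into a grid size by asking for 16 bins per `h_hi` and rounding up to a power of two. The stdlib restatement below applies the same rule (with `int.bit_length` in place of the doubling loop, which yields the same power of two) to two worked cases: in the first the `G_min` floor decides, in the second the 16-bins rule does.

```python
import math


def adaptive_fft_G(data_range, h_hi, G_min=16384):
    # Same rule as dcb.fft_kde.adaptive_fft_G: 16 bins per h_hi, rounded up to a power of two.
    needed = 16 * math.ceil(data_range / h_hi)
    p = 1 << max(needed - 1, 0).bit_length()   # smallest power of two >= needed
    return max(G_min, p)


for data_range, h_hi in [(30.0, 0.25), (1000.0, 0.5)]:
    G = adaptive_fft_G(data_range, h_hi)
    bins_per_h = h_hi / (data_range / G)
    print(f"range={data_range}, h_hi={h_hi}: G={G}, bins per h_hi={bins_per_h:.1f}")
```

Because G is sized from `h_hi`, the 16-bins-per-bandwidth margin is guaranteed only at the top of the bracket; smaller bandwidths tried in the same bisection (including the halving search for `h_lo`) are resolved by proportionally fewer bins, which is where the `G_min` floor (now exposed as `fft_G_min` on `DCBLayer`) matters.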
{diffcb-0.1.1 → diffcb-0.1.4}/dcb/layer.py RENAMED

@@ -35,13 +35,14 @@ class DCBFunction(torch.autograd.Function):
     @staticmethod
     def forward(ctx, X, grid, eps, tau, target_modes, delta, formula, chunk_size,
-                brentq_n_max, g_brentq, use_hard_bisection, safe_backward, use_fft):
+                brentq_n_max, g_brentq, use_hard_bisection, safe_backward, use_fft, fft_G_min,
+                fft_dtype):
         """Locate h_crit and save state for the backward pass."""
         h_crit, cond_num = find_h_crit(
             X, grid, eps, tau, target_modes,
             formula=formula, brentq_n_max=brentq_n_max, chunk_size=chunk_size,
             g_brentq=g_brentq, use_hard_bisection=use_hard_bisection,
-            use_fft=use_fft,
+            use_fft=use_fft, G_min=fft_G_min, fft_dtype=fft_dtype,
         )
         ctx.save_for_backward(X, grid)
         ctx.h_crit = h_crit
@@ -67,8 +68,8 @@ class DCBFunction(torch.autograd.Function):
         ctx.denom_abs       = ift_gradient.last_denom_abs
         # Gradients for: X, grid, eps, tau, target_modes, delta, formula,
         #                chunk_size, brentq_n_max, g_brentq, use_hard_bisection,
-        #                safe_backward, use_fft
-        return grad_X, None, None, None, None, None, None, None, None, None, None, None, None
+        #                safe_backward, use_fft, fft_G_min, fft_dtype
+        return grad_X, None, None, None, None, None, None, None, None, None, None, None, None, None, None
 class DCBLayer(nn.Module):
@@ -133,6 +134,13 @@ class DCBLayer(nn.Module):
         Number of points to sketch when n > max_n_exact. Default 500_000.
         A 500K sketch achieves the same mean accuracy as streaming 100M points
         (validated in Round 20 reservoir experiment).
+    fft_G_min : int
+        Minimum FFT histogram grid size for the bisection solver (default 16384).
+        Controls accuracy of the FFT path (n > 50K). Larger values reduce
+        discretisation error at a modest cost: G=16384 gives ~0.004% err vs R;
+        G=32768 gives ~0.001% at +9% cost; G=65536 reaches the R-matching floor
+        (~0.001%) with no further gain beyond that. Ignored for n ≤ 50K (direct
+        KDE path).
     Examples
     --------
@@ -162,6 +170,8 @@ class DCBLayer(nn.Module):
         use_fft: bool = True,
         max_n_exact: int | None = 1_000_000,
         sketch_size: int = 500_000,
+        fft_G_min: int = 16384,
+        fft_dtype: torch.dtype = torch.float32,
     ):
         super().__init__()
         self.target_modes = target_modes
@@ -180,6 +190,8 @@ class DCBLayer(nn.Module):
         self.use_fft = use_fft
         self.max_n_exact = max_n_exact
         self.sketch_size = sketch_size
+        self.fft_G_min = fft_G_min
+        self.fft_dtype = fft_dtype
         if use_fft and brentq_n_max != 50_000:
             raise TypeError(
                 f"brentq_n_max={brentq_n_max} is meaningless when use_fft=True: the FFT path "
@@ -250,7 +262,7 @@ class DCBLayer(nn.Module):
         return DCBFunction.apply(
             X, grid, eps_eff, tau_eff, self.target_modes, self.delta, self.formula,
             self.chunk_size, self.brentq_n_max, self.g_brentq, self.use_hard_bisection,
-            self.safe_backward, self.use_fft,
+            self.safe_backward, self.use_fft, self.fft_G_min, self.fft_dtype,
         )
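`DCBFunction.backward` differentiates through the root-finding step with the implicit function theorem instead of unrolling the bisection: if `F(h, θ) = 0` defines `h*(θ)`, then `dh*/dθ = -(∂F/∂θ) / (∂F/∂h)`, and `safe_backward` clamps that denominator near bifurcations. The stdlib sketch below applies the formula to a hypothetical toy residual `F(h, θ) = h³ + θh - 1` (not DCB's soft mode-count residual), checks it against a central finite difference taken through the bisection solver, and shows one possible sign-preserving clamp; the exact clamping rule behind `safe_backward` is not part of this diff.

```python
import math


def F(h, theta):
    # Hypothetical toy residual: for theta > 0 it has exactly one root h* in (0, 1].
    return h ** 3 + theta * h - 1.0


def solve_h(theta, lo=0.0, hi=1.0, iters=200):
    # Bisection: F(0) = -1 < 0, F(1) = theta > 0, and dF/dh = 3h^2 + theta > 0.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(mid, theta) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def ift_grad(h, theta, min_denom=None):
    # Implicit function theorem at the root: dh*/dtheta = -(dF/dtheta) / (dF/dh).
    dF_dh = 3.0 * h ** 2 + theta
    dF_dtheta = h
    if min_denom is not None and abs(dF_dh) < min_denom:
        # Hypothetical sign-preserving clamp, in the spirit of safe_backward.
        dF_dh = math.copysign(min_denom, dF_dh)
    return -dF_dtheta / dF_dh


theta = 0.5
h_star = solve_h(theta)
grad_ift = ift_grad(h_star, theta)
eps = 1e-6
grad_fd = (solve_h(theta + eps) - solve_h(theta - eps)) / (2 * eps)
```

The backward pass needs only the two partial derivatives at the converged root, so its memory cost does not grow with the number of bisection steps; the clamp trades gradient accuracy for boundedness when `∂F/∂h` is near zero.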

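The `max_n_exact` / `sketch_size` pair documented in `DCBLayer` means the solver works on at most `sketch_size` points whenever `n` exceeds `max_n_exact`, and always on the full sample when `max_n_exact=None`. A minimal NumPy sketch of that gating follows, assuming uniform sampling without replacement; the diff does not show which sampling scheme the layer actually uses, and `maybe_sketch` is a hypothetical name.

```python
import numpy as np


def maybe_sketch(x, max_n_exact=1_000_000, sketch_size=500_000, seed=0):
    # Hypothetical helper: pass-through unless n exceeds max_n_exact (None disables sketching).
    n = x.shape[0]
    if max_n_exact is None or n <= max_n_exact:
        return x
    # Assumption: uniform sampling without replacement.
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=min(sketch_size, n), replace=False)
    return x[idx]


x = np.arange(1_000, dtype=float)
small = maybe_sketch(x, max_n_exact=500, sketch_size=200)
```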
{diffcb-0.1.1 → diffcb-0.1.4}/dcb/solver.py RENAMED

@@ -37,7 +37,7 @@ from dcb.kde import (
     soft_mode_count_cross_from_derivs,
     kde_derivatives_chunked,
 )
-from dcb.fft_kde import fft_mode_count, adaptive_fft_G
+from dcb.fft_kde import fft_mode_count, adaptive_fft_G, precompute_fft, mode_count_from_C
 _AUTO_FFT_THRESHOLD = 50_000  # n above which FFT bisection activates (use_fft_effective)
@@ -74,6 +74,8 @@ def find_h_crit_hard(
     eps: float = 0.1,
     tau: float = 0.2,
     use_fft: bool = False,
+    G_min: int = 16384,
+    fft_dtype: torch.dtype = torch.float32,
 ) -> tuple[float, float]:
     """Find h_crit via hard-mode-count bisection (monotone, no false roots).
@@ -151,43 +153,64 @@ def find_h_crit_hard(
             lo_domain = X.min().item() - 3 * sigma
             hi_domain = X.max().item() + 3 * sigma
             data_range = hi_domain - lo_domain
-        G_fft = adaptive_fft_G(data_range, h_hi)
+        G_fft = adaptive_fft_G(data_range, h_hi, G_min=G_min)
         _domain = (lo_domain, hi_domain)
+        pad_factor = 2  # Worker 5: pad_factor=2 (was 4) — safe for h ≤ 3σ, halves irfft size
+        N = pad_factor * G_fft
         with torch.no_grad():
+            # Worker 1: precomputed C — hoist histogram + rfft out of bisection.
+            # Worker 3: float32 FFT by default — 2× faster; _refine_hcrit uses float64 independently.
+            C, omega, _domain = precompute_fft(
+                X, G=G_fft, domain=_domain, pad_factor=pad_factor, fft_dtype=fft_dtype,
+            )
             # Verify bracket using FFT mode count on full X
-            count_lo = fft_mode_count(X, h_lo, G=G_fft, domain=_domain)
+            count_lo = mode_count_from_C(C, omega, h_lo, G_fft, N)
             if count_lo <= target_modes:
                 h_lo_try = h_lo
                 for _ in range(30):
                     h_lo_try *= 0.5
                     if h_lo_try < 1e-10:
                         break
-                    if fft_mode_count(X, h_lo_try, G=G_fft, domain=_domain) > target_modes:
+                    if mode_count_from_C(C, omega, h_lo_try, G_fft, N) > target_modes:
                         h_lo = h_lo_try
                         break
-            count_hi = fft_mode_count(X, h_hi, G=G_fft, domain=_domain)
+            count_hi = mode_count_from_C(C, omega, h_hi, G_fft, N)
             if count_hi > target_modes:
                 for _ in range(30):
                     h_hi *= 2.0
-                    if fft_mode_count(X, h_hi, G=G_fft, domain=_domain) <= target_modes:
+                    if mode_count_from_C(C, omega, h_hi, G_fft, N) <= target_modes:
                         break
-            # Standard bisection: 50 iterations → bracket width / 2^50
+            # Adaptive bisection: stop when bracket is localised (relative width < 1e-7)
+            # _refine_hcrit provides sub-bin precision afterwards — no need to over-bisect.
             lo, hi = h_lo, h_hi
             for _ in range(50):
                 mid = (lo + hi) / 2.0
-                count = fft_mode_count(X, mid, G=G_fft, domain=_domain)
+                count = mode_count_from_C(C, omega, mid, G_fft, N)
                 if count <= target_modes:
                     hi = mid
                 else:
                     lo = mid
                 if (hi - lo) < tol:
                     break
+                # Worker 4: adaptive termination — stop when relative bracket width
+                # is small enough that further bisection cannot meaningfully shift
+                # _refine_hcrit's quadratic fit. Empirically 1e-7 preserves h_crit
+                # to within 1e-6 of the 50-step tol=1e-6 baseline while saving ~10
+                # bisection steps in typical cases.
+                if hi > 0 and (hi - lo) / hi < 1e-7:
+                    break
             h_crit = float(hi)  # smallest h with count <= target_modes
+            # Sub-bin refinement: quadratic interpolation on the disappearing f′ lobe
+            # to locate h_crit below the bin-width precision limit.
+            from dcb.fft_kde import _refine_hcrit
+            h_crit = _refine_hcrit(X, lo, hi, G_fft, _domain, target_modes)
     else:
         with torch.no_grad():
             # Verify bracket: need count > target at h_lo, count <= target at h_hi.
@@ -216,6 +239,9 @@ def find_h_crit_hard(
                         break
             # Standard bisection: 50 iterations → bracket width / 2^50
+            # NOTE: non-FFT path has no _refine_hcrit sub-bin refinement, so we keep
+            # tight bisection here for gradient stability (IFT test requires h_crit
+            # accurate well below FD perturbation delta=1e-3).
             lo, hi = h_lo, h_hi
             for _ in range(50):
                 mid = (lo + hi) / 2.0
@@ -290,6 +316,8 @@ def find_h_crit(
     g_brentq: int = 128,
     use_hard_bisection: bool = True,
     use_fft: bool = True,
+    G_min: int = 16384,
+    fft_dtype: torch.dtype = torch.float32,
 ) -> tuple[float, float]:
     """Find h_crit and return (h_crit, condition_number).
@@ -343,7 +371,7 @@ def find_h_crit(
         return find_h_crit_hard(
             X, grid, target_modes, chunk_size, brentq_n_max,
             h_lo, h_hi, formula=formula, eps=eps, tau=tau,
-            use_fft=use_fft,
+            use_fft=use_fft, G_min=G_min, fft_dtype=fft_dtype,
         )
     from scipy.optimize import brentq
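
In the 0.1.4 FFT path above, the data are binned and rFFT-transformed once (`precompute_fft`). Every bracket check and bisection step then reuses that spectrum through `mode_count_from_C`, and bisection can now also stop on relative bracket width (1e-7). The NumPy sketch below illustrates that pattern under stated assumptions:

- The helper names (`precompute_hist_fft`, `mode_count_from_spectrum`, `bisect_h_crit`) are hypothetical stand-ins, not the package's signatures.
- It runs in float64 and omits the `_refine_hcrit` sub-bin step.
- It adds a noise threshold on f′ (`noise_rtol`, not present in the diff) so that FFT round-off in empty tails is not counted as sign changes.

```python
import math

import numpy as np


def precompute_hist_fft(x, G, domain, pad_factor=2):
    """Bin x once; return (rFFT of zero-padded histogram, angular frequencies, N).

    Hypothetical helper illustrating the idea behind dcb.fft_kde.precompute_fft.
    """
    lo, hi = domain
    counts, _ = np.histogram(x, bins=G, range=(lo, hi))
    N = pad_factor * G                      # zero padding limits circular wrap-around
    padded = np.zeros(N)
    padded[:G] = counts
    C = np.fft.rfft(padded)                 # computed once, reused for every h
    bin_width = (hi - lo) / G
    omega = 2 * math.pi * np.arange(N // 2 + 1) / (N * bin_width)
    return C, omega, N


def mode_count_from_spectrum(C, omega, h, G, N, noise_rtol=1e-10):
    """Count downward zero-crossings of the KDE derivative f' at bandwidth h."""
    kernel = 1j * omega * np.exp(-0.5 * (omega * h) ** 2)  # Gaussian-derivative multiplier
    f_prime = np.fft.irfft(C * kernel, n=N)[:G]            # trim the zero-padded tail
    # Assumption of this sketch: ignore |f'| below a tiny fraction of its peak.
    keep = np.abs(f_prime) > noise_rtol * np.abs(f_prime).max()
    s = f_prime[keep]
    return int(np.sum((s[:-1] > 0) & (s[1:] < 0)))


def bisect_h_crit(C, omega, G, N, h_lo, h_hi, target_modes=1,
                  tol=1e-6, rel_width=1e-7, max_iter=50):
    """Hard-count bisection with absolute and relative stopping rules."""
    lo, hi = h_lo, h_hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mode_count_from_spectrum(C, omega, mid, G, N) <= target_modes:
            hi = mid
        else:
            lo = mid
        if (hi - lo) < tol or (hi > 0 and (hi - lo) / hi < rel_width):
            break
    return hi  # smallest tested h with count <= target_modes


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3.0, 1.0, 5000), rng.normal(3.0, 1.0, 5000)])
domain = (x.min() - 4.0, x.max() + 4.0)
G = 4096
C, omega, N = precompute_hist_fft(x, G, domain)
print(mode_count_from_spectrum(C, omega, 1.0, G, N))   # bimodal at small h
print(mode_count_from_spectrum(C, omega, 5.0, G, N))   # unimodal at large h
print(bisect_h_crit(C, omega, G, N, h_lo=1.0, h_hi=6.0))
```

For this equal-weight mixture of N(±3, 1), the smoothed population density becomes unimodal once 1 + h² ≥ 9, i.e. at h = √8 ≈ 2.83, so the bisected value should land close to that. Caching `C` turns each bisection step into one complex multiply and one length-N inverse FFT, instead of re-binning all n points.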

{diffcb-0.1.1 → diffcb-0.1.4}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "diffcb"
-version = "0.1.1"
+version = "0.1.4"
 description = "Differentiable Critical Bandwidth: Silverman's modality test as a differentiable PyTorch layer with IFT backward pass."
 readme = "README.md"
 license = { file = "LICENSE" }

diffcb-0.1.1/README.md DELETED

@@ -1,92 +0,0 @@
-# DCB — Differentiable Critical Bandwidth
-[![PyPI](https://img.shields.io/pypi/v/diffcb.svg)](https://pypi.org/project/diffcb/)
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
-[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/)
-A PyTorch package that makes **Silverman's critical bandwidth test (1981)** fully differentiable, enabling end-to-end gradient-based optimization over the modal structure of continuous distributions.
-## Overview
-The critical bandwidth `h_crit` is the minimum KDE bandwidth at which a distribution appears to have at most `m` modes — a classical nonparametric statistic for modality testing. DCB replaces every non-differentiable operation in its computation with a smooth surrogate, then uses the **Implicit Function Theorem** to compute exact gradients through the root-finding step at O(1) memory cost.
-```python
-import torch
-from dcb import DCBLayer
-X = torch.randn(256, requires_grad=True)   # 1D samples
-layer = DCBLayer(target_modes=1)
-h_crit = layer(X)                          # differentiable scalar
-h_crit.backward()                          # exact IFT gradients
-```
-## Installation
-```bash
-pip install diffcb
-```
-Or from source:
-```bash
-git clone https://github.com/ryZhangHason/differentiable-critical-bandwidth
-cd differentiable-critical-bandwidth
-pip install -e ".[dev]"
-```
-## Paper
-> Ruiyu Zhang. "Differentiable Critical Bandwidth: Making Silverman's Modality Test End-to-End Trainable." *Journal of Machine Learning Research*, 2026 (in preparation).
-## Confirmed Experimental Results
-All results produced on Kaggle GPU (T4 / P100) — see `experiments/` and `outputs/`.
-| Experiment | Result | Criterion |
-|---|---|---|
-| **Validation (m≥2)** | R²=0.91, MAE=0.07, Spearman ρ=0.89 | R²≥0.85, MAE≤0.10 ✓ |
-| **Speedup vs scipy (n=8192)** | **10.5×** on T4 | ≥3× ✓ |
-| **GAN mode preservation** | h_crit=1.232 >> 0.3 | h_crit>0.3 ✓ |
-| **Anomaly AUC (KDDCup99)** | DCB=**0.9982** vs IF=0.9867 | DCB≥IF ✓ |
-## Repository Structure
-```
-dcb/            Core PyTorch package (layer.py, solver.py, kde.py, utils.py)
-experiments/    Reproduction scripts for all paper figures and tables
-  phase1_validation.py   Figure 1: DCB vs reference h_crit scatter
-  phase1_speedup.py      Figure 2: GPU speedup benchmark
-  phase1_ablation.py     Figures S1–S2: ε/τ sensitivity heatmaps
-  phase2_gan.py          Figure 3: GAN mode-collapse prevention
-  phase3_anomaly.py      Table 2 + Figure 5: anomaly detection benchmark
-tests/          Unit tests (pytest, 35/35 passing)
-outputs/        All generated figures and tables (PDFs, PNGs, CSVs)
-notebooks/      Quickstart and demo notebooks
-```
-## Reproducing Paper Results
-```bash
-# Phase 1: validation, speedup, ablation
-python experiments/phase1_validation.py
-python experiments/phase1_speedup.py
-python experiments/phase1_ablation.py
-# Phase 2: GAN mode collapse experiment
-python experiments/phase2_gan.py
-# Phase 3: anomaly detection benchmark
-python experiments/phase3_anomaly.py
-```
-For GPU runs, use the provided Kaggle kernels:
-- Phase 1–2: `hsingle/dcb-full-experiments`
-- Phase 3: `hsingle/dcb-phase-3-anomaly-detection`
-## Kaggle GPU Notes
-Kaggle may assign a P100 (sm_60) instead of T4. The Phase 3 kernel handles this automatically by installing `torch==2.2.2+cu118` (the earliest PyTorch release with both Python 3.12 and sm_60 support) when P100 is detected.
-## License
-MIT — see [LICENSE](LICENSE).
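
The deleted README defines h_crit as the minimum KDE bandwidth at which the estimate has at most `m` modes. Below is a brute-force NumPy sketch of that definition, before any smooth surrogate or IFT gradient is involved. It evaluates f′ directly on a grid (O(n × G) work, the cost the FFT path avoids), counts + to − sign changes, and bisects on h. Bisection is valid because the Gaussian-kernel mode count is non-increasing in h (Silverman, 1981).

Some caveats apply. The helper names and grid choices are hypothetical and are not part of the `dcb` API. The hard count is not differentiable.

```python
import math

import numpy as np


def kde_derivative(x, grid, h):
    """f'(g) = -(1 / (n h^2)) * sum_i u_i * phi(u_i), with u_i = (g - x_i) / h."""
    u = (grid[:, None] - x[None, :]) / h          # shape (len(grid), n): O(n x G) work
    phi = np.exp(-0.5 * u ** 2) / math.sqrt(2 * math.pi)
    return -(u * phi).sum(axis=1) / (len(x) * h ** 2)


def count_modes(x, grid, h):
    """Local maxima of the KDE = (+ to -) sign changes of f' (exact zeros skipped)."""
    d = kde_derivative(x, grid, h)
    s = d[d != 0]
    return int(np.sum((s[:-1] > 0) & (s[1:] < 0)))


def critical_bandwidth(x, m=1, h_lo=0.05, h_hi=10.0, n_grid=512, iters=40):
    """Smallest h with at most m modes; assumes count(h_lo) > m and count(h_hi) <= m."""
    pad = 3 * x.std()
    grid = np.linspace(x.min() - pad, x.max() + pad, n_grid)
    lo, hi = h_lo, h_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if count_modes(x, grid, mid) <= m:
            hi = mid        # mid already has <= m modes, so h_crit is at or below mid
        else:
            lo = mid
    return hi


rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(2.0, 1.0, 200)])
h_crit = critical_bandwidth(x, m=1)
print(h_crit)
```

For the population mixture of N(±2, 1), unimodality sets in once 1 + h² ≥ 4, i.e. at h = √3 ≈ 1.73. The sample value should fall near that, with sampling noise.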

diffcb-0.1.1/dcb/fft_kde.py DELETED

@@ -1,144 +0,0 @@
-"""
-dcb.fft_kde — FFT-based KDE Mode Counter
-Implements mode counting via FFT convolution of the histogram with a
-Gaussian derivative kernel. Complexity is O(n + G log G), avoiding the
-O(n × G) cost of the direct KDE approach and — crucially — requiring NO
-subsampling. This eliminates the (brentq_n_max / n)^{-1/5} upward bias
-that affects the standard bisection path when n > brentq_n_max.
-Round 18b: forward kernel only. The IFT backward is unchanged (still uses
-the analytical chunked KDE derivatives on all n points).
-"""
-from __future__ import annotations
-import math
-import torch
-from torch import Tensor
-def fft_mode_count(
-    X: Tensor,
-    h: float,
-    G: int = 4096,
-    pad_factor: int = 4,
-    domain: tuple[float, float] | None = None,
-) -> int:
-    """Count KDE modes via FFT convolution — O(n + G log G), no subsampling.
-    Bins X into G histogram bins, zero-pads to pad_factor*G, convolves with
-    the Gaussian derivative kernel in the frequency domain (applying iω·exp(−½(ωh)²)),
-    back-transforms, and counts positive-to-negative sign changes of the
-    resulting f' estimate.
-    Parameters
-    ----------
-    X : Tensor, shape (n,)
-        1D data tensor (may be on CPU or CUDA).
-    h : float
-        Bandwidth for the Gaussian kernel.
-    G : int
-        Number of histogram bins. Must satisfy h > 8 * (data_range / G) for
-        reliable derivative estimation. Use `adaptive_fft_G` to choose G
-        automatically before bisection.
-    pad_factor : int
-        Zero-padding multiplier (default 4). Mandatory ≥ 2 for circular-wrap
-        correctness; 4 is recommended at the largest h encountered.
-    domain : (lo, hi) or None
-        If provided, use this as the histogram domain instead of computing
-        X.min() - 3σ … X.max() + 3σ. Allows the caller to align the domain
-        with the bisection bracket (e.g., X.min() - 2*h_hi … X.max() + 2*h_hi)
-        so every fft_mode_count call in a bisection loop uses an identical grid.
-    Returns
-    -------
-    int
-        Number of KDE modes (downward zero-crossings of f').
-    """
-    with torch.no_grad():
-        if domain is not None:
-            lo, hi = domain
-        else:
-            # Domain: extend 3σ beyond data range to avoid boundary effects
-            sigma = X.std().item()
-            if sigma == 0.0:
-                sigma = 1.0  # degenerate case: all points identical
-            lo = X.min().item() - 3 * sigma
-            hi = X.max().item() + 3 * sigma
-        data_range = hi - lo
-        if data_range == 0.0:
-            return 1  # single-point distribution has 1 mode
-        # Histogram (O(n)) — MPS-safe via bucketize+bincount on CPU.
-        # torch.histc on MPS allocates an n × bins float32 intermediate (PyTorch
-        # MPS bug); at n=5M, bins=512 this is ~9.5 GiB → OOM.  Moving to CPU for
-        # the binning step avoids the intermediate and is numerically identical
-        # for data within [lo, hi] (guaranteed by the 3σ domain extension above).
-        X_cpu = X.float().cpu()
-        edges = torch.linspace(lo, hi, G + 1)                       # (G+1,) CPU
-        bin_idx = torch.bucketize(X_cpu, edges, right=True).clamp(1, G) - 1  # 0-indexed
-        counts = torch.bincount(bin_idx, minlength=G).float().to(X.device)   # back to device
-        # Zero-pad to pad_factor*G (4× mandatory for circular wrap correctness at h_hi)
-        N = pad_factor * G
-        counts_padded = torch.zeros(N, dtype=torch.float32, device=X.device)
-        counts_padded[:G] = counts
-        # FFT of histogram
-        C = torch.fft.rfft(counts_padded)
-        # Derivative kernel in frequency domain: iω * exp(-0.5*(ω*h)²)
-        # ω_k = 2π*k / (N * bin_width), bin_width = data_range / G
-        bin_width = data_range / G
-        k = torch.arange(N // 2 + 1, device=X.device, dtype=torch.float32)
-        omega = 2 * math.pi * k / (N * bin_width)
-        K_deriv = 1j * omega * torch.exp(-0.5 * (omega * h) ** 2)
-        # Convolve and back-transform
-        f_prime_padded = torch.fft.irfft(C * K_deriv, n=N)
-        # Trim to original G grid (discard zero-padded tail)
-        f_prime = f_prime_padded[:G]
-        # Count (+→-) sign changes = number of modes
-        # A mode is a local max of f, i.e., f' crosses zero from + to -
-        # Remove zeros (flat segments) — carry forward last nonzero sign
-        nonzero_mask = f_prime != 0
-        if not nonzero_mask.any():
-            return 0
-        s = f_prime[nonzero_mask]
-        transitions = int(((s[:-1] > 0) & (s[1:] < 0)).sum().item())
-        return transitions
-def adaptive_fft_G(data_range: float, h_hi: float, G_min: int = 4096) -> int:
-    """Choose FFT grid size G so that the derivative kernel is well-resolved.
-    Requires h > 8 * bin_width = 8 * data_range / G, equivalently
-    G > 8 * data_range / h_hi. We use a factor of 16 for safety margin,
-    then round up to the next power of 2 for efficient FFT.
-    Parameters
-    ----------
-    data_range : float
-        hi - lo of the data domain (typically X.max() - X.min() + 6σ).
-    h_hi : float
-        Upper bracket of the bisection (smallest h needing resolution).
-    G_min : int
-        Minimum returned G (default 4096).
-    Returns
-    -------
-    int
-        Grid size G, a power of 2, at least G_min.
-    """
-    needed = 16 * math.ceil(data_range / h_hi)
-    # Round up to next power of 2
-    p = 1
-    while p < needed:
-        p <<= 1
-    return max(G_min, p)
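
In 0.1.1, `find_h_crit_hard` called `adaptive_fft_G(data_range, h_hi)` with the default `G_min=4096`. The 0.1.4 solver instead threads `G_min` through from `find_h_crit` (default 16384). The diff does not state why the floor changed.

The pure-Python re-implementation below shows one visible effect. On a 20-unit domain with `h_hi = 0.5`, the data-driven term asks for only 1024 bins, so the floor alone decides G and therefore the smallest bandwidth that satisfies the docstring's "h > 8 bin widths" rule. The lowercase names `adaptive_fft_g` and `resolves` are hypothetical, used here for illustration only.

```python
import math


def adaptive_fft_g(data_range: float, h_hi: float, g_min: int = 4096) -> int:
    """Mirror of the deleted helper: 16 * ceil(range / h_hi), rounded up to a power of 2."""
    needed = 16 * math.ceil(data_range / h_hi)
    p = 1
    while p < needed:
        p <<= 1
    return max(g_min, p)


def resolves(h: float, data_range: float, G: int) -> bool:
    """Docstring rule of thumb from fft_mode_count: h must exceed 8 bin widths."""
    return h > 8 * data_range / G


data_range = 20.0
for g_min in (4096, 16384):   # 0.1.1 default floor vs. G_min=16384 passed by 0.1.4
    G = adaptive_fft_g(data_range, h_hi=0.5, g_min=g_min)
    print(g_min, G, 8 * data_range / G, resolves(0.02, data_range, G))
# 4096 4096 0.0390625 False
# 16384 16384 0.009765625 True
```

As far as the visible code shows, G is sized from `h_hi` only. Smaller bandwidths visited during bisection are not checked against the 8-bin rule and rely on the `G_min` floor for resolution.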