dimensionality 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- dimensionality-0.1.0/CLAUDE.md +142 -0
- dimensionality-0.1.0/PKG-INFO +21 -0
- dimensionality-0.1.0/README.md +237 -0
- dimensionality-0.1.0/examples/demo.ipynb +568 -0
- dimensionality-0.1.0/examples/synthetic.py +147 -0
- dimensionality-0.1.0/legacy/biology/.ipynb_checkpoints/on_Stringer-checkpoint.ipynb +565 -0
- dimensionality-0.1.0/legacy/biology/.ipynb_checkpoints/on_TVSD-checkpoint.ipynb +1112 -0
- dimensionality-0.1.0/legacy/biology/.ipynb_checkpoints/tvsd_all_cat_dim-checkpoint.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/.ipynb_checkpoints/Compare_all_DiCarlo-checkpoint.ipynb +1380 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/.ipynb_checkpoints/Get_DiCarlo_Data_and_Save-checkpoint.ipynb +932 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/.ipynb_checkpoints/on_Majaj-checkpoint.ipynb +2049 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/Compare_all_DiCarlo.ipynb +1380 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/Get_DiCarlo_Data_and_Save.ipynb +975 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/IT_Chabo.npz +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/IT_Tito.npz +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/V4_Chabo.npz +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/V4_Tito.npz +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/dicarlo_data.npz +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/dicarlo_data2.npz +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_Ps_Chabo_IT.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_Ps_Chabo_V4.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_Ps_Tito_IT.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_Ps_Tito_V4.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_Qs_Chabo_IT.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_Qs_Chabo_V4.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_Qs_Tito_IT.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_Qs_Tito_V4.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_mse_Ps_Chabo_IT.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_mse_Ps_Chabo_V4.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_mse_Ps_Tito_IT.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_mse_Ps_Tito_V4.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_mse_Qs_Chabo_IT.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_mse_Qs_Chabo_V4.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_mse_Qs_Tito_IT.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/majaj_dim_mse_Qs_Tito_V4.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/BrainScore/on_Majaj.ipynb +2069 -0
- dimensionality-0.1.0/legacy/biology/on_Stringer.ipynb +572 -0
- dimensionality-0.1.0/legacy/biology/on_TVSD.ipynb +1256 -0
- dimensionality-0.1.0/legacy/biology/stinger_dim_Ps.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/stringer_dim_Ps.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/stringer_dim_Qs.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/stringer_dim_mse_Ps.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/stringer_dim_mse_Qs.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_all_cat_dim.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_all_cat_dim_diff_mid0.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_all_cat_dim_diff_mid1.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_all_cat_dim_diff_mid2.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_all_cat_dim_roi0.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_all_cat_dim_roi0_mid0.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_all_cat_dim_roi0_mid1.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_all_cat_dim_roi1.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_all_cat_dim_roi1_mid0.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_all_cat_dim_roi1_mid1.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_all_cat_dim_roi2.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_all_cat_dim_roi2_mid0.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_all_cat_dim_roi2_mid1.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_cat_dim.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_dim.pdf +0 -0
- dimensionality-0.1.0/legacy/biology/tvsd_dim_mse.pdf +0 -0
- dimensionality-0.1.0/legacy/intrinsic/.ipynb_checkpoints/Final_dim-checkpoint.ipynb +116 -0
- dimensionality-0.1.0/legacy/intrinsic/.ipynb_checkpoints/nonlin_manifold-checkpoint.ipynb +830 -0
- dimensionality-0.1.0/legacy/intrinsic/Final_dim.ipynb +176 -0
- dimensionality-0.1.0/legacy/intrinsic/nonlin_manifold.ipynb +943 -0
- dimensionality-0.1.0/legacy/synthetic/.ipynb_checkpoints/Linear_example-checkpoint.ipynb +146 -0
- dimensionality-0.1.0/legacy/synthetic/.ipynb_checkpoints/linear_dim-checkpoint.pdf +0 -0
- dimensionality-0.1.0/legacy/synthetic/Linear_example.ipynb +303 -0
- dimensionality-0.1.0/legacy/synthetic/linear_dim.pdf +0 -0
- dimensionality-0.1.0/legacy/synthetic/linear_dim.png +0 -0
- dimensionality-0.1.0/legacy/synthetic/linear_dim_mse.pdf +0 -0
- dimensionality-0.1.0/pyproject.toml +51 -0
- dimensionality-0.1.0/src/dimensionality/__init__.py +56 -0
- dimensionality-0.1.0/src/dimensionality/_core.py +44 -0
- dimensionality-0.1.0/src/dimensionality/estimators.py +206 -0
- dimensionality-0.1.0/src/dimensionality/finite.py +224 -0
- dimensionality-0.1.0/src/dimensionality/plot.py +95 -0
- dimensionality-0.1.0/src/dimensionality/sweep.py +200 -0
- dimensionality-0.1.0/tests/__init__.py +0 -0
- dimensionality-0.1.0/tests/test_estimators.py +268 -0

dimensionality-0.1.0/CLAUDE.md
@@ -0,0 +1,142 @@

# Dimensionality Estimator

Python research repository implementing a bias-corrected participation ratio (PR) estimator for measuring global (and local) dimensionality of neural representation manifolds. Based on the ICLR 2026 paper:

> **Estimating Dimensionality of Neural Representations from Finite Samples**
> Chanwoo Chun\*, Abdulkadir Canatar\*, SueYeon Chung, Daniel Lee

---

## Background

The **participation ratio (PR)** γ is a soft count of nonzero eigenvalues of a covariance matrix K = (1/Q) ΦΦᵀ, where Φ ∈ ℝ^{P×Q} is the neural activation matrix (P stimuli × Q neurons):

```
γ = (Σᵢ λᵢ)² / Σᵢ λᵢ² = A / B
```

The naive estimator γ_naive is heavily biased when P or Q is small. The key insight: bias arises from **overlapping indices** in the sums for A and B. The fix is to average over **disjoint/unequal indices** only.
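
As a quick numeric illustration of the definition above (a standalone NumPy sketch, not a call into this package), the naive PR can be read off the eigenvalues of K directly:

```python
import numpy as np

# Illustration only: participation ratio from the eigenvalues of K = (1/Q) Phi Phi^T.
# This skips the algebraic centering and bias corrections described below.
P, Q = 200, 100
Phi = np.random.randn(P, Q)                # P stimuli x Q neurons
lam = np.linalg.eigvalsh(Phi @ Phi.T / Q)  # eigenvalues of K
gamma_naive = lam.sum() ** 2 / (lam ** 2).sum()
print(gamma_naive)
```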

### Estimators

| Estimator | Corrects | Use when |
|-----------|----------|----------|
| `γ_naive` | nothing | baseline comparison |
| `γ_row` | row (stimulus) sampling bias | full neuron access, sampled stimuli |
| `γ_col` | column (neuron) sampling bias | full stimulus access, sampled neurons |
| `γ_both` | both row and column bias | general case (recommended) |

### Extensions

- **Noise correction**: pass two trial matrices Φ⁽¹⁾, Φ⁽²⁾; redefine v^{αβ}_{ijkl} ← Φ⁽¹⁾_{iα} Φ⁽²⁾_{jα} Φ⁽¹⁾_{kβ} Φ⁽²⁾_{lβ}. This eliminates additive/multiplicative noise bias.
- **Importance sampling**: weight samples by r(x) = ρ_X(x)/ρ_X^obs(x) and c(w) = ρ_W(w)/ρ_W^obs(w) to correct for biased sampling distributions.
- **Sparse matrices**: skip summands that include any missing entry.
- **Finite underlying matrix**: when sampling P rows / Q columns from a finite R×C matrix (without replacement), use the corrected estimators, which require knowledge of R and C.
- **Local dimensionality**: weight samples by proximity (Mahalanobis distance with a local metric) and average over all center points. Noise-resistant, unlike TwoNN.

### Scaling law of γ_naive

Under uniform row/column norms:

```
E[1/γ_naive] ≈ 1/P + 1/Q + 1/γ
```

γ_naive is approximately a harmonic mean of P, Q, and γ — like parallel resistance.
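
A quick synthetic check of this law (an illustrative sketch with an assumed ground truth of D equal-variance latent dimensions, so the true γ is D; not code from this repository):

```python
import numpy as np

# With D equal-variance latent dimensions the true gamma is D, yet the naive
# estimate lands near 1 / (1/P + 1/Q + 1/D), i.e. the harmonic-mean prediction.
rng = np.random.default_rng(0)
D, P, Q = 50, 100, 100
Phi = rng.standard_normal((P, D)) @ rng.standard_normal((D, Q)) / np.sqrt(D)

lam = np.linalg.eigvalsh(Phi @ Phi.T / Q)
gamma_naive = lam.sum() ** 2 / (lam ** 2).sum()
predicted = 1.0 / (1.0 / P + 1.0 / Q + 1.0 / D)

print(gamma_naive, predicted)  # both roughly 25, about half of the true D = 50
```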

---

## Implementation Notes

- Core computation uses `opt_einsum` (no JAX). Disjoint-index sums are re-expressed as linear combinations of regular sums to enable vectorization (see the toy sketch after this list, and Sec. A.3 of the paper for the full expansions).
- **Centering is algebraic**: the three-term formula structure encodes centering (e.g. A = ⟨v_iijj⟩ − 2⟨v_iijl⟩ + ⟨v_ijlr⟩). Do **not** pre-subtract column means before calling the estimators — doing so introduces statistical dependencies that break the bias correction.
- Task dimensionality (default): pass Φ directly (rows = stimuli, columns = neurons).
- Neuron dimensionality: pass Φ.T instead.
- The ratio A/B introduces a small, unavoidable O((1/P + 1/Q)²) bias even when A and B are individually unbiased — this is negligible in practice.
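
For intuition, here is a toy version of the disjoint-index trick mentioned in the first bullet (a hand-rolled NumPy sketch, not the package's `_core.py` or its `_gett_all` helper): a sum restricted to α ≠ β is the unconstrained sum minus its α = β diagonal, so everything stays vectorized.

```python
import numpy as np

# Toy sketch (not the package implementation).
# Target: sum over i, l and over column pairs alpha != beta of
#   Phi[i, a] * Phi[i, a] * Phi[l, b] * Phi[l, b]
P, Q = 50, 30
Phi = np.random.randn(P, Q)

d = np.einsum('ia,ia->a', Phi, Phi)  # d[a] = sum_i Phi[i, a] ** 2
full = d.sum() ** 2                  # unconstrained sum over all (alpha, beta)
diag = (d ** 2).sum()                # the alpha == beta part
offdiag = full - diag                # sum restricted to alpha != beta
```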

---

## Package API

```python
from dimensionality import participation_ratio, participation_ratio_finite

# Single-trial, default output (γ_both, bias-corrected for both axes)
gamma = participation_ratio(Phi)  # float

# Two-trial noise correction
gamma = participation_ratio(Phi1, Phi2)  # float

# Return all four estimators
result = participation_ratio(Phi, return_all=True)
# result['naive'], result['row'], result['col'], result['both']

# Return numerator A and denominator B
result = participation_ratio(Phi, return_parts=True)
# result['both'], result['A'], result['B']

# Both options at once — adds A_naive/B_naive, A_row/B_row, etc.
result = participation_ratio(Phi, return_all=True, return_parts=True)

# Finite underlying matrix (R×C), submatrix Φ is P×Q
gamma = participation_ratio_finite(Phi, R=5000, C=2000)  # float
gamma = participation_ratio_finite(Phi1, R=5000, C=2000, Phi2=Phi2)  # two-trial
result = participation_ratio_finite(Phi, R=5000, C=2000, return_parts=True)
# result['gamma'], result['A'], result['B']
```

Minimum sizes: `participation_ratio` requires P ≥ 4, Q ≥ 2. `participation_ratio_finite` requires the same plus R ≥ P, C ≥ Q.

---

## Repository Structure

```
src/
  dimensionality/
    __init__.py        # exports participation_ratio, participation_ratio_finite
    _core.py           # _gett_all(pattern, A, B) — quartic einsum helper
    estimators.py      # participation_ratio (infinite underlying matrix)
    finite.py          # participation_ratio_finite (finite R×C underlying matrix)
tests/
  test_estimators.py   # API tests, bias ordering, noise correction, finite estimator
examples/
  synthetic.py         # Figure 1 reproduction (vary P or Q, all four estimators)
legacy/                # Old exploratory notebooks — ignore unless explicitly referenced
  biology/             # Brain data experiments (Stringer, MajajHong, TVSD)
  intrinsic/           # Local dimensionality experiments
  synthetic/           # Synthetic data experiments
```

The `legacy/` folder contains the original research notebooks used to produce figures in the paper. **Do not modify or base new code on legacy/ unless explicitly asked.**

To run examples:
```bash
python examples/synthetic.py          # saves examples/figure1.png
python examples/synthetic.py --show   # also opens interactive window
```

To run tests:
```bash
pytest tests/
```

---

## Key Symbols

| Symbol | Meaning |
|--------|---------|
| Φ ∈ ℝ^{P×Q} | Sample activation matrix (P stimuli, Q neurons) |
| Φ^{(∞)} | True infinite underlying matrix |
| K = (1/Q)ΦΦᵀ | Sample covariance matrix |
| γ | True participation ratio (dimensionality) |
| γ_naive | Naive estimator (biased) |
| γ_both | Bias-corrected estimator (both row and column) |
| A, B | Numerator and denominator of γ (centered) |
| v^{αβ}_{ijlr} = Φ_{iα}Φ_{jα}Φ_{lβ}Φ_{rβ} | Elementary quartic tensor |
| r_{ijlr} | Column-marginalized tensor (sum over α≠β) |
| t¹–t⁵ | Five unique terms in A and B |

## Environment

- Conda environment: `dimensionality`
- Activate before running any Python: `conda activate dimensionality`

dimensionality-0.1.0/PKG-INFO
@@ -0,0 +1,21 @@

Metadata-Version: 2.4
Name: dimensionality
Version: 0.1.0
Summary: Bias-corrected participation ratio estimator for measuring dimensionality of neural representations
Project-URL: Homepage, https://github.com/badooki/dimensionality
Project-URL: Paper, https://arxiv.org/abs/2509.26560
Author: Chanwoo Chun, Abdulkadir Canatar, SueYeon Chung, Daniel Lee
License: MIT
Keywords: dimensionality,neural manifold,neuroscience,participation ratio,representation geometry
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.9
Requires-Dist: numpy>=1.24
Requires-Dist: opt-einsum>=3.3
Provides-Extra: dev
Requires-Dist: jupyter; extra == 'dev'
Requires-Dist: matplotlib>=3.6; extra == 'dev'
Requires-Dist: pytest>=7; extra == 'dev'
Requires-Dist: scipy>=1.10; extra == 'dev'

dimensionality-0.1.0/README.md
@@ -0,0 +1,237 @@

# Sample-size invariant measure of dimensionality

Bias-corrected **participation ratio (PR)** estimators for measuring the dimensionality of neural representation manifolds, as introduced in:

> **Estimating Dimensionality of Neural Representations from Finite Samples**
> Chanwoo Chun\*, Abdulkadir Canatar\*, SueYeon Chung, Daniel Lee
> *ICLR 2026*

---

## 📐 Background

Given a neural activation matrix $\Phi \in \mathbb{R}^{P\times Q}$ (P stimuli × Q neurons), the participation ratio

$$ \gamma = \frac{\left(\sum_i \lambda_i \right)^2}{\sum_i \lambda_i^2} $$

is a soft count of the number of nonzero eigenvalues of the stimulus covariance $K=\frac{1}{Q}\Phi\Phi^\top$. The naive estimator is severely biased downward when P or Q is small — it behaves approximately as a harmonic mean of P, Q, and the true $\gamma$:

$$ \mathbb{E}\left[ \frac{1}{\gamma_{\text{naive}}} \right] \approx \frac{1}{P} + \frac{1}{Q} + \frac{1}{\gamma} $$
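
For a rough sense of scale (a back-of-the-envelope example, not a result from the paper): with a true $\gamma = 50$ and $P = Q = 100$,

$$ \mathbb{E}\left[ \frac{1}{\gamma_{\text{naive}}} \right] \approx \frac{1}{100} + \frac{1}{100} + \frac{1}{50} = 0.04 \quad\Longrightarrow\quad \gamma_{\text{naive}} \approx 25, $$

so the naive estimate recovers only about half of the true dimensionality.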

This package provides unbiased estimators that correct for finite P and/or Q by averaging over disjoint index sets.

| Estimator | Corrects | Use when |
|-----------|----------|----------|
| `γ_naive` | nothing | baseline reference |
| `γ_row` | row (stimulus) sampling bias | full neuron access, subsampled stimuli |
| `γ_col` | column (neuron) sampling bias | full stimulus access, subsampled neurons |
| `γ_both` | both row and column bias | subsampled neurons, subsampled stimuli |

An additional `participation_ratio_finite` estimator handles the case where Φ is a submatrix sampled without replacement from a large-but-finite R×C matrix (Appendix A.6 of the paper).

---

## 📦 Installation

```bash
pip install git+https://github.com/badooki/dimensionality.git
```

Or, for development:

```bash
git clone https://github.com/badooki/dimensionality.git
cd dimensionality
pip install -e ".[dev]"
```

**Dependencies:** `numpy >= 1.24`, `opt_einsum >= 3.3`. Python ≥ 3.9.

---

## 🚀 Quick start

```python
import numpy as np
from dimensionality import participation_ratio

# Phi: P stimuli × Q neurons (do NOT pre-center)
Phi = np.random.randn(200, 100)

# Default: γ_both (bias-corrected for both row and column subsampling)
gamma = participation_ratio(Phi)
print(gamma)
```

### All four estimators

```python
result = participation_ratio(Phi, return_all=True)
# result['naive'], result['row'], result['col'], result['both']
```

### Two-trial noise correction

When two independent repeat trials are available for the same stimuli and neurons, the cross-trial construction removes additive and multiplicative noise bias:

```python
gamma = participation_ratio(Phi1, Phi2)
```
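
For example, a fully synthetic two-trial setup could look like the following (the low-rank signal-plus-noise construction here is an illustrative assumption, not an experiment from the paper):

```python
import numpy as np
from dimensionality import participation_ratio

# Hypothetical two-trial data: identical underlying responses, independent noise per trial
rng = np.random.default_rng(0)
signal = rng.standard_normal((200, 30)) @ rng.standard_normal((30, 100))  # shared low-rank signal
Phi1 = signal + 0.5 * rng.standard_normal((200, 100))  # trial 1
Phi2 = signal + 0.5 * rng.standard_normal((200, 100))  # trial 2

gamma = participation_ratio(Phi1, Phi2)  # cross-trial, noise-corrected estimate
```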

### Return numerator and denominator separately

```python
result = participation_ratio(Phi, return_parts=True)
# result['both'], result['A'], result['B']

# Combined with return_all:
result = participation_ratio(Phi, return_all=True, return_parts=True)
# result['naive'], result['A_naive'], result['B_naive'], ...
```

### Neuron dimensionality

To estimate dimensionality along the neuron axis (centering across stimuli), transpose the matrix:

```python
gamma_neuron = participation_ratio(Phi.T)
```

---

## 🔢 Finite underlying matrix

When Φ is a P×Q submatrix sampled without replacement from a finite R×C population matrix, use `participation_ratio_finite`:

```python
from dimensionality import participation_ratio_finite

gamma = participation_ratio_finite(Phi, R=5000, C=2000)

# With noise correction:
gamma = participation_ratio_finite(Phi1, R=5000, C=2000, Phi2=Phi2)

# Also return the naive estimate:
result = participation_ratio_finite(Phi, R=5000, C=2000, return_naive=True)
# result['gamma'], result['naive']
```

---

## 📊 Subsampling sweep

To assess how the estimate converges with sample size, sweep over P or Q:

```python
from dimensionality import sweep_dimensionality, plot_sweep

# Sweep over number of stimuli; keep all neurons
result = sweep_dimensionality(Phi, axis='P', n_trials=20)

# result['values'] — array of P values used
# result['naive'], result['row'], result['col'], result['both'] — mean estimates
# result['both_sem'] — standard error of the mean

fig, ax = plot_sweep(result, true_d=50)
```

To sweep over number of neurons instead:

```python
result = sweep_dimensionality(Phi, axis='Q')
```

For the finite estimator:

```python
result = sweep_dimensionality(Phi, axis='P', estimator='finite', R=5000, C=2000)
# result['naive'], result['gamma']
```

---

## ⚠️ Important: do not pre-center

The bias corrections rely on an algebraic three-term centering structure built into the estimator formulas. Subtracting column means from Φ before passing it to the estimator introduces statistical dependencies between rows that break the bias correction. **Pass the raw activation matrix directly.**
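
In code, reusing `Phi` and `participation_ratio` from the quick start (a minimal do/don't contrast; the commented line shows what *not* to do):

```python
gamma = participation_ratio(Phi)                       # correct: raw activation matrix
# gamma = participation_ratio(Phi - Phi.mean(axis=0))  # wrong: pre-centering breaks the correction
```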

---

## 🔧 API reference

### `participation_ratio(Phi, Phi2=None, *, return_all=False, return_parts=False)`

Estimate the task dimensionality (PR of the centered covariance) of Φ.

- **Phi** — raw activation matrix, shape (P, Q); P ≥ 4, Q ≥ 2.
- **Phi2** — optional second trial for noise correction.
- **return_all** — if `True`, return a dict with all four estimator variants.
- **return_parts** — if `True`, include numerator A and denominator B.

Returns a scalar (`γ_both`) by default, or a dict when either flag is set.

---

### `participation_ratio_finite(Phi, R, C, Phi2=None, *, return_naive=False, return_parts=False)`

Estimate the PR of the full R×C matrix from the observed P×Q submatrix.

- **R, C** — number of rows/columns in the full underlying matrix; R ≥ P, C ≥ Q.
- **return_naive** — if `True`, also return the (uncorrected) naive estimate.
- **return_parts** — if `True`, include numerator A and denominator B.

Returns a scalar by default, or a dict when either flag is set.

---

### `sweep_dimensionality(Phi, axis='P', values=None, n_trials=20, Phi2=None, estimator='infinite', R=None, C=None, ...)`

Run a subsampling sweep. Returns a dict with mean estimates and SEMs at each value. See the docstring for the full parameter list.

---

### `plot_sweep(result, ax=None, true_d=None, title=None, figsize=(5, 4))`

Plot the output of `sweep_dimensionality`. Returns `(fig, ax)`.

---

## 🗂️ Repository structure

```
src/
  dimensionality/
    __init__.py        # public API
    _core.py           # quartic einsum helper
    estimators.py      # participation_ratio
    finite.py          # participation_ratio_finite
    sweep.py           # sweep_dimensionality
    plot.py            # plot_sweep
tests/
  test_estimators.py
examples/
  demo.ipynb           # interactive walkthrough on synthetic data
  synthetic.py         # Figure 1 reproduction
```

---

## 📄 Citation

If you use this package, please cite:

```bibtex
@inproceedings{chun2026estimating,
  title     = {Estimating Dimensionality of Neural Representations from Finite Samples},
  author    = {Chun, Chanwoo and Canatar, Abdulkadir and Chung, SueYeon and Lee, Daniel},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
}
```

---

## License

MIT