pmf-acls 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- pmf_acls-0.1.0/.gitignore +59 -0
- pmf_acls-0.1.0/CLAUDE.md +40 -0
- pmf_acls-0.1.0/LICENSE +21 -0
- pmf_acls-0.1.0/MANIFEST.in +14 -0
- pmf_acls-0.1.0/PKG-INFO +309 -0
- pmf_acls-0.1.0/README.md +23 -0
- pmf_acls-0.1.0/pmf_acls/README.md +281 -0
- pmf_acls-0.1.0/pmf_acls/__init__.py +89 -0
- pmf_acls-0.1.0/pmf_acls/bayes.py +1556 -0
- pmf_acls-0.1.0/pmf_acls/bayes_diagnostics.py +234 -0
- pmf_acls-0.1.0/pmf_acls/core.py +1967 -0
- pmf_acls-0.1.0/pmf_acls/data_structures.py +579 -0
- pmf_acls-0.1.0/pmf_acls/diagnostics.py +461 -0
- pmf_acls-0.1.0/pmf_acls/lda.py +654 -0
- pmf_acls-0.1.0/pmf_acls/matrix_builder.py +573 -0
- pmf_acls-0.1.0/pmf_acls/py.typed +0 -0
- pmf_acls-0.1.0/pmf_acls/pytest.ini +12 -0
- pmf_acls-0.1.0/pmf_acls/solvers.py +737 -0
- pmf_acls-0.1.0/pmf_acls/tests/__init__.py +1 -0
- pmf_acls-0.1.0/pmf_acls/tests/test_bayes.py +378 -0
- pmf_acls-0.1.0/pmf_acls/tests/test_bayes_diagnostics.py +267 -0
- pmf_acls-0.1.0/pmf_acls/tests/test_bayes_hierarchical.py +238 -0
- pmf_acls-0.1.0/pmf_acls/tests/test_bayes_lognormal.py +277 -0
- pmf_acls-0.1.0/pmf_acls/tests/test_bayes_robust.py +183 -0
- pmf_acls-0.1.0/pmf_acls/tests/test_core.py +886 -0
- pmf_acls-0.1.0/pmf_acls/tests/test_data_structures.py +242 -0
- pmf_acls-0.1.0/pmf_acls/tests/test_diagnostics.py +422 -0
- pmf_acls-0.1.0/pmf_acls/tests/test_lda.py +356 -0
- pmf_acls-0.1.0/pmf_acls/tests/test_matrix_builder.py +369 -0
- pmf_acls-0.1.0/pmf_acls/tests/test_solvers.py +432 -0
- pmf_acls-0.1.0/pmf_acls/tests/test_uncertainty.py +738 -0
- pmf_acls-0.1.0/pmf_acls/tests/test_utils.py +453 -0
- pmf_acls-0.1.0/pmf_acls/tests/test_validation.py +290 -0
- pmf_acls-0.1.0/pmf_acls/uncertainty.py +1140 -0
- pmf_acls-0.1.0/pmf_acls/utils.py +532 -0
- pmf_acls-0.1.0/pmf_acls/validation.py +456 -0
- pmf_acls-0.1.0/pmf_acls.egg-info/PKG-INFO +309 -0
- pmf_acls-0.1.0/pmf_acls.egg-info/SOURCES.txt +42 -0
- pmf_acls-0.1.0/pmf_acls.egg-info/dependency_links.txt +1 -0
- pmf_acls-0.1.0/pmf_acls.egg-info/requires.txt +10 -0
- pmf_acls-0.1.0/pmf_acls.egg-info/top_level.txt +1 -0
- pmf_acls-0.1.0/pyproject.toml +51 -0
- pmf_acls-0.1.0/requirements.txt +0 -0
- pmf_acls-0.1.0/setup.cfg +4 -0
pmf_acls-0.1.0/.gitignore
ADDED
@@ -0,0 +1,59 @@
+# Virtual environment
+.venv/
+venv/
+ENV/
+env/
+
+.claude
+
+testrun/
+
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+
+# Testing
+.pytest_cache/
+.coverage
+htmlcov/
+*.cover
+
+# IDEs
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# Distribution / packaging
+build/
+dist/
+*.egg-info/
+*.egg
+
+# Jupyter
+.ipynb_checkpoints/
+
+# OS
+.DS_Store
+Thumbs.db
+.Rproj.user
+
+.claude/
+examples/NMF_testing/
+NMF_testing.7z
+
+# Benchmark outputs
+nmf_compare/results/*.csv
+nmf_compare/results/*.png
+nmf_compare/results/compare_factors/
+
+# Generated outputs at root (legacy)
+*.csv
+*.png
+.~lock.*
+tmpclaude-*
+testrun_me2/
pmf_acls-0.1.0/CLAUDE.md
ADDED
@@ -0,0 +1,40 @@
+# PMF Monorepo
+
+Two projects in one repo:
+
+## pmf_acls/ — PMF Solver Package
+A Python implementation of Positive Matrix Factorization with five solver backends (ACLS, LS-NMF, Newton, Bayesian NMF, Bayesian LDA), featuring per-element uncertainty weighting.
+
+## nmf_compare/ — NMF/PMF Benchmarking Framework
+Benchmark harness comparing multiple NMF/PMF solvers (pmf_acls, ESAT, scikit-learn NMF) on synthetic data with configurable noise models and matrix sizes.
+
+## Layout
+
+```
+pmf_acls/     # Solver package (installable)
+nmf_compare/  # Benchmarking scripts and results
+scripts/      # Ad-hoc / one-off analysis scripts
+examples/     # Usage examples for pmf_acls
+docs/         # Literature reviews, specs, notes
+reference/    # PMF2/ME2 reference binaries
+```
+
+## Running Tests
+
+```bash
+pytest pmf_acls/tests/ -v
+```
+
+## Running Benchmarks
+
+```bash
+python nmf_compare/benchmark_simulation.py --noise-sweep --solvers PMF_ACLS --reps 3 --seeds 5
+python nmf_compare/benchmark_simulation.py --size-sweep --solvers PMF_ACLS ESAT_LSNMF --reps 3 --seeds 5
+```
+
+## Conventions
+- Package imports: `from pmf_acls import pmf, pmf_bayes, pmf_lda`
+- Algorithms via `pmf()`: 'acls' (default), 'ls-nmf', 'newton', 'bayes'
+- LDA is standalone: `pmf_lda(X, sigma, p)`
+- Benchmark outputs go to `nmf_compare/results/`
+- NumPy/SciPy only for the solver (no heavy deps)
pmf_acls-0.1.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Jerritt Collord
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
pmf_acls-0.1.0/MANIFEST.in
ADDED
@@ -0,0 +1,14 @@
+include LICENSE
+include README.md
+include pyproject.toml
+
+recursive-include pmf_acls *.py py.typed
+recursive-exclude pmf_acls CLAUDE.md
+
+global-exclude *.pyc __pycache__
+prune docs
+prune examples
+prune nmf_compare
+prune reference
+prune scripts
+prune .claude
pmf_acls-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,309 @@
+Metadata-Version: 2.4
+Name: pmf-acls
+Version: 0.1.0
+Summary: Experimental Positive Matrix Factorization (PMF) routines
+Author: Jerritt Collord
+License-Expression: MIT
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Science/Research
+Classifier: Topic :: Scientific/Engineering :: Mathematics
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: numpy>=1.20.0
+Requires-Dist: scipy>=1.7.0
+Provides-Extra: jax
+Requires-Dist: jax>=0.4.0; extra == "jax"
+Requires-Dist: jaxlib>=0.4.0; extra == "jax"
+Provides-Extra: dev
+Requires-Dist: pytest>=7.0.0; extra == "dev"
+Requires-Dist: pytest-cov>=3.0.0; extra == "dev"
+Dynamic: license-file
+
+# pmf_acls — Positive Matrix Factorization
+
+A Python implementation of Positive Matrix Factorization (PMF) for environmental data analysis, featuring five solver backends with per-element uncertainty weighting.
+
+Minimizes the uncertainty-weighted Q objective:
+
+```
+Q = Σᵢⱼ [(xᵢⱼ - Σₖ fᵢₖ gₖⱼ) / σᵢⱼ]²
+```
+
+where F is (m, p) — factor profiles (variables × factors) and G is (p, n) — factor contributions (factors × observations).
+
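The Q objective above can be sketched directly in NumPy. This is a standalone toy illustration (the matrices and shapes here are made up, not package code); it evaluates Q for a hypothetical factorization with m = 4 variables, n = 6 observations, and p = 2 factors:

```python
import numpy as np

# Hypothetical toy problem following the shape conventions above.
rng = np.random.default_rng(0)
F = rng.uniform(0.1, 1.0, (4, 2))          # factor profiles, shape (m, p)
G = rng.uniform(0.1, 1.0, (2, 6))          # factor contributions, shape (p, n)
X = F @ G + rng.normal(0.0, 0.01, (4, 6))  # noisy data matrix, shape (m, n)
sigma = np.full_like(X, 0.1)               # per-element uncertainties

# Q = sum over i,j of ((x_ij - sum_k f_ik g_kj) / sigma_ij)^2
Q = np.sum(((X - F @ G) / sigma) ** 2)
```

Smaller σᵢⱼ values weight the corresponding residuals more heavily, which is how per-element uncertainty enters the fit.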
+## Installation
+
+```bash
+pip install -e .
+```
+
+## Quick Start
+
+```python
+import numpy as np
+from pmf_acls import pmf
+
+# X = data matrix (m variables × n observations)
+# sigma = per-element uncertainties (same shape)
+result = pmf(X, sigma, p=3)  # ACLS algorithm (default)
+print(f"Converged: {result.converged}, Q: {result.Q:.4f}")
+
+F = result.F  # (m, p) factor profiles
+G = result.G  # (p, n) factor contributions
+```
+
+## Algorithms
+
+The unified `pmf()` entry point supports four algorithms via `algorithm=`. A fifth (LDA) is available as a standalone function.
+
+### ACLS (default) — `algorithm='acls'`
+
+Alternating Constrained Least Squares (Langville et al. 2014). Solves weighted k × k normal equations per column/row. Fast and robust.
+
+```python
+result = pmf(X, sigma, p=3, algorithm='acls')
+```
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `lambda_W` | 0.0 | Tikhonov regularization on F |
+| `lambda_H` | 0.0 | Tikhonov regularization on G |
+| `fpeak` | 0.0 | FPEAK rotational parameter |
+| `max_iter` | 1000 | Maximum iterations |
+| `conv_tol` | 0.005 | Relative change in Q for convergence |
+
+### LS-NMF — `algorithm='ls-nmf'`
+
+Weighted multiplicative updates (Wang et al. 2006). Matches ESAT's LS-NMF for direct comparison.
+
+```python
+result = pmf(X, sigma, p=3, algorithm='ls-nmf')
+```
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `max_iter` | 1000 | Maximum iterations |
+| `conv_tol` | 0.005 | Relative change in Q for convergence |
+
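For intuition, the weighted multiplicative update scheme that LS-NMF is built on can be sketched as follows. This is a minimal NumPy illustration of the Wang et al. (2006) update rule with weights 1/σ², on made-up toy data — not the package's implementation:

```python
import numpy as np

# Toy data: m=5 variables, n=8 observations, p=2 factors (all hypothetical).
rng = np.random.default_rng(1)
m, n, p = 5, 8, 2
X = rng.uniform(0.5, 2.0, (m, n))
sigma = np.full((m, n), 0.2)
V = 1.0 / sigma**2                  # per-element weights

F = rng.uniform(0.1, 1.0, (m, p))
G = rng.uniform(0.1, 1.0, (p, n))
eps = 1e-12                         # guard against division by zero

def q(F, G):
    # Same uncertainty-weighted Q objective as above.
    return np.sum(((X - F @ G) / sigma) ** 2)

q_init = q(F, G)
for _ in range(200):
    # Multiplicative updates: every factor is nonnegative, so
    # nonnegativity of F and G is preserved by construction.
    G *= (F.T @ (V * X)) / (F.T @ (V * (F @ G)) + eps)
    F *= ((V * X) @ G.T) / ((V * (F @ G)) @ G.T + eps)
```

The updates are monotone non-increasing in the weighted objective, which is why no step-size control is needed.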
+### Newton — `algorithm='newton'`
+
+Newton-based method (Lu & Wu 2005). Solves large (mp+np) × (mp+np) systems. Supports multi-phase iteration control and outlier reweighting.
+
+```python
+result = pmf(X, sigma, p=3, algorithm='newton')
+```
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `mode` | 'full' | 'basic', 'regularized', or 'full' |
+| `alpha, beta, gamma, delta` | 'auto' | Strength coefficients for penalties |
+| `solver` | 'numpy' | Backend: 'numpy', 'jax', or 'jax_sparse' |
+| `enable_reweighting` | True | Iterative outlier reweighting |
+| `outlier_threshold` | 4.0 | Outlier detection threshold (× sigma) |
+| `enable_step_control` | True | Step length control |
+| `max_step_halving` | 3 | Max step halving attempts |
+| `enable_initial_accel` | True | Initial NNLS acceleration |
+| `accel_iterations` | 10 | Iterations with acceleration |
+| `enable_multiphase` | True | PMF2-style multi-phase iteration |
+| `phase_config` | None | Custom phase config (list of dicts) |
+
+### Bayesian NMF — `algorithm='bayes'`
+
+Gibbs sampling with heteroscedastic Gaussian noise and exponential priors (Schmidt et al. 2009). Via `pmf()`, returns `PMFResult` with posterior means. For full Bayesian output (posterior samples, Geweke diagnostics, credible intervals), call `pmf_bayes()` directly.
+
+```python
+# Via unified interface (returns PMFResult)
+result = pmf(X, sigma, p=3, algorithm='bayes')
+
+# Direct call (returns BayesNMFResult with full posterior)
+from pmf_acls import pmf_bayes
+result = pmf_bayes(X, sigma, p=3, n_samples=1000, n_burnin=500)
+result.F_std      # posterior standard deviations
+result.Q_samples  # Q trace for convergence diagnostics
+```
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `n_samples` | 1000 | Posterior samples to collect |
+| `n_burnin` | 500 | Burn-in sweeps to discard |
+| `n_thin` | 1 | Thinning factor |
+| `lambda_G` | 1.0 | Exponential rate prior on G (contributions) |
+| `lambda_F` | 1.0 | Exponential rate prior on F (profiles) |
+| `learn_hyperparams` | True | Conjugate Gamma updates for lambda |
+| `hyperparam_shape` | 1.0 | Shape (a) of Gamma(a, b) hyperprior |
+| `hyperparam_rate` | 0.1 | Rate (b) of Gamma(a, b) hyperprior |
+| `warm_start` | True | Initialize from short ACLS run |
+| `store_samples` | False | Keep full F/G posterior sample arrays |
+| `sampler` | 'icdf' | Truncated-normal method: 'icdf' (fast) or 'scipy' |
+| `ard` | False | Automatic Relevance Determination for factor pruning |
+| `ard_threshold` | 0.01 | Relative contribution threshold for active factors |
+
+Convergence is assessed via the Geweke z-score on the Q trace (|z| < 2 → converged). Factor columns of F are normalized to unit L1 after each sweep.
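The idea behind the Geweke check can be illustrated with a simplified standalone sketch: compare the mean of an early segment of the trace against a late segment, standardized by their variances. This uses plain sample variances for brevity (the full diagnostic, and possibly the library's, uses spectral-density variance estimates — an assumption here), and the traces are synthetic:

```python
import numpy as np

def geweke_z(trace, first=0.1, last=0.5):
    """Simplified Geweke z-score: mean of the first 10% of the trace
    vs. mean of the last 50%. |z| < 2 suggests stationarity."""
    trace = np.asarray(trace, dtype=float)
    a = trace[: int(first * len(trace))]
    b = trace[-int(last * len(trace)):]
    return (a.mean() - b.mean()) / np.sqrt(
        a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b)
    )

rng = np.random.default_rng(0)
# A stationary trace (noise around a constant) should pass;
# a trending trace (still burning in) should fail.
z_ok = geweke_z(rng.normal(10.0, 1.0, 2000))
z_bad = geweke_z(np.linspace(0.0, 50.0, 2000) + rng.normal(0.0, 1.0, 2000))
```

A large |z| on the Q trace is a signal to increase `n_burnin` before trusting the posterior summaries.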
+
+#### Automatic Relevance Determination (ARD)
+
+ARD enables automatic factor number selection. Over-specify p (e.g., request 8 factors when you expect 3–5) and the model prunes unnecessary factors by driving their prior rates large:
+
+```python
+result = pmf_bayes(X, sigma, p=8, ard=True, hyperparam_shape=0.5)
+print(f"Active factors: {result.effective_p} / 8")
+print(f"Active mask: {result.active_factors}")
+# Per-factor rates: large values = pruned factors
+print(f"lambda_F per factor: {result.ard_lambda_F}")
+```
+
+- `hyperparam_shape < 1` (e.g., 0.5) promotes aggressive pruning
+- `hyperparam_shape = 1.0` (default) provides moderate pruning
+- `ard_threshold` controls the relative contribution cutoff for "active"
+
+### Bayesian LDA — `pmf_lda()` (standalone)
+
+Dirichlet-Gaussian source apportionment via Metropolis-within-Gibbs. F columns are simplex-constrained (proper compositional profiles summing to 1); G rows have exponential priors. Not available through `pmf()` — use `pmf_lda()` directly.
+
+```python
+from pmf_acls import pmf_lda
+result = pmf_lda(X, sigma, p=3, n_samples=1000, n_burnin=500)
+result.mh_acceptance_rate  # target: 0.1–0.5
+result.kl_samples          # KL divergence trace
+```
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `n_samples` | 1000 | Posterior samples to collect |
+| `n_burnin` | 500 | Burn-in sweeps to discard |
+| `n_thin` | 1 | Thinning factor |
+| `alpha` | 1.0 | Symmetric Dirichlet concentration (fixed) |
+| `lambda_G` | 1.0 | Exponential rate prior on G (contributions) |
+| `learn_hyperparams` | True | Conjugate Gamma updates for lambda_G |
+| `hyperparam_shape` | 1.0 | Shape (a) of Gamma(a, b) hyperprior |
+| `hyperparam_rate` | 0.1 | Rate (b) of Gamma(a, b) hyperprior |
+| `mh_step_size` | 1.0 | MH proposal scale (tune for acceptance 0.1–0.5) |
+| `store_samples` | False | Keep full posterior sample arrays |
+| `sampler` | 'icdf' | Truncated-normal method for G updates |
+
+Key difference from Bayesian NMF: F columns have a joint Dirichlet prior enforcing the simplex constraint by construction (not post-hoc normalization). `alpha < 1` promotes sparse profiles; `alpha > 1` smooths toward uniform. Convergence is assessed via Geweke z-score on the KL divergence trace.
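The effect of the Dirichlet concentration is easy to see with NumPy alone (a standalone sketch independent of `pmf_lda`; the column size m and the two alpha values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10  # hypothetical number of species in a profile column

# Every Dirichlet draw lies on the simplex: components are nonnegative
# and sum to exactly 1, i.e. a valid compositional profile.
sparse = rng.dirichlet(np.full(m, 0.1), size=1000)   # alpha < 1
smooth = rng.dirichlet(np.full(m, 10.0), size=1000)  # alpha > 1

# With alpha = 0.1, mass concentrates on a few species (sparse profiles);
# with alpha = 10, draws stay close to the uniform profile 1/m.
peak_sparse = sparse.max(axis=1).mean()  # typical dominant-species share
peak_smooth = smooth.max(axis=1).mean()
```

This is why a small `alpha` favors profiles dominated by a handful of species, while a large `alpha` favors near-uniform ones.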
+
+### Common Parameters (all algorithms)
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `X` | required | Data matrix, shape (m, n) |
+| `sigma` | required | Per-element uncertainties, shape (m, n) |
+| `p` | required | Number of factors |
+| `F_init, G_init` | None | Initial factor matrices |
+| `init_method` | 'random' | 'random', 'random_acol', or 'svd_centroid' |
+| `random_seed` | None | Seed for reproducibility |
+| `verbose` | False | Print progress |
+
+## Bayesian Diagnostics
+
+The `bayes_diagnostics` module provides MCMC convergence diagnostics and model comparison tools for the Bayesian solvers.
+
+### Effective Sample Size (ESS)
+
+Estimates the number of effectively independent samples in a correlated MCMC trace. Low ESS suggests increasing `n_samples` or `n_thin`.
+
+```python
+from pmf_acls import effective_sample_size
+
+result = pmf_bayes(X, sigma, p=3, n_samples=2000, n_burnin=500)
+ess = effective_sample_size(result.Q_samples)
+print(f"ESS: {ess:.0f} / {len(result.Q_samples)} samples")
+```
+
+### Gelman-Rubin Rhat (multi-chain convergence)
+
+Compares between-chain and within-chain variance across independent runs. Rhat < 1.05 indicates convergence.
+
+```python
+from pmf_acls import gelman_rubin
+
+results = [pmf_bayes(X, sigma, p=3, random_seed=s) for s in range(4)]
+rhat = gelman_rubin(*[r.Q_samples for r in results])
+print(f"Rhat: {rhat:.3f}")  # target: < 1.05
+```
+
+### WAIC (model comparison)
+
+Widely Applicable Information Criterion for comparing models with different p. Lower WAIC is better. Requires `store_samples=True`.
+
+```python
+from pmf_acls import compute_waic
+
+waic_scores = {}
+for p in [2, 3, 4, 5]:
+    result = pmf_bayes(X, sigma, p, store_samples=True, n_samples=500)
+    w = compute_waic(X, sigma, result.F_samples, result.G_samples)
+    waic_scores[p] = w["waic"]
+    print(f"p={p}: WAIC={w['waic']:.1f} p_waic={w['p_waic']:.1f} SE={w['se']:.1f}")
+
+best_p = min(waic_scores, key=waic_scores.get)
+print(f"Best number of factors: {best_p}")
+```
+
+## Data Preparation
+
+```python
+from pmf_acls import prepare_data
+
+X_clean, sigma = prepare_data(
+    X_raw,
+    detection_limit=detection_limits,
+    missing_method='median',
+    bdl_replacement='half_dl',
+    uncertainty_method='poisson',
+)
+result = pmf(X_clean, sigma, p=3)
+```
+
+## Factor Selection
+
+```python
+from pmf_acls import select_factors
+
+selection = select_factors(X, sigma, p_range=(2, 6), n_runs=10)
+print(f"Recommended: {selection.best_p} factors")
+```
+
+## Uncertainty Estimation
+
+```python
+from pmf_acls import bootstrap_uncertainty, displacement_test
+
+# Bootstrap
+uncertainty = bootstrap_uncertainty(X, sigma, p=3, n_bootstrap=100)
+
+# Displacement (DISP)
+disp = displacement_test(X, sigma, result)
+```
+
+## Tests
+
+```bash
+pytest pmf_acls/tests/ -v
+```
+
+## Dependencies
+
+- numpy >= 1.20.0
+- scipy >= 1.7.0
+
+### Optional
+
+- JAX >= 0.4.0 (GPU acceleration for Newton solver)
+
+## References
+
+- Lu, J. & Wu, L. (2005). Technical details and programming guide for a general two-way PMF algorithm.
+- Langville, A. N. et al. (2014). Algorithms, Initializations, and Convergence for the NMF.
+- Wang, G. et al. (2006). LS-NMF: A modified non-negative matrix factorization algorithm utilizing uncertainty estimates.
+- Schmidt, M. N. et al. (2009). Bayesian non-negative matrix factorization.
+- Cemgil, A. T. (2009). Bayesian inference for nonnegative matrix factorisation models.
+- Blei, D. M. et al. (2003). Latent Dirichlet Allocation.
pmf_acls-0.1.0/README.md
ADDED
@@ -0,0 +1,23 @@
+# PMF Monorepo
+
+Two projects in one repo:
+
+- **pmf_acls/** — A Python Positive Matrix Factorization solver with multiple algorithm backends and per-element uncertainty weighting. See [pmf_acls/README.md](pmf_acls/README.md).
+- **nmf_compare/** — Benchmark harness comparing NMF/PMF solvers on synthetic data. See [nmf_compare/README.md](nmf_compare/README.md).
+
+## Layout
+
+```
+pmf_acls/     # Solver package (installable)
+nmf_compare/  # Benchmarking scripts and results
+scripts/      # Ad-hoc / one-off analysis scripts
+examples/     # Usage examples for pmf_acls
+docs/         # Literature reviews, specs, notes
+reference/    # PMF2/ME2 reference binaries
+```
+
+## Running Tests
+
+```bash
+pytest pmf_acls/tests/ -v
+```