ssbc 1.0.0__tar.gz → 1.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (60)
  1. ssbc-1.1.0/HISTORY.md +48 -0
  2. ssbc-1.1.0/LICENSE +29 -0
  3. ssbc-1.1.0/PKG-INFO +337 -0
  4. ssbc-1.1.0/README.md +258 -0
  5. {ssbc-1.0.0 → ssbc-1.1.0}/docs/index.md +3 -3
  6. ssbc-1.1.0/docs/theory.md +265 -0
  7. ssbc-1.1.0/docs/usage.md +508 -0
  8. {ssbc-1.0.0 → ssbc-1.1.0}/pyproject.toml +15 -2
  9. {ssbc-1.0.0 → ssbc-1.1.0}/src/ssbc/__init__.py +47 -1
  10. ssbc-1.1.0/src/ssbc/bootstrap.py +411 -0
  11. {ssbc-1.0.0 → ssbc-1.1.0}/src/ssbc/cli.py +0 -3
  12. ssbc-1.1.0/src/ssbc/conformal.py +1032 -0
  13. ssbc-1.1.0/src/ssbc/cross_conformal.py +425 -0
  14. ssbc-1.1.0/src/ssbc/mcp_server.py +93 -0
  15. ssbc-1.1.0/src/ssbc/operational_bounds_simple.py +367 -0
  16. ssbc-1.1.0/src/ssbc/rigorous_report.py +601 -0
  17. {ssbc-1.0.0 → ssbc-1.1.0}/src/ssbc/statistics.py +70 -0
  18. ssbc-1.1.0/src/ssbc/utils.py +72 -0
  19. ssbc-1.1.0/src/ssbc/validation.py +409 -0
  20. ssbc-1.1.0/src/ssbc/visualization.py +482 -0
  21. ssbc-1.1.0/src/ssbc.egg-info/PKG-INFO +337 -0
  22. {ssbc-1.0.0 → ssbc-1.1.0}/src/ssbc.egg-info/SOURCES.txt +13 -2
  23. {ssbc-1.0.0 → ssbc-1.1.0}/src/ssbc.egg-info/requires.txt +9 -0
  24. ssbc-1.1.0/tests/test_bootstrap.py +297 -0
  25. ssbc-1.1.0/tests/test_conformal.py +792 -0
  26. ssbc-1.1.0/tests/test_cross_conformal.py +339 -0
  27. ssbc-1.1.0/tests/test_operational_bounds.py +310 -0
  28. ssbc-1.1.0/tests/test_rigorous_report.py +345 -0
  29. ssbc-1.1.0/tests/test_utils.py +185 -0
  30. ssbc-1.1.0/tests/test_validation.py +236 -0
  31. {ssbc-1.0.0 → ssbc-1.1.0}/tests/test_visualization.py +9 -138
  32. ssbc-1.0.0/HISTORY.md +0 -5
  33. ssbc-1.0.0/LICENSE +0 -21
  34. ssbc-1.0.0/PKG-INFO +0 -266
  35. ssbc-1.0.0/README.md +0 -223
  36. ssbc-1.0.0/docs/usage.md +0 -7
  37. ssbc-1.0.0/src/ssbc/conformal.py +0 -333
  38. ssbc-1.0.0/src/ssbc/ssbc.py +0 -1
  39. ssbc-1.0.0/src/ssbc/utils.py +0 -2
  40. ssbc-1.0.0/src/ssbc/visualization.py +0 -459
  41. ssbc-1.0.0/src/ssbc.egg-info/PKG-INFO +0 -266
  42. ssbc-1.0.0/tests/test_conformal.py +0 -326
  43. ssbc-1.0.0/tests/test_ssbc.py +0 -22
  44. {ssbc-1.0.0 → ssbc-1.1.0}/CONTRIBUTING.md +0 -0
  45. {ssbc-1.0.0 → ssbc-1.1.0}/MANIFEST.in +0 -0
  46. {ssbc-1.0.0 → ssbc-1.1.0}/docs/installation.md +0 -0
  47. {ssbc-1.0.0 → ssbc-1.1.0}/setup.cfg +0 -0
  48. {ssbc-1.0.0 → ssbc-1.1.0}/setup.py +0 -0
  49. {ssbc-1.0.0 → ssbc-1.1.0}/src/ssbc/__main__.py +0 -0
  50. {ssbc-1.0.0 → ssbc-1.1.0}/src/ssbc/core.py +0 -0
  51. {ssbc-1.0.0 → ssbc-1.1.0}/src/ssbc/hyperparameter.py +0 -0
  52. {ssbc-1.0.0 → ssbc-1.1.0}/src/ssbc/simulation.py +0 -0
  53. {ssbc-1.0.0 → ssbc-1.1.0}/src/ssbc.egg-info/dependency_links.txt +0 -0
  54. {ssbc-1.0.0 → ssbc-1.1.0}/src/ssbc.egg-info/entry_points.txt +0 -0
  55. {ssbc-1.0.0 → ssbc-1.1.0}/src/ssbc.egg-info/top_level.txt +0 -0
  56. {ssbc-1.0.0 → ssbc-1.1.0}/tests/__init__.py +0 -0
  57. {ssbc-1.0.0 → ssbc-1.1.0}/tests/test_core.py +0 -0
  58. {ssbc-1.0.0 → ssbc-1.1.0}/tests/test_hyperparameter.py +0 -0
  59. {ssbc-1.0.0 → ssbc-1.1.0}/tests/test_simulation.py +0 -0
  60. {ssbc-1.0.0 → ssbc-1.1.0}/tests/test_statistics.py +0 -0
ssbc-1.1.0/HISTORY.md ADDED
@@ -0,0 +1,48 @@
1
+ # History
2
+
3
+ ## 1.1.0 (2025-10-15)
4
+
5
+ ### Major Features
6
+
7
+ * Added **bootstrap calibration uncertainty analysis** for understanding recalibration variability
8
+ * Added **cross-conformal validation** (K-fold) for finite-sample diagnostics
9
+ * Added **validation module** for empirical PAC bounds verification
10
+ * Added **unified workflow** via `generate_rigorous_pac_report()` integrating all uncertainty analyses
11
+
12
+ ### API Changes (BREAKING)
13
+
14
+ * Removed deprecated `sla.py` module and old operational bounds API:
15
+ - Removed `compute_mondrian_operational_bounds()`
16
+ - Removed `compute_marginal_operational_bounds()`
17
+ - Removed `OperationalRateBounds` and `OperationalRateBoundsResult`
18
+ * Replaced with rigorous PAC-controlled operational bounds via `generate_rigorous_pac_report()`
19
+ * New bounds use LOO-CV + Clopper-Pearson for proper estimation uncertainty
20
+
21
+ ### Internal Improvements
22
+
23
+ * Removed dead code modules: `coverage_distribution.py` (1,400 lines), `blakers_confidence_interval.py` (388 lines)
24
+ * Added comprehensive test suite: 90+ new tests across 6 new test files
25
+ * Test coverage improved from ~45% to 77%
26
+ * All code now passes ruff linting and ty type checking
27
+ * Examples directory cleaned and fully integrated into linting workflow
28
+
29
+ ### Migration Guide
30
+
31
+ ```python
32
+ # OLD (v1.0.0 and earlier)
33
+ from ssbc import compute_mondrian_operational_bounds, compute_marginal_operational_bounds
34
+ bounds = compute_mondrian_operational_bounds(cal_result, labels, probs)
35
+
36
+ # NEW (v1.1.0)
37
+ from ssbc import generate_rigorous_pac_report
38
+ report = generate_rigorous_pac_report(labels, probs, alpha_target=0.10, delta=0.10)
39
+ pac_bounds = report['pac_bounds_class_0']
40
+ ```
41
+
42
+ ## 1.0.0 (2025-10-10)
43
+
44
+ * First stable release on PyPI.
45
+
46
+ ## 0.1.0 (2025-10-10)
47
+
48
+ * Initial development release.
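Annotation (not part of the package diff): the 1.1.0 changelog above says the new bounds use "LOO-CV + Clopper-Pearson". As a reading aid, here is a minimal sketch of the standard Clopper-Pearson interval for a binomial proportion, computed from Beta quantiles with SciPy. The function name `clopper_pearson` and its arguments are hypothetical and are not taken from ssbc; how ssbc combines the interval with LOO-CV is not shown in this diff.

```python
from scipy.stats import beta


def clopper_pearson(k: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Exact two-sided binomial interval for k successes in n trials.

    Hypothetical helper for illustration only; not the ssbc API.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    tail = (1.0 - confidence) / 2.0
    # The lower bound is 0 when no successes were seen, the upper bound 1 when all were.
    lower = 0.0 if k == 0 else float(beta.ppf(tail, k, n - k + 1))
    upper = 1.0 if k == n else float(beta.ppf(1.0 - tail, k + 1, n - k))
    return lower, upper


# Example: 45 singleton predictions observed among 50 calibration points.
lo, hi = clopper_pearson(k=45, n=50, confidence=0.90)
print(f"singleton rate 45/50: [{lo:.3f}, {hi:.3f}]")
```

The interval is conservative by construction, which is why it is a common choice when only a few dozen calibration points are available.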
ssbc-1.1.0/LICENSE ADDED
@@ -0,0 +1,29 @@
1
+ BSD License
2
+
3
+ Copyright (c) 2025, Petrus H. Zwart / Lawrence Berkeley National Laboratory
4
+ All rights reserved.
5
+
6
+ Redistribution and use in source and binary forms, with or without modification,
7
+ are permitted provided that the following conditions are met:
8
+
9
+ * Redistributions of source code must retain the above copyright notice, this
10
+ list of conditions and the following disclaimer.
11
+
12
+ * Redistributions in binary form must reproduce the above copyright notice, this
13
+ list of conditions and the following disclaimer in the documentation and/or
14
+ other materials provided with the distribution.
15
+
16
+ * Neither the name of the copyright holder nor the names of its
17
+ contributors may be used to endorse or promote products derived from this
18
+ software without specific prior written permission.
19
+
20
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
21
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
22
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
23
+ IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
24
+ INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
25
+ BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
26
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
27
+ OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
28
+ OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
29
+ OF THE POSSIBILITY OF SUCH DAMAGE.
ssbc-1.1.0/PKG-INFO ADDED
@@ -0,0 +1,337 @@
1
+ Metadata-Version: 2.4
2
+ Name: ssbc
3
+ Version: 1.1.0
4
+ Summary: Small Sample Beta Correction - PAC guarantees with small datasets
5
+ Author-email: Petrus H Zwart <phzwart@lbl.gov>
6
+ Maintainer-email: Petrus H Zwart <phzwart@lbl.gov>
7
+ License: BSD License
8
+
9
+ Copyright (c) 2025, Petrus H. Zwart / Lawrence Berkeley National Laboratory
10
+ All rights reserved.
11
+
12
+ Redistribution and use in source and binary forms, with or without modification,
13
+ are permitted provided that the following conditions are met:
14
+
15
+ * Redistributions of source code must retain the above copyright notice, this
16
+ list of conditions and the following disclaimer.
17
+
18
+ * Redistributions in binary form must reproduce the above copyright notice, this
19
+ list of conditions and the following disclaimer in the documentation and/or
20
+ other materials provided with the distribution.
21
+
22
+ * Neither the name of the copyright holder nor the names of its
23
+ contributors may be used to endorse or promote products derived from this
24
+ software without specific prior written permission.
25
+
26
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
27
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
28
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
29
+ IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
30
+ INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
31
+ BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
32
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
33
+ OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
34
+ OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
35
+ OF THE POSSIBILITY OF SUCH DAMAGE.
36
+
37
+ Project-URL: bugs, https://github.com/phzwart/ssbc/issues
38
+ Project-URL: changelog, https://github.com/phzwart/ssbc/blob/master/changelog.md
39
+ Project-URL: homepage, https://github.com/phzwart/ssbc
40
+ Classifier: Development Status :: 4 - Beta
41
+ Classifier: Intended Audience :: Science/Research
42
+ Classifier: Intended Audience :: Developers
43
+ Classifier: Programming Language :: Python :: 3
44
+ Classifier: Programming Language :: Python :: 3.10
45
+ Classifier: Programming Language :: Python :: 3.11
46
+ Classifier: Programming Language :: Python :: 3.12
47
+ Classifier: Programming Language :: Python :: 3.13
48
+ Classifier: Topic :: Scientific/Engineering
49
+ Classifier: Topic :: Scientific/Engineering :: Mathematics
50
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
51
+ Requires-Python: >=3.10
52
+ Description-Content-Type: text/markdown
53
+ License-File: LICENSE
54
+ Requires-Dist: joblib
55
+ Requires-Dist: matplotlib
56
+ Requires-Dist: numpy
57
+ Requires-Dist: pandas
58
+ Requires-Dist: plotly
59
+ Requires-Dist: rich
60
+ Requires-Dist: scipy
61
+ Requires-Dist: typer
62
+ Provides-Extra: test
63
+ Requires-Dist: coverage; extra == "test"
64
+ Requires-Dist: pytest; extra == "test"
65
+ Requires-Dist: pytest-cov; extra == "test"
66
+ Requires-Dist: ruff; extra == "test"
67
+ Requires-Dist: ty; extra == "test"
68
+ Requires-Dist: ipdb; extra == "test"
69
+ Provides-Extra: dev
70
+ Requires-Dist: pre-commit; extra == "dev"
71
+ Requires-Dist: bandit[toml]; extra == "dev"
72
+ Provides-Extra: docs
73
+ Requires-Dist: sphinx>=7.0; extra == "docs"
74
+ Requires-Dist: sphinx-rtd-theme; extra == "docs"
75
+ Requires-Dist: myst-parser; extra == "docs"
76
+ Provides-Extra: mcp
77
+ Requires-Dist: fastmcp; extra == "mcp"
78
+ Dynamic: license-file
79
+
80
+ # SSBC: Small-Sample Beta Correction
81
+
82
+ ![PyPI version](https://img.shields.io/pypi/v/ssbc.svg)
83
+ [![Documentation Status](https://readthedocs.org/projects/ssbc/badge/?version=latest)](https://ssbc.readthedocs.io/en/latest/?version=latest)
84
+
85
+ **Small-Sample Beta Correction** provides PAC (Probably Approximately Correct) guarantees for conformal prediction with small calibration sets.
86
+
87
+ * PyPI package: https://pypi.org/project/ssbc/
88
+ * Free software: MIT License
89
+ * Documentation: https://ssbc.readthedocs.io.
90
+
91
+ ## Overview
92
+
93
+ SSBC addresses the challenge of constructing valid prediction sets when you have limited calibration data. Traditional conformal prediction assumes large calibration sets, but in practice, data is often scarce. SSBC provides **finite-sample PAC guarantees** and **rigorous operational bounds** for deployment.
94
+
95
+ ### What Makes SSBC Unique?
96
+
97
+ Unlike asymptotic methods, SSBC provides:
98
+
99
+ 1. **Finite-Sample PAC Coverage** (via SSBC algorithm)
100
+ - Rigorous guarantees that hold for ANY sample size
101
+ - Automatically adapts to class imbalance via Mondrian conformal prediction
102
+ - Example: "≥90% coverage with 95% probability" even with n=50
103
+
104
+ 2. **Rigorous Operational Bounds** (via LOO-CV + Clopper-Pearson)
105
+ - PAC-controlled bounds on automation rates, error rates, escalation rates
106
+ - Confidence intervals account for estimation uncertainty
107
+ - Example: "Singleton rate [0.85, 0.97] with 90% PAC guarantee"
108
+
109
+ 3. **Uncertainty Quantification**
110
+ - Bootstrap analysis for recalibration uncertainty
111
+ - Cross-conformal validation for finite-sample diagnostics
112
+ - Empirical validation for verifying theoretical guarantees
113
+
114
+ 4. **Contract-Ready Guarantees**
115
+ - Transform theory into deployable systems
116
+ - Resource planning (human oversight needs)
117
+ - SLA compliance (performance bounds)
118
+
119
+ ### Core Statistical Properties
120
+
121
+ 🎯 **Distribution-Free**: No assumptions about data distribution
122
+ 🎯 **Model-Agnostic**: Works with ANY probabilistic classifier
123
+ 🎯 **Frequentist**: Valid frequentist guarantees, no prior needed
124
+ 🎯 **Non-Bayesian**: No Bayesian assumptions or hyperpriors
125
+ 🎯 **Finite-Sample**: Exact guarantees for small n, not asymptotic
126
+ 🎯 **Exchangeability Only**: Minimal assumption (test/calibration exchangeable)
127
+
128
+ **📖 For detailed theory and deployment guide, see [docs/theory.md](docs/theory.md)**
129
+
130
+ ## Installation
131
+
132
+ ```bash
133
+ pip install ssbc
134
+ ```
135
+
136
+ Or from source:
137
+
138
+ ```bash
139
+ git clone https://github.com/phzwart/ssbc.git
140
+ cd ssbc
141
+ pip install -e .
142
+ ```
143
+
144
+ ## Quick Start
145
+
146
+ ### Unified Workflow (Recommended)
147
+
148
+ The complete workflow is available through a single function:
149
+
150
+ ```python
151
+ from ssbc import BinaryClassifierSimulator, generate_rigorous_pac_report
152
+
153
+ # Generate or load calibration data
154
+ sim = BinaryClassifierSimulator(
155
+ p_class1=0.2,
156
+ beta_params_class0=(1, 7),
157
+ beta_params_class1=(5, 2),
158
+ seed=42
159
+ )
160
+ labels, probs = sim.generate(n_samples=100)
161
+
162
+ # Generate comprehensive PAC report with operational bounds
163
+ report = generate_rigorous_pac_report(
164
+ labels=labels,
165
+ probs=probs,
166
+ alpha_target=0.10, # Target 90% coverage
167
+ delta=0.10, # 90% PAC confidence
168
+ test_size=1000, # Expected deployment size
169
+ use_union_bound=True, # Simultaneous guarantees
170
+
171
+ # Optional uncertainty analyses
172
+ run_bootstrap=True, # Recalibration uncertainty
173
+ n_bootstrap=1000,
174
+ simulator=sim,
175
+
176
+ run_cross_conformal=True, # Finite-sample diagnostics
177
+ n_folds=10,
178
+ )
179
+
180
+ # Access results
181
+ pac_bounds = report['pac_bounds_marginal']
182
+ print(f"Singleton rate: {pac_bounds['singleton_rate_bounds']}")
183
+ print(f"Expected: {pac_bounds['expected_singleton_rate']:.3f}")
184
+ ```
185
+
186
+ **Output includes:**
187
+ - ✅ PAC coverage guarantees (SSBC-corrected thresholds)
188
+ - ✅ Rigorous operational bounds (singleton, doublet, abstention, error rates)
189
+ - ✅ Per-class and marginal statistics
190
+ - ✅ Optional: Bootstrap uncertainty intervals
191
+ - ✅ Optional: Cross-conformal validation diagnostics
192
+
193
+ ### Core SSBC Algorithm
194
+
195
+ For fine-grained control, use the core algorithm directly:
196
+
197
+ ```python
198
+ from ssbc import ssbc_correct
199
+
200
+ result = ssbc_correct(
201
+ alpha_target=0.10, # Target 10% miscoverage
202
+ n=50, # Calibration set size
203
+ delta=0.10, # PAC parameter (90% confidence)
204
+ mode="beta" # Infinite test window
205
+ )
206
+
207
+ print(f"Corrected α: {result.alpha_corrected:.4f}")
208
+ print(f"u*: {result.u_star}")
209
+ ```
210
+
211
+ ### Validation and Diagnostics
212
+
213
+ Empirically validate your PAC bounds:
214
+
215
+ ```python
216
+ from ssbc import validate_pac_bounds, print_validation_results
217
+
218
+ # Generate report
219
+ report = generate_rigorous_pac_report(labels, probs, delta=0.10)
220
+
221
+ # Validate empirically
222
+ validation = validate_pac_bounds(
223
+ report=report,
224
+ simulator=sim,
225
+ test_size=1000,
226
+ n_trials=10000
227
+ )
228
+
229
+ # Print results
230
+ print_validation_results(validation)
231
+ ```
232
+
233
+ Cross-conformal validation for calibration diagnostics:
234
+
235
+ ```python
236
+ from ssbc import cross_conformal_validation
237
+
238
+ results = cross_conformal_validation(
239
+ labels=labels,
240
+ probs=probs,
241
+ n_folds=10,
242
+ alpha_target=0.10,
243
+ delta=0.10
244
+ )
245
+
246
+ print(f"Singleton rate: {results['marginal']['singleton']['mean']:.3f}")
247
+ print(f"Std dev: {results['marginal']['singleton']['std']:.3f}")
248
+ ```
249
+
250
+ ## Key Features
251
+
252
+ - ✅ **Small-Sample Correction**: PAC-valid conformal prediction for small calibration sets
253
+ - ✅ **Mondrian Conformal Prediction**: Per-class calibration for handling class imbalance
254
+ - ✅ **PAC Operational Bounds**: Rigorous bounds on deployment rates (LOO-CV + Clopper-Pearson)
255
+ - ✅ **Bootstrap Uncertainty**: Recalibration variability analysis
256
+ - ✅ **Cross-Conformal Validation**: Finite-sample diagnostics via K-fold
257
+ - ✅ **Empirical Validation**: Verify theoretical guarantees in practice
258
+ - ✅ **Comprehensive Statistics**: Detailed reporting with exact confidence intervals
259
+ - ✅ **Hyperparameter Tuning**: Interactive parallel coordinates visualization
260
+ - ✅ **Simulation Tools**: Built-in data generators for testing
261
+
262
+ ## Examples
263
+
264
+ The `examples/` directory contains comprehensive demonstrations:
265
+
266
+ ### Essential Examples
267
+
268
+ ```bash
269
+ # Core algorithm
270
+ python examples/ssbc_core_example.py
271
+
272
+ # Mondrian conformal prediction
273
+ python examples/mondrian_conformal_example.py
274
+
275
+ # Complete workflow with all uncertainty analyses
276
+ python examples/complete_workflow_example.py
277
+
278
+ # SLA/deployment contracts
279
+ python examples/sla_example.py
280
+
281
+ # Alpha scanning across thresholds
282
+ python examples/alpha_scan_example.py
283
+
284
+ # Empirical validation
285
+ python examples/pac_validation_example.py
286
+ ```
287
+
288
+ ## Understanding the Output
289
+
290
+ ### Per-Class Statistics (Conditioned on True Label)
291
+
292
+ For each class, the report shows:
293
+ - **Abstentions**: Empty prediction sets (no confident prediction)
294
+ - **Singletons**: Single-label predictions (automated decisions)
295
+ - **Doublets**: Both labels included (escalated to human review)
296
+ - **Singleton Error Rate**: P(error | singleton prediction)
297
+
298
+ ### Marginal Statistics (Deployment View)
299
+
300
+ Overall performance metrics (deployment perspective):
301
+ - **Coverage**: Fraction of predictions containing the true label
302
+ - **Automation Rate**: Fraction of confident predictions (singletons)
303
+ - **Escalation Rate**: Fraction requiring human review (doublets + abstentions)
304
+ - **Error Rate**: Among automated decisions
305
+
306
+ ### PAC Operational Bounds
307
+
308
+ Rigorous bounds on all operational metrics:
309
+ - Computed via Leave-One-Out Cross-Validation (LOO-CV)
310
+ - Clopper-Pearson confidence intervals account for estimation uncertainty
311
+ - Union bound ensures all metrics hold simultaneously
312
+ - Valid for any future test set from the same distribution
313
+
314
+ ## Citation
315
+
316
+ If you use SSBC in your research, please cite:
317
+
318
+ ```bibtex
319
+ @software{ssbc2024,
320
+ author = {Zwart, Petrus H},
321
+ title = {SSBC: Small-Sample Beta Correction},
322
+ year = {2024},
323
+ url = {https://github.com/phzwart/ssbc}
324
+ }
325
+ ```
326
+
327
+ ## Contributing
328
+
329
+ Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
330
+
331
+ ## License
332
+
333
+ MIT License - see [LICENSE](LICENSE) file for details.
334
+
335
+ ## Credits
336
+
337
+ This package was created with [Cookiecutter](https://github.com/audreyfeldroy/cookiecutter) and the [audreyfeldroy/cookiecutter-pypackage](https://github.com/audreyfeldroy/cookiecutter-pypackage) project template.
ssbc-1.1.0/README.md ADDED
@@ -0,0 +1,258 @@
1
+ # SSBC: Small-Sample Beta Correction
2
+
3
+ ![PyPI version](https://img.shields.io/pypi/v/ssbc.svg)
4
+ [![Documentation Status](https://readthedocs.org/projects/ssbc/badge/?version=latest)](https://ssbc.readthedocs.io/en/latest/?version=latest)
5
+
6
+ **Small-Sample Beta Correction** provides PAC (Probably Approximately Correct) guarantees for conformal prediction with small calibration sets.
7
+
8
+ * PyPI package: https://pypi.org/project/ssbc/
9
+ * Free software: MIT License
10
+ * Documentation: https://ssbc.readthedocs.io.
11
+
12
+ ## Overview
13
+
14
+ SSBC addresses the challenge of constructing valid prediction sets when you have limited calibration data. Traditional conformal prediction assumes large calibration sets, but in practice, data is often scarce. SSBC provides **finite-sample PAC guarantees** and **rigorous operational bounds** for deployment.
15
+
16
+ ### What Makes SSBC Unique?
17
+
18
+ Unlike asymptotic methods, SSBC provides:
19
+
20
+ 1. **Finite-Sample PAC Coverage** (via SSBC algorithm)
21
+ - Rigorous guarantees that hold for ANY sample size
22
+ - Automatically adapts to class imbalance via Mondrian conformal prediction
23
+ - Example: "≥90% coverage with 95% probability" even with n=50
24
+
25
+ 2. **Rigorous Operational Bounds** (via LOO-CV + Clopper-Pearson)
26
+ - PAC-controlled bounds on automation rates, error rates, escalation rates
27
+ - Confidence intervals account for estimation uncertainty
28
+ - Example: "Singleton rate [0.85, 0.97] with 90% PAC guarantee"
29
+
30
+ 3. **Uncertainty Quantification**
31
+ - Bootstrap analysis for recalibration uncertainty
32
+ - Cross-conformal validation for finite-sample diagnostics
33
+ - Empirical validation for verifying theoretical guarantees
34
+
35
+ 4. **Contract-Ready Guarantees**
36
+ - Transform theory into deployable systems
37
+ - Resource planning (human oversight needs)
38
+ - SLA compliance (performance bounds)
39
+
40
+ ### Core Statistical Properties
41
+
42
+ 🎯 **Distribution-Free**: No assumptions about data distribution
43
+ 🎯 **Model-Agnostic**: Works with ANY probabilistic classifier
44
+ 🎯 **Frequentist**: Valid frequentist guarantees, no prior needed
45
+ 🎯 **Non-Bayesian**: No Bayesian assumptions or hyperpriors
46
+ 🎯 **Finite-Sample**: Exact guarantees for small n, not asymptotic
47
+ 🎯 **Exchangeability Only**: Minimal assumption (test/calibration exchangeable)
48
+
49
+ **📖 For detailed theory and deployment guide, see [docs/theory.md](docs/theory.md)**
50
+
51
+ ## Installation
52
+
53
+ ```bash
54
+ pip install ssbc
55
+ ```
56
+
57
+ Or from source:
58
+
59
+ ```bash
60
+ git clone https://github.com/phzwart/ssbc.git
61
+ cd ssbc
62
+ pip install -e .
63
+ ```
64
+
65
+ ## Quick Start
66
+
67
+ ### Unified Workflow (Recommended)
68
+
69
+ The complete workflow is available through a single function:
70
+
71
+ ```python
72
+ from ssbc import BinaryClassifierSimulator, generate_rigorous_pac_report
73
+
74
+ # Generate or load calibration data
75
+ sim = BinaryClassifierSimulator(
76
+ p_class1=0.2,
77
+ beta_params_class0=(1, 7),
78
+ beta_params_class1=(5, 2),
79
+ seed=42
80
+ )
81
+ labels, probs = sim.generate(n_samples=100)
82
+
83
+ # Generate comprehensive PAC report with operational bounds
84
+ report = generate_rigorous_pac_report(
85
+ labels=labels,
86
+ probs=probs,
87
+ alpha_target=0.10, # Target 90% coverage
88
+ delta=0.10, # 90% PAC confidence
89
+ test_size=1000, # Expected deployment size
90
+ use_union_bound=True, # Simultaneous guarantees
91
+
92
+ # Optional uncertainty analyses
93
+ run_bootstrap=True, # Recalibration uncertainty
94
+ n_bootstrap=1000,
95
+ simulator=sim,
96
+
97
+ run_cross_conformal=True, # Finite-sample diagnostics
98
+ n_folds=10,
99
+ )
100
+
101
+ # Access results
102
+ pac_bounds = report['pac_bounds_marginal']
103
+ print(f"Singleton rate: {pac_bounds['singleton_rate_bounds']}")
104
+ print(f"Expected: {pac_bounds['expected_singleton_rate']:.3f}")
105
+ ```
106
+
107
+ **Output includes:**
108
+ - ✅ PAC coverage guarantees (SSBC-corrected thresholds)
109
+ - ✅ Rigorous operational bounds (singleton, doublet, abstention, error rates)
110
+ - ✅ Per-class and marginal statistics
111
+ - ✅ Optional: Bootstrap uncertainty intervals
112
+ - ✅ Optional: Cross-conformal validation diagnostics
113
+
114
+ ### Core SSBC Algorithm
115
+
116
+ For fine-grained control, use the core algorithm directly:
117
+
118
+ ```python
119
+ from ssbc import ssbc_correct
120
+
121
+ result = ssbc_correct(
122
+ alpha_target=0.10, # Target 10% miscoverage
123
+ n=50, # Calibration set size
124
+ delta=0.10, # PAC parameter (90% confidence)
125
+ mode="beta" # Infinite test window
126
+ )
127
+
128
+ print(f"Corrected α: {result.alpha_corrected:.4f}")
129
+ print(f"u*: {result.u_star}")
130
+ ```
131
+
132
+ ### Validation and Diagnostics
133
+
134
+ Empirically validate your PAC bounds:
135
+
136
+ ```python
137
+ from ssbc import validate_pac_bounds, print_validation_results
138
+
139
+ # Generate report
140
+ report = generate_rigorous_pac_report(labels, probs, delta=0.10)
141
+
142
+ # Validate empirically
143
+ validation = validate_pac_bounds(
144
+ report=report,
145
+ simulator=sim,
146
+ test_size=1000,
147
+ n_trials=10000
148
+ )
149
+
150
+ # Print results
151
+ print_validation_results(validation)
152
+ ```
153
+
154
+ Cross-conformal validation for calibration diagnostics:
155
+
156
+ ```python
157
+ from ssbc import cross_conformal_validation
158
+
159
+ results = cross_conformal_validation(
160
+ labels=labels,
161
+ probs=probs,
162
+ n_folds=10,
163
+ alpha_target=0.10,
164
+ delta=0.10
165
+ )
166
+
167
+ print(f"Singleton rate: {results['marginal']['singleton']['mean']:.3f}")
168
+ print(f"Std dev: {results['marginal']['singleton']['std']:.3f}")
169
+ ```
170
+
171
+ ## Key Features
172
+
173
+ - ✅ **Small-Sample Correction**: PAC-valid conformal prediction for small calibration sets
174
+ - ✅ **Mondrian Conformal Prediction**: Per-class calibration for handling class imbalance
175
+ - ✅ **PAC Operational Bounds**: Rigorous bounds on deployment rates (LOO-CV + Clopper-Pearson)
176
+ - ✅ **Bootstrap Uncertainty**: Recalibration variability analysis
177
+ - ✅ **Cross-Conformal Validation**: Finite-sample diagnostics via K-fold
178
+ - ✅ **Empirical Validation**: Verify theoretical guarantees in practice
179
+ - ✅ **Comprehensive Statistics**: Detailed reporting with exact confidence intervals
180
+ - ✅ **Hyperparameter Tuning**: Interactive parallel coordinates visualization
181
+ - ✅ **Simulation Tools**: Built-in data generators for testing
182
+
183
+ ## Examples
184
+
185
+ The `examples/` directory contains comprehensive demonstrations:
186
+
187
+ ### Essential Examples
188
+
189
+ ```bash
190
+ # Core algorithm
191
+ python examples/ssbc_core_example.py
192
+
193
+ # Mondrian conformal prediction
194
+ python examples/mondrian_conformal_example.py
195
+
196
+ # Complete workflow with all uncertainty analyses
197
+ python examples/complete_workflow_example.py
198
+
199
+ # SLA/deployment contracts
200
+ python examples/sla_example.py
201
+
202
+ # Alpha scanning across thresholds
203
+ python examples/alpha_scan_example.py
204
+
205
+ # Empirical validation
206
+ python examples/pac_validation_example.py
207
+ ```
208
+
209
+ ## Understanding the Output
210
+
211
+ ### Per-Class Statistics (Conditioned on True Label)
212
+
213
+ For each class, the report shows:
214
+ - **Abstentions**: Empty prediction sets (no confident prediction)
215
+ - **Singletons**: Single-label predictions (automated decisions)
216
+ - **Doublets**: Both labels included (escalated to human review)
217
+ - **Singleton Error Rate**: P(error | singleton prediction)
218
+
219
+ ### Marginal Statistics (Deployment View)
220
+
221
+ Overall performance metrics (deployment perspective):
222
+ - **Coverage**: Fraction of predictions containing the true label
223
+ - **Automation Rate**: Fraction of confident predictions (singletons)
224
+ - **Escalation Rate**: Fraction requiring human review (doublets + abstentions)
225
+ - **Error Rate**: Among automated decisions
226
+
227
+ ### PAC Operational Bounds
228
+
229
+ Rigorous bounds on all operational metrics:
230
+ - Computed via Leave-One-Out Cross-Validation (LOO-CV)
231
+ - Clopper-Pearson confidence intervals account for estimation uncertainty
232
+ - Union bound ensures all metrics hold simultaneously
233
+ - Valid for any future test set from the same distribution
234
+
235
+ ## Citation
236
+
237
+ If you use SSBC in your research, please cite:
238
+
239
+ ```bibtex
240
+ @software{ssbc2024,
241
+ author = {Zwart, Petrus H},
242
+ title = {SSBC: Small-Sample Beta Correction},
243
+ year = {2024},
244
+ url = {https://github.com/phzwart/ssbc}
245
+ }
246
+ ```
247
+
248
+ ## Contributing
249
+
250
+ Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
251
+
252
+ ## License
253
+
254
+ MIT License - see [LICENSE](LICENSE) file for details.
255
+
256
+ ## Credits
257
+
258
+ This package was created with [Cookiecutter](https://github.com/audreyfeldroy/cookiecutter) and the [audreyfeldroy/cookiecutter-pypackage](https://github.com/audreyfeldroy/cookiecutter-pypackage) project template.
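Annotation (not part of the package diff): the "Understanding the Output" section above defines abstentions, singletons, and doublets by prediction-set size. A minimal NumPy sketch of that bookkeeping for a binary classifier follows. It assumes an `(n, 2)` array of class probabilities, a nonconformity score of `1 - p`, and one threshold per class (Mondrian style). These conventions and the function names are assumptions made for illustration; the diff does not show how ssbc represents them.

```python
import numpy as np


def prediction_set_sizes(probs, thresholds) -> np.ndarray:
    """Number of labels in each prediction set (0, 1, or 2).

    probs: (n, 2) class probabilities; thresholds: per-class cutoffs on 1 - p.
    Hypothetical helper, not the ssbc API.
    """
    scores = 1.0 - np.asarray(probs, dtype=float)
    # A label enters the set when its nonconformity score is within that class's cutoff.
    included = scores <= np.asarray(thresholds, dtype=float)
    return included.sum(axis=1)


def operational_rates(probs, thresholds) -> dict[str, float]:
    """Fractions of abstentions, singletons, and doublets."""
    sizes = prediction_set_sizes(probs, thresholds)
    return {
        "abstention": float(np.mean(sizes == 0)),
        "singleton": float(np.mean(sizes == 1)),
        "doublet": float(np.mean(sizes == 2)),
    }


probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.50, 0.50]])
strict = operational_rates(probs, thresholds=(0.3, 0.3))
loose = operational_rates(probs, thresholds=(0.6, 0.6))
print("strict thresholds:", strict)
print("loose thresholds: ", loose)
```

Tight thresholds push ambiguous inputs toward abstention, while loose thresholds push them toward doublets; the three rates always sum to one.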
{ssbc-1.0.0 → ssbc-1.1.0}/docs/index.md
@@ -1,11 +1,11 @@
1
- # Welcome to ssbc's documentation!
1
+ # Welcome to SSBC's documentation!
2
2
 
3
3
  ## Contents
4
4
 
5
- - [Readme](readme.md)
6
5
  - [Installation](installation.md)
7
6
  - [Usage](usage.md)
8
- - [Modules](modules.md)
7
+ - [Theory and Deployment Guide](theory.md)
8
+ - [API Reference](index.rst)
9
9
  - [Contributing](contributing.md)
10
10
  - [History](history.md)
11
11