certflow 1.0.2__tar.gz → 1.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- certflow-1.1.0/.gitignore +21 -0
- certflow-1.1.0/CHANGELOG.md +135 -0
- {certflow-1.0.2 → certflow-1.1.0}/CITATION.cff +13 -1
- {certflow-1.0.2 → certflow-1.1.0}/PKG-INFO +135 -11
- certflow-1.1.0/README.md +289 -0
- certflow-1.1.0/docs/README.md +54 -0
- {certflow-1.0.2 → certflow-1.1.0}/pyproject.toml +4 -2
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/__init__.py +32 -2
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/cert.py +305 -3
- certflow-1.1.0/src/certflow/conformal.py +937 -0
- certflow-1.1.0/src/certflow/team.py +93 -0
- certflow-1.1.0/tests/test_cia.py +184 -0
- certflow-1.1.0/tests/test_live_wiring.py +197 -0
- certflow-1.1.0/tests/test_lp_shift.py +151 -0
- certflow-1.1.0/tests/test_pasc_watch.py +280 -0
- certflow-1.1.0/tests/test_sfogd.py +106 -0
- certflow-1.1.0/tests/test_team.py +138 -0
- certflow-1.0.2/.gitignore +0 -13
- certflow-1.0.2/CHANGELOG.md +0 -76
- certflow-1.0.2/README.md +0 -169
- certflow-1.0.2/src/certflow/conformal.py +0 -275
- {certflow-1.0.2 → certflow-1.1.0}/LICENSE +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/baselines.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/ch.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/drift.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/egraph.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/episodes.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/fastgraph.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/graphcore.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/harness.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/movingai.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/oracle.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/realworld.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/roadnet.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/sensing.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/snapshot.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/src/certflow/types.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_baselines.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_cert.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_ch.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_conformal.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_drift_oracle.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_egraph.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_episodes.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_fastgraph.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_graphcore.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_harness.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_movingai.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_movingai_experiment.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_realworld.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_roadnet.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_snapshot.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_state_sync.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_tier1_smoke.py +0 -0
- {certflow-1.0.2 → certflow-1.1.0}/tests/test_tier2_smoke.py +0 -0
@@ -0,0 +1,21 @@
+/data/
+cert_env/
+.venv/
+__pycache__/
+*.egg-info/
+.pytest_cache/
+/results/
+paper/*.pdf
+paper/*.aux
+paper/*.log
+paper/*.out
+venue.md
+.claude/
+
+# raw visualization renders (curated copies live in site/assets/)
+viz_out/
+
+# personal launch assets (not part of the published artifact)
+linkedin.zip
+reddit.zip
+x.zip
@@ -0,0 +1,135 @@
+# Changelog
+
+All notable changes to CERT-FLOW are documented here. The format follows
+[Keep a Changelog](https://keepachangelog.com/en/1.1.0/); versions follow
+[Semantic Versioning](https://semver.org/).
+
+## [Unreleased]
+
+## [1.1.0] - 2026-07-02
+
+Multi-agent certificate + a 2026 conformal upgrade layer. All new behavior is
+opt-in behind config flags / new classes; the default single-agent certificate
+and its guarantees are unchanged.
+
+### Added
+- **Additive multi-agent certificate** (`certflow.team`): `additive_certificate`
++ `TeamCertificate` (`sum LB <= sum OPT <= sum UB`, union-bound confidence) —
+the one TEAM-CERT survivor, ported over a shared conformal store.
+- **LP-shift staleness** (`ConformalScorer(shift_model="lp", eps_lp, rho_lp)`,
+arXiv 2502.14105): worst-case quantile `Quant(1-alpha+rho)+eps`. TV default.
+- **CIA path-sum calibration** (`CIACalibrator`, `CertPlanner.cia_path_certificate`,
+arXiv 2408.10939): group-sum path calibration with symmetric-calibration overlap
+handling and the age-weighted drift retrofit. Bonferroni default.
+- **SF-OGD ACI** (`ACITracker(mode="sf-ogd")`, arXiv 2302.07869): scale-free,
+anytime step size. Fixed-gamma default.
+- **PASC joint per-edge calibration** (`PASCCalibrator`, arXiv 2605.18812): one
+`max`-score quantile prices all edges jointly at `>= 1-alpha`, replacing the
+`alpha/L` per-edge Bonferroni correction.
+- **Testability layer** — making the pinned-at-1.0 coverage observable:
+`conformal_p_value` (WATCH Eq. 9), `ConformalTestMartingale` (WATCH, arXiv
+2505.04608; Ville alarm + tightness stress test), `conformal_e_value` /
+`score_ratio_e_value` / `merge_e_values` (arXiv 2503.13050).
+- **Drift diagnostics** (from DASC, arXiv 2606.15953, as observables only —
+DASC's coverage bound is not distribution-free): `residual_drift_score`
+(1-D Wasserstein `D_t`), `effective_sample_size` (Kish `n_eff`).
+- **Live round-2 wiring** — the round-2 calibrators, previously standalone, are
+now wired into the live `round()` loop behind `PlannerConfig` flags (all
+default **OFF**; on/off produce a byte-identical `(lb, ub, confidence)`
+stream):
+- `watch_monitor=True` (+ `sr_threshold`): every new weighted conformal
+p-value inside `ingest_observation` feeds `planner.watch`
+(`ConformalTestMartingale`) **and** `planner.sr` (`ShiryaevRobertsDetector`);
+`planner.diagnostics()` exposes the martingale value/alarm, SR peak/alarm,
+the recent-vs-buffer residual drift score and the age-weights' effective
+sample size. Purely observational — no certificate, no pricing change.
+- `path_calibration="pasc"`: `_q()` prices edges with the PASC **joint** radius
+live, falling back to Bonferroni during warm-up / while α-annealing pins the
+level at 1. Uses the **signed** block-max, not `abs()`: the live buffer
+already stores the drift-adjusted score `|obs−ĉ| − ρ·age`, so `abs()` (as the
+standalone `pasc_edge_radius` applies to *raw* residuals) would double-count
+the drift subtraction and inflate the radius.
+- `scripts/run_live_wiring.py`, `tests/test_live_wiring.py`.
+- **Real METR-LA benchmark of the wiring** (20 seeds × 288 rounds = one replay
+day each; oracle = exact Dijkstra on the recording): **0.0000** coverage
+violations in every mode; `watch_monitor` **quiet 20/20** on both detectors —
+the pinned-at-1.0 coverage is now a live, alarming quantity at zero cost
+(WATCH HOLDS on real data). PASC is an **honest negative**: **+25.1 %** wider
+median width than Bonferroni on real traffic (8797 → 11007 s), the opposite of
+its 4.5 % synthetic-grid win — long optimistic paths (L ≈ 14–18) starve the
+length-L block quantile, while Bonferroni's full-pooled per-edge quantile stays
+better-resolved. Bonferroni stays default; PASC keeps its experimental flag.
+Full suite **250 passed**. (`docs/results/live-wiring-2026.md`)
+- `docs/results/multiagent.md`, `docs/related-work-2026.md` (positioning vs
+arXiv 2601.03629 + the adopted machinery).
+
+## [1.0.2] - 2026-06-10
+
+Packaging and serialization fixes for the freshly published library.
+
+### Fixed
+- `pytest` now discovers the `src/`-layout package on a fresh checkout: the
+`[tool.pytest.ini_options]` `pythonpath` was `["."]` (repo root, no package
+there), so `python -m pytest` failed with `ModuleNotFoundError: certflow`
+unless `PYTHONPATH=src` was set or the package was installed. Set to
+`["src"]`.
+- `EpisodeResult.oracle_cost` is now serialized by `save_results` and restored
+by `load_results`. It was dropped on save, so reloaded Tier-2 results came
+back with `oracle_cost = nan` and any regret analysis silently reported NaN.
+Legacy result files without the field still load (oracle_cost stays nan).
+
+### Added
+- `realworld` optional extra (`pip install 'certflow[realworld]'`) declaring
+the `pandas` and `tables` dependencies the METR-LA / PEMS-BAY traffic
+adapter needs. The core install stays numpy/scipy only; `_load_traffic` now
+raises a clear `ImportError` pointing at the extra when pandas is absent.
+
+## [1.0.1] - 2026-06-10
+
+First PyPI release (`pip install certflow`).
+
+### Added
+- Top-level package API: `from certflow import CertPlanner, PlannerConfig,
+Certificate, ConformalScorer, ACITracker, EdgeBelief, World`, plus
+`certflow.__version__` (previously `certflow/__init__.py` was empty and
+everything had to be imported from submodules; the old submodule imports
+still work).
+- `CITATION.cff` (validated, concept DOI 10.5281/zenodo.20631475) and this
+changelog.
+- Full PyPI packaging metadata: readme, keywords, classifiers, project URLs.
+
+### Changed
+- README: pip-based 30-second quickstart, static DOI badge pointing at the
+concept DOI, link to the limitations ledger, Python badge corrected to
+3.10+ (matching `requires-python`).
+- Package version aligned with the release tag (pyproject said 0.1.0 while
+the repository was at v1.0.0).
+
+### Fixed
+- Lint sweep over `src/`: removed unused imports and dead local assignments
+(no behavior change; the full test suite passes bit-identically).
+
+## [1.0.0] - 2026-06-10
+
+First public release, accompanying the preprint *CERT: Certified Route
+Planning under Drifting Costs (Extended Version)*.
+
+- Conformal route certificates (LB <= OPT <= UB) under drifting edge costs:
+age-weighted non-exchangeable quantiles, staleness correction, honest
+annealing.
+- Certificate-directed sensing (route-critical, churn-aware) and dual
+incremental search on a flat-array engine (numba kernels with pure-Python
+fallback).
+- Certificate-gated preprocessing: all-pairs snapshot oracle and certified
+Contraction Hierarchies (ns-to-microsecond queries that expire under
+drift).
+- 200+ tests; 16 reproduction pipelines covering 17 synthetic regimes,
+METR-LA / PEMS-BAY traffic replay, MovingAI maps, and DIMACS road
+networks.
+- Theory T1-T7 documented in `docs/` (coverage, certifiability threshold,
+sum-aware certificate, impossibility of a tighter lower bound,
+decision-uniform validity, churn floor).
+
+[1.0.2]: https://github.com/Archerkattri/CERT-FLOW/releases/tag/v1.0.2
+[1.0.1]: https://github.com/Archerkattri/CERT-FLOW/releases/tag/v1.0.1
+[1.0.0]: https://github.com/Archerkattri/CERT-FLOW/releases/tag/v1.0.0
|
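The changelog hunk above describes the additive multi-agent certificate as `sum LB <= sum OPT <= sum UB` with union-bound confidence. A minimal sketch of that arithmetic follows. It assumes a hypothetical `AgentBound` record and a hypothetical `additive_team_bound` helper; neither is certflow's `TeamCertificate` or `additive_certificate`, whose actual signatures are not shown in this diff.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AgentBound:
    """Hypothetical per-agent certificate: lb <= OPT <= ub at the given confidence."""
    lb: float
    ub: float
    confidence: float


def additive_team_bound(bounds: Iterable[AgentBound]) -> AgentBound:
    """Sum per-agent bounds and combine confidences with the union bound."""
    items = list(bounds)
    if not items:
        raise ValueError("need at least one per-agent bound")
    lb = sum(b.lb for b in items)
    ub = sum(b.ub for b in items)
    # Union bound: P(any agent's interval fails) <= sum of per-agent failure rates.
    failure = sum(1.0 - b.confidence for b in items)
    return AgentBound(lb=lb, ub=ub, confidence=max(0.0, 1.0 - failure))


team = additive_team_bound([
    AgentBound(lb=10.0, ub=14.0, confidence=0.95),
    AgentBound(lb=20.0, ub=26.0, confidence=0.95),
    AgentBound(lb=5.0, ub=9.0, confidence=0.90),
])
print(team)  # team interval [35, 49] at confidence of about 0.80
```

The union bound needs no independence assumption between agents, which is why the team confidence can only degrade as agents are added.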
|
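The changelog hunk above also lists two drift diagnostics: a 1-D Wasserstein residual drift score and a Kish effective sample size. The sketch below illustrates the two textbook quantities in plain Python. The function names, the geometric age weights, and the toy residuals are assumptions for illustration, not certflow's `residual_drift_score` or `effective_sample_size`.

```python
from typing import Sequence


def kish_effective_sample_size(weights: Sequence[float]) -> float:
    """Kish n_eff = (sum w)^2 / sum(w^2); equals n when all weights are equal."""
    total = sum(weights)
    squares = sum(w * w for w in weights)
    if squares == 0.0:
        return 0.0
    return total * total / squares


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """1-D Wasserstein-1 distance: integral of |F_a(x) - F_b(x)| over x."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(a), sorted(b)
    grid = sorted(set(xs) | set(ys))
    i = j = 0
    dist = 0.0
    for left, right in zip(grid, grid[1:]):
        # Advance each pointer so i/len(xs) and j/len(ys) are the CDFs at `left`.
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        dist += abs(i / len(xs) - j / len(ys)) * (right - left)
    return dist


ages = [0, 1, 2, 3]
weights = [0.5 ** age for age in ages]  # hypothetical geometric age weights
n_eff = kish_effective_sample_size(weights)

buffer_residuals = [0.0, 1.0, 2.0]   # toy "older" residuals
recent_residuals = [1.0, 2.0, 3.0]   # same shape, shifted up by one unit
drift = wasserstein_1d(recent_residuals, buffer_residuals)
print(n_eff, drift)
```

With these toy inputs the four age-weighted samples count for fewer than three effective samples, and the drift score recovers the one-unit shift between the two residual sets.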
@@ -7,7 +7,8 @@ authors:
 - family-names: Attri
 given-names: Krishi
 email: krishiattriwork@gmail.com
-
+orcid: https://orcid.org/0009-0005-4695-6467
+date-released: "2026-06-10"
 doi: 10.5281/zenodo.20631475
 url: "https://github.com/Archerkattri/CERT-FLOW"
 repository-code: "https://github.com/Archerkattri/CERT-FLOW"
@@ -28,3 +29,14 @@ abstract: >-
 preprocessing (all-pairs snapshot oracle, certified Contraction
 Hierarchies) behind the certificate so static-speed queries expire the
 instant drift exceeds tolerance.
+preferred-citation:
+type: article
+title: "CERT: Certified Route Planning under Drifting Costs"
+authors:
+- family-names: Attri
+given-names: Krishi
+orcid: https://orcid.org/0009-0005-4695-6467
+year: 2026
+doi: 10.31224/7306
+url: "https://doi.org/10.31224/7306"
+notes: "engrXiv preprint, extended version"
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: certflow
-Version: 1.0
+Version: 1.1.0
 Summary: Certified route planning under drifting costs: conformal LB<=OPT<=UB certificates, certificate-directed sensing, proof-gated preprocessing
 Project-URL: Homepage, https://github.com/Archerkattri/CERT-FLOW
 Project-URL: Repository, https://github.com/Archerkattri/CERT-FLOW
@@ -22,10 +22,14 @@ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
 Requires-Python: >=3.10
 Requires-Dist: numpy>=1.24
 Requires-Dist: scipy>=1.10
+Provides-Extra: bench
+Requires-Dist: networkx>=3.0; extra == 'bench'
 Provides-Extra: dev
 Requires-Dist: pytest>=7.0; extra == 'dev'
 Provides-Extra: fast
-Requires-Dist: numba; extra == 'fast'
+Requires-Dist: numba>=0.59; extra == 'fast'
+Provides-Extra: plots
+Requires-Dist: matplotlib>=3.7; extra == 'plots'
 Provides-Extra: realworld
 Requires-Dist: pandas>=1.5; extra == 'realworld'
 Requires-Dist: tables>=3.7; extra == 'realworld'
@@ -35,13 +39,17 @@ Description-Content-Type: text/markdown
 
 <p align="center">
 <a href="https://pypi.org/project/certflow/"><img alt="PyPI" src="https://img.shields.io/pypi/v/certflow?color=009E73"></a>
-<a href="#reproducing-every-number"><img alt="tests" src="https://img.shields.io/badge/tests-
+<a href="#reproducing-every-number"><img alt="tests" src="https://img.shields.io/badge/tests-200%2B%20passing-0072B2"></a>
 <img alt="python" src="https://img.shields.io/badge/python-3.10%2B-56B4E9">
 <img alt="license" src="https://img.shields.io/badge/license-MIT-1a7f37">
 <img alt="coverage claim" src="https://img.shields.io/badge/certificate%20coverage-1.000%20measured-D55E00">
-<a href="https://
+<a href="https://zenodo.org/badge/latestdoi/1265150144"><img alt="DOI" src="https://zenodo.org/badge/1265150144.svg"></a>
+<a href="https://doi.org/10.31224/7306"><img alt="engrXiv paper" src="https://img.shields.io/badge/engrXiv-10.31224%2F7306-009E73"></a>
+<a href="https://archerkattri.github.io/CERT-FLOW/"><img alt="project page" src="https://img.shields.io/badge/project%20page-live-56B4E9"></a>
 </p>
 
+<p align="center"><b><a href="https://archerkattri.github.io/CERT-FLOW/">🌐 Project page & videos →</a></b></p>
+
 A robot replanning through a world whose costs drift faces a question classical
 planners never answer: **how good is my current route, given that most of the
 map is stale?** CERT-FLOW answers it every round, with a proof: a
@@ -54,13 +62,24 @@ certificate says the gap shrinks fastest.
 
 ## Why it's different
 
-| | classical replanning (D\* Lite, AD\*) | exchangeable conformal (CIA) | **CERT-FLOW** |
+| property | classical replanning (D\* Lite, AD\*) | exchangeable conformal (CIA) | 🏆 **CERT-FLOW** (wins) |
 |---|---|---|---|
 | stale map | silently trusts it | coverage collapses (0.95 → **0.20** measured) | **prices it**: width grows with age, claim degrades visibly |
 | validity under drift | 0.02–0.59 measured | gap-dependent | **0.95–1.00, every condition ever run** |
 | sensing | none / heuristic | none | **certificate-directed** (oracle-level regret) |
 | static regime | fast | tight | **proof-gated preprocessing**: ns–µs queries that self-expire |
 
+### Results at a glance — CERT-FLOW wins every metric
+
+| metric | better is | **CERT-FLOW** | best alternative | winner |
+|---|:--:|---|---|:--:|
+| certificate coverage | higher ↑ | **1.000** (every condition) | AD\* 0.02–0.59 · CIA → 0.20 | 🏆 CERT |
+| travel-regret, unknown terrain | lower ↓ | **−0.12** (≈ clairvoyant oracle) | VOI 0.47 · freshness/blind 4–7 | 🏆 CERT |
+| fully-certified round @ 60×60 | lower ↓ | **3.7 ms** p50 / 12 ms p95 | — no certified planner reports one | 🏆 CERT |
+| road cost-change absorption | lower ↓ | **0.015–0.34 ms** | CRP ≈ 1 s | 🏆 CERT |
+
+*“better is” shows the metric direction (↑ higher / ↓ lower); **bold** = best value; 🏆 = winner. Every per-condition table in [`docs/`](docs/) is likewise marked and ranked best→worst.*
+
 ## Headline results (all reproducible below)
 
 - **Coverage ≥ claimed confidence on every condition ever run**: 17 synthetic
@@ -87,6 +106,65 @@ certificate says the gap shrinks fastest.
 help. Every known limitation and its disposition:
 [docs/results/limitations.md](docs/results/limitations.md).
 
+## 2026 upgrades (opt-in)
+
+Everything here is **off by default** — the single-agent certificate and its
+guarantees are byte-identical, and each addition is a new class or a config flag
+you opt into. Derivations in
+[docs/related-work-2026.md](docs/related-work-2026.md); the API in the
+[CHANGELOG](CHANGELOG.md).
+
+- **Additive multi-agent certificate** — `certflow.team.additive_certificate`
+composes per-agent certificates into a fleet-level `ΣLB ≤ ΣOPT ≤ ΣUB`
+(union-bound confidence). The one TEAM-CERT variant that survived scrutiny.
+([docs/results/multiagent.md](docs/results/multiagent.md))
+- **2025 conformal machinery, retrofitted with our age weights** — LP-shift
+staleness (`ConformalScorer(shift_model="lp")`,
+[arXiv 2502.14105](https://arxiv.org/abs/2502.14105)); a scale-free SF-OGD step
+for the ACI net (`ACITracker(mode="sf-ogd")`,
+[arXiv 2302.07869](https://arxiv.org/abs/2302.07869)); CIA path-*sum*
+calibration (`CIACalibrator`, `CertPlanner.cia_path_certificate()`,
+[arXiv 2408.10939](https://arxiv.org/abs/2408.10939)); and PASC joint
+*per-edge* radius (`PASCCalibrator`, `CertPlanner.pasc_edge_radius()`,
+[arXiv 2605.18812](https://arxiv.org/abs/2605.18812)) — one `max`-score
+quantile prices every edge at `≥ 1-α`, replacing the `α/L` Bonferroni
+correction. TV, fixed-γ, and Bonferroni stay the defaults.
+- **Testability layer** — makes the pinned-at-1.0 coverage *observable*:
+`conformal_p_value` + `ConformalTestMartingale` (WATCH,
+[arXiv 2505.04608](https://arxiv.org/abs/2505.04608) — a Ville-bounded validity
+monitor plus a tightness stress test), `ShiryaevRobertsDetector` for
+late-change detection, conformal e-values and admissible merging
+(`conformal_e_value`, `merge_e_values`,
+[arXiv 2503.13050](https://arxiv.org/abs/2503.13050)), and drift diagnostics
+`residual_drift_score` / `effective_sample_size` (from DASC,
+[arXiv 2606.15953](https://arxiv.org/abs/2606.15953) — observables only: DASC's
+own bound isn't distribution-free, so it never touches the coverage-critical
+weights).
+- **Demonstration** — `scripts/run_watch_testability.py` on streams with known
+ground truth: the validity monitor stays flat (coverage tracks `1-α`); the
+Shiryaev-Roberts detector catches a sharp regime shift ~7 rounds after it,
+where the plain martingale — decayed over a long null — misses it; and the
+Bonferroni-vs-PASC width gap shows up **only under positive edge correlation**
+(ρ=0.9, L=20: Bonferroni over-covers 0.97, PASC holds ~0.91 at **16.5% less
+width**). Under independence PASC barely helps — stated, not hidden.
+- **Live-wired and benchmarked on real data** —
+`PlannerConfig(watch_monitor=True)` runs the WATCH martingale +
+Shiryaev-Roberts detector inside `round()` (read via `planner.diagnostics()`),
+and `path_calibration="pasc"` prices edges with the joint radius live; both
+default **off** and change no certificate (`(lb, ub, confidence)` is
+byte-identical). On **real METR-LA** (20 seeds × 288 rounds) the certificate
+holds at **0.0000** violations in every mode, both detectors stay **quiet
+20/20** — coverage is now a live, alarming quantity — and PASC is an honest
+negative: **+25.1 % wider** than Bonferroni on real traffic, the opposite of
+its synthetic-grid win, so Bonferroni stays the default.
+([docs/results/live-wiring-2026.md](docs/results/live-wiring-2026.md))
+
+<p align="center"><img src="assets/live_wiring_2026.png" alt="Left: certified width on real METR-LA — the PASC joint radius is 25% wider than the default Bonferroni, both at zero violations. Right: the Shiryaev-Roberts validity monitor stays quiet under the correct model and fires 7 rounds after an injected regime shift." width="100%"/></p>
+
+<p align="center"><em>The 2026 layer on real data. <b>Left</b> — on METR-LA the joint PASC radius is <b>+25.1 % wider</b> than the default per-edge Bonferroni (both at 0.0000 violations): long paths starve the length-L block quantile, so PASC keeps its experimental flag. <b>Right</b> — soundness is now <b>observable</b>: the Shiryaev-Roberts statistic stays below its alarm threshold under the correctly-modelled null (quiet on 20/20 real seeds) and crosses it ~7 rounds after an injected regime shift, at zero cost to the certificate. Regenerate with <code>scripts/viz_gen/live_wiring_fig.py</code>.</em></p>
+
+With these in, the full suite is **250 passing** (the default path unchanged).
+
 ## Quickstart
 
 ```bash
@@ -111,7 +189,7 @@ To develop or reproduce the paper numbers, work from a clone:
 git clone https://github.com/Archerkattri/CERT-FLOW && cd CERT-FLOW
 python -m venv cert_env && source cert_env/bin/activate
 pip install -e ".[dev,fast,realworld]" h5py
-pytest # full suite:
+pytest # full suite: 200+ tests (more with datasets); data-dependent tests skip cleanly without data/
 ```
 
 ## Reproducing every number
|
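The "2026 upgrades" text added to the README in this diff contrasts the `α/L` per-edge Bonferroni correction with a single `max`-score quantile that prices every edge jointly. The sketch below illustrates the two radii with a generic split-conformal quantile. It assumes hypothetical helper names, ignores certflow's age weighting, and uses a deliberately extreme toy calibration set in which all edges of a path share one score; it is not the `PASCCalibrator` implementation.

```python
import math
from typing import List, Sequence


def conformal_quantile(scores: Sequence[float], level: float) -> float:
    """Finite-sample conformal quantile: the ceil((n+1)*level)-th smallest score."""
    n = len(scores)
    k = max(1, math.ceil((n + 1) * level))
    if k > n:
        return math.inf  # too few calibration scores to certify this level
    return sorted(scores)[k - 1]


def bonferroni_radius(blocks: List[List[float]], alpha: float) -> float:
    """Per-edge radius at level 1 - alpha/L, pooled over every edge score."""
    length = len(blocks[0])
    pooled = [score for block in blocks for score in block]
    return conformal_quantile(pooled, 1.0 - alpha / length)


def joint_radius(blocks: List[List[float]], alpha: float) -> float:
    """One radius for all edges: the 1 - alpha quantile of each block's max score."""
    return conformal_quantile([max(block) for block in blocks], 1.0 - alpha)


# Toy calibration set: 100 past paths of L = 5 edges, perfectly correlated
# (every edge on path i has residual score i). Purely illustrative data.
blocks = [[float(i)] * 5 for i in range(1, 101)]
bonf = bonferroni_radius(blocks, alpha=0.1)
joint = joint_radius(blocks, alpha=0.1)
print(bonf, joint)  # 99.0 91.0
```

Perfect correlation is the most favourable case for the joint radius, since the union bound behind Bonferroni pays for `L` separate failures that here always coincide. The README's real-traffic result, where the joint radius came out wider, shows that this advantage is not guaranteed on real data.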
|
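The same README text describes feeding conformal p-values to a test martingale with a Ville alarm and to a Shiryaev-Roberts detector. The sketch below is a generic illustration of those two statistics with a power betting function. The betting exponent, `delta`, the Shiryaev-Roberts threshold, and the toy p-value stream are all assumptions, and the code is not certflow's `ConformalTestMartingale` or `ShiryaevRobertsDetector`.

```python
from typing import Iterable, Optional, Tuple


def power_bet(p: float, eps: float = 0.5) -> float:
    """Power betting function eps * p**(eps - 1); it integrates to 1 on (0, 1]."""
    return eps * p ** (eps - 1.0)


def first_alarms(
    p_values: Iterable[float],
    delta: float = 0.01,
    sr_threshold: float = 100.0,
    eps: float = 0.5,
) -> Tuple[Optional[int], Optional[int]]:
    """Return the first alarm index of the plain martingale and of Shiryaev-Roberts."""
    martingale, sr = 1.0, 0.0
    mart_alarm: Optional[int] = None
    sr_alarm: Optional[int] = None
    for t, p in enumerate(p_values):
        bet = power_bet(p, eps)
        martingale *= bet          # product martingale, starts at 1
        sr = (1.0 + sr) * bet      # Shiryaev-Roberts recursion, starts at 0
        # Ville's inequality: under the null, P(sup martingale >= 1/delta) <= delta.
        if mart_alarm is None and martingale >= 1.0 / delta:
            mart_alarm = t
        if sr_alarm is None and sr >= sr_threshold:
            sr_alarm = t
    return mart_alarm, sr_alarm


# Toy stream: 100 null rounds (an evenly spaced grid of p-values, repeated five
# times) followed by 10 rounds of very small p-values after a regime shift.
null_cycle = [(i + 0.5) / 20 for i in range(20)]
stream = null_cycle * 5 + [0.01] * 10
mart_alarm, sr_alarm = first_alarms(stream)
print(mart_alarm, sr_alarm)  # None 102
```

In this toy stream the Shiryaev-Roberts statistic crosses its threshold on the third shifted round, while the plain martingale, having decayed over the 100-round null, never reaches `1/delta`. That matches the qualitative behaviour the README reports, though the round counts here come from the synthetic stream only.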
@@ -137,9 +215,51 @@ a multicore machine (`CERTFLOW_WORKERS=N` parallelizes seeds bit-identically).
 | Scale + engine benchmarks | `scripts/run_scale.py` | `docs/results/scale.md` |
 | Road networks (DIMACS NY/FLA, ALT) | `scripts/run_roadnet.py` | `docs/results/published-speed-comparison.md` |
 | Certified Contraction Hierarchies | `scripts/run_ch.py` | `docs/results/published-speed-comparison.md` |
+| Extended validation (baselines, stress, scaling) | `scripts/extval/*.py` | `docs/results/extended-validation.md` |
+| FoMo off-road seasonal drift | `scripts/extval/fomo_validation.py` | `docs/results/extended-validation.md` (§6) |
+| Comparison videos | `scripts/viz_compare.py`, `scripts/viz_gen/*.py` | `site/` (project page) |
 
 All scripts accept `--quick`. Real-data runs need `data/` (sources and loaders
-in `data/README.md`; ~230 MB
+in `data/README.md`; ~230 MB + optional FoMo cost-signal ~150 MB, links inside).
+
+## Videos
+
+Honest side-by-side comparisons — every clip replays a **real run** and the
+coverage/regret numbers shown are measured, not staged (warm-up rounds are
+drawn as "no claim", never counted as misses). Generators:
+`scripts/viz_compare.py` + `scripts/viz_gen/`; MP4 + supplementary reel in
+[`assets/videos/`](assets/videos).
+
+**The certificate that holds vs. the one that breaks** — CERT's band contains
+the true optimum every round it claims; AD\*'s w-suboptimality band, trusting
+stale point estimates, drifts out of date.
+
+
+
+*Synthetic drifting grid — CERT coverage 100% vs AD\* 43% (60 rounds).*
+
+
+
+*Real MovingAI map (DAO arena) — CERT 100% vs AD\* 42%.*
+
+**Sensing that pays** — gap-directed sensing converges near a clairvoyant
+oracle; random / max-age / drive-blind wander at equal budget.
+
+
+
+*Unknown drifting terrain — CERT travel-regret 1.96 vs random 7.84 / max-age 4.89 / blind 5.43 (15 seeds).*
+
+
+
+*Real arena map — CERT 1.71, lowest of all policies.*
+
+**Exchangeability collapse under staleness** — exchangeable conformal (CIA,
+its own construction) covers on the static slice it assumes, then collapses;
+CERT widens to hold coverage.
+
+
+
+*METR-LA — CIA coverage 0.88 → 0.25 → 0.38 (width frozen) vs CERT ~1.0 (width grows).*
 
 ## How it works
 
@@ -170,6 +290,8 @@ src/certflow/
 ch.py certified Contraction Hierarchies (231 µs on 264k-node NY)
 roadnet.py DIMACS road graphs + exact ALT on landmark lower-bounds
 drift.py / realworld.py / movingai.py synthetic, traffic-replay, game maps
+scripts/extval/ extended validation (baselines, stress, scaling, FoMo)
+scripts/viz_gen/ comparison-video generators; site/ = project page
 episodes.py / harness.py / baselines.py runners, seeds, parametric strawman
 docs/results/ one markdown per experiment: numbers, anomalies, verdicts
 docs/specs/ design spec; docs/theory/ working notes
@@ -177,8 +299,8 @@ docs/specs/ design spec; docs/theory/ working notes
 
 ## Citation
 
-Paper: *CERT: Certified Route Planning under Drifting Costs
-
+Paper (engrXiv preprint): *CERT: Certified Route Planning under Drifting Costs*,
+[doi:10.31224/7306](https://doi.org/10.31224/7306).
 
 ```bibtex
 @software{attri2026certflow,
@@ -194,8 +316,10 @@ The DOI above is the concept DOI (always resolves to the latest archived
 version); [CITATION.cff](CITATION.cff) carries the same metadata in
 machine-readable form.
 
-Paper preprint (extended version):
-
+Paper preprint (extended version): engrXiv,
+[doi:10.31224/7306](https://doi.org/10.31224/7306).
+
+**Author:** Krishi Attri ([ORCID](https://orcid.org/0009-0005-4695-6467) · [Google Scholar](https://scholar.google.com/citations?hl=en&user=VW1YUNYAAAAJ))
 
 ## License
 