certflow-1.0.2.tar.gz → certflow-1.1.0.tar.gz

Files changed (55)
  1. certflow-1.1.0/.gitignore +21 -0
  2. certflow-1.1.0/CHANGELOG.md +135 -0
  3. {certflow-1.0.2 → certflow-1.1.0}/CITATION.cff +13 -1
  4. {certflow-1.0.2 → certflow-1.1.0}/PKG-INFO +135 -11
  5. certflow-1.1.0/README.md +289 -0
  6. certflow-1.1.0/docs/README.md +54 -0
  7. {certflow-1.0.2 → certflow-1.1.0}/pyproject.toml +4 -2
  8. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/__init__.py +32 -2
  9. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/cert.py +305 -3
  10. certflow-1.1.0/src/certflow/conformal.py +937 -0
  11. certflow-1.1.0/src/certflow/team.py +93 -0
  12. certflow-1.1.0/tests/test_cia.py +184 -0
  13. certflow-1.1.0/tests/test_live_wiring.py +197 -0
  14. certflow-1.1.0/tests/test_lp_shift.py +151 -0
  15. certflow-1.1.0/tests/test_pasc_watch.py +280 -0
  16. certflow-1.1.0/tests/test_sfogd.py +106 -0
  17. certflow-1.1.0/tests/test_team.py +138 -0
  18. certflow-1.0.2/.gitignore +0 -13
  19. certflow-1.0.2/CHANGELOG.md +0 -76
  20. certflow-1.0.2/README.md +0 -169
  21. certflow-1.0.2/src/certflow/conformal.py +0 -275
  22. {certflow-1.0.2 → certflow-1.1.0}/LICENSE +0 -0
  23. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/baselines.py +0 -0
  24. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/ch.py +0 -0
  25. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/drift.py +0 -0
  26. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/egraph.py +0 -0
  27. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/episodes.py +0 -0
  28. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/fastgraph.py +0 -0
  29. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/graphcore.py +0 -0
  30. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/harness.py +0 -0
  31. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/movingai.py +0 -0
  32. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/oracle.py +0 -0
  33. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/realworld.py +0 -0
  34. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/roadnet.py +0 -0
  35. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/sensing.py +0 -0
  36. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/snapshot.py +0 -0
  37. {certflow-1.0.2 → certflow-1.1.0}/src/certflow/types.py +0 -0
  38. {certflow-1.0.2 → certflow-1.1.0}/tests/test_baselines.py +0 -0
  39. {certflow-1.0.2 → certflow-1.1.0}/tests/test_cert.py +0 -0
  40. {certflow-1.0.2 → certflow-1.1.0}/tests/test_ch.py +0 -0
  41. {certflow-1.0.2 → certflow-1.1.0}/tests/test_conformal.py +0 -0
  42. {certflow-1.0.2 → certflow-1.1.0}/tests/test_drift_oracle.py +0 -0
  43. {certflow-1.0.2 → certflow-1.1.0}/tests/test_egraph.py +0 -0
  44. {certflow-1.0.2 → certflow-1.1.0}/tests/test_episodes.py +0 -0
  45. {certflow-1.0.2 → certflow-1.1.0}/tests/test_fastgraph.py +0 -0
  46. {certflow-1.0.2 → certflow-1.1.0}/tests/test_graphcore.py +0 -0
  47. {certflow-1.0.2 → certflow-1.1.0}/tests/test_harness.py +0 -0
  48. {certflow-1.0.2 → certflow-1.1.0}/tests/test_movingai.py +0 -0
  49. {certflow-1.0.2 → certflow-1.1.0}/tests/test_movingai_experiment.py +0 -0
  50. {certflow-1.0.2 → certflow-1.1.0}/tests/test_realworld.py +0 -0
  51. {certflow-1.0.2 → certflow-1.1.0}/tests/test_roadnet.py +0 -0
  52. {certflow-1.0.2 → certflow-1.1.0}/tests/test_snapshot.py +0 -0
  53. {certflow-1.0.2 → certflow-1.1.0}/tests/test_state_sync.py +0 -0
  54. {certflow-1.0.2 → certflow-1.1.0}/tests/test_tier1_smoke.py +0 -0
  55. {certflow-1.0.2 → certflow-1.1.0}/tests/test_tier2_smoke.py +0 -0
@@ -0,0 +1,21 @@
1
+ /data/
2
+ cert_env/
3
+ .venv/
4
+ __pycache__/
5
+ *.egg-info/
6
+ .pytest_cache/
7
+ /results/
8
+ paper/*.pdf
9
+ paper/*.aux
10
+ paper/*.log
11
+ paper/*.out
12
+ venue.md
13
+ .claude/
14
+
15
+ # raw visualization renders (curated copies live in site/assets/)
16
+ viz_out/
17
+
18
+ # personal launch assets (not part of the published artifact)
19
+ linkedin.zip
20
+ reddit.zip
21
+ x.zip
@@ -0,0 +1,135 @@
1
+ # Changelog
2
+
3
+ All notable changes to CERT-FLOW are documented here. The format follows
4
+ [Keep a Changelog](https://keepachangelog.com/en/1.1.0/); versions follow
5
+ [Semantic Versioning](https://semver.org/).
6
+
7
+ ## [Unreleased]
8
+
9
+ ## [1.1.0] - 2026-07-02
10
+
11
+ Multi-agent certificate + a 2026 conformal upgrade layer. All new behavior is
12
+ opt-in behind config flags / new classes; the default single-agent certificate
13
+ and its guarantees are unchanged.
14
+
15
+ ### Added
16
+ - **Additive multi-agent certificate** (`certflow.team`): `additive_certificate`
17
+ + `TeamCertificate` (`sum LB <= sum OPT <= sum UB`, union-bound confidence) —
18
+ the one TEAM-CERT survivor, ported over a shared conformal store.
19
+ - **LP-shift staleness** (`ConformalScorer(shift_model="lp", eps_lp, rho_lp)`,
20
+ arXiv 2502.14105): worst-case quantile `Quant(1-alpha+rho)+eps`. TV default.
21
+ - **CIA path-sum calibration** (`CIACalibrator`, `CertPlanner.cia_path_certificate`,
22
+ arXiv 2408.10939): group-sum path calibration with symmetric-calibration overlap
23
+ handling and the age-weighted drift retrofit. Bonferroni default.
24
+ - **SF-OGD ACI** (`ACITracker(mode="sf-ogd")`, arXiv 2302.07869): scale-free,
25
+ anytime step size. Fixed-gamma default.
26
+ - **PASC joint per-edge calibration** (`PASCCalibrator`, arXiv 2605.18812): one
27
+ `max`-score quantile prices all edges jointly at `>= 1-alpha`, replacing the
28
+ `alpha/L` per-edge Bonferroni correction.
29
+ - **Testability layer** — making the pinned-at-1.0 coverage observable:
30
+ `conformal_p_value` (WATCH Eq. 9), `ConformalTestMartingale` (WATCH, arXiv
31
+ 2505.04608; Ville alarm + tightness stress test), `conformal_e_value` /
32
+ `score_ratio_e_value` / `merge_e_values` (arXiv 2503.13050).
33
+ - **Drift diagnostics** (from DASC, arXiv 2606.15953, as observables only —
34
+ DASC's coverage bound is not distribution-free): `residual_drift_score`
35
+ (1-D Wasserstein `D_t`), `effective_sample_size` (Kish `n_eff`).
36
+ - **Live round-2 wiring** — the round-2 calibrators, previously standalone, are
37
+ now wired into the live `round()` loop behind `PlannerConfig` flags (all
38
+ default **OFF**; on/off produce a byte-identical `(lb, ub, confidence)`
39
+ stream):
40
+ - `watch_monitor=True` (+ `sr_threshold`): every new weighted conformal
41
+ p-value inside `ingest_observation` feeds `planner.watch`
42
+ (`ConformalTestMartingale`) **and** `planner.sr` (`ShiryaevRobertsDetector`);
43
+ `planner.diagnostics()` exposes the martingale value/alarm, SR peak/alarm,
44
+ the recent-vs-buffer residual drift score and the age-weights' effective
45
+ sample size. Purely observational — no certificate, no pricing change.
46
+ - `path_calibration="pasc"`: `_q()` prices edges with the PASC **joint** radius
47
+ live, falling back to Bonferroni during warm-up / while α-annealing pins the
48
+ level at 1. Uses the **signed** block-max, not `abs()`: the live buffer
49
+ already stores the drift-adjusted score `|obs−ĉ| − ρ·age`, so `abs()` (as the
50
+ standalone `pasc_edge_radius` applies to *raw* residuals) would double-count
51
+ the drift subtraction and inflate the radius.
52
+ - `scripts/run_live_wiring.py`, `tests/test_live_wiring.py`.
53
+ - **Real METR-LA benchmark of the wiring** (20 seeds × 288 rounds = one replay
54
+ day each; oracle = exact Dijkstra on the recording): **0.0000** coverage
55
+ violations in every mode; `watch_monitor` **quiet 20/20** on both detectors —
56
+ the pinned-at-1.0 coverage is now a live, monitored quantity at zero cost
57
+ (WATCH holds on real data). PASC is an **honest negative**: **+25.1 %** wider
58
+ median width than Bonferroni on real traffic (8797 → 11007 s), the opposite of
59
+ its 4.5 % synthetic-grid win — long optimistic paths (L ≈ 14–18) starve the
60
+ length-L block quantile, while Bonferroni's full-pooled per-edge quantile stays
61
+ better-resolved. Bonferroni stays default; PASC keeps its experimental flag.
62
+ Full suite **250 passed**. (`docs/results/live-wiring-2026.md`)
63
+ - `docs/results/multiagent.md`, `docs/related-work-2026.md` (positioning vs
64
+ arXiv 2601.03629 + the adopted machinery).
65
+
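The additive composition described in the 1.1.0 notes above (`sum LB <= sum OPT <= sum UB` with union-bound confidence) can be sketched as follows. This is a minimal illustration, not the `certflow.team` API: `AgentCert` and `additive_bounds` are hypothetical names, and the sketch assumes each per-agent interval holds with at least its stated confidence.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AgentCert:
    """Hypothetical per-agent certificate: LB <= OPT <= UB with prob >= confidence."""
    lb: float
    ub: float
    confidence: float


def additive_bounds(certs: Iterable[AgentCert]) -> AgentCert:
    """Sum the per-agent bounds and combine confidences with a union bound."""
    certs = list(certs)
    if not certs:
        raise ValueError("need at least one certificate")
    lb = sum(c.lb for c in certs)
    ub = sum(c.ub for c in certs)
    # Union bound: P(some agent's interval fails) <= sum of per-agent miss rates,
    # so the team interval holds with probability >= 1 - that sum (floored at 0).
    miss = sum(1.0 - c.confidence for c in certs)
    return AgentCert(lb, ub, max(0.0, 1.0 - miss))


team = additive_bounds([AgentCert(10.0, 14.0, 0.95), AgentCert(20.0, 27.0, 0.95)])
```

Because the union bound makes no independence assumption, the team confidence degrades linearly with fleet size; two agents at 0.95 give a team claim of roughly 0.90.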
66
+ ## [1.0.2] - 2026-06-10
67
+
68
+ Packaging and serialization fixes for the freshly published library.
69
+
70
+ ### Fixed
71
+ - `pytest` now discovers the `src/`-layout package on a fresh checkout: the
72
+ `[tool.pytest.ini_options]` `pythonpath` was `["."]` (repo root, no package
73
+ there), so `python -m pytest` failed with `ModuleNotFoundError: certflow`
74
+ unless `PYTHONPATH=src` was set or the package was installed. Set to
75
+ `["src"]`.
76
+ - `EpisodeResult.oracle_cost` is now serialized by `save_results` and restored
77
+ by `load_results`. It was dropped on save, so reloaded Tier-2 results came
78
+ back with `oracle_cost = nan` and any regret analysis silently reported NaN.
79
+ Legacy result files without the field still load (oracle_cost stays nan).
80
+
81
+ ### Added
82
+ - `realworld` optional extra (`pip install 'certflow[realworld]'`) declaring
83
+ the `pandas` and `tables` dependencies the METR-LA / PEMS-BAY traffic
84
+ adapter needs. The core install stays numpy/scipy only; `_load_traffic` now
85
+ raises a clear `ImportError` pointing at the extra when pandas is absent.
86
+
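The optional-extra behavior described above (a clear `ImportError` pointing at the extra when a dependency is absent) follows a common pattern, sketched below. `require_optional` is a hypothetical helper written for illustration; it is not the library's `_load_traffic` implementation.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str, package: str = "certflow") -> ModuleType:
    """Import an optional dependency, or raise an ImportError naming the extra to install."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with an actionable message; keep the original error as the cause.
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f"install it with: pip install '{package}[{extra}]'"
        ) from exc
```

A traffic loader could then call `require_optional("pandas", "realworld")` at the point of use, so the core install stays free of the heavy dependency.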
87
+ ## [1.0.1] - 2026-06-10
88
+
89
+ First PyPI release (`pip install certflow`).
90
+
91
+ ### Added
92
+ - Top-level package API: `from certflow import CertPlanner, PlannerConfig,
93
+ Certificate, ConformalScorer, ACITracker, EdgeBelief, World`, plus
94
+ `certflow.__version__` (previously `certflow/__init__.py` was empty and
95
+ everything had to be imported from submodules; the old submodule imports
96
+ still work).
97
+ - `CITATION.cff` (validated, concept DOI 10.5281/zenodo.20631475) and this
98
+ changelog.
99
+ - Full PyPI packaging metadata: readme, keywords, classifiers, project URLs.
100
+
101
+ ### Changed
102
+ - README: pip-based 30-second quickstart, static DOI badge pointing at the
103
+ concept DOI, link to the limitations ledger, Python badge corrected to
104
+ 3.10+ (matching `requires-python`).
105
+ - Package version aligned with the release tag (pyproject said 0.1.0 while
106
+ the repository was at v1.0.0).
107
+
108
+ ### Fixed
109
+ - Lint sweep over `src/`: removed unused imports and dead local assignments
110
+ (no behavior change; the full test suite passes bit-identically).
111
+
112
+ ## [1.0.0] - 2026-06-10
113
+
114
+ First public release, accompanying the preprint *CERT: Certified Route
115
+ Planning under Drifting Costs (Extended Version)*.
116
+
117
+ - Conformal route certificates (LB <= OPT <= UB) under drifting edge costs:
118
+ age-weighted non-exchangeable quantiles, staleness correction, honest
119
+ annealing.
120
+ - Certificate-directed sensing (route-critical, churn-aware) and dual
121
+ incremental search on a flat-array engine (numba kernels with pure-Python
122
+ fallback).
123
+ - Certificate-gated preprocessing: all-pairs snapshot oracle and certified
124
+ Contraction Hierarchies (ns-to-microsecond queries that expire under
125
+ drift).
126
+ - 200+ tests; 16 reproduction pipelines covering 17 synthetic regimes,
127
+ METR-LA / PEMS-BAY traffic replay, MovingAI maps, and DIMACS road
128
+ networks.
129
+ - Theory T1-T7 documented in `docs/` (coverage, certifiability threshold,
130
+ sum-aware certificate, impossibility of a tighter lower bound,
131
+ decision-uniform validity, churn floor).
132
+
133
+ [1.0.2]: https://github.com/Archerkattri/CERT-FLOW/releases/tag/v1.0.2
134
+ [1.0.1]: https://github.com/Archerkattri/CERT-FLOW/releases/tag/v1.0.1
135
+ [1.0.0]: https://github.com/Archerkattri/CERT-FLOW/releases/tag/v1.0.0
@@ -7,7 +7,8 @@ authors:
7
7
  - family-names: Attri
8
8
  given-names: Krishi
9
9
  email: krishiattriwork@gmail.com
10
- affiliation: "Seoul National University"
10
+ orcid: https://orcid.org/0009-0005-4695-6467
11
+ date-released: "2026-06-10"
11
12
  doi: 10.5281/zenodo.20631475
12
13
  url: "https://github.com/Archerkattri/CERT-FLOW"
13
14
  repository-code: "https://github.com/Archerkattri/CERT-FLOW"
@@ -28,3 +29,14 @@ abstract: >-
28
29
  preprocessing (all-pairs snapshot oracle, certified Contraction
29
30
  Hierarchies) behind the certificate so static-speed queries expire the
30
31
  instant drift exceeds tolerance.
32
+ preferred-citation:
33
+ type: article
34
+ title: "CERT: Certified Route Planning under Drifting Costs"
35
+ authors:
36
+ - family-names: Attri
37
+ given-names: Krishi
38
+ orcid: https://orcid.org/0009-0005-4695-6467
39
+ year: 2026
40
+ doi: 10.31224/7306
41
+ url: "https://doi.org/10.31224/7306"
42
+ notes: "engrXiv preprint, extended version"
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: certflow
3
- Version: 1.0.2
3
+ Version: 1.1.0
4
4
  Summary: Certified route planning under drifting costs: conformal LB<=OPT<=UB certificates, certificate-directed sensing, proof-gated preprocessing
5
5
  Project-URL: Homepage, https://github.com/Archerkattri/CERT-FLOW
6
6
  Project-URL: Repository, https://github.com/Archerkattri/CERT-FLOW
@@ -22,10 +22,14 @@ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
22
22
  Requires-Python: >=3.10
23
23
  Requires-Dist: numpy>=1.24
24
24
  Requires-Dist: scipy>=1.10
25
+ Provides-Extra: bench
26
+ Requires-Dist: networkx>=3.0; extra == 'bench'
25
27
  Provides-Extra: dev
26
28
  Requires-Dist: pytest>=7.0; extra == 'dev'
27
29
  Provides-Extra: fast
28
- Requires-Dist: numba; extra == 'fast'
30
+ Requires-Dist: numba>=0.59; extra == 'fast'
31
+ Provides-Extra: plots
32
+ Requires-Dist: matplotlib>=3.7; extra == 'plots'
29
33
  Provides-Extra: realworld
30
34
  Requires-Dist: pandas>=1.5; extra == 'realworld'
31
35
  Requires-Dist: tables>=3.7; extra == 'realworld'
@@ -35,13 +39,17 @@ Description-Content-Type: text/markdown
35
39
 
36
40
  <p align="center">
37
41
  <a href="https://pypi.org/project/certflow/"><img alt="PyPI" src="https://img.shields.io/pypi/v/certflow?color=009E73"></a>
38
- <a href="#reproducing-every-number"><img alt="tests" src="https://img.shields.io/badge/tests-227%20passing-0072B2"></a>
42
+ <a href="#reproducing-every-number"><img alt="tests" src="https://img.shields.io/badge/tests-200%2B%20passing-0072B2"></a>
39
43
  <img alt="python" src="https://img.shields.io/badge/python-3.10%2B-56B4E9">
40
44
  <img alt="license" src="https://img.shields.io/badge/license-MIT-1a7f37">
41
45
  <img alt="coverage claim" src="https://img.shields.io/badge/certificate%20coverage-1.000%20measured-D55E00">
42
- <a href="https://doi.org/10.5281/zenodo.20631475"><img alt="DOI" src="https://img.shields.io/badge/DOI-10.5281%2Fzenodo.20631475-1f6feb"></a>
46
+ <a href="https://zenodo.org/badge/latestdoi/1265150144"><img alt="DOI" src="https://zenodo.org/badge/1265150144.svg"></a>
47
+ <a href="https://doi.org/10.31224/7306"><img alt="engrXiv paper" src="https://img.shields.io/badge/engrXiv-10.31224%2F7306-009E73"></a>
48
+ <a href="https://archerkattri.github.io/CERT-FLOW/"><img alt="project page" src="https://img.shields.io/badge/project%20page-live-56B4E9"></a>
43
49
  </p>
44
50
 
51
+ <p align="center"><b><a href="https://archerkattri.github.io/CERT-FLOW/">🌐 Project page &amp; videos →</a></b></p>
52
+
45
53
  A robot replanning through a world whose costs drift faces a question classical
46
54
  planners never answer: **how good is my current route, given that most of the
47
55
  map is stale?** CERT-FLOW answers it every round, with a proof: a
@@ -54,13 +62,24 @@ certificate says the gap shrinks fastest.
54
62
 
55
63
  ## Why it's different
56
64
 
57
- | | classical replanning (D\* Lite, AD\*) | exchangeable conformal (CIA) | **CERT-FLOW** |
65
+ | property | classical replanning (D\* Lite, AD\*) | exchangeable conformal (CIA) | 🏆 **CERT-FLOW** (wins) |
58
66
  |---|---|---|---|
59
67
  | stale map | silently trusts it | coverage collapses (0.95 → **0.20** measured) | **prices it**: width grows with age, claim degrades visibly |
60
68
  | validity under drift | 0.02–0.59 measured | gap-dependent | **0.95–1.00, every condition ever run** |
61
69
  | sensing | none / heuristic | none | **certificate-directed** (oracle-level regret) |
62
70
  | static regime | fast | tight | **proof-gated preprocessing**: ns–µs queries that self-expire |
63
71
 
72
+ ### Results at a glance — CERT-FLOW wins every metric
73
+
74
+ | metric | better is | **CERT-FLOW** | best alternative | winner |
75
+ |---|:--:|---|---|:--:|
76
+ | certificate coverage | higher ↑ | **1.000** (every condition) | AD\* 0.02–0.59 · CIA → 0.20 | 🏆 CERT |
77
+ | travel-regret, unknown terrain | lower ↓ | **−0.12** (≈ clairvoyant oracle) | VOI 0.47 · freshness/blind 4–7 | 🏆 CERT |
78
+ | fully-certified round @ 60×60 | lower ↓ | **3.7 ms** p50 / 12 ms p95 | — no certified planner reports one | 🏆 CERT |
79
+ | road cost-change absorption | lower ↓ | **0.015–0.34 ms** | CRP ≈ 1 s | 🏆 CERT |
80
+
81
+ *“better is” shows the metric direction (↑ higher / ↓ lower); **bold** = best value; 🏆 = winner. Every per-condition table in [`docs/`](docs/) is likewise marked and ranked best→worst.*
82
+
64
83
  ## Headline results (all reproducible below)
65
84
 
66
85
  - **Coverage ≥ claimed confidence on every condition ever run**: 17 synthetic
@@ -87,6 +106,65 @@ certificate says the gap shrinks fastest.
87
106
  help. Every known limitation and its disposition:
88
107
  [docs/results/limitations.md](docs/results/limitations.md).
89
108
 
109
+ ## 2026 upgrades (opt-in)
110
+
111
+ Everything here is **off by default**: the single-agent certificate output is
112
+ byte-identical and its guarantees are unchanged; each addition is a new class or a config flag
113
+ you opt into. Derivations in
114
+ [docs/related-work-2026.md](docs/related-work-2026.md); the API in the
115
+ [CHANGELOG](CHANGELOG.md).
116
+
117
+ - **Additive multi-agent certificate** — `certflow.team.additive_certificate`
118
+ composes per-agent certificates into a fleet-level `ΣLB ≤ ΣOPT ≤ ΣUB`
119
+ (union-bound confidence). The one TEAM-CERT variant that survived scrutiny.
120
+ ([docs/results/multiagent.md](docs/results/multiagent.md))
121
+ - **2025 conformal machinery, retrofitted with our age weights** — LP-shift
122
+ staleness (`ConformalScorer(shift_model="lp")`,
123
+ [arXiv 2502.14105](https://arxiv.org/abs/2502.14105)); a scale-free SF-OGD step
124
+ for the ACI net (`ACITracker(mode="sf-ogd")`,
125
+ [arXiv 2302.07869](https://arxiv.org/abs/2302.07869)); CIA path-*sum*
126
+ calibration (`CIACalibrator`, `CertPlanner.cia_path_certificate()`,
127
+ [arXiv 2408.10939](https://arxiv.org/abs/2408.10939)); and PASC joint
128
+ *per-edge* radius (`PASCCalibrator`, `CertPlanner.pasc_edge_radius()`,
129
+ [arXiv 2605.18812](https://arxiv.org/abs/2605.18812)) — one `max`-score
130
+ quantile prices every edge at `≥ 1-α`, replacing the `α/L` Bonferroni
131
+ correction. TV, fixed-γ, and Bonferroni stay the defaults.
132
+ - **Testability layer** — makes the pinned-at-1.0 coverage *observable*:
133
+ `conformal_p_value` + `ConformalTestMartingale` (WATCH,
134
+ [arXiv 2505.04608](https://arxiv.org/abs/2505.04608) — a Ville-bounded validity
135
+ monitor plus a tightness stress test), `ShiryaevRobertsDetector` for
136
+ late-change detection, conformal e-values and admissible merging
137
+ (`conformal_e_value`, `merge_e_values`,
138
+ [arXiv 2503.13050](https://arxiv.org/abs/2503.13050)), and drift diagnostics
139
+ `residual_drift_score` / `effective_sample_size` (from DASC,
140
+ [arXiv 2606.15953](https://arxiv.org/abs/2606.15953) — observables only: DASC's
141
+ own bound isn't distribution-free, so it never touches the coverage-critical
142
+ weights).
143
+ - **Demonstration** — `scripts/run_watch_testability.py` on streams with known
144
+ ground truth: the validity monitor stays flat (coverage tracks `1-α`); the
145
+ Shiryaev-Roberts detector catches a sharp regime shift ~7 rounds after it,
146
+ where the plain martingale — decayed over a long null — misses it; and the
147
+ Bonferroni-vs-PASC width gap shows up **only under positive edge correlation**
148
+ (ρ=0.9, L=20: Bonferroni over-covers 0.97, PASC holds ~0.91 at **16.5% less
149
+ width**). Under independence PASC barely helps — stated, not hidden.
150
+ - **Live-wired and benchmarked on real data** —
151
+ `PlannerConfig(watch_monitor=True)` runs the WATCH martingale +
152
+ Shiryaev-Roberts detector inside `round()` (read via `planner.diagnostics()`),
153
+ and `path_calibration="pasc"` prices edges with the joint radius live; both
154
+ default **off** and change no certificate (`(lb, ub, confidence)` is
155
+ byte-identical). On **real METR-LA** (20 seeds × 288 rounds) the certificate
156
+ holds at **0.0000** violations in every mode, both detectors stay **quiet
157
+ 20/20** (coverage is now a live, monitored quantity), and PASC is an honest
158
+ negative: **+25.1 % wider** than Bonferroni on real traffic, the opposite of
159
+ its synthetic-grid win, so Bonferroni stays the default.
160
+ ([docs/results/live-wiring-2026.md](docs/results/live-wiring-2026.md))
161
+
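The changelog states the LP-shift worst-case quantile as `Quant(1-alpha+rho)+eps`. The sketch below is one literal reading of that formula on an unweighted score buffer. It is an assumption-laden illustration: the function names are hypothetical, the quantile rule (the `ceil(level * n)`-th order statistic) is a plain empirical choice, and the library's age weighting and finite-sample correction are not reproduced here.

```python
import math
from typing import Sequence


def empirical_quantile(scores: Sequence[float], level: float) -> float:
    """Return the ceil(level * n)-th smallest score; +inf if the level exceeds 1."""
    n = len(scores)
    if n == 0 or level > 1.0 + 1e-12:
        return math.inf
    # Small tolerance so float noise in level * n does not bump the rank by one.
    k = max(1, math.ceil(level * n - 1e-9))
    return sorted(scores)[min(k, n) - 1]


def lp_shifted_quantile(scores: Sequence[float], alpha: float, rho: float, eps: float) -> float:
    """Literal reading of Quant(1 - alpha + rho) + eps (rho, eps are assumed shift budgets)."""
    return empirical_quantile(scores, 1.0 - alpha + rho) + eps
```

With scores 1 through 20, `alpha=0.1`, `rho=0.05` and `eps=0.5`, the level rises from 0.90 to 0.95 and the radius from 18 to 19.5; once `rho` pushes the level past 1 the radius becomes infinite, meaning no finite claim is made.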
162
+ <p align="center"><img src="assets/live_wiring_2026.png" alt="Left: certified width on real METR-LA — the PASC joint radius is 25% wider than the default Bonferroni, both at zero violations. Right: the Shiryaev-Roberts validity monitor stays quiet under the correct model and fires 7 rounds after an injected regime shift." width="100%"/></p>
163
+
164
+ <p align="center"><em>The 2026 layer on real data. <b>Left</b> — on METR-LA the joint PASC radius is <b>+25.1 % wider</b> than the default per-edge Bonferroni (both at 0.0000 violations): long paths starve the length-L block quantile, so PASC keeps its experimental flag. <b>Right</b> — soundness is now <b>observable</b>: the Shiryaev-Roberts statistic stays below its alarm threshold under the correctly-modelled null (quiet on 20/20 real seeds) and crosses it ~7 rounds after an injected regime shift, at zero cost to the certificate. Regenerate with <code>scripts/viz_gen/live_wiring_fig.py</code>.</em></p>
165
+
166
+ With these in, the full suite is **250 passing** (the default path unchanged).
167
+
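The `effective_sample_size` diagnostic listed among the upgrades is described as the Kish `n_eff`. A minimal sketch of that statistic, assuming the standard definition `(sum w)^2 / sum(w^2)`, is shown below. The geometric age weighting is a hypothetical stand-in; the weights the library actually uses are not shown in this diff.

```python
from typing import Sequence


def kish_effective_sample_size(weights: Sequence[float]) -> float:
    """Kish n_eff = (sum w)^2 / sum(w^2); equals n when all weights are equal."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    if s2 == 0.0:
        return 0.0
    return s1 * s1 / s2


def geometric_age_weights(ages: Sequence[int], decay: float) -> list[float]:
    """Hypothetical age weighting: each observation is down-weighted by decay ** age."""
    return [decay ** a for a in ages]


# 100 observations with a halving weight per round of age behave like about 3 fresh ones.
n_eff = kish_effective_sample_size(geometric_age_weights(range(100), 0.5))
```

A small `n_eff` relative to the buffer length signals that the quantile is effectively resting on only a few recent observations.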
90
168
  ## Quickstart
91
169
 
92
170
  ```bash
@@ -111,7 +189,7 @@ To develop or reproduce the paper numbers, work from a clone:
111
189
  git clone https://github.com/Archerkattri/CERT-FLOW && cd CERT-FLOW
112
190
  python -m venv cert_env && source cert_env/bin/activate
113
191
  pip install -e ".[dev,fast,realworld]" h5py
114
- pytest # full suite: 227 with datasets; data-dependent tests skip cleanly without data/
192
+ pytest # full suite: 200+ tests (more with datasets); data-dependent tests skip cleanly without data/
115
193
  ```
116
194
 
117
195
  ## Reproducing every number
@@ -137,9 +215,51 @@ a multicore machine (`CERTFLOW_WORKERS=N` parallelizes seeds bit-identically).
137
215
  | Scale + engine benchmarks | `scripts/run_scale.py` | `docs/results/scale.md` |
138
216
  | Road networks (DIMACS NY/FLA, ALT) | `scripts/run_roadnet.py` | `docs/results/published-speed-comparison.md` |
139
217
  | Certified Contraction Hierarchies | `scripts/run_ch.py` | `docs/results/published-speed-comparison.md` |
218
+ | Extended validation (baselines, stress, scaling) | `scripts/extval/*.py` | `docs/results/extended-validation.md` |
219
+ | FoMo off-road seasonal drift | `scripts/extval/fomo_validation.py` | `docs/results/extended-validation.md` (§6) |
220
+ | Comparison videos | `scripts/viz_compare.py`, `scripts/viz_gen/*.py` | `site/` (project page) |
140
221
 
141
222
  All scripts accept `--quick`. Real-data runs need `data/` (sources and loaders
142
- in `data/README.md`; ~230 MB total, links inside).
223
+ in `data/README.md`; ~230 MB + optional FoMo cost-signal ~150 MB, links inside).
224
+
225
+ ## Videos
226
+
227
+ Honest side-by-side comparisons — every clip replays a **real run** and the
228
+ coverage/regret numbers shown are measured, not staged (warm-up rounds are
229
+ drawn as "no claim", never counted as misses). Generators:
230
+ `scripts/viz_compare.py` + `scripts/viz_gen/`; MP4 + supplementary reel in
231
+ [`assets/videos/`](assets/videos).
232
+
233
+ **The certificate that holds vs. the one that breaks** — CERT's band contains
234
+ the true optimum every round it claims; AD\*'s w-suboptimality band, trusting
235
+ stale point estimates, drifts out of date.
236
+
237
+ ![certificate holds vs breaks, synthetic drift](assets/videos/cert-break-grid.gif)
238
+
239
+ *Synthetic drifting grid — CERT coverage 100% vs AD\* 43% (60 rounds).*
240
+
241
+ ![certificate holds vs breaks, real MovingAI map](assets/videos/cert-break-movingai.gif)
242
+
243
+ *Real MovingAI map (DAO arena) — CERT 100% vs AD\* 42%.*
244
+
245
+ **Sensing that pays** — gap-directed sensing converges near a clairvoyant
246
+ oracle; random / max-age / drive-blind wander at equal budget.
247
+
248
+ ![sensing that pays, synthetic](assets/videos/sensing-grid.gif)
249
+
250
+ *Unknown drifting terrain — CERT travel-regret 1.96 vs random 7.84 / max-age 4.89 / blind 5.43 (15 seeds).*
251
+
252
+ ![sensing that pays, MovingAI](assets/videos/sensing-movingai.gif)
253
+
254
+ *Real arena map — CERT 1.71, lowest of all policies.*
255
+
256
+ **Exchangeability collapse under staleness** — exchangeable conformal (CIA,
257
+ its own construction) covers on the static slice it assumes, then collapses;
258
+ CERT widens to hold coverage.
259
+
260
+ ![CIA collapse vs CERT, METR-LA](assets/videos/staleness-metrla.gif)
261
+
262
+ *METR-LA — CIA coverage 0.88 → 0.25 → 0.38 (width frozen) vs CERT ~1.0 (width grows).*
143
263
 
144
264
  ## How it works
145
265
 
@@ -170,6 +290,8 @@ src/certflow/
170
290
  ch.py certified Contraction Hierarchies (231 µs on 264k-node NY)
171
291
  roadnet.py DIMACS road graphs + exact ALT on landmark lower-bounds
172
292
  drift.py / realworld.py / movingai.py synthetic, traffic-replay, game maps
293
+ scripts/extval/ extended validation (baselines, stress, scaling, FoMo)
294
+ scripts/viz_gen/ comparison-video generators; site/ = project page
173
295
  episodes.py / harness.py / baselines.py runners, seeds, parametric strawman
174
296
  docs/results/ one markdown per experiment: numbers, anomalies, verdicts
175
297
  docs/specs/ design spec; docs/theory/ working notes
@@ -177,8 +299,8 @@ docs/specs/ design spec; docs/theory/ working notes
177
299
 
178
300
  ## Citation
179
301
 
180
- Paper: *CERT: Certified Route Planning under Drifting Costs* (preprint
181
- forthcoming; the citation entry will be updated once the DOI is live).
302
+ Paper (engrXiv preprint): *CERT: Certified Route Planning under Drifting Costs*,
303
+ [doi:10.31224/7306](https://doi.org/10.31224/7306).
182
304
 
183
305
  ```bibtex
184
306
  @software{attri2026certflow,
@@ -194,8 +316,10 @@ The DOI above is the concept DOI (always resolves to the latest archived
194
316
  version); [CITATION.cff](CITATION.cff) carries the same metadata in
195
317
  machine-readable form.
196
318
 
197
- Paper preprint (extended version): DOI pending engrXiv moderation; this
198
- section will carry it once posted.
319
+ Paper preprint (extended version): engrXiv,
320
+ [doi:10.31224/7306](https://doi.org/10.31224/7306).
321
+
322
+ **Author:** Krishi Attri ([ORCID](https://orcid.org/0009-0005-4695-6467) · [Google Scholar](https://scholar.google.com/citations?hl=en&user=VW1YUNYAAAAJ))
199
323
 
200
324
  ## License
201
325