voidforge-build 23.9.2 → 23.11.0

Files changed (75)
  1. package/dist/.claude/agents/bashir-field-medic.md +1 -0
  2. package/dist/.claude/agents/coulson-release.md +3 -0
  3. package/dist/.claude/agents/irulan-historian.md +3 -0
  4. package/dist/.claude/agents/kusanagi-devops.md +8 -0
  5. package/dist/.claude/agents/leia-secrets.md +10 -0
  6. package/dist/.claude/agents/loki-chaos.md +1 -0
  7. package/dist/.claude/agents/picard-architecture.md +11 -0
  8. package/dist/.claude/agents/silver-surfer-herald.md +17 -0
  9. package/dist/.claude/agents/sisko-campaign.md +3 -0
  10. package/dist/.claude/agents/thufir-protocol-parsing.md +10 -0
  11. package/dist/.claude/commands/architect.md +56 -0
  12. package/dist/.claude/commands/campaign.md +26 -1
  13. package/dist/.claude/commands/deploy.md +31 -0
  14. package/dist/.claude/commands/gauntlet.md +11 -0
  15. package/dist/.claude/commands/git.md +13 -3
  16. package/dist/.claude/commands/prd.md +8 -0
  17. package/dist/CHANGELOG.md +107 -0
  18. package/dist/CLAUDE.md +13 -4
  19. package/dist/VERSION.md +3 -1
  20. package/dist/docs/methods/AI_INTELLIGENCE.md +15 -0
  21. package/dist/docs/methods/BACKEND_ENGINEER.md +48 -0
  22. package/dist/docs/methods/BUILD_PROTOCOL.md +19 -0
  23. package/dist/docs/methods/CAMPAIGN.md +204 -1
  24. package/dist/docs/methods/DEVOPS_ENGINEER.md +80 -0
  25. package/dist/docs/methods/FORGE_KEEPER.md +80 -3
  26. package/dist/docs/methods/GAUNTLET.md +2 -0
  27. package/dist/docs/methods/PRD_GENERATOR.md +15 -0
  28. package/dist/docs/methods/QA_ENGINEER.md +46 -0
  29. package/dist/docs/methods/RELEASE_MANAGER.md +59 -0
  30. package/dist/docs/methods/SECURITY_AUDITOR.md +53 -0
  31. package/dist/docs/methods/SPEC_HANDOFF.md +53 -0
  32. package/dist/docs/methods/SUB_AGENTS.md +90 -0
  33. package/dist/docs/methods/SYSTEMS_ARCHITECT.md +55 -2
  34. package/dist/docs/methods/TESTING.md +17 -0
  35. package/dist/docs/methods/TIME_VAULT.md +17 -0
  36. package/dist/docs/methods/TROUBLESHOOTING.md +27 -0
  37. package/dist/docs/patterns/adr-verification-gate.md +80 -0
  38. package/dist/docs/patterns/ai-eval.ts +87 -0
  39. package/dist/docs/patterns/ai-prompt-safety.ts +242 -0
  40. package/dist/docs/patterns/audit-log.ts +132 -0
  41. package/dist/docs/patterns/deploy-preflight.ts +195 -0
  42. package/dist/docs/patterns/llm-state-dedup.ts +246 -0
  43. package/dist/docs/patterns/middleware.ts +83 -0
  44. package/dist/docs/patterns/multi-tenant-pool-bypass.ts +134 -0
  45. package/dist/docs/patterns/multi-tenant-property-test.ts +127 -0
  46. package/dist/docs/patterns/refactor-extraction.md +96 -0
  47. package/dist/scripts/voidforge.js +0 -0
  48. package/dist/wizard/lib/anomaly-detection.d.ts +59 -0
  49. package/dist/wizard/lib/anomaly-detection.js +122 -0
  50. package/dist/wizard/lib/asset-scanner.d.ts +23 -0
  51. package/dist/wizard/lib/asset-scanner.js +107 -0
  52. package/dist/wizard/lib/build-analytics.d.ts +39 -0
  53. package/dist/wizard/lib/build-analytics.js +91 -0
  54. package/dist/wizard/lib/codegen/erd-gen.d.ts +16 -0
  55. package/dist/wizard/lib/codegen/erd-gen.js +98 -0
  56. package/dist/wizard/lib/codegen/openapi-gen.d.ts +15 -0
  57. package/dist/wizard/lib/codegen/openapi-gen.js +79 -0
  58. package/dist/wizard/lib/codegen/prisma-types.d.ts +15 -0
  59. package/dist/wizard/lib/codegen/prisma-types.js +44 -0
  60. package/dist/wizard/lib/codegen/seed-gen.d.ts +16 -0
  61. package/dist/wizard/lib/codegen/seed-gen.js +128 -0
  62. package/dist/wizard/lib/correlation-engine.d.ts +59 -0
  63. package/dist/wizard/lib/correlation-engine.js +152 -0
  64. package/dist/wizard/lib/desktop-notify.d.ts +27 -0
  65. package/dist/wizard/lib/desktop-notify.js +98 -0
  66. package/dist/wizard/lib/image-gen.d.ts +56 -0
  67. package/dist/wizard/lib/image-gen.js +159 -0
  68. package/dist/wizard/lib/natural-language-deploy.d.ts +30 -0
  69. package/dist/wizard/lib/natural-language-deploy.js +186 -0
  70. package/dist/wizard/lib/project-init.js +57 -0
  71. package/dist/wizard/lib/route-optimizer.d.ts +28 -0
  72. package/dist/wizard/lib/route-optimizer.js +93 -0
  73. package/dist/wizard/lib/service-install.d.ts +18 -0
  74. package/dist/wizard/lib/service-install.js +182 -0
  75. package/package.json +1 -1
@@ -132,6 +132,21 @@ Ask the user: "Does your product use AI or LLM features? If yes: What models? Wh
132
132
  - Backup strategy
133
133
  - Complete environment variable list
134
134
 
135
+ ### Cloudflare Pages deploy safety (required for `deploy: cloudflare` projects)
136
+
137
+ Projects with `deploy: "cloudflare"` (or the static-site variant) MUST include:
138
+
139
+ 1. **`wrangler.toml`** with `pages_build_output_dir = "./dist"` (or the project's actual build output directory). This makes the deploy surface explicit.
140
+ 2. **Deploy command uses the output directory, not `.`** — always `wrangler pages deploy ./dist`, never `wrangler pages deploy .`. The dot path uploads the entire repo root including `.env`, `.claude/`, `docs/methods/`, `logs/`. `.gitignore` is IGNORED in Direct Upload mode.
141
+ 3. **`.cfignore`** (repo root) that excludes `.claude/`, `docs/methods/`, `docs/patterns/`, `HOLOCRON.md`, `CHANGELOG.md`, `VERSION.md`, `logs/`. Defense in depth.
142
+ 4. **`SECURITY.md`** (repo root) with a coordinated-disclosure contact.
143
+ 5. **`public/.well-known/security.txt`** pointing at the same contact.
144
+ 6. **Dedicated build output directory** — `dist/`, `build/`, `out/`, or `site/`. Never repo root.
145
+
146
+ The PRD generator MUST emit these files / entries in the Infrastructure / Deployment section for any Cloudflare Pages target.
147
+
148
+ Evidence: field report #305 documents a 32-day live credential leak caused by running `wrangler pages deploy .` from the repo root. The same exposure affects every VoidForge-generated project that deploys to Cloudflare Pages via Direct Upload, and structurally similar deploys through Netlify CLI, Vercel CLI, Firebase CLI, and `aws s3 sync`. See `docs/methods/DEVOPS_ENGINEER.md` §Deploy Surface Boundary.
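A minimal sketch of how rules 2 and 6 above could be checked before a deploy runs. The function name, the allow-list constant, and the argument parsing are assumptions made for illustration; they are not part of VoidForge or of Wrangler.

```python
from pathlib import Path
from typing import Optional, Sequence

# Hypothetical allow-list taken from rule 6 above (dist/, build/, out/, site/).
ALLOWED_OUTPUT_DIRS = {"dist", "build", "out", "site"}


def deploy_target_problem(argv: Sequence[str], repo_root: Path) -> Optional[str]:
    """Return a reason to block the deploy, or None if the target looks safe.

    Only understands the `wrangler pages deploy <dir>` shape. The first
    non-flag argument is treated as the directory, so a flag value placed
    before the directory would be misread by this sketch.
    """
    if list(argv[:3]) != ["wrangler", "pages", "deploy"]:
        return None
    positional = [arg for arg in argv[3:] if not arg.startswith("-")]
    if not positional:
        return "no output directory given"
    target = (repo_root / positional[0]).resolve()
    if target == repo_root.resolve():
        return "deploy target is the repo root"
    if target.name not in ALLOWED_OUTPUT_DIRS:
        return f"'{target.name}' is not a dedicated build output directory"
    return None
```

A wrapper script could call this with the intended command line and refuse to continue when it returns a reason.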
149
+
135
150
  ## 16. Launch Sequence
136
151
  - Phased build plan (what gets built in what order)
137
152
  - Each phase has: scope, dependencies, and "done" criteria
@@ -118,6 +118,22 @@ When the project targets mobile platforms, add these to the attack plan:
118
118
  - **App lifecycle:** Background → foreground. Verify state restored (form input, scroll position, auth token). Test after 30min background.
119
119
  - **Platform differences:** Test on both iOS and Android if cross-platform. Verify platform-specific components render correctly.
120
120
 
121
+ ### Multi-Tenant Retrofit Smell (regression checklist)
122
+
123
+ For any project with `org_id`, `tenant_id`, or `workspace_id` columns, run this grep before declaring a QA pass green:
124
+
125
+ ```bash
+ grep -rnE "(\bor\s+1\b|org_id\s*:\s*int\s*=\s*1|org_id\s*:\s*int\s*\|\s*None|org_id\s*=\s*None|tenant_id\s*=\s*None|workspace_id\s*=\s*None)" \
+   --include="*.py" --include="*.ts" --include="*.tsx" \
+   --exclude-dir=node_modules --exclude-dir=.venv --exclude-dir=tests .
+ ```
130
+
131
+ Each hit is either intentionally cross-tenant (system endpoints, admin tools — must have authorization comment naming the policy) or retrofit residue (a fallback predating the multi-tenant migration — CRITICAL, fix before sign-off). See `SECURITY_AUDITOR.md` for the full smell discussion. The pattern recurred across 6 Union Station campaigns despite previous "we already fixed that" claims (field report #315 M2). Trust the grep, not memory.
132
+
133
+ ### Production-Backend Parity Check
134
+
135
+ Before declaring any QA pass green, verify that the test execution backend matches the production backend declared in `PROJECT_VERSION.md` / `CLAUDE.md` Stack section. Concrete check: read `tests/conftest.py` (Python) or equivalent test bootstrap; if it pins a non-prod backend (e.g., `_backend = "sqlite"` while prod is PostgreSQL since version X), this is a CRITICAL finding and the QA pass FAILS regardless of green test counts. Tests pinned to the wrong backend exercise none of the production-relevant integrations (RLS, asyncpg, advisory locks, LISTEN/NOTIFY, FOR UPDATE SKIP LOCKED) and silently mask production behavior. Field report #315 M3: this slipped past 4 dual-backend Gauntlets on Union Station before being caught at /assess. See `GAUNTLET.md` for the Gauntlet-side exit criterion.
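The parity check could be sketched roughly as follows. It assumes the test bootstrap pins its backend with a line shaped like `_backend = "sqlite"`, as in the example above; the function names and the finding text are hypothetical, and the production backend is passed in rather than parsed from `PROJECT_VERSION.md`.

```python
import re
from typing import Optional

# Assumed convention: a module-level assignment such as `_backend = "sqlite"`.
_BACKEND_PIN = re.compile(r'^\s*_backend\s*=\s*["\'](\w+)["\']', re.MULTILINE)


def pinned_backend(conftest_source: str) -> Optional[str]:
    """Return the backend name pinned in the test bootstrap, if any."""
    match = _BACKEND_PIN.search(conftest_source)
    return match.group(1).lower() if match else None


def parity_finding(conftest_source: str, production_backend: str) -> Optional[str]:
    """Return a CRITICAL finding when tests are pinned to a non-prod backend."""
    pinned = pinned_backend(conftest_source)
    if pinned is None:
        return None  # nothing pinned; this sketch cannot judge either way
    if pinned != production_backend.lower():
        return f"CRITICAL: tests pinned to {pinned}, production is {production_backend}"
    return None
```

Names are compared literally, so `postgres` and `postgresql` would count as a mismatch; a real check would need an alias table.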
136
+
121
137
  ### API Boundary Type Verification
122
138
 
123
139
  When the backend (Python, Go, Rust) and frontend (JavaScript) use different type systems, verify that types survive the API boundary correctly. Common gotcha: Python `bool` (`True`/`False`) becomes JSON `true`/`false` when serialized properly, but if the backend emits Python's string representation instead, both `"True"` and `"False"` arrive as non-empty strings, and every non-empty string is truthy in JS. Check: Does the frontend compare API boolean values with `===` (strict) or `==` (loose)? Does the backend serialize booleans as JSON booleans or as strings? This catches "it works in Python tests but breaks in the browser" bugs. (Field report #66)
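A small illustration of the two serialization paths, using only the standard `json` module. The `enabled` key is a made-up field name.

```python
import json

flag = False

# Correct: json.dumps emits a JSON boolean, which the browser parses to `false`.
as_json = json.dumps({"enabled": flag})

# Buggy: str() produces the Python representation, so the client receives the
# non-empty string "False", which is truthy in JavaScript.
as_string = json.dumps({"enabled": str(flag)})

print(as_json)    # {"enabled": false}
print(as_string)  # {"enabled": "False"}
```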
@@ -202,6 +218,36 @@ For services that maintain runtime state (caches, connection pools, scheduled jo
202
218
  Grep for `strftime`, `format(`, `toISOString`, `new Date().to` calls and verify they use the project's canonical timestamp format (typically `%Y-%m-%dT%H:%M:%SZ` or ISO 8601). Flag any non-canonical format strings. Non-canonical timestamps cause: cache TTL bugs (string comparison fails), sorting issues, and cross-system timestamp mismatches.
203
219
  (Field report #21: cache used `%Y-%m-%d %H:%M:%S` while all other code used `%Y-%m-%dT%H:%M:%SZ` — cache effectively never expired.)
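To see why mixed formats break string comparison, consider this sketch. It is an illustration of the failure class only and does not reproduce the cache logic from field report #21; the variable names are invented.

```python
from datetime import datetime, timezone

CANONICAL = "%Y-%m-%dT%H:%M:%SZ"
NON_CANONICAL = "%Y-%m-%d %H:%M:%S"

written_at = datetime(2024, 1, 1, 23, 59, 59, tzinfo=timezone.utc)
checked_at = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)

stored = written_at.strftime(NON_CANONICAL)  # "2024-01-01 23:59:59"
now = checked_at.strftime(CANONICAL)         # "2024-01-01T00:00:00Z"

# The instants say written_at is later, but the strings disagree: at index 10
# the space (0x20) sorts before "T" (0x54), so the separator decides the
# comparison and the time of day is never consulted.
strings_say_stored_is_older = stored < now
instants_say_stored_is_older = written_at < checked_at
```

Comparing parsed `datetime` values, or enforcing one format everywhere, avoids the problem.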
204
220
 
221
+ ### Strict-Mode Audit Classification
222
+
223
+ When the codebase ships under any strict-mode setting (bash `set -euo pipefail`, TypeScript `strict: true`, Python `-W error`, Rust `#![deny(warnings)]`, Ruby `frozen_string_literal: true`), no QA finding involving language syntax, undefined-variable references, arithmetic expansion, type coercion, or null/undefined handling may be classified as **WARN/cosmetic** without behavioral evidence.
224
+
225
+ **Behavioral evidence requires ONE of:**
226
+
227
+ 1. **Unreachable-by-gate proof** — the code path is provably unreachable under any input, cited by the specific gate that excludes it (e.g., "this branch is guarded by `if (process.env.NODE_ENV !== 'production')` and the audit fixture pins production").
228
+ 2. **Real-path test under the same strict-mode flags as production** — the reviewer ran the non-dry-run code path with the production strict-mode settings and observed no failure. Static reading alone is not sufficient.
229
+
230
+ The audit's strict-mode MUST match the script's strict-mode. A reviewer running tests with `set +u` (not strict) cannot classify a `set -u` script's undefined-variable risk as cosmetic: production promotes that risk to a fatal error on a path the audit never exercised.
231
+
232
+ Field report #330 (threadplex-ops): a sub-agent reviewer flagged `$(( rc_failures(rc) ))` as "cosmetic WARN — always returns 0." The function used C-style call syntax inside arithmetic expansion, which bash doesn't support — under `set -u` it parsed as undefined-variable reference and aborted immediately. The dry-run path didn't execute the branch (only reached on real label-PUT failures), so the bug was invisible during review. First real `--once` invocation tagged 100 items, then crashed mid-batch.
233
+
234
+ The audit classification was wrong, and a wrong classification lets the bug bypass the fix batch entirely. The orchestrator MUST NOT unblock a fix-batch on a WARN/cosmetic classification that lacks one of the two evidence requirements above.
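A Python analogue of the mismatch, offered as a sketch rather than a reproduction of the bash incident. The `"error"` filter action stands in for running under `-W error`; `legacy_call` and `run_under` are invented names.

```python
import warnings


def legacy_call() -> int:
    # Stand-in for any code path that warns on a rarely exercised branch.
    warnings.warn("legacy_call is deprecated", DeprecationWarning)
    return 0


def run_under(filter_action: str) -> str:
    """Run legacy_call with the given warnings filter and report the outcome."""
    with warnings.catch_warnings():
        warnings.simplefilter(filter_action)
        try:
            legacy_call()
        except DeprecationWarning:
            return "fatal"
    return "ok"
```

Here `run_under("ignore")` returns `"ok"` while `run_under("error")` returns `"fatal"`: the same code, audited under the lenient setting, looks harmless even though the strict setting turns it into a crash.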
235
+
236
+ ### Telegram-Bot Group-Chat Suffix Test (when project is a chat bot)
237
+
238
+ Telegram (and similar chat platforms) append the bot's username to commands in group chats: `/system` becomes `/system@MyBotName`. A bare-anchor regex like `^/system($|[[:space:]])` rejects the group-chat form, silently breaking the bot for any group-deployed user.
239
+
240
+ **Required test for every chat-bot command parser:**
241
+
242
+ 1. `/cmd` — direct chat (private message form)
243
+ 2. `/cmd arg1 arg2` — direct chat with arguments
244
+ 3. `/cmd@BotName` — group chat (bare command)
245
+ 4. `/cmd@BotName arg1 arg2` — group chat with arguments
246
+
247
+ The parser must accept both forms. Reference normalizer: `sed -E 's#^(/[a-zA-Z_]+)@[a-zA-Z0-9_]+($|[[:space:]])#\1\2#'` (note the `#` delimiter: the default `/` would clash with the command's leading slash, and `|` with the regex's alternation operator).
248
+
249
+ Field report #325 (threadplex-ops): five fix batches missed this until Round 3 V-02 caught it — the bot rejected `/system@MyBot` in group chats. Niche but real, and zero-cost to add to the QA checklist for any chat-bot product.
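The four required cases can be exercised with a small normalizer. This sketch mirrors the character classes of the reference `sed` expression; `normalize_command` and `CASES` are illustrative names.

```python
import re

# Strip "@BotName" only when it directly follows the command and is itself
# followed by whitespace or end of input.
_SUFFIX = re.compile(r"^(/[a-zA-Z_]+)@[a-zA-Z0-9_]+(?=\s|$)")


def normalize_command(text: str) -> str:
    """Return the message with any group-chat bot suffix removed."""
    return _SUFFIX.sub(r"\1", text)


# The four forms from the required test list, mapped to the expected result.
CASES = {
    "/cmd": "/cmd",
    "/cmd arg1 arg2": "/cmd arg1 arg2",
    "/cmd@BotName": "/cmd",
    "/cmd@BotName arg1 arg2": "/cmd arg1 arg2",
}
```

Running the parser's existing tests against the normalized text keeps the private-chat regex unchanged.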
250
+
205
251
  ### Stub Detection (Oracle, Round 2)
206
252
 
207
253
  Oracle scans for methods that return success without side effects — the most dangerous form of incomplete code. A method that raises `NotImplementedError` fails loudly and safely. A method that returns `True` without acting is a time bomb.
@@ -136,6 +136,65 @@ When the user passes `--deploy` to `/git`, run `/deploy` automatically after the
136
136
 
137
137
  This enables one-command commit-and-deploy for ad-hoc changes outside of campaigns.
138
138
 
139
+ ## Per-Commit CHANGELOG Discipline
140
+
141
+ CHANGELOG drift accumulates silently when entries are deferred to session boundaries. By the time someone notices, the test count trajectory is wrong and the per-mission delta is unrecoverable from the diff alone.
142
+
143
+ **Rule:** Commits that touch `src/**`, `docs/adrs/**`, or load-bearing method docs (`docs/methods/*.md`) MUST include a `CHANGELOG.md` entry as part of the staged paths. Coulson rejects commits matching those globs that omit `CHANGELOG.md`.
144
+
145
+ **Exceptions** (no CHANGELOG entry needed):
146
+ - Pure refactor / move with no behavior change (label the commit `chore:`)
147
+ - Test-only changes that don't add a new test pattern
148
+ - Documentation typo fixes
149
+ - Files explicitly listed under `.changelog-exempt` if present
150
+
151
+ **Enforcement check (Coulson runs before commit):**
152
+
153
+ ```bash
+ if git diff --cached --name-only | grep -qE '^(src/|docs/adrs/|docs/methods/.*\.md$)'; then
+   git diff --cached --name-only | grep -q '^CHANGELOG\.md$' || {
+     echo "Commit touches src/adrs/methods but omits CHANGELOG.md"; exit 1
+   }
+ fi
+ ```
160
+
161
+ Field report #322 (barrierwatch): test count trajectory showed 1207 when reality was 1209+ after Fix Batch 1; CHANGELOG drift caught only by Round 3 Nightwing. Without that agent, the release would have shipped with a stale CHANGELOG.
162
+
163
+ ## Pre-Push Lint Sweep
164
+
165
+ Project-specific lint gates (`scripts/check-*.sh`, `scripts/lint_*.py`, `bin/preflight`, etc.) are easy to forget without a checklist — and the cost is a hotfix loop where the first push fails CI on a contract gate that local development never exercised.
166
+
167
+ **Rule:** Before `git push`, Coulson runs every executable under `scripts/check-*` (or framework equivalent — `scripts/lint_*`, `bin/preflight`, `make preflight`). If any returns non-zero, push is blocked until the finding is resolved (fix the code OR add an explicit `# <gate>-allowed` waiver with rationale).
168
+
169
+ **Discovery shape:**
170
+
171
+ ```bash
+ find scripts/ -maxdepth 2 -type f \( -name 'check-*' -o -name 'lint_*' \) -executable 2>/dev/null
+ ```
174
+
175
+ For each script discovered, document its purpose + waiver convention in the project README or `docs/CONTRIBUTING.md`. Field report #324 (Union Station v7.8) documents 3 separate hotfix loops in a single session where the waiver convention (`# system-org-allowed` for source code, double-backticks for prose) existed but was not surfaced in any reviewer-readable checklist.
176
+
177
+ **Methodology vs project tooling:** the SCRIPTS are project-specific; the DISCIPLINE (run all gates before push) is methodology. The orchestrator does not need to know what each script does — only that it exists and must pass.
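For orchestrators that are not shell-based, the discovery step could look roughly like this. It approximates `find -executable` by checking the owner execute bit, which is an assumption of the sketch; `discover_gates` and `GATE_PATTERNS` are invented names.

```python
import stat
from pathlib import Path
from typing import List

GATE_PATTERNS = ("check-*", "lint_*")


def discover_gates(scripts_dir: Path) -> List[Path]:
    """List executable gate scripts in scripts_dir or one directory below it."""
    found = set()
    for pattern in GATE_PATTERNS:
        # Two glob shapes correspond to `find -maxdepth 2`.
        for shape in (pattern, f"*/{pattern}"):
            for path in scripts_dir.glob(shape):
                if path.is_file() and path.stat().st_mode & stat.S_IXUSR:
                    found.add(path)
    return sorted(found)
```

Each returned path would then be run in turn, blocking the push on the first non-zero exit code.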
178
+
179
+ ## Post-Amend SHA Pin
180
+
181
+ `git commit --amend` rewrites the SHA but `logs/campaign-state.md` rows still reference the pre-amend SHA. Across a long campaign, these dangling references accumulate and break post-hoc audits (`git log --grep` against the recorded SHA returns nothing).
182
+
183
+ **Rule:** After any `git commit --amend`, Coulson scans `logs/campaign-state.md` (and `logs/build-state.md`, `logs/gauntlet-state.md` if present) for SHA placeholders that may now be stale.
184
+
185
+ **Detection pattern:**
186
+
187
+ ```bash
+ # Find recorded SHAs that no longer exist in git
+ grep -oE '\b[a-f0-9]{7,40}\b' logs/campaign-state.md 2>/dev/null | sort -u | while read -r sha; do
+   git cat-file -e "$sha^{commit}" 2>/dev/null || echo "STALE: $sha in campaign-state.md"
+ done
+ ```
193
+
194
+ **Resolution:** Replace the stale SHA with the post-amend SHA. Land both the amend and the state-file pin in one logical operation (squash if not yet pushed; new commit if already on remote).
195
+
196
+ Field report #327 (Union Station v7.10 Phase C): every mission shipped as a `<mission> + <followup pin SHA>` pair because amends were routine and the state file always lagged by one SHA. The discipline is workable in practice, but it is a known foot-gun; surface it explicitly so future operators don't rediscover it.
197
+
139
198
  ## Post-Push Deploy Check
140
199
 
141
200
  After pushing to remote, if the project runs on a persistent server (PM2, systemd, Docker):
@@ -106,6 +106,59 @@ These require full codebase context — run sequentially:
106
106
  - **JS execution:** `eval()`, `Function()`, `setTimeout`/`setInterval` with string arguments
107
107
  (Field report #38: sanitizer missed `object`, `embed`, `applet`, `base`, `meta[http-equiv]` — 5 potential XSS vectors.)
108
108
 
109
+ ### Sanitizer Bypass-Class Checklist
110
+
111
+ When auditing any prompt-injection sanitizer, command-injection filter, or content sanitizer that operates on adversary-controlled text, verify coverage against the canonical bypass classes. Sanitizers built incrementally (adding patterns as discovered) inevitably miss entries: each fix batch produces a narrower bypass that the next round catches. Checking every class upfront compresses 3 fix batches into 1.
112
+
113
+ **Required coverage for every text-input sanitizer:**
114
+
115
+ 1. **Case-fold variants** — `APPROVED ACTION`, `approved action`, `Approved Action`, `aPPROVED aCTION`. The sanitizer MUST be case-insensitive (regex `i` flag, ICU case-fold, or explicit `.lower()` pre-check). Test with mixed-case input.
116
+ 2. **Unicode lookalikes & em-dash variants** — em-dash (`—`), en-dash (`–`), figure-dash (`‒`), minus sign (`−`), full-width hyphen (`－`), Cyrillic `а`/`е`/`о` substituted for Latin `a`/`e`/`o`. Normalize to NFKC before matching AND explicitly enumerate the remaining lookalike set: NFKC folds full-width forms to ASCII but leaves the dash variants and Cyrillic homoglyphs unchanged.
117
+ 3. **Newline-split variants** — `sed` is line-oriented by default; a marker split across `\n` defeats line-level regex. Use `sed -zE` (whole-buffer), Perl `-0777`, or Python `re.DOTALL`/`re.MULTILINE` depending on language. Test with `\r\n`, `\n`, and Unicode line-break characters (e.g. `U+2028`, `U+2029`).
118
+ 4. **Character-class glob variants** — patterns like `AUTHORIT[Yy]` or `appr[o0]ved` exploit blocklist regexes that miss numeric/alpha substitutions. The sanitizer should normalize obfuscation classes (l33t-speak, `0`/`o`, `1`/`l`, `$`/`s`) OR reject any non-ASCII letter in security-relevant context.
119
+ 5. **Encoding variants** — base64, URL-encoded, HTML-entity, JS-escape (`\x41`, `\u0041`), hex-escape, double-encoded. The sanitizer must decode BEFORE matching, not after.
120
+ 6. **Length-boundary variants** — payloads at exactly the truncation boundary, payloads with leading/trailing whitespace that strips to a malicious core, payloads that exceed max-length and trigger truncation that creates a different malicious string.
121
+ 7. **Novel-marker variants** — the sanitizer that catches `[APPROVED]` should catch `「APPROVED」`, `«APPROVED»`, `\\xe2\\x80\\xbaAPPROVED\\xe2\\x80\\xba`. Test with at least 3 unusual delimiter pairs.
122
+
123
+ Field report #325 (threadplex-ops Victory Gauntlet): each fix batch on the prompt-injection sanitizer introduced a narrower bypass that the next round caught. Fix Batch 1 added noun-whitelist `sed`; Round 3 found case-fold + em-dash + novel marker bypasses. Fix Batch 3 added shape-blacklist `sed -E i`; Round 4 found newline-split bypass (sed line-oriented). Fix Batch 4 used `sed -zE` (whole-buffer). The checklist above would have collapsed those three iterations into one — the bypass classes are knowable upfront, not discoverable per-round.
124
+
125
+ **Audit step:** for every sanitizer the codebase ships, verify the test suite covers all 7 classes above with at least 2 samples each. Each missing class is a pre-flagged finding (HIGH severity for security-relevant sanitizers, MEDIUM otherwise).
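A minimal sketch of a canonicalization step covering classes 1 to 3 only (case-fold, lookalikes, newline-split). The marker `approved action` is taken from the examples above; the lookalike table is deliberately small and not exhaustive, and the function names are invented. Encoding, length-boundary, and novel-delimiter classes are not handled here.

```python
import re
import unicodedata

# Explicit lookalike map: NFKC folds full-width forms but leaves dashes and
# Cyrillic homoglyphs untouched, so they are enumerated by hand.
_LOOKALIKES = str.maketrans({
    "\u2014": "-",  # em-dash
    "\u2013": "-",  # en-dash
    "\u2012": "-",  # figure dash
    "\u2212": "-",  # minus sign
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic e
    "\u043e": "o",  # Cyrillic o
})

# \s+ also matches newlines, so a marker split across lines is still found
# when the whole buffer is searched at once.
_MARKER = re.compile(r"approved\s+action")


def canonicalize(text: str) -> str:
    """Fold compatibility forms, case, and known lookalikes, in that order."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return folded.translate(_LOOKALIKES)


def contains_marker(text: str) -> bool:
    """Search the whole canonicalized buffer, not line by line."""
    return _MARKER.search(canonicalize(text)) is not None
```

Case-folding runs before the lookalike map so that upper-case Cyrillic letters are lowered first and then caught by the same table.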
126
+
127
+ ### Multi-Tenant Retrofit Smell (`or 1` / `org_id=None`)
128
+
129
+ A recurring data-leak class across multi-tenant retrofit campaigns. When a project adds `org_id` columns and composite PKs but leaves the `else` branch / `or 1` fallback alive, queries silently leak across tenants when authentication is missing or partial. Field report #315 M2 documents this recurring across 6 Union Station campaigns (v3.0 → v3.6.1 → v7.0 → v7.0.1 → v7.4 → v7.6).
130
+
131
+ **Mandatory grep pass on every multi-tenant codebase:**
132
+
133
+ ```bash
+ # Catches all variants
+ grep -rnE "(\bor\s+1\b|org_id\s*:\s*int\s*=\s*1|org_id\s*:\s*int\s*\|\s*None|org_id\s*=\s*None|tenant_id\s*=\s*None|workspace_id\s*=\s*None)" \
+   --include="*.py" --include="*.ts" --include="*.tsx" \
+   --exclude-dir=node_modules --exclude-dir=.venv .
+ ```
139
+
140
+ Each hit must be classified:
141
+ - **Defensible** — a system endpoint that explicitly serves cross-tenant data (admin tools, reporting), with documented authorization checks. Annotate with a comment naming the policy.
142
+ - **Retrofit residue** — a fallback that predates the multi-tenant migration. **CRITICAL** finding; rewrite to fail-fast.
143
+ - **Test-only** — fixture default. Acceptable in `tests/`, **never** in production code.
144
+
145
+ This grep is part of every `/sentinel` run on projects with `org_id` columns. Also runs in `/qa` regression checklists (see QA_ENGINEER.md). Do not skip it for "we already fixed that" — the pattern recurs.
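The three-way classification could be pre-sorted mechanically before a human reviews each hit. In this sketch the regex is the same alternation as the grep above; the `cross-tenant-policy:` waiver comment is a hypothetical convention, since the text only requires a comment naming the policy.

```python
import re

# Same alternation as the grep, compiled for Python's re module.
SMELL = re.compile(
    r"(\bor\s+1\b|org_id\s*:\s*int\s*=\s*1|org_id\s*:\s*int\s*\|\s*None"
    r"|org_id\s*=\s*None|tenant_id\s*=\s*None|workspace_id\s*=\s*None)"
)

# Hypothetical waiver marker, accepted in Python or TypeScript comment syntax.
POLICY_COMMENT = re.compile(r"(#|//)\s*cross-tenant-policy:\s*\S+")


def classify(path: str, line: str) -> str:
    """Sort one source line into the buckets described above."""
    if not SMELL.search(line):
        return "clean"
    if path.startswith("tests/") or "/tests/" in path:
        return "test-only"
    if POLICY_COMMENT.search(line):
        return "defensible"
    return "retrofit-residue"
```

A `defensible` result only means an annotation exists; whether the named policy actually authorizes the access still needs review.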
146
+
147
+ ### IDOR Matrix for Parametric-Path Routers
148
+
149
+ Mandatory when a router has parametric paths (`/X/{id}`) AND additional fixed-suffix paths under the same entity prefix (`/X/batch-update`, `/X/merge`, `/X/export`). FastAPI dispatches first-matching-route — `/X/{person_id}` is more general than `/X/batch-update` and shadows the fixed suffix when registered first. The fixed-suffix endpoint then becomes silently unreachable, returning 422 (path-arg parse failure) instead of running.
150
+
151
+ Field report #320 §2 documents M-10 commit 5: `PATCH /people/batch-update` had been **unreachable in production for an unknown duration** because `/people/{person_id}` shadowed it. Surfaced only when Strange's IDOR matrix test attempted cross-org denial on `batch-update` and got 422 instead of 403.
152
+
153
+ **Matrix shape (one row per fixed-suffix endpoint × one column per access pattern):**
154
+
155
+ | Endpoint | Same-org user | Cross-org user | No auth |
+ |---|---|---|---|
+ | `PATCH /X/batch-update` | 200 + scoped result | 403 (or 404 per ADR) | 401 |
+ | `POST /X/merge` | 200 | 403 | 401 |
159
+
160
+ **Fix when the matrix surfaces a route shadow:** add path-converter type hints (`{person_id:int}`, `{company_id:int}`) so the parametric route is restricted to its actual type. Do not reorder routes — type-converted paths are unambiguous; reordering is fragile. Then re-run the matrix to confirm fixed-suffix routes reach their handlers.
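The shadowing and the converter fix can be modelled with a toy first-match dispatcher. This is a simplified model written for illustration, not FastAPI's or Starlette's routing code; handlers are plain strings and only `str` and `int` converters are supported.

```python
import re
from typing import List, Optional, Tuple

# Regex fragment per converter; an untyped parameter defaults to "str".
CONVERTERS = {"str": r"[^/]+", "int": r"[0-9]+"}
_PARAM = re.compile(r"\{(\w+)(?::(\w+))?\}")


def compile_path(template: str) -> re.Pattern:
    """Turn '/people/{person_id:int}' into an anchored regular expression."""
    pieces = []
    for part in re.split(r"(\{\w+(?::\w+)?\})", template):
        param = _PARAM.fullmatch(part)
        if param:
            name, conv = param.group(1), param.group(2) or "str"
            pieces.append(f"(?P<{name}>{CONVERTERS[conv]})")
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


def dispatch(routes: List[Tuple[str, str]], path: str) -> Optional[str]:
    """Return the handler of the first route whose template matches path."""
    for template, handler in routes:
        if compile_path(template).fullmatch(path):
            return handler
    return None


shadowed = [("/people/{person_id}", "get_person"), ("/people/batch-update", "batch_update")]
fixed = [("/people/{person_id:int}", "get_person"), ("/people/batch-update", "batch_update")]
```

With `shadowed`, the path `/people/batch-update` resolves to `get_person`; with `fixed`, the same registration order resolves it to `batch_update`, which is why the type converter is preferred over reordering.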
161
+
109
162
  ### Proxy Route SSRF
110
163
 
111
164
  For any route that proxies requests to external APIs (image proxies, API gateways, CDN wrappers):
@@ -0,0 +1,53 @@
1
+ # SPEC_HANDOFF — Cross-Session Implementation Hand-off
2
+ ## System Protocol · Introduced by: field reports #307, #308
3
+
4
+ ## When to Use
5
+
6
+ When a session would benefit from offloading mechanical implementation work to a different session (separate context window, separate repo, separate worktree) while preserving the orchestrator's context for synthesis and review.
7
+
8
+ Typical triggers:
9
+ - Multi-repo campaign (e.g., scaffold methodology update + marketing-site content update)
10
+ - Large-but-mechanical work (26-finding spec executed without back-and-forth — v23.9.x demonstrated this)
11
+ - Executor needs fresh context; orchestrator needs to stay on high-level synthesis
12
+
13
+ ## The Pattern
14
+
15
+ ### Session A (orchestrator) produces the spec
16
+
17
+ Spec doc lives at `docs/SITE_UPDATE_SPEC.md` or similar in the TARGET repo (so the executing session finds it by path).
18
+
19
+ Required structure:
20
+ 1. **Title** with date and source.
21
+ 2. **Numbered findings** — each finding has ID, severity (Critical / High / Should / Nice-to-have), file:line citation, proposed change, and a `verified-against-commit: <SHA>` field. The verified-against-commit field lets Session B fast-skip any finding whose state hasn't changed since the spec was authored.
22
+ 3. **Phases** — group findings by logical phase (data fixes, new pages, content rewrite, test updates, typecheck/build). One commit per phase.
23
+ 4. **Nav-order requirements for new pages** — if the spec proposes creating new pages in a linear tutorial/flow, explicitly state `prev=<page>, next=<page>` for each new page. Without this, the executor guesses and often chooses the wrong direction (field report #308 RC-7).
24
+ 5. **Success criteria** — typecheck green, tests green, build green, optional per-phase smoke checks.
25
+
26
+ ### Session B (executor) receives the hand-off prompt
27
+
28
+ Copy-pasteable prompt template:
29
+ ```
+ Read docs/SITE_UPDATE_SPEC.md in this repo. Execute phases in order.
+ Commit per phase with CHANGELOG entry. Run typecheck + test + build between phases.
+ If you hit a blocker, stop and save state — do not improvise.
+ ```
34
+
35
+ ### Session B validates before acting
36
+
37
+ For each finding: `git show <verified-against-commit>:<path>` and compare to local HEAD. If they match, the claimed state is current — execute as planned. If they differ, the state has moved; re-evaluate before applying.
38
+
39
+ ## Evidence
40
+
41
+ - Field report #308: 23/26 items executed across 5 phases. 3 Must-Fix items slipped through (all related to spec gaps around nav direction and table captions). Net positive — saved ~20k tokens of orchestrator context.
42
+ - Field report #307 F4: CAMPAIGN.md convention for `verified-against-commit: <SHA>` stamping.
43
+
44
+ ## Limitations
45
+
46
+ - Executor may optimize for literal compliance over holistic UX (nav direction example above).
47
+ - Spec must include nav order, table captions, and a11y requirements for new components explicitly.
48
+ - Orchestrator MUST run a review pass (`/engage`) on the executor's output before considering the hand-off complete.
49
+
50
+ ## Handoffs
51
+
52
+ - After executor completes, orchestrator runs `/engage` then `/assemble --fast` on the affected files to surface integration issues.
53
+ - If the executor skipped findings whose `verified-against-commit` matched local HEAD, note which in the completion report — helps validate the SHA-skip heuristic.
@@ -133,11 +133,81 @@ AGENT: [Name]
133
133
  STATUS: Done / Blocked / Needs Review
134
134
  CHANGES: [Files modified, one-line each]
135
135
  DECISIONS: [Non-obvious choices with rationale]
136
+ DEVIATIONS FROM CONTRACT: [see below — required, "None" is acceptable]
136
137
  ASSUMPTIONS: [Needs confirmation]
137
138
  RISKS: [Side effects]
138
139
  REGRESSION: [How to verify]
139
140
  ```
140
141
 
142
+ ### Deviations from Contract (required section)
143
+
144
+ For every item in the dispatch brief that the agent chose to handle differently from the literal contract — defensible improvements, scope adjustments, deferred work — flag it explicitly:
145
+
146
+ ```
+ - Brief said: "<exact wording>"
+   You did: <what you actually shipped>
+   Why: <rationale>
+   Risk: <production-side implication, or "None" if internal-only>
+   Reviewer signoff needed: <Y/N — if Y, name the reviewer>
+ ```
153
+
154
+ An empty section ("No deviations") is acceptable and explicit. **Hidden deviations risk emerging as production bugs** — Stark's M-05-prep-2 silent fallback (`_get_db_admin()` retained tenant-pool fallback for "dev/test backward compat" instead of failing-fast as Picard's contract specified) was sound but not flagged in the build report headline. It took a Loki chaos pass to catch the production-side implication. (Field report #318 §4.) Across that single session, 6 separate agents had silent deviations from their dispatch briefs.
155
+
156
+ The orchestrator reviews this section at the same priority as STATUS. A deviation that risks production behavior triggers a reviewer dispatch (Loki, Riker, or the original contract author).
157
+
158
+ ### Sub-Agent Review Contract (WARN/cosmetic evidence requirement)
159
+
160
+ A sub-agent reviewer may classify a finding as **WARN/cosmetic** (deferrable, non-blocking) only if at least ONE of the following holds:
161
+
162
+ 1. The code path is **provably unreachable** with a citation of the specific gate that excludes it (e.g., `if (DEV_ONLY)` guard pinned by audit fixture).
163
+ 2. The reviewer **ran the real (non-dry-run) code path under the same strict-mode flags as production** and observed no failure.
164
+
165
+ Static reading alone is NOT sufficient evidence for a WARN/cosmetic downgrade when the codebase ships under `set -euo pipefail`, TypeScript strict, Python `-W error`, or any equivalent strict-mode setting. The orchestrator MUST NOT unblock a fix-batch on a WARN/cosmetic classification that lacks one of the two above.
166
+
167
+ Field report #330: a Kim-class reviewer flagged a bash syntax oddity as "cosmetic — always returns 0." The reasoning was correct only if the code path didn't crash under strict-mode flags — which it did. The audit's strict-mode must match the script's strict-mode. See `QA_ENGINEER.md` "Strict-Mode Audit Classification" for the language-level rule.
168
+
169
+ **The contract applies recursively** — a sub-agent reviewing another sub-agent's classification inherits this requirement. WARN/cosmetic that survives a chain of reviews still requires evidence at the root of the chain.
170
+
171
+ ### Agent Capability Matrix (tool surface verification)
172
+
173
+ Before briefing an agent for a task, the orchestrator confirms the agent has the tools required for that task. The `tools:` field in each `.claude/agents/<id>.md` frontmatter is the source of truth.
174
+
175
+ **Quick decision tree:**
176
+
177
+ | Task type | Required tools | Common mismatch |
+ |---|---|---|
+ | Write files (audit reports, ADRs, code) | `Write` + `Edit` | Read-only agents (e.g., scout-tier) return audit text instead of files |
+ | Modify existing files | `Edit` | Read-only agents propose diffs instead of applying them |
+ | Run scripts / git ops | `Bash` | Some review-tier agents lack Bash and can't verify their own findings |
+ | Pattern search / discovery | `Grep` + `Glob` | All agents have these (scout floor) |
+ | Read agent definitions | `Read` | Universal |
184
+
185
+ **Pre-deployment check:** if the dispatch brief asks the agent to "write," "update," "modify," or "fix" any file, verify the agent definition includes `Write` and/or `Edit` in `tools:`. If not, EITHER:
186
+
187
+ 1. Add the tool to the agent definition (preferred when the agent SHOULD be authoring in their domain — e.g., Irulan should write ADR audits as files), OR
188
+ 2. Delegate the actual write to an orchestrator-tier action (the agent produces structured audit output; the orchestrator writes the file).
189
+
190
+ Field report #322 (barrierwatch M1): Irulan was asked to write `docs/adrs/INDEX.md` and update `CHANGELOG.md`. Her tools were `Read, Grep, Glob` — she returned a comprehensive audit text instead of files. The orchestrator manually transcribed her audit into the files. Cost: a redirect that should have been a tool-list fix.
191
+
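The pre-deployment check can be sketched as a small script. This is an illustration under assumptions: it assumes the `tools:` frontmatter field is a single comma-separated line (as in the Irulan example), and `parse_tools`, `can_author_files`, and `WRITE_VERBS` are hypothetical names, not part of the shipped tooling.

```python
import re

# Verbs from the pre-deployment check that imply the agent must change files.
WRITE_VERBS = ("write", "update", "modify", "fix")


def parse_tools(agent_md: str) -> set[str]:
    """Extract the comma-separated `tools:` list from an agent definition."""
    match = re.search(r"^tools:\s*(.+)$", agent_md, flags=re.MULTILINE)
    if not match:
        return set()
    return {tool.strip() for tool in match.group(1).split(",") if tool.strip()}


def can_author_files(agent_md: str, brief: str) -> bool:
    """Return False when the brief asks for file changes the agent cannot make."""
    asks_for_write = any(
        re.search(rf"\b{verb}\b", brief, flags=re.IGNORECASE) for verb in WRITE_VERBS
    )
    if not asks_for_write:
        return True
    # The check passes if the agent holds Write and/or Edit.
    return bool(parse_tools(agent_md) & {"Write", "Edit"})


# Reconstruction of the field report #322 situation (frontmatter is illustrative).
IRULAN = "---\nname: irulan-historian\ntools: Read, Grep, Glob\n---\n"
BRIEF = "Write docs/adrs/INDEX.md and update CHANGELOG.md"
print(can_author_files(IRULAN, BRIEF))
```

A `False` result is the cue to pick one of the two remedies above before dispatching, rather than after the agent returns text instead of files.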
192
+ ### Build-Agent Pytest Sequencing
193
+
194
+ Build agents that need to verify their work with pytest should:
195
+
196
+ 1. Run **targeted pytest** on touched files only as the agent's internal verification (fast, fits in the agent response window — typically 1-3 min).
197
+ 2. **Commit + report BEFORE** running the full-suite pytest. The orchestrator runs the full suite as the gate — that's not the agent's job.
198
+ 3. Do NOT run the full CI-equivalent suite as the agent's final action. Long-running suites (12-15 min) routinely exceed the agent response window, truncate the report mid-output, and force the orchestrator to reconstruct state from `git log` rather than read the report.
199
+
200
+ Field report #320 §4: 4 of Strange's M-10 commits had truncated reports because internal pytest was still running when the response window closed. Targeted pytest (`pytest -q tests/path/to/touched_module.py`) is the right shape for the agent; full-suite is the orchestrator's gate.
201
+
202
+ ### Long-Running Shell Commands Inside Agent Dispatches
203
+
204
+ When a sub-agent needs to run a shell command that takes longer than ~3 minutes (long pytest, full build, multi-region deploy probe, container migration), the dispatch prompt must specify one of two patterns:
205
+
206
+ 1. **Background + poll** — agent runs the command with `run_in_background: true`, then polls for completion at fixed intervals. The agent's final response includes the polled outcome.
207
+ 2. **Reduce scope** — the agent runs a focused subset that completes inside the response-stream window. The orchestrator runs the full version separately.
208
+
209
+ Naked long-running commands inside an agent dispatch will truncate the agent's report mid-execution; the orchestrator then has to recover state from disk and re-write the report retrospectively. Field report #317 logged 4 such truncations in a single Union Station session.
210
+
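The background + poll pattern can be sketched in plain Python for cases where the command is launched from a script rather than through the agent's own tool. This is a generic sketch, not the `run_in_background` mechanism itself; `run_with_poll` and its parameters are hypothetical names.

```python
import subprocess
import sys
import tempfile
import time


def run_with_poll(cmd: list[str], interval_s: float, deadline_s: float) -> dict:
    """Start cmd in the background and poll until it exits or the deadline passes."""
    started = time.monotonic()
    # Output goes to a temp file so a chatty command cannot block on a full pipe.
    with tempfile.TemporaryFile(mode="w+") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        while proc.poll() is None:
            if time.monotonic() - started > deadline_s:
                proc.kill()
                proc.wait()
                status = "timeout"
                break
            time.sleep(interval_s)
        else:
            status = "finished"
        log.seek(0)
        return {"status": status, "returncode": proc.returncode, "output": log.read()}


# Stand-in for a long pytest or build command.
outcome = run_with_poll([sys.executable, "-c", "print('ok')"], interval_s=0.05, deadline_s=30.0)
print(outcome["status"], outcome["returncode"])
```

The returned dictionary is what the agent would fold into its final response, so the report carries a polled outcome instead of being cut off mid-run.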
141
211
  ## Agent Debate Protocol
142
212
 
143
213
  When two agents disagree on a finding, run a structured debate instead of listing both opinions:
@@ -327,6 +397,26 @@ CONSTRAINTS: [list]
327
397
  | Architecture / Council | Position Statement: assessment, concerns, sign-off |
328
398
  | Build agents | Build Report: files created/modified, tests added, decisions made |
329
399
 
400
+ ### Intentionally Overlapping Mandates (high-signal convergence)
401
+
402
+ When dispatching parallel reviewers, **deliberately give 3+ agents the same diff with different lenses**. This is not duplication — it is intentional convergence.
403
+
404
+ - Findings flagged by 1 agent = standard signal, route to triage
405
+ - Findings flagged by exactly 2 agents from different universes = high-confidence signal, prioritize
406
+ - Findings flagged by 3+ agents = critical convergence, fix in same batch
407
+
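The three tiers above can be sketched as a small triage function. This is a simplified illustration: it counts agents only and does not model the "different universes" condition, it reads the middle tier as exactly two agents, and the finding IDs and `triage` name are hypothetical.

```python
from collections import defaultdict


def triage(reports: dict[str, set[str]]) -> dict[str, str]:
    """Map each finding ID to a priority label by how many agents flagged it."""
    flagged_by: dict[str, set[str]] = defaultdict(set)
    for agent, findings in reports.items():
        for finding in findings:
            flagged_by[finding].add(agent)

    labels: dict[str, str] = {}
    for finding, agents in flagged_by.items():
        if len(agents) >= 3:
            labels[finding] = "critical convergence"
        elif len(agents) == 2:
            labels[finding] = "high-confidence"
        else:
            labels[finding] = "standard"
    return labels


# Illustrative reports from three parallel reviewers on the same diff.
REPORTS = {
    "Discovery": {"HIGH-1", "MED-1"},
    "Stark": {"HIGH-1", "MED-1", "MED-2"},
    "Kenobi": {"HIGH-1", "MED-2", "LOW-1"},
}
for finding, label in sorted(triage(REPORTS).items()):
    print(finding, label)
```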
408
+ Field report #324 (Union Station v7.8 R2): three agents (Discovery + Stark + Kenobi) ran in parallel against the same diff. HIGH-1 was caught by all three; two MED findings by 2 of 3. A single-agent review would have missed ~25% of findings empirically. The "wasted" agent budget is the price of multi-lens coverage.
409
+
410
+ **When to use overlap:**
411
+ - Methodology ADRs (statistical, security, financial) — code-vs-ADR + spec-adversary + Riker trade-offs (3 lenses, same diff)
412
+ - Multi-tenant boundary changes — Stark (impl) + Kenobi (auth) + Ahsoka (IDOR) + Spock (schema), 4 lenses on the same code
413
+ - Cross-module diffs after refactor sweeps — Cyborg (integration) + Strange (services) + Banner (queries)
414
+
415
+ **When NOT to use overlap:**
416
+ - Trivial single-file changes (<50 lines, no cross-module impact)
417
+ - Pure formatting/lint sweeps
418
+ - Doc-only edits where finding density approaches zero
419
+
330
420
  ### Concurrency Rules (ADR-059)
331
421
 
332
422
  - **Fan out the full roster in parallel for read-only analysis.** Opus 4.7's 1M context window handles 20+ concurrent findings tables without thrashing. Field report #270 confirmed 15+ parallel agents at 15-25% context usage.
@@ -100,9 +100,62 @@ Use the Agent tool to run these in parallel — they are independent analysis ta
100
100
  - **Data's Tech Debt:** Wrong abstractions, missing abstractions, premature optimization, deferred decisions, dependency debt, documentation debt. Each with impact, risk, effort, urgency.
101
101
 
102
102
  **Step 5 — ADRs + Riker's Decision Review:**
103
- - **Picard writes ADRs:** Architecture Decision Records for every non-obvious choice. Status, context, decision, consequences, alternatives. **Each ADR must include an Implementation Scope field:** "Fully implemented in vX.Y" or "Deferred to vX.Y no stub code committed." This prevents the pattern where architecture is decided, stubs are shipped as placeholders, and the real implementation never arrives. (Field report: v17.0 assessment found 3,500+ lines of infrastructure built on stub adapters that were "deferred" in v11.0 and never completed through v16.1.)
103
+ - **Picard writes ADRs:** Architecture Decision Records for every non-obvious choice. Status, context, decision, consequences, alternatives. **Each ADR must include an Implementation Scope field anchored to reality:** before writing "Fully implemented in vX.Y," verify with `ls`/`grep` that every named deliverable exists at HEAD. If any cited file is missing, the status is "Proposed (to be implemented in vX.Y PR)," never "Accepted." Field reports #312 (4 of 5 ADRs falsely claimed Fully Implemented), #313 (ADR-039 said `STRUCT-006/012 fully implemented in v0.4.0`; at HEAD, neither existed), and #316 (ADR-101 claimed a schema property that the schema didn't have) document the cost: false confidence in audit trails is worse than missing audit trails.
104
+ - **Each ADR has a Verification Gate with a Fixture Bindability proof.** A gate that algebraically cannot fail under its fixture proves only refactor-correctness, not fix-correctness. State explicitly: *"Fixture: <data/scenario>. Can the gate FAIL under this fixture? <yes/no + rationale>."* If no, add a fixture where the fix CAN bind, or downgrade the verification claim. See `/docs/patterns/adr-verification-gate.md`. (Field report #313 Finding 1: ADR-040's "bit-identical 12-day forensic" PASS proved arithmetic preservation; the cap path was never exercised because proximity stayed wide.)
105
+ - **ADRs with numbered cohort breakdowns require sum-verification.** When the ADR claims "5 cohorts of N tables totaling X," compute the sum independently and compare. If mismatch, document which is canonical, why, and where the spec is authoritative. Otherwise 3+ downstream agents waste reviewer cycles re-verifying the math. (Field report #318: Picard's M-05 ADR said "47 RLS-policied tables" in 3 places; cohort breakdown summed to 55. Spock, Trunks, and Cara Dune each caught it independently.)
106
+ - **ADRs specifying HARD GATEs require feasibility audit.** Acceptance criteria must be derivable from the kernel/agent's actual input set, not from post-hoc forensic labels. Test: write the algebraic intersection of all gate conditions; if the solution set is empty, the gate is structurally infeasible and must be reframed BEFORE downstream missions consume it. (Field report #314 Finding 2: a regime classifier was asked to identify forensic-directional days using only pre-midnight 4h drift inputs; algebraic proof showed no parameter satisfied both directional and symmetric pins simultaneously. Required operator escalation + reframing.)
107
+ - **ADR amendments trigger a cross-ADR cascade scan.** Any ADR amendment must scan dependent ADRs (cross-references in §References, downstream missions consuming the amended spec) for stale claims, then bundle all amendments into one commit. (Field report #314 Finding 6: M9.1a kernel amendment forced ADR-038 schema, ADR-044 enum, and ADR-036 amendments; T'Pol caught the cascade during synthesis. Without the bundled commit, downstream missions would have read stale specs.)
104
108
  - **ToS/API policy compatibility:** For ADRs selecting third-party services, verify the provider's Terms of Service and API usage policies permit the intended usage pattern (automation, bot-initiated transactions, reselling, volume). A service rejected on ToS grounds after building requires a full architecture pivot. (Field report #300)
105
- - **Riker reviews:** "Number One, does this hold up?" Riker challenges each ADR's trade-offs — are the alternatives truly worse? Are the consequences acceptable? Did we consider the second-order effects? **Riker also verifies the implementation scope is honest** — if an ADR says "fully implemented" but the code throws `'Implement...'`, that's a finding. Riker's review prevents architectural decisions made in a vacuum.
109
+ - **Riker reviews:** "Number One, does this hold up?" Riker challenges each ADR's trade-offs — are the alternatives truly worse? Are the consequences acceptable? Did we consider the second-order effects? **Riker also verifies the implementation scope is honest** — if an ADR says "fully implemented" but the code throws `'Implement...'`, that's a finding. **Riker also asks "Can this gate FAIL under the proposed fixture?"** If algebraically it cannot, the gate proves only that the refactor preserved arithmetic, not that the fix is correct. Riker's review prevents architectural decisions made in a vacuum.
110
+ - **Spec adversary pass (BEFORE implementation):** Riker reviews trade-offs; an adversarial agent (Feyd-Rautha, Maul, or Loki, chosen by domain) attacks the SPECIFICATION itself for category errors and missing constraints. **This pass runs before Stark implements.** The question Riker asks is "does this hold up?" The question the adversary asks is different: "is the spec asking the right question? Does the algebraic intersection of all constraints contain the desired solution? What's the failure mode the spec didn't name?" Field report #322 documents the cost: ADR-069 (FWER family scoping) said "filter family by p-value alone"; four agents (T'Pol, Picard, Stark, Batman) reviewed code-vs-ADR and all signed off. The bug was in the spec — the family should have been scoped to runs that passed the per-run gate. Surfaced only when M6's smoke run produced a false positive in production. A spec-adversary pass — asking "is the family definition itself correct?" before implementation — would have caught it. The rule: code-vs-ADR review confirms fidelity; spec-adversary review confirms correctness. Both are required for non-trivial methodology ADRs (statistical, security, financial, identity).
111
+
112
+ ### Scope-confidence interval (callsite-counted ADRs)
113
+
114
+ When an ADR's effort estimate is denominated in callsite/file count ("12 sites need updating," "5-line cleanup," "~150 caller cascade"), the ADR MUST include ONE of:
115
+
116
+ 1. **Verifying grep with pinned `n=N`** — the literal command + the observed count at the SHA the ADR was authored against. Example: *"Verified at `f7330c6`: `grep -rcE 'org_id\s*:\s*int\s*=\s*1' app/ | awk -F: '{s+=$2} END {print s}'` → n=65."*
117
+ 2. **Uncertainty annotation** — explicit "±X×" range when verification is intentionally deferred. Example: *"Estimated 12 sites; ±5× uncertainty pending audit mission."* Downstream missions reading the ADR treat the upper bound as the planning estimate.
118
+
119
+ Point estimates without verification or uncertainty are a methodology bug. Field reports #328 (architect estimates off 5-10× on M-48c.1 + M-48c.3 + M-48d) and #329 (F-V710-ORG1-DEFAULTS was estimated at 12 sites; the real count was 65, roughly 5×, and the v7.11 plan was restructured into a parallel sub-campaign) document the cost: campaigns inherit consequences silently. The verification step is cheap. Skipping it is not.
120
+
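A pinned count can be re-checked mechanically. The sketch below is a Python approximation of summing `grep -rc` output, under the assumption that only `*.py` files are in scope (the shell command above scans every file); `count_matching_lines` and `check_pinned` are hypothetical helper names, and the demo tree is synthetic.

```python
import re
import tempfile
from pathlib import Path


def count_matching_lines(root: Path, pattern: str, glob: str = "*.py") -> int:
    """Count lines matching pattern in files under root."""
    regex = re.compile(pattern)
    total = 0
    for path in sorted(root.rglob(glob)):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            total += sum(1 for line in text.splitlines() if regex.search(line))
    return total


def check_pinned(root: Path, pattern: str, pinned_n: int) -> tuple[bool, int]:
    """Compare the observed count with the n pinned in the ADR."""
    observed = count_matching_lines(root, pattern)
    return observed == pinned_n, observed


PATTERN = r"org_id\s*:\s*int\s*=\s*1"

# Synthetic tree standing in for app/ at the pinned SHA.
with tempfile.TemporaryDirectory() as tmp:
    app = Path(tmp) / "app"
    app.mkdir()
    (app / "a.py").write_text("def f(org_id: int = 1): ...\n", encoding="utf-8")
    (app / "b.py").write_text("def g(org_id:int=1): ...\nx = 2\n", encoding="utf-8")
    demo_ok, demo_observed = check_pinned(app, PATTERN, pinned_n=2)

print(demo_ok, demo_observed)
```

A mismatch between the pinned and observed counts means the ADR's estimate has drifted and should be re-verified before a campaign plans against it.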
121
+ **Closeout reciprocity:** when a `/campaign` closeout report cites a followup count that will be consumed by the next plan, the followup definition MUST embed the same grep pattern. The next campaign's `/architect --plan` re-runs the grep before accepting the count. See `CAMPAIGN.md` "Closeout grep pinning."
122
+
123
+ ### Service-extraction test-patch checklist
124
+
125
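+ When a mission moves a symbol out of one module into another (PIC-002-style service extraction, refactor-into-helper, rename-with-relocation), the same commit MUST update every test that patches the symbol by old path. Names bind at import time, and `patch` replaces the attribute only on the module it names: if `foo` now lives in `app.services.X.service` while `app.routers.X` still imports it, `patch("app.routers.X.foo")` swaps out a binding the relocated code never reads, so the test passes against unmocked production code. (If the old module no longer exposes `foo` at all, `patch` raises `AttributeError`, which is the loud and therefore easier case.)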
126
+
127
+ **Checklist for any extraction mission:**
128
+
129
+ 1. After moving the symbol, `grep -rn 'patch[(]"[^"]*\.<symbol_name>"' tests/` (or equivalent for the test framework)
130
+ 2. For every match, update the path to the new module location
131
+ 3. If the symbol is re-exported from the old path for backward compat, document it — but prefer updating tests over keeping re-exports (tests should follow code)
132
+
133
+ Field report #324 (Union Station v7.8 PIC-002 trio): multiple half-Gauntlet followups had to retroactively update `patch("app.routers.X.foo")` → `patch("app.services.X.service.foo")` because the extraction missions did not include the test-patch sweep.
134
+
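The stale-patch failure can be reproduced in a few lines. This is a self-contained sketch: `demo_service` and `demo_router` are hypothetical in-memory modules standing in for `app.services.X.service` and `app.routers.X`, built with `types.ModuleType` so the example needs no files on disk.

```python
import sys
import types
from unittest.mock import patch

# New home of the symbol: handler() looks up foo in this module's globals.
service = types.ModuleType("demo_service")
exec(
    "def foo():\n"
    "    return 'real'\n"
    "\n"
    "def handler():\n"
    "    return foo()\n",
    service.__dict__,
)
sys.modules["demo_service"] = service

# Old home: still exposes foo (for example, via a leftover import).
router = types.ModuleType("demo_router")
router.foo = service.foo
sys.modules["demo_router"] = router

# Stale test: patches the old path, so handler() still calls the real foo.
with patch("demo_router.foo", return_value="mocked"):
    stale_result = service.handler()

# Updated test: patches the path the code actually reads.
with patch("demo_service.foo", return_value="mocked"):
    fixed_result = service.handler()

print(stale_result, fixed_result)
```

The stale patch raises no error and the test body still runs, which is why the grep sweep in the checklist is needed to find these cases.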
135
+ ### Signing-path audit
136
+
137
+ For every file in the codebase that produces a cryptographic signature (EIP-712, EIP-191, action hashes, JWT signing, HMAC for webhooks, OAuth state signing, license signing), verify a golden-vector test exists pinning byte-identical output for fixed inputs. Asymmetry across signing paths in the same codebase is a known regression vector — the test the author didn't write is the one that catches the SDK upgrade that breaks production.
138
+
139
+ **Audit step:**
140
+
141
+ 1. Grep for signing primitives: `signTypedData`, `sign(`, `signMessage`, `createHmac`, `jwt.sign`, `crypto.sign`, framework-specific equivalents
142
+ 2. For each call site, locate the corresponding golden-vector test (pinned inputs → expected hex output)
143
+ 3. If a signing path lacks a golden vector, the audit FAILS — write the test before the next refactor touches the path
144
+
145
+ Field report #323 (barrierwatch Phase 2): the HL exchange client had a golden-vector test, but the PM CLOB client (which delegates to `@polymarket/clob-client` SDK) did not. A 35-agent /architect synthesis caught the asymmetry; without that depth, a future SDK upgrade would have shipped a silent regression.
146
+
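A golden-vector test for one signing primitive might look like the sketch below. `sign_webhook` is a hypothetical stand-in for a project's signing path; the pinned vector is Test Case 2 from RFC 4231 (HMAC-SHA-256), used here so the expected hex comes from a published source rather than from the code under test.

```python
import hashlib
import hmac


def sign_webhook(key: bytes, payload: bytes) -> str:
    """Hypothetical signing path: HMAC-SHA-256 over the raw payload, hex-encoded."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


# Pinned inputs and expected output (RFC 4231, Test Case 2).
GOLDEN_KEY = b"Jefe"
GOLDEN_PAYLOAD = b"what do ya want for nothing?"
GOLDEN_HEX = "5bdcc146bf60754e6a042426089575c75a003f089d2739839dec58b964ec3843"


def test_sign_webhook_golden_vector() -> None:
    """Fails if any refactor or dependency change alters the signature bytes."""
    assert sign_webhook(GOLDEN_KEY, GOLDEN_PAYLOAD) == GOLDEN_HEX


test_sign_webhook_golden_vector()
```

For paths that delegate to an SDK, the same shape applies: fix the inputs, record the output once from a trusted run, and pin it.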
147
+ ### Npm-name availability pre-flight (ADR authoring)
148
+
149
+ When an ADR proposes a published npm package name or scope, the architect MUST verify availability via BOTH:
150
+
151
+ 1. **Registry query** — `npm view <name>` returns E404 (or equivalent "not found" signal)
152
+ 2. **Org-create form** — if scoped (e.g., `@foo/bar`), visit npmjs.com/org/create and attempt to create the org. npm has no CLI-level `npm org create`; scope availability in the registry does NOT imply org-create availability.
153
+
154
+ Do not canonicalize the name in docs, code, or CHANGELOG entries until BOTH checks pass. Checklist item in the ADR's Decision section:
155
+
156
+ > "Npm-name availability confirmed: registry E404 ✓, scope create-form accepts ✓."
157
+
158
+ Field report evidence: #308 RC-1 documents v23.9.0 → v23.9.1 mid-flight pivot from `@voidforge/cli` to unscoped `voidforge-build` because `voidforge` org creation was rejected after docs had already canonicalized the scoped name. Related: LRN-4, LRN-7 in docs/LEARNINGS.md; ADR-061 §13.
106
159
 
107
160
  ### `--adr-only` Lightweight Mode
108
161
 
@@ -177,6 +177,23 @@ steps:
177
177
  - run: npx playwright test --shard=${{ matrix.shard }}
178
178
  ```
179
179
 
180
+ ### Decreasing-Counter Test Markers (e.g., `known_pg_gap`)
181
+
182
+ When a multi-mission migration introduces deliberate, tracked test failures (a backend swap, a forced-RLS rollout, a schema canonicalization), use a **decreasing-counter marker** to keep CI green while the gap closes.
183
+
184
+ **Pattern:**
185
+ 1. Pick a marker name describing the migration (`known_pg_gap`, `known_v2_schema_gap`, `known_force_rls_gap`).
186
+ 2. Tag every currently-failing test with the marker. Add a one-line reason: `# known_pg_gap: pinned to SQLite — exercises asyncpg LISTEN/NOTIFY in M-04c`.
187
+ 3. CI runs with the marker excluded by default: `pytest -m "not known_pg_gap"`. Treat green as actionable.
188
+ 4. **Each mission removes its tag as it closes the gap.** The total count of tagged tests is a monotonically decreasing counter; campaign-state.md tracks it.
189
+ 5. Final mission (boundary or victory) removes the last tag, drops the marker registration, and asserts `pytest -m known_pg_gap` collects 0 tests.
190
+
191
+ **Why:** without this, dual-backend or boundary-tightening campaigns either ship CI red for weeks (eroding the green-CI invariant) or freeze the migration mid-flight to land all tests at once (which is high-risk). The decreasing counter lets each mission ship green while reducing the tracked debt.
192
+
193
+ **Anti-pattern:** using the marker for genuinely-broken tests with no plan to remove it. Markers must be paired with mission ownership in campaign-state.md. Untracked markers become permanent test-suite scar tissue.
194
+
195
+ Field report #316 §7 (Union Station v7.7 M-13a — 83 known_pg_gap tags landed during the SQLite→PG canonicalization, decreasing across M-04..M-12).
196
+
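The counter itself can be computed and guarded with a short script. This is a sketch under assumptions: it assumes tags are applied as `@pytest.mark.<marker>` decorators in `test_*.py` files, and `count_marker` and `check_decreasing` are hypothetical helpers, not part of pytest.

```python
import re
import tempfile
from pathlib import Path


def count_marker(root: Path, marker: str) -> int:
    """Count `pytest.mark.<marker>` occurrences in test files under root."""
    pattern = re.compile(rf"pytest\.mark\.{re.escape(marker)}\b")
    return sum(
        len(pattern.findall(path.read_text(encoding="utf-8")))
        for path in root.rglob("test_*.py")
    )


def check_decreasing(previous: int, current: int) -> None:
    """Raise if the tagged-test count went up since the last recorded value."""
    if current > previous:
        raise AssertionError(
            f"marker count rose from {previous} to {current}; tags may only be removed"
        )


# Synthetic test tree with three tagged tests.
with tempfile.TemporaryDirectory() as tmp:
    tests = Path(tmp)
    (tests / "test_a.py").write_text(
        "import pytest\n\n@pytest.mark.known_pg_gap\ndef test_one(): ...\n\n"
        "@pytest.mark.known_pg_gap\ndef test_two(): ...\n",
        encoding="utf-8",
    )
    (tests / "test_b.py").write_text(
        "import pytest\n\n@pytest.mark.known_pg_gap\ndef test_three(): ...\n",
        encoding="utf-8",
    )
    demo_count = count_marker(tests, "known_pg_gap")

check_decreasing(previous=83, current=demo_count)
print(demo_count)
```

Recording the count per mission in campaign-state.md and running a check like this in CI would make an accidental increase visible immediately.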
180
197
  ### Flaky Test Protocol
181
198
 
182
199
  Flaky tests erode trust in the test suite. Huntress (stability monitor) tracks flake rates.
@@ -99,6 +99,23 @@ The pickup prompt is the vault's delivery mechanism. It's printed to console, no
99
99
  - **Campaign pause** — When `/campaign` pauses between missions across sessions.
100
100
  - **Before destructive operations** — Before `git reset`, branch switches, or major refactors.
101
101
 
102
+ ### 6.5. Verification Pass Before Sealing
103
+
104
+ A vault that mis-states load-bearing facts misleads the next session. Field report #318 documented vault-2026-04-29-2 carrying 4 inaccuracies (table count off, migration head wrong, advisory lock id wrong, FK claim contradicted by the actual schema). Three independent reviewers caught them via live psql + code inspection in the next session, at a cost of ~30-60 min of corrective work.
105
+
106
+ Before sealing, **run a verification pass** on every load-bearing fact:
107
+
108
+ | Claim type | How to verify |
109
+ |------------|--------------|
110
+ | Table count | Live DB: `SELECT count(*) FROM pg_class WHERE relkind='r' AND relnamespace='public'::regnamespace` (PG) or equivalent |
111
+ | Migration head | `git log -1 --format=%H -- <migrations-dir>` and the latest applied row in the migrations table |
112
+ | Schema invariants (advisory lock id, FK constraints, NOT NULL flags) | Read the code, not memory: `grep -nE "advisory_lock\|crc32" <code>`, `\d <table>` in psql |
113
+ | File paths cited as deliverables | `[ -f <path> ] && echo present \|\| echo MISSING` |
114
+ | Test counts | `pytest --collect-only -q \| tail -1` or equivalent |
115
+ | Version numbers | `cat VERSION.md`, `cat package.json \| jq .version` |
116
+
117
+ Document each verified fact with the source (`from psql`, `from VERSION.md:3`, `from <file>:<line>`). If a previously-true claim is no longer true at sealing time, fix the claim — do not seal known drift. The vault carries the **truth at sealing time**; drift between the vault and reality is methodology debt that compounds across sessions.
118
+
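Two of the checks in the table (file paths and version numbers) can be sketched as a script that returns each fact together with its source. The helper names `verify_paths` and `package_version` are hypothetical, and the demo directory is synthetic.

```python
import json
import tempfile
from pathlib import Path


def verify_paths(paths: list[Path]) -> dict[str, str]:
    """Report each cited deliverable as present or MISSING."""
    return {str(p): ("present" if p.is_file() else "MISSING") for p in paths}


def package_version(package_json: Path) -> tuple[str, str]:
    """Return the version plus a provenance note for the vault entry."""
    version = json.loads(package_json.read_text(encoding="utf-8"))["version"]
    return version, f"from {package_json.name}"


# Synthetic project directory for illustration.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "package.json").write_text('{"name": "demo", "version": "1.2.3"}', encoding="utf-8")
    (root / "INDEX.md").write_text("# index\n", encoding="utf-8")
    demo_paths = verify_paths([root / "INDEX.md", root / "CHANGELOG.md"])
    demo_version = package_version(root / "package.json")

print(sorted(demo_paths.values()), demo_version)
```

Any `MISSING` entry or version mismatch is a claim to fix before sealing, not a note to carry forward.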
102
119
  ### 7. Operational Learnings Sync
103
120
 
104
121
  At session end, before sealing the vault, check for approved operational learnings from this session:
@@ -263,3 +263,30 @@ Before clearing, deleting, or modifying database fields to "fix" missing files o
263
263
  5. **NEVER clear a DB field to work around a missing file.** Restore the file first, or confirm the regeneration cost is acceptable BEFORE deleting the reference.
264
264
 
265
265
  (Field report #103: 251 avatarUrl fields cleared to "fix" missing files, triggering ~$10 in DALL-E regeneration + 50 minutes downtime. The files existed on the VPS — they were deleted by `rsync --delete`, not lost. Restoring from backup would have been free.)
266
+
267
+ ---
268
+
269
+ ## Cloudflare / Wrangler Gotchas
270
+
271
+ ### `wrangler pages deploy` in Direct Upload mode ignores `.gitignore`
272
+
273
+ `.gitignore` semantics differ between Git-integrated and Direct Upload Pages projects. Direct Upload uploads EVERY file in the target directory, including `.env`, `.pem`, `.key`, and anything else `.gitignore` would normally hide. Always deploy from a dedicated subdirectory (`dist/`, `public/`, `site/`) — never repo root. See `docs/methods/DEVOPS_ENGINEER.md` §Deploy Surface Boundary.
274
+
275
+ Evidence: field report #305 — 32-day live credential leak via `wrangler pages deploy .` from repo root.
276
+
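A pre-deploy scan of the target directory is one way to catch this before upload. The sketch below is illustrative only: the suffix and prefix lists are assumptions drawn from the file types named above, not an exhaustive secret scanner, and `find_sensitive` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Assumed patterns, taken from the file types named in this section.
SENSITIVE_SUFFIXES = {".pem", ".key"}
SENSITIVE_NAME_PREFIX = ".env"


def find_sensitive(deploy_dir: Path) -> list[str]:
    """List files under deploy_dir that should never be uploaded."""
    hits = []
    for path in sorted(deploy_dir.rglob("*")):
        if not path.is_file():
            continue
        if path.suffix in SENSITIVE_SUFFIXES or path.name.startswith(SENSITIVE_NAME_PREFIX):
            hits.append(path.relative_to(deploy_dir).as_posix())
    return hits


# Synthetic deploy directory that mimics deploying from a repo root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "index.html").write_text("<html></html>", encoding="utf-8")
    (root / ".env").write_text("API_KEY=redacted", encoding="utf-8")
    (root / "certs").mkdir()
    (root / "certs" / "server.pem").write_text("redacted", encoding="utf-8")
    demo_hits = find_sensitive(root)

print(demo_hits)
```

Running a check like this against the directory passed to `wrangler pages deploy` and aborting on any hit keeps the deploy surface limited to built assets.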
277
+ ### `wrangler pages deployment delete --force` doesn't force aliased deployments
278
+
279
+ For aliased deployments (preview URLs attached to branch names), the CLI's `--force` flag does NOT pass `force=true` to the underlying API. Result: error code 8000035 "deployment is aliased and cannot be deleted." Workaround: call the Cloudflare API directly with `force=true` in the request body.
280
+
281
+ ### Cloudflare Pages Dev Mode + Purge Everything may not evict all cache
282
+
283
+ Dev Mode and Purge Everything are both best-effort across Cloudflare's PoP network. For time-critical evictions (e.g., post-credential-rotation):
284
+
285
+ 1. Purge Everything in the dashboard.
286
+ 2. Run Custom Purge by URL for the specific asset.
287
+ 3. Enable Dev Mode.
288
+ 4. Wait at least TTL + 60 seconds before asserting eviction.
289
+
290
+ If the stale content persists after step 4, a second Custom Purge + TTL wait is often required. Do not assume a single purge is sufficient.
291
+
292
+ Evidence: field report #305 — credential-leak remediation required multiple purge passes before all PoPs served the rotated content.
@@ -0,0 +1,80 @@
1
+ # Pattern: ADR Verification Gate
2
+
3
+ **When to use:** Every ADR with a verification gate. The gate must prove the *fix* is correct — not merely that a refactor preserved existing behavior.
4
+
5
+ **Source:** Field reports #313 (Fixture Bindability), #314 (HARD GATE feasibility), #318 (sum-verification), #316 (schema cross-check).
6
+
7
+ ## The Failure Mode
8
+
9
+ ADRs ship with verification gates that record PASS but cannot demonstrate fix correctness. Examples:
10
+
11
+ - **Refactor-only proof:** ADR-040 (#313): "12-day forensic window is bit-identical." Straddle P&L was unchanged before and after — but the forensic window never exercised the capped path. Proximity stayed wide enough that the cap ceiling was never hit. The PASS proved arithmetic preservation, not cap correctness.
12
+ - **Empty-solution gate:** ADR-036 M9.1a HARD GATE (#314): asked the kernel to identify forensic-directional days using only pre-midnight 4h inputs. Algebraic intersection of "directional" and "symmetric" pins had no solution. Required operator escalation + reframing.
13
+ - **Aspirational claim:** ADR-039 (#313): header said `STRUCT-006, STRUCT-012 — fully implemented in v0.4.0`. At HEAD, neither existed. No file-existence check before marking Accepted.
14
+
15
+ ## The Pattern
16
+
17
+ Every ADR includes a Verification Gate block:
18
+
19
+ ```markdown
20
+ ## Verification Gate
21
+
22
+ **Fixture:** <data set / scenario / runtime state used to exercise the gate>
23
+
24
+ **Can the gate FAIL under this fixture?** <yes | no + algebraic/empirical rationale>
25
+ - If **no**: this is a refactor-correctness test, not a fix-correctness test.
26
+ Add a fixture where the fix CAN bind, OR downgrade the verification claim
27
+ to "preserves prior behavior" (which is a refactor proof, not a fix proof).
28
+
29
+ **Fixture-bindability proof:** <one sentence showing the fixture would detect
30
+ regression if the fix were incorrect>
31
+
32
+ **Rehearsed at:** <commit-sha or "not yet" — see Step 4.7 of architect.md>
33
+
34
+ **Implementation Scope (reality anchor):**
35
+ - Status: Proposed | Accepted | Deferred
36
+ - Deliverables exist at HEAD?
37
+ - <path/1> — <existence-check command + result>
38
+ - <path/2> — <existence-check command + result>
39
+ - If any deliverable is missing: status MUST be "Proposed," not "Accepted."
40
+
41
+ **Sum-verification (if ADR contains numbered cohorts):**
42
+ - Headline claim: "<X total>"
43
+ - Independent sum of cohorts: <Y>
44
+ - Match? <yes | no + which is canonical>
45
+ ```
46
+
47
+ ## Decision Tree
48
+
49
+ | Situation | What to do |
50
+ |-----------|-----------|
51
+ | Gate fixture is fixed historical data | Verify the data exercises the fix path. If the historical window doesn't trip the gate, add a synthetic adversarial case. |
52
+ | Gate is "bit-identical to prior implementation" | Acceptable as a refactor proof. NOT acceptable as the only evidence the fix is correct — pair with a fix-correctness gate. |
53
+ | Gate is a HARD GATE with multiple acceptance pins | Compute the algebraic intersection of all pins. If the solution set is empty, the gate is structurally infeasible — escalate to operator. |
54
+ | ADR cites file paths as deliverables | Run `[ -f <path> ] && echo present \|\| echo MISSING` for each before marking Accepted. |
55
+ | ADR cites cohort sums (e.g., "55 tables = 37+5+7+5+1") | Spock-style independent sum. Mismatch → document which is canonical. |
56
+ | ADR amends an earlier ADR | Cross-ADR cascade scan: every dependent ADR's references must be checked for stale claims. Bundle amendments in one commit. |
57
+
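The sum-verification row reduces to a one-line check. The sketch below uses numbers quoted in this document (a headline of 47 from field report #318 and the example breakdown 37+5+7+5+1); pairing them is for illustration only, and `verify_cohort_sum` is a hypothetical helper.

```python
def verify_cohort_sum(headline_total: int, cohorts: list[int]) -> tuple[bool, int]:
    """Return (match?, independently computed sum) for an ADR's cohort breakdown."""
    computed = sum(cohorts)
    return computed == headline_total, computed


matches, computed = verify_cohort_sum(headline_total=47, cohorts=[37, 5, 7, 5, 1])
print(matches, computed)  # False 55
```

When the result is a mismatch, the ADR records which figure is canonical and why, so downstream reviewers do not each redo the arithmetic.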
58
+ ## Anti-Patterns
59
+
60
+ - **"Bit-identical" without fixture-bindability proof.** Demonstrates arithmetic preservation, not fix correctness.
61
+ - **"Fully implemented in vX.Y" without a file-existence check.** Aspirational status; reviewers gain false confidence.
62
+ - **HARD GATE pins derived from post-hoc forensic labels.** Algebraically infeasible if the kernel's input set doesn't contain the discriminating signal.
63
+ - **Numbered breakdowns without independent sum.** Cascades into wasted reviewer cycles when 3+ downstream agents independently re-verify the math.
64
+ - **Single-form structural sentinels.** A gate that detects only `current_setting(...) = ''` misses commuted, cast, IS-NULL, and coalesce variants. See `/docs/patterns/structural-sql-sentinel.py` for adversarial-test discipline.
65
+
66
+ ## When the Gate Cannot Bind
67
+
68
+ If the proposed fixture cannot exercise the fix:
69
+
70
+ 1. Construct a synthetic fixture that does. (For numerical kernels: jitter inputs across the threshold. For RLS gates: test under a non-owner role. For middleware: test at expected RPS.)
71
+ 2. If no fixture is feasible (e.g., the fix is a defensive guard for an unreachable state), the ADR is documenting a *theoretical* fix — say so explicitly: *"Verification: theoretical; this guard cannot be exercised in normal operation."*
72
+ 3. NEVER ship a PASS that asserts only what the algebra already requires.
73
+
74
+ ## Riker's Standing Question
75
+
76
+ When reviewing any ADR with a Verification Gate, Riker asks: *"Can this gate FAIL under the proposed fixture?"* The honest answer drives the disposition:
77
+
78
+ - **Yes, with a clear failure path** → gate is sound; ADR may be Accepted.
79
+ - **No, the algebra forbids it** → gate is circular; require an additional fix-correctness fixture or downgrade the claim.
80
+ - **Unsure** → spike a deliberate regression and observe whether the gate trips.
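
Spiking a deliberate regression can be sketched with a toy cap function. Everything here is hypothetical (the cap value, the fixtures, and the function names); the point is only to show a fixture that cannot bind versus one that can.

```python
CAP = 10.0


def capped(value: float, cap: float) -> float:
    """The fix under review: clamp value to the cap."""
    return min(value, cap)


def regressed(value: float, cap: float) -> float:
    """Deliberate regression: the cap is silently ignored."""
    return value


# (input, expected output) pairs. The wide fixture never reaches the cap.
WIDE_FIXTURE = [(1.0, 1.0), (4.5, 4.5), (9.9, 9.9)]
BINDING_FIXTURE = WIDE_FIXTURE + [(12.0, 10.0)]


def gate_passes(impl, fixture) -> bool:
    """The verification gate: every fixture input must map to its expected output."""
    return all(impl(x, CAP) == expected for x, expected in fixture)


def can_gate_fail(fixture) -> bool:
    """A fixture binds only if the gate trips on the deliberate regression."""
    return not gate_passes(regressed, fixture)


print(gate_passes(capped, WIDE_FIXTURE), can_gate_fail(WIDE_FIXTURE))
print(gate_passes(capped, BINDING_FIXTURE), can_gate_fail(BINDING_FIXTURE))
```

With the wide fixture the gate passes for both the fix and the regression, so its PASS says nothing about the cap. Only the binding fixture distinguishes them.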