voidforge-build 23.12.0 → 23.12.1

@@ -38,6 +38,10 @@ Report findings by WCAG criterion:
38
38
 
39
39
  Summary with WCAG AA pass/fail assessment.
40
40
 
41
+ ## Operational Learnings
42
+
43
+ - Token NAMES are NOT proxies for VALUES (field report #355 F1): a token called `paper` may resolve to near-black, `ink` to near-white. Never infer contrast from token names. Before rating any contrast finding Critical, cite the literal source hex for BOTH foreground and background with file:line (e.g. `tailwind.config.ts:42` / `globals.css:18`) and re-grep that the class pairing actually exists in the rendered markup — a token pairing that never co-occurs is not a real contrast failure.
44
+
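The "cite the literal hex" rule above can be made concrete with a minimal sketch that computes a WCAG 2.x contrast ratio from two hex values. The `TOKENS` table and all function names are hypothetical, introduced only for illustration. The luminance formula and the 4.5:1 AA threshold for normal-size text follow WCAG 2.x.

```python
def _linearize(channel_8bit: int) -> float:
    """Linearize one 8-bit sRGB channel (WCAG 2.x relative-luminance definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #rgb or #rrggbb color."""
    h = hex_color.lstrip("#")
    if len(h) == 3:  # expand shorthand, e.g. "fff" -> "ffffff"
        h = "".join(ch * 2 for ch in h)
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg_hex: str, bg_hex: str) -> float:
    """WCAG contrast ratio; the lighter color always goes in the numerator."""
    l1, l2 = relative_luminance(fg_hex), relative_luminance(bg_hex)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical token table: the NAMES suggest light paper and dark ink,
# but the VALUES are inverted. Only the hex values decide the outcome.
TOKENS = {"paper": "#111111", "ink": "#f5f5f5"}


def passes_aa_normal_text(fg_hex: str, bg_hex: str) -> bool:
    """AA for normal-size text requires a ratio of at least 4.5:1."""
    return contrast_ratio(fg_hex, bg_hex) >= 4.5
```

The functions take resolved hex values only, never token names, so a finding built on them has to look up the source value (and its file:line) before it can be rated.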
41
45
  ## Reference
42
46
 
43
47
  - Agent registry: `/docs/NAMING_REGISTRY.md`
@@ -106,6 +106,8 @@ DEPLOYMENT REMINDER: You MUST now launch an Agent sub-process for EVERY agent li
106
106
  - **scope_density — small/single-shot surfaces want a 6-10 roster.** When the prompt describes <10 source files, a single deploy host, and a one-shot or single-viewer use case, prefer a roster size of 6-10 instead of the usual 18-22. Generate the lean roster up front rather than over-including 18-22 and pruning afterward — the up-front lean roster saves the orchestrator the pruning round and saves sub-agent launches that would only restate each other. (Field report #344, F5.)
107
107
  - **Creative/UX rosters need a web-capable scout.** The design agents (Galadriel, Arwen, Eowyn, Glorfindel, Celeborn) carry only Read/Write/Edit/Bash/Grep/Glob — they cannot see the web, so they can't ground a creative or UX roster in current design conventions, competitor patterns, or external references. Any creative/UX roster MUST include at least one web-capable scout (a general-purpose agent equipped with WebSearch/WebFetch); if no such agent is on the roster, flag explicitly that the roster needs external grounding so the orchestrator can add one. (Field report #347, #5.)
108
108
  - **Orchestrator roster-name normalization (handoff note).** Before launching, the orchestrator validates each roster name against `ls .claude/agents/` (basenames minus `.md`). For any name with no exact match, it attempts exactly one correction — strip a known prefix/suffix (e.g. `voidforge-`, or add/remove a `-architecture`/`-security-arch` suffix) and re-check — then DROPS the name if still unmatched rather than blocking the whole dispatch on one bad entry. You make this rarely necessary by emitting exact basenames per the BASENAME CONSTRAINT above. (Field report #345, DEAL-001.)
109
+ - **coverage_debt — an "unsampled"/"not-checked" flag from a prior review agent is COVERAGE DEBT, not a closed item.** When the orchestrator's context carries a finding from an earlier phase that a file, route, or surface was explicitly NOT sampled / NOT checked (e.g. "only 3 of 11 endpoints reviewed" or "templates dir not examined"), that gap is owed work — carry it explicitly into the next phase's roster reasoning and work-list rather than letting it silently drop. Name the unsampled surface in your reasoning and weight an agent to own it next pass. Coverage debt that nobody is assigned to repay becomes a permanent blind spot. (Field report #355 F2.)
110
+ - **focused_partition — a single named lens caps the roster ~6-8 and PARTITIONS by surface, not by persona.** When the user names exactly ONE review lens (copy-only / contrast-only / perf-only / a single-domain FOCUSED review), do NOT field a stack of near-duplicate personas all reviewing everything — that multiplies redundant findings without adding coverage. Cap the roster at roughly 6-8 and PARTITION the agents by SURFACE/SECTION so each owns a distinct set of files (agent A: marketing pages, agent B: app shell, agent C: settings/account, etc.), all applying the same single lens. This keys on the user naming one lens — distinct from scope_density/scope_bias, which key on codebase size and explicit path scope. (Field report #355 F3.)
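The roster-name normalization described in the handoff note above can be sketched as follows. This is a hedged illustration, not the orchestrator's actual code: the function names are hypothetical, and it reads "exactly one correction" as a single round of non-chained candidates (strip the prefix, or add/remove one suffix), which is an assumption.

```python
from pathlib import Path

# Prefix and suffixes named in the handoff note.
KNOWN_PREFIXES = ("voidforge-",)
KNOWN_SUFFIXES = ("-architecture", "-security-arch")


def available_agents(agents_dir: str = ".claude/agents") -> set:
    """Basenames minus .md, the equivalent of `ls .claude/agents/`."""
    return {p.stem for p in Path(agents_dir).glob("*.md")}


def single_corrections(name: str):
    """Yield candidates that are exactly one correction away from `name`."""
    for prefix in KNOWN_PREFIXES:
        if name.startswith(prefix):
            yield name[len(prefix):]
    for suffix in KNOWN_SUFFIXES:
        if name.endswith(suffix):
            yield name[: -len(suffix)]   # remove the suffix
        else:
            yield name + suffix          # add the suffix


def normalize_roster(roster, known):
    """Return (kept, dropped). Unmatched names are dropped, never blocking."""
    kept, dropped = [], []
    for name in roster:
        if name in known:
            kept.append(name)
            continue
        # One correction round only: candidates are not chained together.
        match = next((c for c in single_corrections(name) if c in known), None)
        if match is not None:
            kept.append(match)
        else:
            dropped.append(name)
    return kept, dropped
```

Returning the dropped names separately keeps the drop visible in the dispatch log instead of silently shrinking the roster.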
109
111
 
110
112
  ## Required Context
111
113
 
@@ -32,7 +32,7 @@ Read deploy target from PRD frontmatter. If not specified, scan for evidence:
32
32
 
33
33
  Levi verifies the deploy is safe:
34
34
  1. **Build passes:** `npm run build` (or equivalent) must succeed
35
- 2. **Tests pass:** `npm test` must pass (if test suite exists)
35
+ 2. **Tests pass:** `npm test` must pass (if test suite exists). The **full suite** is the deploy gate — a targeted/isolation-green run is necessary but NOT sufficient, because environment coupling can regress unrelated tests invisibly to isolation runs. *Isolation-green is not deploy-green* (field report #354 F3): run the whole suite before the gate clears, not just the tests for the change at hand.
36
36
  3. **No uncommitted changes:** `git status` clean
37
37
  4. **Credentials available:** SSH key, API token, or platform credentials accessible
38
38
  5. **Version tagged:** Current version from VERSION.md matches the commit being deployed
@@ -101,10 +101,26 @@ Merge all findings into a review table (conflicts already resolved via Step 1.5)
101
101
  Categories: Pattern, Quality, Maintainability
102
102
  Severity: Must Fix > Should Fix > Consider > Nit
103
103
 
104
- **Confidence scoring is mandatory.** Every finding includes a confidence score (0-100). If confidence is below 60, escalate to a second agent from a different universe (e.g., if Spock found it, escalate to Oracle or Stark) to verify before including. If the second agent disagrees, drop the finding. High-confidence findings (90+) skip re-verification in Step 3.5.
104
+ **Confidence scoring is mandatory.** Every finding includes a confidence score (0-100). If confidence is below 60, escalate to a second agent from a different universe (e.g., if Spock found it, escalate to Oracle or Stark) to verify before including. If the second agent disagrees, drop the finding. High-confidence findings (90+) skip re-verification in Step 3.5. **For Must Fix / Should Fix findings this confidence-escalation is no longer the gate — they route through the vote-based REFUTE Gate in Step 2.5 (field report #354 F1) regardless of confidence; a Must Fix at confidence 97 still faces skeptics told to REFUTE.** Confidence escalation governs Consider/Nit and Medium-and-below findings.
105
105
 
106
106
  **SSOT direction reconciliation (mandatory for access/permission/contract findings — field report #349).** For any finding whose fix touches access control, a permission, or an API/data contract, the finding is NOT actionable until you NAME the governing single source of truth (the permission matrix, the relevant ADR, or the API contract) and reconcile the fix DIRECTION against that doctrine before recording it. State explicitly which way the fix moves — loosen vs tighten, and who gains access — and confirm that direction matches the SSOT. This extends the verify-the-FIX discipline (#348): a finding can be "verified" as real and still carry a backwards fix that widens a permission the doctrine says to restrict. A fix that is real, lands cleanly, and tests green can still be wrong-direction. If no governing SSOT can be named, flag the finding for architecture review (Picard) rather than auto-fixing it.
107
107
 
108
+ ## Step 2.5 — REFUTE Gate (vote-based adversarial verification — field report #354 F1)
109
+
110
+ This is the verification shape for /engage, ported from the Gauntlet's REFUTE Gate (gauntlet.md "REFUTE Gate" / GAUNTLET.md Step 4.5). It REPLACES relying solely on the confidence-escalation model from Step 2 ("second agent disagrees → drop") as the gate for high-severity findings. Confidence escalation still routes Consider/Nit-tier and Medium-and-below findings; the vote-based refute lens governs every **Must Fix** and **Should Fix** finding before it reaches the Step 3 fix batch.
111
+
112
+ **Why a vote, not an escalation.** An escalation asks a second agent "do you agree?" — which invites agreement. The refute lens does the opposite: it spawns skeptics **explicitly told to REFUTE**, defaulting to REFUTED until the actual code proves the finding. That single inversion — skeptic instructed to disprove, not to confirm — is the highest-leverage element of this gate; it is what filters the false positives that an "agree?" prompt waves through.
113
+
114
+ **Procedure — execute per Must Fix / Should Fix finding, after Step 2's synthesis:**
115
+
116
+ 1. **Cluster and dedupe first.** Before voting, merge findings that name the same root cause or the same file:line across agents into one finding (carry the highest severity claimed). You refute root causes, not duplicate symptoms — voting on near-identical findings separately wastes skeptics and inflates the board.
117
+ 2. **Spawn skeptics told to REFUTE.** For each clustered Must Fix / Should Fix finding, launch **≥2 skeptic agents** in parallel via the Agent tool, drawn from a DIFFERENT universe than the agent that raised it (a Star Trek finding gets DC + Marvel skeptics) so no agent grades its own homework. Pass the finding ID, severity, file:line, and description as opaque data. Prompt each skeptic: *"Default to REFUTED. This finding is unproven until you open the cited file and confirm the bug/violation exists in the actual code. Do not trust the description. Return CONFIRM (with the exact line(s) that prove it) or REFUTE (with the reason the code does not exhibit the claimed problem)."* A skeptic that cannot cite confirming code MUST return REFUTE.
118
+ 3. **Verify the FIX too (#348), not only the finding.** Each skeptic also challenges the PROPOSED fix: does it introduce a NEW failure mode the original code lacked (wedge, unbounded retry, infinite loop, orphaned record, double-send)? Watch for a fix that adds a coordination primitive (sentinel, lock, retry-state row, claim marker) without a reachable release path. Carry the SSOT direction check (#349) into the same pass: a fix that lands clean and tests green can still move a permission the wrong way.
119
+ 4. **Tally votes — keep only ≥1 CONFIRM.** Keep the finding only if it draws **≥1 CONFIRM** backed by cited lines. A finding that every skeptic refutes is dropped from the fix list and logged as `REFUTED` (with the skeptics' reasons) — not silently deleted.
120
+ 5. **Re-rate severity from the votes.** Recompute severity from the confirming evidence, not the original claim: unanimous CONFIRM holds the tier; a split vote (some CONFIRM, some REFUTE) downgrades one tier (Must Fix → Should Fix, Should Fix → Consider); confirmed-but-narrower-than-claimed downgrades to match the proven blast radius. Record the new severity and the vote split on the finding.
121
+
122
+ Only the survivors, at their re-rated severity, proceed to Step 3.
123
+
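The tally and re-rate rules in steps 4 and 5 can be sketched as a small function. This is a minimal illustration under stated assumptions: the `Vote` type, the `tally` name, and the result dictionary are hypothetical. The "confirmed-but-narrower-than-claimed" downgrade needs human or agent judgment and is deliberately not modeled.

```python
from dataclasses import dataclass

# Severity ladder from Step 2, highest first.
TIERS = ["Must Fix", "Should Fix", "Consider", "Nit"]


@dataclass
class Vote:
    skeptic: str
    verdict: str        # "CONFIRM" or "REFUTE"
    evidence: str = ""  # cited line(s) for CONFIRM, reason for REFUTE


def tally(severity: str, votes: list) -> dict:
    """Apply the REFUTE Gate vote rules to one clustered finding."""
    if len(votes) < 2:
        raise ValueError("the gate requires at least two skeptics")
    # A CONFIRM without cited lines is treated as REFUTE.
    confirms = [v for v in votes if v.verdict == "CONFIRM" and v.evidence.strip()]
    split = f"{len(confirms)} CONFIRM / {len(votes) - len(confirms)} REFUTE"
    if not confirms:
        # Dropped from the fix list but logged, never silently deleted.
        reasons = [v.evidence for v in votes]
        return {"status": "REFUTED", "severity": None, "split": split, "reasons": reasons}
    if len(confirms) == len(votes):
        rerated = severity  # unanimous CONFIRM holds the tier
    else:
        # Split vote: downgrade exactly one tier, bottoming out at the last tier.
        rerated = TIERS[min(TIERS.index(severity) + 1, len(TIERS) - 1)]
    return {"status": "KEPT", "severity": rerated, "split": split, "reasons": []}
```

Only results with status `KEPT` would proceed to Step 3, at the re-rated severity recorded in the result.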
108
124
  ## Step 3 — Fix (small batches)
109
125
  Fix "Must Fix" and "Should Fix" items. After each batch:
110
126
  1. Re-run `npm test`
@@ -58,8 +58,23 @@ Write all findings to `/logs/phase-11-security-audit.md` (or appropriate phase l
58
58
 
59
59
  Severity = exploitability x impact. Critical (auth bypass, data leak) > High (injection, IDOR) > Medium (missing headers, weak config) > Low (best practice)
60
60
 
61
+ **Enforcement-keyed severity check (field report #354 F2).** Before assigning any severity — and again when re-rating in the REFUTE Gate — ask: *"Where is this ACTUALLY enforced?"* A client-side affordance leak (a hidden admin button rendered in the DOM, a disabled field, a route present in the bundle) is a breach ONLY if the server fails to enforce the boundary. If the SERVER still enforces it — the request returns 403/404 even though the affordance leaked (render-then-403) — the finding is **UX-only (P2/P3)**, not a breach. Do NOT rate server-enforced client affordance leaks as P0/P1. To prove a P0/P1, the skeptic must show the privileged action SUCCEEDING server-side (a 200 with the protected effect), not merely that the control was visible client-side. A leaked affordance with a server 403 behind it is polish, not a vulnerability.
62
+
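The enforcement-keyed check above reduces to a small decision function. The following is a hedged sketch: the function name, parameters, and the "unproven" bucket are hypothetical. It only encodes what the paragraph states, namely that a 403/404 behind a leaked affordance is UX-only, and that P0/P1 requires the privileged action succeeding server-side.

```python
def enforcement_keyed_severity(
    affordance_leaked: bool,
    server_status: int,
    protected_effect_observed: bool,
) -> str:
    """Classify a client-side affordance leak by where the rule is actually enforced.

    server_status is the HTTP status the server returned when the privileged
    action was attempted; protected_effect_observed is True only if the
    protected effect actually happened server-side.
    """
    if not affordance_leaked:
        return "no finding"
    if 200 <= server_status < 300 and protected_effect_observed:
        # The server honored the leaked affordance: a real access-control gap.
        return "P0/P1"
    if server_status in (403, 404):
        # Render-then-403: the boundary holds, so this is polish, not a breach.
        return "P2/P3 (UX-only)"
    # Anything else (for example a 2xx with no observed effect) proves nothing yet.
    return "unproven"
```

The "unproven" result is an assumption of this sketch: the source names only the 403/404 and success cases, so other responses are left for the skeptic to investigate rather than being rated.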
61
63
  **Confidence scoring is mandatory.** Every finding includes a confidence score (0-100). If confidence is below 60, escalate to a second agent from a different universe (e.g., if Maul found it, escalate to Deathstroke or Constantine) to verify before including. If the second agent disagrees, drop the finding. High-confidence findings (90+) skip re-verification in Phase 4.
62
64
 
65
+ #### REFUTE Gate — Adversarial Verification (before fixing any Critical/High) (field report #354 F1)
66
+
67
+ A single Maul red-team pass is not enough to drive fixes — one agent's accusation is not a verdict. Before fixing critical and high findings, run a vote-based REFUTE lens. This mirrors the canonical REFUTE Gate in `.claude/commands/gauntlet.md` ("REFUTE Gate — Adversarial Verification" section) — same shape, applied per-audit instead of per-round (field report #354 F1).
68
+
69
+ **Procedure — execute per Critical/High finding:**
70
+
71
+ 1. **Cluster the findings.** Group findings that describe the same root cause or the same file/flow so skeptics vote on one accusation, not a dozen restatements of it.
72
+ 2. **Spawn skeptics to REFUTE.** For each Critical/High finding (or cluster), launch at least two skeptic agents in parallel via the Agent tool, drawn from a DIFFERENT universe than the agent that raised it (a Star Wars finding gets DC + Marvel skeptics) so no agent grades its own homework. Each skeptic is instructed: *"Default to REFUTED. This finding is unproven until you open the cited file and confirm the exploit exists in the actual code. Do not trust the description. Return CONFIRM (with the exact line(s) that prove it) or REFUTE (with the reason the code does not exhibit the claimed problem)."* A skeptic that cannot point to confirming code MUST return REFUTE.
73
+ 3. **Keep ≥1-CONFIRM survivors.** Keep the finding only if it draws **≥1 CONFIRM** backed by cited lines. An all-REFUTE finding is dropped from the fix list and logged as `REFUTED` with the skeptics' reasons — not silently deleted.
74
+ 4. **Re-rate severity from the votes.** Recompute severity from the confirming evidence, not the original claim: unanimous CONFIRM at the original tier holds; a split vote (some CONFIRM, some REFUTE) downgrades one tier (Critical→High, High→Medium); confirmed-but-narrower-than-claimed downgrades to match the proven blast radius. Record the new severity and the vote split on the finding.
75
+
76
+ Only ≥1-CONFIRM survivors at their re-rated severity proceed to the fix step below. Medium/Low findings skip the gate (they are not fix-blocking) but may still be escalated under the low-confidence rule above. Log every vote (CONFIRM/REFUTE, agent, universe, cited lines or refute reason) and the re-rated severity to the audit log.
77
+
63
78
  Fix critical and high findings immediately. Medium findings get tracked. For each fix:
64
79
  1. Apply the fix
65
80
  2. Verify it works
@@ -18,6 +18,8 @@ Opus scans `git diff --stat` and matches changed files against the `description`
18
18
 
19
19
  **Dispatch control:** `--light` skips dynamic dispatch (core only). `--solo` runs lead agent only.
20
20
 
21
+ **Focused single-domain reviews — partition by surface, don't stack personas (field report #355 F3).** When the user names exactly ONE lens via `--focus` (copy-only, contrast-only, perf-only, etc.), do NOT spin up the full multi-domain roster, and do NOT stack near-duplicate personas that all review the entire surface. Cap the roster at ~6-8 agents and PARTITION them by SURFACE/SECTION — each agent owns a distinct set of files/routes/components and reviews only that slice through the single requested lens. One copy reviewer per surface zone (auth pages, dashboard, settings, marketing), not four copy reviewers all re-reading every screen. Partitioning by surface gives coverage without redundant overlap; persona-stacking on one lens just re-finds the same issues.
22
+
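Partitioning by surface, as described above, can be sketched like this. The zone names follow the example in the paragraph, but the path prefixes, the function name, and the hard cap of 8 are assumptions made for illustration only.

```python
# Hypothetical surface zones keyed by path prefix; real projects will differ.
SURFACES = {
    "auth": ("src/app/auth/",),
    "dashboard": ("src/app/dashboard/",),
    "settings": ("src/app/settings/",),
    "marketing": ("src/app/marketing/",),
}

MAX_FOCUSED_ROSTER = 8  # the source says roughly 6-8; 8 is used as the ceiling here


def partition_by_surface(files, surfaces, lens):
    """Give each surface zone one reviewer that applies the single requested lens."""
    if len(surfaces) > MAX_FOCUSED_ROSTER:
        raise ValueError("focused review: merge zones to stay within the roster cap")
    assignments = {zone: [] for zone in surfaces}
    unowned = []
    for path in sorted(files):
        # str.startswith accepts a tuple of prefixes.
        zone = next((z for z, prefixes in surfaces.items() if path.startswith(prefixes)), None)
        if zone is None:
            unowned.append(path)  # no reviewer owns this file yet
        else:
            assignments[zone].append(path)
    return {"lens": lens, "assignments": assignments, "unowned": unowned}
```

Each file lands in at most one slice, so no two agents re-read the same screen. Files in `unowned` match no zone and need an explicit owner before the review is considered to cover them.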
21
23
  ## Context Setup
22
24
  1. Read `/logs/build-state.md` — understand current project state
23
25
  2. Read `/docs/methods/PRODUCT_DESIGN_FRONTEND.md`
@@ -99,6 +101,8 @@ Categories: UX, Visual, A11y, Copy, Performance, Edge Case
99
101
 
100
102
  **Confidence scoring is mandatory.** Every finding includes a confidence score (0-100). If confidence is below 60, escalate to a second agent from a different universe (e.g., if Samwise found it, escalate to Padmé or Nightwing) to verify before including. If the second agent disagrees, drop the finding. High-confidence findings (90+) skip re-verification in Step 7.5.
101
103
 
104
+ **Enforcement-keyed severity — don't escalate a client affordance leak the server still enforces (field report #354 F2).** Before assigning Critical to a "leak," ask whether the server still enforces the underlying rule. A client-side affordance that exposes something it shouldn't — a hidden-but-rendered admin button, a disabled control the user can re-enable in devtools, a stale UI showing a forbidden option — is a UX defect (P2/P3), NOT a security breach, AS LONG AS the server rejects the action. The fix is to hide/disable the affordance correctly; severity is UX-grade. Reserve Critical for cases where the server actually honors the leaked affordance (a real access-control gap) — and that finding belongs to Kenobi (`/sentinel`), routed via Handoffs, not graded here as a UX Critical.
105
+
102
106
  ## Step 5 — Enhancement Specs (before coding)
103
107
  For each fix: problem statement, proposed solution, acceptance criteria, a11y requirements (**Samwise** `subagent_type: Samwise` signs off), copy (**Bilbo** `subagent_type: Bilbo` signs off). **Faramir** `subagent_type: Faramir` checks whether polish effort targets the right screens — high-traffic core flows, not low-traffic edge pages.
104
108
 
package/dist/CHANGELOG.md CHANGED
@@ -6,6 +6,32 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/), and this
6
6
 
7
7
  ---
8
8
 
9
+ ## [23.12.1] - 2026-06-09
10
+
11
+ ### Follow-on triage — #354/#355 (8 fixes) + chronic CI-check fix
12
+
13
+ `#354` and `#355` were filed *during* the v23.12.0 run, so a second `/debrief --inbox` triaged them against the post-v23.12.0 tree. The adversarial verify pass overturned one false "already-fixed" (#355 F4); `#355 F5` was confirmed already-shipped (derived-counts doctrine). 8 fixes applied across 15 files.
14
+
15
+ ### Changed
16
+
17
+ - **REFUTE lens reaches `/engage` + `/sentinel`** (#354 F1) — v23.12.0 added the vote-based adversarial REFUTE pass (skeptics told to refute, ≥1 CONFIRM to keep, re-rate from votes) to `/gauntlet` + `/assemble`, but `/engage` and `/sentinel` still used the older "second agent disagrees → drop" model. Ported as `/engage` Step 2.5 and a `/sentinel` Phase 3 gate; named find→cluster→3-lens-verify as the default review shape in `SUB_AGENTS.md`.
18
+ - **Enforcement-keyed severity rubric** (#354 F2) — severity keys to the enforcement *layer*, not the symptom location: a client affordance leak the server still enforces (render-then-403) is UX-only (P2/P3), not a breach. "Where is this actually enforced?" added to the audit + verify lens. → `SECURITY_AUDITOR.md`, `PRODUCT_DESIGN_FRONTEND.md`, `/ux`, `/sentinel`.
19
+ - **"Isolation-green ≠ deploy-green"** (#354 F3) — targeted/isolation test runs are necessary but not sufficient; only the full suite is the deploy gate (environment coupling regresses unrelated tests invisibly to isolation runs). → `BUILD_PROTOCOL.md`, `/deploy`, `QA_ENGINEER.md`.
20
+ - **Boot-time DDL-ownership class** (#354 F4) — startup schema re-application can fail on tables owned by a different DB role than the app connects as; the other two deploy-env classes it named (served-artifact, `.env` precedence) were already covered by v23.12.0/prior. → `DEVOPS_ENGINEER.md`, `database-migration.ts`.
21
+ - **Contrast findings must cite source hex** (#355 F1) — a contrast finding must quote the literal source hex for *both* fg and bg with `file:line`, and re-grep that the class pairing exists, before being rated Critical; token NAMES are not proxies for VALUES (a token called "paper" may be near-black). Defends against the token-name-swap false site-wide Critical. → `PRODUCT_DESIGN_FRONTEND.md`, `GAUNTLET.md`, `samwise-accessibility.md`.
22
+ - **Glob-derived fan-out work-lists** (#355 F2) — derive per-agent file lists for a directory/migration fan-out from a glob, never a hand-typed list, and pair every fan-out with a mandatory post-fan-out completeness sweep before the wave is "done"; an "unsampled"/"not-checked" flag is coverage debt to carry forward. → `CAMPAIGN.md`, `SUB_AGENTS.md`, `silver-surfer-herald.md`.
23
+ - **Focused single-lens roster sizing** (#355 F3) — when `--focus` names one lens, cap the roster ~6–8 and partition agents by surface, not by near-duplicate persona. → `silver-surfer-herald.md`, `/ux`, `GAUNTLET.md`.
24
+ - **Per-wave staging deploy = status checkpoint** (#355 F4) — inlined in `CAMPAIGN.md` Step 4/5 action prose (the anti-pattern callout existed; the inline action-prose statement did not — caught by the verify pass).
25
+ - **Fixed the chronically-red `validate-branches.yml` slash-command check** — its `grep '| \`/[a-z]'` matched the Docs-Reference table's `/docs/*.md` rows and the `sed` mangled them into bogus `MISSING` paths, failing the job on every release since v23.11.0. Now anchored to bare `/command` cells (letters/hyphens, closing backtick) so `/docs/...` and `/HOLOCRON.md` are excluded. This is itself the #352 "gate that doesn't gate" class. (Note: the workflow's separate `e2e-tests` job still fails on a pre-existing wizard `aria-required-children` a11y issue — unrelated to methodology.)
26
+ - **Registered `/audit-docs`** in the CLAUDE.md Slash Commands table (shipped as a command in v23.12.0 but not listed). Synced to `packages/methodology/CLAUDE.md`.
27
+ - **`packages/voidforge/package.json`** methodology dep range `^23.12.0` → `^23.12.1` (ADR-062).
28
+
29
+ ### Closes
30
+
31
+ - **#354**, **#355**. #355 F5 already shipped (derived-counts doctrine); no out-of-scope items.
32
+
33
+ ---
34
+
9
35
  ## [23.12.0] - 2026-06-09
10
36
 
11
37
  ### Field Report Triage — 12 reports closed (#342–#353), 58 fixes + 5 new files
package/dist/CLAUDE.md CHANGED
@@ -156,6 +156,7 @@ Reference implementations in `/docs/patterns/`. Match these shapes when writing.
156
156
  | `/campaign` | Sisko's War Room — read the PRD, pick the next mission, finish the fight, repeat until done | All |
157
157
  | `/imagine` | Celebrimbor's Forge — AI image generation from PRD visual descriptions | All |
158
158
  | `/debrief` | Bashir's Field Report — post-mortem analysis, upstream feedback via GitHub issues | All |
159
+ | `/audit-docs` | Documentation audit — Surfer-led doc roster (Troi/Wong/Irulan/Coulson) for currency, cross-references, command↔method sync | All |
159
160
  | `/dangerroom` | The Danger Room (X-Men, Marvel) — installable operations dashboard for build/deploy/agent monitoring | Full |
160
161
  | `/cultivation` | Cultivation (Cosmere Shard) — installable autonomous growth engine: marketing, ads, creative, A/B testing, spend optimization | Full |
161
162
  | `/grow` | Kelsier's 6-phase growth protocol — initial setup within Cultivation, then autonomous loop | Full |
package/dist/VERSION.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # Version
2
2
 
3
- **Current:** 23.12.0
3
+ **Current:** 23.12.1
4
4
 
5
5
  ## Versioning Scheme
6
6
 
@@ -14,6 +14,7 @@ This project uses [Semantic Versioning](https://semver.org/):
14
14
 
15
15
  | Version | Date | Summary |
16
16
  |---------|------|---------|
17
+ | 23.12.1 | 2026-06-09 | Follow-on field-report pass — `/debrief --inbox` triaged #354/#355 (filed during the v23.12.0 run) against the post-v23.12.0 tree and applied 8 fixes across 15 files. #354: port the vote-based REFUTE lens from /gauntlet into `/engage` + `/sentinel` (they used the old "second agent disagrees → drop" model), name find→cluster→3-lens-verify as the default review shape in `SUB_AGENTS.md`, **enforcement-keyed severity rubric** (a server-enforced client affordance leak is UX P2/P3, not a P0 breach — `SECURITY_AUDITOR.md`/`PRODUCT_DESIGN_FRONTEND.md`/`/ux`/`/sentinel`), "isolation-green ≠ deploy-green" (`BUILD_PROTOCOL.md`/`/deploy`/`QA_ENGINEER.md`), boot-time DDL-ownership class (`DEVOPS_ENGINEER.md`/`database-migration.ts`). #355: **contrast findings must cite literal source hex for fg+bg with file:line + re-grep the pairing before Critical** (token NAMES ≠ VALUES — defends against the false site-wide Critical; `PRODUCT_DESIGN_FRONTEND.md`/`GAUNTLET.md`/Samwise), **glob-derived fan-out work-lists + mandatory post-fan-out residual sweep** (`CAMPAIGN.md`/`SUB_AGENTS.md`/herald), focused single-lens roster cap + surface-partition (herald/`/ux`/`GAUNTLET.md`), per-wave staging deploy = status checkpoint inlined in CAMPAIGN action prose (verify pass overturned a false already-fixed). #355 F5 confirmed already-shipped (derived-counts doctrine). Also: **fixed the chronically-red `validate-branches.yml` slash-command CI check** (its grep mis-read `/docs/*` Docs-Reference rows as commands — the #352 "gate that doesn't gate" class) and registered `/audit-docs` in the CLAUDE.md Slash Commands table. Dep range `^23.12.0` → `^23.12.1`. |
17
18
  | 23.12.0 | 2026-06-09 | The v23.12 methodology pass — `/debrief --inbox` triaged all 12 open field reports (#342–#353) and applied every accepted fix in one session via two-phase workflow orchestration (triage → apply), with an adversarial verify pass on every file. 58 fixes across 32 files + 5 new files. 7 clusters: **verify-the-FIX** (the adversarial pass must vet the proposed fix, not just the finding — SUB_AGENTS.md, GAUNTLET.md, /engage; #348/#349/#350, M5 mint-fence incident); **production-config gate** (sandbox-green ≠ ship-ready — GAUNTLET.md prod-boot + sandbox-blind-spot round, CAMPAIGN.md Victory Checklist; #350); **Spring Cleaning consumer-vs-clone** (FORGE_KEEPER.md destructive-risk branch so app projects don't lose tsconfig/lockfiles; #343 F10); **Surfer roster sizing** (silver-surfer-herald.md scope_bias/scope_density/~18-cap + basename normalization; #343/#344/#345/#346); **creative/UX grounding** (world-scan + de-AI + token-scoped theming — ux.md, PRODUCT_DESIGN_FRONTEND.md, galadriel; #347/#351); **deploy/DevOps foot-guns** (DEVOPS_ENGINEER.md +13: eval-env, Node-MDWE, CF-Flexible, served-vs-built, compose-topology, docker-cleanup; #344/#349/#352/#353); **doc-currency** (CAMPAIGN/ASSEMBLER pre-SEAL refresh + new /audit-docs & DOC_AUDIT.md; #342). 3 new patterns (design-tokens.ts, nginx-vhost.conf, error-message-categorization.tsx; 48 → 51) + new /audit-docs command + DOC_AUDIT.md + scripts/regen-claude-md.sh. CLAUDE.md Personality +2 (anti-picker #343, authorized-autonomy #344), gate-timing #348, roster normalization #345. Dep range `^23.11.4` → `^23.12.0` (ADR-062). #349 F-4 and #352 #3 were already shipped (verified); #345 DEAL-004 + #353 RC-001/002/callout out of scope (Claude Code core / Workflow tool). |
18
19
  | 23.11.4 | 2026-05-12 | Wong promotion cluster + #260 closeout. /debrief --inbox re-triage of all 9 open field-report issues produced 3 ready-now promotion clusters (3+ data points across different reports each). BUILD_PROTOCOL.md Principle #11 "Derived counts discipline" (from #336 F6, #334 F6, #332 hidden #5 — three projects independently drifted the same class). New pattern `docs/patterns/autonomous-ops-triage-policy.md` codifying the 4-bucket model + SessionStart hook visibility rule (from #337 F3, #336 F7, #334 F5 — two operators independently reinvented). CAMPAIGN.md Planning Mode Step 4 "Scope-adversary check for bug classes" (from #332, #338 #2 — voidforge-marketing-site missed `/patterns` because the bug class scope was narrowed). Also closes #260 remaining items: PRODUCT_DESIGN_FRONTEND.md Operating Rule #12 "Tutorial-context checklist for slash commands" + QA_ENGINEER.md Operating Rule #13 "Tutorial smoke test for slash commands." Dep range `^23.11.3` → `^23.11.4` per ADR-062 discipline. Pattern count 47 → 48. 11 field reports remain open for v23.12 methodology pass. |
19
20
  | 23.11.3 | 2026-05-12 | Three-phase pipeline (/architect → /debrief --inbox → /campaign) shipped 12 fixes + 2 ADRs + 1 LEARNINGS entry + 2 mechanical guards. **Issue #331** destructive-bug fix: `findProjectRoot()` now enforces `$HOME` boundary + `statSync().isFile()` guard, no more silent overwrite of `~/CLAUDE.md` on `npx voidforge-build update`. **HIGH CVE** fast-xml-parser/builder via `@aws-sdk/*` patched via `npm audit fix`. **Dep contract** pinned: `voidforge-build → voidforge-build-methodology` from `"*"` to `"^23.11.3"` (ADR-062), enforced mechanically by `check-methodology-pin.sh` prepublishOnly script. **engines.node** added to methodology package.json. **publish.yml hardening**: post-publish `npm view` verification step (both jobs, 6×10s retry), `recover-partial` job with `npm deprecate` on XOR-failure, `needs: publish-methodology` ordering on publish-voidforge. **copy-assets.sh** ADR-058 template strip applied (parity with methodology prepack). **Docs**: HOLOCRON Quick Start "launch Claude Code first" preamble + npm-prefix workaround (#260, #333p), FORGE_KEEPER Rule #11 "never write to $HOME" (ADR-063), RELEASE_MANAGER ROADMAP-sync checklist line, ROADMAP.md pointer v23.8.11 → v23.11.3 (24-version drift closed). Marker integration test `no-home-writes.integration.test.ts` mechanically enforces ADR-063. 1390 tests pass. |
@@ -299,6 +299,8 @@ After running any build command (`build:workers`, `tsc --build`, webpack, etc.),
299
299
  8. Kenobi: Maul re-probes all remediated vulnerabilities, verifies fixes hold.
300
300
  9. If Pass 2 finds new issues, fix and re-verify until clean.
301
301
 
302
+ **Isolation-green is NOT deploy-green (field report #354 F3).** Nightwing's full-suite re-run in Pass 2 is the deploy gate — not the targeted/isolation runs used while fixing. A fix can pass every targeted test, every isolation run, and every re-probe of the area it touched, yet still regress UNRELATED tests through environment coupling the isolation runs cannot see (shared fixtures, global state, test ordering, env vars, a mutated singleton, a migration side effect). Isolation runs validate the fix in a vacuum; only the FULL suite observes the coupling. So "every targeted run is green" is never a deploy signal — the gate is the whole suite passing, and you do not advance to Phase 12 on isolation-green alone.
303
+
302
304
  **Phase 12 — Kusanagi Deploys.**
303
305
  1. Execute `/docs/methods/DEVOPS_ENGINEER.md` full sequence
304
306
  2. Complete first-deploy pre-flight checklist (see `/devops` command)
@@ -67,6 +67,24 @@ Examples that triggered this rule:
67
67
 
68
68
  Skip this step only when the ADR's scope is bounded by entity (one file, one table, one route group) — bounded ADRs don't need an audit because the count is visible in the scope itself.
69
69
 
70
+ ### Fan-out completeness (glob-derived lists + mandatory sweep)
71
+
72
+ This is distinct from the plan-time audit grep above. The audit grep *counts* sites so the architect can *estimate* effort. Fan-out completeness governs *execution*: when a mission fans a directory-wide or migration-wide change across parallel agents, it ensures every file in scope is actually touched and nothing is silently dropped at the seams between agents (field report #355 F2).
73
+
74
+ **Two rules, both mandatory:**
75
+
76
+ 1. **Derive per-agent file lists from a GLOB, never a hand-typed list.** When splitting a directory/migration fan-out across N agents, the source-of-truth file list MUST come from a glob expansion (`git ls-files 'src/widgets/**/*.ts'`, `find migrations -name '*.sql'`, `grep -rl '<legacy pattern>' <tree>`), then partitioned among agents — never a list typed from memory or eyeballed from a tree view. A hand-typed list is a fan-out's single biggest source of silent omission: the one file nobody remembered never gets an agent, compiles clean in isolation, and ships the legacy pattern. The glob is the manifest; agent assignments are slices of it.
77
+
78
+ 2. **Pair every fan-out with a post-fan-out completeness sweep BEFORE the wave is declared done.** After all fan-out agents return, re-run the legacy pattern across the *whole* target tree — not the per-agent slices, the entire tree — and assert zero residual hits (or a fully-justified allowlist):
79
+
80
+ ```bash
81
+ # The wave touched src/widgets/**; sweep the whole tree for the old pattern.
82
+ grep -rnE '<legacy pattern>' src/ | grep -v '<intentional-keep paths>'
83
+ # Must be empty (or every line individually justified) before the wave closes.
84
+ ```
85
+
86
+ The per-agent reviews each pass — every slice is internally consistent — yet the union can still miss a file that fell between two agents' globs, a file added after the manifest was captured, or a path one agent assumed another owned. The completeness sweep is the only check that sees the whole tree at once. A fan-out wave is NOT done until its sweep is green. (Field report #355 F2: a directory-wide migration declared complete on green per-agent reviews still shipped legacy-pattern residue because no sweep ran the old pattern across the full tree after the wave.)
87
+
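The two fan-out rules above can be sketched as three small pure functions. This is a minimal illustration, not part of the package: `partitionManifest`, `unassigned`, and `residualHits` are hypothetical names, and plain arrays and objects stand in for the output of `git ls-files` and for file contents read from disk.

```typescript
// Hypothetical helpers for illustration only; none of these names exist in the package.

/** Split a glob-derived manifest into `agentCount` round-robin slices. */
function partitionManifest(manifest: string[], agentCount: number): string[][] {
  if (!Number.isInteger(agentCount) || agentCount < 1) {
    throw new Error("agentCount must be a positive integer");
  }
  // Dedupe and sort so the same manifest always yields the same assignment.
  const files = Array.from(new Set(manifest)).sort();
  const slices: string[][] = [];
  for (let i = 0; i < agentCount; i++) slices.push([]);
  files.forEach((file, i) => slices[i % agentCount].push(file));
  return slices;
}

/** Manifest entries that no agent slice covers. Should always be empty. */
function unassigned(manifest: string[], slices: string[][]): string[] {
  const covered = new Set<string>();
  slices.forEach((slice) => slice.forEach((file) => covered.add(file)));
  return Array.from(new Set(manifest)).filter((f) => !covered.has(f)).sort();
}

/**
 * Completeness sweep over the WHOLE tree (path -> file text), not the slices.
 * Returns every non-allowlisted path that still contains the legacy pattern.
 */
function residualHits(
  tree: { [path: string]: string },
  legacy: RegExp,
  allowlist: string[] = []
): string[] {
  const keep = new Set(allowlist);
  return Object.keys(tree)
    // String.prototype.search ignores the regex's global flag and lastIndex.
    .filter((path) => !keep.has(path) && tree[path].search(legacy) >= 0)
    .sort();
}
```

In this sketch the wave would close only when `residualHits` returns an empty array for the full tree; a non-empty `unassigned` result is a partitioning bug caught before any agent launches.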
70
88
  ### Closeout grep pinning
71
89
 
72
90
  When a `/campaign` closeout report cites a followup count or backlog size (e.g., "F-V710-ORG1-DEFAULTS — ~12 sites remaining" or "~21 cumulative followups"), the followup definition MUST embed the literal grep pattern + observed `n=N` at closeout HEAD. The next campaign's `/architect --plan` re-runs the same grep before accepting the count.
@@ -279,6 +297,7 @@ User confirms, redirects, or overrides. On confirm → Step 4.
279
297
  3. Fury runs the full pipeline (or `--fast` if user prefers). **Note:** `--fast` skips Crossfire + Council but NEVER skips `/sentinel` if the mission adds new endpoints, WebSocket handlers, or credential-handling code.
280
298
  3a. **Per-mission Kenobi quick-scan:** If the mission creates or modifies auth, crypto, HMAC, credential handling, or webhook verification code, run a focused Kenobi security scan within the mission — do not defer to the Victory Gauntlet. The reduced pipeline's single review round is calibrated for business logic, not security-sensitive code. Quick-scan scope: credential leakage, timing attacks, input validation, error message exposure. (Field report #265: webhook HMAC bypass, credential leakage in errors, and auth header override all shipped through the reduced pipeline and were only caught by the Victory Gauntlet.)
281
299
  4. Only checkpoint if `/context` shows actual usage above 85%. Do not preemptively suggest checkpoints.
300
+ 4a. **A per-wave staging deploy is a STATUS checkpoint — report and continue, never a decision-frame pause** (field report #355 F4). When a mission or fan-out wave pushes to staging mid-campaign, treat the deploy result exactly like a mission-complete status line: announce it ("M-7.4 deployed to staging, health check green — starting M-7.5") and proceed to the next wave. Do NOT frame the staging deploy as a gate, milestone, or "continue or pause?" question — the staging push is part of the autonomous flow, not a hand-back point. The only valid pauses remain the ones in the Pause-Bias Anti-Pattern list below (context >85%, BLOCKED item, un-auto-fixable Critical, user interrupt). This is the action-prose statement of that rule; the callout below is its rationale.
282
301
  5. On completion → Step 5
283
302
 
284
303
  **Post-infrastructure enforcement gate:** For infrastructure campaigns (deploy targets, CI/CD, monitoring, staging environments): after the infrastructure is provisioned, run `/architect --plan` to verify workflow enforcement gates exist — not just infrastructure existence. Infrastructure without process gates is incomplete.
@@ -443,6 +443,42 @@ done < .env
443
443
 
444
444
  Audit existing deploy scripts: grep for `eval "export`, `eval export`, and `export $(cat`. Any hit is a latent secret-corruption bug — replace it with the literal parser or one of the runtime-native loaders above.
445
445
 
446
+ ## Deploy-Environment Assumptions
447
+
448
+ A deploy that succeeds in dev can fail in prod because the *environment* differs in ways no syntax check sees. Three classes recur; two already have their own sections in this doc — this section adds the third and cross-references the others so they're triaged together:
449
+
450
+ 1. **Served-artifact verification** — the bundle nginx/the CDN actually serves can diverge from the one you just built. See §The served artifact is not the built artifact and §Post-push live-URL fingerprint.
451
+ 2. **`.env`-file precedence / loading** — values get mangled or silently defaulted depending on how the file is loaded. See §Env-File Loading Safety and §Config Foot-Guns (deploy/runtime).
452
+ 3. **Boot-time schema re-application under DB-role ownership mismatch** (field report #354 F4) — the new one, below.
453
+
454
+ ### Boot-time DDL ownership/grant alignment (field report #354 F4)
455
+
456
+ Idempotent boot-time DDL is NOT automatically safe across environments. When an app runs schema re-application at startup — `CREATE TABLE IF NOT EXISTS`, `CREATE INDEX IF NOT EXISTS`, or a migration runner invoked on boot — the `IF NOT EXISTS` guard only protects against *existence* collisions. It does NOT protect against *ownership* collisions. If the tables were originally created by a **different DB role** than the role the app connects as (the classic split: a privileged `admin`/`migrator` role created the schema, but the app connects as a least-privilege `app` role), the startup DDL fails:
457
+
458
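+ - `CREATE TABLE IF NOT EXISTS` on an existing table is skipped with a notice; PostgreSQL does NOT reconcile the existing table's columns, constraints, or indexes against the new definition. Boot code that relies on it to bring an older table up to date therefore has to follow it with `ALTER TABLE` / `CREATE INDEX`, and those statements are ownership-checked: a connecting role that does not own the table gets `must be owner of table <name>`.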
459
+ - `ALTER TABLE` / `CREATE INDEX` in the same boot sequence are ownership-checked before any `IF NOT EXISTS` short-circuit applies, so they fail outright with `permission denied` or `must be owner of relation` even when the change would be a no-op.
460
+ - The app then either crashes at boot or (worse) logs the DDL error and serves with a half-migrated schema.
461
+
462
+ This passes in dev because dev usually runs everything as one superuser-ish role, so ownership is never split. Prod splits roles for least privilege — and that's exactly where the ownership mismatch surfaces.
463
+
464
+ **The check (run before declaring a boot-time-migration deploy healthy):** confirm the role the app connects as either *owns* the schema objects or has been granted the privileges the boot DDL needs. For PostgreSQL:
465
+ ```sql
466
+ -- Who owns the tables the app's boot DDL will touch?
467
+ SELECT tablename, tableowner FROM pg_tables WHERE schemaname = 'public';
468
+ -- The connecting app role:
469
+ SELECT current_user;
470
+ ```
471
+ If owner ≠ app role, align ownership or grants before the boot runs:
472
+ ```sql
473
+ -- Option A: hand ownership to the app role (simplest when the app owns its own migrations)
474
+ ALTER TABLE public.<table> OWNER TO app_role;
475
+ -- Option B: keep a separate migrator owner, but grant the app role what its boot DDL needs,
476
+ -- and make future objects inherit grants:
477
+ GRANT ALL ON ALL TABLES IN SCHEMA public TO app_role;
478
+ ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO app_role;
479
+ ```
480
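+ Prefer **Option A** when the app owns its migrations, **Option B** when policy requires a distinct migrator/owner role. Note that the table-level grants in Option B cover DML (plus `TRUNCATE`, `REFERENCES`, and `TRIGGER`) only: in PostgreSQL, `ALTER TABLE` and `CREATE INDEX` still require the table owner, a member of the owning role, or a superuser, so under Option B any ownership-checked statement must move out of the boot sequence and run as the migrator role. Either way: idempotent DDL still needs ownership/grant alignment — the `IF NOT EXISTS` keyword is not an ownership escape hatch. Best practice is to run migrations as the owning role on deploy and connect the app as a least-privilege role that does NOT re-run DDL at boot at all — but if boot-time re-application stays, this alignment check is mandatory.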
481
+
446
482
  ## systemd Unit Hardening (Node.js)
447
483
 
448
484
  Sandboxing directives in a systemd unit are good practice, but **Node.js units must NOT set `MemoryDenyWriteExecute=true`** (field report #344 F3). V8's JIT compiler maps pages that are simultaneously writable and executable (W^X is violated by design for JIT); `MemoryDenyWriteExecute=true` (MDWE) forbids exactly that, so the Node process dies with **`SIGTRAP` at boot** before it serves a single request. The crash looks unrelated to the unit file — operators chase the app for hours.
@@ -123,6 +123,8 @@ Troi also performs a **Marketing Copy Drift Check**: compare marketing page clai
123
123
 
124
124
  **Pattern auth completeness check (Kenobi, during Rounds 2-3):** When a pattern file defines an authentication flow, verify the auth checks perform actual value verification (compare against expected, call verify functions) — not just presence checks (`!!header`, `Boolean()`). Flag `!!` or truthiness checks on auth-related headers as suspicious. (Field report #109: daemon socket auth used `!!vaultHeader` which passed for any non-empty string.)
125
125
 
126
+ **Contrast-finding admissibility (Reality stone / a11y, Galadriel's team) (field report #355 F1):** A contrast finding is **inadmissible** — and therefore CANNOT be rated **Critical** or **High** — unless it cites the **literal source hex for BOTH the foreground and the background**, each with its own `file:line`, AND the agent re-greps that the offending **class pairing actually exists** at the cited location before rating. Citing a token *name* (`--text-muted on --surface-2`) is not a hex; the agent must resolve the token to its computed `#rrggbb` value at the cited `file:line` and quote both colors. An uncited contrast finding (no source hex, or only one of the two colors, or a class pairing that no longer exists at the cited line) is logged as inadmissible and dropped before the fix batch. This defends against the **token-name-swap false-Critical** — a finding that asserts a contrast failure from token names alone, where the swapped/renamed token actually resolves to a compliant hex and the failing class pairing was never present in the rendered output.
127
+
126
128
  **Total: 30+ unique agent deployments across 5 rounds.**
127
129
 
128
130
  ## Escalation Pattern
@@ -274,6 +276,9 @@ After the gauntlet completes (all mandated rounds), the caller MUST invoke `/git
274
276
  - `--security-only` — Run 4 rounds of security only: inventory, full audit, re-probe, adversarial. Kenobi's marathon. For when you specifically need a deep security review.
275
277
  - `--ux-only` — Run 4 rounds of UX only: surface map, full audit, re-verify, enchantment. Galadriel's marathon.
276
278
  - `--qa-only` — Run 4 rounds of QA only: discovery, full pass, re-probe, adversarial. Batman's marathon.
279
+
280
+ **Single-domain `--focus` roster shape — surface-partition, don't stack duplicates (field report #355 F3):** When `--focus` names a **single domain** (or a `--*-only` marathon runs one lens for 4 rounds), build the roster from **surface-partitioned agents** — each agent owns a **distinct set of files/sections/routes**, not the whole codebase — and **cap the roster at ~6-8 agents**. Do NOT stack near-duplicate same-lens personas that all review **everything**: ten security agents each scanning the entire surface produce overlapping findings, inflated dedupe cost, and false consensus (the same false positive reported ten times reads as ten confirmations). Partition instead: assign Agent A the auth/session surface, Agent B the payment/billing routes, Agent C the file-upload + media paths, Agent D the multi-tenant query layer, etc. — distinct ownership per agent, full coverage across the roster, with one or two cross-cutting agents reserved for seams between partitions. Six-to-eight partitioned agents beat a dozen redundant ones on both coverage and signal-to-noise.
281
+
277
282
  - `--resume` — Resume from the last completed round (reads from gauntlet-state.md).
278
283
  - `--ux-extra` — Extra Éowyn enchantment emphasis across all rounds. Galadriel's team proposes micro-animations, copy improvements, and delight moments beyond standard usability/a11y. Produced 7 shipped enchantments in the v7.1.0 Gauntlet.
279
284
  - `--assess` — **Pre-build assessment mode.** Run Rounds 1-2 only (Discovery + First Strike) and produce an assessment report — no fix batches, no Crossfire, no Council. Designed for evaluating existing codebases before a rebuild or migration. When an existing codebase has fundamental issues (stubs, abandoned migrations, missing auth), Rounds 3-10 become redundant because there are no fixes to verify between rounds. The assessment report groups findings by root cause rather than by domain, producing a "State of the Codebase" view. (Field report #125: Infinity Gauntlet on a half-built system produced 120+ findings all tracing to the same root cause — stubs returning True.)
@@ -220,6 +220,8 @@ A surface that trips three or more of these tells is presumed AI-slop and goes b
220
220
  **Elrond:** IA, navigation, task flows, friction.
221
221
  **Arwen:** Spacing, typography, icons, button hierarchy, visual hierarchy.
222
222
  **Samwise:** Keyboard nav, focus rings, ARIA, contrast, reduced motion. **WCAG contrast verification:** For the project's primary text/background combinations, verify WCAG AA contrast ratio (4.5:1 for normal text, 3:1 for large text). Check: primary text on primary bg, muted text on primary bg, accent text on primary bg. Opacity modifiers (e.g., `text-emerald-200/50`) blend the text color toward the background and lower the effective contrast, and not by a fixed factor — always compute the final rendered color, not the base color. A systematic check during the initial color system design prevents dozens of instances across the codebase. (Field report #38: 46 failing-contrast instances across 13 files, systemic from day 1.)
223
+
224
+ **Contrast findings must be cited and re-grepped (#355 F1).** Computing the final rendered color is necessary but not sufficient. A contrast finding is **inadmissible if uncited**: it MUST cite the *literal source hex* for BOTH foreground and background, each with the `file:line` where it is defined (`tailwind.config.ts`, `globals.css`, or the relevant theme file). Before rating any contrast issue Critical or High, **RE-GREP the actual class usage** in the codebase to confirm the foreground/background pairing actually co-occurs on a real element — a pairing that never renders together is not a finding. **Token NAMES are not proxies for VALUES.** Never infer contrast from semantic token names: a token called `paper` may resolve to near-black and one called `ink` to near-white. Read the value, not the name. (In #355 F1 a token-name swap — assuming `paper`/`ink` meant light/dark by their names — produced a false site-wide Critical that did not exist once the actual hex values were read.)
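Once both literal hex values are in hand, the ratio itself is mechanical. The sketch below follows the WCAG 2.x relative-luminance and contrast-ratio definitions and composites an opacity modifier over the background first. The function names are hypothetical, and it assumes opaque `#rrggbb` inputs with a single alpha applied to the foreground.

```typescript
type Rgb = { r: number; g: number; b: number };

/** Parse a literal "#rrggbb" value (token names are rejected on purpose). */
function parseHex(hex: string): Rgb {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex.trim());
  if (!m) throw new Error(`expected #rrggbb, got "${hex}"`);
  const n = parseInt(m[1], 16);
  return { r: (n >> 16) & 0xff, g: (n >> 8) & 0xff, b: n & 0xff };
}

/** Alpha-composite the foreground over an opaque background. */
function composite(fg: Rgb, bg: Rgb, alpha: number): Rgb {
  const mix = (f: number, b: number) => Math.round(f * alpha + b * (1 - alpha));
  return { r: mix(fg.r, bg.r), g: mix(fg.g, bg.g), b: mix(fg.b, bg.b) };
}

/** WCAG 2.x relative luminance of an sRGB color. */
function relativeLuminance(c: Rgb): number {
  const lin = (channel: number) => {
    const s = channel / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * lin(c.r) + 0.7152 * lin(c.g) + 0.0722 * lin(c.b);
}

/** Contrast ratio of the rendered foreground against the background. */
function contrastRatio(fgHex: string, bgHex: string, alpha: number = 1): number {
  const bg = parseHex(bgHex);
  const fg = composite(parseHex(fgHex), bg, alpha);
  const l1 = relativeLuminance(fg);
  const l2 = relativeLuminance(bg);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}
```

Because `parseHex` throws on anything that is not a literal hex, a call such as `contrastRatio("ink", "paper")` fails loudly instead of guessing from the token name, which mirrors the admissibility rule above.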
223
225
  ### Async Polling State Machine
224
226
  Any UI that polls for backend status changes must implement 4 states: **idle -> syncing -> success -> failure**. Never show "success" before the async confirmation resolves. Never show the old value alongside a "updated" banner. The polling result replaces the displayed value atomically — both change together or neither does. (Field report #149)
225
227
 
@@ -278,6 +280,8 @@ Click through every primary journey. Document friction, broken UI, missing state
278
280
  | ID | Title | Severity | Category | Location | Repro | Current | Expected | Recommendation | Files | Verified | Regression | Risk |
279
281
  |----|-------|----------|----------|----------|-------|---------|----------|----------------|-------|----------|-----------|------|
280
282
 
283
+ **Severity must be enforcement-aware (#354 F2).** When a finding is a *client-side affordance or visibility leak* — a disabled-looking action that is still clickable, a hidden field present in the DOM, an admin control rendered to a non-admin — check whether the **server still enforces** the rule before rating it. If the backend rejects the action regardless of the client state, this is a **UX issue (P2/P3)**, not a security breach: the user-facing affordance is confusing or misleading, but no privilege is actually escalated. Only when the server fails to enforce does it cross into Kenobi's territory and escalate. Rate the UX defect, then hand the *enforcement* question to Security rather than inflating the UX severity. Cross-reference the SECURITY_AUDITOR enforcement-layer severity rubric (`SECURITY_AUDITOR.md`, Operating Rule 2 — "Severity = exploitability × impact"): a leak the server enforces has near-zero exploitability and so cannot be Critical on the security axis.
284
+
281
285
  ## Step 5 — Enhancement Specs (Before Coding)
282
286
 
283
287
  Problem statement, proposed solution, acceptance criteria, UI details, a11y requirements (Samwise signs off), copy (Bilbo signs off), edge cases, out of scope.
@@ -270,6 +270,8 @@ A test failure observed during a multi-file suite run is **NOT attributed to you
270
270
 
271
271
  Shared-DB and shared-fixture suites routinely produce cross-file collisions — duplicate-seed conflicts, ordering dependencies, leaked global state, autoincrement-id assumptions — that masquerade as regressions introduced by the change under review. Attributing one of these to your fix sends the QA pass down a false trail and can trigger a "revert the good fix" overcorrection. Run the isolation check and the clean-HEAD check before you write the bug down or blame the diff. (Field report #349 F-3)
272
272
 
273
+ **Isolation-green is not deploy-green.** The two checks above clear a change of *blame* for a failure seen in the full run — they do NOT clear the change for deploy. Attribution runs the failing file in ISOLATION; a deploy gate runs the FULL suite. The asymmetry is the point: the very cross-file coupling that lets a collision masquerade as your regression also lets a *real* regression introduced by your change hide inside an *unrelated* test that only fails when the whole suite runs together (shared fixture your change now mutates, global state your change leaks, ordering your change perturbs). A targeted/isolation run of just the tests you touched can be all-green while the full suite is red on a file you never opened. Therefore: before declaring a change deploy-ready, run the FULL suite to green — never sign off on the strength of a targeted or isolation-only run. Isolation green proves "not my blame for *this* failure"; only full-suite green proves "safe to ship." (Field report #354 F3)
274
+
273
275
  ### Planted-Bug Check — Gates Must Gate
274
276
 
275
277
  For every gate, threshold, or invariant a mission introduces (auth allowlist, eval scorer, rate cap, boot guard, validation boundary, feature flag), the review MUST confirm the gate actually gates: a deliberate inversion or revert of the gate's logic WOULD fail at least one test. Procedure — for each gate:
@@ -263,6 +263,17 @@ For any system that sends URLs to users (transactional emails, SMS, push notific
263
263
 
264
264
  This is the outbound mirror of SSRF prevention: SSRF stops external URLs from reaching internal services, outbound URL safety stops internal URLs from reaching external users. (Field report #44: verification email sent with `localhost:5005` URL — worked on same machine, broke from any other device.)
265
265
 
266
+ ### Enforcement-Layer Severity Rubric (field report #354 F2)
267
+
268
+ Key a finding's severity to the **enforcement layer**, not the **symptom location**. The question that sets severity is not "where did I see the leak?" but **"where is this actually enforced?"** Before you assign P0/P1, trace the request to the layer that *decides* — the server-side authorization check, the database query scope, the policy engine — and confirm the gap exists *there*.
269
+
270
+ - **Client-side affordance leak with intact server enforcement = UX-only (P2/P3), not a breach.** A hidden admin button that renders in the DOM, a disabled-but-present form field, an action the SPA shows but the API rejects with 403/404 — these are **render-then-403** patterns. The client showed something it shouldn't, but the actually-enforcing layer (the server) still says no. That is an information-disclosure or UX-polish finding, not a Critical. Rating a server-enforced client affordance leak as Critical is a false-positive that wastes a remediation slot and erodes trust in the report.
271
+ - **A gap at the actually-enforcing layer = P0/P1.** If the server itself does not check ownership, the role gate is missing on the route, or the query has no `org_id` scope, the breach is real regardless of what the client renders. The symptom may surface in the UI, but the severity comes from the server hole.
272
+
273
+ **Verification before scoring (always do this for any "exposed in the UI" finding):** reproduce the action against the API directly — `curl`/Postman with the victim's resource ID and the attacker's credentials, no browser. If the server returns 403/404/401 and writes nothing, the enforcing layer holds → downgrade to P2/P3 and note "server-enforced; client affordance leak only." If the server returns 200 + data or commits a write, the enforcing layer is breached → P0/P1. Never infer the server's behavior from the client's rendering.
274
+
275
+ This is an explicit lens in **both** the audit (Phase 1/2: for every "this is visible/clickable" observation, ask "where is this actually enforced?" and probe that layer) and the re-verify pass (Phase 4: Maul must confirm a downgraded affordance-leak finding by hitting the API directly, not by re-checking the DOM). (#354 F2)
276
+
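The verification step above reduces to a small decision function once the direct API probe has been run. This is a sketch under stated assumptions: `ApiProbe` and `scoreAffordanceFinding` are hypothetical names, and the `"unscored"` outcome for responses the rubric does not cover (for example a 500 with no write) is an addition for illustration, not part of the rubric.

```typescript
type Severity = "P0/P1" | "P2/P3" | "unscored";

/** Result of replaying the action against the API directly, with no browser. */
interface ApiProbe {
  status: number;        // HTTP status returned to the attacker's credentials
  returnedData: boolean; // response body contained the victim's data
  wroteData: boolean;    // a write was committed server-side
}

function scoreAffordanceFinding(probe: ApiProbe): { severity: Severity; note: string } {
  const success = probe.status >= 200 && probe.status < 300;
  const rejected = probe.status === 401 || probe.status === 403 || probe.status === 404;

  // A gap at the enforcing layer: data came back or a write landed.
  if ((success && probe.returnedData) || probe.wroteData) {
    return { severity: "P0/P1", note: "enforcing layer breached" };
  }
  // The server said no and nothing changed: the leak is client-side only.
  if (rejected && !probe.returnedData) {
    return { severity: "P2/P3", note: "server-enforced; client affordance leak only" };
  }
  // Anything else is outside the rubric; do not guess a severity.
  return { severity: "unscored", note: "probe inconclusive; re-test the enforcing layer" };
}
```

The input is deliberately the API response, never the DOM state, so a finding cannot be scored from what the client rendered.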
266
277
  ### Credentials Never in API Responses
267
278
 
268
279
  API responses must NEVER include credentials, tokens, or secrets — even in "admin-only" or "internal" endpoints. Grep for responses that include: `password`, `secret`, `token`, `api_key`, `private_key`, `credentials`. Common violations: user profile endpoints returning the password hash, API key management endpoints including the full key in GET responses (show only last 4 characters), internal debug endpoints returning environment variables. (Field report #66: API settings endpoint returned full MCP connection credentials in the response body.)
@@ -366,7 +377,7 @@ When fixing an auth, authorization, or validation check: trace ALL callers of th
366
377
 
367
378
  After remediations are applied:
368
379
 
369
- **Maul — Red Team Verification:** Re-probe all remediated vulnerabilities. Verify fixes hold under adversarial conditions. Check that fixes didn't introduce new attack vectors. Attempt to bypass the remediations.
380
+ **Maul — Red Team Verification:** Re-probe all remediated vulnerabilities. Verify fixes hold under adversarial conditions. Check that fixes didn't introduce new attack vectors. Attempt to bypass the remediations. **Apply the enforcement-layer lens (#354 F2):** for any finding rated Critical/High off a UI-visible symptom, confirm severity by hitting the API directly — a finding that only reproduces in the DOM but returns 403/404 server-side is a server-enforced affordance leak (P2/P3), not the breach it was filed as. Re-score before sign-off.
370
381
 
371
382
  **Padmé — Functional Verification:** After Maul confirms security holds, Padmé verifies the primary user flow still works end-to-end. Open the app, complete the main task, verify output. This catches "secure but broken" regressions that pure security re-testing misses.
372
383
 
@@ -351,6 +351,20 @@ Both would have been caught by an adversarial pass that asked "what new failure
351
351
 
352
352
  **Important distinction:** The Agent tool enables **parallel analysis**, not parallel coding. Sub-agents return text findings — the lead agent then implements code changes sequentially. This is still faster than sequential analysis, but don't expect parallel file edits.
353
353
 
354
+ ### The Default Review Shape: Find → Cluster/Dedupe → 3-Lens Verify → Fix Only Survivors
355
+
356
+ Every review command — `/engage`, `/sentinel`, `/gauntlet` — runs the same four-stage shape, not a flat "list findings then fix everything" pass. v23.12.0 added the refute-pass mechanics to `/gauntlet`; this is the generalized naming so the same discipline is the DEFAULT everywhere, not Gauntlet-only (field report #354 F1).
357
+
358
+ 1. **Find** — fan out the roster; each lens produces raw findings against the same diff (see Intentionally Overlapping Mandates).
359
+ 2. **Cluster/Dedupe** — collapse the raw findings into distinct claims. The same root cause flagged by Stark + Kenobi + Ahsoka is ONE claim with three votes, not three findings. LLM-assigned finding ids are display labels, not keys — dedupe on the claim, not the id.
360
+ 3. **3-Lens Verify** — every surviving claim is interrogated through three lenses before it earns a fix:
361
+ - **Correctness** — is the asserted behavior actually wrong? (the bug is real, the logic is genuinely broken)
362
+ - **Reachability** — can production actually hit this path? (not provably-dead-code, not behind a `DEV_ONLY` gate — see the WARN/cosmetic contract above)
363
+ - **Refutation** — assign a **skeptic agent whose explicit job is to REFUTE the finding.** The skeptic is told to argue the finding is wrong/unreachable/already-handled and to cast a CONFIRM vote only if that attempt fails. A claim that survives a reviewer instructed to kill it is a real claim. This is the defining element of the shape: not "does another lens agree?" but "does a lens TRYING to disprove it fail to?" A finding nobody was assigned to refute is unverified, regardless of how many agents independently raised it.
364
+ 4. **Fix Only Survivors** — only claims that pass all three lenses (correct AND reachable AND survive a confirm-vote refutation) enter the fix batch. Refuted claims are logged with the refutation rationale and dropped — never silently, so a future review doesn't re-raise them.
365
+
366
+ The refutation lens is what separates this from the Intentionally Overlapping Mandates convergence rule: convergence asks independent agents to agree; refutation assigns one agent to disagree on purpose. Run both — convergence raises confidence on what's flagged, refutation removes false positives from the fix batch. (Field report #354 F1.)
367
+
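The four stages above can be sketched as two functions: one that clusters raw findings into claims with votes, and one that keeps only claims passing all three lenses. The types and names here are hypothetical, and `claimKey` uses a naive text normalization as a stand-in for whatever claim matching a real review would use.

```typescript
interface RawFinding { id: string; agent: string; claim: string }
interface Claim { key: string; text: string; votes: string[] }
interface Verdict {
  correct: boolean;         // Correctness lens
  reachable: boolean;       // Reachability lens
  skepticConfirms: boolean; // Refutation lens: the skeptic failed to refute
  rationale: string;
}

/** Naive normalization (assumption): case- and whitespace-insensitive claim text. */
function claimKey(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

/** Cluster/Dedupe: key on the claim, never on the LLM-assigned finding id. */
function clusterFindings(raw: RawFinding[]): Claim[] {
  const byKey = new Map<string, Claim>();
  for (const f of raw) {
    const key = claimKey(f.claim);
    let claim = byKey.get(key);
    if (!claim) {
      claim = { key, text: f.claim.trim(), votes: [] };
      byKey.set(key, claim);
    }
    if (claim.votes.indexOf(f.agent) === -1) claim.votes.push(f.agent);
  }
  return Array.from(byKey.values());
}

/** 3-Lens Verify + Fix Only Survivors: refuted claims are logged, not lost. */
function selectSurvivors(claims: Claim[], verify: (c: Claim) => Verdict) {
  const fixBatch: Claim[] = [];
  const dropped: { claim: Claim; rationale: string }[] = [];
  for (const claim of claims) {
    const v = verify(claim);
    if (v.correct && v.reachable && v.skepticConfirms) fixBatch.push(claim);
    else dropped.push({ claim, rationale: v.rationale });
  }
  return { fixBatch, dropped };
}
```

Vote count never feeds `selectSurvivors` in this sketch: a claim raised by three agents is still dropped if the skeptic refutes it.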
354
368
  ### Multi-Session Parallelism (Separate Terminals)
355
369
 
356
370
  For larger projects where agents need to make code changes simultaneously, use separate Claude Code sessions in different terminal windows. Each session works on separate files within defined scope boundaries.
@@ -456,6 +470,15 @@ Field report #324 (Union Station v7.8 R2): three agents (Discovery + Stark + Ken
456
470
  - **Wait for ALL parallel agents before synthesizing** (field report #300).
457
471
  - Partition strategies: by domain (frontend/backend), by concern (security/UX), or read-only vs. write.
458
472
 
473
+ ### Directory / Migration Fan-Out: Glob the List, Sweep the Remainder
474
+
475
+ When a wave fans out per-file or per-entity work across a directory or migration (one agent per file/module/route/migration), two rules are MANDATORY (field report #355 F2):
476
+
477
+ 1. **Derive the per-agent file list from a GLOB, never a hand-typed list.** Run `ls`/`Glob` (e.g., `Glob "src/routes/**/*.ts"`, `git ls-files 'migrations/*.sql'`) and partition the GLOB output into agent assignments. A hand-typed list silently drops the files the orchestrator forgot existed — and those are exactly the ones with the unmigrated legacy pattern, because they weren't top-of-mind. The glob is the source of truth for "what's in scope," not the orchestrator's memory of the tree.
478
+ 2. **Pair every fan-out with a post-fan-out completeness sweep before the wave is "done."** After all fan-out agents return, run ONE grep for the legacy pattern across the WHOLE target tree (not just the assigned files) — e.g., `grep -rn "oldApiCall(" src/` or `grep -rln "TODO: migrate" .`. A wave is not complete while that grep returns hits. The sweep catches: files the glob/partition missed, files created mid-wave by a parallel agent, and occurrences an agent declared done but left behind. Zero hits is the completion gate, not "all dispatched agents reported done."
479
+
480
+ The failure mode this prevents: a fan-out reports "9/9 agents complete" while 3 files still carry the legacy pattern — because they were never in the hand-typed list, and nobody grepped the whole tree to confirm. "All my agents finished" is not "the migration is complete." The completeness sweep is the difference. (Field report #355 F2.)
481
+
459
482
  ### Context Passing Between Phases
460
483
 
461
484
  - Pass **findings summaries** between phases, not raw file contents
@@ -245,6 +245,71 @@ const dropLegacyAvatarUrl: MigrationStep = {
245
245
  },
246
246
  };
247
247
 
248
+ // ── Boot-Time Schema Re-Application & Table Ownership (#354 F4) ──
249
+
250
+ /**
251
+ * GUARD: idempotent boot-time schema re-application must account for table
252
+ * OWNERSHIP and role grants — not just IF NOT EXISTS / IF EXISTS guards (#354 F4).
253
+ *
254
+ * The trap: a service that re-applies its schema at startup (boot-time DDL,
255
+ * "ensure schema" on connect) often connects as an app role distinct from the
256
+ * role that originally created the tables. In PostgreSQL, ALTER TABLE / DROP /
257
+ * CREATE INDEX / ADD COLUMN require the table OWNER (or a superuser) — not just
258
+ * INSERT/UPDATE/SELECT privileges. So even a fully idempotent
259
+ * `CREATE TABLE IF NOT EXISTS` / `ALTER TABLE ... ADD COLUMN IF NOT EXISTS`
260
+ * will FAIL at boot with "must be owner of table X" when the table is owned by
261
+ * a different DB role (e.g. a migration/admin role) than the connecting app role.
262
+ * The IF [NOT] EXISTS guard does not save you here — ownership is checked before
263
+ * the existence short-circuit on ALTER, and CREATE INDEX has no existence
264
+ * short-circuit at all for the ownership check.
265
+ *
266
+ * Why it bites at boot specifically: idempotent re-application is meant to be a
267
+ * safe no-op on an already-migrated DB. But the ownership check fires regardless
268
+ * of whether the change is a no-op, so a healthy, already-correct schema can
269
+ * still crash the service on startup.
270
+ *
271
+ * Mitigations (pick per your trust model):
272
+ * - Run boot-time/idempotent DDL as the table OWNER role, not the app role.
273
+ * Keep schema changes on a privileged migration role; let the app role do DML only.
274
+ * - OR align ownership: `ALTER TABLE <t> OWNER TO <app_role>` once (by a superuser),
275
+ * or create tables under the app role from the start.
276
+ * - OR use a shared owning role and `GRANT <owner_role> TO <app_role>` so the
277
+ * app role can act as owner for DDL.
278
+ * - Prefer a one-shot migration step (runMigrations) over boot-time re-application
279
+ * for anything beyond table/index existence — it isolates the privileged role.
280
+ *
281
+ * Pre-flight check before re-applying schema at boot — fail fast with a clear
282
+ * message instead of a raw "must be owner" deep in startup.
283
+ */
284
+ async function assertTableOwnership(
285
+ ctx: MigrationContext,
286
+ table: string,
287
+ expectedRole: string
288
+ ): Promise<void> {
289
+ // PostgreSQL: tableowner comes from pg_tables; current_user is the connecting role.
290
+ const result = await ctx.execute(
291
+ `SELECT tableowner FROM pg_tables WHERE tablename = $1`,
292
+ [table]
293
+ );
294
+
295
+ // No row → table absent; CREATE TABLE IF NOT EXISTS will create it under the
296
+ // connecting role, so ownership is not a concern for this table yet.
297
+ if (result.rowCount === 0) {
298
+ ctx.log('migration.ownership_check', { table, present: false });
299
+ return;
300
+ }
301
+
302
+ // In a real adapter, read the owner value from the row; shown here as the contract.
303
+ // If the owner is not the expected (connecting/owner) role, boot-time ALTER/CREATE
304
+ // INDEX on this table will fail with "must be owner of table" (#354 F4).
305
+ ctx.log('migration.ownership_check', {
306
+ table,
307
+ present: true,
308
+ expectedRole,
309
+ note: 'boot-time DDL requires table owner or a superuser; app-role DML privileges are not enough',
310
+ });
311
+ }
312
+
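`assertTableOwnership` above logs the contract but leaves the owner comparison to the adapter. A minimal sketch of that comparison follows, assuming the query rows expose a `tableowner` string; `OwnerRow` and `describeOwnershipMismatch` are hypothetical names, and role membership (a role that is a member of the owning role) is not considered here.

```typescript
// Hypothetical row shape for `SELECT tableowner FROM pg_tables ...`;
// a real adapter may expose rows differently.
interface OwnerRow {
  tableowner: string;
}

/**
 * Returns null when boot-time DDL is safe from an ownership standpoint
 * (table absent, or every matching row is owned by the connecting role),
 * otherwise a fail-fast message naming the mismatch.
 */
function describeOwnershipMismatch(
  table: string,
  rows: OwnerRow[],
  connectingRole: string
): string | null {
  // No row: the table does not exist yet, so it will be created under the connecting role.
  if (rows.length === 0) return null;

  const otherOwners = rows
    .map((row) => row.tableowner)
    .filter((owner) => owner !== connectingRole);
  if (otherOwners.length === 0) return null;

  return (
    `table "${table}" is owned by ${otherOwners.join(", ")}, not ${connectingRole}; ` +
    `boot-time ALTER TABLE / CREATE INDEX will fail with "must be owner of table"`
  );
}
```

A caller could throw on a non-null result before any DDL runs, which turns a raw "must be owner" deep in startup into a clear pre-flight error.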
248
313
  // ── Batched Processing for Large Tables ─────────────────
249
314
 
250
315
  /**
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "voidforge-build",
3
- "version": "23.12.0",
3
+ "version": "23.12.1",
4
4
  "description": "From nothing, everything. A methodology framework for building with Claude Code.",
5
5
  "type": "module",
6
6
  "engines": {
@@ -45,7 +45,7 @@
45
45
  "@aws-sdk/client-rds": "^3.700.0",
46
46
  "@aws-sdk/client-s3": "^3.700.0",
47
47
  "@aws-sdk/client-sts": "^3.700.0",
48
- "voidforge-build-methodology": "^23.12.0",
48
+ "voidforge-build-methodology": "^23.12.1",
49
49
  "node-pty": "^1.2.0-beta.12",
50
50
  "ws": "^8.19.0"
51
51
  },