thevoidforge 21.0.10 → 21.0.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (108)
  1. package/dist/.claude/commands/ai.md +69 -0
  2. package/dist/.claude/commands/architect.md +121 -0
  3. package/dist/.claude/commands/assemble.md +201 -0
  4. package/dist/.claude/commands/assess.md +75 -0
  5. package/dist/.claude/commands/blueprint.md +135 -0
  6. package/dist/.claude/commands/build.md +116 -0
  7. package/dist/.claude/commands/campaign.md +201 -0
  8. package/dist/.claude/commands/cultivation.md +166 -0
  9. package/dist/.claude/commands/current.md +128 -0
  10. package/dist/.claude/commands/dangerroom.md +74 -0
  11. package/dist/.claude/commands/debrief.md +178 -0
  12. package/dist/.claude/commands/deploy.md +99 -0
  13. package/dist/.claude/commands/devops.md +143 -0
  14. package/dist/.claude/commands/gauntlet.md +140 -0
  15. package/dist/.claude/commands/git.md +104 -0
  16. package/dist/.claude/commands/grow.md +146 -0
  17. package/dist/.claude/commands/imagine.md +126 -0
  18. package/dist/.claude/commands/portfolio.md +50 -0
  19. package/dist/.claude/commands/prd.md +113 -0
  20. package/dist/.claude/commands/qa.md +107 -0
  21. package/dist/.claude/commands/review.md +151 -0
  22. package/dist/.claude/commands/security.md +100 -0
  23. package/dist/.claude/commands/test.md +96 -0
  24. package/dist/.claude/commands/thumper.md +116 -0
  25. package/dist/.claude/commands/treasury.md +100 -0
  26. package/dist/.claude/commands/ux.md +118 -0
  27. package/dist/.claude/commands/vault.md +189 -0
  28. package/dist/.claude/commands/void.md +108 -0
  29. package/dist/CHANGELOG.md +1918 -0
  30. package/dist/CLAUDE.md +250 -0
  31. package/dist/HOLOCRON.md +856 -0
  32. package/dist/VERSION.md +123 -0
  33. package/dist/docs/NAMING_REGISTRY.md +478 -0
  34. package/dist/docs/methods/AI_INTELLIGENCE.md +276 -0
  35. package/dist/docs/methods/ASSEMBLER.md +142 -0
  36. package/dist/docs/methods/BACKEND_ENGINEER.md +165 -0
  37. package/dist/docs/methods/BUILD_JOURNAL.md +185 -0
  38. package/dist/docs/methods/BUILD_PROTOCOL.md +426 -0
  39. package/dist/docs/methods/CAMPAIGN.md +568 -0
  40. package/dist/docs/methods/CONTEXT_MANAGEMENT.md +189 -0
  41. package/dist/docs/methods/DEEP_CURRENT.md +184 -0
  42. package/dist/docs/methods/DEVOPS_ENGINEER.md +295 -0
  43. package/dist/docs/methods/FIELD_MEDIC.md +261 -0
  44. package/dist/docs/methods/FORGE_ARTIST.md +108 -0
  45. package/dist/docs/methods/FORGE_KEEPER.md +268 -0
  46. package/dist/docs/methods/GAUNTLET.md +344 -0
  47. package/dist/docs/methods/GROWTH_STRATEGIST.md +466 -0
  48. package/dist/docs/methods/HEARTBEAT.md +168 -0
  49. package/dist/docs/methods/MCP_INTEGRATION.md +139 -0
  50. package/dist/docs/methods/MUSTER.md +148 -0
  51. package/dist/docs/methods/PRD_GENERATOR.md +186 -0
  52. package/dist/docs/methods/PRODUCT_DESIGN_FRONTEND.md +250 -0
  53. package/dist/docs/methods/QA_ENGINEER.md +337 -0
  54. package/dist/docs/methods/RELEASE_MANAGER.md +145 -0
  55. package/dist/docs/methods/SECURITY_AUDITOR.md +320 -0
  56. package/dist/docs/methods/SUB_AGENTS.md +335 -0
  57. package/dist/docs/methods/SYSTEMS_ARCHITECT.md +171 -0
  58. package/dist/docs/methods/TESTING.md +359 -0
  59. package/dist/docs/methods/THUMPER.md +175 -0
  60. package/dist/docs/methods/TIME_VAULT.md +120 -0
  61. package/dist/docs/methods/TREASURY.md +184 -0
  62. package/dist/docs/methods/TROUBLESHOOTING.md +265 -0
  63. package/dist/docs/patterns/README.md +52 -0
  64. package/dist/docs/patterns/ad-billing-adapter.ts +537 -0
  65. package/dist/docs/patterns/ad-platform-adapter.ts +421 -0
  66. package/dist/docs/patterns/ai-classifier.ts +195 -0
  67. package/dist/docs/patterns/ai-eval.ts +272 -0
  68. package/dist/docs/patterns/ai-orchestrator.ts +341 -0
  69. package/dist/docs/patterns/ai-router.ts +194 -0
  70. package/dist/docs/patterns/ai-tool-schema.ts +237 -0
  71. package/dist/docs/patterns/api-route.ts +241 -0
  72. package/dist/docs/patterns/backtest-engine.ts +499 -0
  73. package/dist/docs/patterns/browser-review.ts +292 -0
  74. package/dist/docs/patterns/combobox.tsx +300 -0
  75. package/dist/docs/patterns/component.tsx +262 -0
  76. package/dist/docs/patterns/daemon-process.ts +338 -0
  77. package/dist/docs/patterns/data-pipeline.ts +297 -0
  78. package/dist/docs/patterns/database-migration.ts +466 -0
  79. package/dist/docs/patterns/e2e-test.ts +629 -0
  80. package/dist/docs/patterns/error-handling.ts +312 -0
  81. package/dist/docs/patterns/execution-safety.ts +601 -0
  82. package/dist/docs/patterns/financial-transaction.ts +342 -0
  83. package/dist/docs/patterns/funding-plan.ts +462 -0
  84. package/dist/docs/patterns/game-entity.ts +137 -0
  85. package/dist/docs/patterns/game-loop.ts +113 -0
  86. package/dist/docs/patterns/game-state.ts +143 -0
  87. package/dist/docs/patterns/job-queue.ts +225 -0
  88. package/dist/docs/patterns/kongo-integration.ts +164 -0
  89. package/dist/docs/patterns/middleware.ts +363 -0
  90. package/dist/docs/patterns/mobile-screen.tsx +139 -0
  91. package/dist/docs/patterns/mobile-service.ts +167 -0
  92. package/dist/docs/patterns/multi-tenant.ts +382 -0
  93. package/dist/docs/patterns/oauth-token-lifecycle.ts +223 -0
  94. package/dist/docs/patterns/outbound-rate-limiter.ts +260 -0
  95. package/dist/docs/patterns/prompt-template.ts +195 -0
  96. package/dist/docs/patterns/revenue-source-adapter.ts +311 -0
  97. package/dist/docs/patterns/service.ts +224 -0
  98. package/dist/docs/patterns/sse-endpoint.ts +118 -0
  99. package/dist/docs/patterns/stablecoin-adapter.ts +511 -0
  100. package/dist/docs/patterns/third-party-script.ts +68 -0
  101. package/dist/scripts/thumper/gom-jabbar.sh +241 -0
  102. package/dist/scripts/thumper/relay.sh +610 -0
  103. package/dist/scripts/thumper/scan.sh +359 -0
  104. package/dist/scripts/thumper/thumper.sh +190 -0
  105. package/dist/scripts/thumper/water-rings.sh +76 -0
  106. package/dist/scripts/voidforge.js +1 -1
  107. package/package.json +1 -1
  108. package/dist/tsconfig.tsbuildinfo +0 -1
@@ -0,0 +1,344 @@
# THE GAUNTLET — Thanos's Comprehensive Review
## Lead Agent: **Thanos** (Marvel) · Agents: All Universes

> *"I am inevitable."*

## Identity

**Thanos** is not a villain in VoidForge. He is the quality bar. The Gauntlet is the most comprehensive review protocol in the system — 5 rounds, 30+ agents across 9 universes, escalating from discovery to adversarial warfare to final convergence. If your project survives the Gauntlet, it's ready for anything.

**The metaphor:** The Infinity Gauntlet holds seven stones — Power (QA), Space (Architecture), Reality (UX), Soul (Security), Time (DevOps), Mind (Code Review), Wisdom (AI Intelligence). Thanos fires every stone, multiple times, from different angles. The project either withstands the snap, or it reveals where it breaks.

**Behavioral directives:** Be thorough without being theatrical. Every finding must be actionable. Don't hunt for problems that don't exist — but don't leave a single stone unturned. The Gauntlet is not about finding fault. It's about finding truth. A project that survives is genuinely strong. A project that reveals weaknesses gets stronger.

**When to use /gauntlet:**
- After `/build` or `/campaign` completes and before shipping to production
- After a major refactor or architecture change
- When you want absolute confidence in a deliverable
- When the project has significant attack surface (auth, payments, user data, WebSocket, file uploads)
- Before a public launch or investor demo

**Dispatch model:** All Gauntlet rounds MUST dispatch to sub-agents per `SUB_AGENTS.md` "Parallel Agent Standard." The main thread manages rounds, triages findings between rounds, and applies fixes — it does NOT read source files or analyze code inline. Each round launches agents in waves of 3 (the maximum concurrency). Findings pass between rounds as summary tables, not raw code. (Field report #270)

**When NOT to use /gauntlet:**
- During active development (use `/assemble` instead — it includes building)
- On a prototype or WIP (overkill)
- On methodology-only changes (no runtime code to review)

## The Seven Stones (Domains)

| Stone | Domain | Lead | Universe | What It Tests |
|-------|--------|------|----------|---------------|
| Power | QA | Batman | DC | Edge cases, error states, boundaries, regression |
| Space | Architecture | Picard | Star Trek | Schema, scaling, service boundaries, tech debt |
| Reality | UX | Galadriel | Tolkien | Usability, accessibility, enchantment, visual |
| Soul | Security | Kenobi | Star Wars | OWASP, injection, auth, secrets, access control |
| Time | DevOps | Kusanagi | Anime | Deploy, monitoring, backups, infrastructure |
| Mind | Code Review | Stark | Marvel | Patterns, logic, types, integration tracing |
| Wisdom | AI Intelligence | Hari Seldon | Foundation | Prompts, orchestration, safety, evals, cost, failure modes |

## Full Agent Roster

**Round 1 — Discovery (5 leads):**
- Picard (Star Trek) — architecture scan
- Stark (Marvel) — code review scan
- Galadriel (Tolkien) — UX surface map + Éowyn enchantment
- Kenobi (Star Wars) — attack surface inventory
- Kusanagi (Anime) — infrastructure discovery (deploy scripts, generated configs, CI/CD, open ports, default credentials). **Mandate runtime diagnostics:** If the project is runnable, execute diagnostic commands (`ss -tlnp` or `lsof -i`, `df -h`, database config queries, cache status). Source-level config analysis misses runtime state — 3 Critical + 7 High infrastructure findings in field report #102 were invisible to source review. (Field report #102)
- **Dead code discovery:** Grep frontend API client for methods with zero call sites. Grep stores/services for unexported or unimported functions. >20 dead functions = Medium finding. Dead code accumulates fastest at API boundaries — when a backend endpoint is refactored or removed, the frontend client method and response model survive because nothing errors. (Field report #233)
- **Cross-environment contamination (multi-env projects):** If the project has staging + production on the same server, verify: different API keys (`grep API_KEY prod/.env staging/.env | md5sum`), different storage buckets, Redis authentication enabled (`redis-cli ACL LIST`), no shared Unix group membership (`id staging-user | grep prod-group`), Docker ports bound to `127.0.0.1` not `0.0.0.0` (`docker ps --format '{{.Ports}}'`). Docker port bindings bypass UFW — verify with `ss -tlnp`, not `ufw status`. (Field report #241)
- **Environment-aware feature assessment:** When a feature doesn't work in a given environment, check whether that environment has the required credentials/config before reporting it as broken. If staging lacks API keys that production has, report "staging gap — feature requires production credentials" not "feature not working." Don't conflate missing-env-config with broken-feature. (Field report #243)
- **Cron and scheduled job inventory:** Scan for all scheduled job definitions — cron files, process manager schedules (PM2 `cron_restart`, systemd timers), in-app schedulers (`node-cron`, Celery Beat, APScheduler), CI/CD scheduled pipelines. For each: verify it runs against current code (not a stale path), log output is writable, and it doesn't duplicate another job. Unaudited scheduled jobs are the highest-risk dead code — they run silently and fail silently. (Field report #279)

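The dead-code grep can be sketched as a small shell helper. This is a minimal sketch, not part of the Gauntlet tooling: the `export function` extraction pattern, the `*.ts` glob, and the two path arguments are assumptions to adapt per project.

```shell
# Minimal dead-code sweep (sketch). Assumptions: TypeScript sources under a
# root directory, client methods declared as "export function <name>" or
# "export async function <name>".
find_dead_client_methods() {
  root=$1
  client=$2
  grep -E 'export (async )?function [A-Za-z_][A-Za-z0-9_]*' "$client" \
    | sed -E 's/.*export (async )?function ([A-Za-z_][A-Za-z0-9_]*).*/\2/' \
    | while read -r fn; do
        # Count call sites anywhere under $root, excluding the client file itself.
        hits=$(grep -rnF "$fn(" "$root" --include='*.ts' | grep -cv "$client")
        if [ "$hits" -eq 0 ]; then
          echo "DEAD: $fn"
        fi
      done
  return 0
}
```

Invoke it as, for example, `find_dead_client_methods src src/api/client.ts` (paths are illustrative); more than 20 `DEAD:` lines meets the Medium-finding threshold.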
**Round 2 — First Strike (full teams):**
- Batman team: Oracle, Red Hood, Alfred, Deathstroke, Constantine, Nightwing, Lucius
- Galadriel team: Elrond, Arwen, Samwise, Bilbo, Legolas, Gimli, Radagast, Éowyn
- Kenobi team: Leia, Chewie, Rex, Maul, Yoda, Windu, Ahsoka, Padmé
- Stark: integration tracing (solo — follows data across all modules)
- Agent 10: **Vin** (Statistical Review) — if the codebase contains statistical analysis (z-tests, confidence intervals, A/B test evaluation, Bayesian inference), Vin verifies: correct baseline/control selection, appropriate statistical test for the hypothesis, confidence threshold interpretation (one-tailed vs two-tailed, p-value vs CDF), sample size adequacy, and NaN/edge case guards. Tests that validate buggy statistical code will pass — Vin reasons about the math, not just the test results. (Field report #265: growth signal z-test shipped with wrong control selection, inverted confidence, and insufficient timeout — all tests passed.)
- Agent 11: **Hari Seldon** (AI Audit) — Salvor Hardin, Gaal Dornick, Hober Mallow, Bel Riose, Bliss, The Mule, Ducem Barr, Bayta Darell, Dors Venabili (full Foundation team)

**Round 2.5 — Runtime Smoke Test (Hawkeye):**
If the project has a runnable server, start it and verify the full lifecycle:
1. Start the server (`npm run dev`, `python manage.py runserver`, etc.)
2. Hit every new/modified API endpoint with curl — verify HTTP status codes
3. If WebSocket endpoints exist, open a connection and verify handshake + data flow
4. If terminal/PTY features exist, create a session and verify it stays alive for 5 seconds
5. If the server cannot start (scaffold/methodology-only), skip with a note

This catches what static analysis misses: IPv6 binding, native module ABI compatibility, WebSocket frame timing, browser caching, in-memory state lifecycle. (Field report #30: 11 runtime bugs invisible to 5 rounds of code review.)

**Browser smoke test (when `e2e: yes`):** Hawkeye runs `npm run test:e2e` as part of Round 2.5. If E2E tests exist, they replace curl-based endpoint checks for UI routes. API endpoint verification via curl remains for non-UI routes. axe-core results from E2E are included in the Gauntlet findings.

**Hawkeye's Browser Intelligence (v18.1):** After E2E tests pass, Hawkeye launches the review browser and:
1. Navigates every primary route, capturing console errors. Console errors become Gauntlet findings.
2. **MANDATORY: Takes a screenshot of every page** at desktop viewport (1440x900). Screenshots are saved to a temp directory and READ by the agent via the Read tool (which displays images). The agent MUST visually inspect each screenshot for: broken layouts, missing content, blank areas, overlapping elements, stuck loading states. This is not optional — it is the core value of browser intelligence. Screenshots are shared with Round 2 agents as context.
3. Inspects cookies and CORS headers. Security findings forwarded to Kenobi's team.

**Env var audit (after smoke test):** If the project uses build-time environment variables (Next.js `NEXT_PUBLIC_*`, Vite `VITE_*`, CRA `REACT_APP_*`), grep the built JS bundle for references and verify each has a non-empty value in the deployment environment. Build succeeding does NOT mean env vars are set — missing build-time vars cause features to silently disappear without errors. (Field report #104: OAuth buttons rendered conditionally on `NEXT_PUBLIC_GOOGLE_CLIENT_ID` which was never created — build passed, buttons vanished.)

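One way to mechanize the env var audit, assuming Next.js-style `NEXT_PUBLIC_*` naming and a flat `KEY=value` env file (both assumptions; adapt the prefix for Vite or CRA, or check the deployment platform's env instead of a file):

```shell
# Env var audit (sketch): every build-time var referenced in the bundle must
# have a non-empty value in the deployment env file.
audit_public_env_vars() {
  bundle_dir=$1
  env_file=$2
  grep -rhoE 'NEXT_PUBLIC_[A-Z0-9_]+' "$bundle_dir" | sort -u | while read -r var; do
    # A referenced var with no non-empty value silently disables features.
    if ! grep -qE "^${var}=.+" "$env_file"; then
      echo "MISSING: $var"
    fi
  done
  return 0
}
```

Every `MISSING:` line is a candidate for the field-report-#104 failure mode: the build passes while the dependent UI quietly vanishes.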
**Round 3 — Second Strike (targeted re-verification):**
- Batman: Nightwing + Red Hood + Deathstroke (re-probe)
- Galadriel: Samwise + Radagast + Bilbo (re-verify)
- Kenobi: Maul + Ahsoka + Padmé (re-probe + functional check)
- Kusanagi: Senku + Levi + Spike + L + Bulma + Holo (full DevOps)

**Round 4 — The Crossfire (adversarial):**
- Maul (Star Wars) — attacks "clean" code
- Deathstroke (DC) — bypasses hardened defenses
- Loki (Marvel) — chaos-tests cleared features
- Constantine (DC) — finds cursed code in fixed areas
- Éowyn (Tolkien) — final enchantment on polished product
- **The Mule** (Foundation) — adversarial AI testing on code that passed /ai review

**Defense-first rule:** Before claiming a bypass or missing defense, read the FULL function/module that implements the defense. Quote the defensive code. Then explain why the defense is insufficient. If you cannot find defensive code, state "No defense found at [file:line range]" — do not assume it's missing without reading.

**Semantic verification rule:** Verify semantic correctness of arguments, not just type correctness. Ask: is this the RIGHT value, not just a valid type? A function call that compiles and passes type-checking can still be fundamentally wrong if the wrong variable is passed. Check that each argument carries the intended meaning, not just a compatible shape. (Field report #258: aggregate spend parameter received a config object — type-compatible but semantically meaningless, causing NaN comparisons that silently fell through.)

**Round 5 — The Council (convergence):**
- Spock (Star Trek) — code quality after fixes
- Ahsoka (Star Wars) — access control integrity
- Nightwing (DC) — full regression
- Samwise (Tolkien) — final a11y
- Padmé (Star Wars) — critical path functional verification
- **Bayta Darell** (Foundation) — AI evaluation completeness verification
- Troi (Star Trek) — PRD compliance (prose-level) + **CLAUDE.md verification**: every slash command in the table has a `.claude/commands/*.md` file, every agent in the team table has a naming registry entry, every doc in the reference table exists at the stated path. (Field report #108: `/dangerroom` listed but no command file existed for 30 versions.)

Troi also performs a **Marketing Copy Drift Check**: compare marketing page claims (features listed, capabilities described, performance promises) against the actual shipped feature set. Flag any claim that cannot be demonstrated in the running application. Marketing pages may describe planned features that were later descoped or changed during review fixes.

**Pattern auth completeness check (Kenobi, during Rounds 2-3):** When a pattern file defines an authentication flow, verify the auth checks perform actual value verification (compare against expected, call verify functions) — not just presence checks (`!!header`, `Boolean()`). Flag `!!` or truthiness checks on auth-related headers as suspicious. (Field report #109: daemon socket auth used `!!vaultHeader` which passed for any non-empty string.)

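A rough detector for presence-only checks, for Kenobi's pass. The identifier heuristic (`auth`/`token`/`header`/`vault`) is an assumption to tune per codebase; hits are leads to read, not findings:

```shell
# Flag truthiness-only checks on auth-related values (heuristic sketch).
# Matches "!!someAuthThing" and "Boolean(someTokenThing" style checks.
find_presence_only_auth_checks() {
  grep -rnE '(!!|Boolean\()[[:space:]]*[A-Za-z_]*([Aa]uth|[Tt]oken|[Hh]eader|[Vv]ault)' "$@" || true
}
```

The `!!vaultHeader` bug from field report #109 is exactly the shape this catches; a real `verifyToken(token)` call does not match.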
**Total: 30+ unique agent deployments across 5 rounds.**

## Escalation Pattern

Each round is more intense than the last:

```
Round 1: "What do we have?" → Map the surface
Round 2: "What's wrong?" → Find the problems
Round 3: "Did we fix them?" → Verify the fixes
Round 4: "Can we break it anyway?" → Adversarial attack
Round 5: "Is it actually right?" → Final convergence
```

Fix batches happen between rounds:
- After Round 2: fix all Critical + High
- After Round 3: fix remaining findings
- After Round 4: fix adversarial findings
- After Round 5: final convergence fixes (max 2 iterations)

**Sibling Verification Protocol:** After EVERY fix, before commit, verify four dimensions:

**Dimension 1 — Pattern grep:** Grep the entire codebase for the same pattern. When fixing `aria-controls` in one component, grep all components. When adding SSRF protection to one endpoint, check all endpoints that accept URLs. Fix ALL instances — not just the one that was reported. This is the #1 source of rework across field reports.

**Dimension 2 — Caller tracing:** Trace all callers of the modified function. When fixing an auth check in a helper function, find every code path that implements the same check independently (inline duplicates). Don't fix only the helper — find the routes that duplicated the logic. (Field report #102: `checkMonthlyLimit()` was fixed to check BYOK, but the chat route had a separate BYOK resolution that didn't use the helper.)

**Dimension 3 — Mutation parity:** Identify all routes/endpoints that mutate the same data. When fixing a safety mechanism (locking, transactions, version sync) in one mutation path, verify ALL other paths that write to the same table/store use identical mechanisms. (Field report #102: inline-edit route was missing optimistic locking, default version sync, and transactions that the chat service had — three rounds found three separate gaps in the same file.)

**Dimension 4 — Output verification:** After modifying any function that produces output sent to clients or external APIs, verify the fix against 3+ samples of real output data. A pattern that passes logic review may fail on actual output keys. Specifically: if adding keyword filters (blocklists, sanitizers), test against the known output schema to check for false positives before applying. (Field report #148: secret stripping `_url`/`_uri` wildcards deleted `DEPLOY_URL`, `S3_WEBSITE_URL`, `CF_PROJECT_URL` from SSE output — caught by Council but cost an extra round.)

**Concrete examples of sibling patterns to grep:**
- Same ARIA attribute value in the same file (e.g., `role="option"` → grep for `"option"` in that file)
- Same endpoint pattern in sibling router files (e.g., fixed `/api/trips/:id` → check `/api/places/:id`, `/api/bookings/:id`)
- Same SQL pattern in sibling store functions (e.g., added `AND org_id = ?` → check all `WHERE id = ?` queries)
- Same CSS class or animation name across component files
- Same error handling pattern across API routes (e.g., added try/catch → check all routes in the same router)
- Same crypto pattern across utility files (e.g., fixed modulo bias in `generateToken()` → check `cryptoRandomSuffix()` in helpers.ts)

**Execution order check:** For every fix, verify not just that the code exists, but that it executes in the correct order relative to the code that consumes its output. Specifically: if a fix sanitizes/validates a value, verify the sanitization happens BEFORE the value is captured by any object construction, function call, or closure. (Field report #20: PTY clamping placed after spawnOptions construction — caught in Round 3.)

**Encoding variant check:** For every security filter that operates on tool names, function names, or identifiers, verify it handles all encoding variants (`:`, `__`, URL-encoded, dot-notation, etc.). MCP tool names, API paths, and permission identifiers may use different encodings across layers.

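As an illustration, a normalizer that collapses the listed variants to one form before any filter compares identifiers. The canonical lowercase colon-separated form is an assumption for the sketch, not a VoidForge convention:

```shell
# Normalize tool-name encoding variants (sketch): URL-encoded colons,
# double-underscore separators, and dot-notation all map to ":" lowercase,
# so a filter compares one canonical spelling instead of four.
normalize_identifier() {
  printf '%s' "$1" \
    | sed -e 's/%3[Aa]/:/g' -e 's/__/:/g' -e 's/\./:/g' \
    | tr '[:upper:]' '[:lower:]'
}
```

With this, a blocklist entry for `server:delete_all` also catches `server__delete_all` and `Server%3Adelete_all` instead of being bypassed by an alternate encoding.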
**Build-output verification:** After every fix batch, if the project has a build step, run BOTH `npm test` AND the build command (`npm run build`). Tests passing does NOT mean the build succeeds — variable scoping, import resolution, and TypeScript strict mode can fail at build time while tests pass. Check output for inline scripts broken by CSP changes: `grep -c '<script>' dist/**/*.html`. If the project compiles JSX to HTML (VM execution, SSR, static site generation), also execute the compiled output in the target runtime and verify it renders correctly — build success does NOT mean runtime success. If the build fails, the fix is wrong — fix the fix before proceeding. (Field reports #119, #124: fix agent passed `npm test` but broke the build; compiled HTML rendered empty due to SSR stripping hook state.)

**Auth flag security check (Victory Gauntlet):** Grep the codebase for known dangerous auth configuration flags: `allowDangerousEmailAccountLinking`, `trustHost` without proxy validation, `debug: true` in auth config, `session: { strategy: "jwt" }` without token validation. These flags are often set during development and survive into production across multiple campaigns. (Field report #119: `allowDangerousEmailAccountLinking` survived 4 campaigns and 3 Gauntlets undetected.)

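A sketch of that grep, using the flag list above. `trustHost` and `debug: true` hits still need manual review, since those flags are only dangerous in specific configurations:

```shell
# Dangerous auth flag sweep (sketch; pass source roots as arguments).
# Hits are candidates for review, not automatic findings.
scan_dangerous_auth_flags() {
  grep -rnE 'allowDangerousEmailAccountLinking|trustHost|debug:[[:space:]]*true' "$@" \
    || echo "no dangerous auth flags found"
}
```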
**Crossfire false-positive verification:** When adversarial agents add blocklist/filter patterns during Crossfire, test the pattern against 5 samples of legitimate output before applying. A pattern that catches an attack but also matches normal content creates a worse problem than the one it solves. (Field report #119: isSafeForVM pattern risked matching legitimate user-generated content.)

**Commit per fix batch:** After each fix batch, create a separate commit. This enables surgical revert if a fix introduces a regression — one 43-file commit is impossible to partially revert.

**Real data smoke test:** For fixes that modify data transformation, sanitization, or rendering, test against actual project data (not just unit tests). If the project has AI-generated content, test with real LLM output. Unit tests pass ≠ production works. **Security scanner mandatory:** When modifying security filters (isSafeForVM, sanitizeJsx, content validators), MUST compile 3+ real AI-generated files through the full pipeline and verify the output is correct. Security scanner false positives silently break production. This is not optional. (Field report #121: broad regex in isSafeForVM matched legitimate content, silently degrading all sites to client-side Babel fallback.)

**Pass 2 false-positive severity:** When Pass 2 identifies a potential false-positive in a security pattern added during Pass 1, classify it as **Must Fix**, not Medium. A false positive in a security scanner is functionally a regression — it degrades working features. Do not defer with "monitor in production" unless a monitoring mechanism actually exists and is configured. (Field report #121)

## Finding Format

Every finding, from every agent, in every round, uses this format:

```
ID: [DOMAIN]-[ROUND]-[NUMBER] (e.g., SEC-R2-003)
Severity: Critical / High / Medium / Low
Agent: [Name] ([Universe])
File: [path:line]
What's wrong: [description]
Attack vector: [how to exploit, if security]
How to fix: [specific recommendation]
Status: Open / Fixed / Verified / Won't Fix
```

**Verification Gate:** Every Critical or High finding MUST include a direct code quote (3+ lines) from the actual file with file path and line numbers. Findings without exact code quotes are classified as "Unverified" and must be verified before counting toward severity tallies. This prevents hallucinated findings from driving fix batches.

**RC-STUB (Root Cause — Stub Code):** Any function that returns hardcoded success without performing its documented side effects, any method that throws `new Error('Implement...')` or `'Not implemented'`, any handler that logs but performs no work, or any endpoint that reports an action was taken when nothing happened. RC-STUB findings are automatically **High severity** — they represent false functionality that misleads users and downstream systems. Grep for: `throw new Error('Implement`, `throw new Error('Not implemented`, `throw new Error('TODO`, `{ ok: true }` in handlers that have no side effects. **Also check default/else branches in dispatch logic** — these return fake success when no known case matches and are the most commonly missed RC-STUB variant because they don't match named grep patterns. (Field reports: v17.0 assessment, #230)

**RC-VERIFY (Unverified Wiring):** After confirming a function is implemented (not a stub), verify it is actually called. Grep for the function name and confirm at least one call site exists in application code (not just tests). Check: (1) new classes never instantiated in `main()` or `app.ts`, (2) exported methods with zero callers outside their own file, (3) route handlers where the router file is never imported in the server entry, (4) event listeners where the event is never emitted, (5) middleware defined but never added to the chain. A function that passes RC-STUB but has zero call sites is dead code masquerading as a feature. (Field report #279)

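The named RC-STUB patterns can be swept with a helper like this sketch (patterns quoted from the rule; every `{ ok: true }` hit still needs a human check for missing side effects, and default-branch fake successes will not match any grep):

```shell
# RC-STUB candidate sweep (sketch; pass source roots as arguments).
find_stub_candidates() {
  # Explicit not-implemented throws.
  grep -rnE "throw new Error\('(Implement|Not implemented|TODO)" "$@" || true
  # Hardcoded success objects; review each hit for documented side effects.
  grep -rnF '{ ok: true }' "$@" || true
}
```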
## State Tracking

Write progress to `/logs/gauntlet-state.md` after every round:

```markdown
# Gauntlet State

| Round | Status | Findings | Fixes |
|-------|--------|----------|-------|
| 1. Discovery | COMPLETE | 12 | — |
| 2. First Strike | COMPLETE | 45 | 18 |
| 3. Second Strike | COMPLETE | 8 | 5 |
| 4. Crossfire | IN PROGRESS | 3 | — |
| 5. Council | PENDING | — | — |
```

## Quality Reduction Prohibition

**The Gauntlet is NEVER reduced, abbreviated, or "run efficiently."** Every round runs the full protocol for that round. No exceptions.

You MUST NOT:
- "Focus on the changeset" instead of reviewing the full codebase
- Run a "lightweight" version of any round
- Skip agents within a round because "context is heavy"
- Combine rounds to "save context"

If you believe context is limited, run `/context` and report the actual number. Below 85%: continue full protocol. Above 85%: checkpoint and suggest a fresh session. Never reduce Gauntlet quality in the current session.

This rule exists because agents self-justified "efficient" Gauntlets at 28% and 37% context usage, letting bugs through that full rounds would have caught.

**Context excuses are not valid.** "Context is heavy" without a number is not justification for reducing rounds, skipping agents, or deferring work. Run `/context`, report the percentage, and continue. If above 85%: checkpoint and suggest a fresh session. Below 85%: full protocol, no exceptions. (Field report #150)

## Flags

- `--fast` — Skip Rounds 4 (Crossfire) and 5 (Council). For projects where a lighter review is acceptable. Still 3 rounds, still comprehensive. (formerly `--quick` — renamed v17.3 for cross-command consistency)
- `--security-only` — Run 4 rounds of security only: inventory, full audit, re-probe, adversarial. Kenobi's marathon. For when you specifically need a deep security review.
- `--ux-only` — Run 4 rounds of UX only: surface map, full audit, re-verify, enchantment. Galadriel's marathon.
- `--qa-only` — Run 4 rounds of QA only: discovery, full pass, re-probe, adversarial. Batman's marathon.
- `--resume` — Resume from the last completed round (reads from gauntlet-state.md).
- `--ux-extra` — Extra Éowyn enchantment emphasis across all rounds. Galadriel's team proposes micro-animations, copy improvements, and delight moments beyond standard usability/a11y. Produced 7 shipped enchantments in the v7.1.0 Gauntlet.
- `--assess` — **Pre-build assessment mode.** Run Rounds 1-2 only (Discovery + First Strike) and produce an assessment report — no fix batches, no Crossfire, no Council. Designed for evaluating existing codebases before a rebuild or migration. When an existing codebase has fundamental issues (stubs, abandoned migrations, missing auth), later rounds become redundant because there are no fixes to verify between rounds. The assessment report groups findings by root cause rather than by domain, producing a "State of the Codebase" view. (Field report #125: Infinity Gauntlet on a half-built system produced 120+ findings all tracing to the same root cause — stubs returning True.)
- `--infinity` — **The Infinity Gauntlet.** 10 rounds (2x full pass). Every active agent deployed as its own sub-process — not combined, not summarized. The full ~110 agent roster across 9 universes. See below.
- `--blitz` — Autonomous execution: no pause between rounds, auto-apply fixes, auto-continue. Combine with `--infinity` for fully autonomous maximum review. Does NOT reduce agent count or skip rounds — only removes human interaction between rounds.
- `--reckoning` — Pre-launch parity audit: 5-wave parallel review (Marketing → UI → Backend → Gates → Cross-cutting) with ~13 agents. Lighter than `--infinity`, focused on launch readiness rather than code quality. See CAMPAIGN.md "The Reckoning" for the full wave structure.

231
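
How these modes select rounds can be sketched as a small lookup. This is a hypothetical model written for illustration, not the actual implementation; the flag names come from the list above and the round numbers from the standard five-round Gauntlet:

```python
# Hypothetical sketch of how mode flags map to rounds.
# Standard Gauntlet rounds: 1 Discovery, 2 First Strike,
# 3 Second Strike, 4 Crossfire, 5 Council.

STANDARD_ROUNDS = [1, 2, 3, 4, 5]

def rounds_for(flags: set[str]) -> list[int]:
    """Return the rounds a given flag combination would run."""
    if "--assess" in flags:       # pre-build assessment: stop after Round 2
        return [1, 2]
    if "--infinity" in flags:     # two full passes: 10 rounds
        return list(range(1, 11))
    if "--fast" in flags:         # skip Crossfire and Council
        return [1, 2, 3]
    return STANDARD_ROUNDS

# `--blitz` changes pacing, not scope: same rounds, no pauses between them.
assert rounds_for({"--fast"}) == [1, 2, 3]
assert rounds_for({"--infinity", "--blitz"}) == list(range(1, 11))
assert rounds_for({"--assess"}) == [1, 2]
```
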
### The Infinity Gauntlet (`--infinity`)

*"I used the stones to destroy the stones."*

This is the ultimate test. Every active agent in VoidForge runs as its own dedicated sub-process. No combining agents into single prompts. No "Picard + Stark combined." Each agent gets its own launch, its own context, its own findings.

**10 rounds (2 full passes):**

**Pass 1 — Discovery + Strike + Crossfire + Council (Rounds 1-5):**

Round 1 — Discovery (launch 5+ agents in parallel):
- Agent 1: **Picard** (Architecture) — with Crusher (diagnostics) + Archer (greenfield assessment)
- Agent 2: **Stark** (Code Review) — with Spock (pattern compliance) + Seven (integration tracing)
- Agent 3: **Galadriel** (UX Surface Map) — with Éowyn (enchantment scan) + Celeborn (design system)
- Agent 4: **Kenobi** (Security Inventory) — with Han (first strike) + Cassian (threat modeling)
- Agent 5: **Kusanagi** (Infrastructure) — with Senku + Levi + Spike

Round 2 — First Strike (launch full domain teams as separate agents):
- Agent 1: **Batman** → Oracle, Red Hood, Alfred, Lucius (core QA)
- Agent 2: **Batman** → Deathstroke, Constantine, Cyborg, Raven, Wonder Woman (adversarial QA)
- Agent 3: **Batman** → Green Lantern (test matrix), Flash (smoke tests), Batgirl (detail), Aquaman (deep dive)
- Agent 4: **Galadriel** → Elrond, Arwen, Samwise, Bilbo, Legolas, Gimli, Radagast, Éowyn, Celeborn
- Agent 5: **Galadriel** → Aragorn, Faramir, Pippin, Boromir, Haldir, Frodo, Merry (extended Tolkien)
- Agent 6: **Kenobi** → Leia, Chewie, Rex, Bo-Katan, Maul (parallel security)
- Agent 7: **Kenobi** → Yoda, Windu, Ahsoka, Padmé, Qui-Gon, Sabine (sequential + extended)
- Agent 8: **Stark** → Rogers, Banner, Strange, Barton, Romanoff, Thor, T'Challa, Wanda (full backend)
- Agent 9: **Picard** → Spock, Uhura, Worf, Tuvok, Scotty, Torres, Kim, Janeway, Riker (full architecture)
- Agent 10: **Kusanagi** → Senku, Levi, Spike, L, Bulma, Holo, Valkyrie, Vegeta, Trunks, Mikasa, Erwin, Mustang, Olivier, Hughes, Calcifer, Duo (full DevOps)

Round 3 — Second Strike (re-probe all domains with fresh agents):
- Nightwing, Red Hood, Deathstroke (QA re-probe)
- Samwise, Radagast, Merry (UX re-verify)
- Maul, Ahsoka, Padmé, Anakin, Din Djarin (security re-probe)
- Kusanagi full team (DevOps re-verify)
- Superman (standards enforcement), Huntress (flaky tests)

Round 4 — Crossfire (5 adversarial agents, each as own sub-process):
- **Maul** — attacks reviewed code
- **Deathstroke** — bypasses security remediations
- **Loki** — chaos-tests cleared features
- **Constantine** — hunts cursed code in fixed areas
- **Éowyn** — final enchantment on hardened product

Round 5 — Council (6+ agents, each as own sub-process):
- **Spock** — pattern/quality verification
- **Ahsoka** — access control verification
- **Nightwing** — full regression
- **Samwise** — final a11y audit
- **Padmé** — critical path functional verification
- **Troi** — PRD compliance, section by section. When E2E infrastructure exists, Troi walks each PRD user flow in the browser using page objects, comparing rendered content against PRD Sections 2 (routes), 4 (features), and 14 (brand voice), with screenshot evidence logged. Troi also performs a Marketing Copy Drift Check — verifying that marketing claims match shipped features.

**Pass 2 — Repeat (Rounds 6-10):** Same structure, all agents re-deployed on the fixed codebase. Pass 2 should find zero issues if Pass 1 fixes were correct.

**Agent count:** ~60-80 agent sub-process launches across 10 rounds (some agents run in both passes). This is the most thorough review possible within a single session.

**When to use:** After completing a major version (v8.x, v9.x). Before v1.0 of a real product. When shipping to production for the first time. When the cost of a missed bug exceeds the cost of the review.

**ENFORCEMENT:** Every agent named above MUST be launched as its own Agent tool invocation. Do NOT combine agents. Do NOT shortcut to inline analysis. If context reaches 85%, checkpoint and resume in a fresh session — do NOT reduce the agent count. The Infinity Gauntlet is the one protocol where "too thorough" is impossible.
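
The one-launch-per-agent rule is easiest to audit when the round plan is explicit data rather than prose. A hypothetical sketch of a fragment of Pass 1, where each tuple stands for exactly one Agent tool invocation (the structure is illustrative; the agent names come from the rounds above):

```python
# Hypothetical plan structure for part of Pass 1.
# Each tuple is ONE Agent tool invocation: a lead agent plus the teammates
# folded into that launch's brief. Tuples are never merged with each other.

PASS_1 = {
    1: [("Picard", ["Crusher", "Archer"]),
        ("Stark", ["Spock", "Seven"]),
        ("Galadriel", ["Éowyn", "Celeborn"]),
        ("Kenobi", ["Han", "Cassian"]),
        ("Kusanagi", ["Senku", "Levi", "Spike"])],
    4: [("Maul", []), ("Deathstroke", []), ("Loki", []),
        ("Constantine", []), ("Éowyn", [])],
}

def launches(plan: dict) -> int:
    """Count Agent tool invocations: one per lead agent, no combining."""
    return sum(len(round_agents) for round_agents in plan.values())

assert launches(PASS_1) == 10  # 5 Discovery launches + 5 Crossfire launches
```
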

## Agent Confidence Scoring

Each agent reports a confidence score (0-100) on their findings. The score reflects how certain the agent is that the finding is a real issue (not a false positive).

**Score ranges:**
- **90-100:** High confidence — agent is very sure. Skip re-verification in Pass 2.
- **60-89:** Medium confidence — standard handling. Include in findings, verify in next round.
- **0-59:** Low confidence — escalate to a second agent from a DIFFERENT universe before presenting. If the second agent disagrees, drop the finding. If the second agent agrees, upgrade to medium confidence.

**How to report:** Every finding includes a confidence field: `[ID] [SEVERITY] [CONFIDENCE: XX] [FILE] [DESCRIPTION]`

**Why this matters:** In the v8.0 Gauntlet, several "findings" were false positives that wasted fix time. Confidence scoring lets agents express uncertainty instead of presenting everything as definitive. Low-confidence findings get a second opinion before reaching the user.
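
The report format and the score ranges combine into a simple triage rule. A minimal sketch, assuming the bracketed field order shown above (the parser and the example finding are illustrative, not part of the protocol):

```python
import re

# Parse a finding line of the form:
#   [ID] [SEVERITY] [CONFIDENCE: XX] [FILE] [DESCRIPTION]
FINDING = re.compile(
    r"\[(?P<id>[^\]]+)\] \[(?P<sev>[^\]]+)\] \[CONFIDENCE: (?P<conf>\d+)\] "
    r"\[(?P<file>[^\]]+)\] \[(?P<desc>[^\]]+)\]"
)

def triage(line: str) -> str:
    """Map a finding's confidence score to its handling."""
    conf = int(FINDING.match(line).group("conf"))
    if conf >= 90:
        return "high: skip Pass 2 re-verification"
    if conf >= 60:
        return "medium: verify in next round"
    return "low: escalate to a second agent from a different universe"

line = "[QA-7] [HIGH] [CONFIDENCE: 45] [src/auth.py] [Session token never expires]"
assert triage(line).startswith("low")
```
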

## Sub-agent Failure Fallback

If a sub-agent launch fails (API error, timeout, context exhaustion):
1. **Retry once** with 5-second backoff.
2. If retry fails, **run the analysis inline** (in the current context) with a logged warning: "Agent [name] failed to launch — running inline. Findings may be less thorough."
3. Log the failure to the gauntlet state file with the agent name and error.
4. Never skip an agent entirely. Inline analysis is degraded but better than a blind spot.
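
The four steps above can be sketched as follows; `launch_agent`, `inline_analysis`, and `log_to_state` are hypothetical stand-ins for the real machinery:

```python
import time

def run_agent_with_fallback(name, launch_agent, inline_analysis,
                            log_to_state, backoff=5.0):
    """Steps 1-4 above: retry once with backoff, then degrade to inline."""
    for attempt in (1, 2):
        try:
            return launch_agent(name)      # normal path: dedicated sub-agent
        except Exception as err:
            last_error = err
            if attempt == 1:
                time.sleep(backoff)        # step 1: single retry with backoff
    log_to_state(f"Agent {name} failed to launch: {last_error}")  # step 3
    print(f"Agent {name} failed to launch — running inline. "
          "Findings may be less thorough.")                       # step 2
    return inline_analysis(name)           # step 4: never skip the agent

def failing_launch(name):
    raise RuntimeError("timeout")

failures = []
result = run_agent_with_fallback("Maul", failing_launch,
                                 lambda n: f"inline findings for {n}",
                                 failures.append, backoff=0)
assert result == "inline findings for Maul"
assert "timeout" in failures[0]
```
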

## Alternative Patterns

### The Reckoning (Pre-Launch Parity Audit)

When the goal is "can we ship this?" rather than "is this code perfect?", the Reckoning runs 5 parallel waves focused on launch readiness:
1. **Marketing parity** — does the site say what the product does?
2. **UI parity** — do all pages/flows match the PRD?
3. **Backend parity** — are all endpoints wired and functional?
4. **Gate parity** — auth, payments, error handling all working?
5. **Cross-cutting** — a11y, SEO, performance, mobile

~13 agents, 30-60 minutes. Complements the Gauntlet (which is code-focused) with a product-focused lens. Reference: field report #85.

## Integration Points

### With `/campaign`
After the campaign's final mission and before victory, Sisko may offer:
*"The campaign is built. Want Thanos to put it through the Gauntlet before we declare victory?"*

### With `/assemble`
The Gauntlet can replace the review phases of `/assemble` for maximum rigor:
*"Fury completed the build. Running /gauntlet instead of standard review rounds."*

### With `/debrief`
After the Gauntlet completes, Bashir may offer to capture learnings:
*"The Gauntlet is complete. Want Bashir to analyze what survived and what broke?"*

## Deliverables

1. `/logs/gauntlet-state.md` — round-by-round progress and findings
2. All fixes applied to the codebase
3. Lessons extracted to `/docs/LESSONS.md` (Wong)
4. Final summary with sign-off or outstanding items