voidforge-build 23.11.4 → 23.12.0

Files changed (40)
  1. package/dist/.claude/agents/batman-qa.md +1 -0
  2. package/dist/.claude/agents/galadriel-frontend.md +2 -0
  3. package/dist/.claude/agents/kusanagi-devops.md +4 -0
  4. package/dist/.claude/agents/lucius-config.md +6 -0
  5. package/dist/.claude/agents/silver-surfer-herald.md +11 -4
  6. package/dist/.claude/commands/architect.md +9 -0
  7. package/dist/.claude/commands/assemble.md +4 -1
  8. package/dist/.claude/commands/assess.md +13 -1
  9. package/dist/.claude/commands/audit-docs.md +106 -0
  10. package/dist/.claude/commands/deploy.md +28 -0
  11. package/dist/.claude/commands/engage.md +2 -0
  12. package/dist/.claude/commands/gauntlet.md +23 -4
  13. package/dist/.claude/commands/imagine.md +15 -0
  14. package/dist/.claude/commands/ux.md +32 -0
  15. package/dist/.claude/commands/void.md +1 -0
  16. package/dist/CHANGELOG.md +39 -0
  17. package/dist/CLAUDE.md +8 -0
  18. package/dist/VERSION.md +2 -1
  19. package/dist/docs/methods/AI_INTELLIGENCE.md +33 -0
  20. package/dist/docs/methods/ASSEMBLER.md +31 -2
  21. package/dist/docs/methods/CAMPAIGN.md +27 -0
  22. package/dist/docs/methods/DEVOPS_ENGINEER.md +158 -0
  23. package/dist/docs/methods/DOC_AUDIT.md +92 -0
  24. package/dist/docs/methods/FORGE_KEEPER.md +16 -5
  25. package/dist/docs/methods/GAUNTLET.md +33 -0
  26. package/dist/docs/methods/PRODUCT_DESIGN_FRONTEND.md +53 -0
  27. package/dist/docs/methods/QA_ENGINEER.md +19 -0
  28. package/dist/docs/methods/RELEASE_MANAGER.md +27 -0
  29. package/dist/docs/methods/SUB_AGENTS.md +31 -0
  30. package/dist/docs/methods/SYSTEMS_ARCHITECT.md +13 -0
  31. package/dist/docs/methods/TESTING.md +19 -0
  32. package/dist/docs/patterns/README.md +3 -0
  33. package/dist/docs/patterns/ai-eval.ts +63 -0
  34. package/dist/docs/patterns/daemon-process.ts +90 -0
  35. package/dist/docs/patterns/deploy-preflight.ts +85 -2
  36. package/dist/docs/patterns/design-tokens.ts +338 -0
  37. package/dist/docs/patterns/error-message-categorization.tsx +376 -0
  38. package/dist/wizard/lib/patterns/daemon-process.d.ts +2 -1
  39. package/dist/wizard/lib/patterns/daemon-process.js +89 -1
  40. package/package.json +2 -2
@@ -143,6 +143,15 @@ When upgrading across versions, check the **Migration Registry** for one-time cl
 
  **Important:** Some cleanup targets (like `docs/ARCHITECTURE.md`) could be the user's own project files, not leaked VoidForge artifacts. Before removing any file, **fingerprint it** — check if it contains VoidForge-specific markers (e.g., header says "VoidForge", references `wizard/`, or matches a known stale version like "15.2.1"). If the file looks like the user's own work, skip it and note why.
 
+ **Consumer vs. clone — gate the whole Migration Registry on this first (field report #343 F10).** Spring Cleaning is destructive, and the "Always remove" list below is calibrated for **methodology clones** (projects scaffolded from the `scaffold` or `core` source, which carry no application code of their own). On a **methodology consumer** — an application project that adopted VoidForge but has its own production source tree — files like `playwright.config.ts`, `vitest.config.ts`, `tsconfig.json`, and `package-lock.json` are **legitimate application files**, not leaked VoidForge artifacts. Deleting them on a consumer is **data loss**: you would be removing the project's real test config, TypeScript config, and dependency lockfile.
+
+ **Detection heuristic:** Read the project's `package.json`. If it declares non-VoidForge `dependencies` or `devDependencies` (anything beyond a bare name + version + description), the project is a **CONSUMER**. If `package.json` is minimal or absent (no real dependencies — the shape scaffold/core ship), the project is a **CLONE**.
+
+ - **CONSUMER** → **SKIP the entire "Always remove" list.** Do not apply the version-range migrations that delete config/lockfiles. The only files Spring Cleaning may touch on a consumer are ones that fingerprint **unambiguously** as VoidForge artifacts (e.g., `PRD-VOIDFORGE.md`, a `docs/ARCHITECTURE.md` whose header literally says "VoidForge" / "Version: 15.2.1"). Fingerprint **defensively** before deleting anything; when a file is ambiguous, keep it and note why. Never delete `playwright.config.ts`, `vitest.config.ts`, `tsconfig.json`, or `package-lock.json` on a consumer.
+ - **CLONE** → apply the full Migration Registry as written below, including the "Always remove" list.
+
+ When unsure which side of the line the project sits on, treat it as a CONSUMER (the safe default — keeping a file is reversible, deleting it is not).
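A minimal sketch of the detection heuristic above, offered as an illustration and not part of the package. It assumes a hypothetical `is_voidforge_package` naming rule (names starting with `voidforge` or `@voidforge/`), since the text does not say how a VoidForge dependency is recognized. A malformed or unreadable manifest is treated as CONSUMER, following the safe default.

```python
import json
from pathlib import Path
from typing import Any, Optional


def is_voidforge_package(name: str) -> bool:
    # ASSUMPTION: hypothetical naming rule; the source does not define one.
    return name.startswith("voidforge") or name.startswith("@voidforge/")


def classify_manifest(manifest: Optional[Any]) -> str:
    """Classify a parsed package.json (or None when the file is absent)."""
    if manifest is None:
        return "CLONE"  # no package.json: the shape scaffold/core ship
    if not isinstance(manifest, dict):
        return "CONSUMER"  # malformed manifest: unsure, so keep files
    for field in ("dependencies", "devDependencies"):
        deps = manifest.get(field) or {}
        if not isinstance(deps, dict):
            return "CONSUMER"  # unexpected shape: unsure, so keep files
        if any(not is_voidforge_package(name) for name in deps):
            return "CONSUMER"  # real application dependencies declared
    return "CLONE"  # bare name + version + description only


def classify_project(project_dir: str) -> str:
    """Read <project_dir>/package.json and classify the project."""
    path = Path(project_dir) / "package.json"
    if not path.is_file():
        return classify_manifest(None)
    try:
        return classify_manifest(json.loads(path.read_text(encoding="utf-8")))
    except (OSError, ValueError):
        return "CONSUMER"  # unreadable or invalid JSON: treat as consumer
```

Keeping the decision in `classify_manifest` makes it checkable without touching the filesystem. Every ambiguous branch returns CONSUMER because keeping a file is reversible and deleting it is not.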
+
  **Process:**
  1. Determine which migrations apply based on the local version → upstream version range
  2. For each applicable migration, scan for the listed files
@@ -163,15 +172,15 @@ When upgrading across versions, check the **Migration Registry** for one-time cl
 
  Prior to v20.2, the scaffold and core branches contained files that should only exist on main. These were cleaned from the upstream npm package but may persist in projects that cloned earlier versions.
 
- **Always remove (unambiguous VoidForge artifacts):**
+ **Always remove (unambiguous VoidForge artifacts) — CLONES ONLY. On a methodology consumer, skip this entire list (field report #343 F10); `package-lock.json`, `playwright.config.ts`, `vitest.config.ts`, and `tsconfig.json` are the consumer's real application files and deleting them is data loss:**
  ```
  PRD-VOIDFORGE.md ← VoidForge's own product PRD
  PROPHECY.md ← Historical roadmap, all items shipped
  WORKSHOP.md ← Workshop guide requiring wizard/
- package-lock.json ← Scaffold/core have no dependencies
- playwright.config.ts ← References wizard/e2e
- vitest.config.ts ← References wizard/__tests__
- tsconfig.json ← References wizard/**/*.ts
+ package-lock.json ← Scaffold/core have no dependencies (CONSUMER: real lockfile — keep)
+ playwright.config.ts ← References wizard/e2e (CONSUMER: real test config — keep)
+ vitest.config.ts ← References wizard/__tests__ (CONSUMER: real test config — keep)
+ tsconfig.json ← References wizard/**/*.ts (CONSUMER: real TS config — keep)
  packages/voidforge/scripts/voidforge.ts ← CLI entry point, imports wizard/
  scripts/vault-read.ts ← Imports packages/voidforge/wizard/lib/vault
  scripts/danger-room-feed.sh ← Feeds wizard dashboard
@@ -274,6 +283,8 @@ Verify and celebrate:
  - Conflicts resolved: [list]
  ```
  4. Check for handoffs — if new commands or agents were added, mention them
+ 4b. **Restart required before new agents are launchable (field report #343 F1a).** Agent registration is **session-scoped**: Claude Code reads `.claude/agents/*.md` at session start, so any agent files this sync *added* are NOT yet usable as `subagent_type` values in the current session. If the sync added one or more `.claude/agents/` files, tell the operator: *"New agents were synced into `.claude/agents/`. Restart the Claude Code session before launching them — until you do, they can't be used as `subagent_type` values."* Files that were merely *updated* (already present at session start) work without a restart; only newly-added agent files need one.
+ 4c. **Silver Surfer Gate bypass flags are per-session (field report #343 F4).** `--solo` and `--light` bypasses recorded by `scripts/surfer-gate/bypass.sh` live in per-session gate state, so a bypass is **not durable across `/clear` or a session restart**. After clearing context or restarting, any prior `--solo`/`--light` no longer applies and must be **re-issued** on the next gated command. Don't assume a bypass set earlier in a different session is still in effect — if the operator restarted or ran `/clear` since granting it, the gate is live again until the flag is passed afresh.
  5. **Content drift check:** If the sync changed methodology counts (agent counts, command counts, pattern counts) AND the project has a data layer that displays VoidForge metadata (e.g., `releases.ts`, `commands.ts`, site content), flag: "The sync changed [N] agents/commands/patterns. If your project displays these counts, update the data layer to match." This prevents stale counts on marketing sites and docs pages after version bumps. (Field report #113)
  5b. **Description accuracy check (Radagast):** For projects that display command descriptions (marketing sites, docs sites, README generators), compare each command's user-facing description against the upstream method doc's actual steps. If the upstream method doc gained new steps, flags, or capabilities in this sync that aren't reflected in the site's description, flag: "Command /X gained [capability] in this sync but the site description doesn't mention it. Update the description in [data file]." Count-based checks catch missing entries; this catches stale descriptions on existing entries. The most common void sync change is adding capabilities to existing commands, not adding new commands. (Field report #267: 9 commands had outdated descriptions after a sync that added capabilities to 12 agents — the biggest feature was invisible on the site.)
  5c. **Version history check:** If VERSION.md was updated, compare the version table entries against any project pages that display release history (roadmap pages, changelog displays, "shipped versions" sections). Flag versions present in VERSION.md that are missing from site content. This prevents version drift between the methodology's version history and user-facing release pages.
@@ -95,6 +95,21 @@ This catches what static analysis misses: IPv6 binding, native module ABI compat
 
  **Semantic verification rule:** Verify semantic correctness of arguments, not just type correctness. Ask: is this the RIGHT value, not just a valid type? A function call that compiles and passes type-checking can still be fundamentally wrong if the wrong variable is passed. Check that each argument carries the intended meaning, not just a compatible shape. (Field report #258: aggregate spend parameter received a config object — type-compatible but semantically meaningless, causing NaN comparisons that silently fell through.)
 
+ **Step 4.5 — Adversarial Verification (vote-based REFUTE pass) (field report #346 #2):** Crossfire (above) attacks the codebase to discover NEW bugs. This sub-step is the opposite vector — it refutes the EXISTING findings already on the board. Run it on every **Critical** and **High** finding before it reaches the fix batch:
+
+ 1. For each Critical/High finding, spawn **≥2 skeptic agents** (drawn from different universes per the low-confidence escalation rule). Each skeptic is prompted to **REFUTE** the finding, not to confirm it: *"Here is a claimed defect. Read the actual code at the cited file:line and prove it is NOT a real issue. Default to REFUTED unless the code itself confirms the defect."*
+ 2. Each skeptic returns a vote: **CONFIRM** (the code at the cited location demonstrably exhibits the defect) or **REFUTE** (the defect cannot be reproduced from the cited code).
+ 3. **Keep a finding only if it receives ≥1 CONFIRM.** A finding that every skeptic refutes is dropped (logged as a refuted first-pass false positive, not deleted silently).
+ 4. **Re-rate severity from the votes**, not from the original author's assertion: a Critical that earns only one weak CONFIRM and one REFUTE drops to High or Medium; a finding that all skeptics CONFIRM with reproductions holds its severity.
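The vote rules in steps 2 to 4 can be sketched as a small function. This is an illustration, not part of the package. The one-level downgrade on a split vote is an assumption (the text only says such a finding "drops to High or Medium"), and the sketch does not model how strong a CONFIRM is.

```python
from typing import List, Optional

SEVERITIES = ["Low", "Medium", "High", "Critical"]  # ascending order


def adjudicate(severity: str, votes: List[str]) -> Optional[str]:
    """Return the re-rated severity, or None when the finding is refuted."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    if len(votes) < 2:
        raise ValueError("need at least 2 skeptic votes")
    if any(v not in ("CONFIRM", "REFUTE") for v in votes):
        raise ValueError("votes must be CONFIRM or REFUTE")

    confirms = votes.count("CONFIRM")
    if confirms == 0:
        return None  # every skeptic refuted: drop, but log as a false positive
    if confirms == len(votes):
        return severity  # unanimous CONFIRM: severity holds
    # ASSUMPTION: a split vote downgrades by exactly one level.
    return SEVERITIES[max(SEVERITIES.index(severity) - 1, 0)]
```

For example, `adjudicate("Critical", ["CONFIRM", "REFUTE"])` yields `"High"`, while two REFUTE votes yield `None`, so the caller can log the refuted finding instead of silently deleting it.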
+
+ Why default-to-refuted: across instrumented Gauntlets, **~38% of first-pass Criticals were false positives** — author confidence and adversarial-attack momentum inflate severity. An attacker prompted to find bugs will manufacture them; a skeptic prompted to refute them filters them. The two passes are complementary: Crossfire (attack for new bugs) → Adversarial Verification (refute existing findings).
+
+ **Verify the FIX, not just the finding (field report #348 #4 / #350 #4):** The refute pass must also challenge the **PROPOSED FIX**, not only the finding it addresses. For each fix the batch intends to apply, the skeptic asks: *does this fix introduce a NEW failure mode the original code did not have?* Specifically hunt for **wedge, unbounded retry, infinite loop, orphaned record, double-send** regressions. The risk is acute whenever a fix adds a **coordination primitive — a sentinel, a lock, a retry-state row, a fence/claim marker — without also adding a liveness signal** (a bounded timeout that is actually reachable, a heartbeat, a dead-man release). A coordination primitive with no reachable release path does not fix a bug; it converts a transient failure into a permanent wedge.
+
+ > **M5 mint-fence incident (field report #348 #4):** a fix added a stale-reclaim fence to recover stuck mint jobs after **120s**. But the reclaim window sat *inside* a BullMQ retry budget of only **~3s** — the 120s liveness threshold was structurally unreachable before the job exhausted its retries, so drafts that hit the fence wedged permanently in `FAILED` instead of being reclaimed. The fix's own coordination primitive (the fence) had no reachable liveness signal. The finding was real; the *fix* created a new Critical.
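The reachability argument in the incident above reduces to one comparison. Here is a hedged sketch, not part of the package, which assumes the retry budget can be approximated as the sum of the delays between attempts and ignores execution time. The delay values in the usage note are illustrative, chosen only to add up to the roughly 3s budget the report mentions.

```python
from typing import Sequence


def retry_budget_seconds(retry_delays_s: Sequence[float]) -> float:
    """Approximate how long a job stays retryable: the sum of retry delays.

    ASSUMPTION: execution time per attempt is ignored.
    """
    return float(sum(retry_delays_s))


def release_is_reachable(release_after_s: float,
                         retry_delays_s: Sequence[float]) -> bool:
    """True when the liveness threshold can fire before retries run out."""
    return release_after_s <= retry_budget_seconds(retry_delays_s)
```

With a 120s reclaim threshold and hypothetical delays of `[1.0, 2.0]`, `release_is_reachable(120, [1.0, 2.0])` is `False`: the fence can never release, which is the wedge the refute pass is meant to catch.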
+
+ > **Cross-system checkpoint is non-optional (field report #350 #4):** in a multi-mission Gauntlet, the cross-system checkpoint caught a **fix-induced Critical that a per-mission review's own fix had created** — the per-mission review verified its fix in isolation and passed it; only the whole-system pass saw the new failure mode the fix introduced. This is direct evidence that verifying a fix against the single mission that motivated it is insufficient. The Gauntlet-level refute-the-fix checkpoint stays in the protocol regardless of how green the per-mission reviews were.
+
  **Round 5 — The Council (convergence):**
  - Spock (Star Trek) — code quality after fixes
  - Ahsoka (Star Wars) — access control integrity
@@ -164,6 +179,22 @@ Fix batches happen between rounds:
 
  **Production-parity exit criterion:** Before any Gauntlet round can be marked PASS, verify that the test execution backend matches the project's declared production backend. If `PROJECT_VERSION.md` (or equivalent) declares PostgreSQL but `tests/conftest.py` autouse fixture pins SQLite (or vice versa), the Gauntlet **FAILS** regardless of green test counts. Tests pinned to the wrong backend silently mask the integrations that actually run in prod (RLS, asyncpg pools, advisory locks, LISTEN/NOTIFY, FOR UPDATE SKIP LOCKED, transaction semantics). Field report #315 M3: this slipped past 4 dual-backend Gauntlets on Union Station between v6.2.1 cutover and v7.6 — every Gauntlet was structurally blind to the runtime risk it was supposed to be reviewing. Concrete check at end of each round: `grep -nE "_backend\s*=\s*['\"]" tests/conftest.py` and reconcile against `cat PROJECT_VERSION.md | grep -i 'database\|backend'`. Mismatch = FAIL the round.
 
+ **Production-config boot exit criterion (Victory/launch-readiness Gauntlet) (field report #350 #1):** The #315 production-parity criterion above only reconciles the *test database backend*. It does NOT cover sandbox storage, sandbox email, sandbox extractor, or any other adapter that runs in a fake/sandbox mode under test but must resolve to a real implementation in production. For a Victory or launch-readiness Gauntlet, before any round can be marked PASS, run config validation in an **`APP_ENV=production` posture and ASSERT the app actually boots** under it. This catches, before launch:
+
+ - **Missing real adapters that throw** — a production adapter (`S3Storage`, `SESMailer`, a real extractor) whose constructor or first call raises because a required key/endpoint was never provisioned. Under sandbox the fake adapter swallows this; under `APP_ENV=production` it surfaces at boot.
+ - **Sandbox-in-prod** — config that silently falls back to the sandbox adapter when a production credential is absent, shipping fake storage/email/extraction to real users.
+ - **No prod-boot guard** — the absence of any startup assertion that production mode resolved zero sandbox adapters.
+
+ Concrete check: `APP_ENV=production <boot command> --check-config` (or the smallest invocation that triggers full adapter resolution) must exit 0 *and* log zero sandbox-adapter selections. A boot that throws, or that boots only by falling back to a sandbox adapter, **FAILS** the round.
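One way a prod-boot guard of the kind described above might look, as a sketch and not the package's implementation. It assumes the application can report its resolved adapters as a slot-to-class-name mapping and that sandbox adapters share a `Sandbox` name prefix. Both conventions are hypothetical, as are the `Sandbox*` names. `S3Storage` and `SESMailer` are the examples from the text.

```python
from typing import Dict, List

# ASSUMPTION: hypothetical convention that sandbox adapters share this prefix.
SANDBOX_PREFIX = "Sandbox"


def sandbox_selections(resolved: Dict[str, str]) -> List[str]:
    """Return the adapter slots that resolved to a sandbox implementation."""
    return sorted(
        slot for slot, adapter in resolved.items()
        if adapter.startswith(SANDBOX_PREFIX)
    )


def check_production_boot(app_env: str, resolved: Dict[str, str]) -> None:
    """Raise when production mode resolved any sandbox adapter."""
    if app_env != "production":
        return  # the guard only applies to the production posture
    leaked = sandbox_selections(resolved)
    if leaked:
        raise RuntimeError(
            "sandbox adapters selected in production: " + ", ".join(leaked)
        )
```

Calling `check_production_boot("production", {"storage": "S3Storage", "email": "SandboxMailer"})` raises. A startup script could turn that exception into a non-zero exit so the round fails rather than booting on a fallback.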
+
+ **Sandbox-blind-spot dimension (field report #350 #2):** A 100%-green **sandbox** test suite is *necessary but not sufficient*. Add a first-class round dimension that explicitly enumerates: **"what does the green sandbox suite NOT exercise *because* it runs in sandbox?"** Sandbox mode does not just substitute fake data — it changes which code paths execute. Concretely hunt for:
+
+ - **Selectors / accessors that throw only on real adapters** — a `get_url()`, `presign()`, or `extract()` that returns a canned value in sandbox but raises on the real implementation (missing region, unsigned URL, unsupported content type). The sandbox path never reaches the throwing branch.
+ - **Auto / silent paths suppressed by sandbox confidence pinning** — when sandbox pins a confidence score or classification to a constant, every downstream branch gated on that score is forced down one path. The auto-approve, auto-retry, or human-fallback branches that real (variable) confidence would trigger are never exercised by the green suite.
+ - **Coverage that is structurally unreachable in sandbox** — branches behind real rate limits, real pagination, real webhook signatures, real timeouts.
+
+ Output of this dimension is an explicit list: *"green sandbox suite does NOT cover: [path], [path], …"* — each entry is either covered by a production-posture test (see the boot criterion above) or logged as a known launch-risk gap. "All sandbox tests pass" is never, by itself, grounds to mark a launch-readiness round PASS.
+
  ## Finding Format
 
  Every finding, from every agent, in every round, uses this format:
@@ -323,6 +354,8 @@ Each agent reports a confidence score (0-100) on their findings. The score refle
 
  **Why this matters:** In the v8.0 Gauntlet, several "findings" were false positives that wasted fix time. Confidence scoring lets agents express uncertainty instead of presenting everything as definitive. Low-confidence findings get a second opinion before reaching the user.
 
+ **PRINCIPLE — Critical findings are unconditionally verified (field report #345 DEAL-003):** Confidence is an advisory signal for routing *Medium and below*. It is NEVER a fast-track that lets a **Critical**-severity finding skip adversarial verification. The 90-100 "skip re-verification" optimization above applies to High/Medium/Low only — a Critical at confidence 97 is routed to the adversarial refute pass exactly the same as a Critical at confidence 40. Severity dominates confidence: when the two conflict, the higher severity wins the routing decision. Do not enshrine a runtime `needs_verify` boolean (or any per-finding "already verified, skip" flag) into the finding schema as a way to opt a Critical out of verification — Critical-routes-to-verification is a structural property of the protocol, not a field an agent (or a fix author) can toggle. The cost of one false-negative Critical reaching production dwarfs the cost of re-verifying a true-positive one.
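The routing rule in the principle above can be sketched as a pure function of severity and confidence. It is an illustration, not part of the package. The non-Critical branch is an assumption that simplifies the "90-100 skip re-verification" optimization into a single threshold. Note there is deliberately no per-finding override argument.

```python
def needs_adversarial_verification(severity: str, confidence: int) -> bool:
    """Decide whether a finding is routed to the adversarial refute pass."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be in the range 0-100")
    if severity == "Critical":
        return True  # severity dominates confidence: never skipped
    # ASSUMPTION: below Critical, only confidence 90-100 skips re-verification.
    return confidence < 90
```

Because the decision depends only on these two inputs, a Critical at confidence 97 and a Critical at confidence 40 take the same path, and no flag in the finding schema can opt one out.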
+
  ## Sub-agent Failure Fallback
 
  If a sub-agent launch fails (API error, timeout, context exhaustion):
@@ -30,6 +30,8 @@
 
  Adversarial UX/UI QA review. Identify usability issues, inconsistencies, broken states, accessibility gaps, responsiveness problems. Implement safely in small batches. No redesigning for fun.
 
+ **Scope clarification — `/ux` is a UI/UX review verb, not a generic audit verb.** (Field report #342 F-3.) `/ux` reviews interface and experience: screens, flows, states, a11y, visual hierarchy, motion. **Documentation and content audits are out of `/ux`'s scope** — auditing prose, doc structure, broken links, stale instructions, or content accuracy is a different discipline with a different checklist. Route those to the doc-audit path (`/audit-docs`, see `DOC_AUDIT.md`), not here. `/ux` happens to be the most audit-shaped command in the roster, which tempts users to point every audit-flavored request at it; resist that. If a request is about *what the docs say* rather than *how the interface behaves*, hand off to the doc-audit path. (Tutorial/docs *surfaces* — the rendered page's usability, launch-context, prerequisite depth per Step 1.5 — remain in `/ux`'s scope; the *content audit* of those same docs does not.)
+
  ## When to Call Other Agents
 
  | Situation | Hand off to |
@@ -162,6 +164,57 @@ Any fire-and-forget background operation (AI generation, file processing, deploy
  Before hiding, relocating, or collapsing a UI container (dropdown, panel, menu, toolbar), list ALL actions inside it — primary (viewing, selecting, navigating) AND secondary (creating, deleting, configuring, exporting). Verify every action remains reachable after the redesign. A "simplification" that hides a version picker also hides the "New Version" button inside it.
  (Field report #22: workspace redesign hid the version creation button that lived inside a dropdown.)
 
+ ## Step 1.8 — Reference Grounding (World-Scan) — Mandatory
+
+ (Field reports #347, #2.)
+
+ Before Galadriel generates any visual direction — palette, type system, layout language, signature interaction — she must ground the work in the real design world. This step is **mandatory** input to every downstream generation step. Skipping it produces the single most common visual failure mode in agent-generated UI.
+
+ **The failure mode — committee-converges-on-the-mean.** When a committee of agents reasons about "what good design looks like" from training priors alone, every agent independently regresses toward the statistical center of its training distribution. The outputs agree with each other, feel internally consistent, and pass every internal review — yet land on the bland, averaged, instantly-recognizable look users now perceive as "AI slop." Consensus is not quality here; it is the symptom. The agents converged on the mean precisely *because* nothing pulled them off it. Internal agreement on visual direction, with no external reference, is a red flag, not a green light.
+
+ **The remedy — fan out to the real world first.** Before any visual generation, web-capable agents (Arwen, Éowyn) fan out to:
+
+ - **Award galleries:** Awwwards, FWA, CSSDA, Godly, Typewolf. These are curated, off-the-mean, and current.
+ - **The live competitor set:** the actual sites of the product's named competitors and adjacent best-in-class products — not a description of them, the live pages.
+
+ From that scan, extract **named** artifacts into a **reference dossier**:
+
+ - Specific sites worth stealing a move from (named, with the move identified: "Linear's command-palette transition," "Stripe's gradient-on-scroll hero," not "a clean SaaS site").
+ - Named typefaces and pairings actually in use (not "a modern sans").
+ - Named interactions and motion patterns (the signature moment, the page transition, the hover behavior) worth adapting.
+
+ **The dossier is required input downstream.** Every later generation step — Step 1.75 enchantment, Step 2 visual attack plan, any palette/type/layout proposal — must cite the dossier. A proposal with no reference anchor is unanchored from reality and is sent back. **Never generate visual direction from training priors alone.** The dossier is the gravity that pulls the work off the statistical mean.
+
+ ## Step 1.85 — Converging Creative Direction
+
+ (Field reports #351, #2.)
+
+ Reference grounding tells you where the real world is. These three disciplines keep your own output off the mean and make creative direction actually converge instead of looping.
+
+ ### Show, don't tell — prototype before you finalize
+
+ Creative direction does not converge from prose, mockups, or description. It converges only when a **feel-able interactive prototype of the signature moment** ships to a review URL someone can open and touch. The signature moment — the hero reveal, the command palette, the card-to-detail transition, whatever carries the product's character — must run in a browser at a real URL before the direction is called final. Reading "a smooth 200ms ease-out reveal" tells you nothing; opening the URL and feeling it tells you everything. Until the signature moment is feel-able at a URL, treat the direction as a proposal, not a decision. This is the fastest known way to break the description-loop where reviewers keep agreeing on words that mean different things to each of them.
+
+ ### Token-scoped theming — pivots must be cheap
+
+ Scope color and type to **semantic tokens** (`--color-surface`, `--color-accent`, `--text-heading`, `--text-body`) from the first component, never hardcoded values inside components. The test: a palette pivot or a type pivot must be a **token change, not a component rewrite**. If switching the accent color or swapping the heading typeface requires editing more than the token definitions, the theming is not token-scoped and the pivot is expensive — which means in practice the pivot won't happen, and the design freezes on its first guess. Cheap pivots are what let creative direction explore and actually converge instead of committing to the first idea by inertia. Celeborn (Step 2 design-system governance) enforces token usage; this step establishes *why* it is load-bearing for creative direction, not just consistency.
+
+ ### The de-AI checklist
+
+ Screen all copy and visuals against the tells that mark generated work as generated. Each tell below is a flag, not an automatic ban — but every flagged instance must be a deliberate, justified choice, never a default the model reached for:
+
+ **Copy tells:**
+ - **Em-dashes** used as the default connective rhythm (the most reliable single tell). Vary the punctuation; not every clause break is an em-dash.
+ - **Generic adjectives** — "seamless," "powerful," "robust," "intuitive," "elevate," "delightful," "effortless." Specific beats generic; show the thing instead of asserting it.
+
+ **Visual tells:**
+ - **Gradient-text** headings (the `bg-clip-text` rainbow/violet headline).
+ - **Pill eyebrows** — the small rounded-full badge above every hero headline.
+ - **Default Inter/Playfair pairing** — the reflexive "modern sans + elegant serif" combo. If the reference dossier (Step 1.8) didn't lead you there for a reason, don't reach for it by default.
+ - **Cream-editorial-as-trope** — the warm off-white background + serif + wide margins "editorial" look applied to products it doesn't fit, because the model treats it as shorthand for "premium."
+
+ A surface that trips three or more of these tells is presumed AI-slop and goes back for de-AI revision, anchored against the Step 1.8 reference dossier.
+
  ## Step 2 — UX/UI Attack Plan
 
  **Elrond:** IA, navigation, task flows, friction.
@@ -261,6 +261,25 @@ Oracle scans for methods that return success without side effects — the most d
 
  Flag as **High severity**. In financial systems (trading, payments, billing), flag as **Critical**. (Field report #125: `ProtectionService._place_stop_loss()` returned `True` after logging but never called the exchange. `OrderService.cancel_order()` returned `True` without cancelling.)
 
+ ### Failure Attribution (multi-file test runs)
+
+ A test failure observed during a multi-file suite run is **NOT attributed to your change** until BOTH of these hold:
+
+ 1. **It reproduces with that file run in ISOLATION.** Re-run only the failing test file by itself (e.g., `pytest path/to/test_x.py`, `npm test -- path/to/x.test.ts`, `go test ./pkg/x`). If the failure vanishes when the file runs alone, it is a cross-file collision, not your regression.
+ 2. **It does NOT reproduce on clean HEAD.** `git stash` your working changes, re-run the same isolated file, and observe. If the failure is present on clean HEAD too, your change did not cause it. `git stash pop` to restore.
+
+ Shared-DB and shared-fixture suites routinely produce cross-file collisions — duplicate-seed conflicts, ordering dependencies, leaked global state, autoincrement-id assumptions — that masquerade as regressions introduced by the change under review. Attributing one of these to your fix sends the QA pass down a false trail and can trigger a "revert the good fix" overcorrection. Run the isolation check and the clean-HEAD check before you write the bug down or blame the diff. (Field report #349 F-3)
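The two-condition attribution rule above can be written as a small decision function. This is an illustrative sketch, not part of the package. The two booleans are whatever the operator observed from the isolated run and the clean-HEAD run, and the label strings are hypothetical.

```python
def attribute_failure(fails_in_isolation: bool, fails_on_clean_head: bool) -> str:
    """Classify a failure seen during a multi-file suite run.

    fails_in_isolation: the failing file still fails when run by itself.
    fails_on_clean_head: the isolated file also fails with changes stashed.
    """
    if not fails_in_isolation:
        return "cross-file collision"  # vanishes alone: not a regression
    if fails_on_clean_head:
        return "pre-existing failure"  # present without the change
    return "regression from this change"  # both conditions hold
```

Only the last outcome justifies blaming the diff. The first two point at shared state or an already-broken baseline, so reverting the change would not help.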
+
+ ### Planted-Bug Check — Gates Must Gate
+
+ For every gate, threshold, or invariant a mission introduces (auth allowlist, eval scorer, rate cap, boot guard, validation boundary, feature flag), the review MUST confirm the gate actually gates: a deliberate inversion or revert of the gate's logic WOULD fail at least one test. Procedure — for each gate:
+
+ 1. Identify the line(s) that enforce the gate.
+ 2. Mentally (or, when cheap and reversible, actually) invert it — flip the comparison, negate the predicate, widen the allowlist, make the scorer return a constant pass, push the boundary off by one.
+ 3. Ask: does any existing test now go red? If yes, the gate is covered. If no test trips, the gate is **untested** — the finding is **High**, and the deliverable is the missing test that would have caught the inversion.
+
+ A gate with no test that fails on its inversion is a **vacuous invariant**: it looks like protection but enforces nothing, because nothing observes whether it holds. Recurring vacuous-invariant anti-patterns (these surfaced **4x in a single session**): an eval scorer that always passes regardless of output; an auth allowlist with an inverted `!`-check that admits everyone; an off-by-one cap boundary that never actually caps; a truthy boot-guard that is always truthy and so never guards. Treat any newly-introduced gate as guilty until a failing-on-inversion test proves it innocent. (Field report #352 #1)
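A small illustration of the planted-bug procedure above, using an off-by-one cap as the gate. It is not part of the package, and every name here is hypothetical. The sketch shows why a test that never probes the boundary leaves the gate vacuous, and how a boundary test catches the inversion.

```python
from typing import Callable, Iterable

Gate = Callable[[int, int], bool]


def under_cap(count: int, cap: int) -> bool:
    """The real gate: allow a request only while count is below the cap."""
    return count < cap


def under_cap_planted(count: int, cap: int) -> bool:
    """Planted bug: off-by-one boundary that admits one request too many."""
    return count <= cap


def weak_test(gate: Gate) -> bool:
    """Passes for both gates: it never probes the boundary."""
    return gate(0, 10) is True


def boundary_test(gate: Gate) -> bool:
    """Probes the boundary, so it fails on the planted gate."""
    return gate(9, 10) is True and gate(10, 10) is False


def gate_is_covered(tests: Iterable[Callable[[Gate], bool]],
                    real_gate: Gate, planted_gate: Gate) -> bool:
    """Covered when every test passes on the real gate and at least one
    test fails on the planted (inverted) gate."""
    tests = list(tests)
    passes_real = all(test(real_gate) for test in tests)
    catches_plant = any(not test(planted_gate) for test in tests)
    return passes_real and catches_plant
```

With only `weak_test`, `gate_is_covered` returns `False`: the suite is green but nothing observes the cap. Adding `boundary_test` flips it to `True`, which is the missing test the finding asks for.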
+
  ### Safety-Critical Return Value Verification
 
  For systems with safety-critical operations (stop-loss placement, circuit breakers, rollback triggers, payment captures, credential revocations): verify the return value of the safety operation BEFORE transitioning state. The pattern: `call safety operation → check return → only then transition`.
@@ -230,3 +230,30 @@ After pushing to remote, if the project runs on a persistent server (PM2, system
  2. **If stale:** Prompt: "Server is running an older version. Rebuild and restart? [Y/n]"
  3. **In blitz mode:** Auto-rebuild if a deploy script or PM2 ecosystem config exists.
232
232
  4. Pushing code to GitHub is NOT deploying it. The server must be rebuilt and restarted for changes to take effect. (Field report #104: 22 commits pushed but PM2 was still running v3.8.1 while code was v3.10.0.)
233
+
234
+ ## No Auto-Rotting Production-Status Footer (field report #342 F-4)
235
+
236
+ Do NOT add a "Production binary still vX.Y — vA, B, C await operator deploy" footer to the `PROJECT_VERSION.md` template (or any per-version block). The pattern is seductive — it reads as a helpful reminder when written — but it rots silently: it is accurate only at the instant of the version it was written under, and the *next* version bump leaves it pointing at a stale "still on vX.Y" claim that nobody re-reads. By the third release it actively lies about what production is running.
237
+
238
+ **Rule:** Production-deploy status lives in exactly two places, both of which a release bump already touches:
239
+
240
+ 1. **The single source of truth**, if the project keeps one — `docs/_truth.yml` (or equivalent machine-readable status file). One canonical `production_version:` field, not a prose footer.
241
+ 2. **The topmost "Current" block** of `PROJECT_VERSION.md` — the line Coulson already rewrites every bump (Step 5 changes `**Current:** X.Y.Z`). Deploy state, if tracked here at all, belongs adjacent to that line so it is impossible to bump the version without confronting it.
242
+
243
+ A per-version footer fails because it is *additive* — each bump appends a new one and leaves the old ones in place, so the file accumulates N footers of which N−1 are false. The Current block and the truth file are *overwritten* each bump, so they cannot drift. Coulson rejects any release diff that introduces an "await operator deploy" or "Production binary still" footer; route that information to the Current block instead.
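The rejection rule could be automated along these lines. This is a hedged sketch, not Coulson's actual check: `findRottingFooters` is a hypothetical helper, and matching on the two quoted phrases is an assumption about how the footer would be detected in a unified diff.

```typescript
// Phrases named in the rule above; matching on them is an assumption.
const FORBIDDEN_FOOTER_PATTERNS: RegExp[] = [
  /await operator deploy/i,
  /production binary still/i,
];

// Returns the added lines (unified-diff lines starting with "+", excluding
// the "+++" file header) that introduce a production-status footer.
export function findRottingFooters(unifiedDiff: string): string[] {
  return unifiedDiff
    .split("\n")
    .filter((line) => line.startsWith("+") && !line.startsWith("+++"))
    .map((line) => line.slice(1).trim())
    .filter((text) => FORBIDDEN_FOOTER_PATTERNS.some((re) => re.test(text)));
}
```

A release check would fail the diff when the returned array is non-empty and point the author at the Current block instead.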
244
+
245
+ ## Regenerating Generated CLAUDE.md Stack Blocks (field report #342 F-2)
246
+
247
+ When a generated `CLAUDE.md` (or any generated doc) embeds a project stack/inventory block — framework, language, test count, package versions — do NOT leave a promissory placeholder marker (`<!-- stack block: fill me in -->`, `[STACK_TBD]`, etc.) that depends on a human remembering to update it. Placeholder markers rot the same way the footer in F-4 does: they survive review, ship, and then read as authoritative once the brackets are forgotten.
248
+
249
+ **Pattern:** If the project keeps a machine-readable truth source — `docs/_truth.yml`, `package.json`, a manifest — a regeneration helper rewrites a **clearly-delimited generated block** in place from that source, so the block is reproducible and drift is impossible (re-run the helper, diff, commit). Wrap the block in explicit sentinels so the rewrite is surgical and the hand-written prose around it is never clobbered:
250
+
251
+ ```
252
+ <!-- BEGIN GENERATED: stack (do not edit by hand — run scripts/regen-claude-md.sh) -->
253
+ - **Framework:** Next.js 15.4
254
+ - **Language:** TypeScript 5.6 (strict)
255
+ - **Tests:** 1209 passing
256
+ <!-- END GENERATED: stack -->
257
+ ```
258
+
259
+ A working `scripts/regen-claude-md.sh` may ship alongside this discipline (reading `docs/_truth.yml` / `package.json` and rewriting only the text between the sentinels, leaving everything else byte-identical). If that script is absent, this section documents the intended pattern: the *generated* block is derived, never authored by hand, and never a placeholder. On every MINOR/MAJOR bump Coulson regenerates the block (or flags it for regeneration) rather than trusting that someone updated the prose by hand.
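To make the sentinel rewrite concrete, here is a minimal sketch of the in-place regeneration step. It is written in TypeScript for illustration only and is not the `scripts/regen-claude-md.sh` mentioned above; `regenerateBlock`, `renderStack`, and the `StackTruth` shape are hypothetical, and it assumes no generated block's name is a prefix of another's.

```typescript
// Rewrites only the text between the BEGIN/END sentinels for `name`, leaving
// every byte outside them untouched. Throws if either sentinel is missing so
// a broken document fails loudly instead of being silently skipped.
export function regenerateBlock(doc: string, name: string, body: string): string {
  const beginPrefix = `<!-- BEGIN GENERATED: ${name}`;
  const endMarker = `<!-- END GENERATED: ${name} -->`;

  const beginStart = doc.indexOf(beginPrefix);
  if (beginStart < 0) throw new Error(`missing BEGIN sentinel for "${name}"`);

  // The BEGIN sentinel may carry a trailing note, so find where it closes.
  const beginClose = doc.indexOf("-->", beginStart);
  const endStart = doc.indexOf(endMarker, beginStart);
  if (beginClose < 0 || endStart < 0 || endStart < beginClose) {
    throw new Error(`missing END sentinel for "${name}"`);
  }

  const head = doc.slice(0, beginClose + "-->".length);
  const tail = doc.slice(endStart);
  return `${head}\n${body.trimEnd()}\n${tail}`;
}

// Hypothetical truth-source shape; a real helper would load these values
// from docs/_truth.yml or package.json.
interface StackTruth {
  framework: string;
  language: string;
  testsPassing: number;
}

export function renderStack(t: StackTruth): string {
  return [
    `- **Framework:** ${t.framework}`,
    `- **Language:** ${t.language}`,
    `- **Tests:** ${t.testsPassing} passing`,
  ].join("\n");
}
```

Because the output depends only on the truth source and the sentinels, running the helper twice produces identical text, which is what makes "re-run, diff, commit" a meaningful drift check.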
@@ -115,6 +115,10 @@ This powers the Danger Room's live agent ticker. The wizard server watches this
115
115
 
116
116
  This is **methodology-driven logging**, not hook-driven. Hooks cannot extract agent identity from tool input — the orchestrator must write the log entry explicitly. (Field report #128, architectural review)
117
117
 
118
+ ### Workflow-Tool Progress-Tree Labels
119
+
120
+ When dispatching via the Workflow tool, set the agent **label** so the named character surfaces in the `/workflows` progress tree. Use the form `"<agent> · <key>"` (e.g., `"Picard · review:architecture"`, `"Kenobi · sentinel:auth"`, `"Galadriel · ux:a11y"`), or omit the label entirely so the underlying `agentType` surfaces on its own. If you instead pass only a dimension key like `review:architecture` as the label, that key OVERRIDES the agent identity and the tree shows the dimension instead of Picard/Kenobi/Galadriel — the roster becomes anonymous in the dashboard and the Danger Room ticker correlation breaks. Keep the character name as the leading token of every workflow label. (Field report #348 #2.)
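A small helper can keep the label form consistent. This is only a sketch: `workflowLabel` and `leadsWithAgent` are hypothetical names, and nothing here calls the Workflow tool itself; it only builds and checks the string.

```typescript
// Separator used in the label form above: space, middle dot (U+00B7), space.
const LABEL_SEPARATOR = " \u00B7 ";

// Builds "<agent> · <key>" and refuses an empty agent name, since the
// character name must be the leading token.
export function workflowLabel(agent: string, key: string): string {
  if (agent.trim() === "") throw new Error("agent name must lead the label");
  return `${agent.trim()}${LABEL_SEPARATOR}${key.trim()}`;
}

// True when a label leads with one of the known character names, i.e. it is
// not a bare dimension key such as "review:architecture".
export function leadsWithAgent(label: string, roster: readonly string[]): boolean {
  return roster.some((name) => label.startsWith(`${name}${LABEL_SEPARATOR}`));
}
```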
121
+
118
122
  ## Delegation Template
119
123
 
120
124
  ```
@@ -330,6 +334,21 @@ This pattern applies to:
330
334
  - Galadriel's UX (Samwise + Radagast re-verify)
331
335
  - Kenobi's Security (Maul re-probes remediations)
332
336
 
337
+ #### Verify the FIX, not just the finding
338
+
339
+ The adversarial-verify step has two distinct jobs, and orchestrators routinely collapse them into one:
340
+
341
+ 1. **Re-probe the fixed AREA** — after a fix lands, confirm the original finding is gone and no neighboring regression appeared. This is the Pass 2 above.
342
+ 2. **Interrogate the fix DESIGN** — before or as the fix lands, challenge the *proposed remediation itself* for NEW failure modes it introduces: wedge (a state that can never be exited inside the available budget), unbounded retry, infinite loop, orphaned record, double-send. This is NOT the same as re-probing the area; it scrutinizes the design of the change, not its installed effect.
343
+
344
+ Job 2 is **especially mandatory when the fix adds a coordination primitive** — a sentinel, a lock, a retry-state record, a fence, a dedup marker — **without a corresponding liveness signal** (a guaranteed path that releases the primitive, an upper bound on retries, a reclaim window that is actually reachable). A coordination primitive with no liveness signal is a wedge waiting to happen: it makes the original bug rarer but converts it into a stuck state that is harder to diagnose.
345
+
346
+ Motivating incidents:
347
+ - **M5 mint-fence** (field report #348 #1 / #350 #4): the fix added a mint fence so a draft couldn't be re-minted concurrently, with a reclaim window to recover abandoned fences. But the reclaim window was set *longer than the retry budget* — so every retry exhausted before the window opened, and the reclaim path was algebraically unreachable inside the retry budget. Drafts wedged permanently in `FAILED`. The fix's own coordination primitive (the fence) had no reachable liveness path.
348
+ - **M6 lifecycle-sweep** (field report #348 #1 / #350 #4): the fix swept lifecycle records on a schedule but compared against a stale `send_at` snapshot captured before the sweep, so a record whose `send_at` had advanced got swept AND re-sent — a double-send introduced by the remediation, not present in the original bug.
349
+
350
+ Both would have been caught by an adversarial pass that asked "what new failure mode does THIS fix create?" rather than only "is the old finding gone?" When a fix introduces a sentinel/lock/retry-state, the verify dispatch brief MUST name the wedge/loop/orphan/double-send checklist explicitly and require the agent to trace the liveness path.
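The M5 failure reduces to a comparison that can be checked before a fix ships. The sketch below is illustrative only: `RetryPolicy`, `retryBudgetMs`, and `isReclaimReachable` are hypothetical, and it assumes a simple multiplicative backoff, which may not match the real retry schedule.

```typescript
interface RetryPolicy {
  maxRetries: number; // retries after the first attempt
  baseDelayMs: number; // delay before the first retry
  backoffFactor: number; // multiplier applied per retry (1 = fixed delay)
}

// Time from the first attempt to the last retry, i.e. the whole retry budget.
export function retryBudgetMs(p: RetryPolicy): number {
  let total = 0;
  let delay = p.baseDelayMs;
  for (let i = 0; i < p.maxRetries; i++) {
    total += delay;
    delay *= p.backoffFactor;
  }
  return total;
}

// The fence can only be reclaimed if its window opens while retries remain.
// A window at or beyond the budget means every retry runs out first.
export function isReclaimReachable(reclaimWindowMs: number, p: RetryPolicy): boolean {
  return reclaimWindowMs < retryBudgetMs(p);
}
```

With three retries at 1000 ms doubling each time, the budget is 1000 + 2000 + 4000 = 7000 ms, so a 10000 ms reclaim window is unreachable while a 5000 ms window is not. Tracing the liveness path amounts to making this comparison explicit for whatever primitive the fix introduces.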
351
+
333
352
  **Important distinction:** The Agent tool enables **parallel analysis**, not parallel coding. Sub-agents return text findings — the lead agent then implements code changes sequentially. This is still faster than sequential analysis, but don't expect parallel file edits.
334
353
 
335
354
  ### Multi-Session Parallelism (Separate Terminals)
@@ -372,6 +391,18 @@ Proven in production: a full `/assemble --muster` (11 phases, 15+ agents) ran en
372
391
  | Track status, report to user | Do work an agent could do |
373
392
  | Git operations (commit, push) | Launch agent-to-agent dispatch |
374
393
 
394
+ ### Default to Fixing, Not to Asking Which to Fix
395
+
396
+ When a review surfaces a clear list of fixable findings, the orchestrator's DEFAULT is to apply them in batches — not to surface a multi-option "which subset should I fix?" picker and wait. A list of well-scoped findings with obvious remediations is a work queue, not a decision fork. Presenting it back to the user as a menu of options offloads triage the orchestrator was dispatched to do, and stalls a batch that could already be landing.
397
+
398
+ Apply the findings in batches (partition by domain/concern per the Concurrency Rules), verify after each batch, and report what was fixed. Only stop to ask when a choice is **genuinely architectural or irreversible** — e.g., two incompatible schema directions, a data migration that can't be rolled back, a dependency that changes the deploy target, or a trade-off the PRD is silent on (then follow Multi-agent conflict resolution). "Which of these 9 lint/logic findings should I fix?" is not such a choice; "should this be event-sourced or CRUD?" is. (Field report #343 F5.)
399
+
400
+ ### Use AskUserQuestion at Genuine Forks
401
+
402
+ The flip side of the anti-picker rule: when the orchestrator hits a **genuine creative or scope fork** — 2-3 mutually-exclusive directions, none obviously dominant, where guessing wrong means rework — present them with `AskUserQuestion` and an option preview for each, rather than silently picking one or surfacing a single take-it-or-leave-it option. Give each option a short label and a one-line preview of what it commits to (the trade-off, the consequence, what it forecloses), so the user can decide in one read instead of an interview.
403
+
404
+ Use it for: which of two layouts/IA directions, which scope to ship first when both are valid, an irreversible architectural split, a naming/contract convention that downstream agents will all inherit. Do NOT use it as a substitute for triage you should be doing yourself (see the anti-picker rule above), and do NOT pad it past 3 options — a fork with 6 options usually means the scope wasn't analyzed enough to narrow it. One option presented as a question ("shall I do X?") is also an anti-pattern: either it's the obvious default (just do it) or there's a real alternative (show both). (Field report #351 #5.)
405
+
375
406
  ### Standard Agent Brief
376
407
 
377
408
  Every agent launch MUST include a structured brief:
@@ -211,6 +211,19 @@ When reviewing architecture, identify all endpoints/services that mutate the sam
211
211
 
212
212
  When architecture requires accepting a known security risk (e.g., iframe sandbox weakening for UX, storing tokens in memory for operational continuity), document it as an ADR with explicit risk acceptance. Include: the tradeoff made, what is gained, what attack surface is expanded, what mitigations are in place, and who accepted the risk. This prevents the same finding from appearing in every future audit and reduces Gauntlet noise. (Field report #102: preview iframe `allow-scripts + allow-same-origin` sandbox escape was a known tradeoff but was never documented — flagged in every security pass.)
213
213
 
214
+ ### Fix-Direction Reconciliation Against Doctrine
215
+
216
+ For any access, permission, or contract fix, "verified" is not sufficient to make the fix actionable. A finding can be reproduced, root-caused, and confirmed by multiple agents and *still* carry a backwards fix — one that widens a permission, grants access to the wrong principal, or relaxes a contract the doctrine intends to tighten. Reproduction proves the behavior; it does not prove the fix moves in the correct direction. (Field report #349 F-2)
217
+
218
+ Before any such fix is accepted, the architect MUST do two things explicitly:
219
+
220
+ 1. **Name the governing SSOT.** Identify the single source of truth that governs the access/permission/contract being changed — the permission matrix, the relevant ADR, or the published API contract. If no SSOT exists for the boundary being touched, that absence is itself a finding: the fix is unanchored and must wait until the doctrine is written.
221
+ 2. **Reconcile the fix DIRECTION against that SSOT.** State, in the fix record, whether the change *loosens* or *tightens* the boundary, and *who gains or loses access* as a result. Then compare that direction to what the named SSOT prescribes. If the fix loosens a permission the matrix says should be tightened (or grants a role access the ADR reserves for another), the fix is backwards — reject it and re-derive the correct change from doctrine, regardless of how well-verified the underlying finding is.
222
+
223
+ The reconciliation belongs in the same record as the finding: *"SSOT: <permission-matrix row / ADR-NNN / contract endpoint>. Direction: <loosen|tighten>; <principal> gains/loses <access>. Doctrine prescribes: <tighten|loosen>. Reconciled: <match|MISMATCH — fix is backwards>."* A MISMATCH blocks the fix.
224
+
225
+ This mirrors the engage.md Step 2 requirement that access/permission findings name their governing SSOT and reconcile fix direction before synthesis — Picard applies the same gate at the architecture layer so a backwards fix never reaches an ADR or an implementer. (Field report #349 F-2)
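The reconciliation record lends itself to a mechanical check. The following is a sketch under stated assumptions: the `FixRecord` shape, the `reconcileFix` name, and the `UNANCHORED` status for a missing SSOT are all hypothetical, chosen to mirror the record format quoted above.

```typescript
type Direction = "loosen" | "tighten";

interface FixRecord {
  ssot: string | null; // permission-matrix row, ADR, or contract; null if none exists
  direction: Direction; // what the proposed fix does to the boundary
  principal: string; // who gains or loses access
  prescribed: Direction; // what the named SSOT says should happen
}

type Reconciliation =
  | { status: "match" }
  | { status: "MISMATCH"; reason: string }
  | { status: "UNANCHORED"; reason: string };

export function reconcileFix(fix: FixRecord): Reconciliation {
  // No governing doctrine: the absence is itself a finding, and the fix waits.
  if (fix.ssot === null || fix.ssot.trim() === "") {
    return { status: "UNANCHORED", reason: "no SSOT governs this boundary" };
  }
  // Direction disagrees with doctrine: the fix is backwards, however well
  // verified the underlying finding is.
  if (fix.direction !== fix.prescribed) {
    return {
      status: "MISMATCH",
      reason: `fix would ${fix.direction} access for ${fix.principal}, but ${fix.ssot} prescribes ${fix.prescribed}`,
    };
  }
  return { status: "match" };
}
```

Only a `match` result would let the fix proceed; both other statuses block it.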
226
+
214
227
  ### Strategy Consolidation Check
215
228
 
216
229
  When a system implements N parallel strategies for the same goal (payment providers, notification channels, API versions, deployment targets, content pipelines), periodically verify that each strategy still justifies its maintenance cost. If usage data shows one strategy handling 95%+ of traffic or value while the others sit idle or near-zero, the idle strategies are not "options" — they are dead code with maintenance burden.
@@ -356,6 +356,25 @@ Do NOT create custom DDL in test files — it drifts from the real schema (missi
356
356
 
357
357
  Custom DDL causes test DB schema mismatches that require 2-3 fix-and-retry cycles per occurrence. (Field report #31)
358
358
 
359
+ ### Failure Attribution in Shared-State Suites
360
+
361
+ When a test fails in a suite that shares mutable state across files (a shared test DB, module-level singletons, a global fixture, an ordering-sensitive runner), do NOT attribute a multi-file failure to your change until you have reproduced it in isolation. Shared state means a failure can surface in file B while the root cause lives in file A — or in test ordering itself, not in your edit at all. (Field report #349 F-3)
362
+
363
+ **Procedure:**
364
+
365
+ 1. **Isolate the failing file.** Run only the failing test file (or the single test), so cross-file state pollution can't contribute. Use the framework's isolation/single-worker flag so the runner doesn't parallelize or randomize:
366
+
367
+ | Framework | Isolate single-worker / no parallelism | Disable random ordering |
368
+ |-----------|----------------------------------------|-------------------------|
369
+ | vitest | `vitest run --pool=forks --poolOptions.forks.singleFork <file>` (`--no-threads` is the pre-1.0 spelling; Vitest 1.0 removed it) | `--sequence.shuffle=false` |
370
+ | jest | `jest --runInBand <file>` | None needed: in-file order is deterministic unless `--randomize` is passed (`--testSequencer` only swaps in a custom file sequencer) |
371
+ | pytest | `pytest <file>::<test>` | `pytest -p no:randomly` (disable pytest-randomly) |
372
+
373
+ 2. **Compare against clean HEAD.** Stash your change (`git stash`, or `git stash -u` if it adds untracked files) and re-run the same isolated command on a clean tree. If it still fails on clean HEAD, the failure is pre-existing, not yours. Restore with `git stash pop` afterward.
374
+ 3. **Only after isolation + clean-HEAD comparison** attribute the failure to your change, and fix the actual cause rather than the symptom.
375
+
376
+ This is the canonical rule in `/docs/methods/QA_ENGINEER.md` (Failure Attribution) — see it for the full decision tree. This section is the testing-runner-flag companion to it.
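The three-step procedure can be summarized as a small decision function. This is a sketch for illustration, not the decision tree from `QA_ENGINEER.md`: the `IsolatedRuns` shape, the `Verdict` labels, and `attributeFailure` are hypothetical.

```typescript
type Verdict =
  | "not-reproducible-in-isolation" // suspect shared state or test ordering
  | "pre-existing" // fails on clean HEAD too
  | "caused-by-change";

interface IsolatedRuns {
  failsInIsolationWithChange: boolean; // step 1: isolated run on your branch
  failsInIsolationOnCleanHead: boolean; // step 2: same command after stashing
}

export function attributeFailure(runs: IsolatedRuns): Verdict {
  // Passing alone but failing in the full suite points at cross-file state
  // or ordering, so the edit is not yet proven to be the cause.
  if (!runs.failsInIsolationWithChange) return "not-reproducible-in-isolation";
  // Failing on a clean tree means the failure predates the change.
  if (runs.failsInIsolationOnCleanHead) return "pre-existing";
  // Step 3: only now is the failure attributed to the change.
  return "caused-by-change";
}
```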
377
+
359
378
  ## Setup Checklist
360
379
 
361
380
  When setting up testing for a new project:
@@ -34,6 +34,9 @@ Reference implementations for common code structures. These show the **shape and
34
34
  | Data Pipeline | `data-pipeline.ts` | ETL with checkpoint/resume, quality checks, idempotent processing | Node.js streams, Python polars, SQL/dbt |
35
35
  | Backtest Engine | `backtest-engine.ts` | Walk-forward validation, no-lookahead, Sharpe/drawdown metrics | Python vectorbt/backtrader |
36
36
  | Execution Safety | `execution-safety.ts` | Order validation, position limits, exchange precision, paper/live toggle | CCXT, Alpaca, IBKR |
37
+ | Design Tokens | `design-tokens.ts` | Semantic color/type tokens so theme pivots are a token change (field report #351) | CSS vars + Tailwind + React |
38
+ | Nginx Vhost | `nginx-vhost.conf` | Cloudflare-Flexible-safe vhost: security headers, ACME passthrough (field report #351) | Nginx |
39
+ | Error Message Categorization | `error-message-categorization.tsx` | Categorize errors at the UI boundary before showing copy (field report #351) | React (framework-agnostic notes) |
37
40
 
38
41
  ## How to Use
39
42
 
@@ -337,6 +337,69 @@ export const CLAUDE_PROMPT_EVAL_CATEGORIES = {
337
337
  * 7. cost per case within 20% of baseline
338
338
  */
339
339
 
340
+ // --- Live eval layer: the pre-launch gate (field report #352, #4) ---
341
+
342
+ /**
343
+ * THE LIVE EVAL LAYER IS THE PRE-LAUNCH GATE.
344
+ *
345
+ * Deterministic and sandbox-adapter evals (fixed inputs, fake-data runners)
346
+ * verify your *plumbing* — scoring functions, tag breakdowns, comparison
347
+ * thresholds. They CANNOT catch model-output-shape bugs, because the runner
348
+ * never calls a real model. The shape of what a live model actually emits —
349
+ * extra prose, null fields, reordered keys, casing drift — only appears when
350
+ * you run against the real provider.
351
+ *
352
+ * Field report #352: a classifier passed every sandbox eval (the fake runner
353
+ * returned hand-written JSON), then crashed in production on launch day
354
+ * because the live model emitted `null` for an absent optional field and the
355
+ * Zod `.optional()` parse rejected it. The deterministic layer was green the
356
+ * whole time. The bug was structurally invisible to it.
357
+ *
358
+ * Rule: before any launch, run AT LEAST ONE eval pass with a LIVE model
359
+ * runner (real provider call), not just the sandbox runner. Treat the live
360
+ * pass as a release gate — a deterministic-only green is necessary but never
361
+ * sufficient. Wire it as the final, non-skippable category in CI.
362
+ *
363
+ * // Sandbox pass — fast, free, catches plumbing regressions:
364
+ * await suite.run(sandboxRunner, version, 'sandbox')
365
+ *
366
+ * // Live pass — the actual gate, catches output-shape bugs:
367
+ * await suite.run(liveModelRunner, version, 'claude-sonnet-4-20250514')
368
+ */
369
+
370
+ /**
371
+ * GOTCHA: live models emit `null` for absent optionals — Zod `.optional()`
372
+ * accepts `undefined`, NOT `null` (field report #352, #4).
373
+ *
374
+ * `z.string().optional()` is `string | undefined`. A live model serializing
375
+ * "this field is absent" almost always emits JSON `null`, which deserializes
376
+ * to JS `null` — and `null` fails `.optional()`. The fix is to normalize
377
+ * null-to-undefined BEFORE Zod validation (do NOT reach for `.nullable()`
378
+ * everywhere — that leaks `null` into downstream types and just moves the
379
+ * problem). Normalize at the boundary, validate clean shapes inside.
380
+ *
381
+ * const Schema = z.object({ label: z.string(), reason: z.string().optional() })
382
+ * const raw = JSON.parse(modelOutput) // { label: 'billing', reason: null }
383
+ * const parsed = Schema.parse(normalizeNullsToUndefined(raw)) // ✓ reason -> undefined
384
+ */
385
+ export function normalizeNullsToUndefined<T>(value: T): T {
386
+ if (value === null) return undefined as T
387
+ if (Array.isArray(value)) {
388
+ return value.map((item) => normalizeNullsToUndefined(item)) as unknown as T
389
+ }
390
+ if (value && typeof value === 'object') {
391
+ const out: Record<string, unknown> = {}
392
+ for (const [key, val] of Object.entries(value as Record<string, unknown>)) {
393
+ const normalized = normalizeNullsToUndefined(val)
394
+ // Drop keys whose value normalized to undefined so Zod `.optional()`
395
+ // treats them as truly absent rather than present-with-undefined.
396
+ if (normalized !== undefined) out[key] = normalized
397
+ }
398
+ return out as T
399
+ }
400
+ return value
401
+ }
402
+
340
403
  /**
341
404
  * Framework adaptations:
342
405
  *
@@ -325,6 +325,95 @@ function createLogger(logPath: string): { log: (msg: string) => void; close: ()
325
325
  };
326
326
  }
327
327
 
328
+ // ── .env Parsing (literal, $-safe) ────────────────────
329
+ // field report #344 F1: never source secrets via `export $(cat .env)` /
330
+ // `eval "$(cat .env)"`. The shell performs variable expansion and word
331
+ // splitting on the RHS, so a `$`-bearing secret — bcrypt hashes
332
+ // ($2b$...), JWTs, Postgres URLs with `$` in the password, anything with
333
+ // `$VAR`/`${...}`/backticks — gets mangled or silently truncated. Parse
334
+ // literally instead: read each line, split on the FIRST `=` only, and keep
335
+ // the value byte-for-byte (no expansion, no eval). For shells, the
336
+ // equivalent is `while IFS='=' read -r k v; do export "$k=$v"; done < .env`
337
+ // — note IFS='=' and `read -r` (raw, no backslash processing), which never
338
+ // re-expands the value (skip blank and `#` lines first; `export` rejects them).
339
+ //
340
+ // Prefer a runtime-native loader where available — it sidesteps the shell
341
+ // entirely:
342
+ // - Node 20.6+: `node --env-file=.env daemon.js` (literal parse, no shell).
343
+ // - systemd: `EnvironmentFile=/etc/voidforge/heartbeat.env` (also literal;
344
+ // unit-file `Environment=` lines do NOT undergo shell expansion).
345
+ // Use this helper only when you must parse `.env` in-process.
346
+
347
+ function parseDotenv(contents: string): Record<string, string> {
348
+ const out: Record<string, string> = {};
349
+ for (const rawLine of contents.split('\n')) {
350
+ const line = rawLine.replace(/\r$/, '');
351
+ // Skip blanks and comments. A leading `export ` prefix is tolerated.
352
+ const trimmed = line.trimStart();
353
+ if (trimmed === '' || trimmed.startsWith('#')) continue;
354
+ const body = trimmed.startsWith('export ') ? trimmed.slice(7) : trimmed;
355
+
356
+ // Split on the FIRST `=` only — values may legitimately contain `=`.
357
+ const eq = body.indexOf('=');
358
+ if (eq < 0) continue; // not a KEY=VALUE line — ignore, don't guess
359
+ const key = body.slice(0, eq).trim();
360
+ if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(key)) continue; // invalid env name
361
+
362
+ let value = body.slice(eq + 1);
363
+ // Strip a single layer of matching surrounding quotes. Inside quotes the
364
+ // value is taken LITERALLY — no `$` expansion, no eval — which is the
365
+ // whole point: `PASS='p@$$w0rd'` keeps its `$$` intact.
366
+ if (value.length >= 2 &&
367
+ ((value[0] === '"' && value[value.length - 1] === '"') ||
368
+ (value[0] === "'" && value[value.length - 1] === "'"))) {
369
+ value = value.slice(1, -1);
370
+ } else {
371
+ // Unquoted: trim trailing inline whitespace only (POSIX-ish), never
372
+ // touch interior `$` characters.
373
+ value = value.trimEnd();
374
+ }
375
+ out[key] = value;
376
+ }
377
+ return out;
378
+ }
379
+
380
+ // ── systemd hardening stanza (Node daemons) ───────────
381
+ // field report #344 F3: when running this daemon under systemd, harden the
382
+ // unit — but DO NOT set `MemoryDenyWriteExecute=true` for a Node/V8 process.
383
+ // V8's JIT allocates pages that are written and then executed (it manages its
384
+ // own W^X internally); MDWE forbids any write+exec mapping, so the daemon
385
+ // takes a SIGTRAP and dies at boot, usually before it logs a single line. The
386
+ // safe, high-value sandbox flags below give most of MDWE's benefit without the
387
+ // JIT collision:
388
+ //
389
+ // [Unit]
390
+ // Description=VoidForge Heartbeat daemon
391
+ // After=network-online.target
392
+ // Wants=network-online.target
393
+ //
394
+ // [Service]
395
+ // Type=simple
396
+ // ExecStart=/usr/bin/node /opt/voidforge/daemon.js
397
+ // # Parsed literally (see #344 F1). systemd has no trailing-comment syntax, so notes sit on their own lines.
398
+ // EnvironmentFile=/etc/voidforge/heartbeat.env
399
+ // Restart=on-failure
400
+ // RestartSec=5
401
+ //
402
+ // # Hardening (keep these): no setuid/setgid privilege escalation; /usr, /boot, /etc
403
+ // # mounted read-only; /home, /root, /run/user hidden; private /tmp + /var/tmp namespace.
404
+ // NoNewPrivileges=true
405
+ // ProtectSystem=full
406
+ // ProtectHome=true
407
+ // PrivateTmp=true
408
+ // # MemoryDenyWriteExecute=true is OMITTED ON PURPOSE: it breaks the V8 JIT (SIGTRAP at boot). Re-enable ONLY for Go/Rust/static daemons with no JIT.
409
+ //
410
+ // [Install]
411
+ // WantedBy=multi-user.target
412
+ //
413
+ // Go, Rust, and other AOT-compiled daemons emit no executable pages at
414
+ // runtime, so for THEM you can and should keep `MemoryDenyWriteExecute=true`.
415
+ // The omission above is V8-specific, not a general weakening.
416
+
328
417
  export {
329
418
  writePidFile, checkStalePid, removePidFile,
330
419
  generateSessionToken, validateToken,
@@ -333,6 +422,7 @@ export {
333
422
  setupSignalHandlers,
334
423
  JobScheduler,
335
424
  createLogger,
425
+ parseDotenv,
336
426
  PID_FILE, SOCKET_PATH, TOKEN_FILE, STATE_FILE, LOG_FILE,
337
427
  };
338
428
  export type { DaemonState, HeartbeatState, ScheduledJob };