voidforge-build 23.11.4 → 23.12.1
- package/dist/.claude/agents/batman-qa.md +1 -0
- package/dist/.claude/agents/galadriel-frontend.md +2 -0
- package/dist/.claude/agents/kusanagi-devops.md +4 -0
- package/dist/.claude/agents/lucius-config.md +6 -0
- package/dist/.claude/agents/samwise-accessibility.md +4 -0
- package/dist/.claude/agents/silver-surfer-herald.md +13 -4
- package/dist/.claude/commands/architect.md +9 -0
- package/dist/.claude/commands/assemble.md +4 -1
- package/dist/.claude/commands/assess.md +13 -1
- package/dist/.claude/commands/audit-docs.md +106 -0
- package/dist/.claude/commands/deploy.md +29 -1
- package/dist/.claude/commands/engage.md +19 -1
- package/dist/.claude/commands/gauntlet.md +23 -4
- package/dist/.claude/commands/imagine.md +15 -0
- package/dist/.claude/commands/sentinel.md +15 -0
- package/dist/.claude/commands/ux.md +36 -0
- package/dist/.claude/commands/void.md +1 -0
- package/dist/CHANGELOG.md +65 -0
- package/dist/CLAUDE.md +9 -0
- package/dist/VERSION.md +3 -1
- package/dist/docs/methods/AI_INTELLIGENCE.md +33 -0
- package/dist/docs/methods/ASSEMBLER.md +31 -2
- package/dist/docs/methods/BUILD_PROTOCOL.md +2 -0
- package/dist/docs/methods/CAMPAIGN.md +46 -0
- package/dist/docs/methods/DEVOPS_ENGINEER.md +194 -0
- package/dist/docs/methods/DOC_AUDIT.md +92 -0
- package/dist/docs/methods/FORGE_KEEPER.md +16 -5
- package/dist/docs/methods/GAUNTLET.md +38 -0
- package/dist/docs/methods/PRODUCT_DESIGN_FRONTEND.md +57 -0
- package/dist/docs/methods/QA_ENGINEER.md +21 -0
- package/dist/docs/methods/RELEASE_MANAGER.md +27 -0
- package/dist/docs/methods/SECURITY_AUDITOR.md +12 -1
- package/dist/docs/methods/SUB_AGENTS.md +54 -0
- package/dist/docs/methods/SYSTEMS_ARCHITECT.md +13 -0
- package/dist/docs/methods/TESTING.md +19 -0
- package/dist/docs/patterns/README.md +3 -0
- package/dist/docs/patterns/ai-eval.ts +63 -0
- package/dist/docs/patterns/daemon-process.ts +90 -0
- package/dist/docs/patterns/database-migration.ts +65 -0
- package/dist/docs/patterns/deploy-preflight.ts +85 -2
- package/dist/docs/patterns/design-tokens.ts +338 -0
- package/dist/docs/patterns/error-message-categorization.tsx +376 -0
- package/dist/wizard/lib/patterns/daemon-process.d.ts +2 -1
- package/dist/wizard/lib/patterns/daemon-process.js +89 -1
- package/package.json +2 -2
|
@@ -97,6 +97,11 @@ For each service in `docker-compose.yml`, verify:
|
|
|
97
97
|
7. **Dependency health** — `depends_on` with `condition: service_healthy` (compose v2.1+). Without it, the app starts before its database is ready.
|
|
98
98
|
(Field report #280)
|
|
99
99
|
|
|
100
|
+
**Compose validation goes deeper than syntax (field report #352 #2).** `docker compose config` only validates *syntax* — it renders the merged YAML and exits 0 even when the resulting topology is wrong. Two failure modes it will not catch:
|
|
101
|
+
|
|
102
|
+
- **Dependency closure.** A service can reference a network, volume, or `depends_on` target whose definition exists but whose *startup* chain is broken. Check the closure with `docker compose up --dry-run` — it walks the full dependency graph and reports what would actually start (and in what order) without launching containers.
|
|
103
|
+
- **Overlay merge, not overlay replace.** Compose **merges** list-and-map fields like `depends_on` and `environment` across overlay files (`-f base.yml -f docker-compose.prod.yml`); it does not replace them. The classic trap: `base.yml` declares `depends_on: [redis]` for development, and the prod overlay tries to drop it with `depends_on: []`. The empty list **merges into** the base list, the `redis` edge **survives**, and prod still waits on (or starts) a dev-only Redis. To *replace* rather than merge, use the override tags: `depends_on: !override []` replaces the whole list, and `depends_on: !reset null` removes the key entirely. Verify the rendered result with `docker compose config` and confirm the unwanted edge is actually gone; never assume the overlay won.
|
|
104
|
+
|
|
100
105
|
**L — Monitoring:** Health endpoint (/api/health checking DB, Redis, disk). External uptime monitor. Request logging (method, path, status, duration). Error tracking. Slow query logging (>1s). Worker job logging. Alerts: CPU >80%, Memory >85%, Disk >80%.
|
|
101
106
|
|
|
102
107
|
**Build Staleness Detection (health endpoint):** The health endpoint MUST include a build fingerprint check. At startup, capture a build fingerprint (git commit hash, `BUILD_HASH` env var, or entry bundle mtime). Include it in `/api/health` responses. After any deploy, compare the health endpoint's fingerprint against the expected value. A mismatch means the process serves stale code — the build completed but was never reloaded. Automate: if health fingerprint != deployed commit, trigger process reload. This is the #1 cause of "I deployed but nothing changed" incidents. (Field reports #278, #279)
|
|
@@ -164,6 +169,33 @@ If a process manager (PM2, systemd, Docker, supervisord) owns the application po
|
|
|
164
169
|
|
|
165
170
|
**Detection rule:** When writing CLAUDE.md "How to Run" sections or session restart commands, check if the project uses a process manager (`ecosystem.config.js`, `docker-compose.yml`, `*.service` files). If yes, the restart command MUST go through the PM — not through port killing.
|
|
166
171
|
|
|
172
|
+
### PM2 Operational Foot-guns
|
|
173
|
+
|
|
174
|
+
**`pm2 reload <config>` does NOT re-read log paths.** `error_file` / `out_file` paths bind at process *registration* time, not at reload time (field report #343 F9). If you change a log path in `ecosystem.config.js` and run `pm2 reload`, PM2 keeps writing to the old paths — the new ones never take effect, and a log-rotation or disk-pressure fix silently does nothing. Changing log paths requires a full re-registration cycle:
|
|
175
|
+
```bash
pm2 delete <app>   # drop the old registration
pm2 start ecosystem.config.js --cwd /path/to/project
pm2 save           # persist so the new paths survive reboot
```
|
|
180
|
+
The same applies to any other property that binds at registration (`exec_mode`, `instances`, `cwd`): `pm2 reload` reloads code, not the process definition.
|
|
181
|
+
|
|
182
|
+
**Multi-user deploy setups need per-user git identity (field report #343 F3).** When each environment runs as a different OS user (e.g. `deploy-staging`, `deploy-prod`), any git operation that writes a commit or tag object as that user (a merge commit, a `git stash`, an annotated tag, an auto-commit of generated lockfiles) fails with `fatal: empty ident name (for <user@host>) not allowed` if that user has no `user.email` / `user.name`. The fault is invisible until a fallback path that commits actually runs in production. Provision git identity per deploy user:
|
|
183
|
+
```bash
sudo -u deploy-prod git config --global user.email "deploy@example.com"
sudo -u deploy-prod git config --global user.name "Prod Deploy"
```
|
|
187
|
+
Add this to `provision.sh` for every Unix user that will run git as part of a deploy or fallback path.
|
|
188
|
+
|
|
189
|
+
### Deploy-Strategy Nomenclature Check
|
|
190
|
+
|
|
191
|
+
If a deploy script's comments or docs claim **blue-green** or **zero-downtime**, verify the code actually implements an atomic-swap mechanism before believing the label (field report #343 F7). A real zero-downtime swap is one of:
|
|
192
|
+
|
|
193
|
+
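- **temp-build-then-rename**: build into `release-new/`, then repoint a `current` symlink at it with one atomic rename (GNU: `ln -s release-new current.tmp && mv -T current.tmp current`); a plain `mv release-new release` is not a swap when `release/` already exists, because it moves the new build inside the old directory,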
|
|
194
|
+
- **container swap** — start the new container, health-check it, then cut traffic over and stop the old one, or
|
|
195
|
+
- **load-balancer cutover** — add the new instance to the pool, drain and remove the old one.
|
|
196
|
+
|
|
197
|
+
A `stop → build → start` loop mislabeled "blue-green" serves nothing during the build window and produces a 502 gap on every deploy. The label is not the mechanism. Audit check: grep the deploy script for the claim, then confirm a rename/symlink-repoint, container cutover, or LB pool change exists. If it's a stop-build-start loop, either fix it to atomic-swap or correct the comment — a mislabeled strategy hides a recurring outage.
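The audit check above can be roughed out as a shell function. This is a heuristic sketch, not a complete auditor: the name `audit_deploy_label` and both grep patterns are hypothetical, and the mechanism pattern only recognizes symlink or rename swaps. Container and load-balancer cutovers would still need a manual read of the script.

```bash
# audit_deploy_label SCRIPT
# Exit 0: no zero-downtime claim, or the claim is backed by a symlink/rename swap.
# Exit 1: the claim is present but no swap mechanism was found (likely mislabeled).
audit_deploy_label() {
    script=$1
    # 1. Does the script claim blue-green / zero-downtime at all?
    if ! grep -Eiq 'blue[- ]?green|zero[- ]?downtime' "$script"; then
        return 0
    fi
    # 2. Look for evidence of an atomic swap (assumed spellings: `ln -s...`, `mv -T`).
    if grep -Eq 'ln[[:space:]]+-[a-zA-Z]*s|mv[[:space:]]+-T' "$script"; then
        return 0
    fi
    echo "MISLABELED: $script claims zero-downtime but shows no swap mechanism" >&2
    return 1
}
```

A non-zero exit is a prompt to read the script, not a verdict; a script can contain `ln -s` for unrelated reasons and still be a stop-build-start loop.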
|
|
198
|
+
|
|
167
199
|
### CI runs `npm test` at repo root
|
|
168
200
|
|
|
169
201
|
In monorepo CI workflows, run `npm test` at the repository root — NOT `npm run test -w <workspace-name>`. The workspace-scoped form skips the root `pretest` hook, silently bypassing any root-level validators (agent-ref checkers, gate tests, consistency checks).
|
|
@@ -191,6 +223,21 @@ fi
|
|
|
191
223
|
|
|
192
224
|
Applies to: Vercel Git Integration, Cloudflare Pages Git Integration, Netlify Git Integration, Firebase web-hook auto-deploys.
|
|
193
225
|
|
|
226
|
+
### The served artifact is not the built artifact
|
|
227
|
+
|
|
228
|
+
Every step exiting 0 — `git pull` ✓, `npm run build` ✓, `pm2 reload` / `docker compose up` ✓ — proves the build *ran*; it does NOT prove the **served** bundle is the one you just built (field report #349 F-1). The two can diverge whenever the thing that builds and the thing that serves are different processes pointed at different paths. The canonical split: a **host nginx static root** serves `/var/www/app/dist`, while the build runs *inside a Docker container* and writes to a **container-internal `dist`** that the host root never sees. Build succeeds, container restarts clean, health check is green — and prod serves the previous bundle indefinitely because nginx is reading a directory nobody rebuilt.
|
|
229
|
+
|
|
230
|
+
Rule: after deploy, confirm the SERVED bundle matches the BUILT one by **fingerprint fetched back through the public/served path** — not by exit codes. Capture a build fingerprint (git short SHA, `BUILD_HASH`, or the hashed entry-bundle filename) at build time, then fetch it back through the real serving path and assert equality:
|
|
231
|
+
```bash
EXPECTED="$(git rev-parse --short HEAD)"
# Pull the fingerprint through the SERVED path (the public URL or the host static root,
# whichever end users actually hit), never the build directory.
# -f makes curl fail on an HTTP error instead of returning the error page as the "fingerprint".
SERVED="$(curl -fsS "https://$DEPLOY_URL/version.txt")"  # or grep the hashed main.<hash>.js from index.html
[ "$SERVED" = "$EXPECTED" ] || { echo "SERVED ARTIFACT MISMATCH: served=$SERVED built=$EXPECTED"; exit 1; }
```
|
|
238
|
+
|
|
239
|
+
This extends to manual `/deploy` the two automated checks already in this doc. **Build Staleness Detection** (§health endpoint) catches a build-but-no-reload within one process, where the process keeps serving stale code, and **Post-push live-URL fingerprint** (§above) catches a broken platform auto-deploy. This entry applies the same fingerprint discipline to the self-hosted, multi-location, hand-run deploy: assert that the served fingerprint equals the built fingerprint as the final gate of any manual deploy, not just platform pushes.
|
|
240
|
+
|
|
194
241
|
### Methodology-exposure check (static-host deploys)
|
|
195
242
|
|
|
196
243
|
After deploying to a static CDN (Cloudflare Pages, Vercel, Netlify, Firebase, S3+CloudFront), curl a known methodology path and assert 404 / denied:
|
|
@@ -293,6 +340,23 @@ If health check fails after deploy:
|
|
|
293
340
|
|
|
294
341
|
**PM2 discipline: never `pm2 delete` + `pm2 start` without `--cwd`.** Always specify the working directory explicitly: `pm2 start ecosystem.config.js --cwd /path/to/project`. Without `--cwd`, PM2 resolves paths relative to the current shell directory, which may differ from the project root — especially in deploy scripts that `cd` between operations. A `pm2 start` from the wrong directory silently starts the process with wrong paths, serving 404s on every route. (Triage fix from field report batch #149-#153.)
|
|
295
342
|
|
|
343
|
+
### Docker Cleanup Preflight
|
|
344
|
+
|
|
345
|
+
Before any `rm -rf` against a Docker **bind-mount** path (volumes the container wrote to as root — pgdata, redis dumps, uploaded files), preflight the ownership; do not just run the delete and hope (field report #353 RC-003). Docker bind-mounts written by a container default to **root** ownership on the host, so an unprivileged agent's `rm -rf` fails partway with `Permission denied`, often after deleting the writable half of the tree — a worse state than not starting.
|
|
346
|
+
|
|
347
|
+
Preflight: `stat` the path's owner first, and branch on it:
|
|
348
|
+
```bash
target=/var/lib/myapp/pgdata
owner="$(stat -c %U "$target" 2>/dev/null || stat -f %Su "$target")"   # GNU || BSD/macOS
if [ "$owner" = "root" ] && [ "$(id -u)" -ne 0 ]; then
  echo "MANUAL STEP REQUIRED: $target is root-owned; run as operator:"
  echo "  sudo rm -rf $target"
else
  rm -rf "$target"
fi
```
|
|
358
|
+
When the path is root-owned and the agent is unprivileged, **emit the `sudo`-prefixed step as a MANUAL operator action** rather than attempting (and half-completing) the delete. A clean handoff beats a partial destruction. (`stat -c %U` is GNU coreutils; `stat -f %Su` is BSD/macOS — the snippet tries both for portability.)
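One gap in a top-level check: `stat` reports the owner of that one directory only, so a tree the agent owns at the top can still contain root-owned children. A minimal sketch of a deeper preflight follows. The helper names `first_foreign_file` and `safe_rm_tree` are hypothetical, and the sketch assumes a `find` that supports `-user` with a numeric uid.

```bash
# first_foreign_file DIR
# Prints the first entry under DIR that is NOT owned by the current user,
# or nothing if the whole tree is ours. Read-only: deletes nothing.
first_foreign_file() {
    find "$1" ! -user "$(id -u)" -print 2>/dev/null | head -n 1
}

# safe_rm_tree DIR
# Deletes DIR only when every entry is ours (or we are root); otherwise
# prints the manual operator step and leaves the tree untouched.
safe_rm_tree() {
    target=$1
    blocker=$(first_foreign_file "$target")
    if [ -n "$blocker" ] && [ "$(id -u)" -ne 0 ]; then
        echo "MANUAL STEP REQUIRED: $blocker is not owned by uid $(id -u); run: sudo rm -rf $target"
        return 1
    fi
    rm -rf "$target"
}
```

Walking the tree first costs one extra `find`, which is cheap compared with recovering from a half-deleted data directory.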
|
|
359
|
+
|
|
296
360
|
## Multi-Environment Isolation
|
|
297
361
|
|
|
298
362
|
When staging and production coexist on the same server, enforce full isolation:
|
|
@@ -308,6 +372,29 @@ When staging and production coexist on the same server, enforce full isolation:
|
|
|
308
372
|
|
|
309
373
|
Convention isn't enough — enforcement is. The pre-push hook is the single most effective protection. (Field report #241: 68-hour production outage from shared infrastructure.)
|
|
310
374
|
|
|
375
|
+
### Renaming a Linked Worktree Directory Breaks Git Silently
|
|
376
|
+
|
|
377
|
+
A linked git worktree (staging worktree, release worktree) keeps **two** pointer files that must agree on the directory's path. Renaming the worktree directory with a plain `mv` orphans both, and git gives you no warning (field report #343 F2):
|
|
378
|
+
|
|
379
|
+
1. The worktree's own `.git` **file** (not a directory — it contains `gitdir: /abs/path/to/main/.git/worktrees/<name>`).
|
|
380
|
+
2. The main repo's `.git/worktrees/<name>/gitdir` file, which points back at the worktree's `.git` file.
|
|
381
|
+
|
|
382
|
+
After a bare `mv staging staging-old`, both paths are stale. The worst part: **`git worktree list` does NOT warn** — it happily prints the old path, so the breakage is invisible until a git command inside the moved worktree fails with `fatal: not a git repository` or a deploy that `cd`s into the worktree silently operates on the wrong tree.
|
|
383
|
+
|
|
384
|
+
Fix — never `mv` a worktree directory. Use the porcelain that updates both pointers atomically:
|
|
385
|
+
```bash
git worktree move staging /new/abs/path/staging-old
```
|
|
388
|
+
If a directory was already moved by hand, repair both pointers manually:
|
|
389
|
+
```bash
# 1. fix the worktree's own .git file
echo "gitdir: /abs/main/.git/worktrees/staging" > /new/abs/path/staging-old/.git
# 2. fix the main repo's back-pointer
echo "/new/abs/path/staging-old/.git" > /abs/main/.git/worktrees/staging/gitdir
git worktree repair /new/abs/path/staging-old   # validates both ends
```
|
|
396
|
+
`git worktree repair` is the belt-and-suspenders step — run it after any manual edit to confirm both ends resolve.
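Because `git worktree list` is not a reliable detector, a deploy can check the two pointer files itself before it `cd`s into a worktree. The sketch below reads only the files described in this section and never calls git. The function name `worktree_pointers_ok` is hypothetical, and it assumes absolute pointer paths and an absolute, symlink-free worktree path with no trailing slash.

```bash
# worktree_pointers_ok WORKTREE_DIR
# Exit 0 when the worktree's .git file and the main repo's back-pointer agree.
# Exit 1 when a pointer is stale, 2 when WORKTREE_DIR is not a linked worktree.
worktree_pointers_ok() {
    wt=$1
    [ -f "$wt/.git" ] || { echo "not a linked worktree: $wt" >&2; return 2; }
    # Pointer 1: the worktree's .git file names the admin dir in the main repo.
    admin=$(sed -n 's/^gitdir: //p' "$wt/.git")
    [ -f "$admin/gitdir" ] || { echo "admin dir missing: $admin" >&2; return 1; }
    # Pointer 2: the admin dir's gitdir file must point back at this .git file.
    back=$(cat "$admin/gitdir")
    [ "$back" = "$wt/.git" ] || { echo "stale back-pointer: $back" >&2; return 1; }
}
```

A deploy script could call `worktree_pointers_ok /abs/path/staging || exit 1` as its first step, so a hand-moved worktree stops the deploy instead of surfacing later.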
|
|
397
|
+
|
|
311
398
|
## Deploy Safety Rules
|
|
312
399
|
|
|
313
400
|
**rsync exclusion mandate:** NEVER use `rsync --delete` without excluding VPS-only directories. User-uploaded files, generated avatars, and data files only exist on the VPS — `--delete` will destroy them. Mandatory exclusions:
|
|
@@ -332,6 +419,98 @@ Add project-specific exclusions for any directory that receives runtime-generate
|
|
|
332
419
|
|
|
333
420
|
**Post-deploy asset verification:** After deploying, verify specifically the files that *changed* in this deploy — not pre-existing assets. Check: (a) correct content-type header (text/html on a static asset means the file is missing from the deployment), (b) correct content-length (not the index.html fallback size), (c) deployment list shows the correct environment. Do NOT verify only pre-existing assets — they prove the host is up, not that the deploy succeeded. (Field report #114)
|
|
334
421
|
|
|
422
|
+
**Read back after a vendor PUT that doesn't echo the object.** When a deploy or config step `PUT`s to a vendor/control-plane API (DNS provider, CDN, Plex, a SaaS settings endpoint) and the response does **not** contain the mutated object, do NOT treat the `200` as confirmation. Issue a follow-up `GET` and assert that the field you set actually took (field report #353 RC-004). A vendor `PUT` can return `200 OK` while silently discarding body params it doesn't recognize, applying the change asynchronously, or rejecting it at a validation layer that still returns success (the Plex pattern: settings PUT returns 200 but the value is unchanged). The status code confirms the request was *received*, not that the *mutation persisted*. Rule: for any non-echoing PUT/PATCH on the deploy path, follow with a read-back and compare before declaring success.
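A minimal sketch of the read-back rule, assuming a JSON API where a `GET` on the same URL returns the current object. The function name `put_and_read_back` is hypothetical, and the substring comparison is deliberately crude; a real check would parse the response with a JSON tool and compare the specific field.

```bash
# put_and_read_back URL JSON_BODY EXPECTED_FRAGMENT
# PUTs the body, then GETs the same URL and requires EXPECTED_FRAGMENT
# (e.g. '"ttl":300') to appear in the response. Exit 1 on any failure.
put_and_read_back() {
    url=$1
    body=$2
    expected=$3
    # A 2xx here only proves the request was accepted.
    curl -fsS -X PUT -H 'Content-Type: application/json' -d "$body" "$url" >/dev/null || return 1
    # Read the object back and confirm the mutation actually persisted.
    current=$(curl -fsS "$url") || return 1
    case $current in
        *"$expected"*) return 0 ;;
    esac
    echo "READ-BACK MISMATCH: expected $expected in: $current" >&2
    return 1
}
```

For endpoints that apply changes asynchronously, the read-back may need a short retry loop before it is treated as a failure.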
|
|
423
|
+
|
|
424
|
+
## Env-File Loading Safety
|
|
425
|
+
|
|
426
|
+
**NEVER load `.env` files with eval-export.** The pattern `while read line; do eval "export $line"; done < .env` re-parses every value as shell code, so parameter and command expansion run on the secret itself. Any secret containing a literal `$` (bcrypt hashes such as `$2b$12$...`, PHP-style hashes, some signing keys) gets mangled: `$2b` and `$12` are expanded as positional parameters and silently substituted (usually to empty), corrupting the secret. The related one-liner `export $(cat .env | xargs)` does not re-expand `$`, but it is unsafe for a different reason: the unquoted substitution is word-split and glob-expanded, and `xargs` strips quotes and backslashes, so values containing spaces, quotes, or `*` are corrupted. Either way the app boots with a broken hash and rejects every login, or signs tokens with a truncated key. The failure is invisible until auth breaks in production (field report #344 F1).
|
|
427
|
+
|
|
428
|
+
Use a `$`-safe literal parser that never re-evaluates the value:
|
|
429
|
+
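```bash
# Safe: read each line verbatim and split on the FIRST '=' only, with no expansion.
# `|| [ -n "$line" ]` keeps a final line that lacks a trailing newline.
while IFS= read -r line || [ -n "$line" ]; do
  case "$line" in
    ''|'#'*|'='*) continue ;;   # skip blanks, comments, and lines with an empty key
    *=*) ;;                     # KEY=VALUE: fall through
    *) continue ;;              # no '=' at all: not an assignment
  esac
  key=${line%%=*}               # everything before the first '='
  val=${line#*=}                # everything after it, including any later '='
  export "$key=$val"            # value is a literal string, never eval'd
done < .env
```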
|
|
438
|
+
`$`-safe alternatives that bypass the shell entirely:
|
|
439
|
+
|
|
440
|
+
- **Node ≥20.6:** `node --env-file=.env app.js`. Node parses the file itself, no shell expansion.
|
|
441
|
+
- **systemd:** `EnvironmentFile=/etc/myapp/app.env` in the unit — systemd reads values literally.
|
|
442
|
+
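- **Docker Compose:** `env_file: .env`. Compose reads the file directly and never invokes a shell, but it does apply its own `${VAR}` interpolation to unquoted and double-quoted values, so single-quote any value that contains a literal `$`.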
|
|
443
|
+
|
|
444
|
+
Audit existing deploy scripts: grep for `eval "export`, `eval export`, and `export $(cat`. Any hit is a latent secret-corruption bug — replace it with the literal parser or one of the runtime-native loaders above.
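The audit can be scripted so it runs in CI. A minimal sketch, assuming a `grep` that supports `-r` (GNU, BSD, and BusyBox do); the function name `audit_env_loading` is hypothetical, and the pattern covers only the three spellings listed above.

```bash
# audit_env_loading DIR
# Prints every line under DIR that uses an eval-export or export-$(cat ...) loader.
# Exit 0 when the tree is clean, 1 when at least one hit is found.
audit_env_loading() {
    if grep -rnE 'eval[[:space:]]+"?export|export[[:space:]]+\$\(cat' "$1"; then
        return 1
    fi
    return 0
}
```

Typical use is `audit_env_loading ./scripts || exit 1` in a pre-deploy step, with the printed `file:line` hits as the fix list.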
|
|
445
|
+
|
|
446
|
+
## Deploy-Environment Assumptions
|
|
447
|
+
|
|
448
|
+
A deploy that succeeds in dev can fail in prod because the *environment* differs in ways no syntax check sees. Three classes recur; two already have their own sections in this doc — this section adds the third and cross-references the others so they're triaged together:
|
|
449
|
+
|
|
450
|
+
1. **Served-artifact verification** — the bundle nginx/the CDN actually serves can diverge from the one you just built. See §The served artifact is not the built artifact and §Post-push live-URL fingerprint.
|
|
451
|
+
2. **`.env`-file precedence / loading** — values get mangled or silently defaulted depending on how the file is loaded. See §Env-File Loading Safety and §Config Foot-Guns (deploy/runtime).
|
|
452
|
+
3. **Boot-time schema re-application under DB-role ownership mismatch** (field report #354 F4) — the new one, below.
|
|
453
|
+
|
|
454
|
+
### Boot-time DDL ownership/grant alignment (field report #354 F4)
|
|
455
|
+
|
|
456
|
+
Idempotent boot-time DDL is NOT automatically safe across environments. When an app runs schema re-application at startup — `CREATE TABLE IF NOT EXISTS`, `CREATE INDEX IF NOT EXISTS`, or a migration runner invoked on boot — the `IF NOT EXISTS` guard only protects against *existence* collisions. It does NOT protect against *ownership* collisions. If the tables were originally created by a **different DB role** than the role the app connects as (the classic split: a privileged `admin`/`migrator` role created the schema, but the app connects as a least-privilege `app` role), the startup DDL fails:
|
|
457
|
+
|
|
458
|
+
- `CREATE TABLE IF NOT EXISTS` on an existing table skips creation, but it is not exempt from privilege checks: the connecting role still needs `CREATE` on the schema, and any follow-up statement that reconciles constraints or indexes on a table the role does not own raises `must be owner of table <name>`.
|
|
459
|
+
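- `ALTER TABLE` / `CREATE INDEX` in the same boot sequence require table ownership. The ownership check runs before any `IF NOT EXISTS` guard is evaluated, so even `CREATE INDEX IF NOT EXISTS` fails outright with `permission denied` or `must be owner of relation`.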
|
|
460
|
+
- The app then either crashes at boot or (worse) logs the DDL error and serves with a half-migrated schema.
|
|
461
|
+
|
|
462
|
+
This passes in dev because dev usually runs everything as one superuser-ish role, so ownership is never split. Prod splits roles for least privilege — and that's exactly where the ownership mismatch surfaces.
|
|
463
|
+
|
|
464
|
+
**The check (run before declaring a boot-time-migration deploy healthy):** confirm the role the app connects as either *owns* the schema objects or has been granted the privileges the boot DDL needs. For PostgreSQL:
|
|
465
|
+
```sql
-- Who owns the tables the app's boot DDL will touch?
SELECT tablename, tableowner FROM pg_tables WHERE schemaname = 'public';
-- The connecting app role:
SELECT current_user;
```
|
|
471
|
+
If owner ≠ app role, align ownership or grants before the boot runs:
|
|
472
|
+
```sql
-- Option A: hand ownership to the app role (simplest when the app owns its own migrations)
ALTER TABLE public.<table> OWNER TO app_role;
-- Option B: keep a separate migrator owner, but grant the app role what its boot DDL needs,
-- and make future objects inherit grants. Run the second statement as the migrator role
-- (or add FOR ROLE <migrator>): default privileges only apply to objects that role creates.
GRANT ALL ON ALL TABLES IN SCHEMA public TO app_role;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO app_role;
```
|
|
480
|
+
Prefer **Option A** when the app owns its migrations, **Option B** when policy requires a distinct migrator/owner role. Note that table-level grants cover data access (`SELECT`, `INSERT`, `UPDATE`, ...) but not owner-only DDL such as `ALTER TABLE` or `CREATE INDEX`, so Option B is sufficient only when the boot sequence stops issuing owner-only statements. Either way: idempotent DDL still needs ownership/grant alignment, and the `IF NOT EXISTS` keyword is not an ownership escape hatch. Best practice is to run migrations as the owning role on deploy and connect the app as a least-privilege role that does NOT re-run DDL at boot at all; if boot-time re-application stays, this alignment check is mandatory.
|
|
481
|
+
|
|
482
|
+
## systemd Unit Hardening (Node.js)
|
|
483
|
+
|
|
484
|
+
Sandboxing directives in a systemd unit are good practice, but **Node.js units must NOT set `MemoryDenyWriteExecute=true`** (field report #344 F3). V8's JIT compiler maps pages that are simultaneously writable and executable (W^X is violated by design for JIT); `MemoryDenyWriteExecute=true` (MDWE) forbids exactly that, so the Node process dies with **`SIGTRAP` at boot** before it serves a single request. The crash looks unrelated to the unit file — operators chase the app for hours.
|
|
485
|
+
|
|
486
|
+
Safe Node hardening stanza — everything useful **except** MDWE:
|
|
487
|
+
```ini
[Service]
# --- hardening (Node-safe) ---
NoNewPrivileges=true
ProtectSystem=full
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
RestrictRealtime=true
LockPersonality=true
# MemoryDenyWriteExecute=true   # <-- DO NOT: V8 JIT needs W+X pages; SIGTRAP at boot
ReadWritePaths=/var/lib/myapp /var/log/myapp
```
|
|
504
|
+
Note: ahead-of-time-compiled binaries (Go, Rust, statically compiled C/C++) have no JIT and **can** keep `MemoryDenyWriteExecute=true` — the restriction is specific to JIT runtimes (Node/V8, the JVM, PyPy, .NET with JIT). When a unit template is shared across services, gate MDWE on the runtime, not on the unit boilerplate.
|
|
505
|
+
|
|
506
|
+
## Config Foot-Guns (deploy/runtime)
|
|
507
|
+
|
|
508
|
+
Three recurring config traps that pass every syntax check yet break at runtime (field report #352 #5):
|
|
509
|
+
|
|
510
|
+
- **Empty-string env defaults are non-nullish.** A shell default of the form `${VAR:-}` (or a Compose `VAR: ""`) sets the variable to `""`, which is a *defined, non-null* value. Downstream `cfg.X = process.env.VAR ?? defaultX` then keeps `""` — nullish coalescing (`??`) only fires on `null`/`undefined`, never on empty string — so the intended default is silently poisoned and the app runs with an empty config value. Either leave the var truly unset (omit the `:-` default) or validate-and-coerce empty strings at the config boundary.
|
|
511
|
+
- **Dev hostnames hardcoded in worker healthchecks false-fail in prod.** A worker healthcheck that pings `http://localhost:3000` or `redis://dev-redis` passes in dev and fails in prod, marking a healthy worker unhealthy (and triggering restart loops). Healthcheck targets must come from the same env config the worker uses, never literals.
|
|
512
|
+
- **Awaiting best-effort side effects on the auth path blocks sign-in.** `await analytics.track(...)` / `await auditLog.write(...)` inline in the login handler means a slow or down telemetry backend stalls — or fails — the sign-in. Best-effort side effects must be fire-and-forget (queue them, `void`-them, or move them off the request path), never `await`ed on a latency-critical auth route.
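The first trap has a direct shell analog that makes the distinction easy to see: `${VAR-default}` substitutes only when the variable is unset (the same blind spot as `??`), while `${VAR:-default}` also substitutes when it is empty. The variable names below are hypothetical and exist only for illustration.

```bash
# Hypothetical variable names, for illustration only.
unset APP_REGION          # truly unset
APP_LOG_LEVEL=""          # defined but empty, e.g. from `APP_LOG_LEVEL: ""` in Compose

region=${APP_REGION-us-east-1}        # unset -> the default applies
level_kept=${APP_LOG_LEVEL-info}      # empty -> "" is kept (the poisoned default)
level_fixed=${APP_LOG_LEVEL:-info}    # empty -> coerced to the default

printf 'region=%s level_kept=[%s] level_fixed=%s\n' "$region" "$level_kept" "$level_fixed"
# prints: region=us-east-1 level_kept=[] level_fixed=info
```

An entrypoint script that resolves defaults with the `:-` form before starting the app is one way to do the validate-and-coerce step at the config boundary.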
|
|
513
|
+
|
|
335
514
|
## Subdomain Routing (Cloudflare Pages / Vercel / Netlify)
|
|
336
515
|
|
|
337
516
|
Platform-hosted static sites serve the entire project from root. Subdomain-to-subdirectory routing (e.g., `labs.example.com` → `/labs/`) requires platform-specific configuration:
|
|
@@ -344,6 +523,21 @@ Platform-hosted static sites serve the entire project from root. Subdomain-to-su
|
|
|
344
523
|
|
|
345
524
|
**Always test routing before announcing a subdomain.** Curl the subdomain and verify it serves the expected content, not the root index.html.
|
|
346
525
|
|
|
526
|
+
## Cloudflare TLS Mode (Flexible vs Full/Strict)
|
|
527
|
+
|
|
528
|
+
On a **Flexible** TLS zone, Cloudflare terminates TLS at the edge and talks to the origin over **plain HTTP**. If that origin then **301-redirects HTTP → HTTPS** (the near-universal nginx/Caddy default), it bounces the edge's HTTP request back to HTTPS, which Cloudflare re-fetches over HTTP, which redirects again — an **infinite redirect loop** (`ERR_TOO_MANY_REDIRECTS`) for every visitor (field report #344 F4a). On a Flexible zone the origin must serve the app on plain HTTP and must NOT force the HTTPS upgrade — let Cloudflare own the HTTPS edge.
|
|
529
|
+
|
|
530
|
+
**A Let's Encrypt cert on a sibling host is NOT proof the zone is Full/Strict.** Operators see `https://api.example.com` with a valid LE cert and assume the apex is Full mode too — but TLS mode is per-zone (sometimes per-host with config-rule overrides), and a working cert elsewhere says nothing about the mode applied to *this* hostname. Don't infer the mode from a neighbor's cert; check it.
|
|
531
|
+
|
|
532
|
+
**Behavioral check — count redirect hops, don't read config:**
|
|
533
|
+
```bash
# Healthy Full/Strict origin: 0–1 hops. A Flexible-loop origin spirals (curl caps at --max-redirs).
curl -sIL --max-redirs 10 "http://$ORIGIN_HOST/" | grep -ci '^location:'
```
|
|
537
|
+
A count at or near the cap (or `curl: (47) Maximum (10) redirects followed`) is the Flexible-loop signature. Fix by either switching the zone to **Full (strict)** and keeping the origin's HTTPS redirect, OR keeping **Flexible** and removing the origin's HTTP→HTTPS 301. Pick one; don't mix.
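To turn the hop count into a pass/fail gate, the headers can be piped into a small classifier. This is a sketch under assumptions: the function name `classify_redirects` is hypothetical, and treating any count at or above the cap as a loop is a heuristic threshold, not a Cloudflare-defined rule.

```bash
# classify_redirects [CAP]
# Reads response headers on stdin (e.g. from `curl -sIL --max-redirs CAP URL`).
# Prints "ok <hops>" or "loop <hops>" and exits 1 on a suspected loop.
classify_redirects() {
    cap=${1:-10}
    # grep -c prints 0 and exits non-zero when nothing matches; `|| true` keeps that harmless.
    hops=$(grep -ci '^location:' || true)
    if [ "$hops" -ge "$cap" ]; then
        echo "loop $hops"
        return 1
    fi
    echo "ok $hops"
}
```

Usage would look like `curl -sIL --max-redirs 10 "http://$ORIGIN_HOST/" 2>/dev/null | classify_redirects 10 || exit 1` in a post-deploy check.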
|
|
538
|
+
|
|
539
|
+
**Minimum Cloudflare API token scope for `/deploy`.** So `/deploy` can verify the zone's SSL mode *before* it writes an nginx config that adds a redirect (and thus before it can create the loop), the deploy token must include **`Zone → SSL and Certificates → Read`** (`Zone:SSL`) and **`Zone → Certificates → Read`** (field report #344 F4b). With those scopes the deploy step queries the zone's `ssl` setting, and only emits a redirect-bearing origin config when the mode is Full/Strict. A token scoped to DNS-only cannot see the SSL mode and will happily ship a redirect into a Flexible zone.
|
|
540
|
+
|
|
347
541
|
## Deploy Surface Boundary
|
|
348
542
|
|
|
349
543
|
**Invariant:** the repository root is NEVER the deploy surface. Physical separation between "all files tracked in the repo" and "files uploaded to the CDN / server" is enforced by tool configuration, not by `.gitignore`.
|
|
@@ -0,0 +1,92 @@
|
|
|
1
|
+
# DOC AUDIT — Documentation Currency & Cross-Reference Integrity
|
|
2
|
+
## Lead: **Surfer-led roster** · Domain: Documentation correctness (NOT UX)
|
|
3
|
+
|
|
4
|
+
> *"The map is not the territory — but a map that lies about the territory is worse than no map at all."*
|
|
5
|
+
|
|
6
|
+
## Identity
|
|
7
|
+
|
|
8
|
+
A doc audit verifies that VoidForge's prose tells the truth about VoidForge's behavior. It is a correctness discipline, not a styling one. Code drifts; commands get added, renamed, or retired; versions bump; ADRs supersede each other. Every drift leaves the docs one step behind reality, and stale docs are load-bearing lies — the next session (or the next user) acts on them. The doc audit catches that drift before it ships (field report #342 F-3).
|
|
9
|
+
|
|
10
|
+
**Doc audits are NOT a `/ux` concern.** Galadriel's UX pass evaluates the *user-facing product* — screens, flows, a11y, copy that end users read. A doc audit evaluates the *methodology and developer documentation* — method docs, command specs, the Holocron, README, ADRs, CLAUDE.md. These are different artifacts reviewed against different sources of truth. Routing doc-currency findings into a UX pass buries them under unrelated screenshots and produces neither a real UX review nor a real doc audit. Keep them separate (#342 F-3).
|
|
11
|
+
|
|
12
|
+
## Goal
|
|
13
|
+
|
|
14
|
+
After a doc audit, every documented claim is true at audit time, every cross-reference resolves, every command listed in CLAUDE.md has a matching spec (and vice versa), and the version stated in the docs matches the single source of truth. A reader who trusts the docs is not misled.
|
|
15
|
+
|
|
16
|
+
## The Four Checks
|
|
17
|
+
|
|
18
|
+
A doc audit is four distinct verifications. Each has its own source of truth — the audit is the act of diffing prose against that source (#342 F-3).
|
|
19
|
+
|
|
20
|
+
### 1. Currency
|
|
21
|
+
|
|
22
|
+
Documentation describes the system **as it is now**, not as it was. For every factual claim that can go stale — counts, file paths, feature lists, "X does Y" behavioral statements — verify it against the live artifact:
|
|
23
|
+
|
|
24
|
+
| Claim type | How to verify |
|
|
25
|
+
|------------|--------------|
|
|
26
|
+
| Agent / command / pattern counts | `ls .claude/agents/*.md \| wc -l`, count the rows in the relevant table |
|
|
27
|
+
| File paths cited as deliverables | `[ -f <path> ] && echo present \|\| echo MISSING` |
|
|
28
|
+
| Behavioral claims ("the hook blocks X") | Read the code, not memory — the doc must match the implementation |
|
|
29
|
+
| Retired / renamed features | grep for the old name; if it still appears as current, it is stale |
|
|
30
|
+
|
|
31
|
+
Document each verified claim with its source (`from ls`, `from <file>:<line>`). A claim you cannot anchor to an artifact is a claim you cannot defend.
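
A minimal sketch of how a count claim and a path claim could each be anchored to a live artifact. The function names and the `documented_count` parameter are hypothetical, not part of VoidForge; only the `.claude/agents/*.md` layout comes from the table above.

```python
from pathlib import Path


def check_agent_count(repo_root: Path, documented_count: int) -> dict:
    """Compare a documented agent count against the files on disk."""
    # Same idea as `ls .claude/agents/*.md | wc -l`.
    actual = len(list((repo_root / ".claude" / "agents").glob("*.md")))
    return {
        "claim": f"{documented_count} agents",
        "actual": actual,
        "source": "from ls .claude/agents/*.md",
        "stale": actual != documented_count,
    }


def check_paths(repo_root: Path, cited_paths: list[str]) -> dict[str, str]:
    """Mark each cited deliverable path as present or MISSING."""
    return {
        p: "present" if (repo_root / p.lstrip("/")).is_file() else "MISSING"
        for p in cited_paths
    }
```

Each result carries its own `source` string, so the audit report can show where the verified number came from.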
|
|
32
|
+
|
|
33
|
+
### 2. Cross-Reference Integrity
|
|
34
|
+
|
|
35
|
+
Every internal reference must resolve. Broken cross-references rot silently because nothing errors at read time:
|
|
36
|
+
|
|
37
|
+
- **File links** — every `/docs/...`, `/.claude/...`, pattern, and method path cited must exist on disk.
|
|
38
|
+
- **ADR references** — every `ADR-NNN` mentioned must correspond to a real, current ADR; superseded ADRs must say so.
|
|
39
|
+
- **"See X" pointers** — the target section/doc must still exist and still cover what the pointer claims.
|
|
40
|
+
|
|
41
|
+
Verification is mechanical: extract every path-like and `ADR-NNN`-like token, then existence-check each one.
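
The extraction step might look like the sketch below. The two regular expressions are assumptions about what "path-like" and "`ADR-NNN`-like" mean in a given repository and would need tuning; checking that an ADR is current or superseded is left out.

```python
import re
from pathlib import Path

# Hypothetical token patterns; adjust to the repository's conventions.
PATH_TOKEN = re.compile(r"/?(?:docs|\.claude)/[\w./-]+\.md")
ADR_TOKEN = re.compile(r"\bADR-\d{3}\b")


def extract_references(text: str) -> tuple[set[str], set[str]]:
    """Return (path tokens, ADR tokens) found in a document's text."""
    return set(PATH_TOKEN.findall(text)), set(ADR_TOKEN.findall(text))


def unresolved_paths(repo_root: Path, paths: set[str]) -> list[str]:
    """List every cited path that does not exist under repo_root."""
    return sorted(
        p for p in paths if not (repo_root / p.lstrip("/")).exists()
    )
```

Running `unresolved_paths` over every extracted token, rather than a sample, is what makes the check exhaustive.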
|
|
42
|
+
|
|
43
|
+
### 3. Command ↔ Method Sync
|
|
44
|
+
|
|
45
|
+
The command table in CLAUDE.md, the slash-command specs in `.claude/commands/`, and the method docs in `/docs/methods/` describe the same surface from three angles. They must agree:
|
|
46
|
+
|
|
47
|
+
- Every command in the CLAUDE.md table has a spec file in `.claude/commands/` and a method-doc entry where one is expected.
|
|
48
|
+
- Every command spec in `.claude/commands/` appears in the CLAUDE.md table (no orphan commands).
|
|
49
|
+
- Aliases (e.g. `/review` → `/engage`, `/security` → `/sentinel`) are documented as aliases in all three places, not as independent commands in one and missing from another.
|
|
50
|
+
- Flag taxonomy claims (which flag works on which command) match the command specs.
|
|
51
|
+
|
|
52
|
+
A command documented in one place but missing from another is a sync defect — report it.
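
As a rough illustration, the first two rules reduce to set differences. The sketch assumes the command names have already been parsed out of the CLAUDE.md table (that parsing is not shown, since the table format is not specified here) and that a spec file `ux.md` corresponds to the command `/ux`. Both function names are hypothetical.

```python
from pathlib import Path


def commands_from_specs(spec_files: list[str]) -> set[str]:
    """Map spec filenames such as 'ux.md' to command names such as '/ux'."""
    return {"/" + Path(name).stem for name in spec_files}


def sync_defects(
    table_commands: set[str], spec_commands: set[str]
) -> dict[str, list[str]]:
    """Report commands present on one side but not the other."""
    return {
        # Listed in the CLAUDE.md table but no spec file exists.
        "missing_spec": sorted(table_commands - spec_commands),
        # Spec file exists but the table never mentions it (orphan).
        "orphan_spec": sorted(spec_commands - table_commands),
    }
```

Alias and flag checks need the contents of each spec, so they fall outside this sketch.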
|
|
53
|
+
|
|
54
|
+
### 4. Version-SSOT Consistency
|
|
55
|
+
|
|
56
|
+
There is one source of truth for the version. Every other mention must match it:
|
|
57
|
+
|
|
58
|
+
- Identify the SSOT (e.g. `VERSION.md`, `package.json` `version`, the latest `/git` release tag).
|
|
59
|
+
- Every version string in docs, changelog headers, and README badges must equal the SSOT.
|
|
60
|
+
- The changelog must have an entry for the current version; a version bumped without a changelog entry is a currency defect (Coulson's domain).
|
|
61
|
+
|
|
62
|
+
Mismatched version strings are the most common — and most embarrassing — doc defect. They are also the cheapest to verify: grep for version-shaped tokens, compare to SSOT.
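
A minimal sketch of the grep-and-compare step, assuming versions are three dot-separated numbers with an optional leading `v`. The pattern and function name are illustrative only. Historical entries in a changelog legitimately differ from the SSOT, so the output is a list of candidates to review, not a list of confirmed defects.

```python
import re

# Assumed shape of a version token: 1.2.3 or v1.2.3.
VERSION_TOKEN = re.compile(r"\bv?(\d+\.\d+\.\d+)\b")


def version_mismatches(ssot: str, text: str) -> list[str]:
    """Return every version-shaped token in text that differs from the SSOT."""
    found = {m.group(1) for m in VERSION_TOKEN.finditer(text)}
    return sorted(found - {ssot})
```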
|
|
63
|
+
|
|
64
|
+
## The Surfer-Led Doc Roster
|
|
65
|
+
|
|
66
|
+
Doc audits run under the Silver Surfer Gate like any other review command — announce the herald, launch the Surfer, deploy the roster it returns. The Surfer biases toward the documentation specialists below for this domain; it is not a fixed list, and the Surfer may add cross-domain agents when the diff warrants (#342 F-3).
|
|
67
|
+
|
|
68
|
+
| Agent | Universe | Focus in a doc audit |
|
|
69
|
+
|-------|----------|----------------------|
|
|
70
|
+
| **Troi** | Star Trek | PRD ↔ implementation claim traceability — does every documented claim trace to something that actually exists in code/PRD? Catches requirement and asset gaps. |
|
|
71
|
+
| **Wong** | Marvel | Doc accuracy — API docs, inline comments, README correctness; the guardian of "does the prose match the code." |
|
|
72
|
+
| **Irulan** | Dune | Documentation completeness — the historian checks for *missing* documentation: undocumented commands, agents, ADRs, and features that exist but are described nowhere. |
|
|
73
|
+
| **Coulson** | Marvel | Version / changelog currency — version-SSOT consistency, changelog completeness, release-note accuracy. |
|
|
74
|
+
|
|
75
|
+
**Division of labor:** Troi works inward (claim → does it trace?), Irulan works outward (artifact → is it documented?). Together they close both gaps: documented-but-false (Troi) and true-but-undocumented (Irulan). Wong validates the prose itself; Coulson owns everything version- and release-shaped.
|
|
76
|
+
|
|
77
|
+
## Integration Points
|
|
78
|
+
|
|
79
|
+
| Command | How it uses the doc audit |
|
|
80
|
+
|---------|---------------------------|
|
|
81
|
+
| `/git` | Before a release, Coulson verifies version-SSOT consistency and changelog currency as part of the bump. |
|
|
82
|
+
| `/engage` | Code-review passes flag doc drift adjacent to changed code — but a full doc audit is its own pass, not a `/engage` side effect. |
|
|
83
|
+
| `/debrief` | Field reports about stale or wrong docs feed doc-audit scope; Bashir routes them here, not to `/ux`. |
|
|
84
|
+
| `/void` | After a methodology sync, run a doc audit to confirm the merged docs still cross-reference correctly. |
|
|
85
|
+
|
|
86
|
+
## Anti-Patterns
|
|
87
|
+
|
|
88
|
+
- **Routing doc currency to `/ux`** — the most common misroute. UX reviews the product; doc audits review the methodology docs. Different source of truth, different roster (#342 F-3).
|
|
89
|
+
- **Auditing from memory** — every currency claim must be anchored to a live artifact (`ls`, file existence, code read). A claim you cannot source is a claim you cannot verify.
|
|
90
|
+
- **Spot-checking cross-references** — broken links rot silently. Extract *every* path/ADR token and existence-check all of them, not a sample.
|
|
91
|
+
- **Fixing prose without checking the other two angles** — a command renamed in CLAUDE.md but not in its spec file (or vice versa) is still broken. Sync is a three-way agreement, not a one-way edit.
|
|
92
|
+
- **Treating version mismatch as cosmetic** — a wrong version string in the docs is a correctness defect: it tells the reader they are on a release they are not on.
|
|
@@ -143,6 +143,15 @@ When upgrading across versions, check the **Migration Registry** for one-time cl
|
|
|
143
143
|
|
|
144
144
|
**Important:** Some cleanup targets (like `docs/ARCHITECTURE.md`) could be the user's own project files, not leaked VoidForge artifacts. Before removing any file, **fingerprint it** — check if it contains VoidForge-specific markers (e.g., header says "VoidForge", references `wizard/`, or matches a known stale version like "15.2.1"). If the file looks like the user's own work, skip it and note why.
|
|
145
145
|
|
|
146
|
+
**Consumer vs. clone — gate the whole Migration Registry on this first (field report #343 F10).** Spring Cleaning is destructive, and the "Always remove" list below is calibrated for **methodology clones** (projects scaffolded from the `scaffold` or `core` source, which carry no application code of their own). On a **methodology consumer** — an application project that adopted VoidForge but has its own production source tree — files like `playwright.config.ts`, `vitest.config.ts`, `tsconfig.json`, and `package-lock.json` are **legitimate application files**, not leaked VoidForge artifacts. Deleting them on a consumer is **data loss**: you would be removing the project's real test config, TypeScript config, and dependency lockfile.
|
|
147
|
+
|
|
148
|
+
**Detection heuristic:** Read the project's `package.json`. If it declares non-VoidForge `dependencies` or `devDependencies` (anything beyond a bare name + version + description), the project is a **CONSUMER**. If `package.json` is minimal or absent (no real dependencies — the shape scaffold/core ship), the project is a **CLONE**.
|
|
149
|
+
|
|
150
|
+
- **CONSUMER** → **SKIP the entire "Always remove" list.** Do not apply the version-range migrations that delete config/lockfiles. The only files Spring Cleaning may touch on a consumer are ones that fingerprint **unambiguously** as VoidForge artifacts (e.g., `PRD-VOIDFORGE.md`, a `docs/ARCHITECTURE.md` whose header literally says "VoidForge" / "Version: 15.2.1"). Fingerprint **defensively** before deleting anything; when a file is ambiguous, keep it and note why. Never delete `playwright.config.ts`, `vitest.config.ts`, `tsconfig.json`, or `package-lock.json` on a consumer.
|
|
151
|
+
- **CLONE** → apply the full Migration Registry as written below, including the "Always remove" list.
|
|
152
|
+
|
|
153
|
+
When unsure which side of the line the project sits on, treat it as a CONSUMER (the safe default — keeping a file is reversible, deleting it is not).
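
The detection heuristic could be sketched as follows. This is an assumption-laden simplification: it treats any declared dependency as evidence of a consumer and does not try to tell VoidForge-owned dependencies apart. The function name is hypothetical.

```python
import json
from pathlib import Path


def classify_project(project_root: Path) -> str:
    """Return 'CONSUMER' or 'CLONE' based on the project's package.json."""
    pkg = project_root / "package.json"
    if not pkg.is_file():
        return "CLONE"  # absent package.json: the shape scaffold/core ship
    try:
        data = json.loads(pkg.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return "CONSUMER"  # unreadable: fall back to the safe default
    if not isinstance(data, dict):
        return "CONSUMER"
    has_deps = any(
        bool(data.get(key)) for key in ("dependencies", "devDependencies")
    )
    return "CONSUMER" if has_deps else "CLONE"
```

Every ambiguous branch resolves to `CONSUMER`, which mirrors the rule that keeping a file is reversible and deleting it is not.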
|
|
154
|
+
|
|
146
155
|
**Process:**
|
|
147
156
|
1. Determine which migrations apply based on the local version → upstream version range
|
|
148
157
|
2. For each applicable migration, scan for the listed files
|
|
@@ -163,15 +172,15 @@ When upgrading across versions, check the **Migration Registry** for one-time cl
|
|
|
163
172
|
|
|
164
173
|
Prior to v20.2, the scaffold and core branches contained files that should only exist on main. These were cleaned from the upstream npm package but may persist in projects that cloned earlier versions.
|
|
165
174
|
|
|
166
|
-
**Always remove (unambiguous VoidForge artifacts):**
|
|
175
|
+
**Always remove (unambiguous VoidForge artifacts) — CLONES ONLY. On a methodology consumer, skip this entire list (field report #343 F10); `package-lock.json`, `playwright.config.ts`, `vitest.config.ts`, and `tsconfig.json` are the consumer's real application files and deleting them is data loss:**
|
|
167
176
|
```
|
|
168
177
|
PRD-VOIDFORGE.md ← VoidForge's own product PRD
|
|
169
178
|
PROPHECY.md ← Historical roadmap, all items shipped
|
|
170
179
|
WORKSHOP.md ← Workshop guide requiring wizard/
|
|
171
|
-
package-lock.json ← Scaffold/core have no dependencies
|
|
172
|
-
playwright.config.ts ← References wizard/e2e
|
|
173
|
-
vitest.config.ts ← References wizard/__tests__
|
|
174
|
-
tsconfig.json ← References wizard/**/*.ts
|
|
180
|
+
package-lock.json ← Scaffold/core have no dependencies (CONSUMER: real lockfile — keep)
|
|
181
|
+
playwright.config.ts ← References wizard/e2e (CONSUMER: real test config — keep)
|
|
182
|
+
vitest.config.ts ← References wizard/__tests__ (CONSUMER: real test config — keep)
|
|
183
|
+
tsconfig.json ← References wizard/**/*.ts (CONSUMER: real TS config — keep)
|
|
175
184
|
packages/voidforge/scripts/voidforge.ts ← CLI entry point, imports wizard/
|
|
176
185
|
scripts/vault-read.ts ← Imports packages/voidforge/wizard/lib/vault
|
|
177
186
|
scripts/danger-room-feed.sh ← Feeds wizard dashboard
|
|
@@ -274,6 +283,8 @@ Verify and celebrate:
|
|
|
274
283
|
- Conflicts resolved: [list]
|
|
275
284
|
```
|
|
276
285
|
4. Check for handoffs — if new commands or agents were added, mention them
|
|
286
|
+
4b. **Restart required before new agents are launchable (field report #343 F1a).** Agent registration is **session-scoped**: Claude Code reads `.claude/agents/*.md` at session start, so any agent files this sync *added* are NOT yet usable as `subagent_type` values in the current session. If the sync added one or more `.claude/agents/` files, tell the operator: *"New agents were synced into `.claude/agents/`. Restart the Claude Code session before launching them — until you do, they can't be used as `subagent_type` values."* Files that were merely *updated* (already present at session start) work without a restart; only newly-added agent files need one.
|
|
287
|
+
4c. **Silver Surfer Gate bypass flags are per-session (field report #343 F4).** `--solo` and `--light` bypasses recorded by `scripts/surfer-gate/bypass.sh` live in per-session gate state, so a bypass is **not durable across `/clear` or a session restart**. After clearing context or restarting, any prior `--solo`/`--light` no longer applies and must be **re-issued** on the next gated command. Don't assume a bypass set earlier in a different session is still in effect — if the operator restarted or ran `/clear` since granting it, the gate is live again until the flag is passed afresh.
|
|
277
288
|
5. **Content drift check:** If the sync changed methodology counts (agent counts, command counts, pattern counts) AND the project has a data layer that displays VoidForge metadata (e.g., `releases.ts`, `commands.ts`, site content), flag: "The sync changed [N] agents/commands/patterns. If your project displays these counts, update the data layer to match." This prevents stale counts on marketing sites and docs pages after version bumps. (Field report #113)
|
|
278
289
|
5b. **Description accuracy check (Radagast):** For projects that display command descriptions (marketing sites, docs sites, README generators), compare each command's user-facing description against the upstream method doc's actual steps. If the upstream method doc gained new steps, flags, or capabilities in this sync that aren't reflected in the site's description, flag: "Command /X gained [capability] in this sync but the site description doesn't mention it. Update the description in [data file]." Count-based checks catch missing entries; this catches stale descriptions on existing entries. The most common void sync change is adding capabilities to existing commands, not adding new commands. (Field report #267: 9 commands had outdated descriptions after a sync that added capabilities to 12 agents — the biggest feature was invisible on the site.)
|
|
279
290
|
5c. **Version history check:** If VERSION.md was updated, compare the version table entries against any project pages that display release history (roadmap pages, changelog displays, "shipped versions" sections). Flag versions present in VERSION.md that are missing from site content. This prevents version drift between the methodology's version history and user-facing release pages.
|
|
@@ -95,6 +95,21 @@ This catches what static analysis misses: IPv6 binding, native module ABI compat
|
|
|
95
95
|
|
|
96
96
|
**Semantic verification rule:** Verify semantic correctness of arguments, not just type correctness. Ask: is this the RIGHT value, not just a valid type? A function call that compiles and passes type-checking can still be fundamentally wrong if the wrong variable is passed. Check that each argument carries the intended meaning, not just a compatible shape. (Field report #258: aggregate spend parameter received a config object — type-compatible but semantically meaningless, causing NaN comparisons that silently fell through.)
|
|
97
97
|
|
|
98
|
+
**Step 4.5 — Adversarial Verification (vote-based REFUTE pass) (field report #346 #2):** Crossfire (above) attacks the codebase to discover NEW bugs. This sub-step is the opposite vector — it refutes the EXISTING findings already on the board. Run it on every **Critical** and **High** finding before it reaches the fix batch:
|
|
99
|
+
|
|
100
|
+
1. For each Critical/High finding, spawn **≥2 skeptic agents** (drawn from different universes per the low-confidence escalation rule). Each skeptic is prompted to **REFUTE** the finding, not to confirm it: *"Here is a claimed defect. Read the actual code at the cited file:line and prove it is NOT a real issue. Default to REFUTED unless the code itself confirms the defect."*
|
|
101
|
+
2. Each skeptic returns a vote: **CONFIRM** (the code at the cited location demonstrably exhibits the defect) or **REFUTE** (the defect cannot be reproduced from the cited code).
|
|
102
|
+
3. **Keep a finding only if it receives ≥1 CONFIRM.** A finding that every skeptic refutes is dropped (logged as a refuted first-pass false positive, not deleted silently).
|
|
103
|
+
4. **Re-rate severity from the votes**, not from the original author's assertion: a Critical that earns only one weak CONFIRM and one REFUTE drops to High or Medium; a finding that all skeptics CONFIRM with reproductions holds its severity.
|
|
104
|
+
|
|
105
|
+
Why default-to-refuted: across instrumented Gauntlets, **~38% of first-pass Criticals were false positives** — author confidence and adversarial-attack momentum inflate severity. An attacker prompted to find bugs will manufacture them; a skeptic prompted to refute them filters them. The two passes are complementary: Crossfire (attack for new bugs) → Adversarial Verification (refute existing findings).
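
The vote rule in steps 2-4 can be sketched as a small function. The keep/drop rule follows the text. The re-rating rule is an assumption: the protocol says a split vote "drops to High or Medium" without fixing the amount, so this sketch lowers severity by exactly one level on any non-unanimous result.

```python
from typing import Optional

SEVERITIES = ["Low", "Medium", "High", "Critical"]


def adjudicate(severity: str, votes: list[str]) -> tuple[str, Optional[str]]:
    """Return (decision, re-rated severity) for one finding."""
    if len(votes) < 2:
        raise ValueError("at least two skeptic votes are required")
    confirms = votes.count("CONFIRM")
    if confirms == 0:
        # Every skeptic refuted: drop, but log as a refuted false positive.
        return ("REFUTED", None)
    if confirms == len(votes):
        # Unanimous confirmation: severity holds.
        return ("KEPT", severity)
    # Split vote: assumed one-level downgrade (illustrative policy only).
    index = SEVERITIES.index(severity)
    return ("KEPT", SEVERITIES[max(index - 1, 0)])
```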
|
|
106
|
+
|
|
107
|
+
**Verify the FIX, not just the finding (field report #348 #4 / #350 #4):** The refute pass must also challenge the **PROPOSED FIX**, not only the finding it addresses. For each fix the batch intends to apply, the skeptic asks: *does this fix introduce a NEW failure mode the original code did not have?* Specifically hunt for **wedge, unbounded retry, infinite loop, orphaned record, double-send** regressions. The risk is acute whenever a fix adds a **coordination primitive — a sentinel, a lock, a retry-state row, a fence/claim marker — without also adding a liveness signal** (a bounded timeout that is actually reachable, a heartbeat, a dead-man release). A coordination primitive with no reachable release path does not fix a bug; it converts a transient failure into a permanent wedge.
|
|
108
|
+
|
|
109
|
+
> **M5 mint-fence incident (field report #348 #4):** a fix added a stale-reclaim fence to recover stuck mint jobs after **120s**. But the reclaim window sat *inside* a BullMQ retry budget of only **~3s** — the 120s liveness threshold was structurally unreachable before the job exhausted its retries, so drafts that hit the fence wedged permanently in `FAILED` instead of being reclaimed. The fix's own coordination primitive (the fence) had no reachable liveness signal. The finding was real; the *fix* created a new Critical.
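
The arithmetic behind that incident fits in one comparison. This is an illustrative sketch with hypothetical parameter names, not the project's code: a liveness threshold is reachable only if it is no longer than the time the job remains retryable.

```python
def liveness_reachable(liveness_threshold_s: float, retry_budget_s: float) -> bool:
    """True when the release path can fire before retries are exhausted."""
    return liveness_threshold_s <= retry_budget_s


# Values from the incident: a 120s reclaim window inside a ~3s retry budget.
incident_ok = liveness_reachable(liveness_threshold_s=120, retry_budget_s=3)
```

With the incident's numbers the check fails, which is the condition a refute-the-fix skeptic would be looking for.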
|
|
110
|
+
|
|
111
|
+
> **Cross-system checkpoint is non-optional (field report #350 #4):** in a multi-mission Gauntlet, the cross-system checkpoint caught a **fix-induced Critical that a per-mission review's own fix had created** — the per-mission review verified its fix in isolation and passed it; only the whole-system pass saw the new failure mode the fix introduced. This is direct evidence that verifying a fix against the single mission that motivated it is insufficient. The Gauntlet-level refute-the-fix checkpoint stays in the protocol regardless of how green the per-mission reviews were.
|
|
112
|
+
|
|
98
113
|
**Round 5 — The Council (convergence):**
|
|
99
114
|
- Spock (Star Trek) — code quality after fixes
|
|
100
115
|
- Ahsoka (Star Wars) — access control integrity
|
|
@@ -108,6 +123,8 @@ Troi also performs a **Marketing Copy Drift Check**: compare marketing page clai
|
|
|
108
123
|
|
|
109
124
|
**Pattern auth completeness check (Kenobi, during Rounds 2-3):** When a pattern file defines an authentication flow, verify the auth checks perform actual value verification (compare against expected, call verify functions) — not just presence checks (`!!header`, `Boolean()`). Flag `!!` or truthiness checks on auth-related headers as suspicious. (Field report #109: daemon socket auth used `!!vaultHeader` which passed for any non-empty string.)
|
|
110
125
|
|
|
126
|
+
**Contrast-finding admissibility (Reality stone / a11y, Galadriel's team) (field report #355 F1):** A contrast finding is **inadmissible** — and therefore CANNOT be rated **Critical** or **High** — unless it cites the **literal source hex for BOTH the foreground and the background**, each with its own `file:line`, AND the agent re-greps that the offending **class pairing actually exists** at the cited location before rating. Citing a token *name* (`--text-muted on --surface-2`) is not a hex; the agent must resolve the token to its computed `#rrggbb` value at the cited `file:line` and quote both colors. An uncited contrast finding (no source hex, or only one of the two colors, or a class pairing that no longer exists at the cited line) is logged as inadmissible and dropped before the fix batch. This defends against the **token-name-swap false-Critical** — a finding that asserts a contrast failure from token names alone, where the swapped/renamed token actually resolves to a compliant hex and the failing class pairing was never present in the rendered output.
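
A sketch of the admissibility gate together with the contrast computation it protects. The contrast math follows the WCAG 2.x relative-luminance definition. The finding schema (`foreground`/`background` entries with `hex` and `source` keys) is hypothetical, and the re-grep of the class pairing is not shown.

```python
import re

HEX = re.compile(r"^#[0-9a-fA-F]{6}$")


def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.x)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def is_admissible(finding: dict) -> bool:
    """Both colors need a literal #rrggbb value and a file:line citation."""
    for side in ("foreground", "background"):
        entry = finding.get(side) or {}
        if not HEX.match(str(entry.get("hex", ""))):
            return False  # a token name such as --text-muted is not a hex
        if ":" not in str(entry.get("source", "")):
            return False  # no file:line citation
    return True
```

Only findings that pass `is_admissible` would go on to have their ratio computed and rated.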
|
|
127
|
+
|
|
111
128
|
**Total: 30+ unique agent deployments across 5 rounds.**
|
|
112
129
|
|
|
113
130
|
## Escalation Pattern
|
|
@@ -164,6 +181,22 @@ Fix batches happen between rounds:
|
|
|
164
181
|
|
|
165
182
|
**Production-parity exit criterion:** Before any Gauntlet round can be marked PASS, verify that the test execution backend matches the project's declared production backend. If `PROJECT_VERSION.md` (or equivalent) declares PostgreSQL but `tests/conftest.py` autouse fixture pins SQLite (or vice versa), the Gauntlet **FAILS** regardless of green test counts. Tests pinned to the wrong backend silently mask the integrations that actually run in prod (RLS, asyncpg pools, advisory locks, LISTEN/NOTIFY, FOR UPDATE SKIP LOCKED, transaction semantics). Field report #315 M3: this slipped past 4 dual-backend Gauntlets on Union Station between v6.2.1 cutover and v7.6 — every Gauntlet was structurally blind to the runtime risk it was supposed to be reviewing. Concrete check at end of each round: `grep -nE "_backend\s*=\s*['\"]" tests/conftest.py` and reconcile against `cat PROJECT_VERSION.md | grep -i 'database\|backend'`. Mismatch = FAIL the round.
|
|
166
183
|
|
|
184
|
+
**Production-config boot exit criterion (Victory/launch-readiness Gauntlet) (field report #350 #1):** The #315 production-parity criterion above only reconciles the *test database backend*. It does NOT cover sandbox storage, sandbox email, sandbox extractor, or any other adapter that runs in a fake/sandbox mode under test but must resolve to a real implementation in production. For a Victory or launch-readiness Gauntlet, before any round can be marked PASS, run config validation in an **`APP_ENV=production` posture and ASSERT the app actually boots** under it. This catches, before launch:
|
|
185
|
+
|
|
186
|
+
- **Missing real adapters that throw** — a production adapter (`S3Storage`, `SESMailer`, a real extractor) whose constructor or first call raises because a required key/endpoint was never provisioned. Under sandbox the fake adapter swallows this; under `APP_ENV=production` it surfaces at boot.
|
|
187
|
+
- **Sandbox-in-prod** — config that silently falls back to the sandbox adapter when a production credential is absent, shipping fake storage/email/extraction to real users.
|
|
188
|
+
- **No prod-boot guard** — the absence of any startup assertion that production mode resolved zero sandbox adapters.
|
|
189
|
+
|
|
190
|
+
Concrete check: `APP_ENV=production <boot command> --check-config` (or the smallest invocation that triggers full adapter resolution) must exit 0 *and* log zero sandbox-adapter selections. A boot that throws, or that boots only by falling back to a sandbox adapter, **FAILS** the round.
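
One way a prod-boot guard might look, as a sketch under stated assumptions: adapters expose a hypothetical `is_sandbox` attribute, and the guard is called once at startup after adapter resolution. Neither the attribute nor the exception name comes from the source.

```python
class SandboxInProductionError(RuntimeError):
    """Raised when production mode resolved a sandbox adapter."""


def assert_production_adapters(app_env: str, adapters: dict[str, object]) -> None:
    """Fail the boot if any resolved adapter is a sandbox one in production."""
    if app_env != "production":
        return
    # `is_sandbox` is a hypothetical marker attribute on adapter instances.
    sandboxed = sorted(
        name
        for name, adapter in adapters.items()
        if getattr(adapter, "is_sandbox", False)
    )
    if sandboxed:
        raise SandboxInProductionError(
            "sandbox adapters resolved in production: " + ", ".join(sandboxed)
        )
```

A guard of this kind turns a silent fallback into a boot failure, which is what lets the exit criterion observe it.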
|
|
191
|
+
|
|
192
|
+
**Sandbox-blind-spot dimension (field report #350 #2):** A 100%-green **sandbox** test suite is *necessary but not sufficient*. Add a first-class round dimension that explicitly enumerates: **"what does the green sandbox suite NOT exercise *because* it runs in sandbox?"** Sandbox mode does not just substitute fake data — it changes which code paths execute. Concretely hunt for:
|
|
193
|
+
|
|
194
|
+
- **Selectors / accessors that throw only on real adapters** — a `get_url()`, `presign()`, or `extract()` that returns a canned value in sandbox but raises on the real implementation (missing region, unsigned URL, unsupported content type). The sandbox path never reaches the throwing branch.
|
|
195
|
+
- **Auto / silent paths suppressed by sandbox confidence pinning** — when sandbox pins a confidence score or classification to a constant, every downstream branch gated on that score is forced down one path. The auto-approve, auto-retry, or human-fallback branches that real (variable) confidence would trigger are never exercised by the green suite.
|
|
196
|
+
- **Coverage that is structurally unreachable in sandbox** — branches behind real rate limits, real pagination, real webhook signatures, real timeouts.
|
|
197
|
+
|
|
198
|
+
Output of this dimension is an explicit list: *"green sandbox suite does NOT cover: [path], [path], …"* — each entry is either covered by a production-posture test (see the boot criterion above) or logged as a known launch-risk gap. "All sandbox tests pass" is never, by itself, grounds to mark a launch-readiness round PASS.
|
|
199
|
+
|
|
167
200
|
## Finding Format
|
|
168
201
|
|
|
169
202
|
Every finding, from every agent, in every round, uses this format:
|
|
@@ -243,6 +276,9 @@ After the gauntlet completes (all mandated rounds), the caller MUST invoke `/git
|
|
|
243
276
|
- `--security-only` — Run 4 rounds of security only: inventory, full audit, re-probe, adversarial. Kenobi's marathon. For when you specifically need a deep security review.
|
|
244
277
|
- `--ux-only` — Run 4 rounds of UX only: surface map, full audit, re-verify, enchantment. Galadriel's marathon.
|
|
245
278
|
- `--qa-only` — Run 4 rounds of QA only: discovery, full pass, re-probe, adversarial. Batman's marathon.
|
|
279
|
+
|
|
280
|
+
**Single-domain `--focus` roster shape — surface-partition, don't stack duplicates (field report #355 F3):** When `--focus` names a **single domain** (or a `--*-only` marathon runs one lens for 4 rounds), build the roster from **surface-partitioned agents** — each agent owns a **distinct set of files/sections/routes**, not the whole codebase — and **cap the roster at ~6-8 agents**. Do NOT stack near-duplicate same-lens personas that all review **everything**: ten security agents each scanning the entire surface produce overlapping findings, inflated dedupe cost, and false consensus (the same false positive reported ten times reads as ten confirmations). Partition instead: assign Agent A the auth/session surface, Agent B the payment/billing routes, Agent C the file-upload + media paths, Agent D the multi-tenant query layer, etc. — distinct ownership per agent, full coverage across the roster, with one or two cross-cutting agents reserved for seams between partitions. Six-to-eight partitioned agents beat a dozen redundant ones on both coverage and signal-to-noise.
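
A roster built this way can be sanity-checked mechanically. The sketch below is illustrative: the roster shape (agent name mapped to a set of owned surfaces), the `cross_cutting` exemption, and the default cap of 8 are assumptions drawn from the paragraph above, not an existing VoidForge interface.

```python
def roster_problems(
    roster: dict[str, set[str]],
    cross_cutting: frozenset[str] = frozenset(),
    max_agents: int = 8,
) -> list[str]:
    """Flag an oversized roster and surfaces owned by more than one agent."""
    problems = []
    if len(roster) > max_agents:
        problems.append(f"roster has {len(roster)} agents; cap is {max_agents}")
    owner: dict[str, str] = {}
    for agent in sorted(roster):
        if agent in cross_cutting:
            continue  # seam reviewers may overlap partitions by design
        for surface in sorted(roster[agent]):
            if surface in owner:
                problems.append(
                    f"{surface} is owned by both {owner[surface]} and {agent}"
                )
            else:
                owner[surface] = agent
    return problems
```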
|
|
281
|
+
|
|
246
282
|
- `--resume` — Resume from the last completed round (reads from gauntlet-state.md).
|
|
247
283
|
- `--ux-extra` — Extra Éowyn enchantment emphasis across all rounds. Galadriel's team proposes micro-animations, copy improvements, and delight moments beyond standard usability/a11y. Produced 7 shipped enchantments in the v7.1.0 Gauntlet.
|
|
248
284
|
- `--assess` — **Pre-build assessment mode.** Run Rounds 1-2 only (Discovery + First Strike) and produce an assessment report — no fix batches, no Crossfire, no Council. Designed for evaluating existing codebases before a rebuild or migration. When an existing codebase has fundamental issues (stubs, abandoned migrations, missing auth), Rounds 3-10 become redundant because there are no fixes to verify between rounds. The assessment report groups findings by root cause rather than by domain, producing a "State of the Codebase" view. (Field report #125: Infinity Gauntlet on a half-built system produced 120+ findings all tracing to the same root cause — stubs returning True.)
|
|
@@ -323,6 +359,8 @@ Each agent reports a confidence score (0-100) on their findings. The score refle
|
|
|
323
359
|
|
|
324
360
|
**Why this matters:** In the v8.0 Gauntlet, several "findings" were false positives that wasted fix time. Confidence scoring lets agents express uncertainty instead of presenting everything as definitive. Low-confidence findings get a second opinion before reaching the user.
|
|
325
361
|
|
|
362
|
+
**PRINCIPLE — Critical findings are unconditionally verified (field report #345 DEAL-003):** Confidence is an advisory signal for routing *Medium and below*. It is NEVER a fast-track that lets a **Critical**-severity finding skip adversarial verification. The 90-100 "skip re-verification" optimization above applies to High/Medium/Low only — a Critical at confidence 97 is routed to the adversarial refute pass exactly the same as a Critical at confidence 40. Severity dominates confidence: when the two conflict, the higher severity wins the routing decision. Do not enshrine a runtime `needs_verify` boolean (or any per-finding "already verified, skip" flag) into the finding schema as a way to opt a Critical out of verification — Critical-routes-to-verification is a structural property of the protocol, not a field an agent (or a fix author) can toggle. The cost of one false-negative Critical reaching production dwarfs the cost of re-verifying a true-positive one.
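
Expressed as routing logic, the principle means severity is checked before confidence is even consulted. In this sketch the function name is hypothetical, and treating every non-Critical finding below 90 as needing verification is a simplifying assumption about the confidence tiers, which are defined elsewhere.

```python
def needs_adversarial_verification(severity: str, confidence: int) -> bool:
    """Decide whether a finding is routed to the refute pass."""
    if severity == "Critical":
        # Structural rule: no confidence value can opt a Critical out.
        return True
    # The 90-100 skip applies to High/Medium/Low only.
    return confidence < 90
```

The function takes no per-finding "already verified" flag, so there is nothing for an agent or fix author to toggle.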
|
|
363
|
+
|
|
326
364
|
## Sub-agent Failure Fallback
|
|
327
365
|
|
|
328
366
|
If a sub-agent launch fails (API error, timeout, context exhaustion):
|
|
@@ -30,6 +30,8 @@
|
|
|
30
30
|
|
|
31
31
|
Adversarial UX/UI QA review. Identify usability issues, inconsistencies, broken states, accessibility gaps, responsiveness problems. Implement safely in small batches. No redesigning for fun.
|
|
32
32
|
|
|
33
|
+
**Scope clarification — `/ux` is a UI/UX review verb, not a generic audit verb.** (Field report #342 F-3.) `/ux` reviews interface and experience: screens, flows, states, a11y, visual hierarchy, motion. **Documentation and content audits are out of `/ux`'s scope** — auditing prose, doc structure, broken links, stale instructions, or content accuracy is a different discipline with a different checklist. Route those to the doc-audit path (`/audit-docs`, see `DOC_AUDIT.md`), not here. `/ux` happens to be the most audit-shaped command in the roster, which tempts users to point every audit-flavored request at it; resist that. If a request is about *what the docs say* rather than *how the interface behaves*, hand off to the doc-audit path. (Tutorial/docs *surfaces* — the rendered page's usability, launch-context, prerequisite depth per Step 1.5 — remain in `/ux`'s scope; the *content audit* of those same docs does not.)
|
|
34
|
+
|
|
33
35
|
## When to Call Other Agents
|
|
34
36
|
|
|
35
37
|
| Situation | Hand off to |
|
|
@@ -162,11 +164,64 @@ Any fire-and-forget background operation (AI generation, file processing, deploy
|
|
|
162
164
|
Before hiding, relocating, or collapsing a UI container (dropdown, panel, menu, toolbar), list ALL actions inside it — primary (viewing, selecting, navigating) AND secondary (creating, deleting, configuring, exporting). Verify every action remains reachable after the redesign. A "simplification" that hides a version picker also hides the "New Version" button inside it.
|
|
163
165
|
(Field report #22: workspace redesign hid the version creation button that lived inside a dropdown.)
|
|
164
166
|
|
|
167
|
+
## Step 1.8 — Reference Grounding (World-Scan) — Mandatory
|
|
168
|
+
|
|
169
|
+
(Field reports #347, #2.)
|
|
170
|
+
|
|
171
|
+
Before Galadriel generates any visual direction — palette, type system, layout language, signature interaction — she must ground the work in the real design world. This step is **mandatory** input to every downstream generation step. Skipping it produces the single most common visual failure mode in agent-generated UI.
|
|
172
|
+
|
|
173
|
+
**The failure mode — committee-converges-on-the-mean.** When a committee of agents reasons about "what good design looks like" from training priors alone, every agent independently regresses toward the statistical center of its training distribution. The outputs agree with each other, feel internally consistent, and pass every internal review — yet land on the bland, averaged, instantly-recognizable look users now perceive as "AI slop." Consensus is not quality here; it is the symptom. The agents converged on the mean precisely *because* nothing pulled them off it. Internal agreement on visual direction, with no external reference, is a red flag, not a green light.
|
|
174
|
+
|
|
175
|
+
**The remedy — fan out to the real world first.** Before any visual generation, web-capable agents (Arwen, Éowyn) fan out to:
|
|
176
|
+
|
|
177
|
+
- **Award galleries:** Awwwards, FWA, CSSDA, Godly, Typewolf. These are curated, off-the-mean, and current.
|
|
178
|
+
- **The live competitor set:** the actual sites of the product's named competitors and adjacent best-in-class products — not a description of them, the live pages.
|
|
179
|
+
|
|
180
|
+
From that scan, extract **named** artifacts into a **reference dossier**:
|
|
181
|
+
|
|
182
|
+
- Specific sites worth stealing a move from (named, with the move identified: "Linear's command-palette transition," "Stripe's gradient-on-scroll hero," not "a clean SaaS site").
|
|
183
|
+
- Named typefaces and pairings actually in use (not "a modern sans").
|
|
184
|
+
- Named interactions and motion patterns (the signature moment, the page transition, the hover behavior) worth adapting.
|
|
185
|
+
|
|
186
|
+
**The dossier is required input downstream.** Every later generation step — Step 1.75 enchantment, Step 2 visual attack plan, any palette/type/layout proposal — must cite the dossier. A proposal with no reference anchor is unanchored from reality and is sent back. **Never generate visual direction from training priors alone.** The dossier is the gravity that pulls the work off the statistical mean.
|
|
187
|
+
|
|
188
|
+
## Step 1.85 — Converging Creative Direction
|
|
189
|
+
|
|
190
|
+
(Field reports #351, #2.)
|
|
191
|
+
|
|
192
|
+
Reference grounding tells you where the real world is. These three disciplines keep your own output off the mean and make creative direction actually converge instead of looping.
|
|
193
|
+
|
|
194
|
+
### Show, don't tell — prototype before you finalize
|
|
195
|
+
|
|
196
|
+
Creative direction does not converge from prose, mockups, or description. It converges only when a **feel-able interactive prototype of the signature moment** ships to a review URL someone can open and touch. The signature moment — the hero reveal, the command palette, the card-to-detail transition, whatever carries the product's character — must run in a browser at a real URL before the direction is called final. Reading "a smooth 200ms ease-out reveal" tells you nothing; opening the URL and feeling it tells you everything. Until the signature moment is feel-able at a URL, treat the direction as a proposal, not a decision. This is the fastest known way to break the description-loop where reviewers keep agreeing on words that mean different things to each of them.
|
|
197
|
+
|
|
198
|
+
### Token-scoped theming — pivots must be cheap
|
|
199
|
+
|
|
200
|
+
Scope color and type to **semantic tokens** (`--color-surface`, `--color-accent`, `--text-heading`, `--text-body`) from the first component, never hardcoded values inside components. The test: a palette pivot or a type pivot must be a **token change, not a component rewrite**. If switching the accent color or swapping the heading typeface requires editing more than the token definitions, the theming is not token-scoped and the pivot is expensive — which means in practice the pivot won't happen, and the design freezes on its first guess. Cheap pivots are what let creative direction explore and actually converge instead of committing to the first idea by inertia. Celeborn (Step 2 design-system governance) enforces token usage; this step establishes *why* it is load-bearing for creative direction, not just consistency.
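The "token change, not a component rewrite" test can be approximated with a small lint that flags literal colors inside component sources. This is a minimal sketch under stated assumptions: `find_hardcoded_colors` and the regex are illustrative names and heuristics, not part of VoidForge, and the check assumes components reference tokens through `var(--...)` rather than raw hex.

```python
import re

# Matches literal hex colors such as #fff, #1a2b3c, or #1a2b3c80.
# Heuristic only: it can also match fragment links like href="#abc".
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")


def find_hardcoded_colors(component_source: str) -> list[tuple[int, str]]:
    """Return (line_number, hex_literal) pairs found in a component source."""
    hits = []
    for line_number, line in enumerate(component_source.splitlines(), start=1):
        for match in HEX_COLOR.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits
```

Run it over component files only, not over the token definition file, where literal values are expected. Any hit in a component is a place where a palette pivot would require a component edit.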
|
|
201
|
+
|
|
202
|
+
### The de-AI checklist
|
|
203
|
+
|
|
204
|
+
Screen all copy and visuals against the tells that mark generated work as generated. Each tell below is a flag, not an automatic ban — but every flagged instance must be a deliberate, justified choice, never a default the model reached for:
|
|
205
|
+
|
|
206
|
+
**Copy tells:**
|
|
207
|
+
- **Em-dashes** used as the default connective rhythm (the most reliable single tell). Vary the punctuation; not every clause break is an em-dash.
|
|
208
|
+
- **Generic adjectives** — "seamless," "powerful," "robust," "intuitive," "elevate," "delightful," "effortless." Specific beats generic; show the thing instead of asserting it.
|
|
209
|
+
|
|
210
|
+
**Visual tells:**
|
|
211
|
+
- **Gradient-text** headings (the `bg-clip-text` rainbow/violet headline).
|
|
212
|
+
- **Pill eyebrows** — the small rounded-full badge above every hero headline.
|
|
213
|
+
- **Default Inter/Playfair pairing** — the reflexive "modern sans + elegant serif" combo. If the reference dossier (Step 1.8) didn't lead you there for a reason, don't reach for it by default.
|
|
214
|
+
- **Cream-editorial-as-trope** — the warm off-white background + serif + wide margins "editorial" look applied to products it doesn't fit, because the model treats it as shorthand for "premium."
|
|
215
|
+
|
|
216
|
+
A surface that trips three or more of these tells is presumed AI-slop and goes back for de-AI revision, anchored against the Step 1.8 reference dossier.
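A first-pass screen for some of these tells can be mechanical. The sketch below is hypothetical: the function names, the `em_dash_limit` threshold, and the substring checks are assumptions made for illustration, and it covers only the tells that leave a trace in copy or class names (the type-pairing and cream-editorial tells still need a human eye).

```python
import re

# Generic adjectives listed in the checklist above.
GENERIC_WORDS = {"seamless", "powerful", "robust", "intuitive",
                 "elevate", "delightful", "effortless"}
EM_DASH = "\u2014"


def find_tells(copy: str, markup: str, em_dash_limit: int = 3) -> set[str]:
    """Return the names of checklist tells detected in copy and markup.

    em_dash_limit is an illustrative threshold, not a rule from the checklist.
    """
    tells = set()
    if copy.count(EM_DASH) >= em_dash_limit:
        tells.add("em-dashes")
    words = set(re.findall(r"[a-z]+", copy.lower()))
    if words & GENERIC_WORDS:
        tells.add("generic-adjectives")
    if "bg-clip-text" in markup:
        tells.add("gradient-text")
    if "rounded-full" in markup:
        tells.add("pill-eyebrow")  # crude: rounded-full also styles avatars
    return tells


def needs_de_ai_revision(tells: set[str]) -> bool:
    """Three or more tells sends the surface back for revision."""
    return len(tells) >= 3
```

A hit is a flag to justify, not a verdict: a deliberate, reasoned use of any single tell still passes review.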
|
|
217
|
+
|
|
165
218
|
## Step 2 — UX/UI Attack Plan
|
|
166
219
|
|
|
167
220
|
**Elrond:** IA, navigation, task flows, friction.
|
|
168
221
|
**Arwen:** Spacing, typography, icons, button hierarchy, visual hierarchy.
|
|
169
222
|
**Samwise:** Keyboard nav, focus rings, ARIA, contrast, reduced motion. **WCAG contrast verification:** For the project's primary text/background combinations, verify WCAG AA contrast ratio (4.5:1 for normal text, 3:1 for large text). Check: primary text on primary bg, muted text on primary bg, accent text on primary bg. Opacity modifiers (e.g., `text-emerald-200/50`) blend the text color with the background and can cut the effective contrast sharply, so always compute the final rendered color, not the base color. A systematic check during the initial color system design prevents dozens of failing instances across the codebase. (Field report #38: 46 failing-contrast instances across 13 files, systemic from day 1.)
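Computing the final rendered color means compositing the translucent text over its background before applying the WCAG formula. The sketch below uses the WCAG 2.x relative-luminance and contrast-ratio definitions; the function names are illustrative, it handles only six-digit hex values, and the example colors in the usage note are assumptions chosen for demonstration.

```python
def parse_hex(color: str) -> tuple[float, float, float]:
    """Convert a six-digit '#rrggbb' string to an (r, g, b) tuple of 0-255 values."""
    color = color.lstrip("#")
    return tuple(float(int(color[i:i + 2], 16)) for i in (0, 2, 4))


def composite(fg, bg, alpha):
    """Blend a translucent foreground over an opaque background, per channel."""
    return tuple(alpha * f + (1 - alpha) * b for f, b in zip(fg, bg))


def relative_luminance(rgb) -> float:
    """WCAG 2.x relative luminance of an sRGB color."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(ch) for ch in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg) -> float:
    """WCAG contrast ratio between two opaque colors (1.0 to 21.0)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def rendered_contrast(fg_hex: str, bg_hex: str, alpha: float = 1.0) -> float:
    """Contrast of text at the given opacity, as actually rendered on bg."""
    fg, bg = parse_hex(fg_hex), parse_hex(bg_hex)
    return contrast_ratio(composite(fg, bg, alpha), bg)
```

For example, compare `rendered_contrast("#a7f3d0", "#0f172a", alpha=0.5)` with the same call at `alpha=1.0` and check both against the 4.5:1 threshold. The drop is not a fixed fraction of the base ratio, which is why the base color alone is not a usable proxy.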
|
|
223
|
+
|
|
224
|
+
**Contrast findings must be cited and re-grepped (#355 F1).** Computing the final rendered color is necessary but not sufficient. A contrast finding is **inadmissible if uncited**: it MUST cite the *literal source hex* for BOTH foreground and background, each with the `file:line` where it is defined (`tailwind.config.ts`, `globals.css`, or the relevant theme file). Before rating any contrast issue Critical or High, **RE-GREP the actual class usage** in the codebase to confirm the foreground/background pairing actually co-occurs on a real element — a pairing that never renders together is not a finding. **Token NAMES are not proxies for VALUES.** Never infer contrast from semantic token names: a token called `paper` may resolve to near-black and one called `ink` to near-white. Read the value, not the name. (In #355 F1 a token-name swap — assuming `paper`/`ink` meant light/dark by their names — produced a false site-wide Critical that did not exist once the actual hex values were read.)
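The admissibility rule can be made mechanical by giving each finding a shape that has no slot for a token name. A minimal sketch, assuming findings are recorded as structured data: `ColorCitation`, `ContrastFinding`, and `is_admissible` are hypothetical names, and the file paths in any example are placeholders.

```python
import re
from dataclasses import dataclass

HEX = re.compile(r"^#[0-9a-fA-F]{6}$")
FILE_LINE = re.compile(r"^\S+:\d+$")


@dataclass
class ColorCitation:
    hex_value: str   # literal source hex, e.g. "#0a0a0a" (never a token name)
    defined_at: str  # "file:line" where that hex is defined


@dataclass
class ContrastFinding:
    foreground: ColorCitation
    background: ColorCitation
    usage_sites: list[str]  # file:line hits from re-grepping the class pairing

    def is_admissible(self) -> bool:
        """Both colors cited with hex and file:line, and the pairing really renders."""
        cited = all(
            HEX.match(c.hex_value) and FILE_LINE.match(c.defined_at)
            for c in (self.foreground, self.background)
        )
        return bool(cited and self.usage_sites)
```

A finding that records `paper` or `ink` where a hex belongs fails the check, as does one whose re-grep found no element where the two classes co-occur.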
|
|
170
225
|
### Async Polling State Machine
|
|
171
226
|
Any UI that polls for backend status changes must implement 4 states: **idle -> syncing -> success or failure**. Never show "success" before the async confirmation resolves. Never show the old value alongside an "updated" banner. The polling result replaces the displayed value atomically: both change together or neither does. (Field report #149)
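The atomic-replacement rule is easiest to hold when the state and the displayed value live in one immutable record. The following is a sketch only, written in Python for illustration rather than in any particular UI framework; `SyncState`, `View`, `start_sync`, and `resolve` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SyncState(Enum):
    IDLE = "idle"
    SYNCING = "syncing"
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass(frozen=True)
class View:
    state: SyncState
    value: str  # the value currently shown to the user


def start_sync(view: View) -> View:
    """Enter syncing; the displayed value stays the last confirmed one."""
    return View(SyncState.SYNCING, view.value)


def resolve(view: View, confirmed_value: Optional[str]) -> View:
    """Apply a poll result. State and value change in one step or not at all."""
    if view.state is not SyncState.SYNCING:
        return view  # ignore stray poll results outside a sync
    if confirmed_value is None:
        return View(SyncState.FAILURE, view.value)  # old value, no success banner
    return View(SyncState.SUCCESS, confirmed_value)
```

Because each transition returns a whole new `View`, there is no intermediate render in which the banner says "success" while the old value is still on screen.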
|
|
172
227
|
|
|
@@ -225,6 +280,8 @@ Click through every primary journey. Document friction, broken UI, missing state
|
|
|
225
280
|
| ID | Title | Severity | Category | Location | Repro | Current | Expected | Recommendation | Files | Verified | Regression | Risk |
|
|
226
281
|
|----|-------|----------|----------|----------|-------|---------|----------|----------------|-------|----------|-----------|------|
|
|
227
282
|
|
|
283
|
+
**Severity must be enforcement-aware (#354 F2).** When a finding is a *client-side affordance or visibility leak* — a disabled-looking action that is still clickable, a hidden field present in the DOM, an admin control rendered to a non-admin — check whether the **server still enforces** the rule before rating it. If the backend rejects the action regardless of the client state, this is a **UX issue (P2/P3)**, not a security breach: the user-facing affordance is confusing or misleading, but no privilege is actually escalated. Only when the server fails to enforce does it cross into Kenobi's territory and escalate. Rate the UX defect, then hand the *enforcement* question to Security rather than inflating the UX severity. Cross-reference the SECURITY_AUDITOR enforcement-layer severity rubric (`SECURITY_AUDITOR.md`, Operating Rule 2 — "Severity = exploitability × impact"): a leak the server enforces has near-zero exploitability and so cannot be Critical on the security axis.
|
|
284
|
+
|
|
228
285
|
## Step 5 — Enhancement Specs (Before Coding)
|
|
229
286
|
|
|
230
287
|
Problem statement, proposed solution, acceptance criteria, UI details, a11y requirements (Samwise signs off), copy (Bilbo signs off), edge cases, out of scope.
|
|
@@ -261,6 +261,27 @@ Oracle scans for methods that return success without side effects — the most d
|
|
|
261
261
|
|
|
262
262
|
Flag as **High severity**. In financial systems (trading, payments, billing), flag as **Critical**. (Field report #125: `ProtectionService._place_stop_loss()` returned `True` after logging but never called the exchange. `OrderService.cancel_order()` returned `True` without cancelling.)
|
|
263
263
|
|
|
264
|
+
### Failure Attribution (multi-file test runs)
|
|
265
|
+
|
|
266
|
+
A test failure observed during a multi-file suite run is **NOT attributed to your change** until BOTH of these hold:
|
|
267
|
+
|
|
268
|
+
1. **It reproduces with that file run in ISOLATION.** Re-run only the failing test file by itself (e.g., `pytest path/to/test_x.py`, `npm test -- path/to/x.test.ts`, `go test ./pkg/x`). If the failure vanishes when the file runs alone, it is a cross-file collision, not your regression.
|
|
269
|
+
2. **It does NOT reproduce on clean HEAD.** `git stash` your working changes, re-run the same isolated file, and observe. If the failure is present on clean HEAD too, your change did not cause it. `git stash pop` to restore.
|
|
270
|
+
|
|
271
|
+
Shared-DB and shared-fixture suites routinely produce cross-file collisions — duplicate-seed conflicts, ordering dependencies, leaked global state, autoincrement-id assumptions — that masquerade as regressions introduced by the change under review. Attributing one of these to your fix sends the QA pass down a false trail and can trigger a "revert the good fix" overcorrection. Run the isolation check and the clean-HEAD check before you write the bug down or blame the diff. (Field report #349 F-3)
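The two checks can be scripted so they run the same way every time. This is a sketch under stated assumptions: `is_attributable` and `run` are hypothetical helpers, the working tree is assumed to hold uncommitted changes to tracked files (with nothing to stash, `git stash pop` would restore an older, unrelated stash entry), and the test command is whatever isolates the failing file in your stack.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]  # takes a command, returns its exit code


def run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def is_attributable(test_cmd: Sequence[str], runner: Runner = run) -> bool:
    """True only if the isolated file fails with the change and passes on clean HEAD."""
    if runner(test_cmd) == 0:
        return False  # passes alone: cross-file collision, not this change
    # Assumes there ARE uncommitted tracked changes to stash.
    if runner(["git", "stash"]) != 0:
        raise RuntimeError("git stash failed; working tree left untouched")
    try:
        fails_on_clean_head = runner(test_cmd) != 0
    finally:
        runner(["git", "stash", "pop"])  # always restore the working changes
    return not fails_on_clean_head
```

Usage would look like `is_attributable(["pytest", "path/to/test_x.py"])`. A `False` result clears the change of blame for that one failure only; it says nothing about whether the full suite is green.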
|
|
272
|
+
|
|
273
|
+
**Isolation-green is not deploy-green.** The two checks above clear a change of *blame* for a failure seen in the full run — they do NOT clear the change for deploy. Attribution runs the failing file in ISOLATION; a deploy gate runs the FULL suite. The asymmetry is the point: the very cross-file coupling that lets a collision masquerade as your regression also lets a *real* regression introduced by your change hide inside an *unrelated* test that only fails when the whole suite runs together (shared fixture your change now mutates, global state your change leaks, ordering your change perturbs). A targeted/isolation run of just the tests you touched can be all-green while the full suite is red on a file you never opened. Therefore: before declaring a change deploy-ready, run the FULL suite to green — never sign off on the strength of a targeted or isolation-only run. Isolation green proves "not my blame for *this* failure"; only full-suite green proves "safe to ship." (Field report #354 F3)
|
|
274
|
+
|
|
275
|
+
### Planted-Bug Check — Gates Must Gate
|
|
276
|
+
|
|
277
|
+
For every gate, threshold, or invariant a mission introduces (auth allowlist, eval scorer, rate cap, boot guard, validation boundary, feature flag), the review MUST confirm the gate actually gates: a deliberate inversion or revert of the gate's logic WOULD fail at least one test. Procedure — for each gate:
|
|
278
|
+
|
|
279
|
+
1. Identify the line(s) that enforce the gate.
|
|
280
|
+
2. Mentally (or, when cheap and reversible, actually) invert it — flip the comparison, negate the predicate, widen the allowlist, make the scorer return a constant pass, push the boundary off by one.
|
|
281
|
+
3. Ask: does any existing test now go red? If yes, the gate is covered. If no test trips, the gate is **untested** — the finding is **High**, and the deliverable is the missing test that would have caught the inversion.
|
|
282
|
+
|
|
283
|
+
A gate with no test that fails on its inversion is a **vacuous invariant**: it looks like protection but enforces nothing, because nothing observes whether it holds. Recurring vacuous-invariant anti-patterns (these surfaced **4x in a single session**): an eval scorer that always passes regardless of output; an auth allowlist with an inverted `!`-check that admits everyone; an off-by-one cap boundary that never actually caps; a truthy boot-guard that is always truthy and so never guards. Treat any newly-introduced gate as guilty until a failing-on-inversion test proves it innocent. (Field report #352 #1)
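A worked instance of the check, using the off-by-one cap boundary from the list above. All names here are hypothetical and the cap semantics (requests allowed while the count is strictly below the cap) are an assumption for illustration.

```python
def under_rate_cap(request_count: int, cap: int) -> bool:
    """Gate: allow a request only while the count is below the cap."""
    return request_count < cap


def off_by_one_gate(request_count: int, cap: int) -> bool:
    """Planted bug: lets one extra request through at the boundary."""
    return request_count <= cap


def boundary_test(gate) -> bool:
    """Probes both sides of the boundary, so it fails on the planted bug."""
    return gate(9, 10) is True and gate(10, 10) is False


def vacuous_test(gate) -> bool:
    """Never probes the boundary, so it passes for the real and the planted gate."""
    return gate(0, 10) is True
```

`boundary_test` goes red when the planted variant is swapped in, so the gate is covered. `vacuous_test` stays green either way, which is exactly the untested-gate finding: the deliverable would be the boundary test.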
|
|
284
|
+
|
|
264
285
|
### Safety-Critical Return Value Verification
|
|
265
286
|
|
|
266
287
|
For systems with safety-critical operations (stop-loss placement, circuit breakers, rollback triggers, payment captures, credential revocations): verify the return value of the safety operation BEFORE transitioning state. The pattern: `call safety operation → check return → only then transition`.
|