voidforge-build 23.19.0 → 23.20.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (59) hide show
  1. package/dist/.claude/agents/celebrimbor-forge-artist.md +1 -0
  2. package/dist/.claude/agents/ducem-token-economics.md +1 -0
  3. package/dist/.claude/agents/galadriel-frontend.md +1 -0
  4. package/dist/.claude/agents/romanoff-integrations.md +4 -0
  5. package/dist/.claude/agents/silver-surfer-herald.md +19 -4
  6. package/dist/.claude/commands/architect.md +4 -3
  7. package/dist/.claude/commands/assemble.md +12 -0
  8. package/dist/.claude/commands/assess.md +1 -0
  9. package/dist/.claude/commands/build.md +8 -0
  10. package/dist/.claude/commands/contextmeter.md +56 -0
  11. package/dist/.claude/commands/debrief.md +10 -0
  12. package/dist/.claude/commands/engage.md +5 -0
  13. package/dist/.claude/commands/git.md +13 -1
  14. package/dist/.claude/commands/imagine.md +1 -1
  15. package/dist/.claude/commands/seal.md +80 -0
  16. package/dist/.claude/commands/ux.md +13 -0
  17. package/dist/.claude/workflows/gauntlet.workflow.js +13 -1
  18. package/dist/CHANGELOG.md +38 -0
  19. package/dist/CLAUDE.md +8 -0
  20. package/dist/HOLOCRON.md +16 -2
  21. package/dist/VERSION.md +2 -1
  22. package/dist/docs/methods/AI_INTELLIGENCE.md +3 -0
  23. package/dist/docs/methods/ASSEMBLER.md +12 -0
  24. package/dist/docs/methods/BUILD_PROTOCOL.md +7 -0
  25. package/dist/docs/methods/CAMPAIGN.md +11 -0
  26. package/dist/docs/methods/DEVOPS_ENGINEER.md +56 -0
  27. package/dist/docs/methods/FIELD_MEDIC.md +1 -0
  28. package/dist/docs/methods/FORGE_ARTIST.md +3 -4
  29. package/dist/docs/methods/GAUNTLET.md +6 -0
  30. package/dist/docs/methods/MUSTER.md +2 -0
  31. package/dist/docs/methods/PRODUCT_DESIGN_FRONTEND.md +18 -0
  32. package/dist/docs/methods/QA_ENGINEER.md +17 -1
  33. package/dist/docs/methods/RELEASE_MANAGER.md +27 -0
  34. package/dist/docs/methods/SECURITY_AUDITOR.md +11 -1
  35. package/dist/docs/methods/SUB_AGENTS.md +31 -0
  36. package/dist/docs/methods/SYSTEMS_ARCHITECT.md +15 -0
  37. package/dist/docs/methods/TESTING.md +2 -0
  38. package/dist/docs/methods/TROUBLESHOOTING.md +2 -2
  39. package/dist/docs/methods/WORKFLOWS.md +14 -0
  40. package/dist/docs/patterns/ai-prompt-safety.ts +85 -0
  41. package/dist/docs/patterns/data-pipeline.ts +59 -1
  42. package/dist/docs/patterns/exclusion-set-invariant.md +62 -0
  43. package/dist/docs/patterns/multi-tenant-property-test.ts +64 -0
  44. package/dist/docs/patterns/oauth-token-lifecycle.ts +21 -0
  45. package/dist/scripts/statusline/README.md +38 -0
  46. package/dist/scripts/statusline/context-awareness-hook.sh +53 -0
  47. package/dist/scripts/statusline/settings-snippet.json +17 -0
  48. package/dist/scripts/statusline/voidforge-statusline.sh +91 -0
  49. package/dist/scripts/voidforge.js +69 -6
  50. package/dist/wizard/lib/claude-md-strategy.d.ts +87 -0
  51. package/dist/wizard/lib/claude-md-strategy.js +198 -0
  52. package/dist/wizard/lib/marker.d.ts +48 -1
  53. package/dist/wizard/lib/marker.js +58 -2
  54. package/dist/wizard/lib/patterns/oauth-token-lifecycle.d.ts +14 -0
  55. package/dist/wizard/lib/patterns/oauth-token-lifecycle.js +21 -0
  56. package/dist/wizard/lib/project-init.js +59 -0
  57. package/dist/wizard/lib/updater.d.ts +19 -0
  58. package/dist/wizard/lib/updater.js +84 -33
  59. package/package.json +2 -2
@@ -53,6 +53,7 @@ return { confirmed: claims.filter((c,i) => verdicts[i]?.survives) }
53
53
  5. **Cost lever:** route cheap stages with `agent(p, {model:'haiku'})` (scout pre-scans) and reserve the default model for synthesis — the way the Surfer already runs on Haiku.
54
54
  6. **`agentType` resolves by the agent's `name:` display field, NOT the filename** (e.g. `'Picard'`, not `'picard-architecture'`). A filename-style `agentType` fails to resolve and the `agent()` call returns `null` (silently filtered by `.filter(Boolean)`), so the agent simply never runs. If a roster carries both, pass `a.name`. Same rule as the Agent tool's `subagent_type`.
55
55
  7. **Validate before shipping:** a workflow script's top-level `await`/`return` make a bare `node --check` fail ("Illegal return statement") — that is expected (the runtime wraps the body in an async fn). Use `npm run validate:workflows` (wired into `pretest`), which reproduces the wrapper before checking, so a real syntax error is caught in CI rather than shipping to npm.
56
+ 8. **Repro scratch goes to `mktemp`, never the repo tree** (#366 F5). A workflow's adversarial/repro agents that reproduce a finding via shell (probe scripts, atomic-write `.tmp` files, fixture dirs) MUST write to `$(mktemp -d)` (or `$(mktemp)` for a single file) — isolated, auto-cleaned, invisible to `git add -A`. Never write probe scripts or scratch into the working tree: the gauntlet's gate-race repro littered `.gate-repro-scratch/` and `scripts/surfer-gate/.*-probe.sh` into the repo on two separate runs and was nearly committed. The agent prompt that asks for a shell repro must say *where* to write it. Projects may also `.gitignore` a designated scratch path as a backstop, but the primary rule is `mktemp`. (Same rule for raw Agent dispatch — see `SUB_AGENTS.md`.)
56
57
 
57
58
  ## Gate interop (ADR-064) — REQUIRED
58
59
 
@@ -71,6 +72,19 @@ The 264 personas, the Agent Debate Protocol, severity re-rating from votes, the
71
72
 
72
73
  Every Workflow run persists its script + a journal. To resume after an edit/kill: `Workflow({scriptPath, resumeFromRunId})` — unchanged `agent()` calls return cached results; the first edited call and everything after re-runs.
73
74
 
75
+ ## Recovery — after `/clear` or a crash (#366 F1)
76
+
77
+ A background workflow survives **neither** `/clear` **nor** a host crash. Both leave the launching task's output empty (0-byte) or partial — the run did not finish synthesizing, even though the journal on disk may hold dozens of completed `agent()` results. The reflex is to re-run from scratch; for a 60–80-agent gauntlet that throws away ~80 minutes and the token cost of every cached agent. **Resume FIRST.**
78
+
79
+ **Recovery procedure:**
80
+
81
+ 1. **Record the `runId` at launch.** `/gauntlet` and `/assemble` write the workflow `runId` to their state file (and the vault) the moment they invoke the Workflow tool, so a fresh post-`/clear` session can find it. If you don't have it, the runtime can list recent runs for the script.
82
+ 2. **On an empty or partial task-output, resume — don't restart.** `Workflow({ scriptPath, resumeFromRunId })` replays the journal: every unchanged `agent()` call returns its cached result instantly, and execution continues from the first incomplete call through the final synthesis. You pay only for what didn't finish.
83
+ 3. **Empty-output handling is not "the run failed."** A 0-byte output means the *lead's task* was interrupted, not that the agents didn't run. Check the journal/`runId` before concluding the work was lost.
84
+ 4. **What survives:** the script source and the per-call result journal (so cached `agent()` results survive). **What does NOT survive:** in-flight agents at crash time (re-run on resume), and any repro scratch the agents wrote (gone with `mktemp`, as it should be — Gotcha 8). If you *edited* the script after the crash, resume re-runs from the first changed call forward; an unchanged script resumes cleanly.
85
+
86
+ Re-running from scratch is correct only when no `runId` is recoverable. Treat blind restart as the fallback, not the default.
87
+
74
88
  ## Related
75
89
 
76
90
  - `SUB_AGENTS.md` — dispatch discipline, model/effort tiering, the find→verify review shape, fan-out residual sweeps.
@@ -288,10 +288,95 @@ const conferenceUrlField: UntrustedExtractionField = {
288
288
  * surface the raw value on the review surface for operator edit.
289
289
  */
290
290
 
291
+ // --- Deny-list discipline (forbidden-inference / forbidden-token filters) ---
292
+
293
+ /**
294
+ * Pattern for a deny-list that strips or rejects forbidden content an LLM might
295
+ * emit — e.g. a compliance filter that must NOT let the model infer or assert a
296
+ * subject's wealth, accreditation, or citizenship. A naive "does the output
297
+ * contain any banned token?" substring/regex filter false-fires three ways and
298
+ * is silently un-testable a fourth. Field report #378 (InvestorGraph) hit all
299
+ * four on a compliance-critical forbidden-inference filter:
300
+ *
301
+ * 1. NEGATION / DISCLAIMER false-positive
302
+ * The model correctly writing "*no* accreditation evidence" or "citizenship
303
+ * unknown" is the SAFE answer — yet a bare token match strips it and
304
+ * penalizes the model for being careful. The filter must scope matches to
305
+ * POSITIVE assertions: if a negation/disclaimer cue sits adjacent to the
306
+ * banned token, the mention is not a leak.
307
+ *
308
+ * 2. PROPER-NOUN false-positive
309
+ * A contact employed at "Visa", a fund literally named "Trust Fund", a
310
+ * company "BIG RICH LLC", a "...High Net Worth Community" group — the banned
311
+ * substring appears inside a legitimate entity name the model is allowed to
312
+ * report. An allowlist of known proper nouns (and the entity's own
313
+ * attribute values — employer, company, group names) must suppress the match.
314
+ *
315
+ * 3. HOMOGLYPH / ZERO-WIDTH evasion (false-NEGATIVE — the dangerous direction)
316
+ * An adversary (or a quirk of upstream data) writes "аccredited" with a
317
+ * Cyrillic 'а', or splits the token with a zero-width joiner, and the banned
318
+ * term sails through. NFKC-normalize and strip zero-width / combining marks
319
+ * BEFORE matching so visually-identical variants collapse to the canonical
320
+ * form the deny-list is written against.
321
+ *
322
+ * 4. TAUTOLOGICAL EVAL (the un-testable trap)
323
+ * The safety EVAL's leak-detector must be INDEPENDENT of the production
324
+ * filter. If the eval re-imports the same deny-list / regex the filter uses,
325
+ * it is structurally incapable of catching the filter's gaps — every term
326
+ * the filter misses, the eval also misses, so the eval reports PASS on a
327
+ * real leak. Testing a filter with itself is vacuous. The leak-detector
328
+ * must be built from an independent oracle (a hand-curated banned-phrase
329
+ * set, a second model, an LLM-judge, or human labels).
330
+ */
331
+ export interface DenyListPolicy {
332
+ forbiddenTerms: string[] // canonical, post-NFKC banned tokens/phrases
333
+ normalizeBeforeMatch: 'nfkc-strip-zerowidth' // ALWAYS normalize first (guard #3)
334
+ negationGuard: { // guard #1 — a nearby negation/disclaimer un-flags the match
335
+ enabled: true
336
+ cues: string[] // e.g. ['no', 'not', 'unknown', 'unverified', 'absent', 'lacks']
337
+ windowTokens: number // how many tokens of adjacency count as "negating" the term
338
+ }
339
+ properNounAllowlist: string[] // guard #2 — names containing a banned substring that are OK
340
+ allowEntityAttributeValues: boolean // guard #2 — also exempt the entity's own employer/company/group fields
341
+ evalLeakDetector: 'independent' // guard #4 — MUST NOT reuse this policy's forbiddenTerms
342
+ }
343
+
344
+ const accreditationDenyList: DenyListPolicy = {
345
+ forbiddenTerms: ['accredited', 'net worth', 'high net worth', 'citizenship', 'wealthy'],
346
+ normalizeBeforeMatch: 'nfkc-strip-zerowidth',
347
+ negationGuard: {
348
+ enabled: true,
349
+ cues: ['no', 'not', 'unknown', 'unverified', 'absent', 'lacks', 'without', 'cannot confirm'],
350
+ windowTokens: 4,
351
+ },
352
+ properNounAllowlist: ['Visa', 'Trust Fund', 'BIG RICH LLC', 'High Net Worth Community'],
353
+ allowEntityAttributeValues: true,
354
+ evalLeakDetector: 'independent',
355
+ }
356
+
357
+ /* ANTI-PATTERN 5: bare substring/regex deny-list with a self-referential eval
358
+ *
359
+ * 'We strip any output line containing a banned term, and our safety eval
360
+ * greps the output for the same banned terms — 11/11 pass, ship it.'
361
+ *
362
+ * No. Four failures, three loud and one silent:
363
+ * - "no accreditation evidence" (the SAFE answer) is stripped + penalized.
364
+ * - A contact at "Visa" / a "Trust Fund" is flagged on a proper noun.
365
+ * - "аccredited" (Cyrillic а) or a zero-width-split token slips through.
366
+ * - The eval reuses the filter's deny-list, so it CANNOT fail on a leak the
367
+ * filter misses — 11/11 is a tautology, not evidence of safety.
368
+ *
369
+ * Fix: NFKC-normalize + strip zero-width BEFORE matching (defeats evasion);
370
+ * scope matches to positive assertions via a negation-adjacency guard; suppress
371
+ * proper-noun / entity-attribute matches via an allowlist; and build the eval's
372
+ * leak-detector from an INDEPENDENT oracle so it can actually fail.
373
+ */
374
+
291
375
  export {
292
376
  authorityInstruction,
293
377
  denyListEnforcement,
294
378
  fsPermsEnforcement,
295
379
  threadplexAgentStack,
296
380
  conferenceUrlField,
381
+ accreditationDenyList,
297
382
  }
@@ -10,6 +10,11 @@
10
10
  * - Batch vs streaming mode toggle — same stages, different execution
11
11
  * - Error handling: skip-and-log vs fail-fast configurable per pipeline
12
12
  * - Progress reporting callback for observability
13
+ * - Source-format discovery BEFORE assuming CSV — the first stage detects the
14
+ * real input format and dispatches to a SourceAdapter. Never hardcode
15
+ * `read_csv`. A "giant contact dump" is frequently NOT a CSV (field report
16
+ * #378: a 4k-row export arrived as an Apple Contacts `.abbu` SQLite bundle).
17
+ * See the SourceAdapter section in Framework Adaptations below.
13
18
  *
14
19
  * Agents: Stark (backend), Banner (data), L (monitoring)
15
20
  *
@@ -250,6 +255,56 @@ export {
250
255
  checkNullRate, checkRange, computeDedupHash,
251
256
  };
252
257
 
258
+ // ── Source Adapter (format discovery — field report #378) ──────────────
259
+ //
260
+ // The PRD says "CSV" but the real authorized source is often something else.
261
+ // A pipeline's FIRST stage must DISCOVER the format and dispatch to an adapter,
262
+ // never assume CSV. Each adapter normalizes its source into the same record
263
+ // shape the rest of the pipeline consumes (e.g. a flat contact row). Adding a
264
+ // source = adding an adapter, not editing every downstream stage.
265
+ //
266
+ // type SourceFormat = 'csv' | 'vcard' | 'sqlite-contacts' | 'json';
267
+ //
268
+ // /** Sniff the format from extension + magic bytes — do NOT trust the name alone. */
269
+ // function detectSourceFormat(path: string, head: Buffer): SourceFormat {
270
+ // const ext = path.toLowerCase();
271
+ // if (ext.endsWith('.vcf')) return 'vcard'; // vCard text
272
+ // if (ext.endsWith('.abbu') || ext.endsWith('.abcddb')) return 'sqlite-contacts'; // Apple Contacts store
273
+ // if (head.subarray(0, 16).toString() === 'SQLite format 3') return 'sqlite-contacts';
274
+ // if (ext.endsWith('.json')) return 'json';
275
+ // if (head[0] === 0x42 && head[1] === 0x45 && head[2] === 0x47) return 'vcard'; // "BEG" of BEGIN:VCARD
276
+ // return 'csv';
277
+ // }
278
+ //
279
+ // interface SourceAdapter { read(path: string): Promise<Record<string, unknown>[]>; }
280
+ //
281
+ // // --- vCard (.vcf) ------------------------------------------------------
282
+ // // STUB: parse with a vCard lib (e.g. `vcf`/`ical.js`); map FN/EMAIL/TEL/ORG
283
+ // // to the canonical contact record. A single .vcf can hold many VCARD blocks.
284
+ // const vcardAdapter: SourceAdapter = {
285
+ // async read(_path) { throw new Error('Implement: split on BEGIN:VCARD, map FN/EMAIL/TEL/ORG'); },
286
+ // };
287
+ //
288
+ // // --- SQLite contact stores (.abbu bundle / .abcddb) -------------------
289
+ // // STUB: an Apple Contacts `.abbu` is a BUNDLE containing an `.abcddb` SQLite
290
+ // // file; open read-only and SELECT from ZABCDRECORD/ZABCDEMAILADDRESS etc.
291
+ // // (schema varies by macOS version — probe table names, don't hardcode).
292
+ // const sqliteContactsAdapter: SourceAdapter = {
293
+ // async read(_path) { throw new Error('Implement: open .abcddb read-only, join ZABCDRECORD + email/phone tables'); },
294
+ // };
295
+ //
296
+ // // --- JSON export -------------------------------------------------------
297
+ // // STUB: many providers export a JSON array (or NDJSON); validate with Zod
298
+ // // before mapping — exported JSON is untyped and frequently partial.
299
+ // const jsonAdapter: SourceAdapter = {
300
+ // async read(_path) { throw new Error('Implement: parse + Zod-validate, map to canonical record'); },
301
+ // };
302
+ //
303
+ // // SECURITY: every one of these formats is a PII export. The default
304
+ // // .gitignore must cover them up front (*.vcf *.abbu *.abcddb* *.json input
305
+ // // dumps) — field report #378 logged TWO near-misses where a non-CSV source
306
+ // // dump sat un-ignored in the repo root.
307
+ //
253
308
  // ── Framework Adaptations ───────────────────────────────
254
309
  //
255
310
  // === Python (pandas/polars) ===
@@ -262,7 +317,10 @@ export {
262
317
  // raise FileNotFoundError(path)
263
318
  //
264
319
  // def transform(self, path: str) -> pl.DataFrame:
265
- // return pl.read_csv(path)
320
+ // # Discover the format first — do NOT assume CSV (field report #378).
321
+ // fmt = detect_source_format(path) # 'csv'|'vcard'|'sqlite-contacts'|'json'
322
+ // return SOURCE_ADAPTERS[fmt](path) # each adapter -> canonical DataFrame
323
+ // # e.g. sqlite-contacts: sqlite3.connect(f"file:{abcddb}?mode=ro", uri=True)
266
324
  //
267
325
  // class CleanStage:
268
326
  // def validate(self, df: pl.DataFrame) -> None:
@@ -0,0 +1,62 @@
1
+ # Pattern: Exclusion-Set Superset Invariant
2
+
3
+ **When to use:** Any project where MORE THAN ONE mechanism independently enumerates "secret / PII / excluded" files — typically `.gitignore`, an `rsync --exclude` (or `tar --exclude`) deploy list, and a secret-scanner config (gitleaks/trufflehog/detect-secrets). Containment-heavy projects (autonomous agents, deploy pipelines that ship a working tree to a host) are the high-risk case.
4
+
5
+ **Source:** Field report #377 §5 (live secret exposure traced to three exclusion mechanisms drifting apart).
6
+
7
+ ## The Failure Mode
8
+
9
+ Each mechanism enumerates "the secret files" by its OWN rules, authored at a different time by a different concern:
10
+
11
+ - `.gitignore` keeps secrets OUT OF GIT.
12
+ - `rsync --exclude` (deploy) keeps secrets OFF THE TARGET HOST.
13
+ - the secret-scanner keeps secrets OUT OF COMMITS / CI.
14
+
15
+ Because the three lists are written and maintained separately, they drift. A file the `.gitignore` covers shipped through `rsync` world-readable, and the scanner's name patterns never matched it — so a secret excluded from git was deployed to the host and went undetected. Three "secured" mechanisms, zero of them caught the leak, because none of them agreed on the set.
16
+
17
+ The trap: each list looks complete in isolation. The bug is in the DELTA between them, which no single mechanism can see.
18
+
19
+ ## The Pattern — One Canonical Set, the Others are Supersets
20
+
21
+ Define ONE canonical secret/PII exclusion set. Every other mechanism's exclusion set must be a SUPERSET of it (it may exclude more — never less). Then assert the invariant in CI so it cannot silently drift.
22
+
23
+ 1. **Canonical source.** Pick one list as canonical (usually `.gitignore`'s secret section, or a dedicated `secrets.exclude` manifest). This is the minimum set every mechanism must cover.
24
+
25
+ 2. **Derive, don't duplicate, where possible.** Generate the `rsync --exclude-from=` file and the scanner's path patterns FROM the canonical set at build time. Derivation makes drift structurally impossible; if a mechanism's format can't be derived, fall to the assertion below.
26
+
27
+ 3. **Assert the superset invariant.** A CI/provisioning check that fails closed:
28
+
29
+ ```bash
30
+ # exclusion-set-invariant check — every mechanism must cover the canonical set.
31
+ # Canonical set = the secret/PII globs that MUST be excluded everywhere.
32
+ canonical=$(sort -u docs/security/secrets.exclude) # one file, one canonical truth
33
+
34
+ # Each mechanism exposes its excluded globs (normalize to one-glob-per-line).
35
+ gitignore=$(git_secret_globs) # secret section of .gitignore
36
+ rsync_excl=$(cat deploy/rsync.exclude)
37
+ scanner=$(scanner_path_globs) # gitleaks/trufflehog allow/deny paths
38
+
39
+ fail=0
40
+ for mech in "gitignore:$gitignore" "rsync:$rsync_excl" "scanner:$scanner"; do
41
+ name="${mech%%:*}"; have="${mech#*:}"
42
+ # Anything in canonical NOT covered by this mechanism = drift = fail.
43
+ missing=$(comm -23 <(printf '%s\n' "$canonical" | sort -u) \
44
+ <(printf '%s\n' "$have" | sort -u))
45
+ if [[ -n "$missing" ]]; then
46
+ echo "EXCLUSION DRIFT: '$name' is missing canonical entries:" >&2
47
+ echo "$missing" >&2
48
+ fail=1
49
+ fi
50
+ done
51
+ exit "$fail"
52
+ ```
53
+
54
+ 4. **Wire it into the gates.** Run the check in CI AND as a deploy/arming pre-flight (per the field report it was a deploy-time exposure). A new secret pattern added to the canonical set then forces every mechanism to cover it, or the build/deploy fails.
55
+
56
+ ## The Invariant, Stated
57
+
58
+ > `canonical ⊆ gitignore` AND `canonical ⊆ rsync_exclude` AND `canonical ⊆ scanner` — at all times, enforced by an assertion. Supersets are fine; subsets are drift.
59
+
60
+ ## The Trade-off
61
+
62
+ Derivation (step 2) is strictly better than assertion (step 3) — it removes the possibility of drift instead of detecting it — but not every tool accepts a generated exclude format, and some teams want each mechanism's list hand-tunable for its own extra concerns (rsync excluding build artifacts; the scanner allow-listing test fixtures). The superset invariant is the floor that permits those per-mechanism extras while forbidding any mechanism from covering LESS than the canonical secret set. Use derivation where the format allows; fall back to the asserted invariant everywhere else. (Field report #377 §5.)
@@ -34,6 +34,20 @@ declare const harness: {
34
34
  listAllReadEndpoints(): string[];
35
35
  listAllWriteEndpoints(): string[];
36
36
  resetDb(): Promise<void>;
37
+
38
+ // ── Handler-entry (HTTP-level) harness — field report #371 ──────────────
39
+ // Drives the REAL request entrypoint with a concrete credential, so the
40
+ // auth→uid wiring is exercised (not just the repository's WHERE org_id).
41
+ // `principal` is whatever the entrypoint actually authenticates with: a
42
+ // bearer token, a session cookie, an API key header — give two DISTINCT ones.
43
+ httpRequest(
44
+ principal: { headers: Record<string, string> },
45
+ method: 'GET' | 'POST' | 'PUT' | 'DELETE',
46
+ path: string,
47
+ body?: unknown,
48
+ ): Promise<{ status: number; json: unknown }>;
49
+ // Two distinct, real principals for the SAME logical resource owner vs other.
50
+ principalForOrg(org: { apiKey: string; userId: string }): { headers: Record<string, string> };
37
51
  };
38
52
 
39
53
  // ── The Property ─────────────────────────────────────────────────────────
@@ -85,6 +99,43 @@ describe('multi-tenant isolation property', () => {
85
99
  const rowsB = await harness.readAsOrg(orgB, '/api/people');
86
100
  expect(rowsB.find((r) => r.org_id === orgA.id)).toBeUndefined();
87
101
  });
102
+
103
+ // ── Handler-entry two-principal variant (field report #371) ──────────────
104
+ // The repository-layer property above can pass while a handler that hardcodes
105
+ // `uid = 1` leaks across tenants — the repo test never crosses the auth→uid
106
+ // seam. This variant drives the REAL HTTP entrypoint with TWO DISTINCT
107
+ // credentials and asserts isolation through the request path. It is the test
108
+ // that the planted-bug check below must turn red.
109
+ test('two distinct principals through the real handler do not cross tenants', async () => {
110
+ const orgA = await harness.createOrg();
111
+ const orgB = await harness.createOrg();
112
+ const pA = harness.principalForOrg(orgA);
113
+ const pB = harness.principalForOrg(orgB);
114
+
115
+ // A writes through the real entrypoint with A's own credential.
116
+ const created = await harness.httpRequest(pA, 'POST', '/api/people', { name: 'A-secret' });
117
+ expect(created.status).toBeLessThan(300);
118
+ const writtenId = (created.json as { id: string }).id;
119
+
120
+ // B reads every list endpoint through the real entrypoint with B's credential.
121
+ for (const readEndpoint of harness.listAllReadEndpoints()) {
122
+ const res = await harness.httpRequest(pB, 'GET', readEndpoint);
123
+ const rows = Array.isArray(res.json) ? (res.json as Array<{ id?: string }>) : [];
124
+ expect(rows.find((r) => r.id === writtenId)).toBeUndefined();
125
+ }
126
+
127
+ // Cross-principal direct fetch: B asking for A's row by id must 404, not 403
128
+ // (404 avoids leaking existence — see CLAUDE.md "Return 404, not 403").
129
+ const direct = await harness.httpRequest(pB, 'GET', `/api/people/${writtenId}`);
130
+ expect(direct.status).toBe(404);
131
+ });
132
+
133
+ // PLANTED-BUG RED-CHECK (field report #371): hardcoding `uid = <owner>` in the
134
+ // handler MUST turn the two-principal test above RED. If you can introduce
135
+ // that bug and the suite stays green, your isolation test is not crossing the
136
+ // auth→uid seam — it is asserting at the repository layer only. Run this once
137
+ // as a mutation check: patch the handler to ignore the authenticated principal
138
+ // and pin uid to org A's id; the test above must fail. Revert after proving it.
88
139
  });
89
140
 
90
141
  function randomPayload(): fc.Arbitrary<unknown> {
@@ -112,8 +163,21 @@ function randomPayload(): fc.Arbitrary<unknown> {
112
163
  // assert not any(r['id'] == written['id'] for r in rows_b), \
113
164
  // f"LEAK: {write_endpoint} -> {read_endpoint}"
114
165
  //
166
+ // # Handler-entry two-principal variant (field report #371) — drive the real
167
+ // # entrypoint (FastAPI TestClient / Django test Client) with two distinct
168
+ // # credentials, NOT the repository:
169
+ // # ra = client.post('/api/people', json={'name': 'A'}, headers=princ_a)
170
+ // # rb = client.get(f"/api/people/{ra.json()['id']}", headers=princ_b)
171
+ // # assert rb.status_code == 404 # not 403 — don't leak existence
172
+ // # Mutation check: pin uid=<owner> in the handler; this MUST go red.
173
+ //
115
174
  // ── Anti-patterns ────────────────────────────────────────────────────────
116
175
  //
176
+ // 0. Asserting isolation only at the repository layer. A handler that
177
+ // hardcodes uid=1 passes every repo-level test while leaking across
178
+ // tenants. The isolation test MUST drive the real request entrypoint with
179
+ // two distinct principals (field report #371). Prove it with the planted
180
+ // uid red-check.
117
181
  // 1. Testing isolation only on known endpoints. The bug is in the endpoint
118
182
  // you forgot. Property tests enumerate the full surface.
119
183
  // 2. Using SUPERUSER fixtures. They silently bypass FORCE RLS at the engine
@@ -8,6 +8,20 @@
8
8
  * - Failure escalation: retry 3x → pause platform → alert → requires_reauth
9
9
  * - Token stored as encrypted blob in vault, keyed by platform name
10
10
  * - Session token (daemon) rotates every 24 hours (§9.19.15)
11
+ * - VERIFY EXPIRY + REFRESH-GRANT BEHAVIOR AGAINST THE PROVIDER'S LIVE DOCS AT
12
+ * INTEGRATION TIME. The PLATFORM_CONFIGS TTLs below are STARTING ASSUMPTIONS,
13
+ * not ground truth — providers change them and "no refresh token / never
14
+ * expires" is a common false assumption. Field report #373: a Todoist
15
+ * integration shipped on "tokens don't expire," but the modern API issues
16
+ * ~1h access tokens WITH a refresh token; the code discarded the refresh
17
+ * token + expiry and registered no refresher, so it died ~1h after every
18
+ * connect across four sessions — looking exactly like intermittent
19
+ * revocation. At integration time: (1) read the provider's OAuth docs and
20
+ * quote the verified access-token TTL + whether a refresh_token is issued;
21
+ * (2) if a refresh_token exists, PERSIST it and register a refresher — never
22
+ * discard it; (3) distinguish "expired" from "revoked" via the API's OWN
23
+ * error body, not by inference (an expired token that mimics revocation will
24
+ * send you reauth-hunting instead of refreshing).
11
25
  *
12
26
  * Agents: Breeze (platform relations), Dockson (vault)
13
27
  *
@@ -50,6 +64,13 @@ interface PlatformTokenConfig {
50
64
  revokeEndpoint?: string;
51
65
  }
52
66
 
67
+ // ASSUMPTIONS, NOT GROUND TRUTH (field report #373). These TTLs and the
68
+ // "refreshTokenTtlDays: 0 = never expires" entries are starting points. At
69
+ // integration time, VERIFY each value against the provider's current OAuth
70
+ // docs and the live token response (`expires_in`, presence of `refresh_token`)
71
+ // — a provider that "doesn't expire" today may issue ~1h tokens tomorrow, and
72
+ // a missing refresher then surfaces as recurring prod token-death that mimics
73
+ // revocation. Treat any new platform here the same way before shipping.
53
74
  const PLATFORM_CONFIGS: PlatformTokenConfig[] = [
54
75
  { platform: 'meta', accessTokenTtlHours: 1440, refreshTokenTtlDays: 0, refreshEndpoint: 'https://graph.facebook.com/v19.0/oauth/access_token' },
55
76
  { platform: 'google', accessTokenTtlHours: 1, refreshTokenTtlDays: 0, refreshEndpoint: 'https://oauth2.googleapis.com/token' },
@@ -0,0 +1,38 @@
1
+ # Context Meter — status line + awareness hook
2
+
3
+ Two small scripts that surface how full the context window is — one for the human, one for the model.
4
+
5
+ | Script | Wired to | Audience | What it does |
6
+ |--------|----------|----------|--------------|
7
+ | `voidforge-statusline.sh` | `statusLine` (settings.json) | you | Renders one line: model + a colored meter (`⟦████████░░⟧ 78%`) + tokens remaining. Green → yellow → red as the window fills. |
8
+ | `context-awareness-hook.sh` | `UserPromptSubmit` hook | Claude | Once usage crosses a threshold, injects "you have ~X% left, checkpoint soon" into the model's own context each turn. Silent below the threshold. |
9
+
10
+ The model can't see its own remaining context. The status line tells *you*; the hook tells *Claude* — so it can wrap up open loops and suggest `/vault` or `/seal` before compaction instead of being surprised by it.
11
+
12
+ ## Install
13
+
14
+ **Default-on.** `npx voidforge-build init` already wires both scripts into a new project's `.claude/settings.json` (warn 80% / crit 92%). Nothing to do for a fresh project.
15
+
16
+ To re-install, retune, or activate on a project that predates this feature, run **`/contextmeter`** — it chmods these scripts and merges the right block into `.claude/settings.json`. Or wire it by hand: merge `settings-snippet.json` into `.claude/settings.json`. Remove with `/contextmeter --uninstall`.
17
+
18
+ ## How it reads context
19
+
20
+ - **Status line:** prefers the native `context_window` object Claude Code pipes on stdin (`used_percentage`, `context_window_size`). Falls back to deriving usage from the most recent assistant `message.usage` in `transcript_path` on older Claude Code that doesn't send the field.
21
+ - **Hook:** the hook stdin has no `context_window` object, so it always derives from `transcript_path` (`input_tokens + cache_read_input_tokens + cache_creation_input_tokens`).
22
+ - 1M-token sessions are detected automatically (usage above 200k ⇒ 1,000,000 denominator), or set `VOIDFORGE_CONTEXT_WINDOW`.
23
+
24
+ ## Tuning (env)
25
+
26
+ | Var | Default | Effect |
27
+ |-----|---------|--------|
28
+ | `VOIDFORGE_CONTEXT_WINDOW` | `200000` | Denominator when the size field is absent. |
29
+ | `VOIDFORGE_CONTEXT_WARN_PCT` | `80` | Hook starts speaking — and the meter turns yellow — at this % used. |
30
+ | `VOIDFORGE_CONTEXT_CRIT_PCT` | `92` | Hook escalates to "checkpoint NOW" — and the meter turns red — at this %. |
31
+
32
+ Both scripts read the same two thresholds, so the meter's yellow/red bands stay in lockstep with the hook's warn/critical bands. `/contextmeter --warn-pct N` / `--crit-pct N` bake these into the command strings in settings.json so they persist without a shell export.
33
+
34
+ ## Requirements & caveats
35
+
36
+ - **`jq`** is required. Without it the status line prints a one-line "install jq" notice and the hook no-ops — neither ever breaks your session.
37
+ - Only the **first line** of status-line stdout is shown by Claude Code, so the meter is deliberately single-line.
38
+ - **Name:** this ships as `/contextmeter`, not `/statusline` — Claude Code's native `/statusline` and `/context` commands always shadow a same-named project command (see `docs/NATIVE_CAPABILITIES.md`).
@@ -0,0 +1,53 @@
1
+ #!/usr/bin/env bash
2
+ # context-awareness-hook.sh — UserPromptSubmit hook that injects context-budget
3
+ # awareness INTO Claude's own context as the window fills.
4
+ #
5
+ # The status-line meter is for the human; this hook is for the model. Claude
6
+ # cannot see its own remaining context directly, so each turn (once usage crosses
7
+ # a threshold) this prints a JSON object whose `hookSpecificOutput.additionalContext`
8
+ # Claude receives — "you have ~X% left, checkpoint soon." Below the threshold it is
9
+ # silent, so it adds zero noise until it matters.
10
+ #
11
+ # Cadence: Claude Code has no time/turn-interval hooks — UserPromptSubmit (once per
12
+ # user turn) is the finest cadence available, which is exactly when fresh awareness
13
+ # is useful. Threshold-gated so it behaves like a periodic warning that only speaks
14
+ # near the limit.
15
+ #
16
+ # Requires jq; without it, no-op (exit 0). A hook must never break the turn.
17
+ #
18
+ # Env knobs:
19
+ # VOIDFORGE_CONTEXT_WINDOW denominator (default 200000; auto-bumps to 1000000 when usage exceeds 200k)
20
+ # VOIDFORGE_CONTEXT_WARN_PCT start warning at this % used (default 80)
21
+ # VOIDFORGE_CONTEXT_CRIT_PCT escalate to "checkpoint NOW" at this % (default 92)
22
+ set -uo pipefail
23
+
24
+ input="$(cat 2>/dev/null || true)"
25
+ command -v jq >/dev/null 2>&1 || exit 0
26
+
27
+ transcript="$(printf '%s' "$input" | jq -r '.transcript_path // empty' 2>/dev/null)"
28
+ [ -n "$transcript" ] && [ -f "$transcript" ] || exit 0
29
+
30
+ usage="$(tail -n 400 "$transcript" | jq -c 'select(.message.usage != null) | .message.usage' 2>/dev/null | tail -1)"
31
+ [ -n "$usage" ] || exit 0
32
+ used="$(printf '%s' "$usage" | jq -r '((.input_tokens//0)+(.cache_read_input_tokens//0)+(.cache_creation_input_tokens//0))' 2>/dev/null)"
33
+ used="${used%%.*}"
34
+ [ -n "${used:-}" ] || exit 0
35
+
36
+ if [ "$used" -gt 200000 ] 2>/dev/null; then window=1000000; else window="${VOIDFORGE_CONTEXT_WINDOW:-200000}"; fi
37
+ [ "${window:-0}" -gt 0 ] 2>/dev/null || exit 0
38
+ pct=$(( used * 100 / window ))
39
+
40
+ warn="${VOIDFORGE_CONTEXT_WARN_PCT:-80}"
41
+ crit="${VOIDFORGE_CONTEXT_CRIT_PCT:-92}"
42
+ [ "$pct" -lt "$warn" ] && exit 0
43
+
44
+ rem_k=$(( (window - used) / 1000 ))
45
+
46
+ if [ "$pct" -ge "$crit" ]; then
47
+ msg="⚠️ CONTEXT CRITICAL: ~${pct}% of the ${window}-token window is used (~${rem_k}k left). Compaction is imminent — checkpoint NOW: run /vault (or /seal) to preserve session state before the context is summarized, and prefer finishing the current sub-task over starting new work."
48
+ else
49
+ msg="Context monitor: ~${pct}% of the ${window}-token window is used (~${rem_k}k left). You are approaching the limit — wrap up open loops and consider /vault or /seal to checkpoint before compaction."
50
+ fi
51
+
52
+ jq -cn --arg m "$msg" '{hookSpecificOutput:{hookEventName:"UserPromptSubmit",additionalContext:$m}}'
53
+ exit 0
@@ -0,0 +1,17 @@
1
+ {
2
+ "statusLine": {
3
+ "type": "command",
4
+ "command": "bash scripts/statusline/voidforge-statusline.sh",
5
+ "padding": 0
6
+ },
7
+ "hooks": {
8
+ "UserPromptSubmit": [
9
+ {
10
+ "matcher": "",
11
+ "hooks": [
12
+ { "type": "command", "command": "bash scripts/statusline/context-awareness-hook.sh" }
13
+ ]
14
+ }
15
+ ]
16
+ }
17
+ }
@@ -0,0 +1,91 @@
1
+ #!/usr/bin/env bash
2
+ # voidforge-statusline.sh — Context-usage meter for the Claude Code status line.
3
+ #
4
+ # Reads the status-line JSON on stdin and prints ONE line:
5
+ # <model> ⟦████████░░⟧ 78% ctx · 44k left
6
+ # The meter is colored green → yellow → red as the context window fills.
7
+ #
8
+ # Source of truth: the native `.context_window` object Claude Code pipes to the
9
+ # status line (`used_percentage`, `context_window_size`). When that field is
10
+ # absent (older Claude Code), it falls back to deriving usage from the most
11
+ # recent assistant `message.usage` in `.transcript_path`.
12
+ #
13
+ # Requires jq. Without jq it prints a minimal line and exits 0 — a status line
14
+ # must NEVER hard-fail (that would blank the bar).
15
+ #
16
+ # Env knobs (shared with the awareness hook so colors and warnings stay in lockstep):
17
+ # VOIDFORGE_CONTEXT_WINDOW denominator when the size field is absent (default 200000)
18
+ # VOIDFORGE_CONTEXT_WARN_PCT meter turns yellow at this % used (default 80)
19
+ # VOIDFORGE_CONTEXT_CRIT_PCT meter turns red at this % used (default 92)
20
+ set -uo pipefail
21
+
22
+ input="$(cat 2>/dev/null || true)"
23
+
24
+ if ! command -v jq >/dev/null 2>&1; then
25
+ printf 'VoidForge · ctx meter needs jq (brew install jq)\n'
26
+ exit 0
27
+ fi
28
+
29
+ j() { printf '%s' "$input" | jq -r "$1" 2>/dev/null; }
30
+
31
+ model="$(j '.model.display_name // .model.id // "Claude"')"
32
+ pct="$(j '.context_window.used_percentage // empty')"
33
+ window="$(j '.context_window.context_window_size // empty')"
34
+
35
+ # Fallback: derive from the transcript when the native field is absent.
36
+ if [ -z "$pct" ]; then
37
+ transcript="$(j '.transcript_path // empty')"
38
+ if [ -n "$transcript" ] && [ -f "$transcript" ]; then
39
+ usage="$(tail -n 400 "$transcript" | jq -c 'select(.message.usage != null) | .message.usage' 2>/dev/null | tail -1)"
40
+ if [ -n "$usage" ]; then
41
+ used="$(printf '%s' "$usage" | jq -r '((.input_tokens//0)+(.cache_read_input_tokens//0)+(.cache_creation_input_tokens//0))' 2>/dev/null)"
42
+ used="${used%%.*}"
43
+ if [ -z "$window" ]; then
44
+ if [ "${used:-0}" -gt 200000 ] 2>/dev/null; then window=1000000; else window="${VOIDFORGE_CONTEXT_WINDOW:-200000}"; fi
45
+ fi
46
+ if [ -n "${used:-}" ] && [ "${window:-0}" -gt 0 ] 2>/dev/null; then
47
+ pct=$(( used * 100 / window ))
48
+ fi
49
+ fi
50
+ fi
51
+ fi
52
+
53
+ # Coerce to integer; bail to model-only if we still have nothing.
54
+ pct="${pct%%.*}"
55
+ if [ -z "$pct" ]; then
56
+ printf '%s\n' "$model"
57
+ exit 0
58
+ fi
59
+ [ -z "$window" ] && window="${VOIDFORGE_CONTEXT_WINDOW:-200000}"
60
+ window="${window%%.*}"
61
+
62
+ [ "$pct" -lt 0 ] 2>/dev/null && pct=0
63
+ [ "$pct" -gt 100 ] 2>/dev/null && pct=100
64
+
65
+ remaining=$(( window - window * pct / 100 ))
66
+ if [ "$remaining" -ge 1000 ]; then rem_h="$(( remaining / 1000 ))k"; else rem_h="${remaining}"; fi
67
+
68
+ # Color band — defaults align with the awareness-hook thresholds (warn 80 → yellow,
69
+ # crit 92 → red) so the meter turns red exactly when the hook goes critical. Both
70
+ # honor the same env vars, so retuning one retunes the other.
71
+ yellow_at="${VOIDFORGE_CONTEXT_WARN_PCT:-80}"
72
+ red_at="${VOIDFORGE_CONTEXT_CRIT_PCT:-92}"
73
+ if [ "$pct" -ge "$red_at" ]; then color=$'\033[31m' # red — checkpoint now
74
+ elif [ "$pct" -ge "$yellow_at" ]; then color=$'\033[33m' # yellow — getting full
75
+ else color=$'\033[32m' # green — healthy
76
+ fi
77
+ reset=$'\033[0m'
78
+ dim=$'\033[2m'
79
+
80
+ # 10-cell meter, rounded.
81
+ filled=$(( (pct + 5) / 10 ))
82
+ [ "$filled" -gt 10 ] && filled=10
83
+ [ "$filled" -lt 0 ] && filled=0
84
+ bar=""
85
+ i=0
86
+ while [ "$i" -lt 10 ]; do
87
+ if [ "$i" -lt "$filled" ]; then bar="${bar}█"; else bar="${bar}░"; fi
88
+ i=$(( i + 1 ))
89
+ done
90
+
91
+ printf '%s %s⟦%s⟧ %d%%%s %sctx · %s left%s\n' "$model" "$color" "$bar" "$pct" "$reset" "$dim" "$rem_h" "$reset"