voidforge-build 23.14.0 → 23.15.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/dist/CHANGELOG.md CHANGED
@@ -6,6 +6,32 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/), and this
6
6
 
7
7
  ---
8
8
 
9
+ ## [23.15.0] - 2026-06-13
10
+
11
+ ### Platform alignment — gate↔Workflow (ADR-064) + model-ID/effort/concurrency currency
12
+
13
+ Output of `/architect --plan` (a 12-agent platform-evolution review of VoidForge against the mid-2026 Claude Code platform) → `/build` items **(b)** then **(a)**.
14
+
15
+ ### Added
16
+
17
+ - **ADR-064 — Silver Surfer Gate ↔ Dynamic Workflow interop.** Empirically confirmed the `PreToolUse` gate (`check.sh:99`, matcher `"Agent"` only) is **structurally blind to Workflow-tool-spawned agents**: across this session ~60+ workflow agents produced exactly **2** gate events (`Surfer self-launch`, `ROSTER_RECEIVED`), and a controlled `gate-probe` workflow left the count unchanged (BEFORE=2 / AFTER=2). Decision: extend the matcher to `Agent|Workflow` and gate the workflow *launch* on a recorded roster. **Implementation is a campaign mission** (it touches the gate + its test suite); the ADR records the decision + the reproducible test.
18
+
19
+ ### Fixed
20
+
21
+ - **Live runtime model-ID bug** — `packages/voidforge/wizard/lib/anthropic.ts` `resolveBestModel()` fell back to **`claude-sonnet-4-7`, a model that does not exist**, on the exact API-unreachable path the fallback exists for (→ 404 when reliability matters most). Corrected to `claude-sonnet-4-6` at both fallback sites, fixed the test that *asserted the bug* (now 6/6 green), and updated `PRD.md` / `FAILURE_MODES.md` / `AI_INTELLIGENCE.md`. (#359-adjacent; surfaced by Seldon + Troi.)
22
+ - **Stale model IDs in reference patterns** — `claude-sonnet-4-20250514` → `claude-sonnet-4-6` across 6 `docs/patterns/*.ts` (every `init` copies these); `Opus 4.7` → `Opus 4.8` across `SUB_AGENTS.md` + ADR-050/051/054/059. (Historical CHANGELOG mentions of `4-7` left intact.)
23
+
24
+ ### Changed
25
+
26
+ - **Effort-tiering policy** added to `SUB_AGENTS.md` Model Tiering (leads `xhigh` / specialists `medium` / **scouts omit — Haiku 4.5 errors on `effort` and caps at 200K context**) and mapped onto the flag taxonomy in `CLAUDE.md` (default→xhigh/medium, `--fast`→medium/low, `ultracode`-keyword caveat). The 264-file frontmatter fleet edit + the ADR-054 amendment are deferred to the campaign (pending runtime verification that agent-frontmatter `effort:` is honored).
27
+ - **ADR-059 amended** with the real platform concurrency ceiling (**~16 concurrent / ~1,000 per run** — the "20+/30+ parallel" framing was context-headroom, not actual parallelism; batch unbounded fan-outs) and promoted Proposed→Accepted. **`GAUNTLET.md`** "Each round launches agents in waves of 3" (which contradicted ADR-059) corrected to full-roster-with-batching.
28
+
29
+ ### Pipeline
30
+
31
+ Dogfooded the v23.13.1 pre-tag `npm test` gate and the v23.14.0 publish-gate alignment. Dep range `^23.14.0` → `^23.15.0` (ADR-062). Operator-directed follow-on this session: `/architect --plan` ADR-065 (platform version floor) + ADR-066 (native-capability collision tracker) + amend ADR-051/054 → `/campaign --plan` → `/campaign` build all non-stop.
32
+
33
+ ---
34
+
9
35
  ## [23.14.0] - 2026-06-12
10
36
 
11
37
  ### Field Report Triage — 2 reports closed (#362, #363)
package/dist/CLAUDE.md CHANGED
@@ -210,6 +210,8 @@ Default is now maximum quality: autonomous execution + full agent roster + all r
210
210
  | `--interactive` | Pause for human confirmation at mission briefs and between phases | `/campaign`, `/assemble`, `/build` |
211
211
  | `--solo` | Lead agent only, no sub-agents | All commands |
212
212
 
213
+ **Effort mapping (Claude Code effort levels).** The opt-out ladder maps onto platform effort levels: **default** → `xhigh` on the Opus lead + `medium` on specialists; **`--fast`** → `medium` lead + `low` specialists (and/or fewer rounds); **`--solo`** → lead only at standard effort. **Haiku-tier scouts take NO effort parameter — Haiku 4.5 errors on it.** `effort` is a per-agent spend lever independent of model tier (see `SUB_AGENTS.md` Model Tiering). Caveat: the literal keyword `ultracode` auto-launches dynamic workflows on current Claude Code — keep it out of unescaped `$ARGUMENTS`/`--focus` text.
214
+
213
215
  **Retired flags (accepted silently as no-ops for backward compat):** `--blitz`, `--muster`, `--infinity`
214
216
 
215
217
  See `/docs/methods/MUSTER.md` for the full Muster Protocol.
package/dist/VERSION.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # Version
2
2
 
3
- **Current:** 23.14.0
3
+ **Current:** 23.15.0
4
4
 
5
5
  ## Versioning Scheme
6
6
 
@@ -14,6 +14,7 @@ This project uses [Semantic Versioning](https://semver.org/):
14
14
 
15
15
  | Version | Date | Summary |
16
16
  |---------|------|---------|
17
+ | 23.15.0 | 2026-06-13 | Platform-alignment build (`/architect --plan` → `/build` b+a). **P0:** empirically confirmed the Silver Surfer `PreToolUse` gate is **blind to Workflow-tool-spawned agents** (this session: 60+ workflow agents → 2 gate events; controlled probe BEFORE=2/AFTER=2) and wrote **ADR-064** (gate↔Workflow interop — extend matcher to `Agent\|Workflow`, gate the workflow launch; implementation tracked for the campaign). **P1-B near-free batch:** fixed a **live runtime bug** — `anthropic.ts` fell back to the non-existent `claude-sonnet-4-7` (404 on the exact degraded path the fallback exists for) → `claude-sonnet-4-6`, plus the bug-asserting test (now 6/6) and 4 docs; purged stale model IDs (`claude-sonnet-4-20250514`→`claude-sonnet-4-6` in 6 pattern files; `Opus 4.7`→`4.8` across SUB_AGENTS + 4 ADRs); added the **effort-tiering policy** (leads `xhigh` / specialists `medium` / Haiku omit-no-effort+200K ceiling) to SUB_AGENTS.md + CLAUDE.md flag-taxonomy mapping; **amended ADR-059** with the real platform caps (~16 concurrent / ~1,000 per run) and fixed GAUNTLET.md's contradicting "waves of 3". Dogfooded the pre-tag `npm test` gate (ADR from v23.13.1) + the publish-gate alignment (v23.14.0). Dep `^23.14.0` → `^23.15.0`. Follow-on (operator-directed): `/architect --plan` ADR-065/066 + amend ADR-051/054 → `/campaign` to build all. |
17
18
  | 23.14.0 | 2026-06-12 | Field Report Triage — 2 reports closed (#362, #363) via `/debrief --inbox`, 8 fixes across 9 files. **#363** (self-filed last session): release flow now runs the test suite as Step 5's first action before any tag (`git.md`, since tag-push arms an irreversible publish); **Numeric constant migration checklist** generalizing the error-shape rule (`TESTING.md`); **Registry-Derived Fan-Out** coverage rule — enumerate the accepted `(fixId,targetFile)` tuple set, diff-check after appliers (`SUB_AGENTS.md` + `debrief.md` Step 6); **Chronically-Red Check Policy** (red ≥2 releases → fix/informational/remove) + **Publish-gate alignment** (publish must `needs:` the full E2E+a11y suite, not unit-only) (`DEVOPS_ENGINEER.md` + `RELEASE_MANAGER.md`); Workflow `args`-as-JSON-string defensive parse + `gh workflow` scope note (`SUB_AGENTS.md` + `RELEASE_MANAGER.md`). **#362** (enhancements): a named, right-sized **Pre-Deploy Review Gate** (diff-scoped N lenses + mandatory adversarial-verify) documented in `SUB_AGENTS.md` and realized as a new `/engage --pre-deploy --diff` mode; atomic-visual **render-harness screenshot carve-out** (`QA_ENGINEER.md` + `PRODUCT_DESIGN_FRONTEND.md`). Dogfooded #363 in its own release: ran the coverage diff-check (9/9 files) and `npm test` (1390/1390) before tagging. Dep `^23.13.1` → `^23.14.0` (ADR-062). |
18
19
  | 23.13.1 | 2026-06-12 | Publish-gate fix for v23.13.0. The #360 roster-TTL change (600s→3600s in `scripts/surfer-gate/check.sh`) did not update the gate's own `test.sh`, whose "Stale roster (>10min) blocks" case aged a roster 11 min and expected a block — now still *fresh* under the 1-hour TTL, so it returned exit 0 (expected 2). The CI `pretest` gate (`bash scripts/surfer-gate/test.sh`) failed → the `Publish to npm` job's test stage failed → both publish jobs were skipped (v23.13.0 was tagged but **never published**; npm stayed at 23.12.2). Fix: age the stale-roster test roster to 61 min (past the new TTL) and relabel ">1hr". Pure CI-gate fix — no methodology behavior change beyond v23.13.0. Full suite 1390/1390 green. Lesson for next time: a TTL/threshold change in a gate script must update the gate's adversarial test in the same commit (the test.sh stale case is exactly the kind of threshold-coupled assertion #356-F4 / #358-F1 warn about). Dep range `^23.13.0` → `^23.13.1` (ADR-062). |
19
20
  | 23.13.0 | 2026-06-12 | Field Report Triage — 6 reports closed (#356–#361). `/debrief --inbox` triaged all 6 open reports against the post-v23.12.2 tree via two-phase workflow orchestration (per-report investigators → adversarial verify of every already-fixed verdict → per-file appliers), applying 23 accepted fixes across 17 files + 1 new pattern. Clusters: **deploy-safety** (empty-string-into-strict-Zod boot crash + `z.preprocess` fix, "render≠load" config-LOADS gate, canary-worker-first, pre-build disk preflight, OAuth IdP-side-vs-regression — DEVOPS_ENGINEER.md, deploy.md, lucius-config); **adversarial-verify rigor** (reproduce through the REAL execution path not a library-in-isolation, GAUNTLET.md #356; composition/wiring lens for the Victory Gauntlet + ship-vs-enable ADR requirement #358); **mandatory-verification** (run prompt evals INLINE via `eval:op` not deferred to operator + mandatory adversarial review for untrusted→user-facing-sink paths #359; live-fire credential verification + premise-verification recon #360); **secret surfaces** (git remote / `.git/config` inline-credential scan — SECURITY_AUDITOR.md Phase 1, leia-secrets, deploy-preflight.ts, DEVOPS #361); **test fidelity** (real-output seeded-mutant self-test for LLM/external-output boundaries — QA_ENGINEER.md/TESTING.md #358). Surfer-gate roster TTL 600s→3600s + refresh-on-activity (check.sh/ADR-060 #360). New pattern `codemod-hygiene.md` (strip incidental recast reformatting #357; 51→52). #358-F3 (find→verify pattern) verified already-shipped in v23.12.0; #360-F4 reporter-scoped to project LEARNINGS. Dep range `^23.12.2` → `^23.13.0` (ADR-062). |
@@ -310,7 +310,7 @@ When a project uses AI, the PRD frontmatter should include:
310
310
  ```yaml
311
311
  ai: yes # Activates Seldon's review
312
312
  ai_provider: "anthropic" # anthropic | openai | local | multi
313
- ai_models: ["claude-sonnet-4-7"] # Models used — update to current runtime model
313
+ ai_models: ["claude-sonnet-4-6"] # Models used — update to current runtime model
314
314
  ai_features: ["classification", "generation", "tool-use", "routing"]
315
315
  ```
316
316
 
@@ -18,7 +18,7 @@
18
18
  - When the project has significant attack surface (auth, payments, user data, WebSocket, file uploads)
19
19
  - Before a public launch or investor demo
20
20
 
21
- **Dispatch model:** All Gauntlet rounds MUST dispatch to sub-agents per `SUB_AGENTS.md` "Parallel Agent Standard." Agents are launched as named subagent types defined in `.claude/agents/` with model tiering (Opus leads, Sonnet specialists, Haiku scouts) and tool restrictions. The main thread manages rounds, triages findings between rounds, and applies fixes — it does NOT read source files or analyze code inline. Each round launches agents in waves of 3 (max concurrent). Findings pass between rounds as summary tables, not raw code. See `docs/AGENT_CLASSIFICATION.md` for the full agent manifest (see docs/AGENT_CLASSIFICATION.md). (Field report #270)
21
+ **Dispatch model:** All Gauntlet rounds MUST dispatch to sub-agents per `SUB_AGENTS.md` "Parallel Agent Standard." Agents are launched as named subagent types defined in `.claude/agents/` with model tiering (Opus leads, Sonnet specialists, Haiku scouts) and tool restrictions. The main thread manages rounds, triages findings between rounds, and applies fixes — it does NOT read source files or analyze code inline. Fan out the full roster in parallel for read-only analysis (ADR-059) — the runtime queues beyond ~16 concurrent, so batch large rounds rather than assume true N-wide parallelism (and stay under the ~1,000-agent/run ceiling; never one-agent-per-file unbounded). Findings pass between rounds as summary tables, not raw code. See `docs/AGENT_CLASSIFICATION.md` for the full agent manifest (see docs/AGENT_CLASSIFICATION.md). (Field report #270)
22
22
 
23
23
  **When NOT to use /gauntlet:**
24
24
  - During active development (use `/assemble` instead — it includes building)
@@ -276,6 +276,8 @@ Each file is a standalone subagent definition that Claude Code's native subagent
276
276
 
277
277
  Leads inherit the main session's model (Opus). Specialists run on Sonnet for cost efficiency without sacrificing analysis quality. Scouts run on Haiku for fast, cheap reconnaissance.
278
278
 
279
+ **Effort tiering (per-agent spend lever).** Claude Code exposes an `effort:` level (`low`/`medium`/`high`/`xhigh`/`max`) that controls reasoning depth *independently* of the model tier. Apply by role: **Leads → `xhigh`** (the recommended start for agentic work on Opus 4.8); **Specialists → `medium`** (read-and-report review rarely needs full `high` spend across ~200 agents); **Scouts → OMIT** — **Haiku 4.5 does not support the effort parameter and errors if it is passed.** Haiku also has a **200K context ceiling (not 1M)**: the Surfer pre-scan and scout prompts must fit within it — read agent frontmatter (name/description/tags), not full bodies, on large rosters. Verify the runtime honors agent-frontmatter `effort:` before a fleet edit; the tier *policy* stands regardless. (Platform research 2026-06; the ADR-054 amendment + the 264-file fleet edit are tracked as a campaign mission.)
280
+
279
281
  ### Tool Restrictions
280
282
 
281
283
  | Profile | Tools | Agents |
@@ -485,7 +487,7 @@ Field report #324 (Union Station v7.8 R2): three agents (Discovery + Stark + Ken
485
487
 
486
488
  ### Concurrency Rules (ADR-059)
487
489
 
488
- - **Fan out the full roster in parallel for read-only analysis.** Opus 4.7's 1M context window handles 20+ concurrent findings tables without thrashing. Field report #270 confirmed 15+ parallel agents at 15-25% context usage.
490
+ - **Fan out the full roster in parallel for read-only analysis.** Opus 4.8's 1M context window handles 20+ concurrent findings tables without thrashing. Field report #270 confirmed 15+ parallel agents at 15-25% context usage.
489
491
  - **No two concurrent agents may write to the same file** — partition by domain/concern, or serialize writes.
490
492
  - **Fix/build agents:** batch into waves only when writes overlap. Independent files = parallel.
491
493
  - **Wait for ALL parallel agents before synthesizing** (field report #300).
@@ -51,7 +51,7 @@ export async function classify<T extends string>(
51
51
  const start = Date.now()
52
52
 
53
53
  const response = await client.messages.create({
54
- model: 'claude-sonnet-4-20250514',
54
+ model: 'claude-sonnet-4-6',
55
55
  max_tokens: 256,
56
56
  system: [
57
57
  systemPrompt,
@@ -75,7 +75,7 @@ export async function classify<T extends string>(
75
75
  label: parsed.label as T,
76
76
  confidence: parsed.confidence,
77
77
  reasoning: parsed.reasoning,
78
- model: 'claude-sonnet-4-20250514',
78
+ model: 'claude-sonnet-4-6',
79
79
  latencyMs: Date.now() - start,
80
80
  }
81
81
  }
@@ -241,8 +241,8 @@ export function compareVersions(
241
241
  // { id: 'tech-1', input: 'App crashes on login', expected: '{"label":"technical"}', tags: ['technical'] },
242
242
  // ])
243
243
  //
244
- // const baseResult = await suite.run(classifyV1, '2024.01.01', 'claude-sonnet-4-20250514')
245
- // const candidateResult = await suite.run(classifyV2, '2024.01.15', 'claude-sonnet-4-20250514')
244
+ // const baseResult = await suite.run(classifyV1, '2024.01.01', 'claude-sonnet-4-6')
245
+ // const candidateResult = await suite.run(classifyV2, '2024.01.15', 'claude-sonnet-4-6')
246
246
  // const comparison = compareVersions(baseResult, candidateResult)
247
247
  //
248
248
  // if (comparison.verdict === 'fail') {
@@ -364,7 +364,7 @@ export const CLAUDE_PROMPT_EVAL_CATEGORIES = {
364
364
  * await suite.run(sandboxRunner, version, 'sandbox')
365
365
  *
366
366
  * // Live pass — the actual gate, catches output-shape bugs:
367
- * await suite.run(liveModelRunner, version, 'claude-sonnet-4-20250514')
367
+ * await suite.run(liveModelRunner, version, 'claude-sonnet-4-6')
368
368
  */
369
369
 
370
370
  /**
@@ -59,7 +59,7 @@ export async function executeWithRetry<T>(
59
59
  export async function summarize(client: Anthropic, text: string): Promise<Summary> {
60
60
  const response = await executeWithRetry(() =>
61
61
  client.messages.create({
62
- model: 'claude-sonnet-4-20250514',
62
+ model: 'claude-sonnet-4-6',
63
63
  max_tokens: 512,
64
64
  messages: [{ role: 'user', content: `Summarize as JSON: ${text}` }],
65
65
  // System prompt enforces output shape
@@ -110,7 +110,7 @@ export async function runAgentLoop(
110
110
  for (let i = 0; i < MAX_ITERATIONS; i++) {
111
111
  const response = await executeWithRetry(() =>
112
112
  client.messages.create({
113
- model: 'claude-sonnet-4-20250514',
113
+ model: 'claude-sonnet-4-6',
114
114
  max_tokens: 4096,
115
115
  messages,
116
116
  tools: tools as Anthropic.Tool[],
@@ -298,7 +298,7 @@ async function verifyCredential(provider: string, apiKey: string): Promise<boole
298
298
  const probe = new Anthropic({ apiKey })
299
299
  // Minimal call — small max_tokens to burn near-zero quota
300
300
  await probe.messages.create({
301
- model: 'claude-sonnet-4-20250514',
301
+ model: 'claude-sonnet-4-6',
302
302
  max_tokens: 1,
303
303
  messages: [{ role: 'user', content: 'ping' }],
304
304
  })
@@ -338,4 +338,4 @@ function recordUsage(
338
338
  }
339
339
 
340
340
  // Usage:
341
- // recordUsage(sink, org.id, 'anthropic', 'claude-sonnet-4-20250514', 320, 150, 2)
341
+ // recordUsage(sink, org.id, 'anthropic', 'claude-sonnet-4-6', 320, 150, 2)
@@ -68,7 +68,7 @@ async function classifyIntent(
68
68
  input: string
69
69
  ): Promise<z.infer<typeof IntentOutputSchema>> {
70
70
  const response = await client.messages.create({
71
- model: 'claude-sonnet-4-20250514',
71
+ model: 'claude-sonnet-4-6',
72
72
  max_tokens: 256,
73
73
  system: [
74
74
  'You are an intent classifier. Classify the user message into exactly one intent.',
@@ -202,7 +202,7 @@ function zodToJsonSchema(schema: ZodType): Record<string, unknown> {
202
202
  //
203
203
  // // Pass to Anthropic agent loop (see ai-orchestrator.ts):
204
204
  // const response = await client.messages.create({
205
- // model: 'claude-sonnet-4-20250514',
205
+ // model: 'claude-sonnet-4-6',
206
206
  // tools: registry.toAnthropicFormat(),
207
207
  // messages: [{ role: 'user', content: 'What is the weather in Tokyo?' }],
208
208
  // })
@@ -144,7 +144,7 @@ export class PromptRegistry {
144
144
  // { name: 'categories', description: 'Comma-separated category list', required: true },
145
145
  // { name: 'ticket_body', description: 'The ticket text to classify', required: true, maxLength: 5000 },
146
146
  // ],
147
- // model: 'claude-sonnet-4-20250514',
147
+ // model: 'claude-sonnet-4-6',
148
148
  // maxTokens: 256,
149
149
  // })
150
150
  //
@@ -72,10 +72,10 @@ export async function resolveBestModel(apiKey) {
72
72
  catch {
73
73
  // If we can't reach the models endpoint, fall back to a known-good model.
74
74
  // This is the one hardcoded fallback — everything else is dynamic.
75
- return 'claude-sonnet-4-7';
75
+ return 'claude-sonnet-4-6';
76
76
  }
77
77
  if (models.length === 0) {
78
- return 'claude-sonnet-4-7';
78
+ return 'claude-sonnet-4-6';
79
79
  }
80
80
  // Sort each model into preference buckets, newest first within each bucket
81
81
  for (const prefix of MODEL_PREFERENCE) {
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "voidforge-build",
3
- "version": "23.14.0",
3
+ "version": "23.15.0",
4
4
  "description": "From nothing, everything. A methodology framework for building with Claude Code.",
5
5
  "type": "module",
6
6
  "engines": {
@@ -45,7 +45,7 @@
45
45
  "@aws-sdk/client-rds": "^3.700.0",
46
46
  "@aws-sdk/client-s3": "^3.700.0",
47
47
  "@aws-sdk/client-sts": "^3.700.0",
48
- "voidforge-build-methodology": "^23.14.0",
48
+ "voidforge-build-methodology": "^23.15.0",
49
49
  "node-pty": "^1.2.0-beta.12",
50
50
  "ws": "^8.19.0"
51
51
  },