voidforge-build 23.17.0 → 23.19.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/.claude/commands/assemble.md +6 -0
- package/dist/.claude/commands/gauntlet.md +12 -0
- package/dist/.claude/workflows/assemble-review.workflow.js +137 -0
- package/dist/.claude/workflows/gauntlet.workflow.js +211 -0
- package/dist/CHANGELOG.md +59 -0
- package/dist/CLAUDE.md +2 -1
- package/dist/VERSION.md +3 -1
- package/dist/docs/methods/WORKFLOWS.md +78 -0
- package/dist/wizard/lib/project-init.js +18 -0
- package/dist/wizard/lib/updater.js +7 -0
- package/package.json +2 -2
|
@@ -67,6 +67,12 @@ Mandatory runtime verification BEFORE code review begins:
|
|
|
67
67
|
|
|
68
68
|
**Gate:** All endpoints return expected status codes. No route collisions. No infinite render loops detected. Update assemble-state.
|
|
69
69
|
|
|
70
|
+
## Workflow Execution — review phases (ADR-067)
|
|
71
|
+
|
|
72
|
+
The **review-heavy fan-out phases** — Phase 3-5 (engage), 7-8 (sentinel), 12 (crossfire), 13 (council) — run as a **Dynamic Workflow** (`.claude/workflows/assemble-review.workflow.js`) over the mission's working diff, so the 15+-agent fan-out stays out of the lead's context (ADR-067; see `docs/methods/WORKFLOWS.md`). The **build/architecture/devops phases (1-2.5, 9) STAY prose orchestration** — they write code, are sequentially dependent, and need lead judgment + `--interactive` gates between them.
|
|
73
|
+
|
|
74
|
+
Run the review workflow as **one workflow run per review pass** so an `--interactive` pause sits at the workflow boundary (workflows take no mid-run input). **Gate (ADR-064):** muster the Silver Surfer + `record-roster.sh` *before* invoking, then `Workflow({ scriptPath: '.claude/workflows/assemble-review.workflow.js', args: { diff, roster } })`. The lead applies fixes from the returned report, then re-runs to re-verify. The phase prose below is the canonical description; `--light`/`--solo` use the raw-Agent fallback (with a `bypass.sh`).
|
|
75
|
+
|
|
70
76
|
## Phase 3 — Review Round 1 (Full Roster — see Agent Deployment Manifest)
|
|
71
77
|
**Fury:** "Picard's team — first pass. Find everything. Full roster deployed."
|
|
72
78
|
|
|
@@ -23,6 +23,18 @@ Opus scans `git diff --stat` and matches changed files against the `description`
|
|
|
23
23
|
|
|
24
24
|
**Dispatch control:** `--light` skips dynamic dispatch (core only). `--solo` runs lead agent only.
|
|
25
25
|
|
|
26
|
+
## Workflow Execution (default — ADR-067)
|
|
27
|
+
|
|
28
|
+
The Gauntlet's 5-round skeleton runs as a **Dynamic Workflow** — `.claude/workflows/gauntlet.workflow.js` — so the 60–80 agents' intermediate findings live in script variables, not the lead's context (only the final synthesis returns). The rounds below define **what** each round does; the workflow **implements** them (discovery → JS dedupe → 3-lens REFUTE verify → crossfire → council). See `docs/methods/WORKFLOWS.md`.
|
|
29
|
+
|
|
30
|
+
**Gate-compliant launch sequence (ADR-064 — the gate now covers the Workflow tool):**
|
|
31
|
+
1. Run the **Silver Surfer** (Agent tool — self-launch always allowed) per the gate header above; announce the heralding.
|
|
32
|
+
2. **Record the roster:** `[ -x scripts/surfer-gate/record-roster.sh ] && bash scripts/surfer-gate/record-roster.sh '<roster-json>' || true` — *before* invoking the workflow, or the gate blocks it.
|
|
33
|
+
3. **Invoke the workflow:** `Workflow({ scriptPath: '.claude/workflows/gauntlet.workflow.js', args: { scope, roster: <Surfer roster>, } })`. The gate allows it (roster recorded); the workflow's internal `agent()` calls are that roster.
|
|
34
|
+
4. **The lead applies fixes** from the returned report, then re-invokes the workflow to re-verify (workflows take no mid-run input — fix application + the Debate Protocol + severity re-rating stay lead/prose judgment).
|
|
35
|
+
|
|
36
|
+
`--light`/`--solo` skip the workflow and use the raw-Agent path below as the fallback (set a `bypass.sh --light`/`--solo` so the gate permits the reduced run). The prose rounds below remain the canonical description of each round.
|
|
37
|
+
|
|
26
38
|
## Round 1 — Discovery (parallel)
|
|
27
39
|
|
|
28
40
|
**Thanos:** "Before I test, I must understand."
|
|
@@ -0,0 +1,137 @@
|
|
|
1
|
+
// assemble-review.workflow.js — the REVIEW-heavy phases of /assemble, re-platformed.
|
|
2
|
+
//
|
|
3
|
+
// /assemble's build/architecture/devops phases STAY prose orchestration (they write
|
|
4
|
+
// code, are sequentially dependent, and need lead judgment + --interactive gates
|
|
5
|
+
// between them). Only the read-heavy fan-out phases move here (ADR-067): the 3x code
|
|
6
|
+
// review (engage), 2x security (sentinel), crossfire, and council — run over a single
|
|
7
|
+
// mission's working diff. Run this as ONE workflow per review pass so an --interactive
|
|
8
|
+
// pause sits at the workflow boundary, not mid-run (workflows take no mid-run input).
|
|
9
|
+
//
|
|
10
|
+
// GATE (ADR-064): muster the Surfer + record-roster BEFORE invoking (see assemble.md).
|
|
11
|
+
// Invoke: Workflow({ scriptPath: '.claude/workflows/assemble-review.workflow.js',
|
|
12
|
+
// args: { diff, roster: [{id,name,key,lens}] } })
|
|
13
|
+
|
|
14
|
+
export const meta = {
|
|
15
|
+
name: 'assemble-review',
|
|
16
|
+
description: 'Per-mission review fan-out: engage (code) + sentinel (security) → 3-lens verify → crossfire → council, over the working diff',
|
|
17
|
+
phases: [
|
|
18
|
+
{ title: 'Review', detail: 'engage + sentinel lenses over the diff' },
|
|
19
|
+
{ title: 'Verify', detail: '3-lens adversarial REFUTE on each claim' },
|
|
20
|
+
{ title: 'Crossfire', detail: 'adversaries hunt NEW issues in the diff' },
|
|
21
|
+
{ title: 'Council', detail: 'synthesize survivors by severity' },
|
|
22
|
+
],
|
|
23
|
+
}
|
|
24
|
+
|
|
25
|
+
// Guarded parse: a malformed/empty `args` string must not crash the run before phase 1.
|
|
26
|
+
let input = {}
|
|
27
|
+
try {
|
|
28
|
+
input = typeof args === 'string' ? (args.trim() ? JSON.parse(args) : {}) : (args || {})
|
|
29
|
+
} catch (_e) {
|
|
30
|
+
input = {}
|
|
31
|
+
}
|
|
32
|
+
const diff = input.diff || 'the working-tree diff for this mission (git diff)'
|
|
33
|
+
const roster = Array.isArray(input.roster) && input.roster.length
|
|
34
|
+
? input.roster
|
|
35
|
+
: [
|
|
36
|
+
{ id: 'picard-architecture', name: 'Picard', key: 'arch', lens: 'architecture & pattern compliance' },
|
|
37
|
+
{ id: 'stark-backend', name: 'Stark', key: 'backend', lens: 'API/DB/service correctness' },
|
|
38
|
+
{ id: 'galadriel-frontend', name: 'Galadriel', key: 'ux', lens: 'UX/a11y of changed surfaces' },
|
|
39
|
+
{ id: 'kenobi-security', name: 'Kenobi', key: 'sec', lens: 'auth/injection/secrets/data' },
|
|
40
|
+
{ id: 'maul', name: 'Maul', key: 'redteam', lens: 'red-team the new attack surface' },
|
|
41
|
+
]
|
|
42
|
+
|
|
43
|
+
const FINDINGS = {
|
|
44
|
+
type: 'object', additionalProperties: false,
|
|
45
|
+
required: ['agent', 'findings'],
|
|
46
|
+
properties: {
|
|
47
|
+
agent: { type: 'string' },
|
|
48
|
+
findings: {
|
|
49
|
+
type: 'array',
|
|
50
|
+
items: {
|
|
51
|
+
type: 'object', additionalProperties: false,
|
|
52
|
+
required: ['title', 'severity', 'file', 'evidence'],
|
|
53
|
+
properties: {
|
|
54
|
+
title: { type: 'string' },
|
|
55
|
+
severity: { type: 'string', enum: ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW', 'WARN'] },
|
|
56
|
+
file: { type: 'string' },
|
|
57
|
+
evidence: { type: 'string', description: '≥1 quoted changed line or concrete repro' },
|
|
58
|
+
},
|
|
59
|
+
},
|
|
60
|
+
},
|
|
61
|
+
},
|
|
62
|
+
}
|
|
63
|
+
const VOTE = { type: 'object', additionalProperties: false, required: ['confirm', 'reason'], properties: { confirm: { type: 'boolean' }, reason: { type: 'string' } } }
|
|
64
|
+
const key = (f) => `${(f.file || '').toLowerCase()}::${(f.title || '').toLowerCase().slice(0, 60)}`
|
|
65
|
+
|
|
66
|
+
// ── Review: engage + sentinel lenses over the DIFF only ───────────────────────
|
|
67
|
+
phase('Review')
|
|
68
|
+
const reviews = (await parallel(roster.map((a) => () =>
|
|
69
|
+
agent(
|
|
70
|
+
`You are ${a.name}. Review ONLY ${diff} through the ${a.lens} lens (do not review unchanged code). Evidence-backed findings only — file:line + a quoted CHANGED line or a real repro. For any access/permission/contract finding, name the governing SSOT and reconcile the fix direction (field report #349).`,
|
|
71
|
+
{ label: `${a.name} · review:${a.key}`, phase: 'Review', schema: FINDINGS, agentType: a.name },
|
|
72
|
+
)
|
|
73
|
+
))).filter(Boolean)
|
|
74
|
+
|
|
75
|
+
const SEV_RANK = { CRITICAL: 5, HIGH: 4, MEDIUM: 3, LOW: 2, WARN: 1 }
|
|
76
|
+
const seen = new Map()
|
|
77
|
+
for (const r of reviews) for (const f of (r.findings || [])) {
|
|
78
|
+
const k = key(f)
|
|
79
|
+
if (!seen.has(k)) seen.set(k, { ...f, raisedBy: [r.agent] })
|
|
80
|
+
else {
|
|
81
|
+
const ex = seen.get(k)
|
|
82
|
+
ex.raisedBy.push(r.agent)
|
|
83
|
+
// Keep the highest severity any lens assigned + track who raised it (consensus
|
|
84
|
+
// visibility) — first-write-wins dropped both before.
|
|
85
|
+
if ((SEV_RANK[f.severity] || 0) > (SEV_RANK[ex.severity] || 0)) ex.severity = f.severity
|
|
86
|
+
}
|
|
87
|
+
}
|
|
88
|
+
const claims = [...seen.values()]
|
|
89
|
+
log(`Review: ${reviews.length} lenses → ${claims.length} distinct claims over the diff.`)
|
|
90
|
+
|
|
91
|
+
// ── Verify: 3-lens adversarial REFUTE (default-to-refuted; verify the FIX too) ─
|
|
92
|
+
phase('Verify')
|
|
93
|
+
const LENSES = ['correctness', 'reachability', 'refutation']
|
|
94
|
+
const verdicts = await parallel(claims.map((c) => () =>
|
|
95
|
+
parallel(LENSES.map((lens) => () =>
|
|
96
|
+
agent(
|
|
97
|
+
`Adversarially verify via the ${lens} lens, reproducing through the REAL execution path (not a library in isolation): "${c.title}" [${c.severity}] at ${c.file}. Evidence: ${c.evidence}. REFUTE unless you cannot. On the refutation lens, also confirm the implied fix adds no new failure mode (wedge/loop/orphan/double-send/TOCTOU).`,
|
|
98
|
+
{ label: `verify:${lens}:${(c.file || '').slice(0, 24)}`, phase: 'Verify', schema: VOTE },
|
|
99
|
+
)
|
|
100
|
+
)).then((votes) => { const v = votes.filter(Boolean); return { claim: c, confirmVotes: v.filter((x) => x.confirm).length } })
|
|
101
|
+
))
|
|
102
|
+
const confirmed = verdicts.filter(Boolean).filter((v) => v.confirmVotes >= 2).map((v) => v.claim)
|
|
103
|
+
const refuted = verdicts.filter(Boolean).filter((v) => v.confirmVotes < 2)
|
|
104
|
+
.map((v) => ({ title: v.claim.title, file: v.claim.file, confirmVotes: v.confirmVotes }))
|
|
105
|
+
log(`Verify: ${confirmed.length}/${claims.length} survived the 3-lens refute (${refuted.length} refuted, logged in the report).`)
|
|
106
|
+
|
|
107
|
+
// ── Crossfire: adversaries hunt NEW issues the review cleared ─────────────────
|
|
108
|
+
phase('Crossfire')
|
|
109
|
+
const confirmedKeys = new Set(confirmed.map(key))
|
|
110
|
+
const crossRaw = (await parallel([
|
|
111
|
+
{ id: 'deathstroke', name: 'Deathstroke', key: 'pentest' },
|
|
112
|
+
{ id: 'loki', name: 'Loki', key: 'chaos' },
|
|
113
|
+
].map((a) => () =>
|
|
114
|
+
agent(
|
|
115
|
+
`You are ${a.name}, crossfire adversary over ${diff}. The review already ran — find NEW issues it cleared (bypasses, edge/chaos cases). Evidence-backed only.`,
|
|
116
|
+
{ label: `${a.name} · crossfire:${a.key}`, phase: 'Crossfire', schema: FINDINGS, agentType: a.name },
|
|
117
|
+
)
|
|
118
|
+
))).filter(Boolean)
|
|
119
|
+
const crossNew = []
|
|
120
|
+
for (const r of crossRaw) for (const f of (r.findings || [])) if (!confirmedKeys.has(key(f))) crossNew.push(f)
|
|
121
|
+
const crossConfirmed = (await parallel(crossNew.map((c) => () =>
|
|
122
|
+
agent(`Adversarially verify (default-to-refuted), real execution path: "${c.title}" [${c.severity}] at ${c.file}. ${c.evidence}`,
|
|
123
|
+
{ label: `verify:crossfire:${(c.file || '').slice(0, 20)}`, phase: 'Crossfire', schema: VOTE })
|
|
124
|
+
.then((v) => (v && v.confirm ? c : null))
|
|
125
|
+
))).filter(Boolean)
|
|
126
|
+
log(`Crossfire: ${crossNew.length} new → ${crossConfirmed.length} confirmed.`)
|
|
127
|
+
|
|
128
|
+
// ── Council: synthesize (JS); the lead applies fixes, then re-runs to re-verify ─
|
|
129
|
+
phase('Council')
|
|
130
|
+
const all = [...confirmed, ...crossConfirmed]
|
|
131
|
+
const sev = (s) => all.filter((f) => f.severity === s)
|
|
132
|
+
return {
|
|
133
|
+
diff,
|
|
134
|
+
counts: { claims: claims.length, confirmed: confirmed.length, refuted: refuted.length, crossfireConfirmed: crossConfirmed.length },
|
|
135
|
+
critical: sev('CRITICAL'), high: sev('HIGH'), medium: sev('MEDIUM'), low: [...sev('LOW'), ...sev('WARN')],
|
|
136
|
+
refutedLog: refuted, // dropped from the actionable buckets, but never silently — logged per SUB_AGENTS.md
|
|
137
|
+
}
|
|
@@ -0,0 +1,211 @@
|
|
|
1
|
+
// gauntlet.workflow.js — Thanos's Comprehensive Review as a Dynamic Workflow.
|
|
2
|
+
//
|
|
3
|
+
// Re-platforms /gauntlet's hand-fanned 60-80 agent rounds onto the Workflow tool
|
|
4
|
+
// (ADR-067) so intermediate findings live in script variables, not the lead's
|
|
5
|
+
// context. The lead's context only sees the final synthesis.
|
|
6
|
+
//
|
|
7
|
+
// GATE (ADR-064): the Workflow launch is gated. /gauntlet must muster the Silver
|
|
8
|
+
// Surfer + record-roster BEFORE invoking this script (see gauntlet.md). The roster
|
|
9
|
+
// is passed in via `args`; this script does NOT re-select it.
|
|
10
|
+
//
|
|
11
|
+
// What stays PROSE / lead judgment (NOT in this script): severity re-rating debate,
|
|
12
|
+
// the Agent Debate Protocol, and the application of fixes. This script SCHEDULES the
|
|
13
|
+
// find → dedupe → 3-lens-verify → crossfire → council skeleton and returns confirmed
|
|
14
|
+
// findings; the lead applies fixes between runs (workflows take no mid-run input).
|
|
15
|
+
//
|
|
16
|
+
// Invoke: Workflow({ scriptPath: '.claude/workflows/gauntlet.workflow.js',
|
|
17
|
+
// args: { scope, roster: [{id,name,key,domain}], rounds } })
|
|
18
|
+
|
|
19
|
+
export const meta = {
|
|
20
|
+
name: 'gauntlet',
|
|
21
|
+
description: 'Comprehensive review: discovery → strike → 3-lens adversarial verify → crossfire → council (schema-validated)',
|
|
22
|
+
phases: [
|
|
23
|
+
{ title: 'Discovery', detail: 'core domain leads map the surface' },
|
|
24
|
+
{ title: 'Strike', detail: 'Surfer-selected specialists fan out' },
|
|
25
|
+
{ title: 'Verify', detail: '3-lens adversarial REFUTE on every distinct claim' },
|
|
26
|
+
{ title: 'Crossfire', detail: 'adversaries hunt NEW issues' },
|
|
27
|
+
{ title: 'Council', detail: 'synthesize survivors by severity' },
|
|
28
|
+
],
|
|
29
|
+
}
|
|
30
|
+
|
|
31
|
+
// Guarded parse: a malformed or empty `args` string must NOT throw and abort the
|
|
32
|
+
// entire run before phase 1 (field report) — fall back to defaults instead.
|
|
33
|
+
let input = {}
|
|
34
|
+
try {
|
|
35
|
+
input = typeof args === 'string' ? (args.trim() ? JSON.parse(args) : {}) : (args || {})
|
|
36
|
+
} catch (_e) {
|
|
37
|
+
input = {}
|
|
38
|
+
}
|
|
39
|
+
const scope = input.scope || 'the working tree / full codebase per gauntlet.md'
|
|
40
|
+
// Surfer-selected specialists (gate-recorded upstream). Fall back to the canonical
|
|
41
|
+
// core leads if no roster was passed (e.g. --light).
|
|
42
|
+
const roster = Array.isArray(input.roster) && input.roster.length
|
|
43
|
+
? input.roster
|
|
44
|
+
: [
|
|
45
|
+
{ id: 'picard-architecture', name: 'Picard', key: 'architecture', domain: 'architecture' },
|
|
46
|
+
{ id: 'stark-backend', name: 'Stark', key: 'backend', domain: 'code/backend' },
|
|
47
|
+
{ id: 'galadriel-frontend', name: 'Galadriel', key: 'ux', domain: 'UX/a11y' },
|
|
48
|
+
{ id: 'kenobi-security', name: 'Kenobi', key: 'security', domain: 'security' },
|
|
49
|
+
{ id: 'kusanagi-devops', name: 'Kusanagi', key: 'devops', domain: 'infra/deploy' },
|
|
50
|
+
]
|
|
51
|
+
|
|
52
|
+
const FINDINGS = {
|
|
53
|
+
type: 'object', additionalProperties: false,
|
|
54
|
+
required: ['agent', 'findings'],
|
|
55
|
+
properties: {
|
|
56
|
+
agent: { type: 'string' },
|
|
57
|
+
findings: {
|
|
58
|
+
type: 'array',
|
|
59
|
+
items: {
|
|
60
|
+
type: 'object', additionalProperties: false,
|
|
61
|
+
required: ['title', 'severity', 'file', 'evidence'],
|
|
62
|
+
properties: {
|
|
63
|
+
title: { type: 'string' },
|
|
64
|
+
severity: { type: 'string', enum: ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW', 'WARN'] },
|
|
65
|
+
file: { type: 'string', description: 'path:line, or "n/a"' },
|
|
66
|
+
evidence: { type: 'string', description: '≥1 quoted code line or a concrete repro; no vibes' },
|
|
67
|
+
},
|
|
68
|
+
},
|
|
69
|
+
},
|
|
70
|
+
},
|
|
71
|
+
}
|
|
72
|
+
|
|
73
|
+
const VERDICT = {
|
|
74
|
+
type: 'object', additionalProperties: false,
|
|
75
|
+
required: ['survives', 'confirmVotes', 'finalSeverity', 'rationale'],
|
|
76
|
+
properties: {
|
|
77
|
+
survives: { type: 'boolean', description: 'true only if ≥2 of the 3 lenses confirm AND the fix would not introduce a new failure mode' },
|
|
78
|
+
confirmVotes: { type: 'integer', description: '0-3' },
|
|
79
|
+
finalSeverity: { type: 'string', enum: ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW', 'WARN', 'REFUTED'] },
|
|
80
|
+
rationale: { type: 'string' },
|
|
81
|
+
},
|
|
82
|
+
}
|
|
83
|
+
|
|
84
|
+
const key = (f) => `${(f.file || '').toLowerCase()}::${(f.title || '').toLowerCase().slice(0, 60)}`
|
|
85
|
+
|
|
86
|
+
// ── Round 1: Discovery + Round 2/3: Strike ────────────────────────────────────
|
|
87
|
+
const dom = (a) => a.domain || a.key || 'their domain' // avoid literal "undefined" in prompts
|
|
88
|
+
phase('Discovery')
|
|
89
|
+
const discovery = (await parallel(roster.slice(0, 5).map((a) => () =>
|
|
90
|
+
agent(
|
|
91
|
+
`You are ${a.name} (${dom(a)}). GAUNTLET discovery pass over ${scope}. Map your domain and report concrete, evidence-backed findings only — every finding needs a file:line and a quoted line or a real repro (no speculation). Rate severity honestly.`,
|
|
92
|
+
{ label: `${a.name} · discovery:${a.key}`, phase: 'Discovery', schema: FINDINGS, agentType: a.name },
|
|
93
|
+
)
|
|
94
|
+
))).filter(Boolean)
|
|
95
|
+
|
|
96
|
+
phase('Strike')
|
|
97
|
+
// Specialists only (index ≥5). When the roster is ≤5 (the default/--light core-leads
|
|
98
|
+
// set) there are NO specialists, so strike is EMPTY — falling back to the full roster
|
|
99
|
+
// here re-ran the identical discovery agents with a "find what discovery missed" prompt,
|
|
100
|
+
// doubling cost for no new coverage (field report). parallel([]) is a harmless no-op.
|
|
101
|
+
const strikeRoster = roster.length > 5 ? roster.slice(5) : []
|
|
102
|
+
if (!strikeRoster.length) log('Strike: no specialists beyond the 5 core leads — skipping (no double-pass).')
|
|
103
|
+
const strike = (await parallel(strikeRoster.map((a) => () =>
|
|
104
|
+
agent(
|
|
105
|
+
`You are ${a.name} (${dom(a)}). GAUNTLET strike pass over ${scope}. Deep, adversarial domain review — find what discovery missed. Evidence-backed findings only (file:line + quoted line/repro).`,
|
|
106
|
+
{ label: `${a.name} · strike:${a.key}`, phase: 'Strike', schema: FINDINGS, agentType: a.name },
|
|
107
|
+
)
|
|
108
|
+
))).filter(Boolean)
|
|
109
|
+
|
|
110
|
+
// ── Dedupe across all domains (plain JS — no agent) ───────────────────────────
|
|
111
|
+
const SEV_RANK = { CRITICAL: 5, HIGH: 4, MEDIUM: 3, LOW: 2, WARN: 1 }
|
|
112
|
+
const seen = new Map()
|
|
113
|
+
for (const r of [...discovery, ...strike]) {
|
|
114
|
+
for (const f of (r.findings || [])) {
|
|
115
|
+
const k = key(f)
|
|
116
|
+
if (!seen.has(k)) seen.set(k, { ...f, raisedBy: [r.agent] })
|
|
117
|
+
else {
|
|
118
|
+
const ex = seen.get(k)
|
|
119
|
+
ex.raisedBy.push(r.agent)
|
|
120
|
+
// Keep the HIGHEST severity any agent assigned. First-write-wins silently
|
|
121
|
+
// discarded a later agent's escalation (e.g. one rates MEDIUM, another HIGH).
|
|
122
|
+
if ((SEV_RANK[f.severity] || 0) > (SEV_RANK[ex.severity] || 0)) ex.severity = f.severity
|
|
123
|
+
}
|
|
124
|
+
}
|
|
125
|
+
}
|
|
126
|
+
const claims = [...seen.values()]
|
|
127
|
+
log(`Discovery+Strike: ${discovery.length + strike.length} agents → ${claims.length} distinct claims (deduped).`)
|
|
128
|
+
|
|
129
|
+
// ── Step 4.5: 3-lens adversarial REFUTE on every distinct claim ───────────────
|
|
130
|
+
// Default-to-refuted. Keep only claims ≥2/3 lenses confirm AND whose fix adds no
|
|
131
|
+
// new failure mode. (Verify-the-FIX, field report #348 #4.)
|
|
132
|
+
phase('Verify')
|
|
133
|
+
const LENSES = ['correctness', 'reachability', 'refutation']
|
|
134
|
+
const verified = await parallel(claims.map((c) => () =>
|
|
135
|
+
parallel(LENSES.map((lens) => () =>
|
|
136
|
+
agent(
|
|
137
|
+
`Adversarially verify this GAUNTLET claim through the ${lens} lens. Claim: "${c.title}" [${c.severity}] at ${c.file}. Evidence: ${c.evidence}. Your job is to REFUTE it — confirm ONLY if you cannot, citing the exact code. For the refutation lens, also check the implied FIX introduces no new failure mode (wedge/loop/orphan/double-send/TOCTOU). Reproduce through the REAL execution path, not a library in isolation (ADR/field report #356).`,
|
|
138
|
+
{ label: `verify:${lens}:${(c.file || '').slice(0, 24)}`, phase: 'Verify', schema: { type: 'object', additionalProperties: false, required: ['confirm', 'reason'], properties: { confirm: { type: 'boolean' }, reason: { type: 'string' } } } },
|
|
139
|
+
)
|
|
140
|
+
)).then((votes) => {
|
|
141
|
+
const v = votes.filter(Boolean)
|
|
142
|
+
const confirmVotes = v.filter((x) => x.confirm).length
|
|
143
|
+
return { claim: c, survives: confirmVotes >= 2, confirmVotes, lensReasons: v.map((x) => x.reason) }
|
|
144
|
+
})
|
|
145
|
+
))
|
|
146
|
+
const confirmed = verified.filter(Boolean).filter((v) => v.survives).map((v) => ({ ...v.claim, confirmVotes: v.confirmVotes }))
|
|
147
|
+
const refuted = verified.filter(Boolean).filter((v) => !v.survives).map((v) => ({ title: v.claim.title, confirmVotes: v.confirmVotes, why: v.lensReasons }))
|
|
148
|
+
log(`Verify: ${confirmed.length} survived 3-lens refute, ${refuted.length} refuted (logged, dropped).`)
|
|
149
|
+
|
|
150
|
+
// ── Round 4: Crossfire — adversaries hunt NEW issues the review cleared ────────
|
|
151
|
+
phase('Crossfire')
|
|
152
|
+
const ADVERSARIES = [
|
|
153
|
+
{ id: 'maul', name: 'Maul', key: 'red-team' },
|
|
154
|
+
{ id: 'deathstroke', name: 'Deathstroke', key: 'pentest' },
|
|
155
|
+
{ id: 'loki', name: 'Loki', key: 'chaos' },
|
|
156
|
+
]
|
|
157
|
+
const crossfireRaw = (await parallel(ADVERSARIES.map((a) => () =>
|
|
158
|
+
agent(
|
|
159
|
+
`You are ${a.name}, a GAUNTLET crossfire adversary over ${scope}. The domain review already ran — hunt NEW issues it cleared (bypasses, chaos/edge cases, exploit chains). Evidence-backed only (file:line + repro).`,
|
|
160
|
+
{ label: `${a.name} · crossfire:${a.key}`, phase: 'Crossfire', schema: FINDINGS, agentType: a.name },
|
|
161
|
+
)
|
|
162
|
+
))).filter(Boolean)
|
|
163
|
+
// New crossfire claims (not already confirmed) get the same one-pass refute.
|
|
164
|
+
const confirmedKeys = new Set(confirmed.map(key))
|
|
165
|
+
const crossNew = []
|
|
166
|
+
for (const r of crossfireRaw) for (const f of (r.findings || [])) if (!confirmedKeys.has(key(f))) crossNew.push(f)
|
|
167
|
+
const crossVerified = await parallel(crossNew.map((c) => () =>
|
|
168
|
+
agent(
|
|
169
|
+
`Adversarially verify (default-to-refuted) this crossfire claim, reproducing through the real execution path: "${c.title}" [${c.severity}] at ${c.file}. Evidence: ${c.evidence}.`,
|
|
170
|
+
{ label: `verify:crossfire:${(c.file || '').slice(0, 24)}`, phase: 'Crossfire', schema: VERDICT },
|
|
171
|
+
).then((v) => ({ claim: c, verdict: v }))
|
|
172
|
+
))
|
|
173
|
+
const crossfireConfirmed = []
|
|
174
|
+
const crossfireRefuted = []
|
|
175
|
+
for (const cv of crossVerified.filter(Boolean)) {
|
|
176
|
+
const v = cv.verdict
|
|
177
|
+
// The VERDICT schema lets a verdict be survives:true AND finalSeverity:'REFUTED'.
|
|
178
|
+
// Such a claim used to be kept as "confirmed" yet matched NO council severity bucket
|
|
179
|
+
// (bySeverity only checks CRITICAL/HIGH/MEDIUM/LOW/WARN) and silently vanished — breaking
|
|
180
|
+
// the "never silently dropped" invariant. Confirm ONLY a real severity; log the rest.
|
|
181
|
+
if (v && v.survives && v.finalSeverity && v.finalSeverity !== 'REFUTED') {
|
|
182
|
+
crossfireConfirmed.push({ ...cv.claim, finalSeverity: v.finalSeverity })
|
|
183
|
+
} else {
|
|
184
|
+
crossfireRefuted.push({ title: cv.claim.title, finalSeverity: (v && v.finalSeverity) || 'UNVERIFIED', why: (v && v.rationale) || 'verifier returned no verdict' })
|
|
185
|
+
}
|
|
186
|
+
}
|
|
187
|
+
log(`Crossfire: ${crossNew.length} new claims → ${crossfireConfirmed.length} confirmed, ${crossfireRefuted.length} refuted/unverified (logged, dropped).`)
|
|
188
|
+
|
|
189
|
+
// ── Round 5: Council — synthesize survivors by severity (JS; lead applies fixes) ─
|
|
190
|
+
phase('Council')
|
|
191
|
+
const all = [...confirmed, ...crossfireConfirmed]
|
|
192
|
+
const bySeverity = (sev) => all.filter((f) => (f.finalSeverity || f.severity) === sev)
|
|
193
|
+
const report = {
|
|
194
|
+
scope,
|
|
195
|
+
rosterSize: roster.length,
|
|
196
|
+
counts: {
|
|
197
|
+
distinctClaims: claims.length,
|
|
198
|
+
confirmed: confirmed.length,
|
|
199
|
+
refuted: refuted.length,
|
|
200
|
+
crossfireConfirmed: crossfireConfirmed.length,
|
|
201
|
+
crossfireRefuted: crossfireRefuted.length,
|
|
202
|
+
},
|
|
203
|
+
critical: bySeverity('CRITICAL'),
|
|
204
|
+
high: bySeverity('HIGH'),
|
|
205
|
+
medium: bySeverity('MEDIUM'),
|
|
206
|
+
low: [...bySeverity('LOW'), ...bySeverity('WARN')],
|
|
207
|
+
refutedLog: refuted, // dropped, but never silently — logged per SUB_AGENTS.md
|
|
208
|
+
crossfireRefutedLog: crossfireRefuted, // ditto for crossfire claims that failed the one-pass verify
|
|
209
|
+
}
|
|
210
|
+
log(`Council: ${report.critical.length} Critical · ${report.high.length} High · ${report.medium.length} Medium · ${report.low.length} Low/Warn. Lead applies fixes (workflow takes no mid-run input), then re-runs to re-verify.`)
|
|
211
|
+
return report
|
package/dist/CHANGELOG.md
CHANGED
|
@@ -6,6 +6,65 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/), and this
|
|
|
6
6
|
|
|
7
7
|
---
|
|
8
8
|
|
|
9
|
+
## [23.19.0] - 2026-06-13
|
|
10
|
+
|
|
11
|
+
### Gauntlet acceptance test → 14 fixes
|
|
12
|
+
|
|
13
|
+
The ADR-067 workflow re-platform validated by running `gauntlet.workflow.js` live on the v23.13–v23.18 platform code (a 10-agent Surfer roster fanned to 347 agents → 99 distinct claims → 66 confirmed + 24 crossfire, **0 Critical**). The acceptance test passed *and* surfaced real bugs; the 3-lens-confirmed High/Medium findings are fixed here. (An earlier run crashed the machine ~44 min in at peak load; the relaunch completed cleanly — the workflow is resumable via `resumeFromRunId`.)
|
|
14
|
+
|
|
15
|
+
### Fixed — Silver Surfer gate (security/correctness)
|
|
16
|
+
|
|
17
|
+
- **`_paths.sh` reap could delete the entire `sessions/` tree.** `find "$SESSIONS_DIR" -maxdepth 1 -type d -mmin +60` matches the search root itself; without `-mindepth 1` a stale `sessions/` root mtime made the sweep `rm -rf` every live session's roster + bypass flag. Added `-mindepth 1`.
|
|
18
|
+
- **Reap-vs-fresh-roster/bypass race (the one VERSION.md flagged for a field report).** The TTL refresh touched the roster *file* but the reaper keys on the session *directory* mtime (touching a child never bumps the parent). `check.sh` now touches `$SESSION_DIR` on every roster/bypass ALLOW, `bypass.sh` touches it on write, and the reap threshold (+120m) is held strictly above the roster TTL (3600s) — an active session is never reaped while still valid.
|
|
19
|
+
- **Gate silently broke on Alpine/minimal Linux:** `_paths.sh` hashed paths with `shasum` only (Perl, absent there). Added a `sha256sum` fallback via a shared `surfer_gate_repo_hash` helper.
|
|
20
|
+
- **`bypass.sh` run before the first hook fire was a silent no-op** (the documented orchestrator order). It now records a repo-scoped *pending* bypass that `check.sh` promotes to the session flag on the first fire.
|
|
21
|
+
|
|
22
|
+
### Fixed — Dynamic Workflow scripts
|
|
23
|
+
|
|
24
|
+
- **Strike phase double-ran the roster** when ≤5 agents (the default/`--light` core-leads set): `strikeRoster` fell back to the full roster, re-running discovery agents under a "find what discovery missed" prompt. Now empty when there are no specialists.
|
|
25
|
+
- **Crossfire `survives:true` + `finalSeverity:'REFUTED'` verdicts silently vanished** — kept as "confirmed" yet matched no council severity bucket. Now excluded and logged in `crossfireRefutedLog`, honoring the never-silently-dropped invariant.
|
|
26
|
+
- **Dedup kept the first raiser's severity**, discarding a later agent's higher rating. Now keeps the max severity and tracks `raisedBy` (gauntlet + assemble-review). assemble-review also gains a `refutedLog`.
|
|
27
|
+
- **Unguarded `JSON.parse(args)`** could crash the whole run before phase 1; now falls back to defaults. Undefined-`domain` roster items no longer render literal `undefined` in prompts.
|
|
28
|
+
|
|
29
|
+
### Fixed — distribution & CI
|
|
30
|
+
|
|
31
|
+
- **`npx voidforge-build init` now copies `.claude/workflows/`** (+ `AGENT_CLASSIFICATION.md`). gauntlet.md/assemble.md referenced workflow scripts that the CLI init path never shipped — v23.18.0 only patched the prepack/dist paths, not `project-init.ts`.
|
|
32
|
+
- **`npx voidforge-build update` now propagates `.claude/workflows` + `scripts/surfer-gate`** — both were absent from the updater's diff list, so existing projects never received workflow or gate fixes.
|
|
33
|
+
- **`publish.yml`:** `recover-partial` derives the version from `package.json`, not `github.ref_name` (a branch name on `workflow_dispatch`, which mis-targeted `npm deprecate`); the Playwright cache key keys on the committed manifests instead of the deleted-and-regenerated lockfile (was a permanent cache miss).
|
|
34
|
+
|
|
35
|
+
### Added / Changed
|
|
36
|
+
|
|
37
|
+
- **`scripts/validate-workflows.sh`** (+ `npm run validate:workflows`, wired into `pretest`): a real syntax gate for `.workflow.js` scripts. Their top-level `await`/`return` make a bare `node --check` fail ("Illegal return statement"), so the v23.18.0 "passes node --check" claim was inaccurate; the validator reproduces the runtime's async wrapper before checking, catching syntax errors before they ship.
|
|
38
|
+
- **`WORKFLOWS.md`** example corrected (`agentType: a.id` → `a.name`) + two new gotchas (agentType resolves by the `name:` display field; how to validate workflow scripts). Stale pre-ADR-060 `/tmp/voidforge-*` paths fixed in the gate `README.md` and `CLAUDE.md`.
|
|
39
|
+
|
|
40
|
+
### Validation
|
|
41
|
+
|
|
42
|
+
Gate suite **23 → 27** (added reap-root-preservation + pending-bypass cases). Full suite **1390 → 1392** (added init-copies-workflows + updater-tracks-workflows/gate regression tests). typecheck clean. Deferred as field-report candidates: concurrent same-repo pointer collision, `workflow_dispatch` branch guard. Dep `^23.18.0` → `^23.19.0` (ADR-062).
|
|
43
|
+
|
|
44
|
+
## [23.18.0] - 2026-06-13
|
|
45
|
+
|
|
46
|
+
### Workflow re-platform of `/gauntlet` + `/assemble` (ADR-067)
|
|
47
|
+
|
|
48
|
+
The opportunity ADR-064 unblocked. The heavy review commands' deterministic skeletons now run as Dynamic Workflows, so 60–80 agents' intermediate findings live in script variables instead of the lead's context.
|
|
49
|
+
|
|
50
|
+
### Added
|
|
51
|
+
|
|
52
|
+
- **`.claude/workflows/gauntlet.workflow.js`** — the 5-round Gauntlet as a workflow: discovery (parallel core leads) → **plain-JS dedupe** → **3-lens adversarial REFUTE** per claim (schema-validated votes, default-to-refuted, keep ≥2/3, verify-the-FIX, reproduce-through-real-path) → crossfire (adversaries hunt NEW issues) → council (JS synthesis by severity). Refuted claims logged, never silently dropped.
|
|
53
|
+
- **`.claude/workflows/assemble-review.workflow.js`** — the review-heavy `/assemble` phases (engage + sentinel + crossfire + council) over a mission's working diff; one run per pass so `--interactive` pauses at the boundary. Build/architecture/devops phases **stay prose orchestration**.
|
|
54
|
+
- **`docs/methods/WORKFLOWS.md`** — the authoring standard: when-to-use, API (`phase`/`parallel`/`pipeline`/`agent({schema})`), the `args`-as-JSON-string (#363) + label-leading-character (#348) gotchas, the 16/1000 caps (ADR-059), and the **ADR-064 gate-launch sequence** (Surfer → record-roster → Workflow). Added to the CLAUDE.md Docs Reference.
|
|
55
|
+
- **ADR-067** decision record.
|
|
56
|
+
|
|
57
|
+
### Changed
|
|
58
|
+
|
|
59
|
+
- **`gauntlet.md` / `assemble.md`** gain "Workflow Execution" sections: the gate-compliant launch (muster Surfer → record roster → invoke the workflow with the roster in `args`), what's workflow-backed vs prose, and the `--light`/`--solo` raw-Agent fallback. Personas, the Agent Debate Protocol, severity re-rating, and **fix application** stay lead/prose judgment (the lead applies fixes from the returned report, then re-runs to re-verify).
|
|
60
|
+
- **Distribution (Phase 12.75 gate):** `.claude/workflows/` is a new shared file category — added to `prepack.sh` (npm package) and `copy-assets.sh` (CLI `init`) so the scripts reach consumers (they were referenced by the command docs but would not have shipped otherwise — the #297 class).
|
|
61
|
+
|
|
62
|
+
### Validation
|
|
63
|
+
|
|
64
|
+
Both workflow scripts pass `node --check` (ESM, async-wrapped to match the Workflow runtime). The **live end-to-end gauntlet run is the acceptance test** (it launches 30+ real review agents through the now-gated Workflow path); the raw-Agent prose path remains the fallback and canonical description. Dogfooded pre-tag `npm test` (1390/1390) + publish-gate. Dep range `^23.17.0` → `^23.18.0` (ADR-062).
|
|
65
|
+
|
|
66
|
+
---
|
|
67
|
+
|
|
9
68
|
## [23.17.0] - 2026-06-13
|
|
10
69
|
|
|
11
70
|
### Effort-tiering fleet edit (ADR-054) — verified + applied
|
package/dist/CLAUDE.md
CHANGED
|
@@ -39,7 +39,7 @@ ADR-051 enforces this gate at the hook level (PreToolUse). The prose below is th
|
|
|
39
39
|
2. When the user's command includes `--light` or `--solo`, BEFORE launching the Surfer or any other agent: `[ -x scripts/surfer-gate/bypass.sh ] && bash scripts/surfer-gate/bypass.sh --light || true` (or `--solo`). **Fails closed on unknown flag values** (ADR-060 v23.8.18 hardening, SEC-003) — passing anything other than `--light` or `--solo` exits 2 with an error. No silent bypass.
|
|
40
40
|
3. **Normalize roster names before dispatch** (field report #345, DEAL-001). The Silver Surfer returns agent names from a Haiku pre-scan, which can drift from the actual filenames in `.claude/agents/` (extra `silver-surfer-` prefix, a stray `.md` suffix, a hyphen/underscore mismatch). Before you launch, validate each name against `ls .claude/agents/`: if it matches a file (with or without the `.md` extension), keep it; if not, attempt exactly one correction — strip a known prefix/suffix or normalize separators — and re-check. If it still doesn't resolve, DROP that single name and proceed with the rest of the roster. Never block the whole dispatch over one unresolved name; log the dropped name so the Herald roster can be corrected upstream. The gate's job is to enforce *that* a roster ran, not to fail the run over a typo in *one* name.
|
|
41
41
|
|
|
42
|
-
If `scripts/surfer-gate/check.sh` exists but you skip step 1, your first non-Surfer Agent call in that turn will be blocked with a clear message and your own log line in `/tmp/voidforge-session
|
|
42
|
+
If `scripts/surfer-gate/check.sh` exists but you skip step 1, your first non-Surfer Agent call in that turn will be blocked with a clear message and your own log line in the gate's session log (`<gate-root>/sessions/$SESSION_ID/gate.log`, where `<gate-root>` is `$XDG_RUNTIME_DIR/voidforge-gate` or `$HOME/.voidforge/gate` per ADR-060 — the old `/tmp/voidforge-session-*` path was retired in v23.8.18). You are expected to comply with the block (launch Surfer / run record-roster), not to fight it. If the script does not exist, your project predates v23.10.0; pull the gate from `tmcleod3/voidforge:scripts/surfer-gate/` and merge `settings-snippet.json` into `.claude/settings.json`, or re-run `npx voidforge-build init` against the methodology source.
|
|
43
43
|
|
|
44
44
|
**Why.** Seven field incidents (logged in `.claude/agents/silver-surfer-herald.md`) document the cost of skipping: the orchestrator cannot predict cross-domain relevance from the command name alone. The hook makes skipping mechanically impossible for non-bypass cases. Launch the Surfer. Every time.
|
|
45
45
|
|
|
@@ -256,6 +256,7 @@ See `/docs/methods/MUSTER.md` for the full Muster Protocol.
|
|
|
256
256
|
| **Time Vault** | `/docs/methods/TIME_VAULT.md` | Seldon — when preserving session intelligence for transfer |
|
|
257
257
|
| **Patterns** | `/docs/patterns/` | When writing code (37 reference implementations) |
|
|
258
258
|
| **Lessons** | `/docs/LESSONS.md` | Cross-project learnings |
|
|
259
|
+
| **Workflows** | `/docs/methods/WORKFLOWS.md` | Dynamic Workflow authoring standard (ADR-067) — when to use, API, gotchas, the ADR-064 gate-launch sequence |
|
|
259
260
|
| **Native Capabilities** | `/docs/NATIVE_CAPABILITIES.md` | Command × native-skill collision tracker (ADR-066) — re-audit each release |
|
|
260
261
|
| **Compatibility** | `/docs/COMPATIBILITY.md` | Node + Claude Code platform floor & feature maturity tags (ADR-065) |
|
|
261
262
|
|
package/dist/VERSION.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Version
|
|
2
2
|
|
|
3
|
-
**Current:** 23.
|
|
3
|
+
**Current:** 23.19.0
|
|
4
4
|
|
|
5
5
|
## Versioning Scheme
|
|
6
6
|
|
|
@@ -14,6 +14,8 @@ This project uses [Semantic Versioning](https://semver.org/):
|
|
|
14
14
|
|
|
15
15
|
| Version | Date | Summary |
|
|
16
16
|
|---------|------|---------|
|
|
17
|
+
| 23.19.0 | 2026-06-13 | **Gauntlet acceptance test → 14 fixes (the ADR-067 re-platform, validated by running it on itself).** Ran the new `gauntlet.workflow.js` live on the v23.13–v23.18 platform code (10-agent Surfer roster → 347 agents → 99 distinct claims → 66 confirmed + 24 crossfire, 0 Critical) and fixed the 3-lens-confirmed findings. **Gate (security):** `_paths.sh` reap was missing `-mindepth 1` → could `rm -rf` the entire `sessions/` tree (every live roster/bypass); the reaper now refreshes `$SESSION_DIR` mtime on activity + threshold raised above the TTL, closing the documented reap-vs-fresh-roster/bypass race; `shasum`→`sha256sum` fallback (gate silently broke on Alpine); `bypass.sh` run before the first hook fire now records a repo-scoped *pending* bypass that `check.sh` promotes (was a silent no-op). **Workflows:** strike no longer re-runs the same ≤5-agent roster twice; crossfire `survives:true+REFUTED` verdicts no longer vanish into no bucket (logged in `crossfireRefutedLog`); dedup keeps the **highest** severity + `raisedBy` (was first-write-wins); guarded `JSON.parse(args)`; undefined-domain prompt guard. **Distribution:** `npx voidforge-build init` now copies `.claude/workflows/` + `AGENT_CLASSIFICATION.md`; `update` now propagates `.claude/workflows` + `scripts/surfer-gate` (both were stranded). **Validation:** new `scripts/validate-workflows.sh` (wraps the runtime shape, then `node --check`) wired into `pretest` — corrects the false "scripts pass `node --check`" claim and gates syntax errors from shipping. **Docs:** `WORKFLOWS.md` example `agentType: a.id`→`a.name` + new gotchas; stale `/tmp/voidforge-*` paths fixed in gate README + CLAUDE.md (ADR-060). **CI:** `recover-partial` derives the version from `package.json` not `github.ref_name` (broke on dispatch); Playwright cache key off the committed manifests not the regenerated lockfile. Gate suite 23→27, full suite 1390→1392. Deferred (field-report candidates): concurrent same-repo pointer collision, `workflow_dispatch` branch guard. Dep `^23.18.0` → `^23.19.0`. |
|
|
18
|
+
| 23.18.0 | 2026-06-13 | **Workflow re-platform of `/gauntlet` + `/assemble` (ADR-067)** — the opportunity ADR-064 unblocked. New `.claude/workflows/gauntlet.workflow.js` (discovery → JS dedupe → 3-lens adversarial REFUTE → crossfire → council, schema-validated) and `assemble-review.workflow.js` (engage+sentinel over a mission diff; build/arch/devops stay prose). New **`docs/methods/WORKFLOWS.md`** authoring standard (API, the #348/#363 gotchas, 16/1000 caps, and the ADR-064 gate-launch sequence: Surfer→record-roster→Workflow). `gauntlet.md`/`assemble.md` gain workflow-execution sections; personas + fix-application + Debate Protocol stay prose. **Distribution gate (Phase 12.75):** `.claude/workflows/` is a new shared category — added to `prepack.sh` (npm) + `copy-assets.sh` (init) so the scripts ship to consumers. Both scripts `node --check`-validated (ESM async-wrapped); the live end-to-end gauntlet run is the acceptance test. Dep `^23.17.0` → `^23.18.0`. |
|
|
17
19
|
| 23.17.0 | 2026-06-13 | **Effort-tiering fleet edit (ADR-054) — verified + applied.** Verified against the official Claude Code sub-agents docs that `effort` is a supported sub-agent frontmatter field (values `low`/`medium`/`high`/`xhigh`/`max`; "available levels depend on the model"). Applied across all 264 agent definitions: **20 leads (`model: inherit`) → `effort: xhigh`, 201 Sonnet specialists → `effort: medium`, 43 Haiku scouts → omitted** (Haiku doesn't support the parameter). Per-agent reasoning-spend lever, independent of model tier — the largest cost lever in the fleet (200 specialists no longer pay lead-level reasoning for read-and-report review). Frontmatter-only, idempotent insert after the `model:` line; `validate-agent-refs` + full suite (1390/1390) green; integrity preserved. Closes the M2 deferral from v23.16.0. Updated ADR-054 (status→fleet-applied), SUB_AGENTS.md, COMPATIBILITY.md. Dep `^23.16.0` → `^23.17.0`. (Aside: confirmed the v23.16.0 gate fix live — a Workflow launch was correctly BLOCKED this session until a `--light` bypass was set; noted a reap-vs-fresh-bypass timing race for a future field report.) |
|
|
18
20
|
| 23.16.0 | 2026-06-13 | Platform-alignment campaign — implements the ADR set from `/architect --plan` (→ `/campaign`). **ADR-064 (gate↔Workflow) IMPLEMENTED:** the Silver Surfer `PreToolUse` hook matcher is now `Agent\|Workflow` and `check.sh` gates the **Workflow tool launch** on a recorded roster (closes the proven bypass where workflow-spawned agents skipped the gate); `test.sh` gains 3 Workflow cases (**23/23**), mirrored to the methodology package. **Behavior change:** a Workflow run now requires a recorded roster or a `--light`/`--solo` bypass — build/apply/research workflows should set a bypass. **ADR-065 (platform floor):** `docs/COMPATIBILITY.md` gains a Claude Code platform-floor + per-feature maturity table; informational `claudeCodeFloor` field; semver rule (raising the floor = breaking) + release-checklist item. **ADR-066 (native-capability tracker):** new `docs/NATIVE_CAPABILITIES.md` audits all commands vs native skills with dispositions (`/qa`,`/test` coexist+document; `/git` keep) — realizes ADR-050's deferred follow-up; release re-audit item added. Amended **ADR-051** (workflow-exemption→closure), **ADR-054** (effort tiers + Haiku 200K/no-effort). M2 effort fleet-edit deferred per ADR-054 precondition (runtime `effort:`-frontmatter honoring unverified). Dep `^23.15.0` → `^23.16.0`. |
|
|
19
21
|
| 23.15.0 | 2026-06-13 | Platform-alignment build (`/architect --plan` → `/build` b+a). **P0:** empirically confirmed the Silver Surfer `PreToolUse` gate is **blind to Workflow-tool-spawned agents** (this session: 60+ workflow agents → 2 gate events; controlled probe BEFORE=2/AFTER=2) and wrote **ADR-064** (gate↔Workflow interop — extend matcher to `Agent\|Workflow`, gate the workflow launch; implementation tracked for the campaign). **P1-B near-free batch:** fixed a **live runtime bug** — `anthropic.ts` fell back to the non-existent `claude-sonnet-4-7` (404 on the exact degraded path the fallback exists for) → `claude-sonnet-4-6`, plus the bug-asserting test (now 6/6) and 4 docs; purged stale model IDs (`claude-sonnet-4-20250514`→`claude-sonnet-4-6` in 6 pattern files; `Opus 4.7`→`4.8` across SUB_AGENTS + 4 ADRs); added the **effort-tiering policy** (leads `xhigh` / specialists `medium` / Haiku omit-no-effort+200K ceiling) to SUB_AGENTS.md + CLAUDE.md flag-taxonomy mapping; **amended ADR-059** with the real platform caps (~16 concurrent / ~1,000 per run) and fixed GAUNTLET.md's contradicting "waves of 3". Dogfooded the pre-tag `npm test` gate (ADR from v23.13.1) + the publish-gate alignment (v23.14.0). Dep `^23.14.0` → `^23.15.0`. Follow-on (operator-directed): `/architect --plan` ADR-065/066 + amend ADR-051/054 → `/campaign` to build all. |
|
|
@@ -0,0 +1,78 @@
|
|
|
1
|
+
# WORKFLOWS — Dynamic Workflow Orchestration (the authoring standard)
|
|
2
|
+
|
|
3
|
+
> When a command fans out more agents than one conversation can coordinate, the deterministic skeleton belongs in a **Workflow script**, not in the lead's context. The personas and the judgment stay prose; the *scheduling* becomes code.
|
|
4
|
+
|
|
5
|
+
VoidForge's heavy commands (`/gauntlet` 60–80 agents, `/assemble` review phases) were authored before the Workflow tool existed and hand-fan sub-agents via the Agent tool — every intermediate findings table lands in the lead's context. Dynamic Workflows move the fan-out into a JavaScript script whose intermediate results live in **script variables**; only the final synthesis reaches the lead. This is the supported escalation of the dispatch discipline in `SUB_AGENTS.md`.
|
|
6
|
+
|
|
7
|
+
Canonical scripts live in **`.claude/workflows/*.workflow.js`** and are invoked via the Workflow tool (`scriptPath`).
|
|
8
|
+
|
|
9
|
+
## When to use a workflow (vs raw Agent dispatch)
|
|
10
|
+
|
|
11
|
+
| Use a **workflow** | Use **raw Agent dispatch** |
|
|
12
|
+
|---|---|
|
|
13
|
+
| 10+ agents, or fan-out → reduce → synthesize | 2–9 agents the lead orchestrates directly |
|
|
14
|
+
| Repetitive/deterministic rounds, loop-until-dry | The lead must judge between every round |
|
|
15
|
+
| Intermediates would flood the lead's context | Findings are few and the lead acts on each |
|
|
16
|
+
| `/gauntlet`, `/assemble` review phases | a one-off targeted review |
|
|
17
|
+
|
|
18
|
+
## The canonical shape
|
|
19
|
+
|
|
20
|
+
**Fan-out → reduce/dedupe (plain JS) → schema-validated verify → synthesize.** The reduce/dedupe step is *plain JavaScript* (a `Map`, a `filter`) — do **not** spend an agent on it.
|
|
21
|
+
|
|
22
|
+
```js
|
|
23
|
+
export const meta = { // MUST be a pure literal — no vars/calls/spreads
|
|
24
|
+
name: 'example', description: 'one line', // shown in the gate/permission dialog
|
|
25
|
+
phases: [{ title: 'Find' }, { title: 'Verify' }],
|
|
26
|
+
}
|
|
27
|
+
const roster = typeof args === 'string' ? JSON.parse(args) : args // see Gotcha 1
|
|
28
|
+
phase('Find')
|
|
29
|
+
const found = (await parallel(roster.map(a => () =>
|
|
30
|
+
agent(prompt(a), { label: `${a.name} · find:${a.key}`, phase: 'Find', schema: FINDINGS, agentType: a.name })
|
|
31
|
+
))).filter(Boolean) // agentType resolves by the agent's `name:` display field — see Gotcha 6
|
|
32
|
+
const claims = dedupe(found.flatMap(f => f.findings)) // plain JS reduce — no agent
|
|
33
|
+
phase('Verify')
|
|
34
|
+
const verdicts = await parallel(claims.map(c => () =>
|
|
35
|
+
agent(refutePrompt(c), { label: `verify:${c.id}`, phase: 'Verify', schema: VERDICT })))
|
|
36
|
+
return { confirmed: claims.filter((c,i) => verdicts[i]?.survives) }
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
## API essentials
|
|
40
|
+
|
|
41
|
+
- `phase(title)` — groups subsequent `agent()` calls in the `/workflows` progress tree.
|
|
42
|
+
- `parallel(thunks)` — **barrier**: awaits all; a failed thunk resolves to `null` (`.filter(Boolean)`). Use only when you need all results together (dedup, early-exit, cross-comparison).
|
|
43
|
+
- `pipeline(items, stage1, stage2, …)` — **no barrier**: each item flows through all stages independently. The **default** for multi-stage work (verify-as-soon-as-found).
|
|
44
|
+
- `agent(prompt, {schema, agentType, model, label, phase, isolation})` — spawn a sub-agent. With `schema` (JSON Schema) it returns the validated object and **auto-retries on malformed output** — this replaces "please return JSON and hope." Without schema, returns final text.
|
|
45
|
+
- `log(msg)` — narrator line. `budget` — token target. `workflow(name|{scriptPath}, args)` — nested run (one level).
|
|
46
|
+
|
|
47
|
+
## Gotchas (paid for in field reports)
|
|
48
|
+
|
|
49
|
+
1. **`args` arrives as a JSON string** (#363 F5). First line of every script that takes structured args: `const parsed = typeof args === 'string' ? JSON.parse(args) : args;` — `args.map(...)` on the raw value throws `is not a function`.
|
|
50
|
+
2. **Label must lead with the character name** (#348 #2): `"Picard · review:arch"`, not `"review:arch"` — a bare dimension key overrides the agent identity and breaks the Danger Room ticker correlation. Omit the label to let `agentType` surface on its own.
|
|
51
|
+
3. **No `Date.now()` / `Math.random()` / argless `new Date()`** — they throw (they'd break resume). Pass timestamps via `args`; vary by index for "randomness."
|
|
52
|
+
4. **Concurrency caps (ADR-059):** ~16 concurrent / ~1,000 total per run. `parallel([...])` accepts 100s of items but only ~16 run at once. **Batch** unbounded fan-outs (glob-then-partition, `SUB_AGENTS.md`); never one-agent-per-file on a large repo.
|
|
53
|
+
5. **Cost lever:** route cheap stages with `agent(p, {model:'haiku'})` (scout pre-scans) and reserve the default model for synthesis — the way the Surfer already runs on Haiku.
|
|
54
|
+
6. **`agentType` resolves by the agent's `name:` display field, NOT the filename** (e.g. `'Picard'`, not `'picard-architecture'`). A filename-style `agentType` fails to resolve and the `agent()` call returns `null` (silently filtered by `.filter(Boolean)`), so the agent simply never runs. If a roster carries both, pass `a.name`. Same rule as the Agent tool's `subagent_type`.
|
|
55
|
+
7. **Validate before shipping:** a workflow script's top-level `await`/`return` make a bare `node --check` fail ("Illegal return statement") — that is expected (the runtime wraps the body in an async fn). Use `npm run validate:workflows` (wired into `pretest`), which reproduces the wrapper before checking, so a real syntax error is caught in CI rather than shipping to npm.
|
|
56
|
+
|
|
57
|
+
## Gate interop (ADR-064) — REQUIRED
|
|
58
|
+
|
|
59
|
+
The Silver Surfer gate's `PreToolUse` hook now matches `Agent|Workflow`, so **a Workflow launch is gated like an Agent launch**. A command that runs a *review* workflow MUST satisfy the gate at the launch boundary:
|
|
60
|
+
|
|
61
|
+
1. **Muster the Surfer** (Agent tool — self-launch is always allowed) and **`record-roster.sh`** the returned roster — *before* invoking the Workflow.
|
|
62
|
+
2. **Invoke the Workflow**, passing the roster via `args`. The gate allows it (roster recorded); the workflow's internal `agent()` calls are the contents of that authorized roster.
|
|
63
|
+
|
|
64
|
+
Build/apply/research workflows that are **not** a review roster set a `--light`/`--solo` **bypass** (`bypass.sh`) instead. Never invoke a review Workflow without a recorded roster — that was the exact bypass ADR-064 closed.
|
|
65
|
+
|
|
66
|
+
## What stays prose (workflows orchestrate; they don't replace judgment)
|
|
67
|
+
|
|
68
|
+
The 264 personas, the Agent Debate Protocol, severity re-rating from votes, the "Verify the FIX not just the finding" interrogation, and the application of fixes between rounds stay as agent prompts / lead judgment. A workflow deterministically *schedules* "spawn 2 skeptics, collect schema-validated votes"; it does not decide "is this Critical really Critical." The workflow is the skeleton; the personas and judgment are the muscle.
|
|
69
|
+
|
|
70
|
+
## Resume
|
|
71
|
+
|
|
72
|
+
Every Workflow run persists its script + a journal. To resume after an edit/kill: `Workflow({scriptPath, resumeFromRunId})` — unchanged `agent()` calls return cached results; the first edited call and everything after re-runs.
|
|
73
|
+
|
|
74
|
+
## Related
|
|
75
|
+
|
|
76
|
+
- `SUB_AGENTS.md` — dispatch discipline, model/effort tiering, the find→verify review shape, fan-out residual sweeps.
|
|
77
|
+
- `ADR-064` (gate↔workflow interop), `ADR-059` (concurrency caps).
|
|
78
|
+
- `.claude/workflows/gauntlet.workflow.js`, `.claude/workflows/assemble-review.workflow.js` — the reference re-platformed commands.
|
|
@@ -89,6 +89,15 @@ async function copyMethodology(methodologyRoot, projectDir, core) {
|
|
|
89
89
|
if (existsSync(agentsSrc)) {
|
|
90
90
|
count += await copyDir(agentsSrc, join(projectDir, '.claude', 'agents'));
|
|
91
91
|
}
|
|
92
|
+
// Dynamic Workflow scripts (ADR-067 — gauntlet/assemble review skeletons).
|
|
93
|
+
// gauntlet.md / assemble.md reference `.claude/workflows/*.workflow.js`; without this
|
|
94
|
+
// a fresh `npx voidforge-build init` ships the command docs but NOT the scripts they
|
|
95
|
+
// invoke. prepack.sh + copy-assets.sh ship them to the npm package and dist/, but this
|
|
96
|
+
// CLI init copy path (methodologyRoot = the installed package) was missed in v23.18.0.
|
|
97
|
+
const workflowsSrc = join(methodologyRoot, '.claude', 'workflows');
|
|
98
|
+
if (existsSync(workflowsSrc)) {
|
|
99
|
+
count += await copyDir(workflowsSrc, join(projectDir, '.claude', 'workflows'));
|
|
100
|
+
}
|
|
92
101
|
// Methods
|
|
93
102
|
const methodsSrc = join(methodologyRoot, 'docs', 'methods');
|
|
94
103
|
if (existsSync(methodsSrc)) {
|
|
@@ -108,6 +117,15 @@ async function copyMethodology(methodologyRoot, projectDir, core) {
|
|
|
108
117
|
await cp(registrySrc, join(projectDir, 'docs', 'NAMING_REGISTRY.md'));
|
|
109
118
|
count++;
|
|
110
119
|
}
|
|
120
|
+
// Agent classification — single source of truth for agent counts (v23.7.0). Command
|
|
121
|
+
// docs derive their counts from it; prepack ships it to the npm package, but this init
|
|
122
|
+
// path omitted it, so init'd projects couldn't resolve the count SSOT.
|
|
123
|
+
const classificationSrc = join(methodologyRoot, 'docs', 'AGENT_CLASSIFICATION.md');
|
|
124
|
+
if (existsSync(classificationSrc)) {
|
|
125
|
+
await mkdir(join(projectDir, 'docs'), { recursive: true });
|
|
126
|
+
await cp(classificationSrc, join(projectDir, 'docs', 'AGENT_CLASSIFICATION.md'));
|
|
127
|
+
count++;
|
|
128
|
+
}
|
|
111
129
|
// Thumper scripts
|
|
112
130
|
const thumperSrc = join(methodologyRoot, 'scripts', 'thumper');
|
|
113
131
|
if (existsSync(thumperSrc)) {
|
|
@@ -58,9 +58,16 @@ export async function diffMethodology(projectDir) {
|
|
|
58
58
|
const dirs = [
|
|
59
59
|
{ src: '.claude/commands', dest: '.claude/commands' },
|
|
60
60
|
{ src: '.claude/agents', dest: '.claude/agents' },
|
|
61
|
+
// Dynamic Workflow scripts (ADR-067) and the Silver Surfer gate (ADR-051/060/064):
|
|
62
|
+
// both ship to new projects via init but were absent from the updater's diff list, so
|
|
63
|
+
// `npx voidforge-build update` never propagated them — existing projects were stranded
|
|
64
|
+
// on whatever gate/workflow scripts they were created with (e.g. a gate before this
|
|
65
|
+
// release's reap fix). Invocation is via `bash <script>` so exec bits are not required.
|
|
66
|
+
{ src: '.claude/workflows', dest: '.claude/workflows' },
|
|
61
67
|
{ src: 'docs/methods', dest: 'docs/methods' },
|
|
62
68
|
{ src: 'docs/patterns', dest: 'docs/patterns' },
|
|
63
69
|
{ src: 'scripts/thumper', dest: 'scripts/thumper' },
|
|
70
|
+
{ src: 'scripts/surfer-gate', dest: 'scripts/surfer-gate' },
|
|
64
71
|
];
|
|
65
72
|
// Single files to compare
|
|
66
73
|
const singleFiles = ['CLAUDE.md', 'HOLOCRON.md', 'VERSION.md'];
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "voidforge-build",
|
|
3
|
-
"version": "23.
|
|
3
|
+
"version": "23.19.0",
|
|
4
4
|
"description": "From nothing, everything. A methodology framework for building with Claude Code.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"engines": {
|
|
@@ -45,7 +45,7 @@
|
|
|
45
45
|
"@aws-sdk/client-rds": "^3.700.0",
|
|
46
46
|
"@aws-sdk/client-s3": "^3.700.0",
|
|
47
47
|
"@aws-sdk/client-sts": "^3.700.0",
|
|
48
|
-
"voidforge-build-methodology": "^23.
|
|
48
|
+
"voidforge-build-methodology": "^23.19.0",
|
|
49
49
|
"node-pty": "^1.2.0-beta.12",
|
|
50
50
|
"ws": "^8.19.0"
|
|
51
51
|
},
|