substrate-ai 0.20.46 → 0.20.48

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -4,9 +4,9 @@
 
 # Substrate
 
- Substrate is an autonomous software development pipeline, operated by your AI coding assistant. Install it, initialize your project, and tell Claude what to build — Substrate handles the rest.
+ Substrate is an autonomous software development pipeline, operated by your AI coding assistant. Install it, initialize your project, and tell Claude (or Codex, or Gemini) what to build — Substrate handles the rest.
 
- Most multi-agent coding tools help you run AI sessions in parallel but leave planning, quality control, and learning up to you. Substrate is different: it packages structured planning methodology, multi-agent parallel execution, automated code review cycles, and self-improvement into a single pipeline. Describe your project concept, and Substrate takes it from research through implementation and review — coordinating multiple AI coding agents across isolated worktree branches while a supervisor watches for stalls, auto-recovers, and experiments with improvements to close the loop.
+ Most multi-agent coding tools help you run AI sessions in parallel but leave planning, quality control, and learning up to you. Substrate is different: it packages **structured planning methodology**, **multi-agent parallel execution**, **a six-stage verification pipeline**, **automated review-and-fix cycles**, and **a self-improvement loop** into a single pipeline. Describe your project concept, and Substrate takes it from research through implementation and review — coordinating multiple AI coding agents across isolated worktree branches while a supervisor watches for stalls, auto-recovers, and experiments with improvements to close the loop.
 
 ## How It Works
 
@@ -28,7 +28,7 @@ Substrate operates through a three-layer interaction model:
 
 **You talk to your AI assistant. Your assistant talks to Substrate. Substrate orchestrates everything.**
 
- Here's what that looks like in practice:
+ In practice:
 
 ```
 You: "Implement stories 7-1 through 7-5"
@@ -36,15 +36,17 @@ You: "Implement stories 7-1 through 7-5"
 
 Claude Code: runs `substrate run --events --stories 7-1,7-2,7-3,7-4,7-5`
 
 Substrate: dispatches 5 stories across 3 agents in parallel worktrees
- → story 7-1: dev complete, code review: SHIP_IT
- → story 7-2: dev complete, code review: NEEDS_MINOR_FIXES → auto-fix → SHIP_IT
- → story 7-3: escalated (interface conflict) → Claude asks you what to do
- → story 7-4: dev complete, code review: SHIP_IT
- → story 7-5: dev complete, code review: SHIP_IT
+ → story 7-1: dev complete, 6 verification checks → SHIP_IT
+ → story 7-2: code review NEEDS_MINOR_FIXES → auto-fix → SHIP_IT
+ → story 7-3: source-ac-fidelity flagged a missing path → escalated
+ → story 7-4: runtime probe failed → escalated for diagnosis
+ → story 7-5: SHIP_IT first cycle
 
- Claude Code: "4 succeeded, 1 escalated — here's the interface conflict in 7-3..."
+ Claude Code: "3 succeeded, 2 escalated — here's the runtime-probe failure on 7-4..."
 ```
 
+ Substrate is also **self-developing**: substrate's own development is dispatched through substrate. The fixes shipped in v0.20.42 → v0.20.46 (probe-awareness, frontmatter declarations, dependency-context detection, AnthropicAdapter streaming) were authored by substrate dispatching against its own codebase. This is intentional dogfooding — see the `substrate-on-substrate` examples below.
+
 ## Prerequisites
 
 - **Node.js** 22.0.0 or later
@@ -53,6 +55,7 @@ Claude Code: "4 succeeded, 1 escalated — here's the interface conflict in 7-3.
 - [Claude Code](https://docs.anthropic.com/en/docs/claude-code) (`claude`)
 - [Codex CLI](https://github.com/openai/codex) (`codex`)
 - Gemini CLI (`gemini`)
+ - **Optional but recommended**: [Dolt](https://www.dolthub.com/) for versioned pipeline state
 
 ## Quick Start
 
@@ -65,97 +68,109 @@ substrate init
 ```
 
 This does three things:
- 1. **Generates `.substrate/config.yaml`** — provider routing, concurrency, budgets
- 2. **Injects a `## Substrate Pipeline` section into CLAUDE.md** — behavioral directives that teach your AI assistant how to operate the pipeline
- 3. **Creates `.claude/commands/` slash commands** — `/substrate-run`, `/substrate-supervisor`, `/substrate-metrics`
-
- ### Run From Your AI Assistant
-
- Start a Claude Code session in your project. Claude automatically reads the substrate instructions from CLAUDE.md and knows how to operate the pipeline. From there:
 
- - **"Run the substrate pipeline"** — Claude runs the full lifecycle from analysis through implementation
- - **"Run substrate for stories 7-1, 7-2, 7-3"** — Claude implements specific stories
- - **"/substrate-run"** — invoke the slash command directly for a guided pipeline run
-
- Claude parses structured events, handles escalations, offers to fix review issues, and summarizes results. You stay in control — Claude always asks before re-running failed stories or applying fixes.
+ 1. **Generates `.substrate/config.yaml`** — provider routing, concurrency, budgets, quality mode
+ 2. **Injects a `## Substrate Pipeline` section into CLAUDE.md** — behavioral directives that teach your AI assistant how to operate the pipeline
+ 3. **Creates `.claude/commands/` slash commands** — `/substrate-run`, `/substrate-supervisor`, `/substrate-metrics`, `/substrate-factory-loop`
 
- ### Monitor and Self-Improve
+ If Dolt is on PATH, `substrate init` automatically sets up versioned state. Without Dolt, substrate falls back to plain SQLite.
 
- While the pipeline runs (or after it finishes):
+ ### Run From Your AI Assistant
 
- > "Run the substrate supervisor"
+ Start a session in your AI tool of choice. The assistant reads the substrate instructions from `CLAUDE.md` and knows how to operate the pipeline:
 
- The supervisor watches the pipeline, kills stalls, and auto-restarts. When the run completes, it analyzes what happened — bottlenecks, token waste, slow stories — then optionally runs A/B experiments on prompts and config in isolated worktrees. Improvements get auto-PRed; regressions get discarded.
+ - **"Run the substrate pipeline"** — full lifecycle from analysis through implementation
+ - **"Run substrate for stories 7-1, 7-2, 7-3"** — implement specific stories
+ - **"/substrate-run"** — invoke the slash command directly for a guided run
 
- This is the full loop: **run → watch → analyze → experiment → improve.**
+ Your assistant parses NDJSON events, handles escalations, offers to fix review issues, and summarizes results. You stay in control — your assistant always asks before re-running failed stories or applying fixes.
 
 ### Run From the CLI Directly
 
- You can also run substrate directly from the terminal:
-
 ```bash
 # Full pipeline with NDJSON event stream
 substrate run --events
 
- # Specific stories
- substrate run --events --stories 7-1,7-2,7-3
+ # Specific stories with stricter review limits
+ substrate run --events --stories 7-1,7-2,7-3 --max-review-cycles 3
 
- # Human-readable progress output (default)
- substrate run
+ # Resume an interrupted run
+ substrate resume
+
+ # Cancel a running pipeline
+ substrate cancel
 ```
 
 ## The Pipeline
 
- When you tell Substrate to build something, it runs through up to six phases — auto-detecting which phase to start from based on what artifacts already exist:
+ When you tell Substrate to build something, it runs through up to **six phases** — auto-detecting which phase to start from based on what artifacts already exist.
 
 ### Full Lifecycle (from concept)
 
- 1. **Research** — technology stack research, keyword extraction (optional)
- 2. **Analysis** — processes concept into structured product brief with problem statement, target users, core features
- 3. **Planning** — breaks product brief into epics and stories
- 4. **Solutioning** — technical architecture design with constraints, tech stack, design decisions
- 5. **Implementation** — parallel story execution (see below)
- 6. **Contract Verification** — post-sprint validation of cross-story interfaces
+ | Phase | Purpose |
+ |---|---|
+ | **Research** *(optional)* | Technology stack research, keyword extraction |
+ | **Analysis** | Concept → structured product brief (problem, users, features) |
+ | **Planning** | Brief → epics and stories |
+ | **Solutioning** | Architecture: tech stack, design decisions, constraints |
+ | **Implementation** | Parallel story execution (see below) |
+ | **Contract Verification** | Post-sprint cross-story interface validation |
 
 ### Per-Story Implementation
 
- Each story flows through a quality-gated loop:
+ Each story flows through a sequence of phases with a quality-gated review loop:
 
 ```
- create-story → dev-story → build-verify → code-review
-
- SHIP_IT → done ✓
- NEEDS_MINOR_FIXES → auto-fix → code-review
- NEEDS_MAJOR_REWORK → rework → code-review
- max cycles exceeded → escalated ⚠
+ create-story → test-plan → dev-story → build-fix → code-review
+
+ SHIP_IT → verification → done ✓
+ NEEDS_MINOR_FIXES → fix → code-review
+ NEEDS_MAJOR_REWORK → rework → code-review
+ max cycles exceeded → escalated ⚠
 ```
 
- Stories run in parallel across your available agents, each in its own git worktree. Build verification catches compilation errors before code review. Zero-diff detection catches phantom completions. Interface change warnings flag potential cross-module impacts.
+ Stories run in parallel across your available agents, each in its own git worktree. After dev-story completes, an optional `probe-author` phase dispatches for event-driven and state-integrating ACs (see [Verification Pipeline](#verification-pipeline)) to derive runtime probes from AC text. Build-fix runs the project's build to catch compilation errors before code review.
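As a rough illustration, the quality-gated review loop behaves like a tiny state machine. This is a hypothetical sketch only; the `Verdict` type and `runReviewLoop` function are illustrative, not part of substrate's API:

```typescript
// Hypothetical sketch of the per-story review loop shown above.
// Verdict names mirror the flow diagram; everything else is illustrative.
type Verdict = 'SHIP_IT' | 'NEEDS_MINOR_FIXES' | 'NEEDS_MAJOR_REWORK'

function runReviewLoop(verdicts: Verdict[], maxReviewCycles = 2): 'done' | 'escalated' {
  for (let cycle = 1; cycle <= maxReviewCycles; cycle++) {
    const verdict = verdicts[cycle - 1]
    if (verdict === 'SHIP_IT') return 'done' // proceeds to verification
    // NEEDS_MINOR_FIXES → fix, NEEDS_MAJOR_REWORK → rework, then re-review
  }
  return 'escalated' // max cycles exceeded
}
```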
+
+ ### Verification Pipeline
+
+ Six gates run after code review. Each can pass, warn, or fail; failures block SHIP_IT.
+
+ | Gate | What it catches |
+ |---|---|
+ | **phantom-review** | Code review that returned no real verdict (review output malformed/empty) |
+ | **trivial-output** | Output token count below threshold — likely no real work done |
+ | **acceptance-criteria-evidence** | Each AC has demonstrable evidence in dev-story signals (files modified, tests added) |
+ | **build** | Project build succeeds against the dev's worktree |
+ | **runtime-probes** | Each probe declared in the `## Runtime Probes` section runs successfully against real or sandboxed state. Includes auto-detection for error-shape envelopes (`{"isError": true}`, `{"status": "error"}`) and production-trigger requirements for event-driven ACs. Frontmatter `external_state_dependencies` declarations hard-gate when the probes section is missing. |
+ | **source-ac-fidelity** | AC text in the source epic appears verbatim in the story artifact (paths, MUST clauses, hard contracts). Includes 4 context-aware heuristics: negation (paths the AC says NOT to deliver), dependency-context (peer packages the implementation imports), operational-path (system install destinations like `.git/hooks/`), and alternative-option groups. |
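The pass/warn/fail semantics compose simply. A hedged sketch (the `GateResult` shape here is an assumption for illustration, not the exported type):

```typescript
// Illustrative aggregation of the six gate outcomes described above:
// any failure blocks SHIP_IT, warnings surface without blocking.
type GateOutcome = 'pass' | 'warn' | 'fail'
interface GateResult { gate: string; outcome: GateOutcome }

function blocksShipIt(results: GateResult[]): boolean {
  return results.some(r => r.outcome === 'fail')
}
```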
 
 ### Already Have Planning Artifacts?
 
- If your project already has BMAD artifacts (from any tool), Substrate skips straight to implementation:
+ Substrate skips to whichever phase is needed:
+
+ | File | Purpose |
+ |---|---|
+ | `_bmad-output/planning-artifacts/epics.md` *(or per-epic `epic-N-*.md`)* | Parsed into per-epic context shards |
+ | `_bmad-output/planning-artifacts/architecture.md` | Tech stack and constraints for agents |
+ | `_bmad-output/implementation-artifacts/<key>-*.md` | Existing story files — substrate skips re-creation |
 
- | File | Required? | Purpose |
- |------|-----------|---------|
- | `_bmad-output/planning-artifacts/epics.md` | Yes | Parsed into per-epic context shards |
- | `_bmad-output/planning-artifacts/architecture.md` | Yes | Tech stack and constraints for agents |
- | `_bmad-output/implementation-artifacts/*.md` | Optional | Existing story files — Substrate skips creation for any it finds |
+ Drop these in any project and run `substrate run --events --stories <keys>` to dispatch implementation.
 
 ## AI Agent Integration
 
- Substrate is designed to be operated by AI agents, not just humans. Three mechanisms teach agents how to interact with the pipeline at runtime:
+ Substrate is designed to be operated by AI agents, not just humans. Three mechanisms teach agents how to interact with the pipeline at runtime.
 
 ### CLAUDE.md Scaffold
 
- `substrate init` injects a `## Substrate Pipeline` section into your project's CLAUDE.md with:
+ `substrate init` injects a `## Substrate Pipeline` section into your project's `CLAUDE.md` with:
 
 - Instructions to run `--help-agent` on first use
 - Event-driven interaction patterns (escalation handling, fix offers, confirmation requirements)
 - Supervisor workflow guidance
+ - Cross-project observation lifecycle norms (reopen-evidence requirements)
 - Version stamp for detecting stale instructions after upgrades
 
- The section is wrapped in `<!-- substrate:start/end -->` markers for idempotent updates. Re-running `init` updates the substrate section while preserving all other CLAUDE.md content.
+ The section is wrapped in `<!-- substrate:start/end -->` markers for idempotent updates. Re-running `init` updates the substrate section while preserving everything else.
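Marker-based injection of this kind can be sketched as follows. This is illustrative only; the literal `<!-- substrate:start -->` / `<!-- substrate:end -->` strings are an assumption expanded from the `substrate:start/end` shorthand:

```typescript
// Illustrative idempotent upsert between assumed comment markers.
const START = '<!-- substrate:start -->'
const END = '<!-- substrate:end -->'

function upsertSection(doc: string, section: string): string {
  const block = `${START}\n${section}\n${END}`
  const start = doc.indexOf(START)
  const end = doc.indexOf(END)
  if (start !== -1 && end !== -1) {
    // Markers present: replace only the managed block, preserve the rest.
    return doc.slice(0, start) + block + doc.slice(end + END.length)
  }
  // First run: append the managed block.
  return doc ? `${doc}\n\n${block}` : block
}
```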
 
 ### Self-Describing CLI (`--help-agent`)
 
@@ -165,7 +180,7 @@ substrate run --help-agent
 
 Outputs a machine-optimized prompt fragment (<2000 tokens) that an AI agent can ingest as a system prompt. Generated from the same TypeScript type definitions as the event emitter, so documentation never drifts from implementation. Includes:
 
- - All available commands and flags with examples
+ - All commands and flags with examples
 - Capabilities manifest — installed version, available engines, configured providers, active features
 - Complete event protocol schema
 - Decision flowchart for handling each event type
@@ -174,34 +189,33 @@ Outputs a machine-optimized prompt fragment (<2000 tokens) that an AI agent can
 
 `substrate init` generates `.claude/commands/` slash commands:
 
- - `/substrate-run` — Start or resume a pipeline run with structured events
- - `/substrate-supervisor` — Launch the supervisor monitor with stall detection and auto-restart
- - `/substrate-metrics` — Query run history, compare runs, and read analysis reports
+ - `/substrate-run` — start or resume a pipeline run with structured events
+ - `/substrate-supervisor` — launch the supervisor with stall detection and auto-restart
+ - `/substrate-metrics` — query run history and analysis reports
+ - `/substrate-factory-loop` — run the convergence loop (see [Software Factory](#software-factory-advanced))
 
 ### NDJSON Event Protocol
 
 With `--events`, Substrate emits newline-delimited JSON events on stdout for programmatic consumption:
 
- ```bash
- substrate run --events
- ```
-
- Event types form a discriminated union on the `type` field:
-
- | Event | Description |
- |-------|-------------|
- | `pipeline:start` | Pipeline begins — includes `run_id`, `stories[]`, `concurrency` |
- | `pipeline:complete` | Pipeline ends — includes `succeeded[]`, `failed[]`, `escalated[]` |
- | `story:phase` | Story transitions between phases (`create-story`, `dev-story`, `code-review`, `fix`) |
- | `story:done` | Story reaches terminal state with `review_cycles` count |
- | `story:escalation` | Story escalated — includes issue list with severities |
- | `story:metrics` | Per-story wall-clock time, token counts, phase breakdown |
- | `story:warn` | Non-fatal warning (e.g., token ceiling truncation) |
+ | Event | When |
+ |---|---|
+ | `pipeline:start` | Pipeline begins (`run_id`, `stories[]`, `concurrency`) |
+ | `pipeline:complete` | Pipeline ends (`succeeded[]`, `failed[]`, `escalated[]`) |
 | `pipeline:heartbeat` | Periodic heartbeat with active/completed/queued dispatch counts |
- | `supervisor:*` | Supervisor lifecycle — `poll`, `kill`, `restart`, `abort`, `summary` |
- | `supervisor:experiment:*` | Experiment loop — `start`, `recommendations`, `complete`, `error` |
-
- All events carry a `ts` (ISO-8601 timestamp) field. Full TypeScript types are exported:
+ | `pipeline:contract-mismatch` | Cross-story interface conflict detected |
+ | `story:phase` | Story transitions phase (`create-story`, `test-plan`, `dev-story`, `build-fix`, `code-review`, `fix`) |
+ | `story:done` | Story reaches terminal state |
+ | `story:metrics` | Per-story wall-clock, tokens, phase breakdown |
+ | `story:escalation` | Story escalated with issue list |
+ | `story:warn` | Non-fatal warning (token ceiling, low output, etc.) |
+ | `verification:check-complete` | Single verification gate finished |
+ | `verification:story-complete` | All verification gates done for a story |
+ | `probe-author:*` | Probe-author phase events (`dispatched`, `output-parsed`, `appended-to-artifact`, `skipped`, `authored-probe-failed`) |
+ | `supervisor:*` | Supervisor lifecycle (`poll`, `kill`, `restart`, `abort`, `summary`) |
+ | `supervisor:experiment:*` | Self-improvement loop (`start`, `recommendations`, `complete`, `skip`, `error`) |
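A consumer can split stdout on newlines, parse each line as JSON, and branch on `type`. A hedged sketch; only the `type` and `ts` fields from the table above are assumed, other fields vary per event:

```typescript
// Illustrative NDJSON consumer for `substrate run --events` output.
interface BaseEvent { type: string; ts: string }

function collectEscalations(ndjson: string): BaseEvent[] {
  return ndjson
    .split('\n')
    .filter(line => line.trim().length > 0) // skip blank lines
    .map(line => JSON.parse(line) as BaseEvent)
    .filter(ev => ev.type === 'story:escalation')
}
```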
+
+ All events carry a `ts` (ISO-8601) field. Full TypeScript types are exported:
 
 ```typescript
 import type { PipelineEvent, StoryEscalationEvent } from 'substrate-ai'
@@ -216,33 +230,34 @@ if (event.type === 'story:escalation') {
 
 ## Supported Worker Agents
 
- Substrate dispatches work to CLI-based AI agents running as child processes. It never calls LLMs directly — all implementation, code review, and story generation is delegated to worker agents.
+ Substrate dispatches work to CLI-based AI agents running as child processes. It never calls LLMs directly from the dispatch path — implementation, code review, and story generation are all delegated to worker agents.
 
 | Agent ID | CLI Tool | Billing |
- |----------|----------|---------|
+ |---|---|---|
 | `claude-code` | [Claude Code](https://docs.anthropic.com/en/docs/claude-code) | Subscription (Max) or API key |
 | `codex` | [Codex CLI](https://github.com/openai/codex) | Subscription (ChatGPT Plus/Pro) or API key |
 | `gemini` | Gemini CLI | Subscription or API key |
 
- Substrate auto-discovers available agents at startup and routes work based on adapter health checks and your routing configuration. Unlike API-based orchestrators, Substrate routes work through the CLI tools you already have installed, maximizing your existing AI subscriptions before falling back to pay-per-token billing.
+ `substrate adapters list` shows what's installed and healthy. `substrate adapters check` runs full headless-mode verification on each.
+
+ Substrate routes work through CLI tools you already have installed, maximizing your existing AI subscriptions before falling back to pay-per-token billing. Per-task routing is configurable in `.substrate/routing-policy.yaml` and tunable via `substrate routing`.
 
 ## Observability and Self-Improvement
 
- ### Pipeline Monitoring
+ ### Live Pipeline Monitoring
 
 ```bash
 # Human-readable progress (default)
 substrate run
- # Shows compact, updating progress lines:
- # [dev] 7-2 implementing...
- # [review] 7-3 SHIP_IT (1 cycle)
- # [done] 7-5 SHIP_IT (2 cycles)
 
- # Real-time health check
+ # Real-time health
 substrate health --output-format json
 
 # Poll status
 substrate status --output-format json
+
+ # TUI dashboard
+ substrate run --tui
 ```
 
 - **TTY mode**: ANSI cursor control for in-place line updates
@@ -251,7 +266,7 @@ substrate status --output-format json
 
 ### Supervisor
 
- The supervisor is a long-running monitor that watches pipeline health:
+ Long-running monitor that watches pipeline health:
 
 ```bash
 substrate supervisor --output-format json
@@ -259,6 +274,7 @@ substrate supervisor --output-format json
 ```
 - Detects stalled agents (configurable threshold)
 - Kills stuck process trees and auto-restarts via `resume`
+ - Inherits story scope from health snapshots on restart
 - Emits structured events for each action taken
 
 ### Self-Improvement Loop
@@ -268,12 +284,13 @@ substrate supervisor --experiment --output-format json
 ```
 
 After the pipeline completes, the supervisor:
+
 1. **Analyzes** the run — identifies bottlenecks, token waste, slow stories
 2. **Generates recommendations** — prompt tweaks, config changes, routing adjustments
 3. **Runs A/B experiments** — applies each recommendation in an isolated worktree, re-runs affected stories, compares metrics
- 4. **Verdicts**: IMPROVED changes are kept, REGRESSED changes are discarded
+ 4. **Verdicts**: IMPROVED changes are kept and auto-PRed; REGRESSED changes are discarded
 
- ### Metrics and Cost Tracking
+ ### Metrics, Cost, and Diff
 
 ```bash
 # Historical run metrics
@@ -282,16 +299,38 @@ substrate metrics --output-format json
 # Compare two runs side-by-side
 substrate metrics --compare <run-a>,<run-b>
 
- # Read analysis report
+ # Read analysis report from a supervisor run
 substrate metrics --analysis <run-id> --output-format json
 
 # Cost breakdown
 substrate cost --output-format json
+
+ # Probe-author KPI summary (catch rate, cost, dispatches)
+ substrate metrics --probe-author-summary
+ ```
+
+ With Dolt as the state backend:
+
+ ```bash
+ # Row-level diff of state changes for a story
+ substrate diff <story-key>
+
+ # Commit log of pipeline state mutations
+ substrate history
+ ```
+
+ ### Operator Annotations
+
+ Tag verification findings as confirmed defects, false positives, or probe bugs to drive probe-author KPI feedback:
+
+ ```bash
+ substrate annotate --story 7-3 --finding-category runtime-probe-fail --confirmed-defect --note "..."
+ substrate annotate --story 7-4 --finding-category source-ac-drift --false-positive
 ```
 
 ## Software Factory (Advanced)
 
- Beyond the linear SDLC pipeline, Substrate includes a graph-based execution engine and autonomous quality system:
+ Beyond the linear SDLC pipeline, Substrate includes a graph-based execution engine and autonomous quality system.
 
 ### Graph Engine
 
@@ -299,7 +338,8 @@ Beyond the linear SDLC pipeline, Substrate includes a graph-based execution engi
 substrate run --engine graph --events
 ```
 
- The graph engine reads pipeline topology from DOT files (Graphviz format), enabling:
+ Reads pipeline topology from DOT files (Graphviz format), enabling:
+
 - Conditional edges (retry loops, branching on review verdict)
 - Parallel fan-out/fan-in with configurable join policies
 - LLM-evaluated edge conditions
@@ -308,7 +348,7 @@ The graph engine reads pipeline topology from DOT files (Graphviz format), enabl
 
 ### Scenario-Based Validation
 
- Instead of (or alongside) code review, define external test scenarios that the agent can't game:
+ External test scenarios that the agent can't game:
 
 ```bash
 substrate factory scenarios list
@@ -316,7 +356,7 @@ substrate factory scenarios run
 ```
 
 - **Scenario Store**: SHA-256 manifests for integrity verification
- - **Satisfaction Scoring**: weighted composite of scenario pass rate, performance, complexity
+ - **Satisfaction Scoring**: weighted composite of pass rate, performance, complexity
 - **Convergence Loops**: iterate until satisfaction threshold met, with plateau detection and budget controls
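A weighted composite of that kind could look like this (hypothetical weights and field names for illustration, not substrate's actual formula):

```typescript
// Illustrative satisfaction score: weighted blend of scenario pass rate,
// performance, and complexity (complexity counts against the score).
interface Signals { passRate: number; performance: number; complexity: number } // each in [0, 1]

function satisfactionScore(
  s: Signals,
  w = { passRate: 0.6, performance: 0.25, complexity: 0.15 },
): number {
  return w.passRate * s.passRate + w.performance * s.performance + w.complexity * (1 - s.complexity)
}
```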
 
 ### Quality Modes
@@ -324,8 +364,8 @@ substrate factory scenarios run
 Configure how stories are validated via `.substrate/config.yaml`:
 
 | Mode | Description |
- |------|-------------|
- | `code-review` | Traditional — code review verdict drives the gate (default) |
+ |---|---|
+ | `code-review` | Code review verdict drives the gate (default) |
 | `dual-signal` | Both scenario satisfaction and code review required |
 | `scenario-primary` | Satisfaction score is authoritative |
 | `scenario-only` | Satisfaction only; code review skipped |
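The four modes differ only in which signals gate a story. A hypothetical sketch of the decision (not substrate's implementation):

```typescript
// Illustrative gating per quality mode. In scenario-primary, review still
// runs but the satisfaction signal is authoritative, so it gates alone here.
type QualityMode = 'code-review' | 'dual-signal' | 'scenario-primary' | 'scenario-only'

const GATES: Record<QualityMode, (reviewShipIt: boolean, satisfactionMet: boolean) => boolean> = {
  'code-review': (review, _sat) => review,
  'dual-signal': (review, sat) => review && sat,
  'scenario-primary': (_review, sat) => sat,
  'scenario-only': (_review, sat) => sat,
}

function storyPasses(mode: QualityMode, reviewShipIt: boolean, satisfactionMet: boolean): boolean {
  return GATES[mode](reviewShipIt, satisfactionMet)
}
```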
@@ -340,18 +380,37 @@ substrate factory twins status
 substrate factory twins down
 ```
 
+ ## Substrate-on-Substrate (Self-Development)
+
+ Substrate's own development is dispatched through substrate. To dispatch a substrate fix from substrate's own working tree:
+
+ ```bash
+ # Author or update the epic doc:
+ # _bmad-output/planning-artifacts/epic-NN-<topic>.md
+
+ # Ingest into the work graph:
+ substrate ingest-epic _bmad-output/planning-artifacts/epic-64-state-integrating-ac-frontmatter-and-gate.md
+
+ # Dispatch the planned stories:
+ substrate run --events --stories 64-2,64-3 --max-review-cycles 3
+ ```
+
+ For local CLI changes during dev, use `npm run substrate:dev -- <args>` instead of bare `substrate` (the global binary runs the published version, not your local code).
+
+ This is also how empirical smoke validation works for prompt-edit ships: a fixture epic at `_bmad-output/planning-artifacts/epic-999-prompt-smoke-state-integrating.md` is dispatched to verify prompt changes produce the structural property they target before publishing.
+
 ## Using as a Library
 
 Substrate ships as a family of npm packages. Most users just want the CLI (`substrate-ai`); the scoped packages are for downstream projects that want to embed substrate pieces directly.
 
 | Package | Use when you want... |
- |---------|----------------------|
+ |---|---|
 | `substrate-ai` | The full CLI — installed globally |
 | `@substrate-ai/core` | Transport-agnostic primitives — event bus, adapters, cost tracker, telemetry, config schema |
- | `@substrate-ai/sdlc` | SDLC orchestration — phase handlers, graph orchestrator, verification pipeline, learning loop |
- | `@substrate-ai/factory` | Graph engine, scenario runner, convergence loop, digital twin helpers, LLM client |
+ | `@substrate-ai/sdlc` | SDLC orchestration — phase handlers, graph orchestrator, verification pipeline (all 6 gates), learning loop |
+ | `@substrate-ai/factory` | Graph engine, scenario runner, convergence loop, digital twin helpers, LLM client (with streaming for Anthropic / OpenAI / Gemini) |
 
- All four packages release in lockstep on every `v*` tag push — pick a version and mix any combination:
+ All four packages release in lockstep on every `v*` tag push.
414
 
356
415
  ```bash
357
416
  npm install @substrate-ai/core @substrate-ai/factory
@@ -365,14 +424,12 @@ import { createSdlcEventBridge } from '@substrate-ai/sdlc'
 // Compose these primitives in your own orchestrator.
 ```
 
- TypeScript declaration files are bundled in each package. Published tarballs carry an npm provenance attestation you can verify with `npm audit signatures`.
+ TypeScript declarations are bundled. Published tarballs carry an npm provenance attestation you can verify with `npm audit signatures`.
 
 ## Configuration
 
 Substrate reads configuration from `.substrate/config.yaml` in your project root. Run `substrate init` to generate defaults.
 
- ### Key Configuration
-
 ```yaml
 config_format_version: '1'
 
@@ -402,84 +459,121 @@ dispatch_timeouts:
402
459
  ### Configuration Files
403
460
 
404
461
  | File | Purpose |
405
- |------|---------|
462
+ |---|---|
406
463
  | `.substrate/config.yaml` | Provider routing, concurrency, budgets, quality mode |
407
464
  | `.substrate/project-profile.yaml` | Auto-detected build system, language, test framework |
408
465
  | `.substrate/routing-policy.yaml` | Task-to-provider routing rules |
409
466
  | `CLAUDE.md` | Agent scaffold with substrate instructions |
410
467
  | `.claude/commands/` | Slash commands for Claude Code |
411
468
 
412
- ### Versioned State Backend (Optional)
469
+ ### State Backend
470
+
471
+ Substrate persists pipeline state (work graph, decisions, telemetry, runs, repo-map) in either:
413
472
 
414
- Substrate supports [Dolt](https://www.dolthub.com/) for versioned pipeline state:
473
+ - **SQLite** (default) zero setup, single-file durable state
474
+ - **Dolt** (recommended) — versioned state, branchable, enables `substrate diff` and `substrate history`
415
475
 
416
476
  ```bash
417
- substrate init --dolt
477
+ # With Dolt (auto-detected if `dolt` is on PATH)
478
+ substrate init
418
479
  ```
419
480
 
420
- This enables:
421
- - `substrate diff <story>` — row-level state changes per story
422
- - `substrate history` — commit log of pipeline state mutations
423
- - OTEL observability persistence
424
- - Context engineering repo-map storage
425
-
426
- Without Dolt, everything works using plain SQLite.
481
+ Without Dolt, all functionality works except for: `substrate diff`, `substrate history`, persistent OTEL observability tables, and context engineering repo-map storage.
427
482
 
428
483
  ## CLI Command Reference
429
484
 
430
- These commands are invoked by AI agents during pipeline operation. You typically don't run them directly — you tell your agent what to do and it selects the right command.
485
+ These commands are typically invoked by your AI assistant during pipeline operation. You usually don't run them directly.
431
486
 
432
487
  ### Pipeline
433
488
 
434
489
  | Command | Description |
435
- |---------|-------------|
436
- | `substrate run` | Run the full pipeline (analysis implement) |
490
+ |---|---|
491
+ | `substrate run` | Run the full pipeline (auto-detects starting phase) |
437
492
  | `substrate run --events` | Emit NDJSON event stream on stdout |
438
493
  | `substrate run --stories <keys>` | Run specific stories (e.g., `7-1,7-2`) |
494
+ | `substrate run --epic <n>` | Scope discovery to a single epic number |
439
495
  | `substrate run --from <phase>` | Start from a specific phase |
496
+ | `substrate run --stop-after <phase>` | Stop pipeline after this phase |
440
497
  | `substrate run --engine graph` | Use the graph execution engine |
441
- | `substrate run --help-agent` | Print agent instruction prompt fragment and exit |
442
- | `substrate resume` | Resume an interrupted pipeline run |
498
+ | `substrate run --halt-on <severity>` | Halt on escalation severity (`all`/`critical`/`none`) |
499
+ | `substrate run --max-review-cycles <n>` | Max review cycles per story (default 2; use 3 for migrations or interface extraction) |
500
+ | `substrate run --skip-verification` | Skip post-dispatch verification (use sparingly) |
501
+ | `substrate run --help-agent` | Print agent instruction prompt fragment |
502
+ | `substrate resume` | Resume an interrupted run |
503
+ | `substrate cancel` | Cancel a running pipeline |
443
504
  | `substrate status` | Show pipeline run status |
444
505
  | `substrate amend` | Run an amendment pipeline against a completed run |
445
506
  | `substrate brainstorm` | Interactive multi-persona ideation session |
446
507
 
508
+ ### Work Graph
509
+
510
+ | Command | Description |
511
+ |---|---|
512
+ | `substrate ingest-epic <path>` | Parse an epic doc and upsert story metadata into the work graph |
513
+ | `substrate epic-status <epic>` | Generate a status view of an epic from the Dolt work graph |
514
+ | `substrate retry-escalated` | Retry escalated stories that escalation diagnosis flagged as retry-targeted |
515
+
447
516
  ### Observability
448
517
 
449
518
  | Command | Description |
450
- |---------|-------------|
451
- | `substrate health` | Check pipeline health, stall detection, and process status |
452
- | `substrate supervisor` | Long-running monitor with kill-and-restart recovery |
453
- | `substrate supervisor --experiment` | Self-improvement: post-run analysis + A/B experiments |
519
+ |---|---|
520
+ | `substrate health` | Pipeline health, stall detection, process status |
521
+ | `substrate supervisor` | Long-running monitor with kill-and-restart |
522
+ | `substrate supervisor --experiment` | Self-improvement: analysis + A/B experiments |
454
523
  | `substrate metrics` | Historical pipeline run metrics |
455
- | `substrate metrics --compare <a,b>` | Side-by-side comparison of two runs |
456
- | `substrate metrics --analysis <run-id>` | Read the analysis report for a specific run |
457
- | `substrate monitor status` | View agent performance metrics |
458
- | `substrate cost` | View cost and token usage summary |
524
+ | `substrate metrics --compare <a,b>` | Side-by-side run comparison |
525
+ | `substrate metrics --analysis <run-id>` | Read analysis report for a specific run |
526
+ | `substrate metrics --probe-author-summary` | Probe-author KPI aggregate |
527
+ | `substrate diff [storyKey]` | Stat-based diff of state changes (Dolt only) |
528
+ | `substrate history` | Dolt commit log for state mutations |
529
+ | `substrate cost` | Cost / token usage summary |
530
+ | `substrate monitor` | Agent performance metrics |
531
+ | `substrate probes` | Inspect runtime-probe sections across story artifacts |
459
532
 
460
- ### Export and Sharing
533
+ ### Operator Workflow
461
534
 
462
535
  | Command | Description |
463
- |---------|-------------|
464
- | `substrate export` | Export planning artifacts as markdown |
465
- | `substrate export --run-id <id>` | Export artifacts from a specific pipeline run |
466
- | `substrate export --output-format json` | Emit JSON result for agent consumption |
536
+ |---|---|
537
+ | `substrate annotate` | Tag a verification finding as confirmed-defect, false-positive, or probe-bug |
538
+ | `substrate probe-author dispatch` | Manually invoke probe-author phase against a single story file |
539
+ | `substrate contracts` | Show contract declarations and verification status |
540
+
541
+ ### Setup
542
+
543
+ | Command | Description |
544
+ |---|---|
545
+ | `substrate init` | Initialize config, CLAUDE.md scaffold, slash commands, state backend |
546
+ | `substrate adapters list` | List known AI agent adapters with availability |
547
+ | `substrate adapters check` | Run health checks across all adapters |
548
+ | `substrate config` | Show, set, export, or import configuration |
549
+ | `substrate routing` | Show / tune routing configuration |
550
+ | `substrate repo-map` | Show / update / query the repo-map symbol index |
551
+ | `substrate upgrade` | Check for updates and upgrade |
552
+ | `substrate migrate` | Migrate historical SQLite data into Dolt |
467
553
 
468
554
  ### Worktree Management
469
555
 
470
556
  | Command | Description |
471
- |---------|-------------|
557
+ |---|---|
472
558
  | `substrate merge` | Detect conflicts and merge worktree branches into target |
473
- | `substrate worktrees` | List active git worktrees and their tasks |
559
+ | `substrate worktrees` | List active worktrees and associated tasks |
474
560
 
475
- ### Setup
561
+ ### Export
476
562
 
477
563
  | Command | Description |
478
- |---------|-------------|
479
- | `substrate init` | Initialize config, CLAUDE.md scaffold, and slash commands |
480
- | `substrate adapters` | List and check available AI agent adapters |
481
- | `substrate config` | Show, set, export, or import configuration |
482
- | `substrate upgrade` | Check for updates and upgrade to the latest version |
564
+ |---|---|
565
+ | `substrate export` | Export decision store contents as markdown |
566
+ | `substrate export --run-id <id>` | Export artifacts from a specific run |
567
+
568
+ ### Software Factory
569
+
570
+ | Command | Description |
571
+ |---|---|
572
+ | `substrate factory scenarios list` | List defined scenarios |
573
+ | `substrate factory scenarios run` | Run scenarios in convergence loop |
574
+ | `substrate factory twins up` | Bring up Docker Compose digital twins |
575
+ | `substrate factory twins status` | Twin service status |
576
+ | `substrate factory twins down` | Tear down twins |
483
577
 
484
578
  ## Development
485
579
 
@@ -492,7 +586,16 @@ npm run test:fast # ~50s unit suite for iteration
492
586
  npm test # full suite with coverage — run before merging
493
587
  ```
494
588
 
495
- The repo is an npm workspaces monorepo — see [Using as a Library](#using-as-a-library) for the four packages it publishes. Release mechanics live in `scripts/sync-workspace-versions.mjs` and `.github/workflows/publish.yml`: every `v*` tag push syncs the workspace package versions to the root, dry-runs all four tarballs, and publishes via npm OIDC trusted publishing.
589
+ The repo is an npm workspaces monorepo — see [Using as a Library](#using-as-a-library) for the four packages it publishes. Release mechanics live in `scripts/sync-workspace-versions.mjs` and `.github/workflows/publish.yml`: every `v*` tag push syncs workspace package versions to the root, dry-runs all four tarballs, and publishes via npm OIDC trusted publishing.
590
+
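The tag-triggered publish flow can be sketched as a workflow fragment. This is a hypothetical outline, not the contents of the repo's actual `.github/workflows/publish.yml` — step commands and job names here are assumptions; only the `v*` trigger, version sync, tarball dry run, and OIDC trusted publishing come from the description above:

```yaml
# Hypothetical sketch of a v*-tag publish workflow (not the repo's real file)
on:
  push:
    tags: ["v*"]
permissions:
  id-token: write   # required for npm OIDC trusted publishing
  contents: read
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: node scripts/sync-workspace-versions.mjs  # sync workspace versions to root
      - run: npm publish --workspaces --dry-run        # dry-run the tarballs first
      - run: npm publish --workspaces                  # OIDC publish, no token secret
```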
591
+ To test local CLI changes without overriding the global binary:
592
+
593
+ ```bash
594
+ npm run build
595
+ npm run substrate:dev -- run --events --stories 999-1
596
+ ```
597
+
598
+ The project's [`.claude/commands/ship.md`](.claude/commands/ship.md) defines a `/ship` workflow that runs build, circular-dependency check, typecheck, and tests (plus a conditional empirical prompt-edit smoke when `packs/bmad/prompts/*.md` files change) before committing and pushing.
496
599
 
497
600
  ## License
498
601
 
package/dist/cli/index.js CHANGED
@@ -4,7 +4,7 @@ import { createLogger } from "../logger-KeHncl-f.js";
4
4
  import { createEventBus } from "../helpers-CElYrONe.js";
5
5
  import { AdapterRegistry, BudgetConfigSchema, CURRENT_CONFIG_FORMAT_VERSION, CURRENT_TASK_GRAPH_VERSION, ConfigError, CostTrackerConfigSchema, DEFAULT_CONFIG, DoltClient, DoltNotInstalled, GlobalSettingsSchema, InMemoryDatabaseAdapter, IngestionServer, MonitorDatabaseImpl, OPERATIONAL_FINDING, PartialGlobalSettingsSchema, PartialProviderConfigSchema, ProvidersSchema, RoutingRecommender, STORY_METRICS, TelemetryConfigSchema, addTokenUsage, aggregateTokenUsageForRun, checkDoltInstalled, compareRunMetrics, createAmendmentRun, createConfigSystem, createDecision, createDoltClient, createPipelineRun, getActiveDecisions, getAllCostEntriesFiltered, getBaselineRunMetrics, getDecisionsByCategory, getDecisionsByPhaseForRun, getLatestCompletedRun, getLatestRun, getPipelineRunById, getPlanningCostTotal, getRetryableEscalations, getRunMetrics, getRunningPipelineRuns, getSessionCostSummary, getSessionCostSummaryFiltered, getStoryMetricsForRun, getTokenUsageSummary, incrementRunRestarts, initSchema, initializeDolt, listRunMetrics, loadParentRunDecisions, supersedeDecision, tagRunAsBaseline, updatePipelineRun } from "../dist-VcMmfo2w.js";
6
6
  import "../adapter-registry-DXLMTmfD.js";
7
- import { AdapterTelemetryPersistence, AppError, DoltRepoMapMetaRepository, DoltSymbolRepository, ERR_REPO_MAP_STORAGE_WRITE, EpicIngester, GitClient, GrammarLoader, RepoMapInjector, RepoMapModule, RepoMapQueryEngine, RepoMapStorage, SymbolParser, createContextCompiler, createDispatcher, createEventEmitter, createImplementationOrchestrator, createPackLoader, createPhaseOrchestrator, createStopAfterGate, createTelemetryAdvisor, formatPhaseCompletionSummary, getFactoryRunSummaries, getScenarioResultsForRun, getTwinRunsForRun, listGraphRuns, registerExportCommand, registerFactoryCommand, registerRunCommand, registerScenariosCommand, resolveStoryKeys, runAnalysisPhase, runPlanningPhase, runProbeAuthor, runSolutioningPhase, validateStopAfterFromConflict } from "../run-BYMVrlGZ.js";
7
+ import { AdapterTelemetryPersistence, AppError, DoltRepoMapMetaRepository, DoltSymbolRepository, ERR_REPO_MAP_STORAGE_WRITE, EpicIngester, GitClient, GrammarLoader, RepoMapInjector, RepoMapModule, RepoMapQueryEngine, RepoMapStorage, SymbolParser, createContextCompiler, createDispatcher, createEventEmitter, createImplementationOrchestrator, createPackLoader, createPhaseOrchestrator, createStopAfterGate, createTelemetryAdvisor, formatPhaseCompletionSummary, getFactoryRunSummaries, getScenarioResultsForRun, getTwinRunsForRun, listGraphRuns, registerExportCommand, registerFactoryCommand, registerRunCommand, registerScenariosCommand, resolveStoryKeys, runAnalysisPhase, runPlanningPhase, runProbeAuthor, runSolutioningPhase, validateStopAfterFromConflict } from "../run-Cdu2K3tg.js";
8
8
  import "../errors-CogpxBUg.js";
9
9
  import "../routing-CcBOCuC9.js";
10
10
  import "../decisions-C0pz9Clx.js";
@@ -5204,7 +5204,7 @@ async function runSupervisorAction(options, deps = {}) {
5204
5204
  await initSchema(expAdapter);
5205
5205
  const { runRunAction: runPipeline } = await import(
5206
5206
  /* @vite-ignore */
5207
- "../run-Dwru4oKj.js"
5207
+ "../run-syUfEHEq.js"
5208
5208
  );
5209
5209
  const runStoryFn = async (opts) => {
5210
5210
  const exitCode = await runPipeline({
@@ -6082,6 +6082,61 @@ const WORD_TO_NUMBER = {
6082
6082
  nine: 9,
6083
6083
  ten: 10
6084
6084
  };
6085
+ /**
6086
+ * obs_2026-05-03_021: nouns that genuinely end in `-s` in their singular form
6087
+ * — must NOT be lemma-stripped (would yield `proces`, `statu`, `busines`).
6088
+ * The list is the conservative set of common engineering vocabulary; broader
6089
+ * coverage can be added as new false-strips surface.
6090
+ */
6091
+ const LEMMA_STOPLIST = new Set([
6092
+ "process",
6093
+ "status",
6094
+ "class",
6095
+ "access",
6096
+ "success",
6097
+ "address",
6098
+ "business",
6099
+ "analysis",
6100
+ "basis",
6101
+ "crisis",
6102
+ "thesis",
6103
+ "axis",
6104
+ "series",
6105
+ "species"
6106
+ ]);
6107
+ /**
6108
+ * obs_2026-05-03_021: collapse plural ↔ singular noun forms to a shared
6109
+ * lemma so that the source AC's plural ("functions") and the rendered story's
6110
+ * singular code-form (`'function'`) compare on the same key.
6111
+ *
6112
+ * Strata Story 1-11b's failure shape was a canonical TS/JS rendering: source
6113
+ * AC says "X and Y are both **functions**" (prose plural), rendered story
6114
+ * expresses the same constraint as `typeof X === 'function' && typeof Y === 'function'`
6115
+ * (singular literal in backticks). The pre-fix heuristic counted plural
6116
+ * forms only, saw source=2, rendered=0, escalated as drift after 2 retries.
6117
+ *
6118
+ * Rules (priority order):
6119
+ * - Stoplist hit → return as-is (preserves `process`, `status`, etc.)
6120
+ * - `-ies` suffix (≥5 chars) → strip + 'y' (categories → category)
6121
+ * - `-ches` suffix (≥5 chars) → strip 's', or strip 'es' after `t` (caches → cache, watches → watch); other `-es` (≥5 chars) where stem ends in s/x/z/sh → strip 'es' (classes → class, boxes → box)
6122
+ * - `-s` suffix (≥4 chars) → strip (functions → function, tools → tool)
6123
+ * - otherwise return as-is
6124
+ */
6125
+ function lemmatizeNoun(noun) {
6126
+ const lower = noun.toLowerCase();
6127
+ if (LEMMA_STOPLIST.has(lower)) return lower;
6128
+ if (lower.length >= 5 && lower.endsWith("ies")) return lower.slice(0, -3) + "y";
6129
+ if (lower.length >= 5 && lower.endsWith("ches")) {
6130
+ if (lower.endsWith("tches")) return lower.slice(0, -2);
6131
+ return lower.slice(0, -1);
6132
+ }
6133
+ if (lower.length >= 5 && lower.endsWith("es")) {
6134
+ const stem = lower.slice(0, -2);
6135
+ if (/[sxz]$/.test(stem) || /sh$/.test(stem)) return stem;
6136
+ }
6137
+ if (lower.length >= 4 && lower.endsWith("s")) return lower.slice(0, -1);
6138
+ return lower;
6139
+ }
6085
6140
  function extractBehavioralAssertions(content) {
6086
6141
  if (content.length === 0) return {
6087
6142
  whenClauseCount: 0,
@@ -6113,13 +6168,33 @@ function extractBehavioralAssertions(content) {
6113
6168
  }
6114
6169
  if (noun.length < 3) continue;
6115
6170
  const phrase = `${determiner} ${numStr || (determiner === "both" ? "" : "")} ${noun}`.trim().replace(/\s+/g, " ");
6116
- const dedupKey = `${count}|${noun}`;
6171
+ const lemma = lemmatizeNoun(noun);
6172
+ const dedupKey = `${count}|${lemma}`;
6117
6173
  if (seen.has(dedupKey)) continue;
6118
6174
  seen.add(dedupKey);
6119
6175
  numericQuantifiers.push({
6120
6176
  phrase,
6121
6177
  count,
6122
- noun
6178
+ noun: lemma
6179
+ });
6180
+ }
6181
+ const backtickLiteralPattern = /`[^`]*?'([a-z][a-z_-]+)'[^`]*?`/gi;
6182
+ const backtickCounts = new Map();
6183
+ let blMatch;
6184
+ while ((blMatch = backtickLiteralPattern.exec(content)) !== null) {
6185
+ const rawNoun = blMatch[1]?.toLowerCase() ?? "";
6186
+ if (rawNoun.length < 3) continue;
6187
+ const lemma = lemmatizeNoun(rawNoun);
6188
+ backtickCounts.set(lemma, (backtickCounts.get(lemma) ?? 0) + 1);
6189
+ }
6190
+ for (const [lemma, count] of backtickCounts) {
6191
+ const dedupKey = `${count}|${lemma}`;
6192
+ if (seen.has(dedupKey)) continue;
6193
+ seen.add(dedupKey);
6194
+ numericQuantifiers.push({
6195
+ phrase: `<backtick-literal-occurrences>`,
6196
+ count,
6197
+ noun: lemma
6123
6198
  });
6124
6199
  }
6125
6200
  return {
@@ -6128,6 +6203,14 @@ function extractBehavioralAssertions(content) {
6128
6203
  numericQuantifiers
6129
6204
  };
6130
6205
  }
6206
+ /**
6207
+ * obs_2026-05-03_021: hard-fail threshold for numeric-quantifier drift.
6208
+ * When `renderedCount / sourceCount ≤ 0.5`, the drop is large enough to
6209
+ * indicate genuine clause reduction (strata 1-10 was 4→2 = 0.5; the
6210
+ * boundary is inclusive). Above 0.5, the gap is within plausible
6211
+ * lemma/code-rendering variance and demoted to warn.
6212
+ */
6213
+ const NUMERIC_HARD_FAIL_RATIO = .5;
6131
6214
  function computeClauseFidelity(storyFileContent, sourceContent) {
6132
6215
  const sourceSignals = extractBehavioralAssertions(sourceContent);
6133
6216
  const renderedSignals = extractBehavioralAssertions(storyFileContent);
@@ -6141,13 +6224,19 @@ function computeClauseFidelity(storyFileContent, sourceContent) {
6141
6224
  const numericMismatches = [];
6142
6225
  for (const [noun, sourceCnt] of sourceNounCounts.entries()) {
6143
6226
  const renderedCnt = renderedNounCounts.get(noun) ?? 0;
6144
- if (renderedCnt < sourceCnt) numericMismatches.push({
6145
- noun,
6146
- sourceCount: sourceCnt,
6147
- renderedCount: renderedCnt
6148
- });
6227
+ if (renderedCnt < sourceCnt) {
6228
+ const ratio = sourceCnt === 0 ? 1 : renderedCnt / sourceCnt;
6229
+ const severity = ratio <= NUMERIC_HARD_FAIL_RATIO ? "error" : "warn";
6230
+ numericMismatches.push({
6231
+ noun,
6232
+ sourceCount: sourceCnt,
6233
+ renderedCount: renderedCnt,
6234
+ severity
6235
+ });
6236
+ }
6149
6237
  }
6150
- const numericDriftComponent = numericMismatches.length > 0 ? 1 : 0;
6238
+ const hasNumericHardFail = numericMismatches.some((m) => m.severity === "error");
6239
+ const numericDriftComponent = hasNumericHardFail ? 1 : 0;
6151
6240
  const CLAUSE_RATIO_FLOOR = .7;
6152
6241
  const clauseDriftComponent = clauseRatio >= CLAUSE_RATIO_FLOOR ? 0 : Math.min(1, (CLAUSE_RATIO_FLOOR - clauseRatio) / CLAUSE_RATIO_FLOOR);
6153
6242
  const drift = Math.max(numericDriftComponent, clauseDriftComponent);
@@ -13214,7 +13303,7 @@ function createImplementationOrchestrator(deps) {
13214
13303
  renameSync(storyFilePath, stalePath);
13215
13304
  const driftPct = Math.round(overallDrift * 100);
13216
13305
  const pathMissing = pathFidelity?.missing ?? [];
13217
- const numericMismatches = clauseFidelity.numericMismatches;
13306
+ const numericMismatches = clauseFidelity.numericMismatches.filter((m) => m.severity === "error");
13218
13307
  const reasons = [];
13219
13308
  if (pathMissing.length > 0) reasons.push(`${pathMissing.length} named path(s) missing`);
13220
13309
  if (numericMismatches.length > 0) reasons.push(`${numericMismatches.length} numeric quantifier mismatch(es) (e.g., "${numericMismatches[0].noun}" source=${numericMismatches[0].sourceCount} rendered=${numericMismatches[0].renderedCount})`);
@@ -13270,7 +13359,7 @@ function createImplementationOrchestrator(deps) {
13270
13359
  }
13271
13360
  } else {
13272
13361
  const pathMissing = pathFidelity?.missing ?? [];
13273
- const numericMismatches = clauseFidelity.numericMismatches;
13362
+ const numericMismatches = clauseFidelity.numericMismatches.filter((m) => m.severity === "error");
13274
13363
  const reasons = [];
13275
13364
  if (pathMissing.length > 0) reasons.push(`paths missing: ${pathMissing.join(", ")}`);
13276
13365
  if (numericMismatches.length > 0) reasons.push(`numeric mismatches: ${numericMismatches.map((m) => `${m.noun} (source=${m.sourceCount}, rendered=${m.renderedCount})`).join("; ")}`);
@@ -45277,4 +45366,4 @@ function registerRunCommand(program, _version = "0.0.0", projectRoot = process.c
45277
45366
 
45278
45367
  //#endregion
45279
45368
  export { AdapterTelemetryPersistence, AppError, DoltRepoMapMetaRepository, DoltSymbolRepository, ERR_REPO_MAP_STORAGE_WRITE, EpicIngester, GitClient, GrammarLoader, RepoMapInjector, RepoMapModule, RepoMapQueryEngine, RepoMapStorage, SymbolParser, createContextCompiler, createDispatcher, createEventEmitter, createImplementationOrchestrator, createPackLoader, createPhaseOrchestrator, createStopAfterGate, createTelemetryAdvisor, formatPhaseCompletionSummary, getFactoryRunSummaries, getScenarioResultsForRun, getTwinRunsForRun, listGraphRuns, normalizeGraphSummaryToStatus, registerExportCommand, registerFactoryCommand, registerRunCommand, registerScenariosCommand, resolveMaxReviewCycles, resolveStoryKeys, runAnalysisPhase, runPlanningPhase, runProbeAuthor, runRunAction, runSolutioningPhase, validateStopAfterFromConflict, wireNdjsonEmitter };
45280
- //# sourceMappingURL=run-BYMVrlGZ.js.map
45369
+ //# sourceMappingURL=run-Cdu2K3tg.js.map
@@ -2,7 +2,7 @@ import "./health-BV-rzjf7.js";
2
2
  import "./logger-KeHncl-f.js";
3
3
  import "./helpers-CElYrONe.js";
4
4
  import "./dist-VcMmfo2w.js";
5
- import { normalizeGraphSummaryToStatus, registerRunCommand, resolveMaxReviewCycles, runRunAction, wireNdjsonEmitter } from "./run-BYMVrlGZ.js";
5
+ import { normalizeGraphSummaryToStatus, registerRunCommand, resolveMaxReviewCycles, runRunAction, wireNdjsonEmitter } from "./run-Cdu2K3tg.js";
6
6
  import "./routing-CcBOCuC9.js";
7
7
  import "./decisions-C0pz9Clx.js";
8
8
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "substrate-ai",
3
- "version": "0.20.46",
3
+ "version": "0.20.48",
4
4
  "description": "Substrate — multi-agent orchestration daemon for AI coding agents",
5
5
  "type": "module",
6
6
  "license": "MIT",
@@ -260,6 +260,45 @@ Note this example, taken to production, would have caught the strata 1-12 bug at
260
260
 
261
261
  Pre-Sprint-22 (warn-severity advisory) the gate produced false negatives at SHIP_IT time. Post-flip, the gate is the load-bearing line of defense for the trigger-invocation property.
262
262
 
263
+ ### Production-shaped fixtures
264
+
265
+ When the AC describes integration with a **collection** of real-state resources — a fleet of repos, a set of files, a list of services, multiple registry rows, a directory of N projects — the probe fixture MUST contain **≥2 distinct, non-overlapping resources**. A probe that builds a one-resource fixture silently passes when the production-state shape is ≥2, masking defects whose failure mode only surfaces under multiplicity (wrong-cwd-with-N-children, substring-collision attribution, single-row optimistic queries that mis-route under a second row).
266
+
267
+ Strata Story 2-4 ("morning briefing generator", v0.20.41) shipped two architectural defects (`fetchGitLog` ran with `cwd=fleetRoot` not per-project; commit attribution used substring match) that any single-repo probe fixture would have hidden. The fleet-root cwd defect produces *some* output against a fleet of one repo (one commit found, attributed to the one project — looks correct); it only fails when the fleet has ≥2 repos with distinct, non-overlapping commit messages and the probe asserts each project gets attributed correctly. See observation `obs_2026-05-02_018`.
268
+
269
+ **Rule**: if the AC names a plural state shape (`fleet`, `set of`, `list of`, `multiple`, `each <thing>`, `N projects`, `the registry`, `all <things>`), the probe fixture must populate at least two distinct, non-overlapping instances of that resource and the assertions must distinguish them.
270
+
271
+ | AC names | Probe fixture must contain |
272
+ |---|---|
273
+ | fleet of repos / each project | ≥2 git repos in the fleet root, each with a distinct commit message |
274
+ | set of files / list of files | ≥2 files with distinct content; assertions distinguish each |
275
+ | multiple table rows / the registry | ≥2 rows with non-overlapping keys; assertions verify per-row behavior |
276
+ | services in a manifest / N services | ≥2 service definitions with distinct names; assert each |
277
+
278
+ **Example: multi-repo fleet probe (production-shaped fixture for the strata 2-4 family)**
279
+
280
+ ```yaml
281
+ - name: briefing-attributes-commits-per-project
282
+ sandbox: twin
283
+ command: |
284
+ set -e
285
+ FLEET=$(mktemp -d)
286
+ for proj in alpha beta; do
287
+ mkdir -p "$FLEET/$proj"
288
+ cd "$FLEET/$proj" && git init -q
289
+ git config user.email t@example.com && git config user.name test
290
+ echo "$proj content" > a.md && git add . && git commit -qm "$proj-only commit"
291
+ done
292
+ cd <REPO_ROOT>
293
+ FLEET_ROOT="$FLEET" node dist/cli.mjs briefing
294
+ expect_stdout_regex:
295
+ - 'alpha-only commit'
296
+ - 'beta-only commit'
297
+ description: each project's commit attributed correctly — fixture has ≥2 distinct repos
298
+ ```
299
+
300
+ A one-repo variant of this probe would pass against the (broken) v0.20.41 implementation; the two-repo variant catches the wrong-cwd defect because the parent-cwd `git log --all` returns BOTH commits but substring-match attribution mis-routes them.
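The multiplicity-only failure shape can be reproduced outside git. A hypothetical reconstruction (function and project names invented for illustration; this is not the shipped `fetchGitLog` code) of why substring-match attribution passes a one-resource fixture and fails a two-resource one:

```javascript
// Hypothetical reconstruction of the defect class, not the shipped code:
// commits are fetched from ONE parent cwd, then attributed by substring match.
function attributeBySubstring(commits, projects) {
  const byProject = Object.fromEntries(projects.map((p) => [p, []]));
  for (const msg of commits) {
    for (const p of projects) {
      if (msg.includes(p)) byProject[p].push(msg); // substring match: fragile
    }
  }
  return byProject;
}

// One project: looks correct.
const one = attributeBySubstring(["alpha-only commit"], ["alpha"]);
// Two projects where one name contains the other: mis-attribution surfaces.
const two = attributeBySubstring(
  ["alpha-only commit", "alpha-ui-only commit"],
  ["alpha", "alpha-ui"]
);
console.log(one.alpha.length); // prints 1 (single-resource fixture hides the bug)
console.log(two.alpha.length); // prints 2 ("alpha" also matches the alpha-ui commit)
```

This is why the rule above demands that assertions distinguish each resource: the one-project call looks correct, and only a second, name-colliding project exposes the defect.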
301
+
263
302
  ### Examples by artifact class
264
303
 
265
304
  **Systemd unit:**
@@ -130,6 +130,38 @@ Strata Run 13 (Story 1-12, post-merge git hook) shipped SHIP_IT after the dev's
130
130
 
131
131
  Note this example, taken to production, would have caught the strata 1-12 bug at runtime-probe phase rather than only at e2e smoke pass. That's the standard this guidance sets.
132
132
 
133
+ ## Production-shaped fixtures
134
+
135
+ When the AC describes integration with a **collection** of real-state resources — a fleet of repos, a set of files, a list of services, multiple registry rows, a directory of N projects — the probe fixture MUST contain **≥2 distinct, non-overlapping resources**. A probe that builds a one-resource fixture silently passes when the production-state shape is ≥2, masking defects whose failure mode only surfaces under multiplicity (wrong-cwd-with-N-children, substring-collision attribution, single-row optimistic queries that mis-route under a second row).
136
+
137
+ Strata Story 2-4 ("morning briefing generator", v0.20.41) shipped two architectural defects (`fetchGitLog` ran with `cwd=fleetRoot` not per-project; commit attribution used substring match) that any single-repo probe fixture would have hidden. The fleet-root cwd defect produces *some* output against a fleet of one repo (one commit found, attributed to the one project — looks correct); it only fails when the fleet has ≥2 repos with distinct, non-overlapping commit messages and the probe asserts each project gets attributed correctly. See observation `obs_2026-05-02_018`.
138
+
139
+ **Rule**: if the AC names a plural state shape (`fleet`, `set of`, `list of`, `multiple`, `each <thing>`, `N projects`, `the registry`, `all <things>`), the probe fixture must populate at least two distinct, non-overlapping instances of that resource and the assertions must distinguish them. The plurality must show in the `command:` setup AND in the assertions — a two-repo fixture with a single regex check is half the discipline.
140
+
141
+ **Example: multi-repo fleet probe (production-shaped fixture for the strata 2-4 family)**
142
+
143
+ ```yaml
144
+ - name: briefing-attributes-commits-per-project
145
+ sandbox: twin
146
+ command: |
147
+ set -e
148
+ FLEET=$(mktemp -d)
149
+ for proj in alpha beta; do
150
+ mkdir -p "$FLEET/$proj"
151
+ cd "$FLEET/$proj" && git init -q
152
+ git config user.email t@example.com && git config user.name test
153
+ echo "$proj content" > a.md && git add . && git commit -qm "$proj-only commit"
154
+ done
155
+ cd <REPO_ROOT>
156
+ FLEET_ROOT="$FLEET" node dist/cli.mjs briefing
157
+ expect_stdout_regex:
158
+ - 'alpha-only commit'
159
+ - 'beta-only commit'
160
+ description: each project's commit attributed correctly — fixture has ≥2 distinct repos
161
+ ```
162
+
163
+ A one-repo variant of this probe would pass against the (broken) v0.20.41 implementation; the two-repo variant catches the wrong-cwd defect because the parent-cwd `git log --all` returns BOTH commits but substring-match attribution mis-routes them.
164
+
133
165
  ## Mission
134
166
 
135
167
  Author runtime probes for the story described above. Use the AC sections provided: