npm - substrate-ai - Versions diffs - 0.20.46 → 0.20.47 - Mend

substrate-ai 0.20.46 → 0.20.47

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/README.md +246 -143
package/package.json +1 -1
package/packs/bmad/prompts/create-story.md +39 -0
package/packs/bmad/prompts/probe-author.md +32 -0

package/README.md CHANGED

@@ -4,9 +4,9 @@
 # Substrate
-Substrate is an autonomous software development pipeline, operated by your AI coding assistant. Install it, initialize your project, and tell Claude what to build — Substrate handles the rest.
+Substrate is an autonomous software development pipeline, operated by your AI coding assistant. Install it, initialize your project, and tell Claude (or Codex, or Gemini) what to build — Substrate handles the rest.
-Most multi-agent coding tools help you run AI sessions in parallel but leave planning, quality control, and learning up to you. Substrate is different: it packages structured planning methodology, multi-agent parallel execution, automated code review cycles, and self-improvement into a single pipeline. Describe your project concept, and Substrate takes it from research through implementation and review — coordinating multiple AI coding agents across isolated worktree branches while a supervisor watches for stalls, auto-recovers, and experiments with improvements to close the loop.
+Most multi-agent coding tools help you run AI sessions in parallel but leave planning, quality control, and learning up to you. Substrate is different: it packages **structured planning methodology**, **multi-agent parallel execution**, **a six-stage verification pipeline**, **automated review-and-fix cycles**, and **a self-improvement loop** into a single pipeline. Describe your project concept, and Substrate takes it from research through implementation and review — coordinating multiple AI coding agents across isolated worktree branches while a supervisor watches for stalls, auto-recovers, and experiments with improvements to close the loop.
 ## How It Works
@@ -28,7 +28,7 @@ Substrate operates through a three-layer interaction model:
 **You talk to your AI assistant. Your assistant talks to Substrate. Substrate orchestrates everything.**
-Here's what that looks like in practice:
+In practice:
 ```
 You: "Implement stories 7-1 through 7-5"
@@ -36,15 +36,17 @@ You: "Implement stories 7-1 through 7-5"
 Claude Code: runs `substrate run --events --stories 7-1,7-2,7-3,7-4,7-5`
 Substrate:   dispatches 5 stories across 3 agents in parallel worktrees
-             → story 7-1: dev complete, code review: SHIP_IT ✓
-             → story 7-2: dev complete, code review: NEEDS_MINOR_FIXES → auto-fix → SHIP_IT ✓
-             → story 7-3: escalated (interface conflict) → Claude asks you what to do
-             → story 7-4: dev complete, code review: SHIP_IT ✓
-             → story 7-5: dev complete, code review: SHIP_IT ✓
+             → story 7-1: dev complete, 6 verification checks ✓ → SHIP_IT
+             → story 7-2: code review NEEDS_MINOR_FIXES → auto-fix → SHIP_IT
+             → story 7-3: source-ac-fidelity flagged a missing path → escalated
+             → story 7-4: runtime probe failed → escalated for diagnosis
+             → story 7-5: SHIP_IT first cycle ✓
-Claude Code: "4 succeeded, 1 escalated — here's the interface conflict in 7-3..."
+Claude Code: "3 succeeded, 2 escalated — here's the runtime-probe failure on 7-4..."
 ```
+Substrate is also **self-developing**: substrate's own development is dispatched through substrate. The fixes shipped in v0.20.42 → v0.20.46 (probe-awareness, frontmatter declarations, dependency-context detection, AnthropicAdapter streaming) were authored by substrate dispatching against its own codebase. This is intentional dogfooding — see the `substrate-on-substrate` examples below.
 ## Prerequisites
 - **Node.js** 22.0.0 or later
@@ -53,6 +55,7 @@ Claude Code: "4 succeeded, 1 escalated — here's the interface conflict in 7-3.
   - [Claude Code](https://docs.anthropic.com/en/docs/claude-code) (`claude`)
   - [Codex CLI](https://github.com/openai/codex) (`codex`)
   - Gemini CLI (`gemini`)
+- **Optional but recommended**: [Dolt](https://www.dolthub.com/) for versioned pipeline state
 ## Quick Start
@@ -65,97 +68,109 @@ substrate init
 ```
 This does three things:
-1. **Generates `.substrate/config.yaml`** — provider routing, concurrency, budgets
-2. **Injects a `## Substrate Pipeline` section into CLAUDE.md** — behavioral directives that teach your AI assistant how to operate the pipeline
-3. **Creates `.claude/commands/` slash commands** — `/substrate-run`, `/substrate-supervisor`, `/substrate-metrics`
-### Run From Your AI Assistant
-Start a Claude Code session in your project. Claude automatically reads the substrate instructions from CLAUDE.md and knows how to operate the pipeline. From there:
-- **"Run the substrate pipeline"** — Claude runs the full lifecycle from analysis through implementation
-- **"Run substrate for stories 7-1, 7-2, 7-3"** — Claude implements specific stories
-- **"/substrate-run"** — invoke the slash command directly for a guided pipeline run
-Claude parses structured events, handles escalations, offers to fix review issues, and summarizes results. You stay in control — Claude always asks before re-running failed stories or applying fixes.
+1. **Generates `.substrate/config.yaml`** — provider routing, concurrency, budgets, quality mode
+2. **Injects a `## Substrate Pipeline` section into CLAUDE.md** — behavioral directives that teach your AI assistant how to operate the pipeline
+3. **Creates `.claude/commands/` slash commands** — `/substrate-run`, `/substrate-supervisor`, `/substrate-metrics`, `/substrate-factory-loop`
-### Monitor and Self-Improve
+If Dolt is on PATH, `substrate init` automatically sets up versioned state. Without Dolt, substrate falls back to plain SQLite.
-While the pipeline runs (or after it finishes):
+### Run From Your AI Assistant
-> "Run the substrate supervisor"
+Start a session in your AI tool of choice. The assistant reads the substrate instructions from `CLAUDE.md` and knows how to operate the pipeline:
-The supervisor watches the pipeline, kills stalls, and auto-restarts. When the run completes, it analyzes what happened — bottlenecks, token waste, slow stories — then optionally runs A/B experiments on prompts and config in isolated worktrees. Improvements get auto-PRed; regressions get discarded.
+- **"Run the substrate pipeline"** — full lifecycle from analysis through implementation
+- **"Run substrate for stories 7-1, 7-2, 7-3"** — implement specific stories
+- **"/substrate-run"** — invoke the slash command directly for a guided run
-This is the full loop: **run → watch → analyze → experiment → improve.**
+Your assistant parses NDJSON events, handles escalations, offers to fix review issues, and summarizes results. You stay in control — your assistant always asks before re-running failed stories or applying fixes.
 ### Run From the CLI Directly
-You can also run substrate directly from the terminal:
 ```bash
 # Full pipeline with NDJSON event stream
 substrate run --events
-# Specific stories
-substrate run --events --stories 7-1,7-2,7-3
+# Specific stories with stricter review limits
+substrate run --events --stories 7-1,7-2,7-3 --max-review-cycles 3
-# Human-readable progress output (default)
-substrate run
+# Resume an interrupted run
+substrate resume
+# Cancel a running pipeline
+substrate cancel
 ```
 ## The Pipeline
-When you tell Substrate to build something, it runs through up to six phases — auto-detecting which phase to start from based on what artifacts already exist:
+When you tell Substrate to build something, it runs through up to **six phases** — auto-detecting which phase to start from based on what artifacts already exist.
 ### Full Lifecycle (from concept)
-1. **Research** — technology stack research, keyword extraction (optional)
-2. **Analysis** — processes concept into structured product brief with problem statement, target users, core features
-3. **Planning** — breaks product brief into epics and stories
-4. **Solutioning** — technical architecture design with constraints, tech stack, design decisions
-5. **Implementation** — parallel story execution (see below)
-6. **Contract Verification** — post-sprint validation of cross-story interfaces
+| Phase | Purpose |
+|---|---|
+| **Research** *(optional)* | Technology stack research, keyword extraction |
+| **Analysis** | Concept → structured product brief (problem, users, features) |
+| **Planning** | Brief → epics and stories |
+| **Solutioning** | Architecture: tech stack, design decisions, constraints |
+| **Implementation** | Parallel story execution (see below) |
+| **Contract Verification** | Post-sprint cross-story interface validation |
 ### Per-Story Implementation
-Each story flows through a quality-gated loop:
+Each story flows through a sequence of phases with a quality-gated review loop:
 ```
-create-story → dev-story → build-verify → code-review
-                                              ↓
-                              SHIP_IT → done ✓
-                              NEEDS_MINOR_FIXES → auto-fix → code-review
-                              NEEDS_MAJOR_REWORK → rework → code-review
-                              max cycles exceeded → escalated ⚠
+create-story → test-plan → dev-story → build-fix → code-review
+                                                       ↓
+                                       SHIP_IT → verification → done ✓
+                                       NEEDS_MINOR_FIXES → fix → code-review
+                                       NEEDS_MAJOR_REWORK → rework → code-review
+                                       max cycles exceeded → escalated ⚠
 ```
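The review loop in the updated diagram can be read as a small state machine. The sketch below is illustrative only: `runReviewLoop`, `Verdict`, and `Outcome` are hypothetical names, not substrate's internal API, and it assumes that one "cycle" means one code-review pass and that the default limit is 2 (the value the README gives for `--max-review-cycles`). Verification failures after SHIP_IT are not modeled.

```typescript
// Hypothetical model of the per-story review loop; not substrate's real implementation.
type Verdict = 'SHIP_IT' | 'NEEDS_MINOR_FIXES' | 'NEEDS_MAJOR_REWORK';

interface Outcome {
  status: 'done' | 'escalated';
  reviewCycles: number;
  steps: string[];
}

// `review` stands in for dispatching a code-review agent; it returns the verdict for a given cycle.
function runReviewLoop(
  review: (cycle: number) => Verdict,
  maxReviewCycles = 2, // assumed default, per the --max-review-cycles flag description
): Outcome {
  const steps: string[] = [];
  for (let cycle = 1; cycle <= maxReviewCycles; cycle++) {
    steps.push('code-review');
    const verdict = review(cycle);
    if (verdict === 'SHIP_IT') {
      // SHIP_IT hands off to the verification gates, then the story is done.
      steps.push('verification');
      return { status: 'done', reviewCycles: cycle, steps };
    }
    if (cycle < maxReviewCycles) {
      // Minor findings get a fix pass, major findings get a rework pass.
      steps.push(verdict === 'NEEDS_MINOR_FIXES' ? 'fix' : 'rework');
    }
  }
  // No SHIP_IT within the allowed cycles: escalate to the operator.
  return { status: 'escalated', reviewCycles: maxReviewCycles, steps };
}

const example = runReviewLoop((cycle) => (cycle === 1 ? 'NEEDS_MINOR_FIXES' : 'SHIP_IT'));
console.log(example.status, example.steps.join(' → '));
```

Passing the reviewer in as a function keeps the loop testable without dispatching a real agent.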
-Stories run in parallel across your available agents, each in its own git worktree. Build verification catches compilation errors before code review. Zero-diff detection catches phantom completions. Interface change warnings flag potential cross-module impacts.
+Stories run in parallel across your available agents, each in its own git worktree. After dev-story completes, an optional `probe-author` phase dispatches for event-driven and state-integrating ACs (see [Verification Pipeline](#verification-pipeline)) to derive runtime probes from AC text. Build-fix runs the project's build to catch compilation errors before code review.
+### Verification Pipeline
+Six gates run after code review. Each can pass, warn, or fail; failures block SHIP_IT.
+| Gate | What it catches |
+|---|---|
+| **phantom-review** | Code review that returned no real verdict (review output malformed/empty) |
+| **trivial-output** | Output token count below threshold — likely no real work done |
+| **acceptance-criteria-evidence** | Each AC has demonstrable evidence in dev-story signals (files modified, tests added) |
+| **build** | Project build succeeds against the dev's worktree |
+| **runtime-probes** | Each declared `## Runtime Probes` section probe runs successfully against real or sandboxed state. Includes auto-detection for error-shape envelopes (`{"isError": true}`, `{"status": "error"}`) and production-trigger requirements for event-driven ACs. Frontmatter `external_state_dependencies` declarations hard-gate when probes section is missing. |
+| **source-ac-fidelity** | AC text in source epic appears verbatim in story artifact (paths, MUST clauses, hard contracts). Includes 4 context-aware heuristics: negation (paths the AC says NOT to deliver), dependency-context (peer packages the implementation imports), operational-path (system install destinations like `.git/hooks/`), and alternative-option groups. |
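As a rough illustration of the gate semantics described in the table (each gate passes, warns, or fails, and any failure blocks SHIP_IT), the sketch below aggregates gate results and shows one way the error-shape check for probe output could look. All names here (`GateResult`, `summarizeGates`, `looksLikeErrorEnvelope`) are hypothetical; only the gate names and the two envelope shapes come from the README text.

```typescript
// Hypothetical types; substrate's actual verification API may differ.
type GateStatus = 'pass' | 'warn' | 'fail';

interface GateResult {
  gate: string;
  status: GateStatus;
}

// Warnings are reported but do not block; any failure blocks SHIP_IT.
function summarizeGates(results: GateResult[]): { shipIt: boolean; failed: string[]; warned: string[] } {
  const failed = results.filter((r) => r.status === 'fail').map((r) => r.gate);
  const warned = results.filter((r) => r.status === 'warn').map((r) => r.gate);
  return { shipIt: failed.length === 0, failed, warned };
}

// Detects the two error-shape envelopes the README names: {"isError": true} and {"status": "error"}.
// Assumes the probe printed a single JSON document; anything unparseable is not treated as an envelope.
function looksLikeErrorEnvelope(probeStdout: string): boolean {
  try {
    const parsed: unknown = JSON.parse(probeStdout);
    if (typeof parsed !== 'object' || parsed === null) return false;
    const obj = parsed as Record<string, unknown>;
    return obj.isError === true || obj.status === 'error';
  } catch {
    return false;
  }
}

const summary = summarizeGates([
  { gate: 'phantom-review', status: 'pass' },
  { gate: 'trivial-output', status: 'warn' },
  { gate: 'acceptance-criteria-evidence', status: 'pass' },
  { gate: 'build', status: 'pass' },
  { gate: 'runtime-probes', status: 'fail' },
  { gate: 'source-ac-fidelity', status: 'pass' },
]);
console.log(summary.shipIt, summary.failed, summary.warned);
```

The envelope check matters because a probe can exit successfully while its JSON payload still reports an error.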
 ### Already Have Planning Artifacts?
-If your project already has BMAD artifacts (from any tool), Substrate skips straight to implementation:
+Substrate skips to whichever phase is needed:
+| File | Purpose |
+|---|---|
+| `_bmad-output/planning-artifacts/epics.md` *(or per-epic `epic-N-*.md`)* | Parsed into per-epic context shards |
+| `_bmad-output/planning-artifacts/architecture.md` | Tech stack and constraints for agents |
+| `_bmad-output/implementation-artifacts/<key>-*.md` | Existing story files — substrate skips re-creation |
-| File | Required? | Purpose |
-|------|-----------|---------|
-| `_bmad-output/planning-artifacts/epics.md` | Yes | Parsed into per-epic context shards |
-| `_bmad-output/planning-artifacts/architecture.md` | Yes | Tech stack and constraints for agents |
-| `_bmad-output/implementation-artifacts/*.md` | Optional | Existing story files — Substrate skips creation for any it finds |
+Drop these in any project and run `substrate run --events --stories <keys>` to dispatch implementation.
 ## AI Agent Integration
-Substrate is designed to be operated by AI agents, not just humans. Three mechanisms teach agents how to interact with the pipeline at runtime:
+Substrate is designed to be operated by AI agents, not just humans. Three mechanisms teach agents how to interact with the pipeline at runtime.
 ### CLAUDE.md Scaffold
-`substrate init` injects a `## Substrate Pipeline` section into your project's CLAUDE.md with:
+`substrate init` injects a `## Substrate Pipeline` section into your project's `CLAUDE.md` with:
 - Instructions to run `--help-agent` on first use
 - Event-driven interaction patterns (escalation handling, fix offers, confirmation requirements)
 - Supervisor workflow guidance
+- Cross-project observation lifecycle norms (reopen-evidence requirements)
 - Version stamp for detecting stale instructions after upgrades
-The section is wrapped in `<!-- substrate:start/end -->` markers for idempotent updates. Re-running `init` updates the substrate section while preserving all other CLAUDE.md content.
+The section is wrapped in `<!-- substrate:start/end -->` markers for idempotent updates. Re-running `init` updates the substrate section while preserving everything else.
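The marker-based idempotent update can be sketched as below. This is an assumption-laden illustration, not the code `substrate init` runs: it assumes the markers are literally `<!-- substrate:start -->` and `<!-- substrate:end -->` (the README abbreviates them as `start/end`), and `upsertSection` is a hypothetical helper name.

```typescript
// Assumed marker spellings; the README only shows the abbreviated form `<!-- substrate:start/end -->`.
const START_MARKER = '<!-- substrate:start -->';
const END_MARKER = '<!-- substrate:end -->';

// Replaces the marked section if present, otherwise appends it. Content outside the markers is untouched.
function upsertSection(doc: string, section: string): string {
  const block = `${START_MARKER}\n${section}\n${END_MARKER}`;
  const startIdx = doc.indexOf(START_MARKER);
  const endIdx = doc.indexOf(END_MARKER);
  if (startIdx !== -1 && endIdx > startIdx) {
    // Existing section: swap only the text between (and including) the markers.
    return doc.slice(0, startIdx) + block + doc.slice(endIdx + END_MARKER.length);
  }
  // No section yet: append, making sure the block starts on its own line.
  const separator = doc === '' || doc.endsWith('\n') ? '' : '\n';
  return doc + separator + block + '\n';
}

const original = '# My Project\n\nHouse rules for contributors.\n';
const once = upsertSection(original, '## Substrate Pipeline\n(v1 instructions)');
const twice = upsertSection(once, '## Substrate Pipeline\n(v1 instructions)');
console.log(once === twice); // re-running with the same section leaves the file unchanged
```

Because the replacement is bounded by the markers, re-running the update after an upgrade can refresh the section without disturbing hand-written content around it.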
 ### Self-Describing CLI (`--help-agent`)
@@ -165,7 +180,7 @@ substrate run --help-agent
 Outputs a machine-optimized prompt fragment (<2000 tokens) that an AI agent can ingest as a system prompt. Generated from the same TypeScript type definitions as the event emitter, so documentation never drifts from implementation. Includes:
-- All available commands and flags with examples
+- All commands and flags with examples
 - Capabilities manifest — installed version, available engines, configured providers, active features
 - Complete event protocol schema
 - Decision flowchart for handling each event type
@@ -174,34 +189,33 @@ Outputs a machine-optimized prompt fragment (<2000 tokens) that an AI agent can
 `substrate init` generates `.claude/commands/` slash commands:
-- `/substrate-run` — Start or resume a pipeline run with structured events
-- `/substrate-supervisor` — Launch the supervisor monitor with stall detection and auto-restart
-- `/substrate-metrics` — Query run history, compare runs, and read analysis reports
+- `/substrate-run` — start or resume a pipeline run with structured events
+- `/substrate-supervisor` — launch the supervisor with stall detection and auto-restart
+- `/substrate-metrics` — query run history and analysis reports
+- `/substrate-factory-loop` — run the convergence loop (see [Software Factory](#software-factory-advanced))
 ### NDJSON Event Protocol
 With `--events`, Substrate emits newline-delimited JSON events on stdout for programmatic consumption:
-```bash
-substrate run --events
-```
-Event types form a discriminated union on the `type` field:
-| Event | Description |
-|-------|-------------|
-| `pipeline:start` | Pipeline begins — includes `run_id`, `stories[]`, `concurrency` |
-| `pipeline:complete` | Pipeline ends — includes `succeeded[]`, `failed[]`, `escalated[]` |
-| `story:phase` | Story transitions between phases (`create-story`, `dev-story`, `code-review`, `fix`) |
-| `story:done` | Story reaches terminal state with `review_cycles` count |
-| `story:escalation` | Story escalated — includes issue list with severities |
-| `story:metrics` | Per-story wall-clock time, token counts, phase breakdown |
-| `story:warn` | Non-fatal warning (e.g., token ceiling truncation) |
+| Event | When |
+|---|---|
+| `pipeline:start` | Pipeline begins (`run_id`, `stories[]`, `concurrency`) |
+| `pipeline:complete` | Pipeline ends (`succeeded[]`, `failed[]`, `escalated[]`) |
 | `pipeline:heartbeat` | Periodic heartbeat with active/completed/queued dispatch counts |
-| `supervisor:*` | Supervisor lifecycle — `poll`, `kill`, `restart`, `abort`, `summary` |
-| `supervisor:experiment:*` | Experiment loop — `start`, `recommendations`, `complete`, `error` |
-All events carry a `ts` (ISO-8601 timestamp) field. Full TypeScript types are exported:
+| `pipeline:contract-mismatch` | Cross-story interface conflict detected |
+| `story:phase` | Story transitions phase (`create-story`, `test-plan`, `dev-story`, `build-fix`, `code-review`, `fix`) |
+| `story:done` | Story reaches terminal state |
+| `story:metrics` | Per-story wall-clock, tokens, phase breakdown |
+| `story:escalation` | Story escalated with issue list |
+| `story:warn` | Non-fatal warning (token ceiling, low output, etc.) |
+| `verification:check-complete` | Single verification gate finished |
+| `verification:story-complete` | All verification gates done for a story |
+| `probe-author:*` | Probe-author phase events (`dispatched`, `output-parsed`, `appended-to-artifact`, `skipped`, `authored-probe-failed`) |
+| `supervisor:*` | Supervisor lifecycle (`poll`, `kill`, `restart`, `abort`, `summary`) |
+| `supervisor:experiment:*` | Self-improvement loop (`start`, `recommendations`, `complete`, `skip`, `error`) |
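A consumer of the event stream might look like the sketch below. It is a minimal illustration under stated assumptions: the event names and the `ts`, `run_id`, `stories`, `concurrency`, `succeeded`, `failed`, and `escalated` fields come from the table above, but `RawEvent`, `parseNdjson`, and `summarize` are hypothetical local names (not the types exported by `substrate-ai`), and the sample payloads are invented for the example.

```typescript
// Loose local event shape; the package's exported types are stricter than this.
type RawEvent = { type: string; ts: string; [key: string]: unknown };

// Parses newline-delimited JSON, skipping blank lines and anything that is not an event object.
function parseNdjson(stream: string): RawEvent[] {
  const events: RawEvent[] = [];
  for (const line of stream.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '') continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // tolerate a partially written final line
    }
    if (typeof parsed === 'object' && parsed !== null && typeof (parsed as RawEvent).type === 'string') {
      events.push(parsed as RawEvent);
    }
  }
  return events;
}

// Builds a one-line summary from the terminal `pipeline:complete` event, if it has arrived.
function summarize(events: RawEvent[]): string {
  const done = events.find((e) => e.type === 'pipeline:complete');
  if (!done) return 'pipeline still running';
  const count = (v: unknown): number => (Array.isArray(v) ? v.length : 0);
  return `${count(done.succeeded)} succeeded, ${count(done.failed)} failed, ${count(done.escalated)} escalated`;
}

// Invented sample stream for illustration only.
const sample = [
  '{"type":"pipeline:start","ts":"2025-01-01T00:00:00Z","run_id":"r1","stories":["7-1","7-2","7-3","7-4","7-5"],"concurrency":3}',
  '{"type":"story:done","ts":"2025-01-01T00:05:00Z"}',
  '{"type":"pipeline:complete","ts":"2025-01-01T00:20:00Z","succeeded":["7-1","7-2","7-5"],"failed":[],"escalated":["7-3","7-4"]}',
].join('\n');

console.log(summarize(parseNdjson(sample)));
```

Skipping unparseable lines rather than throwing keeps the reader usable while the pipeline is still writing to stdout.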
+All events carry a `ts` (ISO-8601) field. Full TypeScript types are exported:
 ```typescript
 import type { PipelineEvent, StoryEscalationEvent } from 'substrate-ai'
@@ -216,33 +230,34 @@ if (event.type === 'story:escalation') {
 ## Supported Worker Agents
-Substrate dispatches work to CLI-based AI agents running as child processes. It never calls LLMs directly — all implementation, code review, and story generation is delegated to worker agents.
+Substrate dispatches work to CLI-based AI agents running as child processes. It never calls LLMs directly from the dispatch path — implementation, code review, and story generation are all delegated to worker agents.
 | Agent ID | CLI Tool | Billing |
-|----------|----------|---------|
+|---|---|---|
 | `claude-code` | [Claude Code](https://docs.anthropic.com/en/docs/claude-code) | Subscription (Max) or API key |
 | `codex` | [Codex CLI](https://github.com/openai/codex) | Subscription (ChatGPT Plus/Pro) or API key |
 | `gemini` | Gemini CLI | Subscription or API key |
-Substrate auto-discovers available agents at startup and routes work based on adapter health checks and your routing configuration. Unlike API-based orchestrators, Substrate routes work through the CLI tools you already have installed, maximizing your existing AI subscriptions before falling back to pay-per-token billing.
+`substrate adapters list` shows what's installed and healthy. `substrate adapters check` runs full headless-mode verification on each.
+Substrate routes work through CLI tools you already have installed, maximizing your existing AI subscriptions before falling back to pay-per-token billing. Per-task routing is configurable in `.substrate/routing-policy.yaml` and tunable via `substrate routing`.
 ## Observability and Self-Improvement
-### Pipeline Monitoring
+### Live Pipeline Monitoring
 ```bash
 # Human-readable progress (default)
 substrate run
-# Shows compact, updating progress lines:
-#   [dev]    7-2 implementing...
-#   [review] 7-3 SHIP_IT (1 cycle)
-#   [done]   7-5 SHIP_IT (2 cycles)
-# Real-time health check
+# Real-time health
 substrate health --output-format json
 # Poll status
 substrate status --output-format json
+# TUI dashboard
+substrate run --tui
 ```
 - **TTY mode**: ANSI cursor control for in-place line updates
@@ -251,7 +266,7 @@ substrate status --output-format json
 ### Supervisor
-The supervisor is a long-running monitor that watches pipeline health:
+Long-running monitor that watches pipeline health:
 ```bash
 substrate supervisor --output-format json
@@ -259,6 +274,7 @@ substrate supervisor --output-format json
 - Detects stalled agents (configurable threshold)
 - Kills stuck process trees and auto-restarts via `resume`
+- Inherits story scope from health snapshots on restart
 - Emits structured events for each action taken
 ### Self-Improvement Loop
@@ -268,12 +284,13 @@ substrate supervisor --experiment --output-format json
 ```
 After the pipeline completes, the supervisor:
 1. **Analyzes** the run — identifies bottlenecks, token waste, slow stories
 2. **Generates recommendations** — prompt tweaks, config changes, routing adjustments
 3. **Runs A/B experiments** — applies each recommendation in an isolated worktree, re-runs affected stories, compares metrics
-4. **Verdicts**: IMPROVED changes are kept, REGRESSED changes are discarded
+4. **Verdicts**: IMPROVED changes are kept and auto-PRed; REGRESSED changes are discarded
-### Metrics and Cost Tracking
+### Metrics, Cost, and Diff
 ```bash
 # Historical run metrics
@@ -282,16 +299,38 @@ substrate metrics --output-format json
 # Compare two runs side-by-side
 substrate metrics --compare <run-a>,<run-b>
-# Read analysis report
+# Read analysis report from a supervisor run
 substrate metrics --analysis <run-id> --output-format json
 # Cost breakdown
 substrate cost --output-format json
+# Probe-author KPI summary (catch rate, cost, dispatches)
+substrate metrics --probe-author-summary
+```
+With Dolt as the state backend:
+```bash
+# Row-level diff of state changes for a story
+substrate diff <story-key>
+# Commit log of pipeline state mutations
+substrate history
+```
+### Operator Annotations
+Tag verification findings as confirmed defects, false positives, or probe bugs to drive probe-author KPI feedback:
+```bash
+substrate annotate --story 7-3 --finding-category runtime-probe-fail --confirmed-defect --note "..."
+substrate annotate --story 7-4 --finding-category source-ac-drift --false-positive
 ```
 ## Software Factory (Advanced)
-Beyond the linear SDLC pipeline, Substrate includes a graph-based execution engine and autonomous quality system:
+Beyond the linear SDLC pipeline, Substrate includes a graph-based execution engine and autonomous quality system.
 ### Graph Engine
@@ -299,7 +338,8 @@ Beyond the linear SDLC pipeline, Substrate includes a graph-based execution engi
 substrate run --engine graph --events
 ```
-The graph engine reads pipeline topology from DOT files (Graphviz format), enabling:
+Reads pipeline topology from DOT files (Graphviz format), enabling:
 - Conditional edges (retry loops, branching on review verdict)
 - Parallel fan-out/fan-in with configurable join policies
 - LLM-evaluated edge conditions
@@ -308,7 +348,7 @@ The graph engine reads pipeline topology from DOT files (Graphviz format), enabl
 ### Scenario-Based Validation
-Instead of (or alongside) code review, define external test scenarios that the agent can't game:
+External test scenarios that the agent can't game:
 ```bash
 substrate factory scenarios list
@@ -316,7 +356,7 @@ substrate factory scenarios run
 ```
 - **Scenario Store**: SHA-256 manifests for integrity verification
-- **Satisfaction Scoring**: weighted composite of scenario pass rate, performance, complexity
+- **Satisfaction Scoring**: weighted composite of pass rate, performance, complexity
 - **Convergence Loops**: iterate until satisfaction threshold met, with plateau detection and budget controls
 ### Quality Modes
@@ -324,8 +364,8 @@ substrate factory scenarios run
 Configure how stories are validated via `.substrate/config.yaml`:
 | Mode | Description |
-|------|-------------|
-| `code-review` | Traditional — code review verdict drives the gate (default) |
+|---|---|
+| `code-review` | Code review verdict drives the gate (default) |
 | `dual-signal` | Both scenario satisfaction and code review required |
 | `scenario-primary` | Satisfaction score is authoritative |
 | `scenario-only` | Satisfaction only; code review skipped |
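One way to read the four modes is as a small decision function over two signals. The sketch below is hypothetical: `gatePasses`, `Signals`, and the `0.8` threshold are illustrative assumptions, and treating code review as advisory in `scenario-primary` is an interpretation of "authoritative", not something the README spells out.

```typescript
// Mode names come from the table; everything else is a hypothetical illustration.
type QualityMode = 'code-review' | 'dual-signal' | 'scenario-primary' | 'scenario-only';

interface Signals {
  reviewVerdict?: 'SHIP_IT' | 'NEEDS_MINOR_FIXES' | 'NEEDS_MAJOR_REWORK';
  satisfaction?: number; // assumed to be a score in [0, 1]
}

function gatePasses(mode: QualityMode, signals: Signals, threshold = 0.8): boolean {
  const reviewOk = signals.reviewVerdict === 'SHIP_IT';
  const scenarioOk = (signals.satisfaction ?? 0) >= threshold;
  if (mode === 'code-review') return reviewOk; // review verdict drives the gate
  if (mode === 'dual-signal') return reviewOk && scenarioOk; // both signals required
  // scenario-primary and scenario-only: the satisfaction score decides.
  return scenarioOk;
}

console.log(gatePasses('dual-signal', { reviewVerdict: 'SHIP_IT', satisfaction: 0.6 }));
```

In this reading, `scenario-primary` and `scenario-only` differ in whether code review runs at all, not in how the gate is decided.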
@@ -340,18 +380,37 @@ substrate factory twins status
 substrate factory twins down
 ```
+## Substrate-on-Substrate (Self-Development)
+Substrate's own development is dispatched through substrate. To dispatch a substrate fix from substrate's own working tree:
+```bash
+# Author or update the epic doc:
+#   _bmad-output/planning-artifacts/epic-NN-<topic>.md
+# Ingest into the work graph:
+substrate ingest-epic _bmad-output/planning-artifacts/epic-64-state-integrating-ac-frontmatter-and-gate.md
+# Dispatch the planned stories:
+substrate run --events --stories 64-2,64-3 --max-review-cycles 3
+```
+For local CLI changes during dev, use `npm run substrate:dev -- <args>` instead of bare `substrate` (the global binary runs the published version, not your local code).
+This is also how empirical smoke validation works for prompt-edit ships: a fixture epic at `_bmad-output/planning-artifacts/epic-999-prompt-smoke-state-integrating.md` is dispatched to verify prompt changes produce the structural property they target before publishing.
 ## Using as a Library
 Substrate ships as a family of npm packages. Most users just want the CLI (`substrate-ai`); the scoped packages are for downstream projects that want to embed substrate pieces directly.
 | Package | Use when you want... |
-|---------|----------------------|
+|---|---|
 | `substrate-ai` | The full CLI — installed globally |
 | `@substrate-ai/core` | Transport-agnostic primitives — event bus, adapters, cost tracker, telemetry, config schema |
-| `@substrate-ai/sdlc` | SDLC orchestration — phase handlers, graph orchestrator, verification pipeline, learning loop |
-| `@substrate-ai/factory` | Graph engine, scenario runner, convergence loop, digital twin helpers, LLM client |
+| `@substrate-ai/sdlc` | SDLC orchestration — phase handlers, graph orchestrator, verification pipeline (all 6 gates), learning loop |
+| `@substrate-ai/factory` | Graph engine, scenario runner, convergence loop, digital twin helpers, LLM client (with streaming for Anthropic / OpenAI / Gemini) |
-All four packages release in lockstep on every `v*` tag push — pick a version and mix any combination:
+All four packages release in lockstep on every `v*` tag push.
 ```bash
 npm install @substrate-ai/core @substrate-ai/factory
@@ -365,14 +424,12 @@ import { createSdlcEventBridge } from '@substrate-ai/sdlc'
 // Compose these primitives in your own orchestrator.
 ```
-TypeScript declaration files are bundled in each package. Published tarballs carry an npm provenance attestation you can verify with `npm audit signatures`.
+TypeScript declarations bundled. Published tarballs carry an npm provenance attestation you can verify with `npm audit signatures`.
 ## Configuration
 Substrate reads configuration from `.substrate/config.yaml` in your project root. Run `substrate init` to generate defaults.
-### Key Configuration
 ```yaml
 config_format_version: '1'
@@ -402,84 +459,121 @@ dispatch_timeouts:
 ### Configuration Files
 | File | Purpose |
-|------|---------|
+|---|---|
 | `.substrate/config.yaml` | Provider routing, concurrency, budgets, quality mode |
 | `.substrate/project-profile.yaml` | Auto-detected build system, language, test framework |
 | `.substrate/routing-policy.yaml` | Task-to-provider routing rules |
 | `CLAUDE.md` | Agent scaffold with substrate instructions |
 | `.claude/commands/` | Slash commands for Claude Code |
-### Versioned State Backend (Optional)
+### State Backend
+Substrate persists pipeline state (work graph, decisions, telemetry, runs, repo-map) in either:
-Substrate supports [Dolt](https://www.dolthub.com/) for versioned pipeline state:
+- **SQLite** (default) — zero setup, single-file durable state
+- **Dolt** (recommended) — versioned state, branchable, enables `substrate diff` and `substrate history`
 ```bash
-substrate init --dolt
+# With Dolt (auto-detected if `dolt` is on PATH)
+substrate init
 ```
-This enables:
-- `substrate diff <story>` — row-level state changes per story
-- `substrate history` — commit log of pipeline state mutations
-- OTEL observability persistence
-- Context engineering repo-map storage
-Without Dolt, everything works using plain SQLite.
+Without Dolt, all functionality works except for: `substrate diff`, `substrate history`, persistent OTEL observability tables, and context engineering repo-map storage.
 ## CLI Command Reference
-These commands are invoked by AI agents during pipeline operation. You typically don't run them directly — you tell your agent what to do and it selects the right command.
+These commands are typically invoked by your AI assistant during pipeline operation. You usually don't run them directly.
 ### Pipeline
 | Command | Description |
-|---------|-------------|
-| `substrate run` | Run the full pipeline (analysis → implement) |
+|---|---|
+| `substrate run` | Run the full pipeline (auto-detects starting phase) |
 | `substrate run --events` | Emit NDJSON event stream on stdout |
 | `substrate run --stories <keys>` | Run specific stories (e.g., `7-1,7-2`) |
+| `substrate run --epic <n>` | Scope discovery to a single epic number |
 | `substrate run --from <phase>` | Start from a specific phase |
+| `substrate run --stop-after <phase>` | Stop pipeline after this phase |
 | `substrate run --engine graph` | Use the graph execution engine |
-| `substrate run --help-agent` | Print agent instruction prompt fragment and exit |
-| `substrate resume` | Resume an interrupted pipeline run |
+| `substrate run --halt-on <severity>` | Halt on escalation severity (`all`/`critical`/`none`) |
+| `substrate run --max-review-cycles <n>` | Cycles per story (default 2; use 3 for migrations / interface extraction) |
+| `substrate run --skip-verification` | Skip post-dispatch verification (use sparingly) |
+| `substrate run --help-agent` | Print agent instruction prompt fragment |
+| `substrate resume` | Resume an interrupted run |
+| `substrate cancel` | Cancel a running pipeline |
 | `substrate status` | Show pipeline run status |
 | `substrate amend` | Run an amendment pipeline against a completed run |
 | `substrate brainstorm` | Interactive multi-persona ideation session |
+### Work Graph
+| Command | Description |
+|---|---|
+| `substrate ingest-epic <path>` | Parse an epic doc and upsert story metadata into the work graph |
+| `substrate epic-status <epic>` | Generated status view of an epic from the Dolt work graph |
+| `substrate retry-escalated` | Retry escalated stories flagged retry-targeted by escalation diagnosis |
 ### Observability
 | Command | Description |
-|---------|-------------|
-| `substrate health` | Check pipeline health, stall detection, and process status |
-| `substrate supervisor` | Long-running monitor with kill-and-restart recovery |
-| `substrate supervisor --experiment` | Self-improvement: post-run analysis + A/B experiments |
+|---|---|
+| `substrate health` | Pipeline health, stall detection, process status |
+| `substrate supervisor` | Long-running monitor with kill-and-restart |
+| `substrate supervisor --experiment` | Self-improvement: analysis + A/B experiments |
 | `substrate metrics` | Historical pipeline run metrics |
-| `substrate metrics --compare <a,b>` | Side-by-side comparison of two runs |
-| `substrate metrics --analysis <run-id>` | Read the analysis report for a specific run |
-| `substrate monitor status` | View agent performance metrics |
-| `substrate cost` | View cost and token usage summary |
+| `substrate metrics --compare <a,b>` | Side-by-side run comparison |
+| `substrate metrics --analysis <run-id>` | Read analysis report for a specific run |
+| `substrate metrics --probe-author-summary` | Probe-author KPI aggregate |
+| `substrate diff [storyKey]` | Stat-based diff of state changes (Dolt only) |
+| `substrate history` | Dolt commit log for state mutations |
+| `substrate cost` | Cost / token usage summary |
+| `substrate monitor` | Agent performance metrics |
+| `substrate probes` | Inspect runtime-probe sections across story artifacts |
-### Export and Sharing
+### Operator Workflow
 | Command | Description |
-|---------|-------------|
-| `substrate export` | Export planning artifacts as markdown |
-| `substrate export --run-id <id>` | Export artifacts from a specific pipeline run |
-| `substrate export --output-format json` | Emit JSON result for agent consumption |
+|---|---|
+| `substrate annotate` | Tag verification finding as confirmed-defect / false-positive / probe-bug |
+| `substrate probe-author dispatch` | Manually invoke probe-author phase against a single story file |
+| `substrate contracts` | Show contract declarations and verification status |
+### Setup
+| Command | Description |
+|---|---|
+| `substrate init` | Initialize config, CLAUDE.md scaffold, slash commands, state backend |
+| `substrate adapters list` | List known AI agent adapters with availability |
+| `substrate adapters check` | Run health checks across all adapters |
+| `substrate config` | Show, set, export, or import configuration |
+| `substrate routing` | Show / tune routing configuration |
+| `substrate repo-map` | Show / update / query the repo-map symbol index |
+| `substrate upgrade` | Check for updates and upgrade |
+| `substrate migrate` | Migrate historical SQLite data into Dolt |
 ### Worktree Management
 | Command | Description |
-|---------|-------------|
+|---|---|
 | `substrate merge` | Detect conflicts and merge worktree branches into target |
-| `substrate worktrees` | List active git worktrees and their tasks |
+| `substrate worktrees` | List active worktrees and associated tasks |
-### Setup
+### Export
 | Command | Description |
-|---------|-------------|
-| `substrate init` | Initialize config, CLAUDE.md scaffold, and slash commands |
-| `substrate adapters` | List and check available AI agent adapters |
-| `substrate config` | Show, set, export, or import configuration |
-| `substrate upgrade` | Check for updates and upgrade to the latest version |
+|---|---|
+| `substrate export` | Export decision store contents as markdown |
+| `substrate export --run-id <id>` | Export artifacts from a specific run |
+### Software Factory
+| Command | Description |
+|---|---|
+| `substrate factory scenarios list` | List defined scenarios |
+| `substrate factory scenarios run` | Run scenarios in convergence loop |
+| `substrate factory twins up` | Bring up Docker Compose digital twins |
+| `substrate factory twins status` | Twin service status |
+| `substrate factory twins down` | Tear down twins |
 ## Development
@@ -492,7 +586,16 @@ npm run test:fast   # ~50s unit suite for iteration
 npm test            # full suite with coverage — run before merging
 ```
-The repo is an npm workspaces monorepo — see [Using as a Library](#using-as-a-library) for the four packages it publishes. Release mechanics live in `scripts/sync-workspace-versions.mjs` and `.github/workflows/publish.yml`: every `v*` tag push syncs the workspace package versions to the root, dry-runs all four tarballs, and publishes via npm OIDC trusted publishing.
+The repo is an npm workspaces monorepo — see [Using as a Library](#using-as-a-library) for the four packages it publishes. Release mechanics live in `scripts/sync-workspace-versions.mjs` and `.github/workflows/publish.yml`: every `v*` tag push syncs workspace package versions to the root, dry-runs all four tarballs, and publishes via npm OIDC trusted publishing.
+To test local CLI changes without overriding the global binary:
+```bash
+npm run build
+npm run substrate:dev -- run --events --stories 999-1
+```
+The project's [`.claude/commands/ship.md`](.claude/commands/ship.md) defines a `/ship` workflow that runs build, circular-deps, typecheck, and tests before commit and push, plus a conditional empirical prompt-edit smoke test when `packs/bmad/prompts/*.md` files change.
 ## License

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "substrate-ai",
-  "version": "0.20.46",
+  "version": "0.20.47",
   "description": "Substrate — multi-agent orchestration daemon for AI coding agents",
   "type": "module",
   "license": "MIT",

package/packs/bmad/prompts/create-story.md CHANGED

@@ -260,6 +260,45 @@ Note this example, taken to production, would have caught the strata 1-12 bug at
 Pre-Sprint-22 (warn-severity advisory) the gate produced false negatives at SHIP_IT time. Post-flip, the gate is the load-bearing line of defense for the trigger-invocation property.
+### Production-shaped fixtures
+When the AC describes integration with a **collection** of real-state resources — a fleet of repos, a set of files, a list of services, multiple registry rows, a directory of N projects — the probe fixture MUST contain **≥2 distinct, non-overlapping resources**. A probe that builds a one-resource fixture silently passes when the production-state shape is ≥2, masking defects whose failure mode only surfaces under multiplicity (wrong-cwd-with-N-children, substring-collision attribution, single-row optimistic queries that mis-route under a second row).
+Strata Story 2-4 ("morning briefing generator", v0.20.41) shipped two architectural defects (`fetchGitLog` ran with `cwd=fleetRoot` not per-project; commit attribution used substring match) that any single-repo probe fixture would have hidden. The fleet-root cwd defect produces *some* output against a fleet of one repo (one commit found, attributed to the one project — looks correct); it only fails when the fleet has ≥2 repos with distinct, non-overlapping commit messages and the probe asserts each project gets attributed correctly. See observation `obs_2026-05-02_018`.
+**Rule**: if the AC names a plural state shape (`fleet`, `set of`, `list of`, `multiple`, `each <thing>`, `N projects`, `the registry`, `all <things>`), the probe fixture must populate at least two distinct, non-overlapping instances of that resource and the assertions must distinguish them.
+| AC names | Probe fixture must contain |
+|---|---|
+| fleet of repos / each project | ≥2 git repos in the fleet root, each with a distinct commit message |
+| set of files / list of files | ≥2 files with distinct content; assertions distinguish each |
+| multiple table rows / the registry | ≥2 rows with non-overlapping keys; assertions verify per-row behavior |
+| services in a manifest / N services | ≥2 service definitions with distinct names; assert each |
+**Example: multi-repo fleet probe (production-shaped fixture for the strata 2-4 family)**
+```yaml
+- name: briefing-attributes-commits-per-project
+  sandbox: twin
+  command: |
+    set -e
+    FLEET=$(mktemp -d)
+    for proj in alpha beta; do
+      mkdir -p "$FLEET/$proj"
+      cd "$FLEET/$proj" && git init -q
+      git config user.email t@example.com && git config user.name test
+      echo "$proj content" > a.md && git add . && git commit -qm "$proj-only commit"
+    done
+    cd <REPO_ROOT>
+    FLEET_ROOT="$FLEET" node dist/cli.mjs briefing
+  expect_stdout_regex:
+    - 'alpha-only commit'
+    - 'beta-only commit'
+  description: each project's commit attributed correctly — fixture has ≥2 distinct repos
+```
+A one-repo variant of this probe would pass against the (broken) v0.20.41 implementation; the two-repo variant catches the wrong-cwd defect because the parent-cwd `git log --all` returns BOTH commits but substring-match attribution mis-routes them.
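Reviewer sketch (not part of the package): the fixture-multiplicity argument above can be illustrated with a toy model. Everything here is hypothetical. `attribute_broken` is a simplified stand-in for a wrong-scope defect (one fleet-wide log reused for every project), not the actual Strata `fetchGitLog` code, and the in-memory `Fleet` dict stands in for real git repos. The point is only that a one-resource fixture cannot tell the broken and intended behaviors apart.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for a fleet: project name -> commit messages.
Fleet = Dict[str, List[str]]
Attribution = Dict[str, List[str]]


def attribute_broken(fleet: Fleet) -> Attribution:
    """Toy wrong-scope defect: one fleet-wide log is reused for every project."""
    fleet_wide_log = [msg for msgs in fleet.values() for msg in msgs]
    return {project: list(fleet_wide_log) for project in fleet}


def attribute_per_project(fleet: Fleet) -> Attribution:
    """Intended behavior: each project is attributed only its own commits."""
    return {project: list(msgs) for project, msgs in fleet.items()}


def probe_passes(attribute: Callable[[Fleet], Attribution], fleet: Fleet) -> bool:
    """Pass only if every project is attributed exactly its own commits."""
    expected = {project: list(msgs) for project, msgs in fleet.items()}
    return attribute(fleet) == expected


one_repo: Fleet = {"alpha": ["alpha-only commit"]}
two_repos: Fleet = {
    "alpha": ["alpha-only commit"],
    "beta": ["beta-only commit"],
}

print(probe_passes(attribute_broken, one_repo))        # True: defect is hidden
print(probe_passes(attribute_broken, two_repos))       # False: defect is caught
print(probe_passes(attribute_per_project, two_repos))  # True
```

With a single project the fleet-wide log and the per-project log are the same list, so the broken function passes. The second, non-overlapping project is what makes the two behaviors observable.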
 ### Examples by artifact class
 **Systemd unit:**

package/packs/bmad/prompts/probe-author.md CHANGED

@@ -130,6 +130,38 @@ Strata Run 13 (Story 1-12, post-merge git hook) shipped SHIP_IT after the dev's
 Note this example, taken to production, would have caught the strata 1-12 bug at runtime-probe phase rather than only at e2e smoke pass. That's the standard this guidance sets.
+## Production-shaped fixtures
+When the AC describes integration with a **collection** of real-state resources — a fleet of repos, a set of files, a list of services, multiple registry rows, a directory of N projects — the probe fixture MUST contain **≥2 distinct, non-overlapping resources**. A probe that builds a one-resource fixture silently passes when the production-state shape is ≥2, masking defects whose failure mode only surfaces under multiplicity (wrong-cwd-with-N-children, substring-collision attribution, single-row optimistic queries that mis-route under a second row).
+Strata Story 2-4 ("morning briefing generator", v0.20.41) shipped two architectural defects (`fetchGitLog` ran with `cwd=fleetRoot` not per-project; commit attribution used substring match) that any single-repo probe fixture would have hidden. The fleet-root cwd defect produces *some* output against a fleet of one repo (one commit found, attributed to the one project — looks correct); it only fails when the fleet has ≥2 repos with distinct, non-overlapping commit messages and the probe asserts each project gets attributed correctly. See observation `obs_2026-05-02_018`.
+**Rule**: if the AC names a plural state shape (`fleet`, `set of`, `list of`, `multiple`, `each <thing>`, `N projects`, `the registry`, `all <things>`), the probe fixture must populate at least two distinct, non-overlapping instances of that resource and the assertions must distinguish them. The plurality must show in the `command:` setup AND in the assertions — a two-repo fixture with a single regex check is half the discipline.
+**Example: multi-repo fleet probe (production-shaped fixture for the strata 2-4 family)**
+```yaml
+- name: briefing-attributes-commits-per-project
+  sandbox: twin
+  command: |
+    set -e
+    FLEET=$(mktemp -d)
+    for proj in alpha beta; do
+      mkdir -p "$FLEET/$proj"
+      cd "$FLEET/$proj" && git init -q
+      git config user.email t@example.com && git config user.name test
+      echo "$proj content" > a.md && git add . && git commit -qm "$proj-only commit"
+    done
+    cd <REPO_ROOT>
+    FLEET_ROOT="$FLEET" node dist/cli.mjs briefing
+  expect_stdout_regex:
+    - 'alpha-only commit'
+    - 'beta-only commit'
+  description: each project's commit attributed correctly — fixture has ≥2 distinct repos
+```
+A one-repo variant of this probe would pass against the (broken) v0.20.41 implementation; the two-repo variant catches the wrong-cwd defect because the parent-cwd `git log --all` returns BOTH commits but substring-match attribution mis-routes them.
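Reviewer sketch (not part of the package): the rule that plurality must show in both the `command:` setup and the assertions could be checked mechanically along these lines. This is a minimal heuristic under stated assumptions: the probe is already parsed into a dict with the `command` and `expect_stdout_regex` keys seen in the example above, the function names are hypothetical, and the naive substring test is a rough lint, not something Substrate is known to ship.

```python
from typing import Any, Dict, List


def undistinguished_instances(probe: Dict[str, Any], instances: List[str]) -> List[str]:
    """Return fixture instances missing from the setup command or from every assertion."""
    command = str(probe.get("command", ""))
    regexes = [str(rx) for rx in probe.get("expect_stdout_regex", [])]
    missing = []
    for name in instances:
        in_setup = name in command                         # instance is created by the fixture
        in_assertion = any(name in rx for rx in regexes)   # instance is asserted on
        if not (in_setup and in_assertion):
            missing.append(name)
    return missing


def is_production_shaped(probe: Dict[str, Any], instances: List[str]) -> bool:
    """At least two instances, each present in both setup and assertions."""
    return len(instances) >= 2 and not undistinguished_instances(probe, instances)


full_probe = {
    "command": 'for proj in alpha beta; do mkdir -p "$FLEET/$proj"; done',
    "expect_stdout_regex": ["alpha-only commit", "beta-only commit"],
}
half_probe = {
    "command": 'for proj in alpha beta; do mkdir -p "$FLEET/$proj"; done',
    "expect_stdout_regex": ["alpha-only commit"],
}

print(undistinguished_instances(full_probe, ["alpha", "beta"]))  # []
print(undistinguished_instances(half_probe, ["alpha", "beta"]))  # ['beta']
print(is_production_shaped(half_probe, ["alpha", "beta"]))       # False
```

The `half_probe` case is the "two-repo fixture with a single regex check" situation: the setup is plural, but nothing in the assertions distinguishes `beta`.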
 ## Mission
 Author runtime probes for the story described above. Use the AC sections provided: