valent-pipeline 0.4.3 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (44)

package/README.md +26 -19
package/package.json +1 -1
package/pipeline/docs/lean-spawn-human-tasks.md +2 -2
package/pipeline/orchestrators/claude-code/README.md +18 -2
package/pipeline/orchestrators/claude-code/plan.workflow.js +56 -16
package/pipeline/orchestrators/claude-code/retro.workflow.js +58 -17
package/pipeline/orchestrators/claude-code/sprint.workflow.js +97 -9
package/pipeline/orchestrators/codex/README.md +3 -3
package/pipeline/orchestrators/codex/lead-loop.md +3 -3
package/pipeline/prompts/lead.md +1 -1
package/pipeline/schemas/task-graph.schema.json +1 -1
package/pipeline/steps/common/distilled-handoff-format.md +1 -1
package/pipeline/steps/orchestration/adopt-lead-and-create-team.md +1 -1
package/pipeline/steps/orchestration/sprint-plan.md +2 -2
package/pipeline/steps/retrospective/calibration.md +1 -1
package/pipeline/task-graphs/frontend-only.yaml +1 -1
package/pipeline/task-graphs/fullstack-web.yaml +1 -1
package/pipeline/task-graphs/mobile-app.yaml +1 -1
package/pipeline/templates/bend-handoff.template.md +1 -1
package/pipeline/templates/critic-review.template.md +1 -1
package/pipeline/templates/data-handoff.template.md +1 -1
package/pipeline/templates/docgen-handoff.template.md +1 -1
package/pipeline/templates/execution-report.template.md +1 -1
package/pipeline/templates/fend-handoff.template.md +1 -1
package/pipeline/templates/iac-handoff.template.md +1 -1
package/pipeline/templates/judge-decision.template.md +1 -1
package/pipeline/templates/libdev-handoff.template.md +1 -1
package/pipeline/templates/mcp-dev-handoff.template.md +1 -1
package/pipeline/templates/mobile-handoff.template.md +1 -1
package/pipeline/templates/qa-test-spec.template.md +1 -1
package/pipeline/templates/readiness-review.template.md +1 -1
package/pipeline/templates/reqs-brief.template.md +1 -1
package/pipeline/templates/uxa-spec.template.md +1 -1
package/skills/valent-configure/SKILL.md +10 -5
package/skills/valent-run-epic-workflow/SKILL.md +4 -4
package/skills/valent-run-project-workflow/SKILL.md +4 -4
package/skills/valent-run-story-workflow/SKILL.md +4 -3
package/src/commands/init.js +45 -23
package/src/commands/resolve-graph.js +3 -6
package/src/commands/upgrade.js +28 -5
package/src/lib/config-schema.js +8 -3
package/src/lib/graph.js +2 -6
package/src/lib/handoff.js +2 -6
package/src/lib/paths.js +26 -0

package/README.md CHANGED

@@ -7,12 +7,11 @@ You write the story. The pipeline handles requirements analysis, UX specificatio
 ## Quick Start
 ```bash
-# Install globally
-npm install -g valent-pipeline
-# Initialize in your project
+# Initialize in your project — no global install needed.
+# `init` scaffolds .valent-pipeline/ AND vendors the CLI into it, so every project
+# pins its own version and the agents call it via `node .valent-pipeline/bin/cli.js`.
 cd your-project
-valent-pipeline init
+npx valent-pipeline init
 # Run the interactive configuration wizard
 /valent-configure
@@ -21,6 +20,13 @@ valent-pipeline init
 /valent-run-story STORY-001
 ```
+> **No global install required.** `npx valent-pipeline init` copies the CLI (`bin/` + `src/`)
+> into `.valent-pipeline/` and installs its dependencies there. Agents invoke
+> `node .valent-pipeline/bin/cli.js <cmd>` — so different projects can run different CLI
+> versions, and you can customize the pipeline (including `src/`) per project. A global
+> install (`npm install -g valent-pipeline`) still works if you prefer the bare
+> `valent-pipeline` command for manual use.
 ## How It Works
 A persistent **Lead** agent reads your story, assembles a team of specialist agents, and orchestrates them through a dependency-driven pipeline:
@@ -108,39 +114,40 @@ Specialized agents that replace BEND for non-API project types:
 - Claude Code CLI
 - npm account (for publishing)
-### Install
-```bash
-npm install -g valent-pipeline
-```
 ### Initialize a Project
 ```bash
 cd your-project
-valent-pipeline init
+npx valent-pipeline init
 ```
 The init command:
 1. Runs an interactive wizard to set project type, tech stack, and model assignments
 2. Copies pipeline infrastructure to `.valent-pipeline/`
-3. Generates `pipeline-config.yaml` from your answers
-4. Creates knowledge directories and initializes the backlog
-5. Installs Claude Code skills for story/epic/project execution
+3. **Vendors the CLI** (`bin/` + `src/`) into `.valent-pipeline/` and installs its runtime
+   dependencies there, so the project is self-contained and agents run
+   `node .valent-pipeline/bin/cli.js <cmd>` — no global install or `npx` round-trip at run time
+4. Generates `pipeline-config.yaml` from your answers
+5. Creates knowledge directories and initializes the backlog
+6. Installs Claude Code skills for story/epic/project execution
+A global install (`npm install -g valent-pipeline`) is optional — only needed if you want the
+bare `valent-pipeline` command available for manual use outside a project.
 ### Upgrade
 ```bash
-valent-pipeline upgrade
-valent-pipeline upgrade --dry-run   # preview changes without applying
+npx valent-pipeline upgrade
+npx valent-pipeline upgrade --dry-run   # preview changes without applying
 ```
-Upgrades pipeline infrastructure (prompts, templates, task graphs, scripts) while preserving your project-specific files (config, knowledge, backlog).
+Upgrades pipeline infrastructure (prompts, templates, task graphs, scripts) **and re-vendors the
+CLI** (`bin/` + `src/`) while preserving your project-specific files (config, knowledge, backlog).
 ### Validate Configuration
 ```bash
-valent-pipeline config validate
+node .valent-pipeline/bin/cli.js config validate
 ```
 ## Configuration

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "valent-pipeline",
-  "version": "0.4.3",
+  "version": "0.5.0",
   "description": "v3 multi-agent AI pipeline for software development lifecycle",
   "type": "module",
   "bin": {

package/pipeline/docs/lean-spawn-human-tasks.md CHANGED

@@ -135,8 +135,8 @@ Then test other commands:
 ```bash
 valent-pipeline config validate    # should exit 0
 valent-pipeline upgrade --dry-run  # should show no changes (just installed)
-valent-pipeline db rebuild          # indexes story artifacts (auto-creates DB if missing)
-valent-pipeline db rebuild         # should complete (no stories to index yet)
+node .valent-pipeline/bin/cli.js db rebuild          # indexes story artifacts (auto-creates DB if missing)
+node .valent-pipeline/bin/cli.js db rebuild         # should complete (no stories to index yet)
 ```
 Clean up:

package/pipeline/orchestrators/claude-code/README.md CHANGED

@@ -45,19 +45,35 @@ incl. a resume-safety lint), but:
 | 3b CRITIC | `parallel([blind, edge, acceptance])` independent agents → triage barrier | one CRITIC context, passes anchored on each other |
 | Spawn context | `buildPrompt()` mirrors `spawn.template.md` (Setup/Task/Trigger/Completion) | terse inline instructions |
 | Roll-over | a rejected story is recorded and the batch continues | — |
+| Empty-graph guard | a resolved graph with zero dev agents throws a diagnostic before Build | silent empty Build → CRITIC looping on an empty diff |
+| No-diff guard | if dev agents report no files, CRITIC/QA/JUDGE are skipped and the story rolls over `blocked` | 4-agent CRITIC re-reviewing an empty diff to the cap |
+| Non-actionable verdict | a gate/CRITIC `needs-review` escalates immediately (no re-run) | re-reviewing a structural blocker until the cap |
 | Resume | journal (`resumeFromRunId`) | disk-state rehydration + re-decide |
 ## Args
 ```js
 // batch form (a planned sprint)
-{ stories: [{ storyId, projectType, profiles }, ...], maxRejectionCycles? }
+{ stories: [{ storyId, projectType, profiles }, ...], maxRejectionCycles?, models? }
 // single-story form (back-compat)
-{ storyId, projectType, profiles?, maxRejectionCycles? }
+{ storyId, projectType, profiles?, maxRejectionCycles?, models? }
 ```
 Returns `{ shipped, stories_shipped, stories_rolled_over, results: [{ storyId, shipped, verdict, skipped }] }`.
+### Per-agent model tiers (`models`)
+Each workflow assigns a model tier per spawned agent — **gates** (READINESS/CRITIC/JUDGE) → `opus`,
+**spec + build** → `sonnet`, **CLI-runner / IO** steps (resolve-graph, sprint-pack, validate-sprint,
+calibrate, embed, persist) → `haiku`, and the retro's loop-until-dry review (`RETRO-REVIEW`) → `opus`.
+This assignment is baked into each script as a default and is **overridable** via the `models` arg —
+the `pipeline-config.yaml` `models` tier→roles map (`{ opus:[...], sonnet:[...], haiku:[...] }`), which
+the invoking skills pass through. Edit it with `/valent-configure` → "Model Assignments". A Workflow
+script can't read files, so the config arrives via `args`, never a direct read. The same `models`
+config also drives the prose-Lead pipeline (`providers/claude-code/runtime.md`), so the two paths stay
+in sync. Selection is static (default + args only) → journal-replay safe. Omit `models` to use the
+baked-in default; an agent with no tier mapping inherits the session model.
 ## Resume & state model (step 8)
 **The journal is the state of record.** Each Workflow invocation returns a `runId`. To resume

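The tier→roles override described in the README hunk above can be sketched as follows. This is a minimal illustration that mirrors the `buildModelMap` / `modelFor` helpers added to the workflow scripts in this release. The `DEFAULT_MODELS` table is cut down to three roles, and the override passed in is a hypothetical config, not one shipped with the package.

```javascript
// Reduced default role -> tier table (the real scripts list more roles).
const DEFAULT_MODELS = { READINESS: 'opus', REQS: 'sonnet', PACK: 'haiku' }

// Invert a tier -> roles config into role -> tier and overlay it on the defaults.
function buildModelMap(cfg) {
  const map = { ...DEFAULT_MODELS }
  if (cfg && typeof cfg === 'object' && !Array.isArray(cfg)) {
    for (const tier of ['opus', 'sonnet', 'haiku']) {
      for (const role of cfg[tier] || []) {
        if (typeof role === 'string') map[role.toUpperCase()] = tier
      }
    }
  }
  return map
}

// Hypothetical override: promote REQS to opus, leave every other role alone.
const MODELS = buildModelMap({ opus: ['reqs'] })
const modelFor = (role) => MODELS[String(role).toUpperCase()]

console.log(modelFor('reqs'))    // opus (overridden, lookup is case-insensitive)
console.log(modelFor('PACK'))    // haiku (baked-in default)
console.log(modelFor('UNKNOWN')) // undefined (agent inherits the session model)
```

Because the map is built once from the defaults plus `args`, the same inputs always yield the same assignment, which is what the README means by "journal-replay safe".
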
package/pipeline/orchestrators/claude-code/plan.workflow.js CHANGED

@@ -14,14 +14,17 @@
  * one git branch and must be sequential — see sprint.workflow.js.)
  *
  * The deterministic packing/validation (greedy bin-packing, consistency cross-checks) is NOT
- * done in this script — it lives in `valent-pipeline sprint-pack` / `validate-sprint`
+ * done in this script — it lives in `node .valent-pipeline/bin/cli.js sprint-pack` / `validate-sprint`
  * (src/lib/sprint.js), invoked through an agent because a Workflow script has no CLI/fs
  * access. Both runtimes reuse those CLIs; this workflow just sequences the agents.
  *
  * The return value is shaped to feed straight into sprint.workflow.js:
  *   { sprintId, points_planned, stories: [{ storyId, projectType, profiles }] }
  *
- * args: { stories: [{ storyId, projectType }], sprintId, velocity, backlogPath?, maxRejectionCycles? }
+ * args: { stories: [{ storyId, projectType }], sprintId, velocity, backlogPath?, maxRejectionCycles?, models? }
+ *   `models` is the pipeline-config.yaml `models` tier->roles map, passed through by the invoking
+ *   skill so per-agent model tiers stay config-driven (editable via `valent configure`). Omit it to
+ *   use the baked-in default. See sprint.workflow.js for the full rationale.
  */
 export const meta = {
@@ -31,8 +34,8 @@ export const meta = {
     { title: 'Groom', detail: 'reqs -> uxa? -> qa-a -> readiness gate, pipelined across the batch' },
     { title: 'Size', detail: 'profile-matched estimators per story, summed (parallel)' },
     { title: 'Persist', detail: 'write story_points + groomed status to the backlog' },
-    { title: 'Pack', detail: 'valent-pipeline sprint-pack (greedy bin-packing, in code)' },
-    { title: 'Validate', detail: 'write plan/status artifacts + valent-pipeline validate-sprint' },
+    { title: 'Pack', detail: 'node .valent-pipeline/bin/cli.js sprint-pack (greedy bin-packing, in code)' },
+    { title: 'Validate', detail: 'write plan/status artifacts + node .valent-pipeline/bin/cli.js validate-sprint' },
   ],
 }
@@ -116,7 +119,16 @@ const PROFILE_ESTIMATORS = {
 // --- args ---
-const a = args || {}
+// args may arrive as a parsed object or as a JSON string, depending on how the invoking
+// skill/harness passes it. Normalize defensively so `a.stories` etc. resolve either way.
+function parseArgs(x) {
+  if (typeof x === 'string') {
+    try { return JSON.parse(x) } catch { return {} }
+  }
+  return x || {}
+}
+const a = parseArgs(args)
 const stories = Array.isArray(a.stories) ? a.stories : []
 const sprintId = a.sprintId
 const velocity = a.velocity
@@ -126,6 +138,34 @@ if (!stories.length || !sprintId || typeof velocity !== 'number') {
   throw new Error('args must include { stories:[{storyId,projectType}], sprintId, velocity }')
 }
+// --- per-agent model tiers ----------------------------------------------------
+// Tiers come from pipeline-config.yaml `models` (a tier->roles map), passed in as
+// args.models by the invoking skill — a Workflow script can't read files. We invert it
+// to role->tier and overlay it on a baked-in default so the workflow self-hosts a sane
+// assignment even when args.models is absent. Static + args only => journal-replay safe.
+//   readiness gate -> opus, spec/estimators -> sonnet, CLI-runners/IO -> haiku.
+const DEFAULT_MODELS = {
+  READINESS: 'opus',
+  REQS: 'sonnet', UXA: 'sonnet', 'QA-A': 'sonnet',
+  BEND: 'sonnet', FEND: 'sonnet', DATA: 'sonnet', 'MCP-DEV': 'sonnet',
+  LIBDEV: 'sonnet', DOCGEN: 'sonnet', IAC: 'sonnet', MOBILE: 'sonnet',
+  PERSIST: 'haiku', PACK: 'haiku', VALIDATE: 'haiku',
+}
+function buildModelMap(cfg) {
+  const map = { ...DEFAULT_MODELS }
+  if (cfg && typeof cfg === 'object' && !Array.isArray(cfg)) {
+    for (const tier of ['opus', 'sonnet', 'haiku']) {
+      for (const role of cfg[tier] || []) {
+        if (typeof role === 'string') map[role.toUpperCase()] = tier
+      }
+    }
+  }
+  return map
+}
+const MODELS = buildModelMap(a.models)
+// undefined => the agent inherits the main-loop (session) model.
+const modelFor = (role) => MODELS[String(role).toUpperCase()]
 function buildPrompt({ role, promptFile, storyId, taskSubject, trigger, returnContract }) {
   const outputDir = `stories/${storyId}/output`
   return [
@@ -161,7 +201,7 @@ const groomed = await pipeline(
         taskSubject: 'Tag testing_profiles for this story, then produce reqs-brief.md.',
         returnContract: 'Return ONLY { schema:1, agent:"reqs", story, testing_profiles:[...], files:[...] } as JSON.',
       }),
-      { label: `reqs:${story.storyId}`, phase: 'Groom', schema: REQS_GROOM_SCHEMA },
+      { label: `reqs:${story.storyId}`, phase: 'Groom', schema: REQS_GROOM_SCHEMA, model: modelFor('REQS') },
     )
     return { ...story, profiles: r.testing_profiles || [] }
   },
@@ -170,7 +210,7 @@ const groomed = await pipeline(
     if (g.profiles.includes('ui')) {
       await agent(
         buildPrompt({ role: 'UXA', promptFile: 'uxa.md', storyId: g.storyId, taskSubject: 'Translate the brief into uxa-spec.md.' }),
-        { label: `uxa:${g.storyId}`, phase: 'Groom', schema: HANDOFF_SCHEMA },
+        { label: `uxa:${g.storyId}`, phase: 'Groom', schema: HANDOFF_SCHEMA, model: modelFor('UXA') },
       )
     }
     return g
@@ -179,7 +219,7 @@ const groomed = await pipeline(
   async (g) => {
     await agent(
       buildPrompt({ role: 'QA-A', promptFile: 'qa-a.md', storyId: g.storyId, taskSubject: 'Produce qa-test-spec.md before any code is written.' }),
-      { label: `qa-a:${g.storyId}`, phase: 'Groom', schema: HANDOFF_SCHEMA },
+      { label: `qa-a:${g.storyId}`, phase: 'Groom', schema: HANDOFF_SCHEMA, model: modelFor('QA-A') },
     )
     return g
   },
@@ -192,7 +232,7 @@ const groomed = await pipeline(
           role: 'READINESS', promptFile: 'readiness.md', storyId: g.storyId,
           taskSubject: 'Validate the spec chain (reqs/uxa/qa) is implementation-ready; run cross-story checks (sprint mode).',
         }),
-        { label: `gate:readiness:${g.storyId}`, phase: 'Groom', schema: VERDICT_SCHEMA },
+        { label: `gate:readiness:${g.storyId}`, phase: 'Groom', schema: VERDICT_SCHEMA, model: modelFor('READINESS') },
       )
       if (v.verdict === 'pass') return { ...g, groomedStatus: 'groomed' }
       rejections += 1
@@ -204,7 +244,7 @@ const groomed = await pipeline(
       log(`${g.storyId}: readiness rejection ${rejections}/${maxRejectionCycles} -> ${target}`)
       await agent(
         buildPrompt({ role: target, promptFile: `${target.toLowerCase()}.md`, storyId: g.storyId, taskSubject: 'Address the READINESS rejection and rewrite the affected spec.' }),
-        { label: `rework:${target.toLowerCase()}:${g.storyId}`, phase: 'Groom', schema: HANDOFF_SCHEMA },
+        { label: `rework:${target.toLowerCase()}:${g.storyId}`, phase: 'Groom', schema: HANDOFF_SCHEMA, model: modelFor(target) },
       )
     }
   },
@@ -227,7 +267,7 @@ const sized = await parallel(
             taskSubject: 'Estimate this story (read your estimate.md step; apply calibration directives if present).',
             returnContract: 'Return ONLY { schema:1, agent, story, points:<int> } as JSON.',
           }),
-          { label: `estimate:${est.toLowerCase()}:${g.storyId}`, phase: 'Size', schema: ESTIMATE_SCHEMA },
+          { label: `estimate:${est.toLowerCase()}:${g.storyId}`, phase: 'Size', schema: ESTIMATE_SCHEMA, model: modelFor(est) },
         )),
     ).then((ests) => ({
       ...g,
@@ -244,15 +284,15 @@ await agent(
   `Update \`${backlogPath}\`: for each of these stories set \`story_points\` and \`status: groomed\`, ` +
     `and write \`testing_profiles\`. Stories (JSON): ${JSON.stringify(sizedStories.map((s) => ({ id: s.storyId, story_points: s.points, testing_profiles: s.profiles })))}. ` +
     `Return your \`valent:handoff\` machine block fields as JSON.`,
-  { label: 'persist-sizing', phase: 'Persist', schema: HANDOFF_SCHEMA },
+  { label: 'persist-sizing', phase: 'Persist', schema: HANDOFF_SCHEMA, model: modelFor('PERSIST') },
 )
 phase('Pack')
 // Deterministic greedy packing happens in code (src/lib/sprint.js), invoked via the CLI.
 const pack = await agent(
-  `Run exactly: \`valent-pipeline sprint-pack --velocity ${velocity} --backlog ${backlogPath}\` ` +
+  `Run exactly: \`node .valent-pipeline/bin/cli.js sprint-pack --velocity ${velocity} --backlog ${backlogPath}\` ` +
     `in the project root and return its stdout JSON verbatim (fields: sprint_stories, buffer_story_ids, points_planned, remaining_capacity).`,
-  { label: 'sprint-pack', phase: 'Pack', schema: PACK_SCHEMA },
+  { label: 'sprint-pack', phase: 'Pack', schema: PACK_SCHEMA, model: modelFor('PACK') },
 )
 log(`packed ${pack.sprint_stories.length} stories (${pack.points_planned} pts); buffer: ${pack.buffer_story_ids.length}`)
@@ -262,9 +302,9 @@ const validation = await agent(
   `For sprint ${sprintId}: (1) write \`sprint-${sprintId}-plan.md\` from \`.valent-pipeline/templates/sprint-plan.template.md\` ` +
     `and \`sprint-${sprintId}-status.yaml\` from the status template for the packed stories ${JSON.stringify(pack.sprint_stories)}; ` +
     `(2) tag those stories \`sprint: ${sprintId}\` + \`status: sprint-planned\` in \`${backlogPath}\`; ` +
-    `(3) run \`valent-pipeline validate-sprint --status sprint-${sprintId}-status.yaml --backlog ${backlogPath}\` and ` +
+    `(3) run \`node .valent-pipeline/bin/cli.js validate-sprint --status sprint-${sprintId}-status.yaml --backlog ${backlogPath}\` and ` +
     `return its result as JSON { valid:boolean, errors:[...] } (errors = the lines it printed on failure, else []).`,
-  { label: 'validate-sprint', phase: 'Validate', schema: VALIDATE_SCHEMA },
+  { label: 'validate-sprint', phase: 'Validate', schema: VALIDATE_SCHEMA, model: modelFor('VALIDATE') },
 )
 if (!validation.valid) {
   throw new Error(`sprint ${sprintId} plan failed validation: ${(validation.errors || []).join('; ')}`)

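The defensive `args` normalization introduced in this file can be exercised in isolation. The sketch below copies `parseArgs` from the hunk above. The sample inputs (`S1`, a velocity of 20) are made up for illustration.

```javascript
// Accept args as an object or a JSON string; fall back to {} on bad input.
function parseArgs(x) {
  if (typeof x === 'string') {
    try { return JSON.parse(x) } catch { return {} }
  }
  return x || {}
}

const fromObject = parseArgs({ sprintId: 'S1', velocity: 20 })
const fromString = parseArgs('{"sprintId":"S1","velocity":20}')
const fromGarbage = parseArgs('not json')
const fromNothing = parseArgs(undefined)

console.log(fromObject.velocity)              // 20
console.log(fromString.sprintId)              // S1
console.log(Object.keys(fromGarbage).length)  // 0
console.log(Object.keys(fromNothing).length)  // 0
```

An unparseable string degrades to `{}`, so the existing `args must include ...` check still produces the diagnostic rather than a `TypeError`. One edge this sketch makes visible: a string that parses to a non-object (for example `'null'`) is returned as-is, so callers that might receive such input may want an extra guard.
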
package/pipeline/orchestrators/claude-code/retro.workflow.js CHANGED

@@ -16,25 +16,28 @@
  *          guard) -> embed (CLI).
  *
  * The deterministic pieces are NOT in this script: calibration arithmetic is
- * `valent-pipeline calibrate` (src/lib/sprint.js); embedding is `valent-pipeline db embed`.
+ * `node .valent-pipeline/bin/cli.js calibrate` (src/lib/sprint.js); embedding is `node .valent-pipeline/bin/cli.js db embed`.
  * Both run through agents (a Workflow script has no CLI/fs access). The directive IMPACT
  * GATING and INVARIANT GUARD are deterministic policy, so they are enforced HERE in code —
  * the agent only proposes; the script decides what gets applied vs. surfaced for approval.
  *
- * args: { batchNumber, sprintId?, storyOutputDirs?: string[], dryRounds?: number, maxRounds?: number }
+ * args: { batchNumber, sprintId?, storyOutputDirs?: string[], dryRounds?: number, maxRounds?: number, models? }
  *   sprintId present => sprint-mode (calibration runs). dryRounds = consecutive empty rounds
- *   that end the loop-until-dry (default 2). maxRounds caps it (default 5).
+ *   that end the loop-until-dry (default 2). maxRounds caps it (default 5). `models` is the
+ *   pipeline-config.yaml `models` tier->roles map, passed through by the invoking skill so
+ *   per-agent model tiers stay config-driven (editable via `valent configure`). Omit it to use
+ *   the baked-in default. See sprint.workflow.js for the full rationale.
  */
 export const meta = {
   name: 'valent-retro',
   description: 'Retrospective: calibrate, loop-until-dry aggregate review, gated directives, embed (Workflow)',
   phases: [
-    { title: 'Calibrate', detail: 'valent-pipeline calibrate (estimation accuracy, in code) — sprint mode' },
+    { title: 'Calibrate', detail: 'node .valent-pipeline/bin/cli.js calibrate (estimation accuracy, in code) — sprint mode' },
     { title: 'Analyze', detail: 'CRITIC/QA/JUDGE batch outputs + cost' },
     { title: 'Aggregate', detail: 'loop-until-dry 3-pass aggregate review + completeness critic (R5)' },
     { title: 'Directives', detail: 'agent proposes; code enforces impact gating + invariant guard' },
-    { title: 'Embed', detail: 'valent-pipeline db embed (persist curated patterns)' },
+    { title: 'Embed', detail: 'node .valent-pipeline/bin/cli.js db embed (persist curated patterns)' },
   ],
 }
@@ -109,13 +112,51 @@ const HANDOFF_SCHEMA = {
 // --- args ---
-const a = args || {}
+// args may arrive as a parsed object or as a JSON string, depending on how the invoking
+// skill/harness passes it. Normalize defensively so `a.batchNumber` etc. resolve either way.
+function parseArgs(x) {
+  if (typeof x === 'string') {
+    try { return JSON.parse(x) } catch { return {} }
+  }
+  return x || {}
+}
+const a = parseArgs(args)
 const batchNumber = a.batchNumber
 const sprintId = a.sprintId || null
 const dryRounds = a.dryRounds ?? 2
 const maxRounds = a.maxRounds ?? 5
 if (batchNumber == null) throw new Error('args must include { batchNumber }')
+// --- per-agent model tiers ----------------------------------------------------
+// Tiers come from pipeline-config.yaml `models` (a tier->roles map), passed in as
+// args.models by the invoking skill — a Workflow script can't read files. We invert it
+// to role->tier and overlay it on a baked-in default so the workflow self-hosts a sane
+// assignment even when args.models is absent. Static + args only => journal-replay safe.
+// Retro stages map to synthetic role keys (not the single RETROSPECTIVE persona) so each
+// stage can be tuned independently: the loop-until-dry aggregate review + completeness
+// critic are the genuine quality work (RETRO-REVIEW -> opus); analyze/directives are
+// lighter (RETRO -> sonnet); calibrate/embed/IO are mechanical (haiku).
+const DEFAULT_MODELS = {
+  'RETRO-REVIEW': 'opus',
+  RETRO: 'sonnet',
+  CALIBRATE: 'haiku', EMBED: 'haiku', PERSIST: 'haiku',
+}
+function buildModelMap(cfg) {
+  const map = { ...DEFAULT_MODELS }
+  if (cfg && typeof cfg === 'object' && !Array.isArray(cfg)) {
+    for (const tier of ['opus', 'sonnet', 'haiku']) {
+      for (const role of cfg[tier] || []) {
+        if (typeof role === 'string') map[role.toUpperCase()] = tier
+      }
+    }
+  }
+  return map
+}
+const MODELS = buildModelMap(a.models)
+// undefined => the agent inherits the main-loop (session) model.
+const modelFor = (role) => MODELS[String(role).toUpperCase()]
 const retroPrompt = (instruction, returnContract) =>
   `You are **RETROSPECTIVE**, analyzing story batch ${batchNumber} in the valent-pipeline. ` +
   `Read \`.valent-pipeline/prompts/retrospective.md\` and the step file named in the task. ${instruction} ` +
@@ -131,9 +172,9 @@ if (sprintId) {
   phase('Calibrate')
   // Estimation-accuracy arithmetic lives in code (src/lib/sprint.js); run it via the CLI.
   calibration = await agent(
-    `Run exactly: \`valent-pipeline calibrate --sprint ${sprintId}\` in the project root and return its stdout JSON verbatim ` +
+    `Run exactly: \`node .valent-pipeline/bin/cli.js calibrate --sprint ${sprintId}\` in the project root and return its stdout JSON verbatim ` +
       `(fields: ratios, flagged_pairs, surface_averages, velocity). This feeds calibration directives.`,
-    { label: 'calibrate', phase: 'Calibrate', schema: { type: 'object', additionalProperties: true } },
+    { label: 'calibrate', phase: 'Calibrate', schema: { type: 'object', additionalProperties: true }, model: modelFor('CALIBRATE') },
   )
   log(`calibration: ${(calibration.flagged_pairs || []).length} flagged pair(s); velocity unstable=${calibration.velocity?.unstable}`)
 }
@@ -144,7 +185,7 @@ await agent(
     'Run analyze.md: read all CRITIC reviews, QA-B bug reports, JUDGE rejections, and cost data; categorize rejection/bug patterns.',
     'Return ONLY { schema:1, findings:[{id,summary,severity,stories}] } as JSON.',
   ),
-  { label: 'analyze', phase: 'Analyze', schema: FINDINGS_SCHEMA },
+  { label: 'analyze', phase: 'Analyze', schema: FINDINGS_SCHEMA, model: modelFor('RETRO') },
 )
 phase('Aggregate')
@@ -164,7 +205,7 @@ while (dry < dryRounds && round < maxRounds) {
         `Report ONLY findings not already reported in earlier rounds.`,
       'Return ONLY { schema:1, findings:[{id,summary,severity,stories}] } as JSON.',
     ),
-    { label: `aggregate:round-${round}`, phase: 'Aggregate', schema: FINDINGS_SCHEMA },
+    { label: `aggregate:round-${round}`, phase: 'Aggregate', schema: FINDINGS_SCHEMA, model: modelFor('RETRO-REVIEW') },
   )
   const fresh = (r.findings || []).filter((f) => !seen.has(findingKey(f)))
   if (!fresh.length) {
@@ -187,7 +228,7 @@ const critic = await agent(
       `List only genuine gaps — empty if coverage is complete.`,
     'Return ONLY { schema:1, gaps:["..."] } as JSON.',
   ),
-  { label: 'completeness-critic', phase: 'Aggregate', schema: COMPLETENESS_SCHEMA },
+  { label: 'completeness-critic', phase: 'Aggregate', schema: COMPLETENESS_SCHEMA, model: modelFor('RETRO-REVIEW') },
 )
 if ((critic.gaps || []).length) {
   log(`completeness-critic surfaced ${critic.gaps.length} gap(s) — running targeted reviews`)
@@ -196,7 +237,7 @@ if ((critic.gaps || []).length) {
       agent(
         retroPrompt(`Targeted aggregate review for the previously-uncovered angle: "${gap}". Report only findings not already reported.`,
           'Return ONLY { schema:1, findings:[{id,summary,severity,stories}] } as JSON.'),
-        { label: `aggregate:gap-${i + 1}`, phase: 'Aggregate', schema: FINDINGS_SCHEMA },
+        { label: `aggregate:gap-${i + 1}`, phase: 'Aggregate', schema: FINDINGS_SCHEMA, model: modelFor('RETRO-REVIEW') },
       )),
   )
   for (const r of extra.filter(Boolean)) {
@@ -222,7 +263,7 @@ const drafted = await agent(
       `propose it and flag it; the orchestrator decides what gets applied.`,
     'Return ONLY { schema:1, directives:[{target_agent,directive,reason,impact_level,touchesInvariant,category}] } as JSON.',
   ),
-  { label: 'draft-directives', phase: 'Directives', schema: DIRECTIVES_SCHEMA },
+  { label: 'draft-directives', phase: 'Directives', schema: DIRECTIVES_SCHEMA, model: modelFor('RETRO') },
 )
 const all = drafted.directives || []
@@ -239,7 +280,7 @@ if (applied.length) {
     `Append these APPROVED correction directives to \`correction-directives.yaml\` (status: active, created_batch: ${batchNumber}). ` +
       `They have passed the impact gate (low/medium only). Directives (JSON): ${JSON.stringify(applied)}. ` +
       `Return { schema:1 } when done.`,
-    { label: 'apply-directives', phase: 'Directives', schema: HANDOFF_SCHEMA },
+    { label: 'apply-directives', phase: 'Directives', schema: HANDOFF_SCHEMA, model: modelFor('PERSIST') },
   )
 }
 if (proposals.length) {
@@ -248,7 +289,7 @@ if (proposals.length) {
     `Write these directive PROPOSALS to \`retrospective-batch-${batchNumber}.md\` under "## Pending Approval" — do NOT add them to ` +
       `correction-directives.yaml. For each, document the proposed directive, why it needs approval (architecture-conflict or high-impact), ` +
       `evidence, risk, and an alternative. Proposals (JSON): ${JSON.stringify(proposals)}. Return { schema:1 } when done.`,
-    { label: 'surface-proposals', phase: 'Directives', schema: HANDOFF_SCHEMA },
+    { label: 'surface-proposals', phase: 'Directives', schema: HANDOFF_SCHEMA, model: modelFor('PERSIST') },
   )
 }
@@ -257,8 +298,8 @@ phase('Embed')
 const embed = await agent(
   `Run embed-instructions.md: write \`embed-instructions.md\` (curated recurring patterns / novel decisions / bug patterns / ` +
     `broadly-applicable directives only — NOT one-offs) in the most recent story output dir, then run ` +
-    `\`valent-pipeline db embed --file <that path>\`. Return { schema:1, embedded:<int count> }.`,
-  { label: 'embed', phase: 'Embed', schema: { type: 'object', additionalProperties: true } },
+    `\`node .valent-pipeline/bin/cli.js db embed --file <that path>\`. Return { schema:1, embedded:<int count> }.`,
+  { label: 'embed', phase: 'Embed', schema: { type: 'object', additionalProperties: true }, model: modelFor('EMBED') },
 )
 return {