npm - valent-pipeline - Versions diffs - 0.4.3 → 0.5.1 - Mend

valent-pipeline 0.4.3 → 0.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (57)

package/README.md +26 -19
package/bin/cli.js +43 -0
package/package.json +1 -1
package/pipeline/docs/lean-spawn-human-tasks.md +2 -2
package/pipeline/orchestrators/claude-code/README.md +18 -2
package/pipeline/orchestrators/claude-code/plan.workflow.js +78 -16
package/pipeline/orchestrators/claude-code/retro.workflow.js +85 -21
package/pipeline/orchestrators/claude-code/sprint.workflow.js +127 -9
package/pipeline/orchestrators/codex/README.md +3 -3
package/pipeline/orchestrators/codex/lead-loop.md +3 -3
package/pipeline/prompts/lead.md +1 -1
package/pipeline/schemas/task-graph.schema.json +1 -1
package/pipeline/steps/common/distilled-handoff-format.md +1 -1
package/pipeline/steps/orchestration/adopt-lead-and-create-team.md +1 -1
package/pipeline/steps/orchestration/sprint-plan.md +2 -2
package/pipeline/steps/retrospective/calibration.md +1 -1
package/pipeline/task-graphs/frontend-only.yaml +1 -1
package/pipeline/task-graphs/fullstack-web.yaml +1 -1
package/pipeline/task-graphs/mobile-app.yaml +1 -1
package/pipeline/templates/bend-handoff.template.md +1 -1
package/pipeline/templates/critic-review.template.md +1 -1
package/pipeline/templates/data-handoff.template.md +1 -1
package/pipeline/templates/docgen-handoff.template.md +1 -1
package/pipeline/templates/execution-report.template.md +1 -1
package/pipeline/templates/fend-handoff.template.md +1 -1
package/pipeline/templates/iac-handoff.template.md +1 -1
package/pipeline/templates/judge-decision.template.md +1 -1
package/pipeline/templates/libdev-handoff.template.md +1 -1
package/pipeline/templates/mcp-dev-handoff.template.md +1 -1
package/pipeline/templates/mobile-handoff.template.md +1 -1
package/pipeline/templates/qa-test-spec.template.md +1 -1
package/pipeline/templates/readiness-review.template.md +1 -1
package/pipeline/templates/reqs-brief.template.md +1 -1
package/pipeline/templates/uxa-spec.template.md +1 -1
package/skills/valent-configure/SKILL.md +26 -5
package/skills/valent-help/SKILL.md +3 -0
package/skills/valent-review-cost/SKILL.md +69 -0
package/skills/valent-run-epic-workflow/SKILL.md +4 -4
package/skills/valent-run-project-workflow/SKILL.md +4 -4
package/skills/valent-run-story-workflow/SKILL.md +5 -3
package/src/board/public/app.js +377 -0
package/src/board/public/index.html +62 -0
package/src/board/public/styles.css +542 -0
package/src/board/server.js +209 -0
package/src/commands/audit.js +190 -0
package/src/commands/board.js +102 -0
package/src/commands/init.js +56 -23
package/src/commands/resolve-graph.js +3 -6
package/src/commands/status.js +122 -0
package/src/commands/upgrade.js +28 -5
package/src/lib/audit.js +192 -0
package/src/lib/board-source.js +138 -0
package/src/lib/board.js +219 -0
package/src/lib/config-schema.js +31 -3
package/src/lib/graph.js +2 -6
package/src/lib/handoff.js +2 -6
package/src/lib/paths.js +26 -0

package/README.md CHANGED

@@ -7,12 +7,11 @@ You write the story. The pipeline handles requirements analysis, UX specificatio
 ## Quick Start
 ```bash
-# Install globally
-npm install -g valent-pipeline
-# Initialize in your project
+# Initialize in your project — no global install needed.
+# `init` scaffolds .valent-pipeline/ AND vendors the CLI into it, so every project
+# pins its own version and the agents call it via `node .valent-pipeline/bin/cli.js`.
 cd your-project
-valent-pipeline init
+npx valent-pipeline init
 # Run the interactive configuration wizard
 /valent-configure
@@ -21,6 +20,13 @@ valent-pipeline init
 /valent-run-story STORY-001
 ```
+> **No global install required.** `npx valent-pipeline init` copies the CLI (`bin/` + `src/`)
+> into `.valent-pipeline/` and installs its dependencies there. Agents invoke
+> `node .valent-pipeline/bin/cli.js <cmd>` — so different projects can run different CLI
+> versions, and you can customize the pipeline (including `src/`) per project. A global
+> install (`npm install -g valent-pipeline`) still works if you prefer the bare
+> `valent-pipeline` command for manual use.
 ## How It Works
 A persistent **Lead** agent reads your story, assembles a team of specialist agents, and orchestrates them through a dependency-driven pipeline:
@@ -108,39 +114,40 @@ Specialized agents that replace BEND for non-API project types:
 - Claude Code CLI
 - npm account (for publishing)
-### Install
-```bash
-npm install -g valent-pipeline
-```
 ### Initialize a Project
 ```bash
 cd your-project
-valent-pipeline init
+npx valent-pipeline init
 ```
 The init command:
 1. Runs an interactive wizard to set project type, tech stack, and model assignments
 2. Copies pipeline infrastructure to `.valent-pipeline/`
-3. Generates `pipeline-config.yaml` from your answers
-4. Creates knowledge directories and initializes the backlog
-5. Installs Claude Code skills for story/epic/project execution
+3. **Vendors the CLI** (`bin/` + `src/`) into `.valent-pipeline/` and installs its runtime
+   dependencies there, so the project is self-contained and agents run
+   `node .valent-pipeline/bin/cli.js <cmd>` — no global install or `npx` round-trip at run time
+4. Generates `pipeline-config.yaml` from your answers
+5. Creates knowledge directories and initializes the backlog
+6. Installs Claude Code skills for story/epic/project execution
+A global install (`npm install -g valent-pipeline`) is optional — only needed if you want the
+bare `valent-pipeline` command available for manual use outside a project.
 ### Upgrade
 ```bash
-valent-pipeline upgrade
-valent-pipeline upgrade --dry-run   # preview changes without applying
+npx valent-pipeline upgrade
+npx valent-pipeline upgrade --dry-run   # preview changes without applying
 ```
-Upgrades pipeline infrastructure (prompts, templates, task graphs, scripts) while preserving your project-specific files (config, knowledge, backlog).
+Upgrades pipeline infrastructure (prompts, templates, task graphs, scripts) **and re-vendors the
+CLI** (`bin/` + `src/`) while preserving your project-specific files (config, knowledge, backlog).
 ### Validate Configuration
 ```bash
-valent-pipeline config validate
+node .valent-pipeline/bin/cli.js config validate
 ```
 ## Configuration
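
As a rough illustration of the vendored-CLI convention the README now describes, the sketch below wraps `node .valent-pipeline/bin/cli.js <cmd>` in a small Node helper. `vendoredCliArgv` and `runVendoredCli` are hypothetical names introduced here; only the CLI path and the `config validate` subcommand come from the diff.

```javascript
// Relative path of the vendored CLI entry point, as documented in the README diff.
const VENDORED_CLI = '.valent-pipeline/bin/cli.js'

// Build the argv for `node .valent-pipeline/bin/cli.js <cmd ...>`.
function vendoredCliArgv(...cmdArgs) {
  return [VENDORED_CLI, ...cmdArgs]
}

// Hypothetical wrapper: run the vendored CLI from the project root and collect its output.
// Assumes `init` has already vendored the CLI into <projectRoot>/.valent-pipeline/.
async function runVendoredCli(projectRoot, ...cmdArgs) {
  const { spawnSync } = await import('node:child_process')
  const result = spawnSync(process.execPath, vendoredCliArgv(...cmdArgs), {
    cwd: projectRoot,
    encoding: 'utf8',
  })
  if (result.error) throw result.error
  return { status: result.status, stdout: result.stdout, stderr: result.stderr }
}
```

In an initialized project, `await runVendoredCli(process.cwd(), 'config', 'validate')` would run the same command as the README's "Validate Configuration" step, using whichever CLI version that project has pinned.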

package/bin/cli.js CHANGED

@@ -41,6 +41,49 @@ program
     await upgrade(options);
   });
+// status command — board read-model (composes pipeline-state.json + backlog + artifacts)
+program
+  .command('status')
+  .description('Show pipeline status as a board read-model (human summary, or --json for the board-state contract)')
+  .option('--json', 'Emit the board-state JSON contract to stdout (for a board/notifier/CI)')
+  .option('--out <path>', 'Write the board-state JSON to a file (e.g. board-state.json for a board to watch)')
+  .option('--root <dir>', 'Project root to inspect (defaults to the current directory)')
+  .option('--no-audit', 'Skip merging the per-agent token/wall-clock audit trail into the board')
+  .action(async (options) => {
+    const { statusCmd } = await import('../src/commands/status.js');
+    await statusCmd(options);
+  });
+// board command — serves the read-only board SPA + read API (a projection, never a write path)
+program
+  .command('board')
+  .description('Serve a read-only board UI (Backlog + Kanban) over a local HTTP server')
+  .option('--port <n>', 'Port to listen on (default 7777)', '7777')
+  .option('--host <addr>', 'Host/interface to bind (default 127.0.0.1; 0.0.0.0 warns loudly)', '127.0.0.1')
+  .option('--root <dir>', 'Project root to project the board over (defaults to the current directory)')
+  .option('--open', 'Open the board in the default browser once it is listening')
+  .option('--no-audit', 'Skip merging the per-agent token/wall-clock cost trail into the board')
+  .action(async (options) => {
+    const { boardCmd } = await import('../src/commands/board.js');
+    await boardCmd(options);
+  });
+// audit command — per-agent, per-story token + wall-clock trail (reads Workflow journals)
+program
+  .command('audit')
+  .description('Per-agent, per-story audit trail (tokens + wall-clock), read from Workflow run journals')
+  .option('--story <id>', 'Filter to a single story')
+  .option('--run <runId>', 'Filter to a single workflow run')
+  .option('--json', 'Emit the audit JSON contract to stdout')
+  .option('--out <path>', 'Write the audit JSON to a file')
+  .option('--file <path>', 'Audit a single workflow journal file (wf_*.json)')
+  .option('--session-dir <dir>', 'Scan a specific session dir\'s workflows/ folder')
+  .option('--project-dir <dir>', 'Project dir whose ~/.claude/projects session journals to read (default: cwd)')
+  .action(async (options) => {
+    const { auditCmd } = await import('../src/commands/audit.js');
+    await auditCmd(options);
+  });
 // config validate command
 const configCmd = program
   .command('config')
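
The `board` options above imply two small checks before the server starts: the port arrives as a string, and a non-loopback host is supposed to produce a loud warning. The sketch below shows one way such checks might look. `parsePort`, `bindWarning`, and the warning text are hypothetical; the package's own logic lives in `src/commands/board.js` and `src/board/server.js`, which are not shown in this part of the diff.

```javascript
// Hosts treated as local-only in this sketch (an assumption, not taken from the package).
const LOOPBACK_HOSTS = new Set(['127.0.0.1', '::1', 'localhost'])

// The --port default is declared as the string '7777', so convert and range-check it.
function parsePort(value) {
  const port = Number(value)
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid --port: ${value}`)
  }
  return port
}

// Return a warning message when the bind address is not loopback, else null.
function bindWarning(host) {
  if (LOOPBACK_HOSTS.has(host)) return null
  return `WARNING: board bound to ${host}; it may be reachable from other machines`
}

console.log(parsePort('7777'))        // 7777
console.log(bindWarning('127.0.0.1')) // null
console.log(bindWarning('0.0.0.0'))   // WARNING: board bound to 0.0.0.0; it may be reachable from other machines
```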

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "valent-pipeline",
-  "version": "0.4.3",
+  "version": "0.5.1",
   "description": "v3 multi-agent AI pipeline for software development lifecycle",
   "type": "module",
   "bin": {

package/pipeline/docs/lean-spawn-human-tasks.md CHANGED

@@ -135,8 +135,8 @@ Then test other commands:
 ```bash
 valent-pipeline config validate    # should exit 0
 valent-pipeline upgrade --dry-run  # should show no changes (just installed)
-valent-pipeline db rebuild          # indexes story artifacts (auto-creates DB if missing)
-valent-pipeline db rebuild         # should complete (no stories to index yet)
+node .valent-pipeline/bin/cli.js db rebuild          # indexes story artifacts (auto-creates DB if missing)
+node .valent-pipeline/bin/cli.js db rebuild         # should complete (no stories to index yet)
 ```
 Clean up:

package/pipeline/orchestrators/claude-code/README.md CHANGED

@@ -45,19 +45,35 @@ incl. a resume-safety lint), but:
 | 3b CRITIC | `parallel([blind, edge, acceptance])` independent agents → triage barrier | one CRITIC context, passes anchored on each other |
 | Spawn context | `buildPrompt()` mirrors `spawn.template.md` (Setup/Task/Trigger/Completion) | terse inline instructions |
 | Roll-over | a rejected story is recorded and the batch continues | — |
+| Empty-graph guard | a resolved graph with zero dev agents throws a diagnostic before Build | silent empty Build → CRITIC looping on an empty diff |
+| No-diff guard | if dev agents report no files, CRITIC/QA/JUDGE are skipped and the story rolls over `blocked` | 4-agent CRITIC re-reviewing an empty diff to the cap |
+| Non-actionable verdict | a gate/CRITIC `needs-review` escalates immediately (no re-run) | re-reviewing a structural blocker until the cap |
 | Resume | journal (`resumeFromRunId`) | disk-state rehydration + re-decide |
 ## Args
 ```js
 // batch form (a planned sprint)
-{ stories: [{ storyId, projectType, profiles }, ...], maxRejectionCycles? }
+{ stories: [{ storyId, projectType, profiles }, ...], maxRejectionCycles?, models? }
 // single-story form (back-compat)
-{ storyId, projectType, profiles?, maxRejectionCycles? }
+{ storyId, projectType, profiles?, maxRejectionCycles?, models? }
 ```
 Returns `{ shipped, stories_shipped, stories_rolled_over, results: [{ storyId, shipped, verdict, skipped }] }`.
+### Per-agent model tiers (`models`)
+Each workflow assigns a model tier per spawned agent — **gates** (READINESS/CRITIC/JUDGE) → `opus`,
+**spec + build** → `sonnet`, **CLI-runner / IO** steps (resolve-graph, sprint-pack, validate-sprint,
+calibrate, embed, persist) → `haiku`, and the retro's loop-until-dry review (`RETRO-REVIEW`) → `opus`.
+This assignment is baked into each script as a default and is **overridable** via the `models` arg —
+the `pipeline-config.yaml` `models` tier→roles map (`{ opus:[...], sonnet:[...], haiku:[...] }`), which
+the invoking skills pass through. Edit it with `/valent-configure` → "Model Assignments". A Workflow
+script can't read files, so the config arrives via `args`, never a direct read. The same `models`
+config also drives the prose-Lead pipeline (`providers/claude-code/runtime.md`), so the two paths stay
+in sync. Selection is static (default + args only) → journal-replay safe. Omit `models` to use the
+baked-in default; an agent with no tier mapping inherits the session model.
 ## Resume & state model (step 8)
 **The journal is the state of record.** Each Workflow invocation returns a `runId`. To resume
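
The tier-to-roles overlay described under "Per-agent model tiers" can be sketched as follows. This is a minimal illustration that assumes a three-role subset of the defaults; `invertTiers` and the sample `models` argument are hypothetical and chosen only to show the shape of the data.

```javascript
// Illustrative subset of the baked-in defaults (role -> tier); the real scripts define more roles.
const DEFAULT_MODELS = { READINESS: 'opus', BEND: 'sonnet', PACK: 'haiku' }

// Hypothetical helper: invert a tier->roles map into role->tier and overlay it on the defaults.
function invertTiers(cfg, defaults) {
  const map = { ...defaults }
  if (cfg && typeof cfg === 'object' && !Array.isArray(cfg)) {
    for (const tier of ['opus', 'sonnet', 'haiku']) {
      for (const role of cfg[tier] || []) {
        if (typeof role === 'string') map[role.toUpperCase()] = tier
      }
    }
  }
  return map
}

// Sample `models` arg in the pipeline-config.yaml shape: promote BEND, leave the rest alone.
const models = invertTiers({ opus: ['bend'], sonnet: [], haiku: [] }, DEFAULT_MODELS)
const modelFor = (role) => models[String(role).toUpperCase()]

console.log(modelFor('bend'))      // opus (overridden by args)
console.log(modelFor('READINESS')) // opus (baked-in default)
console.log(modelFor('unknown'))   // undefined: the agent inherits the session model
```

In this sketch the tiers are applied in the order opus, sonnet, haiku, so a role listed under more than one tier ends up with the last tier applied.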

package/pipeline/orchestrators/claude-code/plan.workflow.js CHANGED

@@ -14,14 +14,17 @@
  * one git branch and must be sequential — see sprint.workflow.js.)
  *
  * The deterministic packing/validation (greedy bin-packing, consistency cross-checks) is NOT
- * done in this script — it lives in `valent-pipeline sprint-pack` / `validate-sprint`
+ * done in this script — it lives in `node .valent-pipeline/bin/cli.js sprint-pack` / `validate-sprint`
  * (src/lib/sprint.js), invoked through an agent because a Workflow script has no CLI/fs
  * access. Both runtimes reuse those CLIs; this workflow just sequences the agents.
  *
  * The return value is shaped to feed straight into sprint.workflow.js:
  *   { sprintId, points_planned, stories: [{ storyId, projectType, profiles }] }
  *
- * args: { stories: [{ storyId, projectType }], sprintId, velocity, backlogPath?, maxRejectionCycles? }
+ * args: { stories: [{ storyId, projectType }], sprintId, velocity, backlogPath?, maxRejectionCycles?, models? }
+ *   `models` is the pipeline-config.yaml `models` tier->roles map, passed through by the invoking
+ *   skill so per-agent model tiers stay config-driven (editable via `valent configure`). Omit it to
+ *   use the baked-in default. See sprint.workflow.js for the full rationale.
  */
 export const meta = {
@@ -31,8 +34,8 @@ export const meta = {
     { title: 'Groom', detail: 'reqs -> uxa? -> qa-a -> readiness gate, pipelined across the batch' },
     { title: 'Size', detail: 'profile-matched estimators per story, summed (parallel)' },
     { title: 'Persist', detail: 'write story_points + groomed status to the backlog' },
-    { title: 'Pack', detail: 'valent-pipeline sprint-pack (greedy bin-packing, in code)' },
-    { title: 'Validate', detail: 'write plan/status artifacts + valent-pipeline validate-sprint' },
+    { title: 'Pack', detail: 'node .valent-pipeline/bin/cli.js sprint-pack (greedy bin-packing, in code)' },
+    { title: 'Validate', detail: 'write plan/status artifacts + node .valent-pipeline/bin/cli.js validate-sprint' },
   ],
 }
@@ -116,7 +119,16 @@ const PROFILE_ESTIMATORS = {
 // --- args ---
-const a = args || {}
+// args may arrive as a parsed object or as a JSON string, depending on how the invoking
+// skill/harness passes it. Normalize defensively so `a.stories` etc. resolve either way.
+function parseArgs(x) {
+  if (typeof x === 'string') {
+    try { return JSON.parse(x) } catch { return {} }
+  }
+  return x || {}
+}
+const a = parseArgs(args)
 const stories = Array.isArray(a.stories) ? a.stories : []
 const sprintId = a.sprintId
 const velocity = a.velocity
@@ -126,10 +138,60 @@ if (!stories.length || !sprintId || typeof velocity !== 'number') {
   throw new Error('args must include { stories:[{storyId,projectType}], sprintId, velocity }')
 }
+// --- per-agent model tiers ----------------------------------------------------
+// Tiers come from pipeline-config.yaml `models` (a tier->roles map), passed in as
+// args.models by the invoking skill — a Workflow script can't read files. We invert it
+// to role->tier and overlay it on a baked-in default so the workflow self-hosts a sane
+// assignment even when args.models is absent. Static + args only => journal-replay safe.
+//   readiness gate -> opus, spec/estimators -> sonnet, CLI-runners/IO -> haiku.
+const DEFAULT_MODELS = {
+  READINESS: 'opus',
+  REQS: 'sonnet', UXA: 'sonnet', 'QA-A': 'sonnet',
+  BEND: 'sonnet', FEND: 'sonnet', DATA: 'sonnet', 'MCP-DEV': 'sonnet',
+  LIBDEV: 'sonnet', DOCGEN: 'sonnet', IAC: 'sonnet', MOBILE: 'sonnet',
+  PERSIST: 'haiku', PACK: 'haiku', VALIDATE: 'haiku',
+}
+function buildModelMap(cfg) {
+  const map = { ...DEFAULT_MODELS }
+  if (cfg && typeof cfg === 'object' && !Array.isArray(cfg)) {
+    for (const tier of ['opus', 'sonnet', 'haiku']) {
+      for (const role of cfg[tier] || []) {
+        if (typeof role === 'string') map[role.toUpperCase()] = tier
+      }
+    }
+  }
+  return map
+}
+const MODELS = buildModelMap(a.models)
+// undefined => the agent inherits the main-loop (session) model.
+const modelFor = (role) => MODELS[String(role).toUpperCase()]
+// --- per-agent reasoning effort (thinking budget) ----------------------------
+// Optional, config-driven, mirrors `models`. args.reasoning is a level->roles map inverted to
+// role->trigger-phrase. EMPTY default => no trigger injected unless the config opts in, so
+// behavior is unchanged out of the box. Static + args only => journal-replay safe.
+const REASONING_PHRASES = { think: 'think', 'think-hard': 'think hard', 'think-harder': 'think harder', ultrathink: 'ultrathink' }
+const DEFAULT_REASONING = {} // blank control surface — fill `reasoning` in pipeline-config.yaml to use it
+function buildReasoningMap(cfg) {
+  const map = { ...DEFAULT_REASONING }
+  if (cfg && typeof cfg === 'object' && !Array.isArray(cfg)) {
+    for (const level of Object.keys(REASONING_PHRASES)) {
+      for (const role of cfg[level] || []) {
+        if (typeof role === 'string') map[role.toUpperCase()] = REASONING_PHRASES[level]
+      }
+    }
+  }
+  return map
+}
+const REASONING = buildReasoningMap(a.reasoning)
+const reasoningFor = (role) => REASONING[String(role).toUpperCase()]
 function buildPrompt({ role, promptFile, storyId, taskSubject, trigger, returnContract }) {
   const outputDir = `stories/${storyId}/output`
+  const think = reasoningFor(role) // undefined unless config opts this role into a thinking tier
   return [
     `You are **${role}**, for story ${storyId} in the valent-pipeline (sprint ${sprintId} planning).`,
+    ...(think ? ['', `Before you act, ${think} about the hardest parts of this task.`] : []),
     '',
     '## Setup',
     `1. Read your core prompt: \`.valent-pipeline/prompts/${promptFile}\` — identity, protocols, step sequence.`,
@@ -161,7 +223,7 @@ const groomed = await pipeline(
         taskSubject: 'Tag testing_profiles for this story, then produce reqs-brief.md.',
         returnContract: 'Return ONLY { schema:1, agent:"reqs", story, testing_profiles:[...], files:[...] } as JSON.',
       }),
-      { label: `reqs:${story.storyId}`, phase: 'Groom', schema: REQS_GROOM_SCHEMA },
+      { label: `reqs:${story.storyId}`, phase: 'Groom', schema: REQS_GROOM_SCHEMA, model: modelFor('REQS') },
     )
     return { ...story, profiles: r.testing_profiles || [] }
   },
@@ -170,7 +232,7 @@ const groomed = await pipeline(
     if (g.profiles.includes('ui')) {
       await agent(
         buildPrompt({ role: 'UXA', promptFile: 'uxa.md', storyId: g.storyId, taskSubject: 'Translate the brief into uxa-spec.md.' }),
-        { label: `uxa:${g.storyId}`, phase: 'Groom', schema: HANDOFF_SCHEMA },
+        { label: `uxa:${g.storyId}`, phase: 'Groom', schema: HANDOFF_SCHEMA, model: modelFor('UXA') },
       )
     }
     return g
@@ -179,7 +241,7 @@ const groomed = await pipeline(
   async (g) => {
     await agent(
       buildPrompt({ role: 'QA-A', promptFile: 'qa-a.md', storyId: g.storyId, taskSubject: 'Produce qa-test-spec.md before any code is written.' }),
-      { label: `qa-a:${g.storyId}`, phase: 'Groom', schema: HANDOFF_SCHEMA },
+      { label: `qa-a:${g.storyId}`, phase: 'Groom', schema: HANDOFF_SCHEMA, model: modelFor('QA-A') },
     )
     return g
   },
@@ -192,7 +254,7 @@ const groomed = await pipeline(
           role: 'READINESS', promptFile: 'readiness.md', storyId: g.storyId,
           taskSubject: 'Validate the spec chain (reqs/uxa/qa) is implementation-ready; run cross-story checks (sprint mode).',
         }),
-        { label: `gate:readiness:${g.storyId}`, phase: 'Groom', schema: VERDICT_SCHEMA },
+        { label: `gate:readiness:${g.storyId}`, phase: 'Groom', schema: VERDICT_SCHEMA, model: modelFor('READINESS') },
       )
       if (v.verdict === 'pass') return { ...g, groomedStatus: 'groomed' }
       rejections += 1
@@ -204,7 +266,7 @@ const groomed = await pipeline(
       log(`${g.storyId}: readiness rejection ${rejections}/${maxRejectionCycles} -> ${target}`)
       await agent(
         buildPrompt({ role: target, promptFile: `${target.toLowerCase()}.md`, storyId: g.storyId, taskSubject: 'Address the READINESS rejection and rewrite the affected spec.' }),
-        { label: `rework:${target.toLowerCase()}:${g.storyId}`, phase: 'Groom', schema: HANDOFF_SCHEMA },
+        { label: `rework:${target.toLowerCase()}:${g.storyId}`, phase: 'Groom', schema: HANDOFF_SCHEMA, model: modelFor(target) },
       )
     }
   },
@@ -227,7 +289,7 @@ const sized = await parallel(
             taskSubject: 'Estimate this story (read your estimate.md step; apply calibration directives if present).',
             returnContract: 'Return ONLY { schema:1, agent, story, points:<int> } as JSON.',
           }),
-          { label: `estimate:${est.toLowerCase()}:${g.storyId}`, phase: 'Size', schema: ESTIMATE_SCHEMA },
+          { label: `estimate:${est.toLowerCase()}:${g.storyId}`, phase: 'Size', schema: ESTIMATE_SCHEMA, model: modelFor(est) },
         )),
     ).then((ests) => ({
       ...g,
@@ -244,15 +306,15 @@ await agent(
   `Update \`${backlogPath}\`: for each of these stories set \`story_points\` and \`status: groomed\`, ` +
     `and write \`testing_profiles\`. Stories (JSON): ${JSON.stringify(sizedStories.map((s) => ({ id: s.storyId, story_points: s.points, testing_profiles: s.profiles })))}. ` +
     `Return your \`valent:handoff\` machine block fields as JSON.`,
-  { label: 'persist-sizing', phase: 'Persist', schema: HANDOFF_SCHEMA },
+  { label: 'persist-sizing', phase: 'Persist', schema: HANDOFF_SCHEMA, model: modelFor('PERSIST') },
 )
 phase('Pack')
 // Deterministic greedy packing happens in code (src/lib/sprint.js), invoked via the CLI.
 const pack = await agent(
-  `Run exactly: \`valent-pipeline sprint-pack --velocity ${velocity} --backlog ${backlogPath}\` ` +
+  `Run exactly: \`node .valent-pipeline/bin/cli.js sprint-pack --velocity ${velocity} --backlog ${backlogPath}\` ` +
     `in the project root and return its stdout JSON verbatim (fields: sprint_stories, buffer_story_ids, points_planned, remaining_capacity).`,
-  { label: 'sprint-pack', phase: 'Pack', schema: PACK_SCHEMA },
+  { label: 'sprint-pack', phase: 'Pack', schema: PACK_SCHEMA, model: modelFor('PACK') },
 )
 log(`packed ${pack.sprint_stories.length} stories (${pack.points_planned} pts); buffer: ${pack.buffer_story_ids.length}`)
@@ -262,9 +324,9 @@ const validation = await agent(
   `For sprint ${sprintId}: (1) write \`sprint-${sprintId}-plan.md\` from \`.valent-pipeline/templates/sprint-plan.template.md\` ` +
     `and \`sprint-${sprintId}-status.yaml\` from the status template for the packed stories ${JSON.stringify(pack.sprint_stories)}; ` +
     `(2) tag those stories \`sprint: ${sprintId}\` + \`status: sprint-planned\` in \`${backlogPath}\`; ` +
-    `(3) run \`valent-pipeline validate-sprint --status sprint-${sprintId}-status.yaml --backlog ${backlogPath}\` and ` +
+    `(3) run \`node .valent-pipeline/bin/cli.js validate-sprint --status sprint-${sprintId}-status.yaml --backlog ${backlogPath}\` and ` +
     `return its result as JSON { valid:boolean, errors:[...] } (errors = the lines it printed on failure, else []).`,
-  { label: 'validate-sprint', phase: 'Validate', schema: VALIDATE_SCHEMA },
+  { label: 'validate-sprint', phase: 'Validate', schema: VALIDATE_SCHEMA, model: modelFor('VALIDATE') },
 )
 if (!validation.valid) {
   throw new Error(`sprint ${sprintId} plan failed validation: ${(validation.errors || []).join('; ')}`)
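
To make the argument normalization and the optional reasoning trigger in this file concrete, here is a small worked example. `parseArgs`, `REASONING_PHRASES`, and `buildReasoningMap` follow the diff above (with the empty default inlined); the sample JSON string and the `thinkLine` helper are assumptions added for illustration.

```javascript
// Normalize args that may arrive as a parsed object or as a JSON string.
function parseArgs(x) {
  if (typeof x === 'string') {
    try { return JSON.parse(x) } catch { return {} }
  }
  return x || {}
}

const REASONING_PHRASES = { think: 'think', 'think-hard': 'think hard', 'think-harder': 'think harder', ultrathink: 'ultrathink' }

// Invert a level->roles map into role->trigger phrase; a missing config yields an empty map.
function buildReasoningMap(cfg) {
  const map = {}
  if (cfg && typeof cfg === 'object' && !Array.isArray(cfg)) {
    for (const level of Object.keys(REASONING_PHRASES)) {
      for (const role of cfg[level] || []) {
        if (typeof role === 'string') map[role.toUpperCase()] = REASONING_PHRASES[level]
      }
    }
  }
  return map
}

// Hypothetical sample input: the harness hands over a JSON string rather than an object.
const a = parseArgs('{"sprintId":"S1","reasoning":{"think-hard":["readiness"]}}')
const REASONING = buildReasoningMap(a.reasoning)
const reasoningFor = (role) => REASONING[String(role).toUpperCase()]

// Hypothetical helper: the optional prompt line, phrased the way buildPrompt() phrases it.
const thinkLine = (role) => {
  const think = reasoningFor(role)
  return think ? `Before you act, ${think} about the hardest parts of this task.` : null
}

console.log(thinkLine('READINESS')) // Before you act, think hard about the hardest parts of this task.
console.log(thinkLine('REQS'))      // null (no trigger unless the config opts the role in)
console.log(parseArgs('not json'))  // {}
```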

package/pipeline/orchestrators/claude-code/retro.workflow.js CHANGED

@@ -16,25 +16,28 @@
  *          guard) -> embed (CLI).
  *
  * The deterministic pieces are NOT in this script: calibration arithmetic is
- * `valent-pipeline calibrate` (src/lib/sprint.js); embedding is `valent-pipeline db embed`.
+ * `node .valent-pipeline/bin/cli.js calibrate` (src/lib/sprint.js); embedding is `node .valent-pipeline/bin/cli.js db embed`.
  * Both run through agents (a Workflow script has no CLI/fs access). The directive IMPACT
  * GATING and INVARIANT GUARD are deterministic policy, so they are enforced HERE in code —
  * the agent only proposes; the script decides what gets applied vs. surfaced for approval.
  *
- * args: { batchNumber, sprintId?, storyOutputDirs?: string[], dryRounds?: number, maxRounds?: number }
+ * args: { batchNumber, sprintId?, storyOutputDirs?: string[], dryRounds?: number, maxRounds?: number, models? }
  *   sprintId present => sprint-mode (calibration runs). dryRounds = consecutive empty rounds
- *   that end the loop-until-dry (default 2). maxRounds caps it (default 5).
+ *   that end the loop-until-dry (default 2). maxRounds caps it (default 5). `models` is the
+ *   pipeline-config.yaml `models` tier->roles map, passed through by the invoking skill so
+ *   per-agent model tiers stay config-driven (editable via `valent configure`). Omit it to use
+ *   the baked-in default. See sprint.workflow.js for the full rationale.
  */
 export const meta = {
   name: 'valent-retro',
   description: 'Retrospective: calibrate, loop-until-dry aggregate review, gated directives, embed (Workflow)',
   phases: [
-    { title: 'Calibrate', detail: 'valent-pipeline calibrate (estimation accuracy, in code) — sprint mode' },
+    { title: 'Calibrate', detail: 'node .valent-pipeline/bin/cli.js calibrate (estimation accuracy, in code) — sprint mode' },
     { title: 'Analyze', detail: 'CRITIC/QA/JUDGE batch outputs + cost' },
     { title: 'Aggregate', detail: 'loop-until-dry 3-pass aggregate review + completeness critic (R5)' },
     { title: 'Directives', detail: 'agent proposes; code enforces impact gating + invariant guard' },
-    { title: 'Embed', detail: 'valent-pipeline db embed (persist curated patterns)' },
+    { title: 'Embed', detail: 'node .valent-pipeline/bin/cli.js db embed (persist curated patterns)' },
   ],
 }
@@ -109,17 +112,78 @@ const HANDOFF_SCHEMA = {
 // --- args ---
-const a = args || {}
+// args may arrive as a parsed object or as a JSON string, depending on how the invoking
+// skill/harness passes it. Normalize defensively so `a.batchNumber` etc. resolve either way.
+function parseArgs(x) {
+  if (typeof x === 'string') {
+    try { return JSON.parse(x) } catch { return {} }
+  }
+  return x || {}
+}
+const a = parseArgs(args)
 const batchNumber = a.batchNumber
 const sprintId = a.sprintId || null
 const dryRounds = a.dryRounds ?? 2
 const maxRounds = a.maxRounds ?? 5
 if (batchNumber == null) throw new Error('args must include { batchNumber }')
-const retroPrompt = (instruction, returnContract) =>
-  `You are **RETROSPECTIVE**, analyzing story batch ${batchNumber} in the valent-pipeline. ` +
-  `Read \`.valent-pipeline/prompts/retrospective.md\` and the step file named in the task. ${instruction} ` +
-  (returnContract || 'Return your findings as the JSON object specified.')
+// --- per-agent model tiers ----------------------------------------------------
+// Tiers come from pipeline-config.yaml `models` (a tier->roles map), passed in as
+// args.models by the invoking skill — a Workflow script can't read files. We invert it
+// to role->tier and overlay it on a baked-in default so the workflow self-hosts a sane
+// assignment even when args.models is absent. Static + args only => journal-replay safe.
+// Retro stages map to synthetic role keys (not the single RETROSPECTIVE persona) so each
+// stage can be tuned independently: the loop-until-dry aggregate review + completeness
+// critic are the genuine quality work (RETRO-REVIEW -> opus); analyze/directives are
+// lighter (RETRO -> sonnet); calibrate/embed/IO are mechanical (haiku).
+const DEFAULT_MODELS = {
+  'RETRO-REVIEW': 'opus',
+  RETRO: 'sonnet',
+  CALIBRATE: 'haiku', EMBED: 'haiku', PERSIST: 'haiku',
+}
+function buildModelMap(cfg) {
+  const map = { ...DEFAULT_MODELS }
+  if (cfg && typeof cfg === 'object' && !Array.isArray(cfg)) {
+    for (const tier of ['opus', 'sonnet', 'haiku']) {
+      for (const role of cfg[tier] || []) {
+        if (typeof role === 'string') map[role.toUpperCase()] = tier
+      }
+    }
+  }
+  return map
+}
+const MODELS = buildModelMap(a.models)
+// undefined => the agent inherits the main-loop (session) model.
+const modelFor = (role) => MODELS[String(role).toUpperCase()]
+// --- reasoning effort (thinking budget) --------------------------------------
+// Optional, config-driven, mirrors `models`. args.reasoning is a level->roles map inverted to
+// role->trigger-phrase. EMPTY default => nothing injected unless the config opts in. Every agent
+// here shares the RETROSPECTIVE identity, so the knob keys on that role. Static + args only.
+const REASONING_PHRASES = { think: 'think', 'think-hard': 'think hard', 'think-harder': 'think harder', ultrathink: 'ultrathink' }
+const DEFAULT_REASONING = {} // blank control surface — fill `reasoning` in pipeline-config.yaml to use it
+function buildReasoningMap(cfg) {
+  const map = { ...DEFAULT_REASONING }
+  if (cfg && typeof cfg === 'object' && !Array.isArray(cfg)) {
+    for (const level of Object.keys(REASONING_PHRASES)) {
+      for (const role of cfg[level] || []) {
+        if (typeof role === 'string') map[role.toUpperCase()] = REASONING_PHRASES[level]
+      }
+    }
+  }
+  return map
+}
+const REASONING = buildReasoningMap(a.reasoning)
+const reasoningFor = (role) => REASONING[String(role).toUpperCase()]
+const retroPrompt = (instruction, returnContract) => {
+  const think = reasoningFor('RETROSPECTIVE') // undefined unless config opts RETROSPECTIVE into a tier
+  return `You are **RETROSPECTIVE**, analyzing story batch ${batchNumber} in the valent-pipeline. ` +
+    (think ? `Before you act, ${think} about the hardest parts of this task. ` : '') +
+    `Read \`.valent-pipeline/prompts/retrospective.md\` and the step file named in the task. ${instruction} ` +
+    (returnContract || 'Return your findings as the JSON object specified.')
+}
 // A stable de-dup key so loop-until-dry converges (don't re-count the same finding).
 const findingKey = (f) => `${(f.summary || '').toLowerCase().trim().slice(0, 80)}`
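
The interaction between `findingKey`, `dryRounds`, and `maxRounds` can be sketched with a synchronous stand-in for the agent calls. This is an assumption-laden illustration: `loopUntilDry` is a hypothetical name, each array entry stands in for one round's agent output, and the rule that the dry counter resets when a round produces fresh findings is a guess at the behavior, since that part of the script is not visible in this diff.

```javascript
// Same de-dup key as the workflow: lowercased, trimmed, first 80 characters of the summary.
const findingKey = (f) => `${(f.summary || '').toLowerCase().trim().slice(0, 80)}`

// Hypothetical stand-in for the aggregate loop: `rounds[i]` is the findings list of round i.
function loopUntilDry(rounds, { dryRounds = 2, maxRounds = 5 } = {}) {
  const seen = new Set()
  const kept = []
  let dry = 0
  let round = 0
  while (dry < dryRounds && round < maxRounds) {
    const findings = rounds[round] || []
    round += 1
    const fresh = findings.filter((f) => !seen.has(findingKey(f)))
    if (!fresh.length) {
      dry += 1 // an empty round counts toward termination
      continue
    }
    dry = 0 // assumption: fresh findings reset the consecutive-dry counter
    for (const f of fresh) {
      seen.add(findingKey(f))
      kept.push(f)
    }
  }
  return { kept, roundsRun: round }
}

const result = loopUntilDry([
  [{ summary: 'Flaky tests' }],
  [{ summary: ' flaky TESTS ' }], // same key as round 1, so this round is dry
  [],
])
console.log(result.kept.length, result.roundsRun) // 1 3
```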
@@ -131,9 +195,9 @@ if (sprintId) {
   phase('Calibrate')
   // Estimation-accuracy arithmetic lives in code (src/lib/sprint.js); run it via the CLI.
   calibration = await agent(
-    `Run exactly: \`valent-pipeline calibrate --sprint ${sprintId}\` in the project root and return its stdout JSON verbatim ` +
+    `Run exactly: \`node .valent-pipeline/bin/cli.js calibrate --sprint ${sprintId}\` in the project root and return its stdout JSON verbatim ` +
       `(fields: ratios, flagged_pairs, surface_averages, velocity). This feeds calibration directives.`,
-    { label: 'calibrate', phase: 'Calibrate', schema: { type: 'object', additionalProperties: true } },
+    { label: 'calibrate', phase: 'Calibrate', schema: { type: 'object', additionalProperties: true }, model: modelFor('CALIBRATE') },
   )
   log(`calibration: ${(calibration.flagged_pairs || []).length} flagged pair(s); velocity unstable=${calibration.velocity?.unstable}`)
 }
@@ -144,7 +208,7 @@ await agent(
     'Run analyze.md: read all CRITIC reviews, QA-B bug reports, JUDGE rejections, and cost data; categorize rejection/bug patterns.',
     'Return ONLY { schema:1, findings:[{id,summary,severity,stories}] } as JSON.',
   ),
-  { label: 'analyze', phase: 'Analyze', schema: FINDINGS_SCHEMA },
+  { label: 'analyze', phase: 'Analyze', schema: FINDINGS_SCHEMA, model: modelFor('RETRO') },
 )
 phase('Aggregate')
@@ -164,7 +228,7 @@ while (dry < dryRounds && round < maxRounds) {
         `Report ONLY findings not already reported in earlier rounds.`,
       'Return ONLY { schema:1, findings:[{id,summary,severity,stories}] } as JSON.',
     ),
-    { label: `aggregate:round-${round}`, phase: 'Aggregate', schema: FINDINGS_SCHEMA },
+    { label: `aggregate:round-${round}`, phase: 'Aggregate', schema: FINDINGS_SCHEMA, model: modelFor('RETRO-REVIEW') },
   )
   const fresh = (r.findings || []).filter((f) => !seen.has(findingKey(f)))
   if (!fresh.length) {
@@ -187,7 +251,7 @@ const critic = await agent(
       `List only genuine gaps — empty if coverage is complete.`,
     'Return ONLY { schema:1, gaps:["..."] } as JSON.',
   ),
-  { label: 'completeness-critic', phase: 'Aggregate', schema: COMPLETENESS_SCHEMA },
+  { label: 'completeness-critic', phase: 'Aggregate', schema: COMPLETENESS_SCHEMA, model: modelFor('RETRO-REVIEW') },
 )
 if ((critic.gaps || []).length) {
   log(`completeness-critic surfaced ${critic.gaps.length} gap(s) — running targeted reviews`)
@@ -196,7 +260,7 @@ if ((critic.gaps || []).length) {
       agent(
         retroPrompt(`Targeted aggregate review for the previously-uncovered angle: "${gap}". Report only findings not already reported.`,
           'Return ONLY { schema:1, findings:[{id,summary,severity,stories}] } as JSON.'),
-        { label: `aggregate:gap-${i + 1}`, phase: 'Aggregate', schema: FINDINGS_SCHEMA },
+        { label: `aggregate:gap-${i + 1}`, phase: 'Aggregate', schema: FINDINGS_SCHEMA, model: modelFor('RETRO-REVIEW') },
       )),
   )
   for (const r of extra.filter(Boolean)) {
@@ -222,7 +286,7 @@ const drafted = await agent(
       `propose it and flag it; the orchestrator decides what gets applied.`,
     'Return ONLY { schema:1, directives:[{target_agent,directive,reason,impact_level,touchesInvariant,category}] } as JSON.',
   ),
-  { label: 'draft-directives', phase: 'Directives', schema: DIRECTIVES_SCHEMA },
+  { label: 'draft-directives', phase: 'Directives', schema: DIRECTIVES_SCHEMA, model: modelFor('RETRO') },
 )
 const all = drafted.directives || []
@@ -239,7 +303,7 @@ if (applied.length) {
     `Append these APPROVED correction directives to \`correction-directives.yaml\` (status: active, created_batch: ${batchNumber}). ` +
       `They have passed the impact gate (low/medium only). Directives (JSON): ${JSON.stringify(applied)}. ` +
       `Return { schema:1 } when done.`,
-    { label: 'apply-directives', phase: 'Directives', schema: HANDOFF_SCHEMA },
+    { label: 'apply-directives', phase: 'Directives', schema: HANDOFF_SCHEMA, model: modelFor('PERSIST') },
   )
 }
 if (proposals.length) {
@@ -248,7 +312,7 @@ if (proposals.length) {
     `Write these directive PROPOSALS to \`retrospective-batch-${batchNumber}.md\` under "## Pending Approval" — do NOT add them to ` +
       `correction-directives.yaml. For each, document the proposed directive, why it needs approval (architecture-conflict or high-impact), ` +
       `evidence, risk, and an alternative. Proposals (JSON): ${JSON.stringify(proposals)}. Return { schema:1 } when done.`,
-    { label: 'surface-proposals', phase: 'Directives', schema: HANDOFF_SCHEMA },
+    { label: 'surface-proposals', phase: 'Directives', schema: HANDOFF_SCHEMA, model: modelFor('PERSIST') },
   )
 }
@@ -257,8 +321,8 @@ phase('Embed')
 const embed = await agent(
   `Run embed-instructions.md: write \`embed-instructions.md\` (curated recurring patterns / novel decisions / bug patterns / ` +
     `broadly-applicable directives only — NOT one-offs) in the most recent story output dir, then run ` +
-    `\`valent-pipeline db embed --file <that path>\`. Return { schema:1, embedded:<int count> }.`,
-  { label: 'embed', phase: 'Embed', schema: { type: 'object', additionalProperties: true } },
+    `\`node .valent-pipeline/bin/cli.js db embed --file <that path>\`. Return { schema:1, embedded:<int count> }.`,
+  { label: 'embed', phase: 'Embed', schema: { type: 'object', additionalProperties: true }, model: modelFor('EMBED') },
 )
 return {