npm - valent-pipeline - Versions diffs - 0.3.3 → 0.4.1 - Mend

valent-pipeline 0.3.3 → 0.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (103)

package/bin/cli.js +80 -0
package/package.json +7 -5
package/pipeline/agents-manifest.yaml +23 -33
package/pipeline/docs/design/provider-adapter-guide.md +6 -7
package/pipeline/docs/knowledge-system.md +16 -18
package/pipeline/docs/lead-lifecycle.md +3 -12
package/pipeline/docs/npx-packaging.md +0 -1
package/pipeline/docs/template-skeleton.md +1 -1
package/pipeline/orchestrators/claude-code/README.md +99 -0
package/pipeline/orchestrators/claude-code/plan.workflow.js +284 -0
package/pipeline/orchestrators/claude-code/retro.workflow.js +274 -0
package/pipeline/orchestrators/claude-code/sprint.workflow.js +354 -0
package/pipeline/orchestrators/codex/README.md +52 -0
package/pipeline/orchestrators/codex/lead-loop.md +115 -0
package/pipeline/prompts/bend.md +12 -2
package/pipeline/prompts/critic.md +17 -8
package/pipeline/prompts/fend.md +12 -2
package/pipeline/prompts/judge.md +12 -2
package/pipeline/prompts/lead.md +231 -71
package/pipeline/prompts/qa-a.md +1 -1
package/pipeline/prompts/qa-b.md +12 -2
package/pipeline/prompts/reqs.md +1 -1
package/pipeline/prompts/uxa.md +1 -1
package/pipeline/providers/claude-code/runtime.md +31 -10
package/pipeline/providers/codex/AGENTS.md +8 -3
package/pipeline/providers/codex/cloud-task-prompts/implementation.md +2 -0
package/pipeline/providers/codex/codex-project-files/.codex/agents/review-explorer.toml +2 -2
package/pipeline/providers/codex/runtime.md +91 -208
package/pipeline/providers/codex/spawn.template.md +3 -1
package/pipeline/schemas/handoff.schema.json +19 -0
package/pipeline/schemas/task-graph.schema.json +53 -0
package/pipeline/schemas/verdict.schema.json +20 -0
package/pipeline/scripts/query-kb.ts +1 -1
package/pipeline/spawn-templates/pipeline-context.template.md +1 -3
package/pipeline/steps/bend/read-inputs.md +2 -5
package/pipeline/steps/common/agent-protocol.md +9 -1
package/pipeline/steps/common/distilled-handoff-format.md +15 -0
package/pipeline/steps/critic/acceptance-audit.md +1 -1
package/pipeline/steps/critic/edge-case-hunt.md +2 -2
package/pipeline/steps/critic/triage.md +2 -2
package/pipeline/steps/data/read-inputs.md +2 -5
package/pipeline/steps/docgen/read-inputs.md +2 -5
package/pipeline/steps/fend/read-inputs.md +2 -5
package/pipeline/steps/iac/read-inputs.md +2 -5
package/pipeline/steps/libdev/read-inputs.md +2 -5
package/pipeline/steps/mcp-dev/read-inputs.md +2 -5
package/pipeline/steps/mobile/read-inputs.md +2 -5
package/pipeline/steps/orchestration/adopt-lead-and-create-team.md +107 -33
package/pipeline/steps/orchestration/sprint-execute.md +30 -10
package/pipeline/steps/orchestration/sprint-plan.md +28 -31
package/pipeline/steps/orchestration/validate-story-inputs.md +1 -1
package/pipeline/steps/qa-a/read-inputs.md +2 -6
package/pipeline/steps/reqs/read-inputs.md +3 -7
package/pipeline/steps/retrospective/calibration.md +18 -31
package/pipeline/steps/uxa/read-inputs.md +2 -6
package/pipeline/task-graphs/backend-api.yaml +1 -9
package/pipeline/task-graphs/data-pipeline.yaml +1 -9
package/pipeline/task-graphs/document-generation.yaml +1 -9
package/pipeline/task-graphs/frontend-only.yaml +9 -16
package/pipeline/task-graphs/fullstack-web.yaml +11 -18
package/pipeline/task-graphs/library.yaml +1 -9
package/pipeline/task-graphs/mcp-server.yaml +1 -9
package/pipeline/task-graphs/mobile-app.yaml +8 -15
package/pipeline/templates/bend-handoff.template.md +11 -0
package/pipeline/templates/critic-review.template.md +15 -1
package/pipeline/templates/data-handoff.template.md +11 -0
package/pipeline/templates/docgen-handoff.template.md +11 -0
package/pipeline/templates/embed-instructions.template.md +1 -1
package/pipeline/templates/execution-report.template.md +11 -0
package/pipeline/templates/fend-handoff.template.md +11 -0
package/pipeline/templates/iac-handoff.template.md +11 -0
package/pipeline/templates/judge-decision.template.md +13 -0
package/pipeline/templates/libdev-handoff.template.md +11 -0
package/pipeline/templates/mcp-dev-handoff.template.md +11 -0
package/pipeline/templates/mobile-handoff.template.md +11 -0
package/pipeline/templates/qa-test-spec.template.md +11 -0
package/pipeline/templates/readiness-review.template.md +13 -0
package/pipeline/templates/reqs-brief.template.md +11 -0
package/pipeline/templates/retrospective.template.md +1 -1
package/pipeline/templates/uxa-spec.template.md +11 -0
package/skills/valent-help/SKILL.md +2 -2
package/skills/valent-knowledge/SKILL.md +68 -0
package/skills/valent-run-epic/SKILL.md +4 -9
package/skills/valent-run-project/SKILL.md +4 -7
package/skills/valent-run-story/SKILL.md +13 -1
package/skills/valent-setup-backlog/SKILL.md +3 -3
package/src/commands/calibrate.js +86 -0
package/src/commands/init.js +1 -1
package/src/commands/rejection-cap.js +70 -0
package/src/commands/resolve-graph.js +79 -0
package/src/commands/sprint-pack.js +62 -0
package/src/commands/validate-handoff.js +32 -0
package/src/commands/validate-sprint.js +55 -0
package/src/lib/config-schema.js +2 -2
package/src/lib/graph.js +98 -0
package/src/lib/handoff.js +99 -0
package/src/lib/rejection.js +38 -0
package/src/lib/sprint.js +312 -0
package/pipeline/prompts/knowledge.md +0 -94
package/pipeline/providers/claude-code/knowledge-spawn.template.md +0 -17
package/pipeline/providers/codex/codex-project-files/.codex/agents/knowledge-service.toml +0 -14
package/pipeline/providers/codex/knowledge-spawn.template.md +0 -19
package/pipeline/spawn-templates/knowledge-spawn.template.md +0 -17

package/bin/cli.js CHANGED

@@ -54,6 +54,86 @@ configCmd
     await validate();
   });
+// validate-handoff command
+program
+  .command('validate-handoff')
+  .description('Validate a handoff artifact\'s valent:handoff machine block against the schema')
+  .requiredOption('--file <path>', 'Path to the handoff markdown file')
+  .option('--gate', 'Force gate validation (verdict required + pass-requires-zero-Highs invariant)')
+  .action(async (options) => {
+    const { validateHandoffCmd } = await import('../src/commands/validate-handoff.js');
+    await validateHandoffCmd(options);
+  });
+// resolve-graph command
+program
+  .command('resolve-graph')
+  .description('Deterministically resolve a task graph against testing profiles (evaluate predicates, prune blockedBy)')
+  .option('--type <project-type>', 'Project type (resolves .valent-pipeline/task-graphs/<type>.yaml, falling back to packaged)')
+  .option('--file <path>', 'Explicit path to a task-graph YAML (overrides --type)')
+  .option('--profiles <list>', 'Comma-separated testing profiles, e.g. api,ui,iac', '')
+  .option('--validate-only', 'Validate the graph shape and references without resolving')
+  .action(async (options) => {
+    const { resolveGraphCmd } = await import('../src/commands/resolve-graph.js');
+    await resolveGraphCmd(options);
+  });
+// sprint-pack command (meta-loop: greedy story packing)
+program
+  .command('sprint-pack')
+  .description('Deterministically pack groomed stories into a sprint by priority within a velocity budget')
+  .requiredOption('--velocity <n>', 'Sprint capacity in story points')
+  .option('--backlog <path>', 'Backlog file (YAML/JSON); packs its `items`')
+  .option('--stories <path>', 'Explicit story array (YAML/JSON); overrides --backlog')
+  .action(async (options) => {
+    const { sprintPackCmd } = await import('../src/commands/sprint-pack.js');
+    await sprintPackCmd(options);
+  });
+// calibrate command (meta-loop: estimation-accuracy arithmetic)
+program
+  .command('calibrate')
+  .description('Compute calibration metrics (point/time ratios, deviation flags, velocity stability)')
+  .option('--sprint <id>', 'Sprint to pull calibration rows for (queries the SQLite store)')
+  .option('--db', 'Use all calibration rows from the store (no sprint filter)')
+  .option('--db-path <path>', 'Database path (defaults to config)')
+  .option('--data <path>', 'Explicit calibration rows (YAML/JSON); overrides the db source')
+  .option('--velocity-history <path>', 'Explicit velocity history (YAML/JSON) to pair with --data')
+  .option('--deviation-threshold <n>', 'Pairwise deviation flag threshold (default 0.5)')
+  .option('--cv-threshold <n>', 'Velocity coefficient-of-variation instability threshold (default 0.3)')
+  .action(async (options) => {
+    const { calibrateCmd } = await import('../src/commands/calibrate.js');
+    await calibrateCmd(options);
+  });
+// validate-sprint command (meta-loop: consistency cross-checks)
+program
+  .command('validate-sprint')
+  .description('Cross-check sprint status YAML and backlog for consistency (sprint-plan.md Step 6)')
+  .requiredOption('--status <path>', 'Sprint status YAML/JSON (machine-readable companion to the plan)')
+  .requiredOption('--backlog <path>', 'Backlog file (YAML/JSON)')
+  .option('--plan <path>', 'Optional structured plan (JSON/YAML), e.g. sprint-pack output; defaults to deriving from --status')
+  .action(async (options) => {
+    const { validateSprintCmd } = await import('../src/commands/validate-sprint.js');
+    await validateSprintCmd(options);
+  });
+// rejection-cap command (code-owned rejection cap for the prose/Codex shell)
+program
+  .command('rejection-cap')
+  .description('Track and enforce the per-story rejection cap in code (exits non-zero when tripped)')
+  .requiredOption('--story <id>', 'Story identifier')
+  .requiredOption('--gate <gate>', 'Gate name (readiness | critic | judge)')
+  .option('--agent <name>', 'Responsible agent the rejection is routed to (defaults to the gate)')
+  .option('--max <n>', 'Cap (max rejection cycles); defaults to 5')
+  .option('--increment', 'Record a new rejection (bump the counter), then report')
+  .option('--reset', 'Clear all counters for the story (call at a story boundary), then report')
+  .option('--state <path>', 'State file path (defaults to .valent-pipeline/rejection-state.json)')
+  .action(async (options) => {
+    const { rejectionCapCmd } = await import('../src/commands/rejection-cap.js');
+    await rejectionCapCmd(options);
+  });
 // db commands
 const dbCmd = program
   .command('db')
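
Reviewer note: the `sprint-pack` description above ("greedy story packing", "by priority within a velocity budget") can be illustrated with a minimal sketch. This is not the code in `src/lib/sprint.js`, which this diff does not show. The function name `packSprint`, the story shape `{ id, priority, points }`, the convention that a lower `priority` number is more urgent, and the choice to skip a story that does not fit rather than stop packing are all assumptions made for illustration.

```javascript
// Hypothetical sketch of greedy sprint packing; NOT the package's implementation.
// Assumed story shape: { id: string, priority: number, points: number },
// where a lower priority number means "more urgent" (assumption).
function packSprint(stories, velocity) {
  if (!Number.isFinite(velocity) || velocity < 0) {
    throw new RangeError('velocity must be a non-negative number');
  }

  // Sort a copy by priority, breaking ties by id so the result is deterministic.
  const ordered = [...stories].sort((a, b) => {
    if (a.priority !== b.priority) return a.priority - b.priority;
    if (a.id < b.id) return -1;
    if (a.id > b.id) return 1;
    return 0;
  });

  const packed = [];
  const deferred = [];
  let used = 0;

  // Greedy pass: take each story if it still fits in the remaining budget.
  for (const story of ordered) {
    if (used + story.points <= velocity) {
      packed.push(story.id);
      used += story.points;
    } else {
      deferred.push(story.id);
    }
  }

  return { packed, deferred, used, remaining: velocity - used };
}

// Usage with made-up stories and a 13-point budget.
const result = packSprint(
  [
    { id: 'S-3', priority: 2, points: 5 },
    { id: 'S-1', priority: 1, points: 8 },
    { id: 'S-2', priority: 1, points: 3 },
    { id: 'S-4', priority: 3, points: 2 },
  ],
  13
);
console.log(result);
```

Sorting with an explicit tie-break is what makes the output reproducible for the same input, which is the property the command description calls "deterministic". Whether the real command skips or stops at the first story that does not fit is not stated in this diff.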

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "valent-pipeline",
-  "version": "0.3.3",
+  "version": "0.4.1",
   "description": "v3 multi-agent AI pipeline for software development lifecycle",
   "type": "module",
   "bin": {
@@ -16,14 +16,16 @@
     "skills/"
   ],
   "scripts": {
-    "test": "node scripts/test-local.js",
-    "prepublishOnly": "node scripts/test-local.js"
+    "test": "node scripts/test-local.js && node scripts/test-workflow.js",
+    "prepublishOnly": "node scripts/test-local.js && node scripts/test-workflow.js"
   },
   "dependencies": {
+    "ajv": "^8.20.0",
+    "better-sqlite3": "^11.0.0",
     "commander": "^12.0.0",
     "inquirer": "^9.0.0",
-    "better-sqlite3": "^11.0.0",
-    "sqlite-vec": "^0.1.0"
+    "sqlite-vec": "^0.1.0",
+    "yaml": "^2.9.0"
   },
   "keywords": [
     "ai",

package/pipeline/agents-manifest.yaml CHANGED

@@ -75,7 +75,7 @@ agents:
   readiness:
     name: READINESS
-    model: sonnet
+    model: opus
     lifecycle: per-story
     role: "Spec quality gate — validates reqs, UXA spec, and test specs are implementation-ready"
     prompt_template: .valent-pipeline/prompts/readiness.md
@@ -85,8 +85,8 @@ agents:
   bend:
     name: BEND
-    model: sonnet
-    lifecycle: per-story
+    model: opus
+    lifecycle: per-sprint  # persists across stories; receives [STORY-RESET] between stories
     role: "Backend developer — implements production code and tests"
     prompt_template: .valent-pipeline/prompts/bend.md
     reads_from: [reqs-brief.md, qa-test-spec.md]
@@ -95,8 +95,8 @@ agents:
   fend:
     name: FEND
-    model: sonnet
-    lifecycle: per-story
+    model: opus
+    lifecycle: per-sprint  # persists across stories; receives [STORY-RESET] between stories
     role: "Frontend developer — implements UI components and tests"
     prompt_template: .valent-pipeline/prompts/fend.md
     reads_from: [reqs-brief.md, uxa-spec.md, qa-test-spec.md]
@@ -105,8 +105,8 @@ agents:
   mobile:
     name: MOBILE
-    model: sonnet
-    lifecycle: per-story
+    model: opus
+    lifecycle: per-sprint  # persists across stories; receives [STORY-RESET] between stories
     role: "Mobile developer — implements RN/Flutter screens, components, Maestro E2E flows"
     prompt_template: .valent-pipeline/prompts/mobile.md
     reads_from: [reqs-brief.md, uxa-spec.md, qa-test-spec.md]
@@ -115,8 +115,8 @@ agents:
   data:
     name: DATA
-    model: sonnet
-    lifecycle: per-story
+    model: opus
+    lifecycle: per-sprint  # persists across stories; receives [STORY-RESET] between stories
     role: "Data pipeline developer — implements ETL, transforms, data quality, checkpointing"
     prompt_template: .valent-pipeline/prompts/data.md
     reads_from: [reqs-brief.md, qa-test-spec.md]
@@ -125,8 +125,8 @@ agents:
   mcp_dev:
     name: MCP-DEV
-    model: sonnet
-    lifecycle: per-story
+    model: opus
+    lifecycle: per-sprint  # persists across stories; receives [STORY-RESET] between stories
     role: "Protocol developer — implements MCP server tools, JSON-RPC handlers, transport"
     prompt_template: .valent-pipeline/prompts/mcp-dev.md
     reads_from: [reqs-brief.md, qa-test-spec.md]
@@ -135,8 +135,8 @@ agents:
   libdev:
     name: LIBDEV
-    model: sonnet
-    lifecycle: per-story
+    model: opus
+    lifecycle: per-sprint  # persists across stories; receives [STORY-RESET] between stories
     role: "Library developer — implements public API, exports, packaging, type declarations"
     prompt_template: .valent-pipeline/prompts/libdev.md
     reads_from: [reqs-brief.md, qa-test-spec.md]
@@ -145,8 +145,8 @@ agents:
   docgen:
     name: DOCGEN
-    model: sonnet
-    lifecycle: per-story
+    model: opus
+    lifecycle: per-sprint  # persists across stories; receives [STORY-RESET] between stories
     role: "Document generation developer — implements templates, render pipeline, output formatting"
     prompt_template: .valent-pipeline/prompts/docgen.md
     reads_from: [reqs-brief.md, qa-test-spec.md]
@@ -155,8 +155,8 @@ agents:
   iac:
     name: IAC
-    model: sonnet
-    lifecycle: per-story
+    model: opus
+    lifecycle: per-sprint  # persists across stories; receives [STORY-RESET] between stories
     role: "Infrastructure developer — implements IaC definitions, deployment configs, infrastructure tests"
     prompt_template: .valent-pipeline/prompts/iac.md
     reads_from: [reqs-brief.md, qa-test-spec.md]
@@ -165,7 +165,7 @@ agents:
   critic:
     name: CRITIC
     model: opus
-    lifecycle: per-story
+    lifecycle: per-sprint  # persists across stories; receives [STORY-RESET] between stories
     role: "Code reviewer — 3-pass adversarial review of production and test code"
     prompt_template: .valent-pipeline/prompts/critic.md
     review_passes: [blind-hunt, edge-case-hunt, acceptance-audit, triage]
@@ -174,8 +174,8 @@ agents:
   qa_b:
     name: QA-B
-    model: sonnet
-    lifecycle: per-story
+    model: opus
+    lifecycle: per-sprint  # persists across stories; receives [STORY-RESET] between stories
     role: "Test executor — runs tests, validates spec alignment, files bugs"
     prompt_template: .valent-pipeline/prompts/qa-b.md
     reads_from: [qa-test-spec.md, critic-review.md, reqs-brief.md]
@@ -184,23 +184,13 @@ agents:
   judge:
     name: JUDGE
-    model: sonnet
-    lifecycle: per-story
+    model: opus
+    lifecycle: per-sprint  # persists across stories; receives [STORY-RESET] between stories
     role: "Final quality gate — bug priority review + evidence-based ship decision"
     prompt_template: .valent-pipeline/prompts/judge.md
     reads_from: [execution-report.md, traceability-matrix.md, pmcp-evidence.md, bugs.md, qa-test-spec.md]  # critic-review.md intentionally excluded — JUDGE validates test/execution evidence, not code review; qa-test-spec.md used as reference for assertion cross-check
     writes_to: [judge-review.md, judge-decision.md, story-report.md]
-  knowledge:
-    name: Knowledge
-    model: haiku
-    lifecycle: per-story
-    role: "Knowledge retrieval — answers queries from persistent data sources"
-    prompt_template: .valent-pipeline/prompts/knowledge.md
-    data_sources: [chromadb, curated-knowledge-files, correction-directives]
-    context_variables: [knowledge_mode, chromadb_host, chromadb_collection_prefix, curated_files_path, correction_directives]
-    # No writes_to — Knowledge Agent responds via inbox only, no file output
 ephemeral_agents:
   pmcp:
     name: PMCP
@@ -223,7 +213,7 @@ ephemeral_agents:
   retrospective:
     name: Retrospective
-    model: sonnet
+    model: opus
     role: "Batch reviewer — analyzes last N stories for recurring patterns"
     prompt_template: .valent-pipeline/prompts/retrospective.md
     spawned_by: lead

package/pipeline/docs/design/provider-adapter-guide.md CHANGED

@@ -24,11 +24,9 @@ pipeline/
     claude-code/
       runtime.md                   ← PROVIDER — Claude Code runtime operations
       spawn.template.md            ← PROVIDER — Claude Code spawn template
-      knowledge-spawn.template.md  ← PROVIDER — Claude Code knowledge spawn
     codex/
       runtime.md                   ← PROVIDER — Codex runtime operations
       spawn.template.md            ← PROVIDER — Codex spawn template
-      knowledge-spawn.template.md  ← PROVIDER — Codex knowledge spawn
       AGENTS.md                    ← PROVIDER — Codex repo-level instructions
       cloud-task-protocol.md       ← PROVIDER — Codex cloud execution protocol
       cloud-task-prompts/          ← PROVIDER — Codex cloud task templates
@@ -56,7 +54,8 @@ Lead's prompt (`lead.md`) defines WHEN and WHY. The runtime adapter defines HOW.
 |------|---------|
 | `runtime.md` | All runtime operations: initialization, task registry, agent spawning, signal delivery, monitoring, teardown |
 | `spawn.template.md` | Agent spawn prompt template — what each agent instance receives at startup |
-| `knowledge-spawn.template.md` | Knowledge agent spawn template — knowledge-specific initialization |
+> **Note:** Knowledge is a self-service skill (`valent-knowledge`), not an agent — there is no `knowledge-spawn.template.md`. `scripts/validate-provider-sync.js` enforces this inventory (spawn-template parity + manifest prompt resolution).
 ### Codex-Only Files
@@ -133,7 +132,6 @@ Entirely shared. Quality gates are orchestration logic (when to reject, where to
 1. Create `providers/{new-provider}/` with:
    - `runtime.md` — all runtime operations for the new provider
    - `spawn.template.md` — spawn template adapted for the provider's agent model
-   - `knowledge-spawn.template.md` — knowledge spawn adapted
 2. Update `src/lib/config-schema.js`:
    - Add new provider to `validProviders` array
@@ -158,9 +156,10 @@ Entirely shared. Quality gates are orchestration logic (when to reject, where to
 The `scripts/validate-provider-sync.js` script runs in CI before every publish. It checks:
-1. **Template parity** — `spawn.template.md` and `knowledge-spawn.template.md` exist in both provider directories
-2. **Agent coverage** — Both runtime.md files reference the same set of agents from `agents-manifest.yaml`
-3. **Structural consistency** — Both runtime.md files have the same major sections (Initialization, Task Registry, Agent Spawning, Signal Delivery, Monitoring, Teardown)
+1. **Template parity** — `spawn.template.md` exists in both providers, and any `*spawn.template.md` in one provider has a counterpart in the other
+2. **Manifest integrity** — every `prompt_template` declared in `agents-manifest.yaml` resolves to a real file
+3. **Agent coverage** — both runtime.md files reference the critical agents (REQS, BEND, FEND, CRITIC, QA-B, JUDGE)
+4. **Structural consistency** — both runtime.md files have the major sections (Initialization, Task Registry, Agent Spawning, Signal Delivery, Monitoring, Teardown)
 If any check fails, the publish is blocked. Fix the discrepancy, then re-push.
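
Reviewer note: the revised template-parity rule (check 1 above) can be sketched as a pure function over file listings. The real `scripts/validate-provider-sync.js` is not part of this diff, so the function name `checkTemplateParity`, its input shape (a map from provider name to the file names in that provider's directory), and the error strings are assumptions for illustration only.

```javascript
// Hypothetical sketch of the template-parity check only; NOT the actual
// scripts/validate-provider-sync.js. Input shape is an assumption:
// { providerName: ['file-a.md', 'file-b.md', ...], ... }
function checkTemplateParity(filesByProvider) {
  const REQUIRED = 'spawn.template.md';
  const providers = Object.keys(filesByProvider);
  const errors = [];

  for (const provider of providers) {
    const own = new Set(filesByProvider[provider]);

    // Rule 1a: every provider must ship the base spawn template.
    if (!own.has(REQUIRED)) {
      errors.push(`${provider}: missing ${REQUIRED}`);
    }

    // Rule 1b: any other *spawn.template.md found in one provider
    // needs a counterpart in every other provider.
    for (const other of providers) {
      if (other === provider) continue;
      for (const name of filesByProvider[other]) {
        if (name === REQUIRED || !name.endsWith('spawn.template.md')) continue;
        if (!own.has(name)) {
          errors.push(`${provider}: missing ${name} (present in ${other})`);
        }
      }
    }
  }

  return errors;
}

// Usage: after this release neither provider ships knowledge-spawn.template.md,
// so a listing like this one produces no errors.
console.log(
  checkTemplateParity({
    'claude-code': ['runtime.md', 'spawn.template.md'],
    codex: ['runtime.md', 'spawn.template.md', 'AGENTS.md'],
  })
);
```

Expressing the rule as "any `*spawn.template.md` needs a counterpart" rather than naming each template means the check did not need to change again when `knowledge-spawn.template.md` was deleted from both providers.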

package/pipeline/docs/knowledge-system.md CHANGED

@@ -4,7 +4,7 @@ Reference documentation for the v3 pipeline knowledge subsystem -- how the pipel
 ## 1. Architecture Overview
-The knowledge system has three data sources, three agents, and one principle: the Retrospective Agent is the sole gatekeeper for what enters persistent knowledge.
+The knowledge system has three data sources, two curation agents, and one principle: the Retrospective Agent is the sole gatekeeper for what enters persistent knowledge. Agents self-serve from these data sources directly during their read-inputs step — there is no separate Knowledge Agent.
 ### Data Sources
@@ -12,15 +12,14 @@ The knowledge system has three data sources, three agents, and one principle: th
 |--------|--------|---------|
 | **Curated knowledge files** | Markdown in `.valent-pipeline/knowledge/curated/` | Conventions, validated patterns, known pitfalls, test stability data |
 | **Correction directives** | YAML in `.valent-pipeline/knowledge/correction-directives.yaml` | Behavioral changes for agents -- translates observations into prompt-level guidance |
-| **ChromaDB** (optional) | Vector store via Docker or remote host | Embedding-based retrieval for code patterns and build artifacts |
+| **SQLite database** (optional) | SQLite via CLI | Indexed artifacts, full-text search, cross-story queries |
-### Agents
+### Curation Agents
 | Agent | Model | Lifecycle | Role |
 |-------|-------|-----------|------|
-| **Knowledge** | Haiku | Per-story | Reads all three sources, responds to teammate queries via inbox |
 | **Retrospective** | Sonnet | Ephemeral (every N stories) | Sole gatekeeper -- analyzes batch outputs, writes correction directives and embed instructions |
-| **Embed** | Haiku | Ephemeral (after Retrospective) | Executes indexing instructions -- writes to ChromaDB and/or curated files |
+| **Embed** | Haiku | Ephemeral (after Retrospective) | Executes indexing instructions -- writes to curated files and/or SQLite |
 ### Data Flow
@@ -33,14 +32,13 @@ Retrospective Agent
     |--- writes  ---> embed-instructions.md
     v
 Embed Agent
-    |--- indexes ---> ChromaDB collections (if configured)
+    |--- indexes ---> SQLite database (if configured)
     |--- writes  ---> .valent-pipeline/knowledge/curated/ files
     v
-Knowledge Agent (next story)
+Pipeline agents (next story)
     |--- reads   ---> correction directives (active only)
     |--- reads   ---> curated files
-    |--- queries ---> ChromaDB (if configured)
-    |--- responds --> teammate queries via inbox
+    |--- queries ---> SQLite (if configured)
 ```
 ---
@@ -118,13 +116,13 @@ No per-story indexing occurs. This is the core design decision that prevents ind
 6. **Lead spawns Embed Agent** after Retrospective completes. Embed reads the manifest and executes indexing. No lead interpretation needed.
-7. **Knowledge Agent** (spawned fresh each story) reads active correction directives and curated files, then responds to queries during the story.
+7. **Pipeline agents** (next story) read active correction directives and curated files directly during their read-inputs step.
 ---
 ## 4. RAG Assessment Framework
-Before investing further in ChromaDB-based RAG, run a Knowledge Retrieval Audit after 5-10 stories with the Knowledge Agent active.
+Before investing further in ChromaDB-based RAG, run a Knowledge Retrieval Audit after 5-10 stories with the knowledge system active.
 ### Three Failure Modes
@@ -132,15 +130,15 @@ Before investing further in ChromaDB-based RAG, run a Knowledge Retrieval Audit
 2. **Index pollution.** Without garbage collection or versioning, ChromaDB collections accumulate stale and contradictory entries. The Retrospective-gated curation directly addresses this.
-3. **Brief quality.** Does BEND perform measurably better with the Knowledge Agent's brief than without it? If not, those 2-3k tokens of context are displacing something more useful.
+3. **Brief quality.** Does BEND perform measurably better with knowledge context than without it? If not, those 2-3k tokens of context are displacing something more useful.
 ### Assessment Questions
 | Question | How to Measure | Implication |
 |----------|---------------|-------------|
-| Do agents actually query the Knowledge Agent mid-task? | Count on-demand queries per story across last 10 stories | If near-zero, agents are not finding it useful |
+| Do agents actually use knowledge data during tasks? | Check if agents reference knowledge sources in their frontmatter across last 10 stories | If near-zero, agents are not finding it useful |
 | Do startup briefs reduce rejection cycles? | Compare CRITIC rejection rates for stories with vs without relevant prior patterns | If no difference, briefs are not helping |
-| Are retrieval results relevant? | Sample 20 Knowledge Agent queries, manually rate top-3 results for relevance | If <50% relevant, embedding strategy needs work |
+| Are retrieval results relevant? | Sample 20 knowledge queries, manually rate top-3 results for relevance | If <50% relevant, embedding strategy needs work |
 | Is index pollution growing? | Count contradictory entries in `corrections` collection | If significant, need versioning/expiry |
 ### Three Possible Outcomes
@@ -152,13 +150,13 @@ Before investing further in ChromaDB-based RAG, run a Knowledge Retrieval Audit
 **B. RAG is noise -- simplify to curated context:**
 - Replace ChromaDB with curated knowledge files maintained by Retrospective Agent
-- Knowledge Agent becomes a simple file reader, not a retrieval system
+- Knowledge becomes simple file reading, not a retrieval system
 - Cheaper, more predictable, easier to debug
 **C. RAG is partially working -- hybrid approach:**
 - Keep ChromaDB for `source-code` and `build-patterns` collections (embedding similarity works for code)
 - Move `corrections`, `conventions`, and `qa-lessons` to curated files (human-readable, not embedding-dependent)
-- Knowledge Agent uses both: curated files for startup briefs, ChromaDB for on-demand "find similar code" queries
+- Agents use both: curated files for startup briefs, ChromaDB for on-demand "find similar code" queries
 ---
@@ -168,7 +166,7 @@ Configured via `knowledge.mode` in `pipeline-config.yaml`.
 ### `none` (default)
-- Knowledge Agent reads curated files + correction directives only
+- Agents read curated files + correction directives only
 - Embed Agent IS triggered but only writes to curated files (no ChromaDB operations)
 - Zero external dependencies
 - ChromaDB can be added later without pipeline changes
@@ -176,7 +174,7 @@ Configured via `knowledge.mode` in `pipeline-config.yaml`.
 ### `local-docker`
 - ChromaDB runs locally via `docker compose -f .valent-pipeline/docker-compose.chromadb.yml up -d`
-- Knowledge Agent connects to ChromaDB at the configured `chromadb_host` (typically `http://localhost:8000`)
+- Agents can connect to ChromaDB at the configured `chromadb_host` (typically `http://localhost:8000`)
 - Falls back to curated-only mode if ChromaDB is unreachable
 - Embed Agent indexes into both ChromaDB collections and curated files
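
Reviewer note: the two `knowledge.mode` values visible in this hunk, including the fallback to curated-only mode when ChromaDB is unreachable, can be sketched as a small source-selection function. The helper name `resolveKnowledgeSources`, the source labels, and the boolean `chromaReachable` argument are hypothetical. Only `none` and `local-docker` appear in the hunk, so any other mode is rejected here rather than guessed.

```javascript
// Hypothetical sketch of knowledge source selection; names are assumptions
// and this is not code from the package.
function resolveKnowledgeSources(mode, chromaReachable) {
  // Sources every mode reads, per the documented `none` behavior.
  const curatedOnly = ['curated-files', 'correction-directives'];

  switch (mode) {
    case 'none':
      // Zero external dependencies: curated files and directives only.
      return curatedOnly;
    case 'local-docker':
      // Use ChromaDB when reachable; otherwise fall back to curated-only.
      return chromaReachable ? [...curatedOnly, 'chromadb'] : curatedOnly;
    default:
      // Modes not shown in this hunk are deliberately not modeled.
      throw new Error(`mode not covered by this sketch: ${mode}`);
  }
}

console.log(resolveKnowledgeSources('none', false));
console.log(resolveKnowledgeSources('local-docker', true));
console.log(resolveKnowledgeSources('local-docker', false));
```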

package/pipeline/docs/lead-lifecycle.md CHANGED

@@ -8,9 +8,7 @@
 ### Persistent vs Per-Story Agents
-The lead is the **only persistent agent** in the pipeline. It carries `pipeline-state.json` and backlog position forward across stories. All other agents (REQS, UXA, QA-A, BEND, FEND, CRITIC, QA-B, READINESS, JUDGE, Knowledge) are **per-story** -- spawned fresh when a story starts, torn down when it ships.
-The Knowledge Agent's value is in its persistent data sources (ChromaDB collections and curated knowledge files on disk), not its conversation history. A fresh spawn reads from the same store.
+The lead is the **only persistent agent** in the pipeline. It carries `pipeline-state.json` and backlog position forward across stories. All other agents (REQS, UXA, QA-A, BEND, FEND, CRITIC, QA-B, READINESS, JUDGE) are **per-story** -- spawned fresh when a story starts, torn down when it ships. Knowledge is self-served by each agent directly from curated files and correction directives on disk.
 Ephemeral agents (PMCP, Embed, Retrospective, Help) are spawned on-demand for a specific task and killed when done. They are not teammates -- they do not receive inbox messages mid-story.
@@ -35,7 +33,7 @@ The lead validates the story input before spawning any teammates.
 - **Trigger map** -- enables UXA strategic validation (driving force cross-referencing). Without it, UXA runs in translation-only mode.
 - **Scenario outlines** -- enables scenario-driven UXA specs.
 - **Architecture decisions** -- enables REQS to incorporate technical constraints.
-- **Existing project context** -- codebase documentation, conventions, prior patterns. Loaded by Knowledge Agent if available.
+- **Existing project context** -- codebase documentation, conventions, prior patterns. Loaded from curated knowledge files.
 If required fields are missing, the story is rejected via CLI escalation (see Backlog Management below).
@@ -120,7 +118,7 @@ All code committed and pushed to the branch specified by the user. The pipeline
 2. Code committed and pushed to user-specified branch
 3. All agent outputs persist in the story folder (handoff files, reviews, bug reports, execution reports, PMCP evidence)
 4. Lead writes `story-report.md`: task completion times, rejection cycles, cost metrics
-5. Lead tears down all story teammates including Knowledge Agent
+5. Lead tears down all story teammates
 6. Lead persists -- carries pipeline state and backlog position forward
 7. Lead picks next story from backlog and returns to Phase 1 with a fresh story team
@@ -256,13 +254,6 @@ The lead manages the backlog as a dependency-aware queue, not a simple FIFO list
    - "You are replacing a crashed agent. Steps completed: [from frontmatter]. Prior work: [from handoff files on disk]. Resume from step: [next incomplete step]."
 7. Fresh teammate picks up from where the crashed agent left off
-### Crash Type: Knowledge Agent Crashes
-1. Spawn a new Knowledge Agent with the same role definition
-2. New agent has immediate access to ChromaDB and curated knowledge files (both on disk)
-3. On-demand queries are stateless by design -- no conversation history needed
-4. The Knowledge Agent is killed and respawned fresh per story anyway, so mid-story crashes are the only case that matters
 ### Crash Type: Lead Crashes
 1. Human restarts the lead (this is the one case requiring manual intervention)

package/pipeline/docs/npx-packaging.md CHANGED

@@ -27,7 +27,6 @@ The v3 pipeline splits into three categories of files:
 | `.valent-pipeline/task-graphs/frontend-only.yaml` | Pipeline infrastructure | Shipped with package |
 | `.valent-pipeline/spawn-templates/pipeline-context.template.md` | Pipeline infrastructure | Shipped with package; filled at runtime |
 | `.valent-pipeline/spawn-templates/agent-spawn.template.md` | Pipeline infrastructure | Shipped with package |
-| `.valent-pipeline/spawn-templates/knowledge-spawn.template.md` | Pipeline infrastructure | Shipped with package |
 | `.valent-pipeline/agents-manifest.yaml` | Pipeline infrastructure | Shipped with package; models section overridable via project config |
 | `.valent-pipeline/scripts/embed.ts` | Pipeline infrastructure | Shipped with package |
 | `.valent-pipeline/docker-compose.chromadb.yml` | Pipeline infrastructure | Shipped with package |

package/pipeline/docs/template-skeleton.md CHANGED

@@ -278,5 +278,5 @@ The 16 templates in `.valent-pipeline/templates/`, mapped to their producing age
 | `judge-decision.template.md` | JUDGE | Lead |
 | `story-report.template.md` | Lead | User |
 | `pmcp-evidence.template.md` | PMCP | JUDGE |
-| `retrospective.template.md` | Retrospective Agent | Lead, Knowledge Agent |
+| `retrospective.template.md` | Retrospective Agent | Lead, pipeline agents |
 | `embed-instructions.template.md` | Lead | Embed Agent |

package/pipeline/orchestrators/claude-code/README.md ADDED

@@ -0,0 +1,99 @@
+# Claude Code orchestrator (native Workflow)
+This is the Claude Code deployment of the valent-pipeline orchestrator, per the hybrid
+target in [`../../../docs-feedback/reimplementation-plan.md`](../../../docs-feedback/reimplementation-plan.md)
+(R3): the Claude Code provider runs a deterministic **Workflow script**, while the Codex
+provider keeps the markdown-skill Lead. Both consume the same shared substrate
+(`prompts/`, `steps/`, `task-graphs/`, `schemas/`, templates).
+## The three workflows
+| File | Step | Role |
+|---|---|---|
+| `plan.workflow.js` | 7 | Groom → size → pack → validate a set of pending stories into a planned sprint batch. Emits a batch shaped to feed straight into `sprint.workflow.js`. |
+| `sprint.workflow.js` | 4 + 6 | Execute a planned batch sequentially through the per-story pipeline with schema-validated gates. |
+| `retro.workflow.js` | 7 | Learn from a shipped batch: calibrate, loop-until-dry aggregate review, gated directives, embed. |
+They compose as `plan → sprint → retro`. The per-story pipeline is kept **inline** in
+`sprint.workflow.js` (not a nested `workflow()`), so the single `workflow()` nesting level
+stays free for a future sprint-cycle wrapper to call all three (reimplementation-plan §5b).
+## Status
+`sprint.workflow.js` implements **Steps 4 + 6** (R1 control flow, R4 gates-as-stages, the
+sprint batch loop, 3b parallel CRITIC, and full spawn-context prompts). `plan.workflow.js`
+and `retro.workflow.js` implement **Step 7**. **Step 8** (resume + state model, below) is
+wired. All three are control-flow-validated by `scripts/test-workflow.js` (21 scenarios,
+incl. a resume-safety lint), but:
+- It is **opt-in, not the default.** `skills/valent-run-story` still drives the prose Lead;
+  the Workflow runs only when the user opts in (see that skill's "Step 5 (alternative)").
+- It has **not been exercised end-to-end against a live story.** A Workflow runs via the
+  Workflow tool against a real project and spawns real agents; it cannot be unit-tested like
+  `src/lib/*`. Validate it against a `testResources/*` fixture before making it the default.
+## What it demonstrates
+| Concern | How | Replaces |
+|---|---|---|
+| DAG resolution | spawns an agent that runs `resolve-graph` (step 2) per story | Lead transcribing + pruning by judgment |
+| Sprint batch | sequential `for`-loop over `args.stories[]` (shared branch ⇒ no overlap) | prose six-phase sprint loop |
+| Quality gates | `runGate()` returns a `verdict.schema`-validated object | prose verdict, unchecked |
+| Pass-invariant | `assertGate()` rejects `pass` + open Highs | KANBAN-002 class |
+| Rejection cap | JS `while` loop, code-owned counter | model-counted circuit breaker |
+| Dev fan-out | `parallel()` barrier before CRITIC | wave/spawn_trigger overlay |
+| 3b CRITIC | `parallel([blind, edge, acceptance])` independent agents → triage barrier | one CRITIC context, passes anchored on each other |
+| Spawn context | `buildPrompt()` mirrors `spawn.template.md` (Setup/Task/Trigger/Completion) | terse inline instructions |
+| Roll-over | a rejected story is recorded and the batch continues | — |
+| Resume | journal (`resumeFromRunId`) | disk-state rehydration + re-decide |
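The "Pass-invariant" and "Rejection cap" rows above can be illustrated with a minimal sketch. It is not the code shipped in `sprint.workflow.js`: the verdict shape (`verdict`, `findings`, `severity`, `open`), the `runWithRejectionCap` helper, and the synchronous `runGateOnce` callback are all assumptions made for illustration. The real gate would presumably await an agent and validate against `verdict.schema`.

```javascript
// Hypothetical verdict shape (an assumption, not the real verdict.schema):
//   { verdict: 'pass' | 'reject', findings: [{ severity: 'high' | ..., open: boolean }] }

// Pass-invariant: a "pass" verdict must not carry open High findings.
function assertGate(verdict) {
  const openHighs = verdict.findings.filter(
    (f) => f.severity === 'high' && f.open
  );
  if (verdict.verdict === 'pass' && openHighs.length > 0) {
    throw new Error(`gate passed with ${openHighs.length} open High finding(s)`);
  }
  return verdict;
}

// Rejection cap: the counter is a plain JS variable, so the limit is enforced
// by code rather than counted by a model.
function runWithRejectionCap(runGateOnce, maxRejectionCycles) {
  let rejections = 0;
  while (true) {
    const verdict = assertGate(runGateOnce(rejections));
    if (verdict.verdict === 'pass') {
      return { shipped: true, rejections };
    }
    rejections += 1;
    if (rejections >= maxRejectionCycles) {
      return { shipped: false, rejections };
    }
  }
}
```

With a stub gate that rejects twice and then passes, a cap of 3 yields a shipped story after two rejections; a gate that always rejects stops at the cap and the story is reported as not shipped.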
+## Args
+```js
+// batch form (a planned sprint)
+{ stories: [{ storyId, projectType, profiles }, ...], maxRejectionCycles? }
+// single-story form (back-compat)
+{ storyId, projectType, profiles?, maxRejectionCycles? }
+```
+Returns `{ shipped, stories_shipped, stories_rolled_over, results: [{ storyId, shipped, verdict, skipped }] }`.
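One way to accept both argument forms is to fold the single-story form into a one-element batch before looping. The sketch below assumes this approach; `normalizeArgs` is a hypothetical helper name, and the README does not say how `sprint.workflow.js` actually handles the back-compat form.

```javascript
// Hypothetical helper: turn either argument form into the batch form.
function normalizeArgs(args) {
  if (Array.isArray(args.stories)) {
    // Batch form: pass the planned stories through unchanged.
    return {
      stories: args.stories,
      maxRejectionCycles: args.maxRejectionCycles,
    };
  }
  // Single-story form: wrap the story in a one-element batch.
  const { storyId, projectType, profiles } = args;
  return {
    stories: [{ storyId, projectType, profiles }],
    maxRejectionCycles: args.maxRejectionCycles,
  };
}
```

After normalization the rest of the script only has to deal with `stories[]`, which keeps the sequential batch loop the single code path.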
+## Resume & state model (step 8)
+**The journal is the state of record.** Each Workflow invocation returns a `runId`. To resume
+after an interruption (context limit, crash, manual stop, or a mid-run script edit), relaunch
+with `Workflow({ scriptPath, resumeFromRunId })` — **not** a fresh run. The journal replays the
+unchanged prefix of `agent()` calls instantly (same script + args → 100% cache hit) and re-runs
+only from the first changed/new call onward. Already-shipped stories and passed gates are not
+redone. This is the exact form of the durability the prose Lead approximated by re-reading
+`pipeline-state.json` and re-deciding.
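The prefix-replay behaviour described above can be pictured with a toy model. This is an illustration only, not the Workflow tool's actual mechanism: the journal layout (`key`, `result`), the `replay` function, and the idea of keying calls by a string are all assumptions.

```javascript
// Toy model: `journal` is the recorded run, `calls` is the call sequence the
// (possibly edited) script now makes, `execute` performs a call for real.
function replay(journal, calls, execute) {
  const results = [];
  let diverged = false;
  calls.forEach((call, i) => {
    const entry = journal[i];
    if (!diverged && entry !== undefined && entry.key === call) {
      // Unchanged prefix: reuse the recorded result without re-running.
      results.push({ key: call, result: entry.result, cached: true });
    } else {
      // First changed or new call: everything from here on runs for real.
      diverged = true;
      results.push({ key: call, result: execute(call), cached: false });
    }
  });
  return results;
}
```

In this model an unchanged script replays entirely from the journal, while an edit to the third call leaves the first two cached and re-executes the third and every call after it.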
+`pipeline-state.json`, `sprint-{n}-status.yaml`, and the markdown handoffs are **derived,
+human-readable views** in this path — agents write them for visibility; the orchestrator never
+reads them back to make a control-flow decision (its state lives in JS variables the journal
+captures). Because there is no multi-file state of record, the non-atomic multi-file desync the
+prose Lead can hit (feedback gap #2) is structurally impossible here. *Do not hand-edit a state
+file to resume — pass `resumeFromRunId`.* (The prose Lead path still uses `pipeline-state.json`
+as its mechanism; that's correct for that runtime.)
+**Resume-safety is linted.** Journal replay requires a deterministic, side-effect-free script
+body, so `scripts/test-workflow.js` statically rejects `Date.now`/`new Date(`/`Math.random`
+(nondeterminism) and `import`/`require`/`*FileSync`/`process.*` (in-script IO) in all three
+workflow files. All IO goes through agents; that's why resolve-graph/sprint-pack/calibrate/embed
+are invoked *through* an agent rather than imported.
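A static check of this kind can be approximated with a plain text scan. The sketch below is an assumption about how such a lint might look, not the contents of `scripts/test-workflow.js`; `FORBIDDEN` and `lintWorkflowSource` are hypothetical names, and the regular expressions are a naive reading of the patterns listed above.

```javascript
// Patterns named in the README, expressed as naive regular expressions.
const FORBIDDEN = [
  { name: 'Date.now', pattern: /\bDate\.now\b/ },
  { name: 'new Date(', pattern: /\bnew\s+Date\s*\(/ },
  { name: 'Math.random', pattern: /\bMath\.random\b/ },
  { name: 'import', pattern: /\bimport\b/ },
  { name: 'require', pattern: /\brequire\s*\(/ },
  { name: '*FileSync', pattern: /\w+FileSync\b/ },
  { name: 'process.*', pattern: /\bprocess\.\w+/ },
];

// Return the names of every forbidden construct found in the source text.
// The source is taken as a string so the check itself stays free of IO.
function lintWorkflowSource(source) {
  return FORBIDDEN
    .filter(({ pattern }) => pattern.test(source))
    .map(({ name }) => name);
}
```

A text scan like this also flags matches inside comments and string literals; a stricter lint would parse the script and inspect the syntax tree instead.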
+## Known simplifications (next slices)
+- A `sprint-cycle.workflow.js` that calls `plan → sprint → retro` via `workflow()` isn't built
+  yet; for now run the three workflows in sequence (the plan output feeds the sprint input).
+- Per-story dev fan-out re-runs ALL dev agents on a CRITIC rejection; routing rework to only
+  the agent(s) CRITIC targeted (via `rejectionTarget`) is a refinement once run live.
+- No PMCP / visual-validation stage yet; no PM/program-loop workflow (left agent-driven per §5b).
+## Runtime constraint that shaped the design
+A Workflow script body has **no filesystem or import access** — it cannot read
+`task-graphs/*.yaml`, parse handoffs, or run the CLI directly. All IO is performed by the
+agents it spawns (which have Bash/Read/Write); the script only sequences them and validates
+their structured returns. That is why `resolve-graph` is invoked *through* an agent rather
+than imported.