npm - valent-pipeline - Versions diffs - 0.2.25 → 0.2.26 - Mend

valent-pipeline 0.2.25 → 0.2.26

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/package.json +1 -1
package/pipeline/providers/claude-code/runtime.md +127 -0
package/pipeline/providers/codex/AGENTS.md +49 -0
package/pipeline/providers/codex/runtime.md +256 -0
package/src/lib/config-schema.js +31 -0

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "valent-pipeline",
-  "version": "0.2.25",
+  "version": "0.2.26",
   "description": "v3 multi-agent AI pipeline for software development lifecycle",
   "type": "module",
   "bin": {

package/pipeline/providers/claude-code/runtime.md ADDED

@@ -0,0 +1,127 @@
+# Claude Code Runtime Adapter
+> **Provider:** Claude Code Agent Teams
+> **Read by:** Lead agent when `runtime.provider` is `claude-code` in `pipeline-config.yaml`
+> **Purpose:** Defines HOW to create teams, spawn agents, send signals, track tasks, and monitor liveness using Claude Code's native primitives.
+---
+## Initialization
+Use `TeamCreate` to create a named team for inbox routing:
+- **Epic runs** (`{is_epic_run}` is true): `TeamCreate("valent-{epic_id}")` (lowercased). The team persists across stories so Knowledge Agent stays alive. If the team already exists (continuing an epic), proceed without recreating.
+- **Standalone runs**: `TeamCreate("valent-{story_id}")` (lowercased).
+**Clean up stale team:** If `TeamCreate` fails with an "Already leading team" error, call `TeamDelete` first, then retry. This handles cases where a prior run crashed before teardown.
+**NEVER** spawn pipeline agents as anonymous subagents. Every agent must be a named teammate on the team so it can send and receive inbox messages with other teammates.
+---
+## Task Registry
+Use the shared task list API to create and track tasks:
+1. **Create tasks:** For each task in the resolved task graph, call `TaskCreate` with `subject`, `description`, and `activeForm` from the template.
+2. **Wire dependencies:** For each task with `blockedBy` refs, call `TaskUpdate` with `addBlockedBy` using the mapped real task IDs. If a `blockedBy` ref is in `skipped_refs`, omit it.
+3. **Maintain ref map:** Keep a `ref -> taskId` map (e.g., `{ reqs: "1", uxa: "2", ... }`) and a `skipped_refs` set.
+4. **Read task state:** Use `TaskList` and `TaskGet` to check task statuses during monitoring.
+5. **Update status:** Use `TaskUpdate` to mark tasks `in_progress` on agent spawn and `completed` on agent handoff.
+---
+## Agent Spawning
+Use the `Agent` tool with `name` parameter to spawn named teammates onto the team. Set `run_in_background: true` so agents work autonomously.
+### Spawn Pattern
+1. Read the spawn template (a short ~15-line template from `.valent-pipeline/spawn-templates/`)
+2. Substitute all `{{variables}}` with resolved values from config and the task graph
+3. Pass the filled template as the Agent tool's `prompt` parameter
+4. Set `name` to the agent's name from the manifest (e.g., `"REQS"`, `"UXA"`, `"Knowledge"`)
+5. Set `run_in_background: true`
+6. Set `model` per the `models` section in `pipeline-config.yaml` (match agent name to tier)
+**DO NOT read the agent's prompt files, step files, or templates yourself.** The spawn template tells the teammate to read its own prompt and steps. Your job is ONLY to substitute variables in the template and pass it to the Agent tool.
+### Templates
+- `.valent-pipeline/spawn-templates/agent-spawn.template.md` — for all agents except Knowledge
+- `.valent-pipeline/spawn-templates/knowledge-spawn.template.md` — for the Knowledge Agent
+### Wave-Based Spawning
+| Wave | Spawn Trigger | Agents |
+|------|---------------|--------|
+| 1 | At kick-off | Knowledge, REQS, UXA, QA-A, READINESS |
+| 2 | QA-A sends `[HANDOFF]` | BEND, FEND, DATA, MCP-DEV, LIBDEV, DOCGEN, IAC, CRITIC (each only if not skipped) |
+| 3 | CRITIC task becomes `in_progress` | QA-B, PMCP (if ui profile) |
+| 4 | JUDGE bug-review task becomes `in_progress` | (reserved) |
+**Pattern:** Spawn the next wave when the current blocking agent starts work, so downstream agents are initialized and ready the moment the blocker finishes.
+### Knowledge Agent (Epic Mode)
+If `{is_epic_run}` is true and a Knowledge teammate already exists in the current team, skip the spawn. Instead, send `SendMessage(to: "Knowledge")`: `[STORY-RESET] story_id={story_id}, pipeline_context={story_output_dir}/pipeline-context.md`. Wait for `[KNOWLEDGE-READY]` response.
+---
+## Signal Delivery
+Use `SendMessage` for all inter-agent communication. Messages are delivered to named teammate inboxes.
+### Lead Outbound Messages
+- `[SPAWN] Spawning {agent} for {story_id}. Role: {role}. Shared context: {story_output_dir}.`
+- `[CHECK-IN] {agent}: task {task} has been in_progress for {minutes}min. Status?`
+- `[REVIEW-READY] Story {story_id}` — sent to READINESS when a story reaches `readiness-review`
+- `[TEARDOWN] Tearing down all teammates for {story_id}.`
+- `shutdown_request` via SendMessage to each agent individually at teardown
+### Agent-to-Agent (Peer-to-Peer)
+Agents use `SendMessage` directly for:
+- `[HANDOFF]` — completion signal to Lead and/or downstream agent
+- `[CRITIC-REJECTION]` — CRITIC to BEND/FEND, CC Lead
+- `[BUG]` — QA-B to BEND/FEND, CC Lead
+- `[KNOWLEDGE-QUERY]` / `[KNOWLEDGE-RESPONSE]` — any agent to/from Knowledge
+- `[DESIGN-COUNCIL]` / `[DESIGN-COUNCIL-RESPONSE]` — structured deliberation between 2-3 agents
+- `[BLOCKER]` / `[ESCALATION]` — agent to Lead
+---
+## Monitoring
+### Heartbeat Timer
+At the start of Phase 2, create a `CronCreate` job that fires every 4 minutes. Each heartbeat triggers a liveness check: call `TaskList`, verify at least one non-Knowledge agent is `in_progress`. If uncompleted tasks exist but no agent is working, diagnose deadlock (stale blockers, missed unblocks, dead teammates) and act.
+Create a separate 4-minute keep-alive ping for the Knowledge Agent to prevent prompt cache expiry.
+### Monitoring Loop
+1. Watch for task status changes (completed, blocked, failed)
+2. Watch for wave spawn triggers (QA-A completion, CRITIC start)
+3. Watch for inbox messages ([ESCALATION], [BLOCKER], [DESIGN-COUNCIL], [STATUS])
+4. Track rejection counts per agent for circuit breaker
+5. Track time-in-progress per task for stall detection (exempt: Knowledge Agent)
+6. On every phase transition, capture timestamp and update `phase-timing.md`
+### Stall Detection
+When a task stays `in_progress` beyond `{stall_threshold_minutes}`:
+1. Send `[CHECK-IN]` message to the teammate via SendMessage
+2. If no response within reasonable time, classify per Headless Escalation Protocol
+---
+## Teardown
+1. Send `shutdown_request` via `SendMessage` to each teammate individually (not broadcast)
+2. Wait for each agent to write final state to its handoff file
+3. Delete heartbeat and keep-alive cron jobs via `CronDelete`
+4. Call `TeamDelete` to destroy the team and all inboxes
+5. Write `story-report.md`
+6. Commit and push to story branch
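
Note on the Task Registry steps in the file above (this sketch is not part of the package). The `ref -> taskId` map and the `skipped_refs` rule can be illustrated with a small in-memory model. `buildRegistry`, the `graph` shape, and the sequential string IDs are assumptions made for illustration; in the real adapter the IDs would come from `TaskCreate` and the wiring would go through `TaskUpdate` with `addBlockedBy`.

```javascript
// Hypothetical in-memory stand-in for the TaskCreate / TaskUpdate flow.
// `graph` is an assumed shape: [{ ref, blockedBy: [ref, ...] }, ...].
function buildRegistry(graph, skippedRefs) {
  const refMap = {};    // ref -> taskId, as described in step 3
  const blockedBy = {}; // taskId -> array of blocking taskIds
  let nextId = 1;

  // Step 1: "create" one task per non-skipped ref.
  for (const task of graph) {
    if (skippedRefs.has(task.ref)) continue;
    refMap[task.ref] = String(nextId++);
  }

  // Step 2: wire dependencies, omitting any ref found in skippedRefs.
  for (const task of graph) {
    const id = refMap[task.ref];
    if (id === undefined) continue; // the task itself was skipped
    blockedBy[id] = (task.blockedBy || [])
      .filter((ref) => !skippedRefs.has(ref))
      .map((ref) => refMap[ref]);
  }
  return { refMap, blockedBy };
}

// Example: `uxa` is skipped, so `qa-a` ends up blocked by `reqs` only.
const example = buildRegistry(
  [
    { ref: 'reqs', blockedBy: [] },
    { ref: 'uxa', blockedBy: ['reqs'] },
    { ref: 'qa-a', blockedBy: ['reqs', 'uxa'] },
  ],
  new Set(['uxa'])
);
```

Building the full map before wiring any dependencies means a task can reference a blocker that appears later in the graph.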

package/pipeline/providers/codex/AGENTS.md ADDED

@@ -0,0 +1,49 @@
+# Valent Pipeline — Codex Agent Instructions
+> This file provides pipeline-wide rules for all agents running under OpenAI Codex.
+> It is the Codex equivalent of project-level CLAUDE.md instructions.
+## Communication Standard
+All output follows the V3 Distilled Communication Standard:
+- Write for machine consumption, not human readability
+- Structured data (YAML, lists, key-value) over prose paragraphs
+- Facts and decisions only — no filler sentences
+- Section headers as semantic labels
+- Explicit cross-references using file paths and section anchors
+Full reference: `.valent-pipeline/docs/communication-standard.md`
+## Output Format
+Every handoff document must include:
+1. **YAML frontmatter** — agent, story, status, stepsCompleted, pendingSteps, lastCheckpoint, inputsRead, outputsWritten, blockers
+2. **Orchestrator Summary** — Agent, Story, Verdict (pass/fail/needs-review), State transition, Files created/modified, Flags
+Full schema: `.valent-pipeline/docs/communication-standard.md#3` and `#4`
+## Shared Context
+At startup, every agent reads:
+1. Its core prompt from `.valent-pipeline/prompts/{agent}.md`
+2. Shared context from `{story_output_dir}/pipeline-context.md`
+3. Step files at the point of execution (not before)
+## File Conventions
+- Write all output to `{story_output_dir}/`
+- Use the appropriate template from `.valent-pipeline/templates/`
+- Cross-reference files by path and section anchor: `file.md#section`
+- Never reference implicit shared context — everything must be explicit
+## Signal Delivery
+In Codex mode, thread completion IS your handoff signal. Write your output files with the orchestrator summary verdict, then complete. Lead reads your verdict from the handoff file and routes accordingly.
+Do NOT attempt to use `SendMessage` or inbox messaging — these are Claude Code primitives not available in Codex.
+## Knowledge Queries
+If you need knowledge context, check `{story_output_dir}/pipeline-context.md` for the knowledge mode:
+- If `knowledge-context.md` exists in the story output dir, read it directly
+- Otherwise, write your query to `{story_output_dir}/knowledge-queries/{your_name}-{n}.md` and note "pending knowledge query" in your handoff. Lead will relay the response.
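
Note on the Output Format section in the file above (this sketch is not part of the package). Because Lead routes on the handoff file in Codex mode, a completeness check on the frontmatter fields and the verdict may be useful. The key list and verdict values are copied from the section; `checkHandoff` and its inputs (frontmatter already parsed into an object, verdict already extracted from the Orchestrator Summary) are assumptions.

```javascript
// Field names copied from the "Output Format" section of AGENTS.md.
const REQUIRED_FRONTMATTER_KEYS = [
  'agent', 'story', 'status', 'stepsCompleted', 'pendingSteps',
  'lastCheckpoint', 'inputsRead', 'outputsWritten', 'blockers',
];
const VALID_VERDICTS = ['pass', 'fail', 'needs-review'];

// Hypothetical helper: returns a list of human-readable problems,
// or an empty array when the handoff looks complete.
function checkHandoff(frontmatter, verdict) {
  const problems = REQUIRED_FRONTMATTER_KEYS
    .filter((key) => !Object.prototype.hasOwnProperty.call(frontmatter, key))
    .map((key) => `missing frontmatter key: ${key}`);
  if (!VALID_VERDICTS.includes(verdict)) {
    problems.push(`unknown verdict: ${verdict}`);
  }
  return problems;
}

// Example values are illustrative only: `blockers` is absent and
// 'approved' is not one of the three listed verdicts.
const handoffProblems = checkHandoff(
  {
    agent: 'REQS', story: 'story-1', status: 'completed',
    stepsCompleted: [], pendingSteps: [], lastCheckpoint: null,
    inputsRead: [], outputsWritten: [],
  },
  'approved'
);
```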

package/pipeline/providers/codex/runtime.md ADDED

@@ -0,0 +1,256 @@
+# Codex Runtime Adapter
+> **Provider:** OpenAI Codex (CLI and Cloud)
+> **Read by:** Lead agent when `runtime.provider` is `codex` in `pipeline-config.yaml`
+> **Purpose:** Defines HOW to spawn agent threads, coordinate via steering, track tasks via file-based registry, and manage thread lifecycle using Codex's native primitives.
+---
+## Initialization
+No team API call needed. Ensure the story output directory exists:
+```
+mkdir -p {story_output_dir}/
+```
+No `signals/` or `inbox/` directories needed. Codex thread management handles coordination natively:
+- Agent completion is detected by thread state
+- Parent-to-child communication uses steering (follow-up instructions to running threads)
+- Child-to-parent communication is via thread completion + handoff file verdict
+---
+## Task Registry
+Write `{story_output_dir}/task-registry.yaml` from the resolved task graph:
+```yaml
+tasks:
+  reqs:
+    status: pending         # pending | in_progress | completed | blocked
+    agent: REQS
+    blocked_by: []
+    started_at: null
+    completed_at: null
+  uxa:
+    status: pending
+    agent: UXA
+    blocked_by: [reqs]
+    started_at: null
+    completed_at: null
+  # ... all tasks from task graph, filtered by project type and testing profiles
+```
+**Update protocol:**
+- Before spawning an agent: set its task status to `in_progress`, write `started_at`
+- After agent thread completes: set status to `completed`, write `completed_at`, check if downstream tasks are unblocked
+- Read the registry to determine next actions (replaces `TaskList`/`TaskGet`)
+**Lead is the sole writer.** Agents never modify `task-registry.yaml`.
+---
+## Agent Spawning
+Codex subagents run as **persistent threads**. Threads stay alive until explicitly closed. Lead spawns threads, steers them with follow-up instructions, and closes them at teardown.
+### Agent Type Classification
+| Agent | Type | Rationale |
+|-------|------|-----------|
+| REQS | explorer | Reads story inputs + codebase for context, produces spec only |
+| UXA | worker | Translates specs, may scaffold component stubs |
+| QA-A | explorer | Reads all specs + codebase patterns, produces test spec only |
+| READINESS | explorer | Reads all spec artifacts, produces review only |
+| BEND | worker | Writes production code + tests, runs builds |
+| FEND | worker | Writes UI code + tests, runs builds |
+| IAC | worker | Writes infrastructure code |
+| CRITIC | explorer | Multi-pass code review — reads everything, writes review doc only |
+| QA-B | worker | Runs test suites, writes bug reports, may modify test files |
+| JUDGE | explorer | Reads all artifacts, produces ship/reject verdict only |
+| Knowledge | default | Reactive service — reads knowledge base, writes responses |
+### Sandbox Mode Per Agent
+| Agent | Sandbox Mode | Notes |
+|-------|-------------|-------|
+| REQS, UXA, READINESS | `workspace-write` | Read files, write handoff docs |
+| QA-A | `workspace-write` | Read files, write test spec |
+| BEND, FEND, IAC | `workspace-write` | Read/write files, run shell (npm install, build, git). Network enabled for deps. |
+| CRITIC, JUDGE | `read-only` | Only reads code. Override `writable_roots` for story output dir. |
+| QA-B | `workspace-write` | Run test suites, start local servers. Network enabled, `allow_local_binding: true`. |
+| PMCP | `workspace-write` | Run Playwright MCP. Network enabled, `allow_local_binding: true`. |
+| Knowledge | `workspace-write` | Read knowledge files, write responses |
+### Sequential Phases (Wave 1)
+Spawn threads one at a time. Each reads predecessor output from disk.
+```
+Lead spawns REQS thread → REQS writes reqs-brief.md → completes
+Lead reads handoff file verdict, updates task-registry.yaml
+Lead spawns UXA thread → UXA reads reqs-brief.md, writes uxa-spec.md → completes
+...
+Lead spawns READINESS thread → reviews specs → completes
+```
+### Parallel Phases (Wave 2)
+Spawn BEND, FEND, IAC as parallel threads. All read from approved specs on disk.
+```
+Lead spawns [BEND, FEND, IAC] as parallel threads
+All three read from qa-test-spec.md + readiness-review.md on disk
+All three write separate output files
+Lead waits for all threads to complete
+Lead updates task-registry.yaml for all three
+Lead spawns CRITIC thread
+```
+### Wave Spawn Triggers
+Same as Claude Code — spawn next wave when current blocking agent starts:
+| Wave | Trigger | Agents |
+|------|---------|--------|
+| 1 | At kick-off | Knowledge, REQS, UXA, QA-A, READINESS |
+| 2 | QA-A completes (READINESS approved) | BEND, FEND, IAC, CRITIC (each only if not skipped) |
+| 3 | CRITIC starts | QA-B, PMCP (if ui profile) |
+| 4 | JUDGE bug-review starts | (reserved) |
+### Spawn Template
+Each agent receives a prompt built from `.valent-pipeline/spawn-templates/agent-spawn.template.md` (same template as Claude Code, without SendMessage instructions). The template tells the agent to:
+1. Read its core prompt
+2. Read shared context
+3. Execute steps
+4. Write output files to `{story_output_dir}/`
+5. Complete (Lead manages thread lifecycle)
+---
+## Rejection Loop (Steering, Not Re-Spawning)
+Because threads persist, rejection loops use steering instead of re-spawning:
+```
+Lead spawns CRITIC thread → CRITIC writes critic-review.md → completes with verdict
+If verdict == rejection:
+  Lead steers BEND thread with follow-up: "CRITIC rejected. See critic-review.md#finding-1. Fix."
+  BEND fixes in-place → completes
+  Lead steers CRITIC thread: "BEND pushed fixes. Re-review."
+  CRITIC re-reviews → completes
+  Loop until approved or circuit breaker
+```
+This preserves agent context from the initial implementation — no re-loading of codebase or specs.
+If a thread was already closed (crash, timeout), Lead spawns a new thread with recovery context from handoff frontmatter.
+---
+## Knowledge Agent
+### Option A: Persistent Thread via Steering (Default for CLI)
+Lead spawns Knowledge thread at story start. Knowledge loads correction directives, curated knowledge, and ChromaDB/SQLite sources, then signals ready.
+When another agent needs knowledge:
+1. Agent writes query to `{story_output_dir}/knowledge-queries/{agent}-{n}.md`
+2. Agent completes its current step and notes "pending knowledge query" in handoff
+3. Lead steers Knowledge thread: "Answer query in `knowledge-queries/{agent}-{n}.md`"
+4. Knowledge writes response to `{story_output_dir}/knowledge-responses/{agent}-{n}.md`
+5. Lead steers the requesting agent: "Knowledge response ready at `knowledge-responses/{agent}-{n}.md`"
+Hub-and-spoke: Lead relays between agents and Knowledge. No direct agent-to-Knowledge channel.
+### Option B: Pre-Computed Context File (Default for Cloud)
+Lead spawns a one-shot Knowledge thread that:
+1. Reads ALL correction directives and ALL curated knowledge entries
+2. Produces `{story_output_dir}/knowledge-context.md` — compiled reference
+3. Completes
+All downstream agents read `knowledge-context.md` directly. Zero relay overhead.
+### Option C: MCP Tool (Advanced)
+Configure knowledge retrieval as an MCP tool in `.codex/config.toml`:
+```toml
+[mcp.knowledge]
+type = "stdio"
+command = "node"
+args = [".valent-pipeline/scripts/knowledge-mcp-server.js"]
+```
+Agents call the knowledge tool directly. No relay through Lead.
+---
+## Signal Delivery
+### Agent → Lead (completion/verdict)
+Thread completion IS the signal. When a thread completes, Lead reads the handoff file's YAML frontmatter (`status`) and orchestrator summary (`verdict`) to determine routing:
+- `verdict: pass` → advance downstream task, update task-registry
+- `verdict: fail` → route rejection per re-entry map
+- `verdict: needs-review` → escalate per Headless Escalation Protocol
+No signal files needed. Lead reads the handoff artifact directly.
+### Lead → Agent (instructions, rejection context, steering)
+Two channels:
+- **At spawn:** Initial prompt via spawn template.
+- **Mid-execution:** Steering via follow-up instructions to running thread. Used for rejection rework, knowledge query relay, and course correction.
+### Agent → Agent (peer communication)
+Not supported natively. Hub-and-spoke only. Two patterns:
+**File-based (for data):** Agent writes to `{story_output_dir}/` and the consuming agent reads it when spawned or steered. This is how handoff documents already work.
+**Lead relay (for coordination):** Lead reads output from one thread, steers another thread with the relevant information. Used for Design Council deliberation and Knowledge Agent query relay.
+---
+## Monitoring
+Lead drives execution as an orchestration loop, managing threads directly:
+```
+while tasks remain:
+  next_batch = get_unblocked_tasks(task_registry)
+  if parallel_safe(next_batch):
+    spawn all as parallel threads, wait for all to complete
+  else:
+    spawn sequentially, wait for each
+  update task_registry
+  check circuit breaker thresholds
+  if rejection: steer agent threads for rework (see Rejection Loop)
+```
+### Stall Detection
+Lead can inspect running threads via `/agent` or by checking `job_max_runtime_seconds` timeout. If a thread exceeds the timeout, Codex kills it and Lead handles the timeout as a stall — spawn a replacement thread with recovery context from handoff frontmatter.
+No heartbeat cron needed — Lead observes thread state directly.
+### Thread Budget
+`max_threads` defaults to 6 (configurable via `runtime.codex.max_concurrent_agents`). Lead's own thread does NOT count against this limit.
+**Strategy:** Close completed spec agent threads (REQS, UXA, QA-A, READINESS) before spawning Wave 2 to free slots. For rejection loops, if slots are tight, close dev threads and re-spawn with recovery context.
+If `max_threads` is hit, Codex queues excess spawns. Pipeline degrades to sequential execution — slower but not broken.
+---
+## Teardown
+Close all agent threads and clean up:
+1. Close all completed agent threads (ask Codex to close each thread)
+2. Stop any still-running threads (Knowledge Agent if using Option A)
+3. Write `story-report.md`
+4. Commit and push to story branch
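
Note on the Task Registry and Monitoring sections in the file above (this sketch is not part of the package). The `get_unblocked_tasks` step from the orchestration pseudocode and the update protocol can be modelled over the parsed `task-registry.yaml` structure. The function names, the ISO timestamp strings, and the rule that a dependency missing from the registry keeps a task blocked are all assumptions.

```javascript
// Hypothetical helper for the get_unblocked_tasks step.
// `registry` is assumed to be the parsed content of task-registry.yaml.
function getUnblockedTasks(registry) {
  const tasks = registry.tasks;
  return Object.keys(tasks).filter((ref) => {
    const task = tasks[ref];
    if (task.status !== 'pending') return false;
    // Assumption: a dependency that is missing from the registry blocks.
    return task.blocked_by.every(
      (dep) => tasks[dep] !== undefined && tasks[dep].status === 'completed'
    );
  });
}

// Update protocol: Lead marks a task before spawning its agent...
function startTask(registry, ref, now) {
  registry.tasks[ref].status = 'in_progress';
  registry.tasks[ref].started_at = now;
}

// ...and again after the agent thread completes.
function completeTask(registry, ref, now) {
  registry.tasks[ref].status = 'completed';
  registry.tasks[ref].completed_at = now;
}

const registry = {
  tasks: {
    reqs: { status: 'completed', agent: 'REQS', blocked_by: [], started_at: null, completed_at: null },
    uxa: { status: 'pending', agent: 'UXA', blocked_by: ['reqs'], started_at: null, completed_at: null },
    'qa-a': { status: 'pending', agent: 'QA-A', blocked_by: ['reqs', 'uxa'], started_at: null, completed_at: null },
  },
};

const readyBefore = getUnblockedTasks(registry); // only `uxa` is ready
startTask(registry, 'uxa', '2025-01-01T00:00:00Z');
completeTask(registry, 'uxa', '2025-01-01T00:10:00Z');
const readyAfter = getUnblockedTasks(registry);  // now `qa-a` is ready
```

Keeping these as the only functions that mutate the registry matches the rule that Lead is the sole writer.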

package/src/lib/config-schema.js CHANGED

@@ -95,6 +95,23 @@ export function validateConfig(config) {
     }
   }
+  // Runtime section (optional — provider adapter configuration)
+  if (config.runtime) {
+    const validProviders = ['claude-code', 'codex'];
+    if (config.runtime.provider && !validProviders.includes(config.runtime.provider)) {
+      errors.push(`Invalid runtime.provider: "${config.runtime.provider}". Must be one of: ${validProviders.join(', ')}`);
+    }
+    if (config.runtime.codex) {
+      if (config.runtime.codex.max_concurrent_agents !== undefined && typeof config.runtime.codex.max_concurrent_agents !== 'number') {
+        errors.push(`runtime.codex.max_concurrent_agents must be a number, got: ${typeof config.runtime.codex.max_concurrent_agents}`);
+      }
+      const validSandboxModes = ['read-only', 'workspace-write', 'danger-full-access'];
+      if (config.runtime.codex.sandbox_mode && !validSandboxModes.includes(config.runtime.codex.sandbox_mode)) {
+        errors.push(`Invalid runtime.codex.sandbox_mode: "${config.runtime.codex.sandbox_mode}". Must be one of: ${validSandboxModes.join(', ')}`);
+      }
+    }
+  }
   if (config.knowledge?.mode === 'sqlite' && !config.knowledge?.sqlite_db_path) {
     errors.push('knowledge.sqlite_db_path is required when knowledge.mode is "sqlite"');
   }
@@ -168,4 +185,18 @@ export const defaults = {
     auto_reprioritize: true,
     auto_generate_gap_stories: false,
   },
+  runtime: {
+    provider: 'claude-code',
+    codex: {
+      max_concurrent_agents: 6,
+      sandbox_mode: 'workspace-write',
+      model_map: {},
+      cloud: {
+        git_remote: 'origin',
+        internet_allowlist: [],
+        setup_script: '.codex/setup.sh',
+        commit_prefix: 'pipeline',
+      },
+    },
+  },
 };
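
Note on the new runtime validation above (this sketch is not part of the package). To see how the added checks behave, the runtime block can be copied into a standalone function. `validateRuntime` is a hypothetical wrapper; the checks inside it are reproduced from the hunk. One behaviour worth knowing: since the check uses `typeof ... !== 'number'`, values such as `NaN`, `0`, or negative numbers pass for `max_concurrent_agents`.

```javascript
// Standalone copy of the runtime checks from the hunk, for experimentation.
function validateRuntime(config) {
  const errors = [];
  if (config.runtime) {
    const validProviders = ['claude-code', 'codex'];
    if (config.runtime.provider && !validProviders.includes(config.runtime.provider)) {
      errors.push(`Invalid runtime.provider: "${config.runtime.provider}". Must be one of: ${validProviders.join(', ')}`);
    }
    if (config.runtime.codex) {
      const codex = config.runtime.codex;
      if (codex.max_concurrent_agents !== undefined && typeof codex.max_concurrent_agents !== 'number') {
        errors.push(`runtime.codex.max_concurrent_agents must be a number, got: ${typeof codex.max_concurrent_agents}`);
      }
      const validSandboxModes = ['read-only', 'workspace-write', 'danger-full-access'];
      if (codex.sandbox_mode && !validSandboxModes.includes(codex.sandbox_mode)) {
        errors.push(`Invalid runtime.codex.sandbox_mode: "${codex.sandbox_mode}". Must be one of: ${validSandboxModes.join(', ')}`);
      }
    }
  }
  return errors;
}

// Three invalid values (illustrative) produce three errors.
const strictErrors = validateRuntime({
  runtime: {
    provider: 'gemini',
    codex: { max_concurrent_agents: '6', sandbox_mode: 'full' },
  },
});

// NaN has type 'number', so this config produces no errors.
const lenientErrors = validateRuntime({
  runtime: { codex: { max_concurrent_agents: NaN } },
});
```

If stricter validation is wanted, a check such as `Number.isInteger(value) && value > 0` would reject those cases; whether the package intends that is not stated in the diff.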