valent-pipeline 0.3.3 → 0.4.1
- package/bin/cli.js +80 -0
- package/package.json +7 -5
- package/pipeline/agents-manifest.yaml +23 -33
- package/pipeline/docs/design/provider-adapter-guide.md +6 -7
- package/pipeline/docs/knowledge-system.md +16 -18
- package/pipeline/docs/lead-lifecycle.md +3 -12
- package/pipeline/docs/npx-packaging.md +0 -1
- package/pipeline/docs/template-skeleton.md +1 -1
- package/pipeline/orchestrators/claude-code/README.md +99 -0
- package/pipeline/orchestrators/claude-code/plan.workflow.js +284 -0
- package/pipeline/orchestrators/claude-code/retro.workflow.js +274 -0
- package/pipeline/orchestrators/claude-code/sprint.workflow.js +354 -0
- package/pipeline/orchestrators/codex/README.md +52 -0
- package/pipeline/orchestrators/codex/lead-loop.md +115 -0
- package/pipeline/prompts/bend.md +12 -2
- package/pipeline/prompts/critic.md +17 -8
- package/pipeline/prompts/fend.md +12 -2
- package/pipeline/prompts/judge.md +12 -2
- package/pipeline/prompts/lead.md +231 -71
- package/pipeline/prompts/qa-a.md +1 -1
- package/pipeline/prompts/qa-b.md +12 -2
- package/pipeline/prompts/reqs.md +1 -1
- package/pipeline/prompts/uxa.md +1 -1
- package/pipeline/providers/claude-code/runtime.md +31 -10
- package/pipeline/providers/codex/AGENTS.md +8 -3
- package/pipeline/providers/codex/cloud-task-prompts/implementation.md +2 -0
- package/pipeline/providers/codex/codex-project-files/.codex/agents/review-explorer.toml +2 -2
- package/pipeline/providers/codex/runtime.md +91 -208
- package/pipeline/providers/codex/spawn.template.md +3 -1
- package/pipeline/schemas/handoff.schema.json +19 -0
- package/pipeline/schemas/task-graph.schema.json +53 -0
- package/pipeline/schemas/verdict.schema.json +20 -0
- package/pipeline/scripts/query-kb.ts +1 -1
- package/pipeline/spawn-templates/pipeline-context.template.md +1 -3
- package/pipeline/steps/bend/read-inputs.md +2 -5
- package/pipeline/steps/common/agent-protocol.md +9 -1
- package/pipeline/steps/common/distilled-handoff-format.md +15 -0
- package/pipeline/steps/critic/acceptance-audit.md +1 -1
- package/pipeline/steps/critic/edge-case-hunt.md +2 -2
- package/pipeline/steps/critic/triage.md +2 -2
- package/pipeline/steps/data/read-inputs.md +2 -5
- package/pipeline/steps/docgen/read-inputs.md +2 -5
- package/pipeline/steps/fend/read-inputs.md +2 -5
- package/pipeline/steps/iac/read-inputs.md +2 -5
- package/pipeline/steps/libdev/read-inputs.md +2 -5
- package/pipeline/steps/mcp-dev/read-inputs.md +2 -5
- package/pipeline/steps/mobile/read-inputs.md +2 -5
- package/pipeline/steps/orchestration/adopt-lead-and-create-team.md +107 -33
- package/pipeline/steps/orchestration/sprint-execute.md +30 -10
- package/pipeline/steps/orchestration/sprint-plan.md +28 -31
- package/pipeline/steps/orchestration/validate-story-inputs.md +1 -1
- package/pipeline/steps/qa-a/read-inputs.md +2 -6
- package/pipeline/steps/reqs/read-inputs.md +3 -7
- package/pipeline/steps/retrospective/calibration.md +18 -31
- package/pipeline/steps/uxa/read-inputs.md +2 -6
- package/pipeline/task-graphs/backend-api.yaml +1 -9
- package/pipeline/task-graphs/data-pipeline.yaml +1 -9
- package/pipeline/task-graphs/document-generation.yaml +1 -9
- package/pipeline/task-graphs/frontend-only.yaml +9 -16
- package/pipeline/task-graphs/fullstack-web.yaml +11 -18
- package/pipeline/task-graphs/library.yaml +1 -9
- package/pipeline/task-graphs/mcp-server.yaml +1 -9
- package/pipeline/task-graphs/mobile-app.yaml +8 -15
- package/pipeline/templates/bend-handoff.template.md +11 -0
- package/pipeline/templates/critic-review.template.md +15 -1
- package/pipeline/templates/data-handoff.template.md +11 -0
- package/pipeline/templates/docgen-handoff.template.md +11 -0
- package/pipeline/templates/embed-instructions.template.md +1 -1
- package/pipeline/templates/execution-report.template.md +11 -0
- package/pipeline/templates/fend-handoff.template.md +11 -0
- package/pipeline/templates/iac-handoff.template.md +11 -0
- package/pipeline/templates/judge-decision.template.md +13 -0
- package/pipeline/templates/libdev-handoff.template.md +11 -0
- package/pipeline/templates/mcp-dev-handoff.template.md +11 -0
- package/pipeline/templates/mobile-handoff.template.md +11 -0
- package/pipeline/templates/qa-test-spec.template.md +11 -0
- package/pipeline/templates/readiness-review.template.md +13 -0
- package/pipeline/templates/reqs-brief.template.md +11 -0
- package/pipeline/templates/retrospective.template.md +1 -1
- package/pipeline/templates/uxa-spec.template.md +11 -0
- package/skills/valent-help/SKILL.md +2 -2
- package/skills/valent-knowledge/SKILL.md +68 -0
- package/skills/valent-run-epic/SKILL.md +4 -9
- package/skills/valent-run-project/SKILL.md +4 -7
- package/skills/valent-run-story/SKILL.md +13 -1
- package/skills/valent-setup-backlog/SKILL.md +3 -3
- package/src/commands/calibrate.js +86 -0
- package/src/commands/init.js +1 -1
- package/src/commands/rejection-cap.js +70 -0
- package/src/commands/resolve-graph.js +79 -0
- package/src/commands/sprint-pack.js +62 -0
- package/src/commands/validate-handoff.js +32 -0
- package/src/commands/validate-sprint.js +55 -0
- package/src/lib/config-schema.js +2 -2
- package/src/lib/graph.js +98 -0
- package/src/lib/handoff.js +99 -0
- package/src/lib/rejection.js +38 -0
- package/src/lib/sprint.js +312 -0
- package/pipeline/prompts/knowledge.md +0 -94
- package/pipeline/providers/claude-code/knowledge-spawn.template.md +0 -17
- package/pipeline/providers/codex/codex-project-files/.codex/agents/knowledge-service.toml +0 -14
- package/pipeline/providers/codex/knowledge-spawn.template.md +0 -19
- package/pipeline/spawn-templates/knowledge-spawn.template.md +0 -17
package/bin/cli.js
CHANGED
|
@@ -54,6 +54,86 @@ configCmd
|
|
|
54
54
|
await validate();
|
|
55
55
|
});
|
|
56
56
|
|
|
57
|
+
// validate-handoff command
|
|
58
|
+
program
|
|
59
|
+
.command('validate-handoff')
|
|
60
|
+
.description('Validate a handoff artifact\'s valent:handoff machine block against the schema')
|
|
61
|
+
.requiredOption('--file <path>', 'Path to the handoff markdown file')
|
|
62
|
+
.option('--gate', 'Force gate validation (verdict required + pass-requires-zero-Highs invariant)')
|
|
63
|
+
.action(async (options) => {
|
|
64
|
+
const { validateHandoffCmd } = await import('../src/commands/validate-handoff.js');
|
|
65
|
+
await validateHandoffCmd(options);
|
|
66
|
+
});
|
|
67
|
+
|
|
68
|
+
// resolve-graph command
|
|
69
|
+
program
|
|
70
|
+
.command('resolve-graph')
|
|
71
|
+
.description('Deterministically resolve a task graph against testing profiles (evaluate predicates, prune blockedBy)')
|
|
72
|
+
.option('--type <project-type>', 'Project type (resolves .valent-pipeline/task-graphs/<type>.yaml, falling back to packaged)')
|
|
73
|
+
.option('--file <path>', 'Explicit path to a task-graph YAML (overrides --type)')
|
|
74
|
+
.option('--profiles <list>', 'Comma-separated testing profiles, e.g. api,ui,iac', '')
|
|
75
|
+
.option('--validate-only', 'Validate the graph shape and references without resolving')
|
|
76
|
+
.action(async (options) => {
|
|
77
|
+
const { resolveGraphCmd } = await import('../src/commands/resolve-graph.js');
|
|
78
|
+
await resolveGraphCmd(options);
|
|
79
|
+
});
|
|
80
|
+
|
|
81
|
+
// sprint-pack command (meta-loop: greedy story packing)
|
|
82
|
+
program
|
|
83
|
+
.command('sprint-pack')
|
|
84
|
+
.description('Deterministically pack groomed stories into a sprint by priority within a velocity budget')
|
|
85
|
+
.requiredOption('--velocity <n>', 'Sprint capacity in story points')
|
|
86
|
+
.option('--backlog <path>', 'Backlog file (YAML/JSON); packs its `items`')
|
|
87
|
+
.option('--stories <path>', 'Explicit story array (YAML/JSON); overrides --backlog')
|
|
88
|
+
.action(async (options) => {
|
|
89
|
+
const { sprintPackCmd } = await import('../src/commands/sprint-pack.js');
|
|
90
|
+
await sprintPackCmd(options);
|
|
91
|
+
});
|
|
92
|
+
|
|
93
|
+
// calibrate command (meta-loop: estimation-accuracy arithmetic)
|
|
94
|
+
program
|
|
95
|
+
.command('calibrate')
|
|
96
|
+
.description('Compute calibration metrics (point/time ratios, deviation flags, velocity stability)')
|
|
97
|
+
.option('--sprint <id>', 'Sprint to pull calibration rows for (queries the SQLite store)')
|
|
98
|
+
.option('--db', 'Use all calibration rows from the store (no sprint filter)')
|
|
99
|
+
.option('--db-path <path>', 'Database path (defaults to config)')
|
|
100
|
+
.option('--data <path>', 'Explicit calibration rows (YAML/JSON); overrides the db source')
|
|
101
|
+
.option('--velocity-history <path>', 'Explicit velocity history (YAML/JSON) to pair with --data')
|
|
102
|
+
.option('--deviation-threshold <n>', 'Pairwise deviation flag threshold (default 0.5)')
|
|
103
|
+
.option('--cv-threshold <n>', 'Velocity coefficient-of-variation instability threshold (default 0.3)')
|
|
104
|
+
.action(async (options) => {
|
|
105
|
+
const { calibrateCmd } = await import('../src/commands/calibrate.js');
|
|
106
|
+
await calibrateCmd(options);
|
|
107
|
+
});
|
|
108
|
+
|
|
109
|
+
// validate-sprint command (meta-loop: consistency cross-checks)
|
|
110
|
+
program
|
|
111
|
+
.command('validate-sprint')
|
|
112
|
+
.description('Cross-check sprint status YAML and backlog for consistency (sprint-plan.md Step 6)')
|
|
113
|
+
.requiredOption('--status <path>', 'Sprint status YAML/JSON (machine-readable companion to the plan)')
|
|
114
|
+
.requiredOption('--backlog <path>', 'Backlog file (YAML/JSON)')
|
|
115
|
+
.option('--plan <path>', 'Optional structured plan (JSON/YAML), e.g. sprint-pack output; defaults to deriving from --status')
|
|
116
|
+
.action(async (options) => {
|
|
117
|
+
const { validateSprintCmd } = await import('../src/commands/validate-sprint.js');
|
|
118
|
+
await validateSprintCmd(options);
|
|
119
|
+
});
|
|
120
|
+
|
|
121
|
+
// rejection-cap command (code-owned rejection cap for the prose/Codex shell)
|
|
122
|
+
program
|
|
123
|
+
.command('rejection-cap')
|
|
124
|
+
.description('Track and enforce the per-story rejection cap in code (exits non-zero when tripped)')
|
|
125
|
+
.requiredOption('--story <id>', 'Story identifier')
|
|
126
|
+
.requiredOption('--gate <gate>', 'Gate name (readiness | critic | judge)')
|
|
127
|
+
.option('--agent <name>', 'Responsible agent the rejection is routed to (defaults to the gate)')
|
|
128
|
+
.option('--max <n>', 'Cap (max rejection cycles); defaults to 5')
|
|
129
|
+
.option('--increment', 'Record a new rejection (bump the counter), then report')
|
|
130
|
+
.option('--reset', 'Clear all counters for the story (call at a story boundary), then report')
|
|
131
|
+
.option('--state <path>', 'State file path (defaults to .valent-pipeline/rejection-state.json)')
|
|
132
|
+
.action(async (options) => {
|
|
133
|
+
const { rejectionCapCmd } = await import('../src/commands/rejection-cap.js');
|
|
134
|
+
await rejectionCapCmd(options);
|
|
135
|
+
});
|
|
136
|
+
|
|
57
137
|
// db commands
|
|
58
138
|
const dbCmd = program
|
|
59
139
|
.command('db')
|
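The `sprint-pack` command registered above is described as greedy story packing by priority within a velocity budget. The sketch below illustrates that idea only. It is not the code in `src/lib/sprint.js`, which this diff does not show. The story fields (`id`, `priority`, `points`), the convention that a lower priority number is more important, and the choice to skip an oversized story and keep going are all assumptions.

```javascript
// Hypothetical sketch of greedy sprint packing; not the packaged implementation.
// Assumed story shape: { id: string, priority: number, points: number },
// where a lower priority number means "more important".
function packSprint(stories, velocity) {
  // Sort a copy so the caller's array is untouched; ties break on id
  // so the result does not depend on input order.
  const ordered = [...stories].sort(
    (a, b) => a.priority - b.priority || a.id.localeCompare(b.id)
  );

  const packed = [];
  const deferred = [];
  let used = 0;

  for (const story of ordered) {
    if (used + story.points <= velocity) {
      // Story fits in the remaining budget: take it.
      packed.push(story.id);
      used += story.points;
    } else {
      // Story does not fit: defer it and keep scanning smaller ones.
      deferred.push(story.id);
    }
  }
  return { packed, deferred, used, remaining: velocity - used };
}

const plan = packSprint(
  [
    { id: 'S-3', priority: 2, points: 5 },
    { id: 'S-1', priority: 1, points: 8 },
    { id: 'S-2', priority: 1, points: 3 },
    { id: 'S-4', priority: 3, points: 2 },
  ],
  13
);
console.log(plan);
```

With a budget of 13 points, this sketch packs `S-1`, `S-2`, and `S-4` and defers `S-3`. A stricter variant could stop at the first story that does not fit instead of skipping it; which behaviour the real command uses is not visible here.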
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "valent-pipeline",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.4.1",
|
|
4
4
|
"description": "v3 multi-agent AI pipeline for software development lifecycle",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"bin": {
|
|
@@ -16,14 +16,16 @@
|
|
|
16
16
|
"skills/"
|
|
17
17
|
],
|
|
18
18
|
"scripts": {
|
|
19
|
-
"test": "node scripts/test-local.js",
|
|
20
|
-
"prepublishOnly": "node scripts/test-local.js"
|
|
19
|
+
"test": "node scripts/test-local.js && node scripts/test-workflow.js",
|
|
20
|
+
"prepublishOnly": "node scripts/test-local.js && node scripts/test-workflow.js"
|
|
21
21
|
},
|
|
22
22
|
"dependencies": {
|
|
23
|
+
"ajv": "^8.20.0",
|
|
24
|
+
"better-sqlite3": "^11.0.0",
|
|
23
25
|
"commander": "^12.0.0",
|
|
24
26
|
"inquirer": "^9.0.0",
|
|
25
|
-
"
|
|
26
|
-
"
|
|
27
|
+
"sqlite-vec": "^0.1.0",
|
|
28
|
+
"yaml": "^2.9.0"
|
|
27
29
|
},
|
|
28
30
|
"keywords": [
|
|
29
31
|
"ai",
|
package/pipeline/agents-manifest.yaml
CHANGED
|
@@ -75,7 +75,7 @@ agents:
|
|
|
75
75
|
|
|
76
76
|
readiness:
|
|
77
77
|
name: READINESS
|
|
78
|
-
model:
|
|
78
|
+
model: opus
|
|
79
79
|
lifecycle: per-story
|
|
80
80
|
role: "Spec quality gate — validates reqs, UXA spec, and test specs are implementation-ready"
|
|
81
81
|
prompt_template: .valent-pipeline/prompts/readiness.md
|
|
@@ -85,8 +85,8 @@ agents:
|
|
|
85
85
|
|
|
86
86
|
bend:
|
|
87
87
|
name: BEND
|
|
88
|
-
model:
|
|
89
|
-
lifecycle: per-
|
|
88
|
+
model: opus
|
|
89
|
+
lifecycle: per-sprint # persists across stories; receives [STORY-RESET] between stories
|
|
90
90
|
role: "Backend developer — implements production code and tests"
|
|
91
91
|
prompt_template: .valent-pipeline/prompts/bend.md
|
|
92
92
|
reads_from: [reqs-brief.md, qa-test-spec.md]
|
|
@@ -95,8 +95,8 @@ agents:
|
|
|
95
95
|
|
|
96
96
|
fend:
|
|
97
97
|
name: FEND
|
|
98
|
-
model:
|
|
99
|
-
lifecycle: per-
|
|
98
|
+
model: opus
|
|
99
|
+
lifecycle: per-sprint # persists across stories; receives [STORY-RESET] between stories
|
|
100
100
|
role: "Frontend developer — implements UI components and tests"
|
|
101
101
|
prompt_template: .valent-pipeline/prompts/fend.md
|
|
102
102
|
reads_from: [reqs-brief.md, uxa-spec.md, qa-test-spec.md]
|
|
@@ -105,8 +105,8 @@ agents:
|
|
|
105
105
|
|
|
106
106
|
mobile:
|
|
107
107
|
name: MOBILE
|
|
108
|
-
model:
|
|
109
|
-
lifecycle: per-
|
|
108
|
+
model: opus
|
|
109
|
+
lifecycle: per-sprint # persists across stories; receives [STORY-RESET] between stories
|
|
110
110
|
role: "Mobile developer — implements RN/Flutter screens, components, Maestro E2E flows"
|
|
111
111
|
prompt_template: .valent-pipeline/prompts/mobile.md
|
|
112
112
|
reads_from: [reqs-brief.md, uxa-spec.md, qa-test-spec.md]
|
|
@@ -115,8 +115,8 @@ agents:
|
|
|
115
115
|
|
|
116
116
|
data:
|
|
117
117
|
name: DATA
|
|
118
|
-
model:
|
|
119
|
-
lifecycle: per-
|
|
118
|
+
model: opus
|
|
119
|
+
lifecycle: per-sprint # persists across stories; receives [STORY-RESET] between stories
|
|
120
120
|
role: "Data pipeline developer — implements ETL, transforms, data quality, checkpointing"
|
|
121
121
|
prompt_template: .valent-pipeline/prompts/data.md
|
|
122
122
|
reads_from: [reqs-brief.md, qa-test-spec.md]
|
|
@@ -125,8 +125,8 @@ agents:
|
|
|
125
125
|
|
|
126
126
|
mcp_dev:
|
|
127
127
|
name: MCP-DEV
|
|
128
|
-
model:
|
|
129
|
-
lifecycle: per-
|
|
128
|
+
model: opus
|
|
129
|
+
lifecycle: per-sprint # persists across stories; receives [STORY-RESET] between stories
|
|
130
130
|
role: "Protocol developer — implements MCP server tools, JSON-RPC handlers, transport"
|
|
131
131
|
prompt_template: .valent-pipeline/prompts/mcp-dev.md
|
|
132
132
|
reads_from: [reqs-brief.md, qa-test-spec.md]
|
|
@@ -135,8 +135,8 @@ agents:
|
|
|
135
135
|
|
|
136
136
|
libdev:
|
|
137
137
|
name: LIBDEV
|
|
138
|
-
model:
|
|
139
|
-
lifecycle: per-
|
|
138
|
+
model: opus
|
|
139
|
+
lifecycle: per-sprint # persists across stories; receives [STORY-RESET] between stories
|
|
140
140
|
role: "Library developer — implements public API, exports, packaging, type declarations"
|
|
141
141
|
prompt_template: .valent-pipeline/prompts/libdev.md
|
|
142
142
|
reads_from: [reqs-brief.md, qa-test-spec.md]
|
|
@@ -145,8 +145,8 @@ agents:
|
|
|
145
145
|
|
|
146
146
|
docgen:
|
|
147
147
|
name: DOCGEN
|
|
148
|
-
model:
|
|
149
|
-
lifecycle: per-
|
|
148
|
+
model: opus
|
|
149
|
+
lifecycle: per-sprint # persists across stories; receives [STORY-RESET] between stories
|
|
150
150
|
role: "Document generation developer — implements templates, render pipeline, output formatting"
|
|
151
151
|
prompt_template: .valent-pipeline/prompts/docgen.md
|
|
152
152
|
reads_from: [reqs-brief.md, qa-test-spec.md]
|
|
@@ -155,8 +155,8 @@ agents:
|
|
|
155
155
|
|
|
156
156
|
iac:
|
|
157
157
|
name: IAC
|
|
158
|
-
model:
|
|
159
|
-
lifecycle: per-
|
|
158
|
+
model: opus
|
|
159
|
+
lifecycle: per-sprint # persists across stories; receives [STORY-RESET] between stories
|
|
160
160
|
role: "Infrastructure developer — implements IaC definitions, deployment configs, infrastructure tests"
|
|
161
161
|
prompt_template: .valent-pipeline/prompts/iac.md
|
|
162
162
|
reads_from: [reqs-brief.md, qa-test-spec.md]
|
|
@@ -165,7 +165,7 @@ agents:
|
|
|
165
165
|
critic:
|
|
166
166
|
name: CRITIC
|
|
167
167
|
model: opus
|
|
168
|
-
lifecycle: per-
|
|
168
|
+
lifecycle: per-sprint # persists across stories; receives [STORY-RESET] between stories
|
|
169
169
|
role: "Code reviewer — 3-pass adversarial review of production and test code"
|
|
170
170
|
prompt_template: .valent-pipeline/prompts/critic.md
|
|
171
171
|
review_passes: [blind-hunt, edge-case-hunt, acceptance-audit, triage]
|
|
@@ -174,8 +174,8 @@ agents:
|
|
|
174
174
|
|
|
175
175
|
qa_b:
|
|
176
176
|
name: QA-B
|
|
177
|
-
model:
|
|
178
|
-
lifecycle: per-
|
|
177
|
+
model: opus
|
|
178
|
+
lifecycle: per-sprint # persists across stories; receives [STORY-RESET] between stories
|
|
179
179
|
role: "Test executor — runs tests, validates spec alignment, files bugs"
|
|
180
180
|
prompt_template: .valent-pipeline/prompts/qa-b.md
|
|
181
181
|
reads_from: [qa-test-spec.md, critic-review.md, reqs-brief.md]
|
|
@@ -184,23 +184,13 @@ agents:
|
|
|
184
184
|
|
|
185
185
|
judge:
|
|
186
186
|
name: JUDGE
|
|
187
|
-
model:
|
|
188
|
-
lifecycle: per-
|
|
187
|
+
model: opus
|
|
188
|
+
lifecycle: per-sprint # persists across stories; receives [STORY-RESET] between stories
|
|
189
189
|
role: "Final quality gate — bug priority review + evidence-based ship decision"
|
|
190
190
|
prompt_template: .valent-pipeline/prompts/judge.md
|
|
191
191
|
reads_from: [execution-report.md, traceability-matrix.md, pmcp-evidence.md, bugs.md, qa-test-spec.md] # critic-review.md intentionally excluded — JUDGE validates test/execution evidence, not code review; qa-test-spec.md used as reference for assertion cross-check
|
|
192
192
|
writes_to: [judge-review.md, judge-decision.md, story-report.md]
|
|
193
193
|
|
|
194
|
-
knowledge:
|
|
195
|
-
name: Knowledge
|
|
196
|
-
model: haiku
|
|
197
|
-
lifecycle: per-story
|
|
198
|
-
role: "Knowledge retrieval — answers queries from persistent data sources"
|
|
199
|
-
prompt_template: .valent-pipeline/prompts/knowledge.md
|
|
200
|
-
data_sources: [chromadb, curated-knowledge-files, correction-directives]
|
|
201
|
-
context_variables: [knowledge_mode, chromadb_host, chromadb_collection_prefix, curated_files_path, correction_directives]
|
|
202
|
-
# No writes_to — Knowledge Agent responds via inbox only, no file output
|
|
203
|
-
|
|
204
194
|
ephemeral_agents:
|
|
205
195
|
pmcp:
|
|
206
196
|
name: PMCP
|
|
@@ -223,7 +213,7 @@ ephemeral_agents:
|
|
|
223
213
|
|
|
224
214
|
retrospective:
|
|
225
215
|
name: Retrospective
|
|
226
|
-
model:
|
|
216
|
+
model: opus
|
|
227
217
|
role: "Batch reviewer — analyzes last N stories for recurring patterns"
|
|
228
218
|
prompt_template: .valent-pipeline/prompts/retrospective.md
|
|
229
219
|
spawned_by: lead
|
package/pipeline/docs/design/provider-adapter-guide.md
CHANGED
|
@@ -24,11 +24,9 @@ pipeline/
|
|
|
24
24
|
claude-code/
|
|
25
25
|
runtime.md ← PROVIDER — Claude Code runtime operations
|
|
26
26
|
spawn.template.md ← PROVIDER — Claude Code spawn template
|
|
27
|
-
knowledge-spawn.template.md ← PROVIDER — Claude Code knowledge spawn
|
|
28
27
|
codex/
|
|
29
28
|
runtime.md ← PROVIDER — Codex runtime operations
|
|
30
29
|
spawn.template.md ← PROVIDER — Codex spawn template
|
|
31
|
-
knowledge-spawn.template.md ← PROVIDER — Codex knowledge spawn
|
|
32
30
|
AGENTS.md ← PROVIDER — Codex repo-level instructions
|
|
33
31
|
cloud-task-protocol.md ← PROVIDER — Codex cloud execution protocol
|
|
34
32
|
cloud-task-prompts/ ← PROVIDER — Codex cloud task templates
|
|
@@ -56,7 +54,8 @@ Lead's prompt (`lead.md`) defines WHEN and WHY. The runtime adapter defines HOW.
|
|
|
56
54
|
|------|---------|
|
|
57
55
|
| `runtime.md` | All runtime operations: initialization, task registry, agent spawning, signal delivery, monitoring, teardown |
|
|
58
56
|
| `spawn.template.md` | Agent spawn prompt template — what each agent instance receives at startup |
|
|
59
|
-
|
|
57
|
+
|
|
58
|
+
> **Note:** Knowledge is a self-service skill (`valent-knowledge`), not an agent — there is no `knowledge-spawn.template.md`. `scripts/validate-provider-sync.js` enforces this inventory (spawn-template parity + manifest prompt resolution).
|
|
60
59
|
|
|
61
60
|
### Codex-Only Files
|
|
62
61
|
|
|
@@ -133,7 +132,6 @@ Entirely shared. Quality gates are orchestration logic (when to reject, where to
|
|
|
133
132
|
1. Create `providers/{new-provider}/` with:
|
|
134
133
|
- `runtime.md` — all runtime operations for the new provider
|
|
135
134
|
- `spawn.template.md` — spawn template adapted for the provider's agent model
|
|
136
|
-
- `knowledge-spawn.template.md` — knowledge spawn adapted
|
|
137
135
|
|
|
138
136
|
2. Update `src/lib/config-schema.js`:
|
|
139
137
|
- Add new provider to `validProviders` array
|
|
@@ -158,9 +156,10 @@ Entirely shared. Quality gates are orchestration logic (when to reject, where to
|
|
|
158
156
|
|
|
159
157
|
The `scripts/validate-provider-sync.js` script runs in CI before every publish. It checks:
|
|
160
158
|
|
|
161
|
-
1. **Template parity** — `spawn.template.md` and
|
|
162
|
-
2. **
|
|
163
|
-
3. **
|
|
159
|
+
1. **Template parity** — `spawn.template.md` exists in both providers, and any `*spawn.template.md` in one provider has a counterpart in the other
|
|
160
|
+
2. **Manifest integrity** — every `prompt_template` declared in `agents-manifest.yaml` resolves to a real file
|
|
161
|
+
3. **Agent coverage** — both runtime.md files reference the critical agents (REQS, BEND, FEND, CRITIC, QA-B, JUDGE)
|
|
162
|
+
4. **Structural consistency** — both runtime.md files have the major sections (Initialization, Task Registry, Agent Spawning, Signal Delivery, Monitoring, Teardown)
|
|
164
163
|
|
|
165
164
|
If any check fails, the publish is blocked. Fix the discrepancy, then re-push.
|
|
166
165
|
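The "Template parity" check listed above can be pictured as a comparison of filename lists per provider. The following is a minimal sketch under that assumption, not the contents of `scripts/validate-provider-sync.js`. The function name, the input shape, and the error strings are hypothetical.

```javascript
// Hypothetical sketch of a spawn-template parity check; not the real script.
// Input: an object mapping provider name -> list of filenames in its directory.
function checkTemplateParity(providerFiles) {
  const isSpawnTemplate = (name) => name.endsWith('spawn.template.md');
  const providers = Object.keys(providerFiles);
  const errors = [];

  for (const provider of providers) {
    const files = providerFiles[provider];

    // Every provider must ship the base spawn template.
    if (!files.includes('spawn.template.md')) {
      errors.push(`${provider}: missing spawn.template.md`);
    }

    // Any *spawn.template.md in one provider needs a counterpart in the others.
    for (const name of files.filter(isSpawnTemplate)) {
      for (const other of providers) {
        if (other !== provider && !providerFiles[other].includes(name)) {
          errors.push(`${other}: missing counterpart for ${provider}/${name}`);
        }
      }
    }
  }
  return errors;
}

const inSync = checkTemplateParity({
  'claude-code': ['runtime.md', 'spawn.template.md'],
  codex: ['runtime.md', 'spawn.template.md', 'AGENTS.md'],
});

const outOfSync = checkTemplateParity({
  'claude-code': ['runtime.md', 'spawn.template.md'],
  codex: ['runtime.md', 'spawn.template.md', 'knowledge-spawn.template.md'],
});

console.log(inSync, outOfSync);
```

The second call shows why deleting `knowledge-spawn.template.md` from only one provider would fail this kind of check: the leftover file has no counterpart.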
|
package/pipeline/docs/knowledge-system.md
CHANGED
|
@@ -4,7 +4,7 @@ Reference documentation for the v3 pipeline knowledge subsystem -- how the pipel
|
|
|
4
4
|
|
|
5
5
|
## 1. Architecture Overview
|
|
6
6
|
|
|
7
|
-
The knowledge system has three data sources,
|
|
7
|
+
The knowledge system has three data sources, two curation agents, and one principle: the Retrospective Agent is the sole gatekeeper for what enters persistent knowledge. Agents self-serve from these data sources directly during their read-inputs step — there is no separate Knowledge Agent.
|
|
8
8
|
|
|
9
9
|
### Data Sources
|
|
10
10
|
|
|
@@ -12,15 +12,14 @@ The knowledge system has three data sources, three agents, and one principle: th
|
|
|
12
12
|
|--------|--------|---------|
|
|
13
13
|
| **Curated knowledge files** | Markdown in `.valent-pipeline/knowledge/curated/` | Conventions, validated patterns, known pitfalls, test stability data |
|
|
14
14
|
| **Correction directives** | YAML in `.valent-pipeline/knowledge/correction-directives.yaml` | Behavioral changes for agents -- translates observations into prompt-level guidance |
|
|
15
|
-
| **
|
|
15
|
+
| **SQLite database** (optional) | SQLite via CLI | Indexed artifacts, full-text search, cross-story queries |
|
|
16
16
|
|
|
17
|
-
### Agents
|
|
17
|
+
### Curation Agents
|
|
18
18
|
|
|
19
19
|
| Agent | Model | Lifecycle | Role |
|
|
20
20
|
|-------|-------|-----------|------|
|
|
21
|
-
| **Knowledge** | Haiku | Per-story | Reads all three sources, responds to teammate queries via inbox |
|
|
22
21
|
| **Retrospective** | Sonnet | Ephemeral (every N stories) | Sole gatekeeper -- analyzes batch outputs, writes correction directives and embed instructions |
|
|
23
|
-
| **Embed** | Haiku | Ephemeral (after Retrospective) | Executes indexing instructions -- writes to
|
|
22
|
+
| **Embed** | Haiku | Ephemeral (after Retrospective) | Executes indexing instructions -- writes to curated files and/or SQLite |
|
|
24
23
|
|
|
25
24
|
### Data Flow
|
|
26
25
|
|
|
@@ -33,14 +32,13 @@ Retrospective Agent
|
|
|
33
32
|
|--- writes ---> embed-instructions.md
|
|
34
33
|
v
|
|
35
34
|
Embed Agent
|
|
36
|
-
|--- indexes --->
|
|
35
|
+
|--- indexes ---> SQLite database (if configured)
|
|
37
36
|
|--- writes ---> .valent-pipeline/knowledge/curated/ files
|
|
38
37
|
v
|
|
39
|
-
|
|
38
|
+
Pipeline agents (next story)
|
|
40
39
|
|--- reads ---> correction directives (active only)
|
|
41
40
|
|--- reads ---> curated files
|
|
42
|
-
|--- queries --->
|
|
43
|
-
|--- responds --> teammate queries via inbox
|
|
41
|
+
|--- queries ---> SQLite (if configured)
|
|
44
42
|
```
|
|
45
43
|
|
|
46
44
|
---
|
|
@@ -118,13 +116,13 @@ No per-story indexing occurs. This is the core design decision that prevents ind
|
|
|
118
116
|
|
|
119
117
|
6. **Lead spawns Embed Agent** after Retrospective completes. Embed reads the manifest and executes indexing. No lead interpretation needed.
|
|
120
118
|
|
|
121
|
-
7. **
|
|
119
|
+
7. **Pipeline agents** (next story) read active correction directives and curated files directly during their read-inputs step.
|
|
122
120
|
|
|
123
121
|
---
|
|
124
122
|
|
|
125
123
|
## 4. RAG Assessment Framework
|
|
126
124
|
|
|
127
|
-
Before investing further in ChromaDB-based RAG, run a Knowledge Retrieval Audit after 5-10 stories with the
|
|
125
|
+
Before investing further in ChromaDB-based RAG, run a Knowledge Retrieval Audit after 5-10 stories with the knowledge system active.
|
|
128
126
|
|
|
129
127
|
### Three Failure Modes
|
|
130
128
|
|
|
@@ -132,15 +130,15 @@ Before investing further in ChromaDB-based RAG, run a Knowledge Retrieval Audit
|
|
|
132
130
|
|
|
133
131
|
2. **Index pollution.** Without garbage collection or versioning, ChromaDB collections accumulate stale and contradictory entries. The Retrospective-gated curation directly addresses this.
|
|
134
132
|
|
|
135
|
-
3. **Brief quality.** Does BEND perform measurably better with
|
|
133
|
+
3. **Brief quality.** Does BEND perform measurably better with knowledge context than without it? If not, those 2-3k tokens of context are displacing something more useful.
|
|
136
134
|
|
|
137
135
|
### Assessment Questions
|
|
138
136
|
|
|
139
137
|
| Question | How to Measure | Implication |
|
|
140
138
|
|----------|---------------|-------------|
|
|
141
|
-
| Do agents actually
|
|
139
|
+
| Do agents actually use knowledge data during tasks? | Check if agents reference knowledge sources in their frontmatter across last 10 stories | If near-zero, agents are not finding it useful |
|
|
142
140
|
| Do startup briefs reduce rejection cycles? | Compare CRITIC rejection rates for stories with vs without relevant prior patterns | If no difference, briefs are not helping |
|
|
143
|
-
| Are retrieval results relevant? | Sample 20
|
|
141
|
+
| Are retrieval results relevant? | Sample 20 knowledge queries, manually rate top-3 results for relevance | If <50% relevant, embedding strategy needs work |
|
|
144
142
|
| Is index pollution growing? | Count contradictory entries in `corrections` collection | If significant, need versioning/expiry |
|
|
145
143
|
|
|
146
144
|
### Three Possible Outcomes
|
|
@@ -152,13 +150,13 @@ Before investing further in ChromaDB-based RAG, run a Knowledge Retrieval Audit
|
|
|
152
150
|
|
|
153
151
|
**B. RAG is noise -- simplify to curated context:**
|
|
154
152
|
- Replace ChromaDB with curated knowledge files maintained by Retrospective Agent
|
|
155
|
-
- Knowledge
|
|
153
|
+
- Knowledge becomes simple file reading, not a retrieval system
|
|
156
154
|
- Cheaper, more predictable, easier to debug
|
|
157
155
|
|
|
158
156
|
**C. RAG is partially working -- hybrid approach:**
|
|
159
157
|
- Keep ChromaDB for `source-code` and `build-patterns` collections (embedding similarity works for code)
|
|
160
158
|
- Move `corrections`, `conventions`, and `qa-lessons` to curated files (human-readable, not embedding-dependent)
|
|
161
|
-
-
|
|
159
|
+
- Agents use both: curated files for startup briefs, ChromaDB for on-demand "find similar code" queries
|
|
162
160
|
|
|
163
161
|
---
|
|
164
162
|
|
|
@@ -168,7 +166,7 @@ Configured via `knowledge.mode` in `pipeline-config.yaml`.
|
|
|
168
166
|
|
|
169
167
|
### `none` (default)
|
|
170
168
|
|
|
171
|
-
-
|
|
169
|
+
- Agents read curated files + correction directives only
|
|
172
170
|
- Embed Agent IS triggered but only writes to curated files (no ChromaDB operations)
|
|
173
171
|
- Zero external dependencies
|
|
174
172
|
- ChromaDB can be added later without pipeline changes
|
|
@@ -176,7 +174,7 @@ Configured via `knowledge.mode` in `pipeline-config.yaml`.
|
|
|
176
174
|
### `local-docker`
|
|
177
175
|
|
|
178
176
|
- ChromaDB runs locally via `docker compose -f .valent-pipeline/docker-compose.chromadb.yml up -d`
|
|
179
|
-
-
|
|
177
|
+
- Agents can connect to ChromaDB at the configured `chromadb_host` (typically `http://localhost:8000`)
|
|
180
178
|
- Falls back to curated-only mode if ChromaDB is unreachable
|
|
181
179
|
- Embed Agent indexes into both ChromaDB collections and curated files
|
|
182
180
|
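The two modes shown above (`none`, and `local-docker` with its fallback to curated-only mode) amount to a small selection rule. The sketch below states that rule as a pure function. The function name, the source labels, and passing reachability in as a boolean are assumptions made for illustration; other `knowledge.mode` values may exist that this diff does not show.

```javascript
// Hypothetical sketch of knowledge-source selection; names are illustrative.
// `mode` mirrors `knowledge.mode`; `chromaReachable` is the result of some
// connectivity probe that the caller performs (not shown here).
function selectKnowledgeSources(mode, chromaReachable) {
  // Curated files and correction directives are read in every mode.
  const sources = ['curated-files', 'correction-directives'];

  // Only `local-docker` adds ChromaDB, and only when it can be reached;
  // otherwise the agent falls back to curated-only mode.
  if (mode === 'local-docker' && chromaReachable) {
    sources.push('chromadb');
  }
  return sources;
}

console.log(selectKnowledgeSources('none', false));
console.log(selectKnowledgeSources('local-docker', true));
console.log(selectKnowledgeSources('local-docker', false));
```

Keeping the probe outside the function makes the fallback easy to test without a running ChromaDB instance.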
|
package/pipeline/docs/lead-lifecycle.md
CHANGED
|
@@ -8,9 +8,7 @@
|
|
|
8
8
|
|
|
9
9
|
### Persistent vs Per-Story Agents
|
|
10
10
|
|
|
11
|
-
The lead is the **only persistent agent** in the pipeline. It carries `pipeline-state.json` and backlog position forward across stories. All other agents (REQS, UXA, QA-A, BEND, FEND, CRITIC, QA-B, READINESS, JUDGE
|
|
12
|
-
|
|
13
|
-
The Knowledge Agent's value is in its persistent data sources (ChromaDB collections and curated knowledge files on disk), not its conversation history. A fresh spawn reads from the same store.
|
|
11
|
+
The lead is the **only persistent agent** in the pipeline. It carries `pipeline-state.json` and backlog position forward across stories. All other agents (REQS, UXA, QA-A, BEND, FEND, CRITIC, QA-B, READINESS, JUDGE) are **per-story** -- spawned fresh when a story starts, torn down when it ships. Knowledge is self-served by each agent directly from curated files and correction directives on disk.
|
|
14
12
|
|
|
15
13
|
Ephemeral agents (PMCP, Embed, Retrospective, Help) are spawned on-demand for a specific task and killed when done. They are not teammates -- they do not receive inbox messages mid-story.
|
|
16
14
|
|
|
@@ -35,7 +33,7 @@ The lead validates the story input before spawning any teammates.
|
|
|
35
33
|
- **Trigger map** -- enables UXA strategic validation (driving force cross-referencing). Without it, UXA runs in translation-only mode.
|
|
36
34
|
- **Scenario outlines** -- enables scenario-driven UXA specs.
|
|
37
35
|
- **Architecture decisions** -- enables REQS to incorporate technical constraints.
|
|
38
|
-
- **Existing project context** -- codebase documentation, conventions, prior patterns. Loaded
|
|
36
|
+
- **Existing project context** -- codebase documentation, conventions, prior patterns. Loaded from curated knowledge files.
|
|
39
37
|
|
|
40
38
|
If required fields are missing, the story is rejected via CLI escalation (see Backlog Management below).
|
|
41
39
|
|
|
@@ -120,7 +118,7 @@ All code committed and pushed to the branch specified by the user. The pipeline
 2. Code committed and pushed to user-specified branch
 3. All agent outputs persist in the story folder (handoff files, reviews, bug reports, execution reports, PMCP evidence)
 4. Lead writes `story-report.md`: task completion times, rejection cycles, cost metrics
-5. Lead tears down all story teammates
+5. Lead tears down all story teammates
 6. Lead persists -- carries pipeline state and backlog position forward
 7. Lead picks next story from backlog and returns to Phase 1 with a fresh story team
 
@@ -256,13 +254,6 @@ The lead manages the backlog as a dependency-aware queue, not a simple FIFO list
 - "You are replacing a crashed agent. Steps completed: [from frontmatter]. Prior work: [from handoff files on disk]. Resume from step: [next incomplete step]."
 7. Fresh teammate picks up from where the crashed agent left off
 
-### Crash Type: Knowledge Agent Crashes
-
-1. Spawn a new Knowledge Agent with the same role definition
-2. New agent has immediate access to ChromaDB and curated knowledge files (both on disk)
-3. On-demand queries are stateless by design -- no conversation history needed
-4. The Knowledge Agent is killed and respawned fresh per story anyway, so mid-story crashes are the only case that matters
-
 ### Crash Type: Lead Crashes
 
 1. Human restarts the lead (this is the one case requiring manual intervention)
@@ -27,7 +27,6 @@ The v3 pipeline splits into three categories of files:
 | `.valent-pipeline/task-graphs/frontend-only.yaml` | Pipeline infrastructure | Shipped with package |
 | `.valent-pipeline/spawn-templates/pipeline-context.template.md` | Pipeline infrastructure | Shipped with package; filled at runtime |
 | `.valent-pipeline/spawn-templates/agent-spawn.template.md` | Pipeline infrastructure | Shipped with package |
-| `.valent-pipeline/spawn-templates/knowledge-spawn.template.md` | Pipeline infrastructure | Shipped with package |
 | `.valent-pipeline/agents-manifest.yaml` | Pipeline infrastructure | Shipped with package; models section overridable via project config |
 | `.valent-pipeline/scripts/embed.ts` | Pipeline infrastructure | Shipped with package |
 | `.valent-pipeline/docker-compose.chromadb.yml` | Pipeline infrastructure | Shipped with package |
@@ -278,5 +278,5 @@ The 16 templates in `.valent-pipeline/templates/`, mapped to their producing age
 | `judge-decision.template.md` | JUDGE | Lead |
 | `story-report.template.md` | Lead | User |
 | `pmcp-evidence.template.md` | PMCP | JUDGE |
-| `retrospective.template.md` | Retrospective Agent | Lead,
+| `retrospective.template.md` | Retrospective Agent | Lead, pipeline agents |
 | `embed-instructions.template.md` | Lead | Embed Agent |
@@ -0,0 +1,99 @@
+# Claude Code orchestrator (native Workflow)
+
+This is the Claude Code deployment of the valent-pipeline orchestrator, per the hybrid
+target in [`../../../docs-feedback/reimplementation-plan.md`](../../../docs-feedback/reimplementation-plan.md)
+(R3): the Claude Code provider runs a deterministic **Workflow script**, while the Codex
+provider keeps the markdown-skill Lead. Both consume the same shared substrate
+(`prompts/`, `steps/`, `task-graphs/`, `schemas/`, templates).
+
+## The three workflows
+
+| File | Step | Role |
+|---|---|---|
+| `plan.workflow.js` | 7 | Groom → size → pack → validate a set of pending stories into a planned sprint batch. Emits a batch shaped to feed straight into `sprint.workflow.js`. |
+| `sprint.workflow.js` | 4 + 6 | Execute a planned batch sequentially through the per-story pipeline with schema-validated gates. |
+| `retro.workflow.js` | 7 | Learn from a shipped batch: calibrate, loop-until-dry aggregate review, gated directives, embed. |
+
+They compose as `plan → sprint → retro`. The per-story pipeline is kept **inline** in
+`sprint.workflow.js` (not a nested `workflow()`), so the single `workflow()` nesting level
+stays free for a future sprint-cycle wrapper to call all three (reimplementation-plan §5b).
+
+## Status
+
+`sprint.workflow.js` implements **Steps 4 + 6** (R1 control flow, R4 gates-as-stages, the
+sprint batch loop, 3b parallel CRITIC, and full spawn-context prompts). `plan.workflow.js`
+and `retro.workflow.js` implement **Step 7**. **Step 8** (resume + state model, below) is
+wired. All three are control-flow-validated by `scripts/test-workflow.js` (21 scenarios,
+incl. a resume-safety lint), but:
+
+- It is **opt-in, not the default.** `skills/valent-run-story` still drives the prose Lead;
+the Workflow runs only when the user opts in (see that skill's "Step 5 (alternative)").
+- It has **not been exercised end-to-end against a live story.** A Workflow runs via the
+Workflow tool against a real project and spawns real agents; it cannot be unit-tested like
+`src/lib/*`. Validate it against a `testResources/*` fixture before making it the default.
+
+## What it demonstrates
+
+| Concern | How | Replaces |
+|---|---|---|
+| DAG resolution | spawns an agent that runs `resolve-graph` (step 2) per story | Lead transcribing + pruning by judgment |
+| Sprint batch | sequential `for`-loop over `args.stories[]` (shared branch ⇒ no overlap) | prose six-phase sprint loop |
+| Quality gates | `runGate()` returns a `verdict.schema`-validated object | prose verdict, unchecked |
+| Pass-invariant | `assertGate()` rejects `pass` + open Highs | KANBAN-002 class |
+| Rejection cap | JS `while` loop, code-owned counter | model-counted circuit breaker |
+| Dev fan-out | `parallel()` barrier before CRITIC | wave/spawn_trigger overlay |
+| 3b CRITIC | `parallel([blind, edge, acceptance])` independent agents → triage barrier | one CRITIC context, passes anchored on each other |
+| Spawn context | `buildPrompt()` mirrors `spawn.template.md` (Setup/Task/Trigger/Completion) | terse inline instructions |
+| Roll-over | a rejected story is recorded and the batch continues | — |
+| Resume | journal (`resumeFromRunId`) | disk-state rehydration + re-decide |
+
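A minimal sketch (not taken from the package) of the pass-invariant and rejection-cap rows in the table above. The verdict fields `status` and `openHighs`, the helper `runWithRejectionCap`, and the synchronous `runGate` callback are hypothetical names chosen for illustration; the shipped `assertGate()` and `verdict.schema.json` may look different, and real agent calls would presumably be asynchronous.

```js
// Hypothetical verdict shape: { status: "pass" | "reject", openHighs: number }.
// The real verdict.schema.json may use different field names.
function assertGate(verdict) {
  if (verdict.status !== "pass" && verdict.status !== "reject") {
    throw new Error(`unknown verdict status: ${verdict.status}`);
  }
  // Pass-invariant: a gate may not pass while High-severity findings are open.
  if (verdict.status === "pass" && verdict.openHighs > 0) {
    throw new Error("invalid verdict: pass with open High findings");
  }
  return verdict;
}

// Rejection cap: the counter is a plain variable owned by the loop,
// so no model has to remember how many cycles have run.
// runGate is a caller-supplied stand-in for an agent call.
function runWithRejectionCap(runGate, maxRejectionCycles) {
  let rejections = 0;
  while (true) {
    const verdict = assertGate(runGate(rejections));
    if (verdict.status === "pass") {
      return { shipped: true, rejections };
    }
    rejections += 1;
    if (rejections >= maxRejectionCycles) {
      return { shipped: false, rejections };
    }
  }
}
```

Because the invariant is checked in code, a malformed verdict fails loudly instead of silently shipping a story.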
+## Args
+
+```js
+// batch form (a planned sprint)
+{ stories: [{ storyId, projectType, profiles }, ...], maxRejectionCycles? }
+// single-story form (back-compat)
+{ storyId, projectType, profiles?, maxRejectionCycles? }
+```
+
+Returns `{ shipped, stories_shipped, stories_rolled_over, results: [{ storyId, shipped, verdict, skipped }] }`.
+
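One way to accept both argument forms is to fold the single-story form into the batch form before looping. The sketch below is an assumption about how that could be done, not the code in `sprint.workflow.js`; `normalizeArgs` is a hypothetical name, and no default for `maxRejectionCycles` is assumed.

```js
// Accepts either documented argument form and returns the batch form.
// normalizeArgs is an illustrative helper, not part of the package.
function normalizeArgs(args) {
  const { maxRejectionCycles } = args;
  if (Array.isArray(args.stories)) {
    // Batch form: pass the planned stories through unchanged.
    return { stories: args.stories, maxRejectionCycles };
  }
  // Single-story form: wrap the one story in a one-element batch.
  const { storyId, projectType, profiles } = args;
  return { stories: [{ storyId, projectType, profiles }], maxRejectionCycles };
}
```

With this shape the batch loop needs only one code path, and a single story is just a batch of length one.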
+## Resume & state model (step 8)
+
+**The journal is the state of record.** Each Workflow invocation returns a `runId`. To resume
+after an interruption (context limit, crash, manual stop, or a mid-run script edit), relaunch
+with `Workflow({ scriptPath, resumeFromRunId })` — **not** a fresh run. The journal replays the
+unchanged prefix of `agent()` calls instantly (same script + args → 100% cache hit) and re-runs
+only from the first changed/new call onward. Already-shipped stories and passed gates are not
+redone. This is the exact form of the durability the prose Lead approximated by re-reading
+`pipeline-state.json` and re-deciding.
+
+`pipeline-state.json`, `sprint-{n}-status.yaml`, and the markdown handoffs are **derived,
+human-readable views** in this path — agents write them for visibility; the orchestrator never
+reads them back to make a control-flow decision (its state lives in JS variables the journal
+captures). Because there is no multi-file state of record, the non-atomic multi-file desync the
+prose Lead can hit (feedback gap #2) is structurally impossible here. *Do not hand-edit a state
+file to resume — pass `resumeFromRunId`.* (The prose Lead path still uses `pipeline-state.json`
+as its mechanism; that's correct for that runtime.)
+
+**Resume-safety is linted.** Journal replay requires a deterministic, side-effect-free script
+body, so `scripts/test-workflow.js` statically rejects `Date.now`/`new Date(`/`Math.random`
+(nondeterminism) and `import`/`require`/`*FileSync`/`process.*` (in-script IO) in all three
+workflow files. All IO goes through agents; that's why resolve-graph/sprint-pack/calibrate/embed
+are invoked *through* an agent rather than imported.
+
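The static check described above could look roughly like the following. This is an illustrative guess, not the lint in `scripts/test-workflow.js`: the `FORBIDDEN` list and `lintResumeSafety` are hypothetical names, and the regular expressions are assumptions derived only from the tokens the README lists. A plain text scan like this also flags matches inside comments and strings, and the `import` pattern only catches statements at the start of a line.

```js
// Tokens come from the README; the regexes are an editor's approximation.
const FORBIDDEN = [
  { name: "Date.now", pattern: /\bDate\.now\b/ },
  { name: "new Date(", pattern: /\bnew\s+Date\s*\(/ },
  { name: "Math.random", pattern: /\bMath\.random\b/ },
  { name: "import", pattern: /^\s*import\b/m },
  { name: "require", pattern: /\brequire\s*\(/ },
  { name: "*FileSync", pattern: /\b\w+FileSync\b/ },
  { name: "process.*", pattern: /\bprocess\./ },
];

// Returns the names of every forbidden construct found in the source text.
// The caller (a test script, not a workflow body) supplies the file contents.
function lintResumeSafety(source) {
  return FORBIDDEN
    .filter(({ pattern }) => pattern.test(source))
    .map(({ name }) => name);
}
```

An empty result means the script body contains none of the listed constructs; any non-empty result would fail the lint.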
+## Known simplifications (next slices)
+
+- A `sprint-cycle.workflow.js` that calls `plan → sprint → retro` via `workflow()` isn't built
+yet; for now run the three workflows in sequence (the plan output feeds the sprint input).
+- Per-story dev fan-out re-runs ALL dev agents on a CRITIC rejection; routing rework to only
+the agent(s) CRITIC targeted (via `rejectionTarget`) is a refinement once run live.
+- No PMCP / visual-validation stage yet; no PM/program-loop workflow (left agent-driven per §5b).
+
+## Runtime constraint that shaped the design
+
+A Workflow script body has **no filesystem or import access** — it cannot read
+`task-graphs/*.yaml`, parse handoffs, or run the CLI directly. All IO is performed by the
+agents it spawns (which have Bash/Read/Write); the script only sequences them and validates
+their structured returns. That is why `resolve-graph` is invoked *through* an agent rather
+than imported.