agent-bober 0.4.3 → 0.5.1
- package/README.md +30 -0
- package/agents/bober-evaluator.md +277 -8
- package/agents/bober-generator.md +155 -0
- package/agents/bober-planner.md +70 -0
- package/dist/cli/commands/init.js +1 -0
- package/dist/cli/commands/init.js.map +1 -1
- package/dist/evaluators/builtin/playwright.d.ts +11 -0
- package/dist/evaluators/builtin/playwright.d.ts.map +1 -1
- package/dist/evaluators/builtin/playwright.js +259 -12
- package/dist/evaluators/builtin/playwright.js.map +1 -1
- package/package.json +1 -1
- package/skills/bober.eval/SKILL.md +145 -148
- package/skills/bober.playwright/SKILL.md +429 -0
- package/skills/bober.playwright/references/playwright-patterns.md +377 -0
- package/skills/bober.run/SKILL.md +425 -118
- package/skills/bober.sprint/SKILL.md +147 -57
- package/templates/presets/nextjs/bober.config.json +2 -1
@@ -4,9 +4,9 @@ description: Full autonomous pipeline — plan a feature, execute all sprints, e
 argument-hint: <task-description>
 ---

-# bober.run —
+# bober.run — Multi-Agent Pipeline Orchestrator

-You are
+You are the **orchestrator** for the bober.run pipeline. You do NOT plan, code, or evaluate yourself. You spawn subagents for each of those roles using the **Agent tool**, coordinate the flow between them, and track progress. Each subagent runs in its own isolated context window, receiving only the information you explicitly pass in its prompt.

 ## Autonomous Mode

@@ -20,206 +20,504 @@ This command is designed to run **fully autonomously** — do NOT stop to ask th

 The user launched this command to walk away and come back to a finished product. Respect that intent.

-##
-
-The pipeline follows this flow:
+## Architecture — True Multi-Agent Orchestration

 ```
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+ORCHESTRATOR (you — this session)
+│
+├─ 1. Read bober.config.json, .bober/principles.md
+├─ 2. Run check-prereqs.sh
+│
+├─ 3. SPAWN planner subagent (Agent tool)
+│ └─ Planner reads codebase, generates PlanSpec + sprint contracts
+│ └─ Saves to .bober/specs/ and .bober/contracts/
+│ └─ Returns: spec ID and contract list
+│
+├─ 4. For each sprint contract:
+│ │
+│ ├─ 4a. Build context handoff (JSON in the prompt)
+│ │ (spec, contract, previous feedback, principles)
+│ │
+│ ├─ 4b. SPAWN generator subagent (Agent tool)
+│ │ └─ Receives handoff as prompt
+│ │ └─ Implements the sprint, commits code
+│ │ └─ Returns: completion report JSON
+│ │
+│ ├─ 4c. SPAWN evaluator subagent (Agent tool)
+│ │ └─ Receives handoff + generator report
+│ │ └─ Runs eval strategies (typecheck, lint, test, playwright)
+│ │ └─ Returns: eval result JSON with pass/fail
+│ │
+│ ├─ 4d. If FAILED and retries < maxIterations:
+│ │ └─ Add evaluator feedback to handoff
+│ │ └─ Go to 4b (spawn FRESH generator with feedback)
+│ │
+│ └─ 4e. If PASSED: update contract status, log, next sprint
+│
+└─ 5. Final summary
 ```

-
+**Critical rules for you as orchestrator:**
+- NEVER do the planning, coding, or evaluating yourself — ALWAYS delegate to subagents via the Agent tool.
+- After spawning a subagent, READ the files it created to get the actual results (the subagent's return value is a summary, but files on disk are the source of truth).
+- Keep your own context clean — only track orchestration state (which sprint, which iteration, pass/fail), not implementation details.
+- Each subagent spawn is a FRESH context — this is the whole point. It prevents context degradation over long pipelines.
+- Log progress to `.bober/progress.md` and `.bober/history.jsonl` between every phase transition.
+- Print clear phase banners so progress is visible in the terminal.
+
+---
+
+## Step 1: Initialize

-### 1a.
+### 1a. Read Project Configuration

 Read `bober.config.json`. If it does not exist:
-- Ask the user the minimal initialization questions: project name, mode (greenfield vs brownfield), and what they are building
-- Determine the appropriate `mode` and `preset` (if any) from the user's description
-- Create `bober.config.json` with appropriate defaults
-- Create the `.bober/` directory structure
+- Ask the user the minimal initialization questions: project name, mode (greenfield vs brownfield), and what they are building.
+- Determine the appropriate `mode` and `preset` (if any) from the user's description.
+- Create `bober.config.json` with appropriate defaults.
+- Create the `.bober/` directory structure.

 If `bober.config.json` exists, read the configuration.

-
+Read `.bober/principles.md` if it exists. You will pass the principles text into every subagent prompt.
+
+### 1b. Run Prerequisites Check
+
+```bash
+bash scripts/check-prereqs.sh
+```
+
+If it fails, report the missing prerequisites and stop.
+
+### 1c. Check for Existing Plans

 Read `.bober/specs/` and `.bober/progress.md`. If there is an existing plan with incomplete sprints:

-- If the user provided a new task description that clearly differs from the existing plan
-- If the user provided no task or a task that matches the existing plan
-- Log your decision but do NOT ask the user — autonomous mode means you decide and move forward
+- If the user provided a new task description that clearly differs from the existing plan: create a new plan (go to Step 2)
+- If the user provided no task or a task that matches the existing plan: resume from the next incomplete sprint (skip to Step 3)
+- Log your decision but do NOT ask the user — autonomous mode means you decide and move forward.
+
+Log event:
+```json
+{"event":"pipeline-started","timestamp":"<ISO-8601>","task":"<task description>"}
+```
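
The `Log event:` entries in this skill are appended to `.bober/history.jsonl`, one JSON object per line. The sketch below shows one way such a logger could look. The helper name `log_event` and its signature are hypothetical and are not part of agent-bober; only the file path and the `event`/`timestamp` fields come from the text above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_event(history_path: Path, event: str, **fields) -> dict:
    """Append one JSON object as a single line to the history file (hypothetical helper)."""
    record = {
        "event": event,
        # ISO-8601 timestamp in UTC, matching the "<ISO-8601>" placeholder
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **fields,
    }
    history_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" keeps earlier events; JSONL means exactly one object per line.
    with history_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

A call such as `log_event(Path(".bober/history.jsonl"), "pipeline-started", task="add login page")` would add one line to the file without touching earlier entries.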
+
+---
+
+## Step 2: Spawn the Planner Subagent
+
+Use the **Agent tool** to spawn a planner subagent.
+
+**How to call the Agent tool:**

-
+```
+Agent tool call:
+description: "Plan feature: <title from task description>"
+prompt: <the full prompt below>
+```

-
+**Build the planner prompt with ALL of these sections:**
+
+```
+You are the Bober Planner subagent. You have been spawned by the orchestrator to create a plan.
+
+## Your Task
+<paste the user's task description here>
+
+## Project Configuration (bober.config.json)
+<paste the full contents of bober.config.json here>
+
+## Project Principles (.bober/principles.md)
+<paste the full contents of .bober/principles.md here, or "No principles file found." if it does not exist>
+
+## Existing Specs
+<list any existing spec IDs from .bober/specs/, or "None" if no prior specs>
+
+## Instructions
+1. Read the codebase to understand the project structure (use Glob and Grep to survey, Read to examine key files).
+2. Generate a PlanSpec with sprint decomposition.
+3. Save the PlanSpec to .bober/specs/<specId>.json
+4. Save each SprintContract to .bober/contracts/<contractId>.json
+5. Update .bober/progress.md with the plan summary.
+6. Append to .bober/history.jsonl: {"event":"plan-created","specId":"...","timestamp":"...","sprintCount":N}
+
+IMPORTANT: You are running as a subagent — do NOT ask clarifying questions. Infer reasonable defaults from the codebase and task description. If something is genuinely ambiguous, document your assumption in the PlanSpec's "assumptions" field.
+
+## Your Response
+When done, respond with EXACTLY this JSON structure (no other text):
+{
+"specId": "<the spec ID you created>",
+"title": "<plan title>",
+"sprintCount": <number>,
+"contractIds": ["<contract-id-1>", "<contract-id-2>", ...],
+"summary": "<2-3 sentence summary of the plan>"
+}
+```

-
-2. Ask 3-5 clarifying questions about the task
-3. Wait for user responses
-4. Generate the PlanSpec with sprint decomposition
-5. Save everything to `.bober/`
+**After the planner subagent returns:**

-
-
-
-
+1. Parse the planner's response to extract `specId` and `contractIds`.
+2. Read `.bober/specs/<specId>.json` to verify it was created.
+3. Read each contract file in `.bober/contracts/` to verify they exist.
+4. Print the plan summary:
+```
+=== PLAN CREATED ===
+Spec: <specId>
+Title: <title>
+Sprints: <count>
+1. <Sprint 1 title>
+2. <Sprint 2 title>
+...
+```
+5. If the planner subagent failed or returned an error, report it and stop the pipeline.
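
The post-planner checks above (parse the reply, then confirm the spec and contract files exist on disk) can be sketched as follows. This is only an illustration: `verify_plan` is a hypothetical helper, and it assumes the planner replied with exactly the JSON structure requested in the prompt.

```python
import json
from pathlib import Path


def verify_plan(response_text: str, bober_dir: Path) -> dict:
    """Parse the planner reply and confirm the files it names exist (hypothetical helper)."""
    # json.loads raises json.JSONDecodeError (a ValueError) on a malformed reply.
    plan = json.loads(response_text)

    expected = [bober_dir / "specs" / f"{plan['specId']}.json"]
    expected += [
        bober_dir / "contracts" / f"{contract_id}.json"
        for contract_id in plan["contractIds"]
    ]

    # Files on disk are the source of truth, so a missing file is a hard failure.
    missing = [str(path) for path in expected if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"planner did not create: {missing}")
    return plan
```

Raising on the first inconsistency matches step 5: a planner failure stops the pipeline instead of letting later sprints run against a plan that is not on disk.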

-
+---

-## Step
+## Step 3: Sprint Execution Loop

 Load the sprint contracts from `.bober/contracts/` in order. For each sprint with status `proposed` or `needs-rework`:

-###
+### 3a. Pre-Sprint Checks

-1. **Verify dependencies:** All sprints in `dependsOn` must have status `completed
-2. **Verify build state:** The project must build before starting a new sprint
+1. **Verify dependencies:** All sprints in `dependsOn` must have status `completed`.
+2. **Verify build state:** The project must build before starting a new sprint.
 ```bash
-# Run configured build/compile command
-# e.g., npm run build, anchor build, forge build, cargo build
+# Run configured build/compile command from bober.config.json commands.build
 ```
-If the build is broken BEFORE the sprint starts, stop and report this to the user.
-3. **Verify git state:** Ensure we are on the correct feature branch
+If the build is broken BEFORE the sprint starts, stop and report this to the user.
+3. **Verify git state:** Ensure we are on the correct feature branch.
 ```bash
 git branch --show-current
 ```
 4. **Check iteration budget:** Read `pipeline.maxIterations` from config. Track total iterations across all sprints. If the budget is exhausted, stop.
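
The dependency check in step 1 can be sketched in a few lines. The `status` and `dependsOn` fields are named in the text above; treating `contractId` as a key on each contract object, and the helper name `ready_contracts`, are assumptions made for illustration.

```python
RUNNABLE = {"proposed", "needs-rework"}


def ready_contracts(contracts: list[dict]) -> list[str]:
    """Return IDs of sprints that may start now (hypothetical helper)."""
    status_by_id = {c["contractId"]: c["status"] for c in contracts}
    ready = []
    for contract in contracts:
        if contract["status"] not in RUNNABLE:
            continue  # already completed or currently in progress
        deps = contract.get("dependsOn", [])
        # Every dependency must exist and be completed before the sprint starts.
        if all(status_by_id.get(dep) == "completed" for dep in deps):
            ready.append(contract["contractId"])
    return ready
```

An unknown dependency ID is treated as unmet here, which errs on the side of not starting a sprint whose prerequisites cannot be confirmed.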

-
+Print phase banner:
+```
+=== SPRINT <N>/<total>: <title> ===
+Iteration: 1 of <maxIterations>
+Budget used: <used>/<max> total iterations
+```
+
+### 3b. Contract Negotiation

 If the sprint status is `proposed`:
-- Review success criteria for executability
-- Verify evaluation strategies are available
-- Adjust criteria if needed
 - Update status to `in-progress`
+- Save the updated contract back to `.bober/contracts/`
+- Log event:
+```json
+{"event":"sprint-started","contractId":"...","specId":"...","timestamp":"..."}
+```

-###
+### 3c. Build the Context Handoff
+
+Build a context handoff JSON. This is the ONLY information the subagent receives — it must be self-contained.
+
+**Context Handoff structure:**
+```json
+{
+"handoffId": "handoff-<contractId>-gen-<iteration>",
+"type": "to-generator",
+"contractId": "<contract ID>",
+"specId": "<spec ID>",
+"timestamp": "<ISO-8601>",
+"iteration": 1,
+"context": {
+"projectOverview": "<Brief project description from PlanSpec>",
+"completedSprints": [
+{
+"contractId": "<ID>",
+"title": "<title>",
+"summary": "<what was built>"
+}
+],
+"currentBranch": "<git branch name>",
+"relevantFiles": ["<key files the generator should read>"]
+},
+"contract": { "<full SprintContract object>" },
+"config": {
+"commands": { "<commands section from bober.config.json>" },
+"generator": { "<generator section from bober.config.json>" }
+},
+"principles": "<full text of .bober/principles.md or null>",
+"evaluatorFeedback": null
+}
+```

-
-- Include the contract, project context, config, and any evaluator feedback (for retries)
-- Include summaries of completed sprints
-- Include relevant file paths
+For retry iterations (iteration > 1), populate `evaluatorFeedback` with the evaluator's failure details.

-
+Save the handoff to `.bober/handoffs/<handoffId>.json`.
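
As a rough illustration of the handoff structure above, the sketch below assembles the object and writes it to `.bober/handoffs/`. The function names `build_handoff` and `save_handoff` are hypothetical; the keys mirror the JSON structure shown in this section.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def build_handoff(contract_id, spec_id, iteration, contract, context, config,
                  principles=None, evaluator_feedback=None) -> dict:
    """Assemble a self-contained generator handoff (hypothetical helper)."""
    if iteration > 1 and evaluator_feedback is None:
        # Retries are only useful if the generator can see what failed.
        raise ValueError("retry iterations need evaluator feedback")
    return {
        "handoffId": f"handoff-{contract_id}-gen-{iteration}",
        "type": "to-generator",
        "contractId": contract_id,
        "specId": spec_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "iteration": iteration,
        "context": context,
        "contract": contract,
        "config": config,
        "principles": principles,
        "evaluatorFeedback": evaluator_feedback,
    }


def save_handoff(handoff: dict, bober_dir: Path) -> Path:
    """Write the handoff to <bober_dir>/handoffs/<handoffId>.json and return the path."""
    path = bober_dir / "handoffs" / f"{handoff['handoffId']}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(handoff, indent=2), encoding="utf-8")
    return path
```

Keeping the handoff as a plain dictionary means the same object can be pasted into the generator prompt and saved to disk without conversion.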

-
-- Read the Generator's completion report
-- Verify commits were made
-- Proceed to evaluation
+### 3d. Spawn the Generator Subagent

-
+Use the **Agent tool** to spawn a generator subagent.

-
-- Include the contract, Generator's report, config
+**How to call the Agent tool:**

-
+```
+Agent tool call:
+description: "Sprint <N>: <sprint title>"
+prompt: <the full prompt below>
+```

-
-- Read the EvalResult
-- Save it to `.bober/eval-results/`
-- Determine pass/fail
+**Build the generator prompt:**

-
+```
+You are the Bober Generator subagent. You have been spawned by the orchestrator to implement a sprint.
+
+## Context Handoff
+<paste the FULL handoff JSON here — this is ALL the context you get>
+
+## Instructions
+1. Read the SprintContract at .bober/contracts/<contractId>.json
+2. Read the PlanSpec at .bober/specs/<specId>.json for broader context
+3. Read bober.config.json for commands configuration
+4. Read .bober/principles.md if it exists — adhere to all principles strictly
+5. Read the files listed in the contract's estimatedFiles
+6. Implement the sprint according to the contract's success criteria
+7. Self-verify: run build, typecheck, lint, and test commands
+8. Commit your changes with proper messages (format: "bober(<sprint-N>): <description>")
+9. Work on the feature branch, never on main/master
+
+<IF iteration > 1>
+## IMPORTANT — This is a RETRY (iteration <N>)
+The previous attempt failed evaluation. Here is the evaluator's feedback:
+<paste evaluator feedback JSON>
+
+Focus on fixing the specific failures listed above. Read the feedback line by line before making any changes.
+</IF>
+
+## Your Response
+When done, respond with EXACTLY this JSON structure (no other text):
+{
+"contractId": "<contract ID>",
+"status": "complete | partial | blocked",
+"criteriaResults": [
+{
+"criterionId": "sc-X-Y",
+"met": true/false,
+"evidence": "<verification evidence>"
+}
+],
+"filesChanged": [
+{
+"path": "<file path>",
+"action": "created | modified | deleted",
+"description": "<what changed>"
+}
+],
+"testsAdded": ["<test file paths>"],
+"commits": ["<hash> - <message>"],
+"blockers": ["<any unresolved issues>"],
+"notes": "<additional context for the evaluator>"
+}
+```
+
+**After the generator subagent returns:**
+
+1. Parse the generator's response to extract the completion report.
+2. Verify commits were made: `git log --oneline -5`
+3. Save the generator report to `.bober/handoffs/gen-report-<contractId>-<iteration>.json`
+4. Log event:
+```json
+{"event":"sprint-iteration-started","contractId":"...","iteration":N,"timestamp":"..."}
+```
+5. If the generator subagent crashed or returned an error, mark the sprint as `needs-rework` and log it.
+
+### 3e. Spawn the Evaluator Subagent
+
+Use the **Agent tool** to spawn an evaluator subagent.
+
+**How to call the Agent tool:**
+
+```
+Agent tool call:
+description: "Evaluate sprint <N>: <sprint title>"
+prompt: <the full prompt below>
+```
+
+**Build the evaluator prompt:**
+
+```
+You are the Bober Evaluator subagent. You have been spawned by the orchestrator to evaluate a sprint.
+
+## Sprint Contract
+<paste the full SprintContract JSON>
+
+## Generator's Completion Report
+<paste the generator's completion report JSON>
+
+## Project Configuration
+<paste relevant sections of bober.config.json: commands, evaluator>
+
+## Project Principles
+<paste full text of .bober/principles.md or "No principles file found.">
+
+## Context
+- Contract ID: <contractId>
+- Spec ID: <specId>
+- Sprint: <N> of <total>
+- Iteration: <N>
+- Branch: <current git branch>
+- Changed files (per generator): <list of files>
+
+## Instructions
+1. Read the SprintContract at .bober/contracts/<contractId>.json
+2. Read bober.config.json for configured eval strategies and commands
+3. Run each configured evaluation strategy (typecheck, lint, build, unit-test, playwright, api-check) using the commands from config
+4. Verify EVERY success criterion in the contract one by one
+5. Check for regressions (pre-existing tests still passing, build stability)
+6. Check adherence to project principles
+7. Produce a structured EvalResult
+
+IMPORTANT: You do NOT have Write or Edit tools. Output the EvalResult JSON in your response, and the orchestrator will save it to disk.
+
+## Your Response
+When done, respond with EXACTLY this JSON structure (no other text):
+{
+"evalId": "eval-<contractId>-<iteration>",
+"contractId": "<contract ID>",
+"specId": "<spec ID>",
+"timestamp": "<ISO-8601>",
+"iteration": <N>,
+"overallResult": "pass | fail",
+"score": {
+"criteriaTotal": <N>,
+"criteriaPassed": <N>,
+"criteriaFailed": <N>,
+"criteriaSkipped": <N>,
+"requiredPassed": <N>,
+"requiredFailed": <N>,
+"requiredTotal": <N>
+},
+"strategyResults": [
+{
+"strategy": "<type>",
+"required": true/false,
+"result": "pass | fail | skipped",
+"output": "<relevant output>",
+"details": "<explanation>"
+}
+],
+"criteriaResults": [
+{
+"criterionId": "sc-X-Y",
+"description": "<criterion>",
+"required": true/false,
+"result": "pass | fail | skipped",
+"evidence": "<evidence>",
+"feedback": "<failure details if failed>"
+}
+],
+"regressions": [],
+"generatorFeedback": [],
+"summary": "<2-3 sentence summary>"
+}
+```
+
+**After the evaluator subagent returns:**
+
+1. Parse the evaluator's response to extract the EvalResult.
+2. Save the EvalResult to `.bober/eval-results/eval-<contractId>-<iteration>.json` (the evaluator cannot write files).
+3. Determine pass/fail from the `overallResult` field.
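
Because the evaluator cannot write files, the orchestrator persists the EvalResult itself. A minimal sketch of the three steps above, assuming the evaluator replied with exactly the requested JSON; the helper name `save_eval_result` is hypothetical.

```python
import json
from pathlib import Path


def save_eval_result(response_text: str, bober_dir: Path) -> bool:
    """Persist the evaluator's EvalResult and report whether it passed (hypothetical helper)."""
    result = json.loads(response_text)  # step 1: parse the reply

    # Step 2: file name follows eval-<contractId>-<iteration>.json from the text above.
    name = f"eval-{result['contractId']}-{result['iteration']}.json"
    path = bober_dir / "eval-results" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, indent=2), encoding="utf-8")

    # Step 3: only the overallResult field decides pass/fail.
    return result["overallResult"] == "pass"
```

Writing the file before checking the verdict keeps a record on disk even for failed iterations, which the retry handoff can then quote.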
+
+### 3f. Process the Evaluation Result

 **On PASS:**
-1. Update contract status to `completed`
-2. Update `.bober/progress.md
-3. Log
-
+1. Update contract status to `completed` and save to `.bober/contracts/`.
+2. Update `.bober/progress.md`.
+3. Log event:
+```json
+{"event":"sprint-completed","contractId":"...","specId":"...","iteration":N,"timestamp":"..."}
+```
+4. Print milestone:
 ```
-Sprint <N>/<total> PASSED
+=== Sprint <N>/<total> PASSED ===
+Title: <title>
+Iteration: <M>
 Progress: [=====> ] <N>/<total> sprints complete
 Next: <next sprint title>
 ```
-5. Move to next sprint
+5. Move to next sprint.

 **On FAIL with retries remaining:**
-1. Check if iteration
-2.
-
+1. Check if iteration < `evaluator.maxIterations` (default: 3).
+2. Log event:
+```json
+{"event":"sprint-iteration-failed","contractId":"...","iteration":N,"failedCriteria":[...],"timestamp":"..."}
 ```
-
-Failed: <brief failure summary>
+3. Print retry notice:
 ```
+=== Sprint <N> iteration <M> FAILED ===
+Failed criteria: <list>
+Retrying (iteration <M+1> of <maxIterations>)...
+```
+4. Build a NEW context handoff with evaluator feedback included.
+5. Go back to step 3d (spawn a FRESH generator subagent with the feedback).
+
+**On FAIL with no retries remaining:**
+1. Update contract status to `needs-rework` and save.
+2. Log event:
+```json
+{"event":"sprint-failed","contractId":"...","specId":"...","totalIterations":N,"timestamp":"..."}
+```
+3. Decide whether to continue or stop:
+- If the failure is in a non-blocking sprint (nothing depends on it), skip and continue.
+- If the failure blocks subsequent sprints, stop the pipeline.
+4. Print failure report with full context.
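
The three branches above reduce to a small decision function. The sketch below is one possible reading; the function name and the returned labels (`next-sprint`, `retry`, `skip`, `stop`) are hypothetical and only stand in for the actions described in the text.

```python
def next_action(overall_result: str, iteration: int,
                max_iterations: int = 3, blocks_others: bool = False) -> str:
    """Map an evaluation outcome to the orchestrator's next step (hypothetical helper).

    max_iterations mirrors evaluator.maxIterations (default: 3).
    blocks_others is True when another sprint depends on this one.
    """
    if overall_result == "pass":
        return "next-sprint"
    if iteration < max_iterations:
        return "retry"  # spawn a fresh generator with evaluator feedback
    # No retries left: a blocking failure stops the pipeline, otherwise skip it.
    return "stop" if blocks_others else "skip"
```

For example, a failure on iteration 3 with the default limit returns `skip` for a sprint nothing depends on and `stop` for one that later sprints list in `dependsOn`.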

-
-1. Update contract status to `needs-rework`
-2. Decide whether to continue or stop based on severity:
-- If the failure is in a non-blocking sprint (nothing depends on it), skip and continue
-- If the failure blocks subsequent sprints, stop the pipeline
-3. Report to user with full context
-
-### 2f. Context Reset
+### 3g. Context Reset

-After each sprint completes (pass or fail), check `pipeline.contextReset
-- `always`: Fresh context for the next sprint. The next sprint's Generator receives only its handoff document.
-- `on-threshold`:
-- `never`: Carry
+After each sprint completes (pass or fail), check `pipeline.contextReset` from config:
+- `always`: Fresh context for the next sprint. The next sprint's Generator receives only its handoff document. (This is the default with subagent architecture — each spawn IS a fresh context.)
+- `on-threshold`: Same as `always` with subagents, since each subagent is already isolated.
+- `never`: Carry summary forward in the handoff. Still a fresh subagent, but with richer handoff.

-###
+### 3h. Iteration Budget

 Track total Generator-Evaluator iterations across all sprints:
-- Each Generator+Evaluator cycle counts as 1 iteration
-- When total iterations reach `pipeline.maxIterations` (default: 20), stop the pipeline
--
+- Each Generator+Evaluator cycle counts as 1 iteration.
+- When total iterations reach `pipeline.maxIterations` (default: 20), stop the pipeline.
+- Print budget status after each cycle:
 ```
 Iteration budget: <used>/<max>
 ```
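
The budget rules above amount to a counter shared across all sprints. A minimal sketch, assuming a hypothetical `IterationBudget` class; the default of 20 and the status line format come from the text above.

```python
class IterationBudget:
    """Counts Generator+Evaluator cycles across the whole pipeline (hypothetical helper)."""

    def __init__(self, max_iterations: int = 20):
        self.max_iterations = max_iterations  # mirrors pipeline.maxIterations
        self.used = 0

    def record_cycle(self) -> None:
        """Call once after each Generator+Evaluator cycle, pass or fail."""
        self.used += 1

    @property
    def exhausted(self) -> bool:
        """True once the total reaches the limit; the pipeline should stop."""
        return self.used >= self.max_iterations

    def status(self) -> str:
        return f"Iteration budget: {self.used}/{self.max_iterations}"
```

This counter is separate from the per-sprint retry limit (`evaluator.maxIterations`): a sprint can still have retries left when the pipeline-wide budget runs out.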

-
+---
+
+## Step 4: Completion

 When all sprints are complete (or the pipeline stops):

 ### All Sprints Passed

 ```
-
+=== PIPELINE COMPLETE ===

 All <N> sprints passed successfully.

 ### Results
-1. [PASS] Sprint 1: <title>
-2. [PASS] Sprint 2: <title>
+1. [PASS] Sprint 1: <title> — iteration <M>
+2. [PASS] Sprint 2: <title> — iteration <M>
 ...

 ### Statistics
 - Total iterations: <N>
 - Sprints: <N>/<N> passed
--
+- Subagents spawned: <count>

 ### What Was Built
 <Brief summary of the complete feature>

 ### Next Steps
 - Review the code on branch: bober/<feature-slug>
-- Run the test suite:
-- Merge to main when ready
+- Run the test suite: <configured test command>
+- Merge to main when ready
 ```

 ### Pipeline Stopped (failures or budget exhausted)

 ```
-
+=== PIPELINE STOPPED ===

 Completed <M> of <N> sprints. Stopped because: <reason>

@@ -242,6 +540,8 @@ Sprint 3: <title>
 - Run /bober.plan to revise the plan
 ```

+---
+
 ## Human Escalation Protocol

 Escalate to the user (pause and ask) when:
@@ -258,6 +558,8 @@ Escalate to the user (pause and ask) when:

 4. **Halfway checkpoint:** For plans with 5+ sprints, pause after completing half the sprints to report progress and ask if the user wants to continue, adjust, or stop.

+---
+
 ## Progress Tracking

 Throughout the pipeline, keep `.bober/progress.md` updated:
@@ -288,6 +590,7 @@ Last updated: <timestamp>
 ### Pipeline Statistics
 - Total iterations used: 4 / 20
 - Sprints completed: 2 / 5
+- Subagents spawned: 6
 ```

 And keep `.bober/history.jsonl` updated with events:
@@ -301,10 +604,14 @@ And keep `.bober/history.jsonl` updated with events:
 - `pipeline-stopped`
 - `human-escalation`

-
+---
+
+## Error Handling

+- **Subagent crash/timeout:** If a subagent call via the Agent tool fails or returns an error, catch it. Log the error, mark the sprint as `needs-rework`, and decide whether to retry or escalate. Do NOT let a subagent failure crash the entire pipeline.
+- **Subagent returns malformed response:** If you cannot parse the subagent's JSON response, read the files on disk (`.bober/specs/`, `.bober/contracts/`, `.bober/eval-results/`) as the source of truth. The subagent may have saved files correctly even if its response text was garbled.
 - **Git conflicts:** Pause and report to user. Do not auto-resolve.
 - **npm install failures:** Try once. If it fails, report to user.
 - **Dev server won't start:** Needed for API checks and Playwright. Report as a configuration issue.
-- **Out of context window:**
-- **Previous sprint broke something:** If a completed sprint's code is causing issues in a later sprint, note this but do not go back and modify completed sprints. Instead,
+- **Out of context window:** With subagent architecture, this is largely mitigated — each subagent gets a fresh context. If YOUR orchestrator context gets long, summarize completed sprints more aggressively in the handoff documents.
+- **Previous sprint broke something:** If a completed sprint's code is causing issues in a later sprint, note this but do not go back and modify completed sprints. Instead, include the issue details in the current sprint's generator handoff so it can fix the problem within its scope.