deepflow 0.1.71 → 0.1.73
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +79 -203
- package/bin/install.js +2 -6
- package/package.json +7 -3
- package/src/commands/df/auto-cycle.md +384 -0
- package/src/commands/df/auto.md +69 -6
- package/src/commands/df/execute.md +348 -216
- package/src/commands/df/plan.md +45 -0
- package/src/commands/df/verify.md +75 -30
- package/src/agents/deepflow-auto.md +0 -667
@@ -2,16 +2,16 @@
 
 ## Orchestrator Role
 
-You are a coordinator. Spawn agents,
+You are a coordinator. Spawn agents, run ratchet checks, update PLAN.md. Never implement code yourself.
 
-**NEVER:** Read source files, edit code,
+**NEVER:** Read source files, edit code, use TaskOutput, use EnterPlanMode, use ExitPlanMode
 
-**ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents,
+**ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, run ratchet health checks after each agent completes, update PLAN.md, write `.deepflow/decisions.md` in the main tree
 
 ---
 
 ## Purpose
-Implement tasks from PLAN.md with parallel agents, atomic commits, and context-efficient execution.
+Implement tasks from PLAN.md with parallel agents, atomic commits, ratchet-driven quality gates, and context-efficient execution.
 
 ## Usage
 ```
@@ -26,11 +26,13 @@ Implement tasks from PLAN.md with parallel agents, atomic commits, and context-e
 - Skill: `atomic-commits` — Clean commit protocol
 
 **Use Task tool to spawn agents:**
-| Agent | subagent_type |
-
-| Implementation | `general-purpose` |
-
-
+| Agent | subagent_type | Purpose |
+|-------|---------------|---------|
+| Implementation | `general-purpose` | Task implementation |
+| Debugger | `reasoner` | Debugging failures |
+
+**Model routing from frontmatter:**
+The model for each agent is determined by the `model:` field in the command/agent/skill frontmatter being invoked. The orchestrator reads the relevant frontmatter to determine which model to pass to `Task()`. If no `model:` field is present in the frontmatter, default to `sonnet`.
 
 ## Context-Aware Execution
 
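The model-routing rule added in the hunk above (read `model:` from the invoked file's frontmatter, default to `sonnet`) can be sketched as a small shell helper. This is an illustrative sketch, not deepflow's actual implementation; the `sed`-based frontmatter parsing and the `/tmp` file names are assumptions:

```shell
# Sketch: resolve the model for an agent from a command file's YAML
# frontmatter, falling back to "sonnet" when no `model:` field exists.
model_for() {
  local file="$1" m
  # Print only the frontmatter block (between the two `---` fences),
  # then extract the value of the first `model:` line.
  m=$(sed -n '/^---$/,/^---$/p' "$file" | sed -n 's/^model:[[:space:]]*//p' | head -n 1)
  echo "${m:-sonnet}"
}

printf -- '---\nmodel: opus\n---\nbody\n' > /tmp/df-cmd.md
printf -- '---\ndescription: x\n---\nbody\n' > /tmp/df-cmd2.md

model_for /tmp/df-cmd.md    # opus (explicit frontmatter field)
model_for /tmp/df-cmd2.md   # sonnet (default)
```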
@@ -54,53 +56,15 @@ Each task = one background agent. Use agent completion notifications as the feed
 2. STOP. End your turn. Do NOT run Bash monitors or poll for results.
 3. Wait for "Agent X completed" notifications (they arrive automatically)
 4. On EACH notification:
-   a.
-   b. Report: "✓ T1:
+   a. Run ratchet check (health checks on the worktree)
+   b. Report: "✓ T1: ratchet passed (abc123)" or "✗ T1: ratchet failed, reverted"
    c. Update PLAN.md for that task
    d. Check: all wave agents done?
       - No → end turn, wait for next notification
      - Yes → proceed to next wave or write final summary
 ```
 
-After spawning, your turn ENDS. Per notification:
-
-Result file `.deepflow/results/{task_id}.yaml`:
-```yaml
-task: T3
-status: success|failed
-commit: abc1234
-summary: "one line"
-tests_ran: true|false
-test_command: "npm test"
-test_exit_code: 0
-test_output_tail: |
-  PASS src/upload.test.ts
-  Tests: 12 passed, 12 total
-```
-
-New fields: `tests_ran` (bool), `test_command` (string), `test_exit_code` (int), `test_output_tail` (last 20 lines of output).
-
-**Spike result file** `.deepflow/results/{task_id}.yaml` (additional fields):
-```yaml
-task: T1
-type: spike
-status: success|failed
-commit: abc1234
-summary: "one line"
-criteria:
-  - name: "throughput"
-    target: ">= 7000 g/s"
-    actual: "1500 g/s"
-    met: false
-  - name: "memory usage"
-    target: "< 500 MB"
-    actual: "320 MB"
-    met: true
-all_criteria_met: false  # ALL must be true for spike to pass
-experiment_file: ".deepflow/experiments/upload--streaming--failed.md"
-```
-
-**CRITICAL:** `status` MUST equal `success` only if `all_criteria_met: true`. The spike verifier will reject mismatches.
+After spawning, your turn ENDS. Per notification: run ratchet, output ONE line, update PLAN.md. Write full summary only after ALL wave agents complete.
 
 ## Checkpoint & Resume
 
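The one-line-per-notification report the new version standardizes on can be sketched as a tiny formatter (illustrative only; the task IDs and commit hashes are made up):

```shell
# Sketch: format the single status line the orchestrator emits per
# agent-completion notification, matching the "✓/✗" format above.
report_line() {
  local task="$1" ratchet_passed="$2" commit="$3"
  if [ "$ratchet_passed" = "true" ]; then
    echo "✓ ${task}: ratchet passed (${commit})"
  else
    echo "✗ ${task}: ratchet failed, reverted"
  fi
}

report_line T1 true abc123
report_line T2 false -
```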
@@ -160,6 +124,66 @@ fi
 
 **--fresh flag:** Deletes existing worktree and creates new one.
 
+### 1.6. RATCHET SNAPSHOT
+
+Before spawning agents, snapshot pre-existing test files:
+
+```bash
+cd ${WORKTREE_PATH}
+
+# Snapshot pre-existing test files (only these count for ratchet)
+git ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' \
+  > .deepflow/auto-snapshot.txt
+
+echo "Ratchet snapshot: $(wc -l < .deepflow/auto-snapshot.txt) pre-existing test files"
+```
+
+**Only pre-existing test files are used for ratchet evaluation.** New test files created by agents during implementation don't influence the pass/fail decision. This prevents agents from gaming the ratchet by writing tests that pass trivially.
+
+### 1.7. NO-TESTS BOOTSTRAP
+
+After the ratchet snapshot, check if zero test files were found:
+
+```bash
+TEST_COUNT=$(wc -l < .deepflow/auto-snapshot.txt | tr -d ' ')
+
+if [ "${TEST_COUNT}" = "0" ]; then
+  echo "Bootstrap needed: no pre-existing test files found."
+  BOOTSTRAP_NEEDED=true
+else
+  BOOTSTRAP_NEEDED=false
+fi
+```
+
+**If `BOOTSTRAP_NEEDED=true`:**
+
+1. **Inject a bootstrap task** as the FIRST action before any regular PLAN.md task is executed:
+   - Bootstrap task description: "Write tests for files in edit_scope"
+   - Read `edit_scope` from `specs/doing-*.md` to know which files need tests
+   - Spawn ONE dedicated bootstrap agent using the Bootstrap Task prompt (section 6)
+
+2. **Bootstrap agent behavior:**
+   - Write tests covering the files listed in `edit_scope`
+   - Commit as `test({spec}): bootstrap tests for edit_scope`
+   - The bootstrap agent's ONLY job is writing tests — no implementation changes
+
+3. **After bootstrap agent completes:**
+   - Run ratchet health checks (build must pass; test suite must not error out)
+   - If ratchet passes: re-take the ratchet snapshot so subsequent tasks use the new tests as baseline:
+     ```bash
+     cd ${WORKTREE_PATH}
+     git ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' \
+       > .deepflow/auto-snapshot.txt
+     echo "Post-bootstrap snapshot: $(wc -l < .deepflow/auto-snapshot.txt) test files"
+     ```
+   - If ratchet fails: revert bootstrap commit, log error, halt and report "Bootstrap failed — manual intervention required"
+
+4. **Signal to caller:** After bootstrap completes successfully, report `"bootstrap: completed"` in the cycle summary. This cycle's sole output is the test bootstrap — no regular PLAN.md task is executed this cycle.
+
+5. **Subsequent cycles:** The updated `.deepflow/auto-snapshot.txt` now contains the bootstrapped test files. All subsequent ratchet checks use these as the baseline.
+
+**If `BOOTSTRAP_NEEDED=false`:** Proceed normally to section 2.
+
 ### 2. LOAD PLAN
 
 ```
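The snapshot filter and bootstrap decision introduced in sections 1.6-1.7 above can be combined into one runnable sketch. The grep pattern is copied from the diff; the sample file list (a stand-in for `git ls-files`) and the `/tmp` path are illustrative:

```shell
# Sketch: filter a file listing down to pre-existing test files (the
# ratchet snapshot) and decide whether a bootstrap cycle is needed.
snapshot_filter() {
  grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' || true
}

# Stand-in for `git ls-files` output:
printf '%s\n' src/upload.ts src/upload.test.ts tests/api.py README.md \
  | snapshot_filter > /tmp/auto-snapshot.txt

TEST_COUNT=$(wc -l < /tmp/auto-snapshot.txt | tr -d ' ')
if [ "${TEST_COUNT}" = "0" ]; then
  echo "Bootstrap needed: no pre-existing test files found."
else
  echo "Ratchet snapshot: ${TEST_COUNT} pre-existing test files"
fi
```

Here `src/upload.test.ts` matches the `\.(test|spec)\.` pattern and `tests/api.py` matches `^tests/`, so no bootstrap is needed.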
@@ -175,162 +199,251 @@ For each `[ ]` task in PLAN.md: `TaskCreate(subject: "{task_id}: {description}",
 
 Warn if `specs/*.md` (excluding doing-/done-) exist. Non-blocking.
 
-### 4.
+### 4. IDENTIFY READY TASKS
 
-
+Use TaskList to find ready tasks:
 
-
-
--
+```
+Ready = TaskList results where:
+- status: "pending"
+- blockedBy: empty (auto-unblocked by native dependency system)
+```
+
+### 5. SPAWN AGENTS
 
-
+Context ≥50%: checkpoint and exit.
 
+**Before spawning each agent**, mark its native task as in_progress:
 ```
-
-If task is spike task:
-  → Mark as executable (spikes are always allowed)
-Else if task is blocked by a spike task (T{n}):
-  → Find related experiment file in .deepflow/experiments/
-  → Check experiment status:
-    - --passed.md exists → Unblock, proceed with implementation
-    - --failed.md exists → Keep blocked, warn user
-    - --active.md exists → Keep blocked, spike in progress
-    - No experiment → Keep blocked, spike not started
+TaskUpdate(taskId: native_id, status: "in_progress")
 ```
+This activates the UI spinner showing the task's activeForm (e.g. "Creating upload endpoint").
 
-**
+**NEVER use `isolation: "worktree"` on Task tool calls.** Deepflow manages a shared worktree per spec (`.deepflow/worktrees/{spec}/`) so wave 2 agents see wave 1 commits. Claude Code's native isolation creates separate per-agent worktrees (`.claude/worktrees/`) where agents can't see each other's work.
 
-
-Glob: .deepflow/experiments/{topic}--*--{status}.md
+**Spawn ALL ready tasks in ONE message** with multiple Task tool calls (true parallelism). Same-file conflicts: spawn sequentially.
 
-
-
-
-3. Fuzzy match: normalize and match
-```
+**Multiple [SPIKE] tasks for the same problem:** When PLAN.md contains two or more `[SPIKE]` tasks grouped by the same "Blocked by:" target or identical problem description, do NOT run them sequentially. Instead, follow the **Parallel Spike Probes** protocol in section 5.7 before spawning any implementation tasks that depend on the spike outcome.
+
+### 5.5. RATCHET CHECK
 
-
+After each agent completes (notification received), the orchestrator runs health checks on the worktree.
 
-
-|-------------------|-------------|--------|
-| `--passed.md` | Ready | Execute full implementation |
-| `--failed.md` | Blocked | Skip, warn: "Experiment failed, re-plan needed" |
-| `--active.md` | Blocked | Skip, info: "Waiting for spike completion" |
-| Not found | Blocked | Skip, info: "Spike task not executed yet" |
+**Step 1: Detect commands** (same auto-detection as /df:verify):
 
-
+| File | Build | Test | Typecheck | Lint |
+|------|-------|------|-----------|------|
+| `package.json` | `npm run build` (if scripts.build) | `npm test` (if scripts.test not placeholder) | `npx tsc --noEmit` (if tsconfig.json) | `npm run lint` (if scripts.lint) |
+| `pyproject.toml` | — | `pytest` | `mypy .` (if mypy in deps) | `ruff check .` (if ruff in deps) |
+| `Cargo.toml` | `cargo build` | `cargo test` | — | `cargo clippy` (if installed) |
+| `go.mod` | `go build ./...` | `go test ./...` | — | `go vet ./...` |
 
+**Step 2: Run health checks** in the worktree:
+```bash
+cd ${WORKTREE_PATH}
+
+# Run each detected command
+# Build → Test → Typecheck → Lint (stop on first failure)
 ```
-
-
+
+**Step 3: Validate edit scope** (if spec declares `edit_scope`):
+```bash
+# Get files changed by the agent
+CHANGED=$(git diff HEAD~1 --name-only)
+
+# Load edit_scope from spec (files/globs)
+EDIT_SCOPE=$(grep 'edit_scope:' specs/doing-*.md | sed 's/edit_scope://' | tr ',' '\n' | xargs)
+
+# Check each changed file against allowed scope
+for file in ${CHANGED}; do
+  ALLOWED=false
+  for pattern in ${EDIT_SCOPE}; do
+    # Match file against glob pattern
+    [[ "${file}" == ${pattern} ]] && ALLOWED=true
+  done
+  ${ALLOWED} || VIOLATIONS+=("${file}")
+done
 ```
 
-
+- Violations found → revert: `git revert HEAD --no-edit`, report "✗ Edit scope violation: {files}"
+- No violations → continue to health checks
+
+**Step 4: Evaluate**:
+- All checks pass AND no scope violations → task succeeds, commit stands
+- Any check fails → regression detected → revert: `git revert HEAD --no-edit`
+
+**Ratchet uses ONLY pre-existing test files** (from `.deepflow/auto-snapshot.txt`). If the agent added new test files that fail, those are excluded from evaluation — the agent's new tests don't influence the ratchet decision.
+
+**For spike tasks:** Same ratchet. If the spike's code passes pre-existing health checks, the spike passes. No LLM judges another LLM's work.
+
+### 5.7. PARALLEL SPIKE PROBES
 
-
+When two or more `[SPIKE]` tasks address the **same problem** (same "Blocked by:" target OR identical or near-identical hypothesis wording), treat them as a probe set and run this protocol instead of the standard single-agent flow.
+
+#### Detection
 
 ```
-
--
--
+Spike group = all [SPIKE] tasks where:
+- same "Blocked by:" value, OR
+- problem description is identical after stripping task ID prefix
+If group size ≥ 2 → enter parallel probe mode
 ```
 
-
-- If task depends on spike AND experiment not `--passed.md` → still blocked
-- TaskUpdate to add spike as blocker if not already set
+#### Step 1: Record baseline commit
 
-
+```bash
+cd ${WORKTREE_PATH}
+BASELINE=$(git rev-parse HEAD)
+echo "Probe baseline: ${BASELINE}"
+```
 
-
+All probes branch from this exact commit so they share the same ratchet baseline.
 
-
+#### Step 2: Create isolated sub-worktrees
 
-
+For each spike `{SPIKE_ID}` in the probe group:
+
+```bash
+PROBE_BRANCH="df/${SPEC_NAME}/probe-${SPIKE_ID}"
+PROBE_PATH=".deepflow/worktrees/${SPEC_NAME}/probe-${SPIKE_ID}"
+
+git worktree add -b "${PROBE_BRANCH}" "${PROBE_PATH}" "${BASELINE}"
+echo "Created probe worktree: ${PROBE_PATH} (branch: ${PROBE_BRANCH})"
 ```
-
+
+#### Step 3: Spawn all probes in parallel
+
+Mark every spike task as `in_progress`, then spawn one agent per probe **in a single message** using the Spike Task prompt (section 6), with the probe's worktree path as its working directory.
+
+```
+TaskUpdate(taskId: native_id_SPIKE_A, status: "in_progress")
+TaskUpdate(taskId: native_id_SPIKE_B, status: "in_progress")
+[spawn agent for SPIKE_A → PROBE_PATH_A]
+[spawn agent for SPIKE_B → PROBE_PATH_B]
+... (all in ONE message)
 ```
-This activates the UI spinner showing the task's activeForm (e.g. "Creating upload endpoint").
 
-
+End your turn. Do NOT poll or monitor. Wait for completion notifications.
 
-
+#### Step 4: Ratchet each probe (on completion notifications)
 
-
-When spawning a spike task, the agent MUST:
-1. Execute the minimal validation method
-2. Record structured criteria evaluation in result file (see spike result schema above)
-3. Write experiment file with `--active.md` status (verifier determines final status)
-4. Commit as `spike({spec}): validate {hypothesis}`
+When a probe agent's notification arrives, run the standard ratchet (section 5.5) against its dedicated probe worktree:
 
-
+```bash
+cd ${PROBE_PATH}
+
+# Identical health-check commands as standard tasks
+# Build → Test → Typecheck → Lint (stop on first failure)
+```
+
+Record per-probe metrics:
+
+```yaml
+probe_id: SPIKE_A
+worktree: .deepflow/worktrees/{spec}/probe-SPIKE_A
+branch: df/{spec}/probe-SPIKE_A
+ratchet_passed: true/false
+regressions: 0  # failing pre-existing tests
+coverage_delta: +3  # new lines covered (positive = better)
+files_changed: 4  # number of files touched
+commit: abc1234
+```
 
-
+Wait until **all** probe notifications have arrived before proceeding to selection.
 
-
+#### Step 5: Machine-select winner
 
-
+No LLM evaluates another LLM's work. Apply the following ordered criteria to all probes that **passed** the ratchet:
 
-**Spawn:**
 ```
-
+1. Fewer regressions (lower is better — hard gate: any regression disqualifies)
+2. Better coverage (higher delta is better)
+3. Fewer files changed (lower is better — smaller blast radius)
+
+Tie-break: first probe to complete (chronological)
 ```
 
-**
+If **no** probe passes the ratchet, all are failed probes. Log insights (step 7) and reset the spike tasks to `pending` for retry with debugger guidance.
+
+#### Step 6: Preserve ALL probe worktrees
+
+Do NOT delete losing probe worktrees. They are preserved for manual inspection and cross-cycle learning:
+
+```bash
+# Winning probe: leave as-is, will be used as implementation base (step 8)
+# Losing probes: leave worktrees intact, mark branches with -failed suffix for clarity
+git branch -m "df/{spec}/probe-SPIKE_B" "df/{spec}/probe-SPIKE_B-failed"
 ```
-SPIKE VERIFICATION — Be skeptical. Catch false positives.
 
-
-Result: {worktree_path}/.deepflow/results/{task_id}.yaml
-Experiment: {worktree_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
+Record all probe paths in `.deepflow/checkpoint.json` under `"spike_probes"` so future `--continue` runs know they exist.
 
-
-1. Is `actual` a concrete number? (reject "good", "improved", "better")
-2. Does `actual` satisfy `target`? Do the math.
-3. Is `met` correct?
+#### Step 7: Log failed probe insights
 
-
-- "Works but doesn't meet target" → FAILED
-- "Close enough" → FAILED
-- Actual 1500 vs Target >= 7000 → FAILED
+For every probe that failed the ratchet (or lost selection), write two entries to `.deepflow/auto-memory.yaml` in the **main** tree.
 
-
-
-
-
+**Entry 1 — `spike_insights` (detailed probe record):**
+
+```yaml
+spike_insights:
+  - date: "YYYY-MM-DD"
+    spec: "{spec_name}"
+    spike_id: "SPIKE_B"
+    hypothesis: "{hypothesis text from PLAN.md}"
+    outcome: "failed"  # or "passed-but-lost"
+    failure_reason: "{first failed check and error summary}"
+    ratchet_metrics:
+      regressions: 2
+      coverage_delta: -1
+      files_changed: 7
+    worktree: ".deepflow/worktrees/{spec}/probe-SPIKE_B-failed"
+    branch: "df/{spec}/probe-SPIKE_B-failed"
+    edge_cases: []  # orchestrator may populate after manual review
+```
+
+**Entry 2 — `probe_learnings` (cross-cycle memory, read by `/df:auto-cycle` on each cycle start):**
 
-
-
--
+```yaml
+probe_learnings:
+  - spike: "SPIKE_B"
+    probe: "{probe branch suffix, e.g. probe-SPIKE_B}"
+    insight: "{one-sentence summary of what the probe revealed, derived from failure_reason}"
 ```
 
-
+If the file does not exist, create it. Initialize both `spike_insights:` and `probe_learnings:` as empty lists before appending. Preserve all existing keys when merging.
+
+#### Step 8: Promote winning probe
+
+Cherry-pick the winner's commit into the shared spec worktree so downstream implementation tasks see the winning approach:
+
+```bash
+cd ${WORKTREE_PATH}  # shared worktree (not the probe sub-worktree)
+git cherry-pick ${WINNER_COMMIT}
 ```
-VERIFIED_PASS →
-  TaskUpdate(taskId: spike_native_id, status: "completed")
-  # Native system auto-unblocks dependent tasks
-  Log "✓ Spike {task_id} verified"
 
-
-
-
-
+Then mark the winning spike task as `completed` and auto-unblock its dependents:
+
+```
+TaskUpdate(taskId: native_id_SPIKE_WINNER, status: "completed")
+TaskUpdate(taskId: native_id_SPIKE_LOSERS, status: "pending")  # keep visible for audit
 ```
 
-
+Update PLAN.md:
+- Winning spike → `[x]` with commit hash and `[PROBE_WINNER]` tag
+- Losing spikes → `[~]` (skipped) with `[PROBE_FAILED: see auto-memory.yaml]` note
+
+Resume the standard execution loop (section 9) — implementation tasks blocked by the spike group are now unblocked.
 
-
+---
+
+### 6. PER-TASK (agent prompt)
 
 **Common preamble (include in all agent prompts):**
 ```
 Working directory: {worktree_absolute_path}
 All file operations MUST use this absolute path as base. Do NOT write files to the main project directory.
 Commit format: {commit_type}({spec}): {description}
-Result file: {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
 
-STOP after
-
-Navigation: Prefer LSP tools (goToDefinition, findReferences, workspaceSymbol) over Grep/Glob for code navigation. Fall back to Grep/Glob if LSP unavailable.
-If LSP errors, install the language server (TS→typescript-language-server, Python→pyright, Rust→rust-analyzer, Go→gopls) and retry. If still unavailable, use Grep/Glob.
+STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main. These are handled by the orchestrator and /df:verify.
 ```
 
 **Standard Task (append after preamble):**
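The ordered selection criteria in Step 5 of the hunk above are mechanical enough to sketch with `sort`. The probe names and metric values below are invented; each input line is `probe regressions coverage_delta files_changed completion_order`:

```shell
# Sketch: machine-select the winning probe. Any probe with regressions is
# disqualified (hard gate); survivors are ranked by coverage delta (desc),
# files changed (asc), then completion order (asc) as the tie-break.
select_winner() {
  awk '$2 == 0' | sort -k3,3nr -k4,4n -k5,5n | head -n 1 | awk '{print $1}'
}

printf '%s\n' \
  "SPIKE_A 0 3 4 2" \
  "SPIKE_B 2 5 2 1" \
  "SPIKE_C 0 3 6 3" \
  | select_winner    # SPIKE_A: B is disqualified, A beats C on files changed
```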
@@ -341,44 +454,49 @@ Spec: {spec_name}
 
 Steps:
 1. Implement the task
-2.
-
-
-
-
+2. Commit as feat({spec}): {description}
+
+Your ONLY job is to write code and commit. The orchestrator will run health checks after you finish.
+```
+
+**Bootstrap Task (append after preamble):**
+```
+BOOTSTRAP: Write tests for files in edit_scope
+Files: {edit_scope files from spec}
+Spec: {spec_name}
+
+Steps:
+1. Write tests that cover the functionality of the files listed above
+2. Do NOT change implementation files — tests only
+3. Commit as test({spec}): bootstrap tests for edit_scope
+
+Your ONLY job is to write tests and commit. The orchestrator will run health checks after you finish.
 ```
 
 **Spike Task (append after preamble):**
 ```
 {task_id} [SPIKE]: {hypothesis}
-
-
-Success criteria: {measurable targets}
-Experiment file: {worktree_absolute_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
+Files: {target files}
+Spec: {spec_name}
 
 Steps:
-1.
-2.
-3. Write experiment as --active.md (verifier determines final status)
-4. Commit: spike({spec}): validate {hypothesis}
-5. Write result to .deepflow/results/{task_id}.yaml (see spike result schema)
-6. If test infrastructure exists, also run tests and include evidence in result file
+1. Implement the minimal spike to validate the hypothesis
+2. Commit as spike({spec}): {description}
 
-
-- `met: true` ONLY if actual satisfies target
-- `status: success` ONLY if ALL criteria met
-- Worse than baseline = FAILED (baseline 7k, actual 1.5k → FAILED)
-- "Close enough" = FAILED
-- Verifier will check. False positives waste resources.
+Your ONLY job is to write code and commit. The orchestrator will run health checks to determine if the spike passes.
 ```
 
-###
+### 7. FAILURE HANDLING
+
+When a task fails ratchet and is reverted:
+
+`TaskUpdate(taskId: native_id, status: "pending")` — keeps task visible for retry; dependents remain blocked.
 
-
+On repeated failure: spawn `Task(subagent_type="reasoner", model={model from debugger frontmatter, default "sonnet"}, prompt="Debug failure: {ratchet output}")`.
 
-
+Leave worktree intact, keep checkpoint.json, output: worktree path/branch, `cd {worktree_path}` to investigate, `/df:execute --continue` to resume, `/df:execute --fresh` to discard.
 
-###
+### 8. COMPLETE SPECS
 
 When all tasks done for a `doing-*` spec:
 1. Embed history in spec: `## Completed` section with task list and commit hashes
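Section 7's escalation policy (reset to pending, then hand repeated failures to a debugger agent) can be sketched as follows; the attempt threshold of 2 is this sketch's assumption, since the diff only says "on repeated failure":

```shell
# Sketch: decide what to do after a ratchet failure, based on how many
# times this task has already failed.
on_ratchet_fail() {
  local task="$1" attempts="$2"
  if [ "$attempts" -ge 2 ]; then
    echo "${task}: repeated failure, spawn reasoner debugger"
  else
    echo "${task}: reverted, reset to pending for retry"
  fi
}

on_ratchet_fail T1 1
on_ratchet_fail T1 2
```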
@@ -397,19 +515,16 @@ When all tasks done for a `doing-*` spec:
    - Separators (`---`) between removed sections
 5. Recalculate the Summary table at the top of PLAN.md (update counts for completed/pending)
 
-###
+### 9. ITERATE (Notification-Driven)
 
 After spawning wave agents, your turn ENDS. Completion notifications drive the loop.
 
 **Per notification:**
-1.
-2.
-
-
-
-3. TaskUpdate(taskId: native_id, status: "completed") — auto-unblocks dependent tasks
-4. Update PLAN.md: `[ ]` → `[x]` + commit hash (as before)
-5. Report: "✓ T1: success (abc123) [12 tests passed]" or "⚠ T1: success (abc123) [no tests]"
+1. Run ratchet check for the completed agent (see section 5.5)
+2. Ratchet passed → `TaskUpdate(taskId: native_id, status: "completed")` — auto-unblocks dependent tasks
+3. Ratchet failed → revert commit, `TaskUpdate(taskId: native_id, status: "pending")`
+4. Update PLAN.md: `[ ]` → `[x]` + commit hash (on pass) or note revert (on fail)
+5. Report: "✓ T1: ratchet passed (abc123)" or "✗ T1: ratchet failed, reverted"
 6. If NOT all wave agents done → end turn, wait
 7. If ALL wave agents done → use TaskList to find newly unblocked tasks, check context, spawn next wave or finish
 
@@ -421,69 +536,86 @@ After spawning wave agents, your turn ENDS. Completion notifications drive the l
 
 | Rule | Detail |
 |------|--------|
+| Zero test files → bootstrap first | Section 1.7; bootstrap is the cycle's sole task when snapshot is empty |
 | 1 task = 1 agent = 1 commit | `atomic-commits` skill |
 | 1 file = 1 writer | Sequential if conflict |
-|
+| Agent writes code, orchestrator measures | Ratchet is the judge |
+| No LLM evaluates LLM work | Health checks only |
+| ≥2 spikes for same problem → parallel probes | Section 5.7; never run competing spikes sequentially |
+| All probe worktrees preserved | Losing probes renamed with `-failed` suffix; never deleted |
+| Machine-selected winner | Fewer regressions > better coverage > fewer files changed; no LLM judge |
+| Failed probe insights logged | `.deepflow/auto-memory.yaml` in main tree; persists across cycles |
+| Winner cherry-picked to shared worktree | Downstream tasks see winning approach via shared worktree |
 
 ## Example
 
+### No-Tests Bootstrap
+
+```
+/df:execute (context: 8%)
+
+Loading PLAN.md... T1 ready, T2/T3 blocked by T1
+Ratchet snapshot: 0 pre-existing test files
+Bootstrap needed: no pre-existing test files found.
+
+Spawning bootstrap agent for edit_scope...
+[Bootstrap agent completed]
+Running ratchet: build ✓ | tests ✓ (12 new tests pass)
+✓ Bootstrap: ratchet passed (boo1234)
+Re-taking ratchet snapshot: 3 test files
+
+bootstrap: completed — cycle's sole task was test bootstrap
+Next: Run /df:auto-cycle again to execute T1
+```
+
 ### Standard Execution
 
 ```
 /df:execute (context: 12%)
 
 Loading PLAN.md... T1 ready, T2/T3 blocked by T1
+Ratchet snapshot: 24 pre-existing test files
 Registering native tasks: TaskCreate T1/T2/T3, TaskUpdate(T2 blockedBy T1), TaskUpdate(T3 blockedBy T1)
 
 Wave 1: TaskUpdate(T1, in_progress)
-[Agent "T1" completed]
-
+[Agent "T1" completed]
+Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
+✓ T1: ratchet passed (abc1234)
+TaskUpdate(T1, completed) → auto-unblocks T2, T3
 
 Wave 2: TaskUpdate(T2/T3, in_progress)
-[Agent "T2" completed]
-
+[Agent "T2" completed]
+Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
+✓ T2: ratchet passed (def5678)
+[Agent "T3" completed]
+Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
+✓ T3: ratchet passed (ghi9012)
+
 Context: 35% — ✓ doing-upload → done-upload. Complete: 3/3
 
 Next: Run /df:verify to verify specs and merge to main
 ```
 
-###
+### Ratchet Failure (Regression Detected)
 
 ```
 /df:execute (context: 10%)
 
-
-
-
-
-
-TaskUpdate(task-002, addBlockedBy: [task-001])
-TaskUpdate(task-003, addBlockedBy: [task-001])
-
-Checking experiment status...
-T1 [SPIKE]: No experiment yet, spike executable
-T2, T3: Blocked by T1 (spike not validated)
-
-Spawning Wave 1: T1 [SPIKE]
-TaskUpdate(task-001, status: "in_progress")
-
-[Agent "T1 SPIKE" completed]
-✓ T1: complete (agent said: success), verifying...
-
-Verifying T1...
-✗ Spike T1 failed verification (throughput 1500 < 7000)
-⚠ Agent incorrectly marked as passed — overriding to FAILED
-# Spike stays pending — dependents remain blocked
-→ upload--streaming--failed.md
+Wave 1: TaskUpdate(T1, in_progress)
+[Agent "T1" completed]
+Running ratchet: build ✓ | tests ✗ (2 failed of 24)
+✗ T1: ratchet failed, reverted
+TaskUpdate(T1, pending)
 
-
-
+Spawning debugger for T1...
+[Debugger completed]
+Re-running T1 with fix guidance...
 
-
+[Agent "T1 retry" completed]
+Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
+✓ T1: ratchet passed (abc1234)
 ```
 
-Note: If the agent correctly reports `status: failed`, the "overriding to FAILED" line is omitted — the verifier simply confirms failure.
-
 ### With Checkpoint
 
 ```