deepflow 0.1.71 → 0.1.72

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -2,16 +2,16 @@
2
2
 
3
3
  ## Orchestrator Role
4
4
 
5
- You are a coordinator. Spawn agents, wait for results, update PLAN.md. Never implement code yourself.
5
+ You are a coordinator. Spawn agents, run ratchet checks, update PLAN.md. Never implement code yourself.
6
6
 
7
- **NEVER:** Read source files, edit code, run tests, run git commands (except status), use TaskOutput, use EnterPlanMode, use ExitPlanMode
7
+ **NEVER:** Read source files, edit code, use TaskOutput, use EnterPlanMode, use ExitPlanMode
8
8
 
9
- **ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, read `.deepflow/results/*.yaml` on completion notifications, update PLAN.md, write `.deepflow/decisions.md` in the main tree
9
+ **ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, run ratchet health checks after each agent completes, update PLAN.md, write `.deepflow/decisions.md` in the main tree
10
10
 
11
11
  ---
12
12
 
13
13
  ## Purpose
14
- Implement tasks from PLAN.md with parallel agents, atomic commits, and context-efficient execution.
14
+ Implement tasks from PLAN.md with parallel agents, atomic commits, ratchet-driven quality gates, and context-efficient execution.
15
15
 
16
16
  ## Usage
17
17
  ```
@@ -26,11 +26,13 @@ Implement tasks from PLAN.md with parallel agents, atomic commits, and context-e
26
26
  - Skill: `atomic-commits` — Clean commit protocol
27
27
 
28
28
  **Use Task tool to spawn agents:**
29
- | Agent | subagent_type | model | Purpose |
30
- |-------|---------------|-------|---------|
31
- | Implementation | `general-purpose` | `sonnet` | Task implementation |
32
- | Spike Verifier | `reasoner` | `opus` | Verify spike pass/fail is correct |
33
- | Debugger | `reasoner` | `opus` | Debugging failures |
29
+ | Agent | subagent_type | Purpose |
30
+ |-------|---------------|---------|
31
+ | Implementation | `general-purpose` | Task implementation |
32
+ | Debugger | `reasoner` | Debugging failures |
33
+
34
+ **Model routing from frontmatter:**
35
+ The model for each agent is determined by the `model:` field in the command/agent/skill frontmatter being invoked. The orchestrator reads the relevant frontmatter to determine which model to pass to `Task()`. If no `model:` field is present in the frontmatter, default to `sonnet`.
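A minimal sketch of this lookup, assuming frontmatter delimited by `---` lines; the `model_for` helper and the sample file below are illustrative, not part of deepflow:

```shell
# Hypothetical helper: read the model: field from a file's YAML frontmatter,
# falling back to "sonnet" when the field is absent.
model_for() {
  awk '
    /^---$/ { fences++; next }
    fences == 1 && /^model:/ { sub(/^model:[ \t]*/, ""); print; found = 1; exit }
    END { if (!found) print "sonnet" }
  ' "$1"
}

# Example frontmatter (hypothetical agent file)
cat > /tmp/df-demo-agent.md <<'EOF'
---
name: debugger
model: opus
---
Prompt body...
EOF

model_for /tmp/df-demo-agent.md   # prints: opus
```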
34
36
 
35
37
  ## Context-Aware Execution
36
38
 
@@ -54,53 +56,15 @@ Each task = one background agent. Use agent completion notifications as the feed
54
56
  2. STOP. End your turn. Do NOT run Bash monitors or poll for results.
55
57
  3. Wait for "Agent X completed" notifications (they arrive automatically)
56
58
  4. On EACH notification:
57
- a. Read the result file: Read("{worktree}/.deepflow/results/{task_id}.yaml")
58
- b. Report: "✓ T1: success (abc123)" or "✗ T1: failed"
59
+ a. Run ratchet check (health checks on the worktree)
60
+ b. Report: "✓ T1: ratchet passed (abc123)" or "✗ T1: ratchet failed, reverted"
59
61
  c. Update PLAN.md for that task
60
62
  d. Check: all wave agents done?
61
63
  - No → end turn, wait for next notification
62
64
  - Yes → proceed to next wave or write final summary
63
65
  ```
64
66
 
65
- After spawning, your turn ENDS. Per notification: read result file, output ONE line ("✓ T1: success (abc123)"), update PLAN.md. Write full summary only after ALL wave agents complete.
66
-
67
- Result file `.deepflow/results/{task_id}.yaml`:
68
- ```yaml
69
- task: T3
70
- status: success|failed
71
- commit: abc1234
72
- summary: "one line"
73
- tests_ran: true|false
74
- test_command: "npm test"
75
- test_exit_code: 0
76
- test_output_tail: |
77
- PASS src/upload.test.ts
78
- Tests: 12 passed, 12 total
79
- ```
80
-
81
- New fields: `tests_ran` (bool), `test_command` (string), `test_exit_code` (int), `test_output_tail` (last 20 lines of output).
82
-
83
- **Spike result file** `.deepflow/results/{task_id}.yaml` (additional fields):
84
- ```yaml
85
- task: T1
86
- type: spike
87
- status: success|failed
88
- commit: abc1234
89
- summary: "one line"
90
- criteria:
91
- - name: "throughput"
92
- target: ">= 7000 g/s"
93
- actual: "1500 g/s"
94
- met: false
95
- - name: "memory usage"
96
- target: "< 500 MB"
97
- actual: "320 MB"
98
- met: true
99
- all_criteria_met: false # ALL must be true for spike to pass
100
- experiment_file: ".deepflow/experiments/upload--streaming--failed.md"
101
- ```
102
-
103
- **CRITICAL:** `status` MUST equal `success` only if `all_criteria_met: true`. The spike verifier will reject mismatches.
67
+ After spawning, your turn ENDS. Per notification: run ratchet, output ONE line, update PLAN.md. Write full summary only after ALL wave agents complete.
104
68
 
105
69
  ## Checkpoint & Resume
106
70
 
@@ -160,6 +124,66 @@ fi
160
124
 
161
125
  **--fresh flag:** Deletes existing worktree and creates new one.
162
126
 
127
+ ### 1.6. RATCHET SNAPSHOT
128
+
129
+ Before spawning agents, snapshot pre-existing test files:
130
+
131
+ ```bash
132
+ cd ${WORKTREE_PATH}
133
+
134
+ # Snapshot pre-existing test files (only these count for ratchet)
135
+ git ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' \
136
+ > .deepflow/auto-snapshot.txt
137
+
138
+ echo "Ratchet snapshot: $(wc -l < .deepflow/auto-snapshot.txt) pre-existing test files"
139
+ ```
140
+
141
+ **Only pre-existing test files are used for ratchet evaluation.** New test files created by agents during implementation don't influence the pass/fail decision. This prevents agents from gaming the ratchet by writing tests that pass trivially.
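The filtering above can be sketched with `comm` over sorted file lists (file names hypothetical): the intersection of the snapshot and the current test list is the ratchet set, so an agent-added test file drops out.

```shell
# Hypothetical demo: snapshot taken before the wave vs. test files after the agent ran.
cd "$(mktemp -d)"
printf '%s\n' src/a.test.ts src/b.test.ts                 > snapshot.txt  # pre-existing
printf '%s\n' src/a.test.ts src/b.test.ts src/new.test.ts > current.txt   # after agent

# comm -12 keeps lines common to both sorted lists: the new file is excluded.
comm -12 snapshot.txt current.txt
```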
142
+
143
+ ### 1.7. NO-TESTS BOOTSTRAP
144
+
145
+ After the ratchet snapshot, check if zero test files were found:
146
+
147
+ ```bash
148
+ TEST_COUNT=$(wc -l < .deepflow/auto-snapshot.txt | tr -d ' ')
149
+
150
+ if [ "${TEST_COUNT}" = "0" ]; then
151
+ echo "Bootstrap needed: no pre-existing test files found."
152
+ BOOTSTRAP_NEEDED=true
153
+ else
154
+ BOOTSTRAP_NEEDED=false
155
+ fi
156
+ ```
157
+
158
+ **If `BOOTSTRAP_NEEDED=true`:**
159
+
160
+ 1. **Inject a bootstrap task** as the FIRST action before any regular PLAN.md task is executed:
161
+ - Bootstrap task description: "Write tests for files in edit_scope"
162
+ - Read `edit_scope` from `specs/doing-*.md` to know which files need tests
163
+ - Spawn ONE dedicated bootstrap agent using the Bootstrap Task prompt (section 6)
164
+
165
+ 2. **Bootstrap agent behavior:**
166
+ - Write tests covering the files listed in `edit_scope`
167
+ - Commit as `test({spec}): bootstrap tests for edit_scope`
168
+ - The bootstrap agent's ONLY job is writing tests — no implementation changes
169
+
170
+ 3. **After bootstrap agent completes:**
171
+ - Run ratchet health checks (build must pass; test suite must not error out)
172
+ - If ratchet passes: re-take the ratchet snapshot so subsequent tasks use the new tests as baseline:
173
+ ```bash
174
+ cd ${WORKTREE_PATH}
175
+ git ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' \
176
+ > .deepflow/auto-snapshot.txt
177
+ echo "Post-bootstrap snapshot: $(wc -l < .deepflow/auto-snapshot.txt) test files"
178
+ ```
179
+ - If ratchet fails: revert bootstrap commit, log error, halt and report "Bootstrap failed — manual intervention required"
180
+
181
+ 4. **Signal to caller:** After bootstrap completes successfully, report `"bootstrap: completed"` in the cycle summary. This cycle's sole output is the test bootstrap — no regular PLAN.md task is executed this cycle.
182
+
183
+ 5. **Subsequent cycles:** The updated `.deepflow/auto-snapshot.txt` now contains the bootstrapped test files. All subsequent ratchet checks use these as the baseline.
184
+
185
+ **If `BOOTSTRAP_NEEDED=false`:** Proceed normally to section 2.
186
+
163
187
  ### 2. LOAD PLAN
164
188
 
165
189
  ```
@@ -175,162 +199,251 @@ For each `[ ]` task in PLAN.md: `TaskCreate(subject: "{task_id}: {description}",
175
199
 
176
200
  Warn if `specs/*.md` (excluding doing-/done-) exist. Non-blocking.
177
201
 
178
- ### 4. CHECK EXPERIMENT STATUS (HYPOTHESIS VALIDATION)
202
+ ### 4. IDENTIFY READY TASKS
179
203
 
180
- **Before identifying ready tasks**, check experiment validation for full implementation tasks.
204
+ Use TaskList to find ready tasks:
181
205
 
182
- **Task Types:**
183
- - **Spike tasks**: Have `[SPIKE]` in title OR `Type: spike` in description — always executable
184
- - **Full implementation tasks**: Blocked by spike tasks — require validated experiment
206
+ ```
207
+ Ready = TaskList results where:
208
+ - status: "pending"
209
+ - blockedBy: empty (auto-unblocked by native dependency system)
210
+ ```
211
+
212
+ ### 5. SPAWN AGENTS
185
213
 
186
- **Validation Flow:**
214
+ Context ≥50%: checkpoint and exit.
187
215
 
216
+ **Before spawning each agent**, mark its native task as in_progress:
188
217
  ```
189
- For each task in plan:
190
- If task is spike task:
191
- → Mark as executable (spikes are always allowed)
192
- Else if task is blocked by a spike task (T{n}):
193
- → Find related experiment file in .deepflow/experiments/
194
- → Check experiment status:
195
- - --passed.md exists → Unblock, proceed with implementation
196
- - --failed.md exists → Keep blocked, warn user
197
- - --active.md exists → Keep blocked, spike in progress
198
- - No experiment → Keep blocked, spike not started
218
+ TaskUpdate(taskId: native_id, status: "in_progress")
199
219
  ```
220
+ This activates the UI spinner showing the task's activeForm (e.g. "Creating upload endpoint").
200
221
 
201
- **Experiment File Discovery:**
222
+ **NEVER use `isolation: "worktree"` on Task tool calls.** Deepflow manages a shared worktree per spec (`.deepflow/worktrees/{spec}/`) so wave 2 agents see wave 1 commits. Claude Code's native isolation creates separate per-agent worktrees (`.claude/worktrees/`) where agents can't see each other's work.
202
223
 
203
- ```
204
- Glob: .deepflow/experiments/{topic}--*--{status}.md
224
+ **Spawn ALL ready tasks in ONE message** with multiple Task tool calls (true parallelism). Same-file conflicts: spawn sequentially.
205
225
 
206
- Topic extraction:
207
- 1. From spike task: experiment file path in task description
208
- 2. From spec name: doing-{topic} → {topic}
209
- 3. Fuzzy match: normalize and match
210
- ```
226
+ **Multiple [SPIKE] tasks for the same problem:** When PLAN.md contains two or more `[SPIKE]` tasks grouped by the same "Blocked by:" target or identical problem description, do NOT run them sequentially. Instead, follow the **Parallel Spike Probes** protocol in section 5.7 before spawning any implementation tasks that depend on the spike outcome.
227
+
228
+ ### 5.5. RATCHET CHECK
211
229
 
212
- **Status Handling:**
230
+ After each agent completes (notification received), the orchestrator runs health checks on the worktree.
213
231
 
214
- | Experiment Status | Task Status | Action |
215
- |-------------------|-------------|--------|
216
- | `--passed.md` | Ready | Execute full implementation |
217
- | `--failed.md` | Blocked | Skip, warn: "Experiment failed, re-plan needed" |
218
- | `--active.md` | Blocked | Skip, info: "Waiting for spike completion" |
219
- | Not found | Blocked | Skip, info: "Spike task not executed yet" |
232
+ **Step 1: Detect commands** (same auto-detection as /df:verify):
220
233
 
221
- **Warning Output:**
234
+ | File | Build | Test | Typecheck | Lint |
235
+ |------|-------|------|-----------|------|
236
+ | `package.json` | `npm run build` (if scripts.build) | `npm test` (if scripts.test not placeholder) | `npx tsc --noEmit` (if tsconfig.json) | `npm run lint` (if scripts.lint) |
237
+ | `pyproject.toml` | — | `pytest` | `mypy .` (if mypy in deps) | `ruff check .` (if ruff in deps) |
238
+ | `Cargo.toml` | `cargo build` | `cargo test` | — | `cargo clippy` (if installed) |
239
+ | `go.mod` | `go build ./...` | `go test ./...` | — | `go vet ./...` |
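A simplified sketch of this detection, covering only the build/test columns and ignoring the script/dependency conditions; `detect_checks` is a hypothetical helper:

```shell
# Hypothetical manifest-based command detection (build/test only).
detect_checks() {
  if   [ -f "$1/package.json" ];   then echo "npm run build && npm test"
  elif [ -f "$1/pyproject.toml" ]; then echo "pytest"
  elif [ -f "$1/Cargo.toml" ];     then echo "cargo build && cargo test"
  elif [ -f "$1/go.mod" ];         then echo "go build ./... && go test ./..."
  else echo "none"
  fi
}

WT=$(mktemp -d)
touch "$WT/go.mod"
detect_checks "$WT"   # prints: go build ./... && go test ./...
```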
222
240
 
241
+ **Step 2: Run health checks** in the worktree:
242
+ ```bash
243
+ cd ${WORKTREE_PATH}
244
+
245
+ # Run each detected command
246
+ # Build → Test → Typecheck → Lint (stop on first failure)
223
247
  ```
224
- ⚠ T3 blocked: Experiment 'upload--streaming--failed.md' did not validate
225
- Run /df:plan to generate new hypothesis spike
248
+
249
+ **Step 3: Validate edit scope** (if spec declares `edit_scope`):
250
+ ```bash
251
+ # Get files changed by the agent
252
+ CHANGED=$(git diff HEAD~1 --name-only)
253
+
254
+ # Load edit_scope from spec (files/globs)
255
+ EDIT_SCOPE=$(grep -h 'edit_scope:' specs/doing-*.md | sed 's/edit_scope://' | tr ',' '\n' | xargs)
256
+
257
+ # Check each changed file against allowed scope
258
+ VIOLATIONS=()
+ for file in ${CHANGED}; do
259
+ ALLOWED=false
260
+ for pattern in ${EDIT_SCOPE}; do
261
+ # Match file against glob pattern
262
+ [[ "${file}" == ${pattern} ]] && ALLOWED=true
263
+ done
264
+ ${ALLOWED} || VIOLATIONS+=("${file}")
265
+ done
226
266
  ```
227
267
 
228
- ### 5. IDENTIFY READY TASKS
268
+ - Violations found → revert: `git revert HEAD --no-edit`, report "✗ Edit scope violation: {files}"
269
+ - No violations → continue to health checks
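The scope check can be exercised self-contained with hypothetical scope globs and file names (assumes bash, whose unquoted `==` right-hand side performs glob matching):

```shell
# Hypothetical edit_scope check: src/billing/invoice.ts falls outside scope.
EDIT_SCOPE="src/upload/* src/shared/types.ts"
CHANGED="src/upload/handler.ts src/billing/invoice.ts"

VIOLATIONS=()
for file in ${CHANGED}; do
  ALLOWED=false
  for pattern in ${EDIT_SCOPE}; do
    # Unquoted right-hand side makes == perform glob matching in bash
    [[ "${file}" == ${pattern} ]] && ALLOWED=true
  done
  ${ALLOWED} || VIOLATIONS+=("${file}")
done

echo "violations: ${VIOLATIONS[*]}"   # prints: violations: src/billing/invoice.ts
```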
270
+
271
+ **Step 4: Evaluate**:
272
+ - All checks pass AND no scope violations → task succeeds, commit stands
273
+ - Any check fails → regression detected → revert: `git revert HEAD --no-edit`
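The revert path can be exercised end-to-end in a throwaway repository (file names and commit messages hypothetical):

```shell
# Hypothetical demo of the regression-revert path in a scratch repository.
cd "$(mktemp -d)"
git init -q .
git config user.email df@example.com && git config user.name deepflow

echo "stable" > app.txt
git add app.txt && git commit -qm "feat(upload): baseline"

echo "regression" > app.txt
git commit -qam "feat(upload): task commit"

# Ratchet failed → undo the task commit while keeping history auditable
git revert --no-edit HEAD >/dev/null
cat app.txt   # prints: stable
```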
274
+
275
+ **Ratchet uses ONLY pre-existing test files** (from `.deepflow/auto-snapshot.txt`). If the agent added new test files that fail, those are excluded from evaluation — the agent's new tests don't influence the ratchet decision.
276
+
277
+ **For spike tasks:** Same ratchet. If the spike's code passes pre-existing health checks, the spike passes. No LLM judges another LLM's work.
278
+
279
+ ### 5.7. PARALLEL SPIKE PROBES
229
280
 
230
- Use TaskList to find ready tasks (replaces manual PLAN.md parsing):
281
+ When two or more `[SPIKE]` tasks address the **same problem** (same "Blocked by:" target OR identical or near-identical hypothesis wording), treat them as a probe set and run this protocol instead of the standard single-agent flow.
282
+
283
+ #### Detection
231
284
 
232
285
  ```
233
- Ready = TaskList results where:
234
- - status: "pending"
235
- - blockedBy: empty (auto-unblocked by native dependency system)
286
+ Spike group = all [SPIKE] tasks where:
287
+ - same "Blocked by:" value, OR
288
+ - problem description is identical after stripping task ID prefix
289
+ If group size ≥ 2 → enter parallel probe mode
236
290
  ```
237
291
 
238
- **Cross-check with experiment validation** (for spike-blocked tasks):
239
- - If task depends on spike AND experiment not `--passed.md` → still blocked
240
- - TaskUpdate to add spike as blocker if not already set
292
+ #### Step 1: Record baseline commit
241
293
 
242
- Ready = TaskList pending + empty blockedBy + experiment validated (if applicable).
294
+ ```bash
295
+ cd ${WORKTREE_PATH}
296
+ BASELINE=$(git rev-parse HEAD)
297
+ echo "Probe baseline: ${BASELINE}"
298
+ ```
243
299
 
244
- ### 6. SPAWN AGENTS
300
+ All probes branch from this exact commit so they share the same ratchet baseline.
245
301
 
246
- Context ≥50%: checkpoint and exit.
302
+ #### Step 2: Create isolated sub-worktrees
247
303
 
248
- **Before spawning each agent**, mark its native task as in_progress:
304
+ For each spike `{SPIKE_ID}` in the probe group:
305
+
306
+ ```bash
307
+ PROBE_BRANCH="df/${SPEC_NAME}/probe-${SPIKE_ID}"
308
+ PROBE_PATH=".deepflow/worktrees/${SPEC_NAME}/probe-${SPIKE_ID}"
309
+
310
+ git worktree add -b "${PROBE_BRANCH}" "${PROBE_PATH}" "${BASELINE}"
311
+ echo "Created probe worktree: ${PROBE_PATH} (branch: ${PROBE_BRANCH})"
249
312
  ```
250
- TaskUpdate(taskId: native_id, status: "in_progress")
313
+
314
+ #### Step 3: Spawn all probes in parallel
315
+
316
+ Mark every spike task as `in_progress`, then spawn one agent per probe **in a single message** using the Spike Task prompt (section 6), with the probe's worktree path as its working directory.
317
+
318
+ ```
319
+ TaskUpdate(taskId: native_id_SPIKE_A, status: "in_progress")
320
+ TaskUpdate(taskId: native_id_SPIKE_B, status: "in_progress")
321
+ [spawn agent for SPIKE_A → PROBE_PATH_A]
322
+ [spawn agent for SPIKE_B → PROBE_PATH_B]
323
+ ... (all in ONE message)
251
324
  ```
252
- This activates the UI spinner showing the task's activeForm (e.g. "Creating upload endpoint").
253
325
 
254
- **NEVER use `isolation: "worktree"` on Task tool calls.** Deepflow manages a shared worktree per spec (`.deepflow/worktrees/{spec}/`) so wave 2 agents see wave 1 commits. Claude Code's native isolation creates separate per-agent worktrees (`.claude/worktrees/`) where agents can't see each other's work.
326
+ End your turn. Do NOT poll or monitor. Wait for completion notifications.
255
327
 
256
- **Spawn ALL ready tasks in ONE message** with multiple Task tool calls (true parallelism). Same-file conflicts: spawn sequentially.
328
+ #### Step 4: Ratchet each probe (on completion notifications)
257
329
 
258
- **Spike Task Execution:**
259
- When spawning a spike task, the agent MUST:
260
- 1. Execute the minimal validation method
261
- 2. Record structured criteria evaluation in result file (see spike result schema above)
262
- 3. Write experiment file with `--active.md` status (verifier determines final status)
263
- 4. Commit as `spike({spec}): validate {hypothesis}`
330
+ When a probe agent's notification arrives, run the standard ratchet (section 5.5) against its dedicated probe worktree:
264
331
 
265
- **IMPORTANT:** Spike agent writes `--active.md`, NOT `--passed.md` or `--failed.md`. The verifier determines final status.
332
+ ```bash
333
+ cd ${PROBE_PATH}
334
+
335
+ # Identical health-check commands as standard tasks
336
+ # Build → Test → Typecheck → Lint (stop on first failure)
337
+ ```
338
+
339
+ Record per-probe metrics:
340
+
341
+ ```yaml
342
+ probe_id: SPIKE_A
343
+ worktree: .deepflow/worktrees/{spec}/probe-SPIKE_A
344
+ branch: df/{spec}/probe-SPIKE_A
345
+ ratchet_passed: true/false
346
+ regressions: 0 # failing pre-existing tests
347
+ coverage_delta: +3 # new lines covered (positive = better)
348
+ files_changed: 4 # number of files touched
349
+ commit: abc1234
350
+ ```
266
351
 
267
- ### 6.5. VERIFY SPIKE RESULTS
352
+ Wait until **all** probe notifications have arrived before proceeding to selection.
268
353
 
269
- After spike completes, spawn verifier BEFORE unblocking implementation tasks.
354
+ #### Step 5: Machine-select winner
270
355
 
271
- **Trigger:** Spike result file detected (`.deepflow/results/T{n}.yaml` with `type: spike`)
356
+ No LLM evaluates another LLM's work. Apply the following ordered criteria to all probes that **passed** the ratchet:
272
357
 
273
- **Spawn:**
274
358
  ```
275
- Task(subagent_type="reasoner", model="opus", prompt=VERIFIER_PROMPT)
359
+ 1. Fewer regressions (lower is better — hard gate: any regression disqualifies)
360
+ 2. Better coverage (higher delta is better)
361
+ 3. Fewer files changed (lower is better — smaller blast radius)
362
+
363
+ Tie-break: first probe to complete (chronological)
276
364
  ```
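Assuming per-probe metrics collected as CSV rows (probe names and values hypothetical), the ordered criteria reduce to a hard-gate filter plus a multi-key sort; the chronological tie-break is omitted in this sketch:

```shell
# Hypothetical machine selection over per-probe metrics.
# CSV fields: probe_id,regressions,coverage_delta,files_changed
metrics() {
  printf '%s\n' \
    "SPIKE_A,0,3,4" \
    "SPIKE_B,0,5,2" \
    "SPIKE_C,1,9,1"
}

# Hard gate: any regression disqualifies; then coverage desc, files asc.
WINNER=$(metrics \
  | awk -F, '$2 == 0' \
  | sort -t, -k2,2n -k3,3nr -k4,4n \
  | head -n 1 | cut -d, -f1)

echo "winner: ${WINNER}"   # prints: winner: SPIKE_B
```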
277
365
 
278
- **Verifier Prompt:**
366
+ If **no** probe passes the ratchet, all are failed probes. Log insights (step 7) and reset the spike tasks to `pending` for retry with debugger guidance.
367
+
368
+ #### Step 6: Preserve ALL probe worktrees
369
+
370
+ Do NOT delete losing probe worktrees. They are preserved for manual inspection and cross-cycle learning:
371
+
372
+ ```bash
373
+ # Winning probe: leave as-is, will be used as implementation base (step 8)
374
+ # Losing probes: leave worktrees intact, mark branches with -failed suffix for clarity
375
+ git branch -m "df/{spec}/probe-SPIKE_B" "df/{spec}/probe-SPIKE_B-failed"
279
376
  ```
280
- SPIKE VERIFICATION — Be skeptical. Catch false positives.
281
377
 
282
- Task: {task_id}
283
- Result: {worktree_path}/.deepflow/results/{task_id}.yaml
284
- Experiment: {worktree_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
378
+ Record all probe paths in `.deepflow/checkpoint.json` under `"spike_probes"` so future `--continue` runs know they exist.
285
379
 
286
- For each criterion in result file:
287
- 1. Is `actual` a concrete number? (reject "good", "improved", "better")
288
- 2. Does `actual` satisfy `target`? Do the math.
289
- 3. Is `met` correct?
380
+ #### Step 7: Log failed probe insights
290
381
 
291
- Reject these patterns:
292
- - "Works but doesn't meet target" → FAILED
293
- - "Close enough" → FAILED
294
- - Actual 1500 vs Target >= 7000 → FAILED
382
+ For every probe that failed the ratchet (or lost selection), write two entries to `.deepflow/auto-memory.yaml` in the **main** tree.
295
383
 
296
- Output to {worktree_path}/.deepflow/results/{task_id}-verified.yaml:
297
- verified_status: VERIFIED_PASS|VERIFIED_FAIL
298
- override: true|false
299
- reason: "one line"
384
+ **Entry 1 — `spike_insights` (detailed probe record):**
385
+
386
+ ```yaml
387
+ spike_insights:
388
+ - date: "YYYY-MM-DD"
389
+ spec: "{spec_name}"
390
+ spike_id: "SPIKE_B"
391
+ hypothesis: "{hypothesis text from PLAN.md}"
392
+ outcome: "failed" # or "passed-but-lost"
393
+ failure_reason: "{first failed check and error summary}"
394
+ ratchet_metrics:
395
+ regressions: 2
396
+ coverage_delta: -1
397
+ files_changed: 7
398
+ worktree: ".deepflow/worktrees/{spec}/probe-SPIKE_B-failed"
399
+ branch: "df/{spec}/probe-SPIKE_B-failed"
400
+ edge_cases: [] # orchestrator may populate after manual review
401
+ ```
402
+
403
+ **Entry 2 — `probe_learnings` (cross-cycle memory, read by `/df:auto-cycle` on each cycle start):**
300
404
 
301
- Then rename experiment:
302
- - VERIFIED_PASS → --passed.md
303
- - VERIFIED_FAIL → --failed.md (add "Next hypothesis:" to Conclusion)
405
+ ```yaml
406
+ probe_learnings:
407
+ - spike: "SPIKE_B"
408
+ probe: "{probe branch suffix, e.g. probe-SPIKE_B}"
409
+ insight: "{one-sentence summary of what the probe revealed, derived from failure_reason}"
304
410
  ```
305
411
 
306
- **Gate:**
412
+ If the file does not exist, create it. Initialize both `spike_insights:` and `probe_learnings:` as empty lists before appending. Preserve all existing keys when merging.
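A plain-shell sketch of the create-then-append behavior (entry values hypothetical; it assumes `probe_learnings:` is the file's last top-level key — a YAML-aware tool would be more robust):

```shell
# Hypothetical: initialize auto-memory.yaml if missing, then append one
# probe_learnings entry via plain-text manipulation.
cd "$(mktemp -d)"
MEM=auto-memory.yaml

[ -f "$MEM" ] || printf 'spike_insights: []\nprobe_learnings: []\n' > "$MEM"

# Open the empty list before the first append
grep -q '^probe_learnings: \[\]$' "$MEM" \
  && sed -i.bak 's/^probe_learnings: \[\]$/probe_learnings:/' "$MEM"

cat >> "$MEM" <<'EOF'
  - spike: "SPIKE_B"
    probe: "probe-SPIKE_B"
    insight: "streaming approach regressed two checksum tests"
EOF

cat "$MEM"
```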
413
+
414
+ #### Step 8: Promote winning probe
415
+
416
+ Cherry-pick the winner's commit into the shared spec worktree so downstream implementation tasks see the winning approach:
417
+
418
+ ```bash
419
+ cd ${WORKTREE_PATH} # shared worktree (not the probe sub-worktree)
420
+ git cherry-pick ${WINNER_COMMIT}
307
421
  ```
308
- VERIFIED_PASS →
309
- TaskUpdate(taskId: spike_native_id, status: "completed")
310
- # Native system auto-unblocks dependent tasks
311
- Log "✓ Spike {task_id} verified"
312
422
 
313
- VERIFIED_FAIL
314
- # Spike task stays as pending, dependents remain blocked
315
- Log "✗ Spike {task_id} failed verification"
316
- If override: log "⚠ Agent incorrectly marked as passed"
423
+ Then mark the winning spike task as `completed` and auto-unblock its dependents:
424
+
425
+ ```
426
+ TaskUpdate(taskId: native_id_SPIKE_WINNER, status: "completed")
427
+ TaskUpdate(taskId: native_id_SPIKE_LOSERS, status: "pending") # keep visible for audit
317
428
  ```
318
429
 
319
- On task failure: spawn `Task(subagent_type="reasoner", model="opus", prompt="Debug failure: {error details}")`.
430
+ Update PLAN.md:
431
+ - Winning spike → `[x]` with commit hash and `[PROBE_WINNER]` tag
432
+ - Losing spikes → `[~]` (skipped) with `[PROBE_FAILED: see auto-memory.yaml]` note
433
+
434
+ Resume the standard execution loop (section 9) — implementation tasks blocked by the spike group are now unblocked.
320
435
 
321
- ### 7. PER-TASK (agent prompt)
436
+ ---
437
+
438
+ ### 6. PER-TASK (agent prompt)
322
439
 
323
440
  **Common preamble (include in all agent prompts):**
324
441
  ```
325
442
  Working directory: {worktree_absolute_path}
326
443
  All file operations MUST use this absolute path as base. Do NOT write files to the main project directory.
327
444
  Commit format: {commit_type}({spec}): {description}
328
- Result file: {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
329
445
 
330
- STOP after writing the result file. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main. These are handled by the orchestrator and /df:verify.
331
-
332
- Navigation: Prefer LSP tools (goToDefinition, findReferences, workspaceSymbol) over Grep/Glob for code navigation. Fall back to Grep/Glob if LSP unavailable.
333
- If LSP errors, install the language server (TS→typescript-language-server, Python→pyright, Rust→rust-analyzer, Go→gopls) and retry. If still unavailable, use Grep/Glob.
446
+ STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main. These are handled by the orchestrator and /df:verify.
334
447
  ```
335
448
 
336
449
  **Standard Task (append after preamble):**
@@ -341,44 +454,49 @@ Spec: {spec_name}
341
454
 
342
455
  Steps:
343
456
  1. Implement the task
344
- 2. Detect and run the project's test command if test infrastructure exists
345
- - If tests fail: fix and re-run until passing. Do NOT commit with failing tests
346
- - If NO test infrastructure: set tests_ran: false in result file
347
- 3. Commit as feat({spec}): {description}
348
- 4. Write result file with ALL fields including test evidence (see schema)
457
+ 2. Commit as feat({spec}): {description}
458
+
459
+ Your ONLY job is to write code and commit. The orchestrator will run health checks after you finish.
460
+ ```
461
+
462
+ **Bootstrap Task (append after preamble):**
463
+ ```
464
+ BOOTSTRAP: Write tests for files in edit_scope
465
+ Files: {edit_scope files from spec}
466
+ Spec: {spec_name}
467
+
468
+ Steps:
469
+ 1. Write tests that cover the functionality of the files listed above
470
+ 2. Do NOT change implementation files — tests only
471
+ 3. Commit as test({spec}): bootstrap tests for edit_scope
472
+
473
+ Your ONLY job is to write tests and commit. The orchestrator will run health checks after you finish.
349
474
  ```
350
475
 
351
476
  **Spike Task (append after preamble):**
352
477
  ```
353
478
  {task_id} [SPIKE]: {hypothesis}
354
- Type: spike
355
- Method: {minimal steps}
356
- Success criteria: {measurable targets}
357
- Experiment file: {worktree_absolute_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
479
+ Files: {target files}
480
+ Spec: {spec_name}
358
481
 
359
482
  Steps:
360
- 1. Execute method
361
- 2. For EACH criterion: record target, measure actual, compare (show math)
362
- 3. Write experiment as --active.md (verifier determines final status)
363
- 4. Commit: spike({spec}): validate {hypothesis}
364
- 5. Write result to .deepflow/results/{task_id}.yaml (see spike result schema)
365
- 6. If test infrastructure exists, also run tests and include evidence in result file
483
+ 1. Implement the minimal spike to validate the hypothesis
484
+ 2. Commit as spike({spec}): {description}
366
485
 
367
- Rules:
368
- - `met: true` ONLY if actual satisfies target
369
- - `status: success` ONLY if ALL criteria met
370
- - Worse than baseline = FAILED (baseline 7k, actual 1.5k → FAILED)
371
- - "Close enough" = FAILED
372
- - Verifier will check. False positives waste resources.
486
+ Your ONLY job is to write code and commit. The orchestrator will run health checks to determine if the spike passes.
373
487
  ```
374
488
 
375
- ### 8. FAILURE HANDLING
489
+ ### 7. FAILURE HANDLING
490
+
491
+ When a task fails ratchet and is reverted:
492
+
493
+ `TaskUpdate(taskId: native_id, status: "pending")` — keeps task visible for retry; dependents remain blocked.
376
494
 
377
- When a task fails and cannot be auto-fixed:
495
+ On repeated failure: spawn `Task(subagent_type="reasoner", model={model from debugger frontmatter, default "sonnet"}, prompt="Debug failure: {ratchet output}")`.
378
496
 
379
- `TaskUpdate(taskId: native_id, status: "pending")` — keeps task visible for retry; dependents remain blocked. Leave worktree intact, keep checkpoint.json, output: worktree path/branch, `cd {worktree_path}` to investigate, `/df:execute --continue` to resume, `/df:execute --fresh` to discard.
497
+ Leave worktree intact, keep checkpoint.json, output: worktree path/branch, `cd {worktree_path}` to investigate, `/df:execute --continue` to resume, `/df:execute --fresh` to discard.
380
498
 
381
- ### 9. COMPLETE SPECS
499
+ ### 8. COMPLETE SPECS
382
500
 
383
501
  When all tasks done for a `doing-*` spec:
384
502
  1. Embed history in spec: `## Completed` section with task list and commit hashes
@@ -397,19 +515,16 @@ When all tasks done for a `doing-*` spec:
397
515
  - Separators (`---`) between removed sections
398
516
  5. Recalculate the Summary table at the top of PLAN.md (update counts for completed/pending)
399
517
 
400
- ### 10. ITERATE (Notification-Driven)
518
+ ### 9. ITERATE (Notification-Driven)
401
519
 
402
520
  After spawning wave agents, your turn ENDS. Completion notifications drive the loop.
403
521
 
404
522
  **Per notification:**
405
- 1. Read result file for the completed agent
406
- 2. Validate test evidence:
407
- - `tests_ran: true` + `test_exit_code: 0` → trust result
408
- - `tests_ran: true` + `test_exit_code: non-zero` → status MUST be failed (flag mismatch if agent said success)
409
- - `tests_ran: false` + `status: success` → flag: "⚠ Tx: success but no tests ran"
410
- 3. TaskUpdate(taskId: native_id, status: "completed") — auto-unblocks dependent tasks
411
- 4. Update PLAN.md: `[ ]` → `[x]` + commit hash (as before)
412
- 5. Report: "✓ T1: success (abc123) [12 tests passed]" or "⚠ T1: success (abc123) [no tests]"
523
+ 1. Run ratchet check for the completed agent (see section 5.5)
524
+ 2. Ratchet passed → `TaskUpdate(taskId: native_id, status: "completed")` — auto-unblocks dependent tasks
525
+ 3. Ratchet failed → revert commit, `TaskUpdate(taskId: native_id, status: "pending")`
526
+ 4. Update PLAN.md: `[ ]` → `[x]` + commit hash (on pass) or note revert (on fail)
527
+ 5. Report: "✓ T1: ratchet passed (abc123)" or "✗ T1: ratchet failed, reverted"
413
528
  6. If NOT all wave agents done → end turn, wait
414
529
  7. If ALL wave agents done → use TaskList to find newly unblocked tasks, check context, spawn next wave or finish
415
530
 
@@ -421,69 +536,86 @@ After spawning wave agents, your turn ENDS. Completion notifications drive the l
421
536
 
422
537
  | Rule | Detail |
423
538
  |------|--------|
539
+ | Zero test files → bootstrap first | Section 1.7; bootstrap is the cycle's sole task when snapshot is empty |
424
540
  | 1 task = 1 agent = 1 commit | `atomic-commits` skill |
425
541
  | 1 file = 1 writer | Sequential if conflict |
426
- | Agents verify internally | Fix issues, don't report |
542
+ | Agent writes code, orchestrator measures | Ratchet is the judge |
543
+ | No LLM evaluates LLM work | Health checks only |
544
+ | ≥2 spikes for same problem → parallel probes | Section 5.7; never run competing spikes sequentially |
545
+ | All probe worktrees preserved | Losing probes renamed with `-failed` suffix; never deleted |
546
+ | Machine-selected winner | Fewer regressions > better coverage > fewer files changed; no LLM judge |
547
+ | Failed probe insights logged | `.deepflow/auto-memory.yaml` in main tree; persists across cycles |
548
+ | Winner cherry-picked to shared worktree | Downstream tasks see winning approach via shared worktree |
427
549
 
428
550
  ## Example
429
551
 
552
+ ### No-Tests Bootstrap
553
+
554
+ ```
555
+ /df:execute (context: 8%)
556
+
557
+ Loading PLAN.md... T1 ready, T2/T3 blocked by T1
558
+ Ratchet snapshot: 0 pre-existing test files
559
+ Bootstrap needed: no pre-existing test files found.
560
+
561
+ Spawning bootstrap agent for edit_scope...
562
+ [Bootstrap agent completed]
563
+ Running ratchet: build ✓ | tests ✓ (12 new tests pass)
564
+ ✓ Bootstrap: ratchet passed (boo1234)
565
+ Re-taking ratchet snapshot: 3 test files
566
+
567
+ bootstrap: completed — cycle's sole task was test bootstrap
568
+ Next: Run /df:auto-cycle again to execute T1
569
+ ```
570
+
430
571
  ### Standard Execution
431
572
 
432
573
  ```
433
574
  /df:execute (context: 12%)
434
575
 
435
576
  Loading PLAN.md... T1 ready, T2/T3 blocked by T1
577
+ Ratchet snapshot: 24 pre-existing test files
436
578
  Registering native tasks: TaskCreate T1/T2/T3, TaskUpdate(T2 blockedBy T1), TaskUpdate(T3 blockedBy T1)
437
579
 
438
580
  Wave 1: TaskUpdate(T1, in_progress)
439
- [Agent "T1" completed] TaskUpdate(T1, completed) → auto-unblocks T2, T3
440
- T1: success (abc1234)
581
+ [Agent "T1" completed]
582
+ Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
583
+ ✓ T1: ratchet passed (abc1234)
584
+ TaskUpdate(T1, completed) → auto-unblocks T2, T3
441
585
 
442
586
  Wave 2: TaskUpdate(T2/T3, in_progress)
443
- [Agent "T2" completed] ✓ T2: success (def5678)
444
- [Agent "T3" completed] ✓ T3: success (ghi9012)
587
+ [Agent "T2" completed]
588
+ Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
589
+ ✓ T2: ratchet passed (def5678)
590
+ [Agent "T3" completed]
591
+ Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
592
+ ✓ T3: ratchet passed (ghi9012)
593
+
445
594
  Context: 35% — ✓ doing-upload → done-upload. Complete: 3/3
446
595
 
447
596
  Next: Run /df:verify to verify specs and merge to main
448
597
  ```
449
598
 
450
- ### Spike with Failure (Agent or Verifier Override)
599
+ ### Ratchet Failure (Regression Detected)
451
600
 
452
601
  ```
453
602
  /df:execute (context: 10%)
454
603
 
455
- Loading PLAN.md...
456
- Registering native tasks...
457
- TaskCreate T1 [SPIKE] (native: task-001)
458
- TaskCreate T2 (native: task-002)
459
- TaskCreate → T3 (native: task-003)
460
- TaskUpdate(task-002, addBlockedBy: [task-001])
461
- TaskUpdate(task-003, addBlockedBy: [task-001])
462
-
463
- Checking experiment status...
464
- T1 [SPIKE]: No experiment yet, spike executable
465
- T2, T3: Blocked by T1 (spike not validated)
466
-
467
- Spawning Wave 1: T1 [SPIKE]
468
- TaskUpdate(task-001, status: "in_progress")
469
-
470
- [Agent "T1 SPIKE" completed]
471
- ✓ T1: complete (agent said: success), verifying...
472
-
473
- Verifying T1...
474
- ✗ Spike T1 failed verification (throughput 1500 < 7000)
475
- ⚠ Agent incorrectly marked as passed — overriding to FAILED
476
- # Spike stays pending — dependents remain blocked
477
- → upload--streaming--failed.md
604
+ Wave 1: TaskUpdate(T1, in_progress)
605
+ [Agent "T1" completed]
606
+ Running ratchet: build ✓ | tests ✗ (2 failed of 24)
607
+ ✗ T1: ratchet failed, reverted
608
+ TaskUpdate(T1, pending)
478
609
 
479
- Spike T1 invalidated hypothesis
480
- Complete: 1/3 tasks (2 blocked by failed experiment)
610
+ Spawning debugger for T1...
611
+ [Debugger completed]
612
+ Re-running T1 with fix guidance...
481
613
 
482
- Next: Run /df:plan to generate new hypothesis spike
614
+ [Agent "T1 retry" completed]
615
+ Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
616
+ ✓ T1: ratchet passed (abc1234)
483
617
  ```
484
618
 
485
- Note: If the agent correctly reports `status: failed`, the "overriding to FAILED" line is omitted — the verifier simply confirms failure.
486
-
487
619
  ### With Checkpoint
488
620
 
489
621
  ```