deepflow 0.1.78 → 0.1.80

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -8,93 +8,44 @@ You are a coordinator. Spawn agents, run ratchet checks, update PLAN.md. Never i
 
  **ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, run ratchet health checks after each agent completes, update PLAN.md, write `.deepflow/decisions.md` in the main tree

- ---
+ ## Core Loop (Notification-Driven)

- ## Purpose
- Implement tasks from PLAN.md with parallel agents, atomic commits, ratchet-driven quality gates, and context-efficient execution.
+ Each task = one background agent. Completion notifications drive the loop.
+
+ **NEVER use TaskOutput** — returns full transcripts (100KB+) that explode context.

- ## Usage
  ```
- /df:execute            # Execute all ready tasks
- /df:execute T1 T2      # Specific tasks only
- /df:execute --continue # Resume from checkpoint
- /df:execute --fresh    # Ignore checkpoint
- /df:execute --dry-run  # Show plan only
+ 1. Spawn ALL wave agents with run_in_background=true in ONE message
+ 2. STOP. End your turn. Do NOT poll or monitor.
+ 3. On EACH notification:
+    a. Run ratchet check (section 5.5)
+    b. Passed → TaskUpdate(status: "completed"), update PLAN.md [x] + commit hash
+    c. Failed → git revert HEAD --no-edit, TaskUpdate(status: "pending")
+    d. Report ONE line: "✓ T1: ratchet passed (abc123)" or "✗ T1: ratchet failed, reverted"
+    e. NOT all done → end turn, wait | ALL done → next wave or finish
+ 4. Between waves: check context %. If ≥50% → checkpoint and exit.
+ 5. Repeat until: all done, all blocked, or context ≥50%.
  ```

- ## Skills & Agents
- - Skill: `atomic-commits` — Clean commit protocol
- - Skill: `context-hub` — Fetch external API docs before coding (when task involves external libraries)
+ ## Context Threshold

- **Use Task tool to spawn agents:**
- | Agent | subagent_type | Purpose |
- |-------|---------------|---------|
- | Implementation | `general-purpose` | Task implementation |
- | Debugger | `reasoner` | Debugging failures |
-
- **Model routing from frontmatter:**
- The model for each agent is determined by the `model:` field in the command/agent/skill frontmatter being invoked. The orchestrator reads the relevant frontmatter to determine which model to pass to `Task()`. If no `model:` field is present in the frontmatter, default to `sonnet`.
-
- ## Context-Aware Execution
-
- Statusline writes to `.deepflow/context.json`: `{"percentage": 45}`
+ Statusline writes `.deepflow/context.json`: `{"percentage": 45}`

  | Context % | Action |
  |-----------|--------|
  | < 50% | Full parallelism (up to 5 agents) |
  | ≥ 50% | Wait for running agents, checkpoint, exit |
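The threshold decision can be sketched in shell. This is a minimal sketch, not deepflow's implementation: it assumes the `{"percentage": N}` format shown above and parses it with `sed` so no `jq` dependency is needed; the example input value is hypothetical.

```shell
# Read the statusline percentage and pick the wave action.
# Self-contained sketch: writes an example context.json first.
mkdir -p .deepflow
echo '{"percentage": 45}' > .deepflow/context.json

pct=$(sed -n 's/.*"percentage":[[:space:]]*\([0-9][0-9]*\).*/\1/p' .deepflow/context.json)
if [ "$pct" -ge 50 ]; then
  action="checkpoint-and-exit"      # wait for running agents, checkpoint, exit
else
  action="full-parallelism"         # spawn up to 5 agents
fi
echo "$action"
```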

- ## Agent Protocol
-
- Each task = one background agent. Use agent completion notifications as the feedback loop.
-
- **NEVER use TaskOutput** — returns full agent transcripts (100KB+) that explode context.
-
- ### Notification-Driven Execution
-
- ```
- 1. Spawn ALL wave agents with run_in_background=true in ONE message
- 2. STOP. End your turn. Do NOT run Bash monitors or poll for results.
- 3. Wait for "Agent X completed" notifications (they arrive automatically)
- 4. On EACH notification:
-    a. Run ratchet check (health checks on the worktree)
-    b. Report: "✓ T1: ratchet passed (abc123)" or "✗ T1: ratchet failed, reverted"
-    c. Update PLAN.md for that task
-    d. Check: all wave agents done?
-       - No → end turn, wait for next notification
-       - Yes → proceed to next wave or write final summary
- ```
-
- After spawning, your turn ENDS. Per notification: run ratchet, output ONE line, update PLAN.md. Write full summary only after ALL wave agents complete.
-
- ## Checkpoint & Resume
-
- **File:** `.deepflow/checkpoint.json` — stored in WORKTREE directory, not main.
-
- **Schema:**
- ```json
- {
-   "completed_tasks": ["T1", "T2"],
-   "current_wave": 2,
-   "worktree_path": ".deepflow/worktrees/upload",
-   "worktree_branch": "df/upload"
- }
- ```
-
- **On checkpoint:** Complete wave → update PLAN.md → save to worktree → exit.
- **Resume:** `--continue` loads checkpoint, verifies worktree, skips completed tasks.
+ ---

  ## Behavior

  ### 1. CHECK CHECKPOINT

  ```
- --continue → Load checkpoint
-   If worktree_path exists:
-     Verify worktree still exists on disk
-     → If missing: Error "Worktree deleted. Use --fresh"
-     → If exists: Use it, skip worktree creation
-     → Resume execution with completed tasks
+ --continue → Load .deepflow/checkpoint.json from worktree
+   Verify worktree exists on disk (else error: "Use --fresh")
+   Skip completed tasks, resume execution
  --fresh → Delete checkpoint, start fresh
  checkpoint exists → Prompt: "Resume? (y/n)"
  else → Start fresh
@@ -102,88 +53,29 @@ else → Start fresh

  ### 1.5. CREATE WORKTREE

- Before spawning any agents, create an isolated worktree:
-
- ```
- # Check main is clean (ignore untracked)
- git diff --quiet HEAD || Error: "Main has uncommitted changes. Commit or stash first."
-
- # Generate paths
- SPEC_NAME=$(basename spec/doing-*.md .md | sed 's/doing-//')
- BRANCH_NAME="df/${SPEC_NAME}"
- WORKTREE_PATH=".deepflow/worktrees/${SPEC_NAME}"
-
- # Create worktree (or reuse existing)
- if [ -d "${WORKTREE_PATH}" ]; then
-   echo "Reusing existing worktree"
- else
-   git worktree add -b "${BRANCH_NAME}" "${WORKTREE_PATH}"
- fi
- ```
-
- **Existing worktree:** Reuse it (same spec = same worktree).
-
- **--fresh flag:** Deletes existing worktree and creates new one.
+ Require clean HEAD (`git diff --quiet`). Derive SPEC_NAME from `specs/doing-*.md`.
+ Create worktree: `.deepflow/worktrees/{spec}` on branch `df/{spec}`.
+ Reuse if exists. `--fresh` deletes first.

  ### 1.6. RATCHET SNAPSHOT

- Before spawning agents, snapshot pre-existing test files:
+ Snapshot pre-existing test files in worktree — only these count for ratchet (agent-created tests excluded):

  ```bash
  cd ${WORKTREE_PATH}
-
- # Snapshot pre-existing test files (only these count for ratchet)
  git ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' \
    > .deepflow/auto-snapshot.txt
-
- echo "Ratchet snapshot: $(wc -l < .deepflow/auto-snapshot.txt) pre-existing test files"
  ```
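The snapshot filter can be sanity-checked against sample paths. The file names below are illustrative, not from any real repo; the pattern is the same one the snapshot uses.

```shell
# Pipe sample paths through the snapshot's test-file pattern.
matches=$(printf '%s\n' \
    'src/app.ts' \
    'src/app.test.ts' \
    'tests/helpers.py' \
    'pkg/util_test.go' \
    'docs/readme.md' \
  | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/')
echo "$matches"
```

Only `src/app.test.ts`, `tests/helpers.py`, and `pkg/util_test.go` survive the filter; plain sources and docs do not.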

- **Only pre-existing test files are used for ratchet evaluation.** New test files created by agents during implementation don't influence the pass/fail decision. This prevents agents from gaming the ratchet by writing tests that pass trivially.
-
  ### 1.7. NO-TESTS BOOTSTRAP

- After the ratchet snapshot, check if zero test files were found:
-
- ```bash
- TEST_COUNT=$(wc -l < .deepflow/auto-snapshot.txt | tr -d ' ')
-
- if [ "${TEST_COUNT}" = "0" ]; then
-   echo "Bootstrap needed: no pre-existing test files found."
-   BOOTSTRAP_NEEDED=true
- else
-   BOOTSTRAP_NEEDED=false
- fi
- ```
-
- **If `BOOTSTRAP_NEEDED=true`:**
-
- 1. **Inject a bootstrap task** as the FIRST action before any regular PLAN.md task is executed:
-    - Bootstrap task description: "Write tests for files in edit_scope"
-    - Read `edit_scope` from `specs/doing-*.md` to know which files need tests
-    - Spawn ONE dedicated bootstrap agent using the Bootstrap Task prompt (section 6)
+ If snapshot has zero test files:

- 2. **Bootstrap agent behavior:**
-    - Write tests covering the files listed in `edit_scope`
-    - Commit as `test({spec}): bootstrap tests for edit_scope`
-    - The bootstrap agent's ONLY job is writing tests — no implementation changes
+ 1. Spawn ONE bootstrap agent (section 6 Bootstrap Task) to write tests for `edit_scope` files
+ 2. On ratchet pass: re-snapshot, report `"bootstrap: completed"`, end cycle (no PLAN.md tasks this cycle)
+ 3. On ratchet fail: revert, halt with "Bootstrap failed — manual intervention required"

- 3. **After bootstrap agent completes:**
-    - Run ratchet health checks (build must pass; test suite must not error out)
-    - If ratchet passes: re-take the ratchet snapshot so subsequent tasks use the new tests as baseline:
-      ```bash
-      cd ${WORKTREE_PATH}
-      git ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' \
-        > .deepflow/auto-snapshot.txt
-      echo "Post-bootstrap snapshot: $(wc -l < .deepflow/auto-snapshot.txt) test files"
-      ```
-    - If ratchet fails: revert bootstrap commit, log error, halt and report "Bootstrap failed — manual intervention required"
-
- 4. **Signal to caller:** After bootstrap completes successfully, report `"bootstrap: completed"` in the cycle summary. This cycle's sole output is the test bootstrap — no regular PLAN.md task is executed this cycle.
-
- 5. **Subsequent cycles:** The updated `.deepflow/auto-snapshot.txt` now contains the bootstrapped test files. All subsequent ratchet checks use these as the baseline.
-
- **If `BOOTSTRAP_NEEDED=false`:** Proceed normally to section 2.
+ Subsequent cycles use bootstrapped tests as ratchet baseline.

  ### 2. LOAD PLAN

@@ -194,7 +86,7 @@ If missing: "No PLAN.md found. Run /df:plan first."

  ### 2.5. REGISTER NATIVE TASKS

- For each `[ ]` task in PLAN.md: `TaskCreate(subject: "{task_id}: {description}", activeForm: "{gerund}", description: full block)`. Store task_id → native ID mapping. Then set dependencies: `TaskUpdate(addBlockedBy: [...])` for each "Blocked by:" entry. On `--continue`: only register remaining `[ ]` items.
+ For each `[ ]` task in PLAN.md: `TaskCreate(subject: "{task_id}: {description}", activeForm: "{gerund}", description: full block)`. Store task_id → native ID mapping. Set dependencies via `TaskUpdate(addBlockedBy: [...])`. On `--continue`: only register remaining `[ ]` items.

  ### 3. CHECK FOR UNPLANNED SPECS

@@ -202,237 +94,84 @@ Warn if `specs/*.md` (excluding doing-/done-) exist. Non-blocking.

  ### 4. IDENTIFY READY TASKS

- Use TaskList to find ready tasks:
-
- ```
- Ready = TaskList results where:
- - status: "pending"
- - blockedBy: empty (auto-unblocked by native dependency system)
- ```
+ Ready = TaskList where status: "pending" AND blockedBy: empty.

  ### 5. SPAWN AGENTS

  Context ≥50%: checkpoint and exit.

- **Before spawning each agent**, mark its native task as in_progress:
- ```
- TaskUpdate(taskId: native_id, status: "in_progress")
- ```
- This activates the UI spinner showing the task's activeForm (e.g. "Creating upload endpoint").
+ Before spawning: `TaskUpdate(taskId: native_id, status: "in_progress")` activates UI spinner.
+
+ **NEVER use `isolation: "worktree"` on Task calls.** Deepflow manages a shared worktree so wave 2 sees wave 1 commits.

- **NEVER use `isolation: "worktree"` on Task tool calls.** Deepflow manages a shared worktree per spec (`.deepflow/worktrees/{spec}/`) so wave 2 agents see wave 1 commits. Claude Code's native isolation creates separate per-agent worktrees (`.claude/worktrees/`) where agents can't see each other's work.
+ **Spawn ALL ready tasks in ONE message** EXCEPT file conflicts (see below).

- **Spawn ALL ready tasks in ONE message** with multiple Task tool calls (true parallelism). Same-file conflicts: spawn sequentially.
+ **File conflict enforcement (1 file = 1 writer):**
+ Before spawning, check `Files:` lists of all ready tasks. If two+ ready tasks share a file:
+ 1. Sort conflicting tasks by task number (T1 < T2 < T3)
+ 2. Spawn only the lowest-numbered task from each conflict group
+ 3. Remaining tasks stay `pending` — they become ready once the spawned task completes
+ 4. Log: `"⏳ T{N} deferred — file conflict with T{M} on (unknown)"`
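The conflict grouping can be sketched in shell. The task-to-file pairs are hypothetical; real input comes from each task's `Files:` list, and the lexical sort assumed here is only adequate for single-digit task ids.

```shell
# One "<task> <file>" pair per line; a file named by 2+ tasks is a conflict.
pairs='T1 src/upload.ts
T2 src/upload.ts
T3 src/parse.ts'

# Files with more than one writer
conflict=$(echo "$pairs" | awk '{print $2}' | sort | uniq -d)

# Lowest-numbered task gets the first spawn slot for that file
# (lexical sort; fine for T1..T9, a real version would sort numerically)
first=$(echo "$pairs" | awk -v f="$conflict" '$2 == f {print $1}' | sort | head -n 1)
echo "spawn $first; defer the rest"
```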

- **Multiple [SPIKE] tasks for the same problem:** When PLAN.md contains two or more `[SPIKE]` tasks grouped by the same "Blocked by:" target or identical problem description, do NOT run them sequentially. Instead, follow the **Parallel Spike Probes** protocol in section 5.7 before spawning any implementation tasks that depend on the spike outcome.
+ **≥2 [SPIKE] tasks for same problem:** Follow Parallel Spike Probes (section 5.7).

  ### 5.5. RATCHET CHECK

- After each agent completes (notification received), the orchestrator runs health checks on the worktree.
+ After each agent completes, run health checks in the worktree.

- **Step 1: Detect commands** (same auto-detection as /df:verify):
+ **Auto-detect commands:**

  | File | Build | Test | Typecheck | Lint |
  |------|-------|------|-----------|------|
- | `package.json` | `npm run build` (if scripts.build) | `npm test` (if scripts.test not placeholder) | `npx tsc --noEmit` (if tsconfig.json) | `npm run lint` (if scripts.lint) |
- | `pyproject.toml` | — | `pytest` | `mypy .` (if mypy in deps) | `ruff check .` (if ruff in deps) |
- | `Cargo.toml` | `cargo build` | `cargo test` | — | `cargo clippy` (if installed) |
+ | `package.json` | `npm run build` | `npm test` | `npx tsc --noEmit` | `npm run lint` |
+ | `pyproject.toml` | — | `pytest` | `mypy .` | `ruff check .` |
+ | `Cargo.toml` | `cargo build` | `cargo test` | — | `cargo clippy` |
  | `go.mod` | `go build ./...` | `go test ./...` | — | `go vet ./...` |

- **Step 2: Run health checks** in the worktree:
- ```bash
- cd ${WORKTREE_PATH}
+ Run Build → Test → Typecheck → Lint (stop on first failure).
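The stop-on-first-failure order can be sketched with a small helper. `run_checks` is a hypothetical name, and the `true`/`false` commands stand in for whatever the detection table produced; an empty string marks a slot with no detected command.

```shell
# Run detected commands in order; abort the ratchet on the first failure.
run_checks() {
  for cmd in "$@"; do
    if [ -n "$cmd" ]; then              # skip slots with no detected command
      if ! sh -c "$cmd" >/dev/null 2>&1; then
        echo "FAIL: $cmd"
        return 1
      fi
    fi
  done
  echo "PASS"
}

# Example: build ok, no test command, typecheck fails, lint never runs.
result=$(run_checks "true" "" "false" "true") || true
echo "$result"
```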

- # Run each detected command
- # Build → Test → Typecheck → Lint (stop on first failure)
- ```
+ **Edit scope validation** (if spec declares `edit_scope`): check `git diff HEAD~1 --name-only` against allowed globs. Violations → `git revert HEAD --no-edit`, report "Edit scope violation: {files}".

- **Step 3: Validate edit scope** (if spec declares `edit_scope`):
- ```bash
- # Get files changed by the agent
- CHANGED=$(git diff HEAD~1 --name-only)
-
- # Load edit_scope from spec (files/globs)
- EDIT_SCOPE=$(grep 'edit_scope:' specs/doing-*.md | sed 's/edit_scope://' | tr ',' '\n' | xargs)
-
- # Check each changed file against allowed scope
- for file in ${CHANGED}; do
-   ALLOWED=false
-   for pattern in ${EDIT_SCOPE}; do
-     # Match file against glob pattern
-     [[ "${file}" == ${pattern} ]] && ALLOWED=true
-   done
-   ${ALLOWED} || VIOLATIONS+=("${file}")
- done
- ```
-
- - Violations found → revert: `git revert HEAD --no-edit`, report "✗ Edit scope violation: {files}"
- - No violations → continue to health checks
+ **Impact completeness check** (if task has Impact block in PLAN.md):
+ Compare `git diff HEAD~1 --name-only` against Impact callers/duplicates list.
+ File listed but not modified → **advisory warning**: "Impact gap: {file} listed as {caller|duplicate} but not modified — verify manually". Not auto-revert (callers sometimes don't need changes), but flags the risk.
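The gap comparison can be sketched with `comm` over sorted lists. The file names and temp paths are illustrative; in the real check the second list comes from `git diff HEAD~1 --name-only`.

```shell
# Impact files the commit did not touch are advisory gaps.
impact='src/api.ts
src/caller.ts'
changed='src/api.ts
src/new.ts'

echo "$impact"  | sort > /tmp/impact_demo.txt
echo "$changed" | sort > /tmp/changed_demo.txt

# comm -23: lines only in the first (sorted) file, i.e. listed but not modified
gaps=$(comm -23 /tmp/impact_demo.txt /tmp/changed_demo.txt)

for f in $gaps; do
  echo "Impact gap: $f listed but not modified - verify manually"
done
rm -f /tmp/impact_demo.txt /tmp/changed_demo.txt
```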

- **Step 4: Evaluate**:
- - All checks pass AND no scope violations → task succeeds, commit stands
- - Any check fails → regression detected → revert: `git revert HEAD --no-edit`
+ **Evaluate:** All pass + no violations → commit stands. Any failure → `git revert HEAD --no-edit`.

- **Ratchet uses ONLY pre-existing test files** (from `.deepflow/auto-snapshot.txt`). If the agent added new test files that fail, those are excluded from evaluation — the agent's new tests don't influence the ratchet decision.
-
- **For spike tasks:** Same ratchet. If the spike's code passes pre-existing health checks, the spike passes. No LLM judges another LLM's work.
+ Ratchet uses ONLY pre-existing test files from `.deepflow/auto-snapshot.txt`.

  ### 5.7. PARALLEL SPIKE PROBES

- When two or more `[SPIKE]` tasks address the **same problem** (same "Blocked by:" target OR identical or near-identical hypothesis wording), treat them as a probe set and run this protocol instead of the standard single-agent flow.
-
- #### Detection
-
- ```
- Spike group = all [SPIKE] tasks where:
- - same "Blocked by:" value, OR
- - problem description is identical after stripping task ID prefix
- If group size ≥ 2 → enter parallel probe mode
- ```
-
- #### Step 1: Record baseline commit
-
- ```bash
- cd ${WORKTREE_PATH}
- BASELINE=$(git rev-parse HEAD)
- echo "Probe baseline: ${BASELINE}"
- ```
-
- All probes branch from this exact commit so they share the same ratchet baseline.
-
- #### Step 2: Create isolated sub-worktrees
-
- For each spike `{SPIKE_ID}` in the probe group:
-
- ```bash
- PROBE_BRANCH="df/${SPEC_NAME}/probe-${SPIKE_ID}"
- PROBE_PATH=".deepflow/worktrees/${SPEC_NAME}/probe-${SPIKE_ID}"
-
- git worktree add -b "${PROBE_BRANCH}" "${PROBE_PATH}" "${BASELINE}"
- echo "Created probe worktree: ${PROBE_PATH} (branch: ${PROBE_BRANCH})"
- ```
-
- #### Step 3: Spawn all probes in parallel
-
- Mark every spike task as `in_progress`, then spawn one agent per probe **in a single message** using the Spike Task prompt (section 6), with the probe's worktree path as its working directory.
-
- ```
- TaskUpdate(taskId: native_id_SPIKE_A, status: "in_progress")
- TaskUpdate(taskId: native_id_SPIKE_B, status: "in_progress")
- [spawn agent for SPIKE_A → PROBE_PATH_A]
- [spawn agent for SPIKE_B → PROBE_PATH_B]
- ... (all in ONE message)
- ```
-
- End your turn. Do NOT poll or monitor. Wait for completion notifications.
-
- #### Step 4: Ratchet each probe (on completion notifications)
-
- When a probe agent's notification arrives, run the standard ratchet (section 5.5) against its dedicated probe worktree:
-
- ```bash
- cd ${PROBE_PATH}
-
- # Identical health-check commands as standard tasks
- # Build → Test → Typecheck → Lint (stop on first failure)
- ```
-
- Record per-probe metrics:
-
- ```yaml
- probe_id: SPIKE_A
- worktree: .deepflow/worktrees/{spec}/probe-SPIKE_A
- branch: df/{spec}/probe-SPIKE_A
- ratchet_passed: true/false
- regressions: 0      # failing pre-existing tests
- coverage_delta: +3  # new lines covered (positive = better)
- files_changed: 4    # number of files touched
- commit: abc1234
- ```
-
- Wait until **all** probe notifications have arrived before proceeding to selection.
-
- #### Step 5: Machine-select winner
-
- No LLM evaluates another LLM's work. Apply the following ordered criteria to all probes that **passed** the ratchet:
-
- ```
- 1. Fewer regressions (lower is better — hard gate: any regression disqualifies)
- 2. Better coverage (higher delta is better)
- 3. Fewer files changed (lower is better — smaller blast radius)
-
- Tie-break: first probe to complete (chronological)
- ```
-
- If **no** probe passes the ratchet, all are failed probes. Log insights (step 7) and reset the spike tasks to `pending` for retry with debugger guidance.
-
- #### Step 6: Preserve ALL probe worktrees
-
- Do NOT delete losing probe worktrees. They are preserved for manual inspection and cross-cycle learning:
-
- ```bash
- # Winning probe: leave as-is, will be used as implementation base (step 8)
- # Losing probes: leave worktrees intact, mark branches with -failed suffix for clarity
- git branch -m "df/{spec}/probe-SPIKE_B" "df/{spec}/probe-SPIKE_B-failed"
- ```
-
- Record all probe paths in `.deepflow/checkpoint.json` under `"spike_probes"` so future `--continue` runs know they exist.
-
- #### Step 7: Log failed probe insights
-
- For every probe that failed the ratchet (or lost selection), write two entries to `.deepflow/auto-memory.yaml` in the **main** tree.
-
- **Entry 1 — `spike_insights` (detailed probe record):**
-
- ```yaml
- spike_insights:
-   - date: "YYYY-MM-DD"
-     spec: "{spec_name}"
-     spike_id: "SPIKE_B"
-     hypothesis: "{hypothesis text from PLAN.md}"
-     outcome: "failed"  # or "passed-but-lost"
-     failure_reason: "{first failed check and error summary}"
-     ratchet_metrics:
-       regressions: 2
-       coverage_delta: -1
-       files_changed: 7
-     worktree: ".deepflow/worktrees/{spec}/probe-SPIKE_B-failed"
-     branch: "df/{spec}/probe-SPIKE_B-failed"
-     edge_cases: []  # orchestrator may populate after manual review
- ```
-
- **Entry 2 — `probe_learnings` (cross-cycle memory, read by `/df:auto-cycle` on each cycle start):**
-
- ```yaml
- probe_learnings:
-   - spike: "SPIKE_B"
-     probe: "{probe branch suffix, e.g. probe-SPIKE_B}"
-     insight: "{one-sentence summary of what the probe revealed, derived from failure_reason}"
- ```
-
- If the file does not exist, create it. Initialize both `spike_insights:` and `probe_learnings:` as empty lists before appending. Preserve all existing keys when merging.
-
- #### Step 8: Promote winning probe
-
- Cherry-pick the winner's commit into the shared spec worktree so downstream implementation tasks see the winning approach:
-
- ```bash
- cd ${WORKTREE_PATH}  # shared worktree (not the probe sub-worktree)
- git cherry-pick ${WINNER_COMMIT}
- ```
-
- Then mark the winning spike task as `completed` and auto-unblock its dependents:
-
- ```
- TaskUpdate(taskId: native_id_SPIKE_WINNER, status: "completed")
- TaskUpdate(taskId: native_id_SPIKE_LOSERS, status: "pending")  # keep visible for audit
- ```
-
- Update PLAN.md:
- - Winning spike → `[x]` with commit hash and `[PROBE_WINNER]` tag
- - Losing spikes → `[~]` (skipped) with `[PROBE_FAILED: see auto-memory.yaml]` note
-
- Resume the standard execution loop (section 9) — implementation tasks blocked by the spike group are now unblocked.
+ Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothesis.
+
+ 1. **Baseline:** Record `BASELINE=$(git rev-parse HEAD)` in shared worktree
+ 2. **Sub-worktrees:** Per spike: `git worktree add -b df/{spec}--probe-{SPIKE_ID} .deepflow/worktrees/{spec}/probe-{SPIKE_ID} ${BASELINE}`
+ 3. **Spawn:** All probes in ONE message, each targeting its probe worktree. End turn.
+ 4. **Ratchet:** Per notification, run standard ratchet (5.5) in probe worktree. Record: ratchet_passed, regressions, coverage_delta, files_changed, commit
+ 5. **Select winner** (after ALL complete, no LLM judge):
+    - Disqualify any with regressions
+    - Rank: fewer regressions > higher coverage_delta > fewer files_changed > first to complete
+    - No passes → reset all to pending for retry with debugger
+ 6. **Preserve all worktrees.** Losers: rename branch + `-failed` suffix. Record in checkpoint.json under `"spike_probes"`
+ 7. **Log failed probes** to `.deepflow/auto-memory.yaml` (main tree):
+    ```yaml
+    spike_insights:
+      - date: "YYYY-MM-DD"
+        spec: "{spec_name}"
+        spike_id: "SPIKE_B"
+        hypothesis: "{from PLAN.md}"
+        outcome: "failed"  # or "passed-but-lost"
+        failure_reason: "{first failed check + error summary}"
+        ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
+        worktree: ".deepflow/worktrees/{spec}/probe-SPIKE_B-failed"
+        branch: "df/{spec}--probe-SPIKE_B-failed"
+    probe_learnings:  # read by /df:auto-cycle each start
+      - spike: "SPIKE_B"
+        probe: "probe-SPIKE_B"
+        insight: "{one-sentence summary from failure_reason}"
+    ```
+    Create file if missing. Preserve existing keys when merging.
+ 8. **Promote winner:** Cherry-pick into shared worktree. Winner → `[x] [PROBE_WINNER]`, losers → `[~] [PROBE_FAILED]`. Resume standard loop.
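The machine selection in step 5 can be sketched as a plain `sort` over per-probe metrics. The metric values below are hypothetical; real ones come from the per-probe ratchet records.

```shell
# Columns: probe regressions coverage_delta files_changed completion_order
metrics='SPIKE_A 0 3 4 2
SPIKE_B 1 5 2 1
SPIKE_C 0 3 6 3'

# Hard gate: any regression disqualifies ($2 must be 0). Then rank by
# coverage delta (desc), files changed (asc), completion order (asc).
winner=$(echo "$metrics" \
  | awk '$2 == 0' \
  | sort -k3,3nr -k4,4n -k5,5n \
  | head -n 1 | awk '{print $1}')
echo "$winner"
```

Here SPIKE_B is disqualified by its regression despite the best coverage, and SPIKE_A beats SPIKE_C on fewer files changed.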

  ---

@@ -444,143 +183,127 @@ Working directory: {worktree_absolute_path}
  All file operations MUST use this absolute path as base. Do NOT write files to the main project directory.
  Commit format: {commit_type}({spec}): {description}

- STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main. These are handled by the orchestrator and /df:verify.
+ STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main.
  ```

- **Standard Task (append after preamble):**
+ **Standard Task:**
  ```
  {task_id}: {description from PLAN.md}
- Files: {target files}
- Spec: {spec_name}
+ Files: {target files} Spec: {spec_name}
+ {Impact block from PLAN.md — include verbatim if present}
+
+ {Prior failure context — include ONLY if task was previously reverted. Read from .deepflow/auto-memory.yaml revert_history for this task_id:}
+ Previous attempts (DO NOT repeat these approaches):
+ - Cycle {N}: reverted — "{reason from revert_history}"
+ - Cycle {N}: reverted — "{reason from revert_history}"
+ {Omit this entire block if task has no revert history.}
+
+ CRITICAL: If Impact lists duplicates or callers, you MUST verify each one is consistent with your changes.
+ - [active] duplicates → consolidate into single source of truth (e.g., local generateYAML → use shared buildConfigData)
+ - [dead] duplicates → DELETE the dead code entirely. Dead code pollutes context and causes drift.

  Steps:
- 1. If the task involves external APIs/SDKs, run: chub search "<library>" --json → chub get <id> --lang <lang>
-    Use fetched docs as ground truth for API signatures. Annotate any gaps: chub annotate <id> "note"
-    Skip this step if chub is not installed or the task only touches internal code.
- 2. Implement the task
- 3. Commit as feat({spec}): {description}
+ 1. External APIs/SDKs → chub search "<library>" --json → chub get <id> --lang <lang> (skip if chub unavailable or internal code only)
+ 2. Read ALL files in Impact before implementing — understand the full picture
+ 3. Implement the task, updating all impacted files
+ 4. Commit as feat({spec}): {description}

- Your ONLY job is to write code and commit. The orchestrator will run health checks after you finish.
+ Your ONLY job is to write code and commit. Orchestrator runs health checks after.
  ```

- **Bootstrap Task (append after preamble):**
+ **Bootstrap Task:**
  ```
  BOOTSTRAP: Write tests for files in edit_scope
- Files: {edit_scope files from spec}
- Spec: {spec_name}
+ Files: {edit_scope files} Spec: {spec_name}

- Steps:
- 1. Write tests that cover the functionality of the files listed above
- 2. Do NOT change implementation files — tests only
- 3. Commit as test({spec}): bootstrap tests for edit_scope
-
- Your ONLY job is to write tests and commit. The orchestrator will run health checks after you finish.
+ Write tests covering listed files. Do NOT change implementation files.
+ Commit as test({spec}): bootstrap tests for edit_scope
  ```

- **Spike Task (append after preamble):**
+ **Spike Task:**
  ```
  {task_id} [SPIKE]: {hypothesis}
- Files: {target files}
- Spec: {spec_name}
+ Files: {target files} Spec: {spec_name}

- Steps:
- 1. Implement the minimal spike to validate the hypothesis
- 2. Commit as spike({spec}): {description}
+ {Prior failure context — include ONLY if this spike was previously reverted. Read from .deepflow/auto-memory.yaml revert_history + spike_insights for this task_id:}
+ Previous attempts (DO NOT repeat these approaches):
+ - Cycle {N}: reverted — "{reason}"
+ {Omit this entire block if no revert history.}

- Your ONLY job is to write code and commit. The orchestrator will run health checks to determine if the spike passes.
+ Implement minimal spike to validate hypothesis.
+ Commit as spike({spec}): {description}
  ```
492
236
 
493
- ### 7. FAILURE HANDLING
237
+ ### 8. COMPLETE SPECS
494
238
 
495
- When a task fails ratchet and is reverted:
239
+ When all tasks done for a `doing-*` spec:
240
+ 1. Run `/df:verify doing-{name}` via the Skill tool (`skill: "df:verify", args: "doing-{name}"`)
241
+ - Verify runs quality gates (L0-L4), merges worktree branch to main, cleans up worktree, renames spec `doing-*` → `done-*`, and extracts decisions
242
+ - If verify fails (adds fix tasks): stop here — `/df:execute --continue` will pick up the fix tasks
243
+ - If verify passes: proceed to step 2
244
+ 2. Remove spec's ENTIRE section from PLAN.md (header, tasks, summaries, fix tasks, separators)
245
+ 3. Recalculate Summary table at top of PLAN.md
496
246
 
497
- `TaskUpdate(taskId: native_id, status: "pending")` — keeps task visible for retry; dependents remain blocked.
247
+ ---
498
248
 
499
- On repeated failure: spawn `Task(subagent_type="reasoner", model={model from debugger frontmatter, default "sonnet"}, prompt="Debug failure: {ratchet output}")`.
249
+ ## Usage
500
250
 
501
- Leave worktree intact, keep checkpoint.json, output: worktree path/branch, `cd {worktree_path}` to investigate, `/df:execute --continue` to resume, `/df:execute --fresh` to discard.
251
+ ```
252
+ /df:execute # Execute all ready tasks
253
+ /df:execute T1 T2 # Specific tasks only
254
+ /df:execute --continue # Resume from checkpoint
255
+ /df:execute --fresh # Ignore checkpoint
256
+ /df:execute --dry-run # Show plan only
257
+ ```
502
258
 
- ### 8. COMPLETE SPECS
+ ## Skills & Agents
 
- When all tasks done for a `doing-*` spec:
- 1. Embed history in spec: `## Completed` section with task list and commit hashes
- 2. Rename: `doing-upload.md` → `done-upload.md`
- 3. Extract decisions from done-* spec: Read the `done-{name}.md` file. Model-extract architectural decisions — look for explicit choices (→ `[APPROACH]`), unvalidated assumptions (→ `[ASSUMPTION]`), and "for now" decisions (→ `[PROVISIONAL]`). Append as a new section to **main tree** `.deepflow/decisions.md`:
- ```
- ### {YYYY-MM-DD} — {spec-name}
- - [TAG] decision text — rationale
- ```
- After successful append, delete `specs/done-{name}.md`. If write fails, preserve the file.
- 4. Remove the spec's ENTIRE section from PLAN.md:
-    - The `### doing-{spec}` header
-    - All task entries (`- [x] **T{n}**: ...` and their sub-items)
-    - Any `## Execution Summary` block for that spec
-    - Any `### Fix Tasks` sub-section for that spec
-    - Separators (`---`) between removed sections
- 5. Recalculate the Summary table at the top of PLAN.md (update counts for completed/pending)
+ - Skill: `atomic-commits` — Clean commit protocol
+ - Skill: `browse-fetch` — Fetch live web pages and external API docs via browser before coding
 
- ### 9. ITERATE (Notification-Driven)
+ | Agent | subagent_type | Purpose |
+ |-------|---------------|---------|
+ | Implementation | `general-purpose` | Task implementation |
+ | Debugger | `reasoner` | Debugging failures |
 
- After spawning wave agents, your turn ENDS. Completion notifications drive the loop.
+ **Model routing:** Use `model:` from command/agent/skill frontmatter. Default: `sonnet`.
 
- **Per notification:**
- 1. Run ratchet check for the completed agent (see section 5.5)
- 2. Ratchet passed → `TaskUpdate(taskId: native_id, status: "completed")` — auto-unblocks dependent tasks
- 3. Ratchet failed → revert commit, `TaskUpdate(taskId: native_id, status: "pending")`
- 4. Update PLAN.md: `[ ]` → `[x]` + commit hash (on pass) or note revert (on fail)
- 5. Report: "✓ T1: ratchet passed (abc123)" or "✗ T1: ratchet failed, reverted"
- 6. If NOT all wave agents done → end turn, wait
- 7. If ALL wave agents done → use TaskList to find newly unblocked tasks, check context, spawn next wave or finish
+ **Checkpoint schema:** `.deepflow/checkpoint.json` in worktree:
+ ```json
+ {"completed_tasks": ["T1","T2"], "current_wave": 2, "worktree_path": ".deepflow/worktrees/upload", "worktree_branch": "df/upload"}
+ ```
+
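
A minimal shell sketch of the checkpoint round-trip: write a file matching the schema above, then read back `worktree_path` the way a `--continue` run would locate the worktree. The `python3` one-liner is illustrative glue, not deepflow code.

```shell
# Sketch only: field names come from the checkpoint schema; paths are illustrative.
mkdir -p .deepflow
cat > .deepflow/checkpoint.json <<'EOF'
{"completed_tasks": ["T1","T2"], "current_wave": 2, "worktree_path": ".deepflow/worktrees/upload", "worktree_branch": "df/upload"}
EOF
# A --continue run resumes inside worktree_path:
wt=$(python3 -c "import json; print(json.load(open('.deepflow/checkpoint.json'))['worktree_path'])")
echo "resume in $wt"
```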
+ ---
 
- **Between waves:** Check context %. If ≥50%, checkpoint and exit.
+ ## Failure Handling
 
- **Repeat** until: all done, all blocked, or context ≥50% (checkpoint).
+ When a task fails the ratchet and is reverted:
+ - `TaskUpdate(taskId: native_id, status: "pending")` — dependents remain blocked
+ - Repeated failure → spawn `Task(subagent_type="reasoner", prompt="Debug failure: {ratchet output}")`
+ - Leave worktree intact, keep checkpoint.json
+ - Output: worktree path/branch, `cd {path}` to investigate, `--continue` to resume, `--fresh` to discard
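
The revert step above can be sketched in a throwaway repo; the commit message, file name, and identity flags are invented for illustration, not deepflow's actual output:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git -c user.email=df@example -c user.name=df commit -q --allow-empty -m "base"
echo "bad change" > change.txt && git add change.txt
git -c user.email=df@example -c user.name=df commit -q -m "T1: failing task"
# Ratchet failed: undo the task's commit, keeping worktree and history intact.
git -c user.email=df@example -c user.name=df revert --no-edit HEAD >/dev/null
test ! -f change.txt && echo "reverted"
```

Because `git revert` adds a new commit rather than rewriting history, the failed attempt stays inspectable via `cd {path}`.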
 
  ## Rules
 
  | Rule | Detail |
  |------|--------|
- | Zero test files → bootstrap first | Section 1.7; bootstrap is the cycle's sole task when snapshot is empty |
+ | Zero test files → bootstrap first | Bootstrap is cycle's sole task when snapshot empty |
  | 1 task = 1 agent = 1 commit | `atomic-commits` skill |
  | 1 file = 1 writer | Sequential if conflict |
  | Agent writes code, orchestrator measures | Ratchet is the judge |
  | No LLM evaluates LLM work | Health checks only |
- | ≥2 spikes for same problem → parallel probes | Section 5.7; never run competing spikes sequentially |
- | All probe worktrees preserved | Losing probes renamed with `-failed` suffix; never deleted |
- | Machine-selected winner | Fewer regressions > better coverage > fewer files changed; no LLM judge |
- | Failed probe insights logged | `.deepflow/auto-memory.yaml` in main tree; persists across cycles |
- | Winner cherry-picked to shared worktree | Downstream tasks see winning approach via shared worktree |
- | External APIs → chub first | Agents fetch curated docs before implementing external API calls; skip if chub unavailable |
+ | ≥2 spikes same problem → parallel probes | Never run competing spikes sequentially |
+ | All probe worktrees preserved | Losers renamed `-failed`; never deleted |
+ | Machine-selected winner | Fewer regressions > better coverage > fewer files changed; no LLM judge |
+ | External APIs → chub first | Skip if unavailable |
 
  ## Example
 
- ### No-Tests Bootstrap
-
- ```
- /df:execute (context: 8%)
-
- Loading PLAN.md... T1 ready, T2/T3 blocked by T1
- Ratchet snapshot: 0 pre-existing test files
- Bootstrap needed: no pre-existing test files found.
-
- Spawning bootstrap agent for edit_scope...
- [Bootstrap agent completed]
- Running ratchet: build ✓ | tests ✓ (12 new tests pass)
- ✓ Bootstrap: ratchet passed (boo1234)
- Re-taking ratchet snapshot: 3 test files
-
- bootstrap: completed — cycle's sole task was test bootstrap
- Next: Run /df:auto-cycle again to execute T1
- ```
-
- ### Standard Execution
-
  ```
  /df:execute (context: 12%)
 
  Loading PLAN.md... T1 ready, T2/T3 blocked by T1
  Ratchet snapshot: 24 pre-existing test files
- Registering native tasks: TaskCreate T1/T2/T3, TaskUpdate(T2 blockedBy T1), TaskUpdate(T3 blockedBy T1)
 
  Wave 1: TaskUpdate(T1, in_progress)
  [Agent "T1" completed]
@@ -589,43 +312,13 @@ Wave 1: TaskUpdate(T1, in_progress)
  TaskUpdate(T1, completed) → auto-unblocks T2, T3
 
  Wave 2: TaskUpdate(T2/T3, in_progress)
- [Agent "T2" completed]
- Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
- ✓ T2: ratchet passed (def5678)
- [Agent "T3" completed]
- Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
- ✓ T3: ratchet passed (ghi9012)
-
- Context: 35% — doing-upload → done-upload. Complete: 3/3
-
- Next: Run /df:verify to verify specs and merge to main
- ```
-
- ### Ratchet Failure (Regression Detected)
-
- ```
- /df:execute (context: 10%)
-
- Wave 1: TaskUpdate(T1, in_progress)
- [Agent "T1" completed]
- Running ratchet: build ✓ | tests ✗ (2 failed of 24)
- ✗ T1: ratchet failed, reverted
- TaskUpdate(T1, pending)
-
- Spawning debugger for T1...
- [Debugger completed]
- Re-running T1 with fix guidance...
-
- [Agent "T1 retry" completed]
- Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
- ✓ T1: ratchet passed (abc1234)
- ```
-
- ### With Checkpoint
-
- ```
- Wave 1 complete (context: 52%)
- Checkpoint saved.
-
- Next: Run /df:execute --continue to resume execution
+ [Agent "T2" completed] ✓ T2: ratchet passed (def5678)
+ [Agent "T3" completed] ✓ T3: ratchet passed (ghi9012)
+
+ Context: 35% — All tasks done for doing-upload.
+ Running /df:verify doing-upload...
+ ✓ L0 | ✓ L1 (3/3 files) | ⚠ L2 (no coverage tool) | ✓ L4 (24 tests)
+ ✓ Merged df/upload to main
+ Spec complete: doing-upload → done-upload
+ Complete: 3/3
  ```