deepflow 0.1.78 → 0.1.79

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "deepflow",
3
- "version": "0.1.78",
3
+ "version": "0.1.79",
4
4
  "description": "Doing reveals what thinking can't predict — spec-driven iterative development for Claude Code",
5
5
  "keywords": [
6
6
  "claude",
@@ -169,10 +169,10 @@ _Last updated: {YYYY-MM-DDTHH:MM:SSZ}_
169
169
 
170
170
  ## Cycle Log
171
171
 
172
- | Cycle | Task | Status | Commit / Revert | Reason | Timestamp |
173
- |-------|------|--------|-----------------|--------|-----------|
174
- | 1 | T1 | passed | abc1234 | — | 2025-01-15T10:00:00Z |
175
- | 2 | T2 | failed | reverted | tests failed: 2 of 24 | 2025-01-15T10:05:00Z |
172
+ | Cycle | Task | Status | Commit / Revert | Delta | Reason | Timestamp |
173
+ |-------|------|--------|-----------------|-------|--------|-----------|
174
+ | 1 | T1 | passed | abc1234 | tests: 24→24, build: ok | — | 2025-01-15T10:00:00Z |
175
+ | 2 | T2 | failed | reverted | tests: 24→22 (−2) | tests failed: 2 of 24 | 2025-01-15T10:05:00Z |
176
176
 
177
177
  ## Probe Results
178
178
 
@@ -202,13 +202,14 @@ _(tasks that were reverted with their failure reasons)_
202
202
  **Cycle Log — append one row:**
203
203
 
204
204
  ```
205
- | {cycle_number} | {task_id} | {status} | {commit_hash or "reverted"} | {reason or "—"} | {YYYY-MM-DDTHH:MM:SSZ} |
205
+ | {cycle_number} | {task_id} | {status} | {commit_hash or "reverted"} | {delta} | {reason or "—"} | {YYYY-MM-DDTHH:MM:SSZ} |
206
206
  ```
207
207
 
208
208
  - `cycle_number`: total number of cycles executed so far (count existing data rows in the Cycle Log + 1)
209
209
  - `task_id`: task ID from PLAN.md, or `BOOTSTRAP` for bootstrap cycles
210
210
  - `status`: `passed` (ratchet passed), `failed` (ratchet failed, reverted), or `skipped` (task was already done)
211
211
  - `commit_hash`: short hash from the commit, or `reverted` if ratchet failed
212
+ - `delta`: ratchet metric change from this cycle. Format: `tests: {before}→{after}, build: ok/fail`. Include coverage delta if available (e.g., `cov: 80%→82% (+2%)`). On revert, show the regression that triggered it (e.g., `tests: 24→22 (−2)`)
212
213
  - `reason`: failure reason from ratchet output (e.g., `"tests failed: 2 of 24"`), or `—` if passed
213
214
 
214
215
  **Summary table — recalculate from Cycle Log rows:**
@@ -259,10 +260,12 @@ done_count = number of [x] tasks
259
260
  pending_count = number of [ ] tasks
260
261
  ```
261
262
 
262
- **If ALL tasks are `[x]` (pending_count == 0):**
263
+ **Note:** Per-spec verification and merge to main happen automatically in `/df:execute` (step 8) when all tasks for a spec complete. No separate verify call is needed here.
264
+
265
+ **If no `[ ]` tasks remain (pending_count == 0):**
263
266
  ```
264
- Run /df:verify via Skill tool (skill: "df:verify", no args)
265
- Report: "All tasks complete. Verification triggered."
267
+ Report: "All specs verified and merged. Workflow complete."
268
+ Exit
266
269
  ```
267
270
 
268
271
  **If tasks remain (pending_count > 0):**
@@ -327,17 +330,14 @@ Updated .deepflow/auto-report.md:
327
330
  Cycle complete. 1 task remaining.
328
331
  ```
329
332
 
330
- ### All Tasks Done (verify triggered)
333
+ ### All Tasks Done (workflow complete)
331
334
 
332
335
  ```
333
336
  /df:auto-cycle
334
337
 
335
- Loading PLAN.md... 3 tasks total, 3 done, 0 pending
338
+ Loading PLAN.md... 0 tasks total, 0 done, 0 pending
336
339
 
337
- All tasks complete. Verification triggered.
338
- Running: /df:verify
339
- ✓ L0 | ✓ L1 | ⚠ L2 (no coverage tool) | ✓ L4
340
- ✓ Merged df/upload to main
340
+ All specs verified and merged. Workflow complete.
341
341
  ```
342
342
 
343
343
  ### No Work Remaining (idempotent)
@@ -345,10 +345,9 @@ Running: /df:verify
345
345
  ```
346
346
  /df:auto-cycle
347
347
 
348
- Loading PLAN.md... 3 tasks total, 3 done, 0 pending
349
- Verification already complete (no doing-* specs found).
348
+ Loading PLAN.md... 0 tasks total, 0 done, 0 pending
350
349
 
351
- Nothing to do. Cycle complete. 0 tasks remaining.
350
+ All specs verified and merged. Workflow complete.
352
351
  ```
353
352
 
354
353
  ### Circuit Breaker Tripped
@@ -8,93 +8,44 @@ You are a coordinator. Spawn agents, run ratchet checks, update PLAN.md. Never i
8
8
 
9
9
  **ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, run ratchet health checks after each agent completes, update PLAN.md, write `.deepflow/decisions.md` in the main tree
10
10
 
11
- ---
11
+ ## Core Loop (Notification-Driven)
12
12
 
13
- ## Purpose
14
- Implement tasks from PLAN.md with parallel agents, atomic commits, ratchet-driven quality gates, and context-efficient execution.
13
+ Each task = one background agent. Completion notifications drive the loop.
14
+
15
+ **NEVER use TaskOutput** — returns full transcripts (100KB+) that explode context.
15
16
 
16
- ## Usage
17
17
  ```
18
- /df:execute # Execute all ready tasks
19
- /df:execute T1 T2 # Specific tasks only
20
- /df:execute --continue # Resume from checkpoint
21
- /df:execute --fresh # Ignore checkpoint
22
- /df:execute --dry-run # Show plan only
18
+ 1. Spawn ALL wave agents with run_in_background=true in ONE message
19
+ 2. STOP. End your turn. Do NOT poll or monitor.
20
+ 3. On EACH notification:
21
+ a. Run ratchet check (section 5.5)
22
+ b. Passed → TaskUpdate(status: "completed"), update PLAN.md [x] + commit hash
23
+ c. Failed → git revert HEAD --no-edit, TaskUpdate(status: "pending")
24
+ d. Report ONE line: "✓ T1: ratchet passed (abc123)" or "✗ T1: ratchet failed, reverted"
25
+ e. NOT all done → end turn, wait | ALL done → next wave or finish
26
+ 4. Between waves: check context %. If ≥50% → checkpoint and exit.
27
+ 5. Repeat until: all done, all blocked, or context ≥50%.
23
28
  ```
24
29
 
25
- ## Skills & Agents
26
- - Skill: `atomic-commits` — Clean commit protocol
27
- - Skill: `context-hub` — Fetch external API docs before coding (when task involves external libraries)
30
+ ## Context Threshold
28
31
 
29
- **Use Task tool to spawn agents:**
30
- | Agent | subagent_type | Purpose |
31
- |-------|---------------|---------|
32
- | Implementation | `general-purpose` | Task implementation |
33
- | Debugger | `reasoner` | Debugging failures |
34
-
35
- **Model routing from frontmatter:**
36
- The model for each agent is determined by the `model:` field in the command/agent/skill frontmatter being invoked. The orchestrator reads the relevant frontmatter to determine which model to pass to `Task()`. If no `model:` field is present in the frontmatter, default to `sonnet`.
37
-
38
- ## Context-Aware Execution
39
-
40
- Statusline writes to `.deepflow/context.json`: `{"percentage": 45}`
32
+ Statusline writes `.deepflow/context.json`: `{"percentage": 45}`
41
33
 
42
34
  | Context % | Action |
43
35
  |-----------|--------|
44
36
  | < 50% | Full parallelism (up to 5 agents) |
45
37
  | ≥ 50% | Wait for running agents, checkpoint, exit |
46
38
 
47
- ## Agent Protocol
48
-
49
- Each task = one background agent. Use agent completion notifications as the feedback loop.
50
-
51
- **NEVER use TaskOutput** — returns full agent transcripts (100KB+) that explode context.
52
-
53
- ### Notification-Driven Execution
54
-
55
- ```
56
- 1. Spawn ALL wave agents with run_in_background=true in ONE message
57
- 2. STOP. End your turn. Do NOT run Bash monitors or poll for results.
58
- 3. Wait for "Agent X completed" notifications (they arrive automatically)
59
- 4. On EACH notification:
60
- a. Run ratchet check (health checks on the worktree)
61
- b. Report: "✓ T1: ratchet passed (abc123)" or "✗ T1: ratchet failed, reverted"
62
- c. Update PLAN.md for that task
63
- d. Check: all wave agents done?
64
- - No → end turn, wait for next notification
65
- - Yes → proceed to next wave or write final summary
66
- ```
67
-
68
- After spawning, your turn ENDS. Per notification: run ratchet, output ONE line, update PLAN.md. Write full summary only after ALL wave agents complete.
69
-
70
- ## Checkpoint & Resume
71
-
72
- **File:** `.deepflow/checkpoint.json` — stored in WORKTREE directory, not main.
73
-
74
- **Schema:**
75
- ```json
76
- {
77
- "completed_tasks": ["T1", "T2"],
78
- "current_wave": 2,
79
- "worktree_path": ".deepflow/worktrees/upload",
80
- "worktree_branch": "df/upload"
81
- }
82
- ```
83
-
84
- **On checkpoint:** Complete wave → update PLAN.md → save to worktree → exit.
85
- **Resume:** `--continue` loads checkpoint, verifies worktree, skips completed tasks.
39
+ ---
86
40
 
87
41
  ## Behavior
88
42
 
89
43
  ### 1. CHECK CHECKPOINT
90
44
 
91
45
  ```
92
- --continue → Load checkpoint
93
- If worktree_path exists:
94
- Verify worktree still exists on disk
95
- → If missing: Error "Worktree deleted. Use --fresh"
96
- → If exists: Use it, skip worktree creation
97
- → Resume execution with completed tasks
46
+ --continue → Load .deepflow/checkpoint.json from worktree
47
+ Verify worktree exists on disk (else error: "Use --fresh")
48
+ Skip completed tasks, resume execution
98
49
  --fresh → Delete checkpoint, start fresh
99
50
  checkpoint exists → Prompt: "Resume? (y/n)"
100
51
  else → Start fresh
@@ -102,88 +53,29 @@ else → Start fresh
102
53
 
103
54
  ### 1.5. CREATE WORKTREE
104
55
 
105
- Before spawning any agents, create an isolated worktree:
106
-
107
- ```
108
- # Check main is clean (ignore untracked)
109
- git diff --quiet HEAD || Error: "Main has uncommitted changes. Commit or stash first."
110
-
111
- # Generate paths
112
- SPEC_NAME=$(basename spec/doing-*.md .md | sed 's/doing-//')
113
- BRANCH_NAME="df/${SPEC_NAME}"
114
- WORKTREE_PATH=".deepflow/worktrees/${SPEC_NAME}"
115
-
116
- # Create worktree (or reuse existing)
117
- if [ -d "${WORKTREE_PATH}" ]; then
118
- echo "Reusing existing worktree"
119
- else
120
- git worktree add -b "${BRANCH_NAME}" "${WORKTREE_PATH}"
121
- fi
122
- ```
123
-
124
- **Existing worktree:** Reuse it (same spec = same worktree).
125
-
126
- **--fresh flag:** Deletes existing worktree and creates new one.
56
+ Require clean HEAD (`git diff --quiet`). Derive SPEC_NAME from `specs/doing-*.md`.
57
+ Create worktree: `.deepflow/worktrees/{spec}` on branch `df/{spec}`.
58
+ Reuse if exists. `--fresh` deletes first.
127
59
 
128
60
  ### 1.6. RATCHET SNAPSHOT
129
61
 
130
- Before spawning agents, snapshot pre-existing test files:
62
+ Snapshot pre-existing test files in worktree — only these count for ratchet (agent-created tests excluded):
131
63
 
132
64
  ```bash
133
65
  cd ${WORKTREE_PATH}
134
-
135
- # Snapshot pre-existing test files (only these count for ratchet)
136
66
  git ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' \
137
67
  > .deepflow/auto-snapshot.txt
138
-
139
- echo "Ratchet snapshot: $(wc -l < .deepflow/auto-snapshot.txt) pre-existing test files"
140
68
  ```
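To sanity-check the pattern, one can run it against sample paths (the filenames below are illustrative, not from any real repo):

```bash
# Which of these sample paths count as pre-existing test files?
MATCHES=$(printf '%s\n' src/app.ts src/app.test.ts tests/helper.ts lib/util_test.go |
  grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/')
echo "${MATCHES}"
```

Only `src/app.test.ts`, `tests/helper.ts`, and `lib/util_test.go` match; plain `src/app.ts` is excluded.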
141
69
 
142
- **Only pre-existing test files are used for ratchet evaluation.** New test files created by agents during implementation don't influence the pass/fail decision. This prevents agents from gaming the ratchet by writing tests that pass trivially.
143
-
144
70
  ### 1.7. NO-TESTS BOOTSTRAP
145
71
 
146
- After the ratchet snapshot, check if zero test files were found:
147
-
148
- ```bash
149
- TEST_COUNT=$(wc -l < .deepflow/auto-snapshot.txt | tr -d ' ')
150
-
151
- if [ "${TEST_COUNT}" = "0" ]; then
152
- echo "Bootstrap needed: no pre-existing test files found."
153
- BOOTSTRAP_NEEDED=true
154
- else
155
- BOOTSTRAP_NEEDED=false
156
- fi
157
- ```
158
-
159
- **If `BOOTSTRAP_NEEDED=true`:**
160
-
161
- 1. **Inject a bootstrap task** as the FIRST action before any regular PLAN.md task is executed:
162
- - Bootstrap task description: "Write tests for files in edit_scope"
163
- - Read `edit_scope` from `specs/doing-*.md` to know which files need tests
164
- - Spawn ONE dedicated bootstrap agent using the Bootstrap Task prompt (section 6)
72
+ If snapshot has zero test files:
165
73
 
166
- 2. **Bootstrap agent behavior:**
167
- - Write tests covering the files listed in `edit_scope`
168
- - Commit as `test({spec}): bootstrap tests for edit_scope`
169
- - The bootstrap agent's ONLY job is writing tests — no implementation changes
74
+ 1. Spawn ONE bootstrap agent (section 6 Bootstrap Task) to write tests for `edit_scope` files
75
+ 2. On ratchet pass: re-snapshot, report `"bootstrap: completed"`, end cycle (no PLAN.md tasks this cycle)
76
+ 3. On ratchet fail: revert, halt with "Bootstrap failed — manual intervention required"
170
77
 
171
- 3. **After bootstrap agent completes:**
172
- - Run ratchet health checks (build must pass; test suite must not error out)
173
- - If ratchet passes: re-take the ratchet snapshot so subsequent tasks use the new tests as baseline:
174
- ```bash
175
- cd ${WORKTREE_PATH}
176
- git ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' \
177
- > .deepflow/auto-snapshot.txt
178
- echo "Post-bootstrap snapshot: $(wc -l < .deepflow/auto-snapshot.txt) test files"
179
- ```
180
- - If ratchet fails: revert bootstrap commit, log error, halt and report "Bootstrap failed — manual intervention required"
181
-
182
- 4. **Signal to caller:** After bootstrap completes successfully, report `"bootstrap: completed"` in the cycle summary. This cycle's sole output is the test bootstrap — no regular PLAN.md task is executed this cycle.
183
-
184
- 5. **Subsequent cycles:** The updated `.deepflow/auto-snapshot.txt` now contains the bootstrapped test files. All subsequent ratchet checks use these as the baseline.
185
-
186
- **If `BOOTSTRAP_NEEDED=false`:** Proceed normally to section 2.
78
+ Subsequent cycles use bootstrapped tests as ratchet baseline.
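The zero-test detection behind this step can be sketched as follows (the empty snapshot file below simulates a repo with no tests):

```bash
# Sketch only: decide whether a bootstrap cycle is needed
mkdir -p .deepflow
: > .deepflow/auto-snapshot.txt                       # simulate: snapshot found no test files
TEST_COUNT=$(wc -l < .deepflow/auto-snapshot.txt | tr -d ' ')
if [ "${TEST_COUNT}" = "0" ]; then
  BOOTSTRAP_NEEDED=true
  echo "Bootstrap needed: no pre-existing test files found."
else
  BOOTSTRAP_NEEDED=false
fi
```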
187
79
 
188
80
  ### 2. LOAD PLAN
189
81
 
@@ -194,7 +86,7 @@ If missing: "No PLAN.md found. Run /df:plan first."
194
86
 
195
87
  ### 2.5. REGISTER NATIVE TASKS
196
88
 
197
- For each `[ ]` task in PLAN.md: `TaskCreate(subject: "{task_id}: {description}", activeForm: "{gerund}", description: full block)`. Store task_id → native ID mapping. Then set dependencies: `TaskUpdate(addBlockedBy: [...])` for each "Blocked by:" entry. On `--continue`: only register remaining `[ ]` items.
89
+ For each `[ ]` task in PLAN.md: `TaskCreate(subject: "{task_id}: {description}", activeForm: "{gerund}", description: full block)`. Store task_id → native ID mapping. Set dependencies via `TaskUpdate(addBlockedBy: [...])`. On `--continue`: only register remaining `[ ]` items.
198
90
 
199
91
  ### 3. CHECK FOR UNPLANNED SPECS
200
92
 
@@ -202,237 +94,77 @@ Warn if `specs/*.md` (excluding doing-/done-) exist. Non-blocking.
202
94
 
203
95
  ### 4. IDENTIFY READY TASKS
204
96
 
205
- Use TaskList to find ready tasks:
206
-
207
- ```
208
- Ready = TaskList results where:
209
- - status: "pending"
210
- - blockedBy: empty (auto-unblocked by native dependency system)
211
- ```
97
+ Ready = TaskList where status: "pending" AND blockedBy: empty.
212
98
 
213
99
  ### 5. SPAWN AGENTS
214
100
 
215
101
  Context ≥50%: checkpoint and exit.
216
102
 
217
- **Before spawning each agent**, mark its native task as in_progress:
218
- ```
219
- TaskUpdate(taskId: native_id, status: "in_progress")
220
- ```
221
- This activates the UI spinner showing the task's activeForm (e.g. "Creating upload endpoint").
103
+ Before spawning: `TaskUpdate(taskId: native_id, status: "in_progress")` activates UI spinner.
222
104
 
223
- **NEVER use `isolation: "worktree"` on Task tool calls.** Deepflow manages a shared worktree per spec (`.deepflow/worktrees/{spec}/`) so wave 2 agents see wave 1 commits. Claude Code's native isolation creates separate per-agent worktrees (`.claude/worktrees/`) where agents can't see each other's work.
105
+ **NEVER use `isolation: "worktree"` on Task calls.** Deepflow manages a shared worktree so wave 2 sees wave 1 commits.
224
106
 
225
- **Spawn ALL ready tasks in ONE message** with multiple Task tool calls (true parallelism). Same-file conflicts: spawn sequentially.
107
+ **Spawn ALL ready tasks in ONE message.** Same-file conflicts: spawn sequentially.
226
108
 
227
- **Multiple [SPIKE] tasks for the same problem:** When PLAN.md contains two or more `[SPIKE]` tasks grouped by the same "Blocked by:" target or identical problem description, do NOT run them sequentially. Instead, follow the **Parallel Spike Probes** protocol in section 5.7 before spawning any implementation tasks that depend on the spike outcome.
109
+ **≥2 [SPIKE] tasks for same problem:** Follow Parallel Spike Probes (section 5.7).
228
110
 
229
111
  ### 5.5. RATCHET CHECK
230
112
 
231
- After each agent completes (notification received), the orchestrator runs health checks on the worktree.
113
+ After each agent completes, run health checks in the worktree.
232
114
 
233
- **Step 1: Detect commands** (same auto-detection as /df:verify):
115
+ **Auto-detect commands:**
234
116
 
235
117
  | File | Build | Test | Typecheck | Lint |
236
118
  |------|-------|------|-----------|------|
237
- | `package.json` | `npm run build` (if scripts.build) | `npm test` (if scripts.test not placeholder) | `npx tsc --noEmit` (if tsconfig.json) | `npm run lint` (if scripts.lint) |
238
- | `pyproject.toml` | — | `pytest` | `mypy .` (if mypy in deps) | `ruff check .` (if ruff in deps) |
239
- | `Cargo.toml` | `cargo build` | `cargo test` | — | `cargo clippy` (if installed) |
119
+ | `package.json` | `npm run build` | `npm test` | `npx tsc --noEmit` | `npm run lint` |
120
+ | `pyproject.toml` | — | `pytest` | `mypy .` | `ruff check .` |
121
+ | `Cargo.toml` | `cargo build` | `cargo test` | — | `cargo clippy` |
240
122
  | `go.mod` | `go build ./...` | `go test ./...` | — | `go vet ./...` |
241
123
 
242
- **Step 2: Run health checks** in the worktree:
243
- ```bash
244
- cd ${WORKTREE_PATH}
124
+ Run Build → Test → Typecheck → Lint (stop on first failure).
245
125
 
246
- # Run each detected command
247
- # Build → Test → Typecheck → Lint (stop on first failure)
248
- ```
126
+ **Edit scope validation** (if spec declares `edit_scope`): check `git diff HEAD~1 --name-only` against allowed globs. Violations → `git revert HEAD --no-edit`, report "Edit scope violation: {files}".
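The glob check can be sketched as below; the scope globs and changed files are hypothetical stand-ins for the spec's `edit_scope` and the output of `git diff HEAD~1 --name-only`:

```bash
EDIT_SCOPE="src/*.ts docs/*.md"                 # hypothetical globs from the spec
CHANGED="src/upload.ts scripts/rogue.sh"        # hypothetical changed files
VIOLATIONS=""
for file in ${CHANGED}; do
  ALLOWED=false
  for pattern in ${EDIT_SCOPE}; do
    case "${file}" in (${pattern}) ALLOWED=true ;; esac   # shell glob match
  done
  [ "${ALLOWED}" = true ] || VIOLATIONS="${VIOLATIONS} ${file}"
done
[ -z "${VIOLATIONS}" ] || echo "Edit scope violation:${VIOLATIONS}"
```

Here `src/upload.ts` is inside scope, while `scripts/rogue.sh` is flagged and would trigger the revert.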
249
127
 
250
- **Step 3: Validate edit scope** (if spec declares `edit_scope`):
251
- ```bash
252
- # Get files changed by the agent
253
- CHANGED=$(git diff HEAD~1 --name-only)
254
-
255
- # Load edit_scope from spec (files/globs)
256
- EDIT_SCOPE=$(grep 'edit_scope:' specs/doing-*.md | sed 's/edit_scope://' | tr ',' '\n' | xargs)
257
-
258
- # Check each changed file against allowed scope
259
- for file in ${CHANGED}; do
260
- ALLOWED=false
261
- for pattern in ${EDIT_SCOPE}; do
262
- # Match file against glob pattern
263
- [[ "${file}" == ${pattern} ]] && ALLOWED=true
264
- done
265
- ${ALLOWED} || VIOLATIONS+=("${file}")
266
- done
267
- ```
268
-
269
- - Violations found → revert: `git revert HEAD --no-edit`, report "✗ Edit scope violation: {files}"
270
- - No violations → continue to health checks
271
-
272
- **Step 4: Evaluate**:
273
- - All checks pass AND no scope violations → task succeeds, commit stands
274
- - Any check fails → regression detected → revert: `git revert HEAD --no-edit`
128
+ **Impact completeness check** (if task has Impact block in PLAN.md):
129
+ Compare `git diff HEAD~1 --name-only` against Impact callers/duplicates list.
130
+ File listed but not modified → **advisory warning**: "Impact gap: {file} listed as {caller|duplicate} but not modified — verify manually". Not auto-revert (callers sometimes don't need changes), but flags the risk.
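A sketch of the advisory comparison, with hypothetical stand-ins for the Impact list and the diff output:

```bash
IMPACT_FILES="src/api.ts src/caller.ts"   # hypothetical Impact callers/duplicates
CHANGED="src/api.ts"                      # hypothetical: git diff HEAD~1 --name-only
GAPS=""
for file in ${IMPACT_FILES}; do
  case " ${CHANGED} " in
    *" ${file} "*) ;;                     # modified, nothing to flag
    *) GAPS="${GAPS} ${file}"
       echo "Impact gap: ${file} listed but not modified, verify manually" ;;
  esac
done
```

Only the listed-but-untouched `src/caller.ts` is flagged; no revert happens.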
275
131
 
276
- **Ratchet uses ONLY pre-existing test files** (from `.deepflow/auto-snapshot.txt`). If the agent added new test files that fail, those are excluded from evaluation — the agent's new tests don't influence the ratchet decision.
132
+ **Evaluate:** All pass + no violations → commit stands. Any failure → `git revert HEAD --no-edit`.
277
133
 
278
- **For spike tasks:** Same ratchet. If the spike's code passes pre-existing health checks, the spike passes. No LLM judges another LLM's work.
134
+ Ratchet uses ONLY pre-existing test files from `.deepflow/auto-snapshot.txt`.
279
135
 
280
136
  ### 5.7. PARALLEL SPIKE PROBES
281
137
 
282
- When two or more `[SPIKE]` tasks address the **same problem** (same "Blocked by:" target OR identical or near-identical hypothesis wording), treat them as a probe set and run this protocol instead of the standard single-agent flow.
283
-
284
- #### Detection
285
-
286
- ```
287
- Spike group = all [SPIKE] tasks where:
288
- - same "Blocked by:" value, OR
289
- - problem description is identical after stripping task ID prefix
290
- If group size ≥ 2 → enter parallel probe mode
291
- ```
292
-
293
- #### Step 1: Record baseline commit
294
-
295
- ```bash
296
- cd ${WORKTREE_PATH}
297
- BASELINE=$(git rev-parse HEAD)
298
- echo "Probe baseline: ${BASELINE}"
299
- ```
300
-
301
- All probes branch from this exact commit so they share the same ratchet baseline.
302
-
303
- #### Step 2: Create isolated sub-worktrees
304
-
305
- For each spike `{SPIKE_ID}` in the probe group:
306
-
307
- ```bash
308
- PROBE_BRANCH="df/${SPEC_NAME}/probe-${SPIKE_ID}"
309
- PROBE_PATH=".deepflow/worktrees/${SPEC_NAME}/probe-${SPIKE_ID}"
310
-
311
- git worktree add -b "${PROBE_BRANCH}" "${PROBE_PATH}" "${BASELINE}"
312
- echo "Created probe worktree: ${PROBE_PATH} (branch: ${PROBE_BRANCH})"
313
- ```
314
-
315
- #### Step 3: Spawn all probes in parallel
316
-
317
- Mark every spike task as `in_progress`, then spawn one agent per probe **in a single message** using the Spike Task prompt (section 6), with the probe's worktree path as its working directory.
318
-
319
- ```
320
- TaskUpdate(taskId: native_id_SPIKE_A, status: "in_progress")
321
- TaskUpdate(taskId: native_id_SPIKE_B, status: "in_progress")
322
- [spawn agent for SPIKE_A → PROBE_PATH_A]
323
- [spawn agent for SPIKE_B → PROBE_PATH_B]
324
- ... (all in ONE message)
325
- ```
326
-
327
- End your turn. Do NOT poll or monitor. Wait for completion notifications.
328
-
329
- #### Step 4: Ratchet each probe (on completion notifications)
330
-
331
- When a probe agent's notification arrives, run the standard ratchet (section 5.5) against its dedicated probe worktree:
332
-
333
- ```bash
334
- cd ${PROBE_PATH}
335
-
336
- # Identical health-check commands as standard tasks
337
- # Build → Test → Typecheck → Lint (stop on first failure)
338
- ```
339
-
340
- Record per-probe metrics:
341
-
342
- ```yaml
343
- probe_id: SPIKE_A
344
- worktree: .deepflow/worktrees/{spec}/probe-SPIKE_A
345
- branch: df/{spec}/probe-SPIKE_A
346
- ratchet_passed: true/false
347
- regressions: 0 # failing pre-existing tests
348
- coverage_delta: +3 # new lines covered (positive = better)
349
- files_changed: 4 # number of files touched
350
- commit: abc1234
351
- ```
352
-
353
- Wait until **all** probe notifications have arrived before proceeding to selection.
354
-
355
- #### Step 5: Machine-select winner
356
-
357
- No LLM evaluates another LLM's work. Apply the following ordered criteria to all probes that **passed** the ratchet:
358
-
359
- ```
360
- 1. Fewer regressions (lower is better — hard gate: any regression disqualifies)
361
- 2. Better coverage (higher delta is better)
362
- 3. Fewer files changed (lower is better — smaller blast radius)
363
-
364
- Tie-break: first probe to complete (chronological)
365
- ```
366
-
367
- If **no** probe passes the ratchet, all are failed probes. Log insights (step 7) and reset the spike tasks to `pending` for retry with debugger guidance.
368
-
369
- #### Step 6: Preserve ALL probe worktrees
370
-
371
- Do NOT delete losing probe worktrees. They are preserved for manual inspection and cross-cycle learning:
372
-
373
- ```bash
374
- # Winning probe: leave as-is, will be used as implementation base (step 8)
375
- # Losing probes: leave worktrees intact, mark branches with -failed suffix for clarity
376
- git branch -m "df/{spec}/probe-SPIKE_B" "df/{spec}/probe-SPIKE_B-failed"
377
- ```
378
-
379
- Record all probe paths in `.deepflow/checkpoint.json` under `"spike_probes"` so future `--continue` runs know they exist.
380
-
381
- #### Step 7: Log failed probe insights
382
-
383
- For every probe that failed the ratchet (or lost selection), write two entries to `.deepflow/auto-memory.yaml` in the **main** tree.
384
-
385
- **Entry 1 — `spike_insights` (detailed probe record):**
386
-
387
- ```yaml
388
- spike_insights:
389
- - date: "YYYY-MM-DD"
390
- spec: "{spec_name}"
391
- spike_id: "SPIKE_B"
392
- hypothesis: "{hypothesis text from PLAN.md}"
393
- outcome: "failed" # or "passed-but-lost"
394
- failure_reason: "{first failed check and error summary}"
395
- ratchet_metrics:
396
- regressions: 2
397
- coverage_delta: -1
398
- files_changed: 7
399
- worktree: ".deepflow/worktrees/{spec}/probe-SPIKE_B-failed"
400
- branch: "df/{spec}/probe-SPIKE_B-failed"
401
- edge_cases: [] # orchestrator may populate after manual review
402
- ```
403
-
404
- **Entry 2 — `probe_learnings` (cross-cycle memory, read by `/df:auto-cycle` on each cycle start):**
405
-
406
- ```yaml
407
- probe_learnings:
408
- - spike: "SPIKE_B"
409
- probe: "{probe branch suffix, e.g. probe-SPIKE_B}"
410
- insight: "{one-sentence summary of what the probe revealed, derived from failure_reason}"
411
- ```
412
-
413
- If the file does not exist, create it. Initialize both `spike_insights:` and `probe_learnings:` as empty lists before appending. Preserve all existing keys when merging.
414
-
415
- #### Step 8: Promote winning probe
416
-
417
- Cherry-pick the winner's commit into the shared spec worktree so downstream implementation tasks see the winning approach:
418
-
419
- ```bash
420
- cd ${WORKTREE_PATH} # shared worktree (not the probe sub-worktree)
421
- git cherry-pick ${WINNER_COMMIT}
422
- ```
423
-
424
- Then mark the winning spike task as `completed` and auto-unblock its dependents:
425
-
426
- ```
427
- TaskUpdate(taskId: native_id_SPIKE_WINNER, status: "completed")
428
- TaskUpdate(taskId: native_id_SPIKE_LOSERS, status: "pending") # keep visible for audit
429
- ```
430
-
431
- Update PLAN.md:
432
- - Winning spike → `[x]` with commit hash and `[PROBE_WINNER]` tag
433
- - Losing spikes → `[~]` (skipped) with `[PROBE_FAILED: see auto-memory.yaml]` note
434
-
435
- Resume the standard execution loop (section 9) — implementation tasks blocked by the spike group are now unblocked.
138
+ Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothesis.
139
+
140
+ 1. **Baseline:** Record `BASELINE=$(git rev-parse HEAD)` in shared worktree
141
+ 2. **Sub-worktrees:** Per spike: `git worktree add -b df/{spec}/probe-{SPIKE_ID} .deepflow/worktrees/{spec}/probe-{SPIKE_ID} ${BASELINE}`
142
+ 3. **Spawn:** All probes in ONE message, each targeting its probe worktree. End turn.
143
+ 4. **Ratchet:** Per notification, run standard ratchet (5.5) in probe worktree. Record: ratchet_passed, regressions, coverage_delta, files_changed, commit
144
+ 5. **Select winner** (after ALL complete, no LLM judge):
145
+ - Disqualify any with regressions
146
+ - Rank: fewer regressions > higher coverage_delta > fewer files_changed > first to complete
147
+ - No passes → reset all to pending for retry with debugger
148
+ 6. **Preserve all worktrees.** Losers: rename branch + `-failed` suffix. Record in checkpoint.json under `"spike_probes"`
149
+ 7. **Log failed probes** to `.deepflow/auto-memory.yaml` (main tree):
150
+ ```yaml
151
+ spike_insights:
152
+ - date: "YYYY-MM-DD"
153
+ spec: "{spec_name}"
154
+ spike_id: "SPIKE_B"
155
+ hypothesis: "{from PLAN.md}"
156
+ outcome: "failed" # or "passed-but-lost"
157
+ failure_reason: "{first failed check + error summary}"
158
+ ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
159
+ worktree: ".deepflow/worktrees/{spec}/probe-SPIKE_B-failed"
160
+ branch: "df/{spec}/probe-SPIKE_B-failed"
161
+ probe_learnings: # read by /df:auto-cycle each start
162
+ - spike: "SPIKE_B"
163
+ probe: "probe-SPIKE_B"
164
+ insight: "{one-sentence summary from failure_reason}"
165
+ ```
166
+ Create file if missing. Preserve existing keys when merging.
167
+ 8. **Promote winner:** Cherry-pick into shared worktree. Winner → `[x] [PROBE_WINNER]`, losers → `[~] [PROBE_FAILED]`. Resume standard loop.
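The ranking in step 5 can be sketched as a plain sort over the recorded metrics; the probe rows below are illustrative values, not real output:

```bash
# columns: probe_id regressions coverage_delta files_changed completion_order
WINNER=$(printf '%s\n' \
    "SPIKE_A 0 3 4 2" \
    "SPIKE_B 0 3 7 1" \
    "SPIKE_C 1 5 2 3" |
  awk '$2 == 0' |                     # hard gate: any regression disqualifies
  sort -k3,3nr -k4,4n -k5,5n |        # coverage desc, then files asc, then completion asc
  head -n 1 | awk '{print $1}')
echo "winner: ${WINNER}"
```

SPIKE_C is disqualified by its regression; SPIKE_A beats SPIKE_B on blast radius (4 vs 7 files) despite finishing later.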
436
168
 
437
169
  ---
438
170
 
@@ -444,143 +176,127 @@ Working directory: {worktree_absolute_path}
444
176
  All file operations MUST use this absolute path as base. Do NOT write files to the main project directory.
445
177
  Commit format: {commit_type}({spec}): {description}
446
178
 
447
- STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main. These are handled by the orchestrator and /df:verify.
179
+ STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main.
448
180
  ```
449
181
 
450
- **Standard Task (append after preamble):**
182
+ **Standard Task:**
451
183
  ```
452
184
  {task_id}: {description from PLAN.md}
453
- Files: {target files}
454
- Spec: {spec_name}
185
+ Files: {target files} | Spec: {spec_name}
186
+ {Impact block from PLAN.md — include verbatim if present}
187
+
188
+ {Prior failure context — include ONLY if task was previously reverted. Read from .deepflow/auto-memory.yaml revert_history for this task_id:}
189
+ Previous attempts (DO NOT repeat these approaches):
190
+ - Cycle {N}: reverted — "{reason from revert_history}"
191
+ - Cycle {N}: reverted — "{reason from revert_history}"
192
+ {Omit this entire block if task has no revert history.}
193
+
194
+ CRITICAL: If Impact lists duplicates or callers, you MUST verify each one is consistent with your changes.
195
+ - [active] duplicates → consolidate into single source of truth (e.g., local generateYAML → use shared buildConfigData)
196
+ - [dead] duplicates → DELETE the dead code entirely. Dead code pollutes context and causes drift.
455
197
 
456
198
  Steps:
457
- 1. If the task involves external APIs/SDKs, run: chub search "<library>" --json → chub get <id> --lang <lang>
458
- Use fetched docs as ground truth for API signatures. Annotate any gaps: chub annotate <id> "note"
459
- Skip this step if chub is not installed or the task only touches internal code.
460
- 2. Implement the task
461
- 3. Commit as feat({spec}): {description}
199
+ 1. External APIs/SDKs → chub search "<library>" --json → chub get <id> --lang <lang> (skip if chub unavailable or internal code only)
200
+ 2. Read ALL files in Impact before implementing → understand the full picture
201
+ 3. Implement the task, updating all impacted files
202
+ 4. Commit as feat({spec}): {description}
462
203
 
463
- Your ONLY job is to write code and commit. The orchestrator will run health checks after you finish.
204
+ Your ONLY job is to write code and commit. Orchestrator runs health checks after.
464
205
  ```
465
206
 
466
- **Bootstrap Task (append after preamble):**
207
+ **Bootstrap Task:**
467
208
  ```
468
209
  BOOTSTRAP: Write tests for files in edit_scope
469
- Files: {edit_scope files from spec}
470
- Spec: {spec_name}
471
-
472
- Steps:
473
- 1. Write tests that cover the functionality of the files listed above
474
- 2. Do NOT change implementation files — tests only
475
- 3. Commit as test({spec}): bootstrap tests for edit_scope
210
+ Files: {edit_scope files} | Spec: {spec_name}
476
211
 
477
- Your ONLY job is to write tests and commit. The orchestrator will run health checks after you finish.
212
+ Write tests covering listed files. Do NOT change implementation files.
213
+ Commit as test({spec}): bootstrap tests for edit_scope
478
214
  ```

- **Spike Task (append after preamble):**
+ **Spike Task:**
  ```
  {task_id} [SPIKE]: {hypothesis}
- Files: {target files}
- Spec: {spec_name}
+ Files: {target files} — Spec: {spec_name}

- Steps:
- 1. Implement the minimal spike to validate the hypothesis
- 2. Commit as spike({spec}): {description}
+ {Prior failure context — include ONLY if this spike was previously reverted. Read from .deepflow/auto-memory.yaml revert_history + spike_insights for this task_id:}
+ Previous attempts (DO NOT repeat these approaches):
+ - Cycle {N}: reverted — "{reason}"
+ {Omit this entire block if no revert history.}

- Your ONLY job is to write code and commit. The orchestrator will run health checks to determine if the spike passes.
+ Implement minimal spike to validate hypothesis.
+ Commit as spike({spec}): {description}
  ```
 
- ### 7. FAILURE HANDLING
+ ### 8. COMPLETE SPECS

- When a task fails ratchet and is reverted:
+ When all tasks done for a `doing-*` spec:
+ 1. Run `/df:verify doing-{name}` via the Skill tool (`skill: "df:verify", args: "doing-{name}"`)
+    - Verify runs quality gates (L0-L4), merges worktree branch to main, cleans up worktree, renames spec `doing-*` → `done-*`, and extracts decisions
+    - If verify fails (adds fix tasks): stop here — `/df:execute --continue` will pick up the fix tasks
+    - If verify passes: proceed to step 2
+ 2. Remove spec's ENTIRE section from PLAN.md (header, tasks, summaries, fix tasks, separators)
+ 3. Recalculate Summary table at top of PLAN.md
 
- `TaskUpdate(taskId: native_id, status: "pending")` — keeps task visible for retry; dependents remain blocked.
+ ---
 
- On repeated failure: spawn `Task(subagent_type="reasoner", model={model from debugger frontmatter, default "sonnet"}, prompt="Debug failure: {ratchet output}")`.
+ ## Usage
+
+ ```
+ /df:execute              # Execute all ready tasks
+ /df:execute T1 T2        # Specific tasks only
+ /df:execute --continue   # Resume from checkpoint
+ /df:execute --fresh      # Ignore checkpoint
+ /df:execute --dry-run    # Show plan only
+ ```
 
- Leave worktree intact, keep checkpoint.json, output: worktree path/branch, `cd {worktree_path}` to investigate, `/df:execute --continue` to resume, `/df:execute --fresh` to discard.
+ ## Skills & Agents

- ### 8. COMPLETE SPECS
+ - Skill: `atomic-commits` — Clean commit protocol
+ - Skill: `context-hub` — Fetch external API docs before coding
 
- When all tasks done for a `doing-*` spec:
- 1. Embed history in spec: `## Completed` section with task list and commit hashes
- 2. Rename: `doing-upload.md` → `done-upload.md`
- 3. Extract decisions from done-* spec: Read the `done-{name}.md` file. Model-extract architectural decisions — look for explicit choices (→ `[APPROACH]`), unvalidated assumptions (→ `[ASSUMPTION]`), and "for now" decisions (→ `[PROVISIONAL]`). Append as a new section to **main tree** `.deepflow/decisions.md`:
- ```
- ### {YYYY-MM-DD} — {spec-name}
- - [TAG] decision text — rationale
- ```
- After successful append, delete `specs/done-{name}.md`. If write fails, preserve the file.
- 4. Remove the spec's ENTIRE section from PLAN.md:
-    - The `### doing-{spec}` header
-    - All task entries (`- [x] **T{n}**: ...` and their sub-items)
-    - Any `## Execution Summary` block for that spec
-    - Any `### Fix Tasks` sub-section for that spec
-    - Separators (`---`) between removed sections
- 5. Recalculate the Summary table at the top of PLAN.md (update counts for completed/pending)
+ | Agent | subagent_type | Purpose |
+ |-------|---------------|---------|
+ | Implementation | `general-purpose` | Task implementation |
+ | Debugger | `reasoner` | Debugging failures |

- ### 9. ITERATE (Notification-Driven)
+ **Model routing:** Use `model:` from command/agent/skill frontmatter. Default: `sonnet`.
 
- After spawning wave agents, your turn ENDS. Completion notifications drive the loop.
+ **Checkpoint schema:** `.deepflow/checkpoint.json` in worktree:
+ ```json
+ {"completed_tasks": ["T1","T2"], "current_wave": 2, "worktree_path": ".deepflow/worktrees/upload", "worktree_branch": "df/upload"}
+ ```
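A resume guard can sanity-check this schema before `--continue` trusts it. A minimal sketch — the validator itself is not part of the deepflow API, but the field names match the schema above:

```javascript
// Minimal sketch (not part of the deepflow API): validate a checkpoint.json
// payload before --continue resumes from it.
function isCheckpoint(c) {
  return (
    c !== null &&
    typeof c === "object" &&
    Array.isArray(c.completed_tasks) &&
    c.completed_tasks.every((t) => typeof t === "string") &&
    Number.isInteger(c.current_wave) &&
    typeof c.worktree_path === "string" &&
    typeof c.worktree_branch === "string"
  );
}

const raw =
  '{"completed_tasks":["T1","T2"],"current_wave":2,' +
  '"worktree_path":".deepflow/worktrees/upload","worktree_branch":"df/upload"}';
console.log(isCheckpoint(JSON.parse(raw))); // → true
console.log(isCheckpoint({ completed_tasks: "T1" })); // → false
```

A checkpoint that fails this shape check should be treated as `--fresh` rather than resumed.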
 
- **Per notification:**
- 1. Run ratchet check for the completed agent (see section 5.5)
- 2. Ratchet passed → `TaskUpdate(taskId: native_id, status: "completed")` — auto-unblocks dependent tasks
- 3. Ratchet failed → revert commit, `TaskUpdate(taskId: native_id, status: "pending")`
- 4. Update PLAN.md: `[ ]` → `[x]` + commit hash (on pass) or note revert (on fail)
- 5. Report: "✓ T1: ratchet passed (abc123)" or "✗ T1: ratchet failed, reverted"
- 6. If NOT all wave agents done → end turn, wait
- 7. If ALL wave agents done → use TaskList to find newly unblocked tasks, check context, spawn next wave or finish
+ ---
 
- **Between waves:** Check context %. If ≥50%, checkpoint and exit.
+ ## Failure Handling
 
- **Repeat** until: all done, all blocked, or context ≥50% (checkpoint).
+ When task fails ratchet and is reverted:
+ - `TaskUpdate(taskId: native_id, status: "pending")` — dependents remain blocked
+ - Repeated failure → spawn `Task(subagent_type="reasoner", prompt="Debug failure: {ratchet output}")`
+ - Leave worktree intact, keep checkpoint.json
+ - Output: worktree path/branch, `cd {path}` to investigate, `--continue` to resume, `--fresh` to discard
 
  ## Rules

  | Rule | Detail |
  |------|--------|
- | Zero test files → bootstrap first | Section 1.7; bootstrap is the cycle's sole task when snapshot is empty |
+ | Zero test files → bootstrap first | Bootstrap is cycle's sole task when snapshot empty |
  | 1 task = 1 agent = 1 commit | `atomic-commits` skill |
  | 1 file = 1 writer | Sequential if conflict |
  | Agent writes code, orchestrator measures | Ratchet is the judge |
  | No LLM evaluates LLM work | Health checks only |
- | ≥2 spikes for same problem → parallel probes | Section 5.7; never run competing spikes sequentially |
- | All probe worktrees preserved | Losing probes renamed with `-failed` suffix; never deleted |
- | Machine-selected winner | Fewer regressions > better coverage > fewer files changed; no LLM judge |
- | Failed probe insights logged | `.deepflow/auto-memory.yaml` in main tree; persists across cycles |
- | Winner cherry-picked to shared worktree | Downstream tasks see winning approach via shared worktree |
- | External APIs → chub first | Agents fetch curated docs before implementing external API calls; skip if chub unavailable |
+ | ≥2 spikes same problem → parallel probes | Never run competing spikes sequentially |
+ | All probe worktrees preserved | Losers renamed `-failed`; never deleted |
+ | Machine-selected winner | Fewer regressions > better coverage > fewer files changed; no LLM judge |
+ | External APIs → chub first | Skip if unavailable |
 
  ## Example

- ### No-Tests Bootstrap
-
- ```
- /df:execute (context: 8%)
-
- Loading PLAN.md... T1 ready, T2/T3 blocked by T1
- Ratchet snapshot: 0 pre-existing test files
- Bootstrap needed: no pre-existing test files found.
-
- Spawning bootstrap agent for edit_scope...
- [Bootstrap agent completed]
- Running ratchet: build ✓ | tests ✓ (12 new tests pass)
- ✓ Bootstrap: ratchet passed (boo1234)
- Re-taking ratchet snapshot: 3 test files
-
- bootstrap: completed — cycle's sole task was test bootstrap
- Next: Run /df:auto-cycle again to execute T1
- ```
-
- ### Standard Execution
-
  ```
  /df:execute (context: 12%)

  Loading PLAN.md... T1 ready, T2/T3 blocked by T1
  Ratchet snapshot: 24 pre-existing test files
- Registering native tasks: TaskCreate T1/T2/T3, TaskUpdate(T2 blockedBy T1), TaskUpdate(T3 blockedBy T1)

  Wave 1: TaskUpdate(T1, in_progress)
  [Agent "T1" completed]
@@ -589,43 +305,13 @@ Wave 1: TaskUpdate(T1, in_progress)
  TaskUpdate(T1, completed) → auto-unblocks T2, T3

  Wave 2: TaskUpdate(T2/T3, in_progress)
- [Agent "T2" completed]
- Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
- ✓ T2: ratchet passed (def5678)
- [Agent "T3" completed]
- Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
- ✓ T3: ratchet passed (ghi9012)
-
- Context: 35% — doing-upload → done-upload. Complete: 3/3
-
- Next: Run /df:verify to verify specs and merge to main
- ```
-
- ### Ratchet Failure (Regression Detected)
-
- ```
- /df:execute (context: 10%)
-
- Wave 1: TaskUpdate(T1, in_progress)
- [Agent "T1" completed]
- Running ratchet: build ✓ | tests ✗ (2 failed of 24)
- ✗ T1: ratchet failed, reverted
- TaskUpdate(T1, pending)
-
- Spawning debugger for T1...
- [Debugger completed]
- Re-running T1 with fix guidance...
-
- [Agent "T1 retry" completed]
- Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
- ✓ T1: ratchet passed (abc1234)
- ```
-
- ### With Checkpoint
-
- ```
- Wave 1 complete (context: 52%)
- Checkpoint saved.
-
- Next: Run /df:execute --continue to resume execution
+ [Agent "T2" completed] ✓ T2: ratchet passed (def5678)
+ [Agent "T3" completed] ✓ T3: ratchet passed (ghi9012)
+
+ Context: 35% — All tasks done for doing-upload.
+ Running /df:verify doing-upload...
+ ✓ L0 | ✓ L1 (3/3 files) | ⚠ L2 (no coverage tool) | ✓ L4 (24 tests)
+ ✓ Merged df/upload to main
+ Spec complete: doing-upload → done-upload
+ Complete: 3/3
  ```
@@ -3,7 +3,7 @@
  ## Purpose
  Compare specs against codebase and past experiments. Generate prioritized tasks.

- **NEVER:** use EnterPlanMode, use ExitPlanMode — this command IS the planning phase; native plan mode conflicts with it
+ **NEVER:** use EnterPlanMode, use ExitPlanMode — this command IS the planning phase

  ## Usage
  ```
@@ -17,71 +17,50 @@ Compare specs against codebase and past experiments. Generate prioritized tasks.

  ## Spec File States

- | Prefix | State | Action |
- |--------|-------|--------|
- | (none) | New | Plan this |
- | `doing-` | In progress | Skip |
- | `done-` | Completed | Skip |
+ | Prefix | Action |
+ |--------|--------|
+ | (none) | Plan this |
+ | `doing-` | Skip |
+ | `done-` | Skip |

  ## Behavior

  ### 1. LOAD CONTEXT

  ```
- Load:
-   - specs/*.md EXCLUDING doing-* and done-* (only new specs)
-   - PLAN.md (if exists, for appending)
-   - .deepflow/config.yaml (if exists)
-
+ Load: specs/*.md (exclude doing-*/done-*), PLAN.md (if exists), .deepflow/config.yaml
  Determine source_dir from config or default to src/
  ```
 
- Run `validateSpec` on each loaded spec. Hard failures → skip that spec entirely and emit an error line. Advisory warnings → include them in plan output.
-
- If no new specs: report counts, suggest `/df:execute`.
+ Run `validateSpec` on each spec. Hard failures → skip + error. Advisory warnings → include in output.
+ No new specs → report counts, suggest `/df:execute`.
 
  ### 2. CHECK PAST EXPERIMENTS (SPIKE-FIRST)

  **CRITICAL**: Check experiments BEFORE generating any tasks.

- Extract topic from spec name (fuzzy match), then:
-
  ```
  Glob .deepflow/experiments/{topic}--*
  ```

- **Experiment file naming:** `{topic}--{hypothesis}--{status}.md`
- Statuses: `active`, `passed`, `failed`
+ File naming: `{topic}--{hypothesis}--{status}.md` (active/passed/failed)
 
  | Result | Action |
  |--------|--------|
- | `--failed.md` exists | Extract "next hypothesis" from Conclusion section |
- | `--passed.md` exists | Reference as validated pattern, can proceed to full implementation |
- | `--active.md` exists | Wait for experiment completion before planning |
- | No matches | New topic, needs initial spike |
+ | `--failed.md` | Extract "next hypothesis" from Conclusion, generate spike |
+ | `--passed.md` | Proceed to full implementation |
+ | `--active.md` | Wait for completion |
+ | No matches | New topic, generate initial spike |
 
- **Spike-First Rule**:
-   - If `--failed.md` exists: Generate spike task to test the next hypothesis (from failed experiment's Conclusion)
-   - If no experiments exist: Generate spike task for the core hypothesis
-   - Full implementation tasks are BLOCKED until a spike validates the approach
-   - Only proceed to full task generation after `--passed.md` exists
-
- See: `templates/experiment-template.md` for experiment format
+ Full implementation tasks BLOCKED until spike validates. See `templates/experiment-template.md`.
 
  ### 3. DETECT PROJECT CONTEXT

- For existing codebases, identify:
-   - Code style/conventions
-   - Existing patterns (error handling, API structure)
-   - Integration points
-
- Include patterns in task descriptions for agents to follow.
+ Identify code style, patterns (error handling, API structure), integration points. Include in task descriptions.

  ### 4. ANALYZE CODEBASE

- Follow `templates/explore-agent.md` for spawn rules, prompt structure, and scope restrictions.
-
- Scale agent count based on codebase size:
+ Follow `templates/explore-agent.md` for spawn rules and scope.

  | File Count | Agents |
  |------------|--------|
@@ -90,125 +69,111 @@ Scale agent count based on codebase size:
  | 100-500 | 25-40 |
  | 500+ | 50-100 (cap) |

- **Use `code-completeness` skill patterns** to search for:
-   - Implementations matching spec requirements
-   - TODO, FIXME, HACK comments
-   - Stub functions, placeholder returns
-   - Skipped tests, incomplete coverage
+ Use `code-completeness` skill to search for: implementations matching spec requirements, TODOs/FIXMEs/HACKs, stubs, skipped tests.

- ### 5. COMPARE & PRIORITIZE
+ ### 4.5. IMPACT ANALYSIS (per planned file)

- Spawn `Task(subagent_type="reasoner", model="opus")`. Reasoner maps each requirement to DONE / PARTIAL / MISSING / CONFLICT. Flag spec gaps; don't silently assume.
+ For each file in a task's "Files:" list, find the full blast radius.

- Check spec health: verify REQ-AC alignment, requirement clarity, and completeness. Note any issues (orphan ACs, vague requirements) in plan output.
+ **Search for:**

- **Priority order:** Dependencies → Impact → Risk
+ 1. **Callers:** `grep -r "{exported_function}" --include="*.{ext}" -l` — files that import/call what's being changed
+ 2. **Duplicates:** Files with similar logic (same function name, same transformation). Classify:
+    - `[active]` — used in production → must consolidate
+    - `[dead]` — bypassed/unreachable → must delete
+ 3. **Data flow:** If file produces/transforms data, find ALL consumers of that shape across languages

- ### 6. GENERATE SPIKE TASKS (IF NEEDED)
+ **Embed as `Impact:` block in each task:**
+ ```markdown
+ - [ ] **T2**: Add new features to YAML export
+   - Files: src/utils/buildConfigData.ts
+   - Impact:
+     - Callers: src/routes/index.ts:12, src/api/handler.ts:45
+     - Duplicates:
+       - src/components/YamlViewer.tsx:19 (own generateYAML) [active — consolidate]
+       - backend/yaml_gen.go (generateYAMLFromConfig) [dead — DELETE]
+     - Data flow: buildConfigData → YamlViewer, SimControls, RoleplayPage
+   - Blocked by: T1
+ ```
+
+ Files outside original "Files:" → add with `(impact — verify/update)`.
+ Skip for spike tasks.
+
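The caller-discovery step can be sketched in-memory. A hedged illustration — file names and contents below are invented, and the real step shells out to `grep -r` across the repo rather than scanning strings:

```javascript
// Illustrative sketch of caller discovery: list files whose source references
// an exported symbol. Real runs use grep -r with --include filters instead.
function findCallers(files, symbol) {
  return Object.entries(files)
    .filter(([, src]) => src.includes(symbol))
    .map(([name]) => name);
}

const files = {
  "src/routes/index.ts": 'import { buildConfigData } from "../utils/buildConfigData";',
  "src/api/handler.ts": "const yaml = buildConfigData(config);",
  "src/unrelated.ts": "export const x = 1;",
};
console.log(findCallers(files, "buildConfigData"));
// logs the two files that reference the symbol
```

Every file this step surfaces goes into the task's `Callers:` line so the implementing agent reads it before editing.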
+ ### 5. COMPARE & PRIORITIZE
+
+ Spawn `Task(subagent_type="reasoner", model="opus")`. Map each requirement to DONE / PARTIAL / MISSING / CONFLICT. Check REQ-AC alignment. Flag spec gaps.

- **When to generate spike tasks:**
- 1. Failed experiment exists → Test the next hypothesis
- 2. No experiments exist → Test the core hypothesis
- 3. Passed experiment exists → Skip to full implementation
+ Priority: Dependencies → Impact → Risk
+
+ ### 6. GENERATE SPIKE TASKS (IF NEEDED)
 
  **Spike Task Format:**
  ```markdown
  - [ ] **T1** [SPIKE]: Validate {hypothesis}
    - Type: spike
    - Hypothesis: {what we're testing}
-   - Method: {minimal steps to validate}
-   - Success criteria: {how to know it passed}
+   - Method: {minimal steps}
+   - Success criteria: {measurable}
    - Time-box: 30 min
    - Files: .deepflow/experiments/{topic}--{hypothesis}--{status}.md
    - Blocked by: none
  ```
 
- **Blocking Logic:** All implementation tasks MUST have `Blocked by: T{spike}` until spike passes. If spike fails: update to `--failed.md`, DO NOT generate implementation tasks.
+ All implementation tasks MUST be `Blocked by: T{spike}`. Spike fails → `--failed.md`, no implementation tasks.
 
  #### Probe Diversity

- When generating multiple spike probes for the same problem, diversity is required to avoid confirmation bias and enable discovery of unexpected solutions.
+ When generating multiple spikes for the same problem:

  | Requirement | Rule |
  |-------------|------|
- | Contradictory | At least 2 probes must use opposing/contradictory approaches (e.g., streaming vs buffering, in-process vs external) |
- | Naive | At least 1 probe must be a naive/simple approach without prior technical justification — enables exaptation (discovering unexpected solutions) |
- | Parallel | All probes for the same problem run simultaneously, not sequentially |
- | Scoped | Each probe is minimal — just enough to validate the hypothesis |
- | Safe to fail | Each probe runs in its own worktree; failure has zero impact on main |
+ | Contradictory | ≥2 probes with opposing approaches |
+ | Naive | ≥1 probe without prior technical justification |
+ | Parallel | All run simultaneously |
+ | Scoped | Minimal — just enough to validate |
 
- **Diversity validation step** — before outputting spike tasks, verify:
- 1. Are there at least 2 probes with opposing assumptions? If not, add a contradictory probe.
- 2. Is there at least 1 naive probe with no prior technical justification? If not, add one.
- 3. Are all probes independent (no probe depends on another probe's result)?
-
- **Example — 3 diverse probes for a caching problem:**
+ Before output, verify: ≥2 opposing probes, ≥1 naive, all independent.

+ **Example — caching problem, 3 diverse probes:**
  ```markdown
  ```markdown
  - [ ] **T1** [SPIKE]: Validate in-memory LRU cache
-   - Type: spike
    - Role: Contradictory-A (in-process)
-   - Hypothesis: In-memory LRU cache reduces DB queries by ≥80%
-   - Method: Implement LRU with 1000-item cap, run load test
-   - Success criteria: DB query count drops ≥80% under 100 concurrent users
-   - Blocked by: none
+   - Hypothesis: In-memory LRU reduces DB queries by ≥80%
+   - Method: LRU with 1000-item cap, load test
+   - Success criteria: DB queries drop ≥80% under 100 concurrent users

  - [ ] **T2** [SPIKE]: Validate Redis distributed cache
-   - Type: spike
    - Role: Contradictory-B (external, opposing T1)
-   - Hypothesis: Redis cache scales across multiple instances
-   - Method: Add Redis client, cache top 10 queries, same load test
-   - Success criteria: DB queries drop ≥80%, works across 2 app instances
+   - Hypothesis: Redis scales across multiple instances
+   - Method: Redis client, cache top 10 queries, same load test
+   - Success criteria: DB queries drop ≥80%, works across 2 instances

- - [ ] **T3** [SPIKE]: Validate query optimization without cache (naive)
-   - Type: spike
+ - [ ] **T3** [SPIKE]: Validate query optimization without cache
    - Role: Naive (no prior justification — tests if caching is even necessary)
-   - Hypothesis: Indexes + query batching alone may be sufficient
-   - Method: Add missing indexes, batch N+1 queries, same load test — no cache
+   - Hypothesis: Indexes + query batching alone may suffice
+   - Method: Add indexes, batch N+1 queries, same load test — no cache
    - Success criteria: DB queries drop ≥80% with zero cache infrastructure
-   - Blocked by: none
  ```
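The diversity check above is mechanical. A hedged sketch — the probe object shape here is invented for illustration, not a deepflow data structure:

```javascript
// Sketch of the probe-diversity check: ≥2 contradictory probes, ≥1 naive,
// all independent (no probe blocked by another). Probe shape is illustrative.
function isDiverse(probes) {
  const contradictory = probes.filter((p) => p.role.startsWith("Contradictory")).length;
  const naive = probes.filter((p) => p.role === "Naive").length;
  const independent = probes.every((p) => p.blockedBy.length === 0);
  return contradictory >= 2 && naive >= 1 && independent;
}

const probes = [
  { id: "T1", role: "Contradictory-A", blockedBy: [] },
  { id: "T2", role: "Contradictory-B", blockedBy: [] },
  { id: "T3", role: "Naive", blockedBy: [] },
];
console.log(isDiverse(probes)); // → true
console.log(isDiverse(probes.slice(0, 2))); // → false (no naive probe)
```

A probe set that fails the check gets a contradictory or naive probe added before output.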
 
  ### 7. VALIDATE HYPOTHESES

- For unfamiliar APIs, ambiguous approaches, or performance-critical work: prototype in scratchpad (not committed). If assumption fails, write `.deepflow/experiments/{topic}--{hypothesis}--failed.md`. Skip for well-known patterns/simple CRUD.
+ Unfamiliar APIs or performance-critical work → prototype in scratchpad. Fails → write `--failed.md`. Skip for known patterns.
 
  ### 8. CLEANUP PLAN.md

- Before writing new tasks, prune stale sections:
-
- ```
- For each ### section in PLAN.md:
-   Extract spec name from header (e.g. "doing-upload" or "done-upload")
-   If specs/done-{name}.md exists:
-     → Remove the ENTIRE section: header, tasks, execution summary, fix tasks, separators
-   If header references a spec with no matching specs/doing-*.md or specs/done-*.md:
-     → Remove it (orphaned section)
- ```
-
- Also recalculate the Summary table (specs analyzed, tasks created/completed/pending) to reflect only remaining sections.
-
- If PLAN.md becomes empty after cleanup, delete the file and recreate fresh.
-
- ### 9. OUTPUT PLAN.md
+ Prune stale sections: remove `done-*` sections and orphaned headers. Recalculate Summary table. Empty → recreate fresh.
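The prune step can be sketched as a pure function over the PLAN.md text. The section layout is assumed from the examples in this file; deepflow itself does this edit in place:

```javascript
// Hedged sketch of the prune step: drop any "### doing-X" / "### done-X"
// section whose spec now has a specs/done-X.md file. Layout is assumed.
function prunePlan(plan, doneSpecs) {
  return plan
    .split(/(?=^### )/m) // split into preamble + "### ..." sections
    .filter((section) => {
      const m = section.match(/^### (?:doing-|done-)?(\S+)/);
      return !(m && doneSpecs.includes(m[1]));
    })
    .join("");
}

const plan = "# Plan\n### doing-upload\n- [x] T1\n### doing-auth\n- [ ] T1\n";
console.log(prunePlan(plan, ["upload"]));
// keeps "# Plan" and the doing-auth section; drops doing-upload
```

The Summary recount then runs over whatever sections survive the filter.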
 
- Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validation findings.
+ ### 9. OUTPUT & RENAME

- ### 10. RENAME SPECS
+ Append tasks grouped by `### doing-{spec-name}`. Rename `specs/feature.md` → `specs/doing-feature.md`.

- `mv specs/feature.md specs/doing-feature.md`
-
- ### 11. REPORT
-
- `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`
+ Report: `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`
 
  ## Rules
- - **Spike-first** — Generate spike task before full implementation if no `--passed.md` experiment exists
- - **Block on spike** — Full implementation tasks MUST be blocked by spike validation
- - **Learn from failures** — Extract "next hypothesis" from failed experiments, never repeat same approach
+ - **Spike-first** — No `--passed.md` → spike before implementation
+ - **Block on spike** — Implementation tasks blocked until spike validates
+ - **Learn from failures** — Extract next hypothesis, never repeat approach
  - **Plan only** — Do NOT implement (except quick validation prototypes)
- - **Confirm before assume** — Search code before marking "missing"
  - **One task = one logical unit** — Atomic, committable
  - Prefer existing utilities over new code; flag spec gaps
 
@@ -216,74 +181,31 @@ Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validatio

  | Agent | Model | Base | Scale |
  |-------|-------|------|-------|
- | Explore (search) | haiku | 10 | +1 per 20 files |
- | Reasoner (analyze) | opus | 5 | +1 per 2 specs |
+ | Explore | haiku | 10 | +1 per 20 files |
+ | Reasoner | opus | 5 | +1 per 2 specs |

- Always use the `Task` tool with explicit `subagent_type` and `model`. Do NOT use Glob/Grep/Read directly.
+ Always use `Task` tool with explicit `subagent_type` and `model`.
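The base + scale columns translate directly to formulas. A sketch — the 100-agent cap on explore is taken from the file-count table in section 4; capping reasoner is my assumption and is left out:

```javascript
// Sketch of the spawn-count formulas from the table above.
// Explore: base 10, +1 per 20 files, capped at 100 (cap from section 4).
function exploreCount(fileCount) {
  return Math.min(10 + Math.floor(fileCount / 20), 100);
}
// Reasoner: base 5, +1 per 2 specs (uncapped here).
function reasonerCount(specCount) {
  return 5 + Math.floor(specCount / 2);
}

console.log(exploreCount(200)); // → 20
console.log(reasonerCount(4)); // → 7
```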

  ## Example

- ### Spike-First (No Prior Experiments)
-
  ```markdown
- # Plan
-
  ### doing-upload

  - [ ] **T1** [SPIKE]: Validate streaming upload approach
    - Type: spike
-   - Hypothesis: Streaming uploads will handle files >1GB without memory issues
-   - Method: Create minimal endpoint, upload 2GB file, measure memory
-   - Success criteria: Memory stays under 500MB during upload
-   - Time-box: 30 min
+   - Hypothesis: Streaming uploads handle >1GB without memory issues
+   - Success criteria: Memory <500MB during 2GB upload
    - Files: .deepflow/experiments/upload--streaming--active.md
    - Blocked by: none

  - [ ] **T2**: Create upload endpoint
    - Files: src/api/upload.ts
-   - Blocked by: T1 (spike must pass)
+   - Impact:
+     - Callers: src/routes/index.ts:5
+     - Duplicates: backend/legacy-upload.go [dead — DELETE]
+   - Blocked by: T1

  - [ ] **T3**: Add S3 service with streaming
    - Files: src/services/storage.ts
-   - Blocked by: T1 (spike must pass), T2
- ```
-
- ### Spike-First (After Failed Experiment)
-
- ```markdown
- # Plan
-
- ### doing-upload
-
- - [ ] **T1** [SPIKE]: Validate chunked upload with backpressure
-   - Type: spike
-   - Hypothesis: Adding backpressure control will prevent buffer overflow
-   - Method: Implement pause/resume on buffer threshold, test with 2GB file
-   - Success criteria: No memory spikes above 500MB
-   - Time-box: 30 min
-   - Files: .deepflow/experiments/upload--chunked-backpressure--active.md
-   - Blocked by: none
-   - Note: Previous approach failed (see upload--buffer-upload--failed.md)
-
- - [ ] **T2**: Implement chunked upload endpoint
-   - Files: src/api/upload.ts
-   - Blocked by: T1 (spike must pass)
- ```
-
- ### After Spike Validates (Full Implementation)
-
- ```markdown
- # Plan
-
- ### doing-upload
-
- - [ ] **T1**: Create upload endpoint
-   - Files: src/api/upload.ts
-   - Blocked by: none
-   - Note: Use streaming (validated in upload--streaming--passed.md)
-
- - [ ] **T2**: Add S3 service with streaming
-   - Files: src/services/storage.ts
-   - Blocked by: T1
-   - Avoid: Direct buffer upload failed (see upload--buffer-upload--failed.md)
+   - Blocked by: T1, T2
  ```