deepflow 0.1.49 → 0.1.50

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -62,36 +62,7 @@ Each task = one background agent. Use agent completion notifications as the feed
  - Yes → proceed to next wave or write final summary
  ```
 
- **CRITICAL: After spawning agents, your turn ENDS. Do NOT:**
- - Run Bash commands to poll/monitor
- - Try to read result files before notifications arrive
- - Write summaries before all wave agents complete
-
- **On notification, respond briefly:**
- - ONE line per completed agent: "✓ T1: success (abc123)"
- - Only write full summary after ALL wave agents complete
- - Do NOT repeat the full execution status on every notification
-
- ```python
- # Step 1: Spawn wave (ONE message, then STOP)
- Task(subagent_type="general-purpose", model="sonnet", run_in_background=True, prompt="T1: ...")
- Task(subagent_type="general-purpose", model="sonnet", run_in_background=True, prompt="T2: ...")
- Task(subagent_type="general-purpose", model="sonnet", run_in_background=True, prompt="T3: ...")
- # Turn ends here. Wait for notifications.
-
- # Step 2: On "Agent T1 completed" notification:
- Read("{worktree}/.deepflow/results/T1.yaml")
- # Output: "✓ T1: success (abc123)" — then STOP, wait for next
-
- # Step 3: On "Agent T2 completed" notification:
- Read("{worktree}/.deepflow/results/T2.yaml")
- # Output: "✓ T2: success (def456)" — then STOP, wait for next
-
- # Step 4: On "Agent T3 completed" notification (last one):
- Read("{worktree}/.deepflow/results/T3.yaml")
- # Output: "✓ T3: success (ghi789)"
- # All done → proceed to next wave or final summary
- ```
+ After spawning, your turn ENDS. Per notification: read result file, output ONE line ("✓ T1: success (abc123)"), update PLAN.md. Write full summary only after ALL wave agents complete.
 
  Result file `.deepflow/results/{task_id}.yaml`:
  ```yaml
@@ -145,10 +116,8 @@ experiment_file: ".deepflow/experiments/upload--streaming--failed.md"
  }
  ```
 
- Note: `completed_tasks` is kept for backward compatibility but is now derivable from PLAN.md `[x]` entries. The native task system (TaskList) is the primary source for runtime task status.
-
  **On checkpoint:** Complete wave → update PLAN.md → save to worktree → exit.
- **Resume:** `--continue` loads checkpoint, verifies worktree, skips completed tasks. Native tasks are re-registered for remaining `[ ]` items only.
+ **Resume:** `--continue` loads checkpoint, verifies worktree, skips completed tasks.
 
  ## Behavior
 
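The `--continue` skip logic this hunk compresses (resume by skipping tasks already checked off) can be sketched in a few lines. This is an illustrative helper, not code shipped in the package; it assumes the `- [x] **T{n}**:` checkbox format PLAN.md uses elsewhere in this diff:

```python
import re

def completed_task_ids(plan_md: str) -> set:
    """Collect task ids marked '[x]' in PLAN.md; these are skipped on --continue."""
    return set(re.findall(r"- \[x\] \*\*(T\d+)\*\*", plan_md))

plan = """### doing-upload
- [x] **T1**: Create upload endpoint
- [ ] **T2**: Add S3 service
- [x] **T3**: Add auth guard
"""
# → T1 and T3 are skipped; only T2 is re-registered
```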
@@ -200,27 +169,7 @@ If missing: "No PLAN.md found. Run /df:plan first."
 
  ### 2.5. REGISTER NATIVE TASKS
 
- Parse PLAN.md and create native tasks for tracking, dependency management, and UI spinners.
-
- **For each uncompleted task (`[ ]`) in PLAN.md:**
-
- ```
- 1. TaskCreate:
- - subject: "{task_id}: {description}" (e.g. "T1: Create upload endpoint")
- - description: Full task block from PLAN.md (files, blocked by, type, etc.)
- - activeForm: "{gerund form of description}" (e.g. "Creating upload endpoint")
-
- 2. Store mapping: PLAN.md task_id (T1) → native task ID
- ```
-
- **After all tasks created, set up dependencies:**
-
- ```
- For each task with "Blocked by: T{n}, T{m}":
- TaskUpdate(taskId: native_id, addBlockedBy: [native_id_of_Tn, native_id_of_Tm])
- ```
-
- **On `--continue`:** Only create tasks for remaining `[ ]` items (skip `[x]` completed).
+ For each `[ ]` task in PLAN.md: `TaskCreate(subject: "{task_id}: {description}", activeForm: "{gerund}", description: full block)`. Store task_id → native ID mapping. Then set dependencies: `TaskUpdate(addBlockedBy: [...])` for each "Blocked by:" entry. On `--continue`: only register remaining `[ ]` items.
 
  ### 3. CHECK FOR UNPLANNED SPECS
 
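The dependency bookkeeping this section condenses (register tasks, add blocked-by edges, unblock on completion) amounts to a small ready-set computation. A minimal sketch; the dict of blocker sets stands in for the native TaskCreate/TaskUpdate calls:

```python
def ready_tasks(tasks):
    """Tasks whose blocker set is empty are ready to spawn in this wave."""
    return sorted(t for t, blockers in tasks.items() if not blockers)

def complete(tasks, done):
    """Drop a finished task and remove it from every blocker set (auto-unblock)."""
    tasks.pop(done, None)
    for blockers in tasks.values():
        blockers.discard(done)

# Mirrors the T1/T2/T3 example used throughout this diff
deps = {"T1": set(), "T2": {"T1"}, "T3": {"T1"}}
# Wave 1 is ["T1"]; after complete(deps, "T1"), wave 2 is ["T2", "T3"]
```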
@@ -302,24 +251,9 @@ TaskUpdate(taskId: native_id, status: "in_progress")
  ```
  This activates the UI spinner showing the task's activeForm (e.g. "Creating upload endpoint").
 
- **CRITICAL: Spawn ALL ready tasks in a SINGLE response with MULTIPLE Task tool calls.**
-
- DO NOT spawn one task, wait, then spawn another. Instead, call Task tool multiple times in the SAME message block. This enables true parallelism.
+ **NEVER use `isolation: "worktree"` on Task tool calls.** Deepflow manages a shared worktree per spec (`.deepflow/worktrees/{spec}/`) so wave 2 agents see wave 1 commits. Claude Code's native isolation creates separate per-agent worktrees (`.claude/worktrees/`) where agents can't see each other's work.
 
- Example: If T1, T2, T3 are ready, send ONE message containing THREE Task tool invocations:
-
- ```
- // In a SINGLE assistant message, invoke Task with run_in_background=true:
- Task(subagent_type="general-purpose", model="sonnet", run_in_background=true, prompt="T1: ...")
- Task(subagent_type="general-purpose", model="sonnet", run_in_background=true, prompt="T2: ...")
- Task(subagent_type="general-purpose", model="sonnet", run_in_background=true, prompt="T3: ...")
- // Turn ends here. Wait for completion notifications.
- ```
-
- **WRONG (sequential):** Send message with Task for T1 → wait → send message with Task for T2 → wait → ...
- **RIGHT (parallel):** Send ONE message with Task for T1, T2, T3 all together, then STOP
-
- Same-file conflicts: spawn sequentially instead.
+ **Spawn ALL ready tasks in ONE message** with multiple Task tool calls (true parallelism). Same-file conflicts: spawn sequentially.
 
  **Spike Task Execution:**
  When spawning a spike task, the agent MUST:
@@ -378,58 +312,40 @@ VERIFIED_PASS →
 
  VERIFIED_FAIL →
  # Spike task stays as pending, dependents remain blocked
- # No TaskUpdate needed — native system keeps them blocked
  Log "✗ Spike {task_id} failed verification"
  If override: log "⚠ Agent incorrectly marked as passed"
  ```
 
- **On failure, use Task tool to spawn reasoner:**
- ```
- Task tool parameters:
- - subagent_type: "reasoner"
- - model: "opus"
- - prompt: "Debug failure: {error details}"
- ```
+ On task failure: spawn `Task(subagent_type="reasoner", model="opus", prompt="Debug failure: {error details}")`.
 
  ### 7. PER-TASK (agent prompt)
 
- **Standard Task:**
+ **Common preamble (include in all agent prompts):**
+ ```
+ Working directory: {worktree_absolute_path}
+ All file operations MUST use this absolute path as base. Do NOT write files to the main project directory.
+ Commit format: {commit_type}({spec}): {description}
+ Result file: {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
+
+ STOP after writing the result file. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main. These are handled by the orchestrator and /df:verify.
+ ```
+
+ **Standard Task (append after preamble):**
  ```
  {task_id}: {description from PLAN.md}
  Files: {target files}
  Spec: {spec_name}
 
- **IMPORTANT: Working Directory**
- All file operations MUST use this absolute path as base:
- {worktree_absolute_path}
-
- Example: To edit src/foo.ts, use:
- {worktree_absolute_path}/src/foo.ts
-
- Do NOT write files to the main project directory.
-
  Steps:
  1. Implement the task
- 2. Detect test command: check for package.json (npm test), pyproject.toml (pytest),
- Cargo.toml (cargo test), go.mod (go test ./...), or Makefile (make test)
- 3. Run tests if test infrastructure exists:
- - Run the detected test command
- - If tests fail: fix the code and re-run until passing
- - Do NOT commit with failing tests
- 4. If NO test infrastructure: set tests_ran: false in result file
- 5. Commit as feat({spec}): {description}
- 6. Write result file with ALL fields including test evidence (see schema):
- {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
-
- **STOP after writing the result file. Do NOT:**
- - Merge branches or cherry-pick commits
- - Rename or move spec files (doing-* → done-*)
- - Remove worktrees or delete branches
- - Run git checkout on main
- These are handled by the orchestrator and /df:verify.
- ```
-
- **Spike Task:**
+ 2. Detect and run the project's test command if test infrastructure exists
+ - If tests fail: fix and re-run until passing. Do NOT commit with failing tests
+ - If NO test infrastructure: set tests_ran: false in result file
+ 3. Commit as feat({spec}): {description}
+ 4. Write result file with ALL fields including test evidence (see schema)
+ ```
+
+ **Spike Task (append after preamble):**
  ```
  {task_id} [SPIKE]: {hypothesis}
  Type: spike
@@ -437,8 +353,6 @@ Method: {minimal steps}
  Success criteria: {measurable targets}
  Experiment file: {worktree_absolute_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
 
- Working directory: {worktree_absolute_path}
-
  Steps:
  1. Execute method
  2. For EACH criterion: record target, measure actual, compare (show math)
@@ -453,61 +367,31 @@ Rules:
  - Worse than baseline = FAILED (baseline 7k, actual 1.5k → FAILED)
  - "Close enough" = FAILED
  - Verifier will check. False positives waste resources.
- - STOP after writing result file. Do NOT merge, rename specs, or clean up worktrees.
  ```
 
  ### 8. FAILURE HANDLING
 
  When a task fails and cannot be auto-fixed:
 
- **Native task update:**
- ```
- TaskUpdate(taskId: native_id, status: "pending") # Reset to pending, not deleted
- ```
- This keeps the task visible for retry. Dependent tasks remain blocked.
-
- **Behavior:**
- 1. Leave worktree intact at `{worktree_path}`
- 2. Keep checkpoint.json for potential resume
- 3. Output debugging instructions
-
- **Output:**
- ```
- ✗ Task T3 failed after retry
-
- Worktree preserved for debugging:
- Path: .deepflow/worktrees/upload
- Branch: df/upload
-
- To investigate:
- cd .deepflow/worktrees/upload
- # examine files, run tests, etc.
-
- To resume after fixing:
- /df:execute --continue
-
- To discard and start fresh:
- /df:execute --fresh
- ```
-
- **Key points:**
- - Never auto-delete worktree on failure (cleanup_on_fail: false by default)
- - Always provide the exact cleanup commands
- - Checkpoint remains so --continue can work after manual fix
+ `TaskUpdate(taskId: native_id, status: "pending")` — keeps task visible for retry; dependents remain blocked. Leave worktree intact, keep checkpoint.json, output: worktree path/branch, `cd {worktree_path}` to investigate, `/df:execute --continue` to resume, `/df:execute --fresh` to discard.
 
  ### 9. COMPLETE SPECS
 
  When all tasks done for a `doing-*` spec:
- 1. Embed history in spec: `## Completed` section
+ 1. Embed history in spec: `## Completed` section with task list and commit hashes
  2. Rename: `doing-upload.md` → `done-upload.md`
- 3. Remove section from PLAN.md
+ 3. Remove the spec's ENTIRE section from PLAN.md:
+ - The `### doing-{spec}` header
+ - All task entries (`- [x] **T{n}**: ...` and their sub-items)
+ - Any `## Execution Summary` block for that spec
+ - Any `### Fix Tasks` sub-section for that spec
+ - Separators (`---`) between removed sections
+ 4. Recalculate the Summary table at the top of PLAN.md (update counts for completed/pending)
 
  ### 10. ITERATE (Notification-Driven)
 
  After spawning wave agents, your turn ENDS. Completion notifications drive the loop.
 
- **NEVER use TaskOutput** — it explodes context.
-
  **Per notification:**
  1. Read result file for the completed agent
  2. Validate test evidence:
@@ -526,19 +410,7 @@ After spawning wave agents, your turn ENDS. Completion notifications drive the l
 
  ### 11. CAPTURE DECISIONS
 
- After all tasks complete (or all blocked), extract up to 4 candidate decisions from the session (implementation patterns, deviations from plan, key assumptions made).
-
- Present via AskUserQuestion with multiSelect: true. Labels: `[TAG] decision text`. Descriptions: rationale.
-
- For each confirmed decision, append to **main tree** `.deepflow/decisions.md` (create if missing):
- ```
- ### {YYYY-MM-DD} — execute
- - [APPROACH] Parallel agent spawn for independent tasks — confirmed no file conflicts
- ```
-
- Main tree path: use the repo root (parent of `.deepflow/worktrees/`), NOT the worktree.
-
- Max 4 candidates per prompt. Tags: [APPROACH], [PROVISIONAL], [ASSUMPTION].
+ Follow the **main-tree** variant from `templates/decision-capture.md`. Command name: `execute`.
 
  ## Rules
 
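The removed lines above spelled out the decisions.md entry format that `templates/decision-capture.md` now owns. A sketch of that formatting, assuming only the dated-header layout shown in the removed text (the helper name is hypothetical):

```python
from datetime import date

def decision_entry(command, tag, text, day):
    """One decisions.md entry in the '### {YYYY-MM-DD} — {command}' layout from this diff."""
    return f"### {day.isoformat()} — {command}\n- [{tag}] {text}\n"
```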
@@ -555,48 +427,22 @@ Max 4 candidates per prompt. Tags: [APPROACH], [PROVISIONAL], [ASSUMPTION].
  ```
  /df:execute (context: 12%)
 
- Loading PLAN.md...
- T1: Create upload endpoint (ready)
- T2: Add S3 service (blocked by T1)
- T3: Add auth guard (blocked by T1)
-
- Registering native tasks...
- TaskCreate → T1 (native: task-001)
- TaskCreate → T2 (native: task-002)
- TaskCreate → T3 (native: task-003)
- TaskUpdate(task-002, addBlockedBy: [task-001])
- TaskUpdate(task-003, addBlockedBy: [task-001])
-
- Spawning Wave 1: T1
- TaskUpdate(task-001, status: "in_progress") ← spinner: "Creating upload endpoint"
-
- [Agent "T1" completed]
- TaskUpdate(task-001, status: "completed") ← auto-unblocks task-002, task-003
- ✓ T1: success (abc1234)
-
- TaskList → task-002, task-003 now ready (blockedBy empty)
-
- Spawning Wave 2: T2, T3 parallel
- TaskUpdate(task-002, status: "in_progress")
- TaskUpdate(task-003, status: "in_progress")
-
- [Agent "T2" completed]
- TaskUpdate(task-002, status: "completed")
- ✓ T2: success (def5678)
+ Loading PLAN.md... T1 ready, T2/T3 blocked by T1
+ Registering native tasks: TaskCreate T1/T2/T3, TaskUpdate(T2 blockedBy T1), TaskUpdate(T3 blockedBy T1)
 
- [Agent "T3" completed]
- TaskUpdate(task-003, status: "completed")
- ✓ T3: success (ghi9012)
+ Wave 1: TaskUpdate(T1, in_progress)
+ [Agent "T1" completed] TaskUpdate(T1, completed) → auto-unblocks T2, T3
+ ✓ T1: success (abc1234)
 
- Wave 2 complete (2/2). Context: 35%
-
- ✓ doing-upload → done-upload
- ✓ Complete: 3/3 tasks
+ Wave 2: TaskUpdate(T2/T3, in_progress)
+ [Agent "T2" completed] ✓ T2: success (def5678)
+ [Agent "T3" completed] ✓ T3: success (ghi9012)
+ Context: 35% — doing-upload → done-upload. Complete: 3/3
 
  Next: Run /df:verify to verify specs and merge to main
 
 
- ### Spike-First Execution
+ ### Spike with Failure (Agent or Verifier Override)
 
  ```
  /df:execute (context: 10%)
@@ -611,56 +457,17 @@ Registering native tasks...
 
  Checking experiment status...
  T1 [SPIKE]: No experiment yet, spike executable
- T2: Blocked by T1 (spike not validated)
- T3: Blocked by T1 (spike not validated)
+ T2, T3: Blocked by T1 (spike not validated)
 
  Spawning Wave 1: T1 [SPIKE]
  TaskUpdate(task-001, status: "in_progress")
 
  [Agent "T1 SPIKE" completed]
- ✓ T1: complete, verifying...
-
- Verifying T1...
- ✓ Spike T1 verified (throughput 8500 >= 7000)
- TaskUpdate(task-001, status: "completed") ← auto-unblocks task-002, task-003
- → upload--streaming--passed.md
-
- TaskList → task-002, task-003 now ready
-
- Spawning Wave 2: T2, T3 parallel
- TaskUpdate(task-002, status: "in_progress")
- TaskUpdate(task-003, status: "in_progress")
-
- [Agent "T2" completed]
- TaskUpdate(task-002, status: "completed")
- ✓ T2: success (def5678)
-
- [Agent "T3" completed]
- TaskUpdate(task-003, status: "completed")
- ✓ T3: success (ghi9012)
-
- Wave 2 complete (2/2). Context: 40%
-
- ✓ doing-upload → done-upload
- ✓ Complete: 3/3 tasks
-
- Next: Run /df:verify to verify specs and merge to main
- ```
-
- ### Spike Failed (Agent Correctly Reported)
-
- ```
- /df:execute (context: 10%)
-
- Registering native tasks...
- TaskCreate → T1 [SPIKE], T2, T3 (with dependencies)
-
- Wave 1: T1 [SPIKE] (context: 15%)
- TaskUpdate(task-001, status: "in_progress")
- T1: complete, verifying...
+ ✓ T1: complete (agent said: success), verifying...
 
  Verifying T1...
  ✗ Spike T1 failed verification (throughput 1500 < 7000)
+ ⚠ Agent incorrectly marked as passed — overriding to FAILED
  # Spike stays pending — dependents remain blocked
  → upload--streaming--failed.md
 
@@ -670,29 +477,7 @@ Complete: 1/3 tasks (2 blocked by failed experiment)
 
  Next: Run /df:plan to generate new hypothesis spike
  ```
- ### Spike Failed (Verifier Override)
-
- ```
- /df:execute (context: 10%)
-
- Registering native tasks...
- TaskCreate → T1 [SPIKE], T2, T3 (with dependencies)
-
- Wave 1: T1 [SPIKE] (context: 15%)
- TaskUpdate(task-001, status: "in_progress")
- T1: complete (agent said: success), verifying...
-
- Verifying T1...
- ✗ Spike T1 failed verification (throughput 1500 < 7000)
- ⚠ Agent incorrectly marked as passed — overriding to FAILED
- TaskUpdate(task-001, status: "pending") ← reset, dependents stay blocked
- → upload--streaming--failed.md
-
- ⚠ Spike T1 invalidated hypothesis
- Complete: 1/3 tasks (2 blocked by failed experiment)
-
- Next: Run /df:plan to generate new hypothesis spike
- ```
+ Note: If the agent correctly reports `status: failed`, the "overriding to FAILED" line is omitted — the verifier simply confirms failure.
 
  ### With Checkpoint
 
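The verifier-override behavior in these examples (recompute the criterion arithmetic and overrule the agent's claimed status when the numbers disagree) can be sketched as follows. Hypothetical helper, assuming a single numeric >=-target criterion like the throughput example in this diff:

```python
def verify_spike(agent_status, target, actual):
    """Recheck the success criterion; flag an override when the agent
    claimed success but the measurement fails the target."""
    passed = actual >= target
    override = agent_status == "success" and not passed
    return passed, override

# throughput 8500 >= 7000 → verified; throughput 1500 < 7000 → failed, override
```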
@@ -17,17 +17,11 @@ Compare specs against codebase and past experiments. Generate prioritized tasks.
 
  ## Spec File States
 
- ```
- specs/
- feature.md → New, needs planning (this command reads these)
- doing-auth.md → In progress, has tasks in PLAN.md
- done-payments.md → Completed, history embedded
- ```
-
- **Filtering:**
- - New: `specs/*.md` excluding `doing-*` and `done-*`
- - In progress: `specs/doing-*.md`
- - Completed: `specs/done-*.md`
+ | Prefix | State | Action |
+ |--------|-------|--------|
+ | (none) | New | Plan this |
+ | `doing-` | In progress | Skip |
+ | `done-` | Completed | Skip |
 
  ## Behavior
 
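The prefix table above maps directly to a small classifier. Illustrative only, not package code; state names follow the table:

```python
def spec_state(path: str) -> str:
    """Classify a specs/*.md file by its filename prefix."""
    name = path.rsplit("/", 1)[-1]
    if name.startswith("doing-"):
        return "in-progress"   # skip: already has tasks in PLAN.md
    if name.startswith("done-"):
        return "completed"     # skip: history embedded
    return "new"               # plan this
```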
@@ -83,20 +77,7 @@ Include patterns in task descriptions for agents to follow.
 
  ### 4. ANALYZE CODEBASE
 
- **NEVER use `run_in_background` for Explore agents** — causes late "Agent completed" notifications that pollute output after work is done.
-
- **NEVER use TaskOutput** — returns full agent transcripts (100KB+) that explode context.
-
- **Spawn ALL Explore agents in ONE message (non-background, parallel):**
-
- ```python
- # All in single message — runs in parallel, blocks until all complete:
- Task(subagent_type="Explore", model="haiku", prompt="Find: ...")
- Task(subagent_type="Explore", model="haiku", prompt="Find: ...")
- Task(subagent_type="Explore", model="haiku", prompt="Find: ...")
- # Each returns agent's final message only (not full transcript)
- # No late notifications — agents complete before orchestrator proceeds
- ```
+ Follow `templates/explore-agent.md` for spawn rules, prompt structure, and scope restrictions.
 
  Scale agent count based on codebase size:
 
@@ -107,35 +88,6 @@ Scale agent count based on codebase size:
  | 100-500 | 25-40 |
  | 500+ | 50-100 (cap) |
 
- **Explore Agent Prompt Structure:**
- ```
- Find: [specific question]
-
- Return ONLY:
- - File paths matching criteria
- - One-line description per file
- - Integration points (if asked)
-
- DO NOT:
- - Read or summarize spec files
- - Make recommendations
- - Propose solutions
- - Generate tables or lengthy explanations
-
- Max response: 500 tokens (configurable via .deepflow/config.yaml explore.max_tokens)
- ```
-
- **Explore Agent Scope Restrictions:**
- - MUST only report factual findings:
- - Files found
- - Patterns/conventions observed
- - Integration points
- - MUST NOT:
- - Make recommendations
- - Propose architectures
- - Read and summarize specs (that's orchestrator's job)
- - Draw conclusions about what should be built
-
  **Use `code-completeness` skill patterns** to search for:
  - Implementations matching spec requirements
  - TODO, FIXME, HACK comments
@@ -144,28 +96,9 @@ Max response: 500 tokens (configurable via .deepflow/config.yaml explore.max_tok
 
  ### 5. COMPARE & PRIORITIZE
 
- **Use Task tool to spawn reasoner agent:**
- ```
- Task tool parameters:
- - subagent_type: "reasoner"
- - model: "opus"
- ```
+ Spawn `Task(subagent_type="reasoner", model="opus")`. Reasoner maps each requirement to DONE / PARTIAL / MISSING / CONFLICT. Flag spec gaps; don't silently assume.
 
- Reasoner performs analysis:
-
- | Status | Action |
- |--------|--------|
- | DONE | Skip |
- | PARTIAL | Task to complete |
- | MISSING | Task to implement |
- | CONFLICT | Flag for review |
-
- **Spec gaps:** If spec is ambiguous or missing details, note in output (don't silently assume).
-
- **Priority order:**
- 1. Dependencies — blockers first
- 2. Impact — core features before enhancements
- 3. Risk — unknowns early
+ **Priority order:** Dependencies → Impact → Risk
 
  ### 6. GENERATE SPIKE TASKS (IF NEEDED)
 
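The three-level priority order (Dependencies → Impact → Risk) is a lexicographic sort. A sketch over a hypothetical task shape, where `deps` counts unresolved blockers, and higher `impact`/`risk` means more core / more unknown:

```python
def prioritize(tasks):
    """Blockers first (fewest unresolved deps), then core features
    (higher impact), then unknowns early (higher risk)."""
    return sorted(tasks, key=lambda t: (t["deps"], -t["impact"], -t["risk"]))

backlog = [
    {"id": "T2", "deps": 1, "impact": 3, "risk": 1},
    {"id": "T1", "deps": 0, "impact": 2, "risk": 2},
    {"id": "T3", "deps": 0, "impact": 2, "risk": 3},
]
# → T3 (riskier) ahead of T1; T2 last (still blocked)
```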
@@ -186,66 +119,53 @@ Reasoner performs analysis:
  - Blocked by: none
  ```
 
- **Blocking Logic:**
- - All implementation tasks MUST have `Blocked by: T{spike}` until spike passes
- - After spike completes:
- - If passed: Update experiment to `--passed.md`, unblock implementation tasks
- - If failed: Update experiment to `--failed.md`, DO NOT generate implementation tasks
-
- **Full Implementation Only After Spike:**
- - Only generate full task list when spike validates the approach
- - Never generate 10-task waterfall without validated hypothesis
+ **Blocking Logic:** All implementation tasks MUST have `Blocked by: T{spike}` until spike passes. If spike fails: update to `--failed.md`, DO NOT generate implementation tasks.
 
  ### 7. VALIDATE HYPOTHESES
 
- Test risky assumptions before finalizing plan.
+ For unfamiliar APIs, ambiguous approaches, or performance-critical work: prototype in scratchpad (not committed). If assumption fails, write `.deepflow/experiments/{topic}--{hypothesis}--failed.md`. Skip for well-known patterns/simple CRUD.
 
- **Validate when:** Unfamiliar APIs, multiple approaches possible, external integrations, performance-critical
+ ### 8. CLEANUP PLAN.md
 
- **Process:**
- 1. Prototype in scratchpad (not committed)
- 2. Test assumption
- 3. If fails → Write `.deepflow/experiments/{topic}--{hypothesis}--failed.md`
- 4. Adjust approach, document in task
+ Before writing new tasks, prune stale sections:
 
- **Skip:** Well-known patterns, simple CRUD, clear docs exist
+ ```
+ For each ### section in PLAN.md:
+ Extract spec name from header (e.g. "doing-upload" or "done-upload")
+ If specs/done-{name}.md exists:
+ → Remove the ENTIRE section: header, tasks, execution summary, fix tasks, separators
+ If header references a spec with no matching specs/doing-*.md or specs/done-*.md:
+ → Remove it (orphaned section)
+ ```
+
+ Also recalculate the Summary table (specs analyzed, tasks created/completed/pending) to reflect only remaining sections.
+
+ If PLAN.md becomes empty after cleanup, delete the file and recreate fresh.
 
- ### 8. OUTPUT PLAN.md
+ ### 9. OUTPUT PLAN.md
 
  Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validation findings.
 
- ### 9. RENAME SPECS
+ ### 10. RENAME SPECS
 
  `mv specs/feature.md specs/doing-feature.md`
 
- ### 10. REPORT
+ ### 11. REPORT
 
  `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`
 
- ### 11. CAPTURE DECISIONS
+ ### 12. CAPTURE DECISIONS
 
- Extract up to 4 candidate decisions (approaches chosen, spike strategies, prioritization rationale). Present via AskUserQuestion with `multiSelect: true`. Each option: `label: "[TAG] <decision>"`, `description: "<rationale>"`. Tags: `[APPROACH]`, `[PROVISIONAL]`, `[ASSUMPTION]`.
-
- Append confirmed decisions to `.deepflow/decisions.md` (create if missing):
- ```
- ### {YYYY-MM-DD} — plan
- - [TAG] Decision text — rationale summary
- ```
- If a decision contradicts a prior entry, add: `(supersedes: <prior text>)`
+ Follow the **default** variant from `templates/decision-capture.md`. Command name: `plan`.
 
  ## Rules
- - **Never use TaskOutput** — Returns full transcripts that explode context
- - **Never use run_in_background for Explore agents** — Causes late notifications that pollute output
  - **Spike-first** — Generate spike task before full implementation if no `--passed.md` experiment exists
  - **Block on spike** — Full implementation tasks MUST be blocked by spike validation
  - **Learn from failures** — Extract "next hypothesis" from failed experiments, never repeat same approach
- - **Learn from history** — Check past experiments before proposing approaches
- - **Plan only** — Do NOT implement anything (except quick validation prototypes)
- - **Validate before commit** — Test risky assumptions with minimal experiments
+ - **Plan only** — Do NOT implement (except quick validation prototypes)
  - **Confirm before assume** — Search code before marking "missing"
  - **One task = one logical unit** — Atomic, committable
- - Prefer existing utilities over new code
- - Flag spec gaps, don't silently ignore
+ - Prefer existing utilities over new code; flag spec gaps
 
  ## Agent Scaling
 
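The CLEANUP PLAN.md pseudocode added in this hunk can be sketched concretely. Hypothetical helper, not package code; it keeps a `### doing-*` section only while a matching spec is still in flight, dropping done or orphaned sections along with their task lines:

```python
import re

def prune_plan(plan_md: str, existing_doing: set) -> str:
    """Keep only '### <name>' sections whose name is in existing_doing;
    lines under a dropped header are dropped with it."""
    out, keep = [], True
    for line in plan_md.splitlines():
        m = re.match(r"###\s+(\S+)", line)
        if m:
            keep = m.group(1) in existing_doing
        if keep:
            out.append(line)
    return "\n".join(out)
```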
@@ -254,7 +174,7 @@ If a decision contradicts a prior entry, add: `(supersedes: <prior text>)`
  | Explore (search) | haiku | 10 | +1 per 20 files |
  | Reasoner (analyze) | opus | 5 | +1 per 2 specs |
 
- **IMPORTANT**: Always use the `Task` tool with explicit `subagent_type` and `model` parameters. Do NOT use Glob/Grep/Read directly for codebase analysis - spawn agents instead.
+ Always use the `Task` tool with explicit `subagent_type` and `model`. Do NOT use Glob/Grep/Read directly.
 
  ## Example
 
@@ -93,13 +93,7 @@ Word count target: 200-500 words. Do not pad. Do not truncate important informat
 
  ## Rules
 
- - **NEVER write any file** — not decisions.md, not PLAN.md, not any new file
- - **NEVER use AskUserQuestion** — this command is read-only, no interaction
- - **NEVER spawn agents** — read directly using Bash (git log) and Read tool
- - **NEVER use TaskOutput** — returns full transcripts that explode context
- - **NEVER use EnterPlanMode or ExitPlanMode**
  - Read sources in a single pass — do not loop or re-read
- - If a source file is missing, skip it and note it only if relevant
  - Contradicted decisions: show newest entry per topic only
  - Token budget: stay within ~2500 tokens of input to produce ~500 words of output