deepflow 0.1.49 → 0.1.50
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +24 -9
- package/bin/install.js +1 -1
- package/package.json +1 -1
- package/src/commands/df/debate.md +31 -146
- package/src/commands/df/discover.md +1 -20
- package/src/commands/df/execute.md +49 -264
- package/src/commands/df/plan.md +32 -112
- package/src/commands/df/resume.md +0 -6
- package/src/commands/df/spec.md +4 -58
- package/src/commands/df/verify.md +39 -167
- package/templates/decision-capture.md +19 -0
- package/templates/explore-agent.md +34 -0
package/src/commands/df/execute.md CHANGED

@@ -62,36 +62,7 @@ Each task = one background agent. Use agent completion notifications as the feed
  - Yes → proceed to next wave or write final summary
  ```

-
- - Run Bash commands to poll/monitor
- - Try to read result files before notifications arrive
- - Write summaries before all wave agents complete
-
- **On notification, respond briefly:**
- - ONE line per completed agent: "✓ T1: success (abc123)"
- - Only write full summary after ALL wave agents complete
- - Do NOT repeat the full execution status on every notification
-
- ```python
- # Step 1: Spawn wave (ONE message, then STOP)
- Task(subagent_type="general-purpose", model="sonnet", run_in_background=True, prompt="T1: ...")
- Task(subagent_type="general-purpose", model="sonnet", run_in_background=True, prompt="T2: ...")
- Task(subagent_type="general-purpose", model="sonnet", run_in_background=True, prompt="T3: ...")
- # Turn ends here. Wait for notifications.
-
- # Step 2: On "Agent T1 completed" notification:
- Read("{worktree}/.deepflow/results/T1.yaml")
- # Output: "✓ T1: success (abc123)" — then STOP, wait for next
-
- # Step 3: On "Agent T2 completed" notification:
- Read("{worktree}/.deepflow/results/T2.yaml")
- # Output: "✓ T2: success (def456)" — then STOP, wait for next
-
- # Step 4: On "Agent T3 completed" notification (last one):
- Read("{worktree}/.deepflow/results/T3.yaml")
- # Output: "✓ T3: success (ghi789)"
- # All done → proceed to next wave or final summary
- ```
+ After spawning, your turn ENDS. Per notification: read result file, output ONE line ("✓ T1: success (abc123)"), update PLAN.md. Write full summary only after ALL wave agents complete.

  Result file `.deepflow/results/{task_id}.yaml`:
  ```yaml
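As an aside, a result file of the kind referenced above might look like the following sketch. This is hypothetical: the field names (`task_id`, `status`, `commit`, `tests_ran`) are inferred from mentions elsewhere in this diff, not from deepflow's authoritative schema.

```yaml
# Hypothetical example of .deepflow/results/T1.yaml — fields inferred
# from the surrounding diff, not the canonical schema.
task_id: T1
status: success
commit: abc1234
tests_ran: true
```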
@@ -145,10 +116,8 @@ experiment_file: ".deepflow/experiments/upload--streaming--failed.md"
  }
  ```

- Note: `completed_tasks` is kept for backward compatibility but is now derivable from PLAN.md `[x]` entries. The native task system (TaskList) is the primary source for runtime task status.
-
  **On checkpoint:** Complete wave → update PLAN.md → save to worktree → exit.
- **Resume:** `--continue` loads checkpoint, verifies worktree, skips completed tasks.
+ **Resume:** `--continue` loads checkpoint, verifies worktree, skips completed tasks.

  ## Behavior

@@ -200,27 +169,7 @@ If missing: "No PLAN.md found. Run /df:plan first."

  ### 2.5. REGISTER NATIVE TASKS

-
-
- **For each uncompleted task (`[ ]`) in PLAN.md:**
-
- ```
- 1. TaskCreate:
-    - subject: "{task_id}: {description}" (e.g. "T1: Create upload endpoint")
-    - description: Full task block from PLAN.md (files, blocked by, type, etc.)
-    - activeForm: "{gerund form of description}" (e.g. "Creating upload endpoint")
-
- 2. Store mapping: PLAN.md task_id (T1) → native task ID
- ```
-
- **After all tasks created, set up dependencies:**
-
- ```
- For each task with "Blocked by: T{n}, T{m}":
-   TaskUpdate(taskId: native_id, addBlockedBy: [native_id_of_Tn, native_id_of_Tm])
- ```
-
- **On `--continue`:** Only create tasks for remaining `[ ]` items (skip `[x]` completed).
+ For each `[ ]` task in PLAN.md: `TaskCreate(subject: "{task_id}: {description}", activeForm: "{gerund}", description: full block)`. Store task_id → native ID mapping. Then set dependencies: `TaskUpdate(addBlockedBy: [...])` for each "Blocked by:" entry. On `--continue`: only register remaining `[ ]` items.

  ### 3. CHECK FOR UNPLANNED SPECS

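For orientation, the "Blocked by:" wiring described in this hunk can be sketched in a few lines. This is an illustrative helper, not deepflow source; the PLAN.md task-block layout (`- [ ] **T1**: ...` with a `Blocked by:` sub-line) is assumed from examples elsewhere in this diff.

```python
# Illustrative sketch (not deepflow source): derive the dependency map
# that TaskUpdate(addBlockedBy: [...]) calls would be built from.
import re

def dependency_map(plan_text: str) -> dict[str, list[str]]:
    """Map each task id (e.g. 'T2') to the task ids it is blocked by."""
    deps: dict[str, list[str]] = {}
    current = None
    for line in plan_text.splitlines():
        m = re.match(r"- \[[ x]\] \*\*(T\d+)\*\*", line.strip())
        if m:
            current = m.group(1)
            deps[current] = []
            continue
        b = re.search(r"Blocked by:\s*(.+)", line)
        if b and current and b.group(1).strip().lower() != "none":
            deps[current] = re.findall(r"T\d+", b.group(1))
    return deps

plan = """\
- [ ] **T1**: Create upload endpoint
  - Blocked by: none
- [ ] **T2**: Add S3 service
  - Blocked by: T1
"""
print(dependency_map(plan))  # {'T1': [], 'T2': ['T1']}
```

The mapping from PLAN.md ids to native task ids would then be applied on top of this structure.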
@@ -302,24 +251,9 @@ TaskUpdate(taskId: native_id, status: "in_progress")
  ```
  This activates the UI spinner showing the task's activeForm (e.g. "Creating upload endpoint").

- **
-
- DO NOT spawn one task, wait, then spawn another. Instead, call Task tool multiple times in the SAME message block. This enables true parallelism.
+ **NEVER use `isolation: "worktree"` on Task tool calls.** Deepflow manages a shared worktree per spec (`.deepflow/worktrees/{spec}/`) so wave 2 agents see wave 1 commits. Claude Code's native isolation creates separate per-agent worktrees (`.claude/worktrees/`) where agents can't see each other's work.

-
-
- ```
- // In a SINGLE assistant message, invoke Task with run_in_background=true:
- Task(subagent_type="general-purpose", model="sonnet", run_in_background=true, prompt="T1: ...")
- Task(subagent_type="general-purpose", model="sonnet", run_in_background=true, prompt="T2: ...")
- Task(subagent_type="general-purpose", model="sonnet", run_in_background=true, prompt="T3: ...")
- // Turn ends here. Wait for completion notifications.
- ```
-
- **WRONG (sequential):** Send message with Task for T1 → wait → send message with Task for T2 → wait → ...
- **RIGHT (parallel):** Send ONE message with Task for T1, T2, T3 all together, then STOP
-
- Same-file conflicts: spawn sequentially instead.
+ **Spawn ALL ready tasks in ONE message** with multiple Task tool calls (true parallelism). Same-file conflicts: spawn sequentially.

  **Spike Task Execution:**
  When spawning a spike task, the agent MUST:
@@ -378,58 +312,40 @@ VERIFIED_PASS →

  VERIFIED_FAIL →
    # Spike task stays as pending, dependents remain blocked
-   # No TaskUpdate needed — native system keeps them blocked
    Log "✗ Spike {task_id} failed verification"
    If override: log "⚠ Agent incorrectly marked as passed"
  ```

-
- ```
- Task tool parameters:
- - subagent_type: "reasoner"
- - model: "opus"
- - prompt: "Debug failure: {error details}"
- ```
+ On task failure: spawn `Task(subagent_type="reasoner", model="opus", prompt="Debug failure: {error details}")`.

  ### 7. PER-TASK (agent prompt)

- **
+ **Common preamble (include in all agent prompts):**
+ ```
+ Working directory: {worktree_absolute_path}
+ All file operations MUST use this absolute path as base. Do NOT write files to the main project directory.
+ Commit format: {commit_type}({spec}): {description}
+ Result file: {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
+
+ STOP after writing the result file. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main. These are handled by the orchestrator and /df:verify.
+ ```
+
+ **Standard Task (append after preamble):**
  ```
  {task_id}: {description from PLAN.md}
  Files: {target files}
  Spec: {spec_name}

- **IMPORTANT: Working Directory**
- All file operations MUST use this absolute path as base:
- {worktree_absolute_path}
-
- Example: To edit src/foo.ts, use:
- {worktree_absolute_path}/src/foo.ts
-
- Do NOT write files to the main project directory.
-
  Steps:
  1. Implement the task
- 2. Detect
-
-
-
-
-
-
-
- 6. Write result file with ALL fields including test evidence (see schema):
-    {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
-
- **STOP after writing the result file. Do NOT:**
- - Merge branches or cherry-pick commits
- - Rename or move spec files (doing-* → done-*)
- - Remove worktrees or delete branches
- - Run git checkout on main
- These are handled by the orchestrator and /df:verify.
- ```
-
- **Spike Task:**
+ 2. Detect and run the project's test command if test infrastructure exists
+    - If tests fail: fix and re-run until passing. Do NOT commit with failing tests
+    - If NO test infrastructure: set tests_ran: false in result file
+ 3. Commit as feat({spec}): {description}
+ 4. Write result file with ALL fields including test evidence (see schema)
+ ```
+
+ **Spike Task (append after preamble):**
  ```
  {task_id} [SPIKE]: {hypothesis}
  Type: spike
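The preamble-plus-task-block structure added in this hunk amounts to simple string templating. The sketch below is illustrative only: the field set mirrors the diff, but the exact composition is an assumption, not deepflow source.

```python
# Illustrative sketch of composing a per-task agent prompt from the
# common preamble plus a task block (hypothetical helper).
PREAMBLE = (
    "Working directory: {worktree}\n"
    "All file operations MUST use this absolute path as base. "
    "Do NOT write files to the main project directory.\n"
    "Commit format: {commit_type}({spec}): {description}\n"
    "Result file: {worktree}/.deepflow/results/{task_id}.yaml\n"
)

def agent_prompt(task_id: str, description: str, spec: str, worktree: str,
                 commit_type: str = "feat") -> str:
    head = PREAMBLE.format(worktree=worktree, commit_type=commit_type,
                           spec=spec, description=description, task_id=task_id)
    return f"{head}\n{task_id}: {description}\nSpec: {spec}\n"

p = agent_prompt("T1", "Create upload endpoint", "upload",
                 "/repo/.deepflow/worktrees/upload")
print(p.splitlines()[0])  # Working directory: /repo/.deepflow/worktrees/upload
```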
@@ -437,8 +353,6 @@ Method: {minimal steps}
  Success criteria: {measurable targets}
  Experiment file: {worktree_absolute_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md

- Working directory: {worktree_absolute_path}
-
  Steps:
  1. Execute method
  2. For EACH criterion: record target, measure actual, compare (show math)
@@ -453,61 +367,31 @@ Rules:
  - Worse than baseline = FAILED (baseline 7k, actual 1.5k → FAILED)
  - "Close enough" = FAILED
  - Verifier will check. False positives waste resources.
- - STOP after writing result file. Do NOT merge, rename specs, or clean up worktrees.
  ```

  ### 8. FAILURE HANDLING

  When a task fails and cannot be auto-fixed:

-
- ```
- TaskUpdate(taskId: native_id, status: "pending")  # Reset to pending, not deleted
- ```
- This keeps the task visible for retry. Dependent tasks remain blocked.
-
- **Behavior:**
- 1. Leave worktree intact at `{worktree_path}`
- 2. Keep checkpoint.json for potential resume
- 3. Output debugging instructions
-
- **Output:**
- ```
- ✗ Task T3 failed after retry
-
- Worktree preserved for debugging:
- Path: .deepflow/worktrees/upload
- Branch: df/upload
-
- To investigate:
- cd .deepflow/worktrees/upload
- # examine files, run tests, etc.
-
- To resume after fixing:
- /df:execute --continue
-
- To discard and start fresh:
- /df:execute --fresh
- ```
-
- **Key points:**
- - Never auto-delete worktree on failure (cleanup_on_fail: false by default)
- - Always provide the exact cleanup commands
- - Checkpoint remains so --continue can work after manual fix
+ `TaskUpdate(taskId: native_id, status: "pending")` — keeps task visible for retry; dependents remain blocked. Leave worktree intact, keep checkpoint.json, output: worktree path/branch, `cd {worktree_path}` to investigate, `/df:execute --continue` to resume, `/df:execute --fresh` to discard.

  ### 9. COMPLETE SPECS

  When all tasks done for a `doing-*` spec:
- 1. Embed history in spec: `## Completed` section
+ 1. Embed history in spec: `## Completed` section with task list and commit hashes
  2. Rename: `doing-upload.md` → `done-upload.md`
- 3. Remove section from PLAN.md
+ 3. Remove the spec's ENTIRE section from PLAN.md:
+    - The `### doing-{spec}` header
+    - All task entries (`- [x] **T{n}**: ...` and their sub-items)
+    - Any `## Execution Summary` block for that spec
+    - Any `### Fix Tasks` sub-section for that spec
+    - Separators (`---`) between removed sections
+ 4. Recalculate the Summary table at the top of PLAN.md (update counts for completed/pending)

  ### 10. ITERATE (Notification-Driven)

  After spawning wave agents, your turn ENDS. Completion notifications drive the loop.

- **NEVER use TaskOutput** — it explodes context.
-
  **Per notification:**
  1. Read result file for the completed agent
  2. Validate test evidence:
@@ -526,19 +410,7 @@ After spawning wave agents, your turn ENDS. Completion notifications drive the l

  ### 11. CAPTURE DECISIONS

-
-
- Present via AskUserQuestion with multiSelect: true. Labels: `[TAG] decision text`. Descriptions: rationale.
-
- For each confirmed decision, append to **main tree** `.deepflow/decisions.md` (create if missing):
- ```
- ### {YYYY-MM-DD} — execute
- - [APPROACH] Parallel agent spawn for independent tasks — confirmed no file conflicts
- ```
-
- Main tree path: use the repo root (parent of `.deepflow/worktrees/`), NOT the worktree.
-
- Max 4 candidates per prompt. Tags: [APPROACH], [PROVISIONAL], [ASSUMPTION].
+ Follow the **main-tree** variant from `templates/decision-capture.md`. Command name: `execute`.

  ## Rules

@@ -555,48 +427,22 @@ Max 4 candidates per prompt. Tags: [APPROACH], [PROVISIONAL], [ASSUMPTION].
  ```
  /df:execute (context: 12%)

- Loading PLAN.md...
-
- T2: Add S3 service (blocked by T1)
- T3: Add auth guard (blocked by T1)
-
- Registering native tasks...
- TaskCreate → T1 (native: task-001)
- TaskCreate → T2 (native: task-002)
- TaskCreate → T3 (native: task-003)
- TaskUpdate(task-002, addBlockedBy: [task-001])
- TaskUpdate(task-003, addBlockedBy: [task-001])
-
- Spawning Wave 1: T1
- TaskUpdate(task-001, status: "in_progress") ← spinner: "Creating upload endpoint"
-
- [Agent "T1" completed]
- TaskUpdate(task-001, status: "completed") ← auto-unblocks task-002, task-003
- ✓ T1: success (abc1234)
-
- TaskList → task-002, task-003 now ready (blockedBy empty)
-
- Spawning Wave 2: T2, T3 parallel
- TaskUpdate(task-002, status: "in_progress")
- TaskUpdate(task-003, status: "in_progress")
-
- [Agent "T2" completed]
- TaskUpdate(task-002, status: "completed")
- ✓ T2: success (def5678)
+ Loading PLAN.md... T1 ready, T2/T3 blocked by T1
+ Registering native tasks: TaskCreate T1/T2/T3, TaskUpdate(T2 blockedBy T1), TaskUpdate(T3 blockedBy T1)

-
-
-
+ Wave 1: TaskUpdate(T1, in_progress)
+ [Agent "T1" completed] TaskUpdate(T1, completed) → auto-unblocks T2, T3
+ ✓ T1: success (abc1234)

- Wave 2
-
- ✓
- ✓ Complete: 3/3
+ Wave 2: TaskUpdate(T2/T3, in_progress)
+ [Agent "T2" completed] ✓ T2: success (def5678)
+ [Agent "T3" completed] ✓ T3: success (ghi9012)
+ Context: 35% — ✓ doing-upload → done-upload. Complete: 3/3

  Next: Run /df:verify to verify specs and merge to main
  ```

- ### Spike
+ ### Spike with Failure (Agent or Verifier Override)

  ```
  /df:execute (context: 10%)
@@ -611,56 +457,17 @@ Registering native tasks...

  Checking experiment status...
  T1 [SPIKE]: No experiment yet, spike executable
- T2: Blocked by T1 (spike not validated)
- T3: Blocked by T1 (spike not validated)
+ T2, T3: Blocked by T1 (spike not validated)

  Spawning Wave 1: T1 [SPIKE]
  TaskUpdate(task-001, status: "in_progress")

  [Agent "T1 SPIKE" completed]
- ✓ T1: complete, verifying...
-
- Verifying T1...
- ✓ Spike T1 verified (throughput 8500 >= 7000)
- TaskUpdate(task-001, status: "completed") ← auto-unblocks task-002, task-003
- → upload--streaming--passed.md
-
- TaskList → task-002, task-003 now ready
-
- Spawning Wave 2: T2, T3 parallel
- TaskUpdate(task-002, status: "in_progress")
- TaskUpdate(task-003, status: "in_progress")
-
- [Agent "T2" completed]
- TaskUpdate(task-002, status: "completed")
- ✓ T2: success (def5678)
-
- [Agent "T3" completed]
- TaskUpdate(task-003, status: "completed")
- ✓ T3: success (ghi9012)
-
- Wave 2 complete (2/2). Context: 40%
-
- ✓ doing-upload → done-upload
- ✓ Complete: 3/3 tasks
-
- Next: Run /df:verify to verify specs and merge to main
- ```
-
- ### Spike Failed (Agent Correctly Reported)
-
- ```
- /df:execute (context: 10%)
-
- Registering native tasks...
- TaskCreate → T1 [SPIKE], T2, T3 (with dependencies)
-
- Wave 1: T1 [SPIKE] (context: 15%)
- TaskUpdate(task-001, status: "in_progress")
- T1: complete, verifying...
+ ✓ T1: complete (agent said: success), verifying...

  Verifying T1...
  ✗ Spike T1 failed verification (throughput 1500 < 7000)
+ ⚠ Agent incorrectly marked as passed — overriding to FAILED
  # Spike stays pending — dependents remain blocked
  → upload--streaming--failed.md

@@ -670,29 +477,7 @@ Complete: 1/3 tasks (2 blocked by failed experiment)
  Next: Run /df:plan to generate new hypothesis spike
  ```

-
-
- ```
- /df:execute (context: 10%)
-
- Registering native tasks...
- TaskCreate → T1 [SPIKE], T2, T3 (with dependencies)
-
- Wave 1: T1 [SPIKE] (context: 15%)
- TaskUpdate(task-001, status: "in_progress")
- T1: complete (agent said: success), verifying...
-
- Verifying T1...
- ✗ Spike T1 failed verification (throughput 1500 < 7000)
- ⚠ Agent incorrectly marked as passed — overriding to FAILED
- TaskUpdate(task-001, status: "pending") ← reset, dependents stay blocked
- → upload--streaming--failed.md
-
- ⚠ Spike T1 invalidated hypothesis
- Complete: 1/3 tasks (2 blocked by failed experiment)
-
- Next: Run /df:plan to generate new hypothesis spike
- ```
+ Note: If the agent correctly reports `status: failed`, the "overriding to FAILED" line is omitted — the verifier simply confirms failure.

  ### With Checkpoint

package/src/commands/df/plan.md CHANGED

@@ -17,17 +17,11 @@ Compare specs against codebase and past experiments. Generate prioritized tasks.

  ## Spec File States

-
-
-
-
-
- ```
-
- **Filtering:**
- - New: `specs/*.md` excluding `doing-*` and `done-*`
- - In progress: `specs/doing-*.md`
- - Completed: `specs/done-*.md`
+ | Prefix | State | Action |
+ |--------|-------|--------|
+ | (none) | New | Plan this |
+ | `doing-` | In progress | Skip |
+ | `done-` | Completed | Skip |

  ## Behavior

@@ -83,20 +77,7 @@ Include patterns in task descriptions for agents to follow.

  ### 4. ANALYZE CODEBASE

-
-
- **NEVER use TaskOutput** — returns full agent transcripts (100KB+) that explode context.
-
- **Spawn ALL Explore agents in ONE message (non-background, parallel):**
-
- ```python
- # All in single message — runs in parallel, blocks until all complete:
- Task(subagent_type="Explore", model="haiku", prompt="Find: ...")
- Task(subagent_type="Explore", model="haiku", prompt="Find: ...")
- Task(subagent_type="Explore", model="haiku", prompt="Find: ...")
- # Each returns agent's final message only (not full transcript)
- # No late notifications — agents complete before orchestrator proceeds
- ```
+ Follow `templates/explore-agent.md` for spawn rules, prompt structure, and scope restrictions.

  Scale agent count based on codebase size:

@@ -107,35 +88,6 @@ Scale agent count based on codebase size:
  | 100-500 | 25-40 |
  | 500+ | 50-100 (cap) |

- **Explore Agent Prompt Structure:**
- ```
- Find: [specific question]
-
- Return ONLY:
- - File paths matching criteria
- - One-line description per file
- - Integration points (if asked)
-
- DO NOT:
- - Read or summarize spec files
- - Make recommendations
- - Propose solutions
- - Generate tables or lengthy explanations
-
- Max response: 500 tokens (configurable via .deepflow/config.yaml explore.max_tokens)
- ```
-
- **Explore Agent Scope Restrictions:**
- - MUST only report factual findings:
-   - Files found
-   - Patterns/conventions observed
-   - Integration points
- - MUST NOT:
-   - Make recommendations
-   - Propose architectures
-   - Read and summarize specs (that's orchestrator's job)
-   - Draw conclusions about what should be built
-
  **Use `code-completeness` skill patterns** to search for:
  - Implementations matching spec requirements
  - TODO, FIXME, HACK comments
@@ -144,28 +96,9 @@ Max response: 500 tokens (configurable via .deepflow/config.yaml explore.max_tok

  ### 5. COMPARE & PRIORITIZE

-
- ```
- Task tool parameters:
- - subagent_type: "reasoner"
- - model: "opus"
- ```
+ Spawn `Task(subagent_type="reasoner", model="opus")`. Reasoner maps each requirement to DONE / PARTIAL / MISSING / CONFLICT. Flag spec gaps; don't silently assume.

-
-
- | Status | Action |
- |--------|--------|
- | DONE | Skip |
- | PARTIAL | Task to complete |
- | MISSING | Task to implement |
- | CONFLICT | Flag for review |
-
- **Spec gaps:** If spec is ambiguous or missing details, note in output (don't silently assume).
-
- **Priority order:**
- 1. Dependencies — blockers first
- 2. Impact — core features before enhancements
- 3. Risk — unknowns early
+ **Priority order:** Dependencies → Impact → Risk

  ### 6. GENERATE SPIKE TASKS (IF NEEDED)

@@ -186,66 +119,53 @@ Reasoner performs analysis:
  - Blocked by: none
  ```

- **Blocking Logic:**
- - All implementation tasks MUST have `Blocked by: T{spike}` until spike passes
- - After spike completes:
-   - If passed: Update experiment to `--passed.md`, unblock implementation tasks
-   - If failed: Update experiment to `--failed.md`, DO NOT generate implementation tasks
-
- **Full Implementation Only After Spike:**
- - Only generate full task list when spike validates the approach
- - Never generate 10-task waterfall without validated hypothesis
+ **Blocking Logic:** All implementation tasks MUST have `Blocked by: T{spike}` until spike passes. If spike fails: update to `--failed.md`, DO NOT generate implementation tasks.

  ### 7. VALIDATE HYPOTHESES

-
+ For unfamiliar APIs, ambiguous approaches, or performance-critical work: prototype in scratchpad (not committed). If assumption fails, write `.deepflow/experiments/{topic}--{hypothesis}--failed.md`. Skip for well-known patterns/simple CRUD.

-
+ ### 8. CLEANUP PLAN.md

-
- 1. Prototype in scratchpad (not committed)
- 2. Test assumption
- 3. If fails → Write `.deepflow/experiments/{topic}--{hypothesis}--failed.md`
- 4. Adjust approach, document in task
+ Before writing new tasks, prune stale sections:

-
+ ```
+ For each ### section in PLAN.md:
+   Extract spec name from header (e.g. "doing-upload" or "done-upload")
+   If specs/done-{name}.md exists:
+     → Remove the ENTIRE section: header, tasks, execution summary, fix tasks, separators
+   If header references a spec with no matching specs/doing-*.md or specs/done-*.md:
+     → Remove it (orphaned section)
+ ```
+
+ Also recalculate the Summary table (specs analyzed, tasks created/completed/pending) to reflect only remaining sections.
+
+ If PLAN.md becomes empty after cleanup, delete the file and recreate fresh.

- ###
+ ### 9. OUTPUT PLAN.md

  Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validation findings.

- ###
+ ### 10. RENAME SPECS

  `mv specs/feature.md specs/doing-feature.md`

- ###
+ ### 11. REPORT

  `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`

- ###
+ ### 12. CAPTURE DECISIONS

-
-
- Append confirmed decisions to `.deepflow/decisions.md` (create if missing):
- ```
- ### {YYYY-MM-DD} — plan
- - [TAG] Decision text — rationale summary
- ```
- If a decision contradicts a prior entry, add: `(supersedes: <prior text>)`
+ Follow the **default** variant from `templates/decision-capture.md`. Command name: `plan`.

  ## Rules
- - **Never use TaskOutput** — Returns full transcripts that explode context
- - **Never use run_in_background for Explore agents** — Causes late notifications that pollute output
  - **Spike-first** — Generate spike task before full implementation if no `--passed.md` experiment exists
  - **Block on spike** — Full implementation tasks MUST be blocked by spike validation
  - **Learn from failures** — Extract "next hypothesis" from failed experiments, never repeat same approach
- - **
- - **Plan only** — Do NOT implement anything (except quick validation prototypes)
- - **Validate before commit** — Test risky assumptions with minimal experiments
+ - **Plan only** — Do NOT implement (except quick validation prototypes)
  - **Confirm before assume** — Search code before marking "missing"
  - **One task = one logical unit** — Atomic, committable
- - Prefer existing utilities over new code
- - Flag spec gaps, don't silently ignore
+ - Prefer existing utilities over new code; flag spec gaps

  ## Agent Scaling

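The cleanup pass added in this hunk could be approximated as follows. This is a hypothetical sketch, not deepflow source; the `### doing-{spec}` / `### done-{spec}` header format is assumed from PLAN.md conventions shown elsewhere in this diff, and a real implementation would also handle summary blocks and separators.

```python
# Hypothetical sketch of pruning completed spec sections from PLAN.md text.
import re

def prune_done_sections(plan_text: str, done_specs: set[str]) -> str:
    """Drop every '### doing-{spec}'/'### done-{spec}' section whose spec is done."""
    out, skipping = [], False
    for line in plan_text.splitlines():
        m = re.match(r"### (?:doing|done)-(.+)", line)
        if m:
            # Entering a new spec section: skip it iff the spec is done.
            skipping = m.group(1).strip() in done_specs
        if not skipping:
            out.append(line)
    return "\n".join(out)

plan = "### doing-upload\n- [x] **T1**: done\n### doing-auth\n- [ ] **T2**: pending"
print(prune_done_sections(plan, {"upload"}))
# ### doing-auth
# - [ ] **T2**: pending
```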
@@ -254,7 +174,7 @@ If a decision contradicts a prior entry, add: `(supersedes: <prior text>)`
  | Explore (search) | haiku | 10 | +1 per 20 files |
  | Reasoner (analyze) | opus | 5 | +1 per 2 specs |

-
+ Always use the `Task` tool with explicit `subagent_type` and `model`. Do NOT use Glob/Grep/Read directly.

  ## Example

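One literal reading of the "+1 per 20 files" increment in the scaling table above can be sketched as below. This is an interpretation only — the ramp deepflow actually intends may differ (the separate codebase-size table earlier in this diff suggests larger counts), and the base/cap values here are taken from the table rows, not from source code.

```python
# Hypothetical interpretation of the Explore-agent scaling rule:
# base 10 agents, +1 per 20 files, capped at 100.
def explore_agent_count(n_files: int, base: int = 10,
                        files_per_extra: int = 20, cap: int = 100) -> int:
    return min(base + n_files // files_per_extra, cap)

print(explore_agent_count(40))    # 12
print(explore_agent_count(5000))  # 100
```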
package/src/commands/df/resume.md CHANGED

@@ -93,13 +93,7 @@ Word count target: 200-500 words. Do not pad. Do not truncate important informat

  ## Rules

- - **NEVER write any file** — not decisions.md, not PLAN.md, not any new file
- - **NEVER use AskUserQuestion** — this command is read-only, no interaction
- - **NEVER spawn agents** — read directly using Bash (git log) and Read tool
- - **NEVER use TaskOutput** — returns full transcripts that explode context
- - **NEVER use EnterPlanMode or ExitPlanMode**
  - Read sources in a single pass — do not loop or re-read
- - If a source file is missing, skip it and note it only if relevant
  - Contradicted decisions: show newest entry per topic only
  - Token budget: stay within ~2500 tokens of input to produce ~500 words of output