deepflow 0.1.49 → 0.1.51
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +36 -19
- package/bin/install.js +13 -4
- package/hooks/df-consolidation-check.js +67 -0
- package/package.json +1 -1
- package/src/commands/df/consolidate.md +58 -0
- package/src/commands/df/debate.md +30 -149
- package/src/commands/df/discover.md +0 -21
- package/src/commands/df/execute.md +54 -267
- package/src/commands/df/note.md +1 -0
- package/src/commands/df/plan.md +30 -114
- package/src/commands/df/resume.md +0 -6
- package/src/commands/df/spec.md +3 -61
- package/src/commands/df/verify.md +59 -168
- package/templates/explore-agent.md +34 -0
@@ -62,36 +62,7 @@ Each task = one background agent. Use agent completion notifications as the feed
 - Yes → proceed to next wave or write final summary
 ```
 
-
-- Run Bash commands to poll/monitor
-- Try to read result files before notifications arrive
-- Write summaries before all wave agents complete
-
-**On notification, respond briefly:**
-- ONE line per completed agent: "✓ T1: success (abc123)"
-- Only write full summary after ALL wave agents complete
-- Do NOT repeat the full execution status on every notification
-
-```python
-# Step 1: Spawn wave (ONE message, then STOP)
-Task(subagent_type="general-purpose", model="sonnet", run_in_background=True, prompt="T1: ...")
-Task(subagent_type="general-purpose", model="sonnet", run_in_background=True, prompt="T2: ...")
-Task(subagent_type="general-purpose", model="sonnet", run_in_background=True, prompt="T3: ...")
-# Turn ends here. Wait for notifications.
-
-# Step 2: On "Agent T1 completed" notification:
-Read("{worktree}/.deepflow/results/T1.yaml")
-# Output: "✓ T1: success (abc123)" — then STOP, wait for next
-
-# Step 3: On "Agent T2 completed" notification:
-Read("{worktree}/.deepflow/results/T2.yaml")
-# Output: "✓ T2: success (def456)" — then STOP, wait for next
-
-# Step 4: On "Agent T3 completed" notification (last one):
-Read("{worktree}/.deepflow/results/T3.yaml")
-# Output: "✓ T3: success (ghi789)"
-# All done → proceed to next wave or final summary
-```
+After spawning, your turn ENDS. Per notification: read result file, output ONE line ("✓ T1: success (abc123)"), update PLAN.md. Write full summary only after ALL wave agents complete.
 
 Result file `.deepflow/results/{task_id}.yaml`:
 ```yaml
@@ -145,10 +116,8 @@ experiment_file: ".deepflow/experiments/upload--streaming--failed.md"
 }
 ```
 
-Note: `completed_tasks` is kept for backward compatibility but is now derivable from PLAN.md `[x]` entries. The native task system (TaskList) is the primary source for runtime task status.
-
 **On checkpoint:** Complete wave → update PLAN.md → save to worktree → exit.
-**Resume:** `--continue` loads checkpoint, verifies worktree, skips completed tasks.
+**Resume:** `--continue` loads checkpoint, verifies worktree, skips completed tasks.
 
 ## Behavior
 
@@ -200,27 +169,7 @@ If missing: "No PLAN.md found. Run /df:plan first."
 
 ### 2.5. REGISTER NATIVE TASKS
 
-
-
-**For each uncompleted task (`[ ]`) in PLAN.md:**
-
-```
-1. TaskCreate:
-   - subject: "{task_id}: {description}" (e.g. "T1: Create upload endpoint")
-   - description: Full task block from PLAN.md (files, blocked by, type, etc.)
-   - activeForm: "{gerund form of description}" (e.g. "Creating upload endpoint")
-
-2. Store mapping: PLAN.md task_id (T1) → native task ID
-```
-
-**After all tasks created, set up dependencies:**
-
-```
-For each task with "Blocked by: T{n}, T{m}":
-  TaskUpdate(taskId: native_id, addBlockedBy: [native_id_of_Tn, native_id_of_Tm])
-```
-
-**On `--continue`:** Only create tasks for remaining `[ ]` items (skip `[x]` completed).
+For each `[ ]` task in PLAN.md: `TaskCreate(subject: "{task_id}: {description}", activeForm: "{gerund}", description: full block)`. Store task_id → native ID mapping. Then set dependencies: `TaskUpdate(addBlockedBy: [...])` for each "Blocked by:" entry. On `--continue`: only register remaining `[ ]` items.
 
 ### 3. CHECK FOR UNPLANNED SPECS
 
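The "Blocked by:" registration above is what gates wave readiness: a task is spawnable only once all of its blockers are completed. A minimal sketch of that readiness rule in plain Python (the dict model and the `ready_tasks` name are illustrative assumptions; the package itself drives this through `TaskCreate`/`TaskUpdate` tool calls, not Python):

```python
# Illustrative model only: task_id -> list of blocker task_ids.
# Deepflow's real state lives in the native task system (TaskList).

def ready_tasks(tasks: dict, completed: set) -> list:
    """Tasks whose blockers are all completed and that aren't done yet."""
    return [tid for tid, blockers in tasks.items()
            if tid not in completed and all(b in completed for b in blockers)]

tasks = {"T1": [], "T2": ["T1"], "T3": ["T1"]}  # T2/T3 blocked by T1
print(ready_tasks(tasks, completed=set()))      # wave 1: ['T1']
print(ready_tasks(tasks, completed={"T1"}))     # wave 2: ['T2', 'T3']
```

This mirrors why completing T1 "auto-unblocks" T2 and T3 in the examples later in the diff: readiness is recomputed from the blocked-by graph, not tracked by hand.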
@@ -302,24 +251,9 @@ TaskUpdate(taskId: native_id, status: "in_progress")
 ```
 This activates the UI spinner showing the task's activeForm (e.g. "Creating upload endpoint").
 
-**
-
-DO NOT spawn one task, wait, then spawn another. Instead, call Task tool multiple times in the SAME message block. This enables true parallelism.
+**NEVER use `isolation: "worktree"` on Task tool calls.** Deepflow manages a shared worktree per spec (`.deepflow/worktrees/{spec}/`) so wave 2 agents see wave 1 commits. Claude Code's native isolation creates separate per-agent worktrees (`.claude/worktrees/`) where agents can't see each other's work.
 
-
-
-```
-// In a SINGLE assistant message, invoke Task with run_in_background=true:
-Task(subagent_type="general-purpose", model="sonnet", run_in_background=true, prompt="T1: ...")
-Task(subagent_type="general-purpose", model="sonnet", run_in_background=true, prompt="T2: ...")
-Task(subagent_type="general-purpose", model="sonnet", run_in_background=true, prompt="T3: ...")
-// Turn ends here. Wait for completion notifications.
-```
-
-**WRONG (sequential):** Send message with Task for T1 → wait → send message with Task for T2 → wait → ...
-**RIGHT (parallel):** Send ONE message with Task for T1, T2, T3 all together, then STOP
-
-Same-file conflicts: spawn sequentially instead.
+**Spawn ALL ready tasks in ONE message** with multiple Task tool calls (true parallelism). Same-file conflicts: spawn sequentially.
 
 **Spike Task Execution:**
 When spawning a spike task, the agent MUST:
@@ -378,58 +312,40 @@ VERIFIED_PASS →
 
 VERIFIED_FAIL →
   # Spike task stays as pending, dependents remain blocked
-  # No TaskUpdate needed — native system keeps them blocked
   Log "✗ Spike {task_id} failed verification"
   If override: log "⚠ Agent incorrectly marked as passed"
 ```
 
-
-```
-Task tool parameters:
-- subagent_type: "reasoner"
-- model: "opus"
-- prompt: "Debug failure: {error details}"
-```
+On task failure: spawn `Task(subagent_type="reasoner", model="opus", prompt="Debug failure: {error details}")`.
 
 ### 7. PER-TASK (agent prompt)
 
-**
+**Common preamble (include in all agent prompts):**
+```
+Working directory: {worktree_absolute_path}
+All file operations MUST use this absolute path as base. Do NOT write files to the main project directory.
+Commit format: {commit_type}({spec}): {description}
+Result file: {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
+
+STOP after writing the result file. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main. These are handled by the orchestrator and /df:verify.
+```
+
+**Standard Task (append after preamble):**
 ```
 {task_id}: {description from PLAN.md}
 Files: {target files}
 Spec: {spec_name}
 
-**IMPORTANT: Working Directory**
-All file operations MUST use this absolute path as base:
-{worktree_absolute_path}
-
-Example: To edit src/foo.ts, use:
-{worktree_absolute_path}/src/foo.ts
-
-Do NOT write files to the main project directory.
-
 Steps:
 1. Implement the task
-2. Detect
-
-
-
-
-
-
-
-6. Write result file with ALL fields including test evidence (see schema):
-   {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
-
-**STOP after writing the result file. Do NOT:**
-- Merge branches or cherry-pick commits
-- Rename or move spec files (doing-* → done-*)
-- Remove worktrees or delete branches
-- Run git checkout on main
-These are handled by the orchestrator and /df:verify.
-```
-
-**Spike Task:**
+2. Detect and run the project's test command if test infrastructure exists
+   - If tests fail: fix and re-run until passing. Do NOT commit with failing tests
+   - If NO test infrastructure: set tests_ran: false in result file
+3. Commit as feat({spec}): {description}
+4. Write result file with ALL fields including test evidence (see schema)
+```
+
+**Spike Task (append after preamble):**
 ```
 {task_id} [SPIKE]: {hypothesis}
 Type: spike
@@ -437,8 +353,6 @@ Method: {minimal steps}
 Success criteria: {measurable targets}
 Experiment file: {worktree_absolute_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
 
-Working directory: {worktree_absolute_path}
-
 Steps:
 1. Execute method
 2. For EACH criterion: record target, measure actual, compare (show math)
@@ -453,61 +367,37 @@ Rules:
 - Worse than baseline = FAILED (baseline 7k, actual 1.5k → FAILED)
 - "Close enough" = FAILED
 - Verifier will check. False positives waste resources.
-- STOP after writing result file. Do NOT merge, rename specs, or clean up worktrees.
 ```
 
 ### 8. FAILURE HANDLING
 
 When a task fails and cannot be auto-fixed:
 
-
-```
-TaskUpdate(taskId: native_id, status: "pending")  # Reset to pending, not deleted
-```
-This keeps the task visible for retry. Dependent tasks remain blocked.
-
-**Behavior:**
-1. Leave worktree intact at `{worktree_path}`
-2. Keep checkpoint.json for potential resume
-3. Output debugging instructions
-
-**Output:**
-```
-✗ Task T3 failed after retry
-
-Worktree preserved for debugging:
-  Path: .deepflow/worktrees/upload
-  Branch: df/upload
-
-To investigate:
-  cd .deepflow/worktrees/upload
-  # examine files, run tests, etc.
-
-To resume after fixing:
-  /df:execute --continue
-
-To discard and start fresh:
-  /df:execute --fresh
-```
-
-**Key points:**
-- Never auto-delete worktree on failure (cleanup_on_fail: false by default)
-- Always provide the exact cleanup commands
-- Checkpoint remains so --continue can work after manual fix
+`TaskUpdate(taskId: native_id, status: "pending")` — keeps task visible for retry; dependents remain blocked. Leave worktree intact, keep checkpoint.json, output: worktree path/branch, `cd {worktree_path}` to investigate, `/df:execute --continue` to resume, `/df:execute --fresh` to discard.
 
 ### 9. COMPLETE SPECS
 
 When all tasks done for a `doing-*` spec:
-1. Embed history in spec: `## Completed` section
+1. Embed history in spec: `## Completed` section with task list and commit hashes
 2. Rename: `doing-upload.md` → `done-upload.md`
-3.
+3. Extract decisions from done-* spec: Read the `done-{name}.md` file. Model-extract architectural decisions — look for explicit choices (→ `[APPROACH]`), unvalidated assumptions (→ `[ASSUMPTION]`), and "for now" decisions (→ `[PROVISIONAL]`). Append as a new section to **main tree** `.deepflow/decisions.md`:
+   ```
+   ### {YYYY-MM-DD} — {spec-name}
+   - [TAG] decision text — rationale
+   ```
+   After successful append, delete `specs/done-{name}.md`. If write fails, preserve the file.
+4. Remove the spec's ENTIRE section from PLAN.md:
+   - The `### doing-{spec}` header
+   - All task entries (`- [x] **T{n}**: ...` and their sub-items)
+   - Any `## Execution Summary` block for that spec
+   - Any `### Fix Tasks` sub-section for that spec
+   - Separators (`---`) between removed sections
+5. Recalculate the Summary table at the top of PLAN.md (update counts for completed/pending)
 
 ### 10. ITERATE (Notification-Driven)
 
 After spawning wave agents, your turn ENDS. Completion notifications drive the loop.
 
-**NEVER use TaskOutput** — it explodes context.
-
 **Per notification:**
 1. Read result file for the completed agent
 2. Validate test evidence:
@@ -524,22 +414,6 @@ After spawning wave agents, your turn ENDS. Completion notifications drive the l
 
 **Repeat** until: all done, all blocked, or context ≥50% (checkpoint).
 
-### 11. CAPTURE DECISIONS
-
-After all tasks complete (or all blocked), extract up to 4 candidate decisions from the session (implementation patterns, deviations from plan, key assumptions made).
-
-Present via AskUserQuestion with multiSelect: true. Labels: `[TAG] decision text`. Descriptions: rationale.
-
-For each confirmed decision, append to **main tree** `.deepflow/decisions.md` (create if missing):
-```
-### {YYYY-MM-DD} — execute
-- [APPROACH] Parallel agent spawn for independent tasks — confirmed no file conflicts
-```
-
-Main tree path: use the repo root (parent of `.deepflow/worktrees/`), NOT the worktree.
-
-Max 4 candidates per prompt. Tags: [APPROACH], [PROVISIONAL], [ASSUMPTION].
-
 ## Rules
 
 | Rule | Detail |
@@ -555,48 +429,22 @@ Max 4 candidates per prompt. Tags: [APPROACH], [PROVISIONAL], [ASSUMPTION].
 ```
 /df:execute (context: 12%)
 
-Loading PLAN.md...
-
-T2: Add S3 service (blocked by T1)
-T3: Add auth guard (blocked by T1)
-
-Registering native tasks...
-TaskCreate → T1 (native: task-001)
-TaskCreate → T2 (native: task-002)
-TaskCreate → T3 (native: task-003)
-TaskUpdate(task-002, addBlockedBy: [task-001])
-TaskUpdate(task-003, addBlockedBy: [task-001])
-
-Spawning Wave 1: T1
-TaskUpdate(task-001, status: "in_progress") ← spinner: "Creating upload endpoint"
-
-[Agent "T1" completed]
-TaskUpdate(task-001, status: "completed") ← auto-unblocks task-002, task-003
-✓ T1: success (abc1234)
-
-TaskList → task-002, task-003 now ready (blockedBy empty)
-
-Spawning Wave 2: T2, T3 parallel
-TaskUpdate(task-002, status: "in_progress")
-TaskUpdate(task-003, status: "in_progress")
+Loading PLAN.md... T1 ready, T2/T3 blocked by T1
+Registering native tasks: TaskCreate T1/T2/T3, TaskUpdate(T2 blockedBy T1), TaskUpdate(T3 blockedBy T1)
 
-
-
-
+Wave 1: TaskUpdate(T1, in_progress)
+[Agent "T1" completed] TaskUpdate(T1, completed) → auto-unblocks T2, T3
+✓ T1: success (abc1234)
 
-
-
-
-
-Wave 2 complete (2/2). Context: 35%
-
-✓ doing-upload → done-upload
-✓ Complete: 3/3 tasks
+Wave 2: TaskUpdate(T2/T3, in_progress)
+[Agent "T2" completed] ✓ T2: success (def5678)
+[Agent "T3" completed] ✓ T3: success (ghi9012)
+Context: 35% — ✓ doing-upload → done-upload. Complete: 3/3
 
 Next: Run /df:verify to verify specs and merge to main
 ```
 
-### Spike
+### Spike with Failure (Agent or Verifier Override)
 
 ```
 /df:execute (context: 10%)
@@ -611,56 +459,17 @@ Registering native tasks...
 
 Checking experiment status...
 T1 [SPIKE]: No experiment yet, spike executable
-T2: Blocked by T1 (spike not validated)
-T3: Blocked by T1 (spike not validated)
+T2, T3: Blocked by T1 (spike not validated)
 
 Spawning Wave 1: T1 [SPIKE]
 TaskUpdate(task-001, status: "in_progress")
 
 [Agent "T1 SPIKE" completed]
-✓ T1: complete, verifying...
-
-Verifying T1...
-✓ Spike T1 verified (throughput 8500 >= 7000)
-TaskUpdate(task-001, status: "completed") ← auto-unblocks task-002, task-003
-→ upload--streaming--passed.md
-
-TaskList → task-002, task-003 now ready
-
-Spawning Wave 2: T2, T3 parallel
-TaskUpdate(task-002, status: "in_progress")
-TaskUpdate(task-003, status: "in_progress")
-
-[Agent "T2" completed]
-TaskUpdate(task-002, status: "completed")
-✓ T2: success (def5678)
-
-[Agent "T3" completed]
-TaskUpdate(task-003, status: "completed")
-✓ T3: success (ghi9012)
-
-Wave 2 complete (2/2). Context: 40%
-
-✓ doing-upload → done-upload
-✓ Complete: 3/3 tasks
-
-Next: Run /df:verify to verify specs and merge to main
-```
-
-### Spike Failed (Agent Correctly Reported)
-
-```
-/df:execute (context: 10%)
-
-Registering native tasks...
-TaskCreate → T1 [SPIKE], T2, T3 (with dependencies)
-
-Wave 1: T1 [SPIKE] (context: 15%)
-TaskUpdate(task-001, status: "in_progress")
-T1: complete, verifying...
+✓ T1: complete (agent said: success), verifying...
 
 Verifying T1...
 ✗ Spike T1 failed verification (throughput 1500 < 7000)
+⚠ Agent incorrectly marked as passed — overriding to FAILED
 # Spike stays pending — dependents remain blocked
 → upload--streaming--failed.md
 
@@ -670,29 +479,7 @@ Complete: 1/3 tasks (2 blocked by failed experiment)
 Next: Run /df:plan to generate new hypothesis spike
 ```
 
-
-
-```
-/df:execute (context: 10%)
-
-Registering native tasks...
-TaskCreate → T1 [SPIKE], T2, T3 (with dependencies)
-
-Wave 1: T1 [SPIKE] (context: 15%)
-TaskUpdate(task-001, status: "in_progress")
-T1: complete (agent said: success), verifying...
-
-Verifying T1...
-✗ Spike T1 failed verification (throughput 1500 < 7000)
-⚠ Agent incorrectly marked as passed — overriding to FAILED
-TaskUpdate(task-001, status: "pending") ← reset, dependents stay blocked
-→ upload--streaming--failed.md
-
-⚠ Spike T1 invalidated hypothesis
-Complete: 1/3 tasks (2 blocked by failed experiment)
-
-Next: Run /df:plan to generate new hypothesis spike
-```
+Note: If the agent correctly reports `status: failed`, the "overriding to FAILED" line is omitted — the verifier simply confirms failure.
 
 ### With Checkpoint
 
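Throughout these execute.md changes, the orchestrator's per-notification step is: read `.deepflow/results/{task_id}.yaml`, then emit one status line like `✓ T1: success (abc123)`. A minimal sketch of that step, assuming a flat key/value result file; the field names `status` and `commit` are assumptions for illustration, since the full result-file schema is not shown in this diff:

```python
# Hypothetical sketch of "read result file, output ONE line".
# Parses only a flat 'key: value' subset, not general YAML.

def parse_result(text: str) -> dict:
    """Parse flat 'key: value' lines from a result file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
    return fields

def status_line(task_id: str, text: str) -> str:
    """One line per completed agent, e.g. '✓ T1: success (abc123)'."""
    r = parse_result(text)
    mark = "✓" if r.get("status") == "success" else "✗"
    return f'{mark} {task_id}: {r.get("status", "unknown")} ({r.get("commit", "?")})'

print(status_line("T1", "status: success\ncommit: abc123\ntests_ran: true"))
# → ✓ T1: success (abc123)
```

The one-line constraint matters because the orchestrator handles many notifications per wave; anything longer would repeat the full execution status that the diff explicitly forbids.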
package/src/commands/df/note.md CHANGED

@@ -132,6 +132,7 @@ No decisions saved.
 - `[APPROACH]` — deliberate design or implementation choice
 - `[PROVISIONAL]` — works for now, will revisit at scale or with more information
 - `[ASSUMPTION]` — treating something as true without full confirmation
+- `[DEBT]` — needs revisiting; produced only by `/df:consolidate`, never manually assigned
 
 **Contradiction handling:** Never delete prior entries. When a new decision contradicts an older one, include a reference in the rationale: `was "X", now "Y" because Z`.
 
package/src/commands/df/plan.md CHANGED

@@ -17,17 +17,11 @@ Compare specs against codebase and past experiments. Generate prioritized tasks.
 
 ## Spec File States
 
-
-
-
-
-
-```
-
-**Filtering:**
-- New: `specs/*.md` excluding `doing-*` and `done-*`
-- In progress: `specs/doing-*.md`
-- Completed: `specs/done-*.md`
+| Prefix | State | Action |
+|--------|-------|--------|
+| (none) | New | Plan this |
+| `doing-` | In progress | Skip |
+| `done-` | Completed | Skip |
 
 ## Behavior
 
@@ -83,20 +77,7 @@ Include patterns in task descriptions for agents to follow.
 
 ### 4. ANALYZE CODEBASE
 
-
-
-**NEVER use TaskOutput** — returns full agent transcripts (100KB+) that explode context.
-
-**Spawn ALL Explore agents in ONE message (non-background, parallel):**
-
-```python
-# All in single message — runs in parallel, blocks until all complete:
-Task(subagent_type="Explore", model="haiku", prompt="Find: ...")
-Task(subagent_type="Explore", model="haiku", prompt="Find: ...")
-Task(subagent_type="Explore", model="haiku", prompt="Find: ...")
-# Each returns agent's final message only (not full transcript)
-# No late notifications — agents complete before orchestrator proceeds
-```
+Follow `templates/explore-agent.md` for spawn rules, prompt structure, and scope restrictions.
 
 Scale agent count based on codebase size:
 
@@ -107,35 +88,6 @@ Scale agent count based on codebase size:
 | 100-500 | 25-40 |
 | 500+ | 50-100 (cap) |
 
-**Explore Agent Prompt Structure:**
-```
-Find: [specific question]
-
-Return ONLY:
-- File paths matching criteria
-- One-line description per file
-- Integration points (if asked)
-
-DO NOT:
-- Read or summarize spec files
-- Make recommendations
-- Propose solutions
-- Generate tables or lengthy explanations
-
-Max response: 500 tokens (configurable via .deepflow/config.yaml explore.max_tokens)
-```
-
-**Explore Agent Scope Restrictions:**
-- MUST only report factual findings:
-  - Files found
-  - Patterns/conventions observed
-  - Integration points
-- MUST NOT:
-  - Make recommendations
-  - Propose architectures
-  - Read and summarize specs (that's orchestrator's job)
-  - Draw conclusions about what should be built
-
 **Use `code-completeness` skill patterns** to search for:
 - Implementations matching spec requirements
 - TODO, FIXME, HACK comments

@@ -144,28 +96,9 @@ Max response: 500 tokens (configurable via .deepflow/config.yaml explore.max_tok
 
 ### 5. COMPARE & PRIORITIZE
 
-
-```
-Task tool parameters:
-- subagent_type: "reasoner"
-- model: "opus"
-```
-
-Reasoner performs analysis:
-
-| Status | Action |
-|--------|--------|
-| DONE | Skip |
-| PARTIAL | Task to complete |
-| MISSING | Task to implement |
-| CONFLICT | Flag for review |
-
-**Spec gaps:** If spec is ambiguous or missing details, note in output (don't silently assume).
+Spawn `Task(subagent_type="reasoner", model="opus")`. Reasoner maps each requirement to DONE / PARTIAL / MISSING / CONFLICT. Flag spec gaps; don't silently assume.
 
-**Priority order:**
-1. Dependencies — blockers first
-2. Impact — core features before enhancements
-3. Risk — unknowns early
+**Priority order:** Dependencies → Impact → Risk
 
 ### 6. GENERATE SPIKE TASKS (IF NEEDED)
 
@@ -186,66 +119,49 @@ Reasoner performs analysis:
 - Blocked by: none
 ```
 
-**Blocking Logic:**
-- All implementation tasks MUST have `Blocked by: T{spike}` until spike passes
-- After spike completes:
-  - If passed: Update experiment to `--passed.md`, unblock implementation tasks
-  - If failed: Update experiment to `--failed.md`, DO NOT generate implementation tasks
-
-**Full Implementation Only After Spike:**
-- Only generate full task list when spike validates the approach
-- Never generate 10-task waterfall without validated hypothesis
+**Blocking Logic:** All implementation tasks MUST have `Blocked by: T{spike}` until spike passes. If spike fails: update to `--failed.md`, DO NOT generate implementation tasks.
 
 ### 7. VALIDATE HYPOTHESES
 
-
+For unfamiliar APIs, ambiguous approaches, or performance-critical work: prototype in scratchpad (not committed). If assumption fails, write `.deepflow/experiments/{topic}--{hypothesis}--failed.md`. Skip for well-known patterns/simple CRUD.
 
-
+### 8. CLEANUP PLAN.md
 
-
-1. Prototype in scratchpad (not committed)
-2. Test assumption
-3. If fails → Write `.deepflow/experiments/{topic}--{hypothesis}--failed.md`
-4. Adjust approach, document in task
+Before writing new tasks, prune stale sections:
 
-
+```
+For each ### section in PLAN.md:
+  Extract spec name from header (e.g. "doing-upload" or "done-upload")
+  If specs/done-{name}.md exists:
+    → Remove the ENTIRE section: header, tasks, execution summary, fix tasks, separators
+  If header references a spec with no matching specs/doing-*.md or specs/done-*.md:
+    → Remove it (orphaned section)
+```
 
-
+Also recalculate the Summary table (specs analyzed, tasks created/completed/pending) to reflect only remaining sections.
+
+If PLAN.md becomes empty after cleanup, delete the file and recreate fresh.
+
+### 9. OUTPUT PLAN.md
 
 Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validation findings.
 
-###
+### 10. RENAME SPECS
 
 `mv specs/feature.md specs/doing-feature.md`
 
-###
+### 11. REPORT
 
 `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`
 
-### 11. CAPTURE DECISIONS
-
-Extract up to 4 candidate decisions (approaches chosen, spike strategies, prioritization rationale). Present via AskUserQuestion with `multiSelect: true`. Each option: `label: "[TAG] <decision>"`, `description: "<rationale>"`. Tags: `[APPROACH]`, `[PROVISIONAL]`, `[ASSUMPTION]`.
-
-Append confirmed decisions to `.deepflow/decisions.md` (create if missing):
-```
-### {YYYY-MM-DD} — plan
-- [TAG] Decision text — rationale summary
-```
-If a decision contradicts a prior entry, add: `(supersedes: <prior text>)`
-
 ## Rules
-- **Never use TaskOutput** — Returns full transcripts that explode context
-- **Never use run_in_background for Explore agents** — Causes late notifications that pollute output
 - **Spike-first** — Generate spike task before full implementation if no `--passed.md` experiment exists
 - **Block on spike** — Full implementation tasks MUST be blocked by spike validation
 - **Learn from failures** — Extract "next hypothesis" from failed experiments, never repeat same approach
-- **
-- **Plan only** — Do NOT implement anything (except quick validation prototypes)
-- **Validate before commit** — Test risky assumptions with minimal experiments
+- **Plan only** — Do NOT implement (except quick validation prototypes)
 - **Confirm before assume** — Search code before marking "missing"
 - **One task = one logical unit** — Atomic, committable
-- Prefer existing utilities over new code
-- Flag spec gaps, don't silently ignore
+- Prefer existing utilities over new code; flag spec gaps
 
 ## Agent Scaling
 
@@ -254,7 +170,7 @@ If a decision contradicts a prior entry, add: `(supersedes: <prior text>)`
 | Explore (search) | haiku | 10 | +1 per 20 files |
 | Reasoner (analyze) | opus | 5 | +1 per 2 specs |
 
-
+Always use the `Task` tool with explicit `subagent_type` and `model`. Do NOT use Glob/Grep/Read directly.
 
 ## Example
 
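The CLEANUP PLAN.md pseudocode added to plan.md removes whole `### doing-{name}` sections once the spec is done. A minimal sketch of that pruning rule, assuming the `### doing-{name}` header format shown in the diff; the function shape is an assumption, and separator/orphan handling from the pseudocode is omitted for brevity:

```python
import re

def prune_plan(plan_text: str, done_specs: set) -> str:
    """Drop each '### doing-{name}' section whose spec is in done_specs."""
    # Split at newlines that start a new '### ' section header.
    sections = re.split(r"\n(?=### )", plan_text)
    kept = []
    for section in sections:
        match = re.match(r"### doing-(\S+)", section)
        if match and match.group(1) in done_specs:
            continue  # remove the entire section: header plus its tasks
        kept.append(section)
    return "\n".join(kept)

plan = "### doing-upload\n- [x] **T1**: done\n### doing-auth\n- [ ] **T1**: pending"
print(prune_plan(plan, {"upload"}))
# → ### doing-auth
#   - [ ] **T1**: pending
```

Pruning by section rather than by task is what keeps PLAN.md's remaining Summary counts recomputable from the sections that survive.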
@@ -93,13 +93,7 @@ Word count target: 200-500 words. Do not pad. Do not truncate important informat
 
 ## Rules
 
-- **NEVER write any file** — not decisions.md, not PLAN.md, not any new file
-- **NEVER use AskUserQuestion** — this command is read-only, no interaction
-- **NEVER spawn agents** — read directly using Bash (git log) and Read tool
-- **NEVER use TaskOutput** — returns full transcripts that explode context
-- **NEVER use EnterPlanMode or ExitPlanMode**
 - Read sources in a single pass — do not loop or re-read
-- If a source file is missing, skip it and note it only if relevant
 - Contradicted decisions: show newest entry per topic only
 - Token budget: stay within ~2500 tokens of input to produce ~500 words of output
 