deepflow 0.1.27 → 0.1.29
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/src/commands/df/execute.md +164 -54
- package/src/commands/df/plan.md +8 -5
- package/src/commands/df/spec.md +9 -6
- package/src/commands/df/verify.md +10 -5
package/package.json
CHANGED

package/src/commands/df/execute.md
CHANGED
@@ -2,11 +2,11 @@

## Orchestrator Role

-You
+You are a coordinator. Spawn agents, wait for results, update PLAN.md. Never implement code yourself.

-**NEVER:** Read source files, edit code, run tests, run git (except status)
+**NEVER:** Read source files, edit code, run tests, run git commands (except status)

-**ONLY:** Read
+**ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, use TaskOutput to get results, update PLAN.md

---

@@ -29,6 +29,7 @@ Implement tasks from PLAN.md with parallel agents, atomic commits, and context-e
| Agent | subagent_type | model | Purpose |
|-------|---------------|-------|---------|
| Implementation | `general-purpose` | `sonnet` | Task implementation |
+| Spike Verifier | `reasoner` | `opus` | Verify spike pass/fail is correct |
| Debugger | `reasoner` | `opus` | Debugging failures |

## Context-Aware Execution
@@ -42,11 +43,16 @@ Statusline writes to `.deepflow/context.json`: `{"percentage": 45}`

## Agent Protocol

-
+Each task = one background agent. Use TaskOutput to wait for results. Never poll files in a loop.

```python
-
-
+# Spawn agents in parallel (single message, multiple Task calls)
+task_id_1 = Task(subagent_type="general-purpose", run_in_background=True, prompt="T1: ...")
+task_id_2 = Task(subagent_type="general-purpose", run_in_background=True, prompt="T2: ...")
+
+# Wait for all results (single message, multiple TaskOutput calls)
+TaskOutput(task_id=task_id_1)
+TaskOutput(task_id=task_id_2)
```

Result file `.deepflow/results/{task_id}.yaml`:
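The flat result file above is simple `key: value` lines, so it can be read without a YAML dependency once TaskOutput returns. A minimal sketch, not part of the package — `read_result` is a hypothetical helper and handles only top-level fields (nested fields such as a spike's `criteria` list would need a real YAML parser):

```python
import tempfile
from pathlib import Path

# Sketch: read an agent's flat result file after TaskOutput returns.
# Only top-level `key: value` lines are parsed; indented or list lines
# are skipped.
def read_result(results_dir: str, task_id: str) -> dict:
    fields = {}
    for line in Path(results_dir, f"{task_id}.yaml").read_text().splitlines():
        if ":" in line and not line.startswith((" ", "-", "#")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
    return fields

# Example with a result file matching the schema above:
d = tempfile.mkdtemp()
Path(d, "T1.yaml").write_text('task: T1\nstatus: success\ncommit: abc1234\nsummary: "one line"\n')
print(read_result(d, "T1")["status"])  # → success
```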
@@ -57,6 +63,28 @@ commit: abc1234
summary: "one line"
```

+**Spike result file** `.deepflow/results/{task_id}.yaml` (additional fields):
+```yaml
+task: T1
+type: spike
+status: success|failed
+commit: abc1234
+summary: "one line"
+criteria:
+  - name: "throughput"
+    target: ">= 7000 g/s"
+    actual: "1500 g/s"
+    met: false
+  - name: "memory usage"
+    target: "< 500 MB"
+    actual: "320 MB"
+    met: true
+all_criteria_met: false  # ALL must be true for spike to pass
+experiment_file: ".deepflow/experiments/upload--streaming--failed.md"
+```
+
+**CRITICAL:** `status` MUST equal `success` only if `all_criteria_met: true`. The spike verifier will reject mismatches.
+
## Checkpoint & Resume

**File:** `.deepflow/checkpoint.json` — stored in WORKTREE directory, not main.
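The CRITICAL rule above is mechanical, so it can be stated as a few lines of code. A minimal sketch, not part of the package — `verify_spike_result` is a hypothetical helper, and the dict mirrors the YAML schema shown above:

```python
# Sketch: `status: success` is valid only when EVERY criterion is met,
# and `all_criteria_met` must agree with the per-criterion `met` flags.
def verify_spike_result(result: dict) -> bool:
    all_met = all(c.get("met") is True for c in result.get("criteria", []))
    claims_success = result.get("status") == "success"
    return claims_success == all_met and result.get("all_criteria_met") == all_met

result = {
    "task": "T1", "type": "spike", "status": "failed",
    "criteria": [
        {"name": "throughput", "target": ">= 7000 g/s", "actual": "1500 g/s", "met": False},
        {"name": "memory usage", "target": "< 500 MB", "actual": "320 MB", "met": True},
    ],
    "all_criteria_met": False,
}
print(verify_spike_result(result))  # → True: "failed" is consistent with the unmet criterion
```

An agent that wrote `status: success` with the same criteria would fail this check, which is exactly the mismatch the verifier rejects.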
@@ -188,23 +216,78 @@ Ready = `[ ]` + all `blocked_by` complete + experiment validated (if applicable)

Context ≥50%: checkpoint and exit.

-**
+**CRITICAL: Spawn ALL ready tasks in a SINGLE response with MULTIPLE Task tool calls.**
+
+DO NOT spawn one task, wait, then spawn another. Instead, call the Task tool multiple times in the SAME message block. This enables true parallelism.
+
+Example: if T1, T2, T3 are ready, send ONE message containing THREE Task tool invocations:
+
```
-
-
-- model: "
-- run_in_background:
-- prompt: "{task details from PLAN.md}"
+// In a SINGLE assistant message, invoke Task THREE times:
+Task(subagent_type="general-purpose", model="sonnet", run_in_background=true, prompt="T1: ...")
+Task(subagent_type="general-purpose", model="sonnet", run_in_background=true, prompt="T2: ...")
+Task(subagent_type="general-purpose", model="sonnet", run_in_background=true, prompt="T3: ...")
```

+**WRONG (sequential):** Send a message with Task for T1 → wait → send a message with Task for T2 → wait → ...
+**RIGHT (parallel):** Send ONE message with Task for T1, T2, T3 all together
+
Same-file conflicts: spawn sequentially instead.

**Spike Task Execution:**
When spawning a spike task, the agent MUST:
1. Execute the minimal validation method
-2. Record
-3.
-4.
+2. Record structured criteria evaluation in the result file (see spike result schema above)
+3. Write the experiment file with `--active.md` status (verifier determines final status)
+4. Commit as `spike({spec}): validate {hypothesis}`
+
+**IMPORTANT:** Spike agent writes `--active.md`, NOT `--passed.md` or `--failed.md`. The verifier determines final status.
+
+### 6.5. VERIFY SPIKE RESULTS
+
+After a spike completes, spawn the verifier BEFORE unblocking implementation tasks.
+
+**Trigger:** Spike result file detected (`.deepflow/results/T{n}.yaml` with `type: spike`)
+
+**Spawn:**
+```
+Task(subagent_type="reasoner", model="opus", prompt=VERIFIER_PROMPT)
+```
+
+**Verifier Prompt:**
+```
+SPIKE VERIFICATION — Be skeptical. Catch false positives.
+
+Task: {task_id}
+Result: {worktree_path}/.deepflow/results/{task_id}.yaml
+Experiment: {worktree_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
+
+For each criterion in the result file:
+1. Is `actual` a concrete number? (reject "good", "improved", "better")
+2. Does `actual` satisfy `target`? Do the math.
+3. Is `met` correct?
+
+Reject these patterns:
+- "Works but doesn't meet target" → FAILED
+- "Close enough" → FAILED
+- Actual 1500 vs target >= 7000 → FAILED
+
+Output to {worktree_path}/.deepflow/results/{task_id}-verified.yaml:
+verified_status: VERIFIED_PASS|VERIFIED_FAIL
+override: true|false
+reason: "one line"
+
+Then rename the experiment file:
+- VERIFIED_PASS → --passed.md
+- VERIFIED_FAIL → --failed.md (add "Next hypothesis:" to Conclusion)
+```
+
+**Gate:**
+```
+VERIFIED_PASS → Unblock, log "✓ Spike {task_id} verified"
+VERIFIED_FAIL → Block, log "✗ Spike {task_id} failed verification"
+If override: log "⚠ Agent incorrectly marked as passed"
+```

**On failure, use Task tool to spawn reasoner:**
```
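The readiness rule above ("`[ ]` + all `blocked_by` complete") determines what goes into each wave. A minimal sketch of that computation, not from the package — the task-record shape is an assumption, and actual Task spawning is replaced by returning the list of IDs to spawn together:

```python
# Sketch: compute the next wave — every not-yet-done task whose blockers
# are all complete. Everything returned would be spawned in ONE message.
def next_wave(tasks: dict, done: set) -> list:
    return [
        tid for tid, spec in tasks.items()
        if tid not in done and all(dep in done for dep in spec.get("blocked_by", []))
    ]

tasks = {
    "T1": {"blocked_by": []},      # e.g. a spike
    "T2": {"blocked_by": ["T1"]},
    "T3": {"blocked_by": ["T1"]},
}
print(next_wave(tasks, done=set()))   # → ['T1']
print(next_wave(tasks, done={"T1"}))  # → ['T2', 'T3']
```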
@@ -239,33 +322,25 @@ Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
```
{task_id} [SPIKE]: {hypothesis}
Type: spike
-Method: {minimal steps
-Success criteria: {
-Time-box: {duration}
+Method: {minimal steps}
+Success criteria: {measurable targets}
Experiment file: {worktree_absolute_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
-Spec: {spec_name}

-
-All file operations MUST use this absolute path as base:
-{worktree_absolute_path}
-
-Example: To edit src/foo.ts, use:
-{worktree_absolute_path}/src/foo.ts
-
-Do NOT write files to the main project directory.
+Working directory: {worktree_absolute_path}

-
-1.
-2.
-3.
-
-
-4. Commit as spike({spec}): validate {hypothesis}
-5. Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
+Steps:
+1. Execute method
+2. For EACH criterion: record target, measure actual, compare (show math)
+3. Write experiment as --active.md (verifier determines final status)
+4. Commit: spike({spec}): validate {hypothesis}
+5. Write result to .deepflow/results/{task_id}.yaml (see spike result schema)

-
--
--
+Rules:
+- `met: true` ONLY if actual satisfies target
+- `status: success` ONLY if ALL criteria met
+- Worse than baseline = FAILED (baseline 7k, actual 1.5k → FAILED)
+- "Close enough" = FAILED
+- Verifier will check. False positives waste resources.
```

### 8. FAILURE HANDLING
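Step 2's "compare (show math)" on targets like `>= 7000 g/s` is plain numeric comparison once the operator and numbers are parsed out. An illustrative sketch only — `criterion_met` is a hypothetical helper, and the target/actual string formats are assumed from the schema examples:

```python
import re

# Sketch: "do the math" for one criterion — parse a target such as
# ">= 7000 g/s" and a numeric actual such as "1500 g/s", then compare.
def criterion_met(target: str, actual: str) -> bool:
    op, threshold = re.match(r"(>=|<=|>|<)\s*([\d.]+)", target).groups()
    value = float(re.search(r"[\d.]+", actual).group())
    t = float(threshold)
    return {">=": value >= t, "<=": value <= t, ">": value > t, "<": value < t}[op]

print(criterion_met(">= 7000 g/s", "1500 g/s"))  # → False (1500 < 7000)
print(criterion_met("< 500 MB", "320 MB"))       # → True
```

Non-numeric actuals ("good", "improved") would simply fail to parse here, which matches the verifier's rule of rejecting them outright.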
@@ -312,7 +387,18 @@ When all tasks done for a `doing-*` spec:

### 10. ITERATE

-
+After spawning agents, wait for results using TaskOutput. Call TaskOutput for ALL running agents in a SINGLE message (parallel wait).
+
+```python
+# After spawning T1, T2, T3 in parallel, wait for all in parallel:
+TaskOutput(task_id=t1_id)  # These three calls go in ONE message
+TaskOutput(task_id=t2_id)
+TaskOutput(task_id=t3_id)
+```
+
+Then check which tasks completed, update PLAN.md, identify newly unblocked tasks, spawn next wave.
+
+Repeat until: all done, all blocked, or context ≥50% (checkpoint).

## Rules

@@ -338,6 +424,8 @@ Wave 2: T3 (context: 48%)

✓ doing-upload → done-upload
✓ Complete: 3/3 tasks
+
+Next: Run /df:verify to verify specs and merge to main
```

### Spike-First Execution
@@ -350,43 +438,65 @@ Checking experiment status...
T2: Blocked by T1 (spike not validated)
T3: Blocked by T1 (spike not validated)

-Wave 1: T1 [SPIKE] (context:
-T1:
+Wave 1: T1 [SPIKE] (context: 15%)
+T1: complete, verifying...

-
-
-
+Verifying T1...
+✓ Spike T1 verified (throughput 8500 >= 7000)
+→ upload--streaming--passed.md

-Wave 2: T2, T3 parallel (context:
+Wave 2: T2, T3 parallel (context: 40%)
T2: success (def5678)
T3: success (ghi9012)

✓ doing-upload → done-upload
✓ Complete: 3/3 tasks
+
+Next: Run /df:verify to verify specs and merge to main
```

-### Spike Failed
+### Spike Failed (Agent Correctly Reported)

```
/df:execute (context: 10%)

-Wave 1: T1 [SPIKE] (context:
-T1:
+Wave 1: T1 [SPIKE] (context: 15%)
+T1: complete, verifying...

-
-
-
+Verifying T1...
+✗ Spike T1 failed verification (throughput 1500 < 7000)
+→ upload--streaming--failed.md

⚠ Spike T1 invalidated hypothesis
-
-→ Run /df:plan to generate new hypothesis spike
+Complete: 1/3 tasks (2 blocked by failed experiment)

+Next: Run /df:plan to generate new hypothesis spike
+```
+
+### Spike Failed (Verifier Override)
+
+```
+/df:execute (context: 10%)
+
+Wave 1: T1 [SPIKE] (context: 15%)
+T1: complete (agent said: success), verifying...
+
+Verifying T1...
+✗ Spike T1 failed verification (throughput 1500 < 7000)
+⚠ Agent incorrectly marked as passed — overriding to FAILED
+→ upload--streaming--failed.md
+
+⚠ Spike T1 invalidated hypothesis
Complete: 1/3 tasks (2 blocked by failed experiment)
+
+Next: Run /df:plan to generate new hypothesis spike
```

### With Checkpoint

```
Wave 1 complete (context: 52%)
-Checkpoint saved.
+Checkpoint saved.
+
+Next: Run /df:execute --continue to resume execution
```
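The `→ upload--streaming--passed.md` / `--failed.md` lines in the transcripts above are the verifier's rename of the spike's `--active.md` experiment file. A minimal sketch of that rename step, not package code — `finalize_experiment` is a hypothetical helper:

```python
from pathlib import Path

# Sketch: rename {topic}--{hypothesis}--active.md to --passed.md or
# --failed.md based on the verifier's decision (returns the new path
# rather than touching the filesystem).
def finalize_experiment(active_path: str, verified_pass: bool) -> str:
    p = Path(active_path)
    status = "passed" if verified_pass else "failed"
    return str(p.with_name(p.name.replace("--active.md", f"--{status}.md")))

print(finalize_experiment("upload--streaming--active.md", False))
# → upload--streaming--failed.md
```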
package/src/commands/df/plan.md
CHANGED

@@ -81,12 +81,15 @@ Include patterns in task descriptions for agents to follow.

### 4. ANALYZE CODEBASE

-**
+**Spawn ALL Explore agents in ONE message, then wait for ALL with TaskOutput in ONE message:**
```
-
-
-
-
+// Spawn all in a single message:
+t1 = Task(subagent_type="Explore", model="haiku", run_in_background=true, prompt="...")
+t2 = Task(subagent_type="Explore", model="haiku", run_in_background=true, prompt="...")
+
+// Wait for all in a single message:
+TaskOutput(task_id=t1)
+TaskOutput(task_id=t2)
```

Scale agent count based on codebase size:
package/src/commands/df/spec.md
CHANGED

@@ -6,7 +6,7 @@ You coordinate agents and ask questions. You never search code directly.

**NEVER:** Read source files, use Glob/Grep directly, run git

-**ONLY:** Spawn agents,
+**ONLY:** Spawn agents, use TaskOutput to get results, ask user questions, write spec file

---

@@ -31,12 +31,15 @@ Transform conversation context into a structured specification file.

### 1. GATHER CODEBASE CONTEXT

-**
+**Spawn ALL Explore agents in ONE message, then wait for ALL with TaskOutput in ONE message:**
```
-
-
-
-
+// Spawn all in a single message:
+t1 = Task(subagent_type="Explore", model="haiku", run_in_background=true, prompt="...")
+t2 = Task(subagent_type="Explore", model="haiku", run_in_background=true, prompt="...")
+
+// Wait for all in a single message:
+TaskOutput(task_id=t1)
+TaskOutput(task_id=t2)
```

Find:
package/src/commands/df/verify.md
CHANGED

@@ -91,12 +91,15 @@ Default: L1-L3 (L4 optional, can be slow)

## Agent Usage

-**
+**Spawn ALL Explore agents in ONE message, then wait for ALL with TaskOutput in ONE message:**
```
-
-
-
-
+// Spawn all in a single message:
+t1 = Task(subagent_type="Explore", model="haiku", run_in_background=true, prompt="...")
+t2 = Task(subagent_type="Explore", model="haiku", run_in_background=true, prompt="...")
+
+// Wait for all in a single message:
+TaskOutput(task_id=t1)
+TaskOutput(task_id=t2)
```

Scale: 1-2 agents per spec, cap 10.

@@ -157,4 +160,6 @@ rm .deepflow/checkpoint.json
✓ Merged df/doing-upload/20260202-1430 to main
✓ Cleaned up worktree and branch
✓ Spec complete: doing-upload → done-upload
+
+Workflow complete! Ready for next feature: /df:spec <name>
```