deepflow 0.1.27 → 0.1.29

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "deepflow",
- "version": "0.1.27",
+ "version": "0.1.29",
  "description": "Stay in flow state - lightweight spec-driven task orchestration for Claude Code",
  "keywords": [
  "claude",
@@ -2,11 +2,11 @@

  ## Orchestrator Role

- You spawn agents and poll results. You never implement.
+ You are a coordinator. Spawn agents, wait for results, update PLAN.md. Never implement code yourself.

- **NEVER:** Read source files, edit code, run tests, run git (except status), use `TaskOutput`
+ **NEVER:** Read source files, edit code, run tests, run git commands (except status)

- **ONLY:** Read `PLAN.md` + `specs/doing-*.md`, spawn background agents, poll `.deepflow/results/`, update PLAN.md
+ **ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, use TaskOutput to get results, update PLAN.md

  ---

@@ -29,6 +29,7 @@ Implement tasks from PLAN.md with parallel agents, atomic commits, and context-e
  | Agent | subagent_type | model | Purpose |
  |-------|---------------|-------|---------|
  | Implementation | `general-purpose` | `sonnet` | Task implementation |
+ | Spike Verifier | `reasoner` | `opus` | Verify spike pass/fail is correct |
  | Debugger | `reasoner` | `opus` | Debugging failures |

  ## Context-Aware Execution
@@ -42,11 +43,16 @@ Statusline writes to `.deepflow/context.json`: `{"percentage": 45}`

  ## Agent Protocol

- Every task = one background agent. Poll result files, never `TaskOutput`.
+ Each task = one background agent. Use TaskOutput to wait for results. Never poll files in a loop.

  ```python
- Task(subagent_type="general-purpose", run_in_background=True, prompt="T1: ...")
- # Poll: Glob(".deepflow/results/T*.yaml")
+ # Spawn agents in parallel (single message, multiple Task calls)
+ task_id_1 = Task(subagent_type="general-purpose", run_in_background=True, prompt="T1: ...")
+ task_id_2 = Task(subagent_type="general-purpose", run_in_background=True, prompt="T2: ...")
+
+ # Wait for all results (single message, multiple TaskOutput calls)
+ TaskOutput(task_id=task_id_1)
+ TaskOutput(task_id=task_id_2)
  ```

  Result file `.deepflow/results/{task_id}.yaml`:
@@ -57,6 +63,28 @@ commit: abc1234
  summary: "one line"
  ```

+ **Spike result file** `.deepflow/results/{task_id}.yaml` (additional fields):
+ ```yaml
+ task: T1
+ type: spike
+ status: success|failed
+ commit: abc1234
+ summary: "one line"
+ criteria:
+   - name: "throughput"
+     target: ">= 7000 g/s"
+     actual: "1500 g/s"
+     met: false
+   - name: "memory usage"
+     target: "< 500 MB"
+     actual: "320 MB"
+     met: true
+ all_criteria_met: false # ALL must be true for spike to pass
+ experiment_file: ".deepflow/experiments/upload--streaming--failed.md"
+ ```
+
+ **CRITICAL:** `status` MUST equal `success` only if `all_criteria_met: true`. The spike verifier will reject mismatches.
+
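The CRITICAL rule above is a pure consistency check and can be sketched in a few lines. This is a minimal sketch in plain Python; the `check_spike_result` helper and the dict shape are illustrative, not part of deepflow:

```python
# Sketch of the verifier's consistency check: field names follow the
# spike result schema above; everything else is illustrative.

def check_spike_result(result: dict) -> list:
    """Return a list of problems; an empty list means the result is consistent."""
    problems = []
    criteria = result.get("criteria", [])
    all_met = all(c.get("met") is True for c in criteria)

    # `all_criteria_met` must agree with the per-criterion flags.
    if result.get("all_criteria_met") != all_met:
        problems.append("all_criteria_met does not match the criteria list")

    # `status` may be `success` only when every criterion is met.
    if result.get("status") == "success" and not all_met:
        problems.append("status is success but not all criteria are met")
    return problems


# Example: the false positive the verifier should reject.
bad = {
    "task": "T1",
    "type": "spike",
    "status": "success",
    "all_criteria_met": True,
    "criteria": [
        {"name": "throughput", "target": ">= 7000 g/s", "actual": "1500 g/s", "met": False},
    ],
}
```

Here `check_spike_result(bad)` reports two problems: the aggregate flag disagrees with the criteria, and `status` claims success without all criteria met.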
  ## Checkpoint & Resume

  **File:** `.deepflow/checkpoint.json` — stored in WORKTREE directory, not main.
@@ -188,23 +216,78 @@ Ready = `[ ]` + all `blocked_by` complete + experiment validated (if applicable)

  Context ≥50%: checkpoint and exit.

- **Use Task tool to spawn all ready tasks in ONE message (parallel):**
+ **CRITICAL: Spawn ALL ready tasks in a SINGLE response with MULTIPLE Task tool calls.**
+
+ DO NOT spawn one task, wait, then spawn another. Instead, call Task tool multiple times in the SAME message block. This enables true parallelism.
+
+ Example: If T1, T2, T3 are ready, send ONE message containing THREE Task tool invocations:
+
  ```
- Task tool parameters for each task:
- - subagent_type: "general-purpose"
- - model: "sonnet"
- - run_in_background: true
- - prompt: "{task details from PLAN.md}"
+ // In a SINGLE assistant message, invoke Task THREE times:
+ Task(subagent_type="general-purpose", model="sonnet", run_in_background=true, prompt="T1: ...")
+ Task(subagent_type="general-purpose", model="sonnet", run_in_background=true, prompt="T2: ...")
+ Task(subagent_type="general-purpose", model="sonnet", run_in_background=true, prompt="T3: ...")
  ```

+ **WRONG (sequential):** Send message with Task for T1 → wait → send message with Task for T2 → wait → ...
+ **RIGHT (parallel):** Send ONE message with Task for T1, T2, T3 all together
+
  Same-file conflicts: spawn sequentially instead.

  **Spike Task Execution:**
  When spawning a spike task, the agent MUST:
  1. Execute the minimal validation method
- 2. Record result in experiment file (update status: `--passed.md` or `--failed.md`)
- 3. If passed: implementation tasks become unblocked
- 4. If failed: record conclusion with "next hypothesis" for future planning
+ 2. Record structured criteria evaluation in result file (see spike result schema above)
+ 3. Write experiment file with `--active.md` status (verifier determines final status)
+ 4. Commit as `spike({spec}): validate {hypothesis}`
+
+ **IMPORTANT:** Spike agent writes `--active.md`, NOT `--passed.md` or `--failed.md`. The verifier determines final status.
+
+ ### 6.5. VERIFY SPIKE RESULTS
+
+ After spike completes, spawn verifier BEFORE unblocking implementation tasks.
+
+ **Trigger:** Spike result file detected (`.deepflow/results/T{n}.yaml` with `type: spike`)
+
+ **Spawn:**
+ ```
+ Task(subagent_type="reasoner", model="opus", prompt=VERIFIER_PROMPT)
+ ```
+
+ **Verifier Prompt:**
+ ```
+ SPIKE VERIFICATION — Be skeptical. Catch false positives.
+
+ Task: {task_id}
+ Result: {worktree_path}/.deepflow/results/{task_id}.yaml
+ Experiment: {worktree_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
+
+ For each criterion in result file:
+ 1. Is `actual` a concrete number? (reject "good", "improved", "better")
+ 2. Does `actual` satisfy `target`? Do the math.
+ 3. Is `met` correct?
+
+ Reject these patterns:
+ - "Works but doesn't meet target" → FAILED
+ - "Close enough" → FAILED
+ - Actual 1500 vs Target >= 7000 → FAILED
+
+ Output to {worktree_path}/.deepflow/results/{task_id}-verified.yaml:
+ verified_status: VERIFIED_PASS|VERIFIED_FAIL
+ override: true|false
+ reason: "one line"
+
+ Then rename experiment:
+ - VERIFIED_PASS → --passed.md
+ - VERIFIED_FAIL → --failed.md (add "Next hypothesis:" to Conclusion)
+ ```
+
+ **Gate:**
+ ```
+ VERIFIED_PASS → Unblock, log "✓ Spike {task_id} verified"
+ VERIFIED_FAIL → Block, log "✗ Spike {task_id} failed verification"
+ If override: log "⚠ Agent incorrectly marked as passed"
+ ```
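The "rename experiment" step in the verifier prompt above is a single file move. A hypothetical sketch using `pathlib` (the `finalize_experiment` name is illustrative, not a deepflow API):

```python
from pathlib import Path

# Hypothetical sketch of the verifier's final rename step: an experiment
# recorded as --active.md becomes --passed.md or --failed.md depending
# on the verified status. File naming follows this document's convention.

def finalize_experiment(active_path: Path, verified_pass: bool) -> Path:
    suffix = "--passed.md" if verified_pass else "--failed.md"
    target = active_path.with_name(active_path.name.replace("--active.md", suffix))
    active_path.rename(target)
    return target
```

For example, `upload--streaming--active.md` would become `upload--streaming--failed.md` on a VERIFIED_FAIL.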

  **On failure, use Task tool to spawn reasoner:**
  ```
@@ -239,33 +322,25 @@ Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
  ```
  {task_id} [SPIKE]: {hypothesis}
  Type: spike
- Method: {minimal steps to validate}
- Success criteria: {how to know it passed}
- Time-box: {duration}
+ Method: {minimal steps}
+ Success criteria: {measurable targets}
  Experiment file: {worktree_absolute_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
- Spec: {spec_name}

- **IMPORTANT: Working Directory**
- All file operations MUST use this absolute path as base:
- {worktree_absolute_path}
-
- Example: To edit src/foo.ts, use:
- {worktree_absolute_path}/src/foo.ts
-
- Do NOT write files to the main project directory.
+ Working directory: {worktree_absolute_path}

- Execute the minimal validation:
- 1. Follow the method steps exactly
- 2. Measure against success criteria
- 3. Update experiment file with result:
-    - If passed: rename to --passed.md, record findings
-    - If failed: rename to --failed.md, record conclusion with "next hypothesis"
- 4. Commit as spike({spec}): validate {hypothesis}
- 5. Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
+ Steps:
+ 1. Execute method
+ 2. For EACH criterion: record target, measure actual, compare (show math)
+ 3. Write experiment as --active.md (verifier determines final status)
+ 4. Commit: spike({spec}): validate {hypothesis}
+ 5. Write result to .deepflow/results/{task_id}.yaml (see spike result schema)

- Result status:
- - success = hypothesis validated (passed)
- - failed = hypothesis invalidated (failed experiment, NOT agent error)
+ Rules:
+ - `met: true` ONLY if actual satisfies target
+ - `status: success` ONLY if ALL criteria met
+ - Worse than baseline = FAILED (baseline 7k, actual 1.5k → FAILED)
+ - "Close enough" = FAILED
+ - Verifier will check. False positives waste resources.
  ```
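The "compare (show math)" rule above amounts to parsing a target expression and comparing numbers. A minimal sketch; the `meets_target` helper and the operator/number/unit target format (e.g. ">= 7000 g/s") are assumptions drawn from this document's examples:

```python
import re

# Hypothetical helper for the "do the math" rule: parse a target like
# ">= 7000 g/s" or "< 500 MB" and compare it to a measured value string.
# Format assumed: comparison operator, number, optional unit.
_TARGET_RE = re.compile(r"(<=|>=|<|>|==)\s*([\d.]+)")

def meets_target(actual: str, target: str) -> bool:
    """True only if the numeric part of `actual` satisfies `target`."""
    op, threshold_str = _TARGET_RE.search(target).groups()
    value = float(re.search(r"[\d.]+", actual).group())
    threshold = float(threshold_str)
    return {
        "<": value < threshold,
        "<=": value <= threshold,
        ">": value > threshold,
        ">=": value >= threshold,
        "==": value == threshold,
    }[op]
```

With this, a measured "1500 g/s" against a target ">= 7000 g/s" evaluates to a failed criterion, matching the worked example in the schema above.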

  ### 8. FAILURE HANDLING
@@ -312,7 +387,18 @@ When all tasks done for a `doing-*` spec:

  ### 10. ITERATE

- Repeat until: all done, all blocked, or checkpoint.
+ After spawning agents, wait for results using TaskOutput. Call TaskOutput for ALL running agents in a SINGLE message (parallel wait).
+
+ ```python
+ # After spawning T1, T2, T3 in parallel, wait for all in parallel:
+ TaskOutput(task_id=t1_id) # These three calls go in ONE message
+ TaskOutput(task_id=t2_id)
+ TaskOutput(task_id=t3_id)
+ ```
+
+ Then check which tasks completed, update PLAN.md, identify newly unblocked tasks, spawn next wave.
+
+ Repeat until: all done, all blocked, or context ≥50% (checkpoint).
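The "identify newly unblocked tasks" step can be sketched as a readiness filter over the plan, following the rule stated earlier (unstarted, with all `blocked_by` complete). The dict shape and status strings here are illustrative; parsing PLAN.md is out of scope:

```python
# Sketch of the between-waves scheduling step: a task is ready when it
# is unstarted and every task it is blocked by is already done.

def ready_tasks(tasks: dict) -> list:
    """Return ids of tasks that are unstarted with all blockers done."""
    done = {tid for tid, t in tasks.items() if t["status"] == "done"}
    return [
        tid
        for tid, t in tasks.items()
        if t["status"] == "todo" and set(t.get("blocked_by", [])) <= done
    ]


# Example plan: T1 finished, so T2 and T3 unblock; T4 still waits on T2.
plan = {
    "T1": {"status": "done", "blocked_by": []},
    "T2": {"status": "todo", "blocked_by": ["T1"]},
    "T3": {"status": "todo", "blocked_by": ["T1"]},
    "T4": {"status": "todo", "blocked_by": ["T2"]},
}
```

Running the filter on this plan yields T2 and T3 as the next wave, which is exactly the wave structure shown in the transcripts below.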
 
  ## Rules

@@ -338,6 +424,8 @@ Wave 2: T3 (context: 48%)

  ✓ doing-upload → done-upload
  ✓ Complete: 3/3 tasks
+
+ Next: Run /df:verify to verify specs and merge to main
  ```

  ### Spike-First Execution
@@ -350,43 +438,65 @@ Checking experiment status...
  T2: Blocked by T1 (spike not validated)
  T3: Blocked by T1 (spike not validated)

- Wave 1: T1 [SPIKE] (context: 20%)
- T1: success (abc1234) → upload--streaming--passed.md
+ Wave 1: T1 [SPIKE] (context: 15%)
+ T1: complete, verifying...

- Checking experiment status...
- T2: Experiment passed, unblocked
- T3: Experiment passed, unblocked
+ Verifying T1...
+ Spike T1 verified (throughput 8500 >= 7000)
+ upload--streaming--passed.md

- Wave 2: T2, T3 parallel (context: 45%)
+ Wave 2: T2, T3 parallel (context: 40%)
  T2: success (def5678)
  T3: success (ghi9012)

  ✓ doing-upload → done-upload
  ✓ Complete: 3/3 tasks
+
+ Next: Run /df:verify to verify specs and merge to main
  ```

- ### Spike Failed
+ ### Spike Failed (Agent Correctly Reported)

  ```
  /df:execute (context: 10%)

- Wave 1: T1 [SPIKE] (context: 20%)
- T1: failed → upload--streaming--failed.md
+ Wave 1: T1 [SPIKE] (context: 15%)
+ T1: complete, verifying...

- Checking experiment status...
- T2: Blocked - Experiment failed
- T3: ⚠ Blocked - Experiment failed
+ Verifying T1...
+ Spike T1 failed verification (throughput 1500 < 7000)
+ upload--streaming--failed.md

  ⚠ Spike T1 invalidated hypothesis
- Experiment: upload--streaming--failed.md
- → Run /df:plan to generate new hypothesis spike
+ Complete: 1/3 tasks (2 blocked by failed experiment)

+ Next: Run /df:plan to generate new hypothesis spike
+ ```
+
+ ### Spike Failed (Verifier Override)
+
+ ```
+ /df:execute (context: 10%)
+
+ Wave 1: T1 [SPIKE] (context: 15%)
+ T1: complete (agent said: success), verifying...
+
+ Verifying T1...
+ ✗ Spike T1 failed verification (throughput 1500 < 7000)
+ ⚠ Agent incorrectly marked as passed — overriding to FAILED
+ → upload--streaming--failed.md
+
+ ⚠ Spike T1 invalidated hypothesis
  Complete: 1/3 tasks (2 blocked by failed experiment)
+
+ Next: Run /df:plan to generate new hypothesis spike
  ```

  ### With Checkpoint

  ```
  Wave 1 complete (context: 52%)
- Checkpoint saved. Run /df:execute --continue
+ Checkpoint saved.
+
+ Next: Run /df:execute --continue to resume execution
  ```
@@ -81,12 +81,15 @@ Include patterns in task descriptions for agents to follow.

  ### 4. ANALYZE CODEBASE

- **Use Task tool to spawn Explore agents in parallel:**
+ **Spawn ALL Explore agents in ONE message, then wait for ALL with TaskOutput in ONE message:**
  ```
- Task tool parameters:
- - subagent_type: "Explore"
- - model: "haiku"
- - run_in_background: true (for parallel execution)
+ // Spawn all in single message:
+ t1 = Task(subagent_type="Explore", model="haiku", run_in_background=true, prompt="...")
+ t2 = Task(subagent_type="Explore", model="haiku", run_in_background=true, prompt="...")
+
+ // Wait all in single message:
+ TaskOutput(task_id=t1)
+ TaskOutput(task_id=t2)
  ```

  Scale agent count based on codebase size:
@@ -6,7 +6,7 @@ You coordinate agents and ask questions. You never search code directly.

  **NEVER:** Read source files, use Glob/Grep directly, run git

- **ONLY:** Spawn agents, poll results, ask user questions, write spec file
+ **ONLY:** Spawn agents, use TaskOutput to get results, ask user questions, write spec file

  ---

@@ -31,12 +31,15 @@ Transform conversation context into a structured specification file.

  ### 1. GATHER CODEBASE CONTEXT

- **Use Task tool to spawn Explore agents in parallel:**
+ **Spawn ALL Explore agents in ONE message, then wait for ALL with TaskOutput in ONE message:**
  ```
- Task tool parameters:
- - subagent_type: "Explore"
- - model: "haiku"
- - run_in_background: true
+ // Spawn all in single message:
+ t1 = Task(subagent_type="Explore", model="haiku", run_in_background=true, prompt="...")
+ t2 = Task(subagent_type="Explore", model="haiku", run_in_background=true, prompt="...")
+
+ // Wait all in single message:
+ TaskOutput(task_id=t1)
+ TaskOutput(task_id=t2)
  ```

  Find:
@@ -91,12 +91,15 @@ Default: L1-L3 (L4 optional, can be slow)

  ## Agent Usage

- **Use Task tool to spawn Explore agents:**
+ **Spawn ALL Explore agents in ONE message, then wait for ALL with TaskOutput in ONE message:**
  ```
- Task tool parameters:
- - subagent_type: "Explore"
- - model: "haiku"
- - run_in_background: true (for parallel)
+ // Spawn all in single message:
+ t1 = Task(subagent_type="Explore", model="haiku", run_in_background=true, prompt="...")
+ t2 = Task(subagent_type="Explore", model="haiku", run_in_background=true, prompt="...")
+
+ // Wait all in single message:
+ TaskOutput(task_id=t1)
+ TaskOutput(task_id=t2)
  ```

  Scale: 1-2 agents per spec, cap 10.
@@ -157,4 +160,6 @@ rm .deepflow/checkpoint.json
  ✓ Merged df/doing-upload/20260202-1430 to main
  ✓ Cleaned up worktree and branch
  ✓ Spec complete: doing-upload → done-upload
+
+ Workflow complete! Ready for next feature: /df:spec <name>
  ```