deepflow 0.1.27 → 0.1.28

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "deepflow",
- "version": "0.1.27",
+ "version": "0.1.28",
  "description": "Stay in flow state - lightweight spec-driven task orchestration for Claude Code",
  "keywords": [
  "claude",
@@ -29,6 +29,7 @@ Implement tasks from PLAN.md with parallel agents, atomic commits, and context-e
  | Agent | subagent_type | model | Purpose |
  |-------|---------------|-------|---------|
  | Implementation | `general-purpose` | `sonnet` | Task implementation |
+ | Spike Verifier | `reasoner` | `opus` | Verify spike pass/fail is correct |
  | Debugger | `reasoner` | `opus` | Debugging failures |
 
  ## Context-Aware Execution
@@ -57,6 +58,28 @@ commit: abc1234
  summary: "one line"
  ```
 
+ **Spike result file** `.deepflow/results/{task_id}.yaml` (additional fields):
+ ```yaml
+ task: T1
+ type: spike
+ status: success|failed
+ commit: abc1234
+ summary: "one line"
+ criteria:
+   - name: "throughput"
+     target: ">= 7000 g/s"
+     actual: "1500 g/s"
+     met: false
+   - name: "memory usage"
+     target: "< 500 MB"
+     actual: "320 MB"
+     met: true
+ all_criteria_met: false # ALL must be true for spike to pass
+ experiment_file: ".deepflow/experiments/upload--streaming--failed.md"
+ ```
+
+ **CRITICAL:** `status` MUST equal `success` only if `all_criteria_met: true`. The spike verifier will reject mismatches.
+
  ## Checkpoint & Resume
 
  **File:** `.deepflow/checkpoint.json` — stored in WORKTREE directory, not main.
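The CRITICAL rule in the hunk above is mechanical enough to check in code. The sketch below is a hypothetical Python model of the spike result file; `Criterion`, `SpikeResult` and `consistency_errors` are illustrative names, not part of deepflow, which states this rule only as prompt text. It assumes the YAML has already been parsed into these fields, and it enforces only the direction the doc states: `status: success` requires `all_criteria_met: true`, and `all_criteria_met` must agree with the per-criterion `met` flags.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    # One entry under `criteria:` in .deepflow/results/{task_id}.yaml
    name: str
    target: str
    actual: str
    met: bool


@dataclass
class SpikeResult:
    # Only the fields this check needs; the real file also has commit, summary, etc.
    task: str
    status: str  # "success" or "failed"
    criteria: list[Criterion] = field(default_factory=list)
    all_criteria_met: bool = False


def consistency_errors(result: SpikeResult) -> list[str]:
    """Return the mismatches a verifier would reject; an empty list means consistent."""
    errors: list[str] = []
    # Assumption: a spike with no criteria cannot count as "all met".
    every_met = bool(result.criteria) and all(c.met for c in result.criteria)
    if result.all_criteria_met != every_met:
        errors.append(
            f"all_criteria_met is {result.all_criteria_met}, "
            f"but the per-criterion met flags give {every_met}"
        )
    if result.status == "success" and not result.all_criteria_met:
        errors.append("status is success but all_criteria_met is false")
    return errors


# The schema example: throughput missed, memory met, status correctly "failed".
example = SpikeResult(
    task="T1",
    status="failed",
    criteria=[
        Criterion("throughput", ">= 7000 g/s", "1500 g/s", met=False),
        Criterion("memory usage", "< 500 MB", "320 MB", met=True),
    ],
    all_criteria_met=False,
)
print(consistency_errors(example))  # []
```

Flipping `status` to `"success"` on the same data yields one error, which is the kind of mismatch the spike verifier is described as rejecting.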
@@ -202,9 +225,57 @@ Same-file conflicts: spawn sequentially instead.
  **Spike Task Execution:**
  When spawning a spike task, the agent MUST:
  1. Execute the minimal validation method
- 2. Record result in experiment file (update status: `--passed.md` or `--failed.md`)
- 3. If passed: implementation tasks become unblocked
- 4. If failed: record conclusion with "next hypothesis" for future planning
+ 2. Record structured criteria evaluation in result file (see spike result schema above)
+ 3. Write experiment file with `--active.md` status (verifier determines final status)
+ 4. Commit as `spike({spec}): validate {hypothesis}`
+
+ **IMPORTANT:** Spike agent writes `--active.md`, NOT `--passed.md` or `--failed.md`. The verifier determines final status.
+
+ ### 6.5. VERIFY SPIKE RESULTS
+
+ After spike completes, spawn verifier BEFORE unblocking implementation tasks.
+
+ **Trigger:** Spike result file detected (`.deepflow/results/T{n}.yaml` with `type: spike`)
+
+ **Spawn:**
+ ```
+ Task(subagent_type="reasoner", model="opus", prompt=VERIFIER_PROMPT)
+ ```
+
+ **Verifier Prompt:**
+ ```
+ SPIKE VERIFICATION — Be skeptical. Catch false positives.
+
+ Task: {task_id}
+ Result: {worktree_path}/.deepflow/results/{task_id}.yaml
+ Experiment: {worktree_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
+
+ For each criterion in result file:
+ 1. Is `actual` a concrete number? (reject "good", "improved", "better")
+ 2. Does `actual` satisfy `target`? Do the math.
+ 3. Is `met` correct?
+
+ Reject these patterns:
+ - "Works but doesn't meet target" → FAILED
+ - "Close enough" → FAILED
+ - Actual 1500 vs Target >= 7000 → FAILED
+
+ Output to {worktree_path}/.deepflow/results/{task_id}-verified.yaml:
+ verified_status: VERIFIED_PASS|VERIFIED_FAIL
+ override: true|false
+ reason: "one line"
+
+ Then rename experiment:
+ - VERIFIED_PASS → --passed.md
+ - VERIFIED_FAIL → --failed.md (add "Next hypothesis:" to Conclusion)
+ ```
+
+ **Gate:**
+ ```
+ VERIFIED_PASS → Unblock, log "✓ Spike {task_id} verified"
+ VERIFIED_FAIL → Block, log "✗ Spike {task_id} failed verification"
+ If override: log "⚠ Agent incorrectly marked as passed"
+ ```
 
  **On failure, use Task tool to spawn reasoner:**
  ```
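The verifier prompt asks the model to confirm that each `actual` is a concrete number and to "do the math" against `target`. A minimal Python sketch of that arithmetic follows. It assumes targets always look like `<operator> <number> <unit>` (as in `">= 7000 g/s"` and `"< 500 MB"`) and actuals like `<number> <unit>`; `criterion_met` and `verify` are hypothetical names, and the returned dict only mirrors the shape of `{task_id}-verified.yaml`. The sketch recomputes `met` instead of trusting the agent's flag, and reads `override` as "the agent reported success but the numbers say otherwise", which is how the Gate block uses it.

```python
import re

# Comparators assumed to appear in targets; the doc's examples use ">=" and "<".
_HOLDS = {
    ">=": lambda actual, target: actual >= target,
    "<=": lambda actual, target: actual <= target,
    ">": lambda actual, target: actual > target,
    "<": lambda actual, target: actual < target,
}
_TARGET = re.compile(r"^\s*(>=|<=|>|<)\s*(\d+(?:\.\d+)?)\s*(.*?)\s*$")
_ACTUAL = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(.*?)\s*$")


def criterion_met(target: str, actual: str) -> bool:
    """Do the math for one criterion; raise ValueError if it cannot be done."""
    t = _TARGET.match(target)
    if t is None:
        raise ValueError(f"unparseable target: {target!r}")
    a = _ACTUAL.match(actual)
    if a is None:
        # Rejects vague values such as "good", "improved" or "better".
        raise ValueError(f"actual is not a concrete number: {actual!r}")
    op, target_value, target_unit = t.group(1), float(t.group(2)), t.group(3)
    actual_value, actual_unit = float(a.group(1)), a.group(2)
    if target_unit != actual_unit:
        raise ValueError(f"unit mismatch: {actual_unit!r} vs {target_unit!r}")
    return _HOLDS[op](actual_value, target_value)


def verify(agent_status: str, criteria: list[dict]) -> dict:
    """Build a dict shaped like {task_id}-verified.yaml from the agent's criteria."""
    reasons: list[str] = []
    for c in criteria:
        try:
            if not criterion_met(c["target"], c["actual"]):
                reasons.append(f"{c['name']}: {c['actual']} does not satisfy {c['target']}")
        except ValueError as err:
            reasons.append(f"{c['name']}: {err}")
    passed = bool(criteria) and not reasons
    return {
        "verified_status": "VERIFIED_PASS" if passed else "VERIFIED_FAIL",
        # The agent's own `met` flags are ignored here: the numbers decide.
        "override": agent_status == "success" and not passed,
        "reason": reasons[0] if reasons else "all criteria met",
    }


# The "Actual 1500 vs Target >= 7000" case from the prompt, with the agent claiming success.
claimed = [{"name": "throughput", "target": ">= 7000 g/s", "actual": "1500 g/s", "met": True}]
print(verify("success", claimed)["verified_status"])  # VERIFIED_FAIL
```

Raising on unparseable input, rather than guessing, keeps "Close enough" and "improved" style answers on the FAILED side, in line with the prompt's reject list.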
@@ -239,33 +310,25 @@ Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
  ```
  {task_id} [SPIKE]: {hypothesis}
  Type: spike
- Method: {minimal steps to validate}
- Success criteria: {how to know it passed}
- Time-box: {duration}
+ Method: {minimal steps}
+ Success criteria: {measurable targets}
  Experiment file: {worktree_absolute_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
- Spec: {spec_name}
-
- **IMPORTANT: Working Directory**
- All file operations MUST use this absolute path as base:
- {worktree_absolute_path}
 
- Example: To edit src/foo.ts, use:
- {worktree_absolute_path}/src/foo.ts
-
- Do NOT write files to the main project directory.
+ Working directory: {worktree_absolute_path}
 
- Execute the minimal validation:
- 1. Follow the method steps exactly
- 2. Measure against success criteria
- 3. Update experiment file with result:
- - If passed: rename to --passed.md, record findings
- - If failed: rename to --failed.md, record conclusion with "next hypothesis"
- 4. Commit as spike({spec}): validate {hypothesis}
- 5. Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
+ Steps:
+ 1. Execute method
+ 2. For EACH criterion: record target, measure actual, compare (show math)
+ 3. Write experiment as --active.md (verifier determines final status)
+ 4. Commit: spike({spec}): validate {hypothesis}
+ 5. Write result to .deepflow/results/{task_id}.yaml (see spike result schema)
 
- Result status:
- - success = hypothesis validated (passed)
- - failed = hypothesis invalidated (failed experiment, NOT agent error)
+ Rules:
+ - `met: true` ONLY if actual satisfies target
+ - `status: success` ONLY if ALL criteria met
+ - Worse than baseline = FAILED (baseline 7k, actual 1.5k → FAILED)
+ - "Close enough" = FAILED
+ - Verifier will check. False positives waste resources.
  ```
 
  ### 8. FAILURE HANDLING
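Step 2 of the spike prompt asks the agent to compare each criterion and "show math", and the example runs later in this diff log that comparison as `throughput 8500 >= 7000` or `throughput 1500 < 7000`. A hypothetical Python helper for producing such a line is sketched below. It assumes the same `<operator> <number> <unit>` target format as the schema example, drops the units as the example logs do, and prints the operator that actually holds, negating the target's operator when the criterion fails.

```python
import re

# Assumed formats: target "<op> <number> <unit>", actual "<number> <unit>".
_TARGET = re.compile(r"^\s*(>=|<=|>|<)\s*(\d+(?:\.\d+)?)")
_ACTUAL = re.compile(r"^\s*(\d+(?:\.\d+)?)")
_HOLDS = {
    ">=": lambda a, t: a >= t,
    "<=": lambda a, t: a <= t,
    ">": lambda a, t: a > t,
    "<": lambda a, t: a < t,
}
# If "a >= t" is false then "a < t" is true, and likewise for each operator.
_NEGATE = {">=": "<", "<": ">=", ">": "<=", "<=": ">"}


def show_math(name: str, target: str, actual: str) -> str:
    """Render '<name> <actual> <op> <target>' using the operator that really holds."""
    t, a = _TARGET.match(target), _ACTUAL.match(actual)
    if t is None or a is None:
        raise ValueError(f"{name}: cannot compare {actual!r} with {target!r}")
    op, target_num, actual_num = t.group(1), t.group(2), a.group(1)
    holds = _HOLDS[op](float(actual_num), float(target_num))
    return f"{name} {actual_num} {op if holds else _NEGATE[op]} {target_num}"


print(show_math("throughput", ">= 7000 g/s", "8500 g/s"))  # throughput 8500 >= 7000
print(show_math("throughput", ">= 7000 g/s", "1500 g/s"))  # throughput 1500 < 7000
```

Printing the operator that holds, rather than the target's operator followed by a pass/fail word, makes a misreported `met` flag visible at a glance.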
@@ -350,14 +413,14 @@ Checking experiment status...
  T2: Blocked by T1 (spike not validated)
  T3: Blocked by T1 (spike not validated)
 
- Wave 1: T1 [SPIKE] (context: 20%)
- T1: success (abc1234) → upload--streaming--passed.md
+ Wave 1: T1 [SPIKE] (context: 15%)
+ T1: complete, verifying...
 
- Checking experiment status...
- T2: Experiment passed, unblocked
- T3: Experiment passed, unblocked
+ Verifying T1...
+ Spike T1 verified (throughput 8500 >= 7000)
+ upload--streaming--passed.md
 
- Wave 2: T2, T3 parallel (context: 45%)
+ Wave 2: T2, T3 parallel (context: 40%)
  T2: success (def5678)
  T3: success (ghi9012)
 
@@ -365,20 +428,38 @@ Wave 2: T2, T3 parallel (context: 45%)
  ✓ Complete: 3/3 tasks
  ```
 
- ### Spike Failed
+ ### Spike Failed (Agent Correctly Reported)
 
  ```
  /df:execute (context: 10%)
 
- Wave 1: T1 [SPIKE] (context: 20%)
- T1: failed → upload--streaming--failed.md
+ Wave 1: T1 [SPIKE] (context: 15%)
+ T1: complete, verifying...
 
- Checking experiment status...
- T2: Blocked - Experiment failed
- T3: ⚠ Blocked - Experiment failed
+ Verifying T1...
+ Spike T1 failed verification (throughput 1500 < 7000)
+ upload--streaming--failed.md
+
+ ⚠ Spike T1 invalidated hypothesis
+ → Run /df:plan to generate new hypothesis spike
+
+ Complete: 1/3 tasks (2 blocked by failed experiment)
+ ```
+
+ ### Spike Failed (Verifier Override)
+
+ ```
+ /df:execute (context: 10%)
+
+ Wave 1: T1 [SPIKE] (context: 15%)
+ T1: complete (agent said: success), verifying...
+
+ Verifying T1...
+ ✗ Spike T1 failed verification (throughput 1500 < 7000)
+ ⚠ Agent incorrectly marked as passed — overriding to FAILED
+ → upload--streaming--failed.md
 
  ⚠ Spike T1 invalidated hypothesis
- Experiment: upload--streaming--failed.md
  → Run /df:plan to generate new hypothesis spike
 
  Complete: 1/3 tasks (2 blocked by failed experiment)
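The Gate block and the "Spike Failed" runs above amount to a small decision table: the verdict picks the log line and whether dependent tasks unblock, `override` adds a warning, and the experiment file moves from `--active.md` to `--passed.md` or `--failed.md`. The Python sketch below models that table under stated assumptions: `apply_gate` and `final_experiment_name` are hypothetical names, the log strings are copied from the Gate block (the example runs additionally append the comparison, e.g. `(throughput 1500 < 7000)`), and it only computes the new file name instead of renaming anything on disk.

```python
# Verdict -> experiment suffix, from "Then rename experiment" in the verifier prompt.
_SUFFIX = {"VERIFIED_PASS": "passed", "VERIFIED_FAIL": "failed"}
_ACTIVE = "--active.md"


def final_experiment_name(active_name: str, verified_status: str) -> str:
    """Map '{topic}--{hypothesis}--active.md' to its --passed.md or --failed.md name."""
    if not active_name.endswith(_ACTIVE):
        raise ValueError(f"not an active experiment file: {active_name}")
    return f"{active_name[: -len(_ACTIVE)]}--{_SUFFIX[verified_status]}.md"


def apply_gate(
    task_id: str, verified: dict, active_name: str, dependents: list[str]
) -> tuple[list[str], list[str]]:
    """Return (log lines, task ids that may now run) for one verified spike."""
    status = verified["verified_status"]
    if status == "VERIFIED_PASS":
        log = [f"✓ Spike {task_id} verified"]
        unblocked = list(dependents)
    else:
        log = [f"✗ Spike {task_id} failed verification"]
        unblocked = []  # dependents stay blocked (the runs above then suggest /df:plan)
    if verified.get("override"):
        log.append("⚠ Agent incorrectly marked as passed")
    log.append(f"→ {final_experiment_name(active_name, status)}")
    return log, unblocked


# The "Verifier Override" run: agent claimed success, verifier says otherwise.
dependents = ["T2", "T3"]
log, unblocked = apply_gate(
    "T1",
    {"verified_status": "VERIFIED_FAIL", "override": True},
    "upload--streaming--active.md",
    dependents,
)
for line in log:
    print(line)
print(f"{len(dependents) - len(unblocked)} blocked by failed experiment")
```

Keeping the gate a pure function of the verified result means the orchestrator never has to trust the spike agent's own `status` when deciding what to unblock.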