deepflow 0.1.27 → 0.1.28
- package/package.json +1 -1
- package/src/commands/df/execute.md +120 -39
package/package.json
CHANGED
@@ -29,6 +29,7 @@ Implement tasks from PLAN.md with parallel agents, atomic commits, and context-e
 | Agent | subagent_type | model | Purpose |
 |-------|---------------|-------|---------|
 | Implementation | `general-purpose` | `sonnet` | Task implementation |
+| Spike Verifier | `reasoner` | `opus` | Verify spike pass/fail is correct |
 | Debugger | `reasoner` | `opus` | Debugging failures |
 
 ## Context-Aware Execution
@@ -57,6 +58,28 @@ commit: abc1234
 summary: "one line"
 ```
 
+**Spike result file** `.deepflow/results/{task_id}.yaml` (additional fields):
+```yaml
+task: T1
+type: spike
+status: success|failed
+commit: abc1234
+summary: "one line"
+criteria:
+  - name: "throughput"
+    target: ">= 7000 g/s"
+    actual: "1500 g/s"
+    met: false
+  - name: "memory usage"
+    target: "< 500 MB"
+    actual: "320 MB"
+    met: true
+all_criteria_met: false # ALL must be true for spike to pass
+experiment_file: ".deepflow/experiments/upload--streaming--failed.md"
+```
+
+**CRITICAL:** `status` MUST equal `success` only if `all_criteria_met: true`. The spike verifier will reject mismatches.
+
 ## Checkpoint & Resume
 
 **File:** `.deepflow/checkpoint.json` — stored in WORKTREE directory, not main.
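The invariant added in the hunk above (`status` may be `success` only when `all_criteria_met` is true, which in turn requires every criterion's `met` flag to be true) can be sketched as a small consistency check. This is a hedged illustration, not deepflow's code: the function name `check_spike_result` is hypothetical, the result is passed in as an already-parsed dict rather than read from the YAML file, and treating an empty `criteria` list as "not all met" is an assumption.

```python
def check_spike_result(result: dict) -> list[str]:
    """Return a list of inconsistencies found in a parsed spike result.

    Hypothetical helper: field names follow the spike result schema in the
    hunk above; the function itself is not part of deepflow.
    """
    problems = []
    criteria = result.get("criteria", [])
    # Assumption: a spike with no criteria cannot count as "all criteria met".
    every_met = bool(criteria) and all(c.get("met") is True for c in criteria)

    if result.get("all_criteria_met") != every_met:
        problems.append("all_criteria_met disagrees with per-criterion met flags")
    if result.get("status") == "success" and not every_met:
        problems.append("status is success but not all criteria are met")
    return problems


# The example values from the schema, as a parsed dict.
example = {
    "task": "T1",
    "type": "spike",
    "status": "failed",
    "criteria": [
        {"name": "throughput", "target": ">= 7000 g/s", "actual": "1500 g/s", "met": False},
        {"name": "memory usage", "target": "< 500 MB", "actual": "320 MB", "met": True},
    ],
    "all_criteria_met": False,
}

print(check_spike_result(example))                           # consistent result
print(check_spike_result({**example, "status": "success"}))  # mismatch reported
```

The second call models the mismatch the **CRITICAL** note warns about: one criterion is unmet, yet the agent reports `success`.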
@@ -202,9 +225,57 @@ Same-file conflicts: spawn sequentially instead.
 **Spike Task Execution:**
 When spawning a spike task, the agent MUST:
 1. Execute the minimal validation method
-2. Record
-3.
-4.
+2. Record structured criteria evaluation in result file (see spike result schema above)
+3. Write experiment file with `--active.md` status (verifier determines final status)
+4. Commit as `spike({spec}): validate {hypothesis}`
+
+**IMPORTANT:** Spike agent writes `--active.md`, NOT `--passed.md` or `--failed.md`. The verifier determines final status.
+
+### 6.5. VERIFY SPIKE RESULTS
+
+After spike completes, spawn verifier BEFORE unblocking implementation tasks.
+
+**Trigger:** Spike result file detected (`.deepflow/results/T{n}.yaml` with `type: spike`)
+
+**Spawn:**
+```
+Task(subagent_type="reasoner", model="opus", prompt=VERIFIER_PROMPT)
+```
+
+**Verifier Prompt:**
+```
+SPIKE VERIFICATION — Be skeptical. Catch false positives.
+
+Task: {task_id}
+Result: {worktree_path}/.deepflow/results/{task_id}.yaml
+Experiment: {worktree_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
+
+For each criterion in result file:
+1. Is `actual` a concrete number? (reject "good", "improved", "better")
+2. Does `actual` satisfy `target`? Do the math.
+3. Is `met` correct?
+
+Reject these patterns:
+- "Works but doesn't meet target" → FAILED
+- "Close enough" → FAILED
+- Actual 1500 vs Target >= 7000 → FAILED
+
+Output to {worktree_path}/.deepflow/results/{task_id}-verified.yaml:
+verified_status: VERIFIED_PASS|VERIFIED_FAIL
+override: true|false
+reason: "one line"
+
+Then rename experiment:
+- VERIFIED_PASS → --passed.md
+- VERIFIED_FAIL → --failed.md (add "Next hypothesis:" to Conclusion)
+```
+
+**Gate:**
+```
+VERIFIED_PASS → Unblock, log "✓ Spike {task_id} verified"
+VERIFIED_FAIL → Block, log "✗ Spike {task_id} failed verification"
+If override: log "⚠ Agent incorrectly marked as passed"
+```
 
 **On failure, use Task tool to spawn reasoner:**
 ```
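The verifier's per-criterion checks in the hunk above (is `actual` a concrete number, does it satisfy `target`) and the resulting pass/fail output can be sketched mechanically. This is only an illustration under stated assumptions: in the diff the verifier is an LLM agent driven by a prompt, not a script. The names `criterion_met` and `verify` are hypothetical, targets are assumed to look like `<operator> <number> <unit>`, units are assumed to match and are not compared, and `override` is taken to mean "agent said success, verifier says fail", following the gate's log message.

```python
import operator
import re

# Comparison operators assumed to appear at the start of a target string.
_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}
# A number must be followed by whitespace or end of string, so "1.5k" is rejected.
_TARGET = re.compile(r"^\s*(>=|<=|>|<)\s*([0-9]+(?:\.[0-9]+)?)(?=\s|$)")
_NUMBER = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)(?=\s|$)")


def criterion_met(target: str, actual: str) -> bool:
    """True only if `actual` is a concrete number that satisfies `target`."""
    t = _TARGET.match(target)
    a = _NUMBER.match(actual)
    if t is None or a is None:
        return False  # "good", "improved", "close enough" are rejected
    return _OPS[t.group(1)](float(a.group(1)), float(t.group(2)))


def verify(result: dict) -> dict:
    """Build the fields of the verified result from a parsed spike result."""
    criteria = result.get("criteria", [])
    failed = [c["name"] for c in criteria
              if not criterion_met(c["target"], c["actual"])]
    passed = bool(criteria) and not failed
    agent_said_success = result.get("status") == "success"
    return {
        "verified_status": "VERIFIED_PASS" if passed else "VERIFIED_FAIL",
        # Assumption: override flags a false positive reported by the agent.
        "override": agent_said_success and not passed,
        "reason": "all criteria met" if passed
                  else "not met: " + (", ".join(failed) or "no criteria"),
    }


report = verify({
    "status": "success",  # the agent claimed success
    "criteria": [
        {"name": "throughput", "target": ">= 7000 g/s", "actual": "1500 g/s"},
        {"name": "memory usage", "target": "< 500 MB", "actual": "320 MB"},
    ],
})
print(report)
```

Here the comparison is recomputed from `target` and `actual` rather than trusting the agent's `met` flags, which is the point of the "Do the math" instruction.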
@@ -239,33 +310,25 @@ Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
 ```
 {task_id} [SPIKE]: {hypothesis}
 Type: spike
-Method: {minimal steps
-Success criteria: {
-Time-box: {duration}
+Method: {minimal steps}
+Success criteria: {measurable targets}
 Experiment file: {worktree_absolute_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
-Spec: {spec_name}
-
-**IMPORTANT: Working Directory**
-All file operations MUST use this absolute path as base:
-{worktree_absolute_path}
 
-
-{worktree_absolute_path}/src/foo.ts
-
-Do NOT write files to the main project directory.
+Working directory: {worktree_absolute_path}
 
-
-1.
-2.
-3.
-
-
-4. Commit as spike({spec}): validate {hypothesis}
-5. Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
+Steps:
+1. Execute method
+2. For EACH criterion: record target, measure actual, compare (show math)
+3. Write experiment as --active.md (verifier determines final status)
+4. Commit: spike({spec}): validate {hypothesis}
+5. Write result to .deepflow/results/{task_id}.yaml (see spike result schema)
 
-
--
--
+Rules:
+- `met: true` ONLY if actual satisfies target
+- `status: success` ONLY if ALL criteria met
+- Worse than baseline = FAILED (baseline 7k, actual 1.5k → FAILED)
+- "Close enough" = FAILED
+- Verifier will check. False positives waste resources.
 ```
 
 ### 8. FAILURE HANDLING
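The experiment file naming used in this prompt (`{topic}--{hypothesis}--active.md` while the spike runs, with the final suffix chosen after verification) can be illustrated with a small pure function. This is a sketch under assumptions, not deepflow's code: the name `final_experiment_name` is hypothetical, and it only computes the new file name without touching the filesystem.

```python
def final_experiment_name(active_name: str, verified_status: str) -> str:
    """Map an --active.md experiment file name to its post-verification name.

    Hypothetical helper illustrating the rename convention in the diff.
    """
    suffix = "--active.md"
    outcomes = {"VERIFIED_PASS": "--passed.md", "VERIFIED_FAIL": "--failed.md"}
    if not active_name.endswith(suffix):
        raise ValueError(f"not an active experiment file: {active_name}")
    if verified_status not in outcomes:
        raise ValueError(f"unknown verified_status: {verified_status}")
    # Swap only the trailing status suffix; topic and hypothesis stay intact.
    return active_name[: -len(suffix)] + outcomes[verified_status]


print(final_experiment_name("upload--streaming--active.md", "VERIFIED_FAIL"))
```

Keeping the spike agent on `--active.md` means only one party, the verifier, ever decides between the two final suffixes.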
@@ -350,14 +413,14 @@ Checking experiment status...
 T2: Blocked by T1 (spike not validated)
 T3: Blocked by T1 (spike not validated)
 
-Wave 1: T1 [SPIKE] (context:
-T1:
+Wave 1: T1 [SPIKE] (context: 15%)
+T1: complete, verifying...
 
-
-
-
+Verifying T1...
+✓ Spike T1 verified (throughput 8500 >= 7000)
+→ upload--streaming--passed.md
 
-Wave 2: T2, T3 parallel (context:
+Wave 2: T2, T3 parallel (context: 40%)
 T2: success (def5678)
 T3: success (ghi9012)
 
@@ -365,20 +428,38 @@ Wave 2: T2, T3 parallel (context: 45%)
 ✓ Complete: 3/3 tasks
 ```
 
-### Spike Failed
+### Spike Failed (Agent Correctly Reported)
 
 ```
 /df:execute (context: 10%)
 
-Wave 1: T1 [SPIKE] (context:
-T1:
+Wave 1: T1 [SPIKE] (context: 15%)
+T1: complete, verifying...
 
-
-
-
+Verifying T1...
+✗ Spike T1 failed verification (throughput 1500 < 7000)
+→ upload--streaming--failed.md
+
+⚠ Spike T1 invalidated hypothesis
+→ Run /df:plan to generate new hypothesis spike
+
+Complete: 1/3 tasks (2 blocked by failed experiment)
+```
+
+### Spike Failed (Verifier Override)
+
+```
+/df:execute (context: 10%)
+
+Wave 1: T1 [SPIKE] (context: 15%)
+T1: complete (agent said: success), verifying...
+
+Verifying T1...
+✗ Spike T1 failed verification (throughput 1500 < 7000)
+⚠ Agent incorrectly marked as passed — overriding to FAILED
+→ upload--streaming--failed.md
 
 ⚠ Spike T1 invalidated hypothesis
-Experiment: upload--streaming--failed.md
 → Run /df:plan to generate new hypothesis spike
 
 Complete: 1/3 tasks (2 blocked by failed experiment)