deepflow 0.1.26 → 0.1.28
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/src/commands/df/execute.md +217 -39
- package/src/commands/df/verify.md +43 -0
- package/templates/config-template.yaml +18 -0
package/package.json
CHANGED

package/src/commands/df/execute.md
CHANGED

@@ -29,6 +29,7 @@ Implement tasks from PLAN.md with parallel agents, atomic commits, and context-e
| Agent | subagent_type | model | Purpose |
|-------|---------------|-------|---------|
| Implementation | `general-purpose` | `sonnet` | Task implementation |
+ | Spike Verifier | `reasoner` | `opus` | Verify spike pass/fail is correct |
| Debugger | `reasoner` | `opus` | Debugging failures |

## Context-Aware Execution
@@ -57,24 +58,90 @@ commit: abc1234
summary: "one line"
```

+ **Spike result file** `.deepflow/results/{task_id}.yaml` (additional fields):
+ ```yaml
+ task: T1
+ type: spike
+ status: success|failed
+ commit: abc1234
+ summary: "one line"
+ criteria:
+   - name: "throughput"
+     target: ">= 7000 g/s"
+     actual: "1500 g/s"
+     met: false
+   - name: "memory usage"
+     target: "< 500 MB"
+     actual: "320 MB"
+     met: true
+ all_criteria_met: false # ALL must be true for spike to pass
+ experiment_file: ".deepflow/experiments/upload--streaming--failed.md"
+ ```
+
+ **CRITICAL:** `status` MUST equal `success` only if `all_criteria_met: true`. The spike verifier will reject mismatches.
+
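The CRITICAL rule above (status may only be `success` when `all_criteria_met` is true) is mechanical enough to check with a one-off script. A minimal sketch, assuming the flat key layout shown in the schema; the script and its invocation are illustrative, not part of the package:

```bash
#!/usr/bin/env bash
# Sketch: flag a spike result whose status contradicts all_criteria_met.
# Assumes the flat YAML layout shown above (top-level status / all_criteria_met keys).
result_file="${1:-.deepflow/results/T1.yaml}"

status=$(grep -E '^status:' "$result_file" | awk '{print $2}')
all_met=$(grep -E '^all_criteria_met:' "$result_file" | awk '{print $2}')

if [ "$status" = "success" ] && [ "$all_met" != "true" ]; then
  echo "MISMATCH: status=success but all_criteria_met=$all_met" >&2
  exit 1
fi
echo "OK: status=$status, all_criteria_met=$all_met"
```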
## Checkpoint & Resume
|
|
61
84
|
|
|
62
|
-
**File:** `.deepflow/checkpoint.json` —
|
|
85
|
+
**File:** `.deepflow/checkpoint.json` — stored in WORKTREE directory, not main.
|
|
63
86
|
|
|
64
|
-
**
|
|
65
|
-
|
|
87
|
+
**Schema:**
|
|
88
|
+
```json
|
|
89
|
+
{
|
|
90
|
+
"completed_tasks": ["T1", "T2"],
|
|
91
|
+
"current_wave": 2,
|
|
92
|
+
"worktree_path": ".deepflow/worktrees/df/doing-upload/20260202-1430",
|
|
93
|
+
"worktree_branch": "df/doing-upload/20260202-1430"
|
|
94
|
+
}
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
**On checkpoint:** Complete wave → update PLAN.md → save to worktree → exit.
|
|
98
|
+
**Resume:** `--continue` loads checkpoint, verifies worktree, skips completed tasks.
|
|
66
99
|
|
|
67
100
|
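As a rough illustration of what `--continue` does with this file, the resume check amounts to reading the two worktree fields back out of checkpoint.json and confirming the worktree still exists on disk. A sketch assuming `jq` is available; not the command's actual implementation:

```bash
#!/usr/bin/env bash
# Sketch of the --continue resume check described above (illustrative only).
checkpoint=".deepflow/checkpoint.json"

[ -f "$checkpoint" ] || { echo "No checkpoint found; starting fresh."; exit 0; }

worktree_path=$(jq -r '.worktree_path' "$checkpoint")
worktree_branch=$(jq -r '.worktree_branch' "$checkpoint")

# Checkpoint references a worktree that was deleted out from under us.
if [ ! -d "$worktree_path" ]; then
  echo "Worktree deleted. Use --fresh" >&2
  exit 1
fi

echo "Resuming in $worktree_path (branch $worktree_branch)"
echo "Skipping completed tasks: $(jq -r '.completed_tasks | join(", ")' "$checkpoint")"
```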
## Behavior

### 1. CHECK CHECKPOINT

```
- --continue → Load
+ --continue → Load checkpoint
+              → If worktree_path exists:
+                → Verify worktree still exists on disk
+                → If missing: Error "Worktree deleted. Use --fresh"
+                → If exists: Use it, skip worktree creation
+              → Resume execution with completed tasks
--fresh → Delete checkpoint, start fresh
checkpoint exists → Prompt: "Resume? (y/n)"
else → Start fresh
```

+ ### 1.5. CREATE WORKTREE
+
+ Before spawning any agents, create an isolated worktree:
+
+ ```
+ # Check main is clean (ignore untracked)
+ git diff --quiet HEAD || Error: "Main has uncommitted changes. Commit or stash first."
+
+ # Generate worktree path
+ SPEC_NAME=$(basename spec/doing-*.md .md | sed 's/doing-//')
+ TIMESTAMP=$(date +%Y%m%d-%H%M)
+ BRANCH_NAME="df/${SPEC_NAME}/${TIMESTAMP}"
+ WORKTREE_PATH=".deepflow/worktrees/${BRANCH_NAME}"
+
+ # Create worktree
+ git worktree add -b "${BRANCH_NAME}" "${WORKTREE_PATH}"
+
+ # Store in checkpoint for resume
+ checkpoint.worktree_path = WORKTREE_PATH
+ checkpoint.worktree_branch = BRANCH_NAME
+ ```
+
+ **Resume handling:**
+ - If checkpoint has worktree_path → verify it exists, use it
+ - If worktree missing → Error: "Worktree deleted. Use --fresh"
+
+ **Existing worktree handling:**
+ - If worktree exists for same spec → Prompt: "Resume existing worktree? (y/n/delete)"
+
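The existing-worktree prompt in the last bullet reduces to checking whether a prior worktree directory for the same spec is already present under the base path. A rough sketch under that assumption (the prompt wording comes from the doc, everything else is illustrative):

```bash
#!/usr/bin/env bash
# Sketch: detect a leftover worktree for the current spec before creating a new one.
SPEC_NAME=$(basename spec/doing-*.md .md | sed 's/doing-//')

# Worktrees for this spec live at .deepflow/worktrees/df/<spec>/<timestamp>
existing=$(ls -d ".deepflow/worktrees/df/${SPEC_NAME}/"*/ 2>/dev/null | head -n 1)

if [ -n "$existing" ]; then
  read -r -p "Resume existing worktree? (y/n/delete) " answer
  case "$answer" in
    y)      echo "Reusing ${existing}" ;;
    delete) git worktree remove --force "${existing}" ;;
    *)      echo "Leaving ${existing} alone; continuing with a fresh worktree" ;;
  esac
fi
```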
### 2. LOAD PLAN

```

@@ -158,9 +225,57 @@ Same-file conflicts: spawn sequentially instead.
**Spike Task Execution:**
When spawning a spike task, the agent MUST:
1. Execute the minimal validation method
- 2. Record
- 3.
- 4.
+ 2. Record structured criteria evaluation in result file (see spike result schema above)
+ 3. Write experiment file with `--active.md` status (verifier determines final status)
+ 4. Commit as `spike({spec}): validate {hypothesis}`
+
+ **IMPORTANT:** Spike agent writes `--active.md`, NOT `--passed.md` or `--failed.md`. The verifier determines final status.
+
+ ### 6.5. VERIFY SPIKE RESULTS
+
+ After spike completes, spawn verifier BEFORE unblocking implementation tasks.
+
+ **Trigger:** Spike result file detected (`.deepflow/results/T{n}.yaml` with `type: spike`)
+
+ **Spawn:**
+ ```
+ Task(subagent_type="reasoner", model="opus", prompt=VERIFIER_PROMPT)
+ ```
+
+ **Verifier Prompt:**
+ ```
+ SPIKE VERIFICATION — Be skeptical. Catch false positives.
+
+ Task: {task_id}
+ Result: {worktree_path}/.deepflow/results/{task_id}.yaml
+ Experiment: {worktree_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
+
+ For each criterion in result file:
+ 1. Is `actual` a concrete number? (reject "good", "improved", "better")
+ 2. Does `actual` satisfy `target`? Do the math.
+ 3. Is `met` correct?
+
+ Reject these patterns:
+ - "Works but doesn't meet target" → FAILED
+ - "Close enough" → FAILED
+ - Actual 1500 vs Target >= 7000 → FAILED
+
+ Output to {worktree_path}/.deepflow/results/{task_id}-verified.yaml:
+ verified_status: VERIFIED_PASS|VERIFIED_FAIL
+ override: true|false
+ reason: "one line"
+
+ Then rename experiment:
+ - VERIFIED_PASS → --passed.md
+ - VERIFIED_FAIL → --failed.md (add "Next hypothesis:" to Conclusion)
+ ```
+
+ **Gate:**
+ ```
+ VERIFIED_PASS → Unblock, log "✓ Spike {task_id} verified"
+ VERIFIED_FAIL → Block, log "✗ Spike {task_id} failed verification"
+ If override: log "⚠ Agent incorrectly marked as passed"
+ ```

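Item 2 of the verifier prompt ("Does actual satisfy target? Do the math.") is, at bottom, a numeric comparison per criterion. A minimal sketch of that comparison, assuming the operator and numbers have already been pulled out of the criterion's target/actual strings; illustrative only:

```bash
#!/usr/bin/env bash
# Sketch: evaluate one criterion, e.g. target ">= 7000" against actual "1500".
check_criterion() {
  local op="$1" target="$2" actual="$3"
  awk -v a="$actual" -v t="$target" -v op="$op" 'BEGIN {
    if      (op == ">=") met = (a >= t)
    else if (op == "<=") met = (a <= t)
    else if (op == ">")  met = (a >  t)
    else if (op == "<")  met = (a <  t)
    else                 met = (a == t)
    print (met ? "met: true" : "met: false")
    exit !met
  }'
}

check_criterion ">=" 7000 1500   # met: false → throughput criterion fails
check_criterion "<"  500  320    # met: true  → memory criterion passes
```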
**On failure, use Task tool to spawn reasoner:**
```

@@ -178,42 +293,87 @@ Task tool parameters:
Files: {target files}
Spec: {spec_name}

+ **IMPORTANT: Working Directory**
+ All file operations MUST use this absolute path as base:
+ {worktree_absolute_path}
+
+ Example: To edit src/foo.ts, use:
+ {worktree_absolute_path}/src/foo.ts
+
+ Do NOT write files to the main project directory.
+
Implement, test, commit as feat({spec}): {description}.
- Write result to
+ Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
```

**Spike Task:**
```
{task_id} [SPIKE]: {hypothesis}
Type: spike
- Method: {minimal steps
- Success criteria: {
-
-
-
+ Method: {minimal steps}
+ Success criteria: {measurable targets}
+ Experiment file: {worktree_absolute_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
+
+ Working directory: {worktree_absolute_path}
+
+ Steps:
+ 1. Execute method
+ 2. For EACH criterion: record target, measure actual, compare (show math)
+ 3. Write experiment as --active.md (verifier determines final status)
+ 4. Commit: spike({spec}): validate {hypothesis}
+ 5. Write result to .deepflow/results/{task_id}.yaml (see spike result schema)
+
+ Rules:
+ - `met: true` ONLY if actual satisfies target
+ - `status: success` ONLY if ALL criteria met
+ - Worse than baseline = FAILED (baseline 7k, actual 1.5k → FAILED)
+ - "Close enough" = FAILED
+ - Verifier will check. False positives waste resources.
+ ```
+
+ ### 8. FAILURE HANDLING

-
- 1. Follow the method steps exactly
- 2. Measure against success criteria
- 3. Update experiment file with result:
-    - If passed: rename to --passed.md, record findings
-    - If failed: rename to --failed.md, record conclusion with "next hypothesis"
- 4. Commit as spike({spec}): validate {hypothesis}
- 5. Write result to .deepflow/results/{task_id}.yaml
+ When a task fails and cannot be auto-fixed:

-
-
-
+ **Behavior:**
+ 1. Leave worktree intact at `{worktree_path}`
+ 2. Keep checkpoint.json for potential resume
+ 3. Output debugging instructions
+
+ **Output:**
```
+ ✗ Task T3 failed after retry
+
+ Worktree preserved for debugging:
+   Path: .deepflow/worktrees/df/doing-upload/20260202-1430
+   Branch: df/doing-upload/20260202-1430
+
+ To investigate:
+   cd .deepflow/worktrees/df/doing-upload/20260202-1430
+   # examine files, run tests, etc.
+
+ To resume after fixing:
+   /df:execute --continue
+
+ To discard and start fresh:
+   git worktree remove --force .deepflow/worktrees/df/doing-upload/20260202-1430
+   git branch -D df/doing-upload/20260202-1430
+   /df:execute --fresh
+ ```
+
+ **Key points:**
+ - Never auto-delete worktree on failure (cleanup_on_fail: false by default)
+ - Always provide the exact cleanup commands
+ - Checkpoint remains so --continue can work after manual fix

- ###
+ ### 9. COMPLETE SPECS

When all tasks done for a `doing-*` spec:
1. Embed history in spec: `## Completed` section
2. Rename: `doing-upload.md` → `done-upload.md`
3. Remove section from PLAN.md

- ###
+ ### 10. ITERATE

Repeat until: all done, all blocked, or checkpoint.

@@ -253,14 +413,14 @@ Checking experiment status...
T2: Blocked by T1 (spike not validated)
T3: Blocked by T1 (spike not validated)

- Wave 1: T1 [SPIKE] (context:
- T1:
+ Wave 1: T1 [SPIKE] (context: 15%)
+ T1: complete, verifying...

-
-
-
+ Verifying T1...
+ ✓ Spike T1 verified (throughput 8500 >= 7000)
+ → upload--streaming--passed.md

- Wave 2: T2, T3 parallel (context:
+ Wave 2: T2, T3 parallel (context: 40%)
T2: success (def5678)
T3: success (ghi9012)

@@ -268,20 +428,38 @@ Wave 2: T2, T3 parallel (context: 45%)
✓ Complete: 3/3 tasks
```

- ### Spike Failed
+ ### Spike Failed (Agent Correctly Reported)

```
/df:execute (context: 10%)

- Wave 1: T1 [SPIKE] (context:
- T1:
+ Wave 1: T1 [SPIKE] (context: 15%)
+ T1: complete, verifying...

-
-
-
+ Verifying T1...
+ ✗ Spike T1 failed verification (throughput 1500 < 7000)
+ → upload--streaming--failed.md
+
+ ⚠ Spike T1 invalidated hypothesis
+ → Run /df:plan to generate new hypothesis spike
+
+ Complete: 1/3 tasks (2 blocked by failed experiment)
+ ```
+
+ ### Spike Failed (Verifier Override)
+
+ ```
+ /df:execute (context: 10%)
+
+ Wave 1: T1 [SPIKE] (context: 15%)
+ T1: complete (agent said: success), verifying...
+
+ Verifying T1...
+ ✗ Spike T1 failed verification (throughput 1500 < 7000)
+ ⚠ Agent incorrectly marked as passed — overriding to FAILED
+ → upload--streaming--failed.md

⚠ Spike T1 invalidated hypothesis
- Experiment: upload--streaming--failed.md
→ Run /df:plan to generate new hypothesis spike

Complete: 1/3 tasks (2 blocked by failed experiment)
package/src/commands/df/verify.md
CHANGED

@@ -115,3 +115,46 @@ Learnings captured:
→ experiments/perf--streaming-upload--success.md
→ experiments/auth--jwt-refresh-rotation--success.md
```
+
+ ## Post-Verification: Worktree Merge & Cleanup
+
+ After all verification passes:
+
+ ### 1. MERGE TO MAIN
+
+ ```bash
+ # Get worktree info from checkpoint
+ WORKTREE_BRANCH=$(cat .deepflow/checkpoint.json | jq -r '.worktree_branch')
+
+ # Switch to main and merge
+ git checkout main
+ git merge "${WORKTREE_BRANCH}" --no-ff -m "feat({spec}): merge verified changes"
+ ```
+
+ **On merge conflict:**
+ - Keep worktree intact for manual resolution
+ - Output: "Merge conflict detected. Resolve manually, then run /df:verify --merge-only"
+ - Exit without cleanup
+
+ ### 2. CLEANUP WORKTREE
+
+ After successful merge:
+
+ ```bash
+ # Get worktree path from checkpoint
+ WORKTREE_PATH=$(cat .deepflow/checkpoint.json | jq -r '.worktree_path')
+
+ # Remove worktree and branch
+ git worktree remove --force "${WORKTREE_PATH}"
+ git branch -d "${WORKTREE_BRANCH}"
+
+ # Remove checkpoint
+ rm .deepflow/checkpoint.json
+ ```
+
+ **Output on success:**
+ ```
+ ✓ Merged df/doing-upload/20260202-1430 to main
+ ✓ Cleaned up worktree and branch
+ ✓ Spec complete: doing-upload → done-upload
+ ```
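The conflict branch documented above can be keyed off the exit status of `git merge`, so cleanup only runs when the merge succeeded. A small sketch of that control flow, reusing the checkpoint fields from the command docs; illustrative, not the command's actual script:

```bash
#!/usr/bin/env bash
# Sketch: merge the worktree branch and clean up only if the merge succeeds.
WORKTREE_BRANCH=$(jq -r '.worktree_branch' .deepflow/checkpoint.json)
WORKTREE_PATH=$(jq -r '.worktree_path' .deepflow/checkpoint.json)

git checkout main
if git merge "${WORKTREE_BRANCH}" --no-ff -m "merge verified changes"; then
  git worktree remove --force "${WORKTREE_PATH}"
  git branch -d "${WORKTREE_BRANCH}"
  rm .deepflow/checkpoint.json
  echo "✓ Merged ${WORKTREE_BRANCH} to main and cleaned up"
else
  # Leave the worktree, branch, and checkpoint in place for manual resolution.
  echo "Merge conflict detected. Resolve manually, then run /df:verify --merge-only" >&2
  exit 1
fi
```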
package/templates/config-template.yaml
CHANGED

@@ -43,3 +43,21 @@ commits:
  format: "feat({spec}): {description}"
  atomic: true # One task = one commit
  push_after: complete # Or "each" for every commit
+
+ # Worktree isolation for /df:execute
+ # Isolates all agent work in a git worktree, keeping main clean
+ worktree:
+   # Enable worktree isolation (default: true)
+   enabled: true
+
+   # Base path for worktrees relative to project root
+   base_path: .deepflow/worktrees
+
+   # Branch name prefix for worktree branches
+   branch_prefix: df/
+
+   # Automatically cleanup worktree after successful verify
+   cleanup_on_success: true
+
+   # Keep worktree after failed execution for debugging
+   cleanup_on_fail: false
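To round out the picture, here is a rough sketch of how a wrapper script might read two of these flags back out of a generated config; the config location and the awk-style parsing are assumptions, not documented deepflow behavior:

```bash
#!/usr/bin/env bash
# Sketch: read worktree settings from a generated deepflow config (illustrative only).
# Assumes the template was rendered to .deepflow/config.yaml with the layout above.
config=".deepflow/config.yaml"

enabled=$(awk '/^worktree:/{in_wt=1} in_wt && /enabled:/{print $2; exit}' "$config")
cleanup_on_fail=$(awk '/^worktree:/{in_wt=1} in_wt && /cleanup_on_fail:/{print $2; exit}' "$config")

if [ "$enabled" != "true" ]; then
  echo "Worktree isolation disabled; agents would run in the main checkout"
fi

if [ "$cleanup_on_fail" = "false" ]; then
  echo "Failed runs leave the worktree in place for debugging"
fi
```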