deepflow 0.1.24 → 0.1.27
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/src/commands/df/execute.md +261 -16
- package/src/commands/df/plan.md +158 -20
- package/src/commands/df/spec.md +57 -8
- package/src/commands/df/verify.md +57 -2
- package/templates/config-template.yaml +21 -0
- package/templates/experiment-template.md +74 -0
- package/templates/plan-template.md +18 -0
package/package.json
CHANGED
package/src/commands/df/execute.md
CHANGED

````diff
@@ -24,8 +24,12 @@ Implement tasks from PLAN.md with parallel agents, atomic commits, and context-e
 
 ## Skills & Agents
 - Skill: `atomic-commits` — Clean commit protocol
-
-
+
+**Use Task tool to spawn agents:**
+| Agent | subagent_type | model | Purpose |
+|-------|---------------|-------|---------|
+| Implementation | `general-purpose` | `sonnet` | Task implementation |
+| Debugger | `reasoner` | `opus` | Debugging failures |
 
 ## Context-Aware Execution
 
````
````diff
@@ -55,22 +59,66 @@ summary: "one line"
 
 ## Checkpoint & Resume
 
-**File:** `.deepflow/checkpoint.json` —
+**File:** `.deepflow/checkpoint.json` — stored in WORKTREE directory, not main.
+
+**Schema:**
+```json
+{
+  "completed_tasks": ["T1", "T2"],
+  "current_wave": 2,
+  "worktree_path": ".deepflow/worktrees/df/doing-upload/20260202-1430",
+  "worktree_branch": "df/doing-upload/20260202-1430"
+}
+```
 
-**On checkpoint:** Complete wave → update PLAN.md → save → exit.
-**Resume:** `--continue` loads checkpoint, skips completed tasks.
+**On checkpoint:** Complete wave → update PLAN.md → save to worktree → exit.
+**Resume:** `--continue` loads checkpoint, verifies worktree, skips completed tasks.
 
 ## Behavior
 
 ### 1. CHECK CHECKPOINT
 
 ```
---continue → Load
+--continue → Load checkpoint
+  → If worktree_path exists:
+    → Verify worktree still exists on disk
+    → If missing: Error "Worktree deleted. Use --fresh"
+    → If exists: Use it, skip worktree creation
+  → Resume execution with completed tasks
 --fresh → Delete checkpoint, start fresh
 checkpoint exists → Prompt: "Resume? (y/n)"
 else → Start fresh
 ```
 
+### 1.5. CREATE WORKTREE
+
+Before spawning any agents, create an isolated worktree:
+
+```
+# Check main is clean (ignore untracked)
+git diff --quiet HEAD || Error: "Main has uncommitted changes. Commit or stash first."
+
+# Generate worktree path
+SPEC_NAME=$(basename spec/doing-*.md .md | sed 's/doing-//')
+TIMESTAMP=$(date +%Y%m%d-%H%M)
+BRANCH_NAME="df/${SPEC_NAME}/${TIMESTAMP}"
+WORKTREE_PATH=".deepflow/worktrees/${BRANCH_NAME}"
+
+# Create worktree
+git worktree add -b "${BRANCH_NAME}" "${WORKTREE_PATH}"
+
+# Store in checkpoint for resume
+checkpoint.worktree_path = WORKTREE_PATH
+checkpoint.worktree_branch = BRANCH_NAME
+```
+
+**Resume handling:**
+- If checkpoint has worktree_path → verify it exists, use it
+- If worktree missing → Error: "Worktree deleted. Use --fresh"
+
+**Existing worktree handling:**
+- If worktree exists for same spec → Prompt: "Resume existing worktree? (y/n/delete)"
+
 ### 2. LOAD PLAN
 
 ```
````
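The `--continue` resume flow added in the hunk above can be sketched as a small script. This is an illustrative sketch, not part of the package: the `jq` dependency and the sample checkpoint contents are assumptions, with the checkpoint fields taken from the documented schema.

```shell
#!/bin/sh
# Sketch of --continue: load checkpoint, verify the worktree still exists on disk.
# Assumes jq is installed; the checkpoint below is sample data matching the schema.
set -eu
WORK=$(mktemp -d)
cd "$WORK"

# A checkpoint saved by a previous wave (sample data):
mkdir -p .deepflow/worktrees/df/doing-upload/20260202-1430
cat > .deepflow/checkpoint.json <<'EOF'
{
  "completed_tasks": ["T1", "T2"],
  "current_wave": 2,
  "worktree_path": ".deepflow/worktrees/df/doing-upload/20260202-1430",
  "worktree_branch": "df/doing-upload/20260202-1430"
}
EOF

WORKTREE_PATH=$(jq -r '.worktree_path' .deepflow/checkpoint.json)
if [ -d "$WORKTREE_PATH" ]; then
  echo "resume: $WORKTREE_PATH"   # use it, skip worktree creation
else
  echo "Worktree deleted. Use --fresh" >&2
  exit 1
fi
```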
````diff
@@ -82,37 +130,187 @@ If missing: "No PLAN.md found. Run /df:plan first."
 
 Warn if `specs/*.md` (excluding doing-/done-) exist. Non-blocking.
 
-### 4.
+### 4. CHECK EXPERIMENT STATUS (HYPOTHESIS VALIDATION)
+
+**Before identifying ready tasks**, check experiment validation for full implementation tasks.
+
+**Task Types:**
+- **Spike tasks**: Have `[SPIKE]` in title OR `Type: spike` in description — always executable
+- **Full implementation tasks**: Blocked by spike tasks — require validated experiment
+
+**Validation Flow:**
+
+```
+For each task in plan:
+  If task is spike task:
+    → Mark as executable (spikes are always allowed)
+  Else if task is blocked by a spike task (T{n}):
+    → Find related experiment file in .deepflow/experiments/
+    → Check experiment status:
+      - --passed.md exists → Unblock, proceed with implementation
+      - --failed.md exists → Keep blocked, warn user
+      - --active.md exists → Keep blocked, spike in progress
+      - No experiment → Keep blocked, spike not started
+```
+
+**Experiment File Discovery:**
+
+```
+Glob: .deepflow/experiments/{topic}--*--{status}.md
+
+Topic extraction:
+1. From spike task: experiment file path in task description
+2. From spec name: doing-{topic} → {topic}
+3. Fuzzy match: normalize and match
+```
 
-
+**Status Handling:**
 
-
+| Experiment Status | Task Status | Action |
+|-------------------|-------------|--------|
+| `--passed.md` | Ready | Execute full implementation |
+| `--failed.md` | Blocked | Skip, warn: "Experiment failed, re-plan needed" |
+| `--active.md` | Blocked | Skip, info: "Waiting for spike completion" |
+| Not found | Blocked | Skip, info: "Spike task not executed yet" |
+
+**Warning Output:**
+
+```
+⚠ T3 blocked: Experiment 'upload--streaming--failed.md' did not validate
+  → Run /df:plan to generate new hypothesis spike
+```
+
+### 5. IDENTIFY READY TASKS
+
+Ready = `[ ]` + all `blocked_by` complete + experiment validated (if applicable) + not in checkpoint.
+
+### 6. SPAWN AGENTS
 
 Context ≥50%: checkpoint and exit.
 
-
+**Use Task tool to spawn all ready tasks in ONE message (parallel):**
+```
+Task tool parameters for each task:
+- subagent_type: "general-purpose"
+- model: "sonnet"
+- run_in_background: true
+- prompt: "{task details from PLAN.md}"
+```
+
+Same-file conflicts: spawn sequentially instead.
+
+**Spike Task Execution:**
+When spawning a spike task, the agent MUST:
+1. Execute the minimal validation method
+2. Record result in experiment file (update status: `--passed.md` or `--failed.md`)
+3. If passed: implementation tasks become unblocked
+4. If failed: record conclusion with "next hypothesis" for future planning
 
-On failure
+**On failure, use Task tool to spawn reasoner:**
+```
+Task tool parameters:
+- subagent_type: "reasoner"
+- model: "opus"
+- prompt: "Debug failure: {error details}"
+```
 
-###
+### 7. PER-TASK (agent prompt)
 
+**Standard Task:**
 ```
 {task_id}: {description from PLAN.md}
 Files: {target files}
 Spec: {spec_name}
 
+**IMPORTANT: Working Directory**
+All file operations MUST use this absolute path as base:
+{worktree_absolute_path}
+
+Example: To edit src/foo.ts, use:
+{worktree_absolute_path}/src/foo.ts
+
+Do NOT write files to the main project directory.
+
 Implement, test, commit as feat({spec}): {description}.
-Write result to
+Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
 ```
 
-
+**Spike Task:**
+```
+{task_id} [SPIKE]: {hypothesis}
+Type: spike
+Method: {minimal steps to validate}
+Success criteria: {how to know it passed}
+Time-box: {duration}
+Experiment file: {worktree_absolute_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
+Spec: {spec_name}
+
+**IMPORTANT: Working Directory**
+All file operations MUST use this absolute path as base:
+{worktree_absolute_path}
+
+Example: To edit src/foo.ts, use:
+{worktree_absolute_path}/src/foo.ts
+
+Do NOT write files to the main project directory.
+
+Execute the minimal validation:
+1. Follow the method steps exactly
+2. Measure against success criteria
+3. Update experiment file with result:
+   - If passed: rename to --passed.md, record findings
+   - If failed: rename to --failed.md, record conclusion with "next hypothesis"
+4. Commit as spike({spec}): validate {hypothesis}
+5. Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
+
+Result status:
+- success = hypothesis validated (passed)
+- failed = hypothesis invalidated (failed experiment, NOT agent error)
+```
+
+### 8. FAILURE HANDLING
+
+When a task fails and cannot be auto-fixed:
+
+**Behavior:**
+1. Leave worktree intact at `{worktree_path}`
+2. Keep checkpoint.json for potential resume
+3. Output debugging instructions
+
+**Output:**
+```
+✗ Task T3 failed after retry
+
+Worktree preserved for debugging:
+  Path: .deepflow/worktrees/df/doing-upload/20260202-1430
+  Branch: df/doing-upload/20260202-1430
+
+To investigate:
+  cd .deepflow/worktrees/df/doing-upload/20260202-1430
+  # examine files, run tests, etc.
+
+To resume after fixing:
+  /df:execute --continue
+
+To discard and start fresh:
+  git worktree remove --force .deepflow/worktrees/df/doing-upload/20260202-1430
+  git branch -D df/doing-upload/20260202-1430
+  /df:execute --fresh
+```
+
+**Key points:**
+- Never auto-delete worktree on failure (cleanup_on_fail: false by default)
+- Always provide the exact cleanup commands
+- Checkpoint remains so --continue can work after manual fix
+
+### 9. COMPLETE SPECS
 
 When all tasks done for a `doing-*` spec:
 1. Embed history in spec: `## Completed` section
 2. Rename: `doing-upload.md` → `done-upload.md`
 3. Remove section from PLAN.md
 
-###
+### 10. ITERATE
 
 Repeat until: all done, all blocked, or checkpoint.
 
````
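The experiment-status lookup added in the hunk above maps filename suffixes to task readiness. A runnable sketch follows; the `{topic}--{hypothesis}--{status}.md` naming comes from the command doc, while the sample file, variable names, and messages are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: resolve a blocked task's status from experiment filenames.
set -eu
WORK=$(mktemp -d)
cd "$WORK"
mkdir -p .deepflow/experiments
touch .deepflow/experiments/upload--streaming--passed.md   # sample data

topic="upload"
status="none"
for f in .deepflow/experiments/"$topic"--*.md; do
  [ -e "$f" ] || continue                 # glob matched nothing
  case "$f" in
    *--passed.md) status="passed" ;;
    *--active.md) [ "$status" = passed ] || status="active" ;;
    *--failed.md) [ "$status" = passed ] || status="failed" ;;
  esac
done

case "$status" in
  passed) echo "Ready: execute full implementation" ;;
  failed) echo "Blocked: experiment failed, re-plan needed" ;;
  active) echo "Blocked: waiting for spike completion" ;;
  none)   echo "Blocked: spike task not executed yet" ;;
esac
```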
````diff
@@ -126,6 +324,8 @@ Repeat until: all done, all blocked, or checkpoint.
 
 ## Example
 
+### Standard Execution
+
 ```
 /df:execute (context: 12%)
 
````
````diff
@@ -140,7 +340,52 @@ Wave 2: T3 (context: 48%)
 ✓ Complete: 3/3 tasks
 ```
 
-
+### Spike-First Execution
+
+```
+/df:execute (context: 10%)
+
+Checking experiment status...
+  T1 [SPIKE]: No experiment yet, spike executable
+  T2: Blocked by T1 (spike not validated)
+  T3: Blocked by T1 (spike not validated)
+
+Wave 1: T1 [SPIKE] (context: 20%)
+  T1: success (abc1234) → upload--streaming--passed.md
+
+Checking experiment status...
+  T2: Experiment passed, unblocked
+  T3: Experiment passed, unblocked
+
+Wave 2: T2, T3 parallel (context: 45%)
+  T2: success (def5678)
+  T3: success (ghi9012)
+
+✓ doing-upload → done-upload
+✓ Complete: 3/3 tasks
+```
+
+### Spike Failed
+
+```
+/df:execute (context: 10%)
+
+Wave 1: T1 [SPIKE] (context: 20%)
+  T1: failed → upload--streaming--failed.md
+
+Checking experiment status...
+  T2: ⚠ Blocked - Experiment failed
+  T3: ⚠ Blocked - Experiment failed
+
+⚠ Spike T1 invalidated hypothesis
+  Experiment: upload--streaming--failed.md
+  → Run /df:plan to generate new hypothesis spike
+
+Complete: 1/3 tasks (2 blocked by failed experiment)
+```
+
+### With Checkpoint
+
 ```
 Wave 1 complete (context: 52%)
 Checkpoint saved. Run /df:execute --continue
````
package/src/commands/df/plan.md
CHANGED
````diff
@@ -42,21 +42,33 @@ Determine source_dir from config or default to src/
 
 If no new specs: report counts, suggest `/df:execute`.
 
-### 2. CHECK PAST EXPERIMENTS
+### 2. CHECK PAST EXPERIMENTS (SPIKE-FIRST)
 
-
+**CRITICAL**: Check experiments BEFORE generating any tasks.
+
+Extract topic from spec name (fuzzy match), then:
 
 ```
-Glob .deepflow/experiments/{
+Glob .deepflow/experiments/{topic}--*
 ```
 
+**Experiment file naming:** `{topic}--{hypothesis}--{status}.md`
+Statuses: `active`, `passed`, `failed`
+
 | Result | Action |
 |--------|--------|
-| `--failed.md` |
-| `--
-
+| `--failed.md` exists | Extract "next hypothesis" from Conclusion section |
+| `--passed.md` exists | Reference as validated pattern, can proceed to full implementation |
+| `--active.md` exists | Wait for experiment completion before planning |
+| No matches | New topic, needs initial spike |
+
+**Spike-First Rule**:
+- If `--failed.md` exists: Generate spike task to test the next hypothesis (from failed experiment's Conclusion)
+- If no experiments exist: Generate spike task for the core hypothesis
+- Full implementation tasks are BLOCKED until a spike validates the approach
+- Only proceed to full task generation after `--passed.md` exists
 
-
+See: `templates/experiment-template.md` for experiment format
 
 ### 3. DETECT PROJECT CONTEXT
 
````
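The "extract next hypothesis from the Conclusion section" action in the table above can be sketched with a one-line `sed` over the failed experiment file. The experiment body below is invented sample data; only the `{topic}--{hypothesis}--failed.md` naming comes from the command doc.

```shell
#!/bin/sh
# Sketch: pull the "next hypothesis" line out of a failed experiment's Conclusion.
set -eu
WORK=$(mktemp -d)
cd "$WORK"
mkdir -p .deepflow/experiments
cat > .deepflow/experiments/upload--buffer-upload--failed.md <<'EOF'
## Conclusion
Buffering the whole file exhausted memory on 2GB uploads.
Next hypothesis: chunked upload with backpressure
EOF

# Print only lines starting with "Next hypothesis: ", stripped of the prefix.
NEXT=$(sed -n 's/^Next hypothesis: //p' \
  .deepflow/experiments/upload--*--failed.md | head -n 1)
echo "spike candidate: $NEXT"
```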
````diff
@@ -69,7 +81,15 @@ Include patterns in task descriptions for agents to follow.
 
 ### 4. ANALYZE CODEBASE
 
-**
+**Use Task tool to spawn Explore agents in parallel:**
+```
+Task tool parameters:
+- subagent_type: "Explore"
+- model: "haiku"
+- run_in_background: true (for parallel execution)
+```
+
+Scale agent count based on codebase size:
 
 | File Count | Agents |
 |------------|--------|
````
````diff
@@ -78,6 +98,34 @@ Include patterns in task descriptions for agents to follow.
 | 100-500 | 25-40 |
 | 500+ | 50-100 (cap) |
 
+**Explore Agent Prompt Structure:**
+```
+Find: [specific question]
+Return ONLY:
+- File paths matching criteria
+- One-line description per file
+- Integration points (if asked)
+
+DO NOT:
+- Read or summarize spec files
+- Make recommendations
+- Propose solutions
+- Generate tables or lengthy explanations
+
+Max response: 500 tokens (configurable via .deepflow/config.yaml explore.max_tokens)
+```
+
+**Explore Agent Scope Restrictions:**
+- MUST only report factual findings:
+  - Files found
+  - Patterns/conventions observed
+  - Integration points
+- MUST NOT:
+  - Make recommendations
+  - Propose architectures
+  - Read and summarize specs (that's orchestrator's job)
+  - Draw conclusions about what should be built
+
 **Use `code-completeness` skill patterns** to search for:
 - Implementations matching spec requirements
 - TODO, FIXME, HACK comments
````
````diff
@@ -86,7 +134,14 @@ Include patterns in task descriptions for agents to follow.
 
 ### 5. COMPARE & PRIORITIZE
 
-**
+**Use Task tool to spawn reasoner agent:**
+```
+Task tool parameters:
+- subagent_type: "reasoner"
+- model: "opus"
+```
+
+Reasoner performs analysis:
 
 | Status | Action |
 |--------|--------|
````
````diff
@@ -102,7 +157,36 @@ Include patterns in task descriptions for agents to follow.
 2. Impact — core features before enhancements
 3. Risk — unknowns early
 
-### 6.
+### 6. GENERATE SPIKE TASKS (IF NEEDED)
+
+**When to generate spike tasks:**
+1. Failed experiment exists → Test the next hypothesis
+2. No experiments exist → Test the core hypothesis
+3. Passed experiment exists → Skip to full implementation
+
+**Spike Task Format:**
+```markdown
+- [ ] **T1** [SPIKE]: Validate {hypothesis}
+  - Type: spike
+  - Hypothesis: {what we're testing}
+  - Method: {minimal steps to validate}
+  - Success criteria: {how to know it passed}
+  - Time-box: 30 min
+  - Files: .deepflow/experiments/{topic}--{hypothesis}--{status}.md
+  - Blocked by: none
+```
+
+**Blocking Logic:**
+- All implementation tasks MUST have `Blocked by: T{spike}` until spike passes
+- After spike completes:
+  - If passed: Update experiment to `--passed.md`, unblock implementation tasks
+  - If failed: Update experiment to `--failed.md`, DO NOT generate implementation tasks
+
+**Full Implementation Only After Spike:**
+- Only generate full task list when spike validates the approach
+- Never generate 10-task waterfall without validated hypothesis
+
+### 7. VALIDATE HYPOTHESES
 
 Test risky assumptions before finalizing plan.
 
````
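Since an experiment's status lives in its filename, the "Update experiment to `--passed.md`" step in the blocking logic above is just a rename. A minimal sketch, with sample paths:

```shell
#!/bin/sh
# Sketch: flip an experiment from active to passed after a spike succeeds.
set -eu
WORK=$(mktemp -d)
cd "$WORK"
mkdir -p .deepflow/experiments
touch .deepflow/experiments/upload--streaming--active.md   # sample data

f=.deepflow/experiments/upload--streaming--active.md
# Spike passed; a failed spike would rename to --failed.md instead.
mv "$f" "${f%--active.md}--passed.md"
ls .deepflow/experiments
```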
````diff
@@ -111,24 +195,27 @@ Test risky assumptions before finalizing plan.
 **Process:**
 1. Prototype in scratchpad (not committed)
 2. Test assumption
-3. If fails → Write `.deepflow/experiments/{
+3. If fails → Write `.deepflow/experiments/{topic}--{hypothesis}--failed.md`
 4. Adjust approach, document in task
 
 **Skip:** Well-known patterns, simple CRUD, clear docs exist
 
-###
+### 8. OUTPUT PLAN.md
 
 Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validation findings.
 
-###
+### 9. RENAME SPECS
 
 `mv specs/feature.md specs/doing-feature.md`
 
-###
+### 10. REPORT
 
 `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`
 
 ## Rules
+- **Spike-first** — Generate spike task before full implementation if no `--passed.md` experiment exists
+- **Block on spike** — Full implementation tasks MUST be blocked by spike validation
+- **Learn from failures** — Extract "next hypothesis" from failed experiments, never repeat same approach
 - **Learn from history** — Check past experiments before proposing approaches
 - **Plan only** — Do NOT implement anything (except quick validation prototypes)
 - **Validate before commit** — Test risky assumptions with minimal experiments
````
````diff
@@ -139,13 +226,64 @@ Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validatio
 
 ## Agent Scaling
 
-| Agent | Base | Scale |
-
-| Explore (search) | 10 | +1 per 20 files |
-| Reasoner (analyze) | 5 | +1 per 2 specs |
+| Agent | Model | Base | Scale |
+|-------|-------|------|-------|
+| Explore (search) | haiku | 10 | +1 per 20 files |
+| Reasoner (analyze) | opus | 5 | +1 per 2 specs |
+
+**IMPORTANT**: Always use the `Task` tool with explicit `subagent_type` and `model` parameters. Do NOT use Glob/Grep/Read directly for codebase analysis - spawn agents instead.
 
 ## Example
 
+### Spike-First (No Prior Experiments)
+
+```markdown
+# Plan
+
+### doing-upload
+
+- [ ] **T1** [SPIKE]: Validate streaming upload approach
+  - Type: spike
+  - Hypothesis: Streaming uploads will handle files >1GB without memory issues
+  - Method: Create minimal endpoint, upload 2GB file, measure memory
+  - Success criteria: Memory stays under 500MB during upload
+  - Time-box: 30 min
+  - Files: .deepflow/experiments/upload--streaming--active.md
+  - Blocked by: none
+
+- [ ] **T2**: Create upload endpoint
+  - Files: src/api/upload.ts
+  - Blocked by: T1 (spike must pass)
+
+- [ ] **T3**: Add S3 service with streaming
+  - Files: src/services/storage.ts
+  - Blocked by: T1 (spike must pass), T2
+```
+
+### Spike-First (After Failed Experiment)
+
+```markdown
+# Plan
+
+### doing-upload
+
+- [ ] **T1** [SPIKE]: Validate chunked upload with backpressure
+  - Type: spike
+  - Hypothesis: Adding backpressure control will prevent buffer overflow
+  - Method: Implement pause/resume on buffer threshold, test with 2GB file
+  - Success criteria: No memory spikes above 500MB
+  - Time-box: 30 min
+  - Files: .deepflow/experiments/upload--chunked-backpressure--active.md
+  - Blocked by: none
+  - Note: Previous approach failed (see upload--buffer-upload--failed.md)
+
+- [ ] **T2**: Implement chunked upload endpoint
+  - Files: src/api/upload.ts
+  - Blocked by: T1 (spike must pass)
+```
+
+### After Spike Validates (Full Implementation)
+
 ```markdown
 # Plan
 
````
````diff
@@ -154,10 +292,10 @@ Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validatio
 - [ ] **T1**: Create upload endpoint
   - Files: src/api/upload.ts
   - Blocked by: none
+  - Note: Use streaming (validated in upload--streaming--passed.md)
 
 - [ ] **T2**: Add S3 service with streaming
   - Files: src/services/storage.ts
   - Blocked by: T1
-  -
-  - Avoid: Direct buffer upload failed for large files (experiments/perf--buffer-upload--failed.md)
+  - Avoid: Direct buffer upload failed (see upload--buffer-upload--failed.md)
 ```
````
package/src/commands/df/spec.md
CHANGED
````diff
@@ -20,14 +20,26 @@ Transform conversation context into a structured specification file.
 
 ## Skills & Agents
 - Skill: `gap-discovery` — Proactive requirement gap identification
-
-
+
+**Use Task tool to spawn agents:**
+| Agent | subagent_type | model | Purpose |
+|-------|---------------|-------|---------|
+| Context | `Explore` | `haiku` | Codebase context gathering |
+| Synthesizer | `reasoner` | `opus` | Synthesize findings into requirements |
 
 ## Behavior
 
 ### 1. GATHER CODEBASE CONTEXT
 
-**
+**Use Task tool to spawn Explore agents in parallel:**
+```
+Task tool parameters:
+- subagent_type: "Explore"
+- model: "haiku"
+- run_in_background: true
+```
+
+Find:
 - Related existing implementations
 - Code patterns and conventions
 - Integration points relevant to the feature
````
````diff
@@ -39,6 +51,34 @@ Transform conversation context into a structured specification file.
 | 20-100 | 5-8 |
 | 100+ | 10-15 |
 
+**Explore Agent Prompt Structure:**
+```
+Find: [specific question]
+Return ONLY:
+- File paths matching criteria
+- One-line description per file
+- Integration points (if asked)
+
+DO NOT:
+- Read or summarize spec files
+- Make recommendations
+- Propose solutions
+- Generate tables or lengthy explanations
+
+Max response: 500 tokens (configurable via .deepflow/config.yaml explore.max_tokens)
+```
+
+**Explore Agent Scope Restrictions:**
+- MUST only report factual findings:
+  - Files found
+  - Patterns/conventions observed
+  - Integration points
+- MUST NOT:
+  - Make recommendations
+  - Propose architectures
+  - Read and summarize specs (that's orchestrator's job)
+  - Draw conclusions about what should be built
+
 ### 2. GAP CHECK
 Use the `gap-discovery` skill to analyze conversation + agent findings.
 
````
````diff
@@ -70,7 +110,14 @@ Max 4 questions per tool call. Wait for answers before proceeding.
 
 ### 3. SYNTHESIZE FINDINGS
 
-**
+**Use Task tool to spawn reasoner agent:**
+```
+Task tool parameters:
+- subagent_type: "reasoner"
+- model: "opus"
+```
+
+The reasoner will:
 - Analyze codebase context from Explore agents
 - Identify constraints from existing architecture
 - Suggest requirements based on patterns found
````
````diff
@@ -130,10 +177,12 @@ Next: Run /df:plan to generate tasks
 
 ## Agent Scaling
 
-| Agent | Base | Purpose |
-
-| Explore
-| Reasoner
+| Agent | subagent_type | model | Base | Purpose |
+|-------|---------------|-------|------|---------|
+| Explore | `Explore` | `haiku` | 3-5 | Find related code, patterns |
+| Reasoner | `reasoner` | `opus` | 1 | Synthesize into requirements |
+
+**IMPORTANT**: Always use the `Task` tool with explicit `subagent_type` and `model` parameters.
 
 ## Example
 
````
package/src/commands/df/verify.md
CHANGED

````diff
@@ -12,7 +12,11 @@ Check that implemented code satisfies spec requirements and acceptance criteria.
 
 ## Skills & Agents
 - Skill: `code-completeness` — Find incomplete implementations
-
+
+**Use Task tool to spawn agents:**
+| Agent | subagent_type | model | Purpose |
+|-------|---------------|-------|---------|
+| Scanner | `Explore` | `haiku` | Fast codebase scanning |
 
 ## Spec File States
 
````
````diff
@@ -87,7 +91,15 @@ Default: L1-L3 (L4 optional, can be slow)
 
 ## Agent Usage
 
-
+**Use Task tool to spawn Explore agents:**
+```
+Task tool parameters:
+- subagent_type: "Explore"
+- model: "haiku"
+- run_in_background: true (for parallel)
+```
+
+Scale: 1-2 agents per spec, cap 10.
 
 ## Example
 
````
````diff
@@ -103,3 +115,46 @@ Learnings captured:
 → experiments/perf--streaming-upload--success.md
 → experiments/auth--jwt-refresh-rotation--success.md
 ```
+
+## Post-Verification: Worktree Merge & Cleanup
+
+After all verification passes:
+
+### 1. MERGE TO MAIN
+
+```bash
+# Get worktree info from checkpoint
+WORKTREE_BRANCH=$(cat .deepflow/checkpoint.json | jq -r '.worktree_branch')
+
+# Switch to main and merge
+git checkout main
+git merge "${WORKTREE_BRANCH}" --no-ff -m "feat({spec}): merge verified changes"
+```
+
+**On merge conflict:**
+- Keep worktree intact for manual resolution
+- Output: "Merge conflict detected. Resolve manually, then run /df:verify --merge-only"
+- Exit without cleanup
+
+### 2. CLEANUP WORKTREE
+
+After successful merge:
+
+```bash
+# Get worktree path from checkpoint
+WORKTREE_PATH=$(cat .deepflow/checkpoint.json | jq -r '.worktree_path')
+
+# Remove worktree and branch
+git worktree remove --force "${WORKTREE_PATH}"
+git branch -d "${WORKTREE_BRANCH}"
+
+# Remove checkpoint
+rm .deepflow/checkpoint.json
+```
+
+**Output on success:**
+```
+✓ Merged df/doing-upload/20260202-1430 to main
+✓ Cleaned up worktree and branch
+✓ Spec complete: doing-upload → done-upload
+```
````
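The merge-or-bail behavior described above (merge on success, keep the worktree on conflict) can be sketched end-to-end against a throwaway repository. The branch name and commit format mirror the doc; the scratch repo, identity, and file contents are illustrative, and `git init -b` needs Git >= 2.28.

```shell
#!/bin/sh
# Sketch: merge a worktree branch into main, cleaning up only on success.
set -eu
WORK=$(mktemp -d)
cd "$WORK"
git init -q -b main .
GIT="git -c user.email=df@example.invalid -c user.name=deepflow"
$GIT commit -q --allow-empty -m "init"

BRANCH="df/doing-upload/20260202-1430"
git worktree add -q -b "$BRANCH" wt
( cd wt && echo done > upload.txt && git add upload.txt \
  && git -c user.email=df@example.invalid -c user.name=deepflow \
       commit -q -m "feat(upload): add endpoint" )

if $GIT merge -q --no-ff -m "feat(upload): merge verified changes" "$BRANCH"; then
  git worktree remove --force wt    # cleanup only after a clean merge
  git branch -d "$BRANCH"
  echo "merged and cleaned up"
else
  echo "Merge conflict detected. Resolve manually." >&2
fi
```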
package/templates/config-template.yaml
CHANGED

@@ -36,7 +36,28 @@ models:
  reason: opus # Complex decisions
  debug: opus # Problem solving

explore:
  max_tokens: 500 # Controls Explore agent response length

commits:
  format: "feat({spec}): {description}"
  atomic: true # One task = one commit
  push_after: complete # Or "each" for every commit

# Worktree isolation for /df:execute
# Isolates all agent work in a git worktree, keeping main clean
worktree:
  # Enable worktree isolation (default: true)
  enabled: true

  # Base path for worktrees, relative to the project root
  base_path: .deepflow/worktrees

  # Branch name prefix for worktree branches
  branch_prefix: df/

  # Automatically clean up the worktree after a successful verify
  cleanup_on_success: true

  # Keep the worktree after a failed execution for debugging
  cleanup_on_fail: false

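How these settings could drive worktree creation in `/df:execute` can be sketched as follows. The `base_path` and `branch_prefix` values mirror the YAML above; the spec name, timestamp, and throwaway repo are assumptions for the demo, not the tool's actual code:

```bash
# Sketch: create an isolated worktree and branch from the config values above.
set -e
BASE_PATH=".deepflow/worktrees"
BRANCH_PREFIX="df/"
SPEC="doing-upload"
STAMP="20260202-1430"   # the real flow would use something like: date +%Y%m%d-%H%M

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m init

BRANCH="${BRANCH_PREFIX}${SPEC}/${STAMP}"   # df/doing-upload/20260202-1430
WORKTREE="${BASE_PATH}/${SPEC}/${STAMP}"
mkdir -p "$(dirname "$WORKTREE")"
git worktree add -b "$BRANCH" "$WORKTREE" >/dev/null 2>&1
HEAD_BRANCH=$(git -C "$WORKTREE" rev-parse --abbrev-ref HEAD)
echo "$HEAD_BRANCH"                         # → df/doing-upload/20260202-1430
```

This also shows why the checkpoint stores both `worktree_path` and `worktree_branch`: `git worktree remove` needs the path, `git branch -d` needs the branch name.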
package/templates/experiment-template.md
ADDED

@@ -0,0 +1,74 @@
# Experiment: {hypothesis-slug}

> **Filename convention**: `{topic}--{hypothesis-slug}--{status}.md`
> Status: `active` | `passed` | `failed`

## Topic

{Spec name or feature area this experiment relates to}

<!--
What problem or feature does this experiment address?
Link to relevant spec if applicable.
-->

## Hypothesis

{What we believe will work and why}

<!--
Be specific and testable:
- "Using approach X will achieve Y because Z"
- "The bottleneck is in component A, not B"
- Should be falsifiable in a single experiment
-->

## Method

{Minimal steps to validate the hypothesis}

<!--
Keep it minimal - fastest path to prove/disprove:
1. Step one (e.g., "Create test file with X")
2. Step two (e.g., "Run command Y")
3. Step three (e.g., "Observe output Z")

Time-box: ideally under 30 minutes
-->

## Result

**Status**: {pass | fail}

{Actual outcome with evidence}

<!--
Include concrete evidence:
- Error messages, output logs
- Metrics or measurements
- Screenshots if applicable
- What specifically happened vs. expected
-->

## Conclusion

{What we learned from this experiment}

<!--
Answer these:
- Why did it pass/fail?
- What assumption was validated/invalidated?
- If failed: What's the next hypothesis? (don't repeat the same approach)
- If passed: What's ready for implementation?
-->

---

<!--
Experiment Guidelines:
- One hypothesis per experiment
- Failed experiments are valuable - they inform the next hypothesis
- Never repeat a failed approach without a new insight
- Keep experiments small and fast (under 30 min)
- Link related experiments in conclusions
-->

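The filename convention in the template header can be turned into a small helper. A sketch — the function name and the example topic/slug values are made up for illustration:

```bash
# Sketch: build an experiment filename following the
# {topic}--{hypothesis-slug}--{status}.md convention above.
experiment_filename() {
  printf '%s--%s--%s.md\n' "$1" "$2" "$3"
}

NAME=$(experiment_filename "upload" "chunked-put-is-faster" "active")
echo "$NAME"   # → upload--chunked-put-is-faster--active.md
```

Renaming the file from `--active.md` to `--passed.md` or `--failed.md` then records the outcome without touching the file's contents.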
package/templates/plan-template.md
CHANGED

@@ -29,6 +29,22 @@ Generated: {timestamp}
  - Files: {files}
  - Blocked by: T1

### Spike Task Example

When no experiments exist to validate an approach, start with a minimal validation spike:

- [ ] **T1** (spike): Validate [hypothesis] approach
  - Files: [minimal files needed]
  - Blocked by: none
  - Blocks: T2, T3, T4 (full implementation)
  - Description: Minimal test to verify [approach] works before full implementation

- [ ] **T2**: Implement [feature] based on spike results
  - Files: [implementation files]
  - Blocked by: T1 (spike)

A spike is one or two small tasks that validate an approach before committing to full implementation.

---

<!--
@@ -38,4 +54,6 @@ Plan Guidelines:
- Blocked by references task IDs (T1, T2, etc.)
- Mark complete with [x] and commit hash
- Example completed: [x] **T1**: Create API ✓ (abc1234)
- Spike tasks: If no experiments validate the approach, first task should be a minimal validation spike
- Spike tasks block full implementation tasks until the hypothesis is validated
-->