deepflow 0.1.24 → 0.1.27

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "deepflow",
-   "version": "0.1.24",
+   "version": "0.1.27",
    "description": "Stay in flow state - lightweight spec-driven task orchestration for Claude Code",
    "keywords": [
      "claude",
@@ -24,8 +24,12 @@ Implement tasks from PLAN.md with parallel agents, atomic commits, and context-e
 
  ## Skills & Agents
  - Skill: `atomic-commits` — Clean commit protocol
- - Agent: `general-purpose` (Sonnet) — Task implementation
- - Agent: `reasoner` (Opus) Debugging failures
+
+ **Use Task tool to spawn agents:**
+ | Agent | subagent_type | model | Purpose |
+ |-------|---------------|-------|---------|
+ | Implementation | `general-purpose` | `sonnet` | Task implementation |
+ | Debugger | `reasoner` | `opus` | Debugging failures |
 
  ## Context-Aware Execution
 
@@ -55,22 +59,66 @@ summary: "one line"
 
  ## Checkpoint & Resume
 
- **File:** `.deepflow/checkpoint.json` — stores completed tasks, current wave.
+ **File:** `.deepflow/checkpoint.json` — stored in the WORKTREE directory, not in main.
+
+ **Schema:**
+ ```json
+ {
+   "completed_tasks": ["T1", "T2"],
+   "current_wave": 2,
+   "worktree_path": ".deepflow/worktrees/df/doing-upload/20260202-1430",
+   "worktree_branch": "df/doing-upload/20260202-1430"
+ }
+ ```
 
- **On checkpoint:** Complete wave → update PLAN.md → save → exit.
- **Resume:** `--continue` loads checkpoint, skips completed tasks.
+ **On checkpoint:** Complete wave → update PLAN.md → save to worktree → exit.
+ **Resume:** `--continue` loads checkpoint, verifies worktree, skips completed tasks.
 
  ## Behavior
 
  ### 1. CHECK CHECKPOINT
 
  ```
- --continue → Load and resume
+ --continue → Load checkpoint
+   → If worktree_path exists:
+     → Verify worktree still exists on disk
+     → If missing: Error "Worktree deleted. Use --fresh"
+     → If exists: Use it, skip worktree creation
+   → Resume execution with completed tasks
  --fresh → Delete checkpoint, start fresh
  checkpoint exists → Prompt: "Resume? (y/n)"
  else → Start fresh
  ```
 
+ ### 1.5. CREATE WORKTREE
+
+ Before spawning any agents, create an isolated worktree:
+
+ ```
+ # Check main is clean (ignore untracked)
+ git diff --quiet HEAD || Error: "Main has uncommitted changes. Commit or stash first."
+
+ # Generate worktree path
+ SPEC_NAME=$(basename specs/doing-*.md .md | sed 's/doing-//')
+ TIMESTAMP=$(date +%Y%m%d-%H%M)
+ BRANCH_NAME="df/${SPEC_NAME}/${TIMESTAMP}"
+ WORKTREE_PATH=".deepflow/worktrees/${BRANCH_NAME}"
+
+ # Create worktree
+ git worktree add -b "${BRANCH_NAME}" "${WORKTREE_PATH}"
+
+ # Store in checkpoint for resume
+ checkpoint.worktree_path = WORKTREE_PATH
+ checkpoint.worktree_branch = BRANCH_NAME
+ ```
+
+ **Resume handling:**
+ - If checkpoint has worktree_path → verify it exists, use it
+ - If worktree missing → Error: "Worktree deleted. Use --fresh"
+
+ **Existing worktree handling:**
+ - If worktree exists for same spec → Prompt: "Resume existing worktree? (y/n/delete)"
+
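The name-generation steps above can be sketched as a small POSIX shell function (illustrative only; the `specs/doing-*.md` layout and the single-active-spec assumption follow this document's own examples):

```shell
# Sketch: derive the worktree branch and path for a spec file.
# Assumes one active doing-*.md spec, per the examples in this doc.
derive_worktree_names() {
  spec_file="$1"                          # e.g. specs/doing-upload.md
  spec_name=$(basename "$spec_file" .md)  # doing-upload
  spec_name=${spec_name#doing-}           # upload
  timestamp=$(date +%Y%m%d-%H%M)          # e.g. 20260202-1430
  branch_name="df/${spec_name}/${timestamp}"
  worktree_path=".deepflow/worktrees/${branch_name}"
  # Print branch on line 1, worktree path on line 2
  printf '%s\n%s\n' "$branch_name" "$worktree_path"
}
```

The `git worktree add -b "$branch_name" "$worktree_path"` call then proceeds exactly as in the block above.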
  ### 2. LOAD PLAN
 
  ```
@@ -82,37 +130,187 @@ If missing: "No PLAN.md found. Run /df:plan first."
 
  Warn if `specs/*.md` (excluding doing-/done-) exist. Non-blocking.
 
- ### 4. IDENTIFY READY TASKS
+ ### 4. CHECK EXPERIMENT STATUS (HYPOTHESIS VALIDATION)
+
+ **Before identifying ready tasks**, check experiment validation for full implementation tasks.
+
+ **Task Types:**
+ - **Spike tasks**: Have `[SPIKE]` in title OR `Type: spike` in description — always executable
+ - **Full implementation tasks**: Blocked by spike tasks — require validated experiment
+
+ **Validation Flow:**
+
+ ```
+ For each task in plan:
+   If task is spike task:
+     → Mark as executable (spikes are always allowed)
+   Else if task is blocked by a spike task (T{n}):
+     → Find related experiment file in .deepflow/experiments/
+     → Check experiment status:
+       - --passed.md exists → Unblock, proceed with implementation
+       - --failed.md exists → Keep blocked, warn user
+       - --active.md exists → Keep blocked, spike in progress
+       - No experiment → Keep blocked, spike not started
+ ```
+
+ **Experiment File Discovery:**
+
+ ```
+ Glob: .deepflow/experiments/{topic}--*--{status}.md
+
+ Topic extraction:
+ 1. From spike task: experiment file path in task description
+ 2. From spec name: doing-{topic} → {topic}
+ 3. Fuzzy match: normalize and match
+ ```
 
- Ready = `[ ]` + all `blocked_by` complete + not in checkpoint.
+ **Status Handling:**
 
- ### 5. SPAWN AGENTS
+ | Experiment Status | Task Status | Action |
+ |-------------------|-------------|--------|
+ | `--passed.md` | Ready | Execute full implementation |
+ | `--failed.md` | Blocked | Skip, warn: "Experiment failed, re-plan needed" |
+ | `--active.md` | Blocked | Skip, info: "Waiting for spike completion" |
+ | Not found | Blocked | Skip, info: "Spike task not executed yet" |
+
+ **Warning Output:**
+
+ ```
+ ⚠ T3 blocked: Experiment 'upload--streaming--failed.md' did not validate
+ → Run /df:plan to generate new hypothesis spike
+ ```
+
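The discovery and status rules above can be sketched in shell (illustrative; the passed-then-failed-then-active precedence mirrors the status table, and is otherwise an assumption):

```shell
# Sketch: resolve the experiment status for a topic by globbing
# .deepflow/experiments/{topic}--*--{status}.md, as described above.
# Prints: passed | failed | active | none
experiment_status() {
  topic="$1"
  for status in passed failed active; do
    for f in .deepflow/experiments/"${topic}"--*--"${status}".md; do
      # An unmatched glob stays literal, so verify the file really exists
      [ -e "$f" ] && { echo "$status"; return 0; }
    done
  done
  echo "none"
}
```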
+ ### 5. IDENTIFY READY TASKS
+
+ Ready = `[ ]` + all `blocked_by` complete + experiment validated (if applicable) + not in checkpoint.
+
+ ### 6. SPAWN AGENTS
 
  Context ≥50%: checkpoint and exit.
 
- Spawn all ready tasks in ONE message (parallel). Same-file conflicts: sequential.
+ **Use Task tool to spawn all ready tasks in ONE message (parallel):**
+ ```
+ Task tool parameters for each task:
+ - subagent_type: "general-purpose"
+ - model: "sonnet"
+ - run_in_background: true
+ - prompt: "{task details from PLAN.md}"
+ ```
+
+ Same-file conflicts: spawn sequentially instead.
+
+ **Spike Task Execution:**
+ When spawning a spike task, the agent MUST:
+ 1. Execute the minimal validation method
+ 2. Record result in experiment file (update status: `--passed.md` or `--failed.md`)
+ 3. If passed: implementation tasks become unblocked
+ 4. If failed: record conclusion with "next hypothesis" for future planning
 
- On failure: spawn `reasoner`.
+ **On failure, use Task tool to spawn reasoner:**
+ ```
+ Task tool parameters:
+ - subagent_type: "reasoner"
+ - model: "opus"
+ - prompt: "Debug failure: {error details}"
+ ```
 
- ### 6. PER-TASK (agent prompt)
+ ### 7. PER-TASK (agent prompt)
 
+ **Standard Task:**
  ```
  {task_id}: {description from PLAN.md}
  Files: {target files}
  Spec: {spec_name}
 
+ **IMPORTANT: Working Directory**
+ All file operations MUST use this absolute path as base:
+ {worktree_absolute_path}
+
+ Example: To edit src/foo.ts, use:
+ {worktree_absolute_path}/src/foo.ts
+
+ Do NOT write files to the main project directory.
+
  Implement, test, commit as feat({spec}): {description}.
- Write result to .deepflow/results/{task_id}.yaml
+ Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
  ```
 
- ### 7. COMPLETE SPECS
+ **Spike Task:**
+ ```
+ {task_id} [SPIKE]: {hypothesis}
+ Type: spike
+ Method: {minimal steps to validate}
+ Success criteria: {how to know it passed}
+ Time-box: {duration}
+ Experiment file: {worktree_absolute_path}/.deepflow/experiments/{topic}--{hypothesis}--active.md
+ Spec: {spec_name}
+
+ **IMPORTANT: Working Directory**
+ All file operations MUST use this absolute path as base:
+ {worktree_absolute_path}
+
+ Example: To edit src/foo.ts, use:
+ {worktree_absolute_path}/src/foo.ts
+
+ Do NOT write files to the main project directory.
+
+ Execute the minimal validation:
+ 1. Follow the method steps exactly
+ 2. Measure against success criteria
+ 3. Update experiment file with result:
+    - If passed: rename to --passed.md, record findings
+    - If failed: rename to --failed.md, record conclusion with "next hypothesis"
+ 4. Commit as spike({spec}): validate {hypothesis}
+ 5. Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
+
+ Result status:
+ - success = hypothesis validated (passed)
+ - failed = hypothesis invalidated (failed experiment, NOT agent error)
+ ```
+
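Step 3's rename can be sketched as a tiny helper (hypothetical; the `--active.md` suffix convention comes from the experiment file naming in this doc):

```shell
# Sketch: record a spike result by renaming the active experiment file,
# e.g. upload--streaming--active.md -> upload--streaming--passed.md
record_spike_result() {
  active="$1"   # path to the --active.md experiment file
  result="$2"   # passed | failed
  mv "$active" "${active%--active.md}--${result}.md"
}
```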
+ ### 8. FAILURE HANDLING
+
+ When a task fails and cannot be auto-fixed:
+
+ **Behavior:**
+ 1. Leave worktree intact at `{worktree_path}`
+ 2. Keep checkpoint.json for potential resume
+ 3. Output debugging instructions
+
+ **Output:**
+ ```
+ ✗ Task T3 failed after retry
+
+ Worktree preserved for debugging:
+   Path: .deepflow/worktrees/df/doing-upload/20260202-1430
+   Branch: df/doing-upload/20260202-1430
+
+ To investigate:
+   cd .deepflow/worktrees/df/doing-upload/20260202-1430
+   # examine files, run tests, etc.
+
+ To resume after fixing:
+   /df:execute --continue
+
+ To discard and start fresh:
+   git worktree remove --force .deepflow/worktrees/df/doing-upload/20260202-1430
+   git branch -D df/doing-upload/20260202-1430
+   /df:execute --fresh
+ ```
+
+ **Key points:**
+ - Never auto-delete worktree on failure (cleanup_on_fail: false by default)
+ - Always provide the exact cleanup commands
+ - Checkpoint remains so --continue can work after manual fix
+
+ ### 9. COMPLETE SPECS
 
  When all tasks done for a `doing-*` spec:
  1. Embed history in spec: `## Completed` section
  2. Rename: `doing-upload.md` → `done-upload.md`
  3. Remove section from PLAN.md
 
- ### 8. ITERATE
+ ### 10. ITERATE
 
  Repeat until: all done, all blocked, or checkpoint.
 
@@ -126,6 +324,8 @@ Repeat until: all done, all blocked, or checkpoint.
 
  ## Example
 
+ ### Standard Execution
+
  ```
  /df:execute (context: 12%)
 
@@ -140,7 +340,52 @@ Wave 2: T3 (context: 48%)
  ✓ Complete: 3/3 tasks
  ```
 
- With checkpoint:
+ ### Spike-First Execution
+
+ ```
+ /df:execute (context: 10%)
+
+ Checking experiment status...
+ T1 [SPIKE]: No experiment yet, spike executable
+ T2: Blocked by T1 (spike not validated)
+ T3: Blocked by T1 (spike not validated)
+
+ Wave 1: T1 [SPIKE] (context: 20%)
+ T1: success (abc1234) → upload--streaming--passed.md
+
+ Checking experiment status...
+ T2: Experiment passed, unblocked
+ T3: Experiment passed, unblocked
+
+ Wave 2: T2, T3 parallel (context: 45%)
+ T2: success (def5678)
+ T3: success (ghi9012)
+
+ ✓ doing-upload → done-upload
+ ✓ Complete: 3/3 tasks
+ ```
+
+ ### Spike Failed
+
+ ```
+ /df:execute (context: 10%)
+
+ Wave 1: T1 [SPIKE] (context: 20%)
+ T1: failed → upload--streaming--failed.md
+
+ Checking experiment status...
+ T2: ⚠ Blocked - Experiment failed
+ T3: ⚠ Blocked - Experiment failed
+
+ ⚠ Spike T1 invalidated hypothesis
+ Experiment: upload--streaming--failed.md
+ → Run /df:plan to generate new hypothesis spike
+
+ Complete: 1/3 tasks (2 blocked by failed experiment)
+ ```
+
+ ### With Checkpoint
+
  ```
  Wave 1 complete (context: 52%)
  Checkpoint saved. Run /df:execute --continue
@@ -42,21 +42,33 @@ Determine source_dir from config or default to src/
 
  If no new specs: report counts, suggest `/df:execute`.
 
- ### 2. CHECK PAST EXPERIMENTS
+ ### 2. CHECK PAST EXPERIMENTS (SPIKE-FIRST)
 
- Extract domains from spec (perf, auth, api, etc.), then:
+ **CRITICAL**: Check experiments BEFORE generating any tasks.
+
+ Extract the topic from the spec name (fuzzy match), then:
 
  ```
- Glob .deepflow/experiments/{domain}--*
+ Glob .deepflow/experiments/{topic}--*
  ```
 
+ **Experiment file naming:** `{topic}--{hypothesis}--{status}.md`
+ Statuses: `active`, `passed`, `failed`
+
  | Result | Action |
  |--------|--------|
- | `--failed.md` | Exclude approach, note why |
- | `--success.md` | Reference as pattern |
- | No matches | Continue (expected for new projects) |
+ | `--failed.md` exists | Extract "next hypothesis" from the Conclusion section |
+ | `--passed.md` exists | Reference as validated pattern; can proceed to full implementation |
+ | `--active.md` exists | Wait for experiment completion before planning |
+ | No matches | New topic, needs an initial spike |
+
+ **Spike-First Rule**:
+ - If `--failed.md` exists: Generate a spike task to test the next hypothesis (from the failed experiment's Conclusion)
+ - If no experiments exist: Generate a spike task for the core hypothesis
+ - Full implementation tasks are BLOCKED until a spike validates the approach
+ - Only proceed to full task generation after `--passed.md` exists
 
- **Naming:** `{domain}--{approach}--{result}.md`
+ See: `templates/experiment-template.md` for the experiment format
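The `doing-{topic} → {topic}` extraction above can be sketched in shell (illustrative; lowercasing and collapsing non-alphanumerics to `-` are assumptions about what "fuzzy match" normalization means here):

```shell
# Sketch: normalize a spec filename to a topic slug for experiment globbing.
spec_topic() {
  name=$(basename "$1" .md)   # specs/doing-File-Upload.md -> doing-File-Upload
  name=${name#doing-}         # File-Upload
  # Lowercase, collapse runs of non-alphanumerics to '-', trim edge dashes
  printf '%s' "$name" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | sed 's/^-*//;s/-*$//'
}
```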
 
  ### 3. DETECT PROJECT CONTEXT
 
@@ -69,7 +81,15 @@ Include patterns in task descriptions for agents to follow.
 
  ### 4. ANALYZE CODEBASE
 
- **Spawn Explore agents** (haiku, read-only) with dynamic count:
+ **Use Task tool to spawn Explore agents in parallel:**
+ ```
+ Task tool parameters:
+ - subagent_type: "Explore"
+ - model: "haiku"
+ - run_in_background: true (for parallel execution)
+ ```
+
+ Scale agent count based on codebase size:
 
  | File Count | Agents |
  |------------|--------|
@@ -78,6 +98,34 @@ Include patterns in task descriptions for agents to follow.
  | 100-500 | 25-40 |
  | 500+ | 50-100 (cap) |
 
+ **Explore Agent Prompt Structure:**
+ ```
+ Find: [specific question]
+ Return ONLY:
+ - File paths matching criteria
+ - One-line description per file
+ - Integration points (if asked)
+
+ DO NOT:
+ - Read or summarize spec files
+ - Make recommendations
+ - Propose solutions
+ - Generate tables or lengthy explanations
+
+ Max response: 500 tokens (configurable via .deepflow/config.yaml explore.max_tokens)
+ ```
+
+ **Explore Agent Scope Restrictions:**
+ - MUST only report factual findings:
+   - Files found
+   - Patterns/conventions observed
+   - Integration points
+ - MUST NOT:
+   - Make recommendations
+   - Propose architectures
+   - Read and summarize specs (that's the orchestrator's job)
+   - Draw conclusions about what should be built
+
  **Use `code-completeness` skill patterns** to search for:
  - Implementations matching spec requirements
  - TODO, FIXME, HACK comments
@@ -86,7 +134,14 @@ Include patterns in task descriptions for agents to follow.
 
  ### 5. COMPARE & PRIORITIZE
 
- **Spawn `reasoner` agent** (Opus) for analysis:
+ **Use Task tool to spawn reasoner agent:**
+ ```
+ Task tool parameters:
+ - subagent_type: "reasoner"
+ - model: "opus"
+ ```
+
+ The reasoner performs the analysis:
 
  | Status | Action |
  |--------|--------|
@@ -102,7 +157,36 @@ Include patterns in task descriptions for agents to follow.
  2. Impact — core features before enhancements
  3. Risk — unknowns early
 
- ### 6. VALIDATE HYPOTHESES
+ ### 6. GENERATE SPIKE TASKS (IF NEEDED)
+
+ **When to generate spike tasks:**
+ 1. Failed experiment exists → Test the next hypothesis
+ 2. No experiments exist → Test the core hypothesis
+ 3. Passed experiment exists → Skip to full implementation
+
+ **Spike Task Format:**
+ ```markdown
+ - [ ] **T1** [SPIKE]: Validate {hypothesis}
+   - Type: spike
+   - Hypothesis: {what we're testing}
+   - Method: {minimal steps to validate}
+   - Success criteria: {how to know it passed}
+   - Time-box: 30 min
+   - Files: .deepflow/experiments/{topic}--{hypothesis}--{status}.md
+   - Blocked by: none
+ ```
+
+ **Blocking Logic:**
+ - All implementation tasks MUST have `Blocked by: T{spike}` until the spike passes
+ - After the spike completes:
+   - If passed: Update experiment to `--passed.md`, unblock implementation tasks
+   - If failed: Update experiment to `--failed.md`, DO NOT generate implementation tasks
+
+ **Full Implementation Only After Spike:**
+ - Only generate the full task list when a spike validates the approach
+ - Never generate a 10-task waterfall without a validated hypothesis
+
+ ### 7. VALIDATE HYPOTHESES
 
  Test risky assumptions before finalizing plan.
 
@@ -111,24 +195,27 @@ Test risky assumptions before finalizing plan.
  **Process:**
  1. Prototype in scratchpad (not committed)
  2. Test assumption
- 3. If fails → Write `.deepflow/experiments/{domain}--{approach}--failed.md`
+ 3. If fails → Write `.deepflow/experiments/{topic}--{hypothesis}--failed.md`
  4. Adjust approach, document in task
 
  **Skip:** Well-known patterns, simple CRUD, clear docs exist
 
- ### 7. OUTPUT PLAN.md
+ ### 8. OUTPUT PLAN.md
 
  Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validation findings.
 
- ### 8. RENAME SPECS
+ ### 9. RENAME SPECS
 
  `mv specs/feature.md specs/doing-feature.md`
 
- ### 9. REPORT
+ ### 10. REPORT
 
  `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`
 
  ## Rules
+ - **Spike-first** — Generate a spike task before full implementation if no `--passed.md` experiment exists
+ - **Block on spike** — Full implementation tasks MUST be blocked by spike validation
+ - **Learn from failures** — Extract the "next hypothesis" from failed experiments; never repeat the same approach
  - **Learn from history** — Check past experiments before proposing approaches
  - **Plan only** — Do NOT implement anything (except quick validation prototypes)
  - **Validate before commit** — Test risky assumptions with minimal experiments
@@ -139,13 +226,64 @@ Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validatio
 
  ## Agent Scaling
 
- | Agent | Base | Scale |
- |-------|------|-------|
- | Explore (search) | 10 | +1 per 20 files |
- | Reasoner (analyze) | 5 | +1 per 2 specs |
+ | Agent | Model | Base | Scale |
+ |-------|-------|------|-------|
+ | Explore (search) | haiku | 10 | +1 per 20 files |
+ | Reasoner (analyze) | opus | 5 | +1 per 2 specs |
+
+ **IMPORTANT**: Always use the `Task` tool with explicit `subagent_type` and `model` parameters. Do NOT use Glob/Grep/Read directly for codebase analysis; spawn agents instead.
 
  ## Example
 
+ ### Spike-First (No Prior Experiments)
+
+ ```markdown
+ # Plan
+
+ ### doing-upload
+
+ - [ ] **T1** [SPIKE]: Validate streaming upload approach
+   - Type: spike
+   - Hypothesis: Streaming uploads will handle files >1GB without memory issues
+   - Method: Create minimal endpoint, upload 2GB file, measure memory
+   - Success criteria: Memory stays under 500MB during upload
+   - Time-box: 30 min
+   - Files: .deepflow/experiments/upload--streaming--active.md
+   - Blocked by: none
+
+ - [ ] **T2**: Create upload endpoint
+   - Files: src/api/upload.ts
+   - Blocked by: T1 (spike must pass)
+
+ - [ ] **T3**: Add S3 service with streaming
+   - Files: src/services/storage.ts
+   - Blocked by: T1 (spike must pass), T2
+ ```
+
+ ### Spike-First (After Failed Experiment)
+
+ ```markdown
+ # Plan
+
+ ### doing-upload
+
+ - [ ] **T1** [SPIKE]: Validate chunked upload with backpressure
+   - Type: spike
+   - Hypothesis: Adding backpressure control will prevent buffer overflow
+   - Method: Implement pause/resume on buffer threshold, test with 2GB file
+   - Success criteria: No memory spikes above 500MB
+   - Time-box: 30 min
+   - Files: .deepflow/experiments/upload--chunked-backpressure--active.md
+   - Blocked by: none
+   - Note: Previous approach failed (see upload--buffer-upload--failed.md)
+
+ - [ ] **T2**: Implement chunked upload endpoint
+   - Files: src/api/upload.ts
+   - Blocked by: T1 (spike must pass)
+ ```
+
+ ### After Spike Validates (Full Implementation)
+
  ```markdown
  # Plan
 
@@ -154,10 +292,10 @@ Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validatio
  - [ ] **T1**: Create upload endpoint
    - Files: src/api/upload.ts
    - Blocked by: none
+   - Note: Use streaming (validated in upload--streaming--passed.md)
 
  - [ ] **T2**: Add S3 service with streaming
    - Files: src/services/storage.ts
    - Blocked by: T1
-   - Note: Use streaming (see experiments/perf--chunked-upload--success.md)
-   - Avoid: Direct buffer upload failed for large files (experiments/perf--buffer-upload--failed.md)
+   - Avoid: Direct buffer upload failed (see upload--buffer-upload--failed.md)
  ```
@@ -20,14 +20,26 @@ Transform conversation context into a structured specification file.
 
  ## Skills & Agents
  - Skill: `gap-discovery` — Proactive requirement gap identification
- - Agent: `Explore` (haiku) — Codebase context gathering
- - Agent: `reasoner` (Opus) Synthesize findings into requirements
+
+ **Use Task tool to spawn agents:**
+ | Agent | subagent_type | model | Purpose |
+ |-------|---------------|-------|---------|
+ | Context | `Explore` | `haiku` | Codebase context gathering |
+ | Synthesizer | `reasoner` | `opus` | Synthesize findings into requirements |
 
  ## Behavior
 
  ### 1. GATHER CODEBASE CONTEXT
 
- **Spawn Explore agents** (haiku, read-only, parallel) to find:
+ **Use Task tool to spawn Explore agents in parallel:**
+ ```
+ Task tool parameters:
+ - subagent_type: "Explore"
+ - model: "haiku"
+ - run_in_background: true
+ ```
+
+ Find:
  - Related existing implementations
  - Code patterns and conventions
  - Integration points relevant to the feature
@@ -39,6 +51,34 @@ Transform conversation context into a structured specification file.
  | 20-100 | 5-8 |
  | 100+ | 10-15 |
 
+ **Explore Agent Prompt Structure:**
+ ```
+ Find: [specific question]
+ Return ONLY:
+ - File paths matching criteria
+ - One-line description per file
+ - Integration points (if asked)
+
+ DO NOT:
+ - Read or summarize spec files
+ - Make recommendations
+ - Propose solutions
+ - Generate tables or lengthy explanations
+
+ Max response: 500 tokens (configurable via .deepflow/config.yaml explore.max_tokens)
+ ```
+
+ **Explore Agent Scope Restrictions:**
+ - MUST only report factual findings:
+   - Files found
+   - Patterns/conventions observed
+   - Integration points
+ - MUST NOT:
+   - Make recommendations
+   - Propose architectures
+   - Read and summarize specs (that's the orchestrator's job)
+   - Draw conclusions about what should be built
+
  ### 2. GAP CHECK
  Use the `gap-discovery` skill to analyze conversation + agent findings.
 
@@ -70,7 +110,14 @@ Max 4 questions per tool call. Wait for answers before proceeding.
 
  ### 3. SYNTHESIZE FINDINGS
 
- **Spawn `reasoner` agent** (Opus) to:
+ **Use Task tool to spawn reasoner agent:**
+ ```
+ Task tool parameters:
+ - subagent_type: "reasoner"
+ - model: "opus"
+ ```
+
+ The reasoner will:
  - Analyze codebase context from Explore agents
  - Identify constraints from existing architecture
  - Suggest requirements based on patterns found
@@ -130,10 +177,12 @@ Next: Run /df:plan to generate tasks
 
  ## Agent Scaling
 
- | Agent | Base | Purpose |
- |-------|------|---------|
- | Explore (haiku) | 3-5 | Find related code, patterns |
- | Reasoner (Opus) | 1 | Synthesize into requirements |
+ | Agent | subagent_type | model | Base | Purpose |
+ |-------|---------------|-------|------|---------|
+ | Explore | `Explore` | `haiku` | 3-5 | Find related code, patterns |
+ | Reasoner | `reasoner` | `opus` | 1 | Synthesize into requirements |
+
+ **IMPORTANT**: Always use the `Task` tool with explicit `subagent_type` and `model` parameters.
 
  ## Example
 
@@ -12,7 +12,11 @@ Check that implemented code satisfies spec requirements and acceptance criteria.
 
  ## Skills & Agents
  - Skill: `code-completeness` — Find incomplete implementations
- - Agent: `Explore` (Haiku) — Fast codebase scanning
+
+ **Use Task tool to spawn agents:**
+ | Agent | subagent_type | model | Purpose |
+ |-------|---------------|-------|---------|
+ | Scanner | `Explore` | `haiku` | Fast codebase scanning |
 
  ## Spec File States
 
@@ -87,7 +91,15 @@ Default: L1-L3 (L4 optional, can be slow)
 
  ## Agent Usage
 
- Spawn `Explore` agents (Haiku), 1-2 per spec, cap 10.
+ **Use Task tool to spawn Explore agents:**
+ ```
+ Task tool parameters:
+ - subagent_type: "Explore"
+ - model: "haiku"
+ - run_in_background: true (for parallel)
+ ```
+
+ Scale: 1-2 agents per spec, cap 10.
 
  ## Example
 
@@ -103,3 +115,46 @@ Learnings captured:
  → experiments/perf--streaming-upload--success.md
  → experiments/auth--jwt-refresh-rotation--success.md
  ```
+
+ ## Post-Verification: Worktree Merge & Cleanup
+
+ After all verification passes:
+
+ ### 1. MERGE TO MAIN
+
+ ```bash
+ # Get worktree info from checkpoint
+ WORKTREE_BRANCH=$(jq -r '.worktree_branch' .deepflow/checkpoint.json)
+
+ # Switch to main and merge
+ git checkout main
+ git merge "${WORKTREE_BRANCH}" --no-ff -m "feat({spec}): merge verified changes"
+ ```
+
+ **On merge conflict:**
+ - Keep the worktree intact for manual resolution
+ - Output: "Merge conflict detected. Resolve manually, then run /df:verify --merge-only"
+ - Exit without cleanup
+
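The conflict handling above can be sketched as a guarded merge (illustrative; the function form is an assumption, and the non-zero exit of `git merge` is how the conflict is detected):

```shell
# Sketch: merge a verified worktree branch into main; on conflict, leave
# the conflicted state and the worktree in place for manual resolution.
merge_worktree_branch() {
  branch="$1"; spec="$2"
  git checkout main || return 1
  if ! git merge "$branch" --no-ff -m "feat(${spec}): merge verified changes"; then
    echo 'Merge conflict detected. Resolve manually, then run /df:verify --merge-only' >&2
    return 1
  fi
}
```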
+ ### 2. CLEANUP WORKTREE
+
+ After successful merge:
+
+ ```bash
+ # Get worktree path from checkpoint
+ WORKTREE_PATH=$(jq -r '.worktree_path' .deepflow/checkpoint.json)
+
+ # Remove worktree and branch
+ git worktree remove --force "${WORKTREE_PATH}"
+ git branch -d "${WORKTREE_BRANCH}"
+
+ # Remove checkpoint
+ rm .deepflow/checkpoint.json
+ ```
+
+ **Output on success:**
+ ```
+ ✓ Merged df/doing-upload/20260202-1430 to main
+ ✓ Cleaned up worktree and branch
+ ✓ Spec complete: doing-upload → done-upload
+ ```
@@ -36,7 +36,28 @@ models:
    reason: opus # Complex decisions
    debug: opus # Problem solving
 
+ explore:
+   max_tokens: 500 # Controls Explore agent response length
+
  commits:
    format: "feat({spec}): {description}"
    atomic: true # One task = one commit
    push_after: complete # Or "each" for every commit
+
+ # Worktree isolation for /df:execute
+ # Isolates all agent work in a git worktree, keeping main clean
+ worktree:
+   # Enable worktree isolation (default: true)
+   enabled: true
+
+   # Base path for worktrees relative to project root
+   base_path: .deepflow/worktrees
+
+   # Branch name prefix for worktree branches
+   branch_prefix: df/
+
+   # Automatically cleanup worktree after successful verify
+   cleanup_on_success: true
+
+   # Keep worktree after failed execution for debugging
+   cleanup_on_fail: false
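One of these settings can be read without a full YAML parser, as a sketch (assumes the flat two-level layout shown above and no inline comment on the `enabled` line):

```shell
# Sketch: read worktree.enabled from .deepflow/config.yaml, defaulting to true.
worktree_enabled() {
  # Print the value of 'enabled:' found inside the worktree: block
  v=$(sed -n '/^worktree:/,/^[^ ]/s/^ *enabled: *//p' .deepflow/config.yaml 2>/dev/null | head -n 1)
  echo "${v:-true}"
}
```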
@@ -0,0 +1,74 @@
+ # Experiment: {hypothesis-slug}
+
+ > **Filename convention**: `{topic}--{hypothesis-slug}--{status}.md`
+ > Status: `active` | `passed` | `failed`
+
+ ## Topic
+
+ {Spec name or feature area this experiment relates to}
+
+ <!--
+ What problem or feature does this experiment address?
+ Link to relevant spec if applicable.
+ -->
+
+ ## Hypothesis
+
+ {What we believe will work and why}
+
+ <!--
+ Be specific and testable:
+ - "Using approach X will achieve Y because Z"
+ - "The bottleneck is in component A, not B"
+ - Should be falsifiable in a single experiment
+ -->
+
+ ## Method
+
+ {Minimal steps to validate the hypothesis}
+
+ <!--
+ Keep it minimal - fastest path to prove/disprove:
+ 1. Step one (e.g., "Create test file with X")
+ 2. Step two (e.g., "Run command Y")
+ 3. Step three (e.g., "Observe output Z")
+
+ Time-box: ideally under 30 minutes
+ -->
+
+ ## Result
+
+ **Status**: {pass | fail}
+
+ {Actual outcome with evidence}
+
+ <!--
+ Include concrete evidence:
+ - Error messages, output logs
+ - Metrics or measurements
+ - Screenshots if applicable
+ - What specifically happened vs. expected
+ -->
+
+ ## Conclusion
+
+ {What we learned from this experiment}
+
+ <!--
+ Answer these:
+ - Why did it pass/fail?
+ - What assumption was validated/invalidated?
+ - If failed: What's the next hypothesis? (don't repeat same approach)
+ - If passed: What's ready for implementation?
+ -->
+
+ ---
+
+ <!--
+ Experiment Guidelines:
+ - One hypothesis per experiment
+ - Failed experiments are valuable - they inform the next hypothesis
+ - Never repeat a failed approach without a new insight
+ - Keep experiments small and fast (under 30 min)
+ - Link related experiments in conclusions
+ -->
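A hypothetical filled-in instance of this template, reusing the `upload--streaming` example that appears elsewhere in this package (all values invented for illustration):

```markdown
# Experiment: streaming

> File: `upload--streaming--passed.md`

## Topic

doing-upload — large file upload support

## Hypothesis

Streaming uploads will handle files >1GB without memory issues.

## Method

1. Create a minimal upload endpoint
2. Upload a 2GB file
3. Record peak process memory

## Result

**Status**: pass

Peak memory stayed well under the 500MB success criterion during a 2GB upload.

## Conclusion

Streaming keeps memory flat regardless of file size; the approach is ready
for full implementation (unblocks the dependent tasks).
```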
@@ -29,6 +29,22 @@ Generated: {timestamp}
    - Files: {files}
    - Blocked by: T1
 
+ ### Spike Task Example
+
+ When no experiments exist to validate an approach, start with a minimal validation spike:
+
+ - [ ] **T1** (spike): Validate [hypothesis] approach
+   - Files: [minimal files needed]
+   - Blocked by: none
+   - Blocks: T2, T3, T4 (full implementation)
+   - Description: Minimal test to verify [approach] works before full implementation
+
+ - [ ] **T2**: Implement [feature] based on spike results
+   - Files: [implementation files]
+   - Blocked by: T1 (spike)
+
+ Spike tasks are 1-2 tasks to validate an approach before committing to full implementation.
+
  ---
 
  <!--
@@ -38,4 +54,6 @@ Plan Guidelines:
  - Blocked by references task IDs (T1, T2, etc.)
  - Mark complete with [x] and commit hash
  - Example completed: [x] **T1**: Create API ✓ (abc1234)
+ - Spike tasks: If no experiments validate the approach, the first task should be a minimal validation spike
+ - Spike tasks block full implementation tasks until the hypothesis is validated
  -->