deepflow 0.1.24 → 0.1.26

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "deepflow",
-   "version": "0.1.24",
+   "version": "0.1.26",
    "description": "Stay in flow state - lightweight spec-driven task orchestration for Claude Code",
    "keywords": [
      "claude",
@@ -24,8 +24,12 @@ Implement tasks from PLAN.md with parallel agents, atomic commits, and context-e
 
  ## Skills & Agents
  - Skill: `atomic-commits` — Clean commit protocol
- - Agent: `general-purpose` (Sonnet) — Task implementation
- - Agent: `reasoner` (Opus) Debugging failures
+
+ **Use Task tool to spawn agents:**
+ | Agent | subagent_type | model | Purpose |
+ |-------|---------------|-------|---------|
+ | Implementation | `general-purpose` | `sonnet` | Task implementation |
+ | Debugger | `reasoner` | `opus` | Debugging failures |
 
  ## Context-Aware Execution
 
@@ -82,20 +86,93 @@ If missing: "No PLAN.md found. Run /df:plan first."
 
  Warn if `specs/*.md` (excluding doing-/done-) exist. Non-blocking.
 
- ### 4. IDENTIFY READY TASKS
+ ### 4. CHECK EXPERIMENT STATUS (HYPOTHESIS VALIDATION)
+
+ **Before identifying ready tasks**, check experiment validation for full implementation tasks.
+
+ **Task Types:**
+ - **Spike tasks**: Have `[SPIKE]` in title OR `Type: spike` in description — always executable
+ - **Full implementation tasks**: Blocked by spike tasks — require validated experiment
+
+ **Validation Flow:**
+
+ ```
+ For each task in plan:
+   If task is spike task:
+     → Mark as executable (spikes are always allowed)
+   Else if task is blocked by a spike task (T{n}):
+     → Find related experiment file in .deepflow/experiments/
+     → Check experiment status:
+       - --passed.md exists → Unblock, proceed with implementation
+       - --failed.md exists → Keep blocked, warn user
+       - --active.md exists → Keep blocked, spike in progress
+       - No experiment → Keep blocked, spike not started
+ ```
+
+ **Experiment File Discovery:**
+
+ ```
+ Glob: .deepflow/experiments/{topic}--*--{status}.md
 
- Ready = `[ ]` + all `blocked_by` complete + not in checkpoint.
+ Topic extraction:
+ 1. From spike task: experiment file path in task description
+ 2. From spec name: doing-{topic} → {topic}
+ 3. Fuzzy match: normalize and match
+ ```
 
- ### 5. SPAWN AGENTS
+ **Status Handling:**
+
+ | Experiment Status | Task Status | Action |
+ |-------------------|-------------|--------|
+ | `--passed.md` | Ready | Execute full implementation |
+ | `--failed.md` | Blocked | Skip, warn: "Experiment failed, re-plan needed" |
+ | `--active.md` | Blocked | Skip, info: "Waiting for spike completion" |
+ | Not found | Blocked | Skip, info: "Spike task not executed yet" |
+
+ **Warning Output:**
+
+ ```
+ ⚠ T3 blocked: Experiment 'upload--streaming--failed.md' did not validate
+ → Run /df:plan to generate new hypothesis spike
+ ```
+
+ ### 5. IDENTIFY READY TASKS
+
+ Ready = `[ ]` + all `blocked_by` complete + experiment validated (if applicable) + not in checkpoint.
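The readiness rule can be expressed as a small predicate; a sketch only, where the task dict shape (`id`, `checked`, `blocked_by`, `spike`, `topic`) is a hypothetical representation of a PLAN.md entry.

```python
def is_ready(task: dict, done: set, validated_topics: set, checkpointed: set) -> bool:
    # Unchecked box, all blocked_by complete, experiment validated for
    # non-spike tasks, and not parked in a checkpoint.
    if task.get("checked") or task["id"] in checkpointed:
        return False
    if any(dep not in done for dep in task.get("blocked_by", [])):
        return False
    topic = task.get("topic")
    if not task.get("spike") and topic is not None and topic not in validated_topics:
        return False
    return True
```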
+
+ ### 6. SPAWN AGENTS
 
  Context ≥50%: checkpoint and exit.
 
- Spawn all ready tasks in ONE message (parallel). Same-file conflicts: sequential.
+ **Use Task tool to spawn all ready tasks in ONE message (parallel):**
+ ```
+ Task tool parameters for each task:
+ - subagent_type: "general-purpose"
+ - model: "sonnet"
+ - run_in_background: true
+ - prompt: "{task details from PLAN.md}"
+ ```
 
- On failure: spawn `reasoner`.
+ Same-file conflicts: spawn sequentially instead.
 
- ### 6. PER-TASK (agent prompt)
+ **Spike Task Execution:**
+ When spawning a spike task, the agent MUST:
+ 1. Execute the minimal validation method
+ 2. Record result in experiment file (update status: `--passed.md` or `--failed.md`)
+ 3. If passed: implementation tasks become unblocked
+ 4. If failed: record conclusion with "next hypothesis" for future planning
 
+ **On failure, use Task tool to spawn reasoner:**
+ ```
+ Task tool parameters:
+ - subagent_type: "reasoner"
+ - model: "opus"
+ - prompt: "Debug failure: {error details}"
+ ```
+
+ ### 7. PER-TASK (agent prompt)
+
+ **Standard Task:**
  ```
  {task_id}: {description from PLAN.md}
  Files: {target files}
@@ -105,14 +182,38 @@ Implement, test, commit as feat({spec}): {description}.
  Write result to .deepflow/results/{task_id}.yaml
  ```
 
- ### 7. COMPLETE SPECS
+ **Spike Task:**
+ ```
+ {task_id} [SPIKE]: {hypothesis}
+ Type: spike
+ Method: {minimal steps to validate}
+ Success criteria: {how to know it passed}
+ Time-box: {duration}
+ Experiment file: {.deepflow/experiments/{topic}--{hypothesis}--active.md}
+ Spec: {spec_name}
+
+ Execute the minimal validation:
+ 1. Follow the method steps exactly
+ 2. Measure against success criteria
+ 3. Update experiment file with result:
+    - If passed: rename to --passed.md, record findings
+    - If failed: rename to --failed.md, record conclusion with "next hypothesis"
+ 4. Commit as spike({spec}): validate {hypothesis}
+ 5. Write result to .deepflow/results/{task_id}.yaml
+
+ Result status:
+ - success = hypothesis validated (passed)
+ - failed = hypothesis invalidated (failed experiment, NOT agent error)
+ ```
+
+ ### 8. COMPLETE SPECS
 
  When all tasks done for a `doing-*` spec:
  1. Embed history in spec: `## Completed` section
  2. Rename: `doing-upload.md` → `done-upload.md`
  3. Remove section from PLAN.md
 
- ### 8. ITERATE
+ ### 9. ITERATE
 
  Repeat until: all done, all blocked, or checkpoint.
 
@@ -126,6 +227,8 @@ Repeat until: all done, all blocked, or checkpoint.
 
  ## Example
 
+ ### Standard Execution
+
  ```
  /df:execute (context: 12%)
 
@@ -140,7 +243,52 @@ Wave 2: T3 (context: 48%)
  ✓ Complete: 3/3 tasks
  ```
 
- With checkpoint:
+ ### Spike-First Execution
+
+ ```
+ /df:execute (context: 10%)
+
+ Checking experiment status...
+ T1 [SPIKE]: No experiment yet, spike executable
+ T2: Blocked by T1 (spike not validated)
+ T3: Blocked by T1 (spike not validated)
+
+ Wave 1: T1 [SPIKE] (context: 20%)
+ T1: success (abc1234) → upload--streaming--passed.md
+
+ Checking experiment status...
+ T2: Experiment passed, unblocked
+ T3: Experiment passed, unblocked
+
+ Wave 2: T2, T3 parallel (context: 45%)
+ T2: success (def5678)
+ T3: success (ghi9012)
+
+ ✓ doing-upload → done-upload
+ ✓ Complete: 3/3 tasks
+ ```
+
+ ### Spike Failed
+
+ ```
+ /df:execute (context: 10%)
+
+ Wave 1: T1 [SPIKE] (context: 20%)
+ T1: failed → upload--streaming--failed.md
+
+ Checking experiment status...
+ T2: ⚠ Blocked - Experiment failed
+ T3: ⚠ Blocked - Experiment failed
+
+ ⚠ Spike T1 invalidated hypothesis
+ Experiment: upload--streaming--failed.md
+ → Run /df:plan to generate new hypothesis spike
+
+ Complete: 1/3 tasks (2 blocked by failed experiment)
+ ```
+
+ ### With Checkpoint
+
  ```
  Wave 1 complete (context: 52%)
  Checkpoint saved. Run /df:execute --continue
@@ -42,21 +42,33 @@ Determine source_dir from config or default to src/
 
  If no new specs: report counts, suggest `/df:execute`.
 
- ### 2. CHECK PAST EXPERIMENTS
+ ### 2. CHECK PAST EXPERIMENTS (SPIKE-FIRST)
 
- Extract domains from spec (perf, auth, api, etc.), then:
+ **CRITICAL**: Check experiments BEFORE generating any tasks.
+
+ Extract topic from spec name (fuzzy match), then:
 
  ```
- Glob .deepflow/experiments/{domain}--*
+ Glob .deepflow/experiments/{topic}--*
  ```
 
+ **Experiment file naming:** `{topic}--{hypothesis}--{status}.md`
+ Statuses: `active`, `passed`, `failed`
+
  | Result | Action |
  |--------|--------|
- | `--failed.md` | Exclude approach, note why |
- | `--success.md` | Reference as pattern |
- | No matches | Continue (expected for new projects) |
+ | `--failed.md` exists | Extract "next hypothesis" from Conclusion section |
+ | `--passed.md` exists | Reference as validated pattern, can proceed to full implementation |
+ | `--active.md` exists | Wait for experiment completion before planning |
+ | No matches | New topic, needs initial spike |
+
+ **Spike-First Rule**:
+ - If `--failed.md` exists: Generate spike task to test the next hypothesis (from failed experiment's Conclusion)
+ - If no experiments exist: Generate spike task for the core hypothesis
+ - Full implementation tasks are BLOCKED until a spike validates the approach
+ - Only proceed to full task generation after `--passed.md` exists
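The spike-first rule reduces to a three-way decision on a topic's known experiment statuses. A minimal sketch; `plan_mode` and the returned action labels are hypothetical names, not part of the command spec.

```python
def plan_mode(statuses: set) -> str:
    # Map a topic's experiment statuses to the spike-first planning action.
    if "passed" in statuses:
        return "implement"  # validated: safe to generate the full task list
    if "active" in statuses:
        return "wait"       # spike in progress: hold off on planning
    return "spike"          # failed or no experiments: generate a spike task
```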
 
- **Naming:** `{domain}--{approach}--{result}.md`
+ See: `templates/experiment-template.md` for experiment format
 
  ### 3. DETECT PROJECT CONTEXT
 
@@ -69,7 +81,15 @@ Include patterns in task descriptions for agents to follow.
 
  ### 4. ANALYZE CODEBASE
 
- **Spawn Explore agents** (haiku, read-only) with dynamic count:
+ **Use Task tool to spawn Explore agents in parallel:**
+ ```
+ Task tool parameters:
+ - subagent_type: "Explore"
+ - model: "haiku"
+ - run_in_background: true (for parallel execution)
+ ```
+
+ Scale agent count based on codebase size:
 
  | File Count | Agents |
  |------------|--------|
@@ -78,6 +98,34 @@ Include patterns in task descriptions for agents to follow.
  | 100-500 | 25-40 |
  | 500+ | 50-100 (cap) |
 
+ **Explore Agent Prompt Structure:**
+ ```
+ Find: [specific question]
+ Return ONLY:
+ - File paths matching criteria
+ - One-line description per file
+ - Integration points (if asked)
+
+ DO NOT:
+ - Read or summarize spec files
+ - Make recommendations
+ - Propose solutions
+ - Generate tables or lengthy explanations
+
+ Max response: 500 tokens (configurable via .deepflow/config.yaml explore.max_tokens)
+ ```
+
+ **Explore Agent Scope Restrictions:**
+ - MUST only report factual findings:
+   - Files found
+   - Patterns/conventions observed
+   - Integration points
+ - MUST NOT:
+   - Make recommendations
+   - Propose architectures
+   - Read and summarize specs (that's orchestrator's job)
+   - Draw conclusions about what should be built
+
  **Use `code-completeness` skill patterns** to search for:
  - Implementations matching spec requirements
  - TODO, FIXME, HACK comments
@@ -86,7 +134,14 @@ Include patterns in task descriptions for agents to follow.
 
  ### 5. COMPARE & PRIORITIZE
 
- **Spawn `reasoner` agent** (Opus) for analysis:
+ **Use Task tool to spawn reasoner agent:**
+ ```
+ Task tool parameters:
+ - subagent_type: "reasoner"
+ - model: "opus"
+ ```
+
+ Reasoner performs analysis:
 
  | Status | Action |
  |--------|--------|
@@ -102,7 +157,36 @@ Include patterns in task descriptions for agents to follow.
  2. Impact — core features before enhancements
  3. Risk — unknowns early
 
- ### 6. VALIDATE HYPOTHESES
+ ### 6. GENERATE SPIKE TASKS (IF NEEDED)
+
+ **When to generate spike tasks:**
+ 1. Failed experiment exists → Test the next hypothesis
+ 2. No experiments exist → Test the core hypothesis
+ 3. Passed experiment exists → Skip to full implementation
+
+ **Spike Task Format:**
+ ```markdown
+ - [ ] **T1** [SPIKE]: Validate {hypothesis}
+   - Type: spike
+   - Hypothesis: {what we're testing}
+   - Method: {minimal steps to validate}
+   - Success criteria: {how to know it passed}
+   - Time-box: 30 min
+   - Files: .deepflow/experiments/{topic}--{hypothesis}--{status}.md
+   - Blocked by: none
+ ```
+
+ **Blocking Logic:**
+ - All implementation tasks MUST have `Blocked by: T{spike}` until spike passes
+ - After spike completes:
+   - If passed: Update experiment to `--passed.md`, unblock implementation tasks
+   - If failed: Update experiment to `--failed.md`, DO NOT generate implementation tasks
+
+ **Full Implementation Only After Spike:**
+ - Only generate full task list when spike validates the approach
+ - Never generate 10-task waterfall without validated hypothesis
+
+ ### 7. VALIDATE HYPOTHESES
 
  Test risky assumptions before finalizing plan.
 
@@ -111,24 +195,27 @@ Test risky assumptions before finalizing plan.
  **Process:**
  1. Prototype in scratchpad (not committed)
  2. Test assumption
- 3. If fails → Write `.deepflow/experiments/{domain}--{approach}--failed.md`
+ 3. If fails → Write `.deepflow/experiments/{topic}--{hypothesis}--failed.md`
  4. Adjust approach, document in task
 
  **Skip:** Well-known patterns, simple CRUD, clear docs exist
 
- ### 7. OUTPUT PLAN.md
+ ### 8. OUTPUT PLAN.md
 
  Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validation findings.
 
- ### 8. RENAME SPECS
+ ### 9. RENAME SPECS
 
  `mv specs/feature.md specs/doing-feature.md`
 
- ### 9. REPORT
+ ### 10. REPORT
 
  `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`
 
  ## Rules
+ - **Spike-first** — Generate spike task before full implementation if no `--passed.md` experiment exists
+ - **Block on spike** — Full implementation tasks MUST be blocked by spike validation
+ - **Learn from failures** — Extract "next hypothesis" from failed experiments, never repeat same approach
  - **Learn from history** — Check past experiments before proposing approaches
  - **Plan only** — Do NOT implement anything (except quick validation prototypes)
  - **Validate before commit** — Test risky assumptions with minimal experiments
@@ -139,13 +226,64 @@ Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validatio
 
  ## Agent Scaling
 
- | Agent | Base | Scale |
- |-------|------|-------|
- | Explore (search) | 10 | +1 per 20 files |
- | Reasoner (analyze) | 5 | +1 per 2 specs |
+ | Agent | Model | Base | Scale |
+ |-------|-------|------|-------|
+ | Explore (search) | haiku | 10 | +1 per 20 files |
+ | Reasoner (analyze) | opus | 5 | +1 per 2 specs |
+
+ **IMPORTANT**: Always use the `Task` tool with explicit `subagent_type` and `model` parameters. Do NOT use Glob/Grep/Read directly for codebase analysis - spawn agents instead.
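The Explore scaling row can be sketched as a formula; a sketch under assumptions only: it implements "base 10, +1 per 20 files, cap 100" literally, while the bucketed file-count table elsewhere in the doc may round these numbers differently.

```python
def explore_agent_count(file_count: int, base: int = 10,
                        per_files: int = 20, cap: int = 100) -> int:
    # Base of 10 Explore agents plus one per 20 files, capped at 100,
    # per the "+1 per 20 files" scaling row above.
    return min(base + file_count // per_files, cap)
```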
 
  ## Example
 
+ ### Spike-First (No Prior Experiments)
+
+ ```markdown
+ # Plan
+
+ ### doing-upload
+
+ - [ ] **T1** [SPIKE]: Validate streaming upload approach
+   - Type: spike
+   - Hypothesis: Streaming uploads will handle files >1GB without memory issues
+   - Method: Create minimal endpoint, upload 2GB file, measure memory
+   - Success criteria: Memory stays under 500MB during upload
+   - Time-box: 30 min
+   - Files: .deepflow/experiments/upload--streaming--active.md
+   - Blocked by: none
+
+ - [ ] **T2**: Create upload endpoint
+   - Files: src/api/upload.ts
+   - Blocked by: T1 (spike must pass)
+
+ - [ ] **T3**: Add S3 service with streaming
+   - Files: src/services/storage.ts
+   - Blocked by: T1 (spike must pass), T2
+ ```
+
+ ### Spike-First (After Failed Experiment)
+
+ ```markdown
+ # Plan
+
+ ### doing-upload
+
+ - [ ] **T1** [SPIKE]: Validate chunked upload with backpressure
+   - Type: spike
+   - Hypothesis: Adding backpressure control will prevent buffer overflow
+   - Method: Implement pause/resume on buffer threshold, test with 2GB file
+   - Success criteria: No memory spikes above 500MB
+   - Time-box: 30 min
+   - Files: .deepflow/experiments/upload--chunked-backpressure--active.md
+   - Blocked by: none
+   - Note: Previous approach failed (see upload--buffer-upload--failed.md)
+
+ - [ ] **T2**: Implement chunked upload endpoint
+   - Files: src/api/upload.ts
+   - Blocked by: T1 (spike must pass)
+ ```
+
+ ### After Spike Validates (Full Implementation)
+
  ```markdown
  # Plan
@@ -154,10 +292,10 @@ Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validatio
  - [ ] **T1**: Create upload endpoint
    - Files: src/api/upload.ts
    - Blocked by: none
+   - Note: Use streaming (validated in upload--streaming--passed.md)
 
  - [ ] **T2**: Add S3 service with streaming
    - Files: src/services/storage.ts
    - Blocked by: T1
-   - Note: Use streaming (see experiments/perf--chunked-upload--success.md)
-   - Avoid: Direct buffer upload failed for large files (experiments/perf--buffer-upload--failed.md)
+   - Avoid: Direct buffer upload failed (see upload--buffer-upload--failed.md)
  ```
@@ -20,14 +20,26 @@ Transform conversation context into a structured specification file.
 
  ## Skills & Agents
  - Skill: `gap-discovery` — Proactive requirement gap identification
- - Agent: `Explore` (haiku) — Codebase context gathering
- - Agent: `reasoner` (Opus) Synthesize findings into requirements
+
+ **Use Task tool to spawn agents:**
+ | Agent | subagent_type | model | Purpose |
+ |-------|---------------|-------|---------|
+ | Context | `Explore` | `haiku` | Codebase context gathering |
+ | Synthesizer | `reasoner` | `opus` | Synthesize findings into requirements |
 
  ## Behavior
 
  ### 1. GATHER CODEBASE CONTEXT
 
- **Spawn Explore agents** (haiku, read-only, parallel) to find:
+ **Use Task tool to spawn Explore agents in parallel:**
+ ```
+ Task tool parameters:
+ - subagent_type: "Explore"
+ - model: "haiku"
+ - run_in_background: true
+ ```
+
+ Find:
  - Related existing implementations
  - Code patterns and conventions
  - Integration points relevant to the feature
@@ -39,6 +51,34 @@ Transform conversation context into a structured specification file.
  | 20-100 | 5-8 |
  | 100+ | 10-15 |
 
+ **Explore Agent Prompt Structure:**
+ ```
+ Find: [specific question]
+ Return ONLY:
+ - File paths matching criteria
+ - One-line description per file
+ - Integration points (if asked)
+
+ DO NOT:
+ - Read or summarize spec files
+ - Make recommendations
+ - Propose solutions
+ - Generate tables or lengthy explanations
+
+ Max response: 500 tokens (configurable via .deepflow/config.yaml explore.max_tokens)
+ ```
+
+ **Explore Agent Scope Restrictions:**
+ - MUST only report factual findings:
+   - Files found
+   - Patterns/conventions observed
+   - Integration points
+ - MUST NOT:
+   - Make recommendations
+   - Propose architectures
+   - Read and summarize specs (that's orchestrator's job)
+   - Draw conclusions about what should be built
+
  ### 2. GAP CHECK
  Use the `gap-discovery` skill to analyze conversation + agent findings.
 
@@ -70,7 +110,14 @@ Max 4 questions per tool call. Wait for answers before proceeding.
 
  ### 3. SYNTHESIZE FINDINGS
 
- **Spawn `reasoner` agent** (Opus) to:
+ **Use Task tool to spawn reasoner agent:**
+ ```
+ Task tool parameters:
+ - subagent_type: "reasoner"
+ - model: "opus"
+ ```
+
+ The reasoner will:
  - Analyze codebase context from Explore agents
  - Identify constraints from existing architecture
  - Suggest requirements based on patterns found
@@ -130,10 +177,12 @@ Next: Run /df:plan to generate tasks
 
  ## Agent Scaling
 
- | Agent | Base | Purpose |
- |-------|------|---------|
- | Explore (haiku) | 3-5 | Find related code, patterns |
- | Reasoner (Opus) | 1 | Synthesize into requirements |
+ | Agent | subagent_type | model | Base | Purpose |
+ |-------|---------------|-------|------|---------|
+ | Explore | `Explore` | `haiku` | 3-5 | Find related code, patterns |
+ | Reasoner | `reasoner` | `opus` | 1 | Synthesize into requirements |
+
+ **IMPORTANT**: Always use the `Task` tool with explicit `subagent_type` and `model` parameters.
 
  ## Example
 
@@ -12,7 +12,11 @@ Check that implemented code satisfies spec requirements and acceptance criteria.
 
  ## Skills & Agents
  - Skill: `code-completeness` — Find incomplete implementations
- - Agent: `Explore` (Haiku) — Fast codebase scanning
+
+ **Use Task tool to spawn agents:**
+ | Agent | subagent_type | model | Purpose |
+ |-------|---------------|-------|---------|
+ | Scanner | `Explore` | `haiku` | Fast codebase scanning |
 
  ## Spec File States
 
@@ -87,7 +91,15 @@ Default: L1-L3 (L4 optional, can be slow)
 
  ## Agent Usage
 
- Spawn `Explore` agents (Haiku), 1-2 per spec, cap 10.
+ **Use Task tool to spawn Explore agents:**
+ ```
+ Task tool parameters:
+ - subagent_type: "Explore"
+ - model: "haiku"
+ - run_in_background: true (for parallel)
+ ```
+
+ Scale: 1-2 agents per spec, cap 10.
 
  ## Example
 
@@ -36,6 +36,9 @@ models:
    reason: opus # Complex decisions
    debug: opus # Problem solving
 
+ explore:
+   max_tokens: 500 # Controls Explore agent response length
+
  commits:
    format: "feat({spec}): {description}"
    atomic: true # One task = one commit
@@ -0,0 +1,74 @@
+ # Experiment: {hypothesis-slug}
+
+ > **Filename convention**: `{topic}--{hypothesis-slug}--{status}.md`
+ > Status: `active` | `passed` | `failed`
+
+ ## Topic
+
+ {Spec name or feature area this experiment relates to}
+
+ <!--
+ What problem or feature does this experiment address?
+ Link to relevant spec if applicable.
+ -->
+
+ ## Hypothesis
+
+ {What we believe will work and why}
+
+ <!--
+ Be specific and testable:
+ - "Using approach X will achieve Y because Z"
+ - "The bottleneck is in component A, not B"
+ - Should be falsifiable in a single experiment
+ -->
+
+ ## Method
+
+ {Minimal steps to validate the hypothesis}
+
+ <!--
+ Keep it minimal - fastest path to prove/disprove:
+ 1. Step one (e.g., "Create test file with X")
+ 2. Step two (e.g., "Run command Y")
+ 3. Step three (e.g., "Observe output Z")
+
+ Time-box: ideally under 30 minutes
+ -->
+
+ ## Result
+
+ **Status**: {pass | fail}
+
+ {Actual outcome with evidence}
+
+ <!--
+ Include concrete evidence:
+ - Error messages, output logs
+ - Metrics or measurements
+ - Screenshots if applicable
+ - What specifically happened vs. expected
+ -->
+
+ ## Conclusion
+
+ {What we learned from this experiment}
+
+ <!--
+ Answer these:
+ - Why did it pass/fail?
+ - What assumption was validated/invalidated?
+ - If failed: What's the next hypothesis? (don't repeat same approach)
+ - If passed: What's ready for implementation?
+ -->
+
+ ---
+
+ <!--
+ Experiment Guidelines:
+ - One hypothesis per experiment
+ - Failed experiments are valuable - they inform the next hypothesis
+ - Never repeat a failed approach without a new insight
+ - Keep experiments small and fast (under 30 min)
+ - Link related experiments in conclusions
+ -->
@@ -29,6 +29,22 @@ Generated: {timestamp}
    - Files: {files}
    - Blocked by: T1
 
+ ### Spike Task Example
+
+ When no experiments exist to validate an approach, start with a minimal validation spike:
+
+ - [ ] **T1** (spike): Validate [hypothesis] approach
+   - Files: [minimal files needed]
+   - Blocked by: none
+   - Blocks: T2, T3, T4 (full implementation)
+   - Description: Minimal test to verify [approach] works before full implementation
+
+ - [ ] **T2**: Implement [feature] based on spike results
+   - Files: [implementation files]
+   - Blocked by: T1 (spike)
+
+ Keep spikes small: one or two tasks that validate the approach before committing to full implementation.
+
  ---
 
@@ -38,4 +54,6 @@ Plan Guidelines:
  - Blocked by references task IDs (T1, T2, etc.)
  - Mark complete with [x] and commit hash
  - Example completed: [x] **T1**: Create API ✓ (abc1234)
+ - Spike tasks: If no experiments validate the approach, the first task should be a minimal validation spike
+ - Spike tasks block full implementation tasks until the hypothesis is validated
  -->