deepflow 0.1.87 → 0.1.89

@@ -7,206 +7,67 @@ description: Capture decisions that emerged during free conversations outside of
 
 ## Orchestrator Role
 
- You scan prior conversation context for candidate decisions, present them for user confirmation, and persist confirmed decisions to `.deepflow/decisions.md`.
+ Scan conversation for candidate decisions, present for user confirmation, persist to `.deepflow/decisions.md`.
 
- **NEVER:** Spawn agents, use Task tool, use Glob/Grep on source code, run git, use TaskOutput, use EnterPlanMode, use ExitPlanMode
+ **NEVER:** Spawn agents, use Task tool, use Glob/Grep on source code, run git, use TaskOutput, EnterPlanMode, ExitPlanMode
 
- **ONLY:** Read `.deepflow/decisions.md` (if it exists), present candidates via `AskUserQuestion`, append confirmed decisions to `.deepflow/decisions.md`
-
- ---
-
- ## Purpose
-
- Capture decisions that emerged during free conversations outside of deepflow commands. Surfaces candidate decisions from the current conversation, lets the user confirm or discard each, and persists confirmed ones to the shared decisions log.
-
- ## Usage
-
- ```
- /df:note
- ```
-
- No arguments required. Operates on the current conversation context.
-
- ---
+ **ONLY:** Read `.deepflow/decisions.md`, present candidates via `AskUserQuestion`, append confirmed decisions
 
 ## Behavior
 
 ### 1. EXTRACT CANDIDATES
 
- Scan the prior conversation messages for candidate decisions. A decision is any resolved choice, adopted approach, or stated assumption that affects how the work is done. Look for:
+ Scan prior messages for resolved choices, adopted approaches, or stated assumptions. Look for:
+ - **Approaches chosen**: "we'll use X instead of Y"
+ - **Provisional choices**: "for now we'll use X"
+ - **Stated assumptions**: "assuming X is true"
+ - **Constraints accepted**: "X is out of scope"
+ - **Naming/structural choices**: "we'll call it X", "X goes in the Y layer"
 
- - **Approaches chosen**: "we'll use X instead of Y", "let's go with X"
- - **Provisional choices**: "for now we'll use X", "assuming X until we know more"
- - **Stated assumptions**: "assuming X is true", "treating X as given"
- - **Constraints accepted**: "we won't do X", "X is out of scope"
- - **Naming or structural choices**: "we'll call it X", "X goes in the Y layer"
+ Extract **at most 4 candidates**. For each, determine:
 
- Extract **at most 4 candidates** from the conversation. Prioritize the most consequential or recent ones.
+ | Field | Value |
+ |-------|-------|
+ | Tag | `[APPROACH]` (deliberate choice), `[PROVISIONAL]` (revisit later), or `[ASSUMPTION]` (unvalidated) |
+ | Decision | One concise line describing the choice |
+ | Rationale | One sentence explaining why |
 
- For each candidate, determine:
- - **Tag**: one of `[APPROACH]`, `[PROVISIONAL]`, or `[ASSUMPTION]`
-   - `[APPROACH]` — a deliberate design or implementation choice
-   - `[PROVISIONAL]` — works for now, expected to revisit
-   - `[ASSUMPTION]` — treating something as true without full validation
- - **Decision text**: one concise line describing the choice
- - **Rationale**: one sentence explaining why this was chosen
-
- If fewer than 2 clear candidates are found, say so briefly and exit without calling `AskUserQuestion`.
+ If <2 clear candidates found, say so and exit.
 
 ### 2. CHECK FOR CONTRADICTIONS
 
- Read `.deepflow/decisions.md` if it exists. For each candidate, check whether it contradicts a prior entry in the file.
-
- If a contradiction is found:
- - Keep the prior entry — never delete or modify it
- - Amend the candidate's rationale to reference the prior decision: `was "X", now "Y" because Z`
+ Read `.deepflow/decisions.md` if it exists. If a candidate contradicts a prior entry: keep prior entry unchanged, amend candidate rationale to `was "X", now "Y" because Z`.
 
 ### 3. PRESENT VIA AskUserQuestion
 
- Present candidates as a multi-select question with at most 4 options (tool limit).
-
- ```json
- {
-   "questions": [
-     {
-       "question": "These decisions were detected in your conversation. Which should be saved to .deepflow/decisions.md?",
-       "header": "Save notes?",
-       "multiSelect": true,
-       "options": [
-         {
-           "label": "[APPROACH] <decision text>",
-           "description": "<rationale>"
-         },
-         {
-           "label": "[PROVISIONAL] <decision text>",
-           "description": "<rationale>"
-         }
-       ]
-     }
-   ]
- }
- ```
-
- Each option's `label` is the tag + decision text. Each `description` is the rationale (one sentence).
+ Single multi-select call. Each option: `label` = tag + decision text, `description` = rationale.
 
 ### 4. APPEND CONFIRMED DECISIONS
 
- For each option the user selects:
-
- 1. If `.deepflow/decisions.md` does not exist, create it with a blank header:
-    ```
-    # Decisions
-    ```
-
- 2. Append a new dated section using today's date in `YYYY-MM-DD` format and source `note`:
-
-    ```markdown
-    ### 2026-02-22 — note
-    - [APPROACH] Use event sourcing over CRUD — append-only log matches audit requirements
-    - [PROVISIONAL] Batch size = 50 — works for 4-game dataset, revisit at scale
-    ```
-
- 3. If multiple decisions are confirmed in one invocation, group them under a single dated section.
-
- 4. Never modify or delete any prior entries.
+ For each selected option:
+ 1. Create `.deepflow/decisions.md` with `# Decisions` header if absent
+ 2. Append a dated section: `### YYYY-MM-DD — note`
+ 3. Group all confirmed decisions under one section: `- [TAG] Decision text — rationale`
+ 4. Never modify or delete prior entries
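The append flow described in §4 of the new version can be sketched as follows. This is an illustrative sketch only, not code shipped in the package; the function name `append_decisions` and the tuple shape are assumptions:

```python
from datetime import date
from pathlib import Path

def append_decisions(confirmed, path=".deepflow/decisions.md"):
    """Append confirmed decisions under one dated `note` section.

    confirmed: iterable of (tag, decision_text, rationale) tuples.
    Prior entries are never modified — the file is append-only.
    """
    p = Path(path)
    if not p.exists():
        p.write_text("# Decisions\n")  # first use initializes the file
    lines = [f"\n### {date.today().isoformat()} — note\n"]
    for tag, text, rationale in confirmed:
        lines.append(f"- [{tag}] {text} — {rationale}\n")
    with p.open("a") as f:  # append-only: earlier sections stay untouched
        f.writelines(lines)
```

All decisions confirmed in one invocation land under a single dated section, matching rule 3 above.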
 
 ### 5. CONFIRM
 
- After writing, report to the user:
-
- ```
- Saved N decision(s) to .deepflow/decisions.md
- ```
-
- If the user selected nothing, respond:
-
- ```
- No decisions saved.
- ```
-
- ---
-
- ## Decision Format
-
- ```
- ### YYYY-MM-DD — note
- - [TAG] Decision text — rationale
- ```
+ Report: `Saved N decision(s) to .deepflow/decisions.md` or `No decisions saved.`
 
- **Tags:**
- - `[APPROACH]` — deliberate design or implementation choice
- - `[PROVISIONAL]` — works for now, will revisit at scale or with more information
- - `[ASSUMPTION]` — treating something as true without full confirmation
- - `[DEBT]` — needs revisiting; produced only by `/df:consolidate`, never manually assigned
+ ## Decision Tags
 
- **Contradiction handling:** Never delete prior entries. When a new decision contradicts an older one, include a reference in the rationale: `was "X", now "Y" because Z`.
-
- ---
+ | Tag | Meaning | Source |
+ |-----|---------|--------|
+ | `[APPROACH]` | Firm decision | /df:note, auto-extraction |
+ | `[PROVISIONAL]` | Revisit later | /df:note, auto-extraction |
+ | `[ASSUMPTION]` | Unverified | /df:note, auto-extraction |
+ | `[DEBT]` | Needs revisiting | /df:consolidate only, never manually assigned |
 
 ## Rules
 
- - **Maximum 4 candidates** per invocation (tool limit for AskUserQuestion options)
- - **multiSelect: true** — user can confirm any subset, including all or none
- - **Never invent decisions** — only extract what was genuinely discussed and resolved in the conversation
- - **Never modify prior entries** in `.deepflow/decisions.md`
- - **Create the file if absent** — first use initializes it with `# Decisions` header
- - **Source is always `note`** — this distinguishes from decisions captured by other df commands
- - **Date is today's date** — use the current date in `YYYY-MM-DD` format
- - **One AskUserQuestion call** — present all candidates in a single call, not sequentially
-
- ---
-
- ## Example
-
- ```
- USER: /df:note
-
- CLAUDE: Let me scan our conversation for candidate decisions.
-
- [Reads .deepflow/decisions.md — finds prior entry:
-  "### 2026-02-10 — spec
-  - [APPROACH] Use PostgreSQL for primary storage — team already knows it"]
-
- [Extracts from conversation:
-  1. [APPROACH] Use Redis for session state — avoids DB round-trips on hot path
-  2. [PROVISIONAL] Skip auth for internal APIs — only 2 internal services for now
-  3. [ASSUMPTION] Peak load is 500 req/s — based on current traffic, unvalidated
- ]
-
- [No contradictions found with prior entries]
-
- [Uses AskUserQuestion]
- {
-   "questions": [
-     {
-       "question": "These decisions were detected in your conversation. Which should be saved to .deepflow/decisions.md?",
-       "header": "Save notes?",
-       "multiSelect": true,
-       "options": [
-         {
-           "label": "[APPROACH] Use Redis for session state",
-           "description": "Avoids DB round-trips on hot path"
-         },
-         {
-           "label": "[PROVISIONAL] Skip auth for internal APIs",
-           "description": "Only 2 internal services for now, revisit when we open to more consumers"
-         },
-         {
-           "label": "[ASSUMPTION] Peak load is 500 req/s",
-           "description": "Based on current traffic patterns, not load-tested"
-         }
-       ]
-     }
-   ]
- }
-
- USER: [Selects: Use Redis for session state, Peak load is 500 req/s]
-
- CLAUDE: [Appends to .deepflow/decisions.md:]
-
- ### 2026-02-22 — note
- - [APPROACH] Use Redis for session state — avoids DB round-trips on hot path
- - [ASSUMPTION] Peak load is 500 req/s — based on current traffic patterns, not load-tested
-
- Saved 2 decision(s) to .deepflow/decisions.md
- ```
+ - Max 4 candidates per invocation (AskUserQuestion tool limit)
+ - multiSelect: true — user confirms any subset
+ - Never invent decisions — only extract what was discussed and resolved
+ - Never modify prior entries in `.deepflow/decisions.md`
+ - Source is always `note`; date is today (YYYY-MM-DD)
+ - One AskUserQuestion call — all candidates in a single call
@@ -5,7 +5,6 @@ description: Compare specs against codebase and past experiments, generate prior
 
 # /df:plan — Generate Task Plan from Specs
 
- ## Purpose
 Compare specs against codebase and past experiments. Generate prioritized tasks.
 
 **NEVER:** use EnterPlanMode, use ExitPlanMode — this command IS the planning phase
@@ -37,22 +36,35 @@ Load: specs/*.md (exclude doing-*/done-*), PLAN.md (if exists), .deepflow/config
 Determine source_dir from config or default to src/
 ```
 
- Shell injection (use output directly — no manual file reads needed):
+ Shell injection:
 - `` !`ls specs/*.md 2>/dev/null || echo 'NOT_FOUND'` ``
 - `` !`cat PLAN.md 2>/dev/null || echo 'NOT_FOUND'` ``
 
- Run `validateSpec` on each spec. Hard failures → skip + error. Advisory → include in output.
+ Run `validateSpec` on each spec. Hard failures → skip + error. Advisory → include.
+ Record each spec's computed layer (gates task generation per §1.5).
 No new specs → report counts, suggest `/df:execute`.
 
- ### 2. CHECK PAST EXPERIMENTS (SPIKE-FIRST)
+ ### 1.5. LAYER-GATED TASK GENERATION
 
- **CRITICAL**: Check experiments BEFORE generating any tasks.
+ | Layer | Sections present | Allowed task types |
+ |-------|------------------|--------------------|
+ | L0 | Objective | Spikes only |
+ | L1 | + Requirements | Spikes only (better targeted) |
+ | L2 | + Acceptance Criteria | Spikes + Implementation |
+ | L3 | + Constraints, Out of Scope, Technical Notes | Spikes + Implementation + Impact analysis + Optimize |
 
- ```
- Glob .deepflow/experiments/{topic}--*
- ```
+ **Rules:**
+ - L0–L1: ONLY spike tasks. Implementation blocked until spec deepens to L2+.
+ - L2: spikes + implementation, skip impact analysis.
+ - L3: full planning — spikes, implementation, impact analysis, optimize.
+ - Spike results deepen specs: findings incorporated back via user or `/df:spec`, raising layer.
+ - Report layer: `"Spec {name}: L{N} ({label}) — {task_types_generated}"`
+
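The new layer table and gating rules can be sketched as follows. Illustrative only — the function names and the `sections`-set representation are assumptions, not the package's implementation:

```python
# Cumulative layers: each step also requires every earlier section.
LAYERS = [
    ("L1", {"Requirements"}),
    ("L2", {"Acceptance Criteria"}),
    ("L3", {"Constraints", "Out of Scope", "Technical Notes"}),
]

def spec_layer(sections):
    """Return the highest layer whose required sections are all present."""
    layer = "L0" if "Objective" in sections else None
    for name, required in LAYERS:
        if layer is not None and required <= sections:
            layer = name
        else:
            break  # cumulative: stop at the first missing tier
    return layer

def allowed_tasks(layer):
    """Task types the planner may generate for a spec at this layer."""
    return {
        "L0": {"spike"},
        "L1": {"spike"},
        "L2": {"spike", "implementation"},
        "L3": {"spike", "implementation", "impact analysis", "optimize"},
    }[layer]
```

For example, a spec with only Objective and Requirements computes as L1 and yields spike tasks only.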
+ ### 2. CHECK PAST EXPERIMENTS (SPIKE-FIRST)
+
+ **CRITICAL**: Check experiments BEFORE generating tasks.
 
- File naming: `{topic}--{hypothesis}--{status}.md` (active/passed/failed)
+ Glob `.deepflow/experiments/{topic}--*`. File naming: `{topic}--{hypothesis}--{status}.md`
 
 | Result | Action |
 |--------|--------|
@@ -61,140 +73,79 @@ File naming: `{topic}--{hypothesis}--{status}.md` (active/passed/failed)
 | `--active.md` | Wait for completion |
 | No matches | New topic, generate initial spike |
 
- Full implementation tasks BLOCKED until spike validates. See `templates/experiment-template.md`.
+ Implementation tasks BLOCKED until spike validates.
 
 ### 3. DETECT PROJECT CONTEXT
 
 Identify code style, patterns (error handling, API structure), integration points. Include in task descriptions.
 
- ### 4. IMPACT ANALYSIS (per planned file)
+ ### 4. IMPACT ANALYSIS (L3 specs only)
 
- For each file in a task's "Files:" list, find the full blast radius.
+ Skip for L0–L2 specs. For each file in a task's `Files:` list, find blast radius.
 
- **Search for (prefer LSP, fallback to grep):**
+ **Search (prefer LSP, fallback grep):**
+ 1. **Callers:** LSP `findReferences`/`incomingCalls` on exports being changed. Annotate WHY impacted. Fallback: grep.
+ 2. **Duplicates:** Similar logic files. Classify: `[active]` → consolidate, `[dead]` → DELETE.
+ 3. **Data flow:** LSP `outgoingCalls` to trace consumers.
 
- 1. **Callers:** Use LSP `findReferences` / `incomingCalls` on each exported function/type being changed. Annotate each caller with WHY it's impacted (e.g. "imports validateToken which this task changes"). Fallback: `grep -r "{exported_function}" --include="*.{ext}" -l`
- 2. **Duplicates:** Files with similar logic (same function name, same transformation). Classify:
-    - `[active]` — used in production → must consolidate
-    - `[dead]` — bypassed/unreachable → must delete
- 3. **Data flow:** If file produces/transforms data, use LSP `outgoingCalls` to trace consumers. Fallback: grep across languages
-
- **Embed as `Impact:` block in each task:**
- ```markdown
- - [ ] **T2**: Add new features to YAML export
-   - Files: src/utils/buildConfigData.ts
-   - Impact:
-     - Callers: src/routes/index.ts:12, src/api/handler.ts:45
-     - Duplicates:
-       - src/components/YamlViewer.tsx:19 (own generateYAML) [active — consolidate]
-       - backend/yaml_gen.go (generateYAMLFromConfig) [dead — DELETE]
-     - Data flow: buildConfigData → YamlViewer, SimControls, RoleplayPage
-   - Blocked by: T1
- ```
-
- Files outside original "Files:" → add with `(impact — verify/update)`.
- Skip for spike tasks.
+ Embed as `Impact:` block in each task. Files outside original `Files:` → add with `(impact — verify/update)`. Skip for spikes.
 
 ### 4.5. TARGETED EXPLORATION
 
- Follow `templates/explore-agent.md` for spawn rules and scope. Explore agents cover **what LSP did not reveal**: conventions, dead code, implicit patterns.
+ Follow `templates/explore-agent.md` for spawn rules. 3-5 agents cover post-LSP gaps: conventions, dead code, implicit patterns.
 
- | Finding Type | Agents |
- |--------------|--------|
- | Post-LSP gaps | 3-5 |
-
- Use `code-completeness` skill to search for: implementations matching spec requirements, TODOs/FIXMEs/HACKs, stubs, skipped tests.
+ Use `code-completeness` skill: implementations matching spec, TODOs/FIXMEs/HACKs, stubs, skipped tests.
 
 ### 4.6. CROSS-TASK FILE CONFLICT DETECTION
 
- After all tasks have their `Files:` lists, detect overlaps that require sequential execution.
-
- **Algorithm:**
- 1. Build a map: `file → [task IDs that list it]`
- 2. For each file with >1 task: add `Blocked by` edge from later task → earlier task (by task number)
- 3. If a dependency already exists (direct or transitive), skip (no redundant edges)
+ After all tasks have `Files:` lists, detect overlaps requiring sequential execution.
 
- **Example:**
- ```
- T1: Files: config.go, feature.go — Blocked by: none
- T3: Files: config.go — Blocked by: none
- T5: Files: config.go — Blocked by: none
- ```
- After conflict detection:
- ```
- T1: Blocked by: none
- T3: Blocked by: T1 (file conflict: config.go)
- T5: Blocked by: T3 (file conflict: config.go)
- ```
+ 1. Build map: `file → [task IDs]`
+ 2. For files with >1 task: add `Blocked by` from later → earlier task
+ 3. Skip if dependency already exists (direct or transitive)
 
- **Rules:**
- - Only add the minimum edges needed (chain, not full mesh — T5 blocks on T3, not T1+T3)
- - Append `(file conflict: (unknown))` to the Blocked by reason for traceability
- - If a logical dependency already covers the ordering, don't add a redundant conflict edge
- - Cross-spec conflicts: tasks from different specs sharing files get the same treatment
+ **Rules:** Chain only (T5→T3, not T5→T1+T3). Append `(file conflict: (unknown))`. Logical deps override conflict edges. Cross-spec conflicts get same treatment.
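The chaining algorithm in §4.6 can be sketched in Python. Illustrative only — function name and data shapes are assumptions, and the transitive-dependency check from rule 3 is omitted for brevity:

```python
def add_conflict_edges(tasks):
    """Chain tasks that touch the same file: each later task blocks on the
    nearest earlier one (T5 → T3, not T5 → T1+T3).

    tasks: {task_id: {"files": [...], "blocked_by": set()}}
    """
    # 1. Build map: file → [task IDs], in task-number order
    file_map = {}
    for tid in sorted(tasks, key=lambda t: int(t[1:])):
        for f in tasks[tid]["files"]:
            file_map.setdefault(f, []).append(tid)
    # 2. For each shared file, add edges along the chain only
    for tids in file_map.values():
        for prev, nxt in zip(tids, tids[1:]):
            if prev not in tasks[nxt]["blocked_by"]:  # skip existing direct edge
                tasks[nxt]["blocked_by"].add(prev)
    return tasks
```

Running it on the T1/T3/T5 `config.go` case yields T3 blocked by T1 and T5 blocked by T3, never a full mesh.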
 
 ### 5. COMPARE & PRIORITIZE
 
- Spawn `Task(subagent_type="reasoner", model="opus")`. Map each requirement to DONE / PARTIAL / MISSING / CONFLICT. Check REQ-AC alignment. Flag spec gaps.
+ Spawn `Task(subagent_type="reasoner", model="opus")`. Map each requirement to DONE/PARTIAL/MISSING/CONFLICT. Check REQ-AC alignment. Flag spec gaps.
 
 Priority: Dependencies → Impact → Risk
 
 #### Metric AC Detection
 
- While comparing requirements, scan each spec AC for the pattern `{metric} {operator} {number}[unit]`:
+ Scan ACs for pattern `{metric} {operator} {number}[unit]` (e.g., `coverage > 85%`, `latency < 200ms`). Operators: `>`, `<`, `>=`, `<=`, `==`.
 
- - **Pattern examples**: `coverage > 85%`, `latency < 200ms`, `p99_latency <= 150ms`, `bundle_size < 500kb`
- - **Operators**: `>`, `<`, `>=`, `<=`, `==`
- - **Number**: float or integer, optional unit suffix (%, ms, kb, mb, s, etc.)
- - **On match**: flag the AC as a **metric AC** and generate an `Optimize:` task (see section 6.5)
- - **Non-match**: treat as standard functional AC → standard implementation task
- - **Ambiguous ACs** (qualitative terms like "fast", "small", "improved"): flag as spec gap, request numeric threshold before planning
+ - **Match:** flag as metric AC → generate `Optimize:` task (§6.5)
+ - **Non-match:** standard implementation task
+ - **Ambiguous** ("fast", "small"): flag as spec gap, request numeric threshold
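The `{metric} {operator} {number}[unit]` scan can be sketched with a regex. This is an illustrative sketch, not the package's actual implementation — the regex, names, and unit list are assumptions:

```python
import re

# Matches e.g. "coverage > 85%", "latency < 200ms", "p99_latency <= 150ms".
METRIC_AC = re.compile(
    r"(?P<metric>\w+)\s*(?P<op>>=|<=|==|>|<)\s*"
    r"(?P<value>\d+(?:\.\d+)?)(?P<unit>%|ms|kb|mb|s)?"
)

def classify_ac(ac: str):
    """Return ("metric", fields) for a metric AC, else ("functional", None)."""
    m = METRIC_AC.search(ac)
    if not m:
        return ("functional", None)  # standard implementation task
    # `==` maps to "higher" by convention, per the Optimize field rules
    direction = "lower" if m["op"] in ("<", "<=") else "higher"
    return ("metric", {
        "metric": m["metric"],
        "target": float(m["value"]),
        "direction": direction,
        "unit": m["unit"],
    })
```

Note the alternation order (`>=` before `>`) so two-character operators win; qualitative ACs like "fast" simply fail to match and fall through to the functional path.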
 
 ### 5.5. CLASSIFY MODEL + EFFORT PER TASK
 
- For each task, assign `Model:` and `Effort:` based on the routing matrix:
-
 #### Routing matrix
 
- | Task type | Model | Effort | Rationale |
- |-----------|-------|--------|-----------|
- | Bootstrap (scaffold, config, rename) | `haiku` | `low` | Mechanical, pattern-following, zero ambiguity |
- | browse-fetch (doc retrieval) | `haiku` | `low` | Just fetching and extracting, no reasoning |
- | Single-file simple addition | `haiku` | `high` | Small scope but needs to get it right |
- | Multi-file with clear specs | `sonnet` | `medium` | Standard work, specs remove need for deep thinking |
- | Bug fix (clear repro) | `sonnet` | `medium` | Diagnosis done, just apply fix |
- | Bug fix (unclear cause) | `sonnet` | `high` | Needs reasoning to find root cause |
- | Spike / validation | `sonnet` | `high` | Scoped but needs reasoning to validate hypothesis |
- | Optimize (metric AC) | `opus` | `high` | Multi-cycle, ambiguous — best strategy changes per iteration |
- | Feature work (well-specced) | `sonnet` | `medium` | Clear ACs reduce thinking overhead |
- | Feature work (ambiguous ACs) | `opus` | `medium` | Needs intelligence but effort can be moderate with good specs |
- | Refactor (>5 files, many callers) | `opus` | `medium` | Blast radius needs intelligence, patterns are repetitive |
- | Architecture change | `opus` | `high` | High complexity + high ambiguity |
- | Unfamiliar API integration | `opus` | `high` | Needs deep reasoning about unknown patterns |
- | Retried after revert | _(raise one level)_ | `high` | Prior failure means harder than expected |
-
- #### Decision inputs
-
- 1. **File count** — 1 file → haiku/sonnet, 2-5 → sonnet, >5 → sonnet/opus
- 2. **Impact blast radius** — many callers/duplicates → raise model
- 3. **Spec clarity** — clear ACs → lower effort, ambiguous → raise effort
- 4. **Type** — spikes → `sonnet high`, bootstrap → `haiku low`
- 5. **Has prior failures** — raise model one level AND set effort to `high`
- 6. **Repetitiveness** — repetitive pattern across files → lower effort even at higher model
-
- #### Effort economics
-
- Effort controls ALL token spend (text, tool calls, thinking). Lower effort = fewer tool calls, less preamble, shorter reasoning.
-
- - `low` → ~60-70% token reduction vs high. Use when task is mechanical.
- - `medium` → ~30-40% token reduction. Use when specs are clear.
- - `high` → full spend (default). Use when ambiguity or risk is high.
-
- Add `Model: haiku|sonnet|opus` and `Effort: low|medium|high` to each task block. Defaults: `Model: sonnet`, `Effort: medium`.
+ | Task type | Model | Effort |
+ |-----------|-------|--------|
+ | Bootstrap (scaffold, config, rename) | `haiku` | `low` |
+ | browse-fetch (doc retrieval) | `haiku` | `low` |
+ | Single-file simple addition | `haiku` | `high` |
+ | Multi-file with clear specs | `sonnet` | `medium` |
+ | Bug fix (clear repro) | `sonnet` | `medium` |
+ | Bug fix (unclear cause) | `sonnet` | `high` |
+ | Spike / validation | `sonnet` | `high` |
+ | Optimize (metric AC) | `opus` | `high` |
+ | Feature work (well-specced) | `sonnet` | `medium` |
+ | Feature work (ambiguous ACs) | `opus` | `medium` |
+ | Refactor (>5 files, many callers) | `opus` | `medium` |
+ | Architecture change | `opus` | `high` |
+ | Unfamiliar API integration | `opus` | `high` |
+ | Retried after revert | _(raise one level)_ | `high` |
+
+ Add `Model:` and `Effort:` to each task. Defaults: `sonnet` / `medium`.
 
 ### 6. GENERATE SPIKE TASKS (IF NEEDED)
 
- **Spike Task Format:**
+ **Format:**
 ```markdown
 - [ ] **T1** [SPIKE]: Validate {hypothesis}
   - Type: spike
@@ -206,12 +157,10 @@ Add `Model: haiku|sonnet|opus` and `Effort: low|medium|high` to each task block.
   - Blocked by: none
 ```
 
- All implementation tasks MUST `Blocked by: T{spike}`. Spike fails → `--failed.md`, no implementation tasks.
+ All implementation tasks MUST `Blocked by: T{spike}`. Spike fails → `--failed.md`, no implementation.
 
 #### Probe Diversity
 
- When generating multiple spikes for the same problem:
-
 | Requirement | Rule |
 |-------------|------|
 | Contradictory | ≥2 probes with opposing approaches |
@@ -221,38 +170,15 @@ When generating multiple spikes for the same problem:
 
 Before output, verify: ≥2 opposing probes, ≥1 naive, all independent.
 
- **Example — caching problem, 3 diverse probes:**
- ```markdown
- - [ ] **T1** [SPIKE]: Validate in-memory LRU cache
-   - Role: Contradictory-A (in-process)
-   - Hypothesis: In-memory LRU reduces DB queries by ≥80%
-   - Method: LRU with 1000-item cap, load test
-   - Success criteria: DB queries drop ≥80% under 100 concurrent users
-
- - [ ] **T2** [SPIKE]: Validate Redis distributed cache
-   - Role: Contradictory-B (external, opposing T1)
-   - Hypothesis: Redis scales across multiple instances
-   - Method: Redis client, cache top 10 queries, same load test
-   - Success criteria: DB queries drop ≥80%, works across 2 instances
-
- - [ ] **T3** [SPIKE]: Validate query optimization without cache
-   - Role: Naive (no prior justification — tests if caching is even necessary)
-   - Hypothesis: Indexes + query batching alone may suffice
-   - Method: Add indexes, batch N+1 queries, same load test — no cache
-   - Success criteria: DB queries drop ≥80% with zero cache infrastructure
- ```
-
 ### 6.5. GENERATE OPTIMIZE TASKS (FROM METRIC ACs)
 
- For each metric AC detected in section 5, generate an `Optimize:` task using this format:
-
- **Optimize Task Format:**
+ **Format:**
 ```markdown
 - [ ] **T{n}** [OPTIMIZE]: Improve {metric_name} to {target}
   - Type: optimize
-   - Files: {primary files likely to affect the metric}
+   - Files: {primary files affecting metric}
   - Optimize:
-     metric: "{shell command that outputs a single number}"
+     metric: "{shell command outputting single number}"
     target: {number}
     direction: higher|lower
     max_cycles: {number, default 20}
@@ -262,95 +188,39 @@ For each metric AC detected in section 5, generate an `Optimize:` task using thi
     regression_threshold: 5%
   - Model: opus
   - Effort: high
-   - Blocked by: {spike T{n} if applicable, else none}
+   - Blocked by: {spike if applicable, else none}
 ```
 
- **Field rules:**
- - `metric`: a shell command returning a single scalar float/integer (e.g., `npx jest --coverage --json | jq '.coverageMap | .. | .pct? | numbers' | awk '{sum+=$1;n++} END{print sum/n}'`). Must be deterministic and side-effect free.
- - `target`: the numeric threshold extracted from the AC (strip unit suffix for the value; note unit in task description)
- - `direction`: `higher` if operator is `>` or `>=`; `lower` if `<` or `<=`; `higher` by convention for `==`
- - `max_cycles`: from spec if stated; default 20
- - `secondary_metrics`: other metrics from the same spec that could regress (e.g., build time, bundle size, test count). Omit if none.
-
- **Model/Effort**: always `opus` / `high` (see routing matrix).
-
- **Blocking**: if a spike exists for the same area, block the optimize task on the spike passing.
+ **Field rules:** `metric` must be deterministic, side-effect free, return single scalar. `direction`: higher for `>`/`>=`, lower for `<`/`<=`, higher for `==`. `max_cycles`: from spec or default 20. Always `opus`/`high`. Block on spike if one exists.
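The `metric`/`target`/`direction` contract above can be sketched as a check loop would use it. A minimal sketch only — the function name is an assumption, cycle control and secondary-metric regression checks are not shown, and meeting the target exactly is counted as met:

```python
import subprocess

def metric_met(cmd: str, target: float, direction: str) -> bool:
    """Run a metric command that prints a single scalar and compare to target.
    cmd must be deterministic and side-effect free, per the field rules."""
    out = subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout
    value = float(out.strip())  # single scalar on stdout
    return value >= target if direction == "higher" else value <= target
```

For example, `metric_met("echo 90", 85, "higher")` models a coverage command reporting 90 against a `coverage > 85%` AC.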
 
 ### 7. VALIDATE HYPOTHESES
 
- Unfamiliar APIs or performance-critical → prototype in scratchpad. Fails → write `--failed.md`. Skip for known patterns.
+ Unfamiliar APIs or performance-critical → prototype in scratchpad. Fails → `--failed.md`. Skip for known patterns.
 
 ### 8. CLEANUP PLAN.md
 
- Prune stale sections: remove `done-*` sections and orphaned headers. Recalculate Summary table. Empty → recreate fresh.
+ Prune stale `done-*` sections and orphaned headers. Recalculate Summary. Empty → recreate fresh.
 
 ### 9. OUTPUT & RENAME
 
 Append tasks grouped by `### doing-{spec-name}`. Rename `specs/feature.md` → `specs/doing-feature.md`.
 
- Report: `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`
+ Report:
+ ```
+ ✓ Plan generated — {n} specs, {n} tasks. Run /df:execute
+
+ Spec layers:
+ {name}: L{N} ({label}) — {n} spikes{, {n} impl tasks if L2+}
+ ```
+
+ If any L0–L1 spec: `ℹ L0–L1 specs generate spikes only. Deepen with /df:spec {name} to unlock implementation.`
 
 ## Rules
+ - **Layer-gated** — L0–L1 → spikes only; L2+ → implementation; L3 → full planning
 - **Spike-first** — No `--passed.md` → spike before implementation
- - **Block on spike** — Implementation tasks blocked until spike validates
+ - **Block on spike** — Implementation blocked until spike validates
 - **Learn from failures** — Extract next hypothesis, never repeat approach
 - **Plan only** — Do NOT implement (except quick validation prototypes)
 - **One task = one logical unit** — Atomic, committable
 - Prefer existing utilities over new code; flag spec gaps
-
- ## Agent Scaling
-
- | Agent | Model | Base | Scale |
- |-------|-------|------|-------|
- | Explore | haiku | 3-5 | none |
- | Reasoner | opus | 5 | +1 per 2 specs |
-
- Always use `Task` tool with explicit `subagent_type` and `model`.
-
- ## Example
-
- ```markdown
- ### doing-upload
-
- - [ ] **T1** [SPIKE]: Validate streaming upload approach
-   - Type: spike
-   - Hypothesis: Streaming uploads handle >1GB without memory issues
-   - Success criteria: Memory <500MB during 2GB upload
-   - Files: .deepflow/experiments/upload--streaming--active.md
-   - Blocked by: none
-
- - [ ] **T2**: Create upload endpoint
-   - Files: src/api/upload.ts
-   - Model: sonnet
-   - Impact:
-     - Callers: src/routes/index.ts:5
-     - Duplicates: backend/legacy-upload.go [dead — DELETE]
-   - Blocked by: T1
-
- - [ ] **T3**: Add S3 service with streaming
-   - Files: src/services/storage.ts
-   - Model: opus
-   - Blocked by: T1, T2
- ```
-
- **Optimize task example** (from spec AC: `coverage > 85%`):
-
- ```markdown
- ### doing-quality
-
- - [ ] **T1** [OPTIMIZE]: Improve test coverage to >85%
-   - Type: optimize
-   - Files: src/
-   - Optimize:
-     metric: "npx jest --coverage --json 2>/dev/null | jq '[.. | .pct? | numbers] | add / length'"
-     target: 85
-     direction: higher
-     max_cycles: 20
-     secondary_metrics:
-       - metric: "npx jest --json 2>/dev/null | jq '.testResults | length'"
-         name: test_count
-         regression_threshold: 5%
-   - Model: opus
-   - Effort: high
-   - Blocked by: none
- ```
+ - Always use `Task` tool with explicit `subagent_type` and `model`