deepflow 0.1.87 → 0.1.89

@@ -7,206 +7,67 @@ description: Capture decisions that emerged during free conversations outside of
 
 ## Orchestrator Role
 
- You scan prior conversation context for candidate decisions, present them for user confirmation, and persist confirmed decisions to `.deepflow/decisions.md`.
+ Scan conversation for candidate decisions, present for user confirmation, persist to `.deepflow/decisions.md`.
 
- **NEVER:** Spawn agents, use Task tool, use Glob/Grep on source code, run git, use TaskOutput, use EnterPlanMode, use ExitPlanMode
+ **NEVER:** Spawn agents, use Task tool, use Glob/Grep on source code, run git, use TaskOutput, EnterPlanMode, ExitPlanMode
 
- **ONLY:** Read `.deepflow/decisions.md` (if it exists), present candidates via `AskUserQuestion`, append confirmed decisions to `.deepflow/decisions.md`
-
- ---
-
- ## Purpose
-
- Capture decisions that emerged during free conversations outside of deepflow commands. Surfaces candidate decisions from the current conversation, lets the user confirm or discard each, and persists confirmed ones to the shared decisions log.
-
- ## Usage
-
- ```
- /df:note
- ```
-
- No arguments required. Operates on the current conversation context.
-
- ---
+ **ONLY:** Read `.deepflow/decisions.md`, present candidates via `AskUserQuestion`, append confirmed decisions
 
 ## Behavior
 
 ### 1. EXTRACT CANDIDATES
 
- Scan the prior conversation messages for candidate decisions. A decision is any resolved choice, adopted approach, or stated assumption that affects how the work is done. Look for:
+ Scan prior messages for resolved choices, adopted approaches, or stated assumptions. Look for:
+ - **Approaches chosen**: "we'll use X instead of Y"
+ - **Provisional choices**: "for now we'll use X"
+ - **Stated assumptions**: "assuming X is true"
+ - **Constraints accepted**: "X is out of scope"
+ - **Naming/structural choices**: "we'll call it X", "X goes in the Y layer"
 
- - **Approaches chosen**: "we'll use X instead of Y", "let's go with X"
- - **Provisional choices**: "for now we'll use X", "assuming X until we know more"
- - **Stated assumptions**: "assuming X is true", "treating X as given"
- - **Constraints accepted**: "we won't do X", "X is out of scope"
- - **Naming or structural choices**: "we'll call it X", "X goes in the Y layer"
+ Extract **at most 4 candidates**. For each, determine:
 
- Extract **at most 4 candidates** from the conversation. Prioritize the most consequential or recent ones.
+ | Field | Value |
+ |-------|-------|
+ | Tag | `[APPROACH]` (deliberate choice), `[PROVISIONAL]` (revisit later), or `[ASSUMPTION]` (unvalidated) |
+ | Decision | One concise line describing the choice |
+ | Rationale | One sentence explaining why |
 
- For each candidate, determine:
- - **Tag**: one of `[APPROACH]`, `[PROVISIONAL]`, or `[ASSUMPTION]`
-   - `[APPROACH]` — a deliberate design or implementation choice
-   - `[PROVISIONAL]` — works for now, expected to revisit
-   - `[ASSUMPTION]` — treating something as true without full validation
- - **Decision text**: one concise line describing the choice
- - **Rationale**: one sentence explaining why this was chosen
-
- If fewer than 2 clear candidates are found, say so briefly and exit without calling `AskUserQuestion`.
+ If <2 clear candidates found, say so and exit.
 
 ### 2. CHECK FOR CONTRADICTIONS
 
- Read `.deepflow/decisions.md` if it exists. For each candidate, check whether it contradicts a prior entry in the file.
-
- If a contradiction is found:
- - Keep the prior entry — never delete or modify it
- - Amend the candidate's rationale to reference the prior decision: `was "X", now "Y" because Z`
+ Read `.deepflow/decisions.md` if it exists. If a candidate contradicts a prior entry: keep prior entry unchanged, amend candidate rationale to `was "X", now "Y" because Z`.
 
 ### 3. PRESENT VIA AskUserQuestion
 
- Present candidates as a multi-select question with at most 4 options (tool limit).
-
- ```json
- {
-   "questions": [
-     {
-       "question": "These decisions were detected in your conversation. Which should be saved to .deepflow/decisions.md?",
-       "header": "Save notes?",
-       "multiSelect": true,
-       "options": [
-         {
-           "label": "[APPROACH] <decision text>",
-           "description": "<rationale>"
-         },
-         {
-           "label": "[PROVISIONAL] <decision text>",
-           "description": "<rationale>"
-         }
-       ]
-     }
-   ]
- }
- ```
-
- Each option's `label` is the tag + decision text. Each `description` is the rationale (one sentence).
+ Single multi-select call. Each option: `label` = tag + decision text, `description` = rationale.
 
 ### 4. APPEND CONFIRMED DECISIONS
 
- For each option the user selects:
-
- 1. If `.deepflow/decisions.md` does not exist, create it with a blank header:
-    ```
-    # Decisions
-    ```
-
- 2. Append a new dated section using today's date in `YYYY-MM-DD` format and source `note`:
-
-    ```markdown
-    ### 2026-02-22 — note
-    - [APPROACH] Use event sourcing over CRUD — append-only log matches audit requirements
-    - [PROVISIONAL] Batch size = 50 — works for 4-game dataset, revisit at scale
-    ```
-
- 3. If multiple decisions are confirmed in one invocation, group them under a single dated section.
-
- 4. Never modify or delete any prior entries.
+ For each selected option:
+ 1. Create `.deepflow/decisions.md` with `# Decisions` header if absent
+ 2. Append a dated section: `### YYYY-MM-DD — note`
+ 3. Group all confirmed decisions under one section: `- [TAG] Decision text — rationale`
+ 4. Never modify or delete prior entries
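The append flow described in §4 of the new version can be sketched as follows. This is an illustrative sketch only, not code shipped in the package; the function name `append_decisions` and the tuple shape are assumptions:

```python
from datetime import date
from pathlib import Path

def append_decisions(confirmed, path=".deepflow/decisions.md"):
    """Append confirmed decisions under one dated `note` section.

    confirmed: iterable of (tag, decision_text, rationale) tuples.
    Prior entries are never modified — the file is append-only.
    """
    p = Path(path)
    if not p.exists():
        p.write_text("# Decisions\n")  # first use initializes the file
    lines = [f"\n### {date.today().isoformat()} — note\n"]
    for tag, text, rationale in confirmed:
        lines.append(f"- [{tag}] {text} — {rationale}\n")
    with p.open("a") as f:  # append-only: earlier sections stay untouched
        f.writelines(lines)
```

All decisions confirmed in one invocation land under a single dated section, matching rule 3 above.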
 
 ### 5. CONFIRM
 
- After writing, report to the user:
-
- ```
- Saved N decision(s) to .deepflow/decisions.md
- ```
-
- If the user selected nothing, respond:
-
- ```
- No decisions saved.
- ```
-
- ---
-
- ## Decision Format
-
- ```
- ### YYYY-MM-DD — note
- - [TAG] Decision text — rationale
- ```
+ Report: `Saved N decision(s) to .deepflow/decisions.md` or `No decisions saved.`
 
- **Tags:**
- - `[APPROACH]` — deliberate design or implementation choice
- - `[PROVISIONAL]` — works for now, will revisit at scale or with more information
- - `[ASSUMPTION]` — treating something as true without full confirmation
- - `[DEBT]` — needs revisiting; produced only by `/df:consolidate`, never manually assigned
+ ## Decision Tags
 
- **Contradiction handling:** Never delete prior entries. When a new decision contradicts an older one, include a reference in the rationale: `was "X", now "Y" because Z`.
-
- ---
+ | Tag | Meaning | Source |
+ |-----|---------|--------|
+ | `[APPROACH]` | Firm decision | /df:note, auto-extraction |
+ | `[PROVISIONAL]` | Revisit later | /df:note, auto-extraction |
+ | `[ASSUMPTION]` | Unverified | /df:note, auto-extraction |
+ | `[DEBT]` | Needs revisiting | /df:consolidate only, never manually assigned |
 
 ## Rules
 
- - **Maximum 4 candidates** per invocation (tool limit for AskUserQuestion options)
- - **multiSelect: true** — user can confirm any subset, including all or none
- - **Never invent decisions** — only extract what was genuinely discussed and resolved in the conversation
- - **Never modify prior entries** in `.deepflow/decisions.md`
- - **Create the file if absent** — first use initializes it with `# Decisions` header
- - **Source is always `note`** — this distinguishes from decisions captured by other df commands
- - **Date is today's date** — use the current date in `YYYY-MM-DD` format
- - **One AskUserQuestion call** — present all candidates in a single call, not sequentially
-
- ---
-
- ## Example
-
- ```
- USER: /df:note
-
- CLAUDE: Let me scan our conversation for candidate decisions.
-
- [Reads .deepflow/decisions.md — finds prior entry:
-  "### 2026-02-10 — spec
-  - [APPROACH] Use PostgreSQL for primary storage — team already knows it"]
-
- [Extracts from conversation:
-  1. [APPROACH] Use Redis for session state — avoids DB round-trips on hot path
-  2. [PROVISIONAL] Skip auth for internal APIs — only 2 internal services for now
-  3. [ASSUMPTION] Peak load is 500 req/s — based on current traffic, unvalidated
- ]
-
- [No contradictions found with prior entries]
-
- [Uses AskUserQuestion]
- {
-   "questions": [
-     {
-       "question": "These decisions were detected in your conversation. Which should be saved to .deepflow/decisions.md?",
-       "header": "Save notes?",
-       "multiSelect": true,
-       "options": [
-         {
-           "label": "[APPROACH] Use Redis for session state",
-           "description": "Avoids DB round-trips on hot path"
-         },
-         {
-           "label": "[PROVISIONAL] Skip auth for internal APIs",
-           "description": "Only 2 internal services for now, revisit when we open to more consumers"
-         },
-         {
-           "label": "[ASSUMPTION] Peak load is 500 req/s",
-           "description": "Based on current traffic patterns, not load-tested"
-         }
-       ]
-     }
-   ]
- }
-
- USER: [Selects: Use Redis for session state, Peak load is 500 req/s]
-
- CLAUDE: [Appends to .deepflow/decisions.md:]
-
- ### 2026-02-22 — note
- - [APPROACH] Use Redis for session state — avoids DB round-trips on hot path
- - [ASSUMPTION] Peak load is 500 req/s — based on current traffic patterns, not load-tested
-
- Saved 2 decision(s) to .deepflow/decisions.md
- ```
+ - Max 4 candidates per invocation (AskUserQuestion tool limit)
+ - multiSelect: true — user confirms any subset
+ - Never invent decisions — only extract what was discussed and resolved
+ - Never modify prior entries in `.deepflow/decisions.md`
+ - Source is always `note`; date is today (YYYY-MM-DD)
+ - One AskUserQuestion call — all candidates in a single call
@@ -5,7 +5,6 @@ description: Compare specs against codebase and past experiments, generate prior
 
 # /df:plan — Generate Task Plan from Specs
 
- ## Purpose
 Compare specs against codebase and past experiments. Generate prioritized tasks.
 
 **NEVER:** use EnterPlanMode, use ExitPlanMode — this command IS the planning phase
@@ -37,22 +36,35 @@ Load: specs/*.md (exclude doing-*/done-*), PLAN.md (if exists), .deepflow/config
 Determine source_dir from config or default to src/
 ```
 
- Shell injection (use output directly — no manual file reads needed):
+ Shell injection:
 - `` !`ls specs/*.md 2>/dev/null || echo 'NOT_FOUND'` ``
 - `` !`cat PLAN.md 2>/dev/null || echo 'NOT_FOUND'` ``
 
- Run `validateSpec` on each spec. Hard failures → skip + error. Advisory → include in output.
+ Run `validateSpec` on each spec. Hard failures → skip + error. Advisory → include.
+ Record each spec's computed layer (gates task generation per §1.5).
 No new specs → report counts, suggest `/df:execute`.
 
- ### 2. CHECK PAST EXPERIMENTS (SPIKE-FIRST)
+ ### 1.5. LAYER-GATED TASK GENERATION
 
- **CRITICAL**: Check experiments BEFORE generating any tasks.
+ | Layer | Sections present | Allowed task types |
+ |-------|------------------|--------------------|
+ | L0 | Objective | Spikes only |
+ | L1 | + Requirements | Spikes only (better targeted) |
+ | L2 | + Acceptance Criteria | Spikes + Implementation |
+ | L3 | + Constraints, Out of Scope, Technical Notes | Spikes + Implementation + Impact analysis + Optimize |
 
- ```
- Glob .deepflow/experiments/{topic}--*
- ```
+ **Rules:**
+ - L0–L1: ONLY spike tasks. Implementation blocked until spec deepens to L2+.
+ - L2: spikes + implementation, skip impact analysis.
+ - L3: full planning — spikes, implementation, impact analysis, optimize.
+ - Spike results deepen specs: findings incorporated back via user or `/df:spec`, raising layer.
+ - Report layer: `"Spec {name}: L{N} ({label}) — {task_types_generated}"`
+
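The new layer table and gating rules can be sketched as follows. Illustrative only — the function names and the `sections`-set representation are assumptions, not the package's implementation:

```python
# Cumulative layers: each step also requires every earlier section.
LAYERS = [
    ("L1", {"Requirements"}),
    ("L2", {"Acceptance Criteria"}),
    ("L3", {"Constraints", "Out of Scope", "Technical Notes"}),
]

def spec_layer(sections):
    """Return the highest layer whose required sections are all present."""
    layer = "L0" if "Objective" in sections else None
    for name, required in LAYERS:
        if layer is not None and required <= sections:
            layer = name
        else:
            break  # cumulative: stop at the first missing tier
    return layer

def allowed_tasks(layer):
    """Task types the planner may generate for a spec at this layer."""
    return {
        "L0": {"spike"},
        "L1": {"spike"},
        "L2": {"spike", "implementation"},
        "L3": {"spike", "implementation", "impact analysis", "optimize"},
    }[layer]
```

For example, a spec with only Objective and Requirements computes as L1 and yields spike tasks only.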
+ ### 2. CHECK PAST EXPERIMENTS (SPIKE-FIRST)
+
+ **CRITICAL**: Check experiments BEFORE generating tasks.
 
- File naming: `{topic}--{hypothesis}--{status}.md` (active/passed/failed)
+ Glob `.deepflow/experiments/{topic}--*`. File naming: `{topic}--{hypothesis}--{status}.md`
 
 | Result | Action |
 |--------|--------|
@@ -61,140 +73,79 @@ File naming: `{topic}--{hypothesis}--{status}.md` (active/passed/failed)
 | `--active.md` | Wait for completion |
 | No matches | New topic, generate initial spike |
 
- Full implementation tasks BLOCKED until spike validates. See `templates/experiment-template.md`.
+ Implementation tasks BLOCKED until spike validates.
 
 ### 3. DETECT PROJECT CONTEXT
 
 Identify code style, patterns (error handling, API structure), integration points. Include in task descriptions.
 
- ### 4. IMPACT ANALYSIS (per planned file)
+ ### 4. IMPACT ANALYSIS (L3 specs only)
 
- For each file in a task's "Files:" list, find the full blast radius.
+ Skip for L0–L2 specs. For each file in a task's `Files:` list, find blast radius.
 
- **Search for (prefer LSP, fallback to grep):**
+ **Search (prefer LSP, fallback grep):**
+ 1. **Callers:** LSP `findReferences`/`incomingCalls` on exports being changed. Annotate WHY impacted. Fallback: grep.
+ 2. **Duplicates:** Similar logic files. Classify: `[active]` → consolidate, `[dead]` → DELETE.
+ 3. **Data flow:** LSP `outgoingCalls` to trace consumers.
 
- 1. **Callers:** Use LSP `findReferences` / `incomingCalls` on each exported function/type being changed. Annotate each caller with WHY it's impacted (e.g. "imports validateToken which this task changes"). Fallback: `grep -r "{exported_function}" --include="*.{ext}" -l`
- 2. **Duplicates:** Files with similar logic (same function name, same transformation). Classify:
-    - `[active]` — used in production → must consolidate
-    - `[dead]` — bypassed/unreachable → must delete
- 3. **Data flow:** If file produces/transforms data, use LSP `outgoingCalls` to trace consumers. Fallback: grep across languages
-
- **Embed as `Impact:` block in each task:**
- ```markdown
- - [ ] **T2**: Add new features to YAML export
-   - Files: src/utils/buildConfigData.ts
-   - Impact:
-     - Callers: src/routes/index.ts:12, src/api/handler.ts:45
-     - Duplicates:
-       - src/components/YamlViewer.tsx:19 (own generateYAML) [active — consolidate]
-       - backend/yaml_gen.go (generateYAMLFromConfig) [dead — DELETE]
-     - Data flow: buildConfigData → YamlViewer, SimControls, RoleplayPage
-   - Blocked by: T1
- ```
-
- Files outside original "Files:" → add with `(impact — verify/update)`.
- Skip for spike tasks.
+ Embed as `Impact:` block in each task. Files outside original `Files:` → add with `(impact — verify/update)`. Skip for spikes.
 
 ### 4.5. TARGETED EXPLORATION
 
- Follow `templates/explore-agent.md` for spawn rules and scope. Explore agents cover **what LSP did not reveal**: conventions, dead code, implicit patterns.
+ Follow `templates/explore-agent.md` for spawn rules. 3-5 agents cover post-LSP gaps: conventions, dead code, implicit patterns.
 
- | Finding Type | Agents |
- |--------------|--------|
- | Post-LSP gaps | 3-5 |
-
- Use `code-completeness` skill to search for: implementations matching spec requirements, TODOs/FIXMEs/HACKs, stubs, skipped tests.
+ Use `code-completeness` skill: implementations matching spec, TODOs/FIXMEs/HACKs, stubs, skipped tests.
 
 ### 4.6. CROSS-TASK FILE CONFLICT DETECTION
 
- After all tasks have their `Files:` lists, detect overlaps that require sequential execution.
-
- **Algorithm:**
- 1. Build a map: `file → [task IDs that list it]`
- 2. For each file with >1 task: add `Blocked by` edge from later task → earlier task (by task number)
- 3. If a dependency already exists (direct or transitive), skip (no redundant edges)
+ After all tasks have `Files:` lists, detect overlaps requiring sequential execution.
 
- **Example:**
- ```
- T1: Files: config.go, feature.go — Blocked by: none
- T3: Files: config.go — Blocked by: none
- T5: Files: config.go — Blocked by: none
- ```
- After conflict detection:
- ```
- T1: Blocked by: none
- T3: Blocked by: T1 (file conflict: config.go)
- T5: Blocked by: T3 (file conflict: config.go)
- ```
+ 1. Build map: `file → [task IDs]`
+ 2. For files with >1 task: add `Blocked by` from later → earlier task
+ 3. Skip if dependency already exists (direct or transitive)
 
- **Rules:**
- - Only add the minimum edges needed (chain, not full mesh — T5 blocks on T3, not T1+T3)
- - Append `(file conflict: (unknown))` to the Blocked by reason for traceability
- - If a logical dependency already covers the ordering, don't add a redundant conflict edge
- - Cross-spec conflicts: tasks from different specs sharing files get the same treatment
+ **Rules:** Chain only (T5→T3, not T5→T1+T3). Append `(file conflict: (unknown))`. Logical deps override conflict edges. Cross-spec conflicts get same treatment.
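The chaining algorithm in §4.6 can be sketched in Python. Illustrative only — function name and data shapes are assumptions, and the transitive-dependency check from rule 3 is omitted for brevity:

```python
def add_conflict_edges(tasks):
    """Chain tasks that touch the same file: each later task blocks on the
    nearest earlier one (T5 → T3, not T5 → T1+T3).

    tasks: {task_id: {"files": [...], "blocked_by": set()}}
    """
    # 1. Build map: file → [task IDs], in task-number order
    file_map = {}
    for tid in sorted(tasks, key=lambda t: int(t[1:])):
        for f in tasks[tid]["files"]:
            file_map.setdefault(f, []).append(tid)
    # 2. For each shared file, add edges along the chain only
    for tids in file_map.values():
        for prev, nxt in zip(tids, tids[1:]):
            if prev not in tasks[nxt]["blocked_by"]:  # skip existing direct edge
                tasks[nxt]["blocked_by"].add(prev)
    return tasks
```

Running it on the T1/T3/T5 `config.go` case yields T3 blocked by T1 and T5 blocked by T3, never a full mesh.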
 
 ### 5. COMPARE & PRIORITIZE
 
- Spawn `Task(subagent_type="reasoner", model="opus")`. Map each requirement to DONE / PARTIAL / MISSING / CONFLICT. Check REQ-AC alignment. Flag spec gaps.
+ Spawn `Task(subagent_type="reasoner", model="opus")`. Map each requirement to DONE/PARTIAL/MISSING/CONFLICT. Check REQ-AC alignment. Flag spec gaps.
 
 Priority: Dependencies → Impact → Risk
 
 #### Metric AC Detection
 
- While comparing requirements, scan each spec AC for the pattern `{metric} {operator} {number}[unit]`:
+ Scan ACs for pattern `{metric} {operator} {number}[unit]` (e.g., `coverage > 85%`, `latency < 200ms`). Operators: `>`, `<`, `>=`, `<=`, `==`.
 
- - **Pattern examples**: `coverage > 85%`, `latency < 200ms`, `p99_latency <= 150ms`, `bundle_size < 500kb`
- - **Operators**: `>`, `<`, `>=`, `<=`, `==`
- - **Number**: float or integer, optional unit suffix (%, ms, kb, mb, s, etc.)
- - **On match**: flag the AC as a **metric AC** and generate an `Optimize:` task (see section 6.5)
- - **Non-match**: treat as standard functional AC → standard implementation task
- - **Ambiguous ACs** (qualitative terms like "fast", "small", "improved"): flag as spec gap, request numeric threshold before planning
+ - **Match:** flag as metric AC → generate `Optimize:` task (§6.5)
+ - **Non-match:** standard implementation task
+ - **Ambiguous** ("fast", "small"): flag as spec gap, request numeric threshold
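The `{metric} {operator} {number}[unit]` scan can be sketched with a regex. This is an illustrative sketch, not the package's actual implementation — the regex, names, and unit list are assumptions:

```python
import re

# Matches e.g. "coverage > 85%", "latency < 200ms", "p99_latency <= 150ms".
METRIC_AC = re.compile(
    r"(?P<metric>\w+)\s*(?P<op>>=|<=|==|>|<)\s*"
    r"(?P<value>\d+(?:\.\d+)?)(?P<unit>%|ms|kb|mb|s)?"
)

def classify_ac(ac: str):
    """Return ("metric", fields) for a metric AC, else ("functional", None)."""
    m = METRIC_AC.search(ac)
    if not m:
        return ("functional", None)  # standard implementation task
    # `==` maps to "higher" by convention, per the Optimize field rules
    direction = "lower" if m["op"] in ("<", "<=") else "higher"
    return ("metric", {
        "metric": m["metric"],
        "target": float(m["value"]),
        "direction": direction,
        "unit": m["unit"],
    })
```

Note the alternation order (`>=` before `>`) so two-character operators win; qualitative ACs like "fast" simply fail to match and fall through to the functional path.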
 
 ### 5.5. CLASSIFY MODEL + EFFORT PER TASK
 
- For each task, assign `Model:` and `Effort:` based on the routing matrix:
-
 #### Routing matrix
 
- | Task type | Model | Effort | Rationale |
- |-----------|-------|--------|-----------|
- | Bootstrap (scaffold, config, rename) | `haiku` | `low` | Mechanical, pattern-following, zero ambiguity |
- | browse-fetch (doc retrieval) | `haiku` | `low` | Just fetching and extracting, no reasoning |
- | Single-file simple addition | `haiku` | `high` | Small scope but needs to get it right |
- | Multi-file with clear specs | `sonnet` | `medium` | Standard work, specs remove need for deep thinking |
- | Bug fix (clear repro) | `sonnet` | `medium` | Diagnosis done, just apply fix |
- | Bug fix (unclear cause) | `sonnet` | `high` | Needs reasoning to find root cause |
- | Spike / validation | `sonnet` | `high` | Scoped but needs reasoning to validate hypothesis |
- | Optimize (metric AC) | `opus` | `high` | Multi-cycle, ambiguous — best strategy changes per iteration |
- | Feature work (well-specced) | `sonnet` | `medium` | Clear ACs reduce thinking overhead |
- | Feature work (ambiguous ACs) | `opus` | `medium` | Needs intelligence but effort can be moderate with good specs |
- | Refactor (>5 files, many callers) | `opus` | `medium` | Blast radius needs intelligence, patterns are repetitive |
- | Architecture change | `opus` | `high` | High complexity + high ambiguity |
- | Unfamiliar API integration | `opus` | `high` | Needs deep reasoning about unknown patterns |
- | Retried after revert | _(raise one level)_ | `high` | Prior failure means harder than expected |
-
- #### Decision inputs
-
- 1. **File count** — 1 file → haiku/sonnet, 2-5 → sonnet, >5 → sonnet/opus
- 2. **Impact blast radius** — many callers/duplicates → raise model
- 3. **Spec clarity** — clear ACs → lower effort, ambiguous → raise effort
- 4. **Type** — spikes → `sonnet high`, bootstrap → `haiku low`
- 5. **Has prior failures** — raise model one level AND set effort to `high`
- 6. **Repetitiveness** — repetitive pattern across files → lower effort even at higher model
-
- #### Effort economics
-
- Effort controls ALL token spend (text, tool calls, thinking). Lower effort = fewer tool calls, less preamble, shorter reasoning.
-
- - `low` → ~60-70% token reduction vs high. Use when task is mechanical.
- - `medium` → ~30-40% token reduction. Use when specs are clear.
- - `high` → full spend (default). Use when ambiguity or risk is high.
-
- Add `Model: haiku|sonnet|opus` and `Effort: low|medium|high` to each task block. Defaults: `Model: sonnet`, `Effort: medium`.
+ | Task type | Model | Effort |
+ |-----------|-------|--------|
+ | Bootstrap (scaffold, config, rename) | `haiku` | `low` |
+ | browse-fetch (doc retrieval) | `haiku` | `low` |
+ | Single-file simple addition | `haiku` | `high` |
+ | Multi-file with clear specs | `sonnet` | `medium` |
+ | Bug fix (clear repro) | `sonnet` | `medium` |
+ | Bug fix (unclear cause) | `sonnet` | `high` |
+ | Spike / validation | `sonnet` | `high` |
+ | Optimize (metric AC) | `opus` | `high` |
+ | Feature work (well-specced) | `sonnet` | `medium` |
+ | Feature work (ambiguous ACs) | `opus` | `medium` |
+ | Refactor (>5 files, many callers) | `opus` | `medium` |
+ | Architecture change | `opus` | `high` |
+ | Unfamiliar API integration | `opus` | `high` |
+ | Retried after revert | _(raise one level)_ | `high` |
+
+ Add `Model:` and `Effort:` to each task. Defaults: `sonnet` / `medium`.
 
 ### 6. GENERATE SPIKE TASKS (IF NEEDED)
 
- **Spike Task Format:**
+ **Format:**
 ```markdown
 - [ ] **T1** [SPIKE]: Validate {hypothesis}
   - Type: spike
@@ -206,12 +157,10 @@ Add `Model: haiku|sonnet|opus` and `Effort: low|medium|high` to each task block.
   - Blocked by: none
 ```
 
- All implementation tasks MUST `Blocked by: T{spike}`. Spike fails → `--failed.md`, no implementation tasks.
+ All implementation tasks MUST `Blocked by: T{spike}`. Spike fails → `--failed.md`, no implementation.
 
 #### Probe Diversity
 
- When generating multiple spikes for the same problem:
-
 | Requirement | Rule |
 |-------------|------|
 | Contradictory | ≥2 probes with opposing approaches |
@@ -221,38 +170,15 @@ When generating multiple spikes for the same problem:
 
 Before output, verify: ≥2 opposing probes, ≥1 naive, all independent.
 
- **Example — caching problem, 3 diverse probes:**
- ```markdown
- - [ ] **T1** [SPIKE]: Validate in-memory LRU cache
-   - Role: Contradictory-A (in-process)
-   - Hypothesis: In-memory LRU reduces DB queries by ≥80%
-   - Method: LRU with 1000-item cap, load test
-   - Success criteria: DB queries drop ≥80% under 100 concurrent users
-
- - [ ] **T2** [SPIKE]: Validate Redis distributed cache
-   - Role: Contradictory-B (external, opposing T1)
-   - Hypothesis: Redis scales across multiple instances
-   - Method: Redis client, cache top 10 queries, same load test
-   - Success criteria: DB queries drop ≥80%, works across 2 instances
-
- - [ ] **T3** [SPIKE]: Validate query optimization without cache
-   - Role: Naive (no prior justification — tests if caching is even necessary)
-   - Hypothesis: Indexes + query batching alone may suffice
-   - Method: Add indexes, batch N+1 queries, same load test — no cache
-   - Success criteria: DB queries drop ≥80% with zero cache infrastructure
- ```
-
 ### 6.5. GENERATE OPTIMIZE TASKS (FROM METRIC ACs)
 
- For each metric AC detected in section 5, generate an `Optimize:` task using this format:
-
- **Optimize Task Format:**
+ **Format:**
 ```markdown
 - [ ] **T{n}** [OPTIMIZE]: Improve {metric_name} to {target}
   - Type: optimize
-   - Files: {primary files likely to affect the metric}
+   - Files: {primary files affecting metric}
   - Optimize:
-     metric: "{shell command that outputs a single number}"
+     metric: "{shell command outputting single number}"
     target: {number}
     direction: higher|lower
     max_cycles: {number, default 20}
@@ -262,95 +188,39 @@ For each metric AC detected in section 5, generate an `Optimize:` task using thi
     regression_threshold: 5%
   - Model: opus
   - Effort: high
-   - Blocked by: {spike T{n} if applicable, else none}
+   - Blocked by: {spike if applicable, else none}
 ```
 
- **Field rules:**
- - `metric`: a shell command returning a single scalar float/integer (e.g., `npx jest --coverage --json | jq '.coverageMap | .. | .pct? | numbers' | awk '{sum+=$1;n++} END{print sum/n}'`). Must be deterministic and side-effect free.
- - `target`: the numeric threshold extracted from the AC (strip unit suffix for the value; note unit in task description)
- - `direction`: `higher` if operator is `>` or `>=`; `lower` if `<` or `<=`; `higher` by convention for `==`
- - `max_cycles`: from spec if stated; default 20
- - `secondary_metrics`: other metrics from the same spec that could regress (e.g., build time, bundle size, test count). Omit if none.
-
- **Model/Effort**: always `opus` / `high` (see routing matrix).
-
- **Blocking**: if a spike exists for the same area, block the optimize task on the spike passing.
+ **Field rules:** `metric` must be deterministic, side-effect free, return single scalar. `direction`: higher for `>`/`>=`, lower for `<`/`<=`, higher for `==`. `max_cycles`: from spec or default 20. Always `opus`/`high`. Block on spike if one exists.
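The `metric`/`target`/`direction` contract above can be sketched as a check loop would use it. A minimal sketch only — the function name is an assumption, cycle control and secondary-metric regression checks are not shown, and meeting the target exactly is counted as met:

```python
import subprocess

def metric_met(cmd: str, target: float, direction: str) -> bool:
    """Run a metric command that prints a single scalar and compare to target.
    cmd must be deterministic and side-effect free, per the field rules."""
    out = subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout
    value = float(out.strip())  # single scalar on stdout
    return value >= target if direction == "higher" else value <= target
```

For example, `metric_met("echo 90", 85, "higher")` models a coverage command reporting 90 against a `coverage > 85%` AC.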
 
 ### 7. VALIDATE HYPOTHESES
 
- Unfamiliar APIs or performance-critical → prototype in scratchpad. Fails → write `--failed.md`. Skip for known patterns.
+ Unfamiliar APIs or performance-critical → prototype in scratchpad. Fails → `--failed.md`. Skip for known patterns.
 
 ### 8. CLEANUP PLAN.md
 
- Prune stale sections: remove `done-*` sections and orphaned headers. Recalculate Summary table. Empty → recreate fresh.
+ Prune stale `done-*` sections and orphaned headers. Recalculate Summary. Empty → recreate fresh.
 
 ### 9. OUTPUT & RENAME
 
 Append tasks grouped by `### doing-{spec-name}`. Rename `specs/feature.md` → `specs/doing-feature.md`.
 
- Report: `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`
+ Report:
+ ```
+ ✓ Plan generated — {n} specs, {n} tasks. Run /df:execute
+
+ Spec layers:
+ {name}: L{N} ({label}) — {n} spikes{, {n} impl tasks if L2+}
+ ```
+
+ If any L0–L1 spec: `ℹ L0–L1 specs generate spikes only. Deepen with /df:spec {name} to unlock implementation.`
 
 ## Rules
+ - **Layer-gated** — L0–L1 → spikes only; L2+ → implementation; L3 → full planning
 - **Spike-first** — No `--passed.md` → spike before implementation
- - **Block on spike** — Implementation tasks blocked until spike validates
+ - **Block on spike** — Implementation blocked until spike validates
 - **Learn from failures** — Extract next hypothesis, never repeat approach
 - **Plan only** — Do NOT implement (except quick validation prototypes)
 - **One task = one logical unit** — Atomic, committable
 - Prefer existing utilities over new code; flag spec gaps
-
- ## Agent Scaling
-
- | Agent | Model | Base | Scale |
- |-------|-------|------|-------|
- | Explore | haiku | 3-5 | none |
- | Reasoner | opus | 5 | +1 per 2 specs |
-
- Always use `Task` tool with explicit `subagent_type` and `model`.
-
- ## Example
-
- ```markdown
- ### doing-upload
-
- - [ ] **T1** [SPIKE]: Validate streaming upload approach
-   - Type: spike
-   - Hypothesis: Streaming uploads handle >1GB without memory issues
-   - Success criteria: Memory <500MB during 2GB upload
-   - Files: .deepflow/experiments/upload--streaming--active.md
-   - Blocked by: none
-
- - [ ] **T2**: Create upload endpoint
-   - Files: src/api/upload.ts
-   - Model: sonnet
-   - Impact:
-     - Callers: src/routes/index.ts:5
-     - Duplicates: backend/legacy-upload.go [dead — DELETE]
-   - Blocked by: T1
-
- - [ ] **T3**: Add S3 service with streaming
-   - Files: src/services/storage.ts
-   - Model: opus
-   - Blocked by: T1, T2
- ```
-
- **Optimize task example** (from spec AC: `coverage > 85%`):
-
- ```markdown
- ### doing-quality
-
- - [ ] **T1** [OPTIMIZE]: Improve test coverage to >85%
-   - Type: optimize
-   - Files: src/
-   - Optimize:
-     metric: "npx jest --coverage --json 2>/dev/null | jq '[.. | .pct? | numbers] | add / length'"
-     target: 85
-     direction: higher
-     max_cycles: 20
-     secondary_metrics:
-       - metric: "npx jest --json 2>/dev/null | jq '.testResults | length'"
-         name: test_count
-         regression_threshold: 5%
-   - Model: opus
-   - Effort: high
-   - Blocked by: none
- ```
+ - Always use `Task` tool with explicit `subagent_type` and `model`