deepflow 0.1.78 → 0.1.79

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "deepflow",
3
- "version": "0.1.78",
3
+ "version": "0.1.79",
4
4
  "description": "Doing reveals what thinking can't predict — spec-driven iterative development for Claude Code",
5
5
  "keywords": [
6
6
  "claude",
@@ -169,10 +169,10 @@ _Last updated: {YYYY-MM-DDTHH:MM:SSZ}_
169
169
 
170
170
  ## Cycle Log
171
171
 
172
- | Cycle | Task | Status | Commit / Revert | Reason | Timestamp |
173
- |-------|------|--------|-----------------|--------|-----------|
174
- | 1 | T1 | passed | abc1234 | — | 2025-01-15T10:00:00Z |
175
- | 2 | T2 | failed | reverted | tests failed: 2 of 24 | 2025-01-15T10:05:00Z |
172
+ | Cycle | Task | Status | Commit / Revert | Delta | Reason | Timestamp |
173
+ |-------|------|--------|-----------------|-------|--------|-----------|
174
+ | 1 | T1 | passed | abc1234 | tests: 24→24, build: ok | — | 2025-01-15T10:00:00Z |
175
+ | 2 | T2 | failed | reverted | tests: 24→22 (−2) | tests failed: 2 of 24 | 2025-01-15T10:05:00Z |
176
176
 
177
177
  ## Probe Results
178
178
 
@@ -202,13 +202,14 @@ _(tasks that were reverted with their failure reasons)_
202
202
  **Cycle Log — append one row:**
203
203
 
204
204
  ```
205
- | {cycle_number} | {task_id} | {status} | {commit_hash or "reverted"} | {reason or "—"} | {YYYY-MM-DDTHH:MM:SSZ} |
205
+ | {cycle_number} | {task_id} | {status} | {commit_hash or "reverted"} | {delta} | {reason or "—"} | {YYYY-MM-DDTHH:MM:SSZ} |
206
206
  ```
207
207
 
208
208
  - `cycle_number`: total number of cycles executed so far (count existing data rows in the Cycle Log + 1)
209
209
  - `task_id`: task ID from PLAN.md, or `BOOTSTRAP` for bootstrap cycles
210
210
  - `status`: `passed` (ratchet passed), `failed` (ratchet failed, reverted), or `skipped` (task was already done)
211
211
  - `commit_hash`: short hash from the commit, or `reverted` if ratchet failed
212
+ - `delta`: ratchet metric change from this cycle. Format: `tests: {before}→{after}, build: ok/fail`. Include coverage delta if available (e.g., `cov: 80%→82% (+2%)`). On revert, show the regression that triggered it (e.g., `tests: 24→22 (−2)`)
212
213
  - `reason`: failure reason from ratchet output (e.g., `"tests failed: 2 of 24"`), or `—` if passed
213
214
 
214
215
  **Summary table — recalculate from Cycle Log rows:**
@@ -259,10 +260,12 @@ done_count = number of [x] tasks
259
260
  pending_count = number of [ ] tasks
260
261
  ```
261
262
 
262
- **If ALL tasks are `[x]` (pending_count == 0):**
263
+ **Note:** Per-spec verification and merge to main happen automatically in `/df:execute` (step 8) when all tasks for a spec complete. No separate verify call is needed here.
264
+
265
+ **If no `[ ]` tasks remain (pending_count == 0):**
263
266
  ```
264
- Run /df:verify via Skill tool (skill: "df:verify", no args)
265
- Report: "All tasks complete. Verification triggered."
267
+ Report: "All specs verified and merged. Workflow complete."
268
+ Exit
266
269
  ```
267
270
 
268
271
  **If tasks remain (pending_count > 0):**
@@ -327,17 +330,14 @@ Updated .deepflow/auto-report.md:
327
330
  Cycle complete. 1 task remaining.
328
331
  ```
329
332
 
330
- ### All Tasks Done (verify triggered)
333
+ ### All Tasks Done (workflow complete)
331
334
 
332
335
  ```
333
336
  /df:auto-cycle
334
337
 
335
- Loading PLAN.md... 3 tasks total, 3 done, 0 pending
338
+ Loading PLAN.md... 0 tasks total, 0 done, 0 pending
336
339
 
337
- All tasks complete. Verification triggered.
338
- Running: /df:verify
339
- ✓ L0 | ✓ L1 | ⚠ L2 (no coverage tool) | ✓ L4
340
- ✓ Merged df/upload to main
340
+ All specs verified and merged. Workflow complete.
341
341
  ```
342
342
 
343
343
  ### No Work Remaining (idempotent)
@@ -345,10 +345,9 @@ Running: /df:verify
345
345
  ```
346
346
  /df:auto-cycle
347
347
 
348
- Loading PLAN.md... 3 tasks total, 3 done, 0 pending
349
- Verification already complete (no doing-* specs found).
348
+ Loading PLAN.md... 0 tasks total, 0 done, 0 pending
350
349
 
351
- Nothing to do. Cycle complete. 0 tasks remaining.
350
+ All specs verified and merged. Workflow complete.
352
351
  ```
353
352
 
354
353
  ### Circuit Breaker Tripped
@@ -8,93 +8,44 @@ You are a coordinator. Spawn agents, run ratchet checks, update PLAN.md. Never i
8
8
 
9
9
  **ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, run ratchet health checks after each agent completes, update PLAN.md, write `.deepflow/decisions.md` in the main tree
10
10
 
11
- ---
11
+ ## Core Loop (Notification-Driven)
12
12
 
13
- ## Purpose
14
- Implement tasks from PLAN.md with parallel agents, atomic commits, ratchet-driven quality gates, and context-efficient execution.
13
+ Each task = one background agent. Completion notifications drive the loop.
14
+
15
+ **NEVER use TaskOutput** — returns full transcripts (100KB+) that explode context.
15
16
 
16
- ## Usage
17
17
  ```
18
- /df:execute # Execute all ready tasks
19
- /df:execute T1 T2 # Specific tasks only
20
- /df:execute --continue # Resume from checkpoint
21
- /df:execute --fresh # Ignore checkpoint
22
- /df:execute --dry-run # Show plan only
18
+ 1. Spawn ALL wave agents with run_in_background=true in ONE message
19
+ 2. STOP. End your turn. Do NOT poll or monitor.
20
+ 3. On EACH notification:
21
+ a. Run ratchet check (section 5.5)
22
+ b. Passed → TaskUpdate(status: "completed"), update PLAN.md [x] + commit hash
23
+ c. Failed → git revert HEAD --no-edit, TaskUpdate(status: "pending")
24
+ d. Report ONE line: "✓ T1: ratchet passed (abc123)" or "✗ T1: ratchet failed, reverted"
25
+ e. NOT all done → end turn, wait | ALL done → next wave or finish
26
+ 4. Between waves: check context %. If ≥50% → checkpoint and exit.
27
+ 5. Repeat until: all done, all blocked, or context ≥50%.
23
28
  ```
24
29
 
25
- ## Skills & Agents
26
- - Skill: `atomic-commits` — Clean commit protocol
27
- - Skill: `context-hub` — Fetch external API docs before coding (when task involves external libraries)
30
+ ## Context Threshold
28
31
 
29
- **Use Task tool to spawn agents:**
30
- | Agent | subagent_type | Purpose |
31
- |-------|---------------|---------|
32
- | Implementation | `general-purpose` | Task implementation |
33
- | Debugger | `reasoner` | Debugging failures |
34
-
35
- **Model routing from frontmatter:**
36
- The model for each agent is determined by the `model:` field in the command/agent/skill frontmatter being invoked. The orchestrator reads the relevant frontmatter to determine which model to pass to `Task()`. If no `model:` field is present in the frontmatter, default to `sonnet`.
37
-
38
- ## Context-Aware Execution
39
-
40
- Statusline writes to `.deepflow/context.json`: `{"percentage": 45}`
32
+ Statusline writes `.deepflow/context.json`: `{"percentage": 45}`
41
33
 
42
34
  | Context % | Action |
43
35
  |-----------|--------|
44
36
  | < 50% | Full parallelism (up to 5 agents) |
45
37
  | ≥ 50% | Wait for running agents, checkpoint, exit |
46
38
 
47
- ## Agent Protocol
48
-
49
- Each task = one background agent. Use agent completion notifications as the feedback loop.
50
-
51
- **NEVER use TaskOutput** — returns full agent transcripts (100KB+) that explode context.
52
-
53
- ### Notification-Driven Execution
54
-
55
- ```
56
- 1. Spawn ALL wave agents with run_in_background=true in ONE message
57
- 2. STOP. End your turn. Do NOT run Bash monitors or poll for results.
58
- 3. Wait for "Agent X completed" notifications (they arrive automatically)
59
- 4. On EACH notification:
60
- a. Run ratchet check (health checks on the worktree)
61
- b. Report: "✓ T1: ratchet passed (abc123)" or "✗ T1: ratchet failed, reverted"
62
- c. Update PLAN.md for that task
63
- d. Check: all wave agents done?
64
- - No → end turn, wait for next notification
65
- - Yes → proceed to next wave or write final summary
66
- ```
67
-
68
- After spawning, your turn ENDS. Per notification: run ratchet, output ONE line, update PLAN.md. Write full summary only after ALL wave agents complete.
69
-
70
- ## Checkpoint & Resume
71
-
72
- **File:** `.deepflow/checkpoint.json` — stored in WORKTREE directory, not main.
73
-
74
- **Schema:**
75
- ```json
76
- {
77
- "completed_tasks": ["T1", "T2"],
78
- "current_wave": 2,
79
- "worktree_path": ".deepflow/worktrees/upload",
80
- "worktree_branch": "df/upload"
81
- }
82
- ```
83
-
84
- **On checkpoint:** Complete wave → update PLAN.md → save to worktree → exit.
85
- **Resume:** `--continue` loads checkpoint, verifies worktree, skips completed tasks.
39
+ ---
86
40
 
87
41
  ## Behavior
88
42
 
89
43
  ### 1. CHECK CHECKPOINT
90
44
 
91
45
  ```
92
- --continue → Load checkpoint
93
- If worktree_path exists:
94
- Verify worktree still exists on disk
95
- → If missing: Error "Worktree deleted. Use --fresh"
96
- → If exists: Use it, skip worktree creation
97
- → Resume execution with completed tasks
46
+ --continue → Load .deepflow/checkpoint.json from worktree
47
+ Verify worktree exists on disk (else error: "Use --fresh")
48
+ Skip completed tasks, resume execution
98
49
  --fresh → Delete checkpoint, start fresh
99
50
  checkpoint exists → Prompt: "Resume? (y/n)"
100
51
  else → Start fresh
@@ -102,88 +53,29 @@ else → Start fresh
102
53
 
103
54
  ### 1.5. CREATE WORKTREE
104
55
 
105
- Before spawning any agents, create an isolated worktree:
106
-
107
- ```
108
- # Check main is clean (ignore untracked)
109
- git diff --quiet HEAD || Error: "Main has uncommitted changes. Commit or stash first."
110
-
111
- # Generate paths
112
- SPEC_NAME=$(basename spec/doing-*.md .md | sed 's/doing-//')
113
- BRANCH_NAME="df/${SPEC_NAME}"
114
- WORKTREE_PATH=".deepflow/worktrees/${SPEC_NAME}"
115
-
116
- # Create worktree (or reuse existing)
117
- if [ -d "${WORKTREE_PATH}" ]; then
118
- echo "Reusing existing worktree"
119
- else
120
- git worktree add -b "${BRANCH_NAME}" "${WORKTREE_PATH}"
121
- fi
122
- ```
123
-
124
- **Existing worktree:** Reuse it (same spec = same worktree).
125
-
126
- **--fresh flag:** Deletes existing worktree and creates new one.
56
+ Require clean HEAD (`git diff --quiet`). Derive SPEC_NAME from `specs/doing-*.md`.
57
+ Create worktree: `.deepflow/worktrees/{spec}` on branch `df/{spec}`.
58
+ Reuse if exists. `--fresh` deletes first.
127
59
 
128
60
  ### 1.6. RATCHET SNAPSHOT
129
61
 
130
- Before spawning agents, snapshot pre-existing test files:
62
+ Snapshot pre-existing test files in worktree — only these count for ratchet (agent-created tests excluded):
131
63
 
132
64
  ```bash
133
65
  cd ${WORKTREE_PATH}
134
-
135
- # Snapshot pre-existing test files (only these count for ratchet)
136
66
  git ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' \
137
67
  > .deepflow/auto-snapshot.txt
138
-
139
- echo "Ratchet snapshot: $(wc -l < .deepflow/auto-snapshot.txt) pre-existing test files"
140
68
  ```
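To sanity-check the pattern, one can run it against sample paths (the filenames below are illustrative, not from any real repo):

```bash
# Which of these sample paths count as pre-existing test files?
MATCHES=$(printf '%s\n' src/app.ts src/app.test.ts tests/helper.ts lib/util_test.go |
  grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/')
echo "${MATCHES}"
```

Only `src/app.test.ts`, `tests/helper.ts`, and `lib/util_test.go` match; plain `src/app.ts` is excluded.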
141
69
 
142
- **Only pre-existing test files are used for ratchet evaluation.** New test files created by agents during implementation don't influence the pass/fail decision. This prevents agents from gaming the ratchet by writing tests that pass trivially.
143
-
144
70
  ### 1.7. NO-TESTS BOOTSTRAP
145
71
 
146
- After the ratchet snapshot, check if zero test files were found:
147
-
148
- ```bash
149
- TEST_COUNT=$(wc -l < .deepflow/auto-snapshot.txt | tr -d ' ')
150
-
151
- if [ "${TEST_COUNT}" = "0" ]; then
152
- echo "Bootstrap needed: no pre-existing test files found."
153
- BOOTSTRAP_NEEDED=true
154
- else
155
- BOOTSTRAP_NEEDED=false
156
- fi
157
- ```
158
-
159
- **If `BOOTSTRAP_NEEDED=true`:**
160
-
161
- 1. **Inject a bootstrap task** as the FIRST action before any regular PLAN.md task is executed:
162
- - Bootstrap task description: "Write tests for files in edit_scope"
163
- - Read `edit_scope` from `specs/doing-*.md` to know which files need tests
164
- - Spawn ONE dedicated bootstrap agent using the Bootstrap Task prompt (section 6)
72
+ If snapshot has zero test files:
165
73
 
166
- 2. **Bootstrap agent behavior:**
167
- - Write tests covering the files listed in `edit_scope`
168
- - Commit as `test({spec}): bootstrap tests for edit_scope`
169
- - The bootstrap agent's ONLY job is writing tests — no implementation changes
74
+ 1. Spawn ONE bootstrap agent (section 6 Bootstrap Task) to write tests for `edit_scope` files
75
+ 2. On ratchet pass: re-snapshot, report `"bootstrap: completed"`, end cycle (no PLAN.md tasks this cycle)
76
+ 3. On ratchet fail: revert, halt with "Bootstrap failed — manual intervention required"
170
77
 
171
- 3. **After bootstrap agent completes:**
172
- - Run ratchet health checks (build must pass; test suite must not error out)
173
- - If ratchet passes: re-take the ratchet snapshot so subsequent tasks use the new tests as baseline:
174
- ```bash
175
- cd ${WORKTREE_PATH}
176
- git ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' \
177
- > .deepflow/auto-snapshot.txt
178
- echo "Post-bootstrap snapshot: $(wc -l < .deepflow/auto-snapshot.txt) test files"
179
- ```
180
- - If ratchet fails: revert bootstrap commit, log error, halt and report "Bootstrap failed — manual intervention required"
181
-
182
- 4. **Signal to caller:** After bootstrap completes successfully, report `"bootstrap: completed"` in the cycle summary. This cycle's sole output is the test bootstrap — no regular PLAN.md task is executed this cycle.
183
-
184
- 5. **Subsequent cycles:** The updated `.deepflow/auto-snapshot.txt` now contains the bootstrapped test files. All subsequent ratchet checks use these as the baseline.
185
-
186
- **If `BOOTSTRAP_NEEDED=false`:** Proceed normally to section 2.
78
+ Subsequent cycles use bootstrapped tests as ratchet baseline.
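The zero-test detection behind this step can be sketched as follows (the empty snapshot file below simulates a repo with no tests):

```bash
# Sketch only: decide whether a bootstrap cycle is needed
mkdir -p .deepflow
: > .deepflow/auto-snapshot.txt                       # simulate: snapshot found no test files
TEST_COUNT=$(wc -l < .deepflow/auto-snapshot.txt | tr -d ' ')
if [ "${TEST_COUNT}" = "0" ]; then
  BOOTSTRAP_NEEDED=true
  echo "Bootstrap needed: no pre-existing test files found."
else
  BOOTSTRAP_NEEDED=false
fi
```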
187
79
 
188
80
  ### 2. LOAD PLAN
189
81
 
@@ -194,7 +86,7 @@ If missing: "No PLAN.md found. Run /df:plan first."
194
86
 
195
87
  ### 2.5. REGISTER NATIVE TASKS
196
88
 
197
- For each `[ ]` task in PLAN.md: `TaskCreate(subject: "{task_id}: {description}", activeForm: "{gerund}", description: full block)`. Store task_id → native ID mapping. Then set dependencies: `TaskUpdate(addBlockedBy: [...])` for each "Blocked by:" entry. On `--continue`: only register remaining `[ ]` items.
89
+ For each `[ ]` task in PLAN.md: `TaskCreate(subject: "{task_id}: {description}", activeForm: "{gerund}", description: full block)`. Store task_id → native ID mapping. Set dependencies via `TaskUpdate(addBlockedBy: [...])`. On `--continue`: only register remaining `[ ]` items.
198
90
 
199
91
  ### 3. CHECK FOR UNPLANNED SPECS
200
92
 
@@ -202,237 +94,77 @@ Warn if `specs/*.md` (excluding doing-/done-) exist. Non-blocking.
202
94
 
203
95
  ### 4. IDENTIFY READY TASKS
204
96
 
205
- Use TaskList to find ready tasks:
206
-
207
- ```
208
- Ready = TaskList results where:
209
- - status: "pending"
210
- - blockedBy: empty (auto-unblocked by native dependency system)
211
- ```
97
+ Ready = TaskList where status: "pending" AND blockedBy: empty.
212
98
 
213
99
  ### 5. SPAWN AGENTS
214
100
 
215
101
  Context ≥50%: checkpoint and exit.
216
102
 
217
- **Before spawning each agent**, mark its native task as in_progress:
218
- ```
219
- TaskUpdate(taskId: native_id, status: "in_progress")
220
- ```
221
- This activates the UI spinner showing the task's activeForm (e.g. "Creating upload endpoint").
103
+ Before spawning: `TaskUpdate(taskId: native_id, status: "in_progress")` activates UI spinner.
222
104
 
223
- **NEVER use `isolation: "worktree"` on Task tool calls.** Deepflow manages a shared worktree per spec (`.deepflow/worktrees/{spec}/`) so wave 2 agents see wave 1 commits. Claude Code's native isolation creates separate per-agent worktrees (`.claude/worktrees/`) where agents can't see each other's work.
105
+ **NEVER use `isolation: "worktree"` on Task calls.** Deepflow manages a shared worktree so wave 2 sees wave 1 commits.
224
106
 
225
- **Spawn ALL ready tasks in ONE message** with multiple Task tool calls (true parallelism). Same-file conflicts: spawn sequentially.
107
+ **Spawn ALL ready tasks in ONE message.** Same-file conflicts: spawn sequentially.
226
108
 
227
- **Multiple [SPIKE] tasks for the same problem:** When PLAN.md contains two or more `[SPIKE]` tasks grouped by the same "Blocked by:" target or identical problem description, do NOT run them sequentially. Instead, follow the **Parallel Spike Probes** protocol in section 5.7 before spawning any implementation tasks that depend on the spike outcome.
109
+ **≥2 [SPIKE] tasks for same problem:** Follow Parallel Spike Probes (section 5.7).
228
110
 
229
111
  ### 5.5. RATCHET CHECK
230
112
 
231
- After each agent completes (notification received), the orchestrator runs health checks on the worktree.
113
+ After each agent completes, run health checks in the worktree.
232
114
 
233
- **Step 1: Detect commands** (same auto-detection as /df:verify):
115
+ **Auto-detect commands:**
234
116
 
235
117
  | File | Build | Test | Typecheck | Lint |
236
118
  |------|-------|------|-----------|------|
237
- | `package.json` | `npm run build` (if scripts.build) | `npm test` (if scripts.test not placeholder) | `npx tsc --noEmit` (if tsconfig.json) | `npm run lint` (if scripts.lint) |
238
- | `pyproject.toml` | — | `pytest` | `mypy .` (if mypy in deps) | `ruff check .` (if ruff in deps) |
239
- | `Cargo.toml` | `cargo build` | `cargo test` | — | `cargo clippy` (if installed) |
119
+ | `package.json` | `npm run build` | `npm test` | `npx tsc --noEmit` | `npm run lint` |
120
+ | `pyproject.toml` | — | `pytest` | `mypy .` | `ruff check .` |
121
+ | `Cargo.toml` | `cargo build` | `cargo test` | — | `cargo clippy` |
240
122
  | `go.mod` | `go build ./...` | `go test ./...` | — | `go vet ./...` |
241
123
 
242
- **Step 2: Run health checks** in the worktree:
243
- ```bash
244
- cd ${WORKTREE_PATH}
124
+ Run Build → Test → Typecheck → Lint (stop on first failure).
245
125
 
246
- # Run each detected command
247
- # Build → Test → Typecheck → Lint (stop on first failure)
248
- ```
126
+ **Edit scope validation** (if spec declares `edit_scope`): check `git diff HEAD~1 --name-only` against allowed globs. Violations → `git revert HEAD --no-edit`, report "Edit scope violation: {files}".
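The glob check can be sketched as below; the scope globs and changed files are hypothetical stand-ins for the spec's `edit_scope` and the output of `git diff HEAD~1 --name-only`:

```bash
EDIT_SCOPE="src/*.ts docs/*.md"                 # hypothetical globs from the spec
CHANGED="src/upload.ts scripts/rogue.sh"        # hypothetical changed files
VIOLATIONS=""
for file in ${CHANGED}; do
  ALLOWED=false
  for pattern in ${EDIT_SCOPE}; do
    case "${file}" in (${pattern}) ALLOWED=true ;; esac   # shell glob match
  done
  [ "${ALLOWED}" = true ] || VIOLATIONS="${VIOLATIONS} ${file}"
done
[ -z "${VIOLATIONS}" ] || echo "Edit scope violation:${VIOLATIONS}"
```

Here `src/upload.ts` is inside scope, while `scripts/rogue.sh` is flagged and would trigger the revert.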
249
127
 
250
- **Step 3: Validate edit scope** (if spec declares `edit_scope`):
251
- ```bash
252
- # Get files changed by the agent
253
- CHANGED=$(git diff HEAD~1 --name-only)
254
-
255
- # Load edit_scope from spec (files/globs)
256
- EDIT_SCOPE=$(grep 'edit_scope:' specs/doing-*.md | sed 's/edit_scope://' | tr ',' '\n' | xargs)
257
-
258
- # Check each changed file against allowed scope
259
- for file in ${CHANGED}; do
260
- ALLOWED=false
261
- for pattern in ${EDIT_SCOPE}; do
262
- # Match file against glob pattern
263
- [[ "${file}" == ${pattern} ]] && ALLOWED=true
264
- done
265
- ${ALLOWED} || VIOLATIONS+=("${file}")
266
- done
267
- ```
268
-
269
- - Violations found → revert: `git revert HEAD --no-edit`, report "✗ Edit scope violation: {files}"
270
- - No violations → continue to health checks
271
-
272
- **Step 4: Evaluate**:
273
- - All checks pass AND no scope violations → task succeeds, commit stands
274
- - Any check fails → regression detected → revert: `git revert HEAD --no-edit`
128
+ **Impact completeness check** (if task has Impact block in PLAN.md):
129
+ Compare `git diff HEAD~1 --name-only` against Impact callers/duplicates list.
130
+ File listed but not modified → **advisory warning**: "Impact gap: {file} listed as {caller|duplicate} but not modified — verify manually". Not auto-revert (callers sometimes don't need changes), but flags the risk.
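A sketch of the advisory comparison, with hypothetical stand-ins for the Impact list and the diff output:

```bash
IMPACT_FILES="src/api.ts src/caller.ts"   # hypothetical Impact callers/duplicates
CHANGED="src/api.ts"                      # hypothetical: git diff HEAD~1 --name-only
GAPS=""
for file in ${IMPACT_FILES}; do
  case " ${CHANGED} " in
    *" ${file} "*) ;;                     # modified, nothing to flag
    *) GAPS="${GAPS} ${file}"
       echo "Impact gap: ${file} listed but not modified, verify manually" ;;
  esac
done
```

Only the listed-but-untouched `src/caller.ts` is flagged; no revert happens.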
275
131
 
276
- **Ratchet uses ONLY pre-existing test files** (from `.deepflow/auto-snapshot.txt`). If the agent added new test files that fail, those are excluded from evaluation — the agent's new tests don't influence the ratchet decision.
132
+ **Evaluate:** All pass + no violations → commit stands. Any failure → `git revert HEAD --no-edit`.
277
133
 
278
- **For spike tasks:** Same ratchet. If the spike's code passes pre-existing health checks, the spike passes. No LLM judges another LLM's work.
134
+ Ratchet uses ONLY pre-existing test files from `.deepflow/auto-snapshot.txt`.
279
135
 
280
136
  ### 5.7. PARALLEL SPIKE PROBES
281
137
 
282
- When two or more `[SPIKE]` tasks address the **same problem** (same "Blocked by:" target OR identical or near-identical hypothesis wording), treat them as a probe set and run this protocol instead of the standard single-agent flow.
283
-
284
- #### Detection
285
-
286
- ```
287
- Spike group = all [SPIKE] tasks where:
288
- - same "Blocked by:" value, OR
289
- - problem description is identical after stripping task ID prefix
290
- If group size ≥ 2 → enter parallel probe mode
291
- ```
292
-
293
- #### Step 1: Record baseline commit
294
-
295
- ```bash
296
- cd ${WORKTREE_PATH}
297
- BASELINE=$(git rev-parse HEAD)
298
- echo "Probe baseline: ${BASELINE}"
299
- ```
300
-
301
- All probes branch from this exact commit so they share the same ratchet baseline.
302
-
303
- #### Step 2: Create isolated sub-worktrees
304
-
305
- For each spike `{SPIKE_ID}` in the probe group:
306
-
307
- ```bash
308
- PROBE_BRANCH="df/${SPEC_NAME}/probe-${SPIKE_ID}"
309
- PROBE_PATH=".deepflow/worktrees/${SPEC_NAME}/probe-${SPIKE_ID}"
310
-
311
- git worktree add -b "${PROBE_BRANCH}" "${PROBE_PATH}" "${BASELINE}"
312
- echo "Created probe worktree: ${PROBE_PATH} (branch: ${PROBE_BRANCH})"
313
- ```
314
-
315
- #### Step 3: Spawn all probes in parallel
316
-
317
- Mark every spike task as `in_progress`, then spawn one agent per probe **in a single message** using the Spike Task prompt (section 6), with the probe's worktree path as its working directory.
318
-
319
- ```
320
- TaskUpdate(taskId: native_id_SPIKE_A, status: "in_progress")
321
- TaskUpdate(taskId: native_id_SPIKE_B, status: "in_progress")
322
- [spawn agent for SPIKE_A → PROBE_PATH_A]
323
- [spawn agent for SPIKE_B → PROBE_PATH_B]
324
- ... (all in ONE message)
325
- ```
326
-
327
- End your turn. Do NOT poll or monitor. Wait for completion notifications.
328
-
329
- #### Step 4: Ratchet each probe (on completion notifications)
330
-
331
- When a probe agent's notification arrives, run the standard ratchet (section 5.5) against its dedicated probe worktree:
332
-
333
- ```bash
334
- cd ${PROBE_PATH}
335
-
336
- # Identical health-check commands as standard tasks
337
- # Build → Test → Typecheck → Lint (stop on first failure)
338
- ```
339
-
340
- Record per-probe metrics:
341
-
342
- ```yaml
343
- probe_id: SPIKE_A
344
- worktree: .deepflow/worktrees/{spec}/probe-SPIKE_A
345
- branch: df/{spec}/probe-SPIKE_A
346
- ratchet_passed: true/false
347
- regressions: 0 # failing pre-existing tests
348
- coverage_delta: +3 # new lines covered (positive = better)
349
- files_changed: 4 # number of files touched
350
- commit: abc1234
351
- ```
352
-
353
- Wait until **all** probe notifications have arrived before proceeding to selection.
354
-
355
- #### Step 5: Machine-select winner
356
-
357
- No LLM evaluates another LLM's work. Apply the following ordered criteria to all probes that **passed** the ratchet:
358
-
359
- ```
360
- 1. Fewer regressions (lower is better — hard gate: any regression disqualifies)
361
- 2. Better coverage (higher delta is better)
362
- 3. Fewer files changed (lower is better — smaller blast radius)
363
-
364
- Tie-break: first probe to complete (chronological)
365
- ```
366
-
367
- If **no** probe passes the ratchet, all are failed probes. Log insights (step 7) and reset the spike tasks to `pending` for retry with debugger guidance.
368
-
369
- #### Step 6: Preserve ALL probe worktrees
370
-
371
- Do NOT delete losing probe worktrees. They are preserved for manual inspection and cross-cycle learning:
372
-
373
- ```bash
374
- # Winning probe: leave as-is, will be used as implementation base (step 8)
375
- # Losing probes: leave worktrees intact, mark branches with -failed suffix for clarity
376
- git branch -m "df/{spec}/probe-SPIKE_B" "df/{spec}/probe-SPIKE_B-failed"
377
- ```
378
-
379
- Record all probe paths in `.deepflow/checkpoint.json` under `"spike_probes"` so future `--continue` runs know they exist.
380
-
381
- #### Step 7: Log failed probe insights
382
-
383
- For every probe that failed the ratchet (or lost selection), write two entries to `.deepflow/auto-memory.yaml` in the **main** tree.
384
-
385
- **Entry 1 — `spike_insights` (detailed probe record):**
386
-
387
- ```yaml
388
- spike_insights:
389
- - date: "YYYY-MM-DD"
390
- spec: "{spec_name}"
391
- spike_id: "SPIKE_B"
392
- hypothesis: "{hypothesis text from PLAN.md}"
393
- outcome: "failed" # or "passed-but-lost"
394
- failure_reason: "{first failed check and error summary}"
395
- ratchet_metrics:
396
- regressions: 2
397
- coverage_delta: -1
398
- files_changed: 7
399
- worktree: ".deepflow/worktrees/{spec}/probe-SPIKE_B-failed"
400
- branch: "df/{spec}/probe-SPIKE_B-failed"
401
- edge_cases: [] # orchestrator may populate after manual review
402
- ```
403
-
404
- **Entry 2 — `probe_learnings` (cross-cycle memory, read by `/df:auto-cycle` on each cycle start):**
405
-
406
- ```yaml
407
- probe_learnings:
408
- - spike: "SPIKE_B"
409
- probe: "{probe branch suffix, e.g. probe-SPIKE_B}"
410
- insight: "{one-sentence summary of what the probe revealed, derived from failure_reason}"
411
- ```
412
-
413
- If the file does not exist, create it. Initialize both `spike_insights:` and `probe_learnings:` as empty lists before appending. Preserve all existing keys when merging.
414
-
415
- #### Step 8: Promote winning probe
416
-
417
- Cherry-pick the winner's commit into the shared spec worktree so downstream implementation tasks see the winning approach:
418
-
419
- ```bash
420
- cd ${WORKTREE_PATH} # shared worktree (not the probe sub-worktree)
421
- git cherry-pick ${WINNER_COMMIT}
422
- ```
423
-
424
- Then mark the winning spike task as `completed` and auto-unblock its dependents:
425
-
426
- ```
427
- TaskUpdate(taskId: native_id_SPIKE_WINNER, status: "completed")
428
- TaskUpdate(taskId: native_id_SPIKE_LOSERS, status: "pending") # keep visible for audit
429
- ```
430
-
431
- Update PLAN.md:
432
- - Winning spike → `[x]` with commit hash and `[PROBE_WINNER]` tag
433
- - Losing spikes → `[~]` (skipped) with `[PROBE_FAILED: see auto-memory.yaml]` note
434
-
435
- Resume the standard execution loop (section 9) — implementation tasks blocked by the spike group are now unblocked.
138
+ Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothesis.
139
+
140
+ 1. **Baseline:** Record `BASELINE=$(git rev-parse HEAD)` in shared worktree
141
+ 2. **Sub-worktrees:** Per spike: `git worktree add -b df/{spec}/probe-{SPIKE_ID} .deepflow/worktrees/{spec}/probe-{SPIKE_ID} ${BASELINE}`
142
+ 3. **Spawn:** All probes in ONE message, each targeting its probe worktree. End turn.
143
+ 4. **Ratchet:** Per notification, run standard ratchet (5.5) in probe worktree. Record: ratchet_passed, regressions, coverage_delta, files_changed, commit
144
+ 5. **Select winner** (after ALL complete, no LLM judge):
145
+ - Disqualify any with regressions
146
+ - Rank: fewer regressions > higher coverage_delta > fewer files_changed > first to complete
147
+ - No passes → reset all to pending for retry with debugger
148
+ 6. **Preserve all worktrees.** Losers: rename branch + `-failed` suffix. Record in checkpoint.json under `"spike_probes"`
149
+ 7. **Log failed probes** to `.deepflow/auto-memory.yaml` (main tree):
150
+ ```yaml
151
+ spike_insights:
152
+ - date: "YYYY-MM-DD"
153
+ spec: "{spec_name}"
154
+ spike_id: "SPIKE_B"
155
+ hypothesis: "{from PLAN.md}"
156
+ outcome: "failed" # or "passed-but-lost"
157
+ failure_reason: "{first failed check + error summary}"
158
+ ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
159
+ worktree: ".deepflow/worktrees/{spec}/probe-SPIKE_B-failed"
160
+ branch: "df/{spec}/probe-SPIKE_B-failed"
161
+ probe_learnings: # read by /df:auto-cycle each start
162
+ - spike: "SPIKE_B"
163
+ probe: "probe-SPIKE_B"
164
+ insight: "{one-sentence summary from failure_reason}"
165
+ ```
166
+ Create file if missing. Preserve existing keys when merging.
167
+ 8. **Promote winner:** Cherry-pick into shared worktree. Winner → `[x] [PROBE_WINNER]`, losers → `[~] [PROBE_FAILED]`. Resume standard loop.
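The ranking in step 5 can be sketched as a plain sort over the recorded metrics; the probe rows below are illustrative values, not real output:

```bash
# columns: probe_id regressions coverage_delta files_changed completion_order
WINNER=$(printf '%s\n' \
    "SPIKE_A 0 3 4 2" \
    "SPIKE_B 0 3 7 1" \
    "SPIKE_C 1 5 2 3" |
  awk '$2 == 0' |                     # hard gate: any regression disqualifies
  sort -k3,3nr -k4,4n -k5,5n |        # coverage desc, then files asc, then completion asc
  head -n 1 | awk '{print $1}')
echo "winner: ${WINNER}"
```

SPIKE_C is disqualified by its regression; SPIKE_A beats SPIKE_B on blast radius (4 vs 7 files) despite finishing later.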
436
168
 
437
169
  ---
438
170
 
@@ -444,143 +176,127 @@ Working directory: {worktree_absolute_path}
444
176
  All file operations MUST use this absolute path as base. Do NOT write files to the main project directory.
445
177
  Commit format: {commit_type}({spec}): {description}
446
178
 
447
- STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main. These are handled by the orchestrator and /df:verify.
179
+ STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main.
448
180
  ```
449
181
 
450
- **Standard Task (append after preamble):**
182
+ **Standard Task:**
451
183
  ```
452
184
  {task_id}: {description from PLAN.md}
453
- Files: {target files}
454
- Spec: {spec_name}
185
+ Files: {target files} | Spec: {spec_name}
186
+ {Impact block from PLAN.md — include verbatim if present}
187
+
188
+ {Prior failure context — include ONLY if task was previously reverted. Read from .deepflow/auto-memory.yaml revert_history for this task_id:}
189
+ Previous attempts (DO NOT repeat these approaches):
190
+ - Cycle {N}: reverted — "{reason from revert_history}"
191
+ - Cycle {N}: reverted — "{reason from revert_history}"
192
+ {Omit this entire block if task has no revert history.}
193
+
194
+ CRITICAL: If Impact lists duplicates or callers, you MUST verify each one is consistent with your changes.
195
+ - [active] duplicates → consolidate into single source of truth (e.g., local generateYAML → use shared buildConfigData)
196
+ - [dead] duplicates → DELETE the dead code entirely. Dead code pollutes context and causes drift.
455
197
 
456
198
  Steps:
457
- 1. If the task involves external APIs/SDKs, run: chub search "<library>" --json → chub get <id> --lang <lang>
458
- Use fetched docs as ground truth for API signatures. Annotate any gaps: chub annotate <id> "note"
459
- Skip this step if chub is not installed or the task only touches internal code.
460
- 2. Implement the task
461
- 3. Commit as feat({spec}): {description}
199
+ 1. External APIs/SDKs → chub search "<library>" --json → chub get <id> --lang <lang> (skip if chub unavailable or internal code only)
200
+ 2. Read ALL files in Impact before implementing → understand the full picture
201
+ 3. Implement the task, updating all impacted files
202
+ 4. Commit as feat({spec}): {description}
462
203
 
463
- Your ONLY job is to write code and commit. The orchestrator will run health checks after you finish.
204
+ Your ONLY job is to write code and commit. Orchestrator runs health checks after.
464
205
  ```
465
206
 
466
- **Bootstrap Task (append after preamble):**
207
+ **Bootstrap Task:**
467
208
  ```
468
209
  BOOTSTRAP: Write tests for files in edit_scope
469
- Files: {edit_scope files from spec}
470
- Spec: {spec_name}
471
-
472
- Steps:
473
- 1. Write tests that cover the functionality of the files listed above
474
- 2. Do NOT change implementation files — tests only
475
- 3. Commit as test({spec}): bootstrap tests for edit_scope
210
+ Files: {edit_scope files} | Spec: {spec_name}
476
211
 
477
- Your ONLY job is to write tests and commit. The orchestrator will run health checks after you finish.
212
+ Write tests covering listed files. Do NOT change implementation files.
213
+ Commit as test({spec}): bootstrap tests for edit_scope
478
214
  ```

- **Spike Task (append after preamble):**
+ **Spike Task:**
  ```
  {task_id} [SPIKE]: {hypothesis}
- Files: {target files}
- Spec: {spec_name}
+ Files: {target files} — Spec: {spec_name}

- Steps:
- 1. Implement the minimal spike to validate the hypothesis
- 2. Commit as spike({spec}): {description}
+ {Prior failure context — include ONLY if this spike was previously reverted. Read from .deepflow/auto-memory.yaml revert_history + spike_insights for this task_id:}
+ Previous attempts (DO NOT repeat these approaches):
+ - Cycle {N}: reverted — "{reason}"
+ {Omit this entire block if no revert history.}

- Your ONLY job is to write code and commit. The orchestrator will run health checks to determine if the spike passes.
+ Implement minimal spike to validate hypothesis.
+ Commit as spike({spec}): {description}
  ```
 
- ### 7. FAILURE HANDLING
+ ### 8. COMPLETE SPECS

- When a task fails ratchet and is reverted:
+ When all tasks done for a `doing-*` spec:
+ 1. Run `/df:verify doing-{name}` via the Skill tool (`skill: "df:verify", args: "doing-{name}"`)
+    - Verify runs quality gates (L0-L4), merges worktree branch to main, cleans up worktree, renames spec `doing-*` → `done-*`, and extracts decisions
+    - If verify fails (adds fix tasks): stop here — `/df:execute --continue` will pick up the fix tasks
+    - If verify passes: proceed to step 2
+ 2. Remove spec's ENTIRE section from PLAN.md (header, tasks, summaries, fix tasks, separators)
+ 3. Recalculate Summary table at top of PLAN.md
 
- `TaskUpdate(taskId: native_id, status: "pending")` — keeps task visible for retry; dependents remain blocked.
+ ---
 
- On repeated failure: spawn `Task(subagent_type="reasoner", model={model from debugger frontmatter, default "sonnet"}, prompt="Debug failure: {ratchet output}")`.
+ ## Usage
+
+ ```
+ /df:execute              # Execute all ready tasks
+ /df:execute T1 T2        # Specific tasks only
+ /df:execute --continue   # Resume from checkpoint
+ /df:execute --fresh      # Ignore checkpoint
+ /df:execute --dry-run    # Show plan only
+ ```
 
- Leave worktree intact, keep checkpoint.json, output: worktree path/branch, `cd {worktree_path}` to investigate, `/df:execute --continue` to resume, `/df:execute --fresh` to discard.
+ ## Skills & Agents

- ### 8. COMPLETE SPECS
+ - Skill: `atomic-commits` — Clean commit protocol
+ - Skill: `context-hub` — Fetch external API docs before coding
 
- When all tasks done for a `doing-*` spec:
- 1. Embed history in spec: `## Completed` section with task list and commit hashes
- 2. Rename: `doing-upload.md` → `done-upload.md`
- 3. Extract decisions from done-* spec: Read the `done-{name}.md` file. Model-extract architectural decisions — look for explicit choices (→ `[APPROACH]`), unvalidated assumptions (→ `[ASSUMPTION]`), and "for now" decisions (→ `[PROVISIONAL]`). Append as a new section to **main tree** `.deepflow/decisions.md`:
- ```
- ### {YYYY-MM-DD} — {spec-name}
- - [TAG] decision text — rationale
- ```
- After successful append, delete `specs/done-{name}.md`. If write fails, preserve the file.
- 4. Remove the spec's ENTIRE section from PLAN.md:
-    - The `### doing-{spec}` header
-    - All task entries (`- [x] **T{n}**: ...` and their sub-items)
-    - Any `## Execution Summary` block for that spec
-    - Any `### Fix Tasks` sub-section for that spec
-    - Separators (`---`) between removed sections
- 5. Recalculate the Summary table at the top of PLAN.md (update counts for completed/pending)
+ | Agent | subagent_type | Purpose |
+ |-------|---------------|---------|
+ | Implementation | `general-purpose` | Task implementation |
+ | Debugger | `reasoner` | Debugging failures |

- ### 9. ITERATE (Notification-Driven)
+ **Model routing:** Use `model:` from command/agent/skill frontmatter. Default: `sonnet`.
 
- After spawning wave agents, your turn ENDS. Completion notifications drive the loop.
+ **Checkpoint schema:** `.deepflow/checkpoint.json` in worktree:
+ ```json
+ {"completed_tasks": ["T1","T2"], "current_wave": 2, "worktree_path": ".deepflow/worktrees/upload", "worktree_branch": "df/upload"}
+ ```
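A resume guard can sanity-check this schema before `--continue` trusts it. A minimal sketch — the validator itself is not part of the deepflow API, but the field names match the schema above:

```javascript
// Minimal sketch (not part of the deepflow API): validate a checkpoint.json
// payload before --continue resumes from it.
function isCheckpoint(c) {
  return (
    c !== null &&
    typeof c === "object" &&
    Array.isArray(c.completed_tasks) &&
    c.completed_tasks.every((t) => typeof t === "string") &&
    Number.isInteger(c.current_wave) &&
    typeof c.worktree_path === "string" &&
    typeof c.worktree_branch === "string"
  );
}

const raw =
  '{"completed_tasks":["T1","T2"],"current_wave":2,' +
  '"worktree_path":".deepflow/worktrees/upload","worktree_branch":"df/upload"}';
console.log(isCheckpoint(JSON.parse(raw))); // → true
console.log(isCheckpoint({ completed_tasks: "T1" })); // → false
```

A checkpoint that fails this shape check should be treated as `--fresh` rather than resumed.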
 
- **Per notification:**
- 1. Run ratchet check for the completed agent (see section 5.5)
- 2. Ratchet passed → `TaskUpdate(taskId: native_id, status: "completed")` — auto-unblocks dependent tasks
- 3. Ratchet failed → revert commit, `TaskUpdate(taskId: native_id, status: "pending")`
- 4. Update PLAN.md: `[ ]` → `[x]` + commit hash (on pass) or note revert (on fail)
- 5. Report: "✓ T1: ratchet passed (abc123)" or "✗ T1: ratchet failed, reverted"
- 6. If NOT all wave agents done → end turn, wait
- 7. If ALL wave agents done → use TaskList to find newly unblocked tasks, check context, spawn next wave or finish
+ ---
 
- **Between waves:** Check context %. If ≥50%, checkpoint and exit.
+ ## Failure Handling
 
- **Repeat** until: all done, all blocked, or context ≥50% (checkpoint).
+ When task fails ratchet and is reverted:
+ - `TaskUpdate(taskId: native_id, status: "pending")` — dependents remain blocked
+ - Repeated failure → spawn `Task(subagent_type="reasoner", prompt="Debug failure: {ratchet output}")`
+ - Leave worktree intact, keep checkpoint.json
+ - Output: worktree path/branch, `cd {path}` to investigate, `--continue` to resume, `--fresh` to discard
 
  ## Rules

  | Rule | Detail |
  |------|--------|
- | Zero test files → bootstrap first | Section 1.7; bootstrap is the cycle's sole task when snapshot is empty |
+ | Zero test files → bootstrap first | Bootstrap is cycle's sole task when snapshot empty |
  | 1 task = 1 agent = 1 commit | `atomic-commits` skill |
  | 1 file = 1 writer | Sequential if conflict |
  | Agent writes code, orchestrator measures | Ratchet is the judge |
  | No LLM evaluates LLM work | Health checks only |
- | ≥2 spikes for same problem → parallel probes | Section 5.7; never run competing spikes sequentially |
- | All probe worktrees preserved | Losing probes renamed with `-failed` suffix; never deleted |
- | Machine-selected winner | Fewer regressions > better coverage > fewer files changed; no LLM judge |
- | Failed probe insights logged | `.deepflow/auto-memory.yaml` in main tree; persists across cycles |
- | Winner cherry-picked to shared worktree | Downstream tasks see winning approach via shared worktree |
- | External APIs → chub first | Agents fetch curated docs before implementing external API calls; skip if chub unavailable |
+ | ≥2 spikes same problem → parallel probes | Never run competing spikes sequentially |
+ | All probe worktrees preserved | Losers renamed `-failed`; never deleted |
+ | Machine-selected winner | Fewer regressions > better coverage > fewer files changed; no LLM judge |
+ | External APIs → chub first | Skip if unavailable |
 
  ## Example

- ### No-Tests Bootstrap
-
- ```
- /df:execute (context: 8%)
-
- Loading PLAN.md... T1 ready, T2/T3 blocked by T1
- Ratchet snapshot: 0 pre-existing test files
- Bootstrap needed: no pre-existing test files found.
-
- Spawning bootstrap agent for edit_scope...
- [Bootstrap agent completed]
- Running ratchet: build ✓ | tests ✓ (12 new tests pass)
- ✓ Bootstrap: ratchet passed (boo1234)
- Re-taking ratchet snapshot: 3 test files
-
- bootstrap: completed — cycle's sole task was test bootstrap
- Next: Run /df:auto-cycle again to execute T1
- ```
-
- ### Standard Execution
-
  ```
  /df:execute (context: 12%)

  Loading PLAN.md... T1 ready, T2/T3 blocked by T1
  Ratchet snapshot: 24 pre-existing test files
- Registering native tasks: TaskCreate T1/T2/T3, TaskUpdate(T2 blockedBy T1), TaskUpdate(T3 blockedBy T1)

  Wave 1: TaskUpdate(T1, in_progress)
  [Agent "T1" completed]
@@ -589,43 +305,13 @@ Wave 1: TaskUpdate(T1, in_progress)
  TaskUpdate(T1, completed) → auto-unblocks T2, T3

  Wave 2: TaskUpdate(T2/T3, in_progress)
- [Agent "T2" completed]
- Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
- ✓ T2: ratchet passed (def5678)
- [Agent "T3" completed]
- Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
- ✓ T3: ratchet passed (ghi9012)
-
- Context: 35% — doing-upload → done-upload. Complete: 3/3
-
- Next: Run /df:verify to verify specs and merge to main
- ```
-
- ### Ratchet Failure (Regression Detected)
-
- ```
- /df:execute (context: 10%)
-
- Wave 1: TaskUpdate(T1, in_progress)
- [Agent "T1" completed]
- Running ratchet: build ✓ | tests ✗ (2 failed of 24)
- ✗ T1: ratchet failed, reverted
- TaskUpdate(T1, pending)
-
- Spawning debugger for T1...
- [Debugger completed]
- Re-running T1 with fix guidance...
-
- [Agent "T1 retry" completed]
- Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
- ✓ T1: ratchet passed (abc1234)
- ```
-
- ### With Checkpoint
-
- ```
- Wave 1 complete (context: 52%)
- Checkpoint saved.
-
- Next: Run /df:execute --continue to resume execution
+ [Agent "T2" completed] ✓ T2: ratchet passed (def5678)
+ [Agent "T3" completed] ✓ T3: ratchet passed (ghi9012)
+
+ Context: 35% — All tasks done for doing-upload.
+ Running /df:verify doing-upload...
+ ✓ L0 | ✓ L1 (3/3 files) | ⚠ L2 (no coverage tool) | ✓ L4 (24 tests)
+ ✓ Merged df/upload to main
+ Spec complete: doing-upload → done-upload
+ Complete: 3/3
  ```
@@ -3,7 +3,7 @@
  ## Purpose
  Compare specs against codebase and past experiments. Generate prioritized tasks.

- **NEVER:** use EnterPlanMode, use ExitPlanMode — this command IS the planning phase; native plan mode conflicts with it
+ **NEVER:** use EnterPlanMode, use ExitPlanMode — this command IS the planning phase

  ## Usage
  ```
@@ -17,71 +17,50 @@ Compare specs against codebase and past experiments. Generate prioritized tasks.

  ## Spec File States

- | Prefix | State | Action |
- |--------|-------|--------|
- | (none) | New | Plan this |
- | `doing-` | In progress | Skip |
- | `done-` | Completed | Skip |
+ | Prefix | Action |
+ |--------|--------|
+ | (none) | Plan this |
+ | `doing-` | Skip |
+ | `done-` | Skip |

  ## Behavior

  ### 1. LOAD CONTEXT

  ```
- Load:
-   - specs/*.md EXCLUDING doing-* and done-* (only new specs)
-   - PLAN.md (if exists, for appending)
-   - .deepflow/config.yaml (if exists)
-
+ Load: specs/*.md (exclude doing-*/done-*), PLAN.md (if exists), .deepflow/config.yaml
  Determine source_dir from config or default to src/
  ```
 
- Run `validateSpec` on each loaded spec. Hard failures → skip that spec entirely and emit an error line. Advisory warnings → include them in plan output.
-
- If no new specs: report counts, suggest `/df:execute`.
+ Run `validateSpec` on each spec. Hard failures → skip + error. Advisory warnings → include in output.
+ No new specs → report counts, suggest `/df:execute`.
 
  ### 2. CHECK PAST EXPERIMENTS (SPIKE-FIRST)

  **CRITICAL**: Check experiments BEFORE generating any tasks.

- Extract topic from spec name (fuzzy match), then:
-
  ```
  Glob .deepflow/experiments/{topic}--*
  ```

- **Experiment file naming:** `{topic}--{hypothesis}--{status}.md`
- Statuses: `active`, `passed`, `failed`
+ File naming: `{topic}--{hypothesis}--{status}.md` (active/passed/failed)
 
  | Result | Action |
  |--------|--------|
- | `--failed.md` exists | Extract "next hypothesis" from Conclusion section |
- | `--passed.md` exists | Reference as validated pattern, can proceed to full implementation |
- | `--active.md` exists | Wait for experiment completion before planning |
- | No matches | New topic, needs initial spike |
+ | `--failed.md` | Extract "next hypothesis" from Conclusion, generate spike |
+ | `--passed.md` | Proceed to full implementation |
+ | `--active.md` | Wait for completion |
+ | No matches | New topic, generate initial spike |
 
- **Spike-First Rule**:
-   - If `--failed.md` exists: Generate spike task to test the next hypothesis (from failed experiment's Conclusion)
-   - If no experiments exist: Generate spike task for the core hypothesis
-   - Full implementation tasks are BLOCKED until a spike validates the approach
-   - Only proceed to full task generation after `--passed.md` exists
-
- See: `templates/experiment-template.md` for experiment format
+ Full implementation tasks BLOCKED until spike validates. See `templates/experiment-template.md`.
 
  ### 3. DETECT PROJECT CONTEXT

- For existing codebases, identify:
-   - Code style/conventions
-   - Existing patterns (error handling, API structure)
-   - Integration points
-
- Include patterns in task descriptions for agents to follow.
+ Identify code style, patterns (error handling, API structure), integration points. Include in task descriptions.

  ### 4. ANALYZE CODEBASE

- Follow `templates/explore-agent.md` for spawn rules, prompt structure, and scope restrictions.
-
- Scale agent count based on codebase size:
+ Follow `templates/explore-agent.md` for spawn rules and scope.

  | File Count | Agents |
  |------------|--------|
@@ -90,125 +69,111 @@ Scale agent count based on codebase size:
  | 100-500 | 25-40 |
  | 500+ | 50-100 (cap) |

- **Use `code-completeness` skill patterns** to search for:
-   - Implementations matching spec requirements
-   - TODO, FIXME, HACK comments
-   - Stub functions, placeholder returns
-   - Skipped tests, incomplete coverage
+ Use `code-completeness` skill to search for: implementations matching spec requirements, TODOs/FIXMEs/HACKs, stubs, skipped tests.

- ### 5. COMPARE & PRIORITIZE
+ ### 4.5. IMPACT ANALYSIS (per planned file)

- Spawn `Task(subagent_type="reasoner", model="opus")`. Reasoner maps each requirement to DONE / PARTIAL / MISSING / CONFLICT. Flag spec gaps; don't silently assume.
+ For each file in a task's "Files:" list, find the full blast radius.

- Check spec health: verify REQ-AC alignment, requirement clarity, and completeness. Note any issues (orphan ACs, vague requirements) in plan output.
+ **Search for:**

- **Priority order:** Dependencies → Impact → Risk
+ 1. **Callers:** `grep -r "{exported_function}" --include="*.{ext}" -l` — files that import/call what's being changed
+ 2. **Duplicates:** Files with similar logic (same function name, same transformation). Classify:
+    - `[active]` — used in production → must consolidate
+    - `[dead]` — bypassed/unreachable → must delete
+ 3. **Data flow:** If file produces/transforms data, find ALL consumers of that shape across languages

- ### 6. GENERATE SPIKE TASKS (IF NEEDED)
+ **Embed as `Impact:` block in each task:**
+ ```markdown
+ - [ ] **T2**: Add new features to YAML export
+   - Files: src/utils/buildConfigData.ts
+   - Impact:
+     - Callers: src/routes/index.ts:12, src/api/handler.ts:45
+     - Duplicates:
+       - src/components/YamlViewer.tsx:19 (own generateYAML) [active — consolidate]
+       - backend/yaml_gen.go (generateYAMLFromConfig) [dead — DELETE]
+     - Data flow: buildConfigData → YamlViewer, SimControls, RoleplayPage
+   - Blocked by: T1
+ ```
+
+ Files outside original "Files:" → add with `(impact — verify/update)`.
+ Skip for spike tasks.
+
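The caller-discovery step can be sketched in-memory. A hedged illustration — file names and contents below are invented, and the real step shells out to `grep -r` across the repo rather than scanning strings:

```javascript
// Illustrative sketch of caller discovery: list files whose source references
// an exported symbol. Real runs use grep -r with --include filters instead.
function findCallers(files, symbol) {
  return Object.entries(files)
    .filter(([, src]) => src.includes(symbol))
    .map(([name]) => name);
}

const files = {
  "src/routes/index.ts": 'import { buildConfigData } from "../utils/buildConfigData";',
  "src/api/handler.ts": "const yaml = buildConfigData(config);",
  "src/unrelated.ts": "export const x = 1;",
};
console.log(findCallers(files, "buildConfigData"));
// logs the two files that reference the symbol
```

Every file this step surfaces goes into the task's `Callers:` line so the implementing agent reads it before editing.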
+ ### 5. COMPARE & PRIORITIZE
+
+ Spawn `Task(subagent_type="reasoner", model="opus")`. Map each requirement to DONE / PARTIAL / MISSING / CONFLICT. Check REQ-AC alignment. Flag spec gaps.

- **When to generate spike tasks:**
- 1. Failed experiment exists → Test the next hypothesis
- 2. No experiments exist → Test the core hypothesis
- 3. Passed experiment exists → Skip to full implementation
+ Priority: Dependencies → Impact → Risk
+
+ ### 6. GENERATE SPIKE TASKS (IF NEEDED)
 
  **Spike Task Format:**
  ```markdown
  - [ ] **T1** [SPIKE]: Validate {hypothesis}
    - Type: spike
    - Hypothesis: {what we're testing}
-   - Method: {minimal steps to validate}
-   - Success criteria: {how to know it passed}
+   - Method: {minimal steps}
+   - Success criteria: {measurable}
    - Time-box: 30 min
    - Files: .deepflow/experiments/{topic}--{hypothesis}--{status}.md
    - Blocked by: none
  ```
 
- **Blocking Logic:** All implementation tasks MUST have `Blocked by: T{spike}` until spike passes. If spike fails: update to `--failed.md`, DO NOT generate implementation tasks.
+ All implementation tasks MUST be `Blocked by: T{spike}`. Spike fails → `--failed.md`, no implementation tasks.
 
  #### Probe Diversity

- When generating multiple spike probes for the same problem, diversity is required to avoid confirmation bias and enable discovery of unexpected solutions.
+ When generating multiple spikes for the same problem:

  | Requirement | Rule |
  |-------------|------|
- | Contradictory | At least 2 probes must use opposing/contradictory approaches (e.g., streaming vs buffering, in-process vs external) |
- | Naive | At least 1 probe must be a naive/simple approach without prior technical justification — enables exaptation (discovering unexpected solutions) |
- | Parallel | All probes for the same problem run simultaneously, not sequentially |
- | Scoped | Each probe is minimal — just enough to validate the hypothesis |
- | Safe to fail | Each probe runs in its own worktree; failure has zero impact on main |
+ | Contradictory | ≥2 probes with opposing approaches |
+ | Naive | ≥1 probe without prior technical justification |
+ | Parallel | All run simultaneously |
+ | Scoped | Minimal — just enough to validate |
 
- **Diversity validation step** — before outputting spike tasks, verify:
- 1. Are there at least 2 probes with opposing assumptions? If not, add a contradictory probe.
- 2. Is there at least 1 naive probe with no prior technical justification? If not, add one.
- 3. Are all probes independent (no probe depends on another probe's result)?
-
- **Example — 3 diverse probes for a caching problem:**
+ Before output, verify: ≥2 opposing probes, ≥1 naive, all independent.

+ **Example — caching problem, 3 diverse probes:**
  ```markdown
  ```markdown
  - [ ] **T1** [SPIKE]: Validate in-memory LRU cache
-   - Type: spike
    - Role: Contradictory-A (in-process)
-   - Hypothesis: In-memory LRU cache reduces DB queries by ≥80%
-   - Method: Implement LRU with 1000-item cap, run load test
-   - Success criteria: DB query count drops ≥80% under 100 concurrent users
-   - Blocked by: none
+   - Hypothesis: In-memory LRU reduces DB queries by ≥80%
+   - Method: LRU with 1000-item cap, load test
+   - Success criteria: DB queries drop ≥80% under 100 concurrent users

  - [ ] **T2** [SPIKE]: Validate Redis distributed cache
-   - Type: spike
    - Role: Contradictory-B (external, opposing T1)
-   - Hypothesis: Redis cache scales across multiple instances
-   - Method: Add Redis client, cache top 10 queries, same load test
-   - Success criteria: DB queries drop ≥80%, works across 2 app instances
+   - Hypothesis: Redis scales across multiple instances
+   - Method: Redis client, cache top 10 queries, same load test
+   - Success criteria: DB queries drop ≥80%, works across 2 instances

- - [ ] **T3** [SPIKE]: Validate query optimization without cache (naive)
-   - Type: spike
+ - [ ] **T3** [SPIKE]: Validate query optimization without cache
    - Role: Naive (no prior justification — tests if caching is even necessary)
-   - Hypothesis: Indexes + query batching alone may be sufficient
-   - Method: Add missing indexes, batch N+1 queries, same load test — no cache
+   - Hypothesis: Indexes + query batching alone may suffice
+   - Method: Add indexes, batch N+1 queries, same load test — no cache
    - Success criteria: DB queries drop ≥80% with zero cache infrastructure
-   - Blocked by: none
  ```
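The diversity check above is mechanical. A hedged sketch — the probe object shape here is invented for illustration, not a deepflow data structure:

```javascript
// Sketch of the probe-diversity check: ≥2 contradictory probes, ≥1 naive,
// all independent (no probe blocked by another). Probe shape is illustrative.
function isDiverse(probes) {
  const contradictory = probes.filter((p) => p.role.startsWith("Contradictory")).length;
  const naive = probes.filter((p) => p.role === "Naive").length;
  const independent = probes.every((p) => p.blockedBy.length === 0);
  return contradictory >= 2 && naive >= 1 && independent;
}

const probes = [
  { id: "T1", role: "Contradictory-A", blockedBy: [] },
  { id: "T2", role: "Contradictory-B", blockedBy: [] },
  { id: "T3", role: "Naive", blockedBy: [] },
];
console.log(isDiverse(probes)); // → true
console.log(isDiverse(probes.slice(0, 2))); // → false (no naive probe)
```

A probe set that fails the check gets a contradictory or naive probe added before output.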
 
  ### 7. VALIDATE HYPOTHESES

- For unfamiliar APIs, ambiguous approaches, or performance-critical work: prototype in scratchpad (not committed). If assumption fails, write `.deepflow/experiments/{topic}--{hypothesis}--failed.md`. Skip for well-known patterns/simple CRUD.
+ Unfamiliar APIs or performance-critical work → prototype in scratchpad. Fails → write `--failed.md`. Skip for known patterns.
 
  ### 8. CLEANUP PLAN.md

- Before writing new tasks, prune stale sections:
-
- ```
- For each ### section in PLAN.md:
-   Extract spec name from header (e.g. "doing-upload" or "done-upload")
-   If specs/done-{name}.md exists:
-     → Remove the ENTIRE section: header, tasks, execution summary, fix tasks, separators
-   If header references a spec with no matching specs/doing-*.md or specs/done-*.md:
-     → Remove it (orphaned section)
- ```
-
- Also recalculate the Summary table (specs analyzed, tasks created/completed/pending) to reflect only remaining sections.
-
- If PLAN.md becomes empty after cleanup, delete the file and recreate fresh.
-
- ### 9. OUTPUT PLAN.md
+ Prune stale sections: remove `done-*` sections and orphaned headers. Recalculate Summary table. Empty → recreate fresh.
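The prune step can be sketched as a pure function over the PLAN.md text. The section layout is assumed from the examples in this file; deepflow itself does this edit in place:

```javascript
// Hedged sketch of the prune step: drop any "### doing-X" / "### done-X"
// section whose spec now has a specs/done-X.md file. Layout is assumed.
function prunePlan(plan, doneSpecs) {
  return plan
    .split(/(?=^### )/m) // split into preamble + "### ..." sections
    .filter((section) => {
      const m = section.match(/^### (?:doing-|done-)?(\S+)/);
      return !(m && doneSpecs.includes(m[1]));
    })
    .join("");
}

const plan = "# Plan\n### doing-upload\n- [x] T1\n### doing-auth\n- [ ] T1\n";
console.log(prunePlan(plan, ["upload"]));
// keeps "# Plan" and the doing-auth section; drops doing-upload
```

The Summary recount then runs over whatever sections survive the filter.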
 
- Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validation findings.
+ ### 9. OUTPUT & RENAME

- ### 10. RENAME SPECS
+ Append tasks grouped by `### doing-{spec-name}`. Rename `specs/feature.md` → `specs/doing-feature.md`.

- `mv specs/feature.md specs/doing-feature.md`
-
- ### 11. REPORT
-
- `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`
+ Report: `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`
 
  ## Rules
- - **Spike-first** — Generate spike task before full implementation if no `--passed.md` experiment exists
- - **Block on spike** — Full implementation tasks MUST be blocked by spike validation
- - **Learn from failures** — Extract "next hypothesis" from failed experiments, never repeat same approach
+ - **Spike-first** — No `--passed.md` → spike before implementation
+ - **Block on spike** — Implementation tasks blocked until spike validates
+ - **Learn from failures** — Extract next hypothesis, never repeat approach
  - **Plan only** — Do NOT implement (except quick validation prototypes)
- - **Confirm before assume** — Search code before marking "missing"
  - **One task = one logical unit** — Atomic, committable
  - Prefer existing utilities over new code; flag spec gaps
 
@@ -216,74 +181,31 @@ Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validatio

  | Agent | Model | Base | Scale |
  |-------|-------|------|-------|
- | Explore (search) | haiku | 10 | +1 per 20 files |
- | Reasoner (analyze) | opus | 5 | +1 per 2 specs |
+ | Explore | haiku | 10 | +1 per 20 files |
+ | Reasoner | opus | 5 | +1 per 2 specs |

- Always use the `Task` tool with explicit `subagent_type` and `model`. Do NOT use Glob/Grep/Read directly.
+ Always use `Task` tool with explicit `subagent_type` and `model`.
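The base + scale columns translate directly to formulas. A sketch — the 100-agent cap on explore is taken from the file-count table in section 4; capping reasoner is my assumption and is left out:

```javascript
// Sketch of the spawn-count formulas from the table above.
// Explore: base 10, +1 per 20 files, capped at 100 (cap from section 4).
function exploreCount(fileCount) {
  return Math.min(10 + Math.floor(fileCount / 20), 100);
}
// Reasoner: base 5, +1 per 2 specs (uncapped here).
function reasonerCount(specCount) {
  return 5 + Math.floor(specCount / 2);
}

console.log(exploreCount(200)); // → 20
console.log(reasonerCount(4)); // → 7
```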

  ## Example

- ### Spike-First (No Prior Experiments)
-
  ```markdown
- # Plan
-
  ### doing-upload

  - [ ] **T1** [SPIKE]: Validate streaming upload approach
    - Type: spike
-   - Hypothesis: Streaming uploads will handle files >1GB without memory issues
-   - Method: Create minimal endpoint, upload 2GB file, measure memory
-   - Success criteria: Memory stays under 500MB during upload
-   - Time-box: 30 min
+   - Hypothesis: Streaming uploads handle >1GB without memory issues
+   - Success criteria: Memory <500MB during 2GB upload
    - Files: .deepflow/experiments/upload--streaming--active.md
    - Blocked by: none

  - [ ] **T2**: Create upload endpoint
    - Files: src/api/upload.ts
-   - Blocked by: T1 (spike must pass)
+   - Impact:
+     - Callers: src/routes/index.ts:5
+     - Duplicates: backend/legacy-upload.go [dead — DELETE]
+   - Blocked by: T1

  - [ ] **T3**: Add S3 service with streaming
    - Files: src/services/storage.ts
-   - Blocked by: T1 (spike must pass), T2
- ```
-
- ### Spike-First (After Failed Experiment)
-
- ```markdown
- # Plan
-
- ### doing-upload
-
- - [ ] **T1** [SPIKE]: Validate chunked upload with backpressure
-   - Type: spike
-   - Hypothesis: Adding backpressure control will prevent buffer overflow
-   - Method: Implement pause/resume on buffer threshold, test with 2GB file
-   - Success criteria: No memory spikes above 500MB
-   - Time-box: 30 min
-   - Files: .deepflow/experiments/upload--chunked-backpressure--active.md
-   - Blocked by: none
-   - Note: Previous approach failed (see upload--buffer-upload--failed.md)
-
- - [ ] **T2**: Implement chunked upload endpoint
-   - Files: src/api/upload.ts
-   - Blocked by: T1 (spike must pass)
- ```
-
- ### After Spike Validates (Full Implementation)
-
- ```markdown
- # Plan
-
- ### doing-upload
-
- - [ ] **T1**: Create upload endpoint
-   - Files: src/api/upload.ts
-   - Blocked by: none
-   - Note: Use streaming (validated in upload--streaming--passed.md)
-
- - [ ] **T2**: Add S3 service with streaming
-   - Files: src/services/storage.ts
-   - Blocked by: T1
-   - Avoid: Direct buffer upload failed (see upload--buffer-upload--failed.md)
+   - Blocked by: T1, T2
  ```