deepflow 0.1.82 → 0.1.84

package/README.md CHANGED
@@ -186,6 +186,20 @@ your-project/
186
186
  6. **Atomic commits** — One task = one commit
187
187
  7. **Context-aware** — Checkpoint before limits, resume seamlessly
188
188
 
189
+ ## Why This Architecture Works
190
+
191
+ Deepflow's design isn't opinionated — it's a direct response to measured LLM limitations:
192
+
193
+ **Focused tasks > giant context** — LLMs lose ~2% effectiveness per 100K additional tokens, even on trivial tasks ([Chroma "Context Rot", 2025](https://research.trychroma.com/context-rot), 18 models tested). Deepflow keeps each task's context minimal and focused instead of loading the entire codebase.
194
+
195
+ **Tool use > context stuffing** — Information in the middle of context has up to 40% less recall than at the start/end ([Lost in the Middle, 2023](https://arxiv.org/abs/2307.03172)). Agents access code on-demand via LSP (`findReferences`, `incomingCalls`) and grep — always fresh, no attention dilution.
196
+
197
+ **Model routing > one-size-fits-all** — Mechanical tasks with cheap models (haiku), complex tasks with powerful models (opus). Fewer tokens per task = less degradation = better results. Effort-aware context budgets strip unnecessary sections from prompts for simpler tasks.
198
+
199
+ **Prompt order follows attention** — Execute prompts follow the attention U-curve: critical instructions (task definition, failure history, success criteria) at start and end, navigable data (impact analysis, dependency context) in the middle. Distractors eliminated by design.
200
+
201
+ **LSP-powered impact analysis** — Plan-time uses `findReferences` and `incomingCalls` to map blast radius precisely. Execute-time runs a freshness check before implementing — catching callers added after planning. Grep as fallback when LSP is unavailable.
202
+
189
203
  ## Skills
190
204
 
191
205
  | Skill | Purpose |
package/bin/install.js CHANGED
@@ -259,14 +259,19 @@ async function configureHooks(claudeDir) {
259
259
 
260
260
  // Configure statusline
261
261
  if (settings.statusLine) {
262
- const answer = await ask(
263
- ` ${c.yellow}!${c.reset} Existing statusLine found. Replace with deepflow? [y/N] `
264
- );
265
- if (answer.toLowerCase() === 'y') {
266
- settings.statusLine = { type: 'command', command: statuslineCmd };
267
- log('Statusline configured');
262
+ if (process.stdin.isTTY) {
263
+ const answer = await ask(
264
+ ` ${c.yellow}!${c.reset} Existing statusLine found. Replace with deepflow? [y/N] `
265
+ );
266
+ if (answer.toLowerCase() === 'y') {
267
+ settings.statusLine = { type: 'command', command: statuslineCmd };
268
+ log('Statusline configured');
269
+ } else {
270
+ console.log(` ${c.yellow}!${c.reset} Skipped statusline configuration`);
271
+ }
268
272
  } else {
269
- console.log(` ${c.yellow}!${c.reset} Skipped statusline configuration`);
273
+ // Non-interactive (e.g. Claude Code bash tool) — skip prompt, keep existing
274
+ console.log(` ${c.yellow}!${c.reset} Existing statusLine found — kept (non-interactive mode)`);
270
275
  }
271
276
  } else {
272
277
  settings.statusLine = { type: 'command', command: statuslineCmd };
@@ -407,6 +412,11 @@ function ask(question) {
407
412
  }
408
413
 
409
414
  async function askInstallLevel(prompt) {
415
+ if (!process.stdin.isTTY) {
416
+ // Non-interactive — default to global
417
+ console.log(`${c.dim}Non-interactive mode — defaulting to global install${c.reset}`);
418
+ return 'global';
419
+ }
410
420
  console.log(prompt);
411
421
  console.log('');
412
422
  console.log(` ${c.cyan}1${c.reset}) Global ${c.dim}(~/.claude/ - available in all projects)${c.reset}`);
@@ -455,6 +465,10 @@ async function uninstall() {
455
465
  const CLAUDE_DIR = level === 'global' ? GLOBAL_DIR : PROJECT_DIR;
456
466
  const levelLabel = level === 'global' ? 'global' : 'project';
457
467
 
468
+ if (!process.stdin.isTTY) {
469
+ console.log('Uninstall requires interactive mode. Run from a terminal.');
470
+ return;
471
+ }
458
472
  const confirm = await ask(`Remove ${levelLabel} installation from ${CLAUDE_DIR}? [y/N] `);
459
473
  if (confirm.toLowerCase() !== 'y') {
460
474
  console.log('Cancelled.');
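The three `isTTY` guards added above share one pattern: prompt only when stdin is interactive, otherwise fall back to a safe default instead of hanging on a question nobody can answer. A minimal sketch of that guard (hypothetical helper, not the actual install.js code):

```javascript
// Hypothetical helper illustrating the guard pattern the diff applies in
// three places: only prompt when stdin is interactive, otherwise return a
// conservative default (keep existing config, default install level, etc.).
async function promptOrDefault({ isTTY, ask, question, fallback }) {
  if (!isTTY) {
    // Non-interactive (piped stdin, CI, Claude Code bash tool)
    return { value: fallback, prompted: false };
  }
  const answer = await ask(question);
  return { value: answer.trim().toLowerCase() === 'y', prompted: true };
}
```

Passing `isTTY` in, rather than reading `process.stdin.isTTY` inline as the diff does, keeps the decision testable; inline reads are fine at install-script scale.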
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "deepflow",
3
- "version": "0.1.82",
3
+ "version": "0.1.84",
4
4
  "description": "Doing reveals what thinking can't predict — spec-driven iterative development for Claude Code",
5
5
  "keywords": [
6
6
  "claude",
@@ -58,3 +58,4 @@ Complex reasoning tasks requiring deep analysis.
58
58
  - Show reasoning, not just conclusions
59
59
  - Cite evidence (file:line)
60
60
  - Be decisive - recommend, don't waffle
61
+ - Max response: 500 words
@@ -1,3 +1,8 @@
1
+ ---
2
+ name: df:auto-cycle
3
+ description: Execute one task from PLAN.md with ratchet health checks and state tracking for autonomous mode
4
+ ---
5
+
1
6
  # /df:auto-cycle — Single Cycle of Auto Mode
2
7
 
3
8
  ## Purpose
@@ -22,6 +27,10 @@ Load: PLAN.md (required)
22
27
  Load: .deepflow/auto-memory.yaml (optional — cross-cycle state, ignore if missing)
23
28
  ```
24
29
 
30
+ Shell injection (use output directly — no manual file reads needed):
31
+ - `` !`cat PLAN.md 2>/dev/null || echo 'NOT_FOUND'` ``
32
+ - `` !`cat .deepflow/auto-memory.yaml 2>/dev/null || echo 'NOT_FOUND'` ``
33
+
25
34
  **auto-memory.yaml full schema:**
26
35
 
27
36
  ```yaml
@@ -36,13 +45,37 @@ consecutive_reverts: # written by circuit breaker (step 3.5)
36
45
  T2: 2
37
46
  probe_learnings:
38
47
  - { spike: T1, probe: "streaming", insight: "discovered hidden dependency on fs.watch" }
48
+ optimize_state: # present only when an optimize task is active or was completed
49
+ task_id: "T{n}"
50
+ metric_command: "{shell command}"
51
+ target: {number}
52
+ direction: "higher|lower"
53
+ baseline: null # float; set on first measure
54
+ current_best: null # best metric value seen
55
+ best_commit: null # short commit hash of best value
56
+ cycles_run: 0
57
+ cycles_without_improvement: 0
58
+ consecutive_reverts: 0 # optimize-specific revert counter (separate from global)
59
+ probe_scale: 0 # 0=no probes yet, 2/4/6
60
+ max_cycles: {number}
61
+ history: [] # [{cycle, value, delta_pct, kept: bool, commit}]
62
+ failed_hypotheses: [] # ["{description}"] — written to experiments/ on completion
39
63
  ```
40
64
 
41
65
  Each section is optional. Missing keys are treated as empty. The file is created on first write if absent.
42
66
 
43
67
  ### 2. PICK NEXT TASK
44
68
 
45
- Scan PLAN.md for the first `[ ]` task where all "Blocked by:" dependencies are `[x]`:
69
+ **Optimize-active override:** Before scanning PLAN.md, check `auto-memory.yaml` for `optimize_state.task_id`. If present and the corresponding task is still `[ ]` in PLAN.md, resume that task immediately, skipping the normal `[ ]` scan. This ensures optimize tasks survive context exhaustion and resume across cycles.
70
+
71
+ ```
72
+ If optimize_state.task_id exists in auto-memory.yaml:
73
+ → Look up that task_id in PLAN.md
74
+ → If the task is still [ ] → select it (override normal scan)
75
+ → If the task is [x] → clear optimize_state.task_id and fall through to normal scan
76
+ ```
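The override above can be sketched as a small selection function (names are illustrative, not the actual deepflow implementation; `tasks` is assumed to be the parsed PLAN.md checklist):

```javascript
// Sketch of task selection with the optimize-active override.
// tasks: [{ id, done, blockedBy }], optimizeState: parsed optimize_state or null.
function pickNextTask(tasks, optimizeState) {
  const byId = new Map(tasks.map((t) => [t.id, t]));
  // Optimize-active override: resume the active optimize task if still open.
  if (optimizeState && optimizeState.task_id) {
    const active = byId.get(optimizeState.task_id);
    if (active && !active.done) return active; // override the normal scan
    optimizeState.task_id = null; // task already [x] → clear and fall through
  }
  // Normal scan: first open task whose "Blocked by:" dependencies are all done.
  return tasks.find(
    (t) => !t.done && (t.blockedBy || []).every((id) => byId.get(id)?.done)
  ) || null;
}
```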
77
+
78
+ Otherwise, scan PLAN.md for the first `[ ]` task where all "Blocked by:" dependencies are `[x]`:
46
79
 
47
80
  ```
48
81
  For each [ ] task in PLAN.md (top to bottom):
@@ -85,7 +118,7 @@ This handles worktree creation, agent spawning, ratchet health checks, and commi
85
118
 
86
119
  After `/df:execute` returns, record the task result in `.deepflow/auto-memory.yaml`:
87
120
 
88
- **On success (ratchet passed):**
121
+ **On success (ratchet passed — non-optimize task):**
89
122
 
90
123
  ```yaml
91
124
  # Set task_results[task_id] = success entry
@@ -93,7 +126,7 @@ task_results:
93
126
  {task_id}: { status: success, commit: {short_hash}, cycle: {cycle_number} }
94
127
  ```
95
128
 
96
- **On revert (ratchet failed):**
129
+ **On revert (ratchet failed — non-optimize task):**
97
130
 
98
131
  ```yaml
99
132
  # Set task_results[task_id] = reverted entry
@@ -105,6 +138,20 @@ revert_history:
105
138
  - { task: {task_id}, cycle: {cycle_number}, reason: "{ratchet failure summary}" }
106
139
  ```
107
140
 
141
+ **On optimize cycle result** (task has `Optimize:` block — execute.md section 5.9 handles the inner cycle; auto-cycle only updates the outer state here):
142
+
143
+ After each optimize cycle reported by `/df:execute`:
144
+
145
+ ```yaml
146
+ # Merge updated optimize_state written by execute into auto-memory.yaml
147
+ # execute already persists optimize_state after each cycle (5.9.5) — confirm it was written
148
+ # Increment cycles_run tracked at auto-cycle level for report summary
149
+ optimize_state:
150
+ cycles_run: {N} # echoed from execute's optimize_state
151
+ current_best: {value}
152
+ history: [...] # full history from execute's optimize_state
153
+ ```
154
+
108
155
  Read the current file first (create if missing), merge the new values, and write back. Preserve all existing keys.
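That read, merge, write step can be sketched as a key-preserving deep merge (a minimal sketch operating on the already-parsed object; YAML parsing and file I/O omitted, names illustrative):

```javascript
// Merge new cycle results into existing auto-memory state without dropping
// keys the cycle didn't touch. Nested maps merge recursively; scalars and
// arrays are replaced wholesale.
function mergeMemory(existing, updates) {
  const out = { ...existing };
  for (const [key, value] of Object.entries(updates)) {
    const prev = out[key];
    const bothMaps =
      prev && typeof prev === 'object' && !Array.isArray(prev) &&
      value && typeof value === 'object' && !Array.isArray(value);
    out[key] = bothMaps ? mergeMemory(prev, value) : value;
  }
  return out;
}
```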
109
156
 
110
157
  ### 3.6. CIRCUIT BREAKER
@@ -126,7 +173,7 @@ What does NOT count as a failure:
126
173
  - L5 ⚠ (passed on retry): treated as pass, resets counter
127
174
  ```
128
175
 
129
- **On revert (ratchet failed — any of L0 ✗, L1 ✗, L2 ✗, L4 ✗, L5 ✗, or L5 ✗ flaky):**
176
+ **On revert (ratchet failed — any of L0 ✗, L1 ✗, L2 ✗, L4 ✗, L5 ✗, or L5 ✗ flaky — non-optimize task):**
130
177
 
131
178
  ```
132
179
  1. Read .deepflow/auto-memory.yaml (create if missing)
@@ -141,12 +188,37 @@ What does NOT count as a failure:
141
188
  → Continue to step 4 (UPDATE REPORT) as normal
142
189
  ```
143
190
 
144
- **On success (ratchet passed — including L5 — no frontend or L5 ⚠ pass-on-retry):**
191
+ **On success (ratchet passed — including L5 pass, L5 skipped (no frontend), or L5 ⚠ pass-on-retry — non-optimize task):**
145
192
 
146
193
  ```
147
194
  1. Reset consecutive_reverts[task_id] to 0 in .deepflow/auto-memory.yaml
148
195
  ```
149
196
 
197
+ **Optimize stop conditions** (task has `Optimize:` block — checked after every optimize cycle result from execute):
198
+
199
+ Execute (section 5.9.3) handles the inner-cycle circuit breaker inside the optimize loop. At the auto-cycle level, watch for these terminal outcomes reported by `/df:execute`:
200
+
201
+ ```
202
+ 1. "target reached: {value}"
203
+ → Mark task [x] (execute already did this — confirm)
204
+ → Write optimize completion (step 3.7)
205
+ → Report: "Optimize complete: target reached — {value} (target: {target})"
206
+ → Continue to step 4
207
+
208
+ 2. "max cycles reached, best: {current_best}"
209
+ → Mark task [x] (execute already did this — confirm)
210
+ → Write optimize completion (step 3.7)
211
+ → Report: "Optimize complete: max cycles reached — best: {current_best} (target: {target})"
212
+ → Continue to step 4
213
+
214
+ 3. "circuit breaker: 3 consecutive reverts"
215
+ → Task stays [ ] — do NOT mark [x]
216
+ → Write optimize failure to experiments/ (step 3.7)
217
+ → Clear optimize_state.task_id (task stays [ ] for manual intervention)
218
+ → Report: "Circuit breaker tripped (optimize): T{n} halted after 3 consecutive reverts. Resolve manually."
219
+ → Halt (exit without scheduling next cycle)
220
+ ```
221
+
150
222
  **auto-memory.yaml schema for the circuit breaker:**
151
223
 
152
224
  ```yaml
@@ -161,6 +233,43 @@ consecutive_reverts:
161
233
  circuit_breaker_threshold: 3 # halt after this many consecutive reverts on the same task
162
234
  ```
163
235
 
236
+ ### 3.7. OPTIMIZE COMPLETION
237
+
238
+ When an optimize task reaches a terminal stop condition (target reached, max cycles, or circuit breaker):
239
+
240
+ **On target reached or max cycles (task [x]):**
241
+
242
+ ```
243
+ 1. Read optimize_state.failed_hypotheses from .deepflow/auto-memory.yaml
244
+ 2. For each failed hypothesis, write to .deepflow/experiments/:
245
+ File: {spec}--optimize-{task_id}--{slug}--failed.md
246
+ Content:
247
+ # Failed Hypothesis: {description}
248
+ Task: {task_id} Spec: {spec_name} Cycle: {cycle_N}
249
+ Metric before: {value_before} Metric after: {value_after}
250
+ Reason: {why it was reverted}
251
+ 3. Write a summary experiment file for the optimize run:
252
+ File: {spec}--optimize-{task_id}--summary--{status}.md
253
+ Content:
254
+ # Optimize Summary: {task_id}
255
+ Metric: {metric_command} Target: {target} Direction: {direction}
256
+ Baseline: {baseline} Best achieved: {current_best} Final: {final_value}
257
+ Cycles run: {cycles_run} Status: {reached|max_cycles}
258
+ History (all cycles):
259
+ | Cycle | Value | Delta | Kept | Commit |
260
+ ...
261
+ 4. Clear optimize_state from .deepflow/auto-memory.yaml (set to null or remove key)
262
+ ```
263
+
264
+ **On circuit breaker halt:**
265
+
266
+ ```
267
+ 1. Write failed_hypotheses to .deepflow/experiments/ (same as above)
268
+ 2. Write summary experiment file with status: circuit_breaker
269
+ 3. Preserve optimize_state in auto-memory.yaml (do NOT clear — enables human diagnosis)
270
+ Add note: "halted: circuit_breaker — requires manual intervention"
271
+ ```
272
+
164
273
  ### 4. UPDATE REPORT
165
274
 
166
275
  Write a comprehensive report to `.deepflow/auto-report.md` after every cycle. The file is appended each cycle — never overwritten. Each cycle adds its row to the per-cycle log table and updates the running summary counts.
@@ -181,13 +290,16 @@ _Last updated: {YYYY-MM-DDTHH:MM:SSZ}_
181
290
  | Total cycles run | {N} |
182
291
  | Tasks committed | {N} |
183
292
  | Tasks reverted | {N} |
293
+ | Optimize cycles run | {N} | ← present only when optimize tasks exist in PLAN.md
294
+ | Optimize best value | {value} / {target} | ← present only when optimize tasks exist
184
295
 
185
296
  ## Cycle Log
186
297
 
187
- | Cycle | Task | Status | Commit / Revert | Delta | Reason | Timestamp |
188
- |-------|------|--------|-----------------|-------|--------|-----------|
189
- | 1 | T1 | passed | abc1234 | tests: 24→24, build: ok | — | 2025-01-15T10:00:00Z |
190
- | 2 | T2 | failed | reverted | tests: 24→22 (−2) | tests failed: 2 of 24 | 2025-01-15T10:05:00Z |
298
+ | Cycle | Task | Status | Commit / Revert | Delta | Metric Delta | Reason | Timestamp |
299
+ |-------|------|--------|-----------------|-------|--------------|--------|-----------|
300
+ | 1 | T1 | passed | abc1234 | tests: 24→24, build: ok | — | — | 2025-01-15T10:00:00Z |
301
+ | 2 | T2 | failed | reverted | tests: 24→22 (−2) | — | tests failed: 2 of 24 | 2025-01-15T10:05:00Z |
302
+ | 3 | T3 | optimize | def789 | tests: 24→24, build: ok | 72.3→74.1 (+2.5%) | — | 2025-01-15T10:10:00Z |
191
303
 
192
304
  ## Probe Results
193
305
 
@@ -196,6 +308,20 @@ _(empty until a probe/spike task runs)_
196
308
  | Probe | Metric | Winner | Loser | Notes |
197
309
  |-------|--------|--------|-------|-------|
198
310
 
311
+ ## Optimize Runs
312
+
313
+ _(empty until an optimize task completes)_
314
+
315
+ | Task | Metric | Baseline | Best | Target | Cycles | Status |
316
+ |------|--------|----------|------|--------|--------|--------|
317
+
318
+ ## Secondary Metric Warnings
319
+
320
+ _(empty until a secondary metric regresses >5%)_
321
+
322
+ | Cycle | Task | Secondary Metric | Before | After | Delta | Severity |
323
+ |-------|------|-----------------|--------|-------|-------|----------|
324
+
199
325
  ## Health Score
200
326
 
201
327
  | Check | Status |
@@ -203,6 +329,7 @@ _(empty until a probe/spike task runs)_
203
329
  | Tests passed | {N} / {total} |
204
330
  | Build status | passing / failing |
205
331
  | Ratchet | green / red |
332
+ | Optimize status | in_progress / reached / max_cycles / circuit_breaker / — | ← present only when optimize tasks exist
206
333
 
207
334
  ## Reverted Tasks
208
335
 
@@ -217,14 +344,15 @@ _(tasks that were reverted with their failure reasons)_
217
344
  **Cycle Log — append one row:**
218
345
 
219
346
  ```
220
- | {cycle_number} | {task_id} | {status} | {commit_hash or "reverted"} | {delta} | {reason or "—"} | {YYYY-MM-DDTHH:MM:SSZ} |
347
+ | {cycle_number} | {task_id} | {status} | {commit_hash or "reverted"} | {delta} | {metric_delta} | {reason or "—"} | {YYYY-MM-DDTHH:MM:SSZ} |
221
348
  ```
222
349
 
223
350
  - `cycle_number`: total number of cycles executed so far (count existing data rows in the Cycle Log + 1)
224
351
  - `task_id`: task ID from PLAN.md, or `BOOTSTRAP` for bootstrap cycles
225
- - `status`: `passed` (ratchet passed), `failed` (ratchet failed, reverted), or `skipped` (task was already done)
352
+ - `status`: `passed` (ratchet passed), `failed` (ratchet failed, reverted), `skipped` (task was already done), or `optimize` (optimize cycle — one inner cycle of an Optimize task)
226
353
  - `commit_hash`: short hash from the commit, or `reverted` if ratchet failed
227
354
  - `delta`: ratchet metric change from this cycle. Format: `tests: {before}→{after}, build: ok/fail`. Include coverage delta if available (e.g., `cov: 80%→82% (+2%)`). On revert, show the regression that triggered it (e.g., `tests: 24→22 (−2)`)
355
+ - `metric_delta`: for optimize cycles, show `{old}→{new} ({+/-pct}%)`. For non-optimize cycles, use `—`.
228
356
  - `reason`: failure reason from ratchet output (e.g., `"tests failed: 2 of 24"`), or `—` if passed
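A sketch of the row append implied by the template and field definitions above (field names illustrative; `—` fills `metric_delta` and `reason` when absent):

```javascript
// Format one Cycle Log row from a cycle result object.
function cycleLogRow({ cycle, task, status, commit, delta, metricDelta, reason, ts }) {
  return `| ${cycle} | ${task} | ${status} | ${commit} | ${delta} | ${metricDelta || '—'} | ${reason || '—'} | ${ts} |`;
}
```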
229
357
 
230
358
  **Summary table — recalculate from Cycle Log rows:**
@@ -232,9 +360,31 @@ _(tasks that were reverted with their failure reasons)_
232
360
  - `Total cycles run`: count of all data rows in the Cycle Log
233
361
  - `Tasks committed`: count of rows where Status = `passed`
234
362
  - `Tasks reverted`: count of rows where Status = `failed`
363
+ - `Optimize cycles run`: count of rows where Status = `optimize` (omit row if no optimize tasks in PLAN.md)
364
+ - `Optimize best value`: `{current_best} / {target}` from `optimize_state` in auto-memory.yaml (omit row if no optimize tasks)
235
365
 
236
366
  **Last updated timestamp:** always overwrite the `_Last updated:` line with the current timestamp.
237
367
 
368
+ **Optimize Runs table — update on optimize terminal events:**
369
+
370
+ When an optimize stop condition is reached (target reached, max cycles, circuit breaker), append or update the row for the optimize task:
371
+
372
+ ```
373
+ | {task_id} | {metric_command} | {baseline} | {current_best} | {target} | {cycles_run} | {reached|max_cycles|circuit_breaker} |
374
+ ```
375
+
376
+ If the task is still in progress, do not add a row yet (it will be added when the terminal event fires).
377
+
378
+ **Secondary Metric Warnings table — append on regression >5%:**
379
+
380
+ After each optimize cycle, `/df:execute` section 5.9.2 step j measures secondary metrics. If a regression exceeds the threshold, auto-cycle reads the warning from execute's output and appends to the table:
381
+
382
+ ```
383
+ | {cycle_number} | {task_id} | {secondary_metric_command} | {before} | {after} | {+/-pct}% | WARNING |
384
+ ```
385
+
386
+ The severity is always `WARNING` (no auto-revert — human decision required). These rows are informational only.
387
+
238
388
  #### 4.3 Probe results (when applicable)
239
389
 
240
390
  If the executed task was a probe/spike (task description contains "probe" or "spike"), append a row to the Probe Results table:
@@ -254,6 +404,13 @@ Read the ratchet output from the last `/df:execute` result and populate:
254
404
  - `Tests passed`: e.g., `22 / 24` (from ratchet summary line)
255
405
  - `Build status`: `passing` if exit code 0, `failing` if build error
256
406
  - `Ratchet`: `green` if ratchet passed, `red` if ratchet failed
407
+ - `Optimize status`: read from `optimize_state` in auto-memory.yaml:
408
+ - `in_progress` if `optimize_state.task_id` present and task still `[ ]`
409
+ - `reached` if stop condition was "target reached"
410
+ - `max_cycles` if stop condition was "max cycles"
411
+ - `circuit_breaker` if halted by circuit breaker
412
+ - `—` if no optimize task is active or was ever run
413
+ - Omit this row entirely if PLAN.md contains no `[OPTIMIZE]` tasks
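One way to derive the row value from state (a sketch; the `stop` key is an assumed in-memory representation of the recorded stop condition, since the schema does not name a field for it):

```javascript
// Map optimize_state (parsed from auto-memory.yaml) to the Health Score value.
function optimizeStatus(state, taskStillOpen) {
  if (!state) return '—'; // no optimize task active or ever run
  if (state.task_id && taskStillOpen) return 'in_progress';
  if (state.stop === 'target_reached') return 'reached';
  if (state.stop === 'max_cycles') return 'max_cycles';
  if (state.stop === 'circuit_breaker') return 'circuit_breaker';
  return '—';
}
```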
257
414
 
258
415
  Replace the entire Health Score section content with the latest values each cycle.
259
416
 
@@ -301,6 +458,11 @@ pending_count = number of [ ] tasks
301
458
  | Auto-memory updated after every cycle | `task_results`, `revert_history`, and `consecutive_reverts` in `.deepflow/auto-memory.yaml` are written after each EXECUTE result |
302
459
  | Cross-cycle state read at cycle start | LOAD STATE reads the full `auto-memory.yaml` schema; prior task outcomes and probe learnings are available to the cycle |
303
460
  | Circuit breaker halts the loop | After N consecutive reverts on the same task (default 3, configurable via `circuit_breaker_threshold` in `.deepflow/config.yaml`), the loop is stopped and the reason is reported |
461
+ | One optimize task at a time | Only one `[OPTIMIZE]` task runs at a time — auto-cycle defers other optimize tasks until the active one reaches a terminal stop condition |
462
+ | Optimize tasks resume across context windows | `optimize_state.task_id` in `auto-memory.yaml` overrides the normal `[ ]` scan; the same task is picked every cycle until a stop condition fires |
463
+ | Optimize circuit breaker halts AND preserves state | When optimize hits 3 consecutive reverts: task stays `[ ]`, `optimize_state` is preserved in `auto-memory.yaml` (not cleared), loop halts |
464
+ | Secondary metric regression is advisory only | >5% regression generates WARNING in `auto-report.md` Secondary Metric Warnings table — never triggers auto-revert |
465
+ | Optimize completion writes experiments | Failed hypotheses and run summary are written to `.deepflow/experiments/` when a terminal stop condition fires |
304
466
 
305
467
  ## Example
306
468
 
@@ -383,6 +545,77 @@ Circuit breaker tripped: T3 failed 3 consecutive times. Reason: 2 tests regresse
383
545
  Loop halted. Resolve T3 manually, then run /df:auto-cycle to resume.
384
546
  ```
385
547
 
548
+ ### Optimize Cycle (in progress — task resumes from optimize_state)
549
+
550
+ ```
551
+ /df:auto-cycle
552
+
553
+ Loading PLAN.md... 4 tasks total, 2 done, 2 pending
554
+ Loading auto-memory.yaml... optimize_state.task_id = T3
555
+
556
+ Optimize-active override: T3 still [ ] — resuming optimize task
557
+ optimize_state: cycles_run=4, current_best=74.1, target=85.0, direction=higher
558
+
559
+ Running: /df:execute T3
560
+ ⟳ T3 cycle 5: 74.1 → 75.8 (+2.3%) — kept [best: 75.8, target: 85.0]
561
+
562
+ Updated .deepflow/auto-memory.yaml:
563
+ optimize_state.cycles_run = 5
564
+ optimize_state.current_best = 75.8
565
+
566
+ Updated .deepflow/auto-report.md:
567
+ Summary: cycles=5, committed=2, reverted=0, optimize_cycles=5, optimize_best=75.8/85.0
568
+ Cycle Log row: | 5 | T3 | optimize | abc1234 | tests: 24→24, build: ok | 74.1→75.8 (+2.3%) | — | 2025-01-15T10:15:00Z |
569
+ Health: tests 24/24, build passing, ratchet green, optimize in_progress
570
+
571
+ Cycle complete. 2 tasks remaining.
572
+ ```
573
+
574
+ ### Optimize Complete (target reached)
575
+
576
+ ```
577
+ /df:auto-cycle
578
+
579
+ Loading PLAN.md... 4 tasks total, 2 done, 2 pending
580
+ Loading auto-memory.yaml... optimize_state.task_id = T3
581
+
582
+ Optimize-active override: T3 still [ ] — resuming optimize task
583
+ optimize_state: cycles_run=12, current_best=84.9, target=85.0, direction=higher
584
+
585
+ Running: /df:execute T3
586
+ ⟳ T3 cycle 13: 84.9 → 85.3 (+0.5%) — kept [best: 85.3, target: 85.0]
587
+ Target reached: 85.3 >= 85.0 — marking T3 [x]
588
+
589
+ Optimize completion:
590
+ Writing 3 failed hypotheses to .deepflow/experiments/
591
+ Writing summary: specs--optimize-T3--summary--reached.md
592
+ Clearing optimize_state from auto-memory.yaml
593
+
594
+ Updated .deepflow/auto-report.md:
595
+ Summary: cycles=13, committed=3, reverted=0, optimize_cycles=13, optimize_best=85.3/85.0
596
+ Cycle Log row: | 13 | T3 | optimize | def456 | tests: 24→24, build: ok | 84.9→85.3 (+0.5%) | — | 2025-01-15T10:45:00Z |
597
+ Optimize Runs row: | T3 | coverage_cmd | 72.3 | 85.3 | 85.0 | 13 | reached |
598
+ Health: tests 24/24, build passing, ratchet green, optimize reached
599
+
600
+ Cycle complete. 1 task remaining.
601
+ ```
602
+
603
+ ### Optimize Secondary Metric Warning
604
+
605
+ ```
606
+ /df:auto-cycle
607
+
608
+ Running: /df:execute T3
609
+ ⟳ T3 cycle 8: 80.1 → 81.4 (+1.6%) — kept [best: 81.4, target: 85.0]
610
+ WARNING: secondary metric 'lint_errors' regressed: 2 → 5 (+150%) — exceeds 5% threshold
611
+
612
+ Updated .deepflow/auto-report.md:
613
+ Secondary Metric Warnings row: | 8 | T3 | lint_errors | 2 | 5 | +150% | WARNING |
614
+ (No auto-revert — human decision required)
615
+
616
+ Cycle complete. 2 tasks remaining.
617
+ ```
618
+
386
619
  ### All Tasks Blocked
387
620
 
388
621
  ```
@@ -1,3 +1,8 @@
1
+ ---
2
+ name: df:auto
3
+ description: Set up and launch fully autonomous execution with plan generation and ratchet snapshots
4
+ ---
5
+
1
6
  # /df:auto — Autonomous Mode Setup
2
7
 
3
8
  Set up and launch fully autonomous execution. Runs `/df:plan` if no PLAN.md exists, takes a ratchet snapshot, then starts `/loop 1m /df:auto-cycle`.
@@ -1,3 +1,8 @@
1
+ ---
2
+ name: df:consolidate
3
+ description: Remove duplicates and superseded entries from decisions file, promote stale provisionals
4
+ ---
5
+
1
6
  # /df:consolidate — Consolidate Decisions
2
7
 
3
8
  ## Purpose
@@ -15,6 +20,9 @@ Remove duplicates, superseded entries, and promote stale provisionals. Keep deci
15
20
  ### 1. LOAD
16
21
  Read `.deepflow/decisions.md`. If missing or empty, report and exit.
17
22
 
23
+ Shell injection (use output directly — no manual file reads needed):
24
+ - `` !`cat .deepflow/decisions.md 2>/dev/null || echo 'NOT_FOUND'` ``
25
+
18
26
  ### 2. ANALYZE
19
27
  Model-driven analysis (not regex):
20
28
  - Identify duplicate decisions (same meaning, different wording)
@@ -1,3 +1,9 @@
1
+ ---
2
+ name: df:debate
3
+ description: Generate multi-perspective analysis of a problem before formalizing into a spec
4
+ allowed-tools: [Read, Grep, Glob, Agent]
5
+ ---
6
+
1
7
  # /df:debate — Multi-Perspective Analysis
2
8
 
3
9
  ## Orchestrator Role
@@ -1,3 +1,9 @@
1
+ ---
2
+ name: df:discover
3
+ description: Explore a problem space deeply through structured questioning to surface requirements and constraints
4
+ allowed-tools: [AskUserQuestion, Read]
5
+ ---
6
+
1
7
  # /df:discover — Deep Problem Exploration
2
8
 
3
9
  ## Orchestrator Role
@@ -1,3 +1,8 @@
1
+ ---
2
+ name: df:execute
3
+ description: Execute tasks from PLAN.md with agent spawning, ratchet health checks, and worktree management
4
+ ---
5
+
1
6
  # /df:execute — Execute Tasks from Plan
2
7
 
3
8
  ## Orchestrator Role
@@ -51,6 +56,10 @@ checkpoint exists → Prompt: "Resume? (y/n)"
51
56
  else → Start fresh
52
57
  ```
53
58
 
59
+ Shell injection (use output directly — no manual file reads needed):
60
+ - `` !`cat .deepflow/checkpoint.json 2>/dev/null || echo 'NOT_FOUND'` ``
61
+ - `` !`git diff --quiet && echo 'CLEAN' || echo 'DIRTY'` ``
62
+
54
63
  ### 1.5. CREATE WORKTREE
55
64
 
56
65
  Require clean HEAD (`git diff --quiet`). Derive SPEC_NAME from `specs/doing-*.md`.
@@ -92,6 +101,10 @@ Load: PLAN.md (required), specs/doing-*.md, .deepflow/config.yaml
92
101
  If missing: "No PLAN.md found. Run /df:plan first."
93
102
  ```
94
103
 
104
+ Shell injection (use output directly — no manual file reads needed):
105
+ - `` !`cat .deepflow/checkpoint.json 2>/dev/null || echo 'NOT_FOUND'` ``
106
+ - `` !`git diff --quiet && echo 'CLEAN' || echo 'DIRTY'` ``
107
+
95
108
  ### 2.5. REGISTER NATIVE TASKS
96
109
 
97
110
  For each `[ ]` task in PLAN.md: `TaskCreate(subject: "{task_id}: {description}", activeForm: "{gerund}", description: full block)`. Store task_id → native ID mapping. Set dependencies via `TaskUpdate(addBlockedBy: [...])`. On `--continue`: only register remaining `[ ]` items.
@@ -123,6 +136,8 @@ Before spawning, check `Files:` lists of all ready tasks. If two+ ready tasks sh
123
136
 
124
137
  **≥2 [SPIKE] tasks for same problem:** Follow Parallel Spike Probes (section 5.7).
125
138
 
139
+ **[OPTIMIZE] tasks:** Follow Optimize Cycle (section 5.9). Only ONE optimize task runs at a time — defer others until the active one completes.
140
+
126
141
  ### 5.5. RATCHET CHECK
127
142
 
128
143
  After each agent completes, run health checks in the worktree.
@@ -144,6 +159,28 @@ Run Build → Test → Typecheck → Lint (stop on first failure).
144
159
  Compare `git diff HEAD~1 --name-only` against Impact callers/duplicates list.
145
160
  File listed but not modified → **advisory warning**: "Impact gap: {file} listed as {caller|duplicate} but not modified — verify manually". Not auto-revert (callers sometimes don't need changes), but flags the risk.
146
161
 
162
+ **Metric gate (Optimize tasks only):**
163
+
164
+ After ratchet passes, if the current task has an `Optimize:` block, run the metric gate:
165
+
166
+ 1. Run the `metric` shell command in the worktree: `cd ${WORKTREE_PATH} && eval "${metric_command}"`
167
+ 2. Parse output as float. Non-numeric output → cycle failure (revert, log "metric parse error: {raw output}")
168
+ 3. Compare against previous measurement using `direction`:
169
+ - `direction: higher` → new value must be > previous + (previous × min_improvement_threshold)
170
+ - `direction: lower` → new value must be < previous - (previous × min_improvement_threshold)
171
+ 4. Both ratchet AND metric improvement required → keep commit
172
+ 5. Ratchet passes but metric did not improve → revert (log "ratchet passed but metric stagnant/regressed: {old} → {new}")
173
+ 6. Run each `secondary_metrics` command, parse as float. If regression > `regression_threshold` (default 5%) compared to baseline: append WARNING to `.deepflow/auto-report.md`: `"WARNING: {name} regressed {delta}% ({baseline_val} → {new_val}) at cycle {N}"`. Do NOT auto-revert.
174
+
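Steps 2–5 reduce to a direction-aware threshold comparison; a minimal sketch, assuming `min_improvement_threshold` is a fraction (e.g. 0.01 for 1%):

```javascript
// Metric gate comparison: did the new value clear the improvement threshold
// in the required direction? NaN models a metric parse error (cycle failure).
function metricImproved(prev, next, direction, threshold) {
  if (Number.isNaN(next)) return false; // non-numeric output → revert
  if (direction === 'higher') return next > prev + prev * threshold;
  if (direction === 'lower') return next < prev - prev * threshold;
  throw new Error(`unknown direction: ${direction}`);
}
```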
175
+ **Output Truncation:**
176
+
177
+ After ratchet checks complete, truncate command output for context efficiency:
178
+
179
+ - **Success (all checks passed):** Suppress output entirely — do not include build/test/lint output in reports
180
+ - **Build failure:** Include last 15 lines of build error only
181
+ - **Test failure:** Include failed test name(s) + last 20 lines of test output
182
+ - **Typecheck/lint failure:** Include error count + first 5 errors only
183
+
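The truncation policy above can be sketched as a line-budget function (illustrative shapes; how failed test names are extracted is not specified, so the test branch simply keeps the tail where summaries usually appear):

```javascript
// Truncate ratchet command output per the policy: suppress on success,
// keep only a bounded slice on failure.
function truncateRatchetOutput(check, failed, outputLines) {
  if (!failed) return []; // success: suppress output entirely
  if (check === 'build') return outputLines.slice(-15); // last 15 lines
  if (check === 'test') return outputLines.slice(-20); // last 20 lines
  // typecheck/lint: error count + first 5 errors
  return [`${outputLines.length} errors`, ...outputLines.slice(0, 5)];
}
```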
147
184
  **Evaluate:** All pass + no violations → commit stands. Any failure → attempt partial salvage before reverting:
148
185
 
149
186
  **Partial salvage protocol:**
@@ -166,7 +203,8 @@ Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothes
166
203
  4. **Ratchet:** Per notification, run standard ratchet (5.5) in probe worktree. Record: ratchet_passed, regressions, coverage_delta, files_changed, commit
167
204
  5. **Select winner** (after ALL complete, no LLM judge):
168
205
  - Disqualify any with regressions
169
- - Rank: fewer regressions > higher coverage_delta > fewer files_changed > first to complete
206
+ - **Standard spikes**: Rank: fewer regressions > higher coverage_delta > fewer files_changed > first to complete
207
+ - **Optimize probes**: Rank: best metric improvement (absolute delta toward target) > fewer regressions > fewer files_changed
170
208
  - No passes → reset all to pending for retry with debugger
171
209
  6. **Preserve all worktrees.** Losers: rename branch + `-failed` suffix. Record in checkpoint.json under `"spike_probes"`
172
210
  7. **Log ALL probe outcomes** to `.deepflow/auto-memory.yaml` (main tree):
@@ -200,6 +238,141 @@ Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothes
200
238
  Create file if missing. Preserve existing keys when merging. Log BOTH winners and losers — downstream tasks need to know what was chosen, not just what failed.
201
239
  8. **Promote winner:** Cherry-pick into shared worktree. Winner → `[x] [PROBE_WINNER]`, losers → `[~] [PROBE_FAILED]`. Resume standard loop.
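The machine ranking in step 5 can be expressed as sort keys (a sketch; the probe dict fields are assumptions derived from the data recorded in step 4):

```python
def rank_standard(probes):
    """Standard spikes: fewer regressions, then higher coverage_delta,
    then fewer files_changed, then first to complete."""
    survivors = [p for p in probes if not p["regressions"]]  # disqualify
    return sorted(survivors, key=lambda p: (len(p["regressions"]),
                                            -p["coverage_delta"],
                                            p["files_changed"],
                                            p["completed_order"]))

def rank_optimize(probes):
    """Optimize probes: best absolute metric delta toward target first,
    then fewer regressions, then fewer files_changed."""
    survivors = [p for p in probes if not p["regressions"]]
    return sorted(survivors, key=lambda p: (-p["metric_delta"],
                                            len(p["regressions"]),
                                            p["files_changed"]))
```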
202
240
 
241
+ #### 5.7.1. PROBE DIVERSITY ENFORCEMENT (Optimize Probes)
242
+
243
+ When spawning probes for optimize plateau resolution, enforce diversity roles:
244
+
245
+ **Role definitions:**
246
+ - **contextualizada**: Builds on the best approach so far — refines, extends, or combines what worked. Prompt includes: "Build on the best result so far: {best_approach_summary}. Refine or extend it."
247
+ - **contraditoria**: Tries the opposite of the current best. Prompt includes: "The best approach so far is {best_approach_summary}. Try the OPPOSITE direction — if it added caching, don't cache; if it optimized the hot path, optimize the cold path; etc."
248
+ - **ingenua**: No prior context — naive fresh attempt. Prompt includes: "Ignore all prior attempts. Approach this from scratch with no assumptions about what works."
249
+
250
+ **Auto-scaling by probe round:**
251
+
252
+ | Probe round | Count | Required roles |
253
+ |-------------|-------|----------------|
254
+ | 1st plateau | 2 | 1 contraditoria + 1 ingenua |
255
+ | 2nd plateau | 4 | 1 contextualizada + 2 contraditoria + 1 ingenua |
256
+ | 3rd+ plateau | 6 | 2 contextualizada + 2 contraditoria + 2 ingenua |
257
+
258
+ **Rules:**
259
+ - Every probe set MUST include ≥1 contraditoria and ≥1 ingenua (minimum diversity)
260
+ - contextualizada only added from round 2+ (needs prior data to build on)
261
+ - Each probe prompt includes its role label and role-specific instruction
262
+ - Probe scale persists in `optimize_state.probe_scale` in `auto-memory.yaml`
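The auto-scaling table and minimum-diversity rule above can be sketched as (role labels follow the document's own names):

```python
def probe_roles(plateau_round):
    """Role counts per plateau round; 3rd and later rounds use the 6-probe mix."""
    table = {
        1: {"contextualizada": 0, "contraditoria": 1, "ingenua": 1},  # 2 probes
        2: {"contextualizada": 1, "contraditoria": 2, "ingenua": 1},  # 4 probes
    }
    return table.get(min(plateau_round, 3),
                     {"contextualizada": 2, "contraditoria": 2, "ingenua": 2})
```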
263
+
264
+ ### 5.9. OPTIMIZE CYCLE
265
+
266
+ Trigger: task has `Optimize:` block in PLAN.md. Runs instead of standard single-agent spawn.
267
+
268
+ **Optimize is a distinct execution mode** — one optimize task at a time, spanning N cycles until a stop condition.
269
+
270
+ #### 5.9.1. INITIALIZATION
271
+
272
+ 1. Parse `Optimize:` block from PLAN.md task: `metric`, `target`, `direction`, `max_cycles`, `secondary_metrics`
273
+ 2. Load or initialize `optimize_state` from `.deepflow/auto-memory.yaml`:
274
+ ```yaml
275
+ optimize_state:
276
+ task_id: "T{n}"
277
+ metric_command: "{shell command}"
278
+ target: {number}
279
+ direction: "higher|lower"
280
+ baseline: null # set on first measure
281
+ current_best: null # best metric value seen
282
+ best_commit: null # commit hash of best value
283
+ cycles_run: 0
284
+ cycles_without_improvement: 0
285
+ consecutive_reverts: 0
286
+ probe_scale: 0 # 0=no probes yet, 2/4/6
287
+ max_cycles: {number}
288
+ history: [] # [{cycle, value, delta, kept, commit}]
289
+ failed_hypotheses: [] # ["{description}"]
290
+ ```
291
+ 3. **Measure baseline**: `cd ${WORKTREE_PATH} && eval "${metric_command}"` → parse float → store as `baseline` and `current_best`
292
+ 4. Measure each secondary metric → store as `secondary_baselines`
293
+ 5. Check if target already met (`direction: higher` → baseline >= target; `lower` → baseline <= target). If met → mark task `[x]`, log "target already met: {baseline}", done.
294
+
295
+ #### 5.9.2. CYCLE LOOP
296
+
297
+ Each cycle = one agent spawn + measure + keep/revert decision.
298
+
299
+ ```
300
+ REPEAT:
301
+ 1. Check stop conditions (5.9.3) → if triggered, exit loop
302
+ 2. Spawn ONE optimize agent (section 6, Optimize Task prompt) with run_in_background=true
303
+ 3. STOP. End turn. Wait for notification.
304
+ 4. On notification:
305
+ a. Run ratchet check (section 5.5) — build/test/lint must pass
306
+ b. If ratchet fails → git revert HEAD --no-edit, increment consecutive_reverts, log failed hypothesis, go to step 1
307
+ c. Run the metric gate (section 5.5) — measure the new value
308
+ d. If metric parse error → git revert HEAD --no-edit, increment consecutive_reverts, log "metric parse error", go to step 1
309
+ e. Compute improvement:
310
+ - direction: higher → improvement = (new - current_best) / |current_best| × 100
311
+ - direction: lower → improvement = (current_best - new) / |current_best| × 100
312
+ - current_best == 0 → use absolute delta
313
+ f. If improvement >= min_improvement_threshold (default 1%):
314
+ → KEEP: update current_best, best_commit, reset cycles_without_improvement=0, reset consecutive_reverts=0
315
+ g. If improvement < min_improvement_threshold:
316
+ → REVERT: git revert HEAD --no-edit, increment cycles_without_improvement
317
+ h. Increment cycles_run
318
+ i. Append to history: {cycle, value, delta_pct, kept: bool, commit}
319
+ j. Measure secondary metrics, check regression (WARNING only, no revert)
320
+ k. Persist optimize_state to auto-memory.yaml
321
+ l. Report: "⟳ T{n} cycle {N}: {old} → {new} ({+/-delta}%) — {kept|reverted} [best: {current_best}, target: {target}]"
322
+ m. Check context %. If ≥50% → checkpoint and exit (auto-cycle resumes).
323
+ ```
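The improvement computation in step 4e, including the zero edge case, can be sketched as:

```python
def improvement_pct(current_best, new, direction):
    """Percent improvement of `new` over `current_best`; positive means the
    metric moved toward the target. Falls back to the absolute delta when
    current_best is 0, as specified above."""
    delta = new - current_best if direction == "higher" else current_best - new
    if current_best == 0:
        return delta
    return delta / abs(current_best) * 100
```

A cycle is kept only when this value meets `min_improvement_threshold` (default 1%).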
324
+
325
+ #### 5.9.3. STOP CONDITIONS
326
+
327
+ | Condition | Detection | Action |
328
+ |-----------|-----------|--------|
329
+ | **Target reached** | `direction: higher` → value >= target; `lower` → value <= target | Mark task `[x]`, log "target reached: {value}" |
330
+ | **Max cycles** | `cycles_run >= max_cycles` | Mark task `[x]` with note: "max cycles reached, best: {current_best}". If current_best worse than baseline → `git reset --hard {best_commit}`, log "reverted to best-known" |
331
+ | **Plateau** | `cycles_without_improvement >= 3` | Pause normal cycle → launch probes (5.9.4) |
332
+ | **Circuit breaker** | `consecutive_reverts >= 3` | Halt, task stays `[ ]`, log "circuit breaker: 3 consecutive reverts". Requires human intervention. |
333
+
334
+ On **max cycles** with final value worse than baseline:
335
+ 1. `git reset --hard {best_commit}` in worktree
336
+ 2. Log: "final value {current} worse than baseline {baseline}, reverted to best-known commit {best_commit} (value: {current_best})"
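The stop-condition table can be collapsed into one dispatcher, checked in the order listed (a sketch over the `optimize_state` fields from 5.9.1):

```python
def stop_condition(state, value):
    """Return the first triggered stop condition, or None to keep cycling."""
    target, direction = state["target"], state["direction"]
    if (direction == "higher" and value >= target) or \
       (direction == "lower" and value <= target):
        return "target_reached"
    if state["cycles_run"] >= state["max_cycles"]:
        return "max_cycles"
    if state["cycles_without_improvement"] >= 3:
        return "plateau"          # pause cycle, launch probes (5.9.4)
    if state["consecutive_reverts"] >= 3:
        return "circuit_breaker"  # halt, requires human intervention
    return None
```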
337
+
338
+ #### 5.9.4. PLATEAU → PROBE LAUNCH
339
+
340
+ When plateau detected (3 cycles without ≥1% improvement):
341
+
342
+ 1. Pause normal optimize cycle
343
+ 2. Determine probe count from `probe_scale` (section 5.7.1 auto-scaling table): 0→2, 2→4, 4→6
344
+ 3. Update `probe_scale` in optimize_state
345
+ 4. Record `BASELINE=$(git rev-parse HEAD)` in shared worktree
346
+ 5. Create sub-worktrees per probe: `git worktree add -b df/{spec}--opt-probe-{N} .deepflow/worktrees/{spec}/opt-probe-{N} ${BASELINE}`
347
+ 6. Spawn ALL probes in ONE message using Optimize Probe prompt (section 6), each with its diversity role
348
+ 7. End turn. Wait for all notifications.
349
+ 8. Per notification: run ratchet + metric measurement in probe worktree
350
+ 9. Select winner (section 5.7 step 5, optimize ranking): best metric improvement toward target
351
+ 10. Winner → cherry-pick into shared worktree, update current_best, reset cycles_without_improvement=0
352
+ 11. Losers → rename branch with `-failed` suffix, preserve worktrees
353
+ 12. Log all probe outcomes to `auto-memory.yaml` under `spike_insights` (reuse existing format)
354
+ 13. Log probe learnings: winning approach summary + each loser's failure reason
355
+ 14. Resume normal optimize cycle from step 1
356
+
357
+ #### 5.9.5. STATE PERSISTENCE (auto-memory.yaml)
358
+
359
+ After every cycle, write `optimize_state` to `.deepflow/auto-memory.yaml` (main tree). This ensures:
360
+ - Context exhaustion at 50% → auto-cycle resumes with full history
361
+ - Failed hypotheses carry forward (agents won't repeat approaches)
362
+ - Probe scale persists across context windows
363
+
364
+ Also append cycle results to `.deepflow/auto-report.md`:
365
+ ```
366
+ ## Optimize: T{n} — {metric_name}
367
+ | Cycle | Value | Delta | Kept | Commit |
368
+ |-------|-------|-------|------|--------|
369
+ | 1 | 72.3 | — | baseline | abc123 |
370
+ | 2 | 74.1 | +2.5% | ✓ | def456 |
371
+ | 3 | 73.8 | -0.4% | ✗ | (reverted) |
372
+ ...
373
+ Best: {current_best} | Target: {target} | Status: {in_progress|reached|max_cycles|circuit_breaker}
374
+ ```
375
+
203
376
  ---
204
377
 
205
378
  ### 6. PER-TASK (agent prompt)
@@ -296,6 +469,91 @@ Implement minimal spike to validate hypothesis.
296
469
  Commit as spike({spec}): {description}
297
470
  ```
298
471
 
472
+ **Optimize Task** (spawn with `Agent(model="opus", subagent_type="general-purpose")`):
473
+
474
+ One agent per cycle. Agent makes ONE atomic change to improve the metric.
475
+
476
+ ```
477
+ --- START (high attention zone) ---
478
+
479
+ {task_id} [OPTIMIZE]: Improve {metric_name} — cycle {N}/{max_cycles}
480
+ Files: {target files} Spec: {spec_name}
481
+
482
+ Current metric: {current_value} (baseline: {baseline}, best: {current_best})
483
+ Target: {target} ({direction})
484
+ Improvement needed: {delta_to_target} ({direction})
485
+
486
+ CONSTRAINT: Make exactly ONE atomic change. Do not refactor broadly.
487
+ The metric is measured by: {metric_command}
488
+ You succeed if the metric moves toward {target} after your change.
489
+
490
+ --- MIDDLE (navigable data zone) ---
491
+
492
+ Attempt history (last 5 cycles):
493
+ {For each recent history entry:}
494
+ - Cycle {N}: {value} ({+/-delta}%) — {kept|reverted} — "{one-line description of what was tried}"
495
+ {Omit if cycle 1.}
496
+
497
+ DO NOT repeat these failed approaches:
498
+ {For each failed_hypothesis in optimize_state:}
499
+ - "{hypothesis description}"
500
+ {Omit if no failed hypotheses.}
501
+
502
+ {Impact block from PLAN.md if present}
503
+
504
+ {Dependency context if present}
505
+
506
+ Steps:
507
+ 1. Analyze the metric command to understand what's being measured
508
+ 2. Read the target files and identify ONE specific improvement
509
+ 3. Implement the change (ONE atomic modification)
510
+ 4. Commit as feat({spec}): optimize {metric_name} — {what you changed}
511
+
512
+ --- END (high attention zone) ---
513
+
514
+ {Spike/probe learnings if any}
515
+
516
+ Your ONLY job is to make ONE atomic change and commit. Orchestrator measures the metric after.
517
+ Do NOT run the metric command yourself. Do NOT make multiple changes.
518
+ STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main.
519
+ ```
520
+
521
+ **Optimize Probe Task** (spawn with `Agent(model="opus", subagent_type="general-purpose")`):
522
+
523
+ Used during plateau resolution. Each probe has a diversity role.
524
+
525
+ ```
526
+ --- START (high attention zone) ---
527
+
528
+ {task_id} [OPTIMIZE PROBE]: {metric_name} — probe {probe_id} ({role_label})
529
+ Files: {target files} Spec: {spec_name}
530
+
531
+ Current metric: {current_value} (baseline: {baseline}, best: {current_best})
532
+ Target: {target} ({direction})
533
+
534
+ Role: {role_label}
535
+ {role_instruction — one of:}
536
+ contextualizada: "Build on the best approach so far: {best_approach_summary}. Refine, extend, or combine what worked."
537
+ contraditoria: "The best approach so far was: {best_approach_summary}. Try the OPPOSITE — if it optimized X, try Y instead. Challenge the current direction."
538
+ ingenua: "Ignore all prior attempts. Approach this metric from scratch with no assumptions about what has or hasn't worked."
539
+
540
+ --- MIDDLE (navigable data zone) ---
541
+
542
+ Full attempt history:
543
+ {ALL history entries from optimize_state}
544
+ - Cycle {N}: {value} ({+/-delta}%) — {kept|reverted}
545
+
546
+ All failed approaches (DO NOT repeat):
547
+ {ALL failed_hypotheses}
548
+ - "{hypothesis description}"
549
+
550
+ --- END (high attention zone) ---
551
+
552
+ Make ONE atomic change that moves the metric toward {target}.
553
+ Commit as feat({spec}): optimize probe {probe_id} — {what you changed}
554
+ STOP after committing.
555
+ ```
556
+
299
557
  ### 8. COMPLETE SPECS
300
558
 
301
559
  When all tasks done for a `doing-*` spec:
@@ -370,6 +628,12 @@ When task fails ratchet and is reverted:
370
628
  | All probe worktrees preserved | Losers renamed `-failed`; never deleted |
371
629
  | Machine-selected winner | Regressions > coverage > files changed; no LLM judge |
372
630
  | External APIs → chub first | Skip if unavailable |
631
+ | 1 optimize task at a time | Inherently sequential — no parallel optimize tasks |
632
+ | Optimize = atomic changes only | One modification per cycle for diagnosability |
633
+ | Ratchet + metric = both required | Optimize keeps commit only if ratchet AND metric improve |
634
+ | Plateau → probes, not more cycles | 3 cycles without ≥1% improvement triggers probe launch |
635
+ | Circuit breaker = 3 consecutive reverts | Halts optimize loop, requires human intervention |
636
+ | Optimize probes need diversity | Every probe set: ≥1 contraditoria + ≥1 ingenua minimum |
373
637
 
374
638
  ## Example
375
639
 
@@ -1,3 +1,8 @@
1
+ ---
2
+ name: df:note
3
+ description: Capture decisions that emerged during free conversations outside of deepflow commands
4
+ ---
5
+
1
6
  # /df:note — Capture Decisions from Free Conversations
2
7
 
3
8
  ## Orchestrator Role
@@ -1,3 +1,8 @@
1
+ ---
2
+ name: df:plan
3
+ description: Compare specs against codebase and past experiments, generate prioritized tasks
4
+ ---
5
+
1
6
  # /df:plan — Generate Task Plan from Specs
2
7
 
3
8
  ## Purpose
@@ -32,6 +37,10 @@ Load: specs/*.md (exclude doing-*/done-*), PLAN.md (if exists), .deepflow/config
32
37
  Determine source_dir from config or default to src/
33
38
  ```
34
39
 
40
+ Shell injection (use output directly — no manual file reads needed):
41
+ - `` !`ls specs/*.md 2>/dev/null || echo 'NOT_FOUND'` ``
42
+ - `` !`cat PLAN.md 2>/dev/null || echo 'NOT_FOUND'` ``
43
+
35
44
  Run `validateSpec` on each spec. Hard failures → skip + error. Advisory → include in output.
36
45
  No new specs → report counts, suggest `/df:execute`.
37
46
 
@@ -58,20 +67,7 @@ Full implementation tasks BLOCKED until spike validates. See `templates/experime
58
67
 
59
68
  Identify code style, patterns (error handling, API structure), integration points. Include in task descriptions.
60
69
 
61
- ### 4. ANALYZE CODEBASE
62
-
63
- Follow `templates/explore-agent.md` for spawn rules and scope.
64
-
65
- | File Count | Agents |
66
- |------------|--------|
67
- | <20 | 3-5 |
68
- | 20-100 | 10-15 |
69
- | 100-500 | 25-40 |
70
- | 500+ | 50-100 (cap) |
71
-
72
- Use `code-completeness` skill to search for: implementations matching spec requirements, TODOs/FIXMEs/HACKs, stubs, skipped tests.
73
-
74
- ### 4.5. IMPACT ANALYSIS (per planned file)
70
+ ### 4. IMPACT ANALYSIS (per planned file)
75
71
 
76
72
  For each file in a task's "Files:" list, find the full blast radius.
77
73
 
@@ -99,6 +95,16 @@ For each file in a task's "Files:" list, find the full blast radius.
99
95
  Files outside original "Files:" → add with `(impact — verify/update)`.
100
96
  Skip for spike tasks.
101
97
 
98
+ ### 4.5. TARGETED EXPLORATION
99
+
100
+ Follow `templates/explore-agent.md` for spawn rules and scope. Explore agents cover **what LSP did not reveal**: conventions, dead code, implicit patterns.
101
+
102
+ | Finding Type | Agents |
103
+ |--------------|--------|
104
+ | Post-LSP gaps | 3-5 |
105
+
106
+ Use `code-completeness` skill to search for: implementations matching spec requirements, TODOs/FIXMEs/HACKs, stubs, skipped tests.
107
+
102
108
  ### 4.6. CROSS-TASK FILE CONFLICT DETECTION
103
109
 
104
110
  After all tasks have their `Files:` lists, detect overlaps that require sequential execution.
@@ -133,6 +139,17 @@ Spawn `Task(subagent_type="reasoner", model="opus")`. Map each requirement to DO
133
139
 
134
140
  Priority: Dependencies → Impact → Risk
135
141
 
142
+ #### Metric AC Detection
143
+
144
+ While comparing requirements, scan each spec AC for the pattern `{metric} {operator} {number}[unit]`:
145
+
146
+ - **Pattern examples**: `coverage > 85%`, `latency < 200ms`, `p99_latency <= 150ms`, `bundle_size < 500kb`
147
+ - **Operators**: `>`, `<`, `>=`, `<=`, `==`
148
+ - **Number**: float or integer, optional unit suffix (%, ms, kb, mb, s, etc.)
149
+ - **On match**: flag the AC as a **metric AC** and generate an `Optimize:` task (see section 6.5)
150
+ - **Non-match**: treat as standard functional AC → standard implementation task
151
+ - **Ambiguous ACs** (qualitative terms like "fast", "small", "improved"): flag as spec gap, request numeric threshold before planning
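A minimal detection sketch, assuming this regex and unit list (real ACs may need looser matching):

```python
import re

# {metric} {operator} {number}[unit] -- e.g. "coverage > 85%", "p99_latency <= 150ms"
METRIC_AC = re.compile(
    r"(?P<metric>[A-Za-z_][\w.]*)\s*"
    r"(?P<op>>=|<=|==|>|<)\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>%|ms|s|kb|mb)?", re.IGNORECASE)

def detect_metric_ac(ac_text):
    """Return {metric, op, target, direction, unit} for a metric AC, else None."""
    m = METRIC_AC.search(ac_text)
    if not m:
        return None
    op = m.group("op")
    direction = "higher" if op in (">", ">=", "==") else "lower"  # == -> higher by convention
    return {"metric": m.group("metric"), "op": op,
            "target": float(m.group("value")),
            "direction": direction, "unit": m.group("unit")}
```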
152
+
136
153
  ### 5.5. CLASSIFY MODEL + EFFORT PER TASK
137
154
 
138
155
  For each task, assign `Model:` and `Effort:` based on the routing matrix:
@@ -148,6 +165,7 @@ For each task, assign `Model:` and `Effort:` based on the routing matrix:
148
165
  | Bug fix (clear repro) | `sonnet` | `medium` | Diagnosis done, just apply fix |
149
166
  | Bug fix (unclear cause) | `sonnet` | `high` | Needs reasoning to find root cause |
150
167
  | Spike / validation | `sonnet` | `high` | Scoped but needs reasoning to validate hypothesis |
168
+ | Optimize (metric AC) | `opus` | `high` | Multi-cycle, ambiguous — best strategy changes per iteration |
151
169
  | Feature work (well-specced) | `sonnet` | `medium` | Clear ACs reduce thinking overhead |
152
170
  | Feature work (ambiguous ACs) | `opus` | `medium` | Needs intelligence but effort can be moderate with good specs |
153
171
  | Refactor (>5 files, many callers) | `opus` | `medium` | Blast radius needs intelligence, patterns are repetitive |
@@ -224,6 +242,40 @@ Before output, verify: ≥2 opposing probes, ≥1 naive, all independent.
224
242
  - Success criteria: DB queries drop ≥80% with zero cache infrastructure
225
243
  ```
226
244
 
245
+ ### 6.5. GENERATE OPTIMIZE TASKS (FROM METRIC ACs)
246
+
247
+ For each metric AC detected in section 5, generate an `Optimize:` task using this format:
248
+
249
+ **Optimize Task Format:**
250
+ ```markdown
251
+ - [ ] **T{n}** [OPTIMIZE]: Improve {metric_name} to {target}
252
+ - Type: optimize
253
+ - Files: {primary files likely to affect the metric}
254
+ - Optimize:
255
+ metric: "{shell command that outputs a single number}"
256
+ target: {number}
257
+ direction: higher|lower
258
+ max_cycles: {number, default 20}
259
+ secondary_metrics:
260
+ - metric: "{shell command}"
261
+ name: "{label}"
262
+ regression_threshold: 5%
263
+ - Model: opus
264
+ - Effort: high
265
+ - Blocked by: {spike T{n} if applicable, else none}
266
+ ```
267
+
268
+ **Field rules:**
269
+ - `metric`: a shell command returning a single scalar float/integer (e.g., `npx jest --coverage --json | jq '.coverageMap | .. | .pct? | numbers' | awk '{sum+=$1;n++} END{print sum/n}'`). Must be deterministic and side-effect free.
270
+ - `target`: the numeric threshold extracted from the AC (strip unit suffix for the value; note unit in task description)
271
+ - `direction`: `higher` if operator is `>` or `>=`; `lower` if `<` or `<=`; `higher` by convention for `==`
272
+ - `max_cycles`: from spec if stated; default 20
273
+ - `secondary_metrics`: other metrics from the same spec that could regress (e.g., build time, bundle size, test count). Omit if none.
274
+
275
+ **Model/Effort**: always `opus` / `high` (see routing matrix).
276
+
277
+ **Blocking**: if a spike exists for the same area, block the optimize task on the spike passing.
278
+
227
279
  ### 7. VALIDATE HYPOTHESES
228
280
 
229
281
  Unfamiliar APIs or performance-critical → prototype in scratchpad. Fails → write `--failed.md`. Skip for known patterns.
@@ -250,7 +302,7 @@ Report: `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`
250
302
 
251
303
  | Agent | Model | Base | Scale |
252
304
  |-------|-------|------|-------|
253
- | Explore | haiku | 10 | +1 per 20 files |
305
+ | Explore | haiku | 3-5 | none |
254
306
  | Reasoner | opus | 5 | +1 per 2 specs |
255
307
 
256
308
  Always use `Task` tool with explicit `subagent_type` and `model`.
@@ -280,3 +332,25 @@ Always use `Task` tool with explicit `subagent_type` and `model`.
280
332
  - Model: opus
281
333
  - Blocked by: T1, T2
282
334
  ```
335
+
336
+ **Optimize task example** (from spec AC: `coverage > 85%`):
337
+
338
+ ```markdown
339
+ ### doing-quality
340
+
341
+ - [ ] **T1** [OPTIMIZE]: Improve test coverage to >85%
342
+ - Type: optimize
343
+ - Files: src/
344
+ - Optimize:
345
+ metric: "npx jest --coverage --json 2>/dev/null | jq '[.. | .pct? | numbers] | add / length'"
346
+ target: 85
347
+ direction: higher
348
+ max_cycles: 20
349
+ secondary_metrics:
350
+ - metric: "npx jest --json 2>/dev/null | jq '.testResults | length'"
351
+ name: test_count
352
+ regression_threshold: 5%
353
+ - Model: opus
354
+ - Effort: high
355
+ - Blocked by: none
356
+ ```
@@ -1,3 +1,9 @@
1
+ ---
2
+ name: df:resume
3
+ description: Synthesize project state into a briefing covering what happened, current decisions, and next steps
4
+ allowed-tools: [Read, Grep, Glob, Bash]
5
+ ---
6
+
1
7
  # /df:resume — Session Continuity Briefing
2
8
 
3
9
  ## Orchestrator Role
@@ -28,11 +34,11 @@ Read these sources in parallel (all reads, no writes):
28
34
 
29
35
  | Source | Command/Path | Purpose |
30
36
  |--------|-------------|---------|
31
- | Git timeline | `git log --oneline -20` | What changed and when |
32
- | Decisions | `.deepflow/decisions.md` | Current [APPROACH], [PROVISIONAL], [ASSUMPTION] entries |
33
- | Plan | `PLAN.md` | Task status (checked vs unchecked) |
34
- | Spec headers | `specs/doing-*.md` (first 20 lines each) | What features are in-flight |
35
- | Experiments | `.deepflow/experiments/` (file listing + names) | Validated and failed approaches |
37
+ | Git timeline | `!`git log --oneline -20`` | What changed and when |
38
+ | Decisions | `!`cat .deepflow/decisions.md 2>/dev/null \|\| echo 'NOT_FOUND'`` | Current [APPROACH], [PROVISIONAL], [ASSUMPTION] entries |
39
+ | Plan | `!`cat PLAN.md 2>/dev/null \|\| echo 'NOT_FOUND'`` | Task status (checked vs unchecked) |
40
+ | Spec headers | `!`head -20 specs/doing-*.md 2>/dev/null \|\| echo 'NOT_FOUND'`` | What features are in-flight |
41
+ | Experiments | `!`ls .deepflow/experiments/ 2>/dev/null \|\| echo 'NOT_FOUND'`` | Validated and failed approaches |
36
42
 
37
43
  **Token budget:** Read only what's needed — ~2500 tokens total across all sources.
38
44
 
@@ -1,3 +1,8 @@
1
+ ---
2
+ name: df:spec
3
+ description: Transform conversation context into a structured specification file with requirements and acceptance criteria
4
+ ---
5
+
1
6
  # /df:spec — Generate Spec from Conversation
2
7
 
3
8
  ## Orchestrator Role
@@ -1,3 +1,8 @@
1
+ ---
2
+ name: df:update
3
+ description: Update or uninstall deepflow, check installed version
4
+ ---
5
+
1
6
  # /df:update — Update deepflow
2
7
 
3
8
  ## Update
@@ -1,3 +1,9 @@
1
+ ---
2
+ name: df:verify
3
+ description: Check that implemented code satisfies spec requirements and acceptance criteria through machine-verifiable checks
4
+ context: fork
5
+ ---
6
+
1
7
  # /df:verify — Verify Specs Satisfied
2
8
 
3
9
  ## Purpose
@@ -25,7 +31,7 @@ specs/
25
31
 
26
32
  ### 1. LOAD CONTEXT
27
33
 
28
- Load: `specs/doing-*.md`, `PLAN.md`, source code. Load `specs/done-*.md` only if `--re-verify`.
34
+ Load: `!`ls specs/doing-*.md 2>/dev/null || echo 'NOT_FOUND'``, `!`cat PLAN.md 2>/dev/null || echo 'NOT_FOUND'``, source code. Load `specs/done-*.md` only if `--re-verify`.
29
35
 
30
36
  **Readiness check:** For each `doing-*` spec, check PLAN.md:
31
37
  - All tasks `[x]` → ready (proceed)
@@ -35,7 +41,7 @@ If no `doing-*` specs found: report counts, suggest `/df:execute`.
35
41
 
36
42
  ### 1.5. DETECT PROJECT COMMANDS
37
43
 
38
- **Config override always wins.** If `.deepflow/config.yaml` has `quality.test_command` or `quality.build_command`, use those.
44
+ **Config override always wins.** If `!`cat .deepflow/config.yaml 2>/dev/null || echo 'NOT_FOUND'`` has `quality.test_command` or `quality.build_command`, use those.
39
45
 
40
46
  **Auto-detection (first match wins):**
41
47
 
@@ -45,8 +45,9 @@ Task: T1
45
45
  1. Implement task completely
46
46
  2. Verify it works (tests, types, lint)
47
47
  3. Stage specific files (`git add {files}`, not `-A`)
48
- 4. Commit with proper format
49
- 5. Return hash
48
+ 4. Read staged changes: !`git diff --cached --stat 2>/dev/null || echo 'NOT_FOUND'`
49
+ 5. Commit with proper format
50
+ 6. Return hash
50
51
 
51
52
  ## Pre-Commit
52
53
 
@@ -1,6 +1,8 @@
1
1
  ---
2
2
  name: browse-fetch
3
3
  description: Fetches live web content using headless Chromium via Playwright. Use when you need to read documentation, articles, or any public URL that requires JavaScript rendering. Falls back to WebFetch for simple HTML pages.
4
+ context: fork
5
+ allowed-tools: [Bash, WebFetch, WebSearch, Read]
4
6
  ---
5
7
 
6
8
  # Browse-Fetch
@@ -1,6 +1,7 @@
1
1
  ---
2
2
  name: browse-verify
3
3
  description: Verifies UI acceptance criteria by launching a headless browser, extracting the accessibility tree, and evaluating structured assertions deterministically. Use when a spec has browser-based ACs that need automated verification after implementation.
4
+ context: fork
4
5
  ---
5
6
 
6
7
  # Browse-Verify
@@ -1,6 +1,7 @@
1
1
  ---
2
2
  name: code-completeness
3
3
  description: Finds incomplete code in codebase. Use when analyzing for TODOs, stubs, placeholders, skipped tests, or missing implementations. Helps compare specs against actual code state.
4
+ allowed-tools: [Read, Grep, Glob]
4
5
  ---
5
6
 
6
7
  # Code Completeness
@@ -1,6 +1,7 @@
1
1
  ---
2
2
  name: gap-discovery
3
3
  description: Discovers requirement gaps during ideation. Use when user describes features, planning specs, or requirements seem incomplete. Asks clarifying questions about scope, constraints, edge cases, success criteria.
4
+ allowed-tools: [AskUserQuestion, Read]
4
5
  ---
5
6
 
6
7
  # Gap Discovery
@@ -12,21 +12,86 @@ Task(subagent_type="Explore", model="haiku", prompt="Find: ...")
12
12
  # Returns final message only; blocks until all complete; no late notifications
13
13
  ```
14
14
 
15
+ ## Search Protocol
16
+
17
+ Exploration follows three named phases:
18
+
19
+ ### DIVERSIFY
20
+ - **Goal**: Find ALL potential matches across the codebase quickly
21
+ - **Method**: Launch 5–8 parallel tool calls in a single message
22
+ - **Tools**: Glob (broad patterns), Grep (regex searches), Read (file content verification)
23
+ - **Result**: Narrow down to 2–5 candidate files
24
+
25
+ Example: Search for "config" + "settings" + "env" patterns in parallel, not sequentially.
26
+
27
+ ### CONVERGE
28
+ - **Goal**: Validate matches against the search criteria
29
+ - **Method**: Read only the matched files; extract relevant line ranges
30
+ - **Result**: Eliminate false positives, confirm relevance
31
+
32
+ ### EARLY STOP
33
+ - **Goal**: Avoid wasting tokens on exhaustive searches
34
+ - **Rule**: Stop as soon as **>= 2 relevant files found** that answer the question
35
+ - **Exception**: If searching for a single unique thing (e.g., "the entry point file"), find just 1
36
+
15
37
  ## Prompt Structure
16
38
 
17
39
  ```
18
40
  Find: [specific question]
19
41
 
20
42
  Return ONLY:
21
- - File paths matching criteria
22
- - One-line description per file
43
+ - filepath:startLine-endLine -- why relevant
23
44
  - Integration points (if asked)
24
45
 
25
- DO NOT: read/summarize specs, make recommendations, propose solutions, generate tables.
46
+ DO NOT: read/summarize specs, make recommendations, propose solutions, generate tables, narrate your search process.
26
47
 
27
48
  Max response: 500 tokens (configurable via .deepflow/config.yaml explore.max_tokens)
28
49
  ```
29
50
 
51
+ ## Examples
52
+
53
+ ### GOOD: Parallel search (2 turns total)
54
+
55
+ **Turn 1 (DIVERSIFY):**
56
+ ```
57
+ - Glob: "src/**/*config*.ts" (find config-named TS files)
58
+ - Glob: "src/**/*config*.js" (find config-named JS files)
59
+ - Grep: pattern="export.*config", type="ts" (find exports)
60
+ - Grep: pattern="interface.*Config", type="ts" (find type definitions)
61
+ - Grep: pattern="class.*Settings", type="ts" (alternative pattern)
62
+ - Read: src/index.ts (verify entry point structure)
63
+ ```
64
+
65
+ **Turn 2 (CONVERGE):**
66
+ Return only confirmed matches:
67
+ ```
68
+ src/config/app.ts:1-45 -- main config export with environment settings
69
+ src/config/types.ts:10-30 -- Config interface definition
70
+ src/utils/settings.ts:1-20 -- Settings helper functions
71
+ ```
72
+
73
+ ### DO NOT: Sequential search (antipattern, 5+ turns)
74
+
75
+ ```
76
+ Turn 1: Glob for config files
77
+ Turn 2: Read the first file
78
+ Turn 3: Grep for config patterns
79
+ Turn 4: Read results
80
+ Turn 5: Another Grep search
81
+ ... (narrating each step)
82
+ ```
83
+
84
+ This pattern wastes tokens and breaks context efficiency.
85
+
86
+ ## Fallback
87
+
88
+ Search dependency directories **only when the target is not found in app code**:
89
+ - `node_modules/` — npm packages
90
+ - `vendor/` — vendored dependencies
91
+ - `site-packages/` — Python packages
92
+
93
+ Fallback instruction: "Check node_modules/ only if target not found in src/ or lib/"
94
+
30
95
  ## Scope Restrictions
31
96
 
32
97
  MUST only report factual findings: files found, patterns/conventions, integration points.