deepflow 0.1.87 → 0.1.88

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -10,36 +10,26 @@ description: Execute tasks from PLAN.md with agent spawning, ratchet health chec
  You are a coordinator. Spawn agents, run ratchet checks, update PLAN.md. Never implement code yourself.
 
  **NEVER:** Read source files, edit code, use TaskOutput, use EnterPlanMode, use ExitPlanMode
-
- **ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, run ratchet health checks after each agent completes, update PLAN.md, write `.deepflow/decisions.md` in the main tree
+ **ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, run ratchet health checks, update PLAN.md, write `.deepflow/decisions.md`
 
  ## Core Loop (Notification-Driven)
 
- Each task = one background agent. Completion notifications drive the loop.
-
- **NEVER use TaskOutput** — returns full transcripts (100KB+) that explode context.
+ Each task = one background agent. **NEVER use TaskOutput** (100KB+ transcripts explode context).
 
  ```
  1. Spawn ALL wave agents with run_in_background=true in ONE message
- 2. STOP. End your turn. Do NOT poll or monitor.
+ 2. STOP. End turn. Do NOT poll.
  3. On EACH notification:
- a. Run ratchet check (section 5.5)
+ a. Ratchet check (§5.5)
  b. Passed → TaskUpdate(status: "completed"), update PLAN.md [x] + commit hash
- c. Failed → run partial salvage protocol (section 5.5). If salvaged treat as passed. If not → git revert, TaskUpdate(status: "pending")
- d. Report ONE line: "✓ T1: ratchet passed (abc123)" or "⚕ T1: salvaged lint fix (abc124)" or "✗ T1: ratchet failed, reverted"
+ c. Failed → partial salvage (§5.5). Salvaged → passed. Not → git revert, TaskUpdate(status: "pending")
+ d. Report ONE line: "✓ T1: ratchet passed (abc123)" or "⚕ T1: salvaged (abc124)" or "✗ T1: reverted"
  e. NOT all done → end turn, wait | ALL done → next wave or finish
- 4. Between waves: check context %. If ≥50% → checkpoint and exit.
+ 4. Between waves: context ≥50% → checkpoint and exit.
  5. Repeat until: all done, all blocked, or context ≥50%.
  ```
 
- ## Context Threshold
-
- Statusline writes `.deepflow/context.json`: `{"percentage": 45}`
-
- | Context % | Action |
- |-----------|--------|
- | < 50% | Full parallelism (up to 5 agents) |
- | ≥ 50% | Wait for running agents, checkpoint, exit |
+ **Context threshold:** Statusline writes `.deepflow/context.json`: `{"percentage": 45}`. <50% = full parallelism (up to 5). ≥50% = wait, checkpoint, exit.
 
  ---
 
@@ -47,109 +37,51 @@ Statusline writes `.deepflow/context.json`: `{"percentage": 45}`
 
  ### 1. CHECK CHECKPOINT
 
- ```
- --continue → Load .deepflow/checkpoint.json from worktree
- → Verify worktree exists on disk (else error: "Use --fresh")
- → Skip completed tasks, resume execution
- --fresh → Delete checkpoint, start fresh
- checkpoint exists → Prompt: "Resume? (y/n)"
- else → Start fresh
- ```
-
- Shell injection (use output directly — no manual file reads needed):
- - `` !`cat .deepflow/checkpoint.json 2>/dev/null || echo 'NOT_FOUND'` ``
- - `` !`git diff --quiet && echo 'CLEAN' || echo 'DIRTY'` ``
+ `--continue` → load `.deepflow/checkpoint.json`, verify worktree exists (else error "Use --fresh"), skip completed. `--fresh` → delete checkpoint. Checkpoint exists → prompt "Resume? (y/n)".
+ Shell: `` !`cat .deepflow/checkpoint.json 2>/dev/null || echo 'NOT_FOUND'` `` / `` !`git diff --quiet && echo 'CLEAN' || echo 'DIRTY'` ``
 
  ### 1.5. CREATE WORKTREE
 
- Require clean HEAD (`git diff --quiet`). Derive SPEC_NAME from `specs/doing-*.md`.
- Create worktree: `.deepflow/worktrees/{spec}` on branch `df/{spec}`.
- Reuse if exists. `--fresh` deletes first.
-
- If `worktree.sparse_paths` is non-empty in config, enable sparse checkout:
- ```bash
- git worktree add --no-checkout -b df/{spec} .deepflow/worktrees/{spec}
- cd .deepflow/worktrees/{spec}
- git sparse-checkout set {sparse_paths...}
- git checkout df/{spec}
- ```
+ Require clean HEAD. Derive SPEC_NAME from `specs/doing-*.md`. Create `.deepflow/worktrees/{spec}` on branch `df/{spec}`. Reuse if exists; `--fresh` deletes first. If `worktree.sparse_paths` non-empty: `git worktree add --no-checkout`, `sparse-checkout set {paths}`, checkout.
 
  ### 1.6. RATCHET SNAPSHOT
 
- Snapshot pre-existing test files in worktree — only these count for ratchet (agent-created tests excluded):
-
+ Snapshot pre-existing test files — only these count for ratchet (agent-created excluded):
  ```bash
- cd ${WORKTREE_PATH}
- git ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' \
- > .deepflow/auto-snapshot.txt
+ git -C ${WORKTREE_PATH} ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' > .deepflow/auto-snapshot.txt
  ```
 
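The snapshot filter can be exercised on sample paths. A hedged sketch: the filenames below are invented; the regex is the one from the snapshot command:

```shell
# which of these sample paths count as pre-existing test files?
matched=$(printf '%s\n' src/app.test.ts src/utils_test.go tests/util.py test_auth.py src/main.ts \
  | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/')
echo "$matched"   # everything except src/main.ts matches
```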
  ### 1.7. NO-TESTS BOOTSTRAP
 
- If snapshot has zero test files:
-
- 1. Spawn ONE bootstrap agent (section 6 Bootstrap Task) to write tests for `edit_scope` files
- 2. On ratchet pass: re-snapshot, report `"bootstrap: completed"`, end cycle (no PLAN.md tasks this cycle)
- 3. On ratchet fail: revert, halt with "Bootstrap failed — manual intervention required"
-
- Subsequent cycles use bootstrapped tests as ratchet baseline.
+ Zero test files → spawn ONE bootstrap agent (§6 Bootstrap). Pass → re-snapshot, end cycle. Fail → revert, halt "Bootstrap failed — manual intervention required". Subsequent cycles use bootstrapped tests as baseline.
 
  ### 2. LOAD PLAN
 
- ```
- Load: PLAN.md (required), specs/doing-*.md, .deepflow/config.yaml
- If missing: "No PLAN.md found. Run /df:plan first."
- ```
-
- Shell injection (use output directly — no manual file reads needed):
- - `` !`cat .deepflow/checkpoint.json 2>/dev/null || echo 'NOT_FOUND'` ``
- - `` !`git diff --quiet && echo 'CLEAN' || echo 'DIRTY'` ``
+ Load PLAN.md (required), specs/doing-*.md, .deepflow/config.yaml. Missing → "No PLAN.md found. Run /df:plan first."
 
  ### 2.5. REGISTER NATIVE TASKS
 
- For each `[ ]` task in PLAN.md: `TaskCreate(subject: "{task_id}: {description}", activeForm: "{gerund}", description: full block)`. Store task_id → native ID mapping. Set dependencies via `TaskUpdate(addBlockedBy: [...])`. On `--continue`: only register remaining `[ ]` items.
-
- ### 3. CHECK FOR UNPLANNED SPECS
+ For each `[ ]` task: `TaskCreate(subject: "{task_id}: {description}", activeForm: "{gerund}", description: full block)`. Store task_id → native ID. Set deps via `TaskUpdate(addBlockedBy: [...])`. `--continue` → only remaining `[ ]` items.
 
- Warn if `specs/*.md` (excluding doing-/done-) exist. Non-blocking.
+ ### 3–4. READY TASKS
 
- ### 4. IDENTIFY READY TASKS
-
- Ready = TaskList where status: "pending" AND blockedBy: empty.
+ Warn if unplanned `specs/*.md` (excluding doing-/done-) exist (non-blocking). Ready = TaskList where status: "pending" AND blockedBy: empty.
 
  ### 5. SPAWN AGENTS
 
- Context ≥50%: checkpoint and exit.
-
- Before spawning: `TaskUpdate(taskId: native_id, status: "in_progress")` — activates UI spinner.
-
- **Token tracking — record start:**
- ```
- start_percentage = !`grep -o '"percentage":[0-9]*' .deepflow/context.json 2>/dev/null | grep -o '[0-9]*' || echo ''`
- start_timestamp = !`date -u +%Y-%m-%dT%H:%M:%SZ`
- ```
- Store both values in memory (keyed by task_id) for use after ratchet completes. Omit if context.json unavailable.
-
- **NEVER use `isolation: "worktree"` on Task calls.** Deepflow manages a shared worktree so wave 2 sees wave 1 commits.
+ Context ≥50% → checkpoint and exit. Before spawning: `TaskUpdate(status: "in_progress")`.
 
- **Spawn ALL ready tasks in ONE message** EXCEPT file conflicts (see below).
+ **Token tracking start:** Store `start_percentage` (from context.json) and `start_timestamp` (ISO 8601) keyed by task_id. Omit if unavailable.
 
- **File conflict enforcement (1 file = 1 writer):**
- Before spawning, check `Files:` lists of all ready tasks. If two+ ready tasks share a file:
- 1. Sort conflicting tasks by task number (T1 < T2 < T3)
- 2. Spawn only the lowest-numbered task from each conflict group
- 3. Remaining tasks stay `pending` — they become ready once the spawned task completes
- 4. Log: `"⏳ T{N} deferred — file conflict with T{M} on {file}"`
+ **NEVER use `isolation: "worktree"`.** Deepflow manages a shared worktree so wave 2 sees wave 1 commits. **Spawn ALL ready tasks in ONE message** except file conflicts.
 
- **≥2 [SPIKE] tasks for same problem:** Follow Parallel Spike Probes (section 5.7).
+ **File conflicts (1 file = 1 writer):** Check `Files:` lists. Overlap → spawn lowest-numbered only; rest stay pending. Log: `"⏳ T{N} deferred — file conflict with T{M} on {file}"`
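The one-writer rule can be sketched with plain sort/awk. A hedged sketch: task numbers and file names below are invented:

```shell
# "task_number file" pairs standing in for the ready tasks' Files: lists
cat > ready.txt <<'EOF'
1 src/auth.ts
2 src/auth.ts
3 src/db.ts
EOF
# group by file, keep the lowest task number as the writer, defer the rest
sort -k2,2 -k1,1n ready.txt | awk '!seen[$2]++ { print "spawn T" $1; next } { print "defer T" $1 }'
```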
 
- **[OPTIMIZE] tasks:** Follow Optimize Cycle (section 5.9). Only ONE optimize task runs at a time — defer others until the active one completes.
+ **≥2 [SPIKE] tasks same problem →** Parallel Spike Probes (§5.7). **[OPTIMIZE] tasks →** Optimize Cycle (§5.9), one at a time.
 
  ### 5.5. RATCHET CHECK
 
- After each agent completes, run health checks in the worktree.
-
- **Auto-detect commands:**
+ Run health checks in worktree after each agent completes.
 
  | File | Build | Test | Typecheck | Lint |
  |------|-------|------|-----------|------|
@@ -158,555 +90,194 @@ After each agent completes, run health checks in the worktree.
  | `Cargo.toml` | `cargo build` | `cargo test` | — | `cargo clippy` |
  | `go.mod` | `go build ./...` | `go test ./...` | — | `go vet ./...` |
 
- Run Build → Test → Typecheck → Lint (stop on first failure).
-
- **Edit scope validation** (if spec declares `edit_scope`): check `git diff HEAD~1 --name-only` against allowed globs. Violations → `git revert HEAD --no-edit`, report "Edit scope violation: {files}".
-
- **Impact completeness check** (if task has Impact block in PLAN.md):
- Compare `git diff HEAD~1 --name-only` against Impact callers/duplicates list.
- File listed but not modified → **advisory warning**: "Impact gap: {file} listed as {caller|duplicate} but not modified — verify manually". Not auto-revert (callers sometimes don't need changes), but flags the risk.
-
- **Metric gate (Optimize tasks only):**
-
- After ratchet passes, if the current task has an `Optimize:` block, run the metric gate:
-
- 1. Run the `metric` shell command in the worktree: `cd ${WORKTREE_PATH} && eval "${metric_command}"`
- 2. Parse output as float. Non-numeric output → cycle failure (revert, log "metric parse error: {raw output}")
- 3. Compare against previous measurement using `direction`:
- - `direction: higher` → new value must be > previous + (previous × min_improvement_threshold)
- - `direction: lower` → new value must be < previous - (previous × min_improvement_threshold)
- 4. Both ratchet AND metric improvement required → keep commit
- 5. Ratchet passes but metric did not improve → revert (log "ratchet passed but metric stagnant/regressed: {old} → {new}")
- 6. Run each `secondary_metrics` command, parse as float. If regression > `regression_threshold` (default 5%) compared to baseline: append WARNING to `.deepflow/auto-report.md`: `"WARNING: {name} regressed {delta}% ({baseline_val} → {new_val}) at cycle {N}"`. Do NOT auto-revert.
+ Run Build → Test → Typecheck → Lint (stop on first failure). Ratchet uses ONLY pre-existing tests from `.deepflow/auto-snapshot.txt`.
 
- **Output Truncation:**
-
- After ratchet checks complete, truncate command output for context efficiency:
-
- - **Success (all checks passed):** Suppress output entirely — do not include build/test/lint output in reports
- - **Build failure:** Include last 15 lines of build error only
- - **Test failure:** Include failed test name(s) + last 20 lines of test output
- - **Typecheck/lint failure:** Include error count + first 5 errors only
-
- **Token tracking — write result (on ratchet pass):**
-
- After all checks pass, compute and write the token block to `.deepflow/results/T{N}.yaml`:
-
- ```
- end_percentage = !`grep -o '"percentage":[0-9]*' .deepflow/context.json 2>/dev/null | grep -o '[0-9]*' || echo ''`
- ```
+ **Edit scope validation:** `git diff HEAD~1 --name-only` vs allowed globs. Violation → revert, report.
+ **Impact completeness:** diff vs Impact callers/duplicates. Gap → advisory warning (no revert).
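The scope check can be sketched as a simple filter. A hedged sketch: the changed-file list stands in for `git diff HEAD~1 --name-only`, and the allowed globs are invented examples:

```shell
# stand-in for `git diff HEAD~1 --name-only` output
changed='src/auth/token.ts
docs/README.md'
# anything NOT matching an allowed prefix is a violation (prefixes are examples)
viol=$(printf '%s\n' "$changed" | grep -vE '^(src/auth/|tests/)' || true)
if [ -n "$viol" ]; then
  echo "Edit scope violation: $viol"
fi
```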
 
- Parse `.deepflow/token-history.jsonl` to sum token fields for lines whose `timestamp` falls between `start_timestamp` and `end_timestamp` (ISO 8601 compare):
- ```bash
- awk -v start="REPLACE_start_timestamp" -v end="REPLACE_end_timestamp" '
- {
- ts=""; inp=0; cre=0; rd=0
- if (match($0, /"timestamp":"[^"]*"/)) { ts=substr($0, RSTART+13, RLENGTH-14) }
- if (ts >= start && ts <= end) {
- if (match($0, /"input_tokens":[0-9]+/)) inp=substr($0, RSTART+15, RLENGTH-15)
- if (match($0, /"cache_creation_input_tokens":[0-9]+/)) cre=substr($0, RSTART+30, RLENGTH-30)
- if (match($0, /"cache_read_input_tokens":[0-9]+/)) rd=substr($0, RSTART+26, RLENGTH-26)
- si+=inp; sc+=cre; sr+=rd
- }
- }
- END { printf "{\"input_tokens\":%d,\"cache_creation_input_tokens\":%d,\"cache_read_input_tokens\":%d}\n", si+0, sc+0, sr+0 }
- ' .deepflow/token-history.jsonl 2>/dev/null || echo '{}'
- ```
+ **Metric gate (Optimize only):** Run `eval "${metric_command}"` with cwd=`${WORKTREE_PATH}` (never `cd && eval`). Parse float (non-numeric → revert). Compare using `direction`+`min_improvement_threshold`. Both ratchet AND metric must pass → keep. Ratchet pass + metric stagnant → revert. Secondary metrics: regression > `regression_threshold` (5%) → WARNING in auto-report.md (no revert).
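The threshold comparison can be sketched with awk float arithmetic. A hedged sketch: the metric values and threshold below are illustrative, for `direction: higher`:

```shell
prev=72.3; new=74.1; thr=0.01   # illustrative previous/new metric and 1% min_improvement_threshold
# keep only if new > prev + prev * min_improvement_threshold
if awk -v p="$prev" -v n="$new" -v t="$thr" 'BEGIN { exit !(n > p + p * t) }'; then
  verdict=kept
else
  verdict=reverted
fi
echo "$verdict"
```

For `direction: lower` the comparison flips to `n < p - p * t`.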
 
- Append (or create) `.deepflow/results/T{N}.yaml` with the following block. Use shell injection to read the existing file first:
- ```
- !`cat .deepflow/results/T{N}.yaml 2>/dev/null || echo ''`
- ```
+ **Output truncation:** Success → suppress. Build fail → last 15 lines. Test fail → names + last 20 lines. Typecheck/lint → count + first 5 errors.
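The build-failure case can be sketched with `tail`. A hedged sketch: `build.log` below is a generated stand-in for real tool output:

```shell
seq 1 40 | sed 's/^/error line /' > build.log   # stand-in for a long failing build log
tail -n 15 build.log                            # build failure: keep only the last 15 lines
```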
 
- Write the `tokens` block:
+ **Token tracking result (on pass):** Read `end_percentage`. Sum token fields from `.deepflow/token-history.jsonl` between start/end timestamps (awk ISO 8601 compare). Write to `.deepflow/results/T{N}.yaml`:
  ```yaml
  tokens:
- start_percentage: {start_percentage}
- end_percentage: {end_percentage}
- delta_percentage: {end_percentage - start_percentage}
- input_tokens: {sum from jsonl}
- cache_creation_input_tokens: {sum from jsonl}
- cache_read_input_tokens: {sum from jsonl}
+ start_percentage: {val}
+ end_percentage: {val}
+ delta_percentage: {end - start}
+ input_tokens: {sum}
+ cache_creation_input_tokens: {sum}
+ cache_read_input_tokens: {sum}
  ```
+ Omit if context.json/token-history.jsonl/awk unavailable. Never fail ratchet for tracking errors.
+
+ **Evaluate:** All pass → commit stands. Failure → partial salvage:
+ 1. Lint/typecheck-only (build+tests passed): spawn `Agent(model="haiku")` to fix. Re-ratchet. Fail → revert both.
+ 2. Build/test failure → `git revert HEAD --no-edit` (no salvage).
 
- **Omit entirely if:** context.json was unavailable at start OR end, OR token-history.jsonl is missing, OR awk is unavailable. Never fail the ratchet due to token tracking errors.
+ ### 5.7. PARALLEL SPIKE PROBES
 
- **Evaluate:** All pass + no violations → commit stands. Any failure → attempt partial salvage before reverting:
+ Trigger: ≥2 [SPIKE] tasks with same blocker or identical hypothesis.
 
- **Partial salvage protocol:**
- 1. Run `git diff HEAD~1 --stat` to see what the agent changed
- 2. If failure is lint-only or typecheck-only (build + tests passed):
- - Spawn `Agent(model="haiku", subagent_type="general-purpose")` with prompt: `Fix the {lint|typecheck} errors in the worktree. Only fix what's broken, change nothing else. Files changed: {diff stat}. Error output: {error}`
- - Run ratchet again on the fix commit
- - If passes → both commits stand. If fails → revert both (`git revert --no-edit HEAD~2..HEAD`)
- 3. If failure is build or test → `git revert HEAD --no-edit` (no salvage, too risky)
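Reverting the agent commit plus the failed fix commit in one step can be sketched in a throwaway repo. A hedged sketch: the repo, file, and commit names are invented; `git revert HEAD~2..HEAD` reverts the two newest commits, newest first:

```shell
rm -rf demo && git init -q demo
g() { git -C demo -c user.email=bot@example.com -c user.name=bot "$@"; }
g commit -q --allow-empty -m base
echo one > demo/f.txt && g add f.txt && g commit -qm "agent commit"
echo two > demo/f.txt && g add f.txt && g commit -qm "salvage fix"
g revert --no-edit HEAD~2..HEAD   # reverts "salvage fix", then "agent commit"
test -f demo/f.txt && echo "still present" || echo "both commits reverted"
```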
+ 1. `BASELINE=$(git rev-parse HEAD)` in shared worktree
+ 2. Sub-worktrees per spike: `git worktree add -b df/{spec}--probe-{ID} .deepflow/worktrees/{spec}/probe-{ID} ${BASELINE}`
+ 3. Spawn all probes in ONE message. End turn.
+ 4. Per notification: ratchet (§5.5). Record: ratchet_passed, regressions, coverage_delta, files_changed, commit.
+ 5. **Winner selection** (no LLM judge): disqualify regressions. Standard: fewer regressions > coverage > fewer files > first complete. Optimize: best metric delta > fewer regressions > fewer files. No passes → reset pending for debugger.
+ 6. Preserve all worktrees. Losers: branch + `-failed`. Record in checkpoint.json.
+ 7. Log all outcomes to `.deepflow/auto-memory.yaml` under `spike_insights`+`probe_learnings` (schema in auto-cycle.md). Both winners and losers.
+ 8. Cherry-pick winner into shared worktree. Winner → `[x] [PROBE_WINNER]`, losers → `[~] [PROBE_FAILED]`.
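The standard ranking can be sketched as a sort over recorded probe metrics. A hedged sketch: the probe names and numbers below are invented sample data:

```shell
# columns: probe regressions coverage_delta files_changed (invented sample data)
cat > probes.txt <<'EOF'
A 0 3 5
B 1 9 2
C 0 3 4
EOF
# disqualify regressions, then rank: fewer regressions > higher coverage_delta > fewer files_changed
winner=$(awk '$2 == 0' probes.txt | sort -k2,2n -k3,3nr -k4,4n | head -n 1 | awk '{ print $1 }')
echo "winner: $winner"   # B is disqualified despite best coverage; C beats A on files_changed
```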
 
- Ratchet uses ONLY pre-existing test files from `.deepflow/auto-snapshot.txt`.
+ #### 5.7.1. PROBE DIVERSITY (Optimize Probes)
 
- ### 5.7. PARALLEL SPIKE PROBES
+ Roles: **contextualizada** (refine best), **contraditoria** (opposite of best), **ingenua** (fresh, no context).
 
- Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothesis.
-
- 1. **Baseline:** Record `BASELINE=$(git rev-parse HEAD)` in shared worktree
- 2. **Sub-worktrees:** Per spike: `git worktree add -b df/{spec}--probe-{SPIKE_ID} .deepflow/worktrees/{spec}/probe-{SPIKE_ID} ${BASELINE}`
- 3. **Spawn:** All probes in ONE message, each targeting its probe worktree. End turn.
- 4. **Ratchet:** Per notification, run standard ratchet (5.5) in probe worktree. Record: ratchet_passed, regressions, coverage_delta, files_changed, commit
- 5. **Select winner** (after ALL complete, no LLM judge):
- - Disqualify any with regressions
- - **Standard spikes**: Rank: fewer regressions > higher coverage_delta > fewer files_changed > first to complete
- - **Optimize probes**: Rank: best metric improvement (absolute delta toward target) > fewer regressions > fewer files_changed
- - No passes → reset all to pending for retry with debugger
- 6. **Preserve all worktrees.** Losers: rename branch + `-failed` suffix. Record in checkpoint.json under `"spike_probes"`
- 7. **Log ALL probe outcomes** to `.deepflow/auto-memory.yaml` (main tree):
- ```yaml
- spike_insights:
- - date: "YYYY-MM-DD"
- spec: "{spec_name}"
- spike_id: "SPIKE_A"
- hypothesis: "{from PLAN.md}"
- outcome: "winner"
- approach: "{one-sentence summary of what the winning probe chose}"
- ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
- branch: "df/{spec}--probe-SPIKE_A"
- - date: "YYYY-MM-DD"
- spec: "{spec_name}"
- spike_id: "SPIKE_B"
- hypothesis: "{from PLAN.md}"
- outcome: "failed" # or "passed-but-lost"
- failure_reason: "{first failed check + error summary}"
- ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
- worktree: ".deepflow/worktrees/{spec}/probe-SPIKE_B-failed"
- branch: "df/{spec}--probe-SPIKE_B-failed"
- probe_learnings: # read by /df:auto-cycle each start AND included in per-task preamble
- - spike: "SPIKE_A"
- probe: "probe-SPIKE_A"
- insight: "{one-sentence summary of winning approach — e.g. 'Use Node.js over Bun for Playwright'}"
- - spike: "SPIKE_B"
- probe: "probe-SPIKE_B"
- insight: "{one-sentence summary from failure_reason}"
- ```
- Create file if missing. Preserve existing keys when merging. Log BOTH winners and losers — downstream tasks need to know what was chosen, not just what failed.
- 8. **Promote winner:** Cherry-pick into shared worktree. Winner → `[x] [PROBE_WINNER]`, losers → `[~] [PROBE_FAILED]`. Resume standard loop.
-
- #### 5.7.1. PROBE DIVERSITY ENFORCEMENT (Optimize Probes)
-
- When spawning probes for optimize plateau resolution, enforce diversity roles:
-
- **Role definitions:**
- - **contextualizada**: Builds on the best approach so far — refines, extends, or combines what worked. Prompt includes: "Build on the best result so far: {best_approach_summary}. Refine or extend it."
- - **contraditoria**: Tries the opposite of the current best. Prompt includes: "The best approach so far is {best_approach_summary}. Try the OPPOSITE direction — if it cached, don't cache; if it optimized hot path, optimize cold path; etc."
- - **ingenua**: No prior context — naive fresh attempt. Prompt includes: "Ignore all prior attempts. Approach this from scratch with no assumptions about what works."
-
- **Auto-scaling by probe round:**
-
- | Probe round | Count | Required roles |
- |-------------|-------|----------------|
+ | Round | Count | Roles |
+ |-------|-------|-------|
  | 1st plateau | 2 | 1 contraditoria + 1 ingenua |
  | 2nd plateau | 4 | 1 contextualizada + 2 contraditoria + 1 ingenua |
- | 3rd+ plateau | 6 | 2 contextualizada + 2 contraditoria + 2 ingenua |
+ | 3rd+ | 6 | 2 contextualizada + 2 contraditoria + 2 ingenua |
 
- **Rules:**
- - Every probe set MUST include ≥1 contraditoria and ≥1 ingenua (minimum diversity)
- - contextualizada only added from round 2+ (needs prior data to build on)
- - Each probe prompt includes its role label and role-specific instruction
- - Probe scale persists in `optimize_state.probe_scale` in `auto-memory.yaml`
+ Every set: ≥1 contraditoria + ≥1 ingenua. contextualizada from round 2+ only. Scale persists in `optimize_state.probe_scale`.
 
  ### 5.9. OPTIMIZE CYCLE
 
- Trigger: task has `Optimize:` block in PLAN.md. Runs instead of standard single-agent spawn.
-
- **Optimize is a distinct execution mode**: one optimize task at a time, spanning N cycles until a stop condition.
-
- #### 5.9.1. INITIALIZATION
-
- 1. Parse `Optimize:` block from PLAN.md task: `metric`, `target`, `direction`, `max_cycles`, `secondary_metrics`
- 2. Load or initialize `optimize_state` from `.deepflow/auto-memory.yaml`:
- ```yaml
- optimize_state:
- task_id: "T{n}"
- metric_command: "{shell command}"
- target: {number}
- direction: "higher|lower"
- baseline: null # set on first measure
- current_best: null # best metric value seen
- best_commit: null # commit hash of best value
- cycles_run: 0
- cycles_without_improvement: 0
- consecutive_reverts: 0
- probe_scale: 0 # 0=no probes yet, 2/4/6
- max_cycles: {number}
- history: [] # [{cycle, value, delta, kept, commit}]
- failed_hypotheses: [] # ["{description}"]
- ```
- 3. **Measure baseline**: `cd ${WORKTREE_PATH} && eval "${metric_command}"` → parse float → store as `baseline` and `current_best`
- 4. Measure each secondary metric → store as `secondary_baselines`
- 5. Check if target already met (`direction: higher` → baseline >= target; `lower` → baseline <= target). If met → mark task `[x]`, log "target already met: {baseline}", done.
-
- #### 5.9.2. CYCLE LOOP
-
- Each cycle = one agent spawn + measure + keep/revert decision.
+ Trigger: task has `Optimize:` block. One at a time, N cycles until stop condition.
+
+ **Init:** Parse metric/target/direction/max_cycles/secondary_metrics. Load or init `optimize_state` in auto-memory.yaml (fields: task_id, metric_command, target, direction, baseline, current_best, best_commit, cycles_run, cycles_without_improvement, consecutive_reverts, probe_scale, max_cycles, history[], failed_hypotheses[]). Measure baseline (`eval` with cwd=worktree) → store as baseline+current_best. Measure secondaries. Target met → mark `[x]`, done.
 
+ **Cycle loop:**
  ```
  REPEAT:
- 1. Check stop conditions (5.9.3) → if triggered, exit loop
- 2. Spawn ONE optimize agent (section 6, Optimize Task prompt) with run_in_background=true
- 3. STOP. End turn. Wait for notification.
- 4. On notification:
- a. Run ratchet check (section 5.5) — build/test/lint must pass
- b. If ratchet fails → git revert HEAD --no-edit, increment consecutive_reverts, log failed hypothesis, go to step 1
- c. Run metric gate (section 5.5 metric gate) → measure new value
- d. If metric parse error → git revert HEAD --no-edit, increment consecutive_reverts, log "metric parse error"
- e. Compute improvement:
- - direction: higher → improvement = (new - current_best) / |current_best| × 100
- - direction: lower → improvement = (current_best - new) / |current_best| × 100
- - current_best == 0 → use absolute delta
- f. If improvement >= min_improvement_threshold (default 1%):
- KEEP: update current_best, best_commit, reset cycles_without_improvement=0, reset consecutive_reverts=0
- g. If improvement < min_improvement_threshold:
- REVERT: git revert HEAD --no-edit, increment cycles_without_improvement
- h. Increment cycles_run
- i. Append to history: {cycle, value, delta_pct, kept: bool, commit}
- j. Measure secondary metrics, check regression (WARNING only, no revert)
- k. Persist optimize_state to auto-memory.yaml
- l. Report: "⟳ T{n} cycle {N}: {old} → {new} ({+/-delta}%) {kept|reverted} [best: {current_best}, target: {target}]"
- m. Check context %. If ≥50% → checkpoint and exit (auto-cycle resumes).
- ```
+ 1. Check stop conditions → if triggered, exit
+ 2. Spawn ONE optimize agent (§6) with run_in_background=true. STOP, end turn.
+ 3. On notification:
+ a. Ratchet fail → revert, ++consecutive_reverts, log hypothesis, goto 1
+ b. Metric parse error → revert, ++consecutive_reverts
+ c. improvement = (new - best) / |best| × 100 (flip for lower; absolute if best==0)
+ d. >= 1% threshold → KEEP, update best, reset counters
+ e. < threshold → REVERT, ++cycles_without_improvement
+ f. ++cycles_run, append history, check secondaries, persist state
+ g. Report: "⟳ T{n} cycle {N}: {old}→{new} ({delta}%) {kept|reverted} [best: X, target: Y]"
+ h. Context ≥50% → checkpoint, exit
+ ```
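Step c's improvement formula, checked with illustrative numbers (a hedged sketch; the metric values are invented, `direction: higher`):

```shell
best=72.3; new=74.1   # invented current_best and new metric values
# improvement% = (new - best) / |best| * 100
awk -v n="$new" -v b="$best" 'BEGIN { printf "%.2f%%\n", (n - b) / (b < 0 ? -b : b) * 100 }'
```

1.8 / 72.3 × 100 ≈ 2.49%, which clears the default 1% threshold, so the commit is kept.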
+
+ **Stop conditions:**
+
+ | Condition | Action |
+ |-----------|--------|
+ | Target reached | Mark `[x]` |
+ | cycles_run >= max_cycles | Mark `[x]`. If best worse than baseline → `git reset --hard {best_commit}` |
+ | 3 cycles without improvement | Launch probes (plateau) |
+ | 3 consecutive reverts | Halt, task `[ ]`, requires human intervention |
 
- #### 5.9.3. STOP CONDITIONS
-
- | Condition | Detection | Action |
- |-----------|-----------|--------|
- | **Target reached** | `direction: higher` → value >= target; `lower` → value <= target | Mark task `[x]`, log "target reached: {value}" |
- | **Max cycles** | `cycles_run >= max_cycles` | Mark task `[x]` with note: "max cycles reached, best: {current_best}". If current_best worse than baseline → `git reset --hard {best_commit}`, log "reverted to best-known" |
- | **Plateau** | `cycles_without_improvement >= 3` | Pause normal cycle → launch probes (5.9.4) |
- | **Circuit breaker** | `consecutive_reverts >= 3` | Halt, task stays `[ ]`, log "circuit breaker: 3 consecutive reverts". Requires human intervention. |
-
- On **max cycles** with final value worse than baseline:
- 1. `git reset --hard {best_commit}` in worktree
- 2. Log: "final value {current} worse than baseline {baseline}, reverted to best-known commit {best_commit} (value: {current_best})"
-
- #### 5.9.4. PLATEAU → PROBE LAUNCH
-
- When plateau detected (3 cycles without ≥1% improvement):
-
- 1. Pause normal optimize cycle
- 2. Determine probe count from `probe_scale` (section 5.7.1 auto-scaling table): 0→2, 2→4, 4→6
- 3. Update `probe_scale` in optimize_state
- 4. Record `BASELINE=$(git rev-parse HEAD)` in shared worktree
- 5. Create sub-worktrees per probe: `git worktree add -b df/{spec}--opt-probe-{N} .deepflow/worktrees/{spec}/opt-probe-{N} ${BASELINE}`
- 6. Spawn ALL probes in ONE message using Optimize Probe prompt (section 6), each with its diversity role
- 7. End turn. Wait for all notifications.
- 8. Per notification: run ratchet + metric measurement in probe worktree
- 9. Select winner (section 5.7 step 5, optimize ranking): best metric improvement toward target
- 10. Winner → cherry-pick into shared worktree, update current_best, reset cycles_without_improvement=0
- 11. Losers → rename branch with `-failed` suffix, preserve worktrees
- 12. Log all probe outcomes to `auto-memory.yaml` under `spike_insights` (reuse existing format)
- 13. Log probe learnings: winning approach summary + each loser's failure reason
- 14. Resume normal optimize cycle from step 1
-
- #### 5.9.5. STATE PERSISTENCE (auto-memory.yaml)
-
- After every cycle, write `optimize_state` to `.deepflow/auto-memory.yaml` (main tree). This ensures:
- - Context exhaustion at 50% → auto-cycle resumes with full history
- - Failed hypotheses carry forward (agents won't repeat approaches)
- - Probe scale persists across context windows
-
- Also append cycle results to `.deepflow/auto-report.md`:
- ```
- ## Optimize: T{n} — {metric_name}
- | Cycle | Value | Delta | Kept | Commit |
- |-------|-------|-------|------|--------|
- | 1 | 72.3 | — | baseline | abc123 |
- | 2 | 74.1 | +2.5% | ✓ | def456 |
- | 3 | 73.8 | -0.4% | ✗ | (reverted) |
- ...
- Best: {current_best} | Target: {target} | Status: {in_progress|reached|max_cycles|circuit_breaker}
- ```
+ **Plateau → probes:** Scale 0→2, 2→4, 4→6 per §5.7.1. Create sub-worktrees, spawn all with diversity roles (§6 Optimize Probe). Per notification: ratchet + metric. Winner → cherry-pick, update best, reset counters. Losers → `-failed`. Log outcomes. Resume cycle.
+
+ **State persistence:** Write `optimize_state` to auto-memory.yaml after every cycle. Append results table to `.deepflow/auto-report.md`.
 
  ---
 
  ### 6. PER-TASK (agent prompt)
 
- > **Context engineering rationale:** Prompt order follows the attention U-curve (start/end = high attention, middle = low).
- > Critical instructions go at start and end. Navigable data goes in the middle.
- > See: Chroma "Context Rot" (2025) — performance degrades ~2%/100K tokens; distractors and semantic ambiguity compound degradation.
-
- **Common preamble (include in all agent prompts):**
- ```
- Working directory: {worktree_absolute_path}
- All file operations MUST use this absolute path as base. Do NOT write files to the main project directory.
- Commit format: {commit_type}({spec}): {description}
- ```
-
- **Standard Task** (spawn with `Agent(model="{Model from PLAN.md}", ...)`):
-
- Prompt sections in order (START = high attention, MIDDLE = navigable data, END = high attention):
-
- ```
- --- START (high attention zone) ---
-
- {task_id}: {description from PLAN.md}
- Files: {target files} Spec: {spec_name}
-
- {Prior failure context — include ONLY if task was previously reverted. Read from .deepflow/auto-memory.yaml revert_history for this task_id:}
- DO NOT repeat these approaches:
- - Cycle {N}: reverted — "{reason from revert_history}"
- {Omit this entire block if task has no revert history.}
-
- {Acceptance criteria excerpt — extract 2-3 key ACs from the spec file (specs/doing-*.md). Include only the criteria relevant to THIS task, not the full spec.}
- Success criteria:
- - {AC relevant to this task}
- - {AC relevant to this task}
- {Omit if spec has no structured ACs.}
-
- --- MIDDLE (navigable data zone) ---
-
- {Impact block from PLAN.md — include verbatim if present. Annotate each caller with WHY it's impacted:}
- Impact:
- - Callers: {file} ({why — e.g. "imports validateToken which you're changing"})
- - Duplicates:
- - {file} [active — consolidate]
- - {file} [dead — DELETE]
- - Data flow: {consumers}
- {Omit if no Impact in PLAN.md.}
-
- {Dependency context — for each completed blocker task, include a one-liner summary:}
- Prior tasks:
- - {dep_task_id}: {one-line summary of what changed — e.g. "refactored validateToken to async, changed signature (string) → (string, opts)"}
- {Omit if task has no dependencies or all deps are bootstrap/spike tasks.}
477
-
478
- Steps:
479
- 1. External APIs/SDKs → chub search "<library>" --json → chub get <id> --lang <lang> (skip if chub unavailable or internal code only)
480
- 2. LSP freshness check: run `findReferences` on each function/type you're about to change. If callers exist beyond the Impact list, add them to your scope before implementing.
481
- 3. Read ALL files in Impact (+ any new callers from step 2) before implementing — understand the full picture
482
- 4. Implement the task, updating all impacted files
483
- 5. Commit as feat({spec}): {description}
484
-
485
- --- END (high attention zone) ---
486
-
487
- {If .deepflow/auto-memory.yaml exists and has probe_learnings, include:}
488
- Spike results (follow these approaches):
489
- {each probe_learning with outcome "winner" → "- {insight}"}
490
- {Omit this block if no probe_learnings exist.}
491
-
492
- If Impact lists duplicates: [active] → consolidate into single source of truth. [dead] → DELETE entirely.
493
- Your ONLY job is to write code and commit. Orchestrator runs health checks after.
494
- STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main.
495
- ```
496
-
497
- **Effort-aware context budget:** For `Effort: low` tasks, omit the MIDDLE section entirely (no Impact, no dependency context, no steps). For `Effort: medium`, include Impact but omit dependency context. For `Effort: high`, include everything.
498
-
499
- **Bootstrap Task:**
500
- ```
501
- BOOTSTRAP: Write tests for files in edit_scope
502
- Files: {edit_scope files} Spec: {spec_name}
503
-
504
- Write tests covering listed files. Do NOT change implementation files.
505
- Commit as test({spec}): bootstrap tests for edit_scope
506
- ```
182
+ **Common preamble (all):** `Working directory: {worktree_absolute_path}. All file ops use this path. Commit format: {type}({spec}): {desc}`
507
183
 
508
- **Spike Task:**
184
+ **Standard Task** (`Agent(model="{Model}", ...)`):
509
185
  ```
510
- {task_id} [SPIKE]: {hypothesis}
511
- Files: {target files} Spec: {spec_name}
512
-
513
- {Prior failure context — include ONLY if this spike was previously reverted. Read from .deepflow/auto-memory.yaml revert_history + spike_insights for this task_id:}
514
- DO NOT repeat these approaches:
515
- - Cycle {N}: reverted "{reason}"
516
- {Omit this entire block if no revert history.}
517
-
518
- Implement minimal spike to validate hypothesis.
519
- Commit as spike({spec}): {description}
186
+ --- START ---
187
+ {task_id}: {description} Files: {files} Spec: {spec}
188
+ {If reverted: DO NOT repeat: - Cycle {N}: "{reason}"}
189
+ Success criteria: {ACs from spec relevant to this task}
190
+ --- MIDDLE (omit for low effort; omit deps for medium) ---
191
+ Impact: Callers: {file} ({why}) | Duplicates: [active→consolidate] [dead→DELETE] | Data flow: {consumers}
192
+ Prior tasks: {dep_id}: {summary}
193
+ Steps: 1. chub search/get for APIs 2. LSP findReferences, add unlisted callers 3. Read all Impact files 4. Implement 5. Commit
194
+ --- END ---
195
+ Spike results: {winner learnings}
196
+ Duplicates: [active]→consolidate [dead]→DELETE. ONLY job: code+commit. No merge/rename/checkout.
520
197
  ```
521
198
 
522
- **Optimize Task** (spawn with `Agent(model="opus", subagent_type="general-purpose")`):
199
+ **Bootstrap:** `BOOTSTRAP: Write tests for edit_scope files. Do NOT change implementation. Commit as test({spec}): bootstrap`
523
200
 
524
- One agent per cycle. Agent makes ONE atomic change to improve the metric.
201
+ **Spike:** `{task_id} [SPIKE]: {hypothesis}. Files+Spec. {reverted warnings}. Minimal spike. Commit as spike({spec}): {desc}`
525
202
 
203
+ **Optimize Task** (`Agent(model="opus")`):
526
204
  ```
527
- --- START (high attention zone) ---
528
-
529
- {task_id} [OPTIMIZE]: Improve {metric_name} cycle {N}/{max_cycles}
530
- Files: {target files} Spec: {spec_name}
531
-
532
- Current metric: {current_value} (baseline: {baseline}, best: {current_best})
533
- Target: {target} ({direction})
534
- Improvement needed: {delta_to_target} ({direction})
535
-
536
- CONSTRAINT: Make exactly ONE atomic change. Do not refactor broadly.
537
- The metric is measured by: {metric_command}
538
- You succeed if the metric moves toward {target} after your change.
539
-
540
- --- MIDDLE (navigable data zone) ---
541
-
542
- Attempt history (last 5 cycles):
543
- {For each recent history entry:}
544
- - Cycle {N}: {value} ({+/-delta}%) — {kept|reverted} — "{one-line description of what was tried}"
545
- {Omit if cycle 1.}
546
-
547
- DO NOT repeat these failed approaches:
548
- {For each failed_hypothesis in optimize_state:}
549
- - "{hypothesis description}"
550
- {Omit if no failed hypotheses.}
551
-
552
- {Impact block from PLAN.md if present}
553
-
554
- {Dependency context if present}
555
-
556
- Steps:
557
- 1. Analyze the metric command to understand what's being measured
558
- 2. Read the target files and identify ONE specific improvement
559
- 3. Implement the change (ONE atomic modification)
560
- 4. Commit as feat({spec}): optimize {metric_name} — {what you changed}
561
-
562
- --- END (high attention zone) ---
563
-
564
- {Spike/probe learnings if any}
565
-
566
- Your ONLY job is to make ONE atomic change and commit. Orchestrator measures the metric after.
567
- Do NOT run the metric command yourself. Do NOT make multiple changes.
568
- STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main.
205
+ --- START ---
206
+ {task_id} [OPTIMIZE]: {metric} — cycle {N}/{max}. Files+Spec.
207
+ Current: {val} (baseline: {b}, best: {best}). Target: {t} ({dir}). Metric: {cmd}
208
+ CONSTRAINT: ONE atomic change.
209
+ --- MIDDLE ---
210
+ Last 5 cycles + failed hypotheses + Impact/deps.
211
+ --- END ---
212
+ {Learnings}. ONE change + commit. No metric run, no multiple changes.
569
213
  ```
570
214
 
571
- **Optimize Probe Task** (spawn with `Agent(model="opus", subagent_type="general-purpose")`):
572
-
573
- Used during plateau resolution. Each probe has a diversity role.
574
-
215
+ **Optimize Probe** (`Agent(model="opus")`):
575
216
  ```
576
- --- START (high attention zone) ---
577
-
578
- {task_id} [OPTIMIZE PROBE]: {metric_name} — probe {probe_id} ({role_label})
579
- Files: {target files} Spec: {spec_name}
580
-
581
- Current metric: {current_value} (baseline: {baseline}, best: {current_best})
582
- Target: {target} ({direction})
583
-
584
- Role: {role_label}
585
- {role_instruction one of:}
586
- contextualizada: "Build on the best approach so far: {best_approach_summary}. Refine, extend, or combine what worked."
587
- contraditoria: "The best approach so far was: {best_approach_summary}. Try the OPPOSITE — if it optimized X, try Y instead. Challenge the current direction."
588
- ingenua: "Ignore all prior attempts. Approach this metric from scratch with no assumptions about what has or hasn't worked."
589
-
590
- --- MIDDLE (navigable data zone) ---
591
-
592
- Full attempt history:
593
- {ALL history entries from optimize_state}
594
- - Cycle {N}: {value} ({+/-delta}%) — {kept|reverted}
595
-
596
- All failed approaches (DO NOT repeat):
597
- {ALL failed_hypotheses}
598
- - "{hypothesis description}"
599
-
600
- --- END (high attention zone) ---
601
-
602
- Make ONE atomic change that moves the metric toward {target}.
603
- Commit as feat({spec}): optimize probe {probe_id} — {what you changed}
604
- STOP after committing.
217
+ --- START ---
218
+ {task_id} [OPTIMIZE PROBE]: {metric} — probe {id} ({role})
219
+ Current/Target. Role instruction:
220
+ contextualizada: "Build on best: {summary}. Refine."
221
+ contraditoria: "Best was: {summary}. Try OPPOSITE."
222
+ ingenua: "Ignore prior. Fresh approach."
223
+ --- MIDDLE ---
224
+ Full history + all failed hypotheses.
225
+ --- END ---
226
+ ONE atomic change. Commit. STOP.
605
227
  ```
606
228
 
607
229
  ### 8. COMPLETE SPECS
608
230
 
609
- When all tasks done for a `doing-*` spec:
610
- 1. Run `/df:verify doing-{name}` via the Skill tool (`skill: "df:verify", args: "doing-{name}"`)
611
- - Verify runs quality gates (L0-L4), merges worktree branch to main, cleans up worktree, renames spec `doing-*` → `done-*`, and extracts decisions
612
- - If verify fails (adds fix tasks): stop here — `/df:execute --continue` will pick up the fix tasks
613
- - If verify passes: proceed to step 2
614
- 2. Remove spec's ENTIRE section from PLAN.md (header, tasks, summaries, fix tasks, separators)
615
- 3. Recalculate Summary table at top of PLAN.md
231
+ All tasks done for `doing-*` spec:
232
+ 1. `skill: "df:verify", args: "doing-{name}"` — runs L0-L4 gates, merges, cleans worktree, renames doing→done, extracts decisions. Fail (fix tasks added) → stop; `--continue` picks them up.
233
+ 2. Remove spec's ENTIRE section from PLAN.md. Recalculate Summary table.
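Step 2 can be sketched as below, assuming (hypothetically) that each spec's section in PLAN.md starts at a `## ` header containing the spec name and runs to the next `## ` header or EOF; the real PLAN.md layout may differ:

```python
def remove_spec_section(plan_text: str, spec_name: str) -> str:
    """Drop a completed spec's entire section (header, tasks, summaries,
    fix tasks) from PLAN.md text. Sections are delimited by '## ' headers."""
    out, skipping = [], False
    for line in plan_text.splitlines(keepends=True):
        if line.startswith("## "):
            # Enter skip mode at the spec's own header, leave at the next one.
            skipping = spec_name in line
        if not skipping:
            out.append(line)
    return "".join(out)
```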
616
234
 
617
235
  ---
618
236
 
619
237
  ## Usage
620
238
 
621
239
  ```
622
- /df:execute # Execute all ready tasks
623
- /df:execute T1 T2 # Specific tasks only
624
- /df:execute --continue # Resume from checkpoint
240
+ /df:execute # All ready tasks
241
+ /df:execute T1 T2 # Specific tasks
242
+ /df:execute --continue # Resume checkpoint
625
243
  /df:execute --fresh # Ignore checkpoint
626
244
  /df:execute --dry-run # Show plan only
627
245
  ```
628
246
 
629
247
  ## Skills & Agents
630
248
 
631
- - Skill: `atomic-commits` — Clean commit protocol
632
- - Skill: `browse-fetch` — Fetch live web pages and external API docs via browser before coding
633
-
634
- | Agent | subagent_type | Purpose |
635
- |-------|---------------|---------|
636
- | Implementation | `general-purpose` | Task implementation |
637
- | Debugger | `reasoner` | Debugging failures |
638
-
639
- **Model + effort routing:** Read `Model:` and `Effort:` fields from each task block in PLAN.md. Pass `model:` parameter when spawning the agent. Prepend effort instruction to the agent prompt. Defaults: `Model: sonnet`, `Effort: medium`.
640
-
641
- | Task fields | Agent call | Prompt preamble |
642
- |-------------|-----------|-----------------|
643
- | `Model: haiku, Effort: low` | `Agent(model="haiku", ...)` | `You MUST be maximally efficient: skip explanations, minimize tool calls, go straight to implementation.` |
644
- | `Model: sonnet, Effort: medium` | `Agent(model="sonnet", ...)` | `Be direct and efficient. Explain only when the logic is non-obvious.` |
645
- | `Model: opus, Effort: high` | `Agent(model="opus", ...)` | _(no preamble — default behavior)_ |
646
- | (missing) | `Agent(model="sonnet", ...)` | `Be direct and efficient. Explain only when the logic is non-obvious.` |
249
+ Skills: `atomic-commits`, `browse-fetch`. Agents: Implementation (`general-purpose`), Debugger (`reasoner`).
647
250
 
648
- **Effort preamble rules:**
649
- - `low` → Prepend efficiency instruction. Agent should make fewest possible tool calls.
650
- - `medium` → Prepend balanced instruction. Agent skips preamble but explains non-obvious decisions.
651
- - `high` → No preamble added. Agent uses full reasoning capabilities.
251
+ **Model+effort routing** (read from PLAN.md, defaults: sonnet/medium):
652
252
 
653
- **Checkpoint schema:** `.deepflow/checkpoint.json` in worktree:
654
- ```json
655
- {"completed_tasks": ["T1","T2"], "current_wave": 2, "worktree_path": ".deepflow/worktrees/upload", "worktree_branch": "df/upload"}
656
- ```
253
+ | Fields | Agent | Preamble |
254
+ |--------|-------|----------|
255
+ | haiku/low | `Agent(model="haiku")` | `Maximally efficient: skip explanations, minimize tool calls, straight to implementation.` |
256
+ | sonnet/medium | `Agent(model="sonnet")` | `Direct and efficient. Explain only non-obvious logic.` |
257
+ | opus/high | `Agent(model="opus")` | _(none)_ |
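The routing table can be sketched as a lookup. `route_task` is an illustrative name; the `Agent(...)` call itself is the platform's and not shown, and the preambles paraphrase the table rows:

```python
import re

PREAMBLES = {
    "low": ("You MUST be maximally efficient: skip explanations, "
            "minimize tool calls, go straight to implementation."),
    "medium": "Be direct and efficient. Explain only when the logic is non-obvious.",
    "high": "",  # no preamble: full reasoning
}

def route_task(task_block: str):
    """Extract Model:/Effort: fields from a PLAN.md task block.
    Missing fields fall back to the documented defaults (sonnet/medium)."""
    model = re.search(r"Model:\s*(\w+)", task_block)
    effort = re.search(r"Effort:\s*(\w+)", task_block)
    m = model.group(1) if model else "sonnet"
    e = effort.group(1) if effort else "medium"
    return m, PREAMBLES.get(e, PREAMBLES["medium"])
```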
657
258
 
658
- ---
259
+ **Checkpoint:** `.deepflow/checkpoint.json`: `{"completed_tasks":["T1"],"current_wave":2,"worktree_path":"...","worktree_branch":"df/..."}`
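A sketch of checkpoint handling for `--continue`/`--fresh`, using the schema above; loader details and function names are assumptions:

```python
import json
from pathlib import Path

CHECKPOINT = Path(".deepflow/checkpoint.json")

def load_checkpoint(path: Path = CHECKPOINT, fresh: bool = False) -> dict:
    """--fresh ignores any checkpoint; otherwise resume where we left off."""
    if fresh or not path.exists():
        return {"completed_tasks": [], "current_wave": 1,
                "worktree_path": None, "worktree_branch": None}
    return json.loads(path.read_text(encoding="utf-8"))

def mark_completed(cp: dict, task_id: str) -> dict:
    """Record a ratchet-passed task; idempotent on retries."""
    if task_id not in cp["completed_tasks"]:
        cp["completed_tasks"].append(task_id)
    return cp
```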
659
260
 
660
261
  ## Failure Handling
661
262
 
662
- When task fails ratchet and is reverted:
663
- - `TaskUpdate(taskId: native_id, status: "pending")` — dependents remain blocked
664
- - Repeated failure → spawn `Task(subagent_type="reasoner", prompt="Debug failure: {ratchet output}")`
665
- - Leave worktree intact, keep checkpoint.json
666
- - Output: worktree path/branch, `cd {path}` to investigate, `--continue` to resume, `--fresh` to discard
263
+ Reverted task: `TaskUpdate(status: "pending")`, dependents stay blocked. Repeated failure → spawn reasoner debugger. Leave worktree+checkpoint intact. Output: path, `cd` command, `--continue`/`--fresh` options.
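The failure path can be sketched as below. The returned strings merely name the platform tools (`TaskUpdate`, `Agent`), and the escalation threshold of two failures is an assumption consistent with "repeated failure":

```python
def on_ratchet_failure(task_id: str, failure_counts: dict) -> list:
    """First failure: revert and re-pend the task (dependents stay
    blocked). Repeated failure: also escalate to the reasoner debugger.
    Worktree and checkpoint.json are left intact for investigation."""
    failure_counts[task_id] = failure_counts.get(task_id, 0) + 1
    actions = ["git revert", f'TaskUpdate({task_id}, status="pending")']
    if failure_counts[task_id] >= 2:  # assumed threshold for "repeated"
        actions.append('Agent(subagent_type="reasoner")')
    return actions
```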
667
264
 
668
265
  ## Rules
669
266
 
670
267
  | Rule | Detail |
671
268
  |------|--------|
672
- | Zero test files → bootstrap first | Bootstrap is cycle's sole task when snapshot empty |
269
+ | Zero tests → bootstrap first | Sole task when snapshot empty |
673
270
  | 1 task = 1 agent = 1 commit | `atomic-commits` skill |
674
- | 1 file = 1 writer | Sequential if conflict |
675
- | Agent writes code, orchestrator measures | Ratchet is the judge |
676
- | No LLM evaluates LLM work | Health checks only |
677
- | ≥2 spikes same problem → parallel probes | Never run competing spikes sequentially |
678
- | All probe worktrees preserved | Losers renamed `-failed`; never deleted |
679
- | Machine-selected winner | Regressions > coverage > files changed; no LLM judge |
271
+ | 1 file = 1 writer | Sequential on conflict |
272
+ | Agent codes, orchestrator measures | Ratchet judges |
273
+ | No LLM evaluates LLM | Health checks only |
274
+ | ≥2 spikes → parallel probes | Never sequential |
275
+ | Probe worktrees preserved | Losers `-failed`, never deleted |
276
+ | Machine-selected winner | Regressions > coverage > files; no LLM judge |
680
277
  | External APIs → chub first | Skip if unavailable |
681
- | 1 optimize task at a time | Inherently sequential — no parallel optimize tasks |
682
- | Optimize = atomic changes only | One modification per cycle for diagnosability |
683
- | Ratchet + metric = both required | Optimize keeps commit only if ratchet AND metric improve |
684
- | Plateau → probes, not more cycles | 3 cycles without ≥1% improvement triggers probe launch |
685
- | Circuit breaker = 3 consecutive reverts | Halts optimize loop, requires human intervention |
686
- | Optimize probes need diversity | Every probe set: ≥1 contraditoria + ≥1 ingenua minimum |
687
-
688
- ## Example
689
-
690
- ```
691
- /df:execute (context: 12%)
692
-
693
- Loading PLAN.md... T1 ready, T2/T3 blocked by T1
694
- Ratchet snapshot: 24 pre-existing test files
695
-
696
- Wave 1: TaskUpdate(T1, in_progress)
697
- [Agent "T1" completed]
698
- Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
699
- ✓ T1: ratchet passed (abc1234)
700
- TaskUpdate(T1, completed) → auto-unblocks T2, T3
701
-
702
- Wave 2: TaskUpdate(T2/T3, in_progress)
703
- [Agent "T2" completed] ✓ T2: ratchet passed (def5678)
704
- [Agent "T3" completed] ✓ T3: ratchet passed (ghi9012)
705
-
706
- Context: 35% — All tasks done for doing-upload.
707
- Running /df:verify doing-upload...
708
- ✓ L0 | ✓ L1 (3/3 files) | ⚠ L2 (no coverage tool) | ✓ L4 (24 tests)
709
- ✓ Merged df/upload to main
710
- ✓ Spec complete: doing-upload → done-upload
711
- Complete: 3/3
712
- ```
278
+ | 1 optimize at a time | Sequential |
279
+ | Optimize = atomic only | One change per cycle |
280
+ | Ratchet + metric both required | Keep only if both pass |
281
+ | Plateau → probes | 3 cycles <1% triggers probes |
282
+ | Circuit breaker = 3 reverts | Halts, needs human |
283
+ | Probe diversity | ≥1 contraditoria + ≥1 ingenua |
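The plateau and circuit-breaker thresholds in the table can be sketched as below, assuming a higher-is-better metric (direction handling omitted); the history shapes are illustrative:

```python
def detect_plateau(history, window: int = 3, min_gain_pct: float = 1.0) -> bool:
    """history: list of (value, best_before) per cycle, most recent last.
    Plateau = `window` consecutive cycles each under `min_gain_pct`
    improvement over the running best -> trigger probes, not more cycles."""
    if len(history) < window:
        return False
    return all(100.0 * (v - best) / best < min_gain_pct
               for v, best in history[-window:])

def circuit_breaker(outcomes, limit: int = 3) -> bool:
    """outcomes: per-cycle 'kept'/'reverted' strings, most recent last.
    Three consecutive reverts halt the optimize loop for a human."""
    return len(outcomes) >= limit and all(o == "reverted" for o in outcomes[-limit:])
```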