deepflow 0.1.87 → 0.1.88
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/install.js +32 -2
- package/hooks/df-spec-lint.js +78 -4
- package/hooks/df-statusline.js +77 -5
- package/hooks/df-tool-usage-spike.js +41 -0
- package/hooks/df-tool-usage.js +86 -0
- package/package.json +1 -1
- package/src/commands/df/auto-cycle.md +75 -558
- package/src/commands/df/auto.md +9 -48
- package/src/commands/df/consolidate.md +14 -38
- package/src/commands/df/debate.md +27 -156
- package/src/commands/df/discover.md +35 -181
- package/src/commands/df/execute.md +148 -577
- package/src/commands/df/note.md +37 -176
- package/src/commands/df/plan.md +80 -210
- package/src/commands/df/report.md +27 -184
- package/src/commands/df/resume.md +18 -101
- package/src/commands/df/spec.md +49 -145
- package/src/commands/df/verify.md +59 -606
- package/src/skills/browse-fetch/SKILL.md +32 -257
- package/src/skills/browse-verify/SKILL.md +40 -174
- package/src/skills/code-completeness/SKILL.md +2 -9
- package/src/skills/gap-discovery/SKILL.md +19 -86
- package/templates/spec-template.md +12 -1
@@ -10,36 +10,26 @@ description: Execute tasks from PLAN.md with agent spawning, ratchet health chec
 You are a coordinator. Spawn agents, run ratchet checks, update PLAN.md. Never implement code yourself.
 
 **NEVER:** Read source files, edit code, use TaskOutput, use EnterPlanMode, use ExitPlanMode
-
-**ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, run ratchet health checks after each agent completes, update PLAN.md, write `.deepflow/decisions.md` in the main tree
+**ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, run ratchet health checks, update PLAN.md, write `.deepflow/decisions.md`
 
 ## Core Loop (Notification-Driven)
 
-Each task = one background agent.
-
-**NEVER use TaskOutput** — returns full transcripts (100KB+) that explode context.
+Each task = one background agent. **NEVER use TaskOutput** (100KB+ transcripts explode context).
 
 ```
 1. Spawn ALL wave agents with run_in_background=true in ONE message
-2. STOP. End
+2. STOP. End turn. Do NOT poll.
 3. On EACH notification:
-   a.
+   a. Ratchet check (§5.5)
    b. Passed → TaskUpdate(status: "completed"), update PLAN.md [x] + commit hash
-   c. Failed →
-   d. Report ONE line: "✓ T1: ratchet passed (abc123)" or "⚕ T1: salvaged
+   c. Failed → partial salvage (§5.5). Salvaged → passed. Not → git revert, TaskUpdate(status: "pending")
+   d. Report ONE line: "✓ T1: ratchet passed (abc123)" or "⚕ T1: salvaged (abc124)" or "✗ T1: reverted"
    e. NOT all done → end turn, wait | ALL done → next wave or finish
-4. Between waves:
+4. Between waves: context ≥50% → checkpoint and exit.
 5. Repeat until: all done, all blocked, or context ≥50%.
 ```
 
-
-
-Statusline writes `.deepflow/context.json`: `{"percentage": 45}`
-
-| Context % | Action |
-|-----------|--------|
-| < 50% | Full parallelism (up to 5 agents) |
-| ≥ 50% | Wait for running agents, checkpoint, exit |
+**Context threshold:** Statusline writes `.deepflow/context.json`: `{"percentage": 45}`. <50% = full parallelism (up to 5). ≥50% = wait, checkpoint, exit.
 
 ---
 
@@ -47,109 +37,51 @@ Statusline writes `.deepflow/context.json`: `{"percentage": 45}`
 
 ### 1. CHECK CHECKPOINT
 
-
-
-→ Verify worktree exists on disk (else error: "Use --fresh")
-→ Skip completed tasks, resume execution
---fresh → Delete checkpoint, start fresh
-checkpoint exists → Prompt: "Resume? (y/n)"
-else → Start fresh
-```
-
-Shell injection (use output directly — no manual file reads needed):
-- `` !`cat .deepflow/checkpoint.json 2>/dev/null || echo 'NOT_FOUND'` ``
-- `` !`git diff --quiet && echo 'CLEAN' || echo 'DIRTY'` ``
+`--continue` → load `.deepflow/checkpoint.json`, verify worktree exists (else error "Use --fresh"), skip completed. `--fresh` → delete checkpoint. Checkpoint exists → prompt "Resume? (y/n)".
+Shell: `` !`cat .deepflow/checkpoint.json 2>/dev/null || echo 'NOT_FOUND'` `` / `` !`git diff --quiet && echo 'CLEAN' || echo 'DIRTY'` ``
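The two checkpoint probes are plain shell and can be exercised outside the slash-command context; a minimal sketch in a throwaway repository (no checkpoint file present yet, so the probe reports `NOT_FOUND`):

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
mkdir -p .deepflow

# No checkpoint written yet -> fallback string, same command as the shell-injection line
checkpoint=$(cat .deepflow/checkpoint.json 2>/dev/null || echo 'NOT_FOUND')
# Clean vs dirty working tree
if git diff --quiet; then tree_state=CLEAN; else tree_state=DIRTY; fi
echo "checkpoint=$checkpoint tree=$tree_state"
```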
 
 ### 1.5. CREATE WORKTREE
 
-Require clean HEAD
-Create worktree: `.deepflow/worktrees/{spec}` on branch `df/{spec}`.
-Reuse if exists. `--fresh` deletes first.
-
-If `worktree.sparse_paths` is non-empty in config, enable sparse checkout:
-```bash
-git worktree add --no-checkout -b df/{spec} .deepflow/worktrees/{spec}
-cd .deepflow/worktrees/{spec}
-git sparse-checkout set {sparse_paths...}
-git checkout df/{spec}
-```
+Require clean HEAD. Derive SPEC_NAME from `specs/doing-*.md`. Create `.deepflow/worktrees/{spec}` on branch `df/{spec}`. Reuse if exists; `--fresh` deletes first. If `worktree.sparse_paths` non-empty: `git worktree add --no-checkout`, `sparse-checkout set {paths}`, checkout.
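The worktree-plus-sparse-checkout sequence can be tried end to end in a throwaway repository; a sketch, assuming a hypothetical spec name `demo` and a `src`/`docs` layout (an explicit `sparse-checkout init --cone` is added here for older git versions; it is not in the package's sequence):

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git config user.email dev@example.com
git config user.name dev
mkdir -p src docs
echo 'code' > src/app.js
echo 'docs' > docs/readme.md
git add -A
git commit -qm init

spec=demo   # hypothetical spec name
# Same sequence as the spec: add without checkout, restrict paths, then check out
git worktree add --no-checkout -b "df/$spec" ".deepflow/worktrees/$spec"
git -C ".deepflow/worktrees/$spec" sparse-checkout init --cone
git -C ".deepflow/worktrees/$spec" sparse-checkout set src
git -C ".deepflow/worktrees/$spec" checkout "df/$spec"
```

After this, the worktree contains `src/` but not `docs/`, so agents only ever see the sparse paths.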
 
 ### 1.6. RATCHET SNAPSHOT
 
-Snapshot pre-existing test files
-
+Snapshot pre-existing test files — only these count for ratchet (agent-created excluded):
 ```bash
-
-git ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' \
-  > .deepflow/auto-snapshot.txt
+git -C ${WORKTREE_PATH} ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' > .deepflow/auto-snapshot.txt
 ```
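The snapshot filter can be sanity-checked against a synthetic file layout (file names here are hypothetical examples of the naming conventions the regex targets):

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git config user.email dev@example.com
git config user.name dev
mkdir -p src tests
touch src/app.js src/app.test.js tests/helper.py test_main.py
git add -A
git commit -qm init
mkdir -p .deepflow

# Same filter as above: only conventionally named test files enter the snapshot
git ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' \
  > .deepflow/auto-snapshot.txt
cat .deepflow/auto-snapshot.txt
```

`src/app.js` is excluded, so a regular source file never counts toward the ratchet baseline.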
 
 ### 1.7. NO-TESTS BOOTSTRAP
 
-
-
-1. Spawn ONE bootstrap agent (section 6 Bootstrap Task) to write tests for `edit_scope` files
-2. On ratchet pass: re-snapshot, report `"bootstrap: completed"`, end cycle (no PLAN.md tasks this cycle)
-3. On ratchet fail: revert, halt with "Bootstrap failed — manual intervention required"
-
-Subsequent cycles use bootstrapped tests as ratchet baseline.
+Zero test files → spawn ONE bootstrap agent (§6 Bootstrap). Pass → re-snapshot, end cycle. Fail → revert, halt "Bootstrap failed — manual intervention required". Subsequent cycles use bootstrapped tests as baseline.
 
 ### 2. LOAD PLAN
 
-
-Load: PLAN.md (required), specs/doing-*.md, .deepflow/config.yaml
-If missing: "No PLAN.md found. Run /df:plan first."
-```
-
-Shell injection (use output directly — no manual file reads needed):
-- `` !`cat .deepflow/checkpoint.json 2>/dev/null || echo 'NOT_FOUND'` ``
-- `` !`git diff --quiet && echo 'CLEAN' || echo 'DIRTY'` ``
+Load PLAN.md (required), specs/doing-*.md, .deepflow/config.yaml. Missing → "No PLAN.md found. Run /df:plan first."
 
 ### 2.5. REGISTER NATIVE TASKS
 
-For each `[ ]` task
-
-### 3. CHECK FOR UNPLANNED SPECS
+For each `[ ]` task: `TaskCreate(subject: "{task_id}: {description}", activeForm: "{gerund}", description: full block)`. Store task_id → native ID. Set deps via `TaskUpdate(addBlockedBy: [...])`. `--continue` → only remaining `[ ]` items.
 
-
+### 3–4. READY TASKS
 
-
-
-Ready = TaskList where status: "pending" AND blockedBy: empty.
+Warn if unplanned `specs/*.md` (excluding doing-/done-) exist (non-blocking). Ready = TaskList where status: "pending" AND blockedBy: empty.
 
 ### 5. SPAWN AGENTS
 
-Context ≥50
-
-Before spawning: `TaskUpdate(taskId: native_id, status: "in_progress")` — activates UI spinner.
-
-**Token tracking — record start:**
-```
-start_percentage = !`grep -o '"percentage":[0-9]*' .deepflow/context.json 2>/dev/null | grep -o '[0-9]*' || echo ''`
-start_timestamp = !`date -u +%Y-%m-%dT%H:%M:%SZ`
-```
-Store both values in memory (keyed by task_id) for use after ratchet completes. Omit if context.json unavailable.
-
-**NEVER use `isolation: "worktree"` on Task calls.** Deepflow manages a shared worktree so wave 2 sees wave 1 commits.
+Context ≥50% → checkpoint and exit. Before spawning: `TaskUpdate(status: "in_progress")`.
 
-**
+**Token tracking start:** Store `start_percentage` (from context.json) and `start_timestamp` (ISO 8601) keyed by task_id. Omit if unavailable.
 
-**
-Before spawning, check `Files:` lists of all ready tasks. If two+ ready tasks share a file:
-1. Sort conflicting tasks by task number (T1 < T2 < T3)
-2. Spawn only the lowest-numbered task from each conflict group
-3. Remaining tasks stay `pending` — they become ready once the spawned task completes
-4. Log: `"⏳ T{N} deferred — file conflict with T{M} on (unknown)"`
+**NEVER use `isolation: "worktree"`.** Deepflow manages a shared worktree so wave 2 sees wave 1 commits. **Spawn ALL ready tasks in ONE message** except file conflicts.
 
-
+**File conflicts (1 file = 1 writer):** Check `Files:` lists. Overlap → spawn lowest-numbered only; rest stay pending. Log: `"⏳ T{N} deferred — file conflict with T{M} on (unknown)"`
 
-**[OPTIMIZE] tasks
+**≥2 [SPIKE] tasks same problem →** Parallel Spike Probes (§5.7). **[OPTIMIZE] tasks →** Optimize Cycle (§5.9), one at a time.
 
 ### 5.5. RATCHET CHECK
 
-
-
-**Auto-detect commands:**
+Run health checks in worktree after each agent completes.
 
 | File | Build | Test | Typecheck | Lint |
 |------|-------|------|-----------|------|
@@ -158,555 +90,194 @@ After each agent completes, run health checks in the worktree.
 | `Cargo.toml` | `cargo build` | `cargo test` | — | `cargo clippy` |
 | `go.mod` | `go build ./...` | `go test ./...` | — | `go vet ./...` |
 
-Run Build → Test → Typecheck → Lint (stop on first failure).
-
-**Edit scope validation** (if spec declares `edit_scope`): check `git diff HEAD~1 --name-only` against allowed globs. Violations → `git revert HEAD --no-edit`, report "Edit scope violation: {files}".
-
-**Impact completeness check** (if task has Impact block in PLAN.md):
-Compare `git diff HEAD~1 --name-only` against Impact callers/duplicates list.
-File listed but not modified → **advisory warning**: "Impact gap: {file} listed as {caller|duplicate} but not modified — verify manually". Not auto-revert (callers sometimes don't need changes), but flags the risk.
-
-**Metric gate (Optimize tasks only):**
-
-After ratchet passes, if the current task has an `Optimize:` block, run the metric gate:
-
-1. Run the `metric` shell command in the worktree: `cd ${WORKTREE_PATH} && eval "${metric_command}"`
-2. Parse output as float. Non-numeric output → cycle failure (revert, log "metric parse error: {raw output}")
-3. Compare against previous measurement using `direction`:
-   - `direction: higher` → new value must be > previous + (previous × min_improvement_threshold)
-   - `direction: lower` → new value must be < previous - (previous × min_improvement_threshold)
-4. Both ratchet AND metric improvement required → keep commit
-5. Ratchet passes but metric did not improve → revert (log "ratchet passed but metric stagnant/regressed: {old} → {new}")
-6. Run each `secondary_metrics` command, parse as float. If regression > `regression_threshold` (default 5%) compared to baseline: append WARNING to `.deepflow/auto-report.md`: `"WARNING: {name} regressed {delta}% ({baseline_val} → {new_val}) at cycle {N}"`. Do NOT auto-revert.
+Run Build → Test → Typecheck → Lint (stop on first failure). Ratchet uses ONLY pre-existing tests from `.deepflow/auto-snapshot.txt`.
 
-**
-
-After ratchet checks complete, truncate command output for context efficiency:
-
-- **Success (all checks passed):** Suppress output entirely — do not include build/test/lint output in reports
-- **Build failure:** Include last 15 lines of build error only
-- **Test failure:** Include failed test name(s) + last 20 lines of test output
-- **Typecheck/lint failure:** Include error count + first 5 errors only
-
-**Token tracking — write result (on ratchet pass):**
-
-After all checks pass, compute and write the token block to `.deepflow/results/T{N}.yaml`:
-
-```
-end_percentage = !`grep -o '"percentage":[0-9]*' .deepflow/context.json 2>/dev/null | grep -o '[0-9]*' || echo ''`
-```
+**Edit scope validation:** `git diff HEAD~1 --name-only` vs allowed globs. Violation → revert, report.
+**Impact completeness:** diff vs Impact callers/duplicates. Gap → advisory warning (no revert).
 
-Parse
-```bash
-awk -v start="REPLACE_start_timestamp" -v end="REPLACE_end_timestamp" '
-{
-  ts=""; inp=0; cre=0; rd=0
-  if (match($0, /"timestamp":"[^"]*"/)) { ts=substr($0, RSTART+13, RLENGTH-14) }
-  if (ts >= start && ts <= end) {
-    if (match($0, /"input_tokens":[0-9]+/)) inp=substr($0, RSTART+15, RLENGTH-15)
-    if (match($0, /"cache_creation_input_tokens":[0-9]+/)) cre=substr($0, RSTART+30, RLENGTH-30)
-    if (match($0, /"cache_read_input_tokens":[0-9]+/)) rd=substr($0, RSTART+26, RLENGTH-26)
-    si+=inp; sc+=cre; sr+=rd
-  }
-}
-END { printf "{\"input_tokens\":%d,\"cache_creation_input_tokens\":%d,\"cache_read_input_tokens\":%d}\n", si+0, sc+0, sr+0 }
-' .deepflow/token-history.jsonl 2>/dev/null || echo '{}'
-```
+**Metric gate (Optimize only):** Run `eval "${metric_command}"` with cwd=`${WORKTREE_PATH}` (never `cd && eval`). Parse float (non-numeric → revert). Compare using `direction`+`min_improvement_threshold`. Both ratchet AND metric must pass → keep. Ratchet pass + metric stagnant → revert. Secondary metrics: regression > `regression_threshold` (5%) → WARNING in auto-report.md (no revert).
 
-
-```
-!`cat .deepflow/results/T{N}.yaml 2>/dev/null || echo ''`
-```
+**Output truncation:** Success → suppress. Build fail → last 15 lines. Test fail → names + last 20 lines. Typecheck/lint → count + first 5 errors.
 
-
+**Token tracking result (on pass):** Read `end_percentage`. Sum token fields from `.deepflow/token-history.jsonl` between start/end timestamps (awk ISO 8601 compare). Write to `.deepflow/results/T{N}.yaml`:
 ```yaml
 tokens:
-  start_percentage: {
-  end_percentage: {
-  delta_percentage: {
-  input_tokens: {sum
-  cache_creation_input_tokens: {sum
-  cache_read_input_tokens: {sum
+  start_percentage: {val}
+  end_percentage: {val}
+  delta_percentage: {end - start}
+  input_tokens: {sum}
+  cache_creation_input_tokens: {sum}
+  cache_read_input_tokens: {sum}
 ```
+Omit if context.json/token-history.jsonl/awk unavailable. Never fail ratchet for tracking errors.
+
+**Evaluate:** All pass → commit stands. Failure → partial salvage:
+1. Lint/typecheck-only (build+tests passed): spawn `Agent(model="haiku")` to fix. Re-ratchet. Fail → revert both.
+2. Build/test failure → `git revert HEAD --no-edit` (no salvage).
 
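The edit-scope validation reduces to filtering the last commit's file list against the allowed globs; a sketch, assuming a single allowed scope `src/*` (the glob, file names, and commit messages here are hypothetical; real globs come from the spec's `edit_scope`):

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git config user.email dev@example.com
git config user.name dev
mkdir -p src docs
echo v1 > src/a.js
echo v1 > docs/b.md
git add -A; git commit -qm base
echo v2 > src/a.js
echo rogue > docs/b.md
git add -A; git commit -qm 'agent commit'

# Every file touched by the agent's commit must match the allowed scope
violations=$(git diff HEAD~1 --name-only | grep -v '^src/' || true)
if [ -n "$violations" ]; then
  echo "Edit scope violation: $violations"
  # coordinator response per the spec: git revert HEAD --no-edit
fi
```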
-
+### 5.7. PARALLEL SPIKE PROBES
 
-
+Trigger: ≥2 [SPIKE] tasks with same blocker or identical hypothesis.
 
-
-
-
-
-
-
-
+1. `BASELINE=$(git rev-parse HEAD)` in shared worktree
+2. Sub-worktrees per spike: `git worktree add -b df/{spec}--probe-{ID} .deepflow/worktrees/{spec}/probe-{ID} ${BASELINE}`
+3. Spawn all probes in ONE message. End turn.
+4. Per notification: ratchet (§5.5). Record: ratchet_passed, regressions, coverage_delta, files_changed, commit.
+5. **Winner selection** (no LLM judge): disqualify regressions. Standard: fewer regressions > coverage > fewer files > first complete. Optimize: best metric delta > fewer regressions > fewer files. No passes → reset pending for debugger.
+6. Preserve all worktrees. Losers: branch + `-failed`. Record in checkpoint.json.
+7. Log all outcomes to `.deepflow/auto-memory.yaml` under `spike_insights`+`probe_learnings` (schema in auto-cycle.md). Both winners and losers.
+8. Cherry-pick winner into shared worktree. Winner → `[x] [PROBE_WINNER]`, losers → `[~] [PROBE_FAILED]`.
 
-
+#### 5.7.1. PROBE DIVERSITY (Optimize Probes)
 
-
+Roles: **contextualizada** (refine best), **contraditoria** (opposite of best), **ingenua** (fresh, no context).
 
-
-
-1. **Baseline:** Record `BASELINE=$(git rev-parse HEAD)` in shared worktree
-2. **Sub-worktrees:** Per spike: `git worktree add -b df/{spec}--probe-{SPIKE_ID} .deepflow/worktrees/{spec}/probe-{SPIKE_ID} ${BASELINE}`
-3. **Spawn:** All probes in ONE message, each targeting its probe worktree. End turn.
-4. **Ratchet:** Per notification, run standard ratchet (5.5) in probe worktree. Record: ratchet_passed, regressions, coverage_delta, files_changed, commit
-5. **Select winner** (after ALL complete, no LLM judge):
-   - Disqualify any with regressions
-   - **Standard spikes**: Rank: fewer regressions > higher coverage_delta > fewer files_changed > first to complete
-   - **Optimize probes**: Rank: best metric improvement (absolute delta toward target) > fewer regressions > fewer files_changed
-   - No passes → reset all to pending for retry with debugger
-6. **Preserve all worktrees.** Losers: rename branch + `-failed` suffix. Record in checkpoint.json under `"spike_probes"`
-7. **Log ALL probe outcomes** to `.deepflow/auto-memory.yaml` (main tree):
-```yaml
-spike_insights:
-  - date: "YYYY-MM-DD"
-    spec: "{spec_name}"
-    spike_id: "SPIKE_A"
-    hypothesis: "{from PLAN.md}"
-    outcome: "winner"
-    approach: "{one-sentence summary of what the winning probe chose}"
-    ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
-    branch: "df/{spec}--probe-SPIKE_A"
-  - date: "YYYY-MM-DD"
-    spec: "{spec_name}"
-    spike_id: "SPIKE_B"
-    hypothesis: "{from PLAN.md}"
-    outcome: "failed" # or "passed-but-lost"
-    failure_reason: "{first failed check + error summary}"
-    ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
-    worktree: ".deepflow/worktrees/{spec}/probe-SPIKE_B-failed"
-    branch: "df/{spec}--probe-SPIKE_B-failed"
-probe_learnings: # read by /df:auto-cycle each start AND included in per-task preamble
-  - spike: "SPIKE_A"
-    probe: "probe-SPIKE_A"
-    insight: "{one-sentence summary of winning approach — e.g. 'Use Node.js over Bun for Playwright'}"
-  - spike: "SPIKE_B"
-    probe: "probe-SPIKE_B"
-    insight: "{one-sentence summary from failure_reason}"
-```
-Create file if missing. Preserve existing keys when merging. Log BOTH winners and losers — downstream tasks need to know what was chosen, not just what failed.
-8. **Promote winner:** Cherry-pick into shared worktree. Winner → `[x] [PROBE_WINNER]`, losers → `[~] [PROBE_FAILED]`. Resume standard loop.
-
-#### 5.7.1. PROBE DIVERSITY ENFORCEMENT (Optimize Probes)
-
-When spawning probes for optimize plateau resolution, enforce diversity roles:
-
-**Role definitions:**
-- **contextualizada**: Builds on the best approach so far — refines, extends, or combines what worked. Prompt includes: "Build on the best result so far: {best_approach_summary}. Refine or extend it."
-- **contraditoria**: Tries the opposite of the current best. Prompt includes: "The best approach so far is {best_approach_summary}. Try the OPPOSITE direction — if it cached, don't cache; if it optimized hot path, optimize cold path; etc."
-- **ingenua**: No prior context — naive fresh attempt. Prompt includes: "Ignore all prior attempts. Approach this from scratch with no assumptions about what works."
-
-**Auto-scaling by probe round:**
-
-| Probe round | Count | Required roles |
-|-------------|-------|----------------|
+| Round | Count | Roles |
+|-------|-------|-------|
 | 1st plateau | 2 | 1 contraditoria + 1 ingenua |
 | 2nd plateau | 4 | 1 contextualizada + 2 contraditoria + 1 ingenua |
-| 3rd+
+| 3rd+ | 6 | 2 contextualizada + 2 contraditoria + 2 ingenua |
 
-
-- Every probe set MUST include ≥1 contraditoria and ≥1 ingenua (minimum diversity)
-- contextualizada only added from round 2+ (needs prior data to build on)
-- Each probe prompt includes its role label and role-specific instruction
-- Probe scale persists in `optimize_state.probe_scale` in `auto-memory.yaml`
+Every set: ≥1 contraditoria + ≥1 ingenua. contextualizada from round 2+ only. Scale persists in `optimize_state.probe_scale`.
 
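Steps 1–2 of the probe launch (baseline plus per-spike sub-worktrees) can be sketched in a throwaway repository; the spec name `demo` and spike IDs `A`/`B` are hypothetical:

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git config user.email dev@example.com
git config user.name dev
echo base > f.txt
git add -A; git commit -qm base

spec=demo                       # hypothetical spec name
BASELINE=$(git rev-parse HEAD)  # step 1: record the baseline commit in the shared worktree
for ID in A B; do               # step 2: one sub-worktree per spike, branched from BASELINE
  git worktree add -b "df/$spec--probe-$ID" ".deepflow/worktrees/$spec/probe-$ID" "$BASELINE"
done
git worktree list
```

Each probe then works on its own branch from the same baseline, so results stay comparable.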
 ### 5.9. OPTIMIZE CYCLE
 
-Trigger: task has `Optimize:` block
-
-**
-
-#### 5.9.1. INITIALIZATION
-
-1. Parse `Optimize:` block from PLAN.md task: `metric`, `target`, `direction`, `max_cycles`, `secondary_metrics`
-2. Load or initialize `optimize_state` from `.deepflow/auto-memory.yaml`:
-```yaml
-optimize_state:
-  task_id: "T{n}"
-  metric_command: "{shell command}"
-  target: {number}
-  direction: "higher|lower"
-  baseline: null # set on first measure
-  current_best: null # best metric value seen
-  best_commit: null # commit hash of best value
-  cycles_run: 0
-  cycles_without_improvement: 0
-  consecutive_reverts: 0
-  probe_scale: 0 # 0=no probes yet, 2/4/6
-  max_cycles: {number}
-  history: [] # [{cycle, value, delta, kept, commit}]
-  failed_hypotheses: [] # ["{description}"]
-```
-3. **Measure baseline**: `cd ${WORKTREE_PATH} && eval "${metric_command}"` → parse float → store as `baseline` and `current_best`
-4. Measure each secondary metric → store as `secondary_baselines`
-5. Check if target already met (`direction: higher` → baseline >= target; `lower` → baseline <= target). If met → mark task `[x]`, log "target already met: {baseline}", done.
-
-#### 5.9.2. CYCLE LOOP
-
-Each cycle = one agent spawn + measure + keep/revert decision.
+Trigger: task has `Optimize:` block. One at a time, N cycles until stop condition.
+
+**Init:** Parse metric/target/direction/max_cycles/secondary_metrics. Load or init `optimize_state` in auto-memory.yaml (fields: task_id, metric_command, target, direction, baseline, current_best, best_commit, cycles_run, cycles_without_improvement, consecutive_reverts, probe_scale, max_cycles, history[], failed_hypotheses[]). Measure baseline (`eval` with cwd=worktree) → store as baseline+current_best. Measure secondaries. Target met → mark `[x]`, done.
 
+**Cycle loop:**
 ```
 REPEAT:
-  1. Check stop conditions
-  2. Spawn ONE optimize agent (
-  3.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-  m. Check context %. If ≥50% → checkpoint and exit (auto-cycle resumes).
-```
+  1. Check stop conditions → if triggered, exit
+  2. Spawn ONE optimize agent (§6) run_in_background=true. STOP, end turn.
+  3. On notification:
+     a. Ratchet fail → revert, ++consecutive_reverts, log hypothesis, goto 1
+     b. Metric parse error → revert, ++consecutive_reverts
+     c. improvement = (new - best) / |best| × 100 (flip for lower; absolute if best==0)
+     d. >= 1% threshold → KEEP, update best, reset counters
+     e. < threshold → REVERT, ++cycles_without_improvement
+     f. ++cycles_run, append history, check secondaries, persist state
+     g. Report: "⟳ T{n} cycle {N}: {old}→{new} ({delta}%) — {kept|reverted} [best: X, target: Y]"
+     h. Context ≥50% → checkpoint, exit
+```
+
+**Stop conditions:**
+
+| Condition | Action |
+|-----------|--------|
+| Target reached | Mark `[x]` |
+| cycles_run >= max_cycles | Mark `[x]`. If best < baseline → `git reset --hard {best_commit}` |
+| 3 cycles without improvement | Launch probes (plateau) |
+| 3 consecutive reverts | Halt, task `[ ]`, requires human intervention |
 
-
-
-
-|-----------|-----------|--------|
-| **Target reached** | `direction: higher` → value >= target; `lower` → value <= target | Mark task `[x]`, log "target reached: {value}" |
-| **Max cycles** | `cycles_run >= max_cycles` | Mark task `[x]` with note: "max cycles reached, best: {current_best}". If current_best worse than baseline → `git reset --hard {best_commit}`, log "reverted to best-known" |
-| **Plateau** | `cycles_without_improvement >= 3` | Pause normal cycle → launch probes (5.9.4) |
-| **Circuit breaker** | `consecutive_reverts >= 3` | Halt, task stays `[ ]`, log "circuit breaker: 3 consecutive reverts". Requires human intervention. |
-
-On **max cycles** with final value worse than baseline:
-1. `git reset --hard {best_commit}` in worktree
-2. Log: "final value {current} worse than baseline {baseline}, reverted to best-known commit {best_commit} (value: {current_best})"
-
-#### 5.9.4. PLATEAU → PROBE LAUNCH
-
-When plateau detected (3 cycles without ≥1% improvement):
-
-1. Pause normal optimize cycle
-2. Determine probe count from `probe_scale` (section 5.7.1 auto-scaling table): 0→2, 2→4, 4→6
-3. Update `probe_scale` in optimize_state
-4. Record `BASELINE=$(git rev-parse HEAD)` in shared worktree
-5. Create sub-worktrees per probe: `git worktree add -b df/{spec}--opt-probe-{N} .deepflow/worktrees/{spec}/opt-probe-{N} ${BASELINE}`
-6. Spawn ALL probes in ONE message using Optimize Probe prompt (section 6), each with its diversity role
-7. End turn. Wait for all notifications.
-8. Per notification: run ratchet + metric measurement in probe worktree
-9. Select winner (section 5.7 step 5, optimize ranking): best metric improvement toward target
-10. Winner → cherry-pick into shared worktree, update current_best, reset cycles_without_improvement=0
-11. Losers → rename branch with `-failed` suffix, preserve worktrees
-12. Log all probe outcomes to `auto-memory.yaml` under `spike_insights` (reuse existing format)
-13. Log probe learnings: winning approach summary + each loser's failure reason
-14. Resume normal optimize cycle from step 1
-
-#### 5.9.5. STATE PERSISTENCE (auto-memory.yaml)
-
-After every cycle, write `optimize_state` to `.deepflow/auto-memory.yaml` (main tree). This ensures:
-- Context exhaustion at 50% → auto-cycle resumes with full history
-- Failed hypotheses carry forward (agents won't repeat approaches)
-- Probe scale persists across context windows
-
-Also append cycle results to `.deepflow/auto-report.md`:
-```
-## Optimize: T{n} — {metric_name}
-| Cycle | Value | Delta | Kept | Commit |
-|-------|-------|-------|------|--------|
-| 1 | 72.3 | — | baseline | abc123 |
-| 2 | 74.1 | +2.5% | ✓ | def456 |
-| 3 | 73.8 | -0.4% | ✗ | (reverted) |
-...
-Best: {current_best} | Target: {target} | Status: {in_progress|reached|max_cycles|circuit_breaker}
-```
+**Plateau → probes:** Scale 0→2, 2→4, 4→6 per §5.7.1. Create sub-worktrees, spawn all with diversity roles (§6 Optimize Probe). Per notification: ratchet + metric. Winner → cherry-pick, update best, reset counters. Losers → `-failed`. Log outcomes. Resume cycle.
+
+**State persistence:** Write `optimize_state` to auto-memory.yaml after every cycle. Append results table to `.deepflow/auto-report.md`.
 
 ---
 
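The improvement formula in the new cycle loop (step 3c) can be expressed as a small shell helper; the function name `improvement` is mine, not part of the package, and the two sample values are hypothetical:

```shell
# improvement = (new - best) / |best| * 100, flipped for direction "lower";
# when best == 0 fall back to the absolute delta to avoid dividing by zero.
improvement() {
  awk -v n="$1" -v b="$2" -v d="$3" 'BEGIN {
    if (b == 0) imp = n - b
    else        imp = (n - b) / (b < 0 ? -b : b) * 100
    if (d == "lower") imp = -imp   # for lower-is-better metrics a drop is a gain
    printf "%.2f\n", imp
  }'
}

improvement 74.1 72.3 higher   # a higher-is-better metric that rose
improvement 110 120 lower      # latency dropped, so this is also positive
```

Both calls clear the 1% keep threshold from the loop; a result below it would trigger the REVERT branch.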
 ### 6. PER-TASK (agent prompt)
 
-
-> Critical instructions go at start and end. Navigable data goes in the middle.
-> See: Chroma "Context Rot" (2025) — performance degrades ~2%/100K tokens; distractors and semantic ambiguity compound degradation.
-
-**Common preamble (include in all agent prompts):**
-```
-Working directory: {worktree_absolute_path}
-All file operations MUST use this absolute path as base. Do NOT write files to the main project directory.
-Commit format: {commit_type}({spec}): {description}
-```
-
-**Standard Task** (spawn with `Agent(model="{Model from PLAN.md}", ...)`):
-
-Prompt sections in order (START = high attention, MIDDLE = navigable data, END = high attention):
-
-```
---- START (high attention zone) ---
-
-{task_id}: {description from PLAN.md}
-Files: {target files} Spec: {spec_name}
-
-{Prior failure context — include ONLY if task was previously reverted. Read from .deepflow/auto-memory.yaml revert_history for this task_id:}
-DO NOT repeat these approaches:
-- Cycle {N}: reverted — "{reason from revert_history}"
-{Omit this entire block if task has no revert history.}
-
-{Acceptance criteria excerpt — extract 2-3 key ACs from the spec file (specs/doing-*.md). Include only the criteria relevant to THIS task, not the full spec.}
-Success criteria:
-- {AC relevant to this task}
-- {AC relevant to this task}
-{Omit if spec has no structured ACs.}
-
---- MIDDLE (navigable data zone) ---
-
-{Impact block from PLAN.md — include verbatim if present. Annotate each caller with WHY it's impacted:}
-Impact:
-- Callers: {file} ({why — e.g. "imports validateToken which you're changing"})
-- Duplicates:
-  - {file} [active — consolidate]
-  - {file} [dead — DELETE]
-- Data flow: {consumers}
-{Omit if no Impact in PLAN.md.}
-
-{Dependency context — for each completed blocker task, include a one-liner summary:}
-Prior tasks:
-- {dep_task_id}: {one-line summary of what changed — e.g. "refactored validateToken to async, changed signature (string) → (string, opts)"}
-{Omit if task has no dependencies or all deps are bootstrap/spike tasks.}
-
-Steps:
-1. External APIs/SDKs → chub search "<library>" --json → chub get <id> --lang <lang> (skip if chub unavailable or internal code only)
-2. LSP freshness check: run `findReferences` on each function/type you're about to change. If callers exist beyond the Impact list, add them to your scope before implementing.
-3. Read ALL files in Impact (+ any new callers from step 2) before implementing — understand the full picture
|
|
482
|
-
4. Implement the task, updating all impacted files
|
|
483
|
-
5. Commit as feat({spec}): {description}
|
|
484
|
-
|
|
485
|
-
--- END (high attention zone) ---
|
|
486
|
-
|
|
487
|
-
{If .deepflow/auto-memory.yaml exists and has probe_learnings, include:}
|
|
488
|
-
Spike results (follow these approaches):
|
|
489
|
-
{each probe_learning with outcome "winner" → "- {insight}"}
|
|
490
|
-
{Omit this block if no probe_learnings exist.}
|
|
491
|
-
|
|
492
|
-
If Impact lists duplicates: [active] → consolidate into single source of truth. [dead] → DELETE entirely.
|
|
493
|
-
Your ONLY job is to write code and commit. Orchestrator runs health checks after.
|
|
494
|
-
STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main.
|
|
495
|
-
```
|
|
496
|
-
|
|
497
|
-
**Effort-aware context budget:** For `Effort: low` tasks, omit the MIDDLE section entirely (no Impact, no dependency context, no steps). For `Effort: medium`, include Impact but omit dependency context. For `Effort: high`, include everything.
|
|
498
|
-
|
|
499
|
-
**Bootstrap Task:**
|
|
500
|
-
```
|
|
501
|
-
BOOTSTRAP: Write tests for files in edit_scope
|
|
502
|
-
Files: {edit_scope files} Spec: {spec_name}
|
|
503
|
-
|
|
504
|
-
Write tests covering listed files. Do NOT change implementation files.
|
|
505
|
-
Commit as test({spec}): bootstrap tests for edit_scope
|
|
506
|
-
```
|
|
182
|
+
+**Common preamble (all):** `Working directory: {worktree_absolute_path}. All file ops use this path. Commit format: {type}({spec}): {desc}`
 
-**
+**Standard Task** (`Agent(model="{Model}", ...)`):
 ```
-
-Files: {
-
-
-
-
-
-
-
+--- START ---
+{task_id}: {description} Files: {files} Spec: {spec}
+{If reverted: DO NOT repeat: - Cycle {N}: "{reason}"}
+Success criteria: {ACs from spec relevant to this task}
+--- MIDDLE (omit for low effort; omit deps for medium) ---
+Impact: Callers: {file} ({why}) | Duplicates: [active→consolidate] [dead→DELETE] | Data flow: {consumers}
+Prior tasks: {dep_id}: {summary}
+Steps: 1. chub search/get for APIs 2. LSP findReferences, add unlisted callers 3. Read all Impact files 4. Implement 5. Commit
+--- END ---
+Spike results: {winner learnings}
+Duplicates: [active]→consolidate [dead]→DELETE. ONLY job: code+commit. No merge/rename/checkout.
 ```
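The effort-aware omission of the MIDDLE zone could be assembled roughly as follows — a sketch under assumptions; the function and task-field names are hypothetical, not deepflow's actual code:

```javascript
// Assemble the START/MIDDLE/END agent prompt; low effort drops the whole
// MIDDLE zone, medium keeps Impact but drops dependency context.
function buildPrompt(task) {
  const start = [
    `${task.id}: ${task.description}`,
    `Files: ${task.files} Spec: ${task.spec}`,
  ];
  const middle = [];
  if (task.effort !== "low") {
    if (task.impact) middle.push(`Impact: ${task.impact}`);
    if (task.effort === "high" && task.deps) middle.push(`Prior tasks: ${task.deps}`);
  }
  const end = [
    "Your ONLY job is to write code and commit.",
    "STOP after committing.",
  ];
  return [...start, ...middle, ...end].join("\n");
}
```

With `effort: "low"` the returned prompt contains no `Impact:` or `Prior tasks:` lines at all, which is the point of the budget: cheap tasks get cheap context.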
 
-**
+**Bootstrap:** `BOOTSTRAP: Write tests for edit_scope files. Do NOT change implementation. Commit as test({spec}): bootstrap`
 
-
+**Spike:** `{task_id} [SPIKE]: {hypothesis}. Files+Spec. {reverted warnings}. Minimal spike. Commit as spike({spec}): {desc}`
 
+**Optimize Task** (`Agent(model="opus")`):
 ```
---- START
-
-{
-
-
-
-
-
-
-CONSTRAINT: Make exactly ONE atomic change. Do not refactor broadly.
-The metric is measured by: {metric_command}
-You succeed if the metric moves toward {target} after your change.
-
---- MIDDLE (navigable data zone) ---
-
-Attempt history (last 5 cycles):
-{For each recent history entry:}
-- Cycle {N}: {value} ({+/-delta}%) — {kept|reverted} — "{one-line description of what was tried}"
-{Omit if cycle 1.}
-
-DO NOT repeat these failed approaches:
-{For each failed_hypothesis in optimize_state:}
-- "{hypothesis description}"
-{Omit if no failed hypotheses.}
-
-{Impact block from PLAN.md if present}
-
-{Dependency context if present}
-
-Steps:
-1. Analyze the metric command to understand what's being measured
-2. Read the target files and identify ONE specific improvement
-3. Implement the change (ONE atomic modification)
-4. Commit as feat({spec}): optimize {metric_name} — {what you changed}
-
---- END (high attention zone) ---
-
-{Spike/probe learnings if any}
-
-Your ONLY job is to make ONE atomic change and commit. Orchestrator measures the metric after.
-Do NOT run the metric command yourself. Do NOT make multiple changes.
-STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main.
+--- START ---
+{task_id} [OPTIMIZE]: {metric} — cycle {N}/{max}. Files+Spec.
+Current: {val} (baseline: {b}, best: {best}). Target: {t} ({dir}). Metric: {cmd}
+CONSTRAINT: ONE atomic change.
+--- MIDDLE ---
+Last 5 cycles + failed hypotheses + Impact/deps.
+--- END ---
+{Learnings}. ONE change + commit. No metric run, no multiple changes.
 ```
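The orchestrator-side keep/revert decision after each optimize commit can be sketched as a pure predicate — the function name and argument shape are illustrative assumptions, not deepflow's API — both the ratchet and the metric must pass, and the metric must move toward the target:

```javascript
// Keep the commit only if the ratchet passed AND the metric moved
// toward the target relative to the current best.
function shouldKeep({ ratchetPassed, value, best, target }) {
  if (!ratchetPassed) return false;
  // Direction is inferred from where the target sits relative to best:
  // target above best → maximize; target below best → minimize.
  const towardTarget = target >= best ? value > best : value < best;
  return towardTarget;
}

// Maximizing a score: best 72.3, new value 74.1, target 90 → keep.
shouldKeep({ ratchetPassed: true, value: 74.1, best: 72.3, target: 90 }); // → true
```

Anything that fails either gate is reverted, which is what makes the cycle a ratchet rather than a random walk.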
 
-**Optimize Probe
-
-Used during plateau resolution. Each probe has a diversity role.
-
+**Optimize Probe** (`Agent(model="opus")`):
 ```
---- START
-
-
-
-
-
-
-
-
-
-contextualizada: "Build on the best approach so far: {best_approach_summary}. Refine, extend, or combine what worked."
-contraditoria: "The best approach so far was: {best_approach_summary}. Try the OPPOSITE — if it optimized X, try Y instead. Challenge the current direction."
-ingenua: "Ignore all prior attempts. Approach this metric from scratch with no assumptions about what has or hasn't worked."
-
---- MIDDLE (navigable data zone) ---
-
-Full attempt history:
-{ALL history entries from optimize_state}
-- Cycle {N}: {value} ({+/-delta}%) — {kept|reverted}
-
-All failed approaches (DO NOT repeat):
-{ALL failed_hypotheses}
-- "{hypothesis description}"
-
---- END (high attention zone) ---
-
-Make ONE atomic change that moves the metric toward {target}.
-Commit as feat({spec}): optimize probe {probe_id} — {what you changed}
-STOP after committing.
+--- START ---
+{task_id} [OPTIMIZE PROBE]: {metric} — probe {id} ({role})
+Current/Target. Role instruction:
+contextualizada: "Build on best: {summary}. Refine."
+contraditoria: "Best was: {summary}. Try OPPOSITE."
+ingenua: "Ignore prior. Fresh approach."
+--- MIDDLE ---
+Full history + all failed hypotheses.
+--- END ---
+ONE atomic change. Commit. STOP.
 ```
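Role assignment for a probe wave could be sketched like this — a hypothetical helper, not deepflow's code — enforcing the Rules-table invariant that every wave contains at least one contraditoria and one ingenua probe:

```javascript
const roles = ["contextualizada", "contraditoria", "ingenua"];

// Pick `count` diversity roles, guaranteeing ≥1 contraditoria and
// ≥1 ingenua; remaining slots are filled round-robin.
function probeRoles(count) {
  const picked = ["contraditoria", "ingenua"];
  for (let i = picked.length; i < count; i++) picked.push(roles[i % roles.length]);
  return picked.slice(0, count);
}
```

A wave of 2 gets exactly the two mandatory roles; larger waves add contextualizada and repeats, so scaling 2→4→6 never loses the contrarian or naive perspective.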
 
 ### 8. COMPLETE SPECS
 
-
-1.
-
-- If verify fails (adds fix tasks): stop here — `/df:execute --continue` will pick up the fix tasks
-- If verify passes: proceed to step 2
-2. Remove spec's ENTIRE section from PLAN.md (header, tasks, summaries, fix tasks, separators)
-3. Recalculate Summary table at top of PLAN.md
+All tasks done for `doing-*` spec:
+1. `skill: "df:verify", args: "doing-{name}"` — runs L0-L4 gates, merges, cleans worktree, renames doing→done, extracts decisions. Fail (fix tasks added) → stop; `--continue` picks them up.
+2. Remove spec's ENTIRE section from PLAN.md. Recalculate Summary table.
 
 ---
 
 ## Usage
 
 ```
-/df:execute #
-/df:execute T1 T2 # Specific tasks
-/df:execute --continue # Resume
+/df:execute # All ready tasks
+/df:execute T1 T2 # Specific tasks
+/df:execute --continue # Resume checkpoint
 /df:execute --fresh # Ignore checkpoint
 /df:execute --dry-run # Show plan only
 ```
 
 ## Skills & Agents
 
-
-- Skill: `browse-fetch` — Fetch live web pages and external API docs via browser before coding
-
-| Agent | subagent_type | Purpose |
-|-------|---------------|---------|
-| Implementation | `general-purpose` | Task implementation |
-| Debugger | `reasoner` | Debugging failures |
-
-**Model + effort routing:** Read `Model:` and `Effort:` fields from each task block in PLAN.md. Pass `model:` parameter when spawning the agent. Prepend effort instruction to the agent prompt. Defaults: `Model: sonnet`, `Effort: medium`.
-
-| Task fields | Agent call | Prompt preamble |
-|-------------|-----------|-----------------|
-| `Model: haiku, Effort: low` | `Agent(model="haiku", ...)` | `You MUST be maximally efficient: skip explanations, minimize tool calls, go straight to implementation.` |
-| `Model: sonnet, Effort: medium` | `Agent(model="sonnet", ...)` | `Be direct and efficient. Explain only when the logic is non-obvious.` |
-| `Model: opus, Effort: high` | `Agent(model="opus", ...)` | _(no preamble — default behavior)_ |
-| (missing) | `Agent(model="sonnet", ...)` | `Be direct and efficient. Explain only when the logic is non-obvious.` |
+Skills: `atomic-commits`, `browse-fetch`. Agents: Implementation (`general-purpose`), Debugger (`reasoner`).
 
-**
-- `low` → Prepend efficiency instruction. Agent should make fewest possible tool calls.
-- `medium` → Prepend balanced instruction. Agent skips preamble but explains non-obvious decisions.
-- `high` → No preamble added. Agent uses full reasoning capabilities.
+**Model+effort routing** (read from PLAN.md, defaults: sonnet/medium):
 
-
-
-
-
+| Fields | Agent | Preamble |
+|--------|-------|----------|
+| haiku/low | `Agent(model="haiku")` | `Maximally efficient: skip explanations, minimize tool calls, straight to implementation.` |
+| sonnet/medium | `Agent(model="sonnet")` | `Direct and efficient. Explain only non-obvious logic.` |
+| opus/high | `Agent(model="opus")` | _(none)_ |
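The routing lookup itself is a two-field defaulted dictionary read; a minimal sketch, assuming abbreviated preamble strings and hypothetical names (not deepflow's literal implementation):

```javascript
// Preamble per effort level; empty string means no preamble (full reasoning).
const PREAMBLES = {
  low: "Maximally efficient: skip explanations, minimize tool calls.",
  medium: "Direct and efficient. Explain only non-obvious logic.",
  high: "",
};

// Apply PLAN.md defaults (sonnet/medium) when fields are missing.
function route(task) {
  const model = task.model ?? "sonnet";
  const effort = task.effort ?? "medium";
  return { model, preamble: PREAMBLES[effort] };
}
```

A task block with no `Model:`/`Effort:` fields routes exactly like an explicit `sonnet/medium` one, which is what the table's defaults row encodes.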
 
-
+**Checkpoint:** `.deepflow/checkpoint.json`: `{"completed_tasks":["T1"],"current_wave":2,"worktree_path":"...","worktree_branch":"df/..."}`
 
 ## Failure Handling
 
-
-- `TaskUpdate(taskId: native_id, status: "pending")` — dependents remain blocked
-- Repeated failure → spawn `Task(subagent_type="reasoner", prompt="Debug failure: {ratchet output}")`
-- Leave worktree intact, keep checkpoint.json
-- Output: worktree path/branch, `cd {path}` to investigate, `--continue` to resume, `--fresh` to discard
+Reverted task: `TaskUpdate(status: "pending")`, dependents stay blocked. Repeated failure → spawn reasoner debugger. Leave worktree+checkpoint intact. Output: path, `cd` command, `--continue`/`--fresh` options.
 
 ## Rules
 
 | Rule | Detail |
 |------|--------|
-| Zero
+| Zero tests → bootstrap first | Sole task when snapshot empty |
 | 1 task = 1 agent = 1 commit | `atomic-commits` skill |
-| 1 file = 1 writer | Sequential
-| Agent
-| No LLM evaluates LLM
-| ≥2 spikes
-|
-| Machine-selected winner | Regressions > coverage > files
+| 1 file = 1 writer | Sequential on conflict |
+| Agent codes, orchestrator measures | Ratchet judges |
+| No LLM evaluates LLM | Health checks only |
+| ≥2 spikes → parallel probes | Never sequential |
+| Probe worktrees preserved | Losers `-failed`, never deleted |
+| Machine-selected winner | Regressions > coverage > files; no LLM judge |
 | External APIs → chub first | Skip if unavailable |
-| 1 optimize
-| Optimize = atomic
-| Ratchet + metric
-| Plateau → probes
-| Circuit breaker = 3
-|
-
-## Example
-
-```
-/df:execute (context: 12%)
-
-Loading PLAN.md... T1 ready, T2/T3 blocked by T1
-Ratchet snapshot: 24 pre-existing test files
-
-Wave 1: TaskUpdate(T1, in_progress)
-[Agent "T1" completed]
-Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
-✓ T1: ratchet passed (abc1234)
-TaskUpdate(T1, completed) → auto-unblocks T2, T3
-
-Wave 2: TaskUpdate(T2/T3, in_progress)
-[Agent "T2" completed] ✓ T2: ratchet passed (def5678)
-[Agent "T3" completed] ✓ T3: ratchet passed (ghi9012)
-
-Context: 35% — All tasks done for doing-upload.
-Running /df:verify doing-upload...
-✓ L0 | ✓ L1 (3/3 files) | ⚠ L2 (no coverage tool) | ✓ L4 (24 tests)
-✓ Merged df/upload to main
-✓ Spec complete: doing-upload → done-upload
-Complete: 3/3
-```
+| 1 optimize at a time | Sequential |
+| Optimize = atomic only | One change per cycle |
+| Ratchet + metric both required | Keep only if both pass |
+| Plateau → probes | 3 cycles <1% triggers probes |
+| Circuit breaker = 3 reverts | Halts, needs human |
+| Probe diversity | ≥1 contraditoria + ≥1 ingenua |