deepflow 0.1.87 → 0.1.89
- package/bin/install.js +73 -7
- package/hooks/df-dashboard-push.js +170 -0
- package/hooks/df-execution-history.js +120 -0
- package/hooks/df-invariant-check.js +126 -0
- package/hooks/df-spec-lint.js +78 -4
- package/hooks/df-statusline.js +77 -5
- package/hooks/df-tool-usage-spike.js +41 -0
- package/hooks/df-tool-usage.js +86 -0
- package/hooks/df-worktree-guard.js +101 -0
- package/package.json +1 -1
- package/src/commands/df/auto-cycle.md +75 -558
- package/src/commands/df/auto.md +9 -48
- package/src/commands/df/consolidate.md +14 -38
- package/src/commands/df/dashboard.md +35 -0
- package/src/commands/df/debate.md +27 -156
- package/src/commands/df/discover.md +35 -181
- package/src/commands/df/execute.md +283 -563
- package/src/commands/df/note.md +37 -176
- package/src/commands/df/plan.md +80 -210
- package/src/commands/df/report.md +29 -184
- package/src/commands/df/resume.md +18 -101
- package/src/commands/df/spec.md +49 -145
- package/src/commands/df/verify.md +59 -606
- package/src/skills/browse-fetch/SKILL.md +32 -257
- package/src/skills/browse-verify/SKILL.md +40 -174
- package/src/skills/code-completeness/SKILL.md +2 -9
- package/src/skills/gap-discovery/SKILL.md +19 -86
- package/templates/config-template.yaml +10 -0
- package/templates/spec-template.md +12 -1
package/src/commands/df/execute.md:

@@ -10,36 +10,27 @@ description: Execute tasks from PLAN.md with agent spawning, ratchet health chec
 You are a coordinator. Spawn agents, run ratchet checks, update PLAN.md. Never implement code yourself.
 
 **NEVER:** Read source files, edit code, use TaskOutput, use EnterPlanMode, use ExitPlanMode
-
-**ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, run ratchet health checks after each agent completes, update PLAN.md, write `.deepflow/decisions.md` in the main tree
+**ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, run ratchet health checks, update PLAN.md, write `.deepflow/decisions.md`
 
 ## Core Loop (Notification-Driven)
 
-Each task = one background agent.
-
-**NEVER use TaskOutput** — returns full transcripts (100KB+) that explode context.
+Each task = one background agent. **NEVER use TaskOutput** (100KB+ transcripts explode context).
 
 ```
 1. Spawn ALL wave agents with run_in_background=true in ONE message
-2. STOP. End
+2. STOP. End turn. Do NOT poll.
 3. On EACH notification:
-   a.
-   b. Passed → TaskUpdate(status: "completed"), update PLAN.md [x] + commit hash
-   c. Failed →
-   d.
-   e.
-
+   a. Ratchet check (§5.5)
+   b. Passed → wave test agent (§5.6). Tests pass → re-snapshot (§5.6) → TaskUpdate(status: "completed"), update PLAN.md [x] + commit hash
+   c. Failed → partial salvage (§5.5). Salvaged → wave test agent (§5.6). Not → git revert, TaskUpdate(status: "pending")
+   d. Wave test agent failed after max attempts → revert ALL task commits, TaskUpdate(status: "pending")
+   e. Report ONE line: "✓ T1: ratchet+tests passed (abc123)" or "⚕ T1: salvaged+tested (abc124)" or "✗ T1: reverted" or "✗ T1: test agent failed, reverted"
+   f. NOT all done → end turn, wait | ALL done → next wave or finish
+4. Between waves: context ≥50% → checkpoint and exit.
 5. Repeat until: all done, all blocked, or context ≥50%.
 ```
 
-
-
-Statusline writes `.deepflow/context.json`: `{"percentage": 45}`
-
-| Context % | Action |
-|-----------|--------|
-| < 50% | Full parallelism (up to 5 agents) |
-| ≥ 50% | Wait for running agents, checkpoint, exit |
+**Context threshold:** Statusline writes `.deepflow/context.json`: `{"percentage": 45}`. <50% = full parallelism (up to 5). ≥50% = wait, checkpoint, exit.
 
 ---
 
@@ -47,109 +38,70 @@ Statusline writes `.deepflow/context.json`: `{"percentage": 45}`
 
 ### 1. CHECK CHECKPOINT
 
-
-
-→ Verify worktree exists on disk (else error: "Use --fresh")
-→ Skip completed tasks, resume execution
---fresh → Delete checkpoint, start fresh
-checkpoint exists → Prompt: "Resume? (y/n)"
-else → Start fresh
-```
-
-Shell injection (use output directly — no manual file reads needed):
-- `` !`cat .deepflow/checkpoint.json 2>/dev/null || echo 'NOT_FOUND'` ``
-- `` !`git diff --quiet && echo 'CLEAN' || echo 'DIRTY'` ``
+`--continue` → load `.deepflow/checkpoint.json`, verify worktree exists (else error "Use --fresh"), skip completed. `--fresh` → delete checkpoint. Checkpoint exists → prompt "Resume? (y/n)".
+Shell: `` !`cat .deepflow/checkpoint.json 2>/dev/null || echo 'NOT_FOUND'` `` / `` !`git diff --quiet && echo 'CLEAN' || echo 'DIRTY'` ``
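The two injected probes can be exercised on their own; the sketch below runs them in a scratch repository (the directory is a stand-in for a real project tree):

```shell
# Checkpoint probes in a scratch repo: no checkpoint file yet, clean tree.
cd "$(mktemp -d)"
mkdir -p .deepflow
git init -q .
cat .deepflow/checkpoint.json 2>/dev/null || echo 'NOT_FOUND'   # prints NOT_FOUND
git diff --quiet && echo 'CLEAN' || echo 'DIRTY'                # prints CLEAN
```

The command output is injected directly into the prompt, so the coordinator branches on the literal strings `NOT_FOUND`/`CLEAN`/`DIRTY` without reading any files itself.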
 
 ### 1.5. CREATE WORKTREE
 
-Require clean HEAD
-Create worktree: `.deepflow/worktrees/{spec}` on branch `df/{spec}`.
-Reuse if exists. `--fresh` deletes first.
-
-If `worktree.sparse_paths` is non-empty in config, enable sparse checkout:
-```bash
-git worktree add --no-checkout -b df/{spec} .deepflow/worktrees/{spec}
-cd .deepflow/worktrees/{spec}
-git sparse-checkout set {sparse_paths...}
-git checkout df/{spec}
-```
+Require clean HEAD. Derive SPEC_NAME from `specs/doing-*.md`. Create `.deepflow/worktrees/{spec}` on branch `df/{spec}`. Reuse if exists; `--fresh` deletes first. If `worktree.sparse_paths` non-empty: `git worktree add --no-checkout`, `sparse-checkout set {paths}`, checkout.
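A minimal sketch of the sparse-checkout worktree sequence in a throwaway repository — `demo-spec` stands in for `{spec}`, and a git with cone-mode `sparse-checkout` (≥ 2.26) is assumed:

```shell
# Throwaway repo: worktree created without checkout, then narrowed to src/ before checkout.
cd "$(mktemp -d)"
git init -q .
git config user.email df@example.com && git config user.name df
mkdir -p src docs
echo a > src/a.txt && echo b > docs/b.txt
git add -A && git commit -qm init
git worktree add --no-checkout -b df/demo-spec .deepflow/worktrees/demo-spec
cd .deepflow/worktrees/demo-spec
git sparse-checkout set src      # cone mode: src/ plus top-level files
git checkout -q df/demo-spec     # materializes only the sparse paths
ls                               # src only; docs/ is not checked out
```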
 
 ### 1.6. RATCHET SNAPSHOT
 
-Snapshot pre-existing test files
-
+Snapshot pre-existing test files — only these count for ratchet (agent-created excluded):
 ```bash
-
-git ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' \
-  > .deepflow/auto-snapshot.txt
+git -C ${WORKTREE_PATH} ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' > .deepflow/auto-snapshot.txt
 ```
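The snapshot regex can be sanity-checked against sample paths (the file names below are illustrative):

```shell
# Sample paths vs the snapshot regex: only recognized test-file layouts survive the filter.
printf '%s\n' src/app.ts src/app.test.ts tests/api.py utils_test.go __tests__/ui.jsx README.md \
  | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/'
# → src/app.test.ts, tests/api.py, utils_test.go, __tests__/ui.jsx
```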
 
 ### 1.7. NO-TESTS BOOTSTRAP
 
-
+<!-- AC-1: zero test files triggers bootstrap before wave 1 -->
+<!-- AC-2: bootstrap success re-snapshots auto-snapshot.txt; subsequent tasks use updated snapshot -->
+<!-- AC-3: bootstrap failure with default model retries with Opus; double failure halts with specific message -->
 
-1.
-
-
+**Gate:** After §1.6 snapshot, check `auto-snapshot.txt`:
+```bash
+SNAPSHOT_COUNT=$(wc -l < .deepflow/auto-snapshot.txt | tr -d ' ')
+```
+If `SNAPSHOT_COUNT` is `0` (zero test files found), MUST spawn bootstrap agent before wave 1. No implementation tasks may start until bootstrap completes successfully.
 
-
+**Bootstrap flow:**
+1. Spawn `Agent(model="{default_model}", ...)` with Bootstrap prompt (§6). End turn, wait for notification.
+2. **On success (TASK_STATUS:pass):** Re-snapshot immediately:
+   ```bash
+   git -C ${WORKTREE_PATH} ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' > .deepflow/auto-snapshot.txt
+   ```
+   All subsequent tasks use this updated snapshot as their ratchet baseline. Proceed to wave 1.
+3. **On failure (TASK_STATUS:fail) with default model:** Retry ONCE with `Agent(model="opus", ...)` using the same Bootstrap prompt.
+   - Opus success → re-snapshot (same command above) → proceed to wave 1.
+   - Opus failure → halt with message: `"Bootstrap failed with both default and Opus — manual intervention required"`. Do not proceed.
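The `SNAPSHOT_COUNT` gate reduces to a line count on the snapshot file; a sketch against an empty scratch file:

```shell
# Empty snapshot file → zero pre-existing test files → bootstrap gate fires.
snap=$(mktemp)
: > "$snap"
SNAPSHOT_COUNT=$(wc -l < "$snap" | tr -d ' ')
if [ "$SNAPSHOT_COUNT" = "0" ]; then echo 'bootstrap required'; else echo 'proceed to wave 1'; fi
# → bootstrap required
```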
 
 ### 2. LOAD PLAN
 
-
-Load: PLAN.md (required), specs/doing-*.md, .deepflow/config.yaml
-If missing: "No PLAN.md found. Run /df:plan first."
-```
-
-Shell injection (use output directly — no manual file reads needed):
-- `` !`cat .deepflow/checkpoint.json 2>/dev/null || echo 'NOT_FOUND'` ``
-- `` !`git diff --quiet && echo 'CLEAN' || echo 'DIRTY'` ``
+Load PLAN.md (required), specs/doing-*.md, .deepflow/config.yaml. Missing → "No PLAN.md found. Run /df:plan first."
 
 ### 2.5. REGISTER NATIVE TASKS
 
-For each `[ ]` task
+For each `[ ]` task: `TaskCreate(subject: "{task_id}: {description}", activeForm: "{gerund}", description: full block)`. Store task_id → native ID. Set deps via `TaskUpdate(addBlockedBy: [...])`. `--continue` → only remaining `[ ]` items.
 
-### 3.
+### 3–4. READY TASKS
 
-Warn if `specs/*.md` (excluding doing-/done-) exist
-
-### 4. IDENTIFY READY TASKS
-
-Ready = TaskList where status: "pending" AND blockedBy: empty.
+Warn if unplanned `specs/*.md` (excluding doing-/done-) exist (non-blocking). Ready = TaskList where status: "pending" AND blockedBy: empty.
 
 ### 5. SPAWN AGENTS
 
-Context ≥50
-
-Before spawning: `TaskUpdate(taskId: native_id, status: "in_progress")` — activates UI spinner.
-
-**Token tracking — record start:**
-```
-start_percentage = !`grep -o '"percentage":[0-9]*' .deepflow/context.json 2>/dev/null | grep -o '[0-9]*' || echo ''`
-start_timestamp = !`date -u +%Y-%m-%dT%H:%M:%SZ`
-```
-Store both values in memory (keyed by task_id) for use after ratchet completes. Omit if context.json unavailable.
-
-**NEVER use `isolation: "worktree"` on Task calls.** Deepflow manages a shared worktree so wave 2 sees wave 1 commits.
+Context ≥50% → checkpoint and exit. Before spawning: `TaskUpdate(status: "in_progress")`.
 
-**
+**Token tracking start:** Store `start_percentage` (from context.json) and `start_timestamp` (ISO 8601) keyed by task_id. Omit if unavailable.
 
-**
-Before spawning, check `Files:` lists of all ready tasks. If two+ ready tasks share a file:
-1. Sort conflicting tasks by task number (T1 < T2 < T3)
-2. Spawn only the lowest-numbered task from each conflict group
-3. Remaining tasks stay `pending` — they become ready once the spawned task completes
-4. Log: `"⏳ T{N} deferred — file conflict with T{M} on (unknown)"`
+**NEVER use `isolation: "worktree"`.** Deepflow manages a shared worktree so wave 2 sees wave 1 commits. **Spawn ALL ready tasks in ONE message** except file conflicts.
 
-
+**File conflicts (1 file = 1 writer):** Check `Files:` lists. Overlap → spawn lowest-numbered only; rest stay pending. Log: `"⏳ T{N} deferred — file conflict with T{M} on (unknown)"`
 
-**[OPTIMIZE] tasks
+**≥2 [SPIKE] tasks same problem →** Parallel Spike Probes (§5.7). **[OPTIMIZE] tasks →** Optimize Cycle (§5.9), one at a time.
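The 0.1.87 text read `start_percentage` with a grep pipeline over `.deepflow/context.json`; the same extraction against a sample file:

```shell
# Percentage extraction from a sample context.json, as in the removed 0.1.87 pipeline.
echo '{"percentage":45}' | grep -o '"percentage":[0-9]*' | grep -o '[0-9]*'
# → 45
```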
 
 ### 5.5. RATCHET CHECK
 
-
-
-**Auto-detect commands:**
+Run health checks in worktree after each agent completes.
 
 | File | Build | Test | Typecheck | Lint |
 |------|-------|------|-----------|------|
@@ -158,555 +110,323 @@ After each agent completes, run health checks in the worktree.
 | `Cargo.toml` | `cargo build` | `cargo test` | — | `cargo clippy` |
 | `go.mod` | `go build ./...` | `go test ./...` | — | `go vet ./...` |
 
-Run Build → Test → Typecheck → Lint (stop on first failure).
-
-**Edit scope validation** (if spec declares `edit_scope`): check `git diff HEAD~1 --name-only` against allowed globs. Violations → `git revert HEAD --no-edit`, report "Edit scope violation: {files}".
-
-**Impact completeness check** (if task has Impact block in PLAN.md):
-Compare `git diff HEAD~1 --name-only` against Impact callers/duplicates list.
-File listed but not modified → **advisory warning**: "Impact gap: {file} listed as {caller|duplicate} but not modified — verify manually". Not auto-revert (callers sometimes don't need changes), but flags the risk.
-
-**Metric gate (Optimize tasks only):**
-
-After ratchet passes, if the current task has an `Optimize:` block, run the metric gate:
-
-1. Run the `metric` shell command in the worktree: `cd ${WORKTREE_PATH} && eval "${metric_command}"`
-2. Parse output as float. Non-numeric output → cycle failure (revert, log "metric parse error: {raw output}")
-3. Compare against previous measurement using `direction`:
-   - `direction: higher` → new value must be > previous + (previous × min_improvement_threshold)
-   - `direction: lower` → new value must be < previous - (previous × min_improvement_threshold)
-4. Both ratchet AND metric improvement required → keep commit
-5. Ratchet passes but metric did not improve → revert (log "ratchet passed but metric stagnant/regressed: {old} → {new}")
-6. Run each `secondary_metrics` command, parse as float. If regression > `regression_threshold` (default 5%) compared to baseline: append WARNING to `.deepflow/auto-report.md`: `"WARNING: {name} regressed {delta}% ({baseline_val} → {new_val}) at cycle {N}"`. Do NOT auto-revert.
-
-**Output Truncation:**
-
-After ratchet checks complete, truncate command output for context efficiency:
-
-- **Success (all checks passed):** Suppress output entirely — do not include build/test/lint output in reports
-- **Build failure:** Include last 15 lines of build error only
-- **Test failure:** Include failed test name(s) + last 20 lines of test output
-- **Typecheck/lint failure:** Include error count + first 5 errors only
-
-**Token tracking — write result (on ratchet pass):**
-
-After all checks pass, compute and write the token block to `.deepflow/results/T{N}.yaml`:
+Run Build → Test → Typecheck → Lint (stop on first failure). Ratchet uses ONLY pre-existing tests from `.deepflow/auto-snapshot.txt`.
 
-
-
-```
+**Edit scope validation:** `git diff HEAD~1 --name-only` vs allowed globs. Violation → revert, report.
+**Impact completeness:** diff vs Impact callers/duplicates. Gap → advisory warning (no revert).
 
-Parse
-```bash
-awk -v start="REPLACE_start_timestamp" -v end="REPLACE_end_timestamp" '
-{
-  ts=""; inp=0; cre=0; rd=0
-  if (match($0, /"timestamp":"[^"]*"/)) { ts=substr($0, RSTART+13, RLENGTH-14) }
-  if (ts >= start && ts <= end) {
-    if (match($0, /"input_tokens":[0-9]+/)) inp=substr($0, RSTART+15, RLENGTH-15)
-    if (match($0, /"cache_creation_input_tokens":[0-9]+/)) cre=substr($0, RSTART+30, RLENGTH-30)
-    if (match($0, /"cache_read_input_tokens":[0-9]+/)) rd=substr($0, RSTART+26, RLENGTH-26)
-    si+=inp; sc+=cre; sr+=rd
-  }
-}
-END { printf "{\"input_tokens\":%d,\"cache_creation_input_tokens\":%d,\"cache_read_input_tokens\":%d}\n", si+0, sc+0, sr+0 }
-' .deepflow/token-history.jsonl 2>/dev/null || echo '{}'
-```
+**Metric gate (Optimize only):** Run `eval "${metric_command}"` with cwd=`${WORKTREE_PATH}` (never `cd && eval`). Parse float (non-numeric → revert). Compare using `direction`+`min_improvement_threshold`. Both ratchet AND metric must pass → keep. Ratchet pass + metric stagnant → revert. Secondary metrics: regression > `regression_threshold` (5%) → WARNING in auto-report.md (no revert).
 
-
-```
-!`cat .deepflow/results/T{N}.yaml 2>/dev/null || echo ''`
-```
+**Output truncation:** Success → suppress. Build fail → last 15 lines. Test fail → names + last 20 lines. Typecheck/lint → count + first 5 errors.
 
-
+**Token tracking result (on pass):** Read `end_percentage`. Sum token fields from `.deepflow/token-history.jsonl` between start/end timestamps (awk ISO 8601 compare). Write to `.deepflow/results/T{N}.yaml`:
 ```yaml
 tokens:
-  start_percentage: {
-  end_percentage: {
-  delta_percentage: {
-  input_tokens: {sum
-  cache_creation_input_tokens: {sum
-  cache_read_input_tokens: {sum
-```
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+  start_percentage: {val}
+  end_percentage: {val}
+  delta_percentage: {end - start}
+  input_tokens: {sum}
+  cache_creation_input_tokens: {sum}
+  cache_read_input_tokens: {sum}
+```
+Omit if context.json/token-history.jsonl/awk unavailable. Never fail ratchet for tracking errors.
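The timestamp-windowed token sum can be sketched with a sample `token-history.jsonl`; the awk offsets below assume the exact key spellings shown (a simplified variant of the removed 0.1.87 script):

```shell
# Sum token counters for entries whose ISO 8601 timestamp falls inside [start, end].
hist=$(mktemp)
cat > "$hist" <<'EOF'
{"timestamp":"2025-01-01T10:00:00Z","input_tokens":100,"cache_creation_input_tokens":10,"cache_read_input_tokens":5}
{"timestamp":"2025-01-01T11:00:00Z","input_tokens":200,"cache_creation_input_tokens":20,"cache_read_input_tokens":15}
{"timestamp":"2025-01-01T12:00:00Z","input_tokens":400,"cache_creation_input_tokens":40,"cache_read_input_tokens":35}
EOF
awk -v start="2025-01-01T10:30:00Z" -v end="2025-01-01T12:30:00Z" '
  match($0, /"timestamp":"[^"]*"/) {
    ts = substr($0, RSTART+13, RLENGTH-14)      # strip key and quotes
    if (ts >= start && ts <= end) {             # lexicographic == chronological for ISO 8601
      if (match($0, /"input_tokens":[0-9]+/))                si += substr($0, RSTART+15, RLENGTH-15)
      if (match($0, /"cache_creation_input_tokens":[0-9]+/)) sc += substr($0, RSTART+30, RLENGTH-30)
      if (match($0, /"cache_read_input_tokens":[0-9]+/))     sr += substr($0, RSTART+26, RLENGTH-26)
    }
  }
  END { printf "%d %d %d\n", si+0, sc+0, sr+0 }
' "$hist"
# → 600 60 50 (the first entry falls before the window)
```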
+
+**Evaluate:** All pass → commit stands. Failure → partial salvage:
+1. Lint/typecheck-only (build+tests passed): spawn `Agent(model="haiku")` to fix. Re-ratchet. Fail → revert both.
+2. Build/test failure → `git revert HEAD --no-edit` (no salvage).
+
+### 5.6. WAVE TEST AGENT
+
+<!-- AC-8: After wave ratchet passes, Opus test agent spawns and writes unit tests -->
+<!-- AC-9: Test failures trigger implementer re-spawn with failure feedback; max 3 attempts then revert -->
+<!-- AC-12: auto-snapshot.txt re-generated after wave test agent commits; wave N+1 ratchet includes wave N tests -->
+
+**Trigger:** After ratchet check passes (or after successful salvage) for a task.
+
+**Attempt tracking:** Initialize `attempt_count = 1` and `failure_feedback = ""` per task when first spawned. Max 3 total attempts (1 initial + 2 retries).
+
+**Flow:**
+1. Capture the implementation diff: `git -C ${WORKTREE_PATH} diff HEAD~1` → store as `IMPL_DIFF`.
+2. Spawn `Agent(model="opus")` with Wave Test prompt (§6). `run_in_background=true`. End turn, wait.
+3. On notification:
+   a. Run ratchet check (§5.5) — all new + pre-existing tests must pass.
+   b. **Tests pass** → commit stands. **Re-snapshot** immediately so wave N+1 ratchet includes wave N tests:
+      ```bash
+      git -C ${WORKTREE_PATH} ls-files | grep -E '\.(test|spec)\.[^/]+$|^test_|_test\.[^/]+$|^tests/|__tests__/' > .deepflow/auto-snapshot.txt
+      ```
+      Task complete. Report: `"✓ T{n}: ratchet+tests passed ({hash})"`.
+   c. **Tests fail** →
+      - If `attempt_count < 3`:
+        - `git revert HEAD --no-edit` (revert test commit)
+        - `git revert HEAD --no-edit` (revert implementation commit)
+        - Accumulate failure output: `failure_feedback += "Attempt {N}: {truncated_test_output}\n"`
+        - `attempt_count += 1`
+        - Re-spawn implementer agent with original prompt + failure feedback appendix:
+          ```
+          PREVIOUS FAILURES (attempt {N-1} of 3):
+          {failure_feedback}
+          Fix the issues above. Do NOT repeat the same mistakes.
+          ```
+        - On implementer notification: ratchet check (§5.5). Passed → goto step 1 (spawn test agent again). Failed → same retry logic.
+      - If `attempt_count >= 3`:
+        - Revert ALL commits back to pre-task state: `git -C ${WORKTREE_PATH} reset --hard {pre_task_commit}`
+        - `TaskUpdate(status: "pending")`
+        - Report: `"✗ T{n}: test agent failed after 3 attempts, reverted"`
+
+**Output truncation for failure feedback:** Test failures → test names + last 30 lines of output. Build failures → last 15 lines. Cap total `failure_feedback` at 200 lines.
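The retry path peels back the test commit and then the implementation commit; one way to sketch that in a scratch repository (file names are illustrative, and a range revert stands in for the two separate `git revert` calls):

```shell
# Scratch repo: implementation commit, then test commit; revert both, newest first.
cd "$(mktemp -d)"
git init -q .
git config user.email df@example.com && git config user.name df
echo base > app.txt && git add -A && git commit -qm base
echo impl >> app.txt && git commit -qam impl                    # implementation commit
echo spec > app.test.txt && git add -A && git commit -qm tests  # test-agent commit
git revert --no-edit HEAD~2..HEAD >/dev/null                    # peels tests, then impl
cat app.txt                      # → base
[ -e app.test.txt ] || echo 'test commit undone'
```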
 
 ### 5.7. PARALLEL SPIKE PROBES
 
-Trigger: ≥2 [SPIKE] tasks with same
-
-1. **Baseline:** Record `BASELINE=$(git rev-parse HEAD)` in shared worktree
-2. **Sub-worktrees:** Per spike: `git worktree add -b df/{spec}--probe-{SPIKE_ID} .deepflow/worktrees/{spec}/probe-{SPIKE_ID} ${BASELINE}`
-3. **Spawn:** All probes in ONE message, each targeting its probe worktree. End turn.
-4. **Ratchet:** Per notification, run standard ratchet (5.5) in probe worktree. Record: ratchet_passed, regressions, coverage_delta, files_changed, commit
-5. **Select winner** (after ALL complete, no LLM judge):
-   - Disqualify any with regressions
-   - **Standard spikes**: Rank: fewer regressions > higher coverage_delta > fewer files_changed > first to complete
-   - **Optimize probes**: Rank: best metric improvement (absolute delta toward target) > fewer regressions > fewer files_changed
-   - No passes → reset all to pending for retry with debugger
-6. **Preserve all worktrees.** Losers: rename branch + `-failed` suffix. Record in checkpoint.json under `"spike_probes"`
-7. **Log ALL probe outcomes** to `.deepflow/auto-memory.yaml` (main tree):
-   ```yaml
-   spike_insights:
-     - date: "YYYY-MM-DD"
-       spec: "{spec_name}"
-       spike_id: "SPIKE_A"
-       hypothesis: "{from PLAN.md}"
-       outcome: "winner"
-       approach: "{one-sentence summary of what the winning probe chose}"
-       ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
-       branch: "df/{spec}--probe-SPIKE_A"
-     - date: "YYYY-MM-DD"
-       spec: "{spec_name}"
-       spike_id: "SPIKE_B"
-       hypothesis: "{from PLAN.md}"
-       outcome: "failed" # or "passed-but-lost"
-       failure_reason: "{first failed check + error summary}"
-       ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
-       worktree: ".deepflow/worktrees/{spec}/probe-SPIKE_B-failed"
-       branch: "df/{spec}--probe-SPIKE_B-failed"
-   probe_learnings: # read by /df:auto-cycle each start AND included in per-task preamble
-     - spike: "SPIKE_A"
-       probe: "probe-SPIKE_A"
-       insight: "{one-sentence summary of winning approach — e.g. 'Use Node.js over Bun for Playwright'}"
-     - spike: "SPIKE_B"
-       probe: "probe-SPIKE_B"
-       insight: "{one-sentence summary from failure_reason}"
-   ```
-   Create file if missing. Preserve existing keys when merging. Log BOTH winners and losers — downstream tasks need to know what was chosen, not just what failed.
-8. **Promote winner:** Cherry-pick into shared worktree. Winner → `[x] [PROBE_WINNER]`, losers → `[~] [PROBE_FAILED]`. Resume standard loop.
-
-#### 5.7.1. PROBE DIVERSITY ENFORCEMENT (Optimize Probes)
+Trigger: ≥2 [SPIKE] tasks with same blocker or identical hypothesis.
 
-
+1. `BASELINE=$(git rev-parse HEAD)` in shared worktree
+2. Sub-worktrees per spike: `git worktree add -b df/{spec}--probe-{ID} .deepflow/worktrees/{spec}/probe-{ID} ${BASELINE}`
+3. Spawn all probes in ONE message. End turn.
+4. Per notification: ratchet (§5.5). Record: ratchet_passed, regressions, coverage_delta, files_changed, commit.
+5. **Winner selection** (no LLM judge): disqualify regressions. Standard: fewer regressions > coverage > fewer files > first complete. Optimize: best metric delta > fewer regressions > fewer files. No passes → reset pending for debugger.
+6. Preserve all worktrees. Losers: branch + `-failed`. Record in checkpoint.json.
+7. Log all outcomes to `.deepflow/auto-memory.yaml` under `spike_insights`+`probe_learnings` (schema in auto-cycle.md). Both winners and losers.
+8. Cherry-pick winner into shared worktree. Winner → `[x] [PROBE_WINNER]`, losers → `[~] [PROBE_FAILED]`.
 
-
-- **contextualizada**: Builds on the best approach so far — refines, extends, or combines what worked. Prompt includes: "Build on the best result so far: {best_approach_summary}. Refine or extend it."
-- **contraditoria**: Tries the opposite of the current best. Prompt includes: "The best approach so far is {best_approach_summary}. Try the OPPOSITE direction — if it cached, don't cache; if it optimized hot path, optimize cold path; etc."
-- **ingenua**: No prior context — naive fresh attempt. Prompt includes: "Ignore all prior attempts. Approach this from scratch with no assumptions about what works."
+#### 5.7.1. PROBE DIVERSITY (Optimize Probes)
 
-**
+Roles: **contextualizada** (refine best), **contraditoria** (opposite of best), **ingenua** (fresh, no context).
 
-|
-
+| Round | Count | Roles |
+|-------|-------|-------|
 | 1st plateau | 2 | 1 contraditoria + 1 ingenua |
 | 2nd plateau | 4 | 1 contextualizada + 2 contraditoria + 1 ingenua |
-| 3rd+
+| 3rd+ | 6 | 2 contextualizada + 2 contraditoria + 2 ingenua |
 
-
-- Every probe set MUST include ≥1 contraditoria and ≥1 ingenua (minimum diversity)
-- contextualizada only added from round 2+ (needs prior data to build on)
-- Each probe prompt includes its role label and role-specific instruction
-- Probe scale persists in `optimize_state.probe_scale` in `auto-memory.yaml`
+Every set: ≥1 contraditoria + ≥1 ingenua. contextualizada from round 2+ only. Scale persists in `optimize_state.probe_scale`.
 
 ### 5.9. OPTIMIZE CYCLE
 
-Trigger: task has `Optimize:` block
-
-**Optimize is a distinct execution mode** — one optimize task at a time, spanning N cycles until a stop condition.
-
-#### 5.9.1. INITIALIZATION
-
-1. Parse `Optimize:` block from PLAN.md task: `metric`, `target`, `direction`, `max_cycles`, `secondary_metrics`
-2. Load or initialize `optimize_state` from `.deepflow/auto-memory.yaml`:
-   ```yaml
-   optimize_state:
-     task_id: "T{n}"
-     metric_command: "{shell command}"
-     target: {number}
-     direction: "higher|lower"
-     baseline: null              # set on first measure
-     current_best: null          # best metric value seen
-     best_commit: null           # commit hash of best value
-     cycles_run: 0
-     cycles_without_improvement: 0
-     consecutive_reverts: 0
-     probe_scale: 0              # 0=no probes yet, 2/4/6
-     max_cycles: {number}
-     history: []                 # [{cycle, value, delta, kept, commit}]
-     failed_hypotheses: []       # ["{description}"]
-   ```
-3. **Measure baseline**: `cd ${WORKTREE_PATH} && eval "${metric_command}"` → parse float → store as `baseline` and `current_best`
-4. Measure each secondary metric → store as `secondary_baselines`
-5. Check if target already met (`direction: higher` → baseline >= target; `lower` → baseline <= target). If met → mark task `[x]`, log "target already met: {baseline}", done.
-
-#### 5.9.2. CYCLE LOOP
+Trigger: task has `Optimize:` block. One at a time, N cycles until stop condition.
 
-
+**Init:** Parse metric/target/direction/max_cycles/secondary_metrics. Load or init `optimize_state` in auto-memory.yaml (fields: task_id, metric_command, target, direction, baseline, current_best, best_commit, cycles_run, cycles_without_improvement, consecutive_reverts, probe_scale, max_cycles, history[], failed_hypotheses[]). Measure baseline (`eval` with cwd=worktree) → store as baseline+current_best. Measure secondaries. Target met → mark `[x]`, done.
 
+**Cycle loop:**
 ```
 REPEAT:
-1. Check stop conditions
-2. Spawn ONE optimize agent (
-3.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-  m. Check context %. If ≥50% → checkpoint and exit (auto-cycle resumes).
-```
+1. Check stop conditions → if triggered, exit
+2. Spawn ONE optimize agent (§6) run_in_background=true. STOP, end turn.
+3. On notification:
+   a. Ratchet fail → revert, ++consecutive_reverts, log hypothesis, goto 1
+   b. Metric parse error → revert, ++consecutive_reverts
+   c. improvement = (new - best) / |best| × 100 (flip for lower; absolute if best==0)
+   d. >= 1% threshold → KEEP, update best, reset counters
+   e. < threshold → REVERT, ++cycles_without_improvement
+   f. ++cycles_run, append history, check secondaries, persist state
+   g. Report: "⟳ T{n} cycle {N}: {old}→{new} ({delta}%) — {kept|reverted} [best: X, target: Y]"
+   h. Context ≥50% → checkpoint, exit
+```
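Step c's improvement formula, as a standalone awk sketch with sample values (`b` = current best, `n` = new measurement; direction: higher):

```shell
# best=120, new=126 → +5.0%, above the 1% keep threshold.
awk -v b=120 -v n=126 'BEGIN {
  ab  = (b < 0) ? -b : b                          # |best|
  imp = (ab == 0) ? (n - b) : (n - b) / ab * 100  # absolute delta when best == 0
  printf "%.1f\n", imp
}'
# → 5.0
```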
|
|
224
|
+
|
|
225
|
+
**Stop conditions:**
|
|
226
|
+
|
|
227
|
+
| Condition | Action |
|
|
228
|
+
|-----------|--------|
|
|
229
|
+
| Target reached | Mark `[x]` |
|
|
230
|
+
| cycles_run >= max_cycles | Mark `[x]`. If best < baseline → `git reset --hard {best_commit}` |
|
|
231
|
+
| 3 cycles without improvement | Launch probes (plateau) |
|
|
232
|
+
| 3 consecutive reverts | Halt, task `[ ]`, requires human intervention |
 
-
-
-| Condition | Detection | Action |
-|-----------|-----------|--------|
-| **Target reached** | `direction: higher` → value >= target; `lower` → value <= target | Mark task `[x]`, log "target reached: {value}" |
-| **Max cycles** | `cycles_run >= max_cycles` | Mark task `[x]` with note: "max cycles reached, best: {current_best}". If current_best worse than baseline → `git reset --hard {best_commit}`, log "reverted to best-known" |
-| **Plateau** | `cycles_without_improvement >= 3` | Pause normal cycle → launch probes (5.9.4) |
-| **Circuit breaker** | `consecutive_reverts >= 3` | Halt, task stays `[ ]`, log "circuit breaker: 3 consecutive reverts". Requires human intervention. |
-
-On **max cycles** with final value worse than baseline:
-1. `git reset --hard {best_commit}` in worktree
-2. Log: "final value {current} worse than baseline {baseline}, reverted to best-known commit {best_commit} (value: {current_best})"
-
-#### 5.9.4. PLATEAU → PROBE LAUNCH
-
-When plateau detected (3 cycles without ≥1% improvement):
-
-1. Pause normal optimize cycle
-2. Determine probe count from `probe_scale` (section 5.7.1 auto-scaling table): 0→2, 2→4, 4→6
-3. Update `probe_scale` in optimize_state
-4. Record `BASELINE=$(git rev-parse HEAD)` in shared worktree
-5. Create sub-worktrees per probe: `git worktree add -b df/{spec}--opt-probe-{N} .deepflow/worktrees/{spec}/opt-probe-{N} ${BASELINE}`
-6. Spawn ALL probes in ONE message using Optimize Probe prompt (section 6), each with its diversity role
-7. End turn. Wait for all notifications.
-8. Per notification: run ratchet + metric measurement in probe worktree
-9. Select winner (section 5.7 step 5, optimize ranking): best metric improvement toward target
-10. Winner → cherry-pick into shared worktree, update current_best, reset cycles_without_improvement=0
-11. Losers → rename branch with `-failed` suffix, preserve worktrees
-12. Log all probe outcomes to `auto-memory.yaml` under `spike_insights` (reuse existing format)
-13. Log probe learnings: winning approach summary + each loser's failure reason
-14. Resume normal optimize cycle from step 1
-
-#### 5.9.5. STATE PERSISTENCE (auto-memory.yaml)
-
-After every cycle, write `optimize_state` to `.deepflow/auto-memory.yaml` (main tree). This ensures:
-- Context exhaustion at 50% → auto-cycle resumes with full history
-- Failed hypotheses carry forward (agents won't repeat approaches)
-- Probe scale persists across context windows
-
-Also append cycle results to `.deepflow/auto-report.md`:
-```
-## Optimize: T{n} — {metric_name}
-| Cycle | Value | Delta | Kept | Commit |
-|-------|-------|-------|------|--------|
-| 1 | 72.3 | — | baseline | abc123 |
-| 2 | 74.1 | +2.5% | ✓ | def456 |
-| 3 | 73.8 | -0.4% | ✗ | (reverted) |
-...
-Best: {current_best} | Target: {target} | Status: {in_progress|reached|max_cycles|circuit_breaker}
-```
+**Plateau → probes:** Scale 0→2, 2→4, 4→6 per §5.7.1. Create sub-worktrees, spawn all with diversity roles (§6 Optimize Probe). Per notification: ratchet + metric. Winner → cherry-pick, update best, reset counters. Losers → `-failed`. Log outcomes. Resume cycle.
+
+**State persistence:** Write `optimize_state` to auto-memory.yaml after every cycle. Append results table to `.deepflow/auto-report.md`.
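The probe auto-scaling progression can be read as a simple step function. A sketch only, assuming the scale is capped at 6 (the 0→2, 2→4, 4→6 table does not state what happens beyond that):

```javascript
// Plateau probe auto-scaling per §5.7.1: 0→2, 2→4, 4→6.
// Capping at 6 is an assumption; the documented table stops at 4→6.
function nextProbeScale(probeScale) {
  return Math.min(probeScale + 2, 6);
}
```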
 
 ---
 
 ### 6. PER-TASK (agent prompt)
 
-
-> Critical instructions go at start and end. Navigable data goes in the middle.
-> See: Chroma "Context Rot" (2025) — performance degrades ~2%/100K tokens; distractors and semantic ambiguity compound degradation.
+**Common preamble (all):** `Working directory: {worktree_absolute_path}. All file ops use this path. Commit format: {type}({spec}): {desc}`
 
-**
+**Standard Task** (`Agent(model="{Model}", ...)`):
 ```
-
+--- START ---
+{task_id}: {description} Files: {files} Spec: {spec}
+{If reverted: DO NOT repeat: - Cycle {N}: "{reason}"}
+{If spike insights exist:
+spike_results:
+  hypothesis: {hypothesis from spike_insights}
+  outcome: {outcome}
+  edge_cases: {edge_cases}
+  insight: {insight from probe_learnings}
+}
+Success criteria: {ACs from spec relevant to this task}
+--- MIDDLE (omit for low effort; omit deps for medium) ---
+Impact: Callers: {file} ({why}) | Duplicates: [active→consolidate] [dead→DELETE] | Data flow: {consumers}
+Prior tasks: {dep_id}: {summary}
+Steps: 1. chub search/get for APIs 2. LSP findReferences, add unlisted callers 3. Read all Impact files 4. Implement 5. Commit
+--- END ---
+Duplicates: [active]→consolidate [dead]→DELETE. ONLY job: code+commit. No merge/rename/checkout.
+Last line of your response MUST be: TASK_STATUS:pass (if successful) or TASK_STATUS:fail (if failed) or TASK_STATUS:revert (if reverted)
 ```
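The `TASK_STATUS:` sentinel on the agent's final line is what the coordinator keys on instead of interpreting free-form agent output. A minimal parsing sketch (illustrative only; the actual coordinator logic is not part of this file):

```javascript
// Parse the TASK_STATUS sentinel from the last line of an agent's response.
// Returns "pass" | "fail" | "revert", or null when the sentinel is missing.
function parseTaskStatus(response) {
  const lines = response.trim().split("\n");
  const last = lines[lines.length - 1].trim();
  const m = last.match(/^TASK_STATUS:(pass|fail|revert)$/);
  return m ? m[1] : null;
}
```

Treating a missing sentinel as `null` (rather than guessing from prose) keeps the "No LLM evaluates LLM" rule intact: only the explicit status line counts.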
 
-**
-
-Prompt sections in order (START = high attention, MIDDLE = navigable data, END = high attention):
+**Bootstrap:** `BOOTSTRAP: Write tests for edit_scope files. Do NOT change implementation. Commit as test({spec}): bootstrap. Last line: TASK_STATUS:pass or TASK_STATUS:fail`
 
+**Wave Test** (`Agent(model="opus")`):
 ```
---- START (high attention zone) ---
-
-{
-
-{Prior failure context — include ONLY if task was previously reverted. Read from .deepflow/auto-memory.yaml revert_history for this task_id:}
-DO NOT repeat these approaches:
-- Cycle {N}: reverted — "{reason from revert_history}"
-{Omit this entire block if task has no revert history.}
-
-{Acceptance criteria excerpt — extract 2-3 key ACs from the spec file (specs/doing-*.md). Include only the criteria relevant to THIS task, not the full spec.}
-Success criteria:
-- {AC relevant to this task}
-- {AC relevant to this task}
-{Omit if spec has no structured ACs.}
-
---- MIDDLE (navigable data zone) ---
-
-{Impact block from PLAN.md — include verbatim if present. Annotate each caller with WHY it's impacted:}
-Impact:
-- Callers: {file} ({why — e.g. "imports validateToken which you're changing"})
-- Duplicates:
-  - {file} [active — consolidate]
-  - {file} [dead — DELETE]
-- Data flow: {consumers}
-{Omit if no Impact in PLAN.md.}
-
-{Dependency context — for each completed blocker task, include a one-liner summary:}
-Prior tasks:
-- {dep_task_id}: {one-line summary of what changed — e.g. "refactored validateToken to async, changed signature (string) → (string, opts)"}
-{Omit if task has no dependencies or all deps are bootstrap/spike tasks.}
-
-Steps:
-1. External APIs/SDKs → chub search "<library>" --json → chub get <id> --lang <lang> (skip if chub unavailable or internal code only)
-2. LSP freshness check: run `findReferences` on each function/type you're about to change. If callers exist beyond the Impact list, add them to your scope before implementing.
-3. Read ALL files in Impact (+ any new callers from step 2) before implementing — understand the full picture
-4. Implement the task, updating all impacted files
-5. Commit as feat({spec}): {description}
-
---- END (high attention zone) ---
-
-{If .deepflow/auto-memory.yaml exists and has probe_learnings, include:}
-Spike results (follow these approaches):
-{each probe_learning with outcome "winner" → "- {insight}"}
-{Omit this block if no probe_learnings exist.}
-
-If Impact lists duplicates: [active] → consolidate into single source of truth. [dead] → DELETE entirely.
-Your ONLY job is to write code and commit. Orchestrator runs health checks after.
-STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main.
-```
+--- START ---
+You are a QA engineer. Write unit tests for the following code changes.
+Use {test_framework}. Test behavioral correctness, not implementation details.
+Spec: {spec}. Task: {task_id}.
 
-
+Implementation diff:
+{IMPL_DIFF}
 
-
-Files: {edit_scope files} Spec: {spec_name}
-
-Write tests covering listed files. Do NOT change implementation files.
-Commit as test({spec}): bootstrap tests for edit_scope
-```
+--- MIDDLE ---
+Files changed: {changed_files}
+Existing test patterns: {test_file_examples from auto-snapshot.txt, first 3}
 
-
+--- END ---
+Write thorough unit tests covering: happy paths, edge cases, error handling.
+Follow existing test conventions in the codebase.
+Commit as: test({spec}): wave-{N} unit tests
+Do NOT modify implementation files. ONLY add/edit test files.
+Last line of your response MUST be: TASK_STATUS:pass or TASK_STATUS:fail
 ```
-{task_id} [SPIKE]: {hypothesis}
-Files: {target files} Spec: {spec_name}
 
-{
-DO NOT repeat these approaches:
-- Cycle {N}: reverted — "{reason}"
-{Omit this entire block if no revert history.}
+**Spike:** `{task_id} [SPIKE]: {hypothesis}. Files+Spec. {reverted warnings}. Minimal spike. Commit as spike({spec}): {desc}. Last line: TASK_STATUS:pass or TASK_STATUS:fail`
 
-
-Commit as spike({spec}): {description}
+**Optimize Task** (`Agent(model="opus")`):
 ```
-
+--- START ---
+{task_id} [OPTIMIZE]: {metric} — cycle {N}/{max}. Files+Spec.
+Current: {val} (baseline: {b}, best: {best}). Target: {t} ({dir}). Metric: {cmd}
+CONSTRAINT: ONE atomic change.
+--- MIDDLE ---
+Last 5 cycles + failed hypotheses + Impact/deps.
+--- END ---
+{Learnings}. ONE change + commit. No metric run, no multiple changes.
+Last line of your response MUST be: TASK_STATUS:pass or TASK_STATUS:fail or TASK_STATUS:revert
 ```
---- START (high attention zone) ---
-
-{task_id} [OPTIMIZE]: Improve {metric_name} — cycle {N}/{max_cycles}
-Files: {target files} Spec: {spec_name}
-
-Current metric: {current_value} (baseline: {baseline}, best: {current_best})
-Target: {target} ({direction})
-Improvement needed: {delta_to_target} ({direction})
-
-CONSTRAINT: Make exactly ONE atomic change. Do not refactor broadly.
-The metric is measured by: {metric_command}
-You succeed if the metric moves toward {target} after your change.
-
---- MIDDLE (navigable data zone) ---
-
-Attempt history (last 5 cycles):
-{For each recent history entry:}
-- Cycle {N}: {value} ({+/-delta}%) — {kept|reverted} — "{one-line description of what was tried}"
-{Omit if cycle 1.}
-
-DO NOT repeat these failed approaches:
-{For each failed_hypothesis in optimize_state:}
-- "{hypothesis description}"
-{Omit if no failed hypotheses.}
 
-
---- END (high attention zone) ---
-
-Your ONLY job is to make ONE atomic change and commit. Orchestrator measures the metric after.
-Do NOT run the metric command yourself. Do NOT make multiple changes.
-STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main.
+**Optimize Probe** (`Agent(model="opus")`):
+```
+--- START ---
+{task_id} [OPTIMIZE PROBE]: {metric} — probe {id} ({role})
+Current/Target. Role instruction:
+  contextualizada: "Build on best: {summary}. Refine."
+  contraditoria: "Best was: {summary}. Try OPPOSITE."
+  ingenua: "Ignore prior. Fresh approach."
+--- MIDDLE ---
+Full history + all failed hypotheses.
+--- END ---
+ONE atomic change. Commit. STOP.
+Last line of your response MUST be: TASK_STATUS:pass or TASK_STATUS:fail or TASK_STATUS:revert
 ```
 
-**
-
-Used during plateau resolution. Each probe has a diversity role.
-
+**Final Test** (`Agent(model="opus")`):
 ```
---- START
+--- START ---
+You are an independent QA engineer. You have ONLY the spec and exported interfaces below.
+You cannot read implementation files — you must treat the system as a black box.
+Write integration tests that verify EACH acceptance criterion from the spec.
 
-
+Spec:
+{SPEC_CONTENT}
 
-
+Exported interfaces:
+{EXPORTED_INTERFACES}
 
-
+--- END ---
+Write integration tests covering every AC in the spec.
+Test through public interfaces only — no internal imports, no implementation details.
+If an AC cannot be tested through exports alone, write a test stub with a TODO comment explaining why.
+Commit as: test({spec}): integration tests
+Do NOT read or modify implementation files. ONLY add/edit test files.
+Last line of your response MUST be: TASK_STATUS:pass or TASK_STATUS:fail
+```
 
-
+### 8. COMPLETE SPECS
 
-
-- Cycle {N}: {value} ({+/-delta}%) — {kept|reverted}
+<!-- AC-10: After all waves, Opus black-box test agent spawns with spec + exports only (no implementation) -->
+<!-- AC-11: Final integration tests must all pass before merge proceeds; failure blocks merge -->
 
-All
-{ALL failed_hypotheses}
-- "{hypothesis description}"
+All tasks done for `doing-*` spec:
 
-
+**8.1. Final Test Agent (black-box integration tests):**
 
-
-Commit as feat({spec}): optimize probe {probe_id} — {what you changed}
-STOP after committing.
-```
+Before merge, spawn an independent Opus QA agent that sees ONLY the spec and exported interfaces — never implementation source.
 
-
-2.
-
+1. Extract exported interfaces from the worktree (public API surface):
+   ```bash
+   # Collect exported symbols — adapt pattern to language
+   git -C ${WORKTREE_PATH} diff main --name-only | xargs grep -h '^\(export\|pub \|func \|def \)' 2>/dev/null | head -100
+   ```
+   Store result as `EXPORTED_INTERFACES`. Also load spec content: `cat specs/doing-{name}.md` → `SPEC_CONTENT`.
+
+2. Spawn `Agent(model="opus")` with Final Test prompt (§6). `run_in_background=true`. End turn, wait.
+
+3. On notification:
+   a. Run ratchet check (§5.5) — all integration tests must pass.
+   b. **Tests pass** → commit stands. Proceed to step 8.2 (merge).
+   c. **Tests fail** → **merge is blocked**. Do NOT retry. Report:
+      `"✗ Final integration tests failed for {spec} — merge blocked, requires human review"`
+      Leave worktree intact. Set all spec tasks back to `TaskUpdate(status: "pending")`.
+      Write failure details to `.deepflow/results/final-test-{spec}.yaml`:
+      ```yaml
+      spec: {spec}
+      status: blocked
+      reason: "Final integration tests failed"
+      output: |
+        {truncated test output — last 30 lines}
+      ```
+      STOP. Do not proceed to merge.
+
+**8.2. Merge and cleanup:**
+1. `skill: "df:verify", args: "doing-{name}"` — runs L0-L4 gates, merges, cleans worktree, renames doing→done, extracts decisions. Fail (fix tasks added) → stop; `--continue` picks them up.
+2. Remove spec's ENTIRE section from PLAN.md. Recalculate Summary table.
 
 ---
 
 ## Usage
 
 ```
-/df:execute #
-/df:execute T1 T2 # Specific tasks
-/df:execute --continue # Resume
+/df:execute # All ready tasks
+/df:execute T1 T2 # Specific tasks
+/df:execute --continue # Resume checkpoint
 /df:execute --fresh # Ignore checkpoint
 /df:execute --dry-run # Show plan only
 ```
 
 ## Skills & Agents
 
-
-- Skill: `browse-fetch` — Fetch live web pages and external API docs via browser before coding
-
-| Agent | subagent_type | Purpose |
-|-------|---------------|---------|
-| Implementation | `general-purpose` | Task implementation |
-| Debugger | `reasoner` | Debugging failures |
+Skills: `atomic-commits`, `browse-fetch`. Agents: Implementation (`general-purpose`), Debugger (`reasoner`).
 
-**Model
+**Model+effort routing** (read from PLAN.md, defaults: sonnet/medium):
 
-
-| (missing) | `Agent(model="sonnet", ...)` | `Be direct and efficient. Explain only when the logic is non-obvious.` |
+| Fields | Agent | Preamble |
+|--------|-------|----------|
+| haiku/low | `Agent(model="haiku")` | `Maximally efficient: skip explanations, minimize tool calls, straight to implementation.` |
+| sonnet/medium | `Agent(model="sonnet")` | `Direct and efficient. Explain only non-obvious logic.` |
+| opus/high | `Agent(model="opus")` | _(none)_ |
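The routing table reads as a lookup keyed on the task's model and effort fields. A sketch under stated assumptions (the field names `model` and `effort` are assumed from PLAN.md conventions, not an API this package exposes):

```javascript
// Map a task's model/effort fields from PLAN.md to an Agent model and
// prompt preamble. Missing fields default to sonnet/medium per the table.
const PREAMBLES = {
  low: "Maximally efficient: skip explanations, minimize tool calls, straight to implementation.",
  medium: "Direct and efficient. Explain only non-obvious logic.",
  high: "", // opus/high gets no preamble: full reasoning
};

function route(task) {
  const model = task.model ?? "sonnet";
  const effort = task.effort ?? "medium";
  return { model, preamble: PREAMBLES[effort] ?? PREAMBLES.medium };
}
```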
 
-**
-- `low` → Prepend efficiency instruction. Agent should make fewest possible tool calls.
-- `medium` → Prepend balanced instruction. Agent skips preamble but explains non-obvious decisions.
-- `high` → No preamble added. Agent uses full reasoning capabilities.
-
-**Checkpoint schema:** `.deepflow/checkpoint.json` in worktree:
-```json
-{"completed_tasks": ["T1","T2"], "current_wave": 2, "worktree_path": ".deepflow/worktrees/upload", "worktree_branch": "df/upload"}
-```
-
----
+**Checkpoint:** `.deepflow/checkpoint.json`: `{"completed_tasks":["T1"],"current_wave":2,"worktree_path":"...","worktree_branch":"df/..."}`
 
 ## Failure Handling
 
-
-- `TaskUpdate(taskId: native_id, status: "pending")` — dependents remain blocked
-- Repeated failure → spawn `Task(subagent_type="reasoner", prompt="Debug failure: {ratchet output}")`
-- Leave worktree intact, keep checkpoint.json
-- Output: worktree path/branch, `cd {path}` to investigate, `--continue` to resume, `--fresh` to discard
+Reverted task: `TaskUpdate(status: "pending")`, dependents stay blocked. Repeated failure → spawn reasoner debugger. Leave worktree+checkpoint intact. Output: path, `cd` command, `--continue`/`--fresh` options.
 
 ## Rules
 
 | Rule | Detail |
 |------|--------|
-| Zero
+| Zero tests → bootstrap first | Sole task when snapshot empty |
 | 1 task = 1 agent = 1 commit | `atomic-commits` skill |
-| 1 file = 1 writer | Sequential
-| Agent
-| No LLM evaluates LLM
-| ≥2 spikes
-
-| Machine-selected winner | Regressions > coverage > files
+| 1 file = 1 writer | Sequential on conflict |
+| Agent codes, orchestrator measures | Ratchet judges |
+| No LLM evaluates LLM | Health checks only |
+| ≥2 spikes → parallel probes | Never sequential |
+| Probe worktrees preserved | Losers `-failed`, never deleted |
+| Machine-selected winner | Regressions > coverage > files; no LLM judge |
 | External APIs → chub first | Skip if unavailable |
-| 1 optimize
-| Optimize = atomic
-| Ratchet + metric
-| Plateau → probes
-| Circuit breaker = 3
-
-```
-/df:execute (context: 12%)
-
-Loading PLAN.md... T1 ready, T2/T3 blocked by T1
-Ratchet snapshot: 24 pre-existing test files
-
-Wave 1: TaskUpdate(T1, in_progress)
-[Agent "T1" completed]
-Running ratchet: build ✓ | tests ✓ (24 passed) | typecheck ✓
-✓ T1: ratchet passed (abc1234)
-TaskUpdate(T1, completed) → auto-unblocks T2, T3
-
-Wave 2: TaskUpdate(T2/T3, in_progress)
-[Agent "T2" completed] ✓ T2: ratchet passed (def5678)
-[Agent "T3" completed] ✓ T3: ratchet passed (ghi9012)
-
-Context: 35% — All tasks done for doing-upload.
-Running /df:verify doing-upload...
-✓ L0 | ✓ L1 (3/3 files) | ⚠ L2 (no coverage tool) | ✓ L4 (24 tests)
-✓ Merged df/upload to main
-✓ Spec complete: doing-upload → done-upload
-Complete: 3/3
-```
+| 1 optimize at a time | Sequential |
+| Optimize = atomic only | One change per cycle |
+| Ratchet + metric both required | Keep only if both pass |
+| Plateau → probes | 3 cycles <1% triggers probes |
+| Circuit breaker = 3 reverts | Halts, needs human |
+| Wave test after ratchet | Opus writes tests; 3 attempts then revert |
+| Final test before merge | Opus black-box integration tests; failure blocks merge, no retry |
+| Probe diversity | ≥1 contraditoria + ≥1 ingenua |