deepflow 0.1.79 → 0.1.81
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +14 -3
- package/bin/install.js +3 -2
- package/package.json +4 -1
- package/src/commands/df/auto-cycle.md +17 -2
- package/src/commands/df/execute.md +39 -9
- package/src/commands/df/plan.md +49 -0
- package/src/commands/df/verify.md +433 -3
- package/src/skills/browse-fetch/SKILL.md +416 -0
- package/src/skills/browse-verify/SKILL.md +264 -0
- package/templates/config-template.yaml +14 -0
- package/src/skills/context-hub/SKILL.md +0 -87
package/README.md
CHANGED

@@ -33,6 +33,7 @@ Most spec-driven frameworks start from a finished spec and execute a static plan
 - **Spec as living hypothesis** — Core intent stays fixed, details refine through implementation. "The spec becomes bulletproof because you built it, not before."
 - **Parallel probes reveal the best path** — Uncertain approaches spawn parallel spikes in isolated worktrees. The machine selects the winner (fewer regressions > better coverage > fewer files changed). Failed approaches stay recorded and never repeat.
 - **Metrics decide, not opinions** — No LLM judges another LLM. Build, tests, typecheck, lint, and invariant checks are the only judges. After an agent commits, the orchestrator runs health checks. Pass = keep. Fail = revert + new hypothesis.
+- **Browser verification closes the loop** — L5 launches headless Chromium via Playwright, captures the accessibility tree, and evaluates structured assertions extracted at plan-time from your spec's acceptance criteria. Deterministic pass/fail — no LLM calls during verification. Screenshots saved as evidence.
 - **The loop is the product** — Not "execute a plan" — "evolve the codebase toward the spec's goals through iterative cycles." Each cycle reveals what the previous one couldn't see.

 ## What We Learned by Doing
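The L5 flow this hunk describes (headless Chromium, accessibility tree, deterministic assertions, screenshot evidence) can be sketched roughly as below. This is a hypothetical illustration, not deepflow's actual `browse-verify` code; the assertion shape (`{ text }` substrings matched against Playwright's `ariaSnapshot()` output) is an assumption.

```javascript
// Hypothetical sketch of the L5 check described above — not deepflow's actual code.
// Assertions are assumed to be { text: "..." } snippets that must appear in the aria snapshot.
function evaluateAssertions(ariaSnapshot, assertions) {
  // Deterministic pass/fail: plain substring checks, no LLM involved.
  return assertions.map(a => ({ ...a, pass: ariaSnapshot.includes(a.text) }));
}

async function browserVerify(url, assertions) {
  const { chromium } = require('playwright'); // lazy require: only needed at run time
  const browser = await chromium.launch();    // headless by default
  const page = await browser.newPage();
  await page.goto(url);
  const tree = await page.locator('body').ariaSnapshot(); // accessibility tree as YAML text
  await page.screenshot({ path: 'l5-evidence.png' });     // screenshot kept as evidence
  await browser.close();
  return evaluateAssertions(tree, assertions);
}
```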
@@ -111,7 +112,7 @@ $ git log --oneline
 1. Runs `/df:plan` if no PLAN.md exists
 2. Snapshots pre-existing tests (ratchet baseline)
 3. Starts a loop (`/loop 1m /df:auto-cycle`) — fresh context each cycle
-4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint/invariant-check)
+4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint/invariant-check/browser-verify)
 5. Pass = commit stands. Fail = revert + retry next cycle
 6. Circuit breaker: halts after N consecutive reverts on same task
 7. When all tasks done: runs `/df:verify`, merges to main
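The "ratchet baseline" in step 2 above pairs with the health checks in step 4: only tests that existed before the loop started can force a revert. A minimal sketch of that rule, with names assumed (the snapshot file is `.deepflow/auto-snapshot.txt`, but the data shapes here are hypothetical):

```javascript
// Sketch of the ratchet-baseline idea — hypothetical, not deepflow's actual code.
// Only failures in tests that were snapshotted before the loop count as regressions;
// a brand-new failing test added mid-cycle does not trip the ratchet.
function ratchetRegressions(baseline, currentResults) {
  // baseline: Set of test ids captured in step 2
  // currentResults: [{ id, passed }] from the current cycle's test run
  return currentResults
    .filter(t => baseline.has(t.id) && !t.passed)
    .map(t => t.id);
}
```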
@@ -142,7 +143,7 @@ $ git log --oneline
 | `/df:spec <name>` | Generate spec from conversation |
 | `/df:plan` | Compare specs to code, create tasks |
 | `/df:execute` | Run tasks with parallel agents |
-| `/df:verify` | Check specs satisfied, merge to main |
+| `/df:verify` | Check specs satisfied (L0-L5), merge to main |
 | `/df:note` | Capture decisions ad-hoc from conversation |
 | `/df:consolidate` | Deduplicate and clean up decisions.md |
 | `/df:resume` | Session continuity briefing |
@@ -179,12 +180,22 @@ your-project/

 1. **Discover before specifying, spike before implementing** — Ask, debate, probe — then commit
 2. **You define WHAT, AI figures out HOW** — Specs are the contract
-3. **Metrics decide, not opinions** — Build/test/typecheck/lint/invariant-check are the only judges
+3. **Metrics decide, not opinions** — Build/test/typecheck/lint/invariant-check/browser-verify are the only judges
 4. **Confirm before assume** — Search the code before marking "missing"
 5. **Complete implementations** — No stubs, no placeholders
 6. **Atomic commits** — One task = one commit
 7. **Context-aware** — Checkpoint before limits, resume seamlessly

+## Skills
+
+| Skill | Purpose |
+|-------|---------|
+| `browse-fetch` | Fetch external API docs via headless Chromium (replaces context-hub) |
+| `browse-verify` | L5 browser verification — Playwright a11y tree assertions |
+| `atomic-commits` | One logical change per commit |
+| `code-completeness` | Find TODOs, stubs, and missing implementations |
+| `gap-discovery` | Surface missing requirements during ideation |
+
 ## More

 - [Concepts](docs/concepts.md) — Philosophy and flow in depth
package/bin/install.js
CHANGED

@@ -184,7 +184,7 @@ async function main() {
   console.log('');
   console.log(`Installed to ${c.cyan}${CLAUDE_DIR}${c.reset}:`);
   console.log('  commands/df/ — /df:discover, /df:debate, /df:spec, /df:plan, /df:execute, /df:verify, /df:auto, /df:note, /df:resume, /df:update');
-  console.log('  skills/ — gap-discovery, atomic-commits, code-completeness,
+  console.log('  skills/ — gap-discovery, atomic-commits, code-completeness, browse-fetch, browse-verify');
   console.log('  agents/ — reasoner (/df:auto — autonomous execution via /loop)');
   if (level === 'global') {
     console.log('  hooks/ — statusline, update checker, invariant checker');
@@ -469,7 +469,8 @@ async function uninstall() {
   'skills/atomic-commits',
   'skills/code-completeness',
   'skills/gap-discovery',
-  'skills/
+  'skills/browse-fetch',
+  'skills/browse-verify',
   'agents/reasoner.md'
 ];

package/package.json
CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "deepflow",
-  "version": "0.1.
+  "version": "0.1.81",
   "description": "Doing reveals what thinking can't predict — spec-driven iterative development for Claude Code",
   "keywords": [
     "claude",
@@ -39,5 +39,8 @@
   ],
   "engines": {
     "node": ">=16.0.0"
+  },
+  "dependencies": {
+    "playwright": "^1.58.2"
   }
 }
package/src/commands/df/auto-cycle.md
CHANGED

@@ -111,7 +111,22 @@ Read the current file first (create if missing), merge the new values, and write

 After `/df:execute` returns, check whether the task was reverted (ratchet failed):

-**
+**What counts as a failure (increments counter):**
+
+```
+- L0 ✗ (build failed)
+- L1 ✗ (files missing)
+- L2 ✗ (coverage dropped)
+- L4 ✗ (tests failed)
+- L5 ✗ (browser assertions failed — both attempts)
+- L5 ✗ (flaky) (browser assertions failed on both attempts, different assertions)
+
+What does NOT count as a failure:
+- L5 — (no frontend): skipped, not a revert trigger
+- L5 ⚠ (passed on retry): treated as pass, resets counter
+```
+
+**On revert (ratchet failed — any of L0 ✗, L1 ✗, L2 ✗, L4 ✗, L5 ✗, or L5 ✗ flaky):**

 ```
 1. Read .deepflow/auto-memory.yaml (create if missing)
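The failure classification this hunk adds can be sketched as a small predicate. A minimal sketch under assumed names (the `level`/`status` shape is hypothetical; only the pass/fail rules come from the table above):

```javascript
// Hypothetical sketch of the failure-classification table above.
// check: { level: 'L0'|'L1'|'L2'|'L4'|'L5',
//          status: 'pass'|'fail'|'flaky'|'skipped'|'pass-on-retry' }
function isRevertTrigger(check) {
  if (check.level === 'L5') {
    // "L5 — (no frontend)" and "L5 ⚠ (passed on retry)" are NOT failures
    if (check.status === 'skipped' || check.status === 'pass-on-retry') return false;
    return check.status === 'fail' || check.status === 'flaky';
  }
  return check.status === 'fail'; // L0/L1/L2/L4: any ✗ increments the counter
}
```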
@@ -126,7 +141,7 @@ After `/df:execute` returns, check whether the task was reverted (ratchet failed
    → Continue to step 4 (UPDATE REPORT) as normal
 ```

-**On success (ratchet passed):**
+**On success (ratchet passed — including L5 — no frontend or L5 ⚠ pass-on-retry):**

 ```
 1. Reset consecutive_reverts[task_id] to 0 in .deepflow/auto-memory.yaml
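The two branches above (increment on revert, reset on success, halt via circuit breaker) amount to simple counter bookkeeping against `consecutive_reverts` in `.deepflow/auto-memory.yaml`. A minimal sketch; the threshold constant and return shape are assumptions, not deepflow's actual config:

```javascript
// Hypothetical sketch of the revert-counter bookkeeping described above.
const MAX_REVERTS = 3; // assumption — the real threshold lives in deepflow's config

function updateRevertCounter(memory, taskId, ratchetPassed) {
  memory.consecutive_reverts = memory.consecutive_reverts || {};
  if (ratchetPassed) {
    memory.consecutive_reverts[taskId] = 0; // success resets the counter
    return { halt: false };
  }
  const n = (memory.consecutive_reverts[taskId] || 0) + 1;
  memory.consecutive_reverts[taskId] = n;
  return { halt: n >= MAX_REVERTS };        // circuit breaker after N consecutive reverts
}
```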
package/src/commands/df/execute.md
CHANGED

@@ -104,7 +104,14 @@ Before spawning: `TaskUpdate(taskId: native_id, status: "in_progress")` — acti

 **NEVER use `isolation: "worktree"` on Task calls.** Deepflow manages a shared worktree so wave 2 sees wave 1 commits.

-**Spawn ALL ready tasks in ONE message
+**Spawn ALL ready tasks in ONE message** — EXCEPT file conflicts (see below).
+
+**File conflict enforcement (1 file = 1 writer):**
+Before spawning, check `Files:` lists of all ready tasks. If two+ ready tasks share a file:
+1. Sort conflicting tasks by task number (T1 < T2 < T3)
+2. Spawn only the lowest-numbered task from each conflict group
+3. Remaining tasks stay `pending` — they become ready once the spawned task completes
+4. Log: `"⏳ T{N} deferred — file conflict with T{M} on (unknown)"`

 **≥2 [SPIKE] tasks for same problem:** Follow Parallel Spike Probes (section 5.7).
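The "1 file = 1 writer" rule this hunk adds can be sketched as a spawn-time filter. The task shape (`id`, `num`, `files`) and return shape are assumptions for illustration:

```javascript
// Hypothetical sketch of the file-conflict enforcement above.
// readyTasks: [{ id: 'T3', num: 3, files: ['config.go'] }, ...]
function filterSpawnable(readyTasks) {
  const spawn = [];
  const deferred = [];
  const claimed = new Map(); // file → task number that claimed it this wave
  for (const t of [...readyTasks].sort((a, b) => a.num - b.num)) {
    const conflict = t.files.find(f => claimed.has(f));
    if (conflict) {
      // stays pending; becomes ready once the claiming task completes
      deferred.push({ task: t.id, blockedOn: `T${claimed.get(conflict)}`, file: conflict });
    } else {
      t.files.forEach(f => claimed.set(f, t.num)); // lowest number claims first
      spawn.push(t.id);
    }
  }
  return { spawn, deferred };
}
```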
@@ -138,7 +145,7 @@ Ratchet uses ONLY pre-existing test files from `.deepflow/auto-snapshot.txt`.
 Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothesis.

 1. **Baseline:** Record `BASELINE=$(git rev-parse HEAD)` in shared worktree
-2. **Sub-worktrees:** Per spike: `git worktree add -b df/{spec}
+2. **Sub-worktrees:** Per spike: `git worktree add -b df/{spec}--probe-{SPIKE_ID} .deepflow/worktrees/{spec}/probe-{SPIKE_ID} ${BASELINE}`
 3. **Spawn:** All probes in ONE message, each targeting its probe worktree. End turn.
 4. **Ratchet:** Per notification, run standard ratchet (5.5) in probe worktree. Record: ratchet_passed, regressions, coverage_delta, files_changed, commit
 5. **Select winner** (after ALL complete, no LLM judge):
@@ -146,9 +153,17 @@ Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothes
    - Rank: fewer regressions > higher coverage_delta > fewer files_changed > first to complete
    - No passes → reset all to pending for retry with debugger
 6. **Preserve all worktrees.** Losers: rename branch + `-failed` suffix. Record in checkpoint.json under `"spike_probes"`
-7. **Log
+7. **Log ALL probe outcomes** to `.deepflow/auto-memory.yaml` (main tree):
    ```yaml
    spike_insights:
+     - date: "YYYY-MM-DD"
+       spec: "{spec_name}"
+       spike_id: "SPIKE_A"
+       hypothesis: "{from PLAN.md}"
+       outcome: "winner"
+       approach: "{one-sentence summary of what the winning probe chose}"
+       ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
+       branch: "df/{spec}--probe-SPIKE_A"
      - date: "YYYY-MM-DD"
        spec: "{spec_name}"
        spike_id: "SPIKE_B"
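The metric-only winner selection in step 5 above is a plain comparator over the ratchet records. A minimal sketch, assuming the field names from step 4 (`ratchet_passed`, `regressions`, `coverage_delta`, `files_changed`) plus a hypothetical `order` for "first to complete":

```javascript
// Hypothetical sketch of "Select winner" — rank by metrics, never by an LLM judge.
function pickWinner(probes) {
  const passed = probes.filter(p => p.ratchet_passed);
  if (passed.length === 0) return null; // no passes → reset all to pending
  return passed.sort((a, b) =>
    a.regressions - b.regressions ||       // fewer regressions wins
    b.coverage_delta - a.coverage_delta || // then higher coverage delta
    a.files_changed - b.files_changed ||   // then fewer files changed
    a.order - b.order                      // then first to complete
  )[0];
}
```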
@@ -157,13 +172,16 @@ Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothes
        failure_reason: "{first failed check + error summary}"
        ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
        worktree: ".deepflow/worktrees/{spec}/probe-SPIKE_B-failed"
-       branch: "df/{spec}
-   probe_learnings: # read by /df:auto-cycle each start
+       branch: "df/{spec}--probe-SPIKE_B-failed"
+   probe_learnings: # read by /df:auto-cycle each start AND included in per-task preamble
+     - spike: "SPIKE_A"
+       probe: "probe-SPIKE_A"
+       insight: "{one-sentence summary of winning approach — e.g. 'Use Node.js over Bun for Playwright'}"
      - spike: "SPIKE_B"
        probe: "probe-SPIKE_B"
        insight: "{one-sentence summary from failure_reason}"
    ```
-   Create file if missing. Preserve existing keys when merging.
+   Create file if missing. Preserve existing keys when merging. Log BOTH winners and losers — downstream tasks need to know what was chosen, not just what failed.
 8. **Promote winner:** Cherry-pick into shared worktree. Winner → `[x] [PROBE_WINNER]`, losers → `[~] [PROBE_FAILED]`. Resume standard loop.

 ---
@@ -176,10 +194,15 @@ Working directory: {worktree_absolute_path}
 All file operations MUST use this absolute path as base. Do NOT write files to the main project directory.
 Commit format: {commit_type}({spec}): {description}

+{If .deepflow/auto-memory.yaml exists and has probe_learnings, include:}
+Spike results (follow these approaches):
+{each probe_learning with outcome "winner" → "- {insight}"}
+{Omit this block if no probe_learnings exist.}
+
 STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main.
 ```

-**Standard Task
+**Standard Task** (spawn with `Agent(model="{Model from PLAN.md}", ...)`):
 ```
 {task_id}: {description from PLAN.md}
 Files: {target files} Spec: {spec_name}
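The conditional "Spike results" preamble block this hunk adds is a small template step: emit the winners' insights, or nothing at all. A hypothetical sketch; the `outcome` field on each learning entry and the memory shape are assumptions for illustration:

```javascript
// Hypothetical sketch of the conditional preamble block described above.
function spikeResultsBlock(memory) {
  const winners = (memory.probe_learnings || []).filter(l => l.outcome === 'winner');
  if (winners.length === 0) return ''; // omit the block if no probe_learnings exist
  return [
    'Spike results (follow these approaches):',
    ...winners.map(l => `- ${l.insight}`),
  ].join('\n');
}
```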
@@ -252,14 +275,21 @@ When all tasks done for a `doing-*` spec:
 ## Skills & Agents

 - Skill: `atomic-commits` — Clean commit protocol
-- Skill: `
+- Skill: `browse-fetch` — Fetch live web pages and external API docs via browser before coding

 | Agent | subagent_type | Purpose |
 |-------|---------------|---------|
 | Implementation | `general-purpose` | Task implementation |
 | Debugger | `reasoner` | Debugging failures |

-**Model routing:**
+**Model routing:** Read `Model:` field from each task block in PLAN.md. Pass as `model:` parameter when spawning the agent. Default: `sonnet` if field is missing.
+
+| Task field | Agent call |
+|------------|-----------|
+| `Model: haiku` | `Agent(model="haiku", ...)` |
+| `Model: sonnet` | `Agent(model="sonnet", ...)` |
+| `Model: opus` | `Agent(model="opus", ...)` |
+| (missing) | `Agent(model="sonnet", ...)` |

 **Checkpoint schema:** `.deepflow/checkpoint.json` in worktree:
 ```json
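The routing table this hunk adds reduces to one lookup: read the task's `Model:` field from its PLAN.md block, fall back to `sonnet`. A minimal sketch; the exact PLAN.md line format is assumed from the examples later in this diff:

```javascript
// Hypothetical sketch of the model-routing rule above.
// taskBlock: the raw markdown of one task entry from PLAN.md.
function routeModel(taskBlock) {
  const m = /^\s*-\s*Model:\s*(haiku|sonnet|opus)\s*$/m.exec(taskBlock);
  return m ? m[1] : 'sonnet'; // missing or unknown field → default sonnet
}
```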
package/src/commands/df/plan.md
CHANGED

@@ -99,12 +99,59 @@ For each file in a task's "Files:" list, find the full blast radius.
 Files outside original "Files:" → add with `(impact — verify/update)`.
 Skip for spike tasks.

+### 4.6. CROSS-TASK FILE CONFLICT DETECTION
+
+After all tasks have their `Files:` lists, detect overlaps that require sequential execution.
+
+**Algorithm:**
+1. Build a map: `file → [task IDs that list it]`
+2. For each file with >1 task: add `Blocked by` edge from later task → earlier task (by task number)
+3. If a dependency already exists (direct or transitive), skip (no redundant edges)
+
+**Example:**
+```
+T1: Files: config.go, feature.go — Blocked by: none
+T3: Files: config.go — Blocked by: none
+T5: Files: config.go — Blocked by: none
+```
+After conflict detection:
+```
+T1: Blocked by: none
+T3: Blocked by: T1 (file conflict: config.go)
+T5: Blocked by: T3 (file conflict: config.go)
+```
+
+**Rules:**
+- Only add the minimum edges needed (chain, not full mesh — T5 blocks on T3, not T1+T3)
+- Append `(file conflict: (unknown))` to the Blocked by reason for traceability
+- If a logical dependency already covers the ordering, don't add a redundant conflict edge
+- Cross-spec conflicts: tasks from different specs sharing files get the same treatment
+
 ### 5. COMPARE & PRIORITIZE

 Spawn `Task(subagent_type="reasoner", model="opus")`. Map each requirement to DONE / PARTIAL / MISSING / CONFLICT. Check REQ-AC alignment. Flag spec gaps.

 Priority: Dependencies → Impact → Risk

+### 5.5. CLASSIFY MODEL PER TASK
+
+For each task, assign `Model:` based on complexity signals:
+
+| Model | When | Signals |
+|-------|------|---------|
+| `haiku` | Mechanical / low-risk | Single file, config changes, renames, formatting, browse-fetch, simple additions with clear pattern to follow |
+| `sonnet` | Standard implementation | Feature work, bug fixes, refactoring, multi-file changes with clear specs |
+| `opus` | High complexity | Architecture changes, complex multi-file refactors, ambiguous specs, unfamiliar APIs, >5 files in Impact |
+
+**Decision inputs:**
+1. **File count** — 1 file → likely haiku/sonnet, >5 files → sonnet/opus
+2. **Impact blast radius** — many callers/duplicates → raise complexity
+3. **Spec clarity** — clear ACs with patterns → lower, ambiguous requirements → raise
+4. **Type** — spikes always `sonnet` (need reasoning but scoped), bootstrap → `haiku`
+5. **Has prior failures** — reverted tasks → raise one level (min `sonnet`)
+
+Add `Model: haiku|sonnet|opus` to each task block. Default: `sonnet` if unclear.
+
 ### 6. GENERATE SPIKE TASKS (IF NEEDED)

 **Spike Task Format:**
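Section 4.6's chaining rule (minimum edges, later task blocks on the nearest earlier writer) can be sketched as a single pass over tasks in number order. The task shape is an assumption, and the transitive-dependency check from step 3 is omitted for brevity:

```javascript
// Hypothetical sketch of cross-task file conflict detection (section 4.6).
// tasks: [{ id, num, files: [...], blockedBy: [...] }], mutated in place.
function addConflictEdges(tasks) {
  const byFile = new Map(); // file → id of the latest task (by number) that writes it
  for (const t of [...tasks].sort((a, b) => a.num - b.num)) {
    for (const f of t.files) {
      const prev = byFile.get(f);
      const already = prev && t.blockedBy.some(e => e.startsWith(prev));
      if (prev && !already) {
        t.blockedBy.push(`${prev} (file conflict: ${f})`); // traceable reason
      }
      byFile.set(f, t.id); // chain, not mesh: only the nearest earlier writer blocks
    }
  }
  return tasks;
}
```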
@@ -200,6 +247,7 @@ Always use `Task` tool with explicit `subagent_type` and `model`.

 - [ ] **T2**: Create upload endpoint
   - Files: src/api/upload.ts
+  - Model: sonnet
   - Impact:
     - Callers: src/routes/index.ts:5
     - Duplicates: backend/legacy-upload.go [dead — DELETE]
@@ -207,5 +255,6 @@ Always use `Task` tool with explicit `subagent_type` and `model`.

 - [ ] **T3**: Add S3 service with streaming
   - Files: src/services/storage.ts
+  - Model: opus
   - Blocked by: T1, T2
 ```