deepflow 0.1.79 → 0.1.81

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -33,6 +33,7 @@ Most spec-driven frameworks start from a finished spec and execute a static plan
  - **Spec as living hypothesis** — Core intent stays fixed, details refine through implementation. "The spec becomes bulletproof because you built it, not before."
  - **Parallel probes reveal the best path** — Uncertain approaches spawn parallel spikes in isolated worktrees. The machine selects the winner (fewer regressions > better coverage > fewer files changed). Failed approaches stay recorded and never repeat.
  - **Metrics decide, not opinions** — No LLM judges another LLM. Build, tests, typecheck, lint, and invariant checks are the only judges. After an agent commits, the orchestrator runs health checks. Pass = keep. Fail = revert + new hypothesis.
+ - **Browser verification closes the loop** — L5 launches headless Chromium via Playwright, captures the accessibility tree, and evaluates structured assertions extracted at plan-time from your spec's acceptance criteria. Deterministic pass/fail — no LLM calls during verification. Screenshots saved as evidence.
  - **The loop is the product** — Not "execute a plan" — "evolve the codebase toward the spec's goals through iterative cycles." Each cycle reveals what the previous one couldn't see.
 
  ## What We Learned by Doing
@@ -111,7 +112,7 @@ $ git log --oneline
  1. Runs `/df:plan` if no PLAN.md exists
  2. Snapshots pre-existing tests (ratchet baseline)
  3. Starts a loop (`/loop 1m /df:auto-cycle`) — fresh context each cycle
- 4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint/invariant-check)
+ 4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint/invariant-check/browser-verify)
  5. Pass = commit stands. Fail = revert + retry next cycle
  6. Circuit breaker: halts after N consecutive reverts on same task
  7. When all tasks done: runs `/df:verify`, merges to main
@@ -142,7 +143,7 @@ $ git log --oneline
  | `/df:spec <name>` | Generate spec from conversation |
  | `/df:plan` | Compare specs to code, create tasks |
  | `/df:execute` | Run tasks with parallel agents |
- | `/df:verify` | Check specs satisfied, merge to main |
+ | `/df:verify` | Check specs satisfied (L0-L5), merge to main |
  | `/df:note` | Capture decisions ad-hoc from conversation |
  | `/df:consolidate` | Deduplicate and clean up decisions.md |
  | `/df:resume` | Session continuity briefing |
@@ -179,12 +180,22 @@ your-project/
 
  1. **Discover before specifying, spike before implementing** — Ask, debate, probe — then commit
  2. **You define WHAT, AI figures out HOW** — Specs are the contract
- 3. **Metrics decide, not opinions** — Build/test/typecheck/lint/invariant-check are the only judges
+ 3. **Metrics decide, not opinions** — Build/test/typecheck/lint/invariant-check/browser-verify are the only judges
  4. **Confirm before assume** — Search the code before marking "missing"
  5. **Complete implementations** — No stubs, no placeholders
  6. **Atomic commits** — One task = one commit
  7. **Context-aware** — Checkpoint before limits, resume seamlessly
 
+ ## Skills
+
+ | Skill | Purpose |
+ |-------|---------|
+ | `browse-fetch` | Fetch external API docs via headless Chromium (replaces context-hub) |
+ | `browse-verify` | L5 browser verification — Playwright a11y tree assertions |
+ | `atomic-commits` | One logical change per commit |
+ | `code-completeness` | Find TODOs, stubs, and missing implementations |
+ | `gap-discovery` | Surface missing requirements during ideation |
+
  ## More
 
  - [Concepts](docs/concepts.md) — Philosophy and flow in depth
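The new L5 step described in this README evaluates plan-time assertions against a captured accessibility tree with no LLM in the loop. A minimal sketch of that kind of deterministic matching, assuming hypothetical `{ role, name, children }` node and `{ role, name }` assertion shapes (the browse-verify skill's actual formats may differ):

```javascript
// Recursively search an accessibility tree for a node matching an assertion.
// Node and assertion shapes here are illustrative, not deepflow's exact format.
function findNode(tree, assertion) {
  if (!tree) return false;
  if (tree.role === assertion.role && tree.name === assertion.name) return true;
  return (tree.children || []).some((child) => findNode(child, assertion));
}

function verify(tree, assertions) {
  // Pure pass/fail per assertion — no LLM involved.
  return assertions.map((a) => ({ ...a, pass: findNode(tree, a) }));
}

// Example: a tiny Playwright-style accessibility tree.
const tree = {
  role: 'WebArea', name: 'Checkout',
  children: [
    { role: 'button', name: 'Place order' },
    { role: 'textbox', name: 'Email' },
  ],
};
const results = verify(tree, [
  { role: 'button', name: 'Place order' },
  { role: 'alert', name: 'Error' },
]);
console.log(results.map((r) => r.pass)); // → [ true, false ]
```

Because the check is a plain tree search over structured data, the same input always yields the same verdict, which is what makes the pass/fail deterministic.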
package/bin/install.js CHANGED
@@ -184,7 +184,7 @@ async function main() {
  console.log('');
  console.log(`Installed to ${c.cyan}${CLAUDE_DIR}${c.reset}:`);
  console.log(' commands/df/ — /df:discover, /df:debate, /df:spec, /df:plan, /df:execute, /df:verify, /df:auto, /df:note, /df:resume, /df:update');
- console.log(' skills/ — gap-discovery, atomic-commits, code-completeness, context-hub');
+ console.log(' skills/ — gap-discovery, atomic-commits, code-completeness, browse-fetch, browse-verify');
  console.log(' agents/ — reasoner (/df:auto — autonomous execution via /loop)');
  if (level === 'global') {
  console.log(' hooks/ — statusline, update checker, invariant checker');
@@ -469,7 +469,8 @@ async function uninstall() {
  'skills/atomic-commits',
  'skills/code-completeness',
  'skills/gap-discovery',
- 'skills/context-hub',
+ 'skills/browse-fetch',
+ 'skills/browse-verify',
  'agents/reasoner.md'
  ];
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "deepflow",
- "version": "0.1.79",
+ "version": "0.1.81",
  "description": "Doing reveals what thinking can't predict — spec-driven iterative development for Claude Code",
  "keywords": [
  "claude",
@@ -39,5 +39,8 @@
  ],
  "engines": {
  "node": ">=16.0.0"
+ },
+ "dependencies": {
+ "playwright": "^1.58.2"
  }
  }
@@ -111,7 +111,22 @@ Read the current file first (create if missing), merge the new values, and write
 
  After `/df:execute` returns, check whether the task was reverted (ratchet failed):
 
- **On revert (ratchet failed):**
+ **What counts as a failure (increments counter):**
+
+ ```
+ - L0 ✗ (build failed)
+ - L1 ✗ (files missing)
+ - L2 ✗ (coverage dropped)
+ - L4 ✗ (tests failed)
+ - L5 ✗ (browser assertions failed — both attempts)
+ - L5 ✗ (flaky) — browser assertions failed on both attempts, with different assertions failing each time
+
+ What does NOT count as a failure:
+ - L5 — (no frontend): skipped, not a revert trigger
+ - L5 ⚠ (passed on retry): treated as pass, resets counter
+ ```
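The counting rules above reduce to a pure counter update. A sketch, using hypothetical status labels for the L0-L5 check outcomes rather than deepflow's exact notation:

```javascript
// Statuses that increment consecutive_reverts; labels are illustrative.
const FAILURES = new Set(['L0_fail', 'L1_fail', 'L2_fail', 'L4_fail', 'L5_fail', 'L5_flaky']);

function updateConsecutiveReverts(counter, status) {
  if (FAILURES.has(status)) return counter + 1; // failure: increment
  // 'L5_skipped' (no frontend) and 'L5_pass_on_retry' count as passes
  return 0;                                     // any pass resets the counter
}
```

Framed this way, the circuit breaker simply halts when the returned counter reaches N.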
+
+ **On revert (ratchet failed — any of L0 ✗, L1 ✗, L2 ✗, L4 ✗, L5 ✗, or L5 ✗ flaky):**
 
  ```
  1. Read .deepflow/auto-memory.yaml (create if missing)
@@ -126,7 +141,7 @@ After `/df:execute` returns, check whether the task was reverted (ratchet failed
  → Continue to step 4 (UPDATE REPORT) as normal
  ```
 
- **On success (ratchet passed):**
+ **On success (ratchet passed — including L5 — no frontend or L5 ⚠ pass-on-retry):**
 
  ```
  1. Reset consecutive_reverts[task_id] to 0 in .deepflow/auto-memory.yaml
@@ -104,7 +104,14 @@ Before spawning: `TaskUpdate(taskId: native_id, status: "in_progress")` — acti
 
  **NEVER use `isolation: "worktree"` on Task calls.** Deepflow manages a shared worktree so wave 2 sees wave 1 commits.
 
- **Spawn ALL ready tasks in ONE message.** Same-file conflicts: spawn sequentially.
+ **Spawn ALL ready tasks in ONE message** — EXCEPT file conflicts (see below).
+
+ **File conflict enforcement (1 file = 1 writer):**
+ Before spawning, check `Files:` lists of all ready tasks. If two+ ready tasks share a file:
+ 1. Sort conflicting tasks by task number (T1 < T2 < T3)
+ 2. Spawn only the lowest-numbered task from each conflict group
+ 3. Remaining tasks stay `pending` — they become ready once the spawned task completes
+ 4. Log: `"⏳ T{N} deferred — file conflict with T{M} on (unknown)"`
 
  **≥2 [SPIKE] tasks for same problem:** Follow Parallel Spike Probes (section 5.7).
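The 1-file-1-writer enforcement above amounts to picking one writer per file per wave. A sketch, assuming a hypothetical task shape with `id` and `files`:

```javascript
// From the ready set, spawn only the lowest-numbered task in each
// file-conflict group; everything else stays pending for the next wave.
function selectSpawnable(readyTasks) {
  // readyTasks: [{ id: 'T3', files: ['config.go'] }, ...]
  const byNumber = [...readyTasks].sort(
    (a, b) => Number(a.id.slice(1)) - Number(b.id.slice(1))
  );
  const claimed = new Set(); // files that already have a writer this wave
  const spawn = [], deferred = [];
  for (const task of byNumber) {
    if (task.files.some((f) => claimed.has(f))) {
      deferred.push(task.id); // becomes ready once the writer completes
    } else {
      task.files.forEach((f) => claimed.add(f));
      spawn.push(task.id);
    }
  }
  return { spawn, deferred };
}
```

Sorting first guarantees that, within a conflict group, the lowest task number always wins the wave.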
 
@@ -138,7 +145,7 @@ Ratchet uses ONLY pre-existing test files from `.deepflow/auto-snapshot.txt`.
  Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothesis.
 
  1. **Baseline:** Record `BASELINE=$(git rev-parse HEAD)` in shared worktree
- 2. **Sub-worktrees:** Per spike: `git worktree add -b df/{spec}/probe-{SPIKE_ID} .deepflow/worktrees/{spec}/probe-{SPIKE_ID} ${BASELINE}`
+ 2. **Sub-worktrees:** Per spike: `git worktree add -b df/{spec}--probe-{SPIKE_ID} .deepflow/worktrees/{spec}/probe-{SPIKE_ID} ${BASELINE}`
  3. **Spawn:** All probes in ONE message, each targeting its probe worktree. End turn.
  4. **Ratchet:** Per notification, run standard ratchet (5.5) in probe worktree. Record: ratchet_passed, regressions, coverage_delta, files_changed, commit
  5. **Select winner** (after ALL complete, no LLM judge):
@@ -146,9 +153,17 @@ Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothes
  - Rank: fewer regressions > higher coverage_delta > fewer files_changed > first to complete
  - No passes → reset all to pending for retry with debugger
  6. **Preserve all worktrees.** Losers: rename branch + `-failed` suffix. Record in checkpoint.json under `"spike_probes"`
- 7. **Log failed probes** to `.deepflow/auto-memory.yaml` (main tree):
+ 7. **Log ALL probe outcomes** to `.deepflow/auto-memory.yaml` (main tree):
  ```yaml
  spike_insights:
+ - date: "YYYY-MM-DD"
+ spec: "{spec_name}"
+ spike_id: "SPIKE_A"
+ hypothesis: "{from PLAN.md}"
+ outcome: "winner"
+ approach: "{one-sentence summary of what the winning probe chose}"
+ ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
+ branch: "df/{spec}--probe-SPIKE_A"
  - date: "YYYY-MM-DD"
  spec: "{spec_name}"
  spike_id: "SPIKE_B"
@@ -157,13 +172,16 @@ Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothes
  failure_reason: "{first failed check + error summary}"
  ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
  worktree: ".deepflow/worktrees/{spec}/probe-SPIKE_B-failed"
- branch: "df/{spec}/probe-SPIKE_B-failed"
- probe_learnings: # read by /df:auto-cycle each start
+ branch: "df/{spec}--probe-SPIKE_B-failed"
+ probe_learnings: # read by /df:auto-cycle each start AND included in per-task preamble
+ - spike: "SPIKE_A"
+ probe: "probe-SPIKE_A"
+ insight: "{one-sentence summary of winning approach — e.g. 'Use Node.js over Bun for Playwright'}"
  - spike: "SPIKE_B"
  probe: "probe-SPIKE_B"
  insight: "{one-sentence summary from failure_reason}"
  ```
- Create file if missing. Preserve existing keys when merging.
+ Create file if missing. Preserve existing keys when merging. Log BOTH winners and losers — downstream tasks need to know what was chosen, not just what failed.
  8. **Promote winner:** Cherry-pick into shared worktree. Winner → `[x] [PROBE_WINNER]`, losers → `[~] [PROBE_FAILED]`. Resume standard loop.
 
  ---
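The metric-only winner selection in step 5 can be written as a plain comparator. The probe record shape below is a hypothetical sketch; the ordering follows the stated ranking (fewer regressions > higher coverage_delta > fewer files_changed > first to complete):

```javascript
// Select the winning spike probe from ratchet metrics alone — no LLM judge.
function selectWinner(probes) {
  const passed = probes.filter((p) => p.ratchet_passed);
  if (passed.length === 0) return null; // no passes → reset all to pending
  passed.sort((a, b) =>
    a.regressions - b.regressions ||      // fewer regressions first
    b.coverage_delta - a.coverage_delta || // then higher coverage delta
    a.files_changed - b.files_changed ||   // then fewer files changed
    a.completed_order - b.completed_order  // then first to complete
  );
  return passed[0];
}
```

Chained subtractions in the comparator fall through to the next criterion only on exact ties, matching the `>` precedence in the ranking.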
@@ -176,10 +194,15 @@ Working directory: {worktree_absolute_path}
  All file operations MUST use this absolute path as base. Do NOT write files to the main project directory.
  Commit format: {commit_type}({spec}): {description}
 
+ {If .deepflow/auto-memory.yaml exists and has probe_learnings, include:}
+ Spike results (follow these approaches):
+ {each probe_learning with outcome "winner" → "- {insight}"}
+ {Omit this block if no probe_learnings exist.}
+
  STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main.
  ```
 
- **Standard Task:**
+ **Standard Task** (spawn with `Agent(model="{Model from PLAN.md}", ...)`):
  ```
  {task_id}: {description from PLAN.md}
  Files: {target files} Spec: {spec_name}
@@ -252,14 +275,21 @@ When all tasks done for a `doing-*` spec:
  ## Skills & Agents
 
  - Skill: `atomic-commits` — Clean commit protocol
- - Skill: `context-hub` — Fetch external API docs before coding
+ - Skill: `browse-fetch` — Fetch live web pages and external API docs via browser before coding
 
  | Agent | subagent_type | Purpose |
  |-------|---------------|---------|
  | Implementation | `general-purpose` | Task implementation |
  | Debugger | `reasoner` | Debugging failures |
 
- **Model routing:** Use `model:` from command/agent/skill frontmatter. Default: `sonnet`.
+ **Model routing:** Read `Model:` field from each task block in PLAN.md. Pass as `model:` parameter when spawning the agent. Default: `sonnet` if field is missing.
+
+ | Task field | Agent call |
+ |------------|-----------|
+ | `Model: haiku` | `Agent(model="haiku", ...)` |
+ | `Model: sonnet` | `Agent(model="sonnet", ...)` |
+ | `Model: opus` | `Agent(model="opus", ...)` |
+ | (missing) | `Agent(model="sonnet", ...)` |
 
  **Checkpoint schema:** `.deepflow/checkpoint.json` in worktree:
  ```json
@@ -99,12 +99,59 @@ For each file in a task's "Files:" list, find the full blast radius.
  Files outside original "Files:" → add with `(impact — verify/update)`.
  Skip for spike tasks.
 
+ ### 4.6. CROSS-TASK FILE CONFLICT DETECTION
+
+ After all tasks have their `Files:` lists, detect overlaps that require sequential execution.
+
+ **Algorithm:**
+ 1. Build a map: `file → [task IDs that list it]`
+ 2. For each file with >1 task: add `Blocked by` edge from later task → earlier task (by task number)
+ 3. If a dependency already exists (direct or transitive), skip (no redundant edges)
+
+ **Example:**
+ ```
+ T1: Files: config.go, feature.go — Blocked by: none
+ T3: Files: config.go — Blocked by: none
+ T5: Files: config.go — Blocked by: none
+ ```
+ After conflict detection:
+ ```
+ T1: Blocked by: none
+ T3: Blocked by: T1 (file conflict: config.go)
+ T5: Blocked by: T3 (file conflict: config.go)
+ ```
+
+ **Rules:**
+ - Only add the minimum edges needed (chain, not full mesh — T5 blocks on T3, not T1+T3)
+ - Append `(file conflict: (unknown))` to the Blocked by reason for traceability
+ - If a logical dependency already covers the ordering, don't add a redundant conflict edge
+ - Cross-spec conflicts: tasks from different specs sharing files get the same treatment
+
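The chain-edge algorithm in section 4.6 can be sketched as below. The task shape is a hypothetical simplification, and the rule-3 skip for pre-existing transitive dependencies is omitted for brevity:

```javascript
// For each shared file, block each task on its nearest earlier task only —
// a chain per file, not a full mesh of edges.
function addConflictEdges(tasks) {
  // tasks: [{ id: 'T1', files: ['config.go'], blockedBy: [] }, ...]
  const byFile = new Map(); // file → tasks that list it
  for (const t of tasks) {
    for (const f of t.files) {
      if (!byFile.has(f)) byFile.set(f, []);
      byFile.get(f).push(t);
    }
  }
  for (const [file, group] of byFile) {
    group.sort((a, b) => Number(a.id.slice(1)) - Number(b.id.slice(1)));
    for (let i = 1; i < group.length; i++) {
      // Block on the immediately preceding writer of this file.
      group[i].blockedBy.push(`${group[i - 1].id} (file conflict: ${file})`);
    }
  }
  return tasks;
}
```

Running this over the T1/T3/T5 example reproduces the chain shown above: T3 blocks on T1, and T5 blocks on T3 only.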
  ### 5. COMPARE & PRIORITIZE
 
  Spawn `Task(subagent_type="reasoner", model="opus")`. Map each requirement to DONE / PARTIAL / MISSING / CONFLICT. Check REQ-AC alignment. Flag spec gaps.
 
  Priority: Dependencies → Impact → Risk
 
+ ### 5.5. CLASSIFY MODEL PER TASK
+
+ For each task, assign `Model:` based on complexity signals:
+
+ | Model | When | Signals |
+ |-------|------|---------|
+ | `haiku` | Mechanical / low-risk | Single file, config changes, renames, formatting, browse-fetch, simple additions with clear pattern to follow |
+ | `sonnet` | Standard implementation | Feature work, bug fixes, refactoring, multi-file changes with clear specs |
+ | `opus` | High complexity | Architecture changes, complex multi-file refactors, ambiguous specs, unfamiliar APIs, >5 files in Impact |
+
+ **Decision inputs:**
+ 1. **File count** — 1 file → likely haiku/sonnet, >5 files → sonnet/opus
+ 2. **Impact blast radius** — many callers/duplicates → raise complexity
+ 3. **Spec clarity** — clear ACs with patterns → lower, ambiguous requirements → raise
+ 4. **Type** — spikes always `sonnet` (need reasoning but scoped), bootstrap → `haiku`
+ 5. **Has prior failures** — reverted tasks → raise one level (min `sonnet`)
+
+ Add `Model: haiku|sonnet|opus` to each task block. Default: `sonnet` if unclear.
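The classification rules above can be sketched as a small routing function. The task fields and thresholds here are illustrative assumptions layered on the table, not deepflow's actual heuristics:

```javascript
// Map complexity signals to a model tier, following the 5.5 table:
// haiku = mechanical, sonnet = standard (default), opus = high complexity.
const LEVELS = ['haiku', 'sonnet', 'opus'];

function classifyModel(task) {
  if (task.type === 'spike') return 'sonnet';     // spikes: reasoning, but scoped
  if (task.type === 'bootstrap') return 'haiku';
  let level = 1;                                  // default: sonnet
  if (task.files === 1 && task.clearSpec && task.impactFiles <= 1) level = 0;
  if (task.impactFiles > 5 || task.ambiguousSpec) level = 2;
  if (task.priorReverts > 0) level = Math.max(1, Math.min(2, level + 1)); // min sonnet
  return LEVELS[level];
}
```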
+
 
  ### 6. GENERATE SPIKE TASKS (IF NEEDED)
  **Spike Task Format:**
@@ -200,6 +247,7 @@ Always use `Task` tool with explicit `subagent_type` and `model`.
 
  - [ ] **T2**: Create upload endpoint
  - Files: src/api/upload.ts
+ - Model: sonnet
  - Impact:
  - Callers: src/routes/index.ts:5
  - Duplicates: backend/legacy-upload.go [dead — DELETE]
@@ -207,5 +255,6 @@ Always use `Task` tool with explicit `subagent_type` and `model`.
 
  - [ ] **T3**: Add S3 service with streaming
  - Files: src/services/storage.ts
+ - Model: opus
  - Blocked by: T1, T2
  ```