deepflow 0.1.78 → 0.1.80

package/README.md CHANGED
@@ -33,6 +33,7 @@ Most spec-driven frameworks start from a finished spec and execute a static plan
33
33
  - **Spec as living hypothesis** — Core intent stays fixed, details refine through implementation. "The spec becomes bulletproof because you built it, not before."
34
34
  - **Parallel probes reveal the best path** — Uncertain approaches spawn parallel spikes in isolated worktrees. The machine selects the winner (fewer regressions > better coverage > fewer files changed). Failed approaches stay recorded and never repeat.
35
35
  - **Metrics decide, not opinions** — No LLM judges another LLM. Build, tests, typecheck, lint, and invariant checks are the only judges. After an agent commits, the orchestrator runs health checks. Pass = keep. Fail = revert + new hypothesis.
36
+ - **Browser verification closes the loop** — L5 launches headless Chromium via Playwright, captures the accessibility tree, and evaluates structured assertions extracted at plan-time from your spec's acceptance criteria. Deterministic pass/fail — no LLM calls during verification. Screenshots saved as evidence.
36
37
  - **The loop is the product** — Not "execute a plan" — "evolve the codebase toward the spec's goals through iterative cycles." Each cycle reveals what the previous one couldn't see.
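The selection order named in the parallel-probes bullet (fewer regressions > better coverage > fewer files changed) can be read as a lexicographic comparison. The sketch below is only an illustration of that ordering: the probe result shape (`name`, `regressions`, `coverage`, `filesChanged`) and the function names are hypothetical, not taken from deepflow.

```javascript
// Hypothetical probe result shape (an assumption, not deepflow's format):
// { name, regressions, coverage, filesChanged }

// Orders probes by the README's priority:
// fewer regressions > better coverage > fewer files changed.
function compareProbes(a, b) {
  if (a.regressions !== b.regressions) return a.regressions - b.regressions;
  if (a.coverage !== b.coverage) return b.coverage - a.coverage; // higher wins
  return a.filesChanged - b.filesChanged;
}

// Returns the best probe, or null when there are no candidates.
function selectWinner(probes) {
  if (probes.length === 0) return null;
  return [...probes].sort(compareProbes)[0];
}

const probes = [
  { name: 'spike-a', regressions: 0, coverage: 81, filesChanged: 9 },
  { name: 'spike-b', regressions: 0, coverage: 84, filesChanged: 12 },
  { name: 'spike-c', regressions: 1, coverage: 90, filesChanged: 3 },
];

console.log(selectWinner(probes).name); // spike-b
```

Here `spike-c` loses despite the highest coverage because regressions are compared first; `spike-b` then beats `spike-a` on coverage even though it touches more files.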
37
38
 
38
39
  ## What We Learned by Doing
@@ -111,7 +112,7 @@ $ git log --oneline
111
112
  1. Runs `/df:plan` if no PLAN.md exists
112
113
  2. Snapshots pre-existing tests (ratchet baseline)
113
114
  3. Starts a loop (`/loop 1m /df:auto-cycle`) — fresh context each cycle
114
- 4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint/invariant-check)
115
+ 4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint/invariant-check/browser-verify)
115
116
  5. Pass = commit stands. Fail = revert + retry next cycle
116
117
  6. Circuit breaker: halts after N consecutive reverts on same task
117
118
  7. When all tasks done: runs `/df:verify`, merges to main
@@ -142,7 +143,7 @@ $ git log --oneline
142
143
  | `/df:spec <name>` | Generate spec from conversation |
143
144
  | `/df:plan` | Compare specs to code, create tasks |
144
145
  | `/df:execute` | Run tasks with parallel agents |
145
- | `/df:verify` | Check specs satisfied, merge to main |
146
+ | `/df:verify` | Check specs satisfied (L0-L5), merge to main |
146
147
  | `/df:note` | Capture decisions ad-hoc from conversation |
147
148
  | `/df:consolidate` | Deduplicate and clean up decisions.md |
148
149
  | `/df:resume` | Session continuity briefing |
@@ -179,12 +180,22 @@ your-project/
179
180
 
180
181
  1. **Discover before specifying, spike before implementing** — Ask, debate, probe — then commit
181
182
  2. **You define WHAT, AI figures out HOW** — Specs are the contract
182
- 3. **Metrics decide, not opinions** — Build/test/typecheck/lint/invariant-check are the only judges
183
+ 3. **Metrics decide, not opinions** — Build/test/typecheck/lint/invariant-check/browser-verify are the only judges
183
184
  4. **Confirm before assume** — Search the code before marking "missing"
184
185
  5. **Complete implementations** — No stubs, no placeholders
185
186
  6. **Atomic commits** — One task = one commit
186
187
  7. **Context-aware** — Checkpoint before limits, resume seamlessly
187
188
 
189
+ ## Skills
190
+
191
+ | Skill | Purpose |
192
+ |-------|---------|
193
+ | `browse-fetch` | Fetch external API docs via headless Chromium (replaces context-hub) |
194
+ | `browse-verify` | L5 browser verification — Playwright a11y tree assertions |
195
+ | `atomic-commits` | One logical change per commit |
196
+ | `code-completeness` | Find TODOs, stubs, and missing implementations |
197
+ | `gap-discovery` | Surface missing requirements during ideation |
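As a rough illustration of what "a11y tree assertions" with deterministic pass/fail could look like, the sketch below evaluates structured assertions against an already-captured accessibility tree. It makes no Playwright calls. The node shape (`role`, `name`, `children`) and the assertion shape (`role`, `name`, `expect`) are assumptions made for this example, not the format `browse-verify` actually uses.

```javascript
// Hypothetical snapshot node: { role, name, children }.
// Hypothetical assertion: { role, name, expect: 'present' | 'absent' }.

// Depth-first search for a node with the given role and accessible name.
function findNode(node, role, name) {
  if (!node) return null;
  if (node.role === role && node.name === name) return node;
  for (const child of node.children || []) {
    const hit = findNode(child, role, name);
    if (hit) return hit;
  }
  return null;
}

// Pure function: same tree and assertions always give the same verdict.
function evaluateAssertions(tree, assertions) {
  const failures = assertions.filter((a) => {
    const found = findNode(tree, a.role, a.name) !== null;
    return a.expect === 'present' ? !found : found;
  });
  return { pass: failures.length === 0, failures };
}

const tree = {
  role: 'WebArea',
  name: 'Upload',
  children: [
    { role: 'button', name: 'Upload file' },
    { role: 'heading', name: 'Your files' },
  ],
};

const result = evaluateAssertions(tree, [
  { role: 'button', name: 'Upload file', expect: 'present' },
  { role: 'alert', name: 'Upload failed', expect: 'absent' },
]);

console.log(result.pass); // true
```

Because the check is a plain tree walk, no model is involved at verification time, which is the property the README emphasizes.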
198
+
188
199
  ## More
189
200
 
190
201
  - [Concepts](docs/concepts.md) — Philosophy and flow in depth
package/bin/install.js CHANGED
@@ -184,7 +184,7 @@ async function main() {
184
184
  console.log('');
185
185
  console.log(`Installed to ${c.cyan}${CLAUDE_DIR}${c.reset}:`);
186
186
  console.log(' commands/df/ — /df:discover, /df:debate, /df:spec, /df:plan, /df:execute, /df:verify, /df:auto, /df:note, /df:resume, /df:update');
187
- console.log(' skills/ — gap-discovery, atomic-commits, code-completeness, context-hub');
187
+ console.log(' skills/ — gap-discovery, atomic-commits, code-completeness, browse-fetch, browse-verify');
188
188
  console.log(' agents/ — reasoner (/df:auto — autonomous execution via /loop)');
189
189
  if (level === 'global') {
190
190
  console.log(' hooks/ — statusline, update checker, invariant checker');
@@ -469,7 +469,8 @@ async function uninstall() {
469
469
  'skills/atomic-commits',
470
470
  'skills/code-completeness',
471
471
  'skills/gap-discovery',
472
- 'skills/context-hub',
472
+ 'skills/browse-fetch',
473
+ 'skills/browse-verify',
473
474
  'agents/reasoner.md'
474
475
  ];
475
476
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "deepflow",
3
- "version": "0.1.78",
3
+ "version": "0.1.80",
4
4
  "description": "Doing reveals what thinking can't predict — spec-driven iterative development for Claude Code",
5
5
  "keywords": [
6
6
  "claude",
@@ -39,5 +39,8 @@
39
39
  ],
40
40
  "engines": {
41
41
  "node": ">=16.0.0"
42
+ },
43
+ "dependencies": {
44
+ "playwright": "^1.58.2"
42
45
  }
43
46
  }
@@ -111,7 +111,22 @@ Read the current file first (create if missing), merge the new values, and write
111
111
 
112
112
  After `/df:execute` returns, check whether the task was reverted (ratchet failed):
113
113
 
114
- **On revert (ratchet failed):**
114
+ **What counts as a failure (increments counter):**
115
+
116
+ ```
117
+ - L0 ✗ (build failed)
118
+ - L1 ✗ (files missing)
119
+ - L2 ✗ (coverage dropped)
120
+ - L4 ✗ (tests failed)
121
+ - L5 ✗ (browser assertions failed — both attempts)
122
+ - L5 ✗ (flaky) (browser assertions failed on both attempts, different assertions)
123
+
124
+ What does NOT count as a failure:
125
+ - L5 — (no frontend): skipped, not a revert trigger
126
+ - L5 ⚠ (passed on retry): treated as pass, resets counter
127
+ ```
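The L5 rules above can be sketched as a small classifier plus a counter update. This is an illustration under stated assumptions: the attempt shape (`pass`, `failed`), the outcome kinds, and the function names are hypothetical, the retry is assumed to run only after a first failure, and L0 to L4 are assumed to have passed already.

```javascript
// Hypothetical attempt shape: { pass: boolean, failed: string[] }.

// True when both attempts failed the same set of assertion ids.
function sameFailures(a, b) {
  const x = [...a].sort();
  const y = [...b].sort();
  return x.length === y.length && x.every((id, i) => id === y[i]);
}

// Maps the two L5 attempts to an outcome and whether it triggers a revert.
function classifyL5(hasFrontend, first, second) {
  if (!hasFrontend) return { kind: 'skipped', revert: false };
  if (first.pass) return { kind: 'pass', revert: false };
  const retry = second || first; // assumption: no retry data counts as a repeat failure
  if (retry.pass) return { kind: 'pass-on-retry', revert: false };
  const kind = sameFailures(first.failed, retry.failed) ? 'fail' : 'flaky';
  return { kind, revert: true }; // both fail and flaky count as failures
}

// Increments the per-task counter on revert, resets it to 0 otherwise.
function applyOutcome(counters, taskId, outcome) {
  const next = { ...counters };
  next[taskId] = outcome.revert ? (counters[taskId] || 0) + 1 : 0;
  return next;
}

const flaky = classifyL5(
  true,
  { pass: false, failed: ['a1'] },
  { pass: false, failed: ['a2'] }
);
console.log(flaky.kind, applyOutcome({ T2: 1 }, 'T2', flaky).T2); // flaky 2
```

Note that a flaky result still increments the counter, while a pass on retry resets it, matching the two lists above.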
128
+
129
+ **On revert (ratchet failed — any of L0 ✗, L1 ✗, L2 ✗, L4 ✗, L5 ✗, or L5 ✗ flaky):**
115
130
 
116
131
  ```
117
132
  1. Read .deepflow/auto-memory.yaml (create if missing)
@@ -126,7 +141,7 @@ After `/df:execute` returns, check whether the task was reverted (ratchet failed
126
141
  → Continue to step 4 (UPDATE REPORT) as normal
127
142
  ```
128
143
 
129
- **On success (ratchet passed):**
144
+ **On success (ratchet passed — including L5 — no frontend or L5 ⚠ pass-on-retry):**
130
145
 
131
146
  ```
132
147
  1. Reset consecutive_reverts[task_id] to 0 in .deepflow/auto-memory.yaml
@@ -169,10 +184,10 @@ _Last updated: {YYYY-MM-DDTHH:MM:SSZ}_
169
184
 
170
185
  ## Cycle Log
171
186
 
172
- | Cycle | Task | Status | Commit / Revert | Reason | Timestamp |
173
- |-------|------|--------|-----------------|--------|-----------|
174
- | 1 | T1 | passed | abc1234 | — | 2025-01-15T10:00:00Z |
175
- | 2 | T2 | failed | reverted | tests failed: 2 of 24 | 2025-01-15T10:05:00Z |
187
+ | Cycle | Task | Status | Commit / Revert | Delta | Reason | Timestamp |
188
+ |-------|------|--------|-----------------|-------|--------|-----------|
189
+ | 1 | T1 | passed | abc1234 | tests: 24→24, build: ok | — | 2025-01-15T10:00:00Z |
190
+ | 2 | T2 | failed | reverted | tests: 24→22 (−2) | tests failed: 2 of 24 | 2025-01-15T10:05:00Z |
176
191
 
177
192
  ## Probe Results
178
193
 
@@ -202,13 +217,14 @@ _(tasks that were reverted with their failure reasons)_
202
217
  **Cycle Log — append one row:**
203
218
 
204
219
  ```
205
- | {cycle_number} | {task_id} | {status} | {commit_hash or "reverted"} | {reason or "—"} | {YYYY-MM-DDTHH:MM:SSZ} |
220
+ | {cycle_number} | {task_id} | {status} | {commit_hash or "reverted"} | {delta} | {reason or "—"} | {YYYY-MM-DDTHH:MM:SSZ} |
206
221
  ```
207
222
 
208
223
  - `cycle_number`: total number of cycles executed so far (count existing data rows in the Cycle Log + 1)
209
224
  - `task_id`: task ID from PLAN.md, or `BOOTSTRAP` for bootstrap cycles
210
225
  - `status`: `passed` (ratchet passed), `failed` (ratchet failed, reverted), or `skipped` (task was already done)
211
226
  - `commit_hash`: short hash from the commit, or `reverted` if ratchet failed
227
+ - `delta`: ratchet metric change from this cycle. Format: `tests: {before}→{after}, build: ok/fail`. Include coverage delta if available (e.g., `cov: 80%→82% (+2%)`). On revert, show the regression that triggered it (e.g., `tests: 24→22 (−2)`)
212
228
  - `reason`: failure reason from ratchet output (e.g., `"tests failed: 2 of 24"`), or `—` if passed
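One way to produce the `delta` cell in the format described above is sketched here. The input field names (`testsBefore`, `testsAfter`, `buildOk`, `covBefore`, `covAfter`) are hypothetical; only the output format follows the examples in this diff. The build and coverage parts are emitted only when the caller supplies them.

```javascript
// Formats a signed change such as " (+2%)" or " (−2)"; empty when unchanged.
// Uses U+2212 (−) for negatives, as in the Cycle Log examples.
function signed(n, suffix = '') {
  if (n === 0) return '';
  const sign = n > 0 ? '+' : '\u2212';
  return ` (${sign}${Math.abs(n)}${suffix})`;
}

// Builds the delta string from hypothetical before/after metrics.
function formatDelta({ testsBefore, testsAfter, buildOk, covBefore, covAfter }) {
  const parts = [
    `tests: ${testsBefore}→${testsAfter}${signed(testsAfter - testsBefore)}`,
  ];
  if (buildOk !== undefined) parts.push(`build: ${buildOk ? 'ok' : 'fail'}`);
  if (covBefore !== undefined && covAfter !== undefined) {
    parts.push(`cov: ${covBefore}%→${covAfter}%${signed(covAfter - covBefore, '%')}`);
  }
  return parts.join(', ');
}

console.log(formatDelta({ testsBefore: 24, testsAfter: 24, buildOk: true }));
// tests: 24→24, build: ok
console.log(formatDelta({ testsBefore: 24, testsAfter: 22 }));
// tests: 24→22 (−2)
```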
213
229
 
214
230
  **Summary table — recalculate from Cycle Log rows:**
@@ -259,10 +275,12 @@ done_count = number of [x] tasks
259
275
  pending_count = number of [ ] tasks
260
276
  ```
261
277
 
262
- **If ALL tasks are `[x]` (pending_count == 0):**
278
+ **Note:** Per-spec verification and merge to main happens automatically in `/df:execute` (step 8) when all tasks for a spec complete. No separate verify call is needed here.
279
+
280
+ **If no `[ ]` tasks remain (pending_count == 0):**
263
281
  ```
264
- Run /df:verify via Skill tool (skill: "df:verify", no args)
265
- Report: "All tasks complete. Verification triggered."
282
+ Report: "All specs verified and merged. Workflow complete."
283
+ Exit
266
284
  ```
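The `done_count` / `pending_count` check can be sketched as a scan over PLAN.md. This assumes tasks are written as Markdown checklist items (`- [x]` and `- [ ]`), which is inferred from the notation above rather than confirmed by the source; `countTasks` is a hypothetical helper name.

```javascript
// Counts checked and unchecked checklist items in a PLAN.md-style string.
// Assumption: each task is a list item starting with "- [x]" or "- [ ]".
function countTasks(planText) {
  let done = 0;
  let pending = 0;
  for (const line of planText.split('\n')) {
    if (/^\s*[-*]\s+\[[xX]\]/.test(line)) done += 1;
    else if (/^\s*[-*]\s+\[ \]/.test(line)) pending += 1;
  }
  return { total: done + pending, done, pending };
}

const plan = [
  '## Tasks',
  '- [x] T1: add upload endpoint',
  '- [ ] T2: validate file size',
  '- [x] T3: wire up UI',
].join('\n');

const counts = countTasks(plan);
console.log(counts); // { total: 3, done: 2, pending: 1 }

// pending === 0 corresponds to the "Workflow complete" branch above.
console.log(counts.pending === 0 ? 'Workflow complete' : 'Tasks remain');
```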
267
285
 
268
286
  **If tasks remain (pending_count > 0):**
@@ -327,17 +345,14 @@ Updated .deepflow/auto-report.md:
327
345
  Cycle complete. 1 tasks remaining.
328
346
  ```
329
347
 
330
- ### All Tasks Done (verify triggered)
348
+ ### All Tasks Done (workflow complete)
331
349
 
332
350
  ```
333
351
  /df:auto-cycle
334
352
 
335
- Loading PLAN.md... 3 tasks total, 3 done, 0 pending
353
+ Loading PLAN.md... 0 tasks total, 0 done, 0 pending
336
354
 
337
- All tasks complete. Verification triggered.
338
- Running: /df:verify
339
- ✓ L0 | ✓ L1 | ⚠ L2 (no coverage tool) | ✓ L4
340
- ✓ Merged df/upload to main
355
+ All specs verified and merged. Workflow complete.
341
356
  ```
342
357
 
343
358
  ### No Work Remaining (idempotent)
@@ -345,10 +360,9 @@ Running: /df:verify
345
360
  ```
346
361
  /df:auto-cycle
347
362
 
348
- Loading PLAN.md... 3 tasks total, 3 done, 0 pending
349
- Verification already complete (no doing-* specs found).
363
+ Loading PLAN.md... 0 tasks total, 0 done, 0 pending
350
364
 
351
- Nothing to do. Cycle complete. 0 tasks remaining.
365
+ All specs verified and merged. Workflow complete.
352
366
  ```
353
367
 
354
368
  ### Circuit Breaker Tripped