deepflow 0.1.78 → 0.1.80
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +14 -3
- package/bin/install.js +3 -2
- package/package.json +4 -1
- package/src/commands/df/auto-cycle.md +33 -19
- package/src/commands/df/execute.md +166 -473
- package/src/commands/df/plan.md +113 -163
- package/src/commands/df/verify.md +433 -3
- package/src/skills/browse-fetch/SKILL.md +258 -0
- package/src/skills/browse-verify/SKILL.md +264 -0
- package/templates/config-template.yaml +14 -0
- package/src/skills/context-hub/SKILL.md +0 -87
package/README.md
CHANGED
@@ -33,6 +33,7 @@ Most spec-driven frameworks start from a finished spec and execute a static plan
 - **Spec as living hypothesis** — Core intent stays fixed, details refine through implementation. "The spec becomes bulletproof because you built it, not before."
 - **Parallel probes reveal the best path** — Uncertain approaches spawn parallel spikes in isolated worktrees. The machine selects the winner (fewer regressions > better coverage > fewer files changed). Failed approaches stay recorded and never repeat.
 - **Metrics decide, not opinions** — No LLM judges another LLM. Build, tests, typecheck, lint, and invariant checks are the only judges. After an agent commits, the orchestrator runs health checks. Pass = keep. Fail = revert + new hypothesis.
+- **Browser verification closes the loop** — L5 launches headless Chromium via Playwright, captures the accessibility tree, and evaluates structured assertions extracted at plan-time from your spec's acceptance criteria. Deterministic pass/fail — no LLM calls during verification. Screenshots saved as evidence.
 - **The loop is the product** — Not "execute a plan" — "evolve the codebase toward the spec's goals through iterative cycles." Each cycle reveals what the previous one couldn't see.
 
 ## What We Learned by Doing
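The new L5 bullet says assertions are evaluated deterministically against the captured accessibility tree, with no LLM calls. Below is a minimal JavaScript sketch of that kind of evaluator. It assumes a snapshot made of nested `{ role, name, children }` nodes and a hypothetical assertion shape `{ role, name, expect }`, where `expect` is `"present"` or `"absent"`. Neither format is taken from the `browse-verify` skill. Launching Chromium, capturing the tree, and saving screenshots are left out.

```javascript
// Hypothetical assertion shape: { role, name, expect: "present" | "absent" }.
// Hypothetical snapshot shape: nested { role, name, children } nodes.

// Depth-first search for a node whose role and accessible name both match.
function findNode(node, role, name) {
  if (!node) return null;
  if (node.role === role && node.name === name) return node;
  for (const child of node.children || []) {
    const hit = findNode(child, role, name);
    if (hit) return hit;
  }
  return null;
}

// Evaluate every assertion. The verdict depends only on the snapshot and the
// assertion list, so the same input always produces the same pass/fail.
function evaluateAssertions(snapshot, assertions) {
  const results = assertions.map((a) => {
    const found = findNode(snapshot, a.role, a.name) !== null;
    const passed = a.expect === "absent" ? !found : found;
    return { ...a, passed };
  });
  return { passed: results.every((r) => r.passed), results };
}

// Illustrative snapshot and assertions (made-up page content).
const snapshot = {
  role: "WebArea",
  name: "Upload",
  children: [
    { role: "heading", name: "Upload a file", children: [] },
    { role: "button", name: "Submit", children: [] },
  ],
};

const report = evaluateAssertions(snapshot, [
  { role: "button", name: "Submit", expect: "present" },
  { role: "alert", name: "Upload failed", expect: "absent" },
]);
console.log(report.passed); // true
```

Keeping the evaluator a pure function is what makes "deterministic pass/fail" possible. Any flakiness has to come from page capture, not from judging.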
@@ -111,7 +112,7 @@ $ git log --oneline
 1. Runs `/df:plan` if no PLAN.md exists
 2. Snapshots pre-existing tests (ratchet baseline)
 3. Starts a loop (`/loop 1m /df:auto-cycle`) — fresh context each cycle
-4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint/invariant-check)
+4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint/invariant-check/browser-verify)
 5. Pass = commit stands. Fail = revert + retry next cycle
 6. Circuit breaker: halts after N consecutive reverts on same task
 7. When all tasks done: runs `/df:verify`, merges to main
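Steps 4 to 6 describe a ratchet with a circuit breaker. The sketch below simulates that control flow in memory. `runCycle` is a hypothetical callback that stands in for "execute in worktree, then run the health checks". The breaker limit is a parameter because the README leaves N unspecified. This is an illustration of the described loop, not deepflow's implementation.

```javascript
// Simulate the auto loop: one task attempt per cycle, ratchet on health checks,
// and halt once a single task is reverted `maxConsecutiveReverts` times in a row.
// `runCycle(taskId)` is hypothetical: true = all checks passed, false = reverted.
function autoLoop(taskIds, runCycle, maxConsecutiveReverts) {
  const pending = [...taskIds];
  const consecutiveReverts = {};
  const log = [];

  while (pending.length > 0) {
    const taskId = pending[0]; // pick next task
    if (runCycle(taskId)) {
      consecutiveReverts[taskId] = 0; // commit stands, counter resets
      pending.shift();
      log.push(`${taskId}: passed`);
    } else {
      consecutiveReverts[taskId] = (consecutiveReverts[taskId] || 0) + 1;
      log.push(`${taskId}: reverted`); // retried on the next cycle
      if (consecutiveReverts[taskId] >= maxConsecutiveReverts) {
        return { status: "circuit-breaker", taskId, log };
      }
    }
  }
  return { status: "all-done", log }; // step 7 would run /df:verify and merge
}

// T2 fails once, then passes on the retry cycle.
let t2Attempts = 0;
const result = autoLoop(["T1", "T2"], (id) => id !== "T2" || ++t2Attempts > 1, 3);
console.log(result.status); // all-done
console.log(result.log);    // [ 'T1: passed', 'T2: reverted', 'T2: passed' ]
```

Resetting the counter on success means the breaker only trips on N reverts in a row, matching the "consecutive" wording of step 6.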
@@ -142,7 +143,7 @@ $ git log --oneline
 | `/df:spec <name>` | Generate spec from conversation |
 | `/df:plan` | Compare specs to code, create tasks |
 | `/df:execute` | Run tasks with parallel agents |
-| `/df:verify` | Check specs satisfied, merge to main |
+| `/df:verify` | Check specs satisfied (L0-L5), merge to main |
 | `/df:note` | Capture decisions ad-hoc from conversation |
 | `/df:consolidate` | Deduplicate and clean up decisions.md |
 | `/df:resume` | Session continuity briefing |
@@ -179,12 +180,22 @@ your-project/
 
 1. **Discover before specifying, spike before implementing** — Ask, debate, probe — then commit
 2. **You define WHAT, AI figures out HOW** — Specs are the contract
-3. **Metrics decide, not opinions** — Build/test/typecheck/lint/invariant-check are the only judges
+3. **Metrics decide, not opinions** — Build/test/typecheck/lint/invariant-check/browser-verify are the only judges
 4. **Confirm before assume** — Search the code before marking "missing"
 5. **Complete implementations** — No stubs, no placeholders
 6. **Atomic commits** — One task = one commit
 7. **Context-aware** — Checkpoint before limits, resume seamlessly
 
+## Skills
+
+| Skill | Purpose |
+|-------|---------|
+| `browse-fetch` | Fetch external API docs via headless Chromium (replaces context-hub) |
+| `browse-verify` | L5 browser verification — Playwright a11y tree assertions |
+| `atomic-commits` | One logical change per commit |
+| `code-completeness` | Find TODOs, stubs, and missing implementations |
+| `gap-discovery` | Surface missing requirements during ideation |
+
 ## More
 
 - [Concepts](docs/concepts.md) — Philosophy and flow in depth
package/bin/install.js
CHANGED
@@ -184,7 +184,7 @@ async function main() {
 console.log('');
 console.log(`Installed to ${c.cyan}${CLAUDE_DIR}${c.reset}:`);
 console.log(' commands/df/ — /df:discover, /df:debate, /df:spec, /df:plan, /df:execute, /df:verify, /df:auto, /df:note, /df:resume, /df:update');
-console.log(' skills/ — gap-discovery, atomic-commits, code-completeness,
+console.log(' skills/ — gap-discovery, atomic-commits, code-completeness, browse-fetch, browse-verify');
 console.log(' agents/ — reasoner (/df:auto — autonomous execution via /loop)');
 if (level === 'global') {
 console.log(' hooks/ — statusline, update checker, invariant checker');
@@ -469,7 +469,8 @@ async function uninstall() {
 'skills/atomic-commits',
 'skills/code-completeness',
 'skills/gap-discovery',
-'skills/
+'skills/browse-fetch',
+'skills/browse-verify',
 'agents/reasoner.md'
 ];
 
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "deepflow",
-"version": "0.1.
+"version": "0.1.80",
 "description": "Doing reveals what thinking can't predict — spec-driven iterative development for Claude Code",
 "keywords": [
 "claude",
@@ -39,5 +39,8 @@
 ],
 "engines": {
 "node": ">=16.0.0"
+},
+"dependencies": {
+"playwright": "^1.58.2"
 }
 }
@@ -111,7 +111,22 @@ Read the current file first (create if missing), merge the new values, and write
 
 After `/df:execute` returns, check whether the task was reverted (ratchet failed):
 
-**
+**What counts as a failure (increments counter):**
+
+```
+- L0 ✗ (build failed)
+- L1 ✗ (files missing)
+- L2 ✗ (coverage dropped)
+- L4 ✗ (tests failed)
+- L5 ✗ (browser assertions failed — both attempts)
+- L5 ✗ (flaky) (browser assertions failed on both attempts, different assertions)
+
+What does NOT count as a failure:
+- L5 — (no frontend): skipped, not a revert trigger
+- L5 ⚠ (passed on retry): treated as pass, resets counter
+```
+
+**On revert (ratchet failed — any of L0 ✗, L1 ✗, L2 ✗, L4 ✗, L5 ✗, or L5 ✗ flaky):**
 
 ```
 1. Read .deepflow/auto-memory.yaml (create if missing)
@@ -126,7 +141,7 @@ After `/df:execute` returns, check whether the task was reverted (ratchet failed
 → Continue to step 4 (UPDATE REPORT) as normal
 ```
 
-**On success (ratchet passed):**
+**On success (ratchet passed — including L5 — no frontend or L5 ⚠ pass-on-retry):**
 
 ```
 1. Reset consecutive_reverts[task_id] to 0 in .deepflow/auto-memory.yaml
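Taken together, the failure list and the revert/success branches amount to a predicate over per-level outcomes plus a per-task counter. The sketch below shows one way to encode them. The outcome strings are hypothetical labels for the ✓/✗/⚠ markers. A plain object stands in for `.deepflow/auto-memory.yaml`, and YAML reading and writing are omitted. Only the `consecutive_reverts[task_id]` field name comes from the text above.

```javascript
// Hypothetical outcome labels for the markers in the failure list:
//   "pass" (✓), "fail" (✗), "flaky" (L5 ✗ flaky),
//   "skipped" (L5 skipped, no frontend), "retry-pass" (L5 ⚠ passed on retry).
const REVERT_OUTCOMES = new Set(["fail", "flaky"]);

// A cycle is a ratchet failure when any level's outcome is a revert trigger.
// "skipped" and "retry-pass" fall through as passes.
function isRatchetFailure(levels) {
  return Object.values(levels).some((outcome) => REVERT_OUTCOMES.has(outcome));
}

// Mirror the two branches: increment consecutive_reverts[task_id] on revert,
// reset it to 0 on success. `memory` is an in-memory stand-in for the YAML file.
function updateRevertCounter(memory, taskId, levels) {
  if (!memory.consecutive_reverts) memory.consecutive_reverts = {};
  const counters = memory.consecutive_reverts;
  counters[taskId] = isRatchetFailure(levels) ? (counters[taskId] || 0) + 1 : 0;
  return counters[taskId];
}

const memory = {};
console.log(updateRevertCounter(memory, "T2", { L0: "pass", L4: "fail" }));       // 1
console.log(updateRevertCounter(memory, "T2", { L0: "pass", L5: "flaky" }));      // 2
console.log(updateRevertCounter(memory, "T2", { L0: "pass", L5: "retry-pass" })); // 0
console.log(updateRevertCounter(memory, "T3", { L0: "pass", L5: "skipped" }));    // 0
```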
@@ -169,10 +184,10 @@ _Last updated: {YYYY-MM-DDTHH:MM:SSZ}_
 
 ## Cycle Log
 
-| Cycle | Task | Status | Commit / Revert | Reason | Timestamp |
-
-| 1 | T1 | passed | abc1234 | — | 2025-01-15T10:00:00Z |
-| 2 | T2 | failed | reverted | tests failed: 2 of 24 | 2025-01-15T10:05:00Z |
+| Cycle | Task | Status | Commit / Revert | Delta | Reason | Timestamp |
+|-------|------|--------|-----------------|-------|--------|-----------|
+| 1 | T1 | passed | abc1234 | tests: 24→24, build: ok | — | 2025-01-15T10:00:00Z |
+| 2 | T2 | failed | reverted | tests: 24→22 (−2) | tests failed: 2 of 24 | 2025-01-15T10:05:00Z |
 
 ## Probe Results
 
@@ -202,13 +217,14 @@ _(tasks that were reverted with their failure reasons)_
 **Cycle Log — append one row:**
 
 ```
-| {cycle_number} | {task_id} | {status} | {commit_hash or "reverted"} | {reason or "—"} | {YYYY-MM-DDTHH:MM:SSZ} |
+| {cycle_number} | {task_id} | {status} | {commit_hash or "reverted"} | {delta} | {reason or "—"} | {YYYY-MM-DDTHH:MM:SSZ} |
 ```
 
 - `cycle_number`: total number of cycles executed so far (count existing data rows in the Cycle Log + 1)
 - `task_id`: task ID from PLAN.md, or `BOOTSTRAP` for bootstrap cycles
 - `status`: `passed` (ratchet passed), `failed` (ratchet failed, reverted), or `skipped` (task was already done)
 - `commit_hash`: short hash from the commit, or `reverted` if ratchet failed
+- `delta`: ratchet metric change from this cycle. Format: `tests: {before}→{after}, build: ok/fail`. Include coverage delta if available (e.g., `cov: 80%→82% (+2%)`). On revert, show the regression that triggered it (e.g., `tests: 24→22 (−2)`)
 - `reason`: failure reason from ratchet output (e.g., `"tests failed: 2 of 24"`), or `—` if passed
 
 **Summary table — recalculate from Cycle Log rows:**
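The `delta` and `cycle_number` rules above are mechanical enough to sketch. The helpers below are illustrative and not the package's code. The input field names (`testsBefore`, `covAfter`, and so on) are hypothetical. A Cycle Log data row is assumed to be any table line whose first cell is an integer.

```javascript
// Signed change with an optional unit. Negative values use U+2212 (minus sign)
// to match the `(−2)` example in the field description.
function signed(n, unit = "") {
  return n > 0 ? `+${n}${unit}` : `\u2212${Math.abs(n)}${unit}`;
}

// Build the `delta` cell: tests before→after, a signed change when it moved,
// then build status and coverage only when they are supplied.
function formatDelta({ testsBefore, testsAfter, build, covBefore, covAfter }) {
  let delta = `tests: ${testsBefore}→${testsAfter}`;
  if (testsAfter !== testsBefore) delta += ` (${signed(testsAfter - testsBefore)})`;
  if (build) delta += `, build: ${build}`; // "ok" or "fail"
  if (covBefore !== undefined && covAfter !== undefined) {
    delta += `, cov: ${covBefore}%→${covAfter}%`;
    if (covAfter !== covBefore) delta += ` (${signed(covAfter - covBefore, "%")})`;
  }
  return delta;
}

// cycle_number = existing data rows + 1. Rows whose first cell is an integer
// are counted, which skips the header and the |---| separator.
function nextCycleNumber(cycleLogMarkdown) {
  const dataRows = cycleLogMarkdown
    .split("\n")
    .filter((line) => /^\|\s*\d+\s*\|/.test(line));
  return dataRows.length + 1;
}

// Join cells into one Markdown table row.
function formatRow(cells) {
  return `| ${cells.join(" | ")} |`;
}

const cycleLog = [
  "| Cycle | Task | Status | Commit / Revert | Delta | Reason | Timestamp |",
  "|-------|------|--------|-----------------|-------|--------|-----------|",
  "| 1 | T1 | passed | abc1234 | tests: 24→24, build: ok | — | 2025-01-15T10:00:00Z |",
].join("\n");

const row = formatRow([
  nextCycleNumber(cycleLog), "T2", "failed", "reverted",
  formatDelta({ testsBefore: 24, testsAfter: 22 }),
  "tests failed: 2 of 24", "2025-01-15T10:05:00Z",
]);
console.log(row);
// | 2 | T2 | failed | reverted | tests: 24→22 (−2) | tests failed: 2 of 24 | 2025-01-15T10:05:00Z |
```

Leaving `build` out reproduces the shorter revert example. Passing `build: "ok"` and coverage values produces the `tests: 24→24, build: ok` and `cov: 80%→82% (+2%)` forms.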
@@ -259,10 +275,12 @@ done_count = number of [x] tasks
 pending_count = number of [ ] tasks
 ```
 
-**
+**Note:** Per-spec verification and merge to main happens automatically in `/df:execute` (step 8) when all tasks for a spec complete. No separate verify call is needed here.
+
+**If no `[ ]` tasks remain (pending_count == 0):**
 ```
-→
-→
+→ Report: "All specs verified and merged. Workflow complete."
+→ Exit
 ```
 
 **If tasks remain (pending_count > 0):**
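The completion check reduces to counting checkboxes. Below is a sketch that assumes each task in PLAN.md is a Markdown checkbox list item (`- [x] ...` for done, `- [ ] ...` for pending). The real plan format may carry more structure, and the task titles in the usage lines are made up. The status and completion strings match the auto-cycle examples.

```javascript
// Count checkbox tasks. Assumes one task per list item of the form
// "- [x] ..." (done) or "- [ ] ..." (pending); "*" bullets and "[X]" also count.
function countTasks(planMarkdown) {
  let done = 0;
  let pending = 0;
  for (const line of planMarkdown.split("\n")) {
    const m = /^\s*[-*]\s+\[([ xX])\]/.exec(line);
    if (!m) continue;
    if (m[1] === " ") pending += 1;
    else done += 1;
  }
  return { total: done + pending, done, pending };
}

// Decide what this cycle does: exit with the completion message when
// pending_count == 0, otherwise continue with the next task.
function cycleStatus(planMarkdown) {
  const { total, done, pending } = countTasks(planMarkdown);
  const loading = `Loading PLAN.md... ${total} tasks total, ${done} done, ${pending} pending`;
  if (pending === 0) {
    return { loading, exit: true, message: "All specs verified and merged. Workflow complete." };
  }
  return { loading, exit: false };
}

const plan = [
  "- [x] **T1**: Add upload endpoint",
  "- [ ] **T2**: Add progress bar",
  "Notes: not a task",
].join("\n");
console.log(cycleStatus(plan).loading); // Loading PLAN.md... 2 tasks total, 1 done, 1 pending
console.log(cycleStatus("").message);   // All specs verified and merged. Workflow complete.
```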
@@ -327,17 +345,14 @@ Updated .deepflow/auto-report.md:
 Cycle complete. 1 tasks remaining.
 ```
 
-### All Tasks Done (
+### All Tasks Done (workflow complete)
 
 ```
 /df:auto-cycle
 
-Loading PLAN.md...
+Loading PLAN.md... 0 tasks total, 0 done, 0 pending
 
-All
-Running: /df:verify
-✓ L0 | ✓ L1 | ⚠ L2 (no coverage tool) | ✓ L4
-✓ Merged df/upload to main
+All specs verified and merged. Workflow complete.
 ```
 
 ### No Work Remaining (idempotent)
@@ -345,10 +360,9 @@ Running: /df:verify
 ```
 /df:auto-cycle
 
-Loading PLAN.md...
-Verification already complete (no doing-* specs found).
+Loading PLAN.md... 0 tasks total, 0 done, 0 pending
 
-
+All specs verified and merged. Workflow complete.
 ```
 
 ### Circuit Breaker Tripped