npm - deepflow - Versions diffs - 0.1.78 → 0.1.80 - Mend

deepflow 0.1.78 → 0.1.80

This diff shows the contents of publicly released package versions as they appear in their public registries. It is provided for informational purposes only.

Files changed (11)

package/README.md +14 -3
package/bin/install.js +3 -2
package/package.json +4 -1
package/src/commands/df/auto-cycle.md +33 -19
package/src/commands/df/execute.md +166 -473
package/src/commands/df/plan.md +113 -163
package/src/commands/df/verify.md +433 -3
package/src/skills/browse-fetch/SKILL.md +258 -0
package/src/skills/browse-verify/SKILL.md +264 -0
package/templates/config-template.yaml +14 -0
package/src/skills/context-hub/SKILL.md +0 -87

package/README.md CHANGED

@@ -33,6 +33,7 @@ Most spec-driven frameworks start from a finished spec and execute a static plan
 - **Spec as living hypothesis** — Core intent stays fixed, details refine through implementation. "The spec becomes bulletproof because you built it, not before."
 - **Parallel probes reveal the best path** — Uncertain approaches spawn parallel spikes in isolated worktrees. The machine selects the winner (fewer regressions > better coverage > fewer files changed). Failed approaches stay recorded and never repeat.
 - **Metrics decide, not opinions** — No LLM judges another LLM. Build, tests, typecheck, lint, and invariant checks are the only judges. After an agent commits, the orchestrator runs health checks. Pass = keep. Fail = revert + new hypothesis.
+- **Browser verification closes the loop** — L5 launches headless Chromium via Playwright, captures the accessibility tree, and evaluates structured assertions extracted at plan-time from your spec's acceptance criteria. Deterministic pass/fail — no LLM calls during verification. Screenshots saved as evidence.
 - **The loop is the product** — Not "execute a plan" — "evolve the codebase toward the spec's goals through iterative cycles." Each cycle reveals what the previous one couldn't see.
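The selection order in the "Parallel probes" bullet above (fewer regressions > better coverage > fewer files changed) reads as a lexicographic comparison. A minimal sketch of that ordering, assuming each probe is summarized by three hypothetical fields (`regressions`, `coverage`, `filesChanged`) that are illustrative and not taken from the package:

```javascript
// Hypothetical probe summaries; the field names are illustrative only
// and do not come from deepflow itself.
function compareProbes(a, b) {
  // 1. Fewer regressions wins.
  if (a.regressions !== b.regressions) return a.regressions - b.regressions;
  // 2. On a tie, higher coverage wins.
  if (a.coverage !== b.coverage) return b.coverage - a.coverage;
  // 3. On a further tie, fewer files changed wins.
  return a.filesChanged - b.filesChanged;
}

function pickWinner(probes) {
  // Sort a copy so the caller's array is left untouched.
  return [...probes].sort(compareProbes)[0];
}

const sampleProbes = [
  { name: 'probe-a', regressions: 0, coverage: 81, filesChanged: 9 },
  { name: 'probe-b', regressions: 0, coverage: 84, filesChanged: 12 },
  { name: 'probe-c', regressions: 1, coverage: 90, filesChanged: 3 },
];
console.log(pickWinner(sampleProbes).name); // probe-b
```

Here `probe-c` loses despite the best coverage because regressions are compared first, and `probe-b` beats `probe-a` on coverage even though it touches more files.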
 ## What We Learned by Doing
@@ -111,7 +112,7 @@ $ git log --oneline
 1. Runs `/df:plan` if no PLAN.md exists
 2. Snapshots pre-existing tests (ratchet baseline)
 3. Starts a loop (`/loop 1m /df:auto-cycle`) — fresh context each cycle
-4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint/invariant-check)
+4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint/invariant-check/browser-verify)
 5. Pass = commit stands. Fail = revert + retry next cycle
 6. Circuit breaker: halts after N consecutive reverts on same task
 7. When all tasks done: runs `/df:verify`, merges to main
@@ -142,7 +143,7 @@ $ git log --oneline
 | `/df:spec <name>` | Generate spec from conversation |
 | `/df:plan` | Compare specs to code, create tasks |
 | `/df:execute` | Run tasks with parallel agents |
-| `/df:verify` | Check specs satisfied, merge to main |
+| `/df:verify` | Check specs satisfied (L0-L5), merge to main |
 | `/df:note` | Capture decisions ad-hoc from conversation |
 | `/df:consolidate` | Deduplicate and clean up decisions.md |
 | `/df:resume` | Session continuity briefing |
@@ -179,12 +180,22 @@ your-project/
 1. **Discover before specifying, spike before implementing** — Ask, debate, probe — then commit
 2. **You define WHAT, AI figures out HOW** — Specs are the contract
-3. **Metrics decide, not opinions** — Build/test/typecheck/lint/invariant-check are the only judges
+3. **Metrics decide, not opinions** — Build/test/typecheck/lint/invariant-check/browser-verify are the only judges
 4. **Confirm before assume** — Search the code before marking "missing"
 5. **Complete implementations** — No stubs, no placeholders
 6. **Atomic commits** — One task = one commit
 7. **Context-aware** — Checkpoint before limits, resume seamlessly
+## Skills
+| Skill | Purpose |
+|-------|---------|
+| `browse-fetch` | Fetch external API docs via headless Chromium (replaces context-hub) |
+| `browse-verify` | L5 browser verification — Playwright a11y tree assertions |
+| `atomic-commits` | One logical change per commit |
+| `code-completeness` | Find TODOs, stubs, and missing implementations |
+| `gap-discovery` | Surface missing requirements during ideation |
 ## More
 - [Concepts](docs/concepts.md) — Philosophy and flow in depth

package/bin/install.js CHANGED

@@ -184,7 +184,7 @@ async function main() {
   console.log('');
   console.log(`Installed to ${c.cyan}${CLAUDE_DIR}${c.reset}:`);
   console.log('  commands/df/     — /df:discover, /df:debate, /df:spec, /df:plan, /df:execute, /df:verify, /df:auto, /df:note, /df:resume, /df:update');
-  console.log('  skills/          — gap-discovery, atomic-commits, code-completeness, context-hub');
+  console.log('  skills/          — gap-discovery, atomic-commits, code-completeness, browse-fetch, browse-verify');
   console.log('  agents/          — reasoner (/df:auto — autonomous execution via /loop)');
   if (level === 'global') {
     console.log('  hooks/           — statusline, update checker, invariant checker');
@@ -469,7 +469,8 @@ async function uninstall() {
     'skills/atomic-commits',
     'skills/code-completeness',
     'skills/gap-discovery',
-    'skills/context-hub',
+    'skills/browse-fetch',
+    'skills/browse-verify',
     'agents/reasoner.md'
   ];

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "deepflow",
-  "version": "0.1.78",
+  "version": "0.1.80",
   "description": "Doing reveals what thinking can't predict — spec-driven iterative development for Claude Code",
   "keywords": [
     "claude",
@@ -39,5 +39,8 @@
   ],
   "engines": {
     "node": ">=16.0.0"
+  },
+  "dependencies": {
+    "playwright": "^1.58.2"
   }
 }

package/src/commands/df/auto-cycle.md CHANGED

@@ -111,7 +111,22 @@ Read the current file first (create if missing), merge the new values, and write
 After `/df:execute` returns, check whether the task was reverted (ratchet failed):
-**On revert (ratchet failed):**
+**What counts as a failure (increments counter):**
+```
+- L0 ✗ (build failed)
+- L1 ✗ (files missing)
+- L2 ✗ (coverage dropped)
+- L4 ✗ (tests failed)
+- L5 ✗ (browser assertions failed — both attempts)
+- L5 ✗ (flaky) (browser assertions failed on both attempts, different assertions)
+What does NOT count as a failure:
+- L5 — (no frontend): skipped, not a revert trigger
+- L5 ⚠ (passed on retry): treated as pass, resets counter
+```
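The classification added in this hunk can be read as a small decision rule feeding the per-task revert counter. The sketch below is one way to express it, assuming a hypothetical encoding of the symbols as status strings; the function names and the result shape are illustrative and do not appear in the package:

```javascript
// Hypothetical encoding of the symbols in the list above:
//   'fail'          = ✗
//   'flaky'         = ✗ (flaky)
//   'skipped'       = L5 with no frontend
//   'pass_on_retry' = ⚠ (passed on retry)
//   'pass'          = ✓
const FAILING_STATUSES = new Set(['fail', 'flaky']);

// True when any level result should trigger a revert.
function countsAsFailure(levelResults) {
  return levelResults.some((r) => FAILING_STATUSES.has(r.status));
}

// Returns the updated consecutive-revert count for one task:
// a failure increments it, anything else resets it to 0.
function nextRevertCount(previousCount, levelResults) {
  return countsAsFailure(levelResults) ? previousCount + 1 : 0;
}

console.log(nextRevertCount(2, [{ level: 'L4', status: 'fail' }]));          // 3
console.log(nextRevertCount(2, [{ level: 'L5', status: 'pass_on_retry' }])); // 0
console.log(nextRevertCount(1, [{ level: 'L5', status: 'skipped' }]));       // 0
```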
+**On revert (ratchet failed — any of L0 ✗, L1 ✗, L2 ✗, L4 ✗, L5 ✗, or L5 ✗ flaky):**
 ```
 1. Read .deepflow/auto-memory.yaml (create if missing)
@@ -126,7 +141,7 @@ After `/df:execute` returns, check whether the task was reverted (ratchet failed
      → Continue to step 4 (UPDATE REPORT) as normal
 ```
-**On success (ratchet passed):**
+**On success (ratchet passed — including L5 — no frontend or L5 ⚠ pass-on-retry):**
 ```
 1. Reset consecutive_reverts[task_id] to 0 in .deepflow/auto-memory.yaml
@@ -169,10 +184,10 @@ _Last updated: {YYYY-MM-DDTHH:MM:SSZ}_
 ## Cycle Log
-| Cycle | Task | Status | Commit / Revert | Reason | Timestamp |
-|-------|------|--------|-----------------|--------|-----------|
-| 1 | T1 | passed | abc1234 | — | 2025-01-15T10:00:00Z |
-| 2 | T2 | failed | reverted | tests failed: 2 of 24 | 2025-01-15T10:05:00Z |
+| Cycle | Task | Status | Commit / Revert | Delta | Reason | Timestamp |
+|-------|------|--------|-----------------|-------|--------|-----------|
+| 1 | T1 | passed | abc1234 | tests: 24→24, build: ok | — | 2025-01-15T10:00:00Z |
+| 2 | T2 | failed | reverted | tests: 24→22 (−2) | tests failed: 2 of 24 | 2025-01-15T10:05:00Z |
 ## Probe Results
@@ -202,13 +217,14 @@ _(tasks that were reverted with their failure reasons)_
 **Cycle Log — append one row:**
 ```
-| {cycle_number} | {task_id} | {status} | {commit_hash or "reverted"} | {reason or "—"} | {YYYY-MM-DDTHH:MM:SSZ} |
+| {cycle_number} | {task_id} | {status} | {commit_hash or "reverted"} | {delta} | {reason or "—"} | {YYYY-MM-DDTHH:MM:SSZ} |
 ```
 - `cycle_number`: total number of cycles executed so far (count existing data rows in the Cycle Log + 1)
 - `task_id`: task ID from PLAN.md, or `BOOTSTRAP` for bootstrap cycles
 - `status`: `passed` (ratchet passed), `failed` (ratchet failed, reverted), or `skipped` (task was already done)
 - `commit_hash`: short hash from the commit, or `reverted` if ratchet failed
+- `delta`: ratchet metric change from this cycle. Format: `tests: {before}→{after}, build: ok/fail`. Include coverage delta if available (e.g., `cov: 80%→82% (+2%)`). On revert, show the regression that triggered it (e.g., `tests: 24→22 (−2)`)
 - `reason`: failure reason from ratchet output (e.g., `"tests failed: 2 of 24"`), or `—` if passed
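The row template and field notes above can be sketched as two small formatters. This is an illustration only: the function names and input shape are hypothetical, and for simplicity the `delta` cell always includes the build status, whereas the sample revert row in the diff shows only the test regression.

```javascript
// Builds the `delta` cell, e.g. "tests: 24→24, build: ok".
function formatDelta({ testsBefore, testsAfter, buildOk }) {
  const change = testsAfter - testsBefore;
  // Annotate regressions only; U+2212 matches the minus sign in the sample rows.
  const suffix = change < 0 ? ` (\u2212${Math.abs(change)})` : '';
  return `tests: ${testsBefore}\u2192${testsAfter}${suffix}, build: ${buildOk ? 'ok' : 'fail'}`;
}

// Builds one Cycle Log row in the seven-column layout.
function formatCycleRow({ cycle, taskId, status, commit, delta, reason, timestamp }) {
  const cells = [
    cycle,
    taskId,
    status,
    commit ?? 'reverted', // no commit hash means the ratchet failed
    delta,
    reason ?? '\u2014',   // dash placeholder used when the cycle passed
    timestamp,
  ];
  return `| ${cells.join(' | ')} |`;
}

const exampleRow = formatCycleRow({
  cycle: 1,
  taskId: 'T1',
  status: 'passed',
  commit: 'abc1234',
  delta: formatDelta({ testsBefore: 24, testsAfter: 24, buildOk: true }),
  reason: null,
  timestamp: '2025-01-15T10:00:00Z',
});
console.log(exampleRow);
```

With these inputs the output reproduces the first sample row of the updated Cycle Log table.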
 **Summary table — recalculate from Cycle Log rows:**
@@ -259,10 +275,12 @@ done_count   = number of [x] tasks
 pending_count = number of [ ] tasks
 ```
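The two counts above could be derived from PLAN.md along these lines. This sketch assumes tasks are written as Markdown checkbox items (`- [x]` and `- [ ]`); the actual PLAN.md layout is not shown in this diff, and `countTasks` is a hypothetical helper.

```javascript
// Counts done and pending tasks in PLAN.md text.
// Assumes each task is a Markdown checkbox list item (an assumption,
// since the PLAN.md format is not part of this diff).
function countTasks(planText) {
  const lines = planText.split('\n');
  const done = lines.filter((line) => /^\s*[-*] \[x\]/i.test(line)).length;
  const pending = lines.filter((line) => /^\s*[-*] \[ \]/.test(line)).length;
  return { done, pending, total: done + pending };
}

const samplePlan = [
  '## Tasks',
  '- [x] T1: add upload endpoint',
  '- [x] T2: validate file size',
  '- [ ] T3: wire progress bar',
].join('\n');

console.log(countTasks(samplePlan)); // { done: 2, pending: 1, total: 3 }
```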
-**If ALL tasks are `[x]` (pending_count == 0):**
+**Note:** Per-spec verification and merge to main happens automatically in `/df:execute` (step 8) when all tasks for a spec complete. No separate verify call is needed here.
+**If no `[ ]` tasks remain (pending_count == 0):**
 ```
-→ Run /df:verify via Skill tool (skill: "df:verify", no args)
-→ Report: "All tasks complete. Verification triggered."
+→ Report: "All specs verified and merged. Workflow complete."
+→ Exit
 ```
 **If tasks remain (pending_count > 0):**
@@ -327,17 +345,14 @@ Updated .deepflow/auto-report.md:
 Cycle complete. 1 tasks remaining.
 ```
-### All Tasks Done (verify triggered)
+### All Tasks Done (workflow complete)
 ```
 /df:auto-cycle
-Loading PLAN.md... 3 tasks total, 3 done, 0 pending
+Loading PLAN.md... 0 tasks total, 0 done, 0 pending
-All tasks complete. Verification triggered.
-Running: /df:verify
-  ✓ L0 | ✓ L1 | ⚠ L2 (no coverage tool) | ✓ L4
-  ✓ Merged df/upload to main
+All specs verified and merged. Workflow complete.
 ```
 ### No Work Remaining (idempotent)
@@ -345,10 +360,9 @@ Running: /df:verify
 ```
 /df:auto-cycle
-Loading PLAN.md... 3 tasks total, 3 done, 0 pending
-Verification already complete (no doing-* specs found).
+Loading PLAN.md... 0 tasks total, 0 done, 0 pending
-Nothing to do. Cycle complete. 0 tasks remaining.
+All specs verified and merged. Workflow complete.
 ```
 ### Circuit Breaker Tripped