deepflow 0.1.79 → 0.1.81

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -33,6 +33,7 @@ Most spec-driven frameworks start from a finished spec and execute a static plan
  - **Spec as living hypothesis** — Core intent stays fixed, details refine through implementation. "The spec becomes bulletproof because you built it, not before."
  - **Parallel probes reveal the best path** — Uncertain approaches spawn parallel spikes in isolated worktrees. The machine selects the winner (fewer regressions > better coverage > fewer files changed). Failed approaches stay recorded and never repeat.
  - **Metrics decide, not opinions** — No LLM judges another LLM. Build, tests, typecheck, lint, and invariant checks are the only judges. After an agent commits, the orchestrator runs health checks. Pass = keep. Fail = revert + new hypothesis.
+ - **Browser verification closes the loop** — L5 launches headless Chromium via Playwright, captures the accessibility tree, and evaluates structured assertions extracted at plan-time from your spec's acceptance criteria. Deterministic pass/fail — no LLM calls during verification. Screenshots saved as evidence.
  - **The loop is the product** — Not "execute a plan" — "evolve the codebase toward the spec's goals through iterative cycles." Each cycle reveals what the previous one couldn't see.
 
  ## What We Learned by Doing
@@ -111,7 +112,7 @@ $ git log --oneline
  1. Runs `/df:plan` if no PLAN.md exists
  2. Snapshots pre-existing tests (ratchet baseline)
  3. Starts a loop (`/loop 1m /df:auto-cycle`) — fresh context each cycle
- 4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint/invariant-check)
+ 4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint/invariant-check/browser-verify)
  5. Pass = commit stands. Fail = revert + retry next cycle
  6. Circuit breaker: halts after N consecutive reverts on same task
  7. When all tasks done: runs `/df:verify`, merges to main
@@ -142,7 +143,7 @@ $ git log --oneline
  | `/df:spec <name>` | Generate spec from conversation |
  | `/df:plan` | Compare specs to code, create tasks |
  | `/df:execute` | Run tasks with parallel agents |
- | `/df:verify` | Check specs satisfied, merge to main |
+ | `/df:verify` | Check specs satisfied (L0-L5), merge to main |
  | `/df:note` | Capture decisions ad-hoc from conversation |
  | `/df:consolidate` | Deduplicate and clean up decisions.md |
  | `/df:resume` | Session continuity briefing |
@@ -179,12 +180,22 @@ your-project/
 
  1. **Discover before specifying, spike before implementing** — Ask, debate, probe — then commit
  2. **You define WHAT, AI figures out HOW** — Specs are the contract
- 3. **Metrics decide, not opinions** — Build/test/typecheck/lint/invariant-check are the only judges
+ 3. **Metrics decide, not opinions** — Build/test/typecheck/lint/invariant-check/browser-verify are the only judges
  4. **Confirm before assume** — Search the code before marking "missing"
  5. **Complete implementations** — No stubs, no placeholders
  6. **Atomic commits** — One task = one commit
  7. **Context-aware** — Checkpoint before limits, resume seamlessly
 
+ ## Skills
+
+ | Skill | Purpose |
+ |-------|---------|
+ | `browse-fetch` | Fetch external API docs via headless Chromium (replaces context-hub) |
+ | `browse-verify` | L5 browser verification — Playwright a11y tree assertions |
+ | `atomic-commits` | One logical change per commit |
+ | `code-completeness` | Find TODOs, stubs, and missing implementations |
+ | `gap-discovery` | Surface missing requirements during ideation |
+
  ## More
 
  - [Concepts](docs/concepts.md) — Philosophy and flow in depth
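The new L5 step described in this README evaluates plan-time assertions against a captured accessibility tree with no LLM in the loop. A minimal sketch of that kind of deterministic matching, assuming hypothetical `{ role, name, children }` node and `{ role, name }` assertion shapes (the browse-verify skill's actual formats may differ):

```javascript
// Recursively search an accessibility tree for a node matching an assertion.
// Node and assertion shapes here are illustrative, not deepflow's exact format.
function findNode(tree, assertion) {
  if (!tree) return false;
  if (tree.role === assertion.role && tree.name === assertion.name) return true;
  return (tree.children || []).some((child) => findNode(child, assertion));
}

function verify(tree, assertions) {
  // Pure pass/fail per assertion — no LLM involved.
  return assertions.map((a) => ({ ...a, pass: findNode(tree, a) }));
}

// Example: a tiny Playwright-style accessibility tree.
const tree = {
  role: 'WebArea', name: 'Checkout',
  children: [
    { role: 'button', name: 'Place order' },
    { role: 'textbox', name: 'Email' },
  ],
};
const results = verify(tree, [
  { role: 'button', name: 'Place order' },
  { role: 'alert', name: 'Error' },
]);
console.log(results.map((r) => r.pass)); // → [ true, false ]
```

Because the check is a plain tree search over structured data, the same input always yields the same verdict, which is what makes the pass/fail deterministic.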
package/bin/install.js CHANGED
@@ -184,7 +184,7 @@ async function main() {
  console.log('');
  console.log(`Installed to ${c.cyan}${CLAUDE_DIR}${c.reset}:`);
  console.log(' commands/df/ — /df:discover, /df:debate, /df:spec, /df:plan, /df:execute, /df:verify, /df:auto, /df:note, /df:resume, /df:update');
- console.log(' skills/ — gap-discovery, atomic-commits, code-completeness, context-hub');
+ console.log(' skills/ — gap-discovery, atomic-commits, code-completeness, browse-fetch, browse-verify');
  console.log(' agents/ — reasoner (/df:auto — autonomous execution via /loop)');
  if (level === 'global') {
  console.log(' hooks/ — statusline, update checker, invariant checker');
@@ -469,7 +469,8 @@ async function uninstall() {
  'skills/atomic-commits',
  'skills/code-completeness',
  'skills/gap-discovery',
- 'skills/context-hub',
+ 'skills/browse-fetch',
+ 'skills/browse-verify',
  'agents/reasoner.md'
  ];
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "deepflow",
- "version": "0.1.79",
+ "version": "0.1.81",
  "description": "Doing reveals what thinking can't predict — spec-driven iterative development for Claude Code",
  "keywords": [
  "claude",
@@ -39,5 +39,8 @@
  ],
  "engines": {
  "node": ">=16.0.0"
+ },
+ "dependencies": {
+ "playwright": "^1.58.2"
  }
  }
@@ -111,7 +111,22 @@ Read the current file first (create if missing), merge the new values, and write
 
  After `/df:execute` returns, check whether the task was reverted (ratchet failed):
 
- **On revert (ratchet failed):**
+ **What counts as a failure (increments counter):**
+
+ ```
+ - L0 ✗ (build failed)
+ - L1 ✗ (files missing)
+ - L2 ✗ (coverage dropped)
+ - L4 ✗ (tests failed)
+ - L5 ✗ (browser assertions failed — both attempts)
+ - L5 ✗ (flaky) — browser assertions failed on both attempts, with different assertions failing each time
+
+ What does NOT count as a failure:
+ - L5 — (no frontend): skipped, not a revert trigger
+ - L5 ⚠ (passed on retry): treated as pass, resets counter
+ ```
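The counting rules above reduce to a pure counter update. A sketch, using hypothetical status labels for the L0-L5 check outcomes rather than deepflow's exact notation:

```javascript
// Statuses that increment consecutive_reverts; labels are illustrative.
const FAILURES = new Set(['L0_fail', 'L1_fail', 'L2_fail', 'L4_fail', 'L5_fail', 'L5_flaky']);

function updateConsecutiveReverts(counter, status) {
  if (FAILURES.has(status)) return counter + 1; // failure: increment
  // 'L5_skipped' (no frontend) and 'L5_pass_on_retry' count as passes
  return 0;                                     // any pass resets the counter
}
```

Framed this way, the circuit breaker simply halts when the returned counter reaches N.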
+
+ **On revert (ratchet failed — any of L0 ✗, L1 ✗, L2 ✗, L4 ✗, L5 ✗, or L5 ✗ flaky):**
 
  ```
  1. Read .deepflow/auto-memory.yaml (create if missing)
@@ -126,7 +141,7 @@ After `/df:execute` returns, check whether the task was reverted (ratchet failed
  → Continue to step 4 (UPDATE REPORT) as normal
  ```
 
- **On success (ratchet passed):**
+ **On success (ratchet passed — including L5 — no frontend or L5 ⚠ pass-on-retry):**
 
  ```
  1. Reset consecutive_reverts[task_id] to 0 in .deepflow/auto-memory.yaml
@@ -104,7 +104,14 @@ Before spawning: `TaskUpdate(taskId: native_id, status: "in_progress")` — acti
 
  **NEVER use `isolation: "worktree"` on Task calls.** Deepflow manages a shared worktree so wave 2 sees wave 1 commits.
 
- **Spawn ALL ready tasks in ONE message.** Same-file conflicts: spawn sequentially.
+ **Spawn ALL ready tasks in ONE message** — EXCEPT file conflicts (see below).
+
+ **File conflict enforcement (1 file = 1 writer):**
+ Before spawning, check `Files:` lists of all ready tasks. If two+ ready tasks share a file:
+ 1. Sort conflicting tasks by task number (T1 < T2 < T3)
+ 2. Spawn only the lowest-numbered task from each conflict group
+ 3. Remaining tasks stay `pending` — they become ready once the spawned task completes
+ 4. Log: `"⏳ T{N} deferred — file conflict with T{M} on (unknown)"`
 
  **≥2 [SPIKE] tasks for same problem:** Follow Parallel Spike Probes (section 5.7).
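The 1-file-1-writer enforcement above amounts to picking one writer per file per wave. A sketch, assuming a hypothetical task shape with `id` and `files`:

```javascript
// From the ready set, spawn only the lowest-numbered task in each
// file-conflict group; everything else stays pending for the next wave.
function selectSpawnable(readyTasks) {
  // readyTasks: [{ id: 'T3', files: ['config.go'] }, ...]
  const byNumber = [...readyTasks].sort(
    (a, b) => Number(a.id.slice(1)) - Number(b.id.slice(1))
  );
  const claimed = new Set(); // files that already have a writer this wave
  const spawn = [], deferred = [];
  for (const task of byNumber) {
    if (task.files.some((f) => claimed.has(f))) {
      deferred.push(task.id); // becomes ready once the writer completes
    } else {
      task.files.forEach((f) => claimed.add(f));
      spawn.push(task.id);
    }
  }
  return { spawn, deferred };
}
```

Sorting first guarantees that, within a conflict group, the lowest task number always wins the wave.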
 
@@ -138,7 +145,7 @@ Ratchet uses ONLY pre-existing test files from `.deepflow/auto-snapshot.txt`.
  Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothesis.
 
  1. **Baseline:** Record `BASELINE=$(git rev-parse HEAD)` in shared worktree
- 2. **Sub-worktrees:** Per spike: `git worktree add -b df/{spec}/probe-{SPIKE_ID} .deepflow/worktrees/{spec}/probe-{SPIKE_ID} ${BASELINE}`
+ 2. **Sub-worktrees:** Per spike: `git worktree add -b df/{spec}--probe-{SPIKE_ID} .deepflow/worktrees/{spec}/probe-{SPIKE_ID} ${BASELINE}`
  3. **Spawn:** All probes in ONE message, each targeting its probe worktree. End turn.
  4. **Ratchet:** Per notification, run standard ratchet (5.5) in probe worktree. Record: ratchet_passed, regressions, coverage_delta, files_changed, commit
  5. **Select winner** (after ALL complete, no LLM judge):
@@ -146,9 +153,17 @@ Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothes
  - Rank: fewer regressions > higher coverage_delta > fewer files_changed > first to complete
  - No passes → reset all to pending for retry with debugger
  6. **Preserve all worktrees.** Losers: rename branch + `-failed` suffix. Record in checkpoint.json under `"spike_probes"`
- 7. **Log failed probes** to `.deepflow/auto-memory.yaml` (main tree):
+ 7. **Log ALL probe outcomes** to `.deepflow/auto-memory.yaml` (main tree):
  ```yaml
  spike_insights:
+ - date: "YYYY-MM-DD"
+ spec: "{spec_name}"
+ spike_id: "SPIKE_A"
+ hypothesis: "{from PLAN.md}"
+ outcome: "winner"
+ approach: "{one-sentence summary of what the winning probe chose}"
+ ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
+ branch: "df/{spec}--probe-SPIKE_A"
  - date: "YYYY-MM-DD"
  spec: "{spec_name}"
  spike_id: "SPIKE_B"
@@ -157,13 +172,16 @@ Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothes
  failure_reason: "{first failed check + error summary}"
  ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
  worktree: ".deepflow/worktrees/{spec}/probe-SPIKE_B-failed"
- branch: "df/{spec}/probe-SPIKE_B-failed"
- probe_learnings: # read by /df:auto-cycle each start
+ branch: "df/{spec}--probe-SPIKE_B-failed"
+ probe_learnings: # read by /df:auto-cycle each start AND included in per-task preamble
+ - spike: "SPIKE_A"
+ probe: "probe-SPIKE_A"
+ insight: "{one-sentence summary of winning approach — e.g. 'Use Node.js over Bun for Playwright'}"
  - spike: "SPIKE_B"
  probe: "probe-SPIKE_B"
  insight: "{one-sentence summary from failure_reason}"
  ```
- Create file if missing. Preserve existing keys when merging.
+ Create file if missing. Preserve existing keys when merging. Log BOTH winners and losers — downstream tasks need to know what was chosen, not just what failed.
  8. **Promote winner:** Cherry-pick into shared worktree. Winner → `[x] [PROBE_WINNER]`, losers → `[~] [PROBE_FAILED]`. Resume standard loop.
 
  ---
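The metric-only winner selection in step 5 can be written as a plain comparator. The probe record shape below is a hypothetical sketch; the ordering follows the stated ranking (fewer regressions > higher coverage_delta > fewer files_changed > first to complete):

```javascript
// Select the winning spike probe from ratchet metrics alone — no LLM judge.
function selectWinner(probes) {
  const passed = probes.filter((p) => p.ratchet_passed);
  if (passed.length === 0) return null; // no passes → reset all to pending
  passed.sort((a, b) =>
    a.regressions - b.regressions ||      // fewer regressions first
    b.coverage_delta - a.coverage_delta || // then higher coverage delta
    a.files_changed - b.files_changed ||   // then fewer files changed
    a.completed_order - b.completed_order  // then first to complete
  );
  return passed[0];
}
```

Chained subtractions in the comparator fall through to the next criterion only on exact ties, matching the `>` precedence in the ranking.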
@@ -176,10 +194,15 @@ Working directory: {worktree_absolute_path}
  All file operations MUST use this absolute path as base. Do NOT write files to the main project directory.
  Commit format: {commit_type}({spec}): {description}
 
+ {If .deepflow/auto-memory.yaml exists and has probe_learnings, include:}
+ Spike results (follow these approaches):
+ {each probe_learning with outcome "winner" → "- {insight}"}
+ {Omit this block if no probe_learnings exist.}
+
  STOP after committing. Do NOT merge branches, rename spec files, remove worktrees, or run git checkout on main.
  ```
 
- **Standard Task:**
+ **Standard Task** (spawn with `Agent(model="{Model from PLAN.md}", ...)`):
  ```
  {task_id}: {description from PLAN.md}
  Files: {target files} Spec: {spec_name}
@@ -252,14 +275,21 @@ When all tasks done for a `doing-*` spec:
  ## Skills & Agents
 
  - Skill: `atomic-commits` — Clean commit protocol
- - Skill: `context-hub` — Fetch external API docs before coding
+ - Skill: `browse-fetch` — Fetch live web pages and external API docs via browser before coding
 
  | Agent | subagent_type | Purpose |
  |-------|---------------|---------|
  | Implementation | `general-purpose` | Task implementation |
  | Debugger | `reasoner` | Debugging failures |
 
- **Model routing:** Use `model:` from command/agent/skill frontmatter. Default: `sonnet`.
+ **Model routing:** Read `Model:` field from each task block in PLAN.md. Pass as `model:` parameter when spawning the agent. Default: `sonnet` if field is missing.
+
+ | Task field | Agent call |
+ |------------|-----------|
+ | `Model: haiku` | `Agent(model="haiku", ...)` |
+ | `Model: sonnet` | `Agent(model="sonnet", ...)` |
+ | `Model: opus` | `Agent(model="opus", ...)` |
+ | (missing) | `Agent(model="sonnet", ...)` |
 
  **Checkpoint schema:** `.deepflow/checkpoint.json` in worktree:
  ```json
@@ -99,12 +99,59 @@ For each file in a task's "Files:" list, find the full blast radius.
  Files outside original "Files:" → add with `(impact — verify/update)`.
  Skip for spike tasks.
 
+ ### 4.6. CROSS-TASK FILE CONFLICT DETECTION
+
+ After all tasks have their `Files:` lists, detect overlaps that require sequential execution.
+
+ **Algorithm:**
+ 1. Build a map: `file → [task IDs that list it]`
+ 2. For each file with >1 task: add `Blocked by` edge from later task → earlier task (by task number)
+ 3. If a dependency already exists (direct or transitive), skip (no redundant edges)
+
+ **Example:**
+ ```
+ T1: Files: config.go, feature.go — Blocked by: none
+ T3: Files: config.go — Blocked by: none
+ T5: Files: config.go — Blocked by: none
+ ```
+ After conflict detection:
+ ```
+ T1: Blocked by: none
+ T3: Blocked by: T1 (file conflict: config.go)
+ T5: Blocked by: T3 (file conflict: config.go)
+ ```
+
+ **Rules:**
+ - Only add the minimum edges needed (chain, not full mesh — T5 blocks on T3, not T1+T3)
+ - Append `(file conflict: (unknown))` to the Blocked by reason for traceability
+ - If a logical dependency already covers the ordering, don't add a redundant conflict edge
+ - Cross-spec conflicts: tasks from different specs sharing files get the same treatment
+
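The chain-edge algorithm in section 4.6 can be sketched as below. The task shape is a hypothetical simplification, and the rule-3 skip for pre-existing transitive dependencies is omitted for brevity:

```javascript
// For each shared file, block each task on its nearest earlier task only —
// a chain per file, not a full mesh of edges.
function addConflictEdges(tasks) {
  // tasks: [{ id: 'T1', files: ['config.go'], blockedBy: [] }, ...]
  const byFile = new Map(); // file → tasks that list it
  for (const t of tasks) {
    for (const f of t.files) {
      if (!byFile.has(f)) byFile.set(f, []);
      byFile.get(f).push(t);
    }
  }
  for (const [file, group] of byFile) {
    group.sort((a, b) => Number(a.id.slice(1)) - Number(b.id.slice(1)));
    for (let i = 1; i < group.length; i++) {
      // Block on the immediately preceding writer of this file.
      group[i].blockedBy.push(`${group[i - 1].id} (file conflict: ${file})`);
    }
  }
  return tasks;
}
```

Running this over the T1/T3/T5 example reproduces the chain shown above: T3 blocks on T1, and T5 blocks on T3 only.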
  ### 5. COMPARE & PRIORITIZE
 
  Spawn `Task(subagent_type="reasoner", model="opus")`. Map each requirement to DONE / PARTIAL / MISSING / CONFLICT. Check REQ-AC alignment. Flag spec gaps.
 
  Priority: Dependencies → Impact → Risk
 
+ ### 5.5. CLASSIFY MODEL PER TASK
+
+ For each task, assign `Model:` based on complexity signals:
+
+ | Model | When | Signals |
+ |-------|------|---------|
+ | `haiku` | Mechanical / low-risk | Single file, config changes, renames, formatting, browse-fetch, simple additions with clear pattern to follow |
+ | `sonnet` | Standard implementation | Feature work, bug fixes, refactoring, multi-file changes with clear specs |
+ | `opus` | High complexity | Architecture changes, complex multi-file refactors, ambiguous specs, unfamiliar APIs, >5 files in Impact |
+
+ **Decision inputs:**
+ 1. **File count** — 1 file → likely haiku/sonnet, >5 files → sonnet/opus
+ 2. **Impact blast radius** — many callers/duplicates → raise complexity
+ 3. **Spec clarity** — clear ACs with patterns → lower, ambiguous requirements → raise
+ 4. **Type** — spikes always `sonnet` (need reasoning but scoped), bootstrap → `haiku`
+ 5. **Has prior failures** — reverted tasks → raise one level (min `sonnet`)
+
+ Add `Model: haiku|sonnet|opus` to each task block. Default: `sonnet` if unclear.
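The classification rules above can be sketched as a small routing function. The task fields and thresholds here are illustrative assumptions layered on the table, not deepflow's actual heuristics:

```javascript
// Map complexity signals to a model tier, following the 5.5 table:
// haiku = mechanical, sonnet = standard (default), opus = high complexity.
const LEVELS = ['haiku', 'sonnet', 'opus'];

function classifyModel(task) {
  if (task.type === 'spike') return 'sonnet';     // spikes: reasoning, but scoped
  if (task.type === 'bootstrap') return 'haiku';
  let level = 1;                                  // default: sonnet
  if (task.files === 1 && task.clearSpec && task.impactFiles <= 1) level = 0;
  if (task.impactFiles > 5 || task.ambiguousSpec) level = 2;
  if (task.priorReverts > 0) level = Math.max(1, Math.min(2, level + 1)); // min sonnet
  return LEVELS[level];
}
```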
+
 
  ### 6. GENERATE SPIKE TASKS (IF NEEDED)
  **Spike Task Format:**
@@ -200,6 +247,7 @@ Always use `Task` tool with explicit `subagent_type` and `model`.
 
  - [ ] **T2**: Create upload endpoint
  - Files: src/api/upload.ts
+ - Model: sonnet
  - Impact:
  - Callers: src/routes/index.ts:5
  - Duplicates: backend/legacy-upload.go [dead — DELETE]
@@ -207,5 +255,6 @@ Always use `Task` tool with explicit `subagent_type` and `model`.
 
  - [ ] **T3**: Add S3 service with streaming
  - Files: src/services/storage.ts
+ - Model: opus
  - Blocked by: T1, T2
  ```