deepflow 0.1.79 → 0.1.80
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +14 -3
- package/bin/install.js +3 -2
- package/package.json +4 -1
- package/src/commands/df/auto-cycle.md +17 -2
- package/src/commands/df/execute.md +11 -4
- package/src/commands/df/plan.md +28 -0
- package/src/commands/df/verify.md +433 -3
- package/src/skills/browse-fetch/SKILL.md +258 -0
- package/src/skills/browse-verify/SKILL.md +264 -0
- package/templates/config-template.yaml +14 -0
- package/src/skills/context-hub/SKILL.md +0 -87
package/README.md
CHANGED

@@ -33,6 +33,7 @@ Most spec-driven frameworks start from a finished spec and execute a static plan
 - **Spec as living hypothesis** — Core intent stays fixed, details refine through implementation. "The spec becomes bulletproof because you built it, not before."
 - **Parallel probes reveal the best path** — Uncertain approaches spawn parallel spikes in isolated worktrees. The machine selects the winner (fewer regressions > better coverage > fewer files changed). Failed approaches stay recorded and never repeat.
 - **Metrics decide, not opinions** — No LLM judges another LLM. Build, tests, typecheck, lint, and invariant checks are the only judges. After an agent commits, the orchestrator runs health checks. Pass = keep. Fail = revert + new hypothesis.
+- **Browser verification closes the loop** — L5 launches headless Chromium via Playwright, captures the accessibility tree, and evaluates structured assertions extracted at plan-time from your spec's acceptance criteria. Deterministic pass/fail — no LLM calls during verification. Screenshots saved as evidence.
 - **The loop is the product** — Not "execute a plan" — "evolve the codebase toward the spec's goals through iterative cycles." Each cycle reveals what the previous one couldn't see.
 
 ## What We Learned by Doing

@@ -111,7 +112,7 @@ $ git log --oneline
 1. Runs `/df:plan` if no PLAN.md exists
 2. Snapshots pre-existing tests (ratchet baseline)
 3. Starts a loop (`/loop 1m /df:auto-cycle`) — fresh context each cycle
-4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint/invariant-check)
+4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint/invariant-check/browser-verify)
 5. Pass = commit stands. Fail = revert + retry next cycle
 6. Circuit breaker: halts after N consecutive reverts on same task
 7. When all tasks done: runs `/df:verify`, merges to main

@@ -142,7 +143,7 @@ $ git log --oneline
 | `/df:spec <name>` | Generate spec from conversation |
 | `/df:plan` | Compare specs to code, create tasks |
 | `/df:execute` | Run tasks with parallel agents |
-| `/df:verify` | Check specs satisfied, merge to main |
+| `/df:verify` | Check specs satisfied (L0-L5), merge to main |
 | `/df:note` | Capture decisions ad-hoc from conversation |
 | `/df:consolidate` | Deduplicate and clean up decisions.md |
 | `/df:resume` | Session continuity briefing |

@@ -179,12 +180,22 @@ your-project/
 
 1. **Discover before specifying, spike before implementing** — Ask, debate, probe — then commit
 2. **You define WHAT, AI figures out HOW** — Specs are the contract
-3. **Metrics decide, not opinions** — Build/test/typecheck/lint/invariant-check are the only judges
+3. **Metrics decide, not opinions** — Build/test/typecheck/lint/invariant-check/browser-verify are the only judges
 4. **Confirm before assume** — Search the code before marking "missing"
 5. **Complete implementations** — No stubs, no placeholders
 6. **Atomic commits** — One task = one commit
 7. **Context-aware** — Checkpoint before limits, resume seamlessly
 
+## Skills
+
+| Skill | Purpose |
+|-------|---------|
+| `browse-fetch` | Fetch external API docs via headless Chromium (replaces context-hub) |
+| `browse-verify` | L5 browser verification — Playwright a11y tree assertions |
+| `atomic-commits` | One logical change per commit |
+| `code-completeness` | Find TODOs, stubs, and missing implementations |
+| `gap-discovery` | Surface missing requirements during ideation |
+
 ## More
 
 - [Concepts](docs/concepts.md) — Philosophy and flow in depth
package/bin/install.js
CHANGED

@@ -184,7 +184,7 @@ async function main() {
   console.log('');
   console.log(`Installed to ${c.cyan}${CLAUDE_DIR}${c.reset}:`);
   console.log('  commands/df/ — /df:discover, /df:debate, /df:spec, /df:plan, /df:execute, /df:verify, /df:auto, /df:note, /df:resume, /df:update');
-  console.log('  skills/ — gap-discovery, atomic-commits, code-completeness,
+  console.log('  skills/ — gap-discovery, atomic-commits, code-completeness, browse-fetch, browse-verify');
   console.log('  agents/ — reasoner (/df:auto — autonomous execution via /loop)');
   if (level === 'global') {
     console.log('  hooks/ — statusline, update checker, invariant checker');

@@ -469,7 +469,8 @@ async function uninstall() {
   'skills/atomic-commits',
   'skills/code-completeness',
   'skills/gap-discovery',
-  'skills/
+  'skills/browse-fetch',
+  'skills/browse-verify',
   'agents/reasoner.md'
 ];
package/package.json
CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "deepflow",
-  "version": "0.1.79",
+  "version": "0.1.80",
   "description": "Doing reveals what thinking can't predict — spec-driven iterative development for Claude Code",
   "keywords": [
     "claude",

@@ -39,5 +39,8 @@
   ],
   "engines": {
     "node": ">=16.0.0"
+  },
+  "dependencies": {
+    "playwright": "^1.58.2"
   }
 }
package/src/commands/df/auto-cycle.md
CHANGED

@@ -111,7 +111,22 @@ Read the current file first (create if missing), merge the new values, and write
 
 After `/df:execute` returns, check whether the task was reverted (ratchet failed):
 
-**
+**What counts as a failure (increments counter):**
+
+```
+- L0 ✗ (build failed)
+- L1 ✗ (files missing)
+- L2 ✗ (coverage dropped)
+- L4 ✗ (tests failed)
+- L5 ✗ (browser assertions failed — both attempts)
+- L5 ✗ (flaky — browser assertions failed on both attempts, different assertions each time)
+
+What does NOT count as a failure:
+- L5 — (no frontend): skipped, not a revert trigger
+- L5 ⚠ (passed on retry): treated as pass, resets counter
+```
+
+**On revert (ratchet failed — any of L0 ✗, L1 ✗, L2 ✗, L4 ✗, L5 ✗, or L5 ✗ flaky):**
 
 ```
 1. Read .deepflow/auto-memory.yaml (create if missing)

@@ -126,7 +141,7 @@ After `/df:execute` returns, check whether the task was reverted (ratchet failed
    → Continue to step 4 (UPDATE REPORT) as normal
 ```
 
-**On success (ratchet passed):**
+**On success (ratchet passed — including L5 — no frontend or L5 ⚠ pass-on-retry):**
 
 ```
 1. Reset consecutive_reverts[task_id] to 0 in .deepflow/auto-memory.yaml
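The counting rules above amount to a small piece of bookkeeping over `.deepflow/auto-memory.yaml`. A minimal sketch — the result tokens and the function name are illustrative, not deepflow's actual identifiers:

```javascript
// Ratchet outcomes that trigger a revert and increment the circuit-breaker counter
// (hypothetical token names; deepflow's real representation may differ).
const REVERT_RESULTS = new Set([
  'L0-fail', 'L1-fail', 'L2-fail', 'L4-fail',
  'L5-fail', 'L5-fail-flaky',
]);

// Skips ("L5 — no frontend") and pass-on-retry ("L5 ⚠") count as success: reset to 0.
function updateConsecutiveReverts(memory, taskId, result) {
  const counts = memory.consecutive_reverts ?? (memory.consecutive_reverts = {});
  counts[taskId] = REVERT_RESULTS.has(result) ? (counts[taskId] ?? 0) + 1 : 0;
  return counts[taskId];
}
```

The circuit breaker then halts the loop once a task's counter reaches the configured N.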
package/src/commands/df/execute.md
CHANGED

@@ -104,7 +104,14 @@ Before spawning: `TaskUpdate(taskId: native_id, status: "in_progress")` — acti
 
 **NEVER use `isolation: "worktree"` on Task calls.** Deepflow manages a shared worktree so wave 2 sees wave 1 commits.
 
-**Spawn ALL ready tasks in ONE message
+**Spawn ALL ready tasks in ONE message** — EXCEPT file conflicts (see below).
+
+**File conflict enforcement (1 file = 1 writer):**
+Before spawning, check `Files:` lists of all ready tasks. If two+ ready tasks share a file:
+1. Sort conflicting tasks by task number (T1 < T2 < T3)
+2. Spawn only the lowest-numbered task from each conflict group
+3. Remaining tasks stay `pending` — they become ready once the spawned task completes
+4. Log: `"⏳ T{N} deferred — file conflict with T{M} on (unknown)"`
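The enforcement steps above can be sketched as a spawn filter — illustrative only; the `{num, files}` task shape and the function name are assumptions:

```javascript
// 1 file = 1 writer: from the ready set, spawn only the lowest-numbered task
// in each file-conflict group; everything else stays pending for a later wave.
function selectSpawnable(readyTasks) {
  const claimed = new Set(); // files owned by a task already chosen this wave
  const spawn = [];
  const deferred = [];
  for (const task of [...readyTasks].sort((a, b) => a.num - b.num)) {
    if (task.files.some(f => claimed.has(f))) {
      deferred.push(task); // becomes ready once the conflicting task completes
    } else {
      task.files.forEach(f => claimed.add(f));
      spawn.push(task);
    }
  }
  return { spawn, deferred };
}
```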
 
 **≥2 [SPIKE] tasks for same problem:** Follow Parallel Spike Probes (section 5.7).
 
@@ -138,7 +145,7 @@ Ratchet uses ONLY pre-existing test files from `.deepflow/auto-snapshot.txt`.
 
 Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothesis.
 
 1. **Baseline:** Record `BASELINE=$(git rev-parse HEAD)` in shared worktree
-2. **Sub-worktrees:** Per spike: `git worktree add -b df/{spec}
+2. **Sub-worktrees:** Per spike: `git worktree add -b df/{spec}--probe-{SPIKE_ID} .deepflow/worktrees/{spec}/probe-{SPIKE_ID} ${BASELINE}`
 3. **Spawn:** All probes in ONE message, each targeting its probe worktree. End turn.
 4. **Ratchet:** Per notification, run standard ratchet (5.5) in probe worktree. Record: ratchet_passed, regressions, coverage_delta, files_changed, commit
 5. **Select winner** (after ALL complete, no LLM judge):
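The selection order the README states (fewer regressions > better coverage > fewer files changed, with failed ratchets out of the running) can be written as a plain comparator over the metrics recorded in step 4 — a sketch; the record shape and function names are assumptions:

```javascript
// Deterministic probe ranking — no LLM judge. Sort ascending; index 0 wins.
function compareProbes(a, b) {
  if (a.ratchet_passed !== b.ratchet_passed) return a.ratchet_passed ? -1 : 1; // ratchet failures sort last
  if (a.regressions !== b.regressions) return a.regressions - b.regressions;   // fewer regressions wins
  if (a.coverage_delta !== b.coverage_delta) return b.coverage_delta - a.coverage_delta; // higher coverage wins
  return a.files_changed - b.files_changed;                                    // fewer files changed wins
}

function selectWinner(probes) {
  return [...probes].sort(compareProbes)[0];
}
```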
@@ -157,7 +164,7 @@ Trigger: ≥2 [SPIKE] tasks with same "Blocked by:" target or identical hypothes
 
     failure_reason: "{first failed check + error summary}"
     ratchet_metrics: {regressions: N, coverage_delta: N, files_changed: N}
     worktree: ".deepflow/worktrees/{spec}/probe-SPIKE_B-failed"
-    branch: "df/{spec}
+    branch: "df/{spec}--probe-SPIKE_B-failed"
     probe_learnings: # read by /df:auto-cycle each start
       - spike: "SPIKE_B"
         probe: "probe-SPIKE_B"

@@ -252,7 +259,7 @@ When all tasks done for a `doing-*` spec:
 
 ## Skills & Agents
 
 - Skill: `atomic-commits` — Clean commit protocol
-- Skill: `
+- Skill: `browse-fetch` — Fetch live web pages and external API docs via browser before coding
 
 | Agent | subagent_type | Purpose |
 |-------|---------------|---------|
package/src/commands/df/plan.md
CHANGED

@@ -99,6 +99,34 @@ For each file in a task's "Files:" list, find the full blast radius.
 
 Files outside original "Files:" → add with `(impact — verify/update)`.
 Skip for spike tasks.
 
+### 4.6. CROSS-TASK FILE CONFLICT DETECTION
+
+After all tasks have their `Files:` lists, detect overlaps that require sequential execution.
+
+**Algorithm:**
+1. Build a map: `file → [task IDs that list it]`
+2. For each file with >1 task: add `Blocked by` edge from later task → earlier task (by task number)
+3. If a dependency already exists (direct or transitive), skip (no redundant edges)
+
+**Example:**
+```
+T1: Files: config.go, feature.go — Blocked by: none
+T3: Files: config.go — Blocked by: none
+T5: Files: config.go — Blocked by: none
+```
+After conflict detection:
+```
+T1: Blocked by: none
+T3: Blocked by: T1 (file conflict: config.go)
+T5: Blocked by: T3 (file conflict: config.go)
+```
+
+**Rules:**
+- Only add the minimum edges needed (chain, not full mesh — T5 blocks on T3, not T1+T3)
+- Append `(file conflict: (unknown))` to the Blocked by reason for traceability
+- If a logical dependency already covers the ordering, don't add a redundant conflict edge
+- Cross-spec conflicts: tasks from different specs sharing files get the same treatment
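A sketch of steps 1–3 above — illustrative only; the `{num, files, blockedBy}` task shape and function name are assumptions, and the transitive-dependency check is omitted for brevity. Linking each later task to the nearest earlier writer of the same file produces exactly the chained T1 → T3 → T5 example:

```javascript
// Chain, not mesh: each later task blocks only on the nearest earlier task
// that writes the same file (T5 -> T3, not T5 -> T1+T3).
function addConflictEdges(tasks) {
  const latestWriter = new Map(); // file -> most recent (by task number) task writing it
  for (const task of [...tasks].sort((a, b) => a.num - b.num)) {
    for (const file of task.files) {
      const prev = latestWriter.get(file);
      if (prev && !task.blockedBy.includes(prev.num)) {
        task.blockedBy.push(prev.num); // direct edge only; transitive skip omitted here
      }
      latestWriter.set(file, task);
    }
  }
  return tasks;
}
```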
 
 ### 5. COMPARE & PRIORITIZE
 
 Spawn `Task(subagent_type="reasoner", model="opus")`. Map each requirement to DONE / PARTIAL / MISSING / CONFLICT. Check REQ-AC alignment. Flag spec gaps.
package/src/commands/df/verify.md
CHANGED

@@ -131,16 +131,444 @@ Run AFTER L0 passes and L1-L2 complete. Run even if L1-L2 found issues.
 
 **Flaky test handling** (if `quality.test_retry_on_fail: true` in config):
 - Re-run ONCE on failure. Second pass → "⚠ L4: Passed on retry (possible flaky test)". Second fail → genuine failure.
 
+**L5: Browser Verification** (if frontend detected)
+
+**Step 1: Detect frontend framework** (config override always wins):
+
+```bash
+BROWSER_VERIFY=$(yq '.quality.browser_verify' .deepflow/config.yaml 2>/dev/null)
+
+if [ "${BROWSER_VERIFY}" = "false" ]; then
+  # Explicitly disabled — skip L5 unconditionally
+  echo "L5 — (no frontend)"
+  L5_RESULT="skipped-no-frontend"
+elif [ "${BROWSER_VERIFY}" = "true" ]; then
+  # Explicitly enabled — proceed even without frontend deps
+  FRONTEND_DETECTED=true
+  FRONTEND_FRAMEWORK="configured"
+else
+  # Auto-detect from package.json (both dependencies and devDependencies)
+  FRONTEND_DETECTED=false
+  FRONTEND_FRAMEWORK=""
+
+  if [ -f package.json ]; then
+    # Check for React / Next.js
+    if jq -e '(.dependencies + (.devDependencies // {})) | keys[] | select(. == "react" or . == "react-dom" or . == "next")' package.json >/dev/null 2>&1; then
+      FRONTEND_DETECTED=true
+      # Prefer Next.js label when next is present
+      if jq -e '(.dependencies + (.devDependencies // {}))["next"]' package.json >/dev/null 2>&1; then
+        FRONTEND_FRAMEWORK="Next.js"
+      else
+        FRONTEND_FRAMEWORK="React"
+      fi
+    # Check for Nuxt / Vue
+    elif jq -e '(.dependencies + (.devDependencies // {})) | keys[] | select(. == "vue" or . == "nuxt" or startswith("@vue/"))' package.json >/dev/null 2>&1; then
+      FRONTEND_DETECTED=true
+      if jq -e '(.dependencies + (.devDependencies // {}))["nuxt"]' package.json >/dev/null 2>&1; then
+        FRONTEND_FRAMEWORK="Nuxt"
+      else
+        FRONTEND_FRAMEWORK="Vue"
+      fi
+    # Check for Svelte / SvelteKit
+    elif jq -e '(.dependencies + (.devDependencies // {})) | keys[] | select(. == "svelte" or startswith("@sveltejs/"))' package.json >/dev/null 2>&1; then
+      FRONTEND_DETECTED=true
+      if jq -e '(.dependencies + (.devDependencies // {}))["@sveltejs/kit"]' package.json >/dev/null 2>&1; then
+        FRONTEND_FRAMEWORK="SvelteKit"
+      else
+        FRONTEND_FRAMEWORK="Svelte"
+      fi
+    fi
+  fi
+
+  if [ "${FRONTEND_DETECTED}" = "false" ]; then
+    echo "L5 — (no frontend)"
+    L5_RESULT="skipped-no-frontend"
+  fi
+fi
+```
+
+Packages checked in both `dependencies` and `devDependencies`:
+
+| Package(s) | Detected Framework |
+|------------|--------------------|
+| `next` | Next.js |
+| `react`, `react-dom` | React |
+| `nuxt` | Nuxt |
+| `vue`, `@vue/*` | Vue |
+| `@sveltejs/kit` | SvelteKit |
+| `svelte`, `@sveltejs/*` | Svelte |
+
+Config key `quality.browser_verify`:
+- `false` → always skip L5, output `L5 — (no frontend)`, even if frontend deps are present
+- `true` → always run L5, even if no frontend deps detected
+- absent → auto-detect from package.json as above
+
+No frontend deps found and `quality.browser_verify` not set → output `L5 — (no frontend)`, skip all remaining L5 steps.
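Taken together with the keys read in the later steps, the `quality:` section of `.deepflow/config.yaml` might look like this — an illustrative fragment assembled from the keys referenced in this command; the authoritative defaults live in `templates/config-template.yaml`:

```yaml
quality:
  test_retry_on_fail: true    # re-run failed tests once (L4 flaky handling)
  browser_verify: true        # true = force L5, false = always skip, absent = auto-detect
  dev_command: "npm run dev"  # overrides package.json scripts.dev
  dev_port: 3000              # port polled for readiness (default 3000)
  browser_timeout: 30         # seconds to wait for the dev server (default 30)
```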
+
+**Step 2: Dev server lifecycle**
+
+**2a. Resolve dev command** (config override always wins):
+
+```bash
+# 1. Config override
+DEV_COMMAND=$(yq '.quality.dev_command' .deepflow/config.yaml 2>/dev/null)
+
+# 2. Auto-detect from package.json scripts.dev
+if [ -z "${DEV_COMMAND}" ] || [ "${DEV_COMMAND}" = "null" ]; then
+  if [ -f package.json ] && jq -e '.scripts.dev' package.json >/dev/null 2>&1; then
+    DEV_COMMAND="npm run dev"
+  fi
+fi
+
+# 3. No dev command found → skip L5 dev server steps
+if [ -z "${DEV_COMMAND}" ] || [ "${DEV_COMMAND}" = "null" ]; then
+  echo "⚠ L5: No dev command found (scripts.dev not in package.json, quality.dev_command not set). Skipping browser check."
+  L5_RESULT="skipped-no-dev-command"
+fi
+```
+
+**2b. Resolve port:**
+
+```bash
+# Config override wins; fallback to 3000
+DEV_PORT=$(yq '.quality.dev_port' .deepflow/config.yaml 2>/dev/null)
+if [ -z "${DEV_PORT}" ] || [ "${DEV_PORT}" = "null" ]; then
+  DEV_PORT=3000
+fi
+```
+
+**2c. Check if dev server is already running (port already bound):**
+
+```bash
+PORT_IN_USE=false
+if curl -s -o /dev/null -w "%{http_code}" "http://localhost:${DEV_PORT}" | grep -q "200"; then
+  PORT_IN_USE=true
+  echo "ℹ L5: Port ${DEV_PORT} already bound — using existing dev server, will not kill on exit."
+fi
+```
+
+**2d. Start dev server and poll for readiness:**
+
+```bash
+DEV_SERVER_PID=""
+if [ "${PORT_IN_USE}" = "false" ]; then
+  # Start in a new process group so all child processes can be killed together
+  setsid ${DEV_COMMAND} &
+  DEV_SERVER_PID=$!
+fi
+
+# Resolve timeout from config (default 30s)
+TIMEOUT=$(yq '.quality.browser_timeout' .deepflow/config.yaml 2>/dev/null)
+if [ -z "${TIMEOUT}" ] || [ "${TIMEOUT}" = "null" ]; then
+  TIMEOUT=30
+fi
+POLL_INTERVAL=0.5
+MAX_POLLS=$(echo "${TIMEOUT} / ${POLL_INTERVAL}" | bc)
+
+HTTP_STATUS=""
+for i in $(seq 1 ${MAX_POLLS}); do
+  HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:${DEV_PORT}" 2>/dev/null)
+  [ "${HTTP_STATUS}" = "200" ] && break
+  sleep ${POLL_INTERVAL}
+done
+
+if [ "${HTTP_STATUS}" != "200" ]; then
+  # Kill process group before reporting failure
+  if [ -n "${DEV_SERVER_PID}" ]; then
+    kill -SIGTERM -${DEV_SERVER_PID} 2>/dev/null
+  fi
+  echo "✗ L5 FAIL: dev server did not start within ${TIMEOUT}s"
+  # add fix task to PLAN.md
+  exit 1
+fi
+```
+
+**2e. Teardown — always runs on both pass and fail paths:**
+
+```bash
+cleanup_dev_server() {
+  if [ -n "${DEV_SERVER_PID}" ]; then
+    # Kill the entire process group to catch any child processes spawned by the dev server
+    kill -SIGTERM -${DEV_SERVER_PID} 2>/dev/null
+    # Give it up to 5s to exit cleanly, then force-kill
+    for i in $(seq 1 10); do
+      kill -0 ${DEV_SERVER_PID} 2>/dev/null || break
+      sleep 0.5
+    done
+    kill -SIGKILL -${DEV_SERVER_PID} 2>/dev/null || true
+  fi
+}
+# Register cleanup for both success and failure paths
+trap cleanup_dev_server EXIT
+```
+
+Note: When `PORT_IN_USE=true` (dev server was already running before L5 began), `DEV_SERVER_PID` is empty and `cleanup_dev_server` is a no-op — the pre-existing server is left running.
+
+**Step 3: Read assertions from PLAN.md**
+
+Assertions are written into PLAN.md at plan-time (REQ-8). Extract them for the current spec:
+
+```bash
+# Parse structured browser assertions block from PLAN.md
+# Format expected in PLAN.md under each spec section:
+#   browser_assertions:
+#     - selector: "nav"
+#       role: "navigation"
+#       name: "Main navigation"
+#     - selector: "button[type=submit]"
+#       visible: true
+#       text: "Submit"
+ASSERTIONS=$(parse_yaml_block "browser_assertions" PLAN.md)
+```
+
+If no `browser_assertions` block found for the spec → L5 — (no assertions), skip Playwright step.
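`parse_yaml_block` above is pseudocode. One possible JavaScript reading of it — a sketch that handles only the flat `- key: "value"` shape shown in the comment; a real implementation would use a YAML library:

```javascript
// Minimal extractor for a browser_assertions block (sketch only, not general YAML).
function parseBrowserAssertions(planText) {
  const lines = planText.split('\n');
  const start = lines.findIndex(l => l.trim() === 'browser_assertions:');
  if (start === -1) return []; // no block → "L5 — (no assertions)"
  const assertions = [];
  for (let i = start + 1; i < lines.length; i++) {
    const line = lines[i];
    if (line.trim() === '') continue;
    if (!/^\s/.test(line)) break; // dedent ends the block
    const m = line.match(/^\s*(-\s+)?([A-Za-z_]+):\s*(.+)$/);
    if (!m) break;
    if (m[1]) assertions.push({}); // "- " opens a new assertion object
    if (assertions.length === 0) continue; // tolerate stray keys before the first "-"
    const raw = m[3].trim().replace(/^"(.*)"$/, '$1');
    assertions[assertions.length - 1][m[2]] =
      raw === 'true' ? true : raw === 'false' ? false : raw;
  }
  return assertions;
}
```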
+
+**Step 3.5: Playwright browser auto-install**
+
+Before launching Playwright, verify the Chromium browser binary is available. Run this check once per session; cache the result to avoid repeated installs.
+
+```bash
+# Marker file path — presence means Playwright Chromium was verified this session
+PW_MARKER="${TMPDIR:-/tmp}/.deepflow-pw-chromium-ok"
+
+if [ ! -f "${PW_MARKER}" ]; then
+  # Dry-run to detect whether the browser binary is already installed
+  if ! npx --yes playwright install --dry-run chromium 2>&1 | grep -q "chromium.*already installed"; then
+    echo "ℹ L5: Playwright Chromium not found — installing (one-time setup)..."
+    if npx --yes playwright install chromium 2>&1; then
+      echo "✓ L5: Playwright Chromium installed successfully."
+      touch "${PW_MARKER}"
+    else
+      echo "✗ L5 FAIL: Playwright Chromium install failed. Browser verification skipped."
+      L5_RESULT="skipped-install-failed"
+      # Skip the remaining L5 steps for this run
+    fi
+  else
+    # Already installed — cache for this session
+    touch "${PW_MARKER}"
+  fi
+fi
+
+# If install failed, skip Playwright launch and jump to L5 outcome reporting
+if [ "${L5_RESULT}" = "skipped-install-failed" ]; then
+  # No assertions can be evaluated — treat as a non-blocking skip with error notice
+  : # fall through to report section
+fi
+```
+
+Skip Steps 4–6 when `L5_RESULT="skipped-install-failed"`.
+
+**Step 4: Playwright verification**
+
+Launch Chromium headlessly via Playwright and evaluate each assertion deterministically — no LLM judgment:
+
+```javascript
+const { chromium } = require('playwright');
+const browser = await chromium.launch({ headless: true });
+const page = await browser.newPage();
+await page.goto('http://localhost:3000');
+
+const failures = [];
+
+for (const assertion of assertions) {
+  const locator = page.locator(assertion.selector);
+
+  // Capture accessibility tree (replaces deprecated page.accessibility.snapshot())
+  // locator.ariaSnapshot() returns YAML-like text with roles, names, hierarchy,
+  // rendered as `- role "accessible name"` entries
+  const ariaSnapshot = await locator.ariaSnapshot();
+
+  if (assertion.role && !ariaSnapshot.includes(assertion.role)) {
+    failures.push(`${assertion.selector}: expected role "${assertion.role}", not found in aria snapshot`);
+  }
+  if (assertion.name && !ariaSnapshot.includes(assertion.name)) {
+    failures.push(`${assertion.selector}: expected name "${assertion.name}", not found in aria snapshot`);
+  }
+
+  // Capture bounding boxes for visible assertions
+  if (assertion.visible !== undefined) {
+    const box = await locator.boundingBox();
+    const isVisible = box !== null && box.width > 0 && box.height > 0;
+    if (assertion.visible !== isVisible) {
+      failures.push(`${assertion.selector}: expected visible=${assertion.visible}, got visible=${isVisible}`);
+    }
+  }
+
+  if (assertion.text) {
+    const text = await locator.innerText();
+    if (!text.includes(assertion.text)) {
+      failures.push(`${assertion.selector}: expected text "${assertion.text}", got "${text}"`);
+    }
+  }
+}
+```
+
+Note: `page.accessibility.snapshot()` is deprecated in current Playwright releases. Always use `locator.ariaSnapshot()`, which returns YAML-like text describing roles, names, and hierarchy for the matched element subtree.
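For orientation, the text `locator.ariaSnapshot()` returns is YAML-shaped, with one `- role "accessible name"` entry per node — an illustrative example for a page with a nav and a submit button (exact formatting may vary by Playwright version):

```yaml
- navigation "Main navigation":
  - list:
    - listitem:
      - link "Home"
- button "Submit"
```

Role and name assertions are therefore substring checks against this text, not key-value lookups.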
+
+**Step 5: Screenshot capture**
+
+After evaluation (pass or fail), capture a full-page screenshot:
+
+```javascript
+const fs = require('fs');
+const path = require('path');
+
+const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
+const specName = 'doing-upload'; // derived from current spec filename
+const screenshotPath = `.deepflow/screenshots/${specName}/${timestamp}.png`;
+fs.mkdirSync(path.dirname(screenshotPath), { recursive: true });
+await page.screenshot({ path: screenshotPath, fullPage: true });
+```
+
+Screenshot path: `.deepflow/screenshots/{spec-name}/{timestamp}.png`
+
+**Step 6: Retry logic**
+
+On first failure, retry the FULL L5 check once (total 2 attempts). Re-navigate and re-evaluate all assertions from scratch on the retry:
+
+```javascript
+// attempt1_failures populated by Step 4 above
+let attempt2_failures = [];
+
+if (attempt1_failures.length > 0) {
+  // Retry: re-navigate and re-evaluate all assertions (identical logic to Step 4)
+  await page.goto('http://localhost:' + DEV_PORT);
+  attempt2_failures = await evaluateAssertions(page, assertions); // same loop as Step 4
+
+  // Capture a second screenshot for the retry attempt
+  const retryTimestamp = new Date().toISOString().replace(/[:.]/g, '-');
+  const retryScreenshotPath = `.deepflow/screenshots/${specName}/${retryTimestamp}-retry.png`;
+  await page.screenshot({ path: retryScreenshotPath, fullPage: true });
+}
+```
+
+**Outcome matrix:**
+
+| Attempt 1 | Attempt 2 | Result |
+|-----------|-----------|--------|
+| Pass | — (not run) | L5 ✓ |
+| Fail | Pass | L5 ✓ with warning "(passed on retry)" |
+| Fail | Fail — same assertions | L5 ✗ — genuine failure |
+| Fail | Fail — different assertions | L5 ✗ (flaky) |
+
+**Outcome reporting:**
+
+- **First attempt passes:** `✓ L5: All assertions passed` — no retry needed.
+
+- **First fails, retry passes:**
+  ```
+  ⚠ L5: Passed on retry (possible flaky render)
+  First attempt failed on: {list of assertion selectors from attempt 1}
+  ```
+  → L5 pass with warning. No fix task added.
+
+- **Both fail on SAME assertions** (identical set of failing selectors):
+  ```
+  ✗ L5: Browser assertions failed (both attempts)
+  {selector}: {failure detail}
+  {selector}: {failure detail}
+  ...
+  ```
+  → L5 FAIL. Add fix task to PLAN.md.
+
+- **Both fail on DIFFERENT assertions** (flaky — assertion sets differ between attempts):
+  ```
+  ✗ L5: Browser assertions failed (flaky — inconsistent failures across attempts)
+  Attempt 1 failures:
+    {selector}: {failure detail}
+  Attempt 2 failures:
+    {selector}: {failure detail}
+  ```
+  → L5 ✗ (flaky). Add fix task to PLAN.md noting flakiness.
**Fix task generation on L5 failure (both same and flaky):**
|
|
481
|
+
|
|
482
|
+
When both attempts fail (`L5_RESULT = 'fail'` or `L5_RESULT = 'fail-flaky'`), generate a fix task and append it to PLAN.md under the spec's section:
|
|
483
|
+
|
|
484
|
+
```javascript
|
|
485
|
+
// 1. Determine next task ID
|
|
486
|
+
// Scan PLAN.md for highest T{n} and increment
|
|
487
|
+
const planContent = fs.readFileSync('PLAN.md', 'utf8');
|
|
488
|
+
const taskIds = [...planContent.matchAll(/\bT(\d+)\b/g)].map(m => parseInt(m[1], 10));
|
|
489
|
+
const nextId = taskIds.length > 0 ? Math.max(...taskIds) + 1 : 1;
|
|
490
|
+
const taskId = `T${nextId}`;
|
|
491
|
+
|
|
492
|
+
// 2. Collect fix task context
|
|
493
|
+
// - Failing assertions: the structured assertion objects that failed
|
|
494
|
+
const failingAssertions = attempt2_failures.length > 0 ? attempt2_failures : attempt1_failures;
|
|
495
|
+
|
|
496
|
+
// - DOM snapshot excerpt: capture aria snapshot of body at the time of failure
|
|
497
|
+
const domSnapshotExcerpt = await page.locator('body').ariaSnapshot();

// - Screenshot path: already captured in Step 5 / Step 6 retry
// screenshotPath / retryScreenshotPath are available from those steps

// 3. Build task description
const isFlaky = L5_RESULT === 'fail-flaky';
const flakySuffix = isFlaky ? ' (flaky — inconsistent failures across attempts)' : '';
const screenshotRef = isFlaky ? retryScreenshotPath : screenshotPath;

const fixTaskBlock = `
- [ ] ${taskId}: Fix L5 browser assertion failures in ${specName}${flakySuffix}
**Failing assertions:**
${failingAssertions.map(f => ` - ${f}`).join('\n')}
**DOM snapshot (aria tree excerpt at failure):**
\`\`\`
${domSnapshotExcerpt.split('\n').slice(0, 40).join('\n')}
\`\`\`
**Screenshot:** ${screenshotRef}
`;

// 4. Append fix task under spec section in PLAN.md
// Find the spec section and append before the next section header or EOF
const specSectionPattern = new RegExp(`(## ${specName}[\\s\\S]*?)(\n## |$)`);
const updated = planContent.replace(specSectionPattern, (_, section, next) => section + fixTaskBlock + next);
fs.writeFileSync('PLAN.md', updated);

console.log(`Fix task added to PLAN.md: ${taskId}: Fix L5 browser assertion failures in ${specName}`);
```

Fix task context included:
- **Failing assertions**: the structured assertion data (selector + failure detail) from whichever attempt(s) failed
- **DOM snapshot excerpt**: first 40 lines of `locator('body').ariaSnapshot()` output at time of failure (textual a11y tree)
- **Screenshot path**: `.deepflow/screenshots/{spec-name}/{timestamp}.png` (retry screenshot when available)
- **Flakiness note**: appended to task title when assertion sets differed between attempts

**Comparing assertion sets (same vs. different):**

```javascript
// Compare by selector strings only — ignore detail text differences
const attempt1_selectors = attempt1_failures.map(f => f.split(':')[0]).sort();
const attempt2_selectors = attempt2_failures.map(f => f.split(':')[0]).sort();
const same_assertions = JSON.stringify(attempt1_selectors) === JSON.stringify(attempt2_selectors);

if (attempt2_failures.length === 0) {
  // Retry passed
  L5_RESULT = 'pass-on-retry';
} else if (same_assertions) {
  // Genuine failure — same assertions failed both times
  L5_RESULT = 'fail';
} else {
  // Flaky — different assertions failed each time
  L5_RESULT = 'fail-flaky';
}
```

**L5 outcomes:**
- L5 ✓ — all assertions pass on first attempt
- L5 ⚠ — passed on retry (possible flaky render); first-attempt failures listed as context
- L5 ✗ — assertions failed on both attempts (same assertions), fix tasks added
- L5 ✗ (flaky) — assertions failed on both attempts but on different assertions; fix tasks added noting flakiness
- L5 — (no frontend) — no frontend deps detected and no config override
- L5 — (no assertions) — frontend detected but no `browser_assertions` in PLAN.md
- L5 ✗ (install failed) — Playwright Chromium install failed; browser verification skipped for this run

### 3. GENERATE REPORT

**Format on success:**
```
doing-upload.md: L0 ✓ | L1 ✓ (5/5 files) | L2 ⚠ (no coverage tool) | L3 — (subsumed) | L4 ✓ (12 tests) | L5 ✓ | 0 quality issues
```

**Format on failure:**
```
doing-upload.md: L0 ✓ | L1 ✗ (3/5 files) | L2 ⚠ | L3 — | L4 ✗ (3 failed) | L5 ✗ (2 assertions failed)

Issues:
✗ L1: Missing files: src/api/upload.ts, src/services/storage.ts
```

@@ -160,6 +588,7 @@ Run /df:execute --continue to fix in the same worktree.
- L1: All planned files appear in diff
- L2: Coverage didn't drop (or no coverage tool detected)
- L4: Tests pass (or no test command detected)
- L5: Browser assertions pass (or no frontend detected, or no assertions defined)

**If all gates pass:** Proceed to Post-Verification merge.

@@ -192,8 +621,9 @@ Files: ...
| L2: Coverage | Coverage didn't drop | Coverage tool (before/after) | Orchestrator (Bash) |
| L3: Integration | Build + tests pass | Subsumed by L0 + L4 | — |
| L4: Tested | Tests pass | Run test command | Orchestrator (Bash) |
| L5: Browser | UI assertions pass | Playwright + `locator.ariaSnapshot()` | Orchestrator (Bash + Node) |

**Default: L0 through L5.** L0 and L4 skipped ONLY if no build/test command detected (see step 1.5). L5 skipped if no frontend detected and no config override. All checks are machine-verifiable. No LLM agents are used.

## Rules
- Verify against spec, not assumptions
@@ -0,0 +1,258 @@
---
name: browse-fetch
description: Fetches live web content using headless Chromium via Playwright. Use when you need to read documentation, articles, or any public URL that requires JavaScript rendering. Falls back to WebFetch for simple HTML pages.
---

# Browse-Fetch

Retrieve live web content with a headless browser. Handles JavaScript-rendered pages, SPAs, and dynamic content that WebFetch cannot reach.

## When to Use

- Reading documentation sites that require JavaScript to render (e.g., React-based docs, Vite, Next.js portals)
- Fetching the current content of a specific URL provided by the user
- Extracting article or reference content from a known page before implementing code against it

## Skip When

- The URL is a plain HTML page or GitHub raw file — use WebFetch instead (faster, no overhead)
- The target requires authentication (login wall) or CAPTCHA — browser cannot bypass; note the block and continue

---

## Browser Core Protocol

This protocol is the reusable foundation for all browser-based skills (browse-fetch, browse-verify, etc.).

### 1. Install Check

Before launching, verify Playwright is available:

```bash
# Prefer bun if available, fall back to node
if which bun > /dev/null 2>&1; then RUNTIME=bun; else RUNTIME=node; fi

$RUNTIME -e "require('playwright')" 2>/dev/null \
  || npx --yes playwright install chromium --with-deps 2>&1 | tail -5
```

If installation fails, fall back to WebFetch (see Fallback section below).

### 2. Launch Command

```bash
# Detect runtime
if which bun > /dev/null 2>&1; then RUNTIME=bun; else RUNTIME=node; fi

$RUNTIME -e "
const { chromium } = require('playwright');
(async () => {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
  });
  const page = await context.newPage();

  // --- navigation + extraction (see sections 3–4) ---

  await browser.close();
})().catch(e => { console.error(e.message); process.exit(1); });
"
```

### 3. Navigation

```js
// Inside the async IIFE above
await page.goto(URL, { waitUntil: 'domcontentloaded', timeout: 30000 });
// Allow JS to settle
await page.waitForTimeout(1500);
```

- Use `waitUntil: 'domcontentloaded'` for speed; upgrade to `'networkidle'` only if content is missing.
- Set `timeout: 30000` (30 s). On timeout, treat as graceful failure (see section 5).

### 4. Content Extraction

Extract the main readable text, not raw HTML:

```js
// Primary: semantic content containers
let text = await page.innerText('main, article, [role="main"]').catch(() => '');

// Fallback: full body text
if (!text || text.trim().length < 100) {
  text = await page.innerText('body').catch(() => '');
}

// Truncate to ~4000 tokens (~16000 chars) to stay within context budget
const MAX_CHARS = 16000;
if (text.length > MAX_CHARS) {
  text = text.slice(0, MAX_CHARS) + '\n\n[content truncated — use a more specific selector or paginate]';
}

console.log(text);
```

For interactive element inspection (e.g., browse-verify), use `locator.ariaSnapshot()` instead of `innerText`.

### 5. Graceful Failure

Detect and handle blocks without crashing:

```js
const title = await page.title();
const url = page.url();

// Login wall
if (/sign.?in|log.?in|auth/i.test(title) || url.includes('/login')) {
  console.log(`[browse-fetch] Blocked by login wall at ${url}. Skipping.`);
  await browser.close();
  process.exit(0);
}

// CAPTCHA
const bodyText = await page.innerText('body').catch(() => '');
if (/captcha|robot|human verification/i.test(bodyText)) {
  console.log(`[browse-fetch] CAPTCHA detected at ${url}. Skipping.`);
  await browser.close();
  process.exit(0);
}
```

On graceful failure: return the URL and a short explanation, then continue with the task using available context.

### 6. Cleanup

Always close the browser in a `finally` block or after use:

```js
await browser.close();
```

---

## Fetch Workflow

**Goal:** retrieve and return the text content of a single URL.

```bash
# Full inline script — adapt URL and selector per query
if which bun > /dev/null 2>&1; then RUNTIME=bun; else RUNTIME=node; fi

$RUNTIME -e "
const { chromium } = require('playwright');
(async () => {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
  });
  const page = await context.newPage();

  try {
    await page.goto('https://example.com/docs/page', {
      waitUntil: 'domcontentloaded',
      timeout: 30000
    });
    await page.waitForTimeout(1500);

    const title = await page.title();
    const url = page.url();

    if (/sign.?in|log.?in|auth/i.test(title) || url.includes('/login')) {
      console.log('[browse-fetch] Blocked by login wall at ' + url);
      return;
    }

    let text = await page.innerText('main, article, [role=\"main\"]').catch(() => '');
    if (!text || text.trim().length < 100) {
      text = await page.innerText('body').catch(() => '');
    }

    const MAX_CHARS = 16000;
    if (text.length > MAX_CHARS) {
      text = text.slice(0, MAX_CHARS) + '\n\n[content truncated]';
    }

    console.log('=== ' + title + ' ===\n' + text);
  } finally {
    await browser.close();
  }
})().catch(e => { console.error(e.message); process.exit(1); });
"
```

Adapt the URL and selector per query. The agent inlines the full script via `node -e` or `bun -e` so no temp files are needed for extractions under ~4000 tokens.

---

## Search + Navigation Protocol

**Time-box:** 60 seconds total. **Page cap:** 5 pages per query.

> Search engines (Google, DuckDuckGo) block headless browsers with CAPTCHAs. Do NOT use Playwright to search them.

Instead, use one of these strategies:

| Strategy | When to use |
|----------|-------------|
| Direct URL construction | You know the domain (e.g., `docs.stripe.com/api/charges`) |
| WebSearch tool | General keyword search before fetching pages |
| Site-specific search | Navigate to `site.com/search?q=term` if the site exposes it |

**Navigation loop** (up to 5 pages):

1. Construct or obtain the target URL.
2. Run the fetch workflow above.
3. If the page lacks the needed information, look for a next-page link or a more specific sub-URL.
4. Repeat up to 4 more times (5 total).
5. Stop and summarize what was found within the 60 s window.

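The time-box and page cap in the navigation loop can also be enforced mechanically. A minimal sketch, where `fetchPage` and `done` are hypothetical stand-ins supplied by the caller (they are not part of this skill's interface):

```javascript
// Navigation loop sketch: 60 s time-box, 5-page cap.
// `fetchPage(url)` stands in for the fetch workflow above and should
// resolve to { text, nextUrl } (nextUrl may be null when no follow-up
// link exists). `done(text)` is the caller's stop condition.
async function navigate(startUrl, fetchPage, done) {
  const DEADLINE = Date.now() + 60_000; // 60 s total budget
  const PAGE_CAP = 5;
  const visited = [];

  let url = startUrl;
  for (let i = 0; i < PAGE_CAP && url && Date.now() < DEADLINE; i++) {
    const { text, nextUrl } = await fetchPage(url);
    visited.push({ url, text });
    if (done(text)) break;     // found what we needed — stop early
    url = nextUrl;             // follow a more specific sub-URL, or stop
  }
  return visited;              // summarize from whatever was collected
}
```

The hard cap and deadline mean the loop always terminates, even when every page points to another one.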
---

## Session Cache

The context window is the cache. Extracted content lives in the conversation until it is no longer needed.

For extractions larger than ~4000 tokens, write to a temp file and reference it:

```bash
# Write large extraction to temp file
TMPFILE=$(mktemp /tmp/browse-fetch-XXXXXX.txt)
$RUNTIME -e "...script..." > "$TMPFILE"
echo "Content saved to $TMPFILE"
# Read relevant sections with grep or head rather than loading all at once
```

---

## Fallback Without Playwright

When Playwright is unavailable or fails to install, fall back to the WebFetch tool for:

- Static HTML sites (GitHub README, raw docs, Wikipedia)
- Any URL the user provides where JavaScript rendering is not required

| Condition | Action |
|-----------|--------|
| `playwright` not installed, install fails | Use WebFetch |
| Page is a known static domain (github.com/raw, pastebin, etc.) | Use WebFetch directly — skip Playwright |
| Playwright times out twice | Use WebFetch as fallback attempt |

```
WebFetch: { url: "https://example.com/page", prompt: "Extract the main content" }
```

If WebFetch also fails, return the URL with an explanation and continue the task.

---

## Rules

- Always run the install check before the first browser launch in a session.
- Detect runtime with `which bun` first; use `node` if bun is absent.
- Never navigate to Google or DuckDuckGo with Playwright — use WebSearch tool or direct URLs.
- Truncate output at ~4000 tokens (~16,000 chars) to protect context budget.
- On login wall or CAPTCHA, log the block, skip, and continue — never retry infinitely.
- Close the browser in every code path (use `finally`).
- Do not persist browser sessions across unrelated tasks.

@@ -0,0 +1,264 @@
---
name: browse-verify
description: Verifies UI acceptance criteria by launching a headless browser, extracting the accessibility tree, and evaluating structured assertions deterministically. Use when a spec has browser-based ACs that need automated verification after implementation.
---

# Browse-Verify

Headless browser verification using Playwright's accessibility tree. Evaluates structured assertions from PLAN.md without LLM calls — purely deterministic matching.

## When to Use

After implementing a spec that contains browser-based acceptance criteria:
- Visual/layout checks (element presence, text content, roles)
- Interactive state checks (aria-checked, aria-expanded, aria-disabled)
- Structural checks (element within a container)

**Skip when:** The spec has no browser-facing ACs, or the implementation is backend-only.

## Prerequisites

- Node.js (preferred) or Bun
- Playwright 1.x (`npm install playwright` or `npx playwright install`)
- Chromium browser (auto-installed if missing)

## Runtime Detection

```bash
# Prefer Node.js; fall back to Bun
if which node > /dev/null 2>&1; then
  RUNTIME=node
elif which bun > /dev/null 2>&1; then
  RUNTIME=bun
else
  echo "Error: neither node nor bun found" && exit 1
fi
```

## Browser Auto-Install

Before running, ensure Chromium is available:

```bash
npx playwright install chromium
```

Run this once per environment. If it fails due to permissions, instruct the user to run it manually.

## Protocol

### 1. Read Assertions from PLAN.md

Assertions are written into PLAN.md by the `plan` skill during planning. Format:

```yaml
assertions:
  - role: button
    name: "Submit"
    check: visible
  - role: checkbox
    name: "Accept terms"
    check: state
    value: checked
  - role: heading
    name: "Dashboard"
    check: visible
    within: main
  - role: textbox
    name: "Email"
    check: value
    value: "user@example.com"
```

Assertion schema:

| Field | Required | Description |
|----------|----------|-------------|
| `role` | yes | ARIA role (button, checkbox, heading, textbox, link, etc.) |
| `name` | yes | Accessible name (exact or partial match) |
| `check` | yes | One of: `visible`, `absent`, `state`, `value`, `count` |
| `value` | no | Expected value for `state` or `value` checks |
| `within` | no | Ancestor role or selector to scope the search |

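The assertion block is flat enough to parse without a YAML library. A minimal sketch for the exact `- key: value` shape shown above — this is a hypothetical helper, not a deepflow API, and a real YAML parser is safer if one is available:

```javascript
// Parse the flat `assertions:` list shown above into objects.
// Handles only the simple "- key: value" / "key: value" line shapes,
// not general YAML. Surrounding double quotes on values are stripped.
function parseAssertions(yamlText) {
  const assertions = [];
  for (const raw of yamlText.split('\n')) {
    const line = raw.trim();
    if (!line || line === 'assertions:') continue;
    const isNew = line.startsWith('- ');        // "- role: ..." starts a new assertion
    const body = isNew ? line.slice(2) : line;
    const idx = body.indexOf(':');
    if (idx === -1) continue;                   // skip malformed lines
    const key = body.slice(0, idx).trim();
    const value = body.slice(idx + 1).trim().replace(/^"|"$/g, '');
    if (isNew) assertions.push({});
    assertions[assertions.length - 1][key] = value;
  }
  return assertions;
}
```

The returned objects plug directly into the evaluation step below (`{ role, name, check, value, within }`).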
### 2. Launch Browser and Navigate

```javascript
const { chromium } = require('playwright');

const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();

await page.goto(TARGET_URL, { waitUntil: 'networkidle' });
```

`TARGET_URL` is read from the spec's metadata or passed as an argument.

### 3. Extract Accessibility Tree

Use `locator.ariaSnapshot()` — **NOT** `page.accessibility.snapshot()` (removed in Playwright 1.x):

```javascript
// Full-page aria snapshot (YAML-like role tree)
const snapshot = await page.locator('body').ariaSnapshot();

// Scoped snapshot within a container
const containerSnapshot = await page.locator('main').ariaSnapshot();
```

`ariaSnapshot()` returns a YAML-like string such as:

```yaml
- heading "Dashboard" [level=1]
- button "Submit" [disabled]
- checkbox "Accept terms" [checked]
- textbox "Email": user@example.com
```

### 4. Capture Bounding Boxes (optional)

For spatial/layout assertions or debugging:

```javascript
const element = page.getByRole(role, { name: assertionName });
const box = await element.boundingBox();
// box: { x, y, width, height } or null if not visible
```

### 5. Evaluate Assertions Deterministically

Parse the aria snapshot and evaluate each assertion. No LLM calls during this phase.

```javascript
function evaluateAssertion(snapshot, assertion) {
  const { role, name, check, value, within } = assertion;

  // Optionally scope to a sub-tree
  const tree = within
    ? extractSubtree(snapshot, within)
    : snapshot;

  switch (check) {
    case 'visible':
      return treeContains(tree, role, name);

    case 'absent':
      return !treeContains(tree, role, name);

    case 'state':
      // e.g., value: "checked", "disabled", "expanded"
      return treeContainsWithState(tree, role, name, value);

    case 'value':
      // Matches textbox/combobox displayed value
      return treeContainsWithValue(tree, role, name, value);

    case 'count':
      return countMatches(tree, role, name) === parseInt(value, 10);
  }
}
```

Matching rules:
- Role matching is case-insensitive
- Name matching is case-insensitive substring match (unless wrapped in quotes for exact match)
- State tokens (`[checked]`, `[disabled]`, `[expanded]`) are parsed from the snapshot line

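The helpers referenced by `evaluateAssertion` can be implemented as line-based matchers over the snapshot text. A minimal sketch of two of them (`treeContains`, `treeContainsWithState`), assuming the `- role "Name" [state]` line shape shown in Step 3; the remaining helpers follow the same pattern:

```javascript
// Line-based matchers over ariaSnapshot() output.
function snapshotLines(tree) {
  // Each node renders as a "- role \"Name\" ..." line; indentation is ignored here.
  return tree.split('\n').map(l => l.trim()).filter(l => l.startsWith('- '));
}

function lineMatches(line, role, name) {
  const m = line.match(/^- (\S+) "([^"]*)"/);
  if (!m) return false;
  return m[1].toLowerCase() === role.toLowerCase() &&
         m[2].toLowerCase().includes(name.toLowerCase()); // substring match by default
}

function treeContains(tree, role, name) {
  return snapshotLines(tree).some(l => lineMatches(l, role, name));
}

function treeContainsWithState(tree, role, name, state) {
  // State tokens render as bracketed suffixes, e.g. [checked], [disabled]
  return snapshotLines(tree).some(l =>
    lineMatches(l, role, name) && l.includes(`[${state}]`));
}
```

Real snapshots nest children with indentation; trimming each line keeps the matchers working on nested trees, at the cost of losing hierarchy (which is what the `within`/`extractSubtree` scoping is for).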
### 6. Capture Screenshot

After evaluation, capture a screenshot for the audit trail:

```javascript
const screenshotDir = `.deepflow/screenshots/${specName}`;
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
const screenshotPath = `${screenshotDir}/${timestamp}.png`;

await fs.mkdir(screenshotDir, { recursive: true });
await page.screenshot({ path: screenshotPath, fullPage: true });
```

Screenshot path convention: `.deepflow/screenshots/{spec-name}/{timestamp}.png`

### 7. Report Results

Emit a structured result for each assertion:

```
[PASS] button "Submit" — visible ✓
[PASS] checkbox "Accept terms" — state: checked ✓
[FAIL] heading "Dashboard" — expected visible, not found in snapshot
[PASS] textbox "Email" — value: user@example.com ✓

Results: 3 passed, 1 failed
Screenshot: .deepflow/screenshots/login-form/2026-03-14T12-00-00-000Z.png
```

Exit with code 0 if all assertions pass, 1 if any fail.

### 8. Tear Down

```javascript
await browser.close();
```

Always close the browser, even on error (use try/finally).

## Full Script Template

```javascript
#!/usr/bin/env node
const { chromium } = require('playwright');
const fs = require('fs/promises');
const path = require('path');

async function main({ targetUrl, specName, assertions }) {
  // Auto-install chromium if needed
  // (handled by: npx playwright install chromium)

  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();

  try {
    await page.goto(targetUrl, { waitUntil: 'networkidle' });

    const snapshot = await page.locator('body').ariaSnapshot();

    const results = assertions.map(assertion => ({
      assertion,
      passed: evaluateAssertion(snapshot, assertion),
    }));

    // Screenshot
    const screenshotDir = path.join('.deepflow', 'screenshots', specName);
    await fs.mkdir(screenshotDir, { recursive: true });
    const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
    const screenshotPath = path.join(screenshotDir, `${timestamp}.png`);
    await page.screenshot({ path: screenshotPath, fullPage: true });

    // Report
    const passed = results.filter(r => r.passed).length;
    const failed = results.filter(r => !r.passed).length;

    for (const { assertion, passed: ok } of results) {
      const status = ok ? '[PASS]' : '[FAIL]';
      console.log(`${status} ${assertion.role} "${assertion.name}" — ${assertion.check}${assertion.value ? ': ' + assertion.value : ''}`);
    }

    console.log(`\nResults: ${passed} passed, ${failed} failed`);
    console.log(`Screenshot: ${screenshotPath}`);

    process.exit(failed > 0 ? 1 : 0);
  } finally {
    await browser.close();
  }
}
```

## Rules

- Never call an LLM during the verify phase — all assertion evaluation is deterministic
- Always use `locator.ariaSnapshot()`, never `page.accessibility.snapshot()` (removed)
- Always close the browser in a `finally` block
- Screenshot every run regardless of pass/fail outcome
- If Playwright is not installed, emit a clear error and instructions — don't silently skip
- Partial name matching is the default; use exact matching only when the assertion specifies it
- Report results to stdout in the structured format above for downstream parsing

@@ -75,3 +75,17 @@ quality:

# Retry flaky tests once before failing (default: true)
test_retry_on_fail: true

# Enable L5 browser verification after tests pass (default: false)
# When true, deepflow will start the dev server and run visual checks
browser_verify: false

# Override the dev server start command for browser verification
# If empty, deepflow will attempt to auto-detect (e.g., "npm run dev", "yarn dev")
dev_command: ""

# Port that the dev server listens on (default: 3000)
dev_port: 3000

# Timeout in seconds to wait for the dev server to become ready (default: 30)
browser_timeout: 30

@@ -1,87 +0,0 @@
---
name: context-hub
description: Fetches curated API docs for external libraries before coding. Use when implementing code that uses external APIs/SDKs (Stripe, OpenAI, MongoDB, etc.) to avoid hallucinating APIs and reduce token usage.
---

# Context Hub

Fetch curated, versioned docs for external libraries instead of guessing APIs.

## When to Use

Before writing code that calls an external API or SDK:
- New library integration (e.g., Stripe payments, AWS S3)
- Unfamiliar API version or method
- Complex API with many options (e.g., MongoDB aggregation)

**Skip when:** Working with internal code (use LSP instead) or well-known stdlib APIs.

## Prerequisites

Requires `chub` CLI: `npm install -g @aisuite/chub`

If `chub` is not installed, tell the user and skip — don't block implementation.

## Workflow

### 1. Search for docs

```bash
chub search "<library or API>" --json
```

Example:
```bash
chub search "stripe payments" --json
chub search "mongodb aggregation" --json
```

### 2. Fetch relevant docs

```bash
chub get <id> --lang <py|js|ts>
```

Use `--lang` matching the project language. Use `--full` only if the summary lacks what you need.

### 3. Write code using fetched docs

Use the retrieved documentation as ground truth for API signatures, parameter names, and patterns.

### 4. Annotate discoveries

When you find something the docs missed or got wrong:

```bash
chub annotate <id> "Note: method X requires param Y since v2.0"
```

This persists locally and appears on future `chub get` calls — the agent learns across sessions.

### 5. Rate docs (optional)

```bash
chub feedback <id> up --label accurate
chub feedback <id> down --label outdated
```

Labels: `accurate`, `outdated`, `incomplete`, `wrong-version`, `helpful`

## Integration with LSP

| Need | Tool |
|------|------|
| Internal code navigation | LSP (`goToDefinition`, `findReferences`) |
| External API signatures | Context Hub (`chub get`) |
| Symbol search in project | LSP (`workspaceSymbol`) |
| Library usage patterns | Context Hub (`chub search`) |

**Combined approach:** Use LSP to understand how the project currently uses a library, then use Context Hub to verify correct API usage and discover better patterns.

## Rules

- Always search before implementing external API calls
- Trust chub docs over training data for API specifics
- Annotate gaps so future sessions benefit
- Don't block on chub failures — fall back to best knowledge
- Prefer `--json` flag for programmatic parsing in automated workflows