devlyn-cli 1.3.0 → 1.3.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CLAUDE.md +11 -1
- package/README.md +16 -4
- package/config/skills/devlyn:auto-resolve/SKILL.md +6 -2
- package/config/skills/devlyn:browser-validate/SKILL.md +72 -66
- package/config/skills/devlyn:browser-validate/references/tier1-chrome.md +1 -1
- package/config/skills/devlyn:browser-validate/references/tier2-playwright.md +16 -2
- package/config/skills/devlyn:browser-validate/references/tier3-curl.md +3 -2
- package/package.json +1 -1
package/CLAUDE.md
CHANGED

@@ -56,10 +56,13 @@ For hands-free build-evaluate-polish cycles — works for bugs, features, refact
 /devlyn:auto-resolve [task description]
 ```

-This runs the full pipeline automatically: **Build → Evaluate → Fix Loop → Simplify → Review → Security Review → Clean → Docs**. Each phase runs as a separate subagent with its own context. Communication between phases happens via files (`.claude/done-criteria.md`, `.claude/EVAL-FINDINGS.md`).
+This runs the full pipeline automatically: **Build → Browser Validate → Evaluate → Fix Loop → Simplify → Review → Security Review → Clean → Docs**. Each phase runs as a separate subagent with its own context. Communication between phases happens via files (`.claude/done-criteria.md`, `.claude/EVAL-FINDINGS.md`, `.claude/BROWSER-RESULTS.md`).
+
+For web projects, the Browser Validate phase starts the dev server and tests the implemented feature in a real browser — clicking buttons, filling forms, verifying results. If the feature doesn't work, findings feed back into the fix loop.

 Optional flags:
 - `--max-rounds 3` — increase max evaluate-fix iterations (default: 2)
+- `--skip-browser` — skip browser validation phase (auto-skipped for non-web changes)
 - `--skip-review` — skip team-review phase
 - `--skip-clean` — skip clean phase
 - `--skip-docs` — skip update-docs phase
@@ -99,6 +102,13 @@ Steps 4-6 are optional depending on the scope of changes. `/simplify` should alw
 - Preserves all forward-looking content: roadmaps, future plans, visions, open questions
 - If no docs exist, proposes a tailored docs structure and generates initial content

+## Browser Testing Workflow
+
+- **Standalone**: Use `/devlyn:browser-validate` to test any web feature in the browser — starts the dev server, tests the feature end-to-end, fixes issues it finds
+- **In pipeline**: Auto-resolve includes browser validation automatically for web projects (between Build and Evaluate phases)
+- **Tiered**: Uses chrome MCP tools if available, falls back to Playwright, then curl
+- **Feature-first**: Tests the implemented feature (from done-criteria), not just "does the page load"
+
 ## Debugging Workflow

 - **Simple bugs**: Use `/devlyn:resolve` for systematic bug fixing with test-driven validation
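The file-based handoff between phases comes down to reading a verdict line out of `.claude/BROWSER-RESULTS.md`. A minimal sketch of that extraction, assuming the `## Verdict: X` heading convention shown in this diff (the helper name is ours, not part of the package):

```shell
# Illustrative helper: extract the verdict from a browser-results file.
# Assumes the "## Verdict: X" heading convention used by the skill's report template.
verdict_of() {
  # Print whatever follows "## Verdict: " on the first matching line.
  sed -n 's/^## Verdict: //p' "$1" | head -n 1
}
```

A consumer would call `verdict_of .claude/BROWSER-RESULTS.md` and branch on the result.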
package/README.md
CHANGED

@@ -39,9 +39,10 @@ Structured prompts and role-based instructions that shape _what the AI knows and

 Pipeline orchestration that controls _how agents execute_ — permissions, state management, multi-phase workflows, and cross-model evaluation.

-- **`/devlyn:auto-resolve`** —
+- **`/devlyn:auto-resolve`** — 9-phase automated pipeline (build → browser validate → evaluate → fix loop → simplify → review → security → clean → docs)
+- **`/devlyn:browser-validate`** — feature verification in a real browser with tiered fallback (Chrome MCP → Playwright → curl)
 - **`bypassPermissions` mode** for autonomous subagent execution
-- **File-based state machine** — agents communicate via `.claude/done-criteria.md` and `
+- **File-based state machine** — agents communicate via `.claude/done-criteria.md`, `EVAL-FINDINGS.md`, and `BROWSER-RESULTS.md`
 - **Git checkpoints** at each phase for rollback safety
 - **Cross-model evaluation** via `--with-codex` flag (OpenAI Codex as independent evaluator)

@@ -89,7 +90,8 @@ Slash commands are invoked directly in Claude Code conversations (e.g., type `/d
 |---|---|
 | `/devlyn:resolve` | Systematic bug fixing with root-cause analysis and test-driven validation |
 | `/devlyn:team-resolve` | Spawns a full agent team — root cause analyst, test engineer, security auditor — to investigate complex issues |
-| `/devlyn:auto-resolve` | Fully automated pipeline for any task — bugs, features, refactors, chores. Build → evaluate → fix loop → simplify → review → clean → docs. One command, zero human intervention. Supports `--with-codex` for cross-model evaluation via OpenAI Codex |
+| `/devlyn:auto-resolve` | Fully automated pipeline for any task — bugs, features, refactors, chores. Build → browser validate → evaluate → fix loop → simplify → review → clean → docs. One command, zero human intervention. Supports `--with-codex` for cross-model evaluation via OpenAI Codex |
+| `/devlyn:browser-validate` | Verify implemented features work in a real browser — starts dev server, tests the feature end-to-end (clicks, forms, verification), with tiered fallback (Chrome MCP → Playwright → curl) |

 ### Code Review & Quality

@@ -151,6 +153,7 @@ One command runs the full cycle — no human intervention needed:
 | Phase | What Happens |
 |---|---|
 | **Build** | `team-resolve` investigates and implements, writes testable done criteria |
+| **Browser Validate** | For web projects: starts dev server, tests the implemented feature end-to-end in a real browser, fixes issues found |
 | **Evaluate** | Independent evaluator grades against done criteria with calibrated skepticism |
 | **Fix Loop** | If evaluation fails, fixes findings and re-evaluates (up to N rounds) |
 | **Simplify** | Quick cleanup pass for reuse and efficiency |

@@ -159,7 +162,7 @@ One command runs the full cycle — no human intervention needed:
 | **Clean** | Remove dead code and unused dependencies |
 | **Docs** | Sync documentation with changes |

-Each phase runs as a separate subagent (fresh context), communicates via files, and commits a git checkpoint for rollback safety. Skip phases with flags: `--skip-review`, `--skip-clean`, `--skip-docs`, `--max-rounds 3`, `--with-codex` (cross-model evaluation via OpenAI Codex).
+Each phase runs as a separate subagent (fresh context), communicates via files, and commits a git checkpoint for rollback safety. Skip phases with flags: `--skip-browser`, `--skip-review`, `--skip-clean`, `--skip-docs`, `--max-rounds 3`, `--with-codex` (cross-model evaluation via OpenAI Codex).

 ### Manual Workflow

@@ -237,6 +240,15 @@ Installed via the [skills CLI](https://github.com/anthropics/skills) (`npx skill
 | `anthropics/skills` | Official Anthropic skill-creator with eval framework and description optimizer |
 | `Leonxlnx/taste-skill` | Premium frontend design skills — modern layouts, animations, and visual refinement |

+### MCP Servers
+
+Installed via `claude mcp add` during setup.
+
+| Server | Description |
+|---|---|
+| `codex-cli` | Codex MCP server for cross-model evaluation via OpenAI Codex |
+| `playwright` | Playwright MCP for browser testing — powers `devlyn:browser-validate` Tier 2 |
+
 > **Want to add a pack?** Open a PR adding your pack to the `OPTIONAL_ADDONS` array in [`bin/devlyn.js`](bin/devlyn.js).

 ## How It Works
package/config/skills/devlyn:auto-resolve/SKILL.md
CHANGED

@@ -91,8 +91,12 @@ You are a browser validation agent. Read the skill instructions at `.claude/skil
 **After the agent completes**:
 1. Read `.claude/BROWSER-RESULTS.md`
 2. Extract the verdict
-3.
-
+3. Branch on verdict:
+   - `PASS` → continue to PHASE 2
+   - `PASS WITH ISSUES` → continue to PHASE 2 (evaluator reads browser results as extra context)
+   - `PARTIALLY VERIFIED` → continue to PHASE 2, but flag to the evaluator that browser coverage was incomplete — unverified features should be weighted more heavily
+   - `NEEDS WORK` → features don't work in the browser. Go to PHASE 2.5 fix loop. Fix agent reads `.claude/BROWSER-RESULTS.md` for which criterion failed, at what step, with what error. After fixing, re-run PHASE 1.5 to verify the fix before proceeding to Evaluate.
+   - `BLOCKED` → app doesn't render. Go to PHASE 2.5 fix loop. After fixing, re-run PHASE 1.5.

 ## PHASE 2: EVALUATE

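The verdict branching above maps naturally onto a case statement. A sketch, with the verdict strings taken from the skill and the function name plus phase labels being our own illustrative choices:

```shell
# Map a browser-validation verdict to the next pipeline phase,
# following the branch rules in the skill text. Illustrative only;
# "PHASE 2" / "PHASE 2.5" labels mirror the phases named above.
next_phase() {
  case "$1" in
    "PASS"|"PASS WITH ISSUES"|"PARTIALLY VERIFIED") echo "PHASE 2" ;;
    "NEEDS WORK"|"BLOCKED")                         echo "PHASE 2.5" ;;
    *)                                              echo "UNKNOWN VERDICT" ;;
  esac
}
```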
package/config/skills/devlyn:browser-validate/SKILL.md
CHANGED

@@ -1,9 +1,11 @@
 ---
 name: devlyn:browser-validate
-description: Browser-based validation for web applications —
+description: Browser-based validation for web applications — verifies that implemented features actually work by testing them in a real browser. Starts the dev server, tests the feature end-to-end (click buttons, fill forms, verify results), and reports what's broken with screenshot evidence. Use this skill whenever the user says "test in browser", "check if it works", "does the feature work", "browser test", "validate the UI", or when auto-resolve needs to verify web changes actually function correctly. Also use proactively after implementing UI changes. The primary goal is feature verification, not just checking if pages render.
 ---

-
+Verify that implemented features actually work in the browser. The primary job is to test the feature that was just built — click the button, fill the form, check the result. Smoke tests and visual checks are supporting checks, not the main event.
+
+The whole point of browser validation is to catch the gap between "code looks correct" and "user can actually do the thing." Static analysis and unit tests can confirm the code is well-structured. Browser validation confirms it *works*.

 <config>
 $ARGUMENTS
@@ -13,80 +15,76 @@ $ARGUMENTS

 ## PHASE 1: DETECT

-1. **
+1. **What was built**: This is the most important input. Read `.claude/done-criteria.md` if it exists — it tells you what the feature is supposed to do. If it doesn't exist, read `git diff --stat` and `git log -1` to understand what changed. You need to know what to test before anything else.
+
+2. **Framework detection**: Read `package.json` → identify framework and start command from `scripts.dev`, `scripts.start`, or `scripts.preview`.

-
+3. **Port inference**: Defaults — Next.js: 3000, Vite: 5173, CRA: 3000, Nuxt: 3000, Astro: 4321, Angular: 4200. Override with `--port` flag.

-
+4. **Affected routes**: Map changed files to routes (e.g., `app/dashboard/page.tsx` → `/dashboard`).

-
-- Check if `mcp__claude-in-chrome__*` tools exist
-- Else check if `mcp__playwright__*` tools exist
+5. **Tier selection** — pick the best available browser tool:
+- Check if `mcp__claude-in-chrome__*` tools exist → **Tier 1** (Chrome DevTools). Read `references/tier1-chrome.md`.
+- Else check if `mcp__playwright__*` tools exist or `npx playwright --version` succeeds → **Tier 2** (Playwright). Read `references/tier2-playwright.md`.
 - Else → **Tier 3** (HTTP smoke). Read `references/tier3-curl.md`.

-
+6. **Skip gate**: If no web-relevant files changed (no `*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `*.astro`, `*.css`, `*.scss`, `*.html`, `page.*`, `layout.*`, `route.*`, `+page.*`, `+layout.*`), skip. Report: "Browser validation skipped — no web changes detected."

-
-- `--skip-
+7. **Parse flags** from `<config>`:
+- `--skip-feature` — skip feature testing, only run smoke + visual
 - `--port PORT` — override detected port
 - `--tier N` — force a specific tier (1, 2, or 3)
-- `--mobile-only`
-- `--desktop-only` — only test desktop viewport
+- `--mobile-only` / `--desktop-only` — limit viewport testing

 Announce:
 ```
 Browser validation starting
+Feature: [what was built, from done-criteria or git diff]
 Framework: [detected] | Port: [PORT] | Tier: [N — name]
-
-Phases: Smoke → [Flow] → Visual → Report
+Phases: Server → Smoke → Feature Test → Visual → Report
 ```

 ## PHASE 2: SERVER

-
-
-
-
-
-
-
-
-Write this to `.claude/BROWSER-RESULTS.md` and stop.
-4. Record the server PID for cleanup.
+Get the dev server running. If it doesn't start, diagnose and fix — don't just report failure.
+
+1. Start the dev server in background via Bash with `run_in_background: true`.
+2. Health-check: poll `http://localhost:PORT` every 2s, timeout 30s. Ready when you get an HTTP response.
+3. **If it doesn't come up — troubleshoot** (up to 2 attempts): read stderr for the error, fix it (npm install, port conflict, build error, etc.), restart, re-check.
+4. If still down after 2 attempts: write BLOCKED verdict and stop.
+
+## PHASE 3: SMOKE (quick prerequisite)

-
+Quick check that the app is alive. This is not the main test — it's a gate to make sure feature testing is even possible.

-
+Navigate to `/` and each affected route. For each page, judge: is this the actual application, or an error page? A connection error, framework error overlay, or blank shell is not the app. If broken, try to fix (read console errors, fix source, let hot-reload pick it up). Up to 2 fix attempts per route.

-
-1. Navigate to the page
-2. Verify the page has meaningful content (not a blank page, not a raw error)
-3. Capture console messages — filter for errors (ignore React dev-mode warnings, HMR noise, favicon 404s)
-4. Capture network requests — flag any 4xx/5xx responses or CORS failures (ignore HMR websocket, source maps)
-5. Take a screenshot as evidence
+If the app isn't rendering, the verdict is BLOCKED — feature testing can't happen.

-
+## PHASE 4: FEATURE TEST (the main event)

-
+This is the primary purpose of browser validation. Everything else is in service of getting here.

-
+Read `.claude/done-criteria.md` (or infer from git diff what was built). For each criterion that describes something a user can do or see in the UI, test it end-to-end in the browser:

-
+1. **Plan the test**: What would a user do to verify this feature works? Navigate where, click what, type what, expect what result?
+2. **Execute it**: Navigate to the page, find the interactive elements, perform the actions, verify the outcome. Read `references/flow-testing.md` for patterns on converting criteria to browser steps.
+3. **Capture evidence**: Screenshot at each key step. Record console errors and network failures that happen during the interaction.
+4. **If it fails — try to fix**: Read the error (console, network, or the UI state) to understand why the feature broke. Fix the source code, let hot-reload update, and re-test. Up to 2 fix attempts per criterion.
+5. **Record the result**: For each criterion — PASS (feature works as specified), FAIL (feature doesn't work, include what went wrong), SKIPPED (criterion isn't browser-testable, e.g., "API returns 401"), or UNVERIFIABLE (feature depends on external services not available in the test environment — e.g., real API keys, third-party auth, paid services).

-
-1. Execute the action sequence (navigate → find → interact → verify)
-2. After each interaction, check console + network for new errors
-3. Screenshot at each verification point
-4. Record pass/fail with evidence
+**Don't churn on external dependencies.** If a feature test is blocked because an API times out, a third-party service isn't configured, or auth credentials aren't available — that's not a bug to fix, it's a test environment limitation. Note it as UNVERIFIABLE, move on to the next criterion. Don't spend more than 30 seconds waiting for a response that's never coming. The goal is to verify what *can* be verified in the current environment, and be honest about what can't.

-
+The verdict depends primarily on this phase. If the implemented features don't work in the browser, the validation fails — even if every page renders perfectly and the layout looks great. And if most features couldn't be verified due to environment limitations, be honest about that — don't call it PASS.

-
+## PHASE 5: VISUAL (supporting check)

-
-2. **Desktop** (1280x800): resize → navigate to each affected route → screenshot → check for broken layouts, missing sections
+Quick layout check at two viewports (skip if `--mobile-only` or `--desktop-only`):

-
+1. **Mobile** (375x812): screenshot each affected route, check for overflow/overlap/unreadable text
+2. **Desktop** (1280x800): screenshot each affected route, check for broken layouts
+
+Judgment-based — look at the screenshots and report visible issues.

 ## PHASE 6: REPORT

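The PHASE 2 health check above (poll every 2s, 30s timeout, any HTTP response counts as ready) could be sketched in shell like this; the function name and parameter defaults are ours, not code shipped in the package:

```shell
# Poll a URL until the server answers or the timeout expires.
# Any HTTP response (even a 500) counts as "up": curl without -f succeeds
# on error statuses and only fails on connection-level problems.
# Defaults mirror the skill text: 2s interval, 30s timeout.
wait_for_server() {
  local url=$1 timeout=${2:-30} interval=${3:-2} elapsed=0
  until curl -s -o /dev/null --max-time 2 "$url"; do
    sleep "$interval"
    elapsed=$((elapsed + interval))
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1   # still down; caller troubleshoots or reports BLOCKED
    fi
  done
  return 0
}
```

A caller would do `wait_for_server "http://localhost:$PORT" || troubleshoot_and_retry`.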
@@ -95,24 +93,29 @@ Write `.claude/BROWSER-RESULTS.md`:
 ```markdown
 # Browser Validation Results

-## Verdict: [PASS / PASS WITH ISSUES / NEEDS WORK / BLOCKED]
-Verdict rules:
+## Verdict: [PASS / PASS WITH ISSUES / NEEDS WORK / PARTIALLY VERIFIED / BLOCKED]
+Verdict rules:
+- BLOCKED = server won't start or app doesn't render
+- NEEDS WORK = implemented features don't work in the browser
+- PARTIALLY VERIFIED = some features verified working, but others couldn't be tested due to environment limitations (missing API keys, external service dependencies). Be explicit about what was and wasn't verified.
+- PASS WITH ISSUES = all testable features work but visual issues or minor warnings exist
+- PASS = all testable features verified working, pages render, layout clean
+
+## What Was Tested
+[Brief description of the feature/task from done-criteria or git diff]

-##
-
-
-
-- Startup time: [N]s
+## Feature Verification (primary)
+| Criterion | Test Steps | Result | Evidence |
+|-----------|-----------|--------|----------|
+| [what should work] | [what you did] | PASS/FAIL/SKIPPED/UNVERIFIABLE | [screenshot, errors, what went wrong] |

-##
-
-|-------|---------|---------------|-----------------|------------|
-| / | YES/NO | [count]: [details] | [count]: [details] | [path] |
+## Unverifiable Features (if any)
+[List features that couldn't be tested and why — e.g., "Badge rendering requires /api/backends/status which needs real API keys not present in test env. Verified via source code and unit tests instead."]

-##
-
-
-
+## Smoke Test (prerequisite)
+| Route | Renders | Console Errors | Network Failures |
+|-------|---------|---------------|-----------------|
+| / | YES/NO | [count] | [count] |

 ## Visual Check
 | Viewport | Route | Issues |
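A downstream fix agent needs the failing rows out of the Feature Verification table. A rough extraction sketch (the helper name and awk approach are our illustration, assuming the 4-column table format above):

```shell
# List criteria whose Result column is exactly FAIL in a BROWSER-RESULTS.md table.
# Table shape assumed: | Criterion | Test Steps | Result | Evidence |
# With '|' as the field separator, Criterion is field 2 and Result is field 4.
failed_criteria() {
  awk -F'|' '$4 ~ /^[[:space:]]*FAIL[[:space:]]*$/ {
    gsub(/^ +| +$/, "", $2); print $2
  }' "$1"
}
```

The anchored `FAIL` match skips the template's `PASS/FAIL/SKIPPED/UNVERIFIABLE` placeholder row and the header/separator rows.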
@@ -120,15 +123,18 @@ Verdict rules: BLOCKED = app won't start or root page crashes. NEEDS WORK = flow
 | Mobile (375px) | / | [issues or "Clean"] |
 | Desktop (1280px) | / | [issues or "Clean"] |

-##
-[
+## Fixes Applied During Validation
+[List any bugs found and fixed during testing — server startup issues, broken routes, feature bugs]
+
+## Runtime Errors
+[Console errors captured during testing]

 ## Failed Network Requests
-[
+[Failed API calls captured during testing]
 ```

 ## PHASE 7: CLEANUP

-Kill the dev server
+Kill the dev server PID. If `--keep-server` was passed (auto-resolve pipeline), skip — the pipeline handles cleanup.

 </workflow>
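PHASE 2 and PHASE 7 bracket the server lifecycle: start in the background, remember the PID, kill it on cleanup unless the caller wants the server kept. A sketch of that pattern (function names and the `KEEP_SERVER` variable are illustrative stand-ins for the skill's `--keep-server` behavior):

```shell
# Start any command in the background and remember its PID (PHASE 2),
# then kill it during cleanup unless asked to keep it (PHASE 7).
start_server() {
  "$@" > /tmp/devserver.log 2>&1 &
  SERVER_PID=$!
}

stop_server() {
  # Honor --keep-server: the surrounding pipeline owns cleanup instead.
  [ "${KEEP_SERVER:-0}" = "1" ] && return 0
  kill "$SERVER_PID" 2>/dev/null
}
```

Example: `start_server npm run dev`, do the validation work, then `stop_server`.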
package/config/skills/devlyn:browser-validate/references/tier1-chrome.md
CHANGED

@@ -41,7 +41,7 @@ After navigating, wait 2-3 seconds for client-side rendering, then call `get_pag
 ```
 get_page_text → extract visible text content
 ```
-
+Read the text and judge: is this the actual application, or an error/fallback page? Browser error pages, framework error overlays, "Unable to connect" screens, and empty shells all have text — but they're not the app. If the page content doesn't look like what the application is supposed to show, it's a failure.

 ### Read page structure
 ```
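Part of that judgment can be mechanized as a text heuristic. A rough sketch; the phrase list is our own and deliberately incomplete, so real judgment is still required on top:

```shell
# Rough heuristic: does extracted page text look like a browser or
# framework error page? Returns 0 (true) for empty text or known error
# phrases; anything else is treated as plausibly the real app.
looks_like_error_page() {
  case "$1" in
    "")                            return 0 ;;  # blank shell
    *"Unable to connect"*)         return 0 ;;
    *"This site can't be reached"*) return 0 ;;
    *"Internal Server Error"*)     return 0 ;;
    *)                             return 1 ;;
  esac
}
```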
package/config/skills/devlyn:browser-validate/references/tier2-playwright.md
CHANGED

@@ -70,10 +70,24 @@ test.describe('Smoke Tests', () => {
 }
 });

+// If goto throws (connection refused), the test fails — that's correct behavior
 await page.goto(`http://localhost:${PORT}${route}`, { waitUntil: 'networkidle', timeout: 15000 });

-
-
+// Verify this is the actual application, not an error page.
+// When a server is down or a route is broken, the browser shows an error page
+// that still has text content — "Unable to connect", "This site can't be reached", etc.
+// A naive length check would pass on these. The title is the best signal:
+// browser error pages have titles like "Problem loading page" or the URL itself,
+// while real apps have meaningful titles set by the application.
+const title = await page.title();
+const bodyText = await page.textContent('body') || '';
+
+// Page must have substantive content
+expect(bodyText.trim().length, 'Page body is empty').toBeGreaterThan(0);
+
+// Fail if the page navigation itself failed (Playwright sets title to the URL on error)
+const pageUrl = page.url();
+expect(title, 'Page shows a browser error — server may be down').not.toBe(pageUrl);

 await page.screenshot({ path: `.claude/screenshots/smoke${route.replace(/\//g, '-') || '-root'}.png`, fullPage: true });

package/config/skills/devlyn:browser-validate/references/tier3-curl.md
CHANGED

@@ -37,8 +37,9 @@ HTML=$(curl -s http://localhost:{PORT}{route} --max-time 10)
 ### Pass Criteria

 A route passes if:
-1.
-2.
+1. curl succeeds (doesn't error out with connection refused or timeout)
+2. `STATUS` is `200` (or `301`, `302`, `304`) — not `000`, not `5xx`
+3. HTML contains `<body` tag
 3. HTML body has more than 100 characters of text content (not just empty divs)
 4. HTML does not contain server error indicators: `Internal Server Error`, `500`, `ECONNREFUSED`, `Cannot GET`, `404`

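Most of those pass criteria can be expressed as one function. A sketch, assuming `STATUS` and `HTML` were captured as in the curl snippet above; the helper name is ours, and the error-indicator check covers only the unambiguous phrases (matching bare `500` or `404` anywhere in the HTML would false-positive on ordinary content):

```shell
# Apply the tier-3 pass criteria to a captured status code and HTML body.
# Echoes PASS or FAIL; a partial mirror of the numbered rules above.
check_route() {
  local status=$1 html=$2
  case "$status" in
    200|301|302|304) ;;                  # acceptable status codes
    *) echo FAIL; return 1 ;;
  esac
  case "$html" in
    *"<body"*) ;;                        # has a <body tag
    *) echo FAIL; return 1 ;;
  esac
  # Strip tags and whitespace; require >100 chars of text content.
  local text
  text=$(printf '%s' "$html" | sed 's/<[^>]*>//g' | tr -d '[:space:]')
  if [ "${#text}" -le 100 ]; then echo FAIL; return 1; fi
  case "$html" in                        # unambiguous server-error indicators
    *"Internal Server Error"*|*ECONNREFUSED*|*"Cannot GET"*) echo FAIL; return 1 ;;
  esac
  echo PASS
}
```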