devlyn-cli 1.3.0 → 1.3.1

@@ -1,9 +1,11 @@
1
1
  ---
2
2
  name: devlyn:browser-validate
3
- description: Browser-based validation for web applications — smoke tests, user flow testing, visual checks, and runtime error detection. Starts the dev server, navigates the app, and reports what's broken with screenshot evidence. Use this skill whenever the user says "test in browser", "check if it works", "smoke test", "browser test", "validate the UI", "does the app run", or when auto-resolve needs to verify web changes actually render and function correctly. Also use proactively after implementing UI changes to catch runtime errors that static analysis misses.
3
+ description: Browser-based validation for web applications — verifies that implemented features actually work by testing them in a real browser. Starts the dev server, tests the feature end-to-end (click buttons, fill forms, verify results), and reports what's broken with screenshot evidence. Use this skill whenever the user says "test in browser", "check if it works", "does the feature work", "browser test", "validate the UI", or when auto-resolve needs to verify web changes actually function correctly. Also use proactively after implementing UI changes. The primary goal is feature verification, not just checking if pages render.
4
4
  ---
5
5
 
6
- Browser validation for web applications. Starts a dev server, tests in a real browser (or falls back to Playwright/curl), and reports findings with evidence. Designed to catch the bugs that pass every static check but break when a user actually clicks something: runtime errors, failed API calls, blank pages, broken interactions.
6
+ Verify that implemented features actually work in the browser. The primary job is to test the feature that was just built: click the button, fill the form, check the result. Smoke tests and visual checks are supporting checks, not the main event.
7
+
8
+ The whole point of browser validation is to catch the gap between "code looks correct" and "user can actually do the thing." Static analysis and unit tests can confirm the code is well-structured. Browser validation confirms it *works*.
7
9
 
8
10
  <config>
9
11
  $ARGUMENTS
@@ -13,80 +15,74 @@ $ARGUMENTS
13
15
 
14
16
  ## PHASE 1: DETECT
15
17
 
16
- 1. **Framework detection**: Read `package.json` → identify framework from dependencies (`next`, `vite`, `react-scripts`, `nuxt`, `astro`, `svelte`, `remix`, `angular`). Find the start command from `scripts.dev`, `scripts.start`, or `scripts.preview`.
18
+ 1. **What was built**: This is the most important input. Read `.claude/done-criteria.md` if it exists; it tells you what the feature is supposed to do. If it doesn't exist, read `git diff --stat` and `git log -1` to understand what changed. You need to know what to test before anything else.
19
+
20
+ 2. **Framework detection**: Read `package.json` → identify framework and start command from `scripts.dev`, `scripts.start`, or `scripts.preview`.
17
21
 
18
- 2. **Port inference**: Check framework config files for custom ports. Defaults — Next.js: 3000, Vite: 5173, CRA: 3000, Nuxt: 3000, Astro: 4321, Angular: 4200. Override with `--port` flag if provided.
22
+ 3. **Port inference**: Defaults — Next.js: 3000, Vite: 5173, CRA: 3000, Nuxt: 3000, Astro: 4321, Angular: 4200. Override with `--port` flag.
19
23
 
20
- 3. **Affected routes**: Run `git diff --name-only` and map changed files to routes (e.g., `app/dashboard/page.tsx` → `/dashboard`, `src/pages/about.vue` → `/about`). These are the pages that need testing.
24
+ 4. **Affected routes**: Map changed files to routes (e.g., `app/dashboard/page.tsx` → `/dashboard`).
21
25
 
22
- 4. **Tier selection** — pick the best available browser tool:
23
- - Check if `mcp__claude-in-chrome__*` tools exist in available tools → **Tier 1** (Chrome DevTools). Read `references/tier1-chrome.md`.
24
- - Else check if `mcp__playwright__*` tools exist (Playwright MCP installed via `npx devlyn-cli`) OR run `npx playwright --version 2>/dev/null` → **Tier 2** (Playwright). Read `references/tier2-playwright.md`.
26
+ 5. **Tier selection** — pick the best available browser tool:
27
+ - Check if `mcp__claude-in-chrome__*` tools exist → **Tier 1** (Chrome DevTools). Read `references/tier1-chrome.md`.
28
+ - Else check if `mcp__playwright__*` tools exist or `npx playwright --version` succeeds → **Tier 2** (Playwright). Read `references/tier2-playwright.md`.
25
29
  - Else → **Tier 3** (HTTP smoke). Read `references/tier3-curl.md`.
26
30
 
27
- 5. **Skip gate**: If no web-relevant files changed (no `*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `*.astro`, `*.css`, `*.scss`, `*.html`, `page.*`, `layout.*`, `route.*`, `+page.*`, `+layout.*`), skip the entire phase. Report: "Browser validation skipped — no web changes detected."
31
+ 6. **Skip gate**: If no web-relevant files changed (no `*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `*.astro`, `*.css`, `*.scss`, `*.html`, `page.*`, `layout.*`, `route.*`, `+page.*`, `+layout.*`), skip. Report: "Browser validation skipped — no web changes detected."
28
32
 
29
- 6. **Parse flags** from `<config>`:
30
- - `--skip-flow` — skip flow testing, only run smoke + visual
33
+ 7. **Parse flags** from `<config>`:
34
+ - `--skip-feature` — skip feature testing, only run smoke + visual
31
35
  - `--port PORT` — override detected port
32
36
  - `--tier N` — force a specific tier (1, 2, or 3)
33
- - `--mobile-only`: only test mobile viewport
34
- - `--desktop-only` — only test desktop viewport
37
+ - `--mobile-only` / `--desktop-only`: limit viewport testing
35
38
 
36
39
  Announce:
37
40
  ```
38
41
  Browser validation starting
42
+ Feature: [what was built, from done-criteria or git diff]
39
43
  Framework: [detected] | Port: [PORT] | Tier: [N — name]
40
- Affected routes: [list]
41
- Phases: Smoke → [Flow] → Visual → Report
44
+ Phases: Server → Smoke → Feature Test → Visual → Report
42
45
  ```
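The detection steps in Phase 1 can be sketched as a few pure helpers. This is only an illustration under stated assumptions: the function names, the framework keys, and the Next.js app-router mapping are hypothetical and do not come from devlyn-cli itself. The port table and file patterns are copied from the Port inference and Skip gate steps.

```typescript
// Hypothetical helpers; none of these names exist in devlyn-cli.

// Default dev-server ports from the "Port inference" step.
// Keys are illustrative labels, not necessarily package.json dependency names.
const DEFAULT_PORTS: Record<string, number> = {
  next: 3000,
  vite: 5173,
  "react-scripts": 3000, // CRA
  nuxt: 3000,
  astro: 4321,
  angular: 4200,
};

// File patterns from the "Skip gate" step, expressed as regular expressions.
const WEB_FILE_PATTERNS: RegExp[] = [
  /\.(tsx|jsx|vue|svelte|astro|css|scss|html)$/, // *.tsx, *.jsx, ...
  /(^|\/)(page|layout|route)\.[^/]+$/, // page.*, layout.*, route.*
  /(^|\/)\+(page|layout)\.[^/]+$/, // +page.*, +layout.*
];

// A --port override wins; otherwise fall back to the framework default.
function inferPort(framework: string, override?: number): number | undefined {
  return override ?? DEFAULT_PORTS[framework];
}

// True when at least one changed file looks web-relevant.
function hasWebChanges(changedFiles: string[]): boolean {
  return changedFiles.some((f) => WEB_FILE_PATTERNS.some((re) => re.test(f)));
}

// Assumed app-router convention: app/<segments>/page.<ext> maps to /<segments>.
function appRouterRoute(file: string): string | undefined {
  const m = /^app\/(?:(.+)\/)?page\.[^/]+$/.exec(file);
  if (!m) return undefined;
  return m[1] ? `/${m[1]}` : "/";
}
```

Under these assumptions, `hasWebChanges(["README.md"])` is false, so the skip gate would fire, while `appRouterRoute("app/dashboard/page.tsx")` yields `/dashboard`, matching the example in the Affected routes step.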
43
46
 
44
47
  ## PHASE 2: SERVER
45
48
 
46
- 1. Start the dev server in background: run the detected start command via Bash with `run_in_background: true`.
47
- 2. Health-check loop: poll `http://localhost:PORT` every 2 seconds using `curl -s -o /dev/null -w "%{http_code}"`. Timeout after 20 seconds.
48
- 3. If the server fails to start, capture stderr output and report as BLOCKED:
49
- ```
50
- Verdict: BLOCKED
51
- Reason: Dev server failed to start within 20s
52
- Error: [stderr output]
53
- ```
54
- Write this to `.claude/BROWSER-RESULTS.md` and stop.
55
- 4. Record the server PID for cleanup.
49
+ Get the dev server running. If it doesn't start, diagnose and fix; don't just report failure.
50
+
51
+ 1. Start the dev server in background via Bash with `run_in_background: true`.
52
+ 2. Health-check: poll `http://localhost:PORT` every 2s, timeout 30s. Ready when you get an HTTP response.
53
+ 3. **If it doesn't come up — troubleshoot** (up to 2 attempts): read stderr for the error, fix it (npm install, port conflict, build error, etc.), restart, re-check.
54
+ 4. If still down after 2 attempts: write BLOCKED verdict and stop.
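The health-check in step 2 amounts to a bounded polling loop. The sketch below is one possible shape, not the skill's actual implementation: `waitForServer`, `Probe`, and the injected `sleep` are hypothetical names, and the loop counts probes rather than measuring wall-clock time, so it only approximates the 30s timeout.

```typescript
// Hypothetical sketch; `probe` and `sleep` are injected so the loop can be
// exercised without a real dev server.
type Probe = () => Promise<boolean>;

async function waitForServer(
  probe: Probe,
  intervalMs = 2000, // poll every 2s, as in step 2
  timeoutMs = 30000, // give up after roughly 30s
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
  // Approximation: probe duration is ignored, so 30s / 2s gives 16 probes.
  const attempts = Math.floor(timeoutMs / intervalMs) + 1;
  for (let i = 0; i < attempts; i++) {
    if (await probe()) return true; // any HTTP response counts as ready
    if (i < attempts - 1) await sleep(intervalMs);
  }
  return false; // caller moves on to troubleshooting or a BLOCKED verdict
}
```

A real probe would wrap an HTTP request to `http://localhost:PORT` and resolve to `true` on any response and `false` on a connection error. A `false` result from `waitForServer` would then lead into the troubleshooting attempts described in step 3.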
56
55
 
57
- ## PHASE 3: SMOKE
56
+ ## PHASE 3: SMOKE (quick prerequisite)
58
57
 
59
- Test that the app renders and the runtime is clean. Follow the tier-specific reference file for exact tool calls.
58
+ Quick check that the app is alive. This is not the main test; it's a gate to make sure feature testing is even possible.
60
59
 
61
- For each affected route (and always `/` as the first):
62
- 1. Navigate to the page
63
- 2. Verify the page has meaningful content (not a blank page, not a raw error)
64
- 3. Capture console messages — filter for errors (ignore React dev-mode warnings, HMR noise, favicon 404s)
65
- 4. Capture network requests — flag any 4xx/5xx responses or CORS failures (ignore HMR websocket, source maps)
66
- 5. Take a screenshot as evidence
60
+ Navigate to `/` and each affected route. For each page, judge: is this the actual application, or an error page? A connection error, framework error overlay, or blank shell is not the app. If broken, try to fix (read console errors, fix source, let hot-reload pick it up). Up to 2 fix attempts per route.
67
61
 
68
- A route **fails smoke** if: the page is blank, shows an unhandled error, has console errors (excluding known dev noise), or has failed network requests to the app's own API.
62
+ If the app isn't rendering, the verdict is BLOCKED: feature testing can't happen.
69
63
 
70
- ## PHASE 4: FLOW (conditional)
64
+ ## PHASE 4: FEATURE TEST (the main event)
71
65
 
72
- Skip if `--skip-flow` is set or if `.claude/done-criteria.md` doesn't exist.
66
+ This is the primary purpose of browser validation. Everything else is in service of getting here.
73
67
 
74
- Read `references/flow-testing.md` for how to convert done-criteria into browser test steps. Then execute each test step using the tier-specific tools.
68
+ Read `.claude/done-criteria.md` (or infer from git diff what was built). For each criterion that describes something a user can do or see in the UI, test it end-to-end in the browser:
75
69
 
76
- For each flow test:
77
- 1. Execute the action sequence (navigate → find → interact → verify)
78
- 2. After each interaction, check console + network for new errors
79
- 3. Screenshot at each verification point
80
- 4. Record pass/fail with evidence
70
+ 1. **Plan the test**: What would a user do to verify this feature works? Navigate where, click what, type what, expect what result?
71
+ 2. **Execute it**: Navigate to the page, find the interactive elements, perform the actions, verify the outcome. Read `references/flow-testing.md` for patterns on converting criteria to browser steps.
72
+ 3. **Capture evidence**: Screenshot at each key step. Record console errors and network failures that happen during the interaction.
73
+ 4. **If it fails, try to fix**: Read the error (console, network, or the UI state) to understand why the feature broke. Fix the source code, let hot-reload update, and re-test. Up to 2 fix attempts per criterion.
74
+ 5. **Record the result**: For each criterion — PASS (feature works as specified), FAIL (feature doesn't work, include what went wrong), or SKIPPED (criterion isn't browser-testable, e.g., "API returns 401").
81
75
 
82
- ## PHASE 5: VISUAL
76
+ The verdict depends primarily on this phase. If the implemented features don't work in the browser, the validation fails — even if every page renders perfectly and the layout looks great.
83
77
 
84
- Test layout integrity at two viewports (skip one if `--mobile-only` or `--desktop-only` is set):
78
+ ## PHASE 5: VISUAL (supporting check)
85
79
 
86
- 1. **Mobile** (375x812): resize → navigate to each affected route → screenshot → check for overflow, overlapping elements, unreadable text
87
- 2. **Desktop** (1280x800): resize → navigate to each affected route → screenshot → check for broken layouts, missing sections
80
+ Quick layout check at two viewports (skip if `--mobile-only` or `--desktop-only`):
88
81
 
89
- This is judgment-based: the agent looks at screenshots and reports visible issues. Not pixel-diff.
82
+ 1. **Mobile** (375x812): screenshot each affected route, check for overflow/overlap/unreadable text
83
+ 2. **Desktop** (1280x800): screenshot each affected route, check for broken layouts
84
+
85
+ Judgment-based — look at the screenshots and report visible issues.
90
86
 
91
87
  ## PHASE 6: REPORT
92
88
 
@@ -96,23 +92,24 @@ Write `.claude/BROWSER-RESULTS.md`:
96
92
  # Browser Validation Results
97
93
 
98
94
  ## Verdict: [PASS / PASS WITH ISSUES / NEEDS WORK / BLOCKED]
99
- Verdict rules: BLOCKED = app won't start or root page crashes. NEEDS WORK = flow tests fail or console errors on affected routes. PASS WITH ISSUES = visual issues or minor warnings. PASS = clean across all checks.
95
+ Verdict rules:
96
+ - BLOCKED = server won't start or app doesn't render
97
+ - NEEDS WORK = implemented features don't work in the browser (this is the primary failure mode)
98
+ - PASS WITH ISSUES = features work but visual issues or minor warnings exist
99
+ - PASS = features verified working, pages render, layout clean
100
100
 
101
- ## Environment
102
- - Framework: [detected]
103
- - Dev server: [command] on port [PORT]
104
- - Browser tier: [1/2/3 — name]
105
- - Startup time: [N]s
101
+ ## What Was Tested
102
+ [Brief description of the feature/task from done-criteria or git diff]
106
103
 
107
- ## Smoke Test
108
- | Route | Renders | Console Errors | Network Failures | Screenshot |
109
- |-------|---------|---------------|-----------------|------------|
110
- | / | YES/NO | [count]: [details] | [count]: [details] | [path] |
104
+ ## Feature Verification (primary)
105
+ | Criterion | Test Steps | Result | Evidence |
106
+ |-----------|-----------|--------|----------|
107
+ | [what should work] | [what you did] | PASS/FAIL/SKIPPED | [screenshot, errors, what went wrong] |
111
108
 
112
- ## Flow Tests
113
- | Criterion | Steps | Result | Evidence |
114
- |-----------|-------|--------|----------|
115
- | [text] | [N] | PASS/FAIL | [screenshot, errors] |
109
+ ## Smoke Test (prerequisite)
110
+ | Route | Renders | Console Errors | Network Failures |
111
+ |-------|---------|---------------|-----------------|
112
+ | / | YES/NO | [count] | [count] |
116
113
 
117
114
  ## Visual Check
118
115
  | Viewport | Route | Issues |
@@ -120,15 +117,18 @@ Verdict rules: BLOCKED = app won't start or root page crashes. NEEDS WORK = flow
120
117
  | Mobile (375px) | / | [issues or "Clean"] |
121
118
  | Desktop (1280px) | / | [issues or "Clean"] |
122
119
 
123
- ## Runtime Errors (full log)
124
- [all unique console errors, deduplicated]
120
+ ## Fixes Applied During Validation
121
+ [List any bugs found and fixed during testing — server startup issues, broken routes, feature bugs]
122
+
123
+ ## Runtime Errors
124
+ [Console errors captured during testing]
125
125
 
126
126
  ## Failed Network Requests
127
- [all failed requests with URL, status, and context]
127
+ [Failed API calls captured during testing]
128
128
  ```
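The 1.3.1 verdict rules in the report template read as a priority order, which can be sketched as a small function. The type and field names here are hypothetical. Treating SKIPPED criteria as non-failing is an assumption, since the rules do not say how skipped criteria affect the verdict.

```typescript
// Hypothetical types; field names are illustrative, not taken from devlyn-cli.
type Verdict = "PASS" | "PASS WITH ISSUES" | "NEEDS WORK" | "BLOCKED";
type CriterionResult = "PASS" | "FAIL" | "SKIPPED";

interface ValidationSummary {
  serverStarted: boolean; // Phase 2 outcome
  appRenders: boolean; // Phase 3 outcome
  criteria: CriterionResult[]; // Phase 4 outcome, one entry per criterion
  visualIssues: number; // Phase 5 findings
  minorWarnings: number;
}

function decideVerdict(s: ValidationSummary): Verdict {
  // BLOCKED = server won't start or app doesn't render
  if (!s.serverStarted || !s.appRenders) return "BLOCKED";
  // NEEDS WORK = an implemented feature doesn't work in the browser
  // (assumption: SKIPPED criteria do not count as failures)
  if (s.criteria.some((r) => r === "FAIL")) return "NEEDS WORK";
  // PASS WITH ISSUES = features work but visual issues or minor warnings exist
  if (s.visualIssues > 0 || s.minorWarnings > 0) return "PASS WITH ISSUES";
  return "PASS";
}
```

Checking the conditions in this order reflects the stated emphasis: a failing feature yields NEEDS WORK even when every page renders and the layout is clean.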
129
129
 
130
130
  ## PHASE 7: CLEANUP
131
131
 
132
- Kill the dev server process using the stored PID. If running inside auto-resolve pipeline and the `--keep-server` flag was passed (set by the pipeline orchestrator), skip cleanup — the pipeline will handle it.
132
+ Kill the dev server PID. If `--keep-server` was passed (auto-resolve pipeline), skip — the pipeline handles cleanup.
133
133
 
134
134
  </workflow>
@@ -41,7 +41,7 @@ After navigating, wait 2-3 seconds for client-side rendering, then call `get_pag
41
41
  ```
42
42
  get_page_text → extract visible text content
43
43
  ```
44
- Page fails if: text is empty, contains only boilerplate (e.g., just "Loading..."), or contains raw error stack traces.
44
+ Read the text and judge: is this the actual application, or an error/fallback page? Browser error pages, framework error overlays, "Unable to connect" screens, and empty shells all have text — but they're not the app. If the page content doesn't look like what the application is supposed to show, it's a failure.
45
45
 
46
46
  ### Read page structure
47
47
  ```
@@ -70,10 +70,24 @@ test.describe('Smoke Tests', () => {
70
70
  }
71
71
  });
72
72
 
73
+ // If goto throws (connection refused), the test fails — that's correct behavior
73
74
  await page.goto(`http://localhost:${PORT}${route}`, { waitUntil: 'networkidle', timeout: 15000 });
74
75
 
75
- const bodyText = await page.textContent('body');
76
- expect(bodyText?.trim().length).toBeGreaterThan(0);
76
+ // Verify this is the actual application, not an error page.
77
+ // When a server is down or a route is broken, the browser shows an error page
78
+ // that still has text content — "Unable to connect", "This site can't be reached", etc.
79
+ // A naive length check would pass on these. The title is the best signal:
80
+ // browser error pages have titles like "Problem loading page" or the URL itself,
81
+ // while real apps have meaningful titles set by the application.
82
+ const title = await page.title();
83
+ const bodyText = await page.textContent('body') || '';
84
+
85
+ // Page must have substantive content
86
+ expect(bodyText.trim().length, 'Page body is empty').toBeGreaterThan(0);
87
+
88
+ // Fail if the page navigation itself failed (Playwright sets title to the URL on error)
89
+ const pageUrl = page.url();
90
+ expect(title, 'Page shows a browser error — server may be down').not.toBe(pageUrl);
77
91
 
78
92
  await page.screenshot({ path: `.claude/screenshots/smoke${route.replace(/\//g, '-') || '-root'}.png`, fullPage: true });
79
93
 
@@ -37,8 +37,9 @@ HTML=$(curl -s http://localhost:{PORT}{route} --max-time 10)
37
37
  ### Pass Criteria
38
38
 
39
39
  A route passes if:
40
- 1. `STATUS` is `200` (or `304`)
41
- 2. HTML contains `<body` tag
40
+ 1. curl succeeds (doesn't error out with connection refused or timeout)
41
+ 2. `STATUS` is `200` (or `301`, `302`, `304`) — not `000`, not `5xx`
42
+ 3. HTML contains `<body` tag
42
43
  3. HTML body has more than 100 characters of text content (not just empty divs)
43
44
  4. HTML does not contain server error indicators: `Internal Server Error`, `500`, `ECONNREFUSED`, `Cannot GET`, `404`
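The Tier 3 pass criteria can be expressed as a single predicate over the curl exit code, the status, and the fetched HTML. The following is a rough sketch with hypothetical names (`routePasses`, `visibleText`). The regex-based tag stripping is a crude stand-in for "text content" and is an assumption, not the reference file's method.

```typescript
// Hypothetical sketch of the Tier 3 pass criteria; names are illustrative.
const ACCEPTED_STATUS = [200, 301, 302, 304];
const ERROR_MARKERS = ["Internal Server Error", "500", "ECONNREFUSED", "Cannot GET", "404"];

// Crude text extraction: drop scripts, styles, and tags from the body onward.
function visibleText(html: string): string {
  const start = html.indexOf("<body");
  const body = start >= 0 ? html.slice(start) : "";
  return body
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]*>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

function routePasses(curlExitCode: number, status: number, html: string): boolean {
  if (curlExitCode !== 0) return false; // connection refused or timeout
  if (ACCEPTED_STATUS.indexOf(status) === -1) return false; // rejects 000 (parsed as 0) and 5xx
  if (html.indexOf("<body") === -1) return false; // must contain a <body tag
  if (visibleText(html).length <= 100) return false; // not just empty divs
  return !ERROR_MARKERS.some((m) => html.indexOf(m) !== -1); // no error indicators
}
```

One consequence of matching the indicators as plain substrings, as the list is written, is that a healthy page that happens to contain "404" or "500" anywhere in its markup would also fail this check.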
44
45
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "devlyn-cli",
3
- "version": "1.3.0",
3
+ "version": "1.3.1",
4
4
  "description": "Claude Code configuration toolkit for teams",
5
5
  "bin": {
6
6
  "devlyn": "bin/devlyn.js"