devlyn-cli 1.2.1 → 1.3.0

package/bin/devlyn.js CHANGED
@@ -154,6 +154,7 @@ const OPTIONAL_ADDONS = [
154
154
  { name: 'Leonxlnx/taste-skill', desc: 'Premium frontend design skills — modern layouts, animations, and visual refinement', type: 'external' },
155
155
  // MCP servers (installed via claude mcp add)
156
156
  { name: 'codex-cli', desc: 'Codex MCP server for cross-model evaluation via OpenAI Codex', type: 'mcp', command: 'npx -y codex-mcp-server' },
157
+ { name: 'playwright', desc: 'Playwright MCP for browser testing — powers devlyn:browser-validate Tier 2', type: 'mcp', command: 'npx -y @anthropic-ai/mcp-playwright' },
157
158
  ];
158
159
 
159
160
  function log(msg, color = 'reset') {
@@ -544,8 +545,10 @@ async function init(skipPrompts = false) {
544
545
  const pipelinePermissions = [
545
546
  'Write(.claude/done-criteria.md)',
546
547
  'Write(.claude/EVAL-FINDINGS.md)',
548
+ 'Write(.claude/BROWSER-RESULTS.md)',
547
549
  'Edit(.claude/done-criteria.md)',
548
550
  'Edit(.claude/EVAL-FINDINGS.md)',
551
+ 'Edit(.claude/BROWSER-RESULTS.md)',
549
552
  'Bash(git add *)',
550
553
  'Bash(git commit *)',
551
554
  'Bash(git diff *)',
@@ -19,6 +19,7 @@ $ARGUMENTS
19
19
  - `--skip-review` (false) — skip team-review phase
20
20
  - `--security-review` (auto) — run dedicated security audit. Auto-detects: runs when changes touch auth, secrets, user data, API endpoints, env/config, or crypto. Force with `--security-review always` or skip with `--security-review skip`
21
21
  - `--skip-clean` (false) — skip clean phase
22
+ - `--skip-browser` (false) — skip browser validation phase (auto-skipped for non-web changes)
22
23
  - `--skip-docs` (false) — skip update-docs phase
23
24
  - `--with-codex` (false) — use OpenAI Codex as a cross-model evaluator/reviewer via `mcp__codex-cli__*` MCP tools. Accepts: `evaluate`, `review`, or `both` (default when flag is present without value). When enabled, Codex provides an independent second opinion from a different model family, creating a GAN-like dynamic where Claude builds and Codex critiques.
24
25
 
@@ -32,7 +33,7 @@ $ARGUMENTS
32
33
  ```
33
34
  Auto-resolve pipeline starting
34
35
  Task: [extracted task description]
35
- Phases: Build → Evaluate → [Fix loop if needed] → Simplify → [Review] → [Security] → [Clean] → [Docs]
36
+ Phases: Build → [Browser] → Evaluate → [Fix loop if needed] → Simplify → [Review] → [Security] → [Clean] → [Docs]
36
37
  Max evaluation rounds: [N]
37
38
  Cross-model evaluation (Codex): [evaluate / review / both / disabled]
38
39
  ```
@@ -75,6 +76,24 @@ The task is: [paste the task description here]
75
76
  3. If no changes were made, report failure and stop
76
77
  4. **Checkpoint**: Run `git add -A && git commit -m "chore(pipeline): phase 1 — build complete"` to create a rollback point
77
78
 
79
+ ## PHASE 1.5: BROWSER VALIDATE (conditional)
80
+
81
+ Skip if `--skip-browser` was set.
82
+
83
+ 1. **Check relevance**: Run `git diff --name-only` and check for web-relevant files (`*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `*.css`, `*.html`, `page.*`, `layout.*`, `route.*`). If none found, skip and note "Browser validation skipped — no web changes detected."
84
+
85
+ 2. **Run validation**: Spawn a subagent using the Agent tool with `mode: "bypassPermissions"`.
86
+
87
+ Agent prompt — pass this to the Agent tool:
88
+
89
+ You are a browser validation agent. Read the skill instructions at `.claude/skills/devlyn:browser-validate/SKILL.md` and follow the full workflow to validate this web application. The dev server should be started, tested, and left running (pass `--keep-server` internally) — the pipeline will clean it up later. Write your findings to `.claude/BROWSER-RESULTS.md`.
90
+
91
+ **After the agent completes**:
92
+ 1. Read `.claude/BROWSER-RESULTS.md`
93
+ 2. Extract the verdict
94
+ 3. If `BLOCKED` → the app doesn't even render. Go directly to PHASE 2.5 fix loop with browser findings as context.
95
+ 4. Otherwise → continue to PHASE 2 (the evaluator will read `BROWSER-RESULTS.md` as additional evidence)
96
+
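
The step 1 relevance check is just a filename filter. A minimal sketch, with `is_web_file` as a hypothetical helper name (the pipeline itself simply scans `git diff --name-only` output against the same patterns):

```shell
# Illustrative sketch of the web-relevance gate from step 1.
# is_web_file is a hypothetical helper; the patterns mirror the list above.
is_web_file() {
  case "$1" in
    *.tsx|*.jsx|*.vue|*.svelte|*.css|*.html) return 0 ;;
    */page.*|page.*|*/layout.*|layout.*|*/route.*|route.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch: skip browser validation when nothing web-relevant changed.
# git diff --name-only | while read -r f; do is_web_file "$f" && echo "web change: $f"; done
```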
78
97
  ## PHASE 2: EVALUATE
79
98
 
80
99
  Spawn a subagent using the Agent tool with `mode: "bypassPermissions"` to evaluate the work. Include all evaluation instructions inline.
@@ -250,6 +269,9 @@ After all phases complete:
250
269
  1. Clean up temporary files:
251
270
  - Delete `.claude/done-criteria.md`
252
271
  - Delete `.claude/EVAL-FINDINGS.md`
272
+ - Delete `.claude/BROWSER-RESULTS.md` (if exists)
273
+ - Delete `.claude/screenshots/` directory (if exists)
274
+ - Kill any dev server process still running from browser validation
253
275
 
254
276
  2. Run `git log --oneline -10` to show commits made during the pipeline
255
277
 
@@ -264,6 +286,7 @@ After all phases complete:
264
286
  | Phase | Status | Notes |
265
287
  |-------|--------|-------|
266
288
  | Build (team-resolve) | [completed] | [brief summary] |
289
+ | Browser validate | [completed / skipped / auto-skipped] | [verdict, tier used, console errors, flow results] |
267
290
  | Evaluate (Claude) | [PASS/NEEDS WORK after N rounds] | [verdict + key findings] |
268
291
  | Evaluate (Codex) | [completed / skipped] | [Codex-only findings count, merged verdict] |
269
292
  | Fix rounds | [N rounds / skipped] | [what was fixed] |
@@ -0,0 +1,134 @@
1
+ ---
2
+ name: devlyn:browser-validate
3
+ description: Browser-based validation for web applications — smoke tests, user flow testing, visual checks, and runtime error detection. Starts the dev server, navigates the app, and reports what's broken with screenshot evidence. Use this skill whenever the user says "test in browser", "check if it works", "smoke test", "browser test", "validate the UI", "does the app run", or when auto-resolve needs to verify web changes actually render and function correctly. Also use proactively after implementing UI changes to catch runtime errors that static analysis misses.
4
+ ---
5
+
6
+ Browser validation for web applications. Starts a dev server, tests in a real browser (or falls back to Playwright/curl), and reports findings with evidence. Designed to catch the bugs that pass every static check but break when a user actually clicks something — runtime errors, failed API calls, blank pages, broken interactions.
7
+
8
+ <config>
9
+ $ARGUMENTS
10
+ </config>
11
+
12
+ <workflow>
13
+
14
+ ## PHASE 1: DETECT
15
+
16
+ 1. **Framework detection**: Read `package.json` → identify framework from dependencies (`next`, `vite`, `react-scripts`, `nuxt`, `astro`, `svelte`, `remix`, `angular`). Find the start command from `scripts.dev`, `scripts.start`, or `scripts.preview`.
17
+
18
+ 2. **Port inference**: Check framework config files for custom ports. Defaults — Next.js: 3000, Vite: 5173, CRA: 3000, Nuxt: 3000, Astro: 4321, Angular: 4200. Override with `--port` flag if provided.
19
+
20
+ 3. **Affected routes**: Run `git diff --name-only` and map changed files to routes (e.g., `app/dashboard/page.tsx` → `/dashboard`, `src/pages/about.vue` → `/about`). These are the pages that need testing.
21
+
22
+ 4. **Tier selection** — pick the best available browser tool:
23
+ - Check if `mcp__claude-in-chrome__*` tools exist in available tools → **Tier 1** (Chrome DevTools). Read `references/tier1-chrome.md`.
24
+ - Else check if `mcp__playwright__*` tools exist (Playwright MCP installed via `npx devlyn-cli`) OR `npx playwright --version 2>/dev/null` exits successfully → **Tier 2** (Playwright). Read `references/tier2-playwright.md`.
25
+ - Else → **Tier 3** (HTTP smoke). Read `references/tier3-curl.md`.
26
+
27
+ 5. **Skip gate**: If no web-relevant files changed (no `*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `*.astro`, `*.css`, `*.scss`, `*.html`, `page.*`, `layout.*`, `route.*`, `+page.*`, `+layout.*`), skip the entire phase. Report: "Browser validation skipped — no web changes detected."
28
+
29
+ 6. **Parse flags** from `<config>`:
30
+ - `--skip-flow` — skip flow testing, only run smoke + visual
31
+ - `--port PORT` — override detected port
32
+ - `--tier N` — force a specific tier (1, 2, or 3)
33
+ - `--mobile-only` — only test mobile viewport
34
+ - `--desktop-only` — only test desktop viewport
35
+
36
+ Announce:
37
+ ```
38
+ Browser validation starting
39
+ Framework: [detected] | Port: [PORT] | Tier: [N — name]
40
+ Affected routes: [list]
41
+ Phases: Smoke → [Flow] → Visual → Report
42
+ ```
43
+
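
Steps 1 through 3 can be sketched as small shell helpers. All function names and heuristics below are illustrative assumptions, not part of the skill; a real run inspects `package.json` scripts and framework config files, and the route mapping covers only the two path styles used in the example.

```shell
# Hypothetical helpers for DETECT steps 1-3 (names and heuristics are assumptions).

# Step 1: crude framework detection by grepping package.json dependency keys.
detect_framework() {
  local pkg="${1:-package.json}"
  for fw in next vite react-scripts nuxt astro svelte remix; do
    grep -q "\"$fw\"" "$pkg" && { echo "$fw"; return 0; }
  done
  echo "unknown"
}

# Step 2: framework default ports from the table above.
default_port() {
  case "$1" in
    vite) echo 5173 ;;
    astro) echo 4321 ;;
    angular) echo 4200 ;;
    *) echo 3000 ;;   # Next.js, CRA, Nuxt default
  esac
}

# Step 3: map a changed file to a route (app-router and pages-dir styles only).
file_to_route() {
  local r="$1"
  case "$r" in
    app/*page.*)  r="${r#app}"; r="${r%/page.*}"; echo "${r:-/}" ;;
    src/pages/*)  r="${r#src/pages}"; r="${r%.*}"
                  [ "$r" = "/index" ] && r="/"; echo "${r:-/}" ;;
    *) return 1 ;;   # not a routable file
  esac
}
```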
44
+ ## PHASE 2: SERVER
45
+
46
+ 1. Start the dev server in background: run the detected start command via Bash with `run_in_background: true`.
47
+ 2. Health-check loop: poll `http://localhost:PORT` every 2 seconds using `curl -s -o /dev/null -w "%{http_code}"`. Timeout after 20 seconds.
48
+ 3. If the server fails to start, capture stderr output and report as BLOCKED:
49
+ ```
50
+ Verdict: BLOCKED
51
+ Reason: Dev server failed to start within 20s
52
+ Error: [stderr output]
53
+ ```
54
+ Write this to `.claude/BROWSER-RESULTS.md` and stop.
55
+ 4. Record the server PID for cleanup.
56
+
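
A minimal sketch of the start-and-poll sequence, assuming `curl` is available (`wait_for_server` is a hypothetical name; the skill issues the same probe inline):

```shell
# Illustrative health-check loop: poll until the server answers or we time out.
wait_for_server() {
  local port="$1" timeout="${2:-20}" elapsed=0 code
  while [ "$elapsed" -lt "$timeout" ]; do
    code=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:$port" --max-time 2)
    case "$code" in
      2??|3??) return 0 ;;   # server is up
    esac
    sleep 2
    elapsed=$((elapsed + 2))
  done
  return 1   # caller reports BLOCKED
}

# Usage sketch:
# npm run dev &                       # started via Bash with run_in_background in practice
# SERVER_PID=$!                       # recorded for PHASE 7 cleanup
# wait_for_server 3000 20 || echo "Verdict: BLOCKED"
```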
57
+ ## PHASE 3: SMOKE
58
+
59
+ Test that the app renders and the runtime is clean. Follow the tier-specific reference file for exact tool calls.
60
+
61
+ For each affected route (and always `/` as the first):
62
+ 1. Navigate to the page
63
+ 2. Verify the page has meaningful content (not a blank page, not a raw error)
64
+ 3. Capture console messages — filter for errors (ignore React dev-mode warnings, HMR noise, favicon 404s)
65
+ 4. Capture network requests — flag any 4xx/5xx responses or CORS failures (ignore HMR websocket, source maps)
66
+ 5. Take a screenshot as evidence
67
+
68
+ A route **fails smoke** if: the page is blank, shows an unhandled error, has console errors (excluding known dev noise), or has failed network requests to the app's own API.
69
+
70
+ ## PHASE 4: FLOW (conditional)
71
+
72
+ Skip if `--skip-flow` is set or if `.claude/done-criteria.md` doesn't exist.
73
+
74
+ Read `references/flow-testing.md` for how to convert done-criteria into browser test steps. Then execute each test step using the tier-specific tools.
75
+
76
+ For each flow test:
77
+ 1. Execute the action sequence (navigate → find → interact → verify)
78
+ 2. After each interaction, check console + network for new errors
79
+ 3. Screenshot at each verification point
80
+ 4. Record pass/fail with evidence
81
+
82
+ ## PHASE 5: VISUAL
83
+
84
+ Test layout integrity at two viewports (skip one if `--mobile-only` or `--desktop-only` is set):
85
+
86
+ 1. **Mobile** (375x812): resize → navigate to each affected route → screenshot → check for overflow, overlapping elements, unreadable text
87
+ 2. **Desktop** (1280x800): resize → navigate to each affected route → screenshot → check for broken layouts, missing sections
88
+
89
+ This is judgment-based — the agent looks at screenshots and reports visible issues. Not pixel-diff.
90
+
91
+ ## PHASE 6: REPORT
92
+
93
+ Write `.claude/BROWSER-RESULTS.md`:
94
+
95
+ ```markdown
96
+ # Browser Validation Results
97
+
98
+ ## Verdict: [PASS / PASS WITH ISSUES / NEEDS WORK / BLOCKED]
99
+ Verdict rules: BLOCKED = app won't start or root page crashes. NEEDS WORK = flow tests fail or console errors on affected routes. PASS WITH ISSUES = visual issues or minor warnings. PASS = clean across all checks.
100
+
101
+ ## Environment
102
+ - Framework: [detected]
103
+ - Dev server: [command] on port [PORT]
104
+ - Browser tier: [1/2/3 — name]
105
+ - Startup time: [N]s
106
+
107
+ ## Smoke Test
108
+ | Route | Renders | Console Errors | Network Failures | Screenshot |
109
+ |-------|---------|---------------|-----------------|------------|
110
+ | / | YES/NO | [count]: [details] | [count]: [details] | [path] |
111
+
112
+ ## Flow Tests
113
+ | Criterion | Steps | Result | Evidence |
114
+ |-----------|-------|--------|----------|
115
+ | [text] | [N] | PASS/FAIL | [screenshot, errors] |
116
+
117
+ ## Visual Check
118
+ | Viewport | Route | Issues |
119
+ |----------|-------|--------|
120
+ | Mobile (375px) | / | [issues or "Clean"] |
121
+ | Desktop (1280px) | / | [issues or "Clean"] |
122
+
123
+ ## Runtime Errors (full log)
124
+ [all unique console errors, deduplicated]
125
+
126
+ ## Failed Network Requests
127
+ [all failed requests with URL, status, and context]
128
+ ```
129
+
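
The verdict rules in the template reduce to a short decision ladder. A hedged sketch, with illustrative tallies the agent would collect from the earlier phases as inputs:

```shell
# Hypothetical encoding of the verdict rules; arguments are tallies from the run.
verdict() {
  local blocked="$1" flow_failures="$2" console_errors="$3" visual_issues="$4"
  if [ "$blocked" = "yes" ]; then
    echo "BLOCKED"            # app won't start or root page crashes
  elif [ "$flow_failures" -gt 0 ] || [ "$console_errors" -gt 0 ]; then
    echo "NEEDS WORK"         # flow failures or console errors on affected routes
  elif [ "$visual_issues" -gt 0 ]; then
    echo "PASS WITH ISSUES"   # visual issues or minor warnings
  else
    echo "PASS"               # clean across all checks
  fi
}
```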
130
+ ## PHASE 7: CLEANUP
131
+
132
+ Kill the dev server process using the stored PID. If running inside auto-resolve pipeline and the `--keep-server` flag was passed (set by the pipeline orchestrator), skip cleanup — the pipeline will handle it.
133
+
134
+ </workflow>
@@ -0,0 +1,118 @@
1
+ # Flow Testing: Done-Criteria to Browser Steps
2
+
3
+ How to read `.claude/done-criteria.md` and convert testable criteria into browser action sequences. This is the bridge between "what should work" and "prove it works in the browser."
4
+
5
+ Read this file only during PHASE 4 (FLOW) when done-criteria exists.
6
+
7
+ ---
8
+
9
+ ## Step 1: Classify Each Criterion
10
+
11
+ Read `.claude/done-criteria.md` and classify each criterion:
12
+
13
+ **Browser-testable** — the criterion describes something a user can see or do in the UI:
14
+ - "User can create a new project from the dashboard"
15
+ - "Error message appears when form is submitted with empty fields"
16
+ - "Navigation shows active state on current page"
17
+ - "Data table loads and displays 10 rows"
18
+
19
+ **Not browser-testable** — the criterion is about backend logic, data integrity, or code quality:
20
+ - "API returns 401 for unauthenticated requests"
21
+ - "Database migration runs without errors"
22
+ - "Test coverage exceeds 80%"
23
+ - "No TypeScript errors"
24
+
25
+ Skip non-browser-testable criteria. Note them as "Skipped — not browser-testable" in the report.
26
+
27
+ ## Step 2: Convert to Action Sequences
28
+
29
+ For each browser-testable criterion, generate a sequence of steps:
30
+
31
+ ### Pattern: Navigation + Verification
32
+ ```
33
+ Criterion: "Dashboard shows project count"
34
+ Steps:
35
+ 1. Navigate to /dashboard
36
+ 2. Find element containing project count (look for text matching a number pattern)
37
+ 3. Verify: element exists and contains a numeric value
38
+ 4. Screenshot
39
+ ```
40
+
41
+ ### Pattern: Form Interaction
42
+ ```
43
+ Criterion: "User can create a new project"
44
+ Steps:
45
+ 1. Navigate to /dashboard (or wherever the create action lives)
46
+ 2. Find "Create" or "New Project" button
47
+ 3. Click it
48
+ 4. Find form fields (name, description, etc.)
49
+ 5. Fill with test data: name="Test Project", description="Browser validation test"
50
+ 6. Find and click submit button
51
+ 7. Verify: success indicator appears (toast, redirect, new item in list)
52
+ 8. Screenshot at steps 3, 6, and 7
53
+ ```
54
+
55
+ ### Pattern: Error State
56
+ ```
57
+ Criterion: "Error message shows when form submitted empty"
58
+ Steps:
59
+ 1. Navigate to the form page
60
+ 2. Find submit button
61
+ 3. Click submit without filling any fields
62
+ 4. Verify: error message(s) visible
63
+ 5. Screenshot showing error state
64
+ ```
65
+
66
+ ### Pattern: Conditional UI
67
+ ```
68
+ Criterion: "Empty state shows when no data exists"
69
+ Steps:
70
+ 1. Navigate to the list/table page
71
+ 2. Check if data exists — if so, this test needs a clean state
72
+ 3. If clean state achievable: verify empty state message/illustration
73
+ 4. If not: skip with note "Cannot verify empty state — data already exists"
74
+ 5. Screenshot
75
+ ```
76
+
77
+ ## Step 3: Handle Data Dependencies
78
+
79
+ Some flow tests need specific data to exist (or not exist). Approach:
80
+
81
+ 1. **Read-only tests preferred** — test flows that verify existing state rather than create/modify
82
+ 2. **Create test data if safe** — if the flow creates something (like a project), use obvious test names ("Browser Validation Test — safe to delete")
83
+ 3. **Skip if destructive** — don't test delete flows, don't modify existing data, don't test flows that send emails or notifications
84
+ 4. **Note dependencies** — if a test can't run because of missing data, note it as "Skipped — requires [specific data state]"
85
+
86
+ ## Step 4: Handle Auth-Protected Pages
87
+
88
+ If a route requires authentication:
89
+ 1. Check if the app redirects to a login page
90
+ 2. If login is a simple form (email + password): note "Auth required — skipping unless test credentials available"
91
+ 3. If login uses OAuth/SSO: skip entirely, note "Skipped — requires OAuth flow"
92
+ 4. Do not attempt to log in with guessed credentials
93
+
94
+ ## Test Data Guidelines
95
+
96
+ When filling forms during flow tests, use obviously fake but valid data:
97
+ - Name: "Test User" or "Browser Validate Test"
98
+ - Email: "test@browser-validate.local"
99
+ - Description: "Created by browser-validate skill — safe to delete"
100
+ - Numbers: use small, obvious values (1, 10, 100)
101
+
102
+ This makes test data easy to identify and clean up later.
103
+
104
+ ## Output Format
105
+
106
+ For each flow test, report:
107
+
108
+ ```
109
+ Criterion: [original text from done-criteria]
110
+ Classification: browser-testable | skipped
111
+ Steps executed: [N of total]
112
+ Result: PASS | FAIL | SKIPPED
113
+ Evidence:
114
+ - Screenshot: [path]
115
+ - Console errors during flow: [count] — [details]
116
+ - Network failures during flow: [count] — [details]
117
+ - Failure point: [which step failed and why]
118
+ ```
@@ -0,0 +1,132 @@
1
+ # Tier 1: Chrome DevTools (claude-in-chrome)
2
+
3
+ The richest testing tier. Requires the claude-in-chrome MCP extension running in a Chrome browser. Provides full DOM interaction, console monitoring, network inspection, screenshots, and GIF recording.
4
+
5
+ Read this file only when Tier 1 was selected during DETECT phase.
6
+
7
+ ---
8
+
9
+ ## Setup
10
+
11
+ Before any browser interaction, load the tools you need via ToolSearch:
12
+ ```
13
+ ToolSearch: "select:mcp__claude-in-chrome__tabs_context_mcp"
14
+ ToolSearch: "select:mcp__claude-in-chrome__tabs_create_mcp"
15
+ ToolSearch: "select:mcp__claude-in-chrome__navigate"
16
+ ToolSearch: "select:mcp__claude-in-chrome__get_page_text"
17
+ ToolSearch: "select:mcp__claude-in-chrome__read_page"
18
+ ToolSearch: "select:mcp__claude-in-chrome__find"
19
+ ToolSearch: "select:mcp__claude-in-chrome__computer"
20
+ ToolSearch: "select:mcp__claude-in-chrome__form_input"
21
+ ToolSearch: "select:mcp__claude-in-chrome__resize_window"
22
+ ToolSearch: "select:mcp__claude-in-chrome__read_console_messages"
23
+ ToolSearch: "select:mcp__claude-in-chrome__read_network_requests"
24
+ ToolSearch: "select:mcp__claude-in-chrome__gif_creator"
25
+ ToolSearch: "select:mcp__claude-in-chrome__javascript_tool"
26
+ ```
27
+
28
+ Then call `tabs_context_mcp` first to understand current browser state. Create a new tab for testing — never reuse existing user tabs.
29
+
30
+ ## Tool Mapping by Action
31
+
32
+ ### Navigate to a page
33
+ ```
34
+ tabs_create_mcp → create new tab with URL http://localhost:{PORT}{route}
35
+ OR
36
+ navigate → go to URL in existing tab
37
+ ```
38
+ After navigating, wait 2-3 seconds for client-side rendering, then call `get_page_text` to verify content loaded.
39
+
40
+ ### Check if page rendered
41
+ ```
42
+ get_page_text → extract visible text content
43
+ ```
44
+ Page fails if: text is empty, contains only boilerplate (e.g., just "Loading..."), or contains raw error stack traces.
45
+
46
+ ### Read page structure
47
+ ```
48
+ read_page → get DOM structure and layout info
49
+ ```
50
+ Use this to understand component hierarchy before interacting.
51
+
52
+ ### Find interactive elements
53
+ ```
54
+ find → locate buttons, links, inputs by text content or attributes
55
+ ```
56
+ Returns element positions for clicking.
57
+
58
+ ### Click elements
59
+ ```
60
+ computer → click at coordinates returned by find
61
+ ```
62
+ After clicking, wait 1-2 seconds, then check console + network for errors.
63
+
64
+ ### Fill form fields
65
+ ```
66
+ form_input → set values on input fields, selects, textareas
67
+ ```
68
+ Identify fields with `find` first, then use `form_input` with the field selector.
69
+
70
+ ### Take screenshots
71
+ ```
72
+ computer → screenshot action captures the visible viewport
73
+ ```
74
+ Save screenshots with descriptive names: `smoke-root.png`, `flow-create-project-step3.png`, `visual-mobile-dashboard.png`.
75
+
76
+ ### Resize viewport
77
+ ```
78
+ resize_window → set width and height
79
+ ```
80
+ Mobile: `resize_window(375, 812)`. Desktop: `resize_window(1280, 800)`.
81
+
82
+ ### Read console messages
83
+ ```
84
+ read_console_messages → get all console output
85
+ ```
86
+ Use `pattern` parameter to filter. Useful patterns:
87
+ - `"error|Error|ERROR"` — catch errors
88
+ - `"warn|Warning"` — catch warnings
89
+ - Exclude known noise: React dev warnings (`"Warning: "` prefix), HMR messages (`"[vite]"`, `"[HMR]"`, `"[Fast Refresh]"`), favicon 404s
90
+
91
+ ### Read network requests
92
+ ```
93
+ read_network_requests → get all HTTP requests with status codes
94
+ ```
95
+ Flag: any request with status 4xx or 5xx (excluding `/favicon.ico`). Flag: any CORS error. Ignore: HMR websocket connections, source map requests (`.map`).
96
+
97
+ ### Record multi-step flows
98
+ ```
99
+ gif_creator → record a sequence of actions as an animated GIF
100
+ ```
101
+ Use for flow tests with 3+ steps. Capture extra frames before and after actions for smooth playback. Name meaningfully: `flow-user-registration.gif`.
102
+
103
+ ### Run custom assertions
104
+ ```
105
+ javascript_tool → execute JS in the page context
106
+ ```
107
+ Useful for checking specific DOM state that other tools can't easily verify:
108
+ - `document.querySelectorAll('.error-message').length` — count error elements
109
+ - `window.__NEXT_DATA__` — check Next.js hydration data
110
+ - `document.title` — verify page title
111
+
112
+ Avoid triggering alerts or confirms — they block the extension. Use `console.log` + `read_console_messages` instead.
113
+
114
+ ## Error Filtering
115
+
116
+ Not every console message is a real problem. Apply these filters:
117
+
118
+ **Ignore (dev noise)**:
119
+ - `[HMR]`, `[vite]`, `[Fast Refresh]`, `[webpack-dev-server]`
120
+ - `Warning: ReactDOM.render is no longer supported` (React 18 dev warning)
121
+ - `Download the React DevTools`
122
+ - `/favicon.ico` 404
123
+ - Source map warnings
124
+
125
+ **Flag as errors**:
126
+ - `Uncaught` anything
127
+ - `TypeError`, `ReferenceError`, `SyntaxError`
128
+ - `Failed to fetch` (network errors)
129
+ - `CORS` errors
130
+ - `Hydration` mismatches
131
+ - `ChunkLoadError` (code splitting failures)
132
+ - Any `console.error` call from application code
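
The two lists above combine into a filter-then-match classifier. An illustrative shell sketch (Tier 1 applies the same logic through `read_console_messages` patterns rather than shell; the function name is an assumption):

```shell
# Illustrative classifier: returns 0 when a console message should be flagged.
is_real_error() {
  local msg="$1"
  # Dev noise: never flag
  case "$msg" in
    *"[HMR]"*|*"[vite]"*|*"[Fast Refresh]"*|*"[webpack-dev-server]"*) return 1 ;;
    *"Download the React DevTools"*|*"favicon.ico"*) return 1 ;;
    *"ReactDOM.render is no longer supported"*) return 1 ;;
  esac
  # Known-bad signatures: always flag
  case "$msg" in
    *Uncaught*|*TypeError*|*ReferenceError*|*SyntaxError*) return 0 ;;
    *"Failed to fetch"*|*CORS*|*Hydration*|*ChunkLoadError*) return 0 ;;
  esac
  return 1   # everything else needs judgment, not an automatic flag
}
```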
@@ -0,0 +1,178 @@
1
+ # Tier 2: Playwright (Headless Browser)
2
+
3
+ Solid middle-ground tier. No browser extension needed — works in CI, SSH, Docker, and headless environments. Provides DOM interaction, console monitoring, screenshots, and network inspection. No GIF recording.
4
+
5
+ Read this file only when Tier 2 was selected during DETECT phase.
6
+
7
+ ---
8
+
9
+ ## Two Modes
10
+
11
+ Playwright Tier 2 has two sub-modes depending on what's available. The skill auto-detects which to use.
12
+
13
+ ### Mode A: Playwright MCP (preferred)
14
+
15
+ If `mcp__playwright__*` tools are available (installed via `npx devlyn-cli` → select "playwright" MCP), use them directly. This gives interactive browser control similar to Tier 1:
16
+
17
+ - `mcp__playwright__browser_navigate` — navigate to URL
18
+ - `mcp__playwright__browser_screenshot` — capture screenshot
19
+ - `mcp__playwright__browser_click` — click elements
20
+ - `mcp__playwright__browser_type` — type into inputs
21
+ - `mcp__playwright__browser_console` — read console messages
22
+ - `mcp__playwright__browser_network` — read network requests
23
+ - `mcp__playwright__browser_resize` — resize viewport
24
+
25
+ When Playwright MCP is available, follow the same interaction pattern as Tier 1 (navigate → check → interact → screenshot) but using `mcp__playwright__*` tools instead of `mcp__claude-in-chrome__*`.
26
+
27
+ Load tools via ToolSearch before use: `ToolSearch: "select:mcp__playwright__browser_navigate"` etc.
28
+
29
+ ### Mode B: Script Generation (fallback)
30
+
31
+ If Playwright MCP is not installed but `npx playwright` CLI is available, generate and execute test scripts. This is the approach documented below.
32
+
33
+ ## Setup (Mode B only)
34
+
35
+ Playwright runs via `npx` with auto-download. No global install needed. If browsers aren't installed yet:
36
+ ```bash
37
+ npx playwright install chromium 2>/dev/null
38
+ ```
39
+ This downloads only Chromium (~130MB), not all browsers. It's a one-time cost.
40
+
41
+ ## Approach (Mode B)
42
+
43
+ Generate a temporary test script from the test steps, run it with Playwright's JSON reporter, then parse the results. This avoids needing a persistent test infrastructure — the script is created, executed, and cleaned up.
44
+
45
+ ## Script Generation
46
+
47
+ For each phase (smoke, flow, visual), generate a test script at `.claude/browser-test.spec.ts`.
48
+
49
+ ### Smoke Test Script Template
50
+
51
+ ```typescript
52
+ import { test, expect } from '@playwright/test';
53
+
54
+ const PORT = {PORT};
55
+ const ROUTES = {ROUTES_JSON_ARRAY};
56
+
57
+ test.describe('Smoke Tests', () => {
58
+ for (const route of ROUTES) {
59
+ test(`smoke: ${route}`, async ({ page }) => {
60
+ const errors: string[] = [];
61
+ const failedRequests: string[] = [];
62
+
63
+ page.on('console', msg => {
64
+ if (msg.type() === 'error') errors.push(msg.text());
65
+ });
66
+
67
+ page.on('response', response => {
68
+ if (response.status() >= 400 && !response.url().includes('favicon')) {
69
+ failedRequests.push(`${response.status()} ${response.url()}`);
70
+ }
71
+ });
72
+
73
+ await page.goto(`http://localhost:${PORT}${route}`, { waitUntil: 'networkidle', timeout: 15000 });
74
+
75
+ const bodyText = await page.textContent('body');
76
+ expect(bodyText?.trim().length).toBeGreaterThan(0);
77
+
78
+ await page.screenshot({ path: `.claude/screenshots/smoke${route === '/' ? '-root' : route.replace(/\//g, '-')}.png`, fullPage: true });
79
+
80
+ if (errors.length > 0) {
81
+ test.info().annotations.push({ type: 'console_errors', description: errors.join(' | ') });
82
+ }
83
+ if (failedRequests.length > 0) {
84
+ test.info().annotations.push({ type: 'network_failures', description: failedRequests.join(' | ') });
85
+ }
86
+
87
+ expect(errors.filter(e => !e.includes('[HMR]') && !e.includes('favicon'))).toHaveLength(0);
88
+ expect(failedRequests).toHaveLength(0);
89
+ });
90
+ }
91
+ });
92
+ ```
93
+
94
+ ### Flow Test Script Template
95
+
96
+ For each flow test step from done-criteria, generate a test block:
97
+
98
+ ```typescript
99
+ test('flow: [criterion description]', async ({ page }) => {
100
+ // Navigate
101
+ await page.goto(`http://localhost:${PORT}{start_route}`);
102
+
103
+ // Find and interact
104
+ await page.click('[text or selector]');
105
+ await page.fill('[selector]', '[value]');
106
+ await page.click('[submit selector]');
107
+
108
+ // Verify
109
+ await expect(page.locator('[verification selector]')).toBeVisible();
110
+
111
+ // Screenshot
112
+ await page.screenshot({ path: '.claude/screenshots/flow-[name].png' });
113
+ });
114
+ ```
115
+
116
+ ### Visual Test Script Template
117
+
118
+ ```typescript
119
+ test.describe('Visual - Mobile', () => {
120
+ test.use({ viewport: { width: 375, height: 812 } });
121
+ for (const route of ROUTES) {
122
+ test(`visual-mobile: ${route}`, async ({ page }) => {
123
+ await page.goto(`http://localhost:${PORT}${route}`, { waitUntil: 'networkidle' });
124
+ await page.screenshot({ path: `.claude/screenshots/visual-mobile${route === '/' ? '-root' : route.replace(/\//g, '-')}.png`, fullPage: true });
125
+ });
126
+ }
127
+ });
128
+
129
+ test.describe('Visual - Desktop', () => {
130
+ test.use({ viewport: { width: 1280, height: 800 } });
131
+ for (const route of ROUTES) {
132
+ test(`visual-desktop: ${route}`, async ({ page }) => {
133
+ await page.goto(`http://localhost:${PORT}${route}`, { waitUntil: 'networkidle' });
134
+ await page.screenshot({ path: `.claude/screenshots/visual-desktop${route === '/' ? '-root' : route.replace(/\//g, '-')}.png`, fullPage: true });
135
+ });
136
+ }
137
+ });
138
+ ```
139
+
140
+ ## Execution
141
+
142
+ ```bash
143
+ mkdir -p .claude/screenshots
144
+ npx playwright test .claude/browser-test.spec.ts \
145
+ --reporter=json \
146
+ --output=.claude/playwright-results \
147
+ | tee .claude/playwright-output.json   # stdout only; merging stderr would corrupt the JSON
148
+ ```
149
+
150
+ ## Parsing Results
151
+
152
+ Read `.claude/playwright-output.json`. The JSON structure contains:
153
+ - `suites[].specs[].tests[].results[].status` — `"passed"`, `"failed"`, `"timedOut"`
154
+ - `suites[].specs[].tests[].results[].errors` — error messages with stack traces
155
+ - `suites[].specs[].tests[].annotations` — custom annotations (console_errors, network_failures)
156
+
157
+ Map these to BROWSER-RESULTS.md findings:
158
+ - `failed` → route fails smoke, include error message
159
+ - Annotations with `console_errors` → list in Runtime Errors section
160
+ - Annotations with `network_failures` → list in Failed Network Requests section
161
+
162
+ ## Cleanup
163
+
164
+ After parsing results:
165
+ ```bash
166
+ rm -f .claude/browser-test.spec.ts
167
+ rm -rf .claude/playwright-results
168
+ rm -f .claude/playwright-output.json
169
+ ```
170
+
171
+ Keep `.claude/screenshots/` — those are evidence referenced by the report.
172
+
173
+ ## Limitations vs Tier 1
174
+
175
+ - No GIF recording (can't capture multi-step flow animations)
176
+ - No live DOM exploration (tests are scripted, not interactive)
177
+ - Screenshots in the generated scripts are full-page captures (`fullPage: true`) rather than viewport-only crops
178
+ - Console filtering is code-based (less flexible than chrome MCP pattern matching)
@@ -0,0 +1,56 @@
1
+ # Tier 3: HTTP Smoke (curl)
2
+
3
+ Bare-minimum fallback. No browser, no JavaScript execution, no interaction testing. This tier confirms the dev server responds and pages return valid HTML. It catches "app doesn't start" and "page returns 500" but nothing subtler.
4
+
5
+ Read this file only when Tier 3 was selected during DETECT phase.
6
+
7
+ ---
8
+
9
+ ## What You Can Test
10
+
11
+ - Server responds on the expected port
12
+ - Pages return HTTP 200
13
+ - HTML contains a `<body>` with content (not an empty shell)
14
+ - No server-side error indicators in the HTML
15
+
16
+ ## What You Cannot Test
17
+
18
+ - Client-side rendering (SPA content won't appear in curl output)
19
+ - JavaScript errors or console output
20
+ - Network requests made by the client
21
+ - Interactive elements (forms, buttons, navigation)
22
+ - Visual layout or responsive behavior
23
+ - Screenshots
24
+
25
+ ## Smoke Test
26
+
27
+ For each affected route:
28
+
29
+ ```bash
30
+ # Check HTTP status
31
+ STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:{PORT}{route} --max-time 10)
32
+
33
+ # Get HTML content
34
+ HTML=$(curl -s http://localhost:{PORT}{route} --max-time 10)
35
+ ```
36
+
37
+ ### Pass Criteria
38
+
39
+ A route passes if:
40
+ 1. `STATUS` is `200` (or `304`)
41
+ 2. HTML contains `<body` tag
42
+ 3. HTML body has more than 100 characters of text content (not just empty divs; see the SPA exception under "Parsing HTML Content")
43
+ 4. HTML does not contain server error indicators: `Internal Server Error`, `ECONNREFUSED`, `Cannot GET` (a bare `500` or `404` substring is too false-positive-prone; rely on the status code from criterion 1 instead)
44
+
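
A hedged sketch of the four criteria as one predicate (`smoke_pass` is not part of the skill, and the literal error-string check is deliberately narrowed to low-false-positive phrases):

```shell
# Illustrative Tier 3 pass check: HTTP status plus crude HTML inspection.
smoke_pass() {
  local status="$1" html="$2" text
  case "$status" in 200|304) ;; *) return 1 ;; esac        # criterion 1
  printf '%s' "$html" | grep -qi '<body' || return 1        # criterion 2
  # Criterion 3: strip tags, require >100 chars of text. SPA shells fail this
  # check and need the separate exception described under "Parsing HTML Content".
  text=$(printf '%s' "$html" | sed 's/<[^>]*>//g' | tr -d '[:space:]')
  [ "${#text}" -gt 100 ] || return 1
  # Criterion 4: only unambiguous error phrases; a bare "500" or "404"
  # in page copy would be a false positive.
  printf '%s' "$html" | grep -qE 'Internal Server Error|ECONNREFUSED|Cannot GET' && return 1
  return 0
}
```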
45
+ ### Parsing HTML Content
46
+
47
+ Since curl returns raw HTML (no JS execution), for SPAs the body may only contain a root `<div id="root"></div>` or `<div id="__next"></div>`. This is normal and counts as a PASS for Tier 3 — note it as "SPA shell detected, client-side rendering not verifiable at this tier."
48
+
49
+ For SSR frameworks (Next.js with server components, Nuxt, Astro), the HTML should contain actual rendered content.
50
+
51
+ ## Report Adjustments
52
+
53
+ When writing BROWSER-RESULTS.md from Tier 3:
54
+ - Set confidence level to LOW
55
+ - Leave Console Errors, Network Failures, Flow Tests, and Visual Check sections as "N/A — Tier 3 (HTTP only)"
56
+ - Note the limitation: "Tier 3 testing provides HTTP-level validation only. Client-side behavior, JavaScript errors, and visual rendering were not tested. For comprehensive browser validation, install the claude-in-chrome extension (Tier 1) or Playwright (Tier 2)."
@@ -297,14 +297,9 @@ LOW (note):
297
297
  4. For each catch block: is the error surfaced to the user or silently swallowed?
298
298
  5. Check for React anti-patterns: uncontrolled-to-controlled switches, direct DOM mutation, missing cleanup
299
299
  6. Compare against existing components for pattern consistency
300
- 7. **Live app testing** (when browser tools are available): If `mcp__claude-in-chrome__*` tools are available, test the running application directly:
301
- - Navigate to the affected pages
302
- - Click through the user flow end-to-end
303
- - Test interactive elements (forms, buttons, modals, navigation)
304
- - Verify loading, error, and empty states render correctly
305
- - Screenshot any visual issues as evidence
306
- - Test responsive behavior at mobile/tablet/desktop widths
307
- If browser tools are NOT available, skip this step and note "Live testing skipped — no browser tools" in your deliverable.
300
+ 7. **Browser evidence** (when available): Read `.claude/BROWSER-RESULTS.md` if it exists — it contains pre-collected smoke test results, flow test results, console errors, network failures, and screenshots from the `devlyn:browser-validate` skill. Use this as additional evidence in your evaluation. Do not re-run smoke tests that are already covered.
301
+ If the dev server is still running and you need deeper investigation on a specific interaction, use browser tools directly (check if `mcp__claude-in-chrome__*` tools are available, or fall back to Playwright). Focus on verifying specific findings, not duplicating the full smoke/flow suite.
302
+ If neither `.claude/BROWSER-RESULTS.md` exists nor browser tools are available, note "Live testing skipped — no browser validation available" in your deliverable.
308
303
 
309
304
  **Your deliverable**: Send a message to the team lead with:
310
305
  1. Component quality assessment for each new/changed component
@@ -312,7 +307,7 @@ LOW (note):
312
307
  3. Silent failure points that violate error handling policy
313
308
  4. React anti-patterns found
314
309
  5. Pattern consistency with existing components
315
- 6. Live testing results (if browser tools were available): screenshots, interaction bugs, visual regressions
310
+ 6. Browser validation results (from BROWSER-RESULTS.md or live testing): screenshots, interaction bugs, runtime errors, visual regressions
316
311
 
317
312
  Read the team config at ~/.claude/teams/{team-name}/config.json to discover teammates. Coordinate with api-contract-evaluator about client-server type alignment via SendMessage.
318
313
  </frontend_evaluator_prompt>
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "devlyn-cli",
3
- "version": "1.2.1",
3
+ "version": "1.3.0",
4
4
  "description": "Claude Code configuration toolkit for teams",
5
5
  "bin": {
6
6
  "devlyn": "bin/devlyn.js"