devlyn-cli 1.15.0 → 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (158)
  1. package/AGENTS.md +104 -0
  2. package/CLAUDE.md +135 -21
  3. package/README.md +43 -125
  4. package/benchmark/auto-resolve/BENCHMARK-DESIGN.md +272 -0
  5. package/benchmark/auto-resolve/README.md +114 -0
  6. package/benchmark/auto-resolve/RUBRIC.md +162 -0
  7. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/NOTES.md +30 -0
  8. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/expected.json +68 -0
  9. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/metadata.json +10 -0
  10. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/setup.sh +4 -0
  11. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/spec.md +45 -0
  12. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/task.txt +8 -0
  13. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/NOTES.md +54 -0
  14. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected-pair-plan-registry.json +170 -0
  15. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected.json +84 -0
  16. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/metadata.json +21 -0
  17. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-fail.json +214 -0
  18. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-pass.json +223 -0
  19. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/setup.sh +5 -0
  20. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/spec.md +56 -0
  21. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/task.txt +14 -0
  22. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/NOTES.md +28 -0
  23. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected-pair-plan-registry.json +162 -0
  24. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected.json +65 -0
  25. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/metadata.json +19 -0
  26. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/setup.sh +4 -0
  27. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/spec.md +56 -0
  28. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/task.txt +9 -0
  29. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/NOTES.md +40 -0
  30. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/expected.json +57 -0
  31. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/metadata.json +10 -0
  32. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/setup.sh +6 -0
  33. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/spec.md +49 -0
  34. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/task.txt +9 -0
  35. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/NOTES.md +38 -0
  36. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/expected.json +65 -0
  37. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/metadata.json +10 -0
  38. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/setup.sh +55 -0
  39. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/spec.md +49 -0
  40. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/task.txt +7 -0
  41. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/NOTES.md +38 -0
  42. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/expected.json +77 -0
  43. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/metadata.json +10 -0
  44. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/setup.sh +4 -0
  45. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/spec.md +49 -0
  46. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/task.txt +10 -0
  47. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/NOTES.md +50 -0
  48. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/expected.json +76 -0
  49. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/metadata.json +10 -0
  50. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/setup.sh +36 -0
  51. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/spec.md +46 -0
  52. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/task.txt +7 -0
  53. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/NOTES.md +50 -0
  54. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/expected.json +63 -0
  55. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/metadata.json +10 -0
  56. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/setup.sh +4 -0
  57. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/spec.md +48 -0
  58. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/task.txt +1 -0
  59. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/NOTES.md +93 -0
  60. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/expected.json +74 -0
  61. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/metadata.json +10 -0
  62. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/setup.sh +28 -0
  63. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/spec.md +62 -0
  64. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/task.txt +5 -0
  65. package/benchmark/auto-resolve/fixtures/SCHEMA.md +130 -0
  66. package/benchmark/auto-resolve/fixtures/test-repo/README.md +27 -0
  67. package/benchmark/auto-resolve/fixtures/test-repo/bin/cli.js +63 -0
  68. package/benchmark/auto-resolve/fixtures/test-repo/package-lock.json +823 -0
  69. package/benchmark/auto-resolve/fixtures/test-repo/package.json +22 -0
  70. package/benchmark/auto-resolve/fixtures/test-repo/playwright.config.js +17 -0
  71. package/benchmark/auto-resolve/fixtures/test-repo/server/index.js +37 -0
  72. package/benchmark/auto-resolve/fixtures/test-repo/tests/cli.test.js +25 -0
  73. package/benchmark/auto-resolve/fixtures/test-repo/tests/server.test.js +58 -0
  74. package/benchmark/auto-resolve/fixtures/test-repo/web/index.html +37 -0
  75. package/benchmark/auto-resolve/scripts/build-pair-eligible-manifest.py +174 -0
  76. package/benchmark/auto-resolve/scripts/check-f9-artifacts.py +256 -0
  77. package/benchmark/auto-resolve/scripts/compile-report.py +331 -0
  78. package/benchmark/auto-resolve/scripts/iter-0033c-compare.py +552 -0
  79. package/benchmark/auto-resolve/scripts/judge-opus-pass.sh +430 -0
  80. package/benchmark/auto-resolve/scripts/judge.sh +359 -0
  81. package/benchmark/auto-resolve/scripts/oracle-scope-tier-a.py +260 -0
  82. package/benchmark/auto-resolve/scripts/oracle-scope-tier-b.py +274 -0
  83. package/benchmark/auto-resolve/scripts/oracle-test-fidelity.py +328 -0
  84. package/benchmark/auto-resolve/scripts/pair-plan-idgen.py +401 -0
  85. package/benchmark/auto-resolve/scripts/pair-plan-lint.py +468 -0
  86. package/benchmark/auto-resolve/scripts/run-fixture.sh +691 -0
  87. package/benchmark/auto-resolve/scripts/run-iter-0033c.sh +234 -0
  88. package/benchmark/auto-resolve/scripts/run-suite.sh +214 -0
  89. package/benchmark/auto-resolve/scripts/ship-gate.py +222 -0
  90. package/bin/devlyn.js +175 -17
  91. package/config/skills/_shared/adapters/README.md +64 -0
  92. package/config/skills/_shared/adapters/gpt-5-5.md +29 -0
  93. package/config/skills/_shared/adapters/opus-4-7.md +29 -0
  94. package/config/skills/{devlyn:auto-resolve/scripts → _shared}/archive_run.py +26 -0
  95. package/config/skills/_shared/codex-config.md +54 -0
  96. package/config/skills/_shared/codex-monitored.sh +141 -0
  97. package/config/skills/_shared/engine-preflight.md +35 -0
  98. package/config/skills/_shared/expected.schema.json +93 -0
  99. package/config/skills/_shared/pair-plan-schema.md +298 -0
  100. package/config/skills/_shared/runtime-principles.md +110 -0
  101. package/config/skills/_shared/spec-verify-check.py +519 -0
  102. package/config/skills/devlyn:ideate/SKILL.md +99 -429
  103. package/config/skills/devlyn:ideate/references/elicitation.md +97 -0
  104. package/config/skills/devlyn:ideate/references/from-spec-mode.md +54 -0
  105. package/config/skills/devlyn:ideate/references/project-mode.md +76 -0
  106. package/config/skills/devlyn:ideate/references/spec-template.md +102 -0
  107. package/config/skills/devlyn:resolve/SKILL.md +172 -184
  108. package/config/skills/devlyn:resolve/references/free-form-mode.md +68 -0
  109. package/config/skills/devlyn:resolve/references/phases/build-gate.md +45 -0
  110. package/config/skills/devlyn:resolve/references/phases/cleanup.md +39 -0
  111. package/config/skills/devlyn:resolve/references/phases/implement.md +42 -0
  112. package/config/skills/devlyn:resolve/references/phases/plan.md +42 -0
  113. package/config/skills/devlyn:resolve/references/phases/verify.md +69 -0
  114. package/config/skills/devlyn:resolve/references/state-schema.md +106 -0
  115. package/{config/skills → optional-skills}/devlyn:design-system/SKILL.md +1 -0
  116. package/{config/skills → optional-skills}/devlyn:reap/SKILL.md +1 -0
  117. package/{config/skills → optional-skills}/devlyn:team-design-ui/SKILL.md +5 -0
  118. package/package.json +12 -2
  119. package/scripts/lint-skills.sh +431 -0
  120. package/config/skills/devlyn:auto-resolve/SKILL.md +0 -252
  121. package/config/skills/devlyn:auto-resolve/evals/evals.json +0 -21
  122. package/config/skills/devlyn:auto-resolve/evals/task-doctor-subcommand.md +0 -42
  123. package/config/skills/devlyn:auto-resolve/references/build-gate.md +0 -130
  124. package/config/skills/devlyn:auto-resolve/references/engine-routing.md +0 -82
  125. package/config/skills/devlyn:auto-resolve/references/findings-schema.md +0 -103
  126. package/config/skills/devlyn:auto-resolve/references/phases/phase-1-build.md +0 -54
  127. package/config/skills/devlyn:auto-resolve/references/phases/phase-2-evaluate.md +0 -45
  128. package/config/skills/devlyn:auto-resolve/references/phases/phase-3-critic.md +0 -84
  129. package/config/skills/devlyn:auto-resolve/references/pipeline-routing.md +0 -114
  130. package/config/skills/devlyn:auto-resolve/references/pipeline-state.md +0 -201
  131. package/config/skills/devlyn:auto-resolve/scripts/terminal_verdict.py +0 -96
  132. package/config/skills/devlyn:browser-validate/SKILL.md +0 -164
  133. package/config/skills/devlyn:browser-validate/references/flow-testing.md +0 -118
  134. package/config/skills/devlyn:browser-validate/references/tier1-chrome.md +0 -137
  135. package/config/skills/devlyn:browser-validate/references/tier2-playwright.md +0 -195
  136. package/config/skills/devlyn:browser-validate/references/tier3-curl.md +0 -57
  137. package/config/skills/devlyn:clean/SKILL.md +0 -285
  138. package/config/skills/devlyn:design-ui/SKILL.md +0 -351
  139. package/config/skills/devlyn:discover-product/SKILL.md +0 -124
  140. package/config/skills/devlyn:evaluate/SKILL.md +0 -564
  141. package/config/skills/devlyn:feature-spec/SKILL.md +0 -630
  142. package/config/skills/devlyn:ideate/references/challenge-rubric.md +0 -122
  143. package/config/skills/devlyn:ideate/references/codex-critic-template.md +0 -42
  144. package/config/skills/devlyn:ideate/references/templates/item-spec.md +0 -90
  145. package/config/skills/devlyn:implement-ui/SKILL.md +0 -466
  146. package/config/skills/devlyn:preflight/SKILL.md +0 -355
  147. package/config/skills/devlyn:preflight/references/auditors/browser-auditor.md +0 -32
  148. package/config/skills/devlyn:preflight/references/auditors/code-auditor.md +0 -86
  149. package/config/skills/devlyn:preflight/references/auditors/docs-auditor.md +0 -38
  150. package/config/skills/devlyn:product-spec/SKILL.md +0 -603
  151. package/config/skills/devlyn:recommend-features/SKILL.md +0 -286
  152. package/config/skills/devlyn:review/SKILL.md +0 -161
  153. package/config/skills/devlyn:team-resolve/SKILL.md +0 -631
  154. package/config/skills/devlyn:team-review/SKILL.md +0 -493
  155. package/config/skills/devlyn:update-docs/SKILL.md +0 -463
  156. package/config/skills/workflow-routing/SKILL.md +0 -73
  157. package/{config/skills → optional-skills}/devlyn:reap/scripts/reap.sh +0 -0
  158. package/{config/skills → optional-skills}/devlyn:reap/scripts/scan.sh +0 -0
package/config/skills/devlyn:browser-validate/SKILL.md
@@ -1,164 +0,0 @@
- ---
- name: devlyn:browser-validate
- description: Browser-based validation for web applications — verifies that implemented features actually work by testing them in a real browser. Starts the dev server, tests the feature end-to-end (click buttons, fill forms, verify results), and reports what's broken with screenshot evidence. Use this skill whenever the user says "test in browser", "check if it works", "does the feature work", "browser test", "validate the UI", or when auto-resolve needs to verify web changes actually function correctly. Also use proactively after implementing UI changes. The primary goal is feature verification, not just checking if pages render.
- ---
-
- Verify that implemented features actually work in the browser. The primary job is to test the feature that was just built — click the button, fill the form, check the result. Smoke tests and visual checks are supporting checks, not the main event.
-
- The whole point of browser validation is to catch the gap between "code looks correct" and "user can actually do the thing." Static analysis and unit tests can confirm the code is well-structured. Browser validation confirms it *works*.
-
- <config>
- $ARGUMENTS
- </config>
-
- <workflow>
-
- ## PHASE 1: DETECT
-
- 1. **What was built**: This is the most important input. Read `.devlyn/done-criteria.md` if it exists — it tells you what the feature is supposed to do. If it doesn't exist, read `git diff --stat` and `git log -1` to understand what changed. You need to know what to test before anything else.
-
- 2. **Framework detection**: Read `package.json` → identify framework and start command from `scripts.dev`, `scripts.start`, or `scripts.preview`.
-
- 3. **Port inference**: Defaults — Next.js: 3000, Vite: 5173, CRA: 3000, Nuxt: 3000, Astro: 4321, Angular: 4200. Override with `--port` flag.
-
- 4. **Affected routes**: Map changed files to routes (e.g., `app/dashboard/page.tsx` → `/dashboard`).
-
- 5. **Tier selection** — pick the best available browser tool. **You must verify each tier actually works before committing to it** — tools can be registered but not connected:
- - **Tier 1 probe** (Chrome DevTools): Check if `mcp__claude-in-chrome__*` tools exist. If they do, load `mcp__claude-in-chrome__tabs_context_mcp` via ToolSearch and call it. If the call **succeeds** (returns tab data without error), use Tier 1. Read `references/tier1-chrome.md`. If the call **fails** (timeout, connection error, extension not running), Tier 1 is unavailable — fall through to Tier 2.
- - **Tier 2 probe** (Playwright): Check if `mcp__playwright__*` tools exist (try ToolSearch for `mcp__playwright__browser_navigate`). If they exist and respond, use Tier 2 Mode A. Else run `npx playwright --version 2>/dev/null` — if it succeeds, use Tier 2 Mode B. Read `references/tier2-playwright.md`.
- - **Tier 3** (HTTP smoke): Fallback when no browser tool is functional. Read `references/tier3-curl.md`.
-
- **Critical rule**: Never treat a tier as available just because its tools appear in the tool list. Deferred/registered tools may not have a running backend. Always probe before committing.
-
- 6. **Skip gate**: If no web-relevant files changed (no `*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `*.astro`, `*.css`, `*.scss`, `*.html`, `page.*`, `layout.*`, `route.*`, `+page.*`, `+layout.*`), skip. Report: "Browser validation skipped — no web changes detected."
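A minimal sketch of this skip gate, assuming the changed-file list comes from something like `git diff --name-only` (the `has_web_changes` helper name is illustrative, not part of the skill):

```shell
# Hypothetical sketch of the PHASE 1 skip gate. Reads a newline-separated
# list of changed files on stdin and succeeds if any are web-relevant.
# The pattern mirrors the extensions and filename stems named in the prose.
has_web_changes() {
  grep -Eq '\.(tsx|jsx|vue|svelte|astro|css|scss|html)$|(^|/)(page|layout|route|[+]page|[+]layout)\.'
}

changed='src/lib/math.ts
README.md'
if printf '%s\n' "$changed" | has_web_changes; then
  echo "web changes detected"
else
  echo "Browser validation skipped — no web changes detected."
fi
```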
-
- 7. **Parse flags** from `<config>`:
- - `--skip-feature` — skip feature testing, only run smoke + visual
- - `--port PORT` — override detected port
- - `--tier N` — force a specific tier (1, 2, or 3)
- - `--mobile-only` / `--desktop-only` — limit viewport testing
- - `--topic SLUG` — override the auto-derived screenshot topic slug
-
- 8. **Derive the screenshot topic slug**. All screenshots for this run go under `.devlyn/screenshots/<topic-slug>/` so runs for different features don't pile up together. Resolution order:
- 1. `--topic` flag value, kebab-cased
- 2. First non-blank heading/line of `.devlyn/done-criteria.md` (strip `#`, kebab-case, max 40 chars)
- 3. Current git branch name, if not `main`/`master`/`HEAD`
- 4. Fallback: `run-<YYYYMMDD-HHMM>`
-
- Then wipe and recreate the topic dir (fresh evidence per run; don't touch other topics' dirs):
- ```bash
- SCREENSHOT_DIR=".devlyn/screenshots/<topic-slug>"
- rm -rf "$SCREENSHOT_DIR"
- mkdir -p "$SCREENSHOT_DIR"/{smoke,feature,visual}
- ```
-
- Record `$SCREENSHOT_DIR` and reuse it through the run. All screenshot paths below are **relative to `$SCREENSHOT_DIR`**:
- - Smoke: `smoke/<route-slug>.png` (root → `smoke/root.png`)
- - Feature: `feature/<criterion-slug>-step<N>.png`
- - Visual: `visual/<viewport>-<route-slug>.png` (e.g., `visual/mobile-dashboard.png`)
-
- Announce:
- ```
- Browser validation starting
- Feature: [what was built, from done-criteria or git diff]
- Framework: [detected] | Port: [PORT] | Tier: [N — name]
- Topic: [topic-slug] → .devlyn/screenshots/<topic-slug>/
- Phases: Server → Smoke → Feature Test → Visual → Report
- ```
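The step-8 slug resolution order could be sketched as follows — a rough illustration under stated assumptions, not the skill's actual implementation; `kebab` and `derive_topic` are hypothetical helper names:

```shell
# Hypothetical sketch of the topic-slug resolution order in step 8.
# kebab() lowercases, collapses non-alphanumerics to '-', trims to 40 chars.
kebab() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//' | cut -c1-40
}

derive_topic() {                      # $1: --topic flag value (may be empty)
  if [ -n "$1" ]; then kebab "$1"; return; fi
  if [ -f .devlyn/done-criteria.md ]; then
    # First non-blank line, heading hashes stripped
    heading=$(grep -m1 -v '^[[:space:]]*$' .devlyn/done-criteria.md | sed 's/^#*[[:space:]]*//')
    [ -n "$heading" ] && { kebab "$heading"; return; }
  fi
  branch=$(git rev-parse --abbrev-ref HEAD 2>/dev/null)
  case "$branch" in
    ''|main|master|HEAD) printf 'run-%s' "$(date +%Y%m%d-%H%M)" ;;
    *) kebab "$branch" ;;
  esac
}
```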
-
- ## PHASE 2: SERVER
-
- Get the dev server running. If it doesn't start, diagnose and fix — don't just report failure.
-
- 1. Start the dev server in background via Bash with `run_in_background: true`.
- 2. Health-check: poll `http://localhost:PORT` every 2s, timeout 30s. Ready when you get an HTTP response.
- 3. **If it doesn't come up — troubleshoot** (up to 2 attempts): read stderr for the error, fix it (npm install, port conflict, build error, etc.), restart, re-check.
- 4. If still down after 2 attempts: write BLOCKED verdict and stop.
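The health-check loop in step 2 could be sketched like this (a minimal sketch; `wait_for_server` is an illustrative name, and the timeout is parameterized for testing — the skill's prose uses 30s):

```shell
# Sketch of the PHASE 2 health check: poll the dev server every 2s until the
# deadline. "Ready" means any HTTP response at all — curl exits 0 as soon as
# the server answers, whatever the status code, matching the prose.
wait_for_server() {                   # $1: port, $2: timeout seconds (default 30)
  deadline=$(( $(date +%s) + ${2:-30} ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    curl -s -o /dev/null "http://localhost:$1" && return 0
    sleep 2
  done
  return 1                            # still down — caller troubleshoots
}
```

A caller would then branch on the exit status: `wait_for_server "$PORT" || troubleshoot_and_retry`.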
-
- ## PHASE 3: SMOKE (quick prerequisite)
-
- Quick check that the app is alive. This is not the main test — it's a gate to make sure feature testing is even possible.
-
- Navigate to `/` and each affected route. For each page, judge: is this the actual application, or an error page? A connection error, framework error overlay, or blank shell is not the app. If broken, try to fix (read console errors, fix source, let hot-reload pick it up). Up to 2 fix attempts per route.
-
- **Tier downgrade on failure**: If you're on Tier 1 or Tier 2 Mode A and the browser tool consistently fails during smoke (connection errors, timeouts, extension disconnected), **do not skip browser testing**. Instead, downgrade to the next tier (Tier 1 → Tier 2 → Tier 3), re-read the corresponding reference file, and retry the smoke phase with the new tier. Announce: `"Tier [N] browser tools not responding — downgrading to Tier [N+1]."` The goal is to always run the best available browser test, not to give up.
-
- If the app isn't rendering, the verdict is BLOCKED — feature testing can't happen.
-
- ## PHASE 4: FEATURE TEST (the main event)
-
- This is the primary purpose of browser validation. Everything else is in service of getting here.
-
- Read `.devlyn/done-criteria.md` (or infer from git diff what was built). For each criterion that describes something a user can do or see in the UI, test it end-to-end in the browser:
-
- 1. **Plan the test**: What would a user do to verify this feature works? Navigate where, click what, type what, expect what result?
- 2. **Execute it**: Navigate to the page, find the interactive elements, perform the actions, verify the outcome. Read `references/flow-testing.md` for patterns on converting criteria to browser steps.
- 3. **Capture evidence**: Screenshot at each key step. Record console errors and network failures that happen during the interaction.
- 4. **If it fails — try to fix**: Read the error (console, network, or the UI state) to understand why the feature broke. Fix the source code, let hot-reload update, and re-test. Up to 2 fix attempts per criterion.
- 5. **Record the result**: For each criterion — PASS (feature works as specified), FAIL (feature doesn't work, include what went wrong), SKIPPED (criterion isn't browser-testable, e.g., "API returns 401"), or UNVERIFIABLE (feature depends on external services not available in the test environment — e.g., real API keys, third-party auth, paid services).
-
- **Don't churn on external dependencies.** If a feature test is blocked because an API times out, a third-party service isn't configured, or auth credentials aren't available — that's not a bug to fix, it's a test environment limitation. Note it as UNVERIFIABLE, move on to the next criterion. Don't spend more than 30 seconds waiting for a response that's never coming. The goal is to verify what *can* be verified in the current environment, and be honest about what can't.
-
- The verdict depends primarily on this phase. If the implemented features don't work in the browser, the validation fails — even if every page renders perfectly and the layout looks great. And if most features couldn't be verified due to environment limitations, be honest about that — don't call it PASS.
-
- ## PHASE 5: VISUAL (supporting check)
-
- Quick layout check at two viewports (skip if `--mobile-only` or `--desktop-only`):
-
- 1. **Mobile** (375x812): screenshot each affected route, check for overflow/overlap/unreadable text
- 2. **Desktop** (1280x800): screenshot each affected route, check for broken layouts
-
- Judgment-based — look at the screenshots and report visible issues.
-
- ## PHASE 6: REPORT
-
- Write `.devlyn/BROWSER-RESULTS.md`:
-
- ```markdown
- # Browser Validation Results
-
- ## Verdict: [PASS / PASS WITH ISSUES / NEEDS WORK / PARTIALLY VERIFIED / BLOCKED]
- Verdict rules:
- - BLOCKED = server won't start or app doesn't render
- - NEEDS WORK = implemented features don't work in the browser
- - PARTIALLY VERIFIED = some features verified working, but others couldn't be tested due to environment limitations (missing API keys, external service dependencies). Be explicit about what was and wasn't verified.
- - PASS WITH ISSUES = all testable features work but visual issues or minor warnings exist
- - PASS = all testable features verified working, pages render, layout clean
-
- ## What Was Tested
- [Brief description of the feature/task from done-criteria or git diff]
-
- ## Feature Verification (primary)
- | Criterion | Test Steps | Result | Evidence |
- |-----------|-----------|--------|----------|
- | [what should work] | [what you did] | PASS/FAIL/SKIPPED/UNVERIFIABLE | [screenshot, errors, what went wrong] |
-
- ## Unverifiable Features (if any)
- [List features that couldn't be tested and why — e.g., "Badge rendering requires /api/backends/status which needs real API keys not present in test env. Verified via source code and unit tests instead."]
-
- ## Smoke Test (prerequisite)
- | Route | Renders | Console Errors | Network Failures |
- |-------|---------|---------------|-----------------|
- | / | YES/NO | [count] | [count] |
-
- ## Visual Check
- | Viewport | Route | Issues |
- |----------|-------|--------|
- | Mobile (375px) | / | [issues or "Clean"] |
- | Desktop (1280px) | / | [issues or "Clean"] |
-
- ## Fixes Applied During Validation
- [List any bugs found and fixed during testing — server startup issues, broken routes, feature bugs]
-
- ## Runtime Errors
- [Console errors captured during testing]
-
- ## Failed Network Requests
- [Failed API calls captured during testing]
- ```
-
- ## PHASE 7: CLEANUP
-
- Kill the dev server PID. If `--keep-server` was passed (auto-resolve pipeline), skip — the pipeline handles cleanup.
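A minimal sketch of that cleanup step, assuming the PID was captured when the server was started in PHASE 2 (`cleanup_server`, `SERVER_PID`, and `KEEP_SERVER` are illustrative names, not the skill's actual variables):

```shell
# Hypothetical sketch of PHASE 7 cleanup. KEEP_SERVER=1 models --keep-server;
# SERVER_PID would hold the backgrounded dev server's PID from PHASE 2.
cleanup_server() {
  if [ "${KEEP_SERVER:-0}" = "1" ]; then
    echo "keeping server running (pipeline handles cleanup)"
    return 0
  fi
  # kill -0 only checks that the process exists before signalling it
  if [ -n "${SERVER_PID:-}" ] && kill -0 "$SERVER_PID" 2>/dev/null; then
    kill "$SERVER_PID"
  fi
}
```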
-
- </workflow>
package/config/skills/devlyn:browser-validate/references/flow-testing.md
@@ -1,118 +0,0 @@
- # Flow Testing: Done-Criteria to Browser Steps
-
- How to read `.devlyn/done-criteria.md` and convert testable criteria into browser action sequences. This is the bridge between "what should work" and "prove it works in the browser."
-
- Read this file only during PHASE 4 (FLOW) when done-criteria exists.
-
- ---
-
- ## Step 1: Classify Each Criterion
-
- Read `.devlyn/done-criteria.md` and classify each criterion:
-
- **Browser-testable** — the criterion describes something a user can see or do in the UI:
- - "User can create a new project from the dashboard"
- - "Error message appears when form is submitted with empty fields"
- - "Navigation shows active state on current page"
- - "Data table loads and displays 10 rows"
-
- **Not browser-testable** — the criterion is about backend logic, data integrity, or code quality:
- - "API returns 401 for unauthenticated requests"
- - "Database migration runs without errors"
- - "Test coverage exceeds 80%"
- - "No TypeScript errors"
-
- Skip non-browser-testable criteria. Note them as "Skipped — not browser-testable" in the report.
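This classification is ultimately judgment-based, but a keyword heuristic along these lines could serve as a rough first pass — a sketch only, with an invented `classify_criterion` helper and an incomplete keyword list:

```shell
# Rough, hypothetical first-pass classifier for done-criteria lines.
# Real classification needs judgment; keyword lists like these are lossy.
classify_criterion() {                # $1: criterion text
  if printf '%s' "$1" | grep -Eiq 'API|migration|coverage|TypeScript|lint'; then
    echo "skipped"                    # backend/data/code-quality phrasing
  elif printf '%s' "$1" | grep -Eiq 'click|form|button|page|display|shows?|user can|navigat'; then
    echo "browser-testable"           # UI verbs and nouns
  else
    echo "needs-judgment"             # fall back to human/agent judgment
  fi
}
```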
-
- ## Step 2: Convert to Action Sequences
-
- For each browser-testable criterion, generate a sequence of steps:
-
- ### Pattern: Navigation + Verification
- ```
- Criterion: "Dashboard shows project count"
- Steps:
- 1. Navigate to /dashboard
- 2. Find element containing project count (look for text matching a number pattern)
- 3. Verify: element exists and contains a numeric value
- 4. Screenshot
- ```
-
- ### Pattern: Form Interaction
- ```
- Criterion: "User can create a new project"
- Steps:
- 1. Navigate to /dashboard (or wherever the create action lives)
- 2. Find "Create" or "New Project" button
- 3. Click it
- 4. Find form fields (name, description, etc.)
- 5. Fill with test data: name="Test Project", description="Browser validation test"
- 6. Find and click submit button
- 7. Verify: success indicator appears (toast, redirect, new item in list)
- 8. Screenshot at steps 3, 6, and 7
- ```
-
- ### Pattern: Error State
- ```
- Criterion: "Error message shows when form submitted empty"
- Steps:
- 1. Navigate to the form page
- 2. Find submit button
- 3. Click submit without filling any fields
- 4. Verify: error message(s) visible
- 5. Screenshot showing error state
- ```
-
- ### Pattern: Conditional UI
- ```
- Criterion: "Empty state shows when no data exists"
- Steps:
- 1. Navigate to the list/table page
- 2. Check if data exists — if so, this test needs a clean state
- 3. If clean state achievable: verify empty state message/illustration
- 4. If not: skip with note "Cannot verify empty state — data already exists"
- 5. Screenshot
- ```
-
- ## Step 3: Handle Data Dependencies
-
- Some flow tests need specific data to exist (or not exist). Approach:
-
- 1. **Read-only tests preferred** — test flows that verify existing state rather than create/modify
- 2. **Create test data if safe** — if the flow creates something (like a project), use obvious test names ("Browser Validation Test — safe to delete")
- 3. **Skip if destructive** — don't test delete flows, don't modify existing data, don't test flows that send emails or notifications
- 4. **Note dependencies** — if a test can't run because of missing data, note it as "Skipped — requires [specific data state]"
-
- ## Step 4: Handle Auth-Protected Pages
-
- If a route requires authentication:
- 1. Check if the app redirects to a login page
- 2. If login is a simple form (email + password): note "Auth required — skipping unless test credentials available"
- 3. If login uses OAuth/SSO: skip entirely, note "Skipped — requires OAuth flow"
- 4. Do not attempt to log in with guessed credentials
-
- ## Test Data Guidelines
-
- When filling forms during flow tests, use obviously fake but valid data:
- - Name: "Test User" or "Browser Validate Test"
- - Email: "test@browser-validate.local"
- - Description: "Created by browser-validate skill — safe to delete"
- - Numbers: use small, obvious values (1, 10, 100)
-
- This makes test data easy to identify and clean up later.
-
- ## Output Format
-
- For each flow test, report:
-
- ```
- Criterion: [original text from done-criteria]
- Classification: browser-testable | skipped
- Steps executed: [N of total]
- Result: PASS | FAIL | SKIPPED
- Evidence:
- - Screenshot: [path]
- - Console errors during flow: [count] — [details]
- - Network failures during flow: [count] — [details]
- - Failure point: [which step failed and why]
- ```
package/config/skills/devlyn:browser-validate/references/tier1-chrome.md
@@ -1,137 +0,0 @@
- # Tier 1: Chrome DevTools (claude-in-chrome)
-
- The richest testing tier. Requires the claude-in-chrome MCP extension running in a Chrome browser. Provides full DOM interaction, console monitoring, network inspection, screenshots, and GIF recording.
-
- Read this file only when Tier 1 was selected during DETECT phase.
-
- ---
-
- ## Setup
-
- Before any browser interaction, load the tools you need via ToolSearch:
- ```
- ToolSearch: "select:mcp__claude-in-chrome__tabs_context_mcp"
- ToolSearch: "select:mcp__claude-in-chrome__tabs_create_mcp"
- ToolSearch: "select:mcp__claude-in-chrome__navigate"
- ToolSearch: "select:mcp__claude-in-chrome__get_page_text"
- ToolSearch: "select:mcp__claude-in-chrome__read_page"
- ToolSearch: "select:mcp__claude-in-chrome__find"
- ToolSearch: "select:mcp__claude-in-chrome__computer"
- ToolSearch: "select:mcp__claude-in-chrome__form_input"
- ToolSearch: "select:mcp__claude-in-chrome__resize_window"
- ToolSearch: "select:mcp__claude-in-chrome__read_console_messages"
- ToolSearch: "select:mcp__claude-in-chrome__read_network_requests"
- ToolSearch: "select:mcp__claude-in-chrome__gif_creator"
- ToolSearch: "select:mcp__claude-in-chrome__javascript_tool"
- ```
-
- Then call `tabs_context_mcp` first to understand current browser state. Create a new tab for testing — never reuse existing user tabs.
-
- ## Tool Mapping by Action
31
-
32
- ### Navigate to a page
33
- ```
34
- tabs_create_mcp → create new tab with URL http://localhost:{PORT}{route}
35
- OR
36
- navigate → go to URL in existing tab
37
- ```
38
- After navigating, wait 2-3 seconds for client-side rendering, then call `get_page_text` to verify content loaded.
39
-
40
- ### Check if page rendered
41
- ```
42
- get_page_text → extract visible text content
43
- ```
44
- Read the text and judge: is this the actual application, or an error/fallback page? Browser error pages, framework error overlays, "Unable to connect" screens, and empty shells all have text — but they're not the app. If the page content doesn't look like what the application is supposed to show, it's a failure.
45
-
46
- ### Read page structure
47
- ```
48
- read_page → get DOM structure and layout info
49
- ```
50
- Use this to understand component hierarchy before interacting.
51
-
52
### Find interactive elements
```
find → locate buttons, links, inputs by text content or attributes
```
Returns element positions for clicking.

### Click elements
```
computer → click at coordinates returned by find
```
After clicking, wait 1-2 seconds, then check console + network for errors.

### Fill form fields
```
form_input → set values on input fields, selects, textareas
```
Identify fields with `find` first, then use `form_input` with the field selector.

### Take screenshots
```
computer → screenshot action captures the visible viewport
```
Save screenshots into the topic-scoped directory that PHASE 1 set up (`.devlyn/screenshots/<topic-slug>/`), organized by phase:
- Smoke: `<topic>/smoke/<route>.png` — e.g., `smoke/root.png`, `smoke/dashboard.png`
- Feature: `<topic>/feature/<criterion>-step<N>.png` — e.g., `feature/create-project-step3.png`
- Visual: `<topic>/visual/<viewport>-<route>.png` — e.g., `visual/mobile-dashboard.png`

Since `computer → screenshot` writes to a default location, move/rename the captured file into the right subdirectory immediately after taking it, so evidence paths in the report match this scheme.

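The move/rename step can be sketched in shell. The capture path and topic slug below are placeholders for illustration — the real default location depends on the screenshot tool's configuration:

```shell
# Hypothetical paths: simulate a capture landing at a default location,
# then move it into the topic-scoped scheme described above.
CAPTURE="/tmp/devlyn-capture-$$.png"
: > "$CAPTURE"                                  # stand-in for the file the screenshot tool wrote
TOPIC_DIR=".devlyn/screenshots/add-login-page"  # example topic slug

# Ensure the phase subdirectory exists, then move the capture into the scheme
# so the evidence path in the report matches.
mkdir -p "$TOPIC_DIR/smoke"
mv "$CAPTURE" "$TOPIC_DIR/smoke/dashboard.png"
```
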
### Resize viewport
```
resize_window → set width and height
```
Mobile: `resize_window(375, 812)`. Desktop: `resize_window(1280, 800)`.

### Read console messages
```
read_console_messages → get all console output
```
Use `pattern` parameter to filter. Useful patterns:
- `"error|Error|ERROR"` — catch errors
- `"warn|Warning"` — catch warnings
- Exclude known noise: React dev warnings (`"Warning: "` prefix), HMR messages (`"[vite]"`, `"[HMR]"`, `"[Fast Refresh]"`), favicon 404s

### Read network requests
```
read_network_requests → get all HTTP requests with status codes
```
Flag: any request with status 4xx or 5xx (excluding `/favicon.ico`). Flag: any CORS error. Ignore: HMR websocket connections, source map requests (`.map`).

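If the request log is saved to a text file — the one-`STATUS URL`-pair-per-line format here is an assumption for illustration, not the tool's actual output — the flag/ignore rules can be approximated with grep:

```shell
# Assumed format: "<status> <url>" per line.
printf '%s\n' \
  '200 http://localhost:3000/api/users' \
  '404 http://localhost:3000/favicon.ico' \
  '500 http://localhost:3000/api/orders' \
  '200 http://localhost:3000/main.js.map' > requests.txt

# Keep 4xx/5xx statuses, then drop the known-noise URLs.
grep -E '^[45][0-9]{2} ' requests.txt | grep -v -e '/favicon.ico' -e '\.map$'
```

Only the `500 /api/orders` line survives the two filters.
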
### Record multi-step flows
```
gif_creator → record a sequence of actions as an animated GIF
```
Use for flow tests with 3+ steps. Capture extra frames before and after actions for smooth playback. Name meaningfully: `flow-user-registration.gif`.

### Run custom assertions
```
javascript_tool → execute JS in the page context
```
Useful for checking specific DOM state that other tools can't easily verify:
- `document.querySelectorAll('.error-message').length` — count error elements
- `window.__NEXT_DATA__` — check Next.js hydration data
- `document.title` — verify page title

Avoid triggering alerts or confirms — they block the extension. Use `console.log` + `read_console_messages` instead.

## Error Filtering

Not every console message is a real problem. Apply these filters:

**Ignore (dev noise)**:
- `[HMR]`, `[vite]`, `[Fast Refresh]`, `[webpack-dev-server]`
- `Warning: ReactDOM.render is no longer supported` (React 18 dev warning)
- `Download the React DevTools`
- `/favicon.ico` 404
- Source map warnings

**Flag as errors**:
- `Uncaught` anything
- `TypeError`, `ReferenceError`, `SyntaxError`
- `Failed to fetch` (network errors)
- `CORS` errors
- `Hydration` mismatches
- `ChunkLoadError` (code splitting failures)
- Any `console.error` call from application code
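
One mechanical way to apply both lists — assuming the console output has been saved to a file with one message per line, which is an illustrative setup rather than something the extension produces directly:

```shell
# Sample console dump, one message per line (illustrative).
cat > console.log <<'EOF'
[vite] connected.
Download the React DevTools for a better development experience
Uncaught TypeError: Cannot read properties of undefined
[HMR] Waiting for update signal from WDS...
Failed to fetch
EOF

# Drop the dev-noise lines first, then keep only lines matching the flag patterns.
grep -v -e '\[HMR\]' -e '\[vite\]' -e '\[Fast Refresh\]' -e '\[webpack-dev-server\]' \
        -e 'Download the React DevTools' -e 'favicon.ico' console.log \
  | grep -E 'Uncaught|TypeError|ReferenceError|SyntaxError|Failed to fetch|CORS|Hydration|ChunkLoadError'
```

The two real errors survive; the HMR, Vite, and DevTools noise is discarded.
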
# Tier 2: Playwright (Headless Browser)

Solid middle-ground tier. No browser extension needed — works in CI, SSH, Docker, and headless environments. Provides DOM interaction, console monitoring, screenshots, and network inspection. No GIF recording.

Read this file only when Tier 2 was selected during DETECT phase.

---

## Two Modes

Playwright Tier 2 has two sub-modes depending on what's available. The skill auto-detects which to use.

### Mode A: Playwright MCP (preferred)

If `mcp__playwright__*` tools are available (installed via `npx devlyn-cli` → select "playwright" MCP), use them directly. This gives interactive browser control similar to Tier 1:

- `mcp__playwright__browser_navigate` — navigate to URL
- `mcp__playwright__browser_screenshot` — capture screenshot
- `mcp__playwright__browser_click` — click elements
- `mcp__playwright__browser_type` — type into inputs
- `mcp__playwright__browser_console` — read console messages
- `mcp__playwright__browser_network` — read network requests
- `mcp__playwright__browser_resize` — resize viewport

When Playwright MCP is available, follow the same interaction pattern as Tier 1 (navigate → check → interact → screenshot) but using `mcp__playwright__*` tools instead of `mcp__claude-in-chrome__*`.

Load tools via ToolSearch before use: `ToolSearch: "select:mcp__playwright__browser_navigate"` etc.

### Mode B: Script Generation (fallback)

If Playwright MCP is not installed but the `npx playwright` CLI is available, generate and execute test scripts. This is the approach documented below.

## Setup (Mode B only)

Playwright runs via `npx` with auto-download. No global install needed. If browsers aren't installed yet:
```bash
npx playwright install chromium 2>/dev/null
```
This downloads only Chromium (~130MB), not all browsers. It's a one-time cost.

## Approach (Mode B)

Generate a temporary test script from the test steps, run it with Playwright's JSON reporter, then parse the results. This avoids needing persistent test infrastructure — the script is created, executed, and cleaned up.

## Script Generation

For each phase (smoke, flow, visual), generate a test script at `.devlyn/browser-test.spec.ts`.

### Smoke Test Script Template

```typescript
import { test, expect } from '@playwright/test';

const PORT = {PORT};
const ROUTES = {ROUTES_JSON_ARRAY};

test.describe('Smoke Tests', () => {
  for (const route of ROUTES) {
    test(`smoke: ${route}`, async ({ page }) => {
      const errors: string[] = [];
      const failedRequests: string[] = [];

      page.on('console', msg => {
        if (msg.type() === 'error') errors.push(msg.text());
      });

      page.on('response', response => {
        if (response.status() >= 400 && !response.url().includes('favicon')) {
          failedRequests.push(`${response.status()} ${response.url()}`);
        }
      });

      // If goto throws (connection refused), the test fails — that's correct behavior
      await page.goto(`http://localhost:${PORT}${route}`, { waitUntil: 'networkidle', timeout: 15000 });

      // Verify this is the actual application, not an error page.
      // When a server is down or a route is broken, the browser shows an error page
      // that still has text content — "Unable to connect", "This site can't be reached", etc.
      // A naive length check would pass on these. The title is the best signal:
      // browser error pages have titles like "Problem loading page" or the URL itself,
      // while real apps have meaningful titles set by the application.
      const title = await page.title();
      const bodyText = await page.textContent('body') || '';

      // Page must have substantive content
      expect(bodyText.trim().length, 'Page body is empty').toBeGreaterThan(0);

      // Fail if the page navigation itself failed (Playwright sets title to the URL on error)
      const pageUrl = page.url();
      expect(title, 'Page shows a browser error — server may be down').not.toBe(pageUrl);

      await page.screenshot({ path: `${SCREENSHOT_DIR}/smoke/${route.replace(/^\//, '').replace(/\//g, '-') || 'root'}.png`, fullPage: true });
      // SCREENSHOT_DIR is the topic-scoped dir set up in PHASE 1 of SKILL.md
      // (e.g., .devlyn/screenshots/add-login-page). Inject it at test-generation
      // time so every test writes into the same per-run folder.

      if (errors.length > 0) {
        test.info().annotations.push({ type: 'console_errors', description: errors.join(' | ') });
      }
      if (failedRequests.length > 0) {
        test.info().annotations.push({ type: 'network_failures', description: failedRequests.join(' | ') });
      }

      expect(errors.filter(e => !e.includes('[HMR]') && !e.includes('favicon'))).toHaveLength(0);
      expect(failedRequests).toHaveLength(0);
    });
  }
});
```

### Flow Test Script Template

For each flow test step from done-criteria, generate a test block:

```typescript
test('flow: [criterion description]', async ({ page }) => {
  // Navigate
  await page.goto(`http://localhost:${PORT}{start_route}`);

  // Find and interact
  await page.click('[text or selector]');
  await page.fill('[selector]', '[value]');
  await page.click('[submit selector]');

  // Verify
  await expect(page.locator('[verification selector]')).toBeVisible();

  // Screenshot
  await page.screenshot({ path: `${SCREENSHOT_DIR}/feature/[criterion-slug]-step[N].png` });
});
```

### Visual Test Script Template

```typescript
test.describe('Visual - Mobile', () => {
  test.use({ viewport: { width: 375, height: 812 } });
  for (const route of ROUTES) {
    test(`visual-mobile: ${route}`, async ({ page }) => {
      await page.goto(`http://localhost:${PORT}${route}`, { waitUntil: 'networkidle' });
      await page.screenshot({ path: `${SCREENSHOT_DIR}/visual/mobile-${route.replace(/^\//, '').replace(/\//g, '-') || 'root'}.png`, fullPage: true });
    });
  }
});

test.describe('Visual - Desktop', () => {
  test.use({ viewport: { width: 1280, height: 800 } });
  for (const route of ROUTES) {
    test(`visual-desktop: ${route}`, async ({ page }) => {
      await page.goto(`http://localhost:${PORT}${route}`, { waitUntil: 'networkidle' });
      await page.screenshot({ path: `${SCREENSHOT_DIR}/visual/desktop-${route.replace(/^\//, '').replace(/\//g, '-') || 'root'}.png`, fullPage: true });
    });
  }
});
```

## Execution

```bash
mkdir -p "$SCREENSHOT_DIR"/{smoke,feature,visual}
npx playwright test .devlyn/browser-test.spec.ts \
  --reporter=json \
  --output=.devlyn/playwright-results \
  > .devlyn/playwright-output.json
```

The JSON reporter writes to stdout. Leave stderr on the terminal rather than merging it with `2>&1` — progress and warning output mixed into the file would make the JSON unparseable.

## Parsing Results

Read `.devlyn/playwright-output.json`. The JSON structure contains:
- `suites[].specs[].tests[].results[].status` — `"passed"`, `"failed"`, `"timedOut"`
- `suites[].specs[].tests[].results[].errors` — error messages with stack traces
- `suites[].specs[].tests[].annotations` — custom annotations (console_errors, network_failures)

Map these to BROWSER-RESULTS.md findings:
- `failed` → route fails smoke, include error message
- Annotations with `console_errors` → list in Runtime Errors section
- Annotations with `network_failures` → list in Failed Network Requests section

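A sketch of the extraction with `jq` (assumed to be installed — it is not a dependency of this skill), run here against a minimal fixture matching the structure documented above:

```shell
# Minimal results file shaped like the documented structure (illustrative fixture).
cat > playwright-output.json <<'EOF'
{"suites":[{"title":"Smoke Tests","specs":[
  {"title":"smoke: /","tests":[{"annotations":[],"results":[{"status":"passed"}]}]},
  {"title":"smoke: /admin","tests":[{"annotations":[{"type":"console_errors","description":"Uncaught TypeError"}],"results":[{"status":"failed","errors":[{"message":"expected 0 console errors"}]}]}]}
]}]}
EOF

# List the titles of specs with any non-passed result. Recursing with `..`
# handles nested suites, which the real reporter can emit.
jq -r '[.. | objects | select(has("tests")) |
        select(any(.tests[].results[]; .status != "passed")) | .title] | .[]' \
  playwright-output.json
```
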
## Cleanup

After parsing results:
```bash
rm -f .devlyn/browser-test.spec.ts
rm -rf .devlyn/playwright-results
rm -f .devlyn/playwright-output.json
```

Keep `$SCREENSHOT_DIR` (`.devlyn/screenshots/<topic-slug>/`) — those screenshots are evidence referenced by the report. Don't touch other topics' directories.

## Limitations vs Tier 1

- No GIF recording (can't capture multi-step flow animations)
- No live DOM exploration (tests are scripted, not interactive)
- Screenshots are full-page captures (`fullPage: true` captures the entire scroll height), not viewport-cropped shots
- Console filtering is code-based (less flexible than chrome MCP pattern matching)
# Tier 3: HTTP Smoke (curl)

Bare-minimum fallback. No browser, no JavaScript execution, no interaction testing. This tier confirms the dev server responds and pages return valid HTML. It catches "app doesn't start" and "page returns 500" but nothing subtler.

Read this file only when Tier 3 was selected during DETECT phase.

---

## What You Can Test

- Server responds on the expected port
- Pages return HTTP 200
- HTML contains a `<body>` with content (not an empty shell)
- No server-side error indicators in the HTML

## What You Cannot Test

- Client-side rendering (SPA content won't appear in curl output)
- JavaScript errors or console output
- Network requests made by the client
- Interactive elements (forms, buttons, navigation)
- Visual layout or responsive behavior
- Screenshots

## Smoke Test

For each affected route:

```bash
# Check HTTP status
STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:{PORT}{route} --max-time 10)

# Get HTML content
HTML=$(curl -s http://localhost:{PORT}{route} --max-time 10)
```

### Pass Criteria

A route passes if:
1. curl succeeds (doesn't error out with connection refused or timeout)
2. `STATUS` is `200` (or `301`, `302`, `304`) — not `000`, not `5xx`
3. HTML contains a `<body` tag
4. HTML body has more than 100 characters of text content (not just empty divs)
5. HTML does not contain server error indicators: `Internal Server Error`, `500`, `ECONNREFUSED`, `Cannot GET`, `404`

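The criteria can be combined into one check. This is a sketch with hard-coded sample values for `STATUS` and `HTML`; in practice they come from the curl commands above. The bare `500`/`404` substrings from criterion 5 are omitted here because they false-positive on ordinary page text:

```shell
# Sample values — in practice captured by the curl commands in the Smoke Test section.
STATUS=200
HTML='<html><head><title>App</title></head><body><div>Dashboard home: navigation sidebar, project list, recent activity feed, and a settings panel with enough visible text to clear the one-hundred-character minimum.</div></body></html>'

PASS=1
case "$STATUS" in 200|301|302|304) ;; *) PASS=0 ;; esac        # criterion 2: acceptable status
echo "$HTML" | grep -q '<body' || PASS=0                        # criterion 3: body tag present
TEXT=$(echo "$HTML" | sed 's/<[^>]*>//g')                       # strip tags to get text content
[ "${#TEXT}" -gt 100 ] || PASS=0                                # criterion 4: substantive text
# Criterion 5: textual error indicators only — a bare "500"/"404" substring
# check would misfire on innocent content like "1500 items".
echo "$HTML" | grep -q -e 'Internal Server Error' -e 'ECONNREFUSED' -e 'Cannot GET' && PASS=0
echo "PASS=$PASS"
```
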
### Parsing HTML Content

Since curl returns raw HTML (no JS execution), for SPAs the body may only contain a root `<div id="root"></div>` or `<div id="__next"></div>`. This is normal and counts as a PASS for Tier 3 — note it as "SPA shell detected, client-side rendering not verifiable at this tier."

For SSR frameworks (Next.js with server components, Nuxt, Astro), the HTML should contain actual rendered content.

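Detecting the SPA-shell case can be sketched with grep. The root-div ids checked here are the common React/Vite and Next.js defaults; other frameworks use different ids, so treat this as a heuristic:

```shell
# Sample curl output from a Vite/CRA-style SPA (illustrative).
HTML='<html><body><div id="root"></div></body></html>'

# Match the common SPA mount points and emit the standard note for the report.
if echo "$HTML" | grep -q -e 'id="root"' -e 'id="__next"'; then
  echo "SPA shell detected, client-side rendering not verifiable at this tier"
fi
```
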
## Report Adjustments

When writing BROWSER-RESULTS.md from Tier 3:
- Set confidence level to LOW
- Leave Console Errors, Network Failures, Flow Tests, and Visual Check sections as "N/A — Tier 3 (HTTP only)"
- Note the limitation: "Tier 3 testing provides HTTP-level validation only. Client-side behavior, JavaScript errors, and visual rendering were not tested. For comprehensive browser validation, install the claude-in-chrome extension (Tier 1) or Playwright (Tier 2)."