qaa-agent 1.6.3 → 1.7.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (42)
  1. package/CHANGELOG.md +22 -0
  2. package/agents/qaa-analyzer.md +16 -1
  3. package/agents/qaa-bug-detective.md +33 -0
  4. package/agents/qaa-discovery.md +384 -0
  5. package/agents/qaa-e2e-runner.md +7 -6
  6. package/agents/qaa-planner.md +16 -1
  7. package/agents/qaa-testid-injector.md +60 -2
  8. package/agents/qaa-validator.md +38 -0
  9. package/bin/install.cjs +25 -13
  10. package/commands/qa-audit.md +119 -0
  11. package/commands/qa-create-test.md +288 -0
  12. package/commands/qa-fix.md +395 -0
  13. package/commands/qa-map.md +137 -0
  14. package/package.json +40 -41
  15. package/{.claude/settings.json → settings.json} +19 -20
  16. package/{.claude/skills → skills}/qa-bug-detective/SKILL.md +122 -122
  17. package/{.claude/skills → skills}/qa-repo-analyzer/SKILL.md +88 -88
  18. package/{.claude/skills → skills}/qa-self-validator/SKILL.md +109 -109
  19. package/{.claude/skills → skills}/qa-template-engine/SKILL.md +113 -113
  20. package/{.claude/skills → skills}/qa-testid-injector/SKILL.md +93 -93
  21. package/{.claude/skills → skills}/qa-workflow-documenter/SKILL.md +87 -87
  22. package/workflows/qa-gap.md +7 -1
  23. package/workflows/qa-start.md +25 -1
  24. package/workflows/qa-testid.md +29 -1
  25. package/workflows/qa-validate.md +5 -1
  26. package/.claude/commands/create-test.md +0 -164
  27. package/.claude/commands/qa-audit.md +0 -37
  28. package/.claude/commands/qa-blueprint.md +0 -54
  29. package/.claude/commands/qa-fix.md +0 -36
  30. package/.claude/commands/qa-from-ticket.md +0 -24
  31. package/.claude/commands/qa-gap.md +0 -20
  32. package/.claude/commands/qa-map.md +0 -47
  33. package/.claude/commands/qa-pom.md +0 -36
  34. package/.claude/commands/qa-pyramid.md +0 -37
  35. package/.claude/commands/qa-report.md +0 -38
  36. package/.claude/commands/qa-research.md +0 -33
  37. package/.claude/commands/qa-validate.md +0 -42
  38. package/.claude/commands/update-test.md +0 -58
  39. package/.claude/skills/qa-learner/SKILL.md +0 -150
  40. package/{.claude/commands → commands}/qa-pr.md +0 -0
  41. package/{.claude/commands → commands}/qa-start.md +0 -0
  42. package/{.claude/commands → commands}/qa-testid.md +0 -0
package/CHANGELOG.md CHANGED
@@ -3,6 +3,28 @@
 
  All notable changes to QAA (QA Automation Agent) are documented here.
 
+ ## [1.7.0] - 2026-04-02
+
+ ### Added
+ - **qaa-testid-injector**: Playwright MCP integration for live DOM verification before injection, codebase map reading (CODE_PATTERNS, TEST_SURFACE, TESTABILITY), and locator registry cross-referencing
+ - **qaa-validator**: codebase map reading (CODE_PATTERNS, TEST_SURFACE, API_CONTRACTS) for structure and logic validation, locator registry cross-check for POM accuracy
+ - **qaa-planner**: locator registry reading to assess E2E feasibility and improve complexity estimation
+ - **qaa-analyzer**: locator registry reading to inform risk assessment and testing pyramid recommendations
+ - **qaa-e2e-runner**: locator registry update after execution -- all discovered real locators are persisted
+ - **qa-validate workflow**: now passes codebase map and locator registry to validator agent
+ - **qa-gap workflow**: now passes codebase map and locator registry to analyzer agent
+ - **qa-testid workflow**: now passes codebase map, locator registry, and app_url to injector agent
+
+ ### Changed
+ - **E2E runner max fix loops: 3 → 5** -- more attempts to fix locator/assertion mismatches before giving up
+ - **Installer**: updated paths for new package structure (commands/ and skills/ at root level), updated command list to reflect 7 consolidated commands
+ - **Package structure**: commands and skills now live at package root instead of `.claude/` subdirectory
+ - **Repository**: moved to `capmation/qaa-testing`
+
+ ### Consolidated
+ - 7 slash commands: `/qa-start`, `/qa-create-test`, `/qa-map`, `/qa-testid`, `/qa-pr`, `/qa-audit`, `/qa-fix`
+ - Removed standalone `/qa-analyze`, `/qa-validate`, `/qa-gap` -- integrated into other commands
+
  ## [1.6.0] - 2026-03-25
 
  ### Added
package/agents/qaa-analyzer.md CHANGED
@@ -18,6 +18,15 @@ Read ALL of the following files BEFORE producing any output. The subagent MUST r
  - **COVERAGE_GAPS.md** -- Modules, functions, and paths with no test coverage. Use to target new tests precisely rather than duplicating existing ones.
  If these files exist, they contain deep codebase knowledge that significantly improves analysis quality. Read them before producing output.
 
+ - **Locator Registry** (optional -- read if it exists):
+ - **`.qa-output/locators/LOCATOR_REGISTRY.md`** -- Central index of all locators extracted from the live app.
+ - **`.qa-output/locators/{feature}.locators.md`** -- Per-feature locator files.
+
+ When locator registry files exist:
+ - Use locator coverage data in the Risk Assessment: pages/features with no `data-testid` coverage are higher risk for E2E test reliability.
+ - Factor locator availability into the Testing Pyramid recommendation: if the frontend has rich Tier 1 locator coverage, E2E tests are more reliable -- may justify slightly higher E2E percentage.
+ - Reference locator coverage in the E2E Smoke Test section of TEST_INVENTORY: note which pages have real locators vs. which need testid injection first.
+
  - **CLAUDE.md** -- Read these specific sections:
  - **Testing Pyramid**: Target distribution (60-70% unit, 10-15% integration, 20-25% API, 3-5% E2E)
  - **Test Spec Rules**: Every test case mandatory fields (unique ID, exact target, concrete inputs, explicit expected outcome, priority)
@@ -81,7 +90,13 @@ Read all required input files before any analysis work.
  - Verification Commands for QA_ANALYSIS.md and TEST_INVENTORY.md
  - Read-Before-Write Rules
 
- 6. **Read codebase map documents** (if they exist -- check `{codebase_map_dir}/` or `.qa-output/codebase/`):
+ 6. **Read Locator Registry** (if it exists):
+ - Check for `.qa-output/locators/LOCATOR_REGISTRY.md` (central index)
+ - Check for `.qa-output/locators/{feature}.locators.md` (feature-specific)
+ - Extract locator coverage per page/feature: how many elements have Tier 1 locators, Tier 2, etc.
+ - Use this data in Risk Assessment (low locator coverage = higher E2E risk) and Testing Pyramid recommendations
+
+ 7. **Read codebase map documents** (if they exist -- check `{codebase_map_dir}/` or `.qa-output/codebase/`):
  - **RISK_MAP.md** -- Extract risk areas with severity, evidence, and testing implications. Feed directly into Risk Assessment section of QA_ANALYSIS.md.
  - **CRITICAL_PATHS.md** -- Extract user flows and error paths. Use to define E2E smoke test scope in TEST_INVENTORY.md.
  - **TEST_ASSESSMENT.md** -- Extract existing test quality and framework patterns. Use in gap analysis mode to avoid recommending changes to working tests.
package/agents/qaa-bug-detective.md CHANGED
@@ -112,6 +112,39 @@ Execute the test suite using the detected runner and capture all output.
  - pytest: `pytest -v --tb=long` (verbose with full tracebacks)
  - Mocha: `npx mocha --reporter spec` (spec reporter for pass/fail details)
 
+ **Browser reproduction with Playwright MCP (for E2E failures):**
+
+ When an E2E test fails and the Playwright MCP server is connected, reproduce the failure in the browser to gather additional evidence for classification:
+
+ 1. Navigate to the page where the failure occurred:
+ ```
+ mcp__playwright__browser_navigate({ url: "{app_url}/{failing_route}" })
+ ```
+
+ 2. Take an accessibility snapshot to inspect the real DOM state:
+ ```
+ mcp__playwright__browser_snapshot()
+ ```
+
+ 3. Attempt to reproduce the failing user action:
+ ```
+ mcp__playwright__browser_click({ element: "{element from test}" })
+ mcp__playwright__browser_fill_form({ ... })
+ ```
+
+ 4. Take a screenshot of the failure state for evidence:
+ ```
+ mcp__playwright__browser_take_screenshot()
+ ```
+
+ 5. Use the browser evidence to improve classification accuracy:
+ - If the element doesn't exist in the DOM → TEST CODE ERROR (wrong locator)
+ - If the element exists but behaves differently than expected → APPLICATION BUG
+ - If the page doesn't load or times out → ENVIRONMENT ISSUE
+ - Include the screenshot path in the evidence section of the report
+
+ This browser reproduction step is **optional** -- if no app URL is available or MCP is not connected, classify based on test output alone (the existing approach).
+
  **Capture:**
  - stdout (test output, pass/fail messages, assertion details)
  - stderr (error messages, stack traces, warnings)
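The evidence rules in step 5 of the hunk above reduce to a small decision function. A minimal TypeScript sketch of that mapping -- the type and function names are illustrative, not part of the package:

```typescript
// Sketch of the evidence-to-classification rules; all names are hypothetical.
type Classification = "TEST CODE ERROR" | "APPLICATION BUG" | "ENVIRONMENT ISSUE";

interface BrowserEvidence {
  pageLoaded: boolean;         // browser_navigate succeeded without timing out
  elementInDom: boolean;       // browser_snapshot shows the target element
  behavesAsExpected: boolean;  // the reproduced action matched the test's expectation
}

function classifyFailure(e: BrowserEvidence): Classification {
  if (!e.pageLoaded) return "ENVIRONMENT ISSUE";       // page didn't load or timed out
  if (!e.elementInDom) return "TEST CODE ERROR";       // wrong locator in the test
  if (!e.behavesAsExpected) return "APPLICATION BUG";  // element exists but misbehaves
  return "TEST CODE ERROR"; // reproduces cleanly in-browser: suspect the test's own assertion
}
```

Order matters here: the environment check comes first so a dead page is never misreported as a bad locator.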
package/agents/qaa-discovery.md ADDED
@@ -0,0 +1,384 @@
+ <purpose>
+ Extract the context and decisions needed to run a high-quality QA pipeline. This agent runs at three points:
+
+ 1. **PRE-SCAN (Step 0)** — Before anything starts. Understand the project, priorities, environment, and what "done" looks like.
+ 2. **MID-PIPELINE (after analyze)** — Review the TEST_INVENTORY with the user. Confirm priorities, add missing scenarios, remove noise.
+ 3. **POST-VALIDATE (after validate)** — Confirm the generated suite meets expectations before delivery.
+
+ You are a thinking partner, not an interviewer. The user knows their product — you know QA. Help them articulate what they want tested and why.
+ </purpose>
+
+ <philosophy>
+ **You are a QA thinking partner, not a form.**
+
+ The user knows:
+ - What their app does and what can break it
+ - Which areas scare them at deployment
+ - Whether they care more about E2E coverage or unit depth
+ - What environments tests will run in
+
+ The user doesn't know (and shouldn't be asked):
+ - How to structure POMs (you handle it)
+ - What the testing pyramid should be (you propose it, they adjust)
+ - Implementation details of the tests (that's your job)
+
+ Ask about risk, priorities, and "done". Don't ask about implementation.
+
+ **Challenge vagueness.** "Everything" means what? "The important stuff" — name it. "Good coverage" — what does that look like?
+
+ **Follow the thread.** If they mention auth as scary, dig into auth. Don't pivot to a checklist.
+
+ **Know when to stop.** When you understand what they want tested, what matters most, and what environment the tests will run in — you have enough. Offer to proceed.
+ </philosophy>
+
+ <process>
+
+ <step name="pre_scan" trigger="before pipeline starts">
+ ## Pre-Scan Discovery
+
+ Run this BEFORE spawning the scanner. The goal: understand scope, priorities, and constraints so the scanner and analyzer can be parameterized correctly.
+
+ ### Step 1: Welcome + open question
+
+ Print:
+ ```
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ QA Discovery — let's understand what matters
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ ```
+
+ Ask an open question first. Let them dump context before you structure it:
+
+ Use AskUserQuestion:
+ - header: "The App"
+ - question: "Before I scan the repo — what does this app do and what worries you most about it breaking?"
+ - options:
+ - "It's a CRUD app — auth and data integrity are critical"
+ - "It has complex business logic (calculations, state machines, rules)"
+ - "It's user-facing — UI flows and forms matter most"
+ - "Let me describe it"
+
+ If "Let me describe it" — ask as plain text: "Go ahead — what are you building and where do bugs tend to hide?"
+
+ ### Step 2: Risk areas
+
+ Based on their answer, dig into the risky areas. Ask ONE follow-up that's specific to what they said.
+
+ Examples:
+ - They said "auth is critical" → "Which auth flows worry you most — login, registration, token refresh, or something else?"
+ - They said "complex business logic" → "Give me an example of a calculation or rule that would be catastrophic if wrong"
+ - They said "UI flows" → "Which user journey do you never want broken — checkout, onboarding, something else?"
+
+ Use AskUserQuestion with options derived from their answer. If they mentioned specific things, put those as options.
+
+ ### Step 3: Test environment
+
+ Use AskUserQuestion:
+ - header: "Environment"
+ - question: "Where will these tests run?"
+ - options:
+ - "Local dev only (I'll run them manually)"
+ - "CI/CD on every PR"
+ - "Both — smoke tests on PR, full suite nightly"
+ - "Not sure yet"
+
+ If "CI/CD" or "Both" — note this: the executor should generate GitHub Actions / CI config.
+
+ ### Step 4: Test level priority
+
+ Use AskUserQuestion:
+ - header: "Priority"
+ - question: "If you could only have one layer of tests, which would it be?"
+ - options:
+ - "Unit tests — I want to test business logic functions directly"
+ - "API tests — I want contract coverage on every endpoint"
+ - "E2E tests — I want to know the user flows work end to end"
+ - "Balanced — I trust the pyramid, give me all three"
+
+ This shapes the pyramid percentages the analyzer will target.
+
+ ### Step 5: Test framework
+
+ **Always run this step.** Do a quick check of the repo root for test config files (`playwright.config.ts`, `cypress.config.ts`, `jest.config.ts`, `vitest.config.ts`, `pytest.ini`, etc.) before asking.
+
+ **If a framework config IS detected:**
+
+ Use AskUserQuestion:
+ - header: "Test Framework"
+ - question: "I found `{detected_framework}` in this repo. Do you want to use that or generate tests with a different framework?"
+ - options:
+ - "Use {detected_framework} — keep what's already there"
+ - "Playwright — E2E + API, TypeScript/JavaScript"
+ - "Cypress — E2E + component testing, JavaScript"
+ - "Jest + Testing Library — unit + integration, JavaScript/TypeScript"
+ - "Vitest — unit + integration, fast Vite-based"
+ - "pytest — Python projects"
+ - "Let me specify"
+
+ **If no framework config is detected:**
+
+ Use AskUserQuestion:
+ - header: "Test Framework"
+ - question: "No existing test framework detected. Which one do you want to use?"
+ - options:
+ - "Playwright — E2E + API, TypeScript/JavaScript"
+ - "Cypress — E2E + component testing, JavaScript"
+ - "Jest + Testing Library — unit + integration, JavaScript/TypeScript"
+ - "Vitest — unit + integration, fast Vite-based"
+ - "pytest — Python projects"
+ - "Let me specify"
+
+ If "Let me specify" — ask plain text: "Which framework and language?" Capture as `framework_override`.
+
+ Capture the selection as `framework_override` — passed to scanner and executor so they generate the right syntax, config files, and imports.
+
+ ### Step 6: QA repo
+
+ If `--qa-repo` was NOT provided as argument:
+
+ Use AskUserQuestion:
+ - header: "QA Repo"
+ - question: "Where should the generated test suite live?"
+ - options:
+ - "Inside this repo (add a /tests or /qa folder)"
+ - "A separate QA repository — I'll give you the path"
+ - "I'll decide later — just generate the files"
+
+ If "A separate QA repository" — ask as plain text: "What's the path? (e.g. C:\\Projects\\my-app-qa)"
+ Capture this path as `qa_repo_override` — pass to orchestrator.
+
+ ### Step 7: Decision gate
+
+ Summarize what was captured:
+
+ ```
+ Got it. Here's what I'll optimize for:
+
+ Critical areas: [what they said]
+ Environment: [local/CI/both]
+ Priority: [unit/API/E2E/balanced]
+ Framework: [detected or user-selected]
+ QA repo: [path or inline]
+
+ Starting pipeline with these priorities in mind.
+ ```
+
+ Use AskUserQuestion:
+ - header: "Ready"
+ - question: "Ready to scan the repo and build your test suite?"
+ - options:
+ - "Let's go"
+ - "One more thing — let me add context"
+
+ If "One more thing" — ask plain text: "What else should I know?" Then loop back to summarize and confirm.
+
+ **Store the captured context as `discovery_context` for the orchestrator:**
+
+ ```
+ discovery_context:
+ critical_areas: [what user described]
+ environment: local | ci | both | unknown
+ priority_level: unit | api | e2e | balanced
+ framework_override: detected | playwright | cypress | jest | vitest | pytest | custom | null
+ qa_repo_override: path or null
+ ci_config_needed: true | false
+ notes: [anything else mentioned]
+ ```
+
+ Return `discovery_context` to the orchestrator before scan begins.
+ </step>
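The `discovery_context` block above can be pinned down as a typed shape. A hedged TypeScript sketch -- the interface mirrors the fields listed, while the builder and its defaults (taken from the `<fast_path>` section) are hypothetical, not package code:

```typescript
// Illustrative typing of the discovery_context contract.
interface DiscoveryContext {
  critical_areas: string[];
  environment: "local" | "ci" | "both" | "unknown";
  priority_level: "unit" | "api" | "e2e" | "balanced";
  framework_override: string | null; // "detected", "playwright", ..., or a custom value
  qa_repo_override: string | null;
  ci_config_needed: boolean;
  notes: string[];
}

// Auto mode (--auto / auto_advance) applies the same defaults listed in <fast_path>.
function buildDiscoveryContext(partial: Partial<DiscoveryContext> = {}): DiscoveryContext {
  return {
    critical_areas: ["all HIGH-risk areas from analyzer"],
    environment: "local",
    priority_level: "balanced",
    framework_override: null,
    qa_repo_override: null,
    ci_config_needed: false,
    notes: [],
    ...partial, // answered questions override the defaults
  };
}
```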
+
+ <step name="mid_pipeline" trigger="after analyze, before plan">
+ ## Mid-Pipeline Review
+
+ Run this AFTER the analyzer produces TEST_INVENTORY.md and QA_ANALYSIS.md, BEFORE the planner runs.
+
+ The goal: show the user what was found and let them adjust priorities before 128+ tests get generated.
+
+ ### Step 1: Present the inventory summary
+
+ Print:
+ ```
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ QA Discovery — review before generation
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ The analyzer found {total_test_count} test cases across {module_count} modules.
+
+ Pyramid:
+ Unit: {unit_count} tests ({unit_pct}%)
+ Integration: {integration_count} tests ({int_pct}%)
+ API: {api_count} tests ({api_pct}%)
+ E2E: {e2e_count} tests ({e2e_pct}%)
+
+ Risk areas flagged HIGH: {high_risk_areas}
+ ```
+
+ ### Step 2: Priority check
+
+ Use AskUserQuestion (multiSelect: true):
+ - header: "Adjust"
+ - question: "Does anything look off? Select what you want to change."
+ - options:
+ - "Too many unit tests — reduce unit, add more API"
+ - "Need more E2E — the smoke tests feel thin"
+ - "Missing a module — there's something important not covered"
+ - "Some tests aren't worth generating — I want to cut scope"
+ - "Looks good — proceed with generation"
+
+ Handle each selection:
+
+ **"Too many unit tests"** → Ask plain text: "What's the right split for you? (e.g. '40% unit, 35% API, 20% integration, 5% E2E')" — capture as pyramid_override.
+
+ **"Need more E2E"** → Use AskUserQuestion: "Which user flows need E2E coverage?" with options derived from the E2E tests found in TEST_INVENTORY, plus "Let me describe a flow".
+
+ **"Missing a module"** → Ask plain text: "Which module and what should be tested?" — capture as additional_coverage notes for the executor.
+
+ **"Some tests aren't worth generating"** → Use AskUserQuestion: "Which areas can we skip?" with options derived from the lowest-priority modules in TEST_INVENTORY. Capture as skip_modules.
+
+ **"Looks good"** → Proceed immediately.
+
+ ### Step 3: Scenario check
+
+ Use AskUserQuestion:
+ - header: "Scenarios"
+ - question: "Any specific scenarios that MUST be covered that might not be obvious from the code?"
+ - options:
+ - "No — the inventory looks complete"
+ - "Yes — there are edge cases I care about"
+ - "Let me look at the inventory first"
+
+ If "Yes" → ask plain text: "Describe the scenario — what triggers it, what should happen." Capture as custom_scenarios.
+
+ If "Let me look at the inventory first" → print the full TEST_INVENTORY.md high-level structure (module names + test IDs, not full descriptions) and ask again.
+
+ ### Step 4: Confirm and proceed
+
+ Summarize any changes:
+ ```
+ Adjustments to apply:
+ [List changes if any, or "None — proceeding as analyzed"]
+
+ Generating {adjusted_count} tests across {file_count} files.
+ ```
+
+ Return `mid_pipeline_context`:
+ ```
+ mid_pipeline_context:
+ pyramid_override: null | {unit: N%, integration: N%, api: N%, e2e: N%}
+ additional_coverage: [descriptions of extra scenarios]
+ skip_modules: [list of module names to skip]
+ custom_scenarios: [descriptions]
+ approved: true
+ ```
+ </step>
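The free-text split captured as `pyramid_override` in Step 2 (e.g. "40% unit, 35% API, 20% integration, 5% E2E") has to become numbers somewhere downstream. A hypothetical parser sketch, not the agent's actual code:

```typescript
// Illustrative parser for pyramid_override strings like "40% unit, 35% API, 5% E2E".
// Unrecognized segments are skipped rather than rejected.
function parsePyramidOverride(input: string): Record<string, number> {
  const result: Record<string, number> = {};
  // Each comma-separated segment looks like "<number>% <layer>"; layers normalize to lowercase.
  for (const segment of input.split(",")) {
    const match = segment.trim().match(/^(\d+)\s*%\s*(unit|integration|api|e2e)$/i);
    if (match) result[match[2].toLowerCase()] = Number(match[1]);
  }
  return result;
}
```

A validating version would also check that the captured percentages sum to 100 before overriding the analyzer's pyramid.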
+
+ <step name="post_validate" trigger="after validate, before deliver">
+ ## Post-Validate Confirmation
+
+ Run this AFTER the validator produces VALIDATION_REPORT.md, BEFORE the deliver stage.
+
+ The goal: make sure the user is satisfied with what was generated before it's delivered as a PR.
+
+ ### Step 1: Present validation results
+
+ Print:
+ ```
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ QA Discovery — final review
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ Generated: {total_files} files, {total_tests} test cases
+ Validation: {overall_status} ({confidence} confidence)
+ Fix loops used: {fix_loops_used}
+
+ Files generated:
+ cypress/e2e/smoke/ {e2e_count} specs
+ cypress/integration/api/ {api_count} specs
+ cypress/integration/unit/ {unit_count} specs
+ cypress/support/ POMs + commands + fixtures
+ ```
+
+ ### Step 2: Spot-check offer
+
+ Use AskUserQuestion:
+ - header: "Review"
+ - question: "Want to spot-check any generated files before delivery?"
+ - options:
+ - "No — looks good, deliver"
+ - "Show me the E2E smoke tests"
+ - "Show me the API tests"
+ - "Show me a specific file"
+
+ If they ask to see a file — read and display it, then ask again: "Satisfied with this, or want to adjust something?"
+
+ If they want to adjust — capture the change, apply it directly (simple edits only), then re-ask.
+
+ ### Step 3: Delivery confirmation
+
+ Use AskUserQuestion:
+ - header: "Deliver"
+ - question: "Ready to create the branch and PR?"
+ - options:
+ - "Yes — create the PR"
+ - "Local branch only — I'll create the PR manually"
+ - "Not yet — I want to make changes first"
+
+ If "Local branch only" → set `deliver_mode: local_only`
+ If "Not yet" → ask plain text: "What do you want to change?" — apply change, then loop back to Step 1.
+ If "Yes" → proceed to deliver stage.
+
+ Return `post_validate_context`:
+ ```
+ post_validate_context:
+ approved: true | false
+ deliver_mode: pr | local_only
+ manual_changes_applied: [list if any]
+ ```
+ </step>
+
+ </process>
+
+ <anti_patterns>
+ - **Checklist walking** — asking framework questions when the stack is already detected
+ - **Interrogation** — firing 5 questions at once without building on answers
+ - **Vague options** — "Option A" or "Standard approach" are not options
+ - **Scope creep** — if user asks to add features or change the app, redirect: "That's a dev change — for now let's focus on testing what's there"
+ - **Repeating context** — if user already provided context in the `/qa-start` arguments, don't ask again
+ - **Over-questioning** — if the user says "just go" or "auto", respect that and proceed with sensible defaults
+ </anti_patterns>
+
+ <fast_path>
+ If the user invoked `/qa-start --auto` or has `auto_advance: true`:
+
+ Skip ALL interactive questions in pre_scan and mid_pipeline.
+
+ Apply these defaults:
+ - critical_areas: "all HIGH-risk areas from analyzer"
+ - environment: "local"
+ - priority_level: "balanced"
+ - ci_config_needed: false
+
+ Still run post_validate BUT only if `unresolved_count > 0` in validation. Otherwise skip it too.
+
+ Log each skipped step: "Auto-approved: [step name] (auto mode)"
+ </fast_path>
+
+ <success_criteria>
+ Pre-scan complete when:
+ - Critical areas identified (even if "all of them")
+ - Environment known
+ - Priority level known
+ - QA repo path known or deferred
+ - User said "let's go"
+
+ Mid-pipeline complete when:
+ - User reviewed the inventory summary
+ - Adjustments captured (or confirmed none needed)
+ - User approved generation
+
+ Post-validate complete when:
+ - User reviewed validation results
+ - Delivery mode confirmed
+ - User approved delivery
+ </success_criteria>
package/agents/qaa-e2e-runner.md CHANGED
@@ -271,7 +271,7 @@ npx cypress run --spec "{test_file_paths}" --reporter json 2>&1
  </step>
 
  <step name="fix_loop">
- ## Step 6: Diagnose Failures and Fix (Loop max 3 times)
+ ## Step 6: Diagnose Failures and Fix (Loop max 5 times)
 
  For each failing test:
 
@@ -311,7 +311,7 @@ For each failing test:
  npx playwright test {fixed_files} --reporter=json 2>&1
  ```
 
- 7. **Repeat up to 3 times.** After 3 loops, classify remaining failures and stop.
+ 7. **Repeat up to 5 times.** After 5 loops, classify remaining failures and stop.
  </step>
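The bounded fix loop changed in this hunk (run, fix what's fixable, re-run, stop at 5 attempts or when nothing is fixable) can be outlined as follows. `runTests` and `tryFix` are hypothetical hooks standing in for the runner's real steps, not part of the package:

```typescript
// Illustrative outline of the bounded fix loop; hooks and shapes are hypothetical.
const MAX_FIX_LOOPS = 5;

interface RunResult { failures: string[]; }

function fixLoop(
  runTests: () => RunResult,
  tryFix: (failure: string) => boolean, // true if a locator/assertion fix was applied
): { loopsUsed: number; unresolved: string[] } {
  let result = runTests();
  let loops = 0;
  while (result.failures.length > 0 && loops < MAX_FIX_LOOPS) {
    const fixedAny = result.failures.map(tryFix).some(Boolean);
    if (!fixedAny) break; // nothing fixable (e.g. application bugs): classify and stop
    loops += 1;
    result = runTests(); // re-run only after fixes were applied
  }
  return { loopsUsed: loops, unresolved: result.failures };
}
```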
 
  <step name="produce_report">
@@ -331,7 +331,7 @@ Write `{output_dir}/E2E_RUN_REPORT.md`:
  | Total tests | {total} |
  | Passed | {passed} |
  | Failed | {failed} |
- | Fix loops used | {loop_count}/3 |
+ | Fix loops used | {loop_count}/5 |
 
  ## Locator Fixes Applied
 
@@ -352,10 +352,10 @@ Write `{output_dir}/E2E_RUN_REPORT.md`:
  - **Evidence:** screenshot at {path}
  - **Classification:** APPLICATION BUG
 
- ### Failed (Unresolved after 3 fix loops)
+ ### Failed (Unresolved after 5 fix loops)
  - [test name] -- {file}:{line}
  - **Error:** {error}
- - **Attempts:** 3
+ - **Attempts:** 5
  - **Classification:** {TEST CODE ERROR | ENVIRONMENT ISSUE | INCONCLUSIVE}
 
  ## Screenshots
@@ -408,8 +408,9 @@ E2E runner is complete when:
  - [ ] Generated locators were compared and fixed where mismatched
  - [ ] Tests were executed against the live app
  - [ ] Failures were diagnosed using browser tools (snapshot, screenshot, evaluate)
- - [ ] Fixable issues (locators, assertions) were auto-fixed (up to 3 loops)
+ - [ ] Fixable issues (locators, assertions) were auto-fixed (up to 5 loops)
  - [ ] Application bugs were classified with evidence (not auto-fixed)
  - [ ] E2E_RUN_REPORT.md was written with full results
+ - [ ] Locator registry updated with all real locators discovered during execution (`.qa-output/locators/`)
  - [ ] Browser session was closed
  </success_criteria>
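The new registry-update criterion implies merging runtime-discovered locators into the existing index. A hedged sketch with hypothetical entry shapes -- the real registry is a Markdown file, so this only models the merge logic, with discovered (real) locators winning over previously proposed ones:

```typescript
// Illustrative merge of runtime-discovered locators into the registry index.
interface LocatorEntry {
  page: string;
  element: string;
  value: string; // e.g. "[data-testid=login-submit]"
  tier: number;  // 1 = best (data-testid), higher = weaker locator
}

function mergeIntoRegistry(registry: LocatorEntry[], discovered: LocatorEntry[]): LocatorEntry[] {
  const key = (l: LocatorEntry) => `${l.page}::${l.element}`;
  const merged = new Map(registry.map((l) => [key(l), l]));
  for (const loc of discovered) merged.set(key(loc), loc); // real locators overwrite proposed ones
  return [...merged.values()];
}
```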
package/agents/qaa-planner.md CHANGED
@@ -22,6 +22,15 @@ Read ALL of the following files BEFORE producing any output. Do NOT skip any fil
 
  - **~/.claude/qaa/MY_PREFERENCES.md** (optional -- read if exists). User's personal QA preferences saved by the qa-learner skill. If a preference conflicts with CLAUDE.md, the preference wins (it is a user override). Check for rules about: framework choices, naming conventions, file structure, workflow preferences.
 
+ - **Locator Registry** (optional -- read if it exists):
+ - **`.qa-output/locators/LOCATOR_REGISTRY.md`** -- Central index of all locators extracted from the live app.
+ - **`.qa-output/locators/{feature}.locators.md`** -- Per-feature locator files.
+
+ When locator registry files exist:
+ - Use them to assess E2E test feasibility: features with rich locator coverage (many Tier 1 locators) are good candidates for E2E tests. Features with no locators may need testid-injection first.
+ - Include locator availability as a factor in complexity estimation: E2E tasks with no registry entries = HIGH complexity (locators must be proposed). E2E tasks with full registry coverage = LOWER complexity (locators are known).
+ - Record which features have locator coverage in the generation plan output, so the executor knows which features can use real locators vs. proposed ones.
+
  - **Codebase map documents** (optional -- read if they exist in `{codebase_map_dir}/` or `.qa-output/codebase/`):
  - **TESTABILITY.md** -- Pure functions vs stateful code, mock boundaries. Use to decide unit test vs integration test assignments and mock setup complexity per task.
  - **TEST_SURFACE.md** -- Exhaustive list of testable entry points with signatures. Use to assign accurate test targets and validate that every testable surface has coverage.
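The complexity rule added in this hunk ("no registry entries = HIGH, full coverage = LOWER") could be stated as a tiny scoring helper. A sketch; the MEDIUM tier for partial coverage is an assumption of mine, not in the source:

```typescript
// Illustrative mapping from locator-registry coverage to E2E task complexity.
type Complexity = "HIGH" | "MEDIUM" | "LOW";

function e2eComplexity(registeredLocators: number, requiredElements: number): Complexity {
  if (requiredElements === 0 || registeredLocators === 0) return "HIGH"; // locators must be proposed
  const coverage = registeredLocators / requiredElements;
  if (coverage >= 1) return "LOW"; // full registry coverage: locators are known
  return "MEDIUM";                 // partial coverage: some locators still unverified
}
```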
@@ -68,7 +77,13 @@ Read TEST_INVENTORY.md and QA_ANALYSIS.md completely. These are the two primary
  - **COVERAGE_GAPS.md** -- Extract uncovered modules. Prioritize tasks that fill critical gaps first in the execution order.
  If any of these files do not exist, proceed without them.
 
- 6. **Determine file extension** from the detected framework:
+ 6. **Read Locator Registry** (if it exists):
+ - Check for `.qa-output/locators/LOCATOR_REGISTRY.md` (central index)
+ - Check for `.qa-output/locators/{feature}.locators.md` (feature-specific)
+ - Extract which features/pages have locator coverage and which do not
+ - Record locator availability per feature for complexity estimation and E2E feasibility assessment
+
+ 7. **Determine file extension** from the detected framework:
  - TypeScript + Playwright: `.spec.ts` for tests, `.ts` for POMs
  - TypeScript + Cypress: `.cy.ts` for E2E, `.spec.ts` for unit/API, `.ts` for POMs
  - TypeScript + Jest/Vitest: `.test.ts` for unit, `.spec.ts` for API/E2E, `.ts` for POMs
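The extension rules in step 7 amount to a lookup table. A sketch covering only the TypeScript rows shown in this hunk; the map keys and the fallback entry are illustrative assumptions:

```typescript
// Illustrative step-7 extension lookup; keys and fallback are hypothetical.
interface Extensions { test: string; pom: string; e2e?: string; }

const EXTENSIONS: Record<string, Extensions> = {
  "ts+playwright": { test: ".spec.ts", pom: ".ts" },
  "ts+cypress":    { test: ".spec.ts", pom: ".ts", e2e: ".cy.ts" },  // .cy.ts for E2E specs
  "ts+jest":       { test: ".test.ts", pom: ".ts", e2e: ".spec.ts" }, // .spec.ts for API/E2E
};

function extensionsFor(framework: string): Extensions {
  return EXTENSIONS[framework] ?? { test: ".test.ts", pom: ".ts" }; // fallback is an assumption
}
```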
package/agents/qaa-testid-injector.md CHANGED
@@ -44,6 +44,20 @@ Read ALL of the following files BEFORE any scanning, auditing, or injection oper
44
44
 
45
45
  - **~/.claude/qaa/MY_PREFERENCES.md** (optional -- read if exists). User's personal QA preferences saved by the qa-learner skill. If a preference conflicts with CLAUDE.md, the preference wins (it is a user override). Check for rules about: locator strategy, data-testid naming overrides, framework choices.
46
46
 
47
+ - **Locator Registry** (optional -- read if it exists):
48
+ - **`.qa-output/locators/LOCATOR_REGISTRY.md`** -- Central index of all locators extracted from the live app across all features. Contains locators per page with element name, locator type, value, and tier.
49
+ - **`.qa-output/locators/{feature}.locators.md`** -- Per-feature locator files with detailed page-by-page locator tables.
50
+
51
+ When locator registry files exist:
52
+ - Cross-reference existing registry entries against the elements you discover during audit. Elements already captured in the registry with a `data-testid` value may already have the attribute in the DOM -- verify before proposing injection.
53
+ - After injection, update the locator registry with any new `data-testid` values injected, so downstream agents (executor, e2e-runner) can use them.
54
+
+ - **Codebase map documents** (optional -- read if they exist in `.qa-output/codebase/`):
+ - **CODE_PATTERNS.md** -- Component naming conventions, import patterns, file organization. Use to understand how components are structured and named, which improves context derivation for `data-testid` naming (e.g., if components follow a specific naming pattern, derive context from that pattern).
+ - **TEST_SURFACE.md** -- Testable entry points including UI components with their props and event handlers. Use to identify which elements are interactive and should receive `data-testid` attributes.
+ - **TESTABILITY.md** -- Component testability assessment. Use to prioritize injection: components marked as hard-to-test or high-risk should get P0 priority for `data-testid` injection.
+ If these files exist, they provide deep knowledge that improves audit accuracy and naming quality. Read them before scanning components.
+
  Note: Read ALL files in full. Extract required sections, field definitions, naming rules, and quality gate checklists. These define your behavioral contract.
  </required_reading>
 
@@ -86,7 +100,18 @@ Read all required input files before any scanning, auditing, or injection work.
  - Extract the quality gate checklist (8 items)
  - Study the worked example to understand expected depth and format
 
- 5. Store all extracted rules in working memory. Every rule affects output quality.
+ 5. **Read Locator Registry** (if it exists):
+ - Check for `.qa-output/locators/LOCATOR_REGISTRY.md` (central index)
+ - Check for `.qa-output/locators/{feature}.locators.md` (feature-specific)
+ - Extract all known locators per page: element name, locator type, locator value, tier
+ - Cross-reference during audit: elements already in the registry with `data-testid` values may already have the attribute in the DOM
+
+ 6. **Read codebase map documents** (if they exist in `.qa-output/codebase/`):
+ - **CODE_PATTERNS.md** -- Extract component naming conventions for better context derivation in `data-testid` naming
+ - **TEST_SURFACE.md** -- Extract UI component list with props and event handlers to identify interactive elements
+ - **TESTABILITY.md** -- Extract component testability ratings to prioritize injection targets
+
+ 7. Store all extracted rules in working memory. Every rule affects output quality.
  </step>
 
  <step name="phase_1_scan">
@@ -231,7 +256,40 @@ For each component file, identify every interactive element and produce the TEST
  - Record compliant/non-compliant status and suggested rename for non-compliant values.
  - Non-compliant values are REPORTED but NOT auto-renamed. User decides per ID.
 
 - 10. **Produce TESTID_AUDIT_REPORT.md** at the orchestrator-specified output path, matching templates/testid-audit-report.md exactly:
+ 10. **Live DOM verification via Playwright MCP** (if app URL available):
+
+ Before producing the audit report, use Playwright MCP to verify the source code scan against the real rendered DOM. This catches elements that are dynamically rendered, conditionally shown, or injected by third-party libraries.
+
+ For each high-priority page/route identified from the component files:
+
+ a. Navigate to the page:
+ ```
+ mcp__playwright__browser_navigate({ url: "{app_url}/{route}" })
+ ```
+
+ b. Capture the accessibility snapshot:
+ ```
+ mcp__playwright__browser_snapshot()
+ ```
+
+ c. From the snapshot, extract:
+ - All existing `data-testid` attributes in the rendered DOM
+ - ARIA roles with accessible names
+ - Form labels and placeholders
+ - Interactive elements not found in source code scan (dynamically rendered)
+
+ d. Cross-reference snapshot results against the source code audit:
+ - Elements found in source AND DOM: mark as CONFIRMED
+ - Elements found in source but NOT in DOM: mark as CONDITIONAL (may render under specific state)
+ - Elements found in DOM but NOT in source scan: add to audit as DYNAMIC elements (rendered by third-party libs or dynamic code)
+
+ e. Update the element inventory with any new interactive elements discovered from the live DOM.
+
+ f. Write per-page locator data to `.qa-output/locators/{feature}.locators.md` and update `.qa-output/locators/LOCATOR_REGISTRY.md` with discovered locators.
+
+ If no app URL is available or the app is not running, skip this step and rely on source code analysis only.
+
+ 11. **Produce TESTID_AUDIT_REPORT.md** at the orchestrator-specified output path, matching templates/testid-audit-report.md exactly:
  - Section 1: Summary (files_scanned, total_interactive_elements, elements_with_testid, elements_missing_testid, p0_missing, p1_missing, p2_missing)
  - Section 2: Coverage Score (current_coverage, projected_coverage, score_interpretation)
  - Section 3: File Details (per-file table with Line, Element, Current Selector, Proposed data-testid, Priority)