npm - qaa-agent - Versions diffs - 1.7.4 → 1.8.0 - Mend

qaa-agent 1.7.4 → 1.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

package/CHANGELOG.md +25 -0
package/README.md +1 -1
package/agents/qa-pipeline-orchestrator.md +47 -0
package/agents/qaa-analyzer.md +41 -0
package/agents/qaa-bug-detective.md +95 -0
package/agents/qaa-codebase-mapper.md +3 -0
package/agents/qaa-e2e-runner.md +86 -0
package/agents/qaa-executor.md +98 -0
package/agents/qaa-planner.md +41 -0
package/agents/qaa-testid-injector.md +68 -0
package/agents/qaa-validator.md +47 -0
package/commands/qa-audit.md +7 -0
package/commands/qa-create-test.md +30 -0
package/commands/qa-fix.md +4 -0
package/commands/qa-map.md +2 -0
package/package.json +1 -1
package/bin/install.cjs +0 -212

package/CHANGELOG.md CHANGED

@@ -3,6 +3,31 @@
 All notable changes to QAA (QA Automation Agent) are documented here.
+## [1.8.0] - 2026-04-13
+### Added
+- **Active verification checklist in every agent** — all 8 pipeline agents now end their body with a `## Before completing any task, verify each item actively:` section that forces the agent to run real `ls` + `cat` + `grep` commands against `.qa-output/` artifacts, the Locator Registry, codebase map documents, and `MY_PREFERENCES.md` before closing the task. The output of those commands lands in the subagent's context (recency effect), so the model cannot skip reading inputs or leave outputs unwritten without the verification failing.
+- **`skills:` declared in YAML frontmatter for every agent** — `qaa-analyzer`, `qaa-planner`, `qaa-executor`, `qaa-validator`, `qaa-e2e-runner`, `qaa-bug-detective`, `qaa-testid-injector`, `qaa-codebase-mapper`, `qa-pipeline-orchestrator`. Claude Code now injects the matching SKILL.md content at the start of the subagent's context when the Task tool spawns it. Previously subagents spawned with empty context and ignored the skill entirely.
+- **Non-negotiable rules section in `qaa-bug-detective`** — explicit rules for Locator Registry persistence and MY_PREFERENCES.md updates, placed mid-body as redundant reinforcement between the frontmatter (start) and the active checklist (end).
+- **`MY_PREFERENCES.md` reads propagated across slash commands** — `qa-create-test`, `qa-fix`, `qa-audit`, `qa-map` now pass `~/.claude/qaa/MY_PREFERENCES.md` to every spawned agent via `files_to_read`.
+- **Locator Registry reads propagated** — `qa-fix` and `qa-audit` now pass `.qa-output/locators/` to bug-detective, e2e-runner, and validator subagents.
+- **Playwright MCP usage is now non-negotiable in 4 agents** — `qaa-e2e-runner`, `qaa-testid-injector`, `qaa-bug-detective`, `qaa-executor` now hardcode Non-negotiable rules in their body that make live browser interaction via Playwright MCP **mandatory** (not optional) under the appropriate conditions. Previously agents sometimes skipped MCP calls even when the skill described them, because the description was advisory rather than enforced.
+- **MCP evidence files** at `.qa-output/mcp-evidence/{agent-name}-session.md` — every MCP-using agent now writes a structured evidence file per session logging `session_start`, `session_end`, URLs navigated, snapshots/screenshots taken, interactions performed, and `browser_closed: true`. The active verification checklist at the end of each agent runs `ls` + `grep` on this evidence file; missing or empty file = invalid run = hard failure.
+- **Skip-reason tracking** — when MCP is legitimately skipped (no `app_url`, non-E2E failure, MCP not connected), agents must document the skip reason in their primary report (TESTID_AUDIT_REPORT.md / FAILURE_CLASSIFICATION_REPORT.md). Silent skips are no longer permitted.
+- **Locator resolution priority chain — invention is forbidden** — `qaa-executor`, `qaa-e2e-runner`, and `qaa-bug-detective` now enforce a strict priority order when writing any locator: (1) Locator Registry first, (2) frontend source code `grep` second, (3) Playwright MCP live DOM snapshot third, (4) HALT if nothing resolvable. Agents MUST NOT invent `data-testid` values or guess CSS selectors. Every locator written to a generated file requires `source: registry | codebase | mcp` attribution in the MCP evidence file — anything else triggers file deletion or revert.
+- **Priority hit counts logged** — MCP evidence files now track `priority1_hits` (registry reuse), `priority2_hits` (source extraction), `priority3_hits` (MCP discovery), and `priority4_halts` (unresolvable elements), giving a full audit trail of where every locator came from.
+### Changed
+- **Agent reliability pattern: triple reinforcement** — every critical rule is now reinforced three times: (1) `skills:` frontmatter injection at the start of context, (2) `required_reading` + mid-body non-negotiable rules, (3) active `ls`/`cat`/`grep` verification at the end. This closes the "lost in the middle" attention gap documented in long-context LLM research.
+- **`qaa-bug-detective`, `qaa-executor`, `qaa-e2e-runner`, `qaa-validator`** — existing active checklists extended with `.qa-output/` specific items (generation plan, test inventory, codebase map, validation layers, failure classification evidence).
+### Fixed
+- **Subagent skill loss** — when a parent agent spawned a subagent via `Task()`, the subagent ran with fresh context and ignored the skill entirely (it had no way to know a skill existed). Declaring `skills:` in the YAML frontmatter fixes this at the Claude Code loader level.
+- **Artifact-read drift** — agents would sometimes reference `.qa-output/` artifacts in their reasoning without actually reading them. The active `grep` on specific content (e.g. "RISK_MAP HIGH items", "VALIDATION_REPORT confidence level") forces real consumption.
 ## [1.7.0] - 2026-04-02
 ### Added
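
The locator resolution order described in the 1.8.0 entry above can be sketched as a small shell helper. This is only an illustration under stated assumptions: `attribute_locator` is a hypothetical name, the registry is assumed to mention each known value somewhere in its Markdown files, and the match is a plain substring search. Priority 3 (the live DOM via Playwright MCP) needs the agent's browser tools, so the sketch leaves it out and falls through to the halt case.

```bash
# attribute_locator REGISTRY_DIR SOURCE_DIR TESTID   (hypothetical helper)
# Prints the attribution for a candidate data-testid value, or a HALT line.
attribute_locator() {
  registry_dir=$1
  source_dir=$2
  testid=$3

  # Priority 1: the value is already recorded somewhere in the registry.
  if grep -rqF -- "$testid" "$registry_dir" 2>/dev/null; then
    echo "source: registry"
    return 0
  fi

  # Priority 2: the value appears as a data-testid attribute in the source.
  if grep -rqF -- "data-testid=\"$testid\"" "$source_dir" 2>/dev/null; then
    echo "source: codebase"
    return 0
  fi

  # Priority 3 (Playwright MCP snapshot) cannot be reproduced in plain shell
  # and is deliberately skipped in this sketch.

  # Priority 4: nothing attests the value, so refuse rather than guess.
  echo "HALT: locator unresolvable"
  return 1
}
```

A caller would write the locator only when the helper returns 0 and would record the printed `source:` line alongside it.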

package/README.md CHANGED

@@ -43,7 +43,7 @@ npx qaa-agent
 The interactive installer:
 1. Copies agents, commands, skills, templates, and workflows into your runtime directory
-2. Configures the [Playwright MCP](https://github.com/anthropics/mcp-playwright) server in your user-scope config (`~/.claude.json`) so it's available in **all projects**
+2. Configures the [Playwright MCP](https://www.npmjs.com/package/@playwright/mcp) server in your user-scope config (`~/.claude.json`) so it's available in **all projects**
 3. Merges required permissions into `settings.json`
 **Supported runtimes:** Claude Code, OpenCode

package/agents/qa-pipeline-orchestrator.md CHANGED

@@ -1,3 +1,10 @@
+---
+name: qa-pipeline-orchestrator
+description: Single orchestrator for the QA automation pipeline
+skills:
+  - qa-workflow-documenter
+---
 <purpose>
 Single orchestrator for the QA automation pipeline. Coordinates all 7 agent types (scanner, analyzer, planner, executor, validator, bug-detective, testid-injector) across 3 workflow options. Owns all pipeline state transitions -- agents never update state directly. The orchestrator sets stage status to 'running' before spawning an agent and 'complete' or 'failed' after the agent returns.
@@ -1376,3 +1383,43 @@ Before this orchestrator is considered complete, verify:
 4. Checkpoints pause when appropriate and auto-approve when safe
 5. Failure in any stage stops the pipeline cleanly with actionable error message
 </success_criteria>
+## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+```bash
+echo "=== PIPELINE ORCHESTRATOR CHECKLIST START ==="
+echo "1. Pipeline state file:"
+ls .planning/STATE.md 2>/dev/null || echo "STATE_FILE_NOT_FOUND"
+echo "2. Stage status fields:"
+grep -E "scan_status|analyze_status|generate_status|validate_status|deliver_status" .planning/STATE.md 2>/dev/null || echo "NO_STATUS_FIELDS"
+echo "3. All .qa-output/ artifacts:"
+ls .qa-output/ 2>/dev/null || echo "QA_OUTPUT_EMPTY"
+echo "4. SCAN_MANIFEST.md (always required):"
+ls .qa-output/SCAN_MANIFEST.md 2>/dev/null || echo "NO_SCAN_MANIFEST"
+echo "5. Codebase map document count:"
+[ -d .qa-output/codebase ] && ls .qa-output/codebase/ | wc -l || echo "NO_CODEBASE_MAP"
+echo "6. Analyzer artifacts:"
+ls .qa-output/QA_ANALYSIS.md .qa-output/TEST_INVENTORY.md 2>/dev/null || echo "NO_ANALYZER_ARTIFACTS"
+echo "7. TestID audit report:"
+ls .qa-output/TESTID_AUDIT_REPORT.md 2>/dev/null || echo "NO_TESTID_REPORT"
+echo "8. Generation plan:"
+ls .qa-output/GENERATION_PLAN.md 2>/dev/null || echo "NO_GENERATION_PLAN"
+echo "9. Validation report:"
+ls .qa-output/VALIDATION_REPORT.md 2>/dev/null || echo "NO_VALIDATION_REPORT"
+echo "10. E2E + bug-detective reports:"
+ls .qa-output/E2E_RUN_REPORT.md .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_E2E_REPORTS"
+echo "11. State transitions with timestamps:"
+grep -cE "^- |^[0-9]+\." .planning/STATE.md 2>/dev/null || echo "NO_STATE_TRANSITIONS"
+echo "12. MY_PREFERENCES.md:"
+cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+echo "=== PIPELINE ORCHESTRATOR CHECKLIST END ==="
+```
+**Rules:**
+- Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+- If any output shows a problem (STATE_FILE_NOT_FOUND, NO_SCAN_MANIFEST), fix it before returning.
+- If output shows expected "not found" results (e.g., NO_TESTID_REPORT when no frontend was detected), that is fine — the point is you RAN the command instead of assuming the answer.
+- Do NOT return control to the user until the block has been executed and you have read every line of output.
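
The rules above distinguish sentinels that signal a real problem from sentinels that are expected in some runs. A minimal sketch of that triage, assuming the checklist output has been captured and is fed on stdin: `has_blocking_sentinel` is a hypothetical helper, and its list contains only the two sentinels the rules name as problems.

```bash
# has_blocking_sentinel   (hypothetical helper; reads checklist output on stdin)
# Prints any blocking sentinel lines and returns 0 if at least one is present.
# Returns 1 when only informational sentinels (or none) appear.
has_blocking_sentinel() {
  grep -E '^(STATE_FILE_NOT_FOUND|NO_SCAN_MANIFEST)$'
}
```

Sentinels such as `NO_TESTID_REPORT` pass through without a match, which mirrors the "expected not found" case.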

package/agents/qaa-analyzer.md CHANGED

@@ -1,3 +1,10 @@
+---
+name: qaa-analyzer
+description: Analyzes scanned repo to produce QA analysis and test inventory
+skills:
+  - qa-repo-analyzer
+---
 <purpose>
 Analyze a scanned repository to produce QA_ANALYSIS.md and TEST_INVENTORY.md -- the two primary analysis artifacts that drive all downstream test planning and generation. Consumes SCAN_MANIFEST.md (produced by the scanner agent) and CLAUDE.md (QA standards) to produce a comprehensive testability report with architecture overview, risk assessment, top 10 unit test targets, API contract targets, and a testing pyramid distribution tailored to the specific repository. Produces a pyramid-based test case inventory where every test case has a unique ID, specific target, concrete inputs, explicit expected outcome with exact values, and priority. Optionally produces QA_REPO_BLUEPRINT.md for Option 1 (dev-only) workflows when no existing QA repository exists. Spawned by the orchestrator after the scanner completes successfully via Task(subagent_type='qaa-analyzer').
 </purpose>
@@ -537,3 +544,37 @@ The analyzer agent has completed successfully when:
 5. All artifacts are committed via `node bin/qaa-tools.cjs commit`
 6. Return to orchestrator: file paths, total test count, pyramid breakdown (unit/integration/api/e2e counts), risk count (high/medium/low)
 </success_criteria>
+## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+```bash
+echo "=== ANALYZER CHECKLIST START ==="
+echo "1. SCAN_MANIFEST.md (input):"
+ls .qa-output/SCAN_MANIFEST.md 2>/dev/null || echo "SCAN_MANIFEST_NOT_FOUND"
+echo "2. SCAN_MANIFEST content preview:"
+head -50 .qa-output/SCAN_MANIFEST.md 2>/dev/null || echo "SCAN_MANIFEST_EMPTY"
+echo "3. Codebase map documents:"
+ls .qa-output/codebase/ 2>/dev/null || echo "NO_CODEBASE_MAP"
+echo "4. RISK_MAP.md risks:"
+{ grep -E "^## |HIGH|MEDIUM|LOW" .qa-output/codebase/RISK_MAP.md 2>/dev/null || echo "NO_RISK_MAP"; } | head -20
+echo "5. CRITICAL_PATHS.md flows:"
+grep -c "^- \|^[0-9]\+\." .qa-output/codebase/CRITICAL_PATHS.md 2>/dev/null || echo "NO_CRITICAL_PATHS"
+echo "6. TEST_SURFACE.md entry points:"
+{ grep -E "function|class|method" .qa-output/codebase/TEST_SURFACE.md 2>/dev/null || echo "NO_TEST_SURFACE"; } | head -10
+echo "7. Locator Registry:"
+ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+echo "8. Output artifacts:"
+ls .qa-output/QA_ANALYSIS.md .qa-output/TEST_INVENTORY.md 2>/dev/null || echo "OUTPUTS_NOT_WRITTEN"
+echo "9. MY_PREFERENCES.md:"
+cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+echo "=== ANALYZER CHECKLIST END ==="
+```
+**Rules:**
+- Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+- If any output shows a problem (SCAN_MANIFEST_NOT_FOUND, OUTPUTS_NOT_WRITTEN), fix it before returning.
+- If output shows expected "not found" results (e.g., NO_CODEBASE_MAP when mapper hasn't run yet), that is fine — the point is you RAN the command instead of assuming the answer.
+- Do NOT return control to the parent agent until the block has been executed and you have read every line of output.
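
When a check pipes into `head` or `wc`, the placement of the `||` fallback matters. In a shell without `pipefail`, a pipeline's exit status is that of its last command, so a fallback written after the pipe reacts to `head`, not to `grep`. The two hypothetical functions below contrast the placements; the names and the `NO_MATCH` sentinel are illustrative only.

```bash
# Fallback after the pipe: `||` tests the exit status of `head`, which
# succeeds even when grep matched nothing, so (without pipefail) the
# sentinel is not printed for a missing file or an empty match.
preview_after_pipe() {
  grep -E "$1" "$2" 2>/dev/null | head -n 5 || echo "NO_MATCH"
}

# Fallback grouped with grep: the sentinel is produced before the pipe,
# so it is printed when grep fails or the file is missing.
preview_grouped() {
  { grep -E "$1" "$2" 2>/dev/null || echo "NO_MATCH"; } | head -n 5
}
```

Both forms print the same lines when the pattern matches. They differ only in the no-match and missing-file cases.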

package/agents/qaa-bug-detective.md CHANGED

@@ -1,3 +1,10 @@
+---
+name: qaa-bug-detective
+description: Classifies failures and fixes test code errors
+skills:
+  - qa-bug-detective
+---
 <purpose>
 Run generated tests against the actual application and classify every failure into one of four actionable categories: APPLICATION BUG, TEST CODE ERROR, ENVIRONMENT ISSUE, or INCONCLUSIVE. Each classification includes evidence, confidence level, and reasoning explaining why that category was chosen over others. Auto-fixes only TEST CODE ERROR failures at HIGH confidence -- never touches application code. Reads test source files, CLAUDE.md classification rules, and the failure-classification template. Produces FAILURE_CLASSIFICATION_REPORT.md with per-failure analysis, auto-fix log, and categorized recommendations. Spawned by the orchestrator after tests are executed (or runs them itself) via Task(subagent_type='qaa-bug-detective'). This agent actually RUNS the test suite -- it is not static analysis. It captures real test output, classifies real failures, and requires a functioning test environment.
 </purpose>
@@ -309,6 +316,54 @@ Attempt auto-fixes for eligible failures. Strict eligibility rules apply.
 **Track all auto-fix attempts** for the Auto-Fix Log section of the report.
 </step>
+## Non-negotiable rules
+These rules are hardcoded in the agent body because they MUST NOT be skipped under any circumstance, regardless of whether the skill is loaded or not.
+### Locator Registry persistence
+After every fix loop iteration where the test **PASSES**:
+1. **Save all verified locators** to `.qa-output/locators/` — write a per-feature file `.qa-output/locators/{feature}.locators.md` and update `.qa-output/locators/LOCATOR_REGISTRY.md`.
+2. **Only save locators that were confirmed working** by a passing test. Do NOT save locators from failing tests — they may be incorrect and would contaminate the registry.
+3. **Locator format in registry:** Each entry must include: the `data-testid` or selector value, the tier (1-4), the page/component context, and the date verified.
+### MY_PREFERENCES.md persistence
+After every fix where a correction contradicts CLAUDE.md defaults or reveals a user-specific pattern:
+1. **Read `~/.claude/qaa/MY_PREFERENCES.md`** if it exists, before producing any output (this is also in `<required_reading>` but repeated here for emphasis).
+2. **Save new corrections** to `~/.claude/qaa/MY_PREFERENCES.md` so future agent instances inherit the learning.
+3. Preferences override CLAUDE.md when there is a conflict.
+### Playwright MCP reproduction is mandatory for E2E failures
+When an E2E test fails **and** Playwright MCP server is connected **and** an `app_url` is available, browser reproduction is **required, not optional** — classifying an E2E failure without reproducing it in the live browser produces unreliable APPLICATION BUG vs TEST CODE ERROR classifications.
+1. **For each E2E failure in the test run:** call at minimum `mcp__playwright__browser_navigate` (to the failing route), `mcp__playwright__browser_snapshot` (to inspect the real DOM), and `mcp__playwright__browser_take_screenshot` (visual evidence attached to the classification).
+2. **Skip is only permitted when:** the failure is a unit/API test (not E2E), OR no `app_url` is available, OR Playwright MCP is not connected. The skip MUST be recorded in FAILURE_CLASSIFICATION_REPORT.md under the failure's evidence section with reason (e.g., "MCP unavailable" or "no app_url").
+3. **Persist evidence of MCP usage** to `.qa-output/mcp-evidence/qaa-bug-detective-session.md` with:
+   - `session_start: {ISO timestamp}` and `session_end: {ISO timestamp}`
+   - `failures_reproduced:` list of `{test_id, route, classification}`
+   - `snapshots_taken:` count + route
+   - `screenshots_taken:` list of screenshot paths (evidence for classifications)
+   - `browser_closed: true`
+4. **If E2E failures exist and the evidence file is missing or empty, classifications for those failures are INVALID** — mark them INCONCLUSIVE with reason "MCP reproduction skipped" rather than making up an APPLICATION BUG / TEST CODE ERROR classification.
+### Locator resolution priority when auto-fixing TEST CODE ERRORS — invention is forbidden
+When a failure is classified as `TEST CODE ERROR` (wrong locator) and the agent auto-fixes the test file, the corrected locator MUST come from one of the following sources, in this exact priority order. **The agent MUST NOT invent a new `data-testid` or guess a CSS selector.**
+**Priority 1 — Locator Registry:** Check `.qa-output/locators/LOCATOR_REGISTRY.md` + `.qa-output/locators/{feature}.locators.md` for the target element.
+**Priority 2 — Codebase source:** `grep -rE "data-testid=|aria-label=|id=\""` the frontend source for the page where the failure occurred.
+**Priority 3 — Live DOM via Playwright MCP:** Use `mcp__playwright__browser_snapshot()` on the failing route to extract the real locator. Persist to registry with tier classification.
+**Priority 4 — HALT:** If nothing is resolvable, do NOT auto-fix. Re-classify the failure as `INCONCLUSIVE` with reason `locator unresolvable from registry/source/MCP`. The fix remains for the developer to address.
+Every locator written during auto-fix MUST have a source attribution in the MCP evidence file: `source: registry | codebase | mcp`. A locator without attribution is invented and the auto-fix is invalid (revert it).
 <step name="produce_report">
 Write FAILURE_CLASSIFICATION_REPORT.md matching templates/failure-classification.md exactly (4 required sections).
@@ -477,3 +532,43 @@ The bug detective agent has completed successfully when:
 8. Return values provided to orchestrator: report_path, total_failures, classification_breakdown, auto_fixes_applied, auto_fixes_verified, commit_hash
 9. All quality gate checks pass (8 template items + 4 detective-specific items)
 </success_criteria>
+## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+```bash
+echo "=== BUG-DETECTIVE CHECKLIST START ==="
+echo "1. Locator Registry:"
+ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+echo "2. MY_PREFERENCES.md:"
+cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+echo "3. FAILURE_CLASSIFICATION_REPORT.md:"
+ls .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "REPORT_NOT_WRITTEN"
+echo "4. Classifications in report:"
+grep -E "APPLICATION BUG|TEST CODE ERROR|ENVIRONMENT ISSUE|INCONCLUSIVE" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_CLASSIFICATIONS_FOUND"
+echo "5. Confidence levels:"
+{ grep -E "HIGH|MEDIUM-HIGH|MEDIUM|LOW" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_CONFIDENCE_LEVELS"; } | head -10
+echo "6. Evidence and reasoning count:"
+grep -cE "^### |Evidence:|Reasoning:" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_EVIDENCE_SECTIONS"
+echo "7. Upstream reports:"
+ls .qa-output/E2E_RUN_REPORT.md 2>/dev/null || echo "NO_E2E_RUN_REPORT"
+ls .qa-output/VALIDATION_REPORT.md 2>/dev/null || echo "NO_VALIDATION_REPORT"
+echo "8. MCP reproduction evidence:"
+ls .qa-output/mcp-evidence/qaa-bug-detective-session.md 2>/dev/null || echo "NO_MCP_EVIDENCE"
+grep -cE "failures_reproduced:|snapshots_taken:|screenshots_taken:" .qa-output/mcp-evidence/qaa-bug-detective-session.md 2>/dev/null || echo "NO_MCP_REPRODUCTION_DATA"
+echo "9. MCP skip reasons (if any):"
+grep -E "MCP unavailable|no app_url|MCP reproduction skipped" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_MCP_SKIP_DOCUMENTED"
+echo "10. Locator source attribution:"
+grep -cE "source: registry|source: codebase|source: mcp" .qa-output/mcp-evidence/qaa-bug-detective-session.md 2>/dev/null || echo "NO_SOURCE_ATTRIBUTION"
+echo "11. Priority 4 halts:"
+grep -E "locator unresolvable from registry/source/MCP" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_PRIORITY4_HALTS"
+echo "=== BUG-DETECTIVE CHECKLIST END ==="
+```
+**Rules:**
+- Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+- If any output shows a problem (REPORT_NOT_WRITTEN, NO_CLASSIFICATIONS_FOUND), fix it before returning.
+- If output shows expected "not found" results (e.g., NO_MCP_EVIDENCE when no E2E failures existed), that is fine — the point is you RAN the command instead of assuming the answer.
+- Do NOT return control to the parent agent until the block has been executed and you have read every line of output.
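
The Locator Registry persistence rules in this file (save only locators confirmed by a passing test; include value, tier, context, and verification date) could be sketched as follows. The entry layout is an assumption: the diff requires those four fields but does not fix a format, so the `- ... | tier: N | ...` line and the name `record_verified_locator` are hypothetical.

```bash
# record_verified_locator STATUS VALUE TIER CONTEXT DATE   (hypothetical helper)
# Prints one registry entry line, or returns 1 without output if the test
# did not pass or the tier is outside 1-4.
record_verified_locator() {
  status=$1
  value=$2
  tier=$3
  context=$4
  verified=$5

  # Locators from failing tests must never reach the registry.
  [ "$status" = "pass" ] || return 1

  case $tier in
    1|2|3|4) ;;
    *) echo "invalid tier: $tier" >&2; return 1 ;;
  esac

  # Assumed entry layout: bullet, value, tier, context, verification date.
  printf '%s\n' "- \`$value\` | tier: $tier | context: $context | verified: $verified"
}
```

The output would be appended to the per-feature file and to `LOCATOR_REGISTRY.md` by the caller.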

package/agents/qaa-codebase-mapper.md CHANGED

@@ -3,6 +3,8 @@ name: qaa-codebase-mapper
 description: Explores codebase and writes QA-focused analysis documents. Spawned by /qa-analyze or qa-start pipeline. Produces testing-oriented architecture, conventions, and risk documents.
 tools: Read, Bash, Grep, Glob, Write
 color: cyan
+skills:
+  - qa-repo-analyzer
 ---
 <role>
@@ -933,3 +935,4 @@ Test the highest-risk gaps first:
 - [ ] No secrets or forbidden file contents leaked
 - [ ] Confirmation returned (not document contents)
 </success_criteria>

package/agents/qaa-e2e-runner.md CHANGED

@@ -1,3 +1,10 @@
+---
+name: qaa-e2e-runner
+description: Runs E2E tests against live app, fixes locator mismatches
+skills:
+  - qa-bug-detective
+---
 <purpose>
 Run generated E2E test files against a live application using the Playwright browser tools. Navigate pages, capture real locators from the accessibility snapshot, compare them against the locators in generated test files, fix mismatches, and loop until tests pass or failures are classified as application bugs. This agent bridges the gap between "tests exist on disk" and "tests actually pass against the real app."
@@ -400,6 +407,39 @@ E2E_RUNNER_COMPLETE:
 | Test runner not found | No playwright/cypress installed | Report as ENVIRONMENT ISSUE with install instructions |
 </error_handling>
+## Non-negotiable rules
+These rules are hardcoded in the agent body because they MUST NOT be skipped under any circumstance, regardless of whether the skill is loaded or not.
+### Playwright MCP usage is mandatory (NOT optional)
+This agent's core job is to run tests against a **live browser**. That requires the Playwright MCP server. The agent MUST NOT classify a test run as complete based on static analysis, log inspection, or dry-run output alone.
+1. **Every E2E test execution MUST go through Playwright MCP tools** — `mcp__playwright__browser_navigate`, `mcp__playwright__browser_snapshot`, `mcp__playwright__browser_click`, `mcp__playwright__browser_fill_form`, `mcp__playwright__browser_take_screenshot`, `mcp__playwright__browser_close`. If these tools are not available, halt and return `ENVIRONMENT_ISSUE: Playwright MCP not connected` instead of faking execution.
+2. **Minimum required MCP operations per run:** at least one `browser_navigate` (to the app URL), at least one `browser_snapshot` (for DOM inspection), at least one `browser_take_screenshot` (for visual evidence), and exactly one `browser_close` at the end of the session.
+3. **Persist evidence of MCP usage** to `.qa-output/mcp-evidence/qaa-e2e-runner-session.md`. The file MUST contain:
+   - `session_start: {ISO timestamp}` and `session_end: {ISO timestamp}`
+   - `urls_navigated:` list of every URL passed to `browser_navigate`
+   - `snapshots_taken:` count of `browser_snapshot` calls with route per snapshot
+   - `screenshots_taken:` list of screenshot file paths (also written to `.qa-output/screenshots/`)
+   - `interactions:` list of clicks/fills with the element identifier
+   - `browser_closed: true` confirming `browser_close` was called
+4. **If the evidence file is missing, empty, or lists zero `browser_navigate` calls, the run is INVALID** — do not write E2E_RUN_REPORT.md and return a hard failure instead.
+### Locator resolution priority when fixing failing tests — invention is forbidden
+When a test fails due to a locator mismatch and the fix loop needs to update the POM or test file with a corrected locator, the runner MUST follow this priority chain. **Never invent a `data-testid` or selector that does not exist in one of the sources below.**
+**Priority 1 — Locator Registry:** Check `.qa-output/locators/LOCATOR_REGISTRY.md` and `.qa-output/locators/{feature}.locators.md` for the target element. If present, use it verbatim.
+**Priority 2 — Codebase source:** If not in registry, `grep -rE "data-testid=|aria-label=|id=\"" <frontend_source_dir>` for the page under test. If found, use verbatim and persist to registry.
+**Priority 3 — Live DOM via Playwright MCP:** If not in registry AND not in source, call `mcp__playwright__browser_snapshot()` on the failing route and extract the real locator from the snapshot. Persist to registry with `tier` classification.
+**Priority 4 — HALT (never invent):** If nothing is resolvable, mark the test as `BLOCKED: locator unresolvable` in E2E_RUN_REPORT.md with the unresolved element name. Do NOT fabricate a locator to "make the test pass". Do NOT replace the failing locator with a random guess.
+Every locator written to a POM/test during fix loops MUST have a source attribution in the MCP evidence file: `source: registry | codebase | mcp`. Anything else is invention and the fix is invalid.
 <success_criteria>
 E2E runner is complete when:
@@ -414,3 +454,49 @@ E2E runner is complete when:
 - [ ] Locator registry updated with all real locators discovered during execution (`.qa-output/locators/`)
 - [ ] Browser session was closed
 </success_criteria>
+## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+```bash
+echo "=== E2E-RUNNER CHECKLIST START ==="
+echo "1. E2E Run Report:"
+ls .qa-output/E2E_RUN_REPORT.md 2>/dev/null || echo "REPORT_NOT_WRITTEN"
+echo "2. Locator Registry:"
+ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+echo "3. Screenshots:"
+ls .qa-output/screenshots/ 2>/dev/null || echo "NO_SCREENSHOTS"
+echo "4. Modified POMs/tests in working tree:"
+git status 2>/dev/null | grep -E "modified:.*(pages/|tests/)" || echo "NO_MODIFIED_FILES"
+echo "5. MY_PREFERENCES.md:"
+cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+echo "6. MCP evidence file:"
+ls .qa-output/mcp-evidence/qaa-e2e-runner-session.md 2>/dev/null || echo "NO_MCP_EVIDENCE"
+echo "7. MCP session boundaries:"
+grep -E "session_start:|session_end:|browser_closed: true" .qa-output/mcp-evidence/qaa-e2e-runner-session.md 2>/dev/null || echo "NO_MCP_SESSION"
+echo "8. URLs navigated via MCP:"
+grep -cE "^  - http|^  - /" .qa-output/mcp-evidence/qaa-e2e-runner-session.md 2>/dev/null || echo "NO_URLS_NAVIGATED"
+echo "9. Snapshot + screenshot operations:"
+grep -cE "browser_snapshot|browser_take_screenshot" .qa-output/mcp-evidence/qaa-e2e-runner-session.md 2>/dev/null || echo "NO_SNAPSHOT_OPS"
+echo "10. Locator source attribution:"
+grep -cE "source: registry|source: codebase|source: mcp" .qa-output/mcp-evidence/qaa-e2e-runner-session.md 2>/dev/null || echo "NO_SOURCE_ATTRIBUTION"
+echo "11. Unresolvable locator blocks:"
+grep -E "BLOCKED: locator unresolvable" .qa-output/E2E_RUN_REPORT.md 2>/dev/null || echo "NO_BLOCKED_LOCATORS"
+echo "12. Pass/fail counts in report:"
+{ grep -E "PASS|FAIL|Tests run|[0-9]+ passed|[0-9]+ failed" .qa-output/E2E_RUN_REPORT.md 2>/dev/null || echo "NO_PASS_FAIL_COUNTS"; } | head -5
+echo "13. Locator Registry entries:"
+grep -cE "^- |^\* " .qa-output/locators/LOCATOR_REGISTRY.md 2>/dev/null || echo "NO_REGISTRY_ENTRIES"
+echo "14. Locator tier classification:"
+{ grep -E "tier: 1|tier: 2|tier: 3|tier: 4" .qa-output/locators/*.md 2>/dev/null || echo "NO_TIER_CLASSIFICATION"; } | head -10
+echo "15. Validator report (input):"
+ls .qa-output/VALIDATION_REPORT.md 2>/dev/null || echo "NO_VALIDATION_REPORT"
+echo "=== E2E-RUNNER CHECKLIST END ==="
+```
+**Rules:**
+- Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+- If any output shows a problem (REPORT_NOT_WRITTEN, NO_MCP_EVIDENCE when browser was used), fix it before returning.
+- If output shows expected "not found" results (e.g., NO_SCREENSHOTS when tests all passed first try), that is fine — the point is you RAN the command instead of assuming the answer.
+- Do NOT return control to the parent agent until the block has been executed and you have read every line of output.
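
The "missing, empty, or zero `browser_navigate` calls means INVALID" rule for the e2e-runner evidence file can be expressed as a single predicate. This is a sketch under assumptions: `evidence_is_valid` is a hypothetical name, and the file layout (plain `key: value` lines, with navigated URLs as list items indented by two spaces) is inferred from the `grep` patterns in the checklist rather than specified by the package.

```bash
# evidence_is_valid FILE   (hypothetical helper)
# Returns 0 only if FILE exists, is non-empty, records both session
# boundaries, confirms the browser was closed, and lists at least one URL.
evidence_is_valid() {
  file=$1

  # -s is false for a missing file and for an empty one.
  [ -s "$file" ] || return 1

  grep -q 'session_start:' "$file" || return 1
  grep -q 'session_end:' "$file" || return 1
  grep -q 'browser_closed: true' "$file" || return 1

  # Assumed list-item shape: "  - http://..." or "  - /route".
  grep -qE '^  - (http|/)' "$file" || return 1

  return 0
}
```

A runner could call this before writing `E2E_RUN_REPORT.md` and return a hard failure when it reports non-zero.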

package/agents/qaa-executor.md CHANGED

@@ -1,3 +1,11 @@
+---
+name: qaa-executor
+description: Generates test files, POMs, fixtures and configs
+skills:
+  - qa-template-engine
+  - qa-self-validator
+---
 <purpose>
 Read the generation plan (produced by qaa-planner), TEST_INVENTORY.md, and CLAUDE.md to produce actual test files, page object models, fixtures, and configuration files. This is the most complex agent in the pipeline -- it handles framework detection, BasePage scaffolding, POM generation following strict rules, test spec writing with concrete assertions, and per-file atomic commits for maximum traceability. The executor does not decide WHAT to test (that is the planner's job) -- it decides HOW to write each test file following CLAUDE.md standards and qa-template-engine patterns.
@@ -593,6 +601,48 @@ EXECUTOR_COMPLETE:
 ```
 </output>
+## Non-negotiable rules
+These rules are hardcoded in the agent body because they MUST NOT be skipped under any circumstance, regardless of whether the skill is loaded or not.
+### Locator resolution priority — locator invention is forbidden
+**Before writing any locator (Tier 1 `data-testid`, Tier 2 role/label, Tier 3 CSS) in a POM or E2E test, the executor MUST follow this exact priority chain. Proposing a value that exists in none of the sources below is a critical failure.**
+**Priority 1 — Locator Registry (first check):**
+- Run `ls .qa-output/locators/LOCATOR_REGISTRY.md` and `ls .qa-output/locators/{feature}.locators.md`.
+- `grep` the target element (by page + semantic description) in those files.
+- If a locator exists → USE IT VERBATIM. Do not modify, do not propose an alternative.
+**Priority 2 — Codebase source (second check, only if not in registry):**
+- `grep -rE "data-testid=|aria-label=|id=\"" <frontend_source_dir>` for the target page/component file.
+- If `data-testid`, stable `id`, or semantic `aria-label` is found in source → USE IT VERBATIM. Persist to registry so future runs hit Priority 1.
+**Priority 3 — Playwright MCP live DOM (third check, only if not in registry AND not in source):**
+- Call `mcp__playwright__browser_navigate({ url: "{app_url}/{route}" })` then `mcp__playwright__browser_snapshot()` to read the rendered DOM.
+- Extract the real locator from the snapshot (Tier 1 > Tier 2 > Tier 3 priority per CLAUDE.md).
+- Persist discovered locator to `.qa-output/locators/{feature}.locators.md` and update `LOCATOR_REGISTRY.md` so the next run hits Priority 1.
+**Priority 4 — HALT (never invent):**
+- If registry has no entry, source has no stable attribute, AND (MCP is unavailable OR `app_url` is missing), the agent MUST HALT for that element.
+- Return `BLOCKED: locator unresolvable for {page}:{element} — registry empty, source has no testid/aria, MCP unavailable. Options: (a) run qa-testid to inject, (b) provide app_url, (c) connect Playwright MCP.`
+- Do NOT invent a `data-testid` value. Do NOT propose a CSS selector based on a guess. Do NOT write the POM/test file with placeholder locators.
+### Playwright MCP evidence file (mandatory when MCP is used)
+When Priority 3 is invoked (MCP lookup), persist evidence to `.qa-output/mcp-evidence/qaa-executor-session.md` with:
+- `session_start: {ISO timestamp}` and `session_end: {ISO timestamp}`
+- `pages_validated:` list of `{page_name, url, locators_discovered_count, source: registry|codebase|mcp}`
+- `snapshots_taken:` count + route
+- `locators_discovered_via_mcp:` list of locators found via MCP (these MUST also appear in `.qa-output/locators/`)
+- `priority1_hits:` count (reused from registry)
+- `priority2_hits:` count (extracted from source)
+- `priority3_hits:` count (discovered via MCP)
+- `priority4_halts:` list of unresolvable elements (if any)
+- `browser_closed: true`
+**If E2E/POM files were generated AND the evidence file shows `priority3_hits > 0` but the registry was not updated, the generation is INVALID** — delete files and re-run. Every MCP-discovered locator MUST be persisted.
 <quality_gate>
 Before considering the executor's work complete, verify ALL of the following.
@@ -649,3 +699,51 @@ The executor agent has completed successfully when:
 8. All quality gate checks pass
 9. Return values provided to orchestrator: files_created, total_files, commit_count, features_covered, test_case_count
 </success_criteria>
+## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+```bash
+echo "=== EXECUTOR CHECKLIST START ==="
+echo "1. Generated test files, POMs, fixtures:"
+ls tests/ pages/ fixtures/ 2>/dev/null || echo "NO_TEST_FILES_FOUND"
+echo "2. BasePage inheritance:"
+grep -rE "class BasePage|extends BasePage" pages/ 2>/dev/null || echo "NO_BASEPAGE_FOUND"
+echo "3. Test framework config:"
+ls *.config.* 2>/dev/null || echo "NO_CONFIG_FOUND"
+echo "4. MY_PREFERENCES.md:"
+cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+echo "5. Locator Registry:"
+ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+echo "6. Generation plan and test inventory inputs:"
+ls .qa-output/GENERATION_PLAN.md .qa-output/TEST_INVENTORY.md 2>/dev/null || echo "INPUTS_NOT_FOUND"
+echo "7. Test case count from inventory:"
+grep -cE "^\| (UT|INT|API|E2E)-" .qa-output/TEST_INVENTORY.md 2>/dev/null || echo "NO_TEST_CASES_COUNTED"
+echo "8. Generation plan tasks consumed:"
+grep -m 20 -E "task_id|files_to_create" .qa-output/GENERATION_PLAN.md 2>/dev/null || echo "NO_PLAN_TASKS"
+echo "9. Codebase map documents:"
+ls .qa-output/codebase/ 2>/dev/null || echo "NO_CODEBASE_MAP"
+echo "10. CODE_PATTERNS.md patterns:"
+grep -m 5 -E "pattern|convention|style" .qa-output/codebase/CODE_PATTERNS.md 2>/dev/null || echo "NO_CODE_PATTERNS"
+echo "11. Tier 1 locator usage in generated code:"
+grep -cE "data-testid|getByTestId|getByRole|findByRole" tests/ pages/ -r 2>/dev/null || echo "NO_TIER1_LOCATORS"
+echo "12. MCP evidence file:"
+ls .qa-output/mcp-evidence/qaa-executor-session.md 2>/dev/null || echo "NO_MCP_EVIDENCE"
+echo "13. Locator priority chain hits:"
+grep -E "priority1_hits:|priority2_hits:|priority3_hits:|priority4_halts:" .qa-output/mcp-evidence/qaa-executor-session.md 2>/dev/null || echo "NO_PRIORITY_HITS"
+echo "14. Locator source attribution:"
+grep -cE "source: registry|source: codebase|source: mcp" .qa-output/mcp-evidence/qaa-executor-session.md 2>/dev/null || echo "NO_SOURCE_ATTRIBUTION"
+echo "15. MCP session boundaries:"
+grep -E "session_start:|browser_closed: true" .qa-output/mcp-evidence/qaa-executor-session.md 2>/dev/null || echo "NO_MCP_SESSION"
+echo "16. Priority 4 halts (unresolvable locators):"
+grep -E "BLOCKED: locator unresolvable" .qa-output/mcp-evidence/qaa-executor-session.md 2>/dev/null || echo "NO_PRIORITY4_HALTS"
+echo "=== EXECUTOR CHECKLIST END ==="
+```
+**Rules:**
+- Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+- If any output shows a problem (NO_TEST_FILES_FOUND after generation, INPUTS_NOT_FOUND), fix it before returning.
+- If output shows expected "not found" results (e.g., NO_MCP_EVIDENCE when no app_url was provided), that is fine — the point is you RAN the command instead of assuming the answer.
+- Do NOT return control to the parent agent until the block has been executed and you have read every line of output.
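The file-based steps of the locator priority chain added to qaa-executor.md can be illustrated in shell. This is a minimal sketch under assumptions, not the agent's actual mechanism: the function name `resolve_locator`, its arguments, and the `src` default directory are hypothetical, element names are assumed to contain no regex metacharacters, and Priority 3 (Playwright MCP) is left out because it is a tool call rather than a shell command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve_locator <element> [registry_dir] [source_dir]
# Mirrors Priority 1 (registry), Priority 2 (source) and Priority 4 (halt).
resolve_locator() {
  local element="$1"
  local registry_dir="${2:-.qa-output/locators}"
  local src_dir="${3:-src}"   # assumed frontend source directory
  local hit

  # Priority 1: reuse the first registry line that mentions the element, verbatim.
  hit=$(grep -rhF -m 1 -- "$element" "$registry_dir" 2>/dev/null | head -n 1) || true
  if [ -n "$hit" ]; then
    printf 'registry: %s\n' "$hit"
    return 0
  fi

  # Priority 2: look for a data-testid or aria-label attribute in the source.
  hit=$(grep -rhoE "(data-testid|aria-label)=\"[^\"]*${element}[^\"]*\"" \
        "$src_dir" 2>/dev/null | head -n 1) || true
  if [ -n "$hit" ]; then
    printf 'source: %s\n' "$hit"
    return 0
  fi

  # Priority 4: never invent a value; report the element as blocked.
  printf 'BLOCKED: locator unresolvable for %s\n' "$element"
  return 1
}
```

A caller could treat a non-zero return as the HALT condition for that element and skip writing the POM or test file that depends on it.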

package/agents/qaa-planner.md CHANGED

@@ -1,3 +1,10 @@
+---
+name: qaa-planner
+description: Produces structured generation plan from test inventory
+skills:
+  - qa-template-engine
+---
 <purpose>
 Read TEST_INVENTORY.md and QA_ANALYSIS.md to produce a structured generation plan that maps every test case to an output file, grouped by feature domain with explicit task dependencies. This agent is the bridge between "what tests are needed" (from the analyzer) and "tests exist on disk" (from the executor). It is spawned by the orchestrator after the analyzer completes successfully via Task(subagent_type='qaa-planner'). The planner does NOT produce test files -- it produces a plan that the executor consumes. The generation plan is an internal artifact with no template; the planner defines its own output format documented in the <output> section below.
 </purpose>
@@ -403,3 +410,37 @@ The planner agent has completed successfully when:
 8. Return values provided to orchestrator: file_path, total_tasks, total_files, feature_count, dependency_depth, test_case_count, commit_hash
 9. All quality gate checks pass
 </success_criteria>
+## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+```bash
+echo "=== PLANNER CHECKLIST START ==="
+echo "1. Primary inputs:"
+ls .qa-output/TEST_INVENTORY.md .qa-output/QA_ANALYSIS.md 2>/dev/null || echo "INPUTS_NOT_FOUND"
+echo "2. Test case count from inventory:"
+grep -cE "^\| (UT|INT|API|E2E)-" .qa-output/TEST_INVENTORY.md 2>/dev/null || echo "NO_TEST_CASES_COUNTED"
+echo "3. Architecture overview from QA_ANALYSIS:"
+grep -m 15 -E "^### |system_type|framework|language" .qa-output/QA_ANALYSIS.md 2>/dev/null || echo "NO_ARCHITECTURE_OVERVIEW"
+echo "4. Pyramid percentages:"
+grep -E "Unit [0-9]+%|Integration [0-9]+%|API [0-9]+%|E2E [0-9]+%" .qa-output/QA_ANALYSIS.md 2>/dev/null || echo "NO_PYRAMID_PERCENTAGES"
+echo "5. Codebase map documents:"
+ls .qa-output/codebase/ 2>/dev/null || echo "NO_CODEBASE_MAP"
+echo "6. TESTABILITY.md mock complexity:"
+grep -m 10 -E "pure function|stateful" .qa-output/codebase/TESTABILITY.md 2>/dev/null || echo "NO_TESTABILITY"
+echo "7. Locator Registry:"
+ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+echo "8. Generation plan output:"
+ls .qa-output/GENERATION_PLAN.md 2>/dev/null || echo "PLAN_NOT_WRITTEN"
+echo "9. MY_PREFERENCES.md:"
+cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+echo "=== PLANNER CHECKLIST END ==="
+```
+**Rules:**
+- Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+- If any output shows a problem (INPUTS_NOT_FOUND, PLAN_NOT_WRITTEN), fix it before returning.
+- If output shows expected "not found" results (e.g., NO_CODEBASE_MAP when mapper hasn't run), that is fine — the point is you RAN the command instead of assuming the answer.
+- Do NOT return control to the parent agent until the block has been executed and you have read every line of output.
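The `|| echo` fallbacks in these checklists fire according to the exit status of whatever sits directly before `||`. The sketch below, which assumes a throwaway temp file rather than a real `.qa-output/` artifact, compares a pipeline ending in `head` with a single `grep -m` command when nothing matches.

```bash
#!/usr/bin/env bash
# Hypothetical demo file; its path and contents are illustrative only.
demo=$(mktemp)
printf 'alpha\nbeta\n' > "$demo"

# Pipeline form: the status seen by `||` is head's (0), so the fallback
# is skipped even though grep matched nothing.
piped=$(grep -E "task_id" "$demo" 2>/dev/null | head -20 || echo "NO_PLAN_TASKS")

# Single-command form: grep's own status (1 on no match) reaches `||`,
# and -m 20 caps the output the way head -20 would.
direct=$(grep -m 20 -E "task_id" "$demo" 2>/dev/null || echo "NO_PLAN_TASKS")

printf 'piped=[%s]\ndirect=[%s]\n' "$piped" "$direct"
rm -f "$demo"
```

With bash's default settings (no `pipefail`), `piped` comes back empty while `direct` holds the sentinel, so only the single-command form reports the missing data.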

package/agents/qaa-testid-injector.md CHANGED

@@ -1,3 +1,10 @@
+---
+name: qaa-testid-injector
+description: Scans and injects data-testid attributes in frontend components
+skills:
+  - qa-testid-injector
+---
 <purpose>
 Scan frontend component files in a developer repository, audit every interactive UI element for `data-testid` coverage, and inject missing `data-testid` attributes following the `{context}-{description}-{element-type}` naming convention. Reads SCAN_MANIFEST.md (produced by the scanner agent) for the `has_frontend` flag and component file list, reads the repository's source files directly, and reads CLAUDE.md for the data-testid Convention section. Produces TESTID_AUDIT_REPORT.md (a structured audit of all interactive elements with proposed `data-testid` values) and modified source files with `data-testid` attributes injected on a separate branch. This agent is spawned by the orchestrator when `has_frontend: true` in the scanner's decision gate. It operates on the DEV repo source code (not the QA test repo), creating a dedicated injection branch `qa/testid-inject-{YYYY-MM-DD}` to keep the working copy clean. The user merges the injection branch if approved.
 </purpose>
@@ -601,6 +608,25 @@ INJECTOR_SKIPPED:
 ```
 </output>
+## Non-negotiable rules
+These rules are hardcoded in the agent body because they MUST NOT be skipped under any circumstance, regardless of whether the skill is loaded or not.
+### Playwright MCP usage is mandatory when app_url is provided
+When an `app_url` is available in the orchestrator prompt (or provided via `--app-url` flag), live DOM verification via Playwright MCP is **required, not optional**. Source-only scans miss dynamically rendered elements, conditionally shown components, and third-party injections.
+1. **If `app_url` is available, the agent MUST call Playwright MCP tools** — at minimum `mcp__playwright__browser_navigate` (once per unique route from SCAN_MANIFEST.md) and `mcp__playwright__browser_snapshot` (once per navigated route). If MCP tools are unavailable, halt with `ENVIRONMENT_ISSUE: Playwright MCP not connected` instead of falling back to source-only.
+2. **Skipping MCP verification is only permitted when `app_url` is not provided**, and this skip MUST be explicitly recorded in TESTID_AUDIT_REPORT.md under a "Live DOM Verification" section with reason "no app_url provided".
+3. **Persist evidence of MCP usage** to `.qa-output/mcp-evidence/qaa-testid-injector-session.md` with:
+   - `session_start: {ISO timestamp}` and `session_end: {ISO timestamp}`
+   - `app_url:` base URL provided
+   - `routes_navigated:` list of every route passed to `browser_navigate`
+   - `snapshots_taken:` count + route per snapshot
+   - `dynamic_elements_found:` count of elements present in DOM but absent from source scan (these trigger extra injections)
+   - `browser_closed: true`
+4. **If app_url is provided but the evidence file is missing or lists zero navigations, the audit is INVALID** — TESTID_AUDIT_REPORT.md must not be produced and the agent must return a hard failure.
 <quality_gate>
 Before considering this agent's work complete, verify ALL of the following.
@@ -641,3 +667,45 @@ The testid-injector agent has completed successfully when:
 9. Structured return values provided to orchestrator: report_path, changelog_path, branch name, coverage scores (before/after), element counts, validation status, commit hash
 10. All quality gate checks pass (8 template items + 6 injector-specific items)
 </success_criteria>
+## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+```bash
+echo "=== TESTID-INJECTOR CHECKLIST START ==="
+echo "1. SCAN_MANIFEST.md (input):"
+ls .qa-output/SCAN_MANIFEST.md 2>/dev/null || echo "SCAN_MANIFEST_NOT_FOUND"
+echo "2. Frontend detection in manifest:"
+grep -m 10 -E "has_frontend|component_patterns|frontend" .qa-output/SCAN_MANIFEST.md 2>/dev/null || echo "NO_FRONTEND_DETECTION"
+echo "3. Component file count:"
+grep -cE "\.(tsx|jsx|vue|svelte|html)$" .qa-output/SCAN_MANIFEST.md 2>/dev/null || echo "NO_COMPONENT_FILES"
+echo "4. Codebase map documents:"
+ls .qa-output/codebase/ 2>/dev/null || echo "NO_CODEBASE_MAP"
+echo "5. CODE_PATTERNS.md interactive elements:"
+grep -m 10 -E "interactive|button|input|form" .qa-output/codebase/CODE_PATTERNS.md 2>/dev/null || echo "NO_CODE_PATTERNS"
+echo "6. Locator Registry:"
+ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+echo "7. Output artifacts:"
+ls .qa-output/TESTID_AUDIT_REPORT.md .qa-output/INJECTION_CHANGELOG.md 2>/dev/null || echo "OUTPUTS_NOT_WRITTEN"
+echo "8. Coverage score in report:"
+grep -m 5 -E "Coverage Score|[0-9]+/[0-9]+" .qa-output/TESTID_AUDIT_REPORT.md 2>/dev/null || echo "NO_COVERAGE_SCORE"
+echo "9. MY_PREFERENCES.md:"
+cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+echo "10. MCP evidence file:"
+ls .qa-output/mcp-evidence/qaa-testid-injector-session.md 2>/dev/null || echo "NO_MCP_EVIDENCE"
+echo "11. MCP session boundaries:"
+grep -E "session_start:|routes_navigated:|browser_closed: true" .qa-output/mcp-evidence/qaa-testid-injector-session.md 2>/dev/null || echo "NO_MCP_SESSION"
+echo "12. Routes navigated via MCP:"
+grep -cE "^  - http|^  - /" .qa-output/mcp-evidence/qaa-testid-injector-session.md 2>/dev/null || echo "NO_ROUTES_NAVIGATED"
+echo "13. MCP skip documentation:"
+grep -E "Live DOM Verification|no app_url" .qa-output/TESTID_AUDIT_REPORT.md 2>/dev/null || echo "NO_MCP_SKIP_DOCUMENTED"
+echo "=== TESTID-INJECTOR CHECKLIST END ==="
+```
+**Rules:**
+- Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+- If any output shows a problem (SCAN_MANIFEST_NOT_FOUND, OUTPUTS_NOT_WRITTEN), fix it before returning.
+- If output shows expected "not found" results (e.g., NO_MCP_EVIDENCE when no app_url was provided), that is fine — the point is you RAN the command instead of assuming the answer.
+- Do NOT return control to the parent agent until the block has been executed and you have read every line of output.
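Rule 4 of the testid-injector's non-negotiable rules (the audit is invalid when `app_url` is provided but the evidence file is missing or lists zero navigations) can be pictured as a small gate. This is a sketch only: the function name `check_mcp_evidence` and its output strings are hypothetical, and it assumes route entries are written as two-space-indented list items, the same shape checklist item 12 greps for.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check_mcp_evidence <app_url> [evidence_file]
check_mcp_evidence() {
  local app_url="$1"
  local evidence="${2:-.qa-output/mcp-evidence/qaa-testid-injector-session.md}"
  local routes

  # No app_url: skipping MCP is permitted (the skip is recorded in the report).
  if [ -z "$app_url" ]; then
    echo "SKIPPED: no app_url provided"
    return 0
  fi

  # app_url given but no evidence file: the audit cannot be trusted.
  if [ ! -f "$evidence" ]; then
    echo "INVALID: evidence file missing"
    return 1
  fi

  # Count list entries that look like routes; grep -c prints 0 and exits 1
  # when nothing matches, hence the `|| true`.
  routes=$(grep -cE '^  - (http|/)' "$evidence") || true
  if [ "${routes:-0}" -eq 0 ]; then
    echo "INVALID: zero routes navigated"
    return 1
  fi

  echo "VALID: $routes routes navigated"
}
```

Because other lists in the evidence file may use the same indentation, the count here is an upper bound on navigations rather than an exact figure.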

package/agents/qaa-validator.md CHANGED

@@ -1,3 +1,10 @@
+---
+name: qaa-validator
+description: Validates generated test code across 4 layers with fix loops
+skills:
+  - qa-self-validator
+---
 <purpose>
 Validate generated test code across 4 layers (Syntax, Structure, Dependencies, Logic) and auto-fix issues with a closed-loop fix protocol. Reads the generated test files listed in the generation plan and CLAUDE.md quality standards. Produces VALIDATION_REPORT.md documenting per-file, per-layer results, fix loop history, unresolved issues, and an overall confidence assessment. Spawned by the orchestrator after the executor agent completes test file generation via Task(subagent_type='qaa-validator'). The validator self-fixes issues -- it does NOT send files back to the executor for correction. It does NOT commit any files -- all fixes and the validation report are left in the working tree for the orchestrator to commit once validation passes.
 </purpose>
@@ -488,3 +495,43 @@ The validator agent has completed successfully when:
 6. Return values provided to orchestrator: report_path, overall_status, confidence, layers_summary, fix_loops_used, issues_found, issues_fixed, unresolved_count
 7. All quality gate checks pass (7 template items + 6 validator-specific items)
 </success_criteria>
+## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+```bash
+echo "=== VALIDATOR CHECKLIST START ==="
+echo "1. Validation report:"
+ls .qa-output/VALIDATION_REPORT.md 2>/dev/null || echo "REPORT_NOT_WRITTEN"
+echo "2. Required sections in report:"
+grep -E "^## " .qa-output/VALIDATION_REPORT.md 2>/dev/null || echo "NO_SECTIONS_FOUND"
+echo "3. Confidence level:"
+grep -m 5 -E "HIGH|MEDIUM|LOW" .qa-output/VALIDATION_REPORT.md 2>/dev/null || echo "NO_CONFIDENCE_LEVEL"
+echo "4. Last commit (validator must NOT commit):"
+git log --oneline -1 2>/dev/null || echo "NO_GIT_HISTORY"
+echo "5. Modified files in working tree:"
+git status 2>/dev/null | grep "modified:" || echo "NO_MODIFIED_FILES"
+echo "6. MY_PREFERENCES.md:"
+cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+echo "7. Fix loop iterations:"
+grep -c "Loop" .qa-output/VALIDATION_REPORT.md 2>/dev/null || echo "NO_FIX_LOOPS"
+echo "8. Generation plan (input):"
+ls .qa-output/GENERATION_PLAN.md 2>/dev/null || echo "NO_GENERATION_PLAN"
+echo "9. Plan tasks parsed:"
+grep -cE "files_to_create|task_id" .qa-output/GENERATION_PLAN.md 2>/dev/null || echo "NO_PLAN_TASKS"
+echo "10. Locator Registry:"
+ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+echo "11. Four validation layers per file:"
+grep -m 20 -E "Syntax|Structure|Dependencies|Logic" .qa-output/VALIDATION_REPORT.md 2>/dev/null || echo "NO_VALIDATION_LAYERS"
+echo "12. TEST_INVENTORY (input):"
+ls .qa-output/TEST_INVENTORY.md 2>/dev/null || echo "NO_TEST_INVENTORY"
+echo "=== VALIDATOR CHECKLIST END ==="
+```
+**Rules:**
+- Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+- If any output shows a problem (REPORT_NOT_WRITTEN, NO_VALIDATION_LAYERS), fix it before returning.
+- If output shows expected "not found" results (e.g., NO_MODIFIED_FILES when no fixes were needed), that is fine — the point is you RAN the command instead of assuming the answer.
+- Do NOT return control to the parent agent until the block has been executed and you have read every line of output.
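Validator checklist items 7 and 9 combine `grep -c` with an `|| echo` fallback, and the two possible "nothing found" cases print differently. A minimal sketch, using a throwaway temp file as a stand-in for VALIDATION_REPORT.md (the file contents are invented for the demo):

```bash
#!/usr/bin/env bash
# Stand-in for a validation report that contains no fix-loop entries.
report=$(mktemp)
printf '## Summary\nAll layers passed.\n' > "$report"

# File exists, no match: grep -c prints 0 and exits 1, so the count
# and the sentinel both appear in the output.
no_match=$(grep -c "Loop" "$report" 2>/dev/null || echo "NO_FIX_LOOPS")

# File missing: grep prints nothing to stdout (stderr is discarded),
# exits 2, and only the sentinel appears.
missing=$(grep -c "Loop" "$report.absent" 2>/dev/null || echo "NO_FIX_LOOPS")

printf 'no_match=[%s]\nmissing=[%s]\n' "$no_match" "$missing"
rm -f "$report"
```

A reader of the checklist output can therefore tell the cases apart: `0` followed by the sentinel means the report exists but records no loops, while the sentinel alone means the report was never written.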

package/commands/qa-audit.md CHANGED

@@ -42,6 +42,7 @@ else:
 Scores across 6 dimensions: Locator Quality (20%), Assertion Specificity (20%), POM Compliance (15%), Test Coverage (20%), Naming Convention (15%), Test Data Management (10%).
 1. Read `CLAUDE.md` — quality gates, locator tiers, assertion rules, POM rules, naming conventions.
+1b. Read `~/.claude/qaa/MY_PREFERENCES.md` if it exists — user/company preferences override CLAUDE.md defaults.
 2. Invoke validator agent in audit mode:
 Task(
@@ -50,6 +51,8 @@ Task(
     <execution_context>@agents/qaa-validator.md</execution_context>
     <files_to_read>
     - CLAUDE.md
+    - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
+    - .qa-output/locators/LOCATOR_REGISTRY.md (if exists)
     </files_to_read>
     <parameters>
     user_input: $ARGUMENTS
@@ -68,6 +71,7 @@ Task(
 Analyze test distribution against the ideal testing pyramid from CLAUDE.md (Unit 60-70%, Integration 10-15%, API 20-25%, E2E 3-5%). Compares actual percentages to targets and produces an action plan.
 1. Read `CLAUDE.md` — testing pyramid target percentages.
+1b. Read `~/.claude/qaa/MY_PREFERENCES.md` if it exists — user/company preferences override CLAUDE.md defaults.
 2. Invoke analyzer agent for pyramid analysis:
 Task(
@@ -76,6 +80,7 @@ Task(
     <execution_context>@agents/qaa-analyzer.md</execution_context>
     <files_to_read>
     - CLAUDE.md
+    - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
     </files_to_read>
     <parameters>
     user_input: $ARGUMENTS
@@ -98,6 +103,7 @@ Generate a summary report of current QA status. Adapts detail level to audience.
 - `client` — coverage summary, confidence level, test pass rates, risk mitigation status
 1. Read `CLAUDE.md` — testing pyramid targets, quality gates.
+1b. Read `~/.claude/qaa/MY_PREFERENCES.md` if it exists — user/company preferences override CLAUDE.md defaults.
 2. Invoke analyzer agent for status reporting:
 Task(
@@ -106,6 +112,7 @@ Task(
     <execution_context>@agents/qaa-analyzer.md</execution_context>
     <files_to_read>
     - CLAUDE.md
+    - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
     </files_to_read>
     <parameters>
     user_input: $ARGUMENTS

package/commands/qa-create-test.md CHANGED

@@ -81,6 +81,7 @@ App URL: {url or "auto-detect"}
 ### FROM CODE MODE
 1. Read `CLAUDE.md` — POM rules, locator tiers, assertion rules, naming conventions, quality gates.
+1b. Read `~/.claude/qaa/MY_PREFERENCES.md` if it exists — user's personal QA preferences override CLAUDE.md defaults when there is a conflict.
 2. Read existing analysis artifacts if available:
    - `.qa-output/QA_ANALYSIS.md` — architecture context
    - `.qa-output/TEST_INVENTORY.md` — pre-defined test cases for this feature
@@ -127,6 +128,7 @@ Task(
     <execution_context>@agents/qaa-executor.md</execution_context>
     <files_to_read>
     - CLAUDE.md
+    - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
     - .qa-output/locators/LOCATOR_REGISTRY.md (if exists)
     - .qa-output/locators/{feature}.locators.md (if exists)
     - .qa-output/codebase/CODE_PATTERNS.md (if exists)
@@ -151,6 +153,8 @@ Task(
     <execution_context>@agents/qaa-e2e-runner.md</execution_context>
     <files_to_read>
     - CLAUDE.md
+    - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
+    - .qa-output/locators/LOCATOR_REGISTRY.md (if exists)
     - {generated E2E test files from executor return}
     - {generated POM files from executor return}
     </files_to_read>
@@ -210,6 +214,7 @@ Task(
     <execution_context>@agents/qaa-validator.md</execution_context>
     <files_to_read>
     - CLAUDE.md
+    - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
     </files_to_read>
     <parameters>
     user_input: $ARGUMENTS
@@ -228,6 +233,7 @@ Task(
     <execution_context>@agents/qaa-executor.md</execution_context>
     <files_to_read>
     - CLAUDE.md
+    - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
     - .qa-output/QA_AUDIT_REPORT.md
     - .qa-output/locators/LOCATOR_REGISTRY.md (if exists)
     </files_to_read>
@@ -262,6 +268,7 @@ Task(
     <execution_context>@agents/qaa-executor.md</execution_context>
     <files_to_read>
     - CLAUDE.md
+    - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
     - .qa-output/locators/LOCATOR_REGISTRY.md (if exists)
     </files_to_read>
     <parameters>
@@ -286,3 +293,26 @@ Task(
 - Every POM extends BasePage
 $ARGUMENTS
+## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+```bash
+echo "=== CHECKLIST START ==="
+echo "1. Locator Registry:"
+ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+echo "2. MY_PREFERENCES.md:"
+cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+echo "3. Generated test files:"
+find tests/ pages/ fixtures/ -type f 2>/dev/null | head -20 | grep . || echo "NO_TEST_FILES_FOUND"
+echo "4. MCP evidence (if browser was used):"
+ls .qa-output/mcp-evidence/ 2>/dev/null || echo "NO_MCP_EVIDENCE"
+echo "=== CHECKLIST END ==="
+```
+**Rules:**
+- Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+- If any output shows a problem (NO_LOCATORS_FOUND when MCP was used, NO_TEST_FILES_FOUND after generation), fix it before returning.
+- If output shows expected "not found" results (e.g., NO_MCP_EVIDENCE when no browser was used), that is fine — the point is you RAN the command instead of assuming the answer.
+- Do NOT mark this task as complete until the block has been executed and you have read every line of output.

package/commands/qa-fix.md CHANGED

@@ -335,6 +335,8 @@ Task(
     <execution_context>@agents/qaa-e2e-runner.md</execution_context>
     <files_to_read>
     - CLAUDE.md
+    - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
+    - .qa-output/locators/LOCATOR_REGISTRY.md (if exists)
     - {E2E test files from validated directory}
     - {POM files from validated directory}
     </files_to_read>
@@ -364,6 +366,8 @@ Task(
     <execution_context>@agents/qaa-bug-detective.md</execution_context>
     <files_to_read>
     - CLAUDE.md
+    - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
+    - .qa-output/locators/LOCATOR_REGISTRY.md (if exists)
     </files_to_read>
     <parameters>
     user_input: $ARGUMENTS

package/commands/qa-map.md CHANGED

@@ -99,6 +99,7 @@ Task(
     <execution_context>@agents/qaa-scanner.md</execution_context>
     <files_to_read>
     - CLAUDE.md
+    - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
     </files_to_read>
     <parameters>
     user_input: $ARGUMENTS
@@ -114,6 +115,7 @@ Task(
     <execution_context>@agents/qaa-analyzer.md</execution_context>
     <files_to_read>
     - CLAUDE.md
+    - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
     - .qa-output/SCAN_MANIFEST.md
     - .qa-output/codebase/TESTABILITY.md (if exists)
     - .qa-output/codebase/RISK_MAP.md (if exists)

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "qaa-agent",
-  "version": "1.7.4",
+  "version": "1.8.0",
   "description": "QA Automation Agent for Claude Code — multi-agent pipeline that analyzes repos, generates tests, validates, and creates PRs",
   "bin": {
     "qaa-agent": "./bin/install.cjs"

package/bin/install.cjs DELETED

@@ -1,212 +0,0 @@
-#!/usr/bin/env node
-/**
- * QAA - QA Automation Agent Installer
- * Run with: npx qaa-agent
- */
-const fs = require('fs');
-const path = require('path');
-const readline = require('readline');
-const VERSION = require('../package.json').version;
-const ROOT = path.resolve(__dirname, '..');
-const HOME = process.env.HOME || process.env.USERPROFILE;
-// Runtime configs
-const RUNTIMES = {
-  '1': { name: 'Claude Code', dir: path.join(HOME, '.claude') },
-  '2': { name: 'OpenCode', dir: path.join(HOME, '.config', 'opencode') },
-};
-function ask(question, defaultVal) {
-  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
-  return new Promise(resolve => {
-    rl.question(question, answer => {
-      rl.close();
-      resolve(answer.trim() || defaultVal);
-    });
-  });
-}
-function copyDir(src, dest) {
-  if (!fs.existsSync(src)) return 0;
-  fs.mkdirSync(dest, { recursive: true });
-  let count = 0;
-  for (const entry of fs.readdirSync(src, { withFileTypes: true })) {
-    const srcPath = path.join(src, entry.name);
-    const destPath = path.join(dest, entry.name);
-    if (entry.isDirectory()) {
-      count += copyDir(srcPath, destPath);
-    } else {
-      fs.copyFileSync(srcPath, destPath);
-      count++;
-    }
-  }
-  return count;
-}
-function copyFile(src, dest) {
-  if (!fs.existsSync(src)) return false;
-  fs.mkdirSync(path.dirname(dest), { recursive: true });
-  fs.copyFileSync(src, dest);
-  return true;
-}
-function countEntries(dir, type) {
-  if (!fs.existsSync(dir)) return 0;
-  const entries = fs.readdirSync(dir, { withFileTypes: true });
-  if (type === 'dirs') return entries.filter(e => e.isDirectory()).length;
-  return entries.filter(e => e.isFile()).length;
-}
-function ok(msg) { console.log(`  \x1b[32m✓\x1b[0m ${msg}`); }
-function info(msg) { console.log(`  ${msg}`); }
-async function main() {
-  console.log('');
-  console.log('  \x1b[36m ██████╗  █████╗  █████╗ \x1b[0m');
-  console.log('  \x1b[36m██╔═══██╗██╔══██╗██╔══██╗\x1b[0m');
-  console.log('  \x1b[36m██║   ██║███████║███████║\x1b[0m');
-  console.log('  \x1b[36m██║▄▄ ██║██╔══██║██╔══██║\x1b[0m');
-  console.log('  \x1b[36m╚██████╔╝██║  ██║██║  ██║\x1b[0m');
-  console.log('  \x1b[36m ╚══▀▀═╝ ╚═╝  ╚═╝╚═╝  ╚═╝\x1b[0m');
-  console.log('');
-  console.log(`  \x1b[1mQA Automation Agent\x1b[0m v${VERSION}`);
-  console.log('  Multi-agent QA pipeline for Claude Code.');
-  console.log('  Analyzes repos, generates tests, validates, and creates PRs.');
-  console.log('');
-  // Ask runtime
-  console.log('  Which runtime would you like to install for?');
-  console.log('');
-  console.log('  1) Claude Code  (~/.claude)');
-  console.log('  2) OpenCode     (~/.config/opencode)');
-  console.log('');
-  const runtimeChoice = await ask('  Choice [1]: ', '1');
-  const runtime = RUNTIMES[runtimeChoice] || RUNTIMES['1'];
-  // Ask scope
-  console.log('');
-  console.log('  Where would you like to install?');
-  console.log('');
-  console.log(`  1) Global (~/${path.relative(HOME, runtime.dir)}) - available in all projects`);
-  console.log('  2) Local  (./.claude) - this project only');
-  console.log('');
-  const scopeChoice = await ask('  Choice [1]: ', '1');
-  const isGlobal = scopeChoice !== '2';
-  const baseDir = isGlobal ? runtime.dir : path.join(process.cwd(), '.claude');
-  const qaaDir = isGlobal ? path.join(runtime.dir, 'qaa') : path.join(process.cwd(), '.claude', 'qaa');
-  console.log('');
-  console.log(`  Installing for ${runtime.name} to ${isGlobal ? '~/' + path.relative(HOME, runtime.dir) : './.claude'}`);
-  console.log('');
-  // Install commands (from commands/ in package root to ~/.claude/commands/)
-  const commandsSrc = path.join(ROOT, 'commands');
-  const commandsDest = path.join(baseDir, 'commands');
-  const cmdCount = copyDir(commandsSrc, commandsDest);
-  ok(`Installed ${cmdCount} slash commands`);
-  // Install skills (from skills/ in package root to ~/.claude/skills/)
-  const skillsSrc = path.join(ROOT, 'skills');
-  const skillsDest = path.join(baseDir, 'skills');
-  const skillCount = copyDir(skillsSrc, skillsDest);
-  const skillDirCount = countEntries(skillsSrc, 'dirs');
-  ok(`Installed ${skillDirCount} skills (${skillCount} files)`);
-  // Install workflows
-  const workflowsSrc = path.join(ROOT, 'workflows');
-  const workflowsDest = path.join(qaaDir, 'workflows');
-  const wfCount = copyDir(workflowsSrc, workflowsDest);
-  ok(`Installed ${wfCount} workflows`);
-  // Install agents
-  const agentsSrc = path.join(ROOT, 'agents');
-  const agentsDest = path.join(qaaDir, 'agents');
-  const agentCount = copyDir(agentsSrc, agentsDest);
-  ok(`Installed ${agentCount} agent definitions`);
-  // Install templates
-  const templatesSrc = path.join(ROOT, 'templates');
-  const templatesDest = path.join(qaaDir, 'templates');
-  const templateCount = copyDir(templatesSrc, templatesDest);
-  ok(`Installed ${templateCount} templates`);
-  // Install bin
-  const binSrc = path.join(ROOT, 'bin');
-  const binDest = path.join(qaaDir, 'bin');
-  const binCount = copyDir(binSrc, binDest);
-  try { fs.unlinkSync(path.join(binDest, 'install.cjs')); } catch {}
-  ok(`Installed CLI tooling`);
-  // Install CLAUDE.md
-  copyFile(path.join(ROOT, 'CLAUDE.md'), path.join(qaaDir, 'CLAUDE.md'));
-  ok('Installed QA standards (CLAUDE.md)');
-  // Install .mcp.json (Playwright MCP server config)
-  const mcpSrc = path.join(ROOT, '.mcp.json');
-  if (fs.existsSync(mcpSrc)) {
-    // Copy to qaa dir for reference
-    copyFile(mcpSrc, path.join(qaaDir, '.mcp.json'));
-    // Merge MCP servers into ~/.claude.json (user-scope) so they're available in ALL projects
-    // Note: ~/.claude/.mcp.json is project-scope for ~/.claude/ only — NOT global
-    const userConfigPath = path.join(HOME, '.claude.json');
-    let userConfig = {};
-    if (fs.existsSync(userConfigPath)) {
-      try { userConfig = JSON.parse(fs.readFileSync(userConfigPath, 'utf8')); } catch {}
-    }
-    userConfig.mcpServers = userConfig.mcpServers || {};
-    const qaaMcp = JSON.parse(fs.readFileSync(mcpSrc, 'utf8'));
-    Object.assign(userConfig.mcpServers, qaaMcp.mcpServers);
-    fs.writeFileSync(userConfigPath, JSON.stringify(userConfig, null, 2));
-    ok('Installed Playwright MCP server config (user-scope — available in all projects)');
-  }
-  // Write version
-  fs.writeFileSync(path.join(qaaDir, 'VERSION'), VERSION);
-  ok(`Wrote VERSION (${VERSION})`);
-  // Merge settings (from settings.json in package root)
-  const settingsSrc = path.join(ROOT, 'settings.json');
-  const settingsDest = path.join(baseDir, 'settings.json');
-  if (fs.existsSync(settingsSrc)) {
-    let existing = {};
-    if (fs.existsSync(settingsDest)) {
-      try { existing = JSON.parse(fs.readFileSync(settingsDest, 'utf8')); } catch {}
-    }
-    const qaaSettings = JSON.parse(fs.readFileSync(settingsSrc, 'utf8'));
-    if (qaaSettings.permissions) {
-      existing.permissions = existing.permissions || {};
-      existing.permissions.allow = [...new Set([
-        ...(existing.permissions.allow || []),
-        ...(qaaSettings.permissions.allow || [])
-      ])];
-    }
-    fs.writeFileSync(settingsDest, JSON.stringify(existing, null, 2));
-    ok('Merged permissions into settings.json');
-  }
-  // Done
-  const total = cmdCount + skillCount + agentCount + templateCount + wfCount + binCount;
-  console.log('');
-  console.log(`  \x1b[32m✓ Done!\x1b[0m Installed ${total} files.`);
-  console.log('');
-  console.log('  Open Claude Code in any project and run:');
-  console.log('');
-  console.log('    \x1b[1m/qa-start\x1b[0m          Full QA pipeline (multi-agent)');
-  console.log('    \x1b[1m/qa-map\x1b[0m            Codebase map + analysis');
-  console.log('    \x1b[1m/qa-create-test\x1b[0m    Tests for a feature/ticket');
-  console.log('    \x1b[1m/qa-audit\x1b[0m          Audit existing tests');
-  console.log('    \x1b[1m/qa-fix\x1b[0m            Fix broken tests');
-  console.log('    \x1b[1m/qa-testid\x1b[0m         Inject data-testid attributes');
-  console.log('    \x1b[1m/qa-pr\x1b[0m             Create QA pull request');
-  console.log('');
-  console.log(`  ${cmdCount} commands + ${skillDirCount} skills + ${agentCount} agents ready.`);
-  console.log('');
-}
-main().catch(err => {
-  console.error('Installation failed:', err.message);
-  process.exit(1);
-});
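
The removed installer merged two pieces of user-level JSON config: it shallow-merged the package's `mcpServers` into `~/.claude.json` with `Object.assign`, and it unioned `permissions.allow` entries into `settings.json` through a `Set`. The sketch below isolates those two merges as pure functions so the semantics are easier to see. The helper names `mergeMcpServers` and `mergeAllowList` and the sample data are hypothetical and do not appear in `install.cjs`. Unlike the original, which mutated the parsed objects and wrote them back to disk, this version returns copies and does no file I/O.

```javascript
// Hypothetical helpers (names are illustrative, not taken from install.cjs).

// Shallow-merge incoming MCP server entries into a user config object.
// A server name present on both sides is overwritten by the incoming entry,
// which matches the Object.assign behavior in the removed installer.
function mergeMcpServers(userConfig, incoming) {
  return {
    ...userConfig,
    mcpServers: {
      ...(userConfig.mcpServers || {}),
      ...(incoming.mcpServers || {}),
    },
  };
}

// Union two permission allow-lists. Set keeps insertion order, so existing
// entries stay first and duplicates from the incoming list are dropped.
function mergeAllowList(existing, incoming) {
  if (!incoming.permissions) return { ...existing };
  const current = existing.permissions || {};
  return {
    ...existing,
    permissions: {
      ...current,
      allow: [
        ...new Set([
          ...(current.allow || []),
          ...(incoming.permissions.allow || []),
        ]),
      ],
    },
  };
}

// Sample data below is made up for illustration.
const config = mergeMcpServers(
  { theme: 'dark', mcpServers: { other: { command: 'other-server' } } },
  { mcpServers: { playwright: { command: 'npx' } } }
);

const settings = mergeAllowList(
  { permissions: { allow: ['Bash(ls:*)', 'Read'], deny: ['WebFetch'] } },
  { permissions: { allow: ['Read', 'Bash(cat:*)'] } }
);

console.log(Object.keys(config.mcpServers));
console.log(settings.permissions.allow);
```

Both merges leave unrelated keys untouched (`theme` and `permissions.deny` in the sample data), which is why the installer could be re-run without discarding other user settings, provided the existing file parsed as JSON.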