npm - qaa-agent - Versions diffs - 1.7.4 → 1.8.1 - Mend

qaa-agent 1.7.4 → 1.8.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

package/.mcp.json +4 -0
package/CHANGELOG.md +40 -0
package/README.md +26 -44
package/agents/qa-pipeline-orchestrator.md +47 -0
package/agents/qaa-analyzer.md +41 -0
package/agents/qaa-bug-detective.md +95 -0
package/agents/qaa-codebase-mapper.md +3 -0
package/agents/qaa-e2e-runner.md +86 -0
package/agents/qaa-executor.md +98 -0
package/agents/qaa-planner.md +41 -0
package/agents/qaa-testid-injector.md +68 -0
package/agents/qaa-validator.md +47 -0
package/bin/install.cjs +253 -212
package/commands/qa-audit.md +7 -0
package/commands/qa-create-test.md +30 -0
package/commands/qa-fix.md +4 -0
package/commands/qa-map.md +2 -0
package/package.json +3 -2

package/.mcp.json CHANGED

@@ -3,6 +3,10 @@
     "playwright": {
       "command": "npx",
       "args": ["@playwright/mcp@latest"]
+    },
+    "context7": {
+      "command": "npx",
+      "args": ["-y", "@upstash/context7-mcp@latest"]
     }
   }
 }
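
The CHANGELOG in this diff states that the installer registers both servers in `~/.claude.json` idempotently, without duplicating existing entries. The real installer is `bin/install.cjs`, whose body is not shown here, so the following is only a minimal Python sketch of that behavior. The `mcpServers` top-level key and the function name `register_mcp_servers` are assumptions made for illustration.

```python
import copy

# Server definitions copied from the .mcp.json hunk above.
SERVERS = {
    "playwright": {"command": "npx", "args": ["@playwright/mcp@latest"]},
    "context7": {"command": "npx", "args": ["-y", "@upstash/context7-mcp@latest"]},
}


def register_mcp_servers(config, servers):
    """Return (new_config, added_names) without touching entries that already exist.

    Assumption: servers live under a top-level "mcpServers" object.
    """
    merged = copy.deepcopy(config)
    registered = merged.setdefault("mcpServers", {})
    added = []
    for name, spec in servers.items():
        if name not in registered:  # existing entries are left as the user wrote them
            registered[name] = copy.deepcopy(spec)
            added.append(name)
    return merged, added


# A user config that already has Playwright registered.
existing = {
    "mcpServers": {
        "playwright": {"command": "npx", "args": ["@playwright/mcp@latest"]}
    }
}

first, added_first = register_mcp_servers(existing, SERVERS)
second, added_second = register_mcp_servers(first, SERVERS)  # second run is a no-op
```

Running the function twice yields the same config both times, which is the property the CHANGELOG describes as idempotency.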

package/CHANGELOG.md CHANGED

@@ -3,6 +3,46 @@
 All notable changes to QAA (QA Automation Agent) are documented here.
+## [1.8.1] - 2026-04-16
+### Added
+- **Context7 MCP integration** — `@upstash/context7-mcp` is now bundled alongside Playwright MCP. The installer registers both MCP servers in the user-scope config (`~/.claude.json`) so they're available in every project on the machine, not just in the QAA repo. Context7 gives every QAA agent on-demand access to up-to-date library documentation (Playwright, Cypress, Jest, Vitest, pytest, and any other framework), keeping generated tests aligned with current APIs instead of outdated training data.
+- **`bin/install.cjs` installer script** — the file was referenced in `package.json` but didn't actually exist on npm, causing `npx qaa-agent` to fail silently (`No bin file found at bin/install.cjs`). The installer now performs three steps on every run: (1) copies agents, commands, skills, templates, workflows, docs, and config files into the chosen scope (`~/.claude/qaa` for global, `./.claude/qaa` for local), (2) registers both MCP servers in `~/.claude.json` with idempotency — existing entries are not duplicated, and (3) deep-merges the QAA permissions into the user's `settings.json` without overwriting their existing settings.
+### Changed
+- **MCP registration is now user-scope by default** — previously MCPs were defined only in the project-level `.mcp.json`, which meant they only activated when the user opened the QAA repo itself. They now register in `~/.claude.json`, making them available in every Claude Code project on the user's machine. The project-level `.mcp.json` is kept for QAA development purposes but is no longer the source of truth for end users.
+### Fixed
+- **Silent `npx qaa-agent` failure** — users who installed QAA via npm before this release did not get Playwright or Context7 MCPs registered because the installer script was missing from the published package. Publishing 1.8.1 restores the expected behavior: a single `npx qaa-agent` command copies all files and registers both MCPs globally.
+## [1.8.0] - 2026-04-13
+### Added
+- **Active verification checklist in every agent** — all 8 pipeline agents now end their body with a `## Before completing any task, verify each item actively:` section that forces the agent to run real `ls` + `cat` + `grep` commands against `.qa-output/` artifacts, the Locator Registry, codebase map documents, and `MY_PREFERENCES.md` before closing the task. The output of those commands lands in the subagent's context (recency effect), so the model cannot skip reading inputs or leave outputs unwritten without the verification failing.
+- **`skills:` declared in YAML frontmatter for every agent** — `qaa-analyzer`, `qaa-planner`, `qaa-executor`, `qaa-validator`, `qaa-e2e-runner`, `qaa-bug-detective`, `qaa-testid-injector`, `qaa-codebase-mapper`, `qa-pipeline-orchestrator`. Claude Code now injects the matching SKILL.md content at the start of the subagent's context when the Task tool spawns it. Previously subagents spawned with empty context and ignored the skill entirely.
+- **Non-negotiable rules section in `qaa-bug-detective`** — explicit rules for Locator Registry persistence and MY_PREFERENCES.md updates, placed mid-body as redundant reinforcement between the frontmatter (start) and the active checklist (end).
+- **`MY_PREFERENCES.md` reads propagated across slash commands** — `qa-create-test`, `qa-fix`, `qa-audit`, `qa-map` now pass `~/.claude/qaa/MY_PREFERENCES.md` to every spawned agent via `files_to_read`.
+- **Locator Registry reads propagated** — `qa-fix` and `qa-audit` now pass `.qa-output/locators/` to bug-detective, e2e-runner, and validator subagents.
+- **Playwright MCP usage is now non-negotiable in 4 agents** — `qaa-e2e-runner`, `qaa-testid-injector`, `qaa-bug-detective`, `qaa-executor` now hardcode Non-negotiable rules in their body that make live browser interaction via Playwright MCP **mandatory** (not optional) under the appropriate conditions. Previously agents sometimes skipped MCP calls even when the skill described them, because the description was advisory rather than enforced.
+- **MCP evidence files** at `.qa-output/mcp-evidence/{agent-name}-session.md` — every MCP-using agent now writes a structured evidence file per session logging `session_start`, `session_end`, URLs navigated, snapshots/screenshots taken, interactions performed, and `browser_closed: true`. The active verification checklist at the end of each agent runs `ls` + `grep` on this evidence file; missing or empty file = invalid run = hard failure.
+- **Skip-reason tracking** — when MCP is legitimately skipped (no `app_url`, non-E2E failure, MCP not connected), agents must document the skip reason in their primary report (TESTID_AUDIT_REPORT.md / FAILURE_CLASSIFICATION_REPORT.md). Silent skips are no longer permitted.
+- **Locator resolution priority chain — invention is forbidden** — `qaa-executor`, `qaa-e2e-runner`, and `qaa-bug-detective` now enforce a strict priority order when writing any locator: (1) Locator Registry first, (2) frontend source code `grep` second, (3) Playwright MCP live DOM snapshot third, (4) HALT if nothing resolvable. Agents MUST NOT invent `data-testid` values or guess CSS selectors. Every locator written to a generated file requires `source: registry | codebase | mcp` attribution in the MCP evidence file — anything else triggers file deletion or revert.
+- **Priority hit counts logged** — MCP evidence files now track `priority1_hits` (registry reuse), `priority2_hits` (source extraction), `priority3_hits` (MCP discovery), and `priority4_halts` (unresolvable elements), giving a full audit trail of where every locator came from.
+### Changed
+- **Agent reliability pattern: triple reinforcement** — every critical rule is now reinforced three times: (1) `skills:` frontmatter injection at the start of context, (2) `required_reading` + mid-body non-negotiable rules, (3) active `ls`/`cat`/`grep` verification at the end. This closes the "lost in the middle" attention gap documented in long-context LLM research.
+- **`qaa-bug-detective`, `qaa-executor`, `qaa-e2e-runner`, `qaa-validator`** — existing active checklists extended with `.qa-output/` specific items (generation plan, test inventory, codebase map, validation layers, failure classification evidence).
+### Fixed
+- **Subagent skill loss** — when a parent agent spawned a subagent via `Task()`, the subagent ran with fresh context and ignored the skill entirely (it had no way to know a skill existed). Declaring `skills:` in the YAML frontmatter fixes this at the Claude Code loader level.
+- **Artifact-read drift** — agents would sometimes reference `.qa-output/` artifacts in their reasoning without actually reading them. The active `grep` on specific content (e.g. "RISK_MAP HIGH items", "VALIDATION_REPORT confidence level") forces real consumption.
 ## [1.7.0] - 2026-04-02
 ### Added
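
The 1.8.1 entry above says the installer deep-merges QAA permissions into the user's `settings.json` without overwriting existing settings. The merge rules used by the actual installer are not visible in this diff, so the sketch below shows one plausible policy: nested objects merge recursively, lists are unioned, and the user's scalar values win. The function name `deep_merge`, the `theme` key, and the permission strings are hypothetical.

```python
def deep_merge(base, extra):
    """Merge `extra` into a copy of `base`, never overwriting values already in `base`."""
    result = dict(base)
    for key, value in extra.items():
        current = result.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            # Recurse into nested objects such as a permissions block.
            result[key] = deep_merge(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            # Union of lists: keep the user's order, append only new items.
            result[key] = current + [item for item in value if item not in current]
        elif key not in result:
            # Key is new to the user's settings, so it is safe to add.
            result[key] = value
        # Otherwise the user's existing value is kept unchanged.
    return result


# Hypothetical settings; the real keys and permission strings may differ.
user_settings = {"theme": "dark", "permissions": {"allow": ["Read"]}}
qaa_settings = {
    "theme": "light",
    "permissions": {"allow": ["Read", "Bash(ls:*)"], "deny": ["WebFetch"]},
}

merged = deep_merge(user_settings, qaa_settings)
```

In this sketch `merged` keeps the user's `theme`, gains the new `deny` list, and extends `allow` with only the entry that was missing.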

package/README.md CHANGED

@@ -43,7 +43,9 @@ npx qaa-agent
 The interactive installer:
 1. Copies agents, commands, skills, templates, and workflows into your runtime directory
-2. Configures the [Playwright MCP](https://github.com/anthropics/mcp-playwright) server in your user-scope config (`~/.claude.json`) so it's available in **all projects**
+2. Registers **two MCP servers** in your user-scope config (`~/.claude.json`) so they're available in **all projects**:
+   - [Playwright MCP](https://www.npmjs.com/package/@playwright/mcp) — live browser control for E2E tests and locator extraction
+   - [Context7 MCP](https://www.npmjs.com/package/@upstash/context7-mcp) — up-to-date library documentation on demand
 3. Merges required permissions into `settings.json`
 **Supported runtimes:** Claude Code, OpenCode
@@ -55,48 +57,34 @@ The interactive installer:
 - [Node.js](https://nodejs.org/) 18+
 - [Claude Code](https://docs.anthropic.com/en/docs/claude-code) installed
-### Playwright MCP (required for E2E)
+### Bundled MCP servers
-QAA uses [`@playwright/mcp`](https://www.npmjs.com/package/@playwright/mcp) to open a real browser, extract locators from live pages, run E2E tests, and auto-fix locator mismatches.
+Both MCP servers are **registered automatically** in `~/.claude.json` when you run `npx qaa-agent`. No manual setup required — once installed, they're available in every Claude Code project on your machine.
-**You need to install the Playwright MCP server manually in your environment:**
+#### Playwright MCP — live browser control
-<details>
-<summary><strong>VS Code (Claude Code extension)</strong></summary>
+Uses [`@playwright/mcp`](https://www.npmjs.com/package/@playwright/mcp) to:
-1. Open VS Code Settings (`Ctrl+Shift+P` > `Preferences: Open User Settings (JSON)`)
-2. Add the MCP server config:
+- Open a real browser and navigate your running app
+- Extract actual locators (`data-testid`, ARIA roles, labels) from live pages
+- Run E2E tests, capture failures, and auto-fix locator mismatches
+- Build a persistent **Locator Registry** (`.qa-output/locators/`) that caches real locators across features
-```json
-{
-  "claude-code.mcpServers": {
-    "playwright": {
-      "command": "npx",
-      "args": ["@playwright/mcp@latest"]
-    }
-  }
-}
-```
+#### Context7 MCP — up-to-date library docs
-Or add it to your project's `.vscode/mcp.json`:
+Uses [`@upstash/context7-mcp`](https://www.npmjs.com/package/@upstash/context7-mcp) to:
-```json
-{
-  "servers": {
-    "playwright": {
-      "command": "npx",
-      "args": ["@playwright/mcp@latest"]
-    }
-  }
-}
-```
+- Fetch the latest documentation for Playwright, Cypress, Jest, Vitest, pytest, and any other library the agent is working with
+- Keep generated tests aligned with current framework APIs instead of outdated training data
+- Free tier: ~60 requests/hour, ~3,300 tokens/query
-</details>
+#### Verifying the MCPs are connected
-<details>
-<summary><strong>Claude Code CLI</strong></summary>
+Open Claude Code in any project and type `/mcp`. You should see both `playwright` and `context7` listed as connected.
-Add to `~/.claude.json` (user-scope, all projects):
+#### Manual config (fallback)
+If for any reason the automatic registration fails, you can add the servers manually to `~/.claude.json`:
 ```json
 {
@@ -104,21 +92,15 @@ Add to `~/.claude.json` (user-scope, all projects):
     "playwright": {
       "command": "npx",
       "args": ["@playwright/mcp@latest"]
+    },
+    "context7": {
+      "command": "npx",
+      "args": ["-y", "@upstash/context7-mcp@latest"]
     }
   }
 }
 ```
-Or add a `.mcp.json` file in your project root for project-scope only.
-</details>
-Once configured, Playwright MCP enables QAA to:
-- Open a real browser and navigate your running app
-- Extract actual locators (`data-testid`, ARIA roles, labels) from live pages
-- Run E2E tests, capture failures, and auto-fix locator mismatches
-- Build a persistent **Locator Registry** (`.qa-output/locators/`) that caches real locators across features
 ---
 ## Quick Start
@@ -328,7 +310,7 @@ qaa-agent/
   bin/             # Installer and CLI tools
   docs/            # User documentation
   CLAUDE.md        # QA standards (read by every agent)
-  .mcp.json        # Playwright MCP server config
+  .mcp.json        # Playwright + Context7 MCP server config
   settings.json    # Claude Code permissions
 ```

package/agents/qa-pipeline-orchestrator.md CHANGED

@@ -1,3 +1,10 @@
+---
+name: qa-pipeline-orchestrator
+description: Single orchestrator for the QA automation pipeline
+skills:
+  - qa-workflow-documenter
+---
 <purpose>
 Single orchestrator for the QA automation pipeline. Coordinates all 7 agent types (scanner, analyzer, planner, executor, validator, bug-detective, testid-injector) across 3 workflow options. Owns all pipeline state transitions -- agents never update state directly. The orchestrator sets stage status to 'running' before spawning an agent and 'complete' or 'failed' after the agent returns.
@@ -1376,3 +1383,43 @@ Before this orchestrator is considered complete, verify:
 4. Checkpoints pause when appropriate and auto-approve when safe
 5. Failure in any stage stops the pipeline cleanly with actionable error message
 </success_criteria>
+## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+```bash
+echo "=== PIPELINE ORCHESTRATOR CHECKLIST START ==="
+echo "1. Pipeline state file:"
+ls .planning/STATE.md 2>/dev/null || echo "STATE_FILE_NOT_FOUND"
+echo "2. Stage status fields:"
+grep -E "scan_status|analyze_status|generate_status|validate_status|deliver_status" .planning/STATE.md 2>/dev/null || echo "NO_STATUS_FIELDS"
+echo "3. All .qa-output/ artifacts:"
+ls .qa-output/ 2>/dev/null || echo "QA_OUTPUT_EMPTY"
+echo "4. SCAN_MANIFEST.md (always required):"
+ls .qa-output/SCAN_MANIFEST.md 2>/dev/null || echo "NO_SCAN_MANIFEST"
+echo "5. Codebase map document count:"
+ls .qa-output/codebase/ 2>/dev/null | wc -l || echo "NO_CODEBASE_MAP"
+echo "6. Analyzer artifacts:"
+ls .qa-output/QA_ANALYSIS.md .qa-output/TEST_INVENTORY.md 2>/dev/null || echo "NO_ANALYZER_ARTIFACTS"
+echo "7. TestID audit report:"
+ls .qa-output/TESTID_AUDIT_REPORT.md 2>/dev/null || echo "NO_TESTID_REPORT"
+echo "8. Generation plan:"
+ls .qa-output/GENERATION_PLAN.md 2>/dev/null || echo "NO_GENERATION_PLAN"
+echo "9. Validation report:"
+ls .qa-output/VALIDATION_REPORT.md 2>/dev/null || echo "NO_VALIDATION_REPORT"
+echo "10. E2E + bug-detective reports:"
+ls .qa-output/E2E_RUN_REPORT.md .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_E2E_REPORTS"
+echo "11. State transitions with timestamps:"
+grep -cE "^- |^[0-9]+\." .planning/STATE.md 2>/dev/null || echo "NO_STATE_TRANSITIONS"
+echo "12. MY_PREFERENCES.md:"
+cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+echo "=== PIPELINE ORCHESTRATOR CHECKLIST END ==="
+```
+**Rules:**
+- Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+- If any output shows a problem (STATE_FILE_NOT_FOUND, NO_SCAN_MANIFEST), fix it before returning.
+- If output shows expected "not found" results (e.g., NO_TESTID_REPORT when no frontend was detected), that is fine — the point is you RAN the command instead of assuming the answer.
+- Do NOT return control to the user until the block has been executed and you have read every line of output.
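
The checklist added above pairs each artifact with a sentinel string that is printed when the artifact is missing. In the shell version, a fallback placed after a pipeline (for example `ls … | wc -l || echo …`) follows the exit status of the last command in the pipeline unless `pipefail` is set. A per-path check is one way to make each sentinel explicit. The sketch below is illustrative only: it covers four of the paths from the block, and `run_checklist` plus the `FOUND` marker are names introduced here, not part of QAA.

```python
import tempfile
from pathlib import Path

# (relative path, sentinel) pairs taken from the checklist block above.
CHECKS = [
    (".planning/STATE.md", "STATE_FILE_NOT_FOUND"),
    (".qa-output/SCAN_MANIFEST.md", "NO_SCAN_MANIFEST"),
    (".qa-output/GENERATION_PLAN.md", "NO_GENERATION_PLAN"),
    (".qa-output/VALIDATION_REPORT.md", "NO_VALIDATION_REPORT"),
]


def run_checklist(root):
    """Map each checked path to "FOUND" or to its sentinel when the file is absent."""
    results = {}
    for relative_path, sentinel in CHECKS:
        target = Path(root) / relative_path
        results[relative_path] = "FOUND" if target.is_file() else sentinel
    return results


# Demo in a throwaway workspace that only contains the state file.
with tempfile.TemporaryDirectory() as workspace:
    state_file = Path(workspace) / ".planning" / "STATE.md"
    state_file.parent.mkdir(parents=True)
    state_file.write_text("scan_status: complete\n")
    report = run_checklist(workspace)

for path, status in report.items():
    print(f"{path}: {status}")
```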

package/agents/qaa-analyzer.md CHANGED

@@ -1,3 +1,10 @@
+---
+name: qaa-analyzer
+description: Analyzes scanned repo to produce QA analysis and test inventory
+skills:
+  - qa-repo-analyzer
+---
 <purpose>
 Analyze a scanned repository to produce QA_ANALYSIS.md and TEST_INVENTORY.md -- the two primary analysis artifacts that drive all downstream test planning and generation. Consumes SCAN_MANIFEST.md (produced by the scanner agent) and CLAUDE.md (QA standards) to produce a comprehensive testability report with architecture overview, risk assessment, top 10 unit test targets, API contract targets, and a testing pyramid distribution tailored to the specific repository. Produces a pyramid-based test case inventory where every test case has a unique ID, specific target, concrete inputs, explicit expected outcome with exact values, and priority. Optionally produces QA_REPO_BLUEPRINT.md for Option 1 (dev-only) workflows when no existing QA repository exists. Spawned by the orchestrator after the scanner completes successfully via Task(subagent_type='qaa-analyzer').
 </purpose>
@@ -537,3 +544,37 @@ The analyzer agent has completed successfully when:
 5. All artifacts are committed via `node bin/qaa-tools.cjs commit`
 6. Return to orchestrator: file paths, total test count, pyramid breakdown (unit/integration/api/e2e counts), risk count (high/medium/low)
 </success_criteria>
+## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+```bash
+echo "=== ANALYZER CHECKLIST START ==="
+echo "1. SCAN_MANIFEST.md (input):"
+ls .qa-output/SCAN_MANIFEST.md 2>/dev/null || echo "SCAN_MANIFEST_NOT_FOUND"
+echo "2. SCAN_MANIFEST content preview:"
+head -50 .qa-output/SCAN_MANIFEST.md 2>/dev/null || echo "SCAN_MANIFEST_EMPTY"
+echo "3. Codebase map documents:"
+ls .qa-output/codebase/ 2>/dev/null || echo "NO_CODEBASE_MAP"
+echo "4. RISK_MAP.md risks:"
+grep -E "^## |HIGH|MEDIUM|LOW" .qa-output/codebase/RISK_MAP.md 2>/dev/null | head -20 || echo "NO_RISK_MAP"
+echo "5. CRITICAL_PATHS.md flows:"
+grep -c "^- \|^[0-9]\+\." .qa-output/codebase/CRITICAL_PATHS.md 2>/dev/null || echo "NO_CRITICAL_PATHS"
+echo "6. TEST_SURFACE.md entry points:"
+grep -E "function|class|method" .qa-output/codebase/TEST_SURFACE.md 2>/dev/null | head -10 || echo "NO_TEST_SURFACE"
+echo "7. Locator Registry:"
+ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+echo "8. Output artifacts:"
+ls .qa-output/QA_ANALYSIS.md .qa-output/TEST_INVENTORY.md 2>/dev/null || echo "OUTPUTS_NOT_WRITTEN"
+echo "9. MY_PREFERENCES.md:"
+cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+echo "=== ANALYZER CHECKLIST END ==="
+```
+**Rules:**
+- Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+- If any output shows a problem (SCAN_MANIFEST_NOT_FOUND, OUTPUTS_NOT_WRITTEN), fix it before returning.
+- If output shows expected "not found" results (e.g., NO_CODEBASE_MAP when mapper hasn't run yet), that is fine — the point is you RAN the command instead of assuming the answer.
+- Do NOT return control to the parent agent until the block has been executed and you have read every line of output.

package/agents/qaa-bug-detective.md CHANGED

@@ -1,3 +1,10 @@
+---
+name: qaa-bug-detective
+description: Classifies failures and fixes test code errors
+skills:
+  - qa-bug-detective
+---
 <purpose>
 Run generated tests against the actual application and classify every failure into one of four actionable categories: APPLICATION BUG, TEST CODE ERROR, ENVIRONMENT ISSUE, or INCONCLUSIVE. Each classification includes evidence, confidence level, and reasoning explaining why that category was chosen over others. Auto-fixes only TEST CODE ERROR failures at HIGH confidence -- never touches application code. Reads test source files, CLAUDE.md classification rules, and the failure-classification template. Produces FAILURE_CLASSIFICATION_REPORT.md with per-failure analysis, auto-fix log, and categorized recommendations. Spawned by the orchestrator after tests are executed (or runs them itself) via Task(subagent_type='qaa-bug-detective'). This agent actually RUNS the test suite -- it is not static analysis. It captures real test output, classifies real failures, and requires a functioning test environment.
 </purpose>
@@ -309,6 +316,54 @@ Attempt auto-fixes for eligible failures. Strict eligibility rules apply.
 **Track all auto-fix attempts** for the Auto-Fix Log section of the report.
 </step>
+## Non-negotiable rules
+These rules are hardcoded in the agent body because they MUST NOT be skipped under any circumstance, regardless of whether the skill is loaded or not.
+### Locator Registry persistence
+After every fix loop iteration where the test **PASSES**:
+1. **Save all verified locators** to `.qa-output/locators/` — write a per-feature file `.qa-output/locators/{feature}.locators.md` and update `.qa-output/locators/LOCATOR_REGISTRY.md`.
+2. **Only save locators that were confirmed working** by a passing test. Do NOT save locators from failing tests — they may be incorrect and would contaminate the registry.
+3. **Locator format in registry:** Each entry must include: the `data-testid` or selector value, the tier (1-4), the page/component context, and the date verified.
+### MY_PREFERENCES.md persistence
+After every fix where a correction contradicts CLAUDE.md defaults or reveals a user-specific pattern:
+1. **Read `~/.claude/qaa/MY_PREFERENCES.md`** if it exists, before producing any output (this is also in `<required_reading>` but repeated here for emphasis).
+2. **Save new corrections** to `~/.claude/qaa/MY_PREFERENCES.md` so future agent instances inherit the learning.
+3. Preferences override CLAUDE.md when there is a conflict.
+### Playwright MCP reproduction is mandatory for E2E failures
+When an E2E test fails **and** Playwright MCP server is connected **and** an `app_url` is available, browser reproduction is **required, not optional** — classifying an E2E failure without reproducing it in the live browser produces unreliable APPLICATION BUG vs TEST CODE ERROR classifications.
+1. **For each E2E failure in the test run:** call at minimum `mcp__playwright__browser_navigate` (to the failing route), `mcp__playwright__browser_snapshot` (to inspect the real DOM), and `mcp__playwright__browser_take_screenshot` (visual evidence attached to the classification).
+2. **Skip is only permitted when:** the failure is a unit/API test (not E2E), OR no `app_url` is available, OR Playwright MCP is not connected. The skip MUST be recorded in FAILURE_CLASSIFICATION_REPORT.md under the failure's evidence section with reason (e.g., "MCP unavailable" or "no app_url").
+3. **Persist evidence of MCP usage** to `.qa-output/mcp-evidence/qaa-bug-detective-session.md` with:
+   - `session_start: {ISO timestamp}` and `session_end: {ISO timestamp}`
+   - `failures_reproduced:` list of `{test_id, route, classification}`
+   - `snapshots_taken:` count + route
+   - `screenshots_taken:` list of screenshot paths (evidence for classifications)
+   - `browser_closed: true`
+4. **If E2E failures exist and the evidence file is missing or empty, classifications for those failures are INVALID** — mark them INCONCLUSIVE with reason "MCP reproduction skipped" rather than making up an APPLICATION BUG / TEST CODE ERROR classification.
+### Locator resolution priority when auto-fixing TEST CODE ERRORS — invention is forbidden
+When a failure is classified as `TEST CODE ERROR` (wrong locator) and the agent auto-fixes the test file, the corrected locator MUST come from one of the following sources, in this exact priority order. **The agent MUST NOT invent a new `data-testid` or guess a CSS selector.**
+**Priority 1 — Locator Registry:** Check `.qa-output/locators/LOCATOR_REGISTRY.md` + `.qa-output/locators/{feature}.locators.md` for the target element.
+**Priority 2 — Codebase source:** `grep -rE "data-testid=|aria-label=|id=\""` the frontend source for the page where the failure occurred.
+**Priority 3 — Live DOM via Playwright MCP:** Use `mcp__playwright__browser_snapshot()` on the failing route to extract the real locator. Persist to registry with tier classification.
+**Priority 4 — HALT:** If nothing is resolvable, do NOT auto-fix. Re-classify the failure as `INCONCLUSIVE` with reason `locator unresolvable from registry/source/MCP`. The fix remains for the developer to address.
+Every locator written during auto-fix MUST have a source attribution in the MCP evidence file: `source: registry | codebase | mcp`. A locator without attribution is invented and the auto-fix is invalid (revert it).
 <step name="produce_report">
 Write FAILURE_CLASSIFICATION_REPORT.md matching templates/failure-classification.md exactly (4 required sections).
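
The locator resolution priority chain added in this hunk (registry, then codebase source, then live DOM via MCP, then halt) can be read as an ordered fallback that returns both the locator and its source attribution. The sketch below is a simplified model under stated assumptions: each source is represented by a plain dictionary lookup, and the element names, selectors, and the function `resolve_locator` are hypothetical. A real agent would read the registry files, run `grep`, and call the Playwright MCP tools instead.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

Lookup = Callable[[str], Optional[str]]


def resolve_locator(
    element: str, sources: Sequence[Tuple[str, Lookup]]
) -> Tuple[Optional[str], str]:
    """Try each source in priority order; never invent a locator."""
    for source_name, lookup in sources:
        locator = lookup(element)
        if locator:
            return locator, source_name  # attribution travels with the locator
    return None, "halt"  # Priority 4: nothing resolvable, so stop


# Stand-ins for the three real sources (all values are made up).
registry = {"login-submit": "[data-testid='login-submit-btn']"}
codebase = {"login-email": "[data-testid='login-email-input']"}
live_dom = {}

sources = [
    ("registry", registry.get),
    ("codebase", codebase.get),
    ("mcp", live_dom.get),
]

resolved = {
    name: resolve_locator(name, sources)
    for name in ["login-submit", "login-email", "signup-banner"]
}

# Tally per source, similar in spirit to the priority hit counts in the CHANGELOG.
hits = Counter(source for _, source in resolved.values())
```

Returning the source name alongside the locator makes the `source: registry | codebase | mcp` attribution a by-product of resolution rather than a separate step.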
@@ -477,3 +532,43 @@ The bug detective agent has completed successfully when:
 8. Return values provided to orchestrator: report_path, total_failures, classification_breakdown, auto_fixes_applied, auto_fixes_verified, commit_hash
 9. All quality gate checks pass (8 template items + 4 detective-specific items)
 </success_criteria>
+## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+```bash
+echo "=== BUG-DETECTIVE CHECKLIST START ==="
+echo "1. Locator Registry:"
+ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+echo "2. MY_PREFERENCES.md:"
+cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+echo "3. FAILURE_CLASSIFICATION_REPORT.md:"
+ls .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "REPORT_NOT_WRITTEN"
+echo "4. Classifications in report:"
+grep -E "APPLICATION BUG|TEST CODE ERROR|ENVIRONMENT ISSUE|INCONCLUSIVE" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_CLASSIFICATIONS_FOUND"
+echo "5. Confidence levels:"
+grep -E "HIGH|MEDIUM-HIGH|MEDIUM|LOW" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null | head -10 || echo "NO_CONFIDENCE_LEVELS"
+echo "6. Evidence and reasoning count:"
+grep -cE "^### |Evidence:|Reasoning:" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_EVIDENCE_SECTIONS"
+echo "7. Upstream reports:"
+ls .qa-output/E2E_RUN_REPORT.md 2>/dev/null || echo "NO_E2E_RUN_REPORT"
+ls .qa-output/VALIDATION_REPORT.md 2>/dev/null || echo "NO_VALIDATION_REPORT"
+echo "8. MCP reproduction evidence:"
+ls .qa-output/mcp-evidence/qaa-bug-detective-session.md 2>/dev/null || echo "NO_MCP_EVIDENCE"
+grep -cE "failures_reproduced:|snapshots_taken:|screenshots_taken:" .qa-output/mcp-evidence/qaa-bug-detective-session.md 2>/dev/null || echo "NO_MCP_REPRODUCTION_DATA"
+echo "9. MCP skip reasons (if any):"
+grep -E "MCP unavailable|no app_url|MCP reproduction skipped" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_MCP_SKIP_DOCUMENTED"
+echo "10. Locator source attribution:"
+grep -cE "source: registry|source: codebase|source: mcp" .qa-output/mcp-evidence/qaa-bug-detective-session.md 2>/dev/null || echo "NO_SOURCE_ATTRIBUTION"
+echo "11. Priority 4 halts:"
+grep -E "locator unresolvable from registry/source/MCP" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_PRIORITY4_HALTS"
+echo "=== BUG-DETECTIVE CHECKLIST END ==="
+```
+**Rules:**
+- Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+- If any output shows a problem (REPORT_NOT_WRITTEN, NO_CLASSIFICATIONS_FOUND), fix it before returning.
+- If output shows expected "not found" results (e.g., NO_MCP_EVIDENCE when no E2E failures existed), that is fine — the point is you RAN the command instead of assuming the answer.
+- Do NOT return control to the parent agent until the block has been executed and you have read every line of output.

package/agents/qaa-codebase-mapper.md CHANGED

@@ -3,6 +3,8 @@ name: qaa-codebase-mapper
 description: Explores codebase and writes QA-focused analysis documents. Spawned by /qa-analyze or qa-start pipeline. Produces testing-oriented architecture, conventions, and risk documents.
 tools: Read, Bash, Grep, Glob, Write
 color: cyan
+skills:
+  - qa-repo-analyzer
 ---
 <role>
@@ -933,3 +935,4 @@ Test the highest-risk gaps first:
 - [ ] No secrets or forbidden file contents leaked
 - [ ] Confirmation returned (not document contents)
 </success_criteria>

package/agents/qaa-e2e-runner.md CHANGED

@@ -1,3 +1,10 @@
+---
+name: qaa-e2e-runner
+description: Runs E2E tests against live app, fixes locator mismatches
+skills:
+  - qa-bug-detective
+---
 <purpose>
 Run generated E2E test files against a live application using the Playwright browser tools. Navigate pages, capture real locators from the accessibility snapshot, compare them against the locators in generated test files, fix mismatches, and loop until tests pass or failures are classified as application bugs. This agent bridges the gap between "tests exist on disk" and "tests actually pass against the real app."
@@ -400,6 +407,39 @@ E2E_RUNNER_COMPLETE:
 | Test runner not found | No playwright/cypress installed | Report as ENVIRONMENT ISSUE with install instructions |
 </error_handling>
+## Non-negotiable rules
+These rules are hardcoded in the agent body because they MUST NOT be skipped under any circumstance, regardless of whether the skill is loaded or not.
+### Playwright MCP usage is mandatory (NOT optional)
+This agent's core job is to run tests against a **live browser**. That requires the Playwright MCP server. The agent MUST NOT classify a test run as complete based on static analysis, log inspection, or dry-run output alone.
+1. **Every E2E test execution MUST go through Playwright MCP tools** — `mcp__playwright__browser_navigate`, `mcp__playwright__browser_snapshot`, `mcp__playwright__browser_click`, `mcp__playwright__browser_fill_form`, `mcp__playwright__browser_take_screenshot`, `mcp__playwright__browser_close`. If these tools are not available, halt and return `ENVIRONMENT_ISSUE: Playwright MCP not connected` instead of faking execution.
+2. **Minimum required MCP operations per run:** at least one `browser_navigate` (to the app URL), at least one `browser_snapshot` (for DOM inspection), at least one `browser_take_screenshot` (for visual evidence), and exactly one `browser_close` at the end of the session.
+3. **Persist evidence of MCP usage** to `.qa-output/mcp-evidence/qaa-e2e-runner-session.md`. The file MUST contain:
+   - `session_start: {ISO timestamp}` and `session_end: {ISO timestamp}`
+   - `urls_navigated:` list of every URL passed to `browser_navigate`
+   - `snapshots_taken:` count of `browser_snapshot` calls with route per snapshot
+   - `screenshots_taken:` list of screenshot file paths (also written to `.qa-output/screenshots/`)
+   - `interactions:` list of clicks/fills with the element identifier
+   - `browser_closed: true` confirming `browser_close` was called
+4. **If the evidence file is missing, empty, or lists zero `browser_navigate` calls, the run is INVALID** — do not write E2E_RUN_REPORT.md and return a hard failure instead.
+### Locator resolution priority when fixing failing tests — invention is forbidden
+When a test fails due to a locator mismatch and the fix loop needs to update the POM or test file with a corrected locator, the runner MUST follow this priority chain. **Never invent a `data-testid` or selector that does not exist in one of the sources below.**
+**Priority 1 — Locator Registry:** Check `.qa-output/locators/LOCATOR_REGISTRY.md` and `.qa-output/locators/{feature}.locators.md` for the target element. If present, use it verbatim.
+**Priority 2 — Codebase source:** If not in registry, `grep -rE "data-testid=|aria-label=|id=\"" <frontend_source_dir>` for the page under test. If found, use verbatim and persist to registry.
+**Priority 3 — Live DOM via Playwright MCP:** If not in registry AND not in source, call `mcp__playwright__browser_snapshot()` on the failing route and extract the real locator from the snapshot. Persist to registry with `tier` classification.
+**Priority 4 — HALT (never invent):** If nothing is resolvable, mark the test as `BLOCKED: locator unresolvable` in E2E_RUN_REPORT.md with the unresolved element name. Do NOT fabricate a locator to "make the test pass". Do NOT replace the failing locator with a random guess.
+Every locator written to a POM/test during fix loops MUST have a source attribution in the MCP evidence file: `source: registry | codebase | mcp`. Anything else is invention and the fix is invalid.
 <success_criteria>
 E2E runner is complete when:
@@ -414,3 +454,49 @@ E2E runner is complete when:
 - [ ] Locator registry updated with all real locators discovered during execution (`.qa-output/locators/`)
 - [ ] Browser session was closed
 </success_criteria>
+## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+```bash
+echo "=== E2E-RUNNER CHECKLIST START ==="
+echo "1. E2E Run Report:"
+ls .qa-output/E2E_RUN_REPORT.md 2>/dev/null || echo "REPORT_NOT_WRITTEN"
+echo "2. Locator Registry:"
+ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+echo "3. Screenshots:"
+ls .qa-output/screenshots/ 2>/dev/null || echo "NO_SCREENSHOTS"
+echo "4. Modified POMs/tests in working tree:"
+git status 2>/dev/null | grep -E "modified:.*(pages/|tests/)" || echo "NO_MODIFIED_FILES"
+echo "5. MY_PREFERENCES.md:"
+cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+echo "6. MCP evidence file:"
+ls .qa-output/mcp-evidence/qaa-e2e-runner-session.md 2>/dev/null || echo "NO_MCP_EVIDENCE"
+echo "7. MCP session boundaries:"
+grep -E "session_start:|session_end:|browser_closed: true" .qa-output/mcp-evidence/qaa-e2e-runner-session.md 2>/dev/null || echo "NO_MCP_SESSION"
+echo "8. URLs navigated via MCP:"
+grep -cE "^  - http|^  - /" .qa-output/mcp-evidence/qaa-e2e-runner-session.md 2>/dev/null || echo "NO_URLS_NAVIGATED"
+echo "9. Snapshot + screenshot operations:"
+grep -cE "browser_snapshot|browser_take_screenshot" .qa-output/mcp-evidence/qaa-e2e-runner-session.md 2>/dev/null || echo "NO_SNAPSHOT_OPS"
+echo "10. Locator source attribution:"
+grep -cE "source: registry|source: codebase|source: mcp" .qa-output/mcp-evidence/qaa-e2e-runner-session.md 2>/dev/null || echo "NO_SOURCE_ATTRIBUTION"
+echo "11. Unresolvable locator blocks:"
+grep -E "BLOCKED: locator unresolvable" .qa-output/E2E_RUN_REPORT.md 2>/dev/null || echo "NO_BLOCKED_LOCATORS"
+echo "12. Pass/fail counts in report:"
+grep -E "PASS|FAIL|Tests run|[0-9]+ passed|[0-9]+ failed" .qa-output/E2E_RUN_REPORT.md 2>/dev/null | head -5 || echo "NO_PASS_FAIL_COUNTS"
+echo "13. Locator Registry entries:"
+grep -cE "^- |^\* " .qa-output/locators/LOCATOR_REGISTRY.md 2>/dev/null || echo "NO_REGISTRY_ENTRIES"
+echo "14. Locator tier classification:"
+grep -E "tier: 1|tier: 2|tier: 3|tier: 4" .qa-output/locators/*.md 2>/dev/null | head -10 || echo "NO_TIER_CLASSIFICATION"
+echo "15. Validator report (input):"
+ls .qa-output/VALIDATION_REPORT.md 2>/dev/null || echo "NO_VALIDATION_REPORT"
+echo "=== E2E-RUNNER CHECKLIST END ==="
+```
+**Rules:**
+- Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+- If any output shows a problem (REPORT_NOT_WRITTEN, NO_MCP_EVIDENCE when browser was used), fix it before returning.
+- If output shows expected "not found" results (e.g., NO_SCREENSHOTS when tests all passed first try), that is fine — the point is you RAN the command instead of assuming the answer.
+- Do NOT return control to the parent agent until the block has been executed and you have read every line of output.
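
A note on the checks above that pipe into `head` (items 12 and 14): in bash without `pipefail`, a pipeline's exit status is that of its last command, and `head` exits 0 even on empty input, so the `|| echo "..."` fallback after `| head -5` would not fire when `grep` matches nothing. The following is a minimal sketch of one way to keep both the line limit and the fallback. It assumes a hypothetical helper named `first_matches_or`, which is not part of qaa-agent, and the sample file contents are invented for the demonstration.

```shell
# Hypothetical helper (not part of qaa-agent): print up to LIMIT matching
# lines, or FALLBACK when nothing matches or the file is missing.
first_matches_or() {
  local fallback="$1" limit="$2" pattern="$3" file="$4"
  local out
  # Capture the output first, so the decision depends on whether anything
  # matched rather than on the exit status of head.
  out="$(grep -E "$pattern" "$file" 2>/dev/null | head -n "$limit")"
  if [ -n "$out" ]; then
    printf '%s\n' "$out"
  else
    printf '%s\n' "$fallback"
  fi
}

# Demonstration on a throwaway file with made-up report lines.
tmp="$(mktemp)"
printf 'Tests run: 3\n2 passed\n1 failed\n' > "$tmp"
first_matches_or "NO_PASS_FAIL_COUNTS" 5 'Tests run|[0-9]+ passed|[0-9]+ failed' "$tmp"
first_matches_or "NO_BLOCKED_LOCATORS" 5 'BLOCKED: locator unresolvable' "$tmp"
rm -f "$tmp"
```

The second call has no matching lines, so it prints the fallback marker instead of empty output. Enabling `set -o pipefail` would be another option, at the cost of changing the exit status of every other pipeline in the script.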

package/agents/qaa-executor.md CHANGED

@@ -1,3 +1,11 @@
+---
+name: qaa-executor
+description: Generates test files, POMs, fixtures and configs
+skills:
+  - qa-template-engine
+  - qa-self-validator
+---
 <purpose>
 Read the generation plan (produced by qaa-planner), TEST_INVENTORY.md, and CLAUDE.md to produce actual test files, page object models, fixtures, and configuration files. This is the most complex agent in the pipeline -- it handles framework detection, BasePage scaffolding, POM generation following strict rules, test spec writing with concrete assertions, and per-file atomic commits for maximum traceability. The executor does not decide WHAT to test (that is the planner's job) -- it decides HOW to write each test file following CLAUDE.md standards and qa-template-engine patterns.
@@ -593,6 +601,48 @@ EXECUTOR_COMPLETE:
 ```
 </output>
+## Non-negotiable rules
+These rules are hardcoded in the agent body because they MUST NOT be skipped under any circumstance, regardless of whether the skill is loaded or not.
+### Locator resolution priority — locator invention is forbidden
+**Before writing any locator (Tier 1 `data-testid`, Tier 2 role/label, Tier 3 CSS) in a POM or E2E test, the executor MUST follow this exact priority chain. Proposing a value that exists in none of the sources below is a critical failure.**
+**Priority 1 — Locator Registry (first check):**
+- Run `ls .qa-output/locators/LOCATOR_REGISTRY.md` and `ls .qa-output/locators/{feature}.locators.md`.
+- `grep` the target element (by page + semantic description) in those files.
+- If a locator exists → USE IT VERBATIM. Do not modify, do not propose an alternative.
+**Priority 2 — Codebase source (second check, only if not in registry):**
+- `grep -rE "data-testid=|aria-label=|id=\"" <frontend_source_dir>` for the target page/component file.
+- If `data-testid`, stable `id`, or semantic `aria-label` is found in source → USE IT VERBATIM. Persist to registry so future runs hit Priority 1.
+**Priority 3 — Playwright MCP live DOM (third check, only if not in registry AND not in source):**
+- Call `mcp__playwright__browser_navigate({ url: "{app_url}/{route}" })` then `mcp__playwright__browser_snapshot()` to read the rendered DOM.
+- Extract the real locator from the snapshot (Tier 1 > Tier 2 > Tier 3 priority per CLAUDE.md).
+- Persist discovered locator to `.qa-output/locators/{feature}.locators.md` and update `LOCATOR_REGISTRY.md` so the next run hits Priority 1.
+**Priority 4 — HALT (never invent):**
+- If registry has no entry, source has no stable attribute, AND (MCP is unavailable OR `app_url` is missing), the agent MUST HALT for that element.
+- Return `BLOCKED: locator unresolvable for {page}:{element} — registry empty, source has no testid/aria, MCP unavailable. Options: (a) run qa-testid to inject, (b) provide app_url, (c) connect Playwright MCP.`
+- Do NOT invent a `data-testid` value. Do NOT propose a CSS selector based on a guess. Do NOT write the POM/test file with placeholder locators.
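
The priority chain above can be sketched as a small shell function. This is an illustration under stated assumptions, not the package's implementation: `resolve_locator` is a hypothetical name, the registry entry format in the demonstration is invented, and the element name is assumed to contain only letters, digits, and hyphens because it is interpolated into a regular expression. Priority 3 needs a live Playwright MCP snapshot, which a shell script cannot take, so the sketch stops and reports the element as unresolved instead of guessing.

```shell
# Hypothetical sketch (not part of qaa-agent): resolve a locator from the
# registry (Priority 1) or from source (Priority 2); never invent one.
resolve_locator() {
  local element="$1" registry_dir="$2" source_dir="$3"
  local hit
  # Priority 1: first registry line that mentions the element, used verbatim.
  hit="$(grep -rhF -- "$element" "$registry_dir" 2>/dev/null | head -n 1)"
  if [ -n "$hit" ]; then
    printf 'source: registry | %s\n' "$hit"
    return 0
  fi
  # Priority 2: a data-testid, aria-label, or id attribute in source whose
  # value contains the element name.
  hit="$(grep -rhoE "(data-testid|aria-label|id)=\"[^\"]*${element}[^\"]*\"" "$source_dir" 2>/dev/null | head -n 1)"
  if [ -n "$hit" ]; then
    printf 'source: codebase | %s\n' "$hit"
    return 0
  fi
  # Priorities 3 and 4 are left to the agent: take an MCP snapshot, or halt.
  printf 'UNRESOLVED: %s (needs MCP snapshot, otherwise BLOCKED)\n' "$element"
  return 1
}

# Demonstration with throwaway directories and invented entries.
demo="$(mktemp -d)"
mkdir -p "$demo/locators" "$demo/src"
printf -- '- login-submit: [data-testid="login-submit"] tier: 1\n' > "$demo/locators/LOCATOR_REGISTRY.md"
printf '<button data-testid="cart-checkout">Pay</button>\n' > "$demo/src/Cart.tsx"
resolve_locator login-submit "$demo/locators" "$demo/src"
resolve_locator cart-checkout "$demo/locators" "$demo/src"
resolve_locator promo-banner "$demo/locators" "$demo/src" || true
rm -rf "$demo"
```

Each successful result carries the `source:` attribution that the rules require, and the non-zero return for the unresolved case leaves the caller no value to write into a POM.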
+### Playwright MCP evidence file (mandatory when MCP is used)
+When Priority 3 is invoked (MCP lookup), persist evidence to `.qa-output/mcp-evidence/qaa-executor-session.md` with:
+- `session_start: {ISO timestamp}` and `session_end: {ISO timestamp}`
+- `pages_validated:` list of `{page_name, url, locators_discovered_count, source: registry|codebase|mcp}`
+- `snapshots_taken:` count + route
+- `locators_discovered_via_mcp:` list of locators found via MCP (these MUST also appear in `.qa-output/locators/`)
+- `priority1_hits:` count (reused from registry)
+- `priority2_hits:` count (extracted from source)
+- `priority3_hits:` count (discovered via MCP)
+- `priority4_halts:` list of unresolvable elements (if any)
+- `browser_closed: true`
+**If E2E/POM files were generated AND the evidence file shows `priority3_hits > 0` but the registry was not updated, the generation is INVALID** — delete files and re-run. Every MCP-discovered locator MUST be persisted.
 <quality_gate>
 Before considering the executor's work complete, verify ALL of the following.
@@ -649,3 +699,51 @@ The executor agent has completed successfully when:
 8. All quality gate checks pass
 9. Return values provided to orchestrator: files_created, total_files, commit_count, features_covered, test_case_count
 </success_criteria>
+## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+```bash
+echo "=== EXECUTOR CHECKLIST START ==="
+echo "1. Generated test files, POMs, fixtures:"
+ls tests/ pages/ fixtures/ 2>/dev/null || echo "NO_TEST_FILES_FOUND"
+echo "2. BasePage inheritance:"
+grep -rE "class BasePage|extends BasePage" pages/ 2>/dev/null || echo "NO_BASEPAGE_FOUND"
+echo "3. Test framework config:"
+ls *.config.* 2>/dev/null || echo "NO_CONFIG_FOUND"
+echo "4. MY_PREFERENCES.md:"
+cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+echo "5. Locator Registry:"
+ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+echo "6. Generation plan and test inventory inputs:"
+ls .qa-output/GENERATION_PLAN.md .qa-output/TEST_INVENTORY.md 2>/dev/null || echo "INPUTS_NOT_FOUND"
+echo "7. Test case count from inventory:"
+grep -cE "^\| (UT|INT|API|E2E)-" .qa-output/TEST_INVENTORY.md 2>/dev/null || echo "NO_TEST_CASES_COUNTED"
+echo "8. Generation plan tasks consumed:"
+grep -E "task_id|files_to_create" .qa-output/GENERATION_PLAN.md 2>/dev/null | head -20 || echo "NO_PLAN_TASKS"
+echo "9. Codebase map documents:"
+ls .qa-output/codebase/ 2>/dev/null || echo "NO_CODEBASE_MAP"
+echo "10. CODE_PATTERNS.md patterns:"
+grep -E "pattern|convention|style" .qa-output/codebase/CODE_PATTERNS.md 2>/dev/null | head -5 || echo "NO_CODE_PATTERNS"
+echo "11. Tier 1 locator usage in generated code:"
+grep -cE "data-testid|getByTestId|getByRole|findByRole" tests/ pages/ -r 2>/dev/null || echo "NO_TIER1_LOCATORS"
+echo "12. MCP evidence file:"
+ls .qa-output/mcp-evidence/qaa-executor-session.md 2>/dev/null || echo "NO_MCP_EVIDENCE"
+echo "13. Locator priority chain hits:"
+grep -E "priority1_hits:|priority2_hits:|priority3_hits:|priority4_halts:" .qa-output/mcp-evidence/qaa-executor-session.md 2>/dev/null || echo "NO_PRIORITY_HITS"
+echo "14. Locator source attribution:"
+grep -cE "source: registry|source: codebase|source: mcp" .qa-output/mcp-evidence/qaa-executor-session.md 2>/dev/null || echo "NO_SOURCE_ATTRIBUTION"
+echo "15. MCP session boundaries:"
+grep -E "session_start:|browser_closed: true" .qa-output/mcp-evidence/qaa-executor-session.md 2>/dev/null || echo "NO_MCP_SESSION"
+echo "16. Priority 4 halts (unresolvable locators):"
+grep -E "BLOCKED: locator unresolvable" .qa-output/mcp-evidence/qaa-executor-session.md 2>/dev/null || echo "NO_PRIORITY4_HALTS"
+echo "=== EXECUTOR CHECKLIST END ==="
+```
+**Rules:**
+- Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+- If any output shows a problem (NO_TEST_FILES_FOUND after generation, INPUTS_NOT_FOUND), fix it before returning.
+- If output shows expected "not found" results (e.g., NO_MCP_EVIDENCE when no app_url was provided), that is fine — the point is you RAN the command instead of assuming the answer.
+- Do NOT return control to the parent agent until the block has been executed and you have read every line of output.
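
The executor's INVALID condition (Priority 3 hits recorded but the registry not updated) is stated in prose and is not covered by a dedicated checklist item. A rough sketch of how it could be checked follows. It assumes the evidence file contains a line of the form `priority3_hits: <integer>` and treats a missing or empty registry file as "not updated", which is only a proxy for the real rule. `check_mcp_persistence` is a hypothetical name, not part of qaa-agent.

```shell
# Hypothetical check (not part of qaa-agent). Assumes a line such as
# "priority3_hits: 2" in the evidence file; a missing file counts as 0 hits.
check_mcp_persistence() {
  local evidence="$1" registry="$2"
  local hits
  hits="$(sed -nE 's/^[-[:space:]]*priority3_hits:[[:space:]]*([0-9]+).*/\1/p' "$evidence" 2>/dev/null | head -n 1)"
  hits="${hits:-0}"
  # Proxy for "registry was not updated": the file is absent or empty.
  if [ "$hits" -gt 0 ] && [ ! -s "$registry" ]; then
    echo "INVALID: $hits MCP-discovered locator(s) but registry is missing or empty"
    return 1
  fi
  echo "OK: priority3_hits=$hits"
}

# Demonstration with throwaway files.
demo="$(mktemp -d)"
printf 'priority1_hits: 4\npriority3_hits: 2\n' > "$demo/session.md"
check_mcp_persistence "$demo/session.md" "$demo/LOCATOR_REGISTRY.md" || true
printf -- '- login-submit\n' > "$demo/LOCATOR_REGISTRY.md"
check_mcp_persistence "$demo/session.md" "$demo/LOCATOR_REGISTRY.md"
rm -rf "$demo"
```

A stricter version would compare each entry under `locators_discovered_via_mcp:` against the registry contents. That depends on the exact entry format, which this diff does not show.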