qaa-agent 1.7.4 → 1.8.1

package/.mcp.json CHANGED
@@ -3,6 +3,10 @@
       "playwright": {
         "command": "npx",
         "args": ["@playwright/mcp@latest"]
+      },
+      "context7": {
+        "command": "npx",
+        "args": ["-y", "@upstash/context7-mcp@latest"]
       }
     }
   }
package/CHANGELOG.md CHANGED
@@ -3,6 +3,46 @@
3
3
 
4
4
  All notable changes to QAA (QA Automation Agent) are documented here.
5
5
 
6
+ ## [1.8.1] - 2026-04-16
7
+
8
+ ### Added
9
+
10
+ - **Context7 MCP integration** — `@upstash/context7-mcp` is now bundled alongside Playwright MCP. The installer registers both MCP servers in the user-scope config (`~/.claude.json`) so they're available in every project on the machine, not just in the QAA repo. Context7 gives every QAA agent on-demand access to up-to-date library documentation (Playwright, Cypress, Jest, Vitest, pytest, and any other framework), keeping generated tests aligned with current APIs instead of outdated training data.
11
+ - **`bin/install.cjs` installer script** — the file was referenced in `package.json` but didn't actually exist on npm, causing `npx qaa-agent` to fail silently (`No bin file found at bin/install.cjs`). The installer now performs three steps on every run: (1) copies agents, commands, skills, templates, workflows, docs, and config files into the chosen scope (`~/.claude/qaa` for global, `./.claude/qaa` for local), (2) registers both MCP servers in `~/.claude.json` with idempotency — existing entries are not duplicated, and (3) deep-merges the QAA permissions into the user's `settings.json` without overwriting their existing settings.
12
+
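The idempotent registration described in step (2) can be sketched as follows. This is a minimal illustration, not the actual `bin/install.cjs` code: the helper name `registerMcpServers`, the `mcpServers` key, and the sample `theme` setting are assumptions made for the example. Only the two server definitions are taken from the `.mcp.json` hunk in this diff.

```javascript
// Hypothetical helper, not the real installer code.
// Adds a server to config.mcpServers only when no entry with that name exists.
function registerMcpServers(config, servers) {
  const merged = { ...config, mcpServers: { ...(config.mcpServers || {}) } };
  const added = [];
  for (const [name, definition] of Object.entries(servers)) {
    if (Object.prototype.hasOwnProperty.call(merged.mcpServers, name)) {
      continue; // existing entry wins: nothing is duplicated or overwritten
    }
    merged.mcpServers[name] = definition;
    added.push(name);
  }
  return { config: merged, added };
}

// Server definitions as they appear in the .mcp.json hunk.
const QAA_SERVERS = {
  playwright: { command: "npx", args: ["@playwright/mcp@latest"] },
  context7: { command: "npx", args: ["-y", "@upstash/context7-mcp@latest"] },
};

// Running the registration twice: the second run has nothing left to add.
const first = registerMcpServers({ theme: "dark" }, QAA_SERVERS);
const second = registerMcpServers(first.config, QAA_SERVERS);
console.log(first.added);  // [ 'playwright', 'context7' ]
console.log(second.added); // []
```

Returning a new object instead of mutating the input keeps the sketch easy to test. A real installer would additionally read and write the JSON file on disk.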
13
+ ### Changed
14
+
15
+ - **MCP registration is now user-scope by default** — previously MCPs were defined only in the project-level `.mcp.json`, which meant they only activated when the user opened the QAA repo itself. They now register in `~/.claude.json`, making them available in every Claude Code project on the user's machine. The project-level `.mcp.json` is kept for QAA development purposes but is no longer the source of truth for end users.
16
+
17
+ ### Fixed
18
+
19
+ - **Silent `npx qaa-agent` failure** — users who installed QAA via npm before this release did not get Playwright or Context7 MCPs registered because the installer script was missing from the published package. Publishing 1.8.1 restores the expected behavior: a single `npx qaa-agent` command copies all files and registers both MCPs globally.
20
+
21
+ ## [1.8.0] - 2026-04-13
22
+
23
+ ### Added
24
+
25
+ - **Active verification checklist in every agent** — all 8 pipeline agents now end their body with a `## Before completing any task, verify each item actively:` section that forces the agent to run real `ls` + `cat` + `grep` commands against `.qa-output/` artifacts, the Locator Registry, codebase map documents, and `MY_PREFERENCES.md` before closing the task. The output of those commands lands in the subagent's context (recency effect), so the model cannot skip reading inputs or leave outputs unwritten without the verification failing.
26
+ - **`skills:` declared in YAML frontmatter for every agent** — `qaa-analyzer`, `qaa-planner`, `qaa-executor`, `qaa-validator`, `qaa-e2e-runner`, `qaa-bug-detective`, `qaa-testid-injector`, `qaa-codebase-mapper`, `qa-pipeline-orchestrator`. Claude Code now injects the matching SKILL.md content at the start of the subagent's context when the Task tool spawns it. Previously subagents spawned with empty context and ignored the skill entirely.
27
+ - **Non-negotiable rules section in `qaa-bug-detective`** — explicit rules for Locator Registry persistence and MY_PREFERENCES.md updates, placed mid-body as redundant reinforcement between the frontmatter (start) and the active checklist (end).
28
+ - **`MY_PREFERENCES.md` reads propagated across slash commands** — `qa-create-test`, `qa-fix`, `qa-audit`, `qa-map` now pass `~/.claude/qaa/MY_PREFERENCES.md` to every spawned agent via `files_to_read`.
29
+ - **Locator Registry reads propagated** — `qa-fix` and `qa-audit` now pass `.qa-output/locators/` to bug-detective, e2e-runner, and validator subagents.
30
+ - **Playwright MCP usage is now non-negotiable in 4 agents** — `qaa-e2e-runner`, `qaa-testid-injector`, `qaa-bug-detective`, `qaa-executor` now hardcode Non-negotiable rules in their body that make live browser interaction via Playwright MCP **mandatory** (not optional) under the appropriate conditions. Previously agents sometimes skipped MCP calls even when the skill described them, because the description was advisory rather than enforced.
31
+ - **MCP evidence files** at `.qa-output/mcp-evidence/{agent-name}-session.md` — every MCP-using agent now writes a structured evidence file per session logging `session_start`, `session_end`, URLs navigated, snapshots/screenshots taken, interactions performed, and `browser_closed: true`. The active verification checklist at the end of each agent runs `ls` + `grep` on this evidence file; missing or empty file = invalid run = hard failure.
32
+ - **Skip-reason tracking** — when MCP is legitimately skipped (no `app_url`, non-E2E failure, MCP not connected), agents must document the skip reason in their primary report (TESTID_AUDIT_REPORT.md / FAILURE_CLASSIFICATION_REPORT.md). Silent skips are no longer permitted.
33
+ - **Locator resolution priority chain — invention is forbidden** — `qaa-executor`, `qaa-e2e-runner`, and `qaa-bug-detective` now enforce a strict priority order when writing any locator: (1) Locator Registry first, (2) frontend source code `grep` second, (3) Playwright MCP live DOM snapshot third, (4) HALT if nothing resolvable. Agents MUST NOT invent `data-testid` values or guess CSS selectors. Every locator written to a generated file requires `source: registry | codebase | mcp` attribution in the MCP evidence file — anything else triggers file deletion or revert.
34
+ - **Priority hit counts logged** — MCP evidence files now track `priority1_hits` (registry reuse), `priority2_hits` (source extraction), `priority3_hits` (MCP discovery), and `priority4_halts` (unresolvable elements), giving a full audit trail of where every locator came from.
35
+
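The priority chain and the hit counters from the last two entries could look roughly like this. It is a sketch under stated assumptions: `resolveLocator`, the three lookup callbacks, and the sample element names are hypothetical stand-ins for the registry read, the source `grep`, and the Playwright MCP snapshot. Only the counter names and the `registry | codebase | mcp` labels come from the changelog.

```javascript
// Hypothetical sketch of the four-step priority chain.
// Each lookup returns a locator string or null.
function resolveLocator(element, lookups, counters) {
  const chain = [
    ["registry", "priority1_hits"],
    ["codebase", "priority2_hits"],
    ["mcp", "priority3_hits"],
  ];
  for (const [source, counter] of chain) {
    const locator = lookups[source](element);
    if (locator) {
      counters[counter] += 1;
      return { status: "resolved", locator, source };
    }
  }
  counters.priority4_halts += 1; // nothing resolvable: halt, never invent
  return { status: "halt", locator: null, source: null };
}

// Illustrative data only.
const registry = new Map([["login-submit", "[data-testid='login-submit']"]]);
const lookups = {
  registry: (el) => registry.get(el) || null,
  codebase: (el) => (el === "nav-logo" ? "[aria-label='Home']" : null),
  mcp: () => null,
};
const counters = { priority1_hits: 0, priority2_hits: 0, priority3_hits: 0, priority4_halts: 0 };

const fromRegistry = resolveLocator("login-submit", lookups, counters);
const fromCodebase = resolveLocator("nav-logo", lookups, counters);
const unresolved = resolveLocator("unknown-widget", lookups, counters);
```

The chain stops at the first source that yields a locator, so later sources are consulted only when earlier ones come up empty.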
36
+ ### Changed
37
+
38
+ - **Agent reliability pattern: triple reinforcement** — every critical rule is now reinforced three times: (1) `skills:` frontmatter injection at the start of context, (2) `required_reading` + mid-body non-negotiable rules, (3) active `ls`/`cat`/`grep` verification at the end. This closes the "lost in the middle" attention gap documented in long-context LLM research.
39
+ - **`qaa-bug-detective`, `qaa-executor`, `qaa-e2e-runner`, `qaa-validator`** — existing active checklists extended with `.qa-output/` specific items (generation plan, test inventory, codebase map, validation layers, failure classification evidence).
40
+
41
+ ### Fixed
42
+
43
+ - **Subagent skill loss** — when a parent agent spawned a subagent via `Task()`, the subagent ran with fresh context and ignored the skill entirely (it had no way to know a skill existed). Declaring `skills:` in the YAML frontmatter fixes this at the Claude Code loader level.
44
+ - **Artifact-read drift** — agents would sometimes reference `.qa-output/` artifacts in their reasoning without actually reading them. The active `grep` on specific content (e.g. "RISK_MAP HIGH items", "VALIDATION_REPORT confidence level") forces real consumption.
45
+
6
46
  ## [1.7.0] - 2026-04-02
7
47
 
8
48
  ### Added
package/README.md CHANGED
@@ -43,7 +43,9 @@ npx qaa-agent
43
43
  The interactive installer:
44
44
 
45
45
  1. Copies agents, commands, skills, templates, and workflows into your runtime directory
46
- 2. Configures the [Playwright MCP](https://github.com/anthropics/mcp-playwright) server in your user-scope config (`~/.claude.json`) so it's available in **all projects**
46
+ 2. Registers **two MCP servers** in your user-scope config (`~/.claude.json`) so they're available in **all projects**:
47
+ - [Playwright MCP](https://www.npmjs.com/package/@playwright/mcp) — live browser control for E2E tests and locator extraction
48
+ - [Context7 MCP](https://www.npmjs.com/package/@upstash/context7-mcp) — up-to-date library documentation on demand
47
49
  3. Merges required permissions into `settings.json`
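Step 3 can be illustrated with a small deep-merge sketch. The function `mergeSettings`, the `permissions.allow` / `permissions.deny` shape, and the sample values are assumptions for illustration, not the installer's actual code. The merge rule shown (union arrays, recurse into objects, never replace a value the user already set) is one plausible reading of "merges without overwriting".

```javascript
// Hypothetical helper, not the real installer code.
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function mergeSettings(existing, additions) {
  const result = { ...existing };
  for (const [key, incoming] of Object.entries(additions)) {
    const current = result[key];
    if (Array.isArray(current) && Array.isArray(incoming)) {
      result[key] = [...new Set([...current, ...incoming])]; // union, user entries first
    } else if (isPlainObject(current) && isPlainObject(incoming)) {
      result[key] = mergeSettings(current, incoming);
    } else if (current === undefined) {
      result[key] = incoming; // fill only keys the user has not set
    }
    // otherwise the user's existing value is kept as-is
  }
  return result;
}

// Illustrative values only.
const userSettings = { model: "opus", permissions: { allow: ["Bash(ls:*)"] } };
const qaaSettings = {
  model: "sonnet",
  permissions: { allow: ["Bash(ls:*)", "Bash(grep:*)"], deny: ["Read(.env)"] },
};
const mergedSettings = mergeSettings(userSettings, qaaSettings);
```

In this sketch the user's `model` stays untouched, the `allow` list gains only the entry it was missing, and `deny` is added because it did not exist before.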
48
50
 
49
51
  **Supported runtimes:** Claude Code, OpenCode
@@ -55,48 +57,34 @@ The interactive installer:
55
57
  - [Node.js](https://nodejs.org/) 18+
56
58
  - [Claude Code](https://docs.anthropic.com/en/docs/claude-code) installed
57
59
 
58
- ### Playwright MCP (required for E2E)
60
+ ### Bundled MCP servers
59
61
 
60
- QAA uses [`@playwright/mcp`](https://www.npmjs.com/package/@playwright/mcp) to open a real browser, extract locators from live pages, run E2E tests, and auto-fix locator mismatches.
62
+ Both MCP servers are **registered automatically** in `~/.claude.json` when you run `npx qaa-agent`. No manual setup required — once installed, they're available in every Claude Code project on your machine.
61
63
 
62
- **You need to install the Playwright MCP server manually in your environment:**
64
+ #### Playwright MCP live browser control

63
65
 
64
- <details>
65
- <summary><strong>VS Code (Claude Code extension)</strong></summary>
66
+ Uses [`@playwright/mcp`](https://www.npmjs.com/package/@playwright/mcp) to:
66
67
 
67
- 1. Open VS Code Settings (`Ctrl+Shift+P` > `Preferences: Open User Settings (JSON)`)
68
- 2. Add the MCP server config:
68
+ - Open a real browser and navigate your running app
69
+ - Extract actual locators (`data-testid`, ARIA roles, labels) from live pages
70
+ - Run E2E tests, capture failures, and auto-fix locator mismatches
71
+ - Build a persistent **Locator Registry** (`.qa-output/locators/`) that caches real locators across features
69
72
 
70
- ```json
71
- {
72
- "claude-code.mcpServers": {
73
- "playwright": {
74
- "command": "npx",
75
- "args": ["@playwright/mcp@latest"]
76
- }
77
- }
78
- }
79
- ```
73
+ #### Context7 MCP — up-to-date library docs
80
74
 
81
- Or add it to your project's `.vscode/mcp.json`:
75
+ Uses [`@upstash/context7-mcp`](https://www.npmjs.com/package/@upstash/context7-mcp) to:
82
76
 
83
- ```json
84
- {
85
- "servers": {
86
- "playwright": {
87
- "command": "npx",
88
- "args": ["@playwright/mcp@latest"]
89
- }
90
- }
91
- }
92
- ```
77
+ - Fetch the latest documentation for Playwright, Cypress, Jest, Vitest, pytest, and any other library the agent is working with
78
+ - Keep generated tests aligned with current framework APIs instead of outdated training data
79
+ - Free tier: ~60 requests/hour, ~3,300 tokens/query
93
80
 
94
- </details>
81
+ #### Verifying the MCPs are connected
95
82
 
96
- <details>
97
- <summary><strong>Claude Code CLI</strong></summary>
83
+ Open Claude Code in any project and type `/mcp`. You should see both `playwright` and `context7` listed as connected.
98
84
 
99
- Add to `~/.claude.json` (user-scope, all projects):
85
+ #### Manual config (fallback)
86
+
87
+ If for any reason the automatic registration fails, you can add the servers manually to `~/.claude.json`:
100
88
 
101
89
  ```json
102
90
  {
@@ -104,21 +92,15 @@ Add to `~/.claude.json` (user-scope, all projects):
104
92
  "playwright": {
105
93
  "command": "npx",
106
94
  "args": ["@playwright/mcp@latest"]
95
+ },
96
+ "context7": {
97
+ "command": "npx",
98
+ "args": ["-y", "@upstash/context7-mcp@latest"]
107
99
  }
108
100
  }
109
101
  }
110
102
  ```
111
103
 
112
- Or add a `.mcp.json` file in your project root for project-scope only.
113
-
114
- </details>
115
-
116
- Once configured, Playwright MCP enables QAA to:
117
- - Open a real browser and navigate your running app
118
- - Extract actual locators (`data-testid`, ARIA roles, labels) from live pages
119
- - Run E2E tests, capture failures, and auto-fix locator mismatches
120
- - Build a persistent **Locator Registry** (`.qa-output/locators/`) that caches real locators across features
121
-
122
104
  ---
123
105
 
124
106
  ## Quick Start
@@ -328,7 +310,7 @@ qaa-agent/
328
310
  bin/ # Installer and CLI tools
329
311
  docs/ # User documentation
330
312
  CLAUDE.md # QA standards (read by every agent)
331
- .mcp.json # Playwright MCP server config
313
+ .mcp.json # Playwright + Context7 MCP server config
332
314
  settings.json # Claude Code permissions
333
315
  ```
334
316
 
@@ -1,3 +1,10 @@
1
+ ---
2
+ name: qa-pipeline-orchestrator
3
+ description: Single orchestrator for the QA automation pipeline
4
+ skills:
5
+ - qa-workflow-documenter
6
+ ---
7
+
1
8
  <purpose>
2
9
  Single orchestrator for the QA automation pipeline. Coordinates all 7 agent types (scanner, analyzer, planner, executor, validator, bug-detective, testid-injector) across 3 workflow options. Owns all pipeline state transitions -- agents never update state directly. The orchestrator sets stage status to 'running' before spawning an agent and 'complete' or 'failed' after the agent returns.
3
10
 
@@ -1376,3 +1383,43 @@ Before this orchestrator is considered complete, verify:
1376
1383
  4. Checkpoints pause when appropriate and auto-approve when safe
1377
1384
  5. Failure in any stage stops the pipeline cleanly with actionable error message
1378
1385
  </success_criteria>
1386
+
1387
+ ## MANDATORY verification — run ALL commands below, no exceptions, no skipping
1388
+
1389
+ Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
1390
+
+ ```bash
+ echo "=== PIPELINE ORCHESTRATOR CHECKLIST START ==="
+ echo "1. Pipeline state file:"
+ ls .planning/STATE.md 2>/dev/null || echo "STATE_FILE_NOT_FOUND"
+ echo "2. Stage status fields:"
+ grep -E "scan_status|analyze_status|generate_status|validate_status|deliver_status" .planning/STATE.md 2>/dev/null || echo "NO_STATUS_FIELDS"
+ echo "3. All .qa-output/ artifacts:"
+ ls .qa-output/ 2>/dev/null || echo "QA_OUTPUT_EMPTY"
+ echo "4. SCAN_MANIFEST.md (always required):"
+ ls .qa-output/SCAN_MANIFEST.md 2>/dev/null || echo "NO_SCAN_MANIFEST"
+ echo "5. Codebase map document count:"
+ ls .qa-output/codebase/ 2>/dev/null | wc -l || echo "NO_CODEBASE_MAP"
+ echo "6. Analyzer artifacts:"
+ ls .qa-output/QA_ANALYSIS.md .qa-output/TEST_INVENTORY.md 2>/dev/null || echo "NO_ANALYZER_ARTIFACTS"
+ echo "7. TestID audit report:"
+ ls .qa-output/TESTID_AUDIT_REPORT.md 2>/dev/null || echo "NO_TESTID_REPORT"
+ echo "8. Generation plan:"
+ ls .qa-output/GENERATION_PLAN.md 2>/dev/null || echo "NO_GENERATION_PLAN"
+ echo "9. Validation report:"
+ ls .qa-output/VALIDATION_REPORT.md 2>/dev/null || echo "NO_VALIDATION_REPORT"
+ echo "10. E2E + bug-detective reports:"
+ ls .qa-output/E2E_RUN_REPORT.md .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_E2E_REPORTS"
+ echo "11. State transitions with timestamps:"
+ grep -cE "^- |^[0-9]+\." .planning/STATE.md 2>/dev/null || echo "NO_STATE_TRANSITIONS"
+ echo "12. MY_PREFERENCES.md:"
+ cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+ echo "=== PIPELINE ORCHESTRATOR CHECKLIST END ==="
+ ```
1419
+
1420
+ **Rules:**
1421
+ - Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
1422
+ - If any output shows a problem (STATE_FILE_NOT_FOUND, NO_SCAN_MANIFEST), fix it before returning.
1423
+ - If output shows expected "not found" results (e.g., NO_TESTID_REPORT when no frontend was detected), that is fine — the point is you RAN the command instead of assuming the answer.
1424
+ - Do NOT return control to the user until the block has been executed and you have read every line of output.
1425
+
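The rules above separate sentinels that must be fixed from "not found" results that are acceptable. A reader-side check of the block's output might be sketched like this. The function `reviewChecklist` is hypothetical, and treating only `STATE_FILE_NOT_FOUND` and `NO_SCAN_MANIFEST` as blocking is an assumption based on the two examples the rules name. The sentinel strings themselves are copied from the checklist block.

```javascript
// Sentinel strings echoed by the orchestrator checklist block.
const KNOWN_SENTINELS = new Set([
  "STATE_FILE_NOT_FOUND", "NO_STATUS_FIELDS", "QA_OUTPUT_EMPTY", "NO_SCAN_MANIFEST",
  "NO_CODEBASE_MAP", "NO_ANALYZER_ARTIFACTS", "NO_TESTID_REPORT", "NO_GENERATION_PLAN",
  "NO_VALIDATION_REPORT", "NO_E2E_REPORTS", "NO_STATE_TRANSITIONS", "FILE_NOT_FOUND",
]);
// Assumption: only the two sentinels named in the rules are treated as blocking.
const BLOCKING = new Set(["STATE_FILE_NOT_FOUND", "NO_SCAN_MANIFEST"]);
const START = "=== PIPELINE ORCHESTRATOR CHECKLIST START ===";
const END = "=== PIPELINE ORCHESTRATOR CHECKLIST END ===";

// Hypothetical helper: classifies captured checklist output.
function reviewChecklist(output) {
  const lines = output.split("\n").map((line) => line.trim());
  const ran = lines.includes(START) && lines.includes(END); // proof the whole block executed
  const sentinels = lines.filter((line) => KNOWN_SENTINELS.has(line));
  const blocking = sentinels.filter((name) => BLOCKING.has(name));
  const informational = sentinels.filter((name) => !BLOCKING.has(name));
  return { ran, blocking, informational, ok: ran && blocking.length === 0 };
}

// Illustrative output: a missing scan manifest and no frontend detected.
const sampleOutput = [
  START,
  "4. SCAN_MANIFEST.md (always required):",
  "NO_SCAN_MANIFEST",
  "7. TestID audit report:",
  "NO_TESTID_REPORT",
  END,
].join("\n");
const review = reviewChecklist(sampleOutput);
```

Checking for both the START and END markers mirrors the intent of the rules: the output has to show that the entire block ran, not just part of it.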
@@ -1,3 +1,10 @@
1
+ ---
2
+ name: qaa-analyzer
3
+ description: Analyzes scanned repo to produce QA analysis and test inventory
4
+ skills:
5
+ - qa-repo-analyzer
6
+ ---
7
+
1
8
  <purpose>
2
9
  Analyze a scanned repository to produce QA_ANALYSIS.md and TEST_INVENTORY.md -- the two primary analysis artifacts that drive all downstream test planning and generation. Consumes SCAN_MANIFEST.md (produced by the scanner agent) and CLAUDE.md (QA standards) to produce a comprehensive testability report with architecture overview, risk assessment, top 10 unit test targets, API contract targets, and a testing pyramid distribution tailored to the specific repository. Produces a pyramid-based test case inventory where every test case has a unique ID, specific target, concrete inputs, explicit expected outcome with exact values, and priority. Optionally produces QA_REPO_BLUEPRINT.md for Option 1 (dev-only) workflows when no existing QA repository exists. Spawned by the orchestrator after the scanner completes successfully via Task(subagent_type='qaa-analyzer').
3
10
  </purpose>
@@ -537,3 +544,37 @@ The analyzer agent has completed successfully when:
537
544
  5. All artifacts are committed via `node bin/qaa-tools.cjs commit`
538
545
  6. Return to orchestrator: file paths, total test count, pyramid breakdown (unit/integration/api/e2e counts), risk count (high/medium/low)
539
546
  </success_criteria>
547
+
548
+ ## MANDATORY verification — run ALL commands below, no exceptions, no skipping
549
+
550
+ Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
551
+
+ ```bash
+ echo "=== ANALYZER CHECKLIST START ==="
+ echo "1. SCAN_MANIFEST.md (input):"
+ ls .qa-output/SCAN_MANIFEST.md 2>/dev/null || echo "SCAN_MANIFEST_NOT_FOUND"
+ echo "2. SCAN_MANIFEST content preview:"
+ head -50 .qa-output/SCAN_MANIFEST.md 2>/dev/null || echo "SCAN_MANIFEST_EMPTY"
+ echo "3. Codebase map documents:"
+ ls .qa-output/codebase/ 2>/dev/null || echo "NO_CODEBASE_MAP"
+ echo "4. RISK_MAP.md risks:"
+ grep -E "^## |HIGH|MEDIUM|LOW" .qa-output/codebase/RISK_MAP.md 2>/dev/null | head -20 || echo "NO_RISK_MAP"
+ echo "5. CRITICAL_PATHS.md flows:"
+ grep -c "^- \|^[0-9]\+\." .qa-output/codebase/CRITICAL_PATHS.md 2>/dev/null || echo "NO_CRITICAL_PATHS"
+ echo "6. TEST_SURFACE.md entry points:"
+ grep -E "function|class|method" .qa-output/codebase/TEST_SURFACE.md 2>/dev/null | head -10 || echo "NO_TEST_SURFACE"
+ echo "7. Locator Registry:"
+ ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+ echo "8. Output artifacts:"
+ ls .qa-output/QA_ANALYSIS.md .qa-output/TEST_INVENTORY.md 2>/dev/null || echo "OUTPUTS_NOT_WRITTEN"
+ echo "9. MY_PREFERENCES.md:"
+ cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+ echo "=== ANALYZER CHECKLIST END ==="
+ ```
574
+
575
+ **Rules:**
576
+ - Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
577
+ - If any output shows a problem (SCAN_MANIFEST_NOT_FOUND, OUTPUTS_NOT_WRITTEN), fix it before returning.
578
+ - If output shows expected "not found" results (e.g., NO_CODEBASE_MAP when mapper hasn't run yet), that is fine — the point is you RAN the command instead of assuming the answer.
579
+ - Do NOT return control to the parent agent until the block has been executed and you have read every line of output.
580
+
@@ -1,3 +1,10 @@
1
+ ---
2
+ name: qaa-bug-detective
3
+ description: Classifies failures and fixes test code errors
4
+ skills:
5
+ - qa-bug-detective
6
+ ---
7
+
1
8
  <purpose>
2
9
  Run generated tests against the actual application and classify every failure into one of four actionable categories: APPLICATION BUG, TEST CODE ERROR, ENVIRONMENT ISSUE, or INCONCLUSIVE. Each classification includes evidence, confidence level, and reasoning explaining why that category was chosen over others. Auto-fixes only TEST CODE ERROR failures at HIGH confidence -- never touches application code. Reads test source files, CLAUDE.md classification rules, and the failure-classification template. Produces FAILURE_CLASSIFICATION_REPORT.md with per-failure analysis, auto-fix log, and categorized recommendations. Spawned by the orchestrator after tests are executed (or runs them itself) via Task(subagent_type='qaa-bug-detective'). This agent actually RUNS the test suite -- it is not static analysis. It captures real test output, classifies real failures, and requires a functioning test environment.
3
10
  </purpose>
@@ -309,6 +316,54 @@ Attempt auto-fixes for eligible failures. Strict eligibility rules apply.
309
316
  **Track all auto-fix attempts** for the Auto-Fix Log section of the report.
310
317
  </step>
311
318
 
319
+ ## Non-negotiable rules
320
+
321
+ These rules are hardcoded in the agent body because they MUST NOT be skipped under any circumstance, regardless of whether the skill is loaded or not.
322
+
323
+ ### Locator Registry persistence
324
+
325
+ After every fix loop iteration where the test **PASSES**:
326
+
327
+ 1. **Save all verified locators** to `.qa-output/locators/` — write a per-feature file `.qa-output/locators/{feature}.locators.md` and update `.qa-output/locators/LOCATOR_REGISTRY.md`.
328
+ 2. **Only save locators that were confirmed working** by a passing test. Do NOT save locators from failing tests — they may be incorrect and would contaminate the registry.
329
+ 3. **Locator format in registry:** Each entry must include: the `data-testid` or selector value, the tier (1-4), the page/component context, and the date verified.
330
+
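The three persistence rules can be read as a guard in front of every registry write. The sketch below is illustrative only: the entry fields, the function `toRegistryRow`, and the pipe-delimited row layout are assumptions, since the real `LOCATOR_REGISTRY.md` format is not shown in this diff.

```javascript
// Hypothetical entry shape and row format.
function toRegistryRow(entry) {
  const { selector, tier, context, verified, testPassed } = entry;
  if (!testPassed) {
    throw new Error("only locators confirmed by a passing test may be saved");
  }
  if (!selector || !context) {
    throw new Error("selector and page/component context are required");
  }
  if (!Number.isInteger(tier) || tier < 1 || tier > 4) {
    throw new Error(`tier must be an integer from 1 to 4, got ${tier}`);
  }
  if (!/^\d{4}-\d{2}-\d{2}$/.test(verified)) {
    throw new Error("verified must be a date in YYYY-MM-DD form");
  }
  return `| ${selector} | ${tier} | ${context} | ${verified} |`;
}

// Illustrative entry only.
const verifiedEntry = {
  selector: "[data-testid='login-submit']",
  tier: 1,
  context: "LoginPage / submit button",
  verified: "2026-04-13",
  testPassed: true,
};
const registryRow = toRegistryRow(verifiedEntry);
```

Rejecting the write outright when `testPassed` is false is one way to keep locators from failing tests out of the registry.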
331
+ ### MY_PREFERENCES.md persistence
332
+
333
+ After every fix where a correction contradicts CLAUDE.md defaults or reveals a user-specific pattern:
334
+
335
+ 1. **Read `~/.claude/qaa/MY_PREFERENCES.md`** if it exists, before producing any output (this is also in `<required_reading>` but repeated here for emphasis).
336
+ 2. **Save new corrections** to `~/.claude/qaa/MY_PREFERENCES.md` so future agent instances inherit the learning.
337
+ 3. Preferences override CLAUDE.md when there is a conflict.
338
+
339
+ ### Playwright MCP reproduction is mandatory for E2E failures
340
+
341
+ When an E2E test fails **and** Playwright MCP server is connected **and** an `app_url` is available, browser reproduction is **required, not optional** — classifying an E2E failure without reproducing it in the live browser produces unreliable APPLICATION BUG vs TEST CODE ERROR classifications.
342
+
343
+ 1. **For each E2E failure in the test run:** call at minimum `mcp__playwright__browser_navigate` (to the failing route), `mcp__playwright__browser_snapshot` (to inspect the real DOM), and `mcp__playwright__browser_take_screenshot` (visual evidence attached to the classification).
344
+ 2. **Skip is only permitted when:** the failure is a unit/API test (not E2E), OR no `app_url` is available, OR Playwright MCP is not connected. The skip MUST be recorded in FAILURE_CLASSIFICATION_REPORT.md under the failure's evidence section with reason (e.g., "MCP unavailable" or "no app_url").
345
+ 3. **Persist evidence of MCP usage** to `.qa-output/mcp-evidence/qaa-bug-detective-session.md` with:
346
+ - `session_start: {ISO timestamp}` and `session_end: {ISO timestamp}`
347
+ - `failures_reproduced:` list of `{test_id, route, classification}`
348
+ - `snapshots_taken:` count + route
349
+ - `screenshots_taken:` list of screenshot paths (evidence for classifications)
350
+ - `browser_closed: true`
351
+ 4. **If E2E failures exist and the evidence file is missing or empty, classifications for those failures are INVALID** — mark them INCONCLUSIVE with reason "MCP reproduction skipped" rather than making up an APPLICATION BUG / TEST CODE ERROR classification.
352
+
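Rule 4 ties the validity of E2E classifications to the evidence file. A minimal sketch of that check follows, assuming the file stores plain `key: value` lines. The functions `checkEvidence` and `applyEvidence`, the `kind` field, and the sample test IDs are hypothetical. The sketch is also stricter than the rule as written, because it requires every listed field rather than just a non-empty file.

```javascript
// Field names taken from the evidence list above.
const REQUIRED_KEYS = [
  "session_start", "session_end", "failures_reproduced",
  "snapshots_taken", "screenshots_taken", "browser_closed",
];

// Hypothetical parser: assumes one `key: value` pair per line.
function checkEvidence(text) {
  const fields = new Map();
  for (const line of (text || "").split("\n")) {
    const match = /^([a-z_]+):\s*(.*)$/.exec(line.trim());
    if (match) fields.set(match[1], match[2]);
  }
  const missing = REQUIRED_KEYS.filter((key) => !fields.has(key));
  const closed = fields.get("browser_closed") === "true";
  return { valid: missing.length === 0 && closed, missing, closed };
}

// Downgrade E2E classifications when the evidence does not hold up.
function applyEvidence(classifications, evidenceText) {
  if (checkEvidence(evidenceText).valid) return classifications;
  return classifications.map((item) =>
    item.kind === "e2e"
      ? { ...item, classification: "INCONCLUSIVE", reason: "MCP reproduction skipped" }
      : item
  );
}

// Illustrative data only.
const completeEvidence = [
  "session_start: 2026-04-13T10:00:00Z",
  "session_end: 2026-04-13T10:05:00Z",
  "failures_reproduced:",
  "  - {test_id: E2E-LOGIN-001, route: /login, classification: TEST CODE ERROR}",
  "snapshots_taken: 1 (/login)",
  "screenshots_taken:",
  "  - .qa-output/screenshots/login.png",
  "browser_closed: true",
].join("\n");
const failures = [
  { test_id: "E2E-LOGIN-001", kind: "e2e", classification: "APPLICATION BUG" },
  { test_id: "UT-CART-003", kind: "unit", classification: "TEST CODE ERROR" },
];
const downgraded = applyEvidence(failures, "");
const kept = applyEvidence(failures, completeEvidence);
```

Unit and API failures pass through unchanged because the reproduction rule applies only to E2E failures.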
353
+ ### Locator resolution priority when auto-fixing TEST CODE ERRORS — invention is forbidden
354
+
355
+ When a failure is classified as `TEST CODE ERROR` (wrong locator) and the agent auto-fixes the test file, the corrected locator MUST come from one of the following sources, in this exact priority order. **The agent MUST NOT invent a new `data-testid` or guess a CSS selector.**
356
+
357
+ **Priority 1 — Locator Registry:** Check `.qa-output/locators/LOCATOR_REGISTRY.md` + `.qa-output/locators/{feature}.locators.md` for the target element.
358
+
359
+ **Priority 2 — Codebase source:** `grep -rE "data-testid=|aria-label=|id=\""` the frontend source for the page where the failure occurred.
360
+
361
+ **Priority 3 — Live DOM via Playwright MCP:** Use `mcp__playwright__browser_snapshot()` on the failing route to extract the real locator. Persist to registry with tier classification.
362
+
363
+ **Priority 4 — HALT:** If nothing is resolvable, do NOT auto-fix. Re-classify the failure as `INCONCLUSIVE` with reason `locator unresolvable from registry/source/MCP`. The fix remains for the developer to address.
364
+
365
+ Every locator written during auto-fix MUST have a source attribution in the MCP evidence file: `source: registry | codebase | mcp`. A locator without attribution is invented and the auto-fix is invalid (revert it).
366
+
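The attribution requirement amounts to an audit over every locator written during auto-fix. As a rough sketch, assuming each written locator is recorded as an object with `file`, `locator`, and `source` fields (a hypothetical shape, as is the function `auditAttribution`):

```javascript
// Allowed attribution values from the rule above.
const ALLOWED_SOURCES = new Set(["registry", "codebase", "mcp"]);

// Hypothetical audit: splits written locators into attributed ones and ones to revert.
function auditAttribution(writtenLocators) {
  const toRevert = writtenLocators.filter((item) => !ALLOWED_SOURCES.has(item.source));
  const bySource = { registry: 0, codebase: 0, mcp: 0 };
  for (const item of writtenLocators) {
    if (ALLOWED_SOURCES.has(item.source)) bySource[item.source] += 1;
  }
  return { valid: toRevert.length === 0, toRevert, bySource };
}

// Illustrative records only.
const written = [
  { file: "pages/login.page.ts", locator: "[data-testid='login-submit']", source: "registry" },
  { file: "pages/cart.page.ts", locator: "[aria-label='Checkout']", source: "mcp" },
  { file: "pages/cart.page.ts", locator: ".btn-primary", source: undefined },
];
const audit = auditAttribution(written);
```

Any record that lands in `toRevert` corresponds to an auto-fix the rule treats as invalid.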
312
367
  <step name="produce_report">
313
368
  Write FAILURE_CLASSIFICATION_REPORT.md matching templates/failure-classification.md exactly (4 required sections).
314
369
 
@@ -477,3 +532,43 @@ The bug detective agent has completed successfully when:
477
532
  8. Return values provided to orchestrator: report_path, total_failures, classification_breakdown, auto_fixes_applied, auto_fixes_verified, commit_hash
478
533
  9. All quality gate checks pass (8 template items + 4 detective-specific items)
479
534
  </success_criteria>
535
+
536
+ ## MANDATORY verification — run ALL commands below, no exceptions, no skipping
537
+
538
+ Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
539
+
+ ```bash
+ echo "=== BUG-DETECTIVE CHECKLIST START ==="
+ echo "1. Locator Registry:"
+ ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+ echo "2. MY_PREFERENCES.md:"
+ cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+ echo "3. FAILURE_CLASSIFICATION_REPORT.md:"
+ ls .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "REPORT_NOT_WRITTEN"
+ echo "4. Classifications in report:"
+ grep -E "APPLICATION BUG|TEST CODE ERROR|ENVIRONMENT ISSUE|INCONCLUSIVE" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_CLASSIFICATIONS_FOUND"
+ echo "5. Confidence levels:"
+ grep -E "HIGH|MEDIUM-HIGH|MEDIUM|LOW" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null | head -10 || echo "NO_CONFIDENCE_LEVELS"
+ echo "6. Evidence and reasoning count:"
+ grep -cE "^### |Evidence:|Reasoning:" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_EVIDENCE_SECTIONS"
+ echo "7. Upstream reports:"
+ ls .qa-output/E2E_RUN_REPORT.md 2>/dev/null || echo "NO_E2E_RUN_REPORT"
+ ls .qa-output/VALIDATION_REPORT.md 2>/dev/null || echo "NO_VALIDATION_REPORT"
+ echo "8. MCP reproduction evidence:"
+ ls .qa-output/mcp-evidence/qaa-bug-detective-session.md 2>/dev/null || echo "NO_MCP_EVIDENCE"
+ grep -cE "failures_reproduced:|snapshots_taken:|screenshots_taken:" .qa-output/mcp-evidence/qaa-bug-detective-session.md 2>/dev/null || echo "NO_MCP_REPRODUCTION_DATA"
+ echo "9. MCP skip reasons (if any):"
+ grep -E "MCP unavailable|no app_url|MCP reproduction skipped" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_MCP_SKIP_DOCUMENTED"
+ echo "10. Locator source attribution:"
+ grep -cE "source: registry|source: codebase|source: mcp" .qa-output/mcp-evidence/qaa-bug-detective-session.md 2>/dev/null || echo "NO_SOURCE_ATTRIBUTION"
+ echo "11. Priority 4 halts:"
+ grep -E "locator unresolvable from registry/source/MCP" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_PRIORITY4_HALTS"
+ echo "=== BUG-DETECTIVE CHECKLIST END ==="
+ ```
568
+
569
+ **Rules:**
570
+ - Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
571
+ - If any output shows a problem (REPORT_NOT_WRITTEN, NO_CLASSIFICATIONS_FOUND), fix it before returning.
572
+ - If output shows expected "not found" results (e.g., NO_MCP_EVIDENCE when no E2E failures existed), that is fine — the point is you RAN the command instead of assuming the answer.
573
+ - Do NOT return control to the parent agent until the block has been executed and you have read every line of output.
574
+
@@ -3,6 +3,8 @@ name: qaa-codebase-mapper
3
3
  description: Explores codebase and writes QA-focused analysis documents. Spawned by /qa-analyze or qa-start pipeline. Produces testing-oriented architecture, conventions, and risk documents.
4
4
  tools: Read, Bash, Grep, Glob, Write
5
5
  color: cyan
6
+ skills:
7
+ - qa-repo-analyzer
6
8
  ---
7
9
 
8
10
  <role>
@@ -933,3 +935,4 @@ Test the highest-risk gaps first:
933
935
  - [ ] No secrets or forbidden file contents leaked
934
936
  - [ ] Confirmation returned (not document contents)
935
937
  </success_criteria>
938
+
@@ -1,3 +1,10 @@
1
+ ---
2
+ name: qaa-e2e-runner
3
+ description: Runs E2E tests against live app, fixes locator mismatches
4
+ skills:
5
+ - qa-bug-detective
6
+ ---
7
+
1
8
  <purpose>
2
9
  Run generated E2E test files against a live application using the Playwright browser tools. Navigate pages, capture real locators from the accessibility snapshot, compare them against the locators in generated test files, fix mismatches, and loop until tests pass or failures are classified as application bugs. This agent bridges the gap between "tests exist on disk" and "tests actually pass against the real app."
3
10
 
@@ -400,6 +407,39 @@ E2E_RUNNER_COMPLETE:
  | Test runner not found | No playwright/cypress installed | Report as ENVIRONMENT ISSUE with install instructions |
  </error_handling>
  
+ ## Non-negotiable rules
+
+ These rules are hardcoded in the agent body because they MUST NOT be skipped under any circumstance, regardless of whether the skill is loaded or not.
+
+ ### Playwright MCP usage is mandatory (NOT optional)
+
+ This agent's core job is to run tests against a **live browser**. That requires the Playwright MCP server. The agent MUST NOT classify a test run as complete based on static analysis, log inspection, or dry-run output alone.
+
+ 1. **Every E2E test execution MUST go through Playwright MCP tools** — `mcp__playwright__browser_navigate`, `mcp__playwright__browser_snapshot`, `mcp__playwright__browser_click`, `mcp__playwright__browser_fill_form`, `mcp__playwright__browser_take_screenshot`, `mcp__playwright__browser_close`. If these tools are not available, halt and return `ENVIRONMENT_ISSUE: Playwright MCP not connected` instead of faking execution.
+ 2. **Minimum required MCP operations per run:** at least one `browser_navigate` (to the app URL), at least one `browser_snapshot` (for DOM inspection), at least one `browser_take_screenshot` (for visual evidence), and exactly one `browser_close` at the end of the session.
+ 3. **Persist evidence of MCP usage** to `.qa-output/mcp-evidence/qaa-e2e-runner-session.md`. The file MUST contain:
+    - `session_start: {ISO timestamp}` and `session_end: {ISO timestamp}`
+    - `urls_navigated:` list of every URL passed to `browser_navigate`
+    - `snapshots_taken:` count of `browser_snapshot` calls with route per snapshot
+    - `screenshots_taken:` list of screenshot file paths (also written to `.qa-output/screenshots/`)
+    - `interactions:` list of clicks/fills with the element identifier
+    - `browser_closed: true` confirming `browser_close` was called
+ 4. **If the evidence file is missing, empty, or lists zero `browser_navigate` calls, the run is INVALID** — do not write E2E_RUN_REPORT.md and return a hard failure instead.
+
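Editor's sketch (not part of the package diff): the validity rule in item 4 above could be checked mechanically along the following lines. The function name `validate_mcp_evidence` is hypothetical, and the file layout it expects (plain `key:` lines, with list items written as `- value` on the lines directly below the key) is an assumption, since the diff does not pin down the exact format of the evidence file.

```shell
# Hypothetical helper, not shipped with qaa-agent.
# Returns 0 when the evidence file looks valid, 1 otherwise.
validate_mcp_evidence() {
  local file="$1" key urls
  # A missing or empty file makes the run invalid.
  [ -s "$file" ] || { echo "INVALID: evidence file missing or empty"; return 1; }
  # Every required key must appear at least once.
  for key in session_start: session_end: urls_navigated: snapshots_taken: \
             screenshots_taken: interactions: "browser_closed: true"; do
    grep -qF -- "$key" "$file" || { echo "INVALID: missing '$key'"; return 1; }
  done
  # Count the list items directly under urls_navigated: (assumed "- <url>" lines).
  urls=$(awk '
    /^urls_navigated:/ { in_list = 1; next }
    in_list && /^[[:space:]]*-[[:space:]]/ { count++; next }
    in_list { in_list = 0 }
    END { print count + 0 }
  ' "$file")
  [ "$urls" -gt 0 ] || { echo "INVALID: zero browser_navigate calls"; return 1; }
  echo "VALID: $urls URL(s) navigated"
}
```

Under those assumptions, `validate_mcp_evidence .qa-output/mcp-evidence/qaa-e2e-runner-session.md` would be run before E2E_RUN_REPORT.md is written, and a non-zero exit status would be treated as the hard failure the rule describes.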
+ ### Locator resolution priority when fixing failing tests — invention is forbidden
+
+ When a test fails due to a locator mismatch and the fix loop needs to update the POM or test file with a corrected locator, the runner MUST follow this priority chain. **Never invent a `data-testid` or selector that does not exist in one of the sources below.**
+
+ **Priority 1 — Locator Registry:** Check `.qa-output/locators/LOCATOR_REGISTRY.md` and `.qa-output/locators/{feature}.locators.md` for the target element. If present, use it verbatim.
+
+ **Priority 2 — Codebase source:** If not in registry, `grep -rE "data-testid=|aria-label=|id=\"" <frontend_source_dir>` for the page under test. If found, use verbatim and persist to registry.
+
+ **Priority 3 — Live DOM via Playwright MCP:** If not in registry AND not in source, call `mcp__playwright__browser_snapshot()` on the failing route and extract the real locator from the snapshot. Persist to registry with `tier` classification.
+
+ **Priority 4 — HALT (never invent):** If nothing is resolvable, mark the test as `BLOCKED: locator unresolvable` in E2E_RUN_REPORT.md with the unresolved element name. Do NOT fabricate a locator to "make the test pass". Do NOT replace the failing locator with a random guess.
+
+ Every locator written to a POM/test during fix loops MUST have a source attribution in the MCP evidence file: `source: registry | codebase | mcp`. Anything else is invention and the fix is invalid.
+
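Editor's sketch (not part of the package diff): the first two rungs of the priority chain above can be expressed as a small shell function. The name `resolve_locator` and its arguments are hypothetical, and the element name is assumed to be a plain token (letters, digits, hyphens) that also appears in the registry entry. Priority 3 needs a live `browser_snapshot` call, which a shell script cannot make, so the sketch only signals that the caller has to fall through.

```shell
# Hypothetical helper: resolve_locator <element> <registry_dir> <source_dir>
# Prints "source: <origin>" followed by the matching text; returns 1 if
# neither the registry nor the source contains the element.
resolve_locator() {
  local element="$1" registry_dir="$2" source_dir="$3" hit
  # Priority 1: an existing registry entry is used verbatim.
  hit=$(grep -rhF -- "$element" "$registry_dir" 2>/dev/null | head -n 1)
  if [ -n "$hit" ]; then
    printf 'source: registry\n%s\n' "$hit"
    return 0
  fi
  # Priority 2: a stable attribute in the frontend source.
  hit=$(grep -rhoE "(data-testid|aria-label|id)=\"[^\"]*${element}[^\"]*\"" \
        "$source_dir" 2>/dev/null | head -n 1)
  if [ -n "$hit" ]; then
    printf 'source: codebase\n%s\n' "$hit"
    return 0
  fi
  # Priorities 3 and 4 are up to the agent: snapshot the live DOM, else BLOCKED.
  echo "unresolved: $element"
  return 1
}
```

A call such as `resolve_locator login-submit .qa-output/locators src` (paths illustrative) would then yield the `source:` attribution that the evidence file is required to record.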
  <success_criteria>
  E2E runner is complete when:
 
@@ -414,3 +454,49 @@ E2E runner is complete when:
  - [ ] Locator registry updated with all real locators discovered during execution (`.qa-output/locators/`)
  - [ ] Browser session was closed
  </success_criteria>
+
+ ## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+
+ Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+
+ ```bash
+ echo "=== E2E-RUNNER CHECKLIST START ==="
+ echo "1. E2E Run Report:"
+ ls .qa-output/E2E_RUN_REPORT.md 2>/dev/null || echo "REPORT_NOT_WRITTEN"
+ echo "2. Locator Registry:"
+ ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+ echo "3. Screenshots:"
+ ls .qa-output/screenshots/ 2>/dev/null || echo "NO_SCREENSHOTS"
+ echo "4. Modified POMs/tests in working tree:"
+ git status 2>/dev/null | grep -E "modified:.*(pages/|tests/)" || echo "NO_MODIFIED_FILES"
+ echo "5. MY_PREFERENCES.md:"
+ cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+ echo "6. MCP evidence file:"
+ ls .qa-output/mcp-evidence/qaa-e2e-runner-session.md 2>/dev/null || echo "NO_MCP_EVIDENCE"
+ echo "7. MCP session boundaries:"
+ grep -E "session_start:|session_end:|browser_closed: true" .qa-output/mcp-evidence/qaa-e2e-runner-session.md 2>/dev/null || echo "NO_MCP_SESSION"
+ echo "8. URLs navigated via MCP:"
+ grep -cE "^ - http|^ - /" .qa-output/mcp-evidence/qaa-e2e-runner-session.md 2>/dev/null || echo "NO_URLS_NAVIGATED"
+ echo "9. Snapshot + screenshot operations:"
+ grep -cE "browser_snapshot|browser_take_screenshot" .qa-output/mcp-evidence/qaa-e2e-runner-session.md 2>/dev/null || echo "NO_SNAPSHOT_OPS"
+ echo "10. Locator source attribution:"
+ grep -cE "source: registry|source: codebase|source: mcp" .qa-output/mcp-evidence/qaa-e2e-runner-session.md 2>/dev/null || echo "NO_SOURCE_ATTRIBUTION"
+ echo "11. Unresolvable locator blocks:"
+ grep -E "BLOCKED: locator unresolvable" .qa-output/E2E_RUN_REPORT.md 2>/dev/null || echo "NO_BLOCKED_LOCATORS"
+ echo "12. Pass/fail counts in report:"
+ # The trailing `grep .` makes the pipeline fail on empty output so the fallback can fire; `head` alone always exits 0.
+ grep -E "PASS|FAIL|Tests run|[0-9]+ passed|[0-9]+ failed" .qa-output/E2E_RUN_REPORT.md 2>/dev/null | head -5 | grep . || echo "NO_PASS_FAIL_COUNTS"
+ echo "13. Locator Registry entries:"
+ grep -cE "^- |^\* " .qa-output/locators/LOCATOR_REGISTRY.md 2>/dev/null || echo "NO_REGISTRY_ENTRIES"
+ echo "14. Locator tier classification:"
+ grep -E "tier: 1|tier: 2|tier: 3|tier: 4" .qa-output/locators/*.md 2>/dev/null | head -10 | grep . || echo "NO_TIER_CLASSIFICATION"
+ echo "15. Validator report (input):"
+ ls .qa-output/VALIDATION_REPORT.md 2>/dev/null || echo "NO_VALIDATION_REPORT"
+ echo "=== E2E-RUNNER CHECKLIST END ==="
+ ```
+
+ **Rules:**
+ - Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+ - If any output shows a problem (REPORT_NOT_WRITTEN, NO_MCP_EVIDENCE when browser was used), fix it before returning.
+ - If output shows expected "not found" results (e.g., NO_MODIFIED_FILES when all tests passed on the first try), that is fine — the point is you RAN the command instead of assuming the answer.
+ - Do NOT return control to the parent agent until the block has been executed and you have read every line of output.
+
@@ -1,3 +1,11 @@
+ ---
+ name: qaa-executor
+ description: Generates test files, POMs, fixtures and configs
+ skills:
+ - qa-template-engine
+ - qa-self-validator
+ ---
+
  <purpose>
  Read the generation plan (produced by qaa-planner), TEST_INVENTORY.md, and CLAUDE.md to produce actual test files, page object models, fixtures, and configuration files. This is the most complex agent in the pipeline -- it handles framework detection, BasePage scaffolding, POM generation following strict rules, test spec writing with concrete assertions, and per-file atomic commits for maximum traceability. The executor does not decide WHAT to test (that is the planner's job) -- it decides HOW to write each test file following CLAUDE.md standards and qa-template-engine patterns.
 
@@ -593,6 +601,48 @@ EXECUTOR_COMPLETE:
  ```
  </output>
  
+ ## Non-negotiable rules
+
+ These rules are hardcoded in the agent body because they MUST NOT be skipped under any circumstance, regardless of whether the skill is loaded or not.
+
+ ### Locator resolution priority — locator invention is forbidden
+
+ **Before writing any locator (Tier 1 `data-testid`, Tier 2 role/label, Tier 3 CSS) in a POM or E2E test, the executor MUST follow this exact priority chain. Proposing a value that exists in none of the sources below is a critical failure.**
+
+ **Priority 1 — Locator Registry (first check):**
+ - Run `ls .qa-output/locators/LOCATOR_REGISTRY.md` and `ls .qa-output/locators/{feature}.locators.md`.
+ - `grep` the target element (by page + semantic description) in those files.
+ - If a locator exists → USE IT VERBATIM. Do not modify, do not propose an alternative.
+
+ **Priority 2 — Codebase source (second check, only if not in registry):**
+ - `grep -rE "data-testid=|aria-label=|id=\"" <frontend_source_dir>` for the target page/component file.
+ - If `data-testid`, stable `id`, or semantic `aria-label` is found in source → USE IT VERBATIM. Persist to registry so future runs hit Priority 1.
+
+ **Priority 3 — Playwright MCP live DOM (third check, only if not in registry AND not in source):**
+ - Call `mcp__playwright__browser_navigate({ url: "{app_url}/{route}" })` then `mcp__playwright__browser_snapshot()` to read the rendered DOM.
+ - Extract the real locator from the snapshot (Tier 1 > Tier 2 > Tier 3 priority per CLAUDE.md).
+ - Persist discovered locator to `.qa-output/locators/{feature}.locators.md` and update `LOCATOR_REGISTRY.md` so the next run hits Priority 1.
+
+ **Priority 4 — HALT (never invent):**
+ - If registry has no entry, source has no stable attribute, AND (MCP is unavailable OR `app_url` is missing), the agent MUST HALT for that element.
+ - Return `BLOCKED: locator unresolvable for {page}:{element} — registry empty, source has no testid/aria, MCP unavailable. Options: (a) run qa-testid to inject, (b) provide app_url, (c) connect Playwright MCP.`
+ - Do NOT invent a `data-testid` value. Do NOT propose a CSS selector based on a guess. Do NOT write the POM/test file with placeholder locators.
+
+ ### Playwright MCP evidence file (mandatory when MCP is used)
+
+ When Priority 3 is invoked (MCP lookup), persist evidence to `.qa-output/mcp-evidence/qaa-executor-session.md` with:
+ - `session_start: {ISO timestamp}` and `session_end: {ISO timestamp}`
+ - `pages_validated:` list of `{page_name, url, locators_discovered_count, source: registry|codebase|mcp}`
+ - `snapshots_taken:` count + route
+ - `locators_discovered_via_mcp:` list of locators found via MCP (these MUST also appear in `.qa-output/locators/`)
+ - `priority1_hits:` count (reused from registry)
+ - `priority2_hits:` count (extracted from source)
+ - `priority3_hits:` count (discovered via MCP)
+ - `priority4_halts:` list of unresolvable elements (if any)
+ - `browser_closed: true`
+
+ **If E2E/POM files were generated AND the evidence file shows `priority3_hits > 0` but the registry was not updated, the generation is INVALID** — delete files and re-run. Every MCP-discovered locator MUST be persisted.
+
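Editor's sketch (not part of the package diff): the invalidation rule above amounts to a cross-check between the evidence file and the registry directory. The function name `check_mcp_locators_persisted` is hypothetical. The evidence layout (a `priority3_hits: N` line, and `- <locator>` items directly under `locators_discovered_via_mcp:`) is assumed, and "persisted" is interpreted here as "the locator text appears verbatim somewhere under the registry directory".

```shell
# Hypothetical helper: check_mcp_locators_persisted <evidence_file> <registry_dir>
# Returns 1 if any MCP-discovered locator is absent from the registry.
check_mcp_locators_persisted() {
  local evidence="$1" registry_dir="$2" hits locators locator missing=0
  hits=$(sed -n 's/^priority3_hits:[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
         "$evidence" 2>/dev/null | head -n 1)
  # No Priority 3 lookups recorded: nothing to cross-check.
  if [ "${hits:-0}" -eq 0 ]; then
    echo "OK: no MCP-discovered locators"
    return 0
  fi
  # Collect the "- <locator>" items listed under locators_discovered_via_mcp:.
  locators=$(awk '
    /^locators_discovered_via_mcp:/ { f = 1; next }
    f && /^[[:space:]]*-[[:space:]]/ { sub(/^[[:space:]]*-[[:space:]]*/, ""); print; next }
    f { f = 0 }
  ' "$evidence")
  if [ -z "$locators" ]; then
    echo "INVALID: priority3_hits is $hits but no locators are listed"
    return 1
  fi
  while IFS= read -r locator; do
    [ -n "$locator" ] || continue
    if ! grep -rqF -- "$locator" "$registry_dir" 2>/dev/null; then
      echo "NOT_PERSISTED: $locator"
      missing=1
    fi
  done <<EOF
$locators
EOF
  [ "$missing" -eq 0 ] && echo "OK: all MCP locators persisted"
  return "$missing"
}
```

A non-zero exit status from `check_mcp_locators_persisted .qa-output/mcp-evidence/qaa-executor-session.md .qa-output/locators` would correspond to the INVALID case, where the generated files are to be deleted and the run repeated.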
  <quality_gate>
  Before considering the executor's work complete, verify ALL of the following.
 
@@ -649,3 +699,51 @@ The executor agent has completed successfully when:
  8. All quality gate checks pass
  9. Return values provided to orchestrator: files_created, total_files, commit_count, features_covered, test_case_count
  </success_criteria>
+
+ ## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+
+ Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+
+ ```bash
+ echo "=== EXECUTOR CHECKLIST START ==="
+ echo "1. Generated test files, POMs, fixtures:"
+ ls tests/ pages/ fixtures/ 2>/dev/null || echo "NO_TEST_FILES_FOUND"
+ echo "2. BasePage inheritance:"
+ grep -rE "class BasePage|extends BasePage" pages/ 2>/dev/null || echo "NO_BASEPAGE_FOUND"
+ echo "3. Test framework config:"
+ ls *.config.* 2>/dev/null || echo "NO_CONFIG_FOUND"
+ echo "4. MY_PREFERENCES.md:"
+ cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+ echo "5. Locator Registry:"
+ ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+ echo "6. Generation plan and test inventory inputs:"
+ ls .qa-output/GENERATION_PLAN.md .qa-output/TEST_INVENTORY.md 2>/dev/null || echo "INPUTS_NOT_FOUND"
+ echo "7. Test case count from inventory:"
+ grep -cE "^\| (UT|INT|API|E2E)-" .qa-output/TEST_INVENTORY.md 2>/dev/null || echo "NO_TEST_CASES_COUNTED"
+ echo "8. Generation plan tasks consumed:"
+ # The trailing `grep .` makes the pipeline fail on empty output so the fallback can fire; `head` alone always exits 0.
+ grep -E "task_id|files_to_create" .qa-output/GENERATION_PLAN.md 2>/dev/null | head -20 | grep . || echo "NO_PLAN_TASKS"
+ echo "9. Codebase map documents:"
+ ls .qa-output/codebase/ 2>/dev/null || echo "NO_CODEBASE_MAP"
+ echo "10. CODE_PATTERNS.md patterns:"
+ grep -E "pattern|convention|style" .qa-output/codebase/CODE_PATTERNS.md 2>/dev/null | head -5 | grep . || echo "NO_CODE_PATTERNS"
+ echo "11. Tier 1 locator usage in generated code:"
+ # Options are placed before the paths because not every grep accepts them after operands.
+ grep -rcE "data-testid|getByTestId|getByRole|findByRole" tests/ pages/ 2>/dev/null || echo "NO_TIER1_LOCATORS"
+ echo "12. MCP evidence file:"
+ ls .qa-output/mcp-evidence/qaa-executor-session.md 2>/dev/null || echo "NO_MCP_EVIDENCE"
+ echo "13. Locator priority chain hits:"
+ grep -E "priority1_hits:|priority2_hits:|priority3_hits:|priority4_halts:" .qa-output/mcp-evidence/qaa-executor-session.md 2>/dev/null || echo "NO_PRIORITY_HITS"
+ echo "14. Locator source attribution:"
+ grep -cE "source: registry|source: codebase|source: mcp" .qa-output/mcp-evidence/qaa-executor-session.md 2>/dev/null || echo "NO_SOURCE_ATTRIBUTION"
+ echo "15. MCP session boundaries:"
+ grep -E "session_start:|browser_closed: true" .qa-output/mcp-evidence/qaa-executor-session.md 2>/dev/null || echo "NO_MCP_SESSION"
+ echo "16. Priority 4 halts (unresolvable locators):"
+ grep -E "BLOCKED: locator unresolvable" .qa-output/mcp-evidence/qaa-executor-session.md 2>/dev/null || echo "NO_PRIORITY4_HALTS"
+ echo "=== EXECUTOR CHECKLIST END ==="
+ ```
+
+ **Rules:**
+ - Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+ - If any output shows a problem (NO_TEST_FILES_FOUND after generation, INPUTS_NOT_FOUND), fix it before returning.
+ - If output shows expected "not found" results (e.g., NO_MCP_EVIDENCE when no app_url was provided), that is fine — the point is you RAN the command instead of assuming the answer.
+ - Do NOT return control to the parent agent until the block has been executed and you have read every line of output.
+