qaa-agent 1.8.5 → 1.8.6

package/CHANGELOG.md CHANGED
@@ -3,6 +3,18 @@
3
3
 
4
4
  All notable changes to QAA (QA Automation Agent) are documented here.
5
5
 
6
+ ## [1.8.6] - 2026-04-20
7
+
8
+ ### Added
9
+
10
+ - **Fix mode analyze-first flow in `/qa-fix`** — fix mode now runs in two phases: Phase 1 analyzes and classifies all failures without touching any files, then presents a Fix Plan to the user. Phase 2 executes auto-fixes only after explicit user confirmation. Users can refine the plan iteratively (add fixes, remove files, change approach) in a loop until satisfied, then approve or cancel. Replaces the previous fully-automatic fix behavior.
11
+ - **Codebase map context for bug-detective** — `/qa-fix` fix mode now passes all 4 codebase map documents (`CODE_PATTERNS.md`, `API_CONTRACTS.md`, `TEST_SURFACE.md`, `TESTABILITY.md`) to the bug-detective agent via `files_to_read`. Previously the bug-detective classified failures without project-specific context, leading to less accurate classifications.
12
+ - **Mandatory bash checklist in `/qa-fix`** — verification block at the end of `qa-fix.md` that forces the agent to run `ls`/`cat`/`grep` commands to confirm artifacts were produced (classification report, locator registry, codebase map, MCP evidence, test files). Matches the existing checklist pattern in `/qa-create-test`.
13
+
14
+ ### Changed
15
+
16
+ - **Validator fix loops increased from 3 to 5** — `qaa-validator` agent now has up to 5 fix loop iterations (previously 3), matching the E2E runner's loop budget. Updated across all references: locked decision, fix loop logic, checkpoint return, confidence criteria table, and quality gate checks.
17
+
6
18
  ## [1.8.5] - 2026-04-17
7
19
 
8
20
  ### Added
@@ -25,7 +25,7 @@ Read ALL of the following files BEFORE performing any validation. Do NOT skip.
25
25
 
26
26
  - **templates/validation-report.md** -- Output format contract. Defines the 5 required sections (Summary, File Details, Unresolved Issues, Fix Loop Log, Confidence Level), all field definitions, confidence criteria table (HIGH/MEDIUM/LOW), worked example, and quality gate checklist (7 items). Your VALIDATION_REPORT.md output MUST match this template exactly.
27
27
 
28
- - **.claude/skills/qa-self-validator/SKILL.md** -- Defines the 4 validation layers (Syntax, Structure, Dependencies, Logic), pass criteria per layer, fix loop protocol (max 3 loops), and output format.
28
+ - **.claude/skills/qa-self-validator/SKILL.md** -- Defines the 4 validation layers (Syntax, Structure, Dependencies, Logic), pass criteria per layer, fix loop protocol (max 5 loops), and output format.
29
29
 
30
30
  - **~/.claude/qaa/MY_PREFERENCES.md** (optional -- read if exists). User's personal QA preferences saved by the qa-learner skill. If a preference conflicts with CLAUDE.md, the preference wins (it is a user override). Check for rules about: assertion style, locator strategy, naming conventions, framework choices.
31
31
 
@@ -257,7 +257,7 @@ Attempt to fix issues found during validation layers. This step encodes ALL 8 lo
257
257
 
258
258
  **Locked Decision 2: Sequential, fail-fast** -- Layers run in order: Layer 1 (Syntax) -> Layer 2 (Structure) -> Layer 3 (Dependencies) -> Layer 4 (Logic). Fix Layer 1 issues before proceeding to check Layer 2. If Layer 1 fails, fix it and re-validate Layer 1 before moving to Layer 2.
259
259
 
260
- **Locked Decision 3: Max 3 loops** -- The fix loop runs at most 3 times. After 3 loops with unresolved issues, STOP and escalate.
260
+ **Locked Decision 3: Max 5 loops** -- The fix loop runs at most 5 times. After 5 loops with unresolved issues, STOP and escalate.
261
261
 
262
262
  **Locked Decision 4: Generated files only** -- Only fix files listed in the generation plan. Never modify pre-existing test files.
263
263
 
@@ -280,7 +280,7 @@ Attempt to fix issues found during validation layers. This step encodes ALL 8 lo
280
280
  **Fix loop execution:**
281
281
 
282
282
  ```
283
- Loop iteration (max 3):
283
+ Loop iteration (max 5):
284
284
  1. Run all 4 validation layers sequentially (fail-fast)
285
285
  2. If all layers PASS: exit loop, proceed to produce_report
286
286
  3. If any layer FAIL:
@@ -290,20 +290,20 @@ Loop iteration (max 3):
290
290
  - If MEDIUM or LOW: record as unresolved, do NOT apply
291
291
  b. Log this loop iteration: issues found, fixes applied, verification
292
292
  c. Re-validate from the FAILED layer (not from Layer 1 unless Layer 1 failed)
293
- d. If this was loop 3: exit loop regardless of results
293
+ d. If this was loop 5: exit loop regardless of results
294
294
  ```
295
295
 
296
- **After 3 loops with unresolved issues:**
296
+ **After 5 loops with unresolved issues:**
297
297
 
298
298
  STOP and return a checkpoint:
299
299
 
300
300
  ```
301
301
  CHECKPOINT_RETURN:
302
302
  completed: "Validated {N} files across 4 layers. Completed {loop_count} fix loops."
303
- blocking: "Unresolved validation issues after maximum 3 fix loops"
303
+ blocking: "Unresolved validation issues after maximum 5 fix loops"
304
304
  details:
305
305
  files_validated: {N}
306
- loops_completed: 3
306
+ loops_completed: 5
307
307
  issues_found: {total_count}
308
308
  issues_fixed: {fixed_count}
309
309
  unresolved:
@@ -314,6 +314,9 @@ details:
314
314
  why_not_fixed: "{reason auto-fix was not applied}"
315
315
  awaiting: "User decides: fix remaining issues manually, accept with warnings, or abort validation"
316
316
  ```
317
+
318
+ **Note:** The validator now has 5 fix loop iterations (up from 3), matching the E2E runner's loop budget. This gives more room to resolve cascading issues where fixing one problem reveals another.
319
+ ```
317
320
  </step>
318
321
 
319
322
  <step name="produce_report">
@@ -391,8 +394,8 @@ Include the confidence criteria table:
391
394
  | Level | All Layers PASS | Unresolved Issues | Fix Loops Used | Description |
392
395
  |-------|----------------|-------------------|----------------|-------------|
393
396
  | HIGH | Yes | 0 | 0-1 | All validations pass with minimal or no fixes needed. Code is ready for delivery. |
394
- | MEDIUM | Yes (after fixes) | 0-2 minor | 2-3 | All layers eventually pass, but required multiple fix rounds. Minor issues may exist. |
395
- | LOW | No (any FAIL) | Any critical | 3 (max) | At least one layer still fails, or critical issues remain unresolved. Human review required before delivery. |
397
+ | MEDIUM | Yes (after fixes) | 0-2 minor | 2-5 | All layers eventually pass, but required multiple fix rounds. Minor issues may exist. |
398
+ | LOW | No (any FAIL) | Any critical | 5 (max) | At least one layer still fails, or critical issues remain unresolved. Human review required before delivery. |
396
399
 
397
400
  Followed by the specific confidence statement:
398
401
  `**{LEVEL}:** {one-sentence reasoning referencing specific metrics from the summary}`
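
The confidence criteria table in this hunk can be read as a small decision function. The following is a minimal Python sketch, not code from the package: the function name `confidence_level` and its parameters are hypothetical, and the fallback for combinations the table does not list (for example, more than two minor issues with all layers passing) is an assumption.

```python
MAX_FIX_LOOPS = 5  # loop budget after 1.8.6 (was 3)


def confidence_level(all_layers_pass: bool, minor_unresolved: int,
                     critical_unresolved: int, loops_used: int) -> str:
    """Map validation metrics to HIGH / MEDIUM / LOW (hypothetical helper)."""
    # LOW: any layer still fails, or a critical issue remains unresolved.
    if not all_layers_pass or critical_unresolved > 0:
        return "LOW"
    # HIGH: everything passes, nothing unresolved, at most one fix loop used.
    if minor_unresolved == 0 and loops_used <= 1:
        return "HIGH"
    # MEDIUM: passes after fixes, at most two minor issues, within the budget.
    if minor_unresolved <= 2 and loops_used <= MAX_FIX_LOOPS:
        return "MEDIUM"
    # Assumption: combinations the table does not cover fall back to LOW.
    return "LOW"


print(confidence_level(True, 0, 0, 4))  # MEDIUM
```

Under this reading, raising the budget from 3 to 5 widens only the MEDIUM band (2-5 loops); HIGH still requires at most one loop.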
@@ -477,8 +480,8 @@ Before considering validation complete, verify ALL of the following.
477
480
  - [ ] Only generated files were validated (not pre-existing test files) -- verify every file in the report appears in the generation plan file list
478
481
  - [ ] Layer 4 cross-checked existing test files for duplicate IDs and overlapping selectors to prevent collisions
479
482
  - [ ] Fix confidence correctly classified (HIGH auto-applied, MEDIUM/LOW flagged for review but NOT auto-applied)
480
- - [ ] Fix loop count did not exceed 3 iterations
481
- - [ ] If 3 loops exhausted with unresolved issues: CHECKPOINT_RETURN was provided to escalate to user
483
+ - [ ] Fix loop count did not exceed 5 iterations
484
+ - [ ] If 5 loops exhausted with unresolved issues: CHECKPOINT_RETURN was provided to escalate to user
482
485
  - [ ] Validator did NOT commit any files (no git add, no git commit, no qaa-tools commit)
483
486
 
484
487
  If any check fails, fix the issue before finalizing the output. Do not deliver a validation report that fails its own quality gate.
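
Taken together, the validator hunks above describe a bounded, fail-fast fix loop. The control flow might be sketched in Python as follows. This is an illustration under stated assumptions, not the agent's implementation: `check_layer` and `apply_fix` are hypothetical callables, and a clean first pass is assumed to count as zero fix loops, which is how the confidence table appears to count them.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_FIX_LOOPS = 5  # Locked Decision 3 after 1.8.6 (was 3)
LAYERS = ["Syntax", "Structure", "Dependencies", "Logic"]


@dataclass
class Issue:
    layer: str
    description: str
    fix_confidence: str  # "HIGH", "MEDIUM", or "LOW"


@dataclass
class LoopResult:
    loops_completed: int = 0
    fixed: List[Issue] = field(default_factory=list)
    unresolved: List[Issue] = field(default_factory=list)
    needs_checkpoint: bool = False  # True -> emit CHECKPOINT_RETURN


def run_fix_loop(check_layer: Callable[[str], List[Issue]],
                 apply_fix: Callable[[Issue], None]) -> LoopResult:
    """Hypothetical driver: validate, auto-fix HIGH-confidence issues, repeat."""
    result = LoopResult()
    start = 0  # index of the layer to re-validate from
    while True:
        failed_at, issues = None, []
        for index in range(start, len(LAYERS)):
            issues = check_layer(LAYERS[index])
            if issues:  # fail-fast: stop at the first failing layer
                failed_at = index
                break
        if failed_at is None:
            return result  # all layers pass
        if result.loops_completed == MAX_FIX_LOOPS:
            # Budget exhausted: stop and escalate to the user.
            result.unresolved = issues
            result.needs_checkpoint = True
            return result
        for issue in issues:
            # Only HIGH-confidence fixes are applied automatically.
            if issue.fix_confidence == "HIGH":
                apply_fix(issue)
                result.fixed.append(issue)
        result.loops_completed += 1
        start = failed_at  # re-validate from the failed layer, not Layer 1
```

An issue whose fix confidence is MEDIUM or LOW is never applied, so in this sketch it keeps its layer failing until the budget runs out and the checkpoint is returned.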
@@ -325,7 +325,7 @@ The validator runs 4 layers per file:
325
325
  - Cross-reference test locators against real DOM elements
326
326
  - Flag locators that don't match, auto-fix mismatches
327
327
 
328
- Max 3 fix loop iterations. Produces VALIDATION_REPORT.md.
328
+ Max 5 fix loop iterations. Produces VALIDATION_REPORT.md.
329
329
 
330
330
  If `--run` flag is also present and E2E test files exist, invoke E2E runner after static validation:
331
331
 
@@ -357,20 +357,99 @@ Same as fix mode below but skip Step 4 (auto-fix). Only classify and report.
357
357
 
358
358
  ### FIX MODE (default)
359
359
 
360
+ Fix mode runs in two phases: **analyze first, then fix after user confirmation.**
361
+
362
+ #### Phase 1: Analyze & Present Plan (always runs)
363
+
360
364
  1. Read `CLAUDE.md` — classification rules, locator tiers, assertion quality.
361
- 2. Invoke bug-detective agent:
365
+ 2. Invoke bug-detective agent in **classify-only mode** (no auto-fixes):
362
366
 
363
367
  Task(
364
368
  prompt="
365
- <objective>Run tests, classify failures, and auto-fix TEST CODE ERRORS. Use Playwright MCP to reproduce E2E failures in the browser when available — navigate to failing pages, snapshot DOM, reproduce actions, and screenshot failure state for evidence.</objective>
369
+ <objective>Run tests and classify failures. Do NOT auto-fix anything yet — this is the analysis phase. Use Playwright MCP to reproduce E2E failures in the browser when available — navigate to failing pages, snapshot DOM, reproduce actions, and screenshot failure state for evidence.</objective>
366
370
  <execution_context>@agents/qaa-bug-detective.md</execution_context>
367
371
  <files_to_read>
368
372
  - CLAUDE.md
369
373
  - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
370
374
  - .qa-output/locators/LOCATOR_REGISTRY.md (if exists)
375
+ - .qa-output/codebase/CODE_PATTERNS.md (if exists)
376
+ - .qa-output/codebase/API_CONTRACTS.md (if exists)
377
+ - .qa-output/codebase/TEST_SURFACE.md (if exists)
378
+ - .qa-output/codebase/TESTABILITY.md (if exists)
371
379
  </files_to_read>
372
380
  <parameters>
373
381
  user_input: $ARGUMENTS
382
+ mode: classify-only
383
+ app_url: {auto-detect from test config baseURL, or ask user}
384
+ </parameters>
385
+ "
386
+ )
387
+
388
+ 3. Present the analysis to the user as a **fix plan**:
389
+
390
+ ```
391
+ === FIX PLAN ===
392
+
393
+ Tests run: {N}
394
+ Passed: {N}
395
+ Failed: {N}
396
+
397
+ Failures classified:
398
+ APPLICATION BUG: {N} (will NOT be touched)
399
+ TEST CODE ERROR: {N} (can auto-fix)
400
+ ENVIRONMENT ISSUE: {N} (resolution steps provided)
401
+ INCONCLUSIVE: {N} (needs more info)
402
+
403
+ Proposed auto-fixes (TEST CODE ERRORS only):
404
+ 1. {file}:{line} — {description of fix} [HIGH confidence]
405
+ 2. {file}:{line} — {description of fix} [HIGH confidence]
406
+ 3. {file}:{line} — {description of fix} [MEDIUM — flagged for review]
407
+
408
+ APPLICATION BUGs found (for developer action):
409
+ 1. {file}:{line} — {description}
410
+ 2. {file}:{line} — {description}
411
+
412
+ Proceed with auto-fixes? [yes / modify / cancel]
413
+ ================
414
+ ```
415
+
416
+ 4. **Wait for user confirmation.** Do NOT proceed until the user approves. This is a refinement loop — repeat until the user is satisfied:
417
+
418
+ - **"yes"** / **"proceed"** / **"dale"** → continue to Phase 2
419
+ - **"cancel"** / **"no"** → stop, deliver only the FAILURE_CLASSIFICATION_REPORT.md (same as --classify)
420
+ - **Any other response** (feedback, modifications, additions) → treat as a refinement request:
421
+ - Adjust the fix plan based on the user's instructions (add fixes, remove fixes, change approach, add new checks)
422
+ - Re-present the updated Fix Plan showing what changed
423
+ - Wait for user confirmation again
424
+ - Repeat this loop until the user says "yes" or "cancel"
425
+
426
+ **Examples of refinement requests:**
427
+ - "this looks fine, but I also want you to fix the utils imports" → add that fix to the plan
428
+ - "don't touch the login file, leave it as it is" → remove that file from the plan
429
+ - "change the selector to getByRole instead of getByTestId" → update the proposed fix
430
+ - "also add a check that the status is 200" → add assertion fix to the plan
431
+
432
+ #### Phase 2: Execute Fixes (only after user confirmation)
433
+
434
+ 5. Invoke bug-detective agent in **fix mode** with the confirmed plan:
435
+
436
+ Task(
437
+ prompt="
438
+ <objective>Auto-fix the confirmed TEST CODE ERRORS from the analysis phase. Use Playwright MCP to reproduce E2E failures in the browser when available — navigate to failing pages, snapshot DOM, reproduce actions, and screenshot failure state for evidence.</objective>
439
+ <execution_context>@agents/qaa-bug-detective.md</execution_context>
440
+ <files_to_read>
441
+ - CLAUDE.md
442
+ - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
443
+ - .qa-output/locators/LOCATOR_REGISTRY.md (if exists)
444
+ - .qa-output/codebase/CODE_PATTERNS.md (if exists)
445
+ - .qa-output/codebase/API_CONTRACTS.md (if exists)
446
+ - .qa-output/codebase/TEST_SURFACE.md (if exists)
447
+ - .qa-output/codebase/TESTABILITY.md (if exists)
448
+ - .qa-output/FAILURE_CLASSIFICATION_REPORT.md
449
+ </files_to_read>
450
+ <parameters>
451
+ user_input: $ARGUMENTS
452
+ mode: fix
374
453
  app_url: {auto-detect from test config baseURL, or ask user}
375
454
  </parameters>
376
455
  "
@@ -394,6 +473,35 @@ Task(
394
473
  - Element exists, wrong behavior → APPLICATION BUG
395
474
  - Page doesn't load → ENVIRONMENT ISSUE
396
475
 
397
- 3. Present results. APPLICATION BUGs are reported for developer action, not auto-fixed.
476
+ 6. Present results. APPLICATION BUGs are reported for developer action, not auto-fixed.
398
477
 
399
478
  $ARGUMENTS
479
+
480
+ ## MANDATORY verification — run ALL commands below, no exceptions, no skipping
481
+
482
+ Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
483
+
484
+ ```bash
485
+ echo "=== QA-FIX CHECKLIST START ==="
486
+ echo "1. Failure classification report:"
487
+ ls .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_CLASSIFICATION_REPORT"
488
+ echo "2. Locator Registry:"
489
+ ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
490
+ echo "3. MY_PREFERENCES.md:"
491
+ cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
492
+ echo "4. Codebase map (context for bug-detective):"
493
+ ls .qa-output/codebase/ 2>/dev/null || echo "NO_CODEBASE_MAP"
494
+ echo "5. Test files in scope:"
495
+ find tests/ cypress/ __tests__/ e2e/ spec/ -type f \( -name "*.spec.*" -o -name "*.test.*" -o -name "*.e2e.*" \) 2>/dev/null | head -20 | grep . || echo "NO_TEST_FILES_FOUND"
496
+ echo "6. MCP evidence (if browser was used):"
497
+ ls .qa-output/mcp-evidence/ 2>/dev/null || echo "NO_MCP_EVIDENCE"
498
+ echo "7. Classification categories in report:"
499
+ grep -cE "APPLICATION BUG|TEST CODE ERROR|ENVIRONMENT ISSUE|INCONCLUSIVE" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_CLASSIFICATIONS"
500
+ echo "=== QA-FIX CHECKLIST END ==="
501
+ ```
502
+
503
+ **Rules:**
504
+ - Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
505
+ - If any output shows a problem (NO_CLASSIFICATION_REPORT after fix mode completed), fix it before returning.
506
+ - If output shows expected "not found" results (e.g., NO_MCP_EVIDENCE when no browser was used), that is fine — the point is you RAN the command instead of assuming the answer.
507
+ - Do NOT mark this task as complete until the block has been executed and you have read every line of output.
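
The Phase 1 confirmation step added to fix mode in this hunk is a refinement loop with two exits. A minimal Python sketch of that loop follows. The names `confirm_fix_plan` and `refine` are hypothetical and appear nowhere in the package; the accepted keywords are the ones listed in the hunk, and matching them case-insensitively is an assumption.

```python
from typing import Callable, Iterable, List, Tuple

APPROVE = {"yes", "proceed", "dale"}  # continue to Phase 2
CANCEL = {"cancel", "no"}             # deliver the classification report only


def confirm_fix_plan(plan: List[str],
                     replies: Iterable[str],
                     refine: Callable[[List[str], str], List[str]]
                     ) -> Tuple[str, List[str]]:
    """Hypothetical loop: refine the plan until the user approves or cancels."""
    for reply in replies:
        answer = reply.strip().lower()
        if answer in APPROVE:
            return "execute", plan
        if answer in CANCEL:
            return "cancelled", plan
        # Any other response is a refinement request: update the plan,
        # then wait for the next reply.
        plan = refine(plan, reply)
    # No decision yet: nothing may be modified until the user answers.
    return "awaiting", plan


def add_request(plan: List[str], request: str) -> List[str]:
    """Toy refine step for illustration: record the request as a new plan item."""
    return plan + [request]


status, final_plan = confirm_fix_plan(
    ["login.spec.ts:12 update stale selector"],
    ["also fix the utils imports", "yes"],
    add_request,
)
print(status, len(final_plan))  # execute 2
```

The point of the structure is that no branch reaches `"execute"` without an explicit approval keyword, which mirrors the rule that Phase 2 runs only after user confirmation.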
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "qaa-agent",
3
- "version": "1.8.5",
3
+ "version": "1.8.6",
4
4
  "description": "QA Automation Agent for Claude Code — multi-agent pipeline that analyzes repos, generates tests, validates, and creates PRs",
5
5
  "bin": {
6
6
  "qaa-agent": "./bin/install.cjs"
@@ -15,10 +15,6 @@
15
15
  "pytest",
16
16
  "ai-agent"
17
17
  ],
18
- "repository": {
19
- "type": "git",
20
- "url": "https://github.com/Backhaus7997/qaa-testing.git"
21
- },
22
18
  "author": "Backhaus7997",
23
19
  "license": "MIT",
24
20
  "dependencies": {