qaa-agent 1.8.1 → 1.8.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +24 -0
- package/agents/qaa-validator.md +14 -11
- package/commands/qa-create-test-ado.md +404 -0
- package/commands/qa-create-test.md +46 -5
- package/commands/qa-fix.md +112 -4
- package/package.json +1 -5
package/CHANGELOG.md
CHANGED
@@ -3,6 +3,30 @@
 
 All notable changes to QAA (QA Automation Agent) are documented here.
 
+## [1.8.6] - 2026-04-20
+
+### Added
+
+- **Fix mode analyze-first flow in `/qa-fix`** — fix mode now runs in two phases: Phase 1 analyzes and classifies all failures without touching any files, then presents a Fix Plan to the user. Phase 2 executes auto-fixes only after explicit user confirmation. Users can refine the plan iteratively (add fixes, remove files, change approach) in a loop until satisfied, then approve or cancel. Replaces the previous fully-automatic fix behavior.
+- **Codebase map context for bug-detective** — `/qa-fix` fix mode now passes all 4 codebase map documents (`CODE_PATTERNS.md`, `API_CONTRACTS.md`, `TEST_SURFACE.md`, `TESTABILITY.md`) to the bug-detective agent via `files_to_read`. Previously the bug-detective classified failures without project-specific context, leading to less accurate classifications.
+- **Mandatory bash checklist in `/qa-fix`** — verification block at the end of `qa-fix.md` that forces the agent to run `ls`/`cat`/`grep` commands to confirm artifacts were produced (classification report, locator registry, codebase map, MCP evidence, test files). Matches the existing checklist pattern in `/qa-create-test`.
+
+### Changed
+
+- **Validator fix loops increased from 3 to 5** — `qaa-validator` agent now has up to 5 fix loop iterations (previously 3), matching the E2E runner's loop budget. Updated across all references: locked decision, fix loop logic, checkpoint return, confidence criteria table, and quality gate checks.
+
+## [1.8.5] - 2026-04-17
+
+### Added
+
+- **Azure DevOps mode in `/qa-create-test`** — new `--ado` flag enables creating Test Cases directly in Azure DevOps from a work item. Supports work item ID or full ADO URL, auto-detects `dev.azure.com` and `*.visualstudio.com` URLs. Features include: boundary value triplet detection (N-1, N, N+1), deduplication against existing linked TCs, confidence scoring (Specified vs Draft), keyword-based Critical tagging, and preconditions block per test case.
+- **`/qa-create-test-ado` standalone command** — dedicated command for Azure DevOps test case creation with 7-phase workflow: retrieve work item with comments/attachments, dedup check, type-based content extraction (Bug → Repro Steps, User Story → Acceptance Criteria), test case design, creation in ADO via `testplan_create_test_case`, structured report generation, and report attachment to source work item.
+- **ADO-specific flags** — `--area-path`, `--iteration-path` (override paths for created TCs), `--skip-dedup` (skip deduplication check).
+
+### Changed
+
+- **`/qa-create-test` now supports 5 modes** — from-code, from-ticket, ADO, update, and POM-only (previously 3 modes). Mode detection updated to recognize ADO URLs before ticket URLs to avoid routing conflicts.
+
 ## [1.8.1] - 2026-04-16
 
 ### Added
package/agents/qaa-validator.md
CHANGED
@@ -25,7 +25,7 @@ Read ALL of the following files BEFORE performing any validation. Do NOT skip.
 
 - **templates/validation-report.md** -- Output format contract. Defines the 5 required sections (Summary, File Details, Unresolved Issues, Fix Loop Log, Confidence Level), all field definitions, confidence criteria table (HIGH/MEDIUM/LOW), worked example, and quality gate checklist (7 items). Your VALIDATION_REPORT.md output MUST match this template exactly.
 
-- **.claude/skills/qa-self-validator/SKILL.md** -- Defines the 4 validation layers (Syntax, Structure, Dependencies, Logic), pass criteria per layer, fix loop protocol (max 3 loops), and output format.
+- **.claude/skills/qa-self-validator/SKILL.md** -- Defines the 4 validation layers (Syntax, Structure, Dependencies, Logic), pass criteria per layer, fix loop protocol (max 5 loops), and output format.
 
 - **~/.claude/qaa/MY_PREFERENCES.md** (optional -- read if exists). User's personal QA preferences saved by the qa-learner skill. If a preference conflicts with CLAUDE.md, the preference wins (it is a user override). Check for rules about: assertion style, locator strategy, naming conventions, framework choices.
 
@@ -257,7 +257,7 @@ Attempt to fix issues found during validation layers. This step encodes ALL 8 lo
 
 **Locked Decision 2: Sequential, fail-fast** -- Layers run in order: Layer 1 (Syntax) -> Layer 2 (Structure) -> Layer 3 (Dependencies) -> Layer 4 (Logic). Fix Layer 1 issues before proceeding to check Layer 2. If Layer 1 fails, fix it and re-validate Layer 1 before moving to Layer 2.
 
-**Locked Decision 3: Max 3 loops** -- The fix loop runs at most 3 times. After 3 loops with unresolved issues, STOP and escalate.
+**Locked Decision 3: Max 5 loops** -- The fix loop runs at most 5 times. After 5 loops with unresolved issues, STOP and escalate.
 
 **Locked Decision 4: Generated files only** -- Only fix files listed in the generation plan. Never modify pre-existing test files.
 
@@ -280,7 +280,7 @@ Attempt to fix issues found during validation layers. This step encodes ALL 8 lo
 **Fix loop execution:**
 
 ```
-Loop iteration (max 3):
+Loop iteration (max 5):
 1. Run all 4 validation layers sequentially (fail-fast)
 2. If all layers PASS: exit loop, proceed to produce_report
 3. If any layer FAIL:
@@ -290,20 +290,20 @@ Loop iteration (max 3):
 - If MEDIUM or LOW: record as unresolved, do NOT apply
 b. Log this loop iteration: issues found, fixes applied, verification
 c. Re-validate from the FAILED layer (not from Layer 1 unless Layer 1 failed)
-d. If this was loop 3: exit loop regardless of results
+d. If this was loop 5: exit loop regardless of results
 ```
 
-**After 3 loops with unresolved issues:**
+**After 5 loops with unresolved issues:**
 
 STOP and return a checkpoint:
 
 ```
 CHECKPOINT_RETURN:
 completed: "Validated {N} files across 4 layers. Completed {loop_count} fix loops."
-blocking: "Unresolved validation issues after maximum 3 fix loops"
+blocking: "Unresolved validation issues after maximum 5 fix loops"
 details:
 files_validated: {N}
-loops_completed: 3
+loops_completed: 5
 issues_found: {total_count}
 issues_fixed: {fixed_count}
 unresolved:
@@ -314,6 +314,9 @@ details:
 why_not_fixed: "{reason auto-fix was not applied}"
 awaiting: "User decides: fix remaining issues manually, accept with warnings, or abort validation"
 ```
+
+**Note:** The validator now has 5 fix loop iterations (up from 3), matching the E2E runner's loop budget. This gives more room to resolve cascading issues where fixing one problem reveals another.
+```
 </step>
 
 <step name="produce_report">
@@ -391,8 +394,8 @@ Include the confidence criteria table:
 | Level | All Layers PASS | Unresolved Issues | Fix Loops Used | Description |
 |-------|----------------|-------------------|----------------|-------------|
 | HIGH | Yes | 0 | 0-1 | All validations pass with minimal or no fixes needed. Code is ready for delivery. |
-| MEDIUM | Yes (after fixes) | 0-2 minor | 2-3 | All layers eventually pass, but required multiple fix rounds. Minor issues may exist. |
-| LOW | No (any FAIL) | Any critical | 3 (max) | At least one layer still fails, or critical issues remain unresolved. Human review required before delivery. |
+| MEDIUM | Yes (after fixes) | 0-2 minor | 2-5 | All layers eventually pass, but required multiple fix rounds. Minor issues may exist. |
+| LOW | No (any FAIL) | Any critical | 5 (max) | At least one layer still fails, or critical issues remain unresolved. Human review required before delivery. |
 
 Followed by the specific confidence statement:
 `**{LEVEL}:** {one-sentence reasoning referencing specific metrics from the summary}`
@@ -477,8 +480,8 @@ Before considering validation complete, verify ALL of the following.
 - [ ] Only generated files were validated (not pre-existing test files) -- verify every file in the report appears in the generation plan file list
 - [ ] Layer 4 cross-checked existing test files for duplicate IDs and overlapping selectors to prevent collisions
 - [ ] Fix confidence correctly classified (HIGH auto-applied, MEDIUM/LOW flagged for review but NOT auto-applied)
-- [ ] Fix loop count did not exceed 3 iterations
-- [ ] If 3 loops exhausted with unresolved issues: CHECKPOINT_RETURN was provided to escalate to user
+- [ ] Fix loop count did not exceed 5 iterations
+- [ ] If 5 loops exhausted with unresolved issues: CHECKPOINT_RETURN was provided to escalate to user
 - [ ] Validator did NOT commit any files (no git add, no git commit, no qaa-tools commit)
 
 If any check fails, fix the issue before finalizing the output. Do not deliver a validation report that fails its own quality gate.
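As an illustrative aside, the fix-loop budget described in the validator diff above can be sketched as follows. The helper names are hypothetical; the actual agent logic is prose-driven, not code in the package.

```python
MAX_FIX_LOOPS = 5  # raised from 3 in 1.8.6 to match the E2E runner's loop budget

def run_fix_loops(validate, fix, max_loops=MAX_FIX_LOOPS):
    """Run the validation layers with auto-fixes, up to max_loops iterations.

    `validate` returns a list of issues (empty means all layers pass);
    `fix` attempts repairs and returns the issues it could not resolve.
    After the budget is exhausted, the caller escalates via CHECKPOINT_RETURN.
    """
    unresolved = []
    for loop in range(1, max_loops + 1):
        issues = validate()
        if not issues:
            # All layers pass: loops actually used is loop - 1
            return {"passed": True, "loops": loop - 1, "unresolved": []}
        unresolved = fix(issues)
    # Loop 5 reached with issues remaining: stop regardless of results
    return {"passed": False, "loops": max_loops, "unresolved": unresolved}
```

The shape mirrors Locked Decision 3: the loop never runs a sixth time, and anything still unresolved is surfaced rather than silently retried.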
package/commands/qa-create-test-ado.md
ADDED

@@ -0,0 +1,404 @@
+# QA Create Test — Azure DevOps
+
+Retrieve an Azure DevOps work item, analyze its content, and generate well-structured Test Cases directly in Azure DevOps using the ADO MCP tools. Each test case is tagged for test plan membership (Smoke, Regression, Critical) and linked back to the source work item for full traceability. Integrates with the QAA pipeline: reads codebase map, locator registry, and user preferences for context-aware test case generation.
+
+## Usage
+
+```
+/qa-create-test-ado <work-item-id> [--area-path=<path>] [--iteration-path=<path>] [--skip-map] [--skip-dedup] [--app-url <url>]
+```
+
+### Arguments
+
+| Parameter | Purpose | Default |
+|-----------|---------|---------|
+| `<work-item-id>` | Azure DevOps work item ID to generate test cases from | Required |
+| `--area-path=<path>` | Override area path for all created test artifacts | Source work item's area path |
+| `--iteration-path=<path>` | Override iteration path for all created test artifacts | Source work item's iteration path |
+| `--skip-map` | Skip codebase map check and proceed without project context | false |
+| `--skip-dedup` | Skip deduplication check against existing linked test cases | false |
+| `--app-url <url>` | URL of running application for locator extraction via Playwright MCP | auto-detect |
+
+## What It Produces
+
+- Test Cases created directly in Azure DevOps (via `testplan_create_test_case`)
+- Test Cases linked to source work item via *Tested By* relationship
+- Tags applied: `Smoke`, `Regression`, `Critical`, `AutomationCandidate`, `NeedsReview`
+- `ai-tasks/ticket-{id}/test-cases.md` — structured report
+- Report attached to work item (if `ADO_MCP_AUTH_TOKEN` is set) or written to `Custom.QATestCasesReport` field (fallback)
+
+---
+
+## Process
+
+### Phase 1: Read Pipeline Context
+
+Before retrieving the work item, read QAA pipeline artifacts for context-aware generation.
+
+1. **Read `CLAUDE.md`** — POM rules, locator tiers, assertion rules, naming conventions, quality gates, test spec rules.
+
+2. **Read user preferences** — `~/.claude/qaa/MY_PREFERENCES.md` (if exists). User overrides win over defaults.
+
+3. **Check for codebase map** (`.qa-output/codebase/`):
+   - Look for: `CODE_PATTERNS.md`, `API_CONTRACTS.md`, `TEST_SURFACE.md`, `TESTABILITY.md`, `RISK_MAP.md`, `CRITICAL_PATHS.md`
+   - If at least 2 exist: read them all for project context (naming conventions, API shapes, testable surfaces, risk areas).
+   - If NONE exist and `--skip-map` not passed: warn the user that test cases will lack project context, suggest running `/qa-map` first. Continue anyway (ADO test cases are higher-level than code-level tests).
+
+4. **Check locator registry** — `.qa-output/locators/LOCATOR_REGISTRY.md` (if exists):
+   - If locators exist for pages related to the work item's feature: reference them in test step expected results (e.g., "Verify element `[data-testid='login-submit-btn']` is visible").
+   - If `--app-url` provided and locators missing: use Playwright MCP to extract locators from the live app before designing test steps:
+   ```
+   mcp__playwright__browser_navigate({ url: "{app_url}/{feature_path}" })
+   mcp__playwright__browser_snapshot()
+   ```
+   - Write extracted locators to `.qa-output/locators/{feature}.locators.md` and update the registry.
+
+---
+
+### Phase 2: Retrieve the Work Item
+
+Use `wit_get_work_item` with `expand: "relations"` to fetch the full work item:
+
+- Capture: **title**, **type** (`Bug`, `User Story`, `Ticket`), **state**, **assigned-to**, **area path**, **iteration path**
+- Capture all relevant content fields based on type (see Phase 3)
+- Note the project for all subsequent calls
+
+**Also retrieve comments** using `wit_list_work_item_comments`:
+
+- Read all comments in chronological order
+- Look for: acceptance criteria added in comments, QA notes, scope clarifications, tester feedback, or any conditions of satisfaction mentioned informally
+- These often contain implied test cases not captured in the formal fields
+
+**Also check attachments** from the relations list (entries where `rel` equals `AttachedFile`):
+
+- Filter to `.csv` and `.txt` files (case-insensitive) by inspecting `attributes.name`
+- If found, download via:
+  ```bash
+  curl -s --user ":{AZURE_DEVOPS_PAT}" "{attachment-url}"
+  ```
+- Read content for test data, expected values, error logs, or sample datasets that define expected behavior
+
+---
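The attachment filtering that Phase 2 describes can be sketched as follows. The relation field names (`rel`, `attributes.name`) follow the ADO relations shape the command inspects; the helper itself is a hypothetical illustration, not part of the package.

```python
def filter_data_attachments(relations):
    """Keep only .csv/.txt attachments (case-insensitive) from a work item's relations."""
    results = []
    for rel in relations:
        # Only AttachedFile relations are attachments; other rel types are links
        if rel.get("rel") != "AttachedFile":
            continue
        name = rel.get("attributes", {}).get("name", "")
        if name.lower().endswith((".csv", ".txt")):
            results.append({"name": name, "url": rel.get("url")})
    return results
```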
+
+### Phase 2b: Deduplication Check — Query Existing Test Cases
+
+Before generating any new test cases, check whether the source work item already has linked test cases to prevent duplicates.
+
+1. Inspect the relations returned in Phase 2 — filter for link type `"Microsoft.VSTS.Common.TestedBy-Forward"` (i.e., *Tested By* links).
+2. For each linked test case ID found, call `wit_get_work_item` to retrieve its **title** and **state**.
+3. Build an **existing TC registry** — a list of `{ id, title, state }` for all currently linked test cases.
+4. In Phase 5, before calling `testplan_create_test_case` for each planned TC, compare its title (normalized: lowercase, trimmed) against every title in the registry.
+   - **If match found** and existing TC is in state `Design`, `Ready`, or `Closed`: skip creation, log `"Skipped — duplicate of TC #{id}"`.
+   - **If match found** but existing TC is in state `Removed`: create the new TC anyway (the old one was intentionally discarded).
+   - **If no match**: proceed with creation.
+5. Include a **Dedup Summary** section in the output report.
+
+Skip this check with `--skip-dedup`.
+
+---
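The dedup decision in steps 4a-4c above can be sketched as follows, assuming the registry shape from step 3. The function name and return shape are hypothetical; the command spec describes this logic in prose only.

```python
SKIP_STATES = {"Design", "Ready", "Closed"}  # a duplicate in these states blocks creation

def normalize(title: str) -> str:
    """Normalize a TC title for comparison: lowercase, trimmed."""
    return title.strip().lower()

def dedup_decision(planned_title, existing_registry):
    """Return (create?, reason) for a planned TC against the existing TC registry."""
    wanted = normalize(planned_title)
    for tc in existing_registry:
        if normalize(tc["title"]) == wanted:
            if tc["state"] in SKIP_STATES:
                return False, f"Skipped — duplicate of TC #{tc['id']}"
            # e.g. state Removed: the old TC was intentionally discarded
            return True, f"Recreating — existing TC #{tc['id']} is in state {tc['state']}"
    return True, "No match — proceed with creation"
```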
+
+### Phase 3: Identify Work Item Type and Extract Test Source Content
+
+Apply the correct extraction strategy based on work item type:
+
+#### If type is `Bug` or `Ticket`:
+
+Primary source — **Repro Steps** (`Microsoft.VSTS.TCM.ReproSteps`):
+- Each distinct action sequence is a candidate test case
+- The repro steps define the *negative path* (what triggers the bug)
+- Derive the *positive/fix-verification path* by inverting the expected outcome
+- Also read: **System Info** (`Microsoft.VSTS.TCM.SystemInfo`), **Description**, **QA Notes** (`CIIScrum.QANotes`)
+- Check `Custom.Whatisexpectedtohappen` and `Custom.Whatisactuallyhappening` to anchor pass/fail assertions
+
+Secondary sources:
+- Comments for tester observations or specific scenarios to cover
+- Attachments for error data or sample inputs
+
+#### If type is `User Story`:
+
+Primary source — **Acceptance Criteria** (`Microsoft.VSTS.Common.AcceptanceCriteria`):
+- Each acceptance criterion (Given/When/Then or checklist) maps to one or more test cases
+- Also read: **Description** for context and implied behaviors
+
+Secondary sources:
+- Comments for clarifications, edge cases raised in refinement, or stakeholder scenarios
+- Attachments for wireframes described in text, sample data, or business rules documents
+
+#### If type is unrecognized or fields are empty:
+
+Fall back to **Description** as the primary source. Extract any stated behaviors, expected outcomes, or constraints. Note the fallback in the output.
+
+**Cross-reference with codebase map** (if available):
+- Match mentioned components/features against `TEST_SURFACE.md` entry points
+- Check `RISK_MAP.md` for risk level of affected areas
+- Use `API_CONTRACTS.md` for exact endpoint shapes if the work item mentions API behavior
+- Use `CODE_PATTERNS.md` to align test step language with project conventions
+
+---
+
+### Phase 4: Analyze and Design Test Cases
+
+Before creating anything in Azure DevOps, plan out all test cases:
+
+**For each distinct scenario identified, determine:**
+
+1. **Test Case Title** — concise action-oriented name (e.g., "Verify guest pass entry counter resets at midnight")
+2. **Steps** — formatted as `{step action} | {expected result}` per step, using `|` as the delimiter
+3. **Priority** — 1 (Critical), 2 (High), 3 (Medium), 4 (Low)
+4. **Tags** — one or more of: `Smoke`, `Regression`, `Critical`, `AutomationCandidate`, `NeedsReview`
+5. **Preconditions** — required setup before executing the test
+6. **Confidence** — `Specified` or `Draft`
+
+**Minimum test case coverage per work item type:**
+
+| Scenario Type | Bug/Ticket | User Story |
+|---------------|-----------|------------|
+| Happy path (fix verified / AC met) | Required | Required per AC item |
+| Negative / error path | Required (original repro) | Where AC implies failure states |
+| Boundary / edge cases | If data-driven | If AC contains limits or conditions |
+| Boundary value triplets (n-1, n, n+1) | If limits detected | If AC contains limits/ranges |
+| Regression guard (related area) | Required | Required |
+
+#### Boundary Value Detection
+
+Scan all source content for **boundary keyword triggers**:
+
+> `max`, `min`, `limit`, `threshold`, `cap`, `ceiling`, `floor`, `range`, `between`, `up to`, `at most`, `at least`, `no more than`, `no fewer than`, `maximum`, `minimum`, `exactly`, `exceeds`, `boundary`
+
+When a trigger is found alongside a numeric value **N**:
+
+1. **Generate three test cases** (the boundary triplet):
+   - **N - 1** — just below the boundary
+   - **N** — exactly at the boundary
+   - **N + 1** — just above the boundary
+2. Title them clearly: e.g., `"Verify entry limit at 99 (below threshold)"`, `"...at 100 (at threshold)"`, `"...at 101 (above threshold)"`.
+3. Tag all three with `Regression`.
+4. If the boundary is on a critical-path field (per `CRITICAL_PATHS.md` or keyword detection), also tag `Critical`.
+
+If the source mentions a range, generate boundary triplets for **both** ends.
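The triplet rule above can be sketched as follows. The titles mirror the examples in the spec; the helper names are hypothetical.

```python
def boundary_triplet(n, label):
    """Generate the N-1 / N / N+1 test case titles for a detected boundary value N."""
    return [
        {"value": n - 1, "title": f"Verify {label} at {n - 1} (below threshold)", "tags": ["Regression"]},
        {"value": n, "title": f"Verify {label} at {n} (at threshold)", "tags": ["Regression"]},
        {"value": n + 1, "title": f"Verify {label} at {n + 1} (above threshold)", "tags": ["Regression"]},
    ]

def triplets_for_range(low, high, label):
    """A range yields boundary triplets for both ends, per the rule above."""
    return boundary_triplet(low, label) + boundary_triplet(high, label)
```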
+
+#### Tagging Rules
+
+| Tag | Assign when... |
+|-----|---------------|
+| `Smoke` | Verifies core, user-facing functionality that must work for the app to be usable at all. Limit to the most essential 1-2 cases per work item. |
+| `Regression` | Guards against the specific bug or behavior being re-introduced. Every fix-verification test for a Bug/Ticket should be tagged. For User Stories, tag tests covering AC that touches shared or high-traffic code paths. |
+| `Critical` | Covers functionality whose failure would directly impact revenue, security, data integrity, or legal compliance. **Also apply when critical keywords are detected** (see Keyword-Based Critical Tagging below). Apply conservatively. |
+| `AutomationCandidate` | Test has: (a) deterministic steps with no subjective judgment, (b) assertions based on concrete data/state, (c) no manual-only prerequisites. Advisory only — QA confirms. |
+
+**Do not assign Smoke to every test case.** Smoke tests are a small, fast-running set.
+
+#### Keyword-Based Critical Tagging
+
+Automatically tag as `Critical` when any of the following keywords appear in the source content:
+
+> `auth`, `authentication`, `login`, `password`, `OAuth`, `SSO`, `payment`, `billing`, `charge`, `invoice`, `PII`, `personal data`, `SSN`, `date of birth`, `security`, `encryption`, `token`, `certificate`, `data integrity`, `transaction`, `rollback`, `compliance`, `HIPAA`, `GDPR`, `SOC`, `audit`, `permission`, `role-based`, `access control`
+
+Cross-reference with `RISK_MAP.md` (if available) for additional risk-based tagging.
+
+#### Confidence Scoring
+
+| Confidence | Criteria | Behavior |
+|------------|----------|----------|
+| **Specified** | Source content explicitly describes the scenario, expected outcome, and data. | Create the TC normally. |
+| **Draft** | Scenario is implied or partially described — inferred from context or sparse source. | Prefix TC title with `[DRAFT]`. Add `NeedsReview` tag. Add final step: `"Review — this test case was auto-generated from sparse source material and requires QA validation before execution." | "QA has reviewed and confirmed or updated the steps."` |
+
+**Threshold**: If more than 50% of the source content fields are empty or contain fewer than 20 words, default all inferred TCs to Draft.
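The sparse-source threshold above can be sketched as follows, treating an empty field and a field under 20 words the same way. The helper name and parameters are hypothetical.

```python
def default_confidence(source_fields, sparse_ratio=0.5, min_words=20):
    """Default inferred TCs to Draft when the source material is sparse.

    A field counts as sparse when it is empty/missing or has fewer
    than min_words words; more than sparse_ratio sparse fields => Draft.
    """
    if not source_fields:
        return "Draft"
    sparse = sum(1 for f in source_fields if len((f or "").split()) < min_words)
    return "Draft" if sparse / len(source_fields) > sparse_ratio else "Specified"
```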
+
+#### Preconditions Block
+
+Every test case documents preconditions:
+
+| Field | Description | Example |
+|-------|-------------|--------|
+| **Required Role(s)** | User role(s) or permission level(s) needed | `Admin`, `Property Manager`, `Resident` |
+| **Application State** | System/feature state that must be true before step 1 | `User is logged in`, `Feature flag X is enabled` |
+| **Test Data** | Specific data that must exist or be created | `Resident account with active lease` |
+| **Environment** | Environment-specific requirements | `Staging`, `API key configured` |
+
+Prepend preconditions to the TC description field in Azure DevOps:
+
+```
+**Preconditions**
+- Role(s): {roles}
+- State: {state}
+- Test Data: {data}
+- Environment: {env}
+```
+
+If locator registry data is available, include relevant locator references in test steps for E2E-related scenarios.
+
+---
+
+### Phase 5: Create Test Cases in Azure DevOps
+
+**Dedup gate**: Before creating each TC, check against the registry from Phase 2b.
+
+For each planned test case, call `testplan_create_test_case` with:
+
+- `project`: the work item's project
+- `title`: the test case title — prefixed with `[DRAFT]` if confidence is Draft
+- `steps`: formatted as `1. {action}|{expected result}\n2. {action}|{expected result}` — use `|` as delimiter. **Never pass XML or pre-formatted `<steps>` markup** — the tool generates XML from plain-text format.
+- `priority`: numeric priority (1-4)
+- `iterationPath`: use `--iteration-path` override if provided, otherwise source work item's iteration path
+- `areaPath`: use `--area-path` override if provided, otherwise source work item's area path
+
+**After creating each test case:**
+
+1. Call `wit_add_artifact_link` or `wit_work_items_link` to link the new TC to the source work item using link type `"tested by"`:
+   ```
+   source work item --[Tested By]--> test case
+   ```
+2. Call `wit_update_work_item` on the new TC to set `System.Tags` to semicolon-separated tags (e.g., `"Regression; Critical; AutomationCandidate"`).
+   - Draft TCs always include `NeedsReview`.
+
+Create all test cases sequentially — capture each new TC ID before proceeding.
+
+---
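The plain-text `steps` format and the semicolon-separated `System.Tags` value that Phase 5 expects can be sketched as follows. The helper names are hypothetical; only the output formats come from the spec above.

```python
def format_steps(steps):
    """Render (action, expected) pairs as the plain-text '1. action|expected' format."""
    return "\n".join(
        f"{i}. {action}|{expected}"
        for i, (action, expected) in enumerate(steps, start=1)
    )

def format_tags(tags, draft=False):
    """Build the semicolon-separated System.Tags value; Draft TCs always carry NeedsReview."""
    all_tags = list(tags)
    if draft and "NeedsReview" not in all_tags:
        all_tags.append("NeedsReview")
    return "; ".join(all_tags)
```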
+
+### Phase 6: Synthesize the Output Report
+
+Save the report to `ai-tasks/ticket-$ARGUMENTS/test-cases.md`.
+
+**Required document structure:**
+
+```markdown
+# Test Cases: {work-item-id} — {Work Item Title}
+
+**Generated**: {current date}
+**Work Item**: [{work-item-id}]({azure-devops-url}) — {type} | {state}
+**Assigned To**: {assigned-to}
+**Area Path**: {area path}
+**Iteration**: {iteration path}
+**Test Source**: {Repro Steps / Acceptance Criteria / Description (fallback)}
+**Pipeline Context**: Codebase map: {yes/no}, Locator registry: {yes/no}, Preferences: {yes/no}
+
+---
+
+## Source Analysis
+
+### Work Item Summary
+{2-3 sentences describing the work item and what behavior needed to be tested.}
+
+### Key Scenarios Identified
+{Bulleted list of distinct testable scenarios extracted before designing test cases.}
+
+### Source Content Notes
+{Observations about quality/completeness of source material. Were repro steps/AC clear? Did comments add scenarios?}
+
+### Codebase Context Used
+{If codebase map was available: list which documents were read and what context they provided. If not available: note that test cases were generated without codebase context.}
+
+---
+
+## Test Cases Created
+
+### TC-{azure-devops-id}: {title}
+
+**Confidence**: `Specified` or `[DRAFT] — NeedsReview`
+**Tags**: `{Smoke}` · `{Regression}` · `{Critical}` · `{AutomationCandidate}` · `{NeedsReview}` *(show only tags that apply)*
+**Priority**: {1 – Critical / 2 – High / 3 – Medium / 4 – Low}
+**Linked To**: Work Item #{work-item-id} via *Tested By*
+**Azure DevOps ID**: {test-case-id}
+
+**Preconditions:**
+- **Role(s)**: {required roles or N/A}
+- **State**: {required application state or N/A}
+- **Test Data**: {required data or N/A}
+- **Environment**: {environment requirements or N/A}
+
+**Test Steps:**
+
+| # | Action | Expected Result |
+|---|--------|-----------------|
+| 1 | {action} | {expected result} |
+| 2 | {action} | {expected result} |
+
+{Repeat for each test case.}
+
+---
+
+## Tag Summary
+
+| Tag | Count | Test Case IDs |
+|-----|-------|---------------|
+| Smoke | {n} | {comma-separated IDs} |
+| Regression | {n} | {comma-separated IDs} |
+| Critical | {n} | {comma-separated IDs} |
+| AutomationCandidate | {n} | {comma-separated IDs} |
+| NeedsReview | {n} | {comma-separated IDs} |
+
+---
+
+## Dedup Summary
+
+| Planned Title | Skipped Reason | Existing TC |
+|---------------|---------------|-------------|
+| {title} | Duplicate of TC #{id} | #{id} — {state} |
+
+{If no duplicates: "No duplicates detected — all test cases were created."}
+
+---
+
+## Traceability
+
+All test cases linked to work item **#{work-item-id}** via *Tested By*.
+
+**Path Overrides Applied**: {If --area-path or --iteration-path provided, state them. Otherwise: "None — used source work item paths."}
+**Confidence Breakdown**: {n} Specified, {n} Draft (NeedsReview)
+**Boundary Triplets Generated**: {n} (from {n} detected boundaries)
+```
+
+---
|
|
354
|
+
|
|
355
|
+
### Phase 7: Attach Report to Source Work Item
|
|
356
|
+
|
|
357
|
+
**If `ADO_MCP_AUTH_TOKEN` is set:**
|
|
358
|
+
|
|
359
|
+
Upload `test-cases.md` as an attachment:
|
|
360
|
+
|
|
361
|
+
```bash
|
|
362
|
+
# Step 1: Upload file
|
|
363
|
+
ATTACHMENT_URL=$(curl -s \
|
|
364
|
+
--header "Authorization: Basic $(echo -n :${ADO_MCP_AUTH_TOKEN} | base64)" \
|
|
365
|
+
--header "Content-Type: application/octet-stream" \
|
|
366
|
+
--request POST \
|
|
367
|
+
--data-binary "@ai-tasks/ticket-$ARGUMENTS/test-cases.md" \
|
|
368
|
+
"https://dev.azure.com/{org}/{project}/_apis/wit/attachments?fileName=test-cases.md&api-version=7.1" \
|
|
369
|
+
| python3 -c "import sys,json; print(json.load(sys.stdin)['url'])")
|
|
370
|
+
|
|
371
|
+
# Step 2: Link attachment to work item
|
|
372
|
+
curl -s \
|
|
373
|
+
--header "Authorization: Basic $(echo -n :${ADO_MCP_AUTH_TOKEN} | base64)" \
|
|
374
|
+
--header "Content-Type: application/json-patch+json" \
|
|
375
|
+
--request PATCH \
|
|
376
|
+
--data "[{\"op\":\"add\",\"path\":\"/relations/-\",\"value\":{\"rel\":\"AttachedFile\",\"url\":\"${ATTACHMENT_URL}\",\"attributes\":{\"comment\":\"Generated test cases report\"}}}]" \
|
|
377
|
+
"https://dev.azure.com/{org}/{project}/_apis/wit/workItems/$ARGUMENTS?api-version=7.1"
|
|
378
|
+
```
+
+**If `ADO_MCP_AUTH_TOKEN` is NOT set (fallback):**
+
+Write the full report as HTML to the work item's `Custom.QATestCasesReport` field via `wit_update_work_item`. Include all sections converted to HTML.
+
+Note in the final report which method was used.
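The fallback conversion can be sketched as a naive heading/paragraph mapping; a real implementation would use a full Markdown renderer, and `report_to_html` is a hypothetical helper named for illustration:

```python
import html


def report_to_html(markdown_text: str) -> str:
    """Tiny markdown-to-HTML sketch for the fallback path:
    ##/### headings become <h2>/<h3>, every other non-blank
    line becomes an escaped <p>."""
    out = []
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # drop blank lines; HTML fields don't need them
        if stripped.startswith("### "):
            out.append(f"<h3>{html.escape(stripped[4:])}</h3>")
        elif stripped.startswith("## "):
            out.append(f"<h2>{html.escape(stripped[3:])}</h2>")
        else:
            out.append(f"<p>{html.escape(stripped)}</p>")
    return "\n".join(out)
```

Escaping matters here: the report contains literal `<` and `>` (arrows, generics) that would otherwise be parsed as tags inside the work item field.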
+
+---
+
+## Final Report to User
+
+After completing all phases, provide:
+
+1. Brief inline summary (2-3 sentences) of scenarios covered
+2. Full path to generated file: `ai-tasks/ticket-{id}/test-cases.md`
+3. Table of every created TC: ID, title, tags, confidence
+4. Counts by tag: Smoke, Regression, Critical, AutomationCandidate, NeedsReview
+5. Dedup summary: how many planned TCs were skipped
+6. Confidence summary: Specified vs Draft counts
+7. Boundary summary: how many boundary triplets generated
+8. Pipeline context: which codebase map documents and locator registry data were used
+9. Gaps or assumptions made
+10. Path override confirmation (if used)
+11. Report delivery confirmation (attached as file or written to custom field)
+
+$ARGUMENTS
package/commands/qa-create-test.md
CHANGED

@@ -1,6 +1,6 @@
 # QA Create Test
 
-Create, update, or generate tests from tickets — all in one command. Supports
+Create, update, or generate tests from tickets — all in one command. Supports five modes: generate tests from code analysis, generate tests from a ticket (Jira/Linear/GitHub), create Test Cases in Azure DevOps from a work item, update/improve existing tests, or generate POM files only. Uses Playwright MCP to extract real locators from the live app when available.
 
 ## Usage
 
@@ -14,6 +14,7 @@ Create, update, or generate tests from tickets — all in one command. Supports
 |------|---------|---------|
 | **From code** | Feature name (no URL, no path to tests) | `/qa-create-test login` |
 | **From ticket** | URL, shorthand (#123), or `--ticket` flag | `/qa-create-test https://github.com/org/repo/issues/42` |
+| **Azure DevOps** | `--ado` flag with work item ID or ADO URL | `/qa-create-test --ado 85508` |
 | **Update existing** | Path to existing test files or `--update` flag | `/qa-create-test --update tests/e2e/` |
 | **POM only** | `--pom-only` flag | `/qa-create-test --pom-only src/pages/` |
 
@@ -25,6 +26,10 @@ Create, update, or generate tests from tickets — all in one command. Supports
 - `--ticket <source>` — force ticket mode with: URL, shorthand (#123, org/repo#123), file path, or plain text
 - `--update <path>` — force update mode: audit and improve existing tests at path
 - `--scope fix|improve|add|full` — for update mode only (default: full)
+- `--ado <work-item-id>` — Azure DevOps mode: read a work item and create Test Cases in ADO (accepts ID or full ADO URL)
+- `--area-path <path>` — (ADO mode) override area path for created test cases (default: source work item's area path)
+- `--iteration-path <path>` — (ADO mode) override iteration path for created test cases (default: source work item's iteration path)
+- `--skip-dedup` — (ADO mode) skip deduplication check against existing linked test cases
 - `--pom-only [path]` — generate only Page Object Model files (BasePage + feature POMs), no test specs
 - `--framework <name>` — override framework auto-detection (playwright, cypress, selenium) — used with --pom-only
 
@@ -33,8 +38,9 @@ Create, update, or generate tests from tickets — all in one command. Supports
 ```
 if --pom-only:
   MODE = "pom-only"
-elif argument matches URL
-
+elif --ado flag OR argument matches ADO URL (dev.azure.com, *.visualstudio.com):
+  MODE = "ado"
+elif argument matches URL pattern (github.com, atlassian.net, linear.app) OR contains "#" + digits OR --ticket flag:
   MODE = "from-ticket"
 elif --update flag OR argument is path to existing test directory/files:
   MODE = "update"
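The detection order in the hunk above can be sketched as follows. This is a simplified model: flags are matched as raw substrings, and the existing-test-path branch of update mode is omitted:

```python
import re


def detect_mode(args: str) -> str:
    """Sketch of the mode-detection order: pom-only, then ADO,
    then ticket sources, then update, else from-code."""
    if "--pom-only" in args:
        return "pom-only"
    if "--ado" in args or re.search(r"dev\.azure\.com|\.visualstudio\.com", args):
        return "ado"
    if ("--ticket" in args
            or re.search(r"github\.com|atlassian\.net|linear\.app", args)
            or re.search(r"#\d+", args)):
        return "from-ticket"
    if "--update" in args:
        return "update"
    return "from-code"
```

The ordering matters: an ADO URL also looks like "a URL", so the ADO branch must be checked before the generic ticket branch, exactly as the pseudocode does.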
@@ -57,6 +63,13 @@ else:
 - Test spec files with `traces_to` fields linking back to ticket ACs
 - VALIDATION_REPORT.md
 
+### Azure DevOps Mode
+- Test Cases created directly in Azure DevOps (via `testplan_create_test_case`)
+- Test Cases linked to source work item via *Tested By* relationship
+- Tags applied: `Smoke`, `Regression`, `Critical`, `AutomationCandidate`, `NeedsReview`
+- `ai-tasks/ticket-{id}/test-cases.md` — structured report
+- Report attached to work item (if `ADO_MCP_AUTH_TOKEN` is set) or written to `Custom.QATestCasesReport` field (fallback)
+
 ### Update Mode
 - QA_AUDIT_REPORT.md — current quality assessment
 - Improved test files (after user approval)
@@ -70,8 +83,8 @@ Parse `$ARGUMENTS` to determine mode using the detection logic above.
 Print mode banner:
 ```
 === QA Create Test ===
-Mode: {from-code | from-ticket | update}
-Target: {feature name | ticket URL | test path}
+Mode: {from-code | from-ticket | ado | update | pom-only}
+Target: {feature name | ticket URL | ADO work item ID | test path}
 App URL: {url or "auto-detect"}
 ===========================
 ```
@@ -203,6 +216,34 @@ Key steps in the workflow:
 
 ---
 
+### ADO MODE (Azure DevOps)
+
+Create Test Cases directly in Azure DevOps from a work item. Reads the work item content (repro steps, acceptance criteria, comments, attachments), designs test cases with boundary detection and deduplication, and creates them in ADO with full traceability.
+
+**Prerequisites:** ADO MCP server must be connected (provides `wit_get_work_item`, `testplan_create_test_case`, etc.).
+
+Execute the full ADO workflow defined in `@commands/qa-create-test-ado.md`:
+
+1. **Phase 1** — Read pipeline context: CLAUDE.md, MY_PREFERENCES.md, codebase map, locator registry
+2. **Phase 2** — Retrieve work item with relations, comments, and attachments
+3. **Phase 2b** — Deduplication check against existing linked test cases (skip with `--skip-dedup`)
+4. **Phase 3** — Extract test source content based on work item type (Bug → Repro Steps, User Story → Acceptance Criteria)
+5. **Phase 4** — Design test cases with boundary value detection, tagging rules, confidence scoring, and preconditions
+6. **Phase 5** — Create test cases in ADO via `testplan_create_test_case`, link via *Tested By*, set tags
+7. **Phase 6** — Generate structured report to `ai-tasks/ticket-{id}/test-cases.md`
+8. **Phase 7** — Attach report to source work item
+
+**Key features:**
+- Boundary value triplets: detects `max`, `min`, `limit`, `threshold` keywords with numeric values → generates N-1, N, N+1 test cases
+- Deduplication: checks existing linked TCs before creating, prevents duplicates
+- Confidence scoring: `Specified` (explicit source) vs `Draft` (inferred, tagged `NeedsReview`)
+- Cross-references codebase map for project-specific context when available
+- Supports `--area-path` and `--iteration-path` overrides
+
+For the complete step-by-step process, see `@commands/qa-create-test-ado.md`.
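The boundary-triplet detection described in the key features can be sketched as a regex scan. The keyword list matches the feature description, but the 30-character window between keyword and number is an illustrative assumption:

```python
import re

# Matches a boundary keyword (max, min, limit, threshold, plus suffixes
# like "maximum") followed, within 30 non-digit characters, by a number.
BOUNDARY_RE = re.compile(
    r"\b(?:max|min|limit|threshold)\w*\D{0,30}?(\d+)", re.IGNORECASE
)


def boundary_triplets(text: str) -> list[tuple[int, int, int]]:
    """For each detected boundary value N, emit the (N-1, N, N+1)
    test-case triplet."""
    return [(int(m.group(1)) - 1, int(m.group(1)), int(m.group(1)) + 1)
            for m in BOUNDARY_RE.finditer(text)]
```

So an acceptance criterion like "maximum of 50 characters" yields test cases at 49, 50, and 51, which is exactly the below/at/above-boundary coverage the feature promises.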
+
+---
+
 ### UPDATE MODE
 
 1. Read `CLAUDE.md` — quality gates, locator tiers, assertion rules, POM rules.
package/commands/qa-fix.md
CHANGED
@@ -325,7 +325,7 @@ The validator runs 4 layers per file:
 - Cross-reference test locators against real DOM elements
 - Flag locators that don't match, auto-fix mismatches
 
-Max
+Max 5 fix loop iterations. Produces VALIDATION_REPORT.md.
 
 If `--run` flag is also present and E2E test files exist, invoke E2E runner after static validation:
 
@@ -357,20 +357,99 @@ Same as fix mode below but skip Step 4 (auto-fix). Only classify and report.
 
 ### FIX MODE (default)
 
+Fix mode runs in two phases: **analyze first, then fix after user confirmation.**
+
+#### Phase 1: Analyze & Present Plan (always runs)
+
 1. Read `CLAUDE.md` — classification rules, locator tiers, assertion quality.
-2. Invoke bug-detective agent:
+2. Invoke bug-detective agent in **classify-only mode** (no auto-fixes):
 
 Task(
   prompt="
-    <objective>Run tests
+    <objective>Run tests and classify failures. Do NOT auto-fix anything yet — this is the analysis phase. Use Playwright MCP to reproduce E2E failures in the browser when available — navigate to failing pages, snapshot DOM, reproduce actions, and screenshot failure state for evidence.</objective>
     <execution_context>@agents/qaa-bug-detective.md</execution_context>
     <files_to_read>
     - CLAUDE.md
     - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
     - .qa-output/locators/LOCATOR_REGISTRY.md (if exists)
+    - .qa-output/codebase/CODE_PATTERNS.md (if exists)
+    - .qa-output/codebase/API_CONTRACTS.md (if exists)
+    - .qa-output/codebase/TEST_SURFACE.md (if exists)
+    - .qa-output/codebase/TESTABILITY.md (if exists)
     </files_to_read>
     <parameters>
     user_input: $ARGUMENTS
+    mode: classify-only
+    app_url: {auto-detect from test config baseURL, or ask user}
+    </parameters>
+  "
+)
+
+3. Present the analysis to the user as a **fix plan**:
+
+```
+=== FIX PLAN ===
+
+Tests run: {N}
+Passed: {N}
+Failed: {N}
+
+Failures classified:
+  APPLICATION BUG: {N} (will NOT be touched)
+  TEST CODE ERROR: {N} (can auto-fix)
+  ENVIRONMENT ISSUE: {N} (resolution steps provided)
+  INCONCLUSIVE: {N} (needs more info)
+
+Proposed auto-fixes (TEST CODE ERRORS only):
+1. {file}:{line} — {description of fix} [HIGH confidence]
+2. {file}:{line} — {description of fix} [HIGH confidence]
+3. {file}:{line} — {description of fix} [MEDIUM — flagged for review]
+
+APPLICATION BUGs found (for developer action):
+1. {file}:{line} — {description}
+2. {file}:{line} — {description}
+
+Proceed with auto-fixes? [yes / modify / cancel]
+================
+```
+
+4. **Wait for user confirmation.** Do NOT proceed until the user approves. This is a refinement loop — repeat until the user is satisfied:
+
+   - **"yes"** / **"proceed"** / **"dale"** → continue to Phase 2
+   - **"cancel"** / **"no"** → stop, deliver only the FAILURE_CLASSIFICATION_REPORT.md (same as --classify)
+   - **Any other response** (feedback, modifications, additions) → treat as a refinement request:
+     - Adjust the fix plan based on the user's instructions (add fixes, remove fixes, change approach, add new checks)
+     - Re-present the updated Fix Plan showing what changed
+     - Wait for user confirmation again
+     - Repeat this loop until the user says "yes" or "cancel"
+
+   **Examples of refinement requests:**
+   - "this looks good, but also fix the imports in utils" → add that fix to the plan
+   - "don't touch the login file, leave it as is" → remove that file from the plan
+   - "switch the selector to getByRole instead of getByTestId" → update the proposed fix
+   - "also add a check that the status is 200" → add assertion fix to the plan
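The confirmation loop above routes three kinds of replies. A minimal sketch of that routing; the keyword sets are illustrative, since the real command matches intent rather than exact strings:

```python
def classify_response(reply: str) -> str:
    """Route a user's reply to the Fix Plan prompt:
    approve -> execute, reject -> report only, anything else -> refine."""
    normalized = reply.strip().lower()
    if normalized in {"yes", "proceed", "dale"}:
        return "execute-fixes"   # continue to Phase 2
    if normalized in {"no", "cancel"}:
        return "report-only"     # deliver only the classification report
    return "refine-plan"         # adjust the plan and re-present it
```

The important design choice is the catch-all: any free-form reply is treated as plan feedback, so the loop never dead-ends on an unrecognized answer.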
+
+#### Phase 2: Execute Fixes (only after user confirmation)
+
+5. Invoke bug-detective agent in **fix mode** with the confirmed plan:
+
+Task(
+  prompt="
+    <objective>Auto-fix the confirmed TEST CODE ERRORS from the analysis phase. Use Playwright MCP to reproduce E2E failures in the browser when available — navigate to failing pages, snapshot DOM, reproduce actions, and screenshot failure state for evidence.</objective>
+    <execution_context>@agents/qaa-bug-detective.md</execution_context>
+    <files_to_read>
+    - CLAUDE.md
+    - ~/.claude/qaa/MY_PREFERENCES.md (if exists)
+    - .qa-output/locators/LOCATOR_REGISTRY.md (if exists)
+    - .qa-output/codebase/CODE_PATTERNS.md (if exists)
+    - .qa-output/codebase/API_CONTRACTS.md (if exists)
+    - .qa-output/codebase/TEST_SURFACE.md (if exists)
+    - .qa-output/codebase/TESTABILITY.md (if exists)
+    - .qa-output/FAILURE_CLASSIFICATION_REPORT.md
+    </files_to_read>
+    <parameters>
+    user_input: $ARGUMENTS
+    mode: fix
     app_url: {auto-detect from test config baseURL, or ask user}
     </parameters>
   "
@@ -394,6 +473,35 @@ Task(
 - Element exists, wrong behavior → APPLICATION BUG
 - Page doesn't load → ENVIRONMENT ISSUE
 
-
+6. Present results. APPLICATION BUGs are reported for developer action, not auto-fixed.
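The decision rules in the context above can be sketched as a small classifier. Only two rules are visible in this hunk; the TEST CODE ERROR branch (locator not found on a page that did load) is an assumption extrapolated from them:

```python
def classify_failure(page_loaded: bool, element_found: bool,
                     behavior_matches_spec: bool) -> str:
    """Sketch of the failure-classification decision rules."""
    if not page_loaded:
        return "ENVIRONMENT ISSUE"   # page doesn't load
    if not element_found:
        return "TEST CODE ERROR"     # assumed: selector/locator mismatch
    if not behavior_matches_spec:
        return "APPLICATION BUG"     # element exists, wrong behavior
    return "INCONCLUSIVE"            # failure not reproducible in browser
```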
 
 $ARGUMENTS
+
+## MANDATORY verification — run ALL commands below, no exceptions, no skipping
+
+Before returning control, copy-paste and run this ENTIRE block. Do NOT decide which commands "apply" — run all of them every time. The output confirms what happened; you do not get to assume the answer.
+
+```bash
+echo "=== QA-FIX CHECKLIST START ==="
+echo "1. Failure classification report:"
+ls .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_CLASSIFICATION_REPORT"
+echo "2. Locator Registry:"
+ls .qa-output/locators/ 2>/dev/null || echo "NO_LOCATORS_FOUND"
+echo "3. MY_PREFERENCES.md:"
+cat ~/.claude/qaa/MY_PREFERENCES.md 2>/dev/null || echo "FILE_NOT_FOUND"
+echo "4. Codebase map (context for bug-detective):"
+ls .qa-output/codebase/ 2>/dev/null || echo "NO_CODEBASE_MAP"
+echo "5. Test files in scope:"
+find tests/ cypress/ __tests__/ e2e/ spec/ -type f \( -name "*.spec.*" -o -name "*.test.*" -o -name "*.e2e.*" \) 2>/dev/null | head -20 || echo "NO_TEST_FILES_FOUND"
+echo "6. MCP evidence (if browser was used):"
+ls .qa-output/mcp-evidence/ 2>/dev/null || echo "NO_MCP_EVIDENCE"
+echo "7. Classification categories in report:"
+grep -cE "APPLICATION BUG|TEST CODE ERROR|ENVIRONMENT ISSUE|INCONCLUSIVE" .qa-output/FAILURE_CLASSIFICATION_REPORT.md 2>/dev/null || echo "NO_CLASSIFICATIONS"
+echo "=== QA-FIX CHECKLIST END ==="
+```
+
+**Rules:**
+- Run the block AS-IS. Do not modify it. Do not split it. Do not skip lines.
+- If any output shows a problem (NO_CLASSIFICATION_REPORT after fix mode completed), fix it before returning.
+- If output shows expected "not found" results (e.g., NO_MCP_EVIDENCE when no browser was used), that is fine — the point is you RAN the command instead of assuming the answer.
+- Do NOT mark this task as complete until the block has been executed and you have read every line of output.
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "qaa-agent",
-  "version": "1.8.1",
+  "version": "1.8.6",
   "description": "QA Automation Agent for Claude Code — multi-agent pipeline that analyzes repos, generates tests, validates, and creates PRs",
   "bin": {
     "qaa-agent": "./bin/install.cjs"
@@ -15,10 +15,6 @@
     "pytest",
     "ai-agent"
   ],
-  "repository": {
-    "type": "git",
-    "url": "https://github.com/Backhaus7997/qaa-testing.git"
-  },
   "author": "Backhaus7997",
   "license": "MIT",
   "dependencies": {