qaa-agent 1.6.2 → 1.7.0

Files changed (78)
  1. package/.mcp.json +8 -8
  2. package/CHANGELOG.md +93 -71
  3. package/CLAUDE.md +553 -553
  4. package/agents/qa-pipeline-orchestrator.md +1378 -1378
  5. package/agents/qaa-analyzer.md +539 -524
  6. package/agents/qaa-bug-detective.md +479 -446
  7. package/agents/qaa-codebase-mapper.md +935 -935
  8. package/agents/qaa-discovery.md +384 -0
  9. package/agents/qaa-e2e-runner.md +416 -415
  10. package/agents/qaa-executor.md +651 -651
  11. package/agents/qaa-planner.md +405 -390
  12. package/agents/qaa-project-researcher.md +319 -319
  13. package/agents/qaa-scanner.md +424 -424
  14. package/agents/qaa-testid-injector.md +643 -585
  15. package/agents/qaa-validator.md +490 -452
  16. package/bin/install.cjs +200 -198
  17. package/bin/lib/commands.cjs +709 -709
  18. package/bin/lib/config.cjs +307 -307
  19. package/bin/lib/core.cjs +497 -497
  20. package/bin/lib/frontmatter.cjs +299 -299
  21. package/bin/lib/init.cjs +989 -989
  22. package/bin/lib/milestone.cjs +241 -241
  23. package/bin/lib/model-profiles.cjs +60 -60
  24. package/bin/lib/phase.cjs +911 -911
  25. package/bin/lib/roadmap.cjs +306 -306
  26. package/bin/lib/state.cjs +748 -748
  27. package/bin/lib/template.cjs +222 -222
  28. package/bin/lib/verify.cjs +842 -842
  29. package/bin/qaa-tools.cjs +607 -607
  30. package/commands/qa-audit.md +119 -0
  31. package/commands/qa-create-test.md +288 -0
  32. package/commands/qa-fix.md +147 -0
  33. package/commands/qa-map.md +137 -0
  34. package/{.claude/commands → commands}/qa-pr.md +23 -23
  35. package/{.claude/commands → commands}/qa-start.md +22 -22
  36. package/{.claude/commands → commands}/qa-testid.md +19 -19
  37. package/docs/COMMANDS.md +341 -341
  38. package/docs/DEMO.md +182 -182
  39. package/docs/TESTING.md +156 -156
  40. package/package.json +6 -7
  41. package/{.claude/settings.json → settings.json} +1 -2
  42. package/templates/failure-classification.md +391 -391
  43. package/templates/gap-analysis.md +409 -409
  44. package/templates/pr-template.md +48 -48
  45. package/templates/qa-analysis.md +381 -381
  46. package/templates/qa-audit-report.md +465 -465
  47. package/templates/qa-repo-blueprint.md +636 -636
  48. package/templates/scan-manifest.md +312 -312
  49. package/templates/test-inventory.md +582 -582
  50. package/templates/testid-audit-report.md +354 -354
  51. package/templates/validation-report.md +243 -243
  52. package/workflows/qa-analyze.md +296 -296
  53. package/workflows/qa-from-ticket.md +536 -536
  54. package/workflows/qa-gap.md +309 -303
  55. package/workflows/qa-pr.md +389 -389
  56. package/workflows/qa-start.md +1192 -1168
  57. package/workflows/qa-testid.md +384 -356
  58. package/workflows/qa-validate.md +299 -295
  59. package/.claude/commands/create-test.md +0 -164
  60. package/.claude/commands/qa-audit.md +0 -37
  61. package/.claude/commands/qa-blueprint.md +0 -54
  62. package/.claude/commands/qa-fix.md +0 -36
  63. package/.claude/commands/qa-from-ticket.md +0 -24
  64. package/.claude/commands/qa-gap.md +0 -20
  65. package/.claude/commands/qa-map.md +0 -47
  66. package/.claude/commands/qa-pom.md +0 -36
  67. package/.claude/commands/qa-pyramid.md +0 -37
  68. package/.claude/commands/qa-report.md +0 -38
  69. package/.claude/commands/qa-research.md +0 -33
  70. package/.claude/commands/qa-validate.md +0 -42
  71. package/.claude/commands/update-test.md +0 -58
  72. package/.claude/skills/qa-learner/SKILL.md +0 -150
  73. package/{.claude/skills → skills}/qa-bug-detective/SKILL.md +0 -0
  74. package/{.claude/skills → skills}/qa-repo-analyzer/SKILL.md +0 -0
  75. package/{.claude/skills → skills}/qa-self-validator/SKILL.md +0 -0
  76. package/{.claude/skills → skills}/qa-template-engine/SKILL.md +0 -0
  77. package/{.claude/skills → skills}/qa-testid-injector/SKILL.md +0 -0
  78. package/{.claude/skills → skills}/qa-workflow-documenter/SKILL.md +0 -0
package/docs/DEMO.md CHANGED
@@ -1,182 +1,182 @@
- # QAA — QA Automation Agent
-
- ## What is it?
-
- QAA is a multi-agent system that automates QA test creation for any software project. You point it at a codebase, and it analyzes the architecture, maps the code, generates a full test suite following industry standards, validates everything, and delivers the result as a draft pull request — ready for review.
-
- No manual test writing. No guessing what to cover. One command, full pipeline.
-
- ## The Problem
-
- Writing test suites is slow, repetitive, and often inconsistent. Teams face:
-
- - **Starting from zero is painful** — a new project with no tests means weeks of setup before the first real test runs
- - **Coverage gaps are invisible** — without analysis, teams don't know what's missing until something breaks in production
- - **Standards drift** — different team members write tests differently: inconsistent locators, vague assertions, mixed naming conventions
- - **QA is always behind dev** — features ship faster than tests get written, and the gap keeps growing
- - **Existing QA teams still spend hours on repetitive work** — even with a mature test suite, adding tests for new features means manually inspecting pages, finding locators, writing POMs, running tests, fixing failures, repeat
-
- ## The Solution
-
- QAA runs a pipeline of specialized AI agents, each responsible for one stage:
-
- ```
- scan → map → analyze → plan → generate → validate → deliver
- ```
-
- | Stage | What happens | Output |
- |-------|-------------|--------|
- | **Scan** | Detects framework, language, testable surfaces | SCAN_MANIFEST.md |
- | **Map** | Deep-scans codebase for testability, risk, patterns, existing tests (4 parallel agents) | 8 codebase documents |
- | **Analyze** | Produces risk assessment, test inventory, testing pyramid | QA_ANALYSIS.md, TEST_INVENTORY.md |
- | **Plan** | Groups test cases by feature, assigns to files, resolves dependencies | GENERATION_PLAN.md |
- | **Generate** | Writes test files, POMs, fixtures, configs following project standards | Test suite on disk |
- | **Validate** | 4-layer validation (syntax, structure, dependencies, logic) with auto-fix | VALIDATION_REPORT.md |
- | **Deliver** | Creates branch, commits per stage, pushes, opens draft PR | Pull request URL |
-
- Every agent reads the project's QA standards (CLAUDE.md) before producing output. Every test case has a unique ID, concrete inputs, and explicit expected outcomes — never "works correctly."
-
- ## Three Workflows
-
- QAA adapts to where the project is in its QA maturity:
-
- **1. No QA repo yet** — `/qa-start --dev-repo ./myproject`
- Full pipeline from scratch. Produces a complete test suite, QA repo blueprint, and a draft PR with everything.
-
- **2. Immature QA repo** — `/qa-start --dev-repo ./myproject --qa-repo ./tests`
- Scans both repos, identifies gaps, fixes broken tests, adds missing coverage, standardizes existing tests.
-
- **3. Mature QA repo** — `/qa-start --dev-repo ./myproject --qa-repo ./tests`
- Only adds surgical test additions where coverage is thin. Doesn't touch working tests.
-
- ## The "Brain" — Codebase Map
-
- Before generating anything, QAA maps the entire codebase with 4 parallel agents:
-
- - **Testability** — what's testable, pure functions vs stateful code, mock boundaries
- - **Risk** — business-critical paths, security-sensitive areas, data integrity risks
- - **Patterns** — naming conventions, API shapes, import style, code patterns
- - **Existing tests** — current test quality, frameworks in use, coverage gaps
-
- These 8 documents become the shared context that every downstream agent reads. The analyzer uses risk data to prioritize tests. The planner uses testability data to estimate complexity. The executor uses code patterns to generate tests that match the project's style.
-
- Result: generated tests feel native to the codebase, not generic boilerplate.
-
- ## Day-to-Day for a QA Engineer
-
- This is where QAA shines for teams that already have a mature QA repo. The full pipeline is for bootstrapping — but the real daily value is the targeted workflow.
-
- ### The scenario
-
- You're a QA engineer. A developer just shipped a new "password reset" feature. You need tests. Here's what happens:
-
- ### Step 1: Map the codebase (once)
-
- ```
- /qa-map
- ```
-
- QAA scans the entire project and builds its "brain" — 8 documents covering testability, risk areas, API contracts, code patterns, and existing test coverage. This runs once and stays valid until the codebase changes significantly.
-
- ### Step 2: Create tests for the feature
-
- ```
- /create-test "password reset"
- ```
-
- QAA already knows the codebase. It reads the brain documents, finds the relevant source files (`auth.service.ts`, `reset.controller.ts`, the reset page component), understands the API contracts, and generates:
-
- - Unit tests for the reset token logic with concrete inputs and expected outputs
- - API tests for `POST /api/auth/reset-password` with real request/response shapes
- - E2E tests with Page Object Models that use the project's existing POM base class
- - Fixtures with test data (fake emails, expired tokens, invalid tokens)
-
- All following the project's naming conventions, import style, and assertion patterns.
-
- ### Step 3: Validate and fix in a loop
-
- ```
- /qa-validate ./tests
- ```
-
- The validator runs 4 layers of checks on every generated file:
-
- 1. **Syntax** — does it parse? Are imports correct?
- 2. **Structure** — does it follow POM rules? Are locators in the right tier?
- 3. **Dependencies** — do all imports resolve? Are mocks set up correctly?
- 4. **Logic** — are assertions concrete? Do test IDs follow the convention?
-
- If issues are found, the validator auto-fixes them and re-checks — up to 3 loops. If something still fails, the bug detective classifies it: is it an application bug, a test code error, or an environment issue?
-
- ### Step 4: Run the tests with Playwright
-
- QAA integrates with Playwright to actually execute the generated E2E tests against a running application. It opens the browser, navigates pages, fills forms, clicks buttons, and captures what happens. If a test fails, it reads the error, inspects the page state, and determines whether the locator is wrong, the page changed, or there's a real bug.
-
- The loop looks like this:
-
- ```
- generate → validate → run → failures? → classify → fix test code → run again → pass
- ```
-
- This continues until the tests pass or the issue is classified as an application bug that needs a developer fix.
-
- ### Step 5: Ship it
-
- ```
- /qa-pr --ticket PROJ-456 "password reset tests"
- ```
-
- QAA creates a branch following your team's naming convention (it asked you once and remembers forever), commits the test files, pushes, and opens a draft PR on GitHub, Azure DevOps, or GitLab — whatever your team uses. You get the link.
-
- ### The full daily flow
-
- ```
- /qa-map → builds the "brain" (once)
- /create-test "password reset" → generates tests using codebase knowledge
- /qa-validate ./tests/unit/auth* → validates + auto-fixes
- /qa-pr --ticket PROJ-456 "password reset tests" → draft PR with link
- ```
-
- From ticket to PR in minutes, not hours. And the tests follow the same standards as every other test in the repo because QAA read the existing patterns first.
-
- ### What about tickets?
-
- If you work from Jira, Linear, or GitHub Issues, skip the manual description:
-
- ```
- /qa-from-ticket https://company.atlassian.net/browse/PROJ-456
- ```
-
- QAA fetches the ticket, extracts acceptance criteria and edge cases, maps each criterion to test cases with a traceability matrix, generates the tests, validates them, and gives you a report showing which AC is covered by which test.
-
- ### When tests break after a deploy
-
- ```
- /qa-fix ./tests/e2e/checkout*
- ```
-
- QAA reads the failing tests, runs them, classifies each failure (app bug vs test code error vs environment issue), and auto-fixes the test code errors. Application bugs get flagged for the dev team with evidence — the exact assertion that failed, what was expected, and what was received.
-
- ## Standards
-
- Every test artifact follows strict rules:
-
- - **Testing pyramid** — 60-70% unit, 10-15% integration, 20-25% API, 3-5% E2E
- - **Locator hierarchy** — data-testid first, ARIA roles, labels, CSS as last resort (with TODO)
- - **Page Object Model** — one class per page, no assertions in POMs, locators as properties
- - **Assertions** — concrete values only. `expect(status).toBe(200)` not `expect(status).toBeTruthy()`
- - **Naming** — unique IDs per test case: `UT-AUTH-001`, `API-USERS-003`, `E2E-CHECKOUT-001`
-
- ## Learning System
-
- QAA remembers your preferences across sessions. When you correct it — "use Playwright, not Cypress" or "our branches start with feature/" — it saves the rule permanently. Next time, every agent reads your preferences before generating output.
-
- Preferences override defaults. Your team's conventions always win.
-
- ## Numbers
-
- 17 commands. 7 skills. 11 agents. 10 templates. 7 workflows.
-
- Supports GitHub, Azure DevOps, and GitLab. Works with Playwright, Cypress, Jest, Vitest, pytest, and more — detects what the project uses and matches it.
-
- One goal: you focus on building features, QAA handles the tests.
+ # QAA — QA Automation Agent
+
+ ## What is it?
+
+ QAA is a multi-agent system that automates QA test creation for any software project. You point it at a codebase, and it analyzes the architecture, maps the code, generates a full test suite following industry standards, validates everything, and delivers the result as a draft pull request — ready for review.
+
+ No manual test writing. No guessing what to cover. One command, full pipeline.
+
+ ## The Problem
+
+ Writing test suites is slow, repetitive, and often inconsistent. Teams face:
+
+ - **Starting from zero is painful** — a new project with no tests means weeks of setup before the first real test runs
+ - **Coverage gaps are invisible** — without analysis, teams don't know what's missing until something breaks in production
+ - **Standards drift** — different team members write tests differently: inconsistent locators, vague assertions, mixed naming conventions
+ - **QA is always behind dev** — features ship faster than tests get written, and the gap keeps growing
+ - **Existing QA teams still spend hours on repetitive work** — even with a mature test suite, adding tests for new features means manually inspecting pages, finding locators, writing POMs, running tests, fixing failures, repeat
+
+ ## The Solution
+
+ QAA runs a pipeline of specialized AI agents, each responsible for one stage:
+
+ ```
+ scan → map → analyze → plan → generate → validate → deliver
+ ```
+
+ | Stage | What happens | Output |
+ |-------|-------------|--------|
+ | **Scan** | Detects framework, language, testable surfaces | SCAN_MANIFEST.md |
+ | **Map** | Deep-scans codebase for testability, risk, patterns, existing tests (4 parallel agents) | 8 codebase documents |
+ | **Analyze** | Produces risk assessment, test inventory, testing pyramid | QA_ANALYSIS.md, TEST_INVENTORY.md |
+ | **Plan** | Groups test cases by feature, assigns to files, resolves dependencies | GENERATION_PLAN.md |
+ | **Generate** | Writes test files, POMs, fixtures, configs following project standards | Test suite on disk |
+ | **Validate** | 4-layer validation (syntax, structure, dependencies, logic) with auto-fix | VALIDATION_REPORT.md |
+ | **Deliver** | Creates branch, commits per stage, pushes, opens draft PR | Pull request URL |
+
+ Every agent reads the project's QA standards (CLAUDE.md) before producing output. Every test case has a unique ID, concrete inputs, and explicit expected outcomes — never "works correctly."
+
+ ## Three Workflows
+
+ QAA adapts to where the project is in its QA maturity:
+
+ **1. No QA repo yet** — `/qa-start --dev-repo ./myproject`
+ Full pipeline from scratch. Produces a complete test suite, QA repo blueprint, and a draft PR with everything.
+
+ **2. Immature QA repo** — `/qa-start --dev-repo ./myproject --qa-repo ./tests`
+ Scans both repos, identifies gaps, fixes broken tests, adds missing coverage, standardizes existing tests.
+
+ **3. Mature QA repo** — `/qa-start --dev-repo ./myproject --qa-repo ./tests`
+ Only adds surgical test additions where coverage is thin. Doesn't touch working tests.
+
+ ## The "Brain" — Codebase Map
+
+ Before generating anything, QAA maps the entire codebase with 4 parallel agents:
+
+ - **Testability** — what's testable, pure functions vs stateful code, mock boundaries
+ - **Risk** — business-critical paths, security-sensitive areas, data integrity risks
+ - **Patterns** — naming conventions, API shapes, import style, code patterns
+ - **Existing tests** — current test quality, frameworks in use, coverage gaps
+
+ These 8 documents become the shared context that every downstream agent reads. The analyzer uses risk data to prioritize tests. The planner uses testability data to estimate complexity. The executor uses code patterns to generate tests that match the project's style.
+
+ Result: generated tests feel native to the codebase, not generic boilerplate.
+
+ ## Day-to-Day for a QA Engineer
+
+ This is where QAA shines for teams that already have a mature QA repo. The full pipeline is for bootstrapping — but the real daily value is the targeted workflow.
+
+ ### The scenario
+
+ You're a QA engineer. A developer just shipped a new "password reset" feature. You need tests. Here's what happens:
+
+ ### Step 1: Map the codebase (once)
+
+ ```
+ /qa-map
+ ```
+
+ QAA scans the entire project and builds its "brain" — 8 documents covering testability, risk areas, API contracts, code patterns, and existing test coverage. This runs once and stays valid until the codebase changes significantly.
+
+ ### Step 2: Create tests for the feature
+
+ ```
+ /create-test "password reset"
+ ```
+
+ QAA already knows the codebase. It reads the brain documents, finds the relevant source files (`auth.service.ts`, `reset.controller.ts`, the reset page component), understands the API contracts, and generates:
+
+ - Unit tests for the reset token logic with concrete inputs and expected outputs
+ - API tests for `POST /api/auth/reset-password` with real request/response shapes
+ - E2E tests with Page Object Models that use the project's existing POM base class
+ - Fixtures with test data (fake emails, expired tokens, invalid tokens)
+
+ All following the project's naming conventions, import style, and assertion patterns.
+
+ ### Step 3: Validate and fix in a loop
+
+ ```
+ /qa-validate ./tests
+ ```
+
+ The validator runs 4 layers of checks on every generated file:
+
+ 1. **Syntax** — does it parse? Are imports correct?
+ 2. **Structure** — does it follow POM rules? Are locators in the right tier?
+ 3. **Dependencies** — do all imports resolve? Are mocks set up correctly?
+ 4. **Logic** — are assertions concrete? Do test IDs follow the convention?
+
+ If issues are found, the validator auto-fixes them and re-checks — up to 3 loops. If something still fails, the bug detective classifies it: is it an application bug, a test code error, or an environment issue?
+
+ ### Step 4: Run the tests with Playwright
+
+ QAA integrates with Playwright to actually execute the generated E2E tests against a running application. It opens the browser, navigates pages, fills forms, clicks buttons, and captures what happens. If a test fails, it reads the error, inspects the page state, and determines whether the locator is wrong, the page changed, or there's a real bug.
+
+ The loop looks like this:
+
+ ```
+ generate → validate → run → failures? → classify → fix test code → run again → pass
+ ```
+
+ This continues until the tests pass or the issue is classified as an application bug that needs a developer fix.
+
+ ### Step 5: Ship it
+
+ ```
+ /qa-pr --ticket PROJ-456 "password reset tests"
+ ```
+
+ QAA creates a branch following your team's naming convention (it asked you once and remembers forever), commits the test files, pushes, and opens a draft PR on GitHub, Azure DevOps, or GitLab — whatever your team uses. You get the link.
+
+ ### The full daily flow
+
+ ```
+ /qa-map → builds the "brain" (once)
+ /create-test "password reset" → generates tests using codebase knowledge
+ /qa-validate ./tests/unit/auth* → validates + auto-fixes
+ /qa-pr --ticket PROJ-456 "password reset tests" → draft PR with link
+ ```
+
+ From ticket to PR in minutes, not hours. And the tests follow the same standards as every other test in the repo because QAA read the existing patterns first.
+
+ ### What about tickets?
+
+ If you work from Jira, Linear, or GitHub Issues, skip the manual description:
+
+ ```
+ /qa-from-ticket https://company.atlassian.net/browse/PROJ-456
+ ```
+
+ QAA fetches the ticket, extracts acceptance criteria and edge cases, maps each criterion to test cases with a traceability matrix, generates the tests, validates them, and gives you a report showing which AC is covered by which test.
+
+ ### When tests break after a deploy
+
+ ```
+ /qa-fix ./tests/e2e/checkout*
+ ```
+
+ QAA reads the failing tests, runs them, classifies each failure (app bug vs test code error vs environment issue), and auto-fixes the test code errors. Application bugs get flagged for the dev team with evidence — the exact assertion that failed, what was expected, and what was received.
+
+ ## Standards
+
+ Every test artifact follows strict rules:
+
+ - **Testing pyramid** — 60-70% unit, 10-15% integration, 20-25% API, 3-5% E2E
+ - **Locator hierarchy** — data-testid first, ARIA roles, labels, CSS as last resort (with TODO)
+ - **Page Object Model** — one class per page, no assertions in POMs, locators as properties
+ - **Assertions** — concrete values only. `expect(status).toBe(200)` not `expect(status).toBeTruthy()`
+ - **Naming** — unique IDs per test case: `UT-AUTH-001`, `API-USERS-003`, `E2E-CHECKOUT-001`
+
+ ## Learning System
+
+ QAA remembers your preferences across sessions. When you correct it — "use Playwright, not Cypress" or "our branches start with feature/" — it saves the rule permanently. Next time, every agent reads your preferences before generating output.
+
+ Preferences override defaults. Your team's conventions always win.
+
+ ## Numbers
+
+ 17 commands. 7 skills. 11 agents. 10 templates. 7 workflows.
+
+ Supports GitHub, Azure DevOps, and GitLab. Works with Playwright, Cypress, Jest, Vitest, pytest, and more — detects what the project uses and matches it.
+
+ One goal: you focus on building features, QAA handles the tests.
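
The naming rule in the Standards section of DEMO.md (`UT-AUTH-001`, `API-USERS-003`, `E2E-CHECKOUT-001`) can be illustrated with a small checker. This is a minimal sketch and not code from the package: the names `parseTestId` and `findDuplicateIds`, the three-digit sequence width, and the restriction to the `UT`, `API`, and `E2E` prefixes are assumptions inferred from those three examples only. The file does not show a prefix for integration tests, so none is guessed here.

```typescript
// Hypothetical helpers, not part of qaa-agent. They check IDs shaped like the
// examples in DEMO.md: UT-AUTH-001, API-USERS-003, E2E-CHECKOUT-001.
type TestLevel = "UT" | "API" | "E2E";

interface ParsedTestId {
  level: TestLevel;
  feature: string;
  sequence: number;
}

// Assumed shape: <level>-<FEATURE>-<three digits>.
const TEST_ID_PATTERN = /^(UT|API|E2E)-([A-Z][A-Z0-9]*)-(\d{3})$/;

// Returns the parsed parts of an ID, or null when it does not match the assumed shape.
function parseTestId(id: string): ParsedTestId | null {
  const match = TEST_ID_PATTERN.exec(id);
  if (match === null) {
    return null;
  }
  return {
    level: match[1] as TestLevel,
    feature: match[2],
    sequence: parseInt(match[3], 10),
  };
}

// Returns every ID that appears more than once, since IDs must be unique per test case.
function findDuplicateIds(ids: string[]): string[] {
  const counts: Record<string, number> = {};
  for (const id of ids) {
    counts[id] = (counts[id] ?? 0) + 1;
  }
  return Object.keys(counts).filter((id) => (counts[id] ?? 0) > 1);
}
```

Under these assumptions, `parseTestId("E2E-CHECKOUT-001")` yields level `E2E`, feature `CHECKOUT`, and sequence `1`, while a free-text name such as `"checkout test"` yields `null`.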