qaa-agent 1.6.3 → 1.7.1
- package/CHANGELOG.md +22 -0
- package/agents/qaa-analyzer.md +16 -1
- package/agents/qaa-bug-detective.md +33 -0
- package/agents/qaa-discovery.md +384 -0
- package/agents/qaa-e2e-runner.md +7 -6
- package/agents/qaa-planner.md +16 -1
- package/agents/qaa-testid-injector.md +60 -2
- package/agents/qaa-validator.md +38 -0
- package/bin/install.cjs +25 -13
- package/commands/qa-audit.md +119 -0
- package/commands/qa-create-test.md +288 -0
- package/commands/qa-fix.md +395 -0
- package/commands/qa-map.md +137 -0
- package/package.json +40 -41
- package/{.claude/settings.json → settings.json} +19 -20
- package/{.claude/skills → skills}/qa-bug-detective/SKILL.md +122 -122
- package/{.claude/skills → skills}/qa-repo-analyzer/SKILL.md +88 -88
- package/{.claude/skills → skills}/qa-self-validator/SKILL.md +109 -109
- package/{.claude/skills → skills}/qa-template-engine/SKILL.md +113 -113
- package/{.claude/skills → skills}/qa-testid-injector/SKILL.md +93 -93
- package/{.claude/skills → skills}/qa-workflow-documenter/SKILL.md +87 -87
- package/workflows/qa-gap.md +7 -1
- package/workflows/qa-start.md +25 -1
- package/workflows/qa-testid.md +29 -1
- package/workflows/qa-validate.md +5 -1
- package/.claude/commands/create-test.md +0 -164
- package/.claude/commands/qa-audit.md +0 -37
- package/.claude/commands/qa-blueprint.md +0 -54
- package/.claude/commands/qa-fix.md +0 -36
- package/.claude/commands/qa-from-ticket.md +0 -24
- package/.claude/commands/qa-gap.md +0 -20
- package/.claude/commands/qa-map.md +0 -47
- package/.claude/commands/qa-pom.md +0 -36
- package/.claude/commands/qa-pyramid.md +0 -37
- package/.claude/commands/qa-report.md +0 -38
- package/.claude/commands/qa-research.md +0 -33
- package/.claude/commands/qa-validate.md +0 -42
- package/.claude/commands/update-test.md +0 -58
- package/.claude/skills/qa-learner/SKILL.md +0 -150
- package/{.claude/commands → commands}/qa-pr.md +0 -0
- package/{.claude/commands → commands}/qa-start.md +0 -0
- package/{.claude/commands → commands}/qa-testid.md +0 -0
package/{.claude/skills → skills}/qa-bug-detective/SKILL.md
@@ -1,122 +1,122 @@
----
-name: qa-bug-detective
-description: QA Bug Detective. Runs generated tests and classifies failures as APPLICATION BUG, TEST CODE ERROR, ENVIRONMENT ISSUE, or INCONCLUSIVE with evidence and confidence levels. Use when user wants to run tests and classify results, investigate test failures, determine if failures are bugs or test issues, debug failing tests, triage test results, or understand why tests are failing. Triggers on "run tests", "classify failures", "why is this failing", "test failures", "debug tests", "triage results", "is this a bug or test error", "investigate failures".
----
-
-# QA Bug Detective
-
-## Purpose
-
-Run generated tests and classify every failure into one of four categories with evidence and confidence levels. Auto-fix TEST CODE ERRORS when confidence is HIGH.
-
-## Classification Decision Tree
-
-```
-Test fails
-├── Syntax/import error in TEST file?
-│   └── YES → TEST CODE ERROR
-├── Error occurs in PRODUCTION code path?
-│   ├── Known bug / unexpected behavior? → APPLICATION BUG
-│   └── Code works as designed but test expectation wrong? → TEST CODE ERROR
-├── Connection refused / timeout / missing env var?
-│   └── YES → ENVIRONMENT ISSUE
-└── Can't determine?
-    └── INCONCLUSIVE
-```
-
-## Classification Categories
-
-### APPLICATION BUG
-- Error manifests in production code (not test code)
-- Stack trace points to src/ or app/ code
-- Behavior contradicts documented requirements or API contracts
-- **Action**: Report only. NEVER auto-fix application code.
-
-### TEST CODE ERROR
-- Import/require fails (wrong path, missing module)
-- Selector doesn't match current DOM
-- Assertion expects wrong value (test written incorrectly)
-- Missing await, wrong API usage, stale fixture reference
-- **Action**: Auto-fix if HIGH confidence. Report if MEDIUM or lower.
-
-### ENVIRONMENT ISSUE
-- Connection refused (database, API, external service)
-- Timeout waiting for resource
-- Missing environment variable
-- File/directory not found (test infrastructure)
-- **Action**: Report with suggested resolution steps.
-
-### INCONCLUSIVE
-- Error is ambiguous
-- Could be multiple root causes
-- Insufficient data to classify
-- **Action**: Report with what's known, request more info.
-
-## Evidence Requirements
-
-Every classification MUST include:
-1. **File path**: Exact file where error occurs
-2. **Line number**: Specific line of failure
-3. **Error message**: Complete error text
-4. **Code snippet**: The specific code proving the classification
-5. **Confidence level**: HIGH / MEDIUM-HIGH / MEDIUM / LOW
-6. **Reasoning**: Why this classification, not another
-
-## Confidence Levels
-
-| Level | Definition |
-|-------|------------|
-| HIGH | Clear evidence in one direction, no ambiguity |
-| MEDIUM-HIGH | Strong evidence but minor ambiguity |
-| MEDIUM | Evidence points one way but alternatives exist |
-| LOW | Insufficient data, multiple possible causes |
-
-## Auto-Fix Rules
-
-Only auto-fix when:
-- Classification = TEST CODE ERROR
-- Confidence = HIGH
-- Fix is mechanical (import path, selector, assertion value, config)
-
-Fix types:
-- Import path corrections
-- Selector updates (match current DOM/data-testid)
-- Assertion value updates (match current actual behavior)
-- Config fixes (baseURL, timeout values)
-- Missing await keywords
-- Fixture path corrections
-
-**NEVER auto-fix**: Application bugs, environment issues, anything with confidence < HIGH.
-
-## Output: FAILURE_CLASSIFICATION_REPORT.md
-
-```markdown
-# Failure Classification Report
-
-## Summary
-| Classification | Count | Auto-Fixed | Needs Attention |
-|---------------|-------|-----------|----------------|
-| APPLICATION BUG | N | 0 | N |
-| TEST CODE ERROR | N | N | N |
-| ENVIRONMENT ISSUE | N | 0 | N |
-| INCONCLUSIVE | N | 0 | N |
-
-## Detailed Analysis
-
-### Failure 1: [test name]
-- **Classification**: [category]
-- **Confidence**: [level]
-- **File**: [path]:[line]
-- **Error**: [message]
-- **Evidence**: [code snippet + reasoning]
-- **Action Taken**: [auto-fixed / reported]
-- **Resolution**: [what was fixed / what needs human attention]
-```
-
-## Quality Gate
-
-- [ ] Every failure classified with evidence
-- [ ] Confidence level assigned to each
-- [ ] No application bugs auto-fixed
-- [ ] Auto-fixes only applied at HIGH confidence
-- [ ] FAILURE_CLASSIFICATION_REPORT.md produced
+---
+name: qa-bug-detective
+description: QA Bug Detective. Runs generated tests and classifies failures as APPLICATION BUG, TEST CODE ERROR, ENVIRONMENT ISSUE, or INCONCLUSIVE with evidence and confidence levels. Use when user wants to run tests and classify results, investigate test failures, determine if failures are bugs or test issues, debug failing tests, triage test results, or understand why tests are failing. Triggers on "run tests", "classify failures", "why is this failing", "test failures", "debug tests", "triage results", "is this a bug or test error", "investigate failures".
+---
+
+# QA Bug Detective
+
+## Purpose
+
+Run generated tests and classify every failure into one of four categories with evidence and confidence levels. Auto-fix TEST CODE ERRORS when confidence is HIGH.
+
+## Classification Decision Tree
+
+```
+Test fails
+├── Syntax/import error in TEST file?
+│   └── YES → TEST CODE ERROR
+├── Error occurs in PRODUCTION code path?
+│   ├── Known bug / unexpected behavior? → APPLICATION BUG
+│   └── Code works as designed but test expectation wrong? → TEST CODE ERROR
+├── Connection refused / timeout / missing env var?
+│   └── YES → ENVIRONMENT ISSUE
+└── Can't determine?
+    └── INCONCLUSIVE
+```
+
+## Classification Categories
+
+### APPLICATION BUG
+- Error manifests in production code (not test code)
+- Stack trace points to src/ or app/ code
+- Behavior contradicts documented requirements or API contracts
+- **Action**: Report only. NEVER auto-fix application code.
+
+### TEST CODE ERROR
+- Import/require fails (wrong path, missing module)
+- Selector doesn't match current DOM
+- Assertion expects wrong value (test written incorrectly)
+- Missing await, wrong API usage, stale fixture reference
+- **Action**: Auto-fix if HIGH confidence. Report if MEDIUM or lower.
+
+### ENVIRONMENT ISSUE
+- Connection refused (database, API, external service)
+- Timeout waiting for resource
+- Missing environment variable
+- File/directory not found (test infrastructure)
+- **Action**: Report with suggested resolution steps.
+
+### INCONCLUSIVE
+- Error is ambiguous
+- Could be multiple root causes
+- Insufficient data to classify
+- **Action**: Report with what's known, request more info.
+
+## Evidence Requirements
+
+Every classification MUST include:
+1. **File path**: Exact file where error occurs
+2. **Line number**: Specific line of failure
+3. **Error message**: Complete error text
+4. **Code snippet**: The specific code proving the classification
+5. **Confidence level**: HIGH / MEDIUM-HIGH / MEDIUM / LOW
+6. **Reasoning**: Why this classification, not another
+
+## Confidence Levels
+
+| Level | Definition |
+|-------|------------|
+| HIGH | Clear evidence in one direction, no ambiguity |
+| MEDIUM-HIGH | Strong evidence but minor ambiguity |
+| MEDIUM | Evidence points one way but alternatives exist |
+| LOW | Insufficient data, multiple possible causes |
+
+## Auto-Fix Rules
+
+Only auto-fix when:
+- Classification = TEST CODE ERROR
+- Confidence = HIGH
+- Fix is mechanical (import path, selector, assertion value, config)
+
+Fix types:
+- Import path corrections
+- Selector updates (match current DOM/data-testid)
+- Assertion value updates (match current actual behavior)
+- Config fixes (baseURL, timeout values)
+- Missing await keywords
+- Fixture path corrections
+
+**NEVER auto-fix**: Application bugs, environment issues, anything with confidence < HIGH.
+
+## Output: FAILURE_CLASSIFICATION_REPORT.md
+
+```markdown
+# Failure Classification Report
+
+## Summary
+| Classification | Count | Auto-Fixed | Needs Attention |
+|---------------|-------|-----------|----------------|
+| APPLICATION BUG | N | 0 | N |
+| TEST CODE ERROR | N | N | N |
+| ENVIRONMENT ISSUE | N | 0 | N |
+| INCONCLUSIVE | N | 0 | N |
+
+## Detailed Analysis
+
+### Failure 1: [test name]
+- **Classification**: [category]
+- **Confidence**: [level]
+- **File**: [path]:[line]
+- **Error**: [message]
+- **Evidence**: [code snippet + reasoning]
+- **Action Taken**: [auto-fixed / reported]
+- **Resolution**: [what was fixed / what needs human attention]
+```
+
+## Quality Gate
+
+- [ ] Every failure classified with evidence
+- [ ] Confidence level assigned to each
+- [ ] No application bugs auto-fixed
+- [ ] Auto-fixes only applied at HIGH confidence
+- [ ] FAILURE_CLASSIFICATION_REPORT.md produced
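The decision tree and auto-fix rules in the qa-bug-detective skill above can be read as two small functions. The following is only a sketch under stated assumptions: the `Failure` fields, the enum names, and both function names are hypothetical and do not come from the package, and the "known bug / unexpected behavior" branch is simplified to "the test expectation is not the part that is wrong".

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    APPLICATION_BUG = "APPLICATION BUG"
    TEST_CODE_ERROR = "TEST CODE ERROR"
    ENVIRONMENT_ISSUE = "ENVIRONMENT ISSUE"
    INCONCLUSIVE = "INCONCLUSIVE"


class Confidence(Enum):
    HIGH = "HIGH"
    MEDIUM_HIGH = "MEDIUM-HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


@dataclass
class Failure:
    # Hypothetical, pre-digested facts about one failing test.
    syntax_or_import_error_in_test: bool = False
    error_in_production_path: bool = False
    test_expectation_wrong: bool = False
    connection_timeout_or_missing_env: bool = False


def classify(failure: Failure) -> Category:
    """Walk the tree top to bottom; the first matching branch wins."""
    if failure.syntax_or_import_error_in_test:
        return Category.TEST_CODE_ERROR
    if failure.error_in_production_path:
        # Code works as designed but the test expects the wrong thing.
        if failure.test_expectation_wrong:
            return Category.TEST_CODE_ERROR
        return Category.APPLICATION_BUG
    if failure.connection_timeout_or_missing_env:
        return Category.ENVIRONMENT_ISSUE
    return Category.INCONCLUSIVE


def may_auto_fix(category: Category, confidence: Confidence, mechanical: bool) -> bool:
    """All three auto-fix conditions must hold; anything else is report-only."""
    return (
        category is Category.TEST_CODE_ERROR
        and confidence is Confidence.HIGH
        and mechanical
    )
```

Keeping the gate separate from the classifier means an APPLICATION BUG can never be auto-fixed, whatever confidence it carries.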
package/{.claude/skills → skills}/qa-repo-analyzer/SKILL.md
@@ -1,88 +1,88 @@
----
-name: qa-repo-analyzer
-description: QA Repository Analyzer. Analyzes a dev repository and produces a complete QA baseline package including testability report, test inventory, and repo blueprint. Use when user wants to analyze a repo for testing, assess testability, generate test inventory, create QA baseline, understand test coverage needs, evaluate a codebase for QA, or produce a testing strategy. Triggers on "analyze repo", "testability report", "test inventory", "QA analysis", "QA baseline", "coverage assessment", "what should we test", "testing strategy".
----
-
-# QA Repository Analyzer
-
-## Purpose
-
-Analyze a developer repository and produce a complete QA baseline package: Testability Report, Test Inventory (pyramid-based), and QA Repo Blueprint.
-
-## Core Rule
-
-**Every analysis must be specific to the actual codebase — never generic advice. Every test case must have an explicit expected outcome.**
-
-## Execution Steps
-
-### Step 0: Collect Repo Context
-
-Scan the repository systematically:
-- Folder tree (entry points, structure)
-- Package files (dependencies, scripts, framework detection)
-- Service/controller files (API surface area)
-- Model files (data structures, validation)
-- Database layer (ORM, migrations, schemas)
-- External integrations (payment, email, storage, queues)
-- Existing test coverage (test files, config, CI)
-- Configuration (env vars, feature flags)
-
-### Step 1: Pre-Analysis — Assumptions & Questions
-
-Before generating deliverables, list:
-- **Assumptions**: What you're inferring from the code (e.g., "Auth uses JWT based on middleware")
-- **Questions**: What's ambiguous (e.g., "Is the Stripe integration in production or test mode?")
-
-Present to user for confirmation before proceeding.
-
-### Step 2: Deliverable A — QA_ANALYSIS.md (Testability Report)
-
-Produce with ALL these sections:
-- **Architecture Overview**: System type, language, runtime, entry points table, internal layers
-- **External Dependencies**: Table with purpose and risk level (HIGH/MEDIUM/LOW)
-- **Risk Assessment**: Prioritized risks with justification
-- **Top 10 Unit Test Targets**: Table with module/function, why it's high-priority, complexity assessment
-- **API/Contract Test Targets**: Endpoints that need contract testing
-- **Recommended Testing Pyramid**: Percentages adjusted to this specific app's architecture
-
-### Step 3: Deliverable B — TEST_INVENTORY.md (Test Cases)
-
-Generate pyramid-based test inventory:
-
-**Unit Tests** (60-70%): For each target:
-- Test ID (UT-MODULE-NNN)
-- Target (file path + function)
-- What to validate
-- Concrete inputs
-- Mocks needed
-- Explicit expected outcome
-
-**Integration/Contract Tests** (10-15%): Component interactions, API contracts
-
-**API Tests** (20-25%): For each endpoint:
-- Test ID (API-RESOURCE-NNN)
-- Method + endpoint
-- Request body/params
-- Expected status + response shape
-
-**E2E Smoke Tests** (3-5%): Max 3-8 critical user paths
-
-### Step 4: QA_REPO_BLUEPRINT.md
-
-If no QA repo exists, generate:
-- Suggested repo name and folder structure
-- Recommended stack (framework, runner, reporter)
-- Config files needed
-- Execution scripts (npm scripts, CI commands)
-- CI/CD strategy (smoke on PR, regression nightly)
-- Definition of Done checklist
-
-## Quality Gate
-
-- [ ] Architecture overview matches actual codebase (not generic)
-- [ ] Every test case has explicit expected outcome with concrete values
-- [ ] No vague assertions ("works correctly", "returns proper data")
-- [ ] Test IDs follow naming convention
-- [ ] Priority (P0/P1/P2) assigned to every test case
-- [ ] Risks are specific with evidence from the code
-- [ ] Testing pyramid percentages are justified for this architecture
+---
+name: qa-repo-analyzer
+description: QA Repository Analyzer. Analyzes a dev repository and produces a complete QA baseline package including testability report, test inventory, and repo blueprint. Use when user wants to analyze a repo for testing, assess testability, generate test inventory, create QA baseline, understand test coverage needs, evaluate a codebase for QA, or produce a testing strategy. Triggers on "analyze repo", "testability report", "test inventory", "QA analysis", "QA baseline", "coverage assessment", "what should we test", "testing strategy".
+---
+
+# QA Repository Analyzer
+
+## Purpose
+
+Analyze a developer repository and produce a complete QA baseline package: Testability Report, Test Inventory (pyramid-based), and QA Repo Blueprint.
+
+## Core Rule
+
+**Every analysis must be specific to the actual codebase — never generic advice. Every test case must have an explicit expected outcome.**
+
+## Execution Steps
+
+### Step 0: Collect Repo Context
+
+Scan the repository systematically:
+- Folder tree (entry points, structure)
+- Package files (dependencies, scripts, framework detection)
+- Service/controller files (API surface area)
+- Model files (data structures, validation)
+- Database layer (ORM, migrations, schemas)
+- External integrations (payment, email, storage, queues)
+- Existing test coverage (test files, config, CI)
+- Configuration (env vars, feature flags)
+
+### Step 1: Pre-Analysis — Assumptions & Questions
+
+Before generating deliverables, list:
+- **Assumptions**: What you're inferring from the code (e.g., "Auth uses JWT based on middleware")
+- **Questions**: What's ambiguous (e.g., "Is the Stripe integration in production or test mode?")
+
+Present to user for confirmation before proceeding.
+
+### Step 2: Deliverable A — QA_ANALYSIS.md (Testability Report)
+
+Produce with ALL these sections:
+- **Architecture Overview**: System type, language, runtime, entry points table, internal layers
+- **External Dependencies**: Table with purpose and risk level (HIGH/MEDIUM/LOW)
+- **Risk Assessment**: Prioritized risks with justification
+- **Top 10 Unit Test Targets**: Table with module/function, why it's high-priority, complexity assessment
+- **API/Contract Test Targets**: Endpoints that need contract testing
+- **Recommended Testing Pyramid**: Percentages adjusted to this specific app's architecture
+
+### Step 3: Deliverable B — TEST_INVENTORY.md (Test Cases)
+
+Generate pyramid-based test inventory:
+
+**Unit Tests** (60-70%): For each target:
+- Test ID (UT-MODULE-NNN)
+- Target (file path + function)
+- What to validate
+- Concrete inputs
+- Mocks needed
+- Explicit expected outcome
+
+**Integration/Contract Tests** (10-15%): Component interactions, API contracts
+
+**API Tests** (20-25%): For each endpoint:
+- Test ID (API-RESOURCE-NNN)
+- Method + endpoint
+- Request body/params
+- Expected status + response shape
+
+**E2E Smoke Tests** (3-5%): Max 3-8 critical user paths
+
+### Step 4: QA_REPO_BLUEPRINT.md
+
+If no QA repo exists, generate:
+- Suggested repo name and folder structure
+- Recommended stack (framework, runner, reporter)
+- Config files needed
+- Execution scripts (npm scripts, CI commands)
+- CI/CD strategy (smoke on PR, regression nightly)
+- Definition of Done checklist
+
+## Quality Gate
+
+- [ ] Architecture overview matches actual codebase (not generic)
+- [ ] Every test case has explicit expected outcome with concrete values
+- [ ] No vague assertions ("works correctly", "returns proper data")
+- [ ] Test IDs follow naming convention
+- [ ] Priority (P0/P1/P2) assigned to every test case
+- [ ] Risks are specific with evidence from the code
+- [ ] Testing pyramid percentages are justified for this architecture
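Several quality-gate items in the qa-repo-analyzer skill above (test ID convention, priority, no vague expected outcomes) are mechanical enough to check in code. The sketch below is illustrative only. It assumes that `MODULE` and `RESOURCE` are uppercase alphanumeric tokens and that `NNN` means exactly three digits, neither of which the skill spells out. `check_test_case` and its parameters are hypothetical names.

```python
import re
from typing import List

# Assumed reading of UT-MODULE-NNN / API-RESOURCE-NNN: uppercase token, three digits.
TEST_ID_PATTERN = re.compile(r"^(UT|API)-[A-Z][A-Z0-9]*-\d{3}$")

# The two phrases the quality gate names as vague assertions.
VAGUE_PHRASES = ("works correctly", "returns proper data")

PRIORITIES = {"P0", "P1", "P2"}


def check_test_case(test_id: str, priority: str, expected_outcome: str) -> List[str]:
    """Return a list of quality-gate problems for one inventory entry."""
    problems: List[str] = []
    if not TEST_ID_PATTERN.match(test_id):
        problems.append("test ID does not follow the naming convention")
    if priority not in PRIORITIES:
        problems.append("priority must be P0, P1, or P2")
    outcome = expected_outcome.strip().lower()
    if not outcome:
        problems.append("expected outcome is missing")
    elif any(phrase in outcome for phrase in VAGUE_PHRASES):
        problems.append("expected outcome is vague")
    return problems
```

An entry such as `check_test_case("UT-AUTH-001", "P0", "returns 401 with error code INVALID_TOKEN")` yields an empty list, while a lowercase ID, an unknown priority, or a phrase like "works correctly" each add one problem.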
package/{.claude/skills → skills}/qa-self-validator/SKILL.md
@@ -1,109 +1,109 @@
----
-name: qa-self-validator
-description: QA Self Validator. Closed-loop agent that validates generated test code across 4 layers (syntax, structure, dependencies, logic) and auto-fixes issues. Use when user wants to validate tests, check test quality, verify test code compiles, ensure tests follow standards, run quality checks on test suite, or verify generated tests before delivery. Triggers on "validate tests", "check test quality", "verify tests", "test validation", "quality check", "does it compile", "are tests valid", "check my tests".
----
-
-# QA Self Validator
-
-## Purpose
-
-Closed-loop validation agent: Generate -> Validate -> Fix -> Deliver. Never deliver test code without at least one validation pass.
-
-## Core Rule
-
-**NEVER deliver generated QA code without running at least one validation pass. Max 3 fix loops before escalating.**
-
-## Validation Layers
-
-### Layer 1: Syntax
-Run the appropriate checker based on language:
-- TypeScript: `tsc --noEmit`
-- JavaScript: `node --check [file]`
-- Python: `python -m py_compile [file]`
-- C#: `dotnet build --no-restore`
-- Also run project linter if configured (eslint, flake8, etc.)
-
-**Pass criteria**: Zero syntax errors.
-
-### Layer 2: Structure
-Check each test file for:
-- Correct directory placement (e2e in e2e/, unit in unit/, etc.)
-- Naming convention compliance (CLAUDE.md patterns)
-- Has actual test functions (not empty describe blocks)
-- Imports reference real modules in the codebase
-- No hardcoded secrets/credentials/tokens
-- Page objects in pages/ directory, tests in tests/
-
-**Pass criteria**: All structural checks pass.
-
-### Layer 3: Dependencies
-Verify:
-- All imports resolvable (modules exist at the referenced paths)
-- Packages listed in package.json/requirements.txt
-- No missing dependencies
-- No circular dependencies in test helpers
-- Test fixtures reference existing fixture files
-
-**Pass criteria**: All imports resolve, all packages available.
-
-### Layer 4: Logic Quality
-Check test logic:
-- Happy path tests have positive assertions (toBe, toEqual, toHaveText)
-- Error/negative tests have negative assertions (not.toBe, toThrow, status >= 400)
-- Setup and teardown are symmetric (what's created is cleaned up)
-- No duplicate test IDs across the suite
-- Assertions are concrete — reject: toBeTruthy(), toBeDefined(), .should('exist')
-- Each test has at least one assertion
-
-**Pass criteria**: All logic checks pass.
-
-## Fix Loop Protocol
-
-```
-Loop 1: Generate tests
-  -> Run all 4 validation layers
-  -> If PASS: Deliver
-  -> If FAIL: Identify issues, fix, continue
-
-Loop 2: Re-validate after fixes
-  -> If PASS: Deliver
-  -> If FAIL: Identify remaining issues, fix
-
-Loop 3: Final validation
-  -> If PASS: Deliver
-  -> If FAIL: Deliver with VALIDATION_REPORT noting unresolved issues
-```
-
-## Output: VALIDATION_REPORT.md
-
-```markdown
-# Validation Report
-
-## Summary
-| Layer | Status | Issues Found | Issues Fixed |
-|-------|--------|-------------|-------------|
-| Syntax | PASS/FAIL | N | N |
-| Structure | PASS/FAIL | N | N |
-| Dependencies | PASS/FAIL | N | N |
-| Logic | PASS/FAIL | N | N |
-
-## File Details
-### [filename]
-| Layer | Status | Details |
-|-------|--------|---------|
-| ... | ... | ... |
-
-## Unresolved Issues
-[Any issues that couldn't be auto-fixed after 3 loops]
-
-## Confidence Level
-[HIGH/MEDIUM/LOW with reasoning]
-```
-
-## Quality Gate
-
-- [ ] All 4 layers checked for every file
-- [ ] Fix loop executed (max 3 iterations)
-- [ ] VALIDATION_REPORT.md produced
-- [ ] No test delivered with syntax errors
-- [ ] Unresolved issues clearly documented
+---
+name: qa-self-validator
+description: QA Self Validator. Closed-loop agent that validates generated test code across 4 layers (syntax, structure, dependencies, logic) and auto-fixes issues. Use when user wants to validate tests, check test quality, verify test code compiles, ensure tests follow standards, run quality checks on test suite, or verify generated tests before delivery. Triggers on "validate tests", "check test quality", "verify tests", "test validation", "quality check", "does it compile", "are tests valid", "check my tests".
+---
+
+# QA Self Validator
+
+## Purpose
+
+Closed-loop validation agent: Generate -> Validate -> Fix -> Deliver. Never deliver test code without at least one validation pass.
+
+## Core Rule
+
+**NEVER deliver generated QA code without running at least one validation pass. Max 3 fix loops before escalating.**
+
+## Validation Layers
+
+### Layer 1: Syntax
+Run the appropriate checker based on language:
+- TypeScript: `tsc --noEmit`
+- JavaScript: `node --check [file]`
+- Python: `python -m py_compile [file]`
+- C#: `dotnet build --no-restore`
+- Also run project linter if configured (eslint, flake8, etc.)
+
+**Pass criteria**: Zero syntax errors.
+
+### Layer 2: Structure
+Check each test file for:
+- Correct directory placement (e2e in e2e/, unit in unit/, etc.)
+- Naming convention compliance (CLAUDE.md patterns)
+- Has actual test functions (not empty describe blocks)
+- Imports reference real modules in the codebase
+- No hardcoded secrets/credentials/tokens
+- Page objects in pages/ directory, tests in tests/
+
+**Pass criteria**: All structural checks pass.
+
+### Layer 3: Dependencies
+Verify:
+- All imports resolvable (modules exist at the referenced paths)
+- Packages listed in package.json/requirements.txt
+- No missing dependencies
+- No circular dependencies in test helpers
+- Test fixtures reference existing fixture files
+
+**Pass criteria**: All imports resolve, all packages available.
+
+### Layer 4: Logic Quality
+Check test logic:
+- Happy path tests have positive assertions (toBe, toEqual, toHaveText)
+- Error/negative tests have negative assertions (not.toBe, toThrow, status >= 400)
+- Setup and teardown are symmetric (what's created is cleaned up)
+- No duplicate test IDs across the suite
+- Assertions are concrete — reject: toBeTruthy(), toBeDefined(), .should('exist')
+- Each test has at least one assertion
+
+**Pass criteria**: All logic checks pass.
+
+## Fix Loop Protocol
+
+```
+Loop 1: Generate tests
+  -> Run all 4 validation layers
+  -> If PASS: Deliver
+  -> If FAIL: Identify issues, fix, continue
+
+Loop 2: Re-validate after fixes
+  -> If PASS: Deliver
+  -> If FAIL: Identify remaining issues, fix
+
+Loop 3: Final validation
+  -> If PASS: Deliver
+  -> If FAIL: Deliver with VALIDATION_REPORT noting unresolved issues
+```
+
+## Output: VALIDATION_REPORT.md
+
+```markdown
+# Validation Report
+
+## Summary
+| Layer | Status | Issues Found | Issues Fixed |
+|-------|--------|-------------|-------------|
+| Syntax | PASS/FAIL | N | N |
+| Structure | PASS/FAIL | N | N |
+| Dependencies | PASS/FAIL | N | N |
+| Logic | PASS/FAIL | N | N |
+
+## File Details
+### [filename]
+| Layer | Status | Details |
+|-------|--------|---------|
+| ... | ... | ... |
+
+## Unresolved Issues
+[Any issues that couldn't be auto-fixed after 3 loops]
+
+## Confidence Level
+[HIGH/MEDIUM/LOW with reasoning]
+```
+
+## Quality Gate
+
+- [ ] All 4 layers checked for every file
+- [ ] Fix loop executed (max 3 iterations)
+- [ ] VALIDATION_REPORT.md produced
+- [ ] No test delivered with syntax errors
+- [ ] Unresolved issues clearly documented
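The fix loop protocol in the qa-self-validator skill above amounts to "validate up to three times, fix between passes, and report whatever is left". A minimal sketch of that control flow follows. It assumes the four validation layers are wrapped in a single `validate` callable that returns a list of issue strings and that `fix` mutates the test code in place; both callables and the name `run_fix_loop` are hypothetical, not part of the package.

```python
from typing import Callable, List, Tuple

MAX_LOOPS = 3  # "Max 3 fix loops before escalating"


def run_fix_loop(
    validate: Callable[[], List[str]],
    fix: Callable[[List[str]], None],
    max_loops: int = MAX_LOOPS,
) -> Tuple[bool, List[str], int]:
    """Return (clean, unresolved_issues, loops_used).

    Every loop starts with a validation pass. A fix is attempted only when
    another validation pass will follow, so the final loop never leaves
    unvalidated changes behind.
    """
    issues: List[str] = []
    for loop in range(1, max_loops + 1):
        issues = validate()
        if not issues:
            return True, [], loop  # PASS: deliver
        if loop < max_loops:
            fix(issues)  # FAIL: fix, then re-validate in the next loop
    # Still failing after the final validation: deliver with a report
    # that lists the unresolved issues.
    return False, issues, max_loops
```

A caller would write the returned `unresolved_issues` into the "Unresolved Issues" section of VALIDATION_REPORT.md when `clean` is false.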