agentic-qe 3.6.4 → 3.6.5
- package/.claude/helpers/v3/product-factors/sfdipot-reference-template.html +958 -0
- package/.claude/helpers/v3/quality-criteria/evidence-classification.md +116 -0
- package/.claude/helpers/v3/quality-criteria/htsm-categories.md +139 -0
- package/.claude/helpers/v3/quality-criteria/quality-criteria-reference-template.html +811 -0
- package/.claude/helpers/v3/quality-criteria/validate-quality-criteria.ts +167 -0
- package/.claude/skills/README.md +21 -8
- package/.claude/skills/skills-manifest.json +1 -1
- package/README.md +45 -49
- package/package.json +1 -1
- package/v3/CHANGELOG.md +13 -0
- package/v3/assets/agents/v3/README.md +9 -70
- package/v3/assets/agents/v3/qe-accessibility-auditor.md +8 -7
- package/v3/assets/agents/v3/qe-bdd-generator.md +8 -7
- package/v3/assets/agents/v3/qe-chaos-engineer.md +9 -8
- package/v3/assets/agents/v3/qe-code-complexity.md +8 -7
- package/v3/assets/agents/v3/qe-code-intelligence.md +8 -7
- package/v3/assets/agents/v3/qe-contract-validator.md +8 -7
- package/v3/assets/agents/v3/qe-coverage-specialist.md +8 -7
- package/v3/assets/agents/v3/qe-defect-predictor.md +8 -7
- package/v3/assets/agents/v3/qe-dependency-mapper.md +8 -7
- package/v3/assets/agents/v3/qe-deployment-advisor.md +8 -7
- package/v3/assets/agents/v3/qe-flaky-hunter.md +8 -7
- package/v3/assets/agents/v3/qe-fleet-commander.md +8 -7
- package/v3/assets/agents/v3/qe-gap-detector.md +8 -7
- package/v3/assets/agents/v3/qe-graphql-tester.md +8 -7
- package/v3/assets/agents/v3/qe-impact-analyzer.md +8 -7
- package/v3/assets/agents/v3/qe-kg-builder.md +8 -7
- package/v3/assets/agents/v3/qe-load-tester.md +8 -7
- package/v3/assets/agents/v3/qe-message-broker-tester.md +15 -10
- package/v3/assets/agents/v3/qe-metrics-optimizer.md +8 -7
- package/v3/assets/agents/v3/qe-middleware-validator.md +15 -10
- package/v3/assets/agents/v3/qe-mutation-tester.md +8 -7
- package/v3/assets/agents/v3/qe-odata-contract-tester.md +17 -12
- package/v3/assets/agents/v3/qe-performance-tester.md +8 -7
- package/v3/assets/agents/v3/qe-property-tester.md +8 -7
- package/v3/assets/agents/v3/qe-qx-partner.md +74 -14
- package/v3/assets/agents/v3/qe-regression-analyzer.md +8 -7
- package/v3/assets/agents/v3/qe-requirements-validator.md +8 -7
- package/v3/assets/agents/v3/qe-responsive-tester.md +8 -7
- package/v3/assets/agents/v3/qe-retry-handler.md +8 -7
- package/v3/assets/agents/v3/qe-risk-assessor.md +8 -7
- package/v3/assets/agents/v3/qe-root-cause-analyzer.md +8 -7
- package/v3/assets/agents/v3/qe-sap-idoc-tester.md +16 -11
- package/v3/assets/agents/v3/qe-sap-rfc-tester.md +15 -10
- package/v3/assets/agents/v3/qe-security-auditor.md +12 -7
- package/v3/assets/agents/v3/qe-security-scanner.md +9 -8
- package/v3/assets/agents/v3/qe-soap-tester.md +15 -10
- package/v3/assets/agents/v3/qe-sod-analyzer.md +17 -12
- package/v3/assets/agents/v3/qe-test-architect.md +8 -7
- package/v3/assets/agents/v3/qe-transfer-specialist.md +8 -7
- package/v3/assets/agents/v3/subagents/qe-code-reviewer.md +8 -7
- package/v3/assets/agents/v3/subagents/qe-integration-reviewer.md +8 -7
- package/v3/assets/agents/v3/subagents/qe-performance-reviewer.md +8 -7
- package/v3/assets/agents/v3/subagents/qe-security-reviewer.md +8 -7
- package/v3/assets/agents/v3/subagents/qe-tdd-green.md +8 -7
- package/v3/assets/agents/v3/subagents/qe-tdd-red.md +8 -7
- package/v3/assets/agents/v3/subagents/qe-tdd-refactor.md +8 -7
- package/v3/assets/agents/v3/templates/qx-report-template.html +26 -22
- package/v3/dist/cli/bundle.js +97 -19
- package/v3/dist/init/phases/11-claude-md.d.ts.map +1 -1
- package/v3/dist/init/phases/11-claude-md.js +94 -16
- package/v3/dist/init/phases/11-claude-md.js.map +1 -1
- package/v3/dist/kernel/constants.d.ts +1 -1
- package/v3/dist/kernel/constants.js +1 -1
- package/v3/dist/learning/qe-unified-memory.d.ts.map +1 -1
- package/v3/dist/learning/qe-unified-memory.js +8 -7
- package/v3/dist/learning/qe-unified-memory.js.map +1 -1
- package/v3/dist/mcp/bundle.js +80 -1
- package/v3/package.json +2 -2
@@ -0,0 +1,116 @@
+# Evidence Classification Guide
+
+Guidelines for classifying evidence in Quality Criteria recommendations.
+
+## Evidence Types
+
+### Direct Evidence
+**Definition:** Actual code quote, explicit documentation statement, or measurable fact from source.
+
+**Requirements:**
+- Must include `file_path:line_range` reference (e.g., `src/auth/login.ts:45-52`)
+- Line ranges should be narrow (max 10-15 lines)
+- Must quote or directly reference the source
+
+**Examples:**
+```
+Source: src/payment/processor.ts:123-128
+Type: Direct
+Finding: No input validation before API call
+Reasoning: Unvalidated input could enable injection attacks
+```
+
+### Inferred Evidence
+**Definition:** Logical deduction from observed patterns, architectural implications, or domain knowledge.
+
+**Requirements:**
+- Must show reasoning chain
+- Can use architectural implications
+- Should reference what was observed
+
+**Examples:**
+```
+Source: Architecture review of src/api/
+Type: Inferred
+Finding: No rate limiting middleware detected
+Reasoning: API endpoints could be vulnerable to DoS; need to verify with load testing
+```
+
+### Claimed Evidence
+**Definition:** Statement that requires verification - based on assumptions or incomplete data.
+
+**Requirements:**
+- Must state "requires verification" or "needs inspection to confirm"
+- Must NOT speculate about what "could" or "might" happen
+- Used when source is unavailable or claim needs validation
+
+**Examples:**
+```
+WRONG: "Could range from efficient to aggressive implementation"
+RIGHT: "Poll interval not specified - requires code inspection to verify"
+```
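Reviewer note: the three evidence types above read like a small rule set, so a minimal TypeScript sketch of how they might be checked mechanically follows. Every name here (`EvidenceRecord`, `checkEvidence`, `MAX_DIRECT_LINES`) is hypothetical and not taken from the package. The 15-line cap and the loose "requires ... verify" pattern are assumptions chosen so that the guide's own RIGHT example passes.

```typescript
// Hypothetical names throughout: none of these come from agentic-qe.
type EvidenceType = "Direct" | "Inferred" | "Claimed";

interface EvidenceRecord {
  source: string;
  type: EvidenceType;
  finding: string;
  reasoning: string;
}

// Assumption: treat the upper end of the guide's "max 10-15 lines" as the limit.
const MAX_DIRECT_LINES = 15;

function checkEvidence(record: EvidenceRecord): string[] {
  const problems: string[] = [];
  const text = `${record.finding} ${record.reasoning}`;

  if (record.type === "Direct") {
    // Direct evidence must point at file_path:start-end.
    const match = /^.+:(\d+)-(\d+)$/.exec(record.source);
    if (match === null) {
      problems.push("Direct evidence needs a file_path:line_range source");
    } else {
      const lineCount = Number(match[2]) - Number(match[1]) + 1;
      if (lineCount < 1 || lineCount > MAX_DIRECT_LINES) {
        problems.push(`Line range covers ${lineCount} lines; expected 1-${MAX_DIRECT_LINES}`);
      }
    }
  }

  if (record.type === "Inferred" && record.reasoning.trim() === "") {
    problems.push("Inferred evidence must show its reasoning chain");
  }

  if (record.type === "Claimed") {
    // Loose match so that "requires code inspection to verify" also passes.
    if (!/\b(requires|needs)\b.*\b(verification|verify|confirm)\b/i.test(text)) {
      problems.push("Claimed evidence must say that it requires verification");
    }
    if (/\b(could|might)\b/i.test(text)) {
      problems.push('Claimed evidence must not speculate with "could" or "might"');
    }
  }
  return problems;
}
```

Under these assumptions, the guide's Direct example and its RIGHT Claimed example produce no problems, while the WRONG Claimed example is flagged twice: once for the missing verification statement and once for speculation.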
+
+## Evidence Table Format
+
+```html
+<table class="evidence-table">
+<thead>
+<tr>
+<th>Source Reference</th>
+<th>Type</th>
+<th>Quality Implication</th>
+<th>Reasoning</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>src/auth/session.ts:89-94</code></td>
+<td><span class="evidence-type direct">Direct</span></td>
+<td>Session tokens stored without encryption</td>
+<td class="evidence-reasoning">Credential exposure risk if storage is compromised</td>
+</tr>
+</tbody>
+</table>
+```
+
+## Source Reference Format
+
+### For Specific Code
+```
+file_path:start_line-end_line
+Example: src/agents/FleetCommanderAgent.ts:847-852
+```
+
+### For File-Level Metrics
+```
+file_path (metric)
+Example: src/agents/N8nBaseAgent.ts (683 LOC)
+```
+
+### For Search Results (No Matches)
+```
+N/A (verified via Glob/Grep search)
+- NOT: tests/**/n8n/**/*.test.ts (glob pattern)
+```
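Reviewer note: the three source reference formats above could be told apart by a small parser. The following TypeScript sketch is illustrative only; `SourceRef` and `parseSourceRef` are hypothetical names, and rejecting any string that contains `*` is an assumption based on the guide's "NOT: glob pattern" line.

```typescript
// Hypothetical sketch; these names do not come from the package.
type SourceRef =
  | { kind: "code"; path: string; startLine: number; endLine: number }
  | { kind: "metric"; path: string; metric: string }
  | { kind: "no-match"; note: string };

function parseSourceRef(text: string): SourceRef | null {
  const trimmed = text.trim();

  // "N/A (verified via Glob/Grep search)": a search that found nothing.
  const noMatch = /^N\/A \((.+)\)$/.exec(trimmed);
  if (noMatch !== null) {
    const [, note = ""] = noMatch;
    return { kind: "no-match", note };
  }

  // Assumption: glob patterns are never valid source references.
  if (trimmed.includes("*")) return null;

  // file_path:start_line-end_line
  const code = /^(\S+):(\d+)-(\d+)$/.exec(trimmed);
  if (code !== null) {
    const [, path = "", start = "", end = ""] = code;
    const startLine = Number(start);
    const endLine = Number(end);
    return startLine <= endLine ? { kind: "code", path, startLine, endLine } : null;
  }

  // file_path (metric)
  const metric = /^(\S+) \((.+)\)$/.exec(trimmed);
  if (metric !== null) {
    const [, path = "", value = ""] = metric;
    return { kind: "metric", path, metric: value };
  }
  return null;
}
```

The no-match form is tested first because `N/A (...)` would otherwise also satisfy the file-level metric pattern.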
+
+## Reasoning Column Guidelines
+
+The Reasoning column must explain **WHY** something matters, not **WHAT** the code does.
+
+| WRONG (describes WHAT) | CORRECT (explains WHY) |
+|------------------------|------------------------|
+| "Retry logic with exponential backoff" | "Retry pattern handles transient failures; needs edge case testing for timeout exhaustion" |
+| "Session cookie stored in memory" | "Credential in memory could leak if agent state is serialized to logs" |
+| "getWorkflow supports forceRefresh flag" | "Cache bypass prevents stale data; but increases load on source system" |
+
+**Formula:**
+```
+{What the code does} → {Why that matters for quality} → {What could go wrong}
+```
+
+## Prohibited Patterns
+
+- **No confidence percentages**: Use evidence types instead of "85% confident"
+- **No vague blast radius**: Use "affects 19 agents" not "affects many"
+- **No speculation in Claimed**: Use "requires verification" not "could be X or Y"
+- **No keyword matching claims**: Show semantic reasoning, not keyword counts
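
Reviewer note: the first two prohibited patterns are concrete enough to lint for. The sketch below is a rough TypeScript illustration, not something shipped in the package; `PROHIBITED` and `findProhibited` are hypothetical names, and the word lists inside the regular expressions are assumptions extrapolated from the two quoted examples.

```typescript
// Hypothetical lint sketch; the word lists are assumptions, not package content.
interface PatternRule {
  name: string;
  pattern: RegExp;
}

const PROHIBITED: PatternRule[] = [
  {
    name: "confidence percentage",
    pattern: /\b\d{1,3}\s?%\s+(confident|confidence|certain|sure)\b/i,
  },
  {
    name: "vague blast radius",
    pattern: /\baffects\s+(many|several|some|numerous|various)\b/i,
  },
];

// Returns the names of all rules that the text violates.
function findProhibited(text: string): string[] {
  return PROHIBITED.filter((rule) => rule.pattern.test(text)).map((rule) => rule.name);
}
```

A phrase such as "affects 19 agents" passes, while "85% confident" and "affects many" are each reported by rule name.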
@@ -0,0 +1,139 @@
+# HTSM v6.3 Quality Criteria Categories
+
+James Bach's Heuristic Test Strategy Model (HTSM) v6.3 Quality Criteria framework.
+
+## 1. Capability
+**Can it perform the required functions?**
+
+| Subcategory | Focus |
+|-------------|-------|
+| Sufficiency | Does it do what it's supposed to? |
+| Correctness | Does it do it correctly? |
+
+**Priority Indicators:**
+- P0: Core business functionality
+- P1: Important features
+- P2: Secondary features
+- P3: Nice-to-have features
+
+## 2. Reliability
+**Will it work well and resist failure?**
+
+| Subcategory | Focus |
+|-------------|-------|
+| Robustness | Can it handle adverse conditions? |
+| Error Handling | Does it handle errors gracefully? |
+| Data Integrity | Is data protected from corruption? |
+| Safety | Does it avoid dangerous behaviors? |
+
+**Cannot be omitted** - All systems can fail.
+
+## 3. Usability
+**How easy is it for real users?**
+
+| Subcategory | Focus |
+|-------------|-------|
+| Learnability | How quickly can users learn? |
+| Operability | How easy to operate day-to-day? |
+| Accessibility | Can users with disabilities use it? |
+
+## 4. Charisma
+**How appealing is the product?**
+
+| Subcategory | Focus |
+|-------------|-------|
+| Aesthetics | Is it visually pleasing? |
+| Uniqueness | Does it stand out? |
+| Entrancement | Does it engage users? |
+| Image | Does it project the right brand? |
+
+**Note:** "Brand guidelines handled separately" is NOT a valid omission reason. Charisma is about UX testing, not brand documentation.
+
+## 5. Security
+**How well protected against unauthorized use?**
+
+| Subcategory | Focus |
+|-------------|-------|
+| Authentication | Who is using it? |
+| Authorization | What are they allowed to do? |
+| Privacy | Is personal data protected? |
+| Security Holes | Are there vulnerabilities? |
+
+**Cannot be omitted** - Every system has attack surface.
+
+## 6. Scalability
+**How well does deployment scale?**
+
+| Subcategory | Focus |
+|-------------|-------|
+| Load Handling | Behavior under increased demand |
+| Resource Efficiency | Resource usage at scale |
+
+## 7. Compatibility
+**Works with external components?**
+
+| Subcategory | Focus |
+|-------------|-------|
+| Application | Works with other applications? |
+| OS | Works with target operating systems? |
+| Hardware | Works with target hardware? |
+| Backward | Works with previous versions? |
+| Product Footprint | Resource requirements acceptable? |
+
+## 8. Performance
+**How speedy and responsive?**
+
+| Subcategory | Focus |
+|-------------|-------|
+| Response Time | Under various conditions |
+| Throughput | Data processing capacity |
+| Efficiency | Resource utilization |
+
+**Cannot be omitted** - Every system has response time.
+
+## 9. Installability
+**How easily installed?**
+
+| Subcategory | Focus |
+|-------------|-------|
+| System Requirements | Clear and achievable? |
+| Configuration | Easy to configure? |
+| Uninstallation | Clean removal? |
+| Upgrades/Patches | Easy to update? |
+| Administration | Easy to administer? |
+
+**Valid omission:** Pure SaaS/browser-based with no client installation.
+
+## 10. Development
+**How well can we create/test/modify?**
+
+| Subcategory | Focus |
+|-------------|-------|
+| Supportability | Easy to support? |
+| Testability | Easy to test? |
+| Maintainability | Easy to maintain? |
+| Portability | Easy to port? |
+| Localizability | Easy to localize? |
+
+**Cannot be omitted** - Always applies to software.
+
+---
+
+## Priority Assignment Guide
+
+| Priority | Definition | Example |
+|----------|------------|---------|
+| **P0 (Critical)** | Failure causes immediate business/user harm | Payment failures, data breaches |
+| **P1 (High)** | Critical to core user value proposition | Core features not working |
+| **P2 (Medium)** | Affects satisfaction but not blocking | Secondary features |
+| **P3 (Low)** | Nice-to-have improvements | Polish, edge case optimization |
+
+## Valid vs Invalid Omission Reasons
+
+| Category | Valid Omission | Invalid Omission |
+|----------|----------------|------------------|
+| Installability | "Pure SaaS, no client installation" | "Handled by ops team" |
+| Charisma | "CLI tool, visual design N/A" | "Brand guidelines separate" |
+| Compatibility | "Single-platform by contract" | "Will test on main browsers" |
+| Development | **NEVER** | "Team is experienced" |
+| Security | **NEVER** | "Internal system only" |
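
Reviewer note: the omission rules in this file (four categories marked "Cannot be omitted", plus the invalid reasons in the table above) can be encoded as data. The TypeScript sketch below shows one possible encoding; `CATEGORIES`, `NEVER_OMIT`, `INVALID_REASONS`, and `checkOmissions` are hypothetical names, and matching invalid reasons by exact string is a simplifying assumption.

```typescript
// Hypothetical encoding of the omission rules; not code from the package.
const CATEGORIES = [
  "Capability", "Reliability", "Usability", "Charisma", "Security",
  "Scalability", "Compatibility", "Performance", "Installability", "Development",
] as const;
type Category = (typeof CATEGORIES)[number];

// Categories the file marks "Cannot be omitted".
const NEVER_OMIT: ReadonlySet<Category> = new Set<Category>([
  "Reliability", "Security", "Performance", "Development",
]);

// Invalid omission reasons quoted in the table, for categories that may be omitted.
const INVALID_REASONS: Partial<Record<Category, string>> = {
  Installability: "Handled by ops team",
  Charisma: "Brand guidelines separate",
  Compatibility: "Will test on main browsers",
};

// Takes category -> stated omission reason and returns the rule violations.
function checkOmissions(omissions: Partial<Record<Category, string>>): string[] {
  const problems: string[] = [];
  for (const category of CATEGORIES) {
    const reason = omissions[category];
    if (reason === undefined) continue;
    if (NEVER_OMIT.has(category)) {
      problems.push(`${category} cannot be omitted`);
    } else if (INVALID_REASONS[category] === reason) {
      problems.push(`${category}: "${reason}" is not a valid omission reason`);
    }
  }
  return problems;
}
```

For example, omitting Installability with "Pure SaaS, no client installation" passes, while omitting Security for any reason is reported.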