agentic-qe 2.0.0 → 2.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/agents/qx-partner.md +17 -4
- package/.claude/skills/accessibility-testing/SKILL.md +144 -692
- package/.claude/skills/agentic-quality-engineering/SKILL.md +176 -529
- package/.claude/skills/api-testing-patterns/SKILL.md +180 -560
- package/.claude/skills/brutal-honesty-review/SKILL.md +113 -603
- package/.claude/skills/bug-reporting-excellence/SKILL.md +116 -517
- package/.claude/skills/chaos-engineering-resilience/SKILL.md +127 -72
- package/.claude/skills/cicd-pipeline-qe-orchestrator/SKILL.md +209 -404
- package/.claude/skills/code-review-quality/SKILL.md +158 -608
- package/.claude/skills/compatibility-testing/SKILL.md +148 -38
- package/.claude/skills/compliance-testing/SKILL.md +132 -63
- package/.claude/skills/consultancy-practices/SKILL.md +114 -446
- package/.claude/skills/context-driven-testing/SKILL.md +117 -381
- package/.claude/skills/contract-testing/SKILL.md +176 -141
- package/.claude/skills/database-testing/SKILL.md +137 -130
- package/.claude/skills/exploratory-testing-advanced/SKILL.md +160 -629
- package/.claude/skills/holistic-testing-pact/SKILL.md +140 -188
- package/.claude/skills/localization-testing/SKILL.md +145 -33
- package/.claude/skills/mobile-testing/SKILL.md +132 -448
- package/.claude/skills/mutation-testing/SKILL.md +147 -41
- package/.claude/skills/performance-testing/SKILL.md +200 -546
- package/.claude/skills/quality-metrics/SKILL.md +164 -519
- package/.claude/skills/refactoring-patterns/SKILL.md +132 -699
- package/.claude/skills/regression-testing/SKILL.md +120 -926
- package/.claude/skills/risk-based-testing/SKILL.md +157 -660
- package/.claude/skills/security-testing/SKILL.md +199 -538
- package/.claude/skills/sherlock-review/SKILL.md +163 -699
- package/.claude/skills/shift-left-testing/SKILL.md +161 -465
- package/.claude/skills/shift-right-testing/SKILL.md +161 -519
- package/.claude/skills/six-thinking-hats/SKILL.md +175 -1110
- package/.claude/skills/skills-manifest.json +71 -20
- package/.claude/skills/tdd-london-chicago/SKILL.md +131 -448
- package/.claude/skills/technical-writing/SKILL.md +103 -154
- package/.claude/skills/test-automation-strategy/SKILL.md +166 -772
- package/.claude/skills/test-data-management/SKILL.md +126 -910
- package/.claude/skills/test-design-techniques/SKILL.md +179 -89
- package/.claude/skills/test-environment-management/SKILL.md +136 -91
- package/.claude/skills/test-reporting-analytics/SKILL.md +169 -92
- package/.claude/skills/testability-scoring/SKILL.md +172 -538
- package/.claude/skills/testability-scoring/scripts/generate-html-report.js +0 -0
- package/.claude/skills/visual-testing-advanced/SKILL.md +155 -78
- package/.claude/skills/xp-practices/SKILL.md +151 -587
- package/CHANGELOG.md +86 -0
- package/README.md +23 -16
- package/dist/agents/QXPartnerAgent.d.ts +47 -1
- package/dist/agents/QXPartnerAgent.d.ts.map +1 -1
- package/dist/agents/QXPartnerAgent.js +2086 -125
- package/dist/agents/QXPartnerAgent.js.map +1 -1
- package/dist/agents/lifecycle/AgentLifecycleManager.d.ts.map +1 -1
- package/dist/agents/lifecycle/AgentLifecycleManager.js +34 -31
- package/dist/agents/lifecycle/AgentLifecycleManager.js.map +1 -1
- package/dist/cli/commands/init-claude-md-template.d.ts.map +1 -1
- package/dist/cli/commands/init-claude-md-template.js +14 -0
- package/dist/cli/commands/init-claude-md-template.js.map +1 -1
- package/dist/core/SwarmCoordinator.d.ts +180 -0
- package/dist/core/SwarmCoordinator.d.ts.map +1 -0
- package/dist/core/SwarmCoordinator.js +473 -0
- package/dist/core/SwarmCoordinator.js.map +1 -0
- package/dist/core/memory/ReflexionMemoryAdapter.d.ts +109 -0
- package/dist/core/memory/ReflexionMemoryAdapter.d.ts.map +1 -0
- package/dist/core/memory/ReflexionMemoryAdapter.js +306 -0
- package/dist/core/memory/ReflexionMemoryAdapter.js.map +1 -0
- package/dist/core/memory/RuVectorPatternStore.d.ts +28 -0
- package/dist/core/memory/RuVectorPatternStore.d.ts.map +1 -1
- package/dist/core/memory/RuVectorPatternStore.js +70 -0
- package/dist/core/memory/RuVectorPatternStore.js.map +1 -1
- package/dist/core/memory/SparseVectorSearch.d.ts +55 -0
- package/dist/core/memory/SparseVectorSearch.d.ts.map +1 -0
- package/dist/core/memory/SparseVectorSearch.js +130 -0
- package/dist/core/memory/SparseVectorSearch.js.map +1 -0
- package/dist/core/memory/TieredCompression.d.ts +81 -0
- package/dist/core/memory/TieredCompression.d.ts.map +1 -0
- package/dist/core/memory/TieredCompression.js +270 -0
- package/dist/core/memory/TieredCompression.js.map +1 -0
- package/dist/core/memory/index.d.ts +6 -0
- package/dist/core/memory/index.d.ts.map +1 -1
- package/dist/core/memory/index.js +29 -1
- package/dist/core/memory/index.js.map +1 -1
- package/dist/core/metrics/MetricsAggregator.d.ts +228 -0
- package/dist/core/metrics/MetricsAggregator.d.ts.map +1 -0
- package/dist/core/metrics/MetricsAggregator.js +482 -0
- package/dist/core/metrics/MetricsAggregator.js.map +1 -0
- package/dist/core/metrics/index.d.ts +5 -0
- package/dist/core/metrics/index.d.ts.map +1 -0
- package/dist/core/metrics/index.js +11 -0
- package/dist/core/metrics/index.js.map +1 -0
- package/dist/core/optimization/SwarmOptimizer.d.ts +5 -0
- package/dist/core/optimization/SwarmOptimizer.d.ts.map +1 -1
- package/dist/core/optimization/SwarmOptimizer.js +17 -0
- package/dist/core/optimization/SwarmOptimizer.js.map +1 -1
- package/dist/core/orchestration/AdaptiveScheduler.d.ts +190 -0
- package/dist/core/orchestration/AdaptiveScheduler.d.ts.map +1 -0
- package/dist/core/orchestration/AdaptiveScheduler.js +460 -0
- package/dist/core/orchestration/AdaptiveScheduler.js.map +1 -0
- package/dist/core/orchestration/WorkflowOrchestrator.d.ts +13 -0
- package/dist/core/orchestration/WorkflowOrchestrator.d.ts.map +1 -1
- package/dist/core/orchestration/WorkflowOrchestrator.js +32 -0
- package/dist/core/orchestration/WorkflowOrchestrator.js.map +1 -1
- package/dist/core/recovery/CircuitBreaker.d.ts +176 -0
- package/dist/core/recovery/CircuitBreaker.d.ts.map +1 -0
- package/dist/core/recovery/CircuitBreaker.js +382 -0
- package/dist/core/recovery/CircuitBreaker.js.map +1 -0
- package/dist/core/recovery/RecoveryOrchestrator.d.ts +186 -0
- package/dist/core/recovery/RecoveryOrchestrator.d.ts.map +1 -0
- package/dist/core/recovery/RecoveryOrchestrator.js +476 -0
- package/dist/core/recovery/RecoveryOrchestrator.js.map +1 -0
- package/dist/core/recovery/RetryStrategy.d.ts +127 -0
- package/dist/core/recovery/RetryStrategy.d.ts.map +1 -0
- package/dist/core/recovery/RetryStrategy.js +314 -0
- package/dist/core/recovery/RetryStrategy.js.map +1 -0
- package/dist/core/recovery/index.d.ts +8 -0
- package/dist/core/recovery/index.d.ts.map +1 -0
- package/dist/core/recovery/index.js +27 -0
- package/dist/core/recovery/index.js.map +1 -0
- package/dist/core/skills/DependencyResolver.d.ts +99 -0
- package/dist/core/skills/DependencyResolver.d.ts.map +1 -0
- package/dist/core/skills/DependencyResolver.js +260 -0
- package/dist/core/skills/DependencyResolver.js.map +1 -0
- package/dist/core/skills/ManifestGenerator.d.ts +114 -0
- package/dist/core/skills/ManifestGenerator.d.ts.map +1 -0
- package/dist/core/skills/ManifestGenerator.js +449 -0
- package/dist/core/skills/ManifestGenerator.js.map +1 -0
- package/dist/core/skills/index.d.ts +9 -0
- package/dist/core/skills/index.d.ts.map +1 -0
- package/dist/core/skills/index.js +24 -0
- package/dist/core/skills/index.js.map +1 -0
- package/dist/mcp/handlers/chaos/chaos-inject-failure.d.ts +5 -0
- package/dist/mcp/handlers/chaos/chaos-inject-failure.d.ts.map +1 -1
- package/dist/mcp/handlers/chaos/chaos-inject-failure.js +36 -2
- package/dist/mcp/handlers/chaos/chaos-inject-failure.js.map +1 -1
- package/dist/mcp/handlers/chaos/chaos-inject-latency.d.ts +5 -0
- package/dist/mcp/handlers/chaos/chaos-inject-latency.d.ts.map +1 -1
- package/dist/mcp/handlers/chaos/chaos-inject-latency.js +36 -2
- package/dist/mcp/handlers/chaos/chaos-inject-latency.js.map +1 -1
- package/dist/mcp/server.d.ts +9 -9
- package/dist/mcp/server.d.ts.map +1 -1
- package/dist/mcp/server.js +1 -2
- package/dist/mcp/server.js.map +1 -1
- package/dist/types/qx.d.ts +113 -7
- package/dist/types/qx.d.ts.map +1 -1
- package/dist/types/qx.js.map +1 -1
- package/dist/visualization/api/RestEndpoints.js +1 -1
- package/dist/visualization/api/RestEndpoints.js.map +1 -1
- package/package.json +15 -54
@@ -1,580 +1,225 @@
 ---
 name: quality-metrics
-description: Measure quality effectively with actionable metrics. Use when establishing quality dashboards, defining KPIs, or evaluating test effectiveness.
+description: "Measure quality effectively with actionable metrics. Use when establishing quality dashboards, defining KPIs, or evaluating test effectiveness."
+category: testing-methodologies
+priority: high
+tokenEstimate: 900
+agents: [qe-quality-analyzer, qe-test-executor, qe-coverage-analyzer, qe-production-intelligence, qe-quality-gate]
+implementation_status: optimized
+optimization_version: 1.0
+last_optimized: 2025-12-02
+dependencies: []
+quick_reference_card: true
+tags: [metrics, dora, quality-gates, dashboards, kpis, measurement]
 ---

 # Quality Metrics

-
+<default_to_action>
+When measuring quality or building dashboards:
+1. MEASURE outcomes (bug escape rate, MTTD) not activities (test count)
+2. FOCUS on DORA metrics: Deployment frequency, Lead time, MTTD, MTTR, Change failure rate
+3. AVOID vanity metrics: 100% coverage means nothing if tests don't catch bugs
+4. SET thresholds that drive behavior (quality gates block bad code)
+5. TREND over time: Direction matters more than absolute numbers
+
+**Quick Metric Selection:**
+- Speed: Deployment frequency, lead time for changes
+- Stability: Change failure rate, MTTR
+- Quality: Bug escape rate, defect density, test effectiveness
+- Process: Code review time, flaky test rate
+
+**Critical Success Factors:**
+- Metrics without action are theater
+- What you measure is what you optimize
+- Trends matter more than snapshots
+</default_to_action>
+
+## Quick Reference Card
+
+### When to Use
+- Building quality dashboards
+- Defining quality gates
+- Evaluating testing effectiveness
+- Justifying quality investments
+
+### Meaningful vs Vanity Metrics
+| ✅ Meaningful | ❌ Vanity |
+|--------------|-----------|
+| Bug escape rate | Test case count |
+| MTTD (detection) | Lines of test code |
+| MTTR (recovery) | Test executions |
+| Change failure rate | Coverage % (alone) |
+| Lead time for changes | Requirements traced |
+
+### DORA Metrics
+| Metric | Elite | High | Medium | Low |
+|--------|-------|------|--------|-----|
+| Deploy Frequency | On-demand | Weekly | Monthly | Yearly |
+| Lead Time | < 1 hour | < 1 week | < 1 month | > 6 months |
+| Change Failure Rate | < 5% | < 15% | < 30% | > 45% |
+| MTTR | < 1 hour | < 1 day | < 1 week | > 1 month |
+
+### Quality Gate Thresholds
+| Metric | Blocking Threshold | Warning |
+|--------|-------------------|---------|
+| Test pass rate | 100% | - |
+| Critical coverage | > 80% | > 70% |
+| Security critical | 0 | - |
+| Performance p95 | < 200ms | < 500ms |
+| Flaky tests | < 2% | < 5% |
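The DORA table added above implies a simple tier classification. As a minimal TypeScript sketch of that mapping (illustrative only, not code shipped in this package; boundary handling is simplified):

```typescript
type DoraTier = 'elite' | 'high' | 'medium' | 'low';

interface DoraSample {
  deploysPerWeek: number;    // deployment frequency
  leadTimeHours: number;     // commit to production
  changeFailureRate: number; // fraction, 0..1
  mttrHours: number;         // mean time to restore
}

// Thresholds follow the DORA table above (1 week = 168 h, 1 month ≈ 720 h).
function doraTier(s: DoraSample): DoraTier {
  if (s.deploysPerWeek >= 7 && s.leadTimeHours < 1 &&
      s.changeFailureRate < 0.05 && s.mttrHours < 1) return 'elite';
  if (s.deploysPerWeek >= 1 && s.leadTimeHours < 168 &&
      s.changeFailureRate < 0.15 && s.mttrHours < 24) return 'high';
  if (s.leadTimeHours < 720 && s.changeFailureRate < 0.30 &&
      s.mttrHours < 168) return 'medium';
  return 'low';
}
```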

-
-
-Metrics should drive better decisions, not just prettier dashboards. If a metric doesn't change behavior or inform action, stop tracking it.
-
-## The Vanity Metrics Problem
-
-### Vanity Metrics (Stop Measuring These)
-
-**Test Count**
-- "We have 5,000 tests!"
-- So what? Are they finding bugs? Are they maintainable? Do they give confidence?
-
-**Code Coverage Percentage**
-- "We achieved 85% coverage!"
-- Useless without context. 85% of what? Critical paths? Or just getters/setters?
-
-**Test Cases Executed**
-- "Ran 10,000 test cases today!"
-- How many found problems? How many are redundant?
-
-**Bugs Found**
-- "QA found 200 bugs this sprint!"
-- Is that good or bad? Are they trivial or critical? Should they have been found earlier?
-
-**Story Points Completed**
-- "We completed 50 points of testing work!"
-- Points are relative and gameable. What actually got better?
-
-### Why Vanity Metrics Fail
-
-1. **Easily gamed**: People optimize for the metric, not the goal
-2. **No context**: Numbers without meaning
-3. **No action**: What do you do differently based on this number?
-4. **False confidence**: High numbers that mean nothing
-
-## Meaningful Metrics
-
-### 1. Defect Escape Rate
-
-**What**: Percentage of bugs that reach production vs. caught before release
+---

-
+## Core Metrics

-
-```
-Defect Escape Rate = (Production Bugs / Total Bugs Found) × 100
+### Bug Escape Rate
 ```
+Bug Escape Rate = (Production Bugs / Total Bugs Found) × 100

-
-**Needs work**: > 15% escape rate
-
-**Actions**:
-- High escape rate → Shift testing left, improve risk assessment
-- Low escape rate but slow releases → Maybe over-testing, reduce friction
-
-### 2. Mean Time to Detect (MTTD)
-
-**What**: How long from bug introduction to discovery
-
-**Why it matters**: Faster detection = cheaper fixes
-
-**How to measure**:
+Target: < 10% (90% caught before production)
 ```
-MTTD = Time bug found - Time bug introduced
-```
-
-**Good**: < 1 day for critical paths
-**Needs work**: > 1 week
-
-**Actions**:
-- High MTTD → Add monitoring, improve test coverage on critical paths
-- Very low MTTD → Your fast feedback loops are working
-
-### 3. Mean Time to Resolution (MTTR)
-
-**What**: Time from bug discovery to fix deployed

-
-
-**How to measure**:
-```
-MTTR = Time fix deployed - Time bug discovered
+### Test Effectiveness
 ```
+Test Effectiveness = (Bugs Found by Tests / Total Bugs) × 100

-
-**Needs work**: > 1 week for critical bugs
-
-**Actions**:
-- High MTTR → Investigate bottlenecks (test env access? deployment pipeline? handoffs?)
-- Very low MTTR but high escape rate → Rushing fixes, need better verification
-
-### 4. Deployment Frequency
-
-**What**: How often you deploy to production
-
-**Why it matters**: Proxy for team confidence and process maturity
-
-**How to measure**:
-```
-Deployments per week (or day)
+Target: > 70%
 ```

-
-**Decent**: Multiple per week
-**Needs work**: Less than weekly
-
-**Actions**:
-- Low frequency → Reduce batch size, improve automation, build confidence
-- High frequency with high defect rate → Need better automated checks
-
-### 5. Change Failure Rate
-
-**What**: Percentage of deployments that cause production issues
-
-**Why it matters**: Measures release quality
-
-**How to measure**:
+### Defect Density
 ```
-
-```
-
-**Good**: < 5%
-**Needs work**: > 15%
+Defect Density = Defects / KLOC

-
-- High failure rate → Improve pre-production validation, add canary deployments
-- Very low but slow releases → Maybe you can deploy more frequently
-
-### 6. Test Execution Time
-
-**What**: How long your test suite takes to run
-
-**Why it matters**: Slow tests = slow feedback = less frequent testing
-
-**How to measure**:
-```
-Time from commit to test completion
+Good: < 1 defect per KLOC
 ```

-
-**Needs work**: > 1 hour
-
-**Actions**:
-- Slow tests → Parallelize, remove redundant tests, optimize slow tests
-- Fast tests but bugs escaping → Coverage gaps, need better tests
-
-### 7. Flaky Test Rate
-
-**What**: Percentage of tests that fail intermittently
-
-**Why it matters**: Flaky tests destroy confidence
-
-**How to measure**:
+### Mean Time to Detect (MTTD)
 ```
-
-```
-
-**Good**: < 1%
-**Needs work**: > 5%
-
-**Actions**:
-- High flakiness → Fix or delete flaky tests immediately (quarantine pattern)
-- Low flakiness → Maintain vigilance, don't let it creep up
-
-## Context-Specific Metrics
-
-### For Startups
-
-**Focus on**:
-- Deployment frequency (speed to market)
-- Critical path coverage (protect revenue)
-- MTTR (move fast, fix fast)
-
-**Skip**:
-- Comprehensive coverage metrics
-- Detailed test documentation
-- Complex traceability
-
-### For Regulated Industries
-
-**Focus on**:
-- Traceability (requirement → test → result)
-- Test documentation completeness
-- Audit trail integrity
-
-**Don't skip**:
-- Deployment frequency still matters
-- But compliance isn't optional
-
-### For Established Products
-
-**Focus on**:
-- Defect escape rate (protect reputation)
-- Regression detection (maintain stability)
-- Test maintenance cost
-
-**Balance**:
-- Innovation vs. stability
-- New features vs. technical debt
-
-## Leading vs. Lagging Indicators
-
-### Lagging Indicators (Rearview Mirror)
-- Defect escape rate
-- Production incidents
-- Customer complaints
-- MTTR
-
-**Use for**: Understanding what happened, trending over time
-
-### Leading Indicators (Windshield)
-- Code review quality
-- Test coverage on new code
-- Deployment frequency trend
-- Team confidence surveys
-
-**Use for**: Predicting problems, early intervention
-
-## Metrics for Different Audiences
-
-### For Developers
-- Test execution time
-- Flaky test rate
-- Code review turnaround
-- Build failure frequency
-
-**Language**: Technical, actionable
-
-### For Product/Management
-- Deployment frequency
-- Change failure rate
-- Feature lead time
-- Customer-impacting incidents
-
-**Language**: Business outcomes, not technical details
-
-### For Executive Leadership
-- Defect escape rate trend
-- Mean time to resolution
-- Release velocity
-- Customer satisfaction (related to quality)
-
-**Language**: Business impact, strategic
-
-## Building a Metrics Dashboard
-
-### Essential Dashboard (Start Here)
-
-**Top Row (Health)**
-- Defect escape rate (last 30 days)
-- Deployment frequency (last 7 days)
-- Change failure rate (last 30 days)
-
-**Middle Row (Speed)**
-- MTTD (average, last 30 days)
-- MTTR (average, last 30 days)
-- Test execution time (current)
-
-**Bottom Row (Trends)**
-- All of the above as sparklines (3-6 months)
+MTTD = Time(Bug Reported) - Time(Bug Introduced)

-
-
-Add:
-- Flaky test rate
-- Test coverage on critical paths (not overall %)
-- Production error rate
-- Customer-reported bugs vs. internally found
-
-## Anti-Patterns
-
-### ❌ Metric-Driven Development
-**Problem**: Optimizing for metrics instead of quality
-
-**Example**: Writing useless tests to hit coverage targets
-
-**Fix**: Focus on outcomes (can we deploy confidently?) not numbers
-
-### ❌ Too Many Metrics
-**Problem**: Dashboard overload, no clear priorities
-
-**Example**: Tracking 30+ metrics that no one understands
-
-**Fix**: Start with 5-7 core metrics, add only if they drive decisions
-
-### ❌ Metrics Without Action
-**Problem**: Tracking numbers but not changing behavior
-
-**Example**: Watching MTTR climb for months without investigating
-
-**Fix**: For every metric, define thresholds and actions
-
-### ❌ Gaming the System
-**Problem**: People optimize for metrics, not quality
-
-**Example**: Marking bugs as "won't fix" to improve resolution time
-
-**Fix**: Multiple complementary metrics, qualitative reviews
-
-### ❌ One-Size-Fits-All
-**Problem**: Using same metrics for all teams/contexts
-
-**Example**: Measuring startup team same as regulated medical device team
-
-**Fix**: Context-driven metric selection
-
-## Metric Hygiene
-
-### Review Quarterly
-- Are we still using this metric to make decisions?
-- Is it being gamed?
-- Does it reflect current priorities?
-
-### Adjust Thresholds
-- What's "good" changes as you improve
-- Don't keep celebrating the same baseline
-- Raise the bar when appropriate
-
-### Kill Zombie Metrics
-- If no one looks at it → Delete it
-- If no one can explain what action to take → Delete it
-- If it's always green or always red → Delete it
-
-## Real-World Examples
-
-### Example 1: E-Commerce Company
-
-**Before**:
-- Measured: Test count (5,000 tests)
-- Result: Slow CI, frequent production bugs
-
-**After**:
-- Measured: Defect escape rate (8%), MTTD (3 days), deployment frequency (2/week)
-- Actions:
-  - Removed 2,000 redundant tests
-  - Added monitoring for critical paths
-  - Improved deployment pipeline
-- Result: Escape rate to 3%, MTTD to 6 hours, deploy 5x/day
-
-### Example 2: SaaS Platform
-
-**Before**:
-- Measured: Code coverage (85%)
-- Result: False confidence, bugs in uncovered critical paths
-
-**After**:
-- Measured: Critical path coverage (60%), deployment frequency, change failure rate
-- Actions:
-  - Focused testing on payment, auth, data integrity
-  - Removed tests on deprecated features
-  - Added production monitoring
-- Result: Fewer production incidents, faster releases
-
-## Questions to Ask About Any Metric
-
-1. **What decision does this inform?**
-   - If none → Don't track it
-
-2. **What action do we take if it's red?**
-   - If you don't know → Define thresholds and actions
-
-3. **Can this be gamed?**
-   - If yes → Add complementary metrics
-
-4. **Does this reflect actual quality?**
-   - If no → Replace it with something that does
-
-5. **Who needs to see this?**
-   - If no one → Stop tracking it
-
-## Remember
-
-**Good metrics**:
-- Drive better decisions
-- Are actionable
-- Reflect actual outcomes
-- Change as you mature
-
-**Bad metrics**:
-- Make dashboards pretty
-- Are easily gamed
-- Provide false confidence
-- Persist long after they're useful
-
-**Start small**: 5-7 metrics that matter
-**Review often**: Quarterly at minimum
-**Kill ruthlessly**: Remove metrics that don't drive action
-**Stay contextual**: What matters changes with your situation
-
-## Using with QE Agents
-
-### Automated Metrics Collection
-
-**qe-quality-analyzer** collects and analyzes quality metrics:
-```typescript
-// Agent collects comprehensive metrics automatically
-await agent.collectMetrics({
-  scope: 'all',
-  timeframe: '30d',
-  categories: [
-    'deployment-frequency',
-    'defect-escape-rate',
-    'test-execution-time',
-    'flaky-test-rate',
-    'coverage-trends'
-  ]
-});
-
-// Returns real-time dashboard data
-// No manual tracking required
+Target: < 1 day for critical, < 1 week for others
 ```

-
+---
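The Core Metrics formulas above are plain arithmetic over bug records. A self-contained TypeScript sketch of the calculations (the BugRecord shape is an assumption for illustration, not the package's data model):

```typescript
interface BugRecord {
  foundInProduction: boolean;
  foundByTest: boolean;
  introducedAt: Date;
  reportedAt: Date;
}

// Bug Escape Rate = (Production Bugs / Total Bugs Found) × 100; target < 10%
function bugEscapeRate(bugs: BugRecord[]): number {
  const escaped = bugs.filter(b => b.foundInProduction).length;
  return bugs.length === 0 ? 0 : (escaped / bugs.length) * 100;
}

// Test Effectiveness = (Bugs Found by Tests / Total Bugs) × 100; target > 70%
function testEffectiveness(bugs: BugRecord[]): number {
  const byTests = bugs.filter(b => b.foundByTest).length;
  return bugs.length === 0 ? 0 : (byTests / bugs.length) * 100;
}

// Defect Density = Defects / KLOC; good: < 1 per KLOC
function defectDensity(defects: number, linesOfCode: number): number {
  return defects / (linesOfCode / 1000);
}

// MTTD = Time(Bug Reported) - Time(Bug Introduced), averaged, in hours
function meanTimeToDetectHours(bugs: BugRecord[]): number {
  if (bugs.length === 0) return 0;
  const total = bugs.reduce(
    (sum, b) => sum + (b.reportedAt.getTime() - b.introducedAt.getTime()), 0);
  return total / bugs.length / 3_600_000;
}
```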

-
-```typescript
-// Agent detects metric anomalies
-const analysis = await agent.analyzeTrends({
-  metric: 'defect-escape-rate',
-  timeframe: '90d',
-  alertThreshold: 0.15
-});
+## Dashboard Design

-
-//
-
-
-
-
-
-
-
+```typescript
+// Agent generates quality dashboard
+await Task("Generate Dashboard", {
+  metrics: {
+    delivery: ['deployment-frequency', 'lead-time', 'change-failure-rate'],
+    quality: ['bug-escape-rate', 'test-effectiveness', 'defect-density'],
+    stability: ['mttd', 'mttr', 'availability'],
+    process: ['code-review-time', 'flaky-test-rate', 'coverage-trend']
+  },
+  visualization: 'grafana',
+  alerts: {
+    critical: { bug_escape_rate: '>20%', mttr: '>24h' },
+    warning: { coverage: '<70%', flaky_rate: '>5%' }
+  }
+}, "qe-quality-analyzer");
 ```

-
+---
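The alert rules in the dashboard example above are comparison strings such as '>20%' and '>24h'. This diff does not show how the package parses them; one plausible reading, as a sketch:

```typescript
// Parses rules like '>20%', '<70%', '>24h' and checks a numeric value
// expressed in the same unit (percent or hours).
function violates(rule: string, value: number): boolean {
  const m = rule.match(/^([<>])\s*(\d+(?:\.\d+)?)/);
  if (!m) throw new Error(`Unrecognized rule: ${rule}`);
  const [, op, num] = m;
  return op === '>' ? value > Number(num) : value < Number(num);
}

// Example: a bug escape rate of 23% trips the '>20%' critical alert.
console.log(violates('>20%', 23)); // true
console.log(violates('<70%', 82)); // false
```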

-
-
-
-
-
-
-
-
-
-
+## Quality Gate Configuration
+
+```json
+{
+  "qualityGates": {
+    "commit": {
+      "coverage": { "min": 80, "blocking": true },
+      "lint": { "errors": 0, "blocking": true }
+    },
+    "pr": {
+      "tests": { "pass": "100%", "blocking": true },
+      "security": { "critical": 0, "blocking": true },
+      "coverage_delta": { "min": 0, "blocking": false }
+    },
+    "release": {
+      "e2e": { "pass": "100%", "blocking": true },
+      "performance_p95": { "max_ms": 200, "blocking": true },
+      "bug_escape_rate": { "max": "10%", "blocking": false }
+    }
   }
-}
-
-// Returns:
-// {
-//   decision: 'NO-GO',
-//   blockers: [
-//     'Flaky test rate: 4.2% (threshold: 2%)'
-//   ],
-//   recommendations: [
-//     'Run qe-flaky-test-hunter to stabilize tests'
-//   ]
-// }
+}
 ```

-
+---
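For context, one way a stage of this configuration could be enforced. This is a sketch under simplifying assumptions (numeric min/max_ms checks only; percent strings and zero-count checks are omitted), not the actual qe-quality-gate implementation:

```typescript
interface GateCheck { blocking: boolean; min?: number; max_ms?: number; }
interface GateResult { passed: boolean; blockers: string[]; warnings: string[]; }

// Evaluate one stage ("commit" | "pr" | "release") against measured values.
function evaluateStage(
  stage: Record<string, GateCheck>,
  measured: Record<string, number>
): GateResult {
  const blockers: string[] = [];
  const warnings: string[] = [];
  for (const [name, check] of Object.entries(stage)) {
    const value = measured[name];
    if (value === undefined) continue; // unmeasured checks are skipped here
    const failed =
      (check.min !== undefined && value < check.min) ||
      (check.max_ms !== undefined && value > check.max_ms);
    if (!failed) continue;
    (check.blocking ? blockers : warnings).push(`${name}=${value}`);
  }
  return { passed: blockers.length === 0, blockers, warnings };
}
```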
+
+## Agent-Assisted Metrics

-**qe-quality-analyzer** generates live dashboards:
 ```typescript
-//
-await
-
-
-
-
+// Calculate quality trends
+await Task("Quality Trend Analysis", {
+  timeframe: '90d',
+  metrics: ['bug-escape-rate', 'mttd', 'test-effectiveness'],
+  compare: 'previous-90d',
+  predictNext: '30d'
+}, "qe-quality-analyzer");

-//
-
-
-
-
+// Evaluate quality gate
+await Task("Quality Gate Evaluation", {
+  buildId: 'build-123',
+  environment: 'staging',
+  metrics: currentMetrics,
+  policy: qualityPolicy
+}, "qe-quality-gate");
 ```

-
+---
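The trend-analysis request above includes predictNext: '30d'. The simplest projection consistent with that is a least-squares line over the metric's history; a sketch (illustrative, not necessarily the agent's actual model):

```typescript
// Fit y = a + b·x by least squares and extrapolate `ahead` steps
// past the last observation.
function linearForecast(history: number[], ahead: number): number {
  const n = history.length;
  const xs = history.map((_, i) => i);
  const meanX = xs.reduce((s, x) => s + x, 0) / n;
  const meanY = history.reduce((s, y) => s + y, 0) / n;
  let num = 0, den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (history[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  const b = den === 0 ? 0 : num / den;
  const a = meanY - b * meanX;
  return a + b * (n - 1 + ahead);
}

// Example: daily bug escape rate for the last week, projected 30 days out.
console.log(linearForecast([12, 11, 11, 10, 9, 9, 8], 30));
```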

-
-```typescript
-// Agent identifies which tests provide most value
-const optimization = await agent.optimizeTestSuite({
-  metrics: {
-    executionTime: 'per-test',
-    defectDetectionRate: 'per-test',
-    maintenanceCost: 'per-test'
-  },
-  goal: 'maximize-value-per-minute'
-});
+## Agent Coordination Hints

-
-
-
-
+### Memory Namespace
+```
+aqe/quality-metrics/
+├── dashboards/* - Dashboard configurations
+├── trends/* - Historical metric data
+├── gates/* - Gate evaluation results
+└── alerts/* - Triggered alerts
+```

-### Fleet Coordination
-
+### Fleet Coordination
 ```typescript
-// Multiple agents collaborate on metrics collection and analysis
 const metricsFleet = await FleetManager.coordinate({
   strategy: 'quality-metrics',
   agents: [
-    'qe-
-    'qe-
-    'qe-
-    'qe-
-    'qe-quality-gate'
+    'qe-quality-analyzer',        // Trend analysis
+    'qe-test-executor',           // Test metrics
+    'qe-coverage-analyzer',       // Coverage data
+    'qe-production-intelligence', // Production metrics
+    'qe-quality-gate'             // Gate decisions
   ],
-  topology: '
-});
-
-// Continuous metrics pipeline
-await metricsFleet.execute({
-  schedule: 'continuous',
-  aggregationInterval: '5min'
+  topology: 'mesh'
 });
 ```

-
+---

-
-// Agent recommends metrics based on context
-const recommendation = await qe-quality-analyzer.recommendMetrics({
-  context: 'startup',
-  stage: 'early',
-  team: 'small',
-  compliance: 'none'
-});
+## Common Traps

-
-
-
-
-
-
-// - comprehensive coverage %
-// - detailed traceability
-// - process compliance metrics
-```
+| Trap | Problem | Solution |
+|------|---------|----------|
+| Coverage worship | 100% coverage, bugs still escape | Measure bug escape rate instead |
+| Test count focus | Many tests, slow feedback | Measure execution time |
+| Activity metrics | Busy work, no outcomes | Measure outcomes (MTTD, MTTR) |
+| Point-in-time | Snapshot without context | Track trends over time |

 ---

 ## Related Skills
-
-
-- [
-- [
-
-**Testing Approaches:**
-- [risk-based-testing](../risk-based-testing/) - Risk-based metric selection
-- [test-automation-strategy](../test-automation-strategy/) - Automation effectiveness metrics
-- [exploratory-testing-advanced](../exploratory-testing-advanced/) - Exploratory session metrics
-
-**Development Practices:**
-- [xp-practices](../xp-practices/) - XP success metrics (velocity, lead time)
+- [agentic-quality-engineering](../agentic-quality-engineering/) - Agent coordination
+- [cicd-pipeline-qe-orchestrator](../cicd-pipeline-qe-orchestrator/) - Quality gates
+- [risk-based-testing](../risk-based-testing/) - Risk-informed metrics
+- [shift-right-testing](../shift-right-testing/) - Production metrics

 ---

-##
-
-- **Accelerate** by Forsgren, Humble, Kim (DORA metrics)
-- **How to Measure Anything** by Douglas Hubbard (measuring intangibles)
-- Your own retrospectives (which metrics helped? Which didn't?)
+## Remember

-
+**Measure outcomes, not activities.** Bug escape rate > test count. MTTD/MTTR > coverage %. Trends > snapshots. Set gates that block bad code. What you measure is what you optimize.

-**With Agents
+**With Agents:** Agents track metrics automatically, analyze trends, trigger alerts, and make gate decisions. Use agents to maintain continuous quality visibility.