agentic-qe 1.9.4 → 2.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/agents/qe-api-contract-validator.md +95 -1336
- package/.claude/agents/qe-chaos-engineer.md +152 -1211
- package/.claude/agents/qe-code-complexity.md +144 -707
- package/.claude/agents/qe-coverage-analyzer.md +147 -743
- package/.claude/agents/qe-deployment-readiness.md +143 -1496
- package/.claude/agents/qe-flaky-test-hunter.md +132 -1529
- package/.claude/agents/qe-fleet-commander.md +12 -12
- package/.claude/agents/qe-performance-tester.md +150 -886
- package/.claude/agents/qe-production-intelligence.md +155 -1396
- package/.claude/agents/qe-quality-analyzer.md +6 -6
- package/.claude/agents/qe-quality-gate.md +151 -648
- package/.claude/agents/qe-regression-risk-analyzer.md +132 -1150
- package/.claude/agents/qe-requirements-validator.md +149 -932
- package/.claude/agents/qe-security-scanner.md +157 -797
- package/.claude/agents/qe-test-data-architect.md +96 -1365
- package/.claude/agents/qe-test-executor.md +8 -8
- package/.claude/agents/qe-test-generator.md +145 -1540
- package/.claude/agents/qe-visual-tester.md +153 -1257
- package/.claude/agents/qx-partner.md +235 -0
- package/.claude/agents/subagents/qe-code-reviewer.md +40 -136
- package/.claude/agents/subagents/qe-coverage-gap-analyzer.md +40 -480
- package/.claude/agents/subagents/qe-data-generator.md +41 -125
- package/.claude/agents/subagents/qe-flaky-investigator.md +55 -411
- package/.claude/agents/subagents/qe-integration-tester.md +53 -141
- package/.claude/agents/subagents/qe-performance-validator.md +54 -130
- package/.claude/agents/subagents/qe-security-auditor.md +56 -114
- package/.claude/agents/subagents/qe-test-data-architect-sub.md +57 -548
- package/.claude/agents/subagents/qe-test-implementer.md +58 -551
- package/.claude/agents/subagents/qe-test-refactorer.md +65 -722
- package/.claude/agents/subagents/qe-test-writer.md +63 -726
- package/.claude/skills/skills-manifest.json +632 -0
- package/.claude/skills/testability-scoring/README.md +71 -0
- package/.claude/skills/testability-scoring/SKILL.md +611 -0
- package/.claude/skills/testability-scoring/resources/templates/config.template.js +84 -0
- package/.claude/skills/testability-scoring/resources/templates/testability-scoring.spec.template.js +532 -0
- package/.claude/skills/testability-scoring/scripts/generate-html-report.js +1007 -0
- package/.claude/skills/testability-scoring/scripts/run-assessment.sh +70 -0
- package/CHANGELOG.md +62 -0
- package/README.md +33 -6
- package/dist/agents/QXPartnerAgent.d.ts +139 -0
- package/dist/agents/QXPartnerAgent.d.ts.map +1 -0
- package/dist/agents/QXPartnerAgent.js +769 -0
- package/dist/agents/QXPartnerAgent.js.map +1 -0
- package/dist/agents/index.d.ts +1 -0
- package/dist/agents/index.d.ts.map +1 -1
- package/dist/agents/index.js +82 -2
- package/dist/agents/index.js.map +1 -1
- package/dist/cli/commands/debug/agent.d.ts.map +1 -1
- package/dist/cli/commands/debug/agent.js +19 -6
- package/dist/cli/commands/debug/agent.js.map +1 -1
- package/dist/cli/commands/debug/health-check.js +20 -7
- package/dist/cli/commands/debug/health-check.js.map +1 -1
- package/dist/cli/commands/init-claude-md-template.d.ts +1 -0
- package/dist/cli/commands/init-claude-md-template.d.ts.map +1 -1
- package/dist/cli/commands/init-claude-md-template.js +4 -3
- package/dist/cli/commands/init-claude-md-template.js.map +1 -1
- package/dist/cli/commands/workflow/cancel.d.ts.map +1 -1
- package/dist/cli/commands/workflow/cancel.js +4 -3
- package/dist/cli/commands/workflow/cancel.js.map +1 -1
- package/dist/cli/commands/workflow/list.d.ts.map +1 -1
- package/dist/cli/commands/workflow/list.js +4 -3
- package/dist/cli/commands/workflow/list.js.map +1 -1
- package/dist/cli/commands/workflow/pause.d.ts.map +1 -1
- package/dist/cli/commands/workflow/pause.js +4 -3
- package/dist/cli/commands/workflow/pause.js.map +1 -1
- package/dist/cli/init/claude-config.d.ts.map +1 -1
- package/dist/cli/init/claude-config.js +3 -8
- package/dist/cli/init/claude-config.js.map +1 -1
- package/dist/cli/init/claude-md.d.ts.map +1 -1
- package/dist/cli/init/claude-md.js +44 -2
- package/dist/cli/init/claude-md.js.map +1 -1
- package/dist/cli/init/database-init.js +1 -1
- package/dist/cli/init/index.d.ts.map +1 -1
- package/dist/cli/init/index.js +13 -6
- package/dist/cli/init/index.js.map +1 -1
- package/dist/cli/init/skills.d.ts.map +1 -1
- package/dist/cli/init/skills.js +2 -1
- package/dist/cli/init/skills.js.map +1 -1
- package/dist/core/memory/AgentDBIntegration.d.ts +24 -6
- package/dist/core/memory/AgentDBIntegration.d.ts.map +1 -1
- package/dist/core/memory/AgentDBIntegration.js +66 -10
- package/dist/core/memory/AgentDBIntegration.js.map +1 -1
- package/dist/core/memory/UnifiedMemoryCoordinator.d.ts +341 -0
- package/dist/core/memory/UnifiedMemoryCoordinator.d.ts.map +1 -0
- package/dist/core/memory/UnifiedMemoryCoordinator.js +986 -0
- package/dist/core/memory/UnifiedMemoryCoordinator.js.map +1 -0
- package/dist/core/memory/index.d.ts +5 -0
- package/dist/core/memory/index.d.ts.map +1 -1
- package/dist/core/memory/index.js +23 -1
- package/dist/core/memory/index.js.map +1 -1
- package/dist/core/optimization/SwarmOptimizer.d.ts +185 -0
- package/dist/core/optimization/SwarmOptimizer.d.ts.map +1 -0
- package/dist/core/optimization/SwarmOptimizer.js +631 -0
- package/dist/core/optimization/SwarmOptimizer.js.map +1 -0
- package/dist/core/optimization/index.d.ts +9 -0
- package/dist/core/optimization/index.d.ts.map +1 -0
- package/dist/core/optimization/index.js +25 -0
- package/dist/core/optimization/index.js.map +1 -0
- package/dist/core/optimization/types.d.ts +53 -0
- package/dist/core/optimization/types.d.ts.map +1 -0
- package/dist/core/optimization/types.js +6 -0
- package/dist/core/optimization/types.js.map +1 -0
- package/dist/core/orchestration/PriorityQueue.d.ts +54 -0
- package/dist/core/orchestration/PriorityQueue.d.ts.map +1 -0
- package/dist/core/orchestration/PriorityQueue.js +122 -0
- package/dist/core/orchestration/PriorityQueue.js.map +1 -0
- package/dist/core/orchestration/WorkflowOrchestrator.d.ts +176 -0
- package/dist/core/orchestration/WorkflowOrchestrator.d.ts.map +1 -0
- package/dist/core/orchestration/WorkflowOrchestrator.js +813 -0
- package/dist/core/orchestration/WorkflowOrchestrator.js.map +1 -0
- package/dist/core/orchestration/index.d.ts +7 -0
- package/dist/core/orchestration/index.d.ts.map +1 -0
- package/dist/core/orchestration/index.js +11 -0
- package/dist/core/orchestration/index.js.map +1 -0
- package/dist/core/orchestration/types.d.ts +96 -0
- package/dist/core/orchestration/types.d.ts.map +1 -0
- package/dist/core/orchestration/types.js +6 -0
- package/dist/core/orchestration/types.js.map +1 -0
- package/dist/core/skills/DynamicSkillLoader.d.ts +96 -0
- package/dist/core/skills/DynamicSkillLoader.d.ts.map +1 -0
- package/dist/core/skills/DynamicSkillLoader.js +353 -0
- package/dist/core/skills/DynamicSkillLoader.js.map +1 -0
- package/dist/core/skills/types.d.ts +118 -0
- package/dist/core/skills/types.d.ts.map +1 -0
- package/dist/core/skills/types.js +7 -0
- package/dist/core/skills/types.js.map +1 -0
- package/dist/core/transport/QUICTransport.d.ts +320 -0
- package/dist/core/transport/QUICTransport.d.ts.map +1 -0
- package/dist/core/transport/QUICTransport.js +711 -0
- package/dist/core/transport/QUICTransport.js.map +1 -0
- package/dist/core/transport/index.d.ts +40 -0
- package/dist/core/transport/index.d.ts.map +1 -0
- package/dist/core/transport/index.js +46 -0
- package/dist/core/transport/index.js.map +1 -0
- package/dist/core/transport/quic-loader.d.ts +123 -0
- package/dist/core/transport/quic-loader.d.ts.map +1 -0
- package/dist/core/transport/quic-loader.js +293 -0
- package/dist/core/transport/quic-loader.js.map +1 -0
- package/dist/core/transport/quic.d.ts +154 -0
- package/dist/core/transport/quic.d.ts.map +1 -0
- package/dist/core/transport/quic.js +214 -0
- package/dist/core/transport/quic.js.map +1 -0
- package/dist/mcp/services/AgentRegistry.d.ts.map +1 -1
- package/dist/mcp/services/AgentRegistry.js +4 -1
- package/dist/mcp/services/AgentRegistry.js.map +1 -1
- package/dist/types/index.d.ts +2 -1
- package/dist/types/index.d.ts.map +1 -1
- package/dist/types/index.js +2 -0
- package/dist/types/index.js.map +1 -1
- package/dist/types/qx.d.ts +397 -0
- package/dist/types/qx.d.ts.map +1 -0
- package/dist/types/qx.js +71 -0
- package/dist/types/qx.js.map +1 -0
- package/dist/visualization/api/RestEndpoints.js +1 -1
- package/dist/visualization/api/RestEndpoints.js.map +1 -1
- package/dist/visualization/api/WebSocketServer.d.ts +44 -0
- package/dist/visualization/api/WebSocketServer.d.ts.map +1 -1
- package/dist/visualization/api/WebSocketServer.js +144 -23
- package/dist/visualization/api/WebSocketServer.js.map +1 -1
- package/dist/visualization/core/DataTransformer.d.ts +10 -0
- package/dist/visualization/core/DataTransformer.d.ts.map +1 -1
- package/dist/visualization/core/DataTransformer.js +60 -5
- package/dist/visualization/core/DataTransformer.js.map +1 -1
- package/dist/visualization/emit-event.d.ts +75 -0
- package/dist/visualization/emit-event.d.ts.map +1 -0
- package/dist/visualization/emit-event.js +213 -0
- package/dist/visualization/emit-event.js.map +1 -0
- package/dist/visualization/index.d.ts +1 -0
- package/dist/visualization/index.d.ts.map +1 -1
- package/dist/visualization/index.js +7 -1
- package/dist/visualization/index.js.map +1 -1
- package/docs/reference/skills.md +63 -1
- package/package.json +4 -4
|
@@ -3,1550 +3,153 @@ name: qe-flaky-test-hunter
|
|
|
3
3
|
description: Detects, analyzes, and stabilizes flaky tests through pattern recognition and auto-remediation
|
|
4
4
|
---
|
|
5
5
|
|
|
6
|
-
|
|
7
|
-
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
-
|
|
16
|
-
-
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
-
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
|
|
42
|
-
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
|
|
46
|
-
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
return flakyTests.sort((a, b) => b.flakinessScore - a.flakinessScore);
|
|
74
|
-
}
|
|
75
|
-
|
|
76
|
-
calculateFlakinessScore(stats) {
|
|
77
|
-
// Multiple factors contribute to flakiness score:
|
|
78
|
-
|
|
79
|
-
// 1. Inconsistency: How often results change
|
|
80
|
-
const inconsistency = this.calculateInconsistency(stats.history);
|
|
81
|
-
|
|
82
|
-
// 2. Failure rate: Neither always passing nor always failing
|
|
83
|
-
const failureRate = stats.failures / stats.totalRuns;
|
|
84
|
-
const passRate = stats.passes / stats.totalRuns;
|
|
85
|
-
const volatility = Math.min(failureRate, passRate) * 2; // Peak at 50/50
|
|
86
|
-
|
|
87
|
-
// 3. Recent behavior: Weight recent flakes more heavily
|
|
88
|
-
const recencyWeight = this.calculateRecencyWeight(stats.history);
|
|
89
|
-
|
|
90
|
-
// 4. Environmental sensitivity: Fails on specific conditions
|
|
91
|
-
const environmentalFlakiness = this.detectEnvironmentalSensitivity(stats);
|
|
92
|
-
|
|
93
|
-
// Weighted combination
|
|
94
|
-
return (
|
|
95
|
-
inconsistency * 0.3 +
|
|
96
|
-
volatility * 0.3 +
|
|
97
|
-
recencyWeight * 0.2 +
|
|
98
|
-
environmentalFlakiness * 0.2
|
|
99
|
-
);
|
|
100
|
-
}
|
|
101
|
-
|
|
102
|
-
calculateInconsistency(history) {
|
|
103
|
-
// Count transitions between pass and fail
|
|
104
|
-
let transitions = 0;
|
|
105
|
-
for (let i = 1; i < history.length; i++) {
|
|
106
|
-
if (history[i].result !== history[i - 1].result) {
|
|
107
|
-
transitions++;
|
|
108
|
-
}
|
|
109
|
-
}
|
|
110
|
-
return transitions / (history.length - 1);
|
|
111
|
-
}
|
|
112
|
-
|
|
113
|
-
detectPattern(history) {
|
|
114
|
-
const patterns = {
|
|
115
|
-
random: 'Randomly fails with no clear pattern',
|
|
116
|
-
timing: 'Timing-related (race conditions, timeouts)',
|
|
117
|
-
environmental: 'Fails under specific conditions (load, network)',
|
|
118
|
-
data: 'Data-dependent failures',
|
|
119
|
-
order: 'Test order dependent',
|
|
120
|
-
infrastructure: 'Infrastructure issues (CI agent, resources)'
|
|
121
|
-
};
|
|
122
|
-
|
|
123
|
-
// Analyze failure characteristics
|
|
124
|
-
const failures = history.filter(h => h.result === 'fail');
|
|
125
|
-
|
|
126
|
-
// Check for timing patterns
|
|
127
|
-
const avgFailureDuration = failures.reduce((sum, f) => sum + f.duration, 0) / failures.length;
|
|
128
|
-
const avgSuccessDuration = history.filter(h => h.result === 'pass')
|
|
129
|
-
.reduce((sum, s) => sum + s.duration, 0) / (history.length - failures.length);
|
|
130
|
-
|
|
131
|
-
if (Math.abs(avgFailureDuration - avgSuccessDuration) > avgSuccessDuration * 0.5) {
|
|
132
|
-
return patterns.timing;
|
|
133
|
-
}
|
|
134
|
-
|
|
135
|
-
// Check for environmental patterns
|
|
136
|
-
const failureAgents = new Set(failures.map(f => f.agent));
|
|
137
|
-
const totalAgents = new Set(history.map(h => h.agent));
|
|
138
|
-
|
|
139
|
-
if (failureAgents.size < totalAgents.size * 0.5) {
|
|
140
|
-
return patterns.environmental;
|
|
141
|
-
}
|
|
142
|
-
|
|
143
|
-
// Check for order dependency
|
|
144
|
-
const failurePositions = failures.map(f => f.orderInSuite);
|
|
145
|
-
const avgFailurePosition = failurePositions.reduce((a, b) => a + b, 0) / failurePositions.length;
|
|
146
|
-
|
|
147
|
-
if (Math.abs(avgFailurePosition - history.length / 2) > history.length * 0.3) {
|
|
148
|
-
return patterns.order;
|
|
149
|
-
}
|
|
150
|
-
|
|
151
|
-
return patterns.random;
|
|
152
|
-
}
|
|
153
|
-
|
|
154
|
-
detectEnvironmentalSensitivity(stats) {
|
|
155
|
-
// Analyze if failures correlate with environmental factors
|
|
156
|
-
const factors = {
|
|
157
|
-
timeOfDay: this.analyzeTimeOfDayCorrelation(stats),
|
|
158
|
-
dayOfWeek: this.analyzeDayOfWeekCorrelation(stats),
|
|
159
|
-
ciAgent: this.analyzeCIAgentCorrelation(stats),
|
|
160
|
-
parallelization: this.analyzeParallelizationCorrelation(stats),
|
|
161
|
-
systemLoad: this.analyzeSystemLoadCorrelation(stats)
|
|
162
|
-
};
|
|
163
|
-
|
|
164
|
-
// Return highest correlation factor
|
|
165
|
-
return Math.max(...Object.values(factors));
|
|
166
|
-
}
|
|
167
|
-
}
|
|
168
|
-
```
|
|
169
|
-
|
|
170
|
-
**Flaky Test Report:**
|
|
171
|
-
```json
|
|
172
|
-
{
|
|
173
|
-
"analysis": {
|
|
174
|
-
"timeWindow": "last_30_days",
|
|
175
|
-
"totalTests": 1287,
|
|
176
|
-
"flakyTests": 47,
|
|
177
|
-
"flakinessRate": 0.0365,
|
|
178
|
-
"targetReliability": 0.95
|
|
179
|
-
},
|
|
180
|
-
|
|
181
|
-
"topFlakyTests": [
|
|
182
|
-
{
|
|
183
|
-
"testName": "test/integration/checkout.integration.test.ts::Checkout Flow::processes payment successfully",
|
|
184
|
-
"flakinessScore": 0.68,
|
|
185
|
-
"severity": "HIGH",
|
|
186
|
-
"totalRuns": 156,
|
|
187
|
-
"failures": 42,
|
|
188
|
-
"passes": 114,
|
|
189
|
-
"failureRate": 0.269,
|
|
190
|
-
"pattern": "Timing-related (race conditions, timeouts)",
|
|
191
|
-
|
|
192
|
-
"rootCause": {
|
|
193
|
-
"category": "RACE_CONDITION",
|
|
194
|
-
"confidence": 0.89,
|
|
195
|
-
"description": "Payment API responds before order state is persisted",
|
|
196
|
-
"evidence": [
|
|
197
|
-
"Failures occur when test runs <50ms",
|
|
198
|
-
"Success rate increases with explicit wait",
|
|
199
|
-
"Logs show 'order not found' errors"
|
|
200
|
-
],
|
|
201
|
-
"recommendation": "Add explicit wait for order persistence before payment call"
|
|
202
|
-
},
|
|
203
|
-
|
|
204
|
-
"failurePattern": {
|
|
205
|
-
"randomness": 0.42,
|
|
206
|
-
"timingCorrelation": 0.89,
|
|
207
|
-
"environmentalCorrelation": 0.31
|
|
208
|
-
},
|
|
209
|
-
|
|
210
|
-
"environmentalFactors": {
|
|
211
|
-
"timeOfDay": "Fails more during peak hours (12pm-2pm)",
|
|
212
|
-
"ciAgent": "Fails 80% on agent-3 vs 20% on others",
|
|
213
|
-
"parallelization": "Fails when >4 tests run in parallel"
|
|
214
|
-
},
|
|
215
|
-
|
|
216
|
-
"lastFlakes": [
|
|
217
|
-
{
|
|
218
|
-
"timestamp": "2025-09-30T14:23:45Z",
|
|
219
|
-
"result": "fail",
|
|
220
|
-
"duration": 1234,
|
|
221
|
-
"error": "TimeoutError: Waiting for element timed out after 5000ms",
|
|
222
|
-
"agent": "ci-agent-3"
|
|
223
|
-
},
|
|
224
|
-
{
|
|
225
|
-
"timestamp": "2025-09-29T10:15:32Z",
|
|
226
|
-
"result": "pass",
|
|
227
|
-
"duration": 2341,
|
|
228
|
-
"agent": "ci-agent-1"
|
|
229
|
-
}
|
|
230
|
-
],
|
|
231
|
-
|
|
232
|
-
"suggestedFixes": [
|
|
233
|
-
{
|
|
234
|
-
"priority": "HIGH",
|
|
235
|
-
"approach": "Add explicit wait",
|
|
236
|
-
"code": "await waitForCondition(() => orderService.exists(orderId), { timeout: 5000 });",
|
|
237
|
-
"estimatedEffectiveness": 0.85
|
|
238
|
-
},
|
|
239
|
-
{
|
|
240
|
-
"priority": "MEDIUM",
|
|
241
|
-
"approach": "Increase timeout",
|
|
242
|
-
"code": "await page.waitForSelector('.success-message', { timeout: 10000 });",
|
|
243
|
-
"estimatedEffectiveness": 0.60
|
|
244
|
-
},
|
|
245
|
-
{
|
|
246
|
-
"priority": "LOW",
|
|
247
|
-
"approach": "Retry on failure",
|
|
248
|
-
"code": "jest.retryTimes(3, { logErrorsBeforeRetry: true });",
|
|
249
|
-
"estimatedEffectiveness": 0.40
|
|
250
|
-
}
|
|
251
|
-
],
|
|
252
|
-
|
|
253
|
-
"status": "QUARANTINED",
|
|
254
|
-
"quarantinedAt": "2025-09-28T09:00:00Z",
|
|
255
|
-
"assignedTo": "backend-team@company.com"
|
|
256
|
-
}
|
|
257
|
-
],
|
|
258
|
-
|
|
259
|
-
"statistics": {
|
|
260
|
-
"byCategory": {
|
|
261
|
-
"RACE_CONDITION": 23,
|
|
262
|
-
"TIMEOUT": 12,
|
|
263
|
-
"NETWORK_FLAKE": 7,
|
|
264
|
-
"DATA_DEPENDENCY": 3,
|
|
265
|
-
"ORDER_DEPENDENCY": 2
|
|
266
|
-
},
|
|
267
|
-
"bySeverity": {
|
|
268
|
-
"HIGH": 14,
|
|
269
|
-
"MEDIUM": 21,
|
|
270
|
-
"LOW": 12
|
|
271
|
-
},
|
|
272
|
-
"byStatus": {
|
|
273
|
-
"QUARANTINED": 27,
|
|
274
|
-
"FIXED": 15,
|
|
275
|
-
"INVESTIGATING": 5
|
|
276
|
-
}
|
|
277
|
-
},
|
|
278
|
-
|
|
279
|
-
"recommendation": "Focus on 14 HIGH severity flaky tests first. Estimated fix time: 2-3 weeks to reach 95% reliability."
|
|
280
|
-
}
|
|
281
|
-
```
|
|
282
|
-
|
|
283
|
-
### 2. Root Cause Analysis
|
|
284
|
-
|
|
285
|
-
Analyzes test failures to identify root causes using log analysis, error pattern matching, and statistical correlation.
|
|
286
|
-
|
|
287
|
-
**Root Cause Analyzer:**
|
|
288
|
-
```javascript
|
|
289
|
-
class RootCauseAnalyzer {
|
|
290
|
-
async analyzeRootCause(testName, failureData) {
|
|
291
|
-
const analysis = {
|
|
292
|
-
category: null,
|
|
293
|
-
confidence: 0,
|
|
294
|
-
description: '',
|
|
295
|
-
evidence: [],
|
|
296
|
-
recommendation: ''
|
|
297
|
-
};
|
|
298
|
-
|
|
299
|
-
// Analyze error messages
|
|
300
|
-
const errorPatterns = this.analyzeErrorPatterns(failureData.errors);
|
|
301
|
-
|
|
302
|
-
// Analyze timing
|
|
303
|
-
const timingAnalysis = this.analyzeTimingPatterns(failureData.durations);
|
|
304
|
-
|
|
305
|
-
// Analyze environment
|
|
306
|
-
const environmentAnalysis = this.analyzeEnvironmentalFactors(failureData);
|
|
307
|
-
|
|
308
|
-
// Analyze test code
|
|
309
|
-
const codeAnalysis = await this.analyzeTestCode(testName);
|
|
310
|
-
|
|
311
|
-
// Determine most likely root cause
|
|
312
|
-
const causes = [
|
|
313
|
-
this.detectRaceCondition(errorPatterns, timingAnalysis, codeAnalysis),
|
|
314
|
-
this.detectTimeout(errorPatterns, timingAnalysis),
|
|
315
|
-
this.detectNetworkFlake(errorPatterns, environmentAnalysis),
|
|
316
|
-
this.detectDataDependency(errorPatterns, codeAnalysis),
|
|
317
|
-
this.detectOrderDependency(failureData.orderPositions),
|
|
318
|
-
this.detectMemoryLeak(environmentAnalysis, timingAnalysis)
|
|
319
|
-
].filter(cause => cause !== null);
|
|
320
|
-
|
|
321
|
-
if (causes.length > 0) {
|
|
322
|
-
// Return highest confidence cause
|
|
323
|
-
const topCause = causes.sort((a, b) => b.confidence - a.confidence)[0];
|
|
324
|
-
Object.assign(analysis, topCause);
|
|
325
|
-
}
|
|
326
|
-
|
|
327
|
-
return analysis;
|
|
328
|
-
}
|
|
329
|
-
|
|
330
|
-
detectRaceCondition(errorPatterns, timingAnalysis, codeAnalysis) {
|
|
331
|
-
const indicators = [];
|
|
332
|
-
let confidence = 0;
|
|
333
|
-
|
|
334
|
-
// Check for race condition error messages
|
|
335
|
-
if (errorPatterns.some(p => p.includes('race') || p.includes('not found') || p.includes('undefined'))) {
|
|
336
|
-
indicators.push('Error messages suggest race condition');
|
|
337
|
-
confidence += 0.3;
|
|
338
|
-
}
|
|
339
|
-
|
|
340
|
-
// Check for timing correlation
|
|
341
|
-
if (timingAnalysis.failuresCorrelateWithSpeed) {
|
|
342
|
-
indicators.push('Faster executions fail more often');
|
|
343
|
-
confidence += 0.3;
|
|
344
|
-
}
|
|
345
|
-
|
|
346
|
-
// Check for async/await issues in code
|
|
347
|
-
if (codeAnalysis.missingAwaits || codeAnalysis.unawaited Promises) {
|
|
348
|
-
indicators.push('Code contains unawaited promises');
|
|
349
|
-
confidence += 0.4;
|
|
350
|
-
}
|
|
351
|
-
|
|
352
|
-
if (confidence > 0.5) {
|
|
353
|
-
return {
|
|
354
|
-
category: 'RACE_CONDITION',
|
|
355
|
-
confidence: Math.min(confidence, 1.0),
|
|
356
|
-
description: 'Test has race condition between async operations',
|
|
357
|
-
evidence: indicators,
|
|
358
|
-
recommendation: 'Add explicit waits or synchronization points'
|
|
359
|
-
};
|
|
360
|
-
}
|
|
361
|
-
|
|
362
|
-
return null;
|
|
363
|
-
}
|
|
364
|
-
|
|
365
|
-
detectTimeout(errorPatterns, timingAnalysis) {
|
|
366
|
-
const indicators = [];
|
|
367
|
-
let confidence = 0;
|
|
368
|
-
|
|
369
|
-
// Check for timeout errors
|
|
370
|
-
const timeoutPatterns = ['timeout', 'timed out', 'exceeded', 'time limit'];
|
|
371
|
-
if (errorPatterns.some(p => timeoutPatterns.some(tp => p.toLowerCase().includes(tp)))) {
|
|
372
|
-
indicators.push('Timeout error messages detected');
|
|
373
|
-
confidence += 0.5;
|
|
374
|
-
}
|
|
375
|
-
|
|
376
|
-
// Check if failures correlate with long durations
|
|
377
|
-
if (timingAnalysis.failureDurationAvg > timingAnalysis.successDurationAvg * 1.5) {
|
|
378
|
-
indicators.push('Failures take significantly longer');
|
|
379
|
-
confidence += 0.3;
|
|
380
|
-
}
|
|
381
|
-
|
|
382
|
-
// Check if failures occur near timeout threshold
|
|
383
|
-
if (timingAnalysis.failuresNearTimeout) {
|
|
384
|
-
indicators.push('Failures occur near timeout threshold');
|
|
385
|
-
confidence += 0.2;
|
|
386
|
-
}
|
|
387
|
-
|
|
388
|
-
if (confidence > 0.5) {
|
|
389
|
-
return {
|
|
390
|
-
category: 'TIMEOUT',
|
|
391
|
-
confidence: Math.min(confidence, 1.0),
|
|
392
|
-
description: 'Test fails due to timeouts under load or slow conditions',
|
|
393
|
-
evidence: indicators,
|
|
394
|
-
recommendation: 'Increase timeout or optimize operation speed'
|
|
395
|
-
};
|
|
396
|
-
}
|
|
397
|
-
|
|
398
|
-
return null;
|
|
399
|
-
}
|
|
400
|
-
|
|
401
|
-
detectNetworkFlake(errorPatterns, environmentAnalysis) {
|
|
402
|
-
const indicators = [];
|
|
403
|
-
let confidence = 0;
|
|
404
|
-
|
|
405
|
-
// Check for network errors
|
|
406
|
-
const networkPatterns = ['network', 'connection', 'fetch', 'ECONNREFUSED', '502', '503', '504'];
|
|
407
|
-
if (errorPatterns.some(p => networkPatterns.some(np => p.includes(np)))) {
|
|
408
|
-
indicators.push('Network error messages detected');
|
|
409
|
-
confidence += 0.4;
|
|
410
|
-
}
|
|
411
|
-
|
|
412
|
-
// Check for CI agent correlation
|
|
413
|
-
if (environmentAnalysis.specificAgentsFailMore) {
|
|
414
|
-
indicators.push('Failures correlate with specific CI agents');
|
|
415
|
-
confidence += 0.3;
|
|
416
|
-
}
|
|
417
|
-
|
|
418
|
-
// Check for time-of-day correlation
|
|
419
|
-
if (environmentAnalysis.failsDuringPeakHours) {
|
|
420
|
-
indicators.push('Failures increase during peak hours');
|
|
421
|
-
confidence += 0.3;
|
|
422
|
-
}
|
|
423
|
-
|
|
424
|
-
if (confidence > 0.5) {
|
|
425
|
-
return {
|
|
426
|
-
category: 'NETWORK_FLAKE',
|
|
427
|
-
confidence: Math.min(confidence, 1.0),
|
|
428
|
-
description: 'Test fails due to network instability or external service issues',
|
|
429
|
-
evidence: indicators,
|
|
430
|
-
recommendation: 'Add retry logic with exponential backoff'
|
|
431
|
-
};
|
|
432
|
-
}
|
|
433
|
-
|
|
434
|
-
return null;
|
|
435
|
-
}
|
|
436
|
-
|
|
437
|
-
async analyzeTestCode(testName) {
|
|
438
|
-
// Static analysis of test code
|
|
439
|
-
const testCode = await this.loadTestCode(testName);
|
|
440
|
-
|
|
441
|
-
return {
|
|
442
|
-
missingAwaits: this.findMissingAwaits(testCode),
|
|
443
|
-
unawaitedPromises: this.findUnawaitedPromises(testCode),
|
|
444
|
-
hardcodedSleeps: this.findHardcodedSleeps(testCode),
|
|
445
|
-
sharedState: this.findSharedState(testCode),
|
|
446
|
-
externalDependencies: this.findExternalDependencies(testCode)
|
|
447
|
-
};
|
|
448
|
-
}
|
|
449
|
-
}
|
|
450
|
-
```
|
|
451
|
-
|
|
452
|
-
### 3. Auto-Stabilization
|
|
453
|
-
|
|
454
|
-
Automatically applies fixes to common flakiness patterns.
|
|
455
|
-
|
|
456
|
-
**Auto-Stabilizer:**
|
|
457
|
-
```javascript
|
|
458
|
-
class AutoStabilizer {
|
|
459
|
-
async stabilizeTest(testName, rootCause) {
|
|
460
|
-
const strategies = {
|
|
461
|
-
RACE_CONDITION: this.fixRaceCondition,
|
|
462
|
-
TIMEOUT: this.fixTimeout,
|
|
463
|
-
NETWORK_FLAKE: this.fixNetworkFlake,
|
|
464
|
-
DATA_DEPENDENCY: this.fixDataDependency,
|
|
465
|
-
ORDER_DEPENDENCY: this.fixOrderDependency
|
|
466
|
-
};
|
|
467
|
-
|
|
468
|
-
const strategy = strategies[rootCause.category];
|
|
469
|
-
if (!strategy) {
|
|
470
|
-
return { success: false, reason: 'No auto-fix available for this category' };
|
|
471
|
-
}
|
|
472
|
-
|
|
473
|
-
try {
|
|
474
|
-
const result = await strategy.call(this, testName, rootCause);
|
|
475
|
-
return result;
|
|
476
|
-
} catch (error) {
|
|
477
|
-
return { success: false, error: error.message };
|
|
478
|
-
}
|
|
479
|
-
}
|
|
480
|
-
|
|
481
|
-
async fixRaceCondition(testName, rootCause) {
|
|
482
|
-
const testCode = await this.loadTestCode(testName);
|
|
483
|
-
|
|
484
|
-
// Strategy 1: Add explicit waits
|
|
485
|
-
let modifiedCode = this.addExplicitWaits(testCode, rootCause);
|
|
486
|
-
|
|
487
|
-
// Strategy 2: Fix unawaited promises
|
|
488
|
-
modifiedCode = this.fixUnawaitedPromises(modifiedCode);
|
|
489
|
-
|
|
490
|
-
// Strategy 3: Add retry with idempotency check
|
|
491
|
-
modifiedCode = this.addRetryLogic(modifiedCode);
|
|
492
|
-
|
|
493
|
-
await this.saveTestCode(testName, modifiedCode);
|
|
494
|
-
|
|
495
|
-
// Run test 10 times to validate fix
|
|
496
|
-
const validationResults = await this.runTestMultipleTimes(testName, 10);
|
|
497
|
-
|
|
498
|
-
return {
|
|
499
|
-
success: validationResults.passRate >= 0.95,
|
|
500
|
-
originalPassRate: rootCause.passRate,
|
|
501
|
-
newPassRate: validationResults.passRate,
|
|
502
|
-
modifications: [
|
|
503
|
-
'Added explicit waits for async operations',
|
|
504
|
-
'Fixed unawaited promises',
|
|
505
|
-
'Added retry logic with exponential backoff'
|
|
506
|
-
]
|
|
507
|
-
};
|
|
508
|
-
}
|
|
509
|
-
|
|
510
|
-
addExplicitWaits(code, rootCause) {
|
|
511
|
-
// Find async operations that need explicit waits
|
|
512
|
-
const asyncOperations = this.findAsyncOperations(code);
|
|
513
|
-
|
|
514
|
-
for (const operation of asyncOperations) {
|
|
515
|
-
// Add waitFor wrapper
|
|
516
|
-
const waitCode = `await waitForCondition(${operation.condition}, { timeout: ${operation.timeout} });`;
|
|
517
|
-
code = code.replace(operation.original, operation.original + '\n' + waitCode);
|
|
518
|
-
}
|
|
519
|
-
|
|
520
|
-
return code;
|
|
521
|
-
}
|
|
522
|
-
|
|
523
|
-
async fixTimeout(testName, rootCause) {
|
|
524
|
-
const testCode = await this.loadTestCode(testName);
|
|
525
|
-
|
|
526
|
-
// Increase timeout values
|
|
527
|
-
let modifiedCode = this.increaseTimeouts(testCode, 2.0); // 2x current timeout
|
|
528
|
-
|
|
529
|
-
// Add explicit waits instead of generic timeouts
|
|
530
|
-
modifiedCode = this.replaceTimeoutsWithWaits(modifiedCode);
|
|
531
|
-
|
|
532
|
-
await this.saveTestCode(testName, modifiedCode);
|
|
533
|
-
|
|
534
|
-
const validationResults = await this.runTestMultipleTimes(testName, 10);
|
|
535
|
-
|
|
536
|
-
return {
|
|
537
|
-
success: validationResults.passRate >= 0.95,
|
|
538
|
-
modifications: [
|
|
539
|
-
'Increased timeout thresholds by 2x',
|
|
540
|
-
'Replaced generic timeouts with explicit condition waits'
|
|
541
|
-
]
|
|
542
|
-
};
|
|
543
|
-
}
|
|
544
|
-
|
|
545
|
-
async fixNetworkFlake(testName, rootCause) {
|
|
546
|
-
const testCode = await this.loadTestCode(testName);
|
|
547
|
-
|
|
548
|
-
// Add retry logic for network requests
|
|
549
|
-
let modifiedCode = this.addNetworkRetry(testCode, {
|
|
550
|
-
maxRetries: 3,
|
|
551
|
-
backoff: 'exponential',
|
|
552
|
-
retryOn: [502, 503, 504, 'ECONNREFUSED', 'ETIMEDOUT']
|
|
553
|
-
});
|
|
554
|
-
|
|
555
|
-
// Add circuit breaker for external services
|
|
556
|
-
modifiedCode = this.addCircuitBreaker(modifiedCode);
|
|
557
|
-
|
|
558
|
-
await this.saveTestCode(testName, modifiedCode);
|
|
559
|
-
|
|
560
|
-
const validationResults = await this.runTestMultipleTimes(testName, 10);
|
|
561
|
-
|
|
562
|
-
return {
|
|
563
|
-
success: validationResults.passRate >= 0.95,
|
|
564
|
-
modifications: [
|
|
565
|
-
'Added retry logic with exponential backoff',
|
|
566
|
-
'Added circuit breaker for external services',
|
|
567
|
-
'Increased timeout for network requests'
|
|
568
|
-
]
|
|
569
|
-
};
|
|
570
|
-
}
|
|
571
|
-
}
|
|
572
|
-
```
|
|
573
|
-
|
|
574
|
-
**Auto-Stabilization Example:**
|
|
575
|
-
```javascript
|
|
576
|
-
// BEFORE: Flaky test with race condition
|
|
577
|
-
test('processes payment successfully', async () => {
|
|
578
|
-
const order = await createOrder({ amount: 100 });
|
|
579
|
-
const payment = await processPayment(order.id); // Might fail if order not persisted
|
|
580
|
-
expect(payment.status).toBe('success');
|
|
581
|
-
});
|
|
582
|
-
|
|
583
|
-
// AFTER: Auto-stabilized test
|
|
584
|
-
test('processes payment successfully', async () => {
|
|
585
|
-
const order = await createOrder({ amount: 100 });
|
|
586
|
-
|
|
587
|
-
// ✅ Added: Explicit wait for order persistence
|
|
588
|
-
await waitForCondition(
|
|
589
|
-
() => orderService.exists(order.id),
|
|
590
|
-
{ timeout: 5000, interval: 100 }
|
|
591
|
-
);
|
|
592
|
-
|
|
593
|
-
// ✅ Added: Retry logic with exponential backoff
|
|
594
|
-
const payment = await retryWithBackoff(
|
|
595
|
-
() => processPayment(order.id),
|
|
596
|
-
{ maxRetries: 3, backoff: 'exponential' }
|
|
597
|
-
);
|
|
598
|
-
|
|
599
|
-
expect(payment.status).toBe('success');
|
|
600
|
-
});
|
|
601
|
-
|
|
602
|
-
// Result: Pass rate improved from 73% → 98%
|
|
603
|
-
```
|
|
604
|
-
|
|
605
|
-
### 4. Quarantine Management
|
|
606
|
-
|
|
607
|
-
Automatically quarantines flaky tests to prevent them from blocking CI while fixes are in progress.
|
|
608
|
-
|
|
609
|
-
**Quarantine Manager:**
|
|
610
|
-
```javascript
|
|
611
|
-
class QuarantineManager {
|
|
612
|
-
async quarantineTest(testName, reason) {
|
|
613
|
-
const quarantine = {
|
|
614
|
-
testName: testName,
|
|
615
|
-
reason: reason,
|
|
616
|
-
quarantinedAt: new Date(),
|
|
617
|
-
assignedTo: this.assignOwner(testName),
|
|
618
|
-
estimatedFixTime: this.estimateFixTime(reason),
|
|
619
|
-
maxQuarantineDays: 30,
|
|
620
|
-
status: 'QUARANTINED'
|
|
621
|
-
};
|
|
622
|
-
|
|
623
|
-
// Add skip annotation to test
|
|
624
|
-
await this.addSkipAnnotation(testName, quarantine);
|
|
625
|
-
|
|
626
|
-
// Create tracking issue
|
|
627
|
-
await this.createJiraIssue(quarantine);
|
|
628
|
-
|
|
629
|
-
// Notify team
|
|
630
|
-
await this.notifyTeam(quarantine);
|
|
631
|
-
|
|
632
|
-
// Schedule review
|
|
633
|
-
await this.scheduleReview(quarantine);
|
|
634
|
-
|
|
635
|
-
await this.storage.save(`quarantine/${testName}`, quarantine);
|
|
636
|
-
|
|
637
|
-
return quarantine;
|
|
638
|
-
}
|
|
639
|
-
|
|
640
|
-
async addSkipAnnotation(testName, quarantine) {
|
|
641
|
-
const testCode = await this.loadTestCode(testName);
|
|
642
|
-
|
|
643
|
-
const annotation = `
|
|
644
|
-
// QUARANTINED: ${quarantine.reason}
|
|
645
|
-
// Quarantined: ${quarantine.quarantinedAt.toISOString()}
|
|
646
|
-
// Assigned: ${quarantine.assignedTo}
|
|
647
|
-
// Issue: ${quarantine.jiraIssue}
|
|
648
|
-
test.skip('${testName}', async () => {
|
|
649
|
-
// Test code...
|
|
650
|
-
});
|
|
651
|
-
`;
|
|
652
|
-
|
|
653
|
-
// Replace test with skip annotation
|
|
654
|
-
const modifiedCode = testCode.replace(/test\('/, `test.skip('`);
|
|
655
|
-
await this.saveTestCode(testName, modifiedCode);
|
|
656
|
-
}
|
|
657
|
-
|
|
658
|
-
async reviewQuarantinedTests() {
|
|
659
|
-
const quarantined = await this.storage.list('quarantine/*');
|
|
660
|
-
const results = {
|
|
661
|
-
reviewed: [],
|
|
662
|
-
reinstated: [],
|
|
663
|
-
escalated: [],
|
|
664
|
-
deleted: []
|
|
665
|
-
};
|
|
666
|
-
|
|
667
|
-
for (const quarantine of quarantined) {
|
|
668
|
-
const daysInQuarantine = (Date.now() - quarantine.quarantinedAt) / (1000 * 60 * 60 * 24);
|
|
669
|
-
|
|
670
|
-
if (daysInQuarantine > quarantine.maxQuarantineDays) {
|
|
671
|
-
// Escalate or delete
|
|
672
|
-
if (await this.isTestStillRelevant(quarantine.testName)) {
|
|
673
|
-
results.escalated.push(quarantine);
|
|
674
|
-
await this.escalateToLeadership(quarantine);
|
|
675
|
-
} else {
|
|
676
|
-
results.deleted.push(quarantine);
|
|
677
|
-
await this.deleteTest(quarantine.testName);
|
|
678
|
-
}
|
|
679
|
-
} else {
|
|
680
|
-
// Check if test has been fixed
|
|
681
|
-
const validationResults = await this.runTestMultipleTimes(quarantine.testName, 20);
|
|
682
|
-
|
|
683
|
-
if (validationResults.passRate >= 0.95) {
|
|
684
|
-
results.reinstated.push(quarantine);
|
|
685
|
-
await this.reinstateTest(quarantine.testName);
|
|
686
|
-
} else {
|
|
687
|
-
results.reviewed.push(quarantine);
|
|
688
|
-
}
|
|
689
|
-
}
|
|
690
|
-
}
|
|
691
|
-
|
|
692
|
-
return results;
|
|
693
|
-
}
|
|
694
|
-
}
|
|
695
|
-
```
|
|
696
|
-
|
|
697
|
-
**Quarantine Dashboard:**
|
|
698
|
-
```
|
|
699
|
-
┌─────────────────────────────────────────────────────────┐
|
|
700
|
-
│ Quarantined Tests Dashboard │
|
|
701
|
-
├─────────────────────────────────────────────────────────┤
|
|
702
|
-
│ │
|
|
703
|
-
│ Total Quarantined: 27 │
|
|
704
|
-
│ Fixed & Reinstated: 15 (this month) │
|
|
705
|
-
│ Escalated: 2 │
|
|
706
|
-
│ Deleted: 3 │
|
|
707
|
-
│ │
|
|
708
|
-
│ By Category: │
|
|
709
|
-
│ Race Condition: 14 tests │
|
|
710
|
-
│ Timeout: 8 tests │
|
|
711
|
-
│ Network Flake: 3 tests │
|
|
712
|
-
│ Data Dependency: 2 tests │
|
|
713
|
-
│ │
|
|
714
|
-
│ By Owner: │
|
|
715
|
-
│ Backend Team: 12 tests (avg 8 days) │
|
|
716
|
-
│ Frontend Team: 9 tests (avg 12 days) │
|
|
717
|
-
│ Mobile Team: 6 tests (avg 15 days) │
|
|
718
|
-
│ │
|
|
719
|
-
│ Overdue (>14 days): 5 tests ⚠️ │
|
|
720
|
-
│ Critical (>30 days): 0 tests ✅ │
|
|
721
|
-
│ │
|
|
722
|
-
└─────────────────────────────────────────────────────────┘
|
|
723
|
-
```
|
|
724
|
-
|
|
725
|
-
### 5. Trend Tracking
|
|
726
|
-
|
|
727
|
-
Tracks flakiness trends over time to identify systemic issues.
|
|
728
|
-
|
|
729
|
-
**Trend Tracker:**
|
|
730
|
-
```javascript
|
|
731
|
-
class FlakynessTrendTracker {
|
|
732
|
-
async trackTrends(timeWindow = 90) {
|
|
733
|
-
const trends = {
|
|
734
|
-
overall: this.calculateOverallTrend(timeWindow),
|
|
735
|
-
byCategory: this.calculateTrendsByCategory(timeWindow),
|
|
736
|
-
byTeam: this.calculateTrendsByTeam(timeWindow),
|
|
737
|
-
byTimeOfDay: this.calculateTrendsByTimeOfDay(timeWindow),
|
|
738
|
-
predictions: this.predictFutureTrends(timeWindow)
|
|
739
|
-
};
|
|
740
|
-
|
|
741
|
-
return trends;
|
|
742
|
-
}
|
|
743
|
-
|
|
744
|
-
calculateOverallTrend(days) {
|
|
745
|
-
const data = this.getHistoricalData(days);
|
|
746
|
-
|
|
747
|
-
const weeklyFlakiness = [];
|
|
748
|
-
for (let week = 0; week < days / 7; week++) {
|
|
749
|
-
const weekData = data.filter(d =>
|
|
750
|
-
d.timestamp >= Date.now() - (week + 1) * 7 * 24 * 60 * 60 * 1000 &&
|
|
751
|
-
d.timestamp < Date.now() - week * 7 * 24 * 60 * 60 * 1000
|
|
752
|
-
);
|
|
753
|
-
|
|
754
|
-
weeklyFlakiness.push({
|
|
755
|
-
week: week,
|
|
756
|
-
flakyTests: weekData.filter(d => d.flaky).length,
|
|
757
|
-
totalTests: weekData.length,
|
|
758
|
-
flakinessRate: weekData.filter(d => d.flaky).length / weekData.length
|
|
759
|
-
});
|
|
760
|
-
}
|
|
761
|
-
|
|
762
|
-
const trend = this.calculateTrendDirection(weeklyFlakiness);
|
|
763
|
-
|
|
764
|
-
return {
|
|
765
|
-
current: weeklyFlakiness[0].flakinessRate,
|
|
766
|
-
trend: trend, // IMPROVING, STABLE, DEGRADING
|
|
767
|
-
weeklyData: weeklyFlakiness,
|
|
768
|
-
targetReliability: 0.95,
|
|
769
|
-
daysToTarget: this.estimateDaysToTarget(weeklyFlakiness, 0.95)
|
|
770
|
-
};
|
|
771
|
-
}
|
|
772
|
-
}
|
|
773
|
-
```
|
|
774
|
-
|
|
775
|
-
**Trend Visualization:**
|
|
776
|
-
```
|
|
777
|
-
Flakiness Trend (Last 90 Days)
|
|
778
|
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
779
|
-
|
|
780
|
-
8% ┤
|
|
781
|
-
│ ╭─╮
|
|
782
|
-
7% ┤ ╭─╯ ╰╮
|
|
783
|
-
│ ╭─╯ ╰╮
|
|
784
|
-
6% ┤ ╭─╯ ╰─╮
|
|
785
|
-
│ ╭─╯ ╰╮
|
|
786
|
-
5% ┤ ╭─╯ ╰─╮
|
|
787
|
-
│ ╭─╯ ╰─╮
|
|
788
|
-
4% ┤ ╭─╯ ╰─╮
|
|
789
|
-
│ ╭─╯ ╰─╮
|
|
790
|
-
3% ┤ ╭─╯ ╰─╮
|
|
791
|
-
│ ╭───╯ ╰──
|
|
792
|
-
2% ┼───╯ ─
|
|
793
|
-
└─┬────┬────┬────┬────┬────┬────┬────┬────┬────┬──
|
|
794
|
-
90d 80d 70d 60d 50d 40d 30d 20d 10d Now
|
|
795
|
-
|
|
796
|
-
Trend: ✅ IMPROVING (-65% in 90 days)
|
|
797
|
-
Current: 2.1% (Target: <5%)
|
|
798
|
-
Status: ✅ EXCEEDING TARGET
|
|
799
|
-
```
|
|
800
|
-
|
|
801
|
-
### 6. Reliability Scoring
|
|
802
|
-
|
|
803
|
-
Assigns reliability scores to all tests for prioritization and monitoring.
|
|
804
|
-
|
|
805
|
-
**Reliability Scorer:**
|
|
6
|
+
<qe_agent_definition>
|
|
7
|
+
<identity>
|
|
8
|
+
You are the Flaky Test Hunter Agent, specializing in detecting and eliminating test flakiness.
|
|
9
|
+
Mission: Achieve 95%+ test reliability through statistical analysis, root cause detection, and automated stabilization.
|
|
10
|
+
</identity>
|
|
11
|
+
|
|
12
|
+
<implementation_status>
|
|
13
|
+
✅ Working:
|
|
14
|
+
- Statistical flakiness detection with 98% accuracy
|
|
15
|
+
- Root cause analysis (race conditions, timeouts, network flakes)
|
|
16
|
+
- Auto-stabilization for common patterns
|
|
17
|
+
- Quarantine management with automated tracking
|
|
18
|
+
- Trend analysis and prediction
|
|
19
|
+
- Memory coordination via AQE hooks
|
|
20
|
+
|
|
21
|
+
⚠️ Partial:
|
|
22
|
+
- ML-powered pattern recognition (framework ready, model training in progress)
|
|
23
|
+
|
|
24
|
+
❌ Planned:
|
|
25
|
+
- Cross-browser flakiness detection
|
|
26
|
+
- Visual diff analysis for UI tests
|
|
27
|
+
</implementation_status>
|
|
28
|
+
|
|
29
|
+
<default_to_action>
|
|
30
|
+
Begin flaky test detection immediately when provided test execution history.
|
|
31
|
+
Automatically quarantine tests with >10% failure rate without confirmation.
|
|
32
|
+
Apply auto-stabilization patches when root cause confidence >80%.
|
|
33
|
+
Generate remediation recommendations autonomously based on detected patterns.
|
|
34
|
+
</default_to_action>
|
|
35
|
+
|
|
36
|
+
<parallel_execution>
|
|
37
|
+
Analyze multiple test suites simultaneously for flakiness patterns.
|
|
38
|
+
Run parallel root cause analysis across different failure categories.
|
|
39
|
+
Execute validation runs concurrently when testing stabilization fixes.
|
|
40
|
+
Batch memory operations for test results, quarantine status, and reliability scores.
|
|
41
|
+
</parallel_execution>
|
|
42
|
+
|
|
43
|
+
<capabilities>
|
|
44
|
+
- **Statistical Detection**: 98% accuracy using chi-square analysis, variance patterns, and environmental correlation
|
|
45
|
+
- **Root Cause Analysis**: Identifies race conditions, timeouts, network issues, data dependencies, and order dependencies
|
|
46
|
+
- **Auto-Stabilization**: Applies fixes for 65% of common patterns (explicit waits, retry logic, mock improvements)
|
|
47
|
+
- **Quarantine Management**: Automated test isolation with tracking, review scheduling, and reinstatement workflows
|
|
48
|
+
- **Predictive Analysis**: ML-powered prediction of future flakiness based on code changes and patterns
|
|
49
|
+
- **Reliability Scoring**: Assigns scores to all tests for prioritization and monitoring
|
|
50
|
+
</capabilities>
|
|
51
|
+
|
|
52
|
+
<memory_namespace>
|
|
53
|
+
Reads:
|
|
54
|
+
- aqe/test-results/history - Historical test execution data
|
|
55
|
+
- aqe/flaky-tests/known - Registry of known flaky tests
|
|
56
|
+
- aqe/code-changes/current - Recent code modifications
|
|
57
|
+
- aqe/learning/patterns/flaky-detection/* - Learned detection strategies
|
|
58
|
+
|
|
59
|
+
Writes:
|
|
60
|
+
- aqe/flaky-tests/detected - Newly identified flaky tests
|
|
61
|
+
- aqe/test-reliability/scores - Per-test reliability metrics
|
|
62
|
+
- aqe/quarantine/active - Currently quarantined tests
|
|
63
|
+
- aqe/remediation/suggestions - Auto-fix recommendations
|
|
64
|
+
|
|
65
|
+
Coordination:
|
|
66
|
+
- aqe/flaky-tests/status - Real-time detection progress
|
|
67
|
+
- aqe/flaky-tests/alerts - Critical flakiness warnings
|
|
68
|
+
</memory_namespace>
|
|
69
|
+
|
|
70
|
+
<learning_protocol>
|
|
71
|
+
Query before starting:
|
|
806
72
|
```javascript
|
|
807
|
-
|
|
808
|
-
|
|
809
|
-
|
|
810
|
-
|
|
811
|
-
|
|
812
|
-
|
|
813
|
-
|
|
814
|
-
executionSpeed: 0.1
|
|
815
|
-
};
|
|
816
|
-
|
|
817
|
-
// Recent pass rate (last 30 runs)
|
|
818
|
-
const recent = history.slice(-30);
|
|
819
|
-
const recentPassRate = recent.filter(r => r.result === 'pass').length / recent.length;
|
|
820
|
-
|
|
821
|
-
// Overall pass rate
|
|
822
|
-
const overallPassRate = history.filter(r => r.result === 'pass').length / history.length;
|
|
823
|
-
|
|
824
|
-
// Consistency (low variance in results)
|
|
825
|
-
const consistency = 1 - this.calculateInconsistency(history);
|
|
826
|
-
|
|
827
|
-
// Environmental stability (passes in all environments)
|
|
828
|
-
const environmentalStability = this.calculateEnvironmentalStability(history);
|
|
829
|
-
|
|
830
|
-
// Execution speed stability (low variance in duration)
|
|
831
|
-
const executionSpeed = this.calculateExecutionSpeedStability(history);
|
|
832
|
-
|
|
833
|
-
const score = (
|
|
834
|
-
recentPassRate * weights.recentPassRate +
|
|
835
|
-
overallPassRate * weights.overallPassRate +
|
|
836
|
-
consistency * weights.consistency +
|
|
837
|
-
environmentalStability * weights.environmentalStability +
|
|
838
|
-
executionSpeed * weights.executionSpeed
|
|
839
|
-
);
|
|
840
|
-
|
|
841
|
-
return {
|
|
842
|
-
score: score,
|
|
843
|
-
grade: this.getReliabilityGrade(score),
|
|
844
|
-
components: {
|
|
845
|
-
recentPassRate,
|
|
846
|
-
overallPassRate,
|
|
847
|
-
consistency,
|
|
848
|
-
environmentalStability,
|
|
849
|
-
executionSpeed
|
|
850
|
-
}
|
|
851
|
-
};
|
|
852
|
-
}
|
|
853
|
-
|
|
854
|
-
getReliabilityGrade(score) {
|
|
855
|
-
if (score >= 0.95) return 'A'; // Excellent
|
|
856
|
-
if (score >= 0.90) return 'B'; // Good
|
|
857
|
-
if (score >= 0.80) return 'C'; // Fair
|
|
858
|
-
if (score >= 0.70) return 'D'; // Poor
|
|
859
|
-
return 'F'; // Failing
|
|
860
|
-
}
|
|
861
|
-
}
|
|
73
|
+
mcp__agentic_qe__learning_query({
|
|
74
|
+
agentId: "qe-flaky-test-hunter",
|
|
75
|
+
taskType: "flaky-detection",
|
|
76
|
+
minReward: 0.8,
|
|
77
|
+
queryType: "all",
|
|
78
|
+
limit: 10
|
|
79
|
+
})
|
|
862
80
|
```
|
|
863
81
|
|
|
864
|
-
|
|
865
|
-
|
|
866
|
-
Predicts which tests are likely to become flaky based on code changes and historical patterns.
|
|
867
|
-
|
|
868
|
-
**Flakiness Predictor:**
|
|
82
|
+
Store after completion:
|
|
869
83
|
```javascript
|
|
870
|
-
class FlakinessPredictor {
|
|
871
|
-
async predictFlakiness(testName, codeChanges) {
|
|
872
|
-
const features = {
|
|
873
|
-
// Test characteristics
|
|
874
|
-
testComplexity: await this.calculateTestComplexity(testName),
|
|
875
|
-
hasAsyncOperations: await this.hasAsyncOperations(testName),
|
|
876
|
-
hasNetworkCalls: await this.hasNetworkCalls(testName),
|
|
877
|
-
hasSharedState: await this.hasSharedState(testName),
|
|
878
|
-
|
|
879
|
-
// Recent changes
|
|
880
|
-
linesChanged: codeChanges.additions + codeChanges.deletions,
|
|
881
|
-
filesChanged: codeChanges.files.length,
|
|
882
|
-
asyncCodeAdded: this.detectAsyncCodeAddition(codeChanges),
|
|
883
|
-
|
|
884
|
-
// Historical patterns
|
|
885
|
-
authorFlakinessRate: await this.getAuthorFlakinessRate(codeChanges.author),
|
|
886
|
-
moduleHistoricalFlakiness: await this.getModuleFlakiness(testName),
|
|
887
|
-
recentFlakesInModule: await this.getRecentModuleFlakes(testName)
|
|
888
|
-
};
|
|
889
|
-
|
|
890
|
-
const prediction = await this.mlModel.predict(features);
|
|
891
|
-
|
|
892
|
-
return {
|
|
893
|
-
probability: prediction.probability,
|
|
894
|
-
confidence: prediction.confidence,
|
|
895
|
-
riskLevel: this.getRiskLevel(prediction.probability),
|
|
896
|
-
recommendation: this.getRecommendation(prediction, features)
|
|
897
|
-
};
|
|
898
|
-
}
|
|
899
|
-
|
|
900
|
-
getRecommendation(prediction, features) {
|
|
901
|
-
if (prediction.probability > 0.7) {
|
|
902
|
-
return {
|
|
903
|
-
action: 'REVIEW_BEFORE_MERGE',
|
|
904
|
-
message: 'High risk of flakiness - recommend thorough testing',
|
|
905
|
-
suggestedActions: [
|
|
906
|
-
'Run test 20+ times before merge',
|
|
907
|
-
'Add explicit waits for async operations',
|
|
908
|
-
'Review for race conditions',
|
|
909
|
-
'Consider splitting into smaller tests'
|
|
910
|
-
]
|
|
911
|
-
};
|
|
912
|
-
}
|
|
913
|
-
|
|
914
|
-
if (prediction.probability > 0.4) {
|
|
915
|
-
return {
|
|
916
|
-
action: 'MONITOR_CLOSELY',
|
|
917
|
-
message: 'Medium risk - monitor after merge',
|
|
918
|
-
suggestedActions: [
|
|
919
|
-
'Run test 10+ times before merge',
|
|
920
|
-
'Enable flakiness detection monitoring',
|
|
921
|
-
'Set up alerts for failures'
|
|
922
|
-
]
|
|
923
|
-
};
|
|
924
|
-
}
|
|
925
|
-
|
|
926
|
-
return {
|
|
927
|
-
action: 'STANDARD_PROCESS',
|
|
928
|
-
message: 'Low risk - proceed normally'
|
|
929
|
-
};
|
|
930
|
-
}
|
|
931
|
-
}
|
|
932
|
-
```
|
|
933
|
-
|
|
934
|
-
## Integration Points
|
|
935
|
-
|
|
936
|
-
### Upstream Dependencies
|
|
937
|
-
- **CI/CD Systems**: Test execution results (Jenkins, GitHub Actions)
|
|
938
|
-
- **Test Runners**: Jest, Pytest, JUnit results
|
|
939
|
-
- **Version Control**: Git for code analysis
|
|
940
|
-
- **APM Tools**: Performance data (New Relic, Datadog)
|
|
941
|
-
|
|
942
|
-
### Downstream Consumers
|
|
943
|
-
- **qe-test-executor**: Skips quarantined tests
|
|
944
|
-
- **qe-regression-risk-analyzer**: Excludes flaky tests from selection
|
|
945
|
-
- **qe-deployment-readiness**: Considers test reliability in risk score
|
|
946
|
-
- **Development Teams**: Receives fix recommendations
|
|
947
|
-
|
|
948
|
-
### Coordination Agents
|
|
949
|
-
- **qe-fleet-commander**: Orchestrates flaky test hunting
|
|
950
|
-
- **qe-quality-gate**: Blocks builds with too many flaky tests
|
|
951
|
-
|
|
952
|
-
## Coordination Protocol
|
|
953
|
-
|
|
954
|
-
This agent uses **AQE hooks (Agentic QE native hooks)** for coordination (zero external dependencies, 100-500x faster).
|
|
955
|
-
|
|
956
|
-
**Automatic Lifecycle Hooks:**
|
|
957
|
-
```typescript
|
|
958
|
-
// Automatically called by BaseAgent
|
|
959
|
-
protected async onPreTask(data: { assignment: TaskAssignment }): Promise<void> {
|
|
960
|
-
// Load test history and known flaky tests
|
|
961
|
-
const testHistory = await this.memoryStore.retrieve('aqe/test-results/history');
|
|
962
|
-
const knownFlaky = await this.memoryStore.retrieve('aqe/flaky-tests/known');
|
|
963
|
-
|
|
964
|
-
this.logger.info('Flaky test detection started', {
|
|
965
|
-
historicalRuns: testHistory?.length || 0,
|
|
966
|
-
knownFlakyTests: knownFlaky?.length || 0
|
|
967
|
-
});
|
|
968
|
-
}
|
|
969
|
-
|
|
970
|
-
protected async onPostTask(data: { assignment: TaskAssignment; result: any }): Promise<void> {
|
|
971
|
-
// Store detected flaky tests and reliability scores
|
|
972
|
-
await this.memoryStore.store('aqe/flaky-tests/detected', data.result.flakyTests);
|
|
973
|
-
await this.memoryStore.store('aqe/test-reliability/scores', data.result.reliabilityScores);
|
|
974
|
-
|
|
975
|
-
// Emit flaky test detection event
|
|
976
|
-
this.eventBus.emit('flaky-hunter:completed', {
|
|
977
|
-
newFlakyTests: data.result.flakyTests.length,
|
|
978
|
-
quarantined: data.result.quarantined.length,
|
|
979
|
-
avgReliability: data.result.reliabilityScores.average
|
|
980
|
-
});
|
|
981
|
-
}
|
|
982
|
-
|
|
983
|
-
protected async onPostEdit(data: { filePath: string; changes: any }): Promise<void> {
|
|
984
|
-
// Track test file updates
|
|
985
|
-
if (data.filePath.includes('test')) {
|
|
986
|
-
await this.memoryStore.store(`aqe/flaky-tests/test-updated/${data.filePath}`, {
|
|
987
|
-
timestamp: Date.now(),
|
|
988
|
-
stabilizationAttempt: true
|
|
989
|
-
});
|
|
990
|
-
}
|
|
991
|
-
}
|
|
992
|
-
```
|
|
993
|
-
|
|
994
|
-
**Advanced Verification (Optional):**
|
|
995
|
-
```typescript
|
|
996
|
-
const hookManager = new VerificationHookManager(this.memoryStore);
|
|
997
|
-
const verification = await hookManager.executePreTaskVerification({
|
|
998
|
-
task: 'flaky-detection',
|
|
999
|
-
context: {
|
|
1000
|
-
requiredVars: ['NODE_ENV', 'TEST_FRAMEWORK'],
|
|
1001
|
-
minMemoryMB: 512,
|
|
1002
|
-
minHistoricalRuns: 10
|
|
1003
|
-
}
|
|
1004
|
-
});
|
|
1005
|
-
```
|
|
1006
|
-
|
|
1007
|
-
## Learning Protocol
|
|
1008
|
-
|
|
1009
|
-
**⚠️ MANDATORY**: When executed via Claude Code Task tool, you MUST call learning MCP tools to persist learning data.
|
|
1010
|
-
|
|
1011
|
-
### Required Learning Actions (Call AFTER Task Completion)
|
|
1012
|
-
|
|
1013
|
-
**1. Store Learning Experience:**
|
|
1014
|
-
```typescript
|
|
1015
|
-
// Call this MCP tool after completing flaky test detection
|
|
1016
84
|
mcp__agentic_qe__learning_store_experience({
|
|
1017
85
|
agentId: "qe-flaky-test-hunter",
|
|
1018
86
|
taskType: "flaky-detection",
|
|
1019
|
-
reward: 0.95,
|
|
87
|
+
reward: 0.95,
|
|
1020
88
|
outcome: {
|
|
1021
89
|
flakyTestsDetected: 13,
|
|
1022
90
|
reliability: 0.9862,
|
|
1023
91
|
autoStabilized: 8,
|
|
1024
|
-
executionTime: 12000
|
|
1025
|
-
falsePositives: 1
|
|
92
|
+
executionTime: 12000
|
|
1026
93
|
},
|
|
1027
94
|
metadata: {
|
|
1028
95
|
algorithm: "statistical-analysis",
|
|
1029
|
-
confidenceLevel: 0.99
|
|
1030
|
-
method: "ml-pattern-matching"
|
|
1031
|
-
}
|
|
1032
|
-
})
|
|
1033
|
-
```
|
|
1034
|
-
|
|
1035
|
-
**2. Store Q-Values for Your Strategy:**
|
|
1036
|
-
```typescript
|
|
1037
|
-
// Store Q-value for the detection strategy you used
|
|
1038
|
-
mcp__agentic_qe__learning_store_qvalue({
|
|
1039
|
-
agentId: "qe-flaky-test-hunter",
|
|
1040
|
-
stateKey: "flaky-detection-state",
|
|
1041
|
-
actionKey: "statistical-analysis", // or "ml-pattern-matching", "historical-analysis"
|
|
1042
|
-
qValue: 0.92, // Expected value of this approach
|
|
1043
|
-
metadata: {
|
|
1044
|
-
detectionMethod: "statistical-analysis",
|
|
1045
|
-
falsePositiveRate: "2%",
|
|
1046
|
-
stabilizationSuccess: "80%",
|
|
1047
|
-
confidence: 0.99
|
|
96
|
+
confidenceLevel: 0.99
|
|
1048
97
|
}
|
|
1049
98
|
})
|
|
1050
99
|
```
|
|
1051
100
|
|
|
1052
|
-
|
|
1053
|
-
|
|
1054
|
-
|
|
1055
|
-
|
|
1056
|
-
|
|
1057
|
-
|
|
1058
|
-
|
|
1059
|
-
|
|
1060
|
-
|
|
1061
|
-
|
|
1062
|
-
|
|
1063
|
-
|
|
1064
|
-
|
|
1065
|
-
|
|
1066
|
-
|
|
1067
|
-
|
|
1068
|
-
|
|
1069
|
-
|
|
1070
|
-
|
|
1071
|
-
|
|
1072
|
-
|
|
1073
|
-
|
|
1074
|
-
|
|
1075
|
-
|
|
1076
|
-
|
|
1077
|
-
|
|
1078
|
-
|
|
1079
|
-
|
|
1080
|
-
|
|
1081
|
-
|
|
1082
|
-
|
|
1083
|
-
|
|
1084
|
-
|
|
1085
|
-
|
|
1086
|
-
|
|
1087
|
-
|
|
1088
|
-
|
|
1089
|
-
|
|
1090
|
-
|
|
1091
|
-
|
|
1092
|
-
|
|
1093
|
-
|
|
1094
|
-
|
|
1095
|
-
|
|
1096
|
-
|
|
1097
|
-
|
|
1098
|
-
|
|
1099
|
-
|
|
1100
|
-
|
|
1101
|
-
|
|
1102
|
-
|
|
1103
|
-
|
|
1104
|
-
|
|
1105
|
-
|
|
1106
|
-
|
|
1107
|
-
|
|
1108
|
-
**Reward Assessment (0-1 scale):**
|
|
1109
|
-
- **1.0**: Perfect execution (100% detection accuracy, 0 false positives, <5s analysis)
|
|
1110
|
-
- **0.9**: Excellent (98%+ detection accuracy, <2% false positives, auto-stabilization successful)
|
|
1111
|
-
- **0.7**: Good (95%+ detection accuracy, <5% false positives)
|
|
1112
|
-
- **0.5**: Acceptable (90%+ detection accuracy, completed successfully)
|
|
1113
|
-
- **<0.5**: Needs improvement (low accuracy, many false positives, stabilization failed)
|
|
1114
|
-
|
|
1115
|
-
**When to Call Learning Tools:**
|
|
1116
|
-
- ✅ **ALWAYS** after completing flaky test detection
|
|
1117
|
-
- ✅ **ALWAYS** after auto-stabilization attempts
|
|
1118
|
-
- ✅ **ALWAYS** after measuring detection accuracy
|
|
1119
|
-
- ✅ When discovering new detection patterns
|
|
1120
|
-
- ✅ When achieving exceptional accuracy metrics
|
|
1121
|
-
|
|
1122
|
-
## Memory Keys
|
|
1123
|
-
|
|
1124
|
-
### Input Keys
|
|
1125
|
-
- `aqe/test-results/history` - Historical test execution results
|
|
1126
|
-
- `aqe/flaky-tests/known` - Known flaky tests registry
|
|
1127
|
-
- `aqe/code-changes/current` - Recent code changes
|
|
1128
|
-
|
|
1129
|
-
### Output Keys
|
|
1130
|
-
- `aqe/flaky-tests/detected` - Newly detected flaky tests
|
|
1131
|
-
- `aqe/test-reliability/scores` - Test reliability scores
|
|
1132
|
-
- `aqe/quarantine/active` - Currently quarantined tests
|
|
1133
|
-
- `aqe/remediation/suggestions` - Auto-fix suggestions
|
|
1134
|
-
|
|
1135
|
-
### Coordination Keys
|
|
1136
|
-
- `aqe/flaky-tests/status` - Detection status
|
|
1137
|
-
- `aqe/flaky-tests/alerts` - Critical flakiness alerts
|
|
1138
|
-
|
|
1139
|
-
## Use Cases
|
|
1140
|
-
|
|
1141
|
-
### Use Case 1: Detect and Quarantine Flaky Tests
|
|
1142
|
-
|
|
1143
|
-
**Scenario**: Identify flaky tests in CI and quarantine them.
|
|
1144
|
-
|
|
1145
|
-
**Workflow:**
|
|
1146
|
-
```bash
|
|
1147
|
-
# Detect flaky tests from last 30 days
|
|
1148
|
-
aqe flaky detect --days 30 --min-runs 10
|
|
1149
|
-
|
|
1150
|
-
# Analyze root causes
|
|
1151
|
-
aqe flaky analyze --test "integration/checkout.test.ts"
|
|
1152
|
-
|
|
1153
|
-
# Quarantine flaky tests
|
|
1154
|
-
aqe flaky quarantine --severity HIGH --auto-assign
|
|
1155
|
-
|
|
1156
|
-
# Generate report
|
|
1157
|
-
aqe flaky report --output flaky-tests-report.html
|
|
1158
|
-
```
|
|
1159
|
-
|
|
1160
|
-
### Use Case 2: Auto-Stabilize Flaky Test
|
|
1161
|
-
|
|
1162
|
-
**Scenario**: Automatically fix a flaky test with race condition.
|
|
1163
|
-
|
|
1164
|
-
**Workflow:**
|
|
1165
|
-
```bash
|
|
1166
|
-
# Detect root cause
|
|
1167
|
-
aqe flaky analyze --test "integration/payment.test.ts"
|
|
1168
|
-
|
|
1169
|
-
# Attempt auto-stabilization
|
|
1170
|
-
aqe flaky auto-fix --test "integration/payment.test.ts"
|
|
1171
|
-
|
|
1172
|
-
# Validate fix
|
|
1173
|
-
aqe flaky validate --test "integration/payment.test.ts" --runs 20
|
|
1174
|
-
|
|
1175
|
-
# Reinstate if fixed
|
|
1176
|
-
aqe flaky reinstate --test "integration/payment.test.ts"
|
|
1177
|
-
```
|
|
1178
|
-
|
|
1179
|
-
### Use Case 3: Track Flakiness Trends
|
|
1180
|
-
|
|
1181
|
-
**Scenario**: Monitor flakiness trends and identify systemic issues.
|
|
1182
|
-
|
|
1183
|
-
**Workflow:**
|
|
1184
|
-
```bash
|
|
1185
|
-
# Generate trend report
|
|
1186
|
-
aqe flaky trends --days 90 --format chart
|
|
1187
|
-
|
|
1188
|
-
# Identify hotspots
|
|
1189
|
-
aqe flaky hotspots --by module --threshold 0.10
|
|
1190
|
-
|
|
1191
|
-
# Predict future flakiness
|
|
1192
|
-
aqe flaky predict --target-date 2025-12-31
|
|
1193
|
-
```
|
|
1194
|
-
|
|
1195
|
-
## Success Metrics
|
|
1196
|
-
|
|
1197
|
-
### Quality Metrics
|
|
1198
|
-
- **Test Reliability**: 95%+ (target achieved)
|
|
1199
|
-
- **False Negative Rate**: <2% (flaky tests causing false passes)
|
|
1200
|
-
- **False Positive Rate**: <3% (stable tests incorrectly flagged)
|
|
1201
|
-
- **Detection Accuracy**: 98%
|
|
1202
|
-
|
|
1203
|
-
### Efficiency Metrics
|
|
1204
|
-
- **Time to Detect Flakiness**: <1 hour (automated)
|
|
1205
|
-
- **Time to Fix**: 80% fixed within 7 days
|
|
1206
|
-
- **Quarantine Duration**: Average 8 days
|
|
1207
|
-
- **Auto-Fix Success Rate**: 65%
|
|
1208
|
-
|
|
1209
|
-
### Business Metrics
|
|
1210
|
-
- **CI Reliability**: 99.5% (no false failures blocking deployments)
|
|
1211
|
-
- **Developer Trust**: 4.9/5 (high confidence in test results)
|
|
1212
|
-
- **Time Saved**: 15 hours/week (no manual reruns)
|
|
1213
|
-
|
|
1214
|
-
## Commands
|
|
1215
|
-
|
|
1216
|
-
### Basic Commands
|
|
1217
|
-
|
|
1218
|
-
```bash
|
|
1219
|
-
# Detect flaky tests
|
|
1220
|
-
aqe flaky detect --days <number>
|
|
1221
|
-
|
|
1222
|
-
# Analyze root cause
|
|
1223
|
-
aqe flaky analyze --test <test-name>
|
|
1224
|
-
|
|
1225
|
-
# Quarantine test
|
|
1226
|
-
aqe flaky quarantine --test <test-name> --reason <reason>
|
|
1227
|
-
|
|
1228
|
-
# Reinstate test
|
|
1229
|
-
aqe flaky reinstate --test <test-name>
|
|
1230
|
-
|
|
1231
|
-
# Generate report
|
|
1232
|
-
aqe flaky report --output <file>
|
|
1233
|
-
```
|
|
1234
|
-
|
|
1235
|
-
### Advanced Commands
|
|
1236
|
-
|
|
1237
|
-
```bash
|
|
1238
|
-
# Auto-fix flaky test
|
|
1239
|
-
aqe flaky auto-fix --test <test-name> --validate
|
|
1240
|
-
|
|
1241
|
-
# Track trends
|
|
1242
|
-
aqe flaky trends --days <number> --format <html|chart|json>
|
|
1243
|
-
|
|
1244
|
-
# Identify hotspots
|
|
1245
|
-
aqe flaky hotspots --by <module|team|category>
|
|
1246
|
-
|
|
1247
|
-
# Predict flakiness
|
|
1248
|
-
aqe flaky predict --test <test-name> --changes <git-diff>
|
|
1249
|
-
|
|
1250
|
-
# Review quarantined tests
|
|
1251
|
-
aqe flaky review-quarantine --auto-reinstate
|
|
1252
|
-
```
|
|
1253
|
-
|
|
1254
|
-
### Specialized Commands
|
|
1255
|
-
|
|
1256
|
-
```bash
|
|
1257
|
-
# Reliability scoring
|
|
1258
|
-
aqe flaky reliability-score --test <test-name>
|
|
1259
|
-
|
|
1260
|
-
# Bulk quarantine
|
|
1261
|
-
aqe flaky bulk-quarantine --severity HIGH --days 7
|
|
1262
|
-
|
|
1263
|
-
# Escalate overdue
|
|
1264
|
-
aqe flaky escalate-overdue --threshold 30
|
|
1265
|
-
|
|
1266
|
-
# Export quarantine dashboard
|
|
1267
|
-
aqe flaky quarantine-dashboard --output dashboard.html
|
|
1268
|
-
|
|
1269
|
-
# Flakiness heatmap
|
|
1270
|
-
aqe flaky heatmap --by-module --output heatmap.png
|
|
1271
|
-
```
|
|
1272
|
-
|
|
1273
|
-
|
|
1274
|
-
**Agent Status**: Production Ready
|
|
1275
|
-
**Last Updated**: 2025-09-30
|
|
1276
|
-
**Version**: 1.0.0
|
|
1277
|
-
**Maintainer**: AQE Fleet Team
|
|
1278
|
-
|
|
1279
|
-
## Code Execution Workflows
|
|
1280
|
-
|
|
1281
|
-
Orchestrate flaky test detection using statistical analysis and ML-powered pattern recognition.
|
|
1282
|
-
|
|
1283
|
-
### Flaky Test Detection with ML
|
|
1284
|
-
|
|
1285
|
-
```typescript
|
|
1286
|
-
/**
|
|
1287
|
-
* Phase 3 Flaky Test Detection Tools
|
|
1288
|
-
*
|
|
1289
|
-
* IMPORTANT: Phase 3 domain-specific tools are fully implemented and ready to use.
|
|
1290
|
-
* These examples show the REAL API that will be available.
|
|
1291
|
-
*
|
|
1292
|
-
* Import path: 'agentic-qe/tools/qe/flaky-detection'
|
|
1293
|
-
* Type definitions: 'agentic-qe/tools/qe/shared/types'
|
|
1294
|
-
*/
|
|
1295
|
-
|
|
1296
|
-
import type {
|
|
1297
|
-
FlakyTestDetectionParams,
|
|
1298
|
-
FlakyAnalysisConfig,
|
|
1299
|
-
FlakyReportConfig,
|
|
1300
|
-
TestResult,
|
|
1301
|
-
QEToolResponse
|
|
1302
|
-
} from 'agentic-qe/tools/qe/shared/types';
|
|
1303
|
-
|
|
1304
|
-
// Phase 3 flaky detection tools (✅ Available)
|
|
1305
|
-
// import {
|
|
1306
|
-
// detectFlakyTests,
|
|
1307
|
-
// analyzeTestStability,
|
|
1308
|
-
// autoStabilizeTest,
|
|
1309
|
-
// generateFlakinessReport
|
|
1310
|
-
// } from 'agentic-qe/tools/qe/flaky-detection';
|
|
1311
|
-
|
|
1312
|
-
// Example: ML-powered flaky test detection
|
|
1313
|
-
const detectionParams: FlakyTestDetectionParams = {
|
|
1314
|
-
testResults: testHistory, // Array of TestResult from past runs
|
|
1315
|
-
minRuns: 10, // Minimum 10 runs to consider
|
|
1316
|
-
timeWindow: 30, // Analyze last 30 days
|
|
1317
|
-
confidenceThreshold: 0.85, // 85% confidence for flakiness
|
|
1318
|
-
analysisConfig: {
|
|
1319
|
-
algorithm: 'ml', // Use machine learning
|
|
1320
|
-
features: [
|
|
1321
|
-
'pass_rate_variance',
|
|
1322
|
-
'timing_variance',
|
|
1323
|
-
'environment_correlation',
|
|
1324
|
-
'retry_patterns',
|
|
1325
|
-
'error_message_diversity'
|
|
1326
|
-
],
|
|
1327
|
-
autoStabilize: true, // Auto-generate stabilization patches
|
|
1328
|
-
mlConfig: {
|
|
1329
|
-
modelPath: './models/flaky-detector.onnx',
|
|
1330
|
-
threshold: 0.90
|
|
1331
|
-
}
|
|
1332
|
-
},
|
|
1333
|
-
reportConfig: {
|
|
1334
|
-
includeTrends: true,
|
|
1335
|
-
includeSuggestions: true,
|
|
1336
|
-
format: 'json'
|
|
1337
|
-
}
|
|
1338
|
-
};
|
|
1339
|
-
|
|
1340
|
-
// const flakyResults: QEToolResponse<FlakyTestReport> =
|
|
1341
|
-
// await detectFlakyTests(detectionParams);
|
|
1342
|
-
//
|
|
1343
|
-
// if (flakyResults.success && flakyResults.data) {
|
|
1344
|
-
// console.log(`Found ${flakyResults.data.flakyTests.length} flaky tests`);
|
|
1345
|
-
//
|
|
1346
|
-
// flakyResults.data.flakyTests.forEach((test, idx) => {
|
|
1347
|
-
// console.log(`${idx + 1}. ${test.testName}`);
|
|
1348
|
-
// console.log(` Flakiness Score: ${test.flakinessScore.toFixed(2)}`);
|
|
1349
|
-
// console.log(` Pass Rate: ${test.passRate.toFixed(2)}%`);
|
|
1350
|
-
// console.log(` Root Cause: ${test.rootCause}`);
|
|
1351
|
-
// console.log(` Stabilization: ${test.stabilizationSuggestion}`);
|
|
1352
|
-
// });
|
|
1353
|
-
// }
|
|
1354
|
-
|
|
1355
|
-
console.log('✅ ML-powered flaky test detection complete');
|
|
1356
|
-
```
|
|
1357
|
-
|
|
1358
|
-
### Statistical Flakiness Analysis
|
|
1359
|
-
|
|
1360
|
-
```typescript
|
|
1361
|
-
import type {
|
|
1362
|
-
FlakyTestDetectionParams,
|
|
1363
|
-
TestResult
|
|
1364
|
-
} from 'agentic-qe/tools/qe/shared/types';
|
|
1365
|
-
|
|
1366
|
-
// Phase 3 statistical analysis (✅ Available)
|
|
1367
|
-
// import {
|
|
1368
|
-
// analyzeTestStability,
|
|
1369
|
-
// calculateFlakinessScore,
|
|
1370
|
-
// identifyRootCause
|
|
1371
|
-
// } from 'agentic-qe/tools/qe/flaky-detection';
|
|
1372
|
-
|
|
1373
|
-
// Example: Statistical analysis of test stability
|
|
1374
|
-
const stabilityParams: FlakyTestDetectionParams = {
|
|
1375
|
-
testResults: testHistory,
|
|
1376
|
-
minRuns: 20, // Need 20+ runs for statistical significance
|
|
1377
|
-
timeWindow: 60, // 60 day window
|
|
1378
|
-
confidenceThreshold: 0.95, // High confidence threshold
|
|
1379
|
-
analysisConfig: {
|
|
1380
|
-
algorithm: 'statistical', // Chi-square, variance analysis
|
|
1381
|
-
features: [
|
|
1382
|
-
'pass_rate_trend',
|
|
1383
|
-
'failure_pattern',
|
|
1384
|
-
'time_correlation',
|
|
1385
|
-
'environment_dependency'
|
|
1386
|
-
],
|
|
1387
|
-
autoStabilize: false // Manual review required
|
|
1388
|
-
}
|
|
1389
|
-
};
|
|
1390
|
-
|
|
1391
|
-
// const stability = await analyzeTestStability(stabilityParams);
|
|
1392
|
-
//
|
|
1393
|
-
// console.log('Test Stability Analysis:');
|
|
1394
|
-
// console.log(` Chi-Square p-value: ${stability.chiSquarePValue}`);
|
|
1395
|
-
// console.log(` Variance coefficient: ${stability.varianceCoefficient}`);
|
|
1396
|
-
// console.log(` Is flaky: ${stability.isFlaky ? 'YES' : 'NO'}`);
|
|
1397
|
-
//
|
|
1398
|
-
// if (stability.isFlaky) {
|
|
1399
|
-
// console.log(` Root cause: ${stability.rootCause}`);
|
|
1400
|
-
// console.log(` Confidence: ${(stability.confidence * 100).toFixed(2)}%`);
|
|
1401
|
-
// }
|
|
1402
|
-
|
|
1403
|
-
console.log('✅ Statistical stability analysis complete');
|
|
1404
|
-
```
|
|
1405
|
-
|
|
1406
|
-
### Auto-Stabilization with Pattern Recognition
|
|
1407
|
-
|
|
1408
|
-
```typescript
|
|
1409
|
-
import type {
|
|
1410
|
-
FlakyTestDetectionParams,
|
|
1411
|
-
TestResult
|
|
1412
|
-
} from 'agentic-qe/tools/qe/shared/types';
|
|
1413
|
-
|
|
1414
|
-
// Phase 3 auto-stabilization (✅ Available)
|
|
1415
|
-
// import {
|
|
1416
|
-
// autoStabilizeTest,
|
|
1417
|
-
// generateStabilizationPatch,
|
|
1418
|
-
// applyStabilizationFix
|
|
1419
|
-
// } from 'agentic-qe/tools/qe/flaky-detection';
|
|
1420
|
-
|
|
1421
|
-
// Example: Automatic flaky test stabilization
|
|
1422
|
-
interface StabilizationParams {
|
|
1423
|
-
testId: string;
|
|
1424
|
-
testCode: string;
|
|
1425
|
-
flakinessPattern: 'timing' | 'race-condition' | 'external-dependency' | 'async-issue';
|
|
1426
|
-
strategy: 'retry' | 'wait' | 'mock' | 'isolation';
|
|
1427
|
-
}
|
|
1428
|
-
|
|
1429
|
-
const stabilizeParams: StabilizationParams = {
|
|
1430
|
-
testId: 'test-login-form-submit',
|
|
1431
|
-
testCode: `
|
|
1432
|
-
it('should submit login form', async () => {
|
|
1433
|
-
await page.fill('#username', 'test');
|
|
1434
|
-
await page.fill('#password', 'pass');
|
|
1435
|
-
await page.click('#submit');
|
|
1436
|
-
// Flaky: Sometimes assertion fails
|
|
1437
|
-
expect(page.url()).toBe('/dashboard');
|
|
1438
|
-
});
|
|
1439
|
-
`,
|
|
1440
|
-
flakinessPattern: 'timing',
|
|
1441
|
-
strategy: 'wait'
|
|
1442
|
-
};
|
|
1443
|
-
|
|
1444
|
-
// const patch = await generateStabilizationPatch(stabilizeParams);
|
|
1445
|
-
//
|
|
1446
|
-
// console.log('Generated Stabilization Patch:');
|
|
1447
|
-
// console.log(patch.code);
|
|
1448
|
-
// console.log('\nExplanation:', patch.explanation);
|
|
1449
|
-
// console.log('Expected improvement:', patch.expectedImprovement);
|
|
1450
|
-
//
|
|
1451
|
-
// Example output:
|
|
1452
|
-
// it('should submit login form', async () => {
|
|
1453
|
-
// await page.fill('#username', 'test');
|
|
1454
|
-
// await page.fill('#password', 'pass');
|
|
1455
|
-
// await page.click('#submit');
|
|
1456
|
-
// // Wait for navigation to complete
|
|
1457
|
-
// await page.waitForURL('/dashboard', { timeout: 5000 });
|
|
1458
|
-
// expect(page.url()).toBe('/dashboard');
|
|
1459
|
-
// });
|
|
1460
|
-
|
|
1461
|
-
console.log('✅ Auto-stabilization patch generated');
|
|
1462
|
-
```
|
|
1463
|
-
|
|
1464
|
-
### Comprehensive Flakiness Report
|
|
1465
|
-
|
|
1466
|
-
```typescript
|
|
1467
|
-
import type {
|
|
1468
|
-
FlakyTestDetectionParams,
|
|
1469
|
-
FlakyReportConfig
|
|
1470
|
-
} from 'agentic-qe/tools/qe/shared/types';
|
|
1471
|
-
|
|
1472
|
-
// Phase 3 reporting (✅ Available)
|
|
1473
|
-
// import {
|
|
1474
|
-
// generateFlakinessReport,
|
|
1475
|
-
// exportFlakinessTrends,
|
|
1476
|
-
// visualizeFlakiness
|
|
1477
|
-
// } from 'agentic-qe/tools/qe/flaky-detection';
|
|
1478
|
-
|
|
1479
|
-
// Example: Generate comprehensive flakiness report
|
|
1480
|
-
const reportParams: FlakyTestDetectionParams = {
|
|
1481
|
-
testResults: allTestHistory,
|
|
1482
|
-
minRuns: 10,
|
|
1483
|
-
timeWindow: 90, // 90-day comprehensive report
|
|
1484
|
-
confidenceThreshold: 0.80,
|
|
1485
|
-
analysisConfig: {
|
|
1486
|
-
algorithm: 'hybrid', // Combine statistical + ML
|
|
1487
|
-
features: [
|
|
1488
|
-
'all' // Analyze all available features
|
|
1489
|
-
],
|
|
1490
|
-
autoStabilize: true
|
|
1491
|
-
},
|
|
1492
|
-
reportConfig: {
|
|
1493
|
-
includeTrends: true, // Show historical trends
|
|
1494
|
-
includeSuggestions: true, // Include fix suggestions
|
|
1495
|
-
format: 'html' // Interactive HTML report
|
|
1496
|
-
}
|
|
1497
|
-
};
|
|
1498
|
-
|
|
1499
|
-
// const report = await generateFlakinessReport(reportParams);
|
|
1500
|
-
//
|
|
1501
|
-
// console.log('Flakiness Report Generated:');
|
|
1502
|
-
// console.log(` Total tests analyzed: ${report.totalTests}`);
|
|
1503
|
-
// console.log(` Flaky tests found: ${report.flakyTests.length}`);
|
|
1504
|
-
// console.log(` Stability score: ${report.overallStability.toFixed(2)}%`);
|
|
1505
|
-
// console.log(` Report saved to: ${report.reportPath}`);
|
|
1506
|
-
//
|
|
1507
|
-
// console.log('\nTop 5 Flakiest Tests:');
|
|
1508
|
-
// report.flakyTests.slice(0, 5).forEach((test, idx) => {
|
|
1509
|
-
// console.log(` ${idx + 1}. ${test.testName} (score: ${test.flakinessScore})`);
|
|
1510
|
-
// console.log(` Root cause: ${test.rootCause}`);
|
|
1511
|
-
// console.log(` Fix suggestion: ${test.stabilizationSuggestion}`);
|
|
1512
|
-
// });
|
|
1513
|
-
|
|
1514
|
-
console.log('✅ Comprehensive flakiness report complete');
|
|
1515
|
-
```
|
|
1516
|
-
|
|
1517
|
-
### Phase 3 Tool Discovery
|
|
1518
|
-
|
|
1519
|
-
```bash
|
|
1520
|
-
# Once Phase 3 is implemented, tools will be at:
|
|
1521
|
-
# /workspaces/agentic-qe-cf/src/mcp/tools/qe/flaky-detection/
|
|
1522
|
-
|
|
1523
|
-
# List available flaky detection tools (Phase 3)
|
|
1524
|
-
ls node_modules/agentic-qe/dist/mcp/tools/qe/flaky-detection/
|
|
1525
|
-
|
|
1526
|
-
# Check type definitions
|
|
1527
|
-
cat node_modules/agentic-qe/dist/mcp/tools/qe/shared/types.d.ts | grep -A 20 "FlakyTest"
|
|
1528
|
-
|
|
1529
|
-
# View available ML models
|
|
1530
|
-
node -e "import('agentic-qe/tools/qe/flaky-detection').then(m => console.log(m.availableModels()))"
|
|
1531
|
-
```
|
|
1532
|
-
|
|
1533
|
-
### Using Flaky Detection Tools via MCP (Phase 3)
|
|
1534
|
-
|
|
1535
|
-
```typescript
|
|
1536
|
-
// Phase 3 MCP integration (✅ Available)
|
|
1537
|
-
// Domain-specific tools are registered as MCP tools:
|
|
1538
|
-
|
|
1539
|
-
// Via MCP client
|
|
1540
|
-
// const result = await mcpClient.callTool('qe_flaky_detect_ml', {
|
|
1541
|
-
// testResults: testHistory,
|
|
1542
|
-
// algorithm: 'ml',
|
|
1543
|
-
// confidenceThreshold: 0.85
|
|
1544
|
-
// });
|
|
1545
|
-
|
|
1546
|
-
// Via CLI
|
|
1547
|
-
// aqe flaky detect --algorithm ml --confidence 0.85
|
|
1548
|
-
// aqe flaky analyze --test-id "test-login" --min-runs 20
|
|
1549
|
-
// aqe flaky stabilize --test-id "test-login" --strategy wait
|
|
1550
|
-
// aqe flaky report --format html --days 90
|
|
1551
|
-
```
|
|
1552
|
-
|
|
101
|
+
Reward criteria:
|
|
102
|
+
- 1.0: Perfect (100% accuracy, 0 false positives, <5s analysis)
|
|
103
|
+
- 0.9: Excellent (98%+ accuracy, <2% false positives)
|
|
104
|
+
- 0.7: Good (95%+ accuracy, <5% false positives)
|
|
105
|
+
- 0.5: Acceptable (90%+ accuracy, completed)
|
|
106
|
+
</learning_protocol>
|
|
107
|
+
|
|
108
|
+
<output_format>
|
|
109
|
+
- JSON for flakiness reports with scores, patterns, and root causes
|
|
110
|
+
- Markdown for remediation guides and trend analysis
|
|
111
|
+
- Gherkin for stabilized test scenarios
|
|
112
|
+
</output_format>
|
|
113
|
+
|
|
114
|
+
<examples>
|
|
115
|
+
Example 1: Detection and stabilization
|
|
116
|
+
```
|
|
117
|
+
Input: Analyze 156 test runs over 30 days
|
|
118
|
+
Output:
|
|
119
|
+
- Detected 13 flaky tests (8.3% of suite)
|
|
120
|
+
- Root causes: 7 race conditions, 4 timeouts, 2 network issues
|
|
121
|
+
- Auto-stabilized: 8/13 tests
|
|
122
|
+
- New reliability: 98.6% (up from 87.3%)
|
|
123
|
+
- Analysis time: 12.1s
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
Example 2: Quarantine workflow
|
|
127
|
+
```
|
|
128
|
+
Input: Test "checkout.integration.test.ts" has 42% failure rate
|
|
129
|
+
Output:
|
|
130
|
+
- Quarantined with skip annotation
|
|
131
|
+
- Created JIRA issue QE-1234
|
|
132
|
+
- Assigned to backend-team
|
|
133
|
+
- ETA for fix: 7 days
|
|
134
|
+
- Automated review scheduled
|
|
135
|
+
```
|
|
136
|
+
</examples>
|
|
137
|
+
|
|
138
|
+
<skills_available>
|
|
139
|
+
Core:
|
|
140
|
+
- agentic-quality-engineering
|
|
141
|
+
- exploratory-testing-advanced
|
|
142
|
+
|
|
143
|
+
Advanced:
|
|
144
|
+
- mutation-testing
|
|
145
|
+
- test-reporting-analytics
|
|
146
|
+
|
|
147
|
+
Use: `aqe skills show mutation-testing` or `Skill("mutation-testing")`
|
|
148
|
+
</skills_available>
|
|
149
|
+
|
|
150
|
+
<coordination_notes>
|
|
151
|
+
Native AQE hooks provide 100-500x faster coordination than external tools.
|
|
152
|
+
Event-driven updates via EventBus for real-time fleet collaboration.
|
|
153
|
+
Zero external dependencies - TypeScript-native integration.
|
|
154
|
+
</coordination_notes>
|
|
155
|
+
</qe_agent_definition>
|