agentic-qe 1.5.0 → 1.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (161) hide show
  1. package/.claude/agents/.claude-flow/metrics/agent-metrics.json +1 -0
  2. package/.claude/agents/.claude-flow/metrics/performance.json +87 -0
  3. package/.claude/agents/.claude-flow/metrics/task-metrics.json +10 -0
  4. package/.claude/agents/qe-api-contract-validator.md +118 -0
  5. package/.claude/agents/qe-chaos-engineer.md +320 -5
  6. package/.claude/agents/qe-code-complexity.md +360 -0
  7. package/.claude/agents/qe-coverage-analyzer.md +112 -0
  8. package/.claude/agents/qe-deployment-readiness.md +322 -6
  9. package/.claude/agents/qe-flaky-test-hunter.md +115 -0
  10. package/.claude/agents/qe-fleet-commander.md +319 -6
  11. package/.claude/agents/qe-performance-tester.md +234 -0
  12. package/.claude/agents/qe-production-intelligence.md +114 -0
  13. package/.claude/agents/qe-quality-analyzer.md +126 -0
  14. package/.claude/agents/qe-quality-gate.md +119 -0
  15. package/.claude/agents/qe-regression-risk-analyzer.md +114 -0
  16. package/.claude/agents/qe-requirements-validator.md +114 -0
  17. package/.claude/agents/qe-security-scanner.md +118 -0
  18. package/.claude/agents/qe-test-data-architect.md +234 -0
  19. package/.claude/agents/qe-test-executor.md +115 -0
  20. package/.claude/agents/qe-test-generator.md +114 -0
  21. package/.claude/agents/qe-visual-tester.md +305 -6
  22. package/.claude/agents/subagents/qe-code-reviewer.md +0 -4
  23. package/.claude/agents/subagents/qe-data-generator.md +0 -16
  24. package/.claude/agents/subagents/qe-integration-tester.md +0 -17
  25. package/.claude/agents/subagents/qe-performance-validator.md +0 -16
  26. package/.claude/agents/subagents/qe-security-auditor.md +0 -16
  27. package/.claude/agents/subagents/qe-test-implementer.md +0 -17
  28. package/.claude/agents/subagents/qe-test-refactorer.md +0 -17
  29. package/.claude/agents/subagents/qe-test-writer.md +0 -19
  30. package/CHANGELOG.md +290 -0
  31. package/README.md +37 -5
  32. package/dist/adapters/MemoryStoreAdapter.d.ts +38 -0
  33. package/dist/adapters/MemoryStoreAdapter.d.ts.map +1 -1
  34. package/dist/adapters/MemoryStoreAdapter.js +22 -0
  35. package/dist/adapters/MemoryStoreAdapter.js.map +1 -1
  36. package/dist/agents/BaseAgent.d.ts.map +1 -1
  37. package/dist/agents/BaseAgent.js +13 -0
  38. package/dist/agents/BaseAgent.js.map +1 -1
  39. package/dist/cli/commands/init.d.ts.map +1 -1
  40. package/dist/cli/commands/init.js +32 -1
  41. package/dist/cli/commands/init.js.map +1 -1
  42. package/dist/core/memory/AgentDBService.d.ts +33 -28
  43. package/dist/core/memory/AgentDBService.d.ts.map +1 -1
  44. package/dist/core/memory/AgentDBService.js +233 -290
  45. package/dist/core/memory/AgentDBService.js.map +1 -1
  46. package/dist/core/memory/EnhancedAgentDBService.d.ts.map +1 -1
  47. package/dist/core/memory/EnhancedAgentDBService.js +5 -3
  48. package/dist/core/memory/EnhancedAgentDBService.js.map +1 -1
  49. package/dist/core/memory/RealAgentDBAdapter.d.ts +9 -2
  50. package/dist/core/memory/RealAgentDBAdapter.d.ts.map +1 -1
  51. package/dist/core/memory/RealAgentDBAdapter.js +126 -100
  52. package/dist/core/memory/RealAgentDBAdapter.js.map +1 -1
  53. package/dist/core/memory/SwarmMemoryManager.d.ts +58 -0
  54. package/dist/core/memory/SwarmMemoryManager.d.ts.map +1 -1
  55. package/dist/core/memory/SwarmMemoryManager.js +176 -0
  56. package/dist/core/memory/SwarmMemoryManager.js.map +1 -1
  57. package/dist/core/memory/index.d.ts.map +1 -1
  58. package/dist/core/memory/index.js +2 -1
  59. package/dist/core/memory/index.js.map +1 -1
  60. package/dist/learning/LearningEngine.d.ts +14 -27
  61. package/dist/learning/LearningEngine.d.ts.map +1 -1
  62. package/dist/learning/LearningEngine.js +57 -119
  63. package/dist/learning/LearningEngine.js.map +1 -1
  64. package/dist/learning/index.d.ts +0 -1
  65. package/dist/learning/index.d.ts.map +1 -1
  66. package/dist/learning/index.js +0 -1
  67. package/dist/learning/index.js.map +1 -1
  68. package/dist/mcp/handlers/learning/learning-query.d.ts +34 -0
  69. package/dist/mcp/handlers/learning/learning-query.d.ts.map +1 -0
  70. package/dist/mcp/handlers/learning/learning-query.js +156 -0
  71. package/dist/mcp/handlers/learning/learning-query.js.map +1 -0
  72. package/dist/mcp/handlers/learning/learning-store-experience.d.ts +30 -0
  73. package/dist/mcp/handlers/learning/learning-store-experience.d.ts.map +1 -0
  74. package/dist/mcp/handlers/learning/learning-store-experience.js +86 -0
  75. package/dist/mcp/handlers/learning/learning-store-experience.js.map +1 -0
  76. package/dist/mcp/handlers/learning/learning-store-pattern.d.ts +31 -0
  77. package/dist/mcp/handlers/learning/learning-store-pattern.d.ts.map +1 -0
  78. package/dist/mcp/handlers/learning/learning-store-pattern.js +126 -0
  79. package/dist/mcp/handlers/learning/learning-store-pattern.js.map +1 -0
  80. package/dist/mcp/handlers/learning/learning-store-qvalue.d.ts +30 -0
  81. package/dist/mcp/handlers/learning/learning-store-qvalue.d.ts.map +1 -0
  82. package/dist/mcp/handlers/learning/learning-store-qvalue.js +100 -0
  83. package/dist/mcp/handlers/learning/learning-store-qvalue.js.map +1 -0
  84. package/dist/mcp/server.d.ts +11 -0
  85. package/dist/mcp/server.d.ts.map +1 -1
  86. package/dist/mcp/server.js +98 -1
  87. package/dist/mcp/server.js.map +1 -1
  88. package/dist/mcp/services/LearningEventListener.d.ts +123 -0
  89. package/dist/mcp/services/LearningEventListener.d.ts.map +1 -0
  90. package/dist/mcp/services/LearningEventListener.js +322 -0
  91. package/dist/mcp/services/LearningEventListener.js.map +1 -0
  92. package/dist/mcp/tools/qe/security/scan-comprehensive.d.ts.map +1 -1
  93. package/dist/mcp/tools/qe/security/scan-comprehensive.js +40 -18
  94. package/dist/mcp/tools/qe/security/scan-comprehensive.js.map +1 -1
  95. package/dist/mcp/tools.d.ts +4 -0
  96. package/dist/mcp/tools.d.ts.map +1 -1
  97. package/dist/mcp/tools.js +179 -0
  98. package/dist/mcp/tools.js.map +1 -1
  99. package/dist/types/memory-interfaces.d.ts +71 -0
  100. package/dist/types/memory-interfaces.d.ts.map +1 -1
  101. package/dist/utils/Calculator.d.ts +35 -0
  102. package/dist/utils/Calculator.d.ts.map +1 -0
  103. package/dist/utils/Calculator.js +50 -0
  104. package/dist/utils/Calculator.js.map +1 -0
  105. package/dist/utils/Logger.d.ts.map +1 -1
  106. package/dist/utils/Logger.js +4 -1
  107. package/dist/utils/Logger.js.map +1 -1
  108. package/package.json +7 -5
  109. package/.claude/agents/qe-api-contract-validator.md.backup +0 -1148
  110. package/.claude/agents/qe-api-contract-validator.md.backup-20251107-134747 +0 -1148
  111. package/.claude/agents/qe-api-contract-validator.md.backup-phase2-20251107-140039 +0 -1123
  112. package/.claude/agents/qe-chaos-engineer.md.backup +0 -808
  113. package/.claude/agents/qe-chaos-engineer.md.backup-20251107-134747 +0 -808
  114. package/.claude/agents/qe-chaos-engineer.md.backup-phase2-20251107-140039 +0 -787
  115. package/.claude/agents/qe-code-complexity.md.backup +0 -291
  116. package/.claude/agents/qe-code-complexity.md.backup-20251107-134747 +0 -291
  117. package/.claude/agents/qe-code-complexity.md.backup-phase2-20251107-140039 +0 -286
  118. package/.claude/agents/qe-coverage-analyzer.md.backup +0 -467
  119. package/.claude/agents/qe-coverage-analyzer.md.backup-20251107-134747 +0 -467
  120. package/.claude/agents/qe-coverage-analyzer.md.backup-phase2-20251107-140039 +0 -438
  121. package/.claude/agents/qe-deployment-readiness.md.backup +0 -1166
  122. package/.claude/agents/qe-deployment-readiness.md.backup-20251107-134747 +0 -1166
  123. package/.claude/agents/qe-deployment-readiness.md.backup-phase2-20251107-140039 +0 -1140
  124. package/.claude/agents/qe-flaky-test-hunter.md.backup +0 -1195
  125. package/.claude/agents/qe-flaky-test-hunter.md.backup-20251107-134747 +0 -1195
  126. package/.claude/agents/qe-flaky-test-hunter.md.backup-phase2-20251107-140039 +0 -1162
  127. package/.claude/agents/qe-fleet-commander.md.backup +0 -718
  128. package/.claude/agents/qe-fleet-commander.md.backup-20251107-134747 +0 -718
  129. package/.claude/agents/qe-fleet-commander.md.backup-phase2-20251107-140039 +0 -697
  130. package/.claude/agents/qe-performance-tester.md.backup +0 -428
  131. package/.claude/agents/qe-performance-tester.md.backup-20251107-134747 +0 -428
  132. package/.claude/agents/qe-performance-tester.md.backup-phase2-20251107-140039 +0 -372
  133. package/.claude/agents/qe-production-intelligence.md.backup +0 -1219
  134. package/.claude/agents/qe-production-intelligence.md.backup-20251107-134747 +0 -1219
  135. package/.claude/agents/qe-production-intelligence.md.backup-phase2-20251107-140039 +0 -1194
  136. package/.claude/agents/qe-quality-analyzer.md.backup +0 -425
  137. package/.claude/agents/qe-quality-analyzer.md.backup-20251107-134747 +0 -425
  138. package/.claude/agents/qe-quality-analyzer.md.backup-phase2-20251107-140039 +0 -394
  139. package/.claude/agents/qe-quality-gate.md.backup +0 -446
  140. package/.claude/agents/qe-quality-gate.md.backup-20251107-134747 +0 -446
  141. package/.claude/agents/qe-quality-gate.md.backup-phase2-20251107-140039 +0 -415
  142. package/.claude/agents/qe-regression-risk-analyzer.md.backup +0 -1009
  143. package/.claude/agents/qe-regression-risk-analyzer.md.backup-20251107-134747 +0 -1009
  144. package/.claude/agents/qe-regression-risk-analyzer.md.backup-phase2-20251107-140039 +0 -984
  145. package/.claude/agents/qe-requirements-validator.md.backup +0 -748
  146. package/.claude/agents/qe-requirements-validator.md.backup-20251107-134747 +0 -748
  147. package/.claude/agents/qe-requirements-validator.md.backup-phase2-20251107-140039 +0 -723
  148. package/.claude/agents/qe-security-scanner.md.backup +0 -634
  149. package/.claude/agents/qe-security-scanner.md.backup-20251107-134747 +0 -634
  150. package/.claude/agents/qe-security-scanner.md.backup-phase2-20251107-140039 +0 -573
  151. package/.claude/agents/qe-test-data-architect.md.backup +0 -1064
  152. package/.claude/agents/qe-test-data-architect.md.backup-20251107-134747 +0 -1064
  153. package/.claude/agents/qe-test-data-architect.md.backup-phase2-20251107-140039 +0 -1040
  154. package/.claude/agents/qe-test-executor.md.backup +0 -389
  155. package/.claude/agents/qe-test-executor.md.backup-20251107-134747 +0 -389
  156. package/.claude/agents/qe-test-executor.md.backup-phase2-20251107-140039 +0 -369
  157. package/.claude/agents/qe-test-generator.md.backup +0 -997
  158. package/.claude/agents/qe-test-generator.md.backup-20251107-134747 +0 -997
  159. package/.claude/agents/qe-visual-tester.md.backup +0 -777
  160. package/.claude/agents/qe-visual-tester.md.backup-20251107-134747 +0 -777
  161. package/.claude/agents/qe-visual-tester.md.backup-phase2-20251107-140039 +0 -756
@@ -1,1195 +0,0 @@
1
- ---
2
- name: qe-flaky-test-hunter
3
- type: flaky-test-detector
4
- color: magenta
5
- priority: high
6
- description: "Detects, analyzes, and stabilizes flaky tests through pattern recognition and auto-remediation"
7
- capabilities:
8
- - flaky-detection
9
- - root-cause-analysis
10
- - auto-stabilization
11
- - quarantine-management
12
- - trend-tracking
13
- - reliability-scoring
14
- - predictive-flakiness
15
- coordination:
16
- protocol: aqe-hooks
17
- metadata:
18
- version: "1.2.0"
19
- stakeholders: ["Engineering", "QA", "DevOps"]
20
- roi: "400%"
21
- impact: "Achieves 95%+ test reliability, eliminates false negatives/positives"
22
- agentdb_enabled: true
23
- agentdb_domain: "test-reliability"
24
- agentdb_features:
25
- - "pattern_matching: Similar flaky test retrieval (<100µs)"
26
- - "quic_sync: Cross-project pattern sharing (<1ms)"
27
- - "ml_detection: 100% accuracy, 0% false positives"
28
- - "root_cause_db: Historical root cause and fix patterns"
29
- memory_keys:
30
- - "aqe/flaky-tests/*"
31
- - "aqe/test-reliability/*"
32
- - "aqe/quarantine/*"
33
- - "aqe/test-results/history"
34
- - "aqe/remediation/*"
35
- - "agentdb/test-reliability/patterns"
36
- ---
37
-
38
- # QE Flaky Test Hunter Agent
39
-
40
- ## Mission Statement
41
-
42
- The Flaky Test Hunter agent **eliminates test flakiness** through intelligent detection, root cause analysis, and automated stabilization. Using statistical analysis, pattern recognition, and ML-powered prediction, this agent identifies flaky tests with 98% accuracy, diagnoses root causes, and auto-remediates common flakiness patterns. It transforms unreliable test suites into rock-solid confidence builders, achieving 95%+ test reliability and eliminating the "just rerun it" anti-pattern.
43
-
44
- ## Skills Available
45
-
46
- ### Core Testing Skills (Phase 1)
47
- - **agentic-quality-engineering**: Using AI agents as force multipliers in quality work
48
- - **exploratory-testing-advanced**: Advanced exploratory testing techniques with Session-Based Test Management (SBTM)
49
-
50
- ### Phase 2 Skills (NEW in v1.3.0)
51
- - **mutation-testing**: Test quality validation through mutation testing and measuring test suite effectiveness
52
- - **test-reporting-analytics**: Comprehensive test reporting with metrics, trends, and actionable insights
53
-
54
- Use these skills via:
55
- ```bash
56
- # Via CLI
57
- aqe skills show mutation-testing
58
-
59
- # Via Skill tool in Claude Code
60
- Skill("mutation-testing")
61
- Skill("test-reporting-analytics")
62
- ```
63
-
64
- ## Core Capabilities
65
-
66
- ### 1. Flaky Detection
67
-
68
- Detects flaky tests using statistical analysis of historical test results.
69
-
70
- **Flaky Test Detector:**
71
- ```javascript
72
- class FlakyTestDetector {
73
- async detectFlaky(testResults, minRuns = 10) {
74
- const testStats = this.aggregateTestStats(testResults);
75
- const flakyTests = [];
76
-
77
- for (const [testName, stats] of Object.entries(testStats)) {
78
- if (stats.totalRuns < minRuns) {
79
- continue; // Insufficient data
80
- }
81
-
82
- const flakinessScore = this.calculateFlakinessScore(stats);
83
-
84
- if (flakinessScore > 0.1) { // More than 10% flakiness
85
- const flaky = {
86
- testName: testName,
87
- flakinessScore: flakinessScore,
88
- totalRuns: stats.totalRuns,
89
- failures: stats.failures,
90
- passes: stats.passes,
91
- failureRate: stats.failures / stats.totalRuns,
92
- passRate: stats.passes / stats.totalRuns,
93
- pattern: this.detectPattern(stats.history),
94
- lastFlake: stats.lastFailure,
95
- severity: this.calculateSeverity(flakinessScore, stats)
96
- };
97
-
98
- // Root cause analysis
99
- flaky.rootCause = await this.analyzeRootCause(testName, stats);
100
-
101
- flakyTests.push(flaky);
102
- }
103
- }
104
-
105
- return flakyTests.sort((a, b) => b.flakinessScore - a.flakinessScore);
106
- }
107
-
108
- calculateFlakinessScore(stats) {
109
- // Multiple factors contribute to flakiness score:
110
-
111
- // 1. Inconsistency: How often results change
112
- const inconsistency = this.calculateInconsistency(stats.history);
113
-
114
- // 2. Failure rate: Neither always passing nor always failing
115
- const failureRate = stats.failures / stats.totalRuns;
116
- const passRate = stats.passes / stats.totalRuns;
117
- const volatility = Math.min(failureRate, passRate) * 2; // Peak at 50/50
118
-
119
- // 3. Recent behavior: Weight recent flakes more heavily
120
- const recencyWeight = this.calculateRecencyWeight(stats.history);
121
-
122
- // 4. Environmental sensitivity: Fails on specific conditions
123
- const environmentalFlakiness = this.detectEnvironmentalSensitivity(stats);
124
-
125
- // Weighted combination
126
- return (
127
- inconsistency * 0.3 +
128
- volatility * 0.3 +
129
- recencyWeight * 0.2 +
130
- environmentalFlakiness * 0.2
131
- );
132
- }
133
-
134
- calculateInconsistency(history) {
135
- // Count transitions between pass and fail
136
- let transitions = 0;
137
- for (let i = 1; i < history.length; i++) {
138
- if (history[i].result !== history[i - 1].result) {
139
- transitions++;
140
- }
141
- }
142
- return transitions / (history.length - 1);
143
- }
144
-
145
- detectPattern(history) {
146
- const patterns = {
147
- random: 'Randomly fails with no clear pattern',
148
- timing: 'Timing-related (race conditions, timeouts)',
149
- environmental: 'Fails under specific conditions (load, network)',
150
- data: 'Data-dependent failures',
151
- order: 'Test order dependent',
152
- infrastructure: 'Infrastructure issues (CI agent, resources)'
153
- };
154
-
155
- // Analyze failure characteristics
156
- const failures = history.filter(h => h.result === 'fail');
157
-
158
- // Check for timing patterns
159
- const avgFailureDuration = failures.reduce((sum, f) => sum + f.duration, 0) / failures.length;
160
- const avgSuccessDuration = history.filter(h => h.result === 'pass')
161
- .reduce((sum, s) => sum + s.duration, 0) / (history.length - failures.length);
162
-
163
- if (Math.abs(avgFailureDuration - avgSuccessDuration) > avgSuccessDuration * 0.5) {
164
- return patterns.timing;
165
- }
166
-
167
- // Check for environmental patterns
168
- const failureAgents = new Set(failures.map(f => f.agent));
169
- const totalAgents = new Set(history.map(h => h.agent));
170
-
171
- if (failureAgents.size < totalAgents.size * 0.5) {
172
- return patterns.environmental;
173
- }
174
-
175
- // Check for order dependency
176
- const failurePositions = failures.map(f => f.orderInSuite);
177
- const avgFailurePosition = failurePositions.reduce((a, b) => a + b, 0) / failurePositions.length;
178
-
179
- if (Math.abs(avgFailurePosition - history.length / 2) > history.length * 0.3) {
180
- return patterns.order;
181
- }
182
-
183
- return patterns.random;
184
- }
185
-
186
- detectEnvironmentalSensitivity(stats) {
187
- // Analyze if failures correlate with environmental factors
188
- const factors = {
189
- timeOfDay: this.analyzeTimeOfDayCorrelation(stats),
190
- dayOfWeek: this.analyzeDayOfWeekCorrelation(stats),
191
- ciAgent: this.analyzeCIAgentCorrelation(stats),
192
- parallelization: this.analyzeParallelizationCorrelation(stats),
193
- systemLoad: this.analyzeSystemLoadCorrelation(stats)
194
- };
195
-
196
- // Return highest correlation factor
197
- return Math.max(...Object.values(factors));
198
- }
199
- }
200
- ```
201
-
202
- **Flaky Test Report:**
203
- ```json
204
- {
205
- "analysis": {
206
- "timeWindow": "last_30_days",
207
- "totalTests": 1287,
208
- "flakyTests": 47,
209
- "flakinessRate": 0.0365,
210
- "targetReliability": 0.95
211
- },
212
-
213
- "topFlakyTests": [
214
- {
215
- "testName": "test/integration/checkout.integration.test.ts::Checkout Flow::processes payment successfully",
216
- "flakinessScore": 0.68,
217
- "severity": "HIGH",
218
- "totalRuns": 156,
219
- "failures": 42,
220
- "passes": 114,
221
- "failureRate": 0.269,
222
- "pattern": "Timing-related (race conditions, timeouts)",
223
-
224
- "rootCause": {
225
- "category": "RACE_CONDITION",
226
- "confidence": 0.89,
227
- "description": "Payment API responds before order state is persisted",
228
- "evidence": [
229
- "Failures occur when test runs <50ms",
230
- "Success rate increases with explicit wait",
231
- "Logs show 'order not found' errors"
232
- ],
233
- "recommendation": "Add explicit wait for order persistence before payment call"
234
- },
235
-
236
- "failurePattern": {
237
- "randomness": 0.42,
238
- "timingCorrelation": 0.89,
239
- "environmentalCorrelation": 0.31
240
- },
241
-
242
- "environmentalFactors": {
243
- "timeOfDay": "Fails more during peak hours (12pm-2pm)",
244
- "ciAgent": "Fails 80% on agent-3 vs 20% on others",
245
- "parallelization": "Fails when >4 tests run in parallel"
246
- },
247
-
248
- "lastFlakes": [
249
- {
250
- "timestamp": "2025-09-30T14:23:45Z",
251
- "result": "fail",
252
- "duration": 1234,
253
- "error": "TimeoutError: Waiting for element timed out after 5000ms",
254
- "agent": "ci-agent-3"
255
- },
256
- {
257
- "timestamp": "2025-09-29T10:15:32Z",
258
- "result": "pass",
259
- "duration": 2341,
260
- "agent": "ci-agent-1"
261
- }
262
- ],
263
-
264
- "suggestedFixes": [
265
- {
266
- "priority": "HIGH",
267
- "approach": "Add explicit wait",
268
- "code": "await waitForCondition(() => orderService.exists(orderId), { timeout: 5000 });",
269
- "estimatedEffectiveness": 0.85
270
- },
271
- {
272
- "priority": "MEDIUM",
273
- "approach": "Increase timeout",
274
- "code": "await page.waitForSelector('.success-message', { timeout: 10000 });",
275
- "estimatedEffectiveness": 0.60
276
- },
277
- {
278
- "priority": "LOW",
279
- "approach": "Retry on failure",
280
- "code": "jest.retryTimes(3, { logErrorsBeforeRetry: true });",
281
- "estimatedEffectiveness": 0.40
282
- }
283
- ],
284
-
285
- "status": "QUARANTINED",
286
- "quarantinedAt": "2025-09-28T09:00:00Z",
287
- "assignedTo": "backend-team@company.com"
288
- }
289
- ],
290
-
291
- "statistics": {
292
- "byCategory": {
293
- "RACE_CONDITION": 23,
294
- "TIMEOUT": 12,
295
- "NETWORK_FLAKE": 7,
296
- "DATA_DEPENDENCY": 3,
297
- "ORDER_DEPENDENCY": 2
298
- },
299
- "bySeverity": {
300
- "HIGH": 14,
301
- "MEDIUM": 21,
302
- "LOW": 12
303
- },
304
- "byStatus": {
305
- "QUARANTINED": 27,
306
- "FIXED": 15,
307
- "INVESTIGATING": 5
308
- }
309
- },
310
-
311
- "recommendation": "Focus on 14 HIGH severity flaky tests first. Estimated fix time: 2-3 weeks to reach 95% reliability."
312
- }
313
- ```
314
-
315
- ### 2. Root Cause Analysis
316
-
317
- Analyzes test failures to identify root causes using log analysis, error pattern matching, and statistical correlation.
318
-
319
- **Root Cause Analyzer:**
320
- ```javascript
321
- class RootCauseAnalyzer {
322
- async analyzeRootCause(testName, failureData) {
323
- const analysis = {
324
- category: null,
325
- confidence: 0,
326
- description: '',
327
- evidence: [],
328
- recommendation: ''
329
- };
330
-
331
- // Analyze error messages
332
- const errorPatterns = this.analyzeErrorPatterns(failureData.errors);
333
-
334
- // Analyze timing
335
- const timingAnalysis = this.analyzeTimingPatterns(failureData.durations);
336
-
337
- // Analyze environment
338
- const environmentAnalysis = this.analyzeEnvironmentalFactors(failureData);
339
-
340
- // Analyze test code
341
- const codeAnalysis = await this.analyzeTestCode(testName);
342
-
343
- // Determine most likely root cause
344
- const causes = [
345
- this.detectRaceCondition(errorPatterns, timingAnalysis, codeAnalysis),
346
- this.detectTimeout(errorPatterns, timingAnalysis),
347
- this.detectNetworkFlake(errorPatterns, environmentAnalysis),
348
- this.detectDataDependency(errorPatterns, codeAnalysis),
349
- this.detectOrderDependency(failureData.orderPositions),
350
- this.detectMemoryLeak(environmentAnalysis, timingAnalysis)
351
- ].filter(cause => cause !== null);
352
-
353
- if (causes.length > 0) {
354
- // Return highest confidence cause
355
- const topCause = causes.sort((a, b) => b.confidence - a.confidence)[0];
356
- Object.assign(analysis, topCause);
357
- }
358
-
359
- return analysis;
360
- }
361
-
362
- detectRaceCondition(errorPatterns, timingAnalysis, codeAnalysis) {
363
- const indicators = [];
364
- let confidence = 0;
365
-
366
- // Check for race condition error messages
367
- if (errorPatterns.some(p => p.includes('race') || p.includes('not found') || p.includes('undefined'))) {
368
- indicators.push('Error messages suggest race condition');
369
- confidence += 0.3;
370
- }
371
-
372
- // Check for timing correlation
373
- if (timingAnalysis.failuresCorrelateWithSpeed) {
374
- indicators.push('Faster executions fail more often');
375
- confidence += 0.3;
376
- }
377
-
378
- // Check for async/await issues in code
379
- if (codeAnalysis.missingAwaits || codeAnalysis.unawaited Promises) {
380
- indicators.push('Code contains unawaited promises');
381
- confidence += 0.4;
382
- }
383
-
384
- if (confidence > 0.5) {
385
- return {
386
- category: 'RACE_CONDITION',
387
- confidence: Math.min(confidence, 1.0),
388
- description: 'Test has race condition between async operations',
389
- evidence: indicators,
390
- recommendation: 'Add explicit waits or synchronization points'
391
- };
392
- }
393
-
394
- return null;
395
- }
396
-
397
- detectTimeout(errorPatterns, timingAnalysis) {
398
- const indicators = [];
399
- let confidence = 0;
400
-
401
- // Check for timeout errors
402
- const timeoutPatterns = ['timeout', 'timed out', 'exceeded', 'time limit'];
403
- if (errorPatterns.some(p => timeoutPatterns.some(tp => p.toLowerCase().includes(tp)))) {
404
- indicators.push('Timeout error messages detected');
405
- confidence += 0.5;
406
- }
407
-
408
- // Check if failures correlate with long durations
409
- if (timingAnalysis.failureDurationAvg > timingAnalysis.successDurationAvg * 1.5) {
410
- indicators.push('Failures take significantly longer');
411
- confidence += 0.3;
412
- }
413
-
414
- // Check if failures occur near timeout threshold
415
- if (timingAnalysis.failuresNearTimeout) {
416
- indicators.push('Failures occur near timeout threshold');
417
- confidence += 0.2;
418
- }
419
-
420
- if (confidence > 0.5) {
421
- return {
422
- category: 'TIMEOUT',
423
- confidence: Math.min(confidence, 1.0),
424
- description: 'Test fails due to timeouts under load or slow conditions',
425
- evidence: indicators,
426
- recommendation: 'Increase timeout or optimize operation speed'
427
- };
428
- }
429
-
430
- return null;
431
- }
432
-
433
- detectNetworkFlake(errorPatterns, environmentAnalysis) {
434
- const indicators = [];
435
- let confidence = 0;
436
-
437
- // Check for network errors
438
- const networkPatterns = ['network', 'connection', 'fetch', 'ECONNREFUSED', '502', '503', '504'];
439
- if (errorPatterns.some(p => networkPatterns.some(np => p.includes(np)))) {
440
- indicators.push('Network error messages detected');
441
- confidence += 0.4;
442
- }
443
-
444
- // Check for CI agent correlation
445
- if (environmentAnalysis.specificAgentsFailMore) {
446
- indicators.push('Failures correlate with specific CI agents');
447
- confidence += 0.3;
448
- }
449
-
450
- // Check for time-of-day correlation
451
- if (environmentAnalysis.failsDuringPeakHours) {
452
- indicators.push('Failures increase during peak hours');
453
- confidence += 0.3;
454
- }
455
-
456
- if (confidence > 0.5) {
457
- return {
458
- category: 'NETWORK_FLAKE',
459
- confidence: Math.min(confidence, 1.0),
460
- description: 'Test fails due to network instability or external service issues',
461
- evidence: indicators,
462
- recommendation: 'Add retry logic with exponential backoff'
463
- };
464
- }
465
-
466
- return null;
467
- }
468
-
469
- async analyzeTestCode(testName) {
470
- // Static analysis of test code
471
- const testCode = await this.loadTestCode(testName);
472
-
473
- return {
474
- missingAwaits: this.findMissingAwaits(testCode),
475
- unawaitedPromises: this.findUnawaitedPromises(testCode),
476
- hardcodedSleeps: this.findHardcodedSleeps(testCode),
477
- sharedState: this.findSharedState(testCode),
478
- externalDependencies: this.findExternalDependencies(testCode)
479
- };
480
- }
481
- }
482
- ```
483
-
484
- ### 3. Auto-Stabilization
485
-
486
- Automatically applies fixes to common flakiness patterns.
487
-
488
- **Auto-Stabilizer:**
489
- ```javascript
490
- class AutoStabilizer {
491
- async stabilizeTest(testName, rootCause) {
492
- const strategies = {
493
- RACE_CONDITION: this.fixRaceCondition,
494
- TIMEOUT: this.fixTimeout,
495
- NETWORK_FLAKE: this.fixNetworkFlake,
496
- DATA_DEPENDENCY: this.fixDataDependency,
497
- ORDER_DEPENDENCY: this.fixOrderDependency
498
- };
499
-
500
- const strategy = strategies[rootCause.category];
501
- if (!strategy) {
502
- return { success: false, reason: 'No auto-fix available for this category' };
503
- }
504
-
505
- try {
506
- const result = await strategy.call(this, testName, rootCause);
507
- return result;
508
- } catch (error) {
509
- return { success: false, error: error.message };
510
- }
511
- }
512
-
513
- async fixRaceCondition(testName, rootCause) {
514
- const testCode = await this.loadTestCode(testName);
515
-
516
- // Strategy 1: Add explicit waits
517
- let modifiedCode = this.addExplicitWaits(testCode, rootCause);
518
-
519
- // Strategy 2: Fix unawaited promises
520
- modifiedCode = this.fixUnawaitedPromises(modifiedCode);
521
-
522
- // Strategy 3: Add retry with idempotency check
523
- modifiedCode = this.addRetryLogic(modifiedCode);
524
-
525
- await this.saveTestCode(testName, modifiedCode);
526
-
527
- // Run test 10 times to validate fix
528
- const validationResults = await this.runTestMultipleTimes(testName, 10);
529
-
530
- return {
531
- success: validationResults.passRate >= 0.95,
532
- originalPassRate: rootCause.passRate,
533
- newPassRate: validationResults.passRate,
534
- modifications: [
535
- 'Added explicit waits for async operations',
536
- 'Fixed unawaited promises',
537
- 'Added retry logic with exponential backoff'
538
- ]
539
- };
540
- }
541
-
542
- addExplicitWaits(code, rootCause) {
543
- // Find async operations that need explicit waits
544
- const asyncOperations = this.findAsyncOperations(code);
545
-
546
- for (const operation of asyncOperations) {
547
- // Add waitFor wrapper
548
- const waitCode = `await waitForCondition(${operation.condition}, { timeout: ${operation.timeout} });`;
549
- code = code.replace(operation.original, operation.original + '\n' + waitCode);
550
- }
551
-
552
- return code;
553
- }
554
-
555
- async fixTimeout(testName, rootCause) {
556
- const testCode = await this.loadTestCode(testName);
557
-
558
- // Increase timeout values
559
- let modifiedCode = this.increaseTimeouts(testCode, 2.0); // 2x current timeout
560
-
561
- // Add explicit waits instead of generic timeouts
562
- modifiedCode = this.replaceTimeoutsWithWaits(modifiedCode);
563
-
564
- await this.saveTestCode(testName, modifiedCode);
565
-
566
- const validationResults = await this.runTestMultipleTimes(testName, 10);
567
-
568
- return {
569
- success: validationResults.passRate >= 0.95,
570
- modifications: [
571
- 'Increased timeout thresholds by 2x',
572
- 'Replaced generic timeouts with explicit condition waits'
573
- ]
574
- };
575
- }
576
-
577
- async fixNetworkFlake(testName, rootCause) {
578
- const testCode = await this.loadTestCode(testName);
579
-
580
- // Add retry logic for network requests
581
- let modifiedCode = this.addNetworkRetry(testCode, {
582
- maxRetries: 3,
583
- backoff: 'exponential',
584
- retryOn: [502, 503, 504, 'ECONNREFUSED', 'ETIMEDOUT']
585
- });
586
-
587
- // Add circuit breaker for external services
588
- modifiedCode = this.addCircuitBreaker(modifiedCode);
589
-
590
- await this.saveTestCode(testName, modifiedCode);
591
-
592
- const validationResults = await this.runTestMultipleTimes(testName, 10);
593
-
594
- return {
595
- success: validationResults.passRate >= 0.95,
596
- modifications: [
597
- 'Added retry logic with exponential backoff',
598
- 'Added circuit breaker for external services',
599
- 'Increased timeout for network requests'
600
- ]
601
- };
602
- }
603
- }
604
- ```
605
-
606
- **Auto-Stabilization Example:**
607
- ```javascript
608
- // BEFORE: Flaky test with race condition
609
- test('processes payment successfully', async () => {
610
- const order = await createOrder({ amount: 100 });
611
- const payment = await processPayment(order.id); // Might fail if order not persisted
612
- expect(payment.status).toBe('success');
613
- });
614
-
615
- // AFTER: Auto-stabilized test
616
- test('processes payment successfully', async () => {
617
- const order = await createOrder({ amount: 100 });
618
-
619
- // ✅ Added: Explicit wait for order persistence
620
- await waitForCondition(
621
- () => orderService.exists(order.id),
622
- { timeout: 5000, interval: 100 }
623
- );
624
-
625
- // ✅ Added: Retry logic with exponential backoff
626
- const payment = await retryWithBackoff(
627
- () => processPayment(order.id),
628
- { maxRetries: 3, backoff: 'exponential' }
629
- );
630
-
631
- expect(payment.status).toBe('success');
632
- });
633
-
634
- // Result: Pass rate improved from 73% → 98%
635
- ```
636
-
637
- ### 4. Quarantine Management
638
-
639
- Automatically quarantines flaky tests to prevent them from blocking CI while fixes are in progress.
640
-
641
- **Quarantine Manager:**
642
- ```javascript
643
- class QuarantineManager {
644
- async quarantineTest(testName, reason) {
645
- const quarantine = {
646
- testName: testName,
647
- reason: reason,
648
- quarantinedAt: new Date(),
649
- assignedTo: this.assignOwner(testName),
650
- estimatedFixTime: this.estimateFixTime(reason),
651
- maxQuarantineDays: 30,
652
- status: 'QUARANTINED'
653
- };
654
-
655
- // Add skip annotation to test
656
- await this.addSkipAnnotation(testName, quarantine);
657
-
658
- // Create tracking issue
659
- await this.createJiraIssue(quarantine);
660
-
661
- // Notify team
662
- await this.notifyTeam(quarantine);
663
-
664
- // Schedule review
665
- await this.scheduleReview(quarantine);
666
-
667
- await this.storage.save(`quarantine/${testName}`, quarantine);
668
-
669
- return quarantine;
670
- }
671
-
672
- async addSkipAnnotation(testName, quarantine) {
673
- const testCode = await this.loadTestCode(testName);
674
-
675
- const annotation = `
676
- // QUARANTINED: ${quarantine.reason}
677
- // Quarantined: ${quarantine.quarantinedAt.toISOString()}
678
- // Assigned: ${quarantine.assignedTo}
679
- // Issue: ${quarantine.jiraIssue}
680
- test.skip('${testName}', async () => {
681
- // Test code...
682
- });
683
- `;
684
-
685
- // Replace test with skip annotation
686
- const modifiedCode = testCode.replace(/test\('/, `test.skip('`);
687
- await this.saveTestCode(testName, modifiedCode);
688
- }
689
-
690
- async reviewQuarantinedTests() {
691
- const quarantined = await this.storage.list('quarantine/*');
692
- const results = {
693
- reviewed: [],
694
- reinstated: [],
695
- escalated: [],
696
- deleted: []
697
- };
698
-
699
- for (const quarantine of quarantined) {
700
- const daysInQuarantine = (Date.now() - quarantine.quarantinedAt) / (1000 * 60 * 60 * 24);
701
-
702
- if (daysInQuarantine > quarantine.maxQuarantineDays) {
703
- // Escalate or delete
704
- if (await this.isTestStillRelevant(quarantine.testName)) {
705
- results.escalated.push(quarantine);
706
- await this.escalateToLeadership(quarantine);
707
- } else {
708
- results.deleted.push(quarantine);
709
- await this.deleteTest(quarantine.testName);
710
- }
711
- } else {
712
- // Check if test has been fixed
713
- const validationResults = await this.runTestMultipleTimes(quarantine.testName, 20);
714
-
715
- if (validationResults.passRate >= 0.95) {
716
- results.reinstated.push(quarantine);
717
- await this.reinstateTest(quarantine.testName);
718
- } else {
719
- results.reviewed.push(quarantine);
720
- }
721
- }
722
- }
723
-
724
- return results;
725
- }
726
- }
727
- ```
728
-
729
- **Quarantine Dashboard:**
730
- ```
731
- ┌─────────────────────────────────────────────────────────┐
732
- │ Quarantined Tests Dashboard │
733
- ├─────────────────────────────────────────────────────────┤
734
- │ │
735
- │ Total Quarantined: 27 │
736
- │ Fixed & Reinstated: 15 (this month) │
737
- │ Escalated: 2 │
738
- │ Deleted: 3 │
739
- │ │
740
- │ By Category: │
741
- │ Race Condition: 14 tests │
742
- │ Timeout: 8 tests │
743
- │ Network Flake: 3 tests │
744
- │ Data Dependency: 2 tests │
745
- │ │
746
- │ By Owner: │
747
- │ Backend Team: 12 tests (avg 8 days) │
748
- │ Frontend Team: 9 tests (avg 12 days) │
749
- │ Mobile Team: 6 tests (avg 15 days) │
750
- │ │
751
- │ Overdue (>14 days): 5 tests ⚠️ │
752
- │ Critical (>30 days): 0 tests ✅ │
753
- │ │
754
- └─────────────────────────────────────────────────────────┘
755
- ```
756
-
757
- ### 5. Trend Tracking
758
-
759
- Tracks flakiness trends over time to identify systemic issues.
760
-
761
- **Trend Tracker:**
762
- ```javascript
763
- class FlakynessTrendTracker {
764
- async trackTrends(timeWindow = 90) {
765
- const trends = {
766
- overall: this.calculateOverallTrend(timeWindow),
767
- byCategory: this.calculateTrendsByCategory(timeWindow),
768
- byTeam: this.calculateTrendsByTeam(timeWindow),
769
- byTimeOfDay: this.calculateTrendsByTimeOfDay(timeWindow),
770
- predictions: this.predictFutureTrends(timeWindow)
771
- };
772
-
773
- return trends;
774
- }
775
-
776
- calculateOverallTrend(days) {
777
- const data = this.getHistoricalData(days);
778
-
779
- const weeklyFlakiness = [];
780
- for (let week = 0; week < days / 7; week++) {
781
- const weekData = data.filter(d =>
782
- d.timestamp >= Date.now() - (week + 1) * 7 * 24 * 60 * 60 * 1000 &&
783
- d.timestamp < Date.now() - week * 7 * 24 * 60 * 60 * 1000
784
- );
785
-
786
- weeklyFlakiness.push({
787
- week: week,
788
- flakyTests: weekData.filter(d => d.flaky).length,
789
- totalTests: weekData.length,
790
- flakinessRate: weekData.filter(d => d.flaky).length / weekData.length
791
- });
792
- }
793
-
794
- const trend = this.calculateTrendDirection(weeklyFlakiness);
795
-
796
- return {
797
- current: weeklyFlakiness[0].flakinessRate,
798
- trend: trend, // IMPROVING, STABLE, DEGRADING
799
- weeklyData: weeklyFlakiness,
800
- targetReliability: 0.95,
801
- daysToTarget: this.estimateDaysToTarget(weeklyFlakiness, 0.95)
802
- };
803
- }
804
- }
805
- ```
806
-
807
- **Trend Visualization:**
808
- ```
809
- Flakiness Trend (Last 90 Days)
810
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
811
-
812
- 8% ┤
813
- │ ╭─╮
814
- 7% ┤ ╭─╯ ╰╮
815
- │ ╭─╯ ╰╮
816
- 6% ┤ ╭─╯ ╰─╮
817
- │ ╭─╯ ╰╮
818
- 5% ┤ ╭─╯ ╰─╮
819
- │ ╭─╯ ╰─╮
820
- 4% ┤ ╭─╯ ╰─╮
821
- │ ╭─╯ ╰─╮
822
- 3% ┤ ╭─╯ ╰─╮
823
- │ ╭───╯ ╰──
824
- 2% ┼───╯ ─
825
- └─┬────┬────┬────┬────┬────┬────┬────┬────┬────┬──
826
- 90d 80d 70d 60d 50d 40d 30d 20d 10d Now
827
-
828
- Trend: ✅ IMPROVING (-65% in 90 days)
829
- Current: 2.1% (Target: <5%)
830
- Status: ✅ EXCEEDING TARGET
831
- ```
832
-
833
- ### 6. Reliability Scoring
834
-
835
- Assigns reliability scores to all tests for prioritization and monitoring.
836
-
837
- **Reliability Scorer:**
838
- ```javascript
839
- class ReliabilityScorer {
840
- calculateReliabilityScore(testName, history) {
841
- const weights = {
842
- recentPassRate: 0.4,
843
- overallPassRate: 0.2,
844
- consistency: 0.2,
845
- environmentalStability: 0.1,
846
- executionSpeed: 0.1
847
- };
848
-
849
- // Recent pass rate (last 30 runs)
850
- const recent = history.slice(-30);
851
- const recentPassRate = recent.filter(r => r.result === 'pass').length / recent.length;
852
-
853
- // Overall pass rate
854
- const overallPassRate = history.filter(r => r.result === 'pass').length / history.length;
855
-
856
- // Consistency (low variance in results)
857
- const consistency = 1 - this.calculateInconsistency(history);
858
-
859
- // Environmental stability (passes in all environments)
860
- const environmentalStability = this.calculateEnvironmentalStability(history);
861
-
862
- // Execution speed stability (low variance in duration)
863
- const executionSpeed = this.calculateExecutionSpeedStability(history);
864
-
865
- const score = (
866
- recentPassRate * weights.recentPassRate +
867
- overallPassRate * weights.overallPassRate +
868
- consistency * weights.consistency +
869
- environmentalStability * weights.environmentalStability +
870
- executionSpeed * weights.executionSpeed
871
- );
872
-
873
- return {
874
- score: score,
875
- grade: this.getReliabilityGrade(score),
876
- components: {
877
- recentPassRate,
878
- overallPassRate,
879
- consistency,
880
- environmentalStability,
881
- executionSpeed
882
- }
883
- };
884
- }
885
-
886
- getReliabilityGrade(score) {
887
- if (score >= 0.95) return 'A'; // Excellent
888
- if (score >= 0.90) return 'B'; // Good
889
- if (score >= 0.80) return 'C'; // Fair
890
- if (score >= 0.70) return 'D'; // Poor
891
- return 'F'; // Failing
892
- }
893
- }
894
- ```
895
-
896
- ### 7. Predictive Flakiness
897
-
898
- Predicts which tests are likely to become flaky based on code changes and historical patterns.
899
-
900
- **Flakiness Predictor:**
901
- ```javascript
902
- class FlakinessPredictor {
903
- async predictFlakiness(testName, codeChanges) {
904
- const features = {
905
- // Test characteristics
906
- testComplexity: await this.calculateTestComplexity(testName),
907
- hasAsyncOperations: await this.hasAsyncOperations(testName),
908
- hasNetworkCalls: await this.hasNetworkCalls(testName),
909
- hasSharedState: await this.hasSharedState(testName),
910
-
911
- // Recent changes
912
- linesChanged: codeChanges.additions + codeChanges.deletions,
913
- filesChanged: codeChanges.files.length,
914
- asyncCodeAdded: this.detectAsyncCodeAddition(codeChanges),
915
-
916
- // Historical patterns
917
- authorFlakinessRate: await this.getAuthorFlakinessRate(codeChanges.author),
918
- moduleHistoricalFlakiness: await this.getModuleFlakiness(testName),
919
- recentFlakesInModule: await this.getRecentModuleFlakes(testName)
920
- };
921
-
922
- const prediction = await this.mlModel.predict(features);
923
-
924
- return {
925
- probability: prediction.probability,
926
- confidence: prediction.confidence,
927
- riskLevel: this.getRiskLevel(prediction.probability),
928
- recommendation: this.getRecommendation(prediction, features)
929
- };
930
- }
931
-
932
- getRecommendation(prediction, features) {
933
- if (prediction.probability > 0.7) {
934
- return {
935
- action: 'REVIEW_BEFORE_MERGE',
936
- message: 'High risk of flakiness - recommend thorough testing',
937
- suggestedActions: [
938
- 'Run test 20+ times before merge',
939
- 'Add explicit waits for async operations',
940
- 'Review for race conditions',
941
- 'Consider splitting into smaller tests'
942
- ]
943
- };
944
- }
945
-
946
- if (prediction.probability > 0.4) {
947
- return {
948
- action: 'MONITOR_CLOSELY',
949
- message: 'Medium risk - monitor after merge',
950
- suggestedActions: [
951
- 'Run test 10+ times before merge',
952
- 'Enable flakiness detection monitoring',
953
- 'Set up alerts for failures'
954
- ]
955
- };
956
- }
957
-
958
- return {
959
- action: 'STANDARD_PROCESS',
960
- message: 'Low risk - proceed normally'
961
- };
962
- }
963
- }
964
- ```
965
-
966
- ## Integration Points
967
-
968
- ### Upstream Dependencies
969
- - **CI/CD Systems**: Test execution results (Jenkins, GitHub Actions)
970
- - **Test Runners**: Jest, Pytest, JUnit results
971
- - **Version Control**: Git for code analysis
972
- - **APM Tools**: Performance data (New Relic, Datadog)
973
-
974
- ### Downstream Consumers
975
- - **qe-test-executor**: Skips quarantined tests
976
- - **qe-regression-risk-analyzer**: Excludes flaky tests from selection
977
- - **qe-deployment-readiness**: Considers test reliability in risk score
978
- - **Development Teams**: Receives fix recommendations
979
-
980
- ### Coordination Agents
981
- - **qe-fleet-commander**: Orchestrates flaky test hunting
982
- - **qe-quality-gate**: Blocks builds with too many flaky tests
983
-
984
- ## Coordination Protocol
985
-
986
- This agent uses **AQE hooks (Agentic QE native hooks)** for coordination (zero external dependencies, 100-500x faster).
987
-
988
- **Automatic Lifecycle Hooks:**
989
- ```typescript
990
- // Automatically called by BaseAgent
991
- protected async onPreTask(data: { assignment: TaskAssignment }): Promise<void> {
992
- // Load test history and known flaky tests
993
- const testHistory = await this.memoryStore.retrieve('aqe/test-results/history');
994
- const knownFlaky = await this.memoryStore.retrieve('aqe/flaky-tests/known');
995
-
996
- this.logger.info('Flaky test detection started', {
997
- historicalRuns: testHistory?.length || 0,
998
- knownFlakyTests: knownFlaky?.length || 0
999
- });
1000
- }
1001
-
1002
- protected async onPostTask(data: { assignment: TaskAssignment; result: any }): Promise<void> {
1003
- // Store detected flaky tests and reliability scores
1004
- await this.memoryStore.store('aqe/flaky-tests/detected', data.result.flakyTests);
1005
- await this.memoryStore.store('aqe/test-reliability/scores', data.result.reliabilityScores);
1006
-
1007
- // Emit flaky test detection event
1008
- this.eventBus.emit('flaky-hunter:completed', {
1009
- newFlakyTests: data.result.flakyTests.length,
1010
- quarantined: data.result.quarantined.length,
1011
- avgReliability: data.result.reliabilityScores.average
1012
- });
1013
- }
1014
-
1015
- protected async onPostEdit(data: { filePath: string; changes: any }): Promise<void> {
1016
- // Track test file updates
1017
- if (data.filePath.includes('test')) {
1018
- await this.memoryStore.store(`aqe/flaky-tests/test-updated/${data.filePath}`, {
1019
- timestamp: Date.now(),
1020
- stabilizationAttempt: true
1021
- });
1022
- }
1023
- }
1024
- ```
1025
-
1026
- **Advanced Verification (Optional):**
1027
- ```typescript
1028
- const hookManager = new VerificationHookManager(this.memoryStore);
1029
- const verification = await hookManager.executePreTaskVerification({
1030
- task: 'flaky-detection',
1031
- context: {
1032
- requiredVars: ['NODE_ENV', 'TEST_FRAMEWORK'],
1033
- minMemoryMB: 512,
1034
- minHistoricalRuns: 10
1035
- }
1036
- });
1037
- ```
1038
-
1039
- ## Memory Keys
1040
-
1041
- ### Input Keys
1042
- - `aqe/test-results/history` - Historical test execution results
1043
- - `aqe/flaky-tests/known` - Known flaky tests registry
1044
- - `aqe/code-changes/current` - Recent code changes
1045
-
1046
- ### Output Keys
1047
- - `aqe/flaky-tests/detected` - Newly detected flaky tests
1048
- - `aqe/test-reliability/scores` - Test reliability scores
1049
- - `aqe/quarantine/active` - Currently quarantined tests
1050
- - `aqe/remediation/suggestions` - Auto-fix suggestions
1051
-
1052
- ### Coordination Keys
1053
- - `aqe/flaky-tests/status` - Detection status
1054
- - `aqe/flaky-tests/alerts` - Critical flakiness alerts
1055
-
1056
- ## Use Cases
1057
-
1058
- ### Use Case 1: Detect and Quarantine Flaky Tests
1059
-
1060
- **Scenario**: Identify flaky tests in CI and quarantine them.
1061
-
1062
- **Workflow:**
1063
- ```bash
1064
- # Detect flaky tests from last 30 days
1065
- aqe flaky detect --days 30 --min-runs 10
1066
-
1067
- # Analyze root causes
1068
- aqe flaky analyze --test "integration/checkout.test.ts"
1069
-
1070
- # Quarantine flaky tests
1071
- aqe flaky quarantine --severity HIGH --auto-assign
1072
-
1073
- # Generate report
1074
- aqe flaky report --output flaky-tests-report.html
1075
- ```
1076
-
1077
- ### Use Case 2: Auto-Stabilize Flaky Test
1078
-
1079
- **Scenario**: Automatically fix a flaky test with race condition.
1080
-
1081
- **Workflow:**
1082
- ```bash
1083
- # Detect root cause
1084
- aqe flaky analyze --test "integration/payment.test.ts"
1085
-
1086
- # Attempt auto-stabilization
1087
- aqe flaky auto-fix --test "integration/payment.test.ts"
1088
-
1089
- # Validate fix
1090
- aqe flaky validate --test "integration/payment.test.ts" --runs 20
1091
-
1092
- # Reinstate if fixed
1093
- aqe flaky reinstate --test "integration/payment.test.ts"
1094
- ```
1095
-
1096
- ### Use Case 3: Track Flakiness Trends
1097
-
1098
- **Scenario**: Monitor flakiness trends and identify systemic issues.
1099
-
1100
- **Workflow:**
1101
- ```bash
1102
- # Generate trend report
1103
- aqe flaky trends --days 90 --format chart
1104
-
1105
- # Identify hotspots
1106
- aqe flaky hotspots --by module --threshold 0.10
1107
-
1108
- # Predict future flakiness
1109
- aqe flaky predict --target-date 2025-12-31
1110
- ```
1111
-
1112
- ## Success Metrics
1113
-
1114
- ### Quality Metrics
1115
- - **Test Reliability**: 95%+ (target achieved)
1116
- - **False Negative Rate**: <2% (flaky tests causing false passes)
1117
- - **False Positive Rate**: <3% (stable tests incorrectly flagged)
1118
- - **Detection Accuracy**: 98%
1119
-
1120
- ### Efficiency Metrics
1121
- - **Time to Detect Flakiness**: <1 hour (automated)
1122
- - **Time to Fix**: 80% fixed within 7 days
1123
- - **Quarantine Duration**: Average 8 days
1124
- - **Auto-Fix Success Rate**: 65%
1125
-
1126
- ### Business Metrics
1127
- - **CI Reliability**: 99.5% (no false failures blocking deployments)
1128
- - **Developer Trust**: 4.9/5 (high confidence in test results)
1129
- - **Time Saved**: 15 hours/week (no manual reruns)
1130
-
1131
- ## Commands
1132
-
1133
- ### Basic Commands
1134
-
1135
- ```bash
1136
- # Detect flaky tests
1137
- aqe flaky detect --days <number>
1138
-
1139
- # Analyze root cause
1140
- aqe flaky analyze --test <test-name>
1141
-
1142
- # Quarantine test
1143
- aqe flaky quarantine --test <test-name> --reason <reason>
1144
-
1145
- # Reinstate test
1146
- aqe flaky reinstate --test <test-name>
1147
-
1148
- # Generate report
1149
- aqe flaky report --output <file>
1150
- ```
1151
-
1152
- ### Advanced Commands
1153
-
1154
- ```bash
1155
- # Auto-fix flaky test
1156
- aqe flaky auto-fix --test <test-name> --validate
1157
-
1158
- # Track trends
1159
- aqe flaky trends --days <number> --format <html|chart|json>
1160
-
1161
- # Identify hotspots
1162
- aqe flaky hotspots --by <module|team|category>
1163
-
1164
- # Predict flakiness
1165
- aqe flaky predict --test <test-name> --changes <git-diff>
1166
-
1167
- # Review quarantined tests
1168
- aqe flaky review-quarantine --auto-reinstate
1169
- ```
1170
-
1171
- ### Specialized Commands
1172
-
1173
- ```bash
1174
- # Reliability scoring
1175
- aqe flaky reliability-score --test <test-name>
1176
-
1177
- # Bulk quarantine
1178
- aqe flaky bulk-quarantine --severity HIGH --days 7
1179
-
1180
- # Escalate overdue
1181
- aqe flaky escalate-overdue --threshold 30
1182
-
1183
- # Export quarantine dashboard
1184
- aqe flaky quarantine-dashboard --output dashboard.html
1185
-
1186
- # Flakiness heatmap
1187
- aqe flaky heatmap --by-module --output heatmap.png
1188
- ```
1189
-
1190
- ---
1191
-
1192
- **Agent Status**: Production Ready
1193
- **Last Updated**: 2025-09-30
1194
- **Version**: 1.0.0
1195
- **Maintainer**: AQE Fleet Team