agentic-qe 2.0.0 → 2.1.0
- package/.claude/agents/qx-partner.md +17 -4
- package/.claude/skills/accessibility-testing/SKILL.md +144 -692
- package/.claude/skills/agentic-quality-engineering/SKILL.md +176 -529
- package/.claude/skills/api-testing-patterns/SKILL.md +180 -560
- package/.claude/skills/brutal-honesty-review/SKILL.md +113 -603
- package/.claude/skills/bug-reporting-excellence/SKILL.md +116 -517
- package/.claude/skills/chaos-engineering-resilience/SKILL.md +127 -72
- package/.claude/skills/cicd-pipeline-qe-orchestrator/SKILL.md +209 -404
- package/.claude/skills/code-review-quality/SKILL.md +158 -608
- package/.claude/skills/compatibility-testing/SKILL.md +148 -38
- package/.claude/skills/compliance-testing/SKILL.md +132 -63
- package/.claude/skills/consultancy-practices/SKILL.md +114 -446
- package/.claude/skills/context-driven-testing/SKILL.md +117 -381
- package/.claude/skills/contract-testing/SKILL.md +176 -141
- package/.claude/skills/database-testing/SKILL.md +137 -130
- package/.claude/skills/exploratory-testing-advanced/SKILL.md +160 -629
- package/.claude/skills/holistic-testing-pact/SKILL.md +140 -188
- package/.claude/skills/localization-testing/SKILL.md +145 -33
- package/.claude/skills/mobile-testing/SKILL.md +132 -448
- package/.claude/skills/mutation-testing/SKILL.md +147 -41
- package/.claude/skills/performance-testing/SKILL.md +200 -546
- package/.claude/skills/quality-metrics/SKILL.md +164 -519
- package/.claude/skills/refactoring-patterns/SKILL.md +132 -699
- package/.claude/skills/regression-testing/SKILL.md +120 -926
- package/.claude/skills/risk-based-testing/SKILL.md +157 -660
- package/.claude/skills/security-testing/SKILL.md +199 -538
- package/.claude/skills/sherlock-review/SKILL.md +163 -699
- package/.claude/skills/shift-left-testing/SKILL.md +161 -465
- package/.claude/skills/shift-right-testing/SKILL.md +161 -519
- package/.claude/skills/six-thinking-hats/SKILL.md +175 -1110
- package/.claude/skills/skills-manifest.json +71 -20
- package/.claude/skills/tdd-london-chicago/SKILL.md +131 -448
- package/.claude/skills/technical-writing/SKILL.md +103 -154
- package/.claude/skills/test-automation-strategy/SKILL.md +166 -772
- package/.claude/skills/test-data-management/SKILL.md +126 -910
- package/.claude/skills/test-design-techniques/SKILL.md +179 -89
- package/.claude/skills/test-environment-management/SKILL.md +136 -91
- package/.claude/skills/test-reporting-analytics/SKILL.md +169 -92
- package/.claude/skills/testability-scoring/SKILL.md +172 -538
- package/.claude/skills/testability-scoring/scripts/generate-html-report.js +0 -0
- package/.claude/skills/visual-testing-advanced/SKILL.md +155 -78
- package/.claude/skills/xp-practices/SKILL.md +151 -587
- package/CHANGELOG.md +48 -0
- package/README.md +23 -16
- package/dist/agents/QXPartnerAgent.d.ts +8 -1
- package/dist/agents/QXPartnerAgent.d.ts.map +1 -1
- package/dist/agents/QXPartnerAgent.js +1174 -112
- package/dist/agents/QXPartnerAgent.js.map +1 -1
- package/dist/agents/lifecycle/AgentLifecycleManager.d.ts.map +1 -1
- package/dist/agents/lifecycle/AgentLifecycleManager.js +34 -31
- package/dist/agents/lifecycle/AgentLifecycleManager.js.map +1 -1
- package/dist/cli/commands/init-claude-md-template.d.ts.map +1 -1
- package/dist/cli/commands/init-claude-md-template.js +14 -0
- package/dist/cli/commands/init-claude-md-template.js.map +1 -1
- package/dist/core/SwarmCoordinator.d.ts +180 -0
- package/dist/core/SwarmCoordinator.d.ts.map +1 -0
- package/dist/core/SwarmCoordinator.js +473 -0
- package/dist/core/SwarmCoordinator.js.map +1 -0
- package/dist/core/metrics/MetricsAggregator.d.ts +228 -0
- package/dist/core/metrics/MetricsAggregator.d.ts.map +1 -0
- package/dist/core/metrics/MetricsAggregator.js +482 -0
- package/dist/core/metrics/MetricsAggregator.js.map +1 -0
- package/dist/core/metrics/index.d.ts +5 -0
- package/dist/core/metrics/index.d.ts.map +1 -0
- package/dist/core/metrics/index.js +11 -0
- package/dist/core/metrics/index.js.map +1 -0
- package/dist/core/optimization/SwarmOptimizer.d.ts +5 -0
- package/dist/core/optimization/SwarmOptimizer.d.ts.map +1 -1
- package/dist/core/optimization/SwarmOptimizer.js +17 -0
- package/dist/core/optimization/SwarmOptimizer.js.map +1 -1
- package/dist/core/orchestration/AdaptiveScheduler.d.ts +190 -0
- package/dist/core/orchestration/AdaptiveScheduler.d.ts.map +1 -0
- package/dist/core/orchestration/AdaptiveScheduler.js +460 -0
- package/dist/core/orchestration/AdaptiveScheduler.js.map +1 -0
- package/dist/core/orchestration/WorkflowOrchestrator.d.ts +13 -0
- package/dist/core/orchestration/WorkflowOrchestrator.d.ts.map +1 -1
- package/dist/core/orchestration/WorkflowOrchestrator.js +32 -0
- package/dist/core/orchestration/WorkflowOrchestrator.js.map +1 -1
- package/dist/core/recovery/CircuitBreaker.d.ts +176 -0
- package/dist/core/recovery/CircuitBreaker.d.ts.map +1 -0
- package/dist/core/recovery/CircuitBreaker.js +382 -0
- package/dist/core/recovery/CircuitBreaker.js.map +1 -0
- package/dist/core/recovery/RecoveryOrchestrator.d.ts +186 -0
- package/dist/core/recovery/RecoveryOrchestrator.d.ts.map +1 -0
- package/dist/core/recovery/RecoveryOrchestrator.js +476 -0
- package/dist/core/recovery/RecoveryOrchestrator.js.map +1 -0
- package/dist/core/recovery/RetryStrategy.d.ts +127 -0
- package/dist/core/recovery/RetryStrategy.d.ts.map +1 -0
- package/dist/core/recovery/RetryStrategy.js +314 -0
- package/dist/core/recovery/RetryStrategy.js.map +1 -0
- package/dist/core/recovery/index.d.ts +8 -0
- package/dist/core/recovery/index.d.ts.map +1 -0
- package/dist/core/recovery/index.js +27 -0
- package/dist/core/recovery/index.js.map +1 -0
- package/dist/core/skills/DependencyResolver.d.ts +99 -0
- package/dist/core/skills/DependencyResolver.d.ts.map +1 -0
- package/dist/core/skills/DependencyResolver.js +260 -0
- package/dist/core/skills/DependencyResolver.js.map +1 -0
- package/dist/core/skills/ManifestGenerator.d.ts +114 -0
- package/dist/core/skills/ManifestGenerator.d.ts.map +1 -0
- package/dist/core/skills/ManifestGenerator.js +449 -0
- package/dist/core/skills/ManifestGenerator.js.map +1 -0
- package/dist/core/skills/index.d.ts +9 -0
- package/dist/core/skills/index.d.ts.map +1 -0
- package/dist/core/skills/index.js +24 -0
- package/dist/core/skills/index.js.map +1 -0
- package/dist/mcp/server.d.ts +9 -9
- package/dist/mcp/server.d.ts.map +1 -1
- package/dist/mcp/server.js +1 -2
- package/dist/mcp/server.js.map +1 -1
- package/dist/types/qx.d.ts +39 -7
- package/dist/types/qx.d.ts.map +1 -1
- package/dist/types/qx.js.map +1 -1
- package/dist/visualization/api/RestEndpoints.js +1 -1
- package/dist/visualization/api/RestEndpoints.js.map +1 -1
- package/package.json +13 -55
|
@@ -1,585 +1,227 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: shift-right-testing
|
|
3
|
-
description: Testing in production with feature flags, canary deployments, synthetic monitoring, and chaos engineering. Use when
|
|
3
|
+
description: "Testing in production with feature flags, canary deployments, synthetic monitoring, and chaos engineering. Use when implementing production observability or progressive delivery."
|
|
4
|
+
category: testing-methodologies
|
|
5
|
+
priority: high
|
|
6
|
+
tokenEstimate: 1000
|
|
7
|
+
agents: [qe-production-intelligence, qe-chaos-engineer, qe-performance-tester, qe-quality-analyzer]
|
|
8
|
+
implementation_status: optimized
|
|
9
|
+
optimization_version: 1.0
|
|
10
|
+
last_optimized: 2025-12-02
|
|
11
|
+
dependencies: []
|
|
12
|
+
quick_reference_card: true
|
|
13
|
+
tags: [shift-right, production-testing, canary, feature-flags, chaos-engineering, monitoring]
|
|
4
14
|
---
|
|
5
15
|
|
|
6
16
|
# Shift-Right Testing
|
|
7
17
|
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
**Shift-Right:**
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
- Real user
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
-
|
|
25
|
-
-
|
|
26
|
-
-
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
-
|
|
33
|
-
-
|
|
34
|
-
-
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
|
|
18
|
+
<default_to_action>
|
|
19
|
+
When testing in production or implementing progressive delivery:
|
|
20
|
+
1. IMPLEMENT feature flags for progressive rollout (1% → 10% → 50% → 100%)
|
|
21
|
+
2. DEPLOY with canary releases (compare metrics before full rollout)
|
|
22
|
+
3. MONITOR with synthetic tests (proactive) + RUM (reactive)
|
|
23
|
+
4. INJECT failures with chaos engineering (build resilience)
|
|
24
|
+
5. ANALYZE production data to improve pre-production testing
|
|
25
|
+
|
|
26
|
+
**Quick Shift-Right Techniques:**
|
|
27
|
+
- Feature flags → Control who sees what, instant rollback
|
|
28
|
+
- Canary deployment → 5% traffic, compare error rates
|
|
29
|
+
- Synthetic monitoring → Simulate users 24/7, catch issues before users do
|
|
30
|
+
- Chaos engineering → Netflix-style failure injection
|
|
31
|
+
- RUM (Real User Monitoring) → Actual user experience data
|
|
32
|
+
|
|
33
|
+
**Critical Success Factors:**
|
|
34
|
+
- Production is the ultimate test environment
|
|
35
|
+
- Ship fast with safety nets, not slow with certainty
|
|
36
|
+
- Use production data to improve shift-left testing
|
|
37
|
+
</default_to_action>
|
|
38
|
+
|
|
39
|
+
## Quick Reference Card
|
|
40
|
+
|
|
41
|
+
### When to Use
|
|
42
|
+
- Progressive feature rollouts
|
|
43
|
+
- Production reliability validation
|
|
44
|
+
- Performance monitoring at scale
|
|
45
|
+
- Learning from real user behavior
|
|
46
|
+
|
|
47
|
+
### Shift-Right Techniques
|
|
48
|
+
| Technique | Purpose | When |
|
|
49
|
+
|-----------|---------|------|
|
|
50
|
+
| Feature Flags | Controlled rollout | Every feature |
|
|
51
|
+
| Canary | Compare new vs old | Every deployment |
|
|
52
|
+
| Synthetic Monitoring | Proactive detection | 24/7 |
|
|
53
|
+
| RUM | Real user metrics | Always on |
|
|
54
|
+
| Chaos Engineering | Resilience validation | Regularly |
|
|
55
|
+
| A/B Testing | User behavior validation | Feature decisions |
|
|
56
|
+
|
|
57
|
+
### Progressive Rollout Pattern
|
|
58
|
+
```
|
|
59
|
+
1% → 10% → 25% → 50% → 100%
|
|
60
|
+
↓ ↓ ↓ ↓
|
|
61
|
+
Check Check Check Monitor
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
### Key Metrics to Monitor
|
|
65
|
+
| Metric | SLO Target | Alert Threshold |
|
|
66
|
+
|--------|------------|-----------------|
|
|
67
|
+
| Error rate | < 0.1% | > 1% |
|
|
68
|
+
| p95 latency | < 200ms | > 500ms |
|
|
69
|
+
| Availability | 99.9% | < 99.5% |
|
|
70
|
+
| Apdex | > 0.95 | < 0.8 |
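The thresholds in this table can be turned into a simple three-state check. The sketch below is illustrative only: the metric keys (`errorRatePct`, `p95LatencyMs`, `availabilityPct`, `apdex`), the `classifyMetric` helper, and the intermediate `warn` state are assumptions, not part of the skill or of any monitoring product. Values exactly on an SLO boundary are treated as meeting it.

```javascript
// Thresholds copied from the "Key Metrics to Monitor" table.
// The key names and the higherIsBetter flag are hypothetical.
const THRESHOLDS = {
  errorRatePct:    { slo: 0.1,  alert: 1,    higherIsBetter: false },
  p95LatencyMs:    { slo: 200,  alert: 500,  higherIsBetter: false },
  availabilityPct: { slo: 99.9, alert: 99.5, higherIsBetter: true },
  apdex:           { slo: 0.95, alert: 0.8,  higherIsBetter: true },
};

// Returns 'ok' when the SLO is met, 'alert' when the alert threshold
// is crossed, and 'warn' for anything in between.
function classifyMetric(name, value) {
  const t = THRESHOLDS[name];
  if (!t) throw new Error(`Unknown metric: ${name}`);
  if (t.higherIsBetter) {
    if (value < t.alert) return 'alert';
    return value >= t.slo ? 'ok' : 'warn';
  }
  if (value > t.alert) return 'alert';
  return value <= t.slo ? 'ok' : 'warn';
}
```

For example, `classifyMetric('p95LatencyMs', 350)` falls between the SLO target and the alert threshold, so it is reported as `warn` rather than paging anyone.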
|
|
42
71
|
|
|
43
72
|
---
|
|
44
73
|
|
|
45
|
-
##
|
|
46
|
-
|
|
47
|
-
### 1. Feature Flags (Progressive Rollout)
|
|
48
|
-
|
|
49
|
-
**Concept:** Deploy code to production but control who sees it.
|
|
74
|
+
## Feature Flags
|
|
50
75
|
|
|
51
76
|
```javascript
|
|
52
|
-
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
}
|
|
58
|
-
return <OldCheckout />; // Existing code (fallback)
|
|
59
|
-
}
|
|
60
|
-
```
|
|
61
|
-
|
|
62
|
-
**Rollout Strategy:**
|
|
63
|
-
```
|
|
64
|
-
1% → Monitor metrics for 1 hour
|
|
65
|
-
↓ (if healthy)
|
|
66
|
-
10% → A/B test performance vs old version
|
|
67
|
-
↓ (if successful)
|
|
68
|
-
50% → Validate at scale, monitor errors
|
|
69
|
-
↓ (if stable)
|
|
70
|
-
100% → Full rollout complete
|
|
71
|
-
```
|
|
72
|
-
|
|
73
|
-
**Benefits:**
|
|
74
|
-
- Test in production safely
|
|
75
|
-
- Instant rollback (disable flag)
|
|
76
|
-
- A/B testing built-in
|
|
77
|
-
- Gradual risk exposure
|
|
78
|
-
- Dark launches (test without users seeing)
|
|
79
|
-
|
|
80
|
-
**Implementation with LaunchDarkly:**
|
|
81
|
-
```javascript
|
|
82
|
-
import * as ld from 'launchdarkly-node-server-sdk';
|
|
83
|
-
|
|
84
|
-
const client = ld.init(process.env.LD_SDK_KEY);
|
|
85
|
-
|
|
86
|
-
// Check if feature enabled for user
|
|
87
|
-
const showNewFeature = await client.variation(
|
|
88
|
-
'new-checkout-flow',
|
|
89
|
-
{ key: user.id, email: user.email },
|
|
90
|
-
false // default value
|
|
91
|
-
);
|
|
77
|
+
// Progressive rollout with LaunchDarkly/Unleash pattern
|
|
78
|
+
const newCheckout = featureFlags.isEnabled('new-checkout', {
|
|
79
|
+
userId: user.id,
|
|
80
|
+
percentage: 10, // 10% of users
|
|
81
|
+
allowlist: ['beta-testers']
|
|
82
|
+
});
|
|
92
83
|
|
|
93
|
-
if (
|
|
94
|
-
|
|
84
|
+
if (newCheckout) {
|
|
85
|
+
return <NewCheckoutFlow />;
|
|
95
86
|
} else {
|
|
96
|
-
|
|
87
|
+
return <LegacyCheckoutFlow />;
|
|
97
88
|
}
|
|
98
|
-
```
|
|
99
89
|
|
|
100
|
-
|
|
101
|
-
|
|
102
|
-
feature: new-checkout-flow
|
|
103
|
-
variations:
|
|
104
|
-
- on: true
|
|
105
|
-
- off: false
|
|
106
|
-
targeting:
|
|
107
|
-
- rule: Internal employees
|
|
108
|
-
serve: on
|
|
109
|
-
match: email ends with "@company.com"
|
|
110
|
-
|
|
111
|
-
- rule: Beta testers
|
|
112
|
-
serve: on
|
|
113
|
-
match: user in segment "beta-users"
|
|
114
|
-
|
|
115
|
-
- rule: Percentage rollout
|
|
116
|
-
serve: on
|
|
117
|
-
match: 10% of users (by user ID hash)
|
|
118
|
-
|
|
119
|
-
default: off
|
|
90
|
+
// Instant rollback on issues
|
|
91
|
+
await featureFlags.disable('new-checkout');
|
|
120
92
|
```
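The snippet above calls a generic `featureFlags.isEnabled` wrapper and does not show how the 10% of users is chosen. One common approach is deterministic bucketing: hash the flag key and user ID into a bucket from 0 to 99 and enable the flag when the bucket is below the rollout percentage. The sketch below assumes that approach; `bucketFor`, `isEnabledFor`, and the `user.groups` field are hypothetical names, and the hash is an FNV-1a-style function over UTF-16 code units, not what LaunchDarkly or Unleash actually use.

```javascript
// Map (flagKey, userId) to a stable bucket in [0, 99].
function bucketFor(flagKey, userId) {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  const input = `${flagKey}:${userId}`;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash % 100;
}

// Allowlisted groups always see the feature; everyone else is
// included only if their bucket falls below the rollout percentage.
function isEnabledFor(flagKey, user, { percentage, allowlist = [] }) {
  const groups = user.groups || [];
  if (groups.some((group) => allowlist.includes(group))) return true;
  return bucketFor(flagKey, user.id) < percentage;
}
```

Because the bucket depends only on the flag key and user ID, a user who sees the feature at 10% keeps seeing it as the rollout widens to 50% and 100%, and setting the percentage to 0 acts as the rollback for everyone outside the allowlist.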
|
|
121
93
|
|
|
122
94
|
---
|
|
123
95
|
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
**Concept:** Deploy new version to small percentage of infrastructure, monitor, then gradually increase.
|
|
127
|
-
|
|
128
|
-
**Manual Canary with Kubernetes:**
|
|
129
|
-
```bash
|
|
130
|
-
# Deploy new version to 5% of pods
|
|
131
|
-
kubectl set image deployment/api api=v2.0 --record
|
|
96
|
+
## Canary Deployment
|
|
132
97
|
|
|
133
|
-
# Monitor for 10 minutes
|
|
134
|
-
./monitor-metrics.sh --deployment=api --duration=10m \
|
|
135
|
-
--metrics="error_rate,latency_p95,cpu_usage"
|
|
136
|
-
|
|
137
|
-
# If healthy, scale up gradually
|
|
138
|
-
kubectl scale deployment/api-v2 --replicas=20 # 10%
|
|
139
|
-
./monitor-metrics.sh --duration=10m
|
|
140
|
-
|
|
141
|
-
kubectl scale deployment/api-v2 --replicas=100 # 50%
|
|
142
|
-
./monitor-metrics.sh --duration=10m
|
|
143
|
-
|
|
144
|
-
kubectl scale deployment/api-v2 --replicas=200 # 100%
|
|
145
|
-
kubectl scale deployment/api-v1 --replicas=0 # Remove old
|
|
146
|
-
```
|
|
147
|
-
|
|
148
|
-
**Automated Canary with Flagger:**
|
|
149
98
|
```yaml
|
|
99
|
+
# Flagger canary config
|
|
150
100
|
apiVersion: flagger.app/v1beta1
|
|
151
101
|
kind: Canary
|
|
152
|
-
metadata:
|
|
153
|
-
name: api-canary
|
|
154
102
|
spec:
|
|
155
103
|
targetRef:
|
|
156
104
|
apiVersion: apps/v1
|
|
157
105
|
kind: Deployment
|
|
158
|
-
name:
|
|
159
|
-
|
|
160
|
-
# Canary analysis configuration
|
|
106
|
+
name: checkout-service
|
|
107
|
+
progressDeadlineSeconds: 60
|
|
161
108
|
analysis:
|
|
162
|
-
interval: 1m
|
|
163
|
-
threshold:
|
|
164
|
-
maxWeight: 50
|
|
165
|
-
stepWeight: 10
|
|
166
|
-
|
|
167
|
-
# Success metrics (must pass)
|
|
109
|
+
interval: 1m
|
|
110
|
+
threshold: 5 # Max failed checks
|
|
111
|
+
maxWeight: 50 # Max traffic to canary
|
|
112
|
+
stepWeight: 10 # Increment per interval
|
|
168
113
|
metrics:
|
|
169
114
|
- name: request-success-rate
|
|
170
|
-
|
|
171
|
-
|
|
172
|
-
|
|
173
|
-
- name: request-duration-p95
|
|
174
|
-
thresholdRange:
|
|
175
|
-
max: 500 # p95 latency < 500ms
|
|
176
|
-
|
|
177
|
-
- name: error-rate
|
|
178
|
-
thresholdRange:
|
|
179
|
-
max: 1 # < 1% errors
|
|
180
|
-
|
|
181
|
-
# Webhook notifications
|
|
182
|
-
webhooks:
|
|
183
|
-
- name: slack-notification
|
|
184
|
-
url: https://hooks.slack.com/services/YOUR/WEBHOOK
|
|
185
|
-
type: post-rollout
|
|
115
|
+
threshold: 99
|
|
116
|
+
- name: request-duration
|
|
117
|
+
threshold: 500
|
|
186
118
|
```
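The interplay of `threshold`, `maxWeight`, and `stepWeight` in the config above can be illustrated with a small simulation. This is a simplified model under stated assumptions, not Flagger's implementation: `simulateCanary` and the sample shape (`successRate`, `durationMs`) are hypothetical, each sample stands for one analysis interval, a failed check leaves the traffic weight unchanged, and promotion happens as soon as the weight reaches `maxWeight`.

```javascript
// Walk through one metric sample per analysis interval and decide
// whether the canary is promoted, rolled back, or still in progress.
function simulateCanary(samples, config) {
  const { threshold, maxWeight, stepWeight, minSuccessRate, maxDurationMs } = config;
  let weight = 0;        // percent of traffic routed to the canary
  let failedChecks = 0;  // failed metric checks so far

  for (const sample of samples) {
    const healthy =
      sample.successRate >= minSuccessRate && sample.durationMs <= maxDurationMs;

    if (!healthy) {
      failedChecks += 1;
      if (failedChecks >= threshold) {
        return { outcome: 'rollback', weight: 0, failedChecks };
      }
      continue; // hold the current weight and re-check next interval
    }

    weight = Math.min(weight + stepWeight, maxWeight);
    if (weight >= maxWeight) {
      return { outcome: 'promote', weight, failedChecks };
    }
  }
  return { outcome: 'in-progress', weight, failedChecks };
}
```

With the values from the config (`threshold: 5`, `maxWeight: 50`, `stepWeight: 10`, success rate at least 99, duration at most 500), five consecutive healthy intervals carry the canary from 10% to 50% of traffic, at which point this model promotes it.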
|
|
187
119
|
|
|
188
|
-
**Automated Process:**
|
|
189
|
-
1. Deploy v2 to 10% of traffic
|
|
190
|
-
2. Monitor success rate, latency, errors
|
|
191
|
-
3. If metrics healthy → increase to 20%
|
|
192
|
-
4. Continue until 100% or failure detected
|
|
193
|
-
5. On failure → automatic rollback to v1
|
|
194
|
-
|
|
195
|
-
**Benefits:**
|
|
196
|
-
- Real production validation
|
|
197
|
-
- Gradual risk mitigation
|
|
198
|
-
- Automatic rollback on failures
|
|
199
|
-
- Minimal blast radius (5-10% impact)
|
|
200
|
-
|
|
201
120
|
---
|
|
202
121
|
|
|
203
|
-
|
|
204
|
-
|
|
205
|
-
**Concept:** Continuously run automated tests against production to detect issues before users do.
|
|
122
|
+
## Synthetic Monitoring
|
|
206
123
|
|
|
207
|
-
**Playwright Synthetic Monitor:**
|
|
208
124
|
```javascript
|
|
209
|
-
//
|
|
210
|
-
|
|
211
|
-
|
|
212
|
-
|
|
213
|
-
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
|
|
217
|
-
|
|
218
|
-
|
|
219
|
-
|
|
220
|
-
|
|
221
|
-
await page.click('[data-test=checkout]');
|
|
222
|
-
await page.fill('[data-test=email]', 'synthetic@monitor.test');
|
|
223
|
-
await page.fill('[data-test=card]', '4242424242424242'); // Test mode
|
|
224
|
-
|
|
225
|
-
// Don't actually complete purchase (test mode stops here)
|
|
226
|
-
|
|
227
|
-
const duration = Date.now() - start;
|
|
228
|
-
|
|
229
|
-
// Report success metric
|
|
230
|
-
await reportMetric('checkout-flow', {
|
|
231
|
-
success: true,
|
|
232
|
-
duration,
|
|
233
|
-
timestamp: new Date()
|
|
234
|
-
});
|
|
235
|
-
|
|
236
|
-
console.log(`✅ Checkout flow healthy (${duration}ms)`);
|
|
237
|
-
|
|
238
|
-
} catch (error) {
|
|
239
|
-
// Alert on failure
|
|
240
|
-
await reportMetric('checkout-flow', {
|
|
241
|
-
success: false,
|
|
242
|
-
error: error.message,
|
|
243
|
-
timestamp: new Date()
|
|
244
|
-
});
|
|
245
|
-
|
|
246
|
-
await alertOncall({
|
|
247
|
-
severity: 'critical',
|
|
248
|
-
message: 'Checkout flow failed in production',
|
|
249
|
-
error: error.message
|
|
250
|
-
});
|
|
251
|
-
|
|
252
|
-
console.error(`❌ Checkout flow failed: ${error.message}`);
|
|
253
|
-
} finally {
|
|
254
|
-
await browser.close();
|
|
125
|
+
// Continuous production validation
|
|
126
|
+
await Task("Synthetic Tests", {
|
|
127
|
+
endpoints: [
|
|
128
|
+
{ path: '/health', expected: 200, interval: '30s' },
|
|
129
|
+
{ path: '/api/products', expected: 200, interval: '1m' },
|
|
130
|
+
{ path: '/checkout', flow: 'full-purchase', interval: '5m' }
|
|
131
|
+
],
|
|
132
|
+
locations: ['us-east', 'eu-west', 'ap-south'],
|
|
133
|
+
alertOn: {
|
|
134
|
+
statusCode: '!= 200',
|
|
135
|
+
latency: '> 500ms',
|
|
136
|
+
contentMismatch: true
|
|
255
137
|
}
|
|
256
|
-
}
|
|
257
|
-
|
|
258
|
-
// Run every 5 minutes
|
|
259
|
-
setInterval(runCheckoutFlowMonitor, 5 * 60 * 1000);
|
|
138
|
+
}, "qe-production-intelligence");
|
|
260
139
|
```
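The `Task(...)` call above hands the configuration to an agent and does not show how a single check is evaluated. As a rough sketch of the underlying logic, the helpers below parse interval strings such as `'30s'` or `'5m'` and compare one observed response against the `alertOn` conditions. `parseIntervalMs`, `alertReasons`, and the `result` and `expectation` shapes are hypothetical names introduced here for illustration; no network request is made.

```javascript
// Convert interval strings like '500ms', '30s', or '5m' to milliseconds.
function parseIntervalMs(text) {
  const match = /^(\d+)(ms|s|m)$/.exec(text);
  if (!match) throw new Error(`Unsupported interval: ${text}`);
  const unitMs = { ms: 1, s: 1000, m: 60000 };
  return Number(match[1]) * unitMs[match[2]];
}

// Return the list of reasons a check result should raise an alert.
// An empty list means the check passed.
function alertReasons(result, expectation) {
  const reasons = [];
  if (result.status !== expectation.status) {
    reasons.push(`status ${result.status} != ${expectation.status}`);
  }
  if (result.latencyMs > expectation.maxLatencyMs) {
    reasons.push(`latency ${result.latencyMs}ms > ${expectation.maxLatencyMs}ms`);
  }
  if (expectation.bodyIncludes && !(result.body || '').includes(expectation.bodyIncludes)) {
    reasons.push('content mismatch');
  }
  return reasons;
}
```

A scheduler would call `parseIntervalMs` once per endpoint to set its timer, then pass each probe result through `alertReasons` and notify on-call only when the returned list is non-empty.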
|
|
261
140
|
|
|
262
|
-
**Datadog Synthetic Monitoring:**
|
|
263
|
-
```yaml
|
|
264
|
-
# synthetics.yaml
|
|
265
|
-
tests:
|
|
266
|
-
- name: "API Health Check"
|
|
267
|
-
type: api
|
|
268
|
-
request:
|
|
269
|
-
url: "https://api.example.com/health"
|
|
270
|
-
method: GET
|
|
271
|
-
assertions:
|
|
272
|
-
- type: statusCode
|
|
273
|
-
operator: is
|
|
274
|
-
target: 200
|
|
275
|
-
- type: responseTime
|
|
276
|
-
operator: lessThan
|
|
277
|
-
target: 500
|
|
278
|
-
locations: ["us-east-1", "eu-west-1", "ap-southeast-1"]
|
|
279
|
-
frequency: 300 # 5 minutes
|
|
280
|
-
|
|
281
|
-
- name: "Checkout Flow E2E"
|
|
282
|
-
type: browser
|
|
283
|
-
steps:
|
|
284
|
-
- type: navigateTo
|
|
285
|
-
url: "https://example.com"
|
|
286
|
-
- type: click
|
|
287
|
-
selector: "[data-test=add-to-cart]"
|
|
288
|
-
- type: click
|
|
289
|
-
selector: "[data-test=checkout]"
|
|
290
|
-
assertions:
|
|
291
|
-
- type: element
|
|
292
|
-
selector: "[data-test=checkout-success]"
|
|
293
|
-
operator: isVisible
|
|
294
|
-
frequency: 600 # 10 minutes
|
|
295
|
-
```
|
|
296
|
-
|
|
297
|
-
**Benefits:**
|
|
298
|
-
- Proactive issue detection
|
|
299
|
-
- User experience validation
|
|
300
|
-
- SLA monitoring
|
|
301
|
-
- Geographic validation (test from multiple regions)
|
|
302
|
-
|
|
303
141
|
---
|
|
304
142
|
|
|
305
|
-
|
|
306
|
-
|
|
307
|
-
**Concept:** Intentionally introduce failures in production to validate system resilience.
|
|
308
|
-
|
|
309
|
-
**Principles (Netflix Chaos Monkey):**
|
|
310
|
-
1. Define steady state (normal system behavior)
|
|
311
|
-
2. Hypothesize steady state continues during chaos
|
|
312
|
-
3. Introduce real-world failures
|
|
313
|
-
4. Try to disprove hypothesis
|
|
314
|
-
5. Minimize blast radius
|
|
315
|
-
|
|
316
|
-
**Example: Instance Failure Test**
|
|
317
|
-
```javascript
|
|
318
|
-
import { ChaosMonkey } from './chaos';
|
|
319
|
-
|
|
320
|
-
async function testInstanceResilience() {
|
|
321
|
-
// 1. Baseline: Record normal behavior
|
|
322
|
-
const baseline = await collectMetrics('api', '5m');
|
|
323
|
-
console.log(`Baseline: ${baseline.successRate}% success, ${baseline.latencyP95}ms p95`);
|
|
324
|
-
|
|
325
|
-
// 2. Hypothesis: System handles 1 instance failure gracefully
|
|
326
|
-
console.log('Hypothesis: Killing 1 instance won\'t impact users');
|
|
327
|
-
|
|
328
|
-
// 3. Introduce chaos (kill random instance)
|
|
329
|
-
await ChaosMonkey.killRandomInstance({
|
|
330
|
-
service: 'api',
|
|
331
|
-
count: 1, // Kill 1 instance
|
|
332
|
-
duration: '5m'
|
|
333
|
-
});
|
|
334
|
-
|
|
335
|
-
// 4. Measure impact
|
|
336
|
-
const chaosMetrics = await collectMetrics('api', '5m');
|
|
337
|
-
console.log(`During chaos: ${chaosMetrics.successRate}% success, ${chaosMetrics.latencyP95}ms p95`);
|
|
338
|
-
|
|
339
|
-
// 5. Verify hypothesis
|
|
340
|
-
const successRateDrop = baseline.successRate - chaosMetrics.successRate;
|
|
341
|
-
const latencyIncrease = chaosMetrics.latencyP95 - baseline.latencyP95;
|
|
342
|
-
|
|
343
|
-
if (successRateDrop < 0.1 && latencyIncrease < 50) {
|
|
344
|
-
console.log('✅ System is resilient to instance failures');
|
|
345
|
-
} else {
|
|
346
|
-
console.log('❌ System not resilient. Add redundancy!');
|
|
347
|
-
}
|
|
348
|
-
}
|
|
349
|
-
|
|
350
|
-
// Run weekly during low traffic
|
|
351
|
-
schedule.weekly('Sunday 3am', testInstanceResilience);
|
|
352
|
-
```
|
|
353
|
-
|
|
354
|
-
**Common Chaos Experiments:**
|
|
355
|
-
|
|
356
|
-
**a) Instance Failures**
|
|
357
|
-
```javascript
|
|
358
|
-
// Kill random instances (10% of fleet)
|
|
359
|
-
await ChaosMonkey.killRandomInstance({
|
|
360
|
-
service: 'api',
|
|
361
|
-
percentage: 10,
|
|
362
|
-
duration: '10m'
|
|
363
|
-
});
|
|
364
|
-
```
|
|
143
|
+
## Chaos Engineering
|
|
365
144
|
|
|
366
|
-
|
|
367
|
-
|
|
368
|
-
|
|
369
|
-
|
|
370
|
-
|
|
371
|
-
|
|
372
|
-
|
|
373
|
-
});
|
|
374
|
-
```
|
|
375
|
-
|
|
376
|
-
**c) Dependency Failures**
|
|
377
|
-
```javascript
|
|
378
|
-
// Simulate payment gateway outage
|
|
379
|
-
await ChaosMonkey.blockService({
|
|
380
|
-
service: 'payment-gateway',
|
|
381
|
-
duration: '5m'
|
|
382
|
-
});
|
|
383
|
-
|
|
384
|
-
// Verify: Graceful degradation? Retry logic working?
|
|
385
|
-
```
|
|
386
|
-
|
|
387
|
-
**d) Resource Exhaustion**
|
|
388
|
-
```javascript
|
|
389
|
-
// Stress test: High CPU load
|
|
390
|
-
await ChaosMonkey.stressCPU({
|
|
391
|
-
service: 'api',
|
|
392
|
-
percentage: 80, // 80% CPU usage
|
|
393
|
-
duration: '10m'
|
|
394
|
-
});
|
|
395
|
-
```
|
|
396
|
-
|
|
397
|
-
**Chaos Testing Tools:**
|
|
398
|
-
- Chaos Monkey (Netflix) - Random instance termination
|
|
399
|
-
- Chaos Toolkit - Programmable chaos experiments
|
|
400
|
-
- Gremlin - Chaos engineering platform
|
|
401
|
-
- Litmus Chaos - Kubernetes chaos engineering
|
|
402
|
-
|
|
403
|
-
---
|
|
404
|
-
|
|
405
|
-
### 5. A/B Testing (Hypothesis Validation)
|
|
406
|
-
|
|
407
|
-
**Concept:** Test two versions in production to determine which performs better.
|
|
408
|
-
|
|
409
|
-
```javascript
|
|
410
|
-
import { ABTest } from './ab-testing';
|
|
411
|
-
|
|
412
|
-
// Define A/B test
|
|
413
|
-
const checkoutTest = ABTest.create({
|
|
414
|
-
name: 'checkout-redesign',
|
|
415
|
-
hypothesis: 'New checkout flow increases conversion by 10%',
|
|
416
|
-
|
|
417
|
-
variants: {
|
|
418
|
-
control: {
|
|
419
|
-
weight: 50, // 50% of traffic
|
|
420
|
-
implementation: () => <OldCheckout />
|
|
421
|
-
},
|
|
422
|
-
treatment: {
|
|
423
|
-
weight: 50, // 50% of traffic
|
|
424
|
-
implementation: () => <NewCheckout />
|
|
425
|
-
}
|
|
145
|
+
```typescript
|
|
146
|
+
// Controlled failure injection
|
|
147
|
+
await Task("Chaos Experiment", {
|
|
148
|
+
hypothesis: 'System handles database latency gracefully',
|
|
149
|
+
steadyState: {
|
|
150
|
+
metric: 'error_rate',
|
|
151
|
+
expected: '< 0.1%'
|
|
426
152
|
},
|
|
427
|
-
|
|
428
|
-
|
|
429
|
-
|
|
430
|
-
|
|
153
|
+
experiment: {
|
|
154
|
+
type: 'network-latency',
|
|
155
|
+
target: 'database',
|
|
156
|
+
delay: '500ms',
|
|
157
|
+
duration: '5m'
|
|
431
158
|
},
|
|
432
|
-
|
|
433
|
-
|
|
434
|
-
|
|
435
|
-
});
|
|
436
|
-
|
|
437
|
-
// Render based on variant
|
|
438
|
-
function CheckoutPage({ user }) {
|
|
439
|
-
const variant = checkoutTest.getVariant(user.id);
|
|
440
|
-
const Checkout = variant.implementation;
|
|
441
|
-
|
|
442
|
-
// Track metrics
|
|
443
|
-
useEffect(() => {
|
|
444
|
-
checkoutTest.trackImpression(user.id, variant.name);
|
|
445
|
-
}, []);
|
|
446
|
-
|
|
447
|
-
return <Checkout onComplete={() => {
|
|
448
|
-
checkoutTest.trackConversion(user.id, variant.name);
|
|
449
|
-
}} />;
|
|
450
|
-
}
|
|
451
|
-
|
|
452
|
-
// Analyze results after sufficient data
|
|
453
|
-
async function analyzeTest() {
|
|
454
|
-
const results = await checkoutTest.analyze();
|
|
455
|
-
|
|
456
|
-
console.log(`Control conversion: ${results.control.conversionRate}%`);
|
|
457
|
-
console.log(`Treatment conversion: ${results.treatment.conversionRate}%`);
|
|
458
|
-
console.log(`Lift: ${results.lift}%`);
|
|
459
|
-
console.log(`P-value: ${results.pValue}`);
|
|
460
|
-
console.log(`Statistical significance: ${results.significant ? 'YES' : 'NO'}`);
|
|
461
|
-
|
|
462
|
-
if (results.significant && results.lift > 0) {
|
|
463
|
-
console.log('✅ Treatment wins! Rolling out to 100%');
|
|
464
|
-
await rolloutToProduction('treatment');
|
|
465
|
-
} else {
|
|
466
|
-
console.log('❌ No significant improvement. Keeping control.');
|
|
159
|
+
rollback: {
|
|
160
|
+
automatic: true,
|
|
161
|
+
trigger: 'error_rate > 5%'
|
|
467
162
|
}
|
|
468
|
-
}
|
|
163
|
+
}, "qe-chaos-engineer");
|
|
469
164
|
```
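The experiment definition above pairs a steady-state hypothesis (`error_rate < 0.1%`) with an automatic rollback trigger (`error_rate > 5%`). How the agent evaluates these is not shown; the sketch below is one plausible reading, with `evaluateChaosRun` and its option names being assumptions. It scans error-rate samples taken during the fault injection, aborts at the first sample that crosses the rollback trigger, and otherwise reports whether the steady state held throughout.

```javascript
// errorRatesPct: error-rate samples (in percent) observed while the
// fault is active, in chronological order.
function evaluateChaosRun(errorRatesPct, { steadyStateBelowPct, rollbackAbovePct }) {
  let steady = true;
  for (let i = 0; i < errorRatesPct.length; i++) {
    const rate = errorRatesPct[i];
    if (rate > rollbackAbovePct) {
      // Safety net: stop the experiment as soon as the trigger fires.
      return { verdict: 'rolled-back', atSample: i };
    }
    if (rate >= steadyStateBelowPct) steady = false;
  }
  return { verdict: steady ? 'hypothesis-held' : 'hypothesis-rejected', atSample: null };
}
```

A `hypothesis-rejected` verdict is still a useful result: the system degraded under 500ms of database latency without reaching the rollback trigger, which points at a resilience gap to fix before the next run.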
|
|
470
165
|
|
|
471
166
|
---
|
|
472
167
|
|
|
473
|
-
##
|
|
474
|
-
|
|
475
|
-
### 1. Minimize Blast Radius
|
|
476
|
-
|
|
477
|
-
**Always limit exposure:**
|
|
478
|
-
- Feature flags: 1% → 10% → 50% → 100%
|
|
479
|
-
- Canary: 5% → 10% → 25% → 50% → 100%
|
|
480
|
-
- Geographic: 1 region → 2 regions → All regions
|
|
481
|
-
|
|
482
|
-
### 2. Automate Rollback
|
|
168
|
+
## Production → Pre-Production Feedback Loop
|
|
483
169
|
|
|
484
|
-
|
|
485
|
-
|
|
486
|
-
|
|
487
|
-
|
|
488
|
-
|
|
489
|
-
|
|
490
|
-
}
|
|
491
|
-
|
|
492
|
-
|
|
493
|
-
|
|
494
|
-
|
|
495
|
-
**Key Metrics:**
|
|
496
|
-
- Success/error rates
|
|
497
|
-
- Latency (p50, p95, p99)
|
|
498
|
-
- CPU/memory usage
|
|
499
|
-
- User-facing metrics (conversion, engagement)
|
|
500
|
-
|
|
501
|
-
### 4. Test During Low Traffic
|
|
502
|
-
|
|
503
|
-
**Chaos engineering schedule:**
|
|
504
|
-
- Weekday mornings: Low traffic
|
|
505
|
-
- Sunday 3am: Minimal users
|
|
506
|
-
- Avoid holidays, sales events
|
|
507
|
-
|
|
508
|
-
### 5. Have a Kill Switch
|
|
170
|
+
```typescript
|
|
171
|
+
// Convert production incidents to regression tests
|
|
172
|
+
await Task("Incident Replay", {
|
|
173
|
+
incident: {
|
|
174
|
+
id: 'INC-2024-001',
|
|
175
|
+
type: 'performance-degradation',
|
|
176
|
+
conditions: { concurrent_users: 500, cart_items: 10 }
|
|
177
|
+
},
|
|
178
|
+
generateTests: true,
|
|
179
|
+
addToRegression: true
|
|
180
|
+
}, "qe-production-intelligence");
|
|
509
181
|
|
|
510
|
-
|
|
511
|
-
```javascript
|
|
512
|
-
// Global kill switch (stops all experiments)
|
|
513
|
-
if (FeatureFlags.isEnabled('global-kill-switch')) {
|
|
514
|
-
return <SafeMode />; // Fallback to known-good state
|
|
515
|
-
}
|
|
182
|
+
// Output: New test added to prevent recurrence
|
|
516
183
|
```
|
|
517
184
|
|
|
518
185
|
---
|
|
519
186
|
|
|
520
|
-
##
|
|
521
|
-
|
|
522
|
-
|
|
523
|
-
```
|
|
524
|
-
|
|
525
|
-
|
|
526
|
-
|
|
527
|
-
|
|
528
|
-
|
|
529
|
-
|
|
530
|
-
```
|
|
531
|
-
|
|
532
|
-
|
|
533
|
-
|
|
534
|
-
|
|
535
|
-
|
|
536
|
-
|
|
537
|
-
|
|
538
|
-
|
|
539
|
-
|
|
540
|
-
|
|
541
|
-
|
|
542
|
-
|
|
543
|
-
|
|
544
|
-
```
|
|
545
|
-
Alerts that weren't real issues
|
|
546
|
-
|
|
547
|
-
Target: < 5%
|
|
187
|
+
## Agent Coordination Hints
|
|
188
|
+
|
|
189
|
+
### Memory Namespace
|
|
190
|
+
```
|
|
191
|
+
aqe/shift-right/
|
|
192
|
+
├── canary-results/* - Canary deployment metrics
|
|
193
|
+
├── synthetic-tests/* - Monitoring configurations
|
|
194
|
+
├── chaos-experiments/* - Experiment results
|
|
195
|
+
├── production-insights/* - Issues → test conversions
|
|
196
|
+
└── rum-analysis/* - Real user data patterns
|
|
197
|
+
```
|
|
198
|
+
|
|
199
|
+
### Fleet Coordination
|
|
200
|
+
```typescript
|
|
201
|
+
const shiftRightFleet = await FleetManager.coordinate({
|
|
202
|
+
strategy: 'shift-right-testing',
|
|
203
|
+
agents: [
|
|
204
|
+
'qe-production-intelligence', // RUM, incident replay
|
|
205
|
+
'qe-chaos-engineer', // Resilience testing
|
|
206
|
+
'qe-performance-tester', // Synthetic monitoring
|
|
207
|
+
'qe-quality-analyzer' // Metrics analysis
|
|
208
|
+
],
|
|
209
|
+
topology: 'mesh'
|
|
210
|
+
});
|
|
548
211
|
```
|
|
549
212
|
|
|
550
213
|
---
|
|
551
214
|
|
|
552
215
|
## Related Skills
|
|
553
|
-
|
|
554
|
-
|
|
555
|
-
- [
|
|
556
|
-
- [
|
|
557
|
-
- [regression-testing](../regression-testing/)
|
|
558
|
-
|
|
559
|
-
**Infrastructure:**
|
|
560
|
-
- [test-environment-management](../test-environment-management/)
|
|
561
|
-
- [performance-testing](../performance-testing/)
|
|
562
|
-
|
|
563
|
-
**Monitoring:**
|
|
564
|
-
- [test-reporting-analytics](../test-reporting-analytics/)
|
|
565
|
-
- [production-intelligence](../production-intelligence/) (agent)
|
|
216
|
+
- [shift-left-testing](../shift-left-testing/) - Pre-production testing
|
|
217
|
+
- [chaos-engineering-resilience](../chaos-engineering-resilience/) - Failure injection deep dive
|
|
218
|
+
- [performance-testing](../performance-testing/) - Load testing
|
|
219
|
+
- [agentic-quality-engineering](../agentic-quality-engineering/) - Agent coordination
|
|
566
220
|
|
|
567
221
|
---
|
|
568
222
|
|
|
569
223
|
## Remember
|
|
570
224
|
|
|
571
|
-
**Production is the ultimate test environment.**
|
|
572
|
-
|
|
573
|
-
**Shift-Right complements Shift-Left:**
|
|
574
|
-
- **Shift-Left**: Catch bugs early (cheap)
|
|
575
|
-
- **Shift-Right**: Validate real-world behavior (accurate)
|
|
576
|
-
|
|
577
|
-
**Best Practices:**
|
|
578
|
-
1. Use feature flags for safe deployments
|
|
579
|
-
2. Canary deploy with automatic rollback
|
|
580
|
-
3. Synthetic monitoring for proactive detection
|
|
581
|
-
4. Chaos engineering for resilience
|
|
582
|
-
5. Always minimize blast radius
|
|
583
|
-
6. Monitor everything, alert intelligently
|
|
225
|
+
**Production is the ultimate test environment.** Feature flags enable instant rollback. Canary releases catch issues before a full rollout. Synthetic monitoring detects problems before users do. Chaos engineering builds resilience. RUM shows real user experience.
|
|
584
226
|
|
|
585
|
-
**With Agents:**
|
|
227
|
+
**With Agents:** Agents monitor production, replay incidents as tests, run chaos experiments, and convert production insights to pre-production tests. Use agents to maintain continuous production quality.
|