codex-genesis-harness 0.1.5 → 0.1.6

Files changed (151)
  1. package/.codebase/ARCHITECTURE_REVIEW_COMPLETE.md +216 -216
  2. package/.codebase/CURRENT_STATE.md +7 -2
  3. package/.codebase/FILE_NAMING_CLARIFICATION.md +161 -161
  4. package/.codebase/HARNESS_COMPLETENESS_AUDIT.md +613 -613
  5. package/.codebase/IMPLEMENTATION_COMPLETE.md +429 -429
  6. package/.codebase/IMPLEMENTATION_HANDOFF.md +351 -351
  7. package/.codebase/IMPROVEMENTS_SUMMARY.md +419 -419
  8. package/.codebase/PHASE3_SKILLS_NAMING_COMPLETE.md +292 -292
  9. package/.codebase/PHASE_DEPENDENCY_MAP.md +486 -486
  10. package/.codebase/QUICK_START_SPEC_IMPACT.md +456 -456
  11. package/.codebase/README.md +139 -139
  12. package/.codebase/RECOVERY_POINTS.md +438 -438
  13. package/.codex/skills/genesis-api-sync/SKILL.md +354 -354
  14. package/.codex/skills/genesis-api-sync/checklists/api-sync-checklist.md +101 -101
  15. package/.codex/skills/genesis-api-sync/templates/api-change-template.md +257 -257
  16. package/.codex/skills/genesis-debug-guide/SKILL.md +479 -479
  17. package/.codex/skills/genesis-debug-guide/checklists/flaky-test-investigation.md +339 -339
  18. package/.codex/skills/genesis-debug-guide/checklists/production-bug-debug.md +210 -210
  19. package/.codex/skills/genesis-debug-guide/checklists/test-failure-debug.md +158 -158
  20. package/.codex/skills/genesis-debug-guide/observability/debug-commands.md +365 -365
  21. package/.codex/skills/genesis-debug-guide/playbooks/unit-test-failures.md +289 -289
  22. package/.codex/skills/genesis-debug-guide/templates/debug-investigation-log.md +288 -288
  23. package/.codex/skills/genesis-docs-automation/SKILL.md +1003 -1003
  24. package/.codex/skills/genesis-docs-automation/checklists/docs-validation.md +359 -359
  25. package/.codex/skills/genesis-docs-automation/checklists/spec-alignment.md +312 -312
  26. package/.codex/skills/genesis-docs-automation/observability/docs-tracking.md +382 -382
  27. package/.codex/skills/genesis-docs-automation/playbooks/auto-update-flow.md +851 -851
  28. package/.codex/skills/genesis-docs-automation/playbooks/changelog-generation.md +491 -491
  29. package/.codex/skills/genesis-docs-automation/templates/changelog-entry-template.md +187 -187
  30. package/.codex/skills/genesis-docs-automation/templates/handoff-template.md +297 -297
  31. package/.codex/skills/genesis-harness/SKILL.md +1427 -1427
  32. package/.codex/skills/genesis-harness/agents/openai.yaml +7 -7
  33. package/.codex/skills/genesis-harness/checklists/bug-fix-qa.md +169 -169
  34. package/.codex/skills/genesis-harness/checklists/new-feature-qa.md +157 -157
  35. package/.codex/skills/genesis-harness/checklists/refactor-qa.md +216 -216
  36. package/.codex/skills/genesis-harness/checklists/requirements-validation.md +211 -211
  37. package/.codex/skills/genesis-harness/references/planning-schema.md +35 -35
  38. package/.codex/skills/genesis-harness/references/quality-rubric.md +21 -21
  39. package/.codex/skills/genesis-harness/references/research-rubric.md +41 -41
  40. package/.codex/skills/genesis-harness/references/workflows.md +33 -33
  41. package/.codex/skills/genesis-harness/resources/agents-template.md +27 -27
  42. package/.codex/skills/genesis-harness/resources/api-docs-template.md +32 -32
  43. package/.codex/skills/genesis-harness/resources/architecture-template.md +30 -30
  44. package/.codex/skills/genesis-harness/resources/audit-template.md +26 -26
  45. package/.codex/skills/genesis-harness/resources/bug-template.md +34 -34
  46. package/.codex/skills/genesis-harness/resources/change-impact-matrix-template.md +204 -204
  47. package/.codex/skills/genesis-harness/resources/check-template.md +21 -21
  48. package/.codex/skills/genesis-harness/resources/conventions-template.md +42 -42
  49. package/.codex/skills/genesis-harness/resources/decision-template.md +33 -33
  50. package/.codex/skills/genesis-harness/resources/design-template.md +26 -26
  51. package/.codex/skills/genesis-harness/resources/escalation-template.md +21 -21
  52. package/.codex/skills/genesis-harness/resources/feature-template.md +49 -49
  53. package/.codex/skills/genesis-harness/resources/foundation-phase-template.md +131 -131
  54. package/.codex/skills/genesis-harness/resources/integrations-template.md +32 -32
  55. package/.codex/skills/genesis-harness/resources/journeys-template.md +13 -13
  56. package/.codex/skills/genesis-harness/resources/lessons-learned-template.md +12 -12
  57. package/.codex/skills/genesis-harness/resources/observability-template.md +34 -34
  58. package/.codex/skills/genesis-harness/resources/phase-00-foundation-template.md +76 -76
  59. package/.codex/skills/genesis-harness/resources/phase-template.md +34 -34
  60. package/.codex/skills/genesis-harness/resources/pitfalls-template.md +22 -22
  61. package/.codex/skills/genesis-harness/resources/planning-tree-template.md +39 -39
  62. package/.codex/skills/genesis-harness/resources/post-implementation-guide.md +347 -347
  63. package/.codex/skills/genesis-harness/resources/project-template.md +38 -38
  64. package/.codex/skills/genesis-harness/resources/quality-score-template.md +11 -11
  65. package/.codex/skills/genesis-harness/resources/requirements-template.md +26 -26
  66. package/.codex/skills/genesis-harness/resources/research-template.md +26 -26
  67. package/.codex/skills/genesis-harness/resources/review-template.md +22 -22
  68. package/.codex/skills/genesis-harness/resources/spec-changelog-template.md +6 -6
  69. package/.codex/skills/genesis-harness/resources/stack-template.md +33 -33
  70. package/.codex/skills/genesis-harness/resources/verification-template.md +26 -26
  71. package/.codex/skills/genesis-harness/scripts/check-architecture-boundaries.sh +0 -0
  72. package/.codex/skills/genesis-harness/scripts/check-docs-sync.sh +0 -0
  73. package/.codex/skills/genesis-harness/scripts/check-no-debug-logs.sh +0 -0
  74. package/.codex/skills/genesis-harness/scripts/check-required-planning-files.sh +0 -0
  75. package/.codex/skills/genesis-harness/scripts/check-spec-changelog.sh +0 -0
  76. package/.codex/skills/genesis-harness/scripts/check-task-tracking.sh +0 -0
  77. package/.codex/skills/genesis-harness/scripts/compact-context.sh +0 -0
  78. package/.codex/skills/genesis-harness/scripts/create-adr.sh +0 -0
  79. package/.codex/skills/genesis-harness/scripts/create-bug.sh +0 -0
  80. package/.codex/skills/genesis-harness/scripts/create-feature.sh +0 -0
  81. package/.codex/skills/genesis-harness/scripts/detect-stack.sh +0 -0
  82. package/.codex/skills/genesis-harness/scripts/init-planning.sh +0 -0
  83. package/.codex/skills/genesis-harness/scripts/list-changed-files.sh +0 -0
  84. package/.codex/skills/genesis-harness/scripts/offload-log.sh +0 -0
  85. package/.codex/skills/genesis-harness/scripts/run-verification.sh +0 -0
  86. package/.codex/skills/genesis-harness/scripts/run-verify-loop.sh +0 -0
  87. package/.codex/skills/genesis-harness/scripts/update-state.sh +0 -0
  88. package/.codex/skills/genesis-mvp-planning/SKILL.md +114 -0
  89. package/.codex/skills/genesis-mvp-planning/agents/openai.yaml +6 -0
  90. package/.codex/skills/genesis-mvp-planning/checklists/mvp-readiness.md +18 -0
  91. package/.codex/skills/genesis-mvp-planning/examples/5-phase-roadmap-example.md +43 -0
  92. package/.codex/skills/genesis-mvp-planning/templates/phase-1-core.md +17 -0
  93. package/.codex/skills/genesis-mvp-planning/templates/phase-2-auth.md +17 -0
  94. package/.codex/skills/genesis-mvp-planning/templates/phase-3-features.md +17 -0
  95. package/.codex/skills/genesis-mvp-planning/templates/phase-4-integrations.md +17 -0
  96. package/.codex/skills/genesis-mvp-planning/templates/phase-5-readiness.md +17 -0
  97. package/.codex/skills/genesis-new-design/agents/openai.yaml +3 -3
  98. package/.codex/skills/genesis-observability-automation/checklists/.gitkeep +0 -0
  99. package/.codex/skills/genesis-observability-automation/observability/.gitkeep +0 -0
  100. package/.codex/skills/genesis-observability-automation/playbooks/.gitkeep +0 -0
  101. package/.codex/skills/genesis-observability-automation/templates/.gitkeep +0 -0
  102. package/.codex/skills/genesis-release-orchestration/SKILL.md +653 -653
  103. package/.codex/skills/genesis-release-orchestration/checklists/post-deployment-verification.md +274 -274
  104. package/.codex/skills/genesis-release-orchestration/checklists/pre-release-validation.md +220 -220
  105. package/.codex/skills/genesis-release-orchestration/observability/release-tracking.md +253 -253
  106. package/.codex/skills/genesis-release-orchestration/playbooks/canary-deployment-orchestration.md +472 -472
  107. package/.codex/skills/genesis-release-orchestration/playbooks/semantic-versioning-automation.md +494 -494
  108. package/.codex/skills/genesis-release-orchestration/templates/deployment-strategy-template.md +303 -303
  109. package/.codex/skills/genesis-release-orchestration/templates/release-runbook-template.md +420 -420
  110. package/.codex/skills/genesis-research-first/SKILL.md +237 -237
  111. package/.codex/skills/genesis-research-first/templates/.gitkeep +0 -0
  112. package/.codex/skills/genesis-spec-propagation/SKILL.md +534 -534
  113. package/.codex/skills/genesis-spec-propagation/checklists/phase-update-verification.md +384 -384
  114. package/.codex/skills/genesis-spec-propagation/checklists/spec-change-detection.md +257 -257
  115. package/.codex/skills/genesis-spec-propagation/observability/propagation-tracking.md +373 -373
  116. package/.codex/skills/genesis-spec-propagation/playbooks/breaking-change-propagation.md +692 -692
  117. package/.codex/skills/genesis-spec-propagation/playbooks/feature-change-propagation.md +434 -434
  118. package/.codex/skills/genesis-spec-propagation/templates/migration-guide-template.md +407 -407
  119. package/.codex/skills/genesis-upgrade-design/agents/openai.yaml +3 -3
  120. package/.codex/skills/spec-impact-engine/SKILL.md +504 -504
  121. package/.codex/skills/spec-impact-engine/detect-spec-changes.sh +0 -0
  122. package/.codex-plugin/plugin.json +19 -19
  123. package/CHANGELOG.md +42 -0
  124. package/LICENSE +22 -22
  125. package/README.EN.md +784 -730
  126. package/README.VI.md +776 -723
  127. package/README.md +102 -247
  128. package/VERSION +2 -2
  129. package/bin/genesis-harness.js +90 -87
  130. package/package.json +9 -3
  131. package/scripts/README.md +342 -342
  132. package/scripts/compact-context.sh +0 -0
  133. package/scripts/contract_integrity_gate.js +83 -0
  134. package/scripts/detect-changes.sh +0 -0
  135. package/scripts/healing_telemetry.js +118 -0
  136. package/scripts/install.sh +4 -1
  137. package/scripts/offload-log.sh +0 -0
  138. package/scripts/prompt_sentinel.js +84 -0
  139. package/scripts/run-evals.sh +1 -0
  140. package/scripts/run-verify-loop.sh +11 -0
  141. package/scripts/spec_visual_sync.js +157 -0
  142. package/scripts/test_generator.js +142 -0
  143. package/scripts/transition_state.sh +0 -0
  144. package/scripts/uninstall.sh +1 -0
  145. package/scripts/validation_gates.sh +40 -1
  146. package/scripts/verify.sh +5 -0
  147. package/tests/unit/contract_integrity_gate.test.js +74 -0
  148. package/tests/unit/healing_telemetry.test.js +58 -0
  149. package/tests/unit/prompt_sentinel.test.js +50 -0
  150. package/tests/unit/spec_visual_sync.test.js +77 -0
  151. package/tests/unit/test_generator.test.js +62 -0
@@ -1,303 +1,303 @@
1
- # Deployment Strategy Template
2
-
3
- **Release**: v[X.Y.Z]
4
- **Risk Score**: [N/10]
5
- **Strategy Selected**: [Blue-Green|Canary|Rolling]
6
- **Approval Date**: [YYYY-MM-DD]
7
- **Deployment Window**: [YYYY-MM-DD HH:MM-HH:MM UTC]
8
-
9
- ---
10
-
11
- ## Strategy Selection Criteria
12
-
13
- **Risk Score 1-2 (LOW)** → Rolling Deployment
14
- **Risk Score 3-5 (MEDIUM)** → Blue-Green Deployment
15
- **Risk Score 6-8 (HIGH)** → Canary Deployment
16
- **Risk Score 9-10 (CRITICAL)** → Canary + Manual approval at each stage
17
-
18
- ---
19
-
20
- ## Blue-Green Deployment Strategy
21
-
22
- **When to use**: Medium-risk releases (risk 3-5)
23
-
24
- ### Overview
25
-
26
- Two identical production environments:
27
- - **Blue** = Current production (v2.5.3)
28
- - **Green** = New version (v3.0.0)
29
-
30
- Traffic is routed to one at a time. If issues occur, switch back instantly.
31
-
32
- ### Timeline
33
-
34
- ```
35
- Stage 1: Deploy to Green (5-10 min)
36
- - Deploy v3.0.0 to green environment
37
- - Run health checks
38
- - Verify all systems ready
39
- - Keep blue (v2.5.3) running
40
-
41
- Stage 2: Validate Green (5-10 min)
42
- - Run smoke tests against green
43
- - Verify response format correct (if breaking changes)
44
- - Validate database migrations applied
45
- - Confirm green ready for traffic
46
-
47
- Stage 3: Switch Traffic (1-2 min)
48
- - Update load balancer: 100% traffic → green (v3.0.0)
49
- - Blue (v2.5.3) stops receiving traffic but stays running
50
-
51
- Stage 4: Monitor Green (1-24 hours)
52
- - Monitor error rate, latency, resources
53
- - If issues found: Instant rollback (switch back to blue)
54
- - If stable after 1-24 hours: Decommission blue, keep only green
55
-
56
- Rollback (if needed): <5 seconds
57
- - Switch traffic: green → blue (instant)
58
- - No data loss (traffic never split)
59
- ```
60
-
61
- ### Deployment Commands
62
-
63
- ```bash
64
- # 1. Deploy to green environment
65
- kubectl apply -f deployment-v3.0.0-green.yaml -n production
66
-
67
- # 2. Wait for green ready
68
- kubectl rollout status deployment/myapp-green --timeout=10m
69
-
70
- # 3. Validate green environment
71
- curl -f https://green-staging.myapp.example.com/health
72
- curl -f https://green-staging.myapp.example.com/api/users/1
73
-
74
- # 4. Switch traffic: Load Balancer route 100% to green
75
- aws elbv2 modify-rule --rule-arn arn:aws:elasticloadbalancing:... \
76
- --actions Type=forward,TargetGroups="[{TargetGroupArn=arn:...green:...,Weight=100}]"
77
-
78
- # 5. Monitor for 1-24 hours
79
- watch -n 10 'kubectl top pods -n production'
80
-
81
- # 6. If rollback needed: Switch back to blue (instant)
82
- aws elbv2 modify-rule --rule-arn arn:aws:elasticloadbalancing:... \
83
- --actions Type=forward,TargetGroups="[{TargetGroupArn=arn:...blue:...,Weight=100}]"
84
- ```
85
-
86
- ### Pros & Cons
87
-
88
- **Pros**:
89
- - ✅ Zero-downtime deployment
90
- - ✅ Instant rollback (< 5 seconds)
91
- - ✅ Can run parallel tests
92
- - ✅ No traffic splitting complexity
93
-
94
- **Cons**:
95
- - ❌ Double infrastructure cost (need 2x environments)
96
- - ❌ Data consistency challenges (2 databases)
97
- - ❌ Not suitable for frequent deployments
98
-
99
- ---
100
-
101
- ## Canary Deployment Strategy
102
-
103
- **When to use**: High-risk releases (risk 6-8+)
104
-
105
- ### Overview
106
-
107
- Gradually roll out new version to small % of traffic, increasing as confidence grows.
108
-
109
- - Stage 1: 1% traffic (1 hour monitoring)
110
- - Stage 2: 10% traffic (2 hours monitoring)
111
- - Stage 3: 50% traffic (4 hours monitoring)
112
- - Stage 4: 100% traffic (24+ hours monitoring)
113
-
114
- If issues at any stage: Rollback all traffic to previous version.
115
-
116
- ### Timeline
117
-
118
- ```
119
- Stage 1: 1% Traffic Canary (1 hour)
120
- - Deploy v3.0.0 alongside v2.5.3
121
- - Route 1% of traffic to v3.0.0
122
- - Monitor error rate, latency, resource usage
123
- - Go/No-go decision after 1 hour
124
-
125
- Stage 2: 10% Traffic Canary (2 hours)
126
- - Increase to 10% traffic if Stage 1 passed
127
- - Monitor for 2 hours (covers lunch rush if applicable)
128
- - Go/No-go decision after 2 hours
129
-
130
- Stage 3: 50% Traffic Canary (4 hours)
131
- - Increase to 50% traffic if Stage 2 passed
132
- - Monitor for 4 hours (covers peak traffic periods)
133
- - Go/No-go decision after 4 hours
134
-
135
- Stage 4: 100% Traffic - Full Rollout (24+ hours)
136
- - Route 100% traffic to v3.0.0
137
- - Keep v2.5.3 running for 24 hours (instant rollback)
138
- - Continuous monitoring for 24 hours
139
- - After 24 hours stable: Decommission v2.5.3
140
-
141
- Total deployment: 8+ hours (Stage 1→4)
142
- Rollback window: 24 hours post-complete deployment
143
- ```
144
-
145
- ### Deployment Commands
146
-
147
- ```bash
148
- # Stage 1: Deploy and route 1% traffic
149
- kubectl apply -f deployment-v3.0.0.yaml -n production
150
- kubectl get pods -w # Verify ready
151
-
152
- # Route 1% traffic to v3.0.0, 99% to v2.5.3
153
- aws elbv2 modify-rule --rule-arn <arn> \
154
- --actions Type=forward,TargetGroups="[{TargetGroupArn=<v3-tg>,Weight=1},{TargetGroupArn=<v2-tg>,Weight=99}]"
155
-
156
- # Monitor Stage 1 (1 hour)
157
- # ... check error rate, latency, etc.
158
-
159
- # Stage 2: Increase to 10%
160
- aws elbv2 modify-rule --rule-arn <arn> \
161
- --actions Type=forward,TargetGroups="[{TargetGroupArn=<v3-tg>,Weight=10},{TargetGroupArn=<v2-tg>,Weight=90}]"
162
-
163
- # Stage 3: Increase to 50%
164
- aws elbv2 modify-rule --rule-arn <arn> \
165
- --actions Type=forward,TargetGroups="[{TargetGroupArn=<v3-tg>,Weight=50},{TargetGroupArn=<v2-tg>,Weight=50}]"
166
-
167
- # Stage 4: 100% traffic (full rollout)
168
- aws elbv2 modify-rule --rule-arn <arn> \
169
- --actions Type=forward,TargetGroups="[{TargetGroupArn=<v3-tg>,Weight=100}]"
170
-
171
- # Rollback (if needed at any stage): Instant switch to previous version
172
- aws elbv2 modify-rule --rule-arn <arn> \
173
- --actions Type=forward,TargetGroups="[{TargetGroupArn=<v2-tg>,Weight=100}]"
174
- ```
175
-
176
- ### Go/No-Go Criteria (At Each Stage)
177
-
178
- **GO** if:
179
- - ✅ Error rate <1% (target <0.1%)
180
- - ✅ Latency within baseline (±10%)
181
- - ✅ No critical issues
182
- - ✅ Consumer feedback positive
183
- - ✅ Team lead approves
184
-
185
- **NO-GO** (Pause) if:
186
- - ⚠️ Error rate 1-5% (investigate, pause 30 min)
187
- - ⚠️ Latency spike >50% (investigate, may indicate load issue)
188
- - ⚠️ Resource exhaustion (tune, retry)
189
-
190
- **ROLLBACK** if:
191
- - ❌ Error rate >5%
192
- - ❌ Complete service unavailability
193
- - ❌ Data corruption
194
- - ❌ Security vulnerability
195
-
196
- ### Pros & Cons
197
-
198
- **Pros**:
199
- - ✅ Low risk (catch issues early)
200
- - ✅ Minimal blast radius at each stage
201
- - ✅ Customer feedback integrated
202
- - ✅ Gradual confidence increase
203
-
204
- **Cons**:
205
- - ❌ Slow deployment (8+ hours)
206
- - ❌ Requires careful monitoring
207
- - ❌ Split traffic may hide issues
208
- - ❌ Complex state management (need both versions running)
209
-
210
- ---
211
-
212
- ## Rolling Deployment Strategy
213
-
214
- **When to use**: Low-risk releases (risk 1-2)
215
-
216
- ### Overview
217
-
218
- Replace instances one-by-one, keeping service available throughout.
219
-
220
- - Wave 1: Update 25% of instances (1 hour)
221
- - Wave 2: Update 50% of instances (1 hour)
222
- - Wave 3: Update 75% of instances (1 hour)
223
- - Wave 4: Update 100% of instances (1 hour)
224
-
225
- If issue detected: Rollout stops, can reverse changes.
226
-
227
- ### Timeline
228
-
229
- ```
230
- Wave 1: 25% Instances (1 hour)
231
- - Cordon off 25% of instances
232
- - Deploy v3.0.0 to cordoned instances
233
- - Verify health checks pass
234
- - Return instances to rotation
235
- - Monitor error rate (should be ~0%)
236
-
237
- Wave 2: 50% Instances (1 hour)
238
- - Repeat for next 25%
239
-
240
- Wave 3: 75% Instances (1 hour)
241
- - Repeat for next 25%
242
-
243
- Wave 4: 100% Instances (1 hour)
244
- - Deploy to final 25%
245
- - All instances now v3.0.0
246
-
247
- Total deployment: 4 hours
248
- Rollback: Can stop deployment mid-wave, roll back in-progress instances
249
- ```
250
-
251
- ### Pros & Cons
252
-
253
- **Pros**:
254
- - ✅ Fast (4 hours vs 8+ hours)
255
- - ✅ Zero-downtime (service always available)
256
- - ✅ Simple implementation
257
- - ✅ Low infrastructure cost
258
-
259
- **Cons**:
260
- - ❌ Higher risk (old + new running simultaneously)
261
- - ❌ Harder to rollback (partially deployed state)
262
- - ❌ Can't instantly revert (need to re-deploy)
263
-
264
- ---
265
-
266
- ## Monitoring Metrics for All Strategies
267
-
268
- **Critical Metrics** (check every 5-10 minutes):
269
-
270
- | Metric | Target | Alert |
271
- |--------|--------|-------|
272
- | Error Rate (5xx) | <0.1% | >1% |
273
- | Latency P95 | <200ms | >300ms |
274
- | Latency P99 | <500ms | >750ms |
275
- | CPU Usage | <70% | >80% |
276
- | Memory Usage | <80% | >85% |
277
- | Database Connections | Stable | Growing |
278
- | Request Timeout Rate | <0.01% | >0.05% |
279
-
280
- ---
281
-
282
- ## Rollback Procedure (Universal for All Strategies)
283
-
284
- ```bash
285
- # Immediate: Switch traffic back to previous version
286
- aws elbv2 modify-rule --rule-arn <arn> \
287
- --actions Type=forward,TargetGroups="[{TargetGroupArn=<prev-tg>,Weight=100}]"
288
-
289
- # Verify previous version handling traffic
290
- curl -f http://[prod-url]/health
291
-
292
- # Notify stakeholders
293
- # Email: ops-team@company.com
294
- # Slack: #incident-response
295
- # Status page: Updated
296
-
297
- # Start investigation
298
- echo "Rollback complete at $(date)" >> INCIDENT.md
299
- ```
300
-
301
- ---
302
-
303
- **DEPLOYMENT STRATEGY COMPLETE**
1
+ # Deployment Strategy Template
2
+
3
+ **Release**: v[X.Y.Z]
4
+ **Risk Score**: [N/10]
5
+ **Strategy Selected**: [Blue-Green|Canary|Rolling]
6
+ **Approval Date**: [YYYY-MM-DD]
7
+ **Deployment Window**: [YYYY-MM-DD HH:MM-HH:MM UTC]
8
+
9
+ ---
10
+
11
+ ## Strategy Selection Criteria
12
+
13
+ **Risk Score 1-2 (LOW)** → Rolling Deployment
14
+ **Risk Score 3-5 (MEDIUM)** → Blue-Green Deployment
15
+ **Risk Score 6-8 (HIGH)** → Canary Deployment
16
+ **Risk Score 9-10 (CRITICAL)** → Canary + Manual approval at each stage
17
+
18
+ ---
19
+
20
+ ## Blue-Green Deployment Strategy
21
+
22
+ **When to use**: Medium-risk releases (risk 3-5)
23
+
24
+ ### Overview
25
+
26
+ Two identical production environments:
27
+ - **Blue** = Current production (v2.5.3)
28
+ - **Green** = New version (v3.0.0)
29
+
30
+ Traffic is routed to one at a time. If issues occur, switch back instantly.
31
+
32
+ ### Timeline
33
+
34
+ ```
35
+ Stage 1: Deploy to Green (5-10 min)
36
+ - Deploy v3.0.0 to green environment
37
+ - Run health checks
38
+ - Verify all systems ready
39
+ - Keep blue (v2.5.3) running
40
+
41
+ Stage 2: Validate Green (5-10 min)
42
+ - Run smoke tests against green
43
+ - Verify response format correct (if breaking changes)
44
+ - Validate database migrations applied
45
+ - Confirm green ready for traffic
46
+
47
+ Stage 3: Switch Traffic (1-2 min)
48
+ - Update load balancer: 100% traffic → green (v3.0.0)
49
+ - Blue (v2.5.3) stops receiving traffic but stays running
50
+
51
+ Stage 4: Monitor Green (1-24 hours)
52
+ - Monitor error rate, latency, resources
53
+ - If issues found: Instant rollback (switch back to blue)
54
+ - If stable after 1-24 hours: Decommission blue, keep only green
55
+
56
+ Rollback (if needed): <5 seconds
57
+ - Switch traffic: green → blue (instant)
58
+ - No data loss (traffic never split)
59
+ ```
60
+
61
+ ### Deployment Commands
62
+
63
+ ```bash
64
+ # 1. Deploy to green environment
65
+ kubectl apply -f deployment-v3.0.0-green.yaml -n production
66
+
67
+ # 2. Wait for green ready
68
+ kubectl rollout status deployment/myapp-green --timeout=10m
69
+
70
+ # 3. Validate green environment
71
+ curl -f https://green-staging.myapp.example.com/health
72
+ curl -f https://green-staging.myapp.example.com/api/users/1
73
+
74
+ # 4. Switch traffic: Load Balancer route 100% to green
75
+ aws elbv2 modify-rule --rule-arn arn:aws:elasticloadbalancing:... \
76
+ --actions Type=forward,TargetGroups="[{TargetGroupArn=arn:...green:...,Weight=100}]"
77
+
78
+ # 5. Monitor for 1-24 hours
79
+ watch -n 10 'kubectl top pods -n production'
80
+
81
+ # 6. If rollback needed: Switch back to blue (instant)
82
+ aws elbv2 modify-rule --rule-arn arn:aws:elasticloadbalancing:... \
83
+ --actions Type=forward,TargetGroups="[{TargetGroupArn=arn:...blue:...,Weight=100}]"
84
+ ```
85
+
86
+ ### Pros & Cons
87
+
88
+ **Pros**:
89
+ - ✅ Zero-downtime deployment
90
+ - ✅ Instant rollback (< 5 seconds)
91
+ - ✅ Can run parallel tests
92
+ - ✅ No traffic splitting complexity
93
+
94
+ **Cons**:
95
+ - ❌ Double infrastructure cost (need 2x environments)
96
+ - ❌ Data consistency challenges (2 databases)
97
+ - ❌ Not suitable for frequent deployments
98
+
99
+ ---
100
+
101
+ ## Canary Deployment Strategy
102
+
103
+ **When to use**: High-risk releases (risk 6-8+)
104
+
105
+ ### Overview
106
+
107
+ Gradually roll out new version to small % of traffic, increasing as confidence grows.
108
+
109
+ - Stage 1: 1% traffic (1 hour monitoring)
110
+ - Stage 2: 10% traffic (2 hours monitoring)
111
+ - Stage 3: 50% traffic (4 hours monitoring)
112
+ - Stage 4: 100% traffic (24+ hours monitoring)
113
+
114
+ If issues at any stage: Rollback all traffic to previous version.
115
+
116
+ ### Timeline
117
+
118
+ ```
119
+ Stage 1: 1% Traffic Canary (1 hour)
120
+ - Deploy v3.0.0 alongside v2.5.3
121
+ - Route 1% of traffic to v3.0.0
122
+ - Monitor error rate, latency, resource usage
123
+ - Go/No-go decision after 1 hour
124
+
125
+ Stage 2: 10% Traffic Canary (2 hours)
126
+ - Increase to 10% traffic if Stage 1 passed
127
+ - Monitor for 2 hours (covers lunch rush if applicable)
128
+ - Go/No-go decision after 2 hours
129
+
130
+ Stage 3: 50% Traffic Canary (4 hours)
131
+ - Increase to 50% traffic if Stage 2 passed
132
+ - Monitor for 4 hours (covers peak traffic periods)
133
+ - Go/No-go decision after 4 hours
134
+
135
+ Stage 4: 100% Traffic - Full Rollout (24+ hours)
136
+ - Route 100% traffic to v3.0.0
137
+ - Keep v2.5.3 running for 24 hours (instant rollback)
138
+ - Continuous monitoring for 24 hours
139
+ - After 24 hours stable: Decommission v2.5.3
140
+
141
+ Total deployment: 8+ hours (Stage 1→4)
142
+ Rollback window: 24 hours post-complete deployment
143
+ ```
144
+
145
+ ### Deployment Commands
146
+
147
+ ```bash
148
+ # Stage 1: Deploy and route 1% traffic
149
+ kubectl apply -f deployment-v3.0.0.yaml -n production
150
+ kubectl get pods -w # Verify ready
151
+
152
+ # Route 1% traffic to v3.0.0, 99% to v2.5.3
153
+ aws elbv2 modify-rule --rule-arn <arn> \
154
+ --actions Type=forward,TargetGroups="[{TargetGroupArn=<v3-tg>,Weight=1},{TargetGroupArn=<v2-tg>,Weight=99}]"
155
+
156
+ # Monitor Stage 1 (1 hour)
157
+ # ... check error rate, latency, etc.
158
+
159
+ # Stage 2: Increase to 10%
160
+ aws elbv2 modify-rule --rule-arn <arn> \
161
+ --actions Type=forward,TargetGroups="[{TargetGroupArn=<v3-tg>,Weight=10},{TargetGroupArn=<v2-tg>,Weight=90}]"
162
+
163
+ # Stage 3: Increase to 50%
164
+ aws elbv2 modify-rule --rule-arn <arn> \
165
+ --actions Type=forward,TargetGroups="[{TargetGroupArn=<v3-tg>,Weight=50},{TargetGroupArn=<v2-tg>,Weight=50}]"
166
+
167
+ # Stage 4: 100% traffic (full rollout)
168
+ aws elbv2 modify-rule --rule-arn <arn> \
169
+ --actions Type=forward,TargetGroups="[{TargetGroupArn=<v3-tg>,Weight=100}]"
170
+
171
+ # Rollback (if needed at any stage): Instant switch to previous version
172
+ aws elbv2 modify-rule --rule-arn <arn> \
173
+ --actions Type=forward,TargetGroups="[{TargetGroupArn=<v2-tg>,Weight=100}]"
174
+ ```
175
+
176
+ ### Go/No-Go Criteria (At Each Stage)
177
+
178
+ **GO** if:
179
+ - ✅ Error rate <1% (target <0.1%)
180
+ - ✅ Latency within baseline (±10%)
181
+ - ✅ No critical issues
182
+ - ✅ Consumer feedback positive
183
+ - ✅ Team lead approves
184
+
185
+ **NO-GO** (Pause) if:
186
+ - ⚠️ Error rate 1-5% (investigate, pause 30 min)
187
+ - ⚠️ Latency spike >50% (investigate, may indicate load issue)
188
+ - ⚠️ Resource exhaustion (tune, retry)
189
+
190
+ **ROLLBACK** if:
191
+ - ❌ Error rate >5%
192
+ - ❌ Complete service unavailability
193
+ - ❌ Data corruption
194
+ - ❌ Security vulnerability
195
+
196
+ ### Pros & Cons
197
+
198
+ **Pros**:
199
+ - ✅ Low risk (catch issues early)
200
+ - ✅ Minimal blast radius at each stage
201
+ - ✅ Customer feedback integrated
202
+ - ✅ Gradual confidence increase
203
+
204
+ **Cons**:
205
+ - ❌ Slow deployment (8+ hours)
206
+ - ❌ Requires careful monitoring
207
+ - ❌ Split traffic may hide issues
208
+ - ❌ Complex state management (need both versions running)
209
+
210
+ ---
211
+
212
+ ## Rolling Deployment Strategy
213
+
214
+ **When to use**: Low-risk releases (risk 1-2)
215
+
216
+ ### Overview
217
+
218
+ Replace instances one-by-one, keeping service available throughout.
219
+
220
+ - Wave 1: Update 25% of instances (1 hour)
221
+ - Wave 2: Update 50% of instances (1 hour)
222
+ - Wave 3: Update 75% of instances (1 hour)
223
+ - Wave 4: Update 100% of instances (1 hour)
224
+
225
+ If issue detected: Rollout stops, can reverse changes.
226
+
227
+ ### Timeline
228
+
229
+ ```
230
+ Wave 1: 25% Instances (1 hour)
231
+ - Cordon off 25% of instances
232
+ - Deploy v3.0.0 to cordoned instances
233
+ - Verify health checks pass
234
+ - Return instances to rotation
235
+ - Monitor error rate (should be ~0%)
236
+
237
+ Wave 2: 50% Instances (1 hour)
238
+ - Repeat for next 25%
239
+
240
+ Wave 3: 75% Instances (1 hour)
241
+ - Repeat for next 25%
242
+
243
+ Wave 4: 100% Instances (1 hour)
244
+ - Deploy to final 25%
245
+ - All instances now v3.0.0
246
+
247
+ Total deployment: 4 hours
248
+ Rollback: Can stop deployment mid-wave, roll back in-progress instances
249
+ ```
250
+
251
+ ### Pros & Cons
252
+
253
+ **Pros**:
254
+ - ✅ Fast (4 hours vs 8+ hours)
255
+ - ✅ Zero-downtime (service always available)
256
+ - ✅ Simple implementation
257
+ - ✅ Low infrastructure cost
258
+
259
+ **Cons**:
260
+ - ❌ Higher risk (old + new running simultaneously)
261
+ - ❌ Harder to rollback (partially deployed state)
262
+ - ❌ Can't instantly revert (need to re-deploy)
263
+
264
+ ---
265
+
266
+ ## Monitoring Metrics for All Strategies
267
+
268
+ **Critical Metrics** (check every 5-10 minutes):
269
+
270
+ | Metric | Target | Alert |
271
+ |--------|--------|-------|
272
+ | Error Rate (5xx) | <0.1% | >1% |
273
+ | Latency P95 | <200ms | >300ms |
274
+ | Latency P99 | <500ms | >750ms |
275
+ | CPU Usage | <70% | >80% |
276
+ | Memory Usage | <80% | >85% |
277
+ | Database Connections | Stable | Growing |
278
+ | Request Timeout Rate | <0.01% | >0.05% |
279
+
280
+ ---
281
+
282
+ ## Rollback Procedure (Universal for All Strategies)
283
+
284
+ ```bash
285
+ # Immediate: Switch traffic back to previous version
286
+ aws elbv2 modify-rule --rule-arn <arn> \
287
+ --actions Type=forward,TargetGroups="[{TargetGroupArn=<prev-tg>,Weight=100}]"
288
+
289
+ # Verify previous version handling traffic
290
+ curl -f http://[prod-url]/health
291
+
292
+ # Notify stakeholders
293
+ # Email: ops-team@company.com
294
+ # Slack: #incident-response
295
+ # Status page: Updated
296
+
297
+ # Start investigation
298
+ echo "Rollback complete at $(date)" >> INCIDENT.md
299
+ ```
300
+
301
+ ---
302
+
303
+ **DEPLOYMENT STRATEGY COMPLETE**
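
The template's "Strategy Selection Criteria" and the error-rate thresholds in its canary "Go/No-Go Criteria" are simple lookups, so they can be sketched as two small bash helpers. This is a minimal sketch, not something shipped in the package: the function names `select_strategy` and `canary_decision` are hypothetical. Only the error-rate criterion is modeled; the latency, resource, and approval checks are left out.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of codex-genesis-harness.

# Map a 1-10 risk score to the strategy listed under
# "Strategy Selection Criteria" in the template.
select_strategy() {
  local score="$1"
  case "$score" in
    1|2)   echo "Rolling" ;;
    3|4|5) echo "Blue-Green" ;;
    6|7|8) echo "Canary" ;;
    9|10)  echo "Canary + manual approval at each stage" ;;
    *)     echo "invalid risk score: $score" >&2; return 1 ;;
  esac
}

# Classify a canary stage by its error rate, given in percent.
# Thresholds follow the template: <1% GO, 1-5% NO-GO (pause), >5% ROLLBACK.
# awk does the comparison because bash arithmetic is integer-only.
# Assumption: the argument is a plain decimal number such as 0.4 or 7.5.
canary_decision() {
  awk -v rate="$1" 'BEGIN {
    if (rate < 1)       print "GO"
    else if (rate <= 5) print "NO-GO"
    else                print "ROLLBACK"
  }'
}
```

For example, `select_strategy 7` prints `Canary`, and `canary_decision 3` prints `NO-GO`, which the template treats as "investigate, pause 30 min" rather than an immediate rollback.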