codex-genesis-harness 0.1.1 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (180)
  1. package/.codebase/ARCHITECTURE_REVIEW_COMPLETE.md +216 -0
  2. package/.codebase/CURRENT_STATE.md +2 -0
  3. package/.codebase/DOMAIN_MODELS.md +5 -3
  4. package/.codebase/FILE_NAMING_CLARIFICATION.md +161 -0
  5. package/.codebase/HARNESS_COMPLETENESS_AUDIT.md +613 -0
  6. package/.codebase/IMPLEMENTATION_COMPLETE.md +429 -0
  7. package/.codebase/IMPLEMENTATION_HANDOFF.md +351 -0
  8. package/.codebase/IMPROVEMENTS_SUMMARY.md +419 -0
  9. package/.codebase/PHASE3_SKILLS_NAMING_COMPLETE.md +292 -0
  10. package/.codebase/PHASE_DEPENDENCY_MAP.md +486 -0
  11. package/.codebase/QUICK_START_SPEC_IMPACT.md +456 -0
  12. package/.codebase/README.md +139 -0
  13. package/.codebase/RECOVERY_POINTS.md +438 -0
  14. package/.codex/skills/genesis-api-sync/SKILL.md +354 -0
  15. package/.codex/skills/genesis-api-sync/agents/openai.yaml +7 -0
  16. package/.codex/skills/genesis-api-sync/checklists/api-sync-checklist.md +101 -0
  17. package/.codex/skills/genesis-api-sync/examples/example.md +68 -0
  18. package/.codex/skills/genesis-api-sync/templates/api-change-template.md +257 -0
  19. package/.codex/skills/genesis-debug-guide/SKILL.md +479 -0
  20. package/.codex/skills/genesis-debug-guide/agents/openai.yaml +7 -0
  21. package/.codex/skills/genesis-debug-guide/checklists/flaky-test-investigation.md +339 -0
  22. package/.codex/skills/genesis-debug-guide/checklists/production-bug-debug.md +210 -0
  23. package/.codex/skills/genesis-debug-guide/checklists/test-failure-debug.md +158 -0
  24. package/.codex/skills/genesis-debug-guide/examples/example.md +48 -0
  25. package/.codex/skills/genesis-debug-guide/observability/debug-commands.md +365 -0
  26. package/.codex/skills/genesis-debug-guide/playbooks/unit-test-failures.md +289 -0
  27. package/.codex/skills/genesis-debug-guide/templates/debug-investigation-log.md +288 -0
  28. package/.codex/skills/genesis-docs-automation/SKILL.md +1003 -0
  29. package/.codex/skills/genesis-docs-automation/agents/openai.yaml +7 -0
  30. package/.codex/skills/genesis-docs-automation/checklists/docs-validation.md +359 -0
  31. package/.codex/skills/genesis-docs-automation/checklists/spec-alignment.md +312 -0
  32. package/.codex/skills/genesis-docs-automation/examples/example.md +59 -0
  33. package/.codex/skills/genesis-docs-automation/observability/docs-tracking.md +382 -0
  34. package/.codex/skills/genesis-docs-automation/playbooks/auto-update-flow.md +851 -0
  35. package/.codex/skills/genesis-docs-automation/playbooks/changelog-generation.md +491 -0
  36. package/.codex/skills/genesis-docs-automation/templates/changelog-entry-template.md +187 -0
  37. package/.codex/skills/genesis-docs-automation/templates/handoff-template.md +297 -0
  38. package/.codex/skills/genesis-harness/SKILL.md +734 -82
  39. package/.codex/skills/genesis-harness/checklists/bug-fix-qa.md +169 -0
  40. package/.codex/skills/genesis-harness/checklists/new-feature-qa.md +157 -0
  41. package/.codex/skills/genesis-harness/checklists/refactor-qa.md +216 -0
  42. package/.codex/skills/genesis-harness/checklists/requirements-validation.md +211 -0
  43. package/.codex/skills/genesis-harness/resources/change-impact-matrix-template.md +204 -0
  44. package/.codex/skills/genesis-harness/resources/foundation-phase-template.md +131 -0
  45. package/.codex/skills/genesis-harness/resources/phase-00-foundation-template.md +76 -0
  46. package/.codex/skills/genesis-harness/resources/post-implementation-guide.md +347 -0
  47. package/.codex/skills/genesis-harness/scripts/check-architecture-boundaries.sh +23 -23
  48. package/.codex/skills/genesis-harness/scripts/check-docs-sync.sh +24 -24
  49. package/.codex/skills/genesis-harness/scripts/check-no-debug-logs.sh +21 -21
  50. package/.codex/skills/genesis-harness/scripts/check-required-planning-files.sh +46 -46
  51. package/.codex/skills/genesis-harness/scripts/check-spec-changelog.sh +24 -24
  52. package/.codex/skills/genesis-harness/scripts/check-task-tracking.sh +25 -25
  53. package/.codex/skills/genesis-harness/scripts/compact-context.sh +54 -0
  54. package/.codex/skills/genesis-harness/scripts/create-adr.sh +74 -74
  55. package/.codex/skills/genesis-harness/scripts/create-bug.sh +160 -160
  56. package/.codex/skills/genesis-harness/scripts/create-feature.sh +217 -217
  57. package/.codex/skills/genesis-harness/scripts/detect-stack.sh +26 -26
  58. package/.codex/skills/genesis-harness/scripts/init-planning.sh +750 -719
  59. package/.codex/skills/genesis-harness/scripts/list-changed-files.sh +12 -12
  60. package/.codex/skills/genesis-harness/scripts/offload-log.sh +72 -0
  61. package/.codex/skills/genesis-harness/scripts/run-verification.sh +47 -47
  62. package/.codex/skills/genesis-harness/scripts/run-verify-loop.sh +75 -0
  63. package/.codex/skills/genesis-harness/scripts/update-state.sh +33 -33
  64. package/.codex/skills/genesis-harness-engineering/SKILL.md +159 -0
  65. package/.codex/skills/genesis-harness-engineering/checklists/checklist.md +48 -0
  66. package/.codex/skills/genesis-harness-engineering/examples/example.md +57 -0
  67. package/.codex/skills/genesis-harness-engineering/playbooks/harness-evolution.md +99 -0
  68. package/.codex/skills/genesis-harness-engineering/templates/harness-change-template.md +37 -0
  69. package/.codex/skills/genesis-observability-automation/SKILL.md +382 -0
  70. package/.codex/skills/genesis-observability-automation/agents/openai.yaml +7 -0
  71. package/.codex/skills/genesis-observability-automation/examples/example.md +86 -0
  72. package/.codex/skills/genesis-performance-profiling/SKILL.md +510 -0
  73. package/.codex/skills/genesis-performance-profiling/agents/openai.yaml +6 -0
  74. package/.codex/skills/genesis-performance-profiling/checklists/optimization-verification.md +199 -0
  75. package/.codex/skills/genesis-performance-profiling/checklists/performance-baseline.md +183 -0
  76. package/.codex/skills/genesis-performance-profiling/examples/example.md +234 -0
  77. package/.codex/skills/genesis-performance-profiling/observability/performance-tracking.md +202 -0
  78. package/.codex/skills/genesis-performance-profiling/playbooks/load-testing-orchestration.md +593 -0
  79. package/.codex/skills/genesis-performance-profiling/playbooks/profiling-playbook.md +601 -0
  80. package/.codex/skills/genesis-performance-profiling/templates/load-test-config-template.md +428 -0
  81. package/.codex/skills/genesis-performance-profiling/templates/performance-report-template.md +238 -0
  82. package/.codex/skills/genesis-release-orchestration/SKILL.md +653 -0
  83. package/.codex/skills/genesis-release-orchestration/agents/openai.yaml +7 -0
  84. package/.codex/skills/genesis-release-orchestration/checklists/post-deployment-verification.md +274 -0
  85. package/.codex/skills/genesis-release-orchestration/checklists/pre-release-validation.md +220 -0
  86. package/.codex/skills/genesis-release-orchestration/examples/example.md +78 -0
  87. package/.codex/skills/genesis-release-orchestration/observability/release-tracking.md +253 -0
  88. package/.codex/skills/genesis-release-orchestration/playbooks/canary-deployment-orchestration.md +472 -0
  89. package/.codex/skills/genesis-release-orchestration/playbooks/semantic-versioning-automation.md +494 -0
  90. package/.codex/skills/genesis-release-orchestration/templates/deployment-strategy-template.md +303 -0
  91. package/.codex/skills/genesis-release-orchestration/templates/release-runbook-template.md +420 -0
  92. package/.codex/skills/genesis-research-first/SKILL.md +237 -0
  93. package/.codex/skills/genesis-research-first/agents/openai.yaml +7 -0
  94. package/.codex/skills/genesis-research-first/examples/example.md +85 -0
  95. package/.codex/skills/genesis-spec-propagation/SKILL.md +534 -0
  96. package/.codex/skills/genesis-spec-propagation/agents/openai.yaml +7 -0
  97. package/.codex/skills/genesis-spec-propagation/checklists/phase-update-verification.md +384 -0
  98. package/.codex/skills/genesis-spec-propagation/checklists/spec-change-detection.md +257 -0
  99. package/.codex/skills/genesis-spec-propagation/examples/example.md +63 -0
  100. package/.codex/skills/genesis-spec-propagation/observability/propagation-tracking.md +373 -0
  101. package/.codex/skills/genesis-spec-propagation/playbooks/breaking-change-propagation.md +692 -0
  102. package/.codex/skills/genesis-spec-propagation/playbooks/feature-change-propagation.md +434 -0
  103. package/.codex/skills/genesis-spec-propagation/templates/migration-guide-template.md +407 -0
  104. package/.codex/skills/spec-impact-engine/SKILL.md +504 -0
  105. package/.codex/skills/spec-impact-engine/agents/openai.yaml +7 -0
  106. package/.codex/skills/spec-impact-engine/detect-spec-changes.sh +262 -0
  107. package/.codex/skills/spec-impact-engine/examples/example.md +98 -0
  108. package/.codex/skills/spec-impact-engine/templates/impact-report.md +248 -0
  109. package/.codex/skills/spec-impact-engine/templates/migration-guide.md +223 -0
  110. package/.codex-plugin/plugin.json +1 -1
  111. package/README.EN.md +719 -0
  112. package/README.VI.md +712 -0
  113. package/README.md +261 -107
  114. package/VERSION +1 -1
  115. package/bin/genesis-harness.js +20 -11
  116. package/package.json +1 -1
  117. package/scripts/README.md +342 -0
  118. package/scripts/compact-context.sh +54 -0
  119. package/scripts/detect-changes.sh +152 -0
  120. package/scripts/install.sh +50 -41
  121. package/scripts/offload-log.sh +72 -0
  122. package/scripts/run-evals.sh +70 -43
  123. package/scripts/run-verify-loop.sh +75 -0
  124. package/scripts/uninstall.sh +52 -43
  125. package/scripts/verify.sh +165 -73
  126. package/.codex/skills/harness-engineering-skill/SKILL.md +0 -45
  127. package/.codex/skills/harness-engineering-skill/checklists/checklist.md +0 -8
  128. package/.codex/skills/harness-engineering-skill/examples/example.md +0 -4
  129. package/.codex/skills/harness-engineering-skill/templates/harness-change-template.md +0 -8
  130. /package/.codex/skills/{ai-provider-skill → genesis-ai-provider}/SKILL.md +0 -0
  131. /package/.codex/skills/{ai-provider-skill → genesis-ai-provider}/agents/openai.yaml +0 -0
  132. /package/.codex/skills/{ai-provider-skill → genesis-ai-provider}/checklists/checklist.md +0 -0
  133. /package/.codex/skills/{ai-provider-skill → genesis-ai-provider}/examples/example.md +0 -0
  134. /package/.codex/skills/{ai-provider-skill → genesis-ai-provider}/templates/provider-contract-template.md +0 -0
  135. /package/.codex/skills/{api-contract-skill → genesis-api-contract}/SKILL.md +0 -0
  136. /package/.codex/skills/{api-contract-skill → genesis-api-contract}/agents/openai.yaml +0 -0
  137. /package/.codex/skills/{api-contract-skill → genesis-api-contract}/checklists/checklist.md +0 -0
  138. /package/.codex/skills/{api-contract-skill → genesis-api-contract}/examples/example.md +0 -0
  139. /package/.codex/skills/{api-contract-skill → genesis-api-contract}/templates/api-contract-template.md +0 -0
  140. /package/.codex/skills/{architecture-skill → genesis-architecture}/SKILL.md +0 -0
  141. /package/.codex/skills/{architecture-skill → genesis-architecture}/agents/openai.yaml +0 -0
  142. /package/.codex/skills/{architecture-skill → genesis-architecture}/checklists/checklist.md +0 -0
  143. /package/.codex/skills/{architecture-skill → genesis-architecture}/examples/example.md +0 -0
  144. /package/.codex/skills/{architecture-skill → genesis-architecture}/templates/architecture-decision-template.md +0 -0
  145. /package/.codex/skills/{codebase-map-skill → genesis-codebase-map}/SKILL.md +0 -0
  146. /package/.codex/skills/{codebase-map-skill → genesis-codebase-map}/agents/openai.yaml +0 -0
  147. /package/.codex/skills/{codebase-map-skill → genesis-codebase-map}/checklists/checklist.md +0 -0
  148. /package/.codex/skills/{codebase-map-skill → genesis-codebase-map}/examples/example.md +0 -0
  149. /package/.codex/skills/{codebase-map-skill → genesis-codebase-map}/templates/map-update-template.md +0 -0
  150. /package/.codex/skills/{design-spec-skill → genesis-design-spec}/SKILL.md +0 -0
  151. /package/.codex/skills/{design-spec-skill → genesis-design-spec}/agents/openai.yaml +0 -0
  152. /package/.codex/skills/{design-spec-skill → genesis-design-spec}/checklists/checklist.md +0 -0
  153. /package/.codex/skills/{design-spec-skill → genesis-design-spec}/examples/example.md +0 -0
  154. /package/.codex/skills/{design-spec-skill → genesis-design-spec}/templates/design-spec-template.md +0 -0
  155. /package/.codex/skills/{docs-skill → genesis-docs}/SKILL.md +0 -0
  156. /package/.codex/skills/{docs-skill → genesis-docs}/agents/openai.yaml +0 -0
  157. /package/.codex/skills/{docs-skill → genesis-docs}/checklists/checklist.md +0 -0
  158. /package/.codex/skills/{docs-skill → genesis-docs}/examples/example.md +0 -0
  159. /package/.codex/skills/{docs-skill → genesis-docs}/templates/docs-update-template.md +0 -0
  160. /package/.codex/skills/{harness-engineering-skill → genesis-harness-engineering}/agents/openai.yaml +0 -0
  161. /package/.codex/skills/{pipeline-orchestration-skill → genesis-pipeline-orchestration}/SKILL.md +0 -0
  162. /package/.codex/skills/{pipeline-orchestration-skill → genesis-pipeline-orchestration}/agents/openai.yaml +0 -0
  163. /package/.codex/skills/{pipeline-orchestration-skill → genesis-pipeline-orchestration}/checklists/checklist.md +0 -0
  164. /package/.codex/skills/{pipeline-orchestration-skill → genesis-pipeline-orchestration}/examples/example.md +0 -0
  165. /package/.codex/skills/{pipeline-orchestration-skill → genesis-pipeline-orchestration}/templates/orchestration-template.md +0 -0
  166. /package/.codex/skills/{planning-skill → genesis-planning}/SKILL.md +0 -0
  167. /package/.codex/skills/{planning-skill → genesis-planning}/agents/openai.yaml +0 -0
  168. /package/.codex/skills/{planning-skill → genesis-planning}/checklists/checklist.md +0 -0
  169. /package/.codex/skills/{planning-skill → genesis-planning}/examples/example.md +0 -0
  170. /package/.codex/skills/{planning-skill → genesis-planning}/templates/plan-template.md +0 -0
  171. /package/.codex/skills/{release-skill → genesis-release}/SKILL.md +0 -0
  172. /package/.codex/skills/{release-skill → genesis-release}/agents/openai.yaml +0 -0
  173. /package/.codex/skills/{release-skill → genesis-release}/checklists/checklist.md +0 -0
  174. /package/.codex/skills/{release-skill → genesis-release}/examples/example.md +0 -0
  175. /package/.codex/skills/{release-skill → genesis-release}/templates/release-checklist-template.md +0 -0
  176. /package/.codex/skills/{research-skill → genesis-research}/SKILL.md +0 -0
  177. /package/.codex/skills/{research-skill → genesis-research}/agents/openai.yaml +0 -0
  178. /package/.codex/skills/{research-skill → genesis-research}/checklists/checklist.md +0 -0
  179. /package/.codex/skills/{research-skill → genesis-research}/examples/example.md +0 -0
  180. /package/.codex/skills/{research-skill → genesis-research}/templates/research-note-template.md +0 -0
@@ -0,0 +1,303 @@
1
+ # Deployment Strategy Template
2
+
3
+ **Release**: v[X.Y.Z]
4
+ **Risk Score**: [N/10]
5
+ **Strategy Selected**: [Blue-Green|Canary|Rolling]
6
+ **Approval Date**: [YYYY-MM-DD]
7
+ **Deployment Window**: [YYYY-MM-DD HH:MM-HH:MM UTC]
8
+
9
+ ---
10
+
11
+ ## Strategy Selection Criteria
12
+
13
+ **Risk Score 1-2 (LOW)** → Rolling Deployment
14
+ **Risk Score 3-5 (MEDIUM)** → Blue-Green Deployment
15
+ **Risk Score 6-8 (HIGH)** → Canary Deployment
16
+ **Risk Score 9-10 (CRITICAL)** → Canary + Manual approval at each stage
17
+
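The selection criteria above amount to a lookup from risk score to strategy. A minimal sketch in POSIX shell follows; `select_strategy` is a hypothetical helper name used for illustration and is not part of the package.

```bash
# Hypothetical helper: map a 1-10 risk score to the strategy listed above.
# The body runs in a subshell, so "exit" only ends the function call.
select_strategy() (
  score=$1
  case "$score" in
    1|2)   echo "Rolling" ;;
    3|4|5) echo "Blue-Green" ;;
    6|7|8) echo "Canary" ;;
    9|10)  echo "Canary + manual approval at each stage" ;;
    *)     echo "invalid risk score: $score" >&2; exit 1 ;;
  esac
)
```

Calling `select_strategy 4` prints `Blue-Green`; a score outside 1-10 prints an error to stderr and returns a non-zero status.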
18
+ ---
19
+
20
+ ## Blue-Green Deployment Strategy
21
+
22
+ **When to use**: Medium-risk releases (risk 3-5)
23
+
24
+ ### Overview
25
+
26
+ Two identical production environments:
27
+ - **Blue** = Current production (v2.5.3)
28
+ - **Green** = New version (v3.0.0)
29
+
30
+ Traffic is routed to one at a time. If issues occur, switch back instantly.
31
+
32
+ ### Timeline
33
+
34
+ ```
35
+ Stage 1: Deploy to Green (5-10 min)
36
+ - Deploy v3.0.0 to green environment
37
+ - Run health checks
38
+ - Verify all systems ready
39
+ - Keep blue (v2.5.3) running
40
+
41
+ Stage 2: Validate Green (5-10 min)
42
+ - Run smoke tests against green
43
+ - Verify response format correct (if breaking changes)
44
+ - Validate database migrations applied
45
+ - Confirm green ready for traffic
46
+
47
+ Stage 3: Switch Traffic (1-2 min)
48
+ - Update load balancer: 100% traffic → green (v3.0.0)
49
+ - Blue (v2.5.3) stops receiving traffic but stays running
50
+
51
+ Stage 4: Monitor Green (1-24 hours)
52
+ - Monitor error rate, latency, resources
53
+ - If issues found: Instant rollback (switch back to blue)
54
+ - If stable after 1-24 hours: Decommission blue, keep only green
55
+
56
+ Rollback (if needed): <5 seconds
57
+ - Switch traffic: green → blue (instant)
58
+ - No data loss (traffic never split)
59
+ ```
60
+
61
+ ### Deployment Commands
62
+
63
+ ```bash
64
+ # 1. Deploy to green environment
65
+ kubectl apply -f deployment-v3.0.0-green.yaml -n production
66
+
67
+ # 2. Wait for green ready
68
+ kubectl rollout status deployment/myapp-green -n production --timeout=10m
69
+
70
+ # 3. Validate green environment
71
+ curl -f https://green-staging.myapp.example.com/health
72
+ curl -f https://green-staging.myapp.example.com/api/users/1
73
+
74
+ # 4. Switch traffic: Load Balancer route 100% to green
75
+ aws elbv2 modify-rule --rule-arn arn:aws:elasticloadbalancing:... \
76
+ --actions "Type=forward,ForwardConfig={TargetGroups=[{TargetGroupArn=arn:...green:...,Weight=100}]}"
77
+
78
+ # 5. Monitor for 1-24 hours
79
+ watch -n 10 'kubectl top pods -n production'
80
+
81
+ # 6. If rollback needed: Switch back to blue (instant)
82
+ aws elbv2 modify-rule --rule-arn arn:aws:elasticloadbalancing:... \
83
+ --actions "Type=forward,ForwardConfig={TargetGroups=[{TargetGroupArn=arn:...blue:...,Weight=100}]}"
84
+ ```
85
+
86
+ ### Pros & Cons
87
+
88
+ **Pros**:
89
+ - ✅ Zero-downtime deployment
90
+ - ✅ Instant rollback (< 5 seconds)
91
+ - ✅ Can run parallel tests
92
+ - ✅ No traffic splitting complexity
93
+
94
+ **Cons**:
95
+ - ❌ Double infrastructure cost (need 2x environments)
96
+ - ❌ Data consistency challenges (2 databases)
97
+ - ❌ Not suitable for frequent deployments
98
+
99
+ ---
100
+
101
+ ## Canary Deployment Strategy
102
+
103
+ **When to use**: High-risk and critical releases (risk 6-10; risk 9-10 adds manual approval at each stage)
104
+
105
+ ### Overview
106
+
107
+ Gradually roll out new version to small % of traffic, increasing as confidence grows.
108
+
109
+ - Stage 1: 1% traffic (1 hour monitoring)
110
+ - Stage 2: 10% traffic (2 hours monitoring)
111
+ - Stage 3: 50% traffic (4 hours monitoring)
112
+ - Stage 4: 100% traffic (24+ hours monitoring)
113
+
114
+ If issues at any stage: Rollback all traffic to previous version.
115
+
116
+ ### Timeline
117
+
118
+ ```
119
+ Stage 1: 1% Traffic Canary (1 hour)
120
+ - Deploy v3.0.0 alongside v2.5.3
121
+ - Route 1% of traffic to v3.0.0
122
+ - Monitor error rate, latency, resource usage
123
+ - Go/No-go decision after 1 hour
124
+
125
+ Stage 2: 10% Traffic Canary (2 hours)
126
+ - Increase to 10% traffic if Stage 1 passed
127
+ - Monitor for 2 hours (covers lunch rush if applicable)
128
+ - Go/No-go decision after 2 hours
129
+
130
+ Stage 3: 50% Traffic Canary (4 hours)
131
+ - Increase to 50% traffic if Stage 2 passed
132
+ - Monitor for 4 hours (covers peak traffic periods)
133
+ - Go/No-go decision after 4 hours
134
+
135
+ Stage 4: 100% Traffic - Full Rollout (24+ hours)
136
+ - Route 100% traffic to v3.0.0
137
+ - Keep v2.5.3 running for 24 hours (instant rollback)
138
+ - Continuous monitoring for 24 hours
139
+ - After 24 hours stable: Decommission v2.5.3
140
+
141
+ Total deployment: 8+ hours (Stage 1→4)
142
+ Rollback window: 24 hours post-complete deployment
143
+ ```
144
+
145
+ ### Deployment Commands
146
+
147
+ ```bash
148
+ # Stage 1: Deploy and route 1% traffic
149
+ kubectl apply -f deployment-v3.0.0.yaml -n production
150
+ kubectl get pods -n production -w # Verify ready (Ctrl-C to stop watching)
151
+
152
+ # Route 1% traffic to v3.0.0, 99% to v2.5.3
153
+ aws elbv2 modify-rule --rule-arn <arn> \
154
+ --actions "Type=forward,ForwardConfig={TargetGroups=[{TargetGroupArn=<v3-tg>,Weight=1},{TargetGroupArn=<v2-tg>,Weight=99}]}"
155
+
156
+ # Monitor Stage 1 (1 hour)
157
+ # ... check error rate, latency, etc.
158
+
159
+ # Stage 2: Increase to 10%
160
+ aws elbv2 modify-rule --rule-arn <arn> \
161
+ --actions "Type=forward,ForwardConfig={TargetGroups=[{TargetGroupArn=<v3-tg>,Weight=10},{TargetGroupArn=<v2-tg>,Weight=90}]}"
162
+
163
+ # Stage 3: Increase to 50%
164
+ aws elbv2 modify-rule --rule-arn <arn> \
165
+ --actions "Type=forward,ForwardConfig={TargetGroups=[{TargetGroupArn=<v3-tg>,Weight=50},{TargetGroupArn=<v2-tg>,Weight=50}]}"
166
+
167
+ # Stage 4: 100% traffic (full rollout)
168
+ aws elbv2 modify-rule --rule-arn <arn> \
169
+ --actions "Type=forward,ForwardConfig={TargetGroups=[{TargetGroupArn=<v3-tg>,Weight=100}]}"
170
+
171
+ # Rollback (if needed at any stage): Instant switch to previous version
172
+ aws elbv2 modify-rule --rule-arn <arn> \
173
+ --actions "Type=forward,ForwardConfig={TargetGroups=[{TargetGroupArn=<v2-tg>,Weight=100}]}"
174
+ ```
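The four stage commands differ only in the traffic weights, so the `--actions` value can be generated rather than retyped. The sketch below is one way to do that, assuming the weighted target groups are nested under `ForwardConfig`, which is the field the ELBv2 `Action` structure uses for weighted forwarding. `canary_actions` and its argument names are hypothetical.

```bash
# Hypothetical helper: print the --actions value for a given canary weight.
# $1 = new-version target group ARN, $2 = previous-version target group ARN,
# $3 = percentage of traffic (0-100) to send to the new version.
canary_actions() (
  new_tg=$1; old_tg=$2; weight=$3
  if [ "$weight" -ge 100 ]; then
    # Full rollout: only the new target group receives traffic.
    printf 'Type=forward,ForwardConfig={TargetGroups=[{TargetGroupArn=%s,Weight=100}]}\n' "$new_tg"
  else
    # Split traffic: the previous version gets the remainder.
    printf 'Type=forward,ForwardConfig={TargetGroups=[{TargetGroupArn=%s,Weight=%s},{TargetGroupArn=%s,Weight=%s}]}\n' \
      "$new_tg" "$weight" "$old_tg" "$((100 - weight))"
  fi
)
```

A stage change could then be written as `aws elbv2 modify-rule --rule-arn "$RULE_ARN" --actions "$(canary_actions "$V3_TG" "$V2_TG" 10)"`, where `RULE_ARN`, `V3_TG`, and `V2_TG` are placeholder variables you would set yourself.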
175
+
176
+ ### Go/No-Go Criteria (At Each Stage)
177
+
178
+ **GO** if:
179
+ - ✅ Error rate <1% (target <0.1%)
180
+ - ✅ Latency within baseline (±10%)
181
+ - ✅ No critical issues
182
+ - ✅ Consumer feedback positive
183
+ - ✅ Team lead approves
184
+
185
+ **NO-GO** (Pause) if:
186
+ - ⚠️ Error rate 1-5% (investigate, pause 30 min)
187
+ - ⚠️ Latency spike >50% (investigate, may indicate load issue)
188
+ - ⚠️ Resource exhaustion (tune, retry)
189
+
190
+ **ROLLBACK** if:
191
+ - ❌ Error rate >5%
192
+ - ❌ Complete service unavailability
193
+ - ❌ Data corruption
194
+ - ❌ Security vulnerability
195
+
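The error-rate thresholds in the criteria above can be encoded as a small gate. This is a sketch only: it covers the error-rate rule alone (latency, resource, and approval checks are left out), and `canary_decision` is a hypothetical name. `awk` handles the comparison because POSIX shell arithmetic is integer-only.

```bash
# Hypothetical helper: classify an error rate (in percent) using the
# thresholds above: <1% GO, 1-5% NO-GO (pause), >5% ROLLBACK.
canary_decision() (
  rate=$1
  awk -v r="$rate" 'BEGIN {
    if (r < 1)       print "GO"
    else if (r <= 5) print "NO-GO"
    else             print "ROLLBACK"
  }'
)
```

For example, `canary_decision 0.05` prints `GO` and `canary_decision 7.5` prints `ROLLBACK`. An error rate of exactly 5% is treated as a pause here, since the template only calls for rollback above 5%.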
196
+ ### Pros & Cons
197
+
198
+ **Pros**:
199
+ - ✅ Low risk (catch issues early)
200
+ - ✅ Minimal blast radius at each stage
201
+ - ✅ Customer feedback integrated
202
+ - ✅ Gradual confidence increase
203
+
204
+ **Cons**:
205
+ - ❌ Slow deployment (8+ hours)
206
+ - ❌ Requires careful monitoring
207
+ - ❌ Split traffic may hide issues
208
+ - ❌ Complex state management (need both versions running)
209
+
210
+ ---
211
+
212
+ ## Rolling Deployment Strategy
213
+
214
+ **When to use**: Low-risk releases (risk 1-2)
215
+
216
+ ### Overview
217
+
218
+ Replace instances one-by-one, keeping service available throughout.
219
+
220
+ - Wave 1: Update 25% of instances (1 hour)
221
+ - Wave 2: Update 50% of instances (1 hour)
222
+ - Wave 3: Update 75% of instances (1 hour)
223
+ - Wave 4: Update 100% of instances (1 hour)
224
+
225
+ If an issue is detected: the rollout stops and the instances already updated can be rolled back.
226
+
227
+ ### Timeline
228
+
229
+ ```
230
+ Wave 1: 25% Instances (1 hour)
231
+ - Cordon off 25% of instances
232
+ - Deploy v3.0.0 to cordoned instances
233
+ - Verify health checks pass
234
+ - Return instances to rotation
235
+ - Monitor error rate (should be ~0%)
236
+
237
+ Wave 2: 50% Instances (1 hour)
238
+ - Repeat for next 25%
239
+
240
+ Wave 3: 75% Instances (1 hour)
241
+ - Repeat for next 25%
242
+
243
+ Wave 4: 100% Instances (1 hour)
244
+ - Deploy to final 25%
245
+ - All instances now v3.0.0
246
+
247
+ Total deployment: 4 hours
248
+ Rollback: Can stop deployment mid-wave and roll back in-progress instances
249
+ ```
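To turn the 25/50/75/100% waves into concrete instance counts, the cumulative target for each wave can be computed from the fleet size. The sketch below assumes rounding up, so small fleets still update at least one instance in the first wave; that rounding choice and the name `wave_targets` are assumptions, not something the template specifies.

```bash
# Hypothetical helper: print the cumulative number of updated instances
# after each of the four waves, one count per line.
wave_targets() (
  total=$1
  for pct in 25 50 75 100; do
    # Integer ceiling of total * pct / 100
    echo "$(( (total * pct + 99) / 100 ))"
  done
)
```

With 8 instances the targets are 2, 4, 6, and 8; with 10 instances they are 3, 5, 8, and 10.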
250
+
251
+ ### Pros & Cons
252
+
253
+ **Pros**:
254
+ - ✅ Fast (4 hours vs 8+ hours)
255
+ - ✅ Zero-downtime (service always available)
256
+ - ✅ Simple implementation
257
+ - ✅ Low infrastructure cost
258
+
259
+ **Cons**:
260
+ - ❌ Higher risk (old + new running simultaneously)
261
+ - ❌ Harder to roll back (partially deployed state)
262
+ - ❌ Can't instantly revert (need to re-deploy)
263
+
264
+ ---
265
+
266
+ ## Monitoring Metrics for All Strategies
267
+
268
+ **Critical Metrics** (check every 5-10 minutes):
269
+
270
+ | Metric | Target | Alert |
271
+ |--------|--------|-------|
272
+ | Error Rate (5xx) | <0.1% | >1% |
273
+ | Latency P95 | <200ms | >300ms |
274
+ | Latency P99 | <500ms | >750ms |
275
+ | CPU Usage | <70% | >80% |
276
+ | Memory Usage | <80% | >85% |
277
+ | Database Connections | Stable | Growing |
278
+ | Request Timeout Rate | <0.01% | >0.05% |
279
+
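The numeric alert thresholds in the table above can be checked mechanically. The sketch below hard-codes the Alert column for the six numeric metrics; the metric keys (`error_rate`, `latency_p95`, and so on) and the function name `metric_alert` are made up for illustration, and "Database Connections" is omitted because its threshold is a trend rather than a number.

```bash
# Hypothetical helper: compare one metric reading with its alert threshold.
# Prints ALERT when the value exceeds the Alert column, OK otherwise.
metric_alert() (
  metric=$1; value=$2
  case "$metric" in
    error_rate)   limit=1 ;;     # percent of 5xx responses
    latency_p95)  limit=300 ;;   # milliseconds
    latency_p99)  limit=750 ;;   # milliseconds
    cpu)          limit=80 ;;    # percent
    memory)       limit=85 ;;    # percent
    timeout_rate) limit=0.05 ;;  # percent
    *) echo "unknown metric: $metric" >&2; exit 2 ;;
  esac
  awk -v v="$value" -v l="$limit" 'BEGIN { if (v > l) print "ALERT"; else print "OK" }'
)
```

Note that `OK` here only means "below the alert threshold"; a value can still be above its target (for example, a P95 latency of 250ms).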
280
+ ---
281
+
282
+ ## Rollback Procedure (Universal for All Strategies)
283
+
284
+ ```bash
285
+ # Immediate: Switch traffic back to previous version
286
+ aws elbv2 modify-rule --rule-arn <arn> \
287
+ --actions "Type=forward,ForwardConfig={TargetGroups=[{TargetGroupArn=<prev-tg>,Weight=100}]}"
288
+
289
+ # Verify previous version handling traffic
290
+ curl -f http://[prod-url]/health
291
+
292
+ # Notify stakeholders
293
+ # Email: ops-team@company.com
294
+ # Slack: #incident-response
295
+ # Status page: Updated
296
+
297
+ # Start investigation
298
+ echo "Rollback complete at $(date)" >> INCIDENT.md
299
+ ```
300
+
301
+ ---
302
+
303
+ **DEPLOYMENT STRATEGY COMPLETE**
@@ -0,0 +1,420 @@
1
+ # Release Runbook Template
2
+
3
+ **Environment**: [Development|Staging|Production]
4
+ **Release Version**: v[X.Y.Z]
5
+ **Release Date**: [YYYY-MM-DD HH:MM UTC]
6
+ **Release Manager**: [Name]
7
+ **Approval Chain**: [List approvers]
8
+
9
+ ---
10
+
11
+ ## Pre-Deployment Phase (30-60 minutes)
12
+
13
+ ### 1. Pre-Deployment Verification (10 min)
14
+
15
+ **Checklist**:
16
+ - [ ] All tests passing (80%+ coverage)
17
+ - [ ] Version correctly updated in VERSION file
18
+ - [ ] CHANGELOG.md updated with release notes
19
+ - [ ] Breaking changes documented (if applicable)
20
+ - [ ] Migration guides created (if breaking changes)
21
+ - [ ] Database migrations tested in staging
22
+ - [ ] Configuration validated for this environment
23
+ - [ ] Rollback plan tested and documented
24
+ - [ ] Team on-call and ready
25
+ - [ ] Status page updated (if applicable)
26
+
27
+ **Command to verify**:
28
+ ```bash
29
+ # Check tests
30
+ npm run test -- --coverage
31
+
32
+ # Verify version
33
+ cat VERSION # Should show v[X.Y.Z]
34
+
35
+ # Verify changelog
36
+ head -20 CHANGELOG.md # Should have new version entry
37
+
38
+ # Verify migrations (if applicable)
39
+ ls -la db/migrations/ # Check latest migration script
40
+ ```
41
+
42
+ ### 2. Database Migrations (if applicable, 10-20 min)
43
+
44
+ **For Development/Staging**:
45
+ ```bash
46
+ # Run migrations
47
+ ./db/migrate-up.sh
48
+
49
+ # Verify: Check table structure
50
+ psql -d database_dev -c "\d users"
51
+
52
+ # Verify: Check data integrity
53
+ psql -d database_dev -c "SELECT COUNT(*) FROM users"
54
+
55
+ # Save migration timestamp
56
+ echo "Migration completed at $(date)" >> MIGRATION_LOG.md
57
+ ```
58
+
59
+ **For Production** (if breaking changes):
60
+ - [ ] Backup database created: `db_backup_v[X.Y.Z]_$(date +%Y%m%d).sql`
61
+ - [ ] Backup tested: Can restore from backup
62
+ - [ ] Estimated migration time: [X] minutes
63
+ - [ ] Data integrity verified post-migration
64
+ - [ ] Rollback migration script tested
65
+
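A bare `date` call produces output with spaces and colons, which is awkward in a filename. A minimal sketch of a backup-name helper follows; the compact UTC timestamp format and the name `backup_name` are assumptions chosen for illustration.

```bash
# Hypothetical helper: build a backup filename for a given version.
# The UTC timestamp contains no spaces or colons, so it is filename-safe.
backup_name() (
  version=$1
  printf 'db_backup_v%s_%s.sql\n' "$version" "$(date -u +%Y%m%dT%H%M%SZ)"
)
```

The result can be passed to whatever dump tool is in use, for example `pg_dump -d "$DB_NAME" > "$(backup_name 3.0.0)"`, with `DB_NAME` as a placeholder.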
66
+ ### 3. Configuration Validation (5-10 min)
67
+
68
+ **Check environment-specific config**:
69
+ ```bash
70
+ # Verify environment variables loaded
71
+ env | grep -E "DATABASE_URL|API_KEY|FEATURE_FLAG"
72
+
73
+ # Verify secrets available
74
+ grep -r "SECRET_KEY" config/ | grep -v "# PLACEHOLDER"
75
+
76
+ # Verify no hardcoded values
77
+ grep -r "hardcoded_value\|TODO_REPLACE\|CHANGE_ME" src/
78
+
79
+ # Validate config syntax
80
+ npm run config:validate
81
+ ```
82
+
83
+ **Config checklist**:
84
+ - [ ] Database URL correct for environment
85
+ - [ ] API keys/tokens available
86
+ - [ ] Feature flags set correctly
87
+ - [ ] Logging level appropriate (DEBUG in dev, INFO in prod)
88
+ - [ ] TLS certificates valid
89
+ - [ ] Domain names correct
90
+
91
+ ### 4. Pre-Deployment Approval (5 min)
92
+
93
+ **Get sign-off from**:
94
+ - [ ] Release Manager: Confirmed ready
95
+ - [ ] Tech Lead: All checks pass
96
+ - [ ] Ops Lead: Infrastructure ready
97
+ - [ ] Product (if breaking changes): Consumer communication sent
98
+
99
+ **Approval timestamp**: [HH:MM UTC]
100
+
101
+ ---
102
+
103
+ ## Deployment Phase (20-60 minutes)
104
+
105
+ ### For Development Environment
106
+
107
+ ```bash
108
+ # 1. Build Docker image
109
+ docker build -t myapp:v[X.Y.Z] .
110
+
111
+ # 2. Tag image
112
+ docker tag myapp:v[X.Y.Z] myapp:latest
113
+
114
+ # 3. Stop old container
115
+ docker stop myapp-container || true
116
+
117
+ # 4. Remove old container
118
+ docker rm myapp-container || true
119
+
120
+ # 5. Run new container
121
+ docker run -d \
122
+ --name myapp-container \
123
+ --env-file .env.dev \
124
+ -p 3000:3000 \
125
+ myapp:v[X.Y.Z]
126
+
127
+ # 6. Wait for startup (verify health endpoint)
128
+ sleep 5
129
+ curl -f http://localhost:3000/health || exit 1
130
+
131
+ echo "✅ Development deployment complete: v[X.Y.Z]"
132
+ ```
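The fixed `sleep 5` before the health check can fail on slow starts and wastes time on fast ones. A small retry loop is one alternative; the sketch below is generic (it retries any command) and `retry_until_ok` is a hypothetical name.

```bash
# Hypothetical helper: run a command until it succeeds.
# $1 = maximum attempts, $2 = seconds to wait between attempts,
# remaining arguments = the command to run.
retry_until_ok() (
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      exit 0            # command succeeded; stop retrying
    fi
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"    # wait only if another attempt remains
    fi
  done
  exit 1                # every attempt failed
)
```

In the development flow this could replace the sleep-then-curl pair, for example `retry_until_ok 10 2 curl -fsS http://localhost:3000/health || exit 1`.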
133
+
134
+ ### For Staging Environment (via Kubernetes)
135
+
136
+ ```bash
137
+ # 1. Build and push image to registry
138
+ docker build -t registry.example.com/myapp:v[X.Y.Z] .
139
+ docker push registry.example.com/myapp:v[X.Y.Z]
140
+
141
+ # 2. Update deployment
142
+ kubectl set image deployment/myapp \
143
+ myapp=registry.example.com/myapp:v[X.Y.Z] \
144
+ -n staging
145
+
146
+ # 3. Wait for rollout
147
+ kubectl rollout status deployment/myapp -n staging --timeout=5m
148
+
149
+ # 4. Verify pods running
150
+ kubectl get pods -n staging -l app=myapp
151
+
152
+ # 5. Port forward for testing
153
+ kubectl port-forward svc/myapp 3000:3000 -n staging & # Run in background so the script can continue
154
+
155
+ echo "✅ Staging deployment complete: v[X.Y.Z]"
156
+ ```
157
+
158
+ ### For Production Environment (Blue-Green or Canary)
159
+
160
+ **Blue-Green**:
161
+ ```bash
162
+ # 1. Build and push to registry
163
+ docker build -t registry.example.com/myapp:v[X.Y.Z] .
164
+ docker push registry.example.com/myapp:v[X.Y.Z]
165
+
166
+ # 2. Deploy to "green" environment (parallel to prod)
167
+ kubectl apply -f deployment-green-v[X.Y.Z].yaml -n production
168
+
169
+ # 3. Wait for health checks
170
+ kubectl rollout status deployment/myapp-green -n production --timeout=5m
171
+
172
+ # 4. Verify green environment healthy
173
+ curl -f https://green.myapp.example.com/health || exit 1
174
+
175
+ # 5. Switch traffic: blue (old) → green (new)
176
+ # This is typically done via load balancer or DNS switch
177
+ aws elbv2 modify-rule --rule-arn <arn> \
178
+ --actions "Type=forward,ForwardConfig={TargetGroups=[{TargetGroupArn=<green-tg>,Weight=100}]}"
179
+
180
+ # 6. Monitor: Keep blue running for instant rollback (1-2 hours)
181
+ echo "✅ Production deployment complete: Traffic switched to green (v[X.Y.Z])"
182
+ echo "⏱️ Blue environment available for 2 hours for instant rollback"
183
+ ```
184
+
185
+ **Canary** (for breaking changes):
186
+ ```bash
187
+ # 1-4: Same as blue-green deployment steps
188
+
189
+ # 5. Route small % of traffic to new version (canary)
190
+ aws elbv2 modify-rule --rule-arn <arn> \
191
+ --actions "Type=forward,ForwardConfig={TargetGroups=[{TargetGroupArn=<v3-tg>,Weight=1},{TargetGroupArn=<v2-tg>,Weight=99}]}"
192
+
193
+ # 6. Monitor Stage 1 (1 hour at 1% traffic)
194
+ # ... (see canary-deployment-orchestration.md for full stages)
195
+ ```
196
+
197
+ ---
198
+
199
+ ## Post-Deployment Verification (15-30 minutes)
200
+
201
+ ### 1. Health Check Verification (5 min)
202
+
203
+ ```bash
204
+ # Check liveness probe
205
+ curl -f http://[app-url]/health
206
+ # Expected: 200 OK with version info
207
+
208
+ # Check readiness probe
209
+ curl -f http://[app-url]/ready
210
+ # Expected: 200 OK with dependency status
211
+
212
+ # Check metrics endpoint
213
+ curl -f http://[app-url]/metrics | head -20
214
+ # Expected: Prometheus metrics output
215
+ ```
216
+
217
+ **Verification checklist**:
218
+ - [ ] Liveness endpoint: 200 OK
219
+ - [ ] Readiness endpoint: 200 OK
220
+ - [ ] Version matches v[X.Y.Z]
221
+ - [ ] Database connected: Yes
222
+ - [ ] Cache connected: Yes (if applicable)
223
+ - [ ] External services: Available
224
+
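The "Version matches v[X.Y.Z]" item can be checked mechanically instead of by eye. The sketch below assumes the health endpoint returns a JSON body containing a string field named `version`; that field name and both helper functions are illustrative assumptions, not something the runbook specifies.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of the "version" key found in a
# JSON document read from stdin (assumes the value is a quoted string).
extract_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeeds only when the version reported in the health body ($2)
# equals the expected version ($1).
version_matches() {
  local expected="$1" body="$2" actual
  actual="$(printf '%s\n' "$body" | extract_version)"
  [ "$actual" = "$expected" ]
}
```

For example, `version_matches "v3.0.0" "$(curl -fsS http://[app-url]/health)" || exit 1` would fail the verification step when an old build is still serving traffic. A JSON-aware tool such as `jq` is a sturdier choice if it is available on the host.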
225
+ ### 2. Smoke Test Scenarios (5-10 min)
226
+
227
+ **Critical Workflow #1: Authentication**
228
+ ```bash
229
+ # Test login
230
+ curl -X POST http://[app-url]/api/login \
231
+ -H "Content-Type: application/json" \
232
+ -d '{"email":"test@example.com","password":"password"}'
233
+
234
+ # Expected response:
235
+ # { "token": "...", "user": { "id": 1, "email": "test@example.com" } }
236
+ ```
237
+
238
+ **Critical Workflow #2: Create & Read Data**
239
+ ```bash
240
+ # Create user
241
+ curl -X POST http://[app-url]/api/users \
242
+ -H "Authorization: Bearer [token]" \
243
+ -H "Content-Type: application/json" \
244
+ -d '{"name":"John","email":"john@example.com"}'
245
+
246
+ # Read user (verify format is correct for this version)
247
+ curl http://[app-url]/api/users/1 \
248
+ -H "Authorization: Bearer [token]"
249
+
250
+ # Expected for v3.0.0: { "data": { "id": 1, "name": "John" } }
251
+ # NOT: { "user": { "id": 1, "name": "John" } }
252
+ ```
253
+
254
+ **Critical Workflow #3: Business Logic**
255
+ ```bash
256
+ # Test key business workflow (e.g., payment processing)
257
+ curl -X POST http://[app-url]/api/payments \
258
+ -H "Authorization: Bearer [token]" \
259
+ -H "Content-Type: application/json" \
260
+ -d '{"amount":99.99,"currency":"USD"}'
261
+
262
+ # Expected: 200 OK with payment ID
263
+ ```
264
+
265
+ **Smoke Test Checklist**:
266
+ - [ ] Login works
267
+ - [ ] Create resource works
268
+ - [ ] Read resource works (new format if applicable)
269
+ - [ ] Update resource works
270
+ - [ ] Delete resource works
271
+ - [ ] Business critical endpoint works
272
+ - [ ] Error handling works (test 404, 400, 500 scenarios)
273
+
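To turn the smoke test checklist into a single pass/fail result, each check can be wrapped in a small runner that records failures instead of stopping at the first one. This is a minimal sketch; `run_check` and the `FAILED` counter are hypothetical names introduced for illustration.

```bash
#!/usr/bin/env bash
# Number of failed checks so far.
FAILED=0

# Hypothetical helper: run a command as a named check.
# $1 = human-readable check name, remaining arguments = command to run.
# Output of the command is discarded; only its exit status matters.
run_check() {
  local name="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $name"
  else
    echo "FAIL: $name"
    FAILED=$((FAILED + 1))
  fi
}
```

A run might call `run_check "Login works" curl -fsS -X POST http://[app-url]/api/login ...` once per checklist item and finish with `[ "$FAILED" -eq 0 ] || exit 1`. Call `run_check` directly rather than inside `$(...)`, since a subshell would discard the counter update.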
274
+ ### 3. Database Integrity Check (5 min)
275
+
276
+ ```bash
277
+ # Verify database accessible
278
+ psql -d [database] -c "SELECT 1"
279
+
280
+ # Check recent data
281
+ psql -d [database] -c "SELECT COUNT(*) FROM users"
282
+
283
+ # Confirm the most recent migrations were applied
284
+ psql -d [database] -c "SELECT * FROM schema_migrations ORDER BY version DESC LIMIT 5"
285
+
286
+ # Spot-check that a sample row reads back correctly
287
+ psql -d [database] -c "SELECT * FROM users LIMIT 1" | head -10
288
+ ```
289
+
290
+ **Database checklist**:
291
+ - [ ] Database connection successful
292
+ - [ ] Tables present
293
+ - [ ] Data accessible
294
+ - [ ] No schema errors
295
+ - [ ] Data integrity checks pass
296
+
297
+ ### 4. Performance Baseline (5 min)
298
+
299
+ ```bash
300
+ # Get baseline metrics
301
+ curl http://[app-url]/metrics | grep http_request_duration_seconds | head -10
302
+
303
+ # Expected: P95 request latency <200ms (read it from the histogram buckets or summary quantiles shown)
304
+ ```
305
+
306
+ **Performance checklist**:
307
+ - [ ] Response time: <200ms P95
308
+ - [ ] Error rate: <0.1%
309
+ - [ ] CPU usage: <70%
310
+ - [ ] Memory usage: <80%
311
+ - [ ] Cache hit rate: >80% (if applicable)
312
+
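The raw `/metrics` output does not state P95 directly, so the "<200ms P95" item needs a small calculation. The sketch below assumes `http_request_duration_seconds` is exposed as a Prometheus histogram with a single label set, buckets listed in ascending `le` order, and no trailing timestamps; `p95_upper_bound` is a hypothetical helper. It reports the smallest bucket bound that covers at least 95% of requests, which is an upper bound on P95 rather than an interpolated value.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text exposition on stdin and print
# the smallest `le` bound (in seconds) whose cumulative count covers at
# least 95% of all observed requests. Exits non-zero if no data is found.
p95_upper_bound() {
  awk '
    BEGIN { n = 0; total = 0 }
    /^http_request_duration_seconds_bucket\{/ {
      if (match($0, /le="[^"]+"/)) {
        # Strip the leading le=" (4 chars) and the trailing quote.
        le = substr($0, RSTART + 4, RLENGTH - 5)
        count = $NF + 0
        bounds[n] = le
        counts[n] = count
        n++
        if (le == "+Inf") total = count
      }
    }
    END {
      if (total == 0) exit 1
      for (i = 0; i < n; i++) {
        if (counts[i] >= 0.95 * total) { print bounds[i]; exit 0 }
      }
      exit 1
    }
  '
}
```

With these assumptions, `curl -fsS http://[app-url]/metrics | p95_upper_bound` printing `0.2` or lower would satisfy the 200ms target. If the service exports several label sets (per route or method), the buckets would need to be summed per `le` first, or the check delegated to a PromQL `histogram_quantile` query.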
313
+ ---
314
+
315
+ ## Rollback Phase (If Needed - < 5 minutes)
316
+
317
+ ### For Development/Staging
318
+
319
+ ```bash
320
+ # Get previous version
321
+ PREVIOUS_VERSION=$(git tag | sort -V | tail -2 | head -1)
322
+
323
+ # Stop current version
324
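+ docker stop myapp-container
+ docker rm myapp-container 2>/dev/null || true  # free the name for the docker run below; no-op if started with --rm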
325
+
326
+ # Run previous version
327
+ docker run -d \
328
+ --name myapp-container \
329
+ --env-file .env.dev \
330
+ -p 3000:3000 \
331
+ myapp:${PREVIOUS_VERSION}
332
+
333
+ # Verify rollback successful
334
+ sleep 5
335
+ curl -f http://localhost:3000/health || exit 1
336
+
337
+ echo "✅ Rollback to ${PREVIOUS_VERSION} complete"
338
+ ```
339
+
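The `tail -2 | head -1` selection above assumes the version being rolled back is the newest tag in the repository. When that may not hold (for example, a newer tag exists but was never deployed), the previous version can be picked relative to the deployed tag instead. This is a minimal sketch; `previous_version` is a hypothetical helper, and it relies on `sort -V` just as the runbook does.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read tags on stdin (one per line) and print the tag
# that sorts immediately before the given current tag ($1).
# Exits non-zero if the current tag is missing or has no predecessor.
previous_version() {
  local current="$1"
  sort -V | awk -v cur="$current" '
    # Appending "" forces a string comparison even for numeric-looking tags.
    ($0 "") == (cur "") { found = 1; exit }
    { prev = $0 }
    END {
      if (!found || prev == "") exit 1
      print prev
    }
  '
}
```

Usage would be along the lines of `PREVIOUS_VERSION=$(git tag | previous_version "v3.0.0")`, with the current tag supplied from wherever the deployed version is recorded.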
340
+ ### For Production
341
+
342
+ ```bash
343
+ # Revert to previous version immediately
344
+ # If Blue-Green: Switch traffic back to blue (original)
345
+ aws elbv2 modify-rule --rule-arn <arn> \
346
+ --actions 'Type=forward,ForwardConfig={TargetGroups=[{TargetGroupArn=<blue-tg>,Weight=100}]}'
347
+
348
+ # If Canary: Immediately route 100% back to previous version
349
+ aws elbv2 modify-rule --rule-arn <arn> \
350
+ --actions 'Type=forward,ForwardConfig={TargetGroups=[{TargetGroupArn=<v2-tg>,Weight=100}]}'
351
+
352
+ # Verify rollback
353
+ curl -f http://[prod-url]/health
354
+
355
+ # Notify stakeholders
356
+ # Send: Slack message, Status page update, Email to on-call
357
+
358
+ echo "✅ Rollback to v[PREVIOUS] complete"
359
+ echo "🔴 Incident: v[X.Y.Z] deployment ROLLED BACK"
360
+ echo "📋 Investigation: See INCIDENT.md"
361
+ ```
362
+
363
+ ---
364
+
365
+ ## Post-Deployment Monitoring (24 hours)
366
+
367
+ **First 1 hour (Critical)**:
368
+ - [ ] Error rate: Maintain <0.1%
369
+ - [ ] Latency: Within 10% of baseline
370
+ - [ ] Throughput: Expected level
371
+ - [ ] Health checks: Continuously passing
372
+ - [ ] Consumer feedback: No critical issues
373
+
374
+ **24-hour window**:
375
+ - [ ] Error rate: Maintain <0.1%
376
+ - [ ] Latency: Stable, no spikes
377
+ - [ ] All business workflows: Operational
378
+ - [ ] Consumer migrations: On track (if breaking changes)
379
+ - [ ] On-call team: Standing down after 24 hours
380
+
381
+ **Checklist**:
382
+ - [ ] 1-hour post-deployment: All green
383
+ - [ ] 4-hour checkpoint: All green
384
+ - [ ] 24-hour checkpoint: Ready to remove rollback capability
385
+
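The "Latency: Within 10% of baseline" criterion can be evaluated at each checkpoint with a small numeric comparison. The sketch below is an illustration only: `within_tolerance` is a hypothetical helper, and it treats deviation in either direction as significant, which is one possible reading of the checklist item.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed when CURRENT deviates from BASELINE by at
# most PERCENT percent (in either direction).
# usage: within_tolerance BASELINE CURRENT PERCENT
within_tolerance() {
  awk -v base="$1" -v cur="$2" -v pct="$3" 'BEGIN {
    diff = cur - base
    if (diff < 0) diff = -diff
    # awk exit status 0 means success, so invert the boolean result.
    exit !(diff <= base * pct / 100)
  }'
}
```

For instance, `within_tolerance 200 215 10` succeeds (a 7.5% change), while `within_tolerance 200 225 10` fails (a 12.5% change) and would mark the checkpoint as not green.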
386
+ ---
387
+
388
+ ## Incident Log
389
+
390
+ **If any issues occur, document**:
391
+
392
+ ```
393
+ Time: [HH:MM UTC]
394
+ Severity: [LOW|MEDIUM|HIGH|CRITICAL]
395
+ Description: [What happened]
396
+ Impact: [How many users affected]
397
+ Root cause: [Why did it happen]
398
+ Action taken: [What was done]
399
+ Resolution: [How was it fixed]
400
+ Prevention: [How to prevent next time]
401
+ ```
402
+
403
+ ---
404
+
405
+ ## Sign-Off & Completion
406
+
407
+ **Deployment Complete**:
408
+ - Version deployed: v[X.Y.Z]
409
+ - Environment: [Development|Staging|Production]
410
+ - Date/Time: [YYYY-MM-DD HH:MM UTC]
411
+ - Deployed by: [Name]
412
+ - Approvals: [List all approvers]
413
+ - All health checks: ✅ PASS
414
+ - Smoke tests: ✅ PASS
415
+ - Issues: [None / List any]
416
+ - Status: ✅ **READY FOR NEXT STAGE**
417
+
418
+ ---
419
+
420
+ **RUNBOOK COMPLETE**