proagents 1.6.17 → 1.6.19

Files changed (185)
  1. package/.claude/settings.local.json +169 -0
  2. package/.proagents/AGENTS.md +2 -0
  3. package/.proagents/AI_INSTRUCTIONS.md +13 -0
  4. package/.proagents/ANTIGRAVITY.md +2 -0
  5. package/.proagents/BOLT.md +2 -0
  6. package/.proagents/CHATGPT.md +2 -0
  7. package/.proagents/CLAUDE.md +2 -0
  8. package/.proagents/GEMINI.md +2 -0
  9. package/.proagents/GROQ.md +2 -0
  10. package/.proagents/KIRO.md +2 -0
  11. package/.proagents/LOVABLE.md +2 -0
  12. package/.proagents/PROAGENTS.md +2 -0
  13. package/.proagents/REPLIT.md +2 -0
  14. package/.proagents/prompts/00-project-setup.md +878 -0
  15. package/.proagents/prompts/04-planning.md +38 -0
  16. package/.proagents/prompts/12-rnd.md +957 -0
  17. package/.proagents/workflow-modes/entry-modes.md +27 -0
  18. package/.proagents/worklog/_context.template.md +47 -0
  19. package/COMMANDS.md +654 -0
  20. package/README.md +16 -24
  21. package/package.json +2 -7
  22. package/.proagents/ai-models/README.md +0 -141
  23. package/.proagents/ai-models/cost-management.md +0 -362
  24. package/.proagents/ai-models/fallbacks.md +0 -342
  25. package/.proagents/ai-models/model-config.md +0 -318
  26. package/.proagents/ai-models/task-routing.md +0 -503
  27. package/.proagents/ai-training/README.md +0 -155
  28. package/.proagents/ai-training/continuous-learning.md +0 -413
  29. package/.proagents/ai-training/domain-knowledge.md +0 -378
  30. package/.proagents/ai-training/pattern-learning.md +0 -455
  31. package/.proagents/ai-training/training-data.md +0 -337
  32. package/.proagents/ai-training/user-preferences.md +0 -346
  33. package/.proagents/approval-workflows/README.md +0 -146
  34. package/.proagents/approval-workflows/approval-config.md +0 -332
  35. package/.proagents/approval-workflows/approval-stages.md +0 -503
  36. package/.proagents/approval-workflows/emergency-bypass.md +0 -351
  37. package/.proagents/approval-workflows/examples.md +0 -859
  38. package/.proagents/approval-workflows/notifications.md +0 -320
  39. package/.proagents/compliance/README.md +0 -206
  40. package/.proagents/compliance/access-control.md +0 -310
  41. package/.proagents/compliance/audit-logging.md +0 -444
  42. package/.proagents/compliance/compliance-frameworks.md +0 -429
  43. package/.proagents/compliance/reports.md +0 -491
  44. package/.proagents/compliance/retention-policies.md +0 -454
  45. package/.proagents/config-versioning/README.md +0 -120
  46. package/.proagents/config-versioning/changelog.md +0 -300
  47. package/.proagents/config-versioning/rollback.md +0 -283
  48. package/.proagents/config-versioning/versioning.md +0 -330
  49. package/.proagents/contract-testing/README.md +0 -223
  50. package/.proagents/contract-testing/contract-testing.md +0 -614
  51. package/.proagents/contract-testing/pact-integration.md +0 -507
  52. package/.proagents/contract-testing/schema-validation.md +0 -565
  53. package/.proagents/dependency-management/README.md +0 -140
  54. package/.proagents/dependency-management/automation.md +0 -363
  55. package/.proagents/dependency-management/compatibility.md +0 -319
  56. package/.proagents/dependency-management/security-scanning.md +0 -413
  57. package/.proagents/dependency-management/update-policies.md +0 -374
  58. package/.proagents/disaster-recovery/README.md +0 -247
  59. package/.proagents/disaster-recovery/automation.md +0 -366
  60. package/.proagents/disaster-recovery/backup-recovery.md +0 -571
  61. package/.proagents/disaster-recovery/incident-response.md +0 -565
  62. package/.proagents/disaster-recovery/rollback-procedures.md +0 -499
  63. package/.proagents/disaster-recovery/runbooks.md +0 -603
  64. package/.proagents/disaster-recovery/scenarios.md +0 -892
  65. package/.proagents/disaster-recovery/testing.md +0 -438
  66. package/.proagents/environments/README.md +0 -244
  67. package/.proagents/environments/configuration.md +0 -437
  68. package/.proagents/environments/promotion.md +0 -434
  69. package/.proagents/environments/setup.md +0 -420
  70. package/.proagents/examples/README.md +0 -55
  71. package/.proagents/examples/backend-nodejs/README.md +0 -188
  72. package/.proagents/examples/backend-nodejs/complete-conversation.md +0 -601
  73. package/.proagents/examples/backend-nodejs/proagents.config.yaml +0 -415
  74. package/.proagents/examples/backend-nodejs/workflow-example.md +0 -909
  75. package/.proagents/examples/fullstack-nextjs/README.md +0 -155
  76. package/.proagents/examples/fullstack-nextjs/complete-conversation.md +0 -604
  77. package/.proagents/examples/fullstack-nextjs/proagents.config.yaml +0 -287
  78. package/.proagents/examples/fullstack-nextjs/workflow-example.md +0 -553
  79. package/.proagents/examples/mobile-react-native/README.md +0 -171
  80. package/.proagents/examples/mobile-react-native/complete-conversation.md +0 -825
  81. package/.proagents/examples/mobile-react-native/proagents.config.yaml +0 -330
  82. package/.proagents/examples/mobile-react-native/workflow-example.md +0 -723
  83. package/.proagents/examples/web-frontend-react/README.md +0 -125
  84. package/.proagents/examples/web-frontend-react/complete-conversation.md +0 -556
  85. package/.proagents/examples/web-frontend-react/proagents.config.yaml +0 -183
  86. package/.proagents/examples/web-frontend-react/workflow-example.md +0 -603
  87. package/.proagents/existing-projects/README.md +0 -65
  88. package/.proagents/existing-projects/challenges.md +0 -861
  89. package/.proagents/existing-projects/coexistence-mode.md +0 -483
  90. package/.proagents/existing-projects/compatibility-assessment.md +0 -541
  91. package/.proagents/existing-projects/gradual-adoption.md +0 -515
  92. package/.proagents/existing-projects/migration-strategies.md +0 -788
  93. package/.proagents/existing-projects/pattern-reconciliation.md +0 -489
  94. package/.proagents/existing-projects/team-onboarding.md +0 -617
  95. package/.proagents/existing-projects/technical-debt-handling.md +0 -644
  96. package/.proagents/feature-flags/README.md +0 -263
  97. package/.proagents/feature-flags/ab-testing.md +0 -413
  98. package/.proagents/feature-flags/configuration.md +0 -420
  99. package/.proagents/feature-flags/kill-switches.md +0 -444
  100. package/.proagents/feature-flags/rollout-strategies.md +0 -392
  101. package/.proagents/history.log +0 -12
  102. package/.proagents/i18n/README.md +0 -133
  103. package/.proagents/i18n/extraction.md +0 -433
  104. package/.proagents/i18n/tms-integration.md +0 -332
  105. package/.proagents/i18n/translation-workflow.md +0 -413
  106. package/.proagents/i18n/validation.md +0 -355
  107. package/.proagents/logging/README.md +0 -276
  108. package/.proagents/logging/aggregation.md +0 -475
  109. package/.proagents/logging/log-levels.md +0 -376
  110. package/.proagents/logging/sensitive-data.md +0 -423
  111. package/.proagents/logging/structured-logging.md +0 -406
  112. package/.proagents/metrics/README.md +0 -69
  113. package/.proagents/metrics/code-quality-kpis.md +0 -461
  114. package/.proagents/metrics/deployment-metrics.md +0 -517
  115. package/.proagents/metrics/developer-productivity.md +0 -368
  116. package/.proagents/metrics/learning-effectiveness.md +0 -478
  117. package/.proagents/migrations/README.md +0 -77
  118. package/.proagents/migrations/from-claude-projects.md +0 -313
  119. package/.proagents/migrations/from-cursor-rules.md +0 -345
  120. package/.proagents/migrations/from-custom-workflows.md +0 -410
  121. package/.proagents/monitoring/README.md +0 -308
  122. package/.proagents/monitoring/alerting.md +0 -449
  123. package/.proagents/monitoring/dashboards.md +0 -454
  124. package/.proagents/monitoring/health-checks.md +0 -436
  125. package/.proagents/monitoring/metrics.md +0 -434
  126. package/.proagents/multi-project/README.md +0 -170
  127. package/.proagents/multi-project/coordinated-deploy.md +0 -510
  128. package/.proagents/multi-project/cross-project-deps.md +0 -395
  129. package/.proagents/multi-project/unified-changelog.md +0 -477
  130. package/.proagents/multi-project/walkthroughs/monorepo-setup.md +0 -787
  131. package/.proagents/multi-project/workspace-config.md +0 -408
  132. package/.proagents/notifications/README.md +0 -151
  133. package/.proagents/notifications/channels.md +0 -457
  134. package/.proagents/notifications/preferences.md +0 -415
  135. package/.proagents/notifications/routing.md +0 -449
  136. package/.proagents/notifications/scheduling.md +0 -425
  137. package/.proagents/notifications/templates.md +0 -446
  138. package/.proagents/offline-mode/README.md +0 -145
  139. package/.proagents/offline-mode/caching.md +0 -344
  140. package/.proagents/offline-mode/offline-operations.md +0 -312
  141. package/.proagents/offline-mode/queue-specifications.md +0 -679
  142. package/.proagents/offline-mode/sync.md +0 -475
  143. package/.proagents/parallel-features/README.md +0 -85
  144. package/.proagents/parallel-features/conflict-detection.md +0 -226
  145. package/.proagents/parallel-features/dependency-management.md +0 -392
  146. package/.proagents/parallel-features/merge-coordination.md +0 -506
  147. package/.proagents/parallel-features/tracking-system.md +0 -416
  148. package/.proagents/performance/README.md +0 -59
  149. package/.proagents/performance/bundle-analysis.md +0 -375
  150. package/.proagents/performance/load-testing.md +0 -563
  151. package/.proagents/performance/runtime-metrics.md +0 -489
  152. package/.proagents/performance/web-vitals.md +0 -425
  153. package/.proagents/plugins/README.md +0 -139
  154. package/.proagents/plugins/creating-plugins.md +0 -504
  155. package/.proagents/plugins/plugin-api.md +0 -467
  156. package/.proagents/plugins/plugin-registry.md +0 -276
  157. package/.proagents/reporting/README.md +0 -158
  158. package/.proagents/reporting/dashboards.md +0 -366
  159. package/.proagents/reporting/exports.md +0 -524
  160. package/.proagents/reporting/quality-metrics.md +0 -385
  161. package/.proagents/reporting/templates/README.md +0 -56
  162. package/.proagents/reporting/templates/dashboard-config.json +0 -187
  163. package/.proagents/reporting/templates/metrics-queries.md +0 -427
  164. package/.proagents/reporting/templates/react-dashboard.tsx +0 -544
  165. package/.proagents/reporting/templates/widgets.md +0 -451
  166. package/.proagents/reporting/velocity-metrics.md +0 -340
  167. package/.proagents/reverse-engineering/README.md +0 -151
  168. package/.proagents/reverse-engineering/architecture-extraction.md +0 -325
  169. package/.proagents/reverse-engineering/code-analysis.md +0 -377
  170. package/.proagents/reverse-engineering/dependency-mapping.md +0 -567
  171. package/.proagents/reverse-engineering/diagram-generation.md +0 -586
  172. package/.proagents/reverse-engineering/documentation-generation.md +0 -468
  173. package/.proagents/reverse-engineering/pattern-detection.md +0 -569
  174. package/.proagents/reverse-engineering/quality-assessment.md +0 -733
  175. package/.proagents/secrets/README.md +0 -278
  176. package/.proagents/secrets/access-control.md +0 -443
  177. package/.proagents/secrets/rotation.md +0 -403
  178. package/.proagents/secrets/scanning.md +0 -487
  179. package/.proagents/secrets/storage.md +0 -394
  180. package/.proagents/webhooks/README.md +0 -126
  181. package/.proagents/webhooks/endpoints.md +0 -298
  182. package/.proagents/webhooks/events.md +0 -316
  183. package/.proagents/webhooks/payloads.md +0 -325
  184. package/.proagents/webhooks/reliability.md +0 -363
  185. package/.proagents/webhooks/security.md +0 -380
@@ -1,565 +0,0 @@
- # Incident Response
-
- Incident response procedures and runbooks for development teams.
-
- ---
-
- ## Overview
-
- Structured incident response ensures rapid detection, response, and recovery from issues.
-
- ```
- ┌─────────────────────────────────────────────────────────────┐
- │ Incident Response Flow │
- ├─────────────────────────────────────────────────────────────┤
- │ │
- │ Detection ──► Triage ──► Response ──► Resolution ──► Review│
- │ │ │ │ │ │ │
- │ ▼ ▼ ▼ ▼ ▼ │
- │ Alerts Severity Containment Fix/Rollback Postmortem
- │ Monitor Assessment Communication Verification Learning
- │ │
- └─────────────────────────────────────────────────────────────┘
- ```
-
- ---
-
- ## Severity Levels
-
- | Level | Name | Description | Response Time | Examples |
- |-------|------|-------------|---------------|----------|
- | SEV1 | Critical | Complete outage | < 15 min | System down, data breach |
- | SEV2 | Major | Significant impact | < 30 min | Core feature broken |
- | SEV3 | Minor | Limited impact | < 2 hours | Non-critical bug |
- | SEV4 | Low | Minimal impact | < 24 hours | UI glitch |
-
- ### Severity Decision Matrix
-
- ```
- ┌─────────────────────────────────────────────────────────────┐
- │ Severity Assessment │
- ├─────────────────────────────────────────────────────────────┤
- │ │
- │ Impact: How many users affected? │
- │ ├── All users ─────────────────────────► SEV1/SEV2 │
- │ ├── Many users (>10%) ─────────────────► SEV2/SEV3 │
- │ ├── Some users (<10%) ─────────────────► SEV3/SEV4 │
- │ └── Few users (<1%) ───────────────────► SEV4 │
- │ │
- │ Urgency: Is it getting worse? │
- │ ├── Yes, rapidly ──────────────────────► Escalate │
- │ ├── Yes, slowly ───────────────────────► Monitor │
- │ └── No, stable ────────────────────────► Normal priority │
- │ │
- │ Data: Is data at risk? │
- │ ├── Data loss/corruption ──────────────► SEV1 │
- │ ├── Data exposure ─────────────────────► SEV1 │
- │ └── No data impact ────────────────────► Continue assess │
- │ │
- └─────────────────────────────────────────────────────────────┘
- ```
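The severity decision matrix in the removed file can be read as a small function. The following is a minimal sketch and not part of the package: the name `assess_severity`, its parameters, and the tie-breaking rule are assumptions for illustration. Where the matrix lists two levels (for example SEV1/SEV2), the sketch takes the less severe level as the baseline and moves to the more severe one when the incident is worsening rapidly. The matrix does not say how to classify exactly 10% of users, so the sketch treats that as "some users".

```python
def assess_severity(affected_fraction: float,
                    data_at_risk: bool = False,
                    worsening_rapidly: bool = False) -> str:
    """Return 'SEV1'..'SEV4' following the decision matrix.

    Assumptions (not stated in the source): "all users" means a fraction
    of 1.0, and a rapidly worsening incident is escalated to the more
    severe of the two levels the matrix lists for its impact band.
    """
    if not 0.0 <= affected_fraction <= 1.0:
        raise ValueError("affected_fraction must be between 0 and 1")

    # Data loss, corruption, or exposure is always SEV1 in the matrix.
    if data_at_risk:
        return "SEV1"

    # (baseline, escalated) severity numbers per impact band.
    if affected_fraction >= 1.0:        # all users
        baseline, escalated = 2, 1
    elif affected_fraction > 0.10:      # many users (>10%)
        baseline, escalated = 3, 2
    elif affected_fraction >= 0.01:     # some users (<10%)
        baseline, escalated = 4, 3
    else:                               # few users (<1%)
        baseline, escalated = 4, 4

    return f"SEV{escalated if worsening_rapidly else baseline}"
```

For instance, under these assumptions a stable issue affecting half of all users maps to SEV3, and the same issue maps to SEV2 if it is getting worse quickly.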
-
- ---
-
- ## Incident Response Phases
-
- ### Phase 1: Detection
-
- ```yaml
- detection:
-   sources:
-     automated:
-       - monitoring_alerts
-       - error_rate_spikes
-       - health_check_failures
-       - security_events
-
-     manual:
-       - user_reports
-       - support_tickets
-       - team_observations
-
-   initial_actions:
-     - acknowledge_alert
-     - create_incident_ticket
-     - notify_on_call
- ```
-
- ### Phase 2: Triage
-
- ```markdown
- ## Triage Checklist
-
- ☐ Confirm the issue is real (not false positive)
- ☐ Assess severity level
- ☐ Identify affected systems/users
- ☐ Determine if it's getting worse
- ☐ Assign incident commander
-
- ## Key Questions
-
- 1. What is broken?
- 2. Who is affected?
- 3. When did it start?
- 4. Is it getting worse?
- 5. What changed recently?
- ```
-
- ### Phase 3: Response
-
- ```yaml
- response:
-   immediate_actions:
-     sev1:
-       - "Page incident commander"
-       - "Start incident channel"
-       - "Begin customer communication"
-       - "Consider immediate rollback"
-
-     sev2:
-       - "Alert on-call team"
-       - "Create incident channel"
-       - "Begin investigation"
-
-     sev3:
-       - "Assign to engineer"
-       - "Track in ticket system"
-       - "Schedule fix"
-
-   communication:
-     internal:
-       - incident_channel
-       - status_updates
-     external:
-       - status_page
-       - customer_notification
- ```
-
- ### Phase 4: Resolution
-
- ```markdown
- ## Resolution Steps
-
- 1. **Contain** - Stop the bleeding
-    - Rollback if needed
-    - Disable affected features
-    - Block problematic traffic
-
- 2. **Fix** - Address root cause
-    - Identify root cause
-    - Implement fix
-    - Test fix
-
- 3. **Verify** - Confirm resolution
-    - Check metrics normalized
-    - Verify user reports stopped
-    - Confirm monitoring green
-
- 4. **Communicate** - Update stakeholders
-    - Update status page
-    - Notify affected users
-    - Close incident channel
- ```
-
- ### Phase 5: Review
-
- ```markdown
- ## Post-Incident Review
-
- **Within 24-48 hours:**
- - Document timeline
- - Identify root cause
- - List contributing factors
-
- **Within 1 week:**
- - Hold blameless postmortem
- - Document learnings
- - Create action items
- - Update runbooks
- ```
-
- ---
-
- ## Incident Runbooks
-
- ### SEV1: Complete System Outage
-
- ```markdown
- ## SEV1 Runbook: Complete System Outage
-
- **Time Target:** Resolution within 1 hour
-
- ### Immediate Actions (0-5 minutes)
-
- 1. ☐ Acknowledge alert
- 2. ☐ Page incident commander
- 3. ☐ Create incident channel (#incident-YYYYMMDD)
- 4. ☐ Update status page: "Investigating"
-
- ### Triage (5-15 minutes)
-
- 1. ☐ Verify outage scope
- 2. ☐ Check recent deployments
- 3. ☐ Check infrastructure status
- 4. ☐ Check external dependencies
-
- ### Response (15-30 minutes)
-
- 1. ☐ If deployment related:
-    ```bash
-    proagents rollback production --emergency
-    ```
-
- 2. ☐ If infrastructure related:
-    ```bash
-    proagents dr failover
-    ```
-
- 3. ☐ Update status page: "Identified"
-
- ### Resolution (30-60 minutes)
-
- 1. ☐ Verify services recovering
- 2. ☐ Monitor error rates
- 3. ☐ Run smoke tests
- 4. ☐ Update status page: "Resolved"
-
- ### Post-Incident
-
- 1. ☐ Close incident channel
- 2. ☐ Schedule postmortem
- 3. ☐ Document timeline
- ```
-
- ### SEV1: Security Breach
-
- ```markdown
- ## SEV1 Runbook: Security Breach
-
- **Time Target:** Containment within 30 minutes
-
- ### Immediate Actions (0-5 minutes)
-
- 1. ☐ Page security team
- 2. ☐ Page incident commander
- 3. ☐ Create secure incident channel
- 4. ☐ DO NOT communicate externally yet
-
- ### Containment (5-30 minutes)
-
- 1. ☐ Identify breach scope
-    - What data was accessed?
-    - What systems are affected?
-    - Is the attack ongoing?
-
- 2. ☐ Contain the breach
-    ```bash
-    # Isolate affected systems
-    proagents security isolate --system affected-service
-
-    # Rotate credentials
-    proagents security rotate-credentials --scope affected
-    ```
-
- 3. ☐ Preserve evidence
-    ```bash
-    proagents security preserve-logs --incident INC-XXX
-    ```
-
- ### Investigation (30-120 minutes)
-
- 1. ☐ Determine attack vector
- 2. ☐ Identify all affected data
- 3. ☐ Assess legal obligations (GDPR, etc.)
- 4. ☐ Engage legal/compliance if needed
-
- ### Communication
-
- 1. ☐ Internal stakeholders
- 2. ☐ Legal assessment
- 3. ☐ Regulatory notification (if required)
- 4. ☐ Customer notification (if required)
- ```
-
- ### SEV2: Database Issues
-
- ```markdown
- ## SEV2 Runbook: Database Issues
-
- **Time Target:** Resolution within 2 hours
-
- ### Symptoms
-
- - Slow queries
- - Connection errors
- - Replication lag
- - Lock contention
-
- ### Triage
-
- 1. ☐ Check database metrics
-    ```bash
-    proagents db status
-    ```
-
- 2. ☐ Check recent changes
-    - New deployments?
-    - Schema migrations?
-    - Query changes?
-
- ### Response
-
- 1. **Connection Issues**
-    ```bash
-    # Check connection pool
-    proagents db connections
-
-    # Kill long-running queries
-    proagents db kill-queries --older-than 300s
-    ```
-
- 2. **Performance Issues**
-    ```bash
-    # Identify slow queries
-    proagents db slow-queries
-
-    # Check for locks
-    proagents db locks
-    ```
-
- 3. **Replication Issues**
-    ```bash
-    # Check replication status
-    proagents db replication-status
-
-    # If severely lagged, consider failover
-    proagents db failover --dry-run
-    ```
-
- ### If Rollback Needed
-
- ```bash
- proagents db rollback --steps 1
- ```
- ```
-
- ---
-
- ## Communication Templates
-
- ### Status Page Update
-
- ```markdown
- **[INVESTIGATING]** - Service Degradation
-
- We are currently investigating reports of slow performance.
-
- **Affected Services:** [list services]
- **Impact:** [describe impact]
- **Started:** [time]
-
- Updates will be provided every 30 minutes.
- ```
-
- ### Resolution Communication
-
- ```markdown
- **[RESOLVED]** - Service Degradation
-
- The service issues have been resolved. All services are operating normally.
-
- **Duration:** [start time] - [end time]
- **Root Cause:** [brief description]
- **Resolution:** [what was done]
-
- We apologize for any inconvenience. A full postmortem will be conducted.
- ```
-
- ### Customer Notification
-
- ```markdown
- Subject: Service Incident Notification
-
- Dear Customer,
-
- We experienced a service disruption on [date] from [time] to [time].
-
- **What happened:** [description]
- **Impact to you:** [specific impact]
- **What we did:** [resolution]
- **Prevention:** [what we're doing to prevent recurrence]
-
- We apologize for any inconvenience this may have caused.
-
- [Name]
- [Company]
- ```
-
- ---
-
- ## Incident Management Tools
-
- ### CLI Commands
-
- ```bash
- # Create incident
- proagents incident create --severity SEV2 --title "API errors"
-
- # Update incident
- proagents incident update INC-123 --status investigating
-
- # Add timeline entry
- proagents incident timeline INC-123 "Identified root cause"
-
- # Close incident
- proagents incident close INC-123 --resolution "Rolled back to v2.3.1"
-
- # Generate postmortem template
- proagents incident postmortem INC-123
- ```
-
- ### Integration Configuration
-
- ```yaml
- incident:
-   integrations:
-     pagerduty:
-       enabled: true
-       api_key: "${PAGERDUTY_API_KEY}"
-       service_id: "SERVICE123"
-
-     slack:
-       enabled: true
-       webhook: "${SLACK_WEBHOOK}"
-       channels:
-         incidents: "#incidents"
-         status: "#status-updates"
-
-     status_page:
-       enabled: true
-       provider: "statuspage.io"
-       page_id: "${STATUSPAGE_ID}"
-
-     jira:
-       enabled: true
-       project: "INC"
-       issue_type: "Incident"
- ```
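The integration configuration above keeps secrets out of the file by referring to them as `${NAME}` placeholders. The diff does not show how the package resolved these, so the following is only a sketch of one plausible approach: a hypothetical helper, `resolve_placeholders`, that walks an already-parsed configuration and fills each placeholder from the environment using the standard library's `string.Template`. It fails loudly on a missing variable rather than passing an empty credential downstream.

```python
import os
from string import Template


def resolve_placeholders(node, env=None):
    """Recursively replace ${NAME} placeholders in a parsed config.

    Hypothetical helper for illustration. `node` is whatever a YAML parser
    returned (dicts, lists, scalars); `env` defaults to os.environ.
    Raises KeyError if a referenced variable is not set. Note that
    string.Template treats any "$" as special, so a literal dollar sign
    in a value would have to be written as "$$".
    """
    env = os.environ if env is None else env
    if isinstance(node, dict):
        return {key: resolve_placeholders(value, env) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_placeholders(value, env) for value in node]
    if isinstance(node, str):
        return Template(node).substitute(env)
    return node  # booleans, numbers, None pass through unchanged


# Usage with a fragment shaped like the configuration above.
config = {
    "pagerduty": {"enabled": True, "api_key": "${PAGERDUTY_API_KEY}"},
    "slack": {"channels": {"incidents": "#incidents"}},
}
resolved = resolve_placeholders(config, {"PAGERDUTY_API_KEY": "example-key"})
```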
-
- ---
-
- ## Postmortem Template
-
- ```markdown
- # Incident Postmortem: [Incident Title]
-
- **Date:** [Date]
- **Duration:** [Start time] - [End time]
- **Severity:** [SEV level]
- **Incident Commander:** [Name]
-
- ## Summary
-
- [2-3 sentence summary of what happened]
-
- ## Impact
-
- - Users affected: [number]
- - Duration: [time]
- - Revenue impact: [if applicable]
- - SLA impact: [if applicable]
-
- ## Timeline
-
- | Time | Event |
- |------|-------|
- | HH:MM | Issue detected |
- | HH:MM | Incident declared |
- | HH:MM | Root cause identified |
- | HH:MM | Fix deployed |
- | HH:MM | Issue resolved |
-
- ## Root Cause
-
- [Detailed explanation of what caused the incident]
-
- ## Contributing Factors
-
- 1. [Factor 1]
- 2. [Factor 2]
- 3. [Factor 3]
-
- ## What Went Well
-
- - [Positive 1]
- - [Positive 2]
-
- ## What Could Be Improved
-
- - [Improvement 1]
- - [Improvement 2]
-
- ## Action Items
-
- | Action | Owner | Due Date | Status |
- |--------|-------|----------|--------|
- | [Action 1] | [Name] | [Date] | [ ] |
- | [Action 2] | [Name] | [Date] | [ ] |
-
- ## Lessons Learned
-
- [Key takeaways from this incident]
- ```
-
- ---
-
- ## On-Call Configuration
-
- ```yaml
- oncall:
-   schedules:
-     primary:
-       rotation: "weekly"
-       members:
-         - "engineer1@company.com"
-         - "engineer2@company.com"
-         - "engineer3@company.com"
-
-     secondary:
-       rotation: "weekly"
-       offset: 1 # Day offset from primary
-       members:
-         - "senior1@company.com"
-         - "senior2@company.com"
-
-   escalation:
-     - level: 1
-       delay: 5 # minutes
-       target: "primary"
-
-     - level: 2
-       delay: 15
-       target: "secondary"
-
-     - level: 3
-       delay: 30
-       target: "engineering_manager"
-
-   handoff:
-     time: "09:00"
-     timezone: "America/New_York"
-     reminder: 1 # hour before
- ```
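The `escalation` list in the on-call configuration describes a tiered paging policy. As a rough sketch of how such a policy could be evaluated, the code below assumes each `delay` is measured in minutes from the moment the alert fired and that a level is paged once the alert has gone unacknowledged for at least that long. The removed file does not spell out these semantics, and `EscalationLevel` and `targets_to_page` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationLevel:
    level: int
    delay_minutes: int  # assumed: minutes since the alert fired
    target: str


# Values copied from the on-call configuration above.
ESCALATION = [
    EscalationLevel(1, 5, "primary"),
    EscalationLevel(2, 15, "secondary"),
    EscalationLevel(3, 30, "engineering_manager"),
]


def targets_to_page(minutes_unacknowledged: float, policy=ESCALATION) -> list[str]:
    """Return every target whose delay has elapsed, lowest level first."""
    ordered = sorted(policy, key=lambda lvl: lvl.level)
    return [lvl.target for lvl in ordered
            if minutes_unacknowledged >= lvl.delay_minutes]
```

Under these assumptions, an alert left unacknowledged for 20 minutes has reached both the primary and the secondary rotation but not yet the engineering manager.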
-
- ---
-
- ## Best Practices
-
- 1. **Blameless Culture**: Focus on systems, not individuals
- 2. **Clear Ownership**: Always have an incident commander
- 3. **Communication**: Over-communicate during incidents
- 4. **Documentation**: Document everything in real-time
- 5. **Practice**: Run regular incident simulations
- 6. **Automate**: Automate detection and response where possible
- 7. **Learn**: Always conduct postmortems
- 8. **Improve**: Turn learnings into concrete actions