proagents 1.6.16 → 1.6.18

Files changed (170)
  1. package/.claude/settings.local.json +169 -0
  2. package/COMMANDS.md +595 -0
  3. package/README.md +22 -64
  4. package/bin/proagents.js +0 -2
  5. package/lib/commands/init.js +4 -174
  6. package/package.json +2 -7
  7. package/.proagents/ai-models/README.md +0 -141
  8. package/.proagents/ai-models/cost-management.md +0 -362
  9. package/.proagents/ai-models/fallbacks.md +0 -342
  10. package/.proagents/ai-models/model-config.md +0 -318
  11. package/.proagents/ai-models/task-routing.md +0 -503
  12. package/.proagents/ai-training/README.md +0 -155
  13. package/.proagents/ai-training/continuous-learning.md +0 -413
  14. package/.proagents/ai-training/domain-knowledge.md +0 -378
  15. package/.proagents/ai-training/pattern-learning.md +0 -455
  16. package/.proagents/ai-training/training-data.md +0 -337
  17. package/.proagents/ai-training/user-preferences.md +0 -346
  18. package/.proagents/approval-workflows/README.md +0 -146
  19. package/.proagents/approval-workflows/approval-config.md +0 -332
  20. package/.proagents/approval-workflows/approval-stages.md +0 -503
  21. package/.proagents/approval-workflows/emergency-bypass.md +0 -351
  22. package/.proagents/approval-workflows/examples.md +0 -859
  23. package/.proagents/approval-workflows/notifications.md +0 -320
  24. package/.proagents/compliance/README.md +0 -206
  25. package/.proagents/compliance/access-control.md +0 -310
  26. package/.proagents/compliance/audit-logging.md +0 -444
  27. package/.proagents/compliance/compliance-frameworks.md +0 -429
  28. package/.proagents/compliance/reports.md +0 -491
  29. package/.proagents/compliance/retention-policies.md +0 -454
  30. package/.proagents/config-versioning/README.md +0 -120
  31. package/.proagents/config-versioning/changelog.md +0 -300
  32. package/.proagents/config-versioning/rollback.md +0 -283
  33. package/.proagents/config-versioning/versioning.md +0 -330
  34. package/.proagents/contract-testing/README.md +0 -223
  35. package/.proagents/contract-testing/contract-testing.md +0 -614
  36. package/.proagents/contract-testing/pact-integration.md +0 -507
  37. package/.proagents/contract-testing/schema-validation.md +0 -565
  38. package/.proagents/dependency-management/README.md +0 -140
  39. package/.proagents/dependency-management/automation.md +0 -363
  40. package/.proagents/dependency-management/compatibility.md +0 -319
  41. package/.proagents/dependency-management/security-scanning.md +0 -413
  42. package/.proagents/dependency-management/update-policies.md +0 -374
  43. package/.proagents/disaster-recovery/README.md +0 -247
  44. package/.proagents/disaster-recovery/automation.md +0 -366
  45. package/.proagents/disaster-recovery/backup-recovery.md +0 -571
  46. package/.proagents/disaster-recovery/incident-response.md +0 -565
  47. package/.proagents/disaster-recovery/rollback-procedures.md +0 -499
  48. package/.proagents/disaster-recovery/runbooks.md +0 -603
  49. package/.proagents/disaster-recovery/scenarios.md +0 -892
  50. package/.proagents/disaster-recovery/testing.md +0 -438
  51. package/.proagents/environments/README.md +0 -244
  52. package/.proagents/environments/configuration.md +0 -437
  53. package/.proagents/environments/promotion.md +0 -434
  54. package/.proagents/environments/setup.md +0 -420
  55. package/.proagents/examples/README.md +0 -55
  56. package/.proagents/examples/backend-nodejs/README.md +0 -188
  57. package/.proagents/examples/backend-nodejs/complete-conversation.md +0 -601
  58. package/.proagents/examples/backend-nodejs/proagents.config.yaml +0 -415
  59. package/.proagents/examples/backend-nodejs/workflow-example.md +0 -909
  60. package/.proagents/examples/fullstack-nextjs/README.md +0 -155
  61. package/.proagents/examples/fullstack-nextjs/complete-conversation.md +0 -604
  62. package/.proagents/examples/fullstack-nextjs/proagents.config.yaml +0 -287
  63. package/.proagents/examples/fullstack-nextjs/workflow-example.md +0 -553
  64. package/.proagents/examples/mobile-react-native/README.md +0 -171
  65. package/.proagents/examples/mobile-react-native/complete-conversation.md +0 -825
  66. package/.proagents/examples/mobile-react-native/proagents.config.yaml +0 -330
  67. package/.proagents/examples/mobile-react-native/workflow-example.md +0 -723
  68. package/.proagents/examples/web-frontend-react/README.md +0 -125
  69. package/.proagents/examples/web-frontend-react/complete-conversation.md +0 -556
  70. package/.proagents/examples/web-frontend-react/proagents.config.yaml +0 -183
  71. package/.proagents/examples/web-frontend-react/workflow-example.md +0 -603
  72. package/.proagents/existing-projects/README.md +0 -65
  73. package/.proagents/existing-projects/challenges.md +0 -861
  74. package/.proagents/existing-projects/coexistence-mode.md +0 -483
  75. package/.proagents/existing-projects/compatibility-assessment.md +0 -541
  76. package/.proagents/existing-projects/gradual-adoption.md +0 -515
  77. package/.proagents/existing-projects/migration-strategies.md +0 -788
  78. package/.proagents/existing-projects/pattern-reconciliation.md +0 -489
  79. package/.proagents/existing-projects/team-onboarding.md +0 -617
  80. package/.proagents/existing-projects/technical-debt-handling.md +0 -644
  81. package/.proagents/feature-flags/README.md +0 -263
  82. package/.proagents/feature-flags/ab-testing.md +0 -413
  83. package/.proagents/feature-flags/configuration.md +0 -420
  84. package/.proagents/feature-flags/kill-switches.md +0 -444
  85. package/.proagents/feature-flags/rollout-strategies.md +0 -392
  86. package/.proagents/history.log +0 -12
  87. package/.proagents/i18n/README.md +0 -133
  88. package/.proagents/i18n/extraction.md +0 -433
  89. package/.proagents/i18n/tms-integration.md +0 -332
  90. package/.proagents/i18n/translation-workflow.md +0 -413
  91. package/.proagents/i18n/validation.md +0 -355
  92. package/.proagents/logging/README.md +0 -276
  93. package/.proagents/logging/aggregation.md +0 -475
  94. package/.proagents/logging/log-levels.md +0 -376
  95. package/.proagents/logging/sensitive-data.md +0 -423
  96. package/.proagents/logging/structured-logging.md +0 -406
  97. package/.proagents/metrics/README.md +0 -69
  98. package/.proagents/metrics/code-quality-kpis.md +0 -461
  99. package/.proagents/metrics/deployment-metrics.md +0 -517
  100. package/.proagents/metrics/developer-productivity.md +0 -368
  101. package/.proagents/metrics/learning-effectiveness.md +0 -478
  102. package/.proagents/migrations/README.md +0 -77
  103. package/.proagents/migrations/from-claude-projects.md +0 -313
  104. package/.proagents/migrations/from-cursor-rules.md +0 -345
  105. package/.proagents/migrations/from-custom-workflows.md +0 -410
  106. package/.proagents/monitoring/README.md +0 -308
  107. package/.proagents/monitoring/alerting.md +0 -449
  108. package/.proagents/monitoring/dashboards.md +0 -454
  109. package/.proagents/monitoring/health-checks.md +0 -436
  110. package/.proagents/monitoring/metrics.md +0 -434
  111. package/.proagents/multi-project/README.md +0 -170
  112. package/.proagents/multi-project/coordinated-deploy.md +0 -510
  113. package/.proagents/multi-project/cross-project-deps.md +0 -395
  114. package/.proagents/multi-project/unified-changelog.md +0 -477
  115. package/.proagents/multi-project/walkthroughs/monorepo-setup.md +0 -787
  116. package/.proagents/multi-project/workspace-config.md +0 -408
  117. package/.proagents/notifications/README.md +0 -151
  118. package/.proagents/notifications/channels.md +0 -457
  119. package/.proagents/notifications/preferences.md +0 -415
  120. package/.proagents/notifications/routing.md +0 -449
  121. package/.proagents/notifications/scheduling.md +0 -425
  122. package/.proagents/notifications/templates.md +0 -446
  123. package/.proagents/offline-mode/README.md +0 -145
  124. package/.proagents/offline-mode/caching.md +0 -344
  125. package/.proagents/offline-mode/offline-operations.md +0 -312
  126. package/.proagents/offline-mode/queue-specifications.md +0 -679
  127. package/.proagents/offline-mode/sync.md +0 -475
  128. package/.proagents/parallel-features/README.md +0 -85
  129. package/.proagents/parallel-features/conflict-detection.md +0 -226
  130. package/.proagents/parallel-features/dependency-management.md +0 -392
  131. package/.proagents/parallel-features/merge-coordination.md +0 -506
  132. package/.proagents/parallel-features/tracking-system.md +0 -416
  133. package/.proagents/performance/README.md +0 -59
  134. package/.proagents/performance/bundle-analysis.md +0 -375
  135. package/.proagents/performance/load-testing.md +0 -563
  136. package/.proagents/performance/runtime-metrics.md +0 -489
  137. package/.proagents/performance/web-vitals.md +0 -425
  138. package/.proagents/plugins/README.md +0 -139
  139. package/.proagents/plugins/creating-plugins.md +0 -504
  140. package/.proagents/plugins/plugin-api.md +0 -467
  141. package/.proagents/plugins/plugin-registry.md +0 -276
  142. package/.proagents/reporting/README.md +0 -158
  143. package/.proagents/reporting/dashboards.md +0 -366
  144. package/.proagents/reporting/exports.md +0 -524
  145. package/.proagents/reporting/quality-metrics.md +0 -385
  146. package/.proagents/reporting/templates/README.md +0 -56
  147. package/.proagents/reporting/templates/dashboard-config.json +0 -187
  148. package/.proagents/reporting/templates/metrics-queries.md +0 -427
  149. package/.proagents/reporting/templates/react-dashboard.tsx +0 -544
  150. package/.proagents/reporting/templates/widgets.md +0 -451
  151. package/.proagents/reporting/velocity-metrics.md +0 -340
  152. package/.proagents/reverse-engineering/README.md +0 -151
  153. package/.proagents/reverse-engineering/architecture-extraction.md +0 -325
  154. package/.proagents/reverse-engineering/code-analysis.md +0 -377
  155. package/.proagents/reverse-engineering/dependency-mapping.md +0 -567
  156. package/.proagents/reverse-engineering/diagram-generation.md +0 -586
  157. package/.proagents/reverse-engineering/documentation-generation.md +0 -468
  158. package/.proagents/reverse-engineering/pattern-detection.md +0 -569
  159. package/.proagents/reverse-engineering/quality-assessment.md +0 -733
  160. package/.proagents/secrets/README.md +0 -278
  161. package/.proagents/secrets/access-control.md +0 -443
  162. package/.proagents/secrets/rotation.md +0 -403
  163. package/.proagents/secrets/scanning.md +0 -487
  164. package/.proagents/secrets/storage.md +0 -394
  165. package/.proagents/webhooks/README.md +0 -126
  166. package/.proagents/webhooks/endpoints.md +0 -298
  167. package/.proagents/webhooks/events.md +0 -316
  168. package/.proagents/webhooks/payloads.md +0 -325
  169. package/.proagents/webhooks/reliability.md +0 -363
  170. package/.proagents/webhooks/security.md +0 -380
@@ -1,565 +0,0 @@
1
- # Incident Response
2
-
3
- Incident response procedures and runbooks for development teams.
4
-
5
- ---
6
-
7
- ## Overview
8
-
9
- Structured incident response ensures rapid detection, response, and recovery from issues.
10
-
11
- ```
12
- ┌─────────────────────────────────────────────────────────────┐
13
- │                   Incident Response Flow                    │
14
- ├─────────────────────────────────────────────────────────────┤
15
- │                                                             │
16
- │  Detection ──► Triage ──► Response ──► Resolution ──► Review│
17
- │     │           │            │            │           │     │
18
- │     ▼           ▼            ▼            ▼           ▼     │
19
- │  Alerts  Severity   Containment   Fix/Rollback Postmortem   │
20
- │  Monitor Assessment Communication Verification Learning     │
21
- │                                                             │
22
- └─────────────────────────────────────────────────────────────┘
23
- ```
24
-
25
- ---
26
-
27
- ## Severity Levels
28
-
29
- | Level | Name | Description | Response Time | Examples |
30
- |-------|------|-------------|---------------|----------|
31
- | SEV1 | Critical | Complete outage | < 15 min | System down, data breach |
32
- | SEV2 | Major | Significant impact | < 30 min | Core feature broken |
33
- | SEV3 | Minor | Limited impact | < 2 hours | Non-critical bug |
34
- | SEV4 | Low | Minimal impact | < 24 hours | UI glitch |
35
-
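The response-time column of the removed severity table maps directly onto a deadline calculation. The following is a minimal sketch in JavaScript, the package's own language; `RESPONSE_MINUTES` and `responseDeadline` are hypothetical names that do not come from the proagents CLI, and treating each "<" bound as a hard deadline is an assumption.

```javascript
// Hypothetical lookup: response-time targets from the severity table, in minutes.
const RESPONSE_MINUTES = { SEV1: 15, SEV2: 30, SEV3: 120, SEV4: 24 * 60 };

// Returns the time by which a response is due for an incident detected at `detectedAt`.
function responseDeadline(severity, detectedAt) {
  if (!Object.hasOwn(RESPONSE_MINUTES, severity)) {
    throw new RangeError(`Unknown severity: ${severity}`);
  }
  return new Date(detectedAt.getTime() + RESPONSE_MINUTES[severity] * 60 * 1000);
}

const detected = new Date('2024-01-01T00:00:00Z');
console.log(responseDeadline('SEV2', detected).toISOString());
// 2024-01-01T00:30:00.000Z
```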
36
- ### Severity Decision Matrix
37
-
38
- ```
39
- ┌─────────────────────────────────────────────────────────────┐
40
- │                     Severity Assessment                     │
41
- ├─────────────────────────────────────────────────────────────┤
42
- │                                                             │
43
- │  Impact: How many users affected?                           │
44
- │  ├── All users ─────────────────────────► SEV1/SEV2         │
45
- │  ├── Many users (>10%) ─────────────────► SEV2/SEV3         │
46
- │  ├── Some users (<10%) ─────────────────► SEV3/SEV4         │
47
- │  └── Few users (<1%) ───────────────────► SEV4              │
48
- │                                                             │
49
- │  Urgency: Is it getting worse?                              │
50
- │  ├── Yes, rapidly ──────────────────────► Escalate          │
51
- │  ├── Yes, slowly ───────────────────────► Monitor           │
52
- │  └── No, stable ────────────────────────► Normal priority   │
53
- │                                                             │
54
- │  Data: Is data at risk?                                     │
55
- │  ├── Data loss/corruption ──────────────► SEV1              │
56
- │  ├── Data exposure ─────────────────────► SEV1              │
57
- │  └── No data impact ────────────────────► Continue assess   │
58
- │                                                             │
59
- └─────────────────────────────────────────────────────────────┘
60
- ```
61
-
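The removed decision matrix can be read as a small function. The sketch below is one possible interpretation, not code shipped in the package: `assessSeverity` and its parameters are hypothetical. The matrix gives a pair of levels for most impact bands, so the sketch assumes the less severe level as the baseline and moves up one level when the incident is worsening rapidly. It also treats exactly 10% as "some users", which the matrix leaves undefined.

```javascript
// Hypothetical helper: thresholds mirror the matrix, tie-breaking is an assumption.
function assessSeverity({ affectedFraction, worsening = 'stable', dataAtRisk = false }) {
  // Data loss, corruption, or exposure is always SEV1 in the matrix.
  if (dataAtRisk) return 'SEV1';

  // Baseline from user impact: the less severe level of each pair (assumption).
  let level;
  if (affectedFraction >= 1) level = 2;          // all users: SEV1/SEV2
  else if (affectedFraction > 0.10) level = 3;   // many users: SEV2/SEV3
  else level = 4;                                // some or few users: SEV3/SEV4 or SEV4

  // "Yes, rapidly" means escalate: move up one level, never past SEV1.
  if (worsening === 'rapidly') level = Math.max(1, level - 1);
  return `SEV${level}`;
}

console.log(assessSeverity({ affectedFraction: 0.5, worsening: 'rapidly' }));
// SEV2
```

A real triage decision also weighs context the matrix does not capture, so a function like this is better suited to suggesting a starting level than to assigning one automatically.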
62
- ---
63
-
64
- ## Incident Response Phases
65
-
66
- ### Phase 1: Detection
67
-
68
- ```yaml
69
- detection:
70
-   sources:
71
-     automated:
72
-       - monitoring_alerts
73
-       - error_rate_spikes
74
-       - health_check_failures
75
-       - security_events
76
-
77
-     manual:
78
-       - user_reports
79
-       - support_tickets
80
-       - team_observations
81
-
82
-   initial_actions:
83
-     - acknowledge_alert
84
-     - create_incident_ticket
85
-     - notify_on_call
86
- ```
87
-
88
- ### Phase 2: Triage
89
-
90
- ```markdown
91
- ## Triage Checklist
92
-
93
- ☐ Confirm the issue is real (not false positive)
94
- ☐ Assess severity level
95
- ☐ Identify affected systems/users
96
- ☐ Determine if it's getting worse
97
- ☐ Assign incident commander
98
-
99
- ## Key Questions
100
-
101
- 1. What is broken?
102
- 2. Who is affected?
103
- 3. When did it start?
104
- 4. Is it getting worse?
105
- 5. What changed recently?
106
- ```
107
-
108
- ### Phase 3: Response
109
-
110
- ```yaml
111
- response:
112
-   immediate_actions:
113
-     sev1:
114
-       - "Page incident commander"
115
-       - "Start incident channel"
116
-       - "Begin customer communication"
117
-       - "Consider immediate rollback"
118
-
119
-     sev2:
120
-       - "Alert on-call team"
121
-       - "Create incident channel"
122
-       - "Begin investigation"
123
-
124
-     sev3:
125
-       - "Assign to engineer"
126
-       - "Track in ticket system"
127
-       - "Schedule fix"
128
-
129
-   communication:
130
-     internal:
131
-       - incident_channel
132
-       - status_updates
133
-     external:
134
-       - status_page
135
-       - customer_notification
136
- ```
137
-
138
- ### Phase 4: Resolution
139
-
140
- ```markdown
141
- ## Resolution Steps
142
-
143
- 1. **Contain** - Stop the bleeding
144
- - Rollback if needed
145
- - Disable affected features
146
- - Block problematic traffic
147
-
148
- 2. **Fix** - Address root cause
149
- - Identify root cause
150
- - Implement fix
151
- - Test fix
152
-
153
- 3. **Verify** - Confirm resolution
154
- - Check metrics normalized
155
- - Verify user reports stopped
156
- - Confirm monitoring green
157
-
158
- 4. **Communicate** - Update stakeholders
159
- - Update status page
160
- - Notify affected users
161
- - Close incident channel
162
- ```
163
-
164
- ### Phase 5: Review
165
-
166
- ```markdown
167
- ## Post-Incident Review
168
-
169
- **Within 24-48 hours:**
170
- - Document timeline
171
- - Identify root cause
172
- - List contributing factors
173
-
174
- **Within 1 week:**
175
- - Hold blameless postmortem
176
- - Document learnings
177
- - Create action items
178
- - Update runbooks
179
- ```
180
-
181
- ---
182
-
183
- ## Incident Runbooks
184
-
185
- ### SEV1: Complete System Outage
186
-
187
- ```markdown
188
- ## SEV1 Runbook: Complete System Outage
189
-
190
- **Time Target:** Resolution within 1 hour
191
-
192
- ### Immediate Actions (0-5 minutes)
193
-
194
- 1. ☐ Acknowledge alert
195
- 2. ☐ Page incident commander
196
- 3. ☐ Create incident channel (#incident-YYYYMMDD)
197
- 4. ☐ Update status page: "Investigating"
198
-
199
- ### Triage (5-15 minutes)
200
-
201
- 1. ☐ Verify outage scope
202
- 2. ☐ Check recent deployments
203
- 3. ☐ Check infrastructure status
204
- 4. ☐ Check external dependencies
205
-
206
- ### Response (15-30 minutes)
207
-
208
- 1. ☐ If deployment related:
209
- ```bash
210
- proagents rollback production --emergency
211
- ```
212
-
213
- 2. ☐ If infrastructure related:
214
- ```bash
215
- proagents dr failover
216
- ```
217
-
218
- 3. ☐ Update status page: "Identified"
219
-
220
- ### Resolution (30-60 minutes)
221
-
222
- 1. ☐ Verify services recovering
223
- 2. ☐ Monitor error rates
224
- 3. ☐ Run smoke tests
225
- 4. ☐ Update status page: "Resolved"
226
-
227
- ### Post-Incident
228
-
229
- 1. ☐ Close incident channel
230
- 2. ☐ Schedule postmortem
231
- 3. ☐ Document timeline
232
- ```
233
-
234
- ### SEV1: Security Breach
235
-
236
- ```markdown
237
- ## SEV1 Runbook: Security Breach
238
-
239
- **Time Target:** Containment within 30 minutes
240
-
241
- ### Immediate Actions (0-5 minutes)
242
-
243
- 1. ☐ Page security team
244
- 2. ☐ Page incident commander
245
- 3. ☐ Create secure incident channel
246
- 4. ☐ DO NOT communicate externally yet
247
-
248
- ### Containment (5-30 minutes)
249
-
250
- 1. ☐ Identify breach scope
251
- - What data was accessed?
252
- - What systems are affected?
253
- - Is the attack ongoing?
254
-
255
- 2. ☐ Contain the breach
256
- ```bash
257
- # Isolate affected systems
258
- proagents security isolate --system affected-service
259
-
260
- # Rotate credentials
261
- proagents security rotate-credentials --scope affected
262
- ```
263
-
264
- 3. ☐ Preserve evidence
265
- ```bash
266
- proagents security preserve-logs --incident INC-XXX
267
- ```
268
-
269
- ### Investigation (30-120 minutes)
270
-
271
- 1. ☐ Determine attack vector
272
- 2. ☐ Identify all affected data
273
- 3. ☐ Assess legal obligations (GDPR, etc.)
274
- 4. ☐ Engage legal/compliance if needed
275
-
276
- ### Communication
277
-
278
- 1. ☐ Internal stakeholders
279
- 2. ☐ Legal assessment
280
- 3. ☐ Regulatory notification (if required)
281
- 4. ☐ Customer notification (if required)
282
- ```
283
-
284
- ### SEV2: Database Issues
285
-
286
- ```markdown
287
- ## SEV2 Runbook: Database Issues
288
-
289
- **Time Target:** Resolution within 2 hours
290
-
291
- ### Symptoms
292
-
293
- - Slow queries
294
- - Connection errors
295
- - Replication lag
296
- - Lock contention
297
-
298
- ### Triage
299
-
300
- 1. ☐ Check database metrics
301
- ```bash
302
- proagents db status
303
- ```
304
-
305
- 2. ☐ Check recent changes
306
- - New deployments?
307
- - Schema migrations?
308
- - Query changes?
309
-
310
- ### Response
311
-
312
- 1. **Connection Issues**
313
- ```bash
314
- # Check connection pool
315
- proagents db connections
316
-
317
- # Kill long-running queries
318
- proagents db kill-queries --older-than 300s
319
- ```
320
-
321
- 2. **Performance Issues**
322
- ```bash
323
- # Identify slow queries
324
- proagents db slow-queries
325
-
326
- # Check for locks
327
- proagents db locks
328
- ```
329
-
330
- 3. **Replication Issues**
331
- ```bash
332
- # Check replication status
333
- proagents db replication-status
334
-
335
- # If severely lagged, consider failover
336
- proagents db failover --dry-run
337
- ```
338
-
339
- ### If Rollback Needed
340
-
341
- ```bash
342
- proagents db rollback --steps 1
343
- ```
344
- ```
345
-
346
- ---
347
-
348
- ## Communication Templates
349
-
350
- ### Status Page Update
351
-
352
- ```markdown
353
- **[INVESTIGATING]** - Service Degradation
354
-
355
- We are currently investigating reports of slow performance.
356
-
357
- **Affected Services:** [list services]
358
- **Impact:** [describe impact]
359
- **Started:** [time]
360
-
361
- Updates will be provided every 30 minutes.
362
- ```
363
-
364
- ### Resolution Communication
365
-
366
- ```markdown
367
- **[RESOLVED]** - Service Degradation
368
-
369
- The service issues have been resolved. All services are operating normally.
370
-
371
- **Duration:** [start time] - [end time]
372
- **Root Cause:** [brief description]
373
- **Resolution:** [what was done]
374
-
375
- We apologize for any inconvenience. A full postmortem will be conducted.
376
- ```
377
-
378
- ### Customer Notification
379
-
380
- ```markdown
381
- Subject: Service Incident Notification
382
-
383
- Dear Customer,
384
-
385
- We experienced a service disruption on [date] from [time] to [time].
386
-
387
- **What happened:** [description]
388
- **Impact to you:** [specific impact]
389
- **What we did:** [resolution]
390
- **Prevention:** [what we're doing to prevent recurrence]
391
-
392
- We apologize for any inconvenience this may have caused.
393
-
394
- [Name]
395
- [Company]
396
- ```
397
-
398
- ---
399
-
400
- ## Incident Management Tools
401
-
402
- ### CLI Commands
403
-
404
- ```bash
405
- # Create incident
406
- proagents incident create --severity SEV2 --title "API errors"
407
-
408
- # Update incident
409
- proagents incident update INC-123 --status investigating
410
-
411
- # Add timeline entry
412
- proagents incident timeline INC-123 "Identified root cause"
413
-
414
- # Close incident
415
- proagents incident close INC-123 --resolution "Rolled back to v2.3.1"
416
-
417
- # Generate postmortem template
418
- proagents incident postmortem INC-123
419
- ```
420
-
421
- ### Integration Configuration
422
-
423
- ```yaml
424
- incident:
425
-   integrations:
426
-     pagerduty:
427
-       enabled: true
428
-       api_key: "${PAGERDUTY_API_KEY}"
429
-       service_id: "SERVICE123"
430
-
431
-     slack:
432
-       enabled: true
433
-       webhook: "${SLACK_WEBHOOK}"
434
-       channels:
435
-         incidents: "#incidents"
436
-         status: "#status-updates"
437
-
438
-     status_page:
439
-       enabled: true
440
-       provider: "statuspage.io"
441
-       page_id: "${STATUSPAGE_ID}"
442
-
443
-     jira:
444
-       enabled: true
445
-       project: "INC"
446
-       issue_type: "Incident"
447
- ```
448
-
449
- ---
450
-
451
- ## Postmortem Template
452
-
453
- ```markdown
454
- # Incident Postmortem: [Incident Title]
455
-
456
- **Date:** [Date]
457
- **Duration:** [Start time] - [End time]
458
- **Severity:** [SEV level]
459
- **Incident Commander:** [Name]
460
-
461
- ## Summary
462
-
463
- [2-3 sentence summary of what happened]
464
-
465
- ## Impact
466
-
467
- - Users affected: [number]
468
- - Duration: [time]
469
- - Revenue impact: [if applicable]
470
- - SLA impact: [if applicable]
471
-
472
- ## Timeline
473
-
474
- | Time | Event |
475
- |------|-------|
476
- | HH:MM | Issue detected |
477
- | HH:MM | Incident declared |
478
- | HH:MM | Root cause identified |
479
- | HH:MM | Fix deployed |
480
- | HH:MM | Issue resolved |
481
-
482
- ## Root Cause
483
-
484
- [Detailed explanation of what caused the incident]
485
-
486
- ## Contributing Factors
487
-
488
- 1. [Factor 1]
489
- 2. [Factor 2]
490
- 3. [Factor 3]
491
-
492
- ## What Went Well
493
-
494
- - [Positive 1]
495
- - [Positive 2]
496
-
497
- ## What Could Be Improved
498
-
499
- - [Improvement 1]
500
- - [Improvement 2]
501
-
502
- ## Action Items
503
-
504
- | Action | Owner | Due Date | Status |
505
- |--------|-------|----------|--------|
506
- | [Action 1] | [Name] | [Date] | [ ] |
507
- | [Action 2] | [Name] | [Date] | [ ] |
508
-
509
- ## Lessons Learned
510
-
511
- [Key takeaways from this incident]
512
- ```
513
-
514
- ---
515
-
516
- ## On-Call Configuration
517
-
518
- ```yaml
519
- oncall:
520
-   schedules:
521
-     primary:
522
-       rotation: "weekly"
523
-       members:
524
-         - "engineer1@company.com"
525
-         - "engineer2@company.com"
526
-         - "engineer3@company.com"
527
-
528
-     secondary:
529
-       rotation: "weekly"
530
-       offset: 1 # Day offset from primary
531
-       members:
532
-         - "senior1@company.com"
533
-         - "senior2@company.com"
534
-
535
-   escalation:
536
-     - level: 1
537
-       delay: 5 # minutes
538
-       target: "primary"
539
-
540
-     - level: 2
541
-       delay: 15
542
-       target: "secondary"
543
-
544
-     - level: 3
545
-       delay: 30
546
-       target: "engineering_manager"
547
-
548
-   handoff:
549
-     time: "09:00"
550
-     timezone: "America/New_York"
551
-     reminder: 1 # hour before
552
- ```
553
-
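The `escalation` list in the removed on-call configuration describes a time-based paging ladder. The removed file does not say how the tool evaluated it, so the sketch below assumes that `delay` is the number of minutes an alert may stay unacknowledged before that level's target is paged. `targetsToPage` is a hypothetical helper, not a proagents API.

```javascript
// Escalation steps copied from the removed on-call configuration.
const escalation = [
  { level: 1, delay: 5, target: 'primary' },
  { level: 2, delay: 15, target: 'secondary' },
  { level: 3, delay: 30, target: 'engineering_manager' },
];

// Returns every target that should have been paged once an alert has gone
// unacknowledged for `minutesUnacknowledged` minutes, lowest level first.
function targetsToPage(minutesUnacknowledged, policy = escalation) {
  return policy
    .filter((step) => minutesUnacknowledged >= step.delay)
    .sort((a, b) => a.level - b.level)
    .map((step) => step.target);
}

console.log(targetsToPage(20));
// [ 'primary', 'secondary' ]
```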
554
- ---
555
-
556
- ## Best Practices
557
-
558
- 1. **Blameless Culture**: Focus on systems, not individuals
559
- 2. **Clear Ownership**: Always have an incident commander
560
- 3. **Communication**: Over-communicate during incidents
561
- 4. **Documentation**: Document everything in real-time
562
- 5. **Practice**: Run regular incident simulations
563
- 6. **Automate**: Automate detection and response where possible
564
- 7. **Learn**: Always conduct postmortems
565
- 8. **Improve**: Turn learnings into concrete actions