specweave 0.1.0

Files changed (288)
  1. package/INSTALL.md +848 -0
  2. package/LICENSE +21 -0
  3. package/README.md +675 -0
  4. package/SPECWEAVE.md +665 -0
  5. package/bin/install-agents.sh +57 -0
  6. package/bin/install-all.sh +49 -0
  7. package/bin/install-commands.sh +56 -0
  8. package/bin/install-skills.sh +57 -0
  9. package/bin/specweave.js +81 -0
  10. package/dist/adapters/adapter-base.d.ts +50 -0
  11. package/dist/adapters/adapter-base.d.ts.map +1 -0
  12. package/dist/adapters/adapter-base.js +146 -0
  13. package/dist/adapters/adapter-base.js.map +1 -0
  14. package/dist/adapters/adapter-interface.d.ts +108 -0
  15. package/dist/adapters/adapter-interface.d.ts.map +1 -0
  16. package/dist/adapters/adapter-interface.js +9 -0
  17. package/dist/adapters/adapter-interface.js.map +1 -0
  18. package/dist/adapters/claude/adapter.d.ts +54 -0
  19. package/dist/adapters/claude/adapter.d.ts.map +1 -0
  20. package/dist/adapters/claude/adapter.js +184 -0
  21. package/dist/adapters/claude/adapter.js.map +1 -0
  22. package/dist/adapters/copilot/adapter.d.ts +42 -0
  23. package/dist/adapters/copilot/adapter.d.ts.map +1 -0
  24. package/dist/adapters/copilot/adapter.js +239 -0
  25. package/dist/adapters/copilot/adapter.js.map +1 -0
  26. package/dist/adapters/cursor/adapter.d.ts +42 -0
  27. package/dist/adapters/cursor/adapter.d.ts.map +1 -0
  28. package/dist/adapters/cursor/adapter.js +297 -0
  29. package/dist/adapters/cursor/adapter.js.map +1 -0
  30. package/dist/adapters/generic/adapter.d.ts +40 -0
  31. package/dist/adapters/generic/adapter.d.ts.map +1 -0
  32. package/dist/adapters/generic/adapter.js +155 -0
  33. package/dist/adapters/generic/adapter.js.map +1 -0
  34. package/dist/cli/commands/init.d.ts +6 -0
  35. package/dist/cli/commands/init.d.ts.map +1 -0
  36. package/dist/cli/commands/init.js +247 -0
  37. package/dist/cli/commands/init.js.map +1 -0
  38. package/dist/cli/commands/install.d.ts +7 -0
  39. package/dist/cli/commands/install.d.ts.map +1 -0
  40. package/dist/cli/commands/install.js +160 -0
  41. package/dist/cli/commands/install.js.map +1 -0
  42. package/dist/cli/commands/list.d.ts +6 -0
  43. package/dist/cli/commands/list.d.ts.map +1 -0
  44. package/dist/cli/commands/list.js +154 -0
  45. package/dist/cli/commands/list.js.map +1 -0
  46. package/package.json +90 -0
  47. package/src/adapters/README.md +312 -0
  48. package/src/adapters/adapter-base.ts +146 -0
  49. package/src/adapters/adapter-interface.ts +120 -0
  50. package/src/adapters/claude/README.md +241 -0
  51. package/src/adapters/claude/adapter.ts +157 -0
  52. package/src/adapters/copilot/.github/copilot/instructions.md +376 -0
  53. package/src/adapters/copilot/README.md +200 -0
  54. package/src/adapters/copilot/adapter.ts +210 -0
  55. package/src/adapters/cursor/.cursor/context/docs-context.md +62 -0
  56. package/src/adapters/cursor/.cursor/context/increments-context.md +71 -0
  57. package/src/adapters/cursor/.cursor/context/strategy-context.md +73 -0
  58. package/src/adapters/cursor/.cursor/context/tests-context.md +89 -0
  59. package/src/adapters/cursor/.cursorrules +325 -0
  60. package/src/adapters/cursor/README.md +243 -0
  61. package/src/adapters/cursor/adapter.ts +268 -0
  62. package/src/adapters/generic/README.md +277 -0
  63. package/src/adapters/generic/SPECWEAVE-MANUAL.md +676 -0
  64. package/src/adapters/generic/adapter.ts +159 -0
  65. package/src/adapters/registry.yaml +126 -0
  66. package/src/agents/architect/AGENT.md +416 -0
  67. package/src/agents/devops/AGENT.md +1738 -0
  68. package/src/agents/docs-writer/AGENT.md +239 -0
  69. package/src/agents/performance/AGENT.md +228 -0
  70. package/src/agents/pm/AGENT.md +751 -0
  71. package/src/agents/qa-lead/AGENT.md +150 -0
  72. package/src/agents/security/AGENT.md +179 -0
  73. package/src/agents/sre/AGENT.md +582 -0
  74. package/src/agents/sre/modules/backend-diagnostics.md +481 -0
  75. package/src/agents/sre/modules/database-diagnostics.md +509 -0
  76. package/src/agents/sre/modules/infrastructure.md +561 -0
  77. package/src/agents/sre/modules/monitoring.md +439 -0
  78. package/src/agents/sre/modules/security-incidents.md +421 -0
  79. package/src/agents/sre/modules/ui-diagnostics.md +302 -0
  80. package/src/agents/sre/playbooks/01-high-cpu-usage.md +204 -0
  81. package/src/agents/sre/playbooks/02-database-deadlock.md +241 -0
  82. package/src/agents/sre/playbooks/03-memory-leak.md +252 -0
  83. package/src/agents/sre/playbooks/04-slow-api-response.md +269 -0
  84. package/src/agents/sre/playbooks/05-ddos-attack.md +293 -0
  85. package/src/agents/sre/playbooks/06-disk-full.md +314 -0
  86. package/src/agents/sre/playbooks/07-service-down.md +333 -0
  87. package/src/agents/sre/playbooks/08-data-corruption.md +337 -0
  88. package/src/agents/sre/playbooks/09-cascade-failure.md +430 -0
  89. package/src/agents/sre/playbooks/10-rate-limit-exceeded.md +464 -0
  90. package/src/agents/sre/scripts/health-check.sh +230 -0
  91. package/src/agents/sre/scripts/log-analyzer.py +213 -0
  92. package/src/agents/sre/scripts/metrics-collector.sh +294 -0
  93. package/src/agents/sre/scripts/trace-analyzer.js +257 -0
  94. package/src/agents/sre/templates/incident-report.md +249 -0
  95. package/src/agents/sre/templates/mitigation-plan.md +375 -0
  96. package/src/agents/sre/templates/post-mortem.md +418 -0
  97. package/src/agents/sre/templates/runbook-template.md +412 -0
  98. package/src/agents/tech-lead/AGENT.md +263 -0
  99. package/src/commands/add-tasks.md +176 -0
  100. package/src/commands/close-increment.md +347 -0
  101. package/src/commands/create-increment.md +223 -0
  102. package/src/commands/create-project.md +528 -0
  103. package/src/commands/generate-docs.md +623 -0
  104. package/src/commands/list-increments.md +180 -0
  105. package/src/commands/review-docs.md +331 -0
  106. package/src/commands/start-increment.md +139 -0
  107. package/src/commands/sync-github.md +115 -0
  108. package/src/commands/validate-increment.md +800 -0
  109. package/src/hooks/README.md +252 -0
  110. package/src/hooks/docs-changed.sh +59 -0
  111. package/src/hooks/human-input-required.sh +55 -0
  112. package/src/hooks/post-task-completion.sh +57 -0
  113. package/src/hooks/pre-implementation.sh +47 -0
  114. package/src/skills/ado-sync/README.md +449 -0
  115. package/src/skills/ado-sync/SKILL.md +245 -0
  116. package/src/skills/ado-sync/test-cases/test-1.yaml +9 -0
  117. package/src/skills/ado-sync/test-cases/test-2.yaml +8 -0
  118. package/src/skills/ado-sync/test-cases/test-3.yaml +9 -0
  119. package/src/skills/bmad-method-expert/SKILL.md +628 -0
  120. package/src/skills/bmad-method-expert/scripts/analyze-project.js +318 -0
  121. package/src/skills/bmad-method-expert/scripts/check-setup.js +208 -0
  122. package/src/skills/bmad-method-expert/scripts/generate-template.js +1149 -0
  123. package/src/skills/bmad-method-expert/scripts/validate-documents.js +340 -0
  124. package/src/skills/bmad-method-expert/test-cases/test-1-placeholder.yaml +12 -0
  125. package/src/skills/bmad-method-expert/test-cases/test-2-placeholder.yaml +12 -0
  126. package/src/skills/bmad-method-expert/test-cases/test-3-placeholder.yaml +12 -0
  127. package/src/skills/brownfield-analyzer/SKILL.md +523 -0
  128. package/src/skills/brownfield-analyzer/test-cases/test-1-basic-analysis.yaml +48 -0
  129. package/src/skills/brownfield-analyzer/test-cases/test-2-placeholder.yaml +12 -0
  130. package/src/skills/brownfield-analyzer/test-cases/test-3-placeholder.yaml +12 -0
  131. package/src/skills/brownfield-onboarder/SKILL.md +625 -0
  132. package/src/skills/brownfield-onboarder/test-cases/test-1-placeholder.yaml +12 -0
  133. package/src/skills/brownfield-onboarder/test-cases/test-2-placeholder.yaml +12 -0
  134. package/src/skills/brownfield-onboarder/test-cases/test-3-placeholder.yaml +12 -0
  135. package/src/skills/calendar-system/test-cases/test-1-placeholder.yaml +12 -0
  136. package/src/skills/calendar-system/test-cases/test-2-placeholder.yaml +12 -0
  137. package/src/skills/calendar-system/test-cases/test-3-placeholder.yaml +12 -0
  138. package/src/skills/context-loader/SKILL.md +734 -0
  139. package/src/skills/context-loader/test-cases/test-1-basic-loading.yaml +39 -0
  140. package/src/skills/context-loader/test-cases/test-2-token-budget-exceeded.yaml +44 -0
  141. package/src/skills/context-loader/test-cases/test-3-section-anchors.yaml +45 -0
  142. package/src/skills/context-optimizer/SKILL.md +618 -0
  143. package/src/skills/context-optimizer/test-cases/test-1-bug-fix-narrow.yaml +97 -0
  144. package/src/skills/context-optimizer/test-cases/test-2-feature-focused.yaml +109 -0
  145. package/src/skills/context-optimizer/test-cases/test-3-architecture-broad.yaml +98 -0
  146. package/src/skills/cost-optimizer/SKILL.md +190 -0
  147. package/src/skills/cost-optimizer/test-cases/test-1-basic-comparison.yaml +75 -0
  148. package/src/skills/cost-optimizer/test-cases/test-2-budget-constraint.yaml +52 -0
  149. package/src/skills/cost-optimizer/test-cases/test-3-scale-requirement.yaml +63 -0
  150. package/src/skills/cost-optimizer/test-results/README.md +46 -0
  151. package/src/skills/design-system-architect/SKILL.md +107 -0
  152. package/src/skills/design-system-architect/test-cases/test-1-token-structure.yaml +23 -0
  153. package/src/skills/design-system-architect/test-cases/test-2-component-hierarchy.yaml +24 -0
  154. package/src/skills/design-system-architect/test-cases/test-3-accessibility-checklist.yaml +23 -0
  155. package/src/skills/diagrams-architect/SKILL.md +763 -0
  156. package/src/skills/diagrams-generator/SKILL.md +25 -0
  157. package/src/skills/diagrams-generator/test-cases/test-1.yaml +9 -0
  158. package/src/skills/diagrams-generator/test-cases/test-2.yaml +9 -0
  159. package/src/skills/diagrams-generator/test-cases/test-3.yaml +8 -0
  160. package/src/skills/docs-updater/README.md +48 -0
  161. package/src/skills/docs-updater/test-cases/test-1-placeholder.yaml +12 -0
  162. package/src/skills/docs-updater/test-cases/test-2-placeholder.yaml +12 -0
  163. package/src/skills/docs-updater/test-cases/test-3-placeholder.yaml +12 -0
  164. package/src/skills/dotnet-backend/SKILL.md +250 -0
  165. package/src/skills/e2e-playwright/README.md +506 -0
  166. package/src/skills/e2e-playwright/SKILL.md +457 -0
  167. package/src/skills/e2e-playwright/execute.js +373 -0
  168. package/src/skills/e2e-playwright/lib/utils.js +514 -0
  169. package/src/skills/e2e-playwright/package.json +33 -0
  170. package/src/skills/e2e-playwright/test-cases/TC-001-basic-navigation.yaml +54 -0
  171. package/src/skills/e2e-playwright/test-cases/TC-002-form-interaction.yaml +64 -0
  172. package/src/skills/e2e-playwright/test-cases/TC-003-specweave-integration.yaml +74 -0
  173. package/src/skills/e2e-playwright/test-cases/TC-004-accessibility-check.yaml +98 -0
  174. package/src/skills/figma-designer/SKILL.md +149 -0
  175. package/src/skills/figma-implementer/SKILL.md +148 -0
  176. package/src/skills/figma-mcp-connector/SKILL.md +136 -0
  177. package/src/skills/figma-mcp-connector/test-cases/test-1-read-file-desktop.yaml +22 -0
  178. package/src/skills/figma-mcp-connector/test-cases/test-2-read-file-framelink.yaml +21 -0
  179. package/src/skills/figma-mcp-connector/test-cases/test-3-error-handling.yaml +18 -0
  180. package/src/skills/figma-to-code/SKILL.md +128 -0
  181. package/src/skills/figma-to-code/test-cases/test-1-token-generation.yaml +29 -0
  182. package/src/skills/figma-to-code/test-cases/test-2-component-generation.yaml +27 -0
  183. package/src/skills/figma-to-code/test-cases/test-3-typescript-generation.yaml +28 -0
  184. package/src/skills/frontend/SKILL.md +177 -0
  185. package/src/skills/github-sync/SKILL.md +252 -0
  186. package/src/skills/github-sync/test-cases/test-1-placeholder.yaml +12 -0
  187. package/src/skills/github-sync/test-cases/test-2-placeholder.yaml +12 -0
  188. package/src/skills/github-sync/test-cases/test-3-placeholder.yaml +12 -0
  189. package/src/skills/hetzner-provisioner/README.md +308 -0
  190. package/src/skills/hetzner-provisioner/SKILL.md +251 -0
  191. package/src/skills/hetzner-provisioner/test-cases/test-1-basic-provision.yaml +71 -0
  192. package/src/skills/hetzner-provisioner/test-cases/test-2-postgres-provision.yaml +85 -0
  193. package/src/skills/hetzner-provisioner/test-cases/test-3-ssl-config.yaml +126 -0
  194. package/src/skills/hetzner-provisioner/test-results/README.md +259 -0
  195. package/src/skills/increment-planner/SKILL.md +889 -0
  196. package/src/skills/increment-planner/scripts/feature-utils.js +250 -0
  197. package/src/skills/increment-planner/test-cases/test-1-basic-feature.yaml +27 -0
  198. package/src/skills/increment-planner/test-cases/test-2-complex-feature.yaml +30 -0
  199. package/src/skills/increment-planner/test-cases/test-3-auto-numbering.yaml +24 -0
  200. package/src/skills/increment-quality-judge/SKILL.md +566 -0
  201. package/src/skills/increment-quality-judge/test-cases/test-1-good-spec.yaml +95 -0
  202. package/src/skills/increment-quality-judge/test-cases/test-2-poor-spec.yaml +108 -0
  203. package/src/skills/increment-quality-judge/test-cases/test-3-export-suggestions.yaml +87 -0
  204. package/src/skills/jira-sync/README.md +328 -0
  205. package/src/skills/jira-sync/SKILL.md +209 -0
  206. package/src/skills/jira-sync/test-cases/test-1.yaml +9 -0
  207. package/src/skills/jira-sync/test-cases/test-2.yaml +9 -0
  208. package/src/skills/jira-sync/test-cases/test-3.yaml +10 -0
  209. package/src/skills/nextjs/SKILL.md +176 -0
  210. package/src/skills/nodejs-backend/SKILL.md +181 -0
  211. package/src/skills/notification-system/test-cases/test-1-placeholder.yaml +12 -0
  212. package/src/skills/notification-system/test-cases/test-2-placeholder.yaml +12 -0
  213. package/src/skills/notification-system/test-cases/test-3-placeholder.yaml +12 -0
  214. package/src/skills/python-backend/SKILL.md +226 -0
  215. package/src/skills/role-orchestrator/README.md +197 -0
  216. package/src/skills/role-orchestrator/SKILL.md +1184 -0
  217. package/src/skills/role-orchestrator/test-cases/test-1-simple-product.yaml +98 -0
  218. package/src/skills/role-orchestrator/test-cases/test-2-quality-gate-failure.yaml +73 -0
  219. package/src/skills/role-orchestrator/test-cases/test-3-security-workflow.yaml +121 -0
  220. package/src/skills/role-orchestrator/test-cases/test-4-parallel-execution.yaml +145 -0
  221. package/src/skills/role-orchestrator/test-cases/test-5-feedback-loops.yaml +149 -0
  222. package/src/skills/skill-creator/LICENSE.txt +202 -0
  223. package/src/skills/skill-creator/SKILL.md +209 -0
  224. package/src/skills/skill-creator/scripts/init_skill.py +303 -0
  225. package/src/skills/skill-creator/scripts/package_skill.py +110 -0
  226. package/src/skills/skill-creator/scripts/quick_validate.py +65 -0
  227. package/src/skills/skill-creator/test-cases/test-1-placeholder.yaml +12 -0
  228. package/src/skills/skill-creator/test-cases/test-2-placeholder.yaml +12 -0
  229. package/src/skills/skill-creator/test-cases/test-3-placeholder.yaml +12 -0
  230. package/src/skills/skill-router/SKILL.md +497 -0
  231. package/src/skills/skill-router/test-cases/test-1-basic-routing.yaml +33 -0
  232. package/src/skills/skill-router/test-cases/test-2-ambiguous-request.yaml +42 -0
  233. package/src/skills/skill-router/test-cases/test-3-nested-orchestration.yaml +50 -0
  234. package/src/skills/spec-driven-brainstorming/README.md +264 -0
  235. package/src/skills/spec-driven-brainstorming/SKILL.md +439 -0
  236. package/src/skills/spec-driven-brainstorming/test-cases/TC-001-simple-idea-to-design.yaml +148 -0
  237. package/src/skills/spec-driven-brainstorming/test-cases/TC-002-complex-ultrathink-design.yaml +190 -0
  238. package/src/skills/spec-driven-brainstorming/test-cases/TC-003-unclear-requirements-socratic.yaml +233 -0
  239. package/src/skills/spec-driven-debugging/README.md +479 -0
  240. package/src/skills/spec-driven-debugging/SKILL.md +652 -0
  241. package/src/skills/spec-driven-debugging/test-cases/TC-001-simple-auth-bug.yaml +212 -0
  242. package/src/skills/spec-driven-debugging/test-cases/TC-002-race-condition-ultrathink.yaml +461 -0
  243. package/src/skills/spec-driven-debugging/test-cases/TC-003-brownfield-missing-spec.yaml +366 -0
  244. package/src/skills/spec-kit-expert/SKILL.md +1012 -0
  245. package/src/skills/spec-kit-expert/test-cases/test-1-placeholder.yaml +12 -0
  246. package/src/skills/spec-kit-expert/test-cases/test-2-placeholder.yaml +12 -0
  247. package/src/skills/spec-kit-expert/test-cases/test-3-placeholder.yaml +12 -0
  248. package/src/skills/specweave-ado-mapper/SKILL.md +501 -0
  249. package/src/skills/specweave-detector/SKILL.md +420 -0
  250. package/src/skills/specweave-detector/test-cases/test-1-basic-detection.yaml +37 -0
  251. package/src/skills/specweave-detector/test-cases/test-2-missing-config.yaml +37 -0
  252. package/src/skills/specweave-detector/test-cases/test-3-non-specweave-project.yaml +34 -0
  253. package/src/skills/specweave-jira-mapper/SKILL.md +500 -0
  254. package/src/skills/stripe-integrator/test-cases/test-1-placeholder.yaml +12 -0
  255. package/src/skills/stripe-integrator/test-cases/test-2-placeholder.yaml +12 -0
  256. package/src/skills/stripe-integrator/test-cases/test-3-placeholder.yaml +12 -0
  257. package/src/skills/task-builder/README.md +90 -0
  258. package/src/skills/task-builder/test-cases/test-1-placeholder.yaml +12 -0
  259. package/src/skills/task-builder/test-cases/test-2-placeholder.yaml +12 -0
  260. package/src/skills/task-builder/test-cases/test-3-placeholder.yaml +12 -0
  261. package/src/templates/.env.example +144 -0
  262. package/src/templates/.gitignore.template +81 -0
  263. package/src/templates/CLAUDE.md.template +383 -0
  264. package/src/templates/README.md.template +240 -0
  265. package/src/templates/config.yaml +333 -0
  266. package/src/templates/docs/README.md +124 -0
  267. package/src/templates/docs/adr-template.md +118 -0
  268. package/src/templates/docs/hld-template.md +220 -0
  269. package/src/templates/docs/lld-template.md +580 -0
  270. package/src/templates/docs/prd-template.md +132 -0
  271. package/src/templates/docs/rfc-template.md +229 -0
  272. package/src/templates/docs/runbook-template.md +298 -0
  273. package/src/templates/environments/minimal/.env.production +16 -0
  274. package/src/templates/environments/minimal/README.md +54 -0
  275. package/src/templates/environments/minimal/deploy-production.yml +52 -0
  276. package/src/templates/environments/progressive/.env.qa +28 -0
  277. package/src/templates/environments/progressive/README.md +129 -0
  278. package/src/templates/environments/progressive/deploy-production.yml +93 -0
  279. package/src/templates/environments/progressive/deploy-qa.yml +62 -0
  280. package/src/templates/environments/progressive/deploy-staging.yml +67 -0
  281. package/src/templates/environments/standard/.env.development +20 -0
  282. package/src/templates/environments/standard/.env.production +30 -0
  283. package/src/templates/environments/standard/.env.staging +23 -0
  284. package/src/templates/environments/standard/README.md +97 -0
  285. package/src/templates/environments/standard/deploy-production.yml +68 -0
  286. package/src/templates/environments/standard/deploy-staging.yml +61 -0
  287. package/src/templates/environments/standard/docker-compose.yml +43 -0
  288. package/src/templates/increment-metadata-template.yaml +138 -0
@@ -0,0 +1,430 @@
1
+ # Playbook: Cascade Failure
2
+
3
+ ## Symptoms
4
+
5
+ - Multiple services failing simultaneously
6
+ - Failures spreading across services
7
+ - Dependency services timing out
8
+ - Error rate increasing exponentially
9
+ - Monitoring alert: "Multiple services degraded", "Cascade detected"
10
+
11
+ ## Severity
12
+
13
+ - **SEV1** - Cascade affecting production services
14
+
15
+ ## What is a Cascade Failure?
16
+
17
+ **Definition**: One service failure triggers failures in dependent services, spreading through the system.
18
+
19
+ **Example**:
20
+ ```
21
+ Database slow (7s queries)
22
+
23
+ API times out waiting for database (5s timeout)
24
+
25
+ Frontend times out waiting for API (10s timeout)
26
+
27
+ Load balancer marks frontend unhealthy
28
+
29
+ Traffic routes to other frontends (overload them)
30
+
31
+ All frontends fail → Complete outage
32
+ ```
33
+
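The "traffic routes to other frontends (overload them)" step can be sketched as a small simulation. This is an illustrative model only: `survivingInstances`, the request rates, and the capacities are hypothetical, and real load balancers do not necessarily spread traffic evenly.

```javascript
// Hypothetical model: requests are spread evenly across healthy instances,
// and any instance pushed past its capacity is marked unhealthy.
function survivingInstances(totalRps, capacitiesRps) {
  let healthy = [...capacitiesRps];
  while (healthy.length > 0) {
    const perInstance = totalRps / healthy.length;
    const survivors = healthy.filter((capacity) => capacity >= perInstance);
    if (survivors.length === healthy.length) break; // stable: nobody is overloaded
    healthy = survivors; // overloaded instances drop out, load is re-spread
  }
  return healthy.length;
}

console.log(survivingInstances(900, [300, 300, 300, 300])); // 4 (225 rps each)
console.log(survivingInstances(900, [300, 300])); // 0 (450 rps each, both overloaded)
```

With mixed capacities, the weakest instance fails first and each removal raises the load on the rest, which is the feedback loop that turns one unhealthy frontend into a complete outage.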
34
+ ---
35
+
36
+ ## Diagnosis
37
+
38
+ ### Step 1: Identify Initial Failure Point
39
+
40
+ **Check Service Dependencies**:
41
+ ```
42
+ Frontend → API → Database
43
+
44
+ Cache (Redis)
45
+
46
+ Queue (RabbitMQ)
47
+
48
+ External API
49
+ ```
50
+
51
+ **Find the root**:
52
+ ```bash
53
+ # Check service health (start with leaf dependencies)
54
+ # 1. Database
55
+ psql -c "SELECT 1"
56
+
57
+ # 2. Cache
58
+ redis-cli PING
59
+
60
+ # 3. Queue
61
+ rabbitmqctl status
62
+
63
+ # 4. External API
64
+ curl -fsS --max-time 5 https://api.external.com/health
65
+
66
+ # First failure = likely root cause
67
+ ```
68
+
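The "start with leaf dependencies" order can be derived from a dependency map instead of being remembered under pressure. A minimal sketch, assuming a hypothetical `dependencies` object that mirrors the diagram above (the exact edges are an assumption):

```javascript
// Hypothetical dependency map: service -> services it calls.
const dependencies = {
  frontend: ['api'],
  api: ['database', 'cache', 'queue', 'external-api'],
  database: [],
  cache: [],
  queue: [],
  'external-api': [],
};

// Depth-first post-order: a service is listed only after everything it depends on.
function leafFirstOrder(graph) {
  const order = [];
  const visited = new Set();
  function visit(service) {
    if (visited.has(service)) return; // also guards against dependency cycles
    visited.add(service);
    for (const dep of graph[service] ?? []) visit(dep);
    order.push(service);
  }
  Object.keys(graph).forEach(visit);
  return order;
}

console.log(leafFirstOrder(dependencies));
// [ 'database', 'cache', 'queue', 'external-api', 'api', 'frontend' ]
```

Checking services in this order means the first unhealthy one has no unhealthy dependency beneath it, which makes it the likeliest root.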
69
+ ---
70
+
71
+ ### Step 2: Trace Failure Propagation
72
+
73
+ **Check Service Logs** (in order):
74
+ ```bash
75
+ # Database logs (first)
76
+ tail -100 /var/log/postgresql/postgresql.log
77
+
78
+ # API logs (second)
79
+ tail -100 /var/log/api/error.log
80
+
81
+ # Frontend logs (third)
82
+ tail -100 /var/log/frontend/error.log
83
+ ```
84
+
85
+ **Look for timestamps**:
86
+ ```
87
+ 14:00:00 - Database: Slow query (7s) ← ROOT CAUSE
88
+ 14:00:05 - API: Timeout error
89
+ 14:00:10 - Frontend: API unavailable
90
+ 14:00:15 - Load balancer: All frontends unhealthy
91
+ ```
92
+
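Ordering events by timestamp can be automated once the log lines are parsed. The sketch below assumes hypothetical, already-parsed events with same-day `HH:MM:SS` timestamps; the `events` data and function names are illustrative.

```javascript
// Hypothetical parsed log events, deliberately out of order.
const events = [
  { time: '14:00:10', service: 'frontend', message: 'API unavailable' },
  { time: '14:00:00', service: 'database', message: 'Slow query (7s)' },
  { time: '14:00:15', service: 'load-balancer', message: 'All frontends unhealthy' },
  { time: '14:00:05', service: 'api', message: 'Timeout error' },
];

function toSeconds(hms) {
  const [h, m, s] = hms.split(':').map(Number);
  return h * 3600 + m * 60 + s;
}

// Sort a copy by time; the earliest error is the root-cause candidate.
function buildTimeline(evts) {
  return [...evts].sort((a, b) => toSeconds(a.time) - toSeconds(b.time));
}

const timeline = buildTimeline(events);
console.log('Likely root cause:', timeline[0].service);
console.log('Propagation:', timeline.map((e) => e.service).join(' -> '));
```

Clock skew between hosts can reorder events that are only a second or two apart, so treat the earliest entry as a candidate rather than proof.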
93
+ ---
94
+
95
+ ### Step 3: Assess Cascade Depth
96
+
97
+ **How many layers affected?**
98
+ - **1 layer**: Database only (isolated failure)
99
+ - **2-3 layers**: Database → API → Frontend (cascade)
100
+ - **4+ layers**: Full system cascade (critical)
101
+
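The depth thresholds above can be encoded so that an alerting script reports the same classification every time. A minimal sketch; the `layerOf` mapping and the `assessCascade` name are hypothetical and would need to match your own architecture.

```javascript
// Hypothetical mapping from service name to architectural layer.
const layerOf = { database: 1, api: 2, frontend: 3, 'load-balancer': 4 };

function assessCascade(failingServices) {
  // Count distinct layers, ignoring services that are not in the map.
  const layers = new Set(
    failingServices.map((name) => layerOf[name]).filter((layer) => layer !== undefined)
  );
  const depth = layers.size;
  if (depth === 0) return { depth, level: 'none' };
  if (depth === 1) return { depth, level: 'isolated failure' };
  if (depth <= 3) return { depth, level: 'cascade' };
  return { depth, level: 'full system cascade (critical)' };
}

console.log(assessCascade(['database']));
console.log(assessCascade(['database', 'api', 'frontend']));
```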
102
+ ---
103
+
104
+ ## Mitigation
105
+
106
+ ### Immediate (Now - 5 min)
107
+
108
+ **PRIORITY: Stop the cascade from spreading**
109
+
110
+ **Option A: Circuit Breaker** (if not already enabled)
111
+ ```javascript
112
+ // Enable circuit breaker manually
113
+ // Prevents API from overwhelming database
114
+
115
+ const CircuitBreaker = require('opossum');
116
+
117
+ const dbQuery = new CircuitBreaker(queryDatabase, {
118
+   timeout: 3000, // 3s timeout
119
+   errorThresholdPercentage: 50, // Open after 50% failures
120
+   resetTimeout: 30000 // Try again after 30s
121
+ });
122
+
123
+ dbQuery.on('open', () => {
124
+   console.log('Circuit breaker OPEN - using fallback');
125
+ });
126
+
127
+ // Use fallback when circuit open
128
+ dbQuery.fallback(() => {
129
+   return cachedData; // Return cached data instead
130
+ });
131
+ ```
132
+
133
+ **Option B: Rate Limiting** (protect downstream)
134
+ ```nginx
135
+ # Limit requests per client IP to the API (nginx; limit_req_zone belongs in the http block)
136
+ limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
137
+
138
+ location /api/ {
139
+     limit_req zone=api burst=20 nodelay;
140
+     proxy_pass http://api-backend;
141
+ }
142
+ ```
143
+
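If no proxy is available to enforce the limit, a similar effect can be approximated in application code. The sketch below is a token bucket, which is related to but not identical to the leaky-bucket algorithm nginx uses for `limit_req`; the `TokenBucket` class and its injectable clock are hypothetical.

```javascript
// Token bucket: refills at `ratePerSec`, holds at most `burst` tokens.
class TokenBucket {
  constructor(ratePerSec, burst, now = () => Date.now()) {
    this.ratePerSec = ratePerSec;
    this.capacity = burst;
    this.tokens = burst; // start full so an initial burst is allowed
    this.now = now; // injectable clock (milliseconds) for testing
    this.lastRefill = now();
  }

  // Returns true if the request may proceed, false if it should be rejected.
  tryRemove() {
    const t = this.now();
    const elapsedSec = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefill = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Same numbers as the nginx example: 10 requests/s with a burst of 20.
let fakeNow = 0;
const bucket = new TokenBucket(10, 20, () => fakeNow);
let allowed = 0;
for (let i = 0; i < 25; i++) if (bucket.tryRemove()) allowed++;
console.log(allowed); // 20: the burst is spent, the remaining 5 are rejected
```

In a real service you would keep one bucket per client key and answer rejected requests with HTTP 429 or 503.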
144
+ **Option C: Shed Load** (reject non-critical requests)
145
+ ```javascript
146
+ // Reject non-critical requests when overloaded
147
+ app.use((req, res, next) => {
148
+   const load = getCurrentLoad(); // CPU, memory, queue depth
149
+
150
+   if (load > 0.8 && !isCriticalEndpoint(req.path)) {
151
+     return res.status(503).json({
152
+       error: 'Service overloaded, try again later'
153
+     });
154
+   }
155
+
156
+   next();
157
+ });
158
+
159
+ function isCriticalEndpoint(path) {
160
+   return ['/api/health', '/api/payment'].includes(path);
161
+ }
162
+ ```
163
+
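`getCurrentLoad()` is left undefined in the snippet above. One possible shape, sketched under the assumption that CPU and memory utilisation are already available as 0..1 fractions (the `computeLoad` and `shouldShed` helpers and their parameters are hypothetical):

```javascript
// Combine normalized signals into one load figure: the worst signal wins.
function computeLoad({ cpu, memory, queueDepth, maxQueueDepth }) {
  const queue = maxQueueDepth > 0 ? queueDepth / maxQueueDepth : 0;
  return Math.min(1, Math.max(cpu, memory, queue));
}

// Shed only non-critical paths, and only above the threshold.
function shouldShed(load, path, criticalPaths, threshold = 0.8) {
  return load > threshold && !criticalPaths.includes(path);
}

const critical = ['/api/health', '/api/payment'];
const load = computeLoad({ cpu: 0.5, memory: 0.4, queueDepth: 90, maxQueueDepth: 100 });
console.log(load); // 0.9 (queue depth is the bottleneck)
console.log(shouldShed(load, '/api/analytics', critical)); // true
console.log(shouldShed(load, '/api/payment', critical)); // false
```

Taking the maximum rather than an average means a single saturated resource is enough to trigger shedding, which suits an incident where one bottleneck drives the cascade.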
164
+ **Option D: Isolate Failure** (take failing service offline)
165
+ ```bash
166
+ # Remove failing service from load balancer
167
+ # AWS ELB:
168
+ aws elbv2 deregister-targets \
169
+   --target-group-arn <arn> \
170
+   --targets Id=i-1234567890abcdef0
171
+
172
+ # nginx:
173
+ # Comment out failing backend in upstream block, then reload (nginx -s reload)
174
+ # upstream api {
175
+ # server api1.example.com; # Healthy
176
+ # # server api2.example.com; # FAILING - commented out
177
+ # }
178
+
179
+ # Impact: Prevents failing service from affecting others
180
+ # Risk: Reduced capacity
181
+ ```
182
+
183
+ ---
184
+
185
+ ### Short-term (5 min - 1 hour)
186
+
187
+ **Option A: Fix Root Cause**
188
+
189
+ **If database slow**:
190
+ ```sql
191
+ -- Add missing index
192
+ CREATE INDEX CONCURRENTLY idx_users_last_login ON users(last_login_at);
193
+ ```
194
+
195
+ **If external API slow**:
196
+ ```javascript
197
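+ // Add timeout + fallback (fetch rejects on timeout, so catch it)
198
+ const response = await fetch('https://api.external.com', {
199
+   signal: AbortSignal.timeout(2000) // 2s timeout (fetch has no `timeout` option)
200
+ }).catch(() => null); // null on timeout or network error
201
+
202
+ if (!response || !response.ok) {
203
+   return fallbackData; // Don't cascade failure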
204
+ }
205
+ ```
206
+
207
+ **If service overloaded**:
208
+ ```bash
209
+ # Scale horizontally (add more instances)
210
+ # AWS Auto Scaling:
211
+ aws autoscaling set-desired-capacity \
212
+   --auto-scaling-group-name my-asg \
213
+   --desired-capacity 10 # Was 5
214
+ ```
215
+
216
+ ---
217
+
218
+ **Option B: Add Timeouts** (prevent indefinite waiting)
219
+ ```javascript
220
+ // Database query timeout (option name varies by driver)
221
+ const result = await db.query('SELECT * FROM users', {
222
+   timeout: 3000 // 3 second timeout
223
+ });
224
+
225
+ // API call timeout
226
+ const response = await fetch('/api/data', {
227
+   signal: AbortSignal.timeout(5000) // 5 second timeout
228
+ });
229
+
230
+ // Impact: Fail fast instead of cascading
231
+ // Risk: Low (better to timeout than cascade)
232
+ ```
233
+
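Timeouts work best when they get shorter the deeper a call goes. Otherwise a caller gives up while its dependency is still working, and the work is wasted. A small check for that, sketched with a hypothetical `chain` description (outermost caller first) and a hypothetical `findTimeoutInversions` helper:

```javascript
// Hypothetical call chain, outermost caller first.
const chain = [
  { name: 'frontend -> api', timeoutMs: 10000 },
  { name: 'api -> database', timeoutMs: 3000 },
];

// Returns a message for every hop whose timeout is not shorter than its caller's.
function findTimeoutInversions(hops) {
  const problems = [];
  for (let i = 1; i < hops.length; i++) {
    const outer = hops[i - 1];
    const inner = hops[i];
    if (inner.timeoutMs >= outer.timeoutMs) {
      problems.push(
        `${inner.name} (${inner.timeoutMs}ms) should be shorter than ${outer.name} (${outer.timeoutMs}ms)`
      );
    }
  }
  return problems;
}

console.log(findTimeoutInversions(chain)); // [] : the budget shrinks with depth
```

Running a check like this in CI against a timeout config file is one way to keep the budget consistent as services are added.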
234
+ ---
235
+
236
+ **Option C: Add Bulkheads** (isolate critical paths)
237
+ ```javascript
238
+ // Separate connection pools for critical vs non-critical (e.g. `Pool` from the `pg` package)
239
+ const criticalPool = new Pool({ max: 10 }); // Payments, auth
240
+ const nonCriticalPool = new Pool({ max: 5 }); // Analytics, reports
241
+
242
+ // Critical requests use a dedicated pool
243
+ app.post('/api/payment', async (req, res) => {
244
+   const conn = await criticalPool.connect();
245
+   // ... (release the connection when done)
246
+ });
247
+
248
+ // Non-critical requests use separate pool
249
+ app.get('/api/analytics', async (req, res) => {
250
+   const conn = await nonCriticalPool.connect();
251
+   // ... (release the connection when done)
252
+ });
253
+
254
+ // Impact: Critical paths protected from non-critical load
255
+ // Risk: Low (keep the combined pool sizes within the database connection limit)
256
+ ```
257
+
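The same isolation idea applies to any limited resource, not only database connections. A minimal in-process sketch: the `Bulkhead` class below is hypothetical and caps concurrent work per category, rejecting rather than queueing when the cap is reached.

```javascript
// Caps the number of in-flight operations for one category of work.
class Bulkhead {
  constructor(name, maxConcurrent) {
    this.name = name;
    this.maxConcurrent = maxConcurrent;
    this.inFlight = 0;
  }

  // Returns true and reserves a slot, or false if the compartment is full.
  tryAcquire() {
    if (this.inFlight >= this.maxConcurrent) return false;
    this.inFlight += 1;
    return true;
  }

  // Must be called once for every successful tryAcquire().
  release() {
    if (this.inFlight > 0) this.inFlight -= 1;
  }
}

const payments = new Bulkhead('payments', 10);
const analytics = new Bulkhead('analytics', 2);

// Analytics fills up after two concurrent requests...
console.log(analytics.tryAcquire(), analytics.tryAcquire(), analytics.tryAcquire());
// true true false

// ...but payments still has capacity.
console.log(payments.tryAcquire()); // true
```

Call `release()` in a `finally` block so that a failed request does not leak its slot.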
258
+ ---
259
+
260
+ ### Long-term (1 hour+)
261
+
262
+ **Architecture Improvements**:
263
+
264
+ - [ ] **Circuit Breakers** (all external dependencies)
265
+ - [ ] **Timeouts** (every network call, database query)
266
+ - [ ] **Retries with exponential backoff** (transient failures)
267
+ - [ ] **Bulkheads** (isolate critical paths)
268
+ - [ ] **Rate limiting** (protect downstream services)
269
+ - [ ] **Graceful degradation** (fallback data, cached responses)
270
+ - [ ] **Health checks** (detect failures early)
271
+ - [ ] **Auto-scaling** (handle load spikes)
272
+ - [ ] **Chaos engineering** (test cascade scenarios)
273
+
274
+ ---
275
+
276
+ ## Cascade Prevention Patterns
277
+
278
+ ### 1. Circuit Breaker Pattern
279
+ ```javascript
280
+ const breaker = new CircuitBreaker(riskyOperation, {
281
+   timeout: 3000,
282
+   errorThresholdPercentage: 50,
283
+   resetTimeout: 30000
284
+ });
285
+
286
+ breaker.fallback(() => cachedData);
287
+ ```
288
+
289
+ **Benefits**:
290
+ - Fast failure (don't wait for timeout)
291
+ - Automatic recovery (trial request after `resetTimeout`)
292
+ - Fallback data (graceful degradation)
293
+
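To make the closed, open, and half-open states concrete, here is a dependency-free sketch. It is not how opossum is implemented: `SimpleBreaker` is hypothetical, synchronous for brevity, and counts consecutive failures instead of an error percentage.

```javascript
class SimpleBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 30000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock (milliseconds) for testing
    this.state = 'closed';
    this.failures = 0;
    this.openedAt = 0;
  }

  call(fn, fallback) {
    if (this.state === 'open') {
      // Fail fast while the reset timeout has not elapsed.
      if (this.now() - this.openedAt < this.resetTimeoutMs) return fallback();
      this.state = 'half-open'; // let one trial call through
    }
    try {
      const value = fn();
      this.state = 'closed'; // success closes the circuit again
      this.failures = 0;
      return value;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      return fallback();
    }
  }
}

const breaker = new SimpleBreaker({ failureThreshold: 2 });
const failing = () => { throw new Error('database down'); };
breaker.call(failing, () => 'cached');
breaker.call(failing, () => 'cached');
console.log(breaker.state); // open
```

While the circuit is open the wrapped function is not called at all, which is what relieves pressure on the struggling dependency.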
294
+ ---
295
+
296
+ ### 2. Timeout Pattern
297
+ ```javascript
298
+ // ALWAYS set timeouts
299
+ const response = await fetch('/api', {
300
+   signal: AbortSignal.timeout(5000)
301
+ });
302
+ ```
303
+
304
+ **Benefits**:
305
+ - Fail fast (don't cascade indefinite waits)
306
+ - Predictable behavior
307
+
308
+ ---
309
+
310
+ ### 3. Bulkhead Pattern
311
+ ```javascript
312
+ // Separate resource pools
313
+ const criticalPool = new Pool({ max: 10 });
314
+ const nonCriticalPool = new Pool({ max: 5 });
315
+ ```
316
+
317
+ **Benefits**:
318
+ - Critical paths protected
319
+ - Non-critical load can't exhaust resources
320
+
321
+ ---
322
+
323
+ ### 4. Retry with Backoff
324
+ ```javascript
325
+ async function retryWithBackoff(fn, retries = 3) {
326
+   for (let i = 0; i < retries; i++) {
327
+     try {
328
+       return await fn();
329
+     } catch (error) {
330
+       if (i === retries - 1) throw error; // out of attempts
331
+       await new Promise((resolve) => setTimeout(resolve, 2 ** i * 1000)); // waits 1s, then 2s, ...
332
+     }
333
+   }
334
+ }
335
+ ```
336
+
337
+ **Benefits**:
338
+ - Handles transient failures
339
+ - Exponential backoff eases pressure on a recovering service (add jitter to avoid a thundering herd)
340
+
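Clients that back off on the same fixed schedule can still retry in synchronized waves. Adding randomness (jitter) spreads them out. A minimal sketch of the "full jitter" variant, where `backoffDelayMs` and its defaults are hypothetical:

```javascript
// Full jitter: pick a random delay between 0 and the capped exponential value.
function backoffDelayMs(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * exponential);
}

// With a fixed "random" source the result is deterministic, which helps in tests.
const half = { random: () => 0.5 };
console.log(backoffDelayMs(0, half)); // 500   (half of 1s)
console.log(backoffDelayMs(2, half)); // 2000  (half of 4s)
console.log(backoffDelayMs(10, half)); // 15000 (half of the 30s cap)
```

The result can replace the fixed `2 ** i * 1000` delay in a retry loop; the cap keeps late attempts from waiting unreasonably long.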
341
+ ---
342
+
343
+ ### 5. Load Shedding
344
+ ```javascript
345
+ // Reject requests when overloaded
346
+ if (queueDepth > threshold) {
347
+   return res.status(503).send('Overloaded');
348
+ }
349
+ ```
350
+
351
+ **Benefits**:
352
+ - Prevent overload
353
+ - Protect downstream services
354
+
355
+ ---
356
+
357
+ ## Escalation
358
+
359
+ **Escalate to architecture team if**:
360
+ - System-wide cascade
361
+ - Architectural changes needed
362
+
363
+ **Escalate to all service owners if**:
364
+ - Multiple teams affected
365
+ - Need coordinated response
366
+
367
+ **Escalate to management if**:
368
+ - Complete outage
369
+ - Large customer impact
370
+
371
+ ---
372
+
373
+ ## Prevention Checklist
374
+
375
+ - [ ] Circuit breakers on all external calls
376
+ - [ ] Timeouts on all network operations
377
+ - [ ] Retries with exponential backoff
378
+ - [ ] Bulkheads for critical paths
379
+ - [ ] Rate limiting (protect downstream)
380
+ - [ ] Health checks (detect failures early)
381
+ - [ ] Auto-scaling (handle load)
382
+ - [ ] Graceful degradation (fallback data)
383
+ - [ ] Chaos engineering (test failure scenarios)
384
+ - [ ] Load testing (find breaking points)
385
+
386
+ ---
387
+
388
+ ## Related Runbooks
389
+
390
+ - [04-slow-api-response.md](04-slow-api-response.md) - API performance
391
+ - [07-service-down.md](07-service-down.md) - Service failures
392
+ - [../modules/backend-diagnostics.md](../modules/backend-diagnostics.md) - Backend troubleshooting
393
+
394
+ ---
395
+
396
+ ## Post-Incident
397
+
398
+ After resolving:
399
+ - [ ] Create post-mortem (MANDATORY for cascade failures)
400
+ - [ ] Draw cascade diagram (which services failed in order)
401
+ - [ ] Identify missing safeguards (circuit breakers, timeouts)
402
+ - [ ] Implement prevention patterns
403
+ - [ ] Test cascade scenarios (chaos engineering)
404
+ - [ ] Update this runbook if needed
405
+
406
+ ---
407
+
408
+ ## Cascade Failure Examples
409
+
410
+ **Netflix Outage (2012)**:
411
+ - AWS ELB control-plane failure in US-East → load balancers stopped routing correctly → streaming unavailable for many customers
412
+ - **Fix**: Multi-region failover, circuit breakers, fallback data
413
+
414
+ **AWS S3 Outage (2017)**:
415
+ - S3 down → Websites using S3 fail → Status dashboards fail (also on S3)
416
+ - **Fix**: Multi-region redundancy, fallback to different regions
417
+
418
+ **Google Cloud Outage (2019)**:
419
+ - Network misconfiguration → Internal services fail → External services cascade
420
+ - **Fix**: Network configuration validation, staged rollouts
421
+
422
+ ---
423
+
424
+ ## Key Takeaways
425
+
426
+ 1. **Cascades happen when nothing stops a failure from propagating** (missing circuit breakers, timeouts)
427
+ 2. **Fix the root cause first** (not the symptoms)
428
+ 3. **Fail fast, don't cascade waits** (timeouts everywhere)
429
+ 4. **Graceful degradation** (fallback > failure)
430
+ 5. **Test failure scenarios** (chaos engineering)