@zigrivers/scaffold 3.16.0 → 3.17.0

Files changed (377)
  1. package/README.md +28 -0
  2. package/content/knowledge/backend/backend-fintech-broker-integration.md +244 -0
  3. package/content/knowledge/backend/backend-fintech-compliance.md +181 -0
  4. package/content/knowledge/backend/backend-fintech-data-modeling.md +210 -0
  5. package/content/knowledge/backend/backend-fintech-ledger.md +226 -0
  6. package/content/knowledge/backend/backend-fintech-observability.md +151 -0
  7. package/content/knowledge/backend/backend-fintech-order-lifecycle.md +213 -0
  8. package/content/knowledge/backend/backend-fintech-risk-management.md +150 -0
  9. package/content/knowledge/backend/backend-fintech-testing.md +197 -0
  10. package/content/knowledge/core/automated-review-tooling.md +10 -0
  11. package/content/knowledge/core/multi-service-api-contracts.md +634 -0
  12. package/content/knowledge/core/multi-service-architecture.md +492 -0
  13. package/content/knowledge/core/multi-service-auth.md +706 -0
  14. package/content/knowledge/core/multi-service-data-ownership.md +539 -0
  15. package/content/knowledge/core/multi-service-observability.md +545 -0
  16. package/content/knowledge/core/multi-service-resilience.md +710 -0
  17. package/content/knowledge/core/multi-service-task-decomposition.md +615 -0
  18. package/content/knowledge/core/multi-service-testing.md +728 -0
  19. package/content/methodology/backend-fintech.yml +46 -0
  20. package/content/methodology/custom-defaults.yml +6 -0
  21. package/content/methodology/deep.yml +6 -0
  22. package/content/methodology/multi-service-overlay.yml +103 -0
  23. package/content/methodology/mvp.yml +6 -0
  24. package/content/pipeline/architecture/service-ownership-map.md +83 -0
  25. package/content/pipeline/quality/cross-service-auth.md +96 -0
  26. package/content/pipeline/quality/cross-service-observability.md +104 -0
  27. package/content/pipeline/quality/integration-test-plan.md +106 -0
  28. package/content/pipeline/specification/inter-service-contracts.md +95 -0
  29. package/dist/cli/commands/adopt.cli-flags.test.js +20 -0
  30. package/dist/cli/commands/adopt.cli-flags.test.js.map +1 -1
  31. package/dist/cli/commands/adopt.d.ts.map +1 -1
  32. package/dist/cli/commands/adopt.js +11 -3
  33. package/dist/cli/commands/adopt.js.map +1 -1
  34. package/dist/cli/commands/complete.d.ts +1 -0
  35. package/dist/cli/commands/complete.d.ts.map +1 -1
  36. package/dist/cli/commands/complete.js +26 -8
  37. package/dist/cli/commands/complete.js.map +1 -1
  38. package/dist/cli/commands/dashboard.d.ts +1 -0
  39. package/dist/cli/commands/dashboard.d.ts.map +1 -1
  40. package/dist/cli/commands/dashboard.js +19 -6
  41. package/dist/cli/commands/dashboard.js.map +1 -1
  42. package/dist/cli/commands/decisions.d.ts +1 -0
  43. package/dist/cli/commands/decisions.d.ts.map +1 -1
  44. package/dist/cli/commands/decisions.js +18 -4
  45. package/dist/cli/commands/decisions.js.map +1 -1
  46. package/dist/cli/commands/info.d.ts +1 -0
  47. package/dist/cli/commands/info.d.ts.map +1 -1
  48. package/dist/cli/commands/info.js +25 -3
  49. package/dist/cli/commands/info.js.map +1 -1
  50. package/dist/cli/commands/init-from.test.d.ts +2 -0
  51. package/dist/cli/commands/init-from.test.d.ts.map +1 -0
  52. package/dist/cli/commands/init-from.test.js +315 -0
  53. package/dist/cli/commands/init-from.test.js.map +1 -0
  54. package/dist/cli/commands/init.d.ts +3 -0
  55. package/dist/cli/commands/init.d.ts.map +1 -1
  56. package/dist/cli/commands/init.js +239 -129
  57. package/dist/cli/commands/init.js.map +1 -1
  58. package/dist/cli/commands/init.test.js +20 -0
  59. package/dist/cli/commands/init.test.js.map +1 -1
  60. package/dist/cli/commands/next.d.ts +1 -0
  61. package/dist/cli/commands/next.d.ts.map +1 -1
  62. package/dist/cli/commands/next.js +40 -4
  63. package/dist/cli/commands/next.js.map +1 -1
  64. package/dist/cli/commands/next.test.js +151 -0
  65. package/dist/cli/commands/next.test.js.map +1 -1
  66. package/dist/cli/commands/reset.d.ts +1 -0
  67. package/dist/cli/commands/reset.d.ts.map +1 -1
  68. package/dist/cli/commands/reset.js +77 -29
  69. package/dist/cli/commands/reset.js.map +1 -1
  70. package/dist/cli/commands/rework.d.ts +1 -0
  71. package/dist/cli/commands/rework.d.ts.map +1 -1
  72. package/dist/cli/commands/rework.js +16 -2
  73. package/dist/cli/commands/rework.js.map +1 -1
  74. package/dist/cli/commands/run.d.ts +1 -0
  75. package/dist/cli/commands/run.d.ts.map +1 -1
  76. package/dist/cli/commands/run.js +65 -13
  77. package/dist/cli/commands/run.js.map +1 -1
  78. package/dist/cli/commands/run.test.js +192 -3
  79. package/dist/cli/commands/run.test.js.map +1 -1
  80. package/dist/cli/commands/skip.d.ts +1 -0
  81. package/dist/cli/commands/skip.d.ts.map +1 -1
  82. package/dist/cli/commands/skip.js +24 -7
  83. package/dist/cli/commands/skip.js.map +1 -1
  84. package/dist/cli/commands/status.d.ts +1 -0
  85. package/dist/cli/commands/status.d.ts.map +1 -1
  86. package/dist/cli/commands/status.js +51 -4
  87. package/dist/cli/commands/status.js.map +1 -1
  88. package/dist/cli/commands/status.test.js +128 -0
  89. package/dist/cli/commands/status.test.js.map +1 -1
  90. package/dist/cli/guards-coverage.test.d.ts +2 -0
  91. package/dist/cli/guards-coverage.test.d.ts.map +1 -0
  92. package/dist/cli/guards-coverage.test.js +26 -0
  93. package/dist/cli/guards-coverage.test.js.map +1 -0
  94. package/dist/cli/guards-integration.test.d.ts +2 -0
  95. package/dist/cli/guards-integration.test.d.ts.map +1 -0
  96. package/dist/cli/guards-integration.test.js +178 -0
  97. package/dist/cli/guards-integration.test.js.map +1 -0
  98. package/dist/cli/guards.d.ts +13 -0
  99. package/dist/cli/guards.d.ts.map +1 -0
  100. package/dist/cli/guards.js +70 -0
  101. package/dist/cli/guards.js.map +1 -0
  102. package/dist/cli/guards.test.d.ts +2 -0
  103. package/dist/cli/guards.test.d.ts.map +1 -0
  104. package/dist/cli/guards.test.js +136 -0
  105. package/dist/cli/guards.test.js.map +1 -0
  106. package/dist/cli/init-flag-families.d.ts +1 -1
  107. package/dist/cli/init-flag-families.d.ts.map +1 -1
  108. package/dist/cli/init-flag-families.js +4 -1
  109. package/dist/cli/init-flag-families.js.map +1 -1
  110. package/dist/cli/init-flag-families.test.js +10 -0
  111. package/dist/cli/init-flag-families.test.js.map +1 -1
  112. package/dist/cli/shutdown.d.ts +2 -3
  113. package/dist/cli/shutdown.d.ts.map +1 -1
  114. package/dist/cli/shutdown.js +14 -11
  115. package/dist/cli/shutdown.js.map +1 -1
  116. package/dist/cli/shutdown.test.js +2 -4
  117. package/dist/cli/shutdown.test.js.map +1 -1
  118. package/dist/config/schema.d.ts +12122 -288
  119. package/dist/config/schema.d.ts.map +1 -1
  120. package/dist/config/schema.js +74 -79
  121. package/dist/config/schema.js.map +1 -1
  122. package/dist/config/schema.test.js +230 -1
  123. package/dist/config/schema.test.js.map +1 -1
  124. package/dist/config/validators/backend.d.ts +4 -0
  125. package/dist/config/validators/backend.d.ts.map +1 -0
  126. package/dist/config/validators/backend.js +14 -0
  127. package/dist/config/validators/backend.js.map +1 -0
  128. package/dist/config/validators/browser-extension.d.ts +4 -0
  129. package/dist/config/validators/browser-extension.d.ts.map +1 -0
  130. package/dist/config/validators/browser-extension.js +24 -0
  131. package/dist/config/validators/browser-extension.js.map +1 -0
  132. package/dist/config/validators/cli.d.ts +4 -0
  133. package/dist/config/validators/cli.d.ts.map +1 -0
  134. package/dist/config/validators/cli.js +14 -0
  135. package/dist/config/validators/cli.js.map +1 -0
  136. package/dist/config/validators/data-pipeline.d.ts +4 -0
  137. package/dist/config/validators/data-pipeline.d.ts.map +1 -0
  138. package/dist/config/validators/data-pipeline.js +14 -0
  139. package/dist/config/validators/data-pipeline.js.map +1 -0
  140. package/dist/config/validators/game.d.ts +4 -0
  141. package/dist/config/validators/game.d.ts.map +1 -0
  142. package/dist/config/validators/game.js +14 -0
  143. package/dist/config/validators/game.js.map +1 -0
  144. package/dist/config/validators/index.d.ts +7 -0
  145. package/dist/config/validators/index.d.ts.map +1 -0
  146. package/dist/config/validators/index.js +27 -0
  147. package/dist/config/validators/index.js.map +1 -0
  148. package/dist/config/validators/library.d.ts +4 -0
  149. package/dist/config/validators/library.d.ts.map +1 -0
  150. package/dist/config/validators/library.js +25 -0
  151. package/dist/config/validators/library.js.map +1 -0
  152. package/dist/config/validators/ml.d.ts +4 -0
  153. package/dist/config/validators/ml.d.ts.map +1 -0
  154. package/dist/config/validators/ml.js +31 -0
  155. package/dist/config/validators/ml.js.map +1 -0
  156. package/dist/config/validators/mobile-app.d.ts +4 -0
  157. package/dist/config/validators/mobile-app.d.ts.map +1 -0
  158. package/dist/config/validators/mobile-app.js +14 -0
  159. package/dist/config/validators/mobile-app.js.map +1 -0
  160. package/dist/config/validators/registry.test.d.ts +2 -0
  161. package/dist/config/validators/registry.test.d.ts.map +1 -0
  162. package/dist/config/validators/registry.test.js +26 -0
  163. package/dist/config/validators/registry.test.js.map +1 -0
  164. package/dist/config/validators/research.d.ts +4 -0
  165. package/dist/config/validators/research.d.ts.map +1 -0
  166. package/dist/config/validators/research.js +24 -0
  167. package/dist/config/validators/research.js.map +1 -0
  168. package/dist/config/validators/research.test.d.ts +2 -0
  169. package/dist/config/validators/research.test.d.ts.map +1 -0
  170. package/dist/config/validators/research.test.js +44 -0
  171. package/dist/config/validators/research.test.js.map +1 -0
  172. package/dist/config/validators/types.d.ts +19 -0
  173. package/dist/config/validators/types.d.ts.map +1 -0
  174. package/dist/config/validators/types.js +2 -0
  175. package/dist/config/validators/types.js.map +1 -0
  176. package/dist/config/validators/validators.test.d.ts +2 -0
  177. package/dist/config/validators/validators.test.d.ts.map +1 -0
  178. package/dist/config/validators/validators.test.js +25 -0
  179. package/dist/config/validators/validators.test.js.map +1 -0
  180. package/dist/config/validators/web-app.d.ts +4 -0
  181. package/dist/config/validators/web-app.d.ts.map +1 -0
  182. package/dist/config/validators/web-app.js +31 -0
  183. package/dist/config/validators/web-app.js.map +1 -0
  184. package/dist/core/assembly/context-gatherer.d.ts.map +1 -1
  185. package/dist/core/assembly/context-gatherer.js +4 -2
  186. package/dist/core/assembly/context-gatherer.js.map +1 -1
  187. package/dist/core/assembly/cross-reads.d.ts +58 -0
  188. package/dist/core/assembly/cross-reads.d.ts.map +1 -0
  189. package/dist/core/assembly/cross-reads.js +185 -0
  190. package/dist/core/assembly/cross-reads.js.map +1 -0
  191. package/dist/core/assembly/cross-reads.test.d.ts +2 -0
  192. package/dist/core/assembly/cross-reads.test.d.ts.map +1 -0
  193. package/dist/core/assembly/cross-reads.test.js +383 -0
  194. package/dist/core/assembly/cross-reads.test.js.map +1 -0
  195. package/dist/core/assembly/overlay-loader-structural.test.d.ts +2 -0
  196. package/dist/core/assembly/overlay-loader-structural.test.d.ts.map +1 -0
  197. package/dist/core/assembly/overlay-loader-structural.test.js +114 -0
  198. package/dist/core/assembly/overlay-loader-structural.test.js.map +1 -0
  199. package/dist/core/assembly/overlay-loader.d.ts +17 -3
  200. package/dist/core/assembly/overlay-loader.d.ts.map +1 -1
  201. package/dist/core/assembly/overlay-loader.js +75 -0
  202. package/dist/core/assembly/overlay-loader.js.map +1 -1
  203. package/dist/core/assembly/overlay-resolver.d.ts +2 -2
  204. package/dist/core/assembly/overlay-resolver.d.ts.map +1 -1
  205. package/dist/core/assembly/overlay-resolver.js.map +1 -1
  206. package/dist/core/assembly/overlay-resolver.test.js.map +1 -1
  207. package/dist/core/assembly/overlay-state-resolver.d.ts +5 -0
  208. package/dist/core/assembly/overlay-state-resolver.d.ts.map +1 -1
  209. package/dist/core/assembly/overlay-state-resolver.js +41 -1
  210. package/dist/core/assembly/overlay-state-resolver.js.map +1 -1
  211. package/dist/core/assembly/overlay-state-resolver.test.js +262 -0
  212. package/dist/core/assembly/overlay-state-resolver.test.js.map +1 -1
  213. package/dist/core/assembly/update-mode.d.ts +1 -0
  214. package/dist/core/assembly/update-mode.d.ts.map +1 -1
  215. package/dist/core/assembly/update-mode.js +17 -9
  216. package/dist/core/assembly/update-mode.js.map +1 -1
  217. package/dist/core/dependency/eligibility.d.ts +10 -1
  218. package/dist/core/dependency/eligibility.d.ts.map +1 -1
  219. package/dist/core/dependency/eligibility.js +19 -1
  220. package/dist/core/dependency/eligibility.js.map +1 -1
  221. package/dist/core/dependency/eligibility.test.js +82 -0
  222. package/dist/core/dependency/eligibility.test.js.map +1 -1
  223. package/dist/core/dependency/graph.d.ts +4 -1
  224. package/dist/core/dependency/graph.d.ts.map +1 -1
  225. package/dist/core/dependency/graph.js +7 -1
  226. package/dist/core/dependency/graph.js.map +1 -1
  227. package/dist/core/dependency/graph.test.js +29 -0
  228. package/dist/core/dependency/graph.test.js.map +1 -1
  229. package/dist/core/pipeline/global-steps.d.ts +7 -0
  230. package/dist/core/pipeline/global-steps.d.ts.map +1 -0
  231. package/dist/core/pipeline/global-steps.js +18 -0
  232. package/dist/core/pipeline/global-steps.js.map +1 -0
  233. package/dist/core/pipeline/resolver.d.ts +1 -0
  234. package/dist/core/pipeline/resolver.d.ts.map +1 -1
  235. package/dist/core/pipeline/resolver.js +51 -6
  236. package/dist/core/pipeline/resolver.js.map +1 -1
  237. package/dist/core/pipeline/types.d.ts +5 -1
  238. package/dist/core/pipeline/types.d.ts.map +1 -1
  239. package/dist/e2e/cross-service-references.test.d.ts +22 -0
  240. package/dist/e2e/cross-service-references.test.d.ts.map +1 -0
  241. package/dist/e2e/cross-service-references.test.js +188 -0
  242. package/dist/e2e/cross-service-references.test.js.map +1 -0
  243. package/dist/e2e/multi-service-pipeline.test.d.ts +10 -0
  244. package/dist/e2e/multi-service-pipeline.test.d.ts.map +1 -0
  245. package/dist/e2e/multi-service-pipeline.test.js +185 -0
  246. package/dist/e2e/multi-service-pipeline.test.js.map +1 -0
  247. package/dist/e2e/project-type-overlays.test.js +68 -0
  248. package/dist/e2e/project-type-overlays.test.js.map +1 -1
  249. package/dist/e2e/service-execution.test.d.ts +15 -0
  250. package/dist/e2e/service-execution.test.d.ts.map +1 -0
  251. package/dist/e2e/service-execution.test.js +219 -0
  252. package/dist/e2e/service-execution.test.js.map +1 -0
  253. package/dist/e2e/service-manifest.test.d.ts +19 -0
  254. package/dist/e2e/service-manifest.test.d.ts.map +1 -0
  255. package/dist/e2e/service-manifest.test.js +166 -0
  256. package/dist/e2e/service-manifest.test.js.map +1 -0
  257. package/dist/project/__frozen-schemas__/schema-v3.9.2.d.ts +224 -224
  258. package/dist/project/frontmatter.d.ts.map +1 -1
  259. package/dist/project/frontmatter.js +11 -0
  260. package/dist/project/frontmatter.js.map +1 -1
  261. package/dist/project/frontmatter.test.js +71 -0
  262. package/dist/project/frontmatter.test.js.map +1 -1
  263. package/dist/state/completion.d.ts +1 -1
  264. package/dist/state/completion.d.ts.map +1 -1
  265. package/dist/state/completion.js +10 -8
  266. package/dist/state/completion.js.map +1 -1
  267. package/dist/state/decision-logger.d.ts +3 -2
  268. package/dist/state/decision-logger.d.ts.map +1 -1
  269. package/dist/state/decision-logger.js +12 -11
  270. package/dist/state/decision-logger.js.map +1 -1
  271. package/dist/state/ensure-v3-migration.d.ts +9 -0
  272. package/dist/state/ensure-v3-migration.d.ts.map +1 -0
  273. package/dist/state/ensure-v3-migration.js +35 -0
  274. package/dist/state/ensure-v3-migration.js.map +1 -0
  275. package/dist/state/lock-manager.d.ts +5 -4
  276. package/dist/state/lock-manager.d.ts.map +1 -1
  277. package/dist/state/lock-manager.js +11 -11
  278. package/dist/state/lock-manager.js.map +1 -1
  279. package/dist/state/rework-manager.d.ts +1 -2
  280. package/dist/state/rework-manager.d.ts.map +1 -1
  281. package/dist/state/rework-manager.js +4 -5
  282. package/dist/state/rework-manager.js.map +1 -1
  283. package/dist/state/state-manager.d.ts +25 -1
  284. package/dist/state/state-manager.d.ts.map +1 -1
  285. package/dist/state/state-manager.js +86 -12
  286. package/dist/state/state-manager.js.map +1 -1
  287. package/dist/state/state-manager.test.js +278 -0
  288. package/dist/state/state-manager.test.js.map +1 -1
  289. package/dist/state/state-migration-v3.d.ts +22 -0
  290. package/dist/state/state-migration-v3.d.ts.map +1 -0
  291. package/dist/state/state-migration-v3.js +82 -0
  292. package/dist/state/state-migration-v3.js.map +1 -0
  293. package/dist/state/state-migration-v3.test.d.ts +2 -0
  294. package/dist/state/state-migration-v3.test.d.ts.map +1 -0
  295. package/dist/state/state-migration-v3.test.js +196 -0
  296. package/dist/state/state-migration-v3.test.js.map +1 -0
  297. package/dist/state/state-migration.d.ts.map +1 -1
  298. package/dist/state/state-migration.js +11 -6
  299. package/dist/state/state-migration.js.map +1 -1
  300. package/dist/state/state-migration.test.js +47 -2
  301. package/dist/state/state-migration.test.js.map +1 -1
  302. package/dist/state/state-path-resolver.d.ts +23 -0
  303. package/dist/state/state-path-resolver.d.ts.map +1 -0
  304. package/dist/state/state-path-resolver.js +36 -0
  305. package/dist/state/state-path-resolver.js.map +1 -0
  306. package/dist/state/state-path-resolver.test.d.ts +2 -0
  307. package/dist/state/state-path-resolver.test.d.ts.map +1 -0
  308. package/dist/state/state-path-resolver.test.js +78 -0
  309. package/dist/state/state-path-resolver.test.js.map +1 -0
  310. package/dist/state/state-version-dispatch.d.ts +17 -0
  311. package/dist/state/state-version-dispatch.d.ts.map +1 -0
  312. package/dist/state/state-version-dispatch.js +27 -0
  313. package/dist/state/state-version-dispatch.js.map +1 -0
  314. package/dist/state/state-version-dispatch.test.d.ts +2 -0
  315. package/dist/state/state-version-dispatch.test.d.ts.map +1 -0
  316. package/dist/state/state-version-dispatch.test.js +40 -0
  317. package/dist/state/state-version-dispatch.test.js.map +1 -0
  318. package/dist/types/config.d.ts +25 -3
  319. package/dist/types/config.d.ts.map +1 -1
  320. package/dist/types/config.test.js +13 -1
  321. package/dist/types/config.test.js.map +1 -1
  322. package/dist/types/dependency.d.ts +5 -0
  323. package/dist/types/dependency.d.ts.map +1 -1
  324. package/dist/types/frontmatter.d.ts +5 -0
  325. package/dist/types/frontmatter.d.ts.map +1 -1
  326. package/dist/types/lock.d.ts +1 -1
  327. package/dist/types/lock.d.ts.map +1 -1
  328. package/dist/types/state.d.ts +1 -1
  329. package/dist/types/state.d.ts.map +1 -1
  330. package/dist/utils/artifact-path.d.ts +19 -0
  331. package/dist/utils/artifact-path.d.ts.map +1 -0
  332. package/dist/utils/artifact-path.js +95 -0
  333. package/dist/utils/artifact-path.js.map +1 -0
  334. package/dist/utils/artifact-path.test.d.ts +2 -0
  335. package/dist/utils/artifact-path.test.d.ts.map +1 -0
  336. package/dist/utils/artifact-path.test.js +138 -0
  337. package/dist/utils/artifact-path.test.js.map +1 -0
  338. package/dist/utils/errors.d.ts +1 -1
  339. package/dist/utils/errors.d.ts.map +1 -1
  340. package/dist/utils/errors.js +5 -2
  341. package/dist/utils/errors.js.map +1 -1
  342. package/dist/utils/user-errors.d.ts +46 -0
  343. package/dist/utils/user-errors.d.ts.map +1 -0
  344. package/dist/utils/user-errors.js +76 -0
  345. package/dist/utils/user-errors.js.map +1 -0
  346. package/dist/utils/user-errors.test.d.ts +2 -0
  347. package/dist/utils/user-errors.test.d.ts.map +1 -0
  348. package/dist/utils/user-errors.test.js +74 -0
  349. package/dist/utils/user-errors.test.js.map +1 -0
  350. package/dist/validation/index.d.ts.map +1 -1
  351. package/dist/validation/index.js +16 -0
  352. package/dist/validation/index.js.map +1 -1
  353. package/dist/validation/index.test.js +48 -0
  354. package/dist/validation/index.test.js.map +1 -1
  355. package/dist/validation/state-validator.d.ts +5 -2
  356. package/dist/validation/state-validator.d.ts.map +1 -1
  357. package/dist/validation/state-validator.js +18 -20
  358. package/dist/validation/state-validator.js.map +1 -1
  359. package/dist/validation/state-validator.test.js +31 -2
  360. package/dist/validation/state-validator.test.js.map +1 -1
  361. package/dist/wizard/copy/backend.d.ts.map +1 -1
  362. package/dist/wizard/copy/backend.js +12 -0
  363. package/dist/wizard/copy/backend.js.map +1 -1
  364. package/dist/wizard/flags.d.ts +1 -0
  365. package/dist/wizard/flags.d.ts.map +1 -1
  366. package/dist/wizard/questions.d.ts.map +1 -1
  367. package/dist/wizard/questions.js +5 -1
  368. package/dist/wizard/questions.js.map +1 -1
  369. package/dist/wizard/questions.test.js +45 -2
  370. package/dist/wizard/questions.test.js.map +1 -1
  371. package/dist/wizard/wizard.d.ts +23 -0
  372. package/dist/wizard/wizard.d.ts.map +1 -1
  373. package/dist/wizard/wizard.js +85 -47
  374. package/dist/wizard/wizard.js.map +1 -1
  375. package/dist/wizard/wizard.test.js +186 -1
  376. package/dist/wizard/wizard.test.js.map +1 -1
  377. package/package.json +1 -1
@@ -0,0 +1,710 @@
1
+ ---
2
+ name: multi-service-resilience
3
+ description: Circuit breakers, bulkheads, timeout budgets, and failure isolation strategies
4
+ topics: [circuit-breakers, bulkheads, timeout-budgets, failure-isolation, retry-storms]
5
+ ---
6
+
7
+ ## Summary
8
+
9
+ In a multi-service system, any individual service will fail. The question is whether that failure stays contained or cascades into a full system outage. Resilience patterns exist to answer that question.
10
+
11
+ **Circuit breakers** wrap outbound calls and monitor failure rates. When failures exceed a threshold, the circuit opens and returns immediate failures instead of waiting for timeouts. Three states: closed (normal), open (failing fast), half-open (probing recovery).
12
+
13
+ **Bulkheads** partition resources so a misbehaving integration cannot consume all threads and connections. Dedicate separate concurrency limits per downstream service — a payments outage fills only the payments pool, not the global pool.
14
+
15
+ **Timeout budgets:** Every user-facing request has a total time budget (e.g., 2000ms). Propagate the absolute deadline via `x-request-deadline` header so all services share one clock. Timeouts must decrease with call depth — a chain of five services with 5000ms timeouts each can take 25 seconds to fail.
16
+
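A minimal sketch of deadline propagation, assuming the `x-request-deadline` header carries an absolute Unix timestamp in milliseconds (the text names the header but not its encoding). The helper names `deadlineFromHeaders`, `remainingBudgetMs`, and `outboundHeaders`, and the 50ms safety margin, are illustrative and not part of any library.

```typescript
// Assumption: the header value is an absolute epoch timestamp in milliseconds.
const DEADLINE_HEADER = 'x-request-deadline';
const DEFAULT_BUDGET_MS = 2000; // total budget for a user-facing request

type HeaderMap = Record<string, string | undefined>;

// Read the caller's deadline, or start a fresh budget at the edge of the system.
function deadlineFromHeaders(headers: HeaderMap, now: number = Date.now()): number {
  const raw = headers[DEADLINE_HEADER];
  const parsed = raw ? Number(raw) : NaN;
  return Number.isFinite(parsed) ? parsed : now + DEFAULT_BUDGET_MS;
}

// Time left for downstream work, minus a margin reserved for local processing.
function remainingBudgetMs(deadline: number, now: number = Date.now(), marginMs = 50): number {
  return Math.max(0, deadline - now - marginMs);
}

// Forward the same absolute deadline so every hop measures against one clock.
function outboundHeaders(deadline: number): HeaderMap {
  return { [DEADLINE_HEADER]: String(deadline) };
}
```

Each hop would use `remainingBudgetMs` as the timeout for its own outbound call, so timeouts shrink naturally with call depth. An absolute deadline only works if service clocks are reasonably synchronized.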
17
+ **Graceful degradation:** Design each feature for full, degraded, and unavailable modes. Use `Promise.allSettled` for parallel enrichment calls so one failure doesn't cancel all others. Return stale cache or placeholder data for non-critical enrichments when the upstream is unavailable.
18
+
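The degradation approach above can be sketched as follows. The product, reviews, and recommendations domain, the `loadProductView` function, and the injected `deps` are hypothetical; only `Promise.allSettled` is a standard API.

```typescript
interface ProductView {
  product: { id: string; name: string };
  reviews: string[];          // non-critical enrichment
  recommendations: string[];  // non-critical enrichment
  degraded: boolean;          // true when any enrichment fell back
}

// Resolve a settled result to its value, or to a fallback when it rejected.
function valueOr<T>(result: PromiseSettledResult<T>, fallback: T): T {
  return result.status === 'fulfilled' ? result.value : fallback;
}

async function loadProductView(
  id: string,
  deps: {
    getProduct: (id: string) => Promise<{ id: string; name: string }>;
    getReviews: (id: string) => Promise<string[]>;
    getRecommendations: (id: string) => Promise<string[]>;
  },
): Promise<ProductView> {
  // The core record is critical: let its failure propagate to the caller.
  const product = await deps.getProduct(id);

  // Enrichments run in parallel; one rejection does not cancel the other.
  const [reviews, recs] = await Promise.allSettled([
    deps.getReviews(id),
    deps.getRecommendations(id),
  ]);

  return {
    product,
    reviews: valueOr(reviews, []),
    recommendations: valueOr(recs, []),
    degraded: reviews.status === 'rejected' || recs.status === 'rejected',
  };
}
```

The empty-array fallbacks stand in for the placeholder data mentioned above; a stale cache lookup could be substituted at the same point.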
19
+ **Retry storm prevention:** Add full jitter to exponential backoff. Coordinate retry budgets (allow at most 20% of traffic to be retries) in high-traffic services. Pair every retry block with a circuit breaker — retries should stop when the circuit opens.
20
+
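A sketch of full-jitter backoff and a simple retry budget, under stated assumptions: the 100ms base, 10s cap, and the names `fullJitterDelayMs` and `RetryBudget` are illustrative, and the budget counts over the object's lifetime where a production version would more likely use a sliding window.

```typescript
// Full jitter: pick a uniform random delay in [0, min(cap, base * 2^attempt)).
function fullJitterDelayMs(
  attempt: number, // 0 for the first retry
  baseMs = 100,
  capMs = 10_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retry budget: permit a retry only while retries stay within `ratio` of requests seen.
class RetryBudget {
  private requests = 0;
  private retries = 0;
  private readonly ratio: number;

  constructor(ratio = 0.2) {
    this.ratio = ratio;
  }

  // Call once per original (non-retry) request.
  recordRequest(): void {
    this.requests++;
  }

  // Returns true and counts the retry if the budget allows it.
  tryAcquireRetry(): boolean {
    if (this.retries + 1 > this.requests * this.ratio) return false;
    this.retries++;
    return true;
  }
}
```

A caller would check `tryAcquireRetry()` before sleeping for `fullJitterDelayMs(attempt)`, and skip the retry entirely when the budget is exhausted or the circuit is open.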
21
+ **Observability for resilience:** Every circuit breaker, bulkhead, and retry must emit metrics (`circuit_breaker_state`, `bulkhead_rejected_calls_total`, `retry_attempts_total`) so you know when they activate.
22
+
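One way to emit the three metrics named above, sketched against a hypothetical `MetricsSink` interface (not a real library API; adapt it to whatever metrics client the service already uses). The numeric encoding of circuit states is also an assumption.

```typescript
// Hypothetical sink interface standing in for a real metrics client.
interface MetricsSink {
  gauge(name: string, value: number, labels: Record<string, string>): void;
  increment(name: string, labels: Record<string, string>): void;
}

type BreakerState = 'closed' | 'half-open' | 'open';

// Assumed numeric encoding so the state can be graphed and alerted on.
const STATE_VALUE: Record<BreakerState, number> = { closed: 0, 'half-open': 1, open: 2 };

function reportCircuitState(sink: MetricsSink, breaker: string, state: BreakerState): void {
  sink.gauge('circuit_breaker_state', STATE_VALUE[state], { breaker });
}

function reportBulkheadRejection(sink: MetricsSink, bulkhead: string): void {
  sink.increment('bulkhead_rejected_calls_total', { bulkhead });
}

function reportRetryAttempt(sink: MetricsSink, target: string): void {
  sink.increment('retry_attempts_total', { target });
}

// In-memory sink, useful in tests to assert that a pattern actually fired.
class InMemorySink implements MetricsSink {
  readonly gauges = new Map<string, number>();
  readonly counters = new Map<string, number>();

  gauge(name: string, value: number, labels: Record<string, string>): void {
    this.gauges.set(this.key(name, labels), value);
  }

  increment(name: string, labels: Record<string, string>): void {
    const k = this.key(name, labels);
    this.counters.set(k, (this.counters.get(k) ?? 0) + 1);
  }

  private key(name: string, labels: Record<string, string>): string {
    return `${name}${JSON.stringify(labels)}`;
  }
}
```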
23
+ ## Deep Guidance
24
+
25
+ ## Circuit Breaker Pattern
26
+
27
+ A circuit breaker wraps outbound calls to a downstream service. It monitors failure rates and, when they exceed a threshold, stops forwarding calls — returning an immediate failure instead of waiting for timeouts to accumulate.
28
+
29
+ ### States
30
+
31
+ A circuit breaker has three states:
32
+
33
+ **Closed (normal):** All calls pass through. Failures are counted. When the failure rate exceeds the threshold within the measurement window, the circuit opens.
34
+
35
+ **Open (tripped):** All calls fail immediately without attempting the downstream call. The caller receives a fast failure. After a configured timeout, the circuit transitions to half-open.
36
+
37
+ **Half-Open (probing):** A limited number of probe calls are allowed through to the downstream service. If they succeed, the circuit closes. If they fail, the circuit returns to open.
38
+
39
+ ```typescript
40
+ type CircuitState = 'closed' | 'open' | 'half-open';
41
+
42
+ interface CircuitBreakerConfig {
43
+ // Failure rate (0 to 1) within the window at which the circuit opens
44
+ failureThreshold: number;
45
+ // Rolling window duration (ms) for counting failures
46
+ windowMs: number;
47
+ // Minimum number of calls in the window before the threshold applies
48
+ // Prevents opening on 1 failure out of 1 call (100% failure rate)
49
+ minimumCallCount: number;
50
+ // How long to stay open before probing (ms)
51
+ recoveryTimeoutMs: number;
52
+ // How many probe calls in half-open state before deciding
53
+ probeCount: number;
54
+ // How many probes must succeed to close the circuit
55
+ probeSuccessThreshold: number;
56
+ }
57
+
58
+ const DEFAULT_CIRCUIT_CONFIG: CircuitBreakerConfig = {
59
+ failureThreshold: 0.5, // 50% failure rate triggers open
60
+ windowMs: 60_000, // 1-minute rolling window
61
+ minimumCallCount: 10, // Need at least 10 calls before evaluating
62
+ recoveryTimeoutMs: 30_000, // Wait 30s before probing
63
+ probeCount: 3,
64
+ probeSuccessThreshold: 2,
65
+ };
66
+
67
+ class CircuitBreaker {
68
+ private state: CircuitState = 'closed';
69
+ private callWindow: Array<{ timestamp: number; success: boolean }> = [];
70
+ private probeAttempts = 0;
71
+ private probeSuccesses = 0;
72
+ private openedAt?: number;
73
+
74
+ constructor(
75
+ private readonly name: string,
76
+ private readonly config: CircuitBreakerConfig,
77
+ ) {}
78
+
79
+ async execute<T>(fn: () => Promise<T>): Promise<T> {
80
+ this.pruneWindow();
81
+
82
+ if (this.state === 'open') {
83
+ const elapsed = Date.now() - (this.openedAt ?? 0);
84
+ if (elapsed < this.config.recoveryTimeoutMs) {
85
+ throw new CircuitOpenError(`Circuit ${this.name} is open — downstream unavailable`);
86
+ }
87
+ this.transitionTo('half-open');
88
+ }
89
+
90
+ if (this.state === 'half-open') {
91
+ if (this.probeAttempts >= this.config.probeCount) {
92
+ throw new CircuitOpenError(`Circuit ${this.name} probe limit reached — still open`);
93
+ }
94
+ this.probeAttempts++;
95
+ }
96
+
97
+ try {
98
+ const result = await fn();
99
+ this.recordSuccess();
100
+ return result;
101
+ } catch (error) {
102
+ this.recordFailure();
103
+ throw error;
104
+ }
105
+ }
106
+
107
+ private recordSuccess() {
108
+ this.callWindow.push({ timestamp: Date.now(), success: true });
109
+
110
+ if (this.state === 'half-open') {
111
+ this.probeSuccesses++;
112
+ if (this.probeSuccesses >= this.config.probeSuccessThreshold) {
113
+ this.transitionTo('closed');
114
+ }
115
+ }
116
+ }
117
+
118
+ private recordFailure() {
119
+ this.callWindow.push({ timestamp: Date.now(), success: false });
120
+
121
+ if (this.state === 'half-open') {
122
+ this.transitionTo('open');
123
+ return;
124
+ }
125
+
126
+ if (this.state === 'closed' && this.shouldOpen()) {
127
+ this.transitionTo('open');
128
+ }
129
+ }
130
+
131
+ private shouldOpen(): boolean {
132
+ if (this.callWindow.length < this.config.minimumCallCount) return false;
133
+ const failures = this.callWindow.filter(c => !c.success).length;
134
+ return failures / this.callWindow.length >= this.config.failureThreshold;
135
+ }
136
+
137
+ private transitionTo(next: CircuitState) {
138
+ this.state = next;
139
+ if (next === 'open') {
140
+ this.openedAt = Date.now();
141
+ this.probeAttempts = 0;
142
+ this.probeSuccesses = 0;
143
+ } else if (next === 'closed') {
144
+ this.callWindow = [];
145
+ this.probeAttempts = 0;
146
+ this.probeSuccesses = 0;
147
+ }
148
+ }
149
+
150
+ private pruneWindow() {
151
+ const cutoff = Date.now() - this.config.windowMs;
152
+ this.callWindow = this.callWindow.filter(c => c.timestamp > cutoff);
153
+ }
154
+
155
+ getState(): CircuitState { return this.state; }
156
+ }
157
+
158
+ class CircuitOpenError extends Error {
159
+ constructor(message: string) {
160
+ super(message);
161
+ this.name = 'CircuitOpenError';
162
+ }
163
+ }
164
+ ```
165
+
166
+ **Trade-offs:**
167
+ - (+) Prevents cascading failures: a slow downstream service cannot hold the upstream's threads indefinitely.
168
+ - (+) Fast failure in open state gives the upstream service time to shed load and the downstream service time to recover.
169
+ - (+) Half-open probing enables automatic recovery without manual intervention.
170
+ - (-) The circuit can open on transient spikes, not just real outages. Tune `minimumCallCount` and `windowMs` carefully for your traffic patterns.
171
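+ - (-) State is per-instance by default. In a horizontally scaled service, each replica tracks its own circuit state, so replicas can disagree about whether the downstream is healthy. Use a shared store such as Redis, or a service mesh (Istio, Envoy), if you need cluster-wide state.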
172
+ - (-) `CircuitOpenError` is a new error type callers must handle. Forgetting to catch it surfaces as an unhandled exception.
173
+
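The last trade-off above (callers must handle the new error type) can be sketched as a small wrapper. `withCircuitFallback` is a hypothetical helper, and the `CircuitOpenError` class here is a local stand-in for the one defined in the circuit breaker example so the snippet runs on its own.

```typescript
// Stand-in for the CircuitOpenError class from the circuit breaker example.
class CircuitOpenError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'CircuitOpenError';
  }
}

// Run `call`; if the breaker rejected it, serve `fallback` instead.
// Any other error is rethrown so real downstream failures stay visible.
async function withCircuitFallback<T>(
  call: () => Promise<T>,
  fallback: () => T,
): Promise<T> {
  try {
    return await call();
  } catch (error) {
    if (error instanceof CircuitOpenError) return fallback();
    throw error;
  }
}
```

In practice `call` would be something like `() => breaker.execute(fetchFn)`, and `fallback` would return cached or placeholder data. Catching only `CircuitOpenError` keeps genuine downstream errors from being silently masked.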
174
+ **Configuration guidelines:**
175
+ - `failureThreshold: 0.5` is a reasonable default for most services. Lower it (0.25) for critical payments or auth paths. Raise it (0.75) for non-critical enrichment services where partial data is acceptable.
176
+ - `recoveryTimeoutMs: 30_000` gives a recovering service 30s of breathing room. Increase for services that are known to take longer to restart.
177
+ - `minimumCallCount: 10` prevents false opens during low-traffic periods. Reduce it in environments where services handle fewer requests (dev, staging).
178
+
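The guidelines above could be captured as presets, as in this sketch. The `Criticality` tiers, `BreakerTuning`, and `tuningFor` are illustrative names; the threshold values are the ones suggested in the guidelines.

```typescript
// Mirrors the fields tuned in the guidelines; other breaker fields are omitted for brevity.
interface BreakerTuning {
  failureThreshold: number; // failure rate (0 to 1) that opens the circuit
  recoveryTimeoutMs: number;
  minimumCallCount: number;
}

type Criticality = 'critical' | 'standard' | 'non-critical';

const BASE_TUNING: BreakerTuning = {
  failureThreshold: 0.5,
  recoveryTimeoutMs: 30_000,
  minimumCallCount: 10,
};

// Lower threshold for payments/auth paths, higher for optional enrichment.
const THRESHOLD_BY_CRITICALITY: Record<Criticality, number> = {
  critical: 0.25,
  standard: 0.5,
  'non-critical': 0.75,
};

// Explicit overrides win over the tier preset, which wins over the base values.
function tuningFor(
  criticality: Criticality,
  overrides: Partial<BreakerTuning> = {},
): BreakerTuning {
  return {
    ...BASE_TUNING,
    failureThreshold: THRESHOLD_BY_CRITICALITY[criticality],
    ...overrides,
  };
}
```

For example, `tuningFor('critical')` yields a 0.25 threshold, and `tuningFor('non-critical', { minimumCallCount: 3 })` suits a low-traffic staging environment.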
179
+ ## Bulkhead Isolation
180
+
181
+ Bulkheads partition resources so that failures in one integration cannot exhaust the resources needed by other integrations. The name comes from ship design: watertight compartments prevent a single hull breach from sinking the entire vessel.
182
+
183
+ ### Thread Pool Bulkhead
184
+
185
+ Dedicate a separate thread pool (or async concurrency limit) to each downstream service. A slow downstream service fills its own pool, not the global pool.
186
+
187
+ ```typescript
188
+ class Semaphore {
189
+ private permits: number;
190
+ private waitQueue: Array<() => void> = [];
191
+
192
+ constructor(private maxConcurrency: number) {
193
+ this.permits = maxConcurrency;
194
+ }
195
+
196
+ async acquire(): Promise<void> {
197
+ if (this.permits > 0) {
198
+ this.permits--;
199
+ return;
200
+ }
201
+ return new Promise(resolve => this.waitQueue.push(resolve));
202
+ }
203
+
204
+ release(): void {
205
+ if (this.waitQueue.length > 0) {
206
+ const next = this.waitQueue.shift()!;
207
+ next();
208
+ } else {
209
+ this.permits++;
210
+ }
211
+ }
212
+ }
213
+
214
+ class BulkheadExecutor {
215
+ private semaphore: Semaphore;
216
+ private activeCount = 0;
217
+ private rejectedCount = 0;
218
+
219
+ constructor(
220
+ private readonly name: string,
221
+ private readonly maxConcurrency: number,
222
+ private readonly maxQueueSize: number,
223
+ ) {
224
+ this.semaphore = new Semaphore(maxConcurrency);
225
+ }
226
+
227
+ async execute<T>(fn: () => Promise<T>): Promise<T> {
228
+ const queueDepth = this.semaphore['waitQueue'].length; // bracket access bypasses `private`; a public queue-length getter on Semaphore would be cleaner
229
+ if (queueDepth >= this.maxQueueSize) {
230
+ this.rejectedCount++;
231
+ throw new BulkheadFullError(
232
+ `Bulkhead ${this.name} queue full (${this.maxConcurrency} concurrent + ${this.maxQueueSize} queued)`
233
+ );
234
+ }
235
+
236
+ await this.semaphore.acquire();
237
+ this.activeCount++;
238
+
239
+ try {
240
+ return await fn();
241
+ } finally {
242
+ this.activeCount--;
243
+ this.semaphore.release();
244
+ }
245
+ }
246
+
247
+ getMetrics() {
248
+ return {
249
+ name: this.name,
250
+ active: this.activeCount,
251
+ maxConcurrency: this.maxConcurrency,
252
+ rejected: this.rejectedCount,
253
+ };
254
+ }
255
+ }
256
+
257
+ class BulkheadFullError extends Error {
258
+ constructor(message: string) {
259
+ super(message);
260
+ this.name = 'BulkheadFullError';
261
+ }
262
+ }
263
+
264
+ // Usage: each downstream gets its own bulkhead
265
+ const paymentsBulkhead = new BulkheadExecutor('payments-service', 10, 20);
266
+ const inventoryBulkhead = new BulkheadExecutor('inventory-service', 20, 40);
267
+ const notificationsBulkhead = new BulkheadExecutor('notifications-service', 5, 10);
268
+
269
+ async function chargeAndFulfill(orderId: string) {
270
+ // Payments and inventory each have their own concurrency limit
271
+ // A payments outage cannot consume inventory's capacity
272
+ const [charge, reservation] = await Promise.all([
273
+ paymentsBulkhead.execute(() => paymentsClient.charge(orderId)),
274
+ inventoryBulkhead.execute(() => inventoryClient.reserve(orderId)),
275
+ ]);
276
+ return { charge, reservation };
277
+ }
278
+ ```
279
+
280
+ ### Queue-Based Bulkhead
281
+
282
+ For async workloads, use separate queues per integration. Each queue has its own concurrency and backpressure configuration.
283
+
284
+ ```typescript
285
+ // BullMQ example: separate queues isolate failure domains
286
+ const paymentQueue = new Queue('payment-processing', {
287
+ connection: redis,
288
+ defaultJobOptions: { attempts: 3, backoff: { type: 'exponential', delay: 1000 } },
289
+ });
290
+
291
+ const notificationQueue = new Queue('notification-dispatch', {
292
+ connection: redis,
293
+ defaultJobOptions: { attempts: 5, backoff: { type: 'exponential', delay: 500 } },
294
+ });
295
+
296
+ // Payment worker: limited to 5 concurrent jobs
297
+ const paymentWorker = new Worker('payment-processing', processPayment, {
298
+ connection: redis,
299
+ concurrency: 5, // Bulkhead: max 5 concurrent payment jobs
300
+ limiter: { max: 100, duration: 60_000 }, // 100 jobs/minute rate limit
301
+ });
302
+
303
+ // Notification worker: separate concurrency, isolated from payment issues
304
+ const notificationWorker = new Worker('notification-dispatch', sendNotification, {
305
+ connection: redis,
306
+ concurrency: 20, // Notifications can be higher concurrency
307
+ });
308
+ ```
309
+
310
+ **Trade-offs (bulkheads):**
311
+ - (+) Failure in one integration cannot starve others of thread/connection resources.
312
+ - (+) Per-integration metrics (`activeCount`, `rejectedCount`) make capacity problems visible before they cascade.
313
+ - (+) `BulkheadFullError` sheds load with a controlled rejection rather than an unbounded queue that grows until OOM.
314
+ - (-) Tuning concurrency limits requires load testing. Too low and you throttle unnecessarily; too high and the bulkhead doesn't protect.
315
+ - (-) Adds a queuing layer. Calls that would have failed fast may now wait behind up to `maxQueueSize` queued calls before running, which can increase average latency during partial failures.
316
+ - (-) Per-integration bulkheads multiply operational configuration. Document limits in service manifests.
317
+
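Before load testing, Little's Law (in-flight calls = arrival rate × time per call) gives a first estimate of the concurrency a bulkhead needs. This is a rough sketch: `estimateMaxConcurrency` and its 25% headroom default are assumptions for illustration, and the result is a starting point to validate under load, not a final limit.

```typescript
// Estimate the concurrency limit for a downstream from its expected load.
// requestsPerSecond: expected call rate to the downstream
// latencyMs: typical (or high-percentile) latency of one call
// headroom: extra fraction added on top of the steady-state estimate
function estimateMaxConcurrency(
  requestsPerSecond: number,
  latencyMs: number,
  headroom = 0.25,
): number {
  if (requestsPerSecond <= 0 || latencyMs <= 0) {
    throw new RangeError('requestsPerSecond and latencyMs must be positive');
  }
  // Little's Law: average number of calls in flight at steady state.
  const inFlight = (requestsPerSecond * latencyMs) / 1000;
  return Math.max(1, Math.ceil(inFlight * (1 + headroom)));
}

// 50 req/s at 200ms per call is 10 calls in flight; with 25% headroom the limit is 13.
const suggestedLimit = estimateMaxConcurrency(50, 200);
```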
318
+ ## Timeout Budget Allocation
319
+
320
+ Timeouts must be set at every level of a call chain. A missing timeout is a resource leak waiting to become an outage.
321
+
322
+ ### The Budget Model
323
+
324
+ Every user-facing request has a total time budget. Each hop in the call chain consumes part of that budget. The sum of all hops (including their own processing time) must fit within the total budget.
325
+
326
+ ```
327
+ User SLA: 2,000ms (P99 target)
328
+
329
+ Entry path:
330
+ API Gateway auth + routing: 50ms
331
+ BFF / aggregator overhead: 100ms
332
+
333
+ Parallel downstream calls:
334
+ Order Service: 500ms
335
+ └─ Inventory sub-call: 200ms (within Order's 500ms budget)
336
+ User Profile Service: 300ms (parallel with Order)
337
+ Product Catalog Service: 250ms (parallel with Order)
338
+
339
+ Serialization + network: 100ms
340
+ P99 buffer / headroom: 500ms
341
+ ────
342
+ Total: 1,250ms (750ms of the 2,000ms SLA left unallocated)
343
+ ```
344
+
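The budget arithmetic can be checked mechanically: serial hops add up, while a group of parallel calls costs only its slowest branch. This is a small sketch using the figures from the budget above. The `BudgetStep` type and the helper names are hypothetical.

```typescript
// A budget step is either a single hop or a group of hops that run in parallel.
type BudgetStep =
  | { kind: 'serial'; name: string; ms: number }
  | { kind: 'parallel'; name: string; branches: { name: string; ms: number }[] };

// A parallel group contributes its slowest branch to the critical path.
function stepCost(step: BudgetStep): number {
  return step.kind === 'serial'
    ? step.ms
    : Math.max(0, ...step.branches.map(b => b.ms));
}

function criticalPathMs(steps: BudgetStep[]): number {
  return steps.reduce((sum, step) => sum + stepCost(step), 0);
}

const requestBudget: BudgetStep[] = [
  { kind: 'serial', name: 'gateway auth + routing', ms: 50 },
  { kind: 'serial', name: 'BFF / aggregator overhead', ms: 100 },
  {
    kind: 'parallel',
    name: 'downstream calls',
    branches: [
      { name: 'order service (includes inventory sub-call)', ms: 500 },
      { name: 'user profile service', ms: 300 },
      { name: 'product catalog service', ms: 250 },
    ],
  },
  { kind: 'serial', name: 'serialization + network', ms: 100 },
  { kind: 'serial', name: 'P99 buffer / headroom', ms: 500 },
];

const SLA_MS = 2_000;
const allocatedMs = criticalPathMs(requestBudget); // 1250
const unallocatedMs = SLA_MS - allocatedMs;        // 750
```

A check like this could run in CI so that a budget change that exceeds the SLA fails loudly instead of surfacing in production.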
345
+ ### Deadline Propagation
346
+
347
+ Pass the absolute deadline through the call chain so every service measures against the same cutoff, instead of applying independent per-hop timeouts that can stack:
348
+
349
+ ```typescript
350
+ const DEADLINE_HEADER = 'x-request-deadline';
351
+ const DEFAULT_TIMEOUT_MS = 5_000;
352
+
353
+ // Middleware: attach deadline to incoming requests that don't have one
354
+ function deadlineMiddleware(req: Request, res: Response, next: NextFunction) {
355
+ if (!req.headers[DEADLINE_HEADER]) {
356
+ const deadline = new Date(Date.now() + DEFAULT_TIMEOUT_MS).toISOString();
357
+ req.headers[DEADLINE_HEADER] = deadline;
358
+ }
359
+ next();
360
+ }
361
+
362
+ // Helper: compute remaining timeout when making downstream calls
363
+ function getRemainingMs(req: Request, overhead = 10): number {
364
+ const deadline = req.headers[DEADLINE_HEADER] as string | undefined;
365
+ if (!deadline) return DEFAULT_TIMEOUT_MS;
366
+
367
+ const remaining = new Date(deadline).getTime() - Date.now() - overhead;
368
+ if (remaining <= 0) throw new DeadlineExceededError('Request deadline already exceeded');
369
+ return remaining;
370
+ }
371
+
372
+ // Usage: downstream call uses remaining budget, not a fixed timeout
373
+ async function callInventoryService(req: Request, orderId: string) {
374
+ const timeout = getRemainingMs(req, 20); // subtract 20ms for network overhead
375
+
376
+ return inventoryClient.getAvailability(orderId, {
377
+ headers: { [DEADLINE_HEADER]: req.headers[DEADLINE_HEADER] },
378
+ timeout,
379
+ });
380
+ }
381
+
382
+ class DeadlineExceededError extends Error {
383
+ constructor(message: string) {
384
+ super(message);
385
+ this.name = 'DeadlineExceededError';
386
+ }
387
+ }
388
+ ```
389
+
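One gap worth noting in the helper above: `new Date(value).getTime()` yields `NaN` for a malformed header, and `NaN <= 0` is false, so `getRemainingMs` would return `NaN` as the timeout. A caller-supplied deadline can also be arbitrarily far in the future. The sketch below validates and clamps the header. `resolveDeadlineMs` and its `maxTimeoutMs` parameter are hypothetical names, and falling back to the default on bad input is one possible policy among several.

```typescript
// Parse an untrusted deadline header into epoch milliseconds.
// Falls back to now + maxTimeoutMs when the header is missing or malformed,
// and never returns a deadline further out than now + maxTimeoutMs.
function resolveDeadlineMs(
  header: string | undefined,
  nowMs: number,
  maxTimeoutMs = 5_000,
): number {
  const ceiling = nowMs + maxTimeoutMs;
  if (!header) return ceiling;

  const parsed = Date.parse(header); // NaN for unparseable input
  if (Number.isNaN(parsed)) return ceiling;

  return Math.min(parsed, ceiling);
}
```

Passing `nowMs` explicitly keeps the function deterministic in tests. Production code would pass `Date.now()`.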
390
+ **Trade-offs (timeout budgets):**
391
+ - (+) Deadline propagation ensures the entire call chain fails fast rather than stacking per-hop timeouts.
392
+ - (+) Budget modeling makes timeout decisions explicit rather than scattered magic numbers across the codebase.
393
+ - (-) Requires all services to adopt the deadline header convention. Partial adoption produces incorrect behavior.
394
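+ - (-) Clock skew across services introduces errors in remaining-time calculations, because an absolute deadline is compared against each host's wall clock. Keep hosts synchronized (NTP or similar) and add conservative overhead, or propagate a relative timeout instead, as gRPC does with its `grpc-timeout` header.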
395
+ - (-) Budgets become stale as service performance changes. Revisit quarterly or after each major architectural change.
396
+
397
+ **Timeout anti-patterns:**
398
+ - **No timeout set**: Many HTTP clients default to no timeout or a very long one. An unresponsive downstream holds a connection indefinitely, exhausting the upstream's connection pool.
399
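+ - **Identical timeout at every level**: A request that makes 5 calls in sequence, each with a 5,000ms timeout, can take 25,000ms to fail end-to-end, and in a nested chain the inner services keep working after the outer caller has given up. Timeouts must decrease with depth.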
400
+ - **Timeout without circuit breaker**: Repeated timeouts still attempt the downstream call on every retry. Pair every timeout with a circuit breaker so repeated failures open the circuit.
401
+
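For clients that do not enforce a timeout themselves, a thin wrapper can impose one. This is a minimal sketch built on `AbortController` and `Promise.race`. `withTimeout` and `CallTimeoutError` are hypothetical names, and the wrapped function is assumed to pass the signal to its HTTP client so the underlying request is cancelled as well.

```typescript
class CallTimeoutError extends Error {
  constructor(ms: number) {
    super(`Call exceeded ${ms}ms timeout`);
    this.name = 'CallTimeoutError';
  }
}

// Runs fn with an AbortSignal and rejects with CallTimeoutError once ms has elapsed.
async function withTimeout<T>(
  fn: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // signal the in-flight call to stop
      reject(new CallTimeoutError(ms));
    }, ms);
  });

  try {
    return await Promise.race([fn(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // avoid a stray timer when fn settles first
  }
}
```

The `ms` argument would typically come from the remaining deadline budget rather than a fixed constant.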
402
+ ## Failure Isolation Strategies
403
+
404
+ ### Error Classification
405
+
406
+ Not all errors are the same. Classify errors before deciding on the response:
407
+
408
+ ```typescript
409
+ type ErrorClass =
410
+ | 'transient' // Temporary — retry is likely to succeed
411
+ | 'permanent' // Client or data error — retrying won't help
412
+ | 'overload' // Server is busy — retry with backoff + jitter
413
+ | 'timeout' // Deadline exceeded — check deadline before retrying
414
+ | 'circuit-open' // Downstream unavailable — return fallback, don't retry
415
+ | 'unknown'; // Unclassified — treat conservatively
416
+
417
+ function classifyError(error: unknown): ErrorClass {
418
+ if (error instanceof CircuitOpenError) return 'circuit-open';
419
+ if (error instanceof DeadlineExceededError) return 'timeout';
420
+
421
+ const status = (error as any)?.response?.status as number | undefined;
422
+ if (!status) return 'transient'; // Network error, no response
423
+
424
+ if (status === 429) return 'overload';
425
+ if ([500, 502, 503, 504].includes(status)) return 'transient';
426
+ if ([400, 401, 403, 404, 422].includes(status)) return 'permanent';
427
+ if (status === 408) return 'timeout';
428
+
429
+ return 'unknown';
430
+ }
431
+ ```
432
+
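The comments on each `ErrorClass` member imply a response policy. One way to make that policy explicit is a small decision function. This is a sketch under stated assumptions: the `Action` type, `decideAction`, and the 50ms minimum retry budget are illustrative choices, not part of the original design.

```typescript
type ErrorClass =
  | 'transient' | 'permanent' | 'overload' | 'timeout' | 'circuit-open' | 'unknown';
type Action = 'retry' | 'fallback' | 'fail';

// Map an error class plus the remaining deadline budget to a response.
function decideAction(
  errorClass: ErrorClass,
  remainingMs: number,
  minRetryBudgetMs = 50,
): Action {
  const hasBudget = remainingMs >= minRetryBudgetMs;
  switch (errorClass) {
    case 'permanent':
      return 'fail'; // retrying cannot help
    case 'circuit-open':
      return 'fallback'; // downstream unavailable, do not retry
    case 'transient':
    case 'overload':
      return hasBudget ? 'retry' : 'fallback';
    case 'timeout':
      return hasBudget ? 'retry' : 'fail'; // deadline spent, fail fast
    case 'unknown':
      return 'fallback'; // conservative: no retry
  }
}
```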
433
+ ### Fallback Strategies by Error Class
434
+
435
+ ```typescript
436
+ async function getProductWithFallback(productId: string, req: Request): Promise<Product> {
437
+ try {
438
+ return await catalogBulkhead.execute(() =>
439
+ catalogCircuit.execute(() =>
440
+ callCatalogService(req, productId)
441
+ )
442
+ );
443
+ } catch (error) {
444
+ const errorClass = classifyError(error);
445
+
446
+ switch (errorClass) {
447
+ case 'circuit-open':
448
+ case 'transient':
449
+ // Return cached version if available
450
+ const cached = await cache.get<Product>(`product:${productId}`);
451
+ if (cached) {
452
+ logger.warn({ productId, errorClass }, 'Returning stale cached product');
453
+ return { ...cached, _stale: true };
454
+ }
455
+ // Return a minimal placeholder — better than an error for browsing
456
+ return { id: productId, name: 'Product temporarily unavailable', available: false };
457
+
458
+ case 'permanent':
459
+ // 404 or 422 — propagate, don't retry or cache
460
+ throw error;
461
+
462
+ case 'timeout':
463
+ // Deadline already exceeded — fail fast, don't attempt fallback
464
+ throw error;
465
+
466
+ default:
467
+ // Unknown — return placeholder and log for investigation
468
+ logger.error({ productId, errorClass, error }, 'Unclassified product fetch error');
469
+ return { id: productId, name: 'Product temporarily unavailable', available: false };
470
+ }
471
+ }
472
+ }
473
+ ```
474
+
475
+ ### Graceful Degradation Patterns
476
+
477
+ Design each feature for three modes: full, degraded, and unavailable.
478
+
479
+ ```typescript
480
+ interface ProductPageData {
481
+ product: Product;
482
+ relatedProducts: Product[]; // Optional — degradable
483
+ reviews: Review[]; // Optional — degradable
484
+ inventory: InventoryStatus; // Optional — degradable
485
+ }
486
+
487
+ async function getProductPageData(
488
+ productId: string,
489
+ req: Request
490
+ ): Promise<ProductPageData> {
491
+ // Core product data — non-degradable
492
+ const product = await getProductWithFallback(productId, req);
493
+
494
+ // Optional enrichments — fail independently, don't block the page
495
+ const [relatedProducts, reviews, inventory] = await Promise.allSettled([
496
+ getRelatedProducts(productId, req),
497
+ getProductReviews(productId, req),
498
+ getInventoryStatus(productId, req),
499
+ ]);
500
+
501
+ return {
502
+ product,
503
+ relatedProducts: relatedProducts.status === 'fulfilled' ? relatedProducts.value : [],
504
+ reviews: reviews.status === 'fulfilled' ? reviews.value : [],
505
+ inventory: inventory.status === 'fulfilled'
506
+ ? inventory.value
507
+ : { status: 'unknown', message: 'Inventory status temporarily unavailable' },
508
+ };
509
+ }
510
+ ```
511
+
512
+ **Trade-offs (graceful degradation):**
513
+ - (+) Users receive a working page rather than an error screen when a non-critical service is down.
514
+ - (+) `Promise.allSettled` is the correct primitive — unlike `Promise.all`, it does not short-circuit on failure.
515
+ - (-) Partial data requires careful UI handling. The frontend must know how to render each degraded state.
516
+ - (-) Stale cache fallbacks can show outdated prices or incorrect availability. Set explicit `_stale` flags so the UI can display a freshness warning.
517
+
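To help the frontend pick a rendering, the three modes can be derived from the settled results and returned alongside the data. This is a sketch only: `PageMode` and `derivePageMode` are hypothetical names, and it treats a failed core call as `unavailable`, which assumes the core fetch has no fallback of its own.

```typescript
type PageMode = 'full' | 'degraded' | 'unavailable';

// core: result of the non-degradable call; optional: results of the enrichments.
function derivePageMode(
  core: PromiseSettledResult<unknown>,
  optional: PromiseSettledResult<unknown>[],
): PageMode {
  if (core.status === 'rejected') return 'unavailable';
  return optional.every(r => r.status === 'fulfilled') ? 'full' : 'degraded';
}
```

Emitting the mode as a metric label also shows how often users actually see a degraded page.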
518
+ ## Retry Storm Prevention
519
+
520
+ Retry storms occur when many clients simultaneously retry after a shared failure, overwhelming a recovering service with a burst of traffic.
521
+
522
+ ### Jitter and Backoff
523
+
524
+ ```typescript
525
+ interface RetryConfig {
526
+ maxAttempts: number;
527
+ baseDelayMs: number;
528
+ maxDelayMs: number;
529
+ jitter: 'full' | 'equal' | 'decorrelated';
530
+ }
531
+
532
+ const SERVICE_RETRY_CONFIG: RetryConfig = {
533
+ maxAttempts: 3,
534
+ baseDelayMs: 100,
535
+ maxDelayMs: 10_000,
536
+ jitter: 'full',
537
+ };
538
+
539
+ function computeDelay(attempt: number, config: RetryConfig): number {
540
+ const exponential = config.baseDelayMs * Math.pow(2, attempt - 1);
541
+ const capped = Math.min(exponential, config.maxDelayMs);
542
+
543
+ switch (config.jitter) {
544
+ case 'full':
545
+ // Random value between 0 and capped — maximum spread
546
+ return Math.random() * capped;
547
+
548
+ case 'equal':
549
+ // Half fixed, half random — more predictable average
550
+ return capped / 2 + Math.random() * (capped / 2);
551
+
552
+ case 'decorrelated':
553
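+ // Approximates decorrelated jitter: random within [baseDelay, capped * 3], clamped to maxDelayMs.
+ // True decorrelated jitter uses the previous delay in place of `capped`; this stateless helper does not track it.
+ return Math.min(config.maxDelayMs, config.baseDelayMs + Math.random() * (capped * 3 - config.baseDelayMs));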
556
+ }
557
+ }
558
+
559
+ async function withRetry<T>(
560
+ fn: () => Promise<T>,
561
+ config: RetryConfig = SERVICE_RETRY_CONFIG,
562
+ ): Promise<T> {
563
+ let lastError: unknown;
564
+
565
+ for (let attempt = 1; attempt <= config.maxAttempts; attempt++) {
566
+ try {
567
+ return await fn();
568
+ } catch (error) {
569
+ lastError = error;
570
+ const errorClass = classifyError(error);
571
+
572
+ // Never retry permanent or open-circuit errors; every other class is retried
573
+ if (errorClass === 'permanent' || errorClass === 'circuit-open') {
574
+ throw error;
575
+ }
576
+
577
+ if (attempt === config.maxAttempts) break;
578
+
579
+ // Respect server-provided Retry-After header (rate limit responses)
580
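+ const retryAfterHeader = (error as any)?.response?.headers?.['retry-after'];
+ // Retry-After is either delay-seconds or an HTTP-date; only the numeric form is honored here
+ const retryAfterSec = Number.parseInt(retryAfterHeader ?? '', 10);
+ const delay = Number.isFinite(retryAfterSec)
+ ? retryAfterSec * 1000
+ : computeDelay(attempt, config);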
584
+
585
+ await new Promise(resolve => setTimeout(resolve, delay));
586
+ }
587
+ }
588
+
589
+ throw lastError;
590
+ }
591
+ ```
592
+
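Decorrelated jitter in its usual form depends on the previous delay (`sleep = min(cap, random(base, prev * 3))`), so it needs a little state per retry sequence. This is a minimal sketch of that stateful variant. `decorrelatedJitter` is a hypothetical helper, and the injectable `random` parameter exists only to make the behavior testable.

```typescript
// Returns a function that yields successive delays for one retry sequence.
// Each delay is drawn from [baseDelayMs, previousDelay * 3] and capped at maxDelayMs.
function decorrelatedJitter(
  baseDelayMs: number,
  maxDelayMs: number,
  random: () => number = Math.random,
): () => number {
  let prev = baseDelayMs;
  return () => {
    const upper = Math.max(baseDelayMs, prev * 3);
    prev = Math.min(maxDelayMs, baseDelayMs + random() * (upper - baseDelayMs));
    return prev;
  };
}

// Usage: create one generator per logical request, then call it before each retry.
const nextDelay = decorrelatedJitter(100, 10_000);
const firstDelayMs = nextDelay();
```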
593
+ ### Coordinated Retry Budgets
594
+
595
+ In high-traffic systems, coordinate retry behavior at the request level using retry budgets — a limit on the ratio of retries to original requests:
596
+
597
+ ```typescript
598
+ class RetryBudget {
599
+ private tokens: number;
600
+ private readonly maxTokens: number;
601
+ private lastRefill: number;
602
+
603
+ constructor(
604
+ private readonly requestsPerSecond: number,
605
+ // Allow at most 20% of traffic to be retries
606
+ private readonly retryRatio: number = 0.2,
607
+ ) {
608
+ this.maxTokens = Math.ceil(requestsPerSecond * retryRatio);
609
+ this.tokens = this.maxTokens;
610
+ this.lastRefill = Date.now();
611
+ }
612
+
613
+ canRetry(): boolean {
614
+ this.refill();
615
+ if (this.tokens >= 1) { // tokens are fractional after refill; require a whole token
616
+ this.tokens--;
617
+ return true;
618
+ }
619
+ return false;
620
+ }
621
+
622
+ private refill() {
623
+ const now = Date.now();
624
+ const elapsed = now - this.lastRefill;
625
+ const refillAmount = (elapsed / 1000) * this.requestsPerSecond * this.retryRatio;
626
+ this.tokens = Math.min(this.maxTokens, this.tokens + refillAmount);
627
+ this.lastRefill = now;
628
+ }
629
+ }
630
+
631
+ const catalogRetryBudget = new RetryBudget(100); // 100 req/s, 20% retry budget = 20 retries/s
632
+
633
+ async function callCatalogWithBudget(productId: string): Promise<Product> {
634
+ try {
635
+ return await catalogClient.getProduct(productId);
636
+ } catch (error) {
637
+ if (classifyError(error) === 'transient' && catalogRetryBudget.canRetry()) {
638
+ await new Promise(resolve => setTimeout(resolve, computeDelay(1, SERVICE_RETRY_CONFIG)));
639
+ return catalogClient.getProduct(productId);
640
+ }
641
+ throw error;
642
+ }
643
+ }
644
+ ```
645
+
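The budget above derives its refill rate from a configured requests-per-second figure. An alternative that tracks the ratio of retries to actual traffic is to deposit a fraction of a token for every original request and withdraw a whole token per retry. This is a sketch under that assumption. `RatioRetryBudget` and its parameters are hypothetical, and balances are kept in integer units to avoid floating-point drift.

```typescript
class RatioRetryBudget {
  private static readonly RETRY_COST = 1000; // one retry, in integer units
  private balance = 0;
  private readonly deposit: number;
  private readonly maxBalance: number;

  // retryRatio: retries allowed per original request (0.2 = 20%)
  // maxBankedRetries: cap on retries that can accumulate during healthy periods
  constructor(retryRatio = 0.2, maxBankedRetries = 10) {
    this.deposit = Math.round(retryRatio * RatioRetryBudget.RETRY_COST);
    this.maxBalance = maxBankedRetries * RatioRetryBudget.RETRY_COST;
  }

  // Call once for every original (non-retry) request.
  recordRequest(): void {
    this.balance = Math.min(this.maxBalance, this.balance + this.deposit);
  }

  // Call before each retry; returns false when the budget is exhausted.
  tryRetry(): boolean {
    if (this.balance < RatioRetryBudget.RETRY_COST) return false;
    this.balance -= RatioRetryBudget.RETRY_COST;
    return true;
  }
}
```

Because deposits follow real traffic, the allowed retry volume shrinks automatically when request volume drops.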
646
+ **Trade-offs (retry storm prevention):**
647
+ - (+) Full jitter is the most effective at preventing thundering herd — clients retry at random intervals rather than synchronized waves.
648
+ - (+) Retry budgets cap the amplification factor: with `maxAttempts: 3`, 100 failed requests can trigger up to 200 retries, but once the budget is exhausted further retries are refused until tokens refill.
649
+ - (-) Full jitter means some retries happen very quickly (near zero delay), which may not be appropriate for services that need explicit recovery time. Use `equal` jitter for predictable minimum delays.
650
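+ - (-) The `RetryBudget` above is per-instance, so N replicas together allow N times the configured retries. For a cluster-wide cap in a horizontally scaled service, share the budget state across instances, for example with Redis counters with a TTL.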
651
+
652
+ ## Observability for Resilience
653
+
654
+ Every resilience mechanism must emit metrics and structured logs so you know when it activates:
655
+
656
+ ```typescript
657
+ // Metrics to expose (Prometheus or equivalent):
658
+ // circuit_breaker_state{name, state} — 0=closed, 1=half-open, 2=open
659
+ // circuit_breaker_calls_total{name, result} — success, failure, short-circuited
660
+ // bulkhead_active_calls{name} — current concurrency
661
+ // bulkhead_rejected_calls_total{name} — rejected due to full bulkhead
662
+ // retry_attempts_total{service, attempt_number} — retry attempt distribution
663
+ // timeout_exceeded_total{service} — deadline exceeded events
664
+
665
+ function instrumentedCircuitCall<T>(
666
+ circuit: CircuitBreaker,
667
+ bulkhead: BulkheadExecutor,
668
+ fn: () => Promise<T>,
669
+ labels: { service: string },
670
+ ): Promise<T> {
671
+ const start = Date.now();
672
+
673
+ return bulkhead.execute(() => circuit.execute(fn))
674
+ .then(result => {
675
+ metrics.increment('circuit_breaker_calls_total', { ...labels, result: 'success' });
676
+ metrics.histogram('call_duration_ms', Date.now() - start, labels);
677
+ return result;
678
+ })
679
+ .catch(error => {
680
+ const result = error instanceof CircuitOpenError
681
+ ? 'short-circuited'
682
+ : error instanceof BulkheadFullError
683
+ ? 'rejected'
684
+ : 'failure';
685
+ metrics.increment('circuit_breaker_calls_total', { ...labels, result });
686
+ throw error;
687
+ });
688
+ }
689
+ ```
690
+
691
+ ## Common Pitfalls
692
+
693
+ **Circuit breaker per-instance only.** In a 10-replica service, each replica has its own circuit state. One replica may have opened its circuit while the other nine keep sending traffic until each trips independently, which delays the protection. Fix: share circuit state via Redis or delegate circuit breaking to the proxy or service mesh layer (Envoy, Istio).
694
+
695
+ **Retrying without a circuit breaker.** Retry logic keeps hammering a failing service even after the failure is clearly systemic. Fix: wrap every retry block in a circuit breaker. When the circuit opens, retries stop.
696
+
697
+ **Missing `Promise.allSettled` in fan-outs.** Using `Promise.all` for parallel enrichment calls means a single enrichment failure rejects the whole batch: the other calls keep running, but their results are discarded. Fix: use `Promise.allSettled` and handle each result independently.
698
+
699
+ **Timeout but no deadline.** Setting a timeout on each call without propagating a shared deadline means the worst-case latency of a request is the sum of the timeouts along its path. A request that makes three calls in sequence, each with a 5,000ms timeout, can take 15 seconds to fail. Fix: attach a deadline at the entry point and propagate it through all downstream calls.
700
+
701
+ **Bulkhead sized by gut feel.** Setting `maxConcurrency: 10` without measuring the service's actual throughput. Too low throttles legitimate traffic; too high doesn't protect. Fix: load test the downstream service to find its natural breaking point, then set the bulkhead just below it.
702
+
703
+ **Graceful degradation without cache warming.** Returning a cached fallback only works if the cache was populated. On a cold start or after a cache flush, there's nothing to fall back to. Fix: implement cache warming on startup and ensure TTLs are set conservatively so fallbacks are available during brief outages.
704
+
705
+ ## See Also
706
+
707
+ - [multi-service-api-contracts](./multi-service-api-contracts.md) — Retry policies, idempotency keys, and timeout budget allocation
708
+ - [multi-service-observability](./multi-service-observability.md) — Metrics, tracing, and alerting for resilience signals
709
+ - [multi-service-architecture](./multi-service-architecture.md) — Circuit breaker context within service topology
710
+ - [multi-service-testing](./multi-service-testing.md) — Chaos testing and fault injection strategies