@zigrivers/scaffold 3.16.0 → 3.18.0

Files changed (385)
  1. package/README.md +28 -0
  2. package/content/knowledge/backend/backend-fintech-broker-integration.md +244 -0
  3. package/content/knowledge/backend/backend-fintech-compliance.md +181 -0
  4. package/content/knowledge/backend/backend-fintech-data-modeling.md +210 -0
  5. package/content/knowledge/backend/backend-fintech-ledger.md +226 -0
  6. package/content/knowledge/backend/backend-fintech-observability.md +151 -0
  7. package/content/knowledge/backend/backend-fintech-order-lifecycle.md +213 -0
  8. package/content/knowledge/backend/backend-fintech-risk-management.md +150 -0
  9. package/content/knowledge/backend/backend-fintech-testing.md +197 -0
  10. package/content/knowledge/core/automated-review-tooling.md +10 -0
  11. package/content/knowledge/core/multi-service-api-contracts.md +634 -0
  12. package/content/knowledge/core/multi-service-architecture.md +492 -0
  13. package/content/knowledge/core/multi-service-auth.md +706 -0
  14. package/content/knowledge/core/multi-service-data-ownership.md +539 -0
  15. package/content/knowledge/core/multi-service-observability.md +545 -0
  16. package/content/knowledge/core/multi-service-resilience.md +710 -0
  17. package/content/knowledge/core/multi-service-task-decomposition.md +615 -0
  18. package/content/knowledge/core/multi-service-testing.md +728 -0
  19. package/content/methodology/backend-fintech.yml +46 -0
  20. package/content/methodology/custom-defaults.yml +6 -0
  21. package/content/methodology/deep.yml +6 -0
  22. package/content/methodology/multi-service-overlay.yml +103 -0
  23. package/content/methodology/mvp.yml +6 -0
  24. package/content/pipeline/architecture/service-ownership-map.md +83 -0
  25. package/content/pipeline/quality/cross-service-auth.md +96 -0
  26. package/content/pipeline/quality/cross-service-observability.md +104 -0
  27. package/content/pipeline/quality/integration-test-plan.md +106 -0
  28. package/content/pipeline/specification/inter-service-contracts.md +95 -0
  29. package/dist/cli/commands/adopt.cli-flags.test.js +20 -0
  30. package/dist/cli/commands/adopt.cli-flags.test.js.map +1 -1
  31. package/dist/cli/commands/adopt.d.ts.map +1 -1
  32. package/dist/cli/commands/adopt.js +11 -3
  33. package/dist/cli/commands/adopt.js.map +1 -1
  34. package/dist/cli/commands/complete.d.ts +1 -0
  35. package/dist/cli/commands/complete.d.ts.map +1 -1
  36. package/dist/cli/commands/complete.js +26 -8
  37. package/dist/cli/commands/complete.js.map +1 -1
  38. package/dist/cli/commands/dashboard.d.ts +1 -0
  39. package/dist/cli/commands/dashboard.d.ts.map +1 -1
  40. package/dist/cli/commands/dashboard.js +19 -6
  41. package/dist/cli/commands/dashboard.js.map +1 -1
  42. package/dist/cli/commands/decisions.d.ts +1 -0
  43. package/dist/cli/commands/decisions.d.ts.map +1 -1
  44. package/dist/cli/commands/decisions.js +18 -4
  45. package/dist/cli/commands/decisions.js.map +1 -1
  46. package/dist/cli/commands/info.d.ts +1 -0
  47. package/dist/cli/commands/info.d.ts.map +1 -1
  48. package/dist/cli/commands/info.js +25 -3
  49. package/dist/cli/commands/info.js.map +1 -1
  50. package/dist/cli/commands/init-from.test.d.ts +2 -0
  51. package/dist/cli/commands/init-from.test.d.ts.map +1 -0
  52. package/dist/cli/commands/init-from.test.js +315 -0
  53. package/dist/cli/commands/init-from.test.js.map +1 -0
  54. package/dist/cli/commands/init.d.ts +3 -0
  55. package/dist/cli/commands/init.d.ts.map +1 -1
  56. package/dist/cli/commands/init.js +239 -129
  57. package/dist/cli/commands/init.js.map +1 -1
  58. package/dist/cli/commands/init.test.js +20 -0
  59. package/dist/cli/commands/init.test.js.map +1 -1
  60. package/dist/cli/commands/next.d.ts +1 -0
  61. package/dist/cli/commands/next.d.ts.map +1 -1
  62. package/dist/cli/commands/next.js +40 -4
  63. package/dist/cli/commands/next.js.map +1 -1
  64. package/dist/cli/commands/next.test.js +153 -0
  65. package/dist/cli/commands/next.test.js.map +1 -1
  66. package/dist/cli/commands/reset.d.ts +1 -0
  67. package/dist/cli/commands/reset.d.ts.map +1 -1
  68. package/dist/cli/commands/reset.js +77 -29
  69. package/dist/cli/commands/reset.js.map +1 -1
  70. package/dist/cli/commands/rework.d.ts +1 -0
  71. package/dist/cli/commands/rework.d.ts.map +1 -1
  72. package/dist/cli/commands/rework.js +16 -2
  73. package/dist/cli/commands/rework.js.map +1 -1
  74. package/dist/cli/commands/run.d.ts +1 -0
  75. package/dist/cli/commands/run.d.ts.map +1 -1
  76. package/dist/cli/commands/run.js +65 -13
  77. package/dist/cli/commands/run.js.map +1 -1
  78. package/dist/cli/commands/run.test.js +255 -3
  79. package/dist/cli/commands/run.test.js.map +1 -1
  80. package/dist/cli/commands/skip.d.ts +1 -0
  81. package/dist/cli/commands/skip.d.ts.map +1 -1
  82. package/dist/cli/commands/skip.js +24 -7
  83. package/dist/cli/commands/skip.js.map +1 -1
  84. package/dist/cli/commands/status.d.ts +1 -0
  85. package/dist/cli/commands/status.d.ts.map +1 -1
  86. package/dist/cli/commands/status.js +51 -4
  87. package/dist/cli/commands/status.js.map +1 -1
  88. package/dist/cli/commands/status.test.js +130 -0
  89. package/dist/cli/commands/status.test.js.map +1 -1
  90. package/dist/cli/guards-coverage.test.d.ts +2 -0
  91. package/dist/cli/guards-coverage.test.d.ts.map +1 -0
  92. package/dist/cli/guards-coverage.test.js +26 -0
  93. package/dist/cli/guards-coverage.test.js.map +1 -0
  94. package/dist/cli/guards-integration.test.d.ts +2 -0
  95. package/dist/cli/guards-integration.test.d.ts.map +1 -0
  96. package/dist/cli/guards-integration.test.js +178 -0
  97. package/dist/cli/guards-integration.test.js.map +1 -0
  98. package/dist/cli/guards.d.ts +13 -0
  99. package/dist/cli/guards.d.ts.map +1 -0
  100. package/dist/cli/guards.js +70 -0
  101. package/dist/cli/guards.js.map +1 -0
  102. package/dist/cli/guards.test.d.ts +2 -0
  103. package/dist/cli/guards.test.d.ts.map +1 -0
  104. package/dist/cli/guards.test.js +136 -0
  105. package/dist/cli/guards.test.js.map +1 -0
  106. package/dist/cli/init-flag-families.d.ts +1 -1
  107. package/dist/cli/init-flag-families.d.ts.map +1 -1
  108. package/dist/cli/init-flag-families.js +4 -1
  109. package/dist/cli/init-flag-families.js.map +1 -1
  110. package/dist/cli/init-flag-families.test.js +10 -0
  111. package/dist/cli/init-flag-families.test.js.map +1 -1
  112. package/dist/cli/shutdown.d.ts +2 -3
  113. package/dist/cli/shutdown.d.ts.map +1 -1
  114. package/dist/cli/shutdown.js +14 -11
  115. package/dist/cli/shutdown.js.map +1 -1
  116. package/dist/cli/shutdown.test.js +2 -4
  117. package/dist/cli/shutdown.test.js.map +1 -1
  118. package/dist/config/schema.d.ts +12122 -288
  119. package/dist/config/schema.d.ts.map +1 -1
  120. package/dist/config/schema.js +74 -79
  121. package/dist/config/schema.js.map +1 -1
  122. package/dist/config/schema.test.js +230 -1
  123. package/dist/config/schema.test.js.map +1 -1
  124. package/dist/config/validators/backend.d.ts +4 -0
  125. package/dist/config/validators/backend.d.ts.map +1 -0
  126. package/dist/config/validators/backend.js +14 -0
  127. package/dist/config/validators/backend.js.map +1 -0
  128. package/dist/config/validators/browser-extension.d.ts +4 -0
  129. package/dist/config/validators/browser-extension.d.ts.map +1 -0
  130. package/dist/config/validators/browser-extension.js +24 -0
  131. package/dist/config/validators/browser-extension.js.map +1 -0
  132. package/dist/config/validators/cli.d.ts +4 -0
  133. package/dist/config/validators/cli.d.ts.map +1 -0
  134. package/dist/config/validators/cli.js +14 -0
  135. package/dist/config/validators/cli.js.map +1 -0
  136. package/dist/config/validators/data-pipeline.d.ts +4 -0
  137. package/dist/config/validators/data-pipeline.d.ts.map +1 -0
  138. package/dist/config/validators/data-pipeline.js +14 -0
  139. package/dist/config/validators/data-pipeline.js.map +1 -0
  140. package/dist/config/validators/game.d.ts +4 -0
  141. package/dist/config/validators/game.d.ts.map +1 -0
  142. package/dist/config/validators/game.js +14 -0
  143. package/dist/config/validators/game.js.map +1 -0
  144. package/dist/config/validators/index.d.ts +7 -0
  145. package/dist/config/validators/index.d.ts.map +1 -0
  146. package/dist/config/validators/index.js +27 -0
  147. package/dist/config/validators/index.js.map +1 -0
  148. package/dist/config/validators/library.d.ts +4 -0
  149. package/dist/config/validators/library.d.ts.map +1 -0
  150. package/dist/config/validators/library.js +25 -0
  151. package/dist/config/validators/library.js.map +1 -0
  152. package/dist/config/validators/ml.d.ts +4 -0
  153. package/dist/config/validators/ml.d.ts.map +1 -0
  154. package/dist/config/validators/ml.js +31 -0
  155. package/dist/config/validators/ml.js.map +1 -0
  156. package/dist/config/validators/mobile-app.d.ts +4 -0
  157. package/dist/config/validators/mobile-app.d.ts.map +1 -0
  158. package/dist/config/validators/mobile-app.js +14 -0
  159. package/dist/config/validators/mobile-app.js.map +1 -0
  160. package/dist/config/validators/registry.test.d.ts +2 -0
  161. package/dist/config/validators/registry.test.d.ts.map +1 -0
  162. package/dist/config/validators/registry.test.js +26 -0
  163. package/dist/config/validators/registry.test.js.map +1 -0
  164. package/dist/config/validators/research.d.ts +4 -0
  165. package/dist/config/validators/research.d.ts.map +1 -0
  166. package/dist/config/validators/research.js +24 -0
  167. package/dist/config/validators/research.js.map +1 -0
  168. package/dist/config/validators/research.test.d.ts +2 -0
  169. package/dist/config/validators/research.test.d.ts.map +1 -0
  170. package/dist/config/validators/research.test.js +44 -0
  171. package/dist/config/validators/research.test.js.map +1 -0
  172. package/dist/config/validators/types.d.ts +19 -0
  173. package/dist/config/validators/types.d.ts.map +1 -0
  174. package/dist/config/validators/types.js +2 -0
  175. package/dist/config/validators/types.js.map +1 -0
  176. package/dist/config/validators/validators.test.d.ts +2 -0
  177. package/dist/config/validators/validators.test.d.ts.map +1 -0
  178. package/dist/config/validators/validators.test.js +25 -0
  179. package/dist/config/validators/validators.test.js.map +1 -0
  180. package/dist/config/validators/web-app.d.ts +4 -0
  181. package/dist/config/validators/web-app.d.ts.map +1 -0
  182. package/dist/config/validators/web-app.js +31 -0
  183. package/dist/config/validators/web-app.js.map +1 -0
  184. package/dist/core/assembly/context-gatherer.d.ts.map +1 -1
  185. package/dist/core/assembly/context-gatherer.js +4 -2
  186. package/dist/core/assembly/context-gatherer.js.map +1 -1
  187. package/dist/core/assembly/cross-reads.d.ts +61 -0
  188. package/dist/core/assembly/cross-reads.d.ts.map +1 -0
  189. package/dist/core/assembly/cross-reads.js +190 -0
  190. package/dist/core/assembly/cross-reads.js.map +1 -0
  191. package/dist/core/assembly/cross-reads.test.d.ts +2 -0
  192. package/dist/core/assembly/cross-reads.test.d.ts.map +1 -0
  193. package/dist/core/assembly/cross-reads.test.js +497 -0
  194. package/dist/core/assembly/cross-reads.test.js.map +1 -0
  195. package/dist/core/assembly/overlay-loader-structural.test.d.ts +2 -0
  196. package/dist/core/assembly/overlay-loader-structural.test.d.ts.map +1 -0
  197. package/dist/core/assembly/overlay-loader-structural.test.js +173 -0
  198. package/dist/core/assembly/overlay-loader-structural.test.js.map +1 -0
  199. package/dist/core/assembly/overlay-loader.d.ts +19 -3
  200. package/dist/core/assembly/overlay-loader.d.ts.map +1 -1
  201. package/dist/core/assembly/overlay-loader.js +135 -4
  202. package/dist/core/assembly/overlay-loader.js.map +1 -1
  203. package/dist/core/assembly/overlay-loader.test.js +204 -1
  204. package/dist/core/assembly/overlay-loader.test.js.map +1 -1
  205. package/dist/core/assembly/overlay-resolver.d.ts +9 -2
  206. package/dist/core/assembly/overlay-resolver.d.ts.map +1 -1
  207. package/dist/core/assembly/overlay-resolver.js +32 -1
  208. package/dist/core/assembly/overlay-resolver.js.map +1 -1
  209. package/dist/core/assembly/overlay-resolver.test.js +135 -17
  210. package/dist/core/assembly/overlay-resolver.test.js.map +1 -1
  211. package/dist/core/assembly/overlay-state-resolver.d.ts +9 -0
  212. package/dist/core/assembly/overlay-state-resolver.d.ts.map +1 -1
  213. package/dist/core/assembly/overlay-state-resolver.js +43 -2
  214. package/dist/core/assembly/overlay-state-resolver.js.map +1 -1
  215. package/dist/core/assembly/overlay-state-resolver.test.js +321 -0
  216. package/dist/core/assembly/overlay-state-resolver.test.js.map +1 -1
  217. package/dist/core/assembly/update-mode.d.ts +1 -0
  218. package/dist/core/assembly/update-mode.d.ts.map +1 -1
  219. package/dist/core/assembly/update-mode.js +17 -9
  220. package/dist/core/assembly/update-mode.js.map +1 -1
  221. package/dist/core/dependency/eligibility.d.ts +10 -1
  222. package/dist/core/dependency/eligibility.d.ts.map +1 -1
  223. package/dist/core/dependency/eligibility.js +19 -1
  224. package/dist/core/dependency/eligibility.js.map +1 -1
  225. package/dist/core/dependency/eligibility.test.js +82 -0
  226. package/dist/core/dependency/eligibility.test.js.map +1 -1
  227. package/dist/core/dependency/graph.d.ts +4 -1
  228. package/dist/core/dependency/graph.d.ts.map +1 -1
  229. package/dist/core/dependency/graph.js +7 -1
  230. package/dist/core/dependency/graph.js.map +1 -1
  231. package/dist/core/dependency/graph.test.js +48 -0
  232. package/dist/core/dependency/graph.test.js.map +1 -1
  233. package/dist/core/pipeline/global-steps.d.ts +7 -0
  234. package/dist/core/pipeline/global-steps.d.ts.map +1 -0
  235. package/dist/core/pipeline/global-steps.js +18 -0
  236. package/dist/core/pipeline/global-steps.js.map +1 -0
  237. package/dist/core/pipeline/resolver.d.ts +1 -0
  238. package/dist/core/pipeline/resolver.d.ts.map +1 -1
  239. package/dist/core/pipeline/resolver.js +54 -7
  240. package/dist/core/pipeline/resolver.js.map +1 -1
  241. package/dist/core/pipeline/resolver.test.js +51 -1
  242. package/dist/core/pipeline/resolver.test.js.map +1 -1
  243. package/dist/core/pipeline/types.d.ts +5 -1
  244. package/dist/core/pipeline/types.d.ts.map +1 -1
  245. package/dist/e2e/cross-service-references.test.d.ts +22 -0
  246. package/dist/e2e/cross-service-references.test.d.ts.map +1 -0
  247. package/dist/e2e/cross-service-references.test.js +230 -0
  248. package/dist/e2e/cross-service-references.test.js.map +1 -0
  249. package/dist/e2e/multi-service-pipeline.test.d.ts +10 -0
  250. package/dist/e2e/multi-service-pipeline.test.d.ts.map +1 -0
  251. package/dist/e2e/multi-service-pipeline.test.js +185 -0
  252. package/dist/e2e/multi-service-pipeline.test.js.map +1 -0
  253. package/dist/e2e/project-type-overlays.test.js +68 -0
  254. package/dist/e2e/project-type-overlays.test.js.map +1 -1
  255. package/dist/e2e/service-execution.test.d.ts +15 -0
  256. package/dist/e2e/service-execution.test.d.ts.map +1 -0
  257. package/dist/e2e/service-execution.test.js +219 -0
  258. package/dist/e2e/service-execution.test.js.map +1 -0
  259. package/dist/e2e/service-manifest.test.d.ts +19 -0
  260. package/dist/e2e/service-manifest.test.d.ts.map +1 -0
  261. package/dist/e2e/service-manifest.test.js +166 -0
  262. package/dist/e2e/service-manifest.test.js.map +1 -0
  263. package/dist/project/__frozen-schemas__/schema-v3.9.2.d.ts +224 -224
  264. package/dist/project/frontmatter.d.ts.map +1 -1
  265. package/dist/project/frontmatter.js +11 -0
  266. package/dist/project/frontmatter.js.map +1 -1
  267. package/dist/project/frontmatter.test.js +71 -0
  268. package/dist/project/frontmatter.test.js.map +1 -1
  269. package/dist/state/completion.d.ts +1 -1
  270. package/dist/state/completion.d.ts.map +1 -1
  271. package/dist/state/completion.js +10 -8
  272. package/dist/state/completion.js.map +1 -1
  273. package/dist/state/decision-logger.d.ts +3 -2
  274. package/dist/state/decision-logger.d.ts.map +1 -1
  275. package/dist/state/decision-logger.js +12 -11
  276. package/dist/state/decision-logger.js.map +1 -1
  277. package/dist/state/ensure-v3-migration.d.ts +9 -0
  278. package/dist/state/ensure-v3-migration.d.ts.map +1 -0
  279. package/dist/state/ensure-v3-migration.js +35 -0
  280. package/dist/state/ensure-v3-migration.js.map +1 -0
  281. package/dist/state/lock-manager.d.ts +5 -4
  282. package/dist/state/lock-manager.d.ts.map +1 -1
  283. package/dist/state/lock-manager.js +11 -11
  284. package/dist/state/lock-manager.js.map +1 -1
  285. package/dist/state/rework-manager.d.ts +1 -2
  286. package/dist/state/rework-manager.d.ts.map +1 -1
  287. package/dist/state/rework-manager.js +4 -5
  288. package/dist/state/rework-manager.js.map +1 -1
  289. package/dist/state/state-manager.d.ts +25 -1
  290. package/dist/state/state-manager.d.ts.map +1 -1
  291. package/dist/state/state-manager.js +86 -12
  292. package/dist/state/state-manager.js.map +1 -1
  293. package/dist/state/state-manager.test.js +278 -0
  294. package/dist/state/state-manager.test.js.map +1 -1
  295. package/dist/state/state-migration-v3.d.ts +22 -0
  296. package/dist/state/state-migration-v3.d.ts.map +1 -0
  297. package/dist/state/state-migration-v3.js +82 -0
  298. package/dist/state/state-migration-v3.js.map +1 -0
  299. package/dist/state/state-migration-v3.test.d.ts +2 -0
  300. package/dist/state/state-migration-v3.test.d.ts.map +1 -0
  301. package/dist/state/state-migration-v3.test.js +196 -0
  302. package/dist/state/state-migration-v3.test.js.map +1 -0
  303. package/dist/state/state-migration.d.ts.map +1 -1
  304. package/dist/state/state-migration.js +11 -6
  305. package/dist/state/state-migration.js.map +1 -1
  306. package/dist/state/state-migration.test.js +47 -2
  307. package/dist/state/state-migration.test.js.map +1 -1
  308. package/dist/state/state-path-resolver.d.ts +23 -0
  309. package/dist/state/state-path-resolver.d.ts.map +1 -0
  310. package/dist/state/state-path-resolver.js +36 -0
  311. package/dist/state/state-path-resolver.js.map +1 -0
  312. package/dist/state/state-path-resolver.test.d.ts +2 -0
  313. package/dist/state/state-path-resolver.test.d.ts.map +1 -0
  314. package/dist/state/state-path-resolver.test.js +78 -0
  315. package/dist/state/state-path-resolver.test.js.map +1 -0
  316. package/dist/state/state-version-dispatch.d.ts +17 -0
  317. package/dist/state/state-version-dispatch.d.ts.map +1 -0
  318. package/dist/state/state-version-dispatch.js +27 -0
  319. package/dist/state/state-version-dispatch.js.map +1 -0
  320. package/dist/state/state-version-dispatch.test.d.ts +2 -0
  321. package/dist/state/state-version-dispatch.test.d.ts.map +1 -0
  322. package/dist/state/state-version-dispatch.test.js +40 -0
  323. package/dist/state/state-version-dispatch.test.js.map +1 -0
  324. package/dist/types/config.d.ts +33 -3
  325. package/dist/types/config.d.ts.map +1 -1
  326. package/dist/types/config.test.js +62 -1
  327. package/dist/types/config.test.js.map +1 -1
  328. package/dist/types/dependency.d.ts +9 -0
  329. package/dist/types/dependency.d.ts.map +1 -1
  330. package/dist/types/frontmatter.d.ts +5 -0
  331. package/dist/types/frontmatter.d.ts.map +1 -1
  332. package/dist/types/lock.d.ts +1 -1
  333. package/dist/types/lock.d.ts.map +1 -1
  334. package/dist/types/state.d.ts +1 -1
  335. package/dist/types/state.d.ts.map +1 -1
  336. package/dist/utils/artifact-path.d.ts +19 -0
  337. package/dist/utils/artifact-path.d.ts.map +1 -0
  338. package/dist/utils/artifact-path.js +95 -0
  339. package/dist/utils/artifact-path.js.map +1 -0
  340. package/dist/utils/artifact-path.test.d.ts +2 -0
  341. package/dist/utils/artifact-path.test.d.ts.map +1 -0
  342. package/dist/utils/artifact-path.test.js +138 -0
  343. package/dist/utils/artifact-path.test.js.map +1 -0
  344. package/dist/utils/errors.d.ts +3 -1
  345. package/dist/utils/errors.d.ts.map +1 -1
  346. package/dist/utils/errors.js +21 -2
  347. package/dist/utils/errors.js.map +1 -1
  348. package/dist/utils/errors.test.js +27 -1
  349. package/dist/utils/errors.test.js.map +1 -1
  350. package/dist/utils/user-errors.d.ts +46 -0
  351. package/dist/utils/user-errors.d.ts.map +1 -0
  352. package/dist/utils/user-errors.js +76 -0
  353. package/dist/utils/user-errors.js.map +1 -0
  354. package/dist/utils/user-errors.test.d.ts +2 -0
  355. package/dist/utils/user-errors.test.d.ts.map +1 -0
  356. package/dist/utils/user-errors.test.js +74 -0
  357. package/dist/utils/user-errors.test.js.map +1 -0
  358. package/dist/validation/index.d.ts.map +1 -1
  359. package/dist/validation/index.js +16 -0
  360. package/dist/validation/index.js.map +1 -1
  361. package/dist/validation/index.test.js +48 -0
  362. package/dist/validation/index.test.js.map +1 -1
  363. package/dist/validation/state-validator.d.ts +5 -2
  364. package/dist/validation/state-validator.d.ts.map +1 -1
  365. package/dist/validation/state-validator.js +18 -20
  366. package/dist/validation/state-validator.js.map +1 -1
  367. package/dist/validation/state-validator.test.js +31 -2
  368. package/dist/validation/state-validator.test.js.map +1 -1
  369. package/dist/wizard/copy/backend.d.ts.map +1 -1
  370. package/dist/wizard/copy/backend.js +12 -0
  371. package/dist/wizard/copy/backend.js.map +1 -1
  372. package/dist/wizard/flags.d.ts +1 -0
  373. package/dist/wizard/flags.d.ts.map +1 -1
  374. package/dist/wizard/questions.d.ts.map +1 -1
  375. package/dist/wizard/questions.js +5 -1
  376. package/dist/wizard/questions.js.map +1 -1
  377. package/dist/wizard/questions.test.js +45 -2
  378. package/dist/wizard/questions.test.js.map +1 -1
  379. package/dist/wizard/wizard.d.ts +23 -0
  380. package/dist/wizard/wizard.d.ts.map +1 -1
  381. package/dist/wizard/wizard.js +85 -47
  382. package/dist/wizard/wizard.js.map +1 -1
  383. package/dist/wizard/wizard.test.js +186 -1
  384. package/dist/wizard/wizard.test.js.map +1 -1
  385. package/package.json +1 -1
@@ -0,0 +1,545 @@
1
+ ---
2
+ name: multi-service-observability
3
+ description: Distributed tracing, correlation IDs, cross-service SLOs, and failure attribution
4
+ topics: [distributed-tracing, correlation-ids, cross-service-slos, failure-attribution]
5
+ ---
6
+
7
+ ## Summary
8
+
9
+ Observability in a multi-service system is a prerequisite for correct operation, not an optional enhancement. When a request crosses four service boundaries before returning an error, you cannot debug it without distributed tracing and correlation IDs.
10
+
11
+ **Three pillars for multi-service systems:**
12
+ - **Distributed tracing (W3C Trace Context):** Every request gets a `traceparent` header with a trace ID and span ID. Each service records spans. All spans for a single request share a trace ID, creating a complete picture of the request's journey. Use OpenTelemetry (vendor-neutral) and export to any backend (Jaeger, Tempo, Datadog).
13
+ - **Correlation IDs (`X-Correlation-ID`):** Business-level identifier for a workflow, persisted in the application database. Survives async boundaries that distributed traces don't bridge (jobs, scheduled tasks, multi-request workflows). Include in every log entry and outgoing message.
14
+ - **Structured logs (JSON):** Every log entry must include `correlationId`, `traceId`, `service`, `version`, and `level`. Ship to a central aggregation system (ELK, Loki, CloudWatch).
15
+
16
+ **SLO strategy:** Define SLOs per service and per user-facing journey. Composite availability = product of all participating services' availabilities — a 5-service chain each at 99.9% yields ~99.5% composite. Alert on error budget burn rate (e.g., 14x sustainable rate in 1 hour), not hard thresholds.
17
+
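The composite-availability and burn-rate arithmetic can be sketched as follows. This is a minimal illustration, assuming the services sit in a serial chain and fail independently; `compositeAvailability` and `burnRate` are hypothetical helper names, not part of any library.

```typescript
// Composite availability of a serial call chain, assuming independent failures:
// the request succeeds only if every participating service succeeds.
function compositeAvailability(availabilities: number[]): number {
  return availabilities.reduce((product, a) => product * a, 1)
}

// Burn rate: how fast the error budget is being consumed relative to the rate
// that would exactly exhaust it over the SLO window. 1.0 is "sustainable".
function burnRate(observedErrorRatio: number, sloTarget: number): number {
  const errorBudget = 1 - sloTarget // e.g. 0.001 for a 99.9% SLO
  return observedErrorRatio / errorBudget
}

// Five services at 99.9% each: about 0.99501, i.e. ~99.5% composite.
const fiveServiceChain = compositeAvailability([0.999, 0.999, 0.999, 0.999, 0.999])

// A 1.4% error ratio against a 99.9% SLO burns the budget about 14x too fast.
const currentBurn = burnRate(0.014, 0.999)
```

An alert rule built on this would compare `currentBurn`, measured over a short window such as one hour, against a chosen multiple instead of against a fixed error-count threshold.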
18
+ **Failure attribution:** Walk the span tree inward from the user-facing error to find the first span that recorded an error. Classify as infrastructure, dependency, or application failure.
19
+
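One way to read "walk the span tree inward" is sketched below. The `SpanNode` shape and `findOriginatingError` function are hypothetical, for illustration only; real tracing backends expose spans in their own formats. The sketch also assumes that when several children recorded errors, the first one listed is followed; a fuller version would compare start times.

```typescript
// Hypothetical in-memory span tree; not the OpenTelemetry data model.
interface SpanNode {
  spanId: string
  service: string
  operation: string
  hasError: boolean
  children: SpanNode[]
}

// Starting at the user-facing span, keep descending into errored children.
// The deepest errored span on that path is the likely origin of the failure;
// the errored spans above it are callers that merely propagated the error.
function findOriginatingError(root: SpanNode): SpanNode | undefined {
  if (!root.hasError) return undefined
  let current = root
  for (;;) {
    const erroredChild = current.children.find((child) => child.hasError)
    if (!erroredChild) return current
    current = erroredChild
  }
}
```

The span returned here is the starting point for classification: its service and operation indicate whether the failure was infrastructure, dependency, or application level.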
20
+ **OpenTelemetry Collector:** Route telemetry through a Collector (not directly from services to the backend) for backend-agnostic export, sampling, and buffering.
21
+
22
+ ## Deep Guidance
23
+
24
+ ## Distributed Tracing with W3C Trace Context
25
+
26
+ ### The Problem Distributed Tracing Solves
27
+
28
+ A single user-facing request in a multi-service system might be handled by an API gateway, an auth service, an order service, an inventory service, and a payment service. Each service logs independently. Without distributed tracing, a single failed request leaves log entries scattered across five services with no way to correlate them. Debugging requires manual log archaeology across systems with imprecise time correlation.
29
+
30
+ Distributed tracing solves this by propagating a trace context through every service boundary. Each service records spans — units of work with start time, duration, tags, and relationships. All spans for a single request share a trace ID, creating a complete picture of the request's journey.
31
+
32
+ ### W3C Trace Context Standard
33
+
34
+ The W3C Trace Context specification (https://www.w3.org/TR/trace-context/) defines two HTTP headers for propagating trace context:
35
+
36
+ **`traceparent`** — carries the trace ID, span ID, and sampling flags:
37
+
38
+ ```
39
+ traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
40
+ ^^ version
41
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ trace-id (16 bytes, hex)
42
+ ^^^^^^^^^^^^^^^^ parent-span-id (8 bytes, hex)
43
+ ^^ flags (01 = sampled)
44
+ ```
45
+
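The field layout above can be checked with a small parser. This is a minimal sketch, assuming only the version `00` layout; `parseTraceparent` and `TraceParent` are hypothetical names, and in practice the OpenTelemetry propagator handles this for you.

```typescript
interface TraceParent {
  version: string
  traceId: string // 16 bytes, 32 lowercase hex characters
  parentId: string // 8 bytes, 16 lowercase hex characters
  sampled: boolean
}

// version-traceid-parentid-flags, all lowercase hex, dash-separated.
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/

function parseTraceparent(header: string): TraceParent | undefined {
  const match = TRACEPARENT_RE.exec(header.trim())
  if (!match) return undefined
  const [, version, traceId, parentId, flags] = match
  // The spec treats version ff and all-zero IDs as invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return undefined
  // Bit 0 of the flags byte is the "sampled" flag.
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 0x01 }
}
```

A service that receives an invalid or missing header would typically start a new trace rather than reject the request.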
46
+ **`tracestate`** — carries vendor-specific key-value pairs alongside the standard header:
47
+
48
+ ```
49
+ tracestate: rojo=00f067aa0ba902b7,congo=t61rcWkgMzE
50
+ ```
51
+
52
+ **Why use W3C Trace Context instead of vendor-specific headers:**
53
+ - (+) Interoperable: OpenTelemetry SDKs propagate it by default, and AWS X-Ray, Google Cloud Trace, and Datadog tracers can all consume it.
54
+ - (+) Future-proof: the standard is stable and broadly adopted.
55
+ - (-) Requires all services to propagate the headers correctly. A service that drops the headers breaks the trace chain.
56
+
57
+ ### OpenTelemetry Integration
58
+
59
+ OpenTelemetry (OTel) is the CNCF-standard SDK for distributed tracing, metrics, and logs. It is the recommended approach — instrument once, export to any backend.
60
+
61
+ **Node.js setup:**
62
+
63
+ ```typescript
64
+ // src/tracing.ts — initialize before requiring any other modules
65
+ import { NodeSDK } from '@opentelemetry/sdk-node'
66
+ import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http'
67
+ import { Resource } from '@opentelemetry/resources'
68
+ import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions'
69
+ import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node'
70
+
71
+ const sdk = new NodeSDK({
72
+ resource: new Resource({
73
+ [SemanticResourceAttributes.SERVICE_NAME]: process.env.SERVICE_NAME ?? 'unknown-service',
74
+ [SemanticResourceAttributes.SERVICE_VERSION]: process.env.SERVICE_VERSION ?? '0.0.0',
75
+ [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV ?? 'development',
76
+ }),
77
+ traceExporter: new OTLPTraceExporter({
78
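+ url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT ?? 'http://otel-collector:4318/v1/traces', // `url` is used verbatim, so read the traces-specific variable (full path), not the base OTEL_EXPORTER_OTLP_ENDPOINT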
79
+ }),
80
+ instrumentations: [
81
+ getNodeAutoInstrumentations({
82
+ '@opentelemetry/instrumentation-http': { enabled: true },
83
+ '@opentelemetry/instrumentation-express': { enabled: true },
84
+ '@opentelemetry/instrumentation-pg': { enabled: true },
85
+ }),
86
+ ],
87
+ })
88
+
89
+ sdk.start()
90
+
91
+ // Graceful shutdown
92
+ process.on('SIGTERM', () => sdk.shutdown())
93
+ ```
94
+
95
+ **Creating custom spans for business operations:**
96
+
97
+ ```typescript
98
+ import { trace, SpanStatusCode } from '@opentelemetry/api'
99
+
100
+ const tracer = trace.getTracer('order-service', '1.0.0')
101
+
102
+ async function processOrder(orderId: string, items: OrderItem[]): Promise<Order> {
103
+ return tracer.startActiveSpan('processOrder', async (span) => {
104
+ span.setAttributes({
105
+ 'order.id': orderId,
106
+ 'order.item_count': items.length,
107
+ })
108
+
109
+ try {
110
+ const result = await doProcessOrder(orderId, items)
111
+ span.setStatus({ code: SpanStatusCode.OK })
112
+ return result
113
+ } catch (err) {
114
+ span.recordException(err as Error)
115
+ span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message })
116
+ throw err
117
+ } finally {
118
+ span.end()
119
+ }
120
+ })
121
+ }
122
+ ```
123
+
124
+ **Trade-offs (OpenTelemetry auto-instrumentation):**
125
+ - (+) Automatic instrumentation for HTTP, gRPC, database drivers — no manual span creation needed for most cases.
126
+ - (+) Vendor-neutral: switch from Jaeger to Tempo to Datadog by changing the exporter config.
127
+ - (-) Auto-instrumentation adds startup latency (~200ms) — acceptable for long-running services, problematic for AWS Lambda cold starts.
128
+ - (-) High-cardinality span attributes (user IDs, order IDs) can explode storage costs. Set attribute cardinality limits.
129
+
130
+ ## Correlation ID Propagation
131
+
132
+ ### Correlation IDs vs. Trace IDs
133
+
134
+ Correlation IDs and trace IDs serve different purposes:
135
+
136
+ - **Trace ID** (from W3C traceparent): used by distributed tracing systems to correlate spans. Auto-generated by the tracing SDK. Used by engineers debugging specific requests.
137
+ - **Correlation ID**: a business-level identifier tied to a user request session or workflow, persisted in the application database for long-term audit and support. May span multiple traces if a workflow spans multiple requests or async operations.
138
+
139
+ Use both. The trace ID handles in-flight debugging; the correlation ID handles after-the-fact auditing and cross-referencing support tickets with log entries.
140
+
141
+ ### Propagation Standards
142
+
143
+ **Incoming requests:** Extract the correlation ID from the `X-Correlation-ID` header. If absent, generate a new UUID. Always return it in the response.
144
+
145
+ **Outgoing requests:** Attach the correlation ID to every outgoing HTTP call, Kafka message, and async job.
146
+
147
+ **Logs:** Include the correlation ID in every log entry during request processing.
148
+
149
+ ```typescript
150
+ // src/middleware/correlation.ts
151
+ import { randomUUID } from 'crypto'
152
+ import type { Request, Response, NextFunction } from 'express'
153
+ import { AsyncLocalStorage } from 'async_hooks'
154
+
155
+ const correlationStore = new AsyncLocalStorage<{ correlationId: string; traceId?: string }>()
156
+
157
+ export function correlationMiddleware(req: Request, res: Response, next: NextFunction): void {
158
+ const correlationId = req.get('x-correlation-id') || randomUUID() // also falls back when the header is present but empty
159
+ const traceId = (req.headers['traceparent'] as string | undefined)?.split('-')[1] // trace-id is the second field of traceparent
160
+
161
+ res.setHeader('X-Correlation-ID', correlationId)
162
+
163
+ correlationStore.run({ correlationId, traceId }, () => {
164
+ next()
165
+ })
166
+ }
167
+
168
+ export function getCorrelationId(): string | undefined {
169
+ return correlationStore.getStore()?.correlationId
170
+ }
171
+
172
+ // Attach to outgoing HTTP calls
173
+ export function outboundHeaders(): Record<string, string> {
174
+ const store = correlationStore.getStore()
175
+ if (!store) return {}
176
+ return {
177
+ 'X-Correlation-ID': store.correlationId,
178
+ }
179
+ }
180
+ ```
181
+
182
+ **In structured logs (pino example):**
183
+
184
+ ```typescript
185
+ import pino from 'pino'
186
+ import { getCorrelationId } from './middleware/correlation.js'
187
+
188
+ const baseLogger = pino({
189
+ level: process.env.LOG_LEVEL ?? 'info',
190
+ formatters: {
191
+ log(object) {
192
+ return {
193
+ ...object,
194
+ correlationId: getCorrelationId(),
195
+ service: process.env.SERVICE_NAME,
196
+ version: process.env.SERVICE_VERSION,
197
+ }
198
+ },
199
+ },
200
+ })
201
+
202
+ export const logger = baseLogger
203
+ ```
204
+
205
+ **In Kafka messages:**
206
+
207
+ ```typescript
208
+ // Attach correlation context to message headers
209
+ await producer.send({
210
+ topic: 'order.placed',
211
+ messages: [{
212
+ key: orderId,
213
+ value: JSON.stringify(payload),
214
+ headers: {
215
+ 'x-correlation-id': getCorrelationId() ?? '',
216
+ 'x-source-service': process.env.SERVICE_NAME ?? '',
217
+ },
218
+ }],
219
+ })
220
+ ```
221
+
222
+ **Consumer side — extract and propagate:**
223
+
224
+ ```typescript
225
+ consumer.run({
226
+ eachMessage: async ({ message }) => {
227
+ const correlationId =
228
+ message.headers?.['x-correlation-id']?.toString() || randomUUID() // producers may send an empty string
229
+
230
+ await correlationStore.run({ correlationId }, async () => { // await so the offset is not committed before processing finishes
231
+ await processMessage(message)
232
+ })
233
+ },
234
+ })
235
+ ```
236
+
237
+ **Trade-offs (correlation ID propagation):**
238
+ - (+) End-to-end request tracing across async boundaries that distributed tracing alone cannot bridge (async jobs, scheduled tasks, event chains spanning minutes or hours).
239
+ - (+) Customer support can reference a correlation ID in a ticket and engineers can filter all logs for that single workflow.
240
+ - (-) Every service must be updated to propagate the header. A service that drops it breaks the chain.
241
+ - (-) Adds a high-cardinality field to every log entry, which grows index size in backends that index it. Budget for the extra storage and set a retention policy for older logs.
242
+
243
+ ## Cross-Service SLO Definition and Error Budget Management
244
+
245
+ ### Defining SLOs Across Services
246
+
247
+ A Service Level Objective (SLO) is a target for service reliability expressed as a percentage of requests that succeed within a defined latency. In a multi-service system, each service has its own SLOs, and user-facing operations have composite SLOs that depend on the SLOs of all participating services.
248
+
249
+ **Single-service SLO example:**
250
+
251
+ ```yaml
252
+ # docs/slos/order-service.yml
253
+ service: order-service
254
+ slos:
255
+ - name: order_placement_availability
256
+ description: POST /orders returns 2xx or 422 (valid response, not an infra error)
257
+ target: 99.9% # about 43.2 minutes of allowed downtime per 30-day window
258
+ window: 30d
259
+ indicator:
260
+ type: availability
261
+ good_events: http_requests_total{service="order-service", path="/orders", method="POST", status=~"2..|422"}
262
+ total_events: http_requests_total{service="order-service", path="/orders", method="POST"}
263
+
264
+ - name: order_placement_latency
265
+ description: POST /orders responds within 500ms at p99
266
+ target: 99%
267
+ window: 30d
268
+ indicator:
269
+ type: latency
270
+ threshold_ms: 500
271
+ percentile: 99
272
+ metric: http_request_duration_ms{service="order-service", path="/orders"}
273
+ ```
274
+
275
+ **Composite SLO for a user-facing flow:** When a user places an order, the request touches the API gateway, auth service, order service, inventory service, and payment service. Assuming the services fail independently and each one is on the critical path, the composite availability is the product of each service's availability:
276
+
277
+ ```
278
+ P(order_success) = P(gateway) × P(auth) × P(order) × P(inventory) × P(payment)
279
+ = 0.9999 × 0.9999 × 0.9990 × 0.9995 × 0.9990
280
+ = 0.9973 (99.73% availability, ~2 hours downtime/month)
281
+ ```
282
+
283
+ This means that if you target 99.9% for the composite user experience, each participating service must significantly exceed that. A single service at 99.9% already spends the entire composite error budget, so any failure elsewhere pushes the flow below target.
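
The arithmetic above can be reproduced with a small helper. This is a minimal sketch that assumes independent failures and a purely serial call path. `compositeAvailability` and `downtimeMinutes` are hypothetical names, not part of any library.

```typescript
// Multiply per-service availabilities. Assumes independent failures and
// that every service is on the critical path of the request.
function compositeAvailability(availabilities: number[]): number {
  return availabilities.reduce((product, a) => product * a, 1)
}

// Expected downtime in minutes for a window of the given length in days.
function downtimeMinutes(availability: number, windowDays = 30): number {
  return (1 - availability) * windowDays * 24 * 60
}

// gateway, auth, order, inventory, payment
const orderFlow = compositeAvailability([0.9999, 0.9999, 0.999, 0.9995, 0.999])

console.log(orderFlow.toFixed(4))                   // "0.9973"
console.log(Math.round(downtimeMinutes(orderFlow))) // 117 (minutes per 30 days)
```

Running the same function with candidate targets is a quick way to check whether a proposed set of per-service SLOs can support the journey-level SLO at all.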
284
+
285
+ **Practical SLO guidelines:**
286
+ - Define SLOs per service and per user-facing journey. Both are needed.
287
+ - Use 30-day rolling windows for error budgets — avoids quarterly spikes.
288
+ - Alert on error budget burn rate (e.g., page on-call when 2% of the 30-day error budget burns within one hour, which is a 14.4x burn rate) rather than hard availability thresholds.
289
+ - SLOs should be stored in version control alongside the service code.
290
+
291
+ ### Error Budget Management
292
+
293
+ An error budget is the allowed failure capacity derived from the SLO target: if the SLO is 99.9%, the error budget is 0.1% (about 43.2 minutes of downtime per 30-day window).
294
+
295
+ **Error budget policy decisions:**
296
+
297
+ | Error Budget Remaining | Allowed Action |
298
+ |------------------------|----------------|
299
+ | > 50% | Normal development velocity, feature work, experiments |
300
+ | 25–50% | Caution. Prefer reliability improvements over new features |
301
+ | 10–25% | Freeze risky deploys. Focus on reliability work |
302
+ | < 10% | Stop all non-critical deploys. Incident review required |
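
The policy table can be encoded so that deploy tooling applies it consistently. The sketch below is one possible encoding. The names `budgetRemaining`, `policyFor`, and the action labels are hypothetical, and assigning the boundary values (exactly 50%, 25%, 10%) to the less strict tier is an assumption the table does not settle.

```typescript
type BudgetAction = 'normal' | 'caution' | 'freeze-risky' | 'stop-non-critical'

// Fraction of the error budget still unspent. 1 means untouched,
// 0 means exhausted, and negative values mean the budget is overspent.
function budgetRemaining(sloTarget: number, observedErrorRatio: number): number {
  const budget = 1 - sloTarget
  return 1 - observedErrorRatio / budget
}

// Maps the remaining fraction to a row of the policy table.
function policyFor(remaining: number): BudgetAction {
  if (remaining > 0.5) return 'normal'
  if (remaining >= 0.25) return 'caution'
  if (remaining >= 0.1) return 'freeze-risky'
  return 'stop-non-critical'
}

// A 99.9% SLO with 0.04% of requests failing over the window
// leaves roughly 60% of the budget.
const action = policyFor(budgetRemaining(0.999, 0.0004))
console.log(action) // "normal"
```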
303
+
304
+ **Prometheus alert rule for error budget burn rate:**
305
+
306
+ ```yaml
307
+ # alerts/slo-burn-rate.yml
308
+ groups:
309
+ - name: slo_burn_rate
310
+ rules:
311
+ - alert: HighErrorBudgetBurnRate
312
+ expr: |
313
+ (
314
+ sum by (service) (rate(http_requests_total{status=~"5.."}[1h])) /
314
+ sum by (service) (rate(http_requests_total[1h]))
316
+ ) > (14.4 * (1 - 0.999))
317
+ for: 2m
318
+ labels:
319
+ severity: page
320
+ annotations:
321
+ summary: "{{ $labels.service }} burning error budget at 14x rate"
322
+ description: |
323
+ Service {{ $labels.service }} is burning its monthly error budget
324
+ at 14x the sustainable rate. At this rate, the full monthly budget
325
+ will be consumed in about 50 hours (roughly 2 days).
326
+ ```
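
The `14.4` factor in the rule comes from simple arithmetic: a 14.4x burn rate spends about 2% of a 30-day budget per hour and would exhaust the whole budget in roughly 50 hours. A minimal sketch of that calculation follows. The function names are hypothetical.

```typescript
// Burn rate = observed error ratio divided by the error ratio the SLO allows.
function burnRate(observedErrorRatio: number, sloTarget: number): number {
  return observedErrorRatio / (1 - sloTarget)
}

// Hours until a full window's budget is gone at a constant burn rate.
function hoursToExhaustion(rate: number, windowDays = 30): number {
  return (windowDays * 24) / rate
}

// Share of the window's budget consumed during `hours` at the given rate.
function budgetConsumed(rate: number, hours: number, windowDays = 30): number {
  return (rate * hours) / (windowDays * 24)
}

const observed = burnRate(0.0144, 0.999)   // about 14.4 for a 1.44% error ratio
const untilEmpty = hoursToExhaustion(14.4) // about 50 hours
const spentInOneHour = budgetConsumed(14.4, 1) // about 0.02, i.e. 2% of the budget
```

Working backwards from "what share of the budget am I willing to lose before paging" to a burn-rate threshold keeps the alert tied to the SLO instead of to an arbitrary error percentage.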
327
+
328
+ **Trade-offs (SLO-based alerting):**
329
+ - (+) Error budget burn rate alerts fire early (before the budget is exhausted) and reduce alert fatigue compared to hard threshold alerts.
330
+ - (+) Aligns engineering and product decisions: spending error budget on risky experiments is an explicit product trade-off.
331
+ - (-) Requires setting meaningful SLO targets — too lenient wastes budget, too strict makes every incident a budget crisis.
332
+ - (-) Composite SLOs across services require all participating services to instrument and report correctly.
333
+
334
+ ## Failure Attribution and Root Cause Analysis
335
+
336
+ ### Attributing Failures in Distributed Traces
337
+
338
+ When a distributed request fails, the trace shows which span failed and why. The root cause is typically the deepest span with an error status — but not always. Use a structured analysis approach.
339
+
340
+ **Steps for trace-based failure attribution:**
341
+
342
+ 1. Identify the user-facing error (the outermost span with an error status).
343
+ 2. Walk the span tree inward, following spans with an error status, until you reach the deepest one. This is usually the origin of the error.
344
+ 3. Check if the origin span is a timeout, a 5xx from a downstream service, or an exception in application code.
345
+ 4. Classify the failure: infrastructure (network, hardware), dependency (external API, database), or application (bug, unhandled edge case).
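
Steps 1 and 2 can be sketched as a walk over a simplified span tree. `SpanNode` is a hypothetical, reduced shape and not the OpenTelemetry data model. The walk only follows unbroken chains of errored spans, so an error that a parent recovered from (for example through a successful retry) is deliberately ignored.

```typescript
// Hypothetical, simplified span shape for illustration.
interface SpanNode {
  name: string
  service: string
  error: boolean
  children: SpanNode[]
}

// Returns the deepest errored span reachable from the root through
// errored spans only, or undefined when the root did not fail.
function findErrorOrigin(
  span: SpanNode,
  depth = 0,
): { span: SpanNode; depth: number } | undefined {
  if (!span.error) return undefined
  let deepest = { span, depth }
  for (const child of span.children) {
    const candidate = findErrorOrigin(child, depth + 1)
    if (candidate && candidate.depth > deepest.depth) deepest = candidate
  }
  return deepest
}

const trace: SpanNode = {
  name: 'POST /orders', service: 'api-gateway', error: true,
  children: [{
    name: 'placeOrder', service: 'order-service', error: true,
    children: [
      { name: 'reserveStock', service: 'inventory-service', error: false, children: [] },
      { name: 'charge', service: 'payment-service', error: true, children: [] },
    ],
  }],
}

const origin = findErrorOrigin(trace)
console.log(origin?.span.service) // "payment-service"
```

The result is a starting point for steps 3 and 4, not a verdict: the deepest errored span may itself be a victim of a timeout or a dependency outside the trace.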
346
+
347
+ **Span attributes to include for attribution:**
348
+
349
+ ```typescript
350
+ // Good span attributes for failure attribution
351
+ span.setAttributes({
352
+ 'http.method': 'POST',
353
+ 'http.url': 'https://payment-service/charge',
354
+ 'http.status_code': 503,
355
+ 'error.type': 'ServiceUnavailable',
356
+ 'error.message': 'payment-service: connection timeout after 2000ms',
357
+ 'downstream.service': 'payment-service',
358
+ 'retry.attempt': 2,
359
+ 'retry.max': 3,
360
+ })
361
+ ```
362
+
363
+ ### Distributed Logging Aggregation
364
+
365
+ All services must ship logs to a central log aggregation system (ELK stack, Loki, CloudWatch Logs). Structured JSON logs with consistent fields are essential.
366
+
367
+ **Mandatory log fields (every log entry from every service):**
368
+
369
+ ```typescript
370
+ interface LogEntry {
371
+ timestamp: string // ISO 8601
372
+ level: 'debug' | 'info' | 'warn' | 'error' | 'fatal'
373
+ service: string // service name from SERVICE_NAME env var
374
+ version: string // service version
375
+ correlationId?: string // propagated X-Correlation-ID
376
+ traceId?: string // from W3C traceparent
377
+ spanId?: string // current span ID
378
+ message: string
379
+ // Additional context fields as needed
380
+ [key: string]: unknown
381
+ }
382
+ ```
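
The `traceId` field comes from the W3C `traceparent` header, which has the form `version-traceid-parentid-flags` in lowercase hex. The sketch below shows one way to extract it. `parseTraceparent` and `TraceContext` are hypothetical names, and the parser only accepts the four-field layout used by version `00`.

```typescript
interface TraceContext {
  traceId: string      // 32 hex characters
  parentSpanId: string // 16 hex characters: the caller's span, not the current one
  sampled: boolean     // lowest bit of the flags field
}

const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/

function parseTraceparent(header: string | undefined): TraceContext | undefined {
  if (!header) return undefined
  const match = TRACEPARENT.exec(header.trim())
  if (!match) return undefined
  const [, version, traceId, parentSpanId, flags] = match
  // Version ff and all-zero IDs are invalid per the W3C Trace Context spec.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) {
    return undefined
  }
  return { traceId, parentSpanId, sampled: (parseInt(flags, 16) & 1) === 1 }
}

const parsed = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01')
console.log(parsed?.traceId) // "4bf92f3577b34da6a3ce929d0e0e4736"
```

When an OpenTelemetry SDK is active, reading the IDs from the current span is usually preferable, since `spanId` in the log entry should identify the service's own span and not the caller's.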
383
+
384
+ **Log query patterns for failure attribution:**
385
+
386
+ ```
387
+ # Find all log entries for a specific correlation ID across all services
388
+ correlationId = "550e8400-e29b-41d4-a716-446655440000"
389
+
390
+ # Find all errors in the order-placement flow in the last hour
391
+ level = "error" AND correlationId = "..." AND timestamp > now() - 1h
392
+
393
+ # Find timeout patterns across the payment-service
394
+ service = "payment-service" AND message:timeout AND level = "error"
395
+ | stats count by bin(1m)
396
+ ```
397
+
398
+ ### Cross-Service Dashboards
399
+
400
+ A cross-service dashboard gives the on-call engineer a single view of system health:
401
+
402
+ **Essential panels for a cross-service operations dashboard:**
403
+
404
+ ```yaml
405
+ # Grafana dashboard structure (conceptual)
406
+ dashboard:
407
+ title: "Multi-Service Operations"
408
+ rows:
409
+ - title: "User-Facing Health"
410
+ panels:
411
+ - name: "Composite Availability (30m window)"
412
+ type: stat
413
+ query: |
414
+ avg(rate(http_requests_success[30m]) / rate(http_requests_total[30m]))
415
+ - name: "p99 Latency by Service"
416
+ type: timeseries
417
+ query: |
418
+ histogram_quantile(0.99, sum by (le, service) (rate(http_request_duration_ms_bucket[5m])))
419
+
420
+ - title: "Error Budget"
421
+ panels:
422
+ - name: "Error Budget Remaining (30d)"
423
+ type: gauge
424
+ thresholds: [10, 25, 50]
425
+ query: |
426
+ 100 * (1 - (sum(rate(http_requests_total{status=~"5.."}[30d])) /
426
+ sum(rate(http_requests_total[30d]))) / (1 - 0.999))
428
+
429
+ - title: "Service Dependencies"
430
+ panels:
431
+ - name: "Cross-Service Call Success Rate"
432
+ type: heatmap
433
+ description: "Source service (rows) calling destination service (columns)"
434
+ ```
435
+
436
+ **Trade-offs (centralized dashboards):**
437
+ - (+) Single pane of glass during incidents — on-call does not need to check each service individually.
438
+ - (+) Error budget panels enforce SLO accountability.
439
+ - (-) Dashboard maintenance burden. When services are added or renamed, dashboards go stale.
440
+ - (-) A single cross-service dashboard can obscure service-specific details. Link to per-service dashboards from the cross-service dashboard rather than collapsing everything into one view.
441
+
442
+ ## OpenTelemetry Collector Deployment
443
+
444
+ For production deployments, route telemetry through an OpenTelemetry Collector rather than exporting directly from services to the backend. The Collector acts as a buffer, processor, and router.
445
+
446
+ ```yaml
447
+ # otel-collector-config.yml
448
+ receivers:
449
+ otlp:
450
+ protocols:
451
+ grpc:
452
+ endpoint: 0.0.0.0:4317
453
+ http:
454
+ endpoint: 0.0.0.0:4318
455
+
456
+ processors:
457
+ batch:
458
+ timeout: 10s
459
+ send_batch_size: 1024
460
+ memory_limiter:
461
+ check_interval: 1s
462
+ limit_mib: 400
463
+ spike_limit_mib: 100
464
+ resource:
465
+ attributes:
466
+ - action: insert
467
+ key: deployment.environment
468
+ value: "${DEPLOYMENT_ENVIRONMENT}"
469
+
470
+ exporters:
471
+ otlp/jaeger: # Jaeger ingests OTLP natively; the dedicated jaeger exporter is no longer shipped in current Collector releases
472
+ endpoint: jaeger-collector:4317
473
+ tls:
474
+ insecure: false
475
+ cert_file: /certs/collector.crt
476
+ key_file: /certs/collector.key
477
+ prometheus:
478
+ endpoint: "0.0.0.0:8889"
479
+ namespace: otel
480
+ loki:
481
+ endpoint: http://loki:3100/loki/api/v1/push
482
+
483
+ service:
484
+ pipelines:
485
+ traces:
486
+ receivers: [otlp]
487
+ processors: [memory_limiter, batch, resource]
488
+ exporters: [otlp/jaeger]
489
+ metrics:
490
+ receivers: [otlp]
491
+ processors: [memory_limiter, batch]
492
+ exporters: [prometheus]
493
+ logs:
494
+ receivers: [otlp]
495
+ processors: [memory_limiter, batch, resource]
496
+ exporters: [loki]
497
+ ```
498
+
499
+ **Trade-offs (OTel Collector):**
500
+ - (+) Backend-agnostic. Switch from Jaeger to Tempo by changing the exporter — no service code changes.
501
+ - (+) The Collector can sample, filter, and enrich telemetry before export. Reduces storage costs.
502
+ - (+) The Collector batches and retries exports, which absorbs brief backend outages. The buffer is bounded and in-memory by default, so long outages or Collector restarts can still lose data.
503
+ - (-) Adds one more component to operate. The Collector must be highly available or services lose telemetry.
504
+ - (-) Misconfigured sampling in the Collector can silently drop critical traces. Monitor Collector drop rate.
505
+
506
+ ## Sampling Strategy
507
+
508
+ High-traffic services can generate millions of spans per minute. Sampling reduces storage costs at the expense of completeness.
509
+
510
+ **Head-based sampling:** The tracing SDK decides at the start of a trace whether to record it (based on a percentage, e.g., 1%). Simple, but the decision is made before the outcome is known, so most error and slow traces are dropped along with the rest.
511
+
512
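+ **Tail-based sampling (recommended for production):** The Collector holds spans in memory until the trace is complete, then decides whether to keep it based on trace-level criteria (e.g., keep all error traces, keep 1% of success traces). All spans of a trace must reach the same Collector instance for this to work, and memory use grows with `decision_wait` and trace volume.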
513
+
514
+ ```yaml
515
+ # Tail-based sampling in OTel Collector
516
+ processors:
517
+ tail_sampling:
518
+ decision_wait: 10s
519
+ num_traces: 100000
520
+ expected_new_traces_per_sec: 1000
521
+ policies:
522
+ - name: errors-policy
523
+ type: status_code
524
+ status_code: {status_codes: [ERROR]}
525
+ - name: slow-traces-policy
526
+ type: latency
527
+ latency: {threshold_ms: 2000}
528
+ - name: probabilistic-policy
529
+ type: probabilistic
530
+ probabilistic: {sampling_percentage: 1}
531
+ ```
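
The three policies combine as a logical OR: a trace is kept if any policy matches. The sketch below illustrates that decision in plain TypeScript. It is not the Collector's implementation. `FinishedSpan`, `SamplingOptions`, and `keepTrace` are hypothetical names, and approximating trace duration by the longest span is a simplification.

```typescript
interface FinishedSpan {
  durationMs: number
  status: 'OK' | 'ERROR' | 'UNSET'
}

interface SamplingOptions {
  latencyThresholdMs: number
  samplingPercentage: number // 0 to 100
  random?: () => number      // injectable for tests, defaults to Math.random
}

// Decides after the trace is complete whether to keep it.
function keepTrace(spans: FinishedSpan[], options: SamplingOptions): boolean {
  // Policy 1: keep every trace that contains an error.
  if (spans.some((s) => s.status === 'ERROR')) return true
  // Policy 2: keep slow traces (longest span used as a stand-in for trace duration).
  const longest = Math.max(0, ...spans.map((s) => s.durationMs))
  if (longest > options.latencyThresholdMs) return true
  // Policy 3: keep a small random share of everything else.
  const random = options.random ?? Math.random
  return random() * 100 < options.samplingPercentage
}

const options = { latencyThresholdMs: 2000, samplingPercentage: 1 }
const failed = keepTrace([{ durationMs: 40, status: 'ERROR' }], { ...options, random: () => 0.99 })
const slow = keepTrace([{ durationMs: 3500, status: 'OK' }], { ...options, random: () => 0.99 })
const ordinary = keepTrace([{ durationMs: 40, status: 'OK' }], { ...options, random: () => 0.99 })
console.log(failed, slow, ordinary) // true true false
```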
532
+
533
+ ## Common Pitfalls
534
+
535
+ **Missing header propagation.** A service receives a `traceparent` header but does not forward it in outgoing calls. The trace is broken at that service — the downstream spans appear as independent traces with no parent. Fix: instrument all HTTP clients, message producers, and async job dispatchers to propagate trace context.
536
+
537
+ **Log correlation without structured logs.** If services log plain text without the correlation ID field, log queries cannot aggregate across services. Fix: require structured JSON logs with `correlationId` and `traceId` as top-level fields in all services.
538
+
539
+ **SLOs without alerting.** Defining SLOs in YAML that nobody reads provides no operational benefit. Fix: SLO definitions must be backed by alerting rules that fire before the budget is exhausted. Treat unenforced SLOs as unfinished work.
540
+
541
+ **Dashboard sprawl.** Each service creates its own dashboard with different conventions, different time windows, and different color schemes. Nobody uses them during incidents because they cannot find the right one. Fix: establish a single cross-service dashboard as the on-call starting point with links to per-service detail dashboards.
542
+
543
+ **High-cardinality span attributes.** Adding user IDs or request payloads as span attributes creates millions of unique values, which inflates trace storage and indexing costs and turns into a label explosion if metrics are derived from spans. Fix: restrict span attributes to known-cardinality fields (service names, status codes, HTTP methods, boolean flags). Put user IDs in log fields, not span attributes.
544
+
545
+ **Tracing gaps in async flows.** A trace starts when an HTTP request arrives and ends when the response is sent. If that request enqueues a job that runs 30 minutes later, the trace does not capture the job processing. Fix: propagate the trace context in job metadata and start a new span in the worker that references the original trace through a span link (the OpenTelemetry counterpart of OpenTracing's `FOLLOWS_FROM` reference).