@wazir-dev/cli 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (629)
  1. package/AGENTS.md +111 -0
  2. package/CHANGELOG.md +14 -0
  3. package/CONTRIBUTING.md +101 -0
  4. package/LICENSE +21 -0
  5. package/README.md +314 -0
  6. package/assets/composition-engine.mmd +34 -0
  7. package/assets/demo-script.sh +17 -0
  8. package/assets/logo-dark.svg +14 -0
  9. package/assets/logo.svg +14 -0
  10. package/assets/pipeline.mmd +39 -0
  11. package/assets/record-demo.sh +51 -0
  12. package/docs/README.md +51 -0
  13. package/docs/adapters/context-mode.md +60 -0
  14. package/docs/concepts/architecture.md +87 -0
  15. package/docs/concepts/artifact-model.md +60 -0
  16. package/docs/concepts/composition-engine.md +36 -0
  17. package/docs/concepts/indexing-and-recall.md +160 -0
  18. package/docs/concepts/observability.md +41 -0
  19. package/docs/concepts/roles-and-workflows.md +59 -0
  20. package/docs/concepts/terminology-policy.md +27 -0
  21. package/docs/getting-started/01-installation.md +78 -0
  22. package/docs/getting-started/02-first-run.md +102 -0
  23. package/docs/getting-started/03-adding-to-project.md +15 -0
  24. package/docs/getting-started/04-host-setup.md +15 -0
  25. package/docs/guides/ci-integration.md +15 -0
  26. package/docs/guides/creating-skills.md +15 -0
  27. package/docs/guides/expertise-module-authoring.md +15 -0
  28. package/docs/guides/hook-development.md +15 -0
  29. package/docs/guides/memory-and-learnings.md +34 -0
  30. package/docs/guides/multi-host-export.md +15 -0
  31. package/docs/guides/troubleshooting.md +101 -0
  32. package/docs/guides/writing-custom-roles.md +15 -0
  33. package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
  34. package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
  35. package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
  36. package/docs/readmes/INDEX.md +99 -0
  37. package/docs/readmes/features/expertise/README.md +171 -0
  38. package/docs/readmes/features/exports/README.md +222 -0
  39. package/docs/readmes/features/hooks/README.md +103 -0
  40. package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
  41. package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
  42. package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
  43. package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
  44. package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
  45. package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
  46. package/docs/readmes/features/hooks/session-start.md +119 -0
  47. package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
  48. package/docs/readmes/features/roles/README.md +157 -0
  49. package/docs/readmes/features/roles/clarifier.md +152 -0
  50. package/docs/readmes/features/roles/content-author.md +190 -0
  51. package/docs/readmes/features/roles/designer.md +193 -0
  52. package/docs/readmes/features/roles/executor.md +184 -0
  53. package/docs/readmes/features/roles/learner.md +210 -0
  54. package/docs/readmes/features/roles/planner.md +182 -0
  55. package/docs/readmes/features/roles/researcher.md +164 -0
  56. package/docs/readmes/features/roles/reviewer.md +184 -0
  57. package/docs/readmes/features/roles/specifier.md +162 -0
  58. package/docs/readmes/features/roles/verifier.md +215 -0
  59. package/docs/readmes/features/schemas/README.md +178 -0
  60. package/docs/readmes/features/skills/README.md +63 -0
  61. package/docs/readmes/features/skills/brainstorming.md +96 -0
  62. package/docs/readmes/features/skills/debugging.md +148 -0
  63. package/docs/readmes/features/skills/design.md +120 -0
  64. package/docs/readmes/features/skills/prepare-next.md +109 -0
  65. package/docs/readmes/features/skills/run-audit.md +159 -0
  66. package/docs/readmes/features/skills/scan-project.md +109 -0
  67. package/docs/readmes/features/skills/self-audit.md +176 -0
  68. package/docs/readmes/features/skills/tdd.md +137 -0
  69. package/docs/readmes/features/skills/using-skills.md +92 -0
  70. package/docs/readmes/features/skills/verification.md +120 -0
  71. package/docs/readmes/features/skills/writing-plans.md +104 -0
  72. package/docs/readmes/features/tooling/README.md +320 -0
  73. package/docs/readmes/features/workflows/README.md +186 -0
  74. package/docs/readmes/features/workflows/author.md +181 -0
  75. package/docs/readmes/features/workflows/clarify.md +154 -0
  76. package/docs/readmes/features/workflows/design-review.md +171 -0
  77. package/docs/readmes/features/workflows/design.md +169 -0
  78. package/docs/readmes/features/workflows/discover.md +162 -0
  79. package/docs/readmes/features/workflows/execute.md +173 -0
  80. package/docs/readmes/features/workflows/learn.md +167 -0
  81. package/docs/readmes/features/workflows/plan-review.md +165 -0
  82. package/docs/readmes/features/workflows/plan.md +170 -0
  83. package/docs/readmes/features/workflows/prepare-next.md +167 -0
  84. package/docs/readmes/features/workflows/review.md +169 -0
  85. package/docs/readmes/features/workflows/run-audit.md +191 -0
  86. package/docs/readmes/features/workflows/spec-challenge.md +159 -0
  87. package/docs/readmes/features/workflows/specify.md +160 -0
  88. package/docs/readmes/features/workflows/verify.md +177 -0
  89. package/docs/readmes/packages/README.md +50 -0
  90. package/docs/readmes/packages/ajv.md +117 -0
  91. package/docs/readmes/packages/context-mode.md +118 -0
  92. package/docs/readmes/packages/gray-matter.md +116 -0
  93. package/docs/readmes/packages/node-test.md +137 -0
  94. package/docs/readmes/packages/yaml.md +112 -0
  95. package/docs/reference/configuration-reference.md +159 -0
  96. package/docs/reference/expertise-index.md +52 -0
  97. package/docs/reference/git-flow.md +43 -0
  98. package/docs/reference/hooks.md +87 -0
  99. package/docs/reference/host-exports.md +50 -0
  100. package/docs/reference/launch-checklist.md +172 -0
  101. package/docs/reference/marketplace-listings.md +76 -0
  102. package/docs/reference/release-process.md +34 -0
  103. package/docs/reference/roles-reference.md +77 -0
  104. package/docs/reference/skills.md +33 -0
  105. package/docs/reference/templates.md +29 -0
  106. package/docs/reference/tooling-cli.md +94 -0
  107. package/docs/truth-claims.yaml +222 -0
  108. package/expertise/PROGRESS.md +63 -0
  109. package/expertise/README.md +18 -0
  110. package/expertise/antipatterns/PROGRESS.md +56 -0
  111. package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
  112. package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
  113. package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
  114. package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
  115. package/expertise/antipatterns/backend/index.md +24 -0
  116. package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
  117. package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
  118. package/expertise/antipatterns/code/async-antipatterns.md +622 -0
  119. package/expertise/antipatterns/code/code-smells.md +1186 -0
  120. package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
  121. package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
  122. package/expertise/antipatterns/code/index.md +27 -0
  123. package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
  124. package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
  125. package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
  126. package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
  127. package/expertise/antipatterns/design/dark-patterns.md +1121 -0
  128. package/expertise/antipatterns/design/index.md +22 -0
  129. package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
  130. package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
  131. package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
  132. package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
  133. package/expertise/antipatterns/frontend/index.md +23 -0
  134. package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
  135. package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
  136. package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
  137. package/expertise/antipatterns/index.md +31 -0
  138. package/expertise/antipatterns/performance/index.md +20 -0
  139. package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
  140. package/expertise/antipatterns/performance/premature-optimization.md +623 -0
  141. package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
  142. package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
  143. package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
  144. package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
  145. package/expertise/antipatterns/process/index.md +23 -0
  146. package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
  147. package/expertise/antipatterns/security/index.md +20 -0
  148. package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
  149. package/expertise/antipatterns/security/security-theater.md +843 -0
  150. package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
  151. package/expertise/architecture/PROGRESS.md +70 -0
  152. package/expertise/architecture/data/caching-architecture.md +671 -0
  153. package/expertise/architecture/data/data-consistency.md +574 -0
  154. package/expertise/architecture/data/data-modeling.md +536 -0
  155. package/expertise/architecture/data/event-streams-and-queues.md +634 -0
  156. package/expertise/architecture/data/index.md +25 -0
  157. package/expertise/architecture/data/search-architecture.md +663 -0
  158. package/expertise/architecture/data/sql-vs-nosql.md +708 -0
  159. package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
  160. package/expertise/architecture/decisions/build-vs-buy.md +616 -0
  161. package/expertise/architecture/decisions/index.md +23 -0
  162. package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
  163. package/expertise/architecture/decisions/technology-selection.md +616 -0
  164. package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
  165. package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
  166. package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
  167. package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
  168. package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
  169. package/expertise/architecture/distributed/index.md +25 -0
  170. package/expertise/architecture/distributed/saga-pattern.md +797 -0
  171. package/expertise/architecture/foundations/architectural-thinking.md +460 -0
  172. package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
  173. package/expertise/architecture/foundations/design-principles-solid.md +649 -0
  174. package/expertise/architecture/foundations/domain-driven-design.md +719 -0
  175. package/expertise/architecture/foundations/index.md +25 -0
  176. package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
  177. package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
  178. package/expertise/architecture/index.md +34 -0
  179. package/expertise/architecture/integration/api-design-graphql.md +638 -0
  180. package/expertise/architecture/integration/api-design-grpc.md +804 -0
  181. package/expertise/architecture/integration/api-design-rest.md +892 -0
  182. package/expertise/architecture/integration/index.md +25 -0
  183. package/expertise/architecture/integration/third-party-integration.md +795 -0
  184. package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
  185. package/expertise/architecture/integration/websockets-realtime.md +791 -0
  186. package/expertise/architecture/mobile-architecture/index.md +22 -0
  187. package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
  188. package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
  189. package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
  190. package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
  191. package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
  192. package/expertise/architecture/patterns/event-driven.md +797 -0
  193. package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
  194. package/expertise/architecture/patterns/index.md +27 -0
  195. package/expertise/architecture/patterns/layered-architecture.md +736 -0
  196. package/expertise/architecture/patterns/microservices.md +753 -0
  197. package/expertise/architecture/patterns/modular-monolith.md +692 -0
  198. package/expertise/architecture/patterns/monolith.md +626 -0
  199. package/expertise/architecture/patterns/plugin-architecture.md +735 -0
  200. package/expertise/architecture/patterns/serverless.md +780 -0
  201. package/expertise/architecture/scaling/database-scaling.md +615 -0
  202. package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
  203. package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
  204. package/expertise/architecture/scaling/index.md +24 -0
  205. package/expertise/architecture/scaling/multi-tenancy.md +800 -0
  206. package/expertise/architecture/scaling/stateless-design.md +787 -0
  207. package/expertise/backend/embedded-firmware.md +625 -0
  208. package/expertise/backend/go.md +853 -0
  209. package/expertise/backend/index.md +24 -0
  210. package/expertise/backend/java-spring.md +448 -0
  211. package/expertise/backend/node-typescript.md +625 -0
  212. package/expertise/backend/python-fastapi.md +724 -0
  213. package/expertise/backend/rust.md +458 -0
  214. package/expertise/backend/solidity.md +711 -0
  215. package/expertise/composition-map.yaml +443 -0
  216. package/expertise/content/foundations/content-modeling.md +395 -0
  217. package/expertise/content/foundations/editorial-standards.md +449 -0
  218. package/expertise/content/foundations/index.md +24 -0
  219. package/expertise/content/foundations/microcopy.md +455 -0
  220. package/expertise/content/foundations/terminology-governance.md +509 -0
  221. package/expertise/content/index.md +34 -0
  222. package/expertise/content/patterns/accessibility-copy.md +518 -0
  223. package/expertise/content/patterns/index.md +24 -0
  224. package/expertise/content/patterns/notification-content.md +433 -0
  225. package/expertise/content/patterns/sample-content.md +486 -0
  226. package/expertise/content/patterns/state-copy.md +439 -0
  227. package/expertise/design/PROGRESS.md +58 -0
  228. package/expertise/design/disciplines/dark-mode-theming.md +577 -0
  229. package/expertise/design/disciplines/design-systems.md +595 -0
  230. package/expertise/design/disciplines/index.md +25 -0
  231. package/expertise/design/disciplines/information-architecture.md +800 -0
  232. package/expertise/design/disciplines/interaction-design.md +788 -0
  233. package/expertise/design/disciplines/responsive-design.md +552 -0
  234. package/expertise/design/disciplines/usability-testing.md +516 -0
  235. package/expertise/design/disciplines/user-research.md +792 -0
  236. package/expertise/design/foundations/accessibility-design.md +796 -0
  237. package/expertise/design/foundations/color-theory.md +797 -0
  238. package/expertise/design/foundations/iconography.md +795 -0
  239. package/expertise/design/foundations/index.md +26 -0
  240. package/expertise/design/foundations/motion-and-animation.md +653 -0
  241. package/expertise/design/foundations/rtl-design.md +585 -0
  242. package/expertise/design/foundations/spacing-and-layout.md +607 -0
  243. package/expertise/design/foundations/typography.md +800 -0
  244. package/expertise/design/foundations/visual-hierarchy.md +761 -0
  245. package/expertise/design/index.md +32 -0
  246. package/expertise/design/patterns/authentication-flows.md +474 -0
  247. package/expertise/design/patterns/content-consumption.md +789 -0
  248. package/expertise/design/patterns/data-display.md +618 -0
  249. package/expertise/design/patterns/e-commerce.md +1494 -0
  250. package/expertise/design/patterns/feedback-and-states.md +642 -0
  251. package/expertise/design/patterns/forms-and-input.md +819 -0
  252. package/expertise/design/patterns/gamification.md +801 -0
  253. package/expertise/design/patterns/index.md +31 -0
  254. package/expertise/design/patterns/microinteractions.md +449 -0
  255. package/expertise/design/patterns/navigation.md +800 -0
  256. package/expertise/design/patterns/notifications.md +705 -0
  257. package/expertise/design/patterns/onboarding.md +700 -0
  258. package/expertise/design/patterns/search-and-filter.md +601 -0
  259. package/expertise/design/patterns/settings-and-preferences.md +768 -0
  260. package/expertise/design/patterns/social-and-community.md +748 -0
  261. package/expertise/design/platforms/desktop-native.md +612 -0
  262. package/expertise/design/platforms/index.md +25 -0
  263. package/expertise/design/platforms/mobile-android.md +825 -0
  264. package/expertise/design/platforms/mobile-cross-platform.md +983 -0
  265. package/expertise/design/platforms/mobile-ios.md +699 -0
  266. package/expertise/design/platforms/tablet.md +794 -0
  267. package/expertise/design/platforms/web-dashboard.md +790 -0
  268. package/expertise/design/platforms/web-responsive.md +550 -0
  269. package/expertise/design/psychology/behavioral-nudges.md +449 -0
  270. package/expertise/design/psychology/cognitive-load.md +1191 -0
  271. package/expertise/design/psychology/error-psychology.md +778 -0
  272. package/expertise/design/psychology/index.md +22 -0
  273. package/expertise/design/psychology/persuasive-design.md +736 -0
  274. package/expertise/design/psychology/user-mental-models.md +623 -0
  275. package/expertise/design/tooling/open-pencil.md +266 -0
  276. package/expertise/frontend/angular.md +1073 -0
  277. package/expertise/frontend/desktop-electron.md +546 -0
  278. package/expertise/frontend/flutter.md +782 -0
  279. package/expertise/frontend/index.md +27 -0
  280. package/expertise/frontend/native-android.md +409 -0
  281. package/expertise/frontend/native-ios.md +490 -0
  282. package/expertise/frontend/react-native.md +1160 -0
  283. package/expertise/frontend/react.md +808 -0
  284. package/expertise/frontend/vue.md +1089 -0
  285. package/expertise/humanize/domain-rules-code.md +79 -0
  286. package/expertise/humanize/domain-rules-content.md +67 -0
  287. package/expertise/humanize/domain-rules-technical-docs.md +56 -0
  288. package/expertise/humanize/index.md +35 -0
  289. package/expertise/humanize/self-audit-checklist.md +87 -0
  290. package/expertise/humanize/sentence-patterns.md +218 -0
  291. package/expertise/humanize/vocabulary-blacklist.md +105 -0
  292. package/expertise/i18n/PROGRESS.md +65 -0
  293. package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
  294. package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
  295. package/expertise/i18n/advanced/complex-scripts.md +30 -0
  296. package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
  297. package/expertise/i18n/advanced/testing-i18n.md +28 -0
  298. package/expertise/i18n/content/content-adaptation.md +23 -0
  299. package/expertise/i18n/content/locale-specific-formatting.md +23 -0
  300. package/expertise/i18n/content/machine-translation-integration.md +28 -0
  301. package/expertise/i18n/content/translation-management.md +29 -0
  302. package/expertise/i18n/foundations/date-time-calendars.md +67 -0
  303. package/expertise/i18n/foundations/i18n-architecture.md +272 -0
  304. package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
  305. package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
  306. package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
  307. package/expertise/i18n/foundations/string-externalization.md +236 -0
  308. package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
  309. package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
  310. package/expertise/i18n/index.md +38 -0
  311. package/expertise/i18n/platform/backend-i18n.md +31 -0
  312. package/expertise/i18n/platform/flutter-i18n.md +148 -0
  313. package/expertise/i18n/platform/native-android-i18n.md +36 -0
  314. package/expertise/i18n/platform/native-ios-i18n.md +36 -0
  315. package/expertise/i18n/platform/react-i18n.md +103 -0
  316. package/expertise/i18n/platform/web-css-i18n.md +81 -0
  317. package/expertise/i18n/rtl/arabic-specific.md +175 -0
  318. package/expertise/i18n/rtl/hebrew-specific.md +149 -0
  319. package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
  320. package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
  321. package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
  322. package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
  323. package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
  324. package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
  325. package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
  326. package/expertise/i18n/rtl/rtl-typography.md +160 -0
  327. package/expertise/index.md +113 -0
  328. package/expertise/index.yaml +216 -0
  329. package/expertise/infrastructure/cloud-aws.md +597 -0
  330. package/expertise/infrastructure/cloud-gcp.md +599 -0
  331. package/expertise/infrastructure/cybersecurity.md +816 -0
  332. package/expertise/infrastructure/database-mongodb.md +447 -0
  333. package/expertise/infrastructure/database-postgres.md +400 -0
  334. package/expertise/infrastructure/devops-cicd.md +787 -0
  335. package/expertise/infrastructure/index.md +27 -0
  336. package/expertise/performance/PROGRESS.md +50 -0
  337. package/expertise/performance/backend/api-latency.md +1204 -0
  338. package/expertise/performance/backend/background-jobs.md +506 -0
  339. package/expertise/performance/backend/connection-pooling.md +1209 -0
  340. package/expertise/performance/backend/database-query-optimization.md +515 -0
  341. package/expertise/performance/backend/index.md +23 -0
  342. package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
  343. package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
  344. package/expertise/performance/foundations/caching-strategies.md +489 -0
  345. package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
  346. package/expertise/performance/foundations/index.md +24 -0
  347. package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
  348. package/expertise/performance/foundations/memory-management.md +964 -0
  349. package/expertise/performance/foundations/performance-budgets.md +1314 -0
  350. package/expertise/performance/index.md +31 -0
  351. package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
  352. package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
  353. package/expertise/performance/infrastructure/index.md +22 -0
  354. package/expertise/performance/infrastructure/load-balancing.md +1081 -0
  355. package/expertise/performance/infrastructure/observability.md +1079 -0
  356. package/expertise/performance/mobile/index.md +23 -0
  357. package/expertise/performance/mobile/mobile-animations.md +544 -0
  358. package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
  359. package/expertise/performance/mobile/mobile-network.md +452 -0
  360. package/expertise/performance/mobile/mobile-rendering.md +599 -0
  361. package/expertise/performance/mobile/mobile-startup-time.md +505 -0
  362. package/expertise/performance/platform-specific/flutter-performance.md +647 -0
  363. package/expertise/performance/platform-specific/index.md +22 -0
  364. package/expertise/performance/platform-specific/node-performance.md +1307 -0
  365. package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
  366. package/expertise/performance/platform-specific/react-performance.md +1403 -0
  367. package/expertise/performance/web/bundle-optimization.md +1239 -0
  368. package/expertise/performance/web/image-and-media.md +636 -0
  369. package/expertise/performance/web/index.md +24 -0
  370. package/expertise/performance/web/network-optimization.md +1133 -0
  371. package/expertise/performance/web/rendering-performance.md +1098 -0
  372. package/expertise/performance/web/ssr-and-hydration.md +918 -0
  373. package/expertise/performance/web/web-vitals.md +1374 -0
  374. package/expertise/quality/accessibility.md +985 -0
  375. package/expertise/quality/evidence-based-verification.md +499 -0
  376. package/expertise/quality/index.md +24 -0
  377. package/expertise/quality/ml-model-audit.md +614 -0
  378. package/expertise/quality/performance.md +600 -0
  379. package/expertise/quality/testing-api.md +891 -0
  380. package/expertise/quality/testing-mobile.md +496 -0
  381. package/expertise/quality/testing-web.md +849 -0
  382. package/expertise/security/PROGRESS.md +54 -0
  383. package/expertise/security/agentic-identity.md +540 -0
  384. package/expertise/security/compliance-frameworks.md +601 -0
  385. package/expertise/security/data/data-encryption.md +364 -0
  386. package/expertise/security/data/data-privacy-gdpr.md +692 -0
  387. package/expertise/security/data/database-security.md +1171 -0
  388. package/expertise/security/data/index.md +22 -0
  389. package/expertise/security/data/pii-handling.md +531 -0
  390. package/expertise/security/foundations/authentication.md +1041 -0
  391. package/expertise/security/foundations/authorization.md +603 -0
  392. package/expertise/security/foundations/cryptography.md +1001 -0
  393. package/expertise/security/foundations/index.md +25 -0
  394. package/expertise/security/foundations/owasp-top-10.md +1354 -0
  395. package/expertise/security/foundations/secrets-management.md +1217 -0
  396. package/expertise/security/foundations/secure-sdlc.md +700 -0
  397. package/expertise/security/foundations/supply-chain-security.md +698 -0
  398. package/expertise/security/index.md +31 -0
  399. package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
  400. package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
  401. package/expertise/security/infrastructure/container-security.md +721 -0
  402. package/expertise/security/infrastructure/incident-response.md +1295 -0
  403. package/expertise/security/infrastructure/index.md +24 -0
  404. package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
  405. package/expertise/security/infrastructure/network-security.md +1337 -0
  406. package/expertise/security/mobile/index.md +23 -0
  407. package/expertise/security/mobile/mobile-android-security.md +1218 -0
  408. package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
  409. package/expertise/security/mobile/mobile-data-storage.md +1265 -0
  410. package/expertise/security/mobile/mobile-ios-security.md +1401 -0
  411. package/expertise/security/mobile/mobile-network-security.md +1520 -0
  412. package/expertise/security/smart-contract-security.md +594 -0
  413. package/expertise/security/testing/index.md +22 -0
  414. package/expertise/security/testing/penetration-testing.md +1258 -0
  415. package/expertise/security/testing/security-code-review.md +1765 -0
  416. package/expertise/security/testing/threat-modeling.md +1074 -0
  417. package/expertise/security/testing/vulnerability-scanning.md +1062 -0
  418. package/expertise/security/web/api-security.md +586 -0
  419. package/expertise/security/web/cors-and-headers.md +433 -0
  420. package/expertise/security/web/csrf.md +562 -0
  421. package/expertise/security/web/file-upload.md +1477 -0
  422. package/expertise/security/web/index.md +25 -0
  423. package/expertise/security/web/injection.md +1375 -0
  424. package/expertise/security/web/session-management.md +1101 -0
  425. package/expertise/security/web/xss.md +1158 -0
  426. package/exports/README.md +17 -0
  427. package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
  428. package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
  429. package/exports/hosts/claude/.claude/agents/designer.md +55 -0
  430. package/exports/hosts/claude/.claude/agents/executor.md +55 -0
  431. package/exports/hosts/claude/.claude/agents/learner.md +51 -0
  432. package/exports/hosts/claude/.claude/agents/planner.md +53 -0
  433. package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
  434. package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
  435. package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
  436. package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
  437. package/exports/hosts/claude/.claude/commands/author.md +42 -0
  438. package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
  439. package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
  440. package/exports/hosts/claude/.claude/commands/design.md +44 -0
  441. package/exports/hosts/claude/.claude/commands/discover.md +37 -0
  442. package/exports/hosts/claude/.claude/commands/execute.md +48 -0
  443. package/exports/hosts/claude/.claude/commands/learn.md +38 -0
  444. package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
  445. package/exports/hosts/claude/.claude/commands/plan.md +39 -0
  446. package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
  447. package/exports/hosts/claude/.claude/commands/review.md +40 -0
  448. package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
  449. package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
  450. package/exports/hosts/claude/.claude/commands/specify.md +38 -0
  451. package/exports/hosts/claude/.claude/commands/verify.md +37 -0
  452. package/exports/hosts/claude/.claude/settings.json +34 -0
  453. package/exports/hosts/claude/CLAUDE.md +19 -0
  454. package/exports/hosts/claude/export.manifest.json +38 -0
  455. package/exports/hosts/claude/host-package.json +67 -0
  456. package/exports/hosts/codex/AGENTS.md +19 -0
  457. package/exports/hosts/codex/export.manifest.json +38 -0
  458. package/exports/hosts/codex/host-package.json +41 -0
  459. package/exports/hosts/cursor/.cursor/hooks.json +16 -0
  460. package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
  461. package/exports/hosts/cursor/export.manifest.json +38 -0
  462. package/exports/hosts/cursor/host-package.json +42 -0
  463. package/exports/hosts/gemini/GEMINI.md +19 -0
  464. package/exports/hosts/gemini/export.manifest.json +38 -0
  465. package/exports/hosts/gemini/host-package.json +41 -0
  466. package/hooks/README.md +18 -0
  467. package/hooks/definitions/loop_cap_guard.yaml +21 -0
  468. package/hooks/definitions/post_tool_capture.yaml +24 -0
  469. package/hooks/definitions/pre_compact_summary.yaml +19 -0
  470. package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
  471. package/hooks/definitions/protected_path_write_guard.yaml +19 -0
  472. package/hooks/definitions/session_start.yaml +19 -0
  473. package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
  474. package/hooks/loop-cap-guard +17 -0
  475. package/hooks/post-tool-lint +36 -0
  476. package/hooks/protected-path-write-guard +17 -0
  477. package/hooks/session-start +41 -0
  478. package/llms-full.txt +2355 -0
  479. package/llms.txt +43 -0
  480. package/package.json +79 -0
  481. package/roles/README.md +20 -0
  482. package/roles/clarifier.md +42 -0
  483. package/roles/content-author.md +63 -0
  484. package/roles/designer.md +55 -0
  485. package/roles/executor.md +55 -0
  486. package/roles/learner.md +51 -0
  487. package/roles/planner.md +53 -0
  488. package/roles/researcher.md +43 -0
  489. package/roles/reviewer.md +54 -0
  490. package/roles/specifier.md +47 -0
  491. package/roles/verifier.md +71 -0
  492. package/schemas/README.md +24 -0
  493. package/schemas/accepted-learning.schema.json +20 -0
  494. package/schemas/author-artifact.schema.json +156 -0
  495. package/schemas/clarification.schema.json +19 -0
  496. package/schemas/design-artifact.schema.json +80 -0
  497. package/schemas/docs-claim.schema.json +18 -0
  498. package/schemas/export-manifest.schema.json +20 -0
  499. package/schemas/hook.schema.json +67 -0
  500. package/schemas/host-export-package.schema.json +18 -0
  501. package/schemas/implementation-plan.schema.json +19 -0
  502. package/schemas/proposed-learning.schema.json +19 -0
  503. package/schemas/research.schema.json +18 -0
  504. package/schemas/review.schema.json +29 -0
  505. package/schemas/run-manifest.schema.json +18 -0
  506. package/schemas/spec-challenge.schema.json +18 -0
  507. package/schemas/spec.schema.json +20 -0
  508. package/schemas/usage.schema.json +102 -0
  509. package/schemas/verification-proof.schema.json +29 -0
  510. package/schemas/wazir-manifest.schema.json +173 -0
  511. package/skills/README.md +40 -0
  512. package/skills/brainstorming/SKILL.md +77 -0
  513. package/skills/debugging/SKILL.md +50 -0
  514. package/skills/design/SKILL.md +61 -0
  515. package/skills/dispatching-parallel-agents/SKILL.md +128 -0
  516. package/skills/executing-plans/SKILL.md +70 -0
  517. package/skills/finishing-a-development-branch/SKILL.md +169 -0
  518. package/skills/humanize/SKILL.md +123 -0
  519. package/skills/init-pipeline/SKILL.md +124 -0
  520. package/skills/prepare-next/SKILL.md +20 -0
  521. package/skills/receiving-code-review/SKILL.md +123 -0
  522. package/skills/requesting-code-review/SKILL.md +105 -0
  523. package/skills/requesting-code-review/code-reviewer.md +108 -0
  524. package/skills/run-audit/SKILL.md +197 -0
  525. package/skills/scan-project/SKILL.md +41 -0
  526. package/skills/self-audit/SKILL.md +153 -0
  527. package/skills/subagent-driven-development/SKILL.md +154 -0
  528. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
  529. package/skills/subagent-driven-development/implementer-prompt.md +102 -0
  530. package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
  531. package/skills/tdd/SKILL.md +23 -0
  532. package/skills/using-git-worktrees/SKILL.md +163 -0
  533. package/skills/using-skills/SKILL.md +95 -0
  534. package/skills/verification/SKILL.md +22 -0
  535. package/skills/wazir/SKILL.md +463 -0
  536. package/skills/writing-plans/SKILL.md +30 -0
  537. package/skills/writing-skills/SKILL.md +157 -0
  538. package/skills/writing-skills/anthropic-best-practices.md +122 -0
  539. package/skills/writing-skills/persuasion-principles.md +50 -0
  540. package/templates/README.md +20 -0
  541. package/templates/artifacts/README.md +10 -0
  542. package/templates/artifacts/accepted-learning.md +19 -0
  543. package/templates/artifacts/accepted-learning.template.json +12 -0
  544. package/templates/artifacts/author.md +74 -0
  545. package/templates/artifacts/author.template.json +19 -0
  546. package/templates/artifacts/clarification.md +21 -0
  547. package/templates/artifacts/clarification.template.json +12 -0
  548. package/templates/artifacts/execute-notes.md +19 -0
  549. package/templates/artifacts/implementation-plan.md +21 -0
  550. package/templates/artifacts/implementation-plan.template.json +11 -0
  551. package/templates/artifacts/learning-proposal.md +19 -0
  552. package/templates/artifacts/next-run-handoff.md +21 -0
  553. package/templates/artifacts/plan-review.md +19 -0
  554. package/templates/artifacts/proposed-learning.template.json +12 -0
  555. package/templates/artifacts/research.md +21 -0
  556. package/templates/artifacts/research.template.json +12 -0
  557. package/templates/artifacts/review-findings.md +19 -0
  558. package/templates/artifacts/review.template.json +11 -0
  559. package/templates/artifacts/run-manifest.template.json +8 -0
  560. package/templates/artifacts/spec-challenge.md +19 -0
  561. package/templates/artifacts/spec-challenge.template.json +11 -0
  562. package/templates/artifacts/spec.md +21 -0
  563. package/templates/artifacts/spec.template.json +12 -0
  564. package/templates/artifacts/verification-proof.md +19 -0
  565. package/templates/artifacts/verification-proof.template.json +11 -0
  566. package/templates/examples/accepted-learning.example.json +14 -0
  567. package/templates/examples/author.example.json +152 -0
  568. package/templates/examples/clarification.example.json +15 -0
  569. package/templates/examples/docs-claim.example.json +8 -0
  570. package/templates/examples/export-manifest.example.json +7 -0
  571. package/templates/examples/host-export-package.example.json +11 -0
  572. package/templates/examples/implementation-plan.example.json +17 -0
  573. package/templates/examples/proposed-learning.example.json +13 -0
  574. package/templates/examples/research.example.json +15 -0
  575. package/templates/examples/research.example.md +6 -0
  576. package/templates/examples/review.example.json +17 -0
  577. package/templates/examples/run-manifest.example.json +9 -0
  578. package/templates/examples/spec-challenge.example.json +14 -0
  579. package/templates/examples/spec.example.json +21 -0
  580. package/templates/examples/verification-proof.example.json +21 -0
  581. package/templates/examples/wazir-manifest.example.yaml +65 -0
  582. package/templates/task-definition-schema.md +99 -0
  583. package/tooling/README.md +20 -0
  584. package/tooling/src/adapters/context-mode.js +50 -0
  585. package/tooling/src/capture/command.js +376 -0
  586. package/tooling/src/capture/store.js +99 -0
  587. package/tooling/src/capture/usage.js +270 -0
  588. package/tooling/src/checks/branches.js +50 -0
  589. package/tooling/src/checks/brand-truth.js +110 -0
  590. package/tooling/src/checks/changelog.js +231 -0
  591. package/tooling/src/checks/command-registry.js +36 -0
  592. package/tooling/src/checks/commits.js +102 -0
  593. package/tooling/src/checks/docs-drift.js +103 -0
  594. package/tooling/src/checks/docs-truth.js +201 -0
  595. package/tooling/src/checks/runtime-surface.js +156 -0
  596. package/tooling/src/cli.js +116 -0
  597. package/tooling/src/command-options.js +56 -0
  598. package/tooling/src/commands/validate.js +320 -0
  599. package/tooling/src/doctor/command.js +91 -0
  600. package/tooling/src/export/command.js +77 -0
  601. package/tooling/src/export/compiler.js +498 -0
  602. package/tooling/src/guards/loop-cap-guard.js +52 -0
  603. package/tooling/src/guards/protected-path-write-guard.js +67 -0
  604. package/tooling/src/index/command.js +152 -0
  605. package/tooling/src/index/storage.js +1061 -0
  606. package/tooling/src/index/summarizers.js +261 -0
  607. package/tooling/src/loaders.js +18 -0
  608. package/tooling/src/project-root.js +22 -0
  609. package/tooling/src/recall/command.js +225 -0
  610. package/tooling/src/schema-validator.js +30 -0
  611. package/tooling/src/state-root.js +40 -0
  612. package/tooling/src/status/command.js +71 -0
  613. package/wazir.manifest.yaml +135 -0
  614. package/workflows/README.md +19 -0
  615. package/workflows/author.md +42 -0
  616. package/workflows/clarify.md +38 -0
  617. package/workflows/design-review.md +46 -0
  618. package/workflows/design.md +44 -0
  619. package/workflows/discover.md +37 -0
  620. package/workflows/execute.md +48 -0
  621. package/workflows/learn.md +38 -0
  622. package/workflows/plan-review.md +42 -0
  623. package/workflows/plan.md +39 -0
  624. package/workflows/prepare-next.md +37 -0
  625. package/workflows/review.md +40 -0
  626. package/workflows/run-audit.md +41 -0
  627. package/workflows/spec-challenge.md +41 -0
  628. package/workflows/specify.md +38 -0
  629. package/workflows/verify.md +37 -0
@@ -0,0 +1,787 @@
1
+ # DevOps & CI/CD -- Expertise Module
2
+
3
+ > A DevOps/CI-CD specialist designs, builds, and maintains the automated pipelines, infrastructure,
4
+ > and operational practices that enable teams to deliver software reliably, securely, and at speed.
5
+ > Scope spans source control workflows through production observability, including IaC, container
6
+ > orchestration, deployment strategies, security scanning, and incident response.
7
+
8
+ ---
9
+
10
+ ## Core Patterns & Conventions
11
+
12
+ ### CI/CD Pipeline Design
13
+
14
+ **Canonical Stage Progression:**
15
+
16
+ ```
17
+ Source -> Build -> Unit Test -> SAST/Lint -> Integration Test -> Artifact Publish
18
+ -> Deploy Staging -> E2E/Smoke -> Security Scan -> Deploy Production -> Post-Deploy Verify
19
+ ```
20
+
21
+ **Key Principles:**
22
+
23
+ - **Pipeline as Code**: Store pipeline definitions in version control alongside application code.
24
+ - **Fail fast**: Place cheap checks (linting, unit tests, SAST) early; expensive checks later.
25
+ - **Parallel execution**: Run independent jobs (static analysis, unit tests, security scans) concurrently.
26
+ - **Quality gates**: Block promotion if coverage drops, vulnerabilities are found, or thresholds breach.
27
+ - **Immutable artifacts**: Build once, promote the same binary through environments.
28
+ - **Environment parity**: Staging must mirror production (same base images, resource limits).
29
+
30
+ **Environment Promotion:** `dev -> staging -> canary (5%) -> production (full rollout)` -- each promotion requires a gate (automated tests, approval, or metric-based analysis).
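The gated promotion chain above can be sketched as a small function. This is a minimal illustration, not part of any CI platform: the stage names come from the text, while `next_stage` and the shape of the `gates` mapping are hypothetical. It assumes each gate is a callable returning a boolean and fails closed when a gate is missing.

```python
from typing import Callable, Dict, Optional

# Promotion order taken from the text above.
STAGES = ["dev", "staging", "canary", "production"]


def next_stage(current: str, gates: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Return the stage to promote to, or None if blocked or already at the end.

    `gates` maps a target stage to a hypothetical check (automated tests,
    manual approval, or metric analysis). A missing gate blocks promotion.
    """
    index = STAGES.index(current)
    if index == len(STAGES) - 1:
        return None  # production is the final stage
    target = STAGES[index + 1]
    gate = gates.get(target)
    if gate is None or not gate():
        return None  # fail closed: no gate or a failing gate blocks promotion
    return target
```

Failing closed is a design choice: an environment without a configured gate is treated as not promotable, which matches the rule that every promotion requires a gate.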
31
+
32
+ ### Infrastructure as Code (IaC)
33
+
34
+ **Tool Landscape (early 2026):**
35
+
36
+ | Tool | Language | Multi-Cloud | License | State Management |
37
+ |------|----------|-------------|---------|------------------|
38
+ | Terraform 1.10+ | HCL | Yes | BSL 1.1 (source-available, not open source) | Remote state (S3, TFC) |
39
+ | OpenTofu 1.9+ | HCL | Yes | MPL 2.0 (open source) | Same as Terraform |
40
+ | Pulumi 3.x | Python/TS/Go/C# | Yes | Apache 2.0 | Pulumi Cloud or self-managed |
41
+ | AWS CDK 2.x | TS/Python/Java/C# | AWS only | Apache 2.0 | CloudFormation |
42
+
43
+ **Critical note**: Terraform Open Source under BSL will be discontinued after July 2025. OpenTofu (CNCF sandbox) is the drop-in open-source replacement. CDKTF was deprecated in December 2025.
44
+
45
+ **Best Practices:** Pin provider/module versions explicitly. Store state remotely with locking. Separate state per environment and per component. Run `plan`/`preview` in CI before any apply. Tag all resources with `team`, `env`, `cost-center`, `managed-by`.
46
+
47
+ ### Container Orchestration
48
+
49
+ **Kubernetes** remains the standard for production orchestration. Key practices: use namespaces for isolation; define resource requests AND limits; use `PodDisruptionBudget` for availability during drains; prefer `Deployment` (stateless) or `StatefulSet` (stateful); use HPA with custom metrics; deploy via GitOps.
50
+
51
+ **Docker Compose** suits local development. **ECS Fargate** is a valid simpler alternative for AWS-only workloads not needing K8s ecosystem tooling.
52
+
53
+ ### GitOps Workflows
54
+
55
+ GitOps treats Git as the single source of truth. An agent inside the cluster continuously reconciles state with Git.
56
+
57
+ - **ArgoCD**: Rich web UI, multi-tenancy, RBAC, SSO. Stronger ecosystem and enterprise backing. Recommended for most new projects. CNCF graduated.
58
+ - **Flux**: Kubernetes-native (CRDs), modular, CLI-driven, lightweight. CNCF graduated. Weaveworks shut down in 2024; ArgoCD has stronger momentum.
59
+
60
+ **Best Practices:** Separate app code repos from GitOps config repos. Use Kustomize overlays or Helm values per environment. Enable drift detection. Require PRs for all changes. Use sealed secrets or external secret operators.
61
+
62
+ ### Configuration Management
63
+
64
+ - **Ansible** (agentless, SSH-based): Best for VM provisioning and OS configuration.
65
+ - **Chef/Puppet**: Legacy environments only. Prefer Ansible for new projects.
66
+
67
+ ### Branching Strategies
68
+
69
+ | Strategy | CI/CD Implications | Best For |
70
+ |----------|-------------------|----------|
71
+ | **Trunk-based** | CI on every commit to main; short-lived branches (<1 day) | Continuous deployment |
72
+ | **GitHub Flow** | CI on PR branches; CD triggers on merge to main | Most SaaS teams |
73
+ | **GitFlow** | CI on feature/develop/release branches; complex release trains | Versioned/scheduled releases |
74
+
75
+ Trunk-based development with feature flags is the recommended default for continuous deployment.
76
+
77
+ ### Artifact Management
78
+
79
+ Use a dedicated registry (GHCR, ECR, Artifact Registry, Artifactory). Tag images with Git SHA, not `latest`. Implement retention policies. Sign artifacts with Sigstore/cosign.
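A minimal sketch of the "tag with Git SHA, not `latest`" rule, assuming a full 40-character SHA-1 commit hash. The helper name `image_ref` is hypothetical; the repository is lowercased because container image references must be lowercase.

```python
import re


def image_ref(registry: str, repository: str, git_sha: str) -> str:
    """Build an immutable image reference from a full Git commit SHA."""
    if not re.fullmatch(r"[0-9a-f]{40}", git_sha):
        # Rejects mutable tags such as "latest" as well as abbreviated SHAs.
        raise ValueError(f"expected a 40-character commit SHA, got {git_sha!r}")
    return f"{registry}/{repository.lower()}:{git_sha}"
```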
80
+
81
+ ### Secret Management
82
+
83
+ | Tool | Best For |
84
+ |------|----------|
85
+ | **HashiCorp Vault** | Dynamic secrets, PKI, multi-cloud |
86
+ | **AWS Secrets Manager** | AWS-native workloads, automatic rotation |
87
+ | **SOPS** (CNCF sandbox, formerly Mozilla) | Encrypting secrets in Git (KMS, age, or PGP backend) |
88
+ | **External Secrets Operator** | Syncing cloud secrets into K8s Secrets |
89
+
90
+ Never store secrets in code or CI/CD logs. Rotate static credentials every 90 days maximum. Use short-lived, dynamically generated credentials wherever possible (Vault dynamic secrets, IAM Roles for Service Accounts, GCP Workload Identity).
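The 90-day ceiling on static credentials is easy to enforce with a scheduled check. The sketch below assumes the credential's creation timestamp is available from the secret store; `needs_rotation` is a hypothetical helper, not an API of Vault or AWS Secrets Manager.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=90)  # rotation ceiling stated in the text


def needs_rotation(created_at: datetime, now: datetime) -> bool:
    """True when a static credential has reached the 90-day maximum age."""
    return now - created_at >= MAX_AGE
```

Passing `now` explicitly (for example `datetime.now(timezone.utc)`) keeps the check deterministic in tests.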
91
+
92
+ ---
93
+
94
+ ## Anti-Patterns & Pitfalls
95
+
96
+ ### 1. Creating a Separate "DevOps Team"
97
+ **Why**: Creates another silo, contradicting DevOps's goal of shared responsibility. Teams should own their own pipelines and infrastructure.
98
+
99
+ ### 2. Tool-First, Culture-Last
100
+ **Why**: Adopting K8s/Docker/Jenkins without changing collaboration patterns delivers zero value. Tools amplify culture; they do not replace it.
101
+
102
+ ### 3. Manual Deployments to Production
103
+ **Why**: Manual steps invite human error and are neither repeatable nor auditable. Every deployment must be automated through the pipeline.
104
+
105
+ ### 4. No Rollback Strategy
106
+ **Why**: A failed deployment without a tested rollback path becomes an incident. Always have one-click rollback.
107
+
108
+ ### 5. Snowflake Servers / Configuration Drift
109
+ **Why**: Manually configured servers diverge and cannot be reproduced. IaC + immutable infrastructure eliminates this.
110
+
111
+ ### 6. Secrets in Source Control
112
+ **Why**: Secrets persist in Git history even after deletion. Bots actively scan repos for leaked credentials.
113
+
114
+ ### 7. Monolithic Pipelines (No Parallelism)
115
+ **Why**: Sequential execution turns 5-minute pipelines into 45-minute pipelines. Developers batch changes, reducing feedback speed.
116
+
117
+ ### 8. Skipping Staging
118
+ **Why**: Without production-parity staging, bugs from network policies, resource limits, and DNS surface only in production.
119
+
120
+ ### 9. Over-Automation Without Process Understanding
121
+ **Why**: Automating a broken process makes it break faster. Optimize the process first, then automate.
122
+
123
+ ### 10. Ignoring Pipeline as Code Versioning
124
+ **Why**: Editing pipelines via web UI means no audit trail, no code review, no rollback capability.
125
+
126
+ ### 11. Alert Fatigue
127
+ **Why**: Hundreds of noisy alerts train teams to ignore all alerts. Every alert must be actionable.
128
+
129
+ ### 12. "Lift and Shift" to Kubernetes
130
+ **Why**: Moving monoliths into containers without architectural changes adds complexity without benefits.
131
+
132
+ ### 13. Hardcoded Environment Configuration
133
+ **Why**: Config baked into images requires rebuilding per environment, breaking immutable artifact principles.
134
+
135
+ ### 14. No Observability Until Incidents
136
+ **Why**: Monitoring must be in place from day one. Without baseline metrics, there is nothing to compare against during an incident.
137
+
138
+ ### 15. Premature Microservices
139
+ **Why**: Adds network complexity and operational overhead. Start with a structured monolith; extract when scale demands it.
140
+
141
+ ---
142
+
143
+ ## Testing Strategy
144
+
145
+ ### Pipeline Testing
146
+
147
+ - **Linting**: Dockerfiles (`hadolint`), Helm (`helm lint`), Terraform (`tflint`), YAML (`yamllint`).
148
+ - **Unit tests**: Coverage thresholds (e.g., 80% min). Fail pipeline on coverage regression.
149
+ - **Integration tests**: Containerized dependencies via Docker Compose or Testcontainers.
150
+ - **E2E tests**: Deployed staging environment. Playwright/Cypress. Test critical paths only.
151
+
152
+ ### Infrastructure Testing
153
+
154
+ | Tool | Purpose |
155
+ |------|---------|
156
+ | **Terratest** | Integration tests for Terraform modules (Go) |
157
+ | **Checkov** | Static analysis for IaC security misconfigurations |
158
+ | **tfsec / Trivy IaC** | Security scanning for Terraform, CloudFormation, K8s manifests |
159
+ | **OPA/Conftest** | Policy testing against structured data (JSON/YAML/HCL) |
160
+
161
+ ### Deployment Testing
162
+
163
+ - **Canary analysis**: 5-10% traffic to new version; Argo Rollouts or Flagger with Prometheus metrics (error rate, p99 latency) for auto-promote/rollback.
164
+ - **Blue/green validation**: Smoke tests against green before switching the load balancer.
165
+ - **Smoke tests**: HTTP checks on `/health`, `/readiness`, key API routes post-deployment.
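The canary analysis described above reduces to comparing canary metrics against thresholds and a baseline. This sketch is illustrative only: `CanaryMetrics`, `canary_verdict`, and the default thresholds (1% error rate, 1.2x baseline p99) are assumptions, not values used by Argo Rollouts or Flagger.

```python
from dataclasses import dataclass


@dataclass
class CanaryMetrics:
    error_rate: float      # fraction of failed requests, 0.0-1.0
    p99_latency_ms: float  # 99th percentile latency in milliseconds


def canary_verdict(canary: CanaryMetrics, baseline: CanaryMetrics,
                   max_error_rate: float = 0.01,
                   max_latency_ratio: float = 1.2) -> str:
    """Return "promote" or "rollback" for one analysis interval."""
    if canary.error_rate > max_error_rate:
        return "rollback"  # absolute error-rate ceiling breached
    if canary.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio:
        return "rollback"  # canary is noticeably slower than the stable version
    return "promote"
```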
166
+
167
+ ### Chaos Engineering
168
+
169
+ - **Tools**: LitmusChaos (CNCF incubating, K8s-native), Gremlin (commercial), Chaos Mesh (CNCF incubating).
170
+ - Start small (kill a pod, add latency). Define steady state first. Limit blast radius.
171
+ - Integrate chaos experiments into staging release pipelines for resilience validation.
172
+
173
+ ---
174
+
175
+ ## Performance Considerations
176
+
177
+ ### Pipeline Speed Optimization
178
+
179
+ - **Dependency caching**: Cache `node_modules`, `.m2/repository`, pip wheels between runs. GHA: `actions/cache@v4`. GitLab: `cache:` directive.
180
+ - **Docker layer caching**: BuildKit with `--cache-from`/`--cache-to` for registry caching. GHA: `cache-from: type=gha`.
181
+ - **Remote build caching**: Gradle remote cache, Bazel remote execution, Nx Cloud, Turborepo.
182
+ - **Parallelism**: Split test suites (`jest --shard`, `pytest-split`). Fan-out/fan-in pattern. Matrix builds for multi-platform testing.
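Test-suite splitting can be illustrated with a deterministic round-robin shard function. This is a simplified sketch and does not reproduce the exact algorithm used by `jest --shard` or `pytest-split` (the latter balances by recorded duration); the function name `shard` is hypothetical.

```python
from typing import List


def shard(tests: List[str], index: int, total: int) -> List[str]:
    """Return the tests assigned to shard `index` (1-based) out of `total`."""
    if not 1 <= index <= total:
        raise ValueError("index must be between 1 and total")
    ordered = sorted(tests)  # stable order so every runner computes the same split
    return ordered[index - 1::total]
```

Each parallel job calls `shard` with its own index; because the split is derived from a sorted list, no coordination between runners is needed.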
183
+
184
+ ### Docker Layer Caching
185
+
186
+ ```dockerfile
187
+ # BAD: Invalidates cache on any file change
188
+ COPY . /app
189
+ RUN npm install
190
+
191
+ # GOOD: Dependency manifest first, then source
192
+ COPY package.json package-lock.json /app/
193
+ RUN npm ci --omit=dev
194
+ COPY . /app
195
+ ```
196
+
197
+ Order instructions least-to-most frequently changed. Use `.dockerignore`. Use cache mounts: `RUN --mount=type=cache,target=/root/.npm npm ci`. Separate build stage from runtime via multi-stage builds.
198
+
199
+ ### Monorepo CI Optimization
200
+
201
+ - **Affected detection**: Nx (`nx affected`), Turborepo, Bazel, or `git diff` to build only what changed.
202
+ - **Task graph**: Nx/Bazel model inter-package dependencies for correct order + max parallelism.
203
+ - **Impact**: 60-80% CI time reduction with selective execution + remote caching.
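Affected detection amounts to a reverse traversal of the package dependency graph. The sketch below assumes the changed packages have already been derived from `git diff` and that `depends_on` maps each package to its direct dependencies; the names are hypothetical and this is not how Nx or Bazel implement it internally.

```python
from typing import Dict, Set


def affected_packages(changed: Set[str], depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Return changed packages plus everything that transitively depends on them."""
    # Invert the graph: package -> packages that depend on it.
    dependents: Dict[str, Set[str]] = {}
    for package, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(package)

    affected: Set[str] = set()
    stack = list(changed)
    while stack:
        package = stack.pop()
        if package in affected:
            continue  # already visited
        affected.add(package)
        stack.extend(dependents.get(package, ()))
    return affected
```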
204
+
205
+ ---
206
+
207
+ ## Security Considerations
208
+
209
+ ### Supply Chain Security
210
+
211
+ **SBOM**: Generate in SPDX 3.0 or CycloneDX for every release. Tools: Syft, Trivy, `cdxgen`. Mandated by U.S. EO 14028 for federal suppliers; CISA updated minimum elements in 2025.
212
+
213
+ **SLSA 1.0** (Build track, levels L0-L3):
214
+ - **L1**: Provenance exists and documents how the artifact was built. **L2**: Hosted build platform + signed provenance (achievable in weeks with GHA OIDC + Sigstore). **L3**: Hardened build platform, non-falsifiable provenance. The **L4** tier from the v0.1 draft (two-person review, hermetic builds) was dropped from the 1.0 specification.
215
+
216
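+ **Sigstore**: Keyless signing via `cosign` with OIDC identity. Sign: `cosign sign --yes <image>@<digest>`. Verify: `cosign verify --certificate-identity=... --certificate-oidc-issuer=... <image>` (cosign 2.x requires both the identity and the issuer for keyless verification). Rekor transparency log for tamper-evident audit.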
217
+
218
+ ### Container Scanning
219
+
220
+ | Tool | Type | Strengths |
221
+ |------|------|-----------|
222
+ | **Trivy** | OSS, Apache 2.0 | All-in-one: containers, IaC, secrets, SBOM, licenses |
223
+ | **Snyk Container** | Commercial ($25/dev/mo) | Actionable remediation, auto-fix PRs |
224
+ | **Grype** | OSS | Fast, pairs with Syft for SBOM-based scanning |
225
+
226
+ Scan in CI (block on HIGH/CRITICAL), in registries (admission control), and at runtime. Trivy recommended for cost-sensitive teams.
227
+
228
+ ### Secret Scanning and Rotation
229
+
230
+ Enable GitHub secret scanning + push protection. Use `gitleaks` or `trufflehog` as pre-commit hooks. Rotate automatically (AWS Secrets Manager + Lambda). Use OIDC-based auth in CI/CD to eliminate static credentials (GHA OIDC with AWS/GCP/Azure).
231
+
232
+ ### RBAC for CI/CD
233
+
234
+ Least privilege for service accounts and runner tokens. Short-lived credentials scoped to repos/environments. GHA: environment protection rules, required reviewers, deployment branches. Separate build (read-only) from deploy (write) permissions.
235
+
236
+ ### Compliance as Code
237
+
238
+ - **OPA**: CNCF graduated. Rego-based policies. Steeper learning curve. Best for cross-cutting concerns (API auth, Terraform plan validation, SOC2 mapping).
239
+ - **Kyverno**: CNCF incubating. YAML-based, K8s-native. Lower learning curve. Built-in mutation. Best for K8s policies (pod security, image registry restrictions).
240
+
241
+ ---
242
+
243
+ ## Integration Patterns
244
+
245
+ ### CI/CD Platform Patterns
246
+
247
+ **GitHub Actions:** Reusable workflows as templates. Pin actions to commit SHAs in production. `secrets.inherit` for passing secrets. Limit: 50 workflows, 10 nested reusable per run (Feb 2026). `concurrency` groups to cancel redundant runs.
248
+
249
+ **GitLab CI:** `include:` for shared templates. `rules:changes:` for path-based triggering. `needs:` for DAG-based parallel execution.
250
+
251
+ **Jenkins:** Shared libraries for reuse. Declarative pipelines over scripted. K8s plugin for ephemeral agents. Market share declining in favor of GHA and GitLab CI.
252
+
253
+ ### Multi-Cloud Deployment
254
+
255
+ Use multi-cloud IaC (Terraform/OpenTofu, Pulumi). Abstract cloud details behind modules. Single CI/CD platform deploying to multiple clouds. Consistent tagging, monitoring, security across clouds.
256
+
257
+ ### Database Migration in CI/CD
258
+
259
+ - **Flyway**: Simple, sequential SQL migrations. Lightweight.
260
+ - **Liquibase**: Advanced governance, rollback, drift detection.
261
+ - **Atlas**: Modern, HCL-based.
262
+
263
+ Run migrations after build, before app deployment. Store scripts in VCS (`db/migrations/`). Design forward-compatible migrations. Separate migration credentials (elevated) from app credentials.
264
+
265
+ ### Feature Flags
266
+
267
+ - **LaunchDarkly**: Enterprise, FedRAMP/SOC2. 25% of Fortune 500.
268
+ - **Unleash**: Open-source, self-hostable.
269
+
270
+ Decouple deployment from release. Set expiration dates on temporary flags. Never reuse flag names (linked to 32% of production incidents). Audit and remove stale flags regularly.
271
+
272
+ ---
273
+
274
+ ## DevOps & Deployment
275
+
276
+ ### Deployment Strategies
277
+
278
+ | Strategy | Downtime | Risk | Resource Cost | Best For |
279
+ |----------|----------|------|---------------|----------|
280
+ | **Rolling** | Zero | Medium | 1x + surge | Stateless apps, K8s default |
281
+ | **Blue/Green** | Zero | Low | 2x | Zero-downtime with instant rollback |
282
+ | **Canary** | Zero | Very Low | 1x + small % | High-traffic, gradual validation |
283
+ | **A/B Testing** | Zero | Low | 1x + split | Feature validation with user segments |
284
+ | **Recreate** | Yes | High | 1x | Dev/test, stateful legacy |
285
+
286
+ **Tooling**: Argo Rollouts (canary/blue-green with Prometheus analysis), Flagger (Istio/Linkerd/Traefik traffic shifting).
287
+
288
+ ### Rollback Patterns
289
+
290
+ - **K8s**: `kubectl rollout undo deployment/<name>`.
291
+ - **GitOps**: Revert commit in config repo; ArgoCD/Flux reconciles automatically.
292
+ - **Blue/green**: Switch LB back to blue environment.
293
+ - **Database**: Backward-compatible migrations (expand-and-contract pattern).
294
+ - **Feature flags**: Disable the flag instantly without redeployment.
295
+
296
+ ### Observability Stack
297
+
298
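+ **The "LGTM" Stack (2026):** Loki (logs) + Grafana 11.x (dashboards) + Tempo/Jaeger (traces) + Mimir or Prometheus 3.x (metrics). The "M" in the acronym stands for Mimir, Grafana's long-term metrics store, which is commonly fed by Prometheus.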
299
+
300
+ **OpenTelemetry** is the unified instrumentation standard (48.5% adoption, 2025 survey). Vendor-neutral SDKs for metrics, traces, logs. Prometheus 3.x supports OTLP ingestion natively.
301
+
302
+ **Key practices:** RED metrics (Rate/Errors/Duration) for services; USE metrics (Utilization/Saturation/Errors) for infrastructure. Alert on symptoms (error rate, latency), not causes (CPU). Define SLOs and alert on error budget consumption.
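Alerting on error budget consumption starts from a simple calculation. A minimal sketch, assuming a request-based SLO measured over a fixed window; `error_budget_remaining` is a hypothetical helper rather than a Prometheus or Grafana function.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    if total == 0:
        return 1.0  # no traffic in the window, so nothing has been spent
    allowed_failures = (1.0 - slo_target) * total
    if allowed_failures == 0:
        # A 100% target leaves no budget: any failure overspends it.
        return 1.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed_failures
```

For a 99.9% SLO over one million requests, 1,000 failures are allowed, so 250 failures leave roughly three quarters of the budget.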
303
+
304
+ ### Incident Response Automation
305
+
306
+ - **PagerDuty / Opsgenie / incident.io**: Alert routing with escalation policies.
307
+ - **Runbook automation**: Pre-defined diagnostic/remediation workflows (PagerDuty Runbook Automation, Rundeck).
308
+ - **ChatOps**: Slack/Teams integration for status updates, escalation, timeline generation.
309
+ - **Post-incident**: Blameless retrospectives. Document timeline and action items.
310
+
311
+ ---
312
+
313
+ ## Decision Trees
314
+
315
+ ### Decision Tree 1: Which IaC Tool?
316
+
317
+ ```
318
+ START: Multi-cloud needed?
319
+ +-- NO (AWS only) --> Want CloudFormation safety nets?
320
+ | +-- YES --> AWS CDK 2.x
321
+ | +-- NO --> Terraform/OpenTofu or Pulumi
322
+ +-- YES --> Prefer declarative (HCL) or imperative (Python/TS/Go)?
323
+ +-- Declarative --> Need open-source license?
324
+ | +-- YES --> OpenTofu 1.9+ (MPL 2.0, CNCF)
325
+ | +-- NO --> Terraform 1.10+ (BSL 1.1, IBM/HashiCorp)
326
+ +-- Imperative --> Pulumi 3.x (native testing, IDE support, ~30% faster onboarding)
327
+ ```
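The same tree can be encoded as a function so that the recommendation is reviewable and testable. This is a direct transcription of the branches above; the function and parameter names are hypothetical.

```python
def choose_iac_tool(multi_cloud: bool, wants_cloudformation: bool = False,
                    declarative: bool = True, needs_open_source: bool = False) -> str:
    """Walk Decision Tree 1 and return the recommended tool label."""
    if not multi_cloud:
        # AWS-only branch
        return "AWS CDK 2.x" if wants_cloudformation else "Terraform/OpenTofu or Pulumi"
    if not declarative:
        return "Pulumi 3.x"
    return "OpenTofu 1.9+" if needs_open_source else "Terraform 1.10+"
```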
328
+
329
+ ### Decision Tree 2: Which Deployment Strategy?
330
+
331
+ ```
332
+ START: Can the app tolerate brief partial unavailability?
333
+ +-- YES --> Low-risk change?
334
+ | +-- YES --> Rolling (K8s default, simplest)
335
+ | +-- NO --> Blue/Green (instant rollback, 2x resources)
336
+ +-- NO --> Have metric-based automation (Prometheus, etc.)?
337
+ +-- YES --> Canary with Argo Rollouts/Flagger (auto-promote/rollback)
338
+ +-- NO --> Blue/Green with manual verification
339
+ Need user-segment targeting? --> A/B with feature flags
340
+ ```
341
+
342
+ ### Decision Tree 3: Kubernetes vs. Simpler Alternatives?
343
+
344
+ ```
345
+ START: How many services?
346
+ +-- 1-3 --> Need auto-scaling/self-healing/multi-region?
347
+ | +-- NO --> ECS Fargate / Cloud Run / App Runner
348
+ | +-- YES --> Consider managed K8s (evaluate overhead)
349
+ +-- 4-10 --> Have platform engineering capacity?
350
+ | +-- YES --> Managed K8s (EKS/GKE/AKS)
351
+ | +-- NO --> ECS Fargate / Cloud Run
352
+ +-- 10+ --> Managed K8s + platform team/IDP + ArgoCD GitOps
353
+ ```
354
+
355
+ ---
356
+
357
+ ## Code Examples
358
+
359
+ ### Example 1: GitHub Actions CI/CD Pipeline
360
+
361
+ ```yaml
362
+ name: CI/CD Pipeline
363
+ on:
364
+ push: { branches: [main] }
365
+ pull_request: { branches: [main] }
366
+ concurrency:
367
+ group: ${{ github.workflow }}-${{ github.ref }}
368
+ cancel-in-progress: true
369
+ permissions:
370
+ contents: read
371
+ id-token: write
372
+ packages: write
373
+
374
+ jobs:
375
+ lint-and-test:
376
+ runs-on: ubuntu-24.04
377
+ steps:
378
+ - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
379
+ - uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
380
+ with: { node-version: 22, cache: npm }
381
+ - run: npm ci && npm run lint && npm test -- --coverage
382
+
383
+ security-scan:
384
+ runs-on: ubuntu-24.04
385
+ steps:
386
+ - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
387
+ - uses: aquasecurity/trivy-action@18f2510ee396bbf400402947e7f3b01b8e110956 # v0.29.0
388
+ with: { scan-type: fs, severity: "CRITICAL,HIGH", exit-code: 1 }
389
+
390
+ build-and-push:
391
+ needs: [lint-and-test, security-scan]
392
+ if: github.ref == 'refs/heads/main'
393
+ runs-on: ubuntu-24.04
394
+ steps:
395
+ - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
396
+ - uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2
397
+ - uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772
398
+ with: { registry: ghcr.io, username: "${{ github.actor }}", password: "${{ secrets.GITHUB_TOKEN }}" }
399
+ - uses: docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1
400
+ with:
401
+ push: true
402
+ tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
403
+ cache-from: type=gha
404
+ cache-to: type=gha,mode=max
405
+ ```
406
+
407
+ ### Example 2: Production Dockerfile (Multi-Stage)
408
+
409
+ ```dockerfile
410
+ # syntax=docker/dockerfile:1.9
411
+ FROM node:22-alpine AS builder
412
+ WORKDIR /app
413
+ COPY package.json package-lock.json ./
414
+ RUN --mount=type=cache,target=/root/.npm npm ci
415
+ COPY tsconfig.json ./
416
+ COPY src/ ./src/
417
+ RUN npm run build && npm prune --omit=dev
418
+
419
+ FROM node:22-alpine AS runtime
420
+ RUN apk add --no-cache tini && adduser -u 1001 -D appuser
421
+ WORKDIR /app
422
+ COPY --from=builder --chown=appuser /app/dist ./dist
423
+ COPY --from=builder --chown=appuser /app/node_modules ./node_modules
424
+ COPY --from=builder --chown=appuser /app/package.json ./
425
+ USER appuser
426
+ EXPOSE 3000
427
+ HEALTHCHECK --interval=30s --timeout=3s CMD wget -qO- http://localhost:3000/health || exit 1
428
+ ENTRYPOINT ["tini", "--"]
429
+ CMD ["node", "dist/index.js"]
430
+ ```
431
+
432
+ ### Example 3: Terraform ECS Fargate Service (condensed)
433
+
434
+ ```hcl
435
+ # modules/ecs-service/main.tf
436
+ terraform {
437
+ required_version = ">= 1.9.0"
438
+ required_providers {
439
+ aws = { source = "hashicorp/aws", version = "~> 5.80" }
440
+ }
441
+ }
442
+
443
+ resource "aws_ecs_service" "this" {
444
+ name = var.service_name
445
+ cluster = var.cluster_arn
446
+ task_definition = aws_ecs_task_definition.this.arn
447
+ desired_count = var.desired_count
448
+ launch_type = "FARGATE"
449
+
450
+ network_configuration {
451
+ subnets = var.subnet_ids
452
+ security_groups = [aws_security_group.service.id]
453
+ assign_public_ip = false
454
+ }
455
+ deployment_circuit_breaker {
456
+ enable = true
457
+ rollback = true # Auto-rollback on deployment failure
458
+ }
459
+ tags = merge(var.tags, { ManagedBy = "terraform" })
460
+ }
461
+ ```
462
+
463
+ ### Example 4: ArgoCD Application + Argo Rollouts Canary
464
+
465
+ ```yaml
466
+ # ArgoCD Application
467
+ apiVersion: argoproj.io/v1alpha1
468
+ kind: Application
469
+ metadata:
470
+ name: my-service
471
+ namespace: argocd
472
+ spec:
473
+ project: default
474
+ source:
475
+ repoURL: https://github.com/org/gitops-config.git
476
+ targetRevision: main
477
+ path: services/my-service/overlays/production
478
+ destination:
479
+ server: https://kubernetes.default.svc
480
+ namespace: my-service
481
+ syncPolicy:
482
+ automated: { prune: true, selfHeal: true }
483
+ syncOptions: [CreateNamespace=true, ServerSideApply=true]
484
+ ---
485
+ # Argo Rollouts Canary with Prometheus analysis
486
+ apiVersion: argoproj.io/v1alpha1
487
+ kind: Rollout
488
+ metadata:
489
+ name: my-service
490
+ spec:
491
+ replicas: 10
492
+ selector:
493
+ matchLabels: { app: my-service }
494
+ strategy:
495
+ canary:
496
+ steps:
497
+ - setWeight: 5
498
+ - pause: { duration: 2m }
499
+ - analysis:
500
+ templates: [{ templateName: success-rate }]
501
+ - setWeight: 25
502
+ - pause: { duration: 5m }
503
+ - setWeight: 100
504
+ trafficRouting:
505
+ istio:
506
+ virtualService: { name: my-service-vsvc }
507
+ ---
508
+ apiVersion: argoproj.io/v1alpha1
509
+ kind: AnalysisTemplate
510
+ metadata:
511
+ name: success-rate
512
+ spec:
513
+ metrics:
514
+ - name: success-rate
515
+ interval: 60s
516
+ successCondition: result[0] >= 0.99
517
+ failureLimit: 3
518
+ provider:
519
+ prometheus:
520
+ address: http://prometheus.monitoring:9090
521
+ query: |
522
+ sum(rate(http_requests_total{service="my-service",status=~"2.."}[2m]))
523
+ / sum(rate(http_requests_total{service="my-service"}[2m]))
524
+ ```
525
+
526
+ ---
527
+
528
+ ## Deployment Strategies
529
+
530
+ Production deployment requires deliberate strategy selection based on risk tolerance, resource
531
+ budget, and rollback requirements. The patterns below move from simplest (blue-green) through
532
+ progressive delivery (canary) to the infrastructure and observability layers that support them.
533
+
534
+ ### Blue-Green Deployment
535
+
536
+ Two identical environments run simultaneously: **blue** (current production) and **green** (new
537
+ release candidate). Traffic is routed entirely to one environment at a time. The deployment
538
+ sequence is: deploy to green, verify health, switch traffic, keep blue as instant rollback.
539
+
540
+ **Advantages:** Zero downtime, instant rollback (switch back to blue), full production-parity
541
+ testing before user exposure. **Trade-off:** Requires 2x infrastructure during the transition
542
+ window.
543
+
544
+ ```yaml
545
+ # .github/workflows/blue-green-deploy.yml
546
+ name: Blue-Green Deploy
547
+ on:
548
+ push:
549
+ branches: [main]
550
+
551
+ jobs:
552
+ deploy:
553
+ runs-on: ubuntu-latest
554
+ environment: production
555
+ steps:
556
+ - uses: actions/checkout@v4
557
+
558
+ - name: Deploy to green environment
559
+ run: |
560
+ aws ecs update-service \
561
+ --cluster prod \
562
+ --service app-green \
563
+ --task-definition app:${{ github.sha }} \
564
+ --desired-count 3
565
+
566
+ - name: Wait for green to stabilize
567
+ run: |
568
+ aws ecs wait services-stable \
569
+ --cluster prod \
570
+ --services app-green
571
+
572
+ - name: Health check green
573
+ run: |
574
+ for i in $(seq 1 30); do
575
+ STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://green.app.example.com/health || true)
576
+ if [ "$STATUS" = "200" ]; then
577
+ echo "Health check passed on attempt $i"
578
+ exit 0
579
+ fi
580
+ echo "Attempt $i failed (status: $STATUS), retrying in 10s..."
581
+ sleep 10
582
+ done
583
+ echo "Health check failed after 30 attempts"
584
+ exit 1
585
+
586
+ - name: Switch traffic to green
587
+ run: |
588
+ aws elbv2 modify-listener \
589
+ --listener-arn ${{ secrets.ALB_LISTENER_ARN }} \
590
+ --default-actions Type=forward,TargetGroupArn=${{ secrets.GREEN_TG_ARN }}
591
+
592
+ - name: Verify traffic switch
593
+ run: |
594
+ sleep 30
595
+ curl -sf https://app.example.com/health | jq .version
596
+ ```
597
+
598
+ **Rollback procedure:** If post-switch monitoring detects anomalies, revert the listener to
599
+ point back at the blue target group. No redeployment required -- blue is still running the
600
+ previous known-good version.
601
+
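The rollback described above is a single listener change. A minimal sketch of building that command, reusing the `aws elbv2 modify-listener` flags from the workflow; the function name and both ARNs are hypothetical placeholders, not values from this document:

```python
import shlex


def rollback_command(listener_arn: str, blue_target_group_arn: str) -> list[str]:
    """Build the CLI call that points the listener back at the blue target group."""
    return [
        "aws", "elbv2", "modify-listener",
        "--listener-arn", listener_arn,
        "--default-actions",
        f"Type=forward,TargetGroupArn={blue_target_group_arn}",
    ]


# Placeholder ARNs for illustration only.
cmd = rollback_command(
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/example/aaa/bbb",
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/app-blue/ccc",
)
print(shlex.join(cmd))  # review the command, then run it with subprocess.run(cmd, check=True)
```

Returning an argument list rather than a shell string keeps the ARNs out of shell interpolation, and printing first lets an operator confirm the target before switching traffic.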
602
+ ### Canary Deployment
603
+
604
+ Canary releases route a small percentage of production traffic to the new version while the
605
+ majority continues hitting the stable release. Traffic is shifted incrementally as confidence
606
+ grows: **5% -> 25% -> 100%**. At each stage, automated analysis compares error rates, latency
607
+ percentiles, and business metrics between the canary and the baseline.
608
+
609
+ **When to use canary over blue-green:**
610
+ - High-traffic services where even brief full-cutover risk is unacceptable
611
+ - When metric-based automated promotion/rollback is available (Argo Rollouts, Flagger)
612
+ - When you need gradual user exposure to catch long-tail issues
613
+
614
+ **Traffic progression example:**
615
+
616
+ ```
617
+ Step 1: 5% canary -- 2 min pause -- run AnalysisTemplate (success rate >= 99%)
618
+ Step 2: 25% canary -- 5 min pause -- run AnalysisTemplate
619
+ Step 3: 100% canary -- promotion complete
620
+ ```
621
+
622
+ If any analysis step fails, traffic is automatically routed back to the stable version. The
623
+ canary pods are scaled down and the rollout is marked as degraded. See Example 4 (Argo Rollouts)
624
+ in the Code Examples section above for a full working manifest.
625
+
626
+ **Key metrics to monitor during canary analysis:**
627
+ - HTTP error rate (5xx / total requests) -- threshold: < 1%
628
+ - P95 and P99 latency -- threshold: within 10% of baseline
629
+ - Pod restart count -- threshold: 0 restarts during analysis window
630
+ - Business metrics (conversion rate, checkout success) when applicable
631
+
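The thresholds above can be expressed as a small promotion gate. This is a sketch only: the `Metrics` type and `canary_passes` function are hypothetical names, and "within 10% of baseline" is assumed to mean the canary may be at most 10% slower than the baseline.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    error_rate: float  # 5xx responses / total requests, as a fraction
    p95_ms: float
    p99_ms: float
    restarts: int      # pod restarts during the analysis window


def canary_passes(canary: Metrics, baseline: Metrics,
                  max_error_rate: float = 0.01,
                  latency_tolerance: float = 0.10) -> tuple[bool, list[str]]:
    """Return (ok, failed_checks) for one analysis window."""
    checks = {
        "error_rate": canary.error_rate < max_error_rate,
        "p95": canary.p95_ms <= baseline.p95_ms * (1 + latency_tolerance),
        "p99": canary.p99_ms <= baseline.p99_ms * (1 + latency_tolerance),
        "restarts": canary.restarts == 0,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return (not failed, failed)


baseline = Metrics(error_rate=0.002, p95_ms=120, p99_ms=300, restarts=0)
healthy = canary_passes(Metrics(0.004, 125, 320, 0), baseline)
unhealthy = canary_passes(Metrics(0.02, 150, 310, 1), baseline)
```

Returning the list of failed checks, not just a boolean, makes it easier to record why a rollout was aborted.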
632
+ ### Infrastructure as Code (Terraform)
633
+
634
+ Auto Scaling Groups with target tracking policies provide elastic capacity that responds to
635
+ real-time demand. The configuration below demonstrates a rolling instance refresh strategy
636
+ that maintains 75% healthy capacity during deployments -- ensuring zero downtime while
637
+ replacing instances with updated launch templates.
638
+
639
+ ```hcl
640
+ # Auto Scaling Group with target tracking
641
+ resource "aws_autoscaling_group" "app" {
642
+ name = "${var.project}-${var.environment}-asg"
643
+ min_size = var.min_instances
644
+ max_size = var.max_instances
645
+ desired_capacity = var.desired_instances
646
+ health_check_type = "ELB"
647
+ health_check_grace_period = 300
648
+ target_group_arns = [aws_lb_target_group.app.arn]
649
+ vpc_zone_identifier = var.private_subnet_ids
650
+
651
+ launch_template {
652
+ id = aws_launch_template.app.id
653
+ version = aws_launch_template.app.latest_version
654
+ }
655
+
656
+ instance_refresh {
657
+ strategy = "Rolling"
658
+ preferences {
659
+ min_healthy_percentage = 75
660
+ }
661
+ }
662
+
663
+ tag {
664
+ key = "Environment"
665
+ value = var.environment
666
+ propagate_at_launch = true
667
+ }
668
+ }
669
+
670
+ # CPU-based auto scaling
671
+ resource "aws_autoscaling_policy" "cpu_target" {
672
+ name = "${var.project}-cpu-tracking"
673
+ autoscaling_group_name = aws_autoscaling_group.app.name
674
+ policy_type = "TargetTrackingScaling"
675
+
676
+ target_tracking_configuration {
677
+ predefined_metric_specification {
678
+ predefined_metric_type = "ASGAverageCPUUtilization"
679
+ }
680
+ target_value = 60.0
681
+ disable_scale_in = false
682
+ }
683
+ }
684
+ ```
685
+
686
+ **Scaling considerations:**
687
+ - Set `health_check_grace_period` long enough for the application to fully start (including
688
+ warm-up, cache priming, connection pool initialization).
689
+ - Use `mixed_instances_policy` with multiple instance types for cost optimization and
690
+ availability across AZs.
691
+ - Pair CPU-based scaling with request-count scaling (`ALBRequestCountPerTarget`) for
692
+ web-facing services -- CPU alone misses I/O-bound bottlenecks.
693
+
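To see what `min_healthy_percentage = 75` means for rollout speed, the arithmetic can be sketched as below. The function name is hypothetical, and the rounding is an assumption (required healthy count rounded up, at least one instance replaced per batch); check the provider's documentation for the exact rule.

```python
import math


def refresh_batch_size(desired_capacity: int, min_healthy_percentage: int = 75) -> int:
    """Estimate how many instances a rolling refresh can replace at once."""
    must_stay_healthy = math.ceil(desired_capacity * min_healthy_percentage / 100)
    # Assumption: a refresh always makes progress, so replace at least one instance.
    return max(1, desired_capacity - must_stay_healthy)


batch_for_8 = refresh_batch_size(8)    # 6 must stay healthy
batch_for_10 = refresh_batch_size(10)  # 7.5 rounds up to 8
batch_for_3 = refresh_batch_size(3)    # 2.25 rounds up to 3, floor of one applies
```

Under these assumptions, small groups refresh one instance at a time, so a larger `desired_capacity` or a lower percentage is what shortens the rollout.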
694
+ ### Monitoring & Alerting (Prometheus)
695
+
696
+ SLO-based alerting focuses on what matters to users: error rates and latency. The rules below
697
+ are single-window threshold alerts, the simplest starting point; multi-window burn-rate alerts
698
+ extend them to catch both sudden spikes and slow degradation. Give every alert a `runbook`
699
+ annotation linking to the remediation procedure -- alerts without runbooks become noise.
700
+
701
+ ```yaml
702
+ # prometheus-alerts.yml
703
+ groups:
704
+ - name: slo-alerts
705
+ rules:
706
+ - alert: HighErrorRate
707
+ expr: |
708
+ sum(rate(http_requests_total{status=~"5.."}[5m]))
709
+ / sum(rate(http_requests_total[5m])) > 0.01
710
+ for: 5m
711
+ labels:
712
+ severity: critical
713
+ annotations:
714
+ summary: "Error rate exceeds 1% SLO for 5 minutes"
715
+ runbook: "https://wiki.example.com/runbooks/high-error-rate"
716
+
717
+ - alert: HighP95Latency
718
+ expr: |
719
+ histogram_quantile(0.95,
720
+ sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
721
+ ) > 0.5
722
+ for: 5m
723
+ labels:
724
+ severity: warning
725
+ annotations:
726
+ summary: "P95 latency exceeds 500ms SLO"
727
+
728
+ - alert: PodCrashLooping
729
+ expr: |
730
+ increase(kube_pod_container_status_restarts_total[1h]) > 3
731
+ for: 10m
732
+ labels:
733
+ severity: critical
734
+ annotations:
735
+ summary: "Pod {{ $labels.pod }} restarting frequently"
736
+ ```
737
+
738
+ **Alert severity guidelines:**
739
+ - **critical**: Pages on-call immediately. Error budget burning at 14.4x+ rate. Examples:
740
+ sustained 5xx spike, data loss risk, complete service unavailability.
741
+ - **warning**: Notifies team channel. Error budget burning at 6x+ rate. Examples: elevated
742
+ latency, increased restart count, approaching resource limits.
743
+ - **info**: Dashboard-only. No notification. Examples: deployment started, scaling event,
744
+ certificate renewal approaching.
745
+
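The burn-rate figures in the guidelines above follow from simple arithmetic: burn rate is the observed error ratio divided by the error budget (1 minus the SLO target). A minimal sketch, with hypothetical function names and a 30-day (720-hour) budget period assumed:

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are being spent."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def budget_consumed(rate: float, window_hours: float, period_hours: float = 720) -> float:
    """Fraction of the period's error budget used if `rate` holds for `window_hours`."""
    return rate * window_hours / period_hours


def severity(rate: float, critical_at: float = 14.4, warning_at: float = 6.0) -> str:
    if rate >= critical_at:
        return "critical"
    if rate >= warning_at:
        return "warning"
    return "info"


# With a 99% SLO, a 15% error ratio burns budget about 15x too fast.
page = severity(burn_rate(0.15, 0.99))
notify = severity(burn_rate(0.07, 0.99))
quiet = severity(burn_rate(0.005, 0.99))
one_hour_at_critical = budget_consumed(14.4, window_hours=1)
```

A 14.4x burn sustained for one hour uses about 2% of a 30-day budget, which is why it is treated as page-worthy.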
746
+ ### Zero-Downtime Database Migrations
747
+
748
+ Database schema changes are the most common source of deployment-related outages. The
749
+ **expand-contract pattern** ensures backwards compatibility throughout the migration lifecycle
750
+ by never removing or renaming something the running application depends on.
751
+
752
+ **The expand-contract sequence:**
753
+
754
+ 1. **Expand (additive change):** Add the new column, table, or index. The existing application
755
+ ignores these additions -- no code change needed yet. Run this migration independently,
756
+ well before the application deployment.
757
+
758
+ 2. **Deploy new application code:** The updated application writes to both old and new
759
+ columns/tables. It reads from the new structure but falls back to the old if the new
760
+ data is not yet populated.
761
+
762
+ 3. **Backfill:** Migrate existing data from old structure to new. Use batched updates with
763
+ throttling to avoid locking tables or overwhelming replication. Verify row counts match.
764
+
765
+ 4. **Add constraints:** Once backfill is complete and verified, add NOT NULL constraints,
766
+ foreign keys, or unique indexes on the new structure.
767
+
768
+ 5. **Deploy cleanup code:** Remove the fallback reads and dual-writes from the application.
769
+ The application now uses only the new structure.
770
+
771
+ 6. **Contract (remove old structure):** Drop the old column, table, or index. This is safe
772
+ because no running code references it.
773
+
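The expand and backfill steps above can be illustrated with a self-contained sketch using SQLite from the Python standard library. The `users` table, its `full_name` and `display_name` columns, and the `backfill_in_batches` helper are hypothetical; a production backfill would target the real database and tune batch size and pause against replication lag.

```python
import sqlite3
import time


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000,
                        pause_seconds: float = 0.0) -> int:
    """Copy users.full_name into users.display_name, one small batch per transaction."""
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE users
               SET display_name = full_name
             WHERE id IN (SELECT id FROM users
                           WHERE display_name IS NULL AND full_name IS NOT NULL
                           ORDER BY id
                           LIMIT ?)
            """,
            (batch_size,),
        )
        conn.commit()  # short transactions keep locks brief
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
        time.sleep(pause_seconds)  # throttle between batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.executemany("INSERT INTO users (full_name) VALUES (?)",
                 [(f"user {i}",) for i in range(2500)])
conn.commit()

# Step 1 (expand): additive, nullable column that existing code can ignore.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Step 3 (backfill), then verify before adding constraints in step 4.
updated = backfill_in_batches(conn, batch_size=1000)
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE display_name IS NULL").fetchone()[0]
```

Selecting only rows that still need the value makes the backfill safe to re-run after an interruption.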
774
+ **Hard rules for production migrations:**
775
+ - Never run `ALTER TABLE ... DROP COLUMN` in the same deployment that stops writing to it.
776
+ - Never add a `NOT NULL` column without a `DEFAULT` in the same migration.
777
+ - Never rename a column -- add the new one, backfill, drop the old one.
778
+ - Always test migrations against a production-sized dataset. A migration that takes 2ms on
779
+ dev can lock a 500M-row table for 30 minutes.
780
+ - Use online DDL tools (`pt-online-schema-change`, `gh-ost`, `pg_repack`) for large tables
781
+ to avoid locking.
782
+ - Separate migration deployment from application deployment -- run migrations first, verify,
783
+ then deploy application code.
784
+
785
+ ---
786
+
787
+ *Researched: 2026-03-07 | Sources: [Kellton CI/CD Best Practices](https://www.kellton.com/kellton-tech-blog/continuous-integration-deployment-best-practices-2025), [TekRecruiter CI/CD 2026](https://www.tekrecruiter.com/post/top-10-ci-cd-pipeline-best-practices-for-engineering-leaders-in-2026), [GitLab CI/CD Best Practices](https://about.gitlab.com/blog/how-to-keep-up-with-ci-cd-best-practices/), [Naviteq IaC Comparison](https://www.naviteq.io/blog/choosing-the-right-infrastructure-as-code-tools-a-ctos-guide-to-terraform-pulumi-cdk-and-more/), [dasroot IaC 2026](https://dasroot.net/posts/2026/01/infrastructure-as-code-terraform-opentofu-pulumi-comparison-2026/), [sanj.dev IaC Decision Framework](https://sanj.dev/post/terraform-pulumi-aws-cdk-2025-decision-framework), [CNCF GitOps 2025](https://www.cncf.io/blog/2025/06/09/gitops-in-2025-from-old-school-updates-to-the-modern-way/), [Spacelift Flux vs ArgoCD](https://spacelift.io/blog/flux-vs-argo-cd), [Alpacked Anti-Patterns](https://alpacked.io/blog/devops-anti-patterns/), [IsDown Antipatterns](https://isdown.app/blog/devops-antipatterns), [Faith Forge Labs Supply Chain](https://faithforgelabs.com/blog_supplychain_security_2025.php), [SLSA Framework](https://slsa.dev/blog/2025/07/slsa-e2e), [Aikido Snyk vs Trivy](https://www.aikido.dev/blog/snyk-vs-trivy), [Trivy.dev](https://trivy.dev/), [Nirmata Kyverno vs OPA](https://nirmata.com/2025/02/07/kubernetes-policy-comparison-kyverno-vs-opa-gatekeeper/), [GitHub Reusable Workflows](https://docs.github.com/en/actions/how-tos/reuse-automations/reuse-workflows), [GHA Feb 2026 Updates](https://github.blog/changelog/2026-02-05-github-actions-early-february-2026-updates/), [Docker Build Cache](https://docs.docker.com/build/cache/optimize/), [Netdata Docker Caching](https://www.netdata.cloud/academy/docker-layer-caching/), [DZone Monorepo CI/CD](https://dzone.com/articles/ci-cd-at-scale-smarter-pipelines-for-monorepos), [Groundcover K8s Strategies](https://www.groundcover.com/blog/kubernetes-deployment-strategies), [Akuity Argo Rollouts](https://akuity.io/blog/automating-blue-green-and-canary-deployments-with-argo-rollouts), [Bytebase Flyway vs Liquibase](https://www.bytebase.com/blog/flyway-vs-liquibase/), [LaunchDarkly Feature Flags](https://launchdarkly.com/blog/what-are-feature-flags/), [Grafana OTel](https://grafana.com/blog/2023/12/18/opentelemetry-best-practices-a-users-guide-to-getting-started-with-opentelemetry/), [PagerDuty Runbook Automation](https://www.pagerduty.com/platform/automation/runbook/), [Steadybit Chaos Tools](https://steadybit.com/blog/top-chaos-engineering-tools-worth-knowing-about-2025-guide/)*