blockmine 1.24.0 → 1.25.0

Files changed (346)
  1. package/CHANGELOG.md +32 -0
  2. package/README.en.md +427 -0
  3. package/README.md +40 -0
  4. package/backend/cli.js +1 -1
  5. package/backend/src/ai/plugin-assistant-system-prompt.md +664 -5
  6. package/backend/src/api/routes/bots.js +13 -0
  7. package/backend/src/api/routes/servers.js +14 -2
  8. package/backend/src/core/BotProcess.js +98 -2
  9. package/backend/src/core/PluginLoader.js +83 -3
  10. package/backend/src/core/PluginManager.js +75 -5
  11. package/backend/src/core/services/BotLifecycleService.js +186 -2
  12. package/backend/src/server.js +11 -1
  13. package/frontend/dist/assets/browser-ponyfill-DN7pwmHT.js +2 -0
  14. package/frontend/dist/assets/index-LSy71uwm.js +11261 -0
  15. package/frontend/dist/assets/index-SfhKxI4-.css +32 -0
  16. package/frontend/dist/flags/en.svg +32 -0
  17. package/frontend/dist/flags/ru.svg +5 -0
  18. package/frontend/dist/index.html +2 -2
  19. package/frontend/dist/locales/en/admin.json +100 -0
  20. package/frontend/dist/locales/en/api-keys.json +58 -0
  21. package/frontend/dist/locales/en/bots.json +110 -0
  22. package/frontend/dist/locales/en/common.json +47 -0
  23. package/frontend/dist/locales/en/configuration.json +22 -0
  24. package/frontend/dist/locales/en/console.json +10 -0
  25. package/frontend/dist/locales/en/dashboard.json +85 -0
  26. package/frontend/dist/locales/en/dialogs.json +70 -0
  27. package/frontend/dist/locales/en/event-graphs.json +50 -0
  28. package/frontend/dist/locales/en/graph-store.json +70 -0
  29. package/frontend/dist/locales/en/login.json +34 -0
  30. package/frontend/dist/locales/en/management.json +114 -0
  31. package/frontend/dist/locales/en/minecraft-viewer.json +27 -0
  32. package/frontend/dist/locales/en/nodes.json +1077 -0
  33. package/frontend/dist/locales/en/permissions.json +50 -0
  34. package/frontend/dist/locales/en/plugin-detail.json +49 -0
  35. package/frontend/dist/locales/en/plugins.json +110 -0
  36. package/frontend/dist/locales/en/proxies.json +81 -0
  37. package/frontend/dist/locales/en/servers.json +39 -0
  38. package/frontend/dist/locales/en/setup.json +17 -0
  39. package/frontend/dist/locales/en/sidebar.json +27 -0
  40. package/frontend/dist/locales/en/tasks.json +62 -0
  41. package/frontend/dist/locales/en/visual-editor.json +219 -0
  42. package/frontend/dist/locales/en/websocket.json +86 -0
  43. package/frontend/dist/locales/ru/admin.json +100 -0
  44. package/frontend/dist/locales/ru/api-keys.json +58 -0
  45. package/frontend/dist/locales/ru/bots.json +110 -0
  46. package/frontend/dist/locales/ru/common.json +49 -0
  47. package/frontend/dist/locales/ru/configuration.json +22 -0
  48. package/frontend/dist/locales/ru/console.json +10 -0
  49. package/frontend/dist/locales/ru/dashboard.json +85 -0
  50. package/frontend/dist/locales/ru/dialogs.json +70 -0
  51. package/frontend/dist/locales/ru/event-graphs.json +50 -0
  52. package/frontend/dist/locales/ru/graph-store.json +70 -0
  53. package/frontend/dist/locales/ru/login.json +34 -0
  54. package/frontend/dist/locales/ru/management.json +114 -0
  55. package/frontend/dist/locales/ru/minecraft-viewer.json +27 -0
  56. package/frontend/dist/locales/ru/nodes.json +1077 -0
  57. package/frontend/dist/locales/ru/permissions.json +50 -0
  58. package/frontend/dist/locales/ru/plugin-detail.json +49 -0
  59. package/frontend/dist/locales/ru/plugins.json +110 -0
  60. package/frontend/dist/locales/ru/proxies.json +81 -0
  61. package/frontend/dist/locales/ru/servers.json +39 -0
  62. package/frontend/dist/locales/ru/setup.json +17 -0
  63. package/frontend/dist/locales/ru/sidebar.json +27 -0
  64. package/frontend/dist/locales/ru/tasks.json +62 -0
  65. package/frontend/dist/locales/ru/visual-editor.json +221 -0
  66. package/frontend/dist/locales/ru/websocket.json +86 -0
  67. package/frontend/dist/monacoeditorwork/css.worker.bundle.js +7 -7
  68. package/frontend/dist/monacoeditorwork/html.worker.bundle.js +7 -7
  69. package/frontend/dist/monacoeditorwork/json.worker.bundle.js +7 -7
  70. package/frontend/dist/monacoeditorwork/ts.worker.bundle.js +3 -3
  71. package/frontend/package.json +4 -0
  72. package/package.json +1 -1
  73. package/screen/3dviewer.png +0 -0
  74. package/screen/console.png +0 -0
  75. package/screen/dashboard.png +0 -0
  76. package/screen/graph_collabe.png +0 -0
  77. package/screen/graph_live_debug.png +0 -0
  78. package/screen/language_selector.png +0 -0
  79. package/screen/management_command.png +0 -0
  80. package/screen/node_debug_trace.png +0 -0
  81. package/screen/plugin_обзор.png +0 -0
  82. package/screen/websocket.png +0 -0
  83. package/screen/настройки_отдельных_команд_кажду_команлду_можно_настраивать.png +0 -0
  84. package/screen/планировщик_можно_задавать_действия_по_времени.png +0 -0
  85. package/.claude/agents/README.md +0 -469
  86. package/.claude/agents/auth-route-debugger.md +0 -118
  87. package/.claude/agents/auth-route-tester.md +0 -93
  88. package/.claude/agents/auto-error-resolver.md +0 -97
  89. package/.claude/agents/build-optimizer.md +0 -236
  90. package/.claude/agents/code-architect.md +0 -34
  91. package/.claude/agents/code-architecture-reviewer.md +0 -83
  92. package/.claude/agents/code-explorer.md +0 -51
  93. package/.claude/agents/code-refactor-master.md +0 -94
  94. package/.claude/agents/code-reviewer.md +0 -46
  95. package/.claude/agents/cost-optimizer.md +0 -134
  96. package/.claude/agents/deployment-orchestrator.md +0 -113
  97. package/.claude/agents/documentation-architect.md +0 -82
  98. package/.claude/agents/frontend-error-fixer.md +0 -77
  99. package/.claude/agents/iac-code-generator.md +0 -71
  100. package/.claude/agents/incident-responder.md +0 -346
  101. package/.claude/agents/infrastructure-architect.md +0 -31
  102. package/.claude/agents/kubernetes-specialist.md +0 -56
  103. package/.claude/agents/migration-planner.md +0 -181
  104. package/.claude/agents/network-architect.md +0 -196
  105. package/.claude/agents/plan-reviewer.md +0 -52
  106. package/.claude/agents/refactor-planner.md +0 -63
  107. package/.claude/agents/security-scanner.md +0 -102
  108. package/.claude/agents/web-research-specialist.md +0 -78
  109. package/.claude/commands/cost-analysis.md +0 -315
  110. package/.claude/commands/dev-docs-update.md +0 -55
  111. package/.claude/commands/dev-docs.md +0 -51
  112. package/.claude/commands/feature-dev.md +0 -125
  113. package/.claude/commands/incident-debug.md +0 -247
  114. package/.claude/commands/infra-plan.md +0 -81
  115. package/.claude/commands/migration-plan.md +0 -478
  116. package/.claude/commands/route-research-for-testing.md +0 -37
  117. package/.claude/commands/security-review.md +0 -66
  118. package/.claude/hooks/CONFIG.md +0 -448
  119. package/.claude/hooks/README.md +0 -163
  120. package/.claude/hooks/SKILL_ACTIVATION_COMPLETE.md +0 -226
  121. package/.claude/hooks/WINDOWS_HOOKS_README.md +0 -151
  122. package/.claude/hooks/add-skill-activation-banners.ts +0 -132
  123. package/.claude/hooks/comprehensive-skill-test.ts +0 -1315
  124. package/.claude/hooks/error-handling-reminder.sh +0 -12
  125. package/.claude/hooks/error-handling-reminder.ts +0 -222
  126. package/.claude/hooks/k8s-manifest-validator.sh +0 -56
  127. package/.claude/hooks/package-lock.json +0 -556
  128. package/.claude/hooks/package.json +0 -16
  129. package/.claude/hooks/post-tool-use-tracker.ps1 +0 -174
  130. package/.claude/hooks/post-tool-use-tracker.sh +0 -183
  131. package/.claude/hooks/security-policy-check.sh +0 -247
  132. package/.claude/hooks/skill-activation-prompt.ps1 +0 -10
  133. package/.claude/hooks/skill-activation-prompt.sh +0 -10
  134. package/.claude/hooks/skill-activation-prompt.ts +0 -141
  135. package/.claude/hooks/stop-build-check-enhanced.sh +0 -130
  136. package/.claude/hooks/terraform-validator.sh +0 -53
  137. package/.claude/hooks/test-input.json +0 -7
  138. package/.claude/hooks/test-skill-activation.ts +0 -427
  139. package/.claude/hooks/trigger-build-resolver.sh +0 -79
  140. package/.claude/hooks/tsc-check.sh +0 -173
  141. package/.claude/hooks/tsconfig.json +0 -19
  142. package/.claude/settings.json +0 -59
  143. package/.claude/settings.local.json +0 -67
  144. package/.claude/skills/README.md +0 -507
  145. package/.claude/skills/api-engineering/SKILL.md +0 -63
  146. package/.claude/skills/api-engineering/resources/api-versioning.md +0 -88
  147. package/.claude/skills/api-engineering/resources/graphql-patterns.md +0 -106
  148. package/.claude/skills/api-engineering/resources/rate-limiting.md +0 -118
  149. package/.claude/skills/api-engineering/resources/rest-api-design.md +0 -105
  150. package/.claude/skills/backend-dev-guidelines/SKILL.md +0 -306
  151. package/.claude/skills/backend-dev-guidelines/resources/architecture-overview.md +0 -451
  152. package/.claude/skills/backend-dev-guidelines/resources/async-and-errors.md +0 -307
  153. package/.claude/skills/backend-dev-guidelines/resources/complete-examples.md +0 -638
  154. package/.claude/skills/backend-dev-guidelines/resources/configuration.md +0 -275
  155. package/.claude/skills/backend-dev-guidelines/resources/database-patterns.md +0 -224
  156. package/.claude/skills/backend-dev-guidelines/resources/middleware-guide.md +0 -213
  157. package/.claude/skills/backend-dev-guidelines/resources/routing-and-controllers.md +0 -756
  158. package/.claude/skills/backend-dev-guidelines/resources/sentry-and-monitoring.md +0 -336
  159. package/.claude/skills/backend-dev-guidelines/resources/services-and-repositories.md +0 -789
  160. package/.claude/skills/backend-dev-guidelines/resources/testing-guide.md +0 -235
  161. package/.claude/skills/backend-dev-guidelines/resources/validation-patterns.md +0 -754
  162. package/.claude/skills/budget-and-cost-management/SKILL.md +0 -850
  163. package/.claude/skills/build-engineering/SKILL.md +0 -431
  164. package/.claude/skills/build-engineering/resources/artifact-repositories.md +0 -72
  165. package/.claude/skills/build-engineering/resources/build-caching.md +0 -96
  166. package/.claude/skills/build-engineering/resources/build-pipelines.md +0 -105
  167. package/.claude/skills/build-engineering/resources/build-security.md +0 -95
  168. package/.claude/skills/build-engineering/resources/build-systems.md +0 -389
  169. package/.claude/skills/build-engineering/resources/compilation-optimization.md +0 -201
  170. package/.claude/skills/build-engineering/resources/dependency-management.md +0 -73
  171. package/.claude/skills/build-engineering/resources/monorepo-builds.md +0 -110
  172. package/.claude/skills/build-engineering/resources/performance-optimization.md +0 -113
  173. package/.claude/skills/build-engineering/resources/reproducible-builds.md +0 -82
  174. package/.claude/skills/cloud-engineering/SKILL.md +0 -675
  175. package/.claude/skills/cloud-engineering/resources/aws-patterns.md +0 -742
  176. package/.claude/skills/cloud-engineering/resources/azure-patterns.md +0 -714
  177. package/.claude/skills/cloud-engineering/resources/cleared-cloud-environments.md +0 -987
  178. package/.claude/skills/cloud-engineering/resources/cloud-cost-optimization.md +0 -757
  179. package/.claude/skills/cloud-engineering/resources/cloud-networking.md +0 -1058
  180. package/.claude/skills/cloud-engineering/resources/cloud-security-tools.md +0 -1530
  181. package/.claude/skills/cloud-engineering/resources/cloud-security.md +0 -990
  182. package/.claude/skills/cloud-engineering/resources/gcp-patterns.md +0 -758
  183. package/.claude/skills/cloud-engineering/resources/migration-strategies.md +0 -820
  184. package/.claude/skills/cloud-engineering/resources/multi-cloud-strategies.md +0 -670
  185. package/.claude/skills/cloud-engineering/resources/oci-patterns.md +0 -1198
  186. package/.claude/skills/cloud-engineering/resources/serverless-patterns.md +0 -795
  187. package/.claude/skills/cloud-engineering/resources/well-architected-frameworks.md +0 -966
  188. package/.claude/skills/cybersecurity/SKILL.md +0 -409
  189. package/.claude/skills/cybersecurity/resources/security-architecture.md +0 -266
  190. package/.claude/skills/database-engineering/SKILL.md +0 -61
  191. package/.claude/skills/database-engineering/resources/backup-and-recovery.md +0 -72
  192. package/.claude/skills/database-engineering/resources/database-replication.md +0 -63
  193. package/.claude/skills/database-engineering/resources/postgresql-fundamentals.md +0 -70
  194. package/.claude/skills/database-engineering/resources/query-optimization.md +0 -68
  195. package/.claude/skills/devsecops/SKILL.md +0 -374
  196. package/.claude/skills/devsecops/resources/ci-cd-security.md +0 -204
  197. package/.claude/skills/devsecops/resources/compliance-automation.md +0 -530
  198. package/.claude/skills/devsecops/resources/compliance-frameworks.md +0 -2322
  199. package/.claude/skills/devsecops/resources/container-security.md +0 -915
  200. package/.claude/skills/devsecops/resources/cspm-integration.md +0 -1440
  201. package/.claude/skills/devsecops/resources/policy-enforcement.md +0 -619
  202. package/.claude/skills/devsecops/resources/secrets-management.md +0 -755
  203. package/.claude/skills/devsecops/resources/security-monitoring.md +0 -146
  204. package/.claude/skills/devsecops/resources/security-scanning.md +0 -887
  205. package/.claude/skills/devsecops/resources/security-testing.md +0 -203
  206. package/.claude/skills/devsecops/resources/supply-chain-security.md +0 -518
  207. package/.claude/skills/devsecops/resources/vulnerability-management.md +0 -481
  208. package/.claude/skills/devsecops/resources/zero-trust-architecture.md +0 -177
  209. package/.claude/skills/documentation-as-code/SKILL.md +0 -323
  210. package/.claude/skills/documentation-as-code/resources/api-documentation.md +0 -90
  211. package/.claude/skills/documentation-as-code/resources/changelog-management.md +0 -79
  212. package/.claude/skills/documentation-as-code/resources/diagram-generation.md +0 -44
  213. package/.claude/skills/documentation-as-code/resources/docs-as-code-workflow.md +0 -99
  214. package/.claude/skills/documentation-as-code/resources/documentation-automation.md +0 -68
  215. package/.claude/skills/documentation-as-code/resources/documentation-sites.md +0 -79
  216. package/.claude/skills/documentation-as-code/resources/markdown-best-practices.md +0 -162
  217. package/.claude/skills/documentation-as-code/resources/openapi-specification.md +0 -77
  218. package/.claude/skills/documentation-as-code/resources/readme-engineering.md +0 -60
  219. package/.claude/skills/documentation-as-code/resources/technical-writing-guide.md +0 -202
  220. package/.claude/skills/engineering-management/SKILL.md +0 -356
  221. package/.claude/skills/engineering-management/resources/career-ladders.md +0 -609
  222. package/.claude/skills/engineering-management/resources/hiring-and-assessment.md +0 -555
  223. package/.claude/skills/engineering-management/resources/one-on-one-guides.md +0 -609
  224. package/.claude/skills/engineering-management/resources/resource-planning.md +0 -557
  225. package/.claude/skills/engineering-management/resources/team-organization-patterns.md +0 -491
  226. package/.claude/skills/engineering-management/resources/technical-interviews.md +0 -474
  227. package/.claude/skills/engineering-operations-management/SKILL.md +0 -817
  228. package/.claude/skills/error-tracking/SKILL.md +0 -379
  229. package/.claude/skills/frontend-design/SKILL.md +0 -42
  230. package/.claude/skills/frontend-dev-guidelines/SKILL.md +0 -403
  231. package/.claude/skills/frontend-dev-guidelines/resources/common-patterns.md +0 -331
  232. package/.claude/skills/frontend-dev-guidelines/resources/complete-examples.md +0 -872
  233. package/.claude/skills/frontend-dev-guidelines/resources/component-patterns.md +0 -502
  234. package/.claude/skills/frontend-dev-guidelines/resources/data-fetching.md +0 -767
  235. package/.claude/skills/frontend-dev-guidelines/resources/file-organization.md +0 -502
  236. package/.claude/skills/frontend-dev-guidelines/resources/loading-and-error-states.md +0 -501
  237. package/.claude/skills/frontend-dev-guidelines/resources/performance.md +0 -406
  238. package/.claude/skills/frontend-dev-guidelines/resources/routing-guide.md +0 -364
  239. package/.claude/skills/frontend-dev-guidelines/resources/styling-guide.md +0 -428
  240. package/.claude/skills/frontend-dev-guidelines/resources/typescript-standards.md +0 -418
  241. package/.claude/skills/general-it-engineering/SKILL.md +0 -393
  242. package/.claude/skills/general-it-engineering/resources/asset-management.md +0 -712
  243. package/.claude/skills/general-it-engineering/resources/automation-orchestration.md +0 -817
  244. package/.claude/skills/general-it-engineering/resources/business-continuity.md +0 -786
  245. package/.claude/skills/general-it-engineering/resources/change-management.md +0 -715
  246. package/.claude/skills/general-it-engineering/resources/enterprise-monitoring.md +0 -729
  247. package/.claude/skills/general-it-engineering/resources/help-desk-operations.md +0 -738
  248. package/.claude/skills/general-it-engineering/resources/incident-service-management.md +0 -834
  249. package/.claude/skills/general-it-engineering/resources/it-governance.md +0 -753
  250. package/.claude/skills/general-it-engineering/resources/itil-framework.md +0 -503
  251. package/.claude/skills/general-it-engineering/resources/service-management.md +0 -669
  252. package/.claude/skills/infrastructure-architecture/SKILL.md +0 -328
  253. package/.claude/skills/infrastructure-architecture/resources/architecture-decision-records.md +0 -505
  254. package/.claude/skills/infrastructure-architecture/resources/architecture-patterns.md +0 -528
  255. package/.claude/skills/infrastructure-architecture/resources/capacity-planning.md +0 -453
  256. package/.claude/skills/infrastructure-architecture/resources/cleared-environment-architecture.md +0 -773
  257. package/.claude/skills/infrastructure-architecture/resources/cost-architecture.md +0 -499
  258. package/.claude/skills/infrastructure-architecture/resources/data-architecture.md +0 -501
  259. package/.claude/skills/infrastructure-architecture/resources/disaster-recovery.md +0 -535
  260. package/.claude/skills/infrastructure-architecture/resources/migration-architecture.md +0 -512
  261. package/.claude/skills/infrastructure-architecture/resources/multi-region-design.md +0 -608
  262. package/.claude/skills/infrastructure-architecture/resources/reference-architectures.md +0 -562
  263. package/.claude/skills/infrastructure-architecture/resources/security-architecture.md +0 -538
  264. package/.claude/skills/infrastructure-architecture/resources/system-design-principles.md +0 -489
  265. package/.claude/skills/infrastructure-architecture/resources/workload-classification.md +0 -1000
  266. package/.claude/skills/infrastructure-strategy/SKILL.md +0 -924
  267. package/.claude/skills/network-engineering/SKILL.md +0 -385
  268. package/.claude/skills/network-engineering/resources/dns-management.md +0 -738
  269. package/.claude/skills/network-engineering/resources/load-balancing.md +0 -820
  270. package/.claude/skills/network-engineering/resources/network-architecture.md +0 -546
  271. package/.claude/skills/network-engineering/resources/network-security.md +0 -921
  272. package/.claude/skills/network-engineering/resources/network-troubleshooting.md +0 -749
  273. package/.claude/skills/network-engineering/resources/routing-switching.md +0 -373
  274. package/.claude/skills/network-engineering/resources/sdn-networking.md +0 -695
  275. package/.claude/skills/network-engineering/resources/service-mesh-networking.md +0 -777
  276. package/.claude/skills/network-engineering/resources/tcp-ip-protocols.md +0 -444
  277. package/.claude/skills/network-engineering/resources/vpn-connectivity.md +0 -672
  278. package/.claude/skills/node-development/SKILL.md +0 -317
  279. package/.claude/skills/observability-engineering/SKILL.md +0 -101
  280. package/.claude/skills/observability-engineering/resources/apm-tools.md +0 -97
  281. package/.claude/skills/observability-engineering/resources/correlation-strategies.md +0 -87
  282. package/.claude/skills/observability-engineering/resources/distributed-tracing.md +0 -98
  283. package/.claude/skills/observability-engineering/resources/logs-aggregation.md +0 -118
  284. package/.claude/skills/observability-engineering/resources/observability-cost-optimization.md +0 -141
  285. package/.claude/skills/observability-engineering/resources/opentelemetry.md +0 -110
  286. package/.claude/skills/platform-engineering/SKILL.md +0 -555
  287. package/.claude/skills/platform-engineering/resources/architecture-overview.md +0 -600
  288. package/.claude/skills/platform-engineering/resources/container-orchestration.md +0 -916
  289. package/.claude/skills/platform-engineering/resources/cost-optimization.md +0 -634
  290. package/.claude/skills/platform-engineering/resources/developer-platforms.md +0 -670
  291. package/.claude/skills/platform-engineering/resources/gitops-automation.md +0 -650
  292. package/.claude/skills/platform-engineering/resources/infrastructure-as-code.md +0 -778
  293. package/.claude/skills/platform-engineering/resources/infrastructure-standards.md +0 -708
  294. package/.claude/skills/platform-engineering/resources/multi-tenancy.md +0 -602
  295. package/.claude/skills/platform-engineering/resources/platform-security.md +0 -711
  296. package/.claude/skills/platform-engineering/resources/resource-management.md +0 -592
  297. package/.claude/skills/platform-engineering/resources/service-mesh.md +0 -628
  298. package/.claude/skills/release-engineering/SKILL.md +0 -393
  299. package/.claude/skills/release-engineering/resources/artifact-management.md +0 -108
  300. package/.claude/skills/release-engineering/resources/build-optimization.md +0 -84
  301. package/.claude/skills/release-engineering/resources/ci-cd-pipelines.md +0 -411
  302. package/.claude/skills/release-engineering/resources/deployment-strategies.md +0 -197
  303. package/.claude/skills/release-engineering/resources/pipeline-security.md +0 -62
  304. package/.claude/skills/release-engineering/resources/progressive-delivery.md +0 -83
  305. package/.claude/skills/release-engineering/resources/release-automation.md +0 -68
  306. package/.claude/skills/release-engineering/resources/release-orchestration.md +0 -77
  307. package/.claude/skills/release-engineering/resources/rollback-strategies.md +0 -66
  308. package/.claude/skills/release-engineering/resources/versioning-strategies.md +0 -59
  309. package/.claude/skills/route-tester/SKILL.md +0 -392
  310. package/.claude/skills/skill-developer/ADVANCED.md +0 -197
  311. package/.claude/skills/skill-developer/HOOK_MECHANISMS.md +0 -306
  312. package/.claude/skills/skill-developer/PATTERNS_LIBRARY.md +0 -152
  313. package/.claude/skills/skill-developer/SKILL.md +0 -430
  314. package/.claude/skills/skill-developer/SKILL_RULES_REFERENCE.md +0 -315
  315. package/.claude/skills/skill-developer/TRIGGER_TYPES.md +0 -305
  316. package/.claude/skills/skill-developer/TROUBLESHOOTING.md +0 -514
  317. package/.claude/skills/skill-rules.json +0 -2989
  318. package/.claude/skills/sre/SKILL.md +0 -464
  319. package/.claude/skills/sre/resources/alerting-best-practices.md +0 -282
  320. package/.claude/skills/sre/resources/capacity-planning.md +0 -226
  321. package/.claude/skills/sre/resources/chaos-engineering.md +0 -193
  322. package/.claude/skills/sre/resources/disaster-recovery.md +0 -232
  323. package/.claude/skills/sre/resources/incident-management.md +0 -436
  324. package/.claude/skills/sre/resources/observability-stack.md +0 -240
  325. package/.claude/skills/sre/resources/on-call-runbooks.md +0 -167
  326. package/.claude/skills/sre/resources/performance-optimization.md +0 -108
  327. package/.claude/skills/sre/resources/reliability-patterns.md +0 -183
  328. package/.claude/skills/sre/resources/slo-sli-sla.md +0 -464
  329. package/.claude/skills/sre/resources/toil-reduction.md +0 -145
  330. package/.claude/skills/systems-engineering/SKILL.md +0 -648
  331. package/.claude/skills/systems-engineering/resources/automation-patterns.md +0 -771
  332. package/.claude/skills/systems-engineering/resources/configuration-management.md +0 -998
  333. package/.claude/skills/systems-engineering/resources/linux-administration.md +0 -672
  334. package/.claude/skills/systems-engineering/resources/networking-fundamentals.md +0 -982
  335. package/.claude/skills/systems-engineering/resources/performance-tuning.md +0 -871
  336. package/.claude/skills/systems-engineering/resources/powershell-scripting.md +0 -482
  337. package/.claude/skills/systems-engineering/resources/security-hardening.md +0 -739
  338. package/.claude/skills/systems-engineering/resources/shell-scripting.md +0 -915
  339. package/.claude/skills/systems-engineering/resources/storage-management.md +0 -628
  340. package/.claude/skills/systems-engineering/resources/system-monitoring.md +0 -787
  341. package/.claude/skills/systems-engineering/resources/troubleshooting-guide.md +0 -753
  342. package/.claude/skills/systems-engineering/resources/windows-administration.md +0 -738
  343. package/.claude/skills/technical-leadership/SKILL.md +0 -728
  344. package/backend/docs/SECRETS_DOCUMENTATION.md +0 -327
  345. package/frontend/dist/assets/index-BC-NbKXi.css +0 -32
  346. package/frontend/dist/assets/index-DqJXZMHY.js +0 -11266
@@ -1,924 +0,0 @@
1
- # Infrastructure Strategy for Engineering Leaders
2
-
3
- **For VPs, Directors, and Senior Managers setting multi-year infrastructure direction.**
4
-
5
- > Infrastructure strategy is about making big bets that enable your business for years to come - cloud platform choices, build vs buy decisions, technology investments, and multi-year roadmaps.
6
-
7
- ---
8
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
9
- 🎯 SKILL ACTIVATED: infrastructure-strategy
10
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
11
-
12
-
13
- ## When to Use This Skill
14
-
15
- **You need help with:**
16
- - Cloud strategy (AWS vs Azure vs GCP, multi-cloud vs single-cloud)
17
- - Build vs buy decisions for infrastructure components
18
- - Platform investment ROI calculations
19
- - Multi-year infrastructure roadmapping
20
- - Technology evaluation and selection (technology radar)
21
- - Migration planning at executive level
22
- - Balancing innovation vs stability
23
- - Infrastructure budget prioritization
24
-
25
- **This skill does NOT cover:**
26
- - Day-to-day technical decisions (see technical-leadership)
27
- - Hands-on implementation (see technical skills)
28
- - Operational management (see engineering-operations-management)
29
-
30
- ---
31
-
32
- ## 1. Cloud Strategy
33
-
34
- ### Single Cloud vs Multi-Cloud
35
-
36
- **Single Cloud (Recommended for most)**
37
- ```
38
- Pros:
39
- ✅ Deep integration with platform services
40
- ✅ Team becomes expert in one platform
41
- ✅ Simpler operations and tooling
42
- ✅ Lower cost (volume discounts, reserved instances)
43
- ✅ Faster development (use platform services)
44
-
45
- Cons:
46
- ❌ Vendor lock-in risk
47
- ❌ Less negotiating leverage
48
- ❌ Subject to platform outages
49
- ❌ Limited to platform capabilities
50
-
51
- Best for:
52
- - Startups and scale-ups
53
- - Teams < 100 engineers
54
- - Standard workloads
55
- - Cost-sensitive orgs
56
- ```
57
-
58
- **Multi-Cloud (For specific use cases)**
59
- ```
60
- Pros:
61
- ✅ Avoid vendor lock-in
62
- ✅ Negotiating leverage
63
- ✅ Use best-of-breed services
64
- ✅ Geographic coverage (e.g., China requires local cloud)
65
-
66
- Cons:
67
- ❌ Operational complexity (2-3x overhead)
68
- ❌ Team knowledge fragmentation
69
- ❌ Higher costs (no volume discounts)
70
- ❌ Integration challenges
71
- ❌ Security complexity
72
-
73
- Best for:
74
- - Large enterprises (500+ engineers)
75
- - Regulatory requirements (data sovereignty)
76
- - M&A integration (acquired companies on different clouds)
77
- - Specific workload requirements
78
- ```
79
-
80
- **Decision Framework:**
81
- 1. **Start with single cloud** unless you have specific reason for multi-cloud
82
- 2. **Choose cloud based on:**
83
- - Existing team skills
84
- - Services needed (ML, analytics, compute)
85
- - Geographic presence
86
- - Pricing for your workload
87
- 3. **Design for portability** (Kubernetes, IaC) but don't pay multi-cloud tax
88
-
89
- ### Which Cloud Provider?
90
-
91
- | AWS | Azure | GCP | Oracle Cloud (OCI) |
92
- |-----|-------|-----|-------------------|
93
- | **Strengths:** Largest ecosystem, most services, mature, global coverage | **Strengths:** Enterprise sales, Microsoft integration, hybrid cloud (Arc) | **Strengths:** Data/ML services, Kubernetes, developer experience, pricing | **Strengths:** Oracle DB, enterprise support, government clouds |
94
- | **Weaknesses:** Complexity, older UI, pricing opacity | **Weaknesses:** Service quality inconsistency, documentation gaps | **Weaknesses:** Smaller ecosystem, fewer enterprise features | **Weaknesses:** Smaller ecosystem, fewer services |
95
- | **Best for:** Startups, tech companies, most use cases | **Best for:** Enterprises with Microsoft stack, hybrid cloud | **Best for:** Data-heavy workloads, ML/AI, Kubernetes-first | **Best for:** Oracle workloads, government, highly regulated |
96
-
97
- **Choosing strategy:**
98
- - **Startup/scale-up:** AWS (ecosystem) or GCP (developer experience)
99
- - **Enterprise:** Azure (if Microsoft shop) or AWS (if tech-forward)
100
- - **Regulated/government:** AWS GovCloud, Azure Government, or OCI
101
- - **Oracle DB heavy:** OCI (database licensing savings)
102
-
103
- ### Cloud Strategy Scenarios
104
-
105
- **Scenario: "Should we go all-in on AWS or stay flexible?"**
106
- - **All-in (Recommended):** Use AWS-specific services (Lambda, DynamoDB, etc.) for faster development
107
- - **Flexible:** Use portable tech (Kubernetes, Postgres) but sacrifice AWS integration benefits
108
- - **Reality:** Portability is expensive. Most companies that plan for multi-cloud never actually migrate.
109
- - **Decision:** Go all-in unless you have specific multi-cloud requirement
110
-
111
- **Scenario: "Is multi-cloud worth the complexity?"**
112
- - **Answer:** Usually NO. Multi-cloud costs 2-3x in operational overhead
113
- - **Only do multi-cloud if:**
114
- - Large enterprise (500+ engineers) with resources
115
- - Regulatory requirement (data must stay in specific regions/clouds)
116
- - M&A (acquired company on different cloud, temporary state)
117
- - **Alternative:** Design for cloud portability (Kubernetes, Terraform) but run on single cloud
118
-
119
- **Scenario: "Do we need disaster recovery in another cloud?"**
120
- - **Question:** "What's the failure mode? Entire AWS region or all of AWS?"
121
- - **Reality:** Multi-region in same cloud is simpler and handles 99.9% of DR scenarios
122
- - **Multi-cloud DR:** Only for catastrophic cloud-wide failures (extremely rare)
123
- - **Decision:** Multi-region DR first, multi-cloud DR only if mandated by compliance
124
-
125
- **Scenario: "Serverless vs container strategy?"**
126
- - **Serverless (Lambda/Cloud Functions):**
127
- - Best for: Event-driven, variable load, stateless functions
128
- - Not for: Long-running, stateful, complex orchestration
129
- - **Containers (ECS/EKS/Cloud Run):**
130
- - Best for: Always-on services, stateful apps, complex dependencies
131
- - Not for: Simple event handlers, variable load (without autoscaling)
132
- - **Decision:** Use both - serverless for events, containers for services
133
-
134
- **Scenario: "Moving from on-prem to cloud?"**
135
- - **Timeline:** 12-36 months depending on complexity
136
- - **Strategy:**
137
- - Phase 1: Lift-and-shift (VMs) to derisk
138
- - Phase 2: Re-platform (containerize, use managed services)
139
- - Phase 3: Re-architect (cloud-native, serverless)
140
- - **Don't:** Big-bang migration. **Do:** Migrate incrementally, service by service
141
-
142
- **Scenario: "Cost difference between clouds?"**
143
- - **Reality:** Pricing is similar for compute/storage (within 10-20%)
144
- - **True cost differences:**
145
- - Data egress (can be 3-5x different)
146
- - Managed services (varies widely)
147
- - Enterprise support (20% of spend)
148
- - Reserved instance discounts (negotiate these!)
149
- - **Decision:** Choose based on services/expertise, not just pricing
150
-
151
- **Scenario: "Should we use GCP for ML workloads and AWS for everything else?"**
152
- - **Sounds smart, but:** Operational complexity of managing two clouds
153
- - **Better:** Use AWS SageMaker or GCP Vertex AI - both are excellent
154
- - **Only split if:** ML team is separate and has strong GCP preference
155
- - **Reality:** Integration complexity usually outweighs best-of-breed benefits
156
-
157
- **Scenario: "GovCloud requirement - what changes?"**
158
- - **Limited services:** Not all AWS services available in GovCloud
159
- - **Higher cost:** Separate infrastructure, lower economies of scale
160
- - **Compliance burden:** STIG hardening, continuous monitoring, audit paperwork
161
- - **Staffing:** Need cleared personnel for some operations
162
- - **Timeline:** Add 3-6 months to normal cloud migration
163
-
164
- **Scenario: "Cloud-native vs cloud-agnostic?"**
165
- - **Cloud-native:** Use cloud-specific services (managed databases, serverless)
166
- - Faster development, lower operational burden
167
- - Trade-off: Harder to migrate clouds
168
- - **Cloud-agnostic:** Use portable tech (Kubernetes, open source)
169
- - Flexibility to move clouds
170
- - Trade-off: More operational burden, slower development
171
- - **Recommendation:** Be pragmatic - use cloud services but document dependencies
172
-
173
- ### Government and Cleared Clouds
174
-
175
- **For regulated industries:**
176
- - **FedRAMP:** AWS GovCloud, Azure Government, GCP for Government, OCI Government
177
- - **IL4/IL5:** AWS GovCloud (US), Azure Government, GCP Assured Workloads
178
- - **IL6 (Secret) and above:** AWS Secret and Top Secret Regions, Azure Government Secret and Top Secret
179
-
180
- **Considerations:**
181
- - Limited service availability in government clouds
182
- - Higher costs (separate infrastructure)
183
- - Longer procurement cycles
184
- - Compliance overhead (STIG, NIST 800-53)
185
-
186
- ---
187
-
188
- ## 2. Build vs Buy Decisions
189
-
190
- ### Framework for Deciding
191
-
192
- ```
193
- BUILD when:
194
- ✅ Core differentiator for your business
195
- ✅ Existing solutions don't meet needs
196
- ✅ You have unique requirements
197
- ✅ Team has expertise and capacity
198
- ✅ Long-term cost justifies initial investment
199
-
200
- BUY when:
201
- ✅ Not a differentiator (infrastructure, auth, payments)
202
- ✅ Commodity problem with good solutions
203
- ✅ Time to market is critical
204
- ✅ Team lacks expertise
205
- ✅ Ongoing maintenance would be a burden
206
- ```
207
-
208
- ### Decision Matrix
209
-
210
- | Component | Build | Buy | Rationale |
211
- |-----------|-------|-----|-----------|
212
- | **Authentication** | ❌ | ✅ Buy (Auth0, Okta) | Commodity, security-critical, complex |
213
- | **CI/CD** | ❌ | ✅ Buy (GitHub Actions, CircleCI) | Mature market, not differentiator |
214
- | **Observability** | ❌ | ✅ Buy (Datadog, New Relic) | Complex to build, mature vendors |
215
- | **Internal Developer Platform** | ✅ | ❌ | Core to productivity, unique needs |
216
- | **ML Platform** | ✅ If ML is core business | ❌ | Differentiator, specific workflows |
217
- | **API Gateway** | Maybe | Maybe | Depends on customization needs |
218
-
219
- ### Total Cost of Ownership (TCO)
220
-
221
- **Build TCO:**
222
- ```
223
- Initial Development:
224
- ├── Engineering time (months × $150K/year avg)
225
- ├── Opportunity cost (what else could they build?)
226
- └── Infrastructure costs
227
-
228
- Ongoing:
229
- ├── Maintenance (20-30% of dev cost annually)
230
- ├── Operations (monitoring, on-call)
231
- ├── Updates and security patches
232
- ├── Documentation and training
233
- └── Infrastructure costs
234
-
235
- 3-Year TCO = Initial + (3 × Annual Ongoing)
236
- ```
237
-
238
- **Buy TCO:**
239
- ```
240
- Year 1:
241
- ├── Vendor cost (licenses/seats)
242
- ├── Implementation/integration (1-3 months engineer time)
243
- ├── Training
244
- └── Infrastructure (if self-hosted)
245
-
246
- Years 2-3:
247
- ├── Annual license growth (plan for 20-30% growth)
248
- ├── Support/premium features
249
- ├── Minimal maintenance
250
- └── Infrastructure
251
-
252
- 3-Year TCO = Y1 + Y2 + Y3
253
- ```
254
-
255
- **Example: Auth System**
256
- ```
257
- BUILD:
258
- ├── 6 months × 2 engineers = $150K initial
259
- ├── Ongoing: $60K/year maintenance
260
- └── 3-year TCO: $150K + $180K = $330K
261
-
262
- BUY (Auth0):
263
- ├── $2/MAU × 100K users = $200K/year
264
- ├── Integration: $30K one-time
265
- └── 3-year TCO: $30K + $600K = $630K
266
-
267
- Conclusion: Build seems cheaper BUT:
268
- - Auth0 includes: MFA, SSO, compliance, security updates
269
- - Building all that: 12+ months, $300K+
270
- - Hidden costs: security incidents, compliance audits
271
- - Decision: BUY unless auth is your core business
272
- ```
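The 3-year TCO arithmetic in the auth example can be reproduced with a small helper. This is a minimal sketch, not a full cost model: the function name `three_year_tco` is hypothetical, and it assumes a flat annual cost, so it ignores the 20-30% license growth that the Buy TCO outline says to plan for.

```python
def three_year_tco(initial: float, annual_ongoing: float, years: int = 3) -> float:
    """One-time cost plus a flat recurring cost over the planning horizon."""
    return initial + years * annual_ongoing


# Figures taken from the auth system example above.
build = three_year_tco(initial=150_000, annual_ongoing=60_000)
buy = three_year_tco(initial=30_000, annual_ongoing=200_000)

print(f"Build: ${build:,.0f}")
print(f"Buy:   ${buy:,.0f}")
```

The raw numbers favor building, which is why the example's conclusion rests on the costs the formula leaves out (MFA, SSO, compliance, incident risk) rather than on the totals alone.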
273
-
274
- ### Build vs Buy Checklist
275
-
276
- ```
277
- □ Is this a core differentiator for our business?
278
- □ Do existing solutions meet 80%+ of our needs?
279
- □ Do we have team expertise to build and maintain?
280
- □ Have we calculated full 3-year TCO for both options?
281
- □ Can we afford the opportunity cost of building?
282
- □ Is vendor lock-in acceptable? (most cases: yes)
283
- □ What's the risk if we choose wrong? Can we switch later?
284
- □ Does the "buy" option have an enterprise SLA and support?
285
- ```
286
-
287
- ### Build vs Buy Scenarios
288
-
289
- **Scenario: "Should we build an internal platform like Heroku?"**
290
- - **Build cost:** 8-12 engineers × 12 months = $2M+ initial, $1.5M/year ongoing
291
- - **Buy alternative:** Heroku, Cloud Run, App Runner - $50-200K/year
292
- - **Build if:** 150+ engineers, unique workflows, platform is differentiator
293
- - **Buy if:** < 100 engineers, standard app deployment, want speed
294
- - **Hidden costs of building:** In-house support, documentation, feature requests, security updates
295
-
296
- **Scenario: "Payment processing - build or use Stripe?"**
297
- - **Build:** PCI compliance alone costs $500K+/year
298
- - **Stripe:** 2.9% + $0.30 per transaction
299
- - **Break-even:** Only makes sense at $100M+ annual GMV with specialized needs
300
- - **Decision:** Almost always buy. Payments are not your core business.
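To put the Stripe pricing next to the PCI compliance figure, the fee schedule quoted above can be applied to an annual volume. This is a rough sketch: `stripe_fees` is a hypothetical helper, and the $50 average order value is an assumption chosen only for illustration; the per-transaction fixed fee makes the result sensitive to it.

```python
def stripe_fees(gmv: float, avg_order_value: float,
                pct: float = 0.029, fixed: float = 0.30) -> float:
    """Annual processing fees: a percentage of volume plus a fixed fee per transaction."""
    transactions = gmv / avg_order_value
    return gmv * pct + transactions * fixed


# $100M annual GMV with an assumed $50 average order value.
fees = stripe_fees(gmv=100_000_000, avg_order_value=50.0)
print(f"Annual fees: ${fees:,.0f}")
```

Under these assumptions the fees come to roughly $3.5M per year, which is the scale at which an in-house build with $500K+/year of compliance cost starts to be worth evaluating.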
301
-
302
- **Scenario: "APM - commercial (Datadog/New Relic) vs open source (Prometheus/Grafana)?"**
303
- - **Commercial ($200-500K/year):**
304
- - Full-featured, hosted, 24/7 support
305
- - Fast time to value (days)
306
- - Best for teams < 50 engineers
307
- - **Open Source ($100-200K/year in engineering time):**
308
- - Self-hosted, requires dedicated team
309
- - Slower time to value (months)
310
- - Best for teams > 100 engineers with SRE expertise
311
- - **Decision:** Buy commercial until you have an SRE team to run OSS
312
-
313
- **Scenario: "Service mesh - build custom vs buy Istio/Linkerd vs buy Consul?"**
314
- - **Build custom:** 6-12 months, ongoing maintenance nightmare
315
- - **Open source (Istio/Linkerd):** Complex to operate, requires expertise
316
- - **Commercial (Consul Enterprise, Gloo):** Easier, supported, expensive
317
- - **Reality:** Most companies don't need service mesh. Use it if:
318
- - 50+ microservices
319
- - Need mTLS everywhere
320
- - Complex traffic routing requirements
321
- - **Decision:** Buy managed service mesh or don't use one
322
-
323
- **Scenario: "Managed Kubernetes (EKS/GKE) vs self-hosted?"**
324
- - **Managed ($150/cluster/month):**
325
- - Control plane managed, auto-updates, integrated
326
- - Still need to manage worker nodes
327
- - **Self-hosted (save $150/month, cost $10K/month in engineering time):**
328
- - Full control, complex setup, manual updates
329
- - **Decision:** Always use managed unless you have 10+ dedicated Kubernetes experts
330
-
331
- **Scenario: "Observability - should we buy Datadog or build our own?"**
332
- - **Build cost:** $500K-1M first year, $300K/year ongoing
333
- - **Datadog:** $100-300K/year depending on scale
334
- - **Build if:** > 500 engineers, unique observability needs, cost > $1M/year
335
- - **Buy if:** < 500 engineers, standard needs, want to focus on product
336
- - **Hidden build costs:** Integration with all services, alerting, dashboards, on-call for observability platform
337
-
338
- **Scenario: "Should finance approve this observability tooling?"**
339
- - **Cost:** $200K/year for observability seems expensive
340
- - **Value:** Reduce MTTR from 2 hours to 15 minutes
341
- - 100 incidents/year × 1.75 hours saved × 3 engineers × $100/hour = $52.5K/year
342
- - Prevented outages: 10/year × $50K revenue impact = $500K/year saved
343
- - **ROI:** $552.5K value ($52.5K + $500K) for $200K cost = 176% ROI
344
- - **Decision:** Approve - observability prevents costly outages
345
-
346
- **Scenario: "Terraform Cloud vs self-hosted Terraform?"**
347
- - **Terraform Cloud:** $20/user/month = $24K/year for 100 engineers
348
- - **Self-hosted:** Free but requires CI/CD integration, state management, RBAC
349
- - Engineering cost: $50K/year
350
- - **Decision:** Use Terraform Cloud unless you already have robust CI/CD for state management
351
-
352
- ---
353
-
354
- ## 3. Platform Investment ROI
355
-
356
- ### Calculating Platform ROI
357
-
358
- **Formula:**
359
- ```
360
- ROI = (Productivity Gains - Platform Cost) / Platform Cost × 100%
361
-
362
- Productivity Gains = (Time Saved × Engineer Count × Avg Salary)
363
- Platform Cost = (Team Cost + Infrastructure Cost)
364
- ```
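The formula translates directly into code. The sketch below is illustrative only: `platform_roi_percent` is a hypothetical name, and the inputs are the direct figures from the Internal Developer Platform example in this section (8 engineers at $200K, $400K infrastructure, about 5,000 hours saved at $100/hour).

```python
def platform_roi_percent(productivity_gains: float,
                         team_cost: float,
                         infrastructure_cost: float) -> float:
    """ROI = (Productivity Gains - Platform Cost) / Platform Cost x 100."""
    platform_cost = team_cost + infrastructure_cost
    return (productivity_gains - platform_cost) / platform_cost * 100


gains = 5_000 * 100  # hours saved per year x hourly rate
roi = platform_roi_percent(gains, team_cost=8 * 200_000, infrastructure_cost=400_000)
print(f"Direct ROI: {roi:.0f}%")
```

On direct time savings alone the result is strongly negative, which is why the case for a platform usually has to include indirect benefits such as faster time to market.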
365
-
366
- **Example: Internal Developer Platform**
367
- ```
368
- Investment:
369
- ├── Platform team: 8 engineers × $200K = $1.6M/year
370
- ├── Infrastructure: $400K/year
371
- └── Total Cost: $2M/year
372
-
373
- Productivity Gains:
374
- ├── Faster deployments: 2 hours/week saved × 50 engineers
375
- ├── Reduced incidents: 50% reduction = 10 hours/week saved
376
- ├── Faster onboarding: 2 weeks → 1 week for 20 new hires/year
377
- ├── Total time saved: ~5,000 hours/year
378
- └── Value: 5,000 hours × $100/hour = $500K/year
379
-
380
- Wait, that's negative ROI!
381
-
382
- But indirect benefits:
383
- ├── Faster time to market: 2 week reduction × 12 features = 24 weeks
384
- ├── Value of shipping faster: $5M revenue brought forward
385
- ├── Reduced risk: Fewer outages = better customer retention
386
- └── Improved hiring: Better developer experience attracts talent
387
-
388
- True ROI: Hard to quantify, but likely 3-5x over 3 years
389
- ```
390
-
391
- ### When to Invest in Platform
392
-
393
- **Invest when:**
394
- - Team size > 30-50 engineers
395
- - Development velocity slowing down
396
- - High cognitive load on engineers
397
- - Inconsistent practices across teams
398
- - Frequent production incidents
399
- - Hard to hire/onboard engineers
400
-
401
- **Don't invest when:**
402
- - Team < 30 engineers (not enough leverage)
403
- - Business model unproven (premature scaling)
404
- - Existential priorities (fundraising, shipping core product)
405
-
406
- ### ROI Calculation Scenarios
407
-
408
- **Scenario: "How do we calculate platform team ROI?"**
409
- - **Direct metrics:**
410
- - Deployment frequency: 1/week → 10/day
411
- - Lead time: 2 weeks → 2 days
412
- - MTTR: 4 hours → 30 minutes
413
- - Onboarding time: 4 weeks → 1 week
414
- - **Value calculation:**
415
- - 50 engineers × 5 hours/week saved = 250 hours/week
416
- - 250 hours × 50 weeks × $100/hour = $1.25M/year
417
- - **Platform cost:** 8 engineers × $200K = $1.6M
418
- - **ROI:** Direct savings cover ~78% of cost in year 1 ($1.25M vs $1.6M); break-even requires savings to grow or the intangibles below to count
419
- - **Intangibles:** Better hiring, less burnout, faster innovation
420
-
421
- **Scenario: "Justifying Kubernetes migration"**
422
- - **Cost of migration:** 6 months × 4 engineers = $400K
423
- - **Benefits:**
424
- - Better resource utilization: Save 30% on infrastructure = $150K/year
425
- - Faster deployments: 2 hours → 10 minutes = 100 hours/week saved = $250K/year
426
- - Multi-cloud optionality (intangible)
427
- - **Payback period:** 12-18 months
428
- - **Decision:** Worth it if infrastructure cost > $500K/year or scaling quickly
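The payback period follows from dividing the one-time migration cost by the annual benefit. A minimal sketch, assuming benefits arrive at the full rate from day one (`payback_months` is a hypothetical helper):

```python
def payback_months(one_time_cost: float, annual_benefit: float) -> float:
    """Months until cumulative benefit equals the one-time cost."""
    return one_time_cost / annual_benefit * 12


# Migration cost and the two quantified benefits from the scenario above.
months = payback_months(400_000, annual_benefit=150_000 + 250_000)
print(f"Payback: {months:.0f} months")
```

This gives the lower end of the 12-18 month range. The upper end is plausible once you account for benefits ramping up gradually during and after the migration.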
429
-
430
- **Scenario: "Platform team value - what should we measure?"**
431
- - **Avoid vanity metrics:**
432
- - ❌ Number of deployments (more isn't always better)
433
- - ❌ Lines of code (meaningless)
434
- - ❌ Tickets closed (focuses on wrong thing)
435
- - **Focus on impact metrics:**
436
- - ✅ Developer survey scores (NPS for platform)
437
- - ✅ Time to first deployment (new engineer)
438
- - ✅ DORA metrics (deployment frequency, lead time, MTTR, change failure rate)
439
- - ✅ Time saved per engineer per week
440
- - ✅ Incident reduction (fewer production issues)
441
-
442
- **Scenario: "Infrastructure cost per developer?"**
443
- - **Calculate:** Total infrastructure cost / number of engineers
444
- - **Benchmarks:**
445
- - Early stage: $2-5K per engineer/month
446
- - Scale-up: $5-10K per engineer/month
447
- - Enterprise: $10-20K per engineer/month
448
- - **High cost reasons:** Data-intensive, ML workloads, inefficient usage, overprovisioning
449
- - **Optimization:** Right-size instances, use spot/reserved, implement autoscaling
450
-
451
- **Scenario: "How do we measure developer velocity improvement?"**
452
- - **Lead Time for Changes:**
453
- - Before: 2 weeks from commit to production
454
- - After platform investment: 2 days
455
- - Improvement: 10x faster
456
- - **Developer satisfaction:**
457
- - Survey: "How easy is it to deploy a new service?" 1-10
458
- - Target: Improve from 4 → 8
459
- - **Time to productivity:**
460
- - New engineer: Productive in 1 week vs 4 weeks
461
- - Value: 3 weeks × 20 new hires/year = 60 weeks saved
462
-
463
- **Scenario: "Service mesh cost-benefit analysis"**
464
- - **Cost:**
465
- - 2 engineers × 6 months setup = $200K
466
- - Ongoing: 1 engineer × $200K/year
467
- - Overhead: 10% latency increase, 20% infrastructure increase = $100K/year
468
- - **Total:** $200K + $300K/year
469
- - **Benefit:**
470
- - mTLS everywhere (security win)
471
- - Traffic management (canary deploys)
472
- - Observability (better debugging)
473
- - **Value:** Hard to quantify - mainly security/compliance
474
- - **Decision:** Only do it if:
475
- - Security/compliance requirement
476
- - 50+ microservices
477
- - Sophisticated traffic management needs
478
-
479
- **Scenario: "Platform break-even point"**
480
- - **Question:** "When does investing in platform pay off?"
481
- - **Formula:** Break-even when (Time Saved Value) > (Platform Cost)
482
- - **Example:**
483
- - Platform team cost: $2M/year (10 engineers)
484
- - Time saved: 100 engineers × 10 hours/week × $100/hour = $5M/year
485
- - **Break-even:** Immediate (2.5x return)
486
- - **Reality:** Benefits compound - velocity improvements enable more velocity
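The same break-even formula can be inverted to ask how many engineers the platform must serve before it pays for itself. This is a sketch under stated assumptions: `break_even_engineers` is a hypothetical helper, and the 50 working weeks per year is an assumption that matches the $5M/year figure in the example.

```python
import math


def break_even_engineers(platform_cost: float,
                         hours_saved_per_week: float,
                         hourly_rate: float,
                         weeks_per_year: int = 50) -> int:
    """Smallest headcount whose annual time savings cover the platform cost."""
    value_per_engineer = hours_saved_per_week * weeks_per_year * hourly_rate
    return math.ceil(platform_cost / value_per_engineer)


annual_value = 100 * 10 * 50 * 100  # engineers x hours/week x weeks x $/hour
headcount = break_even_engineers(2_000_000, hours_saved_per_week=10, hourly_rate=100)
print(f"Annual value: ${annual_value:,}; break-even at {headcount} engineers")
```

With the example's inputs the break-even headcount is 40 engineers, which is consistent with the guidance elsewhere in this document to invest once the team passes 30-50 engineers.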
487
-
488
- **Scenario: "Opportunity cost of platform investment"**
489
- - **Question:** "What else could these 8 engineers build instead of platform?"
490
- - **Option A:** Platform team → enables 100 engineers to be 20% more productive = 20 FTE equivalent
491
- - **Option B:** Product team → ship 2-3 more features/year
492
- - **Trade-off:** Short-term features vs long-term productivity
493
- - **Decision:** At 50+ engineers, platform investment usually wins
494
-
495
- ### Investment Priorities by Stage
496
-
497
- **Startup (0-30 engineers):**
498
- ```
499
- Priority 1: Ship product, find product-market fit
500
- Infrastructure: Use managed services, don't build platform
501
- Investment: Observability, CI/CD (buy, don't build)
502
- ```
503
-
504
- **Scale-up (30-150 engineers):**
505
- ```
506
- Priority: Scale engineering productivity
507
- Infrastructure: Start investing in platform
508
- Investment:
509
- ├── Developer experience (CI/CD optimization, faster builds)
510
- ├── Observability (centralized logs, metrics, traces)
511
- ├── Self-service infrastructure (IaC templates, K8s)
512
- └── SRE function (reliability, on-call)
513
- ```
514
-
515
- **Enterprise (150+ engineers):**
516
- ```
517
- Priority: Maintain velocity as org scales
518
- Infrastructure: Platform as product
519
- Investment:
520
- ├── Internal developer platform (self-service everything)
521
- ├── Platform teams (dedicated orgs)
522
- ├── SRE org (production excellence)
523
- ├── Security org (AppSec, compliance)
524
- └── Data platform (analytics, ML)
525
- ```
526
-
527
- ---
528
-
529
- ## 4. Multi-Year Roadmapping
530
-
531
- ### Infrastructure Roadmap Framework
532
-
533
- **Year 1: Foundation**
534
- ```
535
- Q1-Q2: Stabilize
536
- ├── Production reliability (reduce incidents)
537
- ├── Observability (visibility into systems)
538
- ├── CI/CD basics (automated deployments)
539
- └── Security fundamentals (secrets management, scanning)
540
-
541
- Q3-Q4: Optimize
542
- ├── Developer experience improvements
543
- ├── Performance optimization
544
- ├── Cost optimization
545
- └── Team hiring and growth
546
- ```
547
-
548
- **Year 2: Scale**
549
- ```
550
- Q1-Q2: Platform Investment
551
- ├── Internal developer platform (IDP) foundation
552
- ├── Self-service infrastructure
553
- ├── Advanced observability (tracing, SLOs)
554
- └── Expand platform team
555
-
556
- Q3-Q4: Productivity
557
- ├── Faster deployments (reduce cycle time)
558
- ├── Better testing (reduce bugs)
559
- ├── Documentation and enablement
560
- └── Platform adoption
561
- ```
562
-
563
- **Year 3: Excellence**
564
- ```
565
- Q1-Q2: Maturity
566
- ├── Platform as product mindset
567
- ├── Multi-region/global infrastructure
568
- ├── Advanced security and compliance
569
- └── Disaster recovery and business continuity
570
-
571
- Q3-Q4: Innovation
572
- ├── Emerging technologies (ML, edge computing)
573
- ├── Next-generation architecture
574
- ├── Strategic bets
575
- └── Continuous improvement
576
- ```
577
-
578
- ### Balancing Roadmap
579
-
580
- **The 70-20-10 Rule:**
581
- - **70% Core Business:** Keep the lights on, support product roadmap
582
- - **20% Platform Investment:** Developer experience, reliability, security
583
- - **10% Innovation:** Experiments, R&D, emerging tech
584
-
585
- **Adjust by maturity:**
586
- - Early stage: 85% core, 10% platform, 5% innovation
587
- - Growth stage: 70% core, 20% platform, 10% innovation
588
- - Mature: 60% core, 25% platform, 15% innovation
589
-
590
- ### Roadmap Communication
591
-
592
- **Quarterly Infrastructure Review (with leadership):**
593
- ```
594
- 1. Last Quarter Recap (15 min)
595
- ├── What we shipped
596
- ├── Impact and metrics
597
- └── What we learned
598
-
599
- 2. This Quarter Plan (20 min)
600
- ├── Top 3-5 priorities
601
- ├── Resource allocation
602
- ├── Dependencies and risks
603
- └── Success criteria
604
-
605
- 3. Long-term Strategy (15 min)
606
- ├── Year-ahead preview
607
- ├── Strategic bets
608
- └── Investment needs
609
-
610
- 4. Q&A (10 min)
611
- ```
612
-
613
- ---
614
-
615
- ## 5. Technology Radar
616
-
617
- ### What is a Technology Radar?
618
-
619
- **A framework for tracking and evaluating technologies.**
620
-
621
- **Four Rings:**
622
- 1. **Adopt:** Proven, ready for production, recommended
623
- 2. **Trial:** Worth exploring, pilot projects
624
- 3. **Assess:** Interesting, but not ready yet
625
- 4. **Hold:** Avoid for now, or phase out
626
-
627
- **Four Quadrants:**
628
- 1. **Techniques:** Development practices, architectures
629
- 2. **Tools:** Software, frameworks, products
630
- 3. **Platforms:** Infrastructure, cloud services
631
- 4. **Languages & Frameworks:** Programming languages, libraries
632
-
633
- ### Example Technology Radar (Infrastructure)
634
-
635
- **ADOPT (Use in production):**
636
- ```
637
- ├── Kubernetes (Container orchestration)
638
- ├── Terraform (Infrastructure as Code)
639
- ├── GitHub Actions (CI/CD)
640
- ├── Datadog (Observability)
641
- ├── PostgreSQL (Relational database)
642
- └── AWS (Cloud platform)
643
- ```
644
-
645
- **TRIAL (Pilot projects):**
646
- ```
647
- ├── ArgoCD (GitOps)
648
- ├── Pulumi (IaC alternative to Terraform)
649
- ├── Temporal (Workflow orchestration)
650
- ├── ClickHouse (Analytics database)
651
- └── OpenTelemetry (Observability standard)
652
- ```
653
-
654
- **ASSESS (Evaluate):**
655
- ```
656
- ├── WebAssembly (Edge computing)
657
- ├── Serverless containers (AWS Fargate, Cloud Run)
658
- ├── Service mesh (Istio, Linkerd)
659
- └── eBPF (Observability and security)
660
- ```
661
-
662
- **HOLD (Avoid or deprecate):**
663
- ```
664
- ├── Monolithic architectures (favor microservices)
665
- ├── Manual deployments (automate everything)
666
- ├── Homegrown auth (use Auth0/Okta)
667
- └── [Legacy tool you're migrating from]
668
- ```
669
-
670
- ### Technology Evaluation Process
671
-
672
- **Before adopting new technology:**
673
- ```
674
- 1. Problem Validation
675
- └── What problem does this solve?
676
- └── Do we actually have this problem?
677
- └── How are we solving it today?
678
-
679
- 2. Technology Research
680
- └── Maturity: Production-ready? Stable?
681
- └── Community: Active? Well-supported?
682
- └── Ecosystem: Good documentation? Libraries? Integrations?
683
-
684
- 3. Proof of Concept
685
- └── Build small prototype (2-4 weeks max)
686
- └── Test with real use case
687
- └── Assess developer experience
688
-
689
- 4. Team Assessment
690
- └── Do we have skills? Can we learn?
691
- └── Can we operate and maintain this?
692
- └── What's the training investment?
693
-
694
- 5. Decision
695
- └── Adopt: Roll out to production
696
- └── Trial: More POCs, pilot projects
697
- └── Assess: Keep watching, not ready
698
- └── Hold: Not right for us, pass
699
-
700
- 6. Review Annually
701
- └── Revisit decisions yearly
702
- └── Move technologies between rings
703
- └── Deprecate old choices
704
- ```
705
-
706
- ---
707
-
708
- ## 6. Migration Planning (Executive Level)
709
-
710
- ### Types of Migrations
711
-
712
- **1. Cloud Migration (On-prem → Cloud)**
713
- ```
714
- Approaches:
715
- ├── Lift-and-shift (Rehost): Fast, minimal changes, technical debt
716
- ├── Replatform: Optimize for cloud (managed services, containers)
717
- ├── Refactor: Rewrite for cloud-native (microservices, serverless)
718
- └── Recommended: Hybrid (replatform most, refactor critical)
719
-
720
- Timeline: 12-36 months depending on scope
721
- Investment: 20-40% of engineering capacity
722
- Risk: Medium-High
723
- ```
724
-
725
- **2. Multi-Cloud (Single cloud → Multi-cloud)**
726
- ```
727
- Why:
728
- ├── Vendor negotiation leverage
729
- ├── Regulatory requirements (data sovereignty)
730
- ├── M&A integration
731
- └── Avoid vendor lock-in
732
-
733
- Cost: 2-3x operational overhead
734
- Timeline: 18-36 months
735
- Recommendation: Only if compelling business reason
736
- ```
737
-
738
- **3. Modernization (Monolith → Microservices)**
739
- ```
740
- Approach:
741
- ├── Strangler fig pattern (gradually extract services)
742
- ├── Don't rewrite everything at once
743
- └── Extract highest-value services first
744
-
745
- Timeline: 24-48 months
746
- Investment: 30-50% of engineering capacity
747
- Risk: High (many fail, scope creep)
748
- ```
749
-
750
- ### Migration Planning Framework
751
-
752
- **Phase 1: Assessment (2-3 months)**
753
- ```
754
- ├── Current state analysis
755
- │ ├── Inventory of systems
756
- │ ├── Dependencies mapped
757
- │ └── Technical debt identified
758
- ├── Target state definition
759
- │ ├── Architecture vision
760
- │ ├── Technology choices
761
- │ └── Success criteria
762
- └── Migration strategy
763
- ├── Wave planning (which systems, what order)
764
- ├── Risk assessment
765
- └── Resource planning
766
- ```
767
-
768
- **Phase 2: Pilot (3-6 months)**
769
- ```
770
- ├── Choose 1-2 non-critical systems
771
- ├── Migrate end-to-end
772
- ├── Learn and refine process
773
- ├── Build runbooks and automation
774
- └── Validate costs and effort estimates
775
- ```
776
-
777
- **Phase 3: Execution (12-24 months)**
778
- ```
779
- ├── Migrate in waves (monthly or quarterly)
780
- │ ├── Wave 1: Easy wins (stateless apps)
781
- │ ├── Wave 2: Medium complexity
782
- │ └── Wave 3: Complex/critical systems
783
- ├── Decommission old systems
784
- └── Continuous optimization
785
- ```
786
-
787
- **Phase 4: Optimization (Ongoing)**
788
- ```
789
- ├── Cost optimization
790
- ├── Performance tuning
791
- ├── Security hardening
792
- └── Team training
793
- ```
794
-
795
- ### Migration Risks and Mitigations
796
-
797
- | Risk | Impact | Mitigation |
798
- |------|--------|------------|
799
- | **Cost overruns** | Budget exceeded 2-3x | Detailed estimation, quarterly reviews, kill switch |
800
- | **Timeline delays** | Migration takes 2x longer | Conservative estimates, buffer time, phased approach |
801
- | **Data loss** | Critical data corrupted/lost | Backups, dual-write, rollback plan |
802
- | **Performance issues** | System slower after migration | Load testing, gradual rollout, performance baseline |
803
- | **Team burnout** | Engineers exhausted | Limit migration to 30-40% capacity, rotations |
804
- | **Vendor lock-in** | Stuck with new vendor | Design for portability (Kubernetes, IaC) |
805
-
806
- ---
807
-
808
- ## 7. Balancing Innovation vs Stability
809
-
810
- ### The Innovation Spectrum
811
-
812
- ```
813
- Bleeding Edge → Leading Edge → Mainstream → Legacy
814
- ↑ ↑ ↑ ↑
815
- High Risk Medium Risk Low Risk High Risk
816
- High Reward Medium Reward Low Reward Technical Debt
817
- ```
818
-
819
- **Where to be:**
820
- - **Core infrastructure:** Mainstream (proven, stable)
821
- - **Product features:** Leading edge (competitive advantage)
822
- - **Experiments:** Bleeding edge (limited blast radius)
823
- - **Legacy:** Migrate to mainstream
824
-
825
- ### Innovation Budget
826
-
827
- **Allocate engineering time:**
828
- ```
829
- ├── 70% Mainstream: Proven technologies, low risk
830
- ├── 20% Leading Edge: 1-2 year old, early adopters
831
- └── 10% Bleeding Edge: New, experimental, R&D
832
- ```
833
-
834
- **Example:**
835
- - Mainstream: Kubernetes, Postgres, AWS
836
- - Leading Edge: ArgoCD (GitOps), OpenTelemetry
837
- - Bleeding Edge: WebAssembly at edge, new ML frameworks
838
-
839
- ### Decision Framework: When to Adopt New Technology?
840
-
841
- **Adopt if:**
842
- - ✅ Solves real problem we have today
843
- - ✅ Mature enough (1-2 years in production elsewhere)
844
- - ✅ Active community and support
845
- - ✅ Team excited and willing to learn
846
- - ✅ Can pilot with low risk
847
-
848
- **Wait if:**
849
- - ❌ No clear problem it solves
850
- - ❌ Too new (< 1 year, frequent breaking changes)
851
- - ❌ Small community, unclear future
852
- - ❌ Team lacks bandwidth to learn
853
- - ❌ Can't fail safely
854
-
855
- ---
856
-
857
- ## Key Takeaways for Leaders
858
-
859
- 1. **Cloud strategy:** Single cloud for most, multi-cloud only if required
860
- 2. **Build vs buy:** Buy unless it's your core differentiator
861
- 3. **Platform ROI:** Invest when team > 30-50 engineers
862
- 4. **Roadmap balance:** 70% core, 20% platform, 10% innovation
863
- 5. **Technology radar:** Be deliberate about tech adoption
864
- 6. **Migration planning:** 12-36 months, 20-40% capacity
865
- 7. **Innovation budget:** 70% mainstream, 20% leading edge, 10% experimental
866
- 8. **Make reversible decisions:** Avoid vendor lock-in where possible
867
- 9. **Measure everything:** Track productivity, costs, reliability
868
- 10. **Think in years:** Infrastructure strategy is a long-term game
869
-
870
- **Remember:** Infrastructure strategy is about enabling your business to move faster, scale efficiently, and compete effectively - not about using the coolest technology.
871
-
872
- ---
873
-
874
- ## Templates
875
-
876
- ### Technology Decision Template
877
-
878
- ```markdown
879
- # Technology Decision: [Technology Name]
880
-
881
- ## Problem
882
- [What problem are we solving?]
883
-
884
- ## Proposed Solution
885
- [Technology/approach we're evaluating]
886
-
887
- ## Alternatives Considered
888
- 1. [Alternative 1]
889
- 2. [Alternative 2]
890
- 3. Status quo
891
-
892
- ## Evaluation
893
- | Criteria | Weight | Score (1-5) | Notes |
894
- |----------|--------|-------------|-------|
895
- | Solves problem | High | | |
896
- | Maturity | High | | |
897
- | Team skills | Medium | | |
898
- | Cost | Medium | | |
899
- | Vendor support | Low | | |
900
-
901
- ## Decision
902
- [Adopt | Trial | Assess | Hold]
903
-
904
- ## Next Steps
905
- - [ ] Prototype (if Trial)
906
- - [ ] Training plan
907
- - [ ] Migration plan
908
- - [ ] Success metrics
909
-
910
- ## Review Date
911
- [When we'll revisit this decision]
912
- ```
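The evaluation table in the template can be collapsed into a single comparable number with a weighted average. This is one possible sketch, not part of the template itself: the numeric mapping of High/Medium/Low to 3/2/1 is an assumption, and the scores below are made-up placeholders.

```python
# Assumed numeric mapping for the template's weight labels.
WEIGHTS = {"High": 3, "Medium": 2, "Low": 1}


def weighted_score(rows: list[tuple[str, str, int]]) -> float:
    """Weighted average of 1-5 scores; rows are (criterion, weight label, score)."""
    total_weight = sum(WEIGHTS[weight] for _, weight, _ in rows)
    weighted_sum = sum(WEIGHTS[weight] * score for _, weight, score in rows)
    return weighted_sum / total_weight


# Placeholder scores for the five criteria in the template.
rows = [
    ("Solves problem", "High", 5),
    ("Maturity", "High", 4),
    ("Team skills", "Medium", 3),
    ("Cost", "Medium", 3),
    ("Vendor support", "Low", 2),
]
score = weighted_score(rows)
print(f"Weighted score: {score:.2f} / 5")
```

Scoring each alternative (including the status quo) the same way makes the comparison explicit, though the result should inform the Adopt/Trial/Assess/Hold decision rather than replace it.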
913
-
914
- ---
915
-
916
- ## Integration with Other Skills
917
-
918
- **This skill works with:**
919
- - **technical-leadership** - Evaluating technical proposals, architecture reviews
920
- - **engineering-management** - Resource planning, team organization
921
- - **budget-and-cost-management** - Infrastructure budgets, cost optimization
922
- - **engineering-operations-management** - SRE strategy, reliability
923
-
924
- Your infrastructure strategy should enable your business strategy, not constrain it.