blockmine 1.24.0 → 1.27.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (476)
  1. package/CHANGELOG.md +76 -1
  2. package/README.en.md +427 -0
  3. package/README.md +40 -0
  4. package/backend/package.json +2 -2
  5. package/backend/prisma/migrations/20260328173000_add_plugin_source_ref/migration.sql +2 -0
  6. package/backend/prisma/migrations/migration_lock.toml +2 -2
  7. package/backend/prisma/schema.prisma +2 -0
  8. package/backend/src/ai/plugin-assistant-system-prompt.md +664 -5
  9. package/backend/src/api/routes/apiKeys.js +8 -0
  10. package/backend/src/api/routes/bots.js +271 -9
  11. package/backend/src/api/routes/eventGraphs.js +151 -1
  12. package/backend/src/api/routes/health.js +38 -0
  13. package/backend/src/api/routes/nodeRegistry.js +63 -0
  14. package/backend/src/api/routes/plugins.js +254 -29
  15. package/backend/src/api/routes/servers.js +14 -2
  16. package/backend/src/container.js +11 -8
  17. package/backend/src/core/BotCommandLoader.js +161 -0
  18. package/backend/src/core/BotConnection.js +125 -0
  19. package/backend/src/core/BotEventHandlers.js +234 -0
  20. package/backend/src/core/BotIPCHandler.js +445 -0
  21. package/backend/src/core/BotManager.js +15 -7
  22. package/backend/src/core/BotProcess.js +169 -140
  23. package/backend/src/core/EventGraphManager.js +7 -3
  24. package/backend/src/core/GraphDebugHandler.js +229 -0
  25. package/backend/src/core/GraphDebugIPC.js +117 -0
  26. package/backend/src/core/GraphExecutionEngine.js +545 -978
  27. package/backend/src/core/GraphTraversal.js +80 -0
  28. package/backend/src/core/GraphValidation.js +73 -0
  29. package/backend/src/core/NodeDefinition.js +138 -0
  30. package/backend/src/core/NodeRegistry.js +153 -141
  31. package/backend/src/core/PluginLoader.js +83 -3
  32. package/backend/src/core/PluginManager.js +346 -35
  33. package/backend/src/core/RewindSignal.js +9 -0
  34. package/backend/src/core/config/ConfigValidator.js +72 -0
  35. package/backend/src/core/config/FeatureFlags.js +52 -0
  36. package/backend/src/core/config/__tests__/ConfigValidator.test.js +232 -0
  37. package/backend/src/core/domain/entities/Bot.js +39 -0
  38. package/backend/src/core/domain/entities/Command.js +41 -0
  39. package/backend/src/core/domain/entities/EventGraph.js +39 -0
  40. package/backend/src/core/domain/entities/Plugin.js +45 -0
  41. package/backend/src/core/domain/entities/User.js +40 -0
  42. package/backend/src/core/domain/services/DependencyResolver.js +168 -0
  43. package/backend/src/core/domain/services/GraphValidator.js +117 -0
  44. package/backend/src/core/domain/services/PermissionChecker.js +34 -0
  45. package/backend/src/core/domain/services/__tests__/DependencyResolver.test.js +126 -0
  46. package/backend/src/core/domain/valueObjects/BotConfig.js +27 -0
  47. package/backend/src/core/domain/valueObjects/DependencyGraph.js +86 -0
  48. package/backend/src/core/domain/valueObjects/PluginManifest.js +36 -0
  49. package/backend/src/core/errors/BaseError.js +29 -0
  50. package/backend/src/core/errors/ErrorHandler.js +81 -0
  51. package/backend/src/core/errors/__tests__/ErrorHandler.test.js +188 -0
  52. package/backend/src/core/errors/index.js +68 -0
  53. package/backend/src/core/infrastructure/BatchingUtility.js +66 -0
  54. package/backend/src/core/infrastructure/CircuitBreaker.js +103 -0
  55. package/backend/src/core/infrastructure/ConnectionPool.js +81 -0
  56. package/backend/src/core/infrastructure/RateLimiter.js +64 -0
  57. package/backend/src/core/infrastructure/__tests__/BatchingUtility.test.js +86 -0
  58. package/backend/src/core/infrastructure/__tests__/CircuitBreaker.test.js +156 -0
  59. package/backend/src/core/infrastructure/__tests__/ConnectionPool.test.js +146 -0
  60. package/backend/src/core/infrastructure/__tests__/RateLimiter.test.js +171 -0
  61. package/backend/src/core/ipc/botApiFactory.js +72 -0
  62. package/backend/src/core/ipc/ipcMessageTypes.js +115 -0
  63. package/backend/src/core/logging/AuditLogger.js +61 -0
  64. package/backend/src/core/logging/StructuredLogger.js +80 -0
  65. package/backend/src/core/logging/__tests__/StructuredLogger.test.js +213 -0
  66. package/backend/src/core/logging/index.js +7 -0
  67. package/backend/src/core/metrics/MetricsCollector.js +104 -0
  68. package/backend/src/core/metrics/__tests__/MetricsCollector.test.js +131 -0
  69. package/backend/src/core/node-registries/actionsNodes.js +191 -0
  70. package/backend/src/core/node-registries/arraysNodes.js +152 -0
  71. package/backend/src/core/node-registries/botNodes.js +48 -0
  72. package/backend/src/core/node-registries/containerNodes.js +141 -0
  73. package/backend/src/core/node-registries/dataNodes.js +284 -0
  74. package/backend/src/core/node-registries/debugNodes.js +23 -0
  75. package/backend/src/core/node-registries/eventsNodes.js +223 -0
  76. package/backend/src/core/node-registries/flowNodes.js +151 -0
  77. package/backend/src/core/node-registries/furnaceNodes.js +123 -0
  78. package/backend/src/core/node-registries/index.js +108 -0
  79. package/backend/src/core/node-registries/inventory.js +102 -106
  80. package/backend/src/core/node-registries/logicNodes.js +54 -0
  81. package/backend/src/core/node-registries/mathNodes.js +38 -0
  82. package/backend/src/core/node-registries/navigationNodes.js +109 -0
  83. package/backend/src/core/node-registries/objectsNodes.js +90 -0
  84. package/backend/src/core/node-registries/stringsNodes.js +165 -0
  85. package/backend/src/core/node-registries/timeNodes.js +105 -0
  86. package/backend/src/core/node-registries/typeNodes.js +22 -0
  87. package/backend/src/core/node-registries/usersNodes.js +126 -0
  88. package/backend/src/core/nodes/arrays/shuffle.js +14 -0
  89. package/backend/src/core/nodes/bot/get_name.js +8 -0
  90. package/backend/src/core/nodes/bot/stop_bot.js +5 -0
  91. package/backend/src/core/nodes/container/open.js +101 -111
  92. package/backend/src/core/nodes/data/store_read.js +26 -0
  93. package/backend/src/core/nodes/data/store_write.js +23 -0
  94. package/backend/src/core/nodes/event/call_event.js +31 -0
  95. package/backend/src/core/nodes/event/custom_event.js +8 -0
  96. package/backend/src/core/nodes/flow/timer.js +35 -0
  97. package/backend/src/core/nodes/inventory/drop.js +73 -65
  98. package/backend/src/core/nodes/inventory/equip.js +54 -45
  99. package/backend/src/core/nodes/inventory/select_slot.js +48 -46
  100. package/backend/src/core/nodes/navigation/follow.js +54 -51
  101. package/backend/src/core/nodes/navigation/go_to.js +41 -53
  102. package/backend/src/core/nodes/navigation/go_to_entity.js +65 -69
  103. package/backend/src/core/nodes/navigation/go_to_player.js +65 -70
  104. package/backend/src/core/nodes/navigation/stop.js +17 -26
  105. package/backend/src/core/nodes/users/add_to_group.js +24 -0
  106. package/backend/src/core/nodes/users/check_permission.js +26 -0
  107. package/backend/src/core/nodes/users/remove_from_group.js +24 -0
  108. package/backend/src/core/services/BotIPCMessageRouter.js +337 -0
  109. package/backend/src/core/services/BotLifecycleService.js +43 -450
  110. package/backend/src/core/services/CacheManager.js +83 -23
  111. package/backend/src/core/services/CrashRestartManager.js +42 -0
  112. package/backend/src/core/services/DebugSessionManager.js +114 -12
  113. package/backend/src/core/services/EventGraphService.js +69 -0
  114. package/backend/src/core/services/MinecraftBotManager.js +9 -1
  115. package/backend/src/core/services/PluginManagementService.js +84 -0
  116. package/backend/src/core/services/TestModeContext.js +65 -0
  117. package/backend/src/core/services/__tests__/CacheManager.test.js +168 -0
  118. package/backend/src/core/services.js +1 -11
  119. package/backend/src/core/validation/InputValidator.js +167 -0
  120. package/backend/src/core/validation/__tests__/InputValidator.test.js +296 -0
  121. package/backend/src/real-time/botApi/index.js +1 -1
  122. package/backend/src/real-time/socketHandler.js +26 -0
  123. package/backend/src/server.js +21 -6
  124. package/frontend/dist/assets/browser-ponyfill-D8y0Ty7C.js +2 -0
  125. package/frontend/dist/assets/index-CFJLS0dk.css +32 -0
  126. package/frontend/dist/assets/index-D91UGNMG.js +11260 -0
  127. package/frontend/dist/flags/en.svg +32 -0
  128. package/frontend/dist/flags/ru.svg +5 -0
  129. package/frontend/dist/index.html +2 -2
  130. package/frontend/dist/locales/en/admin.json +100 -0
  131. package/frontend/dist/locales/en/api-keys.json +58 -0
  132. package/frontend/dist/locales/en/bots.json +113 -0
  133. package/frontend/dist/locales/en/common.json +53 -0
  134. package/frontend/dist/locales/en/configuration.json +22 -0
  135. package/frontend/dist/locales/en/console.json +10 -0
  136. package/frontend/dist/locales/en/dashboard.json +85 -0
  137. package/frontend/dist/locales/en/dialogs.json +70 -0
  138. package/frontend/dist/locales/en/event-graphs.json +50 -0
  139. package/frontend/dist/locales/en/graph-store.json +70 -0
  140. package/frontend/dist/locales/en/login.json +36 -0
  141. package/frontend/dist/locales/en/management.json +192 -0
  142. package/frontend/dist/locales/en/minecraft-viewer.json +27 -0
  143. package/frontend/dist/locales/en/nodes.json +1132 -0
  144. package/frontend/dist/locales/en/permissions.json +50 -0
  145. package/frontend/dist/locales/en/plugin-detail.json +69 -0
  146. package/frontend/dist/locales/en/plugins.json +329 -0
  147. package/frontend/dist/locales/en/proxies.json +81 -0
  148. package/frontend/dist/locales/en/servers.json +39 -0
  149. package/frontend/dist/locales/en/setup.json +19 -0
  150. package/frontend/dist/locales/en/sidebar.json +195 -0
  151. package/frontend/dist/locales/en/tasks.json +62 -0
  152. package/frontend/dist/locales/en/visual-editor.json +418 -0
  153. package/frontend/dist/locales/en/websocket.json +86 -0
  154. package/frontend/dist/locales/ru/admin.json +100 -0
  155. package/frontend/dist/locales/ru/api-keys.json +58 -0
  156. package/frontend/dist/locales/ru/bots.json +113 -0
  157. package/frontend/dist/locales/ru/common.json +49 -0
  158. package/frontend/dist/locales/ru/configuration.json +22 -0
  159. package/frontend/dist/locales/ru/console.json +10 -0
  160. package/frontend/dist/locales/ru/dashboard.json +85 -0
  161. package/frontend/dist/locales/ru/dialogs.json +70 -0
  162. package/frontend/dist/locales/ru/event-graphs.json +50 -0
  163. package/frontend/dist/locales/ru/graph-store.json +70 -0
  164. package/frontend/dist/locales/ru/login.json +36 -0
  165. package/frontend/dist/locales/ru/management.json +192 -0
  166. package/frontend/dist/locales/ru/minecraft-viewer.json +30 -0
  167. package/frontend/dist/locales/ru/nodes.json +1131 -0
  168. package/frontend/dist/locales/ru/permissions.json +50 -0
  169. package/frontend/dist/locales/ru/plugin-detail.json +49 -0
  170. package/frontend/dist/locales/ru/plugins.json +209 -0
  171. package/frontend/dist/locales/ru/proxies.json +81 -0
  172. package/frontend/dist/locales/ru/servers.json +39 -0
  173. package/frontend/dist/locales/ru/setup.json +19 -0
  174. package/frontend/dist/locales/ru/sidebar.json +195 -0
  175. package/frontend/dist/locales/ru/tasks.json +62 -0
  176. package/frontend/dist/locales/ru/visual-editor.json +420 -0
  177. package/frontend/dist/locales/ru/websocket.json +86 -0
  178. package/frontend/dist/monacoeditorwork/css.worker.bundle.js +7 -7
  179. package/frontend/dist/monacoeditorwork/html.worker.bundle.js +7 -7
  180. package/frontend/dist/monacoeditorwork/json.worker.bundle.js +7 -7
  181. package/frontend/dist/monacoeditorwork/ts.worker.bundle.js +3 -3
  182. package/frontend/package.json +6 -0
  183. package/nul +12 -0
  184. package/package.json +3 -3
  185. package/screen/3dviewer.png +0 -0
  186. package/screen/console.png +0 -0
  187. package/screen/dashboard.png +0 -0
  188. package/screen/graph_collabe.png +0 -0
  189. package/screen/graph_live_debug.png +0 -0
  190. package/screen/language_selector.png +0 -0
  191. package/screen/management_command.png +0 -0
  192. package/screen/node_debug_trace.png +0 -0
  193. package/screen/plugin_обзор.png +0 -0
  194. package/screen/websocket.png +0 -0
  195. package/screen/настройки_отдельных_команд_кажду_команлду_можно_настраивать.png +0 -0
  196. package/screen/планировщик_можно_задавать_действия_по_времени.png +0 -0
  197. package/.claude/agents/README.md +0 -469
  198. package/.claude/agents/auth-route-debugger.md +0 -118
  199. package/.claude/agents/auth-route-tester.md +0 -93
  200. package/.claude/agents/auto-error-resolver.md +0 -97
  201. package/.claude/agents/build-optimizer.md +0 -236
  202. package/.claude/agents/code-architect.md +0 -34
  203. package/.claude/agents/code-architecture-reviewer.md +0 -83
  204. package/.claude/agents/code-explorer.md +0 -51
  205. package/.claude/agents/code-refactor-master.md +0 -94
  206. package/.claude/agents/code-reviewer.md +0 -46
  207. package/.claude/agents/cost-optimizer.md +0 -134
  208. package/.claude/agents/deployment-orchestrator.md +0 -113
  209. package/.claude/agents/documentation-architect.md +0 -82
  210. package/.claude/agents/frontend-error-fixer.md +0 -77
  211. package/.claude/agents/iac-code-generator.md +0 -71
  212. package/.claude/agents/incident-responder.md +0 -346
  213. package/.claude/agents/infrastructure-architect.md +0 -31
  214. package/.claude/agents/kubernetes-specialist.md +0 -56
  215. package/.claude/agents/migration-planner.md +0 -181
  216. package/.claude/agents/network-architect.md +0 -196
  217. package/.claude/agents/plan-reviewer.md +0 -52
  218. package/.claude/agents/refactor-planner.md +0 -63
  219. package/.claude/agents/security-scanner.md +0 -102
  220. package/.claude/agents/web-research-specialist.md +0 -78
  221. package/.claude/commands/cost-analysis.md +0 -315
  222. package/.claude/commands/dev-docs-update.md +0 -55
  223. package/.claude/commands/dev-docs.md +0 -51
  224. package/.claude/commands/feature-dev.md +0 -125
  225. package/.claude/commands/incident-debug.md +0 -247
  226. package/.claude/commands/infra-plan.md +0 -81
  227. package/.claude/commands/migration-plan.md +0 -478
  228. package/.claude/commands/route-research-for-testing.md +0 -37
  229. package/.claude/commands/security-review.md +0 -66
  230. package/.claude/hooks/CONFIG.md +0 -448
  231. package/.claude/hooks/README.md +0 -163
  232. package/.claude/hooks/SKILL_ACTIVATION_COMPLETE.md +0 -226
  233. package/.claude/hooks/WINDOWS_HOOKS_README.md +0 -151
  234. package/.claude/hooks/add-skill-activation-banners.ts +0 -132
  235. package/.claude/hooks/comprehensive-skill-test.ts +0 -1315
  236. package/.claude/hooks/error-handling-reminder.sh +0 -12
  237. package/.claude/hooks/error-handling-reminder.ts +0 -222
  238. package/.claude/hooks/k8s-manifest-validator.sh +0 -56
  239. package/.claude/hooks/package-lock.json +0 -556
  240. package/.claude/hooks/package.json +0 -16
  241. package/.claude/hooks/post-tool-use-tracker.ps1 +0 -174
  242. package/.claude/hooks/post-tool-use-tracker.sh +0 -183
  243. package/.claude/hooks/security-policy-check.sh +0 -247
  244. package/.claude/hooks/skill-activation-prompt.ps1 +0 -10
  245. package/.claude/hooks/skill-activation-prompt.sh +0 -10
  246. package/.claude/hooks/skill-activation-prompt.ts +0 -141
  247. package/.claude/hooks/stop-build-check-enhanced.sh +0 -130
  248. package/.claude/hooks/terraform-validator.sh +0 -53
  249. package/.claude/hooks/test-input.json +0 -7
  250. package/.claude/hooks/test-skill-activation.ts +0 -427
  251. package/.claude/hooks/trigger-build-resolver.sh +0 -79
  252. package/.claude/hooks/tsc-check.sh +0 -173
  253. package/.claude/hooks/tsconfig.json +0 -19
  254. package/.claude/settings.json +0 -59
  255. package/.claude/settings.local.json +0 -67
  256. package/.claude/skills/README.md +0 -507
  257. package/.claude/skills/api-engineering/SKILL.md +0 -63
  258. package/.claude/skills/api-engineering/resources/api-versioning.md +0 -88
  259. package/.claude/skills/api-engineering/resources/graphql-patterns.md +0 -106
  260. package/.claude/skills/api-engineering/resources/rate-limiting.md +0 -118
  261. package/.claude/skills/api-engineering/resources/rest-api-design.md +0 -105
  262. package/.claude/skills/backend-dev-guidelines/SKILL.md +0 -306
  263. package/.claude/skills/backend-dev-guidelines/resources/architecture-overview.md +0 -451
  264. package/.claude/skills/backend-dev-guidelines/resources/async-and-errors.md +0 -307
  265. package/.claude/skills/backend-dev-guidelines/resources/complete-examples.md +0 -638
  266. package/.claude/skills/backend-dev-guidelines/resources/configuration.md +0 -275
  267. package/.claude/skills/backend-dev-guidelines/resources/database-patterns.md +0 -224
  268. package/.claude/skills/backend-dev-guidelines/resources/middleware-guide.md +0 -213
  269. package/.claude/skills/backend-dev-guidelines/resources/routing-and-controllers.md +0 -756
  270. package/.claude/skills/backend-dev-guidelines/resources/sentry-and-monitoring.md +0 -336
  271. package/.claude/skills/backend-dev-guidelines/resources/services-and-repositories.md +0 -789
  272. package/.claude/skills/backend-dev-guidelines/resources/testing-guide.md +0 -235
  273. package/.claude/skills/backend-dev-guidelines/resources/validation-patterns.md +0 -754
  274. package/.claude/skills/budget-and-cost-management/SKILL.md +0 -850
  275. package/.claude/skills/build-engineering/SKILL.md +0 -431
  276. package/.claude/skills/build-engineering/resources/artifact-repositories.md +0 -72
  277. package/.claude/skills/build-engineering/resources/build-caching.md +0 -96
  278. package/.claude/skills/build-engineering/resources/build-pipelines.md +0 -105
  279. package/.claude/skills/build-engineering/resources/build-security.md +0 -95
  280. package/.claude/skills/build-engineering/resources/build-systems.md +0 -389
  281. package/.claude/skills/build-engineering/resources/compilation-optimization.md +0 -201
  282. package/.claude/skills/build-engineering/resources/dependency-management.md +0 -73
  283. package/.claude/skills/build-engineering/resources/monorepo-builds.md +0 -110
  284. package/.claude/skills/build-engineering/resources/performance-optimization.md +0 -113
  285. package/.claude/skills/build-engineering/resources/reproducible-builds.md +0 -82
  286. package/.claude/skills/cloud-engineering/SKILL.md +0 -675
  287. package/.claude/skills/cloud-engineering/resources/aws-patterns.md +0 -742
  288. package/.claude/skills/cloud-engineering/resources/azure-patterns.md +0 -714
  289. package/.claude/skills/cloud-engineering/resources/cleared-cloud-environments.md +0 -987
  290. package/.claude/skills/cloud-engineering/resources/cloud-cost-optimization.md +0 -757
  291. package/.claude/skills/cloud-engineering/resources/cloud-networking.md +0 -1058
  292. package/.claude/skills/cloud-engineering/resources/cloud-security-tools.md +0 -1530
  293. package/.claude/skills/cloud-engineering/resources/cloud-security.md +0 -990
  294. package/.claude/skills/cloud-engineering/resources/gcp-patterns.md +0 -758
  295. package/.claude/skills/cloud-engineering/resources/migration-strategies.md +0 -820
  296. package/.claude/skills/cloud-engineering/resources/multi-cloud-strategies.md +0 -670
  297. package/.claude/skills/cloud-engineering/resources/oci-patterns.md +0 -1198
  298. package/.claude/skills/cloud-engineering/resources/serverless-patterns.md +0 -795
  299. package/.claude/skills/cloud-engineering/resources/well-architected-frameworks.md +0 -966
  300. package/.claude/skills/cybersecurity/SKILL.md +0 -409
  301. package/.claude/skills/cybersecurity/resources/security-architecture.md +0 -266
  302. package/.claude/skills/database-engineering/SKILL.md +0 -61
  303. package/.claude/skills/database-engineering/resources/backup-and-recovery.md +0 -72
  304. package/.claude/skills/database-engineering/resources/database-replication.md +0 -63
  305. package/.claude/skills/database-engineering/resources/postgresql-fundamentals.md +0 -70
  306. package/.claude/skills/database-engineering/resources/query-optimization.md +0 -68
  307. package/.claude/skills/devsecops/SKILL.md +0 -374
  308. package/.claude/skills/devsecops/resources/ci-cd-security.md +0 -204
  309. package/.claude/skills/devsecops/resources/compliance-automation.md +0 -530
  310. package/.claude/skills/devsecops/resources/compliance-frameworks.md +0 -2322
  311. package/.claude/skills/devsecops/resources/container-security.md +0 -915
  312. package/.claude/skills/devsecops/resources/cspm-integration.md +0 -1440
  313. package/.claude/skills/devsecops/resources/policy-enforcement.md +0 -619
  314. package/.claude/skills/devsecops/resources/secrets-management.md +0 -755
  315. package/.claude/skills/devsecops/resources/security-monitoring.md +0 -146
  316. package/.claude/skills/devsecops/resources/security-scanning.md +0 -887
  317. package/.claude/skills/devsecops/resources/security-testing.md +0 -203
  318. package/.claude/skills/devsecops/resources/supply-chain-security.md +0 -518
  319. package/.claude/skills/devsecops/resources/vulnerability-management.md +0 -481
  320. package/.claude/skills/devsecops/resources/zero-trust-architecture.md +0 -177
  321. package/.claude/skills/documentation-as-code/SKILL.md +0 -323
  322. package/.claude/skills/documentation-as-code/resources/api-documentation.md +0 -90
  323. package/.claude/skills/documentation-as-code/resources/changelog-management.md +0 -79
  324. package/.claude/skills/documentation-as-code/resources/diagram-generation.md +0 -44
  325. package/.claude/skills/documentation-as-code/resources/docs-as-code-workflow.md +0 -99
  326. package/.claude/skills/documentation-as-code/resources/documentation-automation.md +0 -68
  327. package/.claude/skills/documentation-as-code/resources/documentation-sites.md +0 -79
  328. package/.claude/skills/documentation-as-code/resources/markdown-best-practices.md +0 -162
  329. package/.claude/skills/documentation-as-code/resources/openapi-specification.md +0 -77
  330. package/.claude/skills/documentation-as-code/resources/readme-engineering.md +0 -60
  331. package/.claude/skills/documentation-as-code/resources/technical-writing-guide.md +0 -202
  332. package/.claude/skills/engineering-management/SKILL.md +0 -356
  333. package/.claude/skills/engineering-management/resources/career-ladders.md +0 -609
  334. package/.claude/skills/engineering-management/resources/hiring-and-assessment.md +0 -555
  335. package/.claude/skills/engineering-management/resources/one-on-one-guides.md +0 -609
  336. package/.claude/skills/engineering-management/resources/resource-planning.md +0 -557
  337. package/.claude/skills/engineering-management/resources/team-organization-patterns.md +0 -491
  338. package/.claude/skills/engineering-management/resources/technical-interviews.md +0 -474
  339. package/.claude/skills/engineering-operations-management/SKILL.md +0 -817
  340. package/.claude/skills/error-tracking/SKILL.md +0 -379
  341. package/.claude/skills/frontend-design/SKILL.md +0 -42
  342. package/.claude/skills/frontend-dev-guidelines/SKILL.md +0 -403
  343. package/.claude/skills/frontend-dev-guidelines/resources/common-patterns.md +0 -331
  344. package/.claude/skills/frontend-dev-guidelines/resources/complete-examples.md +0 -872
  345. package/.claude/skills/frontend-dev-guidelines/resources/component-patterns.md +0 -502
  346. package/.claude/skills/frontend-dev-guidelines/resources/data-fetching.md +0 -767
  347. package/.claude/skills/frontend-dev-guidelines/resources/file-organization.md +0 -502
  348. package/.claude/skills/frontend-dev-guidelines/resources/loading-and-error-states.md +0 -501
  349. package/.claude/skills/frontend-dev-guidelines/resources/performance.md +0 -406
  350. package/.claude/skills/frontend-dev-guidelines/resources/routing-guide.md +0 -364
  351. package/.claude/skills/frontend-dev-guidelines/resources/styling-guide.md +0 -428
  352. package/.claude/skills/frontend-dev-guidelines/resources/typescript-standards.md +0 -418
  353. package/.claude/skills/general-it-engineering/SKILL.md +0 -393
  354. package/.claude/skills/general-it-engineering/resources/asset-management.md +0 -712
  355. package/.claude/skills/general-it-engineering/resources/automation-orchestration.md +0 -817
  356. package/.claude/skills/general-it-engineering/resources/business-continuity.md +0 -786
  357. package/.claude/skills/general-it-engineering/resources/change-management.md +0 -715
  358. package/.claude/skills/general-it-engineering/resources/enterprise-monitoring.md +0 -729
  359. package/.claude/skills/general-it-engineering/resources/help-desk-operations.md +0 -738
  360. package/.claude/skills/general-it-engineering/resources/incident-service-management.md +0 -834
  361. package/.claude/skills/general-it-engineering/resources/it-governance.md +0 -753
  362. package/.claude/skills/general-it-engineering/resources/itil-framework.md +0 -503
  363. package/.claude/skills/general-it-engineering/resources/service-management.md +0 -669
  364. package/.claude/skills/infrastructure-architecture/SKILL.md +0 -328
  365. package/.claude/skills/infrastructure-architecture/resources/architecture-decision-records.md +0 -505
  366. package/.claude/skills/infrastructure-architecture/resources/architecture-patterns.md +0 -528
  367. package/.claude/skills/infrastructure-architecture/resources/capacity-planning.md +0 -453
  368. package/.claude/skills/infrastructure-architecture/resources/cleared-environment-architecture.md +0 -773
  369. package/.claude/skills/infrastructure-architecture/resources/cost-architecture.md +0 -499
  370. package/.claude/skills/infrastructure-architecture/resources/data-architecture.md +0 -501
  371. package/.claude/skills/infrastructure-architecture/resources/disaster-recovery.md +0 -535
  372. package/.claude/skills/infrastructure-architecture/resources/migration-architecture.md +0 -512
  373. package/.claude/skills/infrastructure-architecture/resources/multi-region-design.md +0 -608
  374. package/.claude/skills/infrastructure-architecture/resources/reference-architectures.md +0 -562
  375. package/.claude/skills/infrastructure-architecture/resources/security-architecture.md +0 -538
  376. package/.claude/skills/infrastructure-architecture/resources/system-design-principles.md +0 -489
  377. package/.claude/skills/infrastructure-architecture/resources/workload-classification.md +0 -1000
  378. package/.claude/skills/infrastructure-strategy/SKILL.md +0 -924
  379. package/.claude/skills/network-engineering/SKILL.md +0 -385
  380. package/.claude/skills/network-engineering/resources/dns-management.md +0 -738
  381. package/.claude/skills/network-engineering/resources/load-balancing.md +0 -820
  382. package/.claude/skills/network-engineering/resources/network-architecture.md +0 -546
  383. package/.claude/skills/network-engineering/resources/network-security.md +0 -921
  384. package/.claude/skills/network-engineering/resources/network-troubleshooting.md +0 -749
  385. package/.claude/skills/network-engineering/resources/routing-switching.md +0 -373
  386. package/.claude/skills/network-engineering/resources/sdn-networking.md +0 -695
  387. package/.claude/skills/network-engineering/resources/service-mesh-networking.md +0 -777
  388. package/.claude/skills/network-engineering/resources/tcp-ip-protocols.md +0 -444
  389. package/.claude/skills/network-engineering/resources/vpn-connectivity.md +0 -672
  390. package/.claude/skills/node-development/SKILL.md +0 -317
  391. package/.claude/skills/observability-engineering/SKILL.md +0 -101
  392. package/.claude/skills/observability-engineering/resources/apm-tools.md +0 -97
  393. package/.claude/skills/observability-engineering/resources/correlation-strategies.md +0 -87
  394. package/.claude/skills/observability-engineering/resources/distributed-tracing.md +0 -98
  395. package/.claude/skills/observability-engineering/resources/logs-aggregation.md +0 -118
  396. package/.claude/skills/observability-engineering/resources/observability-cost-optimization.md +0 -141
  397. package/.claude/skills/observability-engineering/resources/opentelemetry.md +0 -110
  398. package/.claude/skills/platform-engineering/SKILL.md +0 -555
  399. package/.claude/skills/platform-engineering/resources/architecture-overview.md +0 -600
  400. package/.claude/skills/platform-engineering/resources/container-orchestration.md +0 -916
  401. package/.claude/skills/platform-engineering/resources/cost-optimization.md +0 -634
  402. package/.claude/skills/platform-engineering/resources/developer-platforms.md +0 -670
  403. package/.claude/skills/platform-engineering/resources/gitops-automation.md +0 -650
  404. package/.claude/skills/platform-engineering/resources/infrastructure-as-code.md +0 -778
  405. package/.claude/skills/platform-engineering/resources/infrastructure-standards.md +0 -708
  406. package/.claude/skills/platform-engineering/resources/multi-tenancy.md +0 -602
  407. package/.claude/skills/platform-engineering/resources/platform-security.md +0 -711
  408. package/.claude/skills/platform-engineering/resources/resource-management.md +0 -592
  409. package/.claude/skills/platform-engineering/resources/service-mesh.md +0 -628
  410. package/.claude/skills/release-engineering/SKILL.md +0 -393
  411. package/.claude/skills/release-engineering/resources/artifact-management.md +0 -108
  412. package/.claude/skills/release-engineering/resources/build-optimization.md +0 -84
  413. package/.claude/skills/release-engineering/resources/ci-cd-pipelines.md +0 -411
  414. package/.claude/skills/release-engineering/resources/deployment-strategies.md +0 -197
  415. package/.claude/skills/release-engineering/resources/pipeline-security.md +0 -62
  416. package/.claude/skills/release-engineering/resources/progressive-delivery.md +0 -83
  417. package/.claude/skills/release-engineering/resources/release-automation.md +0 -68
  418. package/.claude/skills/release-engineering/resources/release-orchestration.md +0 -77
  419. package/.claude/skills/release-engineering/resources/rollback-strategies.md +0 -66
  420. package/.claude/skills/release-engineering/resources/versioning-strategies.md +0 -59
  421. package/.claude/skills/route-tester/SKILL.md +0 -392
  422. package/.claude/skills/skill-developer/ADVANCED.md +0 -197
  423. package/.claude/skills/skill-developer/HOOK_MECHANISMS.md +0 -306
  424. package/.claude/skills/skill-developer/PATTERNS_LIBRARY.md +0 -152
  425. package/.claude/skills/skill-developer/SKILL.md +0 -430
  426. package/.claude/skills/skill-developer/SKILL_RULES_REFERENCE.md +0 -315
  427. package/.claude/skills/skill-developer/TRIGGER_TYPES.md +0 -305
  428. package/.claude/skills/skill-developer/TROUBLESHOOTING.md +0 -514
  429. package/.claude/skills/skill-rules.json +0 -2989
  430. package/.claude/skills/sre/SKILL.md +0 -464
  431. package/.claude/skills/sre/resources/alerting-best-practices.md +0 -282
  432. package/.claude/skills/sre/resources/capacity-planning.md +0 -226
  433. package/.claude/skills/sre/resources/chaos-engineering.md +0 -193
  434. package/.claude/skills/sre/resources/disaster-recovery.md +0 -232
  435. package/.claude/skills/sre/resources/incident-management.md +0 -436
  436. package/.claude/skills/sre/resources/observability-stack.md +0 -240
  437. package/.claude/skills/sre/resources/on-call-runbooks.md +0 -167
  438. package/.claude/skills/sre/resources/performance-optimization.md +0 -108
  439. package/.claude/skills/sre/resources/reliability-patterns.md +0 -183
  440. package/.claude/skills/sre/resources/slo-sli-sla.md +0 -464
  441. package/.claude/skills/sre/resources/toil-reduction.md +0 -145
  442. package/.claude/skills/systems-engineering/SKILL.md +0 -648
  443. package/.claude/skills/systems-engineering/resources/automation-patterns.md +0 -771
  444. package/.claude/skills/systems-engineering/resources/configuration-management.md +0 -998
  445. package/.claude/skills/systems-engineering/resources/linux-administration.md +0 -672
  446. package/.claude/skills/systems-engineering/resources/networking-fundamentals.md +0 -982
  447. package/.claude/skills/systems-engineering/resources/performance-tuning.md +0 -871
  448. package/.claude/skills/systems-engineering/resources/powershell-scripting.md +0 -482
  449. package/.claude/skills/systems-engineering/resources/security-hardening.md +0 -739
  450. package/.claude/skills/systems-engineering/resources/shell-scripting.md +0 -915
  451. package/.claude/skills/systems-engineering/resources/storage-management.md +0 -628
  452. package/.claude/skills/systems-engineering/resources/system-monitoring.md +0 -787
  453. package/.claude/skills/systems-engineering/resources/troubleshooting-guide.md +0 -753
  454. package/.claude/skills/systems-engineering/resources/windows-administration.md +0 -738
  455. package/.claude/skills/technical-leadership/SKILL.md +0 -728
  456. package/backend/docs/SECRETS_DOCUMENTATION.md +0 -327
  457. package/backend/package-lock.json +0 -6801
  458. package/backend/src/core/node-registries/actions.js +0 -202
  459. package/backend/src/core/node-registries/arrays.js +0 -155
  460. package/backend/src/core/node-registries/bot.js +0 -23
  461. package/backend/src/core/node-registries/container.js +0 -162
  462. package/backend/src/core/node-registries/data.js +0 -290
  463. package/backend/src/core/node-registries/debug.js +0 -26
  464. package/backend/src/core/node-registries/events.js +0 -201
  465. package/backend/src/core/node-registries/flow.js +0 -139
  466. package/backend/src/core/node-registries/furnace.js +0 -143
  467. package/backend/src/core/node-registries/logic.js +0 -62
  468. package/backend/src/core/node-registries/math.js +0 -42
  469. package/backend/src/core/node-registries/navigation.js +0 -111
  470. package/backend/src/core/node-registries/objects.js +0 -98
  471. package/backend/src/core/node-registries/strings.js +0 -187
  472. package/backend/src/core/node-registries/time.js +0 -113
  473. package/backend/src/core/node-registries/type.js +0 -25
  474. package/backend/src/core/node-registries/users.js +0 -79
  475. package/frontend/dist/assets/index-BC-NbKXi.css +0 -32
  476. package/frontend/dist/assets/index-DqJXZMHY.js +0 -11266
@@ -1,464 +0,0 @@
1
- # SLO, SLI, and SLA - Service Level Objectives, Indicators, and Agreements
2
-
3
- Defining SLIs/SLOs/SLAs, error budgets, measuring reliability, and example calculations for site reliability engineering.
4
-
5
- ## Table of Contents
6
-
7
- - [Definitions](#definitions)
8
- - [SLI - Service Level Indicators](#sli---service-level-indicators)
9
- - [SLO - Service Level Objectives](#slo---service-level-objectives)
10
- - [SLA - Service Level Agreements](#sla---service-level-agreements)
11
- - [Error Budgets](#error-budgets)
12
- - [Implementation](#implementation)
13
- - [Monitoring and Measurement](#monitoring-and-measurement)
14
- - [Best Practices](#best-practices)
15
-
16
- ## Definitions
17
-
18
- **SLI (Service Level Indicator):** Quantitative measure of service quality
19
- **SLO (Service Level Objective):** Target value for an SLI
20
- **SLA (Service Level Agreement):** Business agreement with consequences
21
-
22
- ```
23
- SLI: What we measure
24
-
25
- SLO: What we promise internally
26
-
27
- SLA: What we promise customers (with penalties)
28
- ```
29
-
30
- ## SLI - Service Level Indicators
31
-
32
- ### Common SLIs
33
-
34
- **Availability:**
35
- ```
36
- Availability = (Successful Requests / Total Requests) × 100%
37
-
38
- Example:
39
- 999,000 successful / 1,000,000 total = 99.9% availability
40
- ```
41
-
42
- **Latency:**
43
- ```
44
- Latency SLI = % of requests faster than threshold
45
-
46
- Example:
47
- 95% of requests complete within 200ms
48
- 99% of requests complete within 500ms
49
- ```
50
-
51
- **Error Rate:**
52
- ```
53
- Error Rate = (Failed Requests / Total Requests) × 100%
54
-
55
- Example:
56
- 100 errors / 100,000 requests = 0.1% error rate
57
- ```
58
-
59
- **Throughput:**
60
- ```
61
- Throughput = Requests per second (RPS)
62
-
63
- Example:
64
- 1,000 requests per second sustained
65
- ```
66
-
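The ratio-style SLIs above can be computed directly from raw counts. The following is a minimal Python sketch, not part of the removed file; the function names are hypothetical, and it assumes a request counts toward the latency SLI when it finishes within the threshold (inclusive).

```python
from typing import Sequence


def availability_pct(successful: int, total: int) -> float:
    """Availability = successful / total, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * successful / total


def error_rate_pct(failed: int, total: int) -> float:
    """Error rate = failed / total, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * failed / total


def latency_sli_pct(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Share of requests that finished within threshold_ms, as a percentage."""
    if not latencies_ms:
        raise ValueError("latencies_ms must not be empty")
    fast = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return 100.0 * fast / len(latencies_ms)


# The worked examples from the text, plus a small latency sample.
print(availability_pct(999_000, 1_000_000))
print(error_rate_pct(100, 100_000))
print(latency_sli_pct([120, 180, 150, 450], threshold_ms=200))
```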
67
- ### Prometheus Queries for SLIs
68
-
69
- **Availability SLI:**
70
- ```promql
71
- # Success rate over 30 days
72
- sum(rate(http_requests_total{status=~"2.."}[30d]))
73
- /
74
- sum(rate(http_requests_total[30d]))
75
- ```
76
-
77
- **Latency SLI (p95):**
78
- ```promql
79
- # 95th percentile latency
80
- histogram_quantile(0.95,
81
- sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
82
- )
83
- ```
84
-
85
- **Error Rate SLI:**
86
- ```promql
87
- # Error rate over 30 days
88
- sum(rate(http_requests_total{status=~"5.."}[30d]))
89
- /
90
- sum(rate(http_requests_total[30d]))
91
- ```
92
-
93
- ## SLO - Service Level Objectives
94
-
95
- ### Defining SLOs
96
-
97
- **Four Golden Signals:**
98
- 1. **Latency:** Request duration
99
- 2. **Traffic:** Request rate
100
- 3. **Errors:** Failed requests
101
- 4. **Saturation:** Resource utilization
102
-
103
- **Example SLOs:**
104
- ```yaml
105
- slos:
106
-   availability:
107
-     target: 99.9%
108
-     window: 30d
109
-     description: "Service is available and responding to requests"
110
-
111
-   latency:
112
-     target: 95%
113
-     threshold: 200ms
114
-     window: 30d
115
-     description: "95% of requests complete within 200ms"
116
-
117
-   error_rate:
118
-     target: 99.9%
119
-     window: 30d
120
-     description: "99.9% of requests succeed (0.1% error budget)"
121
- ```
122
-
123
- ### Availability Tiers
124
-
125
- ```
126
- 99.9% (three nines) = 43.2 minutes downtime/month
127
- 99.95% (three and a half nines) = 21.6 minutes downtime/month
128
- 99.99% (four nines) = 4.32 minutes downtime/month
129
- 99.999% (five nines) = 26 seconds downtime/month
130
- ```
131
-
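The downtime figures in the tiers table follow from multiplying the window length by `1 - target`. A short Python sketch of that arithmetic, assuming a 30-day month as the table does (the function name `allowed_downtime` is hypothetical):

```python
from datetime import timedelta


def allowed_downtime(slo_target: float,
                     window: timedelta = timedelta(days=30)) -> timedelta:
    """Downtime permitted by an availability target over a window."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    return window * (1.0 - slo_target)


for target in (0.999, 0.9995, 0.9999, 0.99999):
    minutes = allowed_downtime(target).total_seconds() / 60
    print(f"{target:.3%}: {minutes:.2f} minutes per 30 days")
```

With a 31-day month or a 365-day year the same function gives slightly different budgets, so the window should be stated next to any quoted figure.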
132
- ### SLO Document Example
133
-
134
- ```yaml
135
- # api-service-slo.yaml
136
- service: api-service
137
- owner: platform-team
138
- reviewed: 2024-01-15
139
-
140
- slos:
141
-   - name: availability
142
-     description: API endpoint availability
143
-     sli:
144
-       query: |
145
-         sum(rate(http_requests_total{job="api",status=~"2.."}[30d]))
146
-         /
147
-         sum(rate(http_requests_total{job="api"}[30d]))
148
-     target: 0.999 # 99.9%
149
-     window: 30d
150
-
151
-   - name: latency-p95
152
-     description: 95th percentile latency under 200ms
153
-     sli:
154
-       query: |
155
-         histogram_quantile(0.95,
156
-           sum(rate(http_request_duration_seconds_bucket{job="api"}[5m])) by (le)
157
-         )
158
-     target: 0.2 # 200ms
159
-     window: 30d
160
-
161
-   - name: error-rate
162
-     description: Error rate below 0.1%
163
-     sli:
164
-       query: |
165
-         sum(rate(http_requests_total{job="api",status=~"5.."}[30d]))
166
-         /
167
-         sum(rate(http_requests_total{job="api"}[30d]))
168
-     target: 0.001 # 0.1% errors = 99.9% success
169
-     window: 30d
170
-
171
- dependencies:
172
-   - database-service (99.95% SLO)
173
-   - cache-service (99.9% SLO)
174
-
175
- alerting:
176
-   burn_rate_fast: 14.4 # 2% error budget in 1 hour
177
-   burn_rate_slow: 6 # 5% error budget in 6 hours
178
- ```
179
-
180
- ## SLA - Service Level Agreements
181
-
182
- ### SLA vs SLO
183
-
184
- **SLO (Internal):**
185
- - Target: 99.9%
186
- - No financial penalty
187
- - Triggers internal response
188
-
189
- **SLA (Customer-Facing):**
190
- - Commitment: 99.5% (buffer below SLO)
191
- - Financial penalty if missed
192
- - Legal agreement
193
-
194
- ### SLA Example
195
-
196
- ```yaml
197
- # customer-sla.yaml
198
- service: api-platform
199
- effective_date: 2024-01-01
200
-
201
- commitments:
202
- availability:
203
- guarantee: 99.5%
204
- measurement_period: monthly
205
- exclusions:
206
- - Scheduled maintenance (with 48hr notice)
207
- - Customer-caused issues
208
- - Force majeure
209
-
210
- credits:
211
- 99.0% - 99.5%: 10% monthly fee credit
212
- 98.0% - 99.0%: 25% monthly fee credit
213
- < 98.0%: 50% monthly fee credit
214
-
215
- support:
216
- severity_1: 1 hour response time
217
- severity_2: 4 hours response time
218
- severity_3: 24 hours response time
219
-
220
- data_durability:
221
- guarantee: 99.999999999% (11 nines)
222
- ```
223
-
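The credit tiers in the SLA example amount to a simple lookup on measured monthly availability. The sketch below is illustrative only: the function name is hypothetical, and because the example does not say which tier owns the boundary values, it assumes each tier includes its lower bound and excludes its upper bound.

```python
def sla_credit_pct(measured_availability_pct: float) -> int:
    """Return the monthly fee credit (percent) for a measured availability.

    Boundary handling is an assumption: 99.5 earns no credit,
    99.0 falls in the 10% tier, and 98.0 falls in the 25% tier.
    """
    if measured_availability_pct >= 99.5:
        return 0   # guarantee met, no credit
    if measured_availability_pct >= 99.0:
        return 10
    if measured_availability_pct >= 98.0:
        return 25
    return 50


for measured in (99.7, 99.2, 98.4, 97.0):
    print(f"{measured}% availability -> {sla_credit_pct(measured)}% credit")
```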
224
- ## Error Budgets
225
-
226
- ### Concept
227
-
228
- ```
229
- Error Budget = 1 - SLO
230
-
231
- 99.9% SLO = 0.1% error budget
232
- = 43.2 minutes/month
233
- = 1,000 failed requests per million
234
- ```
235
-
236
- ### Error Budget Policy
237
-
238
- ```yaml
239
- # error-budget-policy.yaml
240
- error_budget_policy:
241
-   # When error budget > 0: Normal operations
242
-   when_budget_available:
243
-     - Deploy during business hours
244
-     - Accept reasonable risk
245
-     - Focus on feature velocity
246
-     - Continue experimentation
247
-
248
-   # When error budget exhausted: Freeze changes
249
-   when_budget_exhausted:
250
-     - Halt all feature deployments
251
-     - Focus on reliability improvements
252
-     - Root cause analysis required
253
-     - Only critical bug fixes allowed
254
-     - Emergency change approval needed
255
-
256
-   # When error budget critically low
257
-   when_budget_critical: # < 25% remaining
258
-     - Heightened change review
259
-     - Increased monitoring
260
-     - Reduce deployment frequency
261
-     - Prepare contingency plans
262
- ```
263
-
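One way to make the policy above actionable is to map the remaining budget to the policy branch that applies. This is a hedged sketch: the function name is hypothetical, the 25% cutoff comes from the policy comment, and treating exactly 0% as exhausted is an assumption.

```python
def budget_policy_state(budget_remaining_pct: float) -> str:
    """Pick the error-budget-policy branch for a remaining budget (percent)."""
    if budget_remaining_pct <= 0:
        return "when_budget_exhausted"
    if budget_remaining_pct < 25:
        return "when_budget_critical"
    return "when_budget_available"


for remaining in (60.0, 10.0, -5.0):
    print(f"{remaining:.0f}% remaining -> {budget_policy_state(remaining)}")
```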
264
- ### Error Budget Calculation
265
-
266
- ```python
267
- def calculate_error_budget(slo_target, total_requests, failed_requests):
268
-     """
269
-     Calculate error budget consumption
270
-
271
-     Args:
272
-         slo_target: Target SLO (e.g., 0.999 for 99.9%); must be below 1.0
273
-         total_requests: Total requests in period
274
-         failed_requests: Failed requests in period
275
-
276
-     Returns:
277
-         dict with error budget metrics
278
-     """
279
-     allowed_failures = total_requests * (1 - slo_target)
280
-     error_budget_consumed = failed_requests / allowed_failures
281
-
282
-     return {
283
-         'allowed_failures': allowed_failures,
284
-         'actual_failures': failed_requests,
285
-         'budget_consumed_pct': error_budget_consumed * 100,
286
-         'budget_remaining_pct': (1 - error_budget_consumed) * 100,
287
-         'is_exhausted': error_budget_consumed >= 1.0
288
-     }
289
-
290
- # Example
291
- result = calculate_error_budget(
292
-     slo_target=0.999,
293
-     total_requests=10_000_000,
294
-     failed_requests=5_000
295
- )
296
-
297
- print(f"Allowed failures: {result['allowed_failures']:,.0f}") # 10,000
298
- print(f"Actual failures: {result['actual_failures']:,}") # 5,000
299
- print(f"Budget consumed: {result['budget_consumed_pct']:.1f}%") # 50.0%
300
- print(f"Budget remaining: {result['budget_remaining_pct']:.1f}%") # 50.0%
301
- ```
302
-
303
- ## Implementation
304
-
305
- ### Prometheus Recording Rules
306
-
307
- ```yaml
308
- # prometheus-slo-rules.yaml
309
- groups:
310
-   - name: slo_recording_rules
311
-     interval: 30s
312
-     rules:
313
-       # Availability SLI
314
-       - record: slo:availability:ratio_rate30d
315
-         expr: |
316
-           sum(rate(http_requests_total{job="api",status=~"2.."}[30d]))
317
-           /
318
-           sum(rate(http_requests_total{job="api"}[30d]))
319
-
320
-       # Error budget remaining
321
-       - record: slo:error_budget:ratio
322
-         expr: |
323
-           1 - (
324
-             (1 - slo:availability:ratio_rate30d)
325
-             /
326
-             (1 - 0.999)
327
-           )
328
-
329
-       # Latency SLI
330
-       - record: slo:latency:p95_30d
331
-         expr: |
332
-           histogram_quantile(0.95,
333
-             sum(rate(http_request_duration_seconds_bucket{job="api"}[30d])) by (le)
334
-           )
335
- ```
336
-
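The `slo:error_budget:ratio` expression above can be checked by hand. The Python sketch below mirrors the same formula for a 99.9% target; the function name is hypothetical and the sample availabilities are made up for illustration.

```python
def error_budget_remaining(availability_ratio: float,
                           slo_target: float = 0.999) -> float:
    """Mirror of: 1 - ((1 - availability) / (1 - slo_target)).

    1.0 means the budget is untouched, 0.0 means it is exactly spent,
    and negative values mean the SLO has been violated.
    """
    if not 0.0 <= slo_target < 1.0:
        raise ValueError("slo_target must be in [0, 1)")
    return 1.0 - (1.0 - availability_ratio) / (1.0 - slo_target)


for availability in (1.0, 0.9995, 0.999, 0.998):
    remaining = error_budget_remaining(availability)
    print(f"availability={availability}: budget remaining {remaining:.2f}")
```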
337
- ### Alerting Rules
338
-
339
- ```yaml
340
- # prometheus-slo-alerts.yaml
341
- groups:
342
-   - name: slo_alerts
343
-     rules:
344
-       # Fast burn: 2% budget in 1 hour
345
-       - alert: ErrorBudgetBurnRateFast
346
-         expr: |
347
-           (
348
-             sum(rate(http_requests_total{job="api",status=~"5.."}[1h]))
349
-             /
350
-             sum(rate(http_requests_total{job="api"}[1h]))
351
-           ) > (14.4 * (1 - 0.999))
352
-         labels:
353
-           severity: critical
354
-         annotations:
355
-           summary: "Error budget burning too fast"
356
-           description: "2% of monthly error budget consumed in 1 hour"
357
-
358
-       # Slow burn: 5% budget in 6 hours
359
-       - alert: ErrorBudgetBurnRateSlow
360
-         expr: |
361
-           (
362
-             sum(rate(http_requests_total{job="api",status=~"5.."}[6h]))
363
-             /
364
-             sum(rate(http_requests_total{job="api"}[6h]))
365
-           ) > (6 * (1 - 0.999))
366
-         labels:
367
-           severity: warning
368
-         annotations:
369
-           summary: "Error budget burning at elevated rate"
370
-
371
-       # Budget exhausted
372
-       - alert: ErrorBudgetExhausted
373
-         expr: slo:error_budget:ratio <= 0
374
-         labels:
375
-           severity: critical
376
-         annotations:
377
-           summary: "Error budget fully consumed"
378
-           description: "Halt feature deployments, focus on reliability"
379
- ```
380
-
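The multipliers 14.4 and 6 in the burn-rate alerts are not arbitrary: a burn rate is the fraction of budget consumed, scaled by the ratio of the SLO window to the alert window. A small Python sketch of that derivation, assuming a 30-day (720-hour) SLO window; the function names are hypothetical.

```python
def burn_rate(budget_fraction: float,
              alert_window_hours: float,
              slo_window_hours: float = 30 * 24) -> float:
    """Burn rate that consumes budget_fraction of the budget in the alert window."""
    return budget_fraction * slo_window_hours / alert_window_hours


def error_ratio_threshold(rate: float, slo_target: float = 0.999) -> float:
    """Error ratio at which the given burn rate is reached."""
    return rate * (1.0 - slo_target)


fast = burn_rate(0.02, alert_window_hours=1)   # 2% of budget in 1 hour
slow = burn_rate(0.05, alert_window_hours=6)   # 5% of budget in 6 hours
print(f"fast burn rate: {fast:.1f}, alert above {error_ratio_threshold(fast):.4%} errors")
print(f"slow burn rate: {slow:.1f}, alert above {error_ratio_threshold(slow):.4%} errors")
```

A burn rate of 1 means the budget would last exactly the full SLO window.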
381
- ## Monitoring and Measurement
382
-
383
- ### Grafana Dashboard
384
-
385
- ```json
386
- {
387
- "dashboard": {
388
- "title": "SLO Dashboard",
389
- "panels": [
390
- {
391
- "title": "Error Budget Remaining",
392
- "type": "gauge",
393
- "targets": [{
394
- "expr": "slo:error_budget:ratio * 100"
395
- }],
396
- "thresholds": [
397
- { "value": 0, "color": "red" },
398
- { "value": 25, "color": "yellow" },
399
- { "value": 50, "color": "green" }
400
- ]
401
- },
402
- {
403
- "title": "Availability (30d)",
404
- "type": "stat",
405
- "targets": [{
406
- "expr": "slo:availability:ratio_rate30d * 100"
407
- }],
408
- "format": "percent"
409
- }
410
- ]
411
- }
412
- }
413
- ```
414
-
415
- ## Best Practices
416
-
417
- ### 1. Start Simple
418
-
419
- ```yaml
420
- # Begin with basic availability SLO
421
- initial_slo:
422
-   availability: 99.9%
423
-   measurement: request_success_rate
424
- ```
425
-
426
- ### 2. User-Centric SLIs
427
-
428
- ```
429
- ✅ Good: "95% of page loads complete in < 2s"
430
- ❌ Bad: "CPU usage < 80%"
431
- ```
432
-
433
- ### 3. Realistic Targets
434
-
435
- ```
436
- Don't aim for 100% - impossible and expensive
437
- 99.9% is appropriate for many services
438
- 99.99% only if business truly requires it
439
- ```
440
-
441
- ### 4. Define Measurement Windows
442
-
443
- ```
444
- Use 30-day rolling windows
445
- Shorter windows (1d, 7d) for faster feedback
446
- ```
447
-
448
- ### 5. Document Everything
449
-
450
- ```yaml
451
- # Include in SLO document:
452
- - What is measured
453
- - Why it matters
454
- - How it's calculated
455
- - Who owns it
456
- - Review frequency
457
- ```
458
-
459
- ---
460
-
461
- **Related Resources:**
462
- - [incident-management.md](incident-management.md) - Responding to SLO violations
463
- - [alerting-best-practices.md](alerting-best-practices.md) - SLO-based alerting
464
- - [observability-stack.md](observability-stack.md) - Monitoring implementation
@@ -1,145 +0,0 @@
1
- # Toil Reduction
2
-
3
- Identifying toil, automation opportunities, self-healing systems, eliminating manual work, and improving operational efficiency.
4
-
5
- ## What is Toil?
6
-
7
- **Toil Characteristics:**
8
- ```
9
- Manual - Requires human intervention
10
- Repetitive - Same task over and over
11
- Automatable - Could be automated
12
- Tactical - Interrupt-driven, reactive
13
- No enduring value - Doesn't improve system
14
- Scales linearly - More growth = more toil
15
- ```
16
-
17
- ## Identifying Toil
18
-
19
- **Toil Audit:**
20
- ```yaml
21
- # Track on-call time spent
22
- weekly_activities:
23
-   - task: Restart failed pods
24
-     time_spent: 2 hours
25
-     frequency: 15 times
26
-     toil_score: HIGH
27
-     automation_potential: HIGH
28
-
29
-   - task: Manual deployment
30
-     time_spent: 3 hours
31
-     frequency: 10 times
32
-     toil_score: CRITICAL
33
-     automation_potential: HIGH
34
-
35
-   - task: Update DNS records
36
-     time_spent: 30 minutes
37
-     frequency: 5 times
38
-     toil_score: MEDIUM
39
-     automation_potential: MEDIUM
40
- ```
41
-
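An audit like the one above can be turned into a ranked list and a toil share. This sketch makes two assumptions that the audit does not state: `time_spent` is the weekly total per task (not per occurrence), and the reference work week is 40 hours. The helper names are hypothetical.

```python
UNIT_HOURS = {"hour": 1.0, "hours": 1.0, "minute": 1 / 60, "minutes": 1 / 60}
WORK_WEEK_HOURS = 40.0  # assumption, not taken from the audit


def to_hours(text: str) -> float:
    """Convert strings such as '2 hours' or '30 minutes' to hours."""
    amount, unit = text.split()
    return float(amount) * UNIT_HOURS[unit]


weekly_activities = [
    {"task": "Restart failed pods", "time_spent": "2 hours"},
    {"task": "Manual deployment", "time_spent": "3 hours"},
    {"task": "Update DNS records", "time_spent": "30 minutes"},
]

# Rank tasks by weekly time cost, largest first.
ranked = sorted(weekly_activities,
                key=lambda a: to_hours(a["time_spent"]),
                reverse=True)
total_hours = sum(to_hours(a["time_spent"]) for a in weekly_activities)
toil_share_pct = 100.0 * total_hours / WORK_WEEK_HOURS

for activity in ranked:
    print(f"{activity['task']}: {to_hours(activity['time_spent']):.1f} h/week")
print(f"Total toil: {total_hours:.1f} h/week ({toil_share_pct:.2f}% of the week)")
```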
42
- ## Automation Examples
43
-
44
- **Auto-Remediation:**
45
- ```yaml
46
- # Kubernetes CronJob for cleanup
47
- apiVersion: batch/v1
48
- kind: CronJob
49
- metadata:
50
-   name: cleanup-failed-pods
51
- spec:
52
-   schedule: "*/30 * * * *"
53
-   jobTemplate:
54
-     spec:
55
-       template:
56
-         spec:
57
-           restartPolicy: OnFailure # Job pods must use OnFailure or Never
58
-           containers:
59
-             - name: cleanup
60
-               image: bitnami/kubectl
61
-               command: ["/bin/sh", "-c"]
62
-               args:
63
-                 - kubectl delete pods --field-selector status.phase=Failed
64
- ```
65
-
66
- **Self-Healing with Horizontal Pod Autoscaler:**
67
- ```yaml
68
- apiVersion: autoscaling/v2
69
- kind: HorizontalPodAutoscaler
70
- metadata:
71
-   name: api-hpa
72
- spec:
73
-   scaleTargetRef:
74
-     apiVersion: apps/v1
75
-     kind: Deployment
76
-     name: api
77
-   minReplicas: 3
78
-   maxReplicas: 50
79
-   metrics:
80
-     - type: Resource
81
-       resource:
82
-         name: cpu
83
-         target:
84
-           type: Utilization
85
-           averageUtilization: 70
86
- ```
87
-
88
- **Automated Deployment:**
89
- ```yaml
90
- # ArgoCD for GitOps
91
- apiVersion: argoproj.io/v1alpha1
92
- kind: Application
93
- metadata:
94
-   name: api-service
95
- spec:
96
-   destination:
97
-     namespace: production
98
-     server: https://kubernetes.default.svc
99
-   source:
100
-     path: k8s/production
101
-     repoURL: https://github.com/example/repo
102
-     targetRevision: main
103
-   syncPolicy:
104
-     automated:
105
-       prune: true
106
-       selfHeal: true
107
- ```
108
-
109
- ## Toil Reduction Strategies
110
-
111
- ### 1. Eliminate Manual Steps
112
-
113
- ```
114
- Before: SSH to server → restart service → check logs → update ticket
115
- After: kubectl rollout restart → auto-verification → auto-notification
116
- ```
117
-
118
- ### 2. Self-Service Platforms
119
-
120
- ```yaml
121
- # Developer self-service
122
- backstage_template:
123
-   - Create new service
124
-   - Provision infrastructure
125
-   - Set up CI/CD
126
-   - Configure monitoring
127
-   - All automated, no ops team needed
128
- ```
129
-
130
- ### 3. Intelligent Automation
131
-
132
- ```python
133
- # Auto-scale based on traffic patterns (is_business_hours, is_weekend, threshold, scale_up, and scale_down are assumed to be defined elsewhere)
134
- def intelligent_scaling(metrics):
135
-     if is_business_hours() and metrics['traffic'] > threshold:
136
-         scale_up()
137
-     elif is_weekend() and metrics['traffic'] < threshold:
138
-         scale_down()
139
- ```
140
-
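The scaling snippet above relies on helpers it does not define. A self-contained sketch of the same decision logic follows. It assumes business hours are Monday to Friday, 09:00 to 17:00 local time, and it returns a decision string instead of calling a real scaler; all names here are hypothetical.

```python
from datetime import datetime


def is_weekend(now: datetime) -> bool:
    """Saturday (5) and Sunday (6) count as the weekend."""
    return now.weekday() >= 5


def is_business_hours(now: datetime) -> bool:
    """Assumed business hours: weekdays, 09:00 inclusive to 17:00 exclusive."""
    return not is_weekend(now) and 9 <= now.hour < 17


def scaling_decision(traffic: float, threshold: float, now: datetime) -> str:
    """Return 'scale_up', 'scale_down', or 'hold' for the given traffic level."""
    if is_business_hours(now) and traffic > threshold:
        return "scale_up"
    if is_weekend(now) and traffic < threshold:
        return "scale_down"
    return "hold"


print(scaling_decision(1500, 1000, datetime(2024, 1, 15, 10, 0)))  # a Monday morning
print(scaling_decision(200, 1000, datetime(2024, 1, 13, 10, 0)))   # a Saturday morning
```

Passing `now` in explicitly keeps the function deterministic and easy to test.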
141
- ---
142
-
143
- **Related Resources:**
144
- - [chaos-engineering.md](chaos-engineering.md)
145
- - [reliability-patterns.md](reliability-patterns.md)