@jetrabbits/agentic 0.0.1

Files changed (440)
  1. package/AGENTS.md +143 -0
  2. package/README.md +154 -0
  3. package/agentic +1615 -0
  4. package/areas/devops/ci-cd/AGENTS.md +48 -0
  5. package/areas/devops/ci-cd/PROMPTS.md +7 -0
  6. package/areas/devops/ci-cd/prompts/onboard-repo.md +97 -0
  7. package/areas/devops/ci-cd/prompts/pipeline-debug.md +103 -0
  8. package/areas/devops/ci-cd/prompts/release-pipeline.md +115 -0
  9. package/areas/devops/ci-cd/rules/pipeline-standards.md +33 -0
  10. package/areas/devops/ci-cd/rules/quality-gates.md +24 -0
  11. package/areas/devops/ci-cd/rules/supply-chain-security.md +34 -0
  12. package/areas/devops/ci-cd/skills/artifact-management/SKILL.md +157 -0
  13. package/areas/devops/ci-cd/skills/build-optimization/SKILL.md +168 -0
  14. package/areas/devops/ci-cd/skills/github-actions-patterns/SKILL.md +190 -0
  15. package/areas/devops/ci-cd/skills/gitlab-ci-patterns/SKILL.md +169 -0
  16. package/areas/devops/ci-cd/skills/pipeline-security/SKILL.md +161 -0
  17. package/areas/devops/ci-cd/workflows/onboard-repo.md +73 -0
  18. package/areas/devops/ci-cd/workflows/pipeline-debug.md +66 -0
  19. package/areas/devops/ci-cd/workflows/release-pipeline.md +115 -0
  20. package/areas/devops/database-ops/AGENTS.md +47 -0
  21. package/areas/devops/database-ops/prompts/backup-verify.md +83 -0
  22. package/areas/devops/database-ops/prompts/db-incident.md +127 -0
  23. package/areas/devops/database-ops/rules/access-control.md +20 -0
  24. package/areas/devops/database-ops/rules/backup-policy.md +33 -0
  25. package/areas/devops/database-ops/rules/migration-runbook.md +32 -0
  26. package/areas/devops/database-ops/skills/backup-restore/SKILL.md +226 -0
  27. package/areas/devops/database-ops/skills/db-performance/SKILL.md +205 -0
  28. package/areas/devops/database-ops/skills/migration-safety/SKILL.md +155 -0
  29. package/areas/devops/database-ops/skills/postgres-operations/SKILL.md +156 -0
  30. package/areas/devops/database-ops/skills/redis-operations/SKILL.md +174 -0
  31. package/areas/devops/database-ops/workflows/backup-verify.md +107 -0
  32. package/areas/devops/database-ops/workflows/db-incident.md +86 -0
  33. package/areas/devops/devsecops/AGENTS.md +47 -0
  34. package/areas/devops/devsecops/prompts/policy-onboard.md +79 -0
  35. package/areas/devops/devsecops/prompts/security-scan-pipeline.md +131 -0
  36. package/areas/devops/devsecops/rules/container-security.md +22 -0
  37. package/areas/devops/devsecops/rules/policy-as-code.md +37 -0
  38. package/areas/devops/devsecops/rules/shift-left-policy.md +26 -0
  39. package/areas/devops/devsecops/skills/container-hardening/SKILL.md +146 -0
  40. package/areas/devops/devsecops/skills/opa-policies/SKILL.md +188 -0
  41. package/areas/devops/devsecops/skills/sbom-supply-chain/SKILL.md +165 -0
  42. package/areas/devops/devsecops/skills/secret-detection/SKILL.md +190 -0
  43. package/areas/devops/devsecops/skills/sigstore-signing/SKILL.md +184 -0
  44. package/areas/devops/devsecops/workflows/policy-onboard.md +104 -0
  45. package/areas/devops/devsecops/workflows/security-scan-pipeline.md +155 -0
  46. package/areas/devops/infrastructure/AGENTS.md +50 -0
  47. package/areas/devops/infrastructure/prompts/destroy-environment.md +81 -0
  48. package/areas/devops/infrastructure/prompts/drift-remediation.md +71 -0
  49. package/areas/devops/infrastructure/prompts/module-development.md +69 -0
  50. package/areas/devops/infrastructure/prompts/provision-environment.md +121 -0
  51. package/areas/devops/infrastructure/rules/iac-standards.md +80 -0
  52. package/areas/devops/infrastructure/rules/immutability.md +28 -0
  53. package/areas/devops/infrastructure/rules/secret-hygiene.md +53 -0
  54. package/areas/devops/infrastructure/rules/state-management.md +47 -0
  55. package/areas/devops/infrastructure/skills/ansible-playbooks/SKILL.md +174 -0
  56. package/areas/devops/infrastructure/skills/cost-optimization/SKILL.md +177 -0
  57. package/areas/devops/infrastructure/skills/drift-detection/SKILL.md +178 -0
  58. package/areas/devops/infrastructure/skills/state-management/SKILL.md +159 -0
  59. package/areas/devops/infrastructure/skills/terraform-modules/SKILL.md +169 -0
  60. package/areas/devops/infrastructure/workflows/destroy-environment.md +96 -0
  61. package/areas/devops/infrastructure/workflows/drift-remediation.md +66 -0
  62. package/areas/devops/infrastructure/workflows/module-development.md +101 -0
  63. package/areas/devops/infrastructure/workflows/provision-environment.md +96 -0
  64. package/areas/devops/kubernetes/AGENTS.md +57 -0
  65. package/areas/devops/kubernetes/PROMPTS.md +9 -0
  66. package/areas/devops/kubernetes/prompts/cluster-bootstrap.md +67 -0
  67. package/areas/devops/kubernetes/prompts/debug-workload.md +91 -0
  68. package/areas/devops/kubernetes/prompts/onboard-service.md +101 -0
  69. package/areas/devops/kubernetes/prompts/upgrade-cluster.md +63 -0
  70. package/areas/devops/kubernetes/rules/cluster-standards.md +51 -0
  71. package/areas/devops/kubernetes/rules/resource-governance.md +80 -0
  72. package/areas/devops/kubernetes/rules/upgrade-policy.md +52 -0
  73. package/areas/devops/kubernetes/rules/workload-security.md +64 -0
  74. package/areas/devops/kubernetes/skills/cluster-operations/SKILL.md +136 -0
  75. package/areas/devops/kubernetes/skills/helm-charts/SKILL.md +152 -0
  76. package/areas/devops/kubernetes/skills/network-policies/SKILL.md +169 -0
  77. package/areas/devops/kubernetes/skills/pod-troubleshooting/SKILL.md +129 -0
  78. package/areas/devops/kubernetes/skills/rbac-design/SKILL.md +148 -0
  79. package/areas/devops/kubernetes/skills/resource-tuning/SKILL.md +156 -0
  80. package/areas/devops/kubernetes/workflows/cluster-bootstrap.md +194 -0
  81. package/areas/devops/kubernetes/workflows/debug-workload.md +108 -0
  82. package/areas/devops/kubernetes/workflows/onboard-service.md +124 -0
  83. package/areas/devops/kubernetes/workflows/upgrade-cluster.md +165 -0
  84. package/areas/devops/networking/AGENTS.md +47 -0
  85. package/areas/devops/networking/prompts/onboard-ingress.md +119 -0
  86. package/areas/devops/networking/prompts/service-mesh-onboard.md +77 -0
  87. package/areas/devops/networking/rules/ingress-standards.md +17 -0
  88. package/areas/devops/networking/rules/network-segmentation.md +24 -0
  89. package/areas/devops/networking/rules/tls-policy.md +32 -0
  90. package/areas/devops/networking/skills/dns-management/SKILL.md +169 -0
  91. package/areas/devops/networking/skills/ingress-patterns/SKILL.md +165 -0
  92. package/areas/devops/networking/skills/service-mesh/SKILL.md +206 -0
  93. package/areas/devops/networking/skills/tls-termination/SKILL.md +198 -0
  94. package/areas/devops/networking/skills/vpc-design/SKILL.md +132 -0
  95. package/areas/devops/networking/workflows/onboard-ingress.md +64 -0
  96. package/areas/devops/networking/workflows/service-mesh-onboard.md +122 -0
  97. package/areas/devops/observability/AGENTS.md +48 -0
  98. package/areas/devops/observability/prompts/alert-investigation.md +117 -0
  99. package/areas/devops/observability/prompts/observability-stack-setup.md +99 -0
  100. package/areas/devops/observability/prompts/onboard-service-monitoring.md +79 -0
  101. package/areas/devops/observability/rules/alerting-standards.md +36 -0
  102. package/areas/devops/observability/rules/data-retention.md +19 -0
  103. package/areas/devops/observability/rules/golden-signals.md +28 -0
  104. package/areas/devops/observability/skills/distributed-tracing/SKILL.md +149 -0
  105. package/areas/devops/observability/skills/grafana-dashboards/SKILL.md +201 -0
  106. package/areas/devops/observability/skills/log-aggregation/SKILL.md +159 -0
  107. package/areas/devops/observability/skills/prometheus-alertmanager/SKILL.md +188 -0
  108. package/areas/devops/observability/skills/slo-implementation/SKILL.md +189 -0
  109. package/areas/devops/observability/workflows/alert-investigation.md +98 -0
  110. package/areas/devops/observability/workflows/observability-stack-setup.md +156 -0
  111. package/areas/devops/observability/workflows/onboard-service-monitoring.md +83 -0
  112. package/areas/devops/sre/AGENTS.md +48 -0
  113. package/areas/devops/sre/prompts/incident-response.md +129 -0
  114. package/areas/devops/sre/prompts/postmortem.md +101 -0
  115. package/areas/devops/sre/prompts/slo-review.md +125 -0
  116. package/areas/devops/sre/rules/error-budget-policy.md +25 -0
  117. package/areas/devops/sre/rules/on-call-standards.md +25 -0
  118. package/areas/devops/sre/rules/slo-policy.md +31 -0
  119. package/areas/devops/sre/skills/capacity-planning/SKILL.md +162 -0
  120. package/areas/devops/sre/skills/chaos-engineering/SKILL.md +186 -0
  121. package/areas/devops/sre/skills/incident-command/SKILL.md +119 -0
  122. package/areas/devops/sre/skills/postmortem-analysis/SKILL.md +104 -0
  123. package/areas/devops/sre/skills/slo-sli-design/SKILL.md +145 -0
  124. package/areas/devops/sre/workflows/incident-response.md +66 -0
  125. package/areas/devops/sre/workflows/postmortem.md +90 -0
  126. package/areas/devops/sre/workflows/slo-review.md +95 -0
  127. package/areas/software/backend/AGENTS.md +59 -0
  128. package/areas/software/backend/PROMPTS.md +50 -0
  129. package/areas/software/backend/README.md +48 -0
  130. package/areas/software/backend/prompts/add-migration.md +93 -0
  131. package/areas/software/backend/prompts/create-endpoint.md +97 -0
  132. package/areas/software/backend/prompts/debug-issue.md +87 -0
  133. package/areas/software/backend/prompts/develop-epic.md +83 -0
  134. package/areas/software/backend/prompts/develop-feature.md +91 -0
  135. package/areas/software/backend/prompts/refactor-module.md +79 -0
  136. package/areas/software/backend/prompts/test-feature.md +89 -0
  137. package/areas/software/backend/rules/architecture.md +20 -0
  138. package/areas/software/backend/rules/data_access.md +20 -0
  139. package/areas/software/backend/rules/security.md +20 -0
  140. package/areas/software/backend/rules/testing.md +19 -0
  141. package/areas/software/backend/skills/api-design/SKILL.md +170 -0
  142. package/areas/software/backend/skills/async-processing/SKILL.md +152 -0
  143. package/areas/software/backend/skills/database-modeling/SKILL.md +173 -0
  144. package/areas/software/backend/skills/observability/SKILL.md +162 -0
  145. package/areas/software/backend/skills/troubleshooting/SKILL.md +139 -0
  146. package/areas/software/backend/workflows/add-migration.md +79 -0
  147. package/areas/software/backend/workflows/create-endpoint.md +89 -0
  148. package/areas/software/backend/workflows/debug-issue.md +77 -0
  149. package/areas/software/backend/workflows/develop-epic.md +78 -0
  150. package/areas/software/backend/workflows/develop-feature.md +98 -0
  151. package/areas/software/backend/workflows/refactor-module.md +73 -0
  152. package/areas/software/backend/workflows/test-feature.md +67 -0
  153. package/areas/software/data-engineering/AGENTS.md +59 -0
  154. package/areas/software/data-engineering/PROMPTS.md +32 -0
  155. package/areas/software/data-engineering/prompts/backfill-data.md +107 -0
  156. package/areas/software/data-engineering/prompts/data-quality-incident.md +109 -0
  157. package/areas/software/data-engineering/prompts/lineage-trace.md +121 -0
  158. package/areas/software/data-engineering/prompts/new-model.md +117 -0
  159. package/areas/software/data-engineering/prompts/schema-migration.md +111 -0
  160. package/areas/software/data-engineering/rules/data-governance.md +11 -0
  161. package/areas/software/data-engineering/rules/pii-handling.md +19 -0
  162. package/areas/software/data-engineering/rules/pipeline-integrity.md +11 -0
  163. package/areas/software/data-engineering/rules/schema-management.md +21 -0
  164. package/areas/software/data-engineering/skills/data-modeling/SKILL.md +49 -0
  165. package/areas/software/data-engineering/skills/dbt-patterns/SKILL.md +43 -0
  166. package/areas/software/data-engineering/skills/lineage-governance/SKILL.md +38 -0
  167. package/areas/software/data-engineering/skills/orchestration/SKILL.md +35 -0
  168. package/areas/software/data-engineering/skills/quality-checks/SKILL.md +50 -0
  169. package/areas/software/data-engineering/skills/sql-optimization/SKILL.md +47 -0
  170. package/areas/software/data-engineering/skills/streaming-patterns/SKILL.md +48 -0
  171. package/areas/software/data-engineering/workflows/backfill-data.md +59 -0
  172. package/areas/software/data-engineering/workflows/data-quality-incident.md +64 -0
  173. package/areas/software/data-engineering/workflows/lineage-trace.md +56 -0
  174. package/areas/software/data-engineering/workflows/new-model.md +71 -0
  175. package/areas/software/data-engineering/workflows/schema-migration.md +67 -0
  176. package/areas/software/frontend/AGENTS.md +60 -0
  177. package/areas/software/frontend/PROMPTS.md +32 -0
  178. package/areas/software/frontend/prompts/a11y-fix.md +75 -0
  179. package/areas/software/frontend/prompts/bundle-analyze.md +75 -0
  180. package/areas/software/frontend/prompts/release-prep.md +83 -0
  181. package/areas/software/frontend/prompts/scaffold-component.md +69 -0
  182. package/areas/software/frontend/prompts/visual-regression.md +73 -0
  183. package/areas/software/frontend/rules/accessibility.md +16 -0
  184. package/areas/software/frontend/rules/architecture.md +29 -0
  185. package/areas/software/frontend/rules/performance.md +23 -0
  186. package/areas/software/frontend/rules/quality.md +12 -0
  187. package/areas/software/frontend/skills/a11y-audit/SKILL.md +61 -0
  188. package/areas/software/frontend/skills/api-integration/SKILL.md +58 -0
  189. package/areas/software/frontend/skills/component-design/SKILL.md +171 -0
  190. package/areas/software/frontend/skills/css-architecture/SKILL.md +146 -0
  191. package/areas/software/frontend/skills/error-handling/SKILL.md +55 -0
  192. package/areas/software/frontend/skills/performance-tuning/SKILL.md +58 -0
  193. package/areas/software/frontend/skills/state-management/SKILL.md +54 -0
  194. package/areas/software/frontend/skills/testing-patterns/SKILL.md +69 -0
  195. package/areas/software/frontend/workflows/a11y-fix.md +63 -0
  196. package/areas/software/frontend/workflows/bundle-analyze.md +56 -0
  197. package/areas/software/frontend/workflows/release-prep.md +66 -0
  198. package/areas/software/frontend/workflows/scaffold-component.md +67 -0
  199. package/areas/software/frontend/workflows/visual-regression.md +65 -0
  200. package/areas/software/full-stack/AGENTS.md +72 -0
  201. package/areas/software/full-stack/PROMPTS.md +66 -0
  202. package/areas/software/full-stack/prompts/backend-project-full-cycle.md +141 -0
  203. package/areas/software/full-stack/prompts/debug-issue.md +115 -0
  204. package/areas/software/full-stack/prompts/develop-feature.md +119 -0
  205. package/areas/software/full-stack/prompts/feature-implementation-flow.md +137 -0
  206. package/areas/software/full-stack/prompts/testing-ci-pipeline.md +119 -0
  207. package/areas/software/full-stack/rules/api-design-guide.md +24 -0
  208. package/areas/software/full-stack/rules/async-concurrency-guide.md +21 -0
  209. package/areas/software/full-stack/rules/backend-architecture-rule.md +41 -0
  210. package/areas/software/full-stack/rules/background-jobs-guide.md +20 -0
  211. package/areas/software/full-stack/rules/code-quality-guide.md +22 -0
  212. package/areas/software/full-stack/rules/database-access-guide.md +24 -0
  213. package/areas/software/full-stack/rules/database-migrations-guide.md +24 -0
  214. package/areas/software/full-stack/rules/domain-models-guide.md +28 -0
  215. package/areas/software/full-stack/rules/e2e-test-guide.md +18 -0
  216. package/areas/software/full-stack/rules/env-settings-guide.md +34 -0
  217. package/areas/software/full-stack/rules/error-handling-guide.md +20 -0
  218. package/areas/software/full-stack/rules/logging-observability-guide.md +22 -0
  219. package/areas/software/full-stack/rules/project-guide.md +34 -0
  220. package/areas/software/full-stack/rules/python-venv-guide.md +23 -0
  221. package/areas/software/full-stack/rules/security-guide.md +22 -0
  222. package/areas/software/full-stack/rules/svt-test-guide.md +17 -0
  223. package/areas/software/full-stack/rules/testing-ci-guide.md +25 -0
  224. package/areas/software/full-stack/skills/api-design-principles/SKILL.md +125 -0
  225. package/areas/software/full-stack/skills/api-design-principles/assets/api-design-checklist.md +155 -0
  226. package/areas/software/full-stack/skills/api-design-principles/assets/rest-api-template.py +182 -0
  227. package/areas/software/full-stack/skills/api-design-principles/references/graphql-schema-design.md +583 -0
  228. package/areas/software/full-stack/skills/api-design-principles/references/rest-best-practices.md +408 -0
  229. package/areas/software/full-stack/skills/api-design-principles/resources/implementation-playbook.md +513 -0
  230. package/areas/software/full-stack/skills/api-patterns/SKILL.md +81 -0
  231. package/areas/software/full-stack/skills/api-patterns/api-style.md +42 -0
  232. package/areas/software/full-stack/skills/api-patterns/auth.md +24 -0
  233. package/areas/software/full-stack/skills/api-patterns/documentation.md +26 -0
  234. package/areas/software/full-stack/skills/api-patterns/graphql.md +41 -0
  235. package/areas/software/full-stack/skills/api-patterns/rate-limiting.md +31 -0
  236. package/areas/software/full-stack/skills/api-patterns/response.md +37 -0
  237. package/areas/software/full-stack/skills/api-patterns/rest.md +40 -0
  238. package/areas/software/full-stack/skills/api-patterns/scripts/api_validator.py +211 -0
  239. package/areas/software/full-stack/skills/api-patterns/security-testing.md +122 -0
  240. package/areas/software/full-stack/skills/api-patterns/trpc.md +41 -0
  241. package/areas/software/full-stack/skills/api-patterns/versioning.md +22 -0
  242. package/areas/software/full-stack/skills/app-builder/SKILL.md +135 -0
  243. package/areas/software/full-stack/skills/app-builder/agent-coordination.md +71 -0
  244. package/areas/software/full-stack/skills/app-builder/feature-building.md +53 -0
  245. package/areas/software/full-stack/skills/app-builder/project-detection.md +34 -0
  246. package/areas/software/full-stack/skills/app-builder/scaffolding.md +118 -0
  247. package/areas/software/full-stack/skills/app-builder/tech-stack.md +40 -0
  248. package/areas/software/full-stack/skills/app-builder/templates/SKILL.md +39 -0
  249. package/areas/software/full-stack/skills/app-builder/templates/astro-static/TEMPLATE.md +76 -0
  250. package/areas/software/full-stack/skills/app-builder/templates/chrome-extension/TEMPLATE.md +92 -0
  251. package/areas/software/full-stack/skills/app-builder/templates/cli-tool/TEMPLATE.md +88 -0
  252. package/areas/software/full-stack/skills/app-builder/templates/electron-desktop/TEMPLATE.md +88 -0
  253. package/areas/software/full-stack/skills/app-builder/templates/express-api/TEMPLATE.md +83 -0
  254. package/areas/software/full-stack/skills/app-builder/templates/flutter-app/TEMPLATE.md +90 -0
  255. package/areas/software/full-stack/skills/app-builder/templates/monorepo-turborepo/TEMPLATE.md +90 -0
  256. package/areas/software/full-stack/skills/app-builder/templates/nextjs-fullstack/TEMPLATE.md +82 -0
  257. package/areas/software/full-stack/skills/app-builder/templates/nextjs-saas/TEMPLATE.md +100 -0
  258. package/areas/software/full-stack/skills/app-builder/templates/nextjs-static/TEMPLATE.md +106 -0
  259. package/areas/software/full-stack/skills/app-builder/templates/nuxt-app/TEMPLATE.md +101 -0
  260. package/areas/software/full-stack/skills/app-builder/templates/python-fastapi/TEMPLATE.md +83 -0
  261. package/areas/software/full-stack/skills/app-builder/templates/react-native-app/TEMPLATE.md +93 -0
  262. package/areas/software/full-stack/skills/backend-developer/SKILL.md +58 -0
  263. package/areas/software/full-stack/skills/bash-pro/SKILL.md +310 -0
  264. package/areas/software/full-stack/skills/blackbox-test/SKILL.md +84 -0
  265. package/areas/software/full-stack/skills/prompt-project-planner/SKILL.md +130 -0
  266. package/areas/software/full-stack/skills/prompt-project-planner/output.schema.md +68 -0
  267. package/areas/software/full-stack/skills/prompt-project-planner/questions.md +80 -0
  268. package/areas/software/full-stack/skills/python-pro/SKILL.md +158 -0
  269. package/areas/software/full-stack/skills/skill-creator/LICENSE.txt +202 -0
  270. package/areas/software/full-stack/skills/skill-creator/SKILL.md +356 -0
  271. package/areas/software/full-stack/skills/skill-creator/references/output-patterns.md +82 -0
  272. package/areas/software/full-stack/skills/skill-creator/references/workflows.md +28 -0
  273. package/areas/software/full-stack/skills/skill-creator/scripts/init_skill.py +303 -0
  274. package/areas/software/full-stack/skills/skill-creator/scripts/package_skill.py +110 -0
  275. package/areas/software/full-stack/skills/skill-creator/scripts/quick_validate.py +95 -0
  276. package/areas/software/full-stack/workflows/backend-project-full-cycle.md +132 -0
  277. package/areas/software/full-stack/workflows/debug-issue.md +70 -0
  278. package/areas/software/full-stack/workflows/develop-feature.md +85 -0
  279. package/areas/software/full-stack/workflows/feature-implementation-flow.md +78 -0
  280. package/areas/software/full-stack/workflows/testing-ci-pipeline.md +65 -0
  281. package/areas/software/general/AGENTS.md +68 -0
  282. package/areas/software/general/prompts/code-review-workflow.md +87 -0
  283. package/areas/software/general/prompts/development-cycle-workflow.md +83 -0
  284. package/areas/software/general/prompts/project-setup-workflow.md +93 -0
  285. package/areas/software/general/rules/code-style-guide.md +31 -0
  286. package/areas/software/general/rules/docker-compose-guide.md +27 -0
  287. package/areas/software/general/rules/git-workflow-guide.md +27 -0
  288. package/areas/software/general/rules/github-workflow-guide.md +27 -0
  289. package/areas/software/general/rules/gitlab-ci-guide.md +27 -0
  290. package/areas/software/general/rules/lint-format-guide.md +29 -0
  291. package/areas/software/general/rules/makefile-guide.md +34 -0
  292. package/areas/software/general/rules/readme-sync-guide.md +40 -0
  293. package/areas/software/general/rules/sdlc-methodology-guide.md +27 -0
  294. package/areas/software/general/rules/sdlc-role-responsibilities.md +108 -0
  295. package/areas/software/general/skills/general-dev-tools/SKILL.md +324 -0
  296. package/areas/software/general/workflows/code-review-workflow.md +84 -0
  297. package/areas/software/general/workflows/development-cycle-workflow.md +85 -0
  298. package/areas/software/general/workflows/project-setup-workflow.md +94 -0
  299. package/areas/software/mlops/AGENTS.md +57 -0
  300. package/areas/software/mlops/PROMPTS.md +32 -0
  301. package/areas/software/mlops/prompts/champion-challenger.md +87 -0
  302. package/areas/software/mlops/prompts/deploy-endpoint.md +91 -0
  303. package/areas/software/mlops/prompts/evaluate-model.md +87 -0
  304. package/areas/software/mlops/prompts/model-incident.md +87 -0
  305. package/areas/software/mlops/prompts/train-experiment.md +83 -0
  306. package/areas/software/mlops/rules/data-integrity.md +9 -0
  307. package/areas/software/mlops/rules/model-governance.md +9 -0
  308. package/areas/software/mlops/rules/production-safety.md +9 -0
  309. package/areas/software/mlops/rules/reproducibility.md +9 -0
  310. package/areas/software/mlops/skills/experiment-tracking/SKILL.md +29 -0
  311. package/areas/software/mlops/skills/feature-engineering/SKILL.md +44 -0
  312. package/areas/software/mlops/skills/inference-serving/SKILL.md +35 -0
  313. package/areas/software/mlops/skills/model-evaluation/SKILL.md +40 -0
  314. package/areas/software/mlops/skills/model-monitoring/SKILL.md +32 -0
  315. package/areas/software/mlops/workflows/champion-challenger.md +65 -0
  316. package/areas/software/mlops/workflows/deploy-endpoint.md +70 -0
  317. package/areas/software/mlops/workflows/evaluate-model.md +63 -0
  318. package/areas/software/mlops/workflows/model-incident.md +64 -0
  319. package/areas/software/mlops/workflows/train-experiment.md +56 -0
  320. package/areas/software/mobile/AGENTS.md +58 -0
  321. package/areas/software/mobile/PROMPTS.md +32 -0
  322. package/areas/software/mobile/prompts/crash-triage.md +63 -0
  323. package/areas/software/mobile/prompts/device-testing.md +83 -0
  324. package/areas/software/mobile/prompts/ota-update.md +75 -0
  325. package/areas/software/mobile/prompts/release-build.md +67 -0
  326. package/areas/software/mobile/prompts/store-submission.md +79 -0
  327. package/areas/software/mobile/rules/offline-first.md +10 -0
  328. package/areas/software/mobile/rules/performance-budget.md +20 -0
  329. package/areas/software/mobile/rules/platform-compliance.md +17 -0
  330. package/areas/software/mobile/rules/security-mobile.md +9 -0
  331. package/areas/software/mobile/skills/app-store-prep/SKILL.md +27 -0
  332. package/areas/software/mobile/skills/mobile-testing/SKILL.md +36 -0
  333. package/areas/software/mobile/skills/native-modules/SKILL.md +38 -0
  334. package/areas/software/mobile/skills/navigation-patterns/SKILL.md +49 -0
  335. package/areas/software/mobile/skills/push-notifications/SKILL.md +40 -0
  336. package/areas/software/mobile/skills/state-sync/SKILL.md +48 -0
  337. package/areas/software/mobile/workflows/crash-triage.md +63 -0
  338. package/areas/software/mobile/workflows/device-testing.md +54 -0
  339. package/areas/software/mobile/workflows/ota-update.md +54 -0
  340. package/areas/software/mobile/workflows/release-build.md +67 -0
  341. package/areas/software/mobile/workflows/store-submission.md +63 -0
  342. package/areas/software/platform/AGENTS.md +67 -0
  343. package/areas/software/platform/PROMPTS.md +32 -0
  344. package/areas/software/platform/prompts/cost-audit.md +117 -0
  345. package/areas/software/platform/prompts/deploy-production.md +109 -0
  346. package/areas/software/platform/prompts/drift-check.md +107 -0
  347. package/areas/software/platform/prompts/incident-response.md +121 -0
  348. package/areas/software/platform/prompts/provision-env.md +113 -0
  349. package/areas/software/platform/rules/cost-governance.md +11 -0
  350. package/areas/software/platform/rules/immutability.md +17 -0
  351. package/areas/software/platform/rules/reliability.md +19 -0
  352. package/areas/software/platform/rules/security-posture.md +12 -0
  353. package/areas/software/platform/skills/ci-cd-pipelines/SKILL.md +58 -0
  354. package/areas/software/platform/skills/incident-response/SKILL.md +41 -0
  355. package/areas/software/platform/skills/k8s-manifests/SKILL.md +56 -0
  356. package/areas/software/platform/skills/networking/SKILL.md +44 -0
  357. package/areas/software/platform/skills/observability-setup/SKILL.md +49 -0
  358. package/areas/software/platform/skills/secrets-management/SKILL.md +43 -0
  359. package/areas/software/platform/skills/terraform-patterns/SKILL.md +75 -0
  360. package/areas/software/platform/workflows/cost-audit.md +61 -0
  361. package/areas/software/platform/workflows/deploy-production.md +67 -0
  362. package/areas/software/platform/workflows/drift-check.md +61 -0
  363. package/areas/software/platform/workflows/incident-response.md +69 -0
  364. package/areas/software/platform/workflows/provision-env.md +77 -0
  365. package/areas/software/qa/AGENTS.md +58 -0
  366. package/areas/software/qa/PROMPTS.md +32 -0
  367. package/areas/software/qa/prompts/flakiness-investigation.md +61 -0
  368. package/areas/software/qa/prompts/performance-audit.md +65 -0
  369. package/areas/software/qa/prompts/regression-suite.md +61 -0
  370. package/areas/software/qa/prompts/smoke-test.md +65 -0
  371. package/areas/software/qa/prompts/test-coverage-report.md +61 -0
  372. package/areas/software/qa/rules/flakiness-policy.md +12 -0
  373. package/areas/software/qa/rules/quality-gates.md +28 -0
  374. package/areas/software/qa/rules/test-data.md +9 -0
  375. package/areas/software/qa/rules/test-strategy.md +11 -0
  376. package/areas/software/qa/skills/accessibility-testing/SKILL.md +139 -0
  377. package/areas/software/qa/skills/api-testing/SKILL.md +140 -0
  378. package/areas/software/qa/skills/e2e-patterns/SKILL.md +152 -0
  379. package/areas/software/qa/skills/performance-testing/SKILL.md +177 -0
  380. package/areas/software/qa/skills/test-data-management/SKILL.md +161 -0
  381. package/areas/software/qa/skills/test-pyramid/SKILL.md +127 -0
  382. package/areas/software/qa/workflows/flakiness-investigation.md +63 -0
  383. package/areas/software/qa/workflows/performance-audit.md +59 -0
  384. package/areas/software/qa/workflows/regression-suite.md +59 -0
  385. package/areas/software/qa/workflows/smoke-test.md +64 -0
  386. package/areas/software/qa/workflows/test-coverage-report.md +57 -0
  387. package/areas/software/security/AGENTS.md +58 -0
  388. package/areas/software/security/PROMPTS.md +32 -0
  389. package/areas/software/security/prompts/compliance-report.md +113 -0
  390. package/areas/software/security/prompts/pen-test-sim.md +113 -0
  391. package/areas/software/security/prompts/secret-rotation.md +115 -0
  392. package/areas/software/security/prompts/security-scan.md +91 -0
  393. package/areas/software/security/prompts/threat-model-review.md +105 -0
  394. package/areas/software/security/rules/compliance-baseline.md +23 -0
  395. package/areas/software/security/rules/dependency-policy.md +12 -0
  396. package/areas/software/security/rules/secrets-policy.md +22 -0
  397. package/areas/software/security/rules/secure-coding.md +22 -0
  398. package/areas/software/security/skills/auth-patterns/SKILL.md +42 -0
  399. package/areas/software/security/skills/crypto-standards/SKILL.md +42 -0
  400. package/areas/software/security/skills/dependency-audit/SKILL.md +29 -0
  401. package/areas/software/security/skills/sast-dast-interpretation/SKILL.md +33 -0
  402. package/areas/software/security/skills/security-headers/SKILL.md +29 -0
  403. package/areas/software/security/skills/threat-modeling/SKILL.md +36 -0
  404. package/areas/software/security/workflows/compliance-report.md +57 -0
  405. package/areas/software/security/workflows/pen-test-sim.md +63 -0
  406. package/areas/software/security/workflows/secret-rotation.md +67 -0
  407. package/areas/software/security/workflows/security-scan.md +64 -0
  408. package/areas/software/security/workflows/threat-model-review.md +62 -0
  409. package/areas/template/AGENTS-area.tmpl.md +61 -0
  410. package/areas/template/AGENTS.tmpl.md +67 -0
  411. package/areas/template/GUIDE.md +102 -0
  412. package/areas/template/PROMPTS.tmpl.md +29 -0
  413. package/areas/template/README.md +57 -0
  414. package/areas/template/README.tmpl.md +51 -0
  415. package/areas/template/prompt.tmpl.md +101 -0
  416. package/areas/template/rule.tmpl.md +71 -0
  417. package/areas/template/skill.tmpl.md +108 -0
  418. package/areas/template/workflow.tmpl.md +104 -0
  419. package/bin/agentic.js +24 -0
  420. package/extensions/antigravity/GEMINI.md +10 -0
  421. package/extensions/claude/CLAUDE.md +10 -0
  422. package/extensions/codex/AGENTS.override.md +93 -0
  423. package/extensions/gemini/GEMINI.md +10 -0
  424. package/extensions/opencode/agents/designer.md +65 -0
  425. package/extensions/opencode/agents/developer.md +63 -0
  426. package/extensions/opencode/agents/devops-engineer.md +69 -0
  427. package/extensions/opencode/agents/pm.md +61 -0
  428. package/extensions/opencode/agents/product-owner.md +76 -0
  429. package/extensions/opencode/agents/qa.md +66 -0
  430. package/extensions/opencode/agents/team-lead.md +67 -0
  431. package/extensions/opencode/commands/feature.md +75 -0
  432. package/extensions/opencode/opencode.json +93 -0
  433. package/extensions/opencode/plugins/model-checker.json +14 -0
  434. package/extensions/opencode/plugins/model-checker.ts +279 -0
  435. package/extensions/opencode/plugins/sound-notification.ts +13 -0
  436. package/extensions/opencode/plugins/telegram-notification.ts +86 -0
  437. package/extensions/opencode/skills/code_review_expert/SKILL.md +144 -0
  438. package/extensions/opencode/skills/design_expert/SKILL.md +42 -0
  439. package/extensions/opencode/skills/qa_expert/SKILL.md +116 -0
  440. package/package.json +19 -0
@@ -0,0 +1,25 @@
1
+ # Rule: On-Call Standards
2
+
3
+ **Priority**: P1 — On-call engineers must be equipped and protected.
4
+
5
+ ## On-Call Requirements
6
+
7
+ 1. **Response times** — P0: 5 min; P1: 15 min; P2: 1h (business hours).
8
+ 2. **On-call rotation** — maximum 1 week primary + 1 week secondary per engineer per month.
9
+ 3. **Runbook coverage** — every alert that can page must have a runbook. No runbook = alert is demoted to warning until written.
10
+ 4. **Tooling access** — on-call engineer has prod read + limited write access (rollback, scale, restart). Full access requires separate MFA.
11
+ 5. **Escalation path** documented and tested quarterly.
12
+
13
+ ## Incident Severity (align with platform)
14
+
15
+ | Severity | Definition | Response |
16
+ |:---|:---|:---|
17
+ | P0 | Complete outage; data loss | 5 min; all hands |
18
+ | P1 | Major feature broken; >25% users affected | 15 min; on-call + lead |
19
+ | P2 | Degraded; workaround available | 1h; business hours OK |
20
+ | P3 | Minor issue; no user impact | Next sprint |
21
+
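The severity table can be read as an ordered set of checks, worst case first. A minimal Python sketch of that reading follows. The function name and its inputs are hypothetical, and treating "major feature broken" and ">25% users affected" as alternative P1 triggers is an assumption, since the table does not say whether both are required.

```python
def classify_severity(
    complete_outage: bool = False,
    data_loss: bool = False,
    major_feature_broken: bool = False,
    users_affected_pct: float = 0.0,
    degraded: bool = False,
) -> str:
    """Map incident facts to a severity label, checking the worst case first."""
    if complete_outage or data_loss:
        return "P0"
    # Assumption: either condition alone is enough for P1.
    if major_feature_broken or users_affected_pct > 25:
        return "P1"
    if degraded:
        return "P2"  # degraded, workaround available
    return "P3"      # minor issue, no user impact


print(classify_severity(users_affected_pct=30))  # P1
```

Checking the most severe condition first means an incident that matches several rows always gets the highest applicable severity.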
22
+ ## Toil Budget
23
+
24
+ - On-call toil (repetitive, automatable work) must not exceed 50% of on-call hours.
25
+ - Toil > 50% for 2 consecutive rotations → mandatory automation sprint.
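The toil budget rule can be checked mechanically once toil hours are tracked per rotation. The sketch below is one possible reading, not part of the package: the function names and the sample history are hypothetical, while the 50% limit and the two-rotation trigger come from the rule above.

```python
from typing import Sequence

TOIL_LIMIT = 0.50          # from the rule: toil must not exceed 50% of on-call hours
CONSECUTIVE_ROTATIONS = 2  # from the rule: 2 consecutive rotations over the limit


def toil_fraction(toil_hours: float, on_call_hours: float) -> float:
    """Share of on-call hours spent on repetitive, automatable work."""
    if on_call_hours <= 0:
        raise ValueError("on_call_hours must be positive")
    return toil_hours / on_call_hours


def automation_sprint_required(fractions: Sequence[float]) -> bool:
    """True if the most recent rotations all exceeded the toil limit."""
    recent = list(fractions)[-CONSECUTIVE_ROTATIONS:]
    return len(recent) == CONSECUTIVE_ROTATIONS and all(f > TOIL_LIMIT for f in recent)


# Hypothetical history: (toil hours, on-call hours) per rotation, oldest first.
history = [(12, 40), (22, 40), (25, 40)]
fractions = [toil_fraction(t, h) for t, h in history]
print(automation_sprint_required(fractions))  # True: 55% and 62.5% both exceed 50%
```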
@@ -0,0 +1,31 @@
1
+ # Rule: SLO Policy
2
+
3
+ **Priority**: P1 — Services in production must have defined SLOs with error budgets.
4
+
5
+ ## SLO Definition Requirements
6
+
7
+ 1. **Every Tier 1 service must define:**
8
+ - **SLI** (what we measure): e.g., proportion of requests completing < 500ms
9
+ - **SLO** (the target): e.g., 99.5% of requests complete < 500ms over 28 days
10
+ - **Error budget**: 100% - SLO = 0.5% = ~3.4h of downtime per 28 days
11
+
12
+ 2. **SLI types (choose appropriate)**
13
+
14
+ | SLI type | Formula | Use when |
15
+ |:---|:---|:---|
16
+ | Availability | good_requests / total_requests | HTTP services |
17
+ | Latency | requests_below_threshold / total_requests | Latency-sensitive APIs |
18
+ | Throughput | actual_throughput / target_throughput | Batch/stream processing |
19
+ | Correctness | correct_results / total_results | Data pipelines |
20
+
21
+ 3. **SLO tiers**
22
+
23
+ | Tier | Example SLO | Error budget / 28d |
24
+ |:---|:---|:---|
25
+ | Tier 1 (revenue) | 99.9% availability | ~40 minutes |
26
+ | Tier 2 (internal) | 99.5% availability | ~3.4 hours |
27
+ | Tier 3 (batch) | 99.0% availability | ~6.7 hours |
28
+
29
+ 4. **28-day rolling window** is the default measurement period. A rolling window is preferred over a calendar month because it avoids "burn ahead" gaming.
30
+
31
+ 5. **SLOs reviewed quarterly** — adjust based on actual reliability data.
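The error budget for an availability SLO follows directly from the window length: budget = (100% - SLO) × window. The short Python sketch below works that arithmetic for a 28-day window (40,320 minutes); the function name is hypothetical and the calculation assumes a time-based reading of the budget.

```python
WINDOW_DAYS = 28  # default rolling window from the policy


def error_budget_minutes(slo_percent: float, window_days: int = WINDOW_DAYS) -> float:
    """Allowed 'bad' minutes in the window for a given SLO target."""
    if not 0 < slo_percent < 100:
        raise ValueError("slo_percent must be between 0 and 100 (exclusive)")
    window_minutes = window_days * 24 * 60
    return (100 - slo_percent) / 100 * window_minutes


for slo in (99.9, 99.5, 99.0):
    minutes = error_budget_minutes(slo)
    print(f"{slo}% -> {minutes:.1f} min ({minutes / 60:.2f} h)")
```

For comparison, the same targets over a 30-day window give 43.2 minutes, 3.6 hours and 7.2 hours, so the window length needs to be stated next to any budget figure.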
@@ -0,0 +1,162 @@
1
+ ---
2
+ name: capacity-planning
3
+ type: skill
4
+ description: Forecast infrastructure capacity needs — traffic projection, resource headroom calculations, node pool sizing, K8s cluster capacity.
5
+ related-rules:
6
+ - slo-policy.md
7
+ allowed-tools: Read, Write, Edit, Bash
8
+ ---
9
+
10
+ # Skill: Capacity Planning
11
+
12
+ > **Expertise:** Traffic forecasting, per-pod resource modeling, node pool sizing, cluster capacity headroom, VPA/HPA tuning for growth.
13
+
14
+ ## When to load
15
+
16
+ When planning for growth, validating current cluster headroom, sizing node pools, or preparing for a high-traffic event (sale, launch).
17
+
18
+ ## Traffic Forecasting
19
+
20
+ ```promql
21
+ # Current RPS baseline (7-day average)
22
+ avg_over_time(
23
+ sum(rate(http_requests_total{service="checkout-service"}[5m]))[7d:5m]
24
+ )
25
+
26
+ # Peak RPS (7-day p99)
27
+ quantile_over_time(0.99,
28
+ sum(rate(http_requests_total{service="checkout-service"}[5m]))[7d:5m]
29
+ )
30
+
31
+ # Week-over-week growth rate
32
+ (
33
+ avg_over_time(sum(rate(http_requests_total[5m]))[7d:5m])
34
+ /
35
+ avg_over_time(sum(rate(http_requests_total[5m]))[7d:5m] offset 7d)
36
+ ) - 1
37
+ # e.g. 0.08 = 8% weekly growth → ~7.4× in 6 months (1.08^26)
38
+ ```
39
+
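A week-over-week growth rate compounds, so projecting it forward means raising (1 + rate) to the number of weeks. The Python sketch below shows that step; the function names are hypothetical, and it assumes the weekly rate stays constant, which real traffic rarely does.

```python
def growth_multiplier(weekly_growth: float, weeks: int) -> float:
    """Compound growth factor after `weeks` weeks at a constant weekly rate."""
    return (1.0 + weekly_growth) ** weeks


def projected_rps(current_rps: float, weekly_growth: float, weeks: int) -> float:
    """Traffic level after `weeks` weeks of compounding growth."""
    return current_rps * growth_multiplier(weekly_growth, weeks)


# 8% weekly growth compounded over 26 weeks (about 6 months)
print(f"{growth_multiplier(0.08, 26):.1f}x")
print(f"{projected_rps(200, 0.08, 26):.0f} RPS from a 200 RPS baseline")
```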
40
+ ## Per-Pod Resource Modeling
41
+
42
+ ```
43
+ Model: what resources does 1 pod consume per RPS unit?
44
+
45
+ Step 1: current pod metrics
46
+ - pods = 4 (HPA current)
47
+ - RPS = 200 req/s (avg)
48
+ - CPU per pod = 320m (avg), 480m (p99)
49
+ - Memory per pod = 280Mi (avg), 380Mi (peak)
50
+
51
+ Step 2: per-RPS resource cost
52
+ - CPU per RPS = 320m / (200/4) = 6.4m CPU per RPS
53
+ - Mem per RPS = 280Mi / (200/4) = 5.6Mi per RPS
54
+
55
+ Step 3: future requirements at 2× traffic (400 RPS)
56
+ - CPU needed = 400 × 6.4m = 2560m = 2.56 cores
57
+ - Mem needed = 400 × 5.6Mi = 2240Mi ≈ 2.2Gi
58
+ - Pods needed (at 70% CPU target, assuming a 500m CPU request per pod) = 2560m / (500m × 0.7) = 7.3 → 8 pods min
59
+ - Update HPA maxReplicas to accommodate
60
+ ```
61
+
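The three modeling steps above translate into a few lines of arithmetic. The following Python sketch reproduces them with the same numbers. The class and function names are hypothetical, the 500m CPU request is taken from the divisor used in Step 3, and the model assumes CPU and memory scale linearly with RPS.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PodBaseline:
    pods: int              # current replica count
    total_rps: float       # average requests per second across all pods
    cpu_per_pod_m: float   # average CPU per pod, millicores
    mem_per_pod_mi: float  # average memory per pod, MiB

    @property
    def rps_per_pod(self) -> float:
        return self.total_rps / self.pods

    @property
    def cpu_m_per_rps(self) -> float:
        return self.cpu_per_pod_m / self.rps_per_pod

    @property
    def mem_mi_per_rps(self) -> float:
        return self.mem_per_pod_mi / self.rps_per_pod


def pods_needed(target_rps: float, cpu_m_per_rps: float,
                cpu_request_m: float, target_util: float) -> int:
    """Replicas required so each pod runs at or below target_util of its CPU request."""
    cpu_needed_m = target_rps * cpu_m_per_rps
    return math.ceil(cpu_needed_m / (cpu_request_m * target_util))


baseline = PodBaseline(pods=4, total_rps=200, cpu_per_pod_m=320, mem_per_pod_mi=280)
print(baseline.cpu_m_per_rps, baseline.mem_mi_per_rps)  # per-RPS CPU (m) and memory (Mi)
print(pods_needed(400, baseline.cpu_m_per_rps, cpu_request_m=500, target_util=0.7))  # 8
```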
62
+ ## Cluster Capacity Check
63
+
64
+ ```bash
65
+ # Total cluster allocatable resources
66
+ kubectl get nodes -o json | jq '
67
+ [.items[].status.allocatable] |
68
+ {
69
+ cpu: [.[].cpu | if endswith("m") then (rtrimstr("m") | tonumber) / 1000 else tonumber end] | add,
70
+ memory_gi: [(.[].memory | gsub("Ki";"") | tonumber) / 1048576] | add
71
+ }'
72
+
73
+ # Currently requested resources (sum of all pod requests; assumes CPU in cores or "m", memory in "Mi" or "Gi")
74
+ kubectl get pods -A -o json | jq '
75
+ [.items[].spec.containers[].resources.requests // {}] |
76
+ {
77
+ cpu_requested: ([.[].cpu // "0m" | if endswith("m") then (rtrimstr("m") | tonumber) / 1000 else tonumber end] | add),
78
+ mem_requested_gi: ([.[].memory // "0Mi" | if endswith("Gi") then (rtrimstr("Gi") | tonumber) else (rtrimstr("Mi") | tonumber) / 1024 end] | add)
79
+ }'
80
+
81
+ # Headroom per node (allocatable - requested)
82
+ kubectl describe nodes | grep -A5 "Allocated resources:"
83
+
84
+ # Quick headroom summary script
85
+ kubectl get nodes -o custom-columns=\
86
+ "NAME:.metadata.name,\
87
+ CPU_ALLOC:.status.allocatable.cpu,\
88
+ MEM_ALLOC:.status.allocatable.memory,\
89
+ READY:.status.conditions[?(@.type=='Ready')].status"
90
+ ```
91
+
92
+ ## Node Pool Sizing Formula
93
+
94
+ ```
95
+ Variables:
96
+ T = target RPS (peak)
97
+ R_cpu = CPU request per pod (millicores)
98
+ R_mem = memory request per pod (MiB)
99
+ util = target utilisation (e.g. 0.70 = 70%)
100
+ headroom = spare capacity factor (e.g. 1.3 = 30% spare)
101
+ node_cpu = node allocatable CPU (millicores)
102
+ node_mem = node allocatable memory (MiB)
103
+
104
+ Pods needed (cpu_per_rps = CPU cost per RPS in millicores, from the per-pod model):
105
+ pods = ceil((T × cpu_per_rps × headroom) / (R_cpu × util))
106
+
107
+ Nodes needed for CPU:
108
+ nodes_cpu = ceil((pods × R_cpu) / (node_cpu × util))
109
+
110
+ Nodes needed for Memory:
111
+ nodes_mem = ceil((pods × R_mem) / (node_mem × util))
112
+
113
+ Required nodes = max(nodes_cpu, nodes_mem) + 1 (N+1 for failure tolerance)
114
+ ```
115
+
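The sizing formula can be sketched as a single function. This is an illustrative Python version under stated assumptions: pods are sized against the per-pod CPU request (R_cpu), as in the per-pod model, with headroom applied before rounding up. The function name and all example values (node sizes, a 512 MiB memory request) are hypothetical.

```python
import math


def node_pool_size(target_rps: float, cpu_m_per_rps: float,
                   r_cpu_m: float, r_mem_mi: float,
                   util: float, headroom: float,
                   node_cpu_m: float, node_mem_mi: float) -> dict:
    """Pods and nodes required for a peak RPS target, with N+1 failure tolerance."""
    # Pods: CPU demand (with spare capacity) over usable CPU per pod.
    pods = math.ceil(target_rps * cpu_m_per_rps * headroom / (r_cpu_m * util))
    # Nodes: total pod requests over usable capacity per node, per resource.
    nodes_cpu = math.ceil(pods * r_cpu_m / (node_cpu_m * util))
    nodes_mem = math.ceil(pods * r_mem_mi / (node_mem_mi * util))
    return {
        "pods": pods,
        "nodes_cpu": nodes_cpu,
        "nodes_mem": nodes_mem,
        "nodes": max(nodes_cpu, nodes_mem) + 1,  # +1 node for failure tolerance
    }


# Hypothetical inputs: 400 RPS peak, 6.4m CPU per RPS, 500m / 512Mi requests,
# 70% target utilisation, 30% spare, nodes with 3900m CPU and 14000Mi allocatable.
print(node_pool_size(400, 6.4, 500, 512, 0.70, 1.3, 3900, 14000))
```

Taking the maximum of the CPU-bound and memory-bound node counts picks whichever resource runs out first on the chosen node type.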
116
+ ## Pre-Event Capacity (sale, product launch)
117
+
118
+ ```bash
119
+ # 1. Estimate peak multiplier from past events or product team forecast
120
+ PEAK_MULTIPLIER=5 # "we expect 5× normal traffic for 2 hours"
121
+
122
+ # 2. Pre-scale HPA min replicas before event
123
+ kubectl patch hpa order-service -n production \
124
+ -p '{"spec":{"minReplicas":10}}'
125
+
126
+ # 3. Pre-warm node pool (add nodes before autoscaler reacts)
127
+ # AWS: adjust ASG desired capacity
128
+ aws autoscaling set-desired-capacity \
129
+ --auto-scaling-group-name prod-workers \
130
+ --desired-capacity 12
131
+
132
+ # 4. Delay HPA scale-down during event window (3600s is the maximum stabilization window; it slows scale-down rather than disabling it)
133
+ kubectl patch hpa order-service -n production \
134
+ -p '{"spec":{"behavior":{"scaleDown":{"stabilizationWindowSeconds":3600}}}}'
135
+
136
+ # 5. Restore after event
137
+ kubectl patch hpa order-service -n production \
138
+ -p '{"spec":{"minReplicas":2,"behavior":{"scaleDown":{"stabilizationWindowSeconds":300}}}}'
139
+ ```
140
+
141
+ ## Capacity Planning Report (monthly)
142
+
143
+ ```markdown
144
+ ## Capacity Report — November 2024
145
+
146
+ ### Current State
147
+ - Cluster: 9 workers (cx41, 4 vCPU / 16Gi each)
148
+ - CPU utilisation: 58% avg, 71% peak
149
+ - Memory utilisation: 62% avg, 74% peak
150
+ - Headroom: ~25% CPU, ~20% Memory
151
+
152
+ ### Growth Trend
153
+ - Traffic WoW growth: +6.2% (8 weeks avg)
154
+ - Extrapolation: peak CPU (71%) reaches 100% in ~6 weeks at current growth
155
+
156
+ ### Recommendations
157
+ 1. Add 2 nodes before end of Q4 (reduce peak CPU to < 60%)
158
+ 2. Evaluate spot nodes for worker pool (60-75% cost saving)
159
+ 3. Review order-service memory limit — VPA recommends 640Mi vs current 512Mi
160
+
161
+ ### Next Review: December 2024
162
+ ```
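The extrapolation line in a report like this can be derived from the utilisation and growth figures. The Python sketch below shows one way to do it. The function name is hypothetical, and it assumes utilisation scales proportionally with traffic and that growth stays constant.

```python
import math


def weeks_until_exhausted(current_util: float, weekly_growth: float,
                          limit: float = 1.0) -> float:
    """Weeks until utilisation reaches `limit` under compound weekly growth."""
    if not 0 < current_util < limit:
        raise ValueError("current_util must be between 0 and limit")
    if weekly_growth <= 0:
        return math.inf  # flat or shrinking traffic never exhausts capacity
    return math.log(limit / current_util) / math.log(1.0 + weekly_growth)


# Figures from the sample report: 71% peak CPU, 58% average CPU, +6.2% WoW.
print(f"peak: {weeks_until_exhausted(0.71, 0.062):.1f} weeks")
print(f"avg:  {weeks_until_exhausted(0.58, 0.062):.1f} weeks")
```

Planning against peak rather than average utilisation gives the earlier, more conservative deadline.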
@@ -0,0 +1,186 @@
1
+ ---
2
+ name: chaos-engineering
3
+ type: skill
4
+ description: Design and run chaos experiments in Kubernetes — pod failures, network partitions, resource pressure with LitmusChaos and manual chaos.
5
+ related-rules:
6
+ - slo-policy.md
7
+ - on-call-standards.md
8
+ allowed-tools: Read, Write, Edit, Bash
9
+ ---
10
+
11
+ # Skill: Chaos Engineering
12
+
13
+ > **Expertise:** LitmusChaos experiments, manual K8s chaos, network partition testing, graceful degradation validation.
14
+
15
+ ## When to load
16
+
17
+ When designing chaos experiments, validating failover behavior, verifying SLO headroom, or onboarding a service to chaos testing.
18
+
19
+ ## Chaos Experiment Design Principles
20
+
21
+ ```
22
+ 1. Define steady state first
23
+ → What does "working" look like? (SLI baseline: error rate < 0.1%, p99 < 200ms)
24
+
25
+ 2. Hypothesize
26
+ → "If 1/3 of pods die, the service will continue serving with p99 < 500ms"
27
+
28
+ 3. Blast radius control
29
+ → Start with staging. Start with 1 pod. Increase gradually.
30
+
31
+ 4. Abort conditions
32
+ → Auto-stop if error rate > 1% or p99 > 1s for > 2 min
33
+
34
+ 5. Document and act
35
+ → Passed = evidence of resilience. Failed = fix + re-test. Never just accept failure.
36
+ ```
37
+
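The abort condition in principle 4 can be expressed as a small check over metric samples. The Python sketch below is one interpretation: it treats "for > 2 min" as applying to either breach (error rate or p99), which the text leaves ambiguous. The `Sample` type and function name are hypothetical, and samples are assumed to arrive sorted by time.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Sample:
    t_seconds: float    # seconds since experiment start
    error_rate: float   # 0.01 == 1%
    p99_seconds: float  # p99 latency in seconds


def should_abort(samples: Iterable[Sample],
                 max_error_rate: float = 0.01,
                 max_p99_seconds: float = 1.0,
                 sustain_seconds: float = 120.0) -> bool:
    """True once a breach has lasted longer than sustain_seconds without recovery."""
    breach_start: Optional[float] = None
    for s in samples:
        breached = s.error_rate > max_error_rate or s.p99_seconds > max_p99_seconds
        if not breached:
            breach_start = None  # recovery resets the timer
            continue
        if breach_start is None:
            breach_start = s.t_seconds
        if s.t_seconds - breach_start > sustain_seconds:
            return True
    return False


# Error rate stuck at 2% for 150 seconds, sampled every 30 seconds.
print(should_abort([Sample(t, 0.02, 0.3) for t in range(0, 151, 30)]))  # True
```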
38
+ ## Manual Chaos (no tooling needed)
39
+
40
+ ```bash
41
+ # ── Pod kill (test restart recovery) ──────────────────────────
42
+ kubectl delete pod <pod-name> -n production
43
+ # Watch: kubectl get pods -n production -l app=my-service -w
44
+ # Expected: new pod starts, readiness probe passes, 0 user-visible errors
45
+
46
+ # ── Restart all pods in deployment (rolling; test rolling restart recovery) ──
47
+ kubectl rollout restart deployment/my-service -n production
48
+ # Watch error rate during rollout
49
+
50
+ # ── Simulate OOMKill (needs /dev/shm larger than the memory limit; default is 64Mi) ──
51
+ kubectl exec -it <pod> -n production -- sh -c \
52
+ "dd if=/dev/zero of=/dev/shm/blob bs=1M count=600"
53
+ # Expected: pod OOMKilled, restarted, alert fired, no user impact
54
+
55
+ # ── Resource pressure on node ─────────────────────────────────
56
+ kubectl run stress --image=polinux/stress --restart=Never \
57
+ --overrides='{"spec":{"nodeSelector":{"kubernetes.io/hostname":"worker-01"}}}' \
58
+ -- stress --cpu 4 --vm 1 --vm-bytes 2G --timeout 120s
59
+
60
+ # ── Network partition: isolate a pod (Cilium + network policy) ──
61
+ # Apply a policy that drops all traffic from/to the pod
62
+ kubectl apply -f - << 'EOF'
63
+ apiVersion: networking.k8s.io/v1
64
+ kind: NetworkPolicy
65
+ metadata: { name: chaos-isolate, namespace: production }
66
+ spec:
67
+   podSelector: { matchLabels: { chaos-target: "true" } }
68
+   policyTypes: [Ingress, Egress]
69
+ EOF
70
+ kubectl label pod <pod> chaos-target=true -n production
71
+ # Observe: circuit breakers trip, retries, fallback behavior
72
+ # Cleanup:
73
+ kubectl delete networkpolicy chaos-isolate -n production
74
+ kubectl label pod <pod> chaos-target- -n production
75
+ ```
76
+
77
+ ## LitmusChaos Experiments
78
+
79
+ ```yaml
80
+ # Install LitmusChaos first (shell command, not part of the manifest below):
81
+ #   kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v3.0.0.yaml
82
+
83
+ # ── Pod Delete experiment ────────────────────────────────────
84
+ apiVersion: litmuschaos.io/v1alpha1
85
+ kind: ChaosEngine
86
+ metadata:
87
+   name: pod-delete-experiment
88
+   namespace: production
89
+ spec:
90
+   appinfo:
91
+     appns: production
92
+     applabel: app=order-service
93
+     appkind: deployment
94
+   engineState: active
95
+   chaosServiceAccount: litmus-admin
96
+   experiments:
97
+     - name: pod-delete
98
+       spec:
99
+         components:
100
+           env:
101
+             - name: TOTAL_CHAOS_DURATION
102
+               value: "60" # run for 60 seconds
103
+             - name: CHAOS_INTERVAL
104
+               value: "10" # delete a pod every 10s
105
+             - name: FORCE
106
+               value: "false" # graceful delete (test SIGTERM handling)
107
+             - name: PODS_AFFECTED_PERC
108
+               value: "33" # kill 33% of pods at a time
109
+ ```
110
+
111
+ ```yaml
112
+ # ── Pod CPU Hog (test HPA scale-out; add appinfo, engineState, chaosServiceAccount as above) ──
113
+ apiVersion: litmuschaos.io/v1alpha1
114
+ kind: ChaosEngine
115
+ metadata:
116
+   name: cpu-hog-experiment
117
+   namespace: production
118
+ spec:
119
+   experiments:
120
+     - name: pod-cpu-hog
121
+       spec:
122
+         components:
123
+           env:
124
+             - name: CPU_CORES
125
+               value: "1"
126
+             - name: TOTAL_CHAOS_DURATION
127
+               value: "120"
128
+             - name: TARGET_PODS
129
+               value: "order-service-abc123"
130
+ ```
131
+
132
+ ## Chaos Game Days (structured runbook)
133
+
134
+ ```
135
+ 1. Define scope (30 min)
136
+ - Which services? Which failure modes?
137
+ - What is acceptable impact? (staging or prod with traffic shadow)
138
+
139
+ 2. Baseline measurement (10 min)
140
+ - Capture: RPS, error rate, p99, pod count
141
+ - Screenshot Grafana dashboard
142
+
143
+ 3. Run experiments (60–90 min)
144
+ Experiment A: Kill 1 of 3 pods → observe recovery time
145
+ Experiment B: Saturate CPU on 1 pod → observe HPA response
146
+ Experiment C: Partition service from its DB → observe circuit breaker
147
+
148
+ 4. Capture results per experiment
149
+ - Steady state maintained? (SLI threshold)
150
+ - Time to recovery
151
+ - Alerts fired? Correct ones?
152
+ - Runbook adequate?
153
+
154
+ 5. Action items (20 min)
155
+ - For each failure: fix or accept with documentation
156
+ - Schedule follow-up experiments after fixes
157
+ ```
158
+
159
+ ## Abort / Safety Controls
160
+
161
+ ```yaml
162
+ # LitmusChaos: abort on SLO breach using steady-state hypothesis
163
+ spec:
164
+   jobCleanUpPolicy: delete
165
+   monitoring: true
166
+   # Prometheus probe: abort if error rate > 1%
167
+   experiments:
168
+     - name: pod-delete
169
+       spec:
170
+         probe:
171
+           - name: check-error-rate
172
+             type: promProbe
173
+             promProbe/inputs:
174
+               endpoint: http://prometheus:9090
175
+               query: |
176
+                 sum(rate(http_requests_total{service="order-service",status=~"5.."}[2m]))
177
+                 / sum(rate(http_requests_total{service="order-service"}[2m]))
178
+               comparator:
179
+                 type: float
180
+                 criteria: "<="
181
+                 value: "0.01" # probe fails above 1%; aborting the run also needs runProperties.stopOnFailure: true
182
+             mode: Continuous
183
+             runProperties:
184
+               probeTimeout: 10s
185
+               interval: 15s
186
+ ```
@@ -0,0 +1,119 @@
1
+ ---
2
+ name: incident-command
3
+ type: skill
4
+ description: Structured incident command for P0/P1 — roles, timeline, communication templates, and mitigation-first approach.
5
+ related-rules:
6
+ - on-call-standards.md
7
+ - error-budget-policy.md
8
+ allowed-tools: Read, Bash
9
+ ---
10
+
11
+ # Skill: Incident Command
12
+
13
+ > **Expertise:** ICS-inspired incident structure, communication templates, mitigation over diagnosis, blameless culture.
14
+
15
+ ## When to load
16
+
17
+ When responding to a P0/P1 incident, coordinating a multi-engineer response, or writing a war room update.
18
+
19
+ ## Incident Roles
20
+
21
+ | Role | Responsibility | Who |
22
+ |:---|:---|:---|
23
+ | **Incident Commander (IC)** | Owns coordination; makes go/no-go calls | On-call lead or SRE |
24
+ | **Technical Lead** | Diagnoses and implements fix | On-call engineer |
25
+ | **Comms Lead** | Writes status page + stakeholder updates | PM or secondary on-call |
26
+ | **Scribe** | Documents timeline in real-time | Any available engineer |
27
+
28
+ ## P0 Timeline (first 30 minutes, plus follow-up)
29
+
30
+ ```
31
+ T+0: ACKNOWLEDGE — "I'm on it" in #incidents Slack
32
+ T+2: SCOPE — What's broken? Since when? Who's affected?
33
+ → kubectl get pods -A | grep -v Running
34
+ → Check Grafana error rate + latency dashboard
35
+ T+5: PAGE escalation if > 10% users affected or revenue impacted
36
+ T+10: STATUS PAGE update: "We are investigating reports of [symptom]"
37
+ T+15: MITIGATION — Rollback > fix. Prefer reversible actions.
38
+ Order: rollback deploy → feature flag off → scale up → redirect traffic
39
+ T+20: COMMUNICATE — Slack update with mitigation status + ETA
40
+ T+30: STABILIZE — Confirm metrics returning to baseline
41
+ → Watch error rate for 10 min after mitigation
42
+ T+60: PRELIMINARY POSTMORTEM doc created (timeline captured)
43
+ T+24h: FULL POSTMORTEM — 5-whys, action items, owners
44
+ ```
45
+
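A scribe or bot can use the timeline above as a checklist to flag steps that are running late. The Python sketch below illustrates the idea; the milestone labels are shortened from the timeline, the function name is hypothetical, and the conditional nature of the paging step is noted only as a comment.

```python
from typing import Iterable, List, Tuple

# (minutes after T+0, milestone), taken from the P0 timeline
P0_MILESTONES: List[Tuple[int, str]] = [
    (0, "ACKNOWLEDGE"),
    (2, "SCOPE"),
    (5, "PAGE"),  # conditional: only if > 10% users affected or revenue impacted
    (10, "STATUS PAGE"),
    (15, "MITIGATION"),
    (20, "COMMUNICATE"),
    (30, "STABILIZE"),
    (60, "PRELIMINARY POSTMORTEM"),
    (24 * 60, "FULL POSTMORTEM"),
]


def overdue_milestones(elapsed_minutes: float, completed: Iterable[str]) -> List[str]:
    """Milestones whose deadline has passed and that are not yet marked done."""
    done = set(completed)
    return [name for due, name in P0_MILESTONES
            if due <= elapsed_minutes and name not in done]


# 12 minutes in, only the first two steps are done.
print(overdue_milestones(12, ["ACKNOWLEDGE", "SCOPE"]))  # ['PAGE', 'STATUS PAGE']
```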
46
+ ## Mitigation Priority (always prefer fast+reversible)
47
+
48
+ ```
49
+ 1. Rollback deploy → helm rollback <release> -n <ns> # < 2 min
50
+ 2. Feature flag off → LaunchDarkly / split.io toggle # < 1 min
51
+ 3. Scale up replicas → kubectl scale deploy ... --replicas=N
52
+ 4. Restart pods → kubectl rollout restart deploy/<n>
53
+ 5. Redirect traffic → DNS change / load balancer weight
54
+ 6. Fix forward → only if rollback is not possible
55
+ ```
56
+
57
+ ## Slack Communication Templates
58
+
59
+ ```
60
+ # P0 Opening Message (#incidents channel)
61
+ 🔴 **P0 INCIDENT OPEN** — [service] [symptom]
62
+ IC: @you | Scribe: @name
63
+ Impact: [who is affected, estimated user count]
64
+ Current status: Investigating
65
+ Thread: all updates in this thread
66
+ War room: https://meet.google.com/...
67
+
68
+ # Update every 15 min until resolved
69
+ 📊 **UPDATE T+15** — [service]
70
+ Status: Mitigating / Resolved / Monitoring
71
+ Action taken: Rolled back to v2.3.0
72
+ Current error rate: 0.2% (was 8.4%)
73
+ ETA: Monitoring for 10 min, then close
74
+
75
+ # Resolution
76
+ ✅ **RESOLVED** — [service] — [duration]
77
+ Root cause (preliminary): [1-sentence summary]
78
+ Mitigation: [what fixed it]
79
+ Next: Postmortem within 24h @[owner]
80
+ ```
81
+
82
+ ## Status Page Templates
83
+
84
+ ```
85
+ # Investigating
86
+ Investigating - We are investigating reports of [symptom] affecting [service].
87
+ Users may experience [impact]. We will provide updates every 15 minutes.
88
+
89
+ # Identified
90
+ Identified - We have identified the issue causing [symptom].
91
+ We are working on a fix and expect resolution by [ETA].
92
+
93
+ # Monitoring
94
+ Monitoring - A fix has been implemented and we are monitoring the results.
95
+ Users should no longer experience [symptom].
96
+
97
+ # Resolved
98
+ Resolved - [symptom] affecting [service] has been resolved.
99
+ This incident lasted [duration]. A postmortem will be published within 72 hours.
100
+ ```
101
+
102
+ ## Useful Emergency Commands
103
+
104
+ ```bash
105
+ # Immediate rollback
106
+ helm rollback <release-name> -n <namespace> # rolls back 1 version
107
+ helm rollback <release-name> <revision> -n <namespace> # specific revision
108
+ helm history <release-name> -n <namespace> # list revisions
109
+
110
+ # Scale up quickly
111
+ kubectl scale deploy <name> -n <ns> --replicas=10
112
+
113
+ # Emergency pod restart (without rollout); deletes all matching pods at once, so capacity drops until replacements are Ready
114
+ kubectl delete pods -n <ns> -l app=<name>
115
+
116
+ # Check what changed recently
117
+ kubectl describe deploy <name> -n <ns> | grep -A5 "Events:"
118
+ kubectl rollout history deploy/<name> -n <ns>
119
+ ```
@@ -0,0 +1,104 @@
1
+ ---
2
+ name: postmortem-analysis
3
+ type: skill
4
+ description: Write blameless postmortems with 5-whys RCA, actionable follow-ups, and systematic prevention measures.
5
+ related-rules:
6
+ - on-call-standards.md
7
+ allowed-tools: Read, Write
8
+ ---
9
+
10
+ # Skill: Postmortem Analysis
11
+
12
+ > **Expertise:** Blameless culture, 5-whys root cause analysis, contributing factors, actionable items with owners and due dates.
13
+
14
+ ## When to load
15
+
16
+ When writing a postmortem after a P0/P1 incident, reviewing a draft postmortem, or designing action items.
17
+
18
+ ## Postmortem Template
19
+
20
+ ```markdown
21
+ # Postmortem: [Service] — [Date] — [Severity]
22
+
23
+ **Status:** Draft / In Review / Complete
24
+ **Severity:** P0 / P1
25
+ **Duration:** [start] → [end] ([total duration])
26
+ **Impact:** [N users affected, revenue impact if known, SLO budget consumed: X minutes]
27
+ **Incident Commander:** [name]
28
+ **Authors:** [name(s)]
29
+
30
+ ---
31
+
32
+ ## Summary
33
+ [2–3 sentences: what broke, what caused it, what fixed it]
34
+
35
+ ## Timeline (UTC)
36
+
37
+ | Time | Event |
38
+ |:---|:---|
39
+ | 14:22 | Alert fired: HighErrorRate on payment-service |
40
+ | 14:24 | On-call acknowledged; war room opened |
41
+ | 14:28 | Identified: error correlated with v2.4.1 deploy at 14:05 |
42
+ | 14:31 | Mitigation: helm rollback payment-service to revision 3 |
43
+ | 14:33 | Error rate returning to baseline |
44
+ | 14:40 | Resolved; monitoring |
45
+
46
+ ## Root Cause Analysis (5-Whys)
47
+
48
+ **Symptom:** Payment service returning 502s at 4.2% rate
49
+
50
+ 1. **Why?** → Upstream credit-card-service returning 503s
51
+ 2. **Why?** → credit-card-service pods OOMKilled
52
+ 3. **Why?** → Memory limit was 256Mi; new code path loaded full transaction history into memory
53
+ 4. **Why?** → Code review missed memory complexity of the new query (no performance test)
54
+ 5. **Why?** → No memory profiling step in CI; no load test in staging pipeline
55
+
56
+ **Root cause:** Insufficient memory limit combined with absent memory regression testing.
57
+
58
+ ## Contributing Factors
59
+ - [ ] Memory limits not updated with new feature PR
60
+ - [ ] Staging environment has lower traffic than production (bug not triggered)
61
+ - [ ] No VPA recommendation visible to developers
62
+
63
+ ## What Went Well
64
+ - On-call responded in 2 minutes (SLO: 5 min) ✅
65
+ - Rollback executed in 2 minutes ✅
66
+ - Status page updated within 10 minutes ✅
67
+
68
+ ## What Went Poorly
69
+ - Memory issue not caught in staging
70
+ - Alert fired 17 minutes after deploy (too slow — alert `for: 2m` but high latency in detection)
71
+ - Runbook for OOMKilled did not include memory limit increase steps
72
+
73
+ ## Action Items
74
+
75
+ | Action | Owner | Priority | Due |
76
+ |:---|:---|:---|:---|
77
+ | Add memory profiling step to CI (`memory-profiler`) | @dev-team | P1 | 2024-11-22 |
78
+ | Add k6 load test to staging pipeline (match prod traffic pattern) | @devops-team | P1 | 2024-11-29 |
79
+ | Add VPA in "Off" mode for all services → surface recommendations | @devops-team | P2 | 2024-12-06 |
80
+ | Update OOMKilled runbook with memory limit increase steps | @sre-team | P2 | 2024-11-20 |
81
+ | Reduce alert `for:` to 1m for payment-service | @sre-team | P3 | 2024-11-20 |
82
+
83
+ ## SLO Impact
84
+ - Error budget consumed: 18 minutes (of 201.6 min / 28d budget)
85
+ - Budget remaining: 91.1%
86
+ - Budget state: 🟢 Healthy
87
+ ```
88
+
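The "SLO Impact" figures in the template follow from the incident duration and the window budget. The Python sketch below shows the arithmetic. The function name is hypothetical, and it assumes the whole incident duration counts against the budget and that no other incidents occurred in the window.

```python
from typing import Iterable


def budget_remaining_pct(incident_minutes: Iterable[float], budget_minutes: float) -> float:
    """Percentage of the error budget left after the listed incidents."""
    if budget_minutes <= 0:
        raise ValueError("budget_minutes must be positive")
    consumed = sum(incident_minutes)
    return (1.0 - consumed / budget_minutes) * 100.0


# One 18-minute incident against a 201.6-minute budget (99.5% over 28 days).
print(f"{budget_remaining_pct([18], 201.6):.1f}%")  # 91.1%
```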
89
+ ## 5-Whys Facilitation Tips
90
+
91
+ 1. **Start with the user-visible symptom**, not the technical failure.
92
+ 2. **Each "why" must be something that could have been different** — avoid "because the code had a bug" (that's not actionable).
93
+ 3. **Stop at organizational / process level** — usually why 4 or 5 reveals a missing process, test, or convention.
94
+ 4. **Multiple root causes are OK** — most incidents have 2-3 contributing causes, not one.
95
+ 5. **Blameless means systems-focused** — "the deployment process allowed an under-tested change" not "Alice didn't test well".
96
+
97
+ ## Action Item Quality
98
+
99
+ | ❌ Weak | ✅ Strong |
100
+ |:---|:---|
101
+ | "Improve testing" | "Add k6 load test targeting payment endpoint to staging pipeline by Nov 29" |
102
+ | "Fix monitoring" | "Add HighMemoryUsage alert (> 80% of limit) with `for: 5m` by Nov 20" |
103
+ | "Be more careful" | "Add required checklist item in PR template: memory impact assessed for new DB queries" |
104
+ | "Investigate X" | "Timebox investigation to 2h; report findings in Slack #postmortems by Nov 21" |
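The weak-versus-strong distinction can be partly automated as a lint on the action item table. The Python sketch below is a rough illustration only. The `ActionItem` type and the checks are hypothetical, and the "fewer than four words" vagueness heuristic is an assumption that happens to catch the weak examples above, not a rule from this skill.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class ActionItem:
    action: str
    owner: str     # e.g. "@sre-team"
    priority: str  # "P0".."P3"
    due: str       # ISO date, e.g. "2024-11-20"


def problems(item: ActionItem) -> List[str]:
    """Return the reasons an action item is not yet actionable (empty list if fine)."""
    issues: List[str] = []
    if len(item.action.split()) < 4:  # heuristic, see assumption in the lead-in
        issues.append("action too vague")
    if not item.owner.startswith("@"):
        issues.append("owner missing")
    if item.priority not in {"P0", "P1", "P2", "P3"}:
        issues.append("priority missing")
    try:
        date.fromisoformat(item.due)
    except ValueError:
        issues.append("due date missing or not an ISO date")
    return issues


print(problems(ActionItem("Improve testing", "", "", "")))
print(problems(ActionItem(
    "Update OOMKilled runbook with memory limit increase steps",
    "@sre-team", "P2", "2024-11-20")))
```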