@jetrabbits/agentic 0.0.1
- package/AGENTS.md +143 -0
- package/README.md +154 -0
- package/agentic +1615 -0
- package/areas/devops/ci-cd/AGENTS.md +48 -0
- package/areas/devops/ci-cd/PROMPTS.md +7 -0
- package/areas/devops/ci-cd/prompts/onboard-repo.md +97 -0
- package/areas/devops/ci-cd/prompts/pipeline-debug.md +103 -0
- package/areas/devops/ci-cd/prompts/release-pipeline.md +115 -0
- package/areas/devops/ci-cd/rules/pipeline-standards.md +33 -0
- package/areas/devops/ci-cd/rules/quality-gates.md +24 -0
- package/areas/devops/ci-cd/rules/supply-chain-security.md +34 -0
- package/areas/devops/ci-cd/skills/artifact-management/SKILL.md +157 -0
- package/areas/devops/ci-cd/skills/build-optimization/SKILL.md +168 -0
- package/areas/devops/ci-cd/skills/github-actions-patterns/SKILL.md +190 -0
- package/areas/devops/ci-cd/skills/gitlab-ci-patterns/SKILL.md +169 -0
- package/areas/devops/ci-cd/skills/pipeline-security/SKILL.md +161 -0
- package/areas/devops/ci-cd/workflows/onboard-repo.md +73 -0
- package/areas/devops/ci-cd/workflows/pipeline-debug.md +66 -0
- package/areas/devops/ci-cd/workflows/release-pipeline.md +115 -0
- package/areas/devops/database-ops/AGENTS.md +47 -0
- package/areas/devops/database-ops/prompts/backup-verify.md +83 -0
- package/areas/devops/database-ops/prompts/db-incident.md +127 -0
- package/areas/devops/database-ops/rules/access-control.md +20 -0
- package/areas/devops/database-ops/rules/backup-policy.md +33 -0
- package/areas/devops/database-ops/rules/migration-runbook.md +32 -0
- package/areas/devops/database-ops/skills/backup-restore/SKILL.md +226 -0
- package/areas/devops/database-ops/skills/db-performance/SKILL.md +205 -0
- package/areas/devops/database-ops/skills/migration-safety/SKILL.md +155 -0
- package/areas/devops/database-ops/skills/postgres-operations/SKILL.md +156 -0
- package/areas/devops/database-ops/skills/redis-operations/SKILL.md +174 -0
- package/areas/devops/database-ops/workflows/backup-verify.md +107 -0
- package/areas/devops/database-ops/workflows/db-incident.md +86 -0
- package/areas/devops/devsecops/AGENTS.md +47 -0
- package/areas/devops/devsecops/prompts/policy-onboard.md +79 -0
- package/areas/devops/devsecops/prompts/security-scan-pipeline.md +131 -0
- package/areas/devops/devsecops/rules/container-security.md +22 -0
- package/areas/devops/devsecops/rules/policy-as-code.md +37 -0
- package/areas/devops/devsecops/rules/shift-left-policy.md +26 -0
- package/areas/devops/devsecops/skills/container-hardening/SKILL.md +146 -0
- package/areas/devops/devsecops/skills/opa-policies/SKILL.md +188 -0
- package/areas/devops/devsecops/skills/sbom-supply-chain/SKILL.md +165 -0
- package/areas/devops/devsecops/skills/secret-detection/SKILL.md +190 -0
- package/areas/devops/devsecops/skills/sigstore-signing/SKILL.md +184 -0
- package/areas/devops/devsecops/workflows/policy-onboard.md +104 -0
- package/areas/devops/devsecops/workflows/security-scan-pipeline.md +155 -0
- package/areas/devops/infrastructure/AGENTS.md +50 -0
- package/areas/devops/infrastructure/prompts/destroy-environment.md +81 -0
- package/areas/devops/infrastructure/prompts/drift-remediation.md +71 -0
- package/areas/devops/infrastructure/prompts/module-development.md +69 -0
- package/areas/devops/infrastructure/prompts/provision-environment.md +121 -0
- package/areas/devops/infrastructure/rules/iac-standards.md +80 -0
- package/areas/devops/infrastructure/rules/immutability.md +28 -0
- package/areas/devops/infrastructure/rules/secret-hygiene.md +53 -0
- package/areas/devops/infrastructure/rules/state-management.md +47 -0
- package/areas/devops/infrastructure/skills/ansible-playbooks/SKILL.md +174 -0
- package/areas/devops/infrastructure/skills/cost-optimization/SKILL.md +177 -0
- package/areas/devops/infrastructure/skills/drift-detection/SKILL.md +178 -0
- package/areas/devops/infrastructure/skills/state-management/SKILL.md +159 -0
- package/areas/devops/infrastructure/skills/terraform-modules/SKILL.md +169 -0
- package/areas/devops/infrastructure/workflows/destroy-environment.md +96 -0
- package/areas/devops/infrastructure/workflows/drift-remediation.md +66 -0
- package/areas/devops/infrastructure/workflows/module-development.md +101 -0
- package/areas/devops/infrastructure/workflows/provision-environment.md +96 -0
- package/areas/devops/kubernetes/AGENTS.md +57 -0
- package/areas/devops/kubernetes/PROMPTS.md +9 -0
- package/areas/devops/kubernetes/prompts/cluster-bootstrap.md +67 -0
- package/areas/devops/kubernetes/prompts/debug-workload.md +91 -0
- package/areas/devops/kubernetes/prompts/onboard-service.md +101 -0
- package/areas/devops/kubernetes/prompts/upgrade-cluster.md +63 -0
- package/areas/devops/kubernetes/rules/cluster-standards.md +51 -0
- package/areas/devops/kubernetes/rules/resource-governance.md +80 -0
- package/areas/devops/kubernetes/rules/upgrade-policy.md +52 -0
- package/areas/devops/kubernetes/rules/workload-security.md +64 -0
- package/areas/devops/kubernetes/skills/cluster-operations/SKILL.md +136 -0
- package/areas/devops/kubernetes/skills/helm-charts/SKILL.md +152 -0
- package/areas/devops/kubernetes/skills/network-policies/SKILL.md +169 -0
- package/areas/devops/kubernetes/skills/pod-troubleshooting/SKILL.md +129 -0
- package/areas/devops/kubernetes/skills/rbac-design/SKILL.md +148 -0
- package/areas/devops/kubernetes/skills/resource-tuning/SKILL.md +156 -0
- package/areas/devops/kubernetes/workflows/cluster-bootstrap.md +194 -0
- package/areas/devops/kubernetes/workflows/debug-workload.md +108 -0
- package/areas/devops/kubernetes/workflows/onboard-service.md +124 -0
- package/areas/devops/kubernetes/workflows/upgrade-cluster.md +165 -0
- package/areas/devops/networking/AGENTS.md +47 -0
- package/areas/devops/networking/prompts/onboard-ingress.md +119 -0
- package/areas/devops/networking/prompts/service-mesh-onboard.md +77 -0
- package/areas/devops/networking/rules/ingress-standards.md +17 -0
- package/areas/devops/networking/rules/network-segmentation.md +24 -0
- package/areas/devops/networking/rules/tls-policy.md +32 -0
- package/areas/devops/networking/skills/dns-management/SKILL.md +169 -0
- package/areas/devops/networking/skills/ingress-patterns/SKILL.md +165 -0
- package/areas/devops/networking/skills/service-mesh/SKILL.md +206 -0
- package/areas/devops/networking/skills/tls-termination/SKILL.md +198 -0
- package/areas/devops/networking/skills/vpc-design/SKILL.md +132 -0
- package/areas/devops/networking/workflows/onboard-ingress.md +64 -0
- package/areas/devops/networking/workflows/service-mesh-onboard.md +122 -0
- package/areas/devops/observability/AGENTS.md +48 -0
- package/areas/devops/observability/prompts/alert-investigation.md +117 -0
- package/areas/devops/observability/prompts/observability-stack-setup.md +99 -0
- package/areas/devops/observability/prompts/onboard-service-monitoring.md +79 -0
- package/areas/devops/observability/rules/alerting-standards.md +36 -0
- package/areas/devops/observability/rules/data-retention.md +19 -0
- package/areas/devops/observability/rules/golden-signals.md +28 -0
- package/areas/devops/observability/skills/distributed-tracing/SKILL.md +149 -0
- package/areas/devops/observability/skills/grafana-dashboards/SKILL.md +201 -0
- package/areas/devops/observability/skills/log-aggregation/SKILL.md +159 -0
- package/areas/devops/observability/skills/prometheus-alertmanager/SKILL.md +188 -0
- package/areas/devops/observability/skills/slo-implementation/SKILL.md +189 -0
- package/areas/devops/observability/workflows/alert-investigation.md +98 -0
- package/areas/devops/observability/workflows/observability-stack-setup.md +156 -0
- package/areas/devops/observability/workflows/onboard-service-monitoring.md +83 -0
- package/areas/devops/sre/AGENTS.md +48 -0
- package/areas/devops/sre/prompts/incident-response.md +129 -0
- package/areas/devops/sre/prompts/postmortem.md +101 -0
- package/areas/devops/sre/prompts/slo-review.md +125 -0
- package/areas/devops/sre/rules/error-budget-policy.md +25 -0
- package/areas/devops/sre/rules/on-call-standards.md +25 -0
- package/areas/devops/sre/rules/slo-policy.md +31 -0
- package/areas/devops/sre/skills/capacity-planning/SKILL.md +162 -0
- package/areas/devops/sre/skills/chaos-engineering/SKILL.md +186 -0
- package/areas/devops/sre/skills/incident-command/SKILL.md +119 -0
- package/areas/devops/sre/skills/postmortem-analysis/SKILL.md +104 -0
- package/areas/devops/sre/skills/slo-sli-design/SKILL.md +145 -0
- package/areas/devops/sre/workflows/incident-response.md +66 -0
- package/areas/devops/sre/workflows/postmortem.md +90 -0
- package/areas/devops/sre/workflows/slo-review.md +95 -0
- package/areas/software/backend/AGENTS.md +59 -0
- package/areas/software/backend/PROMPTS.md +50 -0
- package/areas/software/backend/README.md +48 -0
- package/areas/software/backend/prompts/add-migration.md +93 -0
- package/areas/software/backend/prompts/create-endpoint.md +97 -0
- package/areas/software/backend/prompts/debug-issue.md +87 -0
- package/areas/software/backend/prompts/develop-epic.md +83 -0
- package/areas/software/backend/prompts/develop-feature.md +91 -0
- package/areas/software/backend/prompts/refactor-module.md +79 -0
- package/areas/software/backend/prompts/test-feature.md +89 -0
- package/areas/software/backend/rules/architecture.md +20 -0
- package/areas/software/backend/rules/data_access.md +20 -0
- package/areas/software/backend/rules/security.md +20 -0
- package/areas/software/backend/rules/testing.md +19 -0
- package/areas/software/backend/skills/api-design/SKILL.md +170 -0
- package/areas/software/backend/skills/async-processing/SKILL.md +152 -0
- package/areas/software/backend/skills/database-modeling/SKILL.md +173 -0
- package/areas/software/backend/skills/observability/SKILL.md +162 -0
- package/areas/software/backend/skills/troubleshooting/SKILL.md +139 -0
- package/areas/software/backend/workflows/add-migration.md +79 -0
- package/areas/software/backend/workflows/create-endpoint.md +89 -0
- package/areas/software/backend/workflows/debug-issue.md +77 -0
- package/areas/software/backend/workflows/develop-epic.md +78 -0
- package/areas/software/backend/workflows/develop-feature.md +98 -0
- package/areas/software/backend/workflows/refactor-module.md +73 -0
- package/areas/software/backend/workflows/test-feature.md +67 -0
- package/areas/software/data-engineering/AGENTS.md +59 -0
- package/areas/software/data-engineering/PROMPTS.md +32 -0
- package/areas/software/data-engineering/prompts/backfill-data.md +107 -0
- package/areas/software/data-engineering/prompts/data-quality-incident.md +109 -0
- package/areas/software/data-engineering/prompts/lineage-trace.md +121 -0
- package/areas/software/data-engineering/prompts/new-model.md +117 -0
- package/areas/software/data-engineering/prompts/schema-migration.md +111 -0
- package/areas/software/data-engineering/rules/data-governance.md +11 -0
- package/areas/software/data-engineering/rules/pii-handling.md +19 -0
- package/areas/software/data-engineering/rules/pipeline-integrity.md +11 -0
- package/areas/software/data-engineering/rules/schema-management.md +21 -0
- package/areas/software/data-engineering/skills/data-modeling/SKILL.md +49 -0
- package/areas/software/data-engineering/skills/dbt-patterns/SKILL.md +43 -0
- package/areas/software/data-engineering/skills/lineage-governance/SKILL.md +38 -0
- package/areas/software/data-engineering/skills/orchestration/SKILL.md +35 -0
- package/areas/software/data-engineering/skills/quality-checks/SKILL.md +50 -0
- package/areas/software/data-engineering/skills/sql-optimization/SKILL.md +47 -0
- package/areas/software/data-engineering/skills/streaming-patterns/SKILL.md +48 -0
- package/areas/software/data-engineering/workflows/backfill-data.md +59 -0
- package/areas/software/data-engineering/workflows/data-quality-incident.md +64 -0
- package/areas/software/data-engineering/workflows/lineage-trace.md +56 -0
- package/areas/software/data-engineering/workflows/new-model.md +71 -0
- package/areas/software/data-engineering/workflows/schema-migration.md +67 -0
- package/areas/software/frontend/AGENTS.md +60 -0
- package/areas/software/frontend/PROMPTS.md +32 -0
- package/areas/software/frontend/prompts/a11y-fix.md +75 -0
- package/areas/software/frontend/prompts/bundle-analyze.md +75 -0
- package/areas/software/frontend/prompts/release-prep.md +83 -0
- package/areas/software/frontend/prompts/scaffold-component.md +69 -0
- package/areas/software/frontend/prompts/visual-regression.md +73 -0
- package/areas/software/frontend/rules/accessibility.md +16 -0
- package/areas/software/frontend/rules/architecture.md +29 -0
- package/areas/software/frontend/rules/performance.md +23 -0
- package/areas/software/frontend/rules/quality.md +12 -0
- package/areas/software/frontend/skills/a11y-audit/SKILL.md +61 -0
- package/areas/software/frontend/skills/api-integration/SKILL.md +58 -0
- package/areas/software/frontend/skills/component-design/SKILL.md +171 -0
- package/areas/software/frontend/skills/css-architecture/SKILL.md +146 -0
- package/areas/software/frontend/skills/error-handling/SKILL.md +55 -0
- package/areas/software/frontend/skills/performance-tuning/SKILL.md +58 -0
- package/areas/software/frontend/skills/state-management/SKILL.md +54 -0
- package/areas/software/frontend/skills/testing-patterns/SKILL.md +69 -0
- package/areas/software/frontend/workflows/a11y-fix.md +63 -0
- package/areas/software/frontend/workflows/bundle-analyze.md +56 -0
- package/areas/software/frontend/workflows/release-prep.md +66 -0
- package/areas/software/frontend/workflows/scaffold-component.md +67 -0
- package/areas/software/frontend/workflows/visual-regression.md +65 -0
- package/areas/software/full-stack/AGENTS.md +72 -0
- package/areas/software/full-stack/PROMPTS.md +66 -0
- package/areas/software/full-stack/prompts/backend-project-full-cycle.md +141 -0
- package/areas/software/full-stack/prompts/debug-issue.md +115 -0
- package/areas/software/full-stack/prompts/develop-feature.md +119 -0
- package/areas/software/full-stack/prompts/feature-implementation-flow.md +137 -0
- package/areas/software/full-stack/prompts/testing-ci-pipeline.md +119 -0
- package/areas/software/full-stack/rules/api-design-guide.md +24 -0
- package/areas/software/full-stack/rules/async-concurrency-guide.md +21 -0
- package/areas/software/full-stack/rules/backend-architecture-rule.md +41 -0
- package/areas/software/full-stack/rules/background-jobs-guide.md +20 -0
- package/areas/software/full-stack/rules/code-quality-guide.md +22 -0
- package/areas/software/full-stack/rules/database-access-guide.md +24 -0
- package/areas/software/full-stack/rules/database-migrations-guide.md +24 -0
- package/areas/software/full-stack/rules/domain-models-guide.md +28 -0
- package/areas/software/full-stack/rules/e2e-test-guide.md +18 -0
- package/areas/software/full-stack/rules/env-settings-guide.md +34 -0
- package/areas/software/full-stack/rules/error-handling-guide.md +20 -0
- package/areas/software/full-stack/rules/logging-observability-guide.md +22 -0
- package/areas/software/full-stack/rules/project-guide.md +34 -0
- package/areas/software/full-stack/rules/python-venv-guide.md +23 -0
- package/areas/software/full-stack/rules/security-guide.md +22 -0
- package/areas/software/full-stack/rules/svt-test-guide.md +17 -0
- package/areas/software/full-stack/rules/testing-ci-guide.md +25 -0
- package/areas/software/full-stack/skills/api-design-principles/SKILL.md +125 -0
- package/areas/software/full-stack/skills/api-design-principles/assets/api-design-checklist.md +155 -0
- package/areas/software/full-stack/skills/api-design-principles/assets/rest-api-template.py +182 -0
- package/areas/software/full-stack/skills/api-design-principles/references/graphql-schema-design.md +583 -0
- package/areas/software/full-stack/skills/api-design-principles/references/rest-best-practices.md +408 -0
- package/areas/software/full-stack/skills/api-design-principles/resources/implementation-playbook.md +513 -0
- package/areas/software/full-stack/skills/api-patterns/SKILL.md +81 -0
- package/areas/software/full-stack/skills/api-patterns/api-style.md +42 -0
- package/areas/software/full-stack/skills/api-patterns/auth.md +24 -0
- package/areas/software/full-stack/skills/api-patterns/documentation.md +26 -0
- package/areas/software/full-stack/skills/api-patterns/graphql.md +41 -0
- package/areas/software/full-stack/skills/api-patterns/rate-limiting.md +31 -0
- package/areas/software/full-stack/skills/api-patterns/response.md +37 -0
- package/areas/software/full-stack/skills/api-patterns/rest.md +40 -0
- package/areas/software/full-stack/skills/api-patterns/scripts/api_validator.py +211 -0
- package/areas/software/full-stack/skills/api-patterns/security-testing.md +122 -0
- package/areas/software/full-stack/skills/api-patterns/trpc.md +41 -0
- package/areas/software/full-stack/skills/api-patterns/versioning.md +22 -0
- package/areas/software/full-stack/skills/app-builder/SKILL.md +135 -0
- package/areas/software/full-stack/skills/app-builder/agent-coordination.md +71 -0
- package/areas/software/full-stack/skills/app-builder/feature-building.md +53 -0
- package/areas/software/full-stack/skills/app-builder/project-detection.md +34 -0
- package/areas/software/full-stack/skills/app-builder/scaffolding.md +118 -0
- package/areas/software/full-stack/skills/app-builder/tech-stack.md +40 -0
- package/areas/software/full-stack/skills/app-builder/templates/SKILL.md +39 -0
- package/areas/software/full-stack/skills/app-builder/templates/astro-static/TEMPLATE.md +76 -0
- package/areas/software/full-stack/skills/app-builder/templates/chrome-extension/TEMPLATE.md +92 -0
- package/areas/software/full-stack/skills/app-builder/templates/cli-tool/TEMPLATE.md +88 -0
- package/areas/software/full-stack/skills/app-builder/templates/electron-desktop/TEMPLATE.md +88 -0
- package/areas/software/full-stack/skills/app-builder/templates/express-api/TEMPLATE.md +83 -0
- package/areas/software/full-stack/skills/app-builder/templates/flutter-app/TEMPLATE.md +90 -0
- package/areas/software/full-stack/skills/app-builder/templates/monorepo-turborepo/TEMPLATE.md +90 -0
- package/areas/software/full-stack/skills/app-builder/templates/nextjs-fullstack/TEMPLATE.md +82 -0
- package/areas/software/full-stack/skills/app-builder/templates/nextjs-saas/TEMPLATE.md +100 -0
- package/areas/software/full-stack/skills/app-builder/templates/nextjs-static/TEMPLATE.md +106 -0
- package/areas/software/full-stack/skills/app-builder/templates/nuxt-app/TEMPLATE.md +101 -0
- package/areas/software/full-stack/skills/app-builder/templates/python-fastapi/TEMPLATE.md +83 -0
- package/areas/software/full-stack/skills/app-builder/templates/react-native-app/TEMPLATE.md +93 -0
- package/areas/software/full-stack/skills/backend-developer/SKILL.md +58 -0
- package/areas/software/full-stack/skills/bash-pro/SKILL.md +310 -0
- package/areas/software/full-stack/skills/blackbox-test/SKILL.md +84 -0
- package/areas/software/full-stack/skills/prompt-project-planner/SKILL.md +130 -0
- package/areas/software/full-stack/skills/prompt-project-planner/output.schema.md +68 -0
- package/areas/software/full-stack/skills/prompt-project-planner/questions.md +80 -0
- package/areas/software/full-stack/skills/python-pro/SKILL.md +158 -0
- package/areas/software/full-stack/skills/skill-creator/LICENSE.txt +202 -0
- package/areas/software/full-stack/skills/skill-creator/SKILL.md +356 -0
- package/areas/software/full-stack/skills/skill-creator/references/output-patterns.md +82 -0
- package/areas/software/full-stack/skills/skill-creator/references/workflows.md +28 -0
- package/areas/software/full-stack/skills/skill-creator/scripts/init_skill.py +303 -0
- package/areas/software/full-stack/skills/skill-creator/scripts/package_skill.py +110 -0
- package/areas/software/full-stack/skills/skill-creator/scripts/quick_validate.py +95 -0
- package/areas/software/full-stack/workflows/backend-project-full-cycle.md +132 -0
- package/areas/software/full-stack/workflows/debug-issue.md +70 -0
- package/areas/software/full-stack/workflows/develop-feature.md +85 -0
- package/areas/software/full-stack/workflows/feature-implementation-flow.md +78 -0
- package/areas/software/full-stack/workflows/testing-ci-pipeline.md +65 -0
- package/areas/software/general/AGENTS.md +68 -0
- package/areas/software/general/prompts/code-review-workflow.md +87 -0
- package/areas/software/general/prompts/development-cycle-workflow.md +83 -0
- package/areas/software/general/prompts/project-setup-workflow.md +93 -0
- package/areas/software/general/rules/code-style-guide.md +31 -0
- package/areas/software/general/rules/docker-compose-guide.md +27 -0
- package/areas/software/general/rules/git-workflow-guide.md +27 -0
- package/areas/software/general/rules/github-workflow-guide.md +27 -0
- package/areas/software/general/rules/gitlab-ci-guide.md +27 -0
- package/areas/software/general/rules/lint-format-guide.md +29 -0
- package/areas/software/general/rules/makefile-guide.md +34 -0
- package/areas/software/general/rules/readme-sync-guide.md +40 -0
- package/areas/software/general/rules/sdlc-methodology-guide.md +27 -0
- package/areas/software/general/rules/sdlc-role-responsibilities.md +108 -0
- package/areas/software/general/skills/general-dev-tools/SKILL.md +324 -0
- package/areas/software/general/workflows/code-review-workflow.md +84 -0
- package/areas/software/general/workflows/development-cycle-workflow.md +85 -0
- package/areas/software/general/workflows/project-setup-workflow.md +94 -0
- package/areas/software/mlops/AGENTS.md +57 -0
- package/areas/software/mlops/PROMPTS.md +32 -0
- package/areas/software/mlops/prompts/champion-challenger.md +87 -0
- package/areas/software/mlops/prompts/deploy-endpoint.md +91 -0
- package/areas/software/mlops/prompts/evaluate-model.md +87 -0
- package/areas/software/mlops/prompts/model-incident.md +87 -0
- package/areas/software/mlops/prompts/train-experiment.md +83 -0
- package/areas/software/mlops/rules/data-integrity.md +9 -0
- package/areas/software/mlops/rules/model-governance.md +9 -0
- package/areas/software/mlops/rules/production-safety.md +9 -0
- package/areas/software/mlops/rules/reproducibility.md +9 -0
- package/areas/software/mlops/skills/experiment-tracking/SKILL.md +29 -0
- package/areas/software/mlops/skills/feature-engineering/SKILL.md +44 -0
- package/areas/software/mlops/skills/inference-serving/SKILL.md +35 -0
- package/areas/software/mlops/skills/model-evaluation/SKILL.md +40 -0
- package/areas/software/mlops/skills/model-monitoring/SKILL.md +32 -0
- package/areas/software/mlops/workflows/champion-challenger.md +65 -0
- package/areas/software/mlops/workflows/deploy-endpoint.md +70 -0
- package/areas/software/mlops/workflows/evaluate-model.md +63 -0
- package/areas/software/mlops/workflows/model-incident.md +64 -0
- package/areas/software/mlops/workflows/train-experiment.md +56 -0
- package/areas/software/mobile/AGENTS.md +58 -0
- package/areas/software/mobile/PROMPTS.md +32 -0
- package/areas/software/mobile/prompts/crash-triage.md +63 -0
- package/areas/software/mobile/prompts/device-testing.md +83 -0
- package/areas/software/mobile/prompts/ota-update.md +75 -0
- package/areas/software/mobile/prompts/release-build.md +67 -0
- package/areas/software/mobile/prompts/store-submission.md +79 -0
- package/areas/software/mobile/rules/offline-first.md +10 -0
- package/areas/software/mobile/rules/performance-budget.md +20 -0
- package/areas/software/mobile/rules/platform-compliance.md +17 -0
- package/areas/software/mobile/rules/security-mobile.md +9 -0
- package/areas/software/mobile/skills/app-store-prep/SKILL.md +27 -0
- package/areas/software/mobile/skills/mobile-testing/SKILL.md +36 -0
- package/areas/software/mobile/skills/native-modules/SKILL.md +38 -0
- package/areas/software/mobile/skills/navigation-patterns/SKILL.md +49 -0
- package/areas/software/mobile/skills/push-notifications/SKILL.md +40 -0
- package/areas/software/mobile/skills/state-sync/SKILL.md +48 -0
- package/areas/software/mobile/workflows/crash-triage.md +63 -0
- package/areas/software/mobile/workflows/device-testing.md +54 -0
- package/areas/software/mobile/workflows/ota-update.md +54 -0
- package/areas/software/mobile/workflows/release-build.md +67 -0
- package/areas/software/mobile/workflows/store-submission.md +63 -0
- package/areas/software/platform/AGENTS.md +67 -0
- package/areas/software/platform/PROMPTS.md +32 -0
- package/areas/software/platform/prompts/cost-audit.md +117 -0
- package/areas/software/platform/prompts/deploy-production.md +109 -0
- package/areas/software/platform/prompts/drift-check.md +107 -0
- package/areas/software/platform/prompts/incident-response.md +121 -0
- package/areas/software/platform/prompts/provision-env.md +113 -0
- package/areas/software/platform/rules/cost-governance.md +11 -0
- package/areas/software/platform/rules/immutability.md +17 -0
- package/areas/software/platform/rules/reliability.md +19 -0
- package/areas/software/platform/rules/security-posture.md +12 -0
- package/areas/software/platform/skills/ci-cd-pipelines/SKILL.md +58 -0
- package/areas/software/platform/skills/incident-response/SKILL.md +41 -0
- package/areas/software/platform/skills/k8s-manifests/SKILL.md +56 -0
- package/areas/software/platform/skills/networking/SKILL.md +44 -0
- package/areas/software/platform/skills/observability-setup/SKILL.md +49 -0
- package/areas/software/platform/skills/secrets-management/SKILL.md +43 -0
- package/areas/software/platform/skills/terraform-patterns/SKILL.md +75 -0
- package/areas/software/platform/workflows/cost-audit.md +61 -0
- package/areas/software/platform/workflows/deploy-production.md +67 -0
- package/areas/software/platform/workflows/drift-check.md +61 -0
- package/areas/software/platform/workflows/incident-response.md +69 -0
- package/areas/software/platform/workflows/provision-env.md +77 -0
- package/areas/software/qa/AGENTS.md +58 -0
- package/areas/software/qa/PROMPTS.md +32 -0
- package/areas/software/qa/prompts/flakiness-investigation.md +61 -0
- package/areas/software/qa/prompts/performance-audit.md +65 -0
- package/areas/software/qa/prompts/regression-suite.md +61 -0
- package/areas/software/qa/prompts/smoke-test.md +65 -0
- package/areas/software/qa/prompts/test-coverage-report.md +61 -0
- package/areas/software/qa/rules/flakiness-policy.md +12 -0
- package/areas/software/qa/rules/quality-gates.md +28 -0
- package/areas/software/qa/rules/test-data.md +9 -0
- package/areas/software/qa/rules/test-strategy.md +11 -0
- package/areas/software/qa/skills/accessibility-testing/SKILL.md +139 -0
- package/areas/software/qa/skills/api-testing/SKILL.md +140 -0
- package/areas/software/qa/skills/e2e-patterns/SKILL.md +152 -0
- package/areas/software/qa/skills/performance-testing/SKILL.md +177 -0
- package/areas/software/qa/skills/test-data-management/SKILL.md +161 -0
- package/areas/software/qa/skills/test-pyramid/SKILL.md +127 -0
- package/areas/software/qa/workflows/flakiness-investigation.md +63 -0
- package/areas/software/qa/workflows/performance-audit.md +59 -0
- package/areas/software/qa/workflows/regression-suite.md +59 -0
- package/areas/software/qa/workflows/smoke-test.md +64 -0
- package/areas/software/qa/workflows/test-coverage-report.md +57 -0
- package/areas/software/security/AGENTS.md +58 -0
- package/areas/software/security/PROMPTS.md +32 -0
- package/areas/software/security/prompts/compliance-report.md +113 -0
- package/areas/software/security/prompts/pen-test-sim.md +113 -0
- package/areas/software/security/prompts/secret-rotation.md +115 -0
- package/areas/software/security/prompts/security-scan.md +91 -0
- package/areas/software/security/prompts/threat-model-review.md +105 -0
- package/areas/software/security/rules/compliance-baseline.md +23 -0
- package/areas/software/security/rules/dependency-policy.md +12 -0
- package/areas/software/security/rules/secrets-policy.md +22 -0
- package/areas/software/security/rules/secure-coding.md +22 -0
- package/areas/software/security/skills/auth-patterns/SKILL.md +42 -0
- package/areas/software/security/skills/crypto-standards/SKILL.md +42 -0
- package/areas/software/security/skills/dependency-audit/SKILL.md +29 -0
- package/areas/software/security/skills/sast-dast-interpretation/SKILL.md +33 -0
- package/areas/software/security/skills/security-headers/SKILL.md +29 -0
- package/areas/software/security/skills/threat-modeling/SKILL.md +36 -0
- package/areas/software/security/workflows/compliance-report.md +57 -0
- package/areas/software/security/workflows/pen-test-sim.md +63 -0
- package/areas/software/security/workflows/secret-rotation.md +67 -0
- package/areas/software/security/workflows/security-scan.md +64 -0
- package/areas/software/security/workflows/threat-model-review.md +62 -0
- package/areas/template/AGENTS-area.tmpl.md +61 -0
- package/areas/template/AGENTS.tmpl.md +67 -0
- package/areas/template/GUIDE.md +102 -0
- package/areas/template/PROMPTS.tmpl.md +29 -0
- package/areas/template/README.md +57 -0
- package/areas/template/README.tmpl.md +51 -0
- package/areas/template/prompt.tmpl.md +101 -0
- package/areas/template/rule.tmpl.md +71 -0
- package/areas/template/skill.tmpl.md +108 -0
- package/areas/template/workflow.tmpl.md +104 -0
- package/bin/agentic.js +24 -0
- package/extensions/antigravity/GEMINI.md +10 -0
- package/extensions/claude/CLAUDE.md +10 -0
- package/extensions/codex/AGENTS.override.md +93 -0
- package/extensions/gemini/GEMINI.md +10 -0
- package/extensions/opencode/agents/designer.md +65 -0
- package/extensions/opencode/agents/developer.md +63 -0
- package/extensions/opencode/agents/devops-engineer.md +69 -0
- package/extensions/opencode/agents/pm.md +61 -0
- package/extensions/opencode/agents/product-owner.md +76 -0
- package/extensions/opencode/agents/qa.md +66 -0
- package/extensions/opencode/agents/team-lead.md +67 -0
- package/extensions/opencode/commands/feature.md +75 -0
- package/extensions/opencode/opencode.json +93 -0
- package/extensions/opencode/plugins/model-checker.json +14 -0
- package/extensions/opencode/plugins/model-checker.ts +279 -0
- package/extensions/opencode/plugins/sound-notification.ts +13 -0
- package/extensions/opencode/plugins/telegram-notification.ts +86 -0
- package/extensions/opencode/skills/code_review_expert/SKILL.md +144 -0
- package/extensions/opencode/skills/design_expert/SKILL.md +42 -0
- package/extensions/opencode/skills/qa_expert/SKILL.md +116 -0
- package/package.json +19 -0
@@ -0,0 +1,159 @@
---
name: log-aggregation
type: skill
description: Set up Loki or ELK log aggregation for K8s workloads, covering structured logging, log routing, and log-based alerting.
related-rules:
  - golden-signals.md
  - data-retention.md
allowed-tools: Read, Write, Edit, Bash
---

# Skill: Log Aggregation

> **Expertise:** Loki (Grafana stack), Promtail/Fluent Bit, structured JSON logging, log-based alerting, ELK basics.

## When to load

When setting up log collection, writing log queries, debugging missing logs, or adding log-based alerts.

## Loki Stack (K8s, recommended)

```yaml
# Promtail runs as a DaemonSet and auto-discovers K8s pod logs.
# Install via helm (shell commands, commented out so this block stays valid YAML):
#   helm repo add grafana https://grafana.github.io/helm-charts
#   helm upgrade --install loki grafana/loki-stack \
#     -n monitoring \
#     -f loki-values.yaml

# loki-values.yaml
loki:
  config:                        # loki-stack passes Loki's server config via loki.config
    auth_enabled: false
    limits_config:
      retention_period: 720h     # 30 days
      ingestion_rate_mb: 16
      max_streams_per_user: 10000
    compactor:
      retention_enabled: true    # retention_period is not enforced without this
    storage_config:
      boltdb_shipper:
        active_index_directory: /data/loki/boltdb-index
      filesystem:
        directory: /data/loki/chunks

promtail:
  config:
    clients:
      - url: http://loki:3100/loki/api/v1/push
    snippets:
      # The chart already ships a kubernetes-pods scrape job (role: pod plus
      # relabeling); override only the stages applied to each log line.
      pipelineStages:
        - cri: {}                # parse containerd/CRI-O log format (use docker: {} on Docker-runtime nodes)
        - json:                  # extract fields from app JSON logs
            expressions:
              level: level
              trace_id: trace_id
              service: service
        - labels:                # promote only low-cardinality fields to labels
            level:
            service:
```

## LogQL Queries

```logql
# All error lines from one service (the time range comes from the query range,
# e.g. Grafana's time picker; |= is a case-sensitive substring match)
{namespace="production", app="order-service"} |= "ERROR"

# Parse JSON and filter by extracted fields
{namespace="production"} | json | level="error" | trace_id != ""

# Count errors per service over 5m (for alerting)
sum by (service) (
  count_over_time({namespace="production"} | json | level="error" [5m])
)

# Log lines per second per app (to detect a log explosion)
sum(rate({namespace="production"}[5m])) by (app)

# Find slow requests from logs (response_time_ms is compared as a number)
{app="api-gateway"} | json | response_time_ms > 500
```
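
To make the per-service error count concrete, here is a minimal Python sketch of what `sum by (service) (count_over_time(... | json | level="error" [5m]))` computes over raw JSON lines. It is an illustration, not Loki's implementation: it reads time from a hypothetical `timestamp` field, whereas Loki uses each entry's own timestamp (assigned at ingestion or by a timestamp stage), and it simply skips lines that fail to parse, which the `level="error"` filter would also exclude in Loki.

```python
import json
from collections import Counter
from datetime import datetime, timedelta, timezone


def count_errors_by_service(lines, now, window=timedelta(minutes=5)):
    """Approximate sum by (service)(count_over_time({...} | json | level="error" [5m]))."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)  # the `| json` stage
        except json.JSONDecodeError:
            continue  # unparseable lines never match level="error"
        ts = datetime.fromisoformat(entry["timestamp"])  # hypothetical field name
        if now - window < ts <= now and entry.get("level") == "error":
            counts[entry.get("service", "")] += 1
    return dict(counts)
```

Because the grouping happens after parsing, `service` does not have to be a stream label for this query to work, which is one reason to keep high-cardinality fields in the log body.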
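
## Structured Logging Standards

```python
# Python — structlog, configured to emit one JSON object per line
import structlog
from opentelemetry import trace  # requires the opentelemetry-api package


def add_trace_context(logger, method_name, event_dict):
    # Copy the active OpenTelemetry trace/span IDs into every event
    ctx = trace.get_current_span().get_span_context()
    if ctx.is_valid:
        event_dict["trace_id"] = format(ctx.trace_id, "032x")
        event_dict["span_id"] = format(ctx.span_id, "016x")
    return event_dict


structlog.configure(processors=[
    structlog.processors.add_log_level,
    structlog.processors.TimeStamper(fmt="iso"),
    add_trace_context,
    structlog.processors.JSONRenderer(),
])

# Always include: service, version, trace_id, span_id, level (values here are hypothetical)
log = structlog.get_logger().bind(service="order-service", version="1.4.2")

log.info("order.created",
    order_id="ord-123",
    user_id="usr-456",  # OK in logs; NOT as a metrics label
    amount_cents=4999,
)

# Output (one JSON line; key order may vary; trace_id/span_id only when a span is active):
# {"service": "order-service", "version": "1.4.2", "order_id": "ord-123", ...,
#  "event": "order.created", "level": "info", "timestamp": "...", "trace_id": "...", "span_id": "..."}
```

```go
// Go — log/slog (stdlib, Go 1.21+), trace ID read from the OpenTelemetry span in ctx
package main

import (
	"context"
	"log/slog"
	"os"

	"go.opentelemetry.io/otel/trace"
)

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{Level: slog.LevelInfo}))
	slog.SetDefault(logger)

	ctx := context.Background() // in a request handler, use the request's context
	sc := trace.SpanContextFromContext(ctx)
	slog.InfoContext(ctx, "order.created",
		"order_id", "ord-123",
		"amount_cents", 4999,
		"trace_id", sc.TraceID().String(), // all zeros when no span is active
	)
}
```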
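
If a service uses the standard `logging` module rather than structlog, the same JSON shape can come from a formatter plus a filter that stamps trace IDs onto each record. This is a minimal sketch under stated assumptions: `TraceContextFilter`, `JsonFormatter`, `build_logger`, and the `get_ids` callable are hypothetical names, and `get_ids` stands in for whatever reads the current span from your tracing library.

```python
import json
import logging


class TraceContextFilter(logging.Filter):
    """Hypothetical filter: attaches trace/span IDs from a caller-supplied function."""

    def __init__(self, get_ids):
        super().__init__()
        self.get_ids = get_ids  # returns (trace_id, span_id)

    def filter(self, record):
        record.trace_id, record.span_id = self.get_ids()
        return True  # enrich only, never drop


class JsonFormatter(logging.Formatter):
    """Renders each record as one JSON object; per-call fields come from extra={"fields": {...}}."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        doc = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "service": self.service,
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        doc.update(getattr(record, "fields", {}))
        return json.dumps(doc)


def build_logger(service, get_ids, stream=None):
    handler = logging.StreamHandler(stream)  # None means stderr
    handler.setFormatter(JsonFormatter(service))
    handler.addFilter(TraceContextFilter(get_ids))
    logger = logging.getLogger(service)
    logger.handlers[:] = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Usage: `build_logger("order-service", get_ids).info("order.created", extra={"fields": {"order_id": "ord-123"}})`. Keeping business fields under one `fields` key avoids clashing with `LogRecord`'s built-in attribute names.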

## Log-Based Alerting (Loki ruler)

```yaml
# loki-rules.yaml
groups:
  - name: application.logs
    rules:
      - alert: HighErrorLogRate
        expr: |
          sum(rate({namespace="production"} | json | level="error" [5m])) by (app)
            > 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Error log rate > 10/s — {{ $labels.app }}"
          runbook_url: "https://runbooks.internal/high-error-logs"
```
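
The `for: 2m` clause means the expression has to keep returning a result on every evaluation for two minutes before the alert fires; until then it is only pending. A minimal Python sketch of that state machine, simplified from the Prometheus/Loki ruler behaviour (it ignores `keep_firing_for` and anything Alertmanager does afterwards):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    state: str = "inactive"               # inactive | pending | firing
    active_since: Optional[float] = None  # timestamp of the first true evaluation


def evaluate(prev: AlertState, condition_true: bool, now: float, for_seconds: float) -> AlertState:
    """One rule evaluation with simplified `for:` semantics."""
    if not condition_true:
        # Expression returned nothing: the alert resolves immediately.
        return AlertState()
    since = prev.active_since if prev.active_since is not None else now
    state = "firing" if now - since >= for_seconds else "pending"
    return AlertState(state, since)
```

With a 1-minute evaluation interval and `for: 2m`, a burst that lasts only 90 seconds never fires, which is exactly the noise the clause is there to absorb.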

## Fluent Bit (alternative — lower resource usage)

```ini
# fluent-bit.conf key of the fluent-bit-config ConfigMap (classic config format, not YAML)
[INPUT]
    Name              tail
    # Tag must start with kube. so the FILTER/OUTPUT "Match kube.*" rules apply
    Tag               kube.*
    Path              /var/log/containers/*.log
    # Handles both Docker JSON and containerd/CRI-O log formats
    multiline.parser  docker, cri
    Refresh_Interval  5

[FILTER]
    Name              kubernetes
    Match             kube.*
    Kube_Tag_Prefix   kube.var.log.containers.
    Merge_Log         On
    Keep_Log          Off

[OUTPUT]
    Name              loki
    Match             kube.*
    Host              loki
    Port              3100
    Labels            job=fluent-bit, namespace=$kubernetes['namespace_name']
```
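
The kubernetes filter finds pod metadata by parsing the record's tag after stripping `Kube_Tag_Prefix`, so the tail input's tag has to carry the container log file name. A simplified Python sketch of that derivation, assuming the `<pod>_<namespace>_<container>-<64-hex-id>.log` file naming used under `/var/log/containers/` (the regex is a reduced version of the filter's default, and `tag_for`/`kube_meta` are hypothetical helper names):

```python
import re

KUBE_TAG_PREFIX = "kube.var.log.containers."

# Reduced form of the filter's default tag regex: <pod>_<namespace>_<container>-<id>.log
TAG_RE = re.compile(
    r"^(?P<pod_name>[a-z0-9][-a-z0-9.]*)_(?P<namespace_name>[^_]+)_"
    r"(?P<container_name>.+)-(?P<container_id>[a-z0-9]{64})\.log$"
)


def tag_for(path, tag_pattern="kube.*"):
    """tail expands '*' in Tag to the file path with '/' turned into '.'."""
    return tag_pattern.replace("*", path.lstrip("/").replace("/", "."))


def kube_meta(tag):
    """Return pod/namespace/container parsed from the tag, or None if it cannot match."""
    if not tag.startswith(KUBE_TAG_PREFIX):
        return None  # e.g. a default tail tag like "tail.0": nothing to look up
    m = TAG_RE.match(tag[len(KUBE_TAG_PREFIX):])
    return m.groupdict() if m else None
```

If the input has no `Tag kube.*`, records keep a default tag, `kube_meta` returns `None`, and neither the filter nor a `Match kube.*` output ever sees them.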

@@ -0,0 +1,188 @@
---
name: prometheus-alertmanager
type: skill
description: Write production-quality Prometheus alert rules, recording rules, and Alertmanager routing configs.
related-rules:
  - golden-signals.md
  - alerting-standards.md
allowed-tools: Read, Write, Edit, Bash
---

# Skill: Prometheus & Alertmanager

> **Expertise:** PromQL, alert rules, recording rules, Alertmanager routing, inhibition, silences.

## When to load

When writing alert rules, debugging PromQL, configuring Alertmanager routing, or investigating a firing alert.

## Golden Signal Alert Rules

```yaml
# alerts/service-golden-signals.yaml
groups:
  - name: service.golden-signals
    rules:

      # ── Errors ────────────────────────────────────────
      - alert: HighErrorRate
        expr: |
          (
            sum(rate(http_requests_total{status=~"5.."}[5m])) by (namespace, service)
            /
            sum(rate(http_requests_total[5m])) by (namespace, service)
          ) > 0.01
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Error rate > 1% — {{ $labels.service }} in {{ $labels.namespace }}"
          description: "Current error rate: {{ $value | humanizePercentage }}"
          runbook_url: "https://runbooks.internal/high-error-rate"

      # ── Latency ───────────────────────────────────────
      - alert: HighP99Latency
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (namespace, service, le)
          ) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p99 latency > 1s — {{ $labels.service }}"
          description: "p99: {{ $value | humanizeDuration }}"
          runbook_url: "https://runbooks.internal/high-latency"

      # ── Saturation ────────────────────────────────────
      - alert: PodMemoryPressure
        # Only containers that have a memory limit: a 0 limit would divide to +Inf and fire.
        expr: |
          (
            container_memory_working_set_bytes{container!=""}
            /
            (container_spec_memory_limit_bytes{container!=""} > 0)
          ) > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory > 85% limit — {{ $labels.container }} in {{ $labels.namespace }}"
          runbook_url: "https://runbooks.internal/memory-pressure"

      # ── Traffic Drop ──────────────────────────────────
      - alert: TrafficDrop
        expr: |
          (
            sum(rate(http_requests_total[5m])) by (service)
            /
            sum(rate(http_requests_total[1h] offset 5m)) by (service)
          ) < 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Traffic dropped > 50% vs the previous hour's average — {{ $labels.service }}"
          runbook_url: "https://runbooks.internal/traffic-drop"
```
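
`histogram_quantile` never sees individual latencies: it finds the bucket that holds the target rank and interpolates linearly inside it. A simplified Python sketch of that estimate, assuming cumulative `(le, count)` pairs sorted by `le`, a final `+Inf` bucket, and a lowest bucket starting at 0 (true for duration histograms):

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative [(le, count), ...] buckets."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                return prev_le  # rank lies in the +Inf bucket: report the highest finite bound
            # linear interpolation inside the bucket that contains the rank
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return prev_le
```

One practical consequence: the estimate can never exceed the largest finite bucket bound, so if the highest finite `le` is `1`, the `> 1.0` threshold in `HighP99Latency` can never be crossed. Bucket boundaries should extend well past any alert threshold.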

## Recording Rules (pre-aggregate expensive queries)

```yaml
groups:
  - name: service.recording
    interval: 1m
    rules:
      # Pre-compute the error ratio so dashboards read one stored series
      # instead of re-running two rate() sums on every refresh
      - record: job:http_requests:error_rate5m
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (job, namespace)
          /
          sum(rate(http_requests_total[5m])) by (job, namespace)

      # Pre-compute p99 (histogram_quantile over every bucket series is expensive)
      - record: job:http_request_duration_seconds:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (job, le)
          )
```

## PromQL Patterns

```promql
# Request rate: use rate() on counters for alerts and recording rules;
# irate() only looks at the last two samples and is too spiky for alerting
rate(http_requests_total[5m])

# Error ratio
sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
  / sum(rate(http_requests_total[5m])) by (service)

# Memory utilisation (working set vs limit; skips containers without a limit)
container_memory_working_set_bytes{container!=""}
  / (container_spec_memory_limit_bytes{container!=""} > 0)

# CPU throttling ratio: share of CFS periods that were throttled (> 25% suggests the limit is too low)
sum(rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) by (pod)
  / sum(rate(container_cpu_cfs_periods_total{container!=""}[5m])) by (pod)

# Absent metric (detect missing scrape targets)
absent(up{job="my-service"} == 1)
```
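
`rate()` has to survive counter resets, which happen whenever a pod restarts and its counters start again from zero. A simplified Python sketch of the reset handling, under the stated simplification that it does not extrapolate to the window edges the way Prometheus does, so real `rate()` values will differ slightly:

```python
def simple_rate(samples):
    """Per-second increase of a counter from [(timestamp_s, value), ...] samples.

    A drop in value is treated as a reset, so the post-reset value counts as
    fresh increase. No extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        return None  # rate() needs at least two samples in the range
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur  # reset: counter restarted from 0
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

A naive `(last - first) / elapsed` over the same samples would go negative across a restart, which is why raw counter values should never be subtracted directly in alerts.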

## Alertmanager Config

```yaml
# alertmanager.yml
global:
  resolve_timeout: 5m
  slack_api_url: https://hooks.slack.com/...

route:
  receiver: slack-warning
  group_by: [alertname, namespace, service]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Child routes are tried in order and the first match wins, so the heartbeat
    # route comes first to keep Watchdog out of the paging route.
    - matchers: [alertname="Watchdog"]
      receiver: deadman-snitch      # heartbeat alert
      repeat_interval: 5m           # re-send often so the heartbeat never looks stale
    - matchers: [severity="critical"]
      receiver: pagerduty
      group_wait: 0s                # page immediately

inhibit_rules:
  # If a service is down (critical), suppress its latency/error warnings
  - source_matchers: [severity="critical", alertname="ServiceDown"]
    target_matchers: [severity="warning"]
    equal: [namespace, service]

receivers:
  - name: pagerduty
    pagerduty_configs:
      # Placeholder: Alertmanager does not expand environment variables in its
      # config file, so substitute the real key at deploy time.
      - routing_key: $PD_ROUTING_KEY
        description: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'

  - name: slack-warning
    slack_configs:
      - channel: '#alerts-warning'
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'

  # Every receiver referenced in a route must be defined, or the config is rejected
  - name: deadman-snitch
    webhook_configs:
      - url: https://heartbeat.example.com/...   # placeholder: your dead man's switch check-in URL
```
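
Route order matters because Alertmanager walks the child routes in order and stops at the first match unless that route sets `continue: true`; if no child matches, the parent's receiver is used. A minimal Python sketch of that walk, assuming equality-only matchers and ignoring grouping, timing, and regex matchers (`Route` and `receivers_for` are hypothetical names):

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    receiver: str
    matchers: dict = field(default_factory=dict)  # label -> required value
    continue_: bool = False                       # Alertmanager's `continue` option
    routes: list = field(default_factory=list)


def receivers_for(route, labels):
    """Depth-first routing: first matching child wins unless it sets continue."""
    if any(labels.get(k) != v for k, v in route.matchers.items()):
        return []  # this route does not match the alert
    matched = []
    for child in route.routes:
        found = receivers_for(child, labels)
        if found:
            matched.extend(found)
            if not child.continue_:
                break
    return matched or [route.receiver]  # no child matched: use this route's receiver
```

An alert labelled both `alertname="Watchdog"` and `severity="critical"` lands on whichever of those two routes is listed first, so the heartbeat route belongs above the paging route.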

## Debugging Alerts

```bash
# Service names vary by install (e.g. kube-prometheus-stack also creates
# alertmanager-operated / prometheus-operated); check with `kubectl get svc -n monitoring`.

# Check currently firing alerts (keep the port-forward running in its own terminal)
kubectl port-forward svc/alertmanager 9093:9093 -n monitoring
# Open http://localhost:9093

# Evaluate a PromQL expression (check why an alert fired or didn't fire)
kubectl port-forward svc/prometheus 9090:9090 -n monitoring
# Open http://localhost:9090/graph

# Check a rule's evaluation state, health, and last error
curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[].rules[] | select(.name=="HighErrorRate")'

# Silence a noisy alert during maintenance
amtool silence add alertname="HighErrorRate" namespace="staging" \
  --alertmanager.url=http://localhost:9093 \
  --duration=2h --comment="Scheduled maintenance window"
```

@@ -0,0 +1,189 @@
---
name: slo-implementation
type: skill
description: Implement SLOs end-to-end in Prometheus — recording rules, burn rate alerts, error budget dashboards, and Sloth/Pyrra integration.
related-rules:
  - golden-signals.md
  - alerting-standards.md
allowed-tools: Read, Write, Edit, Bash
---

# Skill: SLO Implementation

> **Expertise:** Prometheus recording rules for SLOs, multi-window burn rate alerts, Sloth code generation, error budget Grafana panels.

## When to load

When implementing SLOs for a service in Prometheus, setting up burn rate alerts, or creating error budget dashboards.

## Full SLO Stack (single service)

### Step 1: Define the SLI Recording Rules

```yaml
# prometheus-rules/slo-checkout-service.yaml
groups:
  - name: slo:checkout-service:recording
    interval: 30s
    rules:
      # Good requests: 2xx responses (availability SLI). Any non-2xx, including 4xx,
      # counts as bad here, while the Sloth spec in Step 3 counts only 5xx as errors;
      # pick one definition and use it everywhere.
      # For a latency SLI (requests under 500ms), count the histogram bucket instead:
      #   sum(rate(http_request_duration_seconds_bucket{service="checkout-service", le="0.5"}[5m]))
      - record: slo:http_requests_good:rate5m
        labels: { service: checkout-service }
        expr: |
          sum(rate(http_requests_total{
            service="checkout-service",
            status=~"2.."
          }[5m]))

      - record: slo:http_requests_total:rate5m
        labels: { service: checkout-service }
        expr: |
          sum(rate(http_requests_total{service="checkout-service"}[5m]))

      # SLI ratio (5m window)
      - record: slo:http_availability:ratio_rate5m
        labels: { service: checkout-service }
        expr: |
          slo:http_requests_good:rate5m{service="checkout-service"}
          / slo:http_requests_total:rate5m{service="checkout-service"}

      # Pre-compute multiple windows for burn rate alerts
      - record: slo:http_availability:ratio_rate30m
        labels: { service: checkout-service }
        expr: |
          sum(rate(http_requests_total{service="checkout-service",status=~"2.."}[30m]))
          / sum(rate(http_requests_total{service="checkout-service"}[30m]))

      - record: slo:http_availability:ratio_rate1h
        labels: { service: checkout-service }
        expr: |
          sum(rate(http_requests_total{service="checkout-service",status=~"2.."}[1h]))
          / sum(rate(http_requests_total{service="checkout-service"}[1h]))

      - record: slo:http_availability:ratio_rate6h
        labels: { service: checkout-service }
        expr: |
          sum(rate(http_requests_total{service="checkout-service",status=~"2.."}[6h]))
          / sum(rate(http_requests_total{service="checkout-service"}[6h]))

      - record: slo:http_availability:ratio_rate1d
        labels: { service: checkout-service }
        expr: |
          sum(rate(http_requests_total{service="checkout-service",status=~"2.."}[1d]))
          / sum(rate(http_requests_total{service="checkout-service"}[1d]))

      # 28d SLI, weighted by traffic: summed good rate / summed total rate.
      # The number of samples (one per 30s evaluation) cancels out, so this does not
      # depend on the group interval. Needs at least 28d of Prometheus retention.
      - record: slo:http_availability:ratio_rate28d
        labels: { service: checkout-service }
        expr: |
          sum_over_time(slo:http_requests_good:rate5m{service="checkout-service"}[28d])
          / sum_over_time(slo:http_requests_total:rate5m{service="checkout-service"}[28d])
```
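
A long-window SLI can be built from short-window data in two ways, and they disagree when traffic varies. Averaging per-interval ratios weights a quiet five minutes the same as a busy one; dividing summed good by summed total weights every request equally, which is what a request-based SLO promises. A small Python sketch with hypothetical numbers (a low-traffic outage followed by a busy, healthy interval):

```python
def avg_of_ratios(good, total):
    """Mean of per-interval success ratios: every interval counts equally."""
    return sum(g / t for g, t in zip(good, total)) / len(total)


def traffic_weighted_ratio(good, total):
    """Summed good / summed total: every request counts equally."""
    return sum(good) / sum(total)


# Hypothetical data: 10 requests all failing at night, then 1000 healthy requests.
good = [0, 1000]
total = [10, 1000]

print(avg_of_ratios(good, total))           # one bad quiet interval halves the SLI
print(traffic_weighted_ratio(good, total))  # about 99%: only 10 of 1010 requests failed
```

In PromQL terms, the first corresponds to `avg_over_time` over a ratio series and the second to a ratio of two `sum_over_time` expressions over the good and total rate series.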

### Step 2: Multi-Window Burn Rate Alerts

```yaml
groups:   # or append this group to the groups list in the Step 1 rule file
  - name: slo:checkout-service:alerts
    rules:
      # ── Fast burn (1h + 5m windows, 14.4× rate) ──────────────────
      # Consumes ~2% of the 28d budget in 1h (14.4 × 1h / 672h ≈ 2.1%) → page immediately
      - alert: CheckoutSLOFastBurn
        expr: |
          (slo:http_availability:ratio_rate1h{service="checkout-service"} < (1 - 14.4 * 0.005))
          and
          (slo:http_availability:ratio_rate5m{service="checkout-service"} < (1 - 14.4 * 0.005))
        for: 2m
        labels:
          severity: critical
          service: checkout-service
          slo: availability-99.5
        annotations:
          summary: "Checkout SLO fast burn — error budget burning > 14.4× the sustainable rate"
          description: "1h availability: {{ $value | humanizePercentage }}. Budget burning rapidly."
          runbook_url: "https://runbooks.internal/checkout-slo-fast-burn"

      # ── Slow burn (6h + 30m windows, 6× rate) ────────────────────
      # Consumes ~5% of the 28d budget in 6h (6 × 6h / 672h ≈ 5.4%) → ticket, fix in business hours
      - alert: CheckoutSLOSlowBurn
        expr: |
          (slo:http_availability:ratio_rate6h{service="checkout-service"} < (1 - 6 * 0.005))
          and
          (slo:http_availability:ratio_rate30m{service="checkout-service"} < (1 - 6 * 0.005))
        for: 15m
        labels:
          severity: warning
          service: checkout-service
          slo: availability-99.5
        annotations:
          summary: "Checkout SLO slow burn — error budget burning > 6× the sustainable rate"
          runbook_url: "https://runbooks.internal/checkout-slo-slow-burn"

      # ── Budget exhaustion warning ─────────────────────────────────
      - alert: CheckoutSLOBudgetLow
        expr: |
          slo:http_availability:ratio_rate28d{service="checkout-service"}
            < (1 - 0.005 * 0.75)  # < 25% budget remaining
        for: 1h
        labels:
          severity: warning
          service: checkout-service
        annotations:
          summary: "Checkout error budget < 25% remaining over the rolling 28-day window"
          runbook_url: "https://runbooks.internal/checkout-error-budget"
```
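
The constants in these alerts follow from two small formulas: the SLI threshold for a given burn rate, and the share of the whole period's budget a burn rate spends over a window. A Python sketch using the 99.5% objective and 28-day period from the rules above (function names are illustrative):

```python
def burn_rate_threshold(objective: float, burn_rate: float) -> float:
    """SLI value below which errors arrive at more than `burn_rate` x the budgeted rate."""
    error_budget = 1 - objective  # 0.005 for a 99.5% objective
    return 1 - burn_rate * error_budget


def budget_consumed(burn_rate: float, window_hours: float, period_hours: float = 28 * 24) -> float:
    """Fraction of the whole period's error budget spent if `burn_rate` holds for `window_hours`."""
    return burn_rate * window_hours / period_hours


for rate, window in [(14.4, 1), (6, 6)]:
    print(f"{rate}x for {window}h: alert below SLI {burn_rate_threshold(0.995, rate):.3f}, "
          f"spends {budget_consumed(rate, window):.1%} of the 28d budget")
```

The familiar 2% and 5% figures for 14.4× and 6× come out exactly with a 30-day period (`period_hours=30 * 24`); over 28 days the same burn rates spend slightly more, about 2.1% and 5.4%.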

### Step 3: Sloth (generate from YAML spec)

```yaml
# slo/checkout-service.yaml
version: "prometheus/v1"
service: checkout-service
labels: { team: backend, tier: "1" }
slos:
  - name: requests-availability
    objective: 99.5
    description: "99.5% of checkout requests succeed"
    sli:
      events:
        error_query: |
          sum(rate(http_requests_total{
            service="checkout-service",
            status=~"5.."}[{{.window}}]))
        total_query: |
          sum(rate(http_requests_total{
            service="checkout-service"}[{{.window}}]))
    alerting:
      name: CheckoutServiceAvailability
      page_alert:
        labels: { severity: critical }
        annotations:
          runbook_url: https://runbooks.internal/checkout-availability
      ticket_alert:
        labels: { severity: warning }
```

```bash
# Generate Prometheus rules + alerts from the Sloth spec
sloth generate -i slo/checkout-service.yaml -o rules/slo-checkout-generated.yaml
# Produces: recording rules for all windows + multi-window burn rate alerts.
# Sloth's default SLO period is 30 days, not the 28 days used in Steps 1 and 2.
```

### Step 4: Error Budget Dashboard (Grafana)

```promql
# Current error budget remaining (percent of the 28d budget)
(
  slo:http_availability:ratio_rate28d{service="checkout-service"}
  - (1 - 0.005)
)
/ 0.005 * 100

# Hours until the budget is exhausted at the current (1h) burn rate
# (+Inf when the last hour had no errors; negative once the budget is spent)
(
  (slo:http_availability:ratio_rate28d{service="checkout-service"} - (1 - 0.005))
  / 0.005
) * 28 * 24
/ ((1 - slo:http_availability:ratio_rate1h{service="checkout-service"}) / 0.005)
```
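
The dashboard arithmetic is easier to sanity-check outside Grafana. A Python sketch of the same two panels, assuming the 99.5% objective and 28-day period used throughout this skill; "hours until exhausted" divides the remaining budget by the last hour's burn rate, so it is a rough projection over a rolling window, not a precise forecast:

```python
import math


def budget_remaining(sli_28d, objective=0.995):
    """Fraction of the 28d error budget left; negative once the SLO is breached."""
    return (sli_28d - objective) / (1 - objective)


def hours_until_exhausted(sli_28d, sli_1h, objective=0.995, period_hours=28 * 24):
    """Hours of budget left if the last hour's error rate continues."""
    burn_rate = (1 - sli_1h) / (1 - objective)  # 1.0 = spending exactly the budgeted rate
    if burn_rate == 0:
        return math.inf  # no errors in the last hour: the budget is not shrinking
    return budget_remaining(sli_28d, objective) * period_hours / burn_rate
```

For example, a 28-day SLI of 99.75% leaves half the budget; if the last hour ran at 99% (a 2× burn), that half lasts about 168 hours, roughly one week.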

@@ -0,0 +1,98 @@
---
name: alert-investigation
type: workflow
trigger: /alert-investigation
description: Structured alert investigation — classify, correlate metrics/logs/traces, identify root cause, mitigate, and improve alert quality.
inputs:
  - alert_name
  - alert_labels
  - firing_since
outputs:
  - root_cause_summary
  - mitigation_applied_or_deferred
  - alert_quality_notes
roles:
  - devops-engineer
  - developer
execution:
  initiator: developer
related-rules:
  - golden-signals.md
  - alerting-standards.md
uses-skills:
  - prometheus-alertmanager
  - grafana-dashboards
  - log-aggregation
  - distributed-tracing
quality-gates:
  - root cause identified before alert is silenced
  - action item created for any alert that fired without a valid runbook step
---

## Steps

### 1. Acknowledge & Classify — `@devops-engineer`
- Open Grafana and navigate to the affected service's dashboard
- Check: is this a real user-impact alert or a false positive?
  - Real: error rate / latency / saturation affecting users
  - False: alert threshold too sensitive for normal traffic patterns
- Check: when did the alert start? Correlate with recent deploys or cron jobs
- **Done when:** alert classified (real/false-positive) and current status known

### 2. Correlate Signals — `@devops-engineer`

**Metrics (Prometheus):**
```promql
# $svc and $ns are Grafana template variables (or substitute the values by hand)

# Error rate breakdown by endpoint
sum(rate(http_requests_total{service="$svc", status=~"5.."}[5m])) by (path)
  / sum(rate(http_requests_total{service="$svc"}[5m])) by (path)

# Latency distribution shift (p99)
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{service="$svc"}[5m])) by (le))

# Recent pod restarts
increase(kube_pod_container_status_restarts_total{namespace="$ns"}[30m])
```

**Logs (Loki):**
```logql
{namespace="$ns", app="$svc"} | json | level="error"
  | line_format "{{.message}} trace={{.trace_id}}"
```

**Traces (Tempo):**
- Search by trace_id from logs → view full request trace
- Filter with TraceQL, e.g. `{ duration > 1s && status = error }`, to find slow/failing requests

### 3. Identify Root Cause — `@devops-engineer` + `@developer`

Decision tree:
```
Error rate spike?
  → Recent deploy?   → Check image diff, config changes → Rollback candidate
  → No deploy?       → Check upstream dependency health, DB connections, external API

Latency spike?
  → CPU throttling?  → Check container_cpu_cfs_throttled_periods_total vs container_cpu_cfs_periods_total
  → Memory pressure? → Check working set vs limits
  → Downstream slow? → Trace to identify bottleneck service
  → DB slow?         → Check pg_stat_statements, lock waits

Saturation?
  → CPU: scale out or increase limits
  → Memory: right-size or find leak
  → Connections: check PgBouncer, connection leak
```
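
Teams that want the triage path recorded consistently sometimes encode the decision tree in a small helper. The sketch below is hypothetical (the `Signals` fields are illustrative names, not outputs of any tool) and reuses the thresholds from the alert rules in this package: 25% CFS throttling and 85% of the memory limit.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical observations gathered in step 2."""
    error_spike: bool = False
    latency_spike: bool = False
    recent_deploy: bool = False
    cpu_throttled_ratio: float = 0.0  # throttled / total CFS periods
    memory_ratio: float = 0.0         # working set / memory limit
    slow_downstream: bool = False
    slow_db: bool = False


def next_checks(s: Signals) -> list:
    """Walk the decision tree and return suggested next checks, in order."""
    checks = []
    if s.error_spike:
        checks.append("rollback candidate: diff image and config"
                      if s.recent_deploy
                      else "check upstream dependencies, DB connections, external APIs")
    if s.latency_spike:
        if s.cpu_throttled_ratio > 0.25:
            checks.append("CPU throttling: raise the CPU limit or scale out")
        if s.memory_ratio > 0.85:
            checks.append("memory pressure: right-size or look for a leak")
        if s.slow_downstream:
            checks.append("trace to find the bottleneck service")
        if s.slow_db:
            checks.append("inspect pg_stat_statements and lock waits")
    return checks
```

An empty result means none of the known patterns matched, which is itself worth noting in the investigation summary.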
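
### 4. Mitigate — `@devops-engineer`
- Apply a fix (rollback, scale, restart, config change)
- Watch: is the alert resolving? An alert resolves at the first rule evaluation where its expression no longer returns a result: `for:` only delays firing, not resolution, while `keep_firing_for` (if set) does delay it. Range windows such as `[5m]` add lag before the expression clears.
- If it is not resolving: escalate to P1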

### 5. Post-Investigation Notes — `@devops-engineer`
- Was the runbook adequate? (could a junior follow it to resolution?)
- Is the alert threshold correct? (too sensitive = toil; too loose = misses real issues)
- Create a ticket if the runbook needs an update, the threshold needs tuning, or the root cause needs a code fix

## Exit
Alert resolved or escalated + root cause noted + runbook quality assessed = investigation complete.