@cubis/foundry 0.3.71 → 0.3.73
- package/CHANGELOG.md +23 -2
- package/dist/cli/core.js +9 -22
- package/dist/cli/core.js.map +1 -1
- package/package.json +1 -1
- package/src/cli/core.ts +13 -22
- package/workflows/powers/accessibility/POWER.md +83 -94
- package/workflows/powers/accessibility/SKILL.md +82 -94
- package/workflows/powers/agent-design/POWER.md +201 -0
- package/workflows/powers/agent-design/SKILL.md +198 -0
- package/workflows/powers/agent-design/references/clarification-patterns.md +153 -0
- package/workflows/powers/agent-design/references/skill-testing.md +164 -0
- package/workflows/powers/agent-design/references/workflow-patterns.md +226 -0
- package/workflows/powers/agentic-eval/POWER.md +62 -0
- package/workflows/powers/agentic-eval/SKILL.md +59 -0
- package/workflows/powers/agentic-eval/references/rubric-and-regression-checklist.md +11 -0
- package/workflows/powers/api-designer/POWER.md +43 -71
- package/workflows/powers/api-designer/SKILL.md +43 -71
- package/workflows/powers/api-patterns/POWER.md +42 -56
- package/workflows/powers/api-patterns/SKILL.md +42 -57
- package/workflows/powers/architecture-designer/POWER.md +43 -60
- package/workflows/powers/architecture-designer/SKILL.md +43 -60
- package/workflows/powers/ask-questions-if-underspecified/POWER.md +51 -3
- package/workflows/powers/auth-architect/POWER.md +69 -0
- package/workflows/powers/auth-architect/SKILL.md +66 -0
- package/workflows/powers/auth-architect/references/session-token-policy-checklist.md +45 -0
- package/workflows/powers/behavioral-modes/POWER.md +100 -9
- package/workflows/powers/c-pro/POWER.md +105 -0
- package/workflows/powers/c-pro/SKILL.md +102 -0
- package/workflows/powers/c-pro/references/build-systems-and-toolchains.md +148 -0
- package/workflows/powers/c-pro/references/common-ub-and-portability.md +166 -0
- package/workflows/powers/c-pro/references/debugging-with-sanitizers.md +205 -0
- package/workflows/powers/c-pro/references/memory-safety-and-build-checklist.md +60 -0
- package/workflows/powers/c-pro/references/posix-and-platform-apis.md +244 -0
- package/workflows/powers/changelog-generator/POWER.md +127 -63
- package/workflows/powers/changelog-generator/SKILL.md +126 -63
- package/workflows/powers/ci-cd-pipelines/POWER.md +156 -0
- package/workflows/powers/ci-cd-pipelines/SKILL.md +153 -0
- package/workflows/powers/ci-cd-pipelines/references/github-actions-patterns.md +160 -0
- package/workflows/powers/ci-cd-pipelines/references/pipeline-security-checklist.md +57 -0
- package/workflows/powers/cli-developer/POWER.md +152 -95
- package/workflows/powers/cli-developer/SKILL.md +152 -95
- package/workflows/powers/cpp-pro/POWER.md +111 -0
- package/workflows/powers/cpp-pro/SKILL.md +108 -0
- package/workflows/powers/cpp-pro/references/concurrency-primitives.md +266 -0
- package/workflows/powers/cpp-pro/references/move-semantics-and-value-types.md +149 -0
- package/workflows/powers/cpp-pro/references/performance-and-profiling.md +191 -0
- package/workflows/powers/cpp-pro/references/raii-and-modern-cpp-checklist.md +87 -0
- package/workflows/powers/cpp-pro/references/template-and-concepts-patterns.md +205 -0
- package/workflows/powers/csharp-pro/POWER.md +47 -22
- package/workflows/powers/csharp-pro/SKILL.md +47 -22
- package/workflows/powers/dart-pro/POWER.md +68 -0
- package/workflows/powers/dart-pro/SKILL.md +65 -0
- package/workflows/powers/dart-pro/references/isolate-and-concurrency.md +180 -0
- package/workflows/powers/dart-pro/references/null-safety-and-async-patterns.md +133 -0
- package/workflows/powers/dart-pro/references/package-structure-and-linting.md +193 -0
- package/workflows/powers/dart-pro/references/sealed-records-patterns.md +173 -0
- package/workflows/powers/dart-pro/references/testing-and-mocking.md +235 -0
- package/workflows/powers/database-design/POWER.md +47 -33
- package/workflows/powers/database-design/SKILL.md +47 -33
- package/workflows/powers/database-optimizer/POWER.md +43 -64
- package/workflows/powers/database-optimizer/SKILL.md +43 -64
- package/workflows/powers/database-skills/POWER.md +59 -93
- package/workflows/powers/database-skills/SKILL.md +59 -93
- package/workflows/powers/debugging-strategies/POWER.md +69 -0
- package/workflows/powers/debugging-strategies/SKILL.md +66 -0
- package/workflows/powers/debugging-strategies/references/reproduce-isolate-verify-checklist.md +42 -0
- package/workflows/powers/deep-research/POWER.md +67 -0
- package/workflows/powers/deep-research/SKILL.md +64 -0
- package/workflows/powers/deep-research/references/multi-round-research-loop.md +80 -0
- package/workflows/powers/design-system-builder/POWER.md +130 -116
- package/workflows/powers/design-system-builder/SKILL.md +130 -116
- package/workflows/powers/devops-engineer/POWER.md +120 -57
- package/workflows/powers/devops-engineer/SKILL.md +120 -57
- package/workflows/powers/docker-kubernetes/POWER.md +94 -0
- package/workflows/powers/docker-kubernetes/SKILL.md +91 -0
- package/workflows/powers/docker-kubernetes/references/dockerfile-optimization-checklist.md +35 -0
- package/workflows/powers/docker-kubernetes/references/kubernetes-deployment-patterns.md +59 -0
- package/workflows/powers/documentation-templates/POWER.md +158 -127
- package/workflows/powers/documentation-templates/SKILL.md +158 -127
- package/workflows/powers/drizzle-expert/POWER.md +66 -0
- package/workflows/powers/drizzle-expert/SKILL.md +63 -0
- package/workflows/powers/drizzle-expert/references/runtime-pairing-matrix.md +16 -0
- package/workflows/powers/drizzle-expert/references/schema-and-migration-playbook.md +18 -0
- package/workflows/powers/error-ux-observability/POWER.md +144 -131
- package/workflows/powers/error-ux-observability/SKILL.md +143 -131
- package/workflows/powers/fastapi-expert/POWER.md +46 -60
- package/workflows/powers/fastapi-expert/SKILL.md +46 -60
- package/workflows/powers/firebase/POWER.md +65 -0
- package/workflows/powers/firebase/SKILL.md +62 -0
- package/workflows/powers/firebase/references/platform-routing.md +16 -0
- package/workflows/powers/firebase/references/rules-and-indexes-checklist.md +11 -0
- package/workflows/powers/flutter-design-system/POWER.md +63 -0
- package/workflows/powers/flutter-design-system/SKILL.md +60 -0
- package/workflows/powers/flutter-design-system/references/shared-widgets.md +29 -0
- package/workflows/powers/flutter-design-system/references/tokens-and-theme.md +34 -0
- package/workflows/powers/flutter-drift/POWER.md +65 -0
- package/workflows/powers/flutter-drift/SKILL.md +62 -0
- package/workflows/powers/flutter-drift/references/migrations.md +22 -0
- package/workflows/powers/flutter-drift/references/query-patterns.md +26 -0
- package/workflows/powers/flutter-feature/POWER.md +65 -0
- package/workflows/powers/flutter-feature/SKILL.md +62 -0
- package/workflows/powers/flutter-feature/references/architecture-rules.md +85 -0
- package/workflows/powers/flutter-feature/references/composite-provider.md +58 -0
- package/workflows/powers/flutter-feature/references/outbox-pattern.md +87 -0
- package/workflows/powers/flutter-feature/references/testing-patterns.md +218 -0
- package/workflows/powers/flutter-go-router/POWER.md +64 -0
- package/workflows/powers/flutter-go-router/SKILL.md +61 -0
- package/workflows/powers/flutter-go-router/references/guards-and-deeplinks.md +20 -0
- package/workflows/powers/flutter-go-router/references/typed-routes.md +27 -0
- package/workflows/powers/flutter-offline-sync/POWER.md +62 -0
- package/workflows/powers/flutter-offline-sync/SKILL.md +59 -0
- package/workflows/powers/flutter-offline-sync/references/outbox-full.md +44 -0
- package/workflows/powers/flutter-repository/POWER.md +64 -0
- package/workflows/powers/flutter-repository/SKILL.md +61 -0
- package/workflows/powers/flutter-repository/references/drift-patterns.md +21 -0
- package/workflows/powers/flutter-repository/references/retrofit-patterns.md +20 -0
- package/workflows/powers/flutter-riverpod/POWER.md +70 -0
- package/workflows/powers/flutter-riverpod/SKILL.md +67 -0
- package/workflows/powers/flutter-riverpod/references/async-and-mutations.md +19 -0
- package/workflows/powers/flutter-riverpod/references/async-lifecycle.md +19 -0
- package/workflows/powers/flutter-riverpod/references/provider-selection.md +20 -0
- package/workflows/powers/flutter-riverpod/references/testing.md +21 -0
- package/workflows/powers/flutter-riverpod/references/version-matrix.md +24 -0
- package/workflows/powers/flutter-state-machine/POWER.md +62 -0
- package/workflows/powers/flutter-state-machine/SKILL.md +59 -0
- package/workflows/powers/flutter-state-machine/references/app-state-contract.md +23 -0
- package/workflows/powers/flutter-state-machine/references/ui-rendering.md +14 -0
- package/workflows/powers/flutter-testing/POWER.md +64 -0
- package/workflows/powers/flutter-testing/SKILL.md +61 -0
- package/workflows/powers/flutter-testing/references/offline-sync-tests.md +16 -0
- package/workflows/powers/flutter-testing/references/test-layers.md +33 -0
- package/workflows/powers/frontend-code-review/POWER.md +137 -0
- package/workflows/powers/frontend-code-review/SKILL.md +134 -0
- package/workflows/powers/frontend-code-review/references/common-antipatterns.md +86 -0
- package/workflows/powers/frontend-code-review/references/performance-budgets.md +56 -0
- package/workflows/powers/frontend-code-review/references/review-checklists.md +47 -0
- package/workflows/powers/frontend-design/POWER.md +163 -362
- package/workflows/powers/frontend-design/SKILL.md +163 -362
- package/workflows/powers/game-development/POWER.md +57 -140
- package/workflows/powers/game-development/SKILL.md +57 -140
- package/workflows/powers/geo-fundamentals/POWER.md +64 -126
- package/workflows/powers/geo-fundamentals/SKILL.md +64 -127
- package/workflows/powers/git-workflow/POWER.md +135 -0
- package/workflows/powers/git-workflow/SKILL.md +132 -0
- package/workflows/powers/git-workflow/references/pr-review-checklist.md +63 -0
- package/workflows/powers/golang-pro/POWER.md +46 -35
- package/workflows/powers/golang-pro/SKILL.md +46 -35
- package/workflows/powers/graphql-architect/POWER.md +44 -62
- package/workflows/powers/graphql-architect/SKILL.md +44 -62
- package/workflows/powers/i18n-localization/POWER.md +118 -103
- package/workflows/powers/i18n-localization/SKILL.md +118 -103
- package/workflows/powers/java-pro/POWER.md +47 -22
- package/workflows/powers/java-pro/SKILL.md +47 -22
- package/workflows/powers/javascript-pro/POWER.md +47 -34
- package/workflows/powers/javascript-pro/SKILL.md +47 -34
- package/workflows/powers/kotlin-pro/POWER.md +46 -23
- package/workflows/powers/kotlin-pro/SKILL.md +46 -23
- package/workflows/powers/legacy-modernizer/POWER.md +43 -60
- package/workflows/powers/legacy-modernizer/SKILL.md +43 -60
- package/workflows/powers/mcp-builder/POWER.md +65 -0
- package/workflows/powers/mcp-builder/SKILL.md +62 -0
- package/workflows/powers/mcp-builder/references/testing-and-evals.md +17 -0
- package/workflows/powers/mcp-builder/references/transport-and-tool-design.md +17 -0
- package/workflows/powers/microservices-architect/POWER.md +43 -70
- package/workflows/powers/microservices-architect/SKILL.md +43 -70
- package/workflows/powers/mobile-design/POWER.md +110 -345
- package/workflows/powers/mobile-design/SKILL.md +110 -345
- package/workflows/powers/mongodb/POWER.md +67 -0
- package/workflows/powers/mongodb/SKILL.md +64 -0
- package/workflows/powers/mongodb/references/mongodb-checklist.md +20 -0
- package/workflows/powers/mysql/POWER.md +67 -0
- package/workflows/powers/mysql/SKILL.md +64 -0
- package/workflows/powers/mysql/references/mysql-checklist.md +20 -0
- package/workflows/powers/neki/POWER.md +67 -0
- package/workflows/powers/neki/SKILL.md +64 -0
- package/workflows/powers/neki/references/neki-checklist.md +18 -0
- package/workflows/powers/nestjs-expert/POWER.md +45 -91
- package/workflows/powers/nestjs-expert/SKILL.md +45 -91
- package/workflows/powers/nextjs-developer/POWER.md +51 -44
- package/workflows/powers/nextjs-developer/SKILL.md +51 -44
- package/workflows/powers/nodejs-best-practices/POWER.md +48 -29
- package/workflows/powers/nodejs-best-practices/SKILL.md +48 -29
- package/workflows/powers/observability/POWER.md +109 -0
- package/workflows/powers/observability/SKILL.md +106 -0
- package/workflows/powers/observability/references/alerting-and-slo-checklist.md +87 -0
- package/workflows/powers/observability/references/opentelemetry-setup-guide.md +121 -0
- package/workflows/powers/openai-docs/POWER.md +61 -0
- package/workflows/powers/openai-docs/SKILL.md +58 -0
- package/workflows/powers/openai-docs/references/official-source-playbook.md +10 -0
- package/workflows/powers/performance-profiling/POWER.md +61 -114
- package/workflows/powers/performance-profiling/SKILL.md +61 -114
- package/workflows/powers/php-pro/POWER.md +116 -0
- package/workflows/powers/php-pro/SKILL.md +113 -0
- package/workflows/powers/php-pro/references/architecture-and-di.md +239 -0
- package/workflows/powers/php-pro/references/modern-php-features.md +189 -0
- package/workflows/powers/php-pro/references/performance-and-deployment.md +197 -0
- package/workflows/powers/php-pro/references/php84-strict-typing-checklist.md +161 -0
- package/workflows/powers/php-pro/references/testing-and-static-analysis.md +235 -0
- package/workflows/powers/playwright-e2e/POWER.md +85 -0
- package/workflows/powers/playwright-e2e/SKILL.md +82 -0
- package/workflows/powers/playwright-e2e/references/locator-trace-flake-checklist.md +80 -0
- package/workflows/powers/postgres/POWER.md +67 -0
- package/workflows/powers/postgres/SKILL.md +64 -0
- package/workflows/powers/postgres/references/postgres-checklist.md +20 -0
- package/workflows/powers/prompt-engineer/POWER.md +47 -30
- package/workflows/powers/prompt-engineer/SKILL.md +47 -30
- package/workflows/powers/python-pro/POWER.md +47 -36
- package/workflows/powers/python-pro/SKILL.md +47 -36
- package/workflows/powers/react-best-practices/POWER.md +56 -33
- package/workflows/powers/react-best-practices/SKILL.md +56 -33
- package/workflows/powers/react-expert/POWER.md +47 -37
- package/workflows/powers/react-expert/SKILL.md +47 -37
- package/workflows/powers/redis/POWER.md +67 -0
- package/workflows/powers/redis/SKILL.md +64 -0
- package/workflows/powers/redis/references/redis-checklist.md +19 -0
- package/workflows/powers/ruby-pro/POWER.md +118 -0
- package/workflows/powers/ruby-pro/SKILL.md +115 -0
- package/workflows/powers/ruby-pro/references/modern-ruby-features.md +189 -0
- package/workflows/powers/ruby-pro/references/object-design-patterns.md +220 -0
- package/workflows/powers/ruby-pro/references/performance-and-profiling.md +224 -0
- package/workflows/powers/ruby-pro/references/ruby-concurrency-and-testing.md +190 -0
- package/workflows/powers/ruby-pro/references/testing-and-rspec.md +236 -0
- package/workflows/powers/rust-pro/POWER.md +45 -31
- package/workflows/powers/rust-pro/SKILL.md +45 -31
- package/workflows/powers/security-engineer/POWER.md +129 -0
- package/workflows/powers/security-engineer/SKILL.md +126 -0
- package/workflows/powers/seo-fundamentals/POWER.md +59 -102
- package/workflows/powers/seo-fundamentals/SKILL.md +59 -102
- package/workflows/powers/serverless-patterns/POWER.md +171 -0
- package/workflows/powers/serverless-patterns/SKILL.md +168 -0
- package/workflows/powers/skill-creator/POWER.md +90 -0
- package/workflows/powers/skill-creator/SKILL.md +87 -0
- package/workflows/powers/skill-creator/references/platform-formats.md +181 -0
- package/workflows/powers/skill-creator/references/schemas.md +430 -0
- package/workflows/powers/spec-miner/POWER.md +49 -57
- package/workflows/powers/spec-miner/SKILL.md +49 -57
- package/workflows/powers/sqlite/POWER.md +67 -0
- package/workflows/powers/sqlite/SKILL.md +64 -0
- package/workflows/powers/sqlite/references/sqlite-checklist.md +19 -0
- package/workflows/powers/sre-engineer/POWER.md +123 -64
- package/workflows/powers/sre-engineer/SKILL.md +123 -64
- package/workflows/powers/static-analysis/POWER.md +121 -77
- package/workflows/powers/static-analysis/SKILL.md +121 -77
- package/workflows/powers/stripe-best-practices/POWER.md +140 -17
- package/workflows/powers/stripe-best-practices/SKILL.md +139 -17
- package/workflows/powers/supabase/POWER.md +67 -0
- package/workflows/powers/supabase/SKILL.md +64 -0
- package/workflows/powers/supabase/references/supabase-checklist.md +19 -0
- package/workflows/powers/swift-pro/POWER.md +118 -0
- package/workflows/powers/swift-pro/SKILL.md +115 -0
- package/workflows/powers/swift-pro/references/concurrency-patterns.md +165 -0
- package/workflows/powers/swift-pro/references/protocol-and-generics.md +172 -0
- package/workflows/powers/swift-pro/references/sendable-and-isolation.md +116 -0
- package/workflows/powers/swift-pro/references/swift-concurrency-and-protocols.md +260 -0
- package/workflows/powers/swift-pro/references/testing-and-packages.md +192 -0
- package/workflows/powers/tailwind-patterns/POWER.md +71 -240
- package/workflows/powers/tailwind-patterns/SKILL.md +71 -240
- package/workflows/powers/testing-patterns/POWER.md +155 -10
- package/workflows/powers/testing-patterns/SKILL.md +155 -10
- package/workflows/powers/typescript-pro/POWER.md +47 -38
- package/workflows/powers/typescript-pro/SKILL.md +47 -38
- package/workflows/powers/vitess/POWER.md +67 -0
- package/workflows/powers/vitess/SKILL.md +64 -0
- package/workflows/powers/vitess/references/vitess-checklist.md +19 -0
- package/workflows/powers/vulnerability-scanner/POWER.md +146 -10
- package/workflows/powers/vulnerability-scanner/SKILL.md +146 -10
- package/workflows/powers/web-perf/POWER.md +43 -170
- package/workflows/powers/web-perf/SKILL.md +43 -170
- package/workflows/powers/webapp-testing/POWER.md +43 -164
- package/workflows/powers/webapp-testing/SKILL.md +43 -164
- package/workflows/workflows/agent-environment-setup/platforms/antigravity/rules/GEMINI.md +65 -42
- package/workflows/workflows/agent-environment-setup/platforms/claude/rules/CLAUDE.md +8 -6
- package/workflows/workflows/agent-environment-setup/platforms/codex/rules/AGENTS.md +65 -41
- package/workflows/workflows/agent-environment-setup/platforms/copilot/rules/copilot-instructions.md +8 -6
- package/workflows/workflows/agent-environment-setup/shared/rules/STEERING.md +9 -8
- package/workflows/workflows/agent-environment-setup/shared/rules/overrides/codex.md +1 -1
@@ -1,82 +1,145 @@
 ---
-name:
-description:
+name: devops-engineer
+description: Design CI/CD pipelines, infrastructure-as-code, monitoring, deployment strategies, and incident response procedures for reliable software delivery.
+license: Apache-2.0
+metadata:
+  author: cubis-foundry
+  version: "3.0"
+compatibility: Claude Code, Codex, GitHub Copilot, Gemini CLI
 ---
 
-
 # DevOps Engineer
 
-##
+## Purpose
+
+Guide DevOps practices including CI/CD pipeline design, infrastructure-as-code, deployment strategies, monitoring, and incident response. Bridge development and operations for reliable, automated delivery.
+
+## When to Use
+
+- Setting up or improving CI/CD pipelines
+- Designing deployment strategies (blue-green, canary, rolling)
+- Writing infrastructure-as-code (Terraform, Pulumi, CloudFormation)
+- Configuring monitoring, alerting, and observability
+- Building incident response procedures
+- Containerizing applications (Docker, Kubernetes)
+
+## Instructions
+
+### Step 1 — CI/CD Pipeline Design
+
+**Pipeline stages** (in order):
+
+1. **Lint & Format** — static analysis, code formatting (fastest feedback)
+2. **Unit Tests** — isolated logic tests (< 5 min target)
+3. **Build** — compile, bundle, generate artifacts
+4. **Integration Tests** — API, database, service boundary tests
+5. **Security Scan** — dependency audit, SAST, secret scanning
+6. **Deploy to Staging** — automated deployment to pre-production
+7. **E2E / Smoke Tests** — critical path verification on staging
+8. **Deploy to Production** — automated or gated release
+
+**Principles**:
+
+- Fail fast — put the quickest checks first
+- Parallelize independent stages
+- Cache dependencies between runs (node_modules, Docker layers)
+- Every merge to main should be deployable
+- Never skip tests to "move faster"
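The fail-fast ordering described in the added Step 1 can be illustrated with a small sketch. This is not part of the package; `Stage` and `run_pipeline` are hypothetical names, and each stage is reduced to a callable that reports pass or fail.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One pipeline stage (hypothetical model, not a real CI API)."""
    name: str
    check: Callable[[], bool]  # returns True when the stage passes


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure (fail fast)."""
    executed: List[str] = []
    for stage in stages:
        executed.append(stage.name)
        if not stage.check():
            # Later, slower stages never run once a quick check fails.
            return False, executed
    return True, executed


# Quickest checks first, mirroring the stage order in the list above.
stages = [
    Stage("lint", lambda: True),
    Stage("unit-tests", lambda: False),  # simulated failure
    Stage("build", lambda: True),
]
ok, executed = run_pipeline(stages)
```

In this run the build stage is never reached, which is the point of putting the cheapest checks first.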
+
+### Step 2 — Deployment Strategies
+
+| Strategy | Risk | Rollback Speed | When to Use |
+| ------------- | -------- | ------------------- | ----------------------------------------------- |
+| Rolling | Low | Medium | Default for most services |
+| Blue-Green | Low | Instant (switch) | Stateless services, zero-downtime required |
+| Canary | Very Low | Fast (route change) | High-traffic services, gradual confidence |
+| Feature Flags | Very Low | Instant (toggle) | Decoupling deploy from release |
+| Recreate | High | Slow (redeploy) | Only when breaking changes require full restart |
+
+**Rollback plan**: Every deployment must have a documented rollback path that takes < 5 minutes.
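As a rough illustration of the canary row and the rollback requirement, the sketch below shifts traffic in steps and rolls back when the observed error rate crosses a threshold. The step sizes, the 1% threshold, and the `error_rate_at` callback are assumptions chosen for the example, not values from the package.

```python
from typing import Callable, Sequence


def canary_rollout(
    error_rate_at: Callable[[int], float],
    steps: Sequence[int] = (5, 25, 50, 100),  # assumed traffic percentages
    max_error_rate: float = 0.01,             # assumed rollback threshold
) -> int:
    """Return the traffic share (percent) left on the new version.

    0 means the rollout was rolled back by routing all traffic
    to the previous version.
    """
    current = 0
    for percent in steps:
        if error_rate_at(percent) > max_error_rate:
            return 0  # rollback is a route change, so it is fast
        current = percent
    return current


healthy = canary_rollout(lambda percent: 0.001)
# Simulated regression that only shows up at 25% traffic and above.
failing = canary_rollout(lambda percent: 0.05 if percent >= 25 else 0.001)
```

A real rollout would read the error rate from a metrics backend and wait between steps; both are left out here.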
+
+### Step 3 — Infrastructure as Code
+
+**Principles**:
+
+- All infrastructure defined in version-controlled code
+- Environments are reproducible from code alone
+- No manual changes to production (drift = risk)
+- Use modules/components for reusable infrastructure patterns
+- Plan before apply — review changes before executing
+
+**Structure**:
+
+```
+infrastructure/
+├── modules/ (reusable components)
+│   ├── networking/
+│   ├── compute/
+│   └── database/
+├── environments/
+│   ├── staging/
+│   └── production/
+└── shared/ (DNS, IAM, secrets)
+```
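The "drift = risk" and "plan before apply" principles in the added Step 3 amount to comparing what the code declares with what actually exists. A minimal sketch, assuming both sides are available as flat dictionaries (real tools such as Terraform compare much richer state); `detect_drift` and the sample keys are hypothetical:

```python
from typing import Any, Dict, Tuple

_MISSING = "<missing>"  # marker for a key present on only one side


def detect_drift(
    declared: Dict[str, Any], observed: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Map each differing key to (declared value, observed value)."""
    drift: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(declared) | set(observed)):
        want = declared.get(key, _MISSING)
        have = observed.get(key, _MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift


declared = {"instance_type": "t3.small", "replicas": 3}
# Someone scaled up by hand and opened a debug port in production.
observed = {"instance_type": "t3.small", "replicas": 5, "debug_port": 9229}
drift = detect_drift(declared, observed)
```

An empty result corresponds to a plan with no changes; anything else should be reviewed before applying.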
 
-
+### Step 4 — Monitoring & Alerting
 
-
+**Four Golden Signals** (monitor these for every service):
 
-
+| Signal | Measures | Example Metric |
+| ---------- | ---------------------- | ------------------------------ |
+| Latency | Time to serve requests | p50, p95, p99 response time |
+| Traffic | Demand on the system | Requests per second |
+| Errors | Failed requests | Error rate (5xx / total) |
+| Saturation | Resource utilization | CPU, memory, disk, connections |
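To make the example metrics in the table concrete, here is one way they could be computed from a window of request samples. The `Request` record, the nearest-rank percentile, and the externally supplied CPU figure are assumptions made for this sketch.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    latency_ms: float
    status: int  # HTTP status code


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile for a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def golden_signals(
    requests: List[Request], window_seconds: float, cpu_utilization: float
) -> Dict[str, float]:
    """Summarise one observation window into the four signals."""
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": percentile([r.latency_ms for r in requests], 95),
        "traffic_rps": total / window_seconds,
        "error_rate": errors / total,       # 5xx / total, as in the table
        "saturation_cpu": cpu_utilization,  # measured elsewhere
    }


# 20 requests over 10 seconds, latencies 10..200 ms, one server error.
samples = [Request(10.0 * i, 500 if i == 20 else 200) for i in range(1, 21)]
signals = golden_signals(samples, window_seconds=10, cpu_utilization=0.6)
```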
 
-
-- **Deploy Hat**: Orchestrating deployments across environments
-- **Ops Hat**: Ensuring reliability, monitoring, and incident response
+**Alerting rules**:
 
-
+- Alert on symptoms (high error rate), not causes (high CPU)
+- Every alert must be actionable — if no one needs to act, it's noise
+- Use severity levels: critical (page), warning (ticket), info (dashboard)
+- Include runbook link in every alert
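The severity levels and the runbook requirement in the added alerting rules could be enforced with a check like the following. The `Alert` shape, `route_alert`, and the example URL are hypothetical; only the severity-to-channel mapping comes from the list above.

```python
from dataclasses import dataclass

# Mapping taken from the alerting rules: critical pages someone,
# warning opens a ticket, info only shows on a dashboard.
ROUTES = {"critical": "page", "warning": "ticket", "info": "dashboard"}


@dataclass
class Alert:
    name: str
    severity: str
    runbook_url: str


def route_alert(alert: Alert) -> str:
    """Return the delivery channel, rejecting alerts that break the rules."""
    if alert.severity not in ROUTES:
        raise ValueError(f"unknown severity: {alert.severity!r}")
    if not alert.runbook_url:
        raise ValueError(f"alert {alert.name!r} has no runbook link")
    return ROUTES[alert.severity]


channel = route_alert(
    Alert("high-error-rate", "critical", "https://runbooks.example/errors")
)
```

Running such a check when alert definitions are reviewed keeps alerts without a runbook from reaching an on-call rotation.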
 
-
-- Containerizing applications (Docker, Docker Compose)
-- Kubernetes deployments and configurations
-- Infrastructure as code (Terraform, Pulumi)
-- Cloud platform configuration (AWS, GCP, Azure)
-- Deployment strategies (blue-green, canary, rolling)
-- Building internal developer platforms and self-service tools
-- Incident response, on-call, and production troubleshooting
-- Release automation and artifact management
+### Step 5 — Incident Response
 
-
+**Incident lifecycle**:
 
-1. **
-2. **
-3. **
-4. **
-5. **
+1. **Detect** — monitoring alerts or user reports
+2. **Respond** — acknowledge, assess severity, assemble responders
+3. **Mitigate** — stop the bleeding (rollback, feature flag, scale up)
+4. **Resolve** — fix root cause
+5. **Review** — blameless postmortem within 48 hours
 
-
+**Postmortem template**:
 
-
+- What happened? (timeline)
+- What was the impact? (users affected, duration)
+- What was the root cause?
+- What prevented earlier detection?
+- Action items (with owners and deadlines)
 
-
-| -------------- | ----------------------------------- | -------------------------------------------------------------- |
-| GitHub Actions | `references/github-actions.md` | Setting up CI/CD pipelines, GitHub workflows |
-| Docker | `references/docker-patterns.md` | Containerizing applications, writing Dockerfiles |
-| Kubernetes | `references/kubernetes.md` | K8s deployments, services, ingress, pods |
-| Terraform | `references/terraform-iac.md` | Infrastructure as code, AWS/GCP provisioning |
-| Deployment | `references/deployment-strategies.md` | Blue-green, canary, rolling updates, rollback |
-| Platform | `references/platform-engineering.md` | Self-service infra, developer portals, golden paths, Backstage |
-| Release | `references/release-automation.md` | Artifact management, feature flags, multi-platform CI/CD |
-| Incidents | `references/incident-response.md` | Production outages, on-call, MTTR, postmortems, runbooks |
+## Output Format
 
-
+```
+## DevOps Recommendation
+[approach and reasoning]
 
-
+## Implementation
+[configuration files, scripts, or pipeline definitions]
 
-
-
-- Store secrets in secret managers (not env files)
-- Enable container scanning in CI/CD
-- Document rollback procedures
-- Use GitOps for Kubernetes (ArgoCD, Flux)
+## Monitoring
+[what to monitor and alert on]
 
-
+## Rollback Plan
+[how to revert if something goes wrong]
+```
 
-
-- Store secrets in code or CI/CD variables
-- Skip staging environment testing
-- Ignore resource limits in containers
-- Use `latest` tag in production
-- Deploy on Fridays without monitoring
+## Examples
 
-
+**User**: "Set up a GitHub Actions CI/CD pipeline for our Node.js API"
 
-
+**Response approach**: Multi-stage pipeline: lint → test → build → deploy. Cache node_modules. Run security audit. Deploy to staging on PR merge, production on release tag. Include health check after deploy.
 
-
+**User**: "We need to deploy without downtime"
 
-
+**Response approach**: Recommend blue-green or rolling deployment based on architecture. Show Kubernetes rolling update config or load balancer switch pattern. Include health check probes and rollback trigger.
@@ -0,0 +1,94 @@
|
|
|
1
|
+
````markdown
|
|
2
|
+
---
|
|
3
|
+
inclusion: manual
|
|
4
|
+
name: docker-kubernetes
|
|
5
|
+
description: "Use for containerization strategy, Dockerfile optimization, Kubernetes deployment patterns, Helm charts, and container orchestration decisions."
|
|
6
|
+
license: MIT
|
|
7
|
+
metadata:
|
|
8
|
+
author: cubis-foundry
|
|
9
|
+
version: "1.0"
|
|
10
|
+
compatibility: Claude Code, Codex, GitHub Copilot
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
# Docker & Kubernetes
|
|
14
|
+
|
|
15
|
+
## Purpose
|
|
16
|
+
|
|
17
|
+
Use for containerization strategy, Dockerfile optimization, Kubernetes deployment patterns, Helm charts, and container orchestration decisions.
|
|
18
|
+
|
|
19
|
+
## When to Use
|
|
20
|
+
|
|
21
|
+
- Writing or optimizing Dockerfiles for production builds.
|
|
22
|
+
- Designing Kubernetes deployment, service, and ingress manifests.
|
|
23
|
+
- Choosing between deployment strategies (rolling, blue-green, canary).
|
|
24
|
+
- Setting up health checks, resource limits, and autoscaling.
|
|
25
|
+
- Writing Helm charts or Kustomize overlays.
|
|
26
|
+
- Debugging container startup failures, crashloops, or networking issues.
|
|
27
|
+
|
|
28
|
+
## Instructions
|
|
29
|
+
|
|
30
|
+
1. Define the build target: minimal base image, multi-stage build, layer caching.
|
|
31
|
+
2. Set resource requests and limits based on measured usage, not guesses.
|
|
32
|
+
3. Configure health checks (liveness, readiness, startup probes) appropriate to the service.
|
|
33
|
+
4. Design the deployment strategy matching the service's availability requirements.
|
|
34
|
+
5. Verify the container runs as non-root, has no secrets baked in, and scans clean.
|
|
35
|
+
|
|
36
|
+
### Dockerfile standards
|
|
37
|
+
|
|
38
|
+
- Use multi-stage builds to separate build dependencies from runtime.
|
|
39
|
+
- Pin base image versions to digest or specific tag — never use `latest` in production.
|
|
40
|
+
- Order layers from least-changing to most-changing for optimal cache utilization.
|
|
41
|
+
- Copy dependency manifests and install before copying application code.
|
|
42
|
+
- Run as non-root user: `USER nonroot` or numeric UID.
|
|
43
|
+
- Use `.dockerignore` to exclude `.git`, `node_modules`, test fixtures, and local configs.
|
|
44
|
+
- Keep final image minimal: `distroless`, `alpine`, or `slim` variants.
|
|
45
|
+
|
|
46
|
+
### Kubernetes standards
|
|
47
|
+
|
|
48
|
+
- Always set resource requests AND limits for CPU and memory.
|
|
49
|
+
- Define liveness probes (restart on deadlock), readiness probes (remove from service on overload), and startup probes (slow initialization).
|
|
50
|
+
- Use `PodDisruptionBudget` for services that need availability guarantees during node maintenance.
|
|
51
|
+
- Set `securityContext`: `runAsNonRoot: true`, `readOnlyRootFilesystem: true`, drop all capabilities.
|
|
52
|
+
- Use `ConfigMap` for non-sensitive config, `Secret` (or external secrets operator) for credentials.
|
|
53
|
+
- Label everything: `app`, `version`, `component`, `part-of`, `managed-by`.
|
|
54
|
+
- Use `topologySpreadConstraints` or pod anti-affinity for high-availability across zones.
|
|
55
|
+
|
|
56
|
+
### Debugging patterns
|
|
57
|
+
|
|
58
|
+
- CrashLoopBackOff: check `kubectl logs --previous`, verify probes aren't too aggressive.
|
|
59
|
+
- ImagePullBackOff: verify image name, tag, registry auth, and network access.
|
|
60
|
+
- Pending pods: check events for resource pressure, node affinity mismatches, or PVC issues.
|
|
61
|
+
- OOMKilled: increase memory limits or fix the memory leak — never just raise limits blindly.
|
|
62
|
+
- Network issues: verify service selectors match pod labels, check NetworkPolicy rules.
|
|
63
|
+
|
|
64
|
+
### Constraints
|
|
65
|
+
|
|
66
|
+
- Avoid storing secrets in Dockerfiles, environment variables baked at build time, or ConfigMaps.
|
|
67
|
+
- Avoid running as root in production containers.
|
|
68
|
+
- Avoid using `latest` tag for base images or application images.
|
|
69
|
+
- Avoid setting CPU limits without measuring — over-limiting causes throttling.
|
|
70
|
+
- Avoid single-replica deployments for stateless services that need availability.
|
|
71
|
+
- Avoid skipping health probes — Kubernetes cannot manage what it cannot observe.
|
|
72
|
+
|
|
73
|
+
## Output Format
|
|
74
|
+
|
|
75
|
+
Provide implementation guidance, code examples, and configuration as appropriate to the task.
|
|
76
|
+
|
|
77
|
+
## References
|
|
78
|
+
|
|
79
|
+
Load on demand. Do not preload all reference files.
|
|
80
|
+
|
|
81
|
+
| File | Load when |
| --- | --- |
| `references/dockerfile-optimization-checklist.md` | The task focuses on Dockerfile writing, multi-stage builds, layer caching, or image size reduction. |
| `references/kubernetes-deployment-patterns.md` | The task involves deployment strategies, autoscaling, Helm charts, or production Kubernetes architecture. |
|
|
85
|
+
|
|
86
|
+
## Scripts
|
|
87
|
+
|
|
88
|
+
No helper scripts are required for this skill right now. Keep execution in `SKILL.md` and `references/` unless repeated automation becomes necessary.
|
|
89
|
+
|
|
90
|
+
## Examples
|
|
91
|
+
|
|
92
|
+
- "Help me with docker kubernetes best practices in this project"
|
|
93
|
+
- "Review my docker kubernetes implementation for issues"
|
|
94
|
+
````
|
|
@@ -0,0 +1,91 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: docker-kubernetes
|
|
3
|
+
description: "Use for containerization strategy, Dockerfile optimization, Kubernetes deployment patterns, Helm charts, and container orchestration decisions."
|
|
4
|
+
license: MIT
|
|
5
|
+
metadata:
|
|
6
|
+
author: cubis-foundry
|
|
7
|
+
version: "1.0"
|
|
8
|
+
compatibility: Claude Code, Codex, GitHub Copilot
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
# Docker & Kubernetes
|
|
12
|
+
|
|
13
|
+
## Purpose
|
|
14
|
+
|
|
15
|
+
Use for containerization strategy, Dockerfile optimization, Kubernetes deployment patterns, Helm charts, and container orchestration decisions.
|
|
16
|
+
|
|
17
|
+
## When to Use
|
|
18
|
+
|
|
19
|
+
- Writing or optimizing Dockerfiles for production builds.
|
|
20
|
+
- Designing Kubernetes deployment, service, and ingress manifests.
|
|
21
|
+
- Choosing between deployment strategies (rolling, blue-green, canary).
|
|
22
|
+
- Setting up health checks, resource limits, and autoscaling.
|
|
23
|
+
- Writing Helm charts or Kustomize overlays.
|
|
24
|
+
- Debugging container startup failures, crashloops, or networking issues.
|
|
25
|
+
|
|
26
|
+
## Instructions
|
|
27
|
+
|
|
28
|
+
1. Define the build target: minimal base image, multi-stage build, layer caching.
|
|
29
|
+
2. Set resource requests and limits based on measured usage, not guesses.
|
|
30
|
+
3. Configure health checks (liveness, readiness, startup probes) appropriate to the service.
|
|
31
|
+
4. Design the deployment strategy matching the service's availability requirements.
|
|
32
|
+
5. Verify the container runs as non-root, has no secrets baked in, and scans clean.
|
|
33
|
+
|
|
34
|
+
### Dockerfile standards
|
|
35
|
+
|
|
36
|
+
- Use multi-stage builds to separate build dependencies from runtime.
|
|
37
|
+
- Pin base image versions to digest or specific tag — never use `latest` in production.
|
|
38
|
+
- Order layers from least-changing to most-changing for optimal cache utilization.
|
|
39
|
+
- Copy dependency manifests and install before copying application code.
|
|
40
|
+
- Run as non-root user: `USER nonroot` or numeric UID.
|
|
41
|
+
- Use `.dockerignore` to exclude `.git`, `node_modules`, test fixtures, and local configs.
|
|
42
|
+
- Keep final image minimal: `distroless`, `alpine`, or `slim` variants.
|
|
43
|
+
|
|
44
|
+
### Kubernetes standards
|
|
45
|
+
|
|
46
|
+
- Always set resource requests AND limits for CPU and memory.
|
|
47
|
+
- Define liveness probes (restart a deadlocked container), readiness probes (remove the pod from Service endpoints while it cannot serve traffic), and startup probes (hold off the other probes during slow initialization).
|
|
48
|
+
- Use `PodDisruptionBudget` for services that need availability guarantees during node maintenance.
|
|
49
|
+
- Set `securityContext`: `runAsNonRoot: true`, `readOnlyRootFilesystem: true`, drop all capabilities.
|
|
50
|
+
- Use `ConfigMap` for non-sensitive config, `Secret` (or external secrets operator) for credentials.
|
|
51
|
+
- Label everything: `app`, `version`, `component`, `part-of`, `managed-by`.
|
|
52
|
+
- Use `topologySpreadConstraints` or pod anti-affinity for high-availability across zones.
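
As a rough illustration of the availability items in this list, the sketch below pairs a `PodDisruptionBudget` with zone-level `topologySpreadConstraints`. The name `example-api`, the image reference, and the replica count are hypothetical; `topology.kubernetes.io/zone` is the well-known zone label, assumed here to be set on the cluster's nodes.

```yaml
# Hypothetical manifests; names, image, and counts are placeholders.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-api
spec:
  maxUnavailable: 1            # at most one pod evicted at a time during node drains
  selector:
    matchLabels:
      app.kubernetes.io/name: example-api
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: example-api
  template:
    metadata:
      labels:
        app.kubernetes.io/name: example-api
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                          # zones may differ by at most one pod
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule    # use ScheduleAnyway for a soft preference
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: example-api
      containers:
        - name: api
          image: registry.example.com/example-api:1.4.2
          # resources, probes, and securityContext omitted for brevity
```

A PodDisruptionBudget only guards against voluntary disruptions such as drains; it does not protect against node failure, which is what the spread across zones is for.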
|
|
53
|
+
|
|
54
|
+
### Debugging patterns
|
|
55
|
+
|
|
56
|
+
- CrashLoopBackOff: check `kubectl logs --previous`, verify probes aren't too aggressive.
|
|
57
|
+
- ImagePullBackOff: verify image name, tag, registry auth, and network access.
|
|
58
|
+
- Pending pods: check events for resource pressure, node affinity mismatches, or PVC issues.
|
|
59
|
+
- OOMKilled: profile memory usage first. Raise the memory limit only if the workload legitimately needs more; otherwise fix the leak. Never raise limits blindly.
|
|
60
|
+
- Network issues: verify service selectors match pod labels, check NetworkPolicy rules.
|
|
61
|
+
|
|
62
|
+
### Constraints
|
|
63
|
+
|
|
64
|
+
- Avoid storing secrets in Dockerfiles, environment variables baked at build time, or ConfigMaps.
|
|
65
|
+
- Avoid running as root in production containers.
|
|
66
|
+
- Avoid using `latest` tag for base images or application images.
|
|
67
|
+
- Avoid setting CPU limits without measuring — over-limiting causes throttling.
|
|
68
|
+
- Avoid single-replica deployments for stateless services that need availability.
|
|
69
|
+
- Avoid skipping health probes — Kubernetes cannot manage what it cannot observe.
|
|
70
|
+
|
|
71
|
+
## Output Format
|
|
72
|
+
|
|
73
|
+
Provide implementation guidance, code examples, and configuration as appropriate to the task.
|
|
74
|
+
|
|
75
|
+
## References
|
|
76
|
+
|
|
77
|
+
Load on demand. Do not preload all reference files.
|
|
78
|
+
|
|
79
|
+
| File | Load when |
| --- | --- |
| `references/dockerfile-optimization-checklist.md` | The task focuses on Dockerfile writing, multi-stage builds, layer caching, or image size reduction. |
| `references/kubernetes-deployment-patterns.md` | The task involves deployment strategies, autoscaling, Helm charts, or production Kubernetes architecture. |
|
|
83
|
+
|
|
84
|
+
## Scripts
|
|
85
|
+
|
|
86
|
+
No helper scripts are required for this skill right now. Keep execution in `SKILL.md` and `references/` unless repeated automation becomes necessary.
|
|
87
|
+
|
|
88
|
+
## Examples
|
|
89
|
+
|
|
90
|
+
- "Help me with docker kubernetes best practices in this project"
|
|
91
|
+
- "Review my docker kubernetes implementation for issues"
|
|
@@ -0,0 +1,35 @@
|
|
|
1
|
+
# Dockerfile Optimization Checklist
|
|
2
|
+
|
|
3
|
+
## Multi-stage builds
|
|
4
|
+
|
|
5
|
+
- Stage 1 (builder): install build tools, compile, bundle.
|
|
6
|
+
- Stage 2 (runtime): copy only artifacts needed to run.
|
|
7
|
+
- Name stages explicitly: `FROM node:22-slim AS builder`.
|
|
8
|
+
- Copy from named stages: `COPY --from=builder /app/dist ./dist`.
|
|
9
|
+
|
|
10
|
+
## Layer caching
|
|
11
|
+
|
|
12
|
+
- `COPY package.json package-lock.json ./` then `RUN npm ci` BEFORE `COPY . .`.
|
|
13
|
+
- Group related `RUN` commands with `&&` to reduce layers.
|
|
14
|
+
- Never invalidate the cache unnecessarily: a change to any layer rebuilds every layer after it, so instruction order matters.
|
|
15
|
+
|
|
16
|
+
## Image size
|
|
17
|
+
|
|
18
|
+
- Prefer `distroless` or `alpine` base images for production.
|
|
19
|
+
- Remove build caches in the same RUN step: `RUN apt-get install -y ... && rm -rf /var/lib/apt/lists/*`.
|
|
20
|
+
- Use `--mount=type=cache` for package manager caches in BuildKit.
|
|
21
|
+
- Avoid installing dev dependencies in the runtime stage.
|
|
22
|
+
|
|
23
|
+
## Security
|
|
24
|
+
|
|
25
|
+
- Pin base images to digest: `FROM node:22-slim@sha256:abc123...`.
|
|
26
|
+
- Run `npm ci --omit=dev` for production Node.js images.
|
|
27
|
+
- Scan with `trivy`, `grype`, or `docker scout` in CI before push.
|
|
28
|
+
- Never use `ADD` for remote URLs — use explicit `curl` + verify checksum.
|
|
29
|
+
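- Set `USER 1000` or a named non-root user as the last `USER` instruction before `CMD`. Prefer a numeric UID when the pod sets `runAsNonRoot: true`, since Kubernetes cannot verify a named user.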
|
|
30
|
+
|
|
31
|
+
## Health and signals
|
|
32
|
+
|
|
33
|
+
- Use `STOPSIGNAL SIGTERM` — ensure the app handles graceful shutdown.
|
|
34
|
+
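- Avoid `ENTRYPOINT ["sh", "-c", "..."]`: the shell becomes PID 1 and may not forward signals to the app. Run the app binary directly in exec form, `exec` it from any wrapper script, or use `tini` as the init process.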
|
|
35
|
+
- Define `HEALTHCHECK` in Dockerfile for standalone Docker, but prefer Kubernetes probes in K8s.
|
|
@@ -0,0 +1,59 @@
|
|
|
1
|
+
# Kubernetes Deployment Patterns
|
|
2
|
+
|
|
3
|
+
## Deployment strategies
|
|
4
|
+
|
|
5
|
+
- **Rolling update** (default): gradually replaces pods. Set `maxUnavailable` and `maxSurge` based on replica count and availability requirement.
|
|
6
|
+
- **Blue-green**: deploy new version alongside old, switch service selector after verification. Requires 2x resources temporarily.
|
|
7
|
+
- **Canary**: route a percentage of traffic to new version. Use Istio, Linkerd, or Argo Rollouts for traffic splitting.
|
|
8
|
+
- **Recreate**: kill all old pods before starting new. Only for services that cannot run two versions simultaneously.
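
For the rolling update case, a conservative configuration might look like the fragment below. It is a sketch rather than a recommendation: the replica count and `minReadySeconds` are assumed values, and the right `maxUnavailable`/`maxSurge` pair depends on spare cluster capacity.

```yaml
# Fragment of a Deployment spec (not a standalone manifest); values are illustrative.
spec:
  replicas: 4
  minReadySeconds: 10      # a new pod must stay ready this long before it counts as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0    # never drop below the desired replica count
      maxSurge: 1          # add one extra pod at a time during the rollout
```

Setting `maxUnavailable: 0` trades rollout speed for capacity: each old pod is removed only after its replacement is ready, so the cluster needs headroom for one extra pod.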
|
|
9
|
+
|
|
10
|
+
## Autoscaling
|
|
11
|
+
|
|
12
|
+
- **HPA** (Horizontal Pod Autoscaler): scale on CPU, memory, or custom metrics. Set `minReplicas` ≥ 2 for availability.
|
|
13
|
+
- **VPA** (Vertical Pod Autoscaler): auto-adjust resource requests. Use in recommendation mode first, not auto mode.
|
|
14
|
+
- **KEDA**: scale on event-driven metrics (queue depth, HTTP rate). Good for batch and event-driven workloads.
|
|
15
|
+
- Always set resource requests accurately: HPA computes CPU and memory utilization as a percentage of the requested amount.
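
A minimal HPA sketch using the `autoscaling/v2` API, assuming a Deployment named `example-api` (hypothetical) whose containers declare CPU requests. The 70% target and the replica bounds are placeholder numbers.

```yaml
# Hypothetical HPA; target name and thresholds are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-api
  minReplicas: 2             # keeps two pods even at idle, per the guidance above
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the pods' CPU request, not of the limit
```

Because the target is expressed relative to requests, an understated CPU request makes the HPA scale out too early, and an overstated one makes it scale too late.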
|
|
16
|
+
|
|
17
|
+
## Helm chart structure
|
|
18
|
+
|
|
19
|
+
```
chart-name/
├── Chart.yaml
├── values.yaml
├── values-staging.yaml
├── values-production.yaml
└── templates/
    ├── deployment.yaml
    ├── service.yaml
    ├── ingress.yaml
    ├── hpa.yaml
    ├── pdb.yaml
    ├── configmap.yaml
    ├── serviceaccount.yaml
    └── _helpers.tpl
```
|
|
35
|
+
|
|
36
|
+
- Keep `values.yaml` as the single source of configurable parameters.
|
|
37
|
+
- Use `_helpers.tpl` for label generation and name templating.
|
|
38
|
+
- Pin chart dependencies to exact versions.
|
|
39
|
+
- Test with `helm template` and `helm lint` before deploy.
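
To make the dependency-pinning point concrete, here is a sketch of a `Chart.yaml` for the layout above. The dependency name `example-cache`, its version, and the repository URL are all hypothetical; the point is only that `version` holds an exact version, not a range such as `^2.3.0`.

```yaml
# Hypothetical Chart.yaml; the dependency, versions, and URL are placeholders.
apiVersion: v2
name: chart-name
description: Illustrative chart definition
type: application
version: 0.1.0          # chart version
appVersion: "1.4.2"     # version of the application being deployed
dependencies:
  - name: example-cache
    version: 2.3.1      # exact version, no range
    repository: https://charts.example.com
```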
|
|
40
|
+
|
|
41
|
+
## Networking
|
|
42
|
+
|
|
43
|
+
- Use `ClusterIP` services for internal communication, `LoadBalancer` or `Ingress` for external.
|
|
44
|
+
- Use `NetworkPolicy` to restrict pod-to-pod communication (default-deny, allow-list).
|
|
45
|
+
- Prefer `Ingress` with TLS termination over exposing services directly.
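
The default-deny, allow-list approach could be sketched as two policies: one that blocks all ingress in a namespace, and one that re-opens a single path. The namespace `example`, the pod labels, and port 8080 are assumptions. Enforcement also assumes the cluster's network plugin supports `NetworkPolicy`.

```yaml
# Hypothetical policies; namespace, labels, and port are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: example
spec:
  podSelector: {}          # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress              # no ingress rules listed, so all ingress is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: example
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: example-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: example-frontend
      ports:
        - protocol: TCP
          port: 8080
```

Policies are additive, so the allow rule widens what the default-deny policy leaves closed; there is no ordering or precedence between them.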
|
|
46
|
+
|
|
47
|
+
## Observability
|
|
48
|
+
|
|
49
|
+
- Expose `/healthz` and `/readyz` endpoints from every service.
|
|
50
|
+
- Use structured JSON logging to stdout/stderr — never write logs to files inside containers.
|
|
51
|
+
- Expose Prometheus metrics on a `/metrics` endpoint.
|
|
52
|
+
- Set meaningful pod labels for filtering in dashboards and alerting.
|
|
53
|
+
|
|
54
|
+
## Stateful workloads
|
|
55
|
+
|
|
56
|
+
- Use `StatefulSet` only when stable network identity or stable storage is required.
|
|
57
|
+
- Prefer managed databases over running databases in Kubernetes.
|
|
58
|
+
- Use `PersistentVolumeClaim` with appropriate storage class and reclaim policy.
|
|
59
|
+
- Back up PVCs independently — Kubernetes does not manage backup.
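
A minimal `PersistentVolumeClaim` sketch for the storage items above. The claim name, the size, and the storage class `example-ssd` are hypothetical. The reclaim policy is not a field on the claim itself; for dynamically provisioned volumes it comes from the `reclaimPolicy` of the StorageClass.

```yaml
# Hypothetical claim; name, class, and size are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
    - ReadWriteOnce               # mounted read-write by a single node
  storageClassName: example-ssd   # the class determines provisioner and reclaim policy
  resources:
    requests:
      storage: 10Gi
```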
|