@intentius/chant-lexicon-k8s 0.0.18 → 0.0.24

Files changed (54)
  1. package/dist/integrity.json +9 -4
  2. package/dist/manifest.json +1 -1
  3. package/dist/skills/chant-k8s-aks.md +146 -0
  4. package/{src/skills/kubernetes-patterns.md → dist/skills/chant-k8s-deployment-strategies.md} +1 -1
  5. package/dist/skills/chant-k8s-eks.md +156 -0
  6. package/dist/skills/chant-k8s-gke.md +246 -0
  7. package/{src/skills/kubernetes-security.md → dist/skills/chant-k8s-security.md} +1 -1
  8. package/dist/skills/chant-k8s.md +66 -3
  9. package/package.json +20 -2
  10. package/src/composites/adot-collector.ts +34 -22
  11. package/src/composites/agic-ingress.ts +14 -6
  12. package/src/composites/aks-external-dns-agent.ts +29 -18
  13. package/src/composites/alb-ingress.ts +14 -6
  14. package/src/composites/autoscaled-service.ts +25 -20
  15. package/src/composites/azure-disk-storage-class.ts +14 -6
  16. package/src/composites/azure-file-storage-class.ts +14 -6
  17. package/src/composites/azure-monitor-collector.ts +34 -22
  18. package/src/composites/batch-job.ts +25 -17
  19. package/src/composites/cockroachdb-cluster.ts +148 -58
  20. package/src/composites/composites.test.ts +369 -363
  21. package/src/composites/config-connector-context.ts +15 -8
  22. package/src/composites/configured-app.ts +21 -15
  23. package/src/composites/cron-workload.ts +25 -20
  24. package/src/composites/ebs-storage-class.ts +14 -6
  25. package/src/composites/efs-storage-class.ts +14 -6
  26. package/src/composites/external-dns-agent.ts +26 -20
  27. package/src/composites/filestore-storage-class.ts +14 -6
  28. package/src/composites/fluent-bit-agent.ts +30 -24
  29. package/src/composites/gce-ingress.ts +14 -6
  30. package/src/composites/gce-pd-storage-class.ts +14 -6
  31. package/src/composites/gke-external-dns-agent.ts +34 -21
  32. package/src/composites/gke-fluent-bit-agent.ts +34 -22
  33. package/src/composites/gke-gateway.ts +19 -12
  34. package/src/composites/gke-otel-collector.ts +34 -22
  35. package/src/composites/irsa-service-account.ts +22 -14
  36. package/src/composites/metrics-server.ts +41 -26
  37. package/src/composites/monitored-service.ts +26 -19
  38. package/src/composites/namespace-env.ts +26 -17
  39. package/src/composites/network-isolated-app.ts +21 -16
  40. package/src/composites/node-agent.ts +33 -22
  41. package/src/composites/secure-ingress.ts +19 -11
  42. package/src/composites/sidecar-app.ts +17 -12
  43. package/src/composites/stateful-app.ts +21 -12
  44. package/src/composites/web-app.ts +25 -21
  45. package/src/composites/worker-pool.ts +40 -26
  46. package/src/composites/workload-identity-sa.ts +22 -14
  47. package/src/composites/workload-identity-service-account.ts +22 -16
  48. package/src/plugin.ts +130 -614
  49. package/src/serializer.ts +3 -0
  50. package/src/skills/chant-k8s-deployment-strategies.md +183 -0
  51. package/src/skills/chant-k8s-gke.md +55 -0
  52. package/src/skills/chant-k8s-patterns.md +245 -0
  53. package/src/skills/chant-k8s-security.md +237 -0
  54. package/src/skills/chant-k8s.md +305 -0
@@ -0,0 +1,237 @@
1
+ ---
2
+ skill: chant-k8s-security
3
+ description: Kubernetes pod security, image scanning, network policies, and secrets management
4
+ user-invocable: true
5
+ ---
6
+
7
+ # Kubernetes Security Patterns
8
+
9
+ ## Pod Security
10
+
11
+ ### Security Context (container-level)
12
+
13
+ Composites such as WebApp, StatefulApp, AutoscaledService, and WorkerPool accept `securityContext` for hardened containers:
14
+
15
+ ```typescript
16
+ import { WebApp } from "@intentius/chant-lexicon-k8s";
17
+
18
+ const { deployment, service } = WebApp({
19
+ name: "api",
20
+ image: "api:1.0",
21
+ port: 8080,
22
+ securityContext: {
23
+ runAsNonRoot: true,
24
+ runAsUser: 1000,
25
+ readOnlyRootFilesystem: true,
26
+ allowPrivilegeEscalation: false,
27
+ capabilities: { drop: ["ALL"] },
28
+ },
29
+ });
30
+ ```
31
+
32
+ ### Pod Security Standards
33
+
34
+ The Pod Security Standards define three levels, which Pod Security Admission enforces per namespace:
35
+
36
+ | Level | What it blocks | When to use |
37
+ |-------|---------------|-------------|
38
+ | `privileged` | Nothing | System namespaces only |
39
+ | `baseline` | hostNetwork, hostPID, privileged containers | Development |
40
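+ | `restricted` | Everything `baseline` blocks, plus running as root, privilege escalation, added capabilities (other than `NET_BIND_SERVICE`), and unset seccomp profiles | Production |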
41
+
42
+ Apply to a namespace:
43
+
44
+ ```bash
45
+ kubectl label namespace prod pod-security.kubernetes.io/enforce=restricted
46
+ kubectl label namespace prod pod-security.kubernetes.io/warn=restricted
47
+ ```
48
+
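The same namespace labels can be produced in code rather than with `kubectl label`. The following is a minimal sketch, assuming a hypothetical helper named `podSecurityLabels` that is not part of `@intentius/chant-lexicon-k8s`; only the `pod-security.kubernetes.io/<mode>` label keys come from Kubernetes itself.

```typescript
// Hypothetical helper for illustration; not an API of the chant lexicon.
type PodSecurityLevel = "privileged" | "baseline" | "restricted";
type PodSecurityMode = "enforce" | "warn" | "audit";

// Build the Pod Security Admission labels for a namespace.
function podSecurityLabels(
  level: PodSecurityLevel,
  modes: PodSecurityMode[] = ["enforce", "warn"],
): Record<string, string> {
  const labels: Record<string, string> = {};
  for (const mode of modes) {
    labels[`pod-security.kubernetes.io/${mode}`] = level;
  }
  return labels;
}

// A plain Namespace manifest object carrying the labels.
const prodNamespace = {
  apiVersion: "v1",
  kind: "Namespace",
  metadata: { name: "prod", labels: podSecurityLabels("restricted") },
};

console.log(JSON.stringify(prodNamespace.metadata.labels, null, 2));
```

Keeping `warn` alongside `enforce` mirrors the two `kubectl label` commands above: violations are rejected and also surfaced as warnings to the client.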
49
+ ### Post-Synth Security Checks
50
+
51
+ chant catches security issues at build time:
52
+
53
+ | Check | What it detects |
54
+ |-------|----------------|
55
+ | WK8005 | Secrets exposed in environment variables |
56
+ | WK8006 | `latest` image tags (non-deterministic) |
57
+ | WK8041 | API keys or tokens in plain text |
58
+ | WK8042 | Hardcoded passwords in container env |
59
+ | WK8201 | Missing resource limits (CPU/memory) |
60
+ | WK8202 | Privileged containers |
61
+ | WK8203 | Host namespace sharing (hostPID/hostNetwork) |
62
+ | WK8204 | Writable root filesystem |
63
+ | WK8205 | Containers running as root |
64
+
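To make the container-level findings in the table concrete, here is a rough sketch of the kind of inspection such checks perform on a `securityContext`. The function `findHardeningGaps` and its messages are hypothetical and are not chant's actual implementation of the WK82xx checks; the field names are the standard Kubernetes container `securityContext` fields.

```typescript
// Hypothetical illustration; chant's real post-synth checks may differ.
interface ContainerSecurityContext {
  privileged?: boolean;
  runAsNonRoot?: boolean;
  runAsUser?: number;
  readOnlyRootFilesystem?: boolean;
  allowPrivilegeEscalation?: boolean;
}

// Return a human-readable list of hardening gaps for one container.
function findHardeningGaps(ctx: ContainerSecurityContext = {}): string[] {
  const gaps: string[] = [];
  if (ctx.privileged === true) gaps.push("privileged container");
  if (ctx.readOnlyRootFilesystem !== true) gaps.push("writable root filesystem");
  if (ctx.runAsNonRoot !== true || ctx.runAsUser === 0) gaps.push("may run as root");
  if (ctx.allowPrivilegeEscalation !== false) gaps.push("privilege escalation allowed");
  return gaps;
}

// An unset securityContext leaves every default in place.
console.log(findHardeningGaps({}));
```

Note that the secure value differs per field: some must be explicitly `true`, others explicitly `false`, which is why the sketch compares against the hardened value instead of testing truthiness.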
65
+ ## Image Security
66
+
67
+ ### Pin Image Digests
68
+
69
+ Use image digests instead of tags for immutability:
70
+
71
+ ```typescript
72
+ const { deployment } = WebApp({
73
+ name: "api",
74
+ image: "api@sha256:abc123def456...",
75
+ port: 8080,
76
+ });
77
+ ```
78
+
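A quick way to see the difference between a pinned and an unpinned reference is to classify the image string. This is a minimal sketch with a hypothetical `classifyImage` helper; it is not how chant's WK8006 check is implemented, and it only recognizes `sha256` digests.

```typescript
// Hypothetical helper for illustration only.
type ImagePinning = "digest" | "tag" | "latest";

function classifyImage(image: string): ImagePinning {
  // A pinned reference ends in @sha256:<64 lowercase hex characters>.
  if (/@sha256:[a-f0-9]{64}$/.test(image)) return "digest";

  // Ignore anything after "@" (a malformed digest), then look only at the
  // last path segment so a registry port (registry:5000/app) is not read as a tag.
  const name = image.split("@")[0];
  const lastSegment = name.slice(name.lastIndexOf("/") + 1);
  const colon = lastSegment.indexOf(":");
  const tag = colon === -1 ? "" : lastSegment.slice(colon + 1);

  // No tag at all resolves to "latest" at pull time.
  return tag === "" || tag === "latest" ? "latest" : "tag";
}

console.log(classifyImage("api:1.0"));
console.log(classifyImage("registry:5000/api"));
```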
79
+ ### Private Registry with imagePullSecrets
80
+
81
+ ```typescript
82
+ import { Deployment, Secret } from "@intentius/chant-lexicon-k8s";
83
+
84
+ export const registryCreds = new Secret({
85
+ metadata: { name: "registry-creds" },
86
+ type: "kubernetes.io/dockerconfigjson",
87
+ data: { ".dockerconfigjson": "${DOCKER_CONFIG_JSON}" },
88
+ });
89
+
90
+ export const deployment = new Deployment({
91
+ spec: {
92
+ template: {
93
+ spec: {
94
+ imagePullSecrets: [{ name: "registry-creds" }],
95
+ containers: [{ name: "app", image: "private.registry.io/app:1.0" }],
96
+ },
97
+ },
98
+ },
99
+ });
100
+ ```
101
+
102
+ ### Image Policy
103
+
104
+ Block unsigned or unscanned images with admission controllers:
105
+ - **Kyverno**: policy-based, Kubernetes-native
106
+ - **OPA/Gatekeeper**: Rego-based policies
107
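+ - **Sigstore/Cosign**: image signing and signature verification, enforced at admission by a controller such as Kyverno or the Sigstore policy-controller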
108
+
109
+ ## Network Policies
110
+
111
+ ### Default Deny All
112
+
113
+ Start with deny-all and add explicit allows:
114
+
115
+ ```typescript
116
+ import { NamespaceEnv } from "@intentius/chant-lexicon-k8s";
117
+
118
+ const ns = NamespaceEnv({
119
+ name: "prod",
120
+ defaultDenyIngress: true,
121
+ defaultDenyEgress: true,
122
+ });
123
+ ```
124
+
125
+ ### Allow Specific Traffic
126
+
127
+ ```typescript
128
+ import { NetworkIsolatedApp } from "@intentius/chant-lexicon-k8s";
129
+
130
+ const app = NetworkIsolatedApp({
131
+ name: "api",
132
+ image: "api:1.0",
133
+ port: 8080,
134
+ namespace: "prod",
135
+ allowIngressFrom: [
136
+ { podSelector: { "app.kubernetes.io/name": "gateway" } },
137
+ { namespaceSelector: { "kubernetes.io/metadata.name": "monitoring" } },
138
+ ],
139
+ allowEgressTo: [
140
+ { podSelector: { "app.kubernetes.io/name": "postgres" }, ports: [{ port: 5432 }] },
141
+ { ipBlock: { cidr: "10.0.0.0/8" }, ports: [{ port: 443 }] },
142
+ ],
143
+ });
144
+ ```
145
+
146
+ ### Allow DNS Egress
147
+
148
+ Most pods need DNS. When egress is denied by default, allow egress to kube-dns in `kube-system`:
149
+
150
+ ```yaml
151
+ egress:
152
+   - to:
153
+       - namespaceSelector:
154
+           matchLabels:
155
+             kubernetes.io/metadata.name: kube-system
156
+     ports:
157
+       - port: 53
158
+         protocol: UDP
159
+       - port: 53
160
+         protocol: TCP
161
+ ```
162
+
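The YAML above is only the `egress` fragment. As a sketch of where it sits in a complete policy, the following builds a full `networking.k8s.io/v1` NetworkPolicy as a plain object. The function name `allowDnsEgress` and the policy name `allow-dns-egress` are assumptions made for illustration and are not part of the chant lexicon.

```typescript
// Hypothetical builder; returns a plain manifest object, not a chant resource.
interface NetworkPolicyPort {
  port: number;
  protocol: "UDP" | "TCP";
}

function allowDnsEgress(namespace: string) {
  // DNS uses UDP by default and falls back to TCP for large responses.
  const ports: NetworkPolicyPort[] = [
    { port: 53, protocol: "UDP" },
    { port: 53, protocol: "TCP" },
  ];
  return {
    apiVersion: "networking.k8s.io/v1",
    kind: "NetworkPolicy",
    metadata: { name: "allow-dns-egress", namespace },
    spec: {
      podSelector: {}, // empty selector: applies to every pod in the namespace
      policyTypes: ["Egress"],
      egress: [
        {
          to: [
            {
              namespaceSelector: {
                matchLabels: { "kubernetes.io/metadata.name": "kube-system" },
              },
            },
          ],
          ports,
        },
      ],
    },
  };
}

console.log(JSON.stringify(allowDnsEgress("prod"), null, 2));
```

NetworkPolicies are additive, so this policy can coexist with a default-deny policy in the same namespace: the deny policy selects the pods, and this one opens only port 53 toward `kube-system`.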
163
+ ## Secrets Management
164
+
165
+ ### External Secrets Operator
166
+
167
+ Sync secrets from external providers (AWS Secrets Manager, Vault, etc.):
168
+
169
+ ```yaml
170
+ apiVersion: external-secrets.io/v1beta1
171
+ kind: ExternalSecret
172
+ metadata:
173
+   name: app-secrets
174
+ spec:
175
+   refreshInterval: 1h
176
+   secretStoreRef:
177
+     name: aws-secrets
178
+     kind: ClusterSecretStore
179
+   target:
180
+     name: app-secrets
181
+   data:
182
+     - secretKey: db-password
183
+       remoteRef:
184
+         key: prod/db-password
185
+ ```
186
+
187
+ ### Sealed Secrets
188
+
189
+ Encrypt secrets for safe storage in Git:
190
+
191
+ ```bash
192
+ kubeseal --format yaml < secret.yaml > sealed-secret.yaml
193
+ ```
194
+
195
+ ### Secret Rotation
196
+
197
+ Use the External Secrets Operator `refreshInterval` to re-sync secret values on a schedule, and Reloader to restart pods when a Secret changes:
198
+
199
+ ```bash
200
+ kubectl annotate deployment api reloader.stakater.com/auto="true"
201
+ ```
202
+
203
+ ## RBAC Hardening
204
+
205
+ ### Audit RBAC Permissions
206
+
207
+ ```bash
208
+ # Check what a ServiceAccount can do
209
+ kubectl auth can-i --list --as=system:serviceaccount:prod:api-sa
210
+
211
+ # Check specific permission
212
+ kubectl auth can-i create pods --as=system:serviceaccount:prod:api-sa
213
+ ```
214
+
215
+ ### Avoid Cluster-Admin
216
+
217
+ Never bind `cluster-admin` to application ServiceAccounts. Use namespace-scoped Roles with minimal verbs:
218
+
219
+ ```typescript
220
+ import { WorkerPool } from "@intentius/chant-lexicon-k8s";
221
+
222
+ const worker = WorkerPool({
223
+ name: "processor",
224
+ image: "processor:1.0",
225
+ rbacRules: [
226
+ { apiGroups: [""], resources: ["configmaps"], verbs: ["get"] },
227
+ ],
228
+ });
229
+ ```
230
+
231
+ ### Service Account Token Projection
232
+
233
+ Disable auto-mounting of SA tokens when not needed:
234
+
235
+ ```yaml
236
+ automountServiceAccountToken: false
237
+ ```
@@ -0,0 +1,305 @@
1
+ ---
2
+ skill: chant-k8s
3
+ description: Build, validate, and deploy Kubernetes manifests from a chant project
4
+ user-invocable: true
5
+ ---
6
+
7
+ # Kubernetes Operational Playbook
8
+
9
+ ## How chant and Kubernetes relate
10
+
11
+ chant is a **synthesis compiler** — it compiles TypeScript source files into Kubernetes YAML manifests. `chant build` does not call the Kubernetes API; synthesis is pure and deterministic. The optional `chant state snapshot` command queries the Kubernetes API to capture deployment metadata (pod names, status, UIDs) for observability. Your job as an agent is to bridge synthesis and deployment:
12
+
13
+ - Use **chant** for: build, lint, diff (local YAML comparison)
14
+ - Use **kubectl / k8s API** for: apply, rollback, monitoring, troubleshooting
15
+
16
+ The source of truth for infrastructure is the TypeScript in `src/`. The generated YAML manifests are intermediate artifacts — never edit them by hand.
17
+
18
+ ## Scaffolding a new project
19
+
20
+ ### Initialize with a template
21
+
22
+ ```bash
23
+ chant init --lexicon k8s # default: Deployment + Service
24
+ chant init --lexicon k8s --template microservice # Deployment + Service + HPA + PDB
25
+ chant init --lexicon k8s --template stateful # StatefulSet + PVC + Service
26
+ ```
27
+
28
+ ### Available templates
29
+
30
+ | Template | What it generates | Best for |
31
+ |----------|-------------------|----------|
32
+ | *(default)* | Deployment + Service | Simple stateless apps |
33
+ | `microservice` | Deployment + Service + HPA + PDB | Production microservices |
34
+ | `stateful` | StatefulSet + PVC + headless Service | Databases, caches |
35
+
36
+ ## Build and validate
37
+
38
+ ### Build manifests
39
+
40
+ ```bash
41
+ chant build src/ --output manifests.yaml
42
+ ```
43
+
44
+ Options:
45
+ - `--format yaml` — emit YAML (default for K8s)
46
+ - `--watch` — rebuild on source changes
47
+ - `--output <path>` — write to a specific file
48
+
49
+ ### Lint the source
50
+
51
+ ```bash
52
+ chant lint src/
53
+ ```
54
+
55
+ ### What each step catches
56
+
57
+ | Step | Catches | When to run |
58
+ |------|---------|-------------|
59
+ | `chant lint` | Hardcoded namespaces (WK8001) | Every edit |
60
+ | `chant build` | Post-synth: secrets in env (WK8005), latest tags (WK8006), API keys (WK8041), missing probes (WK8301), no resource limits (WK8201), privileged containers (WK8202), and more | Before apply |
61
+ | `kubectl --dry-run=server` | K8s API validation: schema errors, admission webhooks | Before production apply |
62
+
63
+ ## Deploying to Kubernetes
64
+
65
+ ### Apply manifests
66
+
67
+ ```bash
68
+ # Build
69
+ chant build src/ --output manifests.yaml
70
+
71
+ # Dry run first
72
+ kubectl apply -f manifests.yaml --dry-run=server
73
+
74
+ # Apply
75
+ kubectl apply -f manifests.yaml
76
+ ```
77
+
78
+ ### Check rollout status
79
+
80
+ ```bash
81
+ kubectl rollout status deployment/my-app
82
+ ```
83
+
84
+ ### Rollback
85
+
86
+ ```bash
87
+ kubectl rollout undo deployment/my-app
88
+ kubectl rollout undo deployment/my-app --to-revision=2
89
+ ```
90
+
91
+ ## Debugging strategies
92
+
93
+ ### Check pod status and events
94
+
95
+ ```bash
96
+ # Overview
97
+ kubectl get pods -l app.kubernetes.io/name=my-app
98
+ kubectl get events --sort-by=.lastTimestamp -n <namespace>
99
+
100
+ # Deep dive into a specific pod
101
+ kubectl describe pod <pod-name>
102
+
103
+ # Logs (current and previous crash)
104
+ kubectl logs <pod-name>
105
+ kubectl logs <pod-name> --previous
106
+ kubectl logs <pod-name> -c <container-name> # specific container
107
+ kubectl logs deployment/my-app --all-containers
108
+
109
+ # Debug containers (K8s 1.25+)
110
+ kubectl debug <pod-name> -it --image=busybox --target=<container>
111
+
112
+ # Port-forwarding for local testing
113
+ kubectl port-forward svc/my-app 8080:80
114
+ kubectl port-forward pod/<pod-name> 8080:8080
115
+ ```
116
+
117
+ ### Common error patterns
118
+
119
+ | Status | Meaning | Diagnostic command | Typical fix |
120
+ |--------|---------|-------------------|-------------|
121
+ | Pending | Not scheduled | `kubectl describe pod` → Events | Check resource requests, node selectors, taints, PVC binding |
122
+ | CrashLoopBackOff | App crashing on start | `kubectl logs --previous` | Fix app startup, check probe config, increase initialDelaySeconds |
123
+ | ImagePullBackOff | Image not found | `kubectl describe pod` → Events | Verify image name/tag, check imagePullSecrets, registry auth |
124
+ | OOMKilled | Out of memory | `kubectl describe pod` → Last State | Increase memory limit, profile app memory usage |
125
+ | Evicted | Node disk/memory pressure | `kubectl describe node` | Increase limits, add node capacity, check for log/tmp bloat |
126
+ | CreateContainerError | Container config issue | `kubectl describe pod` → Events | Check volume mounts, configmap/secret refs, security context |
127
+ | Init:CrashLoopBackOff | Init container failing | `kubectl logs -c <init-container>` | Fix init container command, check dependencies |
128
+
129
+ ## Production safety
130
+
131
+ ### Pre-apply validation
132
+
133
+ ```bash
134
+ # Always diff before applying
135
+ kubectl diff -f manifests.yaml
136
+
137
+ # Server-side dry run (validates with admission webhooks)
138
+ kubectl apply -f manifests.yaml --dry-run=server
139
+
140
+ # Client-side dry run (fast, but no webhook validation)
141
+ kubectl apply -f manifests.yaml --dry-run=client
142
+ ```
143
+
144
+ ### Deployment strategies
145
+
146
+ - **RollingUpdate** (default): Gradually replaces pods. Set `maxSurge` and `maxUnavailable`.
147
+ - **Recreate**: All pods terminated before new ones created. Use for stateful apps that cannot run multiple versions.
148
+ - **Canary**: Deploy a second Deployment with 1 replica + same selector labels. Route percentage via Ingress annotations or service mesh.
149
+ - **Blue/Green**: Two full Deployments (blue/green), switch Service selector between them.
150
+
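For the replica-based canary described above (a second Deployment sharing the Service's selector labels, with no Ingress or mesh weighting), the canary's share of traffic is roughly its share of ready pods. A small sketch of that arithmetic, using a hypothetical `canaryTrafficShare` function and assuming the Service spreads requests evenly across endpoints:

```typescript
// Hypothetical helper; assumes even load balancing across all ready pods.
function canaryTrafficShare(stableReplicas: number, canaryReplicas: number): number {
  const total = stableReplicas + canaryReplicas;
  if (total <= 0) throw new Error("at least one replica is required");
  return canaryReplicas / total;
}

// 9 stable pods + 1 canary pod: about 10% of requests reach the canary.
console.log(canaryTrafficShare(9, 1)); // 0.1
```

This is why finer-grained splits (for example 1%) usually need Ingress annotations or a service mesh: reaching 1% by replica count alone would take 99 stable pods per canary pod.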
151
+ ## Choosing the Right Composite
152
+
153
+ Composites are higher-level functions that produce multiple coordinated K8s resources from a single call. They return plain prop objects — not class instances.
154
+
155
+ ### Decision Tree — Core Composites
156
+
157
+ | Need | Composite | Resources |
158
+ |------|-----------|-----------|
159
+ | Stateless web app | **WebApp** | Deployment + Service + optional Ingress + optional PDB |
160
+ | Stateful database/cache | **StatefulApp** | StatefulSet + headless Service + PVC + optional PDB |
161
+ | Production HTTP service with autoscaling | **AutoscaledService** | Deployment + Service + HPA + PDB |
162
+ | Background queue workers | **WorkerPool** | Deployment + RBAC + optional ConfigMap + optional HPA + optional PDB |
163
+ | Scheduled jobs | **CronWorkload** | CronJob + RBAC |
164
+ | One-shot batch jobs | **BatchJob** | Job + optional RBAC |
165
+ | App with ConfigMap/Secret mounts | **ConfiguredApp** | Deployment + Service + optional ConfigMap |
166
+ | Multi-container sidecar patterns | **SidecarApp** | Deployment + Service |
167
+ | App with Prometheus monitoring | **MonitoredService** | Deployment + Service + ServiceMonitor + optional PrometheusRule |
168
+ | App with fine-grained network policies | **NetworkIsolatedApp** | Deployment + Service + NetworkPolicy |
169
+ | Namespace with quotas and isolation | **NamespaceEnv** | Namespace + ResourceQuota + LimitRange + NetworkPolicy |
170
+ | Per-node agent (custom) | **NodeAgent** | DaemonSet + RBAC + optional ConfigMap |
171
+ | Multi-host TLS Ingress (cert-manager) | **SecureIngress** | Ingress + optional Certificate |
172
+ | HPA metrics provider | **MetricsServer** | Deployment + Service + RBAC + APIService |
173
+ | CockroachDB cluster | **CockroachDbCluster** | StatefulSet + Services + PVCs + RBAC + optional CertificateRequests |
174
+
175
+ ### Decision Tree — AWS (EKS) Composites
176
+
177
+ | Need | Composite | Resources |
178
+ |------|-----------|-----------|
179
+ | EKS IRSA ServiceAccount | **IrsaServiceAccount** | ServiceAccount + optional RBAC |
180
+ | AWS ALB Ingress | **AlbIngress** | Ingress with ALB annotations |
181
+ | EBS StorageClass | **EbsStorageClass** | StorageClass (ebs.csi.aws.com) |
182
+ | EFS StorageClass | **EfsStorageClass** | StorageClass (efs.csi.aws.com) |
183
+ | Fluent Bit for CloudWatch | **FluentBitAgent** | DaemonSet + RBAC + ConfigMap |
184
+ | ExternalDNS for Route53 | **ExternalDnsAgent** | Deployment + IRSA SA + ClusterRole |
185
+ | ADOT for CloudWatch/X-Ray | **AdotCollector** | DaemonSet + RBAC + ConfigMap |
186
+
187
+ ### Decision Tree — GCP (GKE) Composites
188
+
189
+ | Need | Composite | Resources |
190
+ |------|-----------|-----------|
191
+ | GKE Gateway API Ingress | **GkeGateway** | Gateway + HTTPRoute + optional HealthCheckPolicy |
192
+ | GKE Ingress (classic) | **GceIngress** | Ingress with GCE annotations + optional BackendConfig |
193
+ | GKE Workload Identity SA | **WorkloadIdentityServiceAccount** | ServiceAccount + optional RBAC (GCP IAM binding annotation) |
194
+ | GCE Persistent Disk StorageClass | **GcePdStorageClass** | StorageClass (pd.csi.storage.gke.io) |
195
+ | Filestore StorageClass | **FilestoreStorageClass** | StorageClass (filestore.csi.storage.gke.io) |
196
+ | GKE Fluent Bit agent | **GkeFluentBitAgent** | DaemonSet + RBAC + ConfigMap (Cloud Logging) |
197
+ | GKE ExternalDNS agent | **GkeExternalDnsAgent** | Deployment + WI SA + ClusterRole (Cloud DNS) |
198
+ | GKE OpenTelemetry Collector | **GkeOtelCollector** | DaemonSet + RBAC + ConfigMap (Cloud Trace/Monitoring) |
199
+ | Config Connector context | **ConfigConnectorContext** | ConfigConnectorContext + Namespace + RBAC |
200
+
201
+ ### Decision Tree — Azure (AKS) Composites
202
+
203
+ | Need | Composite | Resources |
204
+ |------|-----------|-----------|
205
+ | AKS AGIC Ingress | **AgicIngress** | Ingress with AGIC annotations + optional BackendConfig |
206
+ | Azure Disk StorageClass | **AzureDiskStorageClass** | StorageClass (disk.csi.azure.com) |
207
+ | Azure File StorageClass | **AzureFileStorageClass** | StorageClass (file.csi.azure.com) |
208
+ | AKS ExternalDNS agent | **AksExternalDnsAgent** | Deployment + WI SA + ClusterRole (Azure DNS) |
209
+ | Azure Monitor Collector | **AzureMonitorCollector** | DaemonSet + RBAC + ConfigMap (Azure Monitor) |
210
+
211
+ ### Hardening options (available on Deployment-based composites)
212
+
213
+ - `minAvailable` — creates a PodDisruptionBudget (WebApp, StatefulApp, WorkerPool; AutoscaledService always has one)
214
+ - `initContainers` — run before main containers (WebApp, StatefulApp, AutoscaledService, ConfiguredApp, SidecarApp)
215
+ - `securityContext` — container security settings (WebApp, StatefulApp, AutoscaledService, WorkerPool)
216
+ - `terminationGracePeriodSeconds` — graceful shutdown (WebApp, StatefulApp, AutoscaledService, WorkerPool)
217
+ - `priorityClassName` — pod scheduling priority (WebApp, StatefulApp, AutoscaledService, WorkerPool)
218
+
219
+ ### Common patterns across all composites
220
+
221
+ - All resources carry `app.kubernetes.io/name`, `app.kubernetes.io/managed-by: chant`, and `app.kubernetes.io/component` labels
222
+ - Pass `labels: { team: "platform" }` to add extra labels to all resources
223
+ - Pass `namespace: "prod"` to set namespace on all namespaced resources
224
+ - Pass `env: [{ name: "KEY", value: "val" }]` for container environment variables
225
+
226
+ ## Troubleshooting reference table
227
+
228
+ | Symptom | Likely cause | Resolution |
229
+ |---------|-------------|------------|
230
+ | Pod stuck in Pending | Insufficient CPU/memory on nodes | Scale up cluster or reduce resource requests |
231
+ | Pod stuck in Pending | PVC not bound | Check StorageClass exists, PV available |
232
+ | Pod stuck in Pending | Node selector/affinity mismatch | Verify node labels match selectors |
233
+ | Pod stuck in Pending | Too many pods on node | Check `kubectl describe node` for Allocatable vs Allocated |
234
+ | Pod stuck in ContainerCreating | ConfigMap/Secret not found | Ensure referenced ConfigMaps/Secrets exist |
235
+ | Pod stuck in ContainerCreating | Volume mount timeout | Check CSI driver logs, storage provider status |
236
+ | Service returns 503 | No ready endpoints | Check pod readiness probes, selector match |
237
+ | Service returns 503 | Endpoints exist but pod failing probes | Increase `initialDelaySeconds`, check probe path |
238
+ | Ingress returns 404 | Backend service not found | Check Ingress rules, service name/port |
239
+ | Ingress returns 502 | Backend pods not ready | Check pod readiness, container port matches Service targetPort |
240
+ | HPA not scaling | Metrics server not installed | Install metrics-server or MetricsServer composite |
241
+ | HPA not scaling | Resource requests not set | Add CPU/memory requests to containers |
242
+ | HPA stuck at max | Target utilization too low | Raise `targetCPUUtilizationPercentage`, add nodes |
243
+ | CronJob not running | Invalid cron expression | Validate cron syntax (5-field format) |
244
+ | CronJob not running | `concurrencyPolicy: Forbid` with long jobs | Increase job deadline or switch to `Replace` |
245
+ | NetworkPolicy blocking | Default deny applied | Add explicit allow rules for required traffic |
246
+ | RBAC permission denied | Missing Role/RoleBinding | Check ServiceAccount bindings and verb permissions |
247
+ | RBAC permission denied | ClusterRole vs namespaced Role | Use ClusterRoleBinding for cluster-scoped resources |
248
+ | PDB preventing eviction | minAvailable too high | Lower minAvailable or increase replicas |
249
+ | StatefulSet stuck on update | Pod ordinal blocked | Check PVC status for ordinal, delete stuck pod |
250
+
251
+ ## Troubleshooting decision tree
252
+
253
+ ```
254
+ Pod not running?
255
+ ├─ Pending
256
+ │ ├─ "Insufficient cpu/memory" → scale cluster or lower requests
257
+ │ ├─ "no nodes match" → fix nodeSelector / tolerations
258
+ │ └─ "unbound PVC" → check StorageClass, provision PV
259
+ ├─ ContainerCreating
260
+ │ ├─ "secret not found" → create missing Secret
261
+ │ └─ "timeout waiting for volume" → check CSI driver
262
+ ├─ CrashLoopBackOff
263
+ │ ├─ OOMKilled → increase memory limit
264
+ │ ├─ exit code 1 → check app logs (`kubectl logs --previous`)
265
+ │ └─ exit code 137 → SIGKILL — liveness probe too aggressive
266
+ ├─ ImagePullBackOff
267
+ │ ├─ 401 Unauthorized → check imagePullSecrets
268
+ │ └─ manifest unknown → verify image:tag exists in registry
269
+ └─ Running but not Ready
270
+   ├─ readinessProbe failing → check probe path/port
271
+   └─ startup probe failing → increase `failureThreshold * periodSeconds`
272
+ ```
273
+
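The exit codes in the CrashLoopBackOff branch follow the common shell convention that a process terminated by signal N reports exit code 128 + N, so 137 is 128 + 9 (SIGKILL). A short sketch of that decoding, with a hypothetical `describeExitCode` helper that names only the two signals most often seen in pod restarts:

```typescript
// Hypothetical helper; only SIGKILL and SIGTERM are named here.
const SIGNAL_NAMES: Record<number, string> = { 9: "SIGKILL", 15: "SIGTERM" };

function describeExitCode(code: number): string {
  if (code === 0) return "exited cleanly";
  if (code > 128) {
    // Codes above 128 encode the terminating signal as (code - 128).
    const signal = code - 128;
    const name = SIGNAL_NAMES[signal] ?? `signal ${signal}`;
    return `terminated by ${name}`;
  }
  return `application error (exit code ${code})`;
}

console.log(describeExitCode(137)); // terminated by SIGKILL
console.log(describeExitCode(143)); // terminated by SIGTERM
console.log(describeExitCode(1));   // application error (exit code 1)
```

Exit code 143 (SIGTERM) usually indicates a graceful stop request, whereas 137 means the process was killed outright, for example by the OOM killer or after the termination grace period expired.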
274
+ ## Quick reference
275
+
276
+ ```bash
277
+ # Build
278
+ chant build src/ --output manifests.yaml
279
+
280
+ # Lint
281
+ chant lint src/
282
+
283
+ # Validate
284
+ kubectl apply -f manifests.yaml --dry-run=server
285
+
286
+ # Diff
287
+ kubectl diff -f manifests.yaml
288
+
289
+ # Apply
290
+ kubectl apply -f manifests.yaml
291
+
292
+ # Status
293
+ kubectl get pods,svc,deploy
294
+
295
+ # Logs
296
+ kubectl logs deployment/my-app
297
+
298
+ # Rollback
299
+ kubectl rollout undo deployment/my-app
300
+
301
+ # Debug
302
+ kubectl describe pod <name>
303
+ kubectl logs <name> --previous
304
+ kubectl get events --sort-by=.lastTimestamp
305
+ ```