k8s-agent-skills 1.2.0

Files changed (51)
  1. package/README.md +102 -0
  2. package/package.json +63 -0
  3. package/skills/atlas/SKILL.md +166 -0
  4. package/skills/cert-manager/SKILL.md +212 -0
  5. package/skills/cilium-gateway/SKILL.md +283 -0
  6. package/skills/cilium-network/SKILL.md +243 -0
  7. package/skills/cnpg/SKILL.md +130 -0
  8. package/skills/dragonfly/SKILL.md +194 -0
  9. package/skills/external-dns/SKILL.md +185 -0
  10. package/skills/flagger/SKILL.md +292 -0
  11. package/skills/flux/SKILL.md +36 -0
  12. package/skills/gitea/SKILL.md +32 -0
  13. package/skills/gitea-api/SKILL.md +104 -0
  14. package/skills/gitea-registry/SKILL.md +71 -0
  15. package/skills/gitea-runner/SKILL.md +126 -0
  16. package/skills/gitea-tea/SKILL.md +206 -0
  17. package/skills/gitea-webhooks/SKILL.md +93 -0
  18. package/skills/harbor/SKILL.md +32 -0
  19. package/skills/harbor-api/SKILL.md +231 -0
  20. package/skills/harbor-helm/SKILL.md +238 -0
  21. package/skills/harbor-terraform/SKILL.md +233 -0
  22. package/skills/higress/SKILL.md +27 -0
  23. package/skills/higress-helm/SKILL.md +328 -0
  24. package/skills/higress-operator/SKILL.md +435 -0
  25. package/skills/kserve/SKILL.md +28 -0
  26. package/skills/kserve-helm/SKILL.md +330 -0
  27. package/skills/kserve-operator/SKILL.md +763 -0
  28. package/skills/kubeflow/SKILL.md +33 -0
  29. package/skills/kubeflow-pipelines/SKILL.md +392 -0
  30. package/skills/kubeflow-trainer/SKILL.md +429 -0
  31. package/skills/kubeflow-training-operator/SKILL.md +176 -0
  32. package/skills/mariadb/SKILL.md +27 -0
  33. package/skills/mariadb-helm/SKILL.md +378 -0
  34. package/skills/mariadb-operator/SKILL.md +1114 -0
  35. package/skills/nvidia-device-plugin/SKILL.md +204 -0
  36. package/skills/rook-ceph/SKILL.md +22 -0
  37. package/skills/rook-ceph-operator/SKILL.md +150 -0
  38. package/skills/rook-ceph-toolbox/SKILL.md +220 -0
  39. package/skills/sealed-secrets/SKILL.md +221 -0
  40. package/skills/stakater-reloader/SKILL.md +259 -0
  41. package/skills/talos/SKILL.md +244 -0
  42. package/skills/tekton/SKILL.md +187 -0
  43. package/skills/vector/SKILL.md +24 -0
  44. package/skills/vector-helm/SKILL.md +186 -0
  45. package/skills/vector-operator/SKILL.md +455 -0
  46. package/skills/victoria-metrics/SKILL.md +35 -0
  47. package/skills/victoriametrics-operator/SKILL.md +248 -0
  48. package/skills/zitadel/SKILL.md +24 -0
  49. package/skills/zitadel-api/SKILL.md +962 -0
  50. package/skills/zitadel-helm/SKILL.md +263 -0
  51. package/skills/zitadel-terraform/SKILL.md +728 -0
@@ -0,0 +1,283 @@
1
+ ---
2
+ name: cilium-gateway
3
+ description: Use when creating Gateway API resources for ingress, configuring TLS termination or passthrough, setting up HTTP-to-HTTPS redirect or traffic splitting, integrating oauth2-proxy or ExternalDNS with Cilium, or debugging Cilium Gateway controller issues including the Programmed=False cosmetic bug.
4
+ ---
5
+
6
+ # Cilium Gateway
7
+
8
+ Cilium v1.19.4 — Gateway API implementation for ingress traffic via per-node Envoy.
9
+
10
+ ## Overview
11
+
12
+ Cilium implements Gateway API v1.4.1 using per-node Envoy proxies with eBPF TPROXY interception. Supports GatewayClass, Gateway, HTTPRoute, GRPCRoute, TLSRoute, and ReferenceGrant. Host network mode exposes listeners directly on node IPs without a LoadBalancer Service. TLS termination, traffic splitting, and header modification all handled in Envoy.
13
+
14
+ ## CRDs Used
15
+
16
+ ### Gateway API (standard)
17
+ | CRD | Version | Purpose |
18
+ |-----|---------|---------|
19
+ | `GatewayClass` | `gateway.networking.k8s.io/v1` | Class reference (parametersRef → `CiliumGatewayClassConfig`) |
20
+ | `Gateway` | `gateway.networking.k8s.io/v1` | Shared LB listener — hostname, TLS, ports |
21
+ | `HTTPRoute` | `gateway.networking.k8s.io/v1` | HTTP route rules — matches, filters, backends |
22
+ | `GRPCRoute` | `gateway.networking.k8s.io/v1` | gRPC route rules |
23
+ | `TLSRoute` | `gateway.networking.k8s.io/v1alpha2` | TLS passthrough routing by SNI (experimental) |
24
+ | `ReferenceGrant` | `gateway.networking.k8s.io/v1beta1` | Allow cross-namespace references (Secret, Service) |
25
+
26
+ ### Cilium-specific
27
+ | CRD | Version | Purpose |
28
+ |-----|---------|---------|
29
+ | `CiliumGatewayClassConfig` | `cilium.io/v2alpha1` | Cilium-specific GatewayClass parameters (envoy config, LB type, etc.) |
30
+ | `CiliumEnvoyConfig` | `cilium.io/v2` | Low-level Envoy config (used internally by Gateway controller) |
31
+
32
+ ## Architecture
33
+
34
+ ```
+ Internet → LB IP → any node → eBPF TPROXY → per-node Envoy → identity "ingress" → backend pod
+ ```
37
+
38
+ - Traffic arrives at any node, eBPF intercepts via TPROXY using `ingress` identity
39
+ - Per-node Envoy (DaemonSet or cilium-agent embedded) handles L7
40
+ - Two policy enforcement points: `world → ingress` and `ingress → backend`
41
+ - Source IP preserved in `X-Forwarded-For` and `X-Envoy-External-Address` headers
42
+
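+ The second enforcement point (`ingress → backend`) can be sketched as a policy on the backend that admits traffic from the reserved `ingress` entity. This is a minimal sketch and not part of the original skill; the policy name, the `myapp` namespace, the `app: my-app` label, and port 8080 are assumptions.
+
+ ```yaml
+ apiVersion: cilium.io/v2
+ kind: CiliumNetworkPolicy
+ metadata:
+   name: allow-from-gateway      # hypothetical name
+   namespace: myapp              # assumed backend namespace
+ spec:
+   endpointSelector:
+     matchLabels:
+       app: my-app               # assumed backend label
+   ingress:
+     - fromEntities:
+         - ingress               # traffic that has passed through the per-node Envoy
+       toPorts:
+         - ports:
+             - port: "8080"
+               protocol: TCP
+ ```
+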
43
+ ## Prerequisites
44
+
45
+ ```yaml
+ # Helm values needed
+ kubeProxyReplacement: true
+ gatewayAPI:
+   enabled: true
+ ```
51
+
52
+ Gateway API v1.4.1 CRDs must be pre-installed:
53
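+ ```bash
+ # kubectl does not expand wildcards in URLs, so apply each standard-channel CRD by name
+ for crd in gatewayclasses gateways httproutes grpcroutes referencegrants; do
+   kubectl apply -f "https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.4.1/config/crd/standard/gateway.networking.k8s.io_${crd}.yaml"
+ done
+ ```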
56
+
57
+ ## Host Network Mode
58
+
59
+ Cilium 1.16+ — expose Gateway directly on host network (no LoadBalancer Service). **Requires Envoy hostNetwork.**
60
+
61
+ ```yaml
+ gatewayAPI:
+   enabled: true
+   hostNetwork:
+     enabled: true
+     nodes:
+       matchLabels:
+         role: infra
+ envoy:
+   enabled: true
+   hostNetwork: true
+   securityContext:
+     capabilities:
+       keepCapNetBindService: true
+       envoy:
+         - NET_BIND_SERVICE
+ ```
78
+
79
+ **Privileged ports (≤1023):** Add `NET_BIND_SERVICE` capability to Envoy.
80
+
81
+ ## Common Patterns
82
+
83
+ ### Gateway with TLS Termination
84
+ ```yaml
+ apiVersion: gateway.networking.k8s.io/v1
+ kind: Gateway
+ metadata:
+   name: cilium-gateway
+   namespace: cilium-gateway
+   annotations:
+     external-dns.alpha.kubernetes.io/target: 79.76.124.104
+ spec:
+   gatewayClassName: cilium
+   listeners:
+     - name: http
+       protocol: HTTP
+       port: 80
+       hostname: "*.kubexa.tech"
+       allowedRoutes:
+         namespaces:
+           from: All
+     - name: https
+       protocol: HTTPS
+       port: 443
+       hostname: "*.kubexa.tech"
+       tls:
+         mode: Terminate
+         certificateRefs:
+           - name: kubexa-tech-tls
+             kind: Secret
+       allowedRoutes:
+         namespaces:
+           from: All
+ ```
115
+
116
+ ### HTTP → HTTPS Redirect
117
+ ```yaml
+ apiVersion: gateway.networking.k8s.io/v1
+ kind: HTTPRoute
+ metadata:
+   name: http-to-https
+   namespace: cilium-gateway
+ spec:
+   parentRefs:
+     - name: cilium-gateway
+       sectionName: http
+   hostnames:
+     - "*.kubexa.tech"
+   rules:
+     - filters:
+         - type: RequestRedirect
+           requestRedirect:
+             scheme: https
+             statusCode: 301
+ ```
136
+
137
+ ### HTTPRoute with Backend
138
+ ```yaml
+ apiVersion: gateway.networking.k8s.io/v1
+ kind: HTTPRoute
+ metadata:
+   name: app-route
+   namespace: myapp
+ spec:
+   parentRefs:
+     - name: cilium-gateway
+       namespace: cilium-gateway
+       sectionName: https
+   hostnames:
+     - app.kubexa.tech
+   rules:
+     - matches:
+         - path:
+             type: PathPrefix
+             value: /
+       backendRefs:
+         - name: app-service
+           port: 8080
+ ```
160
+
161
+ ### Cross-Namespace Reference (ReferenceGrant)
162
+ ```yaml
+ apiVersion: gateway.networking.k8s.io/v1beta1
+ kind: ReferenceGrant
+ metadata:
+   name: allow-httproutes
+   namespace: cilium-gateway
+ spec:
+   from:
+     - group: gateway.networking.k8s.io
+       kind: HTTPRoute
+       namespace: kube-system
+   to:
+     - group: ""
+       kind: Secret
+       name: kubexa-tech-tls
+ ```
178
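+ A ReferenceGrant is created in the namespace of the referenced object and lists who may reference it. It is needed when a Gateway references a TLS Secret, or an HTTPRoute references a backend Service, in a different namespace. Attaching an HTTPRoute to a Gateway across namespaces is controlled by the listener's `allowedRoutes`, not by a ReferenceGrant.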
179
+
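+ For the Gateway-to-Secret case, a grant might look like the following. This is a sketch under assumptions: the `certs` namespace and the grant name are hypothetical and do not appear in the original skill.
+
+ ```yaml
+ apiVersion: gateway.networking.k8s.io/v1beta1
+ kind: ReferenceGrant
+ metadata:
+   name: allow-gateway-to-tls-secret   # hypothetical name
+   namespace: certs                    # hypothetical: the namespace that holds the Secret
+ spec:
+   from:
+     - group: gateway.networking.k8s.io
+       kind: Gateway
+       namespace: cilium-gateway       # namespace of the referencing Gateway
+   to:
+     - group: ""
+       kind: Secret
+       name: kubexa-tech-tls
+ ```
+
+ With such a grant in place, the Gateway's `certificateRefs` entry would also need `namespace: certs` so that it points at the Secret in the other namespace.
+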
180
+ ### oauth2-proxy Integration
181
+ ```yaml
+ apiVersion: gateway.networking.k8s.io/v1
+ kind: HTTPRoute
+ metadata:
+   name: hubble-ui
+   namespace: kube-system
+ spec:
+   parentRefs:
+     - name: cilium-gateway
+       namespace: cilium-gateway
+       sectionName: https
+   hostnames:
+     - hubble.kubexa.tech
+   rules:
+     - backendRefs:
+         - name: oauth2-proxy-hubble
+           port: 4180
+ ```
199
+
200
+ ### Traffic Splitting
201
+ ```yaml
+ spec:
+   rules:
+     - backendRefs:
+         - name: app-v1
+           port: 80
+           weight: 90
+         - name: app-v2
+           port: 80
+           weight: 10
+ ```
212
+
213
+ ### Header Modification
214
+ ```yaml
+ spec:
+   rules:
+     - filters:
+         - type: RequestHeaderModifier
+           requestHeaderModifier:
+             set:
+               - name: X-Custom-Header
+                 value: my-value
+             add:
+               - name: X-Trace-Id
+                 value: "abc123"
+ ```
227
+
228
+ ### LB IPAM Integration with Gateway Addresses
229
+ ```yaml
+ apiVersion: gateway.networking.k8s.io/v1
+ kind: Gateway
+ metadata:
+   name: my-gateway
+ spec:
+   addresses:
+     - type: IPAddress
+       value: 172.18.0.140
+   gatewayClassName: cilium
+   listeners:
+     - ...
+ ```
242
+ Or via annotation (deprecated): `io.cilium/lb-ipam-ips: "172.18.0.141"`
243
+
244
+ ## Known Issues
245
+
246
+ ### Programmed=False (Cosmetic)
247
+ Cilium Gateway may show `PROGRAMMED=False` status even though routes work fine.
248
+ - Gateway status: "Address not ready yet" / Programmed: False
249
+ - HTTPRoute status: Accepted: True, ResolvedRefs: True
250
+ - Traffic flows despite Programmed=False
251
+ - Do not treat this as a failure — routes are functional
252
+
253
+ ### TLS Passthrough Source IP
254
+ When using TLS passthrough, backends see Envoy IP (node IP) as source, not the client IP. This is inherent to TCP proxy mode.
255
+
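+ The passthrough mode discussed above pairs a `TLS` listener in `Passthrough` mode with a `TLSRoute` that matches on SNI. The fragments below are a sketch, not taken from the original skill; the listener name, hostname, route name, backend Service, and port 8443 are assumptions, and `TLSRoute` requires the experimental-channel CRD.
+
+ ```yaml
+ # Gateway fragment: listener that forwards the encrypted stream untouched
+ spec:
+   listeners:
+     - name: tls-passthrough           # hypothetical listener name
+       protocol: TLS
+       port: 443
+       hostname: "secure.kubexa.tech"  # assumed hostname
+       tls:
+         mode: Passthrough
+       allowedRoutes:
+         namespaces:
+           from: All
+ ---
+ apiVersion: gateway.networking.k8s.io/v1alpha2
+ kind: TLSRoute
+ metadata:
+   name: secure-passthrough            # hypothetical name
+   namespace: myapp
+ spec:
+   parentRefs:
+     - name: cilium-gateway
+       namespace: cilium-gateway
+       sectionName: tls-passthrough
+   hostnames:
+     - secure.kubexa.tech
+   rules:
+     - backendRefs:
+         - name: secure-service        # assumed Service that terminates TLS itself
+           port: 8443
+ ```
+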
256
+ ## Troubleshooting
257
+
258
+ ```bash
+ # Check gateway status
+ kubectl get gateway -A
+ kubectl describe gateway <name>
+
+ # Check HTTPRoute
+ kubectl describe httproute <name>
+
+ # Check operator logs for Gateway API errors
+ kubectl logs -n kube-system deployments/cilium-operator | grep gateway
+
+ # Check Envoy config
+ kubectl get ciliumenvoyconfigs -A
+
+ # Verify Gateway API CRDs installed
+ kubectl get crd | grep gateway.networking.k8s.io
+ ```
275
+
276
+ ### Common Errors
277
+ | Symptom | Cause | Fix |
278
+ |---------|-------|-----|
279
+ | GatewayClass not found | Gateway API CRDs not installed | Install v1.4.1 CRDs |
280
+ | Secret "X" not found | Missing ReferenceGrant for cross-namespace Secret | Add ReferenceGrant in Secret's namespace |
281
+ | BackendNotFound | Service doesn't exist or wrong namespace | Check `backendRefs` names |
282
+ | Programmed=False | Cosmetic bug (see above) | Verify HTTPRoute shows Accepted/ResolvedRefs |
283
+ | HostNetwork port clash | Port already in use | Use unique ports per Gateway resource |
@@ -0,0 +1,243 @@
1
+ ---
2
+ name: cilium-network
3
+ description: Use when configuring Cilium CNI networking, writing network policies, debugging pod connectivity issues, setting up LB IPAM or L2 announcements for Service LoadBalancer IPs, enabling BGP route advertisement, configuring transparent encryption or Hubble observability, or managing Cilium security features (host firewall, Local Redirect Policy, CiliumCIDRGroup). Not for Gateway API (use cilium-gateway).
4
+ ---
5
+
6
+ # Cilium Network
7
+
8
+ Cilium v1.19.4 — eBPF-based CNI, networking, and security for Kubernetes.
9
+
10
+ ## Overview
11
+
12
+ Cilium provides pod networking via eBPF, replacing kube-proxy with socket-level load balancing. Security policies enforce L3-L7 rules based on identity (not IP), surviving pod churn. Hubble delivers observability. LB IPAM + L2/BGP announcements expose Services externally without a cloud LB.
13
+
14
+ ## Cilium CRDs (Network Domain)
15
+
16
+ | CRD | API | Purpose |
17
+ |-----|-----|---------|
18
+ | `CiliumNetworkPolicy` | `cilium.io/v2` | Namespaced L3-L7 network policy (identity, HTTP/gRPC/Kafka, DNS) |
19
+ | `CiliumClusterwideNetworkPolicy` | `cilium.io/v2` | Cluster-scoped L3-L7 network policy |
20
+ | `CiliumEndpoint` | `cilium.io/v2` | Per-pod status: identity, labels, policy enforcement state |
21
+ | `CiliumEndpointSlice` | `cilium.io/v2` | Groups CiliumEndpoints for large-scale clusters |
22
+ | `CiliumIdentity` | `cilium.io/v2` | Security identity (labels → numeric ID) |
23
+ | `CiliumNode` | `cilium.io/v2` | Node-level Cilium config (allocated CIDRs, health endpoints) |
24
+ | `CiliumCIDRGroup` | `cilium.io/v2alpha1` | Named group of CIDRs for policy `fromCIDRSet`/`toCIDRSet` |
25
+ | `CiliumLoadBalancerIPPool` | `cilium.io/v2` | LB IPAM pool — allocate LoadBalancer IPs |
26
+ | `CiliumL2AnnouncementPolicy` | `cilium.io/v2alpha1` | L2 ARP/NDP announcement of LoadBalancer IPs (beta) |
27
+ | `CiliumLocalRedirectPolicy` | `cilium.io/v2` | Node-local traffic redirect (DNS cache, node-local proxy) |
28
+ | `CiliumBGPClusterConfig` | `cilium.io/v2alpha1` | BGP cluster-level config (ASN, listen port, node selector) |
29
+ | `CiliumBGPPeerConfig` | `cilium.io/v2alpha1` | BGP peer — transport, auth, timers, AFI/SAFI |
30
+ | `CiliumBGPAdvertisement` | `cilium.io/v2alpha1` | Advertise pod CIDRs / service LB IPs via BGP |
31
+ | `CiliumBGPNodeConfig` | `cilium.io/v2alpha1` | Per-node BGP status (read-only) |
32
+ | `CiliumEnvoyConfig` | `cilium.io/v2` | Envoy proxy config (L7 policy enforcement) |
33
+
34
+ ## Key Concepts
35
+
36
+ ### Routing Modes
37
+ - **Overlay (VXLAN/Geneve):** Encapsulation, works on any infra. Default.
38
+ - **Native routing:** Uses host routing table. Needs KubeSpan/IPsec/WireGuard for cross-node encryption. Faster.
39
+ - **kube-proxy replacement:** `kubeProxyReplacement=true` — eBPF replaces iptables for services. Required for many features.
40
+
41
+ ### Identity-Based Security
42
+ Cilium assigns a security identity to every pod based on labels. Policies match on identity, not IP. This survives pod churn.
43
+
44
+ ### Policy Enforcement Points
45
+ - L3/L4: eBPF datapath (fast path)
46
+ - L7: Envoy proxy (HTTP/gRPC/Kafka inspection)
47
+ - `reserved:host` identity for kubelet probes, `reserved:world` for external traffic
48
+
49
+ ## CRD Usage Patterns
50
+
51
+ ### Network Policies
52
+ ```yaml
+ # Namespaced: allow ingress from app=frontend to app=backend on port 80
+ apiVersion: cilium.io/v2
+ kind: CiliumNetworkPolicy
+ metadata:
+   name: allow-frontend
+   namespace: default
+ spec:
+   endpointSelector:
+     matchLabels:
+       app: backend
+   ingress:
+     - fromEndpoints:
+         - matchLabels:
+             app: frontend
+       toPorts:
+         - ports:
+             - port: "80"
+               protocol: TCP
+ ---
+ # Cluster-wide: allow kubelet health probes from host
+ apiVersion: cilium.io/v2
+ kind: CiliumClusterwideNetworkPolicy
+ metadata:
+   name: allow-kubelet-health-probes
+ spec:
+   endpointSelector: {}
+   ingress:
+     - fromEntities:
+         - host
+         - remote-node
+ ```
84
+ Key selectors: `fromEndpoints`, `toEndpoints`, `fromEntities` (host/world/cluster/remote-node/all), `fromCIDR`, `toCIDR`, `toFQDNs` (DNS-based egress), and `fromCIDRSet` with `cidrGroupRef`.
85
+
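+ Of the selectors listed above, `toFQDNs` is the one that needs a companion rule: DNS traffic has to pass through Cilium's DNS proxy so that names can be mapped to IPs. The policy below is a sketch under assumptions; the policy name, the `app: backend` label, and `example.com` are illustrative and not from the original skill.
+
+ ```yaml
+ apiVersion: cilium.io/v2
+ kind: CiliumNetworkPolicy
+ metadata:
+   name: allow-egress-example-com      # hypothetical name
+   namespace: default
+ spec:
+   endpointSelector:
+     matchLabels:
+       app: backend                    # assumed workload label
+   egress:
+     # Allow and inspect DNS lookups so toFQDNs can learn the resolved IPs
+     - toEndpoints:
+         - matchLabels:
+             "k8s:io.kubernetes.pod.namespace": kube-system
+             "k8s:k8s-app": kube-dns
+       toPorts:
+         - ports:
+             - port: "53"
+               protocol: ANY
+           rules:
+             dns:
+               - matchPattern: "*"
+     # Allow HTTPS only to the named domain
+     - toFQDNs:
+         - matchName: "example.com"    # illustrative domain
+       toPorts:
+         - ports:
+             - port: "443"
+               protocol: TCP
+ ```
+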
86
+ ### L7 HTTP Policy
87
+ ```yaml
+ spec:
+   endpointSelector:
+     matchLabels:
+       app: my-api
+   ingress:
+     - toPorts:
+         - ports:
+             - port: "8080"
+           rules:
+             http:
+               - method: "GET"
+                 path: "/public/.*"
+ ```
101
+
102
+ ### LB IPAM + L2 Announcements
103
+ ```yaml
+ # 1. IP pool
+ apiVersion: cilium.io/v2
+ kind: CiliumLoadBalancerIPPool
+ metadata:
+   name: prod-pool
+ spec:
+   blocks:
+     - cidr: "10.0.10.0/24"
+   serviceSelector:
+     matchLabels:
+       lb-site: prod
+ ---
+ # 2. L2 announcement policy
+ apiVersion: cilium.io/v2alpha1
+ kind: CiliumL2AnnouncementPolicy
+ metadata:
+   name: prod-policy
+ spec:
+   loadBalancerIPs: true
+   serviceSelector:
+     matchLabels:
+       lb-site: prod
+   nodeSelector:
+     matchLabels:
+       role: infra
+   interfaces:
+     - ^eth[0-9]+
+ ```
132
+ **Key:** L2 announcements require `loadBalancerIPs: true` or `externalIPs: true`. Without an explicit service selector, all services match. Without node selector, all nodes are candidates.
133
+
134
+ ### L2 Announcement Lease Tuning
135
+ ```yaml
+ l2announcements:
+   leaseDuration: 15s        # time before failover
+   leaseRenewDeadline: 5s
+   leaseRetryPeriod: 2s
+ ```
141
+ Client rate limit must be sized: `QPS = #services * (1 / leaseRenewDeadline)`.
142
+
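+ As a worked example of the sizing formula above, the Helm values below assume 50 announced services and the 5s renew deadline. The service count and the headroom factor are assumptions chosen for illustration, not recommendations from the original skill.
+
+ ```yaml
+ # QPS = #services * (1 / leaseRenewDeadline)
+ #     = 50 * (1 / 5)
+ #     = 10 requests per second for lease renewals alone
+ k8sClientRateLimit:
+   qps: 20      # assumption: roughly 2x the computed 10 QPS, leaving room for other API traffic
+   burst: 40    # assumption: 2x qps
+ ```
+
+ Halving `leaseRenewDeadline` doubles the required QPS, so faster failover comes at the cost of more API server load.
+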
143
+ ### CiliumCIDRGroup
144
+ ```yaml
+ apiVersion: cilium.io/v2alpha1
+ kind: CiliumCIDRGroup
+ metadata:
+   name: vpn-cidrs
+ spec:
+   externalCIDRs:
+     - "10.48.0.0/24"
+ ---
+ apiVersion: cilium.io/v2
+ kind: CiliumNetworkPolicy
+ metadata:
+   name: from-vpn
+ spec:
+   endpointSelector: {}
+   ingress:
+     - fromCIDRSet:
+         - cidrGroupRef: vpn-cidrs
+ ```
163
+
164
+ ### BGP Control Plane
165
+ Enable via `bgpControlPlane.enabled=true` in Helm.
166
+
167
+ ```yaml
+ apiVersion: cilium.io/v2alpha1
+ kind: CiliumBGPClusterConfig
+ metadata:
+   name: cilium-bgp
+ spec:
+   nodeSelector:
+     matchLabels:
+       bgp: enabled
+   bgpInstances:
+     - name: "65000"
+       localASN: 65000
+       peers:
+         - name: "65001"
+           peerAddress: "10.0.0.1"
+           peerASN: 65001
+           peerConfigRef:
+             name: cilium-peer
+ ---
+ apiVersion: cilium.io/v2alpha1
+ kind: CiliumBGPPeerConfig
+ metadata:
+   name: cilium-peer
+ spec:
+   transport:
+     peerPort: 179
+   timers:
+     holdTimeSeconds: 90
+     keepAliveTimeSeconds: 30
+   families:
+     - afi: ipv4
+       safi: unicast
+       advertisements:          # selects CiliumBGPAdvertisement objects by label
+         matchLabels:
+           advertise: bgp       # illustrative label, added so the advertisement below is attached
+ ---
+ apiVersion: cilium.io/v2alpha1
+ kind: CiliumBGPAdvertisement
+ metadata:
+   name: bgp-adverts
+   labels:
+     advertise: bgp             # must match the peer config's advertisements selector
+ spec:
+   advertisements:
+     - advertisementType: "PodCIDR"
+       selector:
+         matchLabels:
+           bgp: enabled
+     - advertisementType: "Service"
+       selector:
+         matchExpressions:
+           - {key: "color", operator: In, values: [blue]}
+       service:
+         addresses:             # list of address types to advertise
+           - LoadBalancerIP
+ ```
217
+
218
+ ### Encryption
219
+ - **IPsec:** `encryption.type=ipsec` — ESP encryption between nodes; keys come from a Kubernetes Secret that you create and rotate
220
+ - **WireGuard:** `encryption.type=wireguard` — per-tunnel, simpler key mgmt
221
+ - **Transparent Encryption:** node-to-node encryption without app changes
222
+
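+ As Helm values, the WireGuard option above would look roughly like this. It is a minimal sketch; any further tuning is left out.
+
+ ```yaml
+ encryption:
+   enabled: true
+   type: wireguard   # or "ipsec", which additionally needs a key Secret
+ ```
+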
223
+ ### Hubble Observability
224
+ Enable with `hubble.enabled=true`, `hubble.relay.enabled=true`, `hubble.ui.enabled=true`.
225
+ Metrics: `hubble.metrics.enabled: [dns, drop, tcp, flow, http, node:true]`
226
+
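+ Written out as nested Helm values, the settings above correspond to something like the following. This is a sketch that lists only the plain metric names from the text; per-metric options are omitted.
+
+ ```yaml
+ hubble:
+   enabled: true
+   relay:
+     enabled: true
+   ui:
+     enabled: true
+   metrics:
+     enabled:        # explicit list; nothing is exported unless named here
+       - dns
+       - drop
+       - tcp
+       - flow
+       - http
+ ```
+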
227
+ ### Host Firewall
228
+ ```yaml
+ hostFirewall:
+   enabled: true
+ ```
232
+ Protects host network namespace with CiliumClusterwideNetworkPolicy (use `nodeSelector` to target).
233
+
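+ A host policy that uses `nodeSelector` could look like the sketch below. The policy name, the `node-access: ssh` label, and the choice of port 22 are assumptions for illustration and are not part of the original skill.
+
+ ```yaml
+ apiVersion: cilium.io/v2
+ kind: CiliumClusterwideNetworkPolicy
+ metadata:
+   name: host-allow-ssh            # hypothetical name
+ spec:
+   nodeSelector:                   # selects nodes (host endpoints), not pods
+     matchLabels:
+       node-access: ssh            # assumed node label
+   ingress:
+     - fromEntities:
+         - cluster                 # keep in-cluster traffic to the host working
+     - toPorts:
+         - ports:
+             - port: "22"
+               protocol: TCP
+ ```
+
+ Once a node is selected by any host policy, ingress not matched by a rule is dropped, so the `cluster` entity rule is kept to avoid cutting off control-plane traffic.
+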
234
+ ## Common Mistakes
235
+
236
+ - **Forgot `loadBalancerIPs: true` in L2 policy** → nothing announced
237
+ - **No client rate limit sizing for L2 announcements** → lease renewal fails at scale
238
+ - **CNP `endpointSelector` empty in wrong namespace** → policy applies to nothing
239
+ - **Using `protocol: udp` instead of `protocol: UDP` in port rules** — Cilium uses uppercase `TCP`/`UDP`/`SCTP`
240
+ - **`externalTrafficPolicy: Local` with L2 announcements** → traffic drops on nodes without local pods. Use `Cluster` instead.
241
+ - **BGP: peer IP unreachable from all nodes** — BGP runs in host network, ensure L2 connectivity to peer
242
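+ - **BGP resources not linked** — `CiliumBGPClusterConfig.nodeSelector` picks the nodes, each peer's `peerConfigRef` must name an existing `CiliumBGPPeerConfig`, and advertisements attach only through that peer config's `families[].advertisements` label selector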
243
+ - **Hubble metrics missing desired protocols** — list explicit metrics `[dns, drop, tcp, flow, http]`
@@ -0,0 +1,130 @@
1
+ ---
2
+ name: cnpg
3
+ description: Use when working with CloudNativePG (CNPG) for PostgreSQL on Kubernetes — creating or troubleshooting Cluster, Backup, ScheduledBackup, Pooler, or ImageCatalog resources; defining bootstrap, storage, backup, or affinity settings.
4
+ ---
5
+
6
+ # CloudNativePG (CNPG)
7
+
8
+ ## Overview
9
+
10
+ CloudNativePG operator manages HA PostgreSQL clusters on Kubernetes. API: `postgresql.cnpg.io/v1`. Deployed via Helm chart (latest: 0.28.2 → operator 1.29.1).
11
+
12
+ ## CRD Reference
13
+
14
+ | CRD | Purpose |
15
+ |-----|---------|
16
+ | `Cluster` | HA PostgreSQL cluster (core resource) |
17
+ | `Backup` | On-demand backup |
18
+ | `ScheduledBackup` | Cron-based backup schedule |
19
+ | `Pooler` | PgBouncer connection pooler |
20
+ | `Database` | Database-level management within cluster |
21
+ | `Publication` / `Subscription` | Logical replication |
22
+ | `ImageCatalog` / `ClusterImageCatalog` | Extension image management (v1.29+) |
23
+ | `FailoverQuorum` | Quorum-based failover settings |
24
+
25
+ ## Cluster Spec (Key Fields)
26
+
27
+ ```yaml
+ apiVersion: postgresql.cnpg.io/v1
+ kind: Cluster
+ spec:
+   instances: 1                          # Replicas (1 = standalone primary)
+   imageName: ghcr.io/cloudnative-pg/postgresql:18
+
+   storage:
+     size: 10Gi
+     storageClass: ceph-block            # Omit or null for cluster default
+     resizePVC: true                     # Allow storage resize via Helm upgrade
+
+   walStorage:                           # Optional — separate WAL volume
+     size: 5Gi
+     storageClass: ceph-block
+
+   bootstrap:
+     initdb:
+       database: app                     # Default DB name
+       owner: app                        # Default owner user
+       postInitSQL:                      # Run after init
+         - CREATE EXTENSION vector;
+
+   affinity:
+     nodeSelector:
+       kubernetes.io/hostname: worker-proxmox
+     podAntiAffinity:
+       type: preferred                   # required or preferred
+       topologyKey: kubernetes.io/hostname
+
+   postgresql:
+     parameters:
+       shared_preload_libraries: vector  # Extensions
+     pg_hba:                             # Custom pg_hba (v1.29+ use podSelectorRefs)
+
+   primaryUpdateStrategy: unsupervised   # unsupervised (default) or supervised
+   primaryUpdateMethod: restart          # restart or switchover
+
+   monitoring:
+     enablePodMonitor: true              # Generates Prometheus PodMonitor
+
+   backup:
+     volumeSnapshot:
+       snapshotClass: csi-ceph-block     # CSI snapshot class
+ ```
72
+
73
+ ## Bootstrap Methods
74
+
75
+ | Method | Use case |
76
+ |--------|----------|
77
+ | `initdb` | Fresh cluster from scratch |
78
+ | `pg_basebackup` | Clone from existing CNPG cluster |
79
+ | `recovery` | PITR from Barman/volumeSnapshot/plugin backup |
80
+ | `recovery` or `pg_basebackup` with `replica.enabled: true` | Bootstrap a replica cluster that follows another cluster |
81
+
82
+ ## Backup Methods
83
+
84
+ | Method | Status | Details |
85
+ |--------|--------|---------|
86
+ | `barmanObjectStore` | **Deprecated** (removed v1.30) | S3/GCS/Azure via Barman Cloud |
87
+ | `volumeSnapshot` | Current | CSI snapshots (recommended for Ceph, EBS, etc.) |
88
+ | `plugin` | Current | CNPG-I gRPC plugin for backup/WAL/recovery |
89
+
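+ A scheduled backup using the `volumeSnapshot` method from the table above might be declared as follows. This is a sketch; the resource name, the cluster name `app-db`, and the schedule are assumptions.
+
+ ```yaml
+ apiVersion: postgresql.cnpg.io/v1
+ kind: ScheduledBackup
+ metadata:
+   name: daily-snapshot        # hypothetical name
+ spec:
+   schedule: "0 0 2 * * *"     # six fields, seconds first: every day at 02:00:00
+   method: volumeSnapshot
+   cluster:
+     name: app-db              # assumed name of the target Cluster
+ ```
+
+ The Cluster itself still needs a `backup.volumeSnapshot` section so the operator knows which snapshot class to use.
+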
90
+ ## Affinity / Scheduling Patterns
91
+
92
+ - **nodeSelector** — Pin to specific node (common for stateful workloads)
93
+ - **podAntiAffinity** — Spread replicas across nodes (`required` for HA, `preferred` for soft)
94
+ - **topologySpreadConstraints** — Distribute across zones
95
+ - **tolerations** — For tainted nodes (GPU, infra-only)
96
+
97
+ Set per-instance overrides via `instances` + `nodeSelector` to pin specific roles.
98
+
99
+ ## ImageCatalog (v1.29+)
100
+
101
+ ```yaml
+ apiVersion: postgresql.cnpg.io/v1
+ kind: ClusterImageCatalog
+ metadata:
+   name: postgres-extensions
+ spec:
+   images:
+     - image: ghcr.io/my-org/postgres:18-pgvector
+       major: 18
+ ```
111
+
112
+ Then reference it from the Cluster through `spec.imageCatalogRef`, which takes `apiGroup`, `kind`, `name`, and `major` rather than a bare catalog name.
113
+
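+ A Cluster that picks its image from the catalog above could be sketched like this. The cluster name `app-db` and the storage size are assumptions; `imageCatalogRef` replaces `imageName`, and the two cannot be set together.
+
+ ```yaml
+ apiVersion: postgresql.cnpg.io/v1
+ kind: Cluster
+ metadata:
+   name: app-db                      # hypothetical name
+ spec:
+   instances: 1
+   imageCatalogRef:
+     apiGroup: postgresql.cnpg.io
+     kind: ClusterImageCatalog
+     name: postgres-extensions       # catalog defined above
+     major: 18                       # selects the catalog entry with this major version
+   storage:
+     size: 10Gi                      # assumed size
+ ```
+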
114
+ ## Common Mistakes
115
+
116
+ - **BarmanObjectStore as default** — deprecated, use `volumeSnapshot` or `plugin`
+ - **No WAL storage** — important for HA. Add `walStorage` block for performance
+ - **`resizePVC: false`** — prevents storage expansion on Helm upgrade
+ - **`primaryUpdateStrategy: supervised`** — requires manual approval for switchover; use `unsupervised` for automatic
+ - **Same storageClass for all workloads** — ceph-block fine for most, but immich needs local-fast for vector extension performance
+ - **Missing `INHERITED_ANNOTATIONS` / `INHERITED_LABELS`** — empty is fine; only set if you want annotations propagated to PVCs
122
+
123
+ ## Version Mapping
124
+
125
+ | Helm Chart | Operator | Release Notes |
126
+ |------------|----------|---------------|
127
+ | 0.28.2 | 1.29.1 | Latest (CVE fixes) |
128
+ | 0.28.0 | 1.29.0 | ImageCatalog, podSelectorRefs, CNPG-I plugins |
129
+
130
+ Check `helm list -n <ns>` for deployed chart version; cross-reference with [cloudnative-pg.io/release-notes](https://cloudnative-pg.io/release-notes/).