k8s-agent-skills 1.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +102 -0
- package/package.json +63 -0
- package/skills/atlas/SKILL.md +166 -0
- package/skills/cert-manager/SKILL.md +212 -0
- package/skills/cilium-gateway/SKILL.md +283 -0
- package/skills/cilium-network/SKILL.md +243 -0
- package/skills/cnpg/SKILL.md +130 -0
- package/skills/dragonfly/SKILL.md +194 -0
- package/skills/external-dns/SKILL.md +185 -0
- package/skills/flagger/SKILL.md +292 -0
- package/skills/flux/SKILL.md +36 -0
- package/skills/gitea/SKILL.md +32 -0
- package/skills/gitea-api/SKILL.md +104 -0
- package/skills/gitea-registry/SKILL.md +71 -0
- package/skills/gitea-runner/SKILL.md +126 -0
- package/skills/gitea-tea/SKILL.md +206 -0
- package/skills/gitea-webhooks/SKILL.md +93 -0
- package/skills/harbor/SKILL.md +32 -0
- package/skills/harbor-api/SKILL.md +231 -0
- package/skills/harbor-helm/SKILL.md +238 -0
- package/skills/harbor-terraform/SKILL.md +233 -0
- package/skills/higress/SKILL.md +27 -0
- package/skills/higress-helm/SKILL.md +328 -0
- package/skills/higress-operator/SKILL.md +435 -0
- package/skills/kserve/SKILL.md +28 -0
- package/skills/kserve-helm/SKILL.md +330 -0
- package/skills/kserve-operator/SKILL.md +763 -0
- package/skills/kubeflow/SKILL.md +33 -0
- package/skills/kubeflow-pipelines/SKILL.md +392 -0
- package/skills/kubeflow-trainer/SKILL.md +429 -0
- package/skills/kubeflow-training-operator/SKILL.md +176 -0
- package/skills/mariadb/SKILL.md +27 -0
- package/skills/mariadb-helm/SKILL.md +378 -0
- package/skills/mariadb-operator/SKILL.md +1114 -0
- package/skills/nvidia-device-plugin/SKILL.md +204 -0
- package/skills/rook-ceph/SKILL.md +22 -0
- package/skills/rook-ceph-operator/SKILL.md +150 -0
- package/skills/rook-ceph-toolbox/SKILL.md +220 -0
- package/skills/sealed-secrets/SKILL.md +221 -0
- package/skills/stakater-reloader/SKILL.md +259 -0
- package/skills/talos/SKILL.md +244 -0
- package/skills/tekton/SKILL.md +187 -0
- package/skills/vector/SKILL.md +24 -0
- package/skills/vector-helm/SKILL.md +186 -0
- package/skills/vector-operator/SKILL.md +455 -0
- package/skills/victoria-metrics/SKILL.md +35 -0
- package/skills/victoriametrics-operator/SKILL.md +248 -0
- package/skills/zitadel/SKILL.md +24 -0
- package/skills/zitadel-api/SKILL.md +962 -0
- package/skills/zitadel-helm/SKILL.md +263 -0
- package/skills/zitadel-terraform/SKILL.md +728 -0
@@ -0,0 +1,194 @@
---
name: dragonfly
description: Use when working with DragonflyDB operator on Kubernetes — creating or troubleshooting Dragonfly resources, configuring replication, snapshots, TLS, authentication, or affinity/scheduling for Dragonfly instances.
---

# DragonflyDB Operator

## Overview

The Dragonfly operator manages Dragonfly (Redis-compatible) in-memory data store instances on Kubernetes. API: `dragonflydb.io/v1alpha1`. Single CRD: `Dragonfly` (plural: `dragonflies`). Deployed via Helm chart `oci://ghcr.io/dragonflydb/dragonfly-operator/helm/dragonfly-operator`. Latest operator: **v1.5.0** (Mar 2026). Deployed chart: v1.5.0.
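
A minimal Flux sketch for installing the operator from that OCI chart. It follows the HelmRepository/HelmRelease pattern used in the other skills here. The resource names and intervals are assumptions, and the namespace is omitted. The split of repository URL, chart name, and version follows the OCI path above and the chart notes under Common Mistakes.

```yaml
# Assumed names and intervals; adjust to your cluster layout.
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: dragonfly-operator
spec:
  type: oci                  # chart is served from an OCI registry
  interval: 24h
  url: oci://ghcr.io/dragonflydb/dragonfly-operator/helm
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: dragonfly-operator
spec:
  interval: 1h               # required by helm.toolkit.fluxcd.io/v2
  chart:
    spec:
      chart: dragonfly-operator
      version: v1.5.0        # version string as given in this skill
      sourceRef:
        kind: HelmRepository
        name: dragonfly-operator
```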

## CRD Fields

| Field | Type | Since | Description |
|-------|------|-------|-------------|
| `replicas` | int | — | Total instances (1 = standalone primary, 2 = 1 primary + 1 replica) |
| `image` | string | — | Dragonfly image (default: `docker.dragonflydb.io/dragonflydb/dragonfly:v1.21.2`) |
| `args` | []string | — | Dragonfly server args (e.g. `--maxmemory=2gb`) |
| `resources` | ResourceRequirements | — | Container CPU/memory |
| `affinity` | Affinity | — | Pod affinity (nodeAffinity, podAntiAffinity, etc.) |
| `nodeSelector` | map | v1.1.1 | Node selector for pod scheduling |
| `tolerations` | []Toleration | — | Pod tolerations |
| `topologySpreadConstraints` | []TopologySpreadConstraint | v1.1.1 | Spread pods across topology domains |
| `annotations` | object | — | Annotations on Dragonfly pods |
| `labels` | object | — | Labels on Dragonfly pods |
| `env` | []EnvVar | — | Environment variables |
| `authentication.passwordFromSecret` | SecretKeySelector | — | Password from Secret key |
| `authentication.clientCaCertSecret` | SecretReference | — | Client CA certificate Secret |
| `tlsSecretRef` | SecretReference | — | TLS cert Secret for server TLS |
| `snapshot.cron` | string | — | Cron schedule for snapshots (e.g. `"*/5 * * * *"`) |
| `snapshot.persistentVolumeClaimSpec` | PVC Spec | — | PVC for snapshot storage |
| `aclFromSecret` | SecretKeySelector | v1.1.1 | ACL file from Secret |
| `serviceAccountName` | string | — | Pod service account |
| `serviceSpec.type` | string | — | Service type (ClusterIP, LoadBalancer, etc.) |
| `serviceSpec.name` | string | v1.1.3 | Custom service name |
| `serviceSpec.annotations` | object | — | Service annotations |
| `priorityClassName` | string | v1.1.1 | Pod priority class |
| `skipFSGroup` | bool | v1.1.2 | Skip FSGroup assignment (OpenShift) |
| `memcachedPort` | int | v1.1.2 | Memcached port (alternative to `--memcached_port` arg) |
| `additionalContainers` | []Container | — | Sidecar containers |
| `additionalVolumes` | []Volume | — | Extra volumes |

## Dragonfly Spec Patterns

```yaml
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: my-cache
spec:
  replicas: 1  # 1 = standalone primary
  args:
    - --maxmemory=2gb
    - --logtostderr
    - --cluster_mode=emulated  # Enable cluster-compatible mode
    - --lock_on_hashtags  # Hashtag-based locking
    - --default_lua_flags=allow-undeclared-keys

  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "1"
      memory: 2Gi

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - worker-proxmox

  authentication:
    passwordFromSecret:
      name: dragonfly-password
      key: password

  snapshot:
    cron: "*/5 * * * *"
    persistentVolumeClaimSpec:
      accessModes:
        - ReadWriteOnce
      storageClassName: ceph-block
      resources:
        requests:
          storage: 2Gi

  serviceSpec:
    type: ClusterIP
    annotations:
      external-dns.alpha.kubernetes.io/hostname: dragonfly.example.com
```

## Replication Model

- `replicas=1` — standalone primary
- `replicas=2` — 1 primary + 1 replica
- `replicas=3` — 1 primary + 2 replicas
- **Always exactly 1 primary** regardless of replica count
- If the primary fails, the operator promotes a replica (automatic failover)
- Service `<name>.<ns>.svc.cluster.local` always points to the current primary
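
When `replicas` is greater than 1, spread the pods across nodes so that a single node failure cannot take out the primary and its replicas together. The sketch below uses standard pod anti-affinity in place of the single-host `nodeAffinity` from the spec example above. It assumes the operator's default `app: <dragonfly-name>` pod label (see Monitoring) and the `my-cache` name.

```yaml
spec:
  replicas: 3  # 1 primary + 2 replicas
  affinity:
    podAntiAffinity:
      # "preferred" so pods still schedule when there are fewer nodes than replicas
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                app: my-cache  # assumed operator-set pod label
```

If co-located replicas are unacceptable, switch to `requiredDuringSchedulingIgnoredDuringExecution`. Pods then stay Pending rather than share a node.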

## Authentication

| Method | Config | Description |
|--------|--------|-------------|
| Password | `authentication.passwordFromSecret` | Basic password auth (maps to `--requirepass`) |
| Client CA | `authentication.clientCaCertSecret` | TLS client cert verification |
| ACL file | `aclFromSecret` | ACL rules file from Secret (v1.1.1+) |

Password can also be set via `args: ["--requirepass=<pw>"]` or `env: [{name: DFLY_requirepass, value: "<pw>"}]`. Both forms store the password in plain text in the Dragonfly spec.
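
The Secret referenced by `passwordFromSecret` in the spec example above is a plain Opaque Secret. It must be in the same namespace, because a `SecretKeySelector` has no namespace field. This is a minimal sketch, and the password value is a placeholder.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: dragonfly-password  # matches authentication.passwordFromSecret.name
type: Opaque
stringData:
  password: change-me       # placeholder; the key matches passwordFromSecret.key
```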

## TLS

```yaml
spec:
  tlsSecretRef:
    name: dragonfly-tls  # Secret must have tls.crt, tls.key
```

The Secret must exist in the same namespace as the Dragonfly resource. Optionally combine with `authentication.clientCaCertSecret` for mutual TLS.
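
A Secret of the standard `kubernetes.io/tls` type carries exactly the `tls.crt` and `tls.key` keys expected above. The sketch below uses placeholder data. In practice, a certificate controller or `kubectl create secret tls dragonfly-tls --cert=<cert-file> --key=<key-file>` would create it.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: dragonfly-tls  # referenced by spec.tlsSecretRef.name
type: kubernetes.io/tls
data:
  # base64-encoded PEM; placeholders, not real values
  tls.crt: <base64-encoded certificate>
  tls.key: <base64-encoded private key>
```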

## Snapshots

Snapshots store Dragonfly data on a PVC so it persists across restarts:

```yaml
spec:
  snapshot:
    cron: "*/5 * * * *"
    persistentVolumeClaimSpec:
      storageClassName: ceph-block
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 2Gi
```

- `cron` is optional — omit it to snapshot on demand only (e.g. via `SAVE`/`BGSAVE`)
- Snapshots are **not auto-pruned** — manage disk usage, or set a static `--dbfilename` so each snapshot overwrites the previous one
- The PVC spec follows the standard Kubernetes `PersistentVolumeClaimSpec`
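
The sketch below shows the static-filename approach. Dragonfly's default `--dbfilename` includes a `{timestamp}` placeholder, so each save creates a new file. A fixed name makes every scheduled snapshot replace the previous one; `dump` is an arbitrary choice here.

```yaml
spec:
  args:
    - --dbfilename=dump  # no {timestamp}: each snapshot overwrites the last one
  snapshot:
    cron: "*/5 * * * *"
    persistentVolumeClaimSpec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 2Gi
```

The trade-off is that only the latest snapshot is kept, so you cannot roll back from a bad save to an earlier one.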

## Exposing Dragonfly as Cache

Applications connect via the service at `<name>.<ns>.svc.cluster.local:6379`. Any Redis- or Valkey-compatible client works, for example:

```yaml
# In the app's container spec
env:
  - name: REDIS_PASSWORD  # from the Dragonfly auth Secret
    valueFrom: {secretKeyRef: {name: dragonfly-password, key: password}}
  - name: REDIS_URL  # $(VAR) expands an env var defined earlier in the list
    value: redis://:$(REDIS_PASSWORD)@my-cache.namespace:6379
```

With emulated cluster mode (`--cluster_mode=emulated`), Redis Cluster clients (e.g. `ioredis` in cluster mode) can connect as if to a Redis Cluster. Behind the service, however, a single Dragonfly primary serves all hash slots.

## Monitoring

Dragonfly exposes metrics on the `admin` port. Use a `PodMonitor` to scrape them:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: dragonfly-monitor
spec:
  selector:
    matchLabels:
      app: my-cache  # Must match the Dragonfly resource name
  podMetricsEndpoints:
    - port: admin
```

The operator creates pods with label `app: <dragonfly-name>` by default.

## Common Mistakes

- **`replicas` confusion** — `replicas: 2` means 1 primary + 1 replica, NOT 2 primaries
- **No snapshot cron** — without `cron` + PVC, restarts lose all data; always configure both for stateful use
- **`cluster_mode=emulated` without `lock_on_hashtags`** — emulated cluster needs hashtag locking for multi-key ops
- **Password collision** — don't set the password via both `authentication.passwordFromSecret` AND the `--requirepass` arg; use the CRD field
- **Same storageClass for all** — ceph-block is fine for general caches (e.g. immich); for latency-sensitive workloads, use local SSD storage and pin pods with `nodeSelector`
- **No resource limits** — Dragonfly can OOM under load; always set `resources.limits.memory`
- **Helm chart OCI URL** — use `oci://ghcr.io/dragonflydb/dragonfly-operator/helm`, chart name `dragonfly-operator`, version `v1.5.0`

## Version History

| Operator | Helm Chart | Date | Notes |
|----------|------------|------|-------|
| v1.5.0 | v1.5.0 | Mar 2026 | Latest (deployed) |
| v1.4.0 | v1.4.0 | Jan 2026 | |
| v1.3.1 | v1.3.1 | Nov 2025 | |
@@ -0,0 +1,185 @@
---
name: external-dns
description: Use when working with ExternalDNS — synchronizing Kubernetes resources with DNS providers (Cloudflare). Covers providers, sources, registry, RBAC, Gateway API integration, Helm values. No CRDs.
---

# ExternalDNS

## Overview

ExternalDNS synchronizes exposed Kubernetes Services, Ingresses, and Gateway API routes with DNS providers. It watches resources via the Kubernetes watch API and creates, updates, or deletes DNS records to match.

**No CRDs required.** ExternalDNS is controlled via CLI flags, sources, and provider-specific config. The optional `crd` source is the exception: it reads `DNSEndpoint` resources.

**Latest:** chart 1.21.1, app v0.21.0.

## Architecture

```
K8s Sources (Ingress, HTTPRoute, Service...)
  → ExternalDNS watches for changes
  → Resolves to DNS endpoints
  → Creates/updates/deletes DNS records (Cloudflare, Route53, etc.)
  → Registry (TXT records) tracks ownership
```

## Sources

ExternalDNS queries one or more source types for DNS endpoints:

| Source | Flag | Supported |
|--------|------|-----------|
| Ingress | `--source=ingress` | ✅ |
| Service | `--source=service` | ✅ (LoadBalancer, NodePort, ExternalName, headless; plain ClusterIP needs `--publish-internal-services`) |
| Gateway HTTPRoute | `--source=gateway-httproute` | ✅ |
| Gateway GRPCRoute | `--source=gateway-grpcroute` | ✅ |
| Gateway TLSRoute | `--source=gateway-tlsroute` | ✅ (v1alpha2) |
| Gateway TCPRoute | `--source=gateway-tcproute` | ✅ (experimental) |
| Gateway UDPRoute | `--source=gateway-udproute` | ✅ (experimental) |
| Istio Gateway | `--source=istio-gateway` | ✅ |
| Istio VirtualService | `--source=istio-virtualservice` | ✅ |
| CRD | `--source=crd` | ✅ (externaldns.k8s.io/v1alpha1) |
| Node | `--source=node` | ✅ |
| Pod | `--source=pod` | ✅ |
| OpenShift Route | `--source=openshift-route` | ✅ |
| Contour HTTPProxy | `--source=contour-httpproxy` | ✅ |
| Traefik Proxy | `--source=traefik-proxy` | ✅ |

**Deployed sources:** `ingress`, `gateway-httproute`.

## Providers

ExternalDNS supports 25+ providers. Deployed: Cloudflare.

### Cloudflare

Auth via API token (env var `CF_API_TOKEN`). Example values:

```yaml
provider:
  name: cloudflare
env:
  - name: CF_API_TOKEN
    valueFrom:
      secretKeyRef:
        name: cloudflare-credentials
        key: api-token
extraArgs:
  - --cloudflare-proxied
```
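
The `cloudflare-credentials` Secret referenced above must be in the namespace where ExternalDNS runs. The sketch below uses a placeholder token, and the `external-dns` namespace is an assumption.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-credentials
  namespace: external-dns  # assumed; use the namespace ExternalDNS runs in
type: Opaque
stringData:
  api-token: <cloudflare-api-token>  # placeholder; token needs Zone:Read and DNS:Edit
```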

Cloudflare-specific flags:

| Flag | Description |
|------|-------------|
| `--cloudflare-proxied` | Enable Cloudflare proxy (orange cloud) — CDN, DDoS protection, SSL |
| `--cloudflare-dns-records-per-page=N` | Records per page (default 100, max 5000) |
| `--cloudflare-custom-hostnames` | Enable Cloudflare for SaaS Custom Hostnames |
| `--cloudflare-regional-services` | Restrict HTTPS decryption to specific regions |
| `--cloudflare-region-key` | Region key for regional services |
| `--cloudflare-record-comment` | Add a comment to provisioned records (max 100 chars on Free plans, 500 on paid plans) |

## Registry & Policy

Controls how ExternalDNS tracks ownership of records:

```yaml
registry: txt  # Use TXT records to track ownership
txtOwnerId: my-cluster  # Owner identifier in TXT record
policy: upsert-only  # Only create/update, never delete
```

| Registry | Description |
|----------|-------------|
| `txt` | TXT records with owner ID (prevents overwriting records from other sources) |
| `dynamodb` | Ownership stored in an AWS DynamoDB table (AWS provider only) |
| `aws-sd` | Ownership stored in AWS Cloud Map (AWS SD provider only) |
| `noop` | No ownership tracking |

| Policy | Description |
|--------|-------------|
| `upsert-only` | Create and update only (safe for shared zones) |
| `sync` | Full sync — create, update, delete (can delete external records) |
| `create-only` | Only create, never update or delete |

## Domain & Ownership Filtering

```yaml
domainFilters:
  - example.com  # Only manage records in this zone
excludeDomains: []  # Exclude specific domains
zoneIdFilters: []  # Limit to specific zone IDs
annotationFilter: ""  # Filter resources by annotation
labelFilter: ""  # Filter resources by label
```

## Gateway API Integration

ExternalDNS reads hostnames from Gateway API HTTPRoute/GRPCRoute resources:

```yaml
sources:
  - gateway-httproute
  - gateway-grpcroute
```

Gateway sources need read access to `gateways`, the route kinds, and `namespaces`. If the chart's ClusterRole does not already grant it, add it via `rbac.additionalPermissions`:

```yaml
rbac:
  additionalPermissions:
    - apiGroups: ["gateway.networking.k8s.io"]
      resources: ["httproutes", "grpcroutes", "gateways"]
      verbs: ["get", "watch", "list"]
    - apiGroups: [""]
      resources: ["namespaces"]
      verbs: ["get", "watch", "list"]
```

HTTPRoute annotations for per-route overrides:

```yaml
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/cloudflare-proxied: "true"
    external-dns.alpha.kubernetes.io/ttl: "300"
```
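
For reference, here is a complete HTTPRoute that the `gateway-httproute` source would pick up. Record names come from `spec.hostnames`, and targets come from the addresses in the parent Gateway's status. The route, Gateway, namespace, and backend names are hypothetical.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app            # hypothetical
  namespace: apps
  annotations:
    external-dns.alpha.kubernetes.io/ttl: "300"
spec:
  parentRefs:
    - name: main-gateway   # hypothetical Gateway; its status addresses become record targets
      namespace: gateway
  hostnames:
    - app.example.com      # must fall within domainFilters
  rules:
    - backendRefs:
        - name: app
          port: 80
```

If no record appears, check that the Gateway has accepted the route (`status.parents[].conditions`) and that the Gateway reports addresses in its status.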

## Deployment (Flux HelmRelease)

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: external-dns
spec:
  interval: 1h  # required by helm.toolkit.fluxcd.io/v2
  chart:
    spec:
      chart: external-dns
      sourceRef:
        kind: HelmRepository
        name: external-dns
      version: 1.21.1
```
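
Below is the `external-dns` HelmRepository that `sourceRef` points to. This sketch assumes the upstream kubernetes-sigs chart repository, which provides the values listed below (e.g. `provider.name`, `sources`). Without `sourceRef.namespace`, the repository must be in the same namespace as the HelmRelease.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: external-dns
spec:
  interval: 24h
  url: https://kubernetes-sigs.github.io/external-dns/
```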

### Helm Values

| Value | Default | Description |
|-------|---------|-------------|
| `provider.name` | `aws` | DNS provider (cloudflare, google, aws, azure, etc.) |
| `sources` | `[service, ingress]` | Resources to watch (ingress, gateway-httproute, service, etc.) |
| `domainFilters` | `[]` | Limit to specific DNS zones |
| `policy` | `upsert-only` | Sync policy (upsert-only, sync, create-only) |
| `registry` | `txt` | Ownership registry (txt, dynamodb, aws-sd, noop) |
| `txtOwnerId` | — | Owner identifier for TXT registry |
| `interval` | `1m` | Sync interval |
| `logLevel` | `info` | Log verbosity |
| `rbac.create` | `true` | Create ClusterRole |
| `rbac.additionalPermissions` | `[]` | Additional ClusterRole rules (e.g. for Gateway API) |
| `nodeSelector` | `{}` | Node selector |
| `tolerations` | `[]` | Pod tolerations |
| `extraArgs` | `[]` | Additional CLI args |

## Common Mistakes

- **Missing RBAC for Gateway sources.** Without read access to `gateways`, the route kinds, and `namespaces`, the gateway-httproute source returns no endpoints. If the chart's ClusterRole lacks these rules, add them via `rbac.additionalPermissions`.
- **`upsert-only` doesn't clean up stale records.** When an Ingress/HTTPRoute is deleted, its DNS record persists. Use the `sync` policy or clean up manually.
- **Cloudflare API token needs specific permissions.** The token needs `Zone:Read` and `DNS:Edit` on the target zone. A token with only `Zone:Read` cannot create or update records.
- **`--cloudflare-proxied` is a global flag.** To proxy only specific records, omit the global flag and use the `external-dns.alpha.kubernetes.io/cloudflare-proxied: "true"` annotation per resource.
- **Domain filter matches the domain and its subdomains.** `kubexa.tech` matches `kubexa.tech` and `app.kubexa.tech` but NOT `kubexa.tech.app.com`. A leading dot (`.kubexa.tech`) restricts the filter to subdomains only.
- **NodePort services publish node addresses.** With `source=service`, a NodePort Service gets records that point at the cluster nodes, not at a load balancer. Use a LoadBalancer Service if a single external IP is expected.
- **Multiple TXT owner IDs on same zone.** Two ExternalDNS instances with different `txtOwnerId` values can manage the same zone. Each instance only modifies records whose TXT ownership record matches its own ID, so it leaves the other instance's records untouched.
@@ -0,0 +1,292 @@
---
name: flagger
description: Use when working with Flagger — progressive delivery, canary deployments, A/B testing, blue/green on Kubernetes. Covers Canary CRD, analysis/meshes/metrics/webhooks, Helm values. CRDs: Canary, MetricTemplate, AlertProvider.
---

# Flagger

## Overview

Flagger automates progressive delivery for Kubernetes workloads. It gradually shifts traffic to a new version while measuring metrics and running conformance tests. It supports canary releases (weighted traffic), A/B testing (header/cookie routing), and blue/green deployments (instant switch or mirroring).

**CRDs:** `Canary` (flagger.app/v1beta1), `MetricTemplate`, `AlertProvider`.

**Latest:** chart 1.43.0, app v1.43.0 (Apr 2026).

## Architecture

```
User applies a Canary resource
  → Flagger bootstraps:
      - <name>-primary Deployment (stable version, cloned from the target)
      - <name> Deployment (the target itself; runs the new version as the canary, scaled to 0 between releases)
      - <name> ClusterIP service (routes to primary)
      - <name>-primary ClusterIP service (stable)
      - <name>-canary ClusterIP service (new)
      - Mesh/Ingress routing objects (if a mesh/ingress provider is set)
  → A change to the target Deployment (e.g. a new image) starts the analysis loop:
      1. Run pre-rollout webhooks
      2. Increment traffic to the canary (stepWeight)
      3. Run rollout webhooks and check metrics (success rate, duration, custom)
      4. When all steps pass → promote canary to primary, then run post-rollout webhooks
      5. When failed checks reach `threshold` → roll back
```

## CRD: Canary

`apiVersion: flagger.app/v1beta1`, `kind: Canary`

### Minimal Example (Kubernetes CNI — no mesh)

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
  namespace: prod
spec:
  provider: kubernetes  # No service mesh (uses ClusterIP + pod labels)
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  service:
    port: 9898
    portDiscovery: true
  analysis:
    interval: 1m
    threshold: 5
    iterations: 10  # Used for blue/green with no mesh provider
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
    webhooks:
      - name: load-test
        type: rollout
        url: http://flagger-loadtester.test/
        metadata:
          cmd: "hey -z 1m -q 10 http://myapp-canary.prod:9898/"
```

### CanarySpec Fields

| Field | Required | Description |
|-------|----------|-------------|
| `provider` | No | Traffic provider (defaults to the `meshProvider` Flagger runs with): `kubernetes`, `istio`, `linkerd`, `nginx`, `contour`, `gloo`, `traefik`, `gatewayapi:v1`, `apisix`, `kuma`, `knative`, `skipper`, `osm`, `smi:v1alpha2`, `appmesh:v1beta2` |
| `targetRef` | Yes | Target workload reference (Deployment or DaemonSet) |
| `autoscalerRef` | No | HPA reference (Flagger creates a `-primary` copy that targets the primary Deployment) |
| `service` | Yes | Service spec (port, portName, targetPort, hosts, gatewayRefs, match, rewrite, timeout, headers, etc.) |
| `suspend` | No | Suspend all canary runs |
| `progressDeadlineSeconds` | No | Max time for canary progress before rollback (default 600) |
| `skipAnalysis` | No | Promote without analysis (default false) |

### AnalysisSpec Fields

| Field | Required | Description |
|-------|----------|-------------|
| `interval` | Yes | Schedule interval (e.g. `1m`, `30s`) |
| `threshold` | Yes | Max failed checks before rollback |
| `maxWeight` | Canary | Max traffic % to canary (0-100). Used with `stepWeight` |
| `stepWeight` | Canary | Traffic increment per interval (0-100). Used with `maxWeight` |
| `stepWeights` | Canary | Explicit array of traffic weights. Replaces stepWeight |
| `stepWeightPromotion` | No | Traffic increment during promotion phase |
| `iterations` | A/B, Blue/Green | Number of iterations (replaces stepWeight/maxWeight) |
| `match` | A/B | HTTP header/cookie match conditions for A/B testing |
| `mirror` | Blue/Green | Mirror traffic to canary (default false) |
| `mirrorWeight` | No | % of traffic to mirror (0-100) |
| `primaryReadyThreshold` | No | % of primary pods that must be ready before the analysis starts (default 100) |
| `canaryReadyThreshold` | No | % of canary pods that must be ready (default 100) |
| `metrics` | No | List of metric checks |
| `webhooks` | No | List of webhooks (pre-rollout, rollout, confirm-promotion, etc.) |
| `alerts` | No | List of alert configs |
| `sessionAffinity` | No | Session affinity settings for canary |
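
Below is a sketch of a weighted-canary `analysis` fragment. The values are illustrative and need a provider with traffic shaping (e.g. `istio`, `nginx`, or `gatewayapi:v1`). With `stepWeight: 10` and `maxWeight: 50`, canary traffic goes through 10%, 20%, 30%, 40%, and 50%, one step per `interval`. A successful run therefore takes at least `interval × maxWeight / stepWeight` = 1m × 50 / 10 = 5 minutes before promotion.

```yaml
analysis:
  interval: 1m     # one traffic step per minute
  threshold: 5     # roll back after 5 failed checks
  stepWeight: 10   # +10% canary traffic per step
  maxWeight: 50    # last step before promotion
  metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
```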

### Analysis Strategies

| Strategy | Fields | Traffic shaping | Use case |
|----------|--------|-----------------|----------|
| Canary (weighted) | `stepWeight` + `maxWeight` | Gradual traffic shift | Gradual rollout with metrics |
| Canary (custom steps) | `stepWeights: [5, 10, 25, 50, 75]` | Custom traffic steps | Non-linear rollout |
| A/B Testing | `iterations` + `match` | Header/cookie routing | Test specific user segments |
| Blue/Green | `iterations` | Instant switch | Quick rollback or pre-production validation |
| Blue/Green Mirror | `iterations` + `mirror: true` | Traffic mirroring | Shadow traffic without impact |
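
Below is a sketch of an A/B `analysis` fragment. For `iterations` intervals, only requests carrying an `x-canary: insider` header or a `canary=always` cookie are routed to the canary. The header and cookie names are illustrative. Header/cookie matching also requires a provider that supports A/B testing (see Provider-Specific Features).

```yaml
analysis:
  interval: 1m
  threshold: 5
  iterations: 10  # 10 intervals of header/cookie-routed traffic, then promotion
  match:          # entries are alternatives: either condition routes to the canary
    - headers:
        x-canary:
          exact: "insider"
    - headers:
        cookie:
          regex: "^(.*?;)?(canary=always)(;.*)?$"
```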

### Metrics

```yaml
metrics:
  - name: request-success-rate
    thresholdRange:
      min: 99
    interval: 1m
  - name: request-duration
    thresholdRange:
      max: 500
    interval: 30s
  - name: custom-metric
    templateRef:
      name: my-metric-template
      namespace: flagger
    thresholdRange:
      min: 2
      max: 100
    interval: 1m
```

Built-in metric checks (when `templateRef` is not set):

- `request-success-rate` — Prometheus `rate(...)` query: percentage of non-5xx responses
- `request-duration` — Prometheus `histogram_quantile(0.99, ...)` query: P99 latency in milliseconds

Custom metrics use the `MetricTemplate` CRD (see below).

### Webhooks

```yaml
webhooks:
  - name: "load test"
    type: rollout  # Run during canary analysis
    url: http://tester/  # Webhook endpoint
    timeout: 5m
    retries: 3
    disableTLS: false
    metadata:
      cmd: "hey -z 1m http://app:9898/"
```

Webhook types (execution order):

| Type | Phase | Purpose |
|------|-------|---------|
| `confirm-rollout` | Before the canary is scaled up (gating) | Manual approval gate |
| `pre-rollout` | Before traffic is routed to the canary | Acceptance tests, DB migrations check |
| `rollout` | During analysis (each step) | Load tests |
| `confirm-traffic-increase` | Before each traffic weight increase (gating) | Per-step manual approval |
| `confirm-promotion` | Before promotion (gating) | Manual approval for promotion |
| `post-rollout` | After promotion or rollback | Smoke tests, cleanup |
| `rollback` | During analysis or while waiting at a gate | Manual rollback trigger: a success response makes Flagger fail the canary and roll back |
| `event` | Every time Flagger emits a Kubernetes event | Informational events |
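
Gating webhooks are usually backed by the Flagger load tester's gate endpoints. The sketch below assumes the load tester runs as `flagger-loadtester` in namespace `test`, as in the minimal example above. The canary waits at `confirm-rollout` until the gate is opened by a `POST /gate/open` with the body `{"name": "myapp", "namespace": "prod"}`. A `POST /rollback/open` with the same body triggers a manual rollback.

```yaml
analysis:
  webhooks:
    - name: "ask for confirmation"
      type: confirm-rollout
      url: http://flagger-loadtester.test/gate/check      # halts until the gate is opened
    - name: "manual rollback"
      type: rollback
      url: http://flagger-loadtester.test/rollback/check  # succeeds once rollback is opened
```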

### Alerts

```yaml
alerts:
  - name: "Slack"
    severity: error  # info, warn, error
    providerRef:
      name: dev-slack
      namespace: flagger
```

## CRD: MetricTemplate

`apiVersion: flagger.app/v1beta1`, `kind: MetricTemplate`

Defines custom metric queries for canary analysis:

```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: db-connections
  namespace: flagger
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring:9090
  query: |
    avg_over_time(
      pg_stat_activity_count{namespace="{{ namespace }}",app="{{ target }}"}[{{ interval }}]
    )
```

Template variables: `{{ name }}` (Canary name), `{{ namespace }}`, `{{ target }}` (`targetRef` name), `{{ service }}` (service name), `{{ ingress }}` (ingress name), `{{ interval }}` (metric interval).

## CRD: AlertProvider

`apiVersion: flagger.app/v1beta1`, `kind: AlertProvider`

```yaml
apiVersion: flagger.app/v1beta1
kind: AlertProvider
metadata:
  name: dev-slack
  namespace: flagger
spec:
  type: slack
  channel: flagger-alerts
  username: flagger
  address: https://hooks.slack.com/services/TOKEN
```

Supported types: `slack`, `msteams`, `discord`, `rocket`, `gchat`.
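
To keep the Slack webhook URL out of the manifest, an AlertProvider can read the address from a Secret via `secretRef`. The Secret must have an `address` key, and that value takes precedence over `spec.address`. This is a sketch; the `slack-url` name and the URL are placeholders.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: slack-url   # placeholder name
  namespace: flagger
stringData:
  address: https://hooks.slack.com/services/YOUR/WEBHOOK/URL  # placeholder
---
apiVersion: flagger.app/v1beta1
kind: AlertProvider
metadata:
  name: dev-slack
  namespace: flagger
spec:
  type: slack
  channel: flagger-alerts
  username: flagger
  secretRef:
    name: slack-url  # webhook URL read from the Secret's "address" key
```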

## Deployment (Flux HelmRelease)

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: flagger
  namespace: flagger-system
spec:
  interval: 24h
  url: https://flagger.app
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: flagger
  namespace: flagger-system
spec:
  interval: 1h  # required by helm.toolkit.fluxcd.io/v2
  chart:
    spec:
      chart: flagger
      sourceRef:
        kind: HelmRepository
        name: flagger
      version: "1.39.0"
  values:
    meshProvider: kubernetes  # Kubernetes CNI mode (an empty value falls back to istio)
    metricsServer: "http://prometheus:9090"
    prometheus:
      install: false  # Use existing Prometheus
```
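
The minimal Canary example above calls a load tester at `http://flagger-loadtester.test/`. The sketch below installs it from the same HelmRepository via the `loadtester` chart. The `test` namespace and release name are chosen to produce that URL, assuming the chart's default naming. The cross-namespace `sourceRef` also assumes Flux is not running with cross-namespace references disabled.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: flagger-loadtester  # becomes the Service name the webhook URL points at
  namespace: test
spec:
  interval: 1h
  chart:
    spec:
      chart: loadtester
      sourceRef:
        kind: HelmRepository
        name: flagger
        namespace: flagger-system  # cross-namespace ref to the repository defined above
```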

## Helm Values

| Value | Default | Description |
|-------|---------|-------------|
| `meshProvider` | `""` | Traffic provider: kubernetes, istio, linkerd, nginx, contour, etc. When empty, Flagger falls back to its built-in default (`istio`) |
| `metricsServer` | `http://prometheus.istio-system:9090` | Prometheus URL |
| `logLevel` | `info` | Log level |
| `crd.create` | `false` | Create CRDs (Helm v3 handles this separately) |
| `prometheus.install` | `false` | Install bundled Prometheus |
| `prometheus.retention` | `2h` | Prometheus data retention |
| `serviceMonitor.enabled` | `false` | Create ServiceMonitor |
| `podMonitor.enabled` | `false` | Create PodMonitor |
| `namespace` | `""` (all) | Watch single namespace (empty = all) |
| `selectorLabels` | `app,name,app.kubernetes.io/name` | Labels for workload selection |

## Provider-Specific Features

| Feature | Istio | Linkerd | Contour | NGINX | Kubernetes | Gateway API |
|---------|-------|---------|---------|-------|------------|-------------|
| Weighted canary | ✅ | ✅ | ✅ | ✅ | ➖ | ✅ |
| A/B testing | ✅ | ➖ | ✅ | ✅ | ➖ | ✅ |
| Blue/green (switch) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Blue/green (mirror) | ✅ | ➖ | ➖ | ➖ | ➖ | ➖ |
| Request success rate | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Request duration | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

## Common Mistakes

- **The `kubernetes` provider has no traffic shaping.** With `provider: kubernetes` (or `meshProvider: kubernetes`), Flagger can only do blue/green (iterations-based). Weighted canary (stepWeight) requires a service mesh or ingress controller.
- **Prometheus must be reachable.** Without `metricsServer`, the analysis loop immediately fails. Verify the Prometheus URL and that Flagger can query it.
- **CRDs not installed.** `crd.create: false` means CRDs must be installed separately. If running Flux, the CRDs from the upstream `crds.yaml` must exist before Canary resources are applied.
- **Webhook URL must be reachable from the Flagger pod.** Load test webhooks are called during canary analysis. If a webhook times out or returns an error, the check fails. Use cluster-internal URLs.
- **`targetRef` must be a Deployment or DaemonSet.** Other workload types (e.g. StatefulSet) are not supported.
- **`progressDeadlineSeconds` too low.** If the canary takes longer than this to progress (e.g. image pull delay, slow startup), Flagger rolls back. Default 600s. Increase it for large images.
- **Missing `service.port`.** Required field. Flagger creates the ClusterIP services on this port; set `targetPort` if the container listens on a different port.
- **`{{ target }}` is the `targetRef` name, not the Service name.** In MetricTemplate queries, `{{ target }}` resolves to the target workload name; use `{{ service }}` for the service name. Ensure the query's label selectors match the labels on the pods or services being measured.