k8s-agent-skills 1.2.0

Files changed (51)
  1. package/README.md +102 -0
  2. package/package.json +63 -0
  3. package/skills/atlas/SKILL.md +166 -0
  4. package/skills/cert-manager/SKILL.md +212 -0
  5. package/skills/cilium-gateway/SKILL.md +283 -0
  6. package/skills/cilium-network/SKILL.md +243 -0
  7. package/skills/cnpg/SKILL.md +130 -0
  8. package/skills/dragonfly/SKILL.md +194 -0
  9. package/skills/external-dns/SKILL.md +185 -0
  10. package/skills/flagger/SKILL.md +292 -0
  11. package/skills/flux/SKILL.md +36 -0
  12. package/skills/gitea/SKILL.md +32 -0
  13. package/skills/gitea-api/SKILL.md +104 -0
  14. package/skills/gitea-registry/SKILL.md +71 -0
  15. package/skills/gitea-runner/SKILL.md +126 -0
  16. package/skills/gitea-tea/SKILL.md +206 -0
  17. package/skills/gitea-webhooks/SKILL.md +93 -0
  18. package/skills/harbor/SKILL.md +32 -0
  19. package/skills/harbor-api/SKILL.md +231 -0
  20. package/skills/harbor-helm/SKILL.md +238 -0
  21. package/skills/harbor-terraform/SKILL.md +233 -0
  22. package/skills/higress/SKILL.md +27 -0
  23. package/skills/higress-helm/SKILL.md +328 -0
  24. package/skills/higress-operator/SKILL.md +435 -0
  25. package/skills/kserve/SKILL.md +28 -0
  26. package/skills/kserve-helm/SKILL.md +330 -0
  27. package/skills/kserve-operator/SKILL.md +763 -0
  28. package/skills/kubeflow/SKILL.md +33 -0
  29. package/skills/kubeflow-pipelines/SKILL.md +392 -0
  30. package/skills/kubeflow-trainer/SKILL.md +429 -0
  31. package/skills/kubeflow-training-operator/SKILL.md +176 -0
  32. package/skills/mariadb/SKILL.md +27 -0
  33. package/skills/mariadb-helm/SKILL.md +378 -0
  34. package/skills/mariadb-operator/SKILL.md +1114 -0
  35. package/skills/nvidia-device-plugin/SKILL.md +204 -0
  36. package/skills/rook-ceph/SKILL.md +22 -0
  37. package/skills/rook-ceph-operator/SKILL.md +150 -0
  38. package/skills/rook-ceph-toolbox/SKILL.md +220 -0
  39. package/skills/sealed-secrets/SKILL.md +221 -0
  40. package/skills/stakater-reloader/SKILL.md +259 -0
  41. package/skills/talos/SKILL.md +244 -0
  42. package/skills/tekton/SKILL.md +187 -0
  43. package/skills/vector/SKILL.md +24 -0
  44. package/skills/vector-helm/SKILL.md +186 -0
  45. package/skills/vector-operator/SKILL.md +455 -0
  46. package/skills/victoria-metrics/SKILL.md +35 -0
  47. package/skills/victoriametrics-operator/SKILL.md +248 -0
  48. package/skills/zitadel/SKILL.md +24 -0
  49. package/skills/zitadel-api/SKILL.md +962 -0
  50. package/skills/zitadel-helm/SKILL.md +263 -0
  51. package/skills/zitadel-terraform/SKILL.md +728 -0
@@ -0,0 +1,330 @@
1
+ # KServe — Helm Deployment
2
+
3
+ **Repo:** `https://kserve.github.io/helm-charts`
4
+ **Charts:** 10 (see table below)
5
+ **Latest:** v0.18.0
6
+
7
+ ## Charts (10)
8
+
9
+ | Chart | Namespace | Purpose |
10
+ |-------|-----------|---------|
11
+ | `kserve-crd` | `kserve` | Core CRDs only |
12
+ | `kserve-resources` | `kserve` | Controller + webhook + storage init |
13
+ | `kserve-llmisvc-crd` | `kserve` | LLMInferenceService CRD only |
14
+ | `kserve-llmisvc-resources` | `kserve` | LLMInferenceService controller |
15
+ | `kserve-localmodel-crd` | `kserve` | LocalModelCache + LocalModelNode CRDs |
16
+ | `kserve-localmodel-resources` | `kserve` | LocalModel controller + node agent |
17
+ | `kserve-runtime-configs` | `kserve` | Default ServingRuntime configs |
18
+ | `kserve-crd-minimal` | `kserve` | Min CRDs (InferenceService only) |
19
+ | `kserve-resources-minimal` | `kserve` | Min controller (no Mesh/Knative) |
20
+ | `kserve-runtime-configs-minimal` | `kserve` | Minimal runtime configs |
21
+
22
+ ## Prerequisites
23
+
24
+ | Dependency | Version | Required For |
25
+ |-----------|---------|-------------|
26
+ | Kubernetes | ≥ 1.26 | All |
27
+ | Cert-Manager | ≥ 1.12 | Controller webhook |
28
+ | Istio | ≥ 1.20 | Standard mode (with ingress gateway) |
29
+ | Knative Serving | ≥ 1.16 | Serverless mode (optional, auto-scaling to zero) |
30
+ | Gateway API | ≥ 1.2 | Gateway API ingress (alternative to Istio) |
31
+
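The minimum versions in the table above can be checked mechanically before installing. The following is a minimal sketch, assuming GNU `sort -V` is available; the helper names `version_ge` and `check_min` and the sample version strings are illustrative and not part of KServe or Helm.

```bash
#!/usr/bin/env bash
# version_ge HAVE WANT: succeeds when version HAVE >= version WANT.
# Relies on `sort -V` (version sort) from GNU coreutils; a leading "v" is ignored.
version_ge() {
  local have="${1#v}" want="${2#v}"
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}

# check_min NAME HAVE WANT: prints a one-line verdict for a dependency.
check_min() {
  if version_ge "$2" "$3"; then
    echo "ok: $1 $2 >= $3"
  else
    echo "too old: $1 $2 < $3"
  fi
}

# Sample versions only; in practice, read them from the cluster.
check_min kubernetes 1.29.3 1.26
check_min cert-manager v1.11.0 1.12
```

In a real pre-flight script the second argument would come from the cluster, for example from `kubectl version` output, rather than being hard-coded.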
32
+ ## Quick Install
33
+
34
+ ### Standard Mode (Knative + Istio)
35
+
36
+ ```bash
37
+ # 1. Install Knative (if needed)
38
+ kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.16.0/serving-crds.yaml
39
+ kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.16.0/serving-core.yaml
40
+ kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.16.0/net-istio.yaml
41
+
42
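+ # 2. Install KServe (the OCI installs below do not need `helm repo add`; the next two lines only matter for the legacy HTTPS repo)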
43
+ helm repo add kserve https://kserve.github.io/helm-charts
44
+ helm repo update
45
+
46
+ # CRDs
47
+ helm install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd \
48
+ --version v0.18.0 --namespace kserve --create-namespace
49
+
50
+ # Controllers
51
+ helm install kserve-resources oci://ghcr.io/kserve/charts/kserve-resources \
52
+ --version v0.18.0 --namespace kserve
53
+
54
+ # Runtime configs
55
+ helm install kserve-runtime-configs oci://ghcr.io/kserve/charts/kserve-runtime-configs \
56
+ --version v0.18.0 --namespace kserve
57
+ ```
58
+
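The three installs above must run in a fixed order (CRDs, then controllers, then runtime configs). One way to keep that order in a script is sketched below. `KSERVE_VERSION`, `DRY_RUN`, `install_chart`, and `install_kserve` are hypothetical names chosen for this example. The sketch uses `helm upgrade --install` with `--wait`, which differs from the plain `helm install` shown above, so that each step can be re-run and finishes before the next starts.

```bash
#!/usr/bin/env bash
# Sketch: print (or run) the KServe installs in dependency order.
# KSERVE_VERSION and DRY_RUN are illustrative variables, not chart options.
KSERVE_VERSION="${KSERVE_VERSION:-v0.18.0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

install_chart() {
  local release="$1" chart="$2"
  # Build the command as positional parameters so it can be printed or executed.
  set -- helm upgrade --install "$release" "oci://ghcr.io/kserve/charts/$chart" \
    --version "$KSERVE_VERSION" --namespace kserve --create-namespace --wait
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

install_kserve() {
  local chart
  # CRDs first, then the controller, then the runtime configs.
  for chart in kserve-crd kserve-resources kserve-runtime-configs; do
    install_chart "$chart" "$chart" || return 1
  done
}

install_kserve
```

With the default `DRY_RUN=1` the script only prints the three commands; setting `DRY_RUN=0` would execute them against the current kube context.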
59
+ ### Raw Deployment Mode (No Mesh)
60
+
61
+ ```bash
62
+ # Minimal CRDs (InferenceService only)
63
+ helm install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd-minimal \
64
+ --version v0.18.0 --namespace kserve --create-namespace
65
+
66
+ # Minimal controller
67
+ helm install kserve-resources oci://ghcr.io/kserve/charts/kserve-resources-minimal \
68
+ --version v0.18.0 --namespace kserve
69
+
70
+ # Minimal runtime configs
71
+ helm install kserve-runtime-configs oci://ghcr.io/kserve/charts/kserve-runtime-configs-minimal \
72
+ --version v0.18.0 --namespace kserve
73
+ ```
74
+
75
+ ### LLM Inference Service (Optional)
76
+
77
+ ```bash
78
+ helm install kserve-llmisvc-crd oci://ghcr.io/kserve/charts/kserve-llmisvc-crd \
79
+ --version v0.18.0 --namespace kserve
80
+
81
+ helm install kserve-llmisvc-resources oci://ghcr.io/kserve/charts/kserve-llmisvc-resources \
82
+ --version v0.18.0 --namespace kserve
83
+ ```
84
+
85
+ ### LocalModel (Optional)
86
+
87
+ ```bash
88
+ helm install kserve-localmodel-crd oci://ghcr.io/kserve/charts/kserve-localmodel-crd \
89
+ --version v0.18.0 --namespace kserve
90
+
91
+ helm install kserve-localmodel-resources oci://ghcr.io/kserve/charts/kserve-localmodel-resources \
92
+ --version v0.18.0 --namespace kserve
93
+ ```
94
+
95
+ ## Key Values (kserve-resources)
96
+
97
+ | Parameter | Default | Description |
98
+ |-----------|---------|-------------|
99
+ | `kserve.controller.deploymentMode` | `Serverless` | `Serverless`, `RawDeployment`, `ModelMesh` |
100
+ | `kserve.controller.gateway.ingressGateway.enabled` | `true` | Enable ingress gateway config |
101
+ | `kserve.controller.gateway.ingressGateway.gatewayService` | `istio-ingressgateway.istio-system` | Ingress gateway service |
102
+ | `kserve.controller.gateway.ingressGateway.localGatewayService` | `knative-local-gateway.knative-serving` | Local gateway for mesh |
103
+ | `kserve.controller.gateway.clusterLocalGateway.enabled` | `true` | Enable cluster-local gateway |
104
+ | `kserve.controller.gateway.clusterLocalGateway.service` | `knative-local-gateway.knative-serving` | Local gateway service |
105
+
106
+ ### Controller
107
+
108
+ | Parameter | Default | Description |
109
+ |-----------|---------|-------------|
110
+ | `kserve.controller.image` | `kserve/kserve-controller` | Controller image |
111
+ | `kserve.controller.tag` | `v0.18.0` | Image tag |
112
+ | `kserve.controller.replicas` | `1` | Controller replicas |
113
+ | `kserve.controller.resources` | `{cpu: 100m/500m, mem: 256Mi/1Gi}` | Container resources (requests/limits) |
114
+ | `kserve.controller.nodeSelector` | `{}` | Node selector |
115
+ | `kserve.controller.tolerations` | `[]` | Tolerations |
116
+ | `kserve.controller.affinity` | `{}` | Pod affinity |
117
+ | `kserve.controller.topologySpreadConstraints` | `[]` | Topology spread |
118
+ | `kserve.controller.pdb.enabled` | `false` | PDB |
119
+ | `kserve.controller.pdb.maxUnavailable` | `1` | PDB max unavailable |
120
+ | `kserve.controller.env` | `[]` | Extra env vars |
121
+ | `kserve.controller.args` | `[]` | Extra args |
122
+
123
+ ### Webhook
124
+
125
+ | Parameter | Default | Description |
126
+ |-----------|---------|-------------|
127
+ | `kserve.webhook.image` | `kserve/kserve-controller` | Webhook image |
128
+ | `kserve.webhook.tag` | `v0.18.0` | Image tag |
129
+ | `kserve.webhook.replicas` | `1` | Webhook replicas |
130
+ | `kserve.webhook.resources` | `{cpu: 100m/500m, mem: 256Mi/1Gi}` | Container resources (requests/limits) |
131
+ | `kserve.webhook.nodeSelector` | `{}` | Node selector |
132
+ | `kserve.webhook.tolerations` | `[]` | Tolerations |
133
+ | `kserve.webhook.affinity` | `{}` | Pod affinity |
134
+
135
+ ### Storage Initializer
136
+
137
+ | Parameter | Default | Description |
138
+ |-----------|---------|-------------|
139
+ | `kserve.storageInitializer.image` | `kserve/storage-initializer` | Init container image |
140
+ | `kserve.storageInitializer.tag` | `v0.18.0` | Image tag |
141
+ | `kserve.storageInitializer.resources` | `{cpu: 100m/1, mem: 200Mi/1Gi}` | Init container resources (requests/limits) |
142
+ | `kserve.storageInitializer.env` | `[]` | Extra env vars |
143
+
144
+ ### Agent
145
+
146
+ | Parameter | Default | Description |
147
+ |-----------|---------|-------------|
148
+ | `kserve.agent.image` | `kserve/agent` | Agent image |
149
+ | `kserve.agent.tag` | `v0.18.0` | Image tag |
150
+
151
+ ### Router
152
+
153
+ | Parameter | Default | Description |
154
+ |-----------|---------|-------------|
155
+ | `kserve.router.image` | `kserve/router` | Router image |
156
+ | `kserve.router.tag` | `v0.18.0` | Image tag |
157
+
158
+ ### Global
159
+
160
+ | Parameter | Default | Description |
161
+ |-----------|---------|-------------|
162
+ | `global.imagePullSecrets` | `[]` | Image pull secrets |
163
+ | `global.customMetrics.enabled` | `false` | Enable custom metrics adapter |
164
+
165
+ ## Deployment Modes
166
+
167
+ ### Standard Mode (Serverless - Knative)
168
+
169
+ Default mode. Auto-scaling to zero via Knative. Requires Knative Serving + Istio.
170
+
171
+ ```bash
172
+ helm install kserve-resources oci://ghcr.io/kserve/charts/kserve-resources \
173
+ --version v0.18.0 --namespace kserve \
174
+ --set kserve.controller.deploymentMode=Serverless
175
+ ```
176
+
177
+ **Characteristics:**
178
+ - Auto-scaling to zero (cold start)
179
+ - Traffic splitting (canary)
180
+ - Revision-based model versioning
181
+ - Requires Knative + Istio
182
+
183
+ ### Raw Deployment Mode
184
+
185
+ Direct Deployments without Knative. No auto-scaling to zero. Simpler stack.
186
+
187
+ ```bash
188
+ helm install kserve-resources oci://ghcr.io/kserve/charts/kserve-resources \
189
+ --version v0.18.0 --namespace kserve \
190
+ --set kserve.controller.deploymentMode=RawDeployment
191
+ ```
192
+
193
+ **Characteristics:**
194
+ - No Knative dependency
195
+ - Standard K8s Deployments + Services
196
+ - HPA for autoscaling (minReplicas ≥ 1)
197
+ - Lighter resource usage
198
+
199
+ ### ModelMesh Mode
200
+
201
+ Multi-model serving on shared runtimes. Requires ServingRuntime CRDs.
202
+
203
+ ```bash
204
+ helm install kserve-resources oci://ghcr.io/kserve/charts/kserve-resources \
205
+ --version v0.18.0 --namespace kserve \
206
+ --set kserve.controller.deploymentMode=ModelMesh
207
+ ```
208
+
209
+ **Characteristics:**
210
+ - Multiple models per pod (multiModel)
211
+ - ServingRuntime templates
212
+ - Model pooling
213
+ - Efficient GPU utilization
214
+
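The dependency differences between the three modes can be summarized as a small lookup. This is a sketch based only on the requirements stated in this document; the function name `mode_prereqs` and the component labels it prints are made up for the example.

```bash
#!/usr/bin/env bash
# mode_prereqs MODE: prints the components this document lists as required
# for the given deployment mode. The labels are informal, not resource names.
mode_prereqs() {
  case "$1" in
    Serverless)    echo "cert-manager knative-serving istio" ;;
    RawDeployment) echo "cert-manager" ;;
    ModelMesh)     echo "cert-manager servingruntime-crds" ;;
    *) echo "unknown deployment mode: $1" >&2; return 1 ;;
  esac
}

for mode in Serverless RawDeployment ModelMesh; do
  printf '%-14s %s\n' "$mode" "$(mode_prereqs "$mode")"
done
```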
215
+ ## Production Values Example
216
+
217
+ ### Standard Mode Production
218
+
219
+ ```yaml
+ kserve:
+   controller:
+     deploymentMode: Serverless
+     replicas: 3
+     resources:
+       requests:
+         cpu: 500m
+         memory: 1Gi
+       limits:
+         cpu: 2
+         memory: 4Gi
+     env:
+       - name: ENABLE_GPU
+         value: "true"
+     affinity:
+       podAntiAffinity:
+         requiredDuringSchedulingIgnoredDuringExecution:
+           - labelSelector:
+               matchExpressions:
+                 - key: app
+                   operator: In
+                   values:
+                     - kserve-controller
+             topologyKey: kubernetes.io/hostname
+     pdb:
+       enabled: true
+       maxUnavailable: 1
+
+   webhook:
+     replicas: 3
+     resources:
+       requests:
+         cpu: 200m
+         memory: 512Mi
+       limits:
+         cpu: 500m
+         memory: 1Gi
+
+   storageInitializer:
+     env:
+       # The endpoint is given as a full URL, including the scheme.
+       - name: AWS_ENDPOINT_URL
+         value: https://s3.us-east-1.amazonaws.com
+       - name: AWS_REGION
+         value: us-east-1
+ ```
265
+
266
+ ### Raw Deployment Mode
267
+
268
+ ```yaml
+ kserve:
+   controller:
+     deploymentMode: RawDeployment
+     gateway:
+       ingressGateway:
+         enabled: false
+       clusterLocalGateway:
+         enabled: false
+ ```
278
+
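Values such as the Raw Deployment overrides above are normally kept in a file and passed to Helm with `-f`. The sketch below writes the overrides to a temporary file and prints the matching command instead of running it. The file name `values-raw.yaml` and the helper `write_raw_values` are arbitrary choices for this example, and the nesting follows the dotted parameter names in the Key Values tables.

```bash
#!/usr/bin/env bash
# Sketch: write the Raw Deployment overrides to a file and pass it with -f.
write_raw_values() {
  cat > "$1" <<'EOF'
kserve:
  controller:
    deploymentMode: RawDeployment
    gateway:
      ingressGateway:
        enabled: false
      clusterLocalGateway:
        enabled: false
EOF
}

# values-raw.yaml is an arbitrary file name for this example.
values_file="${TMPDIR:-/tmp}/values-raw.yaml"
write_raw_values "$values_file"

# Printed rather than executed, so the sketch is safe to run without a cluster.
echo helm upgrade --install kserve-resources \
  oci://ghcr.io/kserve/charts/kserve-resources \
  --version v0.18.0 --namespace kserve -f "$values_file"
```

Keeping the overrides in a file makes them reviewable in version control, which is harder with a long chain of `--set` flags.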
279
+ ## Upgrading
280
+
281
+ ```bash
282
+ helm repo update
283
+
284
+ # Upgrade CRDs first
285
+ helm upgrade kserve-crd oci://ghcr.io/kserve/charts/kserve-crd \
286
+ --version v0.18.0 --namespace kserve
287
+
288
+ # Upgrade controller
289
+ helm upgrade kserve-resources oci://ghcr.io/kserve/charts/kserve-resources \
290
+ --version v0.18.0 --namespace kserve
291
+
292
+ # Upgrade runtime configs
293
+ helm upgrade kserve-runtime-configs oci://ghcr.io/kserve/charts/kserve-runtime-configs \
294
+ --version v0.18.0 --namespace kserve
295
+ ```
296
+
297
+ ## Uninstalling
298
+
299
+ ```bash
300
+ # Uninstall resources in the reverse of the install order
301
+ helm uninstall kserve-runtime-configs --namespace kserve
302
+ helm uninstall kserve-localmodel-resources --namespace kserve
303
+ helm uninstall kserve-localmodel-crd --namespace kserve
304
+ helm uninstall kserve-llmisvc-resources --namespace kserve
305
+ helm uninstall kserve-llmisvc-crd --namespace kserve
306
+ helm uninstall kserve-resources --namespace kserve
307
+
308
+ # Uninstall CRDs last
309
+ helm uninstall kserve-crd --namespace kserve
310
+
311
+ # Delete namespace
312
+ kubectl delete namespace kserve
313
+ ```
314
+
315
+ **⚠️ WARNING:** Uninstalling `kserve-crd` deletes ALL CRDs and cascades to delete ALL InferenceServices across ALL namespaces.
316
+
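Because removing the CRDs also removes every InferenceService, it can help to confirm that none remain before running the last uninstall step. The following is a minimal sketch of such a guard; `count_rows` and `safe_to_remove_crds` are hypothetical helper names, and the guard reads the listing from stdin so it can be tried without a cluster.

```bash
#!/usr/bin/env bash
# count_rows: prints the number of non-empty lines on stdin.
# `grep -c` exits non-zero when the count is 0, hence the `|| true`.
count_rows() { grep -c . || true; }

# safe_to_remove_crds: reads `kubectl get ... --no-headers` output on stdin
# and succeeds only when no InferenceService is listed.
safe_to_remove_crds() {
  local n
  n="$(count_rows)"
  if [ "$n" -gt 0 ]; then
    echo "refusing: $n InferenceService(s) still exist" >&2
    return 1
  fi
  echo "no InferenceServices found"
}

# Intended use against a live cluster (not executed here):
#   kubectl get inferenceservices.serving.kserve.io --all-namespaces --no-headers \
#     | safe_to_remove_crds && helm uninstall kserve-crd --namespace kserve
```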
317
+ ## Common Mistakes
318
+
319
+ - **CRDs not installed first** — The controller chart fails without CRDs. Always install `kserve-crd` first.
320
+ - **Missing cert-manager** — KServe webhook requires cert-manager ≥ 1.12. Without it, controller pods crash-loop with webhook TLS errors.
321
+ - **Deployment mode mismatch** — `Serverless` mode requires Knative to be installed; `RawDeployment` does not. Set the mode correctly before installing.
322
+ - **Knative + Istio version mismatch** — KServe v0.18.0 requires Knative ≥ 1.16 and Istio ≥ 1.20. Version mismatch causes networking errors.
323
+ - **Stale webhook certs** — After an upgrade, webhook certificates may need to be re-issued. Restarting the webhook pod usually resolves this.
324
+ - **Gateway service missing** — If `istio-ingressgateway.istio-system` doesn't exist in Standard mode, InferenceServices get `Failed to get gateway` status.
325
+ - **LocalModel with non-NVMe storage** — LocalModel cache works best with NVMe. Regular disks may slow model loading.
326
+ - **ModelMesh with single model** — ModelMesh expects `multiModel: true` on ServingRuntime. For single-model, use Standard mode.
327
+ - **v0.19.0-rc0 caution** — This is a release candidate. Test it thoroughly before using it in production.
328
+ - **OCI chart vs Helm repo** — Use `oci://ghcr.io/kserve/charts/` for v0.18.0+ charts. Older versions used the HTTPS helm repo.
329
+ - **Annotations not propagated** — Some predictor annotations (e.g., `serving.kserve.io/deploymentMode`) must be on the InferenceService, not the pod template.
330
+ - **Storage initializer timeout** — Large models (>10GB) may exceed the default init container timeout. Increase via `kserve.storageInitializer.env`.
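
The annotation placement described in the list above can be illustrated with a generated manifest. This is a sketch, not a tested deployment: the helper `render_isvc`, the name `sklearn-iris`, the model format, and the `storageUri` are placeholder values chosen for the example.

```bash
#!/usr/bin/env bash
# Sketch: emit an InferenceService whose deployment-mode annotation sits on
# the InferenceService metadata, not on the pod template.
render_isvc() {
  local name="$1" mode="$2"
  cat <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: ${name}
  annotations:
    serving.kserve.io/deploymentMode: ${mode}
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
EOF
}

render_isvc sklearn-iris RawDeployment
# Against a cluster: render_isvc sklearn-iris RawDeployment | kubectl apply -f -
```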