@intentius/chant-lexicon-k8s 0.1.0 → 0.1.4

@@ -0,0 +1,252 @@
1
+ ---
2
+ skill: chant-k8s-ray
3
+ description: KubeRay composites for distributed Ray clusters on Kubernetes — RayCluster, RayJob, RayService
4
+ user-invocable: true
5
+ ---
6
+
7
+ # KubeRay Composites
8
+
9
+ Three composites cover the full KubeRay surface: persistent clusters, ephemeral batch jobs, and Ray Serve HTTP endpoints.
10
+
11
+ ## Prerequisites
12
+
13
+ KubeRay operator must be installed before applying any Ray CRs:
14
+
15
+ ```bash
16
+ kubectl apply -f https://github.com/ray-project/kuberay/releases/download/v1.3.0/kuberay-operator.yaml
17
+ kubectl -n kuberay-operator wait deploy/kuberay-operator --for=condition=Available --timeout=120s
18
+ ```
19
+
20
+ ## When to use which composite
21
+
22
+ | Composite | Use case |
23
+ |---|---|
24
+ | `RayCluster` | Interactive dev, long-lived infra, jobs submitted via CLI / Ray client |
25
+ | `RayJob` | Training pipelines, batch jobs — spins up → runs → tears down |
26
+ | `RayService` | Ray Serve HTTP endpoints with zero-downtime blue-green upgrades |
27
+
28
+ ---
29
+
30
+ ## RayCluster — persistent cluster
31
+
32
+ ```typescript
33
+ import { RayCluster } from "@intentius/chant-lexicon-k8s";
34
+
35
+ export const {
36
+ serviceAccount,
37
+ clusterRole, // only when enableAutoscaler: true
38
+ clusterRoleBinding, // only when enableAutoscaler: true
39
+ networkPolicy,
40
+ pdb,
41
+ pvc, // only when sharedStorage is set
42
+ dashboardService, // only when exposeDashboard: true
43
+ rayCluster,
44
+ } = RayCluster({
45
+ name: "ray",
46
+ namespace: "ray-system",
47
+ cluster: {
48
+ image: "us-central1-docker.pkg.dev/my-project/ray-images/ray:2.40.0",
49
+ head: {
50
+ resources: { cpu: "2", memory: "8Gi" },
51
+ shmSize: "4Gi", // /dev/shm for PyTorch multi-process tensor sharing
52
+ },
53
+ workerGroups: [
54
+ {
55
+ groupName: "cpu",
56
+ replicas: 2,
57
+ minReplicas: 1,
58
+ maxReplicas: 8,
59
+ resources: { cpu: "2", memory: "4Gi" },
60
+ idleTimeoutSeconds: 60,
61
+ },
62
+ {
63
+ groupName: "gpu",
64
+ replicas: 0,
65
+ minReplicas: 0,
66
+ maxReplicas: 4,
67
+ resources: { cpu: "4", memory: "16Gi", gpu: 1 },
68
+ gpuTolerations: true,
69
+ idleTimeoutSeconds: 300, // higher — amortize GPU init overhead
70
+ },
71
+ ],
72
+ },
73
+ sharedStorage: {
74
+ storageClass: "ray-filestore",
75
+ size: "1Ti",
76
+ mountPath: "/mnt/ray-data", // mounted on all pods (head + all workers)
77
+ },
78
+ spilloverBucket: "ray-spill", // GCS bucket for object store overflow
79
+ enableAutoscaler: true,
80
+ exposeDashboard: false, // use kubectl port-forward 8265 in dev
81
+ });
82
+ ```
83
+
84
+ **Key props:**
85
+
86
+ | Prop | Type | Description |
87
+ |---|---|---|
88
+ | `name` | `string` | Resource name prefix |
89
+ | `namespace` | `string` | Kubernetes namespace |
90
+ | `cluster.image` | `string` | Ray Docker image (pre-built recommended) |
91
+ | `cluster.head.resources` | `ResourceSpec` | CPU/memory for the head pod |
92
+ | `cluster.head.shmSize` | `string?` | Size of /dev/shm emptyDir (default: `"2Gi"`) |
93
+ | `cluster.workerGroups` | `WorkerGroupSpec[]` | One entry per worker group |
94
+ | `sharedStorage` | `object?` | PVC + volume mounts on all pods |
95
+ | `spilloverBucket` | `string?` | GCS bucket for Ray object store spillover |
96
+ | `enableAutoscaler` | `boolean?` | Emit the ClusterRole and ClusterRoleBinding that the in-tree autoscaler needs |
97
+ | `exposeDashboard` | `boolean?` | Emit LoadBalancer Service for port 8265 |
98
+ | `labels` | `Record<string, string>?` | Extra labels on all resources |
99
+ | `defaults` | `object?` | Deep-merge overrides onto any generated resource |
100
+
101
+ ---
102
+
103
+ ## RayJob — ephemeral cluster per batch job
104
+
105
+ ```typescript
106
+ import { RayJob } from "@intentius/chant-lexicon-k8s";
107
+
108
+ export const { serviceAccount, networkPolicy, pvc, rayJob } = RayJob({
109
+ name: "train-job",
110
+ namespace: "ray-system",
111
+ entrypoint: "python train.py --epochs 10",
112
+ cluster: {
113
+ image: "us-central1-docker.pkg.dev/my-project/ray-images/ray:2.40.0",
114
+ head: { resources: { cpu: "2", memory: "8Gi" } },
115
+ workerGroups: [
116
+ { groupName: "cpu", replicas: 4, resources: { cpu: "4", memory: "16Gi" } },
117
+ ],
118
+ },
119
+ shutdownAfterJobFinishes: true, // default: true — cluster tears down after job
120
+ ttlSecondsAfterFinished: 300, // default: 300 — delay before RayJob CR is deleted
121
+ runtimeEnvYAML: "pip:\n - torch==2.3.0",
122
+ spilloverBucket: "ray-spill",
123
+ });
124
+ ```
125
+
126
+ **Key props:**
127
+
128
+ | Prop | Type | Description |
129
+ |---|---|---|
130
+ | `entrypoint` | `string` | Shell command to run as the Ray job |
131
+ | `runtimeEnvYAML` | `string?` | Ray runtime env YAML (pip packages, env vars, working_dir) |
132
+ | `shutdownAfterJobFinishes` | `boolean?` | Tear down cluster after job completes (default: `true`) |
133
+ | `ttlSecondsAfterFinished` | `number?` | Seconds before deleting the RayJob CR (default: `300`) |
134
+
135
+ All other props (`cluster`, `sharedStorage`, `spilloverBucket`, `enableAutoscaler`) work the same as `RayCluster`.
136
+
137
+ ---
138
+
139
+ ## RayService — persistent Ray Serve endpoint
140
+
141
+ ```typescript
142
+ import { RayService } from "@intentius/chant-lexicon-k8s";
143
+
144
+ export const {
145
+ serviceAccount, networkPolicy, pdb, pvc,
146
+ serveService, // LoadBalancer Service on port 8000
147
+ rayService,
148
+ } = RayService({
149
+ name: "inference",
150
+ namespace: "ray-system",
151
+ serveConfigV2: `
152
+ applications:
153
+ - name: classifier
154
+ import_path: app:deployment
155
+ route_prefix: /
156
+ deployments:
157
+ - name: Classifier
158
+ num_replicas: 2
159
+ ray_actor_options:
160
+ num_cpus: 1
161
+ `,
162
+ cluster: {
163
+ image: "us-central1-docker.pkg.dev/my-project/ray-images/ray:2.40.0",
164
+ head: { resources: { cpu: "2", memory: "8Gi" } },
165
+ workerGroups: [
166
+ { groupName: "serve", replicas: 2, minReplicas: 1, maxReplicas: 8,
167
+ resources: { cpu: "4", memory: "16Gi" } },
168
+ ],
169
+ },
170
+ enableAutoscaler: true,
171
+ });
172
+ // Access: kubectl -n ray-system port-forward svc/inference-serve-svc 8000:8000
173
+ ```
174
+
175
+ `serveService` is always emitted — a LoadBalancer Service on port 8000. To expose it via Ingress, add an annotation via `defaults.serveService`.
176
+
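The `defaults` override can be pictured as a recursive merge of a partial object onto the generated manifest. The sketch below is not the package's implementation: `deepMerge`, the shape of the generated Service, and the `example.com/ingress-class` annotation are all hypothetical, and it assumes arrays are replaced rather than merged.

```typescript
type Obj = Record<string, unknown>;

function isPlainObject(value: unknown): value is Obj {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical deep merge: nested objects merge key by key;
// everything else (scalars, arrays) is replaced by the override.
function deepMerge(base: Obj, override: Obj): Obj {
  const out: Obj = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const current = out[key];
    out[key] =
      isPlainObject(current) && isPlainObject(value)
        ? deepMerge(current, value)
        : value;
  }
  return out;
}

// Stand-in for the Service the composite might generate (shape assumed).
const generatedServeService: Obj = {
  metadata: { name: "inference-serve-svc", namespace: "ray-system" },
  spec: { type: "LoadBalancer", ports: [{ port: 8000 }] },
};

// Stand-in for what `defaults.serveService` could carry.
const merged = deepMerge(generatedServeService, {
  metadata: { annotations: { "example.com/ingress-class": "internal" } },
});

console.log(JSON.stringify(merged, null, 2));
```

Under these assumptions the override adds `metadata.annotations` while leaving `metadata.name` and `spec` untouched.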
177
+ ---
178
+
179
+ ## Shared types
180
+
181
+ ```typescript
182
+ interface ResourceSpec {
183
+ cpu: string; // "2", "500m"
184
+ memory: string; // "4Gi", "512Mi"
185
+ gpu?: number; // adds nvidia.com/gpu resource limit
186
+ }
187
+
188
+ interface HeadGroupSpec {
189
+ resources: ResourceSpec;
190
+ shmSize?: string; // /dev/shm size, default "2Gi"
191
+ rayStartParams?: Record<string, string>; // extra ray start flags
192
+ env?: Array<{ name: string; value: string }>;
193
+ }
194
+
195
+ interface WorkerGroupSpec {
196
+ groupName: string;
197
+ replicas: number;
198
+ minReplicas?: number;
199
+ maxReplicas?: number;
200
+ resources: ResourceSpec;
201
+ idleTimeoutSeconds?: number; // default 60; use 300+ for GPU
202
+ gpuTolerations?: boolean; // tolerate nvidia.com/gpu taint
203
+ rayStartParams?: Record<string, string>;
204
+ env?: Array<{ name: string; value: string }>;
205
+ }
206
+ ```
207
+
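A quick sanity check on a worker group before passing it to a composite can catch inconsistent replica bounds. This is a sketch only: `validateWorkerGroup` is a hypothetical helper, not part of the package, and it assumes that missing `minReplicas`/`maxReplicas` default to `replicas`. The interfaces are trimmed copies of the ones above so the block stands alone.

```typescript
interface ResourceSpec {
  cpu: string;
  memory: string;
  gpu?: number;
}

interface WorkerGroupSpec {
  groupName: string;
  replicas: number;
  minReplicas?: number;
  maxReplicas?: number;
  resources: ResourceSpec;
  gpuTolerations?: boolean;
}

// Hypothetical validator: returns a list of human-readable problems.
function validateWorkerGroup(group: WorkerGroupSpec): string[] {
  const problems: string[] = [];
  // Assumption: unset bounds fall back to the fixed replica count.
  const min = group.minReplicas ?? group.replicas;
  const max = group.maxReplicas ?? group.replicas;

  if (min > max) {
    problems.push(`${group.groupName}: minReplicas ${min} exceeds maxReplicas ${max}`);
  }
  if (group.replicas < min || group.replicas > max) {
    problems.push(`${group.groupName}: replicas ${group.replicas} is outside [${min}, ${max}]`);
  }
  if ((group.resources.gpu ?? 0) > 0 && !group.gpuTolerations) {
    problems.push(`${group.groupName}: requests GPUs but gpuTolerations is not set`);
  }
  return problems;
}

const gpuGroup: WorkerGroupSpec = {
  groupName: "gpu",
  replicas: 0,
  minReplicas: 0,
  maxReplicas: 4,
  resources: { cpu: "4", memory: "16Gi", gpu: 1 },
  gpuTolerations: true,
};

const badGroup: WorkerGroupSpec = {
  groupName: "cpu",
  replicas: 10,
  minReplicas: 1,
  maxReplicas: 8,
  resources: { cpu: "2", memory: "4Gi" },
};

console.log(validateWorkerGroup(gpuGroup));
console.log(validateWorkerGroup(badGroup));
```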
208
+ ---
209
+
210
+ ## Production defaults (encoded in composites)
211
+
212
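+ All three composites apply these defaults without manual configuration. Two rows are conditional: the spillover env var is injected only when `spilloverBucket` is set, and the GPU toleration is added only for worker groups with `gpuTolerations: true`: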
213
+
214
+ | Default | Why |
215
+ |---|---|
216
+ | `preStop: ["ray", "stop"]` + `terminationGracePeriodSeconds: 120` | Graceful drain on pod eviction; in-flight tasks complete rather than fail |
217
+ | `idleTimeoutSeconds: 60` (default) | Prevents stuck idle workers consuming resources |
218
+ | `--num-cpus` derived from `resources.cpu` | Prevents autoscaler over-commit; without this Ray reads host CPU count, not container limit |
219
+ | `RAY_object_spilling_config` env var (injected when `spilloverBucket` is set) | Routes large object spills to the Google Cloud Storage bucket; without this, large models or shuffled datasets OOM the head |
220
+ | `shmSize` dshm emptyDir | PyTorch tensor sharing via /dev/shm; default 2Gi, set 4Gi+ for multi-process training |
221
+ | `gpuTolerations: true` | Adds a toleration for the `nvidia.com/gpu=present:NoSchedule` taint; required for GPU node pools with standard taints |
222
+
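The `--num-cpus` row can be illustrated with a small conversion from a Kubernetes CPU quantity to a Ray start parameter. This is a hedged sketch: `numCpusFromQuantity` and `withNumCpus` are hypothetical names, and the rounding rule (round down, never below 1) is an assumption rather than the composite's documented behavior.

```typescript
// Hypothetical helper: convert a Kubernetes CPU quantity ("2", "500m")
// into an integer suitable for `ray start --num-cpus`.
function numCpusFromQuantity(cpu: string): number {
  const trimmed = cpu.trim();
  const cores = trimmed.endsWith("m")
    ? Number(trimmed.slice(0, -1)) / 1000 // millicores to cores
    : Number(trimmed);
  if (!Number.isFinite(cores) || cores <= 0) {
    throw new Error(`invalid CPU quantity: ${cpu}`);
  }
  // Assumption: round down to whole CPUs, but advertise at least one.
  return Math.max(1, Math.floor(cores));
}

// Explicit rayStartParams win over the derived value (assumed precedence).
function withNumCpus(
  cpu: string,
  rayStartParams: Record<string, string> = {},
): Record<string, string> {
  return { "num-cpus": String(numCpusFromQuantity(cpu)), ...rayStartParams };
}

console.log(withNumCpus("2"));
console.log(withNumCpus("500m"));
console.log(withNumCpus("4", { "num-cpus": "1" }));
```

Passing the value explicitly matters because, as the table notes, Ray otherwise reads the host CPU count instead of the container limit.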
223
+ ---
224
+
225
+ ## NetworkPolicy strategy
226
+
227
+ The composites emit a `NetworkPolicy` using `podSelector` only for intra-cluster rules — no IP CIDR blocks for Ray traffic. This avoids the GKE secondary IP range mismatch problem: GKE allocates pod CIDRs from secondary ranges that differ from declared subnet CIDRs, so CIDR-based NetworkPolicy rules silently fail when pods move nodes.
228
+
229
+ Egress to Google Cloud Storage over HTTPS uses an ipBlock rule with the RFC1918 ranges excluded. This allows Google APIs (storage.googleapis.com) while blocking internal lateral movement.
230
+
231
+ Ports covered: 6379 (Ray GCS, the Global Control Service on the head, not Google Cloud Storage), 8265 (dashboard), 10001–10002 (Ray client), 8080 (metrics), 32768–60999 (ephemeral gRPC).
232
+
233
+ DNS egress (port 53 UDP/TCP) is always allowed — required for head service resolution.
234
+
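The strategy above can be sketched as a plain NetworkPolicy manifest object. This is an approximation built from the description, not the manifest the composites actually emit: the policy name, the use of TCP for every Ray port, and the exact rule layout are assumptions.

```typescript
interface Peer {
  podSelector?: { matchLabels: Record<string, string> };
  ipBlock?: { cidr: string; except?: string[] };
}
interface Port {
  protocol: "TCP" | "UDP";
  port: number;
  endPort?: number; // inclusive upper bound of a port range
}
interface IngressRule { from?: Peer[]; ports?: Port[] }
interface EgressRule { to?: Peer[]; ports?: Port[] }

// Private address ranges excluded from the HTTPS egress rule.
const RFC1918 = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"];

// Ports listed in this section; TCP for all of them is an assumption.
const RAY_PORTS: Port[] = [
  { protocol: "TCP", port: 6379 },
  { protocol: "TCP", port: 8265 },
  { protocol: "TCP", port: 10001, endPort: 10002 },
  { protocol: "TCP", port: 8080 },
  { protocol: "TCP", port: 32768, endPort: 60999 },
];

function rayNetworkPolicy(clusterName: string, namespace: string) {
  const labels = { "ray.io/cluster-name": clusterName };
  const clusterPods: Peer = { podSelector: { matchLabels: labels } };

  const ingress: IngressRule[] = [{ from: [clusterPods], ports: RAY_PORTS }];
  const egress: EgressRule[] = [
    // Intra-cluster Ray traffic: selected by pod label, no CIDRs.
    { to: [clusterPods], ports: RAY_PORTS },
    // DNS to any destination.
    { ports: [{ protocol: "UDP", port: 53 }, { protocol: "TCP", port: 53 }] },
    // HTTPS to public addresses only.
    {
      to: [{ ipBlock: { cidr: "0.0.0.0/0", except: RFC1918 } }],
      ports: [{ protocol: "TCP", port: 443 }],
    },
  ];

  return {
    apiVersion: "networking.k8s.io/v1",
    kind: "NetworkPolicy",
    metadata: { name: `${clusterName}-netpol`, namespace }, // name is hypothetical
    spec: {
      podSelector: { matchLabels: labels },
      policyTypes: ["Ingress", "Egress"],
      ingress,
      egress,
    },
  };
}

const policy = rayNetworkPolicy("ray", "ray-system");
console.log(JSON.stringify(policy, null, 2));
```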
235
+ ---
236
+
237
+ ## Troubleshooting
238
+
239
+ **Workers not joining the cluster**
240
+ Check the NetworkPolicy allows port 6379 from worker pods. The composite uses `ray.io/cluster-name: <name>` as the podSelector label — confirm this label is present on the pods (`kubectl get pods -n ray-system --show-labels`).
241
+
242
+ **Autoscaler not scaling up**
243
+ `enableAutoscaler: true` is required to emit the ClusterRole with pod CRUD permissions. Without it, the autoscaler controller cannot create or delete pods and will silently fail.
244
+
245
+ **GPU workers not scheduling**
246
+ Set `gpuTolerations: true` on the GPU worker group. Without a toleration for the `nvidia.com/gpu=present:NoSchedule` taint, pods won't schedule on GPU-tainted nodes. Also confirm that the node pool's taint key and value match (`kubectl describe node <node>` lists the taints).
247
+
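For reference, the toleration and GPU limit described here would look roughly like the pod-spec fragment below. `gpuPodFragment` is a hypothetical helper written for illustration; the field values follow the taint named in this section, but the composite's actual output is not shown in this document.

```typescript
interface Toleration {
  key: string;
  operator: "Equal" | "Exists";
  value?: string;
  effect: "NoSchedule" | "PreferNoSchedule" | "NoExecute";
}

// Hypothetical helper: build the GPU-related parts of a worker pod spec.
function gpuPodFragment(gpuCount: number, gpuTolerations: boolean) {
  const tolerations: Toleration[] = gpuTolerations
    ? [{ key: "nvidia.com/gpu", operator: "Equal", value: "present", effect: "NoSchedule" }]
    : [];
  // GPUs are requested through resource limits on the container.
  const limits: Record<string, string> =
    gpuCount > 0 ? { "nvidia.com/gpu": String(gpuCount) } : {};
  return { tolerations, limits };
}

console.log(JSON.stringify(gpuPodFragment(1, true), null, 2));
```

If the node pool uses a different taint key or value, the toleration above will not match and the pod stays Pending.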
248
+ **Head OOM on large workloads**
249
+ Set `spilloverBucket` to a GCS bucket the head pod can write to. The head pod needs GCS access — use Workload Identity and bind the K8s ServiceAccount to a GCP SA with `roles/storage.objectAdmin` on the bucket. The composite injects `RAY_object_spilling_config` automatically when `spilloverBucket` is set.
250
+
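The injected environment variable might look like the sketch below. The exact JSON the composite emits is not shown in this document; the `smart_open` type with a `uri` parameter follows the format in Ray's object spilling documentation, while `spillingEnv` and the `spill` path prefix are hypothetical.

```typescript
// Hypothetical helper: build a RAY_object_spilling_config env entry
// that points Ray's object spilling at a Google Cloud Storage bucket.
function spillingEnv(
  bucket: string,
  prefix = "spill", // path prefix inside the bucket (assumed)
): { name: string; value: string } {
  const config = {
    type: "smart_open",
    params: { uri: `gs://${bucket}/${prefix}` },
  };
  return { name: "RAY_object_spilling_config", value: JSON.stringify(config) };
}

const env = spillingEnv("ray-spill");
console.log(env.name);
console.log(env.value);
```

To check what a running head pod actually received, inspect its environment with `kubectl exec` and look for `RAY_object_spilling_config`.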
251
+ **Pre-built images vs runtimeEnv pip installs**
252
+ Avoid `runtimeEnvYAML` pip installs in production. Each worker restart re-runs pip install, adding minutes to cold start at scale. Pre-build a Docker image with all dependencies baked in and push it to Artifact Registry.