@techwavedev/agi-agent-kit 1.1.7 → 1.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of @techwavedev/agi-agent-kit might be problematic.
- package/CHANGELOG.md +82 -1
- package/README.md +190 -12
- package/bin/init.js +30 -2
- package/package.json +6 -3
- package/templates/base/AGENTS.md +54 -23
- package/templates/base/README.md +325 -0
- package/templates/base/directives/memory_integration.md +95 -0
- package/templates/base/execution/memory_manager.py +309 -0
- package/templates/base/execution/session_boot.py +218 -0
- package/templates/base/execution/session_init.py +320 -0
- package/templates/base/skill-creator/SKILL_skillcreator.md +23 -36
- package/templates/base/skill-creator/scripts/init_skill.py +18 -135
- package/templates/skills/ec/README.md +31 -0
- package/templates/skills/ec/aws/SKILL.md +1020 -0
- package/templates/skills/ec/aws/defaults.yaml +13 -0
- package/templates/skills/ec/aws/references/common_patterns.md +80 -0
- package/templates/skills/ec/aws/references/mcp_servers.md +98 -0
- package/templates/skills/ec/aws-terraform/SKILL.md +349 -0
- package/templates/skills/ec/aws-terraform/references/best_practices.md +394 -0
- package/templates/skills/ec/aws-terraform/references/checkov_reference.md +337 -0
- package/templates/skills/ec/aws-terraform/scripts/configure_mcp.py +150 -0
- package/templates/skills/ec/confluent-kafka/SKILL.md +655 -0
- package/templates/skills/ec/confluent-kafka/references/ansible_playbooks.md +792 -0
- package/templates/skills/ec/confluent-kafka/references/ec_deployment.md +579 -0
- package/templates/skills/ec/confluent-kafka/references/kraft_migration.md +490 -0
- package/templates/skills/ec/confluent-kafka/references/troubleshooting.md +778 -0
- package/templates/skills/ec/confluent-kafka/references/upgrade_7x_to_8x.md +488 -0
- package/templates/skills/ec/confluent-kafka/scripts/kafka_health_check.py +435 -0
- package/templates/skills/ec/confluent-kafka/scripts/upgrade_preflight.py +568 -0
- package/templates/skills/ec/confluent-kafka/scripts/validate_config.py +455 -0
- package/templates/skills/ec/consul/SKILL.md +427 -0
- package/templates/skills/ec/consul/references/acl_setup.md +168 -0
- package/templates/skills/ec/consul/references/ha_config.md +196 -0
- package/templates/skills/ec/consul/references/troubleshooting.md +267 -0
- package/templates/skills/ec/consul/references/upgrades.md +213 -0
- package/templates/skills/ec/consul/scripts/consul_health_report.py +530 -0
- package/templates/skills/ec/consul/scripts/consul_status.py +264 -0
- package/templates/skills/ec/consul/scripts/generate_values.py +170 -0
- package/templates/skills/ec/documentation/SKILL.md +351 -0
- package/templates/skills/ec/documentation/references/best_practices.md +201 -0
- package/templates/skills/ec/documentation/scripts/analyze_code.py +307 -0
- package/templates/skills/ec/documentation/scripts/detect_changes.py +460 -0
- package/templates/skills/ec/documentation/scripts/generate_changelog.py +312 -0
- package/templates/skills/ec/documentation/scripts/sync_docs.py +272 -0
- package/templates/skills/ec/documentation/scripts/update_skill_docs.py +366 -0
- package/templates/skills/ec/gitlab/SKILL.md +529 -0
- package/templates/skills/ec/gitlab/references/agent_installation.md +416 -0
- package/templates/skills/ec/gitlab/references/api_reference.md +508 -0
- package/templates/skills/ec/gitlab/references/gitops_flux.md +465 -0
- package/templates/skills/ec/gitlab/references/troubleshooting.md +518 -0
- package/templates/skills/ec/gitlab/scripts/generate_agent_values.py +329 -0
- package/templates/skills/ec/gitlab/scripts/gitlab_agent_status.py +414 -0
- package/templates/skills/ec/jira/SKILL.md +484 -0
- package/templates/skills/ec/jira/references/jql_reference.md +148 -0
- package/templates/skills/ec/jira/scripts/add_comment.py +91 -0
- package/templates/skills/ec/jira/scripts/bulk_log_work.py +124 -0
- package/templates/skills/ec/jira/scripts/create_ticket.py +162 -0
- package/templates/skills/ec/jira/scripts/get_ticket.py +191 -0
- package/templates/skills/ec/jira/scripts/jira_client.py +383 -0
- package/templates/skills/ec/jira/scripts/log_work.py +154 -0
- package/templates/skills/ec/jira/scripts/search_tickets.py +104 -0
- package/templates/skills/ec/jira/scripts/update_comment.py +67 -0
- package/templates/skills/ec/jira/scripts/update_ticket.py +161 -0
- package/templates/skills/ec/karpenter/SKILL.md +301 -0
- package/templates/skills/ec/karpenter/references/ec2nodeclasses.md +421 -0
- package/templates/skills/ec/karpenter/references/migration.md +396 -0
- package/templates/skills/ec/karpenter/references/nodepools.md +400 -0
- package/templates/skills/ec/karpenter/references/troubleshooting.md +359 -0
- package/templates/skills/ec/karpenter/scripts/generate_ec2nodeclass.py +187 -0
- package/templates/skills/ec/karpenter/scripts/generate_nodepool.py +245 -0
- package/templates/skills/ec/karpenter/scripts/karpenter_status.py +359 -0
- package/templates/skills/ec/opensearch/SKILL.md +720 -0
- package/templates/skills/ec/opensearch/references/ml_neural_search.md +576 -0
- package/templates/skills/ec/opensearch/references/operator.md +532 -0
- package/templates/skills/ec/opensearch/references/query_dsl.md +532 -0
- package/templates/skills/ec/opensearch/scripts/configure_mcp.py +148 -0
- package/templates/skills/ec/victoriametrics/SKILL.md +598 -0
- package/templates/skills/ec/victoriametrics/references/kubernetes.md +531 -0
- package/templates/skills/ec/victoriametrics/references/prometheus_migration.md +333 -0
- package/templates/skills/ec/victoriametrics/references/troubleshooting.md +442 -0
- package/templates/skills/knowledge/SKILLS_CATALOG.md +274 -4
- package/templates/skills/knowledge/intelligent-routing/SKILL.md +237 -164
- package/templates/skills/knowledge/parallel-agents/SKILL.md +345 -73
- package/templates/skills/knowledge/plugin-discovery/SKILL.md +582 -0
- package/templates/skills/knowledge/plugin-discovery/scripts/platform_setup.py +1083 -0
- package/templates/skills/knowledge/design-md/README.md +0 -34
- package/templates/skills/knowledge/design-md/SKILL.md +0 -193
- package/templates/skills/knowledge/design-md/examples/DESIGN.md +0 -154
- package/templates/skills/knowledge/notebooklm-mcp/SKILL.md +0 -71
- package/templates/skills/knowledge/notebooklm-mcp/assets/example_asset.txt +0 -24
- package/templates/skills/knowledge/notebooklm-mcp/references/api_reference.md +0 -34
- package/templates/skills/knowledge/notebooklm-mcp/scripts/example.py +0 -19
- package/templates/skills/knowledge/react-components/README.md +0 -36
- package/templates/skills/knowledge/react-components/SKILL.md +0 -53
- package/templates/skills/knowledge/react-components/examples/gold-standard-card.tsx +0 -80
- package/templates/skills/knowledge/react-components/package-lock.json +0 -231
- package/templates/skills/knowledge/react-components/package.json +0 -16
- package/templates/skills/knowledge/react-components/resources/architecture-checklist.md +0 -15
- package/templates/skills/knowledge/react-components/resources/component-template.tsx +0 -37
- package/templates/skills/knowledge/react-components/resources/stitch-api-reference.md +0 -14
- package/templates/skills/knowledge/react-components/resources/style-guide.json +0 -27
- package/templates/skills/knowledge/react-components/scripts/fetch-stitch.sh +0 -30
- package/templates/skills/knowledge/react-components/scripts/validate.js +0 -68
- package/templates/skills/knowledge/self-update/SKILL.md +0 -60
- package/templates/skills/knowledge/self-update/scripts/update_kit.py +0 -103
- package/templates/skills/knowledge/stitch-loop/README.md +0 -54
- package/templates/skills/knowledge/stitch-loop/SKILL.md +0 -235
- package/templates/skills/knowledge/stitch-loop/examples/SITE.md +0 -73
- package/templates/skills/knowledge/stitch-loop/examples/next-prompt.md +0 -25
- package/templates/skills/knowledge/stitch-loop/resources/baton-schema.md +0 -61
- package/templates/skills/knowledge/stitch-loop/resources/site-template.md +0 -104
@@ -0,0 +1,396 @@
# Migration from Cluster Autoscaler to Karpenter

Step-by-step guide to migrate from Kubernetes Cluster Autoscaler to Karpenter.

---

## Overview

### Why Migrate?

| Feature            | Cluster Autoscaler                | Karpenter                            |
| ------------------ | --------------------------------- | ------------------------------------ |
| Provisioning speed | Minutes                           | Seconds                              |
| Instance selection | Pre-defined node groups           | Dynamic, right-sized                 |
| Spot handling      | Limited                           | Native, automatic failover           |
| Consolidation      | Scale-down of underutilized nodes | Built-in (removes or replaces nodes) |
| Configuration      | Node groups                       | Kubernetes-native CRDs               |

### Migration Strategies

1. **Parallel** (Recommended): Run both, gradually shift workloads
2. **Big Bang**: Replace completely in one operation

---

## Pre-Migration Checklist

- [ ] Karpenter IAM role created with required permissions
- [ ] IRSA configured for Karpenter ServiceAccount
- [ ] SQS queue for spot interruptions (optional but recommended)
- [ ] Subnets and security groups tagged with `karpenter.sh/discovery: <cluster-name>`
- [ ] Test environment validated

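The tagging item in the checklist can be checked mechanically before anything is installed. The following is a minimal Python sketch, assuming the JSON printed by `aws ec2 describe-subnets` (or `describe-security-groups`) has been captured as text; the helper name `missing_discovery_tag`, the sample IDs, and the cluster name `demo` are hypothetical.

```python
import json

DISCOVERY_KEY = "karpenter.sh/discovery"


def missing_discovery_tag(describe_json, cluster_name,
                          list_key="Subnets", id_key="SubnetId"):
    """Return IDs of resources that lack karpenter.sh/discovery=<cluster_name>.

    describe_json is the raw JSON text from `aws ec2 describe-subnets`.
    For security groups pass list_key="SecurityGroups", id_key="GroupId".
    """
    missing = []
    for resource in json.loads(describe_json).get(list_key, []):
        # AWS returns tags as a list of {"Key": ..., "Value": ...} objects.
        tags = {t["Key"]: t["Value"] for t in resource.get("Tags", [])}
        if tags.get(DISCOVERY_KEY) != cluster_name:
            missing.append(resource[id_key])
    return missing


# Illustrative input only: the subnet IDs and cluster name are made up.
sample = json.dumps({
    "Subnets": [
        {"SubnetId": "subnet-aaa",
         "Tags": [{"Key": DISCOVERY_KEY, "Value": "demo"}]},
        {"SubnetId": "subnet-bbb",
         "Tags": [{"Key": "Name", "Value": "private-b"}]},
        {"SubnetId": "subnet-ccc"},  # no tags at all
    ]
})
print(missing_discovery_tag(sample, "demo"))  # ['subnet-bbb', 'subnet-ccc']
```

An empty result suggests the discovery selectors in the EC2NodeClass will find every resource you passed in; it does not validate IAM or the SQS queue.
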
---

## Step-by-Step Migration

### Step 1: Prepare Infrastructure

**Tag Subnets:**

```bash
aws ec2 create-tags --resources subnet-xxx subnet-yyy \
  --tags Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}
```

**Tag Security Groups:**

```bash
aws ec2 create-tags --resources sg-xxx \
  --tags Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}
```

**Create Karpenter IAM Role:**

```bash
# Use eksctl or Terraform - see Karpenter getting started guide
eksctl create iamserviceaccount \
  --cluster=${CLUSTER_NAME} \
  --namespace=karpenter \
  --name=karpenter \
  --role-name=${CLUSTER_NAME}-karpenter \
  --attach-policy-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy \
  --approve
```

### Step 2: Install Karpenter

```bash
# Log out so the chart is pulled from public ECR without stale credentials
helm registry logout public.ecr.aws
# Reuse the ServiceAccount that eksctl created and annotated in Step 1;
# letting the chart create its own would conflict with the existing one
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "1.5.2" \
  --namespace karpenter --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set serviceAccount.create=false \
  --set serviceAccount.name=karpenter \
  --set replicas=2 \
  --wait
```

### Step 3: Create Karpenter Resources

**Create EC2NodeClass:**

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: KarpenterNodeRole-${CLUSTER_NAME}
  amiSelectorTerms:
    - alias: al2023@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${CLUSTER_NAME}
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${CLUSTER_NAME}
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true
  metadataOptions:
    httpTokens: required
```

**Create NodePool (mirror existing ASG capabilities):**

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        # Match your existing node groups
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m
```

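When several node groups need matching NodePools, generating the manifest from parameters keeps them consistent. The sketch below is one possible approach in Python, not the kit's own generator: `build_nodepool` is a hypothetical helper that emits the same fields as the NodePool above as JSON, which `kubectl apply -f -` accepts alongside YAML.

```python
import json


def build_nodepool(name, categories, capacity_types, cpu_limit,
                   node_class="default"):
    """Build a karpenter.sh/v1 NodePool manifest as a plain dict."""
    requirements = [
        {"key": "kubernetes.io/arch", "operator": "In",
         "values": ["amd64"]},
        {"key": "karpenter.sh/capacity-type", "operator": "In",
         "values": list(capacity_types)},
        {"key": "karpenter.k8s.aws/instance-category", "operator": "In",
         "values": list(categories)},
    ]
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class,
                    },
                    "requirements": requirements,
                }
            },
            "limits": {"cpu": cpu_limit},
            "disruption": {
                "consolidationPolicy": "WhenEmptyOrUnderutilized",
                "consolidateAfter": "5m",
            },
        },
    }


manifest = build_nodepool("default", ["c", "m", "r"],
                          ["spot", "on-demand"], 1000)
# Pipe the printed JSON into `kubectl apply -f -` to create the NodePool.
print(json.dumps(manifest, indent=2))
```

Keeping the manifest as a dict means each node group can be reviewed as a one-line call before anything is applied to the cluster.
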
### Step 4: Test Karpenter (Non-Critical Workloads)

**Deploy a test workload:**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter-test
spec:
  replicas: 5
  selector:
    matchLabels:
      app: karpenter-test
  template:
    metadata:
      labels:
        app: karpenter-test
    spec:
      # Force scheduling on new nodes (not on CAS nodes)
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: karpenter.sh/nodepool
                    operator: Exists
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
```

**Verify nodes are provisioned:**

```bash
kubectl get nodes -l karpenter.sh/nodepool
kubectl get nodeclaims
```

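During a parallel migration it helps to see at a glance how many nodes each autoscaler owns. A minimal Python sketch, assuming the output of `kubectl get nodes -o json` is available as text; `split_nodes` and the sample node names are hypothetical.

```python
import json

NODEPOOL_LABEL = "karpenter.sh/nodepool"


def split_nodes(nodes_json):
    """Split `kubectl get nodes -o json` output into two lists of names:
    nodes carrying the Karpenter nodepool label, and all other nodes."""
    karpenter, other = [], []
    for node in json.loads(nodes_json).get("items", []):
        labels = node["metadata"].get("labels", {})
        target = karpenter if NODEPOOL_LABEL in labels else other
        target.append(node["metadata"]["name"])
    return karpenter, other


# Illustrative input only: node names and label values are made up.
sample = json.dumps({"items": [
    {"metadata": {"name": "ip-10-0-1-10",
                  "labels": {NODEPOOL_LABEL: "default"}}},
    {"metadata": {"name": "ip-10-0-2-20",
                  "labels": {"eks.amazonaws.com/nodegroup": "general"}}},
]})
karpenter_nodes, other_nodes = split_nodes(sample)
print(karpenter_nodes, other_nodes)  # ['ip-10-0-1-10'] ['ip-10-0-2-20']
```

The second list is the set of nodes still to be cordoned and drained in the following steps.
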
### Step 5: Migrate Workloads Gradually

**Option A: Cordon ASG nodes**

```bash
# Cordon all non-Karpenter nodes
kubectl get nodes --no-headers -l '!karpenter.sh/nodepool' | \
  awk '{print $1}' | \
  xargs -I {} kubectl cordon {}
```

**Option B: Use node affinity**

Add to deployments to prefer Karpenter nodes:

```yaml
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: karpenter.sh/nodepool
              operator: Exists
```

### Step 6: Drain ASG Nodes

```bash
# Drain nodes one by one
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Or scale down ASG
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name <asg-name> \
  --desired-capacity 0 \
  --min-size 0
```

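Draining "one by one" is easy to script so that a failed drain halts the rollout instead of continuing. The following Python sketch is one way to do it, using only the `kubectl drain` flags shown above; the function name `drain_nodes`, the 60-second pause, and the node names are assumptions for illustration.

```python
import subprocess
import time


def drain_nodes(node_names, pause_seconds=60,
                run=subprocess.run, sleep=time.sleep):
    """Drain nodes one at a time and stop at the first failure.

    `run` and `sleep` are injectable so the function can be exercised
    without a cluster. Returns the names that were drained successfully.
    """
    drained = []
    for index, name in enumerate(node_names):
        cmd = ["kubectl", "drain", name,
               "--ignore-daemonsets", "--delete-emptydir-data"]
        result = run(cmd)
        if result.returncode != 0:
            break  # leave the remaining nodes untouched for inspection
        drained.append(name)
        if index < len(node_names) - 1:
            sleep(pause_seconds)  # let rescheduled pods settle first
    return drained


def dry_run(cmd):
    """Stand-in runner that prints the command instead of executing it."""
    print(" ".join(cmd))
    return subprocess.CompletedProcess(cmd, returncode=0)


# Dry run with made-up node names; drop run/sleep to execute for real.
print(drain_nodes(["node-a", "node-b"], run=dry_run, sleep=lambda s: None))
```

Stopping at the first non-zero exit code is a deliberate choice: a drain usually fails because a PodDisruptionBudget cannot be satisfied, which is worth investigating before more capacity is removed.
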
### Step 7: Disable Cluster Autoscaler

```bash
# Scale down CAS
kubectl scale deployment cluster-autoscaler -n kube-system --replicas=0

# Or uninstall
helm uninstall cluster-autoscaler -n kube-system
```

### Step 8: Clean Up

```bash
# Delete old ASGs (after validation period)
eksctl delete nodegroup --cluster=${CLUSTER_NAME} --name=<nodegroup-name>

# Remove CAS IAM resources
# Delete CAS deployment if not using Helm
kubectl delete deployment cluster-autoscaler -n kube-system
```

---

## Mapping CAS Configuration to Karpenter

### Node Group → NodePool

**CAS Node Group (eksctl):**

```yaml
nodeGroups:
  - name: general
    instanceType: m5.xlarge
    desiredCapacity: 3
    minSize: 1
    maxSize: 10
    labels:
      workload-type: general
```

**Karpenter NodePool:**

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general
spec:
  template:
    metadata:
      labels:
        workload-type: general
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.xlarge", "m5.2xlarge", "m5a.xlarge"] # Expand options
  limits:
    cpu: 100 # Total vCPU cap; 10 x m5.xlarge is 40 vCPU, so this leaves headroom for larger types
```

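A node group's `maxSize` counts nodes, while a NodePool's `limits.cpu` counts vCPUs, so the translation depends on which instance types are allowed. The short Python sketch below works through that arithmetic for the types in the example; the helper name `cpu_limit_range` is hypothetical.

```python
# vCPU counts for the instance types used in the example above.
VCPUS = {"m5.xlarge": 4, "m5.2xlarge": 8, "m5a.xlarge": 4}


def cpu_limit_range(max_size, instance_types, vcpus=VCPUS):
    """Return (low, high) total vCPUs for max_size nodes of the given types.

    low assumes every node is the smallest allowed type, high the largest.
    """
    counts = [vcpus[t] for t in instance_types]
    return max_size * min(counts), max_size * max(counts)


print(cpu_limit_range(10, ["m5.xlarge"]))  # (40, 40)
print(cpu_limit_range(10, ["m5.xlarge", "m5.2xlarge", "m5a.xlarge"]))  # (40, 80)
```

Picking a limit near the low end preserves the old node group's capacity ceiling; a value near or above the high end gives Karpenter room to choose larger instances.
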
### Mixed Instance Types

**CAS:**

```yaml
instanceTypes: ["m5.large", "m5a.large", "m5.xlarge"]
```

**Karpenter:**

```yaml
requirements:
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["m"]
  - key: karpenter.k8s.aws/instance-size
    operator: In
    values: ["large", "xlarge"]
```

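The category and size requirements can be derived from an existing instance type list. This Python sketch assumes conventional type names such as `m5.large` or `m5a.xlarge` (category letters, generation digits, optional suffix, then size); `requirements_from_types` is a hypothetical helper, and unusual names are rejected rather than guessed at.

```python
import re

# Category letters, generation digits, optional attribute suffix, then size.
_TYPE_RE = re.compile(r"^([a-z]+)(\d+)([a-z0-9-]*)\.([a-z0-9-]+)$")


def requirements_from_types(instance_types):
    """Derive Karpenter category/size requirements from EC2 type names."""
    categories, sizes = set(), set()
    for name in instance_types:
        match = _TYPE_RE.match(name)
        if match is None:
            raise ValueError(f"unrecognised instance type: {name}")
        categories.add(match.group(1))
        sizes.add(match.group(4))
    return [
        {"key": "karpenter.k8s.aws/instance-category",
         "operator": "In", "values": sorted(categories)},
        {"key": "karpenter.k8s.aws/instance-size",
         "operator": "In", "values": sorted(sizes)},
    ]


for requirement in requirements_from_types(
        ["m5.large", "m5a.large", "m5.xlarge"]):
    print(requirement)
```

Note that the result is a superset of the original list: category `m` with sizes `large` and `xlarge` also admits other `m` families and generations. To keep the exact set, use `node.kubernetes.io/instance-type` with the original names instead.
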
### Taints

**CAS:**

```yaml
taints:
  dedicated: gpu:NoSchedule
```

**Karpenter:**

```yaml
spec:
  template:
    spec:
      taints:
        - key: dedicated
          value: gpu
          effect: NoSchedule
```

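The taint mapping is mechanical: the map form `key: value:Effect` becomes one list entry with separate `key`, `value`, and `effect` fields. A minimal Python sketch of that conversion, with `convert_taints` as a hypothetical helper name:

```python
def convert_taints(taint_map):
    """Convert {"key": "value:Effect"} taints to Karpenter's list form."""
    taints = []
    for key, spec in taint_map.items():
        # Split on the last colon so the effect is always the final field.
        value, separator, effect = spec.rpartition(":")
        if not separator:
            raise ValueError(f"expected 'value:Effect', got {spec!r}")
        taint = {"key": key}
        if value:  # a taint may have an empty value, e.g. ":NoSchedule"
            taint["value"] = value
        taint["effect"] = effect
        taints.append(taint)
    return taints


print(convert_taints({"dedicated": "gpu:NoSchedule"}))
# [{'key': 'dedicated', 'value': 'gpu', 'effect': 'NoSchedule'}]
```

The resulting list goes under `spec.template.spec.taints` in the NodePool, as in the YAML above.
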
---

## Rollback Plan

If issues occur:

1. **Cordon Karpenter nodes:**

   ```bash
   kubectl get nodes -l karpenter.sh/nodepool -o name | xargs kubectl cordon
   ```

2. **Scale up CAS:**

   ```bash
   kubectl scale deployment cluster-autoscaler -n kube-system --replicas=1
   ```

3. **Increase ASG capacity:**

   ```bash
   aws autoscaling update-auto-scaling-group \
     --auto-scaling-group-name <asg-name> \
     --desired-capacity 5
   ```

4. **Drain Karpenter nodes:**

   ```bash
   kubectl get nodes -l karpenter.sh/nodepool -o name | \
     xargs kubectl drain --ignore-daemonsets --delete-emptydir-data
   ```

5. **Delete Karpenter nodes:**

   ```bash
   kubectl delete nodes -l karpenter.sh/nodepool
   ```

---

## Post-Migration Validation

```bash
# Verify all pods running
kubectl get pods --all-namespaces | grep -v Running

# Verify node distribution
kubectl get nodes -o wide

# Check Karpenter metrics
# (port-forward blocks, so run it in a separate terminal)
kubectl port-forward -n karpenter svc/karpenter 8080:8080
curl localhost:8080/metrics | grep karpenter_

# Monitor for 24-48 hours before cleanup
```

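Filtering pod status with `grep -v Running` also lists pods from finished Jobs, which are not a problem. As an alternative, this Python sketch reads `kubectl get pods --all-namespaces -o json` and reports only pods whose phase is neither `Running` nor `Succeeded`; the function name `unhealthy_pods` and the sample pods are hypothetical.

```python
import json


def unhealthy_pods(pods_json):
    """Return (namespace/name, phase) for pods not Running or Succeeded."""
    bad = []
    for pod in json.loads(pods_json).get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            meta = pod["metadata"]
            bad.append((f'{meta["namespace"]}/{meta["name"]}', phase))
    return bad


# Illustrative input only: pod names and namespaces are made up.
sample = json.dumps({"items": [
    {"metadata": {"namespace": "default", "name": "web-1"},
     "status": {"phase": "Running"}},
    {"metadata": {"namespace": "batch", "name": "job-1"},
     "status": {"phase": "Succeeded"}},
    {"metadata": {"namespace": "default", "name": "web-2"},
     "status": {"phase": "Pending"}},
]})
print(unhealthy_pods(sample))  # [('default/web-2', 'Pending')]
```

A `Running` phase only means the containers have started, not that they are ready, so readiness checks or application-level monitoring are still needed during the observation window.
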
---

## Common Migration Issues

| Issue                                  | Cause                   | Solution                            |
| -------------------------------------- | ----------------------- | ----------------------------------- |
| Pods not scheduling to Karpenter nodes | Node selectors/affinity | Check for CAS-specific labels       |
| Spot interruptions                     | Normal behavior         | Verify SQS queue configured         |
| Higher instance variety                | Expected                | Karpenter optimizes for cost        |
| Nodes not consolidating                | PDBs or annotations     | Check `karpenter.sh/do-not-disrupt` |
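
For the last row of the table, the pods that carry the `karpenter.sh/do-not-disrupt` annotation can be listed from `kubectl get pods --all-namespaces -o json`. A minimal Python sketch, assuming that JSON is available as text; `pods_blocking_disruption` and the sample data are hypothetical.

```python
import json

DO_NOT_DISRUPT = "karpenter.sh/do-not-disrupt"


def pods_blocking_disruption(pods_json):
    """Return (namespace, pod, node) for pods annotated do-not-disrupt=true."""
    blocked = []
    for pod in json.loads(pods_json).get("items", []):
        meta = pod["metadata"]
        annotations = meta.get("annotations") or {}
        if annotations.get(DO_NOT_DISRUPT) == "true":
            node = pod.get("spec", {}).get("nodeName")
            blocked.append((meta["namespace"], meta["name"], node))
    return blocked


# Illustrative input only: names are made up.
sample = json.dumps({"items": [
    {"metadata": {"namespace": "data", "name": "etl-0",
                  "annotations": {DO_NOT_DISRUPT: "true"}},
     "spec": {"nodeName": "ip-10-0-1-10"}},
    {"metadata": {"namespace": "default", "name": "web-1"},
     "spec": {"nodeName": "ip-10-0-1-10"}},
]})
print(pods_blocking_disruption(sample))
# [('data', 'etl-0', 'ip-10-0-1-10')]
```

Any node listed here is skipped by consolidation while the annotated pod runs on it; PodDisruptionBudgets need a separate check with `kubectl get pdb --all-namespaces`.
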