@techwavedev/agi-agent-kit 1.1.7 → 1.2.1


Potentially problematic release: this version of @techwavedev/agi-agent-kit might be problematic.

Files changed (111)
  1. package/CHANGELOG.md +82 -1
  2. package/README.md +190 -12
  3. package/bin/init.js +30 -2
  4. package/package.json +6 -3
  5. package/templates/base/AGENTS.md +54 -23
  6. package/templates/base/README.md +325 -0
  7. package/templates/base/directives/memory_integration.md +95 -0
  8. package/templates/base/execution/memory_manager.py +309 -0
  9. package/templates/base/execution/session_boot.py +218 -0
  10. package/templates/base/execution/session_init.py +320 -0
  11. package/templates/base/skill-creator/SKILL_skillcreator.md +23 -36
  12. package/templates/base/skill-creator/scripts/init_skill.py +18 -135
  13. package/templates/skills/ec/README.md +31 -0
  14. package/templates/skills/ec/aws/SKILL.md +1020 -0
  15. package/templates/skills/ec/aws/defaults.yaml +13 -0
  16. package/templates/skills/ec/aws/references/common_patterns.md +80 -0
  17. package/templates/skills/ec/aws/references/mcp_servers.md +98 -0
  18. package/templates/skills/ec/aws-terraform/SKILL.md +349 -0
  19. package/templates/skills/ec/aws-terraform/references/best_practices.md +394 -0
  20. package/templates/skills/ec/aws-terraform/references/checkov_reference.md +337 -0
  21. package/templates/skills/ec/aws-terraform/scripts/configure_mcp.py +150 -0
  22. package/templates/skills/ec/confluent-kafka/SKILL.md +655 -0
  23. package/templates/skills/ec/confluent-kafka/references/ansible_playbooks.md +792 -0
  24. package/templates/skills/ec/confluent-kafka/references/ec_deployment.md +579 -0
  25. package/templates/skills/ec/confluent-kafka/references/kraft_migration.md +490 -0
  26. package/templates/skills/ec/confluent-kafka/references/troubleshooting.md +778 -0
  27. package/templates/skills/ec/confluent-kafka/references/upgrade_7x_to_8x.md +488 -0
  28. package/templates/skills/ec/confluent-kafka/scripts/kafka_health_check.py +435 -0
  29. package/templates/skills/ec/confluent-kafka/scripts/upgrade_preflight.py +568 -0
  30. package/templates/skills/ec/confluent-kafka/scripts/validate_config.py +455 -0
  31. package/templates/skills/ec/consul/SKILL.md +427 -0
  32. package/templates/skills/ec/consul/references/acl_setup.md +168 -0
  33. package/templates/skills/ec/consul/references/ha_config.md +196 -0
  34. package/templates/skills/ec/consul/references/troubleshooting.md +267 -0
  35. package/templates/skills/ec/consul/references/upgrades.md +213 -0
  36. package/templates/skills/ec/consul/scripts/consul_health_report.py +530 -0
  37. package/templates/skills/ec/consul/scripts/consul_status.py +264 -0
  38. package/templates/skills/ec/consul/scripts/generate_values.py +170 -0
  39. package/templates/skills/ec/documentation/SKILL.md +351 -0
  40. package/templates/skills/ec/documentation/references/best_practices.md +201 -0
  41. package/templates/skills/ec/documentation/scripts/analyze_code.py +307 -0
  42. package/templates/skills/ec/documentation/scripts/detect_changes.py +460 -0
  43. package/templates/skills/ec/documentation/scripts/generate_changelog.py +312 -0
  44. package/templates/skills/ec/documentation/scripts/sync_docs.py +272 -0
  45. package/templates/skills/ec/documentation/scripts/update_skill_docs.py +366 -0
  46. package/templates/skills/ec/gitlab/SKILL.md +529 -0
  47. package/templates/skills/ec/gitlab/references/agent_installation.md +416 -0
  48. package/templates/skills/ec/gitlab/references/api_reference.md +508 -0
  49. package/templates/skills/ec/gitlab/references/gitops_flux.md +465 -0
  50. package/templates/skills/ec/gitlab/references/troubleshooting.md +518 -0
  51. package/templates/skills/ec/gitlab/scripts/generate_agent_values.py +329 -0
  52. package/templates/skills/ec/gitlab/scripts/gitlab_agent_status.py +414 -0
  53. package/templates/skills/ec/jira/SKILL.md +484 -0
  54. package/templates/skills/ec/jira/references/jql_reference.md +148 -0
  55. package/templates/skills/ec/jira/scripts/add_comment.py +91 -0
  56. package/templates/skills/ec/jira/scripts/bulk_log_work.py +124 -0
  57. package/templates/skills/ec/jira/scripts/create_ticket.py +162 -0
  58. package/templates/skills/ec/jira/scripts/get_ticket.py +191 -0
  59. package/templates/skills/ec/jira/scripts/jira_client.py +383 -0
  60. package/templates/skills/ec/jira/scripts/log_work.py +154 -0
  61. package/templates/skills/ec/jira/scripts/search_tickets.py +104 -0
  62. package/templates/skills/ec/jira/scripts/update_comment.py +67 -0
  63. package/templates/skills/ec/jira/scripts/update_ticket.py +161 -0
  64. package/templates/skills/ec/karpenter/SKILL.md +301 -0
  65. package/templates/skills/ec/karpenter/references/ec2nodeclasses.md +421 -0
  66. package/templates/skills/ec/karpenter/references/migration.md +396 -0
  67. package/templates/skills/ec/karpenter/references/nodepools.md +400 -0
  68. package/templates/skills/ec/karpenter/references/troubleshooting.md +359 -0
  69. package/templates/skills/ec/karpenter/scripts/generate_ec2nodeclass.py +187 -0
  70. package/templates/skills/ec/karpenter/scripts/generate_nodepool.py +245 -0
  71. package/templates/skills/ec/karpenter/scripts/karpenter_status.py +359 -0
  72. package/templates/skills/ec/opensearch/SKILL.md +720 -0
  73. package/templates/skills/ec/opensearch/references/ml_neural_search.md +576 -0
  74. package/templates/skills/ec/opensearch/references/operator.md +532 -0
  75. package/templates/skills/ec/opensearch/references/query_dsl.md +532 -0
  76. package/templates/skills/ec/opensearch/scripts/configure_mcp.py +148 -0
  77. package/templates/skills/ec/victoriametrics/SKILL.md +598 -0
  78. package/templates/skills/ec/victoriametrics/references/kubernetes.md +531 -0
  79. package/templates/skills/ec/victoriametrics/references/prometheus_migration.md +333 -0
  80. package/templates/skills/ec/victoriametrics/references/troubleshooting.md +442 -0
  81. package/templates/skills/knowledge/SKILLS_CATALOG.md +274 -4
  82. package/templates/skills/knowledge/intelligent-routing/SKILL.md +237 -164
  83. package/templates/skills/knowledge/parallel-agents/SKILL.md +345 -73
  84. package/templates/skills/knowledge/plugin-discovery/SKILL.md +582 -0
  85. package/templates/skills/knowledge/plugin-discovery/scripts/platform_setup.py +1083 -0
  86. package/templates/skills/knowledge/design-md/README.md +0 -34
  87. package/templates/skills/knowledge/design-md/SKILL.md +0 -193
  88. package/templates/skills/knowledge/design-md/examples/DESIGN.md +0 -154
  89. package/templates/skills/knowledge/notebooklm-mcp/SKILL.md +0 -71
  90. package/templates/skills/knowledge/notebooklm-mcp/assets/example_asset.txt +0 -24
  91. package/templates/skills/knowledge/notebooklm-mcp/references/api_reference.md +0 -34
  92. package/templates/skills/knowledge/notebooklm-mcp/scripts/example.py +0 -19
  93. package/templates/skills/knowledge/react-components/README.md +0 -36
  94. package/templates/skills/knowledge/react-components/SKILL.md +0 -53
  95. package/templates/skills/knowledge/react-components/examples/gold-standard-card.tsx +0 -80
  96. package/templates/skills/knowledge/react-components/package-lock.json +0 -231
  97. package/templates/skills/knowledge/react-components/package.json +0 -16
  98. package/templates/skills/knowledge/react-components/resources/architecture-checklist.md +0 -15
  99. package/templates/skills/knowledge/react-components/resources/component-template.tsx +0 -37
  100. package/templates/skills/knowledge/react-components/resources/stitch-api-reference.md +0 -14
  101. package/templates/skills/knowledge/react-components/resources/style-guide.json +0 -27
  102. package/templates/skills/knowledge/react-components/scripts/fetch-stitch.sh +0 -30
  103. package/templates/skills/knowledge/react-components/scripts/validate.js +0 -68
  104. package/templates/skills/knowledge/self-update/SKILL.md +0 -60
  105. package/templates/skills/knowledge/self-update/scripts/update_kit.py +0 -103
  106. package/templates/skills/knowledge/stitch-loop/README.md +0 -54
  107. package/templates/skills/knowledge/stitch-loop/SKILL.md +0 -235
  108. package/templates/skills/knowledge/stitch-loop/examples/SITE.md +0 -73
  109. package/templates/skills/knowledge/stitch-loop/examples/next-prompt.md +0 -25
  110. package/templates/skills/knowledge/stitch-loop/resources/baton-schema.md +0 -61
  111. package/templates/skills/knowledge/stitch-loop/resources/site-template.md +0 -104
package/templates/skills/ec/consul/references/ha_config.md
@@ -0,0 +1,196 @@
1
+ # Consul HA Configuration Patterns
2
+
3
+ ## Table of Contents
4
+
5
+ - [3-Server HA (Standard)](#3-server-ha-standard)
6
+ - [5-Server HA (High Availability)](#5-server-ha-high-availability)
7
+ - [Multi-Datacenter Federation](#multi-datacenter-federation)
8
+ - [Resource Sizing](#resource-sizing)
9
+ - [Storage Configuration](#storage-configuration)
10
+
11
+ ---
12
+
13
+ ## 3-Server HA (Standard)
14
+
15
+ Minimum HA configuration tolerating 1 server failure.
16
+
17
+ ```yaml
18
+ global:
19
+   name: consul
20
+   datacenter: dc1
21
+   gossipEncryption:
22
+     autoGenerate: true
23
+   tls:
24
+     enabled: true
25
+     enableAutoEncrypt: true
26
+   acls:
27
+     manageSystemACLs: true
28
+
29
+ server:
30
+   replicas: 3
31
+   bootstrapExpect: 3
32
+   resources:
33
+     requests:
34
+       memory: "200Mi"
35
+       cpu: "100m"
36
+     limits:
37
+       memory: "500Mi"
38
+       cpu: "500m"
39
+   storageClass: gp3
40
+   storage: 10Gi
41
+   affinity: |
42
+     podAntiAffinity:
43
+       requiredDuringSchedulingIgnoredDuringExecution:
44
+         - labelSelector:
45
+             matchLabels:
46
+               app: consul
47
+               component: server
48
+           topologyKey: kubernetes.io/hostname
49
+   topologySpreadConstraints: |
50
+     - maxSkew: 1
51
+       topologyKey: topology.kubernetes.io/zone
52
+       whenUnsatisfiable: DoNotSchedule
53
+       labelSelector:
54
+         matchLabels:
55
+           app: consul
56
+           component: server
57
+
58
+ connectInject:
59
+   enabled: true
60
+   default: false
61
+
62
+ controller:
63
+   enabled: true
64
+ ```
65
+
66
+ ---
67
+
68
+ ## 5-Server HA (High Availability)
69
+
70
+ Enhanced HA configuration tolerating 2 server failures.
71
+
72
+ ```yaml
73
+ server:
74
+   replicas: 5
75
+   bootstrapExpect: 5
76
+   resources:
77
+     requests:
78
+       memory: "500Mi"
79
+       cpu: "200m"
80
+     limits:
81
+       memory: "1Gi"
82
+       cpu: "1000m"
83
+   storageClass: gp3
84
+   storage: 20Gi
85
+   affinity: |
86
+     podAntiAffinity:
87
+       requiredDuringSchedulingIgnoredDuringExecution:
88
+         - labelSelector:
89
+             matchLabels:
90
+               app: consul
91
+               component: server
92
+           topologyKey: topology.kubernetes.io/zone
93
+ ```
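The failure tolerances quoted for the 3-server and 5-server layouts follow from Raft's majority-quorum rule. A minimal sketch of that arithmetic, assuming every server is a voting member (the function names are illustrative and not part of the package):

```python
def quorum_size(servers: int) -> int:
    """Smallest majority of a Raft cluster with `servers` voting members."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """Number of servers that can fail while a majority is still available."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    for n in (3, 5):
        print(f"{n} servers: quorum={quorum_size(n)}, tolerates={failure_tolerance(n)}")
```

Note that an even server count adds no tolerance over the next lower odd count, which is one reason the layouts above use 3 and 5.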
94
+
95
+ ---
96
+
97
+ ## Multi-Datacenter Federation
98
+
99
+ ### Primary Datacenter
100
+
101
+ ```yaml
102
+ global:
103
+   name: consul
104
+   datacenter: dc1
105
+   tls:
106
+     enabled: true
107
+     enableAutoEncrypt: true
108
+   acls:
109
+     manageSystemACLs: true
110
+   federation:
111
+     enabled: true
112
+     createFederationSecret: true
113
+
114
+ meshGateway:
115
+   enabled: true
116
+   replicas: 2
117
+
118
+ server:
119
+   replicas: 3
120
+ ```
121
+
122
+ ### Secondary Datacenter
123
+
124
+ ```yaml
125
+ global:
126
+   name: consul
127
+   datacenter: dc2
128
+   tls:
129
+     enabled: true
130
+     enableAutoEncrypt: true
131
+     caCert:
132
+       secretName: consul-federation
133
+       secretKey: caCert
134
+   acls:
135
+     manageSystemACLs: true
136
+     replicationToken:
137
+       secretName: consul-federation
138
+       secretKey: replicationToken
139
+   federation:
140
+     enabled: true
141
+     primaryDatacenter: dc1
142
+     primaryGateways:
143
+       - "mesh-gateway-dc1.example.com:443"
144
+
145
+ meshGateway:
146
+   enabled: true
147
+   replicas: 2
148
+
149
+ server:
150
+   replicas: 3
151
+   extraConfig: |
152
+     {
153
+       "primary_datacenter": "dc1",
154
+       "retry_join_wan": ["mesh-gateway-dc1.example.com:443"]
155
+     }
156
+ ```
157
+
158
+ ---
159
+
160
+ ## Resource Sizing
161
+
162
+ | Cluster Size | Servers | CPU Request | Memory Request | Storage |
163
+ | ------------------------ | ------- | ----------- | -------------- | ------- |
164
+ | Small (<50 services) | 3 | 100m | 200Mi | 10Gi |
165
+ | Medium (50-200 services) | 3 | 200m | 500Mi | 20Gi |
166
+ | Large (200+ services) | 5 | 500m | 1Gi | 50Gi |
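The sizing table can be read as a simple lookup on the number of registered services. The sketch below encodes the rows as written. Because the table lists 200 in both the Medium and Large rows, it assumes exactly 200 services still counts as Medium, and `recommend_sizing` is a hypothetical helper name:

```python
from typing import NamedTuple


class Sizing(NamedTuple):
    tier: str
    servers: int
    cpu_request: str
    memory_request: str
    storage: str


def recommend_sizing(service_count: int) -> Sizing:
    """Map a service count to the matching row of the sizing table."""
    if service_count < 0:
        raise ValueError("service_count must be non-negative")
    if service_count < 50:
        return Sizing("Small", 3, "100m", "200Mi", "10Gi")
    if service_count <= 200:  # assumption: 200 is treated as Medium
        return Sizing("Medium", 3, "200m", "500Mi", "20Gi")
    return Sizing("Large", 5, "500m", "1Gi", "50Gi")


if __name__ == "__main__":
    print(recommend_sizing(120))
```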
167
+
168
+ ---
169
+
170
+ ## Storage Configuration
171
+
172
+ ### AWS gp3 StorageClass
173
+
174
+ ```yaml
175
+ apiVersion: storage.k8s.io/v1
176
+ kind: StorageClass
177
+ metadata:
178
+   name: consul-storage
179
+ provisioner: ebs.csi.aws.com
180
+ parameters:
181
+   type: gp3
182
+   iops: "3000"
183
+   throughput: "125"
184
+   encrypted: "true"
185
+ reclaimPolicy: Retain
186
+ allowVolumeExpansion: true
187
+ volumeBindingMode: WaitForFirstConsumer
188
+ ```
189
+
190
+ ### Consul Helm Values for Custom Storage
191
+
192
+ ```yaml
193
+ server:
194
+   storageClass: consul-storage
195
+   storage: 20Gi
196
+ ```
package/templates/skills/ec/consul/references/troubleshooting.md
@@ -0,0 +1,267 @@
1
+ # Consul Troubleshooting Guide
2
+
3
+ ## Table of Contents
4
+
5
+ - [Cluster Formation Issues](#cluster-formation-issues)
6
+ - [Connect Sidecar Issues](#connect-sidecar-issues)
7
+ - [Performance Issues](#performance-issues)
8
+ - [TLS/Certificate Issues](#tlscertificate-issues)
9
+ - [ACL Issues](#acl-issues)
10
+ - [Upgrade Issues](#upgrade-issues)
11
+
12
+ ---
13
+
14
+ ## Cluster Formation Issues
15
+
16
+ ### Servers Not Joining Cluster
17
+
18
+ **Symptoms:**
19
+
20
+ - `consul members` shows fewer than expected servers
21
+ - Servers stuck in `left` or `failed` state
22
+
23
+ **Diagnosis:**
24
+
25
+ ```bash
26
+ # Check server logs
27
+ kubectl logs -n consul consul-server-0 | grep -i "join\|gossip\|serf"
28
+
29
+ # Check gossip key
30
+ kubectl get secret consul-gossip-encryption-key -n consul -o jsonpath='{.data.key}' | base64 -d
31
+ ```
32
+
33
+ **Solutions:**
34
+
35
+ 1. **Gossip key mismatch** — Ensure all servers use the same key
36
+ 2. **Network policies** — Allow ports 8301 (LAN gossip), 8300 (RPC)
37
+ 3. **DNS resolution** — Check headless service resolves correctly
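For the gossip key check, it can help to confirm that the decoded secret is a well-formed key and that every cluster reports the same value. A minimal sketch, assuming the key is Base64 text as printed by the diagnosis command and that valid raw lengths are 16, 24, or 32 bytes (`consul keygen` produces 32-byte keys). Both helper names are hypothetical:

```python
import base64
import binascii
from typing import Iterable

# Assumed set of accepted raw key lengths, in bytes.
VALID_KEY_LENGTHS = (16, 24, 32)


def gossip_key_is_valid(key: str) -> bool:
    """Return True if `key` is Base64 and decodes to an accepted length."""
    try:
        raw = base64.b64decode(key.strip(), validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(raw) in VALID_KEY_LENGTHS


def keys_match(keys: Iterable[str]) -> bool:
    """Return True if all supplied keys are identical (ignoring whitespace)."""
    return len({k.strip() for k in keys}) == 1


if __name__ == "__main__":
    sample = base64.b64encode(bytes(32)).decode()  # placeholder key, not a real secret
    print(gossip_key_is_valid(sample), keys_match([sample, sample + "\n"]))
```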
38
+
39
+ ### Raft Quorum Lost
40
+
41
+ **Symptoms:**
42
+
43
+ - `No cluster leader`
44
+ - Only 1 server responding
45
+
46
+ **Recovery:**
47
+
48
+ ```bash
49
+ # Check raft peers
50
+ kubectl exec -n consul consul-server-0 -- consul operator raft list-peers
51
+
52
+ # Remove failed peer
53
+ kubectl exec -n consul consul-server-0 -- consul operator raft remove-peer -address=<failed-peer-ip>:8300
54
+
55
+ # Repeat for each additional failed peer. If no leader can be elected, remove-peer cannot run;
56
+ # fall back to manual recovery with raft/peers.json (DANGEROUS - data loss possible)
57
+ ```
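The output of `consul operator raft list-peers` can be checked programmatically before and after removing a peer. The sketch below assumes the column order `Node ID Address State Voter ...`. The sample text is illustrative, not captured from a real cluster, and the helper names are hypothetical:

```python
from typing import Dict, List


def parse_raft_peers(output: str) -> List[Dict[str, object]]:
    """Parse whitespace-separated `list-peers` rows, skipping the header line."""
    rows = [line for line in output.splitlines() if line.strip()]
    peers = []
    for line in rows[1:]:
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore rows that do not match the assumed layout
        peers.append({
            "node": fields[0],
            "id": fields[1],
            "address": fields[2],
            "state": fields[3],
            "voter": fields[4] == "true",
        })
    return peers


def has_leader(peers: List[Dict[str, object]]) -> bool:
    """Return True if any peer reports the `leader` state."""
    return any(p["state"] == "leader" for p in peers)


SAMPLE = """\
Node             ID    Address         State     Voter  RaftProtocol
consul-server-0  id-0  10.0.0.10:8300  leader    true   3
consul-server-1  id-1  10.0.0.11:8300  follower  true   3
"""

if __name__ == "__main__":
    peers = parse_raft_peers(SAMPLE)
    print(len(peers), has_leader(peers))
```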
58
+
59
+ ---
60
+
61
+ ## Connect Sidecar Issues
62
+
63
+ ### Sidecar Not Injecting
64
+
65
+ **Symptoms:**
66
+
67
+ - Pods start without `consul-dataplane` container
68
+ - No Envoy sidecar
69
+
70
+ **Diagnosis:**
71
+
72
+ ```bash
73
+ # Check webhook
74
+ kubectl get mutatingwebhookconfigurations | grep consul
75
+
76
+ # Check injector logs
77
+ kubectl logs -n consul -l app=consul,component=connect-injector
78
+
79
+ # Check pod annotations
80
+ kubectl get pod <pod-name> -o yaml | grep consul
81
+ ```
82
+
83
+ **Solutions:**
84
+
85
+ 1. **Missing annotation** — Add `consul.hashicorp.com/connect-inject: "true"`
86
+ 2. **Namespace not labeled** — `kubectl label namespace myapp consul.hashicorp.com/connect-inject=true`
87
+ 3. **Webhook certificate expired** — Restart injector deployment
88
+
89
+ ### Sidecar CrashLoopBackOff
90
+
91
+ **Symptoms:**
92
+
93
+ - `consul-dataplane` container crashes
94
+ - Pod restarts repeatedly
95
+
96
+ **Diagnosis:**
97
+
98
+ ```bash
99
+ # Check dataplane logs
100
+ kubectl logs <pod-name> -c consul-dataplane
101
+
102
+ # Check Envoy config
103
+ kubectl exec <pod-name> -c consul-dataplane -- wget -qO- localhost:19000/config_dump
104
+ ```
105
+
106
+ **Solutions:**
107
+
108
+ 1. **ACL token issues** — Verify service token exists
109
+ 2. **Upstream not found** — Check service registration
110
+ 3. **Resource limits** — Increase CPU/memory for dataplane
111
+
112
+ ---
113
+
114
+ ## Performance Issues
115
+
116
+ ### High Latency Between Services
117
+
118
+ **Diagnosis:**
119
+
120
+ ```bash
121
+ # Check Envoy stats
122
+ kubectl exec <pod-name> -c consul-dataplane -- wget -qO- localhost:19000/stats | grep upstream
123
+
124
+ # Check connect intentions
125
+ kubectl get serviceintentions -A
126
+ ```
127
+
128
+ **Solutions:**
129
+
130
+ 1. **Increase Envoy resources**
131
+
132
+ ```yaml
133
+ annotations:
134
+ consul.hashicorp.com/sidecar-proxy-cpu-request: "100m"
135
+ consul.hashicorp.com/sidecar-proxy-memory-request: "128Mi"
136
+ ```
137
+
138
+ 2. **Check network policies** — Ensure direct pod-to-pod traffic is allowed
139
+ 3. **Locality-aware routing** — Enable mesh gateway mode
140
+
141
+ ### Server High CPU
142
+
143
+ **Diagnosis:**
144
+
145
+ ```bash
146
+ # Check leader
147
+ kubectl exec -n consul consul-server-0 -- consul operator raft list-peers | grep leader
148
+
149
+ # Check catalog size
150
+ kubectl exec -n consul consul-server-0 -- consul catalog services | wc -l
151
+ ```
152
+
153
+ **Solutions:**
154
+
155
+ 1. Scale to 5 servers for read distribution
156
+ 2. Increase server resources
157
+ 3. Review service registration churn
158
+
159
+ ---
160
+
161
+ ## TLS/Certificate Issues
162
+
163
+ ### Certificate Errors
164
+
165
+ **Symptoms:**
166
+
167
+ - `x509: certificate signed by unknown authority`
168
+ - Connect handshake failures
169
+
170
+ **Diagnosis:**
171
+
172
+ ```bash
173
+ # Check CA config
174
+ kubectl exec -n consul consul-server-0 -- consul connect ca get-config
175
+
176
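+ # Check certificate expiry (assumes the chart-generated consul-server-cert secret and a local openssl)
177
+ kubectl get secret consul-server-cert -n consul -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate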
178
+ ```
179
+
180
+ **Solutions:**
181
+
182
+ 1. **CA mismatch** — Ensure all clients trust the CA
183
+ 2. **Certificate rotation** — Trigger CA rotation
184
+ 3. **Auto-encrypt issues** — Verify `enableAutoEncrypt: true`
185
+
186
+ ### Gossip Encryption Failures
187
+
188
+ **Symptoms:**
189
+
190
+ - Servers can't communicate
191
+ - `memberlist: Encrypt message failed`
192
+
193
+ **Diagnosis:**
194
+
195
+ ```bash
196
+ # Check keyring
197
+ kubectl exec -n consul consul-server-0 -- consul keyring -list
198
+ ```
199
+
200
+ **Solutions:**
201
+
202
+ 1. Ensure all nodes have the same gossip key
203
+ 2. If rotating keys, follow proper key rotation procedure
204
+
205
+ ---
206
+
207
+ ## ACL Issues
208
+
209
+ ### Permission Denied Errors
210
+
211
+ **Diagnosis:**
212
+
213
+ ```bash
214
+ # Check token
215
+ kubectl exec -n consul consul-server-0 -- consul acl token list
216
+
217
+ # Check policy
218
+ kubectl exec -n consul consul-server-0 -- consul acl policy list
219
+ ```
220
+
221
+ **Solutions:**
222
+
223
+ 1. Create appropriate policy for the service
224
+ 2. Attach policy to token
225
+ 3. Verify token is being used correctly
226
+
227
+ ### Bootstrap Token Lost
228
+
229
+ **Recovery:**
230
+
231
+ ```bash
232
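+ # Reset bootstrap (requires access to the Raft leader): a failed `consul acl bootstrap` reports a reset index; write it to the leader's data dir, then bootstrap again
233
+ kubectl exec -n consul consul-server-0 -- sh -c 'echo <reset-index> > /consul/data/acl-bootstrap-reset && consul acl bootstrap'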
234
+ ```
235
+
236
+ ---
237
+
238
+ ## Upgrade Issues
239
+
240
+ ### Pods Stuck in Pending
241
+
242
+ **Cause:** Volume affinity conflicts
243
+
244
+ **Solution:**
245
+
246
+ ```bash
247
+ # Check PVC binding
248
+ kubectl get pvc -n consul
249
+
250
+ # Delete stuck pod (statefulset will recreate)
251
+ kubectl delete pod consul-server-X -n consul
252
+ ```
253
+
254
+ ### Version Incompatibility
255
+
256
+ **Prevention:**
257
+
258
+ - Always check [upgrade notes](https://developer.hashicorp.com/consul/docs/upgrading)
259
+ - Upgrade one minor version at a time
260
+ - Test in non-prod first
261
+
262
+ **Recovery:**
263
+
264
+ ```bash
265
+ # Rollback Helm release
266
+ helm rollback consul <previous-revision> -n consul
267
+ ```
package/templates/skills/ec/consul/references/upgrades.md
@@ -0,0 +1,213 @@
1
+ # Consul Upgrade Guide
2
+
3
+ ## Table of Contents
4
+
5
+ - [Pre-Upgrade Checklist](#pre-upgrade-checklist)
6
+ - [Upgrade Paths](#upgrade-paths)
7
+ - [Upgrade Procedures](#upgrade-procedures)
8
+ - [Post-Upgrade Validation](#post-upgrade-validation)
9
+ - [Rollback Procedures](#rollback-procedures)
10
+ - [Breaking Changes by Version](#breaking-changes-by-version)
11
+
12
+ ---
13
+
14
+ ## Pre-Upgrade Checklist
15
+
16
+ ### Before Any Upgrade
17
+
18
+ - [ ] Review [release notes](https://developer.hashicorp.com/consul/docs/release-notes)
19
+ - [ ] Check for breaking changes
20
+ - [ ] Create snapshot backup
21
+ - [ ] Verify cluster is healthy
22
+ - [ ] Test upgrade in non-prod first
23
+ - [ ] Plan maintenance window
24
+
25
+ ### Backup Commands
26
+
27
+ ```bash
28
+ # Create snapshot
29
+ kubectl exec -n consul consul-server-0 -- consul snapshot save /tmp/backup.snap
30
+ kubectl cp consul/consul-server-0:/tmp/backup.snap ./consul-backup-$(date +%Y%m%d-%H%M).snap
31
+
32
+ # Verify backup
33
+ consul snapshot inspect ./consul-backup-*.snap
34
+ ```
35
+
36
+ ### Health Verification
37
+
38
+ ```bash
39
+ # All servers healthy
40
+ kubectl exec -n consul consul-server-0 -- consul members
41
+
42
+ # Raft consensus
43
+ kubectl exec -n consul consul-server-0 -- consul operator raft list-peers
44
+
45
+ # No critical services down
46
+ kubectl exec -n consul consul-server-0 -- consul catalog services
47
+ ```
48
+
49
+ ---
50
+
51
+ ## Upgrade Paths
52
+
53
+ ### Consul Version Compatibility
54
+
55
+ | From | To | Notes |
56
+ | ------ | ------ | ----------------------------------------------- |
57
+ | 1.15.x | 1.16.x | Direct upgrade supported |
58
+ | 1.16.x | 1.17.x | Direct upgrade supported |
59
+ | 1.17.x | 1.18.x | Direct upgrade supported |
60
+ | 1.15.x | 1.18.x | **Step upgrade required** (1.15→1.16→1.17→1.18) |
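The step-upgrade rule in the table (one minor version at a time) can be expressed as a small helper that lists the intermediate targets. A minimal sketch, assuming versions are written as `major.minor` or `major.minor.x` and that upgrades stay within one major version (`upgrade_steps` is a hypothetical name):

```python
from typing import List, Tuple


def _major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.15', '1.15.x' or '1.15.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def upgrade_steps(current: str, target: str) -> List[str]:
    """Return each minor release to upgrade through, ending at the target."""
    cur_major, cur_minor = _major_minor(current)
    tgt_major, tgt_minor = _major_minor(target)
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("target is older than the current version")
    return [f"{cur_major}.{minor}.x" for minor in range(cur_minor + 1, tgt_minor + 1)]


if __name__ == "__main__":
    print(" -> ".join(["1.15.x"] + upgrade_steps("1.15.x", "1.18.x")))
```

A single-element result corresponds to the "direct upgrade supported" rows, while a longer list corresponds to the step-upgrade row.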
61
+
62
+ ### consul-k8s Helm Chart Versions
63
+
64
+ | Helm Chart | Consul Version | Notes |
65
+ | ---------- | -------------- | ------ |
66
+ | 1.0.x | 1.14.x | Legacy |
67
+ | 1.1.x | 1.15.x | |
68
+ | 1.2.x | 1.16.x | |
69
+ | 1.3.x | 1.17.x | |
70
+ | 1.4.x | 1.18.x | Latest |
71
+
72
+ ---
73
+
74
+ ## Upgrade Procedures
75
+
76
+ ### Standard Helm Upgrade
77
+
78
+ ```bash
79
+ # 1. Check current version
80
+ helm list -n consul
81
+ kubectl exec -n consul consul-server-0 -- consul version
82
+
83
+ # 2. Update Helm repo
84
+ helm repo update hashicorp
85
+
86
+ # 3. Check available versions
87
+ helm search repo hashicorp/consul --versions
88
+
89
+ # 4. Dry-run upgrade
90
+ helm upgrade consul hashicorp/consul \
91
+   --namespace consul \
92
+   --values consul-values.yaml \
93
+   --version <NEW_VERSION> \
94
+   --dry-run
95
+
96
+ # 5. Perform upgrade
97
+ helm upgrade consul hashicorp/consul \
98
+   --namespace consul \
99
+   --values consul-values.yaml \
100
+   --version <NEW_VERSION>
101
+
102
+ # 6. Watch rollout
103
+ kubectl rollout status statefulset/consul-server -n consul --timeout=10m
104
+ ```
105
+
106
+ ### Zero-Downtime Upgrade Strategy
107
+
108
+ ```bash
109
+ # Upgrade servers one at a time
110
+ for i in 2 1 0; do
111
+   echo "Upgrading consul-server-$i..."
112
+   kubectl delete pod consul-server-$i -n consul
113
+   kubectl wait --for=condition=Ready pod/consul-server-$i -n consul --timeout=300s
114
+   sleep 30
115
+ done
116
+
117
+ # Verify leader election
118
+ kubectl exec -n consul consul-server-0 -- consul operator raft list-peers
119
+ ```
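The loop above hardcodes the ordinals for a three-server cluster. For other replica counts, the same highest-ordinal-first order can be generated rather than typed by hand. A minimal sketch, assuming the StatefulSet pods are named `consul-server-<ordinal>` as in the commands above (`restart_order` is a hypothetical helper):

```python
from typing import List


def restart_order(replicas: int, prefix: str = "consul-server") -> List[str]:
    """Return pod names from the highest ordinal down to 0."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return [f"{prefix}-{ordinal}" for ordinal in reversed(range(replicas))]


if __name__ == "__main__":
    for pod in restart_order(5):
        print(pod)
```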
120
+
121
+ ### Multi-Datacenter Upgrade Order
122
+
123
+ 1. **Upgrade secondary datacenters first**
124
+ 2. Verify federation health
125
+ 3. **Upgrade primary datacenter last**
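The ordering rule above (secondaries first, primary last) can be sketched as a helper that produces the sequence of datacenters to upgrade. The datacenter names and the `upgrade_order` function are illustrative assumptions:

```python
from typing import List, Sequence


def upgrade_order(datacenters: Sequence[str], primary: str) -> List[str]:
    """Return secondaries in their given order, followed by the primary."""
    if primary not in datacenters:
        raise ValueError(f"primary datacenter {primary!r} is not in the list")
    secondaries = [dc for dc in datacenters if dc != primary]
    return secondaries + [primary]


if __name__ == "__main__":
    print(upgrade_order(["dc1", "dc2", "dc3"], primary="dc1"))
```

Federation health would still need to be verified between the last secondary and the primary, as step 2 describes.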
126
+
127
+ ---
128
+
129
+ ## Post-Upgrade Validation
130
+
131
+ ### Immediate Checks
132
+
133
+ ```bash
134
+ # Server health
135
+ kubectl exec -n consul consul-server-0 -- consul members
136
+
137
+ # Version verification
138
+ kubectl exec -n consul consul-server-0 -- consul version
139
+
140
+ # Raft status
141
+ kubectl exec -n consul consul-server-0 -- consul operator raft list-peers
142
+
143
+ # Connect CA
144
+ kubectl exec -n consul consul-server-0 -- consul connect ca get-config
145
+ ```
146
+
147
+ ### Service Mesh Validation
148
+
149
+ ```bash
150
+ # Check service registrations
151
+ kubectl exec -n consul consul-server-0 -- consul catalog services
152
+
153
+ # Verify sidecar injection still works
154
+ kubectl rollout restart deployment/test-app -n test
155
+
156
+ # Check intentions
157
+ kubectl get serviceintentions -A
158
+ ```
159
+
160
+ ---
161
+
162
+ ## Rollback Procedures
163
+
164
+ ### Helm Rollback
165
+
166
+ ```bash
167
+ # List revisions
168
+ helm history consul -n consul
169
+
170
+ # Rollback to previous
171
+ helm rollback consul <REVISION> -n consul
172
+
173
+ # Watch rollout
174
+ kubectl rollout status statefulset/consul-server -n consul
175
+ ```
176
+
177
+ ### Snapshot Restore (Data Recovery)
178
+
179
+ ```bash
180
+ # Copy backup to server
181
+ kubectl cp ./consul-backup.snap consul/consul-server-0:/tmp/restore.snap
182
+
183
+ # Restore snapshot
184
+ kubectl exec -n consul consul-server-0 -- consul snapshot restore /tmp/restore.snap
185
+
186
+ # Restart servers
187
+ kubectl rollout restart statefulset/consul-server -n consul
188
+ ```
189
+
190
+ ---
191
+
192
+ ## Breaking Changes by Version
193
+
194
+ ### 1.17.x → 1.18.x
195
+
196
+ - Consul Dataplane is now default (replaces client agents)
197
+ - Update sidecar annotations if customized
198
+
199
+ ### 1.16.x → 1.17.x
200
+
201
+ - ACL token migration completed
202
+ - Legacy ACL tokens no longer supported
203
+
204
+ ### 1.15.x → 1.16.x
205
+
206
+ - Catalog API v2 introduced (opt-in)
207
+ - Some deprecated flags removed
208
+
209
+ ### General Guidance
210
+
211
+ - Always check the official [upgrading documentation](https://developer.hashicorp.com/consul/docs/upgrading)
212
+ - Test CRD changes in non-prod
213
+ - Monitor Envoy proxy compatibility