@techwavedev/agi-agent-kit 1.1.7 → 1.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of @techwavedev/agi-agent-kit might be problematic.
- package/CHANGELOG.md +82 -1
- package/README.md +190 -12
- package/bin/init.js +30 -2
- package/package.json +6 -3
- package/templates/base/AGENTS.md +54 -23
- package/templates/base/README.md +325 -0
- package/templates/base/directives/memory_integration.md +95 -0
- package/templates/base/execution/memory_manager.py +309 -0
- package/templates/base/execution/session_boot.py +218 -0
- package/templates/base/execution/session_init.py +320 -0
- package/templates/base/skill-creator/SKILL_skillcreator.md +23 -36
- package/templates/base/skill-creator/scripts/init_skill.py +18 -135
- package/templates/skills/ec/README.md +31 -0
- package/templates/skills/ec/aws/SKILL.md +1020 -0
- package/templates/skills/ec/aws/defaults.yaml +13 -0
- package/templates/skills/ec/aws/references/common_patterns.md +80 -0
- package/templates/skills/ec/aws/references/mcp_servers.md +98 -0
- package/templates/skills/ec/aws-terraform/SKILL.md +349 -0
- package/templates/skills/ec/aws-terraform/references/best_practices.md +394 -0
- package/templates/skills/ec/aws-terraform/references/checkov_reference.md +337 -0
- package/templates/skills/ec/aws-terraform/scripts/configure_mcp.py +150 -0
- package/templates/skills/ec/confluent-kafka/SKILL.md +655 -0
- package/templates/skills/ec/confluent-kafka/references/ansible_playbooks.md +792 -0
- package/templates/skills/ec/confluent-kafka/references/ec_deployment.md +579 -0
- package/templates/skills/ec/confluent-kafka/references/kraft_migration.md +490 -0
- package/templates/skills/ec/confluent-kafka/references/troubleshooting.md +778 -0
- package/templates/skills/ec/confluent-kafka/references/upgrade_7x_to_8x.md +488 -0
- package/templates/skills/ec/confluent-kafka/scripts/kafka_health_check.py +435 -0
- package/templates/skills/ec/confluent-kafka/scripts/upgrade_preflight.py +568 -0
- package/templates/skills/ec/confluent-kafka/scripts/validate_config.py +455 -0
- package/templates/skills/ec/consul/SKILL.md +427 -0
- package/templates/skills/ec/consul/references/acl_setup.md +168 -0
- package/templates/skills/ec/consul/references/ha_config.md +196 -0
- package/templates/skills/ec/consul/references/troubleshooting.md +267 -0
- package/templates/skills/ec/consul/references/upgrades.md +213 -0
- package/templates/skills/ec/consul/scripts/consul_health_report.py +530 -0
- package/templates/skills/ec/consul/scripts/consul_status.py +264 -0
- package/templates/skills/ec/consul/scripts/generate_values.py +170 -0
- package/templates/skills/ec/documentation/SKILL.md +351 -0
- package/templates/skills/ec/documentation/references/best_practices.md +201 -0
- package/templates/skills/ec/documentation/scripts/analyze_code.py +307 -0
- package/templates/skills/ec/documentation/scripts/detect_changes.py +460 -0
- package/templates/skills/ec/documentation/scripts/generate_changelog.py +312 -0
- package/templates/skills/ec/documentation/scripts/sync_docs.py +272 -0
- package/templates/skills/ec/documentation/scripts/update_skill_docs.py +366 -0
- package/templates/skills/ec/gitlab/SKILL.md +529 -0
- package/templates/skills/ec/gitlab/references/agent_installation.md +416 -0
- package/templates/skills/ec/gitlab/references/api_reference.md +508 -0
- package/templates/skills/ec/gitlab/references/gitops_flux.md +465 -0
- package/templates/skills/ec/gitlab/references/troubleshooting.md +518 -0
- package/templates/skills/ec/gitlab/scripts/generate_agent_values.py +329 -0
- package/templates/skills/ec/gitlab/scripts/gitlab_agent_status.py +414 -0
- package/templates/skills/ec/jira/SKILL.md +484 -0
- package/templates/skills/ec/jira/references/jql_reference.md +148 -0
- package/templates/skills/ec/jira/scripts/add_comment.py +91 -0
- package/templates/skills/ec/jira/scripts/bulk_log_work.py +124 -0
- package/templates/skills/ec/jira/scripts/create_ticket.py +162 -0
- package/templates/skills/ec/jira/scripts/get_ticket.py +191 -0
- package/templates/skills/ec/jira/scripts/jira_client.py +383 -0
- package/templates/skills/ec/jira/scripts/log_work.py +154 -0
- package/templates/skills/ec/jira/scripts/search_tickets.py +104 -0
- package/templates/skills/ec/jira/scripts/update_comment.py +67 -0
- package/templates/skills/ec/jira/scripts/update_ticket.py +161 -0
- package/templates/skills/ec/karpenter/SKILL.md +301 -0
- package/templates/skills/ec/karpenter/references/ec2nodeclasses.md +421 -0
- package/templates/skills/ec/karpenter/references/migration.md +396 -0
- package/templates/skills/ec/karpenter/references/nodepools.md +400 -0
- package/templates/skills/ec/karpenter/references/troubleshooting.md +359 -0
- package/templates/skills/ec/karpenter/scripts/generate_ec2nodeclass.py +187 -0
- package/templates/skills/ec/karpenter/scripts/generate_nodepool.py +245 -0
- package/templates/skills/ec/karpenter/scripts/karpenter_status.py +359 -0
- package/templates/skills/ec/opensearch/SKILL.md +720 -0
- package/templates/skills/ec/opensearch/references/ml_neural_search.md +576 -0
- package/templates/skills/ec/opensearch/references/operator.md +532 -0
- package/templates/skills/ec/opensearch/references/query_dsl.md +532 -0
- package/templates/skills/ec/opensearch/scripts/configure_mcp.py +148 -0
- package/templates/skills/ec/victoriametrics/SKILL.md +598 -0
- package/templates/skills/ec/victoriametrics/references/kubernetes.md +531 -0
- package/templates/skills/ec/victoriametrics/references/prometheus_migration.md +333 -0
- package/templates/skills/ec/victoriametrics/references/troubleshooting.md +442 -0
- package/templates/skills/knowledge/SKILLS_CATALOG.md +274 -4
- package/templates/skills/knowledge/intelligent-routing/SKILL.md +237 -164
- package/templates/skills/knowledge/parallel-agents/SKILL.md +345 -73
- package/templates/skills/knowledge/plugin-discovery/SKILL.md +582 -0
- package/templates/skills/knowledge/plugin-discovery/scripts/platform_setup.py +1083 -0
- package/templates/skills/knowledge/design-md/README.md +0 -34
- package/templates/skills/knowledge/design-md/SKILL.md +0 -193
- package/templates/skills/knowledge/design-md/examples/DESIGN.md +0 -154
- package/templates/skills/knowledge/notebooklm-mcp/SKILL.md +0 -71
- package/templates/skills/knowledge/notebooklm-mcp/assets/example_asset.txt +0 -24
- package/templates/skills/knowledge/notebooklm-mcp/references/api_reference.md +0 -34
- package/templates/skills/knowledge/notebooklm-mcp/scripts/example.py +0 -19
- package/templates/skills/knowledge/react-components/README.md +0 -36
- package/templates/skills/knowledge/react-components/SKILL.md +0 -53
- package/templates/skills/knowledge/react-components/examples/gold-standard-card.tsx +0 -80
- package/templates/skills/knowledge/react-components/package-lock.json +0 -231
- package/templates/skills/knowledge/react-components/package.json +0 -16
- package/templates/skills/knowledge/react-components/resources/architecture-checklist.md +0 -15
- package/templates/skills/knowledge/react-components/resources/component-template.tsx +0 -37
- package/templates/skills/knowledge/react-components/resources/stitch-api-reference.md +0 -14
- package/templates/skills/knowledge/react-components/resources/style-guide.json +0 -27
- package/templates/skills/knowledge/react-components/scripts/fetch-stitch.sh +0 -30
- package/templates/skills/knowledge/react-components/scripts/validate.js +0 -68
- package/templates/skills/knowledge/self-update/SKILL.md +0 -60
- package/templates/skills/knowledge/self-update/scripts/update_kit.py +0 -103
- package/templates/skills/knowledge/stitch-loop/README.md +0 -54
- package/templates/skills/knowledge/stitch-loop/SKILL.md +0 -235
- package/templates/skills/knowledge/stitch-loop/examples/SITE.md +0 -73
- package/templates/skills/knowledge/stitch-loop/examples/next-prompt.md +0 -25
- package/templates/skills/knowledge/stitch-loop/resources/baton-schema.md +0 -61
- package/templates/skills/knowledge/stitch-loop/resources/site-template.md +0 -104
@@ -0,0 +1,196 @@
# Consul HA Configuration Patterns

## Table of Contents

- [3-Server HA (Standard)](#3-server-ha-standard)
- [5-Server HA (High Availability)](#5-server-ha-high-availability)
- [Multi-Datacenter Federation](#multi-datacenter-federation)
- [Resource Sizing](#resource-sizing)
- [Storage Configuration](#storage-configuration)

---

## 3-Server HA (Standard)

Minimum HA configuration tolerating 1 server failure.

```yaml
global:
  name: consul
  datacenter: dc1
  gossipEncryption:
    autoGenerate: true
  tls:
    enabled: true
    enableAutoEncrypt: true
  acls:
    manageSystemACLs: true

server:
  replicas: 3
  bootstrapExpect: 3
  resources:
    requests:
      memory: "200Mi"
      cpu: "100m"
    limits:
      memory: "500Mi"
      cpu: "500m"
  storageClass: gp3
  storage: 10Gi
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: consul
              component: server
          topologyKey: kubernetes.io/hostname
  topologySpreadConstraints: |
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: consul
          component: server

connectInject:
  enabled: true
  default: false

controller:
  enabled: true
```

---

## 5-Server HA (High Availability)

Enhanced HA configuration tolerating 2 server failures. The required zone-level anti-affinity below allows only one server per zone, so all five replicas can be scheduled only where at least five zones exist; with fewer zones, use `kubernetes.io/hostname` as the topology key.

```yaml
server:
  replicas: 5
  bootstrapExpect: 5
  resources:
    requests:
      memory: "500Mi"
      cpu: "200m"
    limits:
      memory: "1Gi"
      cpu: "1000m"
  storageClass: gp3
  storage: 20Gi
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: consul
              component: server
          topologyKey: topology.kubernetes.io/zone
```
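
The failure tolerances quoted for the 3- and 5-server layouts follow from Raft's majority rule: a cluster of N servers needs floor(N/2) + 1 of them to agree, so it survives N minus that quorum in failures. A minimal Python sketch of the arithmetic (the function names are illustrative and not part of Consul or this kit):

```python
def raft_quorum(servers: int) -> int:
    """Smallest majority of a cluster with `servers` voting members."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """Number of servers that can fail while a majority remains."""
    return servers - raft_quorum(servers)


# Print cluster size, quorum, and tolerated failures for a few sizes.
for size in (3, 4, 5):
    print(size, raft_quorum(size), failure_tolerance(size))
```

An even-sized cluster gains no tolerance over the next smaller odd size, which is one reason the patterns here use 3 or 5 servers.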

---

## Multi-Datacenter Federation

### Primary Datacenter

```yaml
global:
  name: consul
  datacenter: dc1
  tls:
    enabled: true
    enableAutoEncrypt: true
  acls:
    manageSystemACLs: true
  federation:
    enabled: true
    createFederationSecret: true

meshGateway:
  enabled: true
  replicas: 2

server:
  replicas: 3
```

### Secondary Datacenter

```yaml
global:
  name: consul
  datacenter: dc2
  tls:
    enabled: true
    enableAutoEncrypt: true
    caCert:
      secretName: consul-federation
      secretKey: caCert
  acls:
    manageSystemACLs: true
    replicationToken:
      secretName: consul-federation
      secretKey: replicationToken
  federation:
    enabled: true
    primaryDatacenter: dc1
    primaryGateways:
      - "mesh-gateway-dc1.example.com:443"

meshGateway:
  enabled: true
  replicas: 2

server:
  replicas: 3
  extraConfig: |
    {
      "primary_datacenter": "dc1",
      "retry_join_wan": ["mesh-gateway-dc1.example.com:443"]
    }
```

---

## Resource Sizing

| Cluster Size             | Servers | CPU Request | Memory Request | Storage |
| ------------------------ | ------- | ----------- | -------------- | ------- |
| Small (<50 services)     | 3       | 100m        | 200Mi          | 10Gi    |
| Medium (50-200 services) | 3       | 200m        | 500Mi          | 20Gi    |
| Large (200+ services)    | 5       | 500m        | 1Gi            | 50Gi    |
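
The sizing table can be read as a simple lookup on the number of registered services. The sketch below encodes it in Python; `ServerSizing` and `recommend_sizing` are hypothetical names, and because the table lists 200 services in both the Medium and Large ranges, the code assumes exactly 200 still counts as Medium.

```python
from typing import NamedTuple


class ServerSizing(NamedTuple):
    tier: str
    servers: int
    cpu_request: str
    memory_request: str
    storage: str


# Rows copied from the sizing table.
SMALL = ServerSizing("Small", 3, "100m", "200Mi", "10Gi")
MEDIUM = ServerSizing("Medium", 3, "200m", "500Mi", "20Gi")
LARGE = ServerSizing("Large", 5, "500m", "1Gi", "50Gi")


def recommend_sizing(service_count: int) -> ServerSizing:
    """Pick a tier for the given number of registered services.

    Assumption: exactly 200 services is treated as Medium, since the
    table names 200 in both the Medium and Large ranges.
    """
    if service_count < 0:
        raise ValueError("service_count must be non-negative")
    if service_count < 50:
        return SMALL
    if service_count <= 200:
        return MEDIUM
    return LARGE


print(recommend_sizing(120))
```

The values are starting points taken from the table, not measured requirements; actual needs depend on catalog churn and mesh traffic.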

---

## Storage Configuration

### AWS gp3 StorageClass

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: consul-storage
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  encrypted: "true"
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```

### Consul Helm Values for Custom Storage

```yaml
server:
  storageClass: consul-storage
  storage: 20Gi
```
@@ -0,0 +1,267 @@
# Consul Troubleshooting Guide

## Table of Contents

- [Cluster Formation Issues](#cluster-formation-issues)
- [Connect Sidecar Issues](#connect-sidecar-issues)
- [Performance Issues](#performance-issues)
- [TLS/Certificate Issues](#tlscertificate-issues)
- [ACL Issues](#acl-issues)
- [Upgrade Issues](#upgrade-issues)

---

## Cluster Formation Issues

### Servers Not Joining Cluster

**Symptoms:**

- `consul members` shows fewer than expected servers
- Servers stuck in `left` or `failed` state

**Diagnosis:**

```bash
# Check server logs
kubectl logs -n consul consul-server-0 | grep -i "join\|gossip\|serf"

# Check gossip key
kubectl get secret consul-gossip-encryption-key -n consul -o jsonpath='{.data.key}' | base64 -d
```

**Solutions:**

1. **Gossip key mismatch**: Ensure all servers use the same key
2. **Network policies**: Allow ports 8301 (LAN gossip), 8300 (RPC)
3. **DNS resolution**: Check headless service resolves correctly
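
To make the first symptom check repeatable, the `consul members` table can be parsed and filtered for servers that are not `alive`. This is a rough Python sketch: it assumes whitespace-separated columns with `Node`, `Status` and `Type` headers, and the sample text is illustrative rather than captured from a real cluster.

```python
def parse_members(output: str) -> list[dict[str, str]]:
    """Turn the table printed by `consul members` into one dict per node."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def unhealthy_servers(output: str) -> list[str]:
    """Names of server nodes whose Status is anything other than `alive`."""
    return [
        member["Node"]
        for member in parse_members(output)
        if member.get("Type") == "server" and member.get("Status") != "alive"
    ]


# Illustrative output only; addresses and versions are made up.
SAMPLE = """\
Node             Address         Status  Type    Build   Protocol  DC
consul-server-0  10.0.1.10:8301  alive   server  1.18.0  2         dc1
consul-server-1  10.0.2.11:8301  failed  server  1.18.0  2         dc1
consul-server-2  10.0.3.12:8301  alive   server  1.18.0  2         dc1
"""

print(unhealthy_servers(SAMPLE))
```

In practice the text would come from `kubectl exec -n consul consul-server-0 -- consul members`, for example through `subprocess.run(..., capture_output=True, text=True)`.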

### Raft Quorum Lost

**Symptoms:**

- `No cluster leader`
- Only 1 server responding

**Recovery:**

```bash
# Check raft peers (add -stale to query a server when there is no leader)
kubectl exec -n consul consul-server-0 -- consul operator raft list-peers

# Remove failed peer (requires a functioning leader)
kubectl exec -n consul consul-server-0 -- consul operator raft remove-peer -address=<failed-peer-ip>:8300

# If quorum cannot be regained (DANGEROUS - data loss possible), remove-peer will not
# work without a leader; recover manually with a raft/peers.json file as described in
# HashiCorp's outage recovery documentation.
```
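
The `list-peers` output can also be summarised programmatically before deciding on a recovery path. The Python sketch below is one way to do it under stated assumptions: columns are separated by two or more spaces, the `State` and `Voter` headers are present, and the sample rows (including the IDs) are invented for illustration.

```python
import re


def parse_raft_peers(output: str) -> list[dict[str, str]]:
    """Parse `consul operator raft list-peers` output into one dict per peer.

    Columns are split on runs of two or more spaces so that multi-word
    headers stay intact.
    """
    rows = [
        re.split(r"\s{2,}", line.strip())
        for line in output.splitlines()
        if line.strip()
    ]
    if not rows:
        return []
    header, peers = rows[0], rows[1:]
    return [dict(zip(header, peer)) for peer in peers]


def quorum_status(output: str, expected_servers: int) -> dict[str, object]:
    """Report leader presence and the voters listed in the Raft configuration."""
    peers = parse_raft_peers(output)
    voters = [peer for peer in peers if peer.get("Voter") == "true"]
    return {
        "has_leader": any(peer.get("State") == "leader" for peer in peers),
        "voters": len(voters),
        "quorum_needed": expected_servers // 2 + 1,
    }


# Illustrative output only; IDs and addresses are made up.
RAFT_SAMPLE = """\
Node             ID                                    Address         State     Voter  RaftProtocol
consul-server-0  11111111-1111-1111-1111-111111111111  10.0.1.10:8300  leader    true   3
consul-server-1  22222222-2222-2222-2222-222222222222  10.0.2.11:8300  follower  true   3
"""

print(quorum_status(RAFT_SAMPLE, expected_servers=3))
```

Note that the peer list reflects the Raft configuration, so a listed voter is not necessarily reachable; combine this with the `consul members` status before removing anything.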

---

## Connect Sidecar Issues

### Sidecar Not Injecting

**Symptoms:**

- Pods start without `consul-dataplane` container
- No Envoy sidecar

**Diagnosis:**

```bash
# Check webhook
kubectl get mutatingwebhookconfigurations | grep consul

# Check injector logs
kubectl logs -n consul -l app=consul,component=connect-injector

# Check pod annotations
kubectl get pod <pod-name> -o yaml | grep consul
```

**Solutions:**

1. **Missing annotation**: Add `consul.hashicorp.com/connect-inject: "true"`
2. **Namespace not labeled**: `kubectl label namespace myapp consul.hashicorp.com/connect-inject=true`
3. **Webhook certificate expired**: Restart injector deployment
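
The annotation and container checks above can be applied to a pod manifest in one pass. The following Python sketch works on the dictionary produced by loading `kubectl get pod <pod-name> -o json`; the helper names and the example pod are hypothetical.

```python
INJECT_ANNOTATION = "consul.hashicorp.com/connect-inject"


def injection_requested(pod: dict) -> bool:
    """True if the pod carries the connect-inject annotation set to "true"."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    return annotations.get(INJECT_ANNOTATION) == "true"


def has_dataplane_sidecar(pod: dict) -> bool:
    """True if a container named consul-dataplane is present in the pod spec."""
    containers = pod.get("spec", {}).get("containers") or []
    return any(c.get("name") == "consul-dataplane" for c in containers)


def diagnose(pod: dict) -> str:
    """Map the two checks onto the likely causes listed above."""
    if has_dataplane_sidecar(pod):
        return "sidecar present"
    if not injection_requested(pod):
        return "annotation missing: injection was never requested"
    return "annotation present but no sidecar: check webhook and injector logs"


# Hypothetical pod manifest, trimmed to the fields the checks read.
example_pod = {
    "metadata": {"annotations": {INJECT_ANNOTATION: "true"}},
    "spec": {"containers": [{"name": "app"}]},
}
print(diagnose(example_pod))
```

A pod that requests injection but has no sidecar points at the webhook or injector rather than at the workload manifest.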

### Sidecar CrashLoopBackOff

**Symptoms:**

- `consul-dataplane` container crashes
- Pod restarts repeatedly

**Diagnosis:**

```bash
# Check dataplane logs
kubectl logs <pod-name> -c consul-dataplane

# Check Envoy config
kubectl exec <pod-name> -c consul-dataplane -- wget -qO- localhost:19000/config_dump
```

**Solutions:**

1. **ACL token issues**: Verify service token exists
2. **Upstream not found**: Check service registration
3. **Resource limits**: Increase CPU/memory for dataplane

---

## Performance Issues

### High Latency Between Services

**Diagnosis:**

```bash
# Check Envoy stats
kubectl exec <pod-name> -c consul-dataplane -- wget -qO- localhost:19000/stats | grep upstream

# Check connect intentions
kubectl get serviceintentions -A
```

**Solutions:**

1. **Increase Envoy resources**

   ```yaml
   annotations:
     consul.hashicorp.com/sidecar-proxy-cpu-request: "100m"
     consul.hashicorp.com/sidecar-proxy-memory-request: "128Mi"
   ```

2. **Check network policies**: Ensure direct pod-to-pod traffic allowed
3. **Locality-aware routing**: Enable mesh gateway mode

### Server High CPU

**Diagnosis:**

```bash
# Check leader
kubectl exec -n consul consul-server-0 -- consul operator raft list-peers | grep leader

# Check catalog size
kubectl exec -n consul consul-server-0 -- consul catalog services | wc -l
```

**Solutions:**

1. Scale to 5 servers for read distribution
2. Increase server resources
3. Review service registration churn

---

## TLS/Certificate Issues

### Certificate Errors

**Symptoms:**

- `x509: certificate signed by unknown authority`
- Connect handshake failures

**Diagnosis:**

```bash
# Check CA config
kubectl exec -n consul consul-server-0 -- consul connect ca get-config

# Check certificate expiry (`consul tls` cannot display certificates; this assumes the
# chart's default server cert secret name and openssl on the workstation)
kubectl get secret consul-server-cert -n consul -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate
```

**Solutions:**

1. **CA mismatch**: Ensure all clients trust the CA
2. **Certificate rotation**: Trigger CA rotation
3. **Auto-encrypt issues**: Verify `enableAutoEncrypt: true`

### Gossip Encryption Failures

**Symptoms:**

- Servers can't communicate
- `memberlist: Encrypt message failed`

**Diagnosis:**

```bash
# Check keyring
kubectl exec -n consul consul-server-0 -- consul keyring -list
```

**Solutions:**

1. Ensure all nodes have the same gossip key
2. If rotating keys, follow proper key rotation procedure

---

## ACL Issues

### Permission Denied Errors

**Diagnosis:**

```bash
# Check token
kubectl exec -n consul consul-server-0 -- consul acl token list

# Check policy
kubectl exec -n consul consul-server-0 -- consul acl policy list
```

**Solutions:**

1. Create appropriate policy for the service
2. Attach policy to token
3. Verify token is being used correctly

### Bootstrap Token Lost

**Recovery:**

```bash
# Reset bootstrap (requires server access): write the reset index reported by a failed
# `consul acl bootstrap` to acl-bootstrap-reset in the leader's data dir (/consul/data assumed), then retry
kubectl exec -n consul <leader-pod> -- sh -c 'echo <reset-index> > /consul/data/acl-bootstrap-reset'
kubectl exec -n consul <leader-pod> -- consul acl bootstrap
```

---

## Upgrade Issues

### Pods Stuck in Pending

**Cause:** Volume affinity conflicts

**Solution:**

```bash
# Check PVC binding
kubectl get pvc -n consul

# Delete stuck pod (statefulset will recreate)
kubectl delete pod consul-server-X -n consul
```

### Version Incompatibility

**Prevention:**

- Always check [upgrade notes](https://developer.hashicorp.com/consul/docs/upgrading)
- Upgrade one minor version at a time
- Test in non-prod first

**Recovery:**

```bash
# Rollback Helm release
helm rollback consul <previous-revision> -n consul
```
@@ -0,0 +1,213 @@
# Consul Upgrade Guide

## Table of Contents

- [Pre-Upgrade Checklist](#pre-upgrade-checklist)
- [Upgrade Paths](#upgrade-paths)
- [Upgrade Procedures](#upgrade-procedures)
- [Post-Upgrade Validation](#post-upgrade-validation)
- [Rollback Procedures](#rollback-procedures)
- [Breaking Changes by Version](#breaking-changes-by-version)

---

## Pre-Upgrade Checklist

### Before Any Upgrade

- [ ] Review [release notes](https://developer.hashicorp.com/consul/docs/release-notes)
- [ ] Check for breaking changes
- [ ] Create snapshot backup
- [ ] Verify cluster is healthy
- [ ] Test upgrade in non-prod first
- [ ] Plan maintenance window

### Backup Commands

```bash
# Create snapshot
kubectl exec -n consul consul-server-0 -- consul snapshot save /tmp/backup.snap
kubectl cp consul/consul-server-0:/tmp/backup.snap ./consul-backup-$(date +%Y%m%d-%H%M).snap

# Verify backup
consul snapshot inspect ./consul-backup-*.snap
```

### Health Verification

```bash
# All servers healthy
kubectl exec -n consul consul-server-0 -- consul members

# Raft consensus
kubectl exec -n consul consul-server-0 -- consul operator raft list-peers

# No critical services down
kubectl exec -n consul consul-server-0 -- consul catalog services
```

---

## Upgrade Paths

### Consul Version Compatibility

| From   | To     | Notes                                           |
| ------ | ------ | ----------------------------------------------- |
| 1.15.x | 1.16.x | Direct upgrade supported                        |
| 1.16.x | 1.17.x | Direct upgrade supported                        |
| 1.17.x | 1.18.x | Direct upgrade supported                        |
| 1.15.x | 1.18.x | **Step upgrade required** (1.15→1.16→1.17→1.18) |
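
The step-upgrade rule in the table (one minor version at a time) can be expressed as a small helper that lists the intermediate releases to install. This Python sketch uses a hypothetical `upgrade_path` function and assumes versions are given as `major.minor` strings, with any patch component ignored.

```python
def upgrade_path(current: str, target: str) -> list[str]:
    """Minor versions to install, in order, to get from `current` to `target`.

    Versions are strings such as "1.15" or "1.15.x"; only the major and
    minor components are used.
    """
    cur_major, cur_minor = (int(part) for part in current.split(".")[:2])
    tgt_major, tgt_minor = (int(part) for part in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("only upgrades within one major version are handled")
    if tgt_minor < cur_minor:
        raise ValueError("target is older than the current version")
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(upgrade_path("1.15", "1.18"))  # ['1.16', '1.17', '1.18']
```

Each entry in the returned list corresponds to one pass through the upgrade procedure, with validation between steps.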

### consul-k8s Helm Chart Versions

| Helm Chart | Consul Version | Notes  |
| ---------- | -------------- | ------ |
| 1.0.x      | 1.14.x         | Legacy |
| 1.1.x      | 1.15.x         |        |
| 1.2.x      | 1.16.x         |        |
| 1.3.x      | 1.17.x         |        |
| 1.4.x      | 1.18.x         | Latest |

---

## Upgrade Procedures

### Standard Helm Upgrade

```bash
# 1. Check current version
helm list -n consul
kubectl exec -n consul consul-server-0 -- consul version

# 2. Update Helm repo
helm repo update hashicorp

# 3. Check available versions
helm search repo hashicorp/consul --versions

# 4. Dry-run upgrade
helm upgrade consul hashicorp/consul \
  --namespace consul \
  --values consul-values.yaml \
  --version <NEW_VERSION> \
  --dry-run

# 5. Perform upgrade
helm upgrade consul hashicorp/consul \
  --namespace consul \
  --values consul-values.yaml \
  --version <NEW_VERSION>

# 6. Watch rollout
kubectl rollout status statefulset/consul-server -n consul --timeout=10m
```

### Zero-Downtime Upgrade Strategy

```bash
# Upgrade servers one at a time
for i in 2 1 0; do
  echo "Upgrading consul-server-$i..."
  kubectl delete pod consul-server-$i -n consul
  kubectl wait --for=condition=Ready pod/consul-server-$i -n consul --timeout=300s
  sleep 30
done

# Verify leader election
kubectl exec -n consul consul-server-0 -- consul operator raft list-peers
```
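
The loop above hard-codes the ordinals of a 3-server cluster. For other replica counts, the same sequence can be generated rather than typed by hand. The Python sketch below only builds the command strings (it does not run them); `rolling_restart_commands` and its default names are illustrative assumptions.

```python
def rolling_restart_commands(
    replicas: int,
    namespace: str = "consul",
    statefulset: str = "consul-server",
) -> list[str]:
    """kubectl commands that restart server pods one at a time, highest ordinal first."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    commands = []
    for ordinal in range(replicas - 1, -1, -1):
        pod = f"{statefulset}-{ordinal}"
        # Delete the pod, then block until its replacement reports Ready.
        commands.append(f"kubectl delete pod {pod} -n {namespace}")
        commands.append(
            f"kubectl wait --for=condition=Ready pod/{pod} -n {namespace} --timeout=300s"
        )
    return commands


for command in rolling_restart_commands(5):
    print(command)
```

Printing the commands first makes it easy to review the order before executing them, and a pause between pods (the `sleep 30` in the shell loop) can be added by whatever runs them.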

### Multi-Datacenter Upgrade Order

1. **Upgrade secondary datacenters first**
2. Verify federation health
3. **Upgrade primary datacenter last**

---

## Post-Upgrade Validation

### Immediate Checks

```bash
# Server health
kubectl exec -n consul consul-server-0 -- consul members

# Version verification
kubectl exec -n consul consul-server-0 -- consul version

# Raft status
kubectl exec -n consul consul-server-0 -- consul operator raft list-peers

# Connect CA
kubectl exec -n consul consul-server-0 -- consul connect ca get-config
```

### Service Mesh Validation

```bash
# Check service registrations
kubectl exec -n consul consul-server-0 -- consul catalog services

# Verify sidecar injection still works
kubectl rollout restart deployment/test-app -n test

# Check intentions
kubectl get serviceintentions -A
```

---

## Rollback Procedures

### Helm Rollback

```bash
# List revisions
helm history consul -n consul

# Rollback to previous
helm rollback consul <REVISION> -n consul

# Watch rollout
kubectl rollout status statefulset/consul-server -n consul
```

### Snapshot Restore (Data Recovery)

```bash
# Copy backup to server
kubectl cp ./consul-backup.snap consul/consul-server-0:/tmp/restore.snap

# Restore snapshot
kubectl exec -n consul consul-server-0 -- consul snapshot restore /tmp/restore.snap

# Restart servers
kubectl rollout restart statefulset/consul-server -n consul
```

---

## Breaking Changes by Version

### 1.17.x → 1.18.x

- Consul Dataplane is now default (replaces client agents)
- Update sidecar annotations if customized

### 1.16.x → 1.17.x

- ACL token migration completed
- Legacy ACL tokens no longer supported

### 1.15.x → 1.16.x

- Catalog API v2 introduced (opt-in)
- Some deprecated flags removed

### General Guidance

- Always check the official [upgrading documentation](https://developer.hashicorp.com/consul/docs/upgrading)
- Test CRD changes in non-prod
- Monitor Envoy proxy compatibility