@techwavedev/agi-agent-kit 1.1.7 → 1.2.1
Potentially problematic release.
This version of @techwavedev/agi-agent-kit might be problematic.
- package/CHANGELOG.md +82 -1
- package/README.md +190 -12
- package/bin/init.js +30 -2
- package/package.json +6 -3
- package/templates/base/AGENTS.md +54 -23
- package/templates/base/README.md +325 -0
- package/templates/base/directives/memory_integration.md +95 -0
- package/templates/base/execution/memory_manager.py +309 -0
- package/templates/base/execution/session_boot.py +218 -0
- package/templates/base/execution/session_init.py +320 -0
- package/templates/base/skill-creator/SKILL_skillcreator.md +23 -36
- package/templates/base/skill-creator/scripts/init_skill.py +18 -135
- package/templates/skills/ec/README.md +31 -0
- package/templates/skills/ec/aws/SKILL.md +1020 -0
- package/templates/skills/ec/aws/defaults.yaml +13 -0
- package/templates/skills/ec/aws/references/common_patterns.md +80 -0
- package/templates/skills/ec/aws/references/mcp_servers.md +98 -0
- package/templates/skills/ec/aws-terraform/SKILL.md +349 -0
- package/templates/skills/ec/aws-terraform/references/best_practices.md +394 -0
- package/templates/skills/ec/aws-terraform/references/checkov_reference.md +337 -0
- package/templates/skills/ec/aws-terraform/scripts/configure_mcp.py +150 -0
- package/templates/skills/ec/confluent-kafka/SKILL.md +655 -0
- package/templates/skills/ec/confluent-kafka/references/ansible_playbooks.md +792 -0
- package/templates/skills/ec/confluent-kafka/references/ec_deployment.md +579 -0
- package/templates/skills/ec/confluent-kafka/references/kraft_migration.md +490 -0
- package/templates/skills/ec/confluent-kafka/references/troubleshooting.md +778 -0
- package/templates/skills/ec/confluent-kafka/references/upgrade_7x_to_8x.md +488 -0
- package/templates/skills/ec/confluent-kafka/scripts/kafka_health_check.py +435 -0
- package/templates/skills/ec/confluent-kafka/scripts/upgrade_preflight.py +568 -0
- package/templates/skills/ec/confluent-kafka/scripts/validate_config.py +455 -0
- package/templates/skills/ec/consul/SKILL.md +427 -0
- package/templates/skills/ec/consul/references/acl_setup.md +168 -0
- package/templates/skills/ec/consul/references/ha_config.md +196 -0
- package/templates/skills/ec/consul/references/troubleshooting.md +267 -0
- package/templates/skills/ec/consul/references/upgrades.md +213 -0
- package/templates/skills/ec/consul/scripts/consul_health_report.py +530 -0
- package/templates/skills/ec/consul/scripts/consul_status.py +264 -0
- package/templates/skills/ec/consul/scripts/generate_values.py +170 -0
- package/templates/skills/ec/documentation/SKILL.md +351 -0
- package/templates/skills/ec/documentation/references/best_practices.md +201 -0
- package/templates/skills/ec/documentation/scripts/analyze_code.py +307 -0
- package/templates/skills/ec/documentation/scripts/detect_changes.py +460 -0
- package/templates/skills/ec/documentation/scripts/generate_changelog.py +312 -0
- package/templates/skills/ec/documentation/scripts/sync_docs.py +272 -0
- package/templates/skills/ec/documentation/scripts/update_skill_docs.py +366 -0
- package/templates/skills/ec/gitlab/SKILL.md +529 -0
- package/templates/skills/ec/gitlab/references/agent_installation.md +416 -0
- package/templates/skills/ec/gitlab/references/api_reference.md +508 -0
- package/templates/skills/ec/gitlab/references/gitops_flux.md +465 -0
- package/templates/skills/ec/gitlab/references/troubleshooting.md +518 -0
- package/templates/skills/ec/gitlab/scripts/generate_agent_values.py +329 -0
- package/templates/skills/ec/gitlab/scripts/gitlab_agent_status.py +414 -0
- package/templates/skills/ec/jira/SKILL.md +484 -0
- package/templates/skills/ec/jira/references/jql_reference.md +148 -0
- package/templates/skills/ec/jira/scripts/add_comment.py +91 -0
- package/templates/skills/ec/jira/scripts/bulk_log_work.py +124 -0
- package/templates/skills/ec/jira/scripts/create_ticket.py +162 -0
- package/templates/skills/ec/jira/scripts/get_ticket.py +191 -0
- package/templates/skills/ec/jira/scripts/jira_client.py +383 -0
- package/templates/skills/ec/jira/scripts/log_work.py +154 -0
- package/templates/skills/ec/jira/scripts/search_tickets.py +104 -0
- package/templates/skills/ec/jira/scripts/update_comment.py +67 -0
- package/templates/skills/ec/jira/scripts/update_ticket.py +161 -0
- package/templates/skills/ec/karpenter/SKILL.md +301 -0
- package/templates/skills/ec/karpenter/references/ec2nodeclasses.md +421 -0
- package/templates/skills/ec/karpenter/references/migration.md +396 -0
- package/templates/skills/ec/karpenter/references/nodepools.md +400 -0
- package/templates/skills/ec/karpenter/references/troubleshooting.md +359 -0
- package/templates/skills/ec/karpenter/scripts/generate_ec2nodeclass.py +187 -0
- package/templates/skills/ec/karpenter/scripts/generate_nodepool.py +245 -0
- package/templates/skills/ec/karpenter/scripts/karpenter_status.py +359 -0
- package/templates/skills/ec/opensearch/SKILL.md +720 -0
- package/templates/skills/ec/opensearch/references/ml_neural_search.md +576 -0
- package/templates/skills/ec/opensearch/references/operator.md +532 -0
- package/templates/skills/ec/opensearch/references/query_dsl.md +532 -0
- package/templates/skills/ec/opensearch/scripts/configure_mcp.py +148 -0
- package/templates/skills/ec/victoriametrics/SKILL.md +598 -0
- package/templates/skills/ec/victoriametrics/references/kubernetes.md +531 -0
- package/templates/skills/ec/victoriametrics/references/prometheus_migration.md +333 -0
- package/templates/skills/ec/victoriametrics/references/troubleshooting.md +442 -0
- package/templates/skills/knowledge/SKILLS_CATALOG.md +274 -4
- package/templates/skills/knowledge/intelligent-routing/SKILL.md +237 -164
- package/templates/skills/knowledge/parallel-agents/SKILL.md +345 -73
- package/templates/skills/knowledge/plugin-discovery/SKILL.md +582 -0
- package/templates/skills/knowledge/plugin-discovery/scripts/platform_setup.py +1083 -0
- package/templates/skills/knowledge/design-md/README.md +0 -34
- package/templates/skills/knowledge/design-md/SKILL.md +0 -193
- package/templates/skills/knowledge/design-md/examples/DESIGN.md +0 -154
- package/templates/skills/knowledge/notebooklm-mcp/SKILL.md +0 -71
- package/templates/skills/knowledge/notebooklm-mcp/assets/example_asset.txt +0 -24
- package/templates/skills/knowledge/notebooklm-mcp/references/api_reference.md +0 -34
- package/templates/skills/knowledge/notebooklm-mcp/scripts/example.py +0 -19
- package/templates/skills/knowledge/react-components/README.md +0 -36
- package/templates/skills/knowledge/react-components/SKILL.md +0 -53
- package/templates/skills/knowledge/react-components/examples/gold-standard-card.tsx +0 -80
- package/templates/skills/knowledge/react-components/package-lock.json +0 -231
- package/templates/skills/knowledge/react-components/package.json +0 -16
- package/templates/skills/knowledge/react-components/resources/architecture-checklist.md +0 -15
- package/templates/skills/knowledge/react-components/resources/component-template.tsx +0 -37
- package/templates/skills/knowledge/react-components/resources/stitch-api-reference.md +0 -14
- package/templates/skills/knowledge/react-components/resources/style-guide.json +0 -27
- package/templates/skills/knowledge/react-components/scripts/fetch-stitch.sh +0 -30
- package/templates/skills/knowledge/react-components/scripts/validate.js +0 -68
- package/templates/skills/knowledge/self-update/SKILL.md +0 -60
- package/templates/skills/knowledge/self-update/scripts/update_kit.py +0 -103
- package/templates/skills/knowledge/stitch-loop/README.md +0 -54
- package/templates/skills/knowledge/stitch-loop/SKILL.md +0 -235
- package/templates/skills/knowledge/stitch-loop/examples/SITE.md +0 -73
- package/templates/skills/knowledge/stitch-loop/examples/next-prompt.md +0 -25
- package/templates/skills/knowledge/stitch-loop/resources/baton-schema.md +0 -61
- package/templates/skills/knowledge/stitch-loop/resources/site-template.md +0 -104
|
@@ -0,0 +1,655 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: confluent-kafka
|
|
3
|
+
description: Confluent Kafka specialist for tarball/Ansible custom installations. Expert in updating, maintaining, checking health, troubleshooting, documenting, analyzing metrics, and upgrading Confluent Kafka deployments from 7.x to 8.x versions. Covers KRaft mode (ZooKeeper-less), broker configuration, Schema Registry, Connect, ksqlDB, Control Center, and production-grade operations. Use when working with Confluent Platform installations, migrations to KRaft, performance tuning, health monitoring, and infrastructure-as-code with Ansible.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Confluent Kafka Skill
|
|
7
|
+
|
|
8
|
+
Comprehensive skill for managing Confluent Platform Kafka clusters deployed via tarball distributions and automated with Ansible. **Primary deployment context is EC (European Commission) controlled environments using KRaft-only, SSL-only, non-root systemd user services.**
|
|
9
|
+
|
|
10
|
+
> **Last Updated:** 2026-01-20 from [Confluent Documentation](https://docs.confluent.io/)
|
|
11
|
+
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
## EC Environment Quick Reference
|
|
15
|
+
|
|
16
|
+
> **Note:** Values below use variable placeholders. Define actual values in inventory files outside git.
|
|
17
|
+
|
|
18
|
+
| Item | Variable / Default |
|
|
19
|
+
| --------------------- | ---------------------------------------------- |
|
|
20
|
+
| **Confluent Version** | `{{ confluent_version }}` (e.g., 7.9.3) |
|
|
21
|
+
| **Ansible Base** | `{{ ansible_base }}` |
|
|
22
|
+
| **Confluent Install** | `{{ base_path }}/opt/confluent-{{ confluent_version }}/` |
|
|
23
|
+
| **JAVA_HOME** | `{{ base_path }}/opt/{{ java_version }}` |
|
|
24
|
+
| **SSL Directory** | `{{ base_path }}/opt/ssl/` |
|
|
25
|
+
| **Data: Controller** | `{{ base_path }}/opt/data/controller` |
|
|
26
|
+
| **Data: Broker** | `{{ base_path }}/opt/data` |
|
|
27
|
+
| **Logs** | `{{ base_path }}/logs/` |
|
|
28
|
+
| **Systemd (User)** | `~/.config/systemd/user/` |
|
|
29
|
+
| **User/Group** | `{{ kafka_user }}:{{ kafka_group }}` |
|
|
30
|
+
| **Controller Port** | `{{ controller_port }}` (default: 9093) |
|
|
31
|
+
| **Broker Port** | `{{ broker_port }}` (default: 9443) |
|
|
32
|
+
|
|
33
|
+
> **Full EC deployment reference:** [references/ec_deployment.md](references/ec_deployment.md)
|
|
34
|
+
|
|
35
|
+
---
|
|
36
|
+
|
|
37
|
+
## Quick Start (EC Environment)
|
|
38
|
+
|
|
39
|
+
```bash
|
|
40
|
+
# Set environment variables (or source from environment file)
|
|
41
|
+
export KAFKA_HOME={{ base_path }}/opt/confluent-{{ confluent_version }}
|
|
42
|
+
export BOOTSTRAP={{ broker_host_1 }}:{{ broker_port }},{{ broker_host_2 }}:{{ broker_port }}
|
|
43
|
+
export CLIENT_PROPS={{ base_path }}/etc/kafka/client.properties
|
|
44
|
+
|
|
45
|
+
# SSH to a broker node
|
|
46
|
+
ssh {{ kafka_user }}@{{ broker_host_1 }}
|
|
47
|
+
|
|
48
|
+
# Verify broker is running (user systemd scope)
|
|
49
|
+
systemctl --user status confluent-server
|
|
50
|
+
|
|
51
|
+
# Check cluster health
|
|
52
|
+
$KAFKA_HOME/bin/kafka-broker-api-versions \
|
|
53
|
+
--bootstrap-server $BOOTSTRAP \
|
|
54
|
+
--command-config $CLIENT_PROPS
|
|
55
|
+
|
|
56
|
+
# Check controller quorum (KRaft)
|
|
57
|
+
$KAFKA_HOME/bin/kafka-metadata-quorum \
|
|
58
|
+
--bootstrap-server $BOOTSTRAP --command-config $CLIENT_PROPS \
|
|
59
|
+
describe --status
|
|
60
|
+
|
|
61
|
+
# Use management script
|
|
62
|
+
{{ base_path }}/scripts/management/kafka_node.sh status
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
---
|
|
66
|
+
|
|
67
|
+
## Core Concepts
|
|
68
|
+
|
|
69
|
+
### Confluent Platform Components
|
|
70
|
+
|
|
71
|
+
| Component | Description | Default Port |
|
|
72
|
+
| -------------------- | ---------------------------------------- | ------------ |
|
|
73
|
+
| **Kafka Broker** | Core message broker (confluent-server) | 9092 |
|
|
74
|
+
| **KRaft Controller** | Metadata management (replaces ZooKeeper) | 9093 |
|
|
75
|
+
| **Schema Registry** | Avro/JSON/Protobuf schema management | 8081 |
|
|
76
|
+
| **Kafka Connect** | Data integration connectors | 8083 |
|
|
77
|
+
| **ksqlDB** | Stream processing with SQL | 8088 |
|
|
78
|
+
| **Control Center** | Web-based management UI | 9021 |
|
|
79
|
+
| **REST Proxy** | HTTP interface to Kafka | 8082 |
|
|
80
|
+
|
|
81
|
+
### KRaft Architecture (8.x Native)
|
|
82
|
+
|
|
83
|
+
```
|
|
84
|
+
┌─────────────────────────────────────────────────────────────────┐
|
|
85
|
+
│ Confluent Kafka KRaft Cluster │
|
|
86
|
+
│ │
|
|
87
|
+
│ ┌─────────────────────────────────────────────────────────┐ │
|
|
88
|
+
│ │ Controller Quorum (Raft Consensus) │ │
|
|
89
|
+
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
|
|
90
|
+
│ │ │Controller-01 │ │Controller-02 │ │Controller-03 │ │ │
|
|
91
|
+
│ │ │ (voter) │ │ (voter) │ │ (voter) │ │ │
|
|
92
|
+
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
|
|
93
|
+
│ └─────────────────────────────────────────────────────────┘ │
|
|
94
|
+
│ │ │
|
|
95
|
+
│ Metadata updates via Raft │
|
|
96
|
+
│ ▼ │
|
|
97
|
+
│ ┌─────────────────────────────────────────────────────────┐ │
|
|
98
|
+
│ │ Broker Nodes │ │
|
|
99
|
+
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
|
|
100
|
+
│ │ │Broker-01 │ │Broker-02 │ │Broker-03 │ │Broker-N │ │ │
|
|
101
|
+
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
|
|
102
|
+
│ └─────────────────────────────────────────────────────────┘ │
|
|
103
|
+
│ │
|
|
104
|
+
│ ┌─────────────────────────────────────────────────────────┐ │
|
|
105
|
+
│ │ Ecosystem Services │ │
|
|
106
|
+
│ │ Schema Registry │ Connect │ ksqlDB │ Control Center │ │
|
|
107
|
+
│ └─────────────────────────────────────────────────────────┘ │
|
|
108
|
+
└─────────────────────────────────────────────────────────────────┘
|
|
109
|
+
```
|
|
110
|
+
|
|
111
|
+
### EC Installation Layout
|
|
112
|
+
|
|
113
|
+
```
|
|
114
|
+
{{ base_path }}/ # Main installation root
|
|
115
|
+
├── opt/
|
|
116
|
+
│ ├── confluent-{{ confluent_version }}/ # Confluent Platform
|
|
117
|
+
│ │ ├── bin/ # CLI tools
|
|
118
|
+
│ │ ├── etc/ # Default configs (unused)
|
|
119
|
+
│ │ └── share/ # Libraries
|
|
120
|
+
│ │
|
|
121
|
+
│ ├── {{ java_version }}/ # Java installation
|
|
122
|
+
│ │
|
|
123
|
+
│ ├── ssl/ # SSL certificates
|
|
124
|
+
│ │ ├── {{ keystore_filename }}
|
|
125
|
+
│ │ ├── {{ truststore_filename }}
|
|
126
|
+
│ │ └── security.properties # Encrypted passwords
|
|
127
|
+
│ │
|
|
128
|
+
│ ├── data/ # Kafka data
|
|
129
|
+
│ │ ├── controller/ # Controller logs
|
|
130
|
+
│ │ └── (broker data at root)
|
|
131
|
+
│ │
|
|
132
|
+
│ ├── logs/ # Application logs
|
|
133
|
+
│ └── monitoring/ # JMX exporter
|
|
134
|
+
│
|
|
135
|
+
├── etc/
|
|
136
|
+
│ ├── kafka/server.properties # Broker config
|
|
137
|
+
│ └── controller/server.properties # Controller config
|
|
138
|
+
│
|
|
139
|
+
├── logs/ # Runtime + GC logs
|
|
140
|
+
├── tmp/ # Java temp directory
|
|
141
|
+
└── scripts/management/ # Management scripts
|
|
142
|
+
├── kafka_node.sh # Start/stop wrapper
|
|
143
|
+
└── kafka_tools.sh # Aux tools
|
|
144
|
+
|
|
145
|
+
~/.config/systemd/user/ # User-scope systemd
|
|
146
|
+
├── confluent-kcontroller.service # Controller service
|
|
147
|
+
└── confluent-server.service # Broker service
|
|
148
|
+
```
|
|
149
|
+
|
|
150
|
+
> **Standard paths** (non-EC): `/opt/confluent/`, `/var/kafka-logs/`, `/etc/systemd/system/`
|
|
151
|
+
|
|
152
|
+
---
|
|
153
|
+
|
|
154
|
+
## Common Workflows
|
|
155
|
+
|
|
156
|
+
### 1. Check Cluster Health (EC)
|
|
157
|
+
|
|
158
|
+
```bash
|
|
159
|
+
# Set environment (source from your environment file)
|
|
160
|
+
export KAFKA_HOME={{ base_path }}/opt/confluent-{{ confluent_version }}
|
|
161
|
+
export BOOTSTRAP={{ broker_host_1 }}:{{ broker_port }},{{ broker_host_2 }}:{{ broker_port }},{{ broker_host_3 }}:{{ broker_port }}
|
|
162
|
+
export CLIENT_PROPS={{ base_path }}/etc/kafka/client.properties
|
|
163
|
+
|
|
164
|
+
# Broker status (user systemd)
|
|
165
|
+
systemctl --user status confluent-server
|
|
166
|
+
|
|
167
|
+
# Controller quorum status (KRaft)
|
|
168
|
+
$KAFKA_HOME/bin/kafka-metadata-quorum \
|
|
169
|
+
--bootstrap-server $BOOTSTRAP --command-config $CLIENT_PROPS \
|
|
170
|
+
describe --status
|
|
171
|
+
|
|
172
|
+
# Under-replicated partitions
|
|
173
|
+
$KAFKA_HOME/bin/kafka-topics --bootstrap-server $BOOTSTRAP \
|
|
174
|
+
--command-config $CLIENT_PROPS \
|
|
175
|
+
--describe --under-replicated-partitions
|
|
176
|
+
|
|
177
|
+
# Offline partitions (CRITICAL)
|
|
178
|
+
$KAFKA_HOME/bin/kafka-topics --bootstrap-server $BOOTSTRAP \
|
|
179
|
+
--command-config $CLIENT_PROPS \
|
|
180
|
+
--describe --unavailable-partitions
|
|
181
|
+
|
|
182
|
+
# Broker disk usage
|
|
183
|
+
df -h {{ base_path }}/opt/data
|
|
184
|
+
|
|
185
|
+
# JVM heap usage
|
|
186
|
+
jstat -gc $(pgrep -f kafka.Kafka) 1000 5
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
### 2. Monitor Consumer Lag
|
|
190
|
+
|
|
191
|
+
```bash
|
|
192
|
+
# All consumer groups
|
|
193
|
+
/opt/confluent/bin/kafka-consumer-groups --bootstrap-server localhost:9092 \
|
|
194
|
+
--all-groups --describe
|
|
195
|
+
|
|
196
|
+
# Specific group
|
|
197
|
+
/opt/confluent/bin/kafka-consumer-groups --bootstrap-server localhost:9092 \
|
|
198
|
+
--group my-consumer-group --describe
|
|
199
|
+
|
|
200
|
+
# Export lag metrics (for monitoring integration)
|
|
201
|
+
/opt/confluent/bin/kafka-consumer-groups --bootstrap-server localhost:9092 \
|
|
202
|
+
--all-groups --describe | awk 'NR>1 {sum+=$6} END {print "Total Lag: " sum}'
|
|
203
|
+
```
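
The `awk` one-liner above adds up every row into a single total. As a minimal sketch, the same text output can be broken down per consumer group in Python. It assumes the usual `--describe` column layout (`GROUP ... LAG ...`) and that a header row precedes each group; `total_lag_by_group` and the sample text are illustrative names and data, not part of the skill's scripts.

```python
from collections import defaultdict


def total_lag_by_group(describe_output: str) -> dict[str, int]:
    """Sum the LAG column per consumer group from `--describe` text output."""
    totals: dict[str, int] = defaultdict(int)
    lag_col = None
    for line in describe_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "GROUP":
            # Header row: locate the LAG column instead of hard-coding $6.
            lag_col = fields.index("LAG")
            continue
        if lag_col is None or len(fields) <= lag_col:
            continue
        lag = fields[lag_col]
        if lag.isdigit():  # rows without a committed offset show "-"
            totals[fields[0]] += int(lag)
    return dict(totals)


# Hypothetical sample output, for illustration only.
SAMPLE = """\
GROUP   TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST       CLIENT-ID
orders  events  0          100             150             50   c-1          /10.0.0.1  app
orders  events  1          200             200             0    c-1          /10.0.0.1  app

GROUP   TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST       CLIENT-ID
audit   logs    0          -               40              -    -            -          -
audit   logs    1          10              25              15   -            -          -
"""

if __name__ == "__main__":
    for group, lag in total_lag_by_group(SAMPLE).items():
        print(f"{group}: {lag}")
```

Locating the `LAG` column from the header keeps the sketch working if the column order differs between versions, which the fixed `$6` in the `awk` variant does not.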
|
|
204
|
+
|
|
205
|
+
### 3. Schema Registry Operations
|
|
206
|
+
|
|
207
|
+
```bash
|
|
208
|
+
# Check Schema Registry health
|
|
209
|
+
curl -s http://localhost:8081/ | jq
|
|
210
|
+
|
|
211
|
+
# List all subjects
|
|
212
|
+
curl -s http://localhost:8081/subjects | jq
|
|
213
|
+
|
|
214
|
+
# Get schema versions for a subject
|
|
215
|
+
curl -s http://localhost:8081/subjects/my-topic-value/versions | jq
|
|
216
|
+
|
|
217
|
+
# Get specific schema
|
|
218
|
+
curl -s http://localhost:8081/subjects/my-topic-value/versions/latest | jq
|
|
219
|
+
|
|
220
|
+
# Check compatibility
|
|
221
|
+
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
|
|
222
|
+
--data '{"schema": "{\"type\":\"record\",\"name\":\"Test\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]}"}' \
|
|
223
|
+
http://localhost:8081/compatibility/subjects/my-topic-value/versions/latest | jq
|
|
224
|
+
```
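
The compatibility check above needs the schema embedded as an escaped JSON string inside a JSON body, which is easy to get wrong by hand. A minimal sketch of building that request in Python follows; `build_compat_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_compat_request(base_url: str, subject: str, schema: dict) -> urllib.request.Request:
    """Build the POST request for a compatibility check against the latest version."""
    # The registry expects {"schema": "<schema as a JSON-encoded string>"},
    # so the schema is serialized twice: once on its own, once inside the body.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    url = f"{base_url}/compatibility/subjects/{subject}/versions/latest"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


if __name__ == "__main__":
    schema = {"type": "record", "name": "Test",
              "fields": [{"name": "id", "type": "int"}]}
    req = build_compat_request("http://localhost:8081", "my-topic-value", schema)
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the result to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a registry.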
|
|
225
|
+
|
|
226
|
+
### 4. Connect Cluster Operations
|
|
227
|
+
|
|
228
|
+
```bash
|
|
229
|
+
# Connect worker status
|
|
230
|
+
curl -s http://localhost:8083/ | jq
|
|
231
|
+
|
|
232
|
+
# List connectors
|
|
233
|
+
curl -s http://localhost:8083/connectors | jq
|
|
234
|
+
|
|
235
|
+
# Connector status
|
|
236
|
+
curl -s http://localhost:8083/connectors/my-connector/status | jq
|
|
237
|
+
|
|
238
|
+
# Restart failed tasks
|
|
239
|
+
curl -X POST http://localhost:8083/connectors/my-connector/tasks/0/restart
|
|
240
|
+
|
|
241
|
+
# View connector config
|
|
242
|
+
curl -s http://localhost:8083/connectors/my-connector/config | jq
|
|
243
|
+
```
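
The restart call above targets task `0` directly. As a sketch, the failed task IDs can instead be read from the connector status response. It assumes the status JSON has a `name` and a `tasks` list with `id` and `state` fields; `failed_task_restart_urls` and the sample payload are illustrative.

```python
def failed_task_restart_urls(base_url: str, status: dict) -> list[str]:
    """Return one restart URL per task reported as FAILED in a status response."""
    name = status["name"]
    return [
        f"{base_url}/connectors/{name}/tasks/{task['id']}/restart"
        for task in status.get("tasks", [])
        if task.get("state") == "FAILED"
    ]


# Hypothetical status payload, shaped like GET /connectors/<name>/status.
SAMPLE_STATUS = {
    "name": "my-connector",
    "connector": {"state": "RUNNING", "worker_id": "10.0.0.1:8083"},
    "tasks": [
        {"id": 0, "state": "RUNNING", "worker_id": "10.0.0.1:8083"},
        {"id": 1, "state": "FAILED", "worker_id": "10.0.0.2:8083", "trace": "..."},
    ],
}

if __name__ == "__main__":
    for url in failed_task_restart_urls("http://localhost:8083", SAMPLE_STATUS):
        print("POST", url)
```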
|
|
244
|
+
|
|
245
|
+
---
|
|
246
|
+
|
|
247
|
+
## Upgrade Guide: 7.x to 8.x
|
|
248
|
+
|
|
249
|
+
### Key Changes in 8.x
|
|
250
|
+
|
|
251
|
+
| Change | Impact |
|
|
252
|
+
| --------------------- | ---------------------------------------- |
|
|
253
|
+
| **KRaft only**        | ZooKeeper removed; KRaft is required     |
|
|
254
|
+
| **Java 17+** | Minimum Java version requirement |
|
|
255
|
+
| **Deprecated APIs** | kafka-preferred-replica-election removed |
|
|
256
|
+
| **New Metrics** | Enhanced KRaft controller metrics |
|
|
257
|
+
| **Security Defaults** | Stricter TLS requirements |
|
|
258
|
+
|
|
259
|
+
### Pre-Upgrade Checklist
|
|
260
|
+
|
|
261
|
+
```bash
|
|
262
|
+
# 1. Check current version
|
|
263
|
+
/opt/confluent/bin/kafka-broker-api-versions --version
|
|
264
|
+
|
|
265
|
+
# 2. Verify Java version (must be 17+)
|
|
266
|
+
java -version
|
|
267
|
+
|
|
268
|
+
# 3. Backup configurations
|
|
269
|
+
tar -czvf /backup/confluent-config-$(date +%Y%m%d).tar.gz /opt/confluent/etc/
|
|
270
|
+
|
|
271
|
+
# 4. Export topic configurations
|
|
272
|
+
/opt/confluent/bin/kafka-configs --bootstrap-server localhost:9092 \
|
|
273
|
+
--entity-type topics --all --describe > /backup/topic-configs.txt
|
|
274
|
+
|
|
275
|
+
# 5. Check for deprecated configurations
|
|
276
|
+
grep -r "log.message.format.version\|inter.broker.protocol.version" /opt/confluent/etc/kafka/
|
|
277
|
+
|
|
278
|
+
# 6. Verify cluster health
|
|
279
|
+
/opt/confluent/bin/kafka-topics --bootstrap-server localhost:9092 \
|
|
280
|
+
--describe --under-replicated-partitions
|
|
281
|
+
```
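
Step 2 of the checklist relies on reading the `java -version` output by eye. A small sketch of checking it programmatically follows; it assumes the output contains a quoted version string such as `"17.0.9"` or the legacy `"1.8.0_392"` form, and `java_major_version` is a hypothetical helper rather than part of `upgrade_preflight.py`.

```python
import re


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    major = int(match.group(1))
    if major == 1 and match.group(2):
        # Legacy numbering: "1.8.0_392" means Java 8.
        return int(match.group(2))
    return major


def meets_minimum(version_output: str, minimum: int = 17) -> bool:
    """True if the reported Java version is at least `minimum`."""
    return java_major_version(version_output) >= minimum


if __name__ == "__main__":
    print(meets_minimum('openjdk version "17.0.9" 2023-10-17'))
    print(meets_minimum('java version "1.8.0_392"'))
```

Note that `java -version` writes to stderr, so a wrapper that captures it would need to read that stream.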
|
|
282
|
+
|
|
283
|
+
### Reference Files
|
|
284
|
+
|
|
285
|
+
- **[references/ec_deployment.md](references/ec_deployment.md)** — **EC deployment paths, Vault, and setup**
|
|
286
|
+
- **[references/upgrade_7x_to_8x.md](references/upgrade_7x_to_8x.md)** — Complete 7.x to 8.x migration guide
|
|
287
|
+
- **[references/kraft_migration.md](references/kraft_migration.md)** — ZooKeeper to KRaft migration steps
|
|
288
|
+
- **[references/ansible_playbooks.md](references/ansible_playbooks.md)** — Ansible automation patterns
|
|
289
|
+
|
|
290
|
+
---
|
|
291
|
+
|
|
292
|
+
## Troubleshooting Guide
|
|
293
|
+
|
|
294
|
+
### Common Issues
|
|
295
|
+
|
|
296
|
+
| Issue | Diagnosis | Solution |
|
|
297
|
+
| -------------------------- | ------------------------------------------ | --------------------------------------------- |
|
|
298
|
+
| **Broker won't start** | Check logs `/var/log/confluent/kafka/` | Fix config, check disk space, verify ports |
|
|
299
|
+
| **Controller quorum lost** | `kafka-metadata-quorum describe --status` shows <3 voters | Restore failed controllers, check network     |
|
|
300
|
+
| **High consumer lag** | Consumer processing slower than production | Scale consumers, optimize processing |
|
|
301
|
+
| **ISR shrinking**          | Followers falling behind leader            | Check network, disk I/O; raise `replica.lag.time.max.ms` |
|
|
302
|
+
| **Schema Registry 409** | Schema incompatibility | Check compatibility mode, use FULL_TRANSITIVE |
|
|
303
|
+
| **Connect task failed** | Connector config or target system issue | Check task status, review error in config |
|
|
304
|
+
| **OOM on broker** | Heap exhaustion | Tune JVM heap, check for memory leaks |
|
|
305
|
+
|
|
306
|
+
### Debug Commands (EC)
|
|
307
|
+
|
|
308
|
+
```bash
|
|
309
|
+
# Set paths (source from your environment file)
|
|
310
|
+
export KAFKA_HOME={{ base_path }}/opt/confluent-{{ confluent_version }}
|
|
311
|
+
export LOG_DIR={{ base_path }}/logs
|
|
312
|
+
export DATA_DIR={{ base_path }}/opt/data
|
|
313
|
+
|
|
314
|
+
# Kafka server logs (last 100 lines)
|
|
315
|
+
tail -100 $LOG_DIR/server.log
|
|
316
|
+
|
|
317
|
+
# Controller logs (KRaft)
|
|
318
|
+
tail -100 $LOG_DIR/controller.log
|
|
319
|
+
|
|
320
|
+
# Systemd journal logs
|
|
321
|
+
journalctl --user -u confluent-server -f
|
|
322
|
+
journalctl --user -u confluent-kcontroller -f
|
|
323
|
+
|
|
324
|
+
# Check open file descriptors
|
|
325
|
+
lsof -p $(pgrep -f kafka.Kafka) | wc -l
|
|
326
|
+
|
|
327
|
+
# Network connections to broker
|
|
328
|
+
netstat -an | grep {{ broker_port }} | wc -l
|
|
329
|
+
|
|
330
|
+
# Thread dump for debugging hung brokers
|
|
331
|
+
jstack $(pgrep -f kafka.Kafka) > /tmp/kafka-thread-dump.txt
|
|
332
|
+
|
|
333
|
+
# GC logs analysis
|
|
334
|
+
grep "GC pause" $LOG_DIR/gc.log | tail -20
|
|
335
|
+
|
|
336
|
+
# KRaft metadata diagnostics
|
|
337
|
+
$KAFKA_HOME/bin/kafka-metadata \
|
|
338
|
+
--snapshot $DATA_DIR/controller/__cluster_metadata-0/00000000000000000000.log \
|
|
339
|
+
--command topic --topics __consumer_offsets
|
|
340
|
+
```
|
|
341
|
+
|
|
342
|
+
### Detailed Troubleshooting
|
|
343
|
+
|
|
344
|
+
For in-depth troubleshooting scenarios, see **[references/troubleshooting.md](references/troubleshooting.md)**.
|
|
345
|
+
|
|
346
|
+
---
|
|
347
|
+
|
|
348
|
+
## Ansible Automation (EC Environment)
|
|
349
|
+
|
|
350
|
+
### EC Inventory Structure
|
|
351
|
+
|
|
352
|
+
```yaml
|
|
353
|
+
# {{ ansible_base }}/inventories/{{ env_name }}/hosts.yml
|
|
354
|
+
# Replace {{ variable }} placeholders with actual values in your inventory (not in git)
|
|
355
|
+
all:
|
|
356
|
+
children:
|
|
357
|
+
kafka_controller:
|
|
358
|
+
hosts:
|
|
359
|
+
{{ controller_host_1 }}:
|
|
360
|
+
node_id: {{ controller_id_1 }}
|
|
361
|
+
{{ controller_host_2 }}:
|
|
362
|
+
node_id: {{ controller_id_2 }}
|
|
363
|
+
{{ controller_host_3 }}:
|
|
364
|
+
node_id: {{ controller_id_3 }}
|
|
365
|
+
|
|
366
|
+
kafka_broker:
|
|
367
|
+
hosts:
|
|
368
|
+
{{ broker_host_1 }}:
|
|
369
|
+
node_id: {{ broker_id_1 }}
|
|
370
|
+
{{ broker_host_2 }}:
|
|
371
|
+
node_id: {{ broker_id_2 }}
|
|
372
|
+
{{ broker_host_3 }}:
|
|
373
|
+
node_id: {{ broker_id_3 }}
|
|
374
|
+
```
|
|
375
|
+
|
|
376
|
+
### EC Deployment Commands
|
|
377
|
+
|
|
378
|
+
```bash
|
|
379
|
+
# Export Vault token (obtain via PrivX or your auth method)
|
|
380
|
+
export VAULT_TOKEN="${VAULT_TOKEN}"
|
|
381
|
+
cd {{ ansible_base }}
|
|
382
|
+
|
|
383
|
+
# Vault bootstrap (one-time per environment)
|
|
384
|
+
ansible-playbook playbooks/tasks/vault-bootstrap.yml \
|
|
385
|
+
-e vault_env={{ env_name }} \
|
|
386
|
+
-e "@resources/secrets.yml"
|
|
387
|
+
|
|
388
|
+
# Deploy controllers
|
|
389
|
+
ansible-playbook -i inventories/{{ env_name }}/hosts.yml \
|
|
390
|
+
playbooks/10-kafka-controllers.yml \
|
|
391
|
+
--limit {{ controller_host_1 }} \
|
|
392
|
+
-vv \
|
|
393
|
+
--skip-tags ec,package,sysctl,health_check \
|
|
394
|
+
-e "@resources/override.yml"
|
|
395
|
+
|
|
396
|
+
# Deploy brokers
|
|
397
|
+
ansible-playbook -i inventories/{{ env_name }}/hosts.yml \
|
|
398
|
+
playbooks/20-kafka-brokers.yml \
|
|
399
|
+
--limit {{ broker_host_1 }} \
|
|
400
|
+
-vv \
|
|
401
|
+
--skip-tags ec,package,sysctl,health_check \
|
|
402
|
+
-e "@resources/override.yml"
|
|
403
|
+
```
|
|
404
|
+
|
|
405
|
+
### EC Systemd User Service Pattern
|
|
406
|
+
|
|
407
|
+
```yaml
|
|
408
|
+
# User-scope systemd (no root)
|
|
409
|
+
- name: Kafka Started
|
|
410
|
+
ansible.builtin.systemd:
|
|
411
|
+
name: "{{ kafka_broker_service_name }}"
|
|
412
|
+
enabled: true
|
|
413
|
+
scope: user # EC constraint: user-mode systemd
|
|
414
|
+
state: started
|
|
415
|
+
tags: systemd
|
|
416
|
+
```
|
|
417
|
+
|
|
418
|
+
### Skip Tags Reference
|
|
419
|
+
|
|
420
|
+
| Tag | Purpose | When to Skip |
|
|
421
|
+
| -------------- | -------------------- | --------------- |
|
|
422
|
+
| `ec` | EC-specific mods | Already applied |
|
|
423
|
+
| `package` | Package installation | Re-runs |
|
|
424
|
+
| `sysctl` | Sysctl tuning | No root |
|
|
425
|
+
| `health_check` | Post-checks | Manual |
|
|
426
|
+
| `privileged` | Root-required | Non-root env |
|
|
427
|
+
|
|
428
|
+
### Reference Files
|
|
429
|
+
|
|
430
|
+
- **[references/ec_deployment.md](references/ec_deployment.md)** — **Complete EC paths, Vault, Ansible setup**
|
|
431
|
+
- **[references/ansible_playbooks.md](references/ansible_playbooks.md)** — Generic Ansible automation patterns
|
|
432
|
+
|
|
433
|
+
---
|
|
434
|
+
|
|
435
|
+
## Metrics and Monitoring
|
|
436
|
+
|
|
437
|
+
### Key JMX Metrics
|
|
438
|
+
|
|
439
|
+
```bash
|
|
440
|
+
# Using kafka JMX tool
|
|
441
|
+
export JMX_PORT=9999
|
|
442
|
+
|
|
443
|
+
# Under-replicated partitions (should be 0)
|
|
444
|
+
kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
|
|
445
|
+
|
|
446
|
+
# Active controller count (should be 1 in cluster)
|
|
447
|
+
kafka.controller:type=KafkaController,name=ActiveControllerCount
|
|
448
|
+
|
|
449
|
+
# Request handler idle ratio (should be > 0.3)
|
|
450
|
+
kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent
|
|
451
|
+
|
|
452
|
+
# Network processor idle ratio
|
|
453
|
+
kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent
|
|
454
|
+
|
|
455
|
+
# Log flush latency
|
|
456
|
+
kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs
|
|
457
|
+
|
|
458
|
+
# Bytes in/out per second
|
|
459
|
+
kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
|
|
460
|
+
kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec
|
|
461
|
+
```
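
The thresholds noted in the comments above can be expressed as a small check. This is a sketch only: it assumes the metric values have already been collected into a dictionary keyed by the MBean `name` attribute, and that `ActiveControllerCount` has been summed across the cluster. `evaluate_broker_metrics` is a hypothetical helper, not part of `kafka_health_check.py`.

```python
def evaluate_broker_metrics(samples: dict[str, float]) -> list[str]:
    """Return alert messages for metrics outside the thresholds listed above."""
    alerts = []

    urp = samples.get("UnderReplicatedPartitions")
    if urp is not None and urp > 0:
        alerts.append(f"UnderReplicatedPartitions={urp:g} (expected 0)")

    active = samples.get("ActiveControllerCount")
    if active is not None and active != 1:
        alerts.append(f"ActiveControllerCount={active:g} (expected 1 across the cluster)")

    idle = samples.get("RequestHandlerAvgIdlePercent")
    if idle is not None and idle <= 0.3:
        alerts.append(f"RequestHandlerAvgIdlePercent={idle:g} (expected > 0.3)")

    # Metrics missing from `samples` are skipped rather than treated as failures.
    return alerts


if __name__ == "__main__":
    print(evaluate_broker_metrics({
        "UnderReplicatedPartitions": 4,
        "ActiveControllerCount": 1,
        "RequestHandlerAvgIdlePercent": 0.12,
    }))
```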
|
|
462
|
+
|
|
463
|
+
### Prometheus Integration
|
|
464
|
+
|
|
465
|
+
```yaml
|
|
466
|
+
# prometheus-kafka-exporter config
|
|
467
|
+
# Save as the JMX exporter's own YAML config file (not server.properties)
|
|
468
|
+
kafka_jmx_exporter:
|
|
469
|
+
jmx_url: "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi"
|
|
470
|
+
lowercaseOutputName: true
|
|
471
|
+
lowercaseOutputLabelNames: true
|
|
472
|
+
whitelistObjectNames:
|
|
473
|
+
- "kafka.server:type=BrokerTopicMetrics,*"
|
|
474
|
+
- "kafka.server:type=ReplicaManager,*"
|
|
475
|
+
- "kafka.controller:type=KafkaController,*"
|
|
476
|
+
- "kafka.server:type=KafkaRequestHandlerPool,*"
|
|
477
|
+
```
|
|
478
|
+
|
|
479
|
+
### Health Check Script
|
|
480
|
+
|
|
481
|
+
Run the health check script for comprehensive cluster analysis:
|
|
482
|
+
|
|
483
|
+
```bash
|
|
484
|
+
python skills/confluent-kafka/scripts/kafka_health_check.py \
|
|
485
|
+
--bootstrap-servers kafka-01:9092,kafka-02:9092,kafka-03:9092 \
|
|
486
|
+
--output reports/kafka/health/
|
|
487
|
+
```
|
|
488
|
+
|
|
489
|
+
---
|
|
490
|
+
|
|
491
|
+
## Configuration Best Practices
|
|
492
|
+
|
|
493
|
+
### Production Broker Settings
|
|
494
|
+
|
|
495
|
+
```properties
|
|
496
|
+
# /opt/confluent/etc/kafka/server.properties
|
|
497
|
+
|
|
498
|
+
# KRaft mode settings
|
|
499
|
+
process.roles=broker
|
|
500
|
+
node.id=101
|
|
501
|
+
controller.quorum.voters=1@kafka-controller-01:9093,2@kafka-controller-02:9093,3@kafka-controller-03:9093
|
|
502
|
+
controller.listener.names=CONTROLLER
|
|
503
|
+
inter.broker.listener.name=INTERNAL
|
|
504
|
+
|
|
505
|
+
# Listeners
|
|
506
|
+
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9094
|
|
507
|
+
listener.security.protocol.map=INTERNAL:SASL_SSL,EXTERNAL:SASL_SSL,CONTROLLER:SASL_SSL
|
|
508
|
+
advertised.listeners=INTERNAL://kafka-01.internal:9092,EXTERNAL://kafka-01.external:9094
|
|
509
|
+
|
|
510
|
+
# Performance tuning
|
|
511
|
+
num.network.threads=8
|
|
512
|
+
num.io.threads=16
|
|
513
|
+
socket.send.buffer.bytes=102400
|
|
514
|
+
socket.receive.buffer.bytes=102400
|
|
515
|
+
socket.request.max.bytes=104857600
|
|
516
|
+
|
|
517
|
+
# Log settings
|
|
518
|
+
log.dirs=/var/kafka-logs
|
|
519
|
+
num.partitions=12
|
|
520
|
+
default.replication.factor=3
|
|
521
|
+
min.insync.replicas=2
|
|
522
|
+
log.retention.hours=168
|
|
523
|
+
log.segment.bytes=1073741824
|
|
524
|
+
log.retention.check.interval.ms=300000
|
|
525
|
+
|
|
526
|
+
# Replication
|
|
527
|
+
replica.lag.time.max.ms=30000
|
|
528
|
+
num.replica.fetchers=4
|
|
529
|
+
replica.fetch.max.bytes=1048576
|
|
530
|
+
|
|
531
|
+
# Compression
|
|
532
|
+
compression.type=producer
|
|
533
|
+
|
|
534
|
+
# Security
|
|
535
|
+
authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer
|
|
536
|
+
super.users=User:admin
|
|
537
|
+
ssl.keystore.location=/var/ssl/kafka/kafka.keystore.jks
|
|
538
|
+
ssl.truststore.location=/var/ssl/kafka/kafka.truststore.jks
|
|
539
|
+
```
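
Two of the settings above interact: `min.insync.replicas` must stay below `default.replication.factor` for a replica to be taken offline without blocking `acks=all` producers, and the controller quorum is normally an odd number of voters. A minimal sketch of checking both from the file's text follows. These are common rules of thumb, not the checks performed by `validate_config.py`, and both function names are illustrative.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse key=value lines, ignoring blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_broker_config(props: dict[str, str]) -> list[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []

    voters = [v for v in props.get("controller.quorum.voters", "").split(",") if v]
    if len(voters) % 2 == 0:
        problems.append(f"controller.quorum.voters has {len(voters)} entries; expected an odd count")

    # Fallback of 1 assumes the broker default when a key is not set.
    rf = int(props.get("default.replication.factor", "1"))
    isr = int(props.get("min.insync.replicas", "1"))
    if isr >= rf:
        problems.append(
            f"min.insync.replicas={isr} leaves no room for an offline replica "
            f"with default.replication.factor={rf}"
        )
    return problems


if __name__ == "__main__":
    sample = """
    # KRaft mode settings
    controller.quorum.voters=1@kafka-controller-01:9093,2@kafka-controller-02:9093,3@kafka-controller-03:9093
    default.replication.factor=3
    min.insync.replicas=2
    """
    print(check_broker_config(parse_properties(sample)))
```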

### JVM Tuning

```properties
# /opt/confluent/etc/kafka/jvm.config
# For a 64GB RAM server with a 24GB heap

-Xms24g
-Xmx24g
-XX:MetaspaceSize=256m
-XX:MaxMetaspaceSize=512m
-XX:+UseG1GC
-XX:MaxGCPauseMillis=20
-XX:InitiatingHeapOccupancyPercent=35
-XX:G1HeapRegionSize=16m
-XX:MinMetaspaceFreeRatio=50
-XX:MaxMetaspaceFreeRatio=80
-XX:+ExplicitGCInvokesConcurrent
-XX:+PrintFlagsFinal
-XX:+UnlockDiagnosticVMOptions
-XX:+UseCompressedOops
-Djava.awt.headless=true
```
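
Kafka leans heavily on the operating system's page cache, so it helps to confirm how much RAM the heap flags leave outside the JVM heap. The sketch below is illustrative only: the helper names `heap_bytes` and `summarize_heap` are hypothetical, and the "non-heap" figure is an upper bound that ignores metaspace, direct buffers, and other processes.

```python
import re
from typing import Optional

_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def heap_bytes(jvm_config: str, flag: str) -> Optional[int]:
    """Return the size in bytes given to -Xms or -Xmx, or None if absent."""
    match = re.search(rf"^-{flag}(\d+)([kKmMgG]?)\s*$", jvm_config, re.MULTILINE)
    if match is None:
        return None
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit.lower(), 1)


def summarize_heap(jvm_config: str, ram_gib: float) -> dict:
    """Report whether the heap is fixed-size and how much RAM is left outside it."""
    xms = heap_bytes(jvm_config, "Xms")
    xmx = heap_bytes(jvm_config, "Xmx")
    if xms is None or xmx is None:
        raise ValueError("both -Xms and -Xmx must be set")
    heap_gib = xmx / 1024**3
    return {
        "fixed_size": xms == xmx,            # equal values avoid heap resizing
        "heap_gib": heap_gib,
        "non_heap_gib": ram_gib - heap_gib,  # upper bound on page cache
    }


if __name__ == "__main__":
    sample = "-Xms24g\n-Xmx24g\n-XX:+UseG1GC\n"
    print(summarize_heap(sample, ram_gib=64))
```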

---

## Scripts

### Cluster Health Report

```bash
# Generate comprehensive health report
python skills/confluent-kafka/scripts/kafka_health_check.py \
  --bootstrap-servers kafka-01:9092,kafka-02:9092,kafka-03:9092 \
  --output reports/kafka/health/ \
  --format both

# Quick status check only
python skills/confluent-kafka/scripts/kafka_health_check.py \
  --bootstrap-servers localhost:9092 \
  --quick
```

### Configuration Validator

```bash
# Validate server.properties before deployment
python skills/confluent-kafka/scripts/validate_config.py \
  --config /opt/confluent/etc/kafka/server.properties \
  --version 8.0

# Compare configurations across brokers
python skills/confluent-kafka/scripts/validate_config.py \
  --compare broker-01:/opt/confluent/etc/kafka/server.properties \
            broker-02:/opt/confluent/etc/kafka/server.properties
```
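
The internals of `validate_config.py` are not shown here, so the following is only a sketch of what a cross-broker comparison could look like once each file has been parsed into a dictionary. The name `diff_configs` and the `PER_BROKER_KEYS` set are assumptions: keys listed there are expected to differ between brokers and are skipped.

```python
# Keys assumed to differ legitimately from broker to broker.
PER_BROKER_KEYS = frozenset({"node.id", "advertised.listeners"})


def diff_configs(a: dict, b: dict, ignore=PER_BROKER_KEYS) -> dict:
    """Return {key: (value_in_a, value_in_b)} for settings that differ.

    A key missing on one side is reported with None for that side.
    """
    diffs = {}
    for key in sorted(set(a) | set(b)):
        if key in ignore:
            continue
        if a.get(key) != b.get(key):
            diffs[key] = (a.get(key), b.get(key))
    return diffs


if __name__ == "__main__":
    broker_01 = {"node.id": "101", "num.io.threads": "16", "min.insync.replicas": "2"}
    broker_02 = {"node.id": "102", "num.io.threads": "8", "min.insync.replicas": "2"}
    for key, (left, right) in diff_configs(broker_01, broker_02).items():
        print(f"{key}: broker-01={left} broker-02={right}")
```

Any non-empty result points at configuration drift that is worth resolving before a rolling restart.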

### Upgrade Pre-flight Check

```bash
# Run pre-upgrade validation
python skills/confluent-kafka/scripts/upgrade_preflight.py \
  --current-version 7.6 \
  --target-version 8.0 \
  --bootstrap-servers kafka-01:9092
```

---

## Best Practices

### Security

1. **Enable SASL/SSL**: Always use encryption and authentication in production.
2. **ACLs**: Enable authorization with `authorizer.class.name`.
3. **Rotate certificates**: Plan SSL certificate rotation before expiry.
4. **Secrets management**: Use Vault or AWS Secrets Manager for credentials.
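
Certificate rotation is easier to plan when expiry is checked automatically. The sketch below assumes the expiry timestamp is available in the `notAfter` string format that Python's `ssl.SSLSocket.getpeercert()` returns; the function names and the 30-day warning window are illustrative assumptions.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter time.

    `not_after` must look like 'Jan  5 09:34:43 2018 GMT'; `now` must be
    timezone-aware.
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    return (expires_epoch - now.timestamp()) / 86400


def needs_rotation(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when the certificate expires within `warn_days` (assumed default)."""
    return days_until_expiry(not_after, now) <= warn_days


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(needs_rotation("Jan  5 09:34:43 2018 GMT", now))
```

A check like this can run from a scheduler and feed the same alerting channel as the broker metrics.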

### Performance

1. **Partition count**: Start with 12 partitions per topic and scale as needed.
2. **Replication factor**: Use 3 for durability, with `min.insync.replicas=2`.
3. **Compression**: Use `lz4` or `zstd` on producers; the broker setting `compression.type=producer` keeps whatever codec the producer chose.
4. **Batch size**: Tune producer `batch.size` and `linger.ms` for throughput.
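
When 12 partitions stop being enough, a commonly cited rule of thumb sizes a topic as the larger of target throughput divided by per-partition producer throughput and target throughput divided by per-partition consumer throughput. The sketch below applies that heuristic; the function name, parameter names, and example figures are assumptions, and the per-partition rates have to be measured on your own cluster.

```python
import math


def estimate_partitions(
    target_mb_s: float,
    producer_mb_s_per_partition: float,
    consumer_mb_s_per_partition: float,
    floor: int = 12,
) -> int:
    """Estimate a partition count from measured per-partition throughput.

    `floor` mirrors the 12-partition starting point recommended above.
    """
    needed = max(
        target_mb_s / producer_mb_s_per_partition,
        target_mb_s / consumer_mb_s_per_partition,
    )
    return max(floor, math.ceil(needed))


if __name__ == "__main__":
    # Hypothetical measurements: 10 MB/s per partition producing, 5 MB/s consuming.
    print(estimate_partitions(200, 10, 5))
```

Consumers are usually the slower side, so the consumer term tends to dominate the estimate.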

### Reliability

1. **min.insync.replicas=2**: Pair with `acks=all` producers so a write is acknowledged only once at least two replicas have it.
2. **Unclean leader election**: Keep `unclean.leader.election.enable=false`.
3. **Regular backups**: Back up controller metadata and configs.
4. **Monitoring alerts**: Alert on under-replicated partitions, consumer lag, and disk usage.
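
The two replication conditions worth alerting on can be expressed in a few lines. This sketch works on a plain dictionary rather than a live cluster: the input shape (`(topic, partition)` keys mapped to `replicas` and `isr` lists) and the function names are assumptions, and in practice the data would come from an admin client or from broker metrics.

```python
def under_replicated(partitions: dict) -> list:
    """Partitions whose in-sync replica set is smaller than the replica set."""
    return sorted(
        tp for tp, state in partitions.items()
        if len(state["isr"]) < len(state["replicas"])
    )


def below_min_isr(partitions: dict, min_isr: int = 2) -> list:
    """Partitions where acks=all writes would be rejected (ISR < min.insync.replicas)."""
    return sorted(
        tp for tp, state in partitions.items()
        if len(state["isr"]) < min_isr
    )


if __name__ == "__main__":
    # Hypothetical snapshot of three partitions of one topic.
    snapshot = {
        ("orders", 0): {"replicas": [101, 102, 103], "isr": [101, 102, 103]},
        ("orders", 1): {"replicas": [101, 102, 103], "isr": [102, 103]},
        ("orders", 2): {"replicas": [101, 102, 103], "isr": [103]},
    }
    print("under-replicated:", under_replicated(snapshot))
    print("below min ISR:", below_min_isr(snapshot))
```

The first list is a warning signal; the second means producers using `acks=all` are already failing on those partitions.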

### Maintenance

1. **Rolling restarts**: Restart one broker at a time with controlled shutdown during upgrades.
2. **Documentation**: Keep runbooks for common operations.
3. **Test upgrades**: Always test in non-prod first.
4. **Capacity planning**: Monitor growth trends and plan disk expansion ahead of need.
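
For capacity planning, even a straight-line projection gives a usable lead time for disk expansion. The sketch below assumes disk usage samples are already being collected; the function name and the sample figures are hypothetical, and real growth is rarely perfectly linear.

```python
from typing import List, Optional, Tuple


def days_until_full(samples: List[Tuple[float, float]], capacity_gb: float) -> Optional[float]:
    """Project when a log volume fills up, using the first and last samples.

    `samples` is a list of (day_number, used_gb) pairs, oldest first.
    Returns None when usage is flat or shrinking.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    if last_day == first_day:
        raise ValueError("samples must span more than one day")
    growth_per_day = (last_used - first_used) / (last_day - first_day)
    if growth_per_day <= 0:
        return None
    return (capacity_gb - last_used) / growth_per_day


if __name__ == "__main__":
    # Hypothetical: 400 GB used on day 0, 700 GB on day 30, on a 2000 GB volume.
    print(days_until_full([(0, 400), (30, 700)], capacity_gb=2000))
```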

---

## Related Skills

- **[aws](../aws/SKILL.md)**: AWS infrastructure for Kafka deployment
- **[victoriametrics](../victoriametrics/SKILL.md)**: Metrics collection for Kafka monitoring
- **[consul](../consul/SKILL.md)**: Service discovery integration

---

## External Resources

- [Confluent Documentation](https://docs.confluent.io/platform/current/overview.html)
- [Apache Kafka Documentation](https://kafka.apache.org/documentation/)
- [KRaft Migration Guide](https://docs.confluent.io/platform/current/installation/migrate-zk-kraft.html)
- [Confluent Platform Release Notes](https://docs.confluent.io/platform/current/release-notes/index.html)
- [Kafka Operations Best Practices](https://docs.confluent.io/platform/current/kafka/operations.html)