@techwavedev/agi-agent-kit 1.1.7 → 1.2.1

This version of @techwavedev/agi-agent-kit might be problematic.

Files changed (111)
  1. package/CHANGELOG.md +82 -1
  2. package/README.md +190 -12
  3. package/bin/init.js +30 -2
  4. package/package.json +6 -3
  5. package/templates/base/AGENTS.md +54 -23
  6. package/templates/base/README.md +325 -0
  7. package/templates/base/directives/memory_integration.md +95 -0
  8. package/templates/base/execution/memory_manager.py +309 -0
  9. package/templates/base/execution/session_boot.py +218 -0
  10. package/templates/base/execution/session_init.py +320 -0
  11. package/templates/base/skill-creator/SKILL_skillcreator.md +23 -36
  12. package/templates/base/skill-creator/scripts/init_skill.py +18 -135
  13. package/templates/skills/ec/README.md +31 -0
  14. package/templates/skills/ec/aws/SKILL.md +1020 -0
  15. package/templates/skills/ec/aws/defaults.yaml +13 -0
  16. package/templates/skills/ec/aws/references/common_patterns.md +80 -0
  17. package/templates/skills/ec/aws/references/mcp_servers.md +98 -0
  18. package/templates/skills/ec/aws-terraform/SKILL.md +349 -0
  19. package/templates/skills/ec/aws-terraform/references/best_practices.md +394 -0
  20. package/templates/skills/ec/aws-terraform/references/checkov_reference.md +337 -0
  21. package/templates/skills/ec/aws-terraform/scripts/configure_mcp.py +150 -0
  22. package/templates/skills/ec/confluent-kafka/SKILL.md +655 -0
  23. package/templates/skills/ec/confluent-kafka/references/ansible_playbooks.md +792 -0
  24. package/templates/skills/ec/confluent-kafka/references/ec_deployment.md +579 -0
  25. package/templates/skills/ec/confluent-kafka/references/kraft_migration.md +490 -0
  26. package/templates/skills/ec/confluent-kafka/references/troubleshooting.md +778 -0
  27. package/templates/skills/ec/confluent-kafka/references/upgrade_7x_to_8x.md +488 -0
  28. package/templates/skills/ec/confluent-kafka/scripts/kafka_health_check.py +435 -0
  29. package/templates/skills/ec/confluent-kafka/scripts/upgrade_preflight.py +568 -0
  30. package/templates/skills/ec/confluent-kafka/scripts/validate_config.py +455 -0
  31. package/templates/skills/ec/consul/SKILL.md +427 -0
  32. package/templates/skills/ec/consul/references/acl_setup.md +168 -0
  33. package/templates/skills/ec/consul/references/ha_config.md +196 -0
  34. package/templates/skills/ec/consul/references/troubleshooting.md +267 -0
  35. package/templates/skills/ec/consul/references/upgrades.md +213 -0
  36. package/templates/skills/ec/consul/scripts/consul_health_report.py +530 -0
  37. package/templates/skills/ec/consul/scripts/consul_status.py +264 -0
  38. package/templates/skills/ec/consul/scripts/generate_values.py +170 -0
  39. package/templates/skills/ec/documentation/SKILL.md +351 -0
  40. package/templates/skills/ec/documentation/references/best_practices.md +201 -0
  41. package/templates/skills/ec/documentation/scripts/analyze_code.py +307 -0
  42. package/templates/skills/ec/documentation/scripts/detect_changes.py +460 -0
  43. package/templates/skills/ec/documentation/scripts/generate_changelog.py +312 -0
  44. package/templates/skills/ec/documentation/scripts/sync_docs.py +272 -0
  45. package/templates/skills/ec/documentation/scripts/update_skill_docs.py +366 -0
  46. package/templates/skills/ec/gitlab/SKILL.md +529 -0
  47. package/templates/skills/ec/gitlab/references/agent_installation.md +416 -0
  48. package/templates/skills/ec/gitlab/references/api_reference.md +508 -0
  49. package/templates/skills/ec/gitlab/references/gitops_flux.md +465 -0
  50. package/templates/skills/ec/gitlab/references/troubleshooting.md +518 -0
  51. package/templates/skills/ec/gitlab/scripts/generate_agent_values.py +329 -0
  52. package/templates/skills/ec/gitlab/scripts/gitlab_agent_status.py +414 -0
  53. package/templates/skills/ec/jira/SKILL.md +484 -0
  54. package/templates/skills/ec/jira/references/jql_reference.md +148 -0
  55. package/templates/skills/ec/jira/scripts/add_comment.py +91 -0
  56. package/templates/skills/ec/jira/scripts/bulk_log_work.py +124 -0
  57. package/templates/skills/ec/jira/scripts/create_ticket.py +162 -0
  58. package/templates/skills/ec/jira/scripts/get_ticket.py +191 -0
  59. package/templates/skills/ec/jira/scripts/jira_client.py +383 -0
  60. package/templates/skills/ec/jira/scripts/log_work.py +154 -0
  61. package/templates/skills/ec/jira/scripts/search_tickets.py +104 -0
  62. package/templates/skills/ec/jira/scripts/update_comment.py +67 -0
  63. package/templates/skills/ec/jira/scripts/update_ticket.py +161 -0
  64. package/templates/skills/ec/karpenter/SKILL.md +301 -0
  65. package/templates/skills/ec/karpenter/references/ec2nodeclasses.md +421 -0
  66. package/templates/skills/ec/karpenter/references/migration.md +396 -0
  67. package/templates/skills/ec/karpenter/references/nodepools.md +400 -0
  68. package/templates/skills/ec/karpenter/references/troubleshooting.md +359 -0
  69. package/templates/skills/ec/karpenter/scripts/generate_ec2nodeclass.py +187 -0
  70. package/templates/skills/ec/karpenter/scripts/generate_nodepool.py +245 -0
  71. package/templates/skills/ec/karpenter/scripts/karpenter_status.py +359 -0
  72. package/templates/skills/ec/opensearch/SKILL.md +720 -0
  73. package/templates/skills/ec/opensearch/references/ml_neural_search.md +576 -0
  74. package/templates/skills/ec/opensearch/references/operator.md +532 -0
  75. package/templates/skills/ec/opensearch/references/query_dsl.md +532 -0
  76. package/templates/skills/ec/opensearch/scripts/configure_mcp.py +148 -0
  77. package/templates/skills/ec/victoriametrics/SKILL.md +598 -0
  78. package/templates/skills/ec/victoriametrics/references/kubernetes.md +531 -0
  79. package/templates/skills/ec/victoriametrics/references/prometheus_migration.md +333 -0
  80. package/templates/skills/ec/victoriametrics/references/troubleshooting.md +442 -0
  81. package/templates/skills/knowledge/SKILLS_CATALOG.md +274 -4
  82. package/templates/skills/knowledge/intelligent-routing/SKILL.md +237 -164
  83. package/templates/skills/knowledge/parallel-agents/SKILL.md +345 -73
  84. package/templates/skills/knowledge/plugin-discovery/SKILL.md +582 -0
  85. package/templates/skills/knowledge/plugin-discovery/scripts/platform_setup.py +1083 -0
  86. package/templates/skills/knowledge/design-md/README.md +0 -34
  87. package/templates/skills/knowledge/design-md/SKILL.md +0 -193
  88. package/templates/skills/knowledge/design-md/examples/DESIGN.md +0 -154
  89. package/templates/skills/knowledge/notebooklm-mcp/SKILL.md +0 -71
  90. package/templates/skills/knowledge/notebooklm-mcp/assets/example_asset.txt +0 -24
  91. package/templates/skills/knowledge/notebooklm-mcp/references/api_reference.md +0 -34
  92. package/templates/skills/knowledge/notebooklm-mcp/scripts/example.py +0 -19
  93. package/templates/skills/knowledge/react-components/README.md +0 -36
  94. package/templates/skills/knowledge/react-components/SKILL.md +0 -53
  95. package/templates/skills/knowledge/react-components/examples/gold-standard-card.tsx +0 -80
  96. package/templates/skills/knowledge/react-components/package-lock.json +0 -231
  97. package/templates/skills/knowledge/react-components/package.json +0 -16
  98. package/templates/skills/knowledge/react-components/resources/architecture-checklist.md +0 -15
  99. package/templates/skills/knowledge/react-components/resources/component-template.tsx +0 -37
  100. package/templates/skills/knowledge/react-components/resources/stitch-api-reference.md +0 -14
  101. package/templates/skills/knowledge/react-components/resources/style-guide.json +0 -27
  102. package/templates/skills/knowledge/react-components/scripts/fetch-stitch.sh +0 -30
  103. package/templates/skills/knowledge/react-components/scripts/validate.js +0 -68
  104. package/templates/skills/knowledge/self-update/SKILL.md +0 -60
  105. package/templates/skills/knowledge/self-update/scripts/update_kit.py +0 -103
  106. package/templates/skills/knowledge/stitch-loop/README.md +0 -54
  107. package/templates/skills/knowledge/stitch-loop/SKILL.md +0 -235
  108. package/templates/skills/knowledge/stitch-loop/examples/SITE.md +0 -73
  109. package/templates/skills/knowledge/stitch-loop/examples/next-prompt.md +0 -25
  110. package/templates/skills/knowledge/stitch-loop/resources/baton-schema.md +0 -61
  111. package/templates/skills/knowledge/stitch-loop/resources/site-template.md +0 -104
@@ -0,0 +1,655 @@
1
+ ---
2
+ name: confluent-kafka
3
+ description: Confluent Kafka specialist for tarball/Ansible custom installations. Expert in updating, maintaining, checking health, troubleshooting, documenting, analyzing metrics, and upgrading Confluent Kafka deployments from 7.x to 8.x versions. Covers KRaft mode (ZooKeeper-less), broker configuration, Schema Registry, Connect, ksqlDB, Control Center, and production-grade operations. Use when working with Confluent Platform installations, migrations to KRaft, performance tuning, health monitoring, and infrastructure-as-code with Ansible.
4
+ ---
5
+
6
+ # Confluent Kafka Skill
7
+
8
+ Comprehensive skill for managing Confluent Platform Kafka clusters deployed via tarball distributions and automated with Ansible. **Primary deployment context is EC (European Commission) controlled environments using KRaft-only, SSL-only, non-root systemd user services.**
9
+
10
+ > **Last Updated:** 2026-01-20 from [Confluent Documentation](https://docs.confluent.io/)
11
+
12
+ ---
13
+
14
+ ## EC Environment Quick Reference
15
+
16
+ > **Note:** Values below use variable placeholders. Define actual values in inventory files outside git.
17
+
18
+ | Item | Variable / Default |
19
+ | --------------------- | ---------------------------------------------- |
20
+ | **Confluent Version** | `{{ confluent_version }}` (e.g., 7.9.3) |
21
+ | **Ansible Base** | `{{ ansible_base }}` |
22
+ | **Confluent Install** | `{{ base_path }}/opt/confluent-{{ version }}/` |
23
+ | **JAVA_HOME** | `{{ base_path }}/opt/{{ java_version }}` |
24
+ | **SSL Directory** | `{{ base_path }}/opt/ssl/` |
25
+ | **Data: Controller** | `{{ base_path }}/opt/data/controller` |
26
+ | **Data: Broker** | `{{ base_path }}/opt/data` |
27
+ | **Logs** | `{{ base_path }}/logs/` |
28
+ | **Systemd (User)** | `~/.config/systemd/user/` |
29
+ | **User/Group** | `{{ kafka_user }}:{{ kafka_group }}` |
30
+ | **Controller Port** | `{{ controller_port }}` (default: 9093) |
31
+ | **Broker Port** | `{{ broker_port }}` (default: 9443) |
32
+
33
+ > **Full EC deployment reference:** [references/ec_deployment.md](references/ec_deployment.md)
34
+
35
+ ---
36
+
37
+ ## Quick Start (EC Environment)
38
+
39
+ ```bash
40
+ # Set environment variables (or source from environment file)
41
+ export KAFKA_HOME={{ base_path }}/opt/confluent-{{ confluent_version }}
42
+ export BOOTSTRAP={{ broker_host_1 }}:{{ broker_port }},{{ broker_host_2 }}:{{ broker_port }}
43
+ export CLIENT_PROPS={{ base_path }}/etc/kafka/client.properties
44
+
45
+ # SSH to a broker node
46
+ ssh {{ kafka_user }}@{{ broker_host_1 }}
47
+
48
+ # Verify broker is running (user systemd scope)
49
+ systemctl --user status confluent-server
50
+
51
+ # Check cluster health
52
+ $KAFKA_HOME/bin/kafka-broker-api-versions \
53
+ --bootstrap-server $BOOTSTRAP \
54
+ --command-config $CLIENT_PROPS
55
+
56
+ # Check controller quorum (KRaft)
57
+ $KAFKA_HOME/bin/kafka-metadata-quorum \
58
+ --bootstrap-server $BOOTSTRAP \
59
+ --command-config $CLIENT_PROPS describe --status
60
+
61
+ # Use management script
62
+ {{ base_path }}/scripts/management/kafka_node.sh status
63
+ ```
64
+
65
+ ---
66
+
67
+ ## Core Concepts
68
+
69
+ ### Confluent Platform Components
70
+
71
+ | Component | Description | Default Port |
72
+ | -------------------- | ---------------------------------------- | ------------ |
73
+ | **Kafka Broker** | Core message broker (confluent-server) | 9092 |
74
+ | **KRaft Controller** | Metadata management (replaces ZooKeeper) | 9093 |
75
+ | **Schema Registry** | Avro/JSON/Protobuf schema management | 8081 |
76
+ | **Kafka Connect** | Data integration connectors | 8083 |
77
+ | **ksqlDB** | Stream processing with SQL | 8088 |
78
+ | **Control Center** | Web-based management UI | 9021 |
79
+ | **REST Proxy** | HTTP interface to Kafka | 8082 |
80
+
81
+ ### KRaft Architecture (8.x Native)
82
+
83
+ ```
84
+ ┌─────────────────────────────────────────────────────────────────┐
85
+ │ Confluent Kafka KRaft Cluster │
86
+ │ │
87
+ │ ┌─────────────────────────────────────────────────────────┐ │
88
+ │ │ Controller Quorum (Raft Consensus) │ │
89
+ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
90
+ │ │ │Controller-01 │ │Controller-02 │ │Controller-03 │ │ │
91
+ │ │ │ (voter) │ │ (voter) │ │ (voter) │ │ │
92
+ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
93
+ │ └─────────────────────────────────────────────────────────┘ │
94
+ │ │ │
95
+ │ Metadata updates via Raft │
96
+ │ ▼ │
97
+ │ ┌─────────────────────────────────────────────────────────┐ │
98
+ │ │ Broker Nodes │ │
99
+ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
100
+ │ │ │Broker-01 │ │Broker-02 │ │Broker-03 │ │Broker-N │ │ │
101
+ │ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
102
+ │ └─────────────────────────────────────────────────────────┘ │
103
+ │ │
104
+ │ ┌─────────────────────────────────────────────────────────┐ │
105
+ │ │ Ecosystem Services │ │
106
+ │ │ Schema Registry │ Connect │ ksqlDB │ Control Center │ │
107
+ │ └─────────────────────────────────────────────────────────┘ │
108
+ └─────────────────────────────────────────────────────────────────┘
109
+ ```
110
+
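The fault tolerance of the three-voter controller quorum drawn above follows from Raft's majority rule. A minimal Python sketch of that arithmetic, where `quorum_status` is a hypothetical helper name and not part of this skill's scripts:

```python
def quorum_status(total_voters: int, healthy_voters: int) -> dict:
    """Summarize Raft quorum health for a set of controller voters."""
    if total_voters < 1 or not 0 <= healthy_voters <= total_voters:
        raise ValueError("voter counts out of range")
    # Raft needs a strict majority of voters to elect a leader and commit.
    majority = total_voters // 2 + 1
    return {
        "majority": majority,
        "tolerated_failures": total_voters - majority,
        "has_quorum": healthy_voters >= majority,
    }


if __name__ == "__main__":
    # Three voters with one controller down.
    print(quorum_status(total_voters=3, healthy_voters=2))
```

With three voters, one controller can be lost. Once two are down, the remaining voter cannot form a majority, so metadata changes stop until a controller is restored.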
111
+ ### EC Installation Layout
112
+
113
+ ```
114
+ {{ base_path }}/ # Main installation root
115
+ ├── opt/
116
+ │ ├── confluent-{{ confluent_version }}/ # Confluent Platform
117
+ │ │ ├── bin/ # CLI tools
118
+ │ │ ├── etc/ # Default configs (unused)
119
+ │ │ └── share/ # Libraries
120
+ │ │
121
+ │ ├── {{ java_version }}/ # Java installation
122
+ │ │
123
+ │ ├── ssl/ # SSL certificates
124
+ │ │ ├── {{ keystore_filename }}
125
+ │ │ ├── {{ truststore_filename }}
126
+ │ │ └── security.properties # Encrypted passwords
127
+ │ │
128
+ │ ├── data/ # Kafka data
129
+ │ │ ├── controller/ # Controller logs
130
+ │ │ └── (broker data at root)
131
+ │ │
132
+ │ ├── logs/ # Application logs
133
+ │ └── monitoring/ # JMX exporter
134
+
135
+ ├── etc/
136
+ │ ├── kafka/server.properties # Broker config
137
+ │ └── controller/server.properties # Controller config
138
+
139
+ ├── logs/ # Runtime + GC logs
140
+ ├── tmp/ # Java temp directory
141
+ └── scripts/management/ # Management scripts
142
+ ├── kafka_node.sh # Start/stop wrapper
143
+ └── kafka_tools.sh # Aux tools
144
+
145
+ ~/.config/systemd/user/ # User-scope systemd
146
+ ├── confluent-kcontroller.service # Controller service
147
+ └── confluent-server.service # Broker service
148
+ ```
149
+
150
+ > **Standard paths** (non-EC): `/opt/confluent/`, `/var/kafka-logs/`, `/etc/systemd/system/`
151
+
152
+ ---
153
+
154
+ ## Common Workflows
155
+
156
+ ### 1. Check Cluster Health (EC)
157
+
158
+ ```bash
159
+ # Set environment (source from your environment file)
160
+ export KAFKA_HOME={{ base_path }}/opt/confluent-{{ confluent_version }}
161
+ export BOOTSTRAP={{ broker_host_1 }}:{{ broker_port }},{{ broker_host_2 }}:{{ broker_port }},{{ broker_host_3 }}:{{ broker_port }}
162
+ export CLIENT_PROPS={{ base_path }}/etc/kafka/client.properties
163
+
164
+ # Broker status (user systemd)
165
+ systemctl --user status confluent-server
166
+
167
+ # Controller quorum status (KRaft)
168
+ $KAFKA_HOME/bin/kafka-metadata-quorum \
169
+ --bootstrap-server $BOOTSTRAP \
170
+ --command-config $CLIENT_PROPS describe --status
171
+
172
+ # Under-replicated partitions
173
+ $KAFKA_HOME/bin/kafka-topics --bootstrap-server $BOOTSTRAP \
174
+ --command-config $CLIENT_PROPS \
175
+ --describe --under-replicated-partitions
176
+
177
+ # Offline partitions (CRITICAL)
178
+ $KAFKA_HOME/bin/kafka-topics --bootstrap-server $BOOTSTRAP \
179
+ --command-config $CLIENT_PROPS \
180
+ --describe --unavailable-partitions
181
+
182
+ # Broker disk usage
183
+ df -h {{ base_path }}/opt/data
184
+
185
+ # JVM heap usage
186
+ jstat -gc $(pgrep -f kafka.Kafka) 1000 5
187
+ ```
188
+
189
+ ### 2. Monitor Consumer Lag
190
+
191
+ ```bash
192
+ # All consumer groups
193
+ /opt/confluent/bin/kafka-consumer-groups --bootstrap-server localhost:9092 \
194
+ --all-groups --describe
195
+
196
+ # Specific group
197
+ /opt/confluent/bin/kafka-consumer-groups --bootstrap-server localhost:9092 \
198
+ --group my-consumer-group --describe
199
+
200
+ # Export lag metrics (for monitoring integration)
201
+ /opt/confluent/bin/kafka-consumer-groups --bootstrap-server localhost:9092 \
202
+ --all-groups --describe | awk 'NR>1 {sum+=$6} END {print "Total Lag: " sum}'
203
+ ```
204
+
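The `awk` one-liner above sums the sixth column and mixes all groups into one total. As a hedged alternative, the sketch below finds the `LAG` column from the header row and totals lag per group. `total_lag_by_group` is a hypothetical name, and the parser assumes the usual whitespace-separated `kafka-consumer-groups --describe` layout with `GROUP` and `LAG` headers.

```python
from collections import defaultdict


def total_lag_by_group(describe_output: str) -> dict:
    """Sum the LAG column per consumer group from --describe text output."""
    totals = defaultdict(int)
    lag_col = group_col = None
    for line in describe_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # A header row repeats for every group when --all-groups is used.
        if "GROUP" in fields and "LAG" in fields:
            group_col, lag_col = fields.index("GROUP"), fields.index("LAG")
            continue
        if lag_col is None or len(fields) <= lag_col:
            continue
        lag = fields[lag_col]
        # Partitions without a committed offset report "-" instead of a number.
        if lag.isdigit():
            totals[fields[group_col]] += int(lag)
    return dict(totals)


if __name__ == "__main__":
    import sys

    # Example: kafka-consumer-groups ... --all-groups --describe | python lag.py
    for group, lag in sorted(total_lag_by_group(sys.stdin.read()).items()):
        print(f"{group}: {lag}")
```

Reading the column position from the header keeps the script working if the tool adds or reorders columns, which a fixed `$6` does not.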
205
+ ### 3. Schema Registry Operations
206
+
207
+ ```bash
208
+ # Check Schema Registry health
209
+ curl -s http://localhost:8081/ | jq
210
+
211
+ # List all subjects
212
+ curl -s http://localhost:8081/subjects | jq
213
+
214
+ # Get schema versions for a subject
215
+ curl -s http://localhost:8081/subjects/my-topic-value/versions | jq
216
+
217
+ # Get specific schema
218
+ curl -s http://localhost:8081/subjects/my-topic-value/versions/latest | jq
219
+
220
+ # Check compatibility
221
+ curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
222
+ --data '{"schema": "{\"type\":\"record\",\"name\":\"Test\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]}"}' \
223
+ http://localhost:8081/compatibility/subjects/my-topic-value/versions/latest | jq
224
+ ```
225
+
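The escaped JSON in the compatibility check above is easy to get wrong by hand, because the schema must be sent as a JSON string nested inside a JSON object. A small sketch that builds the same request with the standard library. `build_compatibility_request` is a hypothetical helper, and the URL and subject are the placeholders from the `curl` example.

```python
import json
import urllib.parse
import urllib.request


def build_compatibility_request(base_url, subject, schema, version="latest"):
    """Build (but do not send) a Schema Registry compatibility check request."""
    # The schema itself is serialized first, then embedded as a string value.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    url = "{}/compatibility/subjects/{}/versions/{}".format(
        base_url.rstrip("/"), urllib.parse.quote(subject, safe=""), version
    )
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


if __name__ == "__main__":
    schema = {
        "type": "record",
        "name": "Test",
        "fields": [{"name": "id", "type": "int"}],
    }
    req = build_compatibility_request("http://localhost:8081", "my-topic-value", schema)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it. That call is left out so the sketch has no network dependency.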
226
+ ### 4. Connect Cluster Operations
227
+
228
+ ```bash
229
+ # Connect worker status
230
+ curl -s http://localhost:8083/ | jq
231
+
232
+ # List connectors
233
+ curl -s http://localhost:8083/connectors | jq
234
+
235
+ # Connector status
236
+ curl -s http://localhost:8083/connectors/my-connector/status | jq
237
+
238
+ # Restart failed tasks
239
+ curl -X POST http://localhost:8083/connectors/my-connector/tasks/0/restart
240
+
241
+ # View connector config
242
+ curl -s http://localhost:8083/connectors/my-connector/config | jq
243
+ ```
244
+
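To restart only the tasks that actually failed, the status response can be filtered first. The sketch below assumes the payload shape returned by `GET /connectors/<name>/status`: a `name` field and a `tasks` list whose entries carry `id` and `state`. `failed_task_restart_paths` is a hypothetical helper name.

```python
import json


def failed_task_restart_paths(status: dict) -> list:
    """Return the REST paths to POST to for every task in FAILED state."""
    name = status["name"]
    return [
        f"/connectors/{name}/tasks/{task['id']}/restart"
        for task in status.get("tasks", [])
        if task.get("state") == "FAILED"
    ]


if __name__ == "__main__":
    # Illustrative payload, not captured from a real cluster.
    sample = json.loads(
        '{"name": "my-connector",'
        ' "connector": {"state": "RUNNING"},'
        ' "tasks": [{"id": 0, "state": "RUNNING"}, {"id": 1, "state": "FAILED"}]}'
    )
    for path in failed_task_restart_paths(sample):
        print("POST", path)
```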
245
+ ---
246
+
247
+ ## Upgrade Guide: 7.x to 8.x
248
+
249
+ ### Key Changes in 8.x
250
+
251
+ | Change | Impact |
252
+ | --------------------- | ---------------------------------------- |
253
+ | **KRaft only** | ZooKeeper mode removed, KRaft required |
254
+ | **Java 17+** | Minimum Java version requirement |
255
+ | **Deprecated APIs** | kafka-preferred-replica-election removed |
256
+ | **New Metrics** | Enhanced KRaft controller metrics |
257
+ | **Security Defaults** | Stricter TLS requirements |
258
+
259
+ ### Pre-Upgrade Checklist
260
+
261
+ ```bash
262
+ # 1. Check current version
263
+ /opt/confluent/bin/kafka-broker-api-versions --version
264
+
265
+ # 2. Verify Java version (must be 17+)
266
+ java -version
267
+
268
+ # 3. Backup configurations
269
+ tar -czvf /backup/confluent-config-$(date +%Y%m%d).tar.gz /opt/confluent/etc/
270
+
271
+ # 4. Export topic configurations
272
+ /opt/confluent/bin/kafka-configs --bootstrap-server localhost:9092 \
273
+ --entity-type topics --all --describe > /backup/topic-configs.txt
274
+
275
+ # 5. Check for deprecated configurations
276
+ grep -r "log.message.format.version\|inter.broker.protocol.version" /opt/confluent/etc/kafka/
277
+
278
+ # 6. Verify cluster health
279
+ /opt/confluent/bin/kafka-topics --bootstrap-server localhost:9092 \
280
+ --describe --under-replicated-partitions
281
+ ```
282
+
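Step 5 above greps for the two deprecated keys, which also matches commented-out lines. A hedged Python sketch that reports only active settings. `find_deprecated` is a hypothetical helper, and the parser handles simple `key=value` lines only, not the full Java properties syntax.

```python
DEPRECATED_KEYS = (
    "log.message.format.version",
    "inter.broker.protocol.version",
)


def find_deprecated(properties_text: str, keys=DEPRECATED_KEYS) -> list:
    """Return (line_number, key) for each active deprecated setting."""
    hits = []
    for lineno, raw in enumerate(properties_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blanks and comment lines ("#" or "!" in properties files).
        if not line or line.startswith(("#", "!")):
            continue
        key = line.split("=", 1)[0].strip()
        if key in keys:
            hits.append((lineno, key))
    return hits


if __name__ == "__main__":
    import sys

    # Example: python find_deprecated.py /path/to/server.properties
    with open(sys.argv[1], encoding="utf-8") as handle:
        for lineno, key in find_deprecated(handle.read()):
            print(f"line {lineno}: {key}")
```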
283
+ ### Reference Files
284
+
285
+ - **[references/ec_deployment.md](references/ec_deployment.md)** — **EC deployment paths, Vault, and setup**
286
+ - **[references/upgrade_7x_to_8x.md](references/upgrade_7x_to_8x.md)** — Complete 7.x to 8.x migration guide
287
+ - **[references/kraft_migration.md](references/kraft_migration.md)** — ZooKeeper to KRaft migration steps
288
+ - **[references/ansible_playbooks.md](references/ansible_playbooks.md)** — Ansible automation patterns
289
+
290
+ ---
291
+
292
+ ## Troubleshooting Guide
293
+
294
+ ### Common Issues
295
+
296
+ | Issue | Diagnosis | Solution |
297
+ | -------------------------- | ------------------------------------------ | --------------------------------------------- |
298
+ | **Broker won't start** | Check logs `/var/log/confluent/kafka/` | Fix config, check disk space, verify ports |
299
+ | **Controller quorum lost** | `kafka-metadata-quorum` shows <3 voters | Restore failed controllers, check network |
300
+ | **High consumer lag** | Consumer processing slower than production | Scale consumers, optimize processing |
301
+ | **ISR shrinking** | Followers falling behind leader | Check network, disk I/O, increase replica lag |
302
+ | **Schema Registry 409** | Schema incompatibility | Check compatibility mode, use FULL_TRANSITIVE |
303
+ | **Connect task failed** | Connector config or target system issue | Check task status, review the error trace |
304
+ | **OOM on broker** | Heap exhaustion | Tune JVM heap, check for memory leaks |
305
+
306
+ ### Debug Commands (EC)
307
+
308
+ ```bash
309
+ # Set paths (source from your environment file)
310
+ export KAFKA_HOME={{ base_path }}/opt/confluent-{{ confluent_version }}
311
+ export LOG_DIR={{ base_path }}/logs
312
+ export DATA_DIR={{ base_path }}/opt/data
313
+
314
+ # Kafka server logs (last 100 lines)
315
+ tail -100 $LOG_DIR/server.log
316
+
317
+ # Controller logs (KRaft)
318
+ tail -100 $LOG_DIR/controller.log
319
+
320
+ # Systemd journal logs
321
+ journalctl --user -u confluent-server -f
322
+ journalctl --user -u confluent-kcontroller -f
323
+
324
+ # Check open file descriptors
325
+ lsof -p $(pgrep -f kafka.Kafka) | wc -l
326
+
327
+ # Network connections to broker
328
+ netstat -an | grep {{ broker_port }} | wc -l
329
+
330
+ # Thread dump for debugging hung brokers
331
+ jstack $(pgrep -f kafka.Kafka) > /tmp/kafka-thread-dump.txt
332
+
333
+ # GC logs analysis
334
+ grep "GC pause" $LOG_DIR/gc.log | tail -20
335
+
336
+ # KRaft metadata diagnostics
337
+ $KAFKA_HOME/bin/kafka-metadata \
338
+ --snapshot $DATA_DIR/controller/__cluster_metadata-0/00000000000000000000.log \
339
+ --command topic --topics __consumer_offsets
340
+ ```
341
+
342
+ ### Detailed Troubleshooting
343
+
344
+ For in-depth troubleshooting scenarios, see **[references/troubleshooting.md](references/troubleshooting.md)**.
345
+
346
+ ---
347
+
348
+ ## Ansible Automation (EC Environment)
349
+
350
+ ### EC Inventory Structure
351
+
352
+ ```yaml
353
+ # {{ ansible_base }}/inventories/{{ env_name }}/hosts.yml
354
+ # Replace {{ variable }} placeholders with actual values in your inventory (not in git)
355
+ all:
356
+   children:
357
+     kafka_controller:
358
+       hosts:
359
+         {{ controller_host_1 }}:
360
+           node_id: {{ controller_id_1 }}
361
+         {{ controller_host_2 }}:
362
+           node_id: {{ controller_id_2 }}
363
+         {{ controller_host_3 }}:
364
+           node_id: {{ controller_id_3 }}
365
+
366
+     kafka_broker:
367
+       hosts:
368
+         {{ broker_host_1 }}:
369
+           node_id: {{ broker_id_1 }}
370
+         {{ broker_host_2 }}:
371
+           node_id: {{ broker_id_2 }}
372
+         {{ broker_host_3 }}:
373
+           node_id: {{ broker_id_3 }}
374
+ ```
375
+
376
+ ### EC Deployment Commands
377
+
378
+ ```bash
379
+ # Export Vault token (obtain via PrivX or your auth method)
380
+ export VAULT_TOKEN="${VAULT_TOKEN}"
381
+ cd {{ ansible_base }}
382
+
383
+ # Vault bootstrap (one-time per environment)
384
+ ansible-playbook playbooks/tasks/vault-bootstrap.yml \
385
+ -e vault_env={{ env_name }} \
386
+ -e "@resources/secrets.yml"
387
+
388
+ # Deploy controllers
389
+ ansible-playbook -i inventories/{{ env_name }}/hosts.yml \
390
+ playbooks/10-kafka-controllers.yml \
391
+ --limit {{ controller_host_1 }} \
392
+ -vv \
393
+ --skip-tags ec,package,sysctl,health_check \
394
+ -e "@resources/override.yml"
395
+
396
+ # Deploy brokers
397
+ ansible-playbook -i inventories/{{ env_name }}/hosts.yml \
398
+ playbooks/20-kafka-brokers.yml \
399
+ --limit {{ broker_host_1 }} \
400
+ -vv \
401
+ --skip-tags ec,package,sysctl,health_check \
402
+ -e "@resources/override.yml"
403
+ ```
404
+
405
+ ### EC Systemd User Service Pattern
406
+
407
+ ```yaml
408
+ # User-scope systemd (no root)
409
+ - name: Kafka Started
410
+ ansible.builtin.systemd:
411
+ name: "{{ kafka_broker_service_name }}"
412
+ enabled: true
413
+ scope: user # EC constraint: user-mode systemd
414
+ state: started
415
+ tags: systemd
416
+ ```
417
+
418
+ ### Skip Tags Reference
419
+
420
+ | Tag | Purpose | When to Skip |
421
+ | -------------- | -------------------- | --------------- |
422
+ | `ec` | EC-specific mods | Already applied |
423
+ | `package` | Package installation | Re-runs |
424
+ | `sysctl` | Sysctl tuning | No root |
425
+ | `health_check` | Post-checks | Manual |
426
+ | `privileged` | Root-required | Non-root env |
427
+
428
+ ### Reference Files
429
+
430
+ - **[references/ec_deployment.md](references/ec_deployment.md)** — **Complete EC paths, Vault, Ansible setup**
431
+ - **[references/ansible_playbooks.md](references/ansible_playbooks.md)** — Generic Ansible automation patterns
432
+
433
+ ---
434
+
435
+ ## Metrics and Monitoring
436
+
437
+ ### Key JMX Metrics
438
+
439
+ ```bash
440
+ # Using kafka JMX tool
441
+ export JMX_PORT=9999
442
+
443
+ # Under-replicated partitions (should be 0)
444
+ kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
445
+
446
+ # Active controller count (should be 1 in cluster)
447
+ kafka.controller:type=KafkaController,name=ActiveControllerCount
448
+
449
+ # Request handler idle ratio (should be > 0.3)
450
+ kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent
451
+
452
+ # Network processor idle ratio
453
+ kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent
454
+
455
+ # Log flush latency
456
+ kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs
457
+
458
+ # Bytes in/out per second
459
+ kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
460
+ kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec
461
+ ```
462
+
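The thresholds noted in the comments above can be expressed as a simple check. This sketch assumes the three values have already been collected (for example through a JMX exporter) and that the active controller count is summed across the cluster. `evaluate_broker_metrics` is a hypothetical name.

```python
def evaluate_broker_metrics(
    under_replicated: int,
    active_controllers_total: int,
    handler_idle_ratio: float,
) -> list:
    """Return a list of human-readable problems; empty means healthy."""
    problems = []
    if under_replicated != 0:
        problems.append(f"UnderReplicatedPartitions is {under_replicated}, expected 0")
    if active_controllers_total != 1:
        problems.append(
            f"ActiveControllerCount sums to {active_controllers_total}, expected 1"
        )
    if handler_idle_ratio <= 0.3:
        problems.append(
            f"RequestHandlerAvgIdlePercent is {handler_idle_ratio:.2f}, expected > 0.3"
        )
    return problems


if __name__ == "__main__":
    for problem in evaluate_broker_metrics(2, 1, 0.25):
        print("WARN:", problem)
```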
463
+ ### Prometheus Integration
464
+
465
+ ```yaml
466
+ # prometheus-kafka-exporter config
467
+ # Add to server.properties
468
+ kafka_jmx_exporter:
469
+ jmx_url: "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi"
470
+ lowercaseOutputName: true
471
+ lowercaseOutputLabelNames: true
472
+ whitelistObjectNames:
473
+ - "kafka.server:type=BrokerTopicMetrics,*"
474
+ - "kafka.server:type=ReplicaManager,*"
475
+ - "kafka.controller:type=KafkaController,*"
476
+ - "kafka.server:type=KafkaRequestHandlerPool,*"
477
+ ```
478
+
479
+ ### Health Check Script
480
+
481
+ Run the health check script for comprehensive cluster analysis:
482
+
483
+ ```bash
484
+ python skills/confluent-kafka/scripts/kafka_health_check.py \
485
+ --bootstrap-servers kafka-01:9092,kafka-02:9092,kafka-03:9092 \
486
+ --output reports/kafka/health/
487
+ ```
488
+
489
+ ---
490
+
491
+ ## Configuration Best Practices
492
+
493
+ ### Production Broker Settings
494
+
495
+ ```properties
496
+ # /opt/confluent/etc/kafka/server.properties
497
+
498
+ # KRaft mode settings
499
+ process.roles=broker
500
+ node.id=101
501
+ controller.quorum.voters=1@kafka-controller-01:9093,2@kafka-controller-02:9093,3@kafka-controller-03:9093
502
+ controller.listener.names=CONTROLLER
503
+ inter.broker.listener.name=INTERNAL
504
+
505
+ # Listeners
506
+ listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9094
507
+ listener.security.protocol.map=INTERNAL:SASL_SSL,EXTERNAL:SASL_SSL,CONTROLLER:SASL_SSL
508
+ advertised.listeners=INTERNAL://kafka-01.internal:9092,EXTERNAL://kafka-01.external:9094
509
+
510
+ # Performance tuning
511
+ num.network.threads=8
512
+ num.io.threads=16
513
+ socket.send.buffer.bytes=102400
514
+ socket.receive.buffer.bytes=102400
515
+ socket.request.max.bytes=104857600
516
+
517
+ # Log settings
518
+ log.dirs=/var/kafka-logs
519
+ num.partitions=12
520
+ default.replication.factor=3
521
+ min.insync.replicas=2
522
+ log.retention.hours=168
523
+ log.segment.bytes=1073741824
524
+ log.retention.check.interval.ms=300000
525
+
526
+ # Replication
527
+ replica.lag.time.max.ms=30000
528
+ num.replica.fetchers=4
529
+ replica.fetch.max.bytes=1048576
530
+
531
+ # Compression
532
+ compression.type=producer
533
+
534
+ # Security
535
+ authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer
536
+ super.users=User:admin
537
+ ssl.keystore.location=/var/ssl/kafka/kafka.keystore.jks
538
+ ssl.truststore.location=/var/ssl/kafka/kafka.truststore.jks
539
+ ```
540
+
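Two groups of settings in the block above are worth checking together. `default.replication.factor` and `min.insync.replicas` decide how many brokers can fail before `acks=all` producers are blocked, and `controller.quorum.voters` uses an `id@host:port` list format. A rough sketch that parses both. The function names are hypothetical, and the parser only handles simple `key=value` lines.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, ignoring blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def parse_quorum_voters(value: str) -> list:
    """Split 'id@host:port,...' into (id, host, port) tuples."""
    voters = []
    for entry in value.split(","):
        node_id, endpoint = entry.strip().split("@", 1)
        host, port = endpoint.rsplit(":", 1)
        voters.append((int(node_id), host, int(port)))
    return voters


def tolerated_broker_failures(props: dict) -> int:
    """Replicas that may be lost while acks=all writes still succeed."""
    # Both settings default to 1 in Kafka when not set explicitly.
    replication = int(props.get("default.replication.factor", 1))
    min_isr = int(props.get("min.insync.replicas", 1))
    return replication - min_isr


if __name__ == "__main__":
    import sys

    # Example: python check_broker.py /path/to/server.properties
    with open(sys.argv[1], encoding="utf-8") as handle:
        props = parse_properties(handle.read())
    print("tolerated broker failures:", tolerated_broker_failures(props))
    if "controller.quorum.voters" in props:
        print("voters:", parse_quorum_voters(props["controller.quorum.voters"]))
```

For the values shown above (replication factor 3, `min.insync.replicas=2`) the budget is one broker. A result of zero or less means a single broker outage blocks `acks=all` producers.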
541
+ ### JVM Tuning
542
+
543
+ ```properties
544
+ # /opt/confluent/etc/kafka/jvm.config
545
+ # For a 64GB RAM server with a 24GB heap
546
+
547
+ -Xms24g
548
+ -Xmx24g
549
+ -XX:MetaspaceSize=256m
550
+ -XX:MaxMetaspaceSize=512m
551
+ -XX:+UseG1GC
552
+ -XX:MaxGCPauseMillis=20
553
+ -XX:InitiatingHeapOccupancyPercent=35
554
+ -XX:G1HeapRegionSize=16m
555
+ -XX:MinMetaspaceFreeRatio=50
556
+ -XX:MaxMetaspaceFreeRatio=80
557
+ -XX:+ExplicitGCInvokesConcurrent
558
+ -XX:+PrintFlagsFinal
559
+ -XX:+UnlockDiagnosticVMOptions
560
+ -XX:+UseCompressedOops
561
+ -Djava.awt.headless=true
562
+ ```
563
+
564
+ ---
565
+
566
+ ## Scripts
567
+
568
+ ### Cluster Health Report
569
+
570
+ ```bash
571
+ # Generate comprehensive health report
572
+ python skills/confluent-kafka/scripts/kafka_health_check.py \
573
+ --bootstrap-servers kafka-01:9092,kafka-02:9092,kafka-03:9092 \
574
+ --output reports/kafka/health/ \
575
+ --format both
576
+
577
+ # Quick status check only
578
+ python skills/confluent-kafka/scripts/kafka_health_check.py \
579
+ --bootstrap-servers localhost:9092 \
580
+ --quick
581
+ ```
582
+
583
+ ### Configuration Validator
584
+
585
+ ```bash
586
+ # Validate server.properties before deployment
587
+ python skills/confluent-kafka/scripts/validate_config.py \
588
+ --config /opt/confluent/etc/kafka/server.properties \
589
+ --version 8.0
590
+
591
+ # Compare configurations across brokers
592
+ python skills/confluent-kafka/scripts/validate_config.py \
593
+ --compare broker-01:/opt/confluent/etc/kafka/server.properties \
594
+ broker-02:/opt/confluent/etc/kafka/server.properties
595
+ ```
596
+
597
+ ### Upgrade Pre-flight Check
598
+
599
+ ```bash
600
+ # Run pre-upgrade validation
601
+ python skills/confluent-kafka/scripts/upgrade_preflight.py \
602
+ --current-version 7.6 \
603
+ --target-version 8.0 \
604
+ --bootstrap-servers kafka-01:9092
605
+ ```
606
+
607
+ ---
608
+
609
+ ## Best Practices
610
+
611
+ ### Security
612
+
613
+ 1. **Enable SASL/SSL** — Always use encryption and authentication in production
614
+ 2. **ACLs** — Enable authorization with `authorizer.class.name`
615
+ 3. **Rotate certificates** — Plan for SSL certificate rotation before expiry
616
+ 4. **Secrets management** — Use Vault or AWS Secrets Manager for credentials
617
+
618
+ ### Performance
619
+
620
+ 1. **Partition count** — Start with 12 partitions per topic, scale as needed
621
+ 2. **Replication factor** — Use 3 for durability (min.insync.replicas=2)
622
+ 3. **Compression** — Use `lz4` or `zstd` for producer compression
623
+ 4. **Batch size** — Tune producer `batch.size` and `linger.ms` for throughput
624
+
625
+ ### Reliability
626
+
627
+ 1. **min.insync.replicas=2** — Ensure durability with acks=all producers
628
+ 2. **Unclean leader election** — Keep `unclean.leader.election.enable=false`
629
+ 3. **Regular backups** — Back up controller metadata and configs
630
+ 4. **Monitoring alerts** — Alert on under-replicated partitions, lag, disk
631
+
632
+ ### Maintenance
633
+
634
+ 1. **Rolling restarts** — Always use controlled shutdown for upgrades
635
+ 2. **Documentation** — Keep runbooks for common operations
636
+ 3. **Test upgrades** — Always test in non-prod first
637
+ 4. **Capacity planning** — Monitor growth trends, plan disk expansion
638
+
639
+ ---
640
+
641
+ ## Related Skills
642
+
643
+ - **[aws](../aws/SKILL.md)** — AWS infrastructure for Kafka deployment
644
+ - **[victoriametrics](../victoriametrics/SKILL.md)** — Metrics collection for Kafka monitoring
645
+ - **[consul](../consul/SKILL.md)** — Service discovery integration
646
+
647
+ ---
648
+
649
+ ## External Resources
650
+
651
+ - [Confluent Documentation](https://docs.confluent.io/platform/current/overview.html)
652
+ - [Apache Kafka Documentation](https://kafka.apache.org/documentation/)
653
+ - [KRaft Migration Guide](https://docs.confluent.io/platform/current/installation/migrate-zk-kraft.html)
654
+ - [Confluent Platform Release Notes](https://docs.confluent.io/platform/current/release-notes/index.html)
655
+ - [Kafka Operations Best Practices](https://docs.confluent.io/platform/current/kafka/operations.html)