@autofleet/kafka 0.1.1 → 0.3.1

package/README.md CHANGED
@@ -7,12 +7,16 @@ Internal wrapper for Apache Kafka producer using [@platformatic/kafka](https://w
7
7
  - [Installation](#installation)
8
8
  - [Features](#features)
9
9
  - [Quick Start](#quick-start)
10
+ - [Initialization & Bootstrap](#initialization--bootstrap)
11
+ - [Lifecycle Overview](#lifecycle-overview)
12
+ - [Production Bootstrap Pattern](#production-bootstrap-pattern)
13
+ - [Development Bootstrap Pattern](#development-bootstrap-pattern)
14
+ - [Bootstrap Options](#bootstrap-options)
10
15
  - [Usage](#usage)
11
16
  - [Setup with Multiple Producers](#setup-with-multiple-producers)
12
17
  - [Publishing Messages](#publishing-messages)
13
18
  - [Mock Mode (Disable Kafka)](#mock-mode-disable-kafka)
14
19
  - [Health Checks & Readiness Probes](#health-checks--readiness-probes)
15
- - [Migration from getKafka() Pattern](#migration-from-getkafka-pattern)
16
20
  - [Configuration](#configuration)
17
21
  - [Advanced Features](#advanced-features)
18
22
  - [Message Keys for Partitioning](#message-keys-for-partitioning)
@@ -20,12 +24,13 @@ Internal wrapper for Apache Kafka producer using [@platformatic/kafka](https://w
20
24
  - [Partition Control](#partition-control)
21
25
  - [API Reference](#api-reference)
22
26
  - [Best Practices](#best-practices)
27
+ - [Troubleshooting](#troubleshooting)
23
28
  - [Testing](#testing)
24
29
 
25
30
  ## Installation
26
31
 
27
32
  ```bash
28
- pnpm add @autofleet/kafka
33
+ npm add @autofleet/kafka
29
34
  ```
30
35
 
31
36
  ## Features
@@ -33,7 +38,7 @@ pnpm add @autofleet/kafka
33
38
  - **Multi-Producer Management** - Manage multiple named producers with different broker configurations
34
39
  - **Type-Safe Producer Names** - Producer names are typed based on your configuration with full autocomplete
35
40
  - **Built-in Mock Mode** - Disable Kafka entirely with automatic mock implementations
36
- - **Direct API Access** - No need for `getKafka()` wrappers, access producers directly
41
+ - **Direct API Access** - Access producers directly with a clean, simple interface
37
42
  - **Centralized Health Checks** - Single readiness check for all producers
38
43
  - **Automatic Connection Management** - Handles connection lifecycle automatically
39
44
  - **Batch Publishing** - Efficient batch message sending
@@ -66,7 +71,7 @@ const kafka = KafkaManager.create({
66
71
  },
67
72
  });
68
73
 
69
- // Publish directly - no getKafka() needed!
74
+ // Publish messages directly
70
75
  // Producers initialize automatically on first publish
71
76
  // TypeScript knows 'main' and 'analytics' are the only valid producer names!
72
77
  await kafka.publish('main', 'user-events', {
@@ -91,6 +96,177 @@ app.get('/health/ready', async (req, res) => {
91
96
  });
92
97
  ```
93
98
 
99
+ ## Initialization & Bootstrap
100
+
101
+ ### Lifecycle Overview
102
+
103
+ The `@autofleet/kafka` package follows a four-stage lifecycle:
104
+
105
+ ```
106
+ ┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐
107
+ │ create() │────▶│ bootstrap() │────▶│ publish/use │────▶│disconnect()│
108
+ │ (sync) │ │ (async) │ │ (async) │ │ (async) │
109
+ └─────────────┘ └──────────────┘ └──────────────┘ └────────────┘
110
+ Instant Validates Runtime Cleanup
111
+ construction connectivity operations
112
+ ```
113
+
114
+ **1. `create()` - Synchronous Construction**
115
+ - Creates the KafkaManager instance
116
+ - Sets up configuration
117
+ - Does NOT connect to Kafka yet
118
+ - Returns immediately
119
+
120
+ **2. `bootstrap()` - Async Initialization**
121
+ - Explicitly connects all producers to Kafka
122
+ - Validates broker connectivity
123
+ - Fetches cluster metadata
124
+ - Fails fast with detailed diagnostics
125
+ - **Recommended before serving traffic**
126
+
127
+ **3. Runtime Operations**
128
+ - `publish()`, `publishBatch()` work normally
129
+ - Lazy initialization is supported: producers that were not bootstrapped connect on first use
130
+ - Health checks available
131
+
132
+ **4. `disconnect()` - Cleanup**
133
+ - Gracefully closes all connections
134
+ - Runs automatically on SIGTERM/SIGINT unless `dontGracefulShutdown` is `true`
135
+
136
+ ### Production Bootstrap Pattern
137
+
138
+ **Recommended production service initialization:**
139
+
140
+ ```typescript
141
+ import { KafkaManager } from '@autofleet/kafka';
142
+ import logger from './logger';
143
+
144
+ // 1. Synchronous construction
145
+ export const kafka = KafkaManager.create({
146
+ enabled: process.env.ENABLE_KAFKA === 'true',
147
+ logger,
148
+ bootstrapTimeoutMs: 15000, // 15s bootstrap timeout
149
+ healthCheckCacheMs: 2000, // Cache health for 2s
150
+ strictBootstrap: true, // Fail if ANY producer fails
151
+ producers: {
152
+ main: {
153
+ brokers: process.env.KAFKA_BROKERS?.split(',') || [],
154
+ clientId: 'my-service-main',
155
+ },
156
+ analytics: {
157
+ brokers: process.env.ANALYTICS_BROKERS?.split(',') || [],
158
+ clientId: 'my-service-analytics',
159
+ },
160
+ },
161
+ });
162
+
163
+ // 2. Bootstrap before serving traffic
164
+ async function startServer() {
165
+ logger.info('Bootstrapping Kafka...');
166
+
167
+ try {
168
+ const result = await kafka.bootstrap();
169
+
170
+ logger.info('Kafka bootstrap successful', {
171
+ duration: result.duration,
172
+ producers: Object.keys(result.results),
173
+ });
174
+ } catch (error) {
175
+ logger.error('Fatal: Kafka bootstrap failed', { error });
176
+ process.exit(1); // Fail fast in production
177
+ }
178
+
179
+ // 3. Start HTTP server only after Kafka is ready
180
+ app.listen(3000, () => {
181
+ logger.info('Server ready - Kafka connected');
182
+ });
183
+ }
184
+
185
+ startServer();
186
+ ```
187
+
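The pattern above passes `process.env.KAFKA_BROKERS?.split(',') || []` straight into the producer config. Note that `''.split(',')` yields `['']`, which is truthy, so the `|| []` fallback does not apply to an empty variable, and stray spaces around commas are kept. The following is a minimal sketch of a hypothetical helper (`parseBrokers` and `requireBrokers` are not part of `@autofleet/kafka`) that could normalize the value before it reaches `KafkaManager.create()`:

```typescript
// Hypothetical helpers, not part of @autofleet/kafka.
// Turns a comma-separated broker list into a clean array of host:port strings.
function parseBrokers(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(',')
    .map((broker) => broker.trim())
    .filter((broker) => broker.length > 0);
}

// Throws early when a producer would be configured with no brokers at all.
function requireBrokers(producerName: string, raw: string | undefined): string[] {
  const brokers = parseBrokers(raw);
  if (brokers.length === 0) {
    throw new Error(`No brokers configured for producer '${producerName}'`);
  }
  return brokers;
}
```

Under these assumptions, `brokers: requireBrokers('main', process.env.KAFKA_BROKERS)` would surface a missing variable at construction time instead of during `bootstrap()`.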
188
+ ### Development Bootstrap Pattern
189
+
190
+ **For development/local environments with optional Kafka:**
191
+
192
+ ```typescript
193
+ async function startServer() {
194
+ // Try to bootstrap, but don't fail if Kafka unavailable
195
+ try {
196
+ await kafka.bootstrap({
197
+ timeoutMs: 5000, // Shorter timeout for dev
198
+ strict: false, // Allow partial failures
199
+ });
200
+ logger.info('Kafka connected');
201
+ } catch (error) {
202
+ logger.warn('Kafka unavailable - using lazy initialization', { error });
203
+ // Server still starts, producers will retry on first publish
204
+ }
205
+
206
+ app.listen(3000, () => {
207
+ logger.info('Server ready');
208
+ });
209
+ }
210
+ ```
211
+
212
+ ### Bootstrap Options
213
+
214
+ #### Strict Mode (Default)
215
+
216
+ Bootstrap fails if **any** producer fails to connect:
217
+
218
+ ```typescript
219
+ await kafka.bootstrap(); // strict: true by default
220
+
221
+ // Throws if any producer fails
222
+ // Perfect for production where all clusters must be available
223
+ ```
224
+
225
+ #### Non-Strict Mode
226
+
227
+ Bootstrap succeeds if **at least one** producer connects:
228
+
229
+ ```typescript
230
+ const result = await kafka.bootstrap({
231
+ strict: false,
232
+ });
233
+
234
+ if (!result.success) {
235
+ logger.warn('Some producers failed:', result.results);
236
+ }
237
+
238
+ // Check which ones failed
239
+ const failed = Object.entries(result.results)
240
+ .filter(([_, r]) => !r.success)
241
+ .map(([name]) => name);
242
+
243
+ if (failed.length > 0) logger.warn('Failed producers:', failed);
244
+ // Continue - failed producers use lazy initialization
245
+ ```
246
+
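The difference between the two modes can be sketched as a pure function over per-producer results. This is an illustration of the rule stated in this README (strict: all producers must succeed; non-strict: at least one must succeed), not the package's actual implementation. `ProducerResult` and `evaluateBootstrap` are hypothetical names, and only the `success` field is taken from the examples above.

```typescript
// Assumed shape of a per-producer result; only `success` appears in the README examples.
interface ProducerResult {
  success: boolean;
}

// Strict: every producer must succeed. Non-strict: at least one must succeed.
function evaluateBootstrap(
  results: Record<string, ProducerResult>,
  strict: boolean,
): { success: boolean; failed: string[] } {
  const entries = Object.entries(results);
  const failed = entries.filter(([, r]) => !r.success).map(([name]) => name);
  const succeeded = entries.length - failed.length;
  const success = strict ? failed.length === 0 : succeeded > 0;
  return { success, failed };
}
```

With one healthy and one failing producer, this sketch reports failure in strict mode and success in non-strict mode, while `failed` lists the same producer in both cases.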
247
+ #### Selective Bootstrap
248
+
249
+ Bootstrap specific producers only:
250
+
251
+ ```typescript
252
+ // Only bootstrap critical producer
253
+ await kafka.bootstrap({
254
+ producers: ['main'],
255
+ timeoutMs: 10000,
256
+ });
257
+
258
+ // Analytics producer will use lazy initialization
259
+ ```
260
+
261
+ #### Custom Timeouts
262
+
263
+ ```typescript
264
+ await kafka.bootstrap({
265
+ timeoutMs: 20000, // 20 second timeout
266
+ strict: true,
267
+ });
268
+ ```
269
+
94
270
  ## Usage
95
271
 
96
272
  ### Setup with Multiple Producers
@@ -133,12 +309,12 @@ export const TOPICS = {
133
309
 
134
310
  ### Publishing Messages
135
311
 
136
- Now you can use it directly throughout your service with full type safety:
312
+ Use the Kafka manager directly throughout your service with full type safety:
137
313
 
138
314
  ```typescript
139
315
  import { kafka, TOPICS } from './kafka';
140
316
 
141
- // Publish directly - no getKafka() overhead!
317
+ // Publish directly
142
318
  // TypeScript validates producer names: 'main' | 'analytics'
143
319
  await kafka.publish('main', TOPICS.DRIVER_CONSENT_V1, {
144
320
  state: 'accepted',
@@ -202,100 +378,150 @@ await kafka.publish('main', 'topic', { data: 'test' });
202
378
 
203
379
  ### Health Checks & Readiness Probes
204
380
 
205
- Use the built-in health check for Kubernetes readiness probes:
381
+ The package provides comprehensive health check APIs suitable for Kubernetes probes:
382
+
383
+ #### Readiness Probe
384
+
385
+ Checks if Kafka is ready to handle traffic (connects to brokers):
206
386
 
207
387
  ```typescript
208
388
  import { kafka } from './kafka';
209
389
 
210
- // Express/Fastify/etc
211
390
  app.get('/health/ready', async (req, res) => {
212
391
  const ready = await kafka.isReady();
213
- res.status(ready ? 200 : 503).json({
214
- ready,
215
- kafka: kafka.getConnectionStatus(),
216
- });
217
- });
218
392
 
219
- // Or manually check each producer
220
- app.get('/health/detailed', async (req, res) => {
221
- const status = kafka.getConnectionStatus();
222
- const allConnected = Object.values(status).every(s => s);
393
+ if (!ready) {
394
+ const health = kafka.getHealth();
395
+ return res.status(503).json({
396
+ ready: false,
397
+ kafka: health,
398
+ });
399
+ }
223
400
 
224
- res.status(allConnected ? 200 : 503).json({
225
- producers: status,
226
- enabled: kafka.isEnabled,
227
- });
401
+ res.json({ ready: true });
228
402
  });
229
403
  ```
230
404
 
231
- ### Migration from getKafka() Pattern
405
+ **Readiness check features:**
406
+ - Revalidates connectivity if the cached result is stale
407
+ - Configurable cache duration (default: 1 second)
408
+ - Force revalidation with `isReady({ force: true })`
409
+ - Respects `healthCheckTimeoutMs` configuration
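The caching behaviour listed above can be illustrated with a small time-based cache. This is a sketch under stated assumptions, not the package's implementation: `createCachedCheck` is a hypothetical helper, the probe is injected, and the clock is a parameter so that staleness is easy to reason about.

```typescript
// Hypothetical sketch of time-based caching for a readiness probe result.
// `probe` and `now` are injected so behaviour does not depend on a real broker or clock.
function createCachedCheck<T>(
  probe: () => T,
  cacheMs: number,
  now: () => number = Date.now,
): (options?: { force?: boolean }) => T {
  let cached: { value: T; at: number } | undefined;

  return (options = {}) => {
    const entry = cached;
    const isFresh =
      entry !== undefined && cacheMs > 0 && now() - entry.at < cacheMs;
    if (entry !== undefined && isFresh && !options.force) {
      return entry.value; // Serve the cached result while it is still fresh.
    }
    const value = probe(); // Cache miss, stale entry, or forced revalidation.
    cached = { value, at: now() };
    return value;
  };
}
```

If `T` is a `Promise<boolean>`, the same sketch caches the in-flight promise, so concurrent probe requests within the cache window share one check. A `cacheMs` of `0` disables caching, matching the "always revalidate" setting described in the configuration section.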
232
410
 
233
- **Before (problematic pattern):**
411
+ #### Liveness Probe
412
+
413
+ Lightweight check for process health (does NOT connect to Kafka):
234
414
 
235
415
  ```typescript
236
- // kafka.ts
237
- let kafkaInstance: AfKafka | null = null;
238
- const ENABLE_KAFKA = process.env.ENABLE_KAFKA === 'true';
416
+ app.get('/health/live', (req, res) => {
417
+ const live = kafka.isLive();
418
+ res.status(live ? 200 : 503).json({ live });
419
+ });
420
+ ```
239
421
 
240
- const disabledKafkaMock: Partial<AfKafka> = {
241
- ping: async () => { logger.info('Kafka disabled'); },
242
- publish: async (topic, message) => {
243
- logger.debug('Skipping publish', { topic, message });
244
- return [];
245
- },
246
- };
422
+ **Liveness check:**
423
+ - Returns `false` if manager is in a fatal state
424
+ - Returns `false` if graceful shutdown has started
425
+ - Does NOT perform Kafka operations (very fast)
426
+ - Perfect for Kubernetes liveness probes
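As a rough illustration of why such a check is cheap, liveness can be derived from two in-memory flags. The class and field names below are assumptions for the sketch and say nothing about how `KafkaManager` stores its state internally.

```typescript
// Hypothetical sketch: liveness derived purely from in-memory flags.
class LivenessState {
  private fatal = false;
  private shuttingDown = false;

  markFatal(): void {
    this.fatal = true;
  }

  markShuttingDown(): void {
    this.shuttingDown = true;
  }

  // No network I/O happens here, so the check cannot hang on a slow broker.
  isLive(): boolean {
    return !this.fatal && !this.shuttingDown;
  }
}
```

Keeping broker connectivity out of this path means a Kafka outage fails the readiness probe without causing Kubernetes to restart an otherwise healthy process.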
247
427
 
248
- async function getKafka(): Promise<AfKafka> {
249
- if (!ENABLE_KAFKA) {
250
- return disabledKafkaMock as AfKafka;
251
- }
428
+ #### Detailed Health Snapshot
252
429
 
253
- if (!kafkaInstance) {
254
- const { default: Kafka } = await import('@autofleet/kafka');
255
- kafkaInstance = await Kafka.create({
256
- brokers: ['kafka:9092'],
257
- clientId: 'my-service',
258
- });
259
- }
260
- return kafkaInstance;
261
- }
430
+ Get comprehensive health information for all producers:
431
+
432
+ ```typescript
433
+ app.get('/status/kafka', (req, res) => {
434
+ const health = kafka.getHealth();
435
+
436
+ res.json({
437
+ producers: health,
438
+ enabled: kafka.isEnabled,
439
+ live: kafka.isLive(),
440
+ });
441
+ });
262
442
 
263
- // Usage - awkward!
264
- async function publishEvent() {
265
- const kafka = await getKafka(); // Overhead on every call
266
- await kafka.publish('topic', data);
443
+ // Example response:
444
+ {
445
+ "producers": {
446
+ "main": {
447
+ "name": "main",
448
+ "enabled": true,
449
+ "isConnected": true,
450
+ "lastPingAt": 1701234567890,
451
+ "lastPingSucceededAt": 1701234567890,
452
+ "lastError": null,
453
+ "clusterId": "prod-kafka-main-01",
454
+ "brokerCount": 3,
455
+ "brokers": ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
456
+ },
457
+ "analytics": {
458
+ "name": "analytics",
459
+ "enabled": true,
460
+ "isConnected": false,
461
+ "lastPingAt": 1701234560000,
462
+ "lastPingSucceededAt": 1701234500000,
463
+ "lastError": {
464
+ "message": "Connection refused",
465
+ "timestamp": 1701234560000
466
+ },
467
+ "clusterId": null,
468
+ "brokerCount": 0,
469
+ "brokers": ["analytics-kafka:9092"]
470
+ }
471
+ },
472
+ "enabled": true,
473
+ "live": true
267
474
  }
268
475
  ```
269
476
 
270
- **After (clean pattern):**
477
+ #### Connection Status
478
+
479
+ Get a quick connection status summary:
271
480
 
272
481
  ```typescript
273
- // kafka.ts
274
- import { KafkaManager } from '@autofleet/kafka';
275
- import logger from './logger';
482
+ const status = kafka.getConnectionStatus();
276
483
 
277
- // Synchronous initialization - no await!
278
- export const kafka = KafkaManager.create({
279
- enabled: process.env.ENABLE_KAFKA === 'true', // Built-in mock mode!
280
- logger,
281
- producers: {
282
- main: {
283
- brokers: ['kafka:9092'],
284
- clientId: 'my-service',
285
- },
484
+ // Returns:
485
+ {
486
+ "main": {
487
+ "connected": true,
488
+ "lastSuccessAt": 1701234567890,
489
+ "lastError": null
286
490
  },
287
- });
288
-
289
- export const TOPICS = {
290
- MY_TOPIC: 'my.topic',
291
- } as const;
292
-
293
- // Usage - clean and direct!
294
- async function publishEvent() {
295
- await kafka.publish('main', TOPICS.MY_TOPIC, data); // Direct access!
491
+ "analytics": {
492
+ "connected": false,
493
+ "lastSuccessAt": 1701234500000,
494
+ "lastError": "Connection refused"
495
+ }
296
496
  }
297
497
  ```
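Comparing the two example payloads, the connection status reads like a condensed view of the health snapshot. The sketch below shows that relationship as an assumption drawn only from those examples. The `ProducerHealthLike` and `ConnectionStatusLike` shapes are hypothetical, and the exported types may differ.

```typescript
// Shapes inferred from the example responses; the real exported types may differ.
interface ProducerHealthLike {
  isConnected: boolean;
  lastPingSucceededAt: number | null;
  lastError: { message: string; timestamp: number } | null;
}

interface ConnectionStatusLike {
  connected: boolean;
  lastSuccessAt: number | null;
  lastError: string | null;
}

// Condenses a detailed health snapshot into a per-producer status summary.
function summarizeHealth(
  health: Record<string, ProducerHealthLike>,
): Record<string, ConnectionStatusLike> {
  const summary: Record<string, ConnectionStatusLike> = {};
  for (const [producerName, h] of Object.entries(health)) {
    summary[producerName] = {
      connected: h.isConnected,
      lastSuccessAt: h.lastPingSucceededAt,
      lastError: h.lastError ? h.lastError.message : null,
    };
  }
  return summary;
}

// True only when every producer in the summary reports a live connection.
function allConnected(status: Record<string, ConnectionStatusLike>): boolean {
  return Object.values(status).every((s) => s.connected);
}
```

A status endpoint could use `allConnected` to pick between `200` and `503` while still returning the per-producer detail in the body.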
298
498
 
499
+ #### Kubernetes Example
500
+
501
+ ```yaml
502
+ apiVersion: v1
503
+ kind: Pod
504
+ metadata:
505
+ name: my-service
506
+ spec:
507
+ containers:
508
+ - name: app
509
+ image: my-service:latest
510
+ livenessProbe:
511
+ httpGet:
512
+ path: /health/live
513
+ port: 3000
514
+ initialDelaySeconds: 10
515
+ periodSeconds: 10
516
+ readinessProbe:
517
+ httpGet:
518
+ path: /health/ready
519
+ port: 3000
520
+ initialDelaySeconds: 15
521
+ periodSeconds: 5
522
+ timeoutSeconds: 3
523
+ ```
524
+
299
525
  ## Configuration
300
526
 
301
527
  ### KafkaManagerOptions
@@ -303,16 +529,41 @@ async function publishEvent() {
303
529
  ```typescript
304
530
  interface KafkaManagerOptions {
305
531
  // Enable/disable Kafka - when false, returns mock implementations
532
+ // @default true
306
533
  enabled?: boolean;
307
534
 
308
535
  // Custom logger instance
309
536
  logger?: LoggerInstanceManager;
310
537
 
311
- // Skip automatic graceful shutdown (default: false)
538
+ // Skip automatic graceful shutdown
539
+ // @default false
312
540
  dontGracefulShutdown?: boolean;
313
541
 
314
- // Named producers configuration
542
+ // Named producers configuration (required)
315
543
  producers: Record<string, ProducerConfig>;
544
+
545
+ // Health & Bootstrap Options
546
+
547
+ // Global timeout for health checks in ms
548
+ // Used by isReady() and ping operations
549
+ // @default 5000 (5 seconds)
550
+ healthCheckTimeoutMs?: number;
551
+
552
+ // How long to cache health check results in ms
553
+ // Set to 0 to always revalidate
554
+ // @default 1000 (1 second)
555
+ healthCheckCacheMs?: number;
556
+
557
+ // Default timeout for bootstrap operations in ms
558
+ // Can be overridden per bootstrap() call
559
+ // @default 30000 (30 seconds)
560
+ bootstrapTimeoutMs?: number;
561
+
562
+ // Default strict mode for bootstrap
563
+ // If true, all producers must succeed
564
+ // If false, at least one producer must succeed
565
+ // @default true
566
+ strictBootstrap?: boolean;
316
567
  }
317
568
  ```
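To make the documented `@default` values concrete, here is a sketch of how optional settings could be resolved against them. The interface is a trimmed, hypothetical copy of the options above (it omits `logger` and `producers`), and `withDefaults` is not a function exported by the package.

```typescript
// Trimmed, hypothetical copy of the optional settings documented above.
interface HealthAndBootstrapOptions {
  enabled?: boolean;
  dontGracefulShutdown?: boolean;
  healthCheckTimeoutMs?: number;
  healthCheckCacheMs?: number;
  bootstrapTimeoutMs?: number;
  strictBootstrap?: boolean;
}

// Defaults as stated in the @default comments of KafkaManagerOptions.
const DOCUMENTED_DEFAULTS: Required<HealthAndBootstrapOptions> = {
  enabled: true,
  dontGracefulShutdown: false,
  healthCheckTimeoutMs: 5000,
  healthCheckCacheMs: 1000,
  bootstrapTimeoutMs: 30000,
  strictBootstrap: true,
};

// `??` keeps explicit falsy values such as `0` and `false`; `||` would discard them.
function withDefaults(
  options: HealthAndBootstrapOptions,
): Required<HealthAndBootstrapOptions> {
  return {
    enabled: options.enabled ?? DOCUMENTED_DEFAULTS.enabled,
    dontGracefulShutdown:
      options.dontGracefulShutdown ?? DOCUMENTED_DEFAULTS.dontGracefulShutdown,
    healthCheckTimeoutMs:
      options.healthCheckTimeoutMs ?? DOCUMENTED_DEFAULTS.healthCheckTimeoutMs,
    healthCheckCacheMs:
      options.healthCheckCacheMs ?? DOCUMENTED_DEFAULTS.healthCheckCacheMs,
    bootstrapTimeoutMs:
      options.bootstrapTimeoutMs ?? DOCUMENTED_DEFAULTS.bootstrapTimeoutMs,
    strictBootstrap: options.strictBootstrap ?? DOCUMENTED_DEFAULTS.strictBootstrap,
  };
}
```

The choice of `??` matters for `healthCheckCacheMs: 0` and `strictBootstrap: false`, both of which are meaningful settings rather than missing values.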
318
569
 
@@ -402,9 +653,11 @@ await kafka.publish('main', 'events', data, {
402
653
 
403
654
  ## API Reference
404
655
 
405
- ### `KafkaManager.create(options): KafkaManager<ProducerNames>`
656
+ ### Core Lifecycle Methods
657
+
658
+ #### `KafkaManager.create(options): KafkaManager<ProducerNames>`
406
659
 
407
- Creates a new KafkaManager instance with multiple named producers. Producers are initialized lazily on first use. **Producer names are type-safe** - TypeScript will autocomplete and validate them based on your configuration.
660
+ Creates a new KafkaManager instance with multiple named producers. **Producer names are type-safe** - TypeScript will autocomplete and validate them based on your configuration.
408
661
 
409
662
  **Parameters:**
410
663
  - `options` - Configuration options (see [Configuration](#configuration))
@@ -437,7 +690,123 @@ await kafka.publish('analytics', 'metrics', { value: 1 }); // ✅ Valid
437
690
  // await kafka.publish('wrong', 'topic', {}); // ❌ TypeScript error!
438
691
  ```
439
692
 
440
- ### `publish<T>(producerName, topic, value, options?): Promise<RecordMetadata[]>`
693
+ #### `bootstrap(options?): Promise<BootstrapResult>`
694
+
695
+ Explicitly bootstrap all (or specific) producers. Connects to brokers, validates connectivity, and returns detailed results. **Recommended before serving traffic in production.**
696
+
697
+ **Parameters:**
698
+ - `options.timeoutMs` (optional) - Maximum time to wait for all producers (default: `bootstrapTimeoutMs` config)
699
+ - `options.strict` (optional) - Fail if ANY producer fails (default: `strictBootstrap` config)
700
+ - `options.producers` (optional) - Array of specific producer names to bootstrap (default: all)
701
+
702
+ **Returns:** `BootstrapResult` with success status, duration, and per-producer results
703
+
704
+ **Examples:**
705
+ ```typescript
706
+ // Bootstrap all producers with defaults
707
+ await kafka.bootstrap();
708
+
709
+ // Non-strict mode - continue if some fail
710
+ const result = await kafka.bootstrap({
711
+ timeoutMs: 10000,
712
+ strict: false,
713
+ });
714
+ if (!result.success) {
715
+ console.error('Some producers failed:', result.results);
716
+ }
717
+
718
+ // Bootstrap specific producers only
719
+ await kafka.bootstrap({ producers: ['main'] });
720
+ ```
721
+
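The parameter list says each per-call option falls back to a manager-level setting. A minimal sketch of that precedence follows. All names are hypothetical, and `producerNames` stands in for "all configured producers".

```typescript
// Hypothetical shapes for the per-call options and the manager-level settings.
interface BootstrapCallOptions {
  timeoutMs?: number;
  strict?: boolean;
  producers?: string[];
}

interface BootstrapSettings {
  bootstrapTimeoutMs: number;
  strictBootstrap: boolean;
  producerNames: string[];
}

// Per-call values win; anything omitted falls back to the manager configuration.
function resolveBootstrapCall(
  call: BootstrapCallOptions,
  settings: BootstrapSettings,
): { timeoutMs: number; strict: boolean; producers: string[] } {
  return {
    timeoutMs: call.timeoutMs ?? settings.bootstrapTimeoutMs,
    strict: call.strict ?? settings.strictBootstrap,
    producers: call.producers ?? settings.producerNames,
  };
}
```

Under this reading, `bootstrap({ producers: ['main'] })` keeps the configured timeout and strictness and narrows only the set of producers.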
722
+ ### Health & Monitoring Methods
723
+
724
+ #### `getHealth(): Record<ProducerNames, ProducerHealth>`
725
+
726
+ Get detailed health snapshot for all producers. Returns comprehensive metadata including connection state, timestamps, errors, and cluster info.
727
+
728
+ **Returns:** Object mapping producer names to `ProducerHealth` objects
729
+
730
+ **Example:**
731
+ ```typescript
732
+ const health = kafka.getHealth();
733
+ console.log(health.main.clusterId); // 'prod-kafka-01'
734
+ console.log(health.main.lastPingSucceededAt);// 1701234567890
735
+ console.log(health.main.lastError); // null or error object
736
+ ```
737
+
738
+ #### `getProducerHealth(name): ProducerHealth`
739
+
740
+ Get detailed health for a specific producer.
741
+
742
+ **Parameters:**
743
+ - `name` - Producer name (type-safe)
744
+
745
+ **Returns:** `ProducerHealth` object
746
+
747
+ **Example:**
748
+ ```typescript
749
+ const health = kafka.getProducerHealth('main');
750
+ console.log('Cluster:', health.clusterId);
751
+ console.log('Brokers:', health.brokerCount);
752
+ ```
753
+
754
+ #### `isReady(options?): Promise<boolean>`
755
+
756
+ Check if Kafka is ready to handle traffic. Revalidates connectivity if cache is stale. **Use for Kubernetes readiness probes.**
757
+
758
+ **Parameters:**
759
+ - `options.timeout` (optional) - Override default health check timeout
760
+ - `options.force` (optional) - Force revalidation, ignore cache
761
+
762
+ **Returns:** `true` if all producers are connected, `false` otherwise
763
+
764
+ **Examples:**
765
+ ```typescript
766
+ // Use cached result if fresh
767
+ const ready = await kafka.isReady();
768
+
769
+ // Force immediate revalidation
770
+ const ready = await kafka.isReady({ force: true });
771
+
772
+ // Custom timeout
773
+ const ready = await kafka.isReady({ timeout: 10000 });
774
+ ```
775
+
776
+ #### `isLive(): boolean`
777
+
778
+ Lightweight liveness check - does NOT perform Kafka operations. Only checks internal state for fatal errors. **Perfect for Kubernetes liveness probes.**
779
+
780
+ **Returns:** `false` if the manager is in a fatal state or graceful shutdown has started; `true` otherwise
781
+
782
+ **Example:**
783
+ ```typescript
784
+ app.get('/health/live', (req, res) => {
785
+ res.status(kafka.isLive() ? 200 : 503).json({ live: kafka.isLive() });
786
+ });
787
+ ```
788
+
789
+ #### `getConnectionStatus(): Record<ProducerNames, ConnectionStatus>`
790
+
791
+ Get connection status for all producers with timestamps and errors.
792
+
793
+ **Returns:** Object mapping producer names to connection status
794
+
795
+ **Example:**
796
+ ```typescript
797
+ const status = kafka.getConnectionStatus();
798
+ // {
799
+ // main: {
800
+ // connected: true,
801
+ // lastSuccessAt: 1701234567890,
802
+ // lastError: null
803
+ // }
804
+ // }
805
+ ```
806
+
807
+ ### Publishing Methods
808
+
809
+ #### `publish<T>(producerName, topic, value, options?): Promise<RecordMetadata[]>`
441
810
 
442
811
  Publishes a single message to a topic using a named producer. Producer name is **type-safe** - must match one of the configured producers.
443
812
 
@@ -657,18 +1026,212 @@ await kafka.publish('main', 'orders', orderData);
657
1026
  await kafka.publish('analytics', 'metrics', metricsData);
658
1027
  ```
659
1028
 
1029
+ ## Troubleshooting
1030
+
1031
+ ### Common Issues and Solutions
1032
+
1033
+ #### Bootstrap Fails with Connection Timeout
1034
+
1035
+ **Problem:**
1036
+ ```
1037
+ [main] Failed to connect to Kafka brokers: Connection timeout after 5000ms
1038
+
1039
+ Possible causes:
1040
+ 1. Brokers are unreachable: kafka:9092
1041
+ 2. Network connectivity issues
1042
+ 3. Firewall blocking ports
1043
+ ```
1044
+
1045
+ **Solutions:**
1046
+ 1. **Verify brokers are running:**
1047
+ ```bash
1048
+ kubectl get pods -l app=kafka
1049
+ ```
1050
+
1051
+ 2. **Test connectivity:**
1052
+ ```bash
1053
+ nc -zv kafka 9092
1054
+ ```
1055
+
1056
+ 3. **Check network policies:**
1057
+ - Ensure service can reach Kafka namespace
1058
+ - Verify firewall rules allow traffic on port 9092
1059
+
1060
+ 4. **Increase timeout:**
1061
+ ```typescript
1062
+ await kafka.bootstrap({ timeoutMs: 30000 }); // 30 seconds
1063
+ ```
1064
+
1065
+ #### SASL Authentication Failed
1066
+
1067
+ **Problem:**
1068
+ ```
1069
+ [main] Failed to connect to Kafka brokers: SASL authentication failed
1070
+ ```
1071
+
1072
+ **Solutions:**
1073
+ 1. **Verify credentials:**
1074
+ ```typescript
1075
+ producers: {
1076
+ main: {
1077
+ brokers: ['kafka:9092'],
1078
+ sasl: {
1079
+ mechanism: 'scram-sha-256',
1080
+ username: process.env.KAFKA_USERNAME, // Check this
1081
+ password: process.env.KAFKA_PASSWORD, // And this
1082
+ },
1083
+ },
1084
+ }
1085
+ ```
1086
+
1087
+ 2. **Check mechanism:**
1088
+ - Verify broker supports the SASL mechanism
1089
+ - Common mechanisms: `'plain'`, `'scram-sha-256'`, `'scram-sha-512'`
1090
+
1091
+ 3. **Inspect secrets:**
1092
+ ```bash
1093
+ kubectl get secret kafka-credentials -o yaml
1094
+ ```
1095
+
1096
+ #### Producer Not Found Error
1097
+
1098
+ **Problem:**
1099
+ ```
1100
+ Producer 'analytics' not found. Available producers: main
1101
+ ```
1102
+
1103
+ **Solution:**
1104
+ The producer name is not present in your configuration. Add it under `producers`:
1105
+
1106
+ ```typescript
1107
+ // Add the missing producer
1108
+ const kafka = KafkaManager.create({
1109
+ producers: {
1110
+ main: { brokers: ['kafka:9092'] },
1111
+ analytics: { brokers: ['kafka-analytics:9092'] }, // Add this
1112
+ },
1113
+ });
1114
+ ```
1115
+
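For plain JavaScript callers, or names that arrive as runtime strings, the compile-time check does not help, so a runtime lookup is what produces the message shown above. The sketch below reproduces that message format with a hypothetical `getProducerOrThrow` helper. It is an illustration, not the package's code.

```typescript
// Hypothetical lookup that mirrors the error message format shown above.
function getProducerOrThrow<T>(producers: Record<string, T>, producerName: string): T {
  if (!Object.prototype.hasOwnProperty.call(producers, producerName)) {
    const available = Object.keys(producers).join(', ');
    throw new Error(
      `Producer '${producerName}' not found. Available producers: ${available}`,
    );
  }
  return producers[producerName] as T;
}
```

Listing the available names in the error makes typos and missing configuration entries easy to tell apart.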
1116
+ #### Readiness Check Always Fails
1117
+
1118
+ **Problem:**
1119
+ Health checks keep failing even though Kafka is up.
1120
+
1121
+ **Solutions:**
1122
+ 1. **Check timeout is sufficient:**
1123
+ ```typescript
1124
+ const kafka = KafkaManager.create({
1125
+ healthCheckTimeoutMs: 10000, // Default is 5000; raise it if checks time out
1126
+ });
1127
+ ```
1128
+
1129
+ 2. **Force revalidation:**
1130
+ ```typescript
1131
+ const ready = await kafka.isReady({ force: true });
1132
+ ```
1133
+
1134
+ 3. **Inspect detailed health:**
1135
+ ```typescript
1136
+ const health = kafka.getHealth();
1137
+ console.log(health); // Check lastError for details
1138
+ ```
1139
+
1140
+ #### Bootstrap Succeeds but Publish Fails
1141
+
1142
+ **Problem:**
1143
+ Bootstrap passes, but publish throws "topic auto-create disabled".
1144
+
1145
+ **Solution:**
1146
+ Enable topic auto-creation or create topics manually:
1147
+
1148
+ ```typescript
1149
+ producers: {
1150
+ main: {
1151
+ brokers: ['kafka:9092'],
1152
+ autoCreateTopics: true, // Enable for dev/test
1153
+ },
1154
+ }
1155
+
1156
+ // Or create topics manually:
1157
+ // kafka-topics --create --topic my-topic --bootstrap-server kafka:9092
1158
+ ```
1159
+
1160
+ #### Graceful Shutdown Not Working
1161
+
1162
+ **Problem:**
1163
+ Service doesn't close Kafka connections on shutdown.
1164
+
1165
+ **Solutions:**
1166
+ 1. **Ensure graceful shutdown is enabled:**
1167
+ ```typescript
1168
+ const kafka = KafkaManager.create({
1169
+ dontGracefulShutdown: false, // Default
1170
+ producers: { /* ... */ },
1171
+ });
1172
+ ```
1173
+
1174
+ 2. **Manual disconnect if needed:**
1175
+ ```typescript
1176
+ process.on('SIGTERM', async () => {
1177
+ await kafka.disconnect();
1178
+ process.exit(0);
1179
+ });
1180
+ ```
1181
+
1182
+ ### Debugging Tips
1183
+
1184
+ #### Enable Debug Logging
1185
+
1186
+ ```typescript
1187
+ import { KafkaManager } from '@autofleet/kafka';
1188
+ import logger from './logger';
1189
+
1190
+ const kafka = KafkaManager.create({
1191
+ logger, // Ensure logger has debug level enabled
1192
+ producers: { /* ... */ },
1193
+ });
1194
+
1195
+ // Log a health snapshot every 10 seconds
1196
+ setInterval(() => {
1197
+ const health = kafka.getHealth();
1198
+ logger.debug('Kafka health', { health });
1199
+ }, 10000);
1200
+ ```
1201
+
1202
+ #### Inspect Cluster Metadata
1203
+
1204
+ ```typescript
1205
+ const health = kafka.getProducerHealth('main');
1206
+ console.log('Cluster ID:', health.clusterId);
1207
+ console.log('Brokers:', health.brokerCount);
1208
+ console.log('Last success:', new Date(health.lastPingSucceededAt));
1209
+ console.log('Last error:', health.lastError);
1210
+ ```
1211
+
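One caveat when printing timestamps: in JavaScript, `new Date(null)` evaluates to the Unix epoch rather than failing, which can look like a real (very old) success time. Assuming `lastPingSucceededAt` may be `null` before the first successful ping (the README does not confirm this), a small hypothetical formatter avoids the ambiguity:

```typescript
// Hypothetical formatter; assumes the timestamp may be null before the first successful ping.
function formatTimestamp(ms: number | null | undefined): string {
  return ms == null ? 'never' : new Date(ms).toISOString();
}
```

It could replace the direct `new Date(...)` call in the snippet above, for example `console.log('Last success:', formatTimestamp(health.lastPingSucceededAt))`.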
1212
+ #### Test Connectivity Manually
1213
+
1214
+ ```typescript
1215
+ try {
1216
+ await kafka.pingProducer('main');
1217
+ console.log('✓ Connection successful');
1218
+ } catch (error) {
1219
+ console.error('✗ Connection failed:', error instanceof Error ? error.message : error);
1220
+ }
1221
+ ```
1222
+
660
1223
  ## Testing
661
1224
 
662
1225
  Run the test suite:
663
1226
 
664
1227
  ```bash
665
- pnpm test
1228
+ npm test
666
1229
  ```
667
1230
 
668
1231
  Run tests with coverage:
669
1232
 
670
1233
  ```bash
671
- pnpm run coverage
1234
+ npm run coverage
672
1235
  ```
673
1236
 
674
1237
  ### Example Test