@drarzter/kafka-client 0.5.6 → 0.6.3

package/README.md CHANGED
@@ -32,12 +32,14 @@ Type-safe Kafka client for Node.js. Framework-agnostic core with a first-class N
  - [Error classes](#error-classes)
  - [Retry topic chain](#retry-topic-chain)
  - [stopConsumer](#stopconsumer)
+ - [Graceful shutdown](#graceful-shutdown)
  - [Consumer handles](#consumer-handles)
  - [onMessageLost](#onmessagelost)
  - [onRebalance](#onrebalance)
  - [Consumer lag](#consumer-lag)
  - [Handler timeout warning](#handler-timeout-warning)
  - [Schema validation](#schema-validation)
+ - [Context-aware validators](#context-aware-validators-schemaparsecontext)
  - [Health check](#health-check)
  - [Testing](#testing)
  - [Project structure](#project-structure)
@@ -112,7 +114,7 @@ For standalone usage (Express, Fastify, raw Node), no extra dependencies needed
  ```typescript
  import { KafkaClient, topic } from '@drarzter/kafka-client/core';
 
- const OrderCreated = topic('order.created')<{ orderId: string; amount: number }>();
+ const OrderCreated = topic('order.created').type<{ orderId: string; amount: number }>();
 
  const kafka = new KafkaClient('my-app', 'my-group', ['localhost:9092']);
  await kafka.connectProducer();
@@ -245,13 +247,13 @@ Instead of a centralized topic map, define each topic as a standalone typed obje
  ```typescript
  import { topic, TopicsFrom } from '@drarzter/kafka-client';
 
- export const OrderCreated = topic('order.created')<{
+ export const OrderCreated = topic('order.created').type<{
    orderId: string;
    userId: string;
    amount: number;
  }>();
 
- export const OrderCompleted = topic('order.completed')<{
+ export const OrderCompleted = topic('order.completed').type<{
    orderId: string;
    completedAt: string;
  }>();
@@ -581,6 +583,8 @@ await this.kafka.startBatchConsumer(
  );
  ```
 
+ > **Note:** If your handler calls `resolveOffset()` or `commitOffsetsIfNecessary()` without setting `autoCommit: false`, a `warn` is logged at consumer-start time — mixing autoCommit with manual offset control causes offset conflicts. Set `autoCommit: false` to suppress the warning and take full control of offset management.
+
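As an editor's illustration (not part of the package diff), the sketch below pairs manual offset control with `autoCommit: false`, using only what this README documents: the `autoCommit` option and the `BatchMeta` methods `heartbeat` and `commitOffsetsIfNecessary`. `kafka` is the client from the earlier examples and `handleOne` is a hypothetical business function.

```typescript
async function handleOne(envelope: unknown): Promise<void> {
  // hypothetical per-message business logic
}

await kafka.startBatchConsumer(
  ['orders.created'],
  async (envelopes, meta) => {
    for (const envelope of envelopes) {
      await handleOne(envelope);
      await meta.heartbeat();              // keep the session alive on long batches
    }
    await meta.commitOffsetsIfNecessary(); // the handler now owns the commit
  },
  { autoCommit: false },                   // manual control, so no startup warning
);
```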
  With `@SubscribeTo()`:
 
  ```typescript
@@ -592,6 +596,20 @@ async handleOrders(envelopes: EventEnvelope<OrdersTopicMap['order.created']>[],
 
  Schema validation runs per-message — invalid messages are skipped (DLQ'd if enabled), valid ones are passed to the handler. Retry applies to the whole batch.
 
+ `retryTopics: true` is also supported on `startBatchConsumer`. On handler failure, each envelope in the batch is routed individually to `<topic>.retry.1`; the companion retry consumers call the batch handler one message at a time with a stub `BatchMeta` (no-op `heartbeat`/`resolveOffset`/`commitOffsetsIfNecessary`):
+
+ ```typescript
+ await kafka.startBatchConsumer(
+   ['orders.created'],
+   async (envelopes, meta) => { /* same handler */ },
+   {
+     retry: { maxRetries: 3, backoffMs: 1000 },
+     dlq: true,
+     retryTopics: true, // ← now supported for batch consumers too
+   },
+ );
+ ```
+
  `BatchMeta` exposes:
 
  | Property/Method | Description |
@@ -671,21 +689,55 @@ const kafka = new KafkaClient('my-app', 'my-group', brokers, {
  });
  ```
 
- `otelInstrumentation()` injects `traceparent` on send, extracts it on consume, and creates `CONSUMER` spans automatically. Requires `@opentelemetry/api` as a peer dependency.
+ `otelInstrumentation()` injects `traceparent` on send, extracts it on consume, and creates `CONSUMER` spans automatically. The span is set as the **active OTel context** for the handler's duration via `context.with()` — so `trace.getActiveSpan()` works inside your handler and any child spans are automatically parented to the consume span. Requires `@opentelemetry/api` as a peer dependency.
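As an editor's sketch (not part of the diff), a handler can read that active consume span directly via `@opentelemetry/api`. It assumes the single-message handler receives the `EventEnvelope` as its first argument and only touches `envelope.topic`; `kafka` is the client from the surrounding examples.

```typescript
import { trace } from '@opentelemetry/api';

await kafka.startConsumer(['orders.created'], async (envelope) => {
  // The consume span created by otelInstrumentation() is active here:
  trace.getActiveSpan()?.setAttribute('messaging.destination', envelope.topic);

  // Spans started inside the handler become children of the consume span:
  await trace.getTracer('orders').startActiveSpan('process-order', async (span) => {
    // ... business logic ...
    span.end();
  });
});
```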
 
- Custom instrumentation:
+ ### Custom instrumentation
+
+ `beforeConsume` can return a `BeforeConsumeResult` — either the legacy `() => void` cleanup function, or an object with `cleanup` and/or `wrap`:
 
  ```typescript
- import { KafkaInstrumentation } from '@drarzter/kafka-client';
+ import { KafkaInstrumentation, BeforeConsumeResult } from '@drarzter/kafka-client';
 
- const metrics: KafkaInstrumentation = {
+ const myInstrumentation: KafkaInstrumentation = {
    beforeSend(topic, headers) { /* inject headers, start timer */ },
    afterSend(topic) { /* record send latency */ },
-   beforeConsume(envelope) { /* start span */ return () => { /* end span */ }; },
+
+   beforeConsume(envelope): BeforeConsumeResult {
+     const span = startMySpan(envelope.topic);
+     return {
+       // cleanup() is called after the handler completes (success or error)
+       cleanup() { span.end(); },
+       // wrap(fn) runs the handler inside the desired async context
+       // call fn() wherever you need it in the context scope
+       wrap(fn) { return runWithSpanActive(span, fn); },
+     };
+   },
+
    onConsumeError(envelope, error) { /* record error metric */ },
  };
  ```
 
+ The legacy `() => void` form is still fully supported — return a function directly if you only need cleanup:
+
+ ```typescript
+ beforeConsume(envelope) {
+   const timer = startTimer();
+   return () => timer.end(); // cleanup only, no context wrapping
+ },
+ ```
+
+ `BeforeConsumeResult` is a union:
+
+ ```typescript
+ type BeforeConsumeResult =
+   | (() => void) // legacy: cleanup only
+   | { cleanup?(): void; // called after handler (success or error)
+       wrap?(fn: () => Promise<void>): Promise<void>; // wraps handler execution
+     };
+ ```
+
+ When multiple instrumentations each provide a `wrap`, they compose in declaration order — the first instrumentation's `wrap` is the outermost.
+
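To make that composition rule concrete, here is an editor's sketch (not package code) of two instrumentations that both provide `wrap`; the context helpers are hypothetical stand-ins, and registering the array on the client happens in the configuration block shown above this section.

```typescript
import { KafkaInstrumentation } from '@drarzter/kafka-client';

// Hypothetical async-context helpers standing in for real tracing / metrics code:
const withTracingContext = (fn: () => Promise<void>) => fn();
const withMetricsTimer = (fn: () => Promise<void>) => fn();

const tracing: KafkaInstrumentation = {
  beforeConsume: () => ({ wrap: (fn) => withTracingContext(fn) }),
};
const metrics: KafkaInstrumentation = {
  beforeConsume: () => ({ wrap: (fn) => withMetricsTimer(fn) }),
};

// Declared as [tracing, metrics], the handler effectively runs as:
//   withTracingContext(() => withMetricsTimer(() => handler(envelope)))
// i.e. the first declared wrap is outermost and the handler sits at the core.
```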
  ## Options reference
 
  ### Send options
@@ -713,8 +765,9 @@ Options for `sendMessage()` — the third argument:
  | `retry.backoffMs` | `1000` | Base delay for exponential backoff in ms |
  | `retry.maxBackoffMs` | `30000` | Maximum delay cap for exponential backoff in ms |
  | `dlq` | `false` | Send to `{topic}.dlq` after all retries exhausted — message carries `x-dlq-*` metadata headers |
- | `retryTopics` | `false` | Route failed messages through `{topic}.retry` instead of sleeping in-process (see [Retry topic chain](#retry-topic-chain)) |
+ | `retryTopics` | `false` | Route failed messages through per-level topics (`{topic}.retry.1`, `{topic}.retry.2`, …) instead of sleeping in-process; exactly-once routing semantics within the retry chain; requires `retry` (see [Retry topic chain](#retry-topic-chain)) |
  | `interceptors` | `[]` | Array of before/after/onError hooks |
+ | `retryTopicAssignmentTimeoutMs` | `10000` | Timeout (ms) to wait for each retry level consumer to receive partition assignments after connecting; increase for slow brokers |
  | `handlerTimeoutMs` | — | Log a warning if the handler hasn't resolved within this window (ms) — does not cancel the handler |
  | `batch` | `false` | (decorator only) Use `startBatchConsumer` instead of `startConsumer` |
  | `subscribeRetry.retries` | `5` | Max attempts for `consumer.subscribe()` when topic doesn't exist yet |
@@ -826,12 +879,31 @@ const interceptor: ConsumerInterceptor<MyTopics> = {
 
  ## Retry topic chain
 
- By default, retry is handled in-process: the consumer sleeps between attempts while holding the partition. With `retryTopics: true`, failed messages are routed to a `<topic>.retry` Kafka topic instead. A companion consumer auto-starts on `<topic>.retry` (group `<groupId>-retry`), waits until the scheduled retry time, then calls the same handler.
+ > **tl;dr recommended production setup:**
+ >
+ > ```typescript
+ > await kafka.startConsumer(['orders.created'], handler, {
+ >   retry: { maxRetries: 3, backoffMs: 1_000, maxBackoffMs: 30_000 },
+ >   dlq: true,         // ← messages never silently disappear
+ >   retryTopics: true, // ← retries survive restarts; routing is exactly-once
+ > });
+ > ```
+ >
+ > Just `retry` + `dlq: true` is already safe for most workloads — failed messages land in `{topic}.dlq` after all retries and are never silently dropped. Add `retryTopics: true` for crash-durable retries and exactly-once routing guarantees within the retry chain.
+ >
+ > | Configuration | What happens to a message that always fails | Process crash mid-retry |
+ > | --- | --- | --- |
+ > | `retry` only | Dropped — `onMessageLost` fires | Lost if crash between attempts |
+ > | `retry` + `dlq` | Lands in `{topic}.dlq` after all attempts | DLQ write may duplicate (rare) |
+ > | `retry` + `dlq` + `retryTopics` | Lands in `{topic}.dlq` after all attempts | Retries survive restarts; routing is exactly-once |
+
+ By default, retry is handled in-process: the consumer sleeps between attempts while holding the partition. With `retryTopics: true`, failed messages are routed through a chain of Kafka topics instead — one topic per retry level. A companion consumer auto-starts per level, waits for the scheduled delay using partition pause/resume, then calls the same handler.
 
  Benefits over in-process retry:
 
- - **Durable** — retry messages survive a consumer restart
- - **Non-blocking** — the original consumer is free immediately; the retry consumer pauses only the specific partition being delayed, so other partitions continue processing
+ - **Durable** — retry messages survive a consumer restart; routing between levels and to DLQ is exactly-once via Kafka transactions
+ - **Non-blocking** — the original consumer is free immediately; each level consumer only pauses its specific partition during the delay window, so other partitions continue processing
+ - **Isolated** — each retry level has its own consumer group, so a slow level 3 consumer never blocks a level 1 consumer
 
  ```typescript
  await kafka.startConsumer(['orders.created'], handler, {
@@ -841,17 +913,33 @@ await kafka.startConsumer(['orders.created'], handler, {
  });
  ```
 
- Message flow with `maxRetries: 2`:
+ With `maxRetries: 3`, this creates three dedicated topics and three companion consumers:
+
+ ```text
+ orders.created.retry.1 → consumer group: my-group-retry.1 (delay ~1 s)
+ orders.created.retry.2 → consumer group: my-group-retry.2 (delay ~2 s)
+ orders.created.retry.3 → consumer group: my-group-retry.3 (delay ~4 s)
+ ```
+
+ Message flow with `maxRetries: 2` and `dlq: true`:
 
  ```text
- orders.created → handler fails → orders.created.retry (attempt 1, delay ~1 s)
-                → handler fails → orders.created.retry (attempt 2, delay ~2 s)
-                → handler fails → orders.created.dlq
+ orders.created → handler fails → orders.created.retry.1 (attempt 1, delay ~1 s)
+ orders.created.retry.1 → handler fails → orders.created.retry.2 (attempt 2, delay ~2 s)
+ orders.created.retry.2 → handler fails → orders.created.dlq
  ```
 
- The retry topic messages carry scheduling headers (`x-retry-attempt`, `x-retry-after`, `x-retry-original-topic`, `x-retry-max-retries`) that the companion consumer reads automatically — no manual configuration needed.
+ Each level consumer uses `consumer.pause → sleep(remaining) → consumer.resume`, so the partition offset is never committed before the message is processed. On a process crash during sleep or handler execution, the message is redelivered on restart.
+
+ The retry topic messages carry scheduling headers (`x-retry-attempt`, `x-retry-after`, `x-retry-original-topic`, `x-retry-max-retries`) that each level consumer reads automatically — no manual configuration needed.
 
- > **Note:** `retryTopics` requires `retry` to be set — an error is thrown at startup if `retry` is missing. Currently only applies to `startConsumer`; batch consumers (`startBatchConsumer`) use in-process retry regardless.
+ > **Delivery guarantee:** routing within the retry chain (retry.N → retry.N+1 and retry.N → DLQ) is **exactly-once** — each routing step is wrapped in a Kafka transaction via `sendOffsetsToTransaction`, so the produce and the consumer offset commit happen atomically. A crash at any point rolls back the transaction: the message is redelivered and the routing is retried, with no duplicate in the next level. If the EOS transaction itself fails (broker unavailable), the offset is not committed and the message stays safely in the retry topic until the broker recovers.
+ >
+ > The remaining at-least-once window is at the **main consumer → retry.1** boundary: the main consumer uses `autoCommit: true` by default, so if it crashes after routing to `retry.1` but before autoCommit fires, the message may appear twice in `retry.1`. This is the standard Kafka at-least-once trade-off for any consumer using autoCommit. Design handlers to be idempotent if this edge case is unacceptable (see the sketch at the end of this section).
+ >
+ > **Startup validation:** `retryTopics` requires `retry` to be set — an error is thrown at startup if `retry` is missing. When `autoCreateTopics: false`, all `{topic}.retry.N` topics are validated to exist at startup and a clear error lists any missing ones. With `autoCreateTopics: true` the check is skipped — topics are created automatically by the `ensureTopic` path. Supported by both `startConsumer` and `startBatchConsumer`.
+
+ `stopConsumer(groupId)` automatically stops all companion retry level consumers started for that group.
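For the at-least-once window described above, duplicates only matter when the handler's side effects are not idempotent. A minimal, library-agnostic sketch (editor's illustration, not package code; the in-memory set stands in for a shared store such as Redis or a database):

```typescript
// Deduplicate by a stable business identifier carried in the payload.
const processed = new Set<string>();

async function handleOrderCreated(order: { orderId: string }): Promise<void> {
  if (processed.has(order.orderId)) return; // a duplicate delivery becomes a no-op
  await applyOrderSideEffects(order);       // hypothetical side effect
  processed.add(order.orderId);
}

async function applyOrderSideEffects(order: { orderId: string }): Promise<void> {
  // ... write to a database, call downstream services, etc.
}
```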
 
  ## stopConsumer
 
@@ -867,6 +955,36 @@ await kafka.stopConsumer();
 
  `stopConsumer(groupId)` disconnects and removes only that group's consumer, leaving other groups running. Useful when you want to pause processing for a specific topic without restarting the whole client.
 
+ ## Graceful shutdown
+
+ `disconnect()` now drains in-flight handlers before tearing down connections — no messages are silently cut off mid-processing.
+
+ **NestJS** apps get this automatically: `onModuleDestroy` calls `disconnect()`, which waits for all running `eachMessage` / `eachBatch` callbacks to settle first. Enable NestJS shutdown hooks in your bootstrap:
+
+ ```typescript
+ // main.ts
+ const app = await NestFactory.create(AppModule);
+ app.enableShutdownHooks(); // lets NestJS call onModuleDestroy on SIGTERM
+ await app.listen(3000);
+ ```
+
+ **Standalone** apps call `enableGracefulShutdown()` to register SIGTERM / SIGINT handlers:
+
+ ```typescript
+ const kafka = new KafkaClient('my-app', 'my-group', brokers);
+ await kafka.connectProducer();
+
+ kafka.enableGracefulShutdown();
+ // or with custom signals and timeout:
+ kafka.enableGracefulShutdown(['SIGTERM', 'SIGINT'], 60_000);
+ ```
+
+ `disconnect()` accepts an optional `drainTimeoutMs` (default `30_000` ms). If handlers haven't settled within the window, a warning is logged and the client disconnects anyway:
+
+ ```typescript
+ await kafka.disconnect(10_000); // wait up to 10 s, then force disconnect
+ ```
+
  ## Consumer handles
 
  `startConsumer()` and `startBatchConsumer()` return a `ConsumerHandle` instead of `void`. Use it to stop a specific consumer without needing to remember the group ID:
@@ -900,12 +1018,13 @@ const kafka = new KafkaClient('my-app', 'my-group', ['localhost:9092'], {
  });
  ```
 
- `onMessageLost` fires in two cases:
+ `onMessageLost` fires in three cases:
 
  1. **Handler error** — handler threw after all retries and `dlq: false`
  2. **Validation error** — schema rejected the message and `dlq: false` (attempt is `0`)
+ 3. **DLQ send failure** — `dlq: true` but `producer.send()` to `{topic}.dlq` itself threw (broker down, topic missing); the error passed to `onMessageLost` is the send error, not the original handler error
 
- It does NOT fire when `dlq: true` — in that case the message is preserved in `{topic}.dlq`.
+ In the normal case (`dlq: true`, DLQ send succeeds), `onMessageLost` does NOT fire — the message is preserved in `{topic}.dlq`.
 
  ## onRebalance
 
@@ -977,18 +1096,18 @@ export const OrderCreated = topic('order.created').schema(z.object({
    amount: z.number().positive(),
  }));
 
- // Without schema — explicit generic (still works)
- export const OrderAudit = topic('order.audit')<{ orderId: string; action: string }>();
+ // Without schema — explicit type via .type<T>()
+ export const OrderAudit = topic('order.audit').type<{ orderId: string; action: string }>();
 
  export type MyTopics = TopicsFrom<typeof OrderCreated | typeof OrderAudit>;
  ```
 
  ### How it works
 
- **On send** — `sendMessage`, `sendBatch`, and `transaction` call `schema.parse(message)` before serializing. Invalid messages throw immediately (the schema library's error, e.g. `ZodError`):
+ **On send** — `sendMessage`, `sendBatch`, and `transaction` call `schema.parse(message)` before serializing. Invalid messages throw immediately as `KafkaValidationError` (the original schema error is available as `cause`):
 
  ```typescript
- // This throws ZodError — amount must be positive
+ // This throws KafkaValidationError — amount must be positive
  await kafka.sendMessage(OrderCreated, { orderId: '1', userId: '2', amount: -5 });
  ```
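A short sketch of catching that error at the call site (editor's illustration; it assumes `KafkaValidationError` is exported from the package root like the other names this README imports):

```typescript
import { KafkaValidationError } from '@drarzter/kafka-client';

try {
  await kafka.sendMessage(OrderCreated, { orderId: '1', userId: '2', amount: -5 });
} catch (err) {
  if (err instanceof KafkaValidationError) {
    // The original schema error (e.g. a ZodError) is preserved on `cause`.
    console.error('payload rejected before send:', err.cause);
  } else {
    throw err; // broker or connection errors are unrelated to validation
  }
}
```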
 
@@ -1048,6 +1167,38 @@ const asyncValidator: SchemaLike<{ id: string }> = {
  const MyTopic = topic('my.topic').schema(customValidator);
  ```
 
+ ### Context-aware validators (`SchemaParseContext`)
+
+ `parse()` receives an optional second argument `ctx: SchemaParseContext` on both the consume and send paths. Use it for schema-registry lookups, version-aware migration, or header-driven parsing:
+
+ ```typescript
+ import { SchemaLike, SchemaParseContext } from '@drarzter/kafka-client';
+
+ const versionedValidator: SchemaLike<MyPayload> = {
+   parse(data: unknown, ctx?: SchemaParseContext) {
+     const version = ctx?.version ?? 1;
+     // version comes from the x-schema-version header (send: schemaVersion option)
+     if (version >= 2) return migrateV1toV2(data);
+     return validateV1(data);
+   },
+ };
+
+ // On consume: ctx = { topic: 'orders.created', headers: { ... }, version: 2 }
+ // On send:    ctx = { topic: 'orders.created', headers: { ... }, version: schemaVersion ?? 1 }
+ ```
+
+ `SchemaParseContext` shape:
+
+ ```typescript
+ interface SchemaParseContext {
+   topic: string;           // topic the message was produced to / consumed from
+   headers: MessageHeaders; // decoded headers (envelope headers included)
+   version: number;         // x-schema-version header value, defaults to 1
+ }
+ ```
+
+ Existing validators (Zod, Valibot, ArkType, custom) that only use the first argument continue to work unchanged — the second argument is silently ignored.
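On the producing side, an editor's sketch of sending a payload so that consumers see `ctx.version === 2`; the topic wiring and payload placeholder are hypothetical, while `schemaVersion` is the send option referenced in the comment above:

```typescript
// Hypothetical wiring: a topic validated by the versionedValidator from the example above.
const VersionedTopic = topic('my.versioned.topic').schema(versionedValidator);

declare const v2Payload: MyPayload; // stand-in for a payload already in the v2 shape

// schemaVersion becomes the x-schema-version header, so the validator's ctx.version
// is 2 on the send-side parse and on every consumer's parse of this message.
await kafka.sendMessage(VersionedTopic, v2Payload, { schemaVersion: 2 });
```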
+
  ## Health check
 
  Monitor Kafka connectivity with the built-in health indicator:
@@ -1100,7 +1251,7 @@ expect(kafka.sendMessage).toHaveBeenCalledWith(
  );
 
  // Override return values
- kafka.checkStatus.mockResolvedValueOnce({ topics: ['order.created'] });
+ kafka.checkStatus.mockResolvedValueOnce({ status: 'up', clientId: 'mock-client', topics: ['order.created'] });
 
  // Mock rejections
  kafka.sendMessage.mockRejectedValueOnce(new Error('broker down'));