@drarzter/kafka-client 0.6.6 → 0.6.9

package/README.md CHANGED
@@ -33,6 +33,9 @@ Type-safe Kafka client for Node.js. Framework-agnostic core with a first-class N
  - [Deduplication (Lamport Clock)](#deduplication-lamport-clock)
  - [Retry topic chain](#retry-topic-chain)
  - [stopConsumer](#stopconsumer)
+ - [Pause and resume](#pause-and-resume)
+ - [Reset consumer offsets](#reset-consumer-offsets)
+ - [DLQ replay](#dlq-replay)
  - [Graceful shutdown](#graceful-shutdown)
  - [Consumer handles](#consumer-handles)
  - [onMessageLost](#onmessagelost)
@@ -620,7 +623,7 @@ await kafka.startBatchConsumer(
  | Property/Method | Description |
  | --------------- | ----------- |
  | `partition` | Partition number for this batch |
- | `highWatermark` | Latest offset in the partition (lag indicator) |
+ | `highWatermark` | Latest offset in the partition (`string`). `null` when the message is replayed via a retry topic consumer — in that path the broker high-watermark is not available. Guard against `null` before computing lag |
  | `heartbeat()` | Send a heartbeat to keep the consumer session alive — call during long processing loops |
  | `resolveOffset(offset)` | Mark offset as processed (required before `commitOffsetsIfNecessary`) |
  | `commitOffsetsIfNecessary()` | Commit resolved offsets; respects `autoCommit` setting |
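The `null` guard that the new `highWatermark` description calls for can be sketched as a small helper. This is illustrative, not part of the library API; it assumes the usual Kafka convention that the high-watermark is the next offset to be written, and compares offsets as `BigInt` to avoid `Number` precision loss:

```typescript
// Illustrative helper, not part of @drarzter/kafka-client. Computes partition
// lag from a batch's highWatermark, returning null on the retry-topic path
// where no broker high-watermark is available.
function computeLag(highWatermark: string | null, lastResolvedOffset: string): bigint | null {
  if (highWatermark === null) return null; // replayed via a retry topic consumer
  // Kafka offsets are decimal strings; BigInt avoids precision loss past 2^53.
  // A fully caught-up consumer (lastResolvedOffset === highWatermark - 1) has lag 0.
  return BigInt(highWatermark) - BigInt(lastResolvedOffset) - 1n;
}
```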
@@ -743,6 +746,55 @@ type BeforeConsumeResult =
 
  When multiple instrumentations each provide a `wrap`, they compose in declaration order — the first instrumentation's `wrap` is the outermost.
 
+ ### Lifecycle event hooks
+
+ Four additional hooks fire for specific events in the consume pipeline:
+
+ | Hook | When called | Arguments |
+ | ---- | ----------- | --------- |
+ | `onMessage` | Handler successfully processed a message | `(envelope)` — use as a success counter for error-rate calculations |
+ | `onRetry` | A message is queued for another attempt (in-process backoff or routed to a retry topic) | `(envelope, attempt, maxRetries)` |
+ | `onDlq` | A message is routed to the dead letter queue | `(envelope, reason)` — reason is `'handler-error'`, `'validation-error'`, or `'lamport-clock-duplicate'` |
+ | `onDuplicate` | A duplicate is detected via Lamport Clock | `(envelope, strategy)` — strategy is `'drop'`, `'dlq'`, or `'topic'` |
+
+ ```typescript
+ const myInstrumentation: KafkaInstrumentation = {
+   onMessage(envelope) {
+     metrics.increment('kafka.processed', { topic: envelope.topic });
+   },
+   onRetry(envelope, attempt, maxRetries) {
+     console.warn(`Retrying ${envelope.topic} — attempt ${attempt}/${maxRetries}`);
+   },
+   onDlq(envelope, reason) {
+     alertingSystem.send({ topic: envelope.topic, reason });
+   },
+   onDuplicate(envelope, strategy) {
+     metrics.increment('kafka.duplicate', { topic: envelope.topic, strategy });
+   },
+ };
+ ```
+
+ ### Built-in metrics
+
+ `KafkaClient` maintains lightweight in-process event counters independently of any instrumentation:
+
+ ```typescript
+ // Global snapshot — aggregate across all topics
+ const snapshot = kafka.getMetrics();
+ // { processedCount: number; retryCount: number; dlqCount: number; dedupCount: number }
+
+ // Per-topic snapshot
+ const orderMetrics = kafka.getMetrics('order.created');
+ // { processedCount: 5, retryCount: 1, dlqCount: 0, dedupCount: 0 }
+
+ kafka.resetMetrics(); // reset all counters
+ kafka.resetMetrics('order.created'); // reset only one topic's counters
+ ```
+
+ Passing a topic name that has not seen any events returns a zero-valued snapshot — it never throws.
+
+ Counters are incremented in the same code paths that fire the corresponding hooks — they are always active regardless of whether any instrumentation is configured.
+
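The success/retry/DLQ counters combine naturally into an error rate. A minimal sketch, where the `MetricsSnapshot` shape mirrors the documented `getMetrics()` return and the helper itself is illustrative:

```typescript
// Illustrative helper built on the documented getMetrics() snapshot shape.
interface MetricsSnapshot {
  processedCount: number;
  retryCount: number;
  dlqCount: number;
  dedupCount: number;
}

// Fraction of consume attempts that did not succeed on the first try.
// Returns 0 for an all-zero snapshot rather than dividing by zero.
function errorRate(s: MetricsSnapshot): number {
  const attempts = s.processedCount + s.retryCount + s.dlqCount;
  return attempts === 0 ? 0 : (s.retryCount + s.dlqCount) / attempts;
}
```

Deduplicated messages are deliberately excluded from the denominator: a drop via the Lamport Clock is neither a success nor a failure of the handler.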
  ## Options reference
 
  ### Send options
@@ -795,6 +847,7 @@ Passed to `KafkaModule.register()` or returned from `registerAsync()` factory:
  | `numPartitions` | `1` | Number of partitions for auto-created topics |
  | `strictSchemas` | `true` | Validate string topic keys against schemas registered via TopicDescriptor |
  | `instrumentation` | `[]` | Client-wide instrumentation hooks (e.g. OTel). Applied to both send and consume paths |
+ | `transactionalId` | `${clientId}-tx` | Transactional producer ID for `transaction()` calls. Must be unique per producer instance across the cluster — two instances sharing the same ID will be fenced by Kafka. The client logs a warning when the same ID is registered twice within one process |
  | `onMessageLost` | — | Called when a message is silently dropped without DLQ — use to alert, log to external systems, or trigger fallback logic |
  | `onRebalance` | — | Called on every partition assign/revoke event across all consumers created by this client |
 
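Because the default `transactionalId` is derived only from `clientId`, multi-replica deployments need a per-instance suffix to avoid fencing. A hypothetical helper; the function name and the choice of instance identifier are illustrative, not library API:

```typescript
// Hypothetical helper, not part of the library. Derives a transactionalId
// that is unique per producer instance (e.g. by suffixing a pod or host name)
// so two replicas sharing one clientId are not fenced by Kafka.
function makeTransactionalId(clientId: string, instanceId: string): string {
  return `${clientId}-tx-${instanceId}`;
}
```

Typical usage would pass something like `process.env.HOSTNAME` as the instance ID in a Kubernetes deployment.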
@@ -983,7 +1036,7 @@ By default, retry is handled in-process: the consumer sleeps between attempts wh
 
  Benefits over in-process retry:
 
- - **Durable** — retry messages survive a consumer restart; routing between levels and to DLQ is exactly-once via Kafka transactions
+ - **Durable** — retry messages survive a consumer restart; all routing (main → retry.1, level N → N+1, retry → DLQ) is exactly-once via Kafka transactions
  - **Non-blocking** — the original consumer is free immediately; each level consumer only pauses its specific partition during the delay window, so other partitions continue processing
  - **Isolated** — each retry level has its own consumer group, so a slow level 3 consumer never blocks a level 1 consumer
 
@@ -1015,9 +1068,9 @@ Each level consumer uses `consumer.pause → sleep(remaining) → consumer.resum
 
  The retry topic messages carry scheduling headers (`x-retry-attempt`, `x-retry-after`, `x-retry-original-topic`, `x-retry-max-retries`) that each level consumer reads automatically — no manual configuration needed.
 
- > **Delivery guarantee:** routing within the retry chain (retry.N → retry.N+1 and retry.N → DLQ) is **exactly-once** — each routing step is wrapped in a Kafka transaction via `sendOffsetsToTransaction`, so the produce and the consumer offset commit happen atomically. A crash at any point rolls back the transaction: the message is redelivered and the routing is retried, with no duplicate in the next level. If the EOS transaction itself fails (broker unavailable), the offset is not committed and the message stays safely in the retry topic until the broker recovers.
+ > **Delivery guarantee:** the entire retry chain — including the **main consumer → retry.1** boundary — is **exactly-once**. Every routing step (main → retry.1, retry.N → retry.N+1, retry.N → DLQ) is wrapped in a Kafka transaction via `sendOffsetsToTransaction`: the produce and the consumer offset commit happen atomically. A crash at any point rolls back the transaction: the message is redelivered and the routing is retried, with no duplicate in the next level. If the EOS transaction fails (broker unavailable), the offset stays uncommitted and the message is safely redelivered; it is never lost.
  >
- > The remaining at-least-once window is at the **main consumer → retry.1** boundary: the main consumer uses `autoCommit: true` by default, so if it crashes after routing to `retry.1` but before autoCommit fires, the message may appear twice in `retry.1`. This is the standard Kafka at-least-once trade-off for any consumer using autoCommit. Design handlers to be idempotent.
+ > The standard Kafka at-least-once guarantee still applies at the handler level: if your handler succeeds but the process crashes before the manual offset commit completes, the message is redelivered to the handler. Design handlers to be idempotent.
  >
  > **Startup validation:** `retryTopics` requires `retry` to be set — an error is thrown at startup if `retry` is missing. When `autoCreateTopics: false`, all `{topic}.retry.N` topics are validated to exist at startup and a clear error lists any missing ones. With `autoCreateTopics: true` the check is skipped — topics are created automatically by the `ensureTopic` path. Supported by both `startConsumer` and `startBatchConsumer`.
 
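The "design handlers to be idempotent" advice can be sketched with a processed-key guard. This is a generic pattern, not library API; in production the seen-set would live in a shared store (e.g. Redis `SETNX` or a database unique constraint), not in process memory:

```typescript
// Generic idempotency sketch: skip work when a message key has already been
// processed. An in-memory Set suffices for a single process; real deployments
// would back this with Redis or a database unique constraint.
const processed = new Set<string>();

function handleOnce(messageId: string, handler: () => void): boolean {
  if (processed.has(messageId)) return false; // redelivery, safe no-op
  handler();
  processed.add(messageId); // marked only after the handler succeeds
  return true;
}
```

Marking the key after the handler runs means a crash mid-handler leaves the key unmarked and the redelivered message is processed again, which is the desired at-least-once behavior.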
@@ -1037,6 +1090,75 @@ await kafka.stopConsumer();
 
  `stopConsumer(groupId)` disconnects and removes only that group's consumer, leaving other groups running. Useful when you want to pause processing for a specific topic without restarting the whole client.
 
+ ## Pause and resume
+
+ Temporarily stop delivering messages from specific partitions without disconnecting the consumer:
+
+ ```typescript
+ // Pause partition 0 of 'orders' (default group)
+ kafka.pauseConsumer(undefined, [{ topic: 'orders', partitions: [0] }]);
+
+ // Resume it later
+ kafka.resumeConsumer(undefined, [{ topic: 'orders', partitions: [0] }]);
+
+ // Target a specific consumer group, multiple partitions
+ kafka.pauseConsumer('payments-group', [{ topic: 'payments', partitions: [0, 1] }]);
+ ```
+
+ The first argument is the consumer group ID — pass `undefined` to target the default group. A warning is logged if the group is not found.
+
+ Pausing is non-destructive: the consumer stays connected and Kafka preserves the partition assignment for as long as the group session is alive. Messages accumulate in the topic and are delivered once the consumer resumes. Typical use: apply backpressure when a downstream dependency (e.g. a database) is temporarily overloaded.
+
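The backpressure use case can be sketched as a small state transition around the two documented calls. `PausableClient` is a stand-in interface for illustration, and the overload signal is assumed to come from the caller (e.g. a database health probe):

```typescript
// Backpressure sketch around the documented pauseConsumer/resumeConsumer
// calls. PausableClient is a stand-in interface, not a library export.
interface PausableClient {
  pauseConsumer(groupId: string | undefined, topics: { topic: string; partitions: number[] }[]): void;
  resumeConsumer(groupId: string | undefined, topics: { topic: string; partitions: number[] }[]): void;
}

// Pause when the downstream reports overload, resume when it recovers.
// Returns the action taken so callers can log or count transitions.
function applyBackpressure(
  kafka: PausableClient,
  topic: string,
  partition: number,
  downstreamOverloaded: boolean,
  currentlyPaused: boolean,
): 'pause' | 'resume' | 'none' {
  const target = [{ topic, partitions: [partition] }];
  if (downstreamOverloaded && !currentlyPaused) {
    kafka.pauseConsumer(undefined, target);
    return 'pause';
  }
  if (!downstreamOverloaded && currentlyPaused) {
    kafka.resumeConsumer(undefined, target);
    return 'resume';
  }
  return 'none';
}
```

Calling this on every health-check tick keeps the pause/resume calls edge-triggered rather than repeated on every tick.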
+ ## Reset consumer offsets
+
+ Seek a consumer group's committed offsets to the beginning or end of a topic:
+
+ ```typescript
+ // Seek to the beginning — re-process all existing messages
+ await kafka.resetOffsets(undefined, 'orders', 'earliest');
+
+ // Seek to the end — skip existing messages, process only new ones
+ await kafka.resetOffsets(undefined, 'orders', 'latest');
+
+ // Target a specific consumer group
+ await kafka.resetOffsets('payments-group', 'orders', 'earliest');
+ ```
+
+ **Important:** the consumer for the specified group must be stopped before calling `resetOffsets`. An error is thrown if the group is currently running — this prevents the reset from racing with an active offset commit.
+
+ ## DLQ replay
+
+ Re-publish messages from a dead letter queue back to the original topic:
+
+ ```typescript
+ // Re-publish all messages from 'orders.dlq' → 'orders'
+ const result = await kafka.replayDlq('orders');
+ // { replayed: 42, skipped: 0 }
+ ```
+
+ Options:
+
+ | Option | Default | Description |
+ | ------ | ------- | ----------- |
+ | `targetTopic` | `x-dlq-original-topic` header | Override the destination topic |
+ | `dryRun` | `false` | Count messages without sending |
+ | `filter` | — | `(headers) => boolean` — skip messages where the callback returns `false` |
+
+ ```typescript
+ // Dry run — see how many messages would be replayed
+ const dry = await kafka.replayDlq('orders', { dryRun: true });
+
+ // Route to a different topic
+ const result = await kafka.replayDlq('orders', { targetTopic: 'orders.v2' });
+
+ // Only replay messages with a specific correlation ID
+ const filtered = await kafka.replayDlq('orders', {
+   filter: (headers) => headers['x-correlation-id'] === 'corr-123',
+ });
+ ```
+
+ `replayDlq` creates a temporary consumer group that reads the DLQ topic up to the high-watermark at the time of the call — messages published after replay starts are not included. DLQ metadata headers (`x-dlq-original-topic`, `x-dlq-error-message`, `x-dlq-error-stack`, `x-dlq-failed-at`, `x-dlq-attempt-count`) are stripped from the replayed messages; all other headers (e.g. `x-correlation-id`) are preserved.
+
  ## Graceful shutdown
 
  `disconnect()` now drains in-flight handlers before tearing down connections — no messages are silently cut off mid-processing.