@drarzter/kafka-client 0.9.4 → 0.11.0

Files changed (180)
  1. package/README.md +693 -8
  2. package/dist/chunk-OR7TPAAE.mjs +4760 -0
  3. package/dist/chunk-OR7TPAAE.mjs.map +1 -0
  4. package/dist/chunk-PQVBRDNV.mjs +149 -0
  5. package/dist/chunk-PQVBRDNV.mjs.map +1 -0
  6. package/dist/cli/dlq.d.ts +119 -0
  7. package/dist/cli/dlq.d.ts.map +1 -0
  8. package/dist/cli/index.d.ts +3 -0
  9. package/dist/cli/index.d.ts.map +1 -0
  10. package/dist/{chunk-SM4FZKAZ.mjs → cli/index.js} +1073 -309
  11. package/dist/cli/index.js.map +1 -0
  12. package/dist/cli/index.mjs +356 -0
  13. package/dist/cli/index.mjs.map +1 -0
  14. package/dist/client/config/from-env.d.ts +188 -0
  15. package/dist/client/config/from-env.d.ts.map +1 -0
  16. package/dist/client/config/index.d.ts +2 -0
  17. package/dist/client/config/index.d.ts.map +1 -0
  18. package/dist/client/errors.d.ts +67 -0
  19. package/dist/client/errors.d.ts.map +1 -0
  20. package/dist/client/kafka.client/admin/ops.d.ts +114 -0
  21. package/dist/client/kafka.client/admin/ops.d.ts.map +1 -0
  22. package/dist/client/kafka.client/consumer/features/delayed.d.ts +24 -0
  23. package/dist/client/kafka.client/consumer/features/delayed.d.ts.map +1 -0
  24. package/dist/client/kafka.client/consumer/features/dlq-replay.d.ts +52 -0
  25. package/dist/client/kafka.client/consumer/features/dlq-replay.d.ts.map +1 -0
  26. package/dist/client/kafka.client/consumer/features/routed.d.ts +4 -0
  27. package/dist/client/kafka.client/consumer/features/routed.d.ts.map +1 -0
  28. package/dist/client/kafka.client/consumer/features/snapshot.d.ts +10 -0
  29. package/dist/client/kafka.client/consumer/features/snapshot.d.ts.map +1 -0
  30. package/dist/client/kafka.client/consumer/features/window.d.ts +5 -0
  31. package/dist/client/kafka.client/consumer/features/window.d.ts.map +1 -0
  32. package/dist/client/kafka.client/consumer/handler.d.ts +163 -0
  33. package/dist/client/kafka.client/consumer/handler.d.ts.map +1 -0
  34. package/dist/client/kafka.client/consumer/ops.d.ts +64 -0
  35. package/dist/client/kafka.client/consumer/ops.d.ts.map +1 -0
  36. package/dist/client/kafka.client/consumer/pipeline.d.ts +168 -0
  37. package/dist/client/kafka.client/consumer/pipeline.d.ts.map +1 -0
  38. package/dist/client/kafka.client/consumer/queue.d.ts +37 -0
  39. package/dist/client/kafka.client/consumer/queue.d.ts.map +1 -0
  40. package/dist/client/kafka.client/consumer/retry-topic.d.ts +68 -0
  41. package/dist/client/kafka.client/consumer/retry-topic.d.ts.map +1 -0
  42. package/dist/client/kafka.client/consumer/setup.d.ts +66 -0
  43. package/dist/client/kafka.client/consumer/setup.d.ts.map +1 -0
  44. package/dist/client/kafka.client/consumer/start.d.ts +7 -0
  45. package/dist/client/kafka.client/consumer/start.d.ts.map +1 -0
  46. package/dist/client/kafka.client/consumer/stop.d.ts +19 -0
  47. package/dist/client/kafka.client/consumer/stop.d.ts.map +1 -0
  48. package/dist/client/kafka.client/consumer/subscribe-retry.d.ts +4 -0
  49. package/dist/client/kafka.client/consumer/subscribe-retry.d.ts.map +1 -0
  50. package/dist/client/kafka.client/context.d.ts +75 -0
  51. package/dist/client/kafka.client/context.d.ts.map +1 -0
  52. package/dist/client/kafka.client/index.d.ts +155 -0
  53. package/dist/client/kafka.client/index.d.ts.map +1 -0
  54. package/dist/client/kafka.client/infra/circuit-breaker.manager.d.ts +61 -0
  55. package/dist/client/kafka.client/infra/circuit-breaker.manager.d.ts.map +1 -0
  56. package/dist/client/kafka.client/infra/dedup.store.d.ts +28 -0
  57. package/dist/client/kafka.client/infra/dedup.store.d.ts.map +1 -0
  58. package/dist/client/kafka.client/infra/inflight.tracker.d.ts +22 -0
  59. package/dist/client/kafka.client/infra/inflight.tracker.d.ts.map +1 -0
  60. package/dist/client/kafka.client/infra/metrics.manager.d.ts +67 -0
  61. package/dist/client/kafka.client/infra/metrics.manager.d.ts.map +1 -0
  62. package/dist/client/kafka.client/producer/lifecycle.d.ts +41 -0
  63. package/dist/client/kafka.client/producer/lifecycle.d.ts.map +1 -0
  64. package/dist/client/kafka.client/producer/ops.d.ts +79 -0
  65. package/dist/client/kafka.client/producer/ops.d.ts.map +1 -0
  66. package/dist/client/kafka.client/producer/send.d.ts +21 -0
  67. package/dist/client/kafka.client/producer/send.d.ts.map +1 -0
  68. package/dist/client/kafka.client/validate-options.d.ts +11 -0
  69. package/dist/client/kafka.client/validate-options.d.ts.map +1 -0
  70. package/dist/client/message/envelope.d.ts +105 -0
  71. package/dist/client/message/envelope.d.ts.map +1 -0
  72. package/dist/client/message/schema-registry.d.ts +124 -0
  73. package/dist/client/message/schema-registry.d.ts.map +1 -0
  74. package/dist/client/message/serde.d.ts +68 -0
  75. package/dist/client/message/serde.d.ts.map +1 -0
  76. package/dist/client/message/topic.d.ts +159 -0
  77. package/dist/client/message/topic.d.ts.map +1 -0
  78. package/dist/client/message/versioned-schema.d.ts +53 -0
  79. package/dist/client/message/versioned-schema.d.ts.map +1 -0
  80. package/dist/client/outbox/index.d.ts +4 -0
  81. package/dist/client/outbox/index.d.ts.map +1 -0
  82. package/dist/client/outbox/outbox.relay.d.ts +90 -0
  83. package/dist/client/outbox/outbox.relay.d.ts.map +1 -0
  84. package/dist/client/outbox/outbox.store.d.ts +42 -0
  85. package/dist/client/outbox/outbox.store.d.ts.map +1 -0
  86. package/dist/client/outbox/outbox.types.d.ts +144 -0
  87. package/dist/client/outbox/outbox.types.d.ts.map +1 -0
  88. package/dist/client/security/acl.d.ts +108 -0
  89. package/dist/client/security/acl.d.ts.map +1 -0
  90. package/dist/client/security/index.d.ts +5 -0
  91. package/dist/client/security/index.d.ts.map +1 -0
  92. package/dist/client/security/providers.d.ts +88 -0
  93. package/dist/client/security/providers.d.ts.map +1 -0
  94. package/dist/client/security/resolve-security.d.ts +19 -0
  95. package/dist/client/security/resolve-security.d.ts.map +1 -0
  96. package/dist/client/security/security.types.d.ts +76 -0
  97. package/dist/client/security/security.types.d.ts.map +1 -0
  98. package/dist/client/transport/confluent.transport.d.ts +32 -0
  99. package/dist/client/transport/confluent.transport.d.ts.map +1 -0
  100. package/dist/client/transport/transport.interface.d.ts +221 -0
  101. package/dist/client/transport/transport.interface.d.ts.map +1 -0
  102. package/dist/client/types/admin.interface.d.ts +174 -0
  103. package/dist/client/types/admin.interface.d.ts.map +1 -0
  104. package/dist/client/types/admin.types.d.ts +140 -0
  105. package/dist/client/types/admin.types.d.ts.map +1 -0
  106. package/dist/client/types/client.d.ts +21 -0
  107. package/dist/client/types/client.d.ts.map +1 -0
  108. package/dist/client/types/common.d.ts +84 -0
  109. package/dist/client/types/common.d.ts.map +1 -0
  110. package/dist/client/types/config.types.d.ts +167 -0
  111. package/dist/client/types/config.types.d.ts.map +1 -0
  112. package/dist/client/types/consumer.interface.d.ts +115 -0
  113. package/dist/client/types/consumer.interface.d.ts.map +1 -0
  114. package/dist/{consumer.types-fFCag3VJ.d.mts → client/types/consumer.types.d.ts} +62 -383
  115. package/dist/client/types/consumer.types.d.ts.map +1 -0
  116. package/dist/client/types/dedup.types.d.ts +50 -0
  117. package/dist/client/types/dedup.types.d.ts.map +1 -0
  118. package/dist/client/types/lifecycle.interface.d.ts +72 -0
  119. package/dist/client/types/lifecycle.interface.d.ts.map +1 -0
  120. package/dist/client/types/producer.interface.d.ts +52 -0
  121. package/dist/client/types/producer.interface.d.ts.map +1 -0
  122. package/dist/client/types/producer.types.d.ts +90 -0
  123. package/dist/client/types/producer.types.d.ts.map +1 -0
  124. package/dist/client/types.d.ts +8 -0
  125. package/dist/client/types.d.ts.map +1 -0
  126. package/dist/core.d.ts +13 -314
  127. package/dist/core.d.ts.map +1 -0
  128. package/dist/core.js +1466 -123
  129. package/dist/core.js.map +1 -1
  130. package/dist/core.mjs +45 -3
  131. package/dist/index.d.ts +7 -128
  132. package/dist/index.d.ts.map +1 -0
  133. package/dist/index.js +1483 -123
  134. package/dist/index.js.map +1 -1
  135. package/dist/index.mjs +62 -3
  136. package/dist/index.mjs.map +1 -1
  137. package/dist/nest/kafka.constants.d.ts +5 -0
  138. package/dist/nest/kafka.constants.d.ts.map +1 -0
  139. package/dist/nest/kafka.decorator.d.ts +49 -0
  140. package/dist/nest/kafka.decorator.d.ts.map +1 -0
  141. package/dist/nest/kafka.explorer.d.ts +17 -0
  142. package/dist/nest/kafka.explorer.d.ts.map +1 -0
  143. package/dist/nest/kafka.health.d.ts +7 -0
  144. package/dist/nest/kafka.health.d.ts.map +1 -0
  145. package/dist/nest/kafka.module.d.ts +61 -0
  146. package/dist/nest/kafka.module.d.ts.map +1 -0
  147. package/dist/otel.d.ts +83 -5
  148. package/dist/otel.d.ts.map +1 -0
  149. package/dist/otel.js +100 -6
  150. package/dist/otel.js.map +1 -1
  151. package/dist/otel.mjs +98 -5
  152. package/dist/otel.mjs.map +1 -1
  153. package/dist/serde.d.ts +157 -0
  154. package/dist/serde.d.ts.map +1 -0
  155. package/dist/serde.js +308 -0
  156. package/dist/serde.js.map +1 -0
  157. package/dist/serde.mjs +158 -0
  158. package/dist/serde.mjs.map +1 -0
  159. package/dist/testing/client.mock.d.ts +47 -0
  160. package/dist/testing/client.mock.d.ts.map +1 -0
  161. package/dist/testing/index.d.ts +4 -0
  162. package/dist/testing/index.d.ts.map +1 -0
  163. package/dist/testing/test.container.d.ts +63 -0
  164. package/dist/testing/test.container.d.ts.map +1 -0
  165. package/dist/{testing.d.mts → testing/transport.fake.d.ts} +7 -111
  166. package/dist/testing/transport.fake.d.ts.map +1 -0
  167. package/dist/testing.d.ts +2 -318
  168. package/dist/testing.d.ts.map +1 -0
  169. package/dist/testing.js +26 -0
  170. package/dist/testing.js.map +1 -1
  171. package/dist/testing.mjs +26 -0
  172. package/dist/testing.mjs.map +1 -1
  173. package/package.json +40 -8
  174. package/dist/chunk-SM4FZKAZ.mjs.map +0 -1
  175. package/dist/client-1irhGEu0.d.mts +0 -751
  176. package/dist/client-BpFjkHhr.d.ts +0 -751
  177. package/dist/consumer.types-fFCag3VJ.d.ts +0 -958
  178. package/dist/core.d.mts +0 -314
  179. package/dist/index.d.mts +0 -128
  180. package/dist/otel.d.mts +0 -27
package/README.md CHANGED
@@ -24,17 +24,25 @@ Type-safe Kafka client for Node.js. Framework-agnostic core with a first-class N
24
24
  - [Iterator: consume()](#iterator-consume)
25
25
  - [Multiple consumer groups](#multiple-consumer-groups)
26
26
  - [Partition key](#partition-key)
27
+ - [Typed partition keys](#typed-partition-keys)
27
28
  - [Message headers](#message-headers)
28
29
  - [Batch sending](#batch-sending)
30
+ - [Delayed delivery](#delayed-delivery)
29
31
  - [Batch consuming](#batch-consuming)
30
32
  - [Tombstone messages](#tombstone-messages)
31
33
  - [Compression](#compression)
32
34
  - [Transactions](#transactions)
33
35
  - [Consumer interceptors](#consumer-interceptors)
34
36
  - [Instrumentation](#instrumentation)
37
+ - [OpenTelemetry metrics](#opentelemetry-metrics)
38
+ - [Transport security](#transport-security)
39
+ - [AWS MSK IAM & GCP authentication](#aws-msk-iam--gcp-authentication)
40
+ - [ACL requirements](#acl-requirements)
41
+ - [Environment configuration](#environment-configuration)
35
42
  - [Options reference](#options-reference)
36
43
  - [Error classes](#error-classes)
37
44
  - [Deduplication (Lamport Clock)](#deduplication-lamport-clock)
45
+ - [Pluggable deduplication store](#pluggable-deduplication-store)
38
46
  - [Retry topic chain](#retry-topic-chain)
39
47
  - [stopConsumer](#stopconsumer)
40
48
  - [Pause and resume](#pause-and-resume)
@@ -53,7 +61,11 @@ Type-safe Kafka client for Node.js. Framework-agnostic core with a first-class N
53
61
  - [Header-based routing](#header-based-routing)
54
62
  - [Lag-based producer throttling](#lag-based-producer-throttling)
55
63
  - [Transactional consumer](#transactional-consumer)
64
+ - [Transactional outbox](#transactional-outbox)
65
+ - [Serialization: JSON, Avro, Protobuf](#serialization-json-avro-protobuf)
66
+ - [Schema Registry client](#schema-registry-client)
56
67
  - [Admin API](#admin-api)
68
+ - [DLQ CLI](#dlq-cli)
57
69
  - [Graceful shutdown](#graceful-shutdown)
58
70
  - [Consumer handles](#consumer-handles)
59
71
  - [onMessageLost](#onmessagelost)
@@ -61,8 +73,11 @@ Type-safe Kafka client for Node.js. Framework-agnostic core with a first-class N
61
73
  - [onRebalance](#onrebalance)
62
74
  - [Consumer lag](#consumer-lag)
63
75
  - [Handler timeout warning](#handler-timeout-warning)
76
+ - [Static group membership](#static-group-membership)
64
77
  - [Schema validation](#schema-validation)
78
+ - [Versioned schemas](#versioned-schemas)
65
79
  - [Context-aware validators](#context-aware-validators-schemaparsecontext)
80
+ - [Constructor options validation](#constructor-options-validation)
66
81
  - [Health check](#health-check)
67
82
  - [Testing](#testing)
68
83
  - [Project structure](#project-structure)
@@ -107,13 +122,28 @@ Safe by default. Configurable when you need it. Escape hatches for when you know
107
122
  - **Declarative & imperative** — use `@SubscribeTo()` decorator or `startConsumer()` directly
108
123
  - **Async iterator** — `consume<K>()` returns an `AsyncIterableIterator<EventEnvelope<T[K]>>` for `for await` consumption; breaking out of the loop stops the consumer automatically
109
124
  - **Message TTL** — `messageTtlMs` drops or DLQs messages older than a configurable threshold, preventing stale events from poisoning downstream systems after a lag spike
110
- - **Circuit breaker** — `circuitBreaker` option applies a sliding-window breaker per topic-partition; pauses delivery on repeated DLQ failures and resumes after a configurable recovery window
125
+ - **Circuit breaker** — `circuitBreaker` option applies a sliding-window breaker per topic-partition; pauses delivery on repeated handler failures and resumes after a configurable recovery window
111
126
  - **Seek to offset** — `seekToOffset(groupId, assignments)` seeks individual partitions to explicit offsets for fine-grained replay
112
127
  - **Tombstone messages** — `sendTombstone(topic, key)` sends a null-value record to compact a key out of a log-compacted topic; all instrumentation hooks still fire
113
128
  - **Regex topic subscription** — `startConsumer([/^orders\..+/], handler)` subscribes using a pattern; the broker routes matching topics to the consumer dynamically
114
129
  - **Compression** — per-send `compression` option (`gzip`, `snappy`, `lz4`, `zstd`) in `SendOptions` and `BatchSendOptions`
115
130
  - **Partition assignment strategy** — `partitionAssigner` in `ConsumerOptions` chooses between `cooperative-sticky` (default), `roundrobin`, and `range`
116
131
  - **Admin API** — `listConsumerGroups()`, `describeTopics()`, `deleteRecords()` for group inspection, partition metadata, and message deletion
132
+ - **Typed partition keys** — `topic('orders').type<T>().key(m => m.orderId)` binds a partition-key extractor to a descriptor so related messages land on the same partition without passing `key` at every call site
133
+ - **Versioned schemas** — `versionedSchema({ 1: v1, 2: v2 }, { migrate })` dispatches validation on the `x-schema-version` header and upgrades old shapes to the latest
134
+ - **Constructor validation** — the `KafkaClient` constructor fails fast, throwing a single aggregated error that lists every invalid config value instead of surfacing a confusing driver error on first use
135
+ - **Pluggable deduplication store** — swap the in-memory Lamport-clock store for a `DedupStore` (e.g. Redis-backed) so deduplication survives restarts and rebalances; fail-open on store errors
136
+ - **Delayed delivery** — `sendMessage(..., { deliverAfterMs })` stages messages in `<topic>.delayed`; a `startDelayedRelay()` consumer forwards them transactionally once the deadline passes
137
+ - **OpenTelemetry metrics** — `otelMetricsInstrumentation()` records send/consume counters and a handler-duration histogram; `otelLagGauge()` reports per-partition consumer lag as an observable gauge
138
+ - **Transport security** — `security: { ssl, sasl }` with secure-by-default rules: SASL auto-enables TLS, plaintext to non-local brokers warns once (silenceable via `allowInsecure: true`); SASL mechanisms `plain`, `scram-sha-256`, `scram-sha-512`, `oauthbearer`
139
+ - **AWS MSK / GCP auth** — `awsMskIamProvider({ region })` and `gcpAccessTokenProvider()` supply OAUTHBEARER tokens from the standard AWS / Google credential chains (IRSA, task roles, ADC)
140
+ - **ACL requirements helper** — `describeRequiredAcls()` enumerates every derived topic, companion group, ephemeral group, and transactional id a service needs; render them as `kafka-acls.sh` commands or an MSK IAM policy
141
+ - **Environment configuration** — `kafkaClientConfigFromEnv()`, `consumerOptionsFromEnv()`, and `mergeConsumerOptions()` build config from env vars with `code > env > defaults` precedence
142
+ - **Transactional outbox** — `startOutboxRelay()` publishes rows from a DB outbox table to Kafka inside a transaction; at-least-once with stable `eventId` for downstream dedup
143
+ - **Pluggable serialization** — JSON by default; drop in `avroSerde()` / `protobufSerde()` (`@drarzter/kafka-client/serde`) for **Confluent wire-format** Avro/Protobuf and interop with Java/Go via a Schema Registry, client-wide or per-topic
144
+ - **Schema Registry client** — `SchemaRegistryClient` + `registrySchema()` keep locally-defined schemas in lockstep with a Confluent-compatible registry
145
+ - **Static group membership** — `groupInstanceId` (`group.instance.id`) skips rebalance on k8s rolling restarts within `session.timeout.ms`
146
+ - **DLQ CLI** — `kafka-client-dlq ls | peek | replay` for inspecting and re-publishing dead letter queues from the terminal
117
147
 
118
148
  See the [Roadmap](./ROADMAP.md) for upcoming features and version history.
119
149
 
@@ -605,6 +635,36 @@ await this.kafka.sendMessage(
605
635
  );
606
636
  ```
607
637
 
638
+ ### Typed partition keys
639
+
640
+ Instead of passing `key` at every call site, bind a partition-key extractor to the topic descriptor with `.key()`. The extractor runs on every send through that descriptor, so messages with the same logical key always land on the same partition — you never forget to set it. Available on both `.type<T>()` and `.schema()` descriptors:
641
+
642
+ ```typescript
643
+ import { topic } from '@drarzter/kafka-client';
644
+
645
+ const OrderCreated = topic('order.created')
646
+ .type<{ orderId: string; userId: string; amount: number }>()
647
+ .key((m) => m.orderId);
648
+
649
+ // Key is derived automatically from the payload — no `key` needed
650
+ await kafka.sendMessage(OrderCreated, { orderId: '123', userId: '456', amount: 100 });
651
+ // → produced with key '123'
652
+
653
+ // Works with schema descriptors too
654
+ const PaymentTaken = topic('payment.taken')
655
+ .schema(z.object({ paymentId: z.string(), orderId: z.string() }))
656
+ .key((m) => m.orderId);
657
+ ```
658
+
659
+ The extractor runs on the **original (pre-validation) payload**. An explicit `key` in `SendOptions` — or a batch item's `key` — always wins over the descriptor's extractor:
660
+
661
+ ```typescript
662
+ // Explicit key overrides the extractor
663
+ await kafka.sendMessage(OrderCreated, { orderId: '123', userId: '456', amount: 100 }, {
664
+ key: 'custom-partition-key',
665
+ });
666
+ ```
667
+
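The precedence rule described above (an explicit `key` beats the descriptor's extractor) can be pictured with a small standalone sketch. This is not the package's implementation: `resolvePartitionKey`, `KeyExtractor`, and `SendOptionsSketch` are hypothetical names introduced here purely for illustration.

```typescript
// Hypothetical sketch of the key-precedence rule; not the library's internals.
type KeyExtractor<T> = (message: T) => string;

interface SendOptionsSketch {
  key?: string;
}

function resolvePartitionKey<T>(
  message: T,
  extractor: KeyExtractor<T> | undefined,
  options: SendOptionsSketch = {},
): string | undefined {
  // An explicit key always wins over the descriptor's extractor.
  if (options.key !== undefined) return options.key;
  // Otherwise derive the key from the original payload, if an extractor is bound.
  return extractor ? extractor(message) : undefined;
}

interface OrderCreatedPayload {
  orderId: string;
  userId: string;
  amount: number;
}

const order: OrderCreatedPayload = { orderId: '123', userId: '456', amount: 100 };
const byOrderId: KeyExtractor<OrderCreatedPayload> = (m) => m.orderId;

const derivedKey = resolvePartitionKey(order, byOrderId);
const explicitKey = resolvePartitionKey(order, byOrderId, { key: 'custom-partition-key' });
const noKey = resolvePartitionKey(order, undefined);
```

Here `derivedKey` comes from the payload, `explicitKey` from the options, and `noKey` stays `undefined` because no extractor is bound.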
608
668
  ## Message headers
609
669
 
610
670
  Attach metadata to messages:
@@ -642,6 +702,33 @@ await this.kafka.sendBatch('order.created', [
642
702
  ]);
643
703
  ```
644
704
 
705
+ ## Delayed delivery
706
+
707
+ Schedule a message for future delivery with `deliverAfterMs`. Instead of going straight to the target topic, the message is produced to a `<topic>.delayed` staging topic carrying `x-delayed-until` (deadline) and `x-delayed-target` headers. A **relay consumer** started via `startDelayedRelay()` holds each message until its deadline passes, then forwards it to the target topic:
708
+
709
+ ```typescript
710
+ // 1. Start the relay once (per process) for the topics you delay-deliver to
711
+ await kafka.startDelayedRelay(['order.reminder']);
712
+
713
+ // 2. Send a message that should arrive in ~1 hour
714
+ await kafka.sendMessage(
715
+ 'order.reminder',
716
+ { orderId: '123', channel: 'email' },
717
+ { deliverAfterMs: 60 * 60 * 1000 },
718
+ );
719
+ // → staged in order.reminder.delayed, forwarded to order.reminder ~1 h later
720
+ ```
721
+
722
+ `deliverAfterMs` also works on `sendBatch` — it applies to the whole batch:
723
+
724
+ ```typescript
725
+ await kafka.sendBatch('order.reminder', messages, { deliverAfterMs: 30_000 });
726
+ ```
727
+
728
+ The relay defaults to a `<defaultGroupId>-delayed-relay` consumer group; override it with `startDelayedRelay(topics, { groupId })`. Forwarding is **transactional** — the produce to the target topic and the source-offset commit happen atomically, so no duplicates are relayed even if the relay crashes mid-forward. The original key, value, and envelope headers (`x-event-id`, `x-correlation-id`, `x-lamport-clock`, `traceparent`) all survive the hop; only the `x-delayed-*` control headers are stripped.
729
+
730
+ > **Delivery time is a lower bound.** The relay pauses a partition until the head-of-line message's deadline, so later messages on the same partition wait behind it (at-least semantics). Delayed messages are only delivered while the relay is running — treat it as a long-lived consumer, not a fire-and-forget scheduler.
731
+
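To make the relay's decision concrete, here is a minimal sketch of the two steps the section describes: checking whether a staged message is due, and stripping the `x-delayed-*` control headers before forwarding. It is an illustration only. The helper names are hypothetical, and it assumes `x-delayed-until` carries an epoch-millisecond timestamp as a decimal string, which the README excerpt does not state.

```typescript
// Hypothetical sketch of the relay decision; header encoding is an assumption.
type Headers = Record<string, string>;

interface StagedMessage {
  key: string | null;
  value: string;
  headers: Headers;
}

interface ForwardedMessage extends StagedMessage {
  topic: string;
}

// Milliseconds the relay should still wait; 0 means the message is due.
function remainingDelayMs(message: StagedMessage, nowMs: number): number {
  const deadline = Number(message.headers['x-delayed-until']);
  // Assumption: a missing or malformed deadline is treated as already due.
  if (!Number.isFinite(deadline)) return 0;
  return Math.max(0, deadline - nowMs);
}

// Builds the record for the target topic, dropping only x-delayed-* headers.
function toForwardedMessage(message: StagedMessage): ForwardedMessage {
  const target = message.headers['x-delayed-target'];
  if (target === undefined) throw new Error('missing x-delayed-target header');
  const headers: Headers = {};
  for (const name of Object.keys(message.headers)) {
    const value = message.headers[name];
    if (value !== undefined && !name.startsWith('x-delayed-')) headers[name] = value;
  }
  return { topic: target, key: message.key, value: message.value, headers };
}

const staged: StagedMessage = {
  key: '123',
  value: JSON.stringify({ orderId: '123', channel: 'email' }),
  headers: {
    'x-event-id': 'evt-1',
    'x-delayed-until': String(1_000 + 60_000),
    'x-delayed-target': 'order.reminder',
  },
};

const waitAtStart = remainingDelayMs(staged, 1_000);
const waitAfterDeadline = remainingDelayMs(staged, 100_000);
const forwarded = toForwardedMessage(staged);
```

In the real relay the forward and the source-offset commit happen in one transaction; this sketch covers only the pure header logic.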
645
732
  ## Batch consuming
646
733
 
647
734
  Process messages in batches for higher throughput. The handler receives an array of `EventEnvelope`s and a `BatchMeta` object with offset management controls:
@@ -821,6 +908,47 @@ const kafka = new KafkaClient('my-app', 'my-group', brokers, {
821
908
 
822
909
  `otelInstrumentation()` injects `traceparent` on send, extracts it on consume, and creates `CONSUMER` spans automatically. The span is set as the **active OTel context** for the handler's duration via `context.with()` — so `trace.getActiveSpan()` works inside your handler and any child spans are automatically parented to the consume span. Requires `@opentelemetry/api` as a peer dependency.
823
910
 
911
+ ### OpenTelemetry metrics
912
+
913
+ `otelInstrumentation()` handles **traces**. For **metrics**, the same entrypoint exports `otelMetricsInstrumentation()` (counters + a duration histogram) and `otelLagGauge()` (an observable consumer-lag gauge). They share nothing with the tracing instrumentation and compose with it in any order:
914
+
915
+ ```typescript
916
+ import {
917
+ otelInstrumentation,
918
+ otelMetricsInstrumentation,
919
+ otelLagGauge,
920
+ } from '@drarzter/kafka-client/otel';
921
+
922
+ const kafka = new KafkaClient('my-app', 'my-group', brokers, {
923
+ instrumentation: [otelInstrumentation(), otelMetricsInstrumentation()],
924
+ });
925
+ ```
926
+
927
+ `otelMetricsInstrumentation()` registers seven instruments under the meter `@drarzter/kafka-client` (created once per instance, not per message):
928
+
929
+ | Instrument | Type | Attributes | Recorded when |
930
+ | ---------- | ---- | ---------- | ------------- |
931
+ | `kafka.client.messages.sent` | Counter | `topic` | a message is sent |
932
+ | `kafka.client.messages.processed` | Counter | `topic` | a handler succeeds |
933
+ | `kafka.client.messages.retried` | Counter | `topic` | a message is queued for retry |
934
+ | `kafka.client.messages.dlq` | Counter | `topic`, `reason` | a message is routed to a DLQ |
935
+ | `kafka.client.messages.duplicate` | Counter | `topic`, `strategy` | a Lamport-clock duplicate is detected |
936
+ | `kafka.client.consume.errors` | Counter | `topic` | a handler throws |
937
+ | `kafka.client.consume.duration` | Histogram (ms) | `topic` | measured across the handler's execution |
938
+
939
+ Pass a custom meter with `otelMetricsInstrumentation({ meter })` to route instruments through your own `MeterProvider`; it defaults to `metrics.getMeter('@drarzter/kafka-client')`.
940
+
941
+ `otelLagGauge()` registers an observable gauge `kafka.client.consumer.lag` (attributes `topic`, `partition`, `groupId`) that polls `getConsumerLag()` on each metric-collection cycle. It returns an **unregister disposer** — call it on shutdown to stop observing:
942
+
943
+ ```typescript
944
+ const unregisterLag = otelLagGauge(kafka, { groupId: 'billing-service' });
945
+
946
+ // ...later, on shutdown:
947
+ unregisterLag();
948
+ ```
949
+
950
+ `groupId` defaults to the client's constructor group (reported as an empty-string attribute), and `meter` overrides the meter as above. Lag-query failures during a collection cycle are swallowed silently — a broker hiccup reports no samples for that cycle rather than breaking metric collection. Both helpers require `@opentelemetry/api` as a peer dependency.
951
+
824
952
  ### Custom instrumentation
825
953
 
826
954
  `beforeConsume` can return a `BeforeConsumeResult` — either the legacy `() => void` cleanup function, or an object with `cleanup` and/or `wrap`:
@@ -917,6 +1045,204 @@ Passing a topic name that has not seen any events returns a zero-valued snapshot
917
1045
 
918
1046
  Counters are incremented in the same code paths that fire the corresponding hooks — they are always active regardless of whether any instrumentation is configured.
919
1047
 
1048
+ ## Transport security
1049
+
1050
+ Configure TLS and SASL through the `security` option on `KafkaClientOptions`. The library applies **secure-by-default** rules so credentials never leak onto plaintext connections by accident:
1051
+
1052
+ - **SASL auto-enables TLS.** When `sasl` is set and `ssl` is left unset, `ssl` is turned on automatically — SASL credentials always travel over TLS unless you explicitly opt out.
1053
+ - **Explicit `ssl: false` with SASL warns.** Setting `sasl` together with `ssl: false` logs a warning that credentials will cross the wire in plaintext — only safe on fully trusted networks.
1054
+ - **Plaintext to non-local brokers warns once.** With no `ssl`/`sasl` at all and at least one non-local broker (anything outside `localhost`, `127.0.0.0/8`, `::1`, `0.0.0.0`, `host.docker.internal`), a single warning is logged per client. Acknowledge and silence it with `allowInsecure: true`.
1055
+
1056
+ Nothing here ever throws or blocks a connection — the defaults protect, you stay in control.
1057
+
1058
+ ```typescript
1059
+ import { KafkaClient } from '@drarzter/kafka-client/core';
1060
+
1061
+ // SASL/SCRAM over TLS — ssl auto-enabled because sasl is set
1062
+ const kafka = new KafkaClient('billing-svc', 'billing-group', ['broker.example.com:9093'], {
1063
+ security: {
1064
+ sasl: {
1065
+ mechanism: 'scram-sha-512',
1066
+ username: 'billing-svc',
1067
+ password: process.env.KAFKA_PASSWORD!,
1068
+ },
1069
+ // ssl: true — inferred automatically; set explicitly if you prefer
1070
+ },
1071
+ });
1072
+ ```
1073
+
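The three secure-by-default rules can be read as a small pure function. The sketch below is one way to express them under stated assumptions: `resolveSecuritySketch`, `isLocalBroker`, and the warning strings are hypothetical and are not the package's `resolve-security` module, whose signature is not shown in this diff.

```typescript
// Hypothetical sketch of the secure-by-default rules; names and messages are illustrative.
interface SecurityInputSketch {
  ssl?: boolean;
  sasl?: { mechanism: string };
  allowInsecure?: boolean;
}

interface ResolvedSecuritySketch {
  ssl: boolean;
  warnings: string[];
}

const LOCAL_HOSTS = new Set(['localhost', '::1', '0.0.0.0', 'host.docker.internal']);

function hostOf(broker: string): string {
  if (broker.startsWith('[')) {
    // Bracketed IPv6 literal such as "[::1]:9092".
    const end = broker.indexOf(']');
    return end === -1 ? broker : broker.slice(1, end);
  }
  const first = broker.indexOf(':');
  const last = broker.lastIndexOf(':');
  // No colon: bare host. Several colons: unbracketed IPv6 literal without a port.
  if (first === -1 || first !== last) return broker;
  return broker.slice(0, first);
}

function isLocalBroker(broker: string): boolean {
  const host = hostOf(broker).toLowerCase();
  // Anything in 127.0.0.0/8 counts as local.
  return LOCAL_HOSTS.has(host) || /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(host);
}

function resolveSecuritySketch(
  brokers: string[],
  input: SecurityInputSketch = {},
): ResolvedSecuritySketch {
  const warnings: string[] = [];
  // Rule 1: SASL turns TLS on unless ssl was set explicitly.
  const ssl = input.ssl ?? (input.sasl !== undefined);
  // Rule 2: SASL with an explicit ssl: false only warns, it never throws.
  if (input.sasl !== undefined && input.ssl === false) {
    warnings.push('SASL credentials will cross the wire in plaintext');
  }
  // Rule 3: plaintext to a non-local broker warns unless acknowledged.
  const plaintext = !ssl && input.sasl === undefined;
  if (plaintext && !input.allowInsecure && brokers.some((b) => !isLocalBroker(b))) {
    warnings.push('plaintext connection to a non-local broker');
  }
  return { ssl, warnings };
}

const saslOnly = resolveSecuritySketch(['broker.example.com:9093'], {
  sasl: { mechanism: 'scram-sha-512' },
});
const saslNoTls = resolveSecuritySketch(['broker.example.com:9093'], {
  sasl: { mechanism: 'plain' },
  ssl: false,
});
const remotePlaintext = resolveSecuritySketch(['broker.example.com:9092']);
const localPlaintext = resolveSecuritySketch(['localhost:9092', '127.0.0.1:9092', '[::1]:9092']);
const acknowledged = resolveSecuritySketch(['broker.example.com:9092'], { allowInsecure: true });
```

Returning warnings as data rather than logging them keeps the sketch testable; the library itself is described as logging them.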
1074
+ `KafkaSecurityOptions`:
1075
+
1076
+ | Field | Default | Description |
1077
+ | ----- | ------- | ----------- |
1078
+ | `ssl` | `true` when `sasl` set, else `false` | Enable TLS |
1079
+ | `sasl` | — | SASL authentication (see below) |
1080
+ | `allowInsecure` | `false` | Acknowledge an intentionally insecure (plaintext, non-local) setup and silence the warning. No effect when `ssl`/`sasl` are set |
1081
+
1082
+ `sasl` is a discriminated union on `mechanism`:
1083
+
1084
+ ```typescript
1085
+ // Username / password mechanisms
1086
+ { mechanism: 'plain' | 'scram-sha-256' | 'scram-sha-512', username: string, password: string }
1087
+
1088
+ // Token-based (AWS MSK IAM, GCP, custom)
1089
+ { mechanism: 'oauthbearer', oauthBearerProvider: () => Promise<OAuthBearerToken> }
1090
+ ```
1091
+
1092
+ An `OAuthBearerProvider` is an async factory the driver calls on connect and before each token expiry; it returns `{ value, principal?, lifetimeMs?, extensions? }`.
1093
+
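For clusters that neither ready-made provider covers, a custom provider is just an async function returning the token shape described above. The following sketch declares its own local types rather than importing the package's, and `staticTokenProvider` and `fetchToken` are hypothetical names. The semantics of `lifetimeMs` are not specified in this excerpt, so the sketch leaves it unset.

```typescript
// Hypothetical sketch of a custom OAUTHBEARER provider; types are declared locally.
interface OAuthBearerTokenSketch {
  value: string;
  principal?: string;
  lifetimeMs?: number;
  extensions?: Record<string, string>;
}

type OAuthBearerProviderSketch = () => Promise<OAuthBearerTokenSketch>;

// Wraps any async token source (for example a call to an identity service)
// into the provider shape; fetchToken is supplied by the caller.
function staticTokenProvider(
  fetchToken: () => Promise<string>,
  principal: string,
): OAuthBearerProviderSketch {
  return async () => ({ value: await fetchToken(), principal });
}

const provider = staticTokenProvider(async () => 'token-abc', 'billing-svc');
const pendingToken = provider();
```

A provider built this way would be passed as `oauthBearerProvider` alongside `mechanism: 'oauthbearer'`, assuming its return type matches the package's `OAuthBearerToken`.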
1094
+ ### AWS MSK IAM & GCP authentication
1095
+
1096
+ Two ready-made `oauthbearer` providers cover the common managed-Kafka cases. Both resolve credentials from the platform's standard chain — nothing to hard-code — and rely on an **optional** peer dependency you install alongside this library.
1097
+
1098
+ **AWS MSK IAM** — `awsMskIamProvider({ region })` delegates token signing to `aws-msk-iam-sasl-signer-js`. Credentials come from the standard AWS provider chain, so EKS IRSA, ECS task roles, and env credentials all work unchanged. Authorisation is then governed by IAM policies (`kafka-cluster:*` actions) — see [ACL requirements](#acl-requirements) to generate one:
1099
+
1100
+ ```bash
1101
+ npm install aws-msk-iam-sasl-signer-js
1102
+ ```
1103
+
1104
+ ```typescript
1105
+ import { KafkaClient, awsMskIamProvider } from '@drarzter/kafka-client/core';
1106
+
1107
+ const kafka = new KafkaClient('orders-svc', 'orders-group', brokers, {
1108
+ security: {
1109
+ sasl: {
1110
+ mechanism: 'oauthbearer',
1111
+ oauthBearerProvider: awsMskIamProvider({ region: 'eu-west-1' }),
1112
+ },
1113
+ },
1114
+ });
1115
+ ```
1116
+
1117
+ **GCP** — `gcpAccessTokenProvider()` delegates to `google-auth-library` using Application Default Credentials, so GKE Workload Identity, attached service accounts, and `GOOGLE_APPLICATION_CREDENTIALS` all work unchanged. It supplies a raw ADC access token; verify the exact token format your cluster expects against current Google documentation:
1118
+
1119
+ ```bash
1120
+ npm install google-auth-library
1121
+ ```
1122
+
1123
+ ```typescript
1124
+ import { KafkaClient, gcpAccessTokenProvider } from '@drarzter/kafka-client/core';
1125
+
1126
+ const kafka = new KafkaClient('events-svc', 'events-group', brokers, {
1127
+ security: {
1128
+ sasl: {
1129
+ mechanism: 'oauthbearer',
1130
+ oauthBearerProvider: gcpAccessTokenProvider(),
1131
+ },
1132
+ },
1133
+ });
1134
+ ```
1135
+
1136
+ | Provider | Options | Optional peer dep |
1137
+ | -------- | ------- | ----------------- |
1138
+ | `awsMskIamProvider` | `{ region }` | `aws-msk-iam-sasl-signer-js` |
1139
+ | `gcpAccessTokenProvider` | `{ scopes?, principal?, tokenTtlMs? }` (defaults: `cloud-platform` scope, principal `'gcp'`, 50 min TTL) | `google-auth-library` |
1140
+
1141
+ Neither package is a hard dependency — they are dynamically imported on first token fetch. If the package is missing, the provider throws a clear install hint rather than failing at build time.
1142
+
1143
+ ### ACL requirements
1144
+
1145
+ The features that make this library convenient — retry topics, DLQ, delayed delivery, deduplication routing, DLQ replay, snapshots, clock recovery — quietly create **extra topics and consumer groups** (`<topic>.retry.N`, `<topic>.dlq`, `<topic>.delayed`, `<topic>.duplicates`, `<groupId>-retry.N`, timestamped ephemeral groups, transactional ids). On a locked-down cluster every one of them needs an ACL, and the last place you want to discover a missing grant is production at 3 a.m.
1146
+
1147
+ `describeRequiredAcls()` enumerates the complete set from a declarative usage profile. Feed the result to `toKafkaAclCommands()` for `kafka-acls.sh` commands, or `toMskIamPolicy()` for an AWS MSK IAM policy document:
1148
+
1149
+ ```typescript
1150
+ import {
1151
+ describeRequiredAcls,
1152
+ toKafkaAclCommands,
1153
+ toMskIamPolicy,
1154
+ } from '@drarzter/kafka-client/core';
1155
+
1156
+ const resources = describeRequiredAcls({
1157
+ clientId: 'billing-svc',
1158
+ groupIds: ['billing-svc-group'],
1159
+ produceTopics: ['invoices.created'],
1160
+ consumeTopics: ['orders.created'],
1161
+ features: {
1162
+ retryTopics: { maxRetries: 3 },
1163
+ dlq: true,
1164
+ dlqReplay: true,
1165
+ transactions: true,
1166
+ },
1167
+ });
1168
+
1169
+ // Render kafka-acls.sh commands for a principal
1170
+ for (const cmd of toKafkaAclCommands(resources, 'User:billing-svc', 'broker:9092')) {
1171
+ console.log(cmd);
1172
+ }
1173
+ // kafka-acls.sh --bootstrap-server broker:9092 --add --allow-principal 'User:billing-svc' \
1174
+ // --operation READ --operation DESCRIBE --topic 'orders.created' # startConsumer
1175
+ // kafka-acls.sh ... --topic 'orders.created.dlq' # dlq: true — failed messages routed to DLQ
1176
+ // kafka-acls.sh ... --topic 'orders.created.retry.1' ... --topic 'orders.created.retry.3'
1177
+ // kafka-acls.sh ... --group 'billing-svc-group-retry.' --resource-pattern-type prefixed
1178
+ // kafka-acls.sh ... --transactional-id 'billing-svc-group-' --resource-pattern-type prefixed
1179
+ // kafka-acls.sh ... --group 'orders.created.dlq-replay' --operation DELETE --resource-pattern-type prefixed
1180
+ // ...
1181
+
1182
+ // Or an MSK IAM policy document
1183
+ const policy = toMskIamPolicy(resources, {
1184
+ region: 'eu-west-1',
1185
+ accountId: '123456789012',
1186
+ clusterName: 'prod',
1187
+ clusterUuid: 'abcd-1234',
1188
+ });
1189
+ ```
1190
+
1191
+ `describeRequiredAcls()` returns `AclResource[]`, each carrying `resourceType` (`topic` | `group` | `transactional-id` | `cluster`), `patternType` (`literal` | `prefixed`), `name`, `operations`, and a `reason` naming the feature that requires it. Ephemeral-group features (`dlqReplay`, `snapshots`, `clockRecovery`) request `DELETE` on a **prefixed** pattern, because those groups are timestamped and cleaned up after use.
1192
+
1193
+ | Feature flag | Adds |
1194
+ | ------------ | ---- |
1195
+ | `dlq` | `<topic>.dlq` WRITE per consumed topic |
1196
+ | `retryTopics: { maxRetries }` | `<topic>.retry.1…N` topics; `<groupId>-retry.` prefixed groups; `<groupId>-` prefixed transactional ids |
1197
+ | `delayedDelivery` | `<topic>.delayed` topics; `<groupId>-delayed-relay` group + `-tx` id |
1198
+ | `duplicatesTopic` | `<topic>.duplicates` (or a custom topic name) WRITE |
1199
+ | `dlqReplay` | `<topic>.dlq-replay` prefixed groups (READ, DESCRIBE, **DELETE**) + DLQ READ |
1200
+ | `snapshots` | `<clientId>-snapshot-` prefixed groups (READ, DESCRIBE, **DELETE**) |
1201
+ | `clockRecovery` | `<clientId>-clock-recovery-` prefixed groups (READ, DESCRIBE, **DELETE**) |
1202
+ | `transactions` | `<clientId>-tx` transactional id |
1203
+ | `autoCreateTopics` | cluster `CREATE` (avoid in production) |
1204
+
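The naming conventions in the table above can be sketched as a small helper. This is only an illustration of the `<topic>.retry.N`, `<topic>.dlq`, and `<topic>.delayed` patterns, not the library's implementation; `derivedTopics` and `DerivedTopicOptions` are hypothetical names.

```typescript
// Hypothetical helper, not part of @drarzter/kafka-client: lists the
// companion topics implied by the naming conventions in the table above.
interface DerivedTopicOptions {
  maxRetries?: number; // mirrors retryTopics: { maxRetries }
  dlq?: boolean;
  delayedDelivery?: boolean;
}

function derivedTopics(topic: string, opts: DerivedTopicOptions): string[] {
  const names: string[] = [];
  // One retry topic per level: <topic>.retry.1 … <topic>.retry.N
  for (let level = 1; level <= (opts.maxRetries ?? 0); level++) {
    names.push(`${topic}.retry.${level}`);
  }
  if (opts.dlq) names.push(`${topic}.dlq`);
  if (opts.delayedDelivery) names.push(`${topic}.delayed`);
  return names;
}

console.log(derivedTopics('orders.created', { maxRetries: 3, dlq: true }));
```

A list like this is handy for eyeballing the output of `describeRequiredAcls()` during review; the function's own result remains the authoritative set.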
1205
+ `toMskIamPolicy()` maps Kafka operations to `kafka-cluster:*` actions, turns prefixed patterns into `name*` ARN wildcards, and always includes `kafka-cluster:Connect`. **Review both outputs against your organisation's least-privilege standards and current AWS documentation before applying** — they are a starting point, not a rubber stamp.
1206
+
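To make the "prefixed pattern becomes a `name*` wildcard" rule concrete, here is a minimal sketch of how a resource could be turned into an MSK ARN. It assumes the `arn:aws:kafka:<region>:<account>:<type>/<cluster-name>/<cluster-uuid>/<name>` layout from the AWS documentation for MSK IAM access control; `toResourceArn` and `ClusterRef` are hypothetical names, and the library's actual output may differ.

```typescript
// Hypothetical sketch, not the library's toMskIamPolicy(): shows only the
// literal-vs-prefixed naming rule for topic, group, and transactional-id ARNs.
type ArnResourceType = 'topic' | 'group' | 'transactional-id';
type PatternType = 'literal' | 'prefixed';

interface ClusterRef {
  region: string;
  accountId: string;
  clusterName: string;
  clusterUuid: string;
}

function toResourceArn(
  type: ArnResourceType,
  name: string,
  patternType: PatternType,
  cluster: ClusterRef,
): string {
  // A prefixed Kafka ACL pattern maps to a trailing wildcard in the ARN.
  const suffix = patternType === 'prefixed' ? `${name}*` : name;
  return (
    `arn:aws:kafka:${cluster.region}:${cluster.accountId}:` +
    `${type}/${cluster.clusterName}/${cluster.clusterUuid}/${suffix}`
  );
}

const cluster: ClusterRef = {
  region: 'eu-west-1',
  accountId: '123456789012',
  clusterName: 'prod',
  clusterUuid: 'abcd-1234',
};

console.log(toResourceArn('group', 'billing-svc-group-retry.', 'prefixed', cluster));
```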
1207
+ ## Environment configuration
1208
+
1209
+ Build client and consumer configuration from environment variables with a strict precedence rule: **explicit code options > env vars > built-in library defaults**. The helpers only *feed* values in — anything you hard-code always wins, and any variable left unset keeps the library default.
1210
+
1211
+ The library never reads a `.env` file itself. Load one first with Node's built-in `node --env-file=.env` (Node 20.6+) or the `dotenv` package, then call the helpers:
1212
+
1213
+ ```typescript
1214
+ import { KafkaClient, kafkaClientConfigFromEnv } from '@drarzter/kafka-client/core';
1215
+
1216
+ const { clientId, groupId, brokers, options } = kafkaClientConfigFromEnv();
1217
+
1218
+ const kafka = new KafkaClient(
1219
+ clientId ?? 'my-svc', // env value or your fallback
1220
+ groupId ?? 'my-grp',
1221
+ brokers ?? ['localhost:9092'],
1222
+ {
1223
+ ...options, // only the keys whose env vars were present
1224
+ onMessageLost: alerting, // code-level value — always applied, not env-configurable
1225
+ },
1226
+ );
1227
+ ```
1228
+
1229
+ `kafkaClientConfigFromEnv(env?, prefix?)` reads `KAFKA_`-prefixed variables (`CLIENT_ID`, `GROUP_ID`, `BROKERS`, `AUTO_CREATE_TOPICS`, `STRICT_SCHEMAS`, `NUM_PARTITIONS`, `TRANSACTIONAL_ID`, `CLOCK_RECOVERY_*`, `LAG_THROTTLE_*`, and the security vars `SSL`, `SASL_MECHANISM`, `SASL_USERNAME`, `SASL_PASSWORD`, `ALLOW_INSECURE`). It returns `{ clientId?, groupId?, brokers?, options }`, emitting only the keys whose variables were set. Malformed booleans/numbers/enums throw with the offending variable named. `oauthbearer` cannot come from env — token providers are functions, so configure them in code.
1230
+
1231
+ `consumerOptionsFromEnv(env?, prefix?)` reads `KAFKA_CONSUMER_`-prefixed variables into a `Partial<ConsumerOptions>` (retry, DLQ, deduplication, circuit breaker, TTL, `GROUP_INSTANCE_ID`, and more). Merge it under your code-level options with `mergeConsumerOptions()`, which applies the precedence rule — later layers win, and the nested objects (`retry`, `deduplication`, `circuitBreaker`, `subscribeRetry`) are deep-merged so a code layer can override a single field:
1232
+
1233
+ ```typescript
1234
+ import { consumerOptionsFromEnv, mergeConsumerOptions } from '@drarzter/kafka-client/core';
1235
+
1236
+ const envDefaults = consumerOptionsFromEnv();
1237
+ await kafka.startConsumer(
1238
+ ['orders'],
1239
+ handler,
1240
+ mergeConsumerOptions(envDefaults, { dlq: true }), // code layer wins on conflict
1241
+ );
1242
+ ```
1243
+
1244
+ Both helpers accept an explicit `env` object (handy in tests) and a custom variable `prefix`. See [`docs/configuration.md`](./docs/configuration.md) for the full variable reference and [`.env.example`](./.env.example) for a ready-to-copy template.
1245
+
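The precedence rule (later layers win, nested objects merged field by field) can be sketched in a few lines. This is an illustration of the described behaviour under stated assumptions, not the source of `mergeConsumerOptions()`; `mergeLayers` is a hypothetical name and the `retry` field names are placeholders chosen for the example.

```typescript
// Hypothetical sketch of layered option merging; not the library's code.
type Layer = Record<string, unknown>;

// The nested option objects the section above says are deep-merged.
const NESTED_KEYS = ['retry', 'deduplication', 'circuitBreaker', 'subscribeRetry'];

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function mergeLayers(...layers: Layer[]): Layer {
  const result: Layer = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer)) {
      const existing = result[key];
      if (NESTED_KEYS.includes(key) && isPlainObject(existing) && isPlainObject(value)) {
        // Merge field by field so a later layer can override a single field.
        result[key] = { ...existing, ...value };
      } else {
        // Everything else: the later layer simply wins.
        result[key] = value;
      }
    }
  }
  return result;
}

// Placeholder field names, for illustration only.
const fromEnv = { dlq: false, retry: { maxRetries: 5, backoffMs: 1000 } };
const fromCode = { dlq: true, retry: { maxRetries: 2 } };

console.log(JSON.stringify(mergeLayers(fromEnv, fromCode)));
```

Here the code layer overrides `dlq` and `retry.maxRetries`, while `retry.backoffMs` survives from the env layer.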
920
1246
  ## Options reference
921
1247
 
922
1248
  ### Send options
@@ -931,8 +1257,9 @@ Options for `sendMessage()` — the third argument:
931
1257
  | `schemaVersion` | `1` | Schema version for the payload |
932
1258
  | `eventId` | auto | Override the auto-generated event ID (UUID v4) |
933
1259
  | `compression` | — | Compression codec for the message set: `'gzip'`, `'snappy'`, `'lz4'`, `'zstd'`; omit to send uncompressed |
1260
+ | `deliverAfterMs` | — | Delay delivery by at least this many milliseconds via a `<topic>.delayed` staging topic; requires a running `startDelayedRelay()` (see [Delayed delivery](#delayed-delivery)) |
934
1261
 
935
- `sendBatch()` accepts `compression` as a top-level option (not per-message); all other options are per-message inside the array items.
1262
+ `sendBatch()` accepts `compression` and `deliverAfterMs` as top-level options (not per-message); all other options are per-message inside the array items.
936
1263
 
937
1264
  ### Consumer options
938
1265
 
@@ -951,15 +1278,17 @@ Options for `sendMessage()` — the third argument:
951
1278
  | `handlerTimeoutMs` | — | Log a warning if the handler hasn't resolved within this window (ms) — does not cancel the handler |
952
1279
  | `deduplication.strategy` | `'drop'` | What to do with duplicate messages: `'drop'` silently discards, `'dlq'` forwards to `{topic}.dlq` (requires `dlq: true`), `'topic'` forwards to `{topic}.duplicates` |
953
1280
  | `deduplication.duplicatesTopic` | `{topic}.duplicates` | Custom destination for `strategy: 'topic'` |
1281
+ | `deduplication.store` | in-memory | Pluggable `DedupStore` for the per-partition last-processed clock; supply a persistent store (e.g. Redis) so dedup survives restarts/rebalances (see [Pluggable deduplication store](#pluggable-deduplication-store)) |
954
1282
  | `messageTtlMs` | — | Drop (or DLQ) messages older than this many milliseconds at consumption time; evaluated against the `x-timestamp` header; see [Message TTL](#message-ttl) |
955
- | `circuitBreaker` | — | Enable circuit breaker with `{}` for zero-config defaults; requires `dlq: true`; see [Circuit breaker](#circuit-breaker) |
956
- | `circuitBreaker.threshold` | `5` | DLQ failures within `windowSize` that opens the circuit |
1283
+ | `circuitBreaker` | — | Enable circuit breaker with `{}` for zero-config defaults; see [Circuit breaker](#circuit-breaker) |
1284
+ | `circuitBreaker.threshold` | `5` | Failed handler attempts within `windowSize` that open the circuit |
957
1285
  | `circuitBreaker.recoveryMs` | `30_000` | Milliseconds to wait in OPEN state before entering HALF_OPEN |
958
1286
  | `circuitBreaker.windowSize` | `threshold × 2, min 10` | Sliding window size in messages |
959
1287
  | `circuitBreaker.halfOpenSuccesses` | `1` | Consecutive successes in HALF_OPEN required to close the circuit |
960
1288
  | `queueHighWaterMark` | unbounded | Max messages buffered in the `consume()` iterator queue before the partition is paused; resumes at 50% drain. Only applies to `consume()` |
961
1289
  | `batch` | `false` | (decorator only) Use `startBatchConsumer` instead of `startConsumer` |
962
1290
  | `partitionAssigner` | `'cooperative-sticky'` | Partition assignment strategy: `'cooperative-sticky'` (minimal movement on rebalance, best for horizontal scaling), `'roundrobin'` (even distribution), `'range'` (contiguous partition ranges) |
1291
+ | `groupInstanceId` | — | Static group membership (`group.instance.id`) — a member that restarts within `session.timeout.ms` rejoins with the same partitions and no rebalance. Must be unique per member; not propagated to retry companions. See [Static group membership](#static-group-membership) |
963
1292
  | `onTtlExpired` | — | Per-consumer override of the client-level `onTtlExpired` callback; takes precedence when set. Receives `TtlExpiredContext` — same shape as the client-level hook |
964
1293
  | `onMessageLost` | — | Per-consumer override of the client-level `onMessageLost` callback; takes precedence when set. Use for consumer-specific dead-message alerting or structured logging |
965
1294
  | `onRetry` | — | Per-consumer retry callback; fires **in addition to** the built-in metrics hook (does not replace it). Same signature as `KafkaInstrumentation.onRetry` |
@@ -980,11 +1309,34 @@ Passed to `KafkaModule.register()` or returned from `registerAsync()` factory:
980
1309
  | `autoCreateTopics` | `false` | Auto-create topics on first send (dev only) |
981
1310
  | `numPartitions` | `1` | Number of partitions for auto-created topics |
982
1311
  | `strictSchemas` | `true` | Validate string topic keys against schemas registered via TopicDescriptor |
1312
+ | `security` | — | TLS + SASL transport security with secure-by-default rules (`{ ssl, sasl, allowInsecure }`); see [Transport security](#transport-security) |
983
1313
  | `instrumentation` | `[]` | Client-wide instrumentation hooks (e.g. OTel). Applied to both send and consume paths |
984
1314
  | `transactionalId` | `${clientId}-tx` | Transactional producer ID for `transaction()` calls. Must be unique per producer instance across the cluster — two instances sharing the same ID will be fenced by Kafka. The client logs a warning when the same ID is registered twice within one process |
985
1315
  | `onMessageLost` | — | Called when a message is silently dropped without DLQ — use to alert, log to external systems, or trigger fallback logic |
986
1316
  | `onTtlExpired` | — | Called when a message is dropped due to TTL expiration (`messageTtlMs`) and `dlq` is not enabled; receives `{ topic, ageMs, messageTtlMs, headers }` |
987
1317
  | `onRebalance` | — | Called on every partition assign/revoke event across all consumers created by this client |
1318
+ | `clockRecovery.topics` | — | Topics to scan on `connectProducer()` to recover the highest `x-lamport-clock`, so the clock stays monotonic across restarts (see [Deduplication](#deduplication-lamport-clock)) |
1319
+ | `clockRecovery.timeoutMs` | `30000` | Max time (ms) to wait for clock recovery before proceeding with a partial result |
1320
+ | `lagThrottle` | — | Delay sends when a consumer group's lag exceeds `maxLag` (see [Lag-based producer throttling](#lag-based-producer-throttling)) |
1321
+ | `lagThrottle.maxLag` | — | Lag threshold (messages) above which sends are delayed (required when `lagThrottle` is set) |
1322
+ | `lagThrottle.groupId` | default group | Consumer group whose lag is monitored |
1323
+ | `lagThrottle.pollIntervalMs` | `5000` | How often (ms) to poll `getConsumerLag()` in the background |
1324
+ | `lagThrottle.maxWaitMs` | `30000` | Max time (ms) a send waits while throttled before proceeding anyway (best-effort, not hard back-pressure) |
1325
+ | `transport` | `ConfluentTransport` | Custom `KafkaTransport` implementation — target an alternative broker library or inject a deterministic fake in tests |
1326
+
1327
+ > **Advanced — direct transport access.** `ConfluentTransport` and the full
1328
+ > `KafkaTransport` interface family (`IProducer`, `IConsumer`, `IAdmin`, …) are
1329
+ > exported from `@drarzter/kafka-client/core`. When you need low-level admin
1330
+ > operations the facade does not expose (e.g. per-partition watermarks), build a
1331
+ > transport instead of deep-importing the raw driver:
1332
+ >
1333
+ > ```typescript
1334
+ > import { ConfluentTransport } from '@drarzter/kafka-client/core';
1335
+ >
1336
+ > const admin = new ConfluentTransport('ops-cli', brokers).admin();
1337
+ > await admin.connect();
1338
+ > const watermarks = await admin.fetchTopicOffsets('orders'); // [{ partition, low, high }]
1339
+ > ```
988
1340
 
989
1341
  **Module-scoped** (default) — import `KafkaModule` in each module that needs it:
990
1342
 
@@ -1147,6 +1499,50 @@ Deduplication state is **in-memory and per-consumer-instance**. Understand what
1147
1499
 
1148
1500
  Use this feature as a lightweight first line of defence — not as a substitute for idempotent business logic.
1149
1501
 
1502
+ ### Pluggable deduplication store
1503
+
1504
+ The in-memory limitation above is only the **default**. Pass a `store` in `deduplication` to back the per-partition clock with any external system — Redis, a database, anything — so deduplication survives process restarts and rebalances. The store implements the `DedupStore` interface:
1505
+
1506
+ ```typescript
1507
+ import { DedupStore } from '@drarzter/kafka-client';
1508
+
1509
+ interface DedupStore {
1510
+ // Return the last processed clock for a group + "topic:partition", or undefined.
1511
+ getLastClock(groupId: string, topicPartition: string): number | undefined | Promise<number | undefined>;
1512
+ // Persist the last processed clock for a group + "topic:partition".
1513
+ setLastClock(groupId: string, topicPartition: string, clock: number): void | Promise<void>;
1514
+ }
1515
+ ```
1516
+
1517
+ Both methods may be synchronous or return a promise. A minimal Redis-backed store:
1518
+
1519
+ ```typescript
1520
+ class RedisDedupStore implements DedupStore {
1521
+ constructor(private readonly redis: RedisClient) {}
1522
+
1523
+ private key(groupId: string, topicPartition: string) {
1524
+ return `dedup:${groupId}:${topicPartition}`;
1525
+ }
1526
+
1527
+ async getLastClock(groupId: string, topicPartition: string) {
1528
+ const raw = await this.redis.get(this.key(groupId, topicPartition));
1529
+ return raw === null ? undefined : Number(raw);
1530
+ }
1531
+
1532
+ async setLastClock(groupId: string, topicPartition: string, clock: number) {
1533
+ await this.redis.set(this.key(groupId, topicPartition), String(clock));
1534
+ }
1535
+ }
1536
+
1537
+ await kafka.startConsumer(['payments'], handler, {
1538
+ deduplication: { strategy: 'drop', store: new RedisDedupStore(redis) },
1539
+ });
1540
+ ```
1541
+
1542
+ **Failure semantics (fail-open):** if `getLastClock` or `setLastClock` throws or rejects, the error is logged and the message is treated as **not** a duplicate. A transient store outage never silently drops messages — it only weakens deduplication until the store recovers, biasing towards at-least-once delivery.
1543
+
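The fail-open rule can be pictured as a thin wrapper around the store calls. The sketch below is an assumption-laden illustration, not the library's internals: `checkDuplicate`, `recordProcessed`, and `ClockStore` are hypothetical names, and it assumes a message counts as a duplicate when its clock is less than or equal to the last processed clock.

```typescript
// Hypothetical sketch of fail-open deduplication; not the library's code.
// ClockStore mirrors the DedupStore shape shown above.
interface ClockStore {
  getLastClock(groupId: string, topicPartition: string): number | undefined | Promise<number | undefined>;
  setLastClock(groupId: string, topicPartition: string, clock: number): void | Promise<void>;
}

type LogError = (message: string, err: unknown) => void;

async function checkDuplicate(
  store: ClockStore,
  groupId: string,
  topicPartition: string,
  clock: number,
  logError: LogError = console.error,
): Promise<boolean> {
  try {
    const last = await store.getLastClock(groupId, topicPartition);
    // Assumption: duplicates are messages at or below the last processed clock.
    return last !== undefined && clock <= last;
  } catch (err) {
    // Fail open: a store outage must not cause messages to be dropped.
    logError('dedup store read failed; treating message as new', err);
    return false;
  }
}

async function recordProcessed(
  store: ClockStore,
  groupId: string,
  topicPartition: string,
  clock: number,
  logError: LogError = console.error,
): Promise<void> {
  try {
    await store.setLastClock(groupId, topicPartition, clock);
  } catch (err) {
    // A failed write only weakens deduplication until the store recovers.
    logError('dedup store write failed; continuing', err);
  }
}
```

Because both helpers swallow store errors after logging them, the worst case during an outage is a duplicate reaching the handler, which is why idempotent business logic is still recommended.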
1544
+ When `store` is omitted, the built-in `InMemoryDedupStore` is used — the in-session behaviour described above.
1545
+
1150
1546
  ## Retry topic chain
1151
1547
 
1152
1548
  > **tl;dr — recommended production setup:**
@@ -1246,9 +1642,9 @@ Pausing is non-destructive: the consumer stays connected and Kafka preserves the
1246
1642
 
1247
1643
  ## Circuit breaker
1248
1644
 
1249
- Automatically pause delivery from a topic-partition when its DLQ error rate exceeds a threshold. After a recovery window the partition is resumed automatically.
1645
+ Automatically pause delivery from a topic-partition when its handler failure rate exceeds a threshold. After a recovery window the partition is resumed automatically.
1250
1646
 
1251
- **`dlq: true` is required** the breaker counts DLQ events as failures. Without it no failures are recorded and the circuit never opens.
1647
+ Failures are recorded at the handler-error boundary: every failed handler attempt counts (including in-process retries and retry-topic chain levels), independent of whether the message ends up in a DLQ. `dlq` is **not** required for the breaker to work.
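As a mental model, the breaker behaves roughly like the sliding-window state machine sketched here. This is a simplified illustration under assumptions, not the library's implementation: `SlidingWindowBreaker` is a hypothetical class, time is passed in explicitly, and a failure in HALF_OPEN is assumed to re-open the circuit.

```typescript
// Hypothetical sketch of a sliding-window circuit breaker; not the library's code.
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

class SlidingWindowBreaker {
  private outcomes: boolean[] = []; // true = failed handler attempt
  private state: BreakerState = 'CLOSED';
  private openedAt = 0;
  private halfOpenOk = 0;
  private readonly windowSize: number;

  constructor(
    private readonly threshold = 5,
    private readonly recoveryMs = 30_000,
    windowSize?: number,
    private readonly halfOpenSuccesses = 1,
  ) {
    // Documented default: threshold × 2, with a minimum of 10.
    this.windowSize = windowSize ?? Math.max(threshold * 2, 10);
  }

  /** Current state, moving OPEN to HALF_OPEN once recoveryMs has elapsed. */
  current(now: number): BreakerState {
    if (this.state === 'OPEN' && now - this.openedAt >= this.recoveryMs) {
      this.state = 'HALF_OPEN';
      this.halfOpenOk = 0;
    }
    return this.state;
  }

  /** Record one handler attempt and return the resulting state. */
  record(failed: boolean, now: number): BreakerState {
    const state = this.current(now);
    if (state === 'OPEN') return state; // assumption: delivery is paused while OPEN
    if (state === 'HALF_OPEN') {
      if (failed) return this.open(now); // assumption: a probe failure re-opens
      this.halfOpenOk += 1;
      if (this.halfOpenOk >= this.halfOpenSuccesses) {
        this.state = 'CLOSED';
        this.outcomes = [];
      }
      return this.state;
    }
    this.outcomes.push(failed);
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
    const failures = this.outcomes.filter(Boolean).length;
    return failures >= this.threshold ? this.open(now) : this.state;
  }

  private open(now: number): BreakerState {
    this.state = 'OPEN';
    this.openedAt = now;
    return this.state;
  }
}
```

With the defaults, five failed attempts among the last ten open the circuit; after 30 seconds a single successful probe closes it again.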
1252
1648
 
1253
1649
  Zero-config start — all options have sensible defaults:
1254
1650
 
@@ -1287,7 +1683,7 @@ Options:
1287
1683
 
1288
1684
  | Option | Default | Description |
1289
1685
  | ------ | ------- | ----------- |
1290
- | `threshold` | `5` | DLQ failures within `windowSize` that opens the circuit |
1686
+ | `threshold` | `5` | Failed handler attempts within `windowSize` that open the circuit |
1291
1687
  | `recoveryMs` | `30_000` | Milliseconds to wait in OPEN state before entering HALF_OPEN |
1292
1688
  | `windowSize` | `threshold × 2, min 10` | Sliding window size in messages |
1293
1689
  | `halfOpenSuccesses` | `1` | Consecutive successes in HALF_OPEN required to close the circuit |
@@ -1385,7 +1781,7 @@ await kafka.seekToTimestamp('payments-group', [
1385
1781
  ]);
1386
1782
  ```
1387
1783
 
1388
- Uses `admin.fetchTopicOffsetsByTime` under the hood. If no offset exists at the requested timestamp (e.g. the partition is empty or the timestamp is in the future), the partition falls back to `-1` (end of topic — new messages only).
1784
+ Uses `admin.fetchTopicOffsetsByTimestamp` under the hood. If no offset exists at the requested timestamp (e.g. the partition is empty or the timestamp is in the future), the partition falls back to the current high watermark (end of topic — new messages only).
1389
1785
 
1390
1786
  **Important:** the consumer group must be stopped before seeking. Assignments for the same topic are batched into a single `admin.setOffsets` call.
1391
1787
 
@@ -1691,6 +2087,169 @@ await kafka.startTransactionalConsumer(
1691
2087
 
1692
2088
  `retryTopics: true` is rejected at startup — EOS redelivery on failure is already guaranteed by the transaction. `autoCommit` is always `false` (managed internally).
1693
2089
 
2090
+ ## Transactional outbox
2091
+
2092
+ The transactional-outbox pattern decouples "write my business state" from "publish an event" so the two can never diverge. Application code writes an event row into an outbox table **in the same DB transaction** as its business writes; a relay polls that table and publishes the rows to Kafka, marking them published only after Kafka has acked them. If the process dies after the DB commit but before the publish, the row is still there and gets published on the next poll — the event is never lost.
2093
+
2094
+ `startOutboxRelay()` runs that relay against any `OutboxStore` you implement. The library never touches your database — you own the schema and the queries; it only needs to read unpublished rows oldest-first and durably mark rows published:
2095
+
2096
+ ```typescript
2097
+ import { startOutboxRelay, OutboxStore } from '@drarzter/kafka-client/core';
2098
+
2099
+ // Pseudo-Postgres store — you own the table and the SQL.
2100
+ const store: OutboxStore = {
2101
+ async fetchUnpublished(limit) {
2102
+ const { rows } = await pool.query(
2103
+ `SELECT id, topic, payload, key, correlation_id AS "correlationId",
2104
+ event_id AS "eventId", headers
2105
+ FROM outbox
2106
+ WHERE published_at IS NULL
2107
+ ORDER BY created_at ASC
2108
+ LIMIT $1`,
2109
+ [limit],
2110
+ );
2111
+ return rows;
2112
+ },
2113
+ async markPublished(ids) {
2114
+ await pool.query(`UPDATE outbox SET published_at = now() WHERE id = ANY($1)`, [ids]);
2115
+ },
2116
+ };
2117
+
2118
+ await kafka.connectProducer();
2119
+
2120
+ const relay = startOutboxRelay(kafka, store, {
2121
+ pollIntervalMs: 500, // default 1000
2122
+ batchSize: 200, // default 100 — rows fetched & published per tick
2123
+ onPublished: (n) => metrics.increment('outbox.published', n),
2124
+ onError: (err, batch) => logger.error(`outbox batch of ${batch.length} failed`, err),
2125
+ });
2126
+
2127
+ // On shutdown — stop() halts the timer and awaits any in-flight iteration:
2128
+ await relay.stop();
2129
+ await kafka.disconnect();
2130
+ ```
2131
+
2132
+ Meanwhile, application code inserts outbox rows inside its business transaction:
2133
+
2134
+ ```typescript
2135
+ // Inside a DB transaction, alongside your business INSERT/UPDATE:
2136
+ await tx.query(
2137
+ `INSERT INTO outbox (id, topic, payload, key, correlation_id, event_id)
2138
+ VALUES ($1, $2, $3, $4, $5, $6)`,
2139
+ [randomUUID(), 'orders.created', JSON.stringify(order), order.id, corrId, eventId],
2140
+ );
2141
+ ```
2142
+
2143
+ **Delivery guarantee: at-least-once.** Each poll publishes the whole batch inside **one Kafka transaction**, then marks the rows published. If the process crashes *after* the Kafka commit but *before* `markPublished`, those rows are re-published on the next tick — a **duplicate**. Persist a stable `eventId` on each row (surfaced as `x-event-id`) so consumers can deduplicate, either via this library's [Lamport-clock deduplication](#deduplication-lamport-clock) or an application-level idempotency check. Iterations never overlap; the loop never dies on error.
2144
+
2145
+ `OutboxStore` interface:
2146
+
2147
+ | Method | Description |
2148
+ | ------ | ----------- |
2149
+ | `fetchUnpublished(limit): Promise<OutboxMessage[]>` | Unpublished rows, oldest first, capped at `limit`. Empty array = nothing to do |
2150
+ | `markPublished(ids): Promise<void>` | Durably mark ids published; called only after Kafka acks. Idempotent |
2151
+
2152
+ An `InMemoryOutboxStore` (with `.add()`, `pendingCount`, `publishedCount`) ships for tests and as executable documentation — it is **not** durable, so it does not provide the "same DB transaction as the business write" guarantee that is the whole point of the pattern. A full Postgres reference implementation lives in [`src/integration/postgres-outbox.integration.spec.ts`](./src/integration/postgres-outbox.integration.spec.ts).
2153
+
2154
+ ## Serialization: JSON, Avro, Protobuf
2155
+
2156
+ By default every message value is serialized as JSON — no configuration needed.
2157
+ Serialization is a pluggable seam (`MessageSerde`): swap in Avro or Protobuf
2158
+ with **Confluent wire format** (`[magic 0x00][4-byte schema id][payload]`) to
2159
+ interoperate with Java/Go producers and consumers through a Schema Registry.
2160
+
2161
+ ```typescript
2162
+ import { KafkaClient, topic } from '@drarzter/kafka-client/core';
2163
+ import { avroSerde } from '@drarzter/kafka-client/serde';
2164
+ import { SchemaRegistryClient } from '@drarzter/kafka-client/core';
2165
+
2166
+ const registry = new SchemaRegistryClient({ baseUrl: 'http://localhost:8081' });
2167
+
2168
+ const orderSchema = {
2169
+ type: 'record', name: 'Order',
2170
+ fields: [{ name: 'orderId', type: 'string' }, { name: 'amount', type: 'int' }],
2171
+ };
2172
+
2173
+ // Client-wide: every value goes through Avro.
2174
+ const kafka = new KafkaClient('orders-svc', 'orders-grp', ['localhost:9092'], {
2175
+ serde: avroSerde({ registry, schema: orderSchema, autoRegister: true }),
2176
+ });
2177
+
2178
+ // …or per-topic (JSON elsewhere, Avro just here):
2179
+ const OrderCreated = topic('order.created')
2180
+ .serde(avroSerde({ registry, schema: orderSchema, autoRegister: true }))
2181
+ .type<{ orderId: string; amount: number }>();
2182
+ ```
2183
+
2184
+ `protobufSerde({ registry, schema: protoSource, messageType: 'Order', autoRegister: true })`
2185
+ works the same way. `avsc` / `protobufjs` are **optional peer dependencies** —
2186
+ install only the one you use (`npm i avsc` or `npm i protobufjs`); a clear error
2187
+ tells you if it's missing.
2188
+
2189
+ **Serde options.** `registry` (required); `schema` (Avro JSON / `.proto` source —
2190
+ required to serialize); `subject?` (defaults to Confluent TopicNameStrategy
2191
+ `<topic>-value` / `<topic>-key`); `autoRegister?` (register the schema on first
2192
+ send to obtain its id — handy in dev; default `false` reads the latest registered
2193
+ schema instead). Parsed schemas and id→schema lookups are cached.
2194
+
2195
+ **Custom serde.** Implement `MessageSerde` (`serialize(value, ctx) → Buffer | string`,
2196
+ `deserialize(data, ctx) → value`) for MessagePack, CBOR, encryption, etc. `JsonSerde`
2197
+ is the default and is exported for composition.
2198
+
2199
+ **Notes & limits (v0.11):** the envelope headers (`x-event-id`, Lamport clock,
2200
+ `traceparent`, …) always travel as Kafka headers regardless of value serde. DLQ,
2201
+ retry-topic, duplicates, and delayed-relay forwarding preserve the original wire
2202
+ bytes losslessly, so binary formats survive every hop. Avro currently uses the
2203
+ writer schema as the reader schema (no reader-schema evolution yet); Protobuf
2204
+ supports the top-level message type only; `readSnapshot` remains JSON-only.
2205
+
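For readers unfamiliar with the Confluent wire format mentioned above, the framing itself is simple enough to sketch: one magic byte `0x00`, a 4-byte big-endian schema id, then the encoded payload. The helpers below (`frame` and `unframe`, both hypothetical names) illustrate only that 5-byte header; they are not the serdes shipped by the package, and Confluent's Protobuf framing carries additional message-index bytes that this sketch does not cover.

```typescript
// Hypothetical sketch of Confluent wire-format framing; not the library's serde.
const MAGIC_BYTE = 0x00;
const HEADER_LENGTH = 5; // 1 magic byte + 4-byte schema id

function frame(schemaId: number, payload: Uint8Array): Uint8Array {
  const out = new Uint8Array(HEADER_LENGTH + payload.length);
  out[0] = MAGIC_BYTE;
  // Schema id is written big-endian (network byte order).
  new DataView(out.buffer).setUint32(1, schemaId, false);
  out.set(payload, HEADER_LENGTH);
  return out;
}

function unframe(data: Uint8Array): { schemaId: number; payload: Uint8Array } {
  if (data.length < HEADER_LENGTH || data[0] !== MAGIC_BYTE) {
    throw new Error('Not Confluent wire format: expected magic byte 0x00 and a 4-byte schema id');
  }
  // Respect byteOffset so views into larger buffers are decoded correctly.
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  return { schemaId: view.getUint32(1, false), payload: data.subarray(HEADER_LENGTH) };
}

const framed = frame(42, Uint8Array.of(1, 2, 3));
console.log(Array.from(framed)); // [0, 0, 0, 0, 42, 1, 2, 3]
```

The schema id in the header is what lets a consumer look up the writer schema in the registry before decoding the payload.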
2206
+ ## Schema Registry client
2207
+
2208
+ `SchemaRegistryClient` is a minimal, dependency-free client for the Confluent Schema Registry REST API (works with Confluent Platform/Cloud, Redpanda, Karapace, and the AWS Glue SR proxy). Its scope is **subject/version management, compatibility checks, and id→schema lookups** — used both to keep your locally-defined schemas in lockstep with a central registry and as the backing lookup for the Avro/Protobuf serdes (see [Serialization: JSON, Avro, Protobuf](#serialization-json-avro-protobuf)).
2209
+
2210
+ ```typescript
2211
+ import { SchemaRegistryClient } from '@drarzter/kafka-client/core';
2212
+
2213
+ const registry = new SchemaRegistryClient({
2214
+ baseUrl: 'http://localhost:8081',
2215
+ auth: { username: apiKey, password: apiSecret }, // optional HTTP Basic (Confluent Cloud)
2216
+ cacheTtlMs: 300_000, // latest-version cache TTL — default 5 min
2217
+ });
2218
+
2219
+ // Register (idempotent — re-registering the same schema returns the existing id)
2220
+ const { id } = await registry.registerSchema('order.created-value', JSON.stringify(orderJsonSchema), 'JSON');
2221
+
2222
+ // Fetch (getLatestSchema is cached; getSchemaVersion is not)
2223
+ const latest = await registry.getLatestSchema('order.created-value');
2224
+ const v2 = await registry.getSchemaVersion('order.created-value', 2);
2225
+
2226
+ // Check compatibility against the subject's policy without registering
2227
+ const ok = await registry.checkCompatibility('order.created-value', JSON.stringify(candidate));
2228
+ ```
2229
+
2230
+ | Method | Cached | Description |
2231
+ | ------ | ------ | ----------- |
2232
+ | `getLatestSchema(subject)` | yes (`cacheTtlMs`) | Latest `{ id, version, schema }` for a subject |
2233
+ | `getSchemaVersion(subject, version)` | no | A specific registered version |
2234
+ | `registerSchema(subject, schema, schemaType?)` | invalidates cache | Register (idempotent); returns `{ id }`. `schemaType` defaults to `'JSON'` |
2235
+ | `checkCompatibility(subject, schema, schemaType?)` | no | `true` when the registry reports the schema compatible |
2236
+
2237
+ `registrySchema()` bridges a registry subject to this library's `SchemaLike` seam so you can attach it to a `TopicDescriptor` like any other schema. On each `parse` it resolves the subject's latest version (cached), optionally verifies the message's `x-schema-version` is not newer than what is registered, and delegates structural validation to a local validator:
2238
+
2239
+ ```typescript
2240
+ import { topic, registrySchema } from '@drarzter/kafka-client/core';
2241
+ import { z } from 'zod';
2242
+
2243
+ const OrderCreated = topic('order.created').schema(
2244
+ registrySchema(registry, 'order.created-value', {
2245
+ validator: z.object({ orderId: z.string() }), // local runtime shape check
2246
+ enforceVersion: true, // default — fail loudly if the message version outruns the registry
2247
+ }),
2248
+ );
2249
+ ```
2250
+
2251
+ The division of labour: the **registry governs schema evolution** (compatibility across versions); the **local validator governs runtime shape**. When `enforceVersion` is `true` (the default) a producer publishing a version newer than the latest registered version fails loudly rather than drifting silently.
2252
+
1694
2253
  ## Admin API
1695
2254
 
1696
2255
  Inspect consumer groups, topic metadata, and delete records via the built-in admin client — no separate connection needed.
@@ -1735,6 +2294,34 @@ await kafka.deleteRecords('orders.created', [
1735
2294
 
1736
2295
  Pass `offset: '-1'` to delete all records in a partition (truncate completely).
1737
2296
 
2297
+ ## DLQ CLI
2298
+
2299
+ The package ships a `kafka-client-dlq` binary for inspecting and re-publishing dead letter queues from the terminal — no code needed. It operates on `<topic>.dlq` topics and delegates replay to `KafkaClient.replayDlq`:
2300
+
2301
+ ```bash
2302
+ # List every .dlq topic with its message count (optionally filtered by base-topic prefix)
2303
+ kafka-client-dlq ls --brokers localhost:9092 [--prefix orders]
2304
+
2305
+ # Print up to N messages from <topic>.dlq — offset, x-dlq-* headers, and value
2306
+ kafka-client-dlq peek --brokers localhost:9092 --topic orders.created [--limit 5]
2307
+
2308
+ # Re-publish <topic>.dlq to its original topic (or --target), full or incremental
2309
+ kafka-client-dlq replay --brokers localhost:9092 --topic orders.created [--target orders.manual] [--dry-run] [--from-beginning | --incremental]
2310
+ ```
2311
+
2312
+ | Flag | Command | Description |
2313
+ | ---- | ------- | ----------- |
2314
+ | `--brokers <list>` | all | Comma-separated broker addresses (**required**) |
2315
+ | `--prefix <name>` | `ls` | Only show DLQ topics whose base name starts with `<name>` |
2316
+ | `--topic <name>` | `peek`, `replay` | Base topic name — the CLI reads `<name>.dlq` |
2317
+ | `--limit <n>` | `peek` | Max messages to print (default `10`) |
2318
+ | `--target <t>` | `replay` | Override destination topic (default: `x-dlq-original-topic` header) |
2319
+ | `--dry-run` | `replay` | Count what would be replayed without publishing |
2320
+ | `--from-beginning` | `replay` | Full replay of all DLQ messages every call (default) |
2321
+ | `--incremental` | `replay` | Only messages added since the previous replay |
2322
+
2323
+ `--from-beginning` and `--incremental` are mutually exclusive. Run `kafka-client-dlq --help` (or with no arguments) for the full usage text.
2324
+
1738
2325
  ## Graceful shutdown
1739
2326
 
1740
2327
  `disconnect()` now drains in-flight handlers before tearing down connections — no messages are silently cut off mid-processing.
@@ -1899,6 +2486,20 @@ If the handler hasn't resolved within the window, a `warn` is logged:
1899
2486
 
1900
2487
  The handler is **not** cancelled — the warning is diagnostic only. Combine with `retry` to automatically give up after a fixed number of slow attempts.
1901
2488
 
2489
+ ## Static group membership
2490
+
2491
+ Set `groupInstanceId` in `ConsumerOptions` to give a consumer a **static** identity (`group.instance.id`). A member that restarts within the broker's `session.timeout.ms` rejoins the group with the same partition assignment and triggers **no rebalance** — ideal for Kubernetes rolling restarts and short redeploys where a transient rebalance would otherwise stall every consumer in the group:
2492
+
2493
+ ```typescript
2494
+ await kafka.startConsumer(['orders'], handler, {
2495
+   groupInstanceId: `orders-svc-${process.env.HOSTNAME}`,
2496
+ });
2497
+ ```
2498
+
2499
+ The id must be **unique per member** within the consumer group — derive it from a stable per-pod value such as the StatefulSet ordinal or hostname. Two live members sharing the same `groupInstanceId` are fenced by the broker.
2500
+
2501
+ `groupInstanceId` is applied only to the consumer you set it on. It is **not** propagated to retry-chain companion consumers — those run in their own groups (`<groupId>-retry.N`) and rebalance independently. It can also be supplied via the `KAFKA_CONSUMER_GROUP_INSTANCE_ID` environment variable (see [Environment configuration](#environment-configuration)).
2502
+
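One way to guard against the two failure modes described above (a missing hostname and a duplicated id) is a small derivation helper. This is a sketch under stated assumptions: `deriveGroupInstanceId` and `findDuplicateInstanceIds` are hypothetical names, not part of the package, and the `<service>-<hostname>` format simply mirrors the example above.

```typescript
// Hypothetical helper: builds a static member id from a service name and a
// stable per-pod value (for example the pod hostname of a StatefulSet).
function deriveGroupInstanceId(service: string, hostname: string | undefined): string {
  if (!hostname) {
    // Without a stable value the id would collapse to "<service>-undefined"
    // on every pod, and members sharing one id are fenced by the broker.
    throw new Error('hostname is not set; cannot derive a stable groupInstanceId');
  }
  return `${service}-${hostname}`;
}

// Hypothetical check for deployment tooling: reports ids used more than once.
function findDuplicateInstanceIds(ids: string[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const id of ids) {
    if (seen.has(id)) duplicates.add(id);
    seen.add(id);
  }
  return Array.from(duplicates);
}
```

In a service this might be called as `deriveGroupInstanceId('orders-svc', process.env.HOSTNAME)` and the result passed as `groupInstanceId`; failing fast at startup is a design choice here, not documented library behaviour.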
1902
2503
  ## Schema validation
1903
2504
 
1904
2505
  Add runtime message validation using any library with a `.parse()` method — Zod, Valibot, ArkType, or a custom validator. No extra dependency required.
@@ -2019,6 +2620,70 @@ interface SchemaParseContext {
2019
2620
 
2020
2621
  Existing validators (Zod, Valibot, ArkType, custom) that only use the first argument continue to work unchanged — the second argument is silently ignored.
2021
2622
 
2623
+ ### Versioned schemas
2624
+
2625
+ `versionedSchema()` composes per-version validators into a single `SchemaLike` that dispatches on the message's `x-schema-version` header (via `SchemaParseContext.version`). Pass a map of version number → validator, plus an optional `migrate` hook that upgrades older shapes to the latest:
2626
+
2627
+ ```typescript
2628
+ import { topic, versionedSchema } from '@drarzter/kafka-client';
2629
+ import { z } from 'zod';
2630
+
2631
+ const OrderSchema = versionedSchema<{ orderId: string; amountMinor: number }>(
2632
+   {
2633
+     1: z.object({ orderId: z.string(), amount: z.number() }), // legacy: major units
2634
+     2: z.object({ orderId: z.string(), amountMinor: z.number().int() }), // current: minor units
2635
+   },
2636
+   {
2637
+     // migrate(data, fromVersion, latestVersion) → data in its latest shape
2638
+     migrate: (data, from) =>
2639
+       from === 1
2640
+         ? { orderId: data.orderId, amountMinor: Math.round(data.amount * 100) }
2641
+         : data,
2642
+   },
2643
+ );
2644
+
2645
+ const OrderCreated = topic('order.created').schema(OrderSchema);
2646
+ ```
2647
+
2648
+ Dispatch rules:
2649
+
2650
+ - **Consume path** — the version comes from the `x-schema-version` header (defaults to `1` when absent).
2651
+ - **Send path** — the version comes from `SendOptions.schemaVersion` (defaults to `1`).
2652
+ - **No parse context** (a direct `schema.parse(data)` call) — the **latest** registered version is assumed.
2653
+
2654
+ After a non-latest version is parsed, `migrate` (if provided) is called so your handler always receives the latest shape. Without a `migrate` hook, older versions are returned as parsed and callers must handle shape differences themselves.
2655
+
2656
+ A message carrying a version with **no registered schema throws** instead of being validated against the wrong shape, so a misconfigured producer fails loudly. The error lists every registered version:
2657
+
2658
+ ```text
2659
+ versionedSchema: no schema registered for version 3 (topic "order.created") — registered versions: 1, 2
2660
+ ```
2661
+
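The dispatch rules above can be modelled without the library. The following is a minimal sketch, not the package's implementation: `sketchVersionedSchema`, `VersionParser`, and `SketchParseContext` are hypothetical names, the hand-written validators stand in for Zod schemas, and the error text is simplified.

```typescript
// Library-independent model of version dispatch. All names are hypothetical.
interface VersionParser<T> { parse(data: unknown): T }
interface SketchParseContext { version?: number }

function sketchVersionedSchema<Latest>(
  versions: Record<number, VersionParser<unknown>>,
  options: { migrate?: (data: any, from: number, latest: number) => Latest } = {},
) {
  const registered = Object.keys(versions).map(Number).sort((a, b) => a - b);
  const latest = registered[registered.length - 1];
  return {
    parse(data: unknown, ctx?: SketchParseContext): Latest {
      // No context: assume the latest version. Context without a version: default to 1.
      const version = ctx === undefined ? latest : (ctx.version ?? 1);
      const schema = versions[version];
      if (!schema) {
        throw new Error(
          `no schema registered for version ${version}; registered versions: ${registered.join(', ')}`,
        );
      }
      const parsed = schema.parse(data);
      // Older shapes are upgraded only when a migrate hook exists.
      if (version !== latest && options.migrate) return options.migrate(parsed, version, latest);
      return parsed as Latest;
    },
  };
}

// Hand-written validators standing in for the Zod schemas in the example above.
const orderV1: VersionParser<{ orderId: string; amount: number }> = {
  parse(data) {
    const d = data as { orderId?: unknown; amount?: unknown };
    if (typeof d.orderId !== 'string' || typeof d.amount !== 'number') throw new Error('invalid v1 order');
    return { orderId: d.orderId, amount: d.amount };
  },
};
const orderV2: VersionParser<{ orderId: string; amountMinor: number }> = {
  parse(data) {
    const d = data as { orderId?: unknown; amountMinor?: unknown };
    if (typeof d.orderId !== 'string' || typeof d.amountMinor !== 'number') throw new Error('invalid v2 order');
    return { orderId: d.orderId, amountMinor: d.amountMinor };
  },
};

const sketchOrderSchema = sketchVersionedSchema<{ orderId: string; amountMinor: number }>(
  { 1: orderV1, 2: orderV2 },
  {
    migrate: (data, from) =>
      from === 1 ? { orderId: data.orderId, amountMinor: Math.round(data.amount * 100) } : data,
  },
);
```

With this model, `sketchOrderSchema.parse({ orderId: 'o1', amount: 12.5 }, { version: 1 })` validates against the version 1 shape and then migrates to minor units, while a call with `{ version: 3 }` throws and names versions 1 and 2.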
2662
+ ## Constructor options validation
2663
+
2664
+ The `KafkaClient` constructor validates its arguments up front. If anything is invalid it throws a **single aggregated error** listing every problem at once, so a misconfigured client fails at construction with a clear message instead of surfacing a confusing driver error on first use:
2665
+
2666
+ ```typescript
2667
+ new KafkaClient('', '', [], { numPartitions: 0 });
2668
+ // throws:
2669
+ // KafkaClient: invalid configuration:
2670
+ // - clientId must be a non-empty string
2671
+ // - groupId must be a non-empty string
2672
+ // - brokers must be a non-empty array of broker addresses
2673
+ // - numPartitions must be a positive integer (got 0)
2674
+ ```
2675
+
2676
+ Checks performed:
2677
+
2678
+ - `clientId` and `groupId` must be non-empty strings.
2679
+ - `brokers` must be a non-empty array with no empty entries — **unless** a custom `transport` is supplied (e.g. `FakeTransport` in tests), in which case an empty `brokers` array is allowed since no broker is dialled.
2680
+ - `numPartitions`, when set, must be a positive integer.
2681
+ - `transactionalId`, when set, must be non-empty.
2682
+ - `clockRecovery.topics` must be an array; `clockRecovery.timeoutMs`, when set, must be `> 0`.
2683
+ - `lagThrottle.maxLag` must be `>= 0`; `lagThrottle.pollIntervalMs` must be `> 0`; `lagThrottle.maxWaitMs` must be `>= 0` (each validated only when set).
2684
+
2685
+ This applies to both `new KafkaClient(...)` and `KafkaModule.register()` / `registerAsync()`, which construct the client under the hood.
2686
+
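The "collect everything, then throw once" pattern behind the aggregated error can be sketched as follows. This is an illustrative model covering only a subset of the checks listed above: `collectConfigProblems`, `assertValidConfig`, and the `hasCustomTransport` flag are hypothetical, and the real constructor's internals may differ.

```typescript
// Hypothetical model of aggregated validation; not the package's source.
interface SketchClientOptions {
  numPartitions?: number;
  transactionalId?: string;
  hasCustomTransport?: boolean; // stands in for "a custom transport was supplied"
}

function collectConfigProblems(
  clientId: string,
  groupId: string,
  brokers: string[],
  options: SketchClientOptions = {},
): string[] {
  const problems: string[] = [];
  if (clientId === '') problems.push('clientId must be a non-empty string');
  if (groupId === '') problems.push('groupId must be a non-empty string');

  // An empty broker list is tolerated only when no broker will be dialled.
  const emptyAllowed = options.hasCustomTransport === true;
  const hasEmptyEntry = brokers.some((broker) => broker === '');
  if ((brokers.length === 0 && !emptyAllowed) || hasEmptyEntry) {
    problems.push('brokers must be a non-empty array of broker addresses');
  }

  const n = options.numPartitions;
  if (n !== undefined && (!Number.isInteger(n) || n <= 0)) {
    problems.push(`numPartitions must be a positive integer (got ${n})`);
  }
  if (options.transactionalId !== undefined && options.transactionalId === '') {
    problems.push('transactionalId must be non-empty when set');
  }
  return problems;
}

// Throws one error that lists every problem, instead of failing on the first.
function assertValidConfig(
  clientId: string,
  groupId: string,
  brokers: string[],
  options: SketchClientOptions = {},
): void {
  const problems = collectConfigProblems(clientId, groupId, brokers, options);
  if (problems.length > 0) {
    throw new Error(`KafkaClient: invalid configuration:\n${problems.map((p) => `  - ${p}`).join('\n')}`);
  }
}
```

Collecting problems into an array before throwing is what lets a caller fix every misconfiguration in one pass rather than rediscovering them one restart at a time.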
2022
2687
  ## Health check
2023
2688
 
2024
2689
  Monitor Kafka connectivity with the built-in health indicator:
@@ -2129,6 +2794,26 @@ The integration suite spins up a single-node KRaft Kafka container and tests sen
2129
2794
 
2130
2795
  Both suites run in CI on every push to `main` and on pull requests.
2131
2796
 
2797
+ **Chaos suite** — fault-injection tests (broker restarts, forced rebalances) that verify redelivery and offset-commit guarantees under failure:
2798
+
2799
+ ```bash
2800
+ npm run test:chaos
2801
+ ```
2802
+
2803
+ **Benchmark** — measure the wrapper's overhead over the raw driver:
2804
+
2805
+ ```bash
2806
+ npm run bench
2807
+ ```
2808
+
2809
+ The throughput benchmark reports roughly **2% overhead** versus using `@confluentinc/kafka-javascript` directly: the typed envelope, Lamport clock, and instrumentation hooks cost very little on the hot path.
2810
+
2811
+ **Clean up stray containers** — if a Testcontainers run is interrupted, remove leftover containers:
2812
+
2813
+ ```bash
2814
+ npm run containers:clean
2815
+ ```
2816
+
2132
2817
  ## File naming conventions
2133
2818
 
2134
2819
  Hyphens within a multi-word name; dot separates the name from its role suffix.