@drarzter/kafka-client 0.7.4 → 0.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -45,6 +45,14 @@ Type-safe Kafka client for Node.js. Framework-agnostic core with a first-class N
  - [Seek to timestamp](#seek-to-timestamp)
  - [Message TTL](#message-ttl)
  - [DLQ replay](#dlq-replay)
+ - [Read snapshot](#read-snapshot)
+ - [Offset checkpointing](#offset-checkpointing)
+   - [checkpointOffsets](#checkpointoffsets)
+   - [restoreFromCheckpoint](#restorefromcheckpoint)
+ - [Windowed batch consumer](#windowed-batch-consumer)
+ - [Header-based routing](#header-based-routing)
+ - [Lag-based producer throttling](#lag-based-producer-throttling)
+ - [Transactional consumer](#transactional-consumer)
  - [Admin API](#admin-api)
  - [Graceful shutdown](#graceful-shutdown)
  - [Consumer handles](#consumer-handles)
@@ -1430,6 +1438,253 @@ const filtered = await kafka.replayDlq('orders', {
 
  `replayDlq` creates a temporary consumer group that reads the DLQ topic up to the high-watermark at the time of the call — messages published after replay starts are not included. DLQ metadata headers (`x-dlq-original-topic`, `x-dlq-error-message`, `x-dlq-error-stack`, `x-dlq-failed-at`, `x-dlq-attempt-count`) are stripped from the replayed messages; all other headers (e.g. `x-correlation-id`) are preserved.
 
+ ## Read snapshot
+
+ Read any topic from the beginning to its current high-watermark and return a `Map<key, EventEnvelope<T>>` with the **latest value per key**. Useful for bootstrapping in-memory state at service startup without an external cache:
+
+ ```typescript
+ // Build a key → latest-value index for a compacted topic
+ const orders = await kafka.readSnapshot('orders.state');
+ orders.get('order-123'); // EventEnvelope with the latest payload for that key
+ ```
+
+ Tombstone records (null-value messages) remove the key from the map, consistent with log-compaction semantics:
+
+ ```typescript
+ const snapshot = await kafka.readSnapshot('orders.state', {
+   onTombstone: (key) => console.log(`Key deleted: ${key}`),
+ });
+ ```
+
+ Optional schema validation skips invalid messages with a warning instead of throwing:
+
+ ```typescript
+ import { z } from 'zod';
+
+ const OrderSchema = z.object({ orderId: z.string(), amount: z.number() });
+
+ const snapshot = await kafka.readSnapshot('orders.state', {
+   schema: OrderSchema,
+ });
+ ```
+
+ `readSnapshot` uses a short-lived temporary consumer that is **not** registered in the client's consumer map — it disconnects as soon as all partitions reach their high-watermark. The call resolves with the complete snapshot; it does not stream.
+
+ | Option | Description |
+ | ------ | ----------- |
+ | `schema` | Zod / Valibot / ArkType (any `.parse()` shape) — invalid messages are skipped with a warning |
+ | `onTombstone` | Called for each tombstone key before it is removed from the map |
+
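The last-value-wins semantics described above can be modeled as a small pure fold. This is an illustrative sketch, not the library's internals; the `RawRecord` shape is a hypothetical simplification of a consumed Kafka record:

```typescript
// Simplified model of readSnapshot's key folding: later records overwrite
// earlier ones, and tombstones (value === null) delete the key.
type RawRecord = { key: string; value: string | null };

function foldSnapshot(
  records: RawRecord[],
  onTombstone?: (key: string) => void,
): Map<string, string> {
  const state = new Map<string, string>();
  for (const rec of records) {
    if (rec.value === null) {
      onTombstone?.(rec.key); // notify before the key is removed
      state.delete(rec.key);
    } else {
      state.set(rec.key, rec.value); // latest value per key wins
    }
  }
  return state;
}
```

A key that ends on a tombstone is absent from the result, matching what a log-compacted topic would eventually retain.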
+ ## Offset checkpointing
+
+ Save and restore consumer group offsets via a dedicated Kafka topic. Useful for point-in-time recovery, blue/green deployments, and disaster recovery without resetting to `earliest`/`latest`.
+
+ ### checkpointOffsets
+
+ Snapshot the current committed offsets of a consumer group into a Kafka topic:
+
+ ```typescript
+ // Checkpoint the default group
+ const result = await kafka.checkpointOffsets(undefined, 'checkpoints');
+ // {
+ //   groupId: 'orders-group',
+ //   topics: ['orders', 'payments'],
+ //   partitionCount: 4,
+ //   savedAt: 1710000000000,
+ // }
+
+ // Checkpoint a specific group
+ await kafka.checkpointOffsets('payments-group', 'checkpoints');
+ ```
+
+ Each call appends a new record to the checkpoint topic keyed by `groupId`, with `x-checkpoint-timestamp` and `x-checkpoint-group-id` headers. The checkpoint topic acts as an append-only audit log — use a **non-compacted** topic to retain history.
+
+ Requires `connectProducer()` to have been called before checkpointing.
+
+ ### restoreFromCheckpoint
+
+ Restore a consumer group's committed offsets from the nearest checkpoint:
+
+ ```typescript
+ // Restore to the latest checkpoint
+ const result = await kafka.restoreFromCheckpoint(undefined, 'checkpoints');
+ // {
+ //   groupId: 'orders-group',
+ //   offsets: [{ topic: 'orders', partition: 0, offset: '1500' }, ...],
+ //   restoredAt: 1710000000000,
+ //   checkpointAge: 3600000, // ms since the checkpoint was saved
+ // }
+
+ // Restore to the nearest checkpoint before a specific timestamp
+ const ts = new Date('2024-06-01T12:00:00Z').getTime();
+ await kafka.restoreFromCheckpoint(undefined, 'checkpoints', { timestamp: ts });
+ ```
+
+ Checkpoint selection rules:
+
+ 1. If `timestamp` is omitted — the **latest** checkpoint is selected.
+ 2. If `timestamp` is given — the newest checkpoint whose `savedAt ≤ timestamp` is selected.
+ 3. If all checkpoints are newer than `timestamp` — falls back to the **oldest** checkpoint with a warning.
+ 4. Throws if no checkpoint exists for the group.
+
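The four rules above can be expressed as one small pure function. This is an illustrative model rather than the library's code; the `Checkpoint` shape is reduced to just `savedAt`:

```typescript
// Illustrative model of the checkpoint selection rules.
type Checkpoint = { savedAt: number };

function selectCheckpoint(checkpoints: Checkpoint[], timestamp?: number): Checkpoint {
  if (checkpoints.length === 0) {
    throw new Error('No checkpoint found for group'); // rule 4
  }
  const sorted = [...checkpoints].sort((a, b) => a.savedAt - b.savedAt);
  if (timestamp === undefined) {
    return sorted[sorted.length - 1]; // rule 1: latest
  }
  const eligible = sorted.filter((c) => c.savedAt <= timestamp);
  if (eligible.length === 0) {
    console.warn('All checkpoints are newer than the target; using the oldest');
    return sorted[0]; // rule 3: fall back to the oldest
  }
  return eligible[eligible.length - 1]; // rule 2: newest with savedAt ≤ timestamp
}
```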
+ **Important:** the consumer group must be stopped before calling `restoreFromCheckpoint`. An error is thrown if any consumer in the group is currently running.
+
+ `restoreFromCheckpoint` uses a short-lived temporary consumer to read all checkpoint records up to the current high-watermark, then calls `admin.setOffsets` for every topic-partition in the selected checkpoint.
+
+ | Option | Description |
+ | ------ | ----------- |
+ | `timestamp` | Target Unix ms. Omit to restore the latest checkpoint |
+
+ ## Windowed batch consumer
+
+ Accumulate messages into a buffer and flush a handler when either a **size** or **time** trigger fires — whichever comes first. Gives explicit control over both batch size and processing latency, unlike `startBatchConsumer` which delivers broker-sized batches of unpredictable size:
+
+ ```typescript
+ const handle = await kafka.startWindowConsumer(
+   'orders',
+   async (envelopes, meta) => {
+     console.log(`Flushing ${envelopes.length} orders (trigger: ${meta.trigger})`);
+     await db.bulkInsert(envelopes.map((e) => e.payload));
+   },
+   {
+     maxMessages: 100, // flush when 100 messages accumulate
+     maxMs: 5_000, // or after 5 s, whichever fires first
+   },
+ );
+ ```
+
+ `WindowMeta` is passed to the handler on every flush:
+
+ | Field | Description |
+ | ----- | ----------- |
+ | `trigger` | `"size"` — buffer reached `maxMessages`; `"time"` — `maxMs` elapsed |
+ | `windowStart` | Unix ms of the first message in the flushed window |
+ | `windowEnd` | Unix ms when the flush was initiated |
+
+ On `handle.stop()` any buffered messages are flushed before the consumer disconnects — no messages are lost on clean shutdown.
+
+ `retryTopics: true` is rejected at startup with a clear error — the retry topic chain is incompatible with windowed accumulation.
+
+ | Option | Default | Description |
+ | ------ | ------- | ----------- |
+ | `maxMessages` | required | Flush when the buffer reaches this many messages |
+ | `maxMs` | required | Flush after this many ms since the first buffered message |
+ | All `ConsumerOptions` fields | — | Standard consumer options apply (`retry`, `dlq`, `deduplication`, etc.) |
+
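The size/time trigger behavior can be sketched as a small buffer class. This is a simplified model with an injected clock, not the library's implementation:

```typescript
// Simplified model of the windowed buffer: flushes on whichever trigger
// fires first. The clock is injected so the logic is deterministic.
type Flush<T> = (batch: T[], trigger: 'size' | 'time') => void;

class WindowBuffer<T> {
  private buf: T[] = [];
  private windowStart: number | null = null;

  constructor(
    private maxMessages: number,
    private maxMs: number,
    private onFlush: Flush<T>,
    private now: () => number = Date.now,
  ) {}

  push(msg: T): void {
    if (this.buf.length === 0) this.windowStart = this.now();
    this.buf.push(msg);
    if (this.buf.length >= this.maxMessages) this.flush('size');
  }

  // Called periodically (e.g. by a timer) to evaluate the time trigger.
  tick(): void {
    if (this.windowStart !== null && this.now() - this.windowStart >= this.maxMs) {
      this.flush('time');
    }
  }

  // On stop(), the remainder is flushed so nothing is lost on shutdown.
  flush(trigger: 'size' | 'time'): void {
    if (this.buf.length === 0) return;
    const batch = this.buf;
    this.buf = [];
    this.windowStart = null;
    this.onFlush(batch, trigger);
  }
}
```

The time window is anchored at the first buffered message, which matches the `maxMs` description in the options table above.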
+ ## Header-based routing
+
+ Dispatch messages to different handlers based on the value of a Kafka header — no `if/switch` boilerplate in a catch-all handler. Useful when one topic carries multiple event types distinguished by a header like `x-event-type`:
+
+ ```typescript
+ await kafka.startRoutedConsumer(['events'], {
+   header: 'x-event-type',
+   routes: {
+     'order.created': async (e) => handleOrderCreated(e.payload),
+     'order.cancelled': async (e) => handleOrderCancelled(e.payload),
+     'order.shipped': async (e) => handleOrderShipped(e.payload),
+   },
+   fallback: async (e) => logger.warn('Unknown event type', e.headers),
+ });
+ ```
+
+ Messages are dispatched to the handler whose key matches `envelope.headers[header]`. If the header is absent or its value has no matching route:
+
+ - The `fallback` handler is called if provided.
+ - The message is silently skipped if `fallback` is omitted.
+
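The dispatch rule can be modeled as a plain function. This is an illustrative sketch with simplified envelope and handler types, not the library's code:

```typescript
// Simplified model of header-based dispatch: exact match on one header,
// fallback when the header is missing or unrouted, silent skip otherwise.
type Handler = (envelope: { headers: Record<string, string | undefined> }) => void;

function dispatch(
  envelope: { headers: Record<string, string | undefined> },
  header: string,
  routes: Record<string, Handler>,
  fallback?: Handler,
): 'routed' | 'fallback' | 'skipped' {
  const value = envelope.headers[header];
  const handler = value !== undefined ? routes[value] : undefined;
  if (handler) {
    handler(envelope);
    return 'routed';
  }
  if (fallback) {
    fallback(envelope);
    return 'fallback';
  }
  return 'skipped'; // no fallback configured: the message is skipped
}
```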
+ All standard `ConsumerOptions` apply uniformly across every route — retry, DLQ, deduplication, circuit breaker, interceptors, etc.:
+
+ ```typescript
+ await kafka.startRoutedConsumer(
+   ['events'],
+   {
+     header: 'x-event-type',
+     routes: {
+       'payment.processed': async (e) => processPayment(e.payload),
+       'payment.failed': async (e) => handleFailure(e.payload),
+     },
+   },
+   {
+     retry: { maxRetries: 3, backoffMs: 500 },
+     dlq: true,
+     deduplication: { strategy: 'drop' },
+   },
+ );
+ ```
+
+ The returned `ConsumerHandle` works the same as `startConsumer` — `handle.stop()` stops the consumer cleanly.
+
+ ## Lag-based producer throttling
+
+ Delay `sendMessage`, `sendBatch`, and `sendTombstone` automatically when a consumer group falls behind. Provides backpressure without an external store — the lag is measured via the built-in admin API:
+
+ ```typescript
+ const kafka = new KafkaClient('my-service', 'orders-group', brokers, {
+   lagThrottle: {
+     maxLag: 10_000, // delay sends when lag exceeds 10 000 messages
+     pollIntervalMs: 5_000, // check lag every 5 s (default)
+     maxWaitMs: 30_000, // give up waiting after 30 s and send anyway (default)
+   },
+ });
+
+ await kafka.connectProducer(); // starts the background polling loop
+ ```
+
+ While the observed lag exceeds `maxLag`, every send waits in a `100 ms` spin-loop until the lag drops or `maxWaitMs` is reached. When `maxWaitMs` is exceeded a warning is logged and the send proceeds — this is best-effort throttling, not hard backpressure.
+
+ | Option | Default | Description |
+ | ------ | ------- | ----------- |
+ | `maxLag` | required | Total lag threshold (sum across all partitions) |
+ | `groupId` | client default group | Consumer group whose lag is monitored |
+ | `pollIntervalMs` | `5000` | How often to call `getConsumerLag()` in the background |
+ | `maxWaitMs` | `30000` | Maximum time (ms) a single send waits before proceeding anyway |
+
+ The polling timer is started by `connectProducer()` and cleared by `disconnect()` or `disconnectProducer()`. Poll errors are silently ignored — a failing admin call never blocks sends.
+
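The best-effort wait performed before each send can be sketched as follows. This is a simplified model with an injected lag reader and sleep function, not the library's internals:

```typescript
// Simplified model of the pre-send throttle: re-check the cached lag
// every 100 ms until it drops below maxLag or maxWaitMs elapses.
async function waitIfThrottled(
  getLag: () => number, // cached value maintained by the background poller
  maxLag: number,
  maxWaitMs: number,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<'sent' | 'timed-out'> {
  const deadline = Date.now() + maxWaitMs;
  while (getLag() > maxLag) {
    if (Date.now() >= deadline) {
      console.warn('lagThrottle: maxWaitMs exceeded, sending anyway');
      return 'timed-out'; // best-effort: the send still proceeds
    }
    await sleep(100);
  }
  return 'sent';
}
```

Because the loop reads a cached lag value rather than calling the admin API per send, a slow or failing admin call never sits on the send path.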
+ ## Transactional consumer
+
+ Consume messages with **exactly-once semantics** for read-process-write pipelines. Each message is processed inside a Kafka transaction: outgoing sends and the source offset commit succeed or fail atomically — no partial writes, no duplicates on restart:
+
+ ```typescript
+ await kafka.startTransactionalConsumer(
+   ['orders'],
+   async (envelope, tx) => {
+     // Both sends and the offset commit are part of one atomic transaction
+     await tx.send('invoices', { orderId: envelope.payload.orderId, amount: envelope.payload.amount });
+     await tx.send('notifications', { userId: envelope.payload.userId, message: 'Order confirmed' });
+     // tx commits automatically when this function returns
+   },
+ );
+ ```
+
+ The handler receives a `TransactionalHandlerContext` with two methods:
+
+ | Method | Description |
+ | ------ | ----------- |
+ | `tx.send(topic, message, options?)` | Stage a single message inside the transaction |
+ | `tx.sendBatch(topic, messages, options?)` | Stage multiple messages inside the transaction |
+
+ **On handler success** — staged sends + source offset commit are committed atomically via `tx.sendOffsets()` + `tx.commit()`. Downstream consumers only see the messages after the commit.
+
+ **On handler failure** — `tx.abort()` is called automatically. No staged sends become visible. The source message offset is not committed, so Kafka redelivers the message on the next poll.
+
+ ```typescript
+ await kafka.startTransactionalConsumer(
+   ['payments'],
+   async (envelope, tx) => {
+     const result = await processPayment(envelope.payload);
+     // Only route to the audit topic if payment succeeded
+     await tx.send('payments.audit', { paymentId: result.id, status: 'ok' });
+   },
+   {
+     groupId: 'payments-eos',
+     deduplication: { strategy: 'drop' }, // standard ConsumerOptions apply
+   },
+ );
+ ```
+
+ `retryTopics: true` is rejected at startup — EOS redelivery on failure is already guaranteed by the transaction. `autoCommit` is always `false` (managed internally).
+
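The commit/abort lifecycle around the handler can be sketched as a control-flow model. This is a simplified illustration, with a hypothetical `Tx` interface standing in for the real transaction object:

```typescript
// Simplified control flow: run the handler inside a transaction, commit
// sends + offsets on success, abort on any handler error.
interface Tx {
  send(topic: string, message: unknown): Promise<void>;
  sendOffsets(): Promise<void>; // stage the source offset in the transaction
  commit(): Promise<void>;
  abort(): Promise<void>;
}

async function processOne(
  tx: Tx,
  handler: (tx: Tx) => Promise<void>,
): Promise<'committed' | 'aborted'> {
  try {
    await handler(tx);      // handler stages sends via tx.send()
    await tx.sendOffsets(); // source offset joins the transaction
    await tx.commit();      // everything becomes visible atomically
    return 'committed';
  } catch {
    await tx.abort();       // nothing staged becomes visible;
    return 'aborted';       // the uncommitted offset causes redelivery
  }
}
```

Because the offset commit rides inside the same transaction as the sends, a crash between "processed" and "committed" simply replays the message, which is what makes the pipeline exactly-once rather than at-least-once.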
  ## Admin API
 
  Inspect consumer groups, topic metadata, and delete records via the built-in admin client — no separate connection needed.
@@ -1874,6 +2129,7 @@ src/
  ├── client/ # Core library — zero framework dependencies
  │ ├── types.ts # All public interfaces: KafkaClientOptions, ConsumerOptions,
  │ │ # SendOptions, EventEnvelope, ConsumerHandle, BatchMeta,
+ │ │ # RoutingOptions, TransactionalHandlerContext,
  │ │ # KafkaInstrumentation, ConsumerInterceptor, SchemaLike, …
  │ ├── errors.ts # KafkaProcessingError, KafkaRetryExhaustedError, KafkaValidationError
  │ │
@@ -1885,7 +2141,10 @@ src/
  │ ├── kafka.client/
  │ │ ├── index.ts # KafkaClient class — public API, producer/consumer lifecycle,
  │ │ │ # Lamport clock, ALS correlation ID, graceful shutdown,
- │ │ │ # Lamport clock recovery (clockRecovery option)
+ │ │ │ # clockRecovery, readSnapshot(), checkpointOffsets(),
+ │ │ │ # restoreFromCheckpoint(), startWindowConsumer(),
+ │ │ │ # startRoutedConsumer(), startTransactionalConsumer(),
+ │ │ │ # lagThrottle poller, waitIfThrottled()
  │ │ │
  │ │ ├── admin/
  │ │ │ └── ops.ts # AdminOps: listConsumerGroups(), describeTopics(),
@@ -1938,6 +2197,11 @@ src/
  │ │ ├── deduplication.spec.ts # Lamport clock dedup, strategies (drop/dlq/topic)
  │ │ ├── interceptors.spec.ts # ConsumerInterceptor before/after/onError hooks
  │ │ ├── dlq-replay.spec.ts # replayDlq(), dryRun, filter, targetTopic
+ │ │ ├── read-snapshot.spec.ts # readSnapshot(), tombstones, multi-partition, schema, HWM
+ │ │ ├── checkpoint.spec.ts # checkpointOffsets(), restoreFromCheckpoint(), timestamp selection
+ │ │ ├── window-consumer.spec.ts # startWindowConsumer(), size/time triggers, shutdown flush
+ │ │ ├── router.spec.ts # startRoutedConsumer(), route dispatch, fallback, skip
+ │ │ ├── transactional-consumer.spec.ts # startTransactionalConsumer(), EOS commit/abort, tx.send
  │ │ ├── ttl.spec.ts # messageTtlMs, onTtlExpired, TTL→DLQ routing
  │ │ ├── message-lost.spec.ts # onMessageLost — handler error, validation, DLQ failure
  │ │ ├── handler-timeout.spec.ts # handlerTimeoutMs warning
@@ -1946,7 +2210,8 @@ src/
  │ │ ├── producer.spec.ts # sendMessage(), sendBatch(), sendTombstone(), compression
  │ │ ├── transaction.spec.ts # transaction(), tx.send(), tx.sendBatch(), rollback
  │ │ ├── schema.spec.ts # Schema validation on send/consume, strictSchemas
- │ │ └── topic.spec.ts # topic() descriptor, TopicsFrom, schema registry
+ │ │ ├── topic.spec.ts # topic() descriptor, TopicsFrom, schema registry
+ │ │ └── lag-throttle.spec.ts # lagThrottle option, threshold, maxWaitMs, poll errors
  │ ├── admin/
  │ │ ├── admin.spec.ts # listConsumerGroups(), describeTopics(), deleteRecords(),
  │ │ │ # resetOffsets(), seekToOffset(), seekToTimestamp()