@eventferry/kafka 3.3.0 → 3.3.1

Files changed (2)
  1. package/CHANGELOG.md +426 -0
  2. package/package.json +5 -4
package/CHANGELOG.md ADDED
@@ -0,0 +1,426 @@
1
+ # @eventferry/kafka
2
+
3
+ ## 3.3.1
4
+
5
+ ### Patch Changes
6
+
7
+ - 3c33f71: **chore: ship `CHANGELOG.md` inside the npm tarball**
8
+
9
+ Previously, each package's `files` allowlist contained only `"dist"` (and `"sql"` for `@eventferry/postgres`), so the auto-generated `CHANGELOG.md` was never published. Users browsing the package on npmjs.com or unpacking the tarball couldn't see release notes — they had to navigate to the GitHub repo.
10
+
11
+ This release adds `"CHANGELOG.md"` to the `files` array of every publishable package. Starting with this version, the per-version release notes are accessible:
12
+
13
+ - Directly in `node_modules/@eventferry/<pkg>/CHANGELOG.md` after `npm install`
14
+ - In the file listing on npmjs.com (under the "Code" / "Files" tab, depending on the npm UI)
15
+ - Inside the tarball downloaded from `https://registry.npmjs.org/...`
16
+
17
+ No code or API surface changes.
18
+
19
+ - Updated dependencies [3c33f71]
20
+ - @eventferry/core@3.3.1
21
+
22
+ ## 3.3.0
23
+
24
+ ### Minor Changes
25
+
26
+ - cdc20cf: **feat: DLQ enrichment + backpressure runtime + quota multiplier — Tier 1 of the reliability gap closed**
27
+
28
+ ### DLQ enrichment
29
+
30
+ Records routed to the dead-letter queue now carry the full context an operator needs to triage:
31
+
32
+ | Header | Set by | Note |
33
+ | --------------------------- | --------- | ------------------------------------------------------------------------------------------------ |
34
+ | `original-topic` | relay | already existed |
35
+ | `dlq-reason` | publisher | already existed (`error.message`) |
36
+ | `dlq-failed-at` | publisher | already existed (ISO timestamp) |
37
+ | `dlq-error-class` | publisher | **new** — `error.name` / constructor name |
38
+ | `dlq-attempts` | relay | **new** — string-encoded `attempts` count |
39
+ | `dlq-original-aggregate-id` | relay | **new** — for joining with business state |
40
+ | `dlq-original-message-id` | relay | **new** — for dedup / idempotency lookups |
41
+ | `dlq-error-stack` | relay | **new** — opt-in via `DlqConfig.includeStackTraces`, truncated to `maxStackBytes` (default 4 KB) |
42
+
43
+ ```ts
44
+ new Relay({
45
+ store,
46
+ publisher,
47
+ dlq: { topic: "orders.dlq", includeStackTraces: true, maxStackBytes: 4096 },
48
+ });
49
+ ```
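As a rough illustration of how the header set in the table above could be assembled, here is a minimal sketch. The `buildDlqHeaders` and `truncateUtf8` functions and the `FailedRecord` shape are hypothetical names invented for this example; in the package the relay and the publisher each set their own headers, and the real truncation logic may differ.

```typescript
// Hypothetical shapes for illustration only; not the package's real types.
interface FailedRecord {
  topic: string;
  aggregateId: string;
  messageId: string;
  attempts: number;
}

interface DlqStackOptions {
  includeStackTraces?: boolean;
  maxStackBytes?: number; // this sketch assumes the documented 4 KB default
}

// Keep whole code points until the UTF-8 byte budget would be exceeded.
function truncateUtf8(text: string, maxBytes: number): string {
  let bytes = 0;
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0) ?? 0;
    const size = cp <= 0x7f ? 1 : cp <= 0x7ff ? 2 : cp <= 0xffff ? 3 : 4;
    if (bytes + size > maxBytes) break;
    bytes += size;
    out += ch;
  }
  return out;
}

function buildDlqHeaders(
  record: FailedRecord,
  error: Error,
  opts: DlqStackOptions = {},
  now: Date = new Date(),
): Record<string, string> {
  const headers: Record<string, string> = {
    "original-topic": record.topic,
    "dlq-reason": error.message,
    "dlq-failed-at": now.toISOString(),
    "dlq-error-class": error.name,
    "dlq-attempts": String(record.attempts), // string-encoded count
    "dlq-original-aggregate-id": record.aggregateId,
    "dlq-original-message-id": record.messageId,
  };
  // The stack header is opt-in and bounded in size.
  if (opts.includeStackTraces && error.stack) {
    headers["dlq-error-stack"] = truncateUtf8(error.stack, opts.maxStackBytes ?? 4096);
  }
  return headers;
}
```

Truncating on code-point boundaries is a design choice of this sketch: it avoids cutting a multi-byte character in half, at the cost of sometimes ending a few bytes under the limit.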
50
+
51
+ ### Backpressure runtime behavior
52
+
53
+ When the driver classifies a failure as `errorKind: "backpressure"` (client-side producer queue full), the relay no longer treats it like a regular retriable failure. Instead:
54
+
55
+ - The record is re-queued via the new `OutboxStore.requeue(id, retryAt)` method,
56
+ - `attempts` is **not incremented** — the buffer being full is a "slow down" signal, not the record's fault,
57
+ - The retry is scheduled `RetryConfig.backpressureDelayMs` ms ahead (default 1000 ms).
58
+
59
+ Stores that don't implement `requeue` fall back to `markFailed` (with attempts++); both `@eventferry/postgres` and `@eventferry/mysql` ship a real implementation.
60
+
61
+ ### Quota multiplier
62
+
63
+ When the driver classifies a failure as `errorKind: "quota"` (broker `THROTTLING_QUOTA_EXCEEDED`), the scheduled retry delay is multiplied by `RetryConfig.quotaMultiplier` (default 5) so the producer gives the broker breathing room. Quota failures DO count as attempts — after the budget is exhausted the record routes to DLQ + `dead`.
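The backpressure and quota rules described above can be summarised as a small, pure scheduling function. This is a sketch under stated assumptions: `planRetry`, `RetryPlan`, and the `baseDelayMs` and `storeHasRequeue` parameters are hypothetical names, and reusing the backpressure delay in the `markFailed` fallback is a guess, since the changelog does not say which delay the fallback uses.

```typescript
// Only the kinds that stay on the retry path; "fatal" and "poison" go straight to the DLQ.
type RetryPathKind = "retriable" | "backpressure" | "quota" | undefined;

interface RetryTuning {
  backpressureDelayMs?: number; // documented default: 1000
  quotaMultiplier?: number; // documented default: 5
}

interface RetryPlan {
  action: "requeue" | "markFailed";
  delayMs: number;
  countsAsAttempt: boolean;
}

// `baseDelayMs` stands in for whatever the normal backoff schedule produced.
function planRetry(
  kind: RetryPathKind,
  baseDelayMs: number,
  storeHasRequeue: boolean,
  tuning: RetryTuning = {},
): RetryPlan {
  if (kind === "backpressure") {
    const delayMs = tuning.backpressureDelayMs ?? 1000;
    // With `requeue`, the attempt counter is left alone; without it,
    // the store falls back to markFailed, which increments attempts.
    return storeHasRequeue
      ? { action: "requeue", delayMs, countsAsAttempt: false }
      : { action: "markFailed", delayMs, countsAsAttempt: true }; // delay here is an assumption
  }
  if (kind === "quota") {
    // Quota failures still consume the retry budget, just with a longer wait.
    return {
      action: "markFailed",
      delayMs: baseDelayMs * (tuning.quotaMultiplier ?? 5),
      countsAsAttempt: true,
    };
  }
  return { action: "markFailed", delayMs: baseDelayMs, countsAsAttempt: true };
}
```

For example, with a 200 ms base backoff and default tuning, a quota failure would be rescheduled 1000 ms out and counted as an attempt, while a backpressure failure on a store with `requeue` would also wait 1000 ms but leave `attempts` unchanged.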
64
+
65
+ ### New / changed types
66
+
67
+ - `RetryConfig` gains `backpressureDelayMs?` and `quotaMultiplier?`.
68
+ - `DlqConfig` gains `includeStackTraces?` and `maxStackBytes?`.
69
+ - `OutboxStore.requeue?(recordId, retryAt)` is a new **optional** method. Stores without it fall through to `markFailed`.
70
+
71
+ ### Backward compatibility
72
+
73
+ Pure-additive everywhere. Default behavior matches the prior release:
74
+
75
+ - A `RetryConfig` without `backpressureDelayMs` uses 1000 ms (sensible default).
76
+ - A `DlqConfig` without `includeStackTraces` keeps DLQ messages small (default off).
77
+ - An `OutboxStore` without `requeue` falls back to `markFailed`, same as before; the one documented quirk is that backpressure failures then increment `attempts`.
78
+
79
+ This closes the last three Tier 1 items in `docs/kafka-gap-analysis/reliability.md`. Phase A reliability surface is now ~100% complete.
80
+
81
+ ### Patch Changes
82
+
83
+ - Updated dependencies [cdc20cf]
84
+ - @eventferry/core@3.3.0
85
+
86
+ ## 3.2.1
87
+
88
+ ### Patch Changes
89
+
90
+ - 9beb3e2: **chore: migrate to independent versioning (Astro pattern)**
91
+
92
+ Fixes the major-version inflation that produced four consecutive surprise majors (`1.0.4 → 2.0.0`, `2.0.0 → 3.0.0`, `3.0.0 → 4.0.0 corrected to 3.1.0`, `3.1.0 → 4.0.0 corrected to 3.2.0`) from changesets whose frontmatter only asked for `minor`.
93
+
94
+ **Root cause** (cited in [changesets/changesets#1759](https://github.com/changesets/changesets/issues/1759) and [docs/decisions.md](https://github.com/changesets/changesets/blob/main/docs/decisions.md)): the adapters listed `@eventferry/core` as a `peerDependency` with `workspace:*`. Changesets' documented rule is that an internal bump of a peer forces a major bump on the dependent — and the `fixed: [["@eventferry/*"]]` group reconciler then propagated that major across every package in the group.
95
+
96
+ **Fix** (exactly the [Astro config](https://github.com/withastro/astro/blob/main/.changeset/config.json)):
97
+
98
+ 1. `.changeset/config.json` — drop `fixed`, set `linked: []`, enable
99
+ `___experimentalUnsafeOptions_WILL_CHANGE_IN_PATCH.onlyUpdatePeerDependentsWhenOutOfRange: true`.
100
+ 2. Move `@eventferry/core` from `peerDependencies` to `dependencies` in
101
+ `@eventferry/postgres`, `@eventferry/mysql`, `@eventferry/kafka`, and
102
+ `@eventferry/schema-registry`. External user-facing peers (`pg`,
103
+ `mysql2`, `kafkajs`, `@confluentinc/kafka-javascript`,
104
+ `@kafkajs/confluent-schema-registry`) stay unchanged.
105
+
106
+ **Effect on releases.** Packages now evolve at independent semver tempos: a `core: minor` changeset produces `core@3.3.0` alongside `postgres@3.2.1` (patch, from "Updated dependencies"). No more major surprises. No more manual force-push corrections.
107
+
108
+ **Effect on consumers.** Pure-additive at the install boundary: `npm i @eventferry/kafka` now resolves `@eventferry/core` automatically (it's a regular dep). Previously consumers had to install it themselves as a peer; the typical flow already did this. No source-code changes required.
109
+
110
+ - Updated dependencies [9beb3e2]
111
+ - @eventferry/core@3.2.1
112
+
113
+ ## 3.2.0
114
+
115
+ ### Minor Changes
116
+
117
+ - 0208275: **feat: OpenTelemetry publish span + hook surface + logger passthrough**
118
+
119
+ ### OpenTelemetry tracing
120
+
121
+ `KafkaPublisher` now accepts an optional `tracer` that follows the current stable [OpenTelemetry messaging semantic conventions](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/messaging/kafka.md). One span per `publish()` call, named `"{topic} publish"`, with `messaging.system=kafka`, `messaging.operation.type=publish`, `messaging.destination.name=<topic>`, and `messaging.batch.message_count=<n>`. No dependency on `@opentelemetry/api` — wire through a 10-line adapter:
122
+
123
+ ```ts
124
+ import { trace, SpanKind, SpanStatusCode } from "@opentelemetry/api";
125
+ import type { KafkaTracer } from "@eventferry/kafka";
126
+
127
+ const otel = trace.getTracer("@eventferry/kafka");
128
+ const tracer: KafkaTracer = {
129
+ startPublishSpan(name, attributes) {
130
+ const span = otel.startSpan(name, {
131
+ kind: SpanKind.PRODUCER,
132
+ attributes,
133
+ });
134
+ return {
135
+ /* setAttribute, setStatus, recordException, end */
136
+ };
137
+ },
138
+ };
139
+
140
+ new KafkaPublisher({ brokers, tracer });
141
+ ```
142
+
143
+ ### Hook surface
144
+
145
+ `KafkaPublisher` now accepts `hooks` for observability and metrics integration:
146
+
147
+ ```ts
148
+ new KafkaPublisher({
149
+ brokers,
150
+ hooks: {
151
+ onConnect,
152
+ onDisconnect,
153
+ onPublish,
154
+ onError,
155
+ onTransactionAbort,
156
+ },
157
+ });
158
+ ```
159
+
160
+ Hooks are **safe by construction**: a throwing hook never breaks publishing — the publisher catches and logs via the configured `logger`.
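One plausible way to get the "safe by construction" property is to wrap every user hook in a try/catch before storing it. The sketch below is an assumption about the general technique, not the package's code: `safeHook` and `WarnLogger` are hypothetical names, and the real `Logger` interface from `@eventferry/core` may look different.

```typescript
// Hypothetical logger shape; only a `warn` method is assumed here.
interface WarnLogger {
  warn(message: string, detail?: unknown): void;
}

type Hook<A extends unknown[]> = (...args: A) => void;

// Returns a wrapper that never throws, even if the user hook does.
function safeHook<A extends unknown[]>(
  name: string,
  hook: Hook<A> | undefined,
  logger?: WarnLogger,
): Hook<A> {
  return (...args: A): void => {
    if (!hook) return; // missing hooks become no-ops
    try {
      hook(...args);
    } catch (err) {
      // Report the failure and carry on with publishing.
      logger?.warn(`hook "${name}" threw`, err);
    }
  };
}
```

Note that this only guards synchronous throws. A hook that returns a rejected promise would need an extra `.catch`, which this sketch leaves out.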
161
+
162
+ ### Logger passthrough
163
+
164
+ A new optional `logger?: Logger` field on `KafkaPublisherOptions` (same `Logger` interface as `@eventferry/core`). Routes the publisher's own diagnostics (driver warnings about unsupported tuning, hook failures) through your logging stack instead of `console.warn`. When omitted, behavior is unchanged from earlier releases (drivers still fall back to `console.warn`).
165
+
166
+ ### Backward compatibility
167
+
168
+ 100% additive. Existing call sites (no hooks, no tracer, no logger) work unchanged — the tracer defaults to a `NoopKafkaTracer`, the hook map defaults to `{}`, and the logger stays undefined.
169
+
170
+ - ae64a98: **feat: callable `transactionalId` + abort-aware tx hook**
171
+
172
+ ### Callable `transactionalId`
173
+
174
+ `transactionalId` accepts a sync or async resolver in addition to a plain string:
175
+
176
+ ```ts
177
+ new KafkaPublisher({
178
+ brokers,
179
+ transactional: true,
180
+ transactionalId: () =>
181
+ `${process.env.POD_NAME}-${process.env.REPLICA_INDEX}`,
182
+ });
183
+ ```
184
+
185
+ Useful when the id depends on runtime context that isn't known at construction time (pod name, AZ + replica index, k8s ordinal). For multi-instance EOS, the resolved id MUST be stable across a single instance's restarts but UNIQUE across instances. The plain-string form remains supported and unchanged.
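To make the string-or-resolver contract concrete, here is a minimal sketch of how such an option could be normalised before the producer connects. `TransactionalIdSource` and `resolveTransactionalId` are hypothetical names, and the empty-string check is an extra safeguard added for this example rather than documented behavior.

```typescript
// Hypothetical type mirroring "plain string, or sync/async resolver".
type TransactionalIdSource = string | (() => string | Promise<string>);

async function resolveTransactionalId(source: TransactionalIdSource): Promise<string> {
  // `await` handles both sync and async resolvers uniformly.
  const id = typeof source === "function" ? await source() : source;
  if (id.trim() === "") {
    // Assumption of this sketch: an empty id is treated as a configuration error.
    throw new Error("transactionalId resolved to an empty string");
  }
  return id;
}
```

Resolving lazily, at connect time rather than in the constructor, is what lets the id depend on values such as a pod name that are only known once the process is running.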
186
+
187
+ ### Abort-aware `onTransactionAbort` hook
188
+
189
+ When a transactional `sendBatch` triggers the abort path, the publisher fires `hooks.onTransactionAbort(err)` so dashboards and metrics catch EOS failure rates:
190
+
191
+ ```ts
192
+ new KafkaPublisher({
193
+ brokers,
194
+ transactional: true,
195
+ transactionalId: "orders-tx",
196
+ hooks: {
197
+ onTransactionAbort: (err) => metrics.txAborts.inc({ reason: err.name }),
198
+ },
199
+ });
200
+ ```
201
+
202
+ Best-effort: the hook is safe-wrapped (a throwing hook never breaks the abort path); both `kafkajs` and `@confluentinc/kafka-javascript` drivers fire it from their transaction catch blocks.
203
+
204
+ ### Backward compatibility
205
+
206
+ 100% additive. Existing call sites — string `transactionalId`, no hooks — work unchanged.
207
+
208
+ ### Patch Changes
209
+
210
+ - @eventferry/core@3.2.0
211
+
212
+ ## 3.1.0
213
+
214
+ ### Minor Changes
215
+
216
+ - da39b08: **feat: producer tuning passthrough + per-message partition override + kafkajs partitioner choice**
217
+
218
+ ### Producer tuning
219
+
220
+ `KafkaPublisher` now accepts the full set of producer tuning knobs every serious Kafka deployment eventually needs:
221
+
222
+ ```ts
223
+ new KafkaPublisher({
224
+ driver: "confluent",
225
+ brokers,
226
+ lingerMs: 25, // ⚠ confluent only
227
+ batchSize: 131_072, // ⚠ confluent only
228
+ maxInFlightRequests: 5,
229
+ requestTimeoutMs: 30_000,
230
+ deliveryTimeoutMs: 120_000, // ⚠ confluent only
231
+ maxRequestSize: 2_000_000, // ⚠ confluent only
232
+ transactionTimeoutMs: 90_000,
233
+ });
234
+ ```
235
+
236
+ **Driver asymmetry:** `kafkajs` has no producer-level config for `lingerMs`, `batchSize`, `deliveryTimeoutMs`, or `maxRequestSize` — its batching is sticky-partitioner + hardcoded internals. The typed API stays uniform; on the kafkajs driver, those four knobs log a **one-time** warning (deduped process-wide) and are otherwise ignored. For fine-grained tuning, switch to the confluent driver.
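The one-time, process-wide warning can be pictured as a module-level set that remembers what has already been reported. This is only a sketch: `warnUnsupportedKnobs` is a hypothetical name, and deduplicating per knob (rather than emitting a single combined warning) is an assumption, since the changelog does not specify the granularity.

```typescript
// The four knobs the changelog lists as confluent-only.
const CONFLUENT_ONLY_KNOBS = ["lingerMs", "batchSize", "deliveryTimeoutMs", "maxRequestSize"] as const;

// Module-level state gives process-wide deduplication across publisher instances.
const alreadyWarned = new Set<string>();

// Warns once per unsupported knob and returns the knobs warned about in this call.
function warnUnsupportedKnobs(
  options: Record<string, unknown>,
  warn: (message: string) => void,
): string[] {
  const newlyWarned: string[] = [];
  for (const knob of CONFLUENT_ONLY_KNOBS) {
    if (options[knob] === undefined || alreadyWarned.has(knob)) continue;
    alreadyWarned.add(knob);
    warn(`"${knob}" is not supported by the kafkajs driver and will be ignored`);
    newlyWarned.push(knob);
  }
  return newlyWarned;
}
```

Constructing a second publisher with the same unsupported option would then stay silent, which keeps logs readable in processes that create many producers.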
237
+
238
+ ### Per-message partition override
239
+
240
+ `PublishableMessage` gains an optional `partition?: number` field. When set, the publisher routes that record to the exact partition, bypassing the configured partitioner. Use cases: compacted topics with application-managed sharding, tenant-affinity routing, geo-pinning. Both drivers honor it.
241
+
242
+ ### kafkajs partitioner choice
243
+
244
+ Silences the noisy `KafkaJSPartitionerNotSpecified` warning kafkajs v2 emits on every producer instance, by letting you pick a partitioner explicitly:
245
+
246
+ ```ts
247
+ new KafkaPublisher({
248
+ driver: "kafkajs",
249
+ brokers,
250
+ partitioner: "java-compatible", // (default) | "legacy" | "default"
251
+ });
252
+ ```
253
+
254
+ - `"java-compatible"` is the new greenfield default (matches the Java client's murmur2).
255
+ - `"legacy"` preserves pre-v2 hash continuity for existing topics.
256
+ - `"default"` follows kafkajs's current default.
257
+
258
+ ### Backward compatibility
259
+
260
+ Pure-additive. Existing call sites continue to work unchanged; the partitioner-choice default (`"java-compatible"`) is what kafkajs v2's migration guide recommends for new producers.
261
+
262
+ - bbb1792: **feat: mTLS + SASL/OAUTHBEARER support**
263
+
264
+ Two new authentication paths for managed and enterprise Kafka clusters.
265
+
266
+ ### mTLS (mutual TLS)
267
+
268
+ The `ssl` option now accepts a full `TlsConfig` in addition to the boolean shorthand:
269
+
270
+ ```ts
271
+ new KafkaPublisher({
272
+ brokers: ["broker:9093"],
273
+ ssl: {
274
+ ca: readFileSync("/etc/ssl/kafka-ca.pem"),
275
+ cert: readFileSync("/etc/ssl/client.pem"),
276
+ key: readFileSync("/etc/ssl/client-key.pem"),
277
+ passphrase: "optional",
278
+ servername: "broker.example.com", // SNI override
279
+ },
280
+ });
281
+ ```
282
+
283
+ Buffer and PEM-string inputs are both supported. `ssl: true` continues to work unchanged (one-way TLS using the driver's default trust store).
284
+
285
+ > `rejectUnauthorized` is intentionally NOT exposed. TLS verification is non-negotiable; pass the cluster CA via `ca` for dev clusters with self-signed certs.
286
+
287
+ ### SASL/OAUTHBEARER
288
+
289
+ Required for Azure Event Hubs, Confluent Cloud with OAuth/SSO, and any OIDC-fronted cluster. Bring your own token provider:
290
+
291
+ ```ts
292
+ new KafkaPublisher({
293
+ brokers: ["broker:9093"],
294
+ ssl: true,
295
+ sasl: {
296
+ mechanism: "oauthbearer",
297
+ oauthBearerProvider: async () => ({
298
+ value: bearerToken,
299
+ principal: "user@realm", // required on confluent
300
+ lifetime: 3600_000, // ms — required on confluent
301
+ extensions: { scope: "read,write" },
302
+ }),
303
+ },
304
+ });
305
+ ```
306
+
307
+ **Driver asymmetry to know about:** `kafkajs` reads only `value`; `@confluentinc/kafka-javascript` requires `value` + `principal` + `lifetime` (ms) and accepts an optional `extensions` map. Cross-driver portable providers should populate all four.
308
+
309
+ ### Confluent driver internals
310
+
311
+ `@confluentinc/kafka-javascript` integrates via a small translator: simple `ssl: true` and SASL configs go through the kafkajs-compat layer, but a custom `TlsConfig` is mapped to the librdkafka PEM keys (`ssl.ca.pem`, `ssl.certificate.pem`, `ssl.key.pem`, `ssl.key.password`) and `security.protocol` is auto-derived (`ssl` / `sasl_plaintext` / `sasl_ssl`). Buffer inputs are coerced to UTF-8 strings (librdkafka does not accept Buffers).
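The translation described above could look roughly like the following. This is a sketch, not the package's translator: `toLibrdkafkaSecurity`, `TlsConfigSketch`, and `Pem` are hypothetical names, `servername` is left out because the changelog does not say how it is mapped, and the `plaintext` fallback is librdkafka's own default rather than something the changelog states.

```typescript
// Accepts a PEM string or anything Buffer-like that can render itself as UTF-8.
type Pem = string | { toString(encoding?: string): string };

interface TlsConfigSketch {
  ca?: Pem;
  cert?: Pem;
  key?: Pem;
  passphrase?: string;
}

// librdkafka takes strings, so Buffer-like inputs are coerced.
const toPem = (value: Pem): string =>
  typeof value === "string" ? value : value.toString("utf8");

function toLibrdkafkaSecurity(
  ssl: boolean | TlsConfigSketch | undefined,
  hasSasl: boolean,
): Record<string, string> {
  const tlsOn = ssl !== undefined && ssl !== false;
  const out: Record<string, string> = {
    // Derived from the TLS/SASL combination rather than set by the user.
    "security.protocol": hasSasl
      ? (tlsOn ? "sasl_ssl" : "sasl_plaintext")
      : (tlsOn ? "ssl" : "plaintext"),
  };
  if (typeof ssl === "object") {
    if (ssl.ca !== undefined) out["ssl.ca.pem"] = toPem(ssl.ca);
    if (ssl.cert !== undefined) out["ssl.certificate.pem"] = toPem(ssl.cert);
    if (ssl.key !== undefined) out["ssl.key.pem"] = toPem(ssl.key);
    if (ssl.passphrase !== undefined) out["ssl.key.password"] = ssl.passphrase;
  }
  return out;
}
```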
312
+
313
+ ### Backward compatibility
314
+
315
+ Pure-additive. Existing configs (`ssl: true | false | undefined`, password SASL) work unchanged.
316
+
317
+ ### Patch Changes
318
+
319
+ - Updated dependencies [da39b08]
320
+ - @eventferry/core@3.1.0
321
+
322
+ ## 3.0.0
323
+
324
+ ### Minor Changes
325
+
326
+ - f0c7483: **feat: error classification for smarter retry, DLQ, and pause behavior**
327
+
328
+ Publisher implementations can now tag each failed `PublishResult` with an `errorKind` so the relay knows whether the error is worth retrying.
329
+
330
+ **New in `@eventferry/core`:**
331
+
332
+ - `PublishErrorKind = "retriable" | "fatal" | "poison" | "backpressure" | "quota"` — opt-in classification surface on `PublishResult.errorKind`.
333
+ - The `Relay` now reads `errorKind`:
334
+ - `"fatal"` (auth denied, fenced epoch, transactional id rejected) and `"poison"` (oversized record, corrupt payload, schema rejected) **short-circuit retries** straight to the DLQ + `dead` status. No more burning the retry budget on errors that cannot succeed.
335
+ - `"retriable"`, `"backpressure"`, `"quota"`, and absent (`undefined`) continue to use the existing backoff schedule, preserving backward compatibility. Smarter `backpressure` / `quota` handling (pause polling, longer backoff) is planned for a follow-up release.
336
+
337
+ **New in `@eventferry/kafka`:**
338
+
339
+ - `classifyKafkajsError(err): PublishErrorKind` — maps the most-common `KafkaJSProtocolError` types/codes and the `KafkaJSConnectionError` / `KafkaJSRequestTimeoutError` / `KafkaJSNonRetriableError` subclasses to a category. Verified against `kafkajs/src/errors.js`.
340
+ - `classifyConfluentError(err): PublishErrorKind` — maps the librdkafka `RD_KAFKA_RESP_ERR_*` codes (both negative internal codes and Kafka wire-protocol codes) to a category. Verified against `librdkafka/src/rdkafka.h`. Includes the dedicated `"backpressure"` mapping for `ERR__QUEUE_FULL` (-184) and `"quota"` for `ERR_THROTTLING_QUOTA_EXCEEDED` (89).
341
+ - Both drivers (`KafkaJsDriver`, `ConfluentDriver`) now call their respective classifier in the catch path and emit the `errorKind` on every failed `PublishResult`.
342
+
343
+ **Backward compatibility:** `errorKind` is optional everywhere. Existing publisher implementations that don't set it continue to work unchanged — the relay treats absent `errorKind` as `"retriable"`, which is what the relay did before this change.
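The relay-side rule from this release can be sketched as a single decision function. `decideOutcome`, `attemptsSoFar`, and `maxAttempts` are hypothetical names chosen for this illustration; the actual relay presumably tracks its retry budget through its own configuration.

```typescript
type PublishErrorKind = "retriable" | "fatal" | "poison" | "backpressure" | "quota";

type Outcome = "dead" | "retry";

// `attemptsSoFar` includes the attempt that just failed.
function decideOutcome(
  kind: PublishErrorKind | undefined,
  attemptsSoFar: number,
  maxAttempts: number,
): Outcome {
  // Errors that cannot succeed skip the retry budget entirely.
  if (kind === "fatal" || kind === "poison") return "dead";
  // Everything else, including an absent kind, follows the normal budget.
  return attemptsSoFar >= maxAttempts ? "dead" : "retry";
}
```

Treating `undefined` the same as `"retriable"` is what keeps publishers written before this release working without changes.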
344
+
345
+ **Migration:** none required.
346
+
347
+ ### Patch Changes
348
+
349
+ - Updated dependencies [f0c7483]
350
+ - @eventferry/core@3.0.0
351
+
352
+ ## 2.0.0
353
+
354
+ ### Patch Changes
355
+
356
+ - @eventferry/core@2.0.0
357
+
358
+ ## 1.0.4
359
+
360
+ ### Patch Changes
361
+
362
+ - Updated dependencies [64d115d]
363
+ - @eventferry/core@1.0.4
364
+
365
+ ## 1.0.3
366
+
367
+ ### Patch Changes
368
+
369
+ - Updated dependencies [aaca9a2]
370
+ - @eventferry/core@1.0.3
371
+
372
+ ## 1.0.2
373
+
374
+ ### Patch Changes
375
+
376
+ - 89f1867: Declare `engines.node` (>=18) so npm shows the supported Node version and tooling can warn on unsupported runtimes.
377
+ - Updated dependencies [89f1867]
378
+ - @eventferry/core@1.0.2
379
+
380
+ ## 1.0.1
381
+
382
+ ### Patch Changes
383
+
384
+ - docs: polish per-package READMEs (npm page content). No code changes.
385
+ - Updated dependencies
386
+ - @eventferry/core@1.0.1
387
+
388
+ ## 1.0.0
389
+
390
+ ### Minor Changes
391
+
392
+ - b06f8ec: Strict per-aggregate ordering, crash recovery, and driver/packaging fixes.
393
+
394
+ - **postgres:** the claim query now enforces strict per-aggregate ordering by
395
+ only taking the _head_ of each aggregate (no earlier unfinished row for the
396
+ same `aggregateId`). At most one in-flight message per aggregate; failed
397
+ messages block their successors until resolved.
398
+ - **postgres:** added a `claimed_at` column and a visibility-timeout reaper
399
+ (`claimTimeoutMs`, default 60s) so rows orphaned by a crashed relay are
400
+ reclaimed instead of stuck in `processing` forever. Migration is upgrade-safe
401
+ (`ADD COLUMN IF NOT EXISTS`); the partial indexes were retuned for the new
402
+ ordered, reaper-aware claim.
403
+ - **core:** dead-lettered messages now carry the real `original-topic` header
404
+ (previously always empty); `ConsoleLogger` routes warn/error to the matching
405
+ `console` methods.
406
+ - **kafka:** the confluent driver now honors `acks` and `compression` (it
407
+ silently ignored them before), matching the kafkajs driver.
408
+ - **packaging:** the `@eventferry/postgres/migrations` subpath export now
409
+ advertises its types; `pnpm-workspace.yaml` dropped an invalid placeholder
410
+ block.
411
+
412
+ Note: `claimTimeoutMs` should exceed your worst-case publish latency. This is
413
+ an at-least-once system — pair it with idempotent producers/consumers.
414
+
415
+ ### Patch Changes
416
+
417
+ - b06f8ec: Fix the kafkajs driver using `producer.send` with a multi-topic `topicMessages`
418
+ payload, which kafkajs rejects with "Invalid topic" — the `topicMessages` form is
419
+ `producer.sendBatch`. Batches now publish correctly (caught by the new integration
420
+ suite against real Redpanda; unit tests used a fake producer that didn't validate).
421
+ - Updated dependencies [b06f8ec]
422
+ - Updated dependencies [b06f8ec]
423
+ - Updated dependencies [b06f8ec]
424
+ - Updated dependencies [b06f8ec]
425
+ - Updated dependencies [b06f8ec]
426
+ - @eventferry/core@1.0.0
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@eventferry/kafka",
3
- "version": "3.3.0",
3
+ "version": "3.3.1",
4
4
  "description": "Kafka/Redpanda publisher for @eventferry (kafkajs + confluent drivers)",
5
5
  "type": "module",
6
6
  "main": "./dist/index.cjs",
@@ -14,7 +14,8 @@
14
14
  }
15
15
  },
16
16
  "files": [
17
- "dist"
17
+ "dist",
18
+ "CHANGELOG.md"
18
19
  ],
19
20
  "keywords": [
20
21
  "outbox",
@@ -47,7 +48,7 @@
47
48
  "node": ">=18"
48
49
  },
49
50
  "dependencies": {
50
- "@eventferry/core": "3.3.0"
51
+ "@eventferry/core": "3.3.1"
51
52
  },
52
53
  "peerDependencies": {
53
54
  "kafkajs": "^2.0.0",
@@ -66,7 +67,7 @@
66
67
  "tsup": "^8.3.5",
67
68
  "typescript": "^5.7.2",
68
69
  "vitest": "^2.1.8",
69
- "@eventferry/core": "3.3.0"
70
+ "@eventferry/core": "3.3.1"
70
71
  },
71
72
  "scripts": {
72
73
  "build": "tsup",