@eventferry/core 3.3.0 → 3.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/CHANGELOG.md +290 -0
  2. package/package.json +3 -2
package/CHANGELOG.md ADDED
@@ -0,0 +1,290 @@
1
+ # @eventferry/core
2
+
3
+ ## 3.3.1
4
+
5
+ ### Patch Changes
6
+
7
+ - 3c33f71: **chore: ship `CHANGELOG.md` inside the npm tarball**
8
+
9
+ Previously, each package's `files` allowlist contained only `"dist"` (and `"sql"` for `@eventferry/postgres`), so the auto-generated `CHANGELOG.md` was never published. Users browsing the package on npmjs.com or unpacking the tarball couldn't see release notes — they had to navigate to the GitHub repo.
10
+
11
+ This release adds `"CHANGELOG.md"` to the `files` array of every publishable package. Starting with this version, the per-version release notes are accessible:
12
+
13
+ - Directly in `node_modules/@eventferry/<pkg>/CHANGELOG.md` after `npm install`
14
+ - In the file listing on npmjs.com (under the "Code" / "Files" tab, depending on the npm UI)
15
+ - Inside the tarball downloaded from `https://registry.npmjs.org/...`
16
+
17
+ No code or API surface changes.
18
+
19
+ ## 3.3.0
20
+
21
+ ### Minor Changes
22
+
23
+ - cdc20cf: **feat: DLQ enrichment + backpressure runtime + quota multiplier — Tier 1 of the reliability gap closed**
24
+
25
+ ### DLQ enrichment
26
+
27
+ Records routed to the dead-letter queue now carry the full context an operator needs to triage:
28
+
+ | Header                      | Set by    | Note                                                                                             |
+ | --------------------------- | --------- | ------------------------------------------------------------------------------------------------ |
+ | `original-topic`            | relay     | already existed                                                                                  |
+ | `dlq-reason`                | publisher | already existed (`error.message`)                                                                |
+ | `dlq-failed-at`             | publisher | already existed (ISO timestamp)                                                                  |
+ | `dlq-error-class`           | publisher | **new** — `error.name` / constructor name                                                        |
+ | `dlq-attempts`              | relay     | **new** — string-encoded `attempts` count                                                        |
+ | `dlq-original-aggregate-id` | relay     | **new** — for joining with business state                                                        |
+ | `dlq-original-message-id`   | relay     | **new** — for dedup / idempotency lookups                                                        |
+ | `dlq-error-stack`           | relay     | **new** — opt-in via `DlqConfig.includeStackTraces`, truncated to `maxStackBytes` (default 4 KB) |
39
+
+ ```ts
+ new Relay({
+   store,
+   publisher,
+   dlq: { topic: "orders.dlq", includeStackTraces: true, maxStackBytes: 4096 },
+ });
+ ```
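As a rough illustration of how the DLQ headers listed in this section could be assembled, here is a minimal sketch. `buildDlqHeaders`, `truncateUtf8`, and the `DlqContext` shape are hypothetical names, not part of the `@eventferry/core` API; only the header names and the `includeStackTraces` / `maxStackBytes` options come from the changelog. The changelog splits header ownership between relay and publisher, which this single function ignores.

```typescript
// Hypothetical sketch: NOT the @eventferry/core implementation.
interface DlqContext {
  originalTopic: string;
  error: Error;
  attempts: number;
  aggregateId: string;
  messageId: string;
}

interface DlqOptions {
  includeStackTraces?: boolean; // off by default, per the changelog
  maxStackBytes?: number; // 4 KB by default, per the changelog
}

// Cut a string to at most maxBytes of UTF-8 without splitting a character.
function truncateUtf8(text: string, maxBytes: number): string {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) return text;
  let end = Math.max(0, maxBytes);
  // Step back while the first dropped byte is a UTF-8 continuation byte.
  while (end > 0 && ((bytes[end] ?? 0) & 0xc0) === 0x80) end--;
  return new TextDecoder().decode(bytes.subarray(0, end));
}

function buildDlqHeaders(
  ctx: DlqContext,
  opts: DlqOptions = {},
  now: Date = new Date(),
): Record<string, string> {
  const headers: Record<string, string> = {
    "original-topic": ctx.originalTopic,
    "dlq-reason": ctx.error.message,
    "dlq-failed-at": now.toISOString(),
    "dlq-error-class": ctx.error.name,
    "dlq-attempts": String(ctx.attempts), // header values are strings
    "dlq-original-aggregate-id": ctx.aggregateId,
    "dlq-original-message-id": ctx.messageId,
  };
  if (opts.includeStackTraces && ctx.error.stack) {
    headers["dlq-error-stack"] = truncateUtf8(ctx.error.stack, opts.maxStackBytes ?? 4096);
  }
  return headers;
}
```

Truncating on a byte budget rather than a character count keeps the header within a predictable size even when the stack contains multi-byte characters.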
47
+
48
+ ### Backpressure runtime behavior
49
+
50
+ When the driver classifies a failure as `errorKind: "backpressure"` (client-side producer queue full), the relay no longer treats it like a regular retriable failure. Instead:
51
+
52
+ - The record is re-queued via the new `OutboxStore.requeue(id, retryAt)` method,
53
+ - `attempts` is **not incremented** — the buffer being full is a "slow down" signal, not the record's fault,
54
+ - The retry is scheduled `RetryConfig.backpressureDelayMs` ms ahead (default 1000 ms).
55
+
56
+ Stores that don't implement `requeue` fall back to `markFailed` (with attempts++); both `@eventferry/postgres` and `@eventferry/mysql` ship a real implementation.
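The backpressure rules above can be sketched as a small pure decision function. This is an assumption-laden illustration, not the relay's code: `planBackpressure` and `BackpressureAction` are hypothetical names, and only `requeue`, `markFailed`, and the 1000 ms default for `backpressureDelayMs` come from the changelog.

```typescript
// Hypothetical sketch of the backpressure decision; not the library's code.
type BackpressureAction =
  | { kind: "requeue"; retryAt: Date; attempts: number }
  | { kind: "markFailed"; attempts: number };

function planBackpressure(
  attempts: number,
  storeHasRequeue: boolean,
  now: Date,
  backpressureDelayMs = 1000, // changelog default
): BackpressureAction {
  if (storeHasRequeue) {
    // A full producer queue is not the record's fault: keep `attempts` as-is
    // and schedule the retry `backpressureDelayMs` ahead.
    return {
      kind: "requeue",
      retryAt: new Date(now.getTime() + backpressureDelayMs),
      attempts,
    };
  }
  // Fallback for stores without `requeue`: the attempt is counted.
  return { kind: "markFailed", attempts: attempts + 1 };
}
```

A caller could presumably derive `storeHasRequeue` with a check such as `typeof store.requeue === "function"`, since the method is optional on `OutboxStore`.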
57
+
58
+ ### Quota multiplier
59
+
60
+ When the driver classifies a failure as `errorKind: "quota"` (broker `THROTTLING_QUOTA_EXCEEDED`), the scheduled retry delay is multiplied by `RetryConfig.quotaMultiplier` (default 5) so the producer gives the broker breathing room. Quota failures DO count as attempts — after the budget is exhausted the record routes to DLQ + `dead`.
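To make the arithmetic concrete, here is a minimal sketch of the quota path. `planQuotaFailure`, `maxAttempts`, and `baseDelayMs` are hypothetical names, and the exact exhaustion check (`attempts >= maxAttempts`) is an assumption; the changelog only states that the delay is multiplied by `quotaMultiplier` (default 5) and that quota failures count as attempts.

```typescript
// Hypothetical sketch of the quota decision; not the library's code.
type QuotaAction =
  | { kind: "retry"; attempts: number; delayMs: number }
  | { kind: "dead"; attempts: number };

function planQuotaFailure(
  attempts: number,
  maxAttempts: number, // assumed name for the retry budget
  baseDelayMs: number, // assumed: the delay the normal backoff would pick
  quotaMultiplier = 5, // changelog default
): QuotaAction {
  const next = attempts + 1; // quota failures DO count as attempts
  if (next >= maxAttempts) {
    return { kind: "dead", attempts: next }; // budget exhausted: DLQ + dead
  }
  return { kind: "retry", attempts: next, delayMs: baseDelayMs * quotaMultiplier };
}
```

With the defaults, a record whose normal backoff would be 200 ms waits 1000 ms after a quota failure.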
61
+
62
+ ### New / changed types
63
+
64
+ - `RetryConfig` gains `backpressureDelayMs?` and `quotaMultiplier?`.
65
+ - `DlqConfig` gains `includeStackTraces?` and `maxStackBytes?`.
66
+ - `OutboxStore.requeue?(recordId, retryAt)` is a new **optional** method. Stores without it fall through to `markFailed`.
67
+
68
+ ### Backward compatibility
69
+
70
+ Pure-additive everywhere. Default behavior matches the prior release:
71
+
72
+ - A `RetryConfig` without `backpressureDelayMs` uses 1000 ms (sensible default).
73
+ - A `DlqConfig` without `includeStackTraces` keeps DLQ messages small (default off).
74
+ - An `OutboxStore` without `requeue` falls back to `markFailed` — same as before, just with a documented quirk.
75
+
76
+ This closes the last three Tier 1 items in `docs/kafka-gap-analysis/reliability.md`. Phase A reliability surface is now ~100% complete.
77
+
78
+ ## 3.2.1
79
+
80
+ ### Patch Changes
81
+
82
+ - 9beb3e2: **chore: migrate to independent versioning (Astro pattern)**
83
+
84
+ Fixes the major-version inflation that produced four consecutive surprise majors (`1.0.4 → 2.0.0`, `2.0.0 → 3.0.0`, `3.0.0 → 4.0.0 corrected to 3.1.0`, `3.1.0 → 4.0.0 corrected to 3.2.0`) from changesets whose frontmatter only asked for `minor`.
85
+
86
+ **Root cause** (cited in [changesets/changesets#1759](https://github.com/changesets/changesets/issues/1759) and [docs/decisions.md](https://github.com/changesets/changesets/blob/main/docs/decisions.md)): the adapters listed `@eventferry/core` as a `peerDependency` with `workspace:*`. Changesets' documented rule is that an internal bump of a peer forces a major bump on the dependent — and the `fixed: [["@eventferry/*"]]` group reconciler then propagated that major across every package in the group.
87
+
88
+ **Fix** (exactly the [Astro config](https://github.com/withastro/astro/blob/main/.changeset/config.json)):
89
+
90
+ 1. `.changeset/config.json` — drop `fixed`, set `linked: []`, enable
91
+ `___experimentalUnsafeOptions_WILL_CHANGE_IN_PATCH.onlyUpdatePeerDependentsWhenOutOfRange: true`.
92
+ 2. Move `@eventferry/core` from `peerDependencies` to `dependencies` in
93
+ `@eventferry/postgres`, `@eventferry/mysql`, `@eventferry/kafka`, and
94
+ `@eventferry/schema-registry`. External user-facing peers (`pg`,
95
+ `mysql2`, `kafkajs`, `@confluentinc/kafka-javascript`,
96
+ `@kafkajs/confluent-schema-registry`) stay unchanged.
97
+
98
+ **Effect on releases.** Packages now evolve at independent semver tempos: a `core: minor` changeset produces `core@3.3.0` alongside `postgres@3.2.1` (patch, from "Updated dependencies"). No more major surprises. No more manual force-push corrections.
99
+
100
+ **Effect on consumers.** Pure-additive at the install boundary: `npm i @eventferry/kafka` now resolves `@eventferry/core` automatically (it's a regular dep). Previously consumers had to install it themselves as a peer; the typical flow already did this. No source-code changes required.
101
+
102
+ ## 3.2.0
103
+
104
+ ## 3.1.0
105
+
106
+ ### Minor Changes
107
+
108
+ - da39b08: **feat: producer tuning passthrough + per-message partition override + kafkajs partitioner choice**
109
+
110
+ ### Producer tuning
111
+
112
+ `KafkaPublisher` now accepts the full set of producer tuning knobs every serious Kafka deployment eventually needs:
113
+
+ ```ts
+ new KafkaPublisher({
+   driver: "confluent",
+   brokers,
+   lingerMs: 25, // ⚠ confluent only
+   batchSize: 131_072, // ⚠ confluent only
+   maxInFlightRequests: 5,
+   requestTimeoutMs: 30_000,
+   deliveryTimeoutMs: 120_000, // ⚠ confluent only
+   maxRequestSize: 2_000_000, // ⚠ confluent only
+   transactionTimeoutMs: 90_000,
+ });
+ ```
127
+
128
+ **Driver asymmetry:** `kafkajs` has no producer-level config for `lingerMs`, `batchSize`, `deliveryTimeoutMs`, or `maxRequestSize` — its batching is sticky-partitioner + hardcoded internals. The typed API stays uniform; on the kafkajs driver, those four knobs log a **one-time** warning (deduped process-wide) and are otherwise ignored. For fine-grained tuning, switch to the confluent driver.
129
+
130
+ ### Per-message partition override
131
+
132
+ `PublishableMessage` gains an optional `partition?: number` field. When set, the publisher routes that record to the exact partition, bypassing the configured partitioner. Use cases: compacted topics with application-managed sharding, tenant-affinity routing, geo-pinning. Both drivers honor it.
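The override rule can be sketched as follows. Only the optional `partition?: number` field comes from the changelog; `PublishableMessageSketch`, `Partitioner`, and `resolvePartition` are hypothetical names, and the other message fields are placeholders.

```typescript
// Hypothetical sketch of the routing rule; not the publisher's code.
interface PublishableMessageSketch {
  topic: string;
  key: string;
  value: string;
  partition?: number; // explicit target partition, if any
}

type Partitioner = (message: PublishableMessageSketch, partitionCount: number) => number;

function resolvePartition(
  message: PublishableMessageSketch,
  partitionCount: number,
  partitioner: Partitioner,
): number {
  // Compare against undefined so that partition 0 is honored.
  if (message.partition !== undefined) return message.partition;
  return partitioner(message, partitionCount);
}

// Toy partitioner for demonstration only (real ones hash the key).
const byKeyLength: Partitioner = (message, count) => message.key.length % count;
```

The `!== undefined` check matters: a truthiness test would silently send messages pinned to partition 0 back through the partitioner.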
133
+
134
+ ### kafkajs partitioner choice
135
+
136
+ Silences the noisy `KafkaJSPartitionerNotSpecified` warning kafkajs v2 emits on every producer instance, by letting you pick a partitioner explicitly:
137
+
+ ```ts
+ new KafkaPublisher({
+   driver: "kafkajs",
+   brokers,
+   partitioner: "java-compatible", // (default) | "legacy" | "default"
+ });
+ ```
145
+
146
+ - `"java-compatible"` is the new greenfield default (matches the Java client's murmur2).
147
+ - `"legacy"` preserves pre-v2 hash continuity for existing topics.
148
+ - `"default"` follows kafkajs's current default.
149
+
150
+ ### Backward compatibility
151
+
152
+ Pure-additive. Existing call sites continue to work unchanged; the partitioner-choice default (`"java-compatible"`) is what kafkajs v2's migration guide recommends for new producers.
153
+
154
+ ## 3.0.0
155
+
156
+ ### Minor Changes
157
+
158
+ - f0c7483: **feat: error classification for smarter retry, DLQ, and pause behavior**
159
+
160
+ Publisher implementations can now tag each failed `PublishResult` with an `errorKind` so the relay knows whether the error is worth retrying.
161
+
162
+ **New in `@eventferry/core`:**
163
+
164
+ - `PublishErrorKind = "retriable" | "fatal" | "poison" | "backpressure" | "quota"` — opt-in classification surface on `PublishResult.errorKind`.
165
+ - The `Relay` now reads `errorKind`:
166
+ - `"fatal"` (auth denied, fenced epoch, transactional id rejected) and `"poison"` (oversized record, corrupt payload, schema rejected) **short-circuit retries** straight to the DLQ + `dead` status. No more burning the retry budget on errors that cannot succeed.
167
+ - `"retriable"`, `"backpressure"`, `"quota"`, and absent (`undefined`) continue to use the existing backoff schedule, preserving backward compatibility. Smarter `backpressure` / `quota` handling (pause polling, longer backoff) is planned for a follow-up release.
168
+
169
+ **New in `@eventferry/kafka`:**
170
+
171
+ - `classifyKafkajsError(err): PublishErrorKind` — maps the most-common `KafkaJSProtocolError` types/codes and the `KafkaJSConnectionError` / `KafkaJSRequestTimeoutError` / `KafkaJSNonRetriableError` subclasses to a category. Verified against `kafkajs/src/errors.js`.
172
+ - `classifyConfluentError(err): PublishErrorKind` — maps the librdkafka `RD_KAFKA_RESP_ERR_*` codes (both negative internal codes and Kafka wire-protocol codes) to a category. Verified against `librdkafka/src/rdkafka.h`. Includes the dedicated `"backpressure"` mapping for `ERR__QUEUE_FULL` (-184) and `"quota"` for `ERR_THROTTLING_QUOTA_EXCEEDED` (89).
173
+ - Both drivers (`KafkaJsDriver`, `ConfluentDriver`) now call their respective classifier in the catch path and emit the `errorKind` on every failed `PublishResult`.
174
+
175
+ **Backward compatibility:** `errorKind` is optional everywhere. Existing publisher implementations that don't set it continue to work unchanged — the relay treats absent `errorKind` as `"retriable"`, which is what the relay did before this change.
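The 3.0.0 routing rule can be summarized in a few lines. The `PublishErrorKind` union is copied from the changelog; `nextStep` is a hypothetical helper name used only for this sketch.

```typescript
// Union copied from the changelog entry above.
type PublishErrorKind = "retriable" | "fatal" | "poison" | "backpressure" | "quota";

// Hypothetical helper: decides whether a failed publish is retried
// or short-circuited to the DLQ, as described for 3.0.0.
function nextStep(errorKind?: PublishErrorKind): "dead-letter" | "retry" {
  const kind = errorKind ?? "retriable"; // absent is treated as "retriable"
  return kind === "fatal" || kind === "poison" ? "dead-letter" : "retry";
}
```

Note that this reflects 3.0.0 only; the 3.3.0 entry describes the later special handling for `"backpressure"` and `"quota"`.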
176
+
177
+ **Migration:** none required.
178
+
179
+ ## 2.0.0
180
+
181
+ ## 1.0.4
182
+
183
+ ### Patch Changes
184
+
185
+ - 64d115d: docs / metadata: expand `keywords` on all packages for better npm and LLM discoverability (outbox-pattern, dual-write, cdc, event-driven, microservices, etc.). No code changes.
186
+
187
+ ## 1.0.3
188
+
189
+ ### Patch Changes
190
+
191
+ - aaca9a2: docs: use a non-expiring `2026-present` copyright year in LICENSE and a static MIT license badge in the README
192
+
193
+ ## 1.0.2
194
+
195
+ ### Patch Changes
196
+
197
+ - 89f1867: Declare `engines.node` (>=18) so npm shows the supported Node version and tooling can warn on unsupported runtimes.
198
+
199
+ ## 1.0.1
200
+
201
+ ### Patch Changes
202
+
203
+ - docs: polish per-package READMEs (npm page content). No code changes.
204
+
205
+ ## 1.0.0
206
+
207
+ ### Minor Changes
208
+
209
+ - b06f8ec: Add a low-latency notify-driven relay (Postgres `LISTEN`/`NOTIFY`).
210
+
211
+ - **core:** new `Waker` interface and an optional `Relay({ waker })`. The relay's
212
+ idle wait is now interruptible — when the waker signals, it claims immediately
213
+ instead of sleeping out `pollIntervalMs`. With no waker, behavior is unchanged.
214
+ - **postgres:** `PostgresNotifyWaker` holds a dedicated `LISTEN` connection and
215
+ wakes the relay on each notification, reconnecting with backoff if it drops.
216
+ `createNotifyTriggerSql(table, channel)` emits an `AFTER INSERT FOR EACH STATEMENT`
217
+ trigger that `pg_notify`s on commit (empty payload — the relay re-claims).
218
+ - Polling remains the safety net: a missed notification is caught by the next poll,
219
+ so no event is lost. All ordering/retry/DLQ/crash-recovery guarantees are unchanged.
220
+ - No new dependencies (`LISTEN`/`NOTIFY` is native to `pg`).
221
+
222
+ - b06f8ec: Add a streaming relay that publishes straight from the Postgres WAL (logical replication).
223
+
224
+ - **postgres:** `PostgresStreamingRelay` consumes INSERTs on the outbox table via
225
+ `pg-logical-replication` + `pgoutput` (built-in, no DB extension) and publishes them
226
+ with no claim query on the happy path — lower DB load than the notify waker. A failed
227
+ publish is demoted to `failed`; an internal claim-based retry loop drains it with the
228
+ existing backoff / DLQ / dead handling. `pg-logical-replication` is a new **optional**
229
+ peer dependency, loaded only in streaming mode.
230
+ - **postgres:** `PostgresStore` gains `claimFailedOnly` (claims only `failed`/timed-out
231
+ `processing` rows, never `pending`) so the stream owns pending rows with no duplication.
232
+ `createPublicationSql(table, publication)` emits an idempotent insert-only publication.
233
+ - **core:** the record→message builder is extracted as `buildPublishable(record,
234
+ serializer)` and shared by `Relay` and the streaming relay (no behavior change).
235
+ - **At-least-once:** the slot's LSN is acknowledged only after a batch's side effects
236
+ commit; a crash re-streams and re-publishes (idempotent consumers absorb the duplicate).
237
+ - **Ordering:** streaming is best-effort per-aggregate (a retried failure may land after
238
+ later same-aggregate rows). Use the polling relay for the strict head-of-aggregate
239
+ guarantee. Requires `wal_level = logical`.
240
+
241
+ - b06f8ec: Strict per-aggregate ordering, crash recovery, and driver/packaging fixes.
242
+
243
+ - **postgres:** the claim query now enforces strict per-aggregate ordering by
244
+ only taking the _head_ of each aggregate (no earlier unfinished row for the
245
+ same `aggregateId`). At most one in-flight message per aggregate; failed
246
+ messages block their successors until resolved.
247
+ - **postgres:** added a `claimed_at` column and a visibility-timeout reaper
248
+ (`claimTimeoutMs`, default 60s) so rows orphaned by a crashed relay are
249
+ reclaimed instead of stuck in `processing` forever. Migration is upgrade-safe
250
+ (`ADD COLUMN IF NOT EXISTS`); the partial indexes were retuned for the new
251
+ ordered, reaper-aware claim.
252
+ - **core:** dead-lettered messages now carry the real `original-topic` header
253
+ (previously always empty); `ConsoleLogger` routes warn/error to the matching
254
+ `console` methods.
255
+ - **kafka:** the confluent driver now honors `acks` and `compression` (it
256
+ silently ignored them before), matching the kafkajs driver.
257
+ - **packaging:** the `@eventferry/postgres/migrations` subpath export now
258
+ advertises its types; `pnpm-workspace.yaml` dropped an invalid placeholder
259
+ block.
260
+
261
+ Note: `claimTimeoutMs` should exceed your worst-case publish latency. This is
262
+ an at-least-once system — pair it with idempotent producers/consumers.
263
+
264
+ - b06f8ec: Add a type-safe event registry: `defineOutbox`.
265
+
266
+ Declare each topic once (`{ aggregateType, schema }`) and get a typed, runtime-
267
+ validated `enqueue` plus a `decode` helper consumers can reuse from the same
268
+ registry. Payloads are validated before the row is inserted, so a malformed event
269
+ rolls back with the rest of your transaction instead of reaching the outbox.
270
+
271
+ - **Validator-agnostic:** any [Standard Schema](https://standardschema.dev) works
272
+ (Zod 3.24+, Valibot, ArkType, …). The spec interface is inlined, so `@eventferry/core`
273
+ gains no runtime dependency.
274
+ - **Producer + consumer:** `defineOutbox(registry, { store })` exposes typed
275
+ `enqueue`; `defineOutbox(registry)` (no store) exposes `decode`/`validate` for
276
+ consuming services.
277
+ - New `OutboxValidationError` carries the failing topic and the validator's issues.
278
+ - Purely additive — `PostgresStore`, `Relay`, and untyped `store.enqueue` are unchanged.
279
+
280
+ - b06f8ec: Add W3C trace propagation (OpenTelemetry-compatible), dependency-free.
281
+
282
+ - **core:** new `Tracing` interface (`inject(carrier)`), the shape of an OpenTelemetry
283
+ `TextMapPropagator` — the library depends on no tracing package.
284
+ - **postgres:** `PostgresStore({ tracing })` captures the active W3C
285
+ `traceparent`/`tracestate` into the row's headers at `enqueue`, so it rides along to
286
+ the published message (on every path — polling, notify, streaming — since headers
287
+ already pass through) and the consumer can continue the trace.
288
+ - The caller's `headers` object is never mutated. With no `tracing` configured,
289
+ behavior is unchanged. The existing `trace-id` header stays for simple correlation.
290
+ - OpenTelemetry/Datadog/custom integrate via a ~5-line adapter (documented, not bundled).
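For orientation, here is a dependency-free sketch of a `Tracing`-shaped object and of the non-mutating header copy described above. The carrier type (`Record<string, string>`) and the names `fixedTracing` and `withTraceHeaders` are assumptions for illustration; only the `inject(carrier)` shape and the `traceparent` header come from the changelog. The header layout (`version-traceid-parentid-flags`) follows the W3C Trace Context format.

```typescript
// Assumed shape, based on the changelog's `inject(carrier)` description.
interface Tracing {
  inject(carrier: Record<string, string>): void;
}

// Hypothetical adapter that injects a fixed trace context.
// traceId is 32 hex chars, spanId is 16 hex chars (W3C Trace Context).
function fixedTracing(traceId: string, spanId: string, sampled = true): Tracing {
  return {
    inject(carrier) {
      carrier["traceparent"] = `00-${traceId}-${spanId}-${sampled ? "01" : "00"}`;
    },
  };
}

// Copy the caller's headers before injecting, so the input is never mutated.
function withTraceHeaders(
  headers: Readonly<Record<string, string>>,
  tracing?: Tracing,
): Record<string, string> {
  const copy: Record<string, string> = { ...headers };
  tracing?.inject(copy);
  return copy;
}
```

An OpenTelemetry-backed adapter would presumably delegate `inject` to the configured propagator instead of formatting the header by hand.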
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@eventferry/core",
-   "version": "3.3.0",
+   "version": "3.3.1",
    "description": "DB- and broker-agnostic core for the transactional outbox pattern",
    "type": "module",
    "main": "./dist/index.cjs",
@@ -14,7 +14,8 @@
      }
    },
    "files": [
-     "dist"
+     "dist",
+     "CHANGELOG.md"
    ],
    "keywords": [
      "outbox",