@nextn/outbound-guard 0.1.1 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,479 +1,136 @@
1
-
2
- # outbound-guard
3
-
4
- A small, opinionated **Node.js HTTP client** that protects your service from slow or failing upstreams by enforcing:
5
-
6
- - concurrency limits (in-flight cap)
7
- - bounded queue (backpressure)
8
- - request timeouts
9
-
10
-
11
- This is a **library**, not a service.
12
- It is **process-local**, **in-memory**, and intentionally simple.
13
-
14
- ---
15
-
16
- ## Why this exists
17
-
18
- Most production outages don’t start inside your service.
19
- They start when you call **something you don’t control**:
20
-
21
- - partner APIs
22
- - payment gateways
23
- - internal services under load
24
- - flaky dependencies
25
-
26
- Without protection, outbound calls cause:
27
- - unbounded concurrency
28
- - growing queues
29
- - long tail latency
30
- - cascading failures
31
-
32
- `outbound-guard` puts **hard limits** around outbound HTTP calls so your Node.js process stays alive and predictable under stress.
33
-
34
- ---
35
-
36
- ## What this library does (in practice)
37
-
38
- outbound-guard is built around a single idea:
39
- collapse duplicate work first, then apply limits.
40
-
41
- Most failures don’t come from too many different requests —
42
- they come from too many identical requests hitting a slow dependency.
43
-
44
- This library solves that problem before it escalates.
45
-
46
- For every outbound HTTP request, it enforces:
47
-
48
- ### 1. GET micro-cache + request coalescing (core feature)
49
-
50
- This is the most important feature in `outbound-guard`.
51
-
52
- When enabled, identical GET requests are **collapsed into a single upstream call**.
53
-
54
- - One request becomes the **leader**
55
- - All others become **followers**
56
- - Only **one upstream request** is ever in flight
57
- - Followers either:
58
- - receive cached data immediately, or
59
- - fail fast when limits are reached
60
-
61
- This prevents the most common real-world failure mode:
62
- **thundering herds on slow but healthy upstreams**.
63
-
64
- This is not long-lived caching.
65
- It is **short-lived, in-process burst protection**.
66
-
67
-
68
- ### 2. Concurrency limits
69
- - At most `maxInFlight` requests execute at once.
70
- - Prevents connection exhaustion and event-loop overload.
71
-
72
- ### 3. Bounded queue (backpressure)
73
- - Excess requests wait in a FIFO queue (up to `maxQueue`).
74
- - If the queue is full, requests are **rejected immediately**.
75
- - If a request waits too long, it is **rejected with a timeout**.
76
-
77
- Failing early is a feature.
78
-
79
- ### 4. Request timeouts
80
- - Every request has a hard timeout via `AbortController`.
81
- - No hanging promises.
82
-
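- As a rough sketch, this implies the standard timer-plus-abort pattern (an illustration only, not the library's exact internals):
-
- ```ts
- // Hypothetical helper: abort the attempt once the deadline passes.
- async function fetchWithTimeout(url: string, requestTimeoutMs: number): Promise<Response> {
-   const ac = new AbortController();
-   const timer = setTimeout(() => ac.abort(), requestTimeoutMs);
-   try {
-     return await fetch(url, { signal: ac.signal });
-   } finally {
-     clearTimeout(timer); // no hanging timers either
-   }
- }
- ```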
83
-
84
- ### 5. Observability hooks
85
- - Emits lifecycle events (queueing, failures, breaker transitions)
86
- - Exposes a lightweight `snapshot()` for debugging
87
-
88
- No metrics backend required.
89
-
90
-
91
- ---
92
- ## The failure mode this library is designed to stop
93
-
94
- ### Thundering herd on slow GET requests
95
-
96
- This is how many production incidents start:
97
-
98
- - Traffic spikes
99
- - Many requests trigger the **same GET**
100
- - The upstream slows down (not down — just slow)
101
- - Node starts N identical outbound requests
102
- - Queues grow, retries multiply, latency explodes
103
- - Eventually the process collapses
104
-
105
- Timeouts and retries don’t fix this.
106
- They **amplify it**.
107
-
108
- ### What outbound-guard does differently
109
-
110
- With `microCache` enabled:
111
-
112
- - Only **one upstream GET** is ever in flight per key
113
- - All concurrent identical requests share it
114
- - While refreshing:
115
- - previous data is served (within bounds)
116
- - or failures surface quickly
117
- - Retries happen **once**, not per caller
118
-
119
- This keeps:
120
- - upstream traffic flat
121
- - latency predictable
122
- - failures visible instead of hidden
123
-
124
- This is **request coalescing**, not caching.
125
-
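- A minimal sketch of that behavior (assuming a `client` configured as in Basic usage below; the URL is illustrative):
-
- ```ts
- // 100 concurrent identical GETs: one leader hits the upstream,
- // the other 99 become followers and share its result.
- const results = await Promise.all(
-   Array.from({ length: 100 }, () =>
-     client.request({ method: "GET", url: "https://slow.example.com/config" })
-   )
- );
- console.log(results.length, "responses from one upstream call");
- ```
-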
126
- ## Why the micro-cache is intentionally different
127
-
128
- Most HTTP clients do one of two things:
129
-
130
- 1. Cache aggressively and risk serving bad data
131
- 2. Don’t cache at all and collapse under burst load
132
-
133
- `outbound-guard` does neither.
134
-
135
- Its micro-cache is:
136
- - GET-only
137
- - short-lived
138
- - bounded by time and memory
139
- - aware of in-flight requests
140
-
141
- The goal is **operational stability**, not freshness guarantees.
142
-
143
- If the upstream is:
144
- - slow → callers don’t pile up
145
- - failing → failures surface quickly
146
- - recovered → traffic resumes cleanly
147
-
148
- This design keeps failure behavior honest.
149
-
150
-
160
- ## How to tune micro-cache safely
161
-
162
- Most users only need to understand **two knobs**.
163
-
164
- ---
165
-
166
- ### `maxWaiters`
167
-
168
- Controls how many concurrent callers are allowed to wait for the leader.
169
-
170
- ```ts
171
- maxWaiters: 10
172
- ```
173
-
174
- * Low value → aggressive load shedding
175
- * High value → tolerate more fan-in
176
-
177
- If this fills quickly, it means:
178
-
179
- > “This upstream is too slow for current traffic.”
180
-
181
- That’s a signal, not a bug.
182
-
183
- ---
184
-
185
- ### `followerTimeoutMs`
186
-
187
- Controls how long followers are willing to wait for the leader's result.
188
-
189
- ```ts
190
- followerTimeoutMs: 5000
191
- ```
192
-
193
- * Followers wait at most once
194
- * No per-request retries
195
- * No silent backlog growth
196
-
197
- If this expires:
198
-
199
- * followers fail fast
200
- * queues drain
201
- * the system stays responsive
202
-
203
- This prevents **“slow death by waiting”**.
204
-
205
- ---
206
-
207
- ## Retry settings (leader-only)
208
-
209
- Retries apply **only to the leader**.
210
-
211
- ```ts
212
- retry: {
213
-   maxAttempts: 3,
214
-   baseDelayMs: 50,
215
-   maxDelayMs: 200,
216
-   retryOnStatus: [503],
217
- }
218
- ```
219
-
220
- ### What this means
221
-
222
- * **`maxAttempts`**
223
- total leader tries (including first)
224
-
225
- * **`baseDelayMs`**
226
- initial backoff
227
-
228
- * **`maxDelayMs`**
229
- cap on exponential backoff
230
-
231
- * **`retryOnStatus`**
232
- retry only when the upstream explicitly signals trouble
233
-
234
- Followers never retry.
235
- Retries never multiply under load.
236
-
239
-
240
-
241
- ### How outbound-guard helps
242
-
243
- When `microCache` is enabled:
244
-
245
- - **Only one real GET request** is sent upstream.
246
- - All concurrent identical GETs **share the same in-flight request**.
247
- - If the upstream takes 5 seconds, deduplication lasts for the full 5 seconds.
248
- - After success, the response is cached briefly (default: 1 second).
249
- - Requests during that window are served immediately — no new upstream calls.
250
-
251
- This dramatically reduces:
252
- - outbound request count
253
- - upstream pressure
254
- - cost
255
- - tail latency
256
- - failure amplification
257
-
258
- This is **request coalescing**, not long-lived caching.
259
-
260
- It is intentionally:
261
- - GET-only
262
- - short-lived
263
- - in-memory
264
- - process-local
265
-
266
- The goal is load shedding and cost reduction by collapsing duplicate work under concurrent load — not durability guarantees.
267
-
268
-
269
- ---
270
-
271
- ## What this library does NOT do (by design)
272
-
273
- - ❌ No persistence
274
- - ❌ No Redis / Kafka
275
- - ❌ No retries by default
276
- - ❌ No distributed coordination
277
- - ❌ No service discovery
278
-
279
- This library provides **resilience**, not **durability**.
280
-
281
- If you need guaranteed delivery, pair it with:
282
- - a database outbox
283
- - a job queue
284
- - a message broker
285
-
286
- ---
287
-
288
- ## Installation
289
-
290
- ```bash
291
- npm install @nextn/outbound-guard
292
- ```
293
-
294
- (Node.js ≥ 20)
295
-
296
- ---
297
-
298
- ## Basic usage
299
-
300
- ```ts
301
- import { ResilientHttpClient } from "@nextn/outbound-guard";
302
-
303
- const client = new ResilientHttpClient({
304
-   maxInFlight: 20,
305
-   maxQueue: 100,
306
-   enqueueTimeoutMs: 200,
307
-   requestTimeoutMs: 5000,
308
-
309
-   microCache: {
310
-     enabled: true,
311
-
312
-     // short-lived cache window
313
-     ttlMs: 1000,
314
-     maxStaleMs: 800,
315
-
316
-     // protect against fan-in explosions
317
-     maxWaiters: 10,
318
-     followerTimeoutMs: 5000,
319
-
320
-     // leader-only retries
321
-     retry: {
322
-       maxAttempts: 3,
323
-       baseDelayMs: 50,
324
-       maxDelayMs: 200,
325
-       retryOnStatus: [503],
326
-     },
327
-   },
328
- });
329
-
330
-
331
- // Use it instead of fetch/axios directly
332
- const res = await client.request({
333
-   method: "GET",
334
-   url: "https://third-party.example.com/config",
335
- });
336
-
337
- console.log(res.status, res.body);
338
- ```
339
-
340
- That’s it.
341
- Everything else happens automatically.
342
-
343
- ---
344
-
345
- ## Error handling
346
-
347
- Errors are **explicit and typed**:
348
-
349
- * `QueueFullError`
350
- * `QueueTimeoutError`
351
- * `RequestTimeoutError`
- * `CircuitOpenError`
352
-
353
- Example:
354
-
355
- ```ts
356
- try {
357
-   await client.request({ method: "GET", url });
358
- } catch (err) {
359
-   if (err instanceof CircuitOpenError) {
360
-     // upstream is unhealthy → fail fast
361
-   }
362
- }
363
- ```
364
-
365
- ---
366
-
367
- ## Observability
368
-
369
- ### Events
370
-
371
- ```ts
372
- client.on("request:failure", (e) => {
374
-   console.error("request failed", e.error);
375
- });
376
- ```
377
-
378
- ### Snapshot
379
-
380
- ```ts
381
- const snap = client.snapshot();
382
-
383
- console.log(snap.inFlight);
384
- console.log(snap.queueDepth);
385
- console.log(snap.breakers);
386
- ```
387
-
388
- Useful for logs, debugging, or ad-hoc metrics.
389
-
390
- ---
391
-
392
- ## Demo (local, no deployment)
393
-
394
- This repo includes a demo that visibly shows request coalescing, backpressure, and recovery under load.
395
-
396
- ### Terminal A — flaky upstream
397
-
398
- ```bash
399
- npm run demo:upstream
400
- ```
401
-
402
- ### Terminal B — load generator
403
-
404
- ```bash
405
- npm run demo:loadgen
406
- ```
407
-
408
- You will see patterns like:
409
-
- ```
410
- === burst: cold-start ===
411
- ok-1 ok-1 ok-1 ...
412
-
413
- === burst: cached ===
414
- ok-1 ok-1 ok-1 ...
415
-
416
- === burst: refresh-with-stale ===
417
- ok-1 ok-1 ok-2
418
-
419
- === burst: failure ===
420
- ok-2 ok-2 ok-2
421
-
422
- === burst: recovered ===
423
- ok-3 ok-3 ok-4
424
- ```
425
-
426
- This shows:
427
-
428
- - only one upstream hit per burst
429
- - cached responses during spikes
430
- - safe reuse during refresh
431
- - fast recovery without restart
435
-
436
- ---
437
-
438
- ## When should you use this?
439
-
440
- Good fit if you:
441
-
442
- * call external APIs from Node.js
443
- * run BFFs or API gateways
444
- * send webhooks
445
- * run background workers
446
- * want predictable failure under load
447
-
448
- ---
449
-
450
- ## When should you NOT use this?
451
-
452
- Not a good fit if you need:
453
-
454
- * durable delivery across restarts
455
- * distributed rate limiting
456
- * cross-process coordination
457
- * heavy retry orchestration
458
-
459
- This library is **not** a service mesh.
460
-
461
- ---
462
-
463
- ## Design philosophy
464
-
465
- * Explicit > clever
466
- * Fail fast > degrade silently
467
- * Small surface area > feature creep
468
- * In-process resilience first
469
-
470
- See `docs/DESIGN.md` for details.
471
-
472
- ---
473
-
474
- ## License
475
-
476
- MIT
477
-
478
-
1
+ # outbound-guard
2
+
3
+ Process-local Node.js HTTP client that collapses duplicate GETs and wraps every outbound call with per-host limits, bounded queueing, timeouts, and a small health gate. Its goal is simple: stop thundering herds and keep your service predictable when upstreams are slow or flaky.
4
+
5
+ ## Highlights
6
+ - Request coalescing + short-lived GET micro-cache (leader/followers, stale-while-refresh)
7
+ - Per-base-URL limiter with bounded FIFO queue (maxQueue = maxInFlight * 10)
8
+ - Lightweight per-host health gate (OPEN → CLOSED → HALF_OPEN probe; gate-style naming, so OPEN means healthy) that rejects queued waiters when it closes
9
+ - Hard per-attempt timeouts and leader-only retries (no retry amplification)
10
+ - One runtime dependency (`undici`); entirely in-memory and process-local
11
+
12
+ ## Install
13
+
14
+ ```bash
15
+ npm install @nextn/outbound-guard
16
+ # Node 18+ (tested on 20)
17
+ ```
18
+
19
+ ## Quick start
20
+
21
+ ```ts
22
+ import { ResilientHttpClient } from "@nextn/outbound-guard";
23
+
24
+ const client = new ResilientHttpClient({
25
+   // applied per base URL (protocol + host + port)
26
+   maxInFlight: 20,
27
+   requestTimeoutMs: 5_000,
28
+
29
+   microCache: {
30
+     enabled: true,
31
+     ttlMs: 1_000, // fresh window
32
+     maxStaleMs: 10_000, // serve stale while refreshing
33
+     maxEntries: 500,
34
+     maxWaiters: 1_000, // concurrent followers per key
35
+     followerTimeoutMs: 5_000,
36
+     retry: {
37
+       maxAttempts: 3,
38
+       baseDelayMs: 50,
39
+       maxDelayMs: 200,
40
+       retryOnStatus: [429, 502, 503, 504], // leader-only
41
+     },
42
+   },
43
+ });
44
+
45
+ const res = await client.request({
46
+   method: "GET",
47
+   url: "https://third-party.example.com/config",
48
+ });
49
+
50
+ console.log(res.status, Buffer.from(res.body).toString("utf8"));
51
+ ```
52
+
53
+ `res.body` is a `Uint8Array`; convert with `Buffer.from(res.body)` or `TextDecoder` as needed.
54
+
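+ For example, a minimal decoding sketch (assumes the upstream returns UTF-8 JSON; `client` is the instance from Quick start):
+
+ ```ts
+ const cfg = await client.request({ method: "GET", url: "https://third-party.example.com/config" });
+ const parsed = JSON.parse(new TextDecoder().decode(cfg.body));
+ ```
+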
55
+ ## How it works
56
+
57
+ ### Micro-cache and request coalescing (GET only)
58
+ - Keyed by `GET ${normalizedUrl}` by default (hostname lowercased, default ports stripped); override via `microCache.keyFn` (see the sketch after this list).
59
+ - One caller becomes **leader**; identical concurrent GETs become **followers** and wait for the leader result.
60
+ - Fresh window: successful 2xx responses are cached for `ttlMs`.
61
+ - Stale-while-refresh: after `ttlMs`, followers can be served from the previous value while a new leader refreshes, up to `maxStaleMs`.
62
+ - Follower guardrails: reject immediately when `maxWaiters` is exceeded or when waiting longer than `followerTimeoutMs`.
63
+ - Failure handling: if a refresh fails but stale data is within `maxStaleMs`, stale is served; otherwise the error is surfaced.
64
+ - Leader-only retry: optional exponential backoff for retryable statuses so retries do not multiply under fan-in.
65
+
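+ A minimal sketch of the key override (the exact `keyFn` signature is an assumption; check the package typings):
+
+ ```ts
+ import { ResilientHttpClient } from "@nextn/outbound-guard";
+
+ const coalescing = new ResilientHttpClient({
+   maxInFlight: 20,
+   requestTimeoutMs: 5_000,
+   microCache: {
+     enabled: true,
+     ttlMs: 1_000,
+     maxStaleMs: 10_000,
+     // Hypothetical keyFn: drop a cache-busting query param so bursts still coalesce.
+     keyFn: (req: { method: string; url: string }) => {
+       const u = new URL(req.url);
+       u.searchParams.delete("_t");
+       return `${req.method} ${u}`;
+     },
+   },
+ });
+ ```
+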
66
+ ### Concurrency and queueing
67
+ - Limits are per base URL (`protocol://host:port`).
68
+ - At most `maxInFlight` requests run at once; overflow enters a bounded FIFO queue sized at `maxInFlight * 10` (internal for now).
69
+ - If the queue is full, a `QueueFullError` is thrown immediately. Queued waiters are rejected if the upstream is marked unhealthy.
70
+
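+ A small sketch of the shedding behavior (`QueueFullError` is imported from the package root; POST is used because it is never coalesced):
+
+ ```ts
+ import { QueueFullError } from "@nextn/outbound-guard";
+
+ // Fire far more requests than maxInFlight plus queue capacity.
+ const settled = await Promise.allSettled(
+   Array.from({ length: 500 }, () =>
+     client.request({ method: "POST", url: "https://third-party.example.com/jobs", body: "{}" })
+   )
+ );
+ const shed = settled.filter(
+   (r) => r.status === "rejected" && r.reason instanceof QueueFullError
+ ).length;
+ console.log(`${shed} requests shed at the queue boundary`);
+ ```
+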
71
+ ### Health gate (tiny circuit breaker)
72
+ - Tracks outcomes per base URL. Hard failures (request timeouts or unknown errors) and soft failures (429, 502, 503, 504) feed the window.
73
+ - Closes immediately after 3 consecutive hard failures, or when (with ≥10 samples) hard-fail rate ≥30% or total fail rate ≥50%.
74
+ - CLOSED: new requests fail fast with `UpstreamUnhealthyError` (micro-cache can still serve stale if available).
75
+ - Cooldown uses exponential backoff: starts at ~1s with jitter, doubles up to 30s. When cooldown elapses, the circuit moves to HALF_OPEN.
76
+ - HALF_OPEN: exactly one probe is allowed; other calls get `HalfOpenRejectedError`. A successful probe reopens; a failing probe recloses.
77
+ - Per-host isolation: a bad upstream does not poison other hosts.
78
+
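+ A small sketch of watching gate transitions (event names from the Events section below; payload shapes are treated as assumptions, so none are read here):
+
+ ```ts
+ client.on("health:closed", () => console.warn("gate closed: failing fast"));
+ client.on("health:half_open", () => console.info("gate half-open: probing"));
+ client.on("health:open", () => console.info("gate open: traffic flowing"));
+ ```
+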
79
+ ### Timeouts and retries
80
+ - `requestTimeoutMs` is enforced per attempt with `AbortController`; hanging upstreams become `RequestTimeoutError`.
81
+ - Retries are opt-in and apply only to GET leaders via `microCache.retry`. Followers never retry, so retries cannot explode under load.
82
+
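+ A back-of-the-envelope sketch of worst-case leader latency under the Quick start settings (assumes doubling backoff capped at `maxDelayMs`; any jitter is internal):
+
+ ```ts
+ // 3 attempts at up to 5s each, plus waits of min(50 * 2**i, 200) ms between attempts.
+ const attempts = 3;
+ const perAttemptMs = 5_000;
+ const waits = [50, 100]; // i = 0, 1
+ const worstCaseMs = attempts * perAttemptMs + waits.reduce((a, b) => a + b, 0);
+ console.log(worstCaseMs); // 15150 ms worst case; budget followerTimeoutMs accordingly
+ ```
+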
83
+ ## API surface
84
+
85
+ ```ts
86
+ await client.request({
87
+ method: "GET" | "POST" | "PUT" | "PATCH" | "DELETE" | "HEAD" | "OPTIONS",
88
+ url: "https://example.com/resource",
89
+ headers?: Record<string, string>,
90
+ body?: string | Uint8Array | Buffer,
91
+ });
92
+
93
+ client.snapshot(); // { inFlight, queueDepth }
94
+ client.on(eventName, handler); // see below for event names
95
+ ```
96
+
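+ A quick saturation-logging sketch using the snapshot fields shown above:
+
+ ```ts
+ setInterval(() => {
+   const { inFlight, queueDepth } = client.snapshot();
+   if (queueDepth > 0) console.warn(`backpressure: inFlight=${inFlight} queueDepth=${queueDepth}`);
+ }, 1_000).unref(); // unref so the sampler never keeps the process alive
+ ```
+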
97
+ ### Errors
98
+ All exported errors extend `ResilientHttpError`:
99
+ - `QueueFullError`: queue capacity hit for that base URL.
100
+ - `RequestTimeoutError`: per-attempt timeout exceeded.
101
+ - `UpstreamUnhealthyError`: circuit is CLOSED for the base URL.
102
+ - `HalfOpenRejectedError`: circuit is HALF_OPEN and the call was not the probe.
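+
+ A minimal handling sketch (error classes imported from the package root):
+
+ ```ts
+ import {
+   QueueFullError,
+   RequestTimeoutError,
+   UpstreamUnhealthyError,
+   HalfOpenRejectedError,
+ } from "@nextn/outbound-guard";
+
+ try {
+   await client.request({ method: "GET", url: "https://third-party.example.com/config" });
+ } catch (err) {
+   if (err instanceof QueueFullError) {
+     // shed load: surface a 503 to your own caller
+   } else if (err instanceof RequestTimeoutError) {
+     // the attempt exceeded requestTimeoutMs
+   } else if (err instanceof UpstreamUnhealthyError || err instanceof HalfOpenRejectedError) {
+     // gate is closed or probing: fail fast or use a fallback
+   } else {
+     throw err;
+   }
+ }
+ ```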
103
+
104
+ ### Events
105
+ The client is an `EventEmitter`. Useful hooks:
106
+ - `request:start | request:success | request:failure | request:rejected`
107
+ - `health:closed | health:half_open | health:open`
108
+ - `microcache:retry | microcache:refresh_failed`
109
+
110
+ Event payloads include the request, requestId, status/duration when available, and error objects on failures.
111
+
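+ A small wiring sketch (the payload fields follow the description above and are assumptions, hence the loose typing):
+
+ ```ts
+ client.on("request:success", (e: any) => {
+   console.log("outbound ok", e.requestId, e.status);
+ });
+ client.on("request:failure", (e: any) => {
+   console.error("outbound failed", e.requestId, e.error);
+ });
+ ```
+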
112
+ ## Demo (local)
113
+
114
+ Visualize coalescing and backpressure without deploying anything:
115
+
116
+ ```bash
117
+ npm run demo:upstream # terminal A: flaky upstream
118
+ npm run demo:loadgen # terminal B: bursts against the client
119
+ ```
120
+
121
+ Watch how bursts collapse to a single upstream hit, stale responses are served during refresh, and failures recover cleanly.
122
+
123
+ ## When to use
124
+ - Calling external APIs or partner services from Node.js
125
+ - BFFs/API gateways that must isolate upstream slowness
126
+ - Webhook senders or background workers that need predictable failure behavior
127
+
128
+ ## When not to use
129
+ - If you need durable delivery across restarts (use queues/outbox)
130
+ - If you need cross-process coordination or distributed rate limiting
131
+ - If you need a service mesh or long-lived caching
132
+
133
+ ## Design stance
134
+ - Favor explicit limits over hidden buffers
135
+ - Fail fast instead of building invisible backlogs
136
+ - Keep the surface small; stay in-process and dependency-light