service-bridge 0.1.4

Files changed (51)
  1. package/README.md +989 -0
  2. package/biome.json +28 -0
  3. package/bun.lock +249 -0
  4. package/dist/express.d.ts +51 -0
  5. package/dist/express.js +129 -0
  6. package/dist/fastify.d.ts +43 -0
  7. package/dist/fastify.js +122 -0
  8. package/dist/index.js +34507 -0
  9. package/dist/trace.d.ts +19 -0
  10. package/http/dist/express.d.ts +51 -0
  11. package/http/dist/express.d.ts.map +1 -0
  12. package/http/dist/express.test.d.ts +2 -0
  13. package/http/dist/express.test.d.ts.map +1 -0
  14. package/http/dist/fastify.d.ts +43 -0
  15. package/http/dist/fastify.d.ts.map +1 -0
  16. package/http/dist/fastify.test.d.ts +2 -0
  17. package/http/dist/fastify.test.d.ts.map +1 -0
  18. package/http/dist/index.d.ts +7 -0
  19. package/http/dist/index.d.ts.map +1 -0
  20. package/http/dist/trace.d.ts +19 -0
  21. package/http/dist/trace.d.ts.map +1 -0
  22. package/http/dist/trace.test.d.ts +2 -0
  23. package/http/dist/trace.test.d.ts.map +1 -0
  24. package/http/package.json +49 -0
  25. package/http/src/express.test.ts +125 -0
  26. package/http/src/express.ts +209 -0
  27. package/http/src/fastify.test.ts +142 -0
  28. package/http/src/fastify.ts +159 -0
  29. package/http/src/index.ts +10 -0
  30. package/http/src/sdk-augment.d.ts +11 -0
  31. package/http/src/servicebridge.d.ts +23 -0
  32. package/http/src/trace.test.ts +97 -0
  33. package/http/src/trace.ts +56 -0
  34. package/http/tsconfig.json +17 -0
  35. package/http/tsconfig.test.json +6 -0
  36. package/package.json +113 -0
  37. package/sdk/dist/generated/servicebridge-package-definition.d.ts +4912 -0
  38. package/sdk/dist/grpc-client.d.ts +344 -0
  39. package/sdk/dist/grpc-client.test.d.ts +1 -0
  40. package/sdk/dist/index.d.ts +2 -0
  41. package/sdk/package.json +31 -0
  42. package/sdk/scripts/generate-proto.ts +65 -0
  43. package/sdk/src/generated/servicebridge-package-definition.ts +5423 -0
  44. package/sdk/src/grpc-client.d.ts +332 -0
  45. package/sdk/src/grpc-client.d.ts.map +1 -0
  46. package/sdk/src/grpc-client.test.ts +422 -0
  47. package/sdk/src/grpc-client.ts +3088 -0
  48. package/sdk/src/index.d.ts +3 -0
  49. package/sdk/src/index.d.ts.map +1 -0
  50. package/sdk/src/index.ts +31 -0
  51. package/sdk/tsconfig.json +13 -0
package/README.md ADDED

<!-- keywords: service-bridge servicebridge npm install service-bridge Node.js TypeScript JavaScript microservices RPC gRPC event-bus event-driven distributed-tracing workflow orchestration background-jobs cron mTLS service-mesh service-discovery distributed-systems zero-sidecar Istio-alternative RabbitMQ-alternative Temporal-alternative Jaeger-alternative PostgreSQL Docker Kubernetes DLQ dead-letter-queue saga distributed-transactions AI-agent-orchestration Express Fastify HTTP-middleware observability Prometheus tracing service-catalog async-messaging durable-events retries idempotency auto-mTLS runtime-dashboard production-ready bun deno -->

# service-bridge

[![npm version](https://img.shields.io/npm/v/service-bridge?color=cb3837&logo=npm)](https://www.npmjs.com/package/service-bridge)
[![License](https://img.shields.io/badge/License-Free%20%2F%20Commercial-blue)](../LICENSE)
[![TypeScript](https://img.shields.io/badge/TypeScript-5%2B-3178c6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
[![Node](https://img.shields.io/badge/Node.js-18%2B-339933?logo=node.js&logoColor=white)](https://nodejs.org/)

**The Unified Bridge for Microservices Interaction**

Node.js SDK for [ServiceBridge](https://servicebridge.dev): production-ready RPC, durable events, workflows, jobs, and distributed tracing in a single SDK, running on one Go runtime backed by PostgreSQL.

```
┌─────────────────────────────────────────────────────────────────┐
│ BEFORE: 10 moving parts                                         │
│ Istio · Envoy · RabbitMQ · Temporal · Jaeger · Consul ·         │
│ cert-manager · Alertmanager · cron · custom glue                │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ AFTER: ServiceBridge + PostgreSQL                               │
│ RPC · Events · Workflows · Jobs · Tracing · mTLS · Dashboard    │
│ One SDK · One runtime · Zero sidecars                           │
└─────────────────────────────────────────────────────────────────┘
```

## Table of Contents

- [Why ServiceBridge](#why-servicebridge)
- [Use Cases](#use-cases)
- [Quick Start](#quick-start)
  - [Install](#1-install)
- [Runtime Setup](#runtime-setup)
- [End-to-End Example](#end-to-end-example)
- [Platform Features](#platform-features)
- [How It Compares](#how-it-compares)
- [API Reference](#api-reference)
- [HTTP Plugins](#http-plugins)
- [Configuration](#configuration)
- [Environment Variables](#environment-variables)
- [Error Handling](#error-handling)
- [When to Use / When Not to Use](#when-to-use--when-not-to-use)
- [FAQ](#faq)
- [Community and Support](#community-and-support)
- [License](#license)

---

## Why ServiceBridge

| Problem | Without ServiceBridge | With ServiceBridge |
|---|---|---|
| Service-to-service calls | Istio/Envoy sidecar proxy per pod | **Direct SDK-to-worker gRPC, zero proxy hops** |
| Async messaging | Kafka/RabbitMQ + retry logic + DLQ setup | **Built-in durable events with retry, DLQ, replay** |
| Background jobs | Bull/BullMQ + Redis + cron daemon | **Built-in cron and delayed jobs** |
| Workflow orchestration | Temporal/Conductor cluster + persistence | **Built-in DAG workflows** |
| Distributed tracing | Jaeger/Tempo + OTEL collector + dashboards | **Built-in traces + realtime UI** |
| Service discovery | Consul/etcd + DNS glue | **Built-in registry + health-aware balancing** |
| mTLS | cert-manager + Vault PKI | **Auto-provisioned certs from service key** |

**Result**: `10 tools → 1 runtime`. One Go binary plus PostgreSQL replaces the whole stack listed above.

---

## Use Cases

**Microservice communication** — Replace the sidecar mesh with direct RPC calls: sub-millisecond SDK overhead instead of the latency of two extra proxy hops per call.

**Event-driven architecture** — Publish durable events with fan-out, retries, DLQ, idempotency, and server-side filtering. No broker infrastructure to manage.

**Background job scheduling** — Cron jobs, delayed execution, and job-triggered workflows in a single API. No Redis, no separate queue workers.

**Saga / distributed transactions** — DAG workflows with typed steps (`rpc`, `event`, `event_wait`, `sleep`, child workflow). Model compensations and rollbacks as steps wired together through `deps`.

**AI agent orchestration** — Stream LLM tokens via realtime run streams with replay. Orchestrate multi-step AI pipelines as workflows.

**Full-stack observability** — Every RPC call, event delivery, workflow step, and HTTP request is traced automatically. One timeline, one dashboard. Prometheus metrics and a Loki-compatible log API are included.

---

## Quick Start

### 1. Install

```bash
npm i service-bridge
# or
bun add service-bridge
```

### 2. Create a worker (service that handles calls)

```ts
import { servicebridge } from "service-bridge";

const sb = servicebridge(
  process.env.SERVICEBRIDGE_URL ?? "127.0.0.1:14445",
  process.env.SERVICEBRIDGE_SERVICE_KEY!,
  "payments",
);

sb.handleRpc("charge", async (payload: { orderId: string; amount: number }) => {
  return { ok: true, txId: `tx_${Date.now()}`, orderId: payload.orderId };
});

await sb.serve({ host: "127.0.0.1" });
```

### 3. Call it from another service

```ts
import { servicebridge } from "service-bridge";

const sb = servicebridge(
  process.env.SERVICEBRIDGE_URL ?? "127.0.0.1:14445",
  process.env.SERVICEBRIDGE_SERVICE_KEY!,
  "orders",
);

// "payments/charge" = the "charge" handler registered by the "payments" service
const result = await sb.rpc<{ ok: boolean; txId: string }>("payments/charge", {
  orderId: "ord_42",
  amount: 4990,
});

console.log(result.txId); // e.g. tx_1711234567890
```

That's it: no broker, no sidecar, no proxy, just a direct gRPC call between services.

---

## Runtime Setup

The SDK connects to a ServiceBridge runtime. The fastest way to start one:

```bash
bash <(curl -fsSL https://servicebridge.dev/install.sh)
```

This installs ServiceBridge + PostgreSQL via Docker Compose and generates an admin password automatically. After install, the dashboard is at `http://localhost:14444` and the gRPC control plane at `127.0.0.1:14445`.

For manual Docker Compose setup, configuration reference, and all runtime environment variables, see the **[Runtime Setup](../README.md#runtime-setup)** section in the main SDK README.

---

## End-to-End Example

A complete order flow across three services: RPC call → event publish → event handler with streaming, followed by the same steps orchestrated as a workflow.

```ts
import { servicebridge } from "service-bridge";

// --- Payments service (worker) ---

const payments = servicebridge("127.0.0.1:14445", process.env.SERVICEBRIDGE_SERVICE_KEY!, "payments");

payments.handleRpc("charge", async (payload: { orderId: string; amount: number }, ctx) => {
  await ctx?.stream.write({ status: "charging", orderId: payload.orderId }, "progress");

  // ... charge logic ...

  await ctx?.stream.write({ status: "charged" }, "progress");
  return { ok: true, txId: `tx_${Date.now()}` };
});

await payments.serve({ host: "127.0.0.1" });
```

```ts
import { servicebridge } from "service-bridge";

// --- Orders service (caller + event publisher) ---

const orders = servicebridge("127.0.0.1:14445", process.env.SERVICEBRIDGE_SERVICE_KEY!, "orders");

// Call payments, then publish an event
const charge = await orders.rpc<{ ok: boolean; txId: string }>("payments/charge", {
  orderId: "ord_42",
  amount: 4990,
});

await orders.event("orders.completed", {
  orderId: "ord_42",
  txId: charge.txId,
}, {
  idempotencyKey: "order:ord_42:completed",
  headers: { source: "checkout" },
});
```

```ts
import { servicebridge } from "service-bridge";

// --- Notifications service (event consumer) ---

const notifications = servicebridge("127.0.0.1:14445", process.env.SERVICEBRIDGE_SERVICE_KEY!, "notifications");

notifications.handleEvent("orders.*", async (payload, ctx) => {
  const body = payload as { orderId: string; txId: string };
  await ctx.stream.write({ status: "sending_email", orderId: body.orderId }, "progress");
  // ... send email ...
});

await notifications.serve({ host: "127.0.0.1" });
```

```ts
// --- Orchestrate as a workflow (continues in the orders service) ---
// inventory/reserve and shipping.delivered belong to services not shown here.

// Register (or update) the workflow definition...
await orders.workflow("order.fulfillment", [
  { id: "reserve", type: "rpc", ref: "inventory/reserve" },
  { id: "charge", type: "rpc", ref: "payments/charge", deps: ["reserve"] },
  { id: "wait_dlv", type: "event_wait", ref: "shipping.delivered", deps: ["charge"] },
  { id: "notify", type: "event", ref: "orders.fulfilled", deps: ["wait_dlv"] },
]);

// ...then start a run on demand
const { runId } = await orders.runWorkflow("order.fulfillment", { orderId: "ord_42" });
```

Every step above (RPC call, event publish, event delivery, workflow run) is traced and appears in the trace timeline of the built-in dashboard.

---

## Platform Features

### Communication
- **Direct RPC** — zero-hop gRPC calls with retries, deadlines, and mTLS identity
- **Durable Events** — fan-out delivery, at-least-once guarantees, retries, DLQ, replay, idempotency
- **Realtime Streams** — live chunks with replay for AI/progress/log streaming
- **Service Discovery** — automatic endpoint resolution and round-robin balancing
- **HTTP Middleware** — Express and Fastify instrumentation with automatic trace propagation

### Orchestration
- **Workflows** — DAG steps: `rpc`, `event`, `event_wait`, `sleep`, child workflow
- **Jobs** — cron, delayed, and workflow-triggered scheduling

### Security
- **Auto mTLS** — automatic certificate provisioning for workers
- **Access Policy** — service-level caller/target restrictions and RBAC

### Observability
- **Unified Tracing** — single trace timeline across HTTP, RPC, events, workflows, and jobs
- **Metrics** — Prometheus-compatible `/metrics` endpoint (30+ metric families)
- **Logs** — structured log ingest with Loki-compatible query API
- **Alerts** — runtime alerts for delivery failures, errors, and service health
- **Dashboard** — realtime web UI for runs, events, workflows, jobs, DLQ, service map, and service keys

---

## How It Compares

| Concern | Istio + Envoy | Dapr | Temporal + Kafka | ServiceBridge |
|---|---|---|---|---|
| RPC data path | Sidecar proxy hop | Sidecar/daemon hop | N/A | **Direct (proxyless)** |
| Service discovery | K8s control plane | Sidecar placement | External registry | **Built-in registry** |
| Durable events + DLQ | External broker | Pub/Sub component | Kafka + consumers | **Built-in** |
| Workflow orchestration | External engine | External engine | Built-in | **Built-in** |
| Job scheduling | External cron/queue | External scheduler | External scheduler | **Built-in** |
| Traces + UI | Jaeger/Tempo + dashboards | OTEL backend + dashboards | Temporal UI | **Built-in** |
| Logs for Grafana | Loki + Promtail pipeline | Log pipeline | Log pipeline | **Built-in Loki API** |
| Metrics | App/exporter setup | App/exporter setup | Multiple exporters | **Built-in `/metrics`** |
| Security model | Mesh PKI + policy | Deployment-dependent mTLS | Mixed | **Service keys + auto mTLS** |
| Operational footprint | Multi-component mesh | Runtime + sidecars | Workflow + broker + DB | **One binary + PostgreSQL** |

---

## API Reference

### Cross-SDK parity notes

ServiceBridge keeps the core API shape consistent across Node.js, Go, and Python:
constructor, RPC, events, jobs, workflows, `runWorkflow`, streams, serve/stop, and `ServiceBridgeError`.

Constructor-level defaults for `timeout`, `retries`, and `retryDelay` are available
across all three SDKs. The following are intentionally language-specific today:

- Node-only constructor options: `workerTransport`, `workerTLS`
- Node-only handler hints: `handleRpc.timeout`, `handleRpc.retryable`, `handleRpc.concurrency`, `handleEvent.concurrency`, `handleEvent.prefetch`
- Node-only `serve()` fields: `instanceId`, `weight`, `transport`, `tls`

### `servicebridge(url, serviceKey, service?, globalOpts?)`

```ts
function servicebridge(
  url: string,
  serviceKey: string,
  service?: string,
  globalOpts?: ServiceBridgeOpts,
): ServiceBridgeService
```

Creates an SDK client instance. `service` names this service; other services address its RPC handlers as `<service>/<fn>` (for example `payments/charge`).

`ServiceBridgeOpts`:

| Option | Type | Default | Description |
|---|---|---|---|
| `timeout` | `number` | `30000` | Default timeout (ms) for SDK operations. |
| `retries` | `number` | `3` | Default retry count for `rpc()`. |
| `retryDelay` | `number` | `300` | Base backoff delay (ms) for `rpc()`. |
| `discoveryRefreshMs` | `number` | `10000` | Discovery refresh period (ms) for endpoint updates. |
| `queueMaxSize` | `number` | `1000` | Max offline queue size for control-plane writes. |
| `queueOverflow` | `"drop-oldest" \| "drop-newest" \| "error"` | `"drop-oldest"` | Overflow strategy for the offline queue. |
| `heartbeatIntervalMs` | `number` | `10000` | Base heartbeat period (ms) for worker registrations. |
| `workerTransport` | `"tls"` | `"tls"` | Worker server transport. |
| `workerTLS` | `WorkerTLSOpts` | auto | Explicit cert/key/CA for worker mTLS. |
| `adminUrl` | `string` | derived from `url` | HTTP admin base URL (TLS provisioning and management API calls). |
| `adminSessionCookie` | `string` | `undefined` | Admin session cookie for browser-authenticated endpoints (e.g. `cancelWorkflowRun`). |
| `adminCsrfToken` | `string` | `undefined` | CSRF token paired with `adminSessionCookie` for unsafe HTTP methods. |
| `adminOrigin` | `string` | `undefined` | Origin header required by admin CSRF/origin checks. |
| `captureLogs` | `boolean` | `true` | Forward `console.*` logs to ServiceBridge. |

`WorkerTLSOpts`:

```ts
type WorkerTLSOpts = {
  caCert?: string | Buffer;
  cert?: string | Buffer;
  key?: string | Buffer;
  serverName?: string;
};
```
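
The `queueOverflow` option decides what happens when the offline queue already holds `queueMaxSize` control-plane writes; with the defaults, the 1001st buffered write evicts the first. The sketch below is not the SDK's code: `BoundedQueue` is a hypothetical class that models the three strategies, under the assumption that `"drop-newest"` discards the incoming write rather than evicting the most recently buffered one.

```typescript
type Overflow = "drop-oldest" | "drop-newest" | "error";

// Hypothetical model of the offline queue's overflow strategies (not the SDK implementation).
class BoundedQueue<T> {
  private items: T[] = [];
  private readonly maxSize: number;
  private readonly overflow: Overflow;

  constructor(maxSize: number, overflow: Overflow) {
    this.maxSize = maxSize;
    this.overflow = overflow;
  }

  push(item: T): void {
    if (this.items.length < this.maxSize) {
      this.items.push(item);
      return;
    }
    if (this.overflow === "drop-oldest") {
      this.items.shift(); // evict the oldest buffered write to make room
      this.items.push(item);
    } else if (this.overflow === "error") {
      throw new Error(`offline queue full (${this.maxSize} items)`);
    }
    // "drop-newest" (assumed semantics): keep the buffer unchanged and discard the incoming write
  }

  // Returns buffered writes in FIFO order and empties the queue, as a flush on reconnect would.
  drain(): T[] {
    const out = this.items;
    this.items = [];
    return out;
  }
}
```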

---

### `rpc(fn, payload?, opts?)`

```ts
rpc<T = unknown>(fn: string, payload?: unknown, opts?: RpcOpts): Promise<T>
```

Calls a registered RPC handler on another service, addressed as `<service>/<fn>`. Direct gRPC path, no proxy. `T` only types the resolved value; it is not checked at runtime.

`RpcOpts`:

| Option | Type | Description |
|---|---|---|
| `timeout` | `number` | Call timeout in ms. |
| `retries` | `number` | Retry count override. |
| `retryDelay` | `number` | Base retry delay override (ms). |
| `traceId` | `string` | Explicit trace ID. |
| `parentSpanId` | `string` | Explicit parent span ID. |

```ts
const user = await sb.rpc<{ id: string; name: string }>("users/get", { id: "u_1" }, {
  timeout: 5000,
  retries: 2,
});
```

---

### `event(topic, payload?, opts?)`

```ts
event(topic: string, payload?: unknown, opts?: EventOpts): Promise<string>
```

Publishes a durable event. Resolves to the `messageId` when the control plane is reachable; while offline, control-plane writes are buffered in the offline queue (see `queueMaxSize` and `queueOverflow`).

`EventOpts`:

| Option | Type | Description |
|---|---|---|
| `traceId` | `string` | Explicit trace ID. |
| `parentSpanId` | `string` | Explicit parent span ID. |
| `idempotencyKey` | `string` | Idempotency key for dedup-safe publishing. |
| `headers` | `Record<string, string>` | Custom metadata headers. |

```ts
await sb.event("orders.created", { orderId: "ord_42" }, {
  idempotencyKey: "order:ord_42",
  headers: { source: "checkout" },
});
```
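
An `idempotencyKey` only deduplicates if a retried publish produces the same key, so it should be derived from business identity (entity, ID, action) and never from timestamps or random values. A minimal sketch of a hypothetical `idempotencyKey()` helper (not part of the SDK) that builds keys in the `entity:id:action` shape used in the examples above:

```typescript
// Hypothetical helper (not part of the SDK): builds keys like "order:ord_42:completed".
function idempotencyKey(entity: string, id: string, action: string): string {
  for (const part of [entity, id, action]) {
    // An empty part or an embedded ":" could make two different events produce the same key.
    if (part === "" || part.includes(":")) {
      throw new Error(`invalid idempotency key part: "${part}"`);
    }
  }
  return `${entity}:${id}:${action}`;
}

// Usage: the same order always yields the same key, so a retried publish is deduplicated.
// await sb.event("orders.completed", payload, { idempotencyKey: idempotencyKey("order", "ord_42", "completed") });
```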

---

### `job(target, opts)`

```ts
job(target: string, opts: ScheduleOpts): Promise<string>
```

Registers a scheduled or delayed job.

`ScheduleOpts`:

| Option | Type | Description |
|---|---|---|
| `cron` | `string` | Cron expression. |
| `delay` | `number` | Delay in ms before execution. Backed by `int32` in the proto, so the maximum is 2,147,483,647 ms (about 24.8 days). |
| `timezone` | `string` | Timezone for cron execution. |
| `misfire` | `"fire_now" \| "skip"` | Misfire policy: run a missed fire immediately (`fire_now`) or skip it (`skip`). |
| `via` | `"event" \| "rpc" \| "workflow"` | How the target is invoked. |
| `retryPolicyJson` | `string` | Retry policy JSON string. |

```ts
await sb.job("billing/collect", {
  cron: "0 * * * *", // every hour, on the hour
  timezone: "UTC",
  via: "rpc",
});
```
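
Because `delay` travels as an `int32`, values above 2,147,483,647 ms cannot be sent. A minimal sketch of a hypothetical `delayedJob()` helper that turns a human-readable duration into the `delay`/`via` fields and rejects out-of-range values before they reach `job()`; `DelayedJobOpts` is a local stand-in for that subset of `ScheduleOpts`, not a type imported from the SDK.

```typescript
// The README states `delay` is an int32 in the proto, so this is the largest value it can carry.
const MAX_DELAY_MS = 2_147_483_647;

// Local stand-in for the subset of ScheduleOpts used here (not imported from the SDK).
type DelayedJobOpts = { delay: number; via: "event" | "rpc" | "workflow" };
type Duration = { days?: number; hours?: number; minutes?: number; seconds?: number };

// Hypothetical helper: converts a duration to milliseconds and rejects values the proto cannot carry.
function delayedJob(duration: Duration, via: DelayedJobOpts["via"]): DelayedJobOpts {
  const { days = 0, hours = 0, minutes = 0, seconds = 0 } = duration;
  const delay = (((days * 24 + hours) * 60 + minutes) * 60 + seconds) * 1000;
  if (!Number.isInteger(delay) || delay < 0) {
    throw new RangeError(`delay must be a non-negative whole number of ms, got ${delay}`);
  }
  if (delay > MAX_DELAY_MS) {
    throw new RangeError(`delay of ${delay} ms exceeds the int32 limit of ${MAX_DELAY_MS} ms (~24.8 days)`);
  }
  return { delay, via };
}

// Usage: run billing/collect once, two hours from now.
// await sb.job("billing/collect", delayedJob({ hours: 2 }, "rpc"));
```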

---

### `workflow(name, steps, opts?)`

```ts
workflow(name: string, steps: WorkflowStep[], opts?: WorkflowOpts): Promise<void>
```

Registers (or updates) a workflow definition as a DAG of typed steps.

`WorkflowStep`:

| Field | Type | Description |
|---|---|---|
| `id` | `string` | Unique step identifier in the DAG. |
| `type` | `"rpc" \| "event" \| "event_wait" \| "sleep" \| "workflow"` | Step execution type. |
| `ref` | `string` | Step target (RPC `service/fn`, event topic, or child workflow name). Required for `rpc`, `event`, `event_wait`, `workflow`. |
| `deps` | `string[]` | Dependencies. Empty/omitted means root step. |
| `if` | `string` | Optional filter expression; the step is skipped when it evaluates to false. |
| `timeoutMs` | `number` | Optional timeout for `rpc` and `event_wait` steps. |
| `durationMs` | `number` | Required for `sleep` steps. |

`WorkflowOpts`:

```ts
interface WorkflowOpts {
  stateLimitBytes?: number; // default 262144 (256 KB)
  stepTimeoutMs?: number;   // default 30000 (30 s)
}
```

| Field | Type | Default | Description |
|---|---|---|---|
| `stateLimitBytes` | `number` | `262144` (256 KB) | Maximum serialized state size in bytes. |
| `stepTimeoutMs` | `number` | `30000` (30 s) | Default per-step timeout in milliseconds. |

```ts
await sb.workflow("order.fulfillment", [
  { id: "reserve", type: "rpc", ref: "inventory/reserve" },
  { id: "charge", type: "rpc", ref: "payments/charge", deps: ["reserve"] },
  { id: "wait_5m", type: "sleep", durationMs: 300_000, deps: ["charge"] }, // 5 minutes
  { id: "notify", type: "event", ref: "orders.fulfilled", deps: ["wait_5m"] },
]);
```

Overriding the defaults:

```ts
// `steps` is a WorkflowStep[] like the one above; allow each step up to 60 s
await sb.workflow("checkout.flow", steps, { stepTimeoutMs: 60_000 });
```
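
The step rules in the table above can be checked locally before calling `workflow()`. The sketch below is a hypothetical pre-flight check, not the runtime's validator: it redeclares `WorkflowStep` locally, enforces unique IDs, required `ref`/`durationMs`, and known `deps`, and uses Kahn's algorithm to return one valid execution order or throw on a cycle. The runtime may run independent steps in parallel, so the returned order is just one possible linearization.

```typescript
type StepType = "rpc" | "event" | "event_wait" | "sleep" | "workflow";

// Local mirror of the WorkflowStep table (not imported from the SDK).
interface WorkflowStep {
  id: string;
  type: StepType;
  ref?: string;
  deps?: string[];
  if?: string;
  timeoutMs?: number;
  durationMs?: number;
}

const NEEDS_REF = new Set<StepType>(["rpc", "event", "event_wait", "workflow"]);

// Hypothetical pre-flight check: validates the documented rules and returns one valid execution order.
function checkWorkflow(steps: WorkflowStep[]): string[] {
  const ids = new Set<string>();
  for (const s of steps) {
    if (ids.has(s.id)) throw new Error(`duplicate step id "${s.id}"`);
    if (NEEDS_REF.has(s.type) && !s.ref) throw new Error(`${s.type} step "${s.id}" needs a ref`);
    if (s.type === "sleep" && s.durationMs === undefined) throw new Error(`sleep step "${s.id}" needs durationMs`);
    ids.add(s.id);
  }

  // Kahn's algorithm: count unmet deps per step and index which steps wait on each step.
  const pending = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const s of steps) {
    const deps = s.deps ?? [];
    pending.set(s.id, deps.length);
    for (const d of deps) {
      if (!ids.has(d)) throw new Error(`step "${s.id}" depends on unknown step "${d}"`);
      dependents.set(d, [...(dependents.get(d) ?? []), s.id]);
    }
  }

  // Root steps (no deps) are ready immediately.
  const ready = steps.filter((s) => (s.deps ?? []).length === 0).map((s) => s.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const left = (pending.get(next) ?? 0) - 1;
      pending.set(next, left);
      if (left === 0) ready.push(next);
    }
  }
  // Steps that were never released are part of, or downstream of, a cycle.
  if (order.length !== steps.length) throw new Error("workflow deps contain a cycle");
  return order;
}
```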

---

### `runWorkflow(name, input?, opts?)`

```ts
runWorkflow(name: string, input?: unknown, opts?: RunWorkflowOpts): Promise<{ runId: string; traceId: string }>
```

Starts a workflow run on demand. The workflow must first be registered with `workflow()`.
Unlike scheduling with `job(target, { via: "workflow" })`, this triggers the run immediately.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | `string` | required | Name of a previously registered workflow. |
| `input` | `unknown` | `undefined` | Optional JSON-serializable input payload. |

Returns `{ runId, traceId }`. Pass `traceId` to `watchRun()` to observe execution in real time.

```ts
const { runId, traceId } = await sb.runWorkflow("user.onboarding", { userId: "u_123" });
```

---

### `cancelWorkflowRun(runId)`

```ts
cancelWorkflowRun(runId: string): Promise<void>
```

Cancels a running workflow instance. This call uses a browser-authenticated admin endpoint, so the client relies on `adminSessionCookie` (plus `adminCsrfToken` and `adminOrigin` for the CSRF and origin checks) from `ServiceBridgeOpts`.

```ts
await sb.cancelWorkflowRun("run_01HQ...XYZ");
```

---

### `handleRpc(fn, handler, opts?)`

```ts
handleRpc(
  fn: string,
  handler: (payload: unknown, ctx?: RpcContext) => unknown | Promise<unknown>,
  opts?: HandleRpcOpts,
): ServiceBridgeService
```

Registers an RPC handler. Chainable.

`HandleRpcOpts`:

| Option | Type | Description |
|---|---|---|
| `timeout` | `number` | Node-only timeout hint (currently not hard-enforced by the runtime). |
| `retryable` | `boolean` | Node-only retry hint (currently metadata-level, not a strict policy switch). |
| `concurrency` | `number` | Node-only concurrency hint (currently not hard-enforced). |
| `schema` | `RpcSchemaOpts` | Inline protobuf schema for binary encode/decode. |
| `allowedCallers` | `string[]` | Allow-list of caller service names. |

```ts
sb.handleRpc("ai/generate", async (payload: { prompt: string }, ctx) => {
  await ctx?.stream.write({ token: "Hello" }, "output");
  await ctx?.stream.write({ token: " world" }, "output");
  return { text: "Hello world" };
});
```

---

### `handleEvent(pattern, handler, opts?)`

```ts
handleEvent(
  pattern: string,
  handler: (payload: unknown, ctx: EventContext) => void | Promise<void>,
  opts?: HandleEventOpts,
): ServiceBridgeService
```

Registers an event consumer handler. Chainable.

`HandleEventOpts`:

| Option | Type | Description |
|---|---|---|
| `groupName` | `string` | Consumer group name. Default: `<service>.<pattern>`. |
| `concurrency` | `number` | Node-only concurrency hint (currently not hard-enforced). |
| `prefetch` | `number` | Node-only prefetch hint (currently not hard-enforced). |
| `retryPolicyJson` | `string` | Retry policy JSON string. |
| `filterExpr` | `string` | Server-side filter expression. |

Registering the same `groupName` twice throws an error.

`EventContext` helpers:

- `ctx.retry(delayMs?)` — ask for redelivery with optional delay
- `ctx.reject(reason)` — reject without retry
- `ctx.refs` — metadata (`topic`, `groupName`, `messageId`, `attempt`, `headers`)
- `ctx.stream.write(...)` — append real-time chunks to the run stream

```ts
sb.handleEvent("orders.*", async (payload, ctx) => {
  const body = payload as { orderId?: string };
  if (!body.orderId) {
    ctx.reject("missing_order_id");
    return;
  }
  await ctx.stream.write({ status: "processing", orderId: body.orderId }, "progress");
});
```
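
Because delivery is at-least-once, a handler can see the same message more than once (for example after `ctx.retry()` or a failed attempt). A minimal sketch of a hypothetical `dedupe()` wrapper that skips messages whose `messageId` was already processed. `EventRefs` is a local mirror of the documented `ctx.refs` fields (the `attempt: number` type is an assumption), and the in-memory `Set` grows without bound and covers one process only; production code would record processed IDs in durable storage, ideally in the same transaction as the side effect.

```typescript
// Local mirror of the documented ctx.refs fields (not imported from the SDK).
interface EventRefs {
  topic: string;
  groupName: string;
  messageId: string;
  attempt: number;
  headers: Record<string, string>;
}

type Handler<P> = (payload: P, refs: EventRefs) => Promise<void>;

// Hypothetical wrapper: runs `work` at most once per messageId within this process.
// Two concurrent deliveries of the same message can still both run; a unique constraint in durable storage closes that gap.
function dedupe<P>(work: Handler<P>): Handler<P> {
  const done = new Set<string>();
  return async (payload, refs) => {
    if (done.has(refs.messageId)) return; // redelivery of a message we already handled
    await work(payload, refs);
    done.add(refs.messageId); // mark only after success so a failed attempt can be retried
  };
}

// Usage (assuming ctx.refs matches EventRefs):
// const onOrder = dedupe<{ orderId: string }>(async (body) => { /* send email */ });
// sb.handleEvent("orders.*", (payload, ctx) => onOrder(payload as { orderId: string }, ctx.refs));
```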

---

### `serve(opts?)`

```ts
serve(opts?: ServeOpts): Promise<void>
```

Starts the worker gRPC server and registers handlers with the control plane.
The returned promise resolves once startup and registration are complete; it does
not wait for the server to shut down.

`ServeOpts`:

| Option | Type | Description |
|---|---|---|
| `host` | `string` | Bind host. Default: `127.0.0.1`. |
| `instanceId` | `string` | Stable worker instance identifier (Node-only). |
| `weight` | `number` | Scheduling/discovery weight hint (Node-only). |
| `transport` | `"tls"` | Worker transport (Node-only in `serve()`). |
| `tls` | `WorkerTLSOpts` | Per-serve TLS override (Node-only in `serve()`). |

```ts
await sb.serve({
  host: "0.0.0.0",
  instanceId: process.env.HOSTNAME,
});
```

---

### `stop()`

```ts
stop(): void
```

Stops the worker gRPC server (attempting a graceful shutdown first, then forcing it), along with heartbeats, gRPC channels, and other SDK internals. It returns `void`, so there is nothing to await.
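
Since `stop()` is synchronous, a common pattern is to call it from signal handlers so that orchestrators that send `SIGTERM` get a clean shutdown. A minimal sketch, assuming only that the client exposes `stop(): void`; `stopOnSignals` is a hypothetical helper, and the process then exits on its own once no other handles (such as an HTTP server) keep it alive.

```typescript
type Stoppable = { stop(): void };

// Hypothetical helper: calls client.stop() once on the first matching signal; returns an unsubscribe function.
function stopOnSignals(client: Stoppable, signals: NodeJS.Signals[] = ["SIGTERM", "SIGINT"]): () => void {
  let stopped = false;
  const listeners = signals.map((signal) => {
    const listener = () => {
      if (stopped) return; // a second Ctrl+C or SIGTERM must not call stop() again
      stopped = true;
      console.log(`${signal} received, stopping ServiceBridge client`);
      client.stop();
    };
    process.on(signal, listener);
    return { signal, listener };
  });
  return () => {
    for (const { signal, listener } of listeners) process.off(signal, listener);
  };
}

// Usage, after `await sb.serve(...)`:
// stopOnSignals(sb);
```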

---

### `startHttpSpan(opts)`

```ts
startHttpSpan(opts: {
  method: string;
  path: string;
  traceId?: string;
  parentSpanId?: string;
}): HttpSpan
```

Manual HTTP tracing primitive: start a span when a request arrives and call `span.end(...)` once when it finishes.

```ts
const span = sb.startHttpSpan({ method: "GET", path: "/health" });
try {
  // ... handle the request ...
  span.end({ statusCode: 200, success: true });
} catch (e) {
  span.end({ success: false, error: String(e) });
}
```
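
To guarantee that every request ends its span exactly once, the pattern above can be wrapped in a helper. A minimal sketch: `traced()`, `SpanStarter`, and `HttpSpanLike` are hypothetical local stand-ins that assume `end()` accepts the fields seen in the example above, and treating thrown errors as status 500 and 5xx replies as failures is a design choice of this sketch, not SDK behavior.

```typescript
// Local stand-ins for the shapes used in the startHttpSpan() example (the SDK has its own types).
type SpanResult = { statusCode?: number; success: boolean; error?: string };
interface HttpSpanLike {
  end(result: SpanResult): void;
}
interface SpanStarter {
  startHttpSpan(opts: { method: string; path: string; traceId?: string; parentSpanId?: string }): HttpSpanLike;
}

type Reply<T> = { statusCode: number; body: T };

// Hypothetical helper: runs `handler` inside an HTTP span and ends the span exactly once,
// treating thrown errors and 5xx replies as failures.
async function traced<T>(
  client: SpanStarter,
  method: string,
  path: string,
  handler: () => Promise<Reply<T>>,
): Promise<Reply<T>> {
  const span = client.startHttpSpan({ method, path });
  let reply: Reply<T>;
  try {
    reply = await handler();
  } catch (e) {
    span.end({ statusCode: 500, success: false, error: String(e) });
    throw e;
  }
  span.end({ statusCode: reply.statusCode, success: reply.statusCode < 500 });
  return reply;
}
```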

---

### `registerHttpEndpoint(opts)`

```ts
registerHttpEndpoint(opts: {
  method: string;
  route: string;
  instanceId?: string;
  endpoint?: string;
  allowedCallers?: string[];
  requestSchemaJson?: string;
  responseSchemaJson?: string;
  transport?: string;
}): Promise<void>
```

Registers HTTP route metadata in the ServiceBridge service catalog.
It also starts a periodic heartbeat that keeps the HTTP endpoint alive in the registry.

| Option | Type | Description |
|---|---|---|
| `method` | `string` | HTTP method: `GET`, `POST`, `PUT`, `PATCH`, `DELETE`, etc. |
| `route` | `string` | Route pattern with parameter placeholders, e.g. `"/users/:id"`. |
| `instanceId` | `string` | Stable identifier for this process instance. |
| `endpoint` | `string` | Reachable address, e.g. `"http://10.0.0.1:3000"`. |
| `allowedCallers` | `string[]` | Service names allowed to call (RBAC). |
| `requestSchemaJson` | `string` | JSON Schema for request validation metadata. |
| `responseSchemaJson` | `string` | JSON Schema for response validation metadata. |
| `transport` | `string` | Transport label (e.g. `"http"`, `"https"`). |

```ts
await sb.registerHttpEndpoint({
  method: "GET",
  route: "/users/:id",
  requestSchemaJson: '{"type":"object"}',
  transport: "http",
});
```

---

### `watchRun(runId, opts?)`

```ts
watchRun(runId: string, opts?: WatchRunOpts): AsyncIterable<RunStreamEvent>
```

Subscribes to a run stream with replay and live updates. `runId` is the stream
identifier used by `ctx.stream.write(...)`, typically a trace ID (for example the
`traceId` returned by `runWorkflow()`).

`WatchRunOpts`:

| Option | Type | Default | Description |
|---|---|---|---|
| `key` | `string` | `""` | Stream key filter (`""` = all keys). |
| `fromSequence` | `number` | `0` | Replay from sequence cursor. |

`RunStreamEvent`:

| Field | Type | Description |
|---|---|---|
| `type` | `"chunk" \| "run_complete"` | Event kind. |
| `runId` | `string` | Stream/run identifier being watched. |
| `key` | `string` | Stream lane key. |
| `sequence` | `number` | Monotonic sequence number. |
| `data` | `unknown` | JSON-decoded chunk payload. |
| `runStatus` | `string \| undefined` | Final status on `run_complete`. |

```ts
for await (const evt of sb.watchRun(runId, { key: "output", fromSequence: 0 })) {
  if (evt.type === "chunk") {
    // data may be null or a non-object, so guard before reading `token`
    process.stdout.write(String((evt.data as { token?: string } | null)?.token ?? ""));
  }
  if (evt.type === "run_complete") break;
}
```
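
Because `watchRun()` returns an `AsyncIterable<RunStreamEvent>`, stream handling can be written and tested against any async iterable. A minimal sketch of a hypothetical `collectTokens()` helper that joins the `token` strings from chunk events (the `{ token }` shape matches the `ai/generate` handler example), stops at `run_complete`, and records the highest sequence seen so a reconnect could pass a cursor to `fromSequence`. `RunStreamEvent` is redeclared locally from the table above.

```typescript
// Local mirror of the RunStreamEvent table (not imported from the SDK).
interface RunStreamEvent {
  type: "chunk" | "run_complete";
  runId: string;
  key: string;
  sequence: number;
  data: unknown;
  runStatus?: string;
}

// Hypothetical helper: joins `token` strings from chunk events until run_complete arrives.
async function collectTokens(
  events: AsyncIterable<RunStreamEvent>,
): Promise<{ text: string; lastSequence: number; runStatus?: string }> {
  let text = "";
  let lastSequence = -1; // -1 means no chunk seen yet
  for await (const evt of events) {
    if (evt.type === "run_complete") {
      // Returning here also closes the underlying iterator.
      return { text, lastSequence, runStatus: evt.runStatus };
    }
    const token = (evt.data as { token?: unknown } | null | undefined)?.token;
    if (typeof token === "string") text += token;
    lastSequence = Math.max(lastSequence, evt.sequence);
  }
  // The stream ended without run_complete (e.g. a dropped connection); resume later via fromSequence.
  return { text, lastSequence };
}
```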

---

### Trace Utilities

#### `getTraceContext()`

```ts
getTraceContext(): { traceId: string; spanId: string } | undefined
```

Returns the current async-local trace context, or `undefined` outside of one.

```ts
import { getTraceContext } from "service-bridge";

const tc = getTraceContext();
if (tc) {
  console.log(tc.traceId, tc.spanId);
}
```

#### `runWithTraceContext(ctx, fn)`

```ts
runWithTraceContext<T>(ctx: { traceId: string; spanId: string }, fn: () => T): T
```

Runs a function inside an explicit trace context and returns its result. With an async `fn`, the result is a promise, so await it.

```ts
import { runWithTraceContext } from "service-bridge";

await runWithTraceContext({ traceId: "trace-1", spanId: "span-1" }, async () => {
  await sb.event("audit.log", { action: "user.login" });
});
```
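
"Async-local" trace context in Node.js is typically built on `AsyncLocalStorage` from `node:async_hooks`. The sketch below re-implements the two functions on top of it to show why a context set by `runWithTraceContext()` is still visible after an `await` and does not leak outside the call. It illustrates the mechanism only; it is not the SDK's source, and it defines its own `getTraceContext`/`runWithTraceContext` instead of importing them.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

type TraceContext = { traceId: string; spanId: string };

// One storage instance per process; each run() call gets its own isolated store.
const storage = new AsyncLocalStorage<TraceContext>();

// Sketch of getTraceContext(): returns the store of the current async execution chain, if any.
function getTraceContext(): TraceContext | undefined {
  return storage.getStore();
}

// Sketch of runWithTraceContext(): fn and every async continuation it creates see `ctx`.
function runWithTraceContext<T>(ctx: TraceContext, fn: () => T): T {
  return storage.run(ctx, fn);
}
```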
741
+
742
+ ---
743
+
744
+ ## HTTP Plugins
745
+
746
+ ### Express (`service-bridge/express`)
747
+
748
+ ```bash
749
+ npm install express
750
+ ```
751
+
752
+ ```ts
753
+ import express from "express";
754
+ import { servicebridge } from "service-bridge";
755
+ import { servicebridgeMiddleware, registerExpressRoutes } from "service-bridge/express";
756
+
757
+ const sb = servicebridge(process.env.SERVICEBRIDGE_URL!, process.env.SERVICEBRIDGE_SERVICE_KEY!, "api");
758
+ const app = express();
759
+
760
+ app.use(servicebridgeMiddleware({
761
+ client: sb,
762
+ excludePaths: ["/health"],
763
+ autoRegister: true,
764
+ }));
765
+
766
+ app.get("/users/:id", async (req, res) => {
767
+ const user = await req.servicebridge.rpc("users/get", { id: req.params.id });
768
+ res.json(user);
769
+ });
770
+ ```
771
+
772
#### `servicebridgeMiddleware(options)`

```ts
servicebridgeMiddleware(options: {
  client: ServiceBridgeService;
  excludePaths?: string[];
  propagateTraceHeader?: boolean;
  autoRegister?: boolean;
}): express.RequestHandler
```

- Attaches `req.servicebridge`, `req.traceId`, and `req.spanId` to each request
- Starts an HTTP span for each request and ends it automatically
- Optionally sets the `x-trace-id` response header (`propagateTraceHeader`)
- Optionally registers the route pattern in the service catalog on its first hit (`autoRegister`)
- Ignores paths listed in `excludePaths` (for example, health checks)

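Under stated assumptions, the sketch below shows what a middleware with these options could do for each request: skip excluded paths, attach trace IDs, and optionally echo the trace ID in `x-trace-id`. Several details are assumptions: the minimal request and response shapes, exact-match path exclusion, the hex ID lengths, and tying the header to `propagateTraceHeader`. The sketch does not model span start and end or the real `servicebridgeMiddleware` internals. It avoids depending on Express so that it runs standalone.

```ts
import { randomBytes } from "node:crypto";

// Minimal stand-ins for Express's req/res so the sketch runs without Express.
interface MiniRequest {
  path: string;
  traceId?: string;
  spanId?: string;
}
interface MiniResponse {
  setHeader(name: string, value: string): void;
}
interface SketchOptions {
  excludePaths?: string[];
  propagateTraceHeader?: boolean;
}

// Hypothetical middleware factory; exact-match exclusion and ID format are assumptions.
function tracingMiddlewareSketch(opts: SketchOptions) {
  const excluded = new Set(opts.excludePaths ?? []);
  return (req: MiniRequest, res: MiniResponse, next: () => void): void => {
    if (excluded.has(req.path)) {
      next(); // excluded paths (e.g. health checks) pass through untraced
      return;
    }
    req.traceId = randomBytes(16).toString("hex"); // 32 hex chars
    req.spanId = randomBytes(8).toString("hex"); // 16 hex chars
    if (opts.propagateTraceHeader) res.setHeader("x-trace-id", req.traceId);
    next();
  };
}
```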

#### `registerExpressRoutes(app, client, opts?)`

Registers the app's routes in the service catalog eagerly, instead of waiting for the first request to each route. Call it after the routes are defined.

```ts
await registerExpressRoutes(app, sb, {
  endpoint: "http://10.0.0.5:3000",
  allowedCallers: ["api-gateway"],
  excludePaths: ["/health"],
});
```

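Both registration modes reduce to "register each route pattern once". In this minimal sketch, `createRouteRegistrar`, `registerEagerly`, and the `register` callback are hypothetical; the callback stands in for the catalog call, whose real API is not shown here. An eager pass registers the known routes at startup, and any later first-hit registration for the same method and pattern becomes a no-op. Keying on the pattern (`/users/:id`) rather than the concrete URL (`/users/42`) keeps one catalog entry per route.

```ts
// Hypothetical: `register` stands in for the SDK's catalog call, which is not shown here.
type RegisterFn = (method: string, pattern: string) => void;

// Returns a function that forwards each METHOD + pattern to `register` at most once.
function createRouteRegistrar(register: RegisterFn) {
  const seen = new Set<string>();
  return (method: string, pattern: string): boolean => {
    const upper = method.toUpperCase();
    const key = `${upper} ${pattern}`;
    if (seen.has(key)) return false; // already in the catalog, skip the call
    seen.add(key);
    register(upper, pattern);
    return true;
  };
}

// Eager mode: register every known route at startup, before any request arrives.
function registerEagerly(
  registrar: (method: string, pattern: string) => boolean,
  routes: Array<[string, string]>,
): number {
  let added = 0;
  for (const [method, pattern] of routes) {
    if (registrar(method, pattern)) added++;
  }
  return added;
}
```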

---

### Fastify (`service-bridge/fastify`)

```bash
npm install fastify
```

```ts
import Fastify from "fastify";
import { servicebridge } from "service-bridge";
import { servicebridgePlugin, wrapHandler } from "service-bridge/fastify";

const sb = servicebridge(process.env.SERVICEBRIDGE_URL!, process.env.SERVICEBRIDGE_SERVICE_KEY!, "api");
const app = Fastify();

await app.register(servicebridgePlugin, {
  client: sb,
  excludePaths: ["/health"],
  autoRegister: true,
});

app.get("/users/:id", wrapHandler(async (request, reply) => {
  const { id } = request.params as { id: string };
  const user = await request.servicebridge.rpc("users/get", { id });
  return reply.send(user);
}));

await app.listen({ port: 3000 });
```

#### `servicebridgePlugin(fastify, options)`

```ts
servicebridgePlugin(fastify: FastifyInstance, options: {
  client: ServiceBridgeService;
  excludePaths?: string[];
  propagateTraceHeader?: boolean;
  autoRegister?: boolean;
  register?: {
    instanceId?: string;
    endpoint?: string;
    allowedCallers?: string[];
    excludePaths?: string[];
  };
})
```

- Decorates `request.servicebridge`, `request.traceId`, and `request.spanId`
- Traces each request's HTTP lifecycle through Fastify hooks
- With `autoRegister`, registers each route from the `onRoute` hook, before any traffic arrives

#### `wrapHandler(handler)`

Wraps a route handler so it runs inside the request's trace context. SDK calls made from the handler, such as `request.servicebridge.rpc(...)`, then inherit that trace.

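This sketch shows one plausible shape for such a wrapper, assuming the context lives in an `AsyncLocalStorage`. The wrapper re-enters the request's trace context, taken from the decorated `request.traceId` and `request.spanId`, immediately before it calls the handler. `wrapHandlerSketch` and `traceStore` are hypothetical names, not the plugin's code.

```ts
import { AsyncLocalStorage } from "node:async_hooks";

interface TraceContext {
  traceId: string;
  spanId: string;
}

// Hypothetical store; the plugin's real mechanism is not documented here.
const traceStore = new AsyncLocalStorage<TraceContext>();

// Re-enters the request's trace context before invoking the handler, so the handler
// and everything it awaits can read the context from traceStore.
function wrapHandlerSketch<Req extends TraceContext, Reply, R>(
  handler: (request: Req, reply: Reply) => Promise<R>,
): (request: Req, reply: Reply) => Promise<R> {
  return (request, reply) =>
    traceStore.run({ traceId: request.traceId, spanId: request.spanId }, () => handler(request, reply));
}
```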

---

## Configuration

### TLS behavior

- The worker transport is TLS-only.
- If `workerTLS` is not provided, the SDK auto-provisions certificates through the admin API.
- `workerTLS.cert` and `workerTLS.key` must be provided together.
- `serve({ tls })` overrides the global `workerTLS` setting for that worker instance.

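The rules above can be read as a small resolution function. This is only a sketch, and several parts of it are assumptions: the name `resolveWorkerTLS`, PEM strings as the field types, and an all-or-nothing override in which the per-`serve` value replaces the global one instead of merging field by field.

```ts
// Hypothetical shapes: PEM strings for cert/key are an assumption.
interface WorkerTLS {
  cert?: string;
  key?: string;
}

type ResolvedTLS =
  | { mode: "explicit"; cert: string; key: string }
  | { mode: "auto-provision" }; // certificates come from the admin API

// Enforces "cert and key must be provided together".
function requirePair(tls: WorkerTLS, source: string): { cert: string; key: string } {
  if (!tls.cert || !tls.key) {
    throw new Error(`${source}: cert and key must be provided together`);
  }
  return { cert: tls.cert, key: tls.key };
}

function resolveWorkerTLS(globalTLS?: WorkerTLS, serveTLS?: WorkerTLS): ResolvedTLS {
  const chosen = serveTLS ?? globalTLS; // serve({ tls }) wins over the global workerTLS
  if (!chosen) return { mode: "auto-provision" }; // no certs given: provision via admin API
  const { cert, key } = requirePair(chosen, serveTLS ? "serve({ tls })" : "workerTLS");
  return { mode: "explicit", cert, key };
}
```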

### Offline queue behavior

When the control plane is unavailable, the SDK queues write operations (`event`, `job`, `workflow`, and telemetry writes) and flushes them once the control plane recovers.

- Queue size: `queueMaxSize` (default: `1000`)
- Overflow policy: `queueOverflow` (default: `"drop-oldest"`)
- Calls whose writes are queued may return an empty string until the queue is flushed

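This is a minimal sketch of a bounded queue with the `"drop-oldest"` policy. It assumes that queued writes are replayed in FIFO order on reconnect. `OfflineQueue` and `QueuedWrite` are hypothetical. The sketch does not model the SDK's real queue, its other overflow policies, or the empty-string return values.

```ts
// Hypothetical model of the SDK's offline queue; not SDK code.
type QueuedWrite = { kind: "event" | "job" | "workflow" | "telemetry"; payload: unknown };

class OfflineQueue {
  private readonly items: QueuedWrite[] = [];
  private readonly maxSize: number;
  dropped = 0;

  constructor(maxSize = 1000) {
    this.maxSize = maxSize;
  }

  get size(): number {
    return this.items.length;
  }

  // "drop-oldest": when full, evict the earliest queued write to make room.
  enqueue(item: QueuedWrite): void {
    if (this.items.length >= this.maxSize) {
      this.items.shift();
      this.dropped++;
    }
    this.items.push(item);
  }

  // Replays writes in FIFO order; an item is removed only after send resolves,
  // so a failed send leaves it at the head for the next flush.
  async flush(send: (item: QueuedWrite) => Promise<void>): Promise<void> {
    while (this.items.length > 0) {
      await send(this.items[0]!);
      this.items.shift();
    }
  }
}
```

`shift()` takes time proportional to the queue length. That cost is acceptable at the default size of 1000, and a ring buffer would avoid it for much larger limits.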

---

## Environment Variables

The SDK is configured with the values you pass to `servicebridge(...)`. A common setup reads them from these environment variables:

| Variable | Required | Example | Description |
|---|---|---|---|
| `SERVICEBRIDGE_URL` | yes | `127.0.0.1:14445` | gRPC control plane address |
| `SERVICEBRIDGE_SERVICE_KEY` | yes | `sb_live_...` | Service authentication key |
| `SERVICEBRIDGE_SERVICE` | yes (worker mode) | `orders` | Service name in the registry |
| `SERVICEBRIDGE_ADMIN_URL` | optional | `http://127.0.0.1:14444` | Explicit admin API base URL |

```ts
import { servicebridge } from "service-bridge";

const sb = servicebridge(
  process.env.SERVICEBRIDGE_URL ?? "127.0.0.1:14445",
  process.env.SERVICEBRIDGE_SERVICE_KEY!,
  process.env.SERVICEBRIDGE_SERVICE ?? "orders",
  {
    adminUrl: process.env.SERVICEBRIDGE_ADMIN_URL,
  },
);
```

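A small helper can check required variables at startup, for readers who prefer to fail fast instead of relying on the `!` non-null assertion. `requireEnv` and `optionalEnv` are hypothetical helpers, not SDK exports. They treat an empty string the same as an unset variable, which is a design choice you may want to change.

```ts
type Env = Record<string, string | undefined>;

// Hypothetical helpers, not SDK exports. Empty strings count as unset.
function requireEnv(name: string, env: Env = process.env): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable ${name}`);
  }
  return value;
}

function optionalEnv(name: string, fallback: string, env: Env = process.env): string {
  const value = env[name];
  return value === undefined || value === "" ? fallback : value;
}
```

For example, you could pass `requireEnv("SERVICEBRIDGE_SERVICE_KEY")` as the second argument to `servicebridge(...)`.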

---

## Error Handling

The SDK exports `ServiceBridgeError`, which normalizes SDK and runtime errors.

```ts
import { servicebridge, ServiceBridgeError } from "service-bridge";

const sb = servicebridge(process.env.SERVICEBRIDGE_URL!, process.env.SERVICEBRIDGE_SERVICE_KEY!, "orders");

try {
  await sb.rpc("payments/charge", { orderId: "ord_1" });
} catch (e) {
  if (e instanceof ServiceBridgeError) {
    console.error(e.component, e.operation, e.severity, e.retryable, e.code);
  }
  throw e;
}
```

| Field | Type | Description |
|---|---|---|
| `component` | `string` | SDK subsystem (for example, `"rpc"` or `"event"`). |
| `operation` | `string` | Operation that failed. |
| `severity` | `"fatal" \| "retriable" \| "ignorable"` | Error classification. |
| `retryable` | `boolean` | Whether a retry is recommended. |
| `code` | `number \| undefined` | gRPC status code, if available. |
| `cause` | `unknown` | Original underlying error. |

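The `retryable` flag lends itself to an application-level retry loop. This sketch is separate from the SDK's built-in RPC retries and rests on several assumptions. `SketchError` stands in for `ServiceBridgeError`; in real code, you would check `e instanceof ServiceBridgeError && e.retryable` instead. The attempt count, the base delay, and full-jitter exponential backoff are illustrative choices, not SDK defaults.

```ts
// Hypothetical stand-in carrying the `retryable` field from the table above.
class SketchError extends Error {
  retryable: boolean;
  constructor(message: string, retryable: boolean) {
    super(message);
    this.retryable = retryable;
  }
}

// Full-jitter exponential backoff: a random delay in [0, min(maxMs, baseMs * 2^attempt)).
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 5_000, rand: () => number = Math.random): number {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * cap);
}

function isRetryable(e: unknown): boolean {
  return e instanceof SketchError && e.retryable;
}

async function withRetry<T>(op: () => Promise<T>, maxAttempts = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (e) {
      // Give up on non-retryable errors or once attempts are exhausted.
      if (!isRetryable(e) || attempt + 1 >= maxAttempts) throw e;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

Only wrap operations that are safe to repeat. Retrying a non-idempotent call such as a charge can apply it twice unless the callee deduplicates.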

---

## When to Use / When Not to Use

### ServiceBridge is a good fit when you:

- Have **3+ microservices** that need to communicate via RPC, events, or both
- Want **RPC + events + workflows + jobs** without managing separate infrastructure for each
- Need **end-to-end tracing** across all communication patterns in one timeline
- Want to **eliminate sidecar proxies** and reduce operational overhead
- Need **durable event delivery** with retry, DLQ, and replay without running a broker
- Are building **AI/LLM pipelines** and need real-time streaming with replay

### Consider alternatives when you:

- Run a **single monolith** with no plans to split it into services
- Need **ultra-high-throughput event streaming** (100K+ msg/s sustained); Kafka is purpose-built for this
- Need a **full API gateway** with rate limiting, auth plugins, and request transformation; use Kong or Envoy Gateway
- Already have a **mature Istio/Linkerd mesh** and only need traffic management (no events, workflows, or jobs)
- Need **multi-region event replication**; ServiceBridge currently targets single-region deployments

---

## FAQ

**How does ServiceBridge handle service failures?**
RPC calls have configurable retries with exponential backoff. Events are durable (PostgreSQL-backed) with at-least-once delivery per consumer group. Failed deliveries are retried according to the retry policy, then moved to the DLQ. Workflows track step state and can be resumed.

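The following consumer-side model illustrates two things: what "retried according to policy, then moved to DLQ" means, and what at-least-once delivery implies for handlers. It is hypothetical. The runtime performs redelivery server-side, `deliver`, `RetryPolicy`, and `idempotent` are not SDK APIs, and the model omits the backoff between attempts. Because at-least-once delivery can redeliver a message, the second helper makes a handler idempotent by remembering the IDs of processed messages.

```ts
// Hypothetical model of "retry per policy, then dead-letter"; not SDK code.
interface Delivery {
  id: string;
  payload: unknown;
}
interface RetryPolicy {
  maxAttempts: number;
}
type Outcome =
  | { status: "acked"; attempts: number }
  | { status: "dead-lettered"; attempts: number; error: unknown };

async function deliver(
  msg: Delivery,
  handler: (m: Delivery) => Promise<void>,
  policy: RetryPolicy,
  deadLetter: (m: Delivery, error: unknown) => void,
): Promise<Outcome> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    try {
      await handler(msg);
      return { status: "acked", attempts: attempt };
    } catch (e) {
      lastError = e; // a real runtime would also wait (backoff) between attempts
    }
  }
  deadLetter(msg, lastError); // attempts exhausted: move to the DLQ
  return { status: "dead-lettered", attempts: policy.maxAttempts, error: lastError };
}

// At-least-once delivery can repeat a message, so skip IDs that were already handled.
function idempotent(handler: (m: Delivery) => Promise<void>, seen = new Set<string>()) {
  return async (m: Delivery): Promise<void> => {
    if (seen.has(m.id)) return; // duplicate redelivery
    await handler(m);
    seen.add(m.id); // mark only after success so a failed attempt is retried
  };
}
```

The in-memory `Set` only deduplicates within one process. To survive restarts, you need a durable store, for example a unique key in PostgreSQL.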

**Is there vendor lock-in?**
ServiceBridge is self-hosted. The runtime is a single Go binary plus PostgreSQL. SDK calls map to standard patterns (RPC, pub/sub, cron), so migrating away means replacing SDK calls with equivalent library calls.

**How does tracing work without an OTEL collector?**
The SDK automatically reports trace spans for every RPC call, event publish and delivery, workflow step, and HTTP request. The runtime stores traces in PostgreSQL and serves them through the built-in dashboard and a Loki-compatible API for Grafana integration.

**Can I use ServiceBridge alongside existing infrastructure?**
Yes. You can adopt it incrementally: start with RPC between two services, add events later, then workflows. ServiceBridge doesn't require replacing your existing broker or mesh all at once.

**What happens when the control plane is down?**
In-flight direct RPC calls keep working, because they go service-to-service rather than through the control plane. New discovery lookups, event publishes, and telemetry writes are queued in the SDK offline queue and flushed when the control plane recovers.

**What databases does the runtime support?**
PostgreSQL 16+. The runtime uses PostgreSQL for all persistence: traces, events, workflows, jobs, the service registry, and configuration.

---

## Community and Support

- Website: [servicebridge.dev](https://servicebridge.dev)
- GitHub: [github.com/service-bridge](https://github.com/service-bridge)
- SDK monorepo: [README.md](../README.md)

---

## License

Free for non-commercial use. Commercial use requires a separate license. See [LICENSE](../LICENSE).

Copyright (c) 2026 Eugene Surkov.

---

## Keywords

service-bridge · servicebridge · npm install service-bridge · npm i service-bridge · bun add service-bridge · Node.js SDK · TypeScript SDK · JavaScript microservices · RPC · gRPC · event bus · event-driven · distributed tracing · workflow orchestration · background jobs · cron · mTLS · service mesh · service discovery · zero sidecar · Istio alternative · Envoy alternative · RabbitMQ alternative · Temporal alternative · Jaeger alternative · PostgreSQL · Docker · Kubernetes · DLQ · dead letter queue · saga · distributed transactions · AI agent orchestration · Express middleware · Fastify middleware · HTTP middleware · observability · Prometheus · tracing · service catalog · durable events · retries · idempotency · auto mTLS · runtime dashboard · production ready · microservice communication