service-bridge 1.9.0-dev.52 → 2.0.0-alpha.1

package/README.md CHANGED
@@ -1,1337 +1,633 @@
1
- <!-- keywords: service-bridge servicebridge npm install service-bridge Node.js TypeScript JavaScript microservices RPC gRPC event-bus event-driven distributed-tracing workflow orchestration background-jobs cron mTLS service-mesh service-discovery distributed-systems zero-sidecar Istio-alternative RabbitMQ-alternative Temporal-alternative Jaeger-alternative PostgreSQL Docker Kubernetes DLQ dead-letter-queue saga distributed-transactions AI-agent-orchestration Express Fastify HTTP-middleware observability Prometheus tracing service-catalog async-messaging durable-events retries idempotency auto-mTLS runtime-dashboard production-ready bun deno -->
1
+ <!--
2
+ Keywords: service-bridge, ServiceBridge, microservices, Node.js SDK, TypeScript SDK, Bun,
3
+ gRPC, mTLS, RPC framework, durable events, pub/sub, message broker alternative, RabbitMQ alternative,
4
+ workflow engine, saga, orchestration, Temporal alternative, job scheduler, cron, distributed tracing,
5
+ observability, OpenTelemetry alternative, Jaeger alternative, service mesh alternative, Istio alternative,
6
+ self-hosted, PostgreSQL, Express, Fastify, Hono, circuit breaker, idempotency, retries, load balancing.
7
+ -->
2
8
 
3
9
  # service-bridge
4
10
 
5
- [![npm version](https://img.shields.io/npm/v/service-bridge?color=cb3837&logo=npm)](https://www.npmjs.com/package/service-bridge)
6
- [![License](https://img.shields.io/badge/License-Free%20%2F%20Commercial-blue)](../LICENSE)
7
- [![TypeScript](https://img.shields.io/badge/TypeScript-5%2B-3178c6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
8
- [![Node](https://img.shields.io/badge/Node.js-18%2B-339933?logo=node.js&logoColor=white)](https://nodejs.org/)
11
+ [![npm version](https://img.shields.io/npm/v/service-bridge?color=cb3837&label=npm)](https://www.npmjs.com/package/service-bridge)
12
+ [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
13
+ [![TypeScript](https://img.shields.io/badge/types-included-3178c6.svg)](https://www.typescriptlang.org/)
14
+ [![Node](https://img.shields.io/badge/node-%E2%89%A518-339933.svg)](https://nodejs.org/)
9
15
 
10
- **The Unified Bridge for Microservices Interaction**
16
+ **The Node.js / Bun SDK for [ServiceBridge](https://servicebridge.dev) — RPC, durable events, workflows, jobs, streaming and full observability over one self-hosted runtime. No broker. No sidecar. No tracing stack. Just one Go binary plus PostgreSQL.**
11
17
 
12
- Node.js SDK for [ServiceBridge](https://servicebridge.dev): production-ready RPC, durable events, workflows, jobs, and distributed tracing in a single SDK. One Go runtime and PostgreSQL.
18
+ You declare what your service handles and what it calls. ServiceBridge does the rest: provisions an mTLS identity, opens the connection, registers your handlers, and routes every RPC, event, job and workflow step with tracing, metrics and access policy built in.
13
19
 
14
20
  ```
15
- ┌─────────────────────────────────────────────────────────────────┐
16
- │  BEFORE: 10 moving parts                                        │
17
- │  Istio · Envoy · RabbitMQ · Temporal · Jaeger · Consul ·        │
18
- │  cert-manager · Alertmanager · cron · custom glue               │
19
- └─────────────────────────────────────────────────────────────────┘
20
-
21
- ┌─────────────────────────────────────────────────────────────────┐
22
- │  AFTER: ServiceBridge + PostgreSQL                              │
23
- │  RPC · Events · Workflows · Jobs · Tracing · mTLS · Dashboard   │
24
- │  One SDK · One runtime · Zero sidecars                          │
25
- └─────────────────────────────────────────────────────────────────┘
21
+         BEFORE                                          AFTER
22
+
23
+ ┌─────────────────────┐
24
+ │  Istio + Envoy      │  ← mesh / mTLS
25
+ │  RabbitMQ / Kafka   │  ← events              ┌──────────────────────┐
26
+ │  Temporal           │  ← workflows           │                      │
27
+ │  a cron scheduler   │  ← jobs                │    ServiceBridge     │
28
+ │  gRPC plumbing      │  ← RPC           ═══►  │  runtime (1 binary)  │
29
+ │  Jaeger / Tempo     │  ← tracing             │          +           │
30
+ │  Prometheus wiring  │  ← metrics             │      PostgreSQL      │
31
+ │  Loki               │  ← logs                │                      │
32
+ │  a load balancer    │  ← LB / retries        └──────────────────────┘
33
+ │  service registry   │  ← discovery
34
+ └─────────────────────┘
35
+    10+ moving parts                                2 things to run
26
36
  ```
27
37
 
28
- ## Table of Contents
38
+ ---
39
+
40
+ ## Table of contents
29
41
 
30
- - [Why ServiceBridge](#why-servicebridge)
31
- - [Use Cases](#use-cases)
32
- - [Quick Start](#quick-start)
33
42
  - [Install](#install)
34
- - [Runtime Setup](#runtime-setup)
35
- - [End-to-End Example](#end-to-end-example)
36
- - [Platform Features](#platform-features)
37
- - [How It Compares](#how-it-compares)
38
- - [API Reference](#api-reference)
39
- - [HTTP Plugins](#http-plugins)
43
+ - [Why ServiceBridge](#why-servicebridge)
44
+ - [Use cases](#use-cases)
45
+ - [Quick start](#quick-start)
46
+ - [Runtime setup](#runtime-setup)
47
+ - [End-to-end example](#end-to-end-example)
48
+ - [Platform features](#platform-features)
49
+ - [How it compares](#how-it-compares)
50
+ - [API reference](#api-reference)
51
+ - [RPC](#rpc)
52
+ - [Events](#events)
53
+ - [Jobs](#jobs)
54
+ - [Workflows](#workflows)
55
+ - [Streaming](#streaming)
56
+ - [Telemetry](#telemetry)
57
+ - [HTTP](#http)
58
+ - [HTTP plugins](#http-plugins)
40
59
  - [Configuration](#configuration)
41
- - [Environment Variables](#environment-variables)
42
- - [Error Handling](#error-handling)
43
- - [When to Use / When Not to Use](#when-to-use--when-not-to-use)
60
+ - [Error handling](#error-handling)
44
61
  - [FAQ](#faq)
45
- - [Community and Support](#community-and-support)
62
+ - [Community](#community)
46
63
  - [License](#license)
47
64
 
48
65
  ---
49
66
 
50
- ## Why ServiceBridge
51
-
52
- | Problem | Without ServiceBridge | With ServiceBridge |
53
- |---|---|---|
54
- | Service-to-service calls | Istio/Envoy sidecar proxy per pod | **Direct SDK-to-worker gRPC, zero proxy hops** |
55
- | Async messaging | Kafka/RabbitMQ + retry logic + DLQ setup | **Built-in durable events with retry, DLQ, replay** |
56
- | Background jobs | Bull/BullMQ + Redis + cron daemon | **Built-in cron and delayed jobs** |
57
- | Workflow orchestration | Temporal/Conductor cluster + persistence | **Built-in DAG workflows** |
58
- | Distributed tracing | Jaeger/Tempo + OTEL collector + dashboards | **Built-in traces + realtime UI** |
59
- | Service discovery | Consul/etcd + DNS glue | **Built-in registry + health-aware balancing** |
60
- | mTLS | cert-manager + Vault PKI | **Auto-provisioned certs from service key** |
61
-
62
- **Result**: `10 tools → 1 runtime`. One Go binary + PostgreSQL replaces the entire stack.
63
-
64
- ---
65
-
66
- ## Use Cases
67
-
68
- **Microservice communication** — Replace sidecar mesh with direct RPC calls. Get sub-millisecond overhead instead of double proxy hop latency.
69
-
70
- **Event-driven architecture** — Publish durable events with fan-out, retries, DLQ, idempotency, and server-side filtering. No broker infrastructure to manage.
71
-
72
- **Background job scheduling** — Cron jobs, delayed execution, and job-triggered workflows in a single API. No Redis, no separate queue workers.
73
-
74
- **Saga / distributed transactions** — DAG workflows with typed steps (`rpc`, `event`, `event_wait`, `sleep`, child workflow). Compensations and rollbacks via workflow step dependencies.
75
-
76
- **AI agent orchestration** — Stream LLM tokens via realtime trace streams with replay. Orchestrate multi-step AI pipelines as workflows.
77
-
78
- **Full-stack observability** — Every RPC call, event delivery, workflow step, and HTTP request traced automatically. One timeline, one dashboard. Prometheus metrics and Loki-compatible log API included.
79
-
80
- ---
81
-
82
- ## Quick Start
83
-
84
- ### 1. Install
67
+ ## Install
85
68
 
86
- ```bash
69
+ ```sh
87
70
  npm i service-bridge
88
71
  # or
89
72
  bun add service-bridge
90
73
  ```
91
74
 
92
- ### 2. Create a worker (service that handles calls)
75
+ - **Runtime:** Node.js 18+ or any current Bun.
76
+ - **Types:** included, written in TypeScript 5.
77
+ - **Backend:** a running ServiceBridge runtime (gRPC control plane on `:14445`) backed by PostgreSQL 18+. See [Runtime setup](#runtime-setup).
93
78
 
94
79
  ```ts
95
80
  import { ServiceBridge } from "service-bridge";
96
81
 
97
82
  const sb = new ServiceBridge(
98
- process.env.SERVICEBRIDGE_URL ?? "localhost:14445",
99
- process.env.SERVICEBRIDGE_SERVICE_KEY!,
83
+ "localhost:14445", // runtime control-plane address
84
+ "sb_key_...", // bootstrap service key from the runtime
100
85
  );
101
-
102
- sb.rpc.handle("payment.charge", async (payload: { orderId: string; amount: number }) => {
103
- return { ok: true, txId: `tx_${Date.now()}`, orderId: payload.orderId };
104
- });
105
-
106
- await sb.start({ host: "localhost" });
107
86
  ```
108
87
 
109
- ### 3. Call it from another service
110
-
111
- ```ts
112
- import { ServiceBridge } from "service-bridge";
88
+ The third constructor argument is an [options](#configuration) object. The SDK reads **no environment variables** — every knob is a constructor option, so you stay in control of where config comes from.
113
89
 
114
- const sb = new ServiceBridge(
115
- process.env.SERVICEBRIDGE_URL ?? "localhost:14445",
116
- process.env.SERVICEBRIDGE_SERVICE_KEY!,
117
- );
90
+ ### Using an AI coding agent?
118
91
 
119
- const result = await sb.rpc.invoke<{ ok: boolean; txId: string }>("payment.charge", {
120
- orderId: "ord_42",
121
- amount: 4990,
122
- });
92
+ Drop in the official **`servicebridge-node`** skill and your agent (Claude Code, etc.) writes correct ServiceBridge code on the first try — RPC, events, workflows, jobs and HTTP integration, grounded in this exact SDK:
123
93
 
124
- console.log(result.txId); // tx_1711234567890
94
+ ```sh
95
+ npx degit service-bridge/sdk/skills/servicebridge-node .claude/skills/servicebridge-node
125
96
  ```
126
97
 
127
- That's it. No broker, no sidecar, no proxy — direct gRPC call between services.
98
+ Source and details: [`skills/servicebridge-node/`](https://github.com/service-bridge/sdk/tree/main/skills/servicebridge-node).
128
99
 
129
100
  ---
130
101
 
131
- ## Runtime Setup
132
-
133
- The SDK connects to a ServiceBridge runtime. The fastest way to start:
134
-
135
- ```bash
136
- bash <(curl -fsSL https://servicebridge.dev/install.sh)
137
- ```
138
-
139
- This installs ServiceBridge + PostgreSQL via Docker Compose and generates an admin password automatically. After install, the dashboard is at `http://localhost:14444` and the gRPC control plane at `localhost:14445`.
140
-
141
- For manual Docker Compose setup, configuration reference, and all runtime environment variables, see the **[Runtime Setup](../README.md#runtime-setup)** section in the main SDK README.
142
-
143
- ---
144
-
145
- ## End-to-End Example
146
-
147
- A complete order flow: HTTP request → RPC → Event → Event handler with streaming.
148
-
149
- ```ts
150
- import { ServiceBridge } from "service-bridge";
151
-
152
- // --- Payments service (worker) ---
153
-
154
- const payments = new ServiceBridge("localhost:14445", process.env.SERVICEBRIDGE_SERVICE_KEY!);
155
-
156
- payments.rpc.handle("payment.charge", async (payload: { orderId: string; amount: number }, ctx) => {
157
- await ctx?.stream.write({ status: "charging", orderId: payload.orderId }, "progress");
158
-
159
- // ... charge logic ...
160
-
161
- await ctx?.stream.write({ status: "charged" }, "progress");
162
- return { ok: true, txId: `tx_${Date.now()}` };
163
- });
164
-
165
- await payments.start({ host: "localhost" });
166
- ```
167
-
168
- ```ts
169
- // --- Orders service (caller + event publisher) ---
170
-
171
- const orders = new ServiceBridge("localhost:14445", process.env.SERVICEBRIDGE_SERVICE_KEY!);
172
-
173
- // Call payments, then publish event
174
- const charge = await orders.rpc.invoke<{ ok: boolean; txId: string }>("payment.charge", {
175
- orderId: "ord_42",
176
- amount: 4990,
177
- });
178
-
179
- await orders.events.publish("orders.completed", {
180
- orderId: "ord_42",
181
- txId: charge.txId,
182
- }, {
183
- idempotencyKey: "order:ord_42:completed",
184
- headers: { source: "checkout" },
185
- });
186
- ```
187
-
188
- ```ts
189
- // --- Notifications service (event consumer) ---
102
+ ## Why ServiceBridge
190
103
 
191
- const notifications = new ServiceBridge("localhost:14445", process.env.SERVICEBRIDGE_SERVICE_KEY!);
104
+ Microservices rarely fail because of business logic. They fail in the gaps *between* services — the broker that dropped a message, the workflow engine nobody fully understands, the trace that stops at a service boundary, the mesh config that takes a week to debug. Each gap is another system to run, secure and correlate.
192
105
 
193
- notifications.events.handle("orders.*", async (payload, ctx) => {
194
- const body = payload as { orderId: string; txId: string };
195
- await ctx.stream.write({ status: "sending_email", orderId: body.orderId }, "progress");
196
- // ... send email ...
197
- });
106
+ ServiceBridge collapses those gaps into one runtime. Your service talks to a single gRPC endpoint over mTLS; the runtime is the single source of truth for routing, delivery and state.
198
107
 
199
- await notifications.start({ host: "localhost" });
200
- ```
201
-
202
- ```ts
203
- // --- Orchestrate as a workflow ---
204
-
205
- await orders.workflows.run("order.fulfillment", [
206
- { id: "reserve", type: "rpc", service: "inventory", ref: "inventory.reserve" },
207
- { id: "charge", type: "rpc", service: "payment", ref: "payment.charge", deps: ["reserve"] },
208
- { id: "wait_dlv", type: "event_wait", ref: "shipping.delivered", deps: ["charge"] },
209
- { id: "notify", type: "event", ref: "orders.fulfilled", deps: ["wait_dlv"] },
210
- ]);
211
- ```
108
+ | Problem | Without ServiceBridge | With ServiceBridge |
109
+ |---|---|---|
110
+ | Service-to-service calls | gRPC/HTTP plumbing + a mesh for mTLS + retries | `sb.rpc.call("svc", "Method", req)` — mTLS, LB, retries, breakers built in |
111
+ | Reliable async messaging | Stand up and operate a broker | `sb.event.publish(...)` — durable outbox, at-least-once, fan-out, DLQ |
112
+ | Multi-step business processes | A separate workflow engine to learn and host | `sb.workflow.handle(...)` — durable DAGs with compensation and replay |
113
+ | Scheduled work | A cron box or a job scheduler service | `sb.job.handle(...)` — cron / interval / delay, leased and retried |
114
+ | Knowing what happened | Wire up tracing + metrics + logs across N tools | Every hop is traced, measured and logged automatically |
115
+ | Identity & access | Certificates, a mesh policy layer | mTLS from a service key + granular access policy, on by default |
212
116
 
213
- Every step above (RPC, event publish, event delivery, workflow execution) appears in a single trace timeline in the built-in dashboard.
117
+ One binary, one database, one place to look when something breaks.
214
118
 
215
119
  ---
216
120
 
217
- ## Platform Features
218
-
219
- ### Communication
220
- - **Direct RPC** — zero-hop gRPC calls with retries, deadlines, and mTLS identity
221
- - **Durable Events** — fan-out delivery, guaranteed delivery (RabbitMQ-style), at-least-once guarantees, retries, DLQ, replay, idempotency. If a consumer is offline, the message waits in the server-side queue and is dispatched the moment the consumer reconnects — no retry budget consumed while waiting.
222
- - **Realtime Streams** — live chunks with replay for AI/progress/log streaming
223
- - **Service Discovery** — automatic endpoint resolution and round-robin balancing
224
- - **HTTP Middleware** — Express and Fastify instrumentation with automatic trace propagation
225
-
226
- ### Orchestration
227
- - **Workflows** — DAG steps: `rpc`, `event`, `event_wait`, `sleep`, child workflow
228
- - **Jobs** — cron, delayed, and workflow-triggered scheduling
229
-
230
- ### Security
231
- - **TLS by default** — control plane TLS + worker mTLS with gRPC certificate provisioning
232
- - **Access Policy** — service-level caller/target restrictions and RBAC
233
-
234
- ### Observability
235
- - **Unified Tracing** — single trace timeline across HTTP, RPC, events, workflows, and jobs
236
- - **Metrics** — Prometheus-compatible `/metrics` endpoint (30+ metric families)
237
- - **Logs** — structured log ingest with Loki-compatible query API
238
- - **Alerts** — runtime alerts for delivery failures, errors, and service health
239
- - **Dashboard** — realtime web UI for traces, events, workflows, jobs, DLQ, service map, and service keys
240
-
241
- ---
121
+ ## Use cases
242
122
 
243
- ## How It Compares
244
-
245
- | Concern | Istio + Envoy | Dapr | Temporal + Kafka | ServiceBridge |
246
- |---|---|---|---|---|
247
- | RPC data path | Sidecar proxy hop | Sidecar/daemon hop | N/A | **Direct (proxyless)** |
248
- | Service discovery | K8s control plane | Sidecar placement | External registry | **Built-in registry** |
249
- | Durable events + DLQ | External broker | Pub/Sub component | Kafka + consumers | **Built-in** |
250
- | Workflow orchestration | External engine | External engine | Built-in | **Built-in** |
251
- | Job scheduling | External cron/queue | External scheduler | External scheduler | **Built-in** |
252
- | Traces + UI | Jaeger/Tempo + dashboards | OTEL backend + dashboards | Temporal UI | **Built-in** |
253
- | Logs for Grafana | Loki + Promtail pipeline | Log pipeline | Log pipeline | **Built-in Loki API** |
254
- | Metrics | App/exporter setup | App/exporter setup | Multiple exporters | **Built-in `/metrics`** |
255
- | Security model | Mesh PKI + policy | Deployment-dependent mTLS | Mixed | **Service keys + auto mTLS** |
256
- | Operational footprint | Multi-component mesh | Runtime + sidecars | Workflow + broker + DB | **One binary + PostgreSQL** |
123
+ - **Replace a broker** — durable, at-least-once events with fan-out and a dead-letter queue, without operating Kafka or RabbitMQ.
124
+ - **Run sagas / orchestration** — checkout, onboarding, fulfilment as durable workflows with automatic compensation on failure.
125
+ - **Internal RPC backbone**: typed service-to-service calls with load balancing, retries and circuit breakers, secured by mTLS.
126
+ - **Scheduled & delayed work** — nightly rollups, reminders, periodic syncs as leased, retried jobs.
127
+ - **Streaming responses**: token-by-token LLM output or progress feeds over server-side streaming RPC.
128
+ - **Observability for free**: get a full distributed trace across RPC → event → workflow → job without instrumenting by hand.
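
At-least-once delivery, as listed above, implies that a consumer can occasionally receive the same event more than once (for example after a retry), so handlers are usually written to be idempotent. The following is a minimal sketch of that pattern in plain TypeScript, not SDK code: `DeliveredEvent`, `makeIdempotent`, and the in-memory `Set` are illustrative assumptions, and a real service would likely keep the seen ids in durable storage.

```ts
// Hypothetical event shape; the SDK's real handler arguments may differ.
interface DeliveredEvent<T> {
  id: string; // assumed to be unique per published message
  payload: T;
}

type Handler<T> = (event: DeliveredEvent<T>) => void;

// Wraps a handler so that a redelivered event id is processed only once.
function makeIdempotent<T>(
  handler: Handler<T>,
  seen: Set<string> = new Set<string>(),
): Handler<T> {
  return (event) => {
    if (seen.has(event.id)) return; // duplicate delivery: skip side effects
    handler(event);
    seen.add(event.id); // mark only after success, so a failed attempt can be retried
  };
}

const notified: string[] = [];
const onOrderPlaced = makeIdempotent<{ orderId: string }>((event) => {
  notified.push(event.payload.orderId);
});

onOrderPlaced({ id: "msg-1", payload: { orderId: "o-1" } });
onOrderPlaced({ id: "msg-1", payload: { orderId: "o-1" } }); // redelivery, ignored
```

Marking the id only after the handler returns means a handler that throws is retried on the next delivery rather than being silently skipped.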
257
129
 
258
130
  ---
259
131
 
260
- ## API Reference
261
-
262
- ### `ServiceBridge` / `ServiceBridgeService` surface
263
-
264
- Per-instance API for `new ServiceBridge(...)` (implements `ServiceBridgeService`):
265
-
266
- - **Namespaces:** `rpc` (`handle`, `invoke`, `declare`), `events` (`handle`, `publish`, `publishWorker`, `declare`), `jobs` (`run`), `workflows` (`run`, `declare`).
267
- - **Lifecycle:** `start(opts?)`, `stop()`.
268
- - **Workflows:** `cancelWorkflow(traceId)`.
269
- - **HTTP & traces:** `startHttpSpan(opts)`, `registerHttpEndpoint(opts)`, `watchTrace(traceId, opts?)`.
270
- - **Module helpers (exported from `service-bridge`):** `getTraceContext`, `withTraceContext`, `ServiceBridgeError`, `mapGrpcStatus`, `SB`, `SB_MESSAGES`. (`captureConsole` exists internally for log capture but is not part of the public package exports.)
271
-
272
- ### Cross-SDK parity notes
273
-
274
- ServiceBridge keeps the core API shape consistent across Node.js, Go, and Python:
275
- constructor, `rpc` / `events` / `jobs` / `workflows` namespaces, streams, `start`/`stop`, and `ServiceBridgeError`.
132
+ ## Quick start
276
133
 
277
- Constructor-level defaults for `timeout`, `retries`, and `retryDelay` are available
278
- across all three SDKs. Parity differences are naming-only (language idioms):
134
+ Schemas are **file-based**: point the SDK at a `.proto` file (it resolves request/response types from the `service` block) or a `.schema.json` with explicit field numbers. There is no inline schema.
279
135
 
280
- - Constructor TLS overrides: `workerTLS`/`caCert` (Node), `WorkerTLS`/`CACert` (Go), `worker_tls`/`ca_cert` (Python)
281
- - Handler hints: timeout/retryable/concurrency/prefetch are advisory in all SDKs
282
- - Shared `start()` fields across SDKs: host, max in-flight, instance ID, weight, and per-start TLS override
283
-
284
- ### `new ServiceBridge(url, serviceKey, opts?)`
285
-
286
- ```ts
287
- class ServiceBridge {
288
- constructor(url: string, serviceKey: string, opts?: ServiceBridgeOpts);
136
+ ```proto
137
+ // payment.proto
138
+ syntax = "proto3";
139
+ message ChargeRequest { string user_id = 1; int64 amount = 2; }
140
+ message ChargeReply { bool ok = 1; }
141
+ service Payment {
142
+ rpc Charge(ChargeRequest) returns (ChargeReply);
289
143
  }
290
144
  ```
291
145
 
292
- Creates an SDK client instance. Service identity is resolved by the runtime from the sbv2 `serviceKey` (key id). Use `new ServiceBridge(...)` as the **public** entry point from the `service-bridge` package (the constructor delegates to the same internal client setup used by the SDK; a lower-level factory exists in source but is **not** exported from the published entry).
293
-
294
- `ServiceBridgeOpts`:
295
-
296
- | Option | Type | Default | Description |
297
- |---|---|---|---|
298
- | `timeout` | `number` | `30000` | Default hard timeout per `rpc.invoke()` attempt (ms). |
299
- | `retries` | `number` | `3` | Default retry count for `rpc.invoke()`. |
300
- | `retryDelay` | `number` | `300` | Base backoff delay (ms) for `rpc.invoke()`. |
301
- | `discoveryRefreshMs` | `number` | `10000` | Discovery refresh period for endpoint updates. |
302
- | `queueMaxSize` | `number` | `1000` | Max offline queue size for control-plane writes. |
303
- | `queueOverflow` | `"drop-oldest" \| "drop-newest" \| "error"` | `"drop-oldest"` | Overflow strategy for offline queue. |
304
- | `heartbeatIntervalMs` | `number` | `10000` | Base heartbeat period for worker registrations. |
305
- | `captureLogs` | `boolean` | `true` | Forward `console.*` logs to ServiceBridge. |
306
- | `strictOutboundDeclarations` | `boolean` | `false` | When `true`, every outbound `rpc.invoke()` must be preceded by `rpc.declare(fn)` for the resolved target. |
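
The `queueMaxSize` and `queueOverflow` options in the 1.9 table describe a bounded offline queue with three overflow strategies. A rough sketch of how such a queue could behave is shown here; `BoundedQueue` is a hypothetical class, not the SDK's implementation, and it assumes `"drop-newest"` means the incoming item is discarded.

```ts
type Overflow = "drop-oldest" | "drop-newest" | "error";

// Hypothetical bounded FIFO queue illustrating the three overflow strategies.
class BoundedQueue<T> {
  private items: T[] = [];
  private readonly maxSize: number;
  private readonly overflow: Overflow;

  constructor(maxSize: number, overflow: Overflow) {
    if (maxSize < 1) throw new Error("maxSize must be at least 1");
    this.maxSize = maxSize;
    this.overflow = overflow;
  }

  push(item: T): void {
    if (this.items.length < this.maxSize) {
      this.items.push(item);
      return;
    }
    switch (this.overflow) {
      case "drop-oldest":
        this.items.shift(); // evict the head to make room
        this.items.push(item);
        break;
      case "drop-newest":
        break; // assumption: the incoming item is the one discarded
      case "error":
        throw new Error("queue full");
    }
  }

  // Returns all queued items in FIFO order and empties the queue.
  drain(): T[] {
    const out = this.items;
    this.items = [];
    return out;
  }
}

const q = new BoundedQueue<number>(2, "drop-oldest");
q.push(1);
q.push(2);
q.push(3); // evicts 1
```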
307
-
308
- ### Advanced TLS overrides
309
-
310
- | Option | Type | Default | Description |
311
- |---|---|---|---|
312
- | `workerTLS` | `WorkerTLSOpts` | auto | Explicit cert/key/CA for worker mTLS. |
313
- | `caCert` | `string \| Buffer` | from `serviceKey` | Optional control-plane CA override. By default SDK reads CA from sbv2 service key. |
314
-
315
- `WorkerTLSOpts`:
146
+ **Worker**: register the handler. One argument in, one value out.
316
147
 
317
148
  ```ts
318
- type WorkerTLSOpts = {
319
- caCert?: string | Buffer;
320
- cert?: string | Buffer;
321
- key?: string | Buffer;
322
- serverName?: string;
323
- }
324
- ```
149
+ import { ServiceBridge } from "service-bridge";
325
150
 
326
- ---
151
+ const sb = new ServiceBridge("localhost:14445", process.env.PAYMENT_KEY!);
327
152
 
328
- ### `rpc.invoke(fn, payload?, opts?)`
153
+ sb.rpc.handle(
154
+ "Charge",
155
+ async (req: { userId: string; amount: number }) => {
156
+ return { ok: req.amount > 0 };
157
+ },
158
+ { schema: { protoFile: "./payment.proto" } },
159
+ );
329
160
 
330
- ```ts
331
- invoke<T = unknown>(fn: string, payload?: unknown, opts?: RpcOpts): Promise<T>
161
+ await sb.start();
332
162
  ```
333
163
 
334
- Calls a registered RPC handler on another worker. Direct gRPC path, no proxy.
335
-
336
- **Function name** — `fn` is a single **global function name** (the same string passed to `rpc.handle` on the callee), e.g. `payment.charge` or `user.get`. It must be unique in the catalog and **must not contain `/`**.
337
-
338
- `RpcOpts`:
339
-
340
- | Option | Type | Description |
341
- |---|---|---|
342
- | `timeout` | `number` | Call timeout in ms. |
343
- | `retries` | `number` | Retry count override. |
344
- | `retryDelay` | `number` | Base retry delay override. |
345
- | `traceId` | `string` | Explicit trace id. |
346
- | `parentSpanId` | `string` | Explicit parent span id. |
347
- | `mode` | `"direct" \| "proxy"` | Transport mode. `"direct"` (default) connects directly to the worker. `"proxy"` routes through the control plane when direct connection is unavailable. |
164
+ **Caller** — in another process, build a typed client and call it. `sb.client()` reads the `.proto` once, declares every method in its `service` block as an outgoing dependency, loads the schemas, and returns a typed proxy.
348
165
 
349
166
  ```ts
350
- const user = await sb.rpc.invoke<{ id: string; name: string }>("user.get", { id: "u_1" });
351
-
352
- const user2 = await sb.rpc.invoke<{ id: string; name: string }>("user.get", { id: "u_1" }, {
353
- timeout: 5000,
354
- retries: 2,
355
- });
356
- ```
357
-
358
- `rpc.invoke()` is bounded even when a downstream worker is silent:
359
- each attempt has a hard local timeout, retries are finite (`retries + 1` total attempts),
360
- and after the final failed attempt the root RPC span is closed with `error`.
361
-
362
- Retry delay uses exponential backoff: `retryDelay * 2^(attempt-1)`.
167
+ import { ServiceBridge } from "service-bridge";
363
168
 
364
- ---
169
+ const sb = new ServiceBridge("localhost:14445", process.env.ORDERS_KEY!);
170
+ const payment = await sb.client("payment-svc", "./payment.proto");
365
171
 
366
- ### `rpc.declare(fn)`
172
+ await sb.start();
367
173
 
368
- ```ts
369
- declare(fn: string): void
174
+ const res = await payment.Charge({ userId: "u-1", amount: 100 });
175
+ // res.ok === true
370
176
  ```
371
177
 
372
- Declares an outbound RPC dependency for registration metadata. When `strictOutboundDeclarations` is `true`, you must call `rpc.declare(fn)` before `rpc.invoke(fn, ...)` for that function. Does not invoke the remote handler.
178
+ Declare dependencies and build typed clients **before** `start()`; they ride along in the first registration. Calls succeed once `start()` has connected.
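
Why the ordering matters can be pictured with a toy model: if the first registration is a snapshot of whatever has been declared so far, anything declared later is simply not in it. `ToyService` and its methods are hypothetical names for illustration only and do not reflect the SDK's classes or wire format.

```ts
// Toy model only: not the SDK's classes or registration protocol.
class ToyService {
  private declared: string[] = [];
  private registration: string[] | null = null;

  declareDependency(name: string): void {
    this.declared.push(name);
  }

  // Snapshots the dependencies known at this moment, like a first registration message.
  start(): void {
    this.registration = this.declared.slice();
  }

  firstRegistration(): string[] | null {
    return this.registration;
  }
}

const svc = new ToyService();
svc.declareDependency("payment-svc/Charge"); // declared before start(): included
svc.start();
svc.declareDependency("inventory-svc/Reserve"); // declared after start(): missing from the snapshot
```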
373
179
 
374
180
  ---
375
181
 
376
- ### `events.publish(topic, payload?, opts?)`
377
-
378
- ```ts
379
- publish(topic: string, payload?: unknown, opts?: EventOpts): Promise<string>
380
- ```
381
-
382
- Publishes a durable event. Returns `messageId` when online.
182
+ ## Runtime setup
383
183
 
384
- `EventOpts`:
184
+ The SDK needs a running ServiceBridge runtime. Spin one up with the one-line installer:
385
185
 
386
- | Option | Type | Description |
387
- |---|---|---|
388
- | `traceId` | `string` | Explicit trace id. |
389
- | `parentSpanId` | `string` | Explicit parent span id. |
390
- | `idempotencyKey` | `string` | Idempotency key for dedup-safe publishing. |
391
- | `headers` | `Record<string, string>` | Custom metadata headers. |
392
-
393
- ```ts
394
- await sb.events.publish("orders.created", { orderId: "ord_42" }, {
395
- idempotencyKey: "order:ord_42",
396
- headers: { source: "checkout" },
397
- });
186
+ ```sh
187
+ bash <(curl -fsSL https://servicebridge.dev/install.sh)
398
188
  ```
399
189
 
400
- ---
401
-
402
- ### `events.publishWorker(topic, payload?, opts?)`
190
+ It pulls the runtime container, wires it to PostgreSQL 18+, and exposes the gRPC control plane on `:14445` and the dashboard on `:14444`. Open the dashboard, create a service, and copy its **bootstrap service key** — that opaque string is the second argument to `new ServiceBridge(url, key)`.
403
191
 
404
- ```ts
405
- publishWorker(
406
- topic: string,
407
- payload?: unknown,
408
- opts?: { traceId?: string; parentSpanId?: string; headers?: Record<string, string> },
409
- ): Promise<string>
410
- ```
192
+ Each instance authenticates with its key: the SDK calls `Bootstrap.Provision`, receives a short-lived leaf certificate, opens an mTLS gRPC channel and registers. Certificates rotate automatically with overlap (the new session is live before the old one closes), so long-running instances never drop traffic at renewal.
411
193
 
412
- Publishes over the worker session stream (after `start()`). If no worker session is active, the promise is rejected.
194
+ Full self-hosting docs live at **[servicebridge.dev/docs](https://servicebridge.dev/docs)**.
413
195
 
414
196
  ---
415
197
 
416
- ### `events.declare(topic)`
417
-
418
- ```ts
419
- declare(topic: string): void
420
- ```
198
+ ## End-to-end example
421
199
 
422
- Declares an outbound event dependency for registration metadata (does not publish a message).
423
-
424
- ---
425
-
426
- ### `jobs.run(service, fn, opts)` / `jobs.run(target, opts)`
200
+ A small order flow: an HTTP request triggers a workflow that charges a payment, then publishes an event another service consumes, all traced as one tree.
427
201
 
428
202
  ```ts
429
- run(service: string, fn: string, opts: ScheduleOpts & { via: "rpc" }): Promise<string>
430
- run(target: string, opts: ScheduleOpts & { via: "event" | "workflow" }): Promise<string>
431
- ```
432
-
433
- Registers a scheduled or delayed job. Resolves to the registration key: `"${service}/${fn}"` for the RPC overload, or the `target` string for the `event` / `workflow` overload.
434
-
435
- `ScheduleOpts`:
436
-
437
- | Option | Type | Description |
438
- |---|---|---|
439
- | `cron` | `string` | Cron expression. |
440
- | `delay` | `number` | Delay in ms before execution. Backed by `int32` in the proto, so the maximum is 2,147,483,647 ms (~24.8 days). |
441
- | `timezone` | `string` | Timezone for cron execution. |
442
- | `misfire` | `"fire_now" \| "skip"` | Misfire policy. |
443
- | `via` | `"event" \| "rpc" \| "workflow"` | Target type. |
444
- | `retryPolicyJson` | `string` | Retry policy JSON string. |
203
+ import { ServiceBridge } from "service-bridge";
445
204
 
446
- ```ts
447
- await sb.jobs.run("payments", "billing.collect", {
448
- cron: "0 * * * *",
449
- timezone: "UTC",
450
- via: "rpc",
205
+ const sb = new ServiceBridge("localhost:14445", process.env.ORDERS_KEY!);
206
+
207
+ // Outgoing dependencies declared before start().
208
+ sb.service("payment-svc", { rpc: ["Charge"] });
209
+ sb.event.define("order.placed", { protoFile: "./events.proto", input: "OrderPlaced" });
210
+
211
+ // A durable workflow: charge, then announce. Steps run by dependency level.
212
+ sb.workflow.handle("checkout", {
213
+ input: { type: "object", properties: { orderId: { type: "string" } } },
214
+ steps: [
215
+ { id: "charge", type: "call", service: "payment-svc", method: "Charge",
216
+ input: "$.input" },
217
+ { id: "announce", type: "publish", event: "order.placed",
218
+ input: "$.input", waitFor: ["charge"] },
219
+ ],
451
220
  });
452
- ```
453
221
 
454
- ---
222
+ sb.on("connected", ({ serviceName }) => console.log(`up as ${serviceName}`));
455
223
 
456
- ### `workflows.run(name, steps, opts?)` — register DAG
224
+ await sb.start();
457
225
 
458
- TypeScript (single method; behavior depends on the second argument):
459
-
460
- ```ts
461
- run(
462
- nameOrService: string,
463
- stepsOrName: WorkflowStep[] | string,
464
- inputOrOpts?: unknown,
465
- opts?: ExecuteWorkflowOpts,
466
- ): Promise<string | ExecuteWorkflowResult>
467
- ```
468
-
469
- - **Register:** when `stepsOrName` is `WorkflowStep[]`, `nameOrService` is the workflow name, `inputOrOpts` is optional `WorkflowOpts`, and the promise resolves to that name (`string`).
470
- - **Execute:** when `stepsOrName` is a `string`, `nameOrService` is the target **service** name, `stepsOrName` is the workflow name, `inputOrOpts` is the optional execution input, and `opts` is optional `ExecuteWorkflowOpts` (see execute section below).
471
-
472
- Overload as used when registering:
473
-
474
- ```ts
475
- run(name: string, steps: WorkflowStep[], opts?: WorkflowOpts): Promise<string>
226
+ // Kick off a run and wait for the final state.
227
+ const { runId } = await sb.workflow.start("checkout", { orderId: "o-1" });
228
+ const state = await sb.workflow.await(runId);
229
+ console.log("done", state);
476
230
  ```
477
231
 
478
- Registers (or updates) a workflow definition as a DAG of typed steps. Returns the workflow name.
479
-
480
- `WorkflowStep`:
481
-
482
- | Field | Type | Description |
483
- |---|---|---|
484
- | `id` | `string` | Unique step identifier in the DAG. |
485
- | `type` | `"rpc" \| "event" \| "event_wait" \| "sleep" \| "workflow"` | Step execution type. |
486
- | `service` | `string` | Required for `rpc` and `workflow`: target service that owns the function or child workflow. |
487
- | `ref` | `string` | Target name: RPC function, event topic, waited topic, or child workflow name (per `type`). |
488
- | `deps` | `string[]` | Dependencies. Empty/omitted means root step. |
489
- | `if` | `string` | Optional filter expression (step is skipped if false). |
490
- | `timeoutMs` | `number` | Optional timeout for `rpc` and `event_wait` steps. |
491
- | `durationMs` | `number` | Required for `sleep` steps. |
492
-
493
- `WorkflowOpts` (third argument when registering a DAG — shape below; the interface is defined in the SDK but **not** re-exported from the main `service-bridge` package entry, so use an inline object in app code):
232
+ The consuming service just subscribes:
494
233
 
495
234
  ```ts
496
- interface WorkflowOpts {
497
- stateLimitBytes?: number; // default 262144 (256 KB)
498
- stepTimeoutMs?: number; // default 30000 (30 s)
499
- }
500
- ```
501
-
502
- | Field | Type | Default | Description |
503
- |---|---|---|---|
504
- | `stateLimitBytes` | `number` | `262144` (256 KB) | Maximum serialized state size in bytes. |
505
- | `stepTimeoutMs` | `number` | `30000` (30 s) | Default per-step timeout in milliseconds. |
506
-
507
- ```ts
508
- await sb.workflows.run("order.fulfillment", [
509
- { id: "reserve", type: "rpc", service: "inventory", ref: "inventory.reserve" },
510
- { id: "charge", type: "rpc", service: "payment", ref: "payment.charge", deps: ["reserve"] },
511
- { id: "wait_5m", type: "sleep", durationMs: 300_000, deps: ["charge"] },
512
- { id: "notify", type: "event", ref: "orders.fulfilled", deps: ["wait_5m"] },
513
- ]);
235
+ sb.event.handle("order.placed", async (payload) => {
236
+ await sendReceipt(payload);
237
+ });
238
+ await sb.start();
514
239
  ```
515
240
 
516
- With explicit limits:
517
-
518
- ```ts
519
- await sb.workflows.run("checkout.flow", steps, { stepTimeoutMs: 60_000 });
520
- ```
241
+ In the dashboard you see one trace spanning the workflow run, the `Charge` RPC, the `order.placed` publish, and its delivery to the subscriber.
521
242
 
522
243
  ---
523
244
 
524
- ### `workflows.declare(service, name)`
245
+ ## Platform features
525
246
 
526
- ```ts
527
- declare(service: string, name: string): void
528
- ```
247
+ | Area | What you get |
248
+ |---|---|
249
+ | **Communication** | Direct RPC, server-side streaming, durable events, service discovery, full-mesh routing, a live service map |
250
+ | **Orchestration** | Workflows (DAG steps with compensation), sub-workflows, jobs (cron / interval / delayed), bidirectional replay |
251
+ | **Reliability** | At-least-once delivery, retries, DLQ, idempotency, fan-out, session resilience, multi-instance failover, circuit breakers |
252
+ | **Traffic control** | Load balancing, rate limiting, per-definition limits, filter expressions, adaptive performance |
253
+ | **Security** | TLS by default, mTLS identity, auto-provisioned certs from a service key, granular access policy |
254
+ | **Observability** | Unified tracing with propagation, Prometheus-compatible metrics, structured logs, smart alerts |
529
255
 
530
- Declares an outbound workflow dependency for registration metadata (does not start an execution).
256
+ Designed to run up to 1000 services against a single runtime.
531
257
 
532
258
  ---
533
259
 
534
- ### `workflows.run(service, name, input?, opts?)` — execute
260
+ ## How it compares
535
261
 
536
- This is the same `run` method as above when the second argument is the workflow **name** (`string`), not a step array:
262
+ | You'd otherwise reach for | ServiceBridge gives you |
263
+ |---|---|
264
+ | Istio / Linkerd (mesh, mTLS) | mTLS identity + routing + policy, no sidecars |
265
+ | RabbitMQ / Kafka / NATS | Durable events with outbox, fan-out, retries, DLQ |
266
+ | Temporal / Cadence | Durable workflows with compensation, signals, replay |
267
+ | A cron service / Quartz | Leased, retried scheduled jobs |
268
+ | Jaeger / Tempo + Prometheus + Loki | Tracing, metrics and logs, correlated out of the box |
269
+ | gRPC + a service registry | Typed RPC with discovery, LB and breakers |
537
270
 
538
- ```ts
539
- run(service: string, name: string, input?: unknown, opts?: ExecuteWorkflowOpts): Promise<ExecuteWorkflowResult>
540
- ```
271
+ The point isn't that ServiceBridge beats each tool at its own game — it's that you stop running and correlating ten of them.
541
272
 
542
- Starts a workflow execution on demand. The workflow must be registered first via `workflows.run(name, steps)`.
543
- An alternative to scheduling via `jobs.run(target, { via: "workflow", ... })` — triggers the execution immediately.
273
+ ---
544
274
 
545
- | Parameter | Type | Default | Description |
546
- |---|---|---|---|
547
- | `service` / `name` | `string` | required | Target service and workflow name. |
548
- | `input` | `unknown` | `undefined` | Optional JSON-serializable input payload (serialized as JSON for the runtime). |
275
+ ## API reference
549
276
 
550
- Returns `{ traceId }`. Use `traceId` with `watchTrace()` to observe execution in real time.
277
+ The bridge exposes four domains (`sb.rpc`, `sb.event`, `sb.job`, `sb.workflow`) plus `sb.stream()` and `sb.telemetry`. Register handlers and declare dependencies **before** `start()`.
551
278
 
552
- `ExecuteWorkflowOpts` (optional fourth argument):
279
+ ### RPC
553
280
 
554
- | Option | Type | Description |
555
- |---|---|---|
556
- | `traceId` | `string` | Declared on the exported type for API parity; the current Node implementation does **not** forward this field to the control plane (the gRPC request is built without it). Prefer relying on the returned `traceId`. |
281
+ `sb.rpc` is request/response: register handlers, call other services.
557
282
 
558
283
  ```ts
559
- const { traceId } = await sb.workflows.run("users", "user.onboarding", { userId: "u_123" });
560
- ```
561
-
562
- ---
563
-
564
- ### `cancelWorkflow(traceId)`
284
+ // Unary handler: (req) => res
285
+ sb.rpc.handle<ChargeRequest, ChargeReply>(
286
+ "Charge",
287
+ async (req) => ({ ok: req.amount > 0 }),
288
+ { schema: { protoFile: "./payment.proto" } },
289
+ );
565
290
 
566
- ```ts
567
- cancelWorkflow(traceId: string): Promise<void>
291
+ // Server-side streaming handler: (req) => AsyncIterable<chunk>
292
+ sb.rpc.handleStream<GenRequest, Token>(
293
+ "Generate",
294
+ async function* (req) {
295
+ for (const word of req.prompt.split(" ")) yield { token: word };
296
+ },
297
+ { schema: { protoFile: "./gen.proto" } },
298
+ );
568
299
  ```
569
300
 
570
- Cancels a running workflow instance.
301
+ Call through the typed proxy from `sb.client()` (preferred) or the lower-level `sb.rpc.call()`:
571
302
 
572
303
  ```ts
573
- await sb.cancelWorkflow("trace_01HQ...XYZ");
574
- ```
575
-
576
- ---
577
-
578
- ### `rpc.handle(fn, handler, opts?)`
304
+ const res = await payment.Charge({ userId: "u-1", amount: 100 });
579
305
 
580
- ```ts
581
- handle(
582
- fn: string,
583
- handler: (payload: unknown, ctx?: RpcContext) => unknown | Promise<unknown>,
584
- opts?: HandleRpcOpts,
585
- ): ServiceBridgeService
306
+ const res2 = await sb.rpc.call("payment-svc", "Charge",
307
+ { userId: "u-1", amount: 100 },
308
+ { timeout: "5s", idempotencyKey: "order-42" },
309
+ );
586
310
  ```
587
311
 
588
- Registers an RPC handler. Chainable.
589
-
590
- `RpcContext`:
591
-
592
- | Field | Type | Description |
593
- |---|---|---|
594
- | `traceId` | `string` | Current trace ID. |
595
- | `spanId` | `string` | Current span ID. |
596
- | `stream` | `StreamWriter` | Real-time stream writer. |
597
-
598
- `HandleRpcOpts`:
312
+ `CallOpts` apply per call, layered over `callDefaults` from the constructor:
599
313
 
600
- | Option | Type | Description |
601
- |---|---|---|
602
- | `timeout` | `number` | Advisory timeout hint (currently metadata-level, not hard-enforced by runtime). |
603
- | `retryable` | `boolean` | Advisory retry hint (currently metadata-level, not a strict policy switch). |
604
- | `concurrency` | `number` | Advisory concurrency hint (currently not hard-enforced). |
605
- | `schema` | `RpcSchemaOpts` | Inline protobuf schema for binary encode/decode. |
606
- | `allowedCallers` | `string[]` | Allow-list of caller service names. |
607
-
608
- ```ts
609
- sb.rpc.handle("ai.generate", async (payload: { prompt: string }, ctx) => {
610
- await ctx?.stream.write({ token: "Hello" }, "output");
611
- await ctx?.stream.write({ token: " world" }, "output");
612
- return { text: "Hello world" };
613
- });
614
- ```
615
-
616
- `StreamWriter`:
314
+ | `CallOpts` | Type | Default | Description |
315
+ |---|---|---|---|
316
+ | `timeout` | `string` | `"30s"` | Deadline, e.g. `"500ms"`, `"10s"`, `"2m"`. |
317
+ | `requestId` | `string` | random UUID v4 | Correlation id carried to the callee. |
318
+ | `transport` | `"direct" \| "proxy" \| "auto"` | `"auto"` | `direct` = caller→callee mTLS; `proxy` = via the runtime; `auto` = direct when an endpoint is known. |
319
+ | `idempotencyKey` | `string` | none | Opts into runtime-side dedup; replays within the TTL return the cached response. |
320
+ | `retry` | `Partial<RetryOpts>` | exp. backoff | `{ maxAttempts: 3, baseDelayMs: 200, factor: 2, maxDelayMs: 5000, jitter: 0.3 }`. Set `maxAttempts: 1` to disable. |
617
321
 
618
- | Method | Signature | Description |
619
- |---|---|---|
620
- | `write` | `write(data: unknown, key?: string): Promise<void>` | Append a real-time chunk to the trace stream. |
621
- | `end` | `end(key?: string): Promise<void>` | No-op placeholder for API symmetry (lifecycle managed by runtime). |
322
+ Without an `idempotencyKey`, ambiguous failures (`INTERNAL` / `ABORTED` / `UNKNOWN`) are treated as non-retryable so a non-idempotent call is never silently repeated. Schema-version mismatches are filtered at routing time, so blue-green deploys route `v1→v1` and `v2→v2` automatically.
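Read as a standard capped exponential backoff, the default `retry` values in the `CallOpts` table would produce the schedule sketched below. This is an illustration, not the SDK's code: the helper names (`resolveRetry`, `baseDelay`, `jitteredDelay`, `schedule`) are hypothetical, and two things are assumed rather than documented: `maxAttempts` counts the first attempt, and `jitter` is a symmetric fraction of the computed delay.

```ts
// Illustrative sketch only; none of these helpers are part of service-bridge.
interface RetryOpts {
  maxAttempts: number; // assumed to include the first attempt
  baseDelayMs: number;
  factor: number;
  maxDelayMs: number;
  jitter: number; // assumed: fraction of the delay, applied symmetrically
}

// Defaults as listed in the CallOpts table.
const DEFAULT_RETRY: RetryOpts = {
  maxAttempts: 3, baseDelayMs: 200, factor: 2, maxDelayMs: 5000, jitter: 0.3,
};

// Layer per-call overrides over the defaults, mirroring Partial<RetryOpts>.
function resolveRetry(overrides: Partial<RetryOpts> = {}): RetryOpts {
  return { ...DEFAULT_RETRY, ...overrides };
}

// Delay before retry number `retry` (1 = first retry), without jitter.
function baseDelay(retry: number, opts: RetryOpts): number {
  const raw = opts.baseDelayMs * Math.pow(opts.factor, retry - 1);
  return Math.min(opts.maxDelayMs, raw);
}

// Same delay with jitter; `rand` is injectable so the result can be tested.
function jitteredDelay(
  retry: number,
  opts: RetryOpts,
  rand: () => number = Math.random,
): number {
  const d = baseDelay(retry, opts);
  const spread = d * opts.jitter;
  return d - spread + rand() * 2 * spread;
}

// Un-jittered delays between attempts: maxAttempts - 1 entries.
function schedule(opts: RetryOpts): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < opts.maxAttempts; retry++) {
    delays.push(baseDelay(retry, opts));
  }
  return delays;
}

// Defaults: two retries, after 200 ms and then 400 ms (before jitter).
console.log(schedule(DEFAULT_RETRY));
```

Under these assumptions `maxAttempts: 1` yields an empty schedule, which matches the table's note that it disables retries.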
622
323
 
623
- ---
324
+ ### Events
624
325
 
625
- ### `events.handle(pattern, handler, opts?)`
326
+ Durable, at-least-once publish/subscribe. Events hit a local SQLite outbox first, then drain to the runtime, so a publish survives a transient disconnect.
626
327
 
627
328
  ```ts
628
- handle(
629
- pattern: string,
630
- handler: (payload: unknown, ctx: EventContext) => void | Promise<void>,
631
- opts?: HandleEventOpts,
632
- ): ServiceBridgeService
633
- ```
634
-
635
- Registers an event consumer handler. Chainable.
636
-
637
- `HandleEventOpts`:
329
+ // Declare what you publish (same file-based SchemaSpec as RPC).
330
+ sb.event.define("order.placed", { protoFile: "./events.proto", input: "OrderPlaced" });
638
331
 
639
- | Option | Type | Description |
640
- |---|---|---|
641
- | `concurrency` | `number` | Advisory concurrency hint (currently not hard-enforced). |
642
- | `prefetch` | `number` | Advisory prefetch hint (currently not hard-enforced). |
643
- | `retryPolicyJson` | `string` | Retry policy JSON string. |
644
- | `filterExpr` | `string` | Server-side filter expression. |
645
-
646
- The consumer group name is fixed as `<service-key-id>.<pattern>` (derived from your sbv2 key and the pattern string). Registering a second handler for the same pattern throws a duplicate consumer-group error.
647
-
648
- **Delivery guarantee**: once a message is accepted by the runtime, delivery to each consumer group
649
- is guaranteed. If the consumer is offline, the message waits in the server-side queue and is
650
- dispatched automatically the moment the service reconnects and registers its handlers — no retry
651
- budget is consumed while waiting. After `SERVICEBRIDGE_DELIVERY_TTL_DAYS` (default 7) days without
652
- a consumer, the delivery moves to DLQ with reason `delivery_ttl_exceeded`.
653
-
654
- `EventContext` helpers:
655
-
656
- - `ctx.traceId` — current trace ID
657
- - `ctx.spanId` — current span ID
658
- - `ctx.retry(delayMs?)` — ask for redelivery with optional delay
659
- - `ctx.reject(reason)` — move to DLQ immediately, bypassing remaining retries
660
- - `ctx.refs` — metadata (`topic`, `groupName`, `messageId`, `attempt`, `headers`)
661
- - `ctx.stream.write(...)` — append real-time chunks to trace stream
662
-
663
- ```ts
664
- sb.events.handle("orders.*", async (payload, ctx) => {
665
- const body = payload as { orderId?: string };
666
- if (!body.orderId) {
667
- ctx.reject("missing_order_id");
668
- return;
669
- }
670
- await ctx.stream.write({ status: "processing", orderId: body.orderId }, "progress");
332
+ // Subscribe exact name or wildcard ("order.*", "order.#").
333
+ sb.event.handle("order.placed", async (payload) => {
334
+ await fulfil(payload);
671
335
  });
672
- ```
673
336
 
674
- ---
675
-
676
- ### `start(opts?)`
337
+ await sb.start();
677
338
 
678
- ```ts
679
- start(opts?: StartOpts): Promise<void>
339
+ const { eventId } = await sb.event.publish("order.placed", { orderId: "o-1", total: 4200 });
680
340
  ```
681
341
 
682
- Starts the worker gRPC server and registers handlers with the control plane.
683
- The promise resolves once startup/registration is complete (it does not block
684
- the Node.js process). Throws immediately if no handlers are registered (neither `rpc.handle()` nor `events.handle()` have been called).
342
+ Event names must match `^[a-z0-9_-]+(\.[a-z0-9_-]+)*$`; an invalid name throws `InvalidEventNameError`. A full outbox throws `OutboxFullError`.
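To see what the naming pattern accepts, it can be checked locally before publishing. A minimal sketch: `isValidEventName` is a hypothetical helper, not an SDK export; only the regular expression comes from the text above.

```ts
// Illustrative sketch only; isValidEventName is not part of service-bridge.
// The pattern is copied verbatim from the README text.
const EVENT_NAME = /^[a-z0-9_-]+(\.[a-z0-9_-]+)*$/;

function isValidEventName(name: string): boolean {
  return EVENT_NAME.test(name);
}

for (const name of ["order.placed", "Order.Placed", "order..placed", "order.*"]) {
  console.log(name, isValidEventName(name));
}
```

Lower-case dot-separated segments pass; upper-case letters, empty segments and wildcard characters do not. The wildcard forms `order.*` and `order.#` therefore fail this pattern, which suggests they are meant for subscriptions rather than published names.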
685
343
 
686
- `StartOpts`:
687
-
688
- | Option | Type | Description |
344
+ | `PublishOpts` | Type | Description |
689
345
  |---|---|---|
690
- | `host` | `string` | Bind host. Default: `localhost`. Use `0.0.0.0` in Docker/Kubernetes so ServiceBridge can reach the worker. |
691
- | `maxInFlight` | `number` | Max in-flight runtime-originated commands over `OpenWorkerSession`. Default: `128`. |
692
- | `instanceId` | `string` | Stable worker instance identifier. |
693
- | `weight` | `number` | Scheduling/discovery weight hint. |
694
- | `tls` | `WorkerTLSOpts` | Per-start worker TLS override. |
695
-
696
- ```ts
697
- await sb.start({
698
- host: "localhost",
699
- instanceId: process.env.HOSTNAME,
700
- });
701
- ```
702
-
703
- ---
704
-
705
- ### `stop()`
706
-
707
- ```ts
708
- stop(): void
709
- ```
346
+ | `idempotencyKey` | `string` | Dedup key for at-least-once delivery. |
347
+ | `partitionKey` | `string` | Orders delivery within a partition. |
348
+ | `fireAndForget` | `boolean` | Skip the durable wait for the publish ack. |
349
+ | `headers` | `Record<string, string>` | Custom envelope headers. |
350
+ | `occurredAtMs` | `number` | Event time (unix-ms); defaults to now. |
710
351
 
711
- Gracefully stops the worker gRPC server (try graceful shutdown, then force), heartbeats, channels, and SDK internals.
352
+ The runtime delivers at-least-once, retries failures, fans out to every matching subscriber, and dead-letters exhausted messages. The DLQ is operated from the dashboard — the SDK has no DLQ API; make handlers idempotent and throw to signal "retry me".
712
353
 
713
- ---
354
+ ### Jobs
714
355
 
715
- ### `startHttpSpan(opts)`
356
+ Scheduled work: cron, fixed interval, or one-shot delay. The runtime owns the schedule, leasing and retries.
716
357
 
717
358
  ```ts
718
- startHttpSpan(opts: {
719
- method: string;
720
- path: string;
721
- traceId?: string;
722
- parentSpanId?: string;
723
- }): HttpSpan
724
- ```
725
-
726
- Manual HTTP tracing primitive.
727
-
728
- ```ts
729
- const span = sb.startHttpSpan({ method: "GET", path: "/health" });
730
- try {
731
- span.end({ statusCode: 200, success: true });
732
- } catch (e) {
733
- span.end({ success: false, error: String(e) });
734
- }
735
- ```
736
-
737
- ---
359
+ sb.job.handle("nightly-rollup",
360
+ { trigger: { cron: "0 3 * * *", tz: "UTC" } }, // 5-field cron, no seconds
361
+ async (ctx) => { await rollup(ctx.scheduledAt); },
362
+ );
738
363
 
739
- ### `registerHttpEndpoint(opts)`
364
+ sb.job.handle("heartbeat", { trigger: { interval: 30_000 } }, async () => { await ping(); });
740
365
 
741
- ```ts
742
- registerHttpEndpoint(opts: {
743
- method: string;
744
- route: string;
745
- instanceId?: string;
746
- endpoint?: string;
747
- allowedCallers?: string[];
748
- requestSchemaJson?: string;
749
- responseSchemaJson?: string;
750
- transport?: string;
751
- }): Promise<void>
366
+ sb.job.handle("send-reminder",
367
+ { trigger: { delayed: { at: Date.now() + 60_000 } } }, // Date | number | ISO string
368
+ async (ctx) => { await remind(ctx.idempotencyKey); },
369
+ );
752
370
  ```
753
371
 
754
- Registers HTTP route metadata in the ServiceBridge service catalog (stored and sent on the next Reconcile). **Requires a completed worker `start()`**: until `start()` has finished successfully, the call resolves but does not record the route (HTTP middleware may invoke `registerHttpEndpoint` on first request; catalog entries appear only after `start()` has run).
755
-
756
- | Option | Type | Description |
757
- |---|---|---|
758
- | `method` | `string` | HTTP method: `GET`, `POST`, `PUT`, `PATCH`, `DELETE`, etc. |
759
- | `route` | `string` | Route pattern with parameter placeholders, e.g. `"/users/:id"`. |
760
- | `instanceId` | `string` | Present on the public opts type; **not** applied by the current Node client when building `http_routes` for Reconcile (worker identity comes from `start()`). |
761
- | `endpoint` | `string` | Same as above — use `start()` / deployment wiring for the reachable worker base URL. |
762
- | `allowedCallers` | `string[]` | Service names allowed to call (RBAC). |
763
- | `requestSchemaJson` | `string` | JSON schema for request validation metadata. |
764
- | `responseSchemaJson` | `string` | JSON schema for response validation metadata. |
765
- | `transport` | `string` | Present on the public opts type; **not** sent per route in the current Node reconcile payload. |
372
+ The handler receives a `JobHandlerCtx`: `{ jobName, executionId, scheduledAt, localScheduledAt, attempt, idempotencyKey, signal }`.
766
373
 
767
- ```ts
768
- await sb.registerHttpEndpoint({
769
- method: "GET",
770
- route: "/users/:id",
771
- requestSchemaJson: '{"type":"object"}',
772
- transport: "http",
374
+ | `JobOpts` | Type | Default | Description |
375
+ |---|---|---|---|
376
+ | `trigger` | `{cron, tz?} \| {delayed:{at}} \| {interval}` | required | Exactly one trigger; `interval` is in ms. |
377
+ | `catchup` | `"skip" \| "fire_once" \| "fire_all"` | `skip` | What to do for fire times missed during downtime. |
378
+ | `overlap` | `"skip" \| "allow" \| "buffer_one"` | `allow` | Behaviour when a previous run is still in flight. |
379
+ | `deps` | `DeclaredDep[]` | none | Outgoing deps: `{ rpc }`, `{ event }`, `{ workflow }`. |
380
+ | `maxAttempts` / `leaseTtlMs` / `maxConcurrent` / `retry` | — | runtime default | Execution limits and `{ initialMs, maxMs, multiplier, jitter }` retry. |
381
+
382
+ ### Workflows
383
+
384
+ Durable DAGs. Declare the graph once; the runtime executes it, persists state between steps, survives restarts, and compensates on failure or cancel.
385
+
386
+ ```ts
387
+ sb.workflow.handle("checkout", {
388
+ input: { type: "object", properties: { orderId: { type: "string" } } },
389
+ steps: [
390
+ { id: "reserve", type: "call", service: "inventory-svc", method: "Reserve",
391
+ input: "$.input",
392
+ compensate: { service: "inventory-svc", method: "Release", input: "$.reserve" } },
393
+ { id: "charge", type: "call", service: "payment-svc", method: "Charge",
394
+ input: "$.input", waitFor: ["reserve"] },
395
+ { id: "notify", type: "publish", event: "order.placed",
396
+ input: "$.input", waitFor: ["charge"] },
397
+ ],
773
398
  });
774
399
  ```
775
400
 
776
- ---
401
+ Top-level steps run in parallel by default; `waitFor` declares dependencies and defines the execution levels. Step types: `call`, `publish`, `sleep`, `wait_event`, `wait_signal`, `workflow` (sub-workflow), `parallel`, `sequence`, `local`. Inputs are JSON-path expressions (`"$.input"`, `"$.reserve.id"`) over the accumulated run state.
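One way to picture "execution levels" is to give each step a level one higher than the deepest step it waits for, so that steps sharing a level can run in parallel. The sketch below is an assumption about how such levels could be derived from `waitFor`, not the runtime's scheduler; `StepRef` and `executionLevels` are hypothetical names.

```ts
// Illustrative sketch only; not the runtime's scheduling algorithm.
interface StepRef {
  id: string;
  waitFor?: string[];
}

// Group step ids by dependency depth: level 0 has no waitFor,
// level n waits on at least one step at level n - 1.
function executionLevels(steps: StepRef[]): string[][] {
  const byId = new Map<string, StepRef>();
  for (const s of steps) byId.set(s.id, s);

  const level = new Map<string, number>();
  const visiting = new Set<string>();

  const resolve = (id: string): number => {
    const known = level.get(id);
    if (known !== undefined) return known;
    const step = byId.get(id);
    if (!step) throw new Error(`unknown step "${id}"`);
    if (visiting.has(id)) throw new Error(`cycle at step "${id}"`);
    visiting.add(id);
    let depth = 0;
    for (const dep of step.waitFor ?? []) {
      depth = Math.max(depth, resolve(dep) + 1);
    }
    visiting.delete(id);
    level.set(id, depth);
    return depth;
  };

  const levels: string[][] = [];
  for (const s of steps) {
    const depth = resolve(s.id);
    if (!levels[depth]) levels[depth] = [];
    levels[depth].push(s.id);
  }
  return levels;
}

// The checkout graph from the example above: three sequential levels.
console.log(executionLevels([
  { id: "reserve" },
  { id: "charge", waitFor: ["reserve"] },
  { id: "notify", waitFor: ["charge"] },
]));
```

Dropping `waitFor` from `charge` would put `reserve` and `charge` on the same level, which is the "parallel by default" case.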
777
402
 
778
- ### `watchTrace(traceId, opts?)`
403
+ Driving a run:
779
404
 
780
405
  ```ts
781
- watchTrace(traceId: string, opts?: WatchTraceOpts): AsyncIterable<TraceStreamEvent>
782
- ```
783
-
784
- Subscribes to a trace stream with replay and live updates. `traceId` is the stream
785
- identifier used by `ctx.stream.write(...)`.
786
-
787
- `WatchTraceOpts`:
788
-
789
- | Option | Type | Default | Description |
790
- |---|---|---|---|
791
- | `key` | `string` | `""` | Stream key filter (`""` = all keys). |
792
- | `fromSequence` | `number` | `0` | Replay from sequence cursor. |
406
+ const { runId } = await sb.workflow.start("checkout", { orderId: "o-1" });
793
407
 
794
- `TraceStreamEvent`:
408
+ const state = await sb.workflow.await(runId); // block until terminal
409
+ const snap = await sb.workflow.query(runId); // { status, state, steps: [...] }
410
+ await sb.workflow.signal(runId, "approval", { ok: 1 }); // resume a wait_signal step
411
+ await sb.workflow.cancel(runId); // compensate in reverse
412
+ const { runId: forked } = await sb.workflow.replay(runId, { fromStepId: "charge" });
413
+ ```
795
414
 
796
- | Field | Type | Description |
797
- |---|---|---|
798
- | `type` | `"chunk" \| "trace_complete"` | Event kind. |
799
- | `traceId` | `string` | Trace identifier being watched. |
800
- | `key` | `string` | Stream lane key. |
801
- | `sequence` | `number` | Monotonic sequence number. |
802
- | `data` | `unknown` | JSON-decoded chunk payload. |
803
- | `traceStatus` | `string \| undefined` | Final status on `trace_complete`. |
415
+ Use `sb.workflow.query()` for the snapshot; there is no `getStatus`. `start()` with no permission throws `WorkflowAccessDeniedError`; an unknown name throws `WorkflowNotFoundError`; signalling/cancelling a finished run throws `WorkflowTerminalError`.
804
416
 
805
- Behavior:
417
+ ### Streaming
806
418
 
807
- - Auto-reconnect with exponential backoff (`500ms` to `5000ms`) on retryable stream failures.
808
- - Deduplicates by `sequence` across reconnects.
809
- - Enforces strict JSON for `type="chunk"` payloads (non-JSON chunk terminates stream with fatal error).
810
- - Enforces internal queue limit `256`; overflow is fatal (consumer must drain promptly).
419
+ Server-side streaming is a first-class RPC shape. Register with `sb.rpc.handleStream`, consume with `sb.stream()` (or the typed proxy, which auto-detects `returns (stream T)` methods).
811
420
 
812
421
  ```ts
813
- for await (const evt of sb.watchTrace(traceId, { key: "output", fromSequence: 0 })) {
814
- if (evt.type === "chunk") {
815
- process.stdout.write(String((evt.data as { token?: string }).token ?? ""));
816
- }
817
- if (evt.type === "trace_complete") break;
422
+ for await (const chunk of sb.stream("gen-svc", "Generate", { prompt: "write a haiku" })) {
423
+ process.stdout.write(chunk.token);
818
424
  }
819
425
  ```
820
426
 
821
- ---
427
+ Breaking the loop (`break`/`return`) tears down the gRPC stream end to end. Streams are single-pick — never retried — by design.
822
428
 
823
- ### Trace Utilities
429
+ ### Telemetry
824
430
 
825
- #### `getTraceContext()`
431
+ Telemetry flows automatically: every RPC, event, job, workflow step and HTTP request emits an operation span and propagates the trace across hops. Add your own through `sb.telemetry`; anything emitted inside a handler nests under that handler's trace.
826
432
 
827
433
  ```ts
828
- getTraceContext(): TraceCtx | undefined
829
- ```
830
-
831
- Returns the current async-local trace context.
434
+ import { Channel, UserSubOp } from "service-bridge";
832
435
 
833
- ```ts
834
- import { getTraceContext } from "service-bridge";
835
-
836
- const tc = getTraceContext();
837
- if (tc) {
838
- console.log(tc.traceId, tc.spanId);
436
+ const op = sb.telemetry.startOp({
437
+ channel: Channel.USER, kind: UserSubOp, subject: "reprice-cart", businessKey: cartId,
438
+ });
439
+ try {
440
+ await reprice(cartId);
441
+ op.end(/* Status.SUCCESS */);
442
+ } catch (err) {
443
+ op.end(/* Status.ERROR */, String(err));
444
+ throw err;
839
445
  }
840
- ```
841
446
 
842
- #### `withTraceContext(ctx, fn)`
843
-
844
- ```ts
845
- withTraceContext<T>(ctx: TraceCtx, fn: () => T): T
447
+ sb.telemetry.log.info("cart repriced", { cartId, items: 7 }); // also sb.logger
448
+ sb.telemetry.counter("carts_repriced_total").inc();
449
+ sb.telemetry.gauge("queue_depth").set(42);
450
+ sb.telemetry.histogram("reprice_ms", "ms").observe(12.5);
846
451
  ```
847
452
 
848
- Runs a function inside an explicit trace context.
453
+ `startOp()` returns a handle whose `.end(status, message?)` closes the span. Anything emitted before `start()` buffers in an in-memory ring and drains once connected.
849
454
 
850
- ```ts
851
- import { withTraceContext } from "service-bridge";
455
+ ### HTTP
852
456
 
853
- withTraceContext({ traceId: "trace-1", spanId: "span-1" }, async () => {
854
- await sb.events.publish("audit.log", { action: "user.login" });
855
- });
856
- ```
457
+ ServiceBridge does **not** proxy your business HTTP. You run your own server; the integration discovers your routes, publishes them to the Service Map, and wraps each request in a trace span so HTTP stitches into the same trace as the RPCs and events it triggers. See [HTTP plugins](#http-plugins).
458
+
459
+ Useful read accessors after `start()`: `sb.identity()` (current session identity or `null`), `sb.serviceMap()` (live registry: visible methods, instances, endpoints), `sb.policyEvaluation()` (the runtime's current access-policy verdict).
857
460
 
858
461
  ---
859
462
 
860
- ## HTTP Plugins
463
+ ## HTTP plugins
861
464
 
862
- ### Express (`service-bridge/express`)
465
+ Each integration is a subpath import with an optional peer dependency.
863
466
 
864
- ```bash
865
- npm install express
866
- ```
467
+ **Express** — `service-bridge/express`:
867
468
 
868
469
  ```ts
869
470
  import express from "express";
870
471
  import { ServiceBridge } from "service-bridge";
871
- import { servicebridgeMiddleware, registerExpressRoutes } from "service-bridge/express";
472
+ import { attachExpress } from "service-bridge/express";
872
473
 
873
- const sb = new ServiceBridge(process.env.SERVICEBRIDGE_URL!, process.env.SERVICEBRIDGE_SERVICE_KEY!);
874
474
  const app = express();
475
+ app.post("/orders", (req, res) => res.json({ ok: true }));
875
476
 
876
- app.use(servicebridgeMiddleware({
877
- client: sb,
878
- excludePaths: ["/health"],
879
- autoRegister: true,
880
- }));
881
-
882
- app.get("/users/:id", async (req, res) => {
883
- const user = await req.servicebridge.rpc.invoke("user.get", { id: req.params.id });
884
- res.json(user);
885
- });
886
- ```
887
-
888
- #### `servicebridgeMiddleware(options)`
477
+ const sb = new ServiceBridge("localhost:14445", KEY);
478
+ await sb.start();
889
479
 
890
- ```ts
891
- servicebridgeMiddleware(options: {
892
- client: ServiceBridgeService;
893
- excludePaths?: string[];
894
- propagateTraceHeader?: boolean;
895
- autoRegister?: boolean;
896
- }): express.RequestHandler
480
+ app.listen(3000, () => attachExpress(app, sb, { port: 3000 }));
897
481
  ```
898
482
 
899
- - Attaches `req.servicebridge`, `req.traceId`, `req.spanId`
900
- - Starts/ends HTTP span automatically
901
- - Optionally sets `x-trace-id` response header
902
- - Optionally auto-registers route pattern in catalog on first hit
903
-
904
- #### `registerExpressRoutes(app, client, opts?)`
905
-
906
- Eager route catalog registration without waiting for first request.
907
-
908
- ```ts
909
- await registerExpressRoutes(app, sb, {
910
- endpoint: "http://10.0.0.5:3000",
911
- allowedCallers: ["api-gateway"],
912
- excludePaths: ["/health"],
913
- });
914
- ```
915
-
916
- ---
917
-
918
- ### Fastify (`service-bridge/fastify`)
919
-
920
- ```bash
921
- npm install fastify
922
- ```
483
+ **Fastify** - `service-bridge/fastify`:
923
484
 
924
485
  ```ts
925
486
  import Fastify from "fastify";
926
487
  import { ServiceBridge } from "service-bridge";
927
- import { servicebridgePlugin, wrapHandler } from "service-bridge/fastify";
488
+ import { sbFastify } from "service-bridge/fastify";
928
489
 
929
- const sb = new ServiceBridge(process.env.SERVICEBRIDGE_URL!, process.env.SERVICEBRIDGE_SERVICE_KEY!);
930
490
  const app = Fastify();
491
+ const sb = new ServiceBridge("localhost:14445", KEY);
931
492
 
932
- await app.register(servicebridgePlugin, {
933
- client: sb,
934
- excludePaths: ["/health"],
935
- autoRegister: true,
936
- });
493
+ app.post("/orders", async () => ({ ok: true }));
494
+ await app.register(sbFastify, { sb }); // discovers routes + endpoint in onListen
937
495
 
938
- app.get("/users/:id", wrapHandler(async (request, reply) => {
939
- const user = await request.servicebridge.rpc.invoke("user.get", {
940
- id: (request.params as any).id,
941
- });
942
- return reply.send(user);
943
- }));
496
+ await sb.start();
497
+ await app.listen({ port: 3000 });
944
498
  ```
945
499
 
946
- #### `servicebridgePlugin(fastify, options)`
500
+ **Hono** - `service-bridge/hono`:
947
501
 
948
502
  ```ts
949
- servicebridgePlugin(fastify, {
950
- client,
951
- excludePaths?,
952
- propagateTraceHeader?,
953
- autoRegister?,
954
- register?: {
955
- instanceId?,
956
- endpoint?,
957
- allowedCallers?,
958
- excludePaths?,
959
- },
960
- })
961
- ```
962
-
963
- - Decorates `request.servicebridge`, `request.traceId`, `request.spanId`
964
- - Traces HTTP lifecycle via hooks
965
- - Auto-registers routes on `onRoute` before traffic
966
-
967
- #### `wrapHandler(handler)`
968
-
969
- Runs a Fastify handler inside the current trace context so downstream SDK calls inherit the trace.
970
-
971
- ---
972
-
973
- ### Trace Utilities (HTTP Plugins)
503
+ import { Hono } from "hono";
504
+ import { ServiceBridge } from "service-bridge";
505
+ import { attachHono } from "service-bridge/hono";
974
506
 
975
- #### `extractTraceFromHeaders(headers)`
507
+ const app = new Hono();
508
+ app.post("/orders", (c) => c.json({ ok: true }));
976
509
 
977
- ```ts
978
- import { extractTraceFromHeaders } from "service-bridge/express";
979
- // or
980
- import { extractTraceFromHeaders } from "service-bridge/fastify";
510
+ const sb = new ServiceBridge("localhost:14445", KEY);
511
+ await sb.start();
981
512
 
982
- const { traceId, parentSpanId } = extractTraceFromHeaders(req.headers);
513
+ attachHono(app, sb, { port: 3000 }); // Hono doesn't own the socket — pass the port
514
+ Bun.serve({ port: 3000, fetch: app.fetch });
983
515
  ```
984
516
 
985
- Extracts trace context from HTTP headers. Supports W3C `traceparent`, `x-trace-id`/`x-span-id` headers, and generates random IDs as fallback. Useful for custom HTTP framework integrations (Hono, Koa, etc.).
517
+ `attachExpress`/`attachHono` take `{ port, host? }`; `sbFastify` reads the bound address itself. Host defaults to the bound socket, falling back to `127.0.0.1`. Attaching before `start()` is safe: the endpoint rides along in the first registration.
986
518
 
987
519
  ---
988
520
 
989
521
  ## Configuration
990
522
 
991
- ### TLS behavior
992
-
993
- - Worker transport is TLS-only.
994
- - Control plane is TLS-only. Trust source is embedded into sbv2 service key by default.
995
- - Embedded/explicit CA PEM is validated with strict x509 parsing.
996
- - If `workerTLS` is not provided, SDK auto-provisions worker certs via gRPC `ProvisionWorkerCertificate`.
997
- - `workerTLS.cert` and `workerTLS.key` must be provided together.
998
- - `start({ tls })` overrides global `workerTLS` for a specific worker instance.
523
+ All configuration lives on the `ServiceBridge` constructor — `new ServiceBridge(url, key, options)`. The SDK reads no environment variables; you decide where `url`, `key` and options come from. Every option is optional.
999
524
 
1000
- ### Offline queue behavior
1001
-
1002
- When the control plane is unavailable, SDK queues write operations (`events.publish`, `jobs.run`, `workflows.run`, telemetry writes).
1003
-
1004
- - Queue size: `queueMaxSize` (default: 1000)
1005
- - Overflow policy: `queueOverflow` (default: `"drop-oldest"`)
1006
- - Return values for queued writes may be empty strings until flushed
1007
-
1008
- ---
1009
-
1010
- ## Environment Variables
1011
-
1012
- The SDK requires values you pass into `new ServiceBridge(...)`. Common setup:
1013
-
1014
- | Variable | Required | Example | Description |
525
+ | Option | Type | Default | Description |
1015
526
  |---|---|---|---|
1016
- | `SERVICEBRIDGE_URL` | yes | `localhost:14445` | gRPC control plane URL |
1017
- | `SERVICEBRIDGE_SERVICE_KEY` | yes | `sbv2.<id>.<secret>.<ca>` | Service authentication key (sbv2 only) |
1018
-
1019
- ```ts
1020
- const sb = new ServiceBridge(
1021
- process.env.SERVICEBRIDGE_URL ?? "localhost:14445",
1022
- process.env.SERVICEBRIDGE_SERVICE_KEY!,
1023
- );
1024
- ```
1025
-
1026
- ---
1027
-
1028
- ## Error Handling
1029
-
1030
- `ServiceBridgeError` is exported for normalized SDK and runtime errors.
1031
-
1032
- ```ts
1033
- import { ServiceBridge, ServiceBridgeError } from "service-bridge";
1034
-
1035
- try {
1036
- await sb.rpc.invoke("payment.charge", { orderId: "ord_1" });
1037
- } catch (e) {
1038
- if (e instanceof ServiceBridgeError) {
1039
- console.error(e.component, e.operation, e.severity, e.retryable, e.code, e.grpcStatus);
1040
- }
1041
- throw e;
1042
- }
1043
- ```
1044
-
1045
- | Field | Type | Description |
1046
- |---|---|---|
1047
- | `component` | `string` | SDK subsystem (for example, `"rpc"` or `"event"`). |
1048
- | `operation` | `string` | Operation that failed. |
1049
- | `severity` | `"fatal" \| "retriable" \| "ignorable"` | Error classification. |
1050
- | `retryable` | `boolean` | Whether retry is recommended (`true` when `severity === "retriable"`). |
1051
- | `code` | `ServiceBridgeErrorCode` | Stable SDK error id (`SB_*`). |
1052
- | `grpcStatus` | `number \| undefined` | gRPC status code when the error came from gRPC. |
1053
- | `cause` | `unknown` | Original underlying error. |
1054
-
1055
- ---
1056
-
1057
- ## When to Use / When Not to Use
1058
-
1059
- ### ServiceBridge is a good fit when you:
1060
-
1061
- - Have **3+ microservices** that need to communicate via RPC, events, or both
1062
- - Want **RPC + events + workflows + jobs** without managing separate infrastructure for each
1063
- - Need **end-to-end tracing** across all communication patterns in one timeline
1064
- - Want to **eliminate sidecar proxies** and reduce operational overhead
1065
- - Need **durable event delivery** with retry, DLQ, and replay without running a broker
1066
- - Are building **AI/LLM pipelines** and need realtime streaming with replay
1067
-
1068
- ### Consider alternatives when you:
1069
-
1070
- - Run a **single monolith** with no service decomposition plans
1071
- - Need **ultra-high-throughput event streaming** (100K+ msg/s sustained) — Kafka is purpose-built for this
1072
- - Need a **full API gateway** with rate limiting, auth plugins, and request transformation — use Kong/Envoy Gateway
1073
- - Already have a **mature Istio/Linkerd mesh** and only need traffic management (no events/workflows/jobs)
1074
- - Need **multi-region event replication** — ServiceBridge currently targets single-region deployments
1075
-
1076
- ---
1077
-
1078
- ## v2 Session API
1079
-
1080
- `session_v2.ts` implements the new Enterprise Session Protocol: a channel-based bidi stream with an 8-state FSM, adaptive heartbeat, and credit-based flow control. It mirrors the Go and Python SDKs.
1081
-
1082
- ### Session lifecycle (8 FSM states)
1083
-
1084
- ```
1085
- connecting → handshaking → ready ↔ active
1086
- ↘ suspended → (reconnect)
1087
- ↘ draining → closed
1088
- ↘ fenced (permanent)
1089
- ```
1090
-
1091
- | State | Description |
1092
- |-----------|----------|
1093
- | `connecting` | TCP/TLS connection is being established |
1094
- | `handshaking` | Hello sent, waiting for HelloAck |
1095
- | `ready` | HelloAck received, no commands running |
1096
- | `active` | Commands are in flight |
1097
- | `suspended` | Heartbeat missed 2+ times |
1098
- | `draining` | Graceful shutdown initiated |
1099
- | `fenced` | Server sent GOAWAY_FENCED; the session is closed permanently |
1100
- | `closed` | Connection closed |
1101
-
1102
- ### Quick start
1103
-
1104
- ```typescript
1105
- import { V2SessionClient, validateV2Config } from 'service-bridge';
1106
-
1107
- const cfg = {
1108
- serverAddress: 'localhost:9090',
1109
- instanceId: 'worker-1',
1110
- zone: 'us-east-1a',
1111
- transportMode: 'direct' as const,
1112
- maxInflight: 64,
1113
- };
1114
-
1115
- validateV2Config(cfg);
1116
- const session = new V2SessionClient(cfg);
1117
-
1118
- // Send Hello on connect
1119
- const hello = session.getHelloFields();
1120
-
1121
- // Handle HelloAck from the server
1122
- session.onHelloAck({
1123
- sessionId: 'sess-abc',
1124
- resumeToken: 'token-xyz',
1125
- epoch: 1n,
1126
- resumed: false,
1127
- resumeFromSeq: 0n,
1128
- replayedCommands: 0,
1129
- reconciledResults: 0,
1130
- heartbeatIntervalMs: 10_000,
1131
- heartbeatTimeoutMs: 30_000,
1132
- initialPermits: 64,
1133
- maxPermits: 128,
1134
- effectiveTransportMode: 'direct',
527
+ | `advertise` | `{ host, port } \| false` | `127.0.0.1` on a free port (with a warning) | Inbound RPC server address. Pass `{ host, port }` in containers / k8s; `false` for caller-only instances that never serve RPC. |
528
+ | `callDefaults` | `CallOpts` | `{}` | Default `CallOpts` merged under every `sb.rpc.call()` / `sb.stream()`. |
529
+ | `failOnPolicyViolation` | `boolean` | `false` | When `true`, any policy warning at registration makes `start()` surface a `disconnected` event and stop. Otherwise warnings are logged and emitted as `policy_violation`. |
530
+ | `telemetry` | `boolean` | `true` | Emit ops/logs/metrics to the runtime. `false` fully disables the telemetry transport. |
531
+ | `telemetryRingSize` | `number` | `262144` (256 KiB) | Byte budget for the in-memory ops ring buffer. |
532
+ | `dataDir` | `string` | `"./.servicebridge"` | Directory for the local SQLite event outbox. |
533
+ | `maxOutboxRows` | `number` | `100000` | Outbox rows before `publish` back-pressures with `OutboxFullError`. |
534
+ | `eventsDrainerBatch` | `number` | `50` | Outbox rows drained to the runtime per tick. |
535
+ | `eventsMaxInFlight` | `number` | `32` | Max concurrent inbound events processed by subscribers. |
536
+ | `payloadMaxBytes` | `number` | `65536` | Per-direction cap on captured payload bytes. |
537
+ | `reconnectIntervalMs` | `number` | `3000` | Delay between reconnect attempts. |
538
+ | `reconnectAttempts` | `number` | `3` | Reconnect attempts before giving up. `0` = unlimited. |
539
+
540
+ ```ts
541
+ const sb = new ServiceBridge("localhost:14445", KEY, {
542
+ advertise: { host: process.env.POD_IP!, port: 50051 },
543
+ callDefaults: { timeout: "10s" },
544
+ reconnectAttempts: 0,
545
+ dataDir: "/var/lib/myservice/sb",
1135
546
  });
1136
-
1137
- console.log(session.state); // 'ready'
1138
-
1139
- // Incoming command
1140
- const accepted = session.onCommandReceived(1n, 'cmd-001');
1141
- if (!accepted) {
1142
- // backpressure — permits = 0
1143
- }
1144
-
1145
- // Command completed
1146
- session.onCommandCompleted(1n, 'cmd-001');
1147
547
  ```
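
Because the SDK reads no environment variables itself, the wiring is left to the application. The following is a minimal sketch of one way to do it, not part of the package: `loadBridgeConfig` is a hypothetical helper, `SERVICEBRIDGE_URL` and `SERVICEBRIDGE_SERVICE_KEY` are the names the 1.x README suggested, `POD_IP` comes from the example above, and `SB_ADVERTISE_PORT` and `SB_DATA_DIR` are assumed names chosen for illustration.

```typescript
// Hypothetical helper (not exported by service-bridge): maps an environment
// record onto the constructor arguments described in the options table.
interface BridgeConfig {
  url: string;
  key: string;
  options: {
    advertise?: { host: string; port: number } | false;
    dataDir?: string;
  };
}

function loadBridgeConfig(env: Record<string, string | undefined>): BridgeConfig {
  const key = env.SERVICEBRIDGE_SERVICE_KEY;
  if (!key) throw new Error("SERVICEBRIDGE_SERVICE_KEY is required");

  const options: BridgeConfig["options"] = {};

  // Only advertise an explicit address when the pod IP is known;
  // otherwise the SDK's documented default applies.
  if (env.POD_IP) {
    const port = Number(env.SB_ADVERTISE_PORT ?? "50051"); // assumed variable name
    if (!Number.isInteger(port) || port <= 0 || port > 65535) {
      throw new Error(`invalid advertise port: ${env.SB_ADVERTISE_PORT}`);
    }
    options.advertise = { host: env.POD_IP, port };
  }

  if (env.SB_DATA_DIR) options.dataDir = env.SB_DATA_DIR; // assumed variable name

  return { url: env.SERVICEBRIDGE_URL ?? "localhost:14445", key, options };
}
```

With a helper like this, the call site would reduce to `const { url, key, options } = loadBridgeConfig(process.env)` followed by `new ServiceBridge(url, key, options)`. Failing fast on a missing key keeps misconfiguration out of `start()`.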
1148
548
 
1149
- ### Adaptive heartbeat (EWMA RTT)
1150
-
1151
- ```typescript
1152
- import { AdaptiveHeartbeatV2 } from 'service-bridge';
549
+ ### Lifecycle
1153
550
 
1154
- const hb = new AdaptiveHeartbeatV2(10_000, 30_000);
1155
-
1156
- // Pong received
1157
- hb.onPong(25); // rttMs
1158
-
1159
- // Next interval (adapts to the EWMA RTT)
1160
- const nextMs = hb.nextIntervalMs();
1161
-
1162
- // Missed pong: ping faster
1163
- const missCount = hb.onMiss();
1164
- if (missCount >= 2) {
1165
- // reconnect
1166
- }
1167
- ```
1168
-
1169
- Algorithm: the base interval is `intervalMs / 3`; on misses it is divided by `2^miss` (min 2s); with a stable RTT < 50ms it is doubled (max 30s).
1170
-
1171
- ### Credit-based flow control
551
+ ```ts
552
+ const sb = new ServiceBridge("localhost:14445", KEY);
1172
553
 
1173
- ```typescript
1174
- import { FlowControlStateV2 } from 'service-bridge';
554
+ sb.service("payment-svc", { rpc: ["Charge"] }); // what you call
555
+ sb.rpc.handle("Ship", shipHandler, { schema: { protoFile: "./ship.proto" } }); // what you serve
1175
556
 
1176
- const fc = new FlowControlStateV2(64, 1, 128);
557
+ sb.on("connected", ({ serviceName }) => console.log(`connected as ${serviceName}`));
558
+ sb.on("reconnecting", ({ attempt, reason }) => console.warn(`reconnecting #${attempt}: ${reason}`));
559
+ sb.on("disconnected", ({ reason }) => console.error(`disconnected: ${reason}`));
560
+ sb.on("policy_violation", (v) => console.warn(`policy: ${v.declaration} ${v.value} — ${v.reason}`));
1177
561
 
1178
- if (fc.tryConsume()) {
1179
- // dispatch command
1180
- }
562
+ await sb.start();
1181
563
 
1182
- // Command completed: return the permit
1183
- fc.release(1);
1184
-
1185
- // Server sent a FlowControlUpdate
1186
- fc.setWindow(32);
564
+ process.on("SIGTERM", async () => { await sb.stop(); process.exit(0); });
1187
565
  ```
1188
566
 
1189
- ### Reconnect and resume
1190
-
1191
- `BackoffV2` implements exponential backoff with full jitter (base=100ms, max=30s). On reconnect, `getHelloFields()` automatically includes `resumeToken`, `epoch`, `lastReceivedSeq`, `lastSentSeq`, and `completedCommandIds`, so the server resumes the session from the right position.
1192
-
1193
- ```typescript
1194
- import { BackoffV2 } from 'service-bridge';
567
+ ---
1195
568
 
1196
- const backoff = new BackoffV2();
569
+ ## Error handling
1197
570
 
1198
- while (true) {
1199
- if (backoff.isCircuitOpen()) break; // 10+ consecutive failures
571
+ Typed errors are exported from the package root, so you can `catch` precisely:
1200
572
 
1201
- const delayMs = backoff.next();
1202
- await new Promise(r => setTimeout(r, delayMs));
573
+ ```ts
574
+ import {
575
+ RpcAccessDeniedError,
576
+ WorkflowAccessDeniedError,
577
+ InvalidEventNameError,
578
+ OutboxFullError,
579
+ ServiceBridgeError,
580
+ } from "service-bridge";
1203
581
 
1204
- try {
1205
- // reconnect...
1206
- backoff.reset();
1207
- } catch {
1208
- backoff.recordFail();
582
+ try {
583
+ await payment.Charge({ userId: "u-1", amount: 100 });
584
+ } catch (err) {
585
+ if (err instanceof RpcAccessDeniedError) {
586
+ // denied by access policy: { serviceName, methodName, reason }
587
+ } else if (err instanceof ServiceBridgeError) {
588
+ // connection / provisioning failure with a typed .code
1209
589
  }
1210
590
  }
1211
591
  ```
1212
592
 
1213
- ### ConfigPush: dynamic transport configuration
1214
-
1215
- The server can send a `ConfigPush` with new routing rules at any time:
1216
-
1217
- ```typescript
1218
- session.onConfigPush({
1219
- defaultMode: 'direct',
1220
- serviceOverrides: {
1221
- 'payment-svc': { mode: 'proxy', fallbackPolicy: 'fallback_to_direct' },
1222
- },
1223
- functionOverrides: {
1224
- 'payment.charge': { mode: 'proxy', timeoutMs: 5000 },
1225
- },
1226
- });
1227
-
1228
- // Resolve the transport for a function
1229
- const mode = session.resolveTransportMode('payment.charge'); // 'proxy'
1230
- ```
1231
-
1232
- ### All session events
1233
-
1234
- | Method | Description |
1235
- |-------|----------|
1236
- | `getHelloFields()` | Fields to send in Hello (initial + resume) |
1237
- | `onHelloAck(ack)` | Handles HelloAck from the server |
1238
- | `onCommandReceived(seq, id)` | Incoming command; returns `false` under backpressure |
1239
- | `onCommandCompleted(seq, id)` | Command completed; releases a permit |
1240
- | `onPermitGrant(n)` | Server granted `n` permits |
1241
- | `onFlowControlUpdate(size, reason)` | Server changed the window size |
1242
- | `onPong(rttMs)` | Pong received; updates the EWMA |
1243
- | `onHeartbeatMiss()` | Pong timeout; returns `true` → `suspended` |
1244
- | `onDrain(reason, deadlineMs)` | Initiate graceful drain |
1245
- | `onGoaway(code, reason)` | GoawaySignal from the server |
1246
- | `onConfigPush(config)` | Apply a new transport configuration |
1247
- | `resolveTransportMode(fnName)` | Get the transport mode for a function |
1248
- | `stop()` | Close the session immediately |
1249
-
1250
- ### Exported classes and types
1251
-
1252
- | Symbol | Kind | Description |
1253
- |--------|-----|----------|
1254
- | `V2SessionClient` | class | Main session client |
1255
- | `AdaptiveHeartbeatV2` | class | EWMA RTT heartbeat controller |
1256
- | `FlowControlStateV2` | class | Credit-based flow control |
1257
- | `BackoffV2` | class | Exponential backoff + circuit |
1258
- | `PositionTrackerV2` | class | Tracker for seq/completed IDs |
1259
- | `ConfigPushStateV2` | class | Dynamic configuration manager |
1260
- | `validateV2Config` | function | Validates the config; throws `Error` |
1261
- | `V2Config` | interface | Session configuration |
1262
- | `SessionStateV2` | type | Union of the 8 FSM states |
1263
- | `TransportMode` | type | `'direct' \| 'proxy'` |
1264
- | `HelloAckV2` | interface | HelloAck data from the server |
1265
- | `TransportConfigV2` | interface | ConfigPush payload |
1266
- | `ReconcileRequestV2` | interface | Declarative worker registration request |
1267
- | `FunctionDeclarationV2` | interface | Function declaration for Reconcile |
1268
- | `ConsumerGroupDeclarationV2` | interface | Consumer group declaration |
1269
- | `HttpRouteDeclarationV2` | interface | HTTP route declaration |
1270
- | `JobDeclarationV2` | interface | Job declaration |
1271
- | `WorkflowDeclarationV2` | interface | Workflow declaration |
1272
- | `SubscribeRequestV2` | interface | Registry subscribe request |
1273
- | `WorkerEndpointV2` | interface | Worker endpoint info |
1274
- | `IssueCertificateRequestV2` | interface | Certificate request |
1275
- | `IssueCertificateResponseV2` | interface | Certificate response |
1276
- | `CircuitBreakerConfigV2` | interface | Circuit breaker config |
1277
- | `ZoneConfigV2` | interface | Zone-aware config |
1278
- | `ServiceTransportOverride` | interface | Per-service transport override |
1279
- | `FunctionTransportOverride` | interface | Per-function transport override |
1280
- | `ResumeState` | interface | Reconnect resume state |
1281
-
1282
- From the main entry `service-bridge`, types such as `ServiceBridgeOpts`, `RpcOpts`, `EventOpts`, `HandleRpcOpts`, `HandleEventOpts`, `ScheduleOpts`, `StartOpts`, `ExecuteWorkflowOpts`, and `ExecuteWorkflowResult` are available. The DAG shapes **`WorkflowStep` and `WorkflowOpts` are documented above but are not named exports** from that entry — use inline object literals (inference from `workflows.run(...)`) unless your toolchain exposes deep paths. Example:
1283
-
1284
- ```ts
1285
- import type {
1286
- RpcContext,
1287
- EventContext,
1288
- StreamWriter,
1289
- TraceCtx,
1290
- RetryPolicy,
1291
- ServiceBridgeErrorSeverity,
1292
- } from "service-bridge";
1293
- ```
593
+ | Error | Thrown when |
594
+ |---|---|
595
+ | `RpcAccessDeniedError` | An RPC call is denied by access policy. Also fires a `policy_violation` event. |
596
+ | `WorkflowAccessDeniedError` | A workflow `start()` is denied by access policy. |
597
+ | `WorkflowNotFoundError` | Starting a workflow name the runtime doesn't know. |
598
+ | `WorkflowTerminalError` | Signalling/cancelling a run that already finished. |
599
+ | `InvalidEventNameError` | Publishing/defining an event whose name fails the naming rule. |
600
+ | `OutboxFullError` | The local event outbox is at `maxOutboxRows` (back-pressure). |
601
+ | `ServiceBridgeError` | Connection / provisioning failures; carries a typed `.code` (retryable ones drive auto-reconnect). |
1294
602
 
1295
603
  ---
1296
604
 
1297
605
  ## FAQ
1298
606
 
1299
- **How does ServiceBridge handle service failures?**
1300
- RPC calls have configurable retries with exponential backoff and hard per-attempt timeouts, so a silent downstream service cannot keep a call pending forever. Events are durable (PostgreSQL-backed) with at-least-once delivery per consumer group. Failed deliveries are retried according to policy, then moved to DLQ. Workflows track step state and can be resumed.
607
+ **Do I have to use Protobuf?** You point handlers at a `.proto` file or a `.schema.json` with explicit field numbers. Both are file-based; there is no inline schema.
1301
608
 
1302
- **Is there vendor lock-in?**
1303
- ServiceBridge is self-hosted. The runtime is a single Go binary + PostgreSQL. SDK calls map to standard patterns (RPC, pub/sub, cron) — migrating away means replacing SDK calls with equivalent library calls.
609
+ **Does ServiceBridge proxy my HTTP traffic?** No. You run your own Express / Fastify / Hono server. The integration only discovers your routes for the Service Map and adds trace spans — your HTTP path is untouched.
1304
610
 
1305
- **How does tracing work without an OTEL collector?**
1306
- The SDK automatically reports trace spans for every RPC call, event publish/delivery, workflow step, and HTTP request. The runtime stores traces in PostgreSQL and serves them via the built-in dashboard and a Loki-compatible API for Grafana integration.
611
+ **How do I scale horizontally?** Run as many SDK instances as you like; the runtime load-balances RPC across live instances and fails over automatically. The runtime itself is a single source of truth backed by PostgreSQL.
1307
612
 
1308
- **Can I use ServiceBridge alongside existing infrastructure?**
1309
- Yes. You can adopt incrementally — start with RPC between two services, add events later, then workflows. ServiceBridge doesn't require replacing your existing broker or mesh all at once.
613
+ **What happens on a transient disconnect?** Published events sit in the local SQLite outbox and drain when the connection returns. The SDK auto-reconnects (configurable) and rotates certs with overlap so live instances don't drop traffic.
1310
614
 
1311
- **What happens when the control plane is down?**
1312
- In-flight direct RPC calls continue working (they go service-to-service, not through the control plane). New discovery lookups, event publishes, and telemetry writes are queued in the SDK offline queue and flushed when the control plane recovers.
615
+ **Where do I see traces, metrics and the DLQ?** In the runtime dashboard on `:14444`. Tracing, metrics and the dead-letter queue are operated there.
1313
616
 
1314
- **What databases does the runtime support?**
1315
- PostgreSQL 16+. The runtime uses PostgreSQL for all persistence: traces, events, workflows, jobs, service registry, and configuration.
617
+ **Node or Bun?** Both. Node 18+ or any current Bun. Bun-native APIs are used where available.
1316
618
 
1317
619
  ---
1318
620
 
1319
- ## Community and Support
621
+ ## Community
1320
622
 
1321
- - Website: [servicebridge.dev](https://servicebridge.dev)
1322
- - GitHub: [github.com/service-bridge](https://github.com/service-bridge)
1323
- - SDK monorepo: [README.md](../README.md)
623
+ - **Website & docs:** [servicebridge.dev](https://servicebridge.dev) · [servicebridge.dev/docs](https://servicebridge.dev/docs)
624
+ - **SDK umbrella repo (all languages):** [github.com/service-bridge/sdk](https://github.com/service-bridge/sdk)
625
+ - **Runtime:** [github.com/servicebridge2/runtime](https://github.com/servicebridge2/runtime)
1324
626
 
1325
- ---
1326
-
1327
- ## License
1328
-
1329
- Free for non-commercial use. Commercial use requires a separate license. See [LICENSE](../LICENSE).
1330
-
1331
- Copyright (c) 2026 Eugene Surkov.
627
+ This is an alpha release (`2.0.0-alpha`). The API is stabilising — issues and feedback are welcome.
1332
628
 
1333
629
  ---
1334
630
 
1335
- ## Keywords
631
+ ## License
1336
632
 
1337
- service-bridge · servicebridge · npm install service-bridge · npm i service-bridge · bun add service-bridge · Node.js SDK · TypeScript SDK · JavaScript microservices · RPC · gRPC · event bus · event-driven · distributed tracing · workflow orchestration · background jobs · cron · mTLS · service mesh · service discovery · zero sidecar · Istio alternative · Envoy alternative · RabbitMQ alternative · Temporal alternative · Jaeger alternative · PostgreSQL · Docker · Kubernetes · DLQ · dead letter queue · saga · distributed transactions · AI agent orchestration · Express middleware · Fastify middleware · HTTP middleware · observability · Prometheus · tracing · service catalog · durable events · retries · idempotency · auto mTLS · runtime dashboard · production ready · microservice communication
633
+ Licensed under the **MIT License**; see [LICENSE](./LICENSE). Free for any use, including commercial; you only need to keep the copyright and license notice (attribution to esurkov1 <esurkovv@yandex.ru>).