service-bridge 1.8.5-dev.49 → 2.0.0-alpha

package/README.md CHANGED
@@ -1,1290 +1,623 @@
1
- <!-- keywords: service-bridge servicebridge npm install service-bridge Node.js TypeScript JavaScript microservices RPC gRPC event-bus event-driven distributed-tracing workflow orchestration background-jobs cron mTLS service-mesh service-discovery distributed-systems zero-sidecar Istio-alternative RabbitMQ-alternative Temporal-alternative Jaeger-alternative PostgreSQL Docker Kubernetes DLQ dead-letter-queue saga distributed-transactions AI-agent-orchestration Express Fastify HTTP-middleware observability Prometheus tracing service-catalog async-messaging durable-events retries idempotency auto-mTLS runtime-dashboard production-ready bun deno -->
1
+ <!--
2
+ Keywords: service-bridge, ServiceBridge, microservices, Node.js SDK, TypeScript SDK, Bun,
3
+ gRPC, mTLS, RPC framework, durable events, pub/sub, message broker alternative, RabbitMQ alternative,
4
+ workflow engine, saga, orchestration, Temporal alternative, job scheduler, cron, distributed tracing,
5
+ observability, OpenTelemetry alternative, Jaeger alternative, service mesh alternative, Istio alternative,
6
+ self-hosted, PostgreSQL, Express, Fastify, Hono, circuit breaker, idempotency, retries, load balancing.
7
+ -->
2
8
 
3
9
  # service-bridge
4
10
 
5
- [![npm version](https://img.shields.io/npm/v/service-bridge?color=cb3837&logo=npm)](https://www.npmjs.com/package/service-bridge)
6
- [![License](https://img.shields.io/badge/License-Free%20%2F%20Commercial-blue)](../LICENSE)
7
- [![TypeScript](https://img.shields.io/badge/TypeScript-5%2B-3178c6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
8
- [![Node](https://img.shields.io/badge/Node.js-18%2B-339933?logo=node.js&logoColor=white)](https://nodejs.org/)
11
+ [![npm version](https://img.shields.io/npm/v/service-bridge?color=cb3837&label=npm)](https://www.npmjs.com/package/service-bridge)
12
+ [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
13
+ [![TypeScript](https://img.shields.io/badge/types-included-3178c6.svg)](https://www.typescriptlang.org/)
14
+ [![Node](https://img.shields.io/badge/node-%E2%89%A518-339933.svg)](https://nodejs.org/)
9
15
 
10
- **The unified runtime for microservices — RPC, events, workflows, jobs, and observability without a mesh.**
16
+ **The Node.js / Bun SDK for [ServiceBridge](https://servicebridge.dev) — RPC, durable events, workflows, jobs, streaming and full observability over one self-hosted runtime. No broker. No sidecar. No tracing stack. Just one Go binary plus PostgreSQL.**
11
17
 
12
- Node.js SDK for [ServiceBridge](https://servicebridge.dev): direct gRPC between workers with zero proxy hops; durable events, background jobs, long-running workflows, and distributed tracing in one SDK. Full mesh with circuit breakers, auto mTLS, and hot-reload transport config. Node, Python, and Go share one identical API. One Go runtime and PostgreSQL.
18
+ You declare what your service handles and what it calls. ServiceBridge does the rest: provisions an mTLS identity, opens the connection, registers your handlers, and routes every RPC, event, job and workflow step with tracing, metrics and access policy built in.
13
19
 
14
20
  ```
15
- ┌─────────────────────────────────────────────────────────────────┐
16
- │  BEFORE: 10 moving parts                                        │
17
- │  Istio · Envoy · RabbitMQ · Temporal · Jaeger · Consul ·        │
18
- │  cert-manager · Alertmanager · cron · custom glue               │
19
- └─────────────────────────────────────────────────────────────────┘
20
-
21
- ┌─────────────────────────────────────────────────────────────────┐
22
- │  AFTER: ServiceBridge + PostgreSQL                              │
23
- │  RPC · Events · Workflows · Jobs · Tracing · mTLS · Dashboard   │
24
- │  One SDK · One runtime · Zero sidecars                          │
25
- └─────────────────────────────────────────────────────────────────┘
21
+         BEFORE                                       AFTER
22
+
23
+ ┌─────────────────────┐
24
+ │ Istio + Envoy       │ ← mesh / mTLS
25
+ │ RabbitMQ / Kafka    │ ← events            ┌──────────────────────┐
26
+ │ Temporal            │ ← workflows         │                      │
27
+ │ a cron scheduler    │ ← jobs              │    ServiceBridge     │
28
+ │ gRPC plumbing       │ ← RPC         ═══►  │  runtime (1 binary)  │
29
+ │ Jaeger / Tempo      │ ← tracing           │          +           │
30
+ │ Prometheus wiring   │ ← metrics           │      PostgreSQL      │
31
+ │ Loki                │ ← logs              │                      │
32
+ │ a load balancer     │ ← LB / retries      └──────────────────────┘
33
+ │ service registry    │ ← discovery
34
+ └─────────────────────┘
35
+    10+ moving parts                             2 things to run
26
36
  ```
27
37
 
28
- ## Table of Contents
38
+ ---
39
+
40
+ ## Table of contents
29
41
 
30
- - [Why ServiceBridge](#why-servicebridge)
31
- - [Use Cases](#use-cases)
32
- - [Quick Start](#quick-start)
33
42
  - [Install](#install)
34
- - [Runtime Setup](#runtime-setup)
35
- - [End-to-End Example](#end-to-end-example)
36
- - [Platform Features](#platform-features)
37
- - [How It Compares](#how-it-compares)
38
- - [API Reference](#api-reference)
39
- - [HTTP Plugins](#http-plugins)
43
+ - [Why ServiceBridge](#why-servicebridge)
44
+ - [Use cases](#use-cases)
45
+ - [Quick start](#quick-start)
46
+ - [Runtime setup](#runtime-setup)
47
+ - [End-to-end example](#end-to-end-example)
48
+ - [Platform features](#platform-features)
49
+ - [How it compares](#how-it-compares)
50
+ - [API reference](#api-reference)
51
+ - [RPC](#rpc)
52
+ - [Events](#events)
53
+ - [Jobs](#jobs)
54
+ - [Workflows](#workflows)
55
+ - [Streaming](#streaming)
56
+ - [Telemetry](#telemetry)
57
+ - [HTTP](#http)
58
+ - [HTTP plugins](#http-plugins)
40
59
  - [Configuration](#configuration)
41
- - [Environment Variables](#environment-variables)
42
- - [Error Handling](#error-handling)
43
- - [When to Use / When Not to Use](#when-to-use--when-not-to-use)
60
+ - [Error handling](#error-handling)
44
61
  - [FAQ](#faq)
45
- - [Community and Support](#community-and-support)
62
+ - [Community](#community)
46
63
  - [License](#license)
47
64
 
48
65
  ---
49
66
 
50
- ## Why ServiceBridge
51
-
52
- | Problem | Without ServiceBridge | With ServiceBridge |
53
- |---|---|---|
54
- | Service-to-service calls | Istio/Envoy sidecar proxy per pod | **Direct SDK-to-worker gRPC, zero proxy hops** |
55
- | Async messaging | Kafka/RabbitMQ + retry logic + DLQ setup | **Built-in durable events with retry, DLQ, replay** |
56
- | Background jobs | Bull/BullMQ + Redis + cron daemon | **Built-in cron and delayed jobs** |
57
- | Workflow orchestration | Temporal/Conductor cluster + persistence | **Built-in DAG workflows** |
58
- | Distributed tracing | Jaeger/Tempo + OTEL collector + dashboards | **Built-in traces + realtime UI** |
59
- | Service discovery | Consul/etcd + DNS glue | **Built-in registry + health-aware balancing** |
60
- | mTLS | cert-manager + Vault PKI | **Auto-provisioned certs from service key** |
61
-
62
- **Result**: `10 tools → 1 runtime`. One Go binary + PostgreSQL replaces the entire stack.
63
-
64
- ---
65
-
66
- ## Use Cases
67
-
68
- **Microservice communication** — Replace sidecar mesh with direct RPC calls. Get sub-millisecond overhead instead of double proxy hop latency.
69
-
70
- **Event-driven architecture** — Publish durable events with fan-out, retries, DLQ, idempotency, and server-side filtering. No broker infrastructure to manage.
71
-
72
- **Background job scheduling** — Cron jobs, delayed execution, and job-triggered workflows in a single API. No Redis, no separate queue workers.
73
-
74
- **Saga / distributed transactions** — DAG workflows with typed steps (`rpc`, `event`, `event_wait`, `sleep`, child workflow). Compensations and rollbacks via workflow step dependencies.
75
-
76
- **AI agent orchestration** — Stream LLM tokens via realtime trace streams with replay. Orchestrate multi-step AI pipelines as workflows.
77
-
78
- **Full-stack observability** — Every RPC call, event delivery, workflow step, and HTTP request traced automatically. One timeline, one dashboard. Prometheus metrics and Loki-compatible log API included.
79
-
80
- ---
81
-
82
- ## Quick Start
67
+ ## Install
83
68
 
84
- ### 1. Install
85
-
86
- ```bash
69
+ ```sh
87
70
  npm i service-bridge
88
71
  # or
89
72
  bun add service-bridge
90
73
  ```
91
74
 
92
- ### 2. Create a worker (service that handles calls)
75
+ - **Runtime:** Node.js 18+ or any current Bun.
76
+ - **Types:** included, written in TypeScript 5.
77
+ - **Backend:** a running ServiceBridge runtime (gRPC control plane on `:14445`) backed by PostgreSQL 18+. See [Runtime setup](#runtime-setup).
93
78
 
94
79
  ```ts
95
- import { servicebridge } from "service-bridge";
80
+ import { ServiceBridge } from "service-bridge";
96
81
 
97
- const sb = servicebridge(
98
- process.env.SERVICEBRIDGE_URL ?? "localhost:14445",
99
- process.env.SERVICEBRIDGE_SERVICE_KEY!,
82
+ const sb = new ServiceBridge(
83
+ "localhost:14445", // runtime control-plane address
84
+ "sb_key_...", // bootstrap service key from the runtime
100
85
  );
101
-
102
- sb.handleRpc("payment.charge", async (payload: { orderId: string; amount: number }) => {
103
- return { ok: true, txId: `tx_${Date.now()}`, orderId: payload.orderId };
104
- });
105
-
106
- await sb.serve({ host: "localhost" });
107
86
  ```
108
87
 
109
- ### 3. Call it from another service
110
-
111
- ```ts
112
- import { servicebridge } from "service-bridge";
113
-
114
- const sb = servicebridge(
115
- process.env.SERVICEBRIDGE_URL ?? "localhost:14445",
116
- process.env.SERVICEBRIDGE_SERVICE_KEY!,
117
- );
118
-
119
- const result = await sb.rpc<{ ok: boolean; txId: string }>("payments", "payment.charge", {
120
- orderId: "ord_42",
121
- amount: 4990,
122
- });
123
-
124
- console.log(result.txId); // tx_1711234567890
125
- ```
126
-
127
- That's it. No broker, no sidecar, no proxy — direct gRPC call between services.
88
+ The third constructor argument is an [options](#configuration) object. The SDK reads **no environment variables** — every knob is a constructor option, so you stay in control of where config comes from.
128
89
 
129
90
  ---
130
91
 
131
- ## Runtime Setup
92
+ ## Why ServiceBridge
132
93
 
133
- The SDK connects to a ServiceBridge runtime. The fastest way to start:
94
+ Microservices rarely fail because of business logic. They fail in the gaps *between* services — the broker that dropped a message, the workflow engine nobody fully understands, the trace that stops at a service boundary, the mesh config that takes a week to debug. Each gap is another system to run, secure and correlate.
134
95
 
135
- ```bash
136
- bash <(curl -fsSL https://servicebridge.dev/install.sh)
137
- ```
96
+ ServiceBridge collapses those gaps into one runtime. Your service talks to a single gRPC endpoint over mTLS; the runtime is the single source of truth for routing, delivery and state.
138
97
 
139
- This installs ServiceBridge + PostgreSQL via Docker Compose and generates an admin password automatically. After install, the dashboard is at `http://localhost:14444` and the gRPC control plane at `localhost:14445`.
98
+ | Problem | Without ServiceBridge | With ServiceBridge |
99
+ |---|---|---|
100
+ | Service-to-service calls | gRPC/HTTP plumbing + a mesh for mTLS + retries | `sb.rpc.call("svc", "Method", req)` — mTLS, LB, retries, breakers built in |
101
+ | Reliable async messaging | Stand up and operate a broker | `sb.event.publish(...)` — durable outbox, at-least-once, fan-out, DLQ |
102
+ | Multi-step business processes | A separate workflow engine to learn and host | `sb.workflow.handle(...)` — durable DAGs with compensation and replay |
103
+ | Scheduled work | A cron box or a job scheduler service | `sb.job.handle(...)` — cron / interval / delay, leased and retried |
104
+ | Knowing what happened | Wire up tracing + metrics + logs across N tools | Every hop is traced, measured and logged automatically |
105
+ | Identity & access | Certificates, a mesh policy layer | mTLS from a service key + granular access policy, on by default |
140
106
 
141
- For manual Docker Compose setup, configuration reference, and all runtime environment variables, see the **[Runtime Setup](../README.md#runtime-setup)** section in the main SDK README.
107
+ One binary, one database, one place to look when something breaks.
142
108
 
143
109
  ---
144
110
 
145
- ## End-to-End Example
146
-
147
- A complete order flow: HTTP request → RPC → Event → Event handler with streaming.
148
-
149
- ```ts
150
- import { servicebridge } from "service-bridge";
151
-
152
- // --- Payments service (worker) ---
111
+ ## Use cases
153
112
 
154
- const payments = servicebridge("localhost:14445", process.env.SERVICEBRIDGE_SERVICE_KEY!);
113
+ - **Replace a broker** — durable, at-least-once events with fan-out and a dead-letter queue, without operating Kafka or RabbitMQ.
114
+ - **Run sagas / orchestration** — checkout, onboarding, fulfilment as durable workflows with automatic compensation on failure.
115
+ - **Internal RPC backbone** — typed service-to-service calls with load balancing, retries and circuit breakers, secured by mTLS.
116
+ - **Scheduled & delayed work** — nightly rollups, reminders, periodic syncs as leased, retried jobs.
117
+ - **Streaming responses** — token-by-token LLM output or progress feeds over server-side streaming RPC.
118
+ - **Observability for free** — get a full distributed trace across RPC → event → workflow → job without instrumenting by hand.
155
119
 
156
- payments.handleRpc("payment.charge", async (payload: { orderId: string; amount: number }, ctx) => {
157
- await ctx?.stream.write({ status: "charging", orderId: payload.orderId }, "progress");
120
+ ---
158
121
 
159
- // ... charge logic ...
122
+ ## Quick start
160
123
 
161
- await ctx?.stream.write({ status: "charged" }, "progress");
162
- return { ok: true, txId: `tx_${Date.now()}` };
163
- });
124
+ Schemas are **file-based**: point the SDK at a `.proto` file (it resolves request/response types from the `service` block) or a `.schema.json` with explicit field numbers. There is no inline schema.
164
125
 
165
- await payments.serve({ host: "localhost" });
126
+ ```proto
127
+ // payment.proto
128
+ syntax = "proto3";
129
+ message ChargeRequest { string user_id = 1; int64 amount = 2; }
130
+ message ChargeReply { bool ok = 1; }
131
+ service Payment {
132
+ rpc Charge(ChargeRequest) returns (ChargeReply);
133
+ }
166
134
  ```
167
135
 
168
- ```ts
169
- // --- Orders service (caller + event publisher) ---
170
-
171
- const orders = servicebridge("localhost:14445", process.env.SERVICEBRIDGE_SERVICE_KEY!);
172
-
173
- // Call payments, then publish event
174
- const charge = await orders.rpc<{ ok: boolean; txId: string }>("payments", "payment.charge", {
175
- orderId: "ord_42",
176
- amount: 4990,
177
- });
178
-
179
- await orders.event("orders.completed", {
180
- orderId: "ord_42",
181
- txId: charge.txId,
182
- }, {
183
- idempotencyKey: "order:ord_42:completed",
184
- headers: { source: "checkout" },
185
- });
186
- ```
136
+ **Worker** — register the handler. One argument in, one value out.
187
137
 
188
138
  ```ts
189
- // --- Notifications service (event consumer) ---
139
+ import { ServiceBridge } from "service-bridge";
190
140
 
191
- const notifications = servicebridge("localhost:14445", process.env.SERVICEBRIDGE_SERVICE_KEY!);
141
+ const sb = new ServiceBridge("localhost:14445", process.env.PAYMENT_KEY!);
192
142
 
193
- notifications.handleEvent("orders.*", async (payload, ctx) => {
194
- const body = payload as { orderId: string; txId: string };
195
- await ctx.stream.write({ status: "sending_email", orderId: body.orderId }, "progress");
196
- // ... send email ...
197
- });
198
-
199
- await notifications.serve({ host: "localhost" });
200
- ```
143
+ sb.rpc.handle(
144
+ "Charge",
145
+ async (req: { userId: string; amount: number }) => {
146
+ return { ok: req.amount > 0 };
147
+ },
148
+ { schema: { protoFile: "./payment.proto" } },
149
+ );
201
150
 
202
- ```ts
203
- // --- Orchestrate as a workflow ---
204
-
205
- await orders.workflow("order.fulfillment", [
206
- { id: "reserve", type: "rpc", service: "inventory", ref: "stock.reserve" },
207
- { id: "charge", type: "rpc", service: "payments", ref: "payment.charge", deps: ["reserve"] },
208
- { id: "wait_dlv", type: "event_wait", ref: "shipping.delivered", deps: ["charge"] },
209
- { id: "notify", type: "event", ref: "orders.fulfilled", deps: ["wait_dlv"] },
210
- ]);
151
+ await sb.start();
211
152
  ```
212
153
 
213
- Every step above (RPC, event publish, event delivery, workflow execution) appears in a single trace timeline in the built-in dashboard.
154
+ **Caller**: in another process, build a typed client and call it. `sb.client()` reads the `.proto` once, declares every method in its `service` block as an outgoing dependency, loads the schemas, and returns a typed proxy.
214
155
 
215
- ---
216
-
217
- ## Platform Features
218
-
219
- ### Communication
220
- - **Direct RPC** — zero-hop gRPC calls with retries, deadlines, and mTLS identity
221
- - **Durable Events** — fan-out delivery, guaranteed delivery (RabbitMQ-style), at-least-once guarantees, retries, DLQ, replay, idempotency. If a consumer is offline, the message waits in the server-side queue and is dispatched the moment the consumer reconnects — no retry budget consumed while waiting.
222
- - **Realtime Streams** — live chunks with replay for AI/progress/log streaming
223
- - **Service Discovery** — automatic endpoint resolution and round-robin balancing
224
- - **HTTP Middleware** — Express and Fastify instrumentation with automatic trace propagation
225
-
226
- ### Orchestration
227
- - **Workflows** — DAG steps: `rpc`, `event`, `event_wait`, `sleep`, child workflow
228
- - **Jobs** — cron, delayed, and workflow-triggered scheduling
156
+ ```ts
157
+ import { ServiceBridge } from "service-bridge";
229
158
 
230
- ### Security
231
- **TLS by default**: control plane TLS + worker mTLS with gRPC certificate provisioning
232
- - **Access Policy** — service-level caller/target restrictions and RBAC
159
+ const sb = new ServiceBridge("localhost:14445", process.env.ORDERS_KEY!);
160
+ const payment = await sb.client("payment-svc", "./payment.proto");
233
161
 
234
- ### Observability
235
- - **Unified Tracing** — single trace timeline across HTTP, RPC, events, workflows, and jobs
236
- - **Metrics** — Prometheus-compatible `/metrics` endpoint (30+ metric families)
237
- - **Logs** — structured log ingest with Loki-compatible query API
238
- - **Alerts** — runtime alerts for delivery failures, errors, and service health
239
- - **Dashboard** — realtime web UI for traces, events, workflows, jobs, DLQ, service map, and service keys
162
+ await sb.start();
240
163
 
241
- ---
164
+ const res = await payment.Charge({ userId: "u-1", amount: 100 });
165
+ // res.ok === true
166
+ ```
242
167
 
243
- ## How It Compares
244
-
245
- | Concern | Istio + Envoy | Dapr | Temporal + Kafka | ServiceBridge |
246
- |---|---|---|---|---|
247
- | RPC data path | Sidecar proxy hop | Sidecar/daemon hop | N/A | **Direct (proxyless)** |
248
- | Service discovery | K8s control plane | Sidecar placement | External registry | **Built-in registry** |
249
- | Durable events + DLQ | External broker | Pub/Sub component | Kafka + consumers | **Built-in** |
250
- | Workflow orchestration | External engine | External engine | Built-in | **Built-in** |
251
- | Job scheduling | External cron/queue | External scheduler | External scheduler | **Built-in** |
252
- | Traces + UI | Jaeger/Tempo + dashboards | OTEL backend + dashboards | Temporal UI | **Built-in** |
253
- | Logs for Grafana | Loki + Promtail pipeline | Log pipeline | Log pipeline | **Built-in Loki API** |
254
- | Metrics | App/exporter setup | App/exporter setup | Multiple exporters | **Built-in `/metrics`** |
255
- | Security model | Mesh PKI + policy | Deployment-dependent mTLS | Mixed | **Service keys + auto mTLS** |
256
- | Operational footprint | Multi-component mesh | Runtime + sidecars | Workflow + broker + DB | **One binary + PostgreSQL** |
168
+ Declare dependencies and build typed clients **before** `start()` — they ride along in the first registration. Calls succeed once `start()` has connected.
257
169
 
258
170
  ---
259
171
 
260
- ## API Reference
261
-
262
- ### Cross-SDK parity notes
263
-
264
- ServiceBridge keeps the core API shape consistent across Node.js, Go, and Python:
265
- constructor, RPC, events, jobs, workflows, `executeWorkflow`, streams, serve/stop, and `ServiceBridgeError`.
266
-
267
- Constructor-level defaults for `timeout`, `retries`, and `retryDelay` are available
268
- across all three SDKs. Parity differences are naming-only (language idioms):
269
-
270
- - Constructor TLS overrides: `workerTLS`/`caCert` (Node), `WorkerTLS`/`CACert` (Go), `worker_tls`/`ca_cert` (Python)
271
- - Handler hints: timeout/retryable/concurrency/prefetch are advisory in all SDKs
272
- - Shared `serve()` fields across SDKs: host, max in-flight, instance ID, weight, and per-serve TLS override
172
+ ## Runtime setup
273
173
 
274
- ### `servicebridge(url, serviceKey, opts?)`
174
+ The SDK needs a running ServiceBridge runtime. Spin one up with the one-line installer:
275
175
 
276
- ```ts
277
- function servicebridge(
278
- url: string,
279
- serviceKey: string,
280
- opts?: ServiceBridgeOpts,
281
- ): ServiceBridgeService
176
+ ```sh
177
+ bash <(curl -fsSL https://servicebridge.dev/install.sh)
282
178
  ```
283
179
 
284
- Creates an SDK client instance.
285
- Service identity is resolved by the runtime from `serviceKey`.
286
-
287
- `ServiceBridgeOpts`:
180
+ It pulls the runtime container, wires it to PostgreSQL 18+, and exposes the gRPC control plane on `:14445` and the dashboard on `:14444`. Open the dashboard, create a service, and copy its **bootstrap service key** — that opaque string is the second argument to `new ServiceBridge(url, key)`.
288
181
 
289
- | Option | Type | Default | Description |
290
- |---|---|---|---|
291
- | `timeout` | `number` | `30000` | Default hard timeout per RPC attempt (ms). |
292
- | `retries` | `number` | `3` | Default retry count for `rpc()`. |
293
- | `retryDelay` | `number` | `300` | Base backoff delay (ms) for `rpc()`. |
294
- | `discoveryRefreshMs` | `number` | `10000` | Discovery refresh period for endpoint updates. |
295
- | `queueMaxSize` | `number` | `1000` | Max offline queue size for control-plane writes. |
296
- | `queueOverflow` | `"drop-oldest" \| "drop-newest" \| "error"` | `"drop-oldest"` | Overflow strategy for offline queue. |
297
- | `heartbeatIntervalMs` | `number` | `10000` | Base heartbeat period for worker registrations. |
298
- | `captureLogs` | `boolean` | `true` | Forward `console.*` logs to ServiceBridge. |
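
The `queueMaxSize` and `queueOverflow` options in the 1.x table above describe a bounded offline queue. The sketch below illustrates how the three overflow strategies could behave; it is not the SDK's implementation. `BoundedQueue` is a hypothetical name, and reading `"drop-newest"` as "discard the incoming item" is an assumption, since the README does not spell it out.

```ts
type QueueOverflow = "drop-oldest" | "drop-newest" | "error";

// Hypothetical in-memory queue, used only to illustrate the overflow strategies.
class BoundedQueue<T> {
  private items: T[] = [];
  private readonly maxSize: number;
  private readonly overflow: QueueOverflow;

  constructor(maxSize = 1000, overflow: QueueOverflow = "drop-oldest") {
    this.maxSize = maxSize;
    this.overflow = overflow;
  }

  // Returns true if the item was stored, false if it was discarded.
  push(item: T): boolean {
    if (this.items.length < this.maxSize) {
      this.items.push(item);
      return true;
    }
    if (this.overflow === "drop-oldest") {
      this.items.shift(); // evict the head to make room
      this.items.push(item);
      return true;
    }
    if (this.overflow === "drop-newest") {
      return false; // assumption: the incoming item is the one dropped
    }
    throw new Error(`offline queue is full (${this.maxSize} items)`);
  }

  // Hands back everything queued so far and empties the queue.
  drain(): T[] {
    const out = this.items;
    this.items = [];
    return out;
  }
}

const q = new BoundedQueue<string>(2, "drop-oldest");
q.push("a");
q.push("b");
q.push("c"); // evicts "a"
console.log(q.drain()); // [ 'b', 'c' ]
```

With `"error"`, the caller finds out immediately that a write was lost; the two drop strategies trade that signal for uninterrupted operation while the control plane is unreachable.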
182
+ Each instance authenticates with its key: the SDK calls `Bootstrap.Provision`, receives a short-lived leaf certificate, opens an mTLS gRPC channel and registers. Certificates rotate automatically with overlap (the new session is live before the old one closes), so long-running instances never drop traffic at renewal.
299
183
 
300
- ### Advanced TLS overrides
301
-
302
- | Option | Type | Default | Description |
303
- |---|---|---|---|
304
- | `workerTLS` | `WorkerTLSOpts` | auto | Explicit cert/key/CA for worker mTLS. |
305
- | `caCert` | `string \| Buffer` | from `serviceKey` | Optional control-plane CA override. By default, the SDK reads the CA from the sbv2 service key. |
306
-
307
- `WorkerTLSOpts`:
308
-
309
- ```ts
310
- type WorkerTLSOpts = {
311
- caCert?: string | Buffer;
312
- cert?: string | Buffer;
313
- key?: string | Buffer;
314
- serverName?: string;
315
- }
316
- ```
184
+ Full self-hosting docs live at **[servicebridge.dev/docs](https://servicebridge.dev/docs)**.
317
185
 
318
186
  ---
319
187
 
320
- ### `rpc(service, fn, payload?, opts?)`
321
-
322
- ```ts
323
- rpc<T = unknown>(service: string, fn: string, payload?: unknown, opts?: RpcOpts): Promise<T>
324
- ```
188
+ ## End-to-end example
325
189
 
326
- Calls a registered RPC handler on another service. Direct gRPC path, no proxy.
190
+ A small order flow: an HTTP request triggers a workflow that charges a payment, then publishes an event that another service consumes, all traced as one tree.
327
191
 
328
- **Arguments** — `service` is the callee’s logical name; `fn` is the name used in `handleRpc` (e.g. `payment.charge`). Use **dot notation** in `fn` to group methods. Do not put `/` in `fn`.
192
+ ```ts
193
+ import { ServiceBridge } from "service-bridge";
329
194
 
330
- `RpcOpts`:
195
+ const sb = new ServiceBridge("localhost:14445", process.env.ORDERS_KEY!);
331
196
 
332
- | Option | Type | Description |
333
- |---|---|---|
334
- | `timeout` | `number` | Call timeout in ms. |
335
- | `retries` | `number` | Retry count override. |
336
- | `retryDelay` | `number` | Base retry delay override. |
337
- | `traceId` | `string` | Explicit trace id. |
338
- | `parentSpanId` | `string` | Explicit parent span id. |
339
- | `mode` | `"direct" \| "proxy"` | Transport mode. `"direct"` (default) connects directly to the worker. `"proxy"` routes through the control plane when direct connection is unavailable. |
197
+ // Outgoing dependencies declared before start().
198
+ sb.service("payment-svc", { rpc: ["Charge"] });
199
+ sb.event.define("order.placed", { protoFile: "./events.proto", input: "OrderPlaced" });
340
200
 
341
- ```ts
342
- const user = await sb.rpc<{ id: string; name: string }>("users", "user.get", { id: "u_1" }, {
343
- timeout: 5000,
344
- retries: 2,
201
+ // A durable workflow: charge, then announce. Steps run by dependency level.
202
+ sb.workflow.handle("checkout", {
203
+ input: { type: "object", properties: { orderId: { type: "string" } } },
204
+ steps: [
205
+ { id: "charge", type: "call", service: "payment-svc", method: "Charge",
206
+ input: "$.input" },
207
+ { id: "announce", type: "publish", event: "order.placed",
208
+ input: "$.input", waitFor: ["charge"] },
209
+ ],
345
210
  });
346
- ```
347
211
 
348
- `rpc()` is bounded even when a downstream worker is silent:
349
- each attempt has a hard local timeout, retries are finite (`retries + 1` total attempts),
350
- and after the final failed attempt the root RPC span is closed with `error`.
212
+ sb.on("connected", ({ serviceName }) => console.log(`up as ${serviceName}`));
351
213
 
352
- Retry delay uses exponential backoff: `retryDelay * 2^(attempt-1)`.
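
The 1.x backoff rule quoted above can be worked through numerically. This is a minimal sketch, assuming `attempt` is 1-based (the wait after the first failed attempt uses `attempt = 1`) and that no jitter or upper cap is applied; the README states neither. `backoffDelayMs` and `backoffSchedule` are illustrative names, not SDK functions.

```ts
// Delay before the retry that follows failed attempt number `attempt` (1-based).
function backoffDelayMs(retryDelay: number, attempt: number): number {
  return retryDelay * 2 ** (attempt - 1);
}

// One delay per retry; with `retries` retries there are `retries + 1` attempts in total.
function backoffSchedule(retryDelay = 300, retries = 3): number[] {
  return Array.from({ length: retries }, (_, i) => backoffDelayMs(retryDelay, i + 1));
}

console.log(backoffSchedule()); // [ 300, 600, 1200 ]
```

With the documented defaults (`retryDelay: 300`, `retries: 3`), a call that never succeeds makes four attempts and spends 2100 ms waiting between them, on top of the per-attempt timeouts.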
214
+ await sb.start();
353
215
 
354
- ---
355
-
356
- ### `event(topic, payload?, opts?)`
357
-
358
- ```ts
359
- event(topic: string, payload?: unknown, opts?: EventOpts): Promise<string>
216
+ // Kick off a run and wait for the final state.
217
+ const { runId } = await sb.workflow.start("checkout", { orderId: "o-1" });
218
+ const state = await sb.workflow.await(runId);
219
+ console.log("done", state);
360
220
  ```
361
221
 
362
- Publishes a durable event. Returns `messageId` when online.
363
-
364
- `EventOpts`:
365
-
366
- | Option | Type | Description |
367
- |---|---|---|
368
- | `traceId` | `string` | Explicit trace id. |
369
- | `parentSpanId` | `string` | Explicit parent span id. |
370
- | `idempotencyKey` | `string` | Idempotency key for dedup-safe publishing. |
371
- | `headers` | `Record<string, string>` | Custom metadata headers. |
222
+ The consuming service just subscribes:
372
223
 
373
224
  ```ts
374
- await sb.event("orders.created", { orderId: "ord_42" }, {
375
- idempotencyKey: "order:ord_42",
376
- headers: { source: "checkout" },
225
+ sb.event.handle("order.placed", async (payload) => {
226
+ await sendReceipt(payload);
377
227
  });
228
+ await sb.start();
378
229
  ```
379
230
 
380
- ---
381
-
382
- ### `publishEvent(topic, payload?, opts?)`
383
-
384
- ```ts
385
- publishEvent(topic: string, payload?: unknown, opts?: PublishEventOpts): Promise<string>
386
- ```
387
-
388
- Publishes an event via the established worker session stream. Requires an active worker session — call after `serve()`. Resolves with `messageId` once the server confirms with `publish_ack`. Times out after 30 s if no ack. Use `event()` when not serving (e.g. caller-only services); use `publishEvent()` from within a worker for lower-latency publishing over the existing session.
231
+ In the dashboard you see one trace spanning the workflow run, the `Charge` RPC, the `order.placed` publish, and its delivery to the subscriber.
389
232
 
390
233
  ---
391
234
 
392
- ### `job(target, opts)`
235
+ ## Platform features
393
236
 
394
- ```ts
395
- job(target: string, opts: ScheduleOpts): Promise<string>
396
- ```
237
+ | Area | What you get |
238
+ |---|---|
239
+ | **Communication** | Direct RPC, server-side streaming, durable events, service discovery, full-mesh routing, a live service map |
240
+ | **Orchestration** | Workflows (DAG steps with compensation), sub-workflows, jobs (cron / interval / delayed), bidirectional replay |
241
+ | **Reliability** | At-least-once delivery, retries, DLQ, idempotency, fan-out, session resilience, multi-instance failover, circuit breakers |
242
+ | **Traffic control** | Load balancing, rate limiting, per-definition limits, filter expressions, adaptive performance |
243
+ | **Security** | TLS by default, mTLS identity, auto-provisioned certs from a service key, granular access policy |
244
+ | **Observability** | Unified tracing with propagation, Prometheus-compatible metrics, structured logs, smart alerts |
397
245
 
398
- Registers a scheduled or delayed job.
246
+ Designed to run up to 1000 services against a single runtime.
399
247
 
400
- `ScheduleOpts`:
401
-
402
- | Option | Type | Description |
403
- |---|---|---|
404
- | `cron` | `string` | Cron expression. |
405
- | `delay` | `number` | Delay in ms before execution. Backed by `int32` in the proto — maximum ~24.8 days (~2,147,483,647 ms). |
406
- | `timezone` | `string` | Timezone for cron execution. |
407
- | `misfire` | `"fire_now" \| "skip"` | Misfire policy. |
408
- | `via` | `"event" \| "rpc" \| "workflow"` | Target type. |
409
- | `retryPolicyJson` | `string` | Retry policy JSON string. |
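
The `delay` limit in the 1.x table above follows directly from the `int32` range. The sketch below works through the arithmetic and adds a guard; `assertValidDelay` is a hypothetical helper, not an SDK function, and rejecting negative or fractional delays is an assumption.

```ts
// Largest value a signed 32-bit integer can hold, read here as milliseconds.
const INT32_MAX_MS = 2 ** 31 - 1; // 2_147_483_647

const MS_PER_DAY = 24 * 60 * 60 * 1000; // 86_400_000

// The cap expressed in days: 2_147_483_647 / 86_400_000.
const maxDelayDays = INT32_MAX_MS / MS_PER_DAY;

// Hypothetical guard: fail fast instead of sending a delay that cannot fit in int32.
function assertValidDelay(delayMs: number): number {
  if (!Number.isInteger(delayMs) || delayMs < 0 || delayMs > INT32_MAX_MS) {
    throw new RangeError(`delay must be an integer between 0 and ${INT32_MAX_MS} ms`);
  }
  return delayMs;
}

console.log(maxDelayDays.toFixed(2)); // "24.86"
assertValidDelay(7 * MS_PER_DAY); // one week: fine
```

A 30-day delay (2,592,000,000 ms) exceeds the cap, so longer waits would need a cron schedule or a chain of shorter delays.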
248
+ ---
410
249
 
411
- ```ts
412
- // RPC job: explicit service and function
413
- await sb.job("billing", "collect", {
414
- cron: "0 * * * *",
415
- timezone: "UTC",
416
- via: "rpc",
417
- });
250
+ ## How it compares
418
251
 
419
- // Event job: single target
420
- await sb.job("user.signup", {
421
- cron: "0 0 * * *",
422
- via: "event",
423
- });
252
+ | You'd otherwise reach for | ServiceBridge gives you |
253
+ |---|---|
254
+ | Istio / Linkerd (mesh, mTLS) | mTLS identity + routing + policy, no sidecars |
255
+ | RabbitMQ / Kafka / NATS | Durable events with outbox, fan-out, retries, DLQ |
256
+ | Temporal / Cadence | Durable workflows with compensation, signals, replay |
257
+ | A cron service / Quartz | Leased, retried scheduled jobs |
258
+ | Jaeger / Tempo + Prometheus + Loki | Tracing, metrics and logs, correlated out of the box |
259
+ | gRPC + a service registry | Typed RPC with discovery, LB and breakers |
424
260
 
425
- // Workflow job: single target
426
- await sb.job("monthly_report", {
427
- cron: "0 0 1 * *",
428
- via: "workflow",
429
- });
430
- ```
261
+ The point isn't that ServiceBridge beats each tool at its own game — it's that you stop running and correlating ten of them.
431
262
 
432
263
  ---
433
264
 
434
- ### `workflow(name, steps, opts?)`
265
+ ## API reference
435
266
 
436
- ```ts
437
- workflow(name: string, steps: WorkflowStep[], opts?: WorkflowOpts): Promise<string>
438
- ```
439
-
440
- Registers (or updates) a workflow definition as a DAG of typed steps. Returns the workflow name.
267
+ The bridge exposes four domains (`sb.rpc`, `sb.event`, `sb.job`, `sb.workflow`) plus `sb.stream()` and `sb.telemetry`. Register handlers and declare dependencies **before** `start()`.
441
268
 
442
- `WorkflowStep`:
443
-
444
- | Field | Type | Description |
445
- |---|---|---|
446
- | `id` | `string` | Unique step identifier in the DAG. |
447
- | `type` | `"rpc" \| "event" \| "event_wait" \| "sleep" \| "workflow"` | Step execution type. |
448
- | `service` | `string` | **Required** for `rpc` and `workflow` steps. Target logical service name (e.g. `"inventory"`, `"payments"`). |
449
- | `ref` | `string` | Required for `rpc`, `event`, `event_wait`, `workflow`. For `rpc` — the registered function name (e.g. `"stock.reserve"`, `"payment.charge"`). For `event`/`event_wait` — topic or pattern. For `workflow` — child workflow name. Always use dots, never slashes. |
450
- | `deps` | `string[]` | Dependencies. Empty/omitted means root step. |
451
- | `if` | `string` | Optional filter expression (step is skipped if false). |
452
- | `timeoutMs` | `number` | Optional timeout for `rpc` and `event_wait` steps. |
453
- | `durationMs` | `number` | Required for `sleep` steps. |
269
+ ### RPC
454
270
 
455
- `WorkflowOpts`:
271
+ `sb.rpc` is request/response: register handlers, call other services.
456
272
 
457
273
  ```ts
458
- interface WorkflowOpts {
459
- stateLimitBytes?: number; // default 262144 (256 KB)
460
- stepTimeoutMs?: number; // default 30000 (30 s)
461
- }
462
- ```
463
-
464
- | Field | Type | Default | Description |
465
- |---|---|---|---|
466
- | `stateLimitBytes` | `number` | `262144` (256 KB) | Maximum serialized state size in bytes. |
467
- | `stepTimeoutMs` | `number` | `30000` (30 s) | Default per-step timeout in milliseconds. |
274
+ // Unary handler: (req) => res
275
+ sb.rpc.handle<ChargeRequest, ChargeReply>(
276
+ "Charge",
277
+ async (req) => ({ ok: req.amount > 0 }),
278
+ { schema: { protoFile: "./payment.proto" } },
279
+ );
468
280
 
469
- ```ts
470
- await sb.workflow("order.fulfillment", [
471
- { id: "reserve", type: "rpc", service: "inventory", ref: "stock.reserve" },
472
- { id: "charge", type: "rpc", service: "payments", ref: "payment.charge", deps: ["reserve"] },
473
- { id: "wait_5m", type: "sleep", durationMs: 300_000, deps: ["charge"] },
474
- { id: "notify", type: "event", ref: "orders.fulfilled", deps: ["wait_5m"] },
475
- ]);
281
+ // Server-side streaming handler: (req) => AsyncIterable<chunk>
282
+ sb.rpc.handleStream<GenRequest, Token>(
283
+ "Generate",
284
+ async function* (req) {
285
+ for (const word of req.prompt.split(" ")) yield { token: word };
286
+ },
287
+ { schema: { protoFile: "./gen.proto" } },
288
+ );
476
289
  ```
477
290
 
478
- With explicit limits:
291
+ Calling the typed proxy from `sb.client()` (preferred), or the lower-level `sb.rpc.call()`:
479
292
 
480
293
  ```ts
481
- await sb.workflow("checkout.flow", steps, { stepTimeoutMs: 60_000 });
482
- ```
483
-
484
- ---
485
-
486
- ### `executeWorkflow(service, name, input?, opts?)`
294
+ const res = await payment.Charge({ userId: "u-1", amount: 100 });
487
295
 
488
- ```ts
489
- executeWorkflow(service: string, name: string, input?: unknown, opts?: ExecuteWorkflowOpts): Promise<{ traceId: string; groupTraceId: string }>
296
+ const res2 = await sb.rpc.call("payment-svc", "Charge",
297
+ { userId: "u-1", amount: 100 },
298
+ { timeout: "5s", idempotencyKey: "order-42" },
299
+ );
490
300
  ```
491
301
 
492
- Starts a workflow execution on demand. The workflow must be registered first via `workflow()` on the target service.
493
- An alternative to scheduling via `job(target, { via: "workflow" })` — triggers the execution immediately.
302
+ `CallOpts` apply per call, layered over `callDefaults` from the constructor:
494
303
 
495
- | Parameter | Type | Default | Description |
304
+ | `CallOpts` | Type | Default | Description |
496
305
  |---|---|---|---|
497
- | `service` | `string` | required | Logical service that owns the workflow definition (same as the worker identity that called `workflow()`). |
498
- | `name` | `string` | required | Workflow name. |
499
- | `input` | `unknown` | `undefined` | Optional JSON-serializable input payload. |
500
-
501
- Returns `{ traceId, groupTraceId }`. Use `traceId` with `watchTrace()` to observe execution in real time.
502
-
503
- `ExecuteWorkflowOpts`:
504
-
505
- | Option | Type | Description |
506
- |---|---|---|
507
- | `traceId` | `string` | Override trace ID for this workflow execution. |
508
-
509
- ```ts
510
- const { traceId, groupTraceId } = await sb.executeWorkflow("users", "user.onboarding", { userId: "u_123" });
511
- ```
306
+ | `timeout` | `string` | `"30s"` | Deadline, e.g. `"500ms"`, `"10s"`, `"2m"`. |
307
+ | `requestId` | `string` | random UUID v4 | Correlation id carried to the callee. |
308
+ | `transport` | `"direct" \| "proxy" \| "auto"` | `"auto"` | `direct` = caller→callee mTLS; `proxy` = via the runtime; `auto` = direct when an endpoint is known. |
309
+ | `idempotencyKey` | `string` | none | Opts into runtime-side dedup; replays within the TTL return the cached response. |
310
+ | `retry` | `Partial<RetryOpts>` | exp. backoff | `{ maxAttempts: 3, baseDelayMs: 200, factor: 2, maxDelayMs: 5000, jitter: 0.3 }`. Set `maxAttempts: 1` to disable. |
512
311
 
513
- ---
514
-
515
- ### `cancelWorkflow(traceId)`
312
+ Without an `idempotencyKey`, ambiguous failures (`INTERNAL` / `ABORTED` / `UNKNOWN`) are treated as non-retryable so a non-idempotent call is never silently repeated. Schema-version mismatches are filtered at routing time, so blue-green deploys route `v1→v1` and `v2→v2` automatically.
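The `retry` defaults in the table can be read as a delay schedule. The following is a minimal sketch, not the SDK's implementation: `backoffDelayMs` and `retrySchedule` are hypothetical helper names, and the symmetric plus-or-minus interpretation of `jitter` is an assumption, since the table lists only the parameter values.

```typescript
// Mirrors the documented defaults; the interface shape is assumed from the table.
interface RetryOpts {
  maxAttempts: number;
  baseDelayMs: number;
  factor: number;
  maxDelayMs: number;
  jitter: number;
}

const DEFAULT_RETRY: RetryOpts = {
  maxAttempts: 3,
  baseDelayMs: 200,
  factor: 2,
  maxDelayMs: 5000,
  jitter: 0.3,
};

// Delay before retry number `retryIndex` (0 = first retry).
// Assumption: jitter spreads the capped delay by +/- `jitter` as a fraction.
function backoffDelayMs(
  retryIndex: number,
  opts: RetryOpts,
  rand: () => number = Math.random,
): number {
  const capped = Math.min(
    opts.baseDelayMs * Math.pow(opts.factor, retryIndex),
    opts.maxDelayMs,
  );
  return capped + (rand() * 2 - 1) * capped * opts.jitter;
}

// All delays for one call. With `rand` fixed at 0.5 the jitter term is zero,
// so the defaults give [200, 400]: three attempts means two waits.
function retrySchedule(
  overrides: Partial<RetryOpts> = {},
  rand: () => number = () => 0.5,
): number[] {
  const opts: RetryOpts = { ...DEFAULT_RETRY, ...overrides };
  const delays: number[] = [];
  for (let i = 0; i < opts.maxAttempts - 1; i++) {
    delays.push(backoffDelayMs(i, opts, rand));
  }
  return delays;
}
```

Under these assumptions, `retrySchedule({ maxAttempts: 1 })` yields an empty schedule, which matches the documented way to disable retries.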
516
313
 
517
- ```ts
518
- cancelWorkflow(traceId: string): Promise<void>
519
- ```
314
+ ### Events
520
315
 
521
- Cancels a running workflow instance.
316
+ Durable, at-least-once publish/subscribe. Events hit a local SQLite outbox first, then drain to the runtime, so a publish survives a transient disconnect.
522
317
 
523
318
  ```ts
524
- await sb.cancelWorkflow("trace_01HQ...XYZ");
525
- ```
526
-
527
- ---
528
-
529
- ### `handleRpc(fn, handler, opts?)`
530
-
531
- ```ts
532
- handleRpc(
533
- fn: string,
534
- handler: (payload: unknown, ctx?: RpcContext) => unknown | Promise<unknown>,
535
- opts?: HandleRpcOpts,
536
- ): ServiceBridgeService
537
- ```
319
+ // Declare what you publish (same file-based SchemaSpec as RPC).
320
+ sb.event.define("order.placed", { protoFile: "./events.proto", input: "OrderPlaced" });
538
321
 
539
- Registers an RPC handler. Chainable.
540
-
541
- `RpcContext`:
542
-
543
- | Field | Type | Description |
544
- |---|---|---|
545
- | `traceId` | `string` | Current trace ID. |
546
- | `spanId` | `string` | Current span ID. |
547
- | `stream` | `StreamWriter` | Real-time stream writer. |
548
-
549
- `HandleRpcOpts`:
550
-
551
- | Option | Type | Description |
552
- |---|---|---|
553
- | `timeout` | `number` | Advisory timeout hint (currently metadata-level, not hard-enforced by runtime). |
554
- | `retryable` | `boolean` | Advisory retry hint (currently metadata-level, not a strict policy switch). |
555
- | `concurrency` | `number` | Advisory concurrency hint (currently not hard-enforced). |
556
- | `schema` | `RpcSchemaOpts` | Inline protobuf schema for binary encode/decode. |
557
- | `allowedCallers` | `string[]` | Allow-list of caller service names. |
558
-
559
- ```ts
560
- sb.handleRpc("ai.generate", async (payload: { prompt: string }, ctx) => {
561
- await ctx?.stream.write({ token: "Hello" }, "output");
562
- await ctx?.stream.write({ token: " world" }, "output");
563
- return { text: "Hello world" };
322
+ // Subscribe exact name or wildcard ("order.*", "order.#").
323
+ sb.event.handle("order.placed", async (payload) => {
324
+ await fulfil(payload);
564
325
  });
565
- ```
566
326
 
567
- `StreamWriter`:
568
-
569
- | Method | Signature | Description |
570
- |---|---|---|
571
- | `write` | `write(data: unknown, key?: string): Promise<void>` | Append a real-time chunk to the trace stream. |
572
- | `end` | `end(key?: string): Promise<void>` | No-op placeholder for API symmetry (lifecycle managed by runtime). |
327
+ await sb.start();
573
328
 
574
- ---
575
-
576
- ### `handleEvent(pattern, handler, opts?)`
577
-
578
- ```ts
579
- handleEvent(
580
- pattern: string,
581
- handler: (payload: unknown, ctx: EventContext) => void | Promise<void>,
582
- opts?: HandleEventOpts,
583
- ): ServiceBridgeService
329
+ const { eventId } = await sb.event.publish("order.placed", { orderId: "o-1", total: 4200 });
584
330
  ```
585
331
 
586
- Registers an event consumer handler. Chainable.
587
-
588
- `HandleEventOpts`:
332
+ Event names must match `^[a-z0-9_-]+(\.[a-z0-9_-]+)*$` (invalid → `InvalidEventNameError`). A full outbox throws `OutboxFullError`.
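The naming rule can be checked locally before publishing. This is a small sketch using the expression quoted above; `isValidEventName` is a hypothetical helper, not an SDK export.

```typescript
// The pattern is copied verbatim from the rule above.
const EVENT_NAME = /^[a-z0-9_-]+(\.[a-z0-9_-]+)*$/;

// Hypothetical pre-flight check: lowercase segments separated by single dots.
function isValidEventName(name: string): boolean {
  return EVENT_NAME.test(name);
}

const samples = ["order.placed", "order_v2.line-item.added", "Order.Placed", "order..placed", "order.*"];
for (const name of samples) {
  console.log(name, isValidEventName(name));
}
```

Note that a subscription pattern such as `"order.*"` does not match this expression, so the check is only meaningful for concrete event names.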
589
333
 
590
- | Option | Type | Description |
334
+ | `PublishOpts` | Type | Description |
591
335
  |---|---|---|
592
- | `concurrency` | `number` | Advisory concurrency hint (currently not hard-enforced). |
593
- | `prefetch` | `number` | Advisory prefetch hint (currently not hard-enforced). |
594
- | `retryPolicyJson` | `string` | Retry policy JSON string. |
595
- | `filterExpr` | `string` | Server-side filter expression. |
336
+ | `idempotencyKey` | `string` | Dedup key for at-least-once delivery. |
337
+ | `partitionKey` | `string` | Orders delivery within a partition. |
338
+ | `fireAndForget` | `boolean` | Skip the durable wait for the publish ack. |
339
+ | `headers` | `Record<string, string>` | Custom envelope headers. |
340
+ | `occurredAtMs` | `number` | Event time (unix-ms); defaults to now. |
596
341
 
597
- Duplicate pattern registration within the same service throws an error.
342
+ The runtime delivers at-least-once, retries failures, fans out to every matching subscriber, and dead-letters exhausted messages. The DLQ is operated from the dashboard; the SDK has no DLQ API. Make handlers idempotent and throw to signal "retry me".
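One way to make a handler idempotent is to remember which keys have already been processed. The wrapper below is a sketch under stated assumptions: `idempotent` and `keyOf` are hypothetical names, the handler shape `(payload) => Promise<void>` follows the example above, and the in-memory set stands in for a durable store, which a real service would need for dedup to survive restarts.

```typescript
type Handler<T> = (payload: T) => Promise<void>;

// Wraps a handler so a redelivered payload with a known key is skipped.
// A throw from the inner handler propagates (the "retry me" signal), and the
// key is NOT recorded, so the redelivery is processed again.
function idempotent<T>(
  keyOf: (payload: T) => string,
  handler: Handler<T>,
  maxKeys: number = 10_000,
): Handler<T> {
  const done = new Set<string>();
  return async (payload: T): Promise<void> => {
    const key = keyOf(payload);
    if (done.has(key)) return; // duplicate delivery: already handled
    await handler(payload);
    done.add(key);
    if (done.size > maxKeys) {
      // Sets iterate in insertion order, so the first value is the oldest key.
      const oldest = done.values().next().value;
      if (oldest !== undefined) done.delete(oldest);
    }
  };
}
```

The sketch does not guard against two concurrent deliveries of the same key; both would pass the `has` check. Closing that gap takes an atomic claim in the backing store.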
598
343
 
599
- **Delivery guarantee**: once a message is accepted by the runtime, delivery to each consumer group
600
- is guaranteed. If the consumer is offline, the message waits in the server-side queue and is
601
- dispatched automatically the moment the service reconnects and registers its handlers — no retry
602
- budget is consumed while waiting. After `SERVICEBRIDGE_DELIVERY_TTL_DAYS` (default 7) days without
603
- a consumer, the delivery moves to DLQ with reason `delivery_ttl_exceeded`.
344
+ ### Jobs
604
345
 
605
- `EventContext` helpers:
606
-
607
- - `ctx.traceId` — current trace ID
608
- - `ctx.spanId` — current span ID
609
- - `ctx.retry(delayMs?)` — ask for redelivery with optional delay
610
- - `ctx.reject(reason)` — move to DLQ immediately, bypassing remaining retries
611
- - `ctx.refs` — metadata (`topic`, `groupName`, `messageId`, `attempt`, `headers`)
612
- - `ctx.stream.write(...)` — append real-time chunks to trace stream
346
+ Scheduled work: cron, fixed interval, or one-shot delay. The runtime owns the schedule, leasing and retries.
613
347
 
614
348
  ```ts
615
- sb.handleEvent("orders.*", async (payload, ctx) => {
616
- const body = payload as { orderId?: string };
617
- if (!body.orderId) {
618
- ctx.reject("missing_order_id");
619
- return;
620
- }
621
- await ctx.stream.write({ status: "processing", orderId: body.orderId }, "progress");
622
- });
623
- ```
624
-
625
- ---
349
+ sb.job.handle("nightly-rollup",
350
+ { trigger: { cron: "0 3 * * *", tz: "UTC" } }, // 5-field cron, no seconds
351
+ async (ctx) => { await rollup(ctx.scheduledAt); },
352
+ );
626
353
 
627
- ### `serve(opts?)`
354
+ sb.job.handle("heartbeat", { trigger: { interval: 30_000 } }, async () => { await ping(); });
628
355
 
629
- ```ts
630
- serve(opts?: ServeOpts): Promise<void>
356
+ sb.job.handle("send-reminder",
357
+ { trigger: { delayed: { at: Date.now() + 60_000 } } }, // Date | number | ISO string
358
+ async (ctx) => { await remind(ctx.idempotencyKey); },
359
+ );
631
360
  ```
632
361
 
633
- Starts the worker gRPC server and registers handlers with the control plane.
634
- The promise resolves once startup/registration is complete (it does not block
635
- the Node.js process). Throws immediately if no handlers are registered (neither `handleRpc()` nor `handleEvent()` have been called).
636
-
637
- `ServeOpts`:
362
+ The handler receives a `JobHandlerCtx`: `{ jobName, executionId, scheduledAt, localScheduledAt, attempt, idempotencyKey, signal }`.
638
363
 
639
- | Option | Type | Description |
640
- |---|---|---|
641
- | `host` | `string` | Bind host. Default: `localhost`. Use `0.0.0.0` in Docker/Kubernetes so ServiceBridge can reach the worker. |
642
- | `maxInFlight` | `number` | Max in-flight runtime-originated commands over `OpenWorkerSession`. Default: `128`. |
643
- | `instanceId` | `string` | Stable worker instance identifier. |
644
- | `weight` | `number` | Scheduling/discovery weight hint. |
645
- | `tls` | `WorkerTLSOpts` | Per-serve worker TLS override. |
646
-
647
- ```ts
648
- await sb.serve({
649
- host: "localhost",
650
- instanceId: process.env.HOSTNAME,
364
+ | `JobOpts` | Type | Default | Description |
365
+ |---|---|---|---|
366
+ | `trigger` | `{cron, tz?} \| {delayed:{at}} \| {interval}` | required | Exactly one trigger; `interval` is in ms. |
367
+ | `catchup` | `"skip" \| "fire_once" \| "fire_all"` | `skip` | What to do for fire times missed during downtime. |
368
+ | `overlap` | `"skip" \| "allow" \| "buffer_one"` | `allow` | Behaviour when a previous run is still in flight. |
369
+ | `deps` | `DeclaredDep[]` | none | Outgoing deps: `{ rpc }`, `{ event }`, `{ workflow }`. |
370
+ | `maxAttempts` / `leaseTtlMs` / `maxConcurrent` / `retry` | | runtime default | Execution limits and `{ initialMs, maxMs, multiplier, jitter }` retry. |
371
+
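The three `catchup` values are easiest to compare on a concrete gap. The sketch below models an `interval` trigger only. `missedFireTimes` and `catchupRuns` are hypothetical helpers, and the choice that `fire_once` runs the most recent missed time is an assumption, not something the table states.

```typescript
type Catchup = "skip" | "fire_once" | "fire_all";

// Fire times that fell inside (lastFiredAt, now] for a fixed interval in ms.
function missedFireTimes(lastFiredAt: number, now: number, intervalMs: number): number[] {
  const missed: number[] = [];
  for (let t = lastFiredAt + intervalMs; t <= now; t += intervalMs) {
    missed.push(t);
  }
  return missed;
}

// Which of the missed fire times are executed under each policy.
function catchupRuns(missed: number[], policy: Catchup): number[] {
  switch (policy) {
    case "skip":
      return []; // resume with the next regular fire time
    case "fire_once":
      return missed.slice(-1); // assumed: a single run for the latest missed time
    case "fire_all":
      return missed.slice(); // one run per missed fire time
  }
}

// A 30 s heartbeat that was down for 100 s missed three fire times.
const missed = missedFireTimes(0, 100_000, 30_000);
console.log(catchupRuns(missed, "skip"));
console.log(catchupRuns(missed, "fire_once"));
console.log(catchupRuns(missed, "fire_all"));
```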
372
+ ### Workflows
373
+
374
+ Durable DAGs. Declare the graph once; the runtime executes it, persists state between steps, survives restarts, and compensates on failure or cancel.
375
+
376
+ ```ts
377
+ sb.workflow.handle("checkout", {
378
+ input: { type: "object", properties: { orderId: { type: "string" } } },
379
+ steps: [
380
+ { id: "reserve", type: "call", service: "inventory-svc", method: "Reserve",
381
+ input: "$.input",
382
+ compensate: { service: "inventory-svc", method: "Release", input: "$.reserve" } },
383
+ { id: "charge", type: "call", service: "payment-svc", method: "Charge",
384
+ input: "$.input", waitFor: ["reserve"] },
385
+ { id: "notify", type: "publish", event: "order.placed",
386
+ input: "$.input", waitFor: ["charge"] },
387
+ ],
651
388
  });
652
389
  ```
653
390
 
654
- ---
655
-
656
- ### `stop()`
657
-
658
- ```ts
659
- stop(): void
660
- ```
661
-
662
- Gracefully stops the worker gRPC server (try graceful shutdown, then force), heartbeats, channels, and SDK internals.
663
-
664
- ---
391
+ Top-level steps run in parallel by default; `waitFor` declares dependencies and defines the execution levels. Step types: `call`, `publish`, `sleep`, `wait_event`, `wait_signal`, `workflow` (sub-workflow), `parallel`, `sequence`, `local`. Inputs are JSON-path expressions (`"$.input"`, `"$.reserve.id"`) over the accumulated run state.
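To see how `waitFor` turns a flat step list into execution levels, the sketch below groups steps so that each level depends only on earlier ones. It illustrates the idea rather than the runtime's scheduler; `executionLevels` and the reduced `Step` shape are hypothetical.

```typescript
// Only the fields needed for ordering; real steps carry more (type, input, ...).
interface Step {
  id: string;
  waitFor?: string[];
}

// Groups step ids into levels: every step in a level has all of its
// `waitFor` dependencies satisfied by earlier levels, so a level can run in parallel.
function executionLevels(steps: Step[]): string[][] {
  const pending = new Map<string, string[]>();
  for (const s of steps) pending.set(s.id, s.waitFor ?? []);

  const done = new Set<string>();
  const levels: string[][] = [];
  while (pending.size > 0) {
    const ready = Array.from(pending.entries())
      .filter(([, deps]) => deps.every((d) => done.has(d)))
      .map(([id]) => id);
    if (ready.length === 0) {
      throw new Error("cycle or unknown step id in waitFor");
    }
    for (const id of ready) {
      pending.delete(id);
      done.add(id);
    }
    levels.push(ready);
  }
  return levels;
}

// The checkout graph above resolves to three sequential levels.
console.log(
  executionLevels([
    { id: "reserve" },
    { id: "charge", waitFor: ["reserve"] },
    { id: "notify", waitFor: ["charge"] },
  ]),
);
```

Steps without `waitFor` all land in the first level, which is the "parallel by default" behaviour described above.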
665
392
 
666
- ### `startHttpSpan(opts)`
393
+ Driving a run:
667
394
 
668
395
  ```ts
669
- startHttpSpan(opts: {
670
- method: string;
671
- path: string;
672
- traceId?: string;
673
- parentSpanId?: string;
674
- }): HttpSpan
675
- ```
676
-
677
- Manual HTTP tracing primitive.
396
+ const { runId } = await sb.workflow.start("checkout", { orderId: "o-1" });
678
397
 
679
- ```ts
680
- const span = sb.startHttpSpan({ method: "GET", path: "/health" });
681
- try {
682
- span.end({ statusCode: 200, success: true });
683
- } catch (e) {
684
- span.end({ success: false, error: String(e) });
685
- }
398
+ const state = await sb.workflow.await(runId); // block until terminal
399
+ const snap = await sb.workflow.query(runId); // { status, state, steps: [...] }
400
+ await sb.workflow.signal(runId, "approval", { ok: 1 }); // resume a wait_signal step
401
+ await sb.workflow.cancel(runId); // compensate in reverse
402
+ const { runId: forked } = await sb.workflow.replay(runId, { fromStepId: "charge" });
686
403
  ```
687
404
 
688
- ---
689
-
690
- ### `registerHttpEndpoint(opts)`
691
-
692
- ```ts
693
- registerHttpEndpoint(opts: {
694
- method: string;
695
- route: string;
696
- instanceId?: string;
697
- endpoint?: string;
698
- allowedCallers?: string[];
699
- requestSchemaJson?: string;
700
- responseSchemaJson?: string;
701
- transport?: string;
702
- }): Promise<void>
703
- ```
704
-
705
- Registers HTTP route metadata in the ServiceBridge service catalog.
706
- Also starts a periodic heartbeat to keep the HTTP endpoint alive in the registry.
707
-
708
- | Option | Type | Description |
709
- |---|---|---|
710
- | `method` | `string` | HTTP method: `GET`, `POST`, `PUT`, `PATCH`, `DELETE`, etc. |
711
- | `route` | `string` | Route pattern with parameter placeholders, e.g. `"/users/:id"`. |
712
- | `instanceId` | `string` | Stable identifier for this process instance. |
713
- | `endpoint` | `string` | Reachable address, e.g. `"http://10.0.0.1:3000"`. |
714
- | `allowedCallers` | `string[]` | Service names allowed to call (RBAC). |
715
- | `requestSchemaJson` | `string` | JSON schema for request validation metadata. |
716
- | `responseSchemaJson` | `string` | JSON schema for response validation metadata. |
717
- | `transport` | `string` | Transport label (e.g. `"http"`, `"https"`). |
718
-
719
- ```ts
720
- await sb.registerHttpEndpoint({
721
- method: "GET",
722
- route: "/users/:id",
723
- requestSchemaJson: '{"type":"object"}',
724
- transport: "http",
725
- });
726
- ```
727
-
728
- ---
729
-
730
- ### `watchTrace(traceId, opts?)`
731
-
732
- ```ts
733
- watchTrace(traceId: string, opts?: WatchTraceOpts): AsyncIterable<TraceStreamEvent>
734
- ```
735
-
736
- Subscribes to a trace stream with replay and live updates. `traceId` is the stream
737
- identifier used by `ctx.stream.write(...)`.
738
-
739
- `WatchTraceOpts`:
740
-
741
- | Option | Type | Default | Description |
742
- |---|---|---|---|
743
- | `key` | `string` | `""` | Stream key filter (`""` = all keys). |
744
- | `fromSequence` | `number` | `0` | Replay from sequence cursor. |
745
-
746
- `TraceStreamEvent`:
405
+ Use `sb.workflow.query()` for the snapshot — there is no `getStatus`. `start()` with no permission throws `WorkflowAccessDeniedError`; an unknown name throws `WorkflowNotFoundError`; signalling/cancelling a finished run throws `WorkflowTerminalError`.
747
406
 
748
- | Field | Type | Description |
749
- |---|---|---|
750
- | `type` | `"chunk" \| "trace_complete"` | Event kind. |
751
- | `traceId` | `string` | Trace identifier being watched. |
752
- | `key` | `string` | Stream lane key. |
753
- | `sequence` | `number` | Monotonic sequence number. |
754
- | `data` | `unknown` | JSON-decoded chunk payload. |
755
- | `traceStatus` | `string \| undefined` | Final status on `trace_complete`. |
756
-
757
- Behavior:
407
+ ### Streaming
758
408
 
759
- - Auto-reconnect with exponential backoff (`500ms` to `5000ms`) on retryable stream failures.
760
- - Deduplicates by `sequence` across reconnects.
761
- - Enforces strict JSON for `type="chunk"` payloads (non-JSON chunk terminates stream with fatal error).
762
- - Enforces internal queue limit `256`; overflow is fatal (consumer must drain promptly).
409
+ Server-side streaming is a first-class RPC shape. Register with `sb.rpc.handleStream`, consume with `sb.stream()` (or the typed proxy, which auto-detects `returns (stream T)` methods).
763
410
 
764
411
  ```ts
765
- for await (const evt of sb.watchTrace(traceId, { key: "output", fromSequence: 0 })) {
766
- if (evt.type === "chunk") {
767
- process.stdout.write(String((evt.data as { token?: string }).token ?? ""));
768
- }
769
- if (evt.type === "trace_complete") break;
412
+ for await (const chunk of sb.stream("gen-svc", "Generate", { prompt: "write a haiku" })) {
413
+ process.stdout.write(chunk.token);
770
414
  }
771
415
  ```
772
416
 
773
- ---
774
-
775
- ### Trace Utilities
776
-
777
- #### `getTraceContext()`
417
+ Breaking the loop (`break`/`return`) tears down the gRPC stream end to end. Streams are single-pick — never retried — by design.
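The language mechanism behind this is that leaving a `for await` loop early calls the iterator's `return()`, which runs the producer's `finally` block. The sketch below shows that in plain TypeScript with a local generator; `tokens` and `firstN` are hypothetical names, and how the SDK maps the cleanup onto the gRPC stream is not shown here.

```typescript
// A stand-in producer: yields tokens and reports when it is torn down.
async function* tokens(
  words: string[],
  onClose: () => void,
): AsyncGenerator<{ token: string }> {
  try {
    for (const w of words) yield { token: w };
  } finally {
    // Runs on normal completion AND when the consumer breaks out early.
    onClose();
  }
}

// Consumes at most `n` chunks, then breaks; the break triggers the cleanup above.
async function firstN(n: number, words: string[], onClose: () => void): Promise<string[]> {
  const out: string[] = [];
  for await (const chunk of tokens(words, onClose)) {
    out.push(chunk.token);
    if (out.length === n) break;
  }
  return out;
}
```

Calling `firstN(2, ["a", "b", "c", "d"], cleanup)` collects two tokens and invokes `cleanup` once, without the producer ever yielding `"c"`.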
778
418
 
779
- ```ts
780
- getTraceContext(): { traceId: string; spanId: string } | undefined
781
- ```
419
+ ### Telemetry
782
420
 
783
- Returns the current async-local trace context.
421
+ Telemetry flows automatically: every RPC, event, job, workflow step and HTTP request emits an operation span and propagates the trace across hops. Add your own through `sb.telemetry`; anything emitted inside a handler nests under that handler's trace.
784
422
 
785
423
  ```ts
786
- import { getTraceContext } from "service-bridge";
424
+ import { Channel, UserSubOp } from "service-bridge";
787
425
 
788
- const tc = getTraceContext();
789
- if (tc) {
790
- console.log(tc.traceId, tc.spanId);
426
+ const op = sb.telemetry.startOp({
427
+ channel: Channel.USER, kind: UserSubOp, subject: "reprice-cart", businessKey: cartId,
428
+ });
429
+ try {
430
+ await reprice(cartId);
431
+ op.end(/* Status.SUCCESS */);
432
+ } catch (err) {
433
+ op.end(Status.ERROR, String(err)); // assumes `Status` is also imported from "service-bridge"
434
+ throw err;
791
435
  }
792
- ```
793
436
 
794
- #### `withTraceContext(ctx, fn)`
795
-
796
- ```ts
797
- withTraceContext<T>(ctx: { traceId: string; spanId: string }, fn: () => T): T
437
+ sb.telemetry.log.info("cart repriced", { cartId, items: 7 }); // also sb.logger
438
+ sb.telemetry.counter("carts_repriced_total").inc();
439
+ sb.telemetry.gauge("queue_depth").set(42);
440
+ sb.telemetry.histogram("reprice_ms", "ms").observe(12.5);
798
441
  ```
799
442
 
800
- Runs a function inside an explicit trace context.
443
+ `startOp()` returns a handle whose `.end(status, message?)` closes the span. Anything emitted before `start()` buffers in an in-memory ring and drains once connected.
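The pre-`start()` buffering can be pictured as a bounded ring. This is an illustrative sketch only: `RingBuffer` is a hypothetical class, the capacity is arbitrary, and dropping the oldest entry on overflow is an assumption about what "ring" implies here.

```typescript
// Bounded buffer: keeps the most recent `capacity` items, drops the oldest on overflow.
class RingBuffer<T> {
  private items: T[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  push(item: T): void {
    if (this.items.length === this.capacity) this.items.shift();
    this.items.push(item);
  }

  // Hands over everything buffered so far, oldest first, and empties the ring.
  drain(): T[] {
    const out = this.items;
    this.items = [];
    return out;
  }
}

// Emit five records into a ring of three before "connecting", then drain.
const ring = new RingBuffer<string>(3);
for (const msg of ["m1", "m2", "m3", "m4", "m5"]) ring.push(msg);
console.log(ring.drain());
```

A bounded buffer keeps memory flat if the runtime is unreachable for a long time, at the cost of losing the oldest records.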
801
444
 
802
- ```ts
803
- import { withTraceContext } from "service-bridge";
445
+ ### HTTP
804
446
 
805
- withTraceContext({ traceId: "trace-1", spanId: "span-1" }, async () => {
806
- await sb.event("audit.log", { action: "user.login" });
807
- });
808
- ```
447
+ ServiceBridge does **not** proxy your business HTTP. You run your own server; the integration discovers your routes, publishes them to the Service Map, and wraps each request in a trace span so HTTP stitches into the same trace as the RPCs and events it triggers. See [HTTP plugins](#http-plugins).
448
+
449
+ Useful read accessors after `start()`: `sb.identity()` (current session identity or `null`), `sb.serviceMap()` (live registry: visible methods, instances, endpoints), `sb.policyEvaluation()` (the runtime's current access-policy verdict).
809
450
 
810
451
  ---
811
452
 
812
- ## HTTP Plugins
453
+ ## HTTP plugins
813
454
 
814
- ### Express (`service-bridge/express`)
455
+ Each integration is a subpath import with an optional peer dependency.
815
456
 
816
- ```bash
817
- npm install express
818
- ```
457
+ **Express** — `service-bridge/express`:
819
458
 
820
459
  ```ts
821
460
  import express from "express";
822
- import { servicebridge } from "service-bridge";
823
- import { servicebridgeMiddleware, registerExpressRoutes } from "service-bridge/express";
461
+ import { ServiceBridge } from "service-bridge";
462
+ import { attachExpress } from "service-bridge/express";
824
463
 
825
- const sb = servicebridge(process.env.SERVICEBRIDGE_URL!, process.env.SERVICEBRIDGE_SERVICE_KEY!);
826
464
  const app = express();
465
+ app.post("/orders", (req, res) => res.json({ ok: true }));
827
466
 
828
- app.use(servicebridgeMiddleware({
829
- client: sb,
830
- excludePaths: ["/health"],
831
- autoRegister: true,
832
- }));
467
+ const sb = new ServiceBridge("localhost:14445", KEY);
468
+ await sb.start();
833
469
 
834
- app.get("/users/:id", async (req, res) => {
835
- const user = await req.servicebridge.rpc("users", "user.get", { id: req.params.id });
836
- res.json(user);
837
- });
470
+ app.listen(3000, () => attachExpress(app, sb, { port: 3000 }));
838
471
  ```
839
472
 
840
- #### `servicebridgeMiddleware(options)`
841
-
842
- ```ts
843
- servicebridgeMiddleware(options: {
844
- client: ServiceBridgeService;
845
- excludePaths?: string[];
846
- propagateTraceHeader?: boolean;
847
- autoRegister?: boolean;
848
- }): express.RequestHandler
849
- ```
850
-
851
- - Attaches `req.servicebridge`, `req.traceId`, `req.spanId`
852
- - Starts/ends HTTP span automatically
853
- - Optionally sets `x-trace-id` response header
854
- - Optionally auto-registers route pattern in catalog on first hit
855
-
856
- #### `registerExpressRoutes(app, client, opts?)`
857
-
858
- Eager route catalog registration without waiting for first request.
859
-
860
- ```ts
861
- await registerExpressRoutes(app, sb, {
862
- endpoint: "http://10.0.0.5:3000",
863
- allowedCallers: ["api-gateway"],
864
- excludePaths: ["/health"],
865
- });
866
- ```
867
-
868
- ---
869
-
870
- ### Fastify (`service-bridge/fastify`)
871
-
872
- ```bash
873
- npm install fastify
874
- ```
473
+ **Fastify** — `service-bridge/fastify`:
875
474
 
876
475
  ```ts
877
476
  import Fastify from "fastify";
878
- import { servicebridge } from "service-bridge";
879
- import { servicebridgePlugin, wrapHandler } from "service-bridge/fastify";
477
+ import { ServiceBridge } from "service-bridge";
478
+ import { sbFastify } from "service-bridge/fastify";
880
479
 
881
- const sb = servicebridge(process.env.SERVICEBRIDGE_URL!, process.env.SERVICEBRIDGE_SERVICE_KEY!);
882
480
  const app = Fastify();
481
+ const sb = new ServiceBridge("localhost:14445", KEY);
883
482
 
884
- await app.register(servicebridgePlugin, {
885
- client: sb,
886
- excludePaths: ["/health"],
887
- autoRegister: true,
888
- });
483
+ app.post("/orders", async () => ({ ok: true }));
484
+ await app.register(sbFastify, { sb }); // discovers routes + endpoint in onListen
889
485
 
890
- app.get("/users/:id", wrapHandler(async (request, reply) => {
891
- const user = await request.servicebridge.rpc("users", "user.get", {
892
- id: (request.params as any).id,
893
- });
894
- return reply.send(user);
895
- }));
486
+ await sb.start();
487
+ await app.listen({ port: 3000 });
896
488
  ```
897
489
 
898
- #### `servicebridgePlugin(fastify, options)`
490
+ **Hono** — `service-bridge/hono`:
899
491
 
900
492
  ```ts
901
- servicebridgePlugin(fastify, {
902
- client,
903
- excludePaths?,
904
- propagateTraceHeader?,
905
- autoRegister?,
906
- register?: {
907
- instanceId?,
908
- endpoint?,
909
- allowedCallers?,
910
- excludePaths?,
911
- },
912
- })
913
- ```
914
-
915
- - Decorates `request.servicebridge`, `request.traceId`, `request.spanId`
916
- - Traces HTTP lifecycle via hooks
917
- - Auto-registers routes on `onRoute` before traffic
493
+ import { Hono } from "hono";
494
+ import { ServiceBridge } from "service-bridge";
495
+ import { attachHono } from "service-bridge/hono";
918
496
 
919
- #### `wrapHandler(handler)`
497
+ const app = new Hono();
498
+ app.post("/orders", (c) => c.json({ ok: true }));
920
499
 
921
- Runs a Fastify handler inside the current trace context so downstream SDK calls inherit the trace.
922
-
923
- ---
500
+ const sb = new ServiceBridge("localhost:14445", KEY);
501
+ await sb.start();
924
502
 
925
- ### Trace Utilities (HTTP Plugins)
926
-
927
- #### `extractTraceFromHeaders(headers)`
928
-
929
- ```ts
930
- import { extractTraceFromHeaders } from "service-bridge/express";
931
- // or
932
- import { extractTraceFromHeaders } from "service-bridge/fastify";
933
-
934
- const { traceId, parentSpanId } = extractTraceFromHeaders(req.headers);
503
+ attachHono(app, sb, { port: 3000 }); // Hono doesn't own the socket — pass the port
504
+ Bun.serve({ port: 3000, fetch: app.fetch });
935
505
  ```
936
506
 
937
- Extracts trace context from HTTP headers. Supports W3C `traceparent`, `x-trace-id`/`x-span-id` headers, and generates random IDs as fallback. Useful for custom HTTP framework integrations (Hono, Koa, etc.).
507
+ `attachExpress`/`attachHono` take `{ port, host? }`; `sbFastify` reads the bound address itself. Host defaults to the bound socket, falling back to `127.0.0.1`. Attaching before `start()` is safe: the endpoint rides along in the first registration.
938
508
 
939
509
  ---
940
510
 
941
511
  ## Configuration
942
512
 
943
- ### TLS behavior
944
-
945
- - Worker transport is TLS-only.
946
- - Control plane is TLS-only. Trust source is embedded into sbv2 service key by default.
947
- - Embedded/explicit CA PEM is validated with strict x509 parsing.
948
- - If `workerTLS` is not provided, SDK auto-provisions worker certs via gRPC `ProvisionWorkerCertificate`.
949
- - `workerTLS.cert` and `workerTLS.key` must be provided together.
950
- - `serve({ tls })` overrides global `workerTLS` for a specific worker instance.
951
-
952
- ### Offline queue behavior
953
-
954
- When the control plane is unavailable, SDK queues write operations (`event`, `job`, `workflow`, telemetry writes).
955
-
956
- - Queue size: `queueMaxSize` (default: 1000)
957
- - Overflow policy: `queueOverflow` (default: `"drop-oldest"`)
958
- - Return values for queued writes may be empty strings until flushed
959
-
960
- ---
961
-
962
- ## Environment Variables
513
+ All configuration lives on the `ServiceBridge` constructor — `new ServiceBridge(url, key, options)`. The SDK reads no environment variables; you decide where `url`, `key` and options come from. Every option is optional.
963
514
 
964
- The SDK requires values you pass into `servicebridge(...)`. Common setup:
965
-
966
- | Variable | Required | Example | Description |
515
+ | Option | Type | Default | Description |
967
516
  |---|---|---|---|
968
- | `SERVICEBRIDGE_URL` | yes | `localhost:14445` | gRPC control plane URL |
969
- | `SERVICEBRIDGE_SERVICE_KEY` | yes | `sbv2.<id>.<secret>.<ca>` | Service authentication key (sbv2 only) |
970
-
971
- ```ts
972
- const sb = servicebridge(
973
- process.env.SERVICEBRIDGE_URL ?? "localhost:14445",
974
- process.env.SERVICEBRIDGE_SERVICE_KEY!,
975
- );
976
- ```
977
-
978
- ---
979
-
980
- ## Error Handling
981
-
982
- `ServiceBridgeError` is exported for normalized SDK and runtime errors.
983
-
984
- ```ts
985
- import { servicebridge, ServiceBridgeError } from "service-bridge";
986
-
987
- try {
988
- await sb.rpc("payments", "payment.charge", { orderId: "ord_1" });
989
- } catch (e) {
990
- if (e instanceof ServiceBridgeError) {
991
- console.error(e.component, e.operation, e.severity, e.retryable, e.code);
992
- }
993
- throw e;
994
- }
995
- ```
996
-
997
- | Field | Type | Description |
998
- |---|---|---|
999
- | `component` | `string` | SDK subsystem (for example, `"rpc"` or `"event"`). |
1000
- | `operation` | `string` | Operation that failed. |
1001
- | `severity` | `"fatal" \| "retriable" \| "ignorable"` | Error classification. |
1002
- | `retryable` | `boolean` | Whether retry is recommended. |
1003
- | `code` | `number \| undefined` | gRPC status code (if available). |
1004
- | `cause` | `unknown` | Original underlying error. |
1005
-
1006
- ---
1007
-
1008
- ## When to Use / When Not to Use
1009
-
1010
- ### ServiceBridge is a good fit when you:
1011
-
1012
- - Have **3+ microservices** that need to communicate via RPC, events, or both
1013
- - Want **RPC + events + workflows + jobs** without managing separate infrastructure for each
1014
- - Need **end-to-end tracing** across all communication patterns in one timeline
1015
- - Want to **eliminate sidecar proxies** and reduce operational overhead
1016
- - Need **durable event delivery** with retry, DLQ, and replay without running a broker
1017
- - Are building **AI/LLM pipelines** and need realtime streaming with replay
1018
-
1019
- ### Consider alternatives when you:
1020
-
1021
- - Run a **single monolith** with no service decomposition plans
1022
- - Need **ultra-high-throughput event streaming** (100K+ msg/s sustained) — Kafka is purpose-built for this
1023
- - Need a **full API gateway** with rate limiting, auth plugins, and request transformation — use Kong/Envoy Gateway
1024
- - Already have a **mature Istio/Linkerd mesh** and only need traffic management (no events/workflows/jobs)
1025
- - Need **multi-region event replication** — ServiceBridge currently targets single-region deployments
1026
-
1027
- ---
1028
-
1029
- ## v2 Session API
1030
-
1031
- `session_v2.ts` implements the new Enterprise Session Protocol: a channel-based bidirectional stream with an 8-state FSM, adaptive heartbeat, and credit-based flow control. It is symmetric with the Go and Python SDKs.
1032
-
1033
- ### Session lifecycle (8-state FSM)
1034
-
1035
- ```
1036
- connecting → handshaking → ready ↔ active
1037
- ↘ suspended → (reconnect)
1038
- ↘ draining → closed
1039
- ↘ fenced (permanent)
1040
- ```
1041
-
1042
- | State | Description |
1043
- |-----------|----------|
1044
- | `connecting` | TCP/TLS connection is being established |
1045
- | `handshaking` | Hello sent, waiting for HelloAck |
1046
- | `ready` | HelloAck received, no commands in flight |
1047
- | `active` | Commands are in flight |
1048
- | `suspended` | Heartbeat missed 2+ times |
1049
- | `draining` | Graceful shutdown initiated |
1050
- | `fenced` | Server sent GOAWAY_FENCED; the session is closed permanently |
1051
- | `closed` | Connection closed |
1052
-
1053
- ### Quick start
1054
-
1055
- ```typescript
1056
- import { V2SessionClient, validateV2Config } from 'service-bridge';
1057
-
1058
- const cfg = {
1059
- serverAddress: 'localhost:9090',
1060
- instanceId: 'worker-1',
1061
- zone: 'us-east-1a',
1062
- transportMode: 'direct' as const,
1063
- maxInflight: 64,
1064
- };
1065
-
1066
- validateV2Config(cfg);
1067
- const session = new V2SessionClient(cfg);
1068
-
1069
- // Send Hello on connect
1070
- const hello = session.getHelloFields();
1071
-
1072
- // Handle HelloAck from the server
1073
- session.onHelloAck({
1074
- sessionId: 'sess-abc',
1075
- resumeToken: 'token-xyz',
1076
- epoch: 1n,
1077
- resumed: false,
1078
- resumeFromSeq: 0n,
1079
- replayedCommands: 0,
1080
- reconciledResults: 0,
1081
- heartbeatIntervalMs: 10_000,
1082
- heartbeatTimeoutMs: 30_000,
1083
- initialPermits: 64,
1084
- maxPermits: 128,
1085
- effectiveTransportMode: 'direct',
517
+ | `advertise` | `{ host, port } \| false` | `127.0.0.1` on a free port (with a warning) | Inbound RPC server address. Pass `{ host, port }` in containers / k8s; `false` for caller-only instances that never serve RPC. |
518
+ | `callDefaults` | `CallOpts` | `{}` | Default `CallOpts` merged under every `sb.rpc.call()` / `sb.stream()`. |
519
+ | `failOnPolicyViolation` | `boolean` | `false` | When `true`, any policy warning at registration makes `start()` surface a `disconnected` event and stop. Otherwise warnings are logged and emitted as `policy_violation`. |
520
+ | `telemetry` | `boolean` | `true` | Emit ops/logs/metrics to the runtime. `false` fully disables the telemetry transport. |
521
+ | `telemetryRingSize` | `number` | `262144` (256 KiB) | Byte budget for the in-memory ops ring buffer. |
522
+ | `dataDir` | `string` | `"./.servicebridge"` | Directory for the local SQLite event outbox. |
523
+ | `maxOutboxRows` | `number` | `100000` | Outbox rows before `publish` back-pressures with `OutboxFullError`. |
524
+ | `eventsDrainerBatch` | `number` | `50` | Outbox rows drained to the runtime per tick. |
525
+ | `eventsMaxInFlight` | `number` | `32` | Max concurrent inbound events processed by subscribers. |
526
+ | `payloadMaxBytes` | `number` | `65536` | Per-direction cap on captured payload bytes. |
527
+ | `reconnectIntervalMs` | `number` | `3000` | Delay between reconnect attempts. |
528
+ | `reconnectAttempts` | `number` | `3` | Reconnect attempts before giving up. `0` = unlimited. |
529
+
530
+ ```ts
531
+ const sb = new ServiceBridge("localhost:14445", KEY, {
532
+ advertise: { host: process.env.POD_IP!, port: 50051 },
533
+ callDefaults: { timeout: "10s" },
534
+ reconnectAttempts: 0,
535
+ dataDir: "/var/lib/myservice/sb",
1086
536
  });
1087
-
1088
- console.log(session.state); // 'ready'
1089
-
1090
- // Incoming command
1091
- const accepted = session.onCommandReceived(1n, 'cmd-001');
1092
- if (!accepted) {
1093
- // backpressure — permits = 0
1094
- }
1095
-
1096
- // Command completed
1097
- session.onCommandCompleted(1n, 'cmd-001');
1098
537
  ```
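
The two reconnect options in the table above interact in a simple way, which the following minimal sketch illustrates. It is not the SDK's internal implementation: `ReconnectPolicy`, `reconnectWithPolicy`, and the `connect` callback are hypothetical names introduced here, and the only facts taken from the table are that `reconnectIntervalMs` is the delay between attempts and that `reconnectAttempts: 0` means unlimited.

```typescript
// Hypothetical illustration of the documented option semantics; none of these
// names are part of the service-bridge API.
interface ReconnectPolicy {
  reconnectIntervalMs: number; // delay between reconnect attempts
  reconnectAttempts: number; // attempts before giving up; 0 = unlimited
}

async function reconnectWithPolicy(
  connect: () => Promise<void>,
  policy: ReconnectPolicy,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number> {
  for (let attempt = 1; ; attempt++) {
    try {
      await connect();
      return attempt; // number of attempts it took to connect
    } catch (err) {
      // With reconnectAttempts = 0 the budget is never exhausted.
      const exhausted =
        policy.reconnectAttempts !== 0 && attempt >= policy.reconnectAttempts;
      if (exhausted) throw err;
      await sleep(policy.reconnectIntervalMs);
    }
  }
}
```

Under these assumptions, a long-running service would likely set `reconnectAttempts: 0` as in the example above, while a short-lived CLI or batch job might prefer the finite default so that it fails fast.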
1099
538
 
1100
- ### Adaptive heartbeat (EWMA RTT)
1101
-
1102
- ```typescript
1103
- import { AdaptiveHeartbeatV2 } from 'service-bridge';
1104
-
1105
- const hb = new AdaptiveHeartbeatV2(10_000, 30_000);
1106
-
1107
- // Pong received
1108
- hb.onPong(25); // rttMs
1109
-
1110
- // Next interval (adapts to the EWMA RTT)
1111
- const nextMs = hb.nextIntervalMs();
539
+ ### Lifecycle
1112
540
 
1113
- // Miss: speed up pings
1114
- const missCount = hb.onMiss();
1115
- if (missCount >= 2) {
1116
- // reconnect
1117
- }
1118
- ```
1119
-
1120
- Algorithm: the base interval is `intervalMs / 3`; on misses it is divided by `2^miss` (min 2s); with a stable RTT < 50ms it doubles (max 30s).
1121
-
1122
- ### Credit-based flow control
1123
-
1124
- ```typescript
1125
- import { FlowControlStateV2 } from 'service-bridge';
541
+ ```ts
542
+ const sb = new ServiceBridge("localhost:14445", KEY);
1126
543
 
1127
- const fc = new FlowControlStateV2(64, 1, 128);
544
+ sb.service("payment-svc", { rpc: ["Charge"] }); // what you call
545
+ sb.rpc.handle("Ship", shipHandler, { schema: { protoFile: "./ship.proto" } }); // what you serve
1128
546
 
1129
- if (fc.tryConsume()) {
1130
- // dispatch command
1131
- }
547
+ sb.on("connected", ({ serviceName }) => console.log(`connected as ${serviceName}`));
548
+ sb.on("reconnecting", ({ attempt, reason }) => console.warn(`reconnecting #${attempt}: ${reason}`));
549
+ sb.on("disconnected", ({ reason }) => console.error(`disconnected: ${reason}`));
550
+ sb.on("policy_violation", (v) => console.warn(`policy: ${v.declaration} ${v.value} — ${v.reason}`));
1132
551
 
1133
- // Command finished: return the permit
1134
- fc.release(1);
552
+ await sb.start();
1135
553
 
1136
- // Server sent a FlowControlUpdate
1137
- fc.setWindow(32);
554
+ process.on("SIGTERM", async () => { await sb.stop(); process.exit(0); });
1138
555
  ```
1139
556
 
1140
- ### Reconnect and resume
1141
-
1142
- `BackoffV2` implements exponential backoff with full jitter (base=100ms, max=30s). On reconnect, `getHelloFields()` automatically includes `resumeToken`, `epoch`, `lastReceivedSeq`, `lastSentSeq`, and `completedCommandIds`, so the server resumes the session from the right position.
1143
-
1144
- ```typescript
1145
- import { BackoffV2 } from 'service-bridge';
557
+ ---
1146
558
 
1147
- const backoff = new BackoffV2();
559
+ ## Error handling
1148
560
 
1149
- while (true) {
1150
- if (backoff.isCircuitOpen()) break; // 10+ consecutive failures
561
+ Typed errors are exported from the package root, so you can `catch` precisely:
1151
562
 
1152
- const delayMs = backoff.next();
1153
- await new Promise(r => setTimeout(r, delayMs));
563
+ ```ts
564
+ import {
565
+ RpcAccessDeniedError,
566
+ WorkflowAccessDeniedError,
567
+ InvalidEventNameError,
568
+ OutboxFullError,
569
+ ServiceBridgeError,
570
+ } from "service-bridge";
1154
571
 
1155
- try {
1156
- // reconnect...
1157
- backoff.reset();
1158
- } catch {
1159
- backoff.recordFail();
572
+ try {
573
+ await payment.Charge({ userId: "u-1", amount: 100 });
574
+ } catch (err) {
575
+ if (err instanceof RpcAccessDeniedError) {
576
+ // denied by access policy: { serviceName, methodName, reason }
577
+ } else if (err instanceof ServiceBridgeError) {
578
+ // connection / provisioning failure with a typed .code
1160
579
  }
1161
580
  }
1162
581
  ```
1163
582
 
1164
- ### ConfigPush: dynamic transport configuration
1165
-
1166
- The server can send a `ConfigPush` with new routing rules at any time:
1167
-
1168
- ```typescript
1169
- session.onConfigPush({
1170
- defaultMode: 'direct',
1171
- serviceOverrides: {
1172
- 'payment-svc': { mode: 'proxy', fallbackPolicy: 'fallback_to_direct' },
1173
- },
1174
- functionOverrides: {
1175
- 'payment-svc.charge': { mode: 'proxy', timeoutMs: 5000 },
1176
- },
1177
- });
1178
-
1179
- // Resolve the transport mode for a function
1180
- const mode = session.resolveTransportMode('payment-svc.charge'); // 'proxy'
1181
- ```
1182
-
1183
- ### All session events
1184
-
1185
- | Method | Description |
1186
- |-------|----------|
1187
- | `getHelloFields()` | Fields to send in Hello (initial and resume) |
1188
- | `onHelloAck(ack)` | Handles the HelloAck from the server |
1189
- | `onCommandReceived(seq, id)` | Incoming command; returns `false` under backpressure |
1190
- | `onCommandCompleted(seq, id)` | Command completed; releases a permit |
1191
- | `onPermitGrant(n)` | Server granted `n` additional permits |
1192
- | `onFlowControlUpdate(size, reason)` | Server changed the window size |
1193
- | `onPong(rttMs)` | Pong received; updates the EWMA |
1194
- | `onHeartbeatMiss()` | Pong timeout; returns `true` when the session moves to `suspended` |
1195
- | `onDrain(reason, deadlineMs)` | Initiates a graceful drain |
1196
- | `onGoaway(code, reason)` | GoawaySignal from the server |
1197
- | `onConfigPush(config)` | Applies a new transport configuration |
1198
- | `resolveTransportMode(fnName)` | Returns the transport mode for a function |
1199
- | `stop()` | Closes the session immediately |
1200
-
1201
- ### Exported classes and types
1202
-
1203
- | Symbol | Kind | Description |
1204
- |--------|-----|----------|
1205
- | `V2SessionClient` | class | Main session client |
1206
- | `AdaptiveHeartbeatV2` | class | EWMA RTT heartbeat controller |
1207
- | `FlowControlStateV2` | class | Credit-based flow control |
1208
- | `BackoffV2` | class | Exponential backoff + circuit |
1209
- | `PositionTrackerV2` | class | Tracker for seq/completed IDs |
1210
- | `ConfigPushStateV2` | class | Dynamic configuration manager |
1211
- | `validateV2Config` | function | Validates the config; throws `Error` |
1212
- | `V2Config` | interface | Session configuration |
1213
- | `SessionStateV2` | type | Union of the 8 FSM states |
1214
- | `TransportMode` | type | `'direct' \| 'proxy'` |
1215
- | `HelloAckV2` | interface | HelloAck data from the server |
1216
- | `TransportConfigV2` | interface | ConfigPush payload |
1217
- | `ReconcileRequestV2` | interface | Declarative worker registration request |
1218
- | `FunctionDeclarationV2` | interface | Function declaration for Reconcile |
1219
- | `ConsumerGroupDeclarationV2` | interface | Consumer group declaration |
1220
- | `HttpRouteDeclarationV2` | interface | HTTP route declaration |
1221
- | `JobDeclarationV2` | interface | Job declaration |
1222
- | `WorkflowDeclarationV2` | interface | Workflow declaration |
1223
- | `SubscribeRequestV2` | interface | Registry subscribe request |
1224
- | `WorkerEndpointV2` | interface | Worker endpoint info |
1225
- | `IssueCertificateRequestV2` | interface | Certificate request |
1226
- | `IssueCertificateResponseV2` | interface | Certificate response |
1227
- | `CircuitBreakerConfigV2` | interface | Circuit breaker config |
1228
- | `ZoneConfigV2` | interface | Zone-aware config |
1229
- | `ServiceTransportOverride` | interface | Per-service transport override |
1230
- | `FunctionTransportOverride` | interface | Per-function transport override |
1231
- | `ResumeState` | interface | Reconnect resume state |
1232
-
1233
- Key types available for import:
1234
-
1235
- ```ts
1236
- import type {
1237
- WorkflowStep,
1238
- WorkerTLSOpts,
1239
- RpcContext,
1240
- EventContext,
1241
- StreamWriter,
1242
- TraceCtx,
1243
- RetryPolicy,
1244
- ServiceBridgeErrorSeverity,
1245
- } from "service-bridge";
1246
- ```
583
+ | Error | Thrown when |
584
+ |---|---|
585
+ | `RpcAccessDeniedError` | An RPC call is denied by access policy. Also fires a `policy_violation` event. |
586
+ | `WorkflowAccessDeniedError` | A workflow `start()` is denied by access policy. |
587
+ | `WorkflowNotFoundError` | Starting a workflow name the runtime doesn't know. |
588
+ | `WorkflowTerminalError` | Signalling/cancelling a run that already finished. |
589
+ | `InvalidEventNameError` | Publishing/defining an event whose name fails the naming rule. |
590
+ | `OutboxFullError` | The local event outbox is at `maxOutboxRows` (back-pressure). |
591
+ | `ServiceBridgeError` | Connection / provisioning failures; carries a typed `.code` (retryable ones drive auto-reconnect). |
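
Because `OutboxFullError` signals back-pressure rather than a permanent failure, one reasonable reaction is to wait and retry the publish. The sketch below shows that pattern under stated assumptions: `publishWithBackoff` and its parameters are hypothetical helpers, not SDK functions, and the local `OutboxFullError` class is only a stand-in so the snippet runs on its own. In real code you would import the class from `"service-bridge"` and pass your own publish call as the callback.

```typescript
// Stand-in for the package's OutboxFullError so this sketch is self-contained;
// in an application, import the real class from "service-bridge" instead.
class OutboxFullError extends Error {}

// Hypothetical helper: retries only on outbox back-pressure, with exponential
// delays (baseDelayMs, 2x, 4x, ...), and rethrows every other error at once.
async function publishWithBackoff<T>(
  publish: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 100,
  sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let retry = 0; ; retry++) {
    try {
      return await publish();
    } catch (err) {
      const isBackPressure = err instanceof OutboxFullError;
      if (!isBackPressure || retry >= maxRetries) throw err;
      await sleep(baseDelayMs * 2 ** retry);
    }
  }
}
```

Whether retrying is appropriate depends on the caller: a request handler may prefer to fail fast and return an error, while a background producer can usually afford to wait for the outbox to drain.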
1247
592
 
1248
593
  ---
1249
594
 
1250
595
  ## FAQ
1251
596
 
1252
- **How does ServiceBridge handle service failures?**
1253
- RPC calls have configurable retries with exponential backoff and hard per-attempt timeouts, so a silent downstream service cannot keep a call pending forever. Events are durable (PostgreSQL-backed) with at-least-once delivery per consumer group. Failed deliveries are retried according to policy, then moved to DLQ. Workflows track step state and can be resumed.
1254
-
1255
- **Is there vendor lock-in?**
1256
- ServiceBridge is self-hosted. The runtime is a single Go binary + PostgreSQL. SDK calls map to standard patterns (RPC, pub/sub, cron) — migrating away means replacing SDK calls with equivalent library calls.
597
+ **Do I have to use Protobuf?** You point handlers at a `.proto` file or a `.schema.json` with explicit field numbers. Both are file-based; there is no inline schema.
1257
598
 
1258
- **How does tracing work without an OTEL collector?**
1259
- The SDK automatically reports trace spans for every RPC call, event publish/delivery, workflow step, and HTTP request. The runtime stores traces in PostgreSQL and serves them via the built-in dashboard and a Loki-compatible API for Grafana integration.
599
+ **Does ServiceBridge proxy my HTTP traffic?** No. You run your own Express / Fastify / Hono server. The integration only discovers your routes for the Service Map and adds trace spans — your HTTP path is untouched.
1260
600
 
1261
- **Can I use ServiceBridge alongside existing infrastructure?**
1262
- Yes. You can adopt incrementally — start with RPC between two services, add events later, then workflows. ServiceBridge doesn't require replacing your existing broker or mesh all at once.
601
+ **How do I scale horizontally?** Run as many SDK instances as you like; the runtime load-balances RPC across live instances and fails over automatically. The runtime itself is a single source of truth backed by PostgreSQL.
1263
602
 
1264
- **What happens when the control plane is down?**
1265
- In-flight direct RPC calls continue working (they go service-to-service, not through the control plane). New discovery lookups, event publishes, and telemetry writes are queued in the SDK offline queue and flushed when the control plane recovers.
603
+ **What happens on a transient disconnect?** Published events sit in the local SQLite outbox and drain when the connection returns. The SDK auto-reconnects (configurable) and rotates certs with overlap so live instances don't drop traffic.
1266
604
 
1267
- **What databases does the runtime support?**
1268
- PostgreSQL 16+. The runtime uses PostgreSQL for all persistence: traces, events, workflows, jobs, service registry, and configuration.
605
+ **Where do I see traces, metrics and the DLQ?** In the runtime dashboard on `:14444`. Tracing, metrics and the dead-letter queue are operated there.
1269
606
 
1270
- ---
1271
-
1272
- ## Community and Support
1273
-
1274
- - Website: [servicebridge.dev](https://servicebridge.dev)
1275
- - GitHub: [github.com/service-bridge](https://github.com/service-bridge)
1276
- - SDK monorepo: [README.md](../README.md)
607
+ **Node or Bun?** Both. Node 18+ or any current Bun. Bun-native APIs are used where available.
1277
608
 
1278
609
  ---
1279
610
 
1280
- ## License
611
+ ## Community
1281
612
 
1282
- Free for non-commercial use. Commercial use requires a separate license. See [LICENSE](../LICENSE).
613
+ - **Website & docs:** [servicebridge.dev](https://servicebridge.dev) · [servicebridge.dev/docs](https://servicebridge.dev/docs)
614
+ - **SDK umbrella repo (all languages):** [github.com/service-bridge/sdk](https://github.com/service-bridge/sdk)
615
+ - **Runtime:** [github.com/servicebridge2/runtime](https://github.com/servicebridge2/runtime)
1283
616
 
1284
- Copyright (c) 2026 Eugene Surkov.
617
+ This is an alpha release (`2.0.0-alpha`). The API is stabilising — issues and feedback are welcome.
1285
618
 
1286
619
  ---
1287
620
 
1288
- ## Keywords
621
+ ## License
1289
622
 
1290
- service-bridge · servicebridge · npm install service-bridge · npm i service-bridge · bun add service-bridge · Node.js SDK · TypeScript SDK · JavaScript microservices · RPC · gRPC · event bus · event-driven · distributed tracing · workflow orchestration · background jobs · cron · mTLS · service mesh · service discovery · zero sidecar · Istio alternative · Envoy alternative · RabbitMQ alternative · Temporal alternative · Jaeger alternative · PostgreSQL · Docker · Kubernetes · DLQ · dead letter queue · saga · distributed transactions · AI agent orchestration · Express middleware · Fastify middleware · HTTP middleware · observability · Prometheus · tracing · service catalog · durable events · retries · idempotency · auto mTLS · runtime dashboard · production ready · microservice communication
623
+ Licensed under the **MIT License**; see [LICENSE](./LICENSE). Free for any use, including commercial; you only need to keep the copyright and license notice (attribution to esurkov1 <esurkovv@yandex.ru>).