service-bridge 1.9.0-dev.52 → 2.0.0-alpha

package/README.md CHANGED
@@ -1,1337 +1,623 @@
1
- <!-- keywords: service-bridge servicebridge npm install service-bridge Node.js TypeScript JavaScript microservices RPC gRPC event-bus event-driven distributed-tracing workflow orchestration background-jobs cron mTLS service-mesh service-discovery distributed-systems zero-sidecar Istio-alternative RabbitMQ-alternative Temporal-alternative Jaeger-alternative PostgreSQL Docker Kubernetes DLQ dead-letter-queue saga distributed-transactions AI-agent-orchestration Express Fastify HTTP-middleware observability Prometheus tracing service-catalog async-messaging durable-events retries idempotency auto-mTLS runtime-dashboard production-ready bun deno -->
1
+ <!--
2
+ Keywords: service-bridge, ServiceBridge, microservices, Node.js SDK, TypeScript SDK, Bun,
3
+ gRPC, mTLS, RPC framework, durable events, pub/sub, message broker alternative, RabbitMQ alternative,
4
+ workflow engine, saga, orchestration, Temporal alternative, job scheduler, cron, distributed tracing,
5
+ observability, OpenTelemetry alternative, Jaeger alternative, service mesh alternative, Istio alternative,
6
+ self-hosted, PostgreSQL, Express, Fastify, Hono, circuit breaker, idempotency, retries, load balancing.
7
+ -->
2
8
 
3
9
  # service-bridge
4
10
 
5
- [![npm version](https://img.shields.io/npm/v/service-bridge?color=cb3837&logo=npm)](https://www.npmjs.com/package/service-bridge)
6
- [![License](https://img.shields.io/badge/License-Free%20%2F%20Commercial-blue)](../LICENSE)
7
- [![TypeScript](https://img.shields.io/badge/TypeScript-5%2B-3178c6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
8
- [![Node](https://img.shields.io/badge/Node.js-18%2B-339933?logo=node.js&logoColor=white)](https://nodejs.org/)
11
+ [![npm version](https://img.shields.io/npm/v/service-bridge?color=cb3837&label=npm)](https://www.npmjs.com/package/service-bridge)
12
+ [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
13
+ [![TypeScript](https://img.shields.io/badge/types-included-3178c6.svg)](https://www.typescriptlang.org/)
14
+ [![Node](https://img.shields.io/badge/node-%E2%89%A518-339933.svg)](https://nodejs.org/)
9
15
 
10
- **The Unified Bridge for Microservices Interaction**
16
+ **The Node.js / Bun SDK for [ServiceBridge](https://servicebridge.dev) — RPC, durable events, workflows, jobs, streaming and full observability over one self-hosted runtime. No broker. No sidecar. No tracing stack. Just one Go binary plus PostgreSQL.**
11
17
 
12
- Node.js SDK for [ServiceBridge](https://servicebridge.dev): production-ready RPC, durable events, workflows, jobs, and distributed tracing in a single SDK. One Go runtime and PostgreSQL.
18
+ You declare what your service handles and what it calls. ServiceBridge does the rest: provisions an mTLS identity, opens the connection, registers your handlers, and routes every RPC, event, job and workflow step with tracing, metrics and access policy built in.
13
19
 
14
20
  ```
15
- ┌─────────────────────────────────────────────────────────────────┐
16
- │  BEFORE: 10 moving parts                                        │
17
- │  Istio · Envoy · RabbitMQ · Temporal · Jaeger · Consul ·        │
18
- │  cert-manager · Alertmanager · cron · custom glue               │
19
- └─────────────────────────────────────────────────────────────────┘
20
-
21
- ┌─────────────────────────────────────────────────────────────────┐
22
- │  AFTER: ServiceBridge + PostgreSQL                              │
23
- │  RPC · Events · Workflows · Jobs · Tracing · mTLS · Dashboard   │
24
- │  One SDK · One runtime · Zero sidecars                          │
25
- └─────────────────────────────────────────────────────────────────┘
21
+ BEFORE                                        AFTER
22
+
23
+ ┌─────────────────────┐
24
+ │ Istio + Envoy       │ ← mesh / mTLS
25
+ │ RabbitMQ / Kafka    │ ← events              ┌──────────────────────┐
26
+ │ Temporal            │ ← workflows           │                      │
27
+ │ a cron scheduler    │ ← jobs                │    ServiceBridge     │
28
+ │ gRPC plumbing       │ ← RPC           ═══►  │  runtime (1 binary)  │
29
+ │ Jaeger / Tempo      │ ← tracing             │          +           │
30
+ │ Prometheus wiring   │ ← metrics             │      PostgreSQL      │
31
+ │ Loki                │ ← logs                │                      │
32
+ │ a load balancer     │ ← LB / retries        └──────────────────────┘
33
+ │ service registry    │ ← discovery
34
+ └─────────────────────┘
35
+ 10+ moving parts                              2 things to run
26
36
  ```
27
37
 
28
- ## Table of Contents
38
+ ---
39
+
40
+ ## Table of contents
29
41
 
30
- - [Why ServiceBridge](#why-servicebridge)
31
- - [Use Cases](#use-cases)
32
- - [Quick Start](#quick-start)
33
42
  - [Install](#install)
34
- - [Runtime Setup](#runtime-setup)
35
- - [End-to-End Example](#end-to-end-example)
36
- - [Platform Features](#platform-features)
37
- - [How It Compares](#how-it-compares)
38
- - [API Reference](#api-reference)
39
- - [HTTP Plugins](#http-plugins)
43
+ - [Why ServiceBridge](#why-servicebridge)
44
+ - [Use cases](#use-cases)
45
+ - [Quick start](#quick-start)
46
+ - [Runtime setup](#runtime-setup)
47
+ - [End-to-end example](#end-to-end-example)
48
+ - [Platform features](#platform-features)
49
+ - [How it compares](#how-it-compares)
50
+ - [API reference](#api-reference)
51
+ - [RPC](#rpc)
52
+ - [Events](#events)
53
+ - [Jobs](#jobs)
54
+ - [Workflows](#workflows)
55
+ - [Streaming](#streaming)
56
+ - [Telemetry](#telemetry)
57
+ - [HTTP](#http)
58
+ - [HTTP plugins](#http-plugins)
40
59
  - [Configuration](#configuration)
41
- - [Environment Variables](#environment-variables)
42
- - [Error Handling](#error-handling)
43
- - [When to Use / When Not to Use](#when-to-use--when-not-to-use)
60
+ - [Error handling](#error-handling)
44
61
  - [FAQ](#faq)
45
- - [Community and Support](#community-and-support)
62
+ - [Community](#community)
46
63
  - [License](#license)
47
64
 
48
65
  ---
49
66
 
50
- ## Why ServiceBridge
51
-
52
- | Problem | Without ServiceBridge | With ServiceBridge |
53
- |---|---|---|
54
- | Service-to-service calls | Istio/Envoy sidecar proxy per pod | **Direct SDK-to-worker gRPC, zero proxy hops** |
55
- | Async messaging | Kafka/RabbitMQ + retry logic + DLQ setup | **Built-in durable events with retry, DLQ, replay** |
56
- | Background jobs | Bull/BullMQ + Redis + cron daemon | **Built-in cron and delayed jobs** |
57
- | Workflow orchestration | Temporal/Conductor cluster + persistence | **Built-in DAG workflows** |
58
- | Distributed tracing | Jaeger/Tempo + OTEL collector + dashboards | **Built-in traces + realtime UI** |
59
- | Service discovery | Consul/etcd + DNS glue | **Built-in registry + health-aware balancing** |
60
- | mTLS | cert-manager + Vault PKI | **Auto-provisioned certs from service key** |
61
-
62
- **Result**: `10 tools → 1 runtime`. One Go binary + PostgreSQL replaces the entire stack.
63
-
64
- ---
65
-
66
- ## Use Cases
67
-
68
- **Microservice communication** — Replace sidecar mesh with direct RPC calls. Get sub-millisecond overhead instead of double proxy hop latency.
69
-
70
- **Event-driven architecture** — Publish durable events with fan-out, retries, DLQ, idempotency, and server-side filtering. No broker infrastructure to manage.
71
-
72
- **Background job scheduling** — Cron jobs, delayed execution, and job-triggered workflows in a single API. No Redis, no separate queue workers.
73
-
74
- **Saga / distributed transactions** — DAG workflows with typed steps (`rpc`, `event`, `event_wait`, `sleep`, child workflow). Compensations and rollbacks via workflow step dependencies.
75
-
76
- **AI agent orchestration** — Stream LLM tokens via realtime trace streams with replay. Orchestrate multi-step AI pipelines as workflows.
77
-
78
- **Full-stack observability** — Every RPC call, event delivery, workflow step, and HTTP request traced automatically. One timeline, one dashboard. Prometheus metrics and Loki-compatible log API included.
79
-
80
- ---
81
-
82
- ## Quick Start
83
-
84
- ### 1. Install
67
+ ## Install
85
68
 
86
- ```bash
69
+ ```sh
87
70
  npm i service-bridge
88
71
  # or
89
72
  bun add service-bridge
90
73
  ```
91
74
 
92
- ### 2. Create a worker (service that handles calls)
93
-
94
- ```ts
95
- import { ServiceBridge } from "service-bridge";
96
-
97
- const sb = new ServiceBridge(
98
- process.env.SERVICEBRIDGE_URL ?? "localhost:14445",
99
- process.env.SERVICEBRIDGE_SERVICE_KEY!,
100
- );
101
-
102
- sb.rpc.handle("payment.charge", async (payload: { orderId: string; amount: number }) => {
103
- return { ok: true, txId: `tx_${Date.now()}`, orderId: payload.orderId };
104
- });
105
-
106
- await sb.start({ host: "localhost" });
107
- ```
108
-
109
- ### 3. Call it from another service
75
+ - **Runtime:** Node.js 18+ or any current Bun.
76
+ - **Types:** included, written in TypeScript 5.
77
+ - **Backend:** a running ServiceBridge runtime (gRPC control plane on `:14445`) backed by PostgreSQL 18+. See [Runtime setup](#runtime-setup).
110
78
 
111
79
  ```ts
112
80
  import { ServiceBridge } from "service-bridge";
113
81
 
114
82
  const sb = new ServiceBridge(
115
- process.env.SERVICEBRIDGE_URL ?? "localhost:14445",
116
- process.env.SERVICEBRIDGE_SERVICE_KEY!,
83
+ "localhost:14445", // runtime control-plane address
84
+ "sb_key_...", // bootstrap service key from the runtime
117
85
  );
118
-
119
- const result = await sb.rpc.invoke<{ ok: boolean; txId: string }>("payment.charge", {
120
- orderId: "ord_42",
121
- amount: 4990,
122
- });
123
-
124
- console.log(result.txId); // tx_1711234567890
125
- ```
126
-
127
- That's it. No broker, no sidecar, no proxy — direct gRPC call between services.
128
-
129
- ---
130
-
131
- ## Runtime Setup
132
-
133
- The SDK connects to a ServiceBridge runtime. The fastest way to start:
134
-
135
- ```bash
136
- bash <(curl -fsSL https://servicebridge.dev/install.sh)
137
86
  ```
138
87
 
139
- This installs ServiceBridge + PostgreSQL via Docker Compose and generates an admin password automatically. After install, the dashboard is at `http://localhost:14444` and the gRPC control plane at `localhost:14445`.
140
-
141
- For manual Docker Compose setup, configuration reference, and all runtime environment variables, see the **[Runtime Setup](../README.md#runtime-setup)** section in the main SDK README.
88
+ The third constructor argument is an [options](#configuration) object. The SDK reads **no environment variables**: every knob is a constructor option, so you stay in control of where config comes from.
142
89
 
143
90
  ---
144
91
 
145
- ## End-to-End Example
146
-
147
- A complete order flow: HTTP request → RPC → Event → Event handler with streaming.
148
-
149
- ```ts
150
- import { ServiceBridge } from "service-bridge";
151
-
152
- // --- Payments service (worker) ---
153
-
154
- const payments = new ServiceBridge("localhost:14445", process.env.SERVICEBRIDGE_SERVICE_KEY!);
155
-
156
- payments.rpc.handle("payment.charge", async (payload: { orderId: string; amount: number }, ctx) => {
157
- await ctx?.stream.write({ status: "charging", orderId: payload.orderId }, "progress");
158
-
159
- // ... charge logic ...
160
-
161
- await ctx?.stream.write({ status: "charged" }, "progress");
162
- return { ok: true, txId: `tx_${Date.now()}` };
163
- });
164
-
165
- await payments.start({ host: "localhost" });
166
- ```
167
-
168
- ```ts
169
- // --- Orders service (caller + event publisher) ---
170
-
171
- const orders = new ServiceBridge("localhost:14445", process.env.SERVICEBRIDGE_SERVICE_KEY!);
172
-
173
- // Call payments, then publish event
174
- const charge = await orders.rpc.invoke<{ ok: boolean; txId: string }>("payment.charge", {
175
- orderId: "ord_42",
176
- amount: 4990,
177
- });
178
-
179
- await orders.events.publish("orders.completed", {
180
- orderId: "ord_42",
181
- txId: charge.txId,
182
- }, {
183
- idempotencyKey: "order:ord_42:completed",
184
- headers: { source: "checkout" },
185
- });
186
- ```
187
-
188
- ```ts
189
- // --- Notifications service (event consumer) ---
190
-
191
- const notifications = new ServiceBridge("localhost:14445", process.env.SERVICEBRIDGE_SERVICE_KEY!);
92
+ ## Why ServiceBridge
192
93
 
193
- notifications.events.handle("orders.*", async (payload, ctx) => {
194
- const body = payload as { orderId: string; txId: string };
195
- await ctx.stream.write({ status: "sending_email", orderId: body.orderId }, "progress");
196
- // ... send email ...
197
- });
94
+ Microservices rarely fail because of business logic. They fail in the gaps *between* services — the broker that dropped a message, the workflow engine nobody fully understands, the trace that stops at a service boundary, the mesh config that takes a week to debug. Each gap is another system to run, secure and correlate.
198
95
 
199
- await notifications.start({ host: "localhost" });
200
- ```
96
+ ServiceBridge collapses those gaps into one runtime. Your service talks to a single gRPC endpoint over mTLS; the runtime is the single source of truth for routing, delivery and state.
201
97
 
202
- ```ts
203
- // --- Orchestrate as a workflow ---
204
-
205
- await orders.workflows.run("order.fulfillment", [
206
- { id: "reserve", type: "rpc", service: "inventory", ref: "inventory.reserve" },
207
- { id: "charge", type: "rpc", service: "payment", ref: "payment.charge", deps: ["reserve"] },
208
- { id: "wait_dlv", type: "event_wait", ref: "shipping.delivered", deps: ["charge"] },
209
- { id: "notify", type: "event", ref: "orders.fulfilled", deps: ["wait_dlv"] },
210
- ]);
211
- ```
98
+ | Problem | Without ServiceBridge | With ServiceBridge |
99
+ |---|---|---|
100
+ | Service-to-service calls | gRPC/HTTP plumbing + a mesh for mTLS + retries | `sb.rpc.call("svc", "Method", req)` — mTLS, LB, retries, breakers built in |
101
+ | Reliable async messaging | Stand up and operate a broker | `sb.event.publish(...)` — durable outbox, at-least-once, fan-out, DLQ |
102
+ | Multi-step business processes | A separate workflow engine to learn and host | `sb.workflow.handle(...)` — durable DAGs with compensation and replay |
103
+ | Scheduled work | A cron box or a job scheduler service | `sb.job.handle(...)` — cron / interval / delay, leased and retried |
104
+ | Knowing what happened | Wire up tracing + metrics + logs across N tools | Every hop is traced, measured and logged automatically |
105
+ | Identity & access | Certificates, a mesh policy layer | mTLS from a service key + granular access policy, on by default |
212
106
 
213
- Every step above (RPC, event publish, event delivery, workflow execution) appears in a single trace timeline in the built-in dashboard.
107
+ One binary, one database, one place to look when something breaks.
214
108
 
215
109
  ---
216
110
 
217
- ## Platform Features
218
-
219
- ### Communication
220
- - **Direct RPC** — zero-hop gRPC calls with retries, deadlines, and mTLS identity
221
- - **Durable Events** — fan-out delivery, guaranteed delivery (RabbitMQ-style), at-least-once guarantees, retries, DLQ, replay, idempotency. If a consumer is offline, the message waits in the server-side queue and is dispatched the moment the consumer reconnects — no retry budget consumed while waiting.
222
- - **Realtime Streams** — live chunks with replay for AI/progress/log streaming
223
- - **Service Discovery** — automatic endpoint resolution and round-robin balancing
224
- - **HTTP Middleware** — Express and Fastify instrumentation with automatic trace propagation
225
-
226
- ### Orchestration
227
- - **Workflows** — DAG steps: `rpc`, `event`, `event_wait`, `sleep`, child workflow
228
- - **Jobs** — cron, delayed, and workflow-triggered scheduling
111
+ ## Use cases
229
112
 
230
- ### Security
231
- - **TLS by default** — control plane TLS + worker mTLS with gRPC certificate provisioning
232
- - **Access Policy** — service-level caller/target restrictions and RBAC
233
-
234
- ### Observability
235
- - **Unified Tracing** — single trace timeline across HTTP, RPC, events, workflows, and jobs
236
- - **Metrics** — Prometheus-compatible `/metrics` endpoint (30+ metric families)
237
- - **Logs** — structured log ingest with Loki-compatible query API
238
- - **Alerts** — runtime alerts for delivery failures, errors, and service health
239
- - **Dashboard** — realtime web UI for traces, events, workflows, jobs, DLQ, service map, and service keys
240
-
241
- ---
242
-
243
- ## How It Compares
244
-
245
- | Concern | Istio + Envoy | Dapr | Temporal + Kafka | ServiceBridge |
246
- |---|---|---|---|---|
247
- | RPC data path | Sidecar proxy hop | Sidecar/daemon hop | N/A | **Direct (proxyless)** |
248
- | Service discovery | K8s control plane | Sidecar placement | External registry | **Built-in registry** |
249
- | Durable events + DLQ | External broker | Pub/Sub component | Kafka + consumers | **Built-in** |
250
- | Workflow orchestration | External engine | External engine | Built-in | **Built-in** |
251
- | Job scheduling | External cron/queue | External scheduler | External scheduler | **Built-in** |
252
- | Traces + UI | Jaeger/Tempo + dashboards | OTEL backend + dashboards | Temporal UI | **Built-in** |
253
- | Logs for Grafana | Loki + Promtail pipeline | Log pipeline | Log pipeline | **Built-in Loki API** |
254
- | Metrics | App/exporter setup | App/exporter setup | Multiple exporters | **Built-in `/metrics`** |
255
- | Security model | Mesh PKI + policy | Deployment-dependent mTLS | Mixed | **Service keys + auto mTLS** |
256
- | Operational footprint | Multi-component mesh | Runtime + sidecars | Workflow + broker + DB | **One binary + PostgreSQL** |
113
+ - **Replace a broker** — durable, at-least-once events with fan-out and a dead-letter queue, without operating Kafka or RabbitMQ.
114
+ - **Run sagas / orchestration** — checkout, onboarding, fulfilment as durable workflows with automatic compensation on failure.
115
+ - **Internal RPC backbone** — typed service-to-service calls with load balancing, retries and circuit breakers, secured by mTLS.
116
+ - **Scheduled & delayed work** — nightly rollups, reminders, periodic syncs as leased, retried jobs.
117
+ - **Streaming responses** — token-by-token LLM output or progress feeds over server-side streaming RPC.
118
+ - **Observability for free**: get a full distributed trace across RPC → event → workflow → job without instrumenting by hand.
257
119
 
258
120
  ---
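The "at-least-once" guarantee in the use-case list above means a consumer can receive the same event more than once, so handlers usually need to tolerate duplicates. The sketch below shows one generic way to do that. The `idempotent` wrapper, the `keyOf` callback and the `{ orderId }` payload shape are illustrative assumptions, not part of the `service-bridge` API, and the in-memory `Set` stands in for whatever durable store a real service would use.

```ts
type Handler<T> = (payload: T) => Promise<void>;

// Hypothetical helper (not an SDK export): skips payloads whose key was
// already processed successfully by this process.
function idempotent<T>(keyOf: (payload: T) => string, handler: Handler<T>): Handler<T> {
  const seen = new Set<string>(); // in-memory only; lost on restart
  return async (payload: T): Promise<void> => {
    const key = keyOf(payload);
    if (seen.has(key)) return; // duplicate delivery: nothing to do
    await handler(payload);
    // Record the key only after success, so a failed attempt can be redelivered.
    // Note: two concurrent deliveries of the same key can still both run.
    seen.add(key);
  };
}

// Usage: the side effect (here, pushing to an array) happens once per order.
const emailed: string[] = [];
const onOrderPlaced = idempotent<{ orderId: string }>(
  (p) => p.orderId,
  async (p) => {
    emailed.push(p.orderId);
  },
);
```

A wrapped handler like `onOrderPlaced` could then be passed wherever an event handler is registered; whether the runtime also deduplicates on its side is not stated in this README.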
259
121
 
260
- ## API Reference
261
-
262
- ### `ServiceBridge` / `ServiceBridgeService` surface
263
-
264
- Per-instance API for `new ServiceBridge(...)` (implements `ServiceBridgeService`):
122
+ ## Quick start
265
123
 
266
- - **Namespaces:** `rpc` (`handle`, `invoke`, `declare`), `events` (`handle`, `publish`, `publishWorker`, `declare`), `jobs` (`run`), `workflows` (`run`, `declare`).
267
- - **Lifecycle:** `start(opts?)`, `stop()`.
268
- - **Workflows:** `cancelWorkflow(traceId)`.
269
- - **HTTP & traces:** `startHttpSpan(opts)`, `registerHttpEndpoint(opts)`, `watchTrace(traceId, opts?)`.
270
- - **Module helpers (exported from `service-bridge`):** `getTraceContext`, `withTraceContext`, `ServiceBridgeError`, `mapGrpcStatus`, `SB`, `SB_MESSAGES`. (`captureConsole` exists internally for log capture but is not part of the public package exports.)
124
+ Schemas are **file-based**: point the SDK at a `.proto` file (it resolves request/response types from the `service` block) or a `.schema.json` with explicit field numbers. There is no inline schema.
271
125
 
272
- ### Cross-SDK parity notes
273
-
274
- ServiceBridge keeps the core API shape consistent across Node.js, Go, and Python:
275
- constructor, `rpc` / `events` / `jobs` / `workflows` namespaces, streams, `start`/`stop`, and `ServiceBridgeError`.
276
-
277
- Constructor-level defaults for `timeout`, `retries`, and `retryDelay` are available
278
- across all three SDKs. Parity differences are naming-only (language idioms):
279
-
280
- - Constructor TLS overrides: `workerTLS`/`caCert` (Node), `WorkerTLS`/`CACert` (Go), `worker_tls`/`ca_cert` (Python)
281
- - Handler hints: timeout/retryable/concurrency/prefetch are advisory in all SDKs
282
- - Shared `start()` fields across SDKs: host, max in-flight, instance ID, weight, and per-start TLS override
283
-
284
- ### `new ServiceBridge(url, serviceKey, opts?)`
285
-
286
- ```ts
287
- class ServiceBridge {
288
- constructor(url: string, serviceKey: string, opts?: ServiceBridgeOpts);
126
+ ```proto
127
+ // payment.proto
128
+ syntax = "proto3";
129
+ message ChargeRequest { string user_id = 1; int64 amount = 2; }
130
+ message ChargeReply { bool ok = 1; }
131
+ service Payment {
132
+ rpc Charge(ChargeRequest) returns (ChargeReply);
289
133
  }
290
134
  ```
291
135
 
292
- Creates an SDK client instance. Service identity is resolved by the runtime from the sbv2 `serviceKey` (key id). Use `new ServiceBridge(...)` as the **public** entry point from the `service-bridge` package (the constructor delegates to the same internal client setup used by the SDK; a lower-level factory exists in source but is **not** exported from the published entry).
293
-
294
- `ServiceBridgeOpts`:
295
-
296
- | Option | Type | Default | Description |
297
- |---|---|---|---|
298
- | `timeout` | `number` | `30000` | Default hard timeout per `rpc.invoke()` attempt (ms). |
299
- | `retries` | `number` | `3` | Default retry count for `rpc.invoke()`. |
300
- | `retryDelay` | `number` | `300` | Base backoff delay (ms) for `rpc.invoke()`. |
301
- | `discoveryRefreshMs` | `number` | `10000` | Discovery refresh period for endpoint updates. |
302
- | `queueMaxSize` | `number` | `1000` | Max offline queue size for control-plane writes. |
303
- | `queueOverflow` | `"drop-oldest" \| "drop-newest" \| "error"` | `"drop-oldest"` | Overflow strategy for offline queue. |
304
- | `heartbeatIntervalMs` | `number` | `10000` | Base heartbeat period for worker registrations. |
305
- | `captureLogs` | `boolean` | `true` | Forward `console.*` logs to ServiceBridge. |
306
- | `strictOutboundDeclarations` | `boolean` | `false` | When `true`, every outbound `rpc.invoke()` must be preceded by `rpc.declare(fn)` for the resolved target. |
307
-
308
- ### Advanced TLS overrides
309
-
310
- | Option | Type | Default | Description |
311
- |---|---|---|---|
312
- | `workerTLS` | `WorkerTLSOpts` | auto | Explicit cert/key/CA for worker mTLS. |
313
- | `caCert` | `string \| Buffer` | from `serviceKey` | Optional control-plane CA override. By default SDK reads CA from sbv2 service key. |
314
-
315
- `WorkerTLSOpts`:
136
+ **Worker**: register the handler. One argument in, one value out.
316
137
 
317
138
  ```ts
318
- type WorkerTLSOpts = {
319
- caCert?: string | Buffer;
320
- cert?: string | Buffer;
321
- key?: string | Buffer;
322
- serverName?: string;
323
- }
324
- ```
139
+ import { ServiceBridge } from "service-bridge";
325
140
 
326
- ---
141
+ const sb = new ServiceBridge("localhost:14445", process.env.PAYMENT_KEY!);
327
142
 
328
- ### `rpc.invoke(fn, payload?, opts?)`
143
+ sb.rpc.handle(
144
+ "Charge",
145
+ async (req: { userId: string; amount: number }) => {
146
+ return { ok: req.amount > 0 };
147
+ },
148
+ { schema: { protoFile: "./payment.proto" } },
149
+ );
329
150
 
330
- ```ts
331
- invoke<T = unknown>(fn: string, payload?: unknown, opts?: RpcOpts): Promise<T>
151
+ await sb.start();
332
152
  ```
333
153
 
334
- Calls a registered RPC handler on another worker. Direct gRPC path, no proxy.
335
-
336
- **Function name** — `fn` is a single **global function name** (the same string passed to `rpc.handle` on the callee), e.g. `payment.charge` or `user.get`. It must be unique in the catalog and **must not contain `/`**.
337
-
338
- `RpcOpts`:
339
-
340
- | Option | Type | Description |
341
- |---|---|---|
342
- | `timeout` | `number` | Call timeout in ms. |
343
- | `retries` | `number` | Retry count override. |
344
- | `retryDelay` | `number` | Base retry delay override. |
345
- | `traceId` | `string` | Explicit trace id. |
346
- | `parentSpanId` | `string` | Explicit parent span id. |
347
- | `mode` | `"direct" \| "proxy"` | Transport mode. `"direct"` (default) connects directly to the worker. `"proxy"` routes through the control plane when direct connection is unavailable. |
154
+ **Caller** — in another process, build a typed client and call it. `sb.client()` reads the `.proto` once, declares every method in its `service` block as an outgoing dependency, loads the schemas, and returns a typed proxy.
348
155
 
349
156
  ```ts
350
- const user = await sb.rpc.invoke<{ id: string; name: string }>("user.get", { id: "u_1" });
351
-
352
- const user2 = await sb.rpc.invoke<{ id: string; name: string }>("user.get", { id: "u_1" }, {
353
- timeout: 5000,
354
- retries: 2,
355
- });
356
- ```
357
-
358
- `rpc.invoke()` is bounded even when a downstream worker is silent:
359
- each attempt has a hard local timeout, retries are finite (`retries + 1` total attempts),
360
- and after the final failed attempt the root RPC span is closed with `error`.
361
-
362
- Retry delay uses exponential backoff: `retryDelay * 2^(attempt-1)`.
157
+ import { ServiceBridge } from "service-bridge";
363
158
 
364
- ---
159
+ const sb = new ServiceBridge("localhost:14445", process.env.ORDERS_KEY!);
160
+ const payment = await sb.client("payment-svc", "./payment.proto");
365
161
 
366
- ### `rpc.declare(fn)`
162
+ await sb.start();
367
163
 
368
- ```ts
369
- declare(fn: string): void
164
+ const res = await payment.Charge({ userId: "u-1", amount: 100 });
165
+ // res.ok === true
370
166
  ```
371
167
 
372
- Declares an outbound RPC dependency for registration metadata. When `strictOutboundDeclarations` is `true`, you must call `rpc.declare(fn)` before `rpc.invoke(fn, ...)` for that function. Does not invoke the remote handler.
168
+ Declare dependencies and build typed clients **before** `start()`, because they ride along in the first registration. Calls succeed once `start()` has connected.
373
169
 
374
170
  ---
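The `rpc.invoke()` notes removed in this diff describe the 1.9 retry behaviour as exponential backoff, `retryDelay * 2^(attempt-1)`, with `retries + 1` total attempts. A minimal sketch of that arithmetic follows. `backoffSchedule` is a hypothetical helper, not an SDK export, and it assumes that `attempt` counts from 1 and that each delay is applied before the corresponding retry.

```ts
// Hypothetical helper: lists the wait (in ms) before each retry, following
// the documented formula retryDelay * 2^(attempt - 1).
function backoffSchedule(retryDelay: number, retries: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= retries; attempt++) {
    delays.push(retryDelay * Math.pow(2, attempt - 1));
  }
  return delays;
}

// With the 1.9 defaults from the removed options table (retryDelay 300, retries 3):
const schedule = backoffSchedule(300, 3);
console.log(schedule.join(", ")); // 300, 600, 1200
```

Under those assumptions the default configuration makes four attempts in total and spends at most 2100 ms waiting between them, on top of the per-attempt timeout.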
375
171
 
376
- ### `events.publish(topic, payload?, opts?)`
172
+ ## Runtime setup
377
173
 
378
- ```ts
379
- publish(topic: string, payload?: unknown, opts?: EventOpts): Promise<string>
380
- ```
174
+ The SDK needs a running ServiceBridge runtime. Spin one up with the one-line installer:
381
175
 
382
- Publishes a durable event. Returns `messageId` when online.
383
-
384
- `EventOpts`:
385
-
386
- | Option | Type | Description |
387
- |---|---|---|
388
- | `traceId` | `string` | Explicit trace id. |
389
- | `parentSpanId` | `string` | Explicit parent span id. |
390
- | `idempotencyKey` | `string` | Idempotency key for dedup-safe publishing. |
391
- | `headers` | `Record<string, string>` | Custom metadata headers. |
392
-
393
- ```ts
394
- await sb.events.publish("orders.created", { orderId: "ord_42" }, {
395
- idempotencyKey: "order:ord_42",
396
- headers: { source: "checkout" },
397
- });
176
+ ```sh
177
+ bash <(curl -fsSL https://servicebridge.dev/install.sh)
398
178
  ```
399
179
 
400
- ---
401
-
402
- ### `events.publishWorker(topic, payload?, opts?)`
180
+ It pulls the runtime container, wires it to PostgreSQL 18+, and exposes the gRPC control plane on `:14445` and the dashboard on `:14444`. Open the dashboard, create a service, and copy its **bootstrap service key** — that opaque string is the second argument to `new ServiceBridge(url, key)`.
403
181
 
404
- ```ts
405
- publishWorker(
406
- topic: string,
407
- payload?: unknown,
408
- opts?: { traceId?: string; parentSpanId?: string; headers?: Record<string, string> },
409
- ): Promise<string>
410
- ```
182
+ Each instance authenticates with its key: the SDK calls `Bootstrap.Provision`, receives a short-lived leaf certificate, opens an mTLS gRPC channel and registers. Certificates rotate automatically with overlap (the new session is live before the old one closes), so long-running instances never drop traffic at renewal.
411
183
 
412
- Publishes over the worker session stream (after `start()`). If no worker session is active, the promise is rejected.
184
+ Full self-hosting docs live at **[servicebridge.dev/docs](https://servicebridge.dev/docs)**.
413
185
 
414
186
  ---
415
187
 
416
- ### `events.declare(topic)`
188
+ ## End-to-end example
417
189
 
418
- ```ts
419
- declare(topic: string): void
420
- ```
421
-
422
- Declares an outbound event dependency for registration metadata (does not publish a message).
423
-
424
- ---
425
-
426
- ### `jobs.run(service, fn, opts)` / `jobs.run(target, opts)`
190
+ A small order flow: an HTTP request triggers a workflow that charges a payment, then publishes an event another service consumes — all traced as one tree.
427
191
 
428
192
  ```ts
429
- run(service: string, fn: string, opts: ScheduleOpts & { via: "rpc" }): Promise<string>
430
- run(target: string, opts: ScheduleOpts & { via: "event" | "workflow" }): Promise<string>
431
- ```
432
-
433
- Registers a scheduled or delayed job. Resolves to the registration key: `"${service}/${fn}"` for the RPC overload, or the `target` string for the `event` / `workflow` overload.
434
-
435
- `ScheduleOpts`:
436
-
437
- | Option | Type | Description |
438
- |---|---|---|
439
- | `cron` | `string` | Cron expression. |
440
- | `delay` | `number` | Delay in ms before execution. Backed by `int32` in the proto — maximum ~24.8 days (~2,147,483,647 ms). |
441
- | `timezone` | `string` | Timezone for cron execution. |
442
- | `misfire` | `"fire_now" \| "skip"` | Misfire policy. |
443
- | `via` | `"event" \| "rpc" \| "workflow"` | Target type. |
444
- | `retryPolicyJson` | `string` | Retry policy JSON string. |
193
+ import { ServiceBridge } from "service-bridge";
445
194
 
446
- ```ts
447
- await sb.jobs.run("payments", "billing.collect", {
448
- cron: "0 * * * *",
449
- timezone: "UTC",
450
- via: "rpc",
195
+ const sb = new ServiceBridge("localhost:14445", process.env.ORDERS_KEY!);
196
+
197
+ // Outgoing dependencies declared before start().
198
+ sb.service("payment-svc", { rpc: ["Charge"] });
199
+ sb.event.define("order.placed", { protoFile: "./events.proto", input: "OrderPlaced" });
200
+
201
+ // A durable workflow: charge, then announce. Steps run by dependency level.
202
+ sb.workflow.handle("checkout", {
203
+ input: { type: "object", properties: { orderId: { type: "string" } } },
204
+ steps: [
205
+ { id: "charge", type: "call", service: "payment-svc", method: "Charge",
206
+ input: "$.input" },
207
+ { id: "announce", type: "publish", event: "order.placed",
208
+ input: "$.input", waitFor: ["charge"] },
209
+ ],
451
210
  });
452
- ```
453
-
454
- ---
455
-
456
- ### `workflows.run(name, steps, opts?)` — register DAG
457
-
458
- TypeScript (single method; behavior depends on the second argument):
459
-
460
- ```ts
461
- run(
462
- nameOrService: string,
463
- stepsOrName: WorkflowStep[] | string,
464
- inputOrOpts?: unknown,
465
- opts?: ExecuteWorkflowOpts,
466
- ): Promise<string | ExecuteWorkflowResult>
467
- ```
468
-
469
- - **Register:** when `stepsOrName` is `WorkflowStep[]`, `nameOrService` is the workflow name, `inputOrOpts` is optional `WorkflowOpts`, and the promise resolves to that name (`string`).
470
- - **Execute:** when `stepsOrName` is a `string`, `nameOrService` is the target **service** name, `stepsOrName` is the workflow name, `inputOrOpts` is the optional execution input, and `opts` is optional `ExecuteWorkflowOpts` (see execute section below).
471
-
472
- Overload as used when registering:
473
-
474
- ```ts
475
- run(name: string, steps: WorkflowStep[], opts?: WorkflowOpts): Promise<string>
476
- ```
477
-
478
- Registers (or updates) a workflow definition as a DAG of typed steps. Returns the workflow name.
479
211
 
480
- `WorkflowStep`:
212
+ sb.on("connected", ({ serviceName }) => console.log(`up as ${serviceName}`));
481
213
 
482
- | Field | Type | Description |
483
- |---|---|---|
484
- | `id` | `string` | Unique step identifier in the DAG. |
485
- | `type` | `"rpc" \| "event" \| "event_wait" \| "sleep" \| "workflow"` | Step execution type. |
486
- | `service` | `string` | Required for `rpc` and `workflow`: target service that owns the function or child workflow. |
487
- | `ref` | `string` | Target name: RPC function, event topic, waited topic, or child workflow name (per `type`). |
488
- | `deps` | `string[]` | Dependencies. Empty/omitted means root step. |
489
- | `if` | `string` | Optional filter expression (step is skipped if false). |
490
- | `timeoutMs` | `number` | Optional timeout for `rpc` and `event_wait` steps. |
491
- | `durationMs` | `number` | Required for `sleep` steps. |
492
-
493
- `WorkflowOpts` (third argument when registering a DAG — shape below; the interface is defined in the SDK but **not** re-exported from the main `service-bridge` package entry, so use an inline object in app code):
214
+ await sb.start();
494
215
 
495
- ```ts
496
- interface WorkflowOpts {
497
- stateLimitBytes?: number; // default 262144 (256 KB)
498
- stepTimeoutMs?: number; // default 30000 (30 s)
499
- }
216
+ // Kick off a run and wait for the final state.
217
+ const { runId } = await sb.workflow.start("checkout", { orderId: "o-1" });
218
+ const state = await sb.workflow.await(runId);
219
+ console.log("done", state);
500
220
  ```
501
221
 
502
- | Field | Type | Default | Description |
503
- |---|---|---|---|
504
- | `stateLimitBytes` | `number` | `262144` (256 KB) | Maximum serialized state size in bytes. |
505
- | `stepTimeoutMs` | `number` | `30000` (30 s) | Default per-step timeout in milliseconds. |
222
+ The consuming service just subscribes:
506
223
 
507
224
  ```ts
508
- await sb.workflows.run("order.fulfillment", [
509
- { id: "reserve", type: "rpc", service: "inventory", ref: "inventory.reserve" },
510
- { id: "charge", type: "rpc", service: "payment", ref: "payment.charge", deps: ["reserve"] },
511
- { id: "wait_5m", type: "sleep", durationMs: 300_000, deps: ["charge"] },
512
- { id: "notify", type: "event", ref: "orders.fulfilled", deps: ["wait_5m"] },
513
- ]);
225
+ sb.event.handle("order.placed", async (payload) => {
226
+ await sendReceipt(payload);
227
+ });
228
+ await sb.start();
514
229
  ```
515
230
 
516
- With explicit limits:
517
-
518
- ```ts
519
- await sb.workflows.run("checkout.flow", steps, { stepTimeoutMs: 60_000 });
520
- ```
231
+ In the dashboard you see one trace spanning the workflow run, the `Charge` RPC, the `order.placed` publish, and its delivery to the subscriber.
521
232
 
522
233
  ---
523
234
 
524
- ### `workflows.declare(service, name)`
235
+ ## Platform features
525
236
 
526
- ```ts
527
- declare(service: string, name: string): void
528
- ```
237
+ | Area | What you get |
238
+ |---|---|
239
+ | **Communication** | Direct RPC, server-side streaming, durable events, service discovery, full-mesh routing, a live service map |
240
+ | **Orchestration** | Workflows (DAG steps with compensation), sub-workflows, jobs (cron / interval / delayed), bidirectional replay |
241
+ | **Reliability** | At-least-once delivery, retries, DLQ, idempotency, fan-out, session resilience, multi-instance failover, circuit breakers |
242
+ | **Traffic control** | Load balancing, rate limiting, per-definition limits, filter expressions, adaptive performance |
243
+ | **Security** | TLS by default, mTLS identity, auto-provisioned certs from a service key, granular access policy |
244
+ | **Observability** | Unified tracing with propagation, Prometheus-compatible metrics, structured logs, smart alerts |
529
245
 
530
- Declares an outbound workflow dependency for registration metadata (does not start an execution).
246
+ Designed to run up to 1000 services against a single runtime.
531
247
 
532
248
  ---
533
249
 
534
- ### `workflows.run(service, name, input?, opts?)` — execute
250
+ ## How it compares
535
251
 
536
- This is the same `run` method as above when the second argument is the workflow **name** (`string`), not a step array:
252
+ | You'd otherwise reach for | ServiceBridge gives you |
253
+ |---|---|
254
+ | Istio / Linkerd (mesh, mTLS) | mTLS identity + routing + policy, no sidecars |
255
+ | RabbitMQ / Kafka / NATS | Durable events with outbox, fan-out, retries, DLQ |
256
+ | Temporal / Cadence | Durable workflows with compensation, signals, replay |
257
+ | A cron service / Quartz | Leased, retried scheduled jobs |
258
+ | Jaeger / Tempo + Prometheus + Loki | Tracing, metrics and logs, correlated out of the box |
259
+ | gRPC + a service registry | Typed RPC with discovery, LB and breakers |
537
260
 
538
- ```ts
539
- run(service: string, name: string, input?: unknown, opts?: ExecuteWorkflowOpts): Promise<ExecuteWorkflowResult>
540
- ```
261
+ The point isn't that ServiceBridge beats each tool at its own game — it's that you stop running and correlating ten of them.
541
262
 
542
- Starts a workflow execution on demand. The workflow must be registered first via `workflows.run(name, steps)`.
543
- An alternative to scheduling via `jobs.run(target, { via: "workflow", ... })` — triggers the execution immediately.
263
+ ---
544
264
 
545
- | Parameter | Type | Default | Description |
546
- |---|---|---|---|
547
- | `service` / `name` | `string` | required | Target service and workflow name. |
548
- | `input` | `unknown` | `undefined` | Optional JSON-serializable input payload (serialized as JSON for the runtime). |
265
+ ## API reference
549
266
 
550
- Returns `{ traceId }`. Use `traceId` with `watchTrace()` to observe execution in real time.
267
+ The bridge exposes four domains (`sb.rpc`, `sb.event`, `sb.job`, `sb.workflow`) plus `sb.stream()` and `sb.telemetry`. Register handlers and declare dependencies **before** `start()`.
551
268
 
552
- `ExecuteWorkflowOpts` (optional fourth argument):
269
+ ### RPC
553
270
 
554
- | Option | Type | Description |
555
- |---|---|---|
556
- | `traceId` | `string` | Declared on the exported type for API parity; the current Node implementation does **not** forward this field to the control plane (the gRPC request is built without it). Prefer relying on the returned `traceId`. |
271
+ `sb.rpc` is request/response: register handlers, call other services.
557
272
 
558
273
  ```ts
559
- const { traceId } = await sb.workflows.run("users", "user.onboarding", { userId: "u_123" });
560
- ```
561
-
562
- ---
563
-
564
- ### `cancelWorkflow(traceId)`
274
+ // Unary handler: (req) => res
275
+ sb.rpc.handle<ChargeRequest, ChargeReply>(
276
+ "Charge",
277
+ async (req) => ({ ok: req.amount > 0 }),
278
+ { schema: { protoFile: "./payment.proto" } },
279
+ );
565
280
 
566
- ```ts
567
- cancelWorkflow(traceId: string): Promise<void>
281
+ // Server-side streaming handler: (req) => AsyncIterable<chunk>
282
+ sb.rpc.handleStream<GenRequest, Token>(
283
+ "Generate",
284
+ async function* (req) {
285
+ for (const word of req.prompt.split(" ")) yield { token: word };
286
+ },
287
+ { schema: { protoFile: "./gen.proto" } },
288
+ );
568
289
  ```
569
290
 
570
- Cancels a running workflow instance.
291
+ Calling the typed proxy from `sb.client()` (preferred), or the lower-level `sb.rpc.call()`:
571
292
 
572
293
  ```ts
573
- await sb.cancelWorkflow("trace_01HQ...XYZ");
574
- ```
575
-
576
- ---
294
+ const res = await payment.Charge({ userId: "u-1", amount: 100 });
577
295
 
578
- ### `rpc.handle(fn, handler, opts?)`
579
-
580
- ```ts
581
- handle(
582
- fn: string,
583
- handler: (payload: unknown, ctx?: RpcContext) => unknown | Promise<unknown>,
584
- opts?: HandleRpcOpts,
585
- ): ServiceBridgeService
296
+ const res2 = await sb.rpc.call("payment-svc", "Charge",
297
+ { userId: "u-1", amount: 100 },
298
+ { timeout: "5s", idempotencyKey: "order-42" },
299
+ );
586
300
  ```
587
301
 
588
- Registers an RPC handler. Chainable.
302
+ `CallOpts` apply per call, layered over `callDefaults` from the constructor:
589
303
 
590
- `RpcContext`:
591
-
592
- | Field | Type | Description |
593
- |---|---|---|
594
- | `traceId` | `string` | Current trace ID. |
595
- | `spanId` | `string` | Current span ID. |
596
- | `stream` | `StreamWriter` | Real-time stream writer. |
597
-
598
- `HandleRpcOpts`:
599
-
600
- | Option | Type | Description |
601
- |---|---|---|
602
- | `timeout` | `number` | Advisory timeout hint (currently metadata-level, not hard-enforced by runtime). |
603
- | `retryable` | `boolean` | Advisory retry hint (currently metadata-level, not a strict policy switch). |
604
- | `concurrency` | `number` | Advisory concurrency hint (currently not hard-enforced). |
605
- | `schema` | `RpcSchemaOpts` | Inline protobuf schema for binary encode/decode. |
606
- | `allowedCallers` | `string[]` | Allow-list of caller service names. |
607
-
608
- ```ts
609
- sb.rpc.handle("ai.generate", async (payload: { prompt: string }, ctx) => {
610
- await ctx?.stream.write({ token: "Hello" }, "output");
611
- await ctx?.stream.write({ token: " world" }, "output");
612
- return { text: "Hello world" };
613
- });
614
- ```
615
-
616
- `StreamWriter`:
304
+ | `CallOpts` | Type | Default | Description |
305
+ |---|---|---|---|
306
+ | `timeout` | `string` | `"30s"` | Deadline, e.g. `"500ms"`, `"10s"`, `"2m"`. |
307
+ | `requestId` | `string` | random UUID v4 | Correlation id carried to the callee. |
308
+ | `transport` | `"direct" \| "proxy" \| "auto"` | `"auto"` | `direct` = caller→callee mTLS; `proxy` = via the runtime; `auto` = direct when an endpoint is known. |
309
+ | `idempotencyKey` | `string` | none | Opts into runtime-side dedup; replays within the TTL return the cached response. |
310
+ | `retry` | `Partial<RetryOpts>` | exp. backoff | `{ maxAttempts: 3, baseDelayMs: 200, factor: 2, maxDelayMs: 5000, jitter: 0.3 }`. Set `maxAttempts: 1` to disable. |
617
311
 
618
- | Method | Signature | Description |
619
- |---|---|---|
620
- | `write` | `write(data: unknown, key?: string): Promise<void>` | Append a real-time chunk to the trace stream. |
621
- | `end` | `end(key?: string): Promise<void>` | No-op placeholder for API symmetry (lifecycle managed by runtime). |
312
+ Without an `idempotencyKey`, ambiguous failures (`INTERNAL` / `ABORTED` / `UNKNOWN`) are treated as non-retryable so a non-idempotent call is never silently repeated. Schema-version mismatches are filtered at routing time, so blue-green deploys route `v1→v1` and `v2→v2` automatically.
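The default retry row and the ambiguity rule above can be illustrated with a small, self-contained sketch. It is not the SDK's implementation: the function names are hypothetical, the symmetric jitter formula is an assumption, and so is the idea that `UNAVAILABLE` is always safe to retry and that ambiguous codes become retryable once an `idempotencyKey` is present.

```ts
// Hypothetical sketch; none of these helpers are part of the service-bridge API.
interface RetryOpts {
  maxAttempts: number;
  baseDelayMs: number;
  factor: number;
  maxDelayMs: number;
  jitter: number; // assumed: fraction of the delay, applied symmetrically
}

// Defaults quoted in the CallOpts table.
const DEFAULT_RETRY: RetryOpts = {
  maxAttempts: 3, baseDelayMs: 200, factor: 2, maxDelayMs: 5000, jitter: 0.3,
};

// Delay before retry number `retry` (1 = first retry), capped at maxDelayMs.
// `rand` is injectable so the result can be made deterministic.
function backoffDelayMs(
  retry: number,
  opts: RetryOpts = DEFAULT_RETRY,
  rand: () => number = Math.random,
): number {
  const raw = Math.min(opts.baseDelayMs * Math.pow(opts.factor, retry - 1), opts.maxDelayMs);
  const spread = raw * opts.jitter;          // e.g. 30% of the delay
  const offset = (rand() * 2 - 1) * spread;  // uniform in [-spread, +spread)
  return Math.max(0, Math.round(raw + offset));
}

// Ambiguous outcomes: the callee may or may not have executed the request.
const AMBIGUOUS = new Set(["INTERNAL", "ABORTED", "UNKNOWN"]);
// Assumed for illustration: the request certainly did not complete.
const TRANSIENT = new Set(["UNAVAILABLE"]);

// `attempt` is the number of attempts already made.
function shouldRetry(
  code: string,
  attempt: number,
  hasIdempotencyKey: boolean,
  opts: RetryOpts = DEFAULT_RETRY,
): boolean {
  if (attempt >= opts.maxAttempts) return false;
  if (TRANSIENT.has(code)) return true;
  return AMBIGUOUS.has(code) && hasIdempotencyKey;
}
```

With `maxAttempts: 1` the first guard always returns `false`, which matches the documented way to disable retries.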
622
313
 
623
- ---
314
+ ### Events
624
315
 
625
- ### `events.handle(pattern, handler, opts?)`
316
+ Durable, at-least-once publish/subscribe. Events hit a local SQLite outbox first, then drain to the runtime, so a publish survives a transient disconnect.
626
317
 
627
318
  ```ts
628
- handle(
629
- pattern: string,
630
- handler: (payload: unknown, ctx: EventContext) => void | Promise<void>,
631
- opts?: HandleEventOpts,
632
- ): ServiceBridgeService
633
- ```
319
+ // Declare what you publish (same file-based SchemaSpec as RPC).
320
+ sb.event.define("order.placed", { protoFile: "./events.proto", input: "OrderPlaced" });
634
321
 
635
- Registers an event consumer handler. Chainable.
636
-
637
- `HandleEventOpts`:
638
-
639
- | Option | Type | Description |
640
- |---|---|---|
641
- | `concurrency` | `number` | Advisory concurrency hint (currently not hard-enforced). |
642
- | `prefetch` | `number` | Advisory prefetch hint (currently not hard-enforced). |
643
- | `retryPolicyJson` | `string` | Retry policy JSON string. |
644
- | `filterExpr` | `string` | Server-side filter expression. |
645
-
646
- The consumer group name is fixed as `<service-key-id>.<pattern>` (derived from your sbv2 key and the pattern string). Registering a second handler for the same pattern throws a duplicate consumer-group error.
647
-
648
- **Delivery guarantee**: once a message is accepted by the runtime, delivery to each consumer group
649
- is guaranteed. If the consumer is offline, the message waits in the server-side queue and is
650
- dispatched automatically the moment the service reconnects and registers its handlers — no retry
651
- budget is consumed while waiting. After `SERVICEBRIDGE_DELIVERY_TTL_DAYS` (default 7) days without
652
- a consumer, the delivery moves to DLQ with reason `delivery_ttl_exceeded`.
653
-
654
- `EventContext` helpers:
655
-
656
- - `ctx.traceId` — current trace ID
657
- - `ctx.spanId` — current span ID
658
- - `ctx.retry(delayMs?)` — ask for redelivery with optional delay
659
- - `ctx.reject(reason)` — move to DLQ immediately, bypassing remaining retries
660
- - `ctx.refs` — metadata (`topic`, `groupName`, `messageId`, `attempt`, `headers`)
661
- - `ctx.stream.write(...)` — append real-time chunks to trace stream
662
-
663
- ```ts
664
- sb.events.handle("orders.*", async (payload, ctx) => {
665
- const body = payload as { orderId?: string };
666
- if (!body.orderId) {
667
- ctx.reject("missing_order_id");
668
- return;
669
- }
670
- await ctx.stream.write({ status: "processing", orderId: body.orderId }, "progress");
322
+ // Subscribe exact name or wildcard ("order.*", "order.#").
323
+ sb.event.handle("order.placed", async (payload) => {
324
+ await fulfil(payload);
671
325
  });
672
- ```
673
326
 
674
- ---
327
+ await sb.start();
675
328
 
676
- ### `start(opts?)`
677
-
678
- ```ts
679
- start(opts?: StartOpts): Promise<void>
329
+ const { eventId } = await sb.event.publish("order.placed", { orderId: "o-1", total: 4200 });
680
330
  ```
681
331
 
682
- Starts the worker gRPC server and registers handlers with the control plane.
683
- The promise resolves once startup/registration is complete (it does not block
684
- the Node.js process). Throws immediately if no handlers are registered (neither `rpc.handle()` nor `events.handle()` have been called).
685
-
686
- `StartOpts`:
332
+ Event names must match `^[a-z0-9_-]+(\.[a-z0-9_-]+)*$`; an invalid name throws `InvalidEventNameError`. A full outbox throws `OutboxFullError`.
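To check names before publishing, the pattern quoted above can be applied directly. This is a minimal local sketch; `isValidEventName` is a hypothetical helper, not an SDK export.

```ts
// The documented pattern: dot-separated segments of lowercase letters, digits, "_" and "-".
const EVENT_NAME = /^[a-z0-9_-]+(\.[a-z0-9_-]+)*$/;

// Hypothetical helper: returns true when `name` satisfies the documented pattern.
function isValidEventName(name: string): boolean {
  return EVENT_NAME.test(name);
}

console.log(isValidEventName("order.placed"));  // true
console.log(isValidEventName("Order.Placed"));  // false: uppercase letters
console.log(isValidEventName("order..placed")); // false: empty segment
console.log(isValidEventName("order.*"));       // false: "*" is not in the name alphabet
```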
687
333
 
688
- | Option | Type | Description |
334
+ | `PublishOpts` | Type | Description |
689
335
  |---|---|---|
690
- | `host` | `string` | Bind host. Default: `localhost`. Use `0.0.0.0` in Docker/Kubernetes so ServiceBridge can reach the worker. |
691
- | `maxInFlight` | `number` | Max in-flight runtime-originated commands over `OpenWorkerSession`. Default: `128`. |
692
- | `instanceId` | `string` | Stable worker instance identifier. |
693
- | `weight` | `number` | Scheduling/discovery weight hint. |
694
- | `tls` | `WorkerTLSOpts` | Per-start worker TLS override. |
336
+ | `idempotencyKey` | `string` | Dedup key for at-least-once delivery. |
337
+ | `partitionKey` | `string` | Orders delivery within a partition. |
338
+ | `fireAndForget` | `boolean` | Skip the durable wait for the publish ack. |
339
+ | `headers` | `Record<string, string>` | Custom envelope headers. |
340
+ | `occurredAtMs` | `number` | Event time (unix-ms); defaults to now. |
695
341
 
696
- ```ts
697
- await sb.start({
698
- host: "localhost",
699
- instanceId: process.env.HOSTNAME,
700
- });
701
- ```
342
+ The runtime delivers at-least-once, retries failures, fans out to every matching subscriber, and dead-letters exhausted messages. The DLQ is operated from the dashboard — the SDK has no DLQ API; make handlers idempotent and throw to signal "retry me".
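Because delivery is at-least-once, a handler may see the same event twice. One way to make a handler idempotent is to record a key per processed event and skip repeats, as in the sketch below. The `idempotent` wrapper, the in-memory `Set`, and the choice of `orderId` as the key are all assumptions for illustration; a real service would likely use a durable store with an atomic check-and-set.

```ts
type OrderPlaced = { orderId: string; total: number };

// In-memory stand-in for a durable "already processed" store (hypothetical).
const processed = new Set<string>();
const receiptsSent: string[] = [];

// Wraps a handler so that a redelivered event with the same key is a no-op.
function idempotent<T>(
  keyOf: (payload: T) => string,
  handler: (payload: T) => Promise<void>,
): (payload: T) => Promise<void> {
  return async (payload: T): Promise<void> => {
    const key = keyOf(payload);
    if (processed.has(key)) return; // duplicate delivery: skip the work
    await handler(payload);         // a throw propagates, so the key is not recorded
    processed.add(key);             // record only after success
  };
}

const onOrderPlaced = idempotent<OrderPlaced>(
  (p) => p.orderId,
  async (p) => {
    if (p.total < 0) throw new Error("invalid total"); // throwing signals "retry me"
    receiptsSent.push(p.orderId);
  },
);
```

The wrapped function has the same `(payload) => Promise<void>` shape as the handlers shown above, so it could be passed wherever such a handler is expected.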
702
343
 
703
- ---
344
+ ### Jobs
704
345
 
705
- ### `stop()`
346
+ Scheduled work: cron, fixed interval, or one-shot delay. The runtime owns the schedule, leasing and retries.
706
347
 
707
348
  ```ts
708
- stop(): void
709
- ```
710
-
711
- Gracefully stops the worker gRPC server (try graceful shutdown, then force), heartbeats, channels, and SDK internals.
712
-
713
- ---
714
-
715
- ### `startHttpSpan(opts)`
716
-
717
- ```ts
718
- startHttpSpan(opts: {
719
- method: string;
720
- path: string;
721
- traceId?: string;
722
- parentSpanId?: string;
723
- }): HttpSpan
724
- ```
725
-
726
- Manual HTTP tracing primitive.
727
-
728
- ```ts
729
- const span = sb.startHttpSpan({ method: "GET", path: "/health" });
730
- try {
731
- span.end({ statusCode: 200, success: true });
732
- } catch (e) {
733
- span.end({ success: false, error: String(e) });
734
- }
735
- ```
736
-
737
- ---
349
+ sb.job.handle("nightly-rollup",
350
+ { trigger: { cron: "0 3 * * *", tz: "UTC" } }, // 5-field cron, no seconds
351
+ async (ctx) => { await rollup(ctx.scheduledAt); },
352
+ );
738
353
 
739
- ### `registerHttpEndpoint(opts)`
354
+ sb.job.handle("heartbeat", { trigger: { interval: 30_000 } }, async () => { await ping(); });
740
355
 
741
- ```ts
742
- registerHttpEndpoint(opts: {
743
- method: string;
744
- route: string;
745
- instanceId?: string;
746
- endpoint?: string;
747
- allowedCallers?: string[];
748
- requestSchemaJson?: string;
749
- responseSchemaJson?: string;
750
- transport?: string;
751
- }): Promise<void>
356
+ sb.job.handle("send-reminder",
357
+ { trigger: { delayed: { at: Date.now() + 60_000 } } }, // Date | number | ISO string
358
+ async (ctx) => { await remind(ctx.idempotencyKey); },
359
+ );
752
360
  ```
753
361
 
754
- Registers HTTP route metadata in the ServiceBridge service catalog (stored and sent on the next Reconcile). **Requires a completed worker `start()`**: until `start()` has finished successfully, the call resolves but does not record the route (HTTP middleware may invoke `registerHttpEndpoint` on first request; catalog entries appear only after `start()` has run).
362
+ The handler receives a `JobHandlerCtx`: `{ jobName, executionId, scheduledAt, localScheduledAt, attempt, idempotencyKey, signal }`.
755
363
 
756
- | Option | Type | Description |
757
- |---|---|---|
758
- | `method` | `string` | HTTP method: `GET`, `POST`, `PUT`, `PATCH`, `DELETE`, etc. |
759
- | `route` | `string` | Route pattern with parameter placeholders, e.g. `"/users/:id"`. |
760
- | `instanceId` | `string` | Present on the public opts type; **not** applied by the current Node client when building `http_routes` for Reconcile (worker identity comes from `start()`). |
761
- | `endpoint` | `string` | Same as above; use `start()` / deployment wiring for the reachable worker base URL. |
762
- | `allowedCallers` | `string[]` | Service names allowed to call (RBAC). |
763
- | `requestSchemaJson` | `string` | JSON schema for request validation metadata. |
764
- | `responseSchemaJson` | `string` | JSON schema for response validation metadata. |
765
- | `transport` | `string` | Present on the public opts type; **not** sent per route in the current Node reconcile payload. |
766
-
767
- ```ts
768
- await sb.registerHttpEndpoint({
769
- method: "GET",
770
- route: "/users/:id",
771
- requestSchemaJson: '{"type":"object"}',
772
- transport: "http",
364
+ | `JobOpts` | Type | Default | Description |
365
+ |---|---|---|---|
366
+ | `trigger` | `{cron, tz?} \| {delayed:{at}} \| {interval}` | required | Exactly one trigger; `interval` is in ms. |
367
+ | `catchup` | `"skip" \| "fire_once" \| "fire_all"` | `skip` | What to do for fire times missed during downtime. |
368
+ | `overlap` | `"skip" \| "allow" \| "buffer_one"` | `allow` | Behaviour when a previous run is still in flight. |
369
+ | `deps` | `DeclaredDep[]` | none | Outgoing deps: `{ rpc }`, `{ event }`, `{ workflow }`. |
370
+ | `maxAttempts` / `leaseTtlMs` / `maxConcurrent` / `retry` | | runtime default | Execution limits and `{ initialMs, maxMs, multiplier, jitter }` retry. |
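The three `catchup` values can be read as a choice over the fire times missed during downtime. The sketch below models that reading locally; the function name is hypothetical, and the assumption that `fire_once` runs the most recent missed time (rather than the earliest) is not confirmed by the text above.

```ts
type Catchup = "skip" | "fire_once" | "fire_all";

// Given the fire times (unix-ms, oldest first) missed during downtime,
// pick which of them should run on recovery.
function resolveMissed(missed: number[], policy: Catchup): number[] {
  switch (policy) {
    case "skip":
      return [];                // drop everything that was missed
    case "fire_once":
      return missed.slice(-1);  // assumed: a single run for the latest missed time
    case "fire_all":
      return [...missed];       // replay every missed fire time in order
  }
}

const missed = [1_000, 2_000, 3_000];
console.log(resolveMissed(missed, "skip"));      // []
console.log(resolveMissed(missed, "fire_once")); // [ 3000 ]
console.log(resolveMissed(missed, "fire_all"));  // [ 1000, 2000, 3000 ]
```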
371
+
372
+ ### Workflows
373
+
374
+ Durable DAGs. Declare the graph once; the runtime executes it, persists state between steps, survives restarts, and compensates on failure or cancel.
375
+
376
+ ```ts
377
+ sb.workflow.handle("checkout", {
378
+ input: { type: "object", properties: { orderId: { type: "string" } } },
379
+ steps: [
380
+ { id: "reserve", type: "call", service: "inventory-svc", method: "Reserve",
381
+ input: "$.input",
382
+ compensate: { service: "inventory-svc", method: "Release", input: "$.reserve" } },
383
+ { id: "charge", type: "call", service: "payment-svc", method: "Charge",
384
+ input: "$.input", waitFor: ["reserve"] },
385
+ { id: "notify", type: "publish", event: "order.placed",
386
+ input: "$.input", waitFor: ["charge"] },
387
+ ],
773
388
  });
774
389
  ```
775
390
 
776
- ---
391
+ Top-level steps run in parallel by default; `waitFor` declares dependencies and defines the execution levels. Step types: `call`, `publish`, `sleep`, `wait_event`, `wait_signal`, `workflow` (sub-workflow), `parallel`, `sequence`, `local`. Inputs are JSON-path expressions (`"$.input"`, `"$.reserve.id"`) over the accumulated run state.
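How `waitFor` turns a flat step list into execution levels can be sketched as a longest-path layering: a step sits one level below its deepest dependency, and steps on the same level may run in parallel. The `executionLevels` helper and the reduced `Step` shape are hypothetical; the runtime's actual scheduler is not shown in this README.

```ts
interface Step {
  id: string;
  waitFor?: string[];
}

// Groups step ids into levels: level 0 has no dependencies,
// level n depends on at least one step in level n - 1.
function executionLevels(steps: Step[]): string[][] {
  const byId = new Map<string, Step>();
  for (const s of steps) byId.set(s.id, s);

  const depth = new Map<string, number>();
  const visiting = new Set<string>();

  const depthOf = (id: string): number => {
    const known = depth.get(id);
    if (known !== undefined) return known;
    const step = byId.get(id);
    if (!step) throw new Error(`unknown step "${id}"`);
    if (visiting.has(id)) throw new Error(`dependency cycle at "${id}"`);
    visiting.add(id);
    let d = 0;
    for (const dep of step.waitFor ?? []) d = Math.max(d, depthOf(dep) + 1);
    visiting.delete(id);
    depth.set(id, d);
    return d;
  };

  const levels: string[][] = [];
  for (const s of steps) {
    const d = depthOf(s.id);
    const level = levels[d] ?? (levels[d] = []);
    level.push(s.id);
  }
  return levels;
}

const checkoutLevels = executionLevels([
  { id: "reserve" },
  { id: "charge", waitFor: ["reserve"] },
  { id: "notify", waitFor: ["charge"] },
]);
console.log(checkoutLevels); // [ [ 'reserve' ], [ 'charge' ], [ 'notify' ] ]
```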
777
392
 
778
- ### `watchTrace(traceId, opts?)`
393
+ Driving a run:
779
394
 
780
395
  ```ts
781
- watchTrace(traceId: string, opts?: WatchTraceOpts): AsyncIterable<TraceStreamEvent>
782
- ```
783
-
784
- Subscribes to a trace stream with replay and live updates. `traceId` is the stream
785
- identifier used by `ctx.stream.write(...)`.
786
-
787
- `WatchTraceOpts`:
788
-
789
- | Option | Type | Default | Description |
790
- |---|---|---|---|
791
- | `key` | `string` | `""` | Stream key filter (`""` = all keys). |
792
- | `fromSequence` | `number` | `0` | Replay from sequence cursor. |
396
+ const { runId } = await sb.workflow.start("checkout", { orderId: "o-1" });
793
397
 
794
- `TraceStreamEvent`:
398
+ const state = await sb.workflow.await(runId); // block until terminal
399
+ const snap = await sb.workflow.query(runId); // { status, state, steps: [...] }
400
+ await sb.workflow.signal(runId, "approval", { ok: 1 }); // resume a wait_signal step
401
+ await sb.workflow.cancel(runId); // compensate in reverse
402
+ const { runId: forked } = await sb.workflow.replay(runId, { fromStepId: "charge" });
403
+ ```
795
404
 
796
- | Field | Type | Description |
797
- |---|---|---|
798
- | `type` | `"chunk" \| "trace_complete"` | Event kind. |
799
- | `traceId` | `string` | Trace identifier being watched. |
800
- | `key` | `string` | Stream lane key. |
801
- | `sequence` | `number` | Monotonic sequence number. |
802
- | `data` | `unknown` | JSON-decoded chunk payload. |
803
- | `traceStatus` | `string \| undefined` | Final status on `trace_complete`. |
405
+ Use `sb.workflow.query()` for the snapshot; there is no `getStatus`. `start()` with no permission throws `WorkflowAccessDeniedError`; an unknown name throws `WorkflowNotFoundError`; signalling/cancelling a finished run throws `WorkflowTerminalError`.
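The "compensate in reverse" behaviour noted on `cancel` is the usual saga unwind: completed steps are undone newest first, and steps without a `compensate` entry are skipped. The sketch below models that ordering locally with plain callbacks; `CompletedStep` and `compensateInReverse` are hypothetical names, and in ServiceBridge the runtime, not application code, performs this unwind.

```ts
interface CompletedStep {
  id: string;
  compensate?: () => void; // stand-in for a compensation call such as "Release"
}

// Runs compensations for completed steps, newest first.
// Returns the ids that were compensated, in the order they ran.
function compensateInReverse(completed: CompletedStep[]): string[] {
  const order: string[] = [];
  for (const step of [...completed].reverse()) {
    if (!step.compensate) continue; // nothing to undo for this step
    step.compensate();
    order.push(step.id);
  }
  return order;
}

const undone: string[] = [];
const unwound = compensateInReverse([
  { id: "reserve", compensate: () => undone.push("release-stock") },
  { id: "charge", compensate: () => undone.push("refund") },
  { id: "notify" },
]);
console.log(unwound); // [ 'charge', 'reserve' ]
console.log(undone);  // [ 'refund', 'release-stock' ]
```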
804
406
 
805
- Behavior:
407
+ ### Streaming
806
408
 
807
- - Auto-reconnect with exponential backoff (`500ms` to `5000ms`) on retryable stream failures.
808
- - Deduplicates by `sequence` across reconnects.
809
- - Enforces strict JSON for `type="chunk"` payloads (non-JSON chunk terminates stream with fatal error).
810
- - Enforces internal queue limit `256`; overflow is fatal (consumer must drain promptly).
409
+ Server-side streaming is a first-class RPC shape. Register with `sb.rpc.handleStream`, consume with `sb.stream()` (or the typed proxy, which auto-detects `returns (stream T)` methods).
811
410
 
812
411
  ```ts
813
- for await (const evt of sb.watchTrace(traceId, { key: "output", fromSequence: 0 })) {
814
- if (evt.type === "chunk") {
815
- process.stdout.write(String((evt.data as { token?: string }).token ?? ""));
816
- }
817
- if (evt.type === "trace_complete") break;
412
+ for await (const chunk of sb.stream("gen-svc", "Generate", { prompt: "write a haiku" })) {
413
+ process.stdout.write(chunk.token);
818
414
  }
819
415
  ```
820
416
 
821
- ---
822
-
823
- ### Trace Utilities
824
-
825
- #### `getTraceContext()`
417
+ Breaking the loop (`break`/`return`) tears down the gRPC stream end to end. Streams are single-pick — never retried — by design.
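The teardown-on-`break` behaviour relies on standard async-iterator semantics: leaving a `for await` loop early calls the iterator's `return()`, which runs any `finally` block in an async generator. The sketch below demonstrates that with a local generator standing in for the server stream; it does not use the SDK, and the `tornDown` flag only plays the role of closing the gRPC stream.

```ts
let tornDown = false;

// Stand-in for a server stream. The finally block is where cleanup would happen.
async function* tokens(prompt: string): AsyncGenerator<{ token: string }> {
  try {
    for (const word of prompt.split(" ")) yield { token: word };
  } finally {
    tornDown = true; // runs on normal completion and on early exit
  }
}

// Consumes only the first two chunks, then leaves the loop.
async function firstTwo(prompt: string): Promise<string[]> {
  const seen: string[] = [];
  for await (const chunk of tokens(prompt)) {
    seen.push(chunk.token);
    if (seen.length === 2) break; // leaving the loop calls the iterator's return()
  }
  return seen;
}
```

After `firstTwo("write a haiku now")` resolves, `tornDown` is `true` even though the generator never reached its last word.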
826
418
 
827
- ```ts
828
- getTraceContext(): TraceCtx | undefined
829
- ```
419
+ ### Telemetry
830
420
 
831
- Returns the current async-local trace context.
421
+ Telemetry flows automatically: every RPC, event, job, workflow step and HTTP request emits an operation span and propagates the trace across hops. Add your own through `sb.telemetry`; anything emitted inside a handler nests under that handler's trace.
832
422
 
833
423
  ```ts
834
- import { getTraceContext } from "service-bridge";
424
+ import { Channel, UserSubOp } from "service-bridge";
835
425
 
836
- const tc = getTraceContext();
837
- if (tc) {
838
- console.log(tc.traceId, tc.spanId);
426
+ const op = sb.telemetry.startOp({
427
+ channel: Channel.USER, kind: UserSubOp, subject: "reprice-cart", businessKey: cartId,
428
+ });
429
+ try {
430
+ await reprice(cartId);
431
+ op.end(/* Status.SUCCESS */);
432
+ } catch (err) {
433
+ op.end(Status.ERROR, String(err)); // assumes `Status` is exported by "service-bridge" and imported above
434
+ throw err;
839
435
  }
840
- ```
841
-
842
- #### `withTraceContext(ctx, fn)`
843
436
 
844
- ```ts
845
- withTraceContext<T>(ctx: TraceCtx, fn: () => T): T
437
+ sb.telemetry.log.info("cart repriced", { cartId, items: 7 }); // also sb.logger
438
+ sb.telemetry.counter("carts_repriced_total").inc();
439
+ sb.telemetry.gauge("queue_depth").set(42);
440
+ sb.telemetry.histogram("reprice_ms", "ms").observe(12.5);
846
441
  ```
847
442
 
848
- Runs a function inside an explicit trace context.
443
+ `startOp()` returns a handle whose `.end(status, message?)` closes the span. Anything emitted before `start()` buffers in an in-memory ring and drains once connected.
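The buffer-then-drain behaviour can be pictured with a small ring buffer in front of a sink. This is a sketch under assumptions: the class names are hypothetical, and dropping the oldest entry on overflow is a guess at the ring's policy, not something stated above.

```ts
// Fixed-capacity buffer; when full, the oldest entry is dropped (assumed policy).
class RingBuffer<T> {
  private items: T[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  push(item: T): void {
    if (this.items.length === this.capacity) this.items.shift();
    this.items.push(item);
  }

  // Returns everything buffered so far, oldest first, and empties the buffer.
  drain(): T[] {
    const out = this.items;
    this.items = [];
    return out;
  }
}

// Buffers emissions until a sink is connected, then forwards directly.
class BufferedEmitter<T> {
  private readonly pending: RingBuffer<T>;
  private sink: ((item: T) => void) | null = null;

  constructor(capacity: number) {
    this.pending = new RingBuffer<T>(capacity);
  }

  emit(item: T): void {
    if (this.sink) this.sink(item);
    else this.pending.push(item);
  }

  connect(sink: (item: T) => void): void {
    this.sink = sink;
    for (const item of this.pending.drain()) sink(item);
  }
}

const received: string[] = [];
const emitter = new BufferedEmitter<string>(2);
emitter.emit("a");
emitter.emit("b");
emitter.emit("c");                        // capacity 2: "a" is dropped
emitter.connect((x) => received.push(x)); // drains "b", "c"
emitter.emit("d");                        // forwarded immediately
console.log(received); // [ 'b', 'c', 'd' ]
```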
849
444
 
850
- ```ts
851
- import { withTraceContext } from "service-bridge";
445
+ ### HTTP
852
446
 
853
- withTraceContext({ traceId: "trace-1", spanId: "span-1" }, async () => {
854
- await sb.events.publish("audit.log", { action: "user.login" });
855
- });
856
- ```
447
+ ServiceBridge does **not** proxy your business HTTP. You run your own server; the integration discovers your routes, publishes them to the Service Map, and wraps each request in a trace span so HTTP stitches into the same trace as the RPCs and events it triggers. See [HTTP plugins](#http-plugins).
448
+
449
+ Useful read accessors after `start()`: `sb.identity()` (current session identity or `null`), `sb.serviceMap()` (live registry: visible methods, instances, endpoints), `sb.policyEvaluation()` (the runtime's current access-policy verdict).
857
450
 
858
451
  ---
859
452
 
860
- ## HTTP Plugins
453
+ ## HTTP plugins
861
454
 
862
- ### Express (`service-bridge/express`)
455
+ Each integration is a subpath import with an optional peer dependency.
863
456
 
864
- ```bash
865
- npm install express
866
- ```
457
+ **Express** — `service-bridge/express`:
867
458
 
868
459
  ```ts
869
460
  import express from "express";
870
461
  import { ServiceBridge } from "service-bridge";
871
- import { servicebridgeMiddleware, registerExpressRoutes } from "service-bridge/express";
462
+ import { attachExpress } from "service-bridge/express";
872
463
 
873
- const sb = new ServiceBridge(process.env.SERVICEBRIDGE_URL!, process.env.SERVICEBRIDGE_SERVICE_KEY!);
874
464
  const app = express();
465
+ app.post("/orders", (req, res) => res.json({ ok: true }));
875
466
 
876
- app.use(servicebridgeMiddleware({
877
- client: sb,
878
- excludePaths: ["/health"],
879
- autoRegister: true,
880
- }));
467
+ const sb = new ServiceBridge("localhost:14445", KEY);
468
+ await sb.start();
881
469
 
882
- app.get("/users/:id", async (req, res) => {
883
- const user = await req.servicebridge.rpc.invoke("user.get", { id: req.params.id });
884
- res.json(user);
885
- });
470
+ app.listen(3000, () => attachExpress(app, sb, { port: 3000 }));
886
471
  ```
887
472
 
888
- #### `servicebridgeMiddleware(options)`
889
-
890
- ```ts
891
- servicebridgeMiddleware(options: {
892
- client: ServiceBridgeService;
893
- excludePaths?: string[];
894
- propagateTraceHeader?: boolean;
895
- autoRegister?: boolean;
896
- }): express.RequestHandler
897
- ```
898
-
899
- - Attaches `req.servicebridge`, `req.traceId`, `req.spanId`
900
- - Starts/ends HTTP span automatically
901
- - Optionally sets `x-trace-id` response header
902
- - Optionally auto-registers route pattern in catalog on first hit
903
-
904
- #### `registerExpressRoutes(app, client, opts?)`
905
-
906
- Eager route catalog registration without waiting for first request.
907
-
908
- ```ts
909
- await registerExpressRoutes(app, sb, {
910
- endpoint: "http://10.0.0.5:3000",
911
- allowedCallers: ["api-gateway"],
912
- excludePaths: ["/health"],
913
- });
914
- ```
915
-
916
- ---
917
-
918
- ### Fastify (`service-bridge/fastify`)
919
-
920
- ```bash
921
- npm install fastify
922
- ```
473
+ **Fastify** (`service-bridge/fastify`):
923
474
 
924
475
  ```ts
925
476
  import Fastify from "fastify";
926
477
  import { ServiceBridge } from "service-bridge";
927
- import { servicebridgePlugin, wrapHandler } from "service-bridge/fastify";
478
+ import { sbFastify } from "service-bridge/fastify";
928
479
 
929
- const sb = new ServiceBridge(process.env.SERVICEBRIDGE_URL!, process.env.SERVICEBRIDGE_SERVICE_KEY!);
930
480
  const app = Fastify();
481
+ const sb = new ServiceBridge("localhost:14445", KEY);
931
482
 
932
- await app.register(servicebridgePlugin, {
933
- client: sb,
934
- excludePaths: ["/health"],
935
- autoRegister: true,
936
- });
483
+ app.post("/orders", async () => ({ ok: true }));
484
+ await app.register(sbFastify, { sb }); // discovers routes + endpoint in onListen
937
485
 
938
- app.get("/users/:id", wrapHandler(async (request, reply) => {
939
- const user = await request.servicebridge.rpc.invoke("user.get", {
940
- id: (request.params as any).id,
941
- });
942
- return reply.send(user);
943
- }));
486
+ await sb.start();
487
+ await app.listen({ port: 3000 });
944
488
  ```
945
489
 
946
- #### `servicebridgePlugin(fastify, options)`
490
+ **Hono** (`service-bridge/hono`):
947
491
 
948
492
  ```ts
949
- servicebridgePlugin(fastify, {
950
- client,
951
- excludePaths?,
952
- propagateTraceHeader?,
953
- autoRegister?,
954
- register?: {
955
- instanceId?,
956
- endpoint?,
957
- allowedCallers?,
958
- excludePaths?,
959
- },
960
- })
961
- ```
962
-
963
- - Decorates `request.servicebridge`, `request.traceId`, `request.spanId`
964
- - Traces HTTP lifecycle via hooks
965
- - Auto-registers routes on `onRoute` before traffic
966
-
967
- #### `wrapHandler(handler)`
968
-
969
- Runs a Fastify handler inside the current trace context so downstream SDK calls inherit the trace.
970
-
971
- ---
972
-
973
- ### Trace Utilities (HTTP Plugins)
493
+ import { Hono } from "hono";
494
+ import { ServiceBridge } from "service-bridge";
495
+ import { attachHono } from "service-bridge/hono";
974
496
 
975
- #### `extractTraceFromHeaders(headers)`
497
+ const app = new Hono();
498
+ app.post("/orders", (c) => c.json({ ok: true }));
976
499
 
977
- ```ts
978
- import { extractTraceFromHeaders } from "service-bridge/express";
979
- // or
980
- import { extractTraceFromHeaders } from "service-bridge/fastify";
500
+ const sb = new ServiceBridge("localhost:14445", KEY);
501
+ await sb.start();
981
502
 
982
- const { traceId, parentSpanId } = extractTraceFromHeaders(req.headers);
503
+ attachHono(app, sb, { port: 3000 }); // Hono doesn't own the socket — pass the port
504
+ Bun.serve({ port: 3000, fetch: app.fetch });
983
505
  ```
984
506
 
985
- Extracts trace context from HTTP headers. Supports W3C `traceparent`, `x-trace-id`/`x-span-id` headers, and generates random IDs as fallback. Useful for custom HTTP framework integrations (Hono, Koa, etc.).
507
+ `attachExpress`/`attachHono` take `{ port, host? }`; `sbFastify` reads the bound address itself. Host defaults to the bound socket, falling back to `127.0.0.1`. Attaching before `start()` is safe: the endpoint rides along in the first registration.
986
508
 
987
509
  ---
988
510
 
989
511
  ## Configuration
990
512
 
991
- ### TLS behavior
513
+ All configuration lives on the `ServiceBridge` constructor — `new ServiceBridge(url, key, options)`. The SDK reads no environment variables; you decide where `url`, `key` and options come from. Every option is optional.
992
514
 
993
- - Worker transport is TLS-only.
994
- - Control plane is TLS-only. Trust source is embedded into sbv2 service key by default.
995
- - Embedded/explicit CA PEM is validated with strict x509 parsing.
996
- - If `workerTLS` is not provided, SDK auto-provisions worker certs via gRPC `ProvisionWorkerCertificate`.
997
- - `workerTLS.cert` and `workerTLS.key` must be provided together.
998
- - `start({ tls })` overrides global `workerTLS` for a specific worker instance.
999
-
1000
- ### Offline queue behavior
1001
-
1002
- When the control plane is unavailable, SDK queues write operations (`events.publish`, `jobs.run`, `workflows.run`, telemetry writes).
1003
-
1004
- - Queue size: `queueMaxSize` (default: 1000)
1005
- - Overflow policy: `queueOverflow` (default: `"drop-oldest"`)
1006
- - Return values for queued writes may be empty strings until flushed
1007
-
1008
- ---
1009
-
1010
- ## Environment Variables
1011
-
1012
- The SDK requires values you pass into `new ServiceBridge(...)`. Common setup:
1013
-
1014
- | Variable | Required | Example | Description |
515
+ | Option | Type | Default | Description |
1015
516
  |---|---|---|---|
1016
- | `SERVICEBRIDGE_URL` | yes | `localhost:14445` | gRPC control plane URL |
1017
- | `SERVICEBRIDGE_SERVICE_KEY` | yes | `sbv2.<id>.<secret>.<ca>` | Service authentication key (sbv2 only) |
1018
-
1019
- ```ts
1020
- const sb = new ServiceBridge(
1021
- process.env.SERVICEBRIDGE_URL ?? "localhost:14445",
1022
- process.env.SERVICEBRIDGE_SERVICE_KEY!,
1023
- );
1024
- ```
1025
-
1026
- ---
1027
-
1028
- ## Error Handling
1029
-
1030
- `ServiceBridgeError` is exported for normalized SDK and runtime errors.
1031
-
1032
- ```ts
1033
- import { ServiceBridge, ServiceBridgeError } from "service-bridge";
1034
-
1035
- try {
1036
- await sb.rpc.invoke("payment.charge", { orderId: "ord_1" });
1037
- } catch (e) {
1038
- if (e instanceof ServiceBridgeError) {
1039
- console.error(e.component, e.operation, e.severity, e.retryable, e.code, e.grpcStatus);
1040
- }
1041
- throw e;
1042
- }
1043
- ```
1044
-
1045
- | Field | Type | Description |
1046
- |---|---|---|
1047
- | `component` | `string` | SDK subsystem (for example, `"rpc"` or `"event"`). |
1048
- | `operation` | `string` | Operation that failed. |
1049
- | `severity` | `"fatal" \| "retriable" \| "ignorable"` | Error classification. |
1050
- | `retryable` | `boolean` | Whether retry is recommended (`true` when `severity === "retriable"`). |
1051
- | `code` | `ServiceBridgeErrorCode` | Stable SDK error id (`SB_*`). |
1052
- | `grpcStatus` | `number \| undefined` | gRPC status code when the error came from gRPC. |
1053
- | `cause` | `unknown` | Original underlying error. |
1054
-
1055
- ---
1056
-
1057
- ## When to Use / When Not to Use
1058
-
1059
- ### ServiceBridge is a good fit when you:
1060
-
1061
- - Have **3+ microservices** that need to communicate via RPC, events, or both
1062
- - Want **RPC + events + workflows + jobs** without managing separate infrastructure for each
1063
- - Need **end-to-end tracing** across all communication patterns in one timeline
1064
- - Want to **eliminate sidecar proxies** and reduce operational overhead
1065
- - Need **durable event delivery** with retry, DLQ, and replay without running a broker
1066
- - Are building **AI/LLM pipelines** and need realtime streaming with replay
1067
-
1068
- ### Consider alternatives when you:
1069
-
1070
- - Run a **single monolith** with no service decomposition plans
1071
- - Need **ultra-high-throughput event streaming** (100K+ msg/s sustained) — Kafka is purpose-built for this
1072
- - Need a **full API gateway** with rate limiting, auth plugins, and request transformation — use Kong/Envoy Gateway
1073
- - Already have a **mature Istio/Linkerd mesh** and only need traffic management (no events/workflows/jobs)
1074
- - Need **multi-region event replication** — ServiceBridge currently targets single-region deployments
1075
-
1076
- ---
1077
-
1078
- ## v2 Session API
1079
-
1080
- `session_v2.ts` implements the new Enterprise Session Protocol: a channel-based bidi stream with an 8-state FSM, adaptive heartbeat, and credit-based flow control. It mirrors the Go and Python SDKs.
1081
-
1082
- ### Session lifecycle (8-state FSM)
1083
-
1084
- ```
1085
- connecting → handshaking → ready ↔ active
1086
- ↘ suspended → (reconnect)
1087
- ↘ draining → closed
1088
- ↘ fenced (permanent)
1089
- ```
1090
-
1091
- | State | Description |
1092
- |-----------|----------|
1093
- | `connecting` | Establishing the TCP/TLS connection |
1094
- | `handshaking` | Hello sent, waiting for HelloAck |
1095
- | `ready` | HelloAck received, no commands running |
1096
- | `active` | Commands are in flight |
1097
- | `suspended` | Heartbeat missed 2 or more times |
1098
- | `draining` | Graceful shutdown initiated |
1099
- | `fenced` | Server sent GOAWAY_FENCED; the session is closed permanently |
1100
- | `closed` | Connection closed |
1101
-
1102
- ### Quick start
1103
-
1104
- ```typescript
1105
- import { V2SessionClient, validateV2Config } from 'service-bridge';
1106
-
1107
- const cfg = {
1108
- serverAddress: 'localhost:9090',
1109
- instanceId: 'worker-1',
1110
- zone: 'us-east-1a',
1111
- transportMode: 'direct' as const,
1112
- maxInflight: 64,
1113
- };
1114
-
1115
- validateV2Config(cfg);
1116
- const session = new V2SessionClient(cfg);
1117
-
1118
- // Send Hello on connect
1119
- const hello = session.getHelloFields();
1120
-
1121
- // Handle the HelloAck from the server
1122
- session.onHelloAck({
1123
- sessionId: 'sess-abc',
1124
- resumeToken: 'token-xyz',
1125
- epoch: 1n,
1126
- resumed: false,
1127
- resumeFromSeq: 0n,
1128
- replayedCommands: 0,
1129
- reconciledResults: 0,
1130
- heartbeatIntervalMs: 10_000,
1131
- heartbeatTimeoutMs: 30_000,
1132
- initialPermits: 64,
1133
- maxPermits: 128,
1134
- effectiveTransportMode: 'direct',
517
+ | `advertise` | `{ host, port } \| false` | `127.0.0.1` on a free port (with a warning) | Inbound RPC server address. Pass `{ host, port }` in containers / k8s; `false` for caller-only instances that never serve RPC. |
518
+ | `callDefaults` | `CallOpts` | `{}` | Default `CallOpts` merged under every `sb.rpc.call()` / `sb.stream()`. |
519
+ | `failOnPolicyViolation` | `boolean` | `false` | When `true`, any policy warning at registration makes `start()` surface a `disconnected` event and stop. Otherwise warnings are logged and emitted as `policy_violation`. |
520
+ | `telemetry` | `boolean` | `true` | Emit ops/logs/metrics to the runtime. `false` fully disables the telemetry transport. |
521
+ | `telemetryRingSize` | `number` | `262144` (256 KiB) | Byte budget for the in-memory ops ring buffer. |
522
+ | `dataDir` | `string` | `"./.servicebridge"` | Directory for the local SQLite event outbox. |
523
+ | `maxOutboxRows` | `number` | `100000` | Outbox rows before `publish` back-pressures with `OutboxFullError`. |
524
+ | `eventsDrainerBatch` | `number` | `50` | Outbox rows drained to the runtime per tick. |
525
+ | `eventsMaxInFlight` | `number` | `32` | Max concurrent inbound events processed by subscribers. |
526
+ | `payloadMaxBytes` | `number` | `65536` | Per-direction cap on captured payload bytes. |
527
+ | `reconnectIntervalMs` | `number` | `3000` | Delay between reconnect attempts. |
528
+ | `reconnectAttempts` | `number` | `3` | Reconnect attempts before giving up. `0` = unlimited. |
529
+
530
+ ```ts
531
+ const sb = new ServiceBridge("localhost:14445", KEY, {
532
+ advertise: { host: process.env.POD_IP!, port: 50051 },
533
+ callDefaults: { timeout: "10s" },
534
+ reconnectAttempts: 0,
535
+ dataDir: "/var/lib/myservice/sb",
1135
536
  });
1136
-
1137
- console.log(session.state); // 'ready'
1138
-
1139
- // Incoming command
1140
- const accepted = session.onCommandReceived(1n, 'cmd-001');
1141
- if (!accepted) {
1142
- // backpressure — permits = 0
1143
- }
1144
-
1145
- // Command completed
1146
- session.onCommandCompleted(1n, 'cmd-001');
1147
537
  ```
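
Because the SDK reads no environment variables itself, the mapping from environment to constructor arguments lives in your own code. The following is a minimal sketch of one way to do that. `settingsFromEnv` is a hypothetical helper, and the variable names `SERVICEBRIDGE_URL`, `SERVICEBRIDGE_SERVICE_KEY`, `POD_IP`, `SB_RPC_PORT` and `SB_DATA_DIR` are conventions picked for this example, not names the SDK looks up.

```typescript
// Hypothetical helper, not part of service-bridge: turns an env-like map
// into the (url, key, options) triple that the constructor takes.
type Env = Record<string, string | undefined>;

interface BridgeSettings {
  url: string;
  key: string;
  options: {
    advertise?: { host: string; port: number };
    dataDir?: string;
  };
}

function settingsFromEnv(env: Env): BridgeSettings {
  const key = env.SERVICEBRIDGE_SERVICE_KEY;
  if (!key) throw new Error("SERVICEBRIDGE_SERVICE_KEY is required");

  const options: BridgeSettings["options"] = {};

  // Only advertise an explicit address when the platform provides one.
  if (env.POD_IP) {
    const port = Number(env.SB_RPC_PORT ?? "50051");
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      throw new Error(`invalid SB_RPC_PORT: ${env.SB_RPC_PORT}`);
    }
    options.advertise = { host: env.POD_IP, port };
  }

  if (env.SB_DATA_DIR) options.dataDir = env.SB_DATA_DIR;

  return { url: env.SERVICEBRIDGE_URL ?? "localhost:14445", key, options };
}
```

In a service you would call `settingsFromEnv(process.env)` and pass the resulting `url`, `key` and `options` to `new ServiceBridge(...)`. Keeping the helper pure makes the mapping easy to unit-test without a running runtime.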
1148
538
 
1149
- ### Adaptive heartbeat (EWMA RTT)
539
+ ### Lifecycle
1150
540
 
1151
- ```typescript
1152
- import { AdaptiveHeartbeatV2 } from 'service-bridge';
1153
-
1154
- const hb = new AdaptiveHeartbeatV2(10_000, 30_000);
1155
-
1156
- // Pong received
1157
- hb.onPong(25); // rttMs
1158
-
1159
- // Next interval (adapts to the EWMA RTT)
1160
- const nextMs = hb.nextIntervalMs();
1161
-
1162
- // Missed heartbeat: ping faster
1163
- const missCount = hb.onMiss();
1164
- if (missCount >= 2) {
1165
- // reconnect
1166
- }
1167
- ```
1168
-
1169
- Algorithm: the base interval is `intervalMs / 3`; on misses it is divided by `2^miss` (min 2s); with a stable RTT < 50ms it doubles (max 30s).
1170
-
1171
- ### Credit-based flow control
1172
-
1173
- ```typescript
1174
- import { FlowControlStateV2 } from 'service-bridge';
541
+ ```ts
542
+ const sb = new ServiceBridge("localhost:14445", KEY);
1175
543
 
1176
- const fc = new FlowControlStateV2(64, 1, 128);
544
+ sb.service("payment-svc", { rpc: ["Charge"] }); // what you call
545
+ sb.rpc.handle("Ship", shipHandler, { schema: { protoFile: "./ship.proto" } }); // what you serve
1177
546
 
1178
- if (fc.tryConsume()) {
1179
- // dispatch command
1180
- }
547
+ sb.on("connected", ({ serviceName }) => console.log(`connected as ${serviceName}`));
548
+ sb.on("reconnecting", ({ attempt, reason }) => console.warn(`reconnecting #${attempt}: ${reason}`));
549
+ sb.on("disconnected", ({ reason }) => console.error(`disconnected: ${reason}`));
550
+ sb.on("policy_violation", (v) => console.warn(`policy: ${v.declaration} ${v.value} — ${v.reason}`));
1181
551
 
1182
- // Command finished: return the permit
1183
- fc.release(1);
552
+ await sb.start();
1184
553
 
1185
- // Server sent a FlowControlUpdate
1186
- fc.setWindow(32);
554
+ process.on("SIGTERM", async () => { await sb.stop(); process.exit(0); });
1187
555
  ```
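
The `reconnecting` and `disconnected` events pair with the `reconnectIntervalMs` and `reconnectAttempts` options. The sketch below is a synchronous simulation of that attempt budget, written only to make the "`0` = unlimited" rule concrete. `simulateReconnect`, its arguments, and the choice to wait before every attempt are assumptions of this example, not the SDK's internal loop.

```typescript
interface ReconnectOpts {
  reconnectIntervalMs: number; // delay between attempts
  reconnectAttempts: number;   // 0 means unlimited
}

interface ReconnectResult {
  connected: boolean;
  attempts: number;
  waitedMs: number;
}

// tryConnect is a stand-in for a real connection attempt.
// safetyCap bounds the simulation when attempts are unlimited.
function simulateReconnect(
  opts: ReconnectOpts,
  tryConnect: (attempt: number) => boolean,
  safetyCap = 1000,
): ReconnectResult {
  let attempts = 0;
  let waitedMs = 0;
  const unlimited = opts.reconnectAttempts === 0;

  while ((unlimited || attempts < opts.reconnectAttempts) && attempts < safetyCap) {
    attempts += 1;
    waitedMs += opts.reconnectIntervalMs; // assumed order: wait, then try
    if (tryConnect(attempts)) return { connected: true, attempts, waitedMs };
  }

  // Budget exhausted: this is where a `disconnected` event would be expected.
  return { connected: false, attempts, waitedMs };
}
```

With the documented defaults (`3000` ms, `3` attempts) and a runtime that never comes back, the simulation gives up after three tries. With `reconnectAttempts: 0` it keeps going until `tryConnect` succeeds or the simulation's own cap is reached.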
1188
556
 
1189
- ### Reconnect and resume
1190
-
1191
- `BackoffV2` implements exponential backoff with full jitter (base=100ms, max=30s). On reconnect, `getHelloFields()` automatically includes `resumeToken`, `epoch`, `lastReceivedSeq`, `lastSentSeq`, and `completedCommandIds`, so the server resumes the session from the right position.
1192
-
1193
- ```typescript
1194
- import { BackoffV2 } from 'service-bridge';
557
+ ---
1195
558
 
1196
- const backoff = new BackoffV2();
559
+ ## Error handling
1197
560
 
1198
- while (true) {
1199
- if (backoff.isCircuitOpen()) break; // 10+ consecutive failures
561
+ Typed errors are exported from the package root, so you can `catch` precisely:
1200
562
 
1201
- const delayMs = backoff.next();
1202
- await new Promise(r => setTimeout(r, delayMs));
563
+ ```ts
564
+ import {
565
+ RpcAccessDeniedError,
566
+ WorkflowAccessDeniedError,
567
+ InvalidEventNameError,
568
+ OutboxFullError,
569
+ ServiceBridgeError,
570
+ } from "service-bridge";
1203
571
 
1204
- try {
1205
- // reconnect...
1206
- backoff.reset();
1207
- } catch {
1208
- backoff.recordFail();
572
+ try {
573
+ await payment.Charge({ userId: "u-1", amount: 100 });
574
+ } catch (err) {
575
+ if (err instanceof RpcAccessDeniedError) {
576
+ // denied by access policy: { serviceName, methodName, reason }
577
+ } else if (err instanceof ServiceBridgeError) {
578
+ // connection / provisioning failure with a typed .code
1209
579
  }
1210
580
  }
1211
581
  ```
1212
582
 
1213
- ### ConfigPush: dynamic transport configuration
1214
-
1215
- The server can send a `ConfigPush` with new routing rules at any time:
1216
-
1217
- ```typescript
1218
- session.onConfigPush({
1219
- defaultMode: 'direct',
1220
- serviceOverrides: {
1221
- 'payment-svc': { mode: 'proxy', fallbackPolicy: 'fallback_to_direct' },
1222
- },
1223
- functionOverrides: {
1224
- 'payment.charge': { mode: 'proxy', timeoutMs: 5000 },
1225
- },
1226
- });
1227
-
1228
- // Resolve the transport mode for a function
1229
- const mode = session.resolveTransportMode('payment.charge'); // 'proxy'
1230
- ```
1231
-
1232
- ### All session events
1233
-
1234
- | Method | Description |
1235
- |-------|----------|
1236
- | `getHelloFields()` | Fields for the Hello message (initial and resume) |
1237
- | `onHelloAck(ack)` | Handles the HelloAck from the server |
1238
- | `onCommandReceived(seq, id)` | Incoming command; returns `false` under backpressure |
1239
- | `onCommandCompleted(seq, id)` | Command completed; releases a permit |
1240
- | `onPermitGrant(n)` | Server granted `n` permits |
1241
- | `onFlowControlUpdate(size, reason)` | Server changed the window size |
1242
- | `onPong(rttMs)` | Pong received; updates the EWMA |
1243
- | `onHeartbeatMiss()` | Pong timeout; returns `true` → `suspended` |
1244
- | `onDrain(reason, deadlineMs)` | Initiates a graceful drain |
1245
- | `onGoaway(code, reason)` | GoawaySignal from the server |
1246
- | `onConfigPush(config)` | Applies a new transport configuration |
1247
- | `resolveTransportMode(fnName)` | Returns the transport mode for a function |
1248
- | `stop()` | Closes the session immediately |
1249
-
1250
- ### Exported classes and types
1251
-
1252
- | Symbol | Kind | Description |
1253
- |--------|-----|----------|
1254
- | `V2SessionClient` | class | Main session client |
1255
- | `AdaptiveHeartbeatV2` | class | EWMA RTT heartbeat controller |
1256
- | `FlowControlStateV2` | class | Credit-based flow control |
1257
- | `BackoffV2` | class | Exponential backoff + circuit |
1258
- | `PositionTrackerV2` | class | Tracker for seq/completed IDs |
1259
- | `ConfigPushStateV2` | class | Dynamic configuration manager |
1260
- | `validateV2Config` | function | Validates the config; throws `Error` |
1261
- | `V2Config` | interface | Session configuration |
1262
- | `SessionStateV2` | type | Union of the 8 FSM states |
1263
- | `TransportMode` | type | `'direct' \| 'proxy'` |
1264
- | `HelloAckV2` | interface | HelloAck data from the server |
1265
- | `TransportConfigV2` | interface | ConfigPush payload |
1266
- | `ReconcileRequestV2` | interface | Declarative worker registration request |
1267
- | `FunctionDeclarationV2` | interface | Function declaration for Reconcile |
1268
- | `ConsumerGroupDeclarationV2` | interface | Consumer group declaration |
1269
- | `HttpRouteDeclarationV2` | interface | HTTP route declaration |
1270
- | `JobDeclarationV2` | interface | Job declaration |
1271
- | `WorkflowDeclarationV2` | interface | Workflow declaration |
1272
- | `SubscribeRequestV2` | interface | Registry subscribe request |
1273
- | `WorkerEndpointV2` | interface | Worker endpoint info |
1274
- | `IssueCertificateRequestV2` | interface | Certificate request |
1275
- | `IssueCertificateResponseV2` | interface | Certificate response |
1276
- | `CircuitBreakerConfigV2` | interface | Circuit breaker config |
1277
- | `ZoneConfigV2` | interface | Zone-aware config |
1278
- | `ServiceTransportOverride` | interface | Per-service transport override |
1279
- | `FunctionTransportOverride` | interface | Per-function transport override |
1280
- | `ResumeState` | interface | Reconnect resume state |
1281
-
1282
- From the main entry `service-bridge`, types such as `ServiceBridgeOpts`, `RpcOpts`, `EventOpts`, `HandleRpcOpts`, `HandleEventOpts`, `ScheduleOpts`, `StartOpts`, `ExecuteWorkflowOpts`, and `ExecuteWorkflowResult` are available. The DAG shapes **`WorkflowStep` and `WorkflowOpts` are documented above but are not named exports** from that entry — use inline object literals (inference from `workflows.run(...)`) unless your toolchain exposes deep paths. Example:
1283
-
1284
- ```ts
1285
- import type {
1286
- RpcContext,
1287
- EventContext,
1288
- StreamWriter,
1289
- TraceCtx,
1290
- RetryPolicy,
1291
- ServiceBridgeErrorSeverity,
1292
- } from "service-bridge";
1293
- ```
583
+ | Error | Thrown when |
584
+ |---|---|
585
+ | `RpcAccessDeniedError` | An RPC call is denied by access policy. Also fires a `policy_violation` event. |
586
+ | `WorkflowAccessDeniedError` | A workflow `start()` is denied by access policy. |
587
+ | `WorkflowNotFoundError` | Starting a workflow name the runtime doesn't know. |
588
+ | `WorkflowTerminalError` | Signalling/cancelling a run that already finished. |
589
+ | `InvalidEventNameError` | Publishing/defining an event whose name fails the naming rule. |
590
+ | `OutboxFullError` | The local event outbox is at `maxOutboxRows` (back-pressure). |
591
+ | `ServiceBridgeError` | Connection / provisioning failures; carries a typed `.code` (retryable ones drive auto-reconnect). |
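
`OutboxFullError` is a back-pressure signal rather than a fatal failure: the outbox has reached `maxOutboxRows` and has to drain before it accepts more rows. A minimal in-memory model of that behaviour is sketched below. `BoundedOutbox` and the local `DemoOutboxFullError` stand-in are hypothetical names for this example; the real outbox is SQLite-backed and its internals are not described in this README.

```typescript
// Local stand-in so the sketch runs without the SDK installed.
class DemoOutboxFullError extends Error {
  constructor(maxRows: number) {
    super(`outbox is full (${maxRows} rows)`);
    this.name = "DemoOutboxFullError";
    // Keeps `instanceof` working when compiled to older JS targets.
    Object.setPrototypeOf(this, DemoOutboxFullError.prototype);
  }
}

class BoundedOutbox<T> {
  private rows: T[] = [];
  private readonly maxRows: number;

  constructor(maxRows: number) {
    this.maxRows = maxRows;
  }

  // Models `publish`: rejects once the row budget is used up.
  enqueue(row: T): void {
    if (this.rows.length >= this.maxRows) {
      throw new DemoOutboxFullError(this.maxRows);
    }
    this.rows.push(row);
  }

  // Models one drainer tick: hands over at most `batch` rows, oldest first.
  drain(batch: number): T[] {
    return this.rows.splice(0, batch);
  }

  get size(): number {
    return this.rows.length;
  }
}
```

A caller that catches the full-outbox error would typically slow down or shed load and try again after a drain tick has freed space, instead of treating it like a connection failure.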
1294
592
 
1295
593
  ---
1296
594
 
1297
595
  ## FAQ
1298
596
 
1299
- **How does ServiceBridge handle service failures?**
1300
- RPC calls have configurable retries with exponential backoff and hard per-attempt timeouts, so a silent downstream service cannot keep a call pending forever. Events are durable (PostgreSQL-backed) with at-least-once delivery per consumer group. Failed deliveries are retried according to policy, then moved to DLQ. Workflows track step state and can be resumed.
597
+ **Do I have to use Protobuf?** You point handlers at a `.proto` file or a `.schema.json` with explicit field numbers. Both are file-based; there is no inline schema.
1301
598
 
1302
- **Is there vendor lock-in?**
1303
- ServiceBridge is self-hosted. The runtime is a single Go binary + PostgreSQL. SDK calls map to standard patterns (RPC, pub/sub, cron) — migrating away means replacing SDK calls with equivalent library calls.
599
+ **Does ServiceBridge proxy my HTTP traffic?** No. You run your own Express / Fastify / Hono server. The integration only discovers your routes for the Service Map and adds trace spans — your HTTP path is untouched.
1304
600
 
1305
- **How does tracing work without an OTEL collector?**
1306
- The SDK automatically reports trace spans for every RPC call, event publish/delivery, workflow step, and HTTP request. The runtime stores traces in PostgreSQL and serves them via the built-in dashboard and a Loki-compatible API for Grafana integration.
601
+ **How do I scale horizontally?** Run as many SDK instances as you like; the runtime load-balances RPC across live instances and fails over automatically. The runtime itself is a single source of truth backed by PostgreSQL.
1307
602
 
1308
- **Can I use ServiceBridge alongside existing infrastructure?**
1309
- Yes. You can adopt incrementally — start with RPC between two services, add events later, then workflows. ServiceBridge doesn't require replacing your existing broker or mesh all at once.
603
+ **What happens on a transient disconnect?** Published events sit in the local SQLite outbox and drain when the connection returns. The SDK auto-reconnects (configurable) and rotates certs with overlap so live instances don't drop traffic.
1310
604
 
1311
- **What happens when the control plane is down?**
1312
- In-flight direct RPC calls continue working (they go service-to-service, not through the control plane). New discovery lookups, event publishes, and telemetry writes are queued in the SDK offline queue and flushed when the control plane recovers.
605
+ **Where do I see traces, metrics and the DLQ?** In the runtime dashboard on `:14444`. Tracing, metrics and the dead-letter queue are operated there.
1313
606
 
1314
- **What databases does the runtime support?**
1315
- PostgreSQL 16+. The runtime uses PostgreSQL for all persistence: traces, events, workflows, jobs, service registry, and configuration.
607
+ **Node or Bun?** Both. Node 18+ or any current Bun. Bun-native APIs are used where available.
1316
608
 
1317
609
  ---
1318
610
 
1319
- ## Community and Support
611
+ ## Community
1320
612
 
1321
- - Website: [servicebridge.dev](https://servicebridge.dev)
1322
- - GitHub: [github.com/service-bridge](https://github.com/service-bridge)
1323
- - SDK monorepo: [README.md](../README.md)
613
+ - **Website & docs:** [servicebridge.dev](https://servicebridge.dev) · [servicebridge.dev/docs](https://servicebridge.dev/docs)
614
+ - **SDK umbrella repo (all languages):** [github.com/service-bridge/sdk](https://github.com/service-bridge/sdk)
615
+ - **Runtime:** [github.com/servicebridge2/runtime](https://github.com/servicebridge2/runtime)
1324
616
 
1325
- ---
1326
-
1327
- ## License
1328
-
1329
- Free for non-commercial use. Commercial use requires a separate license. See [LICENSE](../LICENSE).
1330
-
1331
- Copyright (c) 2026 Eugene Surkov.
617
+ This is an alpha release (`2.0.0-alpha`). The API is stabilising — issues and feedback are welcome.
1332
618
 
1333
619
  ---
1334
620
 
1335
- ## Keywords
621
+ ## License
1336
622
 
1337
- service-bridge · servicebridge · npm install service-bridge · npm i service-bridge · bun add service-bridge · Node.js SDK · TypeScript SDK · JavaScript microservices · RPC · gRPC · event bus · event-driven · distributed tracing · workflow orchestration · background jobs · cron · mTLS · service mesh · service discovery · zero sidecar · Istio alternative · Envoy alternative · RabbitMQ alternative · Temporal alternative · Jaeger alternative · PostgreSQL · Docker · Kubernetes · DLQ · dead letter queue · saga · distributed transactions · AI agent orchestration · Express middleware · Fastify middleware · HTTP middleware · observability · Prometheus · tracing · service catalog · durable events · retries · idempotency · auto mTLS · runtime dashboard · production ready · microservice communication
623
+ Licensed under the **MIT License**; see [LICENSE](./LICENSE). Free for any use, including commercial; you only need to keep the copyright and license notice (attribution to esurkov1 <esurkovv@yandex.ru>).