@net-mesh/sdk 0.19.0

package/README.md ADDED
@@ -0,0 +1,1684 @@
1
+ # Net TypeScript SDK
2
+
3
+ Ergonomic TypeScript SDK for the Net mesh network.
4
+
5
+ Wraps the `@net-mesh/core` NAPI bindings with streaming, typed channels, and a developer-friendly API.
6
+
7
+ ## Install
8
+
9
+ ```bash
10
+ npm install @net-mesh/sdk @net-mesh/core
11
+ ```
12
+
13
+ ## Cargo features (transitive)
14
+
15
+ `@net-mesh/sdk` is pure TypeScript; every wrapper class dispatches into the underlying `@net-mesh/core` napi-rs binding. Published `.node` artifacts ship with every feature enabled, but anyone building from source via `napi build` must pass the feature flags explicitly: symbols from a disabled feature are absent at runtime, so the TypeScript wrapper's `import` of them fails with `undefined`.
16
+
17
+ | Cargo feature | sdk-ts wrapper module | Surface |
18
+ |---|---|---|
19
+ | `cortex` | `@net-mesh/sdk/cortex` (also re-exported top-level) | `Redex`, `RedexFile`, `TasksAdapter`, `MemoriesAdapter`, `NetDb`, error types |
20
+ | `meshdb` | `@net-mesh/sdk/meshdb` | `MeshQuery`, `MeshQueryRunner`, `MeshQueryStream`, `QueryBuilder`, `InMemoryChainReader`, result + config types |
21
+ | `meshos` | `@net-mesh/sdk/meshos` | `MeshOsDaemonSdk`, `MeshOsDaemonHandle`, `MeshOsDaemon` interface, `DaemonHealth`, `CapabilityAdvert` |
22
+ | `compute` | `@net-mesh/sdk/compute` | `DaemonRuntime`, `DaemonHandle`, `MigrationHandle`, daemon trait shapes |
23
+ | `groups` | `@net-mesh/sdk/groups` | `ReplicaGroup`, `ForkGroup`, `StandbyGroup`, group config types |
24
+ | `deck` | `@net-mesh/sdk/deck` | `DeckClient`, `OperatorIdentity`, admin / snapshot / status streams, ICE break-glass |
25
+ | `redis` | `@net-mesh/sdk` top-level | `RedisStreamDedup` |
26
+ | `net` | `@net-mesh/sdk/mesh` | `MeshNode`, `NetStream`, channel auth |
27
+
28
+ The bus surface (`NetNode`, `EventStream`, capabilities, identity, predicates) is always present.
29
+
30
+ The `default` Cargo feature set enables every flag, so `npm install` users get full functionality. If you're building from source for an embedded target, slim the feature set in `bindings/node/Cargo.toml` and rebuild via `npm run build:debug` (or `build` for release).
31
+
32
+ ## Quick Start
33
+
34
+ ```typescript
35
+ import { NetNode } from '@net-mesh/sdk';
36
+
37
+ const node = await NetNode.create({ shards: 4 });
38
+
39
+ // Emit events
40
+ node.emit({ token: 'hello', index: 0 });
41
+ node.emitRaw('{"token": "world"}');
42
+ node.emitBuffer(Buffer.from('{"token": "foo"}'));
43
+
44
+ // Batch
45
+ node.emitBatch([{ a: 1 }, { a: 2 }, { a: 3 }]);
46
+
47
+ await node.flush();
48
+
49
+ // Poll
50
+ const response = await node.poll({ limit: 100 });
51
+ for (const event of response.events) {
52
+ console.log(event.raw);
53
+ }
54
+
55
+ // Stream (async iterator)
56
+ for await (const event of node.subscribe({ limit: 100 })) {
57
+ console.log(event.raw);
58
+ }
59
+
60
+ await node.shutdown();
61
+ ```
62
+
63
+ ## Typed Streams
64
+
65
+ ```typescript
66
+ interface TokenEvent {
67
+ token: string;
68
+ index: number;
69
+ }
70
+
71
+ for await (const token of node.subscribeTyped<TokenEvent>({ limit: 100 })) {
72
+ console.log(`${token.index}: ${token.token}`);
73
+ }
74
+ ```
75
+
76
+ ## Typed Channels
77
+
78
+ ```typescript
79
+ interface TemperatureReading {
80
+ sensor_id: string;
81
+ celsius: number;
82
+ timestamp: number;
83
+ }
84
+
85
+ const temps = node.channel<TemperatureReading>('sensors/temperature');
86
+
87
+ // Publish
88
+ temps.publish({ sensor_id: 'A1', celsius: 22.5, timestamp: Date.now() });
89
+
90
+ // Subscribe
91
+ for await (const reading of temps.subscribe()) {
92
+ console.log(`${reading.sensor_id}: ${reading.celsius}°C`);
93
+ }
94
+ ```
95
+
96
+ ## Ingestion Methods
97
+
98
+ | Method | Input | Speed | Returns |
99
+ |--------|-------|-------|---------|
100
+ | `emit(obj)` | Object | Fast | `Receipt` |
101
+ | `emitRaw(json)` | String | Fast | `Receipt` |
102
+ | `emitBuffer(buf)` | Buffer | Fastest | `boolean` |
103
+ | `emitBatch(objs)` | Object[] | Bulk | `number` |
104
+ | `emitRawBatch(jsons)` | String[] | Bulk | `number` |
105
+ | `fire(json)` | String | Fire-and-forget | `boolean` |
106
+ | `fireBatch(jsons)` | String[] | Fire-and-forget | `number` |
107
+
108
+ ## Transports
109
+
110
+ ```typescript
111
+ // In-memory (default)
112
+ await NetNode.create({ shards: 4 });
113
+
114
+ // Redis
115
+ await NetNode.create({ transport: { type: 'redis', url: 'redis://localhost:6379' } });
116
+
117
+ // JetStream
118
+ await NetNode.create({ transport: { type: 'jetstream', url: 'nats://localhost:4222' } });
119
+
120
+ // Encrypted mesh
121
+ await NetNode.create({
122
+ transport: {
123
+ type: 'mesh',
124
+ bind: '0.0.0.0:9000',
125
+ peer: '192.168.1.10:9001',
126
+ psk: '...',
127
+ peerPublicKey: '...',
128
+ },
129
+ });
130
+ ```
131
+
132
+ ### Persistent producer nonce (cross-restart dedup)
133
+
134
+ JetStream and Redis adapters key dedup on `(producer_nonce, shard,
135
+ sequence_start, i)`. Without persistence the nonce is fresh per
136
+ process — a producer that crashes mid-batch and restarts gets a
137
+ new nonce, retransmits look fresh, and the backend persists the
138
+ partial half twice. Configure
139
+ `producerNoncePath` to make the nonce durable:
140
+
141
+ ```typescript
142
+ await NetNode.create({
143
+ shards: 4,
144
+ transport: { type: 'redis', url: 'redis://localhost:6379' },
145
+ producerNoncePath: '/var/lib/myapp/producer.nonce',
146
+ });
147
+ ```
148
+
149
+ The bus loads (or creates on first run) a u64 nonce at this
150
+ path. JetStream gets cross-restart dedup automatically;
151
+ Redis Streams ships the same id as a `dedup_id` field on every
152
+ XADD, filterable via the helper below.
153
+
154
+ ## Redis Streams consumer-side dedup helper
155
+
156
+ The Redis adapter writes a stable `dedup_id` field on every XADD
157
+ entry (`{producer_nonce:hex}:{shard_id}:{sequence_start}:{i}`).
158
+ Combined with `producerNoncePath` above, the id is stable across
159
+ both retries and process restarts, so duplicates caused by the
160
+ `MULTI/EXEC` timeout race can be filtered on the consumer side.
161
+
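As a rough illustration of the id layout described above, the sketch below splits a `dedup_id` string into its four parts. `parseDedupId` and `DedupIdParts` are hypothetical names, not part of `@net-mesh/sdk` or `@net-mesh/core`. The sketch assumes the nonce is plain hex without a `0x` prefix and that the other fields are decimal integers.

```typescript
// Hypothetical helper for illustration only; not an SDK export.
interface DedupIdParts {
  producerNonceHex: string; // kept as text: a u64 does not fit a JS number
  shardId: number;
  sequenceStart: string;    // kept as text (assumed it may exceed 2^53)
  indexInBatch: number;
}

// Splits `{producer_nonce:hex}:{shard_id}:{sequence_start}:{i}`.
// Returns null when the string does not match the assumed layout.
function parseDedupId(dedupId: string): DedupIdParts | null {
  const parts = dedupId.split(':');
  if (parts.length !== 4) return null;
  const [nonceHex, shard, seqStart, idx] = parts;
  if (!/^[0-9a-fA-F]+$/.test(nonceHex)) return null;
  if (!/^\d+$/.test(shard) || !/^\d+$/.test(seqStart) || !/^\d+$/.test(idx)) {
    return null;
  }
  return {
    producerNonceHex: nonceHex.toLowerCase(),
    shardId: Number(shard),
    sequenceStart: seqStart,
    indexInBatch: Number(idx),
  };
}
```

Treating the id as an opaque string is enough for dedup; parsing is only useful for diagnostics, such as grouping duplicates by producer.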
162
+ `RedisStreamDedup` is exposed on the underlying `@net-mesh/core`
163
+ NAPI module:
164
+
165
+ ```typescript
166
+ import { RedisStreamDedup } from '@net-mesh/core';
167
+ import { createClient } from 'redis';
168
+
169
+ // Sizing: ~10k events/sec * 1 min dedup window → ~600,000.
170
+ const dedup = new RedisStreamDedup(600_000);
171
+
172
+ const r = createClient();
173
+ await r.connect();
174
+
175
+ let cursor = '0';
176
+ while (true) {
177
+ // XRANGE bounds are INCLUSIVE on both ends. After the first
178
+ // page we must use the exclusive form `(<id>` (Redis 6.2+) so we don't
179
+ // re-read the entry the cursor points at — a vanilla
180
+ // `xRange(stream, cursor, '+')` loop spins forever once the
181
+ // cursor reaches the tail and the same entry is returned every
182
+ // iteration.
183
+ const start = cursor === '0' ? cursor : `(${cursor}`;
184
+ const entries = await r.xRange('net:shard:0', start, '+', { COUNT: 100 });
185
+ if (entries.length === 0) break;
186
+ for (const entry of entries) {
187
+ cursor = entry.id; // advance before any `continue`, or the page is re-read
188
+ const dedupId = entry.message.dedup_id;
189
+ if (!dedupId) {
190
+ // Older entries / non-Net producers: skip dedup.
191
+ await process(entry);
192
+ continue;
193
+ }
194
+ if (!dedup.isDuplicate(dedupId)) {
195
+ await process(entry);
196
+ }
197
+ }
198
+ }
199
+ ```
200
+
201
+ Surface (NAPI class):
202
+
203
+ ```typescript
204
+ new RedisStreamDedup(capacity?: number) // defaults to 4096
205
+ dedup.isDuplicate(id: string): boolean
206
+ dedup.len: number // readonly
207
+ dedup.capacity: number // readonly
208
+ dedup.isEmpty: boolean // readonly
209
+ dedup.clear(): void
210
+ ```
211
+
212
+ The helper is transport-agnostic — bring your own `redis` /
213
+ `ioredis` / equivalent client; it just answers the dedup
214
+ question against an in-memory LRU. Concurrency: the underlying
215
+ handle wraps a Rust mutex, so concurrent calls from worker
216
+ threads serialize but are safe. The recommended production shape is one
217
+ helper per consumer worker.
218
+
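To make the "in-memory LRU" behaviour concrete, here is a minimal pure-TypeScript sketch of a bounded dedup set. It is not the Rust implementation behind `RedisStreamDedup`. The class name `LruDedupSketch` is hypothetical, and the policy (first sighting records the id and returns `false`, repeat sightings refresh recency, least recently seen id is evicted at capacity) is an assumption about the semantics.

```typescript
// Illustrative only; the real helper lives in @net-mesh/core.
class LruDedupSketch {
  // Map preserves insertion order, so the first key is the least recently seen.
  private readonly seen = new Map<string, true>();
  private readonly maxEntries: number;

  constructor(maxEntries: number = 4096) {
    if (maxEntries <= 0) throw new RangeError('maxEntries must be positive');
    this.maxEntries = maxEntries;
  }

  // Returns true when `id` was already recorded; otherwise records it.
  isDuplicate(id: string): boolean {
    if (this.seen.has(id)) {
      // Re-insert to mark the id as most recently seen.
      this.seen.delete(id);
      this.seen.set(id, true);
      return true;
    }
    this.seen.set(id, true);
    if (this.seen.size > this.maxEntries) {
      const oldest = this.seen.keys().next().value;
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    return false;
  }

  get len(): number {
    return this.seen.size;
  }
}
```

Under this policy an id can be reported as new again once it has been evicted, which is why the capacity should cover the full dedup window (event rate multiplied by window length).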
219
+ ## NAT Traversal (optimization, not correctness)
220
+
221
+ Two NATed peers already reach each other through the mesh's routed-handshake path. NAT traversal opens a shorter direct path when the NAT shape allows it; it's never required for connectivity. The TS SDK doesn't yet wrap this surface — it's a planned follow-up. For now, construct a `NetMesh` from `@net-mesh/core` directly to access the NAPI methods:
222
+
223
+ ```ts
224
+ import { NetMesh } from '@net-mesh/core';
225
+
226
+ const mesh = await NetMesh.create({
227
+ bindAddr: '0.0.0.0:9000',
228
+ psk: '00'.repeat(32),
229
+ });
230
+
231
+ await mesh.reclassifyNat();
232
+
233
+ const klass = mesh.natType(); // "open" | "cone" | "symmetric" | "unknown"
234
+ const reflex = mesh.reflexAddr(); // "203.0.113.5:9001" | null
235
+
236
+ const observed = await mesh.probeReflex(peerNodeId); // "ip:port"
237
+
238
+ // Attempt a direct connection via the pair-type matrix.
239
+ // `coordinator` mediates the punch when the matrix picks one.
240
+ // Always resolves — stats tell you which path won.
241
+ await mesh.connectDirect(peerNodeId, peerPubkeyHex, coordinatorNodeId);
242
+
243
+ // Cumulative counters — all BigInt, monotonic.
244
+ const s = mesh.traversalStats();
245
+ s.punchesAttempted; // coordinator mediated a PunchRequest + Introduce
246
+ s.punchesSucceeded; // ack arrived AND direct handshake landed
247
+ s.relayFallbacks; // landed on the routed path after skip/fail
248
+ ```
249
+
250
+ Operators with a known-public address skip the classifier sweep entirely. The override pins `"open"` plus the supplied address on every capability announcement; call `announceCapabilities()` afterwards to propagate it (the setter resets the rate-limit floor, so the next announce is guaranteed to broadcast).
251
+
252
+ ```ts
253
+ mesh.setReflexOverride('203.0.113.5:9001');
254
+ await mesh.announceCapabilities(/* caps */);
255
+ // later:
256
+ mesh.clearReflexOverride();
257
+ await mesh.announceCapabilities(/* caps */);
258
+ ```
259
+
260
+ Traversal failures surface as `Error` instances whose `message` follows the stable `traversal: <kind>[: <detail>]` convention. The `<kind>` discriminator is one of `reflex-timeout` | `peer-not-reachable` | `transport` | `rendezvous-no-relay` | `rendezvous-rejected` | `punch-failed` | `port-map-unavailable` | `unsupported`. Match on the prefix:
261
+
262
+ ```ts
263
+ try {
264
+ await mesh.connectDirect(peerNodeId, peerPubkeyHex, coordId);
265
+ } catch (e) {
266
+ const msg = (e as Error).message;
267
+ if (msg.startsWith('traversal: unsupported')) {
268
+ // native library built without --features nat-traversal
269
+ } else if (msg.startsWith('traversal: peer-not-reachable')) {
270
+ // ...
271
+ }
272
+ }
273
+ ```
274
+
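Instead of chaining `startsWith` checks, the `traversal: <kind>[: <detail>]` convention can be parsed once into a typed value. The sketch below is one way to do that. `parseTraversalError`, `TraversalKind`, and `ParsedTraversalError` are hypothetical names rather than SDK exports, and the sketch assumes the detail is separated from the kind by `": "`.

```typescript
// Kinds copied from the list in the paragraph above.
const TRAVERSAL_KINDS = [
  'reflex-timeout',
  'peer-not-reachable',
  'transport',
  'rendezvous-no-relay',
  'rendezvous-rejected',
  'punch-failed',
  'port-map-unavailable',
  'unsupported',
] as const;

type TraversalKind = (typeof TRAVERSAL_KINDS)[number];

interface ParsedTraversalError {
  kind: TraversalKind;
  detail?: string;
}

// Returns null for messages that do not follow the traversal convention.
function parseTraversalError(message: string): ParsedTraversalError | null {
  const prefix = 'traversal: ';
  if (!message.startsWith(prefix)) return null;
  const rest = message.slice(prefix.length);
  const sep = rest.indexOf(': ');
  const kindText = sep === -1 ? rest : rest.slice(0, sep);
  const kind = TRAVERSAL_KINDS.find((k) => k === kindText);
  if (kind === undefined) return null;
  return sep === -1 ? { kind } : { kind, detail: rest.slice(sep + 2) };
}
```

In a `catch` block this would be called as `parseTraversalError((e as Error).message)`, followed by a `switch` on `kind`.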
275
+ A build without the `nat-traversal` feature raises `traversal: unsupported` for every NAT call — the routed path keeps working regardless. The NAPI type declarations for these methods are only generated when the build-time type-gen runs against a build *with* the feature, so a feature-off cdylib may require an `as any` cast or a local `.d.ts` augmentation.
276
+
277
+ ## Mesh Streams (multi-peer + back-pressure)
278
+
279
+ For direct peer-to-peer messaging — open a stream to a specific peer
280
+ and react to back-pressure with first-class error classes:
281
+
282
+ ```typescript
283
+ import { MeshNode, BackpressureError, NotConnectedError } from '@net-mesh/sdk';
284
+
285
+ const node = await MeshNode.create({
286
+ bindAddr: '127.0.0.1:9000',
287
+ psk: '0'.repeat(64),
288
+ });
289
+ // ... handshake (node.connect(...) / node.accept(...)) ...
290
+
291
+ const stream = node.openStream(peerNodeId, {
292
+ streamId: 0x42n,
293
+ reliability: 'reliable',
294
+ windowBytes: 256, // max in-flight packets before BackpressureError
295
+ });
296
+
297
+ // Three canonical daemon patterns:
298
+
299
+ // 1. Drop on pressure.
300
+ try {
301
+ await node.sendOnStream(stream, [Buffer.from('{}')]);
302
+ } catch (e) {
303
+ if (e instanceof BackpressureError) {
304
+ metrics.inc('stream.backpressure_drops');
305
+ } else if (e instanceof NotConnectedError) {
306
+ // peer gone or stream closed — re-open if needed
307
+ } else {
308
+ throw e;
309
+ }
310
+ }
311
+
312
+ // 2. Retry with exponential backoff (5 ms → 200 ms, up to maxRetries).
313
+ await node.sendWithRetry(stream, [Buffer.from('{}')], 8);
314
+
315
+ // 3. Block until the network lets up (bounded retry, ~13 min worst case).
316
+ await node.sendBlocking(stream, [Buffer.from('{}')]);
317
+
318
+ // Live stats — tx/rx seq, in-flight, window, backpressure count (BigInts).
319
+ const stats = node.streamStats(peerNodeId, 0x42n);
320
+ ```
321
+
322
+ `BackpressureError` and `NotConnectedError` both extend `Error`, so
323
+ `instanceof` and `try/catch` work as expected. The transport never
324
+ retries or buffers on its own behalf — the helper methods are
325
+ opt-in policies, not defaults. See `../docs/TRANSPORT.md` for the full
326
+ contract.
327
+
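For a feel of what "5 ms → 200 ms, up to maxRetries" means in wall-clock terms, the sketch below computes a backoff schedule. It assumes the delay doubles on each attempt and is clamped at the cap, with no jitter. The actual growth factor used by `sendWithRetry` is not stated here, so treat `backoffDelaysMs` as a hypothetical illustration.

```typescript
// Hypothetical schedule calculator; not an SDK function.
// Assumes doubling from `initialMs`, clamped at `capMs`, one delay per retry.
function backoffDelaysMs(
  maxRetries: number,
  initialMs: number = 5,
  capMs: number = 200,
): number[] {
  const delays: number[] = [];
  let delay = initialMs;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(delay, capMs));
    delay = Math.min(delay * 2, capMs);
  }
  return delays;
}
```

Under these assumptions, `backoffDelaysMs(8)` gives `[5, 10, 20, 40, 80, 160, 200, 200]`, or 715 ms of cumulative waiting before the final attempt gives up.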
328
+ ## Security (identity, tokens, capabilities, subnets)
329
+
330
+ Identity, capabilities, and subnets ride the underlying NAPI bindings
331
+ as a single security unit — the mesh's subprotocol dispatch threads
332
+ identity + capabilities + subnets + channel auth together at runtime,
333
+ and the TS SDK surfaces all of it through one type hierarchy.
334
+
335
+ ```typescript
336
+ import { randomBytes } from 'node:crypto';
337
+ import { Identity, MeshNode } from '@net-mesh/sdk';
338
+
339
+ // Load once from caller-owned storage (vault / KMS / env secret).
340
+ // The persisted form IS the 32-byte seed; treat as secret material.
341
+ const seed = randomBytes(32);
342
+ const identity = Identity.fromSeed(seed);
343
+
344
+ // Stable entity_id / node_id across restarts — derived from the seed.
345
+ const mesh = await MeshNode.create({
346
+ bindAddr: '127.0.0.1:9001',
347
+ psk: '42'.repeat(32),
348
+ identitySeed: seed, // mesh and identity share the keypair
349
+ });
350
+
351
+ // mesh.entityId().equals(identity.entityId) // true — compare via
352
+ // Buffer.equals(), since `===` on Buffers checks reference identity
353
+ // not byte equality.
354
+
355
+ // Issue a scoped subscribe grant for another entity.
356
+ const grantee = Identity.generate();
357
+ const token = identity.issueToken({
358
+ subject: grantee.entityId,
359
+ scope: ['subscribe'],
360
+ channel: 'sensors/temp',
361
+ ttlSeconds: 300, // `0` throws — zero TTL would mint a born-expired token
362
+ delegationDepth: 0, // 0 forbids re-delegation
363
+ });
364
+
365
+ // `token.bytes` is the transport-ready 161-byte blob.
366
+ // Ship it to the grantee; they hand it back on subscribe.
367
+ ```
368
+
369
+ Errors surface as `IdentityError` (malformed inputs — bad seed
370
+ length, unknown scope, invalid channel name) and `TokenError` whose
371
+ `kind` discriminator is one of `invalid_format` | `invalid_signature`
372
+ | `expired` | `not_yet_valid` | `delegation_exhausted` |
373
+ `delegation_not_allowed` | `not_authorized`. Both extend `Error`,
374
+ so `try/catch` + `instanceof` work as expected.
375
+
376
+ ### Capability announcements
377
+
378
+ `mesh.announceCapabilities(caps)` broadcasts a `CapabilitySet` to
379
+ every directly-connected peer and self-indexes locally.
380
+ `mesh.findNodes(filter)` queries the local index — results include
381
+ this node's own id when self matches.
382
+
383
+ ```typescript
384
+ import { MeshNode } from '@net-mesh/sdk';
385
+
386
+ const mesh = await MeshNode.create({
387
+ bindAddr: '127.0.0.1:9002',
388
+ psk: '42'.repeat(32),
389
+ });
390
+
391
+ await mesh.announceCapabilities({
392
+ hardware: {
393
+ cpuCores: 16,
394
+ memoryGb: 64,
395
+ gpu: { vendor: 'nvidia', model: 'h100', vramGb: 80 },
396
+ },
397
+ models: [
398
+ { modelId: 'llama-3.1-70b', family: 'llama', contextLength: 128_000 },
399
+ ],
400
+ tags: ['gpu', 'prod'],
401
+ });
402
+
403
+ const gpuPeers = mesh.findNodes({
404
+ requireGpu: true,
405
+ gpuVendor: 'nvidia',
406
+ minVramMb: 40_000,
407
+ });
408
+ // gpuPeers includes mesh.nodeId() on self-match.
409
+ ```
410
+
411
+ #### Scoped discovery (reserved `scope:*` tags)
412
+
413
+ A provider can narrow *which queries its announcement is visible to* by tagging
414
+ its `CapabilitySet` with reserved `scope:*` tags. Callers use
415
+ `mesh.findNodesScoped(filter, scope)` to filter candidates. The
416
+ wire format and forwarders are untouched — enforcement is
417
+ purely query-side.
418
+
419
+ ```typescript
420
+ import { withTenantScope } from '@net-mesh/sdk';
421
+
422
+ // GPU pool advertised to one tenant only.
423
+ await mesh.announceCapabilities({
424
+ tags: withTenantScope(['model:llama3-70b'], 'oem-123'),
425
+ });
426
+
427
+ // Tenant-scoped query — returns this node + any Global (untagged) peers.
428
+ const oemNodes = mesh.findNodesScoped(
429
+ { requireTags: ['model:llama3-70b'] },
430
+ { kind: 'tenant', tenant: 'oem-123' },
431
+ );
432
+ ```
433
+
434
+ `ScopeFilter` is a tagged union by `kind`:
435
+ `{ kind: 'any' }` (default), `{ kind: 'globalOnly' }`,
436
+ `{ kind: 'sameSubnet' }`, `{ kind: 'tenant', tenant }`,
437
+ `{ kind: 'tenants', tenants: [...] }`,
438
+ `{ kind: 'region', region }`,
439
+ `{ kind: 'regions', regions: [...] }`. Reserved announcement
440
+ tags: `scope:subnet-local` (visible only under `sameSubnet`),
441
+ `scope:tenant:<id>`, `scope:region:<name>` — strictest scope
442
+ wins. Helpers `withTenantScope`, `withRegionScope`,
443
+ `withSubnetLocalScope` build the tag list idempotently.
444
+ Untagged peers resolve to `Global` and stay visible under
445
+ permissive queries. Full design:
446
+ [`docs/SCOPED_CAPABILITIES_PLAN.md`](../docs/SCOPED_CAPABILITIES_PLAN.md).
447
+
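The tenant case can be sketched in a few lines to show what "idempotent" and "query-side enforcement" mean in practice. The functions below are hypothetical stand-ins, not the SDK's `withTenantScope` or its matcher. They only consider `scope:tenant:<id>` tags and ignore region and subnet-local scopes, so they are a simplification of the rules above.

```typescript
const TENANT_SCOPE_PREFIX = 'scope:tenant:';

// Appends the tenant scope tag unless it is already present (idempotent).
function withTenantScopeSketch(tags: readonly string[], tenant: string): string[] {
  const scopeTag = TENANT_SCOPE_PREFIX + tenant;
  return tags.indexOf(scopeTag) !== -1 ? tags.slice() : tags.concat(scopeTag);
}

// Query-side check for `{ kind: 'tenant', tenant }`: a candidate without any
// tenant tag is treated as Global and stays visible (assumed behaviour).
function visibleToTenantSketch(candidateTags: readonly string[], tenant: string): boolean {
  const tenantTags = candidateTags.filter((t) => t.startsWith(TENANT_SCOPE_PREFIX));
  if (tenantTags.length === 0) return true;
  return tenantTags.indexOf(TENANT_SCOPE_PREFIX + tenant) !== -1;
}
```

Because the check runs on the querying side, forwarders still relay the announcement unchanged. The tag only affects which query results include the node.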
448
+ Propagation is multi-hop, bounded by `MAX_CAPABILITY_HOPS = 16`.
449
+ Forwarders re-broadcast every received announcement to their other
450
+ peers; dedup on `(origin, version)` drops duplicates at convergence
451
+ points, and `hop_count` sits outside the signed envelope so the
452
+ origin's signature verifies at every hop.
453
+ `capabilityGcIntervalMs` + TTL-driven eviction are configurable on
454
+ `MeshNode.create`. See
455
+ [`docs/MULTIHOP_CAPABILITY_PLAN.md`](../docs/MULTIHOP_CAPABILITY_PLAN.md).
456
+
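The forwarding rule described above (dedup on `(origin, version)`, bounded hop count) can be pictured with a small sketch. `ForwardDedupSketch` and `AnnouncementSketch` are hypothetical. Whether the real bound is checked before or after incrementing `hop_count` is an assumption, and a real forwarder would also bound or expire the seen-set.

```typescript
const MAX_CAPABILITY_HOPS = 16; // bound quoted in the text above

interface AnnouncementSketch {
  origin: string;   // hypothetical: origin node id rendered as text
  version: number;  // hypothetical: per-origin announcement version
  hopCount: number; // outside the signed envelope, so forwarders may bump it
}

class ForwardDedupSketch {
  private readonly seenKeys = new Set<string>();

  // Returns the announcement to re-broadcast, or null when it is dropped.
  accept(ann: AnnouncementSketch): AnnouncementSketch | null {
    const key = `${ann.origin}:${ann.version}`;
    if (this.seenKeys.has(key)) return null; // duplicate at a convergence point
    this.seenKeys.add(key);
    if (ann.hopCount + 1 > MAX_CAPABILITY_HOPS) return null; // hop budget spent
    return { ...ann, hopCount: ann.hopCount + 1 };
  }
}
```

Only `hopCount` changes between hops in this sketch. The signed fields are passed through untouched, which is what lets the origin's signature verify at every hop.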
457
+ #### Capability enhancements (typed taxonomy + predicates + validation)
458
+
459
+ Beyond announce / find-peers, the SDK exposes a caller-local
460
+ enhancement layer mirroring the substrate's `CapabilityEnhancements`:
461
+
462
+ ```typescript
463
+ import {
464
+ // Typed taxonomy
465
+ tagFromUserString, RESERVED_PREFIXES,
466
+ // Chain helpers
467
+ emptyCapabilities, requireTag, requireAxisValue, withMetadata,
468
+ // Predicates
469
+ p, evaluatePredicate, predicateToRpcHeader, predicateFromRpcHeader,
470
+ RPC_WHERE_HEADER,
471
+ // Predicate trace + debug
472
+ evaluatePredicateWithTrace,
473
+ predicateDebugReport, redactMetadataKeys, renderDebugReport,
474
+ // Validation
475
+ validateCapabilities, isReportValid,
476
+ // Diff
477
+ diffCapabilities,
478
+ // Placement filters
479
+ standardPlacement, placementFilterFromFn,
480
+ } from '@net-mesh/sdk';
481
+
482
+ // Build a capability set in the wire shape `{ tags, metadata }`.
483
+ let caps = emptyCapabilities();
484
+ caps = requireTag(caps, 'hardware', 'gpu');
485
+ caps = requireAxisValue(caps, 'software', 'os', 'linux');
486
+ caps = withMetadata(caps, 'intent', 'ml-training');
487
+
488
+ // Author a predicate.
489
+ const pred = p.and(
490
+ p.exists({ axis: 'hardware', key: 'gpu' }),
491
+ p.numericAtLeast({ axis: 'hardware', key: 'memory_gb' }, 64),
492
+ p.metadataEquals('intent', 'ml-training'),
493
+ );
494
+
495
+ // Local evaluation (no mesh round-trip).
496
+ const matched = evaluatePredicate(pred, caps.tags, caps.metadata);
497
+
498
+ // Wire form for nRPC `net-where:` headers — pair with the
499
+ // header-bearing call variants (`callWithHeaders` etc.) so a
500
+ // server-side filter can match candidates without running the
501
+ // predicate over the whole route.
502
+ const headerValue = predicateToRpcHeader(pred);
503
+ // Reverse direction: parse a peer-supplied header back into the AST.
504
+ const decoded = predicateFromRpcHeader(headerValue);
505
+
506
+ // Validate against the canonical schema (catches typos, type
507
+ // mismatches, oversize metadata, legacy tags).
508
+ const report = validateCapabilities(caps);
509
+ if (!isReportValid(report)) {
510
+ console.error('schema errors:', report.errors);
511
+ }
512
+
513
+ // Detect what changed between two snapshots — drives placement
514
+ // re-evaluation when a daemon's CapabilitySet updates.
515
+ const delta = diffCapabilities(prevCaps, caps);
516
+
517
+ // Single-evaluation trace — every clause's verdict + skipped
518
+ // children for short-circuit AND/OR.
519
+ const { result, trace } = evaluatePredicateWithTrace(pred, caps.tags, caps.metadata);
520
+
521
+ // Profile a predicate across a corpus + render a per-clause report.
522
+ const debug = predicateDebugReport(pred, contexts);
523
+ const safe = redactMetadataKeys(debug, ['intent']); // scrub before persisting
524
+ console.log(renderDebugReport(safe));
525
+
526
+ // Wrap a predicate as a placement-filter callback the substrate
527
+ // invokes per candidate. Pair with `standardPlacement` to
528
+ // install a custom scoring axis driven by the JS predicate.
529
+ const filter = placementFilterFromFn((cand) =>
530
+ evaluatePredicate(pred, cand.tags, cand.metadata),
531
+ );
532
+ const placement = standardPlacement().withCustomFilterId(filter.id).build();
533
+ ```
534
+
535
+ The wire format is byte-identical across all five bindings (Rust /
536
+ TS / Python / Go / C) — pinned by JSON fixtures under
537
+ `tests/cross_lang_capability/`. A predicate authored in TS and
538
+ shipped to a Go service via nRPC headers decodes losslessly.
539
+
540
+ ### Subnets (visibility partitioning)
541
+
542
+ `subnet` pins a node to a specific 4-level `SubnetId`; `subnetPolicy`
543
+ derives each *peer's* subnet from their inbound capability tags so
544
+ every node in the mesh agrees on the geometry without a central
545
+ directory.
546
+
547
+ ```typescript
548
+ import { MeshNode } from '@net-mesh/sdk';
549
+
550
+ const policy = {
551
+ rules: [
552
+ { tagPrefix: 'region:', level: 0, values: { us: 3, eu: 4 } },
553
+ { tagPrefix: 'fleet:', level: 1, values: { blue: 7, green: 8 } },
554
+ ],
555
+ };
556
+
557
+ const mesh = await MeshNode.create({
558
+ bindAddr: '127.0.0.1:9003',
559
+ psk: '42'.repeat(32),
560
+ subnet: { levels: [3, 7] }, // us/blue
561
+ subnetPolicy: policy,
562
+ });
563
+
564
+ // Announce tags matching the policy so peers derive the same
565
+ // SubnetId [3, 7] when they apply their own policy to our caps.
566
+ await mesh.announceCapabilities({ tags: ['region:us', 'fleet:blue'] });
567
+ ```
568
+
569
+ Channel `visibility` gates publish fan-out and subscribe
570
+ authorization against the derived geometry. Cross-subnet subscribes
571
+ to a `SubnetLocal` channel reject with `Unauthorized`.
572
+
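To show how a tag-driven policy can yield the same geometry on every node, here is a minimal sketch of applying `rules` to a peer's tags. It is not the SDK's implementation. `deriveSubnetLevelsSketch` is a hypothetical name, and both the zero default for unmatched levels and the padding to four levels are assumptions (the example above writes the id as `[3, 7]`).

```typescript
interface SubnetRuleSketch {
  tagPrefix: string;
  level: number;
  values: Record<string, number>;
}

const SUBNET_LEVELS = 4; // the text above describes a 4-level SubnetId

// Maps capability tags to per-level subnet values using the policy rules.
function deriveSubnetLevelsSketch(
  tags: readonly string[],
  rules: readonly SubnetRuleSketch[],
): number[] {
  const levels: number[] = new Array<number>(SUBNET_LEVELS).fill(0);
  for (const rule of rules) {
    if (rule.level < 0 || rule.level >= SUBNET_LEVELS) continue;
    for (const tag of tags) {
      if (!tag.startsWith(rule.tagPrefix)) continue;
      const suffix = tag.slice(rule.tagPrefix.length);
      if (Object.prototype.hasOwnProperty.call(rule.values, suffix)) {
        levels[rule.level] = rule.values[suffix];
        break; // first matching tag wins for this rule (assumed)
      }
    }
  }
  return levels;
}
```

Since the function depends only on the announced tags and the shared policy, two nodes holding the same policy compute the same levels for a given peer without consulting a directory.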
573
+ ### Channel authentication
574
+
575
+ `ChannelConfig` carries three auth knobs, enforced end-to-end at
576
+ both the subscribe gate and the publish path:
577
+
578
+ - `publishCaps: CapabilityFilter` — publisher must satisfy before
579
+ fan-out. Failing publishes raise an error; no peers are attempted.
580
+ - `subscribeCaps: CapabilityFilter` — subscribers must satisfy
581
+ before being added to the roster. Failures surface as
582
+ `ChannelAuthError`.
583
+ - `requireToken: true` — subscribers must present a valid `Token`
584
+ whose subject matches their `entityId`. The publisher verifies
585
+ the ed25519 signature, installs the token in its local cache,
586
+ then runs `can_subscribe`.
587
+
588
+ ```typescript
589
+ import { Identity, MeshNode } from '@net-mesh/sdk';
590
+
591
+ const pubIdentity = Identity.generate();
592
+ const subIdentity = Identity.generate();
593
+
594
+ const publisher = await MeshNode.create({
595
+ bindAddr: '127.0.0.1:9004',
596
+ psk: '42'.repeat(32),
597
+ identitySeed: pubIdentity.toBytes(),
598
+ });
599
+
600
+ // Subscriber-side mesh, pinned to subIdentity so the publisher's
601
+ // `require_token` check matches the token's subject against the
602
+ // subscribing peer's entityId.
603
+ const subscriber = await MeshNode.create({
604
+ bindAddr: '127.0.0.1:9005',
605
+ psk: '42'.repeat(32),
606
+ identitySeed: subIdentity.toBytes(),
607
+ });
608
+ // Handshake the pair + start receive loops before any subscribe —
609
+ // omitted here for brevity; see the `Mesh Streams` section.
610
+
611
+ publisher.registerChannel({
612
+ name: 'events/inference',
613
+ subscribeCaps: { requireTags: ['gpu'] },
614
+ requireToken: true,
615
+ });
616
+
617
+ // Issue a SUBSCRIBE-scope token for the subscriber.
618
+ const token = pubIdentity.issueToken({
619
+ subject: subIdentity.entityId,
620
+ scope: ['subscribe'],
621
+ channel: 'events/inference',
622
+ ttlSeconds: 300,
623
+ });
624
+
625
+ // Subscriber attaches the token on subscribe.
626
+ await subscriber.subscribeChannel(
627
+ publisher.nodeId(),
628
+ 'events/inference',
629
+ { token },
630
+ );
631
+ ```
632
+
633
+ Denied subscribes surface as `ChannelAuthError` (a subclass of
634
+ `ChannelError`); malformed token bytes raise `TokenError` before
635
+ any network I/O. Successful subscribes populate an `AuthGuard`
636
+ bloom filter on the publisher so every subsequent publish admits
637
+ the subscriber in constant time (~20 ns per check,
638
+ single-threaded). Expired tokens evict within the publisher's
639
+ `token_sweep_interval` (default 30 s); repeated subscribe
640
+ failures from the same peer throttle via `RateLimited` acks so
641
+ bad-token storms never tie up ed25519 verification. Cross-SDK
642
+ behaviour is fixed by the Rust integration suite — see
643
+ [`SDK_SECURITY_SURFACE_PLAN.md`](../docs/SDK_SECURITY_SURFACE_PLAN.md)
644
+ and
645
+ [`CHANNEL_AUTH_GUARD_PLAN.md`](../docs/CHANNEL_AUTH_GUARD_PLAN.md)
646
+ for the full contract.
647
+
648
+ ## Channels (distributed pub/sub)
649
+
650
+ Named pub/sub across the encrypted mesh. The publisher registers a
651
+ channel config; subscribers ask to join via `subscribeChannel` (the
652
+ subscribe goes through a dedicated subprotocol with an Ack round-trip);
653
+ `publish` fans one payload out to every current subscriber.
654
+
655
+ ```typescript
656
+ import { MeshNode, ChannelAuthError } from '@net-mesh/sdk';
657
+
658
+ const psk = '0'.repeat(64);
659
+
660
+ // Publisher side.
661
+ const b = await MeshNode.create({ bindAddr: '127.0.0.1:9001', psk });
662
+ b.registerChannel({
663
+ name: 'sensors/temp',
664
+ visibility: 'global', // or 'subnet-local' / 'parent-visible' / 'exported'
665
+ reliable: true,
666
+ priority: 2,
667
+ maxRatePps: 1000,
668
+ });
669
+
670
+ // Subscriber side + full handshake.
671
+ const a = await MeshNode.create({ bindAddr: '127.0.0.1:9002', psk });
672
+ const aNodeId = a.nodeId();
673
+ const bNodeId = b.nodeId();
674
+ // connect/accept must run concurrently: the initiator blocks on a handshake reply
675
+ // that only shows up once the responder is in accept(). Then both
676
+ // sides must start() their receive loops before app traffic flows.
677
+ await Promise.all([
678
+ b.accept(aNodeId),
679
+ a.connect('127.0.0.1:9001', b.publicKey(), bNodeId),
680
+ ]);
681
+ await a.start();
682
+ await b.start();
683
+ await a.subscribeChannel(bNodeId, 'sensors/temp');
684
+
685
+ // Fan out.
686
+ const report = await b.publish(
687
+ 'sensors/temp',
688
+ Buffer.from(JSON.stringify({ celsius: 22.5 })),
689
+ { reliability: 'reliable', onFailure: 'best_effort', maxInflight: 32 },
690
+ );
691
+ console.log(`${report.delivered}/${report.attempted} subscribers received`);
692
+
693
+ // Rejections surface with typed errors:
694
+ try {
695
+ await a.subscribeChannel(bNodeId, 'restricted');
696
+ } catch (e) {
697
+ if (e instanceof ChannelAuthError) { /* ACL rejected */ }
698
+ }
699
+ ```
700
+
701
+ **Channel names always cross the boundary as strings.** The u16 hash
702
+ is a transport-layer index only; ACL lookups key on the canonical
703
+ name to avoid bypass via hash collision (see `../docs/CHANNELS.md`).
704
+
705
+ Subscribers today receive payloads through the existing event-bus
706
+ `poll()` surface — a dedicated per-channel `AsyncIterable` receive
707
+ method is a follow-up.
708
+
709
+ ## CortEX & NetDb (event-sourced state)
710
+
711
+ Typed, event-sourced state on top of RedEX — tasks and memories with
712
+ filterable queries and reactive `AsyncIterable` watches. Includes the
713
+ `snapshotAndWatch` primitive whose race fix landed in v2, so you can
714
+ safely "paint what's there now, then react to changes" without losing
715
+ updates that race during construction.
716
+
717
+ ```typescript
718
+ import { NetDb, TaskStatus, CortexError } from '@net-mesh/sdk';
719
+
720
+ const db = await NetDb.open({
721
+ originHash: 0xABCDEF01,
722
+ withTasks: true,
723
+ withMemories: true,
724
+ // persistentDir + persistent: true for disk-backed files
725
+ });
726
+
727
+ // CRUD through the domain API — no EventMeta plumbing.
728
+ try {
729
+ const seq = db.tasks!.create(1n, 'write docs', 100n);
730
+ await db.tasks!.waitForSeq(seq); // wait for the fold to apply
731
+ } catch (e) {
732
+ if (e instanceof CortexError) { /* handle adapter error */ }
733
+ else { throw e; }
734
+ }
735
+
736
+ // Snapshot + watch: one atomic call, no race.
737
+ const { snapshot, updates } = await db.tasks!.snapshotAndWatch({
738
+ status: TaskStatus.Pending,
739
+ });
740
+ render(snapshot);
741
+ for await (const next of updates) {
742
+ render(next);
743
+ if (shouldStop) break; // automatically closes the native iterator
744
+ }
745
+
746
+ db.close();
747
+ ```
748
+
749
+ ### Plain watches
750
+
751
+ `watch()` returns the same `AsyncIterable<T[]>` shape without a
752
+ snapshot. Prefer `snapshotAndWatch` when the caller needs the initial
753
+ result — calling `listTasks()` + `watch()` separately races, and a
754
+ mutation landing between them can be silently lost.
755
+
756
+ ```typescript
757
+ for await (const batch of await db.tasks!.watch({ titleContains: 'ship' })) {
758
+ // each batch is the current filter result after a deduplicated fold tick
759
+ }
760
+ ```
761
+
762
+ ### Standalone adapters
763
+
764
+ If you only need one model, skip the `NetDb` facade and open the
765
+ adapter directly against a `Redex`:
766
+
767
+ ```typescript
768
+ import { Redex, TasksAdapter } from '@net-mesh/sdk';
769
+
770
+ const redex = new Redex({ persistentDir: '/var/lib/net/redex' });
771
+ const tasks = await TasksAdapter.open(redex, 0xABCDEF01, { persistent: true });
772
+ ```
773
+
774
+ ### Raw RedEX file (no CortEX fold)
775
+
776
+ For domain-agnostic persistent logs — your own event schema, no fold,
777
+ no typed adapter — open a `RedexFile` directly from a `Redex`. The
778
+ tail iterator is the same `AsyncIterable` shape as the CortEX
779
+ watches, so `for await` + `break` cleans up native resources.
780
+
781
+ ```typescript
782
+ import { Redex, RedexError } from '@net-mesh/sdk';
783
+
784
+ const redex = new Redex({ persistentDir: '/var/lib/net/events' });
785
+ const file = redex.openFile('analytics/clicks', {
786
+ persistent: true,
787
+ fsyncIntervalMs: 100, // or fsyncEveryN: 1000n
788
+ retentionMaxEvents: 1_000_000n,
789
+ });
790
+
791
+ // Append (or batch-append).
792
+ const seq = file.append(Buffer.from(JSON.stringify({ url: '/home' })));
793
+ // `appendBatch` returns the first-seq `bigint` of the batch, or
794
+ // `null` for an empty input. The `null` return is the explicit
795
+ // "I appended nothing" signal — pre-`bugfixes-8` it returned `0n`,
796
+ // which collided with the legitimate "first event of a non-empty
797
+ // batch landed at seq 0" return.
798
+ const firstSeq = file.appendBatch(payloadBuffers);
799
+
800
+ // Tail — backfills the retained range, then streams live appends.
801
+ const stream = await file.tail(0n);
802
+ try {
803
+ for await (const event of stream) {
804
+ const parsed = JSON.parse(event.payload.toString());
805
+ console.log(event.seq, parsed);
806
+ if (shouldStop) break; // automatically closes the native iterator
807
+ }
808
+ } catch (e) {
809
+ if (e instanceof RedexError) { /* ... */ }
810
+ throw e;
811
+ } finally {
812
+ // Ensure the file is closed even if tailing / parsing throws.
813
+ file.close();
814
+ }
815
+ ```
816
+
817
+ ### Cross-node RedEX replication
818
+
819
+ RedEX channels can replicate across the mesh. Opt in per channel by
820
+ setting `replication` on the file config. The default — omitting
821
+ `replication` — keeps the channel single-node and adds zero wire
822
+ traffic. Replicated channels carry N copies of the log; the leader is
823
+ the single writer, replicas catch up via pull-based sync. Failover
824
+ uses a deterministic nearest-RTT election with NodeId tie-break.
825
+
826
+ ```typescript
827
+ import { NetMesh, Redex } from '@net-mesh/core';
828
+
829
+ const mesh = await NetMesh.create({
830
+ bindAddr: '127.0.0.1:0',
831
+ psk: '...',
832
+ });
833
+ const redex = new Redex({ persistentDir: '/var/lib/net/events' });
834
+
835
+ // Install the per-Redex replication router on the mesh.
836
+ // Idempotent — safe to call from multiple paths.
837
+ redex.enableReplication(mesh);
838
+
839
+ const file = redex.openFile('orders/audit', {
840
+ persistent: true,
841
+ replication: {
842
+ factor: 3, // 1..16; default 3
843
+ heartbeatMs: 500n, // min 100; default 500
844
+ placement: 'standard', // 'standard' | 'pinned' | 'colocation-strict'
845
+ // pinnedNodes: [nodeIdA, nodeIdB, nodeIdC], // required when placement = 'pinned'
846
+ // leaderPinned: someNodeId,
847
+ onUnderCapacity: 'withdraw', // 'withdraw' (default) | 'evict-oldest'
848
+ replicationBudgetFraction: 0.5,
849
+ },
850
+ });
851
+ file.append(Buffer.from('event payload'));
852
+ ```
853
+
854
+ The leader handles every append locally; replicas observe the
855
+ leader's heartbeat `tail_seq`, issue `SYNC_REQUEST` on lag, and apply
856
+ chunks via `SYNC_RESPONSE`. When the leader closes (or the replica's
857
+ believed leader goes silent past `3 × heartbeatMs`), the surviving
858
+ replicas run the deterministic election and one becomes the new
859
+ leader within microseconds.
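
The failover rule can be pictured as a pure function that every surviving replica evaluates over the same inputs. This is a sketch under stated assumptions rather than the runtime's code: `Candidate`, `leaderIsSilent`, and `electLeader` are hypothetical names, node IDs are shown as plain numbers, each candidate is assumed to carry an RTT figure all survivors agree on, and the tie-break is assumed to favour the lowest NodeId.

```typescript
// Hypothetical shapes for illustration; the real election runs inside the native runtime.
interface Candidate {
  nodeId: number;
  rttMs: number;
}

// A replica treats its believed leader as gone once it has been silent
// for longer than 3 x heartbeatMs.
function leaderIsSilent(lastHeartbeatAtMs: number, nowMs: number, heartbeatMs: number): boolean {
  return nowMs - lastHeartbeatAtMs > 3 * heartbeatMs;
}

// Deterministic choice: lowest RTT first, then lowest nodeId (assumed direction).
function electLeader(candidates: Candidate[]): Candidate | null {
  if (candidates.length === 0) return null;
  const sorted = candidates.slice().sort((a, b) =>
    a.rttMs !== b.rttMs ? a.rttMs - b.rttMs : a.nodeId - b.nodeId,
  );
  return sorted[0];
}
```

Because the function has no randomness and no local state, replicas that see the same candidate set reach the same answer without an extra voting round.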
860
+
861
+ `Redex.replicationPrometheusText()` renders the seven per-channel
862
+ metric shapes — `*_lag_seconds`, `*_sync_bytes_total`,
863
+ `*_leader_changes_total`, `*_under_capacity_total`,
864
+ `*_skip_ahead_total`, `*_election_thrash_total`,
865
+ `*_witness_withdrawals_total` — for an HTTP scrape endpoint. Returns
866
+ the empty string when replication isn't enabled; pipe directly into a
867
+ response body without branching. `replicationRuntimeCount()` returns
868
+ the count of registered per-channel runtimes.
869
+
870
+ ```typescript
871
+ // HTTP scrape handler
872
+ app.get('/metrics', (req, res) => {
873
+ res.type('text/plain').send(redex.replicationPrometheusText());
874
+ });
875
+ ```
876
+
877
+ Disk-pressure handling: when a replica's local file rejects an
878
+ append (heap-segment cap or disk write-fail), the configured
879
+ `onUnderCapacity` policy fires — `withdraw` drops the replica role
880
+ (capability tag withdrawn; peers re-route to a healthy holder),
881
+ `evict-oldest` runs retention sweep + retries (requires
882
+ `retentionMax*` caps to be set on the same file config).
883
+
884
+ ### Error classes
885
+
886
+ CortEX-boundary errors are typed and catchable via `instanceof`:
887
+
888
+ - `CortexError` — adapter errors (fold halted, RedEX I/O, decode failures).
889
+ - `NetDbError` — snapshot/restore bundle errors, missing-model lookups.
890
+ - `RedexError` — raw file errors (invalid channel name, bad config,
891
+ append / tail / sync / close failures).
892
+
893
+ All three are re-exported from `@net-mesh/sdk`; you don't need a
894
+ separate import path.
895
+
896
+ ## Dataforts (greedy cache, gravity, blob refs, read-your-writes)
897
+
898
+ Dataforts is the compositional data plane on top of RedEX / CortEX
899
+ / capability-index / proximity-graph. The TypeScript surface exposes
900
+ greedy + gravity through `Redex` methods, blob registration through
901
+ top-level helpers, and read-your-writes through `WriteToken` +
902
+ `waitForToken` on `Tasks` / `Memories`. The underlying native module
903
+ is built with the `dataforts` Cargo feature; pre-built `@net-mesh/core`
904
+ release artifacts ship with the feature on.
905
+
906
+ Four phases, plus the Phase 3.5 overflow follow-on:
907
+
908
+ - **Phase 1 — Greedy-LRU caching.** Per-node speculative caching
909
+ of in-scope chains observed via the tail-subscription path.
910
+ Five-axis admission (scope + proximity + capability-preference
911
+ + colocation + storage-cap) plus a bandwidth budget gate decide
912
+ whether to admit each inbound event. Cold channels evict under
913
+ cluster-cap pressure and withdraw their `causal:<hex>`
914
+ advertisement. The runtime also observes `BlobRef`-shaped
915
+ payloads + runs the `should_pull_blob` admission gate; on
916
+ admit the wired `BlobAdapter::prefetch` spawns a best-effort
917
+ pull via the per-chunk replication runtime.
918
+ - **Phase 3 — `BlobRef` + blob adapters.** Two shapes:
919
+ - **External-hook variant (v0.1):** a `[0xB0, 0xB1, 0xB2,
920
+ 0xB3]` magic + version + 32-byte BLAKE3 + size + URI
921
+ reference whose bytes live in the caller's storage (S3 /
922
+ Ceph / IPFS / local FS). Exposed today via
923
+ `registerFilesystemBlobAdapter` + `blobPublish` /
924
+ `blobResolve`.
925
+ - **Substrate-owned variant (v0.2):** the substrate stores
926
+ each chunk as a content-addressed `RedexFile`, riding the
927
+ existing replication runtime for cross-node placement.
928
+ `MeshBlobAdapter` is now available as a TypeScript class
929
+ on the `@net-mesh/core` Node binding (CRUD path: `store` /
930
+ `fetch` / `fetchRange` / `exists` / `prometheusText`).
931
+ The deeper integration points (`publish_with_blob`,
932
+ `BlobRefcountTable`, `BlobMetrics`, `BlobAdapter::prefetch`)
933
+ are still Rust-only — operator scripts that need them
934
+ from TypeScript call out to the `net-blob` CLI or a Rust-
935
+ side daemon RPC until each follow-up wrapper lands. See
936
+ [`docs/plans/DATAFORTS_BLOB_STORAGE_PLAN.md`](../docs/plans/DATAFORTS_BLOB_STORAGE_PLAN.md)
937
+ for the shipping status.
938
+ - **Phase 3.5 — Active blob overflow (v0.3 blob track).** Push-
939
+ side complement of Phase 4's pull-driven migration. Disabled
940
+ by default; opt in via the `MeshBlobAdapter` constructor's
941
+ `overflow` option or the runtime `setOverflowEnabled(true)`
942
+ method. The full counter family
943
+ (`dataforts_blob_overflow_*` — admitted / 6-label per-reason
944
+ rejected / hysteresis edges / `active` gauge / `disk_ratio`)
945
+ lands in `prometheusText()`. See
946
+ [`docs/plans/DATAFORTS_BLOB_OVERFLOW_PLAN.md`](../docs/plans/DATAFORTS_BLOB_OVERFLOW_PLAN.md)
947
+ for design + per-PR shipping status.
948
+ - **Phase 4 — Data gravity.** Per-chain read-rate counters with
949
+ exponential decay. Threshold-crossing emissions stamp
950
+ `heat:<hex>=<rate>` onto the chain's capability announcement;
951
+ greedy weights cache pulls by `heat × scope-match × proximity`.
952
+ The v0.2 blob track adds parallel `BlobHeatRegistry` keyed on
953
+ chunk hash + `heat:blob:<hex>=<rate>` tag emission +
954
+ `drive_blob_migration_tick` consumer — exposed from Rust
955
+ today; Node wrapper deferred.
956
+ - **Phase 5 — Read-your-writes.** Every `tasks.create`,
957
+ `memories.insert`, etc. returns a `WriteToken`. Pass it to
958
+ `tasks.waitForToken(token, deadlineMs)` and the call resolves
959
+ only after the local fold has *applied* that seq — tracking
960
+ both `appliedThroughSeq` and `foldedThroughSeq` so a stalled
961
+ fold surfaces a typed error, not a silent resolve.
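
Phase 4's read-rate counters can be sketched as a value that halves every half-life and gains one unit per read. This is a hedged illustration only: `HeatState`, `decayTo`, `recordRead`, and `shouldEmit` are hypothetical names, and the reading of `emitThresholdRatio` (emit when the rate has moved by at least that factor since the last emission) is an assumption, not documented behaviour.

```typescript
// Hypothetical model of a per-chain heat counter; not the Rust implementation.
interface HeatState {
  rate: number;
  atSecs: number;
}

// Exponential decay: the rate halves every `halfLifeSecs`.
function decayTo(state: HeatState, nowSecs: number, halfLifeSecs: number): HeatState {
  const elapsed = Math.max(0, nowSecs - state.atSecs);
  return { rate: state.rate * Math.pow(0.5, elapsed / halfLifeSecs), atSecs: nowSecs };
}

// Each read first decays the counter to "now", then adds one unit of heat.
function recordRead(state: HeatState, nowSecs: number, halfLifeSecs: number): HeatState {
  const decayed = decayTo(state, nowSecs, halfLifeSecs);
  return { rate: decayed.rate + 1, atSecs: nowSecs };
}

// Assumed threshold rule: emit when the rate grew or shrank by the ratio.
function shouldEmit(currentRate: number, lastEmittedRate: number, emitThresholdRatio: number): boolean {
  if (lastEmittedRate === 0) return currentRate > 0;
  return (
    currentRate >= lastEmittedRate * emitThresholdRatio ||
    currentRate <= lastEmittedRate / emitThresholdRatio
  );
}
```

Gating emissions on a ratio keeps a steadily read chain from re-announcing its heat tag on every single read.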
962
+
963
+ ```ts
964
+ import { Redex, Tasks, BlobRef, registerFilesystemBlobAdapter,
965
+ blobPublish, blobResolve, MeshNode } from '@net-mesh/core';
966
+
967
+ const mesh = new MeshNode({ bindAddr: '0.0.0.0:7000', psk: '…' });
968
+ const redex = new Redex({ persistentDir: '/var/lib/net/redex' });
969
+
970
+ // Phase 1 — wire greedy into the mesh inbound dispatch.
971
+ redex.enableGreedyDataforts(mesh, {
972
+ scopes: ['region:us'],
973
+ totalCapBytes: 1n << 30n, // 1 GiB cluster-cap
974
+ perChannelCapBytes: 64n << 20n,
975
+ });
976
+
977
+ // Phase 4 — layer gravity on top.
978
+ redex.enableGravityForGreedy(mesh, {
979
+ enabled: true,
980
+ emitThresholdRatio: 1.5,
981
+ decayHalfLifeSecs: 300n,
982
+ });
983
+
984
+ // Phase 3 — register an adapter (filesystem ships in-tree).
985
+ registerFilesystemBlobAdapter('local', '/var/blobs');
986
+ const ref = await blobPublish('local', 'local://obj/payload', someBytes);
987
+ const back = await blobResolve(ref);
988
+
989
+ // Phase 3 v0.2 — substrate-owned `MeshBlobAdapter`.
990
+ import { MeshBlobAdapter } from '@net-mesh/core'; // `BlobRef` is already imported above
991
+ const meshBlob = new MeshBlobAdapter(redex, 'mesh-app', {
992
+ persistent: true,
993
+ });
994
+ const hash = /* 32-byte BLAKE3 of `someBytes` */ Buffer.alloc(32);
995
+ const blobRef = new BlobRef('mesh://demo', hash, BigInt(someBytes.length));
996
+ await meshBlob.store(blobRef, someBytes);
997
+ const fetched = await meshBlob.fetch(blobRef);
998
+
999
+ // Phase 3.5 / v0.3 — active blob overflow.
1000
+ // At construction:
1001
+ const overflowed = new MeshBlobAdapter(redex, 'mesh-overflow', {
1002
+ persistent: true,
1003
+ overflow: {
1004
+ enabled: true,
1005
+ highWaterRatio: 0.80,
1006
+ lowWaterRatio: 0.65,
1007
+ maxPushesPerTick: 8,
1008
+ scope: 'zone',
1009
+ tickIntervalMs: 30000,
1010
+ },
1011
+ });
1012
+
1013
+ // Or flip the master switch at runtime — no rebuild required:
1014
+ overflowed.setOverflowEnabled(false);
1015
+ overflowed.setOverflowEnabled(true);
1016
+
1017
+ // Inspection (read-only getters):
1018
+ console.log(overflowed.overflowEnabled); // boolean
1019
+ console.log(overflowed.overflowActive); // boolean — hysteresis state
1020
+ console.log(overflowed.overflowConfig); // typed snapshot
1021
+ console.log(overflowed.prometheusText()); // includes dataforts_blob_overflow_*
1022
+
1023
+ // Phase 5 — read-your-writes.
1024
+ const tasks = await Tasks.open(redex, { originHash: mesh.originHash });
1025
+ const { token } = await tasks.create(1, 'first', 100);
1026
+ await tasks.waitForToken(token, 250); // ms deadline; throws CortexError on timeout
1027
+
1028
+ // Diagnostics.
1029
+ console.log(redex.greedyCachedChannelCount());
1030
+ console.log(redex.greedyPrometheusText());
1031
+ ```
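
The `highWaterRatio` / `lowWaterRatio` pair in the overflow config describes a hysteresis band. A minimal sketch of that idea follows; `nextOverflowActive` is a hypothetical helper, and the inclusive comparisons are an assumption rather than the adapter's documented behaviour.

```typescript
// Hypothetical hysteresis step; the real state lives inside MeshBlobAdapter.
interface OverflowThresholds {
  highWaterRatio: number;
  lowWaterRatio: number;
}

// Turn on at or above the high-water mark, turn off at or below the
// low-water mark, and otherwise keep the previous state.
function nextOverflowActive(active: boolean, diskRatio: number, t: OverflowThresholds): boolean {
  if (!active && diskRatio >= t.highWaterRatio) return true;
  if (active && diskRatio <= t.lowWaterRatio) return false;
  return active;
}

// Walk a series of disk-usage samples through the state machine.
function traceOverflow(samples: number[], t: OverflowThresholds): boolean[] {
  let active = false;
  return samples.map((ratio) => {
    active = nextOverflowActive(active, ratio, t);
    return active;
  });
}
```

With thresholds of 0.80 and 0.65, a disk hovering around 0.70 stays in whichever state it was already in, which avoids flapping between pushing and idling.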
1032
+
1033
+ The canonical channel hash is 64-bit (`channelHash(name)` returns
1034
+ `bigint` in the u64 range). The per-packet wire `NetHeader`
1035
+ `channel_hash` stays `u16` — fast-path filter hint, may
1036
+ bucket-collide at scale; ACL / config / cache / RYW decisions key on
1037
+ the canonical 64-bit hash via registry disambiguation. The
1038
+ `PermissionToken` wire form is 169 bytes (`PermissionToken::WIRE_SIZE`
1039
+ in the Rust core).
1040
+
1041
+ ## nRPC (request / response over the mesh)
1042
+
1043
+ nRPC is the request/response convention layer riding on top of the
1044
+ pub/sub mesh. It turns a directed channel pair
1045
+ (`<service>.requests` / `<service>.replies.<caller_origin>`) into
1046
+ a typed RPC surface with deadlines, queue-group fan-out, response
1047
+ streaming, and end-to-end cancellation.
1048
+
1049
+ The typed surface ships in the napi binding at
1050
+ `@net-mesh/core/mesh_rpc` (the SDK's `MeshNode` wraps a `NetMesh`
1051
+ that nRPC consumes directly):
1052
+
1053
+ ```typescript
1054
+ import { MeshNode } from '@net-mesh/sdk'
1055
+ import {
1056
+ classifyError,
1057
+ RpcCancelledError,
1058
+ RpcServerError,
1059
+ } from '@net-mesh/core/errors'
1060
+ import {
1061
+ appError,
1062
+ CircuitBreaker,
1063
+ HedgePolicy,
1064
+ NRPC_TYPED_BAD_REQUEST,
1065
+ RetryPolicy,
1066
+ TypedMeshRpc,
1067
+ } from '@net-mesh/core/mesh_rpc'
1068
+
1069
+ const server = await MeshNode.create({ bindAddr: '127.0.0.1:9001', psk })
1070
+ const client = await MeshNode.create({ bindAddr: '127.0.0.1:9000', psk })
1071
+ // (handshake omitted — see Mesh Streams example)
1072
+
1073
+ interface EchoSumRequest { text: string; numbers: number[] }
1074
+ interface EchoSumResponse { echo: string; sum: number }
1075
+
1076
+ // Server side: register a typed handler. Returned `serveHandle`
1077
+ // MUST be `close()`d to stop accepting new requests; in-flight
1078
+ // handlers complete (no abort).
1079
+ const serverRpc = TypedMeshRpc.fromMesh((server as any)._native)
1080
+ const serveHandle = serverRpc.serve<EchoSumRequest, EchoSumResponse>(
1081
+ 'echo_sum',
1082
+ async (req) => ({ echo: req.text, sum: req.numbers.reduce((a, b) => a + b, 0) }),
1083
+ )
1084
+
1085
+ // Client side: typed call with a 200ms deadline.
1086
+ const clientRpc = TypedMeshRpc.fromMesh((client as any)._native)
1087
+ try {
1088
+ const reply = await clientRpc.call<EchoSumRequest, EchoSumResponse>(
1089
+ server.nodeId(),
1090
+ 'echo_sum',
1091
+ { text: 'hi', numbers: [1, 2, 3] },
1092
+ { deadlineMs: 200 },
1093
+ )
1094
+ // reply.sum === 6
1095
+ } catch (e) {
1096
+ // Errors carry a stable `nrpc:` prefix; classifyError() routes
1097
+ // them to typed subclasses for instanceof checks.
1098
+ const typed = classifyError(e)
1099
+ if (typed instanceof RpcServerError && typed.status === NRPC_TYPED_BAD_REQUEST) {
1100
+ // handler bad-request
1101
+ }
1102
+ }
1103
+
1104
+ await serveHandle.close()
1105
+ ```
1106
+
1107
+ ### Streaming responses
1108
+
1109
+ ```typescript
1110
+ const stream = await clientRpc.callStreaming<MyReq, MyChunk>(
1111
+ targetNodeId, 'tail', { tail: 'events' },
1112
+ { deadlineMs: 5_000, streamWindowInitial: 8 }, // optional flow control
1113
+ )
1114
+ for await (const chunk of stream) {
1115
+ // chunk is decoded MyChunk
1116
+ }
1117
+ // stream.close() emits CANCEL to the server (best-effort);
1118
+ // in-flight chunks are silently discarded.
1119
+ // stream.grant(n) issues an explicit credit publish for batched
1120
+ // cadence (no-op on streams without flow control).
1121
+ // stream.flowControlled() reports whether streamWindowInitial was
1122
+ // set on the call — useful for code that conditionally grants.
1123
+ ```
1124
+
1125
+ ### Cancellation (`AbortSignal`)
1126
+
1127
+ `call` / `callService` accept an `AbortSignal` via `opts.signal`.
1128
+ The wrapper mints a cancel token, attaches a one-shot abort
1129
+ listener, and detaches it on settle so the same signal can be
1130
+ reused. Aborting publishes CANCEL to the server and rejects with
1131
+ `RpcCancelledError` (caller-driven; **not** retried by the
1132
+ default `RetryPolicy` predicate).
1133
+
1134
+ ```typescript
1135
+ const ac = new AbortController()
1136
+ setTimeout(() => ac.abort(), 100)
1137
+
1138
+ try {
1139
+ await clientRpc.call(targetNodeId, 'slow', {}, { signal: ac.signal })
1140
+ } catch (e) {
1141
+ if (classifyError(e) instanceof RpcCancelledError) {
1142
+ // CANCEL fired on the wire; server-side handler observes
1143
+ // its `ctx.cancellation` token.
1144
+ }
1145
+ }
1146
+ ```
1147
+
1148
+ Pre-aborted signals fail fast — the call rejects with
1149
+ `nrpc:cancelled:` before any tokio spawn / registry overhead.
1150
+
1151
+ ### Resilience helpers
1152
+
1153
+ Defaults mirror the Rust SDK (`mesh_rpc_resilience`): 3 attempts,
1154
+ 50ms→1s exponential backoff with full-half jitter, retryable
1155
+ predicate skips `RpcCodecError` / `RpcNoRouteError` /
1156
+ `RpcCancelledError` and non-transient `RpcServerError` statuses.
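
To make the backoff numbers concrete, here is a small sketch of the schedule. It assumes "full-half jitter" means each delay is drawn uniformly from the upper half of the exponential ceiling; that interpretation, and the names `BackoffConfig` and `backoffMs`, are assumptions for illustration rather than the SDK's definition.

```typescript
// Hypothetical backoff calculator; not RetryPolicy's actual implementation.
interface BackoffConfig {
  initialBackoffMs: number;
  maxBackoffMs: number;
  jitter: boolean;
}

// retryIndex 0 is the wait before the second attempt.
// `rand` is injectable so the schedule can be inspected deterministically.
function backoffMs(retryIndex: number, cfg: BackoffConfig, rand: () => number = Math.random): number {
  const ceiling = Math.min(cfg.maxBackoffMs, cfg.initialBackoffMs * Math.pow(2, retryIndex));
  if (!cfg.jitter) return ceiling;
  // Assumed jitter shape: uniform in [ceiling / 2, ceiling).
  return ceiling / 2 + rand() * (ceiling / 2);
}

// Un-jittered ceilings for the first six retries with 50 ms -> 1 s bounds.
const ceilings = [0, 1, 2, 3, 4, 5].map((i) =>
  backoffMs(i, { initialBackoffMs: 50, maxBackoffMs: 1000, jitter: false }),
);
```

Under these assumptions the ceilings double from 50 ms until they are clamped at `maxBackoffMs`.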
1157
+
1158
+ ```typescript
1159
+ // RetryPolicy. `jitter` is a boolean (full-half jitter on/off);
1160
+ // override `retryable` to gate which errors retry.
1161
+ const policy = new RetryPolicy({
1162
+ maxAttempts: 4,
1163
+ initialBackoffMs: 50,
1164
+ maxBackoffMs: 1000,
1165
+ jitter: true,
1166
+ })
1167
+ const reply = await clientRpc.callWithRetry(
1168
+ targetNodeId, 'echo', { hello: 'world' }, undefined /* opts */, policy,
1169
+ )
1170
+
1171
+ // HedgePolicy fans out parallel attempts on a delay; primary at
1172
+ // t=0, additional hedges at t=delayMs * idx. The first successful
1172
+ // reply wins; if every attempt fails, the primary's error surfaces
1173
+ // deterministically.
1175
+ const hedge = new HedgePolicy({ delayMs: 50, hedges: 2 }) // primary + 2 hedges
1176
+ await clientRpc.callWithHedgeTo(targetNodeIds, 'echo', { /*...*/ }, undefined, hedge)
1177
+
1178
+ // CircuitBreaker — closed → open → half-open with a configurable
1179
+ // failure predicate. Open breakers throw `BreakerOpenError` carrying
1180
+ // the `nrpc:breaker_open:` prefix.
1181
+ const breaker = new CircuitBreaker({ failureThreshold: 5, resetAfterMs: 1000 })
1182
+ await breaker.call(() => clientRpc.call(targetNodeId, 'echo', {}))
1183
+ ```
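
The closed, open, half-open cycle can be sketched as a small state machine. This is an illustration under assumptions, not the `CircuitBreaker` class itself: `SketchBreaker` is a hypothetical name, it wraps synchronous functions for brevity (the real `call` wraps promises), it takes an injected clock, and it counts every thrown error as a failure.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

// Hypothetical synchronous breaker for illustration only.
class SketchBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAtMs = 0;
  private failureThreshold: number;
  private resetAfterMs: number;
  private now: () => number;

  constructor(failureThreshold: number, resetAfterMs: number, now: () => number) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  // An open breaker becomes half-open once resetAfterMs has elapsed.
  currentState(): BreakerState {
    if (this.state === 'open' && this.now() - this.openedAtMs >= this.resetAfterMs) {
      this.state = 'half-open';
    }
    return this.state;
  }

  call<T>(fn: () => T): T {
    const state = this.currentState();
    if (state === 'open') {
      throw new Error('nrpc:breaker_open: sketch breaker is open');
    }
    try {
      const out = fn();
      // Any success closes the breaker and clears the failure count.
      this.state = 'closed';
      this.failures = 0;
      return out;
    } catch (e) {
      this.failures += 1;
      // A failed half-open probe, or reaching the threshold, (re)opens it.
      if (state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAtMs = this.now();
      }
      throw e;
    }
  }
}
```

Injecting the clock keeps the transitions easy to exercise without real timers.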
1184
+
1185
+ ### Typed handler bad-request
1186
+
1187
+ `appError(code, body)` builds an `Error` whose message follows the
1188
+ `nrpc:app_error:0x<code>:<body>` contract the napi binding parses
1189
+ into `RpcStatus::Application(code)`. Mirrors the Python binding's
1190
+ `RpcAppError`:
1191
+
1192
+ ```typescript
1193
+ serverRpc.serve<EchoSumRequest, EchoSumResponse>('echo_sum', (req) => {
1194
+ if (typeof req.text !== 'string') {
1195
+ throw appError(NRPC_TYPED_BAD_REQUEST, JSON.stringify({
1196
+ error: 'invalid_request',
1197
+ detail: 'text must be a string',
1198
+ }))
1199
+ }
1200
+ return { echo: req.text, sum: req.numbers.reduce((a, b) => a + b, 0) }
1201
+ })
1202
+ ```
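
The message contract can be illustrated with a formatter and a parser. This is a sketch, not the binding's code: `appErrorMessage` and `parseAppError` are hypothetical helpers, and zero-padding the code to four hex digits is an assumption based on the `0x8000`-style constants shown in this README.

```typescript
// Build a message in the `nrpc:app_error:0x<code>:<body>` shape.
function appErrorMessage(code: number, body: string): string {
  let hex = code.toString(16);
  while (hex.length < 4) hex = '0' + hex; // assumed width
  return 'nrpc:app_error:0x' + hex + ':' + body;
}

// Recover the code and body; returns null for messages in any other shape.
function parseAppError(message: string): { code: number; body: string } | null {
  const match = /^nrpc:app_error:0x([0-9a-fA-F]+):([\s\S]*)$/.exec(message);
  if (match === null) return null;
  return { code: parseInt(match[1], 16), body: match[2] };
}
```

Because `:` is not a hex digit, the code segment always ends at the first colon after `0x`, so bodies that contain colons (JSON, for example) survive the round trip.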
1203
+
1204
+ ### Errors
1205
+
1206
+ Caller-side failures throw a plain `Error` whose `.message`
1207
+ starts with the stable `nrpc:` prefix (the binding throws plain
1208
+ `Error` rather than typed classes to sidestep vitest's
1209
+ dual-module-instance hazard; `classifyError(e)` reconstructs the
1210
+ typed subclass at the catch site):
1211
+
1212
+ | Kind segment | Typed class | Retried by default? |
1213
+ | --------------- | --------------------- | ------------------- |
1214
+ | `no_route` | `RpcNoRouteError` | no |
1215
+ | `timeout` | `RpcTimeoutError` | yes |
1216
+ | `server_error` | `RpcServerError` | only `0x0003` / `0x0004` / `0x0006` |
1217
+ | `transport` | `RpcTransportError` | yes |
1218
+ | `codec_encode` | `RpcCodecError` | no (caller-fixable) |
1219
+ | `codec_decode` | `RpcCodecError` | no (caller-fixable) |
1220
+ | `cancelled` | `RpcCancelledError` | no (caller-driven) |
1221
+ | any other | `RpcError` (base) | yes (forward-compat fallback) |
1222
+
1223
+ `BreakerOpenError` is thrown directly by `CircuitBreaker.call`
1224
+ when the breaker is open — catch it via
1225
+ `instanceof BreakerOpenError` (imported from `@net-mesh/core/mesh_rpc`).
1226
+ It carries the `nrpc:breaker_open:` prefix for log filtering, but
1227
+ `classifyError` routes it through the base `RpcError` rather than
1228
+ its own subclass. Server-side `appError(code, body)` rejections
1229
+ arrive at the caller as `nrpc:server_error: status=0x<code>`, so
1230
+ they classify as `RpcServerError` with `err.status === code`
1231
+ (check against `NRPC_TYPED_BAD_REQUEST` etc.).
1232
+
1233
+ `classifyError` is duck-typed on `.message`: it accepts real
1234
+ `Error` instances, plain `{message: string}` objects, and string
1235
+ rejections — so top-level catch handlers reconstruct typed
1236
+ errors regardless of what the throw site emitted.
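
The duck-typing described above, together with the retry column of the table, can be approximated as follows. This is a sketch of the idea and not `classifyError` itself: `messageOf`, `nrpcKind`, `serverStatus`, and `retriedByDefault` are hypothetical helpers that return strings and booleans instead of typed error classes.

```typescript
// Accept Error instances, plain { message } objects, and string rejections.
function messageOf(e: unknown): string | null {
  if (typeof e === 'string') return e;
  if (typeof e === 'object' && e !== null) {
    const m = (e as { message?: unknown }).message;
    if (typeof m === 'string') return m;
  }
  return null;
}

// Extract the kind segment from an `nrpc:<kind>:` prefix.
function nrpcKind(e: unknown): string | null {
  const msg = messageOf(e);
  if (msg === null) return null;
  const match = /^nrpc:([a-z_]+):/.exec(msg);
  return match === null ? null : match[1];
}

// Pull the numeric status out of `nrpc:server_error: status=0x<code>`.
function serverStatus(e: unknown): number | null {
  const msg = messageOf(e);
  if (msg === null) return null;
  const match = /status=0x([0-9a-fA-F]+)/.exec(msg);
  return match === null ? null : parseInt(match[1], 16);
}

// Mirror of the "Retried by default?" column in the table above.
function retriedByDefault(kind: string, status: number | null): boolean {
  switch (kind) {
    case 'no_route':
    case 'codec_encode':
    case 'codec_decode':
    case 'cancelled':
      return false;
    case 'server_error':
      return status === 0x0003 || status === 0x0004 || status === 0x0006;
    default:
      return true; // timeout, transport, and unknown future kinds
  }
}
```

Keying on the message prefix rather than on `instanceof` is what lets the classification survive duplicated module instances.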
1237
+
1238
+ Two stable status constants exposed by `@net-mesh/core/mesh_rpc`:
1239
+
1240
+ | Constant | Hex | Meaning |
1241
+ | ------------------------------ | -------- | ------------------------------------------------ |
1242
+ | `NRPC_TYPED_BAD_REQUEST` | `0x8000` | Typed handler couldn't decode the request body. |
1243
+ | `NRPC_TYPED_HANDLER_ERROR` | `0x8001` | Typed handler ran but threw an exception. |
1244
+
1245
+ Cross-binding contract spec — including the canonical
1246
+ `cross_lang_echo_sum` service used by every binding's wire-format
1247
+ compat test — lives in [`../README.md#nrpc`](../README.md#nrpc).
1248
+
1249
+ ## MeshDB (federated query layer)
1250
+
1251
+ MeshDB is the typed query layer above the RedEX / CortEX /
1252
+ capability-index substrate. The native binding builds with
1253
+ `--features meshdb`; MeshDB classes import from `@net-mesh/core`.
1254
+ Architectural overview:
1255
+ [`../README.md#meshdb`](../README.md#meshdb).
1256
+
1257
+ ### Quick start
1258
+
1259
+ ```ts
1260
+ import {
1261
+ InMemoryChainReader,
1262
+ MeshQuery,
1263
+ MeshQueryRunner,
1264
+ } from '@net-mesh/core';
1265
+
1266
+ const reader = new InMemoryChainReader();
1267
+ reader.append(0xabn, 1n, Buffer.from('v1'));
1268
+ reader.append(0xabn, 2n, Buffer.from('v2'));
1269
+ reader.append(0xabn, 3n, Buffer.from('v3'));
1270
+
1271
+ const runner = new MeshQueryRunner(reader);
1272
+
1273
+ // Atomic operator — emits the tip row.
1274
+ const stream = await runner.execute(MeshQuery.latest(0xabn));
1275
+ const rows = await stream.toArray();
1276
+ console.log(rows[0].seq, Buffer.from(rows[0].payload).toString());
1277
+ // 3n "v3"
1278
+ ```
1279
+
1280
+ `runner.execute(query)` returns a `Promise<MeshQueryStream>`;
1281
+ `.toArray()` drains the stream eagerly, `.next()` pulls one row
1282
+ at a time, and the `@net-mesh/core/meshdb` re-export installs a
1283
+ `Symbol.asyncIterator` shim so `for await` works directly:
1284
+
1285
+ ```ts
1286
+ import '@net-mesh/core/meshdb'; // installs the async-iterator shim
1287
+ import { MeshQuery, MeshQueryRunner } from '@net-mesh/core';
1288
+
1289
+ const stream = await runner.execute(MeshQuery.between(0xabn, 1n, 10n));
1290
+ for await (const row of stream as unknown as AsyncIterable<{ seq: bigint }>) {
1291
+ console.log(row.seq);
1292
+ }
1293
+ ```
1294
+
1295
+ ### Operator surface
1296
+
1297
+ ```ts
1298
+ import {
1299
+ MeshQuery,
1300
+ predicateEquals,
1301
+ predicateAnd,
1302
+ predicateNumericAtLeast,
1303
+ } from '@net-mesh/core';
1304
+
1305
+ // Fluent builder (common-ops shortcut).
1306
+ const query = MeshQuery.builder()
1307
+ .between(0xabn, 1n, 100n)
1308
+ .filter(
1309
+ predicateAnd([
1310
+ predicateEquals('severity', 'high'),
1311
+ predicateNumericAtLeast('seq', 5),
1312
+ ]),
1313
+ )
1314
+ .count(['origin'])
1315
+ .build();
1316
+
1317
+ // Or compose static factories directly.
1318
+ const between = MeshQuery.between(0xabn, 1n, 100n);
1319
+ const filtered = MeshQuery.filter(between, predicateEquals('severity', 'high'));
1320
+ const grouped = MeshQuery.count(filtered, ['origin']);
1321
+ ```
1322
+
1323
+ | Family | Factories / builder methods |
1324
+ |---|---|
1325
+ | Atomic | `MeshQuery.at`, `MeshQuery.between`, `MeshQuery.latest`, `MeshQuery.lineageEmit` |
1326
+ | Composite | `MeshQuery.filter`, `MeshQuery.window`, `MeshQuery.count`, `MeshQuery.sum/avg/min/max/percentile`, `MeshQuery.distinctCount`, `MeshQuery.join` |
1327
+ | Fluent builder | `MeshQuery.builder().<at\|between\|latest>(...).<filter\|window\|count\|...>(...).build()` |
1328
+ | Predicate factories | `predicateExists`, `predicateEquals`, `predicateNumericAtLeast/AtMost/InRange`, `predicateStringPrefix/Matches`, `predicateSemverAtLeast`, `predicateAnd/Or/Not` |
1329
+
1330
+ Field paths target row-intrinsic names (`"origin"` / `"seq"`) or
1331
+ dotted JSON-payload paths (`"a.b.c"`).
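
As a rough picture of how such a field path might resolve, consider the sketch below. It is not the native predicate evaluator: `SketchRow` and `resolveField` are hypothetical, the row uses plain numbers and a JSON string where the real rows carry `bigint` and bytes, and giving the intrinsic names precedence over payload keys is an assumption.

```typescript
// Simplified row shape for illustration only.
interface SketchRow {
  origin: number;
  seq: number;
  payloadJson: string;
}

function resolveField(row: SketchRow, path: string): unknown {
  // Row-intrinsic names are answered from the row itself (assumed precedence).
  if (path === 'origin') return row.origin;
  if (path === 'seq') return row.seq;

  // Everything else walks the JSON payload one dotted segment at a time.
  let node: unknown;
  try {
    node = JSON.parse(row.payloadJson);
  } catch (e) {
    return undefined; // non-JSON payloads have no addressable fields
  }
  const parts = path.split('.');
  for (let i = 0; i < parts.length; i++) {
    if (typeof node !== 'object' || node === null) return undefined;
    node = (node as { [key: string]: unknown })[parts[i]];
  }
  return node;
}
```

A missing segment yields `undefined`, which is the natural input for an existence-style predicate.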
1332
+
1333
+ ### Sentinel row decoders
1334
+
1335
+ Atomic rows expose `.payload` directly as `Uint8Array`. Composite
1336
+ rows carry postcard-encoded sentinel envelopes — decode via the
1337
+ module-level helpers:
1338
+
1339
+ ```ts
1340
+ import { decodeAggregate, decodeJoined, decodeWindow } from '@net-mesh/core';
1341
+
1342
+ const [aggRow] = await (
1343
+ await runner.execute(MeshQuery.count(MeshQuery.between(0xabn, 1n, 4n)))
1344
+ ).toArray();
1345
+ const result = decodeAggregate(aggRow);
1346
+ // { group: null, kind: 'count', value: 3, count: 3n }
1347
+
1348
+ const [pair] = await (await runner.execute(joinQuery)).toArray();
1349
+ const joined = decodeJoined(pair);
1350
+ // { left: ResultRow|null, right: ResultRow|null }
1351
+
1352
+ const [bucket] = await (await runner.execute(windowQuery)).toArray();
1353
+ const window = decodeWindow(bucket);
1354
+ // { start: bigint, end: bigint, rows: ResultRow[] }
1355
+ ```
1356
+
1357
+ Each decoder returns `null` for non-sentinel rows (atomic
1358
+ operator output), so callers branch on "did this row deserialise?"
1359
+ without a separate type query.
1360
+
1361
+ ### Phase F result cache
1362
+
1363
+ Pass `enableCache: true` at runner construction; tune per-call via
1364
+ the optional second argument to `execute`:
1365
+
1366
+ ```ts
1367
+ import {
1368
+ cachePolicyPermanent,
1369
+ cachePolicyTimeBound,
1370
+ MeshQueryRunner,
1371
+ } from '@net-mesh/core';
1372
+
1373
+ const runner = new MeshQueryRunner(reader, /* enableCache */ true);
1374
+
1375
+ // Default — TimeBound TTL = 5 s (mirrors the join watermark).
1376
+ await runner.execute(query);
1377
+
1378
+ // Explicit per-call policy.
1379
+ await runner.execute(query, { cachePolicy: cachePolicyPermanent() });
1380
+ await runner.execute(query, { cachePolicy: cachePolicyTimeBound(30) });
1381
+ await runner.execute(query, { bypassCache: true });
1382
+ ```
1383
+
1384
+ `cachePolicyPermanent()` is safe only when the query result is
1385
+ immutable under substrate semantics.
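
The difference between the two policies can be sketched with a tiny keyed cache. This is an illustration under assumptions, not the runner's cache: `SketchPolicy` and `SketchResultCache` are hypothetical, keys are plain strings, time is passed in explicitly, and the 5-second default is taken from the comment in the example above.

```typescript
type SketchPolicy = { kind: 'permanent' } | { kind: 'time-bound'; ttlSecs: number };

interface SketchEntry<V> {
  value: V;
  storedAtSecs: number;
  policy: SketchPolicy;
}

// Hypothetical result cache keyed by a query fingerprint string.
class SketchResultCache<V> {
  private entries: { [key: string]: SketchEntry<V> } = Object.create(null);

  put(
    key: string,
    value: V,
    nowSecs: number,
    policy: SketchPolicy = { kind: 'time-bound', ttlSecs: 5 },
  ): void {
    this.entries[key] = { value: value, storedAtSecs: nowSecs, policy: policy };
  }

  get(key: string, nowSecs: number): V | undefined {
    const entry = this.entries[key];
    if (entry === undefined) return undefined;
    // Time-bound entries expire once their TTL has elapsed; permanent ones never do.
    if (entry.policy.kind === 'time-bound' && nowSecs - entry.storedAtSecs >= entry.policy.ttlSecs) {
      delete this.entries[key];
      return undefined;
    }
    return entry.value;
  }
}
```

A permanent entry is never re-validated, which is why that policy only suits results that cannot change, such as a closed historical range.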
1386
+
1387
+ ### Lineage emit
1388
+
1389
+ The SDK doesn't walk the `fork-of:` graph itself — callers supply
1390
+ pre-walked entries in walk order:
1391
+
1392
+ ```ts
1393
+ import { MeshQuery } from '@net-mesh/core';
1394
+
1395
+ const query = MeshQuery.lineageEmit(
1396
+ 0xaan,
1397
+ [
1398
+ { originHash: 0xaan, depth: 0, tipSeq: 5n },
1399
+ { originHash: 0xbbn, depth: 1, tipSeq: 3n },
1400
+ { originHash: 0xccn, depth: 2 }, // tipSeq omitted -> emits seq=0n
1401
+ ],
1402
+ 'back',
1403
+ );
1404
+ // Compose with .at / .between to fetch event bodies per chain.
1405
+ ```
1406
+
1407
+ ### Errors
1408
+
1409
+ Every factory and runner method throws a plain `Error` whose
1410
+ `.message` carries a stable kind prefix on failure (planner /
1411
+ executor / invalid argument). The native binding pre-validates
1412
+ the AST at construction time, so most errors surface at the
1413
+ factory call rather than at `execute`.
1414
+
1415
+ > **Note.** The `@net-mesh/sdk` wrapper doesn't yet re-export
1416
+ > the MeshDB surface — import directly from `@net-mesh/core` /
1417
+ > `@net-mesh/core/meshdb`.
1418
+
1419
+ ## Compute (daemons + migration)
1420
+
1421
+ Run `MeshDaemon`s directly from TypeScript. `DaemonRuntime` owns
1422
+ the factory table, per-daemon hosts, and the
1423
+ `Registering → Ready → ShuttingDown` lifecycle gate that decides
1424
+ when inbound migrations may land. Daemons are plain JS objects
1425
+ (or class instances) whose `process(event)` returns an array of
1426
+ output `Buffer`s — the runtime wraps each output in a causal link
1427
+ automatically.
1428
+
1429
+ Build the `@net-mesh/core` NAPI module with `--features compute`
1430
+ (auto-enabled in the default `local` bundle) to expose the
1431
+ surface; everything below is re-exported from `@net-mesh/sdk`.
1432
+ Full design notes:
1433
+ [`docs/SDK_COMPUTE_SURFACE_PLAN.md`](../docs/SDK_COMPUTE_SURFACE_PLAN.md).
1434
+
1435
+ ```typescript
1436
+ import {
1437
+ DaemonRuntime, DaemonError, Identity, MeshNode,
1438
+ type CausalEvent, type MeshDaemon,
1439
+ } from '@net-mesh/sdk';
1440
+
1441
+ // 1. Build a mesh + runtime.
1442
+ const mesh = await MeshNode.create({ bindAddr: '127.0.0.1:0', psk: '42'.repeat(32) });
1443
+ const rt = DaemonRuntime.create(mesh);
1444
+
1445
+ // 2. Register factories BEFORE flipping the runtime to Ready.
1446
+ rt.registerFactory('echo', (): MeshDaemon => ({
1447
+ name: 'echo',
1448
+ process: (event: CausalEvent) => [event.payload],
1449
+ // optional: snapshot() / restore(state) for migration-capable daemons
1450
+ }));
1451
+
1452
+ // 3. Ready the runtime — after this point spawns + migrations accept.
1453
+ await rt.start();
1454
+
1455
+ // 4. Spawn a daemon. `Identity` pins its ed25519 keypair so
1456
+ // `originHash` / `entityId` stay stable across migrations.
1457
+ const handle = await rt.spawn('echo', Identity.generate());
1458
+ console.log('origin =', handle.originHash.toString(16));
1459
+
1460
+ // 5. Inspect / stop when done.
1461
+ const stats = handle.stats(); // eventsProcessed / eventsEmitted / ...
1462
+ await rt.stop(handle.originHash);
1463
+ await rt.shutdown();
1464
+ ```
1465
+
1466
+ `MeshDaemon.process` is synchronous by contract — the NAPI TSFN
1467
+ bridge blocks the calling tokio task until it returns, so
1468
+ returning a `Promise` will break event dispatch. Stateful daemons
1469
+ opt into migration by adding `snapshot(): Buffer | null` and
1470
+ `restore(state: Buffer): void`.
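
A migration-capable daemon along these lines might look like the sketch below. It is self-contained and does not import the SDK: the `SketchEvent` and `SketchStatefulDaemon` interfaces are hypothetical local mirrors of the shapes described above, and `Uint8Array` stands in for `Buffer` (which is a `Uint8Array` subclass in Node).

```typescript
// Hypothetical local mirrors of the daemon shapes; not the SDK's types.
interface SketchEvent {
  payload: Uint8Array;
}

interface SketchStatefulDaemon {
  name: string;
  process(event: SketchEvent): Uint8Array[];
  snapshot(): Uint8Array | null;
  restore(state: Uint8Array): void;
}

// Big-endian u32 helpers for the daemon's four-byte state.
function encodeU32(n: number): Uint8Array {
  return new Uint8Array([(n >>> 24) & 255, (n >>> 16) & 255, (n >>> 8) & 255, n & 255]);
}

function decodeU32(bytes: Uint8Array): number {
  return ((bytes[0] << 24) | (bytes[1] << 16) | (bytes[2] << 8) | bytes[3]) >>> 0;
}

function makeCounterDaemon(): SketchStatefulDaemon {
  let count = 0;
  return {
    name: 'counter',
    // Synchronous by contract: returns output buffers directly, never a Promise.
    process: (_event) => {
      count += 1;
      return [encodeU32(count)];
    },
    // Capture everything needed to resume on another node.
    snapshot: () => encodeU32(count),
    // Rebuild the in-memory state from a previously captured snapshot.
    restore: (state) => {
      count = decodeU32(state);
    },
  };
}
```

Keeping the snapshot format small and explicit makes it easier to keep `snapshot()` and `restore()` in agreement as the daemon evolves.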
1471
+
1472
+ ### Migration
1473
+
1474
+ `startMigration(origin, sourceNode, targetNode)` orchestrates the
1475
+ six-phase cutover (`Snapshot → Transfer → Restore → Replay →
1476
+ Cutover → Complete`). The source seals the daemon's seed into the
1477
+ outbound snapshot using the target's X25519 static pubkey; the
1478
+ target's factory for the same `kind` rebuilds the daemon, replays
1479
+ any events that arrived during transfer, then activates.
1480
+
1481
+ ```typescript
1482
+ import { MigrationError } from '@net-mesh/sdk';
1483
+
1484
+ try {
1485
+ const mig = await rtA.startMigration(handle.originHash, nodeA, nodeB);
1486
+ console.log('phase =', mig.phase); // 'snapshot' | 'transfer' | ...
1487
+ await mig.wait(); // drive to completion
1488
+ } catch (e) {
1489
+ if (e instanceof MigrationError) {
1490
+ switch (e.kind) {
1491
+ case 'not-ready': break; // target not started yet
1492
+ case 'factory-not-found': break; // target missing `kind`
1493
+ case 'compute-not-supported': break; // target has no DaemonRuntime
1494
+ case 'state-failed': break; // snapshot / restore threw
1495
+ case 'identity-transport-failed': break; // seal / unseal failed
1496
+ // ... see MigrationErrorKind for the full set
1497
+ }
1498
+ }
1499
+ }
1500
+ ```
1501
+
1502
+ `startMigrationWith(origin, src, dst, { sealSeed, ... })` exposes
1503
+ the advanced knobs. On the target node, call
1504
+ `rt.registerMigrationTargetIdentity(identity)` before a migration
1505
+ lands — without it, the runtime rejects sealed-seed envelopes with
1506
+ `MigrationError.kind === 'identity-transport-failed'`.
+
+ ### Surface at a glance
+
+ | Method | Description |
+ |---|---|
+ | `DaemonRuntime.create(mesh)` | Construct a runtime against an existing `MeshNode` |
+ | `rt.registerFactory(kind, fn)` | Install a factory (must run before `start()`) |
+ | `rt.start()` / `rt.shutdown()` | Flip the lifecycle gate |
+ | `rt.spawn(kind, identity, cfg?)` | Spawn a local daemon |
+ | `rt.spawnFromSnapshot(kind, identity, bytes, cfg?)` | Rehydrate from a snapshot |
+ | `rt.stop(origin)` | Stop a local daemon |
+ | `rt.snapshot(origin)` | Capture a `Buffer` for persistence / migration |
+ | `rt.deliver(origin, event)` | Feed an event (returns output buffers) |
+ | `rt.startMigration(origin, src, dst)` | Orchestrate a live migration |
+ | `rt.registerMigrationTargetIdentity(id)` | Pin the unseal keypair on target nodes |
+ | `handle.originHash` / `entityId` / `stats()` | Per-daemon identity + observability |
+ | `DaemonError` / `MigrationError` | Typed catch classes (`instanceof` + `err.kind`) |
+
+ ## Groups (replica / fork / standby)
+
+ HA / scaling overlays on top of `DaemonRuntime`. Build the NAPI
+ crate with `--features groups` (implies `compute`) to expose
+ `ReplicaGroup`, `ForkGroup`, and `StandbyGroup`.
+
+ - **ReplicaGroup** — N interchangeable copies with deterministic
+   identity per index; load-balances inbound events across healthy
+   members; auto-replaces on node failure.
+ - **ForkGroup** — N independent daemons forked from a common parent
+   at `forkSeq`. Unique identities, shared ancestry via a verifiable
+   `ForkRecord`.
+ - **StandbyGroup** — active-passive replication. One member processes
+   events; standbys hold snapshots via `sync()`. Most-synced standby
+   promotes on active failure and replays buffered events.
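
To make the StandbyGroup promotion rule concrete, the sketch below picks the most-synced healthy standby from a list. It is an illustration only: `StandbyInfo`, its `syncedThrough` and `healthy` fields, and `pickPromotionCandidate` are hypothetical names that are not part of the SDK, and the real group performs this selection internally.

```typescript
// Hypothetical view of one standby member, for illustration only.
interface StandbyInfo {
  index: number;         // position within the group
  syncedThrough: number; // last sequence covered by the standby's snapshot
  healthy: boolean;      // whether the hosting node is reachable
}

// Returns the healthy standby with the highest `syncedThrough`, or
// undefined when no healthy standby exists. On a tie, the earliest
// entry in the list wins.
function pickPromotionCandidate(
  standbys: readonly StandbyInfo[],
): StandbyInfo | undefined {
  let best: StandbyInfo | undefined;
  for (const s of standbys) {
    if (!s.healthy) continue;
    if (best === undefined || s.syncedThrough > best.syncedThrough) {
      best = s;
    }
  }
  return best;
}
```

Choosing the standby with the most recent snapshot keeps the replay window short: only events buffered after `syncedThrough` need to be re-delivered on promotion.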
+
+ ```typescript
+ import {
+   DaemonRuntime, ForkGroup, GroupError, ReplicaGroup, StandbyGroup,
+ } from '@net-mesh/sdk';
+
+ // `mesh`, `CounterDaemon`, `event`, and `failedNodeId` are supplied by
+ // the surrounding application.
+ const rt = await DaemonRuntime.create(mesh);
+ rt.registerFactory('counter', () => new CounterDaemon());
+ await rt.start(); // group spawns fail with 'not-ready' until this runs
+
+ // ReplicaGroup — async because the factory round-trips through the
+ // Node main thread (TSFN).
+ const replicas = await ReplicaGroup.spawn(rt, 'counter', {
+   replicaCount: 3,
+   groupSeed: Buffer.alloc(32, 0x11),
+   lbStrategy: 'consistent-hash', // or 'round-robin' | 'least-load' | ...
+ });
+
+ const origin = replicas.routeEvent({ routingKey: 'user:42' });
+ await rt.deliver(origin, event);
+
+ await replicas.scaleTo(5); // grow
+ await replicas.onNodeFailure(failedNodeId); // respawn elsewhere
+
+ // ForkGroup
+ const forks = await ForkGroup.fork(rt, 'counter',
+   /* parentOrigin */ 0xabcdef01,
+   /* forkSeq */ 42n,
+   { forkCount: 3, lbStrategy: 'round-robin' });
+ console.log(forks.verifyLineage(), forks.forkRecords.length);
+
+ // StandbyGroup — manual event buffering for replay on promotion.
+ const hot = await StandbyGroup.spawn(rt, 'counter', {
+   memberCount: 3, // 1 active + 2 standbys
+   groupSeed: Buffer.alloc(32, 0x77),
+ });
+ await rt.deliver(hot.activeOrigin, event);
+ hot.onEventDelivered(event); // keep standbys' replay buffer accurate
+ await hot.sync(); // periodic catchup
+ // await hot.onNodeFailure(failedNodeId); // auto-promotes the most-synced standby
+ ```
+
+ ### Typed errors
+
+ Failures surface as `GroupError` (a subclass of `DaemonError`) with
+ a stable `kind` discriminator parsed from the Rust side's
+ `daemon: group: <kind>[: detail]` prefix:
+
+ ```typescript
+ import { GroupError, ReplicaGroup } from '@net-mesh/sdk';
+
+ // `rt` is a DaemonRuntime and `cfg` a replica-group config, as in the
+ // example above.
+ try {
+   await ReplicaGroup.spawn(rt, 'never-registered', cfg);
+ } catch (e) {
+   if (e instanceof GroupError) {
+     switch (e.kind) {
+       case 'not-ready': break; // runtime.start() hasn't run
+       case 'factory-not-found': break; // e.requestedKind tells you which
+       case 'no-healthy-member': break; // routeEvent on an all-down group
+       case 'invalid-config': break; // e.detail has the specifics
+       case 'placement-failed': break;
+       case 'registry-failed': break;
+     }
+   }
+ }
+ ```
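
For readers curious how a `kind` can be recovered from the `daemon: group: <kind>[: detail]` prefix, here is a minimal sketch of such a parser. It is not the SDK's implementation: `parseGroupErrorMessage` and `ParsedGroupError` are hypothetical names, and the sketch assumes the detail is separated from the kind by the first `': '` after the prefix.

```typescript
// Hypothetical result shape; the SDK exposes these as fields on GroupError.
interface ParsedGroupError {
  kind: string;
  detail?: string;
}

const GROUP_PREFIX = 'daemon: group: ';

// Splits a message of the form `daemon: group: <kind>[: detail]`.
// Returns undefined when the message does not carry the group prefix.
function parseGroupErrorMessage(message: string): ParsedGroupError | undefined {
  if (message.indexOf(GROUP_PREFIX) !== 0) return undefined;
  const rest = message.slice(GROUP_PREFIX.length);
  const sep = rest.indexOf(': ');
  if (sep === -1) return { kind: rest };
  return { kind: rest.slice(0, sep), detail: rest.slice(sep + 2) };
}
```

Application code should keep matching on `e.kind` rather than on message text; the sketch only shows why the prefix format makes a stable discriminator possible.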
+
+ Full staging, wire formats, and rationale:
+ [`docs/SDK_GROUPS_SURFACE_PLAN.md`](../docs/SDK_GROUPS_SURFACE_PLAN.md).
+ Core semantics (placement spread, health aggregation, failure
+ domains): [`../README.md#daemons`](../README.md#daemons).
+
+ ## API
+
+ | Method | Description |
+ |--------|-------------|
+ | `NetNode.create(config)` | Create a new node |
+ | `emit(obj)` | Emit a typed event |
+ | `emitRaw(json)` | Emit a JSON string |
+ | `emitBuffer(buf)` | Emit a Buffer (fastest) |
+ | `emitBatch(objs)` | Batch emit |
+ | `emitRawBatch(jsons)` | Batch emit strings |
+ | `fire(json)` | Fire-and-forget |
+ | `fireBatch(jsons)` | Fire-and-forget batch |
+ | `poll(request)` | One-shot poll |
+ | `pollOne()` | Poll a single event |
+ | `subscribe(opts)` | Async iterable stream |
+ | `subscribeTyped<T>(opts)` | Typed async iterable |
+ | `channel<T>(name)` | Create a typed channel |
+ | `stats()` | Ingestion statistics |
+ | `shards()` | Number of active shards |
+ | `flush()` | Flush pending batches |
+ | `shutdown()` | Graceful shutdown |
+ | `napi` | Access underlying NAPI binding |
+
+ ### CortEX surface
+
+ | Entry point | Description |
+ |---|---|
+ | `new Redex({ persistentDir? })` | Local event-log manager |
+ | `NetDb.open({ originHash, withTasks?, withMemories?, ... })` | Unified handle |
+ | `NetDb.openFromSnapshot(config, bundle)` | Restore from `db.snapshot()` bundle |
+ | `db.tasks` / `db.memories` | Typed adapter handles |
+ | `TasksAdapter.open(redex, origin, opts?)` | Standalone tasks adapter |
+ | `MemoriesAdapter.open(redex, origin, opts?)` | Standalone memories adapter |
+ | `adapter.create/rename/complete/delete/...` | Domain CRUD |
+ | `adapter.listTasks(filter?)` / `listMemories` | Sync snapshot query |
+ | `adapter.watch(filter?)` | `Promise<AsyncIterable<T[]>>` over deduplicated fold results |
+ | `adapter.snapshotAndWatch(filter?)` | `Promise<SnapshotAndWatch<T>>` — atomic paint+react |
+ | `adapter.snapshot()` / `openFromSnapshot` | Model-level persistence |
+ | `db.snapshot()` / `NetDb.openFromSnapshot` | Bundled multi-model persistence |
+ | `redex.openFile(name, config?)` | Raw RedEX file — append-only log |
+ | `file.append(buffer)` / `appendBatch(buffers)` | Append one / many payloads |
+ | `file.readRange(start, end)` | Range read over retained entries |
+ | `file.tail(fromSeq?)` | `AsyncIterable<RedexEvent>` |
+ | `file.sync()` / `file.close()` | Explicit fsync / close |
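
Because `adapter.watch(filter?)` resolves to an async iterable rather than being one, it needs an `await` before the `for await` loop. The sketch below shows that shape against a stand-in: `fakeWatch`, `TaskLike`, and `collectSizes` are hypothetical and only mimic the `Promise<AsyncIterable<T[]>>` return type from the table.

```typescript
// Hypothetical item type, for illustration only.
interface TaskLike {
  id: number;
  title: string;
}

// Stand-in with the same return shape as `adapter.watch()`: a promise
// that resolves to an async iterable of result arrays.
async function fakeWatch(): Promise<AsyncIterable<TaskLike[]>> {
  async function* results(): AsyncGenerator<TaskLike[]> {
    yield [{ id: 1, title: 'first' }];
    yield [{ id: 1, title: 'first' }, { id: 2, title: 'second' }];
  }
  return results();
}

// Records the size of every result set the watch emits.
async function collectSizes(
  watch: () => Promise<AsyncIterable<TaskLike[]>>,
): Promise<number[]> {
  const sizes: number[] = [];
  const stream = await watch();       // resolve the promise first
  for await (const tasks of stream) { // then iterate the results
    sizes.push(tasks.length);
  }
  return sizes;
}
```

With a real adapter the loop would typically run until the consumer breaks out of it, since a live watch has no natural end; the stand-in finishes after two result sets only so the example terminates.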
+
+ ## Cargo features
+
+ `@net-mesh/sdk` wraps `@net-mesh/core` (the napi-rs binding), so its reachable surface matches whatever Cargo features the underlying `.node` artifact was built with. The table below covers the five feature flags behind the CortEX, MeshDB, and MeshOS surfaces when building from source; `compute`, `groups`, `deck`, and `redis` appear in the feature table near the top of this README.
+
+ | Feature | What it enables on the underlying `@net-mesh/core` binding |
+ |---|---|
+ | `cortex` | `Redex`, `RedexFile`, `TasksAdapter`, `MemoriesAdapter`, `NetDb`, `Task`, `Memory`, watch iterators, `RedexError`, `CortexError`, `NetDbError` |
+ | `redex-disk` | Disk-backed RedEX persistence — the `persistentDir` ctor option and `persistent: true` on `openFile`. Without it the persistent path rejects with `RedexError`. |
+ | `netdb` | `NetDb` composition (requires `cortex`); the `net_netdb_*` FFI entry points ship with this feature. |
+ | `meshdb` | `MeshQuery`, `MeshQueryRunner`, `MeshQueryStream`, `QueryBuilder`, `InMemoryChainReader`, plus the `libnet_meshdb` cdylib. |
+ | `meshos` | `MeshOsDaemonSdk`, `MeshOsDaemonHandle`, plus the `libnet_meshos` cdylib. |
+
+ A `.node` artifact built without a feature silently omits its symbols — there is no build warning. The TypeScript wrapper destructures the napi exports lazily, so a missing feature surfaces as `undefined` at the import site rather than a load-time error.
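
Since a missing feature only shows up as `undefined`, a source-build consumer may want to fail early with a clearer message. The guard below is a minimal sketch of that idea: `requireExport` is a hypothetical helper, not an SDK function, and `binding` stands for whatever object the exports are read from.

```typescript
// Hypothetical guard: looks up `name` on a loaded binding object and
// throws a descriptive error when the export is missing.
function requireExport<T>(
  binding: Record<string, unknown>,
  name: string,
  feature: string,
): T {
  const value = binding[name];
  if (value === undefined) {
    throw new Error(
      `${name} is undefined; was @net-mesh/core built with --features ${feature}?`,
    );
  }
  // The cast is unchecked: the caller asserts the export's type.
  return value as T;
}
```

Calling the guard once at startup for each surface the application depends on turns a confusing `undefined is not a constructor` later on into an immediate, named failure.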
+
+ Enable features at build time by rebuilding the underlying `@net-mesh/core` artifact, then re-link or reinstall it in the consumer:
+
+ ```bash
+ cd net/crates/net/bindings/node
+ napi build --platform --release --features "cortex netdb redex-disk meshdb meshos"
+ # The repo's `npm run build` script already passes a full feature
+ # set; see `bindings/node/package.json` -> scripts.build for the
+ # canonical list of flags shipped to npm.
+ ```
+
+ Pre-built npm artifacts ship with every feature enabled; the flags above only matter for source builds.
+
+ ## License
+
+ Apache-2.0