@databricks/zerobus-ingest-sdk 0.0.1 → 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md ADDED
@@ -0,0 +1,1220 @@
1
+ # Databricks Zerobus Ingest SDK for TypeScript
2
+
3
+ [Public Preview](https://docs.databricks.com/release-notes/release-types.html): This SDK is supported for production use cases and is available to all customers. Databricks is actively working on stabilizing the Zerobus Ingest SDK for TypeScript. Minor version updates may include backwards-incompatible changes.
4
+
5
+ We are keen to hear feedback from you on this SDK. Please [file issues](https://github.com/databricks/zerobus-sdk-ts/issues), and we will address them.
6
+
7
+ The Databricks Zerobus Ingest SDK for TypeScript provides a high-performance client for ingesting data directly into Databricks Delta tables using the Zerobus streaming protocol. The SDK wraps the [Rust SDK](https://github.com/databricks/zerobus-sdk-rs) with native bindings for optimal performance. See also the SDKs for [Rust](https://github.com/databricks/zerobus-sdk-rs), [Python](https://github.com/databricks/zerobus-sdk-py), [Java](https://github.com/databricks/zerobus-sdk-java), and [Go](https://github.com/databricks/zerobus-sdk-go).
8
+
9
+ ## Table of Contents
10
+
11
+ - [Features](#features)
12
+ - [Requirements](#requirements)
13
+ - [Quick Start User Guide](#quick-start-user-guide)
14
+ - [Prerequisites](#prerequisites)
15
+ - [Installation](#installation)
16
+ - [Choose Your Serialization Format](#choose-your-serialization-format)
17
+ - [Option 1: Using JSON (Quick Start)](#option-1-using-json-quick-start)
18
+ - [Option 2: Using Protocol Buffers (Default, Recommended)](#option-2-using-protocol-buffers-default-recommended)
19
+ - [Usage Examples](#usage-examples)
20
+ - [Authentication](#authentication)
21
+ - [Configuration](#configuration)
22
+ - [Descriptor Utilities](#descriptor-utilities)
23
+ - [Error Handling](#error-handling)
24
+ - [API Reference](#api-reference)
25
+ - [Best Practices](#best-practices)
26
+ - [Platform Support](#platform-support)
27
+ - [Architecture](#architecture)
28
+ - [Contributing](#contributing)
29
+ - [Related Projects](#related-projects)
30
+
31
+ ## Features
32
+
33
+ - **High-throughput ingestion**: Optimized for high-volume data ingestion with native Rust implementation
34
+ - **Automatic recovery**: Built-in retry and recovery mechanisms for transient failures
35
+ - **Flexible configuration**: Customizable stream behavior and timeouts
36
+ - **Multiple serialization formats**: Support for JSON and Protocol Buffers
37
+ - **Type widening**: Accepts high-level types (plain objects, protobuf messages) or low-level types (strings, buffers); serialization is handled automatically
38
+ - **Batch ingestion**: Ingest multiple records with a single acknowledgment for higher throughput
39
+ - **OAuth 2.0 authentication**: Secure authentication with client credentials
40
+ - **TypeScript support**: Full type definitions for excellent IDE support
41
+ - **Cross-platform**: Supports Linux, macOS, and Windows
42
+
43
+ ## Requirements
44
+
45
+ ### Runtime Requirements
46
+
47
+ - **Node.js**: >= 16
48
+ - **Databricks workspace** with Zerobus access enabled
49
+
50
+ ### Build Requirements
51
+
52
+ - **Rust toolchain**: 1.70 or higher - [Install Rust](https://rustup.rs/)
53
+ - **Cargo**: Included with Rust
54
+
55
+ ### Dependencies
56
+
57
+ These will be installed automatically:
58
+
59
+ ```json
60
+ {
61
+ "@napi-rs/cli": "^2.18.4",
62
+ "napi-build": "^0.3.3"
63
+ }
64
+ ```
65
+
66
+ ## Quick Start User Guide
67
+
68
+ ### Prerequisites
69
+
70
+ Before using the SDK, you'll need the following:
71
+
72
+ #### 1. Workspace URL and Workspace ID
73
+
74
+ After logging into your Databricks workspace, look at the browser URL:
75
+
76
+ ```
77
+ https://<databricks-instance>.cloud.databricks.com/?o=<workspace-id>
78
+ ```
79
+
80
+ - **Workspace URL**: The part before `/?o=` → `https://<databricks-instance>.cloud.databricks.com`
81
+ - **Workspace ID**: The part after `?o=` → `<workspace-id>`
82
+ - **Zerobus Endpoint**: `https://<workspace-id>.zerobus.<region>.cloud.databricks.com`
83
+
84
+ > **Note:** The examples above show AWS endpoints (`.cloud.databricks.com`). For Azure deployments, the workspace URL will be `https://<databricks-instance>.azuredatabricks.net` and the Zerobus endpoint will use `.azuredatabricks.net`.
85
+
86
+ Example:
87
+ - Full URL: `https://dbc-a1b2c3d4-e5f6.cloud.databricks.com/?o=1234567890123456`
88
+ - Workspace URL: `https://dbc-a1b2c3d4-e5f6.cloud.databricks.com`
89
+ - Workspace ID: `1234567890123456`
90
+ - Zerobus Endpoint: `https://1234567890123456.zerobus.us-west-2.cloud.databricks.com`
91
+
92
+ #### 2. Create a Delta Table
93
+
94
+ Create a table using Databricks SQL:
95
+
96
+ ```sql
97
+ CREATE TABLE <catalog_name>.default.air_quality (
98
+ device_name STRING,
99
+ temp INT,
100
+ humidity BIGINT
101
+ )
102
+ USING DELTA;
103
+ ```
104
+
105
+ Replace `<catalog_name>` with your catalog name (e.g., `main`).
106
+
107
+ #### 3. Create a Service Principal
108
+
109
+ 1. Navigate to **Settings > Identity and Access** in your Databricks workspace
110
+ 2. Click **Service principals** and create a new service principal
111
+ 3. Generate a new secret for the service principal and save it securely
112
+ 4. Grant the following permissions:
113
+ - `USE_CATALOG` on the catalog (e.g., `main`)
114
+ - `USE_SCHEMA` on the schema (e.g., `default`)
115
+ - `MODIFY` and `SELECT` on the table (e.g., `air_quality`)
116
+
117
+ Grant permissions using SQL:
118
+
119
+ ```sql
120
+ -- Grant catalog permission
121
+ GRANT USE CATALOG ON CATALOG <catalog_name> TO `<service-principal-application-id>`;
122
+
123
+ -- Grant schema permission
124
+ GRANT USE SCHEMA ON SCHEMA <catalog_name>.default TO `<service-principal-application-id>`;
125
+
126
+ -- Grant table permissions
127
+ GRANT SELECT, MODIFY ON TABLE <catalog_name>.default.air_quality TO `<service-principal-application-id>`;
128
+ ```
129
+
130
+ ### Installation
131
+
132
+ #### Prerequisites
133
+
134
+ Before installing the SDK, ensure you have the required tools:
135
+
136
+ **1. Node.js >= 16**
137
+
138
+ Check if Node.js is installed:
139
+ ```bash
140
+ node --version
141
+ ```
142
+
143
+ If not installed, download from [nodejs.org](https://nodejs.org/).
144
+
145
+ **2. Rust Toolchain (1.70+)**
146
+
147
+ The SDK requires Rust to compile the native addon. Install using `rustup` (the official Rust installer):
148
+
149
+ **On Linux and macOS:**
150
+ ```bash
151
+ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
152
+ ```
153
+
154
+ Follow the prompts (typically just press Enter to accept defaults).
155
+
156
+ **On Windows:**
157
+
158
+ Download and run the installer from [rustup.rs](https://rustup.rs/), or use:
159
+ ```powershell
160
+ # Using winget
161
+ winget install Rustlang.Rustup
162
+
163
+ # Or download from https://rustup.rs/
164
+ ```
165
+
166
+ **Verify Installation:**
167
+ ```bash
168
+ rustc --version
169
+ cargo --version
170
+ ```
171
+
172
+ You should see version 1.70 or higher. If the commands aren't found, restart your terminal or add Rust to your PATH:
173
+ ```bash
174
+ # Linux/macOS
175
+ source $HOME/.cargo/env
176
+
177
+ # Windows (PowerShell)
178
+ # Restart your terminal
179
+ ```
180
+
181
+ **Additional Platform Requirements:**
182
+
183
+ - **Linux**: Build essentials
184
+ ```bash
185
+ # Ubuntu/Debian
186
+ sudo apt-get install build-essential
187
+
188
+ # CentOS/RHEL
189
+ sudo yum groupinstall "Development Tools"
190
+ ```
191
+
192
+ - **macOS**: Xcode Command Line Tools
193
+ ```bash
194
+ xcode-select --install
195
+ ```
196
+
197
+ - **Windows**: Visual Studio Build Tools
198
+ - Install [Visual Studio Build Tools](https://visualstudio.microsoft.com/downloads/#build-tools-for-visual-studio-2022)
199
+ - During installation, select "Desktop development with C++"
200
+
201
+ #### Installation Steps
202
+
203
+ **Note for macOS users**: Pre-built binaries are not available. The package will automatically build from source during `npm install`. Ensure you have the Rust toolchain and Xcode Command Line Tools installed (see prerequisites above).
204
+
205
+ 1. Extract the SDK package:
206
+ ```bash
207
+ unzip zerobus-sdk-ts.zip
208
+ cd zerobus-sdk-ts
209
+ ```
210
+
211
+ 2. Install dependencies:
212
+ ```bash
213
+ npm install
214
+ ```
215
+
216
+ 3. Build the native addon:
217
+ ```bash
218
+ npm run build
219
+ ```
220
+
221
+ This will compile the Rust code into a native Node.js addon (`.node` file) for your platform.
222
+
223
+ 4. Verify the build:
224
+ ```bash
225
+ # You should see a .node file
226
+ ls -la *.node
227
+ ```
228
+
229
+ 5. The SDK is now ready to use! You can:
230
+ - Use it directly in this directory for examples
231
+ - Link it globally: `npm link`
232
+ - Or copy it into your project's `node_modules`
233
+
234
+ **Troubleshooting:**
235
+
236
+ - **"rustc: command not found"**: Restart your terminal after installing Rust
237
+ - **Build fails on Windows**: Ensure Visual Studio Build Tools are installed with C++ support
238
+ - **Build fails on Linux**: Install build-essential or equivalent package
239
+ - **Permission errors**: Don't use `sudo` with npm/cargo commands
240
+
241
+ ### Choose Your Serialization Format
242
+
243
+ The SDK supports two serialization formats. **Protocol Buffers is the default** and recommended for production use:
244
+
245
+ - **Protocol Buffers (Default)** - Strongly-typed schemas, efficient binary encoding, better performance.
246
+ - **JSON** - Simple, no schema compilation needed. Good for getting started quickly or when schema flexibility is needed.
247
+
248
+ > **Note:** If you don't specify `recordType`, the SDK will use Protocol Buffers by default. To use JSON, explicitly set `recordType: RecordType.Json`.
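+
+ For illustration, the stream options for the two formats differ only in `recordType` (Protocol Buffers additionally requires `descriptorProto` in the table properties, shown in Option 2); the values below are arbitrary:
+
+ ```typescript
+ import { RecordType, StreamConfigurationOptions } from '@databricks/zerobus-ingest-sdk';
+
+ // Protocol Buffers (default): recordType may be omitted entirely
+ const protoOptions: StreamConfigurationOptions = { maxInflightRequests: 1000 };
+
+ // JSON: must be selected explicitly
+ const jsonOptions: StreamConfigurationOptions = { recordType: RecordType.Json, maxInflightRequests: 1000 };
+ ```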
249
+
250
+ ### Option 1: Using JSON (Quick Start)
251
+
252
+ JSON mode is the simplest way to get started. You don't need to define or compile protobuf schemas, but you must explicitly specify `RecordType.Json`.
253
+
254
+ ```typescript
255
+ import { ZerobusSdk, RecordType } from '@databricks/zerobus-ingest-sdk';
256
+
257
+ // Configuration
258
+ // For AWS:
259
+ const zerobusEndpoint = '<workspace-id>.zerobus.<region>.cloud.databricks.com';
260
+ const workspaceUrl = 'https://<workspace-name>.cloud.databricks.com';
261
+ // For Azure:
262
+ // const zerobusEndpoint = '<workspace-id>.zerobus.<region>.azuredatabricks.net';
263
+ // const workspaceUrl = 'https://<workspace-name>.azuredatabricks.net';
264
+
265
+ const tableName = 'main.default.air_quality';
266
+ const clientId = process.env.DATABRICKS_CLIENT_ID!;
267
+ const clientSecret = process.env.DATABRICKS_CLIENT_SECRET!;
268
+
269
+ // Initialize SDK
270
+ const sdk = new ZerobusSdk(zerobusEndpoint, workspaceUrl);
271
+
272
+ // Configure table properties (no descriptor needed for JSON)
273
+ const tableProperties = { tableName };
274
+
275
+ // Configure stream with JSON record type
276
+ const options = {
277
+ recordType: RecordType.Json, // JSON encoding
278
+ maxInflightRequests: 1000,
279
+ recovery: true
280
+ };
281
+
282
+ // Create stream
283
+ const stream = await sdk.createStream(
284
+ tableProperties,
285
+ clientId,
286
+ clientSecret,
287
+ options
288
+ );
289
+
290
+ try {
291
+ let lastAckPromise;
292
+
293
+ // Send all records
294
+ for (let i = 0; i < 100; i++) {
295
+ // Create JSON record
296
+ const record = {
297
+ device_name: `sensor-${i % 10}`,
298
+ temp: 20 + (i % 15),
299
+ humidity: 50 + (i % 40)
300
+ };
301
+
302
+ // JSON supports 2 types:
303
+ // 1. object (high-level) - SDK auto-stringifies
304
+ lastAckPromise = stream.ingestRecord(record);
305
+ // 2. string (low-level) - pre-serialized JSON
306
+ // lastAckPromise = stream.ingestRecord(JSON.stringify(record));
307
+ }
308
+
309
+ console.log('All records sent. Waiting for last acknowledgment...');
310
+
311
+ // Wait for the last record's acknowledgment
312
+ const lastOffset = await lastAckPromise;
313
+ console.log(`Last record offset: ${lastOffset}`);
314
+
315
+ // Flush to ensure all records are acknowledged
316
+ await stream.flush();
317
+ console.log('Successfully ingested 100 records!');
318
+ } finally {
319
+ // Always close the stream
320
+ await stream.close();
321
+ }
322
+ ```
323
+
324
+ ### Option 2: Using Protocol Buffers (Default, Recommended)
325
+
326
+ Protocol Buffers is the default serialization format and provides efficient binary encoding with schema validation. This is recommended for production use. This section covers the complete setup process.
327
+
328
+ #### Prerequisites
329
+
330
+ Before starting, ensure you have:
331
+
332
+ 1. **Protocol Buffer Compiler (`protoc`)** - Required for generating descriptor files
333
+ 2. **protobufjs** and **protobufjs-cli** - Already included in package.json devDependencies
334
+
335
+ #### Step 1: Install Protocol Buffer Compiler
336
+
337
+ **Linux:**
338
+
339
+ ```bash
340
+ # Ubuntu/Debian
341
+ sudo apt-get update && sudo apt-get install -y protobuf-compiler
342
+
343
+ # CentOS/RHEL
344
+ sudo yum install -y protobuf-compiler
345
+
346
+ # Alpine
347
+ apk add protobuf
348
+ ```
349
+
350
+ **macOS:**
351
+
352
+ ```bash
353
+ brew install protobuf
354
+ ```
355
+
356
+ **Windows:**
357
+
358
+ ```powershell
359
+ # Using Chocolatey
360
+ choco install protoc
361
+
362
+ # Or download from: https://github.com/protocolbuffers/protobuf/releases
363
+ ```
364
+
365
+ **Verify Installation:**
366
+
367
+ ```bash
368
+ protoc --version
369
+ # Should show: libprotoc 3.x.x or higher
370
+ ```
371
+
372
+ #### Step 2: Define Your Protocol Buffer Schema
373
+
374
+ The SDK includes an example schema at `schemas/air_quality.proto`:
375
+
376
+ ```protobuf
377
+ syntax = "proto2";
378
+
379
+ package examples;
380
+
381
+ // Example message representing air quality sensor data
382
+ message AirQuality {
383
+ optional string device_name = 1;
384
+ optional int32 temp = 2;
385
+ optional int64 humidity = 3;
386
+ }
387
+ ```
388
+
389
+ #### Step 3: Generate TypeScript Code
390
+
391
+ Generate TypeScript code from your proto schema:
392
+
393
+ ```bash
394
+ npm run build:proto
395
+ ```
396
+
397
+ This runs:
398
+ ```bash
399
+ pbjs -t static-module -w commonjs -o examples/generated/air_quality.js schemas/air_quality.proto
400
+ pbts -o examples/generated/air_quality.d.ts examples/generated/air_quality.js
401
+ ```
402
+
403
+ **Output:**
404
+ - `examples/generated/air_quality.js` - JavaScript protobuf code
405
+ - `examples/generated/air_quality.d.ts` - TypeScript type definitions
406
+
407
+ #### Step 4: Generate Descriptor File for Databricks
408
+
409
+ Databricks requires descriptor metadata about your protobuf schema.
410
+
411
+ **Generate Binary Descriptor:**
412
+
413
+ ```bash
414
+ protoc --descriptor_set_out=schemas/air_quality_descriptor.pb \
415
+ --include_imports \
416
+ schemas/air_quality.proto
417
+ ```
418
+
419
+ **Important flags:**
420
+ - `--descriptor_set_out` - Output path for the binary descriptor
421
+ - `--include_imports` - Include all imported proto files (required)
422
+
423
+ That's it! The SDK will automatically extract the message descriptor from this file.
424
+
425
+ #### Step 5: Use in Your Code
426
+
427
+ ```typescript
428
+ import { ZerobusSdk, RecordType } from '@databricks/zerobus-ingest-sdk';
429
+ import * as airQuality from './examples/generated/air_quality';
430
+ import { loadDescriptorProto } from '@databricks/zerobus-ingest-sdk/utils/descriptor';
431
+
432
+ // Configuration
433
+ const zerobusEndpoint = '<workspace-id>.zerobus.<region>.cloud.databricks.com';
434
+ const workspaceUrl = 'https://<workspace-name>.cloud.databricks.com';
435
+ const tableName = 'main.default.air_quality';
436
+ const clientId = process.env.DATABRICKS_CLIENT_ID!;
437
+ const clientSecret = process.env.DATABRICKS_CLIENT_SECRET!;
438
+
439
+ // Load and extract the descriptor for your specific message
440
+ const descriptorBase64 = loadDescriptorProto({
441
+ descriptorPath: 'schemas/air_quality_descriptor.pb',
442
+ protoFileName: 'air_quality.proto',
443
+ messageName: 'AirQuality'
444
+ });
445
+
446
+ // Initialize SDK
447
+ const sdk = new ZerobusSdk(zerobusEndpoint, workspaceUrl);
448
+
449
+ // Configure table properties with protobuf descriptor
450
+ const tableProperties = {
451
+ tableName,
452
+ descriptorProto: descriptorBase64 // Required for Protocol Buffers
453
+ };
454
+
455
+ // Configure stream with Protocol Buffers record type
456
+ const options = {
457
+ recordType: RecordType.Proto, // Protocol Buffers encoding
458
+ maxInflightRequests: 1000,
459
+ recovery: true
460
+ };
461
+
462
+ // Create stream
463
+ const stream = await sdk.createStream(tableProperties, clientId, clientSecret, options);
464
+
465
+ try {
466
+ const AirQuality = airQuality.examples.AirQuality;
467
+ let lastAckPromise;
468
+
469
+ // Send all records
470
+ for (let i = 0; i < 100; i++) {
471
+ const record = AirQuality.create({
472
+ device_name: `sensor-${i}`,
473
+ temp: 20 + i,
474
+ humidity: 50 + i
475
+ });
476
+
477
+ // Protobuf supports 2 types:
478
+ // 1. Message object (high-level) - SDK calls .encode().finish()
479
+ lastAckPromise = stream.ingestRecord(record);
480
+ // 2. Buffer (low-level) - pre-serialized bytes
481
+ // const buffer = Buffer.from(AirQuality.encode(record).finish());
482
+ // lastAckPromise = stream.ingestRecord(buffer);
483
+ }
484
+
485
+ console.log('All records sent. Waiting for last acknowledgment...');
486
+
487
+ // Wait for the last record's acknowledgment
488
+ const lastOffset = await lastAckPromise;
489
+ console.log(`Last record offset: ${lastOffset}`);
490
+
491
+ // Flush to ensure all records are acknowledged
492
+ await stream.flush();
493
+ console.log('Successfully ingested 100 records!');
494
+ } finally {
495
+ await stream.close();
496
+ }
497
+ ```
498
+
499
+ #### Type Mapping: Delta ↔ Protocol Buffers
500
+
501
+ When creating your proto schema, use these type mappings:
502
+
503
+ | Delta Type | Proto2 Type | Notes |
504
+ |-----------|-------------|-------|
505
+ | STRING, VARCHAR | string | |
506
+ | INT, SMALLINT, SHORT | int32 | |
507
+ | BIGINT, LONG | int64 | |
508
+ | FLOAT | float | |
509
+ | DOUBLE | double | |
510
+ | BOOLEAN | bool | |
511
+ | BINARY | bytes | |
512
+ | DATE | int32 | Days since epoch |
513
+ | TIMESTAMP | int64 | Microseconds since epoch |
514
+ | ARRAY\<type\> | repeated type | Use repeated field |
515
+ | MAP\<key, value\> | map\<key, value\> | Use map field |
516
+ | STRUCT\<fields\> | message | Define nested message |
517
+
518
+ **Example: Complex Schema**
519
+
520
+ ```protobuf
521
+ syntax = "proto2";
522
+
523
+ package examples;
524
+
525
+ message ComplexRecord {
526
+ optional string id = 1;
527
+ optional int64 timestamp = 2;
528
+ repeated string tags = 3; // ARRAY<STRING>
529
+ map<string, int32> metrics = 4; // MAP<STRING, INT>
530
+ optional NestedData nested = 5; // STRUCT
531
+ }
532
+
533
+ message NestedData {
534
+ optional string field1 = 1;
535
+ optional double field2 = 2;
536
+ }
537
+ ```
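+
+ As a sketch of how such a schema maps onto record creation, the snippet below populates `ComplexRecord`. It assumes TypeScript code was generated with `pbjs` as in Step 3, that the generated module lives at the path shown, and that `stream` was created with the matching descriptor:
+
+ ```typescript
+ // Module path assumed; adjust to wherever your generated code lives
+ import * as complexSchema from './examples/generated/complex_record';
+
+ const ComplexRecord = complexSchema.examples.ComplexRecord;
+
+ const record = ComplexRecord.create({
+   id: 'evt-001',
+   timestamp: Date.now() * 1000,                  // TIMESTAMP: microseconds since epoch
+   tags: ['prod', 'eu-west'],                     // ARRAY<STRING> -> repeated string
+   metrics: { pm25: 12, pm10: 30 },               // MAP<STRING, INT> -> map<string, int32>
+   nested: { field1: 'sensor-7', field2: 0.87 }   // STRUCT -> nested message
+ });
+
+ await stream.ingestRecord(record); // the SDK serializes the message before sending
+ ```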
538
+
539
+ #### Using Your Own Schema
540
+
541
+ 1. **Create your proto file:**
542
+ ```bash
543
+ cat > schemas/my_schema.proto << 'EOF'
544
+ syntax = "proto2";
545
+
546
+ package my_schema;
547
+
548
+ message MyMessage {
549
+ optional string field1 = 1;
550
+ optional int32 field2 = 2;
551
+ }
552
+ EOF
553
+ ```
554
+
555
+ 2. **Add build script to package.json:**
556
+ ```json
557
+ {
558
+ "scripts": {
559
+ "build:proto:myschema": "pbjs -t static-module -w commonjs -o examples/generated/my_schema.js schemas/my_schema.proto && pbts -o examples/generated/my_schema.d.ts examples/generated/my_schema.js"
560
+ }
561
+ }
562
+ ```
563
+
564
+ 3. **Generate code and descriptor:**
565
+ ```bash
566
+ npm run build:proto:myschema
567
+ protoc --descriptor_set_out=schemas/my_schema_descriptor.pb --include_imports schemas/my_schema.proto
568
+ ```
569
+
570
+ 4. **Load descriptor in your code:**
571
+ ```typescript
572
+ import { loadDescriptorProto } from '@databricks/zerobus-ingest-sdk/utils/descriptor';
573
+ const descriptorBase64 = loadDescriptorProto({
574
+ descriptorPath: 'schemas/my_schema_descriptor.pb',
575
+ protoFileName: 'my_schema.proto',
576
+ messageName: 'MyMessage'
577
+ });
578
+ ```
579
+
580
+ #### Troubleshooting Protocol Buffers
581
+
582
+ **"protoc: command not found"**
583
+ - Install `protoc` (see Step 1 above)
584
+
585
+ **"Cannot find module './generated/air_quality'"**
586
+ - Run `npm run build:proto` to generate TypeScript code
587
+
588
+ **"Descriptor file not found"**
589
+ - Generate the descriptor file using the commands in Step 4
590
+
591
+ **"Invalid descriptor"**
592
+ - Ensure you used `--include_imports` flag when generating the descriptor
593
+ - Verify the `.pb` file was created: `ls -lh schemas/*.pb`
594
+ - Check that `protoFileName` and `messageName` match your proto file
595
+ - Make sure you're using `loadDescriptorProto()` from the utils
596
+
597
+ **Build fails on proto generation**
598
+ - Ensure protobufjs is installed: `npm install --save-dev protobufjs protobufjs-cli`
599
+
600
+ #### Quick Reference
601
+
602
+ Complete setup from scratch:
603
+ ```bash
604
+ # Install dependencies and build SDK
605
+ npm install
606
+ npm run build
607
+
608
+ # Setup Protocol Buffers
609
+ npm run build:proto
610
+ protoc --descriptor_set_out=schemas/air_quality_descriptor.pb --include_imports schemas/air_quality.proto
611
+
612
+ # Run example
613
+ npx tsx examples/proto.ts
614
+ ```
615
+
616
+ #### Why Two Steps (TypeScript + Descriptor)?
617
+
618
+ 1. **TypeScript Code Generation** (`npm run build:proto`):
619
+ - Creates JavaScript/TypeScript code for your application
620
+ - Provides type-safe message creation and encoding
621
+ - Used in your application code
622
+
623
+ 2. **Descriptor File Generation** (`protoc --descriptor_set_out`):
624
+ - Creates metadata about your schema for Databricks
625
+ - Required by Zerobus service for schema validation
626
+ - Uploaded as base64 string when creating a stream
627
+
628
+ Both are necessary for Protocol Buffers ingestion!
629
+
630
+ ## Usage Examples
631
+
632
+ See the `examples/` directory for complete, runnable examples. See [examples/README.md](examples/README.md) for detailed instructions.
633
+
634
+ ### Running Examples
635
+
636
+ ```bash
637
+ # Set environment variables
638
+ export ZEROBUS_SERVER_ENDPOINT="<workspace-id>.zerobus.<region>.cloud.databricks.com"
639
+ export DATABRICKS_WORKSPACE_URL="https://<workspace-name>.cloud.databricks.com"
640
+ export DATABRICKS_CLIENT_ID="your-client-id"
641
+ export DATABRICKS_CLIENT_SECRET="your-client-secret"
642
+ export ZEROBUS_TABLE_NAME="main.default.air_quality"
643
+
644
+ # Run JSON example
645
+ npx tsx examples/json.ts
646
+
647
+ # For Protocol Buffers, generate TypeScript code and descriptor
648
+ npm run build:proto
649
+ protoc --descriptor_set_out=schemas/air_quality_descriptor.pb --include_imports schemas/air_quality.proto
650
+
651
+ # Run Protocol Buffers example
652
+ npx tsx examples/proto.ts
653
+ ```
654
+
655
+ ### Batch Ingestion
656
+
657
+ For higher throughput, use batch ingestion to send multiple records with a single acknowledgment:
658
+
659
+ #### Protocol Buffers
660
+
661
+ ```typescript
662
+ const records = Array.from({ length: 1000 }, (_, i) =>
663
+ AirQuality.create({ device_name: `sensor-${i}`, temp: 20 + i, humidity: 50 + i })
664
+ );
665
+
666
+ // Protobuf Type 1: Message objects (high-level) - SDK auto-serializes
667
+ const offsetId = await stream.ingestRecords(records);
668
+
669
+ // Protobuf Type 2: Buffers (low-level) - pre-serialized bytes
670
+ // const buffers = records.map(r => Buffer.from(AirQuality.encode(r).finish()));
671
+ // const offsetId = await stream.ingestRecords(buffers);
672
+
673
+ if (offsetId !== null) {
674
+ console.log(`Batch acknowledged at offset ${offsetId}`);
675
+ }
676
+ ```
677
+
678
+ #### JSON
679
+
680
+ ```typescript
681
+ const records = Array.from({ length: 1000 }, (_, i) => ({
682
+ device_name: `sensor-${i}`,
683
+ temp: 20 + i,
684
+ humidity: 50 + i
685
+ }));
686
+
687
+ // JSON Type 1: objects (high-level) - SDK auto-stringifies
688
+ const offsetId = await stream.ingestRecords(records);
689
+
690
+ // JSON Type 2: strings (low-level) - pre-serialized JSON
691
+ // const jsonRecords = records.map(r => JSON.stringify(r));
692
+ // const offsetId = await stream.ingestRecords(jsonRecords);
693
+ ```
694
+
695
+ **Type Widening Support:**
696
+ - JSON mode: Accept `object[]` (auto-stringify) or `string[]` (pre-stringified)
697
+ - Proto mode: Accept protobuf messages with `.encode()` method (auto-serialize) or `Buffer[]` (pre-serialized)
698
+ - Mixed types are supported in the same batch (see the sketch below)
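+
+ For example, a single JSON-mode batch can mix both accepted payload types (a sketch reusing the air quality schema from the quick start):
+
+ ```typescript
+ // One batch, two payload styles: a plain object and a pre-serialized string
+ await stream.ingestRecords([
+   { device_name: 'sensor-1', temp: 21, humidity: 55 },                 // object, auto-stringified
+   JSON.stringify({ device_name: 'sensor-2', temp: 22, humidity: 58 })  // string, already serialized
+ ]);
+ ```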
699
+
700
+ **Best Practices**:
701
+ - Batch size: 100-1,000 records for optimal throughput/latency balance
702
+ - Empty batches return `null` (no error, no offset)
703
+ - Use `recreateStream()` for recovery - it automatically handles unacknowledged batches
704
+
705
+ **Examples:**
706
+ Both the `json.ts` and `proto.ts` examples demonstrate batch ingestion.
707
+
708
+ ## Authentication
709
+
710
+ The SDK uses OAuth 2.0 Client Credentials for authentication:
711
+
712
+ ```typescript
713
+ import { ZerobusSdk } from '@databricks/zerobus-ingest-sdk';
714
+
715
+ const sdk = new ZerobusSdk(zerobusEndpoint, workspaceUrl);
716
+
717
+ // Create stream with OAuth authentication
718
+ const stream = await sdk.createStream(
719
+ tableProperties,
720
+ clientId,
721
+ clientSecret,
722
+ options
723
+ );
724
+ ```
725
+
726
+ The SDK automatically fetches access tokens and includes these headers:
727
+ - `"authorization": "Bearer <oauth_token>"` - Obtained via OAuth 2.0 Client Credentials flow
728
+ - `"x-databricks-zerobus-table-name": "<table_name>"` - The fully qualified table name
729
+
730
+ ### Custom Authentication
731
+
732
+ Beyond OAuth, you can use custom headers for Personal Access Tokens (PAT) or other auth methods:
733
+
734
+ ```typescript
735
+ import { ZerobusSdk } from '@databricks/zerobus-ingest-sdk';
736
+ import { HeadersProvider } from '@databricks/zerobus-ingest-sdk/src/headers_provider';
737
+
+ // Example values; use whatever token source and table your deployment requires
+ const myToken = process.env.DATABRICKS_TOKEN!;   // e.g., a personal access token
+ const tableName = 'main.default.air_quality';
+
738
+ class CustomHeadersProvider implements HeadersProvider {
739
+ async getHeaders(): Promise<Array<[string, string]>> {
740
+ return [
741
+ ["authorization", `Bearer ${myToken}`],
742
+ ["x-databricks-zerobus-table-name", tableName]
743
+ ];
744
+ }
745
+ }
746
+
747
+ const headersProvider = new CustomHeadersProvider();
748
+ const stream = await sdk.createStream(
749
+ tableProperties,
750
+ '', // client_id (ignored when headers_provider is provided)
751
+ '', // client_secret (ignored when headers_provider is provided)
752
+ options,
753
+ { getHeadersCallback: headersProvider.getHeaders.bind(headersProvider) }
754
+ );
755
+ ```
756
+
757
+ **Note:** Custom authentication is integrated into the main `createStream()` method. See the API Reference for details.
758
+
759
+ ## Configuration
760
+
761
+ ### Stream Configuration Options
762
+
763
+ | Option | Default | Description |
764
+ |--------|---------|-------------|
765
+ | `recordType` | `RecordType.Proto` | Serialization format: `RecordType.Json` or `RecordType.Proto` |
766
+ | `maxInflightRequests` | 10,000 | Maximum number of unacknowledged requests |
767
+ | `recovery` | true | Enable automatic stream recovery |
768
+ | `recoveryTimeoutMs` | 15,000 | Timeout for recovery operations (ms) |
769
+ | `recoveryBackoffMs` | 2,000 | Delay between recovery attempts (ms) |
770
+ | `recoveryRetries` | 4 | Maximum number of recovery attempts |
771
+ | `flushTimeoutMs` | 300,000 | Timeout for flush operations (ms) |
772
+ | `serverLackOfAckTimeoutMs` | 60,000 | Server acknowledgment timeout (ms) |
773
+
774
+ ### Example Configuration
775
+
776
+ ```typescript
777
+ import { StreamConfigurationOptions, RecordType } from '@databricks/zerobus-ingest-sdk';
778
+
779
+ const options: StreamConfigurationOptions = {
780
+ recordType: RecordType.Json, // JSON encoding
781
+ maxInflightRequests: 10000,
782
+ recovery: true,
783
+ recoveryTimeoutMs: 20000,
784
+ recoveryBackoffMs: 2000,
785
+ recoveryRetries: 4
786
+ };
787
+
788
+ const stream = await sdk.createStream(
789
+ tableProperties,
790
+ clientId,
791
+ clientSecret,
792
+ options
793
+ );
794
+ ```
795
+
796
+ ## Descriptor Utilities
797
+
798
+ The SDK provides a helper function to extract Protocol Buffer descriptors from FileDescriptorSets.
799
+
800
+ ### loadDescriptorProto()
801
+
802
+ Extracts a specific message descriptor from a FileDescriptorSet:
803
+
804
+ ```typescript
805
+ import { loadDescriptorProto } from '@databricks/zerobus-ingest-sdk/utils/descriptor';
806
+
807
+ const descriptorBase64 = loadDescriptorProto({
808
+ descriptorPath: 'schemas/my_schema_descriptor.pb',
809
+ protoFileName: 'my_schema.proto', // Name of your .proto file
810
+ messageName: 'MyMessage' // The specific message to use
811
+ });
812
+ ```
813
+
814
+ **Parameters:**
815
+ - `descriptorPath`: Path to the `.pb` file generated by `protoc --descriptor_set_out`
816
+ - `protoFileName`: Name of the proto file (e.g., `"air_quality.proto"`)
817
+ - `messageName`: Name of the message type to extract (e.g., `"AirQuality"`)
818
+
819
+ **Why use this utility?**
820
+ - Extracts the specific message descriptor you need
821
+ - No manual base64 conversion required
822
+ - Clear error messages if the file or message isn't found
823
+ - Flexible for complex schemas with multiple messages or imports
824
+
825
+ **Example with multiple messages:**
826
+ ```typescript
827
+ // Your proto file has: Order, OrderItem, Customer
828
+ // You want to ingest Orders:
829
+ const descriptorBase64 = loadDescriptorProto({
830
+ descriptorPath: 'schemas/orders_descriptor.pb',
831
+ protoFileName: 'orders.proto',
832
+ messageName: 'Order' // Explicitly choose Order
833
+ });
834
+ ```
835
+
836
+ ## Error Handling
837
+
838
+ The SDK includes automatic recovery for transient failures (enabled by default with `recovery: true`). For permanent failures, use `recreateStream()` to automatically recover all unacknowledged batches. Always use try/finally blocks to ensure streams are properly closed:
839
+
840
+ ```typescript
841
+ try {
842
+ const offset = await stream.ingestRecord(JSON.stringify(record));
843
+ console.log(`Success: offset ${offset}`);
844
+ } catch (error) {
845
+ console.error('Ingestion failed:', error);
846
+
847
+ // When stream fails, close it first
848
+ await stream.close();
849
+ console.log('Stream closed after error');
850
+
851
+ // Optional: Inspect what needs recovery (must be called on closed stream)
852
+ const unackedBatches = await stream.getUnackedBatches();
853
+ console.log(`Batches to recover: ${unackedBatches.length}`);
854
+
855
+ // Recommended recovery approach: Use recreateStream()
856
+ // This method:
857
+ // 1. Gets all unacknowledged batches from the failed stream
858
+ // 2. Creates a new stream with the same configuration
859
+ // 3. Re-ingests all unacknowledged batches automatically
860
+ // 4. Returns the new stream ready for continued use
861
+ const newStream = await sdk.recreateStream(stream);
862
+ console.log(`Stream recreated with ${unackedBatches.length} batches re-ingested`);
863
+
864
+ // Continue using newStream for further ingestion
865
+ try {
866
+ // Continue ingesting...
867
+ } finally {
868
+ await newStream.close();
869
+ }
870
+ }
871
+ ```
872
+
873
+ **Best Practices:**
874
+ - **Rely on automatic recovery** (default): The SDK will automatically retry transient failures
875
+ - **Use `recreateStream()` for permanent failures**: Automatically recovers all unacknowledged batches
876
+ - **Use `getUnackedRecords()` for inspection only**: Primarily for debugging or understanding failed records
877
+ - Always close streams in a `finally` block to ensure proper cleanup
878
+
879
+ ## API Reference
880
+
881
+ ### ZerobusSdk
882
+
883
+ Main entry point for the SDK.
884
+
885
+ **Constructor:**
886
+
887
+ ```typescript
888
+ new ZerobusSdk(zerobusEndpoint: string, unityCatalogUrl: string)
889
+ ```
890
+
891
+ **Parameters:**
892
+ - `zerobusEndpoint` (string) - The Zerobus gRPC endpoint (e.g., `<workspace-id>.zerobus.<region>.cloud.databricks.com` for AWS, or `<workspace-id>.zerobus.<region>.azuredatabricks.net` for Azure)
893
+ - `unityCatalogUrl` (string) - The Unity Catalog endpoint (your workspace URL)
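+
+ A minimal instantiation using the sample values from the Prerequisites section (substitute your own endpoint and workspace URL):
+
+ ```typescript
+ import { ZerobusSdk } from '@databricks/zerobus-ingest-sdk';
+
+ // AWS endpoints shown; Azure deployments use .azuredatabricks.net instead
+ const sdk = new ZerobusSdk(
+   '1234567890123456.zerobus.us-west-2.cloud.databricks.com',
+   'https://dbc-a1b2c3d4-e5f6.cloud.databricks.com'
+ );
+ ```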
894
+
895
+ **Methods:**
896
+
897
+ ```typescript
898
+ async createStream(
899
+ tableProperties: TableProperties,
900
+ clientId: string,
901
+ clientSecret: string,
902
+ options?: StreamConfigurationOptions
903
+ ): Promise<ZerobusStream>
904
+ ```
905
+
906
+ Creates a new ingestion stream using OAuth 2.0 Client Credentials authentication.
907
+
908
+ Automatically includes these headers:
909
+ - `"authorization": "Bearer <oauth_token>"` (fetched via OAuth 2.0 Client Credentials flow)
910
+ - `"x-databricks-zerobus-table-name": "<table_name>"`
911
+
912
+ Returns a `ZerobusStream` instance.
913
+
914
+ ---
915
+
916
+ ```typescript
917
+ async recreateStream(stream: ZerobusStream): Promise<ZerobusStream>
918
+ ```
919
+
920
+ Recreates a stream with the same configuration and automatically re-ingests all unacknowledged batches.
921
+
922
+ This method is the **recommended approach** for recovering from stream failures. It:
923
+ 1. Retrieves all unacknowledged batches from the failed stream
924
+ 2. Creates a new stream with identical configuration (same table, auth, options)
925
+ 3. Re-ingests all unacknowledged batches in their original order
926
+ 4. Returns the new stream ready for continued ingestion
927
+
928
+ **Parameters:**
929
+ - `stream` - The failed or closed stream to recreate
930
+
931
+ **Returns:** Promise resolving to a new `ZerobusStream` with all unacknowledged batches re-ingested
932
+
933
+ **Example:**
934
+ ```typescript
935
+ try {
936
+ await stream.ingestRecords(batch);
937
+ } catch (error) {
938
+ await stream.close();
939
+ // Automatically recreate stream and recover all unacked batches
940
+ const newStream = await sdk.recreateStream(stream);
941
+ // Continue ingesting with newStream
942
+ }
943
+ ```
944
+
945
+ **Note:** This method preserves batch structure and re-ingests batches atomically. For debugging, you can inspect what was recovered using `getUnackedBatches()` after closing the stream.
946
+
947
+ ---
948
+
949
+ ### ZerobusStream
950
+
951
+ Represents an active ingestion stream.
952
+
953
+ **Methods:**
954
+
955
+ ```typescript
956
+ async ingestRecord(payload: Buffer | string | object): Promise<bigint>
957
+ ```
958
+
959
+ Ingests a single record. This method **blocks** until the record is sent to the SDK's internal landing zone, then returns a Promise for the server acknowledgment. This allows you to send many records without waiting for individual acknowledgments.
960
+
961
+ **Parameters:**
962
+ - `payload` - Record data. The SDK supports 4 input types for flexibility:
963
+ - **JSON Mode** (`RecordType.Json`):
964
+ - **Type 1 - object** (high-level): Plain JavaScript object - SDK auto-stringifies with `JSON.stringify()`
965
+ - **Type 2 - string** (low-level): Pre-serialized JSON string
966
+ - **Protocol Buffers Mode** (`RecordType.Proto`):
967
+ - **Type 3 - Message** (high-level): Protobuf message object - SDK calls `.encode().finish()` automatically
968
+ - **Type 4 - Buffer** (low-level): Pre-serialized protobuf bytes
969
+
970
+ **All 4 Type Examples:**
971
+ ```typescript
972
+ // JSON Type 1: object (high-level) - SDK auto-stringifies
973
+ await stream.ingestRecord({ device: 'sensor-1', temp: 25 });
974
+
975
+ // JSON Type 2: string (low-level) - pre-serialized
976
+ await stream.ingestRecord(JSON.stringify({ device: 'sensor-1', temp: 25 }));
977
+
978
+ // Protobuf Type 3: Message object (high-level) - SDK auto-serializes
979
+ const message = MyMessage.create({ device: 'sensor-1', temp: 25 });
980
+ await stream.ingestRecord(message);
981
+
982
+ // Protobuf Type 4: Buffer (low-level) - pre-serialized bytes
983
+ const buffer = Buffer.from(MyMessage.encode(message).finish());
984
+ await stream.ingestRecord(buffer);
985
+ ```
986
+
987
+ **Note:** The SDK automatically detects protobufjs message objects by checking if the constructor has a static `.encode()` method. This works seamlessly with messages created via `MyMessage.create()` or `new MyMessage()`.
988
+
989
+ **Returns:** Promise resolving to the offset ID when the server acknowledges the record
990
+
991
+ ---
992
+
993
+ ```typescript
994
+ async ingestRecords(payloads: Array<Buffer | string | object>): Promise<bigint | null>
995
+ ```
996
+
997
+ Ingests multiple records as a batch. All records in a batch are acknowledged together atomically. This method **blocks** until all records are sent to the SDK's internal landing zone, then returns a Promise for the server acknowledgment.
998
+
999
+ **Parameters:**
1000
+ - `payloads` - Array of record data. Supports the same 4 types as `ingestRecord()`:
1001
+ - **JSON Mode**: Array of **objects** (Type 1) or **strings** (Type 2)
1002
+ - **Proto Mode**: Array of **Message objects** (Type 3) or **Buffers** (Type 4)
1003
+ - Mixed types within the same array are supported
1004
+
1005
+ **All 4 Type Examples:**
1006
+ ```typescript
1007
+ // JSON Type 1: objects (high-level) - SDK auto-stringifies
1008
+ await stream.ingestRecords([
1009
+ { device: 'sensor-1', temp: 25 },
1010
+ { device: 'sensor-2', temp: 26 }
1011
+ ]);
1012
+
1013
+ // JSON Type 2: strings (low-level) - pre-serialized
1014
+ await stream.ingestRecords([
1015
+ JSON.stringify({ device: 'sensor-1', temp: 25 }),
1016
+ JSON.stringify({ device: 'sensor-2', temp: 26 })
1017
+ ]);
1018
+
1019
+ // Protobuf Type 3: Message objects (high-level) - SDK auto-serializes
1020
+ await stream.ingestRecords([
1021
+ MyMessage.create({ device: 'sensor-1', temp: 25 }),
1022
+ MyMessage.create({ device: 'sensor-2', temp: 26 })
1023
+ ]);
1024
+
1025
+ // Protobuf Type 4: Buffers (low-level) - pre-serialized bytes
1026
+ const buffers = [
1027
+ Buffer.from(MyMessage.encode(msg1).finish()),
1028
+ Buffer.from(MyMessage.encode(msg2).finish())
1029
+ ];
1030
+ await stream.ingestRecords(buffers);
1031
+ ```
1032
+
1033
+ **Returns:** Promise resolving to:
1034
+ - `bigint` - Offset ID when the server acknowledges the entire batch
1035
+ - `null` - If the batch was empty (no records sent)
1036
+
1037
+ **Best Practices:**
1038
+ - Batch size: 100-1,000 records for optimal throughput/latency balance
1039
+ - Empty batches are allowed and return `null`
1040
+
1041
+ ---
1042
+
1043
+ ```typescript
1044
+ async flush(): Promise<void>
1045
+ ```
1046
+
1047
+ Flushes all pending records and waits for acknowledgments.
1048
+
1049
+ ```typescript
1050
+ async close(): Promise<void>
1051
+ ```
1052
+
1053
+ Closes the stream gracefully, flushing all pending data. **Always call this in a finally block!**
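+
+ A minimal lifecycle sketch (assuming `stream` and `records` are set up as in the earlier examples):
+
+ ```typescript
+ try {
+   await stream.ingestRecords(records);
+   await stream.flush();   // wait for all outstanding acknowledgments
+ } finally {
+   await stream.close();   // graceful shutdown; flushes any remaining data
+ }
+ ```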
1054
+
1055
+ ```typescript
1056
+ async getUnackedRecords(): Promise<Buffer[]>
1057
+ ```
1058
+
1059
+ Returns unacknowledged record payloads as a flat array for inspection purposes.
1060
+
1061
+ **Important:** Can only be called on **closed streams**. Call `stream.close()` first, or this will throw an error.
1062
+
1063
+ **Returns:** Array of Buffer containing the raw record payloads
1064
+
1065
+ **Use case:** For inspecting unacknowledged individual records when using `ingestRecord()`. **Note:** This method is primarily for debugging and inspection. For recovery, use `recreateStream()` (recommended) or automatic recovery (default).
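+
+ **Example** (inspection sketch):
+ ```typescript
+ await stream.close();                              // must be closed first
+ const unacked = await stream.getUnackedRecords();  // raw payloads as Buffers
+ console.log(`${unacked.length} record(s) were never acknowledged`);
+ ```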
1066
+
1067
+ ---
1068
+
1069
+ ```typescript
1070
+ async getUnackedBatches(): Promise<Buffer[][]>
1071
+ ```
1072
+
1073
+ Returns unacknowledged records grouped by their original batches for inspection purposes.
1074
+
1075
+ **Important:** Can only be called on **closed streams**. Call `stream.close()` first, or this will throw an error.
1076
+
1077
+ **Returns:** Array of arrays, where each inner array represents a batch of records as Buffers
1078
+
1079
+ **Use case:** For inspecting unacknowledged batches when using `ingestRecords()`. Preserves the original batch structure. **Note:** This method is primarily for debugging and inspection. For recovery, use `recreateStream()` (recommended) or automatic recovery (default).
1080
+
1081
+ **Example:**
1082
+ ```typescript
1083
+ try {
1084
+ await stream.ingestRecords(batch1);
1085
+ await stream.ingestRecords(batch2);
1086
+ // ... error occurs
1087
+ } catch (error) {
1088
+ await stream.close();
1089
+ const unackedBatches = await stream.getUnackedBatches();
1090
+ // unackedBatches[0] contains records from batch1 (if not acked)
1091
+ // unackedBatches[1] contains records from batch2 (if not acked)
1092
+
1093
+ // Re-ingest with new stream
1094
+ for (const batch of unackedBatches) {
1095
+ await newStream.ingestRecords(batch);
1096
+ }
1097
+ }
1098
+ ```
1099
+
1100
+ ---
1101
+
1102
+ ### TableProperties
1103
+
1104
+ Configuration for the target table.
1105
+
1106
+ **Interface:**
1107
+
1108
+ ```typescript
1109
+ interface TableProperties {
1110
+ tableName: string; // Fully qualified table name (e.g., "catalog.schema.table")
1111
+ descriptorProto?: string; // Base64-encoded protobuf descriptor (required for Protocol Buffers)
1112
+ }
1113
+ ```
1114
+
1115
+ **Examples:**
1116
+
1117
+ ```typescript
1118
+ // JSON mode
1119
+ const tableProperties = { tableName: 'main.default.air_quality' };
1120
+
1121
+ // Protocol Buffers mode
1122
+ const tableProperties = {
1123
+ tableName: 'main.default.air_quality',
1124
+ descriptorProto: descriptorBase64 // Required for protobuf
1125
+ };
1126
+ ```
1127
+
1128
+ ---
1129
+
1130
+ ### StreamConfigurationOptions
1131
+
1132
+ Configuration options for stream behavior.
1133
+
1134
+ **Interface:**
1135
+
1136
+ ```typescript
1137
+ interface StreamConfigurationOptions {
1138
+ recordType?: RecordType; // RecordType.Json or RecordType.Proto. Default: RecordType.Proto
1139
+ maxInflightRequests?: number; // Default: 10,000
1140
+ recovery?: boolean; // Default: true
1141
+ recoveryTimeoutMs?: number; // Default: 15,000
1142
+ recoveryBackoffMs?: number; // Default: 2,000
1143
+ recoveryRetries?: number; // Default: 4
1144
+ flushTimeoutMs?: number; // Default: 300,000
1145
+ serverLackOfAckTimeoutMs?: number; // Default: 60,000
1146
+ }
1147
+
1148
+ enum RecordType {
1149
+ Json = 0, // JSON encoding
1150
+ Proto = 1 // Protocol Buffers encoding
1151
+ }
1152
+ ```
1153
+
1154
+ ## Best Practices
1155
+
1156
+ 1. **Reuse SDK instances**: Create one `ZerobusSdk` instance per application
1157
+ 2. **Stream lifecycle**: Always close streams in a `finally` block to ensure all records are flushed
1158
+ 3. **In-flight requests**: Adjust `maxInflightRequests` based on your throughput requirements (default: 10,000)
1159
+ 4. **Error handling**: The stream handles errors internally with automatic retry. Only use `recreateStream()` for persistent failures after internal retries are exhausted.
1160
+ 5. **Use Protocol Buffers for production**: Protocol Buffers (the default) provides better performance and schema validation. Use JSON only when you need schema flexibility or for quick prototyping.
1161
+ 6. **Store credentials securely**: Use environment variables, never hardcode credentials
1162
+ 7. **Use batch ingestion**: For high-throughput scenarios, use `ingestRecords()` instead of individual `ingestRecord()` calls
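+
+ The sketch below ties several of these practices together: a single SDK instance, credentials read from environment variables, batch ingestion, and cleanup in a `finally` block (environment variable names follow the Running Examples section):
+
+ ```typescript
+ import { ZerobusSdk, RecordType } from '@databricks/zerobus-ingest-sdk';
+
+ // One SDK instance per application; credentials and endpoints come from the environment
+ const sdk = new ZerobusSdk(
+   process.env.ZEROBUS_SERVER_ENDPOINT!,
+   process.env.DATABRICKS_WORKSPACE_URL!
+ );
+
+ async function ingestBatch(records: object[]): Promise<void> {
+   const stream = await sdk.createStream(
+     { tableName: process.env.ZEROBUS_TABLE_NAME! },
+     process.env.DATABRICKS_CLIENT_ID!,
+     process.env.DATABRICKS_CLIENT_SECRET!,
+     { recordType: RecordType.Json }
+   );
+   try {
+     await stream.ingestRecords(records);  // batch ingestion for throughput
+     await stream.flush();
+   } finally {
+     await stream.close();                 // always close in a finally block
+   }
+ }
+ ```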
1163
+
1164
+ ## Platform Support
1165
+
1166
+ The SDK supports all platforms where Node.js and Rust are available.
1167
+
1168
+ ### Pre-built Binaries
1169
+
1170
+ Pre-built native binaries are available for:
1171
+
1172
+ - **Linux**: x64, ARM64
1173
+ - **Windows**: x64
1174
+
1175
+ ### Build from Source
1176
+
1177
+ **macOS users**: Pre-built binaries are not available for macOS. The package will automatically build from source during `npm install`, which requires:
1178
+
1179
+ - **Rust toolchain** (1.70+): Install via `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`
1180
+ - **Xcode Command Line Tools**: Install via `xcode-select --install`
1181
+
1182
+ The build process happens automatically during installation and typically takes 2-3 minutes.
1183
+
1184
+ ## Architecture
1185
+
1186
+ This SDK wraps the high-performance [Rust Zerobus SDK](https://github.com/databricks/zerobus-sdk-rs) using [NAPI-RS](https://napi.rs):
1187
+
1188
+ ```
1189
+ ┌─────────────────────────────┐
1190
+ │ TypeScript Application │
1191
+ └─────────────┬───────────────┘
1192
+ │ (NAPI-RS bindings)
1193
+ ┌─────────────▼───────────────┐
1194
+ │ Rust Zerobus SDK │
1195
+ │ - gRPC communication │
1196
+ │ - OAuth authentication │
1197
+ │ - Stream management │
1198
+ └─────────────┬───────────────┘
1199
+ │ (gRPC/TLS)
1200
+ ┌─────────────▼───────────────┐
1201
+ │ Databricks Zerobus Service│
1202
+ └─────────────────────────────┘
1203
+ ```
1204
+
1205
+ **Benefits:**
1206
+ - **Zero-copy data transfer** between JavaScript and Rust
1207
+ - **Native async/await support** - Rust futures become JavaScript Promises
1208
+ - **Automatic memory management** - No manual cleanup required
1209
+ - **Type safety** - Compile-time checks on both sides
1210
+
1211
+ ## Contributing
1212
+
1213
+ We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for details.
1214
+
1215
+ ## Related Projects
1216
+
1217
+ - [Zerobus Rust SDK](https://github.com/databricks/zerobus-sdk-rs) - The underlying Rust implementation
1218
+ - [Zerobus Python SDK](https://github.com/databricks/zerobus-sdk-py) - Python SDK for Zerobus
1219
+ - [Zerobus Java SDK](https://github.com/databricks/zerobus-sdk-java) - Java SDK for Zerobus
+ - [Zerobus Go SDK](https://github.com/databricks/zerobus-sdk-go) - Go SDK for Zerobus
1220
+ - [NAPI-RS](https://napi.rs) - Rust/Node.js binding framework