@rip-lang/db 0.10.0 → 1.0.1

package/INTERNALS.md ADDED
@@ -0,0 +1,324 @@
1
+ # DuckDB Internals
2
+
3
+ This document covers the internal architecture, binary protocol, and implementation
4
+ details for rip-db's native DuckDB integration. For usage, see README.md.
5
+
6
+ ## Architecture
7
+
8
+ ```
9
+ ┌────────────────────────────────────────────────────────────────────────────┐
10
+ │ Browser │
11
+ │ ┌──────────────────────────────────────────────────────────────────────┐ │
12
+ │ │ DuckDB UI (React App) │ │
13
+ │ │ - Loaded from http://localhost:4213/ (proxied from ui.duckdb.org) │ │
14
+ │ │ - Makes API calls to relative URLs (/ddb/run, /ddb/tokenize, etc.) │ │
15
+ │ │ - Uses EventSource for /localEvents (catalog updates) │ │
16
+ │ └──────────────────────────────────────────────────────────────────────┘ │
17
+ │ │ │
18
+ │ │ Same-origin requests │
19
+ │ ▼ │
20
+ └────────────────────────────────────────────────────────────────────────────┘
21
+
22
+
23
+ ┌────────────────────────────────────────────────────────────────────────────┐
24
+ │ rip-db Server (:4213) │
25
+ │ │
26
+ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐ │
27
+ │ │ Static Proxy │ │ Binary API │ │ SSE Events │ │
28
+ │ │ GET /* │ │ POST /ddb/run │ │ GET /localEvents │ │
29
+ │ │ → ui.duckdb.org │ │ POST /ddb/token │ │ → catalog updates │ │
30
+ │ └─────────────────┘ │ POST /ddb/intr │ └─────────────────────────┘ │
31
+ │ └─────────────────┘ │
32
+ │ │ │
33
+ │ ▼ │
34
+ │ ┌──────────────────────────────────────────────────────────────────────┐ │
35
+ │ │ High-Performance Zig Bindings │ │
36
+ │ │ - Single FFI call per query │ │
37
+ │ │ - Zero-copy for numeric columns (direct memory access) │ │
38
+ │ │ - Binary serialization done in Zig (no JS overhead) │ │
39
+ │ │ - Pre-allocated output buffer (no allocations per query) │ │
40
+ │ └──────────────────────────────────────────────────────────────────────┘ │
41
+ │ │ │
42
+ │ ▼ │
43
+ │ ┌─────────────────┐ │
44
+ │ │ DuckDB │ │
45
+ │ │ (native) │ │
46
+ │ └─────────────────┘ │
47
+ └────────────────────────────────────────────────────────────────────────────┘
48
+ ```
49
+
50
+ ### Why Zig?
51
+
52
+ The naive approach (per-value FFI calls) has severe problems:
53
+
54
+ ```
55
+ ┌─────────┐ FFI call ┌──────────┐ per-value ┌──────────┐
56
+ │ DuckDB │ ──────────────▶ │ Zig │ ─────────────▶ │ Bun/JS │
57
+ │ Result │ ◀────────────── │ Wrapper │ ◀───────────── │ Extracts │
58
+ └─────────┘ ptr to data └──────────┘ 100k calls └──────────┘
59
+
60
+
61
+ ┌──────────────────────────┐
62
+ │ JavaScript Objects │
63
+ │ (100k allocations) │
64
+ └──────────────────────────┘
65
+ ```
66
+
67
+ **Problems:**
68
+ - ~100,000 FFI calls for 10k rows × 10 columns
69
+ - String pointers become invalid once the result is freed (segfaults!)
70
+ - JavaScriptCore heap pressure from intermediate objects
71
+
72
+ **Our solution:** Do everything in Zig with a single FFI call:
73
+
74
+ ```
75
+ ┌─────────┐ single call ┌──────────────────────────────┐
76
+ │ DuckDB │ ──────────────▶ │ Zig │
77
+ │ Result │ │ ┌─────────────────────────┐ │
78
+ └─────────┘ │ │ Binary Serializer │ │
79
+ │ │ │ (direct memory access) │ │
80
+ │ column data ptrs │ └───────────┬─────────────┘ │
81
+ └──────────────────────┼──────────────┘ │
82
+ │ ▼ │
83
+ │ ┌─────────────────────────┐ │
84
+ │ │ Output Buffer │ │
85
+ │ │ (pre-allocated) │ │
86
+ │ └─────────────────────────┘ │
87
+ └──────────────────────────────┘
88
+
89
+
90
+ ┌──────────────────────────────┐
91
+ │ HTTP Response │
92
+ └──────────────────────────────┘
93
+ ```
94
+
95
+ | Metric | Naive (JS) | Zig | Improvement |
96
+ |--------|------------|-----|-------------|
97
+ | FFI calls (10k×10) | ~100,000 | 1 | 100,000× |
98
+ | Allocations | O(rows×cols) | O(1) | No per-value allocations |
99
+ | Memory copies | 3-4 per value | 1 total | 3-4× |
100
+ | String handling | Unsafe (crash) | Safe | ✓ |
101
+
102
+ ---
103
+
104
+ ## Binary Protocol
105
+
106
+ The DuckDB UI uses a custom binary format for query results. All fixed-width numbers are little-endian; lengths and counts are varints.
107
+
108
+ ### Primitives
109
+
110
+ #### varint (Variable-length Integer)
111
+ ```
112
+ result = 0; shift = 0
113
+ do: byte = read_u8(); result |= (byte & 0x7F) << shift; shift += 7
114
+ while (byte & 0x80)   # high bit set = more bytes follow
115
+ ```
116
+
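The varint loop above can be sketched in TypeScript as follows. This is a minimal illustration that assumes unsigned values with the least-significant 7-bit group first, as the pseudocode implies; `decodeVarint` and `encodeVarint` are hypothetical names, not functions from rip-db or the DuckDB UI client.

```typescript
// Decode one unsigned varint starting at `offset`.
// Returns the value and the offset of the first byte after it.
function decodeVarint(bytes: Uint8Array, offset: number): { value: number; next: number } {
  let value = 0;
  let scale = 1; // 128^i; multiplication keeps values above 2^31 exact, unlike <<
  let pos = offset;
  for (;;) {
    if (pos >= bytes.length) throw new RangeError("truncated varint");
    const byte = bytes[pos++];
    value += (byte & 0x7f) * scale; // low 7 bits carry payload
    if ((byte & 0x80) === 0) break; // high bit clear: this was the last byte
    scale *= 128;
  }
  return { value, next: pos };
}

// Encode a non-negative integer as a varint (inverse of decodeVarint).
function encodeVarint(value: number): Uint8Array {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  const out: number[] = [];
  let rest = value;
  do {
    const low = rest % 128;
    rest = Math.floor(rest / 128);
    out.push(rest > 0 ? low | 0x80 : low); // set the high bit when more bytes follow
  } while (rest > 0);
  return Uint8Array.from(out);
}
```

For example, 300 encodes to the two bytes `ac 02`, and decoding those bytes yields 300 again.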
117
+ #### Field ID
118
+ ```
119
+ id: uint16 (little-endian)
120
+ end marker: 0xFFFF
121
+ ```
122
+
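As a small sketch of the field ID layout, the helper below reads a little-endian `uint16` and compares it against the end marker. `readFieldId` and `END_OF_OBJECT` are illustrative names chosen for this example.

```typescript
// 0xFFFF terminates an object, per the layout above.
const END_OF_OBJECT = 0xffff;

// Read the uint16 little-endian field ID located at `offset`.
function readFieldId(bytes: Uint8Array, offset: number): number {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return view.getUint16(offset, true); // true selects little-endian
}

// True when the two bytes at `offset` are the end marker.
function isEndOfObject(bytes: Uint8Array, offset: number): boolean {
  return readFieldId(bytes, offset) === END_OF_OBJECT;
}
```

Field 100 therefore appears on the wire as `64 00`, and the end marker as `ff ff`.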
123
+ #### string
124
+ ```
125
+ length: varint
126
+ data: UTF-8 bytes
127
+ ```
128
+
129
+ #### list<T>
130
+ ```
131
+ count: varint
132
+ items: T[]
133
+ ```
134
+
135
+ #### nullable<T>
136
+ ```
137
+ present: uint8 (0 = null, non-zero = present)
138
+ value: T (only if present)
139
+ ```
140
+
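The `string`, `list<T>`, and `nullable<T>` layouts compose naturally in a cursor-style reader. The class below is a hedged sketch, not the DuckDB UI client's deserializer: `ByteCursor` and its method names are made up for illustration, and it assumes list items are written back-to-back with no per-item framing.

```typescript
class ByteCursor {
  private pos = 0;
  private readonly bytes: Uint8Array;

  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
  }

  readUint8(): number {
    if (this.pos >= this.bytes.length) throw new RangeError("unexpected end of buffer");
    return this.bytes[this.pos++];
  }

  readVarint(): number {
    let value = 0;
    let scale = 1;
    for (;;) {
      const byte = this.readUint8();
      value += (byte & 0x7f) * scale;
      if ((byte & 0x80) === 0) return value;
      scale *= 128;
    }
  }

  // string = varint length, then that many UTF-8 bytes
  readString(): string {
    const length = this.readVarint();
    if (this.pos + length > this.bytes.length) throw new RangeError("string overruns buffer");
    const slice = this.bytes.subarray(this.pos, this.pos + length);
    this.pos += length;
    return new TextDecoder().decode(slice);
  }

  // list<T> = varint count, then `count` items
  readList<T>(readItem: () => T): T[] {
    const count = this.readVarint();
    const items: T[] = [];
    for (let i = 0; i < count; i++) items.push(readItem());
    return items;
  }

  // nullable<T> = uint8 flag, then the value only when the flag is non-zero
  readNullable<T>(readItem: () => T): T | null {
    return this.readUint8() !== 0 ? readItem() : null;
  }
}

// Usage: a list<string> holding "a" and "bc", followed by an absent nullable<string>.
const demoCursor = new ByteCursor(Uint8Array.from([0x02, 0x01, 0x61, 0x02, 0x62, 0x63, 0x00]));
const demoNames = demoCursor.readList(() => demoCursor.readString());
const demoExtra = demoCursor.readNullable(() => demoCursor.readString());
```

Passing the item reader as a callback keeps `readList` and `readNullable` independent of the element type, which mirrors how the layouts are parameterized over `T`.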
141
+ ### Response Types
142
+
143
+ #### SuccessResult
144
+ ```
145
+ field_100: boolean (true)
146
+ field_101: ColumnNamesAndTypes
147
+ field_102: list<DataChunk>
148
+ 0xFFFF
149
+ ```
150
+
151
+ #### ErrorResult
152
+ ```
153
+ field_100: boolean (false)
154
+ field_101: string (error message)
155
+ 0xFFFF
156
+ ```
157
+
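To make the framing concrete, here is a minimal sketch that serializes an ErrorResult by hand. It assumes each field is written as its `uint16` ID followed by the value, which is consistent with the `6400 01` prefix described in the Debugging section; `encodeErrorResult` is a hypothetical helper, not rip-db's Zig serializer.

```typescript
function encodeErrorResult(message: string): Uint8Array {
  const text = new TextEncoder().encode(message);
  const out: number[] = [];

  // Field IDs are uint16 little-endian: low byte first.
  const pushFieldId = (id: number): void => {
    out.push(id & 0xff, (id >> 8) & 0xff);
  };

  pushFieldId(100);
  out.push(0); // boolean false: the query failed

  pushFieldId(101);
  let rest = text.length; // string = varint length + UTF-8 bytes
  do {
    const low = rest & 0x7f;
    rest >>>= 7;
    out.push(rest > 0 ? low | 0x80 : low);
  } while (rest > 0);
  text.forEach((b) => out.push(b));

  pushFieldId(0xffff); // end marker
  return Uint8Array.from(out);
}
```

Under these assumptions, the message `"no"` serializes to `64 00 00 65 00 02 6e 6f ff ff`.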
158
+ #### TokenizeResult
159
+ ```
160
+ field_100: list<varint> (offsets)
161
+ field_101: list<varint> (token types)
162
+ 0xFFFF
163
+ ```
164
+
165
+ ### ColumnNamesAndTypes
166
+ ```
167
+ field_100: list<string> (column names)
168
+ field_101: list<Type> (column types)
169
+ 0xFFFF
170
+ ```
171
+
172
+ ### Type
173
+ ```
174
+ field_100: uint8 (LogicalTypeId)
175
+ field_101: nullable<TypeInfo>
176
+ 0xFFFF
177
+ ```
178
+
179
+ ### DataChunk
180
+ ```
181
+ field_100: varint (row count)
182
+ field_101: list<Vector>
183
+ 0xFFFF
184
+ ```
185
+
186
+ ### Vector
187
+
188
+ ```
189
+ field_100: uint8 (allValid: 0 = all valid, 1 = has bitmap)
190
+ field_101: data (validity bitmap - only if allValid != 0)
191
+ field_102: data (values)
192
+ 0xFFFF
193
+ ```
194
+
195
+ **Validity bitmap:** LSB first, 1 = valid, 0 = NULL, size = ceil(rows/8) bytes.
196
+
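The bitmap rule can be sketched as below. The function names are illustrative; the bit order (LSB first), the meaning of each bit, and the size formula come from the description above.

```typescript
// Number of bytes needed to hold one validity bit per row.
function validityBitmapSize(rowCount: number): number {
  return Math.ceil(rowCount / 8);
}

// LSB first: row 0 is bit 0 of byte 0, row 8 is bit 0 of byte 1.
function isRowValid(bitmap: Uint8Array, row: number): boolean {
  return ((bitmap[row >> 3] >> (row & 7)) & 1) === 1;
}

// Replace values at NULL rows with null. A null bitmap means "all valid".
function maskNulls<T>(values: T[], bitmap: Uint8Array | null): (T | null)[] {
  const out: (T | null)[] = [];
  for (let row = 0; row < values.length; row++) {
    out.push(bitmap === null || isRowValid(bitmap, row) ? values[row] : null);
  }
  return out;
}
```

With a bitmap byte of `0b00000101`, rows 0 and 2 are valid and row 1 is NULL.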
197
+ ### LogicalTypeId
198
+
199
+ | ID | Type | Bytes |
200
+ |----|------|-------|
201
+ | 10 | BOOLEAN | 1 |
202
+ | 11 | TINYINT | 1 |
203
+ | 12 | SMALLINT | 2 |
204
+ | 13 | INTEGER | 4 |
205
+ | 14 | BIGINT | 8 |
206
+ | 15 | DATE | 4 (days since epoch) |
207
+ | 16 | TIME | 8 (microseconds) |
208
+ | 19 | TIMESTAMP | 8 (microseconds since epoch) |
209
+ | 22 | FLOAT | 4 |
210
+ | 23 | DOUBLE | 8 |
211
+ | 25 | VARCHAR | variable (list<string>) |
212
+ | 26 | BLOB | variable |
213
+ | 50 | HUGEINT | 16 |
214
+ | 54 | UUID | 16 |
215
+ | 100 | STRUCT | nested |
216
+ | 101 | LIST | nested |
217
+ | 102 | MAP | nested |
218
+
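The byte widths in the table translate directly into a lookup for fixed-width columns. The sketch below assumes that a vector's values for a fixed-width type are packed back-to-back in little-endian order; that packing, and the names `FIXED_WIDTH_BYTES`, `decodeIntegerColumn`, and `dateFromDays`, are assumptions made for illustration.

```typescript
// LogicalTypeId -> bytes per value, copied from the table above (fixed-width types only).
const FIXED_WIDTH_BYTES: Record<number, number> = {
  10: 1, 11: 1, 12: 2, 13: 4, 14: 8, 15: 4, 16: 8, 19: 8, 22: 4, 23: 8, 50: 16, 54: 16,
};

// Decode `rowCount` INTEGER (id 13) values from raw value bytes.
function decodeIntegerColumn(data: Uint8Array, rowCount: number): number[] {
  const width = FIXED_WIDTH_BYTES[13];
  if (data.byteLength < rowCount * width) throw new RangeError("values buffer too short");
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  const out: number[] = [];
  for (let row = 0; row < rowCount; row++) {
    out.push(view.getInt32(row * width, true)); // true selects little-endian
  }
  return out;
}

// DATE (id 15) stores days since the Unix epoch.
function dateFromDays(days: number): Date {
  return new Date(days * 86400000); // milliseconds per day
}
```

BIGINT, TIME, and TIMESTAMP would need `getBigInt64` rather than `getInt32`, since their values do not fit a JavaScript number exactly.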
219
+ ---
220
+
221
+ ## HTTP Endpoints
222
+
223
+ ### Required for DuckDB UI
224
+
225
+ | Endpoint | Method | Request | Response |
226
+ |----------|--------|---------|----------|
227
+ | `/ddb/run` | POST | SQL text | Binary result |
228
+ | `/ddb/interrupt` | POST | Empty | Empty result |
229
+ | `/ddb/tokenize` | POST | SQL text | Binary tokens |
230
+ | `/info` | GET | - | Headers only |
231
+ | `/version` | GET | - | JSON `{"origin":"host",...}` |
232
+ | `/config` | GET | - | Proxied + version headers |
233
+ | `/localToken` | GET | - | Empty (no MotherDuck) |
234
+ | `/localEvents` | GET | - | SSE stream |
235
+
236
+ ### Request Headers
237
+
238
+ | Header | Encoding | Purpose |
239
+ |--------|----------|---------|
240
+ | `Origin` | - | Security check (must match server URL) |
241
+ | `X-DuckDB-UI-Result-Row-Limit` | Plain | Max rows to return |
242
+ | `X-DuckDB-UI-Database-Name` | Base64 | Target database |
243
+ | `X-DuckDB-UI-Schema-Name` | Base64 | Target schema |
244
+ | `X-DuckDB-UI-Parameter-Count` | Plain | Prepared statement params |
245
+ | `X-DuckDB-UI-Parameter-Value-{n}` | Base64 | Param values |
246
+
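A server-side handler might decode these headers roughly as follows. This is a sketch under stated assumptions: header keys arrive lower-cased, Base64 payloads are UTF-8 text, and parameter indices start at 0 (the table does not say where `{n}` starts). `parseRunHeaders` and `RunRequestOptions` are hypothetical names.

```typescript
interface RunRequestOptions {
  rowLimit: number | null;
  database: string | null;
  schema: string | null;
  parameters: string[];
}

// Base64 -> bytes -> UTF-8 string.
function decodeBase64Utf8(value: string): string {
  const binary = atob(value);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return new TextDecoder().decode(bytes);
}

function parseRunHeaders(headers: Record<string, string | undefined>): RunRequestOptions {
  const get = (headerName: string): string | null => headers[headerName.toLowerCase()] ?? null;
  const decodeOptional = (value: string | null): string | null =>
    value === null ? null : decodeBase64Utf8(value);

  const limitText = get("X-DuckDB-UI-Result-Row-Limit");
  const paramCount = Number(get("X-DuckDB-UI-Parameter-Count") ?? "0");

  const parameters: string[] = [];
  for (let n = 0; n < paramCount; n++) {
    // Assumption: values are numbered from 0.
    const encoded = get(`X-DuckDB-UI-Parameter-Value-${n}`);
    if (encoded === null) throw new Error(`missing parameter value ${n}`);
    parameters.push(decodeBase64Utf8(encoded));
  }

  return {
    rowLimit: limitText === null ? null : Number(limitText),
    database: decodeOptional(get("X-DuckDB-UI-Database-Name")),
    schema: decodeOptional(get("X-DuckDB-UI-Schema-Name")),
    parameters,
  };
}
```

Base64 lets database names, schema names, and parameter values carry characters that are not legal in raw HTTP header values.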
247
+ ### Response Headers (Required)
248
+
249
+ ```
250
+ X-DuckDB-Version: 1.4.1
251
+ X-DuckDB-Platform: rip-db
252
+ X-DuckDB-UI-Extension-Version: 139-944c08a214
253
+ ```
254
+
255
+ The UI checks `X-DuckDB-UI-Extension-Version` to decide between HTTP and WASM modes.
256
+
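One way to guarantee these headers on every response is to merge them in last, so a handler cannot accidentally drop or override them. The helper below is only a sketch; the header values are the ones listed above, and `withRequiredHeaders` is a hypothetical name.

```typescript
// Values copied from the list above.
const REQUIRED_RESPONSE_HEADERS: Record<string, string> = {
  "X-DuckDB-Version": "1.4.1",
  "X-DuckDB-Platform": "rip-db",
  "X-DuckDB-UI-Extension-Version": "139-944c08a214",
};

// Merge endpoint-specific headers with the required ones; required headers win.
function withRequiredHeaders(extra: Record<string, string>): Record<string, string> {
  return { ...extra, ...REQUIRED_RESPONSE_HEADERS };
}
```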
257
+ ### Security
258
+
259
+ The official DuckDB server checks the `Origin` header:
260
+ ```cpp
261
+ if (origin != local_url) {
262
+ res.status = 401; // UNAUTHORIZED
263
+ }
264
+ ```
265
+
266
+ Our solution: Proxy UI assets from `ui.duckdb.org` so all requests are same-origin.
267
+
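The same check can be sketched in TypeScript for a proxying server. This is not rip-db's actual handler: `isSameOrigin` and `statusForOrigin` are illustrative names, and treating a missing `Origin` header as unauthorized is an assumption rather than documented behavior.

```typescript
// Compare the request's Origin header with the server's own URL.
function isSameOrigin(originHeader: string | null, serverUrl: string): boolean {
  if (originHeader === null) return false; // assumption: a missing Origin is rejected
  try {
    // URL.origin normalizes scheme, host, and port, ignoring any path.
    return new URL(originHeader).origin === new URL(serverUrl).origin;
  } catch {
    return false; // malformed Origin header
  }
}

// Mirror the C++ snippet: 401 when the origin does not match.
function statusForOrigin(originHeader: string | null, serverUrl: string): number {
  return isSameOrigin(originHeader, serverUrl) ? 200 : 401;
}
```

Because the UI assets are served from the same host and port as the API, the browser sends `Origin: http://localhost:4213` and the check passes; a page loaded directly from `ui.duckdb.org` would not.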
268
+ ---
269
+
270
+ ## Building
271
+
272
+ ```bash
273
+ cd packages/db
274
+
275
+ # Build (outputs to lib/{os}-{arch}/duckdb.node)
276
+ # ReleaseFast is the default, so no -Doptimize flag needed
277
+ zig build --prefix .
278
+
279
+ # Run tests
280
+ zig build test
281
+
282
+ # Debug build
283
+ zig build -Doptimize=Debug --prefix .
284
+ ```
285
+
286
+ Building requires the DuckDB headers and library. Set `DUCKDB_DIR` if they are not installed under `/opt/homebrew`:
287
+ ```bash
288
+ DUCKDB_DIR=/path/to/duckdb zig build
289
+ ```
290
+
291
+ ---
292
+
293
+ ## Debugging
294
+
295
+ ### Test endpoints
296
+
297
+ ```bash
298
+ # Start server
299
+ rip db.rip :memory: --port 4213
300
+
301
+ # Test binary endpoint
302
+ curl -X POST http://localhost:4213/ddb/run \
303
+ -H "Origin: http://localhost:4213" \
304
+ -d "SELECT 42" | xxd | head -5
305
+
306
+ # Expected start of dump: 6400 01 (field ID 100 as little-endian uint16, then boolean true = success)
307
+ ```
308
+
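The same three-byte check is easy to automate when scripting against `/ddb/run`. The helpers below are a sketch; `classifyRunResponse` and `hexPrefix` are made-up names, and the classification relies only on the SuccessResult and ErrorResult layouts from the Binary Protocol section.

```typescript
type RunOutcome = "success" | "error" | "unknown";

// Inspect the leading bytes of a /ddb/run response body.
function classifyRunResponse(body: Uint8Array): RunOutcome {
  // Field ID 100 is 0x0064, so its little-endian bytes are 64 00.
  if (body.length < 3 || body[0] !== 0x64 || body[1] !== 0x00) return "unknown";
  return body[2] !== 0 ? "success" : "error";
}

// Space-separated hex of the first `count` bytes, for log output.
function hexPrefix(body: Uint8Array, count: number): string {
  const parts: string[] = [];
  for (let i = 0; i < Math.min(count, body.length); i++) {
    parts.push((body[i] < 16 ? "0" : "") + body[i].toString(16));
  }
  return parts.join(" ");
}
```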
309
+ ### Common Issues
310
+
311
+ | Issue | Cause | Fix |
312
+ |-------|-------|-----|
313
+ | 401 on /ddb/run | Wrong Origin | Load UI from localhost, not ui.duckdb.org |
314
+ | UI shows "WASM" | Missing version header | Add `X-DuckDB-UI-Extension-Version` |
315
+ | Syntax highlighting broken | /ddb/tokenize error | Check server logs |
316
+ | Segfault | Memory lifetime | Use Zig bindings, not JS FFI per-value |
317
+
318
+ ---
319
+
320
+ ## References
321
+
322
+ - [DuckDB UI GitHub](https://github.com/duckdb/duckdb-ui)
323
+ - [DuckDB UI Client](https://github.com/duckdb/duckdb-ui/tree/main/ts/pkgs/duckdb-ui-client)
324
+ - [BinaryDeserializer.ts](https://github.com/duckdb/duckdb-ui/blob/main/ts/pkgs/duckdb-ui-client/src/serialization/classes/BinaryDeserializer.ts)