vectorjson 0.2.1 → 0.3.1
- package/README.md +151 -1
- package/dist/engine-wasm.generated.d.ts +1 -1
- package/dist/engine.wasm +0 -0
- package/dist/index.d.ts +38 -12
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +5 -5
- package/dist/index.js.map +4 -4
- package/package.json +2 -2
package/README.md
CHANGED
@@ -59,6 +59,24 @@ parser.destroy();
 
 `getValue()` returns a **live JS object** that grows incrementally on each `feed()`. No re-parsing — each byte is scanned exactly once.
 
+**Schema-driven streaming** — pass a schema, get only what it defines. An LLM streams a 50KB tool call with `name`, `age`, `bio`, `metadata` — but your schema only needs `name` and `age`. Everything else is skipped at the byte level, never allocated in JS:
+
+```js
+import { z } from "zod";
+import { createParser } from "vectorjson";
+
+const User = z.object({ name: z.string(), age: z.number() });
+
+for await (const partial of createParser({ schema: User, source: response.body })) {
+  console.log(partial);
+  // { name: "Ali" } ← partial, render immediately
+  // { name: "Alice" }
+  // { name: "Alice", age: 30 } ← validated on complete
+}
+```
+
+The schema defines what to parse. Fields outside the schema are ignored during scanning — no objects created, no strings decoded, no memory wasted.
+
 **Or skip intermediate access entirely** — if you only need the final value:
 
 ```js
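To make the effect of schema-driven picking concrete, the sketch below projects an already-parsed value onto a list of dotted field paths, stepping through arrays transparently. It only illustrates the resulting shape. `projectPaths` is a hypothetical helper, not a vectorjson API, and the README describes the library as skipping unpicked fields at the byte level rather than filtering after parsing.

```js
// Hypothetical helper (not a vectorjson API): keep only the fields named by
// dotted paths such as "users.name", treating arrays as transparent.
function projectPaths(value, paths) {
  // Group the paths by their first segment: "users.name" -> users: ["name"].
  const byHead = new Map();
  for (const path of paths) {
    const [head, ...rest] = path.split(".");
    if (!byHead.has(head)) byHead.set(head, []);
    if (rest.length > 0) byHead.get(head).push(rest.join("."));
  }

  const project = (node, heads) => {
    // Arrays are transparent: apply the same selection to every element.
    if (Array.isArray(node)) return node.map((item) => project(item, heads));
    if (node === null || typeof node !== "object") return node;
    const out = {};
    for (const [key, subPaths] of heads) {
      if (!(key in node)) continue;
      // No sub-paths means "keep this field whole".
      out[key] = subPaths.length === 0 ? node[key] : projectPaths(node[key], subPaths);
    }
    return out;
  };

  return project(value, byHead);
}

const full = { name: "Alice", age: 30, bio: "long text", metadata: { a: 1 } };
console.log(projectPaths(full, ["name", "age"])); // { name: 'Alice', age: 30 }
```

With nested paths, `projectPaths({ users: [{ name: "Alice", bio: "x" }] }, ["users.name"])` keeps only `name` inside each array element.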
@@ -308,6 +326,59 @@ parser.onText((text) => thinkingPanel.append(text)); // opt-in
 parser.feed(llmOutput);
 ```
 
+### Schema-driven streaming — parse only what the schema defines
+
+Pass a schema and VectorJSON extracts only the fields it defines. Everything else is skipped at the byte level — no objects created, no strings decoded. Arrays are transparent: `{ users: z.array(z.object({ name })) }` picks through arrays automatically.
+
+```js
+import { z } from "zod";
+import { createParser } from "vectorjson";
+
+const User = z.object({ name: z.string(), age: z.number() });
+
+for await (const partial of createParser({ schema: User, source: response.body })) {
+  console.log(partial);
+  // { name: "Ali" } ← partial, render immediately
+  // { name: "Alice", age: 30 } ← validated on complete
+}
+```
+
+Works on dirty LLM output — think tags, code fences, and leading prose are stripped automatically when a schema is provided:
+
+```js
+// All of these work with createParser(schema):
+// <think>reasoning</think>{"name":"Alice","age":30}
+// ```json\n{"name":"Alice","age":30}\n```
+// Here's the result: {"name":"Alice","age":30}
+```
+
+Both `createParser` and `createEventParser` support `source` + `for await`:
+
+```js
+import { createEventParser } from "vectorjson";
+
+const parser = createEventParser({ source: response.body });
+parser.on('tool', (e) => showToolUI(e.value));
+
+for await (const partial of parser) {
+  updateUI(partial); // growing partial value
+}
+```
+
+Works with any async source — fetch body, WebSocket wrapper, SSE adapter, or a plain async generator:
+
+```js
+async function* chunks() {
+  yield '{"status":"';
+  yield 'ok","data":';
+  yield '[1,2,3]}';
+}
+
+for await (const partial of createParser({ source: chunks() })) {
+  console.log(partial);
+}
+```
+
 ### Schema validation
 
 Validate and auto-infer types with Zod, Valibot, ArkType, or any lib with `.safeParse()`. Works on all three APIs:
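The three kinds of "dirty" input named in this hunk (think tags, code fences, leading prose) can be illustrated with a small string-level cleaner. This is a rough sketch under simplifying assumptions: `stripDirty` is a hypothetical name, it works on a complete string rather than a stream, and it is not how vectorjson implements the feature.

```js
// Hypothetical helper (not a vectorjson API): recover the JSON payload from
// typical LLM output given as one complete string.
const FENCE = "`".repeat(3); // a Markdown code-fence marker

function stripDirty(text) {
  // 1. Drop <think>...</think> blocks.
  let s = text.replace(/<think>[\s\S]*?<\/think>/g, "");
  // 2. Unwrap the first fenced block, with or without a "json" tag.
  const fenced = s.match(new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE));
  if (fenced) s = fenced[1];
  // 3. Skip leading prose: start at the first "{" or "[".
  const start = s.search(/[{\[]/);
  return start === -1 ? "" : s.slice(start).trim();
}

console.log(stripDirty('Here is the result: {"name":"Alice","age":30}'));
// {"name":"Alice","age":30}
```

A streaming implementation would have to make the same decisions incrementally, before the closing tag or fence has arrived, which this sketch does not attempt.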
@@ -449,8 +520,39 @@ interface ParseResult {
 - **`invalid`** — broken JSON
 
 ### `createParser(schema?): StreamingParser<T>`
+### `createParser(options?): StreamingParser<T>`
+
+Each `feed()` processes only new bytes — O(n) total. Three overloads:
+
+```ts
+createParser(); // no validation
+createParser(schema); // schema validation + auto-pick + dirty input handling
+createParser({ schema, source }); // options object
+```
+
+**Options object:**
+
+```ts
+interface CreateParserOptions<T = unknown> {
+  schema?: ZodLike<T>; // validate on complete, auto-pick from shape, skip dirty input
+  source?: ReadableStream<Uint8Array> | AsyncIterable<Uint8Array | string>;
+  pick?: string[]; // advanced: explicit field paths (overrides schema auto-pick)
+}
+```
 
-
+When a `schema` is provided:
+- Fields are auto-picked from the schema's `.shape` — only matching fields are parsed
+- Arrays are transparent — `{ users: z.array(z.object({ name })) }` picks `users.name` through arrays
+- Dirty input (think tags, code fences, leading prose) is stripped before parsing
+- On complete, `safeParse()` validates the final value
+
+When `source` is provided, the parser becomes async-iterable — use `for await` to consume partial values:
+
+```ts
+for await (const partial of createParser({ schema: User, source: stream })) {
+  console.log(partial); // growing object with only schema fields
+}
+```
 
 ```ts
 interface StreamingParser<T = unknown> {
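The `source` option accepts either a `ReadableStream<Uint8Array>` or an `AsyncIterable<Uint8Array | string>`. As a rough illustration of what accepting both shapes involves, the sketch below normalizes any such source into an async iterable of strings. `toTextChunks` is a hypothetical adapter, not a vectorjson API, and the library may well consume bytes directly without decoding them first.

```js
// Hypothetical adapter (not a vectorjson API): turn a ReadableStream or an
// async iterable of Uint8Array | string chunks into an async iterable of strings.
async function* toTextChunks(source) {
  const decoder = new TextDecoder("utf-8");
  // { stream: true } keeps an incomplete multi-byte sequence buffered until
  // the next chunk arrives, so characters split across chunks decode correctly.
  const decode = (chunk) =>
    typeof chunk === "string" ? chunk : decoder.decode(chunk, { stream: true });

  if (typeof source[Symbol.asyncIterator] === "function") {
    for await (const chunk of source) yield decode(chunk);
  } else {
    // Fallback for ReadableStream implementations without async iteration.
    const reader = source.getReader();
    try {
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        yield decode(value);
      }
    } finally {
      reader.releaseLock();
    }
  }

  // Flush whatever is still buffered from a trailing partial sequence.
  const tail = decoder.decode();
  if (tail) yield tail;
}
```

Consumers can then write `for await (const text of toTextChunks(response.body))` regardless of which source shape they were given.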
@@ -460,6 +562,7 @@ interface StreamingParser<T = unknown> {
   getRawBuffer(): ArrayBuffer | null; // transferable buffer for Worker postMessage
   getStatus(): FeedStatus;
   destroy(): void;
+  [Symbol.asyncIterator](): AsyncIterableIterator<T | undefined>; // requires source
 }
 type FeedStatus = "incomplete" | "complete" | "error" | "end_early";
 ```
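The README's claim that each `feed()` processes only new bytes, for O(n) total work, depends on carrying scanner state across calls. The sketch below shows the idea with a toy tracker that remembers nesting depth and string state between chunks and reports two of the `FeedStatus` values. `createDepthTracker` is a hypothetical name, it handles only object and array roots, and it does not reflect vectorjson's WASM engine.

```js
// Hypothetical sketch (not vectorjson's engine): an incremental scanner that
// touches each character once and keeps its state between feed() calls.
function createDepthTracker() {
  let depth = 0; // current nesting of {} and []
  let inString = false; // inside a JSON string literal
  let escaped = false; // previous character was a backslash inside a string
  let started = false; // a root object or array has been opened
  let scanned = 0; // total characters examined so far

  return {
    feed(chunk) {
      for (const ch of chunk) {
        scanned++;
        if (inString) {
          if (escaped) escaped = false;
          else if (ch === "\\") escaped = true;
          else if (ch === '"') inString = false;
        } else if (ch === '"') {
          inString = true;
        } else if (ch === "{" || ch === "[") {
          depth++;
          started = true;
        } else if (ch === "}" || ch === "]") {
          depth--;
        }
      }
      return started && depth === 0 && !inString ? "complete" : "incomplete";
    },
    scanned: () => scanned,
  };
}

const tracker = createDepthTracker();
console.log(tracker.feed('{"status":"')); // incomplete
console.log(tracker.feed('ok","data":')); // incomplete
console.log(tracker.feed("[1,2,3]}")); // complete
```

Because nothing is rescanned, the total work equals the total input length no matter how the input is chunked.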
@@ -501,6 +604,33 @@ type DeepPartial<T> = T extends object
 
 Event-driven streaming parser. Events fire synchronously during `feed()`.
 
+```ts
+createEventParser(); // basic
+createEventParser({ source: stream }); // for-await iteration
+createEventParser({ multiRoot: true, onRoot }); // NDJSON
+```
+
+**Options:**
+
+```ts
+{
+  multiRoot?: boolean; // auto-reset between JSON values (NDJSON)
+  onRoot?: (event: RootEvent) => void;
+  source?: ReadableStream<Uint8Array> | AsyncIterable<Uint8Array | string>;
+}
+```
+
+When `source` is provided, the parser becomes async-iterable — `for await` yields growing partial values, just like `createParser`:
+
+```ts
+const parser = createEventParser({ source: stream });
+parser.on('tool', (e) => showToolUI(e.value));
+
+for await (const partial of parser) {
+  updateUI(partial);
+}
+```
+
 ```ts
 interface EventParser {
   on(path: string, callback: (event: PathEvent) => void): EventParser;
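The `multiRoot` and `onRoot` options describe one callback per complete JSON value in an NDJSON stream. The sketch below shows that contract in its simplest form, using newline splitting and `JSON.parse`. `createNdjsonSplitter` and the `{ index, value }` event shape are assumptions made for illustration. The actual `RootEvent` fields are not visible in this diff, and the README describes vectorjson as resetting between values rather than splitting on newlines.

```js
// Hypothetical sketch (not vectorjson's mechanism): newline-delimited JSON,
// one onRoot callback per complete root value, fed in arbitrary chunks.
function createNdjsonSplitter(onRoot) {
  let buffer = ""; // text received but not yet terminated by a newline
  let index = 0; // running count of emitted roots

  const emit = (line) => {
    const trimmed = line.trim();
    if (trimmed) onRoot({ index: index++, value: JSON.parse(trimmed) });
  };

  return {
    feed(chunk) {
      buffer += chunk;
      let newline;
      while ((newline = buffer.indexOf("\n")) !== -1) {
        emit(buffer.slice(0, newline));
        buffer = buffer.slice(newline + 1);
      }
    },
    // Call once the source is exhausted to flush a final line without "\n".
    end() {
      emit(buffer);
      buffer = "";
    },
  };
}

const splitter = createNdjsonSplitter((root) => console.log(root.index, root.value));
splitter.feed('{"a":1}\n{"b"');
splitter.feed(":2}\n");
splitter.end();
```

Unlike this sketch, a parser that tracks JSON structure can detect the end of a root value without relying on a newline delimiter.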
@@ -515,6 +645,7 @@ interface EventParser {
   getRawBuffer(): ArrayBuffer | null; // transferable buffer for Worker postMessage
   getStatus(): FeedStatus;
   destroy(): void;
+  [Symbol.asyncIterator](): AsyncIterableIterator<unknown | undefined>; // requires source
 }
 ```
 
@@ -555,6 +686,18 @@ interface RootEvent {
 }
 ```
 
+### Parser comparison
+
+| | `createParser` | `createEventParser` |
+|---|---|---|
+| **Use case** | Get a growing partial object | React to individual fields as they arrive |
+| **Schema auto-pick** | Yes — schema `.shape` drives field selection | No — use `skip()` and `on()` for filtering |
+| **Dirty input handling** | Yes (when schema provided) | Yes (always) |
+| **`for await` with source** | Yes | Yes |
+| **Field subscriptions** | No | `on()`, `onDelta()`, `skip()` |
+| **Multi-root / NDJSON** | No | Yes (`multiRoot: true`) |
+| **Text callbacks** | No | `onText()` for non-JSON text |
+
 ### `deepCompare(a, b, options?): boolean`
 
 Compare two values for deep equality without materializing JS objects. When both values are VJ proxies, comparison runs entirely in WASM memory — zero allocations, zero Proxy traps.
@@ -644,6 +787,13 @@ node --expose-gc bench/deep-compare.mjs # deep compare: VJ vs JS deepEq
 
 Benchmark numbers in this README were measured on GitHub Actions (Ubuntu, x86_64). Results vary by machine but relative speedups are consistent.
 
+## Acknowledgments
+
+VectorJSON is built on the work of:
+
+- **[zimdjson](https://github.com/EzequielRamis/zimdjson)** by Ezequiel Ramis — a Zig port of simdjson that powers the WASM engine
+- **[simdjson](https://simdjson.org/)** by Daniel Lemire & Geoff Langdale — the SIMD-accelerated JSON parsing research that started it all
+
 ## License
 
 Apache-2.0