vectorjson 0.2.1 → 0.3.1

package/README.md CHANGED
@@ -59,6 +59,24 @@ parser.destroy();
 
  `getValue()` returns a **live JS object** that grows incrementally on each `feed()`. No re-parsing — each byte is scanned exactly once.
 
+ **Schema-driven streaming** — pass a schema, get only what it defines. An LLM streams a 50KB tool call with `name`, `age`, `bio`, `metadata` — but your schema only needs `name` and `age`. Everything else is skipped at the byte level, never allocated in JS:
+
+ ```js
+ import { z } from "zod";
+ import { createParser } from "vectorjson";
+
+ const User = z.object({ name: z.string(), age: z.number() });
+
+ for await (const partial of createParser({ schema: User, source: response.body })) {
+   console.log(partial);
+   // { name: "Ali" } ← partial, render immediately
+   // { name: "Alice" }
+   // { name: "Alice", age: 30 } ← validated on complete
+ }
+ ```
+
+ The schema defines what to parse. Fields outside the schema are ignored during scanning — no objects created, no strings decoded, no memory wasted.
+
  **Or skip intermediate access entirely** — if you only need the final value:
 
  ```js
@@ -308,6 +326,59 @@ parser.onText((text) => thinkingPanel.append(text)); // opt-in
  parser.feed(llmOutput);
  ```
 
+ ### Schema-driven streaming — parse only what the schema defines
+
+ Pass a schema and VectorJSON extracts only the fields it defines. Everything else is skipped at the byte level — no objects created, no strings decoded. Arrays are transparent: `{ users: z.array(z.object({ name })) }` picks through arrays automatically.
+
+ ```js
+ import { z } from "zod";
+ import { createParser } from "vectorjson";
+
+ const User = z.object({ name: z.string(), age: z.number() });
+
+ for await (const partial of createParser({ schema: User, source: response.body })) {
+   console.log(partial);
+   // { name: "Ali" } ← partial, render immediately
+   // { name: "Alice", age: 30 } ← validated on complete
+ }
+ ```
+
+ Works on dirty LLM output — think tags, code fences, and leading prose are stripped automatically when a schema is provided:
+
+ ```js
+ // All of these work with createParser(schema):
+ // <think>reasoning</think>{"name":"Alice","age":30}
+ // ```json\n{"name":"Alice","age":30}\n```
+ // Here's the result: {"name":"Alice","age":30}
+ ```
+
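To make the three dirty-input shapes above concrete, here is a minimal whole-string sketch of the same idea. It is an illustration only and not VectorJSON's implementation: `extractJsonText` is a hypothetical helper, it works on a complete string rather than a stream, and it does not handle prose that follows the JSON.

```typescript
// Hypothetical helper, not part of vectorjson: a naive, non-streaming
// approximation of "strip think tags, code fences, and leading prose".
function extractJsonText(raw: string): string {
  // 1. Drop <think>...</think> blocks.
  let text = raw.replace(/<think>[\s\S]*?<\/think>/g, "");

  // 2. If a fenced block is present, keep only its body.
  const fence = text.match(/```(?:json)?\s*([\s\S]*?)```/);
  if (fence) text = fence[1] ?? text;

  // 3. Skip leading prose up to the first '{' or '['.
  const start = text.search(/[{\[]/);
  return start === -1 ? "" : text.slice(start).trim();
}

const samples = [
  '<think>reasoning</think>{"name":"Alice","age":30}',
  '```json\n{"name":"Alice","age":30}\n```',
  'Here\'s the result: {"name":"Alice","age":30}',
];

// Each sample reduces to the same JSON text.
const cleaned = samples.map(extractJsonText);
```

All three entries of `cleaned` hold the same JSON text, which is the behaviour the README describes for schema-driven parsing.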
+ Both `createParser` and `createEventParser` support `source` + `for await`:
+
+ ```js
+ import { createEventParser } from "vectorjson";
+
+ const parser = createEventParser({ source: response.body });
+ parser.on('tool', (e) => showToolUI(e.value));
+
+ for await (const partial of parser) {
+   updateUI(partial); // growing partial value
+ }
+ ```
+
+ Works with any async source — fetch body, WebSocket wrapper, SSE adapter, or a plain async generator:
+
+ ```js
+ async function* chunks() {
+   yield '{"status":"';
+   yield 'ok","data":';
+   yield '[1,2,3]}';
+ }
+
+ for await (const partial of createParser({ source: chunks() })) {
+   console.log(partial);
+ }
+ ```
+
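The "SSE adapter" case mentioned above could look roughly like the following. This is a sketch under assumptions: `sseEventData` and `sseToChunks` are hypothetical names, not vectorjson exports, and the input is assumed to be already-decoded text with LF line endings. The generator yields strings, which fits the `AsyncIterable<Uint8Array | string>` type listed for `source`.

```typescript
// Hypothetical helper: pull the payload out of one complete SSE event block.
// Returns null when the block carries no `data:` field (for example a comment).
function sseEventData(block: string): string | null {
  const data: string[] = [];
  for (const line of block.split("\n")) {
    if (line.startsWith("data:")) {
      // One optional space after the colon is not part of the value.
      data.push(line.slice(5).replace(/^ /, ""));
    }
  }
  // Multiple data lines in one event are joined with newlines.
  return data.length > 0 ? data.join("\n") : null;
}

// Hypothetical adapter: turn raw SSE text pieces into JSON text chunks.
async function* sseToChunks(raw: AsyncIterable<string>): AsyncGenerator<string> {
  let buffer = "";
  for await (const piece of raw) {
    buffer += piece;
    let end: number;
    // Events are terminated by a blank line (assumes LF-only input).
    while ((end = buffer.indexOf("\n\n")) !== -1) {
      const data = sseEventData(buffer.slice(0, end));
      buffer = buffer.slice(end + 2);
      if (data !== null) yield data;
    }
  }
}
```

Under these assumptions, passing `sseToChunks(rawText)` as `source` (with `rawText` being whatever async text stream the application has) would hand the parser only the `data:` payloads.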
  ### Schema validation
 
  Validate and auto-infer types with Zod, Valibot, ArkType, or any lib with `.safeParse()`. Works on all three APIs:
@@ -449,8 +520,39 @@ interface ParseResult {
  - **`invalid`** — broken JSON
 
  ### `createParser(schema?): StreamingParser<T>`
+ ### `createParser(options?): StreamingParser<T>`
+
+ Each `feed()` processes only new bytes — O(n) total. Three overloads:
+
+ ```ts
+ createParser(); // no validation
+ createParser(schema); // schema validation + auto-pick + dirty input handling
+ createParser({ schema, source }); // options object
+ ```
+
+ **Options object:**
+
+ ```ts
+ interface CreateParserOptions<T = unknown> {
+   schema?: ZodLike<T>; // validate on complete, auto-pick from shape, skip dirty input
+   source?: ReadableStream<Uint8Array> | AsyncIterable<Uint8Array | string>;
+   pick?: string[]; // advanced: explicit field paths (overrides schema auto-pick)
+ }
+ ```
 
- Each `feed()` processes only new bytes — O(n) total. Pass an optional schema to auto-validate and infer the return type.
+ When a `schema` is provided:
+ - Fields are auto-picked from the schema's `.shape` — only matching fields are parsed
+ - Arrays are transparent — `{ users: z.array(z.object({ name })) }` picks `users.name` through arrays
+ - Dirty input (think tags, code fences, leading prose) is stripped before parsing
+ - On complete, `safeParse()` validates the final value
+
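The auto-pick rules above can be illustrated with a small sketch. It uses neither Zod nor VectorJSON internals: `SchemaNode` is a hypothetical stand-in for a schema tree, and `pickPaths` shows one way dotted paths such as `users.name` could be derived when arrays are treated as transparent.

```typescript
// Hypothetical stand-in for a schema tree (not Zod, not vectorjson).
type SchemaNode =
  | { kind: "leaf" }
  | { kind: "array"; element: SchemaNode }
  | { kind: "object"; shape: Record<string, SchemaNode> };

// Collect dotted field paths. Arrays add no path segment of their own.
function pickPaths(node: SchemaNode, prefix = ""): string[] {
  switch (node.kind) {
    case "leaf":
      return prefix ? [prefix] : [];
    case "array":
      return pickPaths(node.element, prefix);
    case "object": {
      const out: string[] = [];
      for (const [key, child] of Object.entries(node.shape)) {
        out.push(...pickPaths(child, prefix ? `${prefix}.${key}` : key));
      }
      return out;
    }
  }
}

const leafNode: SchemaNode = { kind: "leaf" };
const exampleSchema: SchemaNode = {
  kind: "object",
  shape: {
    users: { kind: "array", element: { kind: "object", shape: { name: leafNode } } },
    total: leafNode,
  },
};

const derivedPaths = pickPaths(exampleSchema); // ["users.name", "total"]
```

The documented `pick?: string[]` option presumably takes paths of a similar form, though the exact path syntax is not shown in this diff.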
+ When `source` is provided, the parser becomes async-iterable — use `for await` to consume partial values:
+
+ ```ts
+ for await (const partial of createParser({ schema: User, source: stream })) {
+   console.log(partial); // growing object with only schema fields
+ }
+ ```
 
  ```ts
  interface StreamingParser<T = unknown> {
@@ -460,6 +562,7 @@ interface StreamingParser<T = unknown> {
    getRawBuffer(): ArrayBuffer | null; // transferable buffer for Worker postMessage
    getStatus(): FeedStatus;
    destroy(): void;
+   [Symbol.asyncIterator](): AsyncIterableIterator<T | undefined>; // requires source
  }
  type FeedStatus = "incomplete" | "complete" | "error" | "end_early";
  ```
@@ -501,6 +604,33 @@ type DeepPartial<T> = T extends object
 
  Event-driven streaming parser. Events fire synchronously during `feed()`.
 
+ ```ts
+ createEventParser(); // basic
+ createEventParser({ source: stream }); // for-await iteration
+ createEventParser({ multiRoot: true, onRoot }); // NDJSON
+ ```
+
+ **Options:**
+
+ ```ts
+ {
+   multiRoot?: boolean; // auto-reset between JSON values (NDJSON)
+   onRoot?: (event: RootEvent) => void;
+   source?: ReadableStream<Uint8Array> | AsyncIterable<Uint8Array | string>;
+ }
+ ```
+
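For readers unfamiliar with NDJSON, the multi-root case means several complete JSON values arriving back to back, one per line. The sketch below shows that input shape using only `JSON.parse`. It is not how `multiRoot` is implemented, and `NdjsonBuffer` is a hypothetical name.

```typescript
// Hypothetical helper, not a vectorjson API: buffers text chunks and
// returns every complete newline-terminated JSON value seen so far.
class NdjsonBuffer {
  private pending = "";

  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last piece may be an incomplete line; keep it for the next push.
    this.pending = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as unknown);
  }
}

const ndjson = new NdjsonBuffer();
const firstRoots = ndjson.push('{"id":1}\n{"id"');      // one complete root
const laterRoots = ndjson.push(':2}\n{"id":3}\n');      // two more roots
```

With `createEventParser({ multiRoot: true, onRoot })`, the README indicates that the parser resets between such values on its own, so no manual line splitting is needed.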
+ When `source` is provided, the parser becomes async-iterable — `for await` yields growing partial values, just like `createParser`:
+
+ ```ts
+ const parser = createEventParser({ source: stream });
+ parser.on('tool', (e) => showToolUI(e.value));
+
+ for await (const partial of parser) {
+   updateUI(partial);
+ }
+ ```
+
  ```ts
  interface EventParser {
    on(path: string, callback: (event: PathEvent) => void): EventParser;
@@ -515,6 +645,7 @@ interface EventParser {
    getRawBuffer(): ArrayBuffer | null; // transferable buffer for Worker postMessage
    getStatus(): FeedStatus;
    destroy(): void;
+   [Symbol.asyncIterator](): AsyncIterableIterator<unknown | undefined>; // requires source
  }
  ```
 
@@ -555,6 +686,18 @@ interface RootEvent {
  }
  ```
 
+ ### Parser comparison
+
+ | | `createParser` | `createEventParser` |
+ |---|---|---|
+ | **Use case** | Get a growing partial object | React to individual fields as they arrive |
+ | **Schema auto-pick** | Yes — schema `.shape` drives field selection | No — use `skip()` and `on()` for filtering |
+ | **Dirty input handling** | Yes (when schema provided) | Yes (always) |
+ | **`for await` with source** | Yes | Yes |
+ | **Field subscriptions** | No | `on()`, `onDelta()`, `skip()` |
+ | **Multi-root / NDJSON** | No | Yes (`multiRoot: true`) |
+ | **Text callbacks** | No | `onText()` for non-JSON text |
+
  ### `deepCompare(a, b, options?): boolean`
 
  Compare two values for deep equality without materializing JS objects. When both values are VJ proxies, comparison runs entirely in WASM memory — zero allocations, zero Proxy traps.
@@ -644,6 +787,13 @@ node --expose-gc bench/deep-compare.mjs # deep compare: VJ vs JS deepEq
 
  Benchmark numbers in this README were measured on GitHub Actions (Ubuntu, x86_64). Results vary by machine but relative speedups are consistent.
 
+ ## Acknowledgments
+
+ VectorJSON is built on the work of:
+
+ - **[zimdjson](https://github.com/EzequielRamis/zimdjson)** by Ezequiel Ramis — a Zig port of simdjson that powers the WASM engine
+ - **[simdjson](https://simdjson.org/)** by Daniel Lemire & Geoff Langdale — the SIMD-accelerated JSON parsing research that started it all
+
  ## License
 
  Apache-2.0