vectorjson 0.3.1 → 0.4.0

package/README.md CHANGED
@@ -98,7 +98,6 @@ const parser = createEventParser();
 
  parser.on('tool', (e) => showToolUI(e.value)); // fires immediately
  parser.onDelta('code', (e) => editor.append(e.value)); // streams char-by-char
- parser.skip('explanation'); // never materialized
 
  for await (const chunk of llmStream) {
    parser.feed(chunk); // O(n) — only new bytes scanned
@@ -279,35 +278,12 @@ parser.onDelta('tool_calls[0].args.code', (e) => {
    editor.append(e.value); // just the new characters, decoded
  });
 
- // Don't waste CPU on fields you don't need
- parser.skip('tool_calls[*].args.explanation');
-
  for await (const chunk of llmStream) {
    parser.feed(chunk);
  }
  parser.destroy();
  ```
 
- ### Multi-root / NDJSON
-
- Some LLM APIs stream multiple JSON values separated by newlines. VectorJSON auto-resets between values:
-
- ```js
- import { createEventParser } from "vectorjson";
-
- const parser = createEventParser({
-   multiRoot: true,
-   onRoot(event) {
-     console.log(`Root #${event.index}:`, event.value);
-   }
- });
-
- for await (const chunk of ndjsonStream) {
-   parser.feed(chunk);
- }
- parser.destroy();
- ```
-
  ### Mixed LLM output (chain-of-thought, code fences)
 
  Some models emit thinking text before JSON, or wrap JSON in code fences. VectorJSON finds the JSON automatically:
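As a rough illustration of what locating JSON inside mixed LLM output involves, here is a minimal, non-streaming sketch in plain JavaScript. It is not VectorJSON's implementation: `extractFirstJson` is a hypothetical helper, and it assumes that any braces or brackets outside `<think>` tags belong to the JSON payload.

```javascript
// Hypothetical helper, NOT part of vectorjson: drops <think>...</think> blocks,
// then returns the first balanced JSON object or array found in the text.
function extractFirstJson(text) {
  const cleaned = text.replace(/<think>[\s\S]*?<\/think>/g, "");
  const start = cleaned.search(/[{[]/);
  if (start === -1) return undefined; // no JSON-looking content at all

  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < cleaned.length; i++) {
    const ch = cleaned[i];
    if (inString) {
      // Inside a string, only an unescaped double quote ends it.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      depth++;
    } else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return JSON.parse(cleaned.slice(start, i + 1));
    }
  }
  return undefined; // the value never closed, so the input is incomplete
}

const fence = "`".repeat(3); // avoids writing a literal code fence in this example
const samples = [
  '<think>reasoning</think>{"name":"Alice","age":30}',
  fence + 'json\n{"name":"Alice","age":30}\n' + fence,
  `Here's the result: {"name":"Alice","age":30}`,
];
const extracted = samples.map(extractFirstJson);
console.log(extracted.every((v) => v.name === "Alice" && v.age === 30));
```

A streaming parser cannot buffer the whole response like this, so the sketch only shows the scanning idea (skip non-JSON text, track string and nesting state), not how it is done incrementally.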
@@ -343,15 +319,6 @@ for await (const partial of createParser({ schema: User, source: response.body }
  }
  ```
 
- Works on dirty LLM output — think tags, code fences, and leading prose are stripped automatically when a schema is provided:
-
- ```js
- // All of these work with createParser(schema):
- // <think>reasoning</think>{"name":"Alice","age":30}
- // ```json\n{"name":"Alice","age":30}\n```
- // Here's the result: {"name":"Alice","age":30}
- ```
-
  Both `createParser` and `createEventParser` support `source` + `for await`:
 
  ```js
@@ -487,6 +454,24 @@ result.status; // "complete" | "complete_early" | "incomplete" | "invalid"
  result.value.users; // lazy Proxy — materializes on access
  ```
 
+ ### JSONL & JSON5
+
+ Both `createParser` and `createEventParser` accept `format: "jsonl" | "json5"`:
+
+ ```js
+ // JSONL — yields each value separately
+ for await (const value of createParser({ format: "jsonl", source: stream })) {
+   console.log(value); // { user: "Alice" }, { user: "Bob" }, ...
+ }
+
+ // JSON5 — comments, trailing commas, unquoted keys, single-quoted strings, hex, Infinity/NaN
+ const p = createParser({ format: "json5" });
+ p.feed(`{ name: 'Alice', tags: ['admin',], color: 0xFF0000, timeout: Infinity, }`);
+ p.getValue(); // { name: "Alice", tags: ["admin"], color: 16711680, timeout: Infinity }
+ ```
+
+ JSONL push-based: call `resetForNext()` after each value. JSON5 comments are stripped at the byte level during streaming.
+
  ## API Reference
 
  ### Direct exports (recommended)
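To make the JSONL behaviour added in this hunk concrete ("yields each value separately", with values that may straddle chunk boundaries), here is a minimal push-based sketch in plain JavaScript. `JsonlSplitter` is a hypothetical stand-in built on `JSON.parse`; it does not use or reproduce VectorJSON's `resetForNext()` mechanism.

```javascript
// Hypothetical stand-in, NOT vectorjson's implementation: buffers text chunks
// and parses each complete newline-terminated line with JSON.parse.
class JsonlSplitter {
  constructor() {
    this.buffer = ""; // holds the trailing line that has no newline yet
  }

  // Feed one chunk; returns the values whose lines this chunk completed.
  feed(chunk) {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop(); // the last piece is still unterminated
    return lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line));
  }

  // Flush a final line that was never newline-terminated.
  end() {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest.length > 0 ? [JSON.parse(rest)] : [];
  }
}

const splitter = new JsonlSplitter();
const values = [
  ...splitter.feed('{"user":"Ali'), // value split across two chunks
  ...splitter.feed('ce"}\n{"user":"Bob"}\n{"user"'),
  ...splitter.feed(':"Carol"}'),
  ...splitter.end(),
];
console.log(values.map((v) => v.user)); // Alice, Bob, Carol
```

The sketch waits for a full line before parsing, which is simpler than incremental parsing but gives no partial values while a line is still arriving.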
@@ -526,7 +511,7 @@ Each `feed()` processes only new bytes — O(n) total. Three overloads:
 
  ```ts
  createParser(); // no validation
- createParser(schema); // schema validation + auto-pick + dirty input handling
+ createParser(schema); // only parse schema fields, validate on complete
  createParser({ schema, source }); // options object
  ```
 
@@ -534,16 +519,15 @@ createParser({ schema, source }); // options object
 
  ```ts
  interface CreateParserOptions<T = unknown> {
-   schema?: ZodLike<T>; // validate on complete, auto-pick from shape, skip dirty input
+   schema?: ZodLike<T>; // only parse schema fields, validate on complete
    source?: ReadableStream<Uint8Array> | AsyncIterable<Uint8Array | string>;
-   pick?: string[]; // advanced: explicit field paths (overrides schema auto-pick)
+   format?: "json" | "jsonl" | "json5"; // default: "json"
  }
  ```
 
  When a `schema` is provided:
- - Fields are auto-picked from the schema's `.shape`; only matching fields are parsed
- - Arrays are transparent — `{ users: z.array(z.object({ name })) }` picks `users.name` through arrays
- - Dirty input (think tags, code fences, leading prose) is stripped before parsing
+ - Only fields defined in the schema are parsed; everything else is skipped at the byte level
+ - Arrays are transparent — `z.array(z.object({ name }))` parses `name` inside each array element
  - On complete, `safeParse()` validates the final value
 
  When `source` is provided, the parser becomes async-iterable — use `for await` to consume partial values:
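The schema bullets in this hunk say that only schema fields are parsed and that arrays are transparent. The sketch below shows one way a set of field paths could be derived from a schema's shape. It is an assumption, not VectorJSON's code: `collectPaths`, `object`, `array`, and `leaf` are hypothetical stand-ins, with objects exposing `shape` and arrays exposing `element`.

```javascript
// Hypothetical stand-ins for Zod-like schemas (NOT real Zod or vectorjson):
// objects carry `shape`, arrays carry `element`, everything else is a leaf.
const object = (shape) => ({ shape });
const array = (element) => ({ element });
const leaf = () => ({});

// Walk the schema and return dotted paths to every leaf field.
function collectPaths(schema, prefix = "") {
  // Arrays are transparent: descend into the element without extending the path.
  if (schema.element) return collectPaths(schema.element, prefix);
  // A leaf contributes its own path (nothing at the root).
  if (!schema.shape) return prefix ? [prefix] : [];
  return Object.entries(schema.shape).flatMap(([key, child]) =>
    collectPaths(child, prefix ? `${prefix}.${key}` : key)
  );
}

const Response = object({
  id: leaf(),
  users: array(object({ name: leaf() })),
});
const paths = collectPaths(Response);
console.log(paths); // id, users.name
```

Any key in the input that is not on one of these paths could then be skipped without being materialized, which is the behaviour the new bullet describes.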
@@ -561,6 +545,7 @@ interface StreamingParser<T = unknown> {
    getRemaining(): Uint8Array | null;
    getRawBuffer(): ArrayBuffer | null; // transferable buffer for Worker postMessage
    getStatus(): FeedStatus;
+   resetForNext(): number; // JSONL: reset for next value, returns remaining byte count
    destroy(): void;
    [Symbol.asyncIterator](): AsyncIterableIterator<T | undefined>; // requires source
  }
@@ -607,16 +592,16 @@ Event-driven streaming parser. Events fire synchronously during `feed()`.
  ```ts
  createEventParser(); // basic
  createEventParser({ source: stream }); // for-await iteration
- createEventParser({ multiRoot: true, onRoot }); // NDJSON
+ createEventParser({ schema, source }); // schema + for-await
  ```
 
  **Options:**
 
  ```ts
  {
-   multiRoot?: boolean; // auto-reset between JSON values (NDJSON)
-   onRoot?: (event: RootEvent) => void;
    source?: ReadableStream<Uint8Array> | AsyncIterable<Uint8Array | string>;
+   schema?: ZodLike<T>; // only parse schema fields (same as createParser)
+   format?: "json" | "jsonl" | "json5"; // default: "json"
  }
  ```
 
@@ -637,7 +622,6 @@ interface EventParser {
    on<T>(path: string, schema: { safeParse: Function }, callback: (event: PathEvent & { value: T }) => void): EventParser;
    onDelta(path: string, callback: (event: DeltaEvent) => void): EventParser;
    onText(callback: (text: string) => void): EventParser;
-   skip(...paths: string[]): EventParser;
    off(path: string, callback?: Function): EventParser;
    feed(chunk: string | Uint8Array): FeedStatus;
    getValue(): unknown | undefined; // undefined while incomplete, throws on parse errors
@@ -649,7 +633,7 @@ interface EventParser {
  }
  ```
 
- All methods return `self` for chaining: `parser.on(...).onDelta(...).skip(...)`.
+ All methods return `self` for chaining: `parser.on(...).onDelta(...)`.
 
  **Path syntax:**
  - `foo.bar` — exact key
@@ -679,24 +663,48 @@ interface DeltaEvent {
    length: number; // byte length of delta (raw bytes, not char count)
  }
 
- interface RootEvent {
-   type: 'root';
-   index: number; // which root value (0, 1, 2...)
-   value: unknown; // parsed via doc_parse
- }
  ```
 
  ### Parser comparison
 
  | | `createParser` | `createEventParser` |
  |---|---|---|
- | **Use case** | Get a growing partial object | React to individual fields as they arrive |
- | **Schema auto-pick** | Yes; schema `.shape` drives field selection | No; use `skip()` and `on()` for filtering |
- | **Dirty input handling** | Yes (when schema provided) | Yes (always) |
- | **`for await` with source** | Yes | Yes |
- | **Field subscriptions** | No | `on()`, `onDelta()`, `skip()` |
- | **Multi-root / NDJSON** | No | Yes (`multiRoot: true`) |
- | **Text callbacks** | No | `onText()` for non-JSON text |
+ | **Completion** | `feed()` returns `"complete"` after one JSON value | Handles multiple JSON values; user calls `destroy()` when done |
+ | **Malformed JSON** | `feed()` returns `"error"` | Skips it, finds the next JSON |
+ | **Schema** | Pass Zod/Valibot, only schema fields are parsed | Same |
+ | **Skip non-JSON** (think tags, code fences, prose) | | Always |
+ | **Field subscriptions** | | `on()`, `onDelta()` |
+ | **JSONL** | `format: "jsonl"` | `format: "jsonl"` |
+ | **Text callbacks** | | `onText()` |
+
+ **`createParser` parses one JSON value** and reports status — you check it and react:
+
+ ```js
+ const parser = createParser();
+ for await (const chunk of stream) {
+   const status = parser.feed(chunk);
+   if (status === "complete") break; // done — one JSON value parsed
+   if (status === "error") break; // malformed JSON detected
+ }
+ const result = parser.getValue();
+ parser.destroy();
+ ```
+
+ **`createEventParser` handles an entire LLM response** — text, thinking, code fences, all in one stream:
+
+ ```js
+ const parser = createEventParser();
+ parser.on('tool', (e) => showToolUI(e.value));
+ parser.onText((text) => thinkingPanel.append(text));
+
+ // LLM output with mixed content:
+ // <think>let me reason about this...</think>
+ // {"tool":"search","query":"weather"}
+ for await (const chunk of llmStream) {
+   parser.feed(chunk); // strips think tags, finds JSON, fires callbacks
+ }
+ parser.destroy();
+ ```
 
  ### `deepCompare(a, b, options?): boolean`