npm - vectorjson - Versions diffs - 0.2.1 → 0.3.1 - Mend

vectorjson 0.2.1 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/README.md +151 -1
package/dist/engine-wasm.generated.d.ts +1 -1
package/dist/engine.wasm +0 -0
package/dist/index.d.ts +38 -12
package/dist/index.d.ts.map +1 -1
package/dist/index.js +5 -5
package/dist/index.js.map +4 -4
package/package.json +2 -2

package/README.md CHANGED

@@ -59,6 +59,24 @@ parser.destroy();
 `getValue()` returns a **live JS object** that grows incrementally on each `feed()`. No re-parsing — each byte is scanned exactly once.
+**Schema-driven streaming** — pass a schema, get only what it defines. An LLM streams a 50KB tool call with `name`, `age`, `bio`, `metadata` — but your schema only needs `name` and `age`. Everything else is skipped at the byte level, never allocated in JS:
+```js
+import { z } from "zod";
+import { createParser } from "vectorjson";
+const User = z.object({ name: z.string(), age: z.number() });
+for await (const partial of createParser({ schema: User, source: response.body })) {
+  console.log(partial);
+  // { name: "Ali" }              ← partial, render immediately
+  // { name: "Alice" }
+  // { name: "Alice", age: 30 }   ← validated on complete
+}
+```
+The schema defines what to parse. Fields outside the schema are ignored during scanning — no objects created, no strings decoded, no memory wasted.
 **Or skip intermediate access entirely** — if you only need the final value:
 ```js
@@ -308,6 +326,59 @@ parser.onText((text) => thinkingPanel.append(text)); // opt-in
 parser.feed(llmOutput);
 ```
+### Schema-driven streaming — parse only what the schema defines
+Pass a schema and VectorJSON extracts only the fields it defines. Everything else is skipped at the byte level — no objects created, no strings decoded. Arrays are transparent: `{ users: z.array(z.object({ name })) }` picks through arrays automatically.
+```js
+import { z } from "zod";
+import { createParser } from "vectorjson";
+const User = z.object({ name: z.string(), age: z.number() });
+for await (const partial of createParser({ schema: User, source: response.body })) {
+  console.log(partial);
+  // { name: "Ali" }              ← partial, render immediately
+  // { name: "Alice", age: 30 }   ← validated on complete
+}
+```
+Works on dirty LLM output — think tags, code fences, and leading prose are stripped automatically when a schema is provided:
+```js
+// All of these work with createParser(schema):
+// <think>reasoning</think>{"name":"Alice","age":30}
+// ```json\n{"name":"Alice","age":30}\n```
+// Here's the result: {"name":"Alice","age":30}
+```
+Both `createParser` and `createEventParser` support `source` + `for await`:
+```js
+import { createEventParser } from "vectorjson";
+const parser = createEventParser({ source: response.body });
+parser.on('tool', (e) => showToolUI(e.value));
+for await (const partial of parser) {
+  updateUI(partial); // growing partial value
+}
+```
+Works with any async source — fetch body, WebSocket wrapper, SSE adapter, or a plain async generator:
+```js
+async function* chunks() {
+  yield '{"status":"';
+  yield 'ok","data":';
+  yield '[1,2,3]}';
+}
+for await (const partial of createParser({ source: chunks() })) {
+  console.log(partial);
+}
+```
 ### Schema validation
 Validate and auto-infer types with Zod, Valibot, ArkType, or any lib with `.safeParse()`. Works on all three APIs:
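
Note on the hunk above: the README says think tags, code fences, and leading prose are stripped when a schema is provided, but the diff does not show how. The sketch below is only a rough approximation of that cleanup for a complete string. `stripDirty` is a hypothetical helper, not a vectorjson export, and the library presumably does this incrementally on streamed bytes rather than with regular expressions.

```javascript
// Hypothetical helper (not part of vectorjson): approximates "dirty input"
// cleanup on a complete string. Assumes at most one fenced block and that
// any leading prose contains no "{" or "[" characters.
const FENCE = "`".repeat(3); // built at runtime to avoid a literal fence in this example

function stripDirty(text) {
  // 1. Drop <think>...</think> reasoning blocks.
  let s = text.replace(/<think>[\s\S]*?<\/think>/g, "");

  // 2. If a fenced block is present, keep only its contents.
  const fenced = new RegExp(FENCE + "(?:\\w+)?\\s*([\\s\\S]*?)" + FENCE);
  const match = s.match(fenced);
  if (match) s = match[1];

  // 3. Skip leading prose up to the first "{" or "[".
  const start = s.search(/[{\[]/);
  return start === -1 ? s.trim() : s.slice(start).trim();
}

// The three inputs listed in the README hunk:
const samples = [
  '<think>reasoning</think>{"name":"Alice","age":30}',
  FENCE + 'json\n{"name":"Alice","age":30}\n' + FENCE,
  'Here\'s the result: {"name":"Alice","age":30}',
];
for (const sample of samples) {
  console.log(JSON.parse(stripDirty(sample))); // { name: 'Alice', age: 30 }
}
```

A whole-string pass like this cannot yield partial values while the response is still arriving, which is the point of the streaming behavior the README describes.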
@@ -449,8 +520,39 @@ interface ParseResult {
 - **`invalid`** — broken JSON
 ### `createParser(schema?): StreamingParser<T>`
+### `createParser(options?): StreamingParser<T>`
+Each `feed()` processes only new bytes — O(n) total. Three overloads:
+```ts
+createParser();                    // no validation
+createParser(schema);              // schema validation + auto-pick + dirty input handling
+createParser({ schema, source });  // options object
+```
+**Options object:**
+```ts
+interface CreateParserOptions<T = unknown> {
+  schema?: ZodLike<T>;   // validate on complete, auto-pick from shape, skip dirty input
+  source?: ReadableStream<Uint8Array> | AsyncIterable<Uint8Array | string>;
+  pick?: string[];       // advanced: explicit field paths (overrides schema auto-pick)
+}
+```
-Each `feed()` processes only new bytes — O(n) total. Pass an optional schema to auto-validate and infer the return type.
+When a `schema` is provided:
+- Fields are auto-picked from the schema's `.shape` — only matching fields are parsed
+- Arrays are transparent — `{ users: z.array(z.object({ name })) }` picks `users.name` through arrays
+- Dirty input (think tags, code fences, leading prose) is stripped before parsing
+- On complete, `safeParse()` validates the final value
+When `source` is provided, the parser becomes async-iterable — use `for await` to consume partial values:
+```ts
+for await (const partial of createParser({ schema: User, source: stream })) {
+  console.log(partial); // growing object with only schema fields
+}
+```
 ```ts
 interface StreamingParser<T = unknown> {
@@ -460,6 +562,7 @@ interface StreamingParser<T = unknown> {
   getRawBuffer(): ArrayBuffer | null;  // transferable buffer for Worker postMessage
   getStatus(): FeedStatus;
   destroy(): void;
+  [Symbol.asyncIterator](): AsyncIterableIterator<T | undefined>;  // requires source
 }
 type FeedStatus = "incomplete" | "complete" | "error" | "end_early";
 ```
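
Note on the hunk above: the new options say fields are auto-picked from the schema's `.shape` and that arrays are transparent, so `{ users: z.array(z.object({ name })) }` picks `users.name`. A minimal sketch of how such paths could be derived follows. `pickPaths` is a hypothetical function, and the plain objects stand in for schema instances under the assumption that object schemas expose `.shape` and array schemas expose `.element`. It is not taken from the package source.

```javascript
// Hypothetical sketch (not vectorjson's implementation): derive dotted field
// paths from a schema-like tree. Objects carry `shape`, arrays carry `element`,
// and anything else is treated as a leaf.
function pickPaths(schema, prefix = "") {
  // Arrays are transparent: descend into the element without extending the path.
  if (schema && schema.element) return pickPaths(schema.element, prefix);

  if (schema && schema.shape) {
    return Object.entries(schema.shape).flatMap(([key, child]) => {
      const path = prefix ? prefix + "." + key : key;
      const nested = pickPaths(child, path);
      // A child with no nested fields is a leaf, so the path ends here.
      return nested.length ? nested : [path];
    });
  }
  return []; // leaf schema: contributes no nested paths
}

// Stand-ins for z.object({ name, age }) and z.object({ users: z.array(z.object({ name })) })
const userSchema = { shape: { name: {}, age: {} } };
const usersSchema = { shape: { users: { element: { shape: { name: {} } } } } };

console.log(pickPaths(userSchema));  // [ 'name', 'age' ]
console.log(pickPaths(usersSchema)); // [ 'users.name' ]
```

Per the `CreateParserOptions` comment in the diff, an explicit `pick` array overrides whatever the schema-derived selection would be.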
@@ -501,6 +604,33 @@ type DeepPartial<T> = T extends object
 Event-driven streaming parser. Events fire synchronously during `feed()`.
+```ts
+createEventParser();                              // basic
+createEventParser({ source: stream });            // for-await iteration
+createEventParser({ multiRoot: true, onRoot });   // NDJSON
+```
+**Options:**
+```ts
+{
+  multiRoot?: boolean;   // auto-reset between JSON values (NDJSON)
+  onRoot?: (event: RootEvent) => void;
+  source?: ReadableStream<Uint8Array> | AsyncIterable<Uint8Array | string>;
+}
+```
+When `source` is provided, the parser becomes async-iterable — `for await` yields growing partial values, just like `createParser`:
+```ts
+const parser = createEventParser({ source: stream });
+parser.on('tool', (e) => showToolUI(e.value));
+for await (const partial of parser) {
+  updateUI(partial);
+}
+```
 ```ts
 interface EventParser {
   on(path: string, callback: (event: PathEvent) => void): EventParser;
@@ -515,6 +645,7 @@ interface EventParser {
   getRawBuffer(): ArrayBuffer | null;  // transferable buffer for Worker postMessage
   getStatus(): FeedStatus;
   destroy(): void;
+  [Symbol.asyncIterator](): AsyncIterableIterator<unknown | undefined>;  // requires source
 }
 ```
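
Note on the hunk above: `multiRoot: true` is described as auto-resetting between JSON values for NDJSON, with `onRoot` called per value. To illustrate the behavior being replaced, here is a naive line-based stand-in that buffers chunks and parses each complete line. `createLineSplitter` and the `{ index, value }` event shape are hypothetical, since the `RootEvent` fields are not shown in this diff.

```javascript
// Hypothetical stand-in (not vectorjson): a naive NDJSON splitter that calls
// onRoot once per complete line, even when a value is split across chunks.
function createLineSplitter(onRoot) {
  let buffer = "";
  let index = 0;

  // Parse one buffered line and report it; blank lines are ignored.
  const emit = (line) => {
    const trimmed = line.trim();
    if (trimmed) onRoot({ index: index++, value: JSON.parse(trimmed) });
  };

  return {
    feed(chunk) {
      buffer += chunk;
      let newline;
      while ((newline = buffer.indexOf("\n")) !== -1) {
        emit(buffer.slice(0, newline));
        buffer = buffer.slice(newline + 1);
      }
    },
    end() {
      // Flush a final value that has no trailing newline.
      emit(buffer);
      buffer = "";
    },
  };
}

const roots = [];
const splitter = createLineSplitter((event) => roots.push(event));
splitter.feed('{"a":1}\n{"b"'); // second value is cut mid-key
splitter.feed(':2}\n{"c":3}');
splitter.end();
console.log(roots.map((r) => r.value)); // [ { a: 1 }, { b: 2 }, { c: 3 } ]
```

Unlike this sketch, which waits for a full line and then parses it in one go, the README describes events firing synchronously during `feed()` as bytes arrive.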
@@ -555,6 +686,18 @@ interface RootEvent {
 }
 ```
+### Parser comparison
+| | `createParser` | `createEventParser` |
+|---|---|---|
+| **Use case** | Get a growing partial object | React to individual fields as they arrive |
+| **Schema auto-pick** | Yes — schema `.shape` drives field selection | No — use `skip()` and `on()` for filtering |
+| **Dirty input handling** | Yes (when schema provided) | Yes (always) |
+| **`for await` with source** | Yes | Yes |
+| **Field subscriptions** | No | `on()`, `onDelta()`, `skip()` |
+| **Multi-root / NDJSON** | No | Yes (`multiRoot: true`) |
+| **Text callbacks** | No | `onText()` for non-JSON text |
 ### `deepCompare(a, b, options?): boolean`
 Compare two values for deep equality without materializing JS objects. When both values are VJ proxies, comparison runs entirely in WASM memory — zero allocations, zero Proxy traps.
@@ -644,6 +787,13 @@ node --expose-gc bench/deep-compare.mjs          # deep compare: VJ vs JS deepEq
 Benchmark numbers in this README were measured on GitHub Actions (Ubuntu, x86_64). Results vary by machine but relative speedups are consistent.
+## Acknowledgments
+VectorJSON is built on the work of:
+- **[zimdjson](https://github.com/EzequielRamis/zimdjson)** by Ezequiel Ramis — a Zig port of simdjson that powers the WASM engine
+- **[simdjson](https://simdjson.org/)** by Daniel Lemire & Geoff Langdale — the SIMD-accelerated JSON parsing research that started it all
 ## License
 Apache-2.0