llm-stream-assemble 1.0.1 → 1.2.0

package/README.md CHANGED
@@ -1,32 +1,81 @@
1
1
  # llm-stream-assemble
2
2
 
3
- ![core](https://img.shields.io/badge/core-1.0.1-blue)
3
+ ![core](https://img.shields.io/badge/core-1.2.0-blue)
4
4
  ![node](https://img.shields.io/badge/node-%3E%3D18-339933)
5
5
  ![runtime deps](https://img.shields.io/badge/runtime_deps-0-brightgreen)
6
- ![tests](https://img.shields.io/badge/tests-547%2B_passing-brightgreen)
6
+ ![tests](https://img.shields.io/badge/tests-755%2B_passing-brightgreen)
7
7
  [![ci](https://github.com/01laky/llm-stream-assemble/actions/workflows/ci.yml/badge.svg)](https://github.com/01laky/llm-stream-assemble/actions/workflows/ci.yml)
8
- ![status](https://img.shields.io/badge/status-stable_1.0.1-brightgreen)
8
+ ![status](https://img.shields.io/badge/status-stable_1.2.0-brightgreen)
9
9
 
10
- A zero-dependency TypeScript library that normalizes LLM streaming responses — text, tool calls, reasoning, JSON, usage, errors, and non-streaming payloads — into unified events.
10
+ **One typed event model for every LLM stream** — text, tool calls, reasoning, JSON, usage, refusals, errors, and non-streaming responses.
11
11
 
12
- **Status:** Stable `1.0.1`. Core, OpenAI Chat, OpenAI-compatible, Anthropic Messages, OpenAI Responses adapters, transforms, replay helpers, and examples are production-ready. Pin semver ranges as usual and review [CHANGELOG.md](./CHANGELOG.md) before major upgrades.
12
+ > A zero-dependency TypeScript layer for assembling **OpenAI**, **Anthropic**, **Google Gemini**, and **OpenAI-compatible** LLM streams into unified events so you can stop hand-rolling provider parsers and keep one clean, typed event model across chat UIs, agents, proxies, and backends.
13
13
 
14
- > A zero-dependency TypeScript layer for assembling OpenAI, Anthropic, and compatible LLM streams into unified events for text, tool calls, reasoning, JSON, usage, errors, and non-streaming responses - so you can stop hand-rolling provider parsers and keep one clean, typed event model across LLM apps, agents, proxies, and backends.
14
+ **Status:** Stable `1.2.0`. Five built-in adapters, twelve OpenAI-compatible host presets (including **Azure OpenAI**), transforms, replay helpers, and examples are production-ready. Pin semver ranges as usual and review [CHANGELOG.md](./CHANGELOG.md) before major upgrades.
15
15
 
16
- ## How it works
16
+ ---
17
+
18
+ ## Contents
19
+
20
+ - [Why use this](#why-use-this)
21
+ - [Architecture](#architecture)
22
+ - [Providers at a glance](#providers-at-a-glance)
23
+ - [Install](#install)
24
+ - [Quickstart](#quickstart)
25
+ - [Documentation](#documentation)
26
+ - [Usage guides](#usage-guides)
27
+ - [Transforms & replay](#transforms--replay)
28
+ - [Examples & proxy safety](#examples--proxy-safety)
29
+ - [Non-goals](#non-goals)
30
+ - [Development](#development)
31
+
32
+ ---
33
+
34
+ ## Why use this
35
+
36
+ - **Zero runtime dependencies** — thin adapters + core assembly, no provider SDKs.
37
+ - **Stream and non-stream parity** — same `StreamEvent` union from SSE chunks or JSON bodies.
38
+ - **Provider presets, not forks** — Groq, Azure, Perplexity, xAI, and others reuse one compatible parser with dialect options.
39
+ - **Proxy-ready transforms** — `toSSE({ sanitizeErrors: true })`, `tapEvents`, `collectStream`, fixture replay.
40
+
41
+ ---
42
+
43
+ ## Architecture
17
44
 
18
45
  Raw provider bytes enter through a **thin adapter**, get assembled into **typed events**, and leave through the same transform layer whether you stream live, replay fixtures, or proxy to a browser.
19
46
 
20
- ![Architecture pipeline](https://raw.githubusercontent.com/01laky/llm-stream-assemble/main/docs/img/pipeline.svg)
47
+ ![End-to-end pipeline](https://raw.githubusercontent.com/01laky/llm-stream-assemble/main/docs/img/pipeline.svg)
21
48
 
22
- Every adapter maps provider-specific fragments into the same **`StreamEvent`** union — one event model for streaming and non-streaming code paths:
49
+ ### Built-in adapters
23
50
 
24
- ![StreamEvent mindmap](https://raw.githubusercontent.com/01laky/llm-stream-assemble/main/docs/img/stream-event.svg)
51
+ ![Built-in adapters and compatible presets](https://raw.githubusercontent.com/01laky/llm-stream-assemble/main/docs/img/adapters-overview.svg)
52
+
53
+ ### Unified event model
25
54
 
26
- Diagram sources (Mermaid): [`docs/img/pipeline.mmd`](./docs/img/pipeline.mmd), [`docs/img/stream-event.mmd`](./docs/img/stream-event.mmd). Regenerate SVGs with `@mermaid-js/mermaid-cli` after editing.
55
+ Every adapter maps provider-specific fragments into the same **`StreamEvent`** union:
56
+
57
+ ![StreamEvent mindmap](https://raw.githubusercontent.com/01laky/llm-stream-assemble/main/docs/img/stream-event.svg)
27
58
 
28
59
  **Design constraints:** adapters never accumulate cross-chunk state beyond id/index reconciliation; assembly, buffering, and `.done` emission live in core. No HTTP client, no tool execution, no UI — just the stream layer.
29
60
 
61
+ Diagram sources: [`docs/img/`](./docs/img/) (Mermaid `.mmd` + committed SVG). Regenerate with `pnpm diagrams:build`.
62
+
63
+ ---
64
+
65
+ ## Providers at a glance
66
+
67
+ | Adapter | Provider / API | Import |
68
+ | --------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | ------------------------------------------- |
69
+ | `openaiChatAdapter()` | OpenAI Chat Completions | `llm-stream-assemble` |
70
+ | `openaiCompatibleAdapter({ provider })` | Groq, DeepSeek, Mistral, Ollama, LM Studio, Together, Fireworks, OpenRouter, Perplexity, xAI, **Azure OpenAI**, generic | `llm-stream-assemble` |
71
+ | `anthropicAdapter()` | Anthropic Messages | `llm-stream-assemble` |
72
+ | `openaiResponsesAdapter()` | OpenAI Responses API | `llm-stream-assemble` |
73
+ | `geminiAdapter()` | Google AI Gemini | `llm-stream-assemble` or `/adapters/gemini` |
74
+
75
+ Full feature flags and quirks: [compatibility matrix](./docs/compatibility.md).
76
+
77
+ ---
78
+
30
79
  ## Install
31
80
 
32
81
  ```bash
@@ -34,20 +83,38 @@ pnpm add llm-stream-assemble
34
83
  # or npm install llm-stream-assemble
35
84
  ```
36
85
 
37
- ## Requirements
86
+ **Requirements:** Node.js 18+
87
+
88
+ ---
89
+
90
+ ## Quickstart
91
+
92
+ ```ts
93
+ import { assembleStream, openaiChatAdapter } from "llm-stream-assemble";
94
+
95
+ for await (const event of assembleStream(response.body!, openaiChatAdapter())) {
96
+ if (event.type === "text.delta") process.stdout.write(event.text);
97
+ }
98
+ ```
38
99
 
39
- - Node.js 18+
100
+ ---
40
101
 
41
102
  ## Documentation
42
103
 
43
- - [Product & technical proposal](./docs/proposal.md)
44
- - [Post-1.0 provider roadmap (proposal)](./docs/post-1.0-provider-roadmap.md)
45
104
  - [Provider compatibility matrix](./docs/compatibility.md)
46
105
  - [Adapter author guide](./docs/adapter-guide.md)
106
+ - [Architecture diagrams](./docs/img/README.md)
107
+ - [Live smoke checklist (maintainers)](./docs/live-smoke.md)
108
+ - [Post-1.0 provider roadmap](./docs/post-1.0-provider-roadmap.md)
109
+ - [Product & technical proposal](./docs/proposal.md)
110
+
111
+ ---
47
112
 
48
- ## Core Usage
113
+ ## Usage guides
49
114
 
50
- The core pipeline works with any adapter that emits `RawChunk[]`, including the built-in OpenAI Chat, OpenAI-compatible, Anthropic Messages, and OpenAI Responses adapters:
115
+ ### Core Usage
116
+
117
+ The core pipeline works with any adapter that emits `RawChunk[]`, including the built-in OpenAI Chat, OpenAI-compatible, Anthropic Messages, OpenAI Responses, and Google Gemini adapters:
51
118
 
52
119
  ```ts
53
120
  import { assembleFromPayloads, type StreamAdapter } from "llm-stream-assemble";
@@ -66,17 +133,7 @@ for await (const event of assembleFromPayloads(payloads, adapter)) {
66
133
 
67
134
  Assembly buffers completed text, reasoning, JSON, and tool-call arguments so it can emit final `.done` events. Use `maxBufferBytes` to cap those buffers for untrusted or unusually large streams.
68
135
 
69
- ## Quickstart
70
-
71
- ```ts
72
- import { assembleStream, openaiChatAdapter } from "llm-stream-assemble";
73
-
74
- for await (const event of assembleStream(response.body!, openaiChatAdapter())) {
75
- if (event.type === "text.delta") process.stdout.write(event.text);
76
- }
77
- ```
78
-
79
- ## OpenAI Chat Usage
136
+ ### OpenAI Chat Usage
80
137
 
81
138
  `openaiChatAdapter()` parses OpenAI Chat Completions payloads. Create one adapter instance per request/stream because it keeps minimal state for metadata and tool-call indexes.
82
139
 
@@ -104,7 +161,7 @@ for await (const event of assembleStream(response.body!, openaiChatAdapter())) {
104
161
 
105
162
  Streaming usage requires `stream_options: { include_usage: true }` on the OpenAI request. JSON mode content is exposed by OpenAI as normal content deltas, so use `openaiChatAdapter({ jsonMode: true })` when you want content mapped to `json.*` events.
106
163
 
107
- ## OpenAI-Compatible Usage
164
+ ### OpenAI-Compatible Usage
108
165
 
109
166
  `openaiCompatibleAdapter()` supports OpenAI-shaped Chat Completions APIs with best-effort provider presets. Create one adapter instance per request/stream.
110
167
 
@@ -122,15 +179,22 @@ for await (const event of assembleStream(response.body!, adapter)) {
122
179
 
123
180
  Provider presets:
124
181
 
125
- | Preset | Intended hosts | Notes |
126
- | ------------ | ----------------------------- | ----------------------------------------------------------- |
127
- | `generic` | Any OpenAI-shaped API | Loose defaults, best first try |
128
- | `openrouter` | OpenRouter | Mostly OpenAI-shaped; provider-specific metadata may appear |
129
- | `groq` | Groq OpenAI-compatible API | OpenAI-like; usage can vary by endpoint/model |
130
- | `ollama` | Ollama `/v1/chat/completions` | Local host, metadata may be sparse |
131
- | `lmstudio` | LM Studio local server | Local host, metadata/usage may be sparse |
132
- | `together` | Together AI | OpenAI-like, reasoning fields may vary |
133
- | `fireworks` | Fireworks AI | OpenAI-like, usage/details may vary |
182
+ | Preset | Intended hosts | Notes |
183
+ | ------------ | ----------------------------- | ------------------------------------------------------------------------------------------- |
184
+ | `generic` | Any OpenAI-shaped API | Loose defaults, best first try |
185
+ | `openrouter` | OpenRouter | Mostly OpenAI-shaped; provider-specific metadata may appear |
186
+ | `groq` | Groq OpenAI-compatible API | OpenAI-like; usage can vary by endpoint/model |
187
+ | `deepseek` | DeepSeek API | Maps `reasoning_content` to reasoning events on R1-style models |
188
+ | `mistral` | Mistral API | OpenAI-like; parallel tool calls supported |
189
+ | `ollama` | Ollama `/v1/chat/completions` | Local host, metadata may be sparse |
190
+ | `lmstudio` | LM Studio local server | Local host, metadata/usage may be sparse |
191
+ | `together` | Together AI | OpenAI-like; `reasoning` / `reasoning_delta` aliases |
192
+ | `fireworks` | Fireworks AI | OpenAI-like, usage/details may vary |
193
+ | `perplexity` | Perplexity API | Search-grounded answers; citations in `metadata.raw` |
194
+ | `xai` | xAI Grok API | OpenAI-compatible; `reasoning_content` mapped when present |
195
+ | `azure` | Azure OpenAI Chat Completions | Stricter preset; deployment URL + `api-key` auth; content filter metadata in `metadata.raw` |
196
+
197
+ Base URL examples: Groq `https://api.groq.com/openai/v1`, DeepSeek `https://api.deepseek.com`, Mistral `https://api.mistral.ai/v1`, Ollama `http://localhost:11434/v1`, LM Studio `http://localhost:1234/v1`, Together `https://api.together.xyz/v1`, Fireworks `https://api.fireworks.ai/inference/v1`, OpenRouter `https://openrouter.ai/api/v1`, Perplexity `https://api.perplexity.ai`, xAI `https://api.x.ai/v1`, Azure OpenAI `https://{resource}.openai.azure.com/openai/deployments/{deployment}/chat/completions?api-version={version}`.
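One way an application might keep these base URLs next to the preset names is a small lookup table. This is only a sketch: `PRESET_BASE_URLS` and `chatCompletionsUrl` are hypothetical names, not exports of `llm-stream-assemble`, and the helper assumes each host serves Chat Completions at `<base>/chat/completions`. Azure is left out because its URL is deployment-scoped.

```ts
// Hypothetical lookup; the values are copied from the base URL list above.
const PRESET_BASE_URLS = {
  groq: "https://api.groq.com/openai/v1",
  deepseek: "https://api.deepseek.com",
  mistral: "https://api.mistral.ai/v1",
  ollama: "http://localhost:11434/v1",
  lmstudio: "http://localhost:1234/v1",
  together: "https://api.together.xyz/v1",
  fireworks: "https://api.fireworks.ai/inference/v1",
  openrouter: "https://openrouter.ai/api/v1",
  perplexity: "https://api.perplexity.ai",
  xai: "https://api.x.ai/v1",
} as const;

type HostedPreset = keyof typeof PRESET_BASE_URLS;

// Assumption: every host above exposes `<base>/chat/completions`.
function chatCompletionsUrl(preset: HostedPreset): string {
  return `${PRESET_BASE_URLS[preset]}/chat/completions`;
}
```

The resulting URL goes to `fetch`, and the same preset name can then be passed as `provider` to `openaiCompatibleAdapter()`.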
134
198
 
135
199
  Strict vs loose configuration:
136
200
 
@@ -155,7 +219,44 @@ Known limitations:
155
219
  - Multi-choice terminal behavior is limited by the current core single terminal finish event.
156
220
  - Missing tool ids are tolerated because core can synthesize stable ids by index.
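The last point, synthesizing stable ids by index, can be illustrated with a minimal sketch. It is not the library's implementation: `ToolCallFragment`, `resolveToolCallId`, and the `call_<index>` id format are assumptions made for illustration.

```ts
// Hypothetical fragment shape: a tool-call chunk that may arrive without an id.
interface ToolCallFragment {
  index: number;
  id?: string;
}

// Returns the id already recorded for this index, otherwise the provider id
// when present, otherwise a deterministic id derived from the index. Every
// fragment of the same call therefore resolves to the same id.
function resolveToolCallId(
  fragment: ToolCallFragment,
  known: Record<number, string>,
): string {
  const existing = known[fragment.index];
  if (existing !== undefined) return existing;
  const id = fragment.id ?? `call_${fragment.index}`; // id format is an assumption
  known[fragment.index] = id;
  return id;
}
```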
157
221
 
158
- ## Anthropic Messages Usage
222
+ ### Azure OpenAI Usage
223
+
224
+ Azure OpenAI Chat Completions uses a deployment-scoped URL and **`api-key`** authentication instead of Bearer tokens. Use the **`azure`** preset — not `generic` — for stricter parsing aligned with OpenAI Chat semantics (`allowMissingMetadata: false`, `looseErrorShape: false`).
225
+
226
+ ```ts
227
+ import { assembleStream, openaiCompatibleAdapter } from "llm-stream-assemble";
228
+
229
+ const resource = process.env.AZURE_OPENAI_RESOURCE!;
230
+ const deployment = process.env.AZURE_OPENAI_DEPLOYMENT!;
231
+ const apiVersion = process.env.AZURE_OPENAI_API_VERSION ?? "2024-10-21";
232
+ const url = `https://${resource}.openai.azure.com/openai/deployments/${deployment}/chat/completions?api-version=${apiVersion}`;
233
+
234
+ const response = await fetch(url, {
235
+ method: "POST",
236
+ headers: {
237
+ "api-key": process.env.AZURE_OPENAI_API_KEY!,
238
+ "Content-Type": "application/json",
239
+ },
240
+ body: JSON.stringify({
241
+ messages: [{ role: "user", content: "Hello" }],
242
+ stream: true,
243
+ stream_options: { include_usage: true },
244
+ }),
245
+ });
246
+
247
+ for await (const event of assembleStream(
248
+ response.body!,
249
+ openaiCompatibleAdapter({ provider: "azure" }),
250
+ )) {
251
+ if (event.type === "text.delta") process.stdout.write(event.text);
252
+ }
253
+ ```
254
+
255
+ Use `openaiCompatibleAdapter({ provider: "azure", jsonMode: true })` when structured JSON output should map to `json.*` events. Content-filter blocks surface as `refusal.*` events with `finish_reason: content_filter`; filter result fields remain in `metadata.raw` for auditing. If an API gateway strips metadata from chunks, soften strict parsing server-side only with `allowMissingMetadata: true`.
256
+
257
+ See `examples/node-fetch/azure-openai.ts` for a URL builder helper and `examples/proxy-safety/README.md` for server-side proxy notes.
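For orientation, a URL builder along these lines could look like the sketch below. `buildAzureChatUrl` and `AzureChatUrlParts` are hypothetical names, and the helper shipped in `examples/node-fetch/azure-openai.ts` may differ; the only thing taken from this README is the URL template.

```ts
interface AzureChatUrlParts {
  resource: string; // Azure OpenAI resource name
  deployment: string; // deployment name, not the model name
  apiVersion: string; // e.g. "2024-10-21"
}

// Builds the deployment-scoped Chat Completions URL and percent-encodes the
// caller-supplied path and query parts.
function buildAzureChatUrl({ resource, deployment, apiVersion }: AzureChatUrlParts): string {
  const path = `/openai/deployments/${encodeURIComponent(deployment)}/chat/completions`;
  const query = `api-version=${encodeURIComponent(apiVersion)}`;
  return `https://${resource}.openai.azure.com${path}?${query}`;
}
```

Encoding the deployment name and API version keeps unexpected characters in environment variables from changing the request path.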
258
+
259
+ ### Anthropic Messages Usage
159
260
 
160
261
  `anthropicAdapter()` parses Anthropic Messages streaming events and non-streaming responses. Create one adapter instance per request/stream.
161
262
 
@@ -169,7 +270,7 @@ for await (const event of assembleStream(response.body!, anthropicAdapter())) {
169
270
 
170
271
  Anthropic tool calls are emitted from `tool_use` content blocks. Fine-grained tool input streaming is supported through `input_json_delta`; partial input may be invalid JSON until the block ends, and core handles those partial previews best-effort. Thinking blocks map to `reasoning.*` events with `variant: "detail"`.
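To make the "invalid JSON until the block ends" point concrete, here is a minimal sketch of accumulating streamed input fragments and parsing only once the text is complete. `ToolInputBuffer` is a hypothetical helper, not part of the library, and core's own best-effort preview logic may work differently.

```ts
// Accumulates streamed JSON fragments for a single tool call.
class ToolInputBuffer {
  private raw = "";

  append(partialJson: string): void {
    this.raw += partialJson;
  }

  // Returns the parsed value, or undefined while the buffered text is not
  // yet valid JSON (for example in the middle of a string or object).
  tryParse(): unknown {
    try {
      return JSON.parse(this.raw);
    } catch {
      return undefined;
    }
  }
}
```

Because tool inputs are JSON objects, no proper prefix of a complete input parses successfully, so `tryParse()` only returns a value once the closing brace has arrived.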
171
272
 
172
- ## OpenAI Responses Usage
273
+ ### OpenAI Responses Usage
173
274
 
174
275
  `openaiResponsesAdapter()` parses OpenAI Responses API streaming events and non-streaming response objects. It focuses on output text and function call argument streams; Realtime, audio, and multimodal binary output are out of scope.
175
276
 
@@ -183,7 +284,44 @@ for await (const event of assembleStream(response.body!, openaiResponsesAdapter(
183
284
 
184
285
  Use `openaiResponsesAdapter({ jsonMode: true })` to map output text to `json.*` events. Reasoning support is best-effort for string summary/detail fields. Create a new adapter instance per stream.
185
286
 
186
- ## Collecting a Stream
287
+ ### Gemini Usage
288
+
289
+ `geminiAdapter()` parses Google AI Gemini `GenerateContentResponse` payloads from `streamGenerateContent?alt=sse` and non-streaming `generateContent`. Create one adapter instance per request/stream.
290
+
291
+ ```ts
292
+ import { assembleStream, geminiAdapter } from "llm-stream-assemble";
293
+
294
+ const model = "gemini-2.5-flash";
295
+ const apiKey = process.env.GOOGLE_API_KEY!;
296
+ const url = `https://generativelanguage.googleapis.com/v1beta/models/${model}:streamGenerateContent?alt=sse&key=${encodeURIComponent(apiKey)}`;
297
+
298
+ const response = await fetch(url, {
299
+ method: "POST",
300
+ headers: { "Content-Type": "application/json" },
301
+ body: JSON.stringify({
302
+ contents: [{ role: "user", parts: [{ text: "Hello" }] }],
303
+ }),
304
+ });
305
+
306
+ for await (const event of assembleStream(response.body!, geminiAdapter())) {
307
+ if (event.type === "text.delta") process.stdout.write(event.text);
308
+ if (event.type === "tool_call.done") console.log(event.name, event.args);
309
+ }
310
+ ```
311
+
312
+ Use `geminiAdapter({ jsonMode: true })` when structured JSON output should map to `json.*` instead of `text.*`. Thinking models may emit `thought` parts mapped to `reasoning.*` (best-effort). Gemini does not expose OpenAI-style `refusal.*` events — blocked prompts use `promptFeedback` or safety finish reasons instead.
313
+
314
+ Subpath import: `import { geminiAdapter } from "llm-stream-assemble/adapters/gemini"`.
315
+
316
+ Vertex AI and the Interactions API are out of scope for this adapter; see [compatibility matrix](./docs/compatibility.md).
317
+
318
+ ---
319
+
320
+ ## Transforms & replay
321
+
322
+ ![Transforms and helpers](https://raw.githubusercontent.com/01laky/llm-stream-assemble/main/docs/img/transforms.svg)
323
+
324
+ ### Collecting a Stream
187
325
 
188
326
  `collectStream()` materializes a full event stream into text, reasoning, refusals, JSON, tool calls, latest usage, and finish reason. It buffers full output in memory and aggregates multi-choice text in event order; it is not a per-choice collector and does not currently collect metadata.
189
327
 
@@ -194,7 +332,7 @@ const result = await collectStream(events);
194
332
  console.log(result.text, result.toolCalls, result.finishReason);
195
333
  ```
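Conceptually, collecting is a fold over the event sequence. The sketch below shows that idea for two event types that appear in this README; it is not how `collectStream()` is implemented. `CollectSketchEvent`, `Collected`, and `collectEvents` are hypothetical names, the input is a plain array rather than an async stream, and reasoning, refusals, JSON, usage, and finish reason are omitted.

```ts
// Hypothetical subset of the event union, using only fields shown in this README.
type CollectSketchEvent =
  | { type: "text.delta"; text: string }
  | { type: "tool_call.done"; name: string; args: unknown };

interface Collected {
  text: string;
  toolCalls: { name: string; args: unknown }[];
}

// Concatenates text deltas in event order and records completed tool calls.
function collectEvents(events: readonly CollectSketchEvent[]): Collected {
  const result: Collected = { text: "", toolCalls: [] };
  for (const event of events) {
    if (event.type === "text.delta") {
      result.text += event.text;
    } else {
      result.toolCalls.push({ name: event.name, args: event.args });
    }
  }
  return result;
}
```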
196
334
 
197
- ## Tapping Events
335
+ ### Tapping Events
198
336
 
199
337
  `tapEvents()` lets you observe events for logging or metrics without changing the stream.
200
338
 
@@ -206,7 +344,7 @@ for await (const event of tapEvents(events, (event) => console.debug(event.type)
206
344
  }
207
345
  ```
208
346
 
209
- ## Forwarding Unified SSE
347
+ ### Forwarding Unified SSE
210
348
 
211
349
  `toSSE()` serializes unified `StreamEvent` objects as `data: <json>` SSE messages. It does not currently emit named SSE `event:` fields, and it emits unified event JSON rather than raw provider SSE.
212
350
 
@@ -220,7 +358,7 @@ return new Response(toSSE(events, { sanitizeErrors: true }), {
220
358
 
221
359
  Use `sanitizeErrors: true` when forwarding events to browsers so raw provider internals are not exposed.
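The `data: <json>` framing, and what sanitizing an error might involve, can be sketched as follows. This is an illustration, not the `toSSE()` implementation: `SseSketchEvent` and `frameEvent` are hypothetical, and the `error` event shape with `message` and `raw` fields is an assumption.

```ts
// Hypothetical event shapes; the `error` variant is assumed for illustration.
type SseSketchEvent =
  | { type: "text.delta"; text: string }
  | { type: "error"; message: string; raw?: unknown };

// Formats one event as a `data: <json>` SSE message ending in a blank line.
// When sanitizing, error events keep only `type` and `message`, so the raw
// provider payload is not serialized.
function frameEvent(event: SseSketchEvent, sanitizeErrors: boolean): string {
  const safe =
    sanitizeErrors && event.type === "error"
      ? { type: event.type, message: event.message }
      : event;
  return `data: ${JSON.stringify(safe)}\n\n`;
}
```

A browser client would split the response on blank lines, strip the `data: ` prefix, and `JSON.parse` the remainder to get the unified event back.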
222
360
 
223
- ## Replaying Fixtures
361
+ ### Replaying Fixtures
224
362
 
225
363
  `assembleFromFile()` is a Node/dev replay helper for local `.sse` and `.json` fixtures. It uses `node:fs/promises`, so avoid it in browser bundles; a dedicated browser/edge entry point can be added later if needed.
226
364
 
@@ -235,14 +373,21 @@ for await (const event of assembleFromFile(
235
373
  }
236
374
  ```
237
375
 
238
- ## Examples
376
+ ---
377
+
378
+ ## Examples & proxy safety
239
379
 
240
- - `examples/node-fetch/openai-chat.ts`
241
- - `examples/node-fetch/openai-compatible.ts`
242
- - `examples/node-fetch/anthropic.ts`
243
- - `examples/node-fetch/replay-fixture.ts`
244
- - `examples/proxy-safety/web-standard-proxy.ts`
245
- - `examples/proxy-safety/browser-client.ts`
380
+ | Example | Description |
381
+ | ---------------------------------------------------------------------------------------- | --------------------------------------- |
382
+ | [`examples/node-fetch/openai-chat.ts`](./examples/node-fetch/openai-chat.ts) | OpenAI Chat Completions streaming |
383
+ | [`examples/node-fetch/openai-compatible.ts`](./examples/node-fetch/openai-compatible.ts) | OpenAI-compatible presets |
384
+ | [`examples/node-fetch/azure-openai.ts`](./examples/node-fetch/azure-openai.ts) | Azure OpenAI deployment URL + `api-key` |
385
+ | [`examples/node-fetch/perplexity.ts`](./examples/node-fetch/perplexity.ts) | Perplexity streaming |
386
+ | [`examples/node-fetch/xai.ts`](./examples/node-fetch/xai.ts) | xAI Grok streaming |
387
+ | [`examples/node-fetch/anthropic.ts`](./examples/node-fetch/anthropic.ts) | Anthropic Messages |
388
+ | [`examples/node-fetch/gemini.ts`](./examples/node-fetch/gemini.ts) | Google Gemini SSE |
389
+ | [`examples/node-fetch/replay-fixture.ts`](./examples/node-fetch/replay-fixture.ts) | Local fixture replay |
390
+ | [`examples/proxy-safety/`](./examples/proxy-safety/) | Proxy + browser client patterns |
246
391
 
247
392
  Proxy safety:
248
393
 
@@ -251,6 +396,8 @@ Proxy safety:
251
396
  - Never forward raw provider errors or upstream non-OK response bodies to browsers.
252
397
  - CORS headers are application-specific and intentionally omitted from the Web-standard example.
253
398
 
399
+ ---
400
+
254
401
  ## Non-goals
255
402
 
256
403
  - No HTTP client, auth, retries, or provider SDK wrapper.
@@ -258,6 +405,8 @@ Proxy safety:
258
405
  - No UI framework, React hooks, or browser components.
259
406
  - No multimodal binary/audio/video parsing.
260
407
 
408
+ ---
409
+
261
410
  ## Development
262
411
 
263
412
  ```bash
@@ -265,14 +414,16 @@ pnpm install
265
414
  pnpm verify
266
415
  ```
267
416
 
268
- Scripts:
417
+ | Command | Description |
418
+ | --------------------- | --------------------------------------------------- |
419
+ | `pnpm verify` | lint + typecheck + test + build |
420
+ | `pnpm verify:deps` | fail if runtime dependencies are added |
421
+ | `pnpm release:prep` | pre-tag checks (version, CHANGELOG, dist, npm pack) |
422
+ | `pnpm diagrams:build` | regenerate README SVGs from Mermaid sources |
423
+ | `pnpm test` | Vitest smoke tests |
424
+ | `pnpm build` | tsup → ESM + CJS + declarations |
269
425
 
270
- | Command | Description |
271
- | ------------------ | -------------------------------------- |
272
- | `pnpm verify` | lint + typecheck + test + build |
273
- | `pnpm verify:deps` | fail if runtime dependencies are added |
274
- | `pnpm test` | Vitest smoke tests |
275
- | `pnpm build` | tsup → ESM + CJS + declarations |
426
+ ---
276
427
 
277
428
  ## Author
278
429