llm-stream-assemble 1.0.1 → 1.3.5

package/README.md CHANGED
@@ -1,32 +1,154 @@
1
1
  # llm-stream-assemble
2
2
 
3
- ![core](https://img.shields.io/badge/core-1.0.1-blue)
3
+ ![core](https://img.shields.io/badge/core-1.3.5-blue)
4
4
  ![node](https://img.shields.io/badge/node-%3E%3D18-339933)
5
5
  ![runtime deps](https://img.shields.io/badge/runtime_deps-0-brightgreen)
6
- ![tests](https://img.shields.io/badge/tests-547%2B_passing-brightgreen)
6
+ ![tests](https://img.shields.io/badge/tests-961%2B_passing-brightgreen)
7
7
  [![ci](https://github.com/01laky/llm-stream-assemble/actions/workflows/ci.yml/badge.svg)](https://github.com/01laky/llm-stream-assemble/actions/workflows/ci.yml)
8
- ![status](https://img.shields.io/badge/status-stable_1.0.1-brightgreen)
8
+ ![status](https://img.shields.io/badge/status-stable_1.3.5-brightgreen)
9
9
 
10
- A zero-dependency TypeScript library that normalizes LLM streaming responses — text, tool calls, reasoning, JSON, usage, errors, and non-streaming payloads — into unified events.
10
+ **One typed event model for every LLM stream** — text, tool calls, reasoning, JSON, usage, refusals, errors, and non-streaming responses.
11
11
 
12
- **Status:** Stable `1.0.1`. Core, OpenAI Chat, OpenAI-compatible, Anthropic Messages, OpenAI Responses adapters, transforms, replay helpers, and examples are production-ready. Pin semver ranges as usual and review [CHANGELOG.md](./CHANGELOG.md) before major upgrades.
12
+ > A zero-dependency TypeScript layer for assembling **OpenAI**, **Anthropic**, **Google Gemini**, and **OpenAI-compatible** LLM streams into unified events so you can stop hand-rolling provider parsers and keep one clean, typed event model across chat UIs, agents, proxies, and backends.
13
13
 
14
- > A zero-dependency TypeScript layer for assembling OpenAI, Anthropic, and compatible LLM streams into unified events for text, tool calls, reasoning, JSON, usage, errors, and non-streaming responses - so you can stop hand-rolling provider parsers and keep one clean, typed event model across LLM apps, agents, proxies, and backends.
14
+ Turn provider SSE fragments into typed events, **not another `+=` loop**.
15
15
 
16
- ## How it works
16
+ **Status:** Stable `1.3.5`. Five built-in adapters, thirteen OpenAI-compatible host presets (including **Azure OpenAI** and **Cloudflare Workers AI**), transforms, replay helpers, and examples are production-ready. Pin semver ranges as usual and review [CHANGELOG.md](./CHANGELOG.md) before major upgrades.
17
+
18
+ ---
19
+
20
+ ## Contents
21
+
22
+ - [Why not just concatenate?](#why-not-just-concatenate)
23
+ - [Edge-case showcase](#edge-case-showcase)
24
+ - [Why use this](#why-use-this)
25
+ - [Architecture](#architecture)
26
+ - [Providers at a glance](#providers-at-a-glance)
27
+ - [Install](#install)
28
+ - [First success in 30 seconds](#first-success-in-30-seconds)
29
+ - [Quickstart](#quickstart)
30
+ - [Quick decision guide](#quick-decision-guide)
31
+ - [Documentation](#documentation)
32
+ - [How this compares](#how-this-compares)
33
+ - [Examples](#examples)
34
+ - [Usage guides](#usage-guides)
35
+ - [Transforms & replay](#transforms--replay)
36
+ - [Examples & proxy safety](#examples--proxy-safety)
37
+ - [Non-goals](#non-goals)
38
+ - [Development](#development)
39
+
40
+ ---
41
+
42
+ ## Why not just concatenate?
43
+
44
+ Raw LLM streams look like text, but **simple string concatenation or naive `JSON.parse` per chunk fails** in production. Providers emit **protocol events**, not finished messages.
45
+
46
+ 1. **SSE mid-line splits** — TCP chunks can break `data: {"choices":[...]}\n` across reads; you need a line buffer (`parse-sse.ts`, fixtures **LSA-C**).
47
+ 2. **Tool argument fragmentation** — function parameters arrive as partial JSON across dozens of deltas; only assembly produces valid `tool_call.done` args.
48
+ 3. **Anthropic id/index ordering** — `tool_use` blocks may stream `index` before `id`; fine-grained `input_json_delta` is invalid JSON until the block ends.
49
+ 4. **Reasoning vs user text** — DeepSeek R1, Claude thinking, and OpenAI reasoning models interleave hidden reasoning that must map to `reasoning.*`, not `text.*`.
50
+ 5. **JSON mode streaming** — structured output streams as deltas; you do not receive a parsed object until completion (`json.delta` / `json.done`).
51
+ 6. **Stream lifecycle** — `[DONE]` markers, usage-only tail chunks, and incomplete streams without explicit finish need consistent terminal handling.
52
+ 7. **Mid-stream errors** — provider error payloads must not leak raw internals to browsers; use `sanitizeErrors` when proxying (**LSA-X23**).
53
+ 8. **Dual code paths** — the same `StreamEvent` union should work for `stream: true` SSE and non-stream JSON (`assembleStream` vs `assembleResponse`).
54
+
55
+ This library is the **assembly layer** between those raw bytes and your UI, agent, or proxy.
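
To make the first failure mode concrete, here is a minimal line-buffer sketch. It is not the library's `parse-sse.ts`; `SseLineBuffer` is a hypothetical name, and the sketch simplifies SSE by treating every `data:` line as one payload, so multi-line events and comment lines are ignored.

```ts
// Hypothetical helper (not part of llm-stream-assemble): buffers raw text
// chunks and returns the payload of every `data:` line completed so far.
class SseLineBuffer {
  private pending = "";

  // Feed one network chunk; returns payloads of lines that are now complete.
  push(chunk: string): string[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an unfinished line ("" when the chunk ended on "\n").
    this.pending = lines.pop() ?? "";
    const payloads: string[] = [];
    for (const rawLine of lines) {
      // Tolerate CRLF line endings.
      const line = rawLine.endsWith("\r") ? rawLine.slice(0, -1) : rawLine;
      if (line.startsWith("data:")) {
        payloads.push(line.slice("data:".length).trimStart());
      }
    }
    return payloads;
  }
}

// One SSE line split across two reads, as a TCP boundary might do.
const sseBuffer = new SseLineBuffer();
const firstRead = sseBuffer.push('data: {"choices":[{"delta":{"con');
const secondRead = sseBuffer.push('tent":"Hi"}}]}\n\n');

// Nothing is parsed until the line is complete, so JSON.parse never sees a fragment.
const parsedPayloads = secondRead.map((payload) => JSON.parse(payload));
console.log(firstRead.length, parsedPayloads[0].choices[0].delta.content);
```

Calling `JSON.parse` directly on the first read would throw, which is the bug a per-chunk parse runs into.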
56
+
57
+ ### Why not `text += chunk`?
58
+
59
+ The first reaction is often: “Why not `text += chunk`?” Provider streams are **protocol events**, not finished message strings.
60
+
61
+ | Failure mode | What breaks with `+=` / naive parse | This library |
62
+ | --------------------------------- | --------------------------------------------------- | ------------------------------------------------- |
63
+ | **Chunk boundaries** | SSE `data:` line split mid-payload across TCP reads | Line buffer — `parse-sse.ts` |
64
+ | **Incomplete structures** | One SSE payload ≠ one complete JSON message | Adapter per payload; assembler until `.done` |
65
+ | **State management** | Parallel tools, reasoning vs text channels | `EventAssembler` per stream |
66
+ | **Parser invalidity mid-stream** | Anthropic `input_json_delta`, partial tool args | Partial preview; valid at `.done` |
67
+ | **JSON partials** | Structured output streams as fragments | `json.*`, `tool_call.args.delta` |
68
+ | **Markdown fences in model text** | ` ```json ` split across **text tokens** | **Out of scope** — render `text.delta` in your UI |
69
+
70
+ See [Edge-case showcase](#edge-case-showcase) for concrete chunk examples.
71
+
72
+ ---
73
+
74
+ ## Edge-case showcase
75
+
76
+ Raw streams break in predictable ways. Three layers — **SSE framing**, **tool/JSON assembly**, **UI text** — fail differently:
77
+
78
+ ![Chunk assembly: SSE fragments to unified events](https://raw.githubusercontent.com/01laky/llm-stream-assemble/main/docs/img/chunk-assembly.svg)
79
+
80
+ - **SSE mid-line split** — TCP reads break `data: {...}\n` across buffers; line parser required.
81
+ - **Tool JSON partials** — args stream as `{`, `"city":`, `"Paris"}` before `tool_call.done`.
82
+ - **JSON mode** — structured output arrives as `json.delta` strings, not a parsed object.
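
The tool-argument case above can be sketched as follows. `ToolArgsAccumulator` is a hypothetical name used only for illustration; the library's own assembler is not shown here and may behave differently.

```ts
// Hypothetical accumulator (not the library's EventAssembler): joins the
// argument fragments of a single tool call and parses only when asked.
class ToolArgsAccumulator {
  private buffer = "";

  append(fragment: string): void {
    this.buffer += fragment;
  }

  // Returns the parsed arguments if the buffer is valid JSON right now,
  // otherwise undefined. Mid-stream, invalid JSON is the normal case.
  tryParse(): unknown {
    try {
      return JSON.parse(this.buffer);
    } catch {
      return undefined;
    }
  }
}

// The three fragments from the bullet above.
const argFragments = ["{", '"city":', '"Paris"}'];
const toolArgs = new ToolArgsAccumulator();
const validAfterEach: boolean[] = [];
for (const fragment of argFragments) {
  toolArgs.append(fragment);
  validAfterEach.push(toolArgs.tryParse() !== undefined);
}

console.log(validAfterEach); // [ false, false, true ]
console.log(toolArgs.tryParse()); // { city: 'Paris' }
```

Only the final state is valid JSON, which is why consumers should act on the completed arguments rather than on individual deltas.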
83
+
84
+ **[Full edge-case walkthrough →](./docs/edge-cases.md)** — DIY vs `assembleStream`, fixture replay, test IDs (**LSA-C04**, **LSA-C52**, golden fixtures).
85
+
86
+ ---
87
+
88
+ ## Why use this
89
+
90
+ - **Zero runtime dependencies** — thin adapters + core assembly, no provider SDKs.
91
+ - **Stream and non-stream parity** — same `StreamEvent` union from SSE chunks or JSON bodies.
92
+ - **Provider presets, not forks** — Groq, Azure, Cloudflare, Perplexity, xAI, and others reuse one compatible parser with dialect options.
93
+ - **Proxy-ready transforms** — `toSSE({ sanitizeErrors: true })`, `tapEvents`, `collectStream`, fixture replay.
94
+
95
+ ### Performance at a glance
96
+
97
+ - **Zero runtime dependencies** — verified in CI (`pnpm verify:deps`)
98
+ - **Incremental SSE parsing** — line buffer; no full-stream re-parse
99
+ - **Single-pass O(n) assembly** — **LSA-C52** smoke test on 10k chunks
100
+ - **Bounded buffers** — `maxBufferBytes` for untrusted streams
101
+ - **Local repro:** `pnpm bench:smoke` — see [performance](./docs/performance.md)
102
+
103
+ ---
104
+
105
+ ## Architecture
17
106
 
18
107
  Raw provider bytes enter through a **thin adapter**, get assembled into **typed events**, and leave through the same transform layer whether you stream live, replay fixtures, or proxy to a browser.
19
108
 
20
- ![Architecture pipeline](https://raw.githubusercontent.com/01laky/llm-stream-assemble/main/docs/img/pipeline.svg)
109
+ ![End-to-end pipeline](https://raw.githubusercontent.com/01laky/llm-stream-assemble/main/docs/img/pipeline.svg)
21
110
 
22
- Every adapter maps provider-specific fragments into the same **`StreamEvent`** union — one event model for streaming and non-streaming code paths:
111
+ ### Built-in adapters
23
112
 
24
- ![StreamEvent mindmap](https://raw.githubusercontent.com/01laky/llm-stream-assemble/main/docs/img/stream-event.svg)
113
+ ![Built-in adapters and compatible presets](https://raw.githubusercontent.com/01laky/llm-stream-assemble/main/docs/img/adapters-overview.svg)
114
+
115
+ ### Unified event model
25
116
 
26
- Diagram sources (Mermaid): [`docs/img/pipeline.mmd`](./docs/img/pipeline.mmd), [`docs/img/stream-event.mmd`](./docs/img/stream-event.mmd). Regenerate SVGs with `@mermaid-js/mermaid-cli` after editing.
117
+ Every adapter maps provider-specific fragments into the same **`StreamEvent`** union:
118
+
119
+ ![StreamEvent mindmap](https://raw.githubusercontent.com/01laky/llm-stream-assemble/main/docs/img/stream-event.svg)
27
120
 
28
121
  **Design constraints:** adapters never accumulate cross-chunk state beyond id/index reconciliation; assembly, buffering, and `.done` emission live in core. No HTTP client, no tool execution, no UI — just the stream layer.
29
122
 
123
+ ### Lifecycle & concurrency
124
+
125
+ - **`EventAssembler` is stateful per stream** — it buffers text, reasoning, JSON, refusals, and open tool calls until `.done` / `finish`.
126
+ - **Public APIs create a new assembler per call** — `assembleStream`, `assembleFromPayloads`, `assembleResponse`, and `createAssemblyTransform` each construct their own instance.
127
+ - **One assembler = one stream/response** — do not share an instance across concurrent requests.
128
+ - **`EventAssembler.reset()`** clears state for tests or explicit reuse after a stream completes.
129
+ - **Adapters are thin** — one payload in, `RawChunk[]` out; create **one adapter instance per request/stream** (minimal id/index map only).
130
+ - **Transforms are stateless** — `tapEvents`, `toSSE`, and `collectStream` operate on the unified event stream.
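
The "one assembler = one stream" rule can be illustrated with a toy model. `MiniAssembler` and `MiniEvent` are hypothetical and far smaller than the real `EventAssembler`; the sketch only shows why per-stream state must not be shared.

```ts
// Toy event union, modeled on the `text.delta` / `text.done` events in this README.
type MiniEvent =
  | { type: "text.delta"; text: string }
  | { type: "text.done"; text: string };

// Hypothetical stand-in for a stateful assembler: it buffers text for ONE stream.
class MiniAssembler {
  private text = "";

  delta(fragment: string): MiniEvent {
    this.text += fragment;
    return { type: "text.delta", text: fragment };
  }

  finish(): MiniEvent {
    return { type: "text.done", text: this.text };
  }

  // Clears buffered state so the instance can be reused after a stream ends.
  reset(): void {
    this.text = "";
  }
}

// Two interleaved streams, each with its own instance.
const streamA = new MiniAssembler();
const streamB = new MiniAssembler();
streamA.delta("Hel");
streamB.delta("Bon");
streamA.delta("lo");
streamB.delta("jour");

const doneA = streamA.finish();
const doneB = streamB.finish();
streamA.reset();
const afterReset = streamA.finish();

console.log(doneA.text, doneB.text, JSON.stringify(afterReset.text));
```

Had both streams shared one instance, the buffered text would have interleaved into `HelBonlojour`.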
131
+
132
+ ![Stateful assembler vs stateless adapters](https://raw.githubusercontent.com/01laky/llm-stream-assemble/main/docs/img/assembler-lifecycle.svg)
133
+
134
+ Diagram sources: [`docs/img/`](./docs/img/) (Mermaid `.mmd` + committed SVG). Regenerate with `pnpm diagrams:build`.
135
+
136
+ ---
137
+
138
+ ## Providers at a glance
139
+
140
+ | Adapter | Provider / API | Import |
141
+ | --------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------- |
142
+ | `openaiChatAdapter()` | OpenAI Chat Completions | `llm-stream-assemble` |
143
+ | `openaiCompatibleAdapter({ provider })` | Groq, DeepSeek, Mistral, Ollama, LM Studio, Together, Fireworks, OpenRouter, Perplexity, xAI, **Azure OpenAI**, **Cloudflare Workers AI**, generic | `llm-stream-assemble` |
144
+ | `anthropicAdapter()` | Anthropic Messages | `llm-stream-assemble` |
145
+ | `openaiResponsesAdapter()` | OpenAI Responses API | `llm-stream-assemble` |
146
+ | `geminiAdapter()` | Google AI Gemini | `llm-stream-assemble` or `/adapters/gemini` |
147
+
148
+ Full feature flags and quirks: [compatibility matrix](./docs/compatibility.md).
149
+
150
+ ---
151
+
30
152
  ## Install
31
153
 
32
154
  ```bash
@@ -34,20 +156,156 @@ pnpm add llm-stream-assemble
34
156
  # or npm install llm-stream-assemble
35
157
  ```
36
158
 
37
- ## Requirements
159
+ **Requirements:** Node.js 18+
160
+
161
+ ---
162
+
163
+ ## First success in 30 seconds
164
+
165
+ Minimal loop once you have a streaming `response.body`; see [Usage guides](#usage-guides) for full `fetch` setup:
166
+
167
+ ```ts
168
+ import { assembleStream, openaiChatAdapter } from "llm-stream-assemble";
169
+
170
+ for await (const event of assembleStream(response.body!, openaiChatAdapter())) {
171
+   if (event.type === "text.delta") process.stdout.write(event.text);
172
+   if (event.type === "text.done") console.log("\n--- done:", event.text);
173
+ }
174
+ ```
175
+
176
+ Swap `openaiChatAdapter()` for `anthropicAdapter()`, `geminiAdapter()`, or `openaiCompatibleAdapter({ provider: "ollama" })` — [Quick decision guide](#quick-decision-guide).
38
177
 
39
- - Node.js 18+
178
+ ---
179
+
180
+ ## Quickstart
181
+
182
+ ```ts
183
+ import { assembleStream, openaiChatAdapter } from "llm-stream-assemble";
184
+
185
+ for await (const event of assembleStream(response.body!, openaiChatAdapter())) {
186
+   if (event.type === "text.delta") process.stdout.write(event.text);
187
+ }
188
+ ```
189
+
190
+ ---
191
+
192
+ ## Quick decision guide
193
+
194
+ Pick an adapter in ~30 seconds:
195
+
196
+ ![Quick decision guide](https://raw.githubusercontent.com/01laky/llm-stream-assemble/main/docs/img/quick-decision.svg)
197
+
198
+ - **OpenAI Chat Completions SSE** → `openaiChatAdapter()`
199
+ - **OpenAI Responses API** → `openaiResponsesAdapter()`
200
+ - **Anthropic Messages** → `anthropicAdapter()`
201
+ - **Google Gemini** → `geminiAdapter()`
202
+ - **Groq, Ollama, Azure, Cloudflare, OpenRouter, …** → `openaiCompatibleAdapter({ provider })`
203
+ - **Non-streaming JSON body** → `assembleResponse(body, adapter)`
204
+ - **React chat UI / full agent framework** → not this package — see [comparison](./docs/comparison.md)
205
+ - **XML/markdown tag parsing from model text** → out of scope — see [Non-goals](#non-goals)
206
+
207
+ ---
40
208
 
41
209
  ## Documentation
42
210
 
43
- - [Product & technical proposal](./docs/proposal.md)
44
- - [Post-1.0 provider roadmap (proposal)](./docs/post-1.0-provider-roadmap.md)
45
211
  - [Provider compatibility matrix](./docs/compatibility.md)
46
212
  - [Adapter author guide](./docs/adapter-guide.md)
213
+ - [Performance & runtime behavior](./docs/performance.md)
214
+ - [Edge-case showcase](./docs/edge-cases.md)
215
+ - [How this compares](./docs/comparison.md)
216
+ - [FAQ](./docs/faq.md)
217
+ - [Architecture diagrams](./docs/img/README.md)
218
+ - [Live smoke checklist (maintainers)](./docs/live-smoke.md)
219
+ - [Post-1.0 provider roadmap](./docs/post-1.0-provider-roadmap.md)
220
+ - [Product & technical proposal](./docs/proposal.md)
221
+
222
+ ---
223
+
224
+ ## How this compares
225
+
226
+ | | llm-stream-assemble | Full-stack AI SDK | Provider SDK | DIY concat |
227
+ | ------------ | --------------------- | ------------------ | -------------- | ------------ |
228
+ | Scope | Stream assembly only | HTTP + UI + agents | Vendor RPC | Manual parse |
229
+ | Events | Unified `StreamEvent` | Framework types | Vendor types | Ad hoc |
230
+ | Dependencies | Zero runtime | Many | Vendor package | None |
231
+
232
+ Full matrix, when-not-to-use, and alternatives: **[docs/comparison.md](./docs/comparison.md)**.
233
+
234
+ ---
235
+
236
+ ## Examples
47
237
 
48
- ## Core Usage
238
+ Curated index — full snippets live in [Usage guides](#usage-guides) and [`examples/`](./examples/README.md).
49
239
 
50
- The core pipeline works with any adapter that emits `RawChunk[]`, including the built-in OpenAI Chat, OpenAI-compatible, Anthropic Messages, and OpenAI Responses adapters:
240
+ ### OpenAI Chat
241
+
242
+ ```ts
243
+ import { assembleStream, openaiChatAdapter } from "llm-stream-assemble";
244
+ // fetch(..., { stream: true }) then:
245
+ for await (const event of assembleStream(response.body!, openaiChatAdapter())) {
246
+   if (event.type === "text.delta") process.stdout.write(event.text);
247
+ }
248
+ ```
249
+
250
+ → [`examples/node-fetch/openai-chat.ts`](./examples/node-fetch/openai-chat.ts)
251
+
252
+ ### Ollama (local)
253
+
254
+ ```ts
255
+ import { assembleStream, openaiCompatibleAdapter } from "llm-stream-assemble";
256
+ const adapter = openaiCompatibleAdapter({ provider: "ollama" });
257
+ for await (const event of assembleStream(response.body!, adapter)) {
258
+   if (event.type === "text.delta") process.stdout.write(event.text);
259
+ }
260
+ ```
261
+
262
+ → [`examples/node-fetch/openai-compatible.ts`](./examples/node-fetch/openai-compatible.ts) · Usage: [OpenAI-Compatible](#openai-compatible-usage)
263
+
264
+ ### Anthropic Messages
265
+
266
+ → [`examples/node-fetch/anthropic.ts`](./examples/node-fetch/anthropic.ts) · Usage: [Anthropic Messages](#anthropic-messages-usage)
267
+
268
+ ### Google Gemini
269
+
270
+ → [`examples/node-fetch/gemini.ts`](./examples/node-fetch/gemini.ts) · Usage: [Gemini](#gemini-usage)
271
+
272
+ ### Streaming JSON (structured output)
273
+
274
+ ```ts
275
+ for await (const event of assembleStream(response.body!, openaiChatAdapter({ jsonMode: true }))) {
276
+   if (event.type === "json.delta") process.stdout.write(event.delta);
277
+   if (event.type === "json.done") console.log(event.json);
278
+ }
279
+ ```
280
+
281
+ ### Tool calling
282
+
283
+ ```ts
284
+ for await (const event of assembleStream(response.body!, openaiChatAdapter())) {
285
+   if (event.type === "tool_call.args.delta") process.stdout.write(event.delta);
286
+   if (event.type === "tool_call.done") console.log(event.name, event.args);
287
+ }
288
+ ```
289
+
290
+ ### Chat UI / markdown rendering
291
+
292
+ Stream `text.delta` into your renderer — this library does **not** parse markdown/XML tags from model output (see [Non-goals](#non-goals)).
293
+
294
+ ### SSE proxy to browser
295
+
296
+ → [`examples/proxy-safety/`](./examples/proxy-safety/) — `toSSE(events, { sanitizeErrors: true })`
297
+
298
+ ### Fixture replay
299
+
300
+ → [`examples/node-fetch/replay-fixture.ts`](./examples/node-fetch/replay-fixture.ts)
301
+
302
+ ---
303
+
304
+ ## Usage guides
305
+
306
+ ### Core Usage
307
+
308
+ The core pipeline works with any adapter that emits `RawChunk[]`, including the built-in OpenAI Chat, OpenAI-compatible, Anthropic Messages, OpenAI Responses, and Google Gemini adapters:
51
309
 
52
310
  ```ts
53
311
  import { assembleFromPayloads, type StreamAdapter } from "llm-stream-assemble";
@@ -66,17 +324,7 @@ for await (const event of assembleFromPayloads(payloads, adapter)) {
66
324
 
67
325
  Assembly buffers completed text, reasoning, JSON, and tool-call arguments so it can emit final `.done` events. Use `maxBufferBytes` to cap those buffers for untrusted or unusually large streams.
68
326
 
69
- ## Quickstart
70
-
71
- ```ts
72
- import { assembleStream, openaiChatAdapter } from "llm-stream-assemble";
73
-
74
- for await (const event of assembleStream(response.body!, openaiChatAdapter())) {
75
-   if (event.type === "text.delta") process.stdout.write(event.text);
76
- }
77
- ```
78
-
79
- ## OpenAI Chat Usage
327
+ ### OpenAI Chat Usage
80
328
 
81
329
  `openaiChatAdapter()` parses OpenAI Chat Completions payloads. Create one adapter instance per request/stream because it keeps minimal state for metadata and tool-call indexes.
82
330
 
@@ -104,7 +352,7 @@ for await (const event of assembleStream(response.body!, openaiChatAdapter())) {
104
352
 
105
353
  Streaming usage requires `stream_options: { include_usage: true }` on the OpenAI request. JSON mode content is exposed by OpenAI as normal content deltas, so use `openaiChatAdapter({ jsonMode: true })` when you want content mapped to `json.*` events.
106
354
 
107
- ## OpenAI-Compatible Usage
355
+ ### OpenAI-Compatible Usage
108
356
 
109
357
  `openaiCompatibleAdapter()` supports OpenAI-shaped Chat Completions APIs with best-effort provider presets. Create one adapter instance per request/stream.
110
358
 
@@ -122,15 +370,23 @@ for await (const event of assembleStream(response.body!, adapter)) {
122
370
 
123
371
  Provider presets:
124
372
 
125
- | Preset | Intended hosts | Notes |
126
- | ------------ | ----------------------------- | ----------------------------------------------------------- |
127
- | `generic` | Any OpenAI-shaped API | Loose defaults, best first try |
128
- | `openrouter` | OpenRouter | Mostly OpenAI-shaped; provider-specific metadata may appear |
129
- | `groq` | Groq OpenAI-compatible API | OpenAI-like; usage can vary by endpoint/model |
130
- | `ollama` | Ollama `/v1/chat/completions` | Local host, metadata may be sparse |
131
- | `lmstudio` | LM Studio local server | Local host, metadata/usage may be sparse |
132
- | `together` | Together AI | OpenAI-like, reasoning fields may vary |
133
- | `fireworks` | Fireworks AI | OpenAI-like, usage/details may vary |
373
+ | Preset | Intended hosts | Notes |
374
+ | ------------ | ----------------------------- | ------------------------------------------------------------------------------------------- |
375
+ | `generic` | Any OpenAI-shaped API | Loose defaults, best first try |
376
+ | `openrouter` | OpenRouter | Mostly OpenAI-shaped; provider-specific metadata may appear |
377
+ | `groq` | Groq OpenAI-compatible API | OpenAI-like; usage can vary by endpoint/model |
378
+ | `deepseek` | DeepSeek API | Maps `reasoning_content` to reasoning events on R1-style models |
379
+ | `mistral` | Mistral API | OpenAI-like; parallel tool calls supported |
380
+ | `ollama` | Ollama `/v1/chat/completions` | Local host, metadata may be sparse |
381
+ | `lmstudio` | LM Studio local server | Local host, metadata/usage may be sparse |
382
+ | `together` | Together AI | OpenAI-like; `reasoning` / `reasoning_delta` aliases |
383
+ | `fireworks` | Fireworks AI | OpenAI-like, usage/details may vary |
384
+ | `perplexity` | Perplexity API | Search-grounded answers; citations in `metadata.raw` |
385
+ | `xai` | xAI Grok API | OpenAI-compatible; `reasoning_content` mapped when present |
386
+ | `azure` | Azure OpenAI Chat Completions | Stricter preset; deployment URL + `api-key` auth; content filter metadata in `metadata.raw` |
387
+ | `cloudflare` | Cloudflare Workers AI REST | OpenAI-compatible `/v1/chat/completions`; Bearer + account id; loose preset like Groq |
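
As a rough illustration of the `reasoning_content` mapping mentioned for the `deepseek` and `xai` presets, the sketch below routes one delta into separate channels. The `ChatDelta` shape, the `routeDelta` function, and the exact event name `reasoning.delta` are assumptions made for this example; the README only states that reasoning maps to `reasoning.*`.

```ts
// Assumed shape of one OpenAI-compatible streaming delta. R1-style hosts
// add `reasoning_content` next to the usual `content`.
interface ChatDelta {
  content?: string | null;
  reasoning_content?: string | null;
}

// Hypothetical two-channel event union for this sketch only.
type ChannelEvent =
  | { type: "reasoning.delta"; text: string }
  | { type: "text.delta"; text: string };

// Route each field to its own channel so hidden reasoning never lands in user text.
function routeDelta(delta: ChatDelta): ChannelEvent[] {
  const events: ChannelEvent[] = [];
  if (delta.reasoning_content) {
    events.push({ type: "reasoning.delta", text: delta.reasoning_content });
  }
  if (delta.content) {
    events.push({ type: "text.delta", text: delta.content });
  }
  return events;
}

const sampleDeltas: ChatDelta[] = [
  { reasoning_content: "User wants a greeting." },
  { content: "Hello" },
  { content: "!" },
];

let visibleText = "";
let hiddenReasoning = "";
for (const delta of sampleDeltas) {
  for (const event of routeDelta(delta)) {
    if (event.type === "text.delta") visibleText += event.text;
    else hiddenReasoning += event.text;
  }
}

console.log(visibleText); // Hello!
```

A plain `+=` over every string field would have shown the reasoning sentence to the user.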
388
+
389
+ Base URL examples: Groq `https://api.groq.com/openai/v1`, DeepSeek `https://api.deepseek.com`, Mistral `https://api.mistral.ai/v1`, Ollama `http://localhost:11434/v1`, LM Studio `http://localhost:1234/v1`, Together `https://api.together.xyz/v1`, Fireworks `https://api.fireworks.ai/inference/v1`, OpenRouter `https://openrouter.ai/api/v1`, Perplexity `https://api.perplexity.ai`, xAI `https://api.x.ai/v1`, Azure OpenAI `https://{resource}.openai.azure.com/openai/deployments/{deployment}/chat/completions?api-version={version}`, Cloudflare Workers AI `https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/v1/chat/completions`.
134
390
 
135
391
  Strict vs loose configuration:
136
392
 
@@ -155,7 +411,82 @@ Known limitations:
155
411
  - Multi-choice terminal behavior is limited by the current core single terminal finish event.
156
412
  - Missing tool ids are tolerated because core can synthesize stable ids by index.
157
413
 
158
- ## Anthropic Messages Usage
414
+ ### Azure OpenAI Usage
415
+
416
+ Azure OpenAI Chat Completions uses a deployment-scoped URL and **`api-key`** authentication instead of Bearer tokens. Use the **`azure`** preset — not `generic` — for stricter parsing aligned with OpenAI Chat semantics (`allowMissingMetadata: false`, `looseErrorShape: false`).
417
+
418
+ ```ts
419
+ import { assembleStream, openaiCompatibleAdapter } from "llm-stream-assemble";
420
+
421
+ const resource = process.env.AZURE_OPENAI_RESOURCE!;
422
+ const deployment = process.env.AZURE_OPENAI_DEPLOYMENT!;
423
+ const apiVersion = process.env.AZURE_OPENAI_API_VERSION ?? "2024-10-21";
424
+ const url = `https://${resource}.openai.azure.com/openai/deployments/${deployment}/chat/completions?api-version=${apiVersion}`;
425
+
426
+ const response = await fetch(url, {
427
+   method: "POST",
428
+   headers: {
429
+     "api-key": process.env.AZURE_OPENAI_API_KEY!,
430
+     "Content-Type": "application/json",
431
+   },
432
+   body: JSON.stringify({
433
+     messages: [{ role: "user", content: "Hello" }],
434
+     stream: true,
435
+     stream_options: { include_usage: true },
436
+   }),
437
+ });
438
+
439
+ for await (const event of assembleStream(
440
+   response.body!,
441
+   openaiCompatibleAdapter({ provider: "azure" }),
442
+ )) {
443
+   if (event.type === "text.delta") process.stdout.write(event.text);
444
+ }
445
+ ```
446
+
447
+ Use `openaiCompatibleAdapter({ provider: "azure", jsonMode: true })` when structured JSON output should map to `json.*` events. Content-filter blocks surface as `refusal.*` events with `finish_reason: content_filter`; filter result fields remain in `metadata.raw` for auditing. If an API gateway strips metadata from chunks, relax strict parsing with `allowMissingMetadata: true`, and do so only in server-side code.
448
+
449
+ See `examples/node-fetch/azure-openai.ts` for a URL builder helper and `examples/proxy-safety/README.md` for server-side proxy notes.
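
For readers without the repository at hand, a URL builder along these lines might look like the sketch below. It is not the helper from `examples/node-fetch/azure-openai.ts`; `buildAzureChatUrl` and `AzureChatTarget` are hypothetical names, and the default API version simply reuses the value from the snippet above.

```ts
// Hypothetical input type for this sketch.
interface AzureChatTarget {
  resource: string; // Azure OpenAI resource name (the subdomain)
  deployment: string; // deployment name chosen in Azure
  apiVersion?: string; // falls back to the version used in the example above
}

// Builds the deployment-scoped Chat Completions URL described in this section.
function buildAzureChatUrl({
  resource,
  deployment,
  apiVersion = "2024-10-21",
}: AzureChatTarget): string {
  const path = `/openai/deployments/${encodeURIComponent(deployment)}/chat/completions`;
  const query = `api-version=${encodeURIComponent(apiVersion)}`;
  return `https://${resource}.openai.azure.com${path}?${query}`;
}

const azureUrl = buildAzureChatUrl({
  resource: "my-resource",
  deployment: "gpt-4o-mini",
});
console.log(azureUrl);
```

Encoding the deployment name keeps names with unusual characters from altering the path; the resource name is left as-is because it forms part of the hostname.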
450
+
451
+ ### Cloudflare Workers AI Usage
452
+
453
+ Cloudflare Workers AI exposes an OpenAI-compatible REST endpoint at `/v1/chat/completions` under your account. Use the **`cloudflare`** preset — not `generic` — when you want fixture-tested defaults for Workers AI REST (loose metadata tolerance like Groq).
454
+
455
+ ```ts
456
+ import { assembleStream, openaiCompatibleAdapter } from "llm-stream-assemble";
457
+
458
+ const accountId = process.env.CLOUDFLARE_ACCOUNT_ID!;
459
+ const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/v1/chat/completions`;
460
+
461
+ const response = await fetch(url, {
462
+   method: "POST",
463
+   headers: {
464
+     Authorization: `Bearer ${process.env.CLOUDFLARE_API_TOKEN!}`,
465
+     "Content-Type": "application/json",
466
+   },
467
+   body: JSON.stringify({
468
+     model: "@cf/meta/llama-3.1-8b-instruct",
469
+     messages: [{ role: "user", content: "Hello" }],
470
+     stream: true,
471
+     stream_options: { include_usage: true },
472
+   }),
473
+ });
474
+
475
+ for await (const event of assembleStream(
476
+   response.body!,
477
+   openaiCompatibleAdapter({ provider: "cloudflare" }),
478
+ )) {
479
+   if (event.type === "text.delta") process.stdout.write(event.text);
480
+ }
481
+ ```
482
+
483
+ Streaming usage requires `stream_options: { include_usage: true }` on the request. Use `openaiCompatibleAdapter({ provider: "cloudflare", jsonMode: true })` when JSON output should map to `json.*` events.
484
+
485
+ The **`env.AI.run(model, { stream: true })`** Worker binding can return SSE bytes compatible with `assembleStream` when the model streams Chat Completions-shaped payloads — account binding and auth stay in your Worker; this library only parses the bytes.
486
+
487
+ See `examples/workers-ai/rest-chat-completions.ts` and `examples/proxy-safety/README.md` (Bearer token + account id must never reach the browser).
488
+
489
+ ### Anthropic Messages Usage
159
490
 
160
491
  `anthropicAdapter()` parses Anthropic Messages streaming events and non-streaming responses. Create one adapter instance per request/stream.
161
492
 
@@ -169,7 +500,7 @@ for await (const event of assembleStream(response.body!, anthropicAdapter())) {
169
500
 
170
501
  Anthropic tool calls are emitted from `tool_use` content blocks. Fine-grained tool input streaming is supported through `input_json_delta`; partial input may be invalid JSON until the block ends, and core handles those partial previews best-effort. Thinking blocks map to `reasoning.*` events with `variant: "detail"`.
171
502
 
172
- ## OpenAI Responses Usage
503
+ ### OpenAI Responses Usage
173
504
 
174
505
  `openaiResponsesAdapter()` parses OpenAI Responses API streaming events and non-streaming response objects. It focuses on output text and function call argument streams; Realtime, audio, and multimodal binary output are out of scope.
175
506
 
@@ -183,7 +514,44 @@ for await (const event of assembleStream(response.body!, openaiResponsesAdapter(
183
514
 
184
515
  Use `openaiResponsesAdapter({ jsonMode: true })` to map output text to `json.*` events. Reasoning support is best-effort for string summary/detail fields. Create a new adapter instance per stream.
185
516
 
186
- ## Collecting a Stream
517
+ ### Gemini Usage
518
+
519
+ `geminiAdapter()` parses Google AI Gemini `GenerateContentResponse` payloads from `streamGenerateContent?alt=sse` and non-streaming `generateContent`. Create one adapter instance per request/stream.
520
+
521
+ ```ts
522
+ import { assembleStream, geminiAdapter } from "llm-stream-assemble";
523
+
524
+ const model = "gemini-2.5-flash";
525
+ const apiKey = process.env.GOOGLE_API_KEY!;
526
+ const url = `https://generativelanguage.googleapis.com/v1beta/models/${model}:streamGenerateContent?alt=sse&key=${encodeURIComponent(apiKey)}`;
527
+
528
+ const response = await fetch(url, {
529
+   method: "POST",
530
+   headers: { "Content-Type": "application/json" },
531
+   body: JSON.stringify({
532
+     contents: [{ role: "user", parts: [{ text: "Hello" }] }],
533
+   }),
534
+ });
535
+
536
+ for await (const event of assembleStream(response.body!, geminiAdapter())) {
537
+   if (event.type === "text.delta") process.stdout.write(event.text);
538
+   if (event.type === "tool_call.done") console.log(event.name, event.args);
539
+ }
540
+ ```
541
+
542
+ Use `geminiAdapter({ jsonMode: true })` when structured JSON output should map to `json.*` instead of `text.*`. Thinking models may emit `thought` parts mapped to `reasoning.*` (best-effort). Gemini does not expose OpenAI-style `refusal.*` events — blocked prompts use `promptFeedback` or safety finish reasons instead.
543
+
544
+ Subpath import: `import { geminiAdapter } from "llm-stream-assemble/adapters/gemini"`.
545
+
546
+ Vertex AI and the Interactions API are out of scope for this adapter; see [compatibility matrix](./docs/compatibility.md).
547
+
548
+ ---
549
+
550
+ ## Transforms & replay
551
+
552
+ ![Transforms and helpers](https://raw.githubusercontent.com/01laky/llm-stream-assemble/main/docs/img/transforms.svg)
553
+
554
+ ### Collecting a Stream
187
555
 
188
556
  `collectStream()` materializes a full event stream into text, reasoning, refusals, JSON, tool calls, latest usage, and finish reason. It buffers full output in memory and aggregates multi-choice text in event order; it is not a per-choice collector and does not currently collect metadata.
189
557
 
@@ -194,7 +562,7 @@ const result = await collectStream(events);
194
562
  console.log(result.text, result.toolCalls, result.finishReason);
195
563
  ```
196
564
 
197
- ## Tapping Events
565
+ ### Tapping Events
198
566
 
199
567
  `tapEvents()` lets you observe events for logging or metrics without changing the stream.
200
568
 
@@ -206,7 +574,7 @@ for await (const event of tapEvents(events, (event) => console.debug(event.type)
206
574
  }
207
575
  ```
208
576
 
209
- ## Forwarding Unified SSE
577
+ ### Forwarding Unified SSE
210
578
 
211
579
  `toSSE()` serializes unified `StreamEvent` objects as `data: <json>` SSE messages. It does not currently emit named SSE `event:` fields, and it emits unified event JSON rather than raw provider SSE.
212
580
 
@@ -220,7 +588,7 @@ return new Response(toSSE(events, { sanitizeErrors: true }), {
220
588
 
221
589
  Use `sanitizeErrors: true` when forwarding events to browsers so raw provider internals are not exposed.
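
To show what `data: <json>` framing with error sanitization could look like, here is a small sketch. It is not the library's `toSSE()`; the `SketchEvent` union, the error event's field names, and the replacement message are all assumptions made for illustration.

```ts
// Hypothetical event union. The real StreamEvent error shape is not shown
// in this README, so `message` and `raw` are assumed field names.
type SketchEvent =
  | { type: "text.delta"; text: string }
  | { type: "error"; message: string; raw?: unknown };

// Replace error details with a generic message and drop the raw provider payload.
function sanitizeEvent(event: SketchEvent): SketchEvent {
  if (event.type !== "error") return event;
  return { type: "error", message: "Upstream provider error" };
}

// Serialize one event as a single SSE message: `data: <json>` plus a blank line.
function toSseFrame(event: SketchEvent, sanitizeErrors: boolean): string {
  const safeEvent = sanitizeErrors ? sanitizeEvent(event) : event;
  return `data: ${JSON.stringify(safeEvent)}\n\n`;
}

const textFrame = toSseFrame({ type: "text.delta", text: "Hi" }, true);
const errorFrame = toSseFrame(
  { type: "error", message: "401 from upstream", raw: { apiKey: "sk-secret" } },
  true,
);

console.log(textFrame);
console.log(errorFrame);
```

The browser still learns that an error occurred, but the upstream message and raw payload stay on the server.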
222
590
 
223
- ## Replaying Fixtures
591
+ ### Replaying Fixtures
224
592
 
225
593
  `assembleFromFile()` is a Node/dev replay helper for local `.sse` and `.json` fixtures. It uses `node:fs/promises`, so avoid it in browser bundles; a dedicated browser/edge entry point can be added later if needed.
226
594
 
@@ -235,14 +603,22 @@ for await (const event of assembleFromFile(
235
603
  }
236
604
  ```
237
605
 
238
- ## Examples
606
+ ---
607
+
608
+ ## Examples & proxy safety
239
609
 
240
- - `examples/node-fetch/openai-chat.ts`
241
- - `examples/node-fetch/openai-compatible.ts`
242
- - `examples/node-fetch/anthropic.ts`
243
- - `examples/node-fetch/replay-fixture.ts`
244
- - `examples/proxy-safety/web-standard-proxy.ts`
245
- - `examples/proxy-safety/browser-client.ts`
610
+ | Example | Description |
611
+ | ------------------------------------------------------------------------------------------------ | ------------------------------------------------ |
612
+ | [`examples/node-fetch/openai-chat.ts`](./examples/node-fetch/openai-chat.ts) | OpenAI Chat Completions streaming |
613
+ | [`examples/node-fetch/openai-compatible.ts`](./examples/node-fetch/openai-compatible.ts) | OpenAI-compatible presets |
614
+ | [`examples/node-fetch/azure-openai.ts`](./examples/node-fetch/azure-openai.ts) | Azure OpenAI deployment URL + `api-key` |
615
+ | [`examples/workers-ai/rest-chat-completions.ts`](./examples/workers-ai/rest-chat-completions.ts) | Cloudflare Workers AI REST + `cloudflare` preset |
616
+ | [`examples/node-fetch/perplexity.ts`](./examples/node-fetch/perplexity.ts) | Perplexity streaming |
617
+ | [`examples/node-fetch/xai.ts`](./examples/node-fetch/xai.ts) | xAI Grok streaming |
618
+ | [`examples/node-fetch/anthropic.ts`](./examples/node-fetch/anthropic.ts) | Anthropic Messages |
619
+ | [`examples/node-fetch/gemini.ts`](./examples/node-fetch/gemini.ts) | Google Gemini SSE |
620
+ | [`examples/node-fetch/replay-fixture.ts`](./examples/node-fetch/replay-fixture.ts) | Local fixture replay |
621
+ | [`examples/proxy-safety/`](./examples/proxy-safety/) | Proxy + browser client patterns |
246
622
 
247
623
  Proxy safety:
248
624
 
@@ -251,6 +627,8 @@ Proxy safety:
251
627
  - Never forward raw provider errors or upstream non-OK response bodies to browsers.
252
628
  - CORS headers are application-specific and intentionally omitted from the Web-standard example.
253
629
 
630
+ ---
631
+
254
632
  ## Non-goals
255
633
 
256
634
  - No HTTP client, auth, retries, or provider SDK wrapper.
@@ -258,6 +636,8 @@ Proxy safety:
258
636
  - No UI framework, React hooks, or browser components.
259
637
  - No multimodal binary/audio/video parsing.
260
638
 
639
+ ---
640
+
261
641
  ## Development
262
642
 
263
643
  ```bash
@@ -265,14 +645,17 @@ pnpm install
265
645
  pnpm verify
266
646
  ```
267
647
 
268
- Scripts:
269
-
270
- | Command | Description |
271
- | ------------------ | -------------------------------------- |
272
- | `pnpm verify` | lint + typecheck + test + build |
273
- | `pnpm verify:deps` | fail if runtime dependencies are added |
274
- | `pnpm test` | Vitest smoke tests |
275
- | `pnpm build` | tsup ESM + CJS + declarations |
648
+ | Command | Description |
649
+ | --------------------- | --------------------------------------------------- |
650
+ | `pnpm verify` | lint + typecheck + test + build |
651
+ | `pnpm verify:deps` | fail if runtime dependencies are added |
652
+ | `pnpm release:prep` | pre-tag checks (version, CHANGELOG, dist, npm pack) |
653
+ | `pnpm diagrams:build` | regenerate README SVGs from Mermaid sources |
654
+ | `pnpm bench:smoke` | local LSA-C52 timing script (requires build first) |
655
+ | `pnpm test` | Vitest smoke tests |
656
+ | `pnpm build` | tsup → ESM + CJS + declarations |
657
+
658
+ ---
276
659
 
277
660
  ## Author
278
661