llm-stream-assemble 1.9.1 → 1.10.2
- package/README.md +76 -673
- package/dist/adapters/anthropic.cjs +4 -4
- package/dist/adapters/anthropic.cjs.map +1 -1
- package/dist/adapters/anthropic.js +4 -4
- package/dist/adapters/anthropic.js.map +1 -1
- package/dist/adapters/bedrock.cjs +82 -76
- package/dist/adapters/bedrock.cjs.map +1 -1
- package/dist/adapters/bedrock.d.cts +1 -0
- package/dist/adapters/bedrock.d.ts +1 -0
- package/dist/adapters/bedrock.js +82 -76
- package/dist/adapters/bedrock.js.map +1 -1
- package/dist/adapters/cohere.cjs +209 -196
- package/dist/adapters/cohere.cjs.map +1 -1
- package/dist/adapters/cohere.d.cts +1 -0
- package/dist/adapters/cohere.d.ts +1 -0
- package/dist/adapters/cohere.js +209 -196
- package/dist/adapters/cohere.js.map +1 -1
- package/dist/adapters/gemini.cjs +212 -191
- package/dist/adapters/gemini.cjs.map +1 -1
- package/dist/adapters/gemini.d.cts +3 -1
- package/dist/adapters/gemini.d.ts +3 -1
- package/dist/adapters/gemini.js +212 -191
- package/dist/adapters/gemini.js.map +1 -1
- package/dist/adapters/openai-chat.cjs +3 -3
- package/dist/adapters/openai-chat.cjs.map +1 -1
- package/dist/adapters/openai-chat.js +3 -3
- package/dist/adapters/openai-chat.js.map +1 -1
- package/dist/adapters/openai-compatible.cjs +3 -3
- package/dist/adapters/openai-compatible.cjs.map +1 -1
- package/dist/adapters/openai-compatible.js +3 -3
- package/dist/adapters/openai-compatible.js.map +1 -1
- package/dist/adapters/openai-responses.cjs +368 -356
- package/dist/adapters/openai-responses.cjs.map +1 -1
- package/dist/adapters/openai-responses.d.cts +1 -0
- package/dist/adapters/openai-responses.d.ts +1 -0
- package/dist/adapters/openai-responses.js +368 -356
- package/dist/adapters/openai-responses.js.map +1 -1
- package/dist/core/index.cjs +67 -40
- package/dist/core/index.cjs.map +1 -1
- package/dist/core/index.js +67 -40
- package/dist/core/index.js.map +1 -1
- package/dist/index.cjs +767 -688
- package/dist/index.cjs.map +1 -1
- package/dist/index.js +767 -688
- package/dist/index.js.map +1 -1
- package/package.json +5 -2
package/README.md
CHANGED
|
@@ -1,156 +1,80 @@
|
|
|
1
1
|
# llm-stream-assemble
|
|
2
2
|
|
|
3
|
-

|
|
4
4
|

|
|
5
5
|

|
|
6
|
-

|
|
7
7
|
[](https://github.com/01laky/llm-stream-assemble/actions/workflows/ci.yml)
|
|
8
|
-

|
|
9
9
|
|
|
10
|
-
**One typed event model for every LLM stream** — text,
|
|
10
|
+
**One typed event model for every LLM stream** — text, tools, reasoning, JSON, usage, refusals, citations, grounding, logprobs, errors, and non-streaming responses.
|
|
11
11
|
|
|
12
12
|
> A composable TypeScript layer between raw LLM provider bytes and your app: seven built-in adapters, thirteen host presets, and a single StreamEvent model for text, tools, reasoning, JSON, and lifecycle — from Ollama to Azure to Vertex AI to Bedrock to Cohere to Cloudflare Workers AI.
|
|
13
13
|
|
|
14
14
|
Turn provider SSE fragments into typed events — **not another `+=` loop**.
|
|
15
15
|
|
|
16
|
-
**Status:** Stable `1.
|
|
16
|
+
**Status:** Stable `1.10.2`. Review [CHANGELOG.md](./CHANGELOG.md) before major upgrades.
|
|
17
17
|
|
|
18
18
|
---
|
|
19
19
|
|
|
20
20
|
## Contents
|
|
21
21
|
|
|
22
|
+
- [Positioning](#positioning)
|
|
22
23
|
- [Why not just concatenate?](#why-not-just-concatenate)
|
|
23
24
|
- [Edge-case showcase](#edge-case-showcase)
|
|
24
25
|
- [Why use this](#why-use-this)
|
|
25
|
-
- [Architecture](#architecture)
|
|
26
|
-
- [Providers at a glance](#providers-at-a-glance)
|
|
27
26
|
- [Install](#install)
|
|
28
|
-
- [First success in 30 seconds](#first-success-in-30-seconds)
|
|
29
27
|
- [Quickstart](#quickstart)
|
|
30
|
-
- [
|
|
28
|
+
- [Architecture](#architecture)
|
|
29
|
+
- [Providers at a glance](#providers-at-a-glance)
|
|
31
30
|
- [Documentation](#documentation)
|
|
32
|
-
- [How this compares](#how-this-compares)
|
|
33
|
-
- [Examples](#examples)
|
|
34
|
-
- [Integration cookbook](#integration-cookbook)
|
|
35
31
|
- [Usage guides](#usage-guides)
|
|
36
|
-
- [
|
|
37
|
-
- [Examples & proxy safety](#examples--proxy-safety)
|
|
32
|
+
- [Examples](#examples)
|
|
38
33
|
- [Non-goals](#non-goals)
|
|
39
34
|
- [Development](#development)
|
|
40
35
|
|
|
41
36
|
---
|
|
42
37
|
|
|
43
|
-
##
|
|
44
|
-
|
|
45
|
-
Raw LLM streams look like text, but **simple string concatenation or naive `JSON.parse` per chunk fails** in production. Providers emit **protocol events**, not finished messages.
|
|
38
|
+
## Positioning
|
|
46
39
|
|
|
47
|
-
|
|
48
|
-
2. **Tool argument fragmentation** — function parameters arrive as partial JSON across dozens of deltas; only assembly produces valid `tool_call.done` args.
|
|
49
|
-
3. **Anthropic id/index ordering** — `tool_use` blocks may stream `index` before `id`; fine-grained `input_json_delta` is invalid JSON until the block ends.
|
|
50
|
-
4. **Reasoning vs user text** — DeepSeek R1, Claude thinking, and OpenAI reasoning models interleave hidden reasoning that must map to `reasoning.*`, not `text.*`.
|
|
51
|
-
5. **JSON mode streaming** — structured output streams as deltas; you do not receive a parsed object until completion (`json.delta` / `json.done`).
|
|
52
|
-
6. **Stream lifecycle** — `[DONE]` markers, usage-only tail chunks, and incomplete streams without explicit finish need consistent terminal handling.
|
|
53
|
-
7. **Mid-stream errors** — provider error payloads must not leak raw internals to browsers; use `sanitizeErrors` when proxying (**LSA-X23**).
|
|
54
|
-
8. **Dual code paths** — the same `StreamEvent` union should work for `stream: true` SSE and non-stream JSON (`assembleStream` vs `assembleResponse`).
|
|
40
|
+
`llm-stream-assemble` is the stream layer only: it parses provider payloads and emits unified typed events. You keep your own HTTP client, auth, retries, tool execution, and UI.
|
|
55
41
|
|
|
56
|
-
|
|
42
|
+
---
|
|
57
43
|
|
|
58
|
-
|
|
44
|
+
## Why not just concatenate?
|
|
59
45
|
|
|
60
|
-
|
|
46
|
+
Raw LLM streams are protocol events, not finished messages.
|
|
61
47
|
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
| **Parser invalidity mid-stream** | Anthropic `input_json_delta`, partial tool args | Partial preview; valid at `.done` |
|
|
68
|
-
| **JSON partials** | Structured output streams as fragments | `json.*`, `tool_call.args.delta` |
|
|
69
|
-
| **Markdown fences in model text** | ` ```json ` split across **text tokens** | **Out of scope** — render `text.delta` in your UI |
|
|
48
|
+
- **SSE boundaries split mid-line** across TCP reads; one read is not one JSON object.
|
|
49
|
+
- **Tool args stream as fragments** and become valid JSON only at completion.
|
|
50
|
+
- **Reasoning and text channels differ** and should not be merged blindly.
|
|
51
|
+
- **JSON mode streams partial strings** before `json.done`.
|
|
52
|
+
- **Lifecycle tails vary** (`[DONE]`, usage-only tails, incomplete streams).
|
|
70
53
|
|
|
71
|
-
|
|
54
|
+
Concrete fixtures and failing edge cases: [docs/edge-cases.md](./docs/edge-cases.md).
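The first bullet above (one network read is not one JSON object) can be illustrated with a minimal line buffer. This is a sketch under stated assumptions, not the package's parser: `createLineBuffer` is a hypothetical helper, input is assumed to be already-decoded text, and only `data:` lines are handled.

```ts
// Sketch only: a minimal SSE "data:" line buffer, not the library's parser.
// `createLineBuffer` is a hypothetical name introduced for illustration.
function createLineBuffer() {
  let pending = ""; // text received so far that is not yet terminated by a newline
  return {
    // Feed one network read; returns the payloads of complete `data:` lines.
    push(chunk: string): string[] {
      pending += chunk;
      const lines = pending.split("\n");
      pending = lines.pop() ?? ""; // the last piece is incomplete (or empty)
      const payloads: string[] = [];
      for (const raw of lines) {
        const line = raw.endsWith("\r") ? raw.slice(0, -1) : raw;
        if (!line.startsWith("data:")) continue; // skip blank lines, comments, other fields
        const payload = line.slice(5).replace(/^ /, ""); // drop one optional leading space
        if (payload !== "[DONE]") payloads.push(payload);
      }
      return payloads;
    },
  };
}

// One JSON object split across two reads: nothing is emitted until the newline arrives.
const lineBuffer = createLineBuffer();
const first = lineBuffer.push('data: {"text":"Hel');
const second = lineBuffer.push('lo"}\n\ndata: [DONE]\n\n');
console.log(first.length, second.length); // prints 0 1
```

When reading raw bytes, decoding with `new TextDecoder().decode(bytes, { stream: true })` before pushing keeps multi-byte characters that straddle two reads intact.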
|
|
72
55
|
|
|
73
56
|
---
|
|
74
57
|
|
|
75
58
|
## Edge-case showcase
|
|
76
59
|
|
|
77
|
-
|
|
60
|
+
Three layers fail differently in production: SSE framing, tool/JSON assembly, and UI rendering.
|
|
78
61
|
|
|
79
62
|

|
|
80
63
|
|
|
81
|
-
- **SSE mid-line split**
|
|
82
|
-
- **Tool JSON partials**
|
|
83
|
-
- **JSON mode**
|
|
64
|
+
- **SSE mid-line split** requires line-buffer parsing.
|
|
65
|
+
- **Tool JSON partials** require incremental assembly.
|
|
66
|
+
- **JSON mode** emits deltas, then terminal `.done`.
|
|
84
67
|
|
|
85
|
-
|
|
68
|
+
Walkthrough with fixtures and test IDs: [docs/edge-cases.md](./docs/edge-cases.md).
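The "incremental assembly" point can be sketched as follows: fragments are appended per tool-call index and parsed only once the call completes. `ToolArgsAssembler` and `ArgsFragment` are hypothetical names for illustration, and treating an empty buffer as `{}` is an assumption rather than documented library behavior.

```ts
// Sketch only: accumulate tool-call argument fragments per call index and
// parse once the call is complete. All names here are hypothetical.
type ArgsFragment = { index: number; delta: string };

class ToolArgsAssembler {
  private readonly buffers = new Map<number, string>();

  // Append a fragment; the buffer is usually NOT valid JSON at this point.
  push(fragment: ArgsFragment): void {
    const previous = this.buffers.get(fragment.index) ?? "";
    this.buffers.set(fragment.index, previous + fragment.delta);
  }

  // Parse the accumulated text for one call; call this only at completion.
  done(index: number): unknown {
    const text = this.buffers.get(index) ?? "";
    this.buffers.delete(index);
    // Assumption: a tool called with no arguments is treated as an empty object.
    return JSON.parse(text === "" ? "{}" : text);
  }
}

const assembler = new ToolArgsAssembler();
for (const delta of ['{"ci', 'ty":"Pra', 'gue"}']) {
  assembler.push({ index: 0, delta });
}
console.log(assembler.done(0)); // prints { city: 'Prague' }
```

Calling `JSON.parse` on any single fragment above would throw, which is why per-chunk parsing fails on real streams.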
|
|
86
69
|
|
|
87
70
|
---
|
|
88
71
|
|
|
89
72
|
## Why use this
|
|
90
73
|
|
|
91
|
-
- **Zero runtime dependencies
|
|
92
|
-
- **
|
|
93
|
-
- **Provider
|
|
94
|
-
- **Proxy-
|
|
95
|
-
|
|
96
|
-
### Performance at a glance
|
|
97
|
-
|
|
98
|
-
- **Zero runtime dependencies** — verified in CI (`pnpm verify:deps`)
|
|
99
|
-
- **Incremental SSE parsing** — line buffer; no full-stream re-parse
|
|
100
|
-
- **Single-pass O(n) assembly** — **LSA-C52** smoke test on 10k chunks
|
|
101
|
-
- **Bounded buffers** — `maxBufferBytes` for untrusted streams
|
|
102
|
-
- **Local repro:** `pnpm bench:smoke` — see [performance](./docs/performance.md)
|
|
103
|
-
|
|
104
|
-
---
|
|
105
|
-
|
|
106
|
-
## Architecture
|
|
107
|
-
|
|
108
|
-
Raw provider bytes enter through a **thin adapter**, get assembled into **typed events**, and leave through the same transform layer whether you stream live, replay fixtures, or proxy to a browser.
|
|
109
|
-
|
|
110
|
-

|
|
111
|
-
|
|
112
|
-
### Built-in adapters
|
|
113
|
-
|
|
114
|
-

|
|
115
|
-
|
|
116
|
-
### Unified event model
|
|
117
|
-
|
|
118
|
-
Every adapter maps provider-specific fragments into the same **`StreamEvent`** union:
|
|
119
|
-
|
|
120
|
-

|
|
121
|
-
|
|
122
|
-
**Provenance events** include **`citation`**, **`grounding`**, and **`logprob`**. Chat / compatible: enable with `logprobs: true` on the request. Responses API: enable with `include: ["message.output_text.logprobs"]`. Logprob events are atomic per token — use **`logprobConfidence()`** for probability/margin and **`alignLogprobsWithText()`** to map tokens onto assembled assistant text.
|
|
123
|
-
|
|
124
|
-
**Design constraints:** adapters never accumulate cross-chunk state beyond id/index reconciliation; assembly, buffering, and `.done` emission live in core. No HTTP client, no tool execution, no UI — just the stream layer.
|
|
125
|
-
|
|
126
|
-
### Lifecycle & concurrency
|
|
127
|
-
|
|
128
|
-
- **`EventAssembler` is stateful per stream** — it buffers text, reasoning, JSON, refusals, and open tool calls until `.done` / `finish`.
|
|
129
|
-
- **Public APIs create a new assembler per call** — `assembleStream`, `assembleFromPayloads`, `assembleResponse`, and `createAssemblyTransform` each construct their own instance.
|
|
130
|
-
- **One assembler = one stream/response** — do not share an instance across concurrent requests.
|
|
131
|
-
- **`EventAssembler.reset()`** clears state for tests or explicit reuse after a stream completes.
|
|
132
|
-
- **Adapters are thin** — one payload in, `RawChunk[]` out; create **one adapter instance per request/stream** (minimal id/index map only).
|
|
133
|
-
- **Transforms are stateless** — `tapEvents`, `toSSE`, and `collectStream` operate on the unified event stream.
|
|
134
|
-
|
|
135
|
-

|
|
136
|
-
|
|
137
|
-
Diagram sources: [`docs/img/`](./docs/img/) (Mermaid `.mmd` + committed SVG). Regenerate with `pnpm diagrams:build`.
|
|
138
|
-
|
|
139
|
-
---
|
|
140
|
-
|
|
141
|
-
## Providers at a glance
|
|
142
|
-
|
|
143
|
-
| Adapter | Provider / API | Import |
|
|
144
|
-
| --------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------- |
|
|
145
|
-
| `openaiChatAdapter()` | OpenAI Chat Completions | `llm-stream-assemble` |
|
|
146
|
-
| `openaiCompatibleAdapter({ provider })` | Groq, DeepSeek, Mistral, Ollama, LM Studio, Together, Fireworks, OpenRouter, Perplexity, xAI, **Azure OpenAI**, **Cloudflare Workers AI**, generic | `llm-stream-assemble` |
|
|
147
|
-
| `anthropicAdapter()` | Anthropic Messages | `llm-stream-assemble` |
|
|
148
|
-
| `openaiResponsesAdapter()` | OpenAI Responses API | `llm-stream-assemble` |
|
|
149
|
-
| `geminiAdapter()` | Google AI Gemini + Vertex AI (`apiSurface`) | `llm-stream-assemble` or `/adapters/gemini` |
|
|
150
|
-
| `bedrockAdapter()` | AWS Bedrock Converse / ConverseStream | `llm-stream-assemble` or `/adapters/bedrock` |
|
|
151
|
-
| `cohereAdapter()` | Cohere Chat v2 (`api.cohere.com/v2/chat`) | `llm-stream-assemble` or `/adapters/cohere` |
|
|
152
|
-
|
|
153
|
-
Full feature flags and quirks: [compatibility matrix](./docs/compatibility.md).
|
|
74
|
+
- **Zero runtime dependencies**.
|
|
75
|
+
- **One event union for stream and non-stream flows**.
|
|
76
|
+
- **Provider adapters + host presets** instead of per-provider parser rewrites.
|
|
77
|
+
- **Proxy-safe transforms** (`toSSE`, `tapEvents`, `collectStream`) and fixture replay.
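As a rough idea of what a proxy-safe transform does, the sketch below serializes events as SSE frames and replaces provider error text before it reaches a browser. It is not the package's `toSSE`: `toSseFrame`, `ProxyEvent`, and the `error` event shape are assumptions made for illustration.

```ts
// Sketch only: hypothetical names and an assumed error-event shape.
type ProxyEvent =
  | { type: "text.delta"; text: string }
  | { type: "error"; message: string };

// Serialize one event as an SSE frame, optionally hiding provider error details.
function toSseFrame(event: ProxyEvent, sanitizeErrors: boolean): string {
  const safe =
    sanitizeErrors && event.type === "error"
      ? { type: "error", message: "Upstream provider error" }
      : event;
  // JSON.stringify escapes newlines, so the payload always fits on one data line.
  return `data: ${JSON.stringify(safe)}\n\n`;
}

console.log(toSseFrame({ type: "text.delta", text: "Hi" }, true));
```

The design choice worth noting is that sanitizing happens at serialization time, so server-side logging can still see the original event.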
|
|
154
78
|
|
|
155
79
|
---
|
|
156
80
|
|
|
@@ -165,20 +89,11 @@ pnpm add llm-stream-assemble
|
|
|
165
89
|
|
|
166
90
|
## Runtimes
|
|
167
91
|
|
|
168
|
-
|
|
169
|
-
| ---------------------- | ------- | ------------------------------------------------------------------------------------------ |
|
|
170
|
-
| **Node.js 18+** | Primary | CI on LTS 18, 20, 22 — [compatibility matrix](./docs/compatibility.md) |
|
|
171
|
-
| **Bun** | Smoke | `ReadableStream` + npm package import |
|
|
172
|
-
| **Deno** | Smoke | `npm:llm-stream-assemble` specifiers |
|
|
173
|
-
| **Cloudflare Workers** | Smoke | `TransformStream` proxy patterns in [integration cookbook](./docs/integration-cookbook.md) |
|
|
174
|
-
|
|
175
|
-
Full matrix and caveats: [post-1.0 provider roadmap — Runtime support](./docs/post-1.0-provider-roadmap.md#runtime-support-matrix).
|
|
92
|
+
Node.js **18+** (CI on LTS 18, 20, 22). See [compatibility matrix](./docs/compatibility.md).
|
|
176
93
|
|
|
177
94
|
---
|
|
178
95
|
|
|
179
|
-
##
|
|
180
|
-
|
|
181
|
-
Minimal loop once you have a streaming `response.body` — see [Quickstart](#quickstart) for full `fetch` setup:
|
|
96
|
+
## Quickstart
|
|
182
97
|
|
|
183
98
|
```ts
|
|
184
99
|
import { assembleStream, openaiChatAdapter } from "llm-stream-assemble";
|
|
@@ -189,595 +104,84 @@ for await (const event of assembleStream(response.body!, openaiChatAdapter())) {
|
|
|
189
104
|
}
|
|
190
105
|
```
|
|
191
106
|
|
|
192
|
-
|
|
107
|
+
For provider-specific setup, request payloads, and caveats, use [docs/usage-guides.md](./docs/usage-guides.md).
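Once events arrive, most apps narrow on `event.type`. The sketch below uses a local stand-in type rather than the package's `StreamEvent`: the field names follow the snippets in the 1.9.1 README shown in this diff and are assumed to still hold, and the real union has more members. `SketchEvent` and `summarize` are hypothetical names.

```ts
// Sketch only: a local stand-in for a few members of the event union.
type SketchEvent =
  | { type: "text.delta"; text: string }
  | { type: "json.delta"; delta: string }
  | { type: "json.done"; json: unknown }
  | { type: "tool_call.done"; name: string; args: unknown };

// Fold a finished event sequence into plain values the app can use.
function summarize(events: Iterable<SketchEvent>) {
  let text = "";
  let json: unknown;
  const toolCalls: { name: string; args: unknown }[] = [];
  for (const event of events) {
    switch (event.type) {
      case "text.delta":
        text += event.text; // user-visible text arrives as deltas
        break;
      case "json.done":
        json = event.json; // the parsed value is available only at completion
        break;
      case "tool_call.done":
        toolCalls.push({ name: event.name, args: event.args });
        break;
      default:
        break; // json.delta is ignored in this sketch
    }
  }
  return { text, json, toolCalls };
}
```

Because the union is discriminated on `type`, TypeScript checks that each branch reads only fields that exist on that event.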
|
|
193
108
|
|
|
194
109
|
---
|
|
195
110
|
|
|
196
|
-
##
|
|
111
|
+
## Architecture
|
|
197
112
|
|
|
198
|
-
|
|
199
|
-
import { assembleStream, openaiChatAdapter } from "llm-stream-assemble";
|
|
113
|
+
Raw provider bytes enter through a thin adapter and exit as unified typed events.
|
|
200
114
|
|
|
201
|
-
|
|
202
|
-
if (event.type === "text.delta") process.stdout.write(event.text);
|
|
203
|
-
}
|
|
204
|
-
```
|
|
115
|
+
### Lifecycle & concurrency
|
|
205
116
|
|
|
206
|
-
|
|
117
|
+
Adapters are **stateful per stream** — create a new adapter instance per request. The assembler supports **`reset()`** for reuse within a long-lived worker when needed.
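Why one instance per request matters can be shown with a hypothetical stateful adapter. This is a sketch, not the package's adapter code: `createSketchAdapter`, the `Delta` shape, and the synthesized `call_<index>` id format are all assumptions for illustration.

```ts
// Sketch only: a hypothetical adapter with per-stream state, showing why
// instances should not be shared across concurrent requests.
type Delta = { index: number; id?: string; args: string };

function createSketchAdapter() {
  const idByIndex = new Map<number, string>(); // state scoped to ONE stream
  return {
    // Resolve the tool-call id for a delta, remembering ids seen earlier.
    resolveId(delta: Delta): string {
      if (delta.id !== undefined) idByIndex.set(delta.index, delta.id);
      // Assumption: fall back to an id synthesized from the index.
      return idByIndex.get(delta.index) ?? `call_${delta.index}`;
    },
  };
}

// Two concurrent requests each get their own instance, so index 0 cannot collide.
const requestA = createSketchAdapter();
const requestB = createSketchAdapter();
requestA.resolveId({ index: 0, id: "call_a", args: "" });
console.log(requestB.resolveId({ index: 0, args: "{" })); // prints call_0, not call_a
```

Had both requests shared one instance, the second stream's index 0 would have resolved to the first stream's id.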
|
|
207
118
|
|
|
208
|
-
|
|
119
|
+

|
|
209
120
|
|
|
210
|
-
|
|
121
|
+
- Architecture diagrams: [docs/img/README.md](./docs/img/README.md)
|
|
122
|
+
- Adapter graph: [docs/img/adapters-overview.svg](./docs/img/adapters-overview.svg)
|
|
123
|
+
- Transforms: [docs/img/transforms.svg](./docs/img/transforms.svg)
|
|
124
|
+
- Quick decision guide: [docs/img/quick-decision.svg](./docs/img/quick-decision.svg)
|
|
125
|
+
- Lifecycle model: [docs/img/assembler-lifecycle.svg](./docs/img/assembler-lifecycle.svg)
|
|
211
126
|
|
|
212
|
-
|
|
127
|
+
---
|
|
128
|
+
|
|
129
|
+
## Providers at a glance
|
|
130
|
+
|
|
131
|
+
| Adapter | Provider / API |
|
|
132
|
+
| --------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
|
|
133
|
+
| `openaiChatAdapter()` | OpenAI Chat Completions |
|
|
134
|
+
| `openaiCompatibleAdapter({ provider })` | Groq, DeepSeek, Mistral, Ollama, LM Studio, Together, Fireworks, OpenRouter, Perplexity, xAI, Azure OpenAI, Cloudflare Workers AI |
|
|
135
|
+
| `anthropicAdapter()` | Anthropic Messages |
|
|
136
|
+
| `openaiResponsesAdapter()` | OpenAI Responses API |
|
|
137
|
+
| `geminiAdapter()` | Google AI Gemini + Vertex AI (`apiSurface`) |
|
|
138
|
+
| `bedrockAdapter()` | AWS Bedrock Converse / ConverseStream |
|
|
139
|
+
| `cohereAdapter()` | Cohere Chat v2 (`api.cohere.com/v2/chat`) |
|
|
213
140
|
|
|
214
|
-
|
|
215
|
-
- **OpenAI Responses API** → `openaiResponsesAdapter()`
|
|
216
|
-
- **Anthropic Messages** → `anthropicAdapter()`
|
|
217
|
-
- **Google Gemini** → `geminiAdapter()`
|
|
218
|
-
- **AWS Bedrock ConverseStream** → `bedrockAdapter()` (decoded JSON per event — see [Bedrock Usage](#bedrock-usage))
|
|
219
|
-
- **Cohere Chat v2 SSE** → `cohereAdapter()` (not OpenAI-compatible — see [Cohere Usage](#cohere-usage))
|
|
220
|
-
- **Groq, Ollama, Azure, Cloudflare, OpenRouter, …** → `openaiCompatibleAdapter({ provider })`
|
|
221
|
-
- **Non-streaming JSON body** → `assembleResponse(body, adapter)`
|
|
222
|
-
- **React chat UI / full agent framework** → not this package — see [comparison](./docs/comparison.md)
|
|
223
|
-
- **XML/markdown tag parsing from model text** → out of scope — see [Non-goals](#non-goals)
|
|
141
|
+
Feature flags and quirks: [docs/compatibility.md](./docs/compatibility.md).
|
|
224
142
|
|
|
225
143
|
---
|
|
226
144
|
|
|
227
145
|
## Documentation
|
|
228
146
|
|
|
147
|
+
- [Usage guides](./docs/usage-guides.md)
|
|
229
148
|
- [Provider compatibility matrix](./docs/compatibility.md)
|
|
149
|
+
- [Integration cookbook](./docs/integration-cookbook.md)
|
|
150
|
+
- [Examples index](./examples/README.md)
|
|
230
151
|
- [Adapter author guide](./docs/adapter-guide.md)
|
|
231
152
|
- [Performance & runtime behavior](./docs/performance.md)
|
|
232
|
-
- [Edge-case showcase](./docs/edge-cases.md)
|
|
233
|
-
- [Integration cookbook](./docs/integration-cookbook.md)
|
|
234
153
|
- [How this compares](./docs/comparison.md)
|
|
235
154
|
- [FAQ](./docs/faq.md)
|
|
236
155
|
- [Architecture diagrams](./docs/img/README.md)
|
|
237
|
-
- [Live smoke checklist (maintainers)](./docs/live-smoke.md)
|
|
238
|
-
- [Post-1.0 provider roadmap](./docs/post-1.0-provider-roadmap.md)
|
|
239
|
-
- [Product & technical proposal](./docs/proposal.md)
|
|
240
|
-
|
|
241
|
-
---
|
|
242
|
-
|
|
243
|
-
## How this compares
|
|
244
|
-
|
|
245
|
-
| | llm-stream-assemble | Full-stack AI SDK | Provider SDK | DIY concat |
|
|
246
|
-
| ------------ | --------------------- | ------------------ | -------------- | ------------ |
|
|
247
|
-
| Scope | Stream assembly only | HTTP + UI + agents | Vendor RPC | Manual parse |
|
|
248
|
-
| Events | Unified `StreamEvent` | Framework types | Vendor types | Ad hoc |
|
|
249
|
-
| Dependencies | Zero runtime | Many | Vendor package | None |
|
|
250
|
-
|
|
251
|
-
Full matrix, when-not-to-use, and alternatives: **[docs/comparison.md](./docs/comparison.md)**.
|
|
252
|
-
|
|
253
|
-
---
|
|
254
|
-
|
|
255
|
-
## Examples
|
|
256
|
-
|
|
257
|
-
Curated index — full snippets live in [Usage guides](#usage-guides) and [`examples/`](./examples/README.md).
|
|
258
|
-
|
|
259
|
-
### OpenAI Chat
|
|
260
|
-
|
|
261
|
-
```ts
|
|
262
|
-
import { assembleStream, openaiChatAdapter } from "llm-stream-assemble";
|
|
263
|
-
// fetch(..., { stream: true }) then:
|
|
264
|
-
for await (const event of assembleStream(response.body!, openaiChatAdapter())) {
|
|
265
|
-
if (event.type === "text.delta") process.stdout.write(event.text);
|
|
266
|
-
}
|
|
267
|
-
```
|
|
268
|
-
|
|
269
|
-
→ [`examples/node-fetch/openai-chat.ts`](./examples/node-fetch/openai-chat.ts)
|
|
270
|
-
|
|
271
|
-
### Ollama (local)
|
|
272
|
-
|
|
273
|
-
```ts
|
|
274
|
-
import { assembleStream, openaiCompatibleAdapter } from "llm-stream-assemble";
|
|
275
|
-
const adapter = openaiCompatibleAdapter({ provider: "ollama" });
|
|
276
|
-
for await (const event of assembleStream(response.body!, adapter)) {
|
|
277
|
-
if (event.type === "text.delta") process.stdout.write(event.text);
|
|
278
|
-
}
|
|
279
|
-
```
|
|
280
|
-
|
|
281
|
-
→ [`examples/node-fetch/openai-compatible.ts`](./examples/node-fetch/openai-compatible.ts) · Usage: [OpenAI-Compatible](#openai-compatible-usage)
|
|
282
|
-
|
|
283
|
-
### Anthropic Messages
|
|
284
|
-
|
|
285
|
-
→ [`examples/node-fetch/anthropic.ts`](./examples/node-fetch/anthropic.ts) · Usage: [Anthropic Messages](#anthropic-messages-usage)
|
|
286
|
-
|
|
287
|
-
### Google Gemini
|
|
288
|
-
|
|
289
|
-
→ [`examples/node-fetch/gemini.ts`](./examples/node-fetch/gemini.ts) · Usage: [Gemini](#gemini-usage)
|
|
290
|
-
|
|
291
|
-
### AWS Bedrock
|
|
292
|
-
|
|
293
|
-
→ [`examples/node-fetch/bedrock.ts`](./examples/node-fetch/bedrock.ts) · Usage: [Bedrock](#bedrock-usage) · Decode helper: [`examples/bedrock/README.md`](./examples/bedrock/README.md)
|
|
294
|
-
|
|
295
|
-
### Cohere Chat v2
|
|
296
|
-
|
|
297
|
-
→ [`examples/node-fetch/cohere.ts`](./examples/node-fetch/cohere.ts) · Usage: [Cohere](#cohere-usage)
|
|
298
|
-
|
|
299
|
-
### Streaming JSON (structured output)
|
|
300
|
-
|
|
301
|
-
```ts
|
|
302
|
-
for await (const event of assembleStream(response.body!, openaiChatAdapter({ jsonMode: true }))) {
|
|
303
|
-
if (event.type === "json.delta") process.stdout.write(event.delta);
|
|
304
|
-
if (event.type === "json.done") console.log(event.json);
|
|
305
|
-
}
|
|
306
|
-
```
|
|
307
|
-
|
|
308
|
-
### Tool calling
|
|
309
|
-
|
|
310
|
-
```ts
|
|
311
|
-
for await (const event of assembleStream(response.body!, openaiChatAdapter())) {
|
|
312
|
-
if (event.type === "tool_call.args.delta") process.stdout.write(event.delta);
|
|
313
|
-
if (event.type === "tool_call.done") console.log(event.name, event.args);
|
|
314
|
-
}
|
|
315
|
-
```
|
|
316
|
-
|
|
317
|
-
### Chat UI / markdown rendering
|
|
318
|
-
|
|
319
|
-
Stream `text.delta` into your renderer — this library does **not** parse markdown/XML tags from model output (see [Non-goals](#non-goals)).
|
|
320
|
-
|
|
321
|
-
### SSE proxy to browser
|
|
322
|
-
|
|
323
|
-
→ [`examples/proxy-safety/`](./examples/proxy-safety/) — `toSSE(events, { sanitizeErrors: true })`
|
|
324
|
-
|
|
325
|
-
### Fixture replay
|
|
326
|
-
|
|
327
|
-
→ [`examples/node-fetch/replay-fixture.ts`](./examples/node-fetch/replay-fixture.ts)
|
|
328
|
-
|
|
329
|
-
### Integration cookbook
|
|
330
|
-
|
|
331
|
-
Wire unified events into **Hono**, **Express**, **Cloudflare Workers**, **LiteLLM**, **Next.js App Router**, AI SDK mapping, and LangChain callbacks — [`examples/integrations/`](./examples/integrations/) · **[Full cookbook →](./docs/integration-cookbook.md)**
|
|
332
156
|
|
|
333
157
|
---
|
|
334
158
|
|
|
335
159
|
## Usage guides
|
|
336
160
|
|
|
337
|
-
|
|
338
|
-
|
|
339
|
-
The core pipeline works with any adapter that emits `RawChunk[]`, including the built-in OpenAI Chat, OpenAI-compatible, Anthropic Messages, OpenAI Responses, Google Gemini, AWS Bedrock, and Cohere adapters:
|
|
340
|
-
|
|
341
|
-
```ts
|
|
342
|
-
import { assembleFromPayloads, type StreamAdapter } from "llm-stream-assemble";
|
|
343
|
-
|
|
344
|
-
const adapter: StreamAdapter = {
|
|
345
|
-
parseChunk(raw) {
|
|
346
|
-
const data = JSON.parse(raw) as { text?: string };
|
|
347
|
-
return data.text ? [{ kind: "text-delta", text: data.text }] : [];
|
|
348
|
-
},
|
|
349
|
-
};
|
|
350
|
-
|
|
351
|
-
for await (const event of assembleFromPayloads(payloads, adapter)) {
|
|
352
|
-
if (event.type === "text.delta") process.stdout.write(event.text);
|
|
353
|
-
}
|
|
354
|
-
```
|
|
161
|
+
Moved out of README to keep this page focused and release-stable:
|
|
355
162
|
|
|
356
|
-
|
|
163
|
+
- Core usage + adapter contract: [docs/usage-guides.md#core-usage](./docs/usage-guides.md#core-usage)
|
|
164
|
+
- OpenAI Chat / compatible / Azure / Cloudflare: [docs/usage-guides.md#openai-chat-usage](./docs/usage-guides.md#openai-chat-usage)
|
|
165
|
+
- Anthropic + OpenAI Responses: [docs/usage-guides.md#anthropic-messages-usage](./docs/usage-guides.md#anthropic-messages-usage)
|
|
166
|
+
- Gemini + Vertex: [docs/usage-guides.md#gemini-usage](./docs/usage-guides.md#gemini-usage)
|
|
167
|
+
- Bedrock + Cohere: [docs/usage-guides.md#bedrock-usage](./docs/usage-guides.md#bedrock-usage)
|
|
357
168
|
|
|
358
|
-
|
|
169
|
+
More operational guidance:
|
|
359
170
|
|
|
360
|
-
|
|
361
|
-
|
|
362
|
-
|
|
363
|
-
import { assembleStream, openaiChatAdapter } from "llm-stream-assemble";
|
|
364
|
-
|
|
365
|
-
const response = await fetch("https://api.openai.com/v1/chat/completions", {
|
|
366
|
-
method: "POST",
|
|
367
|
-
headers: {
|
|
368
|
-
Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
|
|
369
|
-
"Content-Type": "application/json",
|
|
370
|
-
},
|
|
371
|
-
body: JSON.stringify({
|
|
372
|
-
model: "gpt-4o-mini",
|
|
373
|
-
messages,
|
|
374
|
-
stream: true,
|
|
375
|
-
stream_options: { include_usage: true },
|
|
376
|
-
}),
|
|
377
|
-
});
|
|
378
|
-
|
|
379
|
-
for await (const event of assembleStream(response.body!, openaiChatAdapter())) {
|
|
380
|
-
if (event.type === "text.delta") process.stdout.write(event.text);
|
|
381
|
-
}
|
|
382
|
-
```
|
|
383
|
-
|
|
384
|
-
Streaming usage requires `stream_options: { include_usage: true }` on the OpenAI request. JSON mode content is exposed by OpenAI as normal content deltas, so use `openaiChatAdapter({ jsonMode: true })` when you want content mapped to `json.*` events.
|
|
385
|
-
|
|
386
|
-
### OpenAI-Compatible Usage
|
|
387
|
-
|
|
388
|
-
`openaiCompatibleAdapter()` supports OpenAI-shaped Chat Completions APIs with best-effort provider presets. Create one adapter instance per request/stream.
|
|
389
|
-
|
|
390
|
-
```ts
|
|
391
|
-
import { assembleStream, openaiCompatibleAdapter } from "llm-stream-assemble";
|
|
392
|
-
|
|
393
|
-
const adapter = openaiCompatibleAdapter({
|
|
394
|
-
provider: "openrouter",
|
|
395
|
-
});
|
|
396
|
-
|
|
397
|
-
for await (const event of assembleStream(response.body!, adapter)) {
|
|
398
|
-
if (event.type === "text.delta") process.stdout.write(event.text);
|
|
399
|
-
}
|
|
400
|
-
```
|
|
401
|
-
|
|
402
|
-
Provider presets:
|
|
403
|
-
|
|
404
|
-
| Preset | Intended hosts | Notes |
|
|
405
|
-
| ------------ | ----------------------------- | ------------------------------------------------------------------------------------------- |
|
|
406
|
-
| `generic` | Any OpenAI-shaped API | Loose defaults, best first try |
|
|
407
|
-
| `openrouter` | OpenRouter | Mostly OpenAI-shaped; provider-specific metadata may appear |
|
|
408
|
-
| `groq` | Groq OpenAI-compatible API | OpenAI-like; usage can vary by endpoint/model |
|
|
409
|
-
| `deepseek` | DeepSeek API | Maps `reasoning_content` to reasoning events on R1-style models |
|
|
410
|
-
| `mistral` | Mistral API | OpenAI-like; parallel tool calls supported |
|
|
411
|
-
| `ollama` | Ollama `/v1/chat/completions` | Local host, metadata may be sparse |
|
|
412
|
-
| `lmstudio` | LM Studio local server | Local host, metadata/usage may be sparse |
|
|
413
|
-
| `together` | Together AI | OpenAI-like; `reasoning` / `reasoning_delta` aliases |
|
|
414
|
-
| `fireworks` | Fireworks AI | OpenAI-like, usage/details may vary |
|
|
415
|
-
| `perplexity` | Perplexity API | Search-grounded answers; root `citations` / `search_results` → typed `citation` events |
|
|
416
|
-
| `xai` | xAI Grok API | OpenAI-compatible; `reasoning_content` mapped when present |
|
|
417
|
-
| `azure` | Azure OpenAI Chat Completions | Stricter preset; deployment URL + `api-key` auth; content filter metadata in `metadata.raw` |
|
|
418
|
-
| `cloudflare` | Cloudflare Workers AI REST | OpenAI-compatible `/v1/chat/completions`; Bearer + account id; loose preset like Groq |
|
|
419
|
-
|
|
420
|
-
Base URL examples: Groq `https://api.groq.com/openai/v1`, DeepSeek `https://api.deepseek.com`, Mistral `https://api.mistral.ai/v1`, Ollama `http://localhost:11434/v1`, LM Studio `http://localhost:1234/v1`, Together `https://api.together.xyz/v1`, Fireworks `https://api.fireworks.ai/inference/v1`, OpenRouter `https://openrouter.ai/api/v1`, Perplexity `https://api.perplexity.ai`, xAI `https://api.x.ai/v1`, Azure OpenAI `https://{resource}.openai.azure.com/openai/deployments/{deployment}/chat/completions?api-version={version}`, Cloudflare Workers AI `https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/v1/chat/completions`.
|
|
421
|
-
|
|
422
|
-
Strict vs loose configuration:
|
|
423
|
-
|
|
424
|
-
```ts
|
|
425
|
-
// Loose default: good for local/open-source OpenAI-compatible hosts.
|
|
426
|
-
openaiCompatibleAdapter({ provider: "ollama" });
|
|
427
|
-
|
|
428
|
-
// Stricter mode: useful when unexpected payload shapes should fail fast.
|
|
429
|
-
openaiCompatibleAdapter({
|
|
430
|
-
provider: "generic",
|
|
431
|
-
allowMissingMetadata: false,
|
|
432
|
-
looseErrorShape: false,
|
|
433
|
-
useChoicePositionFallback: false,
|
|
434
|
-
});
|
|
435
|
-
```
|
|
436
|
-
|
|
437
|
-
Known limitations:
|
|
438
|
-
|
|
439
|
-
- Provider presets are fixture-tested and best-effort; CI does not call live provider APIs.
|
|
440
|
-
- Hosts can change OpenAI-compatible dialects without notice.
|
|
441
|
-
- Non-string reasoning payloads are skipped.
|
|
442
|
-
- Multi-choice terminal behavior is limited by the current core single terminal finish event.
|
|
443
|
-
- Missing tool ids are tolerated because core can synthesize stable ids by index.
|
|
444
|
-
|
|
445
|
-
### Azure OpenAI Usage
|
|
446
|
-
|
|
447
|
-
Azure OpenAI Chat Completions uses a deployment-scoped URL and **`api-key`** authentication instead of Bearer tokens. Use the **`azure`** preset — not `generic` — for stricter parsing aligned with OpenAI Chat semantics (`allowMissingMetadata: false`, `looseErrorShape: false`).
|
|
448
|
-
|
|
449
|
-
```ts
|
|
450
|
-
import { assembleStream, openaiCompatibleAdapter } from "llm-stream-assemble";
|
|
451
|
-
|
|
452
|
-
const resource = process.env.AZURE_OPENAI_RESOURCE!;
|
|
453
|
-
const deployment = process.env.AZURE_OPENAI_DEPLOYMENT!;
|
|
454
|
-
const apiVersion = process.env.AZURE_OPENAI_API_VERSION ?? "2024-10-21";
|
|
455
|
-
const url = `https://${resource}.openai.azure.com/openai/deployments/${deployment}/chat/completions?api-version=${apiVersion}`;
|
|
456
|
-
|
|
457
|
-
const response = await fetch(url, {
|
|
458
|
-
method: "POST",
|
|
459
|
-
headers: {
|
|
460
|
-
"api-key": process.env.AZURE_OPENAI_API_KEY!,
|
|
461
|
-
"Content-Type": "application/json",
|
|
462
|
-
},
|
|
463
|
-
body: JSON.stringify({
|
|
464
|
-
messages: [{ role: "user", content: "Hello" }],
|
|
465
|
-
stream: true,
|
|
466
|
-
stream_options: { include_usage: true },
|
|
467
|
-
}),
|
|
468
|
-
});
|
|
469
|
-
|
|
470
|
-
for await (const event of assembleStream(
|
|
471
|
-
response.body!,
|
|
472
|
-
openaiCompatibleAdapter({ provider: "azure" }),
|
|
473
|
-
)) {
|
|
474
|
-
if (event.type === "text.delta") process.stdout.write(event.text);
|
|
475
|
-
}
|
|
476
|
-
```
|
|
477
|
-
|
|
478
|
-
Use `openaiCompatibleAdapter({ provider: "azure", jsonMode: true })` when structured JSON output should map to `json.*` events. Content-filter blocks surface as `refusal.*` events with `finish_reason: content_filter`; filter result fields remain in `metadata.raw` for auditing. If an API gateway strips metadata from chunks, soften strict parsing server-side only with `allowMissingMetadata: true`.
|
|
479
|
-
|
|
480
|
-
See `examples/node-fetch/azure-openai.ts` for a URL builder helper and `examples/proxy-safety/README.md` for server-side proxy notes.
|
|
481
|
-
|
|
482
|
-
### Cloudflare Workers AI Usage
|
|
483
|
-
|
|
484
|
-
Cloudflare Workers AI exposes an OpenAI-compatible REST endpoint at `/v1/chat/completions` under your account. Use the **`cloudflare`** preset — not `generic` — when you want fixture-tested defaults for Workers AI REST (loose metadata tolerance like Groq).
|
|
485
|
-
|
|
486
|
-
```ts
|
|
487
|
-
import { assembleStream, openaiCompatibleAdapter } from "llm-stream-assemble";
|
|
488
|
-
|
|
489
|
-
const accountId = process.env.CLOUDFLARE_ACCOUNT_ID!;
|
|
490
|
-
const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/v1/chat/completions`;
|
|
491
|
-
|
|
492
|
-
const response = await fetch(url, {
|
|
493
|
-
method: "POST",
|
|
494
|
-
headers: {
|
|
495
|
-
Authorization: `Bearer ${process.env.CLOUDFLARE_API_TOKEN!}`,
|
|
496
|
-
"Content-Type": "application/json",
|
|
497
|
-
},
|
|
498
|
-
body: JSON.stringify({
|
|
499
|
-
model: "@cf/meta/llama-3.1-8b-instruct",
|
|
500
|
-
messages: [{ role: "user", content: "Hello" }],
|
|
501
|
-
stream: true,
|
|
502
|
-
stream_options: { include_usage: true },
|
|
503
|
-
}),
|
|
504
|
-
});
|
|
505
|
-
|
|
506
|
-
for await (const event of assembleStream(
|
|
507
|
-
response.body!,
|
|
508
|
-
openaiCompatibleAdapter({ provider: "cloudflare" }),
|
|
509
|
-
)) {
|
|
510
|
-
if (event.type === "text.delta") process.stdout.write(event.text);
|
|
511
|
-
}
|
|
512
|
-
```
|
|
513
|
-
|
|
514
|
-
Streaming usage requires `stream_options: { include_usage: true }` on the request. Use `openaiCompatibleAdapter({ provider: "cloudflare", jsonMode: true })` when JSON output should map to `json.*` events.
|
|
515
|
-
|
|
516
|
-
The **`env.AI.run(model, { stream: true })`** Worker binding can return SSE bytes compatible with `assembleStream` when the model streams Chat Completions-shaped payloads — account binding and auth stay in your Worker; this library only parses the bytes.
|
|
517
|
-
|
|
518
|
-
See `examples/workers-ai/rest-chat-completions.ts` and `examples/proxy-safety/README.md` (Bearer token + account id must never reach the browser).
|
|
519
|
-
|
|
520
|
-
### Anthropic Messages Usage
|
|
521
|
-
|
|
522
|
-
`anthropicAdapter()` parses Anthropic Messages streaming events and non-streaming responses. Create one adapter instance per request/stream.
|
|
523
|
-
|
|
524
|
-
```ts
|
|
525
|
-
import { anthropicAdapter, assembleStream } from "llm-stream-assemble";
|
|
526
|
-
|
|
527
|
-
for await (const event of assembleStream(response.body!, anthropicAdapter())) {
|
|
528
|
-
if (event.type === "text.delta") process.stdout.write(event.text);
|
|
529
|
-
}
|
|
530
|
-
```
|
|
531
|
-
|
|
532
|
-
Anthropic tool calls are emitted from `tool_use` content blocks. Fine-grained tool input streaming is supported through `input_json_delta`; partial input may be invalid JSON until the block ends, and core handles those partial previews best-effort. Thinking blocks map to `reasoning.*` events with `variant: "detail"`.
|
|
533
|
-
|
|
534
|
-
### OpenAI Responses Usage
|
|
535
|
-
|
|
536
|
-
`openaiResponsesAdapter()` parses OpenAI Responses API streaming events and non-streaming response objects. It focuses on output text and function call argument streams; Realtime, audio, and multimodal binary output are out of scope.
|
|
537
|
-
|
|
538
|
-
```ts
|
|
539
|
-
import { assembleStream, openaiResponsesAdapter } from "llm-stream-assemble";
|
|
540
|
-
|
|
541
|
-
for await (const event of assembleStream(response.body!, openaiResponsesAdapter())) {
|
|
542
|
-
if (event.type === "tool_call.args.delta") console.log(event.delta);
|
|
543
|
-
}
|
|
544
|
-
```
|
|
545
|
-
|
|
546
|
-
Use `openaiResponsesAdapter({ jsonMode: true })` to map output text to `json.*` events. Reasoning support is best-effort for string summary/detail fields. Typed **`logprob`** events are emitted when the request sets `include: ["message.output_text.logprobs"]` (optionally with `top_logprobs`), using the same helpers as Chat Completions. Create a new adapter instance per stream.
|
|
547
|
-
|
|
548
|
-
### Gemini Usage
|
|
549
|
-
|
|
550
|
-
`geminiAdapter()` parses Google AI Gemini `GenerateContentResponse` payloads from `streamGenerateContent?alt=sse` and non-streaming `generateContent`. Create one adapter instance per request/stream.
|
|
551
|
-
|
|
552
|
-
```ts
|
|
553
|
-
import { assembleStream, geminiAdapter } from "llm-stream-assemble";
|
|
554
|
-
|
|
555
|
-
const model = "gemini-2.5-flash";
|
|
556
|
-
const apiKey = process.env.GOOGLE_API_KEY!;
|
|
557
|
-
const url = `https://generativelanguage.googleapis.com/v1beta/models/${model}:streamGenerateContent?alt=sse&key=${encodeURIComponent(apiKey)}`;
|
|
558
|
-
|
|
559
|
-
const response = await fetch(url, {
|
|
560
|
-
method: "POST",
|
|
561
|
-
headers: { "Content-Type": "application/json" },
|
|
562
|
-
body: JSON.stringify({
|
|
563
|
-
contents: [{ role: "user", parts: [{ text: "Hello" }] }],
|
|
564
|
-
}),
|
|
565
|
-
});
|
|
566
|
-
|
|
567
|
-
for await (const event of assembleStream(response.body!, geminiAdapter())) {
|
|
568
|
-
if (event.type === "text.delta") process.stdout.write(event.text);
|
|
569
|
-
if (event.type === "tool_call.done") console.log(event.name, event.args);
|
|
570
|
-
}
|
|
571
|
-
```
|
|
572
|
-
|
|
573
|
-
Use `geminiAdapter({ jsonMode: true })` when structured JSON output should map to `json.*` instead of `text.*`. Thinking models may emit `thought` parts mapped to `reasoning.*` (best-effort). Gemini does not expose OpenAI-style `refusal.*` events — blocked prompts use `promptFeedback` or safety finish reasons instead.
|
|
574
|
-
|
|
575
|
-
Subpath import: `import { geminiAdapter } from "llm-stream-assemble/adapters/gemini"`.
|
|
576
|
-
|
|
577
|
-
#### Vertex AI Gemini
|
|
578
|
-
|
|
579
|
-
Vertex uses the same `geminiAdapter()` with **`apiSurface: "vertex"`**. The adapter strips Vertex / gateway envelopes (`response`, `result`, `predictions[0]`) via **`normalizeVertexChunk()`** before mapping `candidates` and tools. Vertex HTTP streams are often **JSONL or concatenated JSON objects**, not Google AI `data:` SSE — split complete JSON strings in your app, then pass each line to `assembleFromPayloads` (see [`examples/vertex/read-chunk-stream.ts`](./examples/vertex/read-chunk-stream.ts)).
|
|
580
|
-
|
|
581
|
-
```ts
|
|
582
|
-
import { assembleFromPayloads, geminiAdapter } from "llm-stream-assemble";
|
|
583
|
-
import { buildVertexStreamUrl } from "./examples/vertex/build-vertex-url";
|
|
584
|
-
import { readVertexJsonlStrings } from "./examples/vertex/read-chunk-stream";
|
|
585
|
-
|
|
586
|
-
const projectId = process.env.GOOGLE_CLOUD_PROJECT!;
|
|
587
|
-
const location = process.env.VERTEX_LOCATION ?? "us-central1";
|
|
588
|
-
const model = process.env.VERTEX_MODEL ?? "gemini-2.5-flash";
|
|
589
|
-
const accessToken = process.env.VERTEX_ACCESS_TOKEN!; // ADC — not GOOGLE_API_KEY
|
|
590
|
-
|
|
591
|
-
const response = await fetch(buildVertexStreamUrl({ projectId, location, model }), {
|
|
592
|
-
method: "POST",
|
|
593
|
-
headers: {
|
|
594
|
-
Authorization: `Bearer ${accessToken}`,
|
|
595
|
-
"Content-Type": "application/json",
|
|
596
|
-
},
|
|
597
|
-
body: JSON.stringify({
|
|
598
|
-
contents: [{ role: "user", parts: [{ text: "Hello" }] }],
|
|
599
|
-
}),
|
|
600
|
-
});
|
|
601
|
-
|
|
602
|
-
async function* lines() {
|
|
603
|
-
for await (const line of readVertexJsonlStrings(response.body!)) yield line;
|
|
604
|
-
}
|
|
605
|
-
|
|
606
|
-
for await (const event of assembleFromPayloads(lines(), geminiAdapter({ apiSurface: "vertex" }))) {
|
|
607
|
-
if (event.type === "text.delta") process.stdout.write(event.text);
|
|
608
|
-
}
|
|
609
|
-
```
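
The "split complete JSON strings in your app" step can be approximated without any library support. The sketch below is an assumption-laden illustration, not the `readVertexJsonlStrings` helper from the examples directory: `createJsonObjectSplitter` is a hypothetical name, and it only extracts top-level `{...}` objects by tracking brace depth and string state across chunks.

```ts
// Hypothetical helper, not part of llm-stream-assemble.
// Feeds on decoded text chunks and returns every complete top-level JSON object.
function createJsonObjectSplitter(): (chunk: string) => string[] {
  let buffer = "";
  let depth = 0; // current brace nesting depth
  let inString = false; // inside a JSON string literal
  let escaped = false; // previous character was a backslash inside a string
  let start = -1; // index in buffer where the current object began

  return function push(chunk: string): string[] {
    const complete: string[] = [];
    const offset = buffer.length;
    buffer += chunk;
    for (let i = offset; i < buffer.length; i++) {
      const ch = buffer[i];
      if (inString) {
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"' && depth > 0) {
        inString = true;
      } else if (ch === "{") {
        if (depth === 0) start = i;
        depth++;
      } else if (ch === "}" && depth > 0) {
        depth--;
        if (depth === 0) {
          complete.push(buffer.slice(start, i + 1));
          start = -1;
        }
      }
    }
    // Keep only the partially received object; separators between objects are dropped.
    if (start >= 0) {
      buffer = buffer.slice(start);
      start = 0;
    } else {
      buffer = "";
    }
    return complete;
  };
}
```

Each returned string could then be yielded from an async generator and passed to `assembleFromPayloads`, as in the snippet above. Because text outside objects is ignored, newline-delimited, concatenated, and array-wrapped objects are all handled the same way.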
|
|
610
|
-
|
|
611
|
-
Obtain a short-lived bearer token with Application Default Credentials, e.g. `gcloud auth application-default print-access-token`, and set `VERTEX_ACCESS_TOKEN` (or pass `accessToken` in your own wrapper). Full runnable example: [`examples/node-fetch/vertex-gemini.ts`](./examples/node-fetch/vertex-gemini.ts). Live smoke: `pnpm smoke:vertex` — see [live-smoke](./docs/live-smoke.md).
|
|
612
|
-
|
|
613
|
-
The Gemini **Interactions API** remains deferred; see [compatibility matrix](./docs/compatibility.md).
|
|
614
|
-
|
|
615
|
-
### Bedrock Usage
|
|
616
|
-
|
|
617
|
-
`bedrockAdapter()` parses **decoded** AWS Bedrock **ConverseStream** JSON events — one ConverseStream envelope object per `parseChunk` call. Create one adapter instance per request/stream.
|
|
618
|
-
|
|
619
|
-
Bedrock streaming responses are often `application/vnd.amazon.eventstream` (binary). **Decode EventStream bytes in your app, AWS SDK, or the example helper** before assembly — this library does not sign requests or parse binary framing.
|
|
620
|
-
|
|
621
|
-
```
|
|
622
|
-
Bedrock Runtime → EventStream bytes → [SDK or decode helper] → JSON strings
|
|
623
|
-
→ bedrockAdapter().parseChunk / assembleFromPayloads / assembleStream → StreamEvent[]
|
|
624
|
-
```
|
|
625
|
-
|
|
626
|
-
**Recommended path:** use `@aws-sdk/client-bedrock-runtime` `ConverseStreamCommand`, iterate the async stream, `JSON.stringify` each event object, and feed lines to `assembleFromPayloads`. See [`examples/bedrock/README.md`](./examples/bedrock/README.md) and [`examples/node-fetch/bedrock.ts`](./examples/node-fetch/bedrock.ts).
|
|
627
|
-
|
|
628
|
-
```ts
|
|
629
|
-
import { assembleFromPayloads, bedrockAdapter } from "llm-stream-assemble";
|
|
630
|
-
|
|
631
|
-
async function* decodedConverseEvents(sdkStream: AsyncIterable<Record<string, unknown>>) {
|
|
632
|
-
for await (const event of sdkStream) {
|
|
633
|
-
yield JSON.stringify(event);
|
|
634
|
-
}
|
|
635
|
-
}
|
|
636
|
-
|
|
637
|
-
for await (const event of assembleFromPayloads(
|
|
638
|
-
decodedConverseEvents(converseStream),
|
|
639
|
-
bedrockAdapter({ modelFamily: "auto" }),
|
|
640
|
-
)) {
|
|
641
|
-
if (event.type === "text.delta") process.stdout.write(event.text);
|
|
642
|
-
if (event.type === "tool_call.done") console.log(event.name, event.args);
|
|
643
|
-
}
|
|
644
|
-
```
|
|
645
|
-
|
|
646
|
-
**`modelFamily`** hints which ConverseStream dialect to prefer when envelopes overlap:
|
|
647
|
-
|
|
648
|
-
| Value | When to use |
|
|
649
|
-
| --------------- | ----------------------------------------------------------------- |
|
|
650
|
-
| `"auto"` | Default — structural detection from payload shape |
|
|
651
|
-
| `"anthropic"` | Claude on Bedrock — reasoning deltas, Anthropic-style tool blocks |
|
|
652
|
-
| `"nova"` | Amazon Nova models |
|
|
653
|
-
| `"openai-like"` | Llama and other OpenAI-shaped delta fields |
|
|
654
|
-
|
|
655
|
-
Use `bedrockAdapter({ jsonMode: true })` when structured JSON text blocks should map to `json.*` instead of `text.*`. Guardrail interventions map to `finish` with `content_filter`; trace details remain in `metadata.raw`.
|
|
656
|
-
|
|
657
|
-
**Environment variables** for live smoke and examples: `AWS_REGION`, `BEDROCK_MODEL_ID`, plus standard AWS credential chain (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_PROFILE`, SSO). IAM and SigV4 signing stay outside this library.
|
|
658
|
-
|
|
659
|
-
Subpath import: `import { bedrockAdapter } from "llm-stream-assemble/adapters/bedrock"`.
|
|
660
|
-
|
|
661
|
-
Worker proxy recipe: [`examples/integrations/bedrock-worker-proxy.ts`](./examples/integrations/bedrock-worker-proxy.ts). EventStream decode helper (examples only): [`examples/bedrock/decode-event-stream.ts`](./examples/bedrock/decode-event-stream.ts).
|
|
662
|
-
|
|
663
|
-
### Cohere Usage
|
|
664
|
-
|
|
665
|
-
`cohereAdapter()` parses Cohere Chat **v2** SSE events from `https://api.cohere.com/v2/chat` and non-streaming v2 response bodies. Create one adapter instance per request/stream. Cohere is **not** OpenAI-compatible — use `cohereAdapter()`, not `openaiCompatibleAdapter()`.
|
|
666
|
-
|
|
667
|
-
Core `parseSSE()` frames the HTTP body; `assembleStream` yields one JSON payload string per `data:` line to `cohereAdapter().parseChunk`.
|
|
668
|
-
|
|
669
|
-
```ts
|
|
670
|
-
import { assembleStream, cohereAdapter } from "llm-stream-assemble";
|
|
671
|
-
|
|
672
|
-
const response = await fetch("https://api.cohere.com/v2/chat", {
|
|
673
|
-
method: "POST",
|
|
674
|
-
headers: {
|
|
675
|
-
Authorization: `Bearer ${process.env.COHERE_API_KEY}`,
|
|
676
|
-
"Content-Type": "application/json",
|
|
677
|
-
},
|
|
678
|
-
body: JSON.stringify({
|
|
679
|
-
model: "command-r-plus-08-2024",
|
|
680
|
-
messages: [{ role: "user", content: "Hello" }],
|
|
681
|
-
stream: true,
|
|
682
|
-
}),
|
|
683
|
-
});
|
|
684
|
-
|
|
685
|
-
for await (const event of assembleStream(response.body!, cohereAdapter())) {
|
|
686
|
-
if (event.type === "text.delta") process.stdout.write(event.text);
|
|
687
|
-
if (event.type === "reasoning.delta") process.stdout.write(event.text);
|
|
688
|
-
if (event.type === "tool_call.done") console.log(event.name, event.args);
|
|
689
|
-
}
|
|
690
|
-
```
|
|
691
|
-
|
|
692
|
-
Use `cohereAdapter({ jsonMode: true })` when structured JSON output should map to `json.*` instead of `text.*`. **`tool-plan-delta`** events map to `reasoning.*` with `variant: "detail"`. **`citation-start`** maps to typed **`citation`** events (span, sources, index). Set `emitLegacyCitationMetadata: true` on any citation-capable adapter to dual-emit legacy `metadata.raw` blobs during migration. Legacy Cohere v1 endpoints are out of scope.
|
|
693
|
-
|
|
694
|
-
Subpath import: `import { cohereAdapter } from "llm-stream-assemble/adapters/cohere"`.
|
|
695
|
-
|
|
696
|
-
Live smoke: `pnpm smoke:cohere` — see [`docs/live-smoke.md`](./docs/live-smoke.md) for `COHERE_API_KEY`, `COHERE_MODEL`, and `COHERE_SMOKE_TOOLS`.
|
|
171
|
+
- Compatibility details: [docs/compatibility.md](./docs/compatibility.md)
|
|
172
|
+
- Framework recipes: [docs/integration-cookbook.md](./docs/integration-cookbook.md)
|
|
173
|
+
- Runnable examples: [examples/README.md](./examples/README.md)
|
|
697
174
|
|
|
698
175
|
---
|
|
699
176
|
|
|
700
|
-
##
|
|
701
|
-
|
|
702
|
-

|
|
703
|
-
|
|
704
|
-
### Collecting a Stream
|
|
705
|
-
|
|
706
|
-
`collectStream()` materializes a full event stream into text, reasoning, refusals, JSON, tool calls, citations, grounding, logprobs, latest usage, and finish reason. It buffers full output in memory and aggregates multi-choice text in event order; it is not a per-choice collector and does not currently collect metadata.
|
|
707
|
-
|
|
708
|
-
```ts
|
|
709
|
-
import { collectStream } from "llm-stream-assemble";
|
|
710
|
-
|
|
711
|
-
const result = await collectStream(events);
|
|
712
|
-
console.log(result.text, result.toolCalls, result.finishReason);
|
|
713
|
-
```
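
To make "aggregates in event order" concrete, here is a deliberately reduced sketch of the same idea. It is not the library's `collectStream()` implementation: `SketchEvent` and `collectSketch` are hypothetical names, and only the `text.delta` and `tool_call.done` fields that appear in the README snippets are modeled.

```ts
// Hypothetical, simplified event shapes; the real StreamEvent union is larger.
type SketchEvent =
  | { type: "text.delta"; text: string }
  | { type: "tool_call.done"; name: string; args: unknown }
  | { type: "other" };

interface SketchResult {
  text: string;
  toolCalls: { name: string; args: unknown }[];
}

function collectSketch(events: Iterable<SketchEvent>): SketchResult {
  let text = "";
  const toolCalls: SketchResult["toolCalls"] = [];
  for (const event of events) {
    if (event.type === "text.delta") {
      // Text is concatenated in arrival order, regardless of which choice produced it.
      text += event.text;
    } else if (event.type === "tool_call.done") {
      toolCalls.push({ name: event.name, args: event.args });
    }
    // Everything else is ignored in this sketch.
  }
  return { text, toolCalls };
}
```

Like the real collector, this buffers the full output in memory, so it suits short responses and tests better than long-running streams.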
|
|
714
|
-
|
|
715
|
-
### Tapping Events
|
|
716
|
-
|
|
717
|
-
`tapEvents()` lets you observe events for logging or metrics without changing the stream.
|
|
718
|
-
|
|
719
|
-
```ts
|
|
720
|
-
import { tapEvents } from "llm-stream-assemble";
|
|
721
|
-
|
|
722
|
-
for await (const event of tapEvents(events, (event) => console.debug(event.type))) {
|
|
723
|
-
// consume normally
|
|
724
|
-
}
|
|
725
|
-
```
|
|
726
|
-
|
|
727
|
-
### Forwarding Unified SSE
|
|
728
|
-
|
|
729
|
-
`toSSE()` serializes unified `StreamEvent` objects as `data: <json>` SSE messages. It does not currently emit named SSE `event:` fields, and it emits unified event JSON rather than raw provider SSE.
|
|
730
|
-
|
|
731
|
-
```ts
|
|
732
|
-
import { toSSE } from "llm-stream-assemble";
|
|
733
|
-
|
|
734
|
-
return new Response(toSSE(events, { sanitizeErrors: true }), {
|
|
735
|
-
headers: { "Content-Type": "text/event-stream" },
|
|
736
|
-
});
|
|
737
|
-
```
|
|
738
|
-
|
|
739
|
-
Use `sanitizeErrors: true` when forwarding events to browsers so raw provider internals are not exposed.
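
On the receiving side, the `data: <json>` framing can be decoded with a few lines of code. The sketch below assumes the whole SSE text is already available as a string (a streaming client would buffer until each blank line); `parseUnifiedSse` and `UnifiedEvent` are hypothetical names, and the event payloads used with it are illustrative rather than a specification of the unified event schema.

```ts
// Hypothetical client-side helper, not part of llm-stream-assemble.
type UnifiedEvent = { type: string; [key: string]: unknown };

function parseUnifiedSse(text: string): UnifiedEvent[] {
  const events: UnifiedEvent[] = [];
  // SSE messages are separated by a blank line.
  for (const block of text.split(/\r?\n\r?\n/)) {
    const data = block
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      // Strip the field name and the single optional space after the colon.
      .map((line) => line.slice(5).replace(/^ /, ""))
      // Multiple data lines in one message are joined with newlines.
      .join("\n");
    if (data) events.push(JSON.parse(data) as UnifiedEvent);
  }
  return events;
}
```

Since no named `event:` fields are emitted, the client dispatches on the `type` property of each parsed object instead of on the SSE event name.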
|
|
740
|
-
|
|
741
|
-
### Replaying Fixtures
|
|
742
|
-
|
|
743
|
-
`assembleFromFile()` is a Node/dev replay helper for local `.sse` and `.json` fixtures. It uses `node:fs/promises`, so avoid it in browser bundles; a dedicated browser/edge entry point can be added later if needed.
|
|
744
|
-
|
|
745
|
-
```ts
|
|
746
|
-
import { assembleFromFile, openaiChatAdapter } from "llm-stream-assemble";
|
|
747
|
-
|
|
748
|
-
for await (const event of assembleFromFile(
|
|
749
|
-
"test/fixtures/openai-chat/text-basic.sse",
|
|
750
|
-
openaiChatAdapter(),
|
|
751
|
-
)) {
|
|
752
|
-
console.log(event);
|
|
753
|
-
}
|
|
754
|
-
```
|
|
177
|
+
## Examples
|
|
755
178
|
|
|
756
|
-
|
|
179
|
+
Runnable samples: [examples/README.md](./examples/README.md) — `examples/node-fetch/openai-chat.ts`, `examples/node-fetch/openai-compatible.ts`, `examples/node-fetch/azure-openai.ts`, `examples/node-fetch/perplexity.ts`, `examples/node-fetch/xai.ts`, `examples/node-fetch/gemini.ts`, `examples/node-fetch/vertex-gemini.ts`, `examples/node-fetch/bedrock.ts`, `examples/node-fetch/cohere.ts`, `examples/workers-ai/rest-chat-completions.ts`; proxy safety via `sanitizeErrors`.
|
|
757
180
|
|
|
758
|
-
|
|
759
|
-
|
|
760
|
-
|
|
761
|
-
|
|
762
|
-
| [`examples/node-fetch/openai-chat.ts`](./examples/node-fetch/openai-chat.ts) | OpenAI Chat Completions streaming |
|
|
763
|
-
| [`examples/node-fetch/openai-compatible.ts`](./examples/node-fetch/openai-compatible.ts) | OpenAI-compatible presets |
|
|
764
|
-
| [`examples/node-fetch/azure-openai.ts`](./examples/node-fetch/azure-openai.ts) | Azure OpenAI deployment URL + `api-key` |
|
|
765
|
-
| [`examples/workers-ai/rest-chat-completions.ts`](./examples/workers-ai/rest-chat-completions.ts) | Cloudflare Workers AI REST + `cloudflare` preset |
|
|
766
|
-
| [`examples/node-fetch/perplexity.ts`](./examples/node-fetch/perplexity.ts) | Perplexity streaming |
|
|
767
|
-
| [`examples/node-fetch/xai.ts`](./examples/node-fetch/xai.ts) | xAI Grok streaming |
|
|
768
|
-
| [`examples/node-fetch/anthropic.ts`](./examples/node-fetch/anthropic.ts) | Anthropic Messages |
|
|
769
|
-
| [`examples/node-fetch/gemini.ts`](./examples/node-fetch/gemini.ts) | Google AI Gemini SSE |
|
|
770
|
-
| [`examples/node-fetch/vertex-gemini.ts`](./examples/node-fetch/vertex-gemini.ts) | Vertex AI Gemini JSONL stream |
|
|
771
|
-
| [`examples/node-fetch/bedrock.ts`](./examples/node-fetch/bedrock.ts) | AWS Bedrock ConverseStream (decoded JSON) |
|
|
772
|
-
| [`examples/node-fetch/replay-fixture.ts`](./examples/node-fetch/replay-fixture.ts) | Local fixture replay |
|
|
773
|
-
| [`examples/proxy-safety/`](./examples/proxy-safety/) | Proxy + browser client patterns |
|
|
774
|
-
|
|
775
|
-
Proxy safety:
|
|
776
|
-
|
|
777
|
-
- Use `toSSE(events, { sanitizeErrors: true })` for browser-facing streams.
|
|
778
|
-
- Use `tapEvents` for server-side observation and logging.
|
|
779
|
-
- Never forward raw provider errors or upstream non-OK response bodies to browsers.
|
|
780
|
-
- CORS headers are application-specific and intentionally omitted from the Web-standard example.
|
|
181
|
+
- Full examples index: [examples/README.md](./examples/README.md)
|
|
182
|
+
- Node fetch examples: [examples/node-fetch/](./examples/node-fetch/)
|
|
183
|
+
- Integration recipes: [examples/integrations/](./examples/integrations/)
|
|
184
|
+
- Proxy safety patterns: [examples/proxy-safety/](./examples/proxy-safety/)
|
|
781
185
|
|
|
782
186
|
---
|
|
783
187
|
|
|
@@ -786,7 +190,7 @@ Proxy safety:
|
|
|
786
190
|
- No HTTP client, auth, retries, or provider SDK wrapper.
|
|
787
191
|
- No agent loop, tool execution, memory, or persistence.
|
|
788
192
|
- No UI framework, React hooks, or browser components.
|
|
789
|
-
- No
|
|
193
|
+
- No markdown/XML tag parsing inside model text.
|
|
790
194
|
|
|
791
195
|
---
|
|
792
196
|
|
|
@@ -803,7 +207,6 @@ pnpm verify
|
|
|
803
207
|
| `pnpm verify:deps` | fail if runtime dependencies are added |
|
|
804
208
|
| `pnpm release:prep` | pre-tag checks (version, CHANGELOG, dist, npm pack) |
|
|
805
209
|
| `pnpm diagrams:build` | regenerate README SVGs from Mermaid sources |
|
|
806
|
-
| `pnpm bench:smoke` | local LSA-C52 timing script (requires build first) |
|
|
807
210
|
| `pnpm test` | Vitest smoke tests |
|
|
808
211
|
| `pnpm build` | tsup → ESM + CJS + declarations |
|
|
809
212
|
|