llm-stream-assemble 1.4.1 → 1.5.5

package/README.md CHANGED
@@ -1,19 +1,19 @@
  # llm-stream-assemble
 
- ![core](https://img.shields.io/badge/core-1.4.1-blue)
+ ![core](https://img.shields.io/badge/core-1.5.5-blue)
  ![node](https://img.shields.io/badge/node-%3E%3D18-339933)
  ![runtime deps](https://img.shields.io/badge/runtime_deps-0-brightgreen)
- ![tests](https://img.shields.io/badge/tests-1183_passing-brightgreen)
+ ![tests](https://img.shields.io/badge/tests-1477_passing-brightgreen)
  [![ci](https://github.com/01laky/llm-stream-assemble/actions/workflows/ci.yml/badge.svg)](https://github.com/01laky/llm-stream-assemble/actions/workflows/ci.yml)
- ![status](https://img.shields.io/badge/status-stable_1.4.1-brightgreen)
+ ![status](https://img.shields.io/badge/status-stable_1.5.5-brightgreen)
 
  **One typed event model for every LLM stream** — text, tool calls, reasoning, JSON, usage, refusals, errors, and non-streaming responses.
 
- > A zero-dependency TypeScript layer between raw LLM provider bytes and your app: six built-in adapters, thirteen host presets, and a single StreamEvent model for text, tools, reasoning, JSON, and lifecycle — from Ollama to Azure to Bedrock to Cloudflare Workers AI.
+ > A zero-dependency TypeScript layer between raw LLM provider bytes and your app: seven built-in adapters, thirteen host presets, and a single StreamEvent model for text, tools, reasoning, JSON, and lifecycle — from Ollama to Azure to Bedrock to Cohere to Cloudflare Workers AI.
 
  Turn provider SSE fragments into typed events — **not another `+=` loop**.
 
- **Status:** Stable `1.4.1`. Six built-in adapters, thirteen OpenAI-compatible host presets (including **Azure OpenAI** and **Cloudflare Workers AI**), transforms, replay helpers, and examples are production-ready. Pin semver ranges as usual and review [CHANGELOG.md](./CHANGELOG.md) before major upgrades.
+ **Status:** Stable `1.5.5`. Seven built-in adapters (Gemini covers **Google AI** and **Vertex AI** via `apiSurface`), thirteen OpenAI-compatible host presets (including **Azure OpenAI** and **Cloudflare Workers AI**), transforms, replay helpers, and examples are production-ready. Pin semver ranges as usual and review [CHANGELOG.md](./CHANGELOG.md) before major upgrades.
 
  ---
 
@@ -144,8 +144,9 @@ Diagram sources: [`docs/img/`](./docs/img/) (Mermaid `.mmd` + committed SVG). Re
  | `openaiCompatibleAdapter({ provider })` | Groq, DeepSeek, Mistral, Ollama, LM Studio, Together, Fireworks, OpenRouter, Perplexity, xAI, **Azure OpenAI**, **Cloudflare Workers AI**, generic | `llm-stream-assemble` |
  | `anthropicAdapter()` | Anthropic Messages | `llm-stream-assemble` |
  | `openaiResponsesAdapter()` | OpenAI Responses API | `llm-stream-assemble` |
- | `geminiAdapter()` | Google AI Gemini | `llm-stream-assemble` or `/adapters/gemini` |
+ | `geminiAdapter()` | Google AI Gemini + Vertex AI (`apiSurface`) | `llm-stream-assemble` or `/adapters/gemini` |
  | `bedrockAdapter()` | AWS Bedrock Converse / ConverseStream | `llm-stream-assemble` or `/adapters/bedrock` |
+ | `cohereAdapter()` | Cohere Chat v2 (`api.cohere.com/v2/chat`) | `llm-stream-assemble` or `/adapters/cohere` |
 
  Full feature flags and quirks: [compatibility matrix](./docs/compatibility.md).
 
@@ -202,6 +203,7 @@ Pick an adapter in ~30 seconds:
  - **Anthropic Messages** → `anthropicAdapter()`
  - **Google Gemini** → `geminiAdapter()`
  - **AWS Bedrock ConverseStream** → `bedrockAdapter()` (decoded JSON per event — see [Bedrock Usage](#bedrock-usage))
+ - **Cohere Chat v2 SSE** → `cohereAdapter()` (not OpenAI-compatible — see [Cohere Usage](#cohere-usage))
  - **Groq, Ollama, Azure, Cloudflare, OpenRouter, …** → `openaiCompatibleAdapter({ provider })`
  - **Non-streaming JSON body** → `assembleResponse(body, adapter)`
  - **React chat UI / full agent framework** → not this package — see [comparison](./docs/comparison.md)
@@ -277,6 +279,10 @@ for await (const event of assembleStream(response.body!, adapter)) {
 
  → [`examples/node-fetch/bedrock.ts`](./examples/node-fetch/bedrock.ts) · Usage: [Bedrock](#bedrock-usage) · Decode helper: [`examples/bedrock/README.md`](./examples/bedrock/README.md)
 
+ ### Cohere Chat v2
+
+ → [`examples/node-fetch/cohere.ts`](./examples/node-fetch/cohere.ts) · Usage: [Cohere](#cohere-usage)
+
  ### Streaming JSON (structured output)
 
  ```ts
@@ -317,7 +323,7 @@ Wire unified events into **Hono**, **Express**, **Cloudflare Workers**, **LiteLL
 
  ### Core Usage
 
- The core pipeline works with any adapter that emits `RawChunk[]`, including the built-in OpenAI Chat, OpenAI-compatible, Anthropic Messages, OpenAI Responses, Google Gemini, and AWS Bedrock adapters:
+ The core pipeline works with any adapter that emits `RawChunk[]`, including the built-in OpenAI Chat, OpenAI-compatible, Anthropic Messages, OpenAI Responses, Google Gemini, AWS Bedrock, and Cohere adapters:
 
  ```ts
  import { assembleFromPayloads, type StreamAdapter } from "llm-stream-assemble";
@@ -555,7 +561,43 @@ Use `geminiAdapter({ jsonMode: true })` when structured JSON output should map t
 
  Subpath import: `import { geminiAdapter } from "llm-stream-assemble/adapters/gemini"`.
 
- Vertex AI and the Interactions API are out of scope for this adapter; see [compatibility matrix](./docs/compatibility.md).
+ #### Vertex AI Gemini
+
+ Vertex uses the same `geminiAdapter()` with **`apiSurface: "vertex"`**. The adapter strips Vertex / gateway envelopes (`response`, `result`, `predictions[0]`) via **`normalizeVertexChunk()`** before mapping `candidates` and tools. Vertex HTTP streams are often **JSONL or concatenated JSON objects**, not Google AI `data:` SSE — split complete JSON strings in your app, then pass each line to `assembleFromPayloads` (see [`examples/vertex/read-chunk-stream.ts`](./examples/vertex/read-chunk-stream.ts)).
+
+ ```ts
+ import { assembleFromPayloads, geminiAdapter } from "llm-stream-assemble";
+ import { buildVertexStreamUrl } from "./examples/vertex/build-vertex-url";
+ import { readVertexJsonlStrings } from "./examples/vertex/read-chunk-stream";
+
+ const projectId = process.env.GOOGLE_CLOUD_PROJECT!;
+ const location = process.env.VERTEX_LOCATION ?? "us-central1";
+ const model = process.env.VERTEX_MODEL ?? "gemini-2.5-flash";
+ const accessToken = process.env.VERTEX_ACCESS_TOKEN!; // ADC — not GOOGLE_API_KEY
+
+ const response = await fetch(buildVertexStreamUrl({ projectId, location, model }), {
+   method: "POST",
+   headers: {
+     Authorization: `Bearer ${accessToken}`,
+     "Content-Type": "application/json",
+   },
+   body: JSON.stringify({
+     contents: [{ role: "user", parts: [{ text: "Hello" }] }],
+   }),
+ });
+
+ async function* lines() {
+   for await (const line of readVertexJsonlStrings(response.body!)) yield line;
+ }
+
+ for await (const event of assembleFromPayloads(lines(), geminiAdapter({ apiSurface: "vertex" }))) {
+   if (event.type === "text.delta") process.stdout.write(event.text);
+ }
+ ```
+
+ Obtain a short-lived bearer token with Application Default Credentials, e.g. `gcloud auth application-default print-access-token`, and set `VERTEX_ACCESS_TOKEN` (or pass `accessToken` in your own wrapper). Full runnable example: [`examples/node-fetch/vertex-gemini.ts`](./examples/node-fetch/vertex-gemini.ts). Live smoke: `pnpm smoke:vertex` — see [live-smoke](./docs/live-smoke.md).
+
+ The Gemini **Interactions API** remains deferred; see [compatibility matrix](./docs/compatibility.md).
 
  ### Bedrock Usage
 
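The Vertex AI section added in the hunk above leaves splitting the HTTP body into complete JSON strings to the application. As a minimal sketch of that step, assuming nothing about how `readVertexJsonlStrings` is actually implemented, the hypothetical helper below (`splitJsonObjects`, not part of `llm-stream-assemble`) tracks brace depth and string state so that braces inside string values do not end an object early. It skips whatever sits between top-level objects (newlines, commas, array brackets) and returns the incomplete tail so the caller can prepend it to the next network chunk.

```typescript
// Hypothetical helper for illustration only; not an API of llm-stream-assemble.
interface SplitResult {
  complete: string[]; // fully closed top-level JSON objects, in order
  rest: string; // unconsumed tail (possibly a partial object)
}

function splitJsonObjects(buffer: string): SplitResult {
  const complete: string[] = [];
  let depth = 0; // current nesting level of { }
  let inString = false; // true while inside a JSON string literal
  let escaped = false; // true right after a backslash inside a string
  let start = -1; // index where the current top-level object began
  let consumed = 0; // index just past the last completed object

  for (let i = 0; i < buffer.length; i++) {
    const ch = buffer[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue; // braces inside strings never change depth
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}" && depth > 0) {
      depth--;
      if (depth === 0) {
        complete.push(buffer.slice(start, i + 1));
        consumed = i + 1;
        start = -1;
      }
    }
  }
  return { complete, rest: buffer.slice(consumed) };
}
```

Each entry in `complete` could then be yielded to `assembleFromPayloads` as in the README example, with `rest` carried over and prepended to the next decoded chunk.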
@@ -605,6 +647,41 @@ Subpath import: `import { bedrockAdapter } from "llm-stream-assemble/adapters/be
 
  Worker proxy recipe: [`examples/integrations/bedrock-worker-proxy.ts`](./examples/integrations/bedrock-worker-proxy.ts). EventStream decode helper (examples only): [`examples/bedrock/decode-event-stream.ts`](./examples/bedrock/decode-event-stream.ts).
 
+ ### Cohere Usage
+
+ `cohereAdapter()` parses Cohere Chat **v2** SSE events from `https://api.cohere.com/v2/chat` and non-streaming v2 response bodies. Create one adapter instance per request/stream. Cohere is **not** OpenAI-compatible — use `cohereAdapter()`, not `openaiCompatibleAdapter()`.
+
+ Core `parseSSE()` frames the HTTP body; `assembleStream` yields one JSON payload string per `data:` line to `cohereAdapter().parseChunk`.
+
+ ```ts
+ import { assembleStream, cohereAdapter } from "llm-stream-assemble";
+
+ const response = await fetch("https://api.cohere.com/v2/chat", {
+   method: "POST",
+   headers: {
+     Authorization: `Bearer ${process.env.COHERE_API_KEY}`,
+     "Content-Type": "application/json",
+   },
+   body: JSON.stringify({
+     model: "command-r-plus-08-2024",
+     messages: [{ role: "user", content: "Hello" }],
+     stream: true,
+   }),
+ });
+
+ for await (const event of assembleStream(response.body!, cohereAdapter())) {
+   if (event.type === "text.delta") process.stdout.write(event.text);
+   if (event.type === "reasoning.delta") process.stdout.write(event.text);
+   if (event.type === "tool_call.done") console.log(event.name, event.args);
+ }
+ ```
+
+ Use `cohereAdapter({ jsonMode: true })` when structured JSON output should map to `json.*` instead of `text.*`. **`tool-plan-delta`** events map to `reasoning.*` with `variant: "detail"`. **`citation-start`** payloads are preserved in `metadata.raw` — there are no dedicated `citation.*` unified events in 1.x. Legacy Cohere v1 endpoints are out of scope.
+
+ Subpath import: `import { cohereAdapter } from "llm-stream-assemble/adapters/cohere"`.
+
+ Live smoke: `pnpm smoke:cohere` — see [`docs/live-smoke.md`](./docs/live-smoke.md) for `COHERE_API_KEY`, `COHERE_MODEL`, and `COHERE_SMOKE_TOOLS`.
+
  ---
 
  ## Transforms & replay
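The Cohere section in the hunk above says the core frames the SSE body and hands the adapter one JSON payload string per `data:` line. The sketch below illustrates that framing idea on an already-decoded string. `extractDataPayloads` is a hypothetical name, not the package's `parseSSE()`, and it deliberately ignores things a real parser has to handle, such as frames split across network chunks and multi-line `data:` fields.

```typescript
// Hypothetical illustration of SSE framing; not the library's parseSSE().
function extractDataPayloads(sseText: string): string[] {
  const payloads: string[] = [];
  // SSE lines may end in CRLF, LF, or CR.
  for (const line of sseText.split(/\r\n|\n|\r/)) {
    // Skip event:, id:, retry:, comment lines (":"), and blank separators.
    if (!line.startsWith("data:")) continue;
    const value = line.slice("data:".length);
    // The SSE format allows one optional space after the colon.
    payloads.push(value.startsWith(" ") ? value.slice(1) : value);
  }
  return payloads;
}

// Example input shaped like two Cohere-style events plus a comment line.
const sample =
  'event: content-delta\ndata: {"type":"content-delta"}\n\n' +
  ': keep-alive\n' +
  'event: message-end\ndata:{"type":"message-end"}\n\n';

for (const payload of extractDataPayloads(sample)) {
  // Each payload is a standalone JSON string an adapter could parse.
  console.log(JSON.parse(payload).type);
}
```

In the package itself this step happens inside `assembleStream`, so application code normally passes `response.body` straight through as the README example does.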
@@ -676,7 +753,8 @@ for await (const event of assembleFromFile(
  | [`examples/node-fetch/perplexity.ts`](./examples/node-fetch/perplexity.ts) | Perplexity streaming |
  | [`examples/node-fetch/xai.ts`](./examples/node-fetch/xai.ts) | xAI Grok streaming |
  | [`examples/node-fetch/anthropic.ts`](./examples/node-fetch/anthropic.ts) | Anthropic Messages |
- | [`examples/node-fetch/gemini.ts`](./examples/node-fetch/gemini.ts) | Google Gemini SSE |
+ | [`examples/node-fetch/gemini.ts`](./examples/node-fetch/gemini.ts) | Google AI Gemini SSE |
+ | [`examples/node-fetch/vertex-gemini.ts`](./examples/node-fetch/vertex-gemini.ts) | Vertex AI Gemini JSONL stream |
  | [`examples/node-fetch/bedrock.ts`](./examples/node-fetch/bedrock.ts) | AWS Bedrock ConverseStream (decoded JSON) |
  | [`examples/node-fetch/replay-fixture.ts`](./examples/node-fetch/replay-fixture.ts) | Local fixture replay |
  | [`examples/proxy-safety/`](./examples/proxy-safety/) | Proxy + browser client patterns |