npm - workers-ai-provider - Versions diffs - 3.1.14 → 3.2.0 - Mend

workers-ai-provider 3.1.14 → 3.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35) hide show

package/README.md +183 -31
package/dist/anthropic.d.mts +14 -0
package/dist/anthropic.mjs +21 -0
package/dist/anthropic.mjs.map +1 -0
package/dist/gateway-delegate-BfaUTwDZ.d.mts +385 -0
package/dist/gateway-provider-1USFWm7c.mjs +583 -0
package/dist/gateway-provider-1USFWm7c.mjs.map +1 -0
package/dist/gateway-provider.d.mts +80 -0
package/dist/gateway-provider.mjs +2 -0
package/dist/google.d.mts +14 -0
package/dist/google.mjs +21 -0
package/dist/google.mjs.map +1 -0
package/dist/index.d.mts +64 -7
package/dist/index.mjs +703 -73
package/dist/index.mjs.map +1 -1
package/dist/openai.d.mts +20 -0
package/dist/openai.mjs +27 -0
package/dist/openai.mjs.map +1 -0
package/package.json +47 -6
package/src/anthropic.ts +17 -0
package/src/client-fallback.ts +70 -0
package/src/convert-to-workersai-chat-messages.ts +30 -5
package/src/errors.ts +216 -0
package/src/gateway-delegate.ts +696 -0
package/src/gateway-provider.ts +167 -0
package/src/gateway-providers.ts +457 -0
package/src/google.ts +19 -0
package/src/index.ts +180 -9
package/src/openai.ts +25 -0
package/src/resumable-stream.ts +223 -0
package/src/streaming.ts +102 -30
package/src/utils.ts +187 -4
package/src/workersai-chat-language-model.ts +87 -26
package/src/workersai-chat-settings.ts +1 -1
package/src/workersai-models.ts +11 -3

package/README.md CHANGED Viewed

@@ -1,8 +1,12 @@
 # workers-ai-provider
-[Workers AI](https://developers.cloudflare.com/workers-ai/) provider for the [AI SDK](https://sdk.vercel.ai/). Run Cloudflare's models for chat, embeddings, image generation, transcription, text-to-speech, reranking, and [AI Search](https://developers.cloudflare.com/ai-search/) — all from a single provider.
+[Workers AI](https://developers.cloudflare.com/workers-ai/) provider for the [AI SDK](https://sdk.vercel.ai/). Run Cloudflare's models for chat, embeddings, image generation, transcription, text-to-speech, reranking, and [AI Search](https://developers.cloudflare.com/ai-search/) — all from a single provider. It can also route **third-party** models (OpenAI, Anthropic, Google, …) through [AI Gateway](https://developers.cloudflare.com/ai-gateway/) — see [Third-party models](#third-party-models-via-ai-gateway).
-## Quick Start
+## Quick start
+```bash
+npm install workers-ai-provider ai
+```
 ```jsonc
 // wrangler.jsonc
@@ -20,7 +24,7 @@ export default {
 		const workersai = createWorkersAI({ binding: env.AI });
 		const result = streamText({
-			model: workersai("@cf/moonshotai/kimi-k2.5"),
+			model: workersai("@cf/moonshotai/kimi-k2.7-code"),
 			messages: [{ role: "user", content: "Write a haiku about Cloudflare" }],
 		});
@@ -29,10 +33,6 @@ export default {
 };
 ```
-```bash
-npm install workers-ai-provider ai
-```
 ## Configuration
 ### Workers binding (recommended)
@@ -73,10 +73,10 @@ Some good defaults:
 | Task           | Model                                  | Notes                               |
 | -------------- | -------------------------------------- | ----------------------------------- |
-| Chat           | `@cf/moonshotai/kimi-k2.5`             | 256k ctx, tools, vision, reasoning  |
+| Chat           | `@cf/moonshotai/kimi-k2.7-code`        | 256k ctx, tools, vision, reasoning  |
 | Chat           | `@cf/zai-org/glm-4.7-flash`            | Fast, multilingual, 131k ctx        |
 | Chat           | `@cf/openai/gpt-oss-120b`              | OpenAI open-weights, high reasoning |
-| Reasoning      | `@cf/moonshotai/kimi-k2.5`             | Configurable `reasoning_effort`     |
+| Reasoning      | `@cf/moonshotai/kimi-k2.7-code`        | Configurable `reasoning_effort`     |
 | Reasoning      | `@cf/qwen/qwq-32b`                     | Emits `reasoning_content`           |
 | Embeddings     | `@cf/baai/bge-base-en-v1.5`            | 768-dim, English                    |
 | Embeddings     | `@cf/google/embeddinggemma-300m`       | 100+ languages, by Google           |
@@ -86,13 +86,13 @@ Some good defaults:
 | Text-to-Speech | `@cf/deepgram/aura-2-en`               | Context-aware, natural pacing       |
 | Reranking      | `@cf/baai/bge-reranker-base`           | Fast document reranking             |
-## Text Generation
+## Text generation
 ```ts
 import { generateText } from "ai";
 const { text } = await generateText({
-	model: workersai("@cf/moonshotai/kimi-k2.5"),
+	model: workersai("@cf/moonshotai/kimi-k2.7-code"),
 	prompt: "Explain Workers AI in one paragraph",
 });
 ```
@@ -103,7 +103,7 @@ Streaming:
 import { streamText } from "ai";
 const result = streamText({
-	model: workersai("@cf/moonshotai/kimi-k2.5"),
+	model: workersai("@cf/moonshotai/kimi-k2.7-code"),
 	messages: [{ role: "user", content: "Write a short story" }],
 });
@@ -112,7 +112,7 @@ for await (const chunk of result.textStream) {
 }
 ```
-## Reasoning Controls
+## Reasoning controls
 Reasoning-capable Workers AI models (GLM-4.7-flash, Kimi K2.5/K2.6, GPT-OSS, QwQ) accept `reasoning_effort` and `chat_template_kwargs` on their inputs. Either set them at model creation time as settings, or per-call via `providerOptions["workers-ai"]` (per-call wins):
@@ -141,7 +141,7 @@ await generateText({
 `reasoning_effort: null` is meaningful — it's the explicit "disable reasoning" signal for models that support it. Both fields land on the `inputs` object of `binding.run()` (and the JSON body of the REST request), matching the shape expected by Workers AI. See the [model catalog](https://developers.cloudflare.com/workers-ai/models/) for per-model reasoning capabilities.
-## Vision (Image Inputs)
+## Vision (image inputs)
 Send images to vision-capable models like Kimi K2.5:
@@ -149,7 +149,7 @@ Send images to vision-capable models like Kimi K2.5:
 import { generateText } from "ai";
 const { text } = await generateText({
-	model: workersai("@cf/moonshotai/kimi-k2.5"),
+	model: workersai("@cf/moonshotai/kimi-k2.7-code"),
 	messages: [
 		{
 			role: "user",
@@ -164,14 +164,14 @@ const { text } = await generateText({
 Images can be provided as `Uint8Array`, base64 strings, or data URLs. Multiple images per message are supported. Works with both the binding and REST API configurations.
-## Tool Calling
+## Tool calling
 ```ts
 import { generateText, stepCountIs } from "ai";
 import { z } from "zod";
 const { text } = await generateText({
-	model: workersai("@cf/moonshotai/kimi-k2.5"),
+	model: workersai("@cf/moonshotai/kimi-k2.7-code"),
 	prompt: "What's the weather in London?",
 	tools: {
 		getWeather: {
@@ -184,14 +184,14 @@ const { text } = await generateText({
 });
 ```
-## Structured Output
+## Structured output
 ```ts
 import { generateText, Output } from "ai";
 import { z } from "zod";
 const { output } = await generateText({
-	model: workersai("@cf/moonshotai/kimi-k2.5"),
+	model: workersai("@cf/moonshotai/kimi-k2.7-code"),
 	prompt: "Recipe for spaghetti bolognese",
 	output: Output.object({
 		schema: z.object({
@@ -214,7 +214,7 @@ const { embeddings } = await embedMany({
 });
 ```
-## Image Generation
+## Image generation
 ```ts
 import { generateImage } from "ai";
@@ -228,7 +228,7 @@ const { images } = await generateImage({
 // images[0].uint8Array contains the PNG bytes
 ```
-## Transcription (Speech-to-Text)
+## Transcription (speech-to-text)
 Transcribe audio using Whisper or Deepgram Nova-3 models.
@@ -265,7 +265,7 @@ const { text } = await transcribe({
 });
 ```
-## Text-to-Speech
+## Text-to-speech
 Generate spoken audio from text using Deepgram Aura-2.
@@ -329,21 +329,173 @@ Streaming works the same way — use `streamText` instead of `generateText`.
 > `createAutoRAG` still works but is deprecated. Use `createAISearch` instead.
-## API Reference
+## Third-party models via AI Gateway
+> **⚠️ Experimental.** Everything in this section (routing third-party models via `createWorkersAI({ providers })`, the provider plugins, the registry, the resume layer, and `createGatewayFetch`/`createGatewayProvider`) is a new and substantial addition — well beyond the package's original job of wrapping Workers AI. Treat the whole surface as experimental: APIs may change, and several behaviors depend on undocumented AI Gateway internals (the `cf-aig-run-id` resume buffer, per-provider run-path wire formats). It does **not** affect the stable Workers AI / AI Search APIs above. Bug reports and feedback are very welcome.
+Route **third-party** catalog models — OpenAI, Anthropic, Google, xAI/Grok, Groq, and the OpenAI-compatible long tail — through [AI Gateway](https://developers.cloudflare.com/ai-gateway/) using the same `env.AI` binding, with resumable streaming, BYOK, caching, and fallback.
+Install only the wire-format plugins you actually use. They're **optional** peer dependencies:
+```bash
+npm install @ai-sdk/openai      # openai, deepseek, xai/grok, groq, mistral, perplexity, cerebras, openrouter, fireworks, alibaba, minimax
+npm install @ai-sdk/anthropic   # anthropic
+npm install @ai-sdk/google      # google, google-vertex
+```
+### Setup
+Pass the plugins to `createWorkersAI` via `providers`. Then it's the same provider you already use: `@cf/...` ids build Workers AI models, and a `"<provider>/<model>"` catalog slug is routed through AI Gateway automatically. `createWorkersAI` is the single public entry point — there's no separate factory to import.
+```ts
+import { createWorkersAI } from "workers-ai-provider";
+import { openai } from "workers-ai-provider/openai";
+import { anthropic } from "workers-ai-provider/anthropic";
+import { streamText } from "ai";
+const workersai = createWorkersAI({
+	binding: env.AI,
+	providers: [openai, anthropic], // opt-in; enables third-party routing
+	// gateway is optional — catalog routing uses your account's "default"
+	// gateway unless you set one, e.g. gateway: { id: "my-gateway" }.
+});
+workersai("@cf/meta/llama-3.1-8b-instruct"); // Workers AI (unchanged)
+const result = streamText({
+	model: workersai("openai/gpt-5", { resume: true }), // routed through AI Gateway
+	prompt: "Hello",
+});
+// result.response.headers["cf-aig-run-id"] is set — resume from there.
+```
+The settings argument is **typed from the model id**: pass a `"<provider>/<model>"` catalog slug and the second argument autocompletes the per-call gateway options (`resume`, `fallback`, `cacheTtl`, `byok`, `metadata`, …, i.e. `DelegateCallOptions`); pass a `@cf/...` id and it autocompletes the usual `WorkersAIChatSettings`. `providers` is optional and **additive**: leave it unset and `createWorkersAI` behaves exactly as before; passing a catalog slug without it throws a helpful error pointing you here.
+`gateway` is optional for catalog routing — when unset, requests use your account's `"default"` AI Gateway. Set `gateway: { id: "…" }` (here or per call) to use a specific gateway.
+The examples below assume a `workersai` configured with `providers` as above.
+### Wire-format plugins and provider coverage
+One plugin per **wire format** serves every provider of that format. The `openai` plugin alone covers `openai/…`, `deepseek/…`, `xai/…` (alias `grok`), `groq/…`, `mistral/…`, `perplexity/…`, `cerebras/…`, `openrouter/…`, `fireworks/…`, plus the unified-catalog chat providers `alibaba/…` (Qwen) and `minimax/…`.
+The registry covers every provider in the [AI Gateway provider directory](https://developers.cloudflare.com/ai-gateway/usage/providers/) — OpenAI, Anthropic, Google AI Studio, Google Vertex AI, xAI/Grok, Groq, DeepSeek, Mistral, Perplexity, Cerebras, OpenRouter, Cohere, Baseten, Parallel, Azure OpenAI, Amazon Bedrock, HuggingFace, Replicate, Fal, Ideogram, Cartesia, Deepgram, ElevenLabs (plus Fireworks) — so `createGatewayFetch` can auto-detect them from the request URL.
+**Coverage maturity varies** (the whole feature is experimental). Only the unified-billing run-catalog providers — OpenAI, Anthropic, Google, xAI/Grok, Groq, Alibaba/Qwen, MiniMax — are exercised end-to-end against a live gateway. The remaining registry entries (BYOK gateway-path providers like DeepSeek/Mistral/Perplexity/Cerebras/OpenRouter/Fireworks, and bring-your-own-provider-only ones like Cohere/Baseten/Parallel/Azure OpenAI/Bedrock/HuggingFace/Replicate/Fal/Ideogram/Cartesia/Deepgram/ElevenLabs) are registry-level wiring (gateway id, host pattern, endpoint transform) that is **not yet live-verified** — the routing is in place, but a provider's exact request shape may need adjustment. Please file issues for any that misbehave.
+> **Run-path wire format is per-provider — not always OpenAI.** On the resumable run path (`env.AI.run`), Cloudflare's unified catalog **normalizes most providers to OpenAI chat-completions** (so `google/…` is parsed with the `openai` plugin on the run path, even though the gateway path uses the native `google` plugin), but **passes Anthropic through natively** — so `anthropic/…` uses the `anthropic` plugin on both paths. In practice: include `openai` for the openai-wire run-path providers (openai, google, xai/grok, groq), and `anthropic` to use `anthropic/…`. The native `google` plugin is only needed if you force google onto the **gateway path**. If a plugin a transport needs is missing, the delegate throws a `GatewayDelegateError` naming it.
+### Transports
+The transport is chosen automatically from the options you pass:
+| Transport         | Backed by                    | Resume (`cf-aig-run-id`) | Caching | Server fallback | Billing           |
+| ----------------- | ---------------------------- | ------------------------ | ------- | --------------- | ----------------- |
+| **run** (default) | `env.AI.run(...)`            | ✅                       | ❌      | ❌              | Unified billing   |
+| **gateway**       | `env.AI.gateway(id).run([])` | ❌                       | ✅      | ✅              | BYOK / stored key |
+Run-catalog providers (OpenAI, Anthropic, Google, xAI, Groq, plus the unified-catalog chat providers Alibaba/Qwen and MiniMax) default to the resumable **run path**. BYOK-only providers (deepseek, mistral, perplexity, …) always use the **gateway path**. Asking for an impossible combination (e.g. `resume: true` with `fallback.mode: "server"`) throws a `GatewayDelegateError`.
+> Alibaba and MiniMax are **run-path only** — they're on the unified catalog but not the native gateway directory, so there's no gateway path. Requesting `transport: "gateway"`, caching, or server-side fallback on them throws a clear `GatewayDelegateError` at build time (rather than failing upstream); use the default run path or `fallback.mode: "client"`.
+### BYOK (bring your own key)
+On the gateway path, set `byok: true` and supply the upstream key via `extraHeaders`:
+```ts
+streamText({
+	model: workersai("deepseek/deepseek-chat", {
+		byok: true,
+		extraHeaders: { authorization: `Bearer ${env.DEEPSEEK_API_KEY}` },
+	}),
+	prompt: "Hello",
+});
+```
+Without `byok`, provider auth headers are stripped so unified billing / the gateway's stored key applies.
+### Fallback
+```ts
+// Client-side: keeps resume per leg. A failed pre-stream dispatch falls through.
+workersai("openai/gpt-5", {
+	fallback: { mode: "client", models: ["anthropic/claude-sonnet-4-5"] },
+});
+// Server-side: same-vendor, on the gateway path.
+workersai("openai/gpt-5", { fallback: { mode: "server", models: ["openai/gpt-5-mini"] } });
+```
+If every client-side leg fails, a `WorkersAIFallbackError` carries the per-attempt tree.
+### Caching
+```ts
+workersai("openai/gpt-5", { cacheTtl: 3600 }); // gateway path; cacheTtl/skipCache force it
+```
+### Metadata & logging
+Attach custom metadata (for spend attribution, tenant breakdowns, etc.) and toggle gateway log collection per request. Both work on either transport — on the run path they go into the typed gateway options; on the gateway path they become `cf-aig-metadata` / `cf-aig-collect-log` headers. Call-level `metadata` merges over (and wins against) any `metadata` set via `gateway: { metadata }`.
+```ts
+workersai("openai/gpt-5", {
+	metadata: { teamId: "AI", userId: 12345 }, // breaks down spend in the dashboard
+	collectLog: false, // opt this request out of log collection
+});
+```
+### Resume after disconnect
+The run path wraps the response stream so a transient mid-stream drop reconnects through the gateway resume endpoint transparently. For cross-invocation recovery (e.g. a Durable Object re-attaching after eviction), persist `{ runId, eventOffset }` via `onDispatch` + `onProgress` and re-attach with `createResumableStream`:
+```ts
+workersai("openai/gpt-5", {
+	onDispatch: (info) => save({ runId: info.runId }),
+	onProgress: (eventOffset) => save({ eventOffset }), // throttle your own writes
+	onResumeExpired: "accept-partial", // or "error" (default) once the ~5.5 min buffer TTL elapses
+});
+```
+### Bring your own provider
+For provider-native or non-chat providers the slug delegate can't auto-wire (bedrock, replicate, audio/image), or for full control, route any `@ai-sdk/*` provider through the gateway:
+```ts
+import { createOpenAI } from "@ai-sdk/openai";
+import { createGatewayFetch } from "workers-ai-provider/gateway";
+const openai = createOpenAI({
+	apiKey: env.OPENAI_API_KEY, // forwarded when byok: true
+	fetch: createGatewayFetch({ binding: env.AI, gateway: "my-gateway", byok: true }),
+});
+const model = openai("gpt-5");
+```
+The provider id is detected from the request URL (or pass `provider` explicitly).
+### Errors
+`WorkersAIGatewayError` carries a coarse `code` (`auth`, `rate-limit`, `not-found`, `bad-request`, `provider-error`, `gateway-error`, `resume-expired`), a `recoverable` hint, the HTTP `status`, and the parsed CF/provider envelope. `WorkersAIFallbackError` carries the `attempts` tree.
+## API reference
 ### `createWorkersAI(options)`
-| Option      | Type             | Description                                                                  |
-| ----------- | ---------------- | ---------------------------------------------------------------------------- |
-| `binding`   | `Ai`             | Workers AI binding (`env.AI`). Use this OR credentials.                      |
-| `accountId` | `string`         | Cloudflare account ID. Required with `apiKey`.                               |
-| `apiKey`    | `string`         | Cloudflare API token. Required with `accountId`.                             |
-| `gateway`   | `GatewayOptions` | Optional [AI Gateway](https://developers.cloudflare.com/ai-gateway/) config. |
+| Option            | Type                            | Description                                                                                       |
+| ----------------- | ------------------------------- | ------------------------------------------------------------------------------------------------- |
+| `binding`         | `Ai`                            | Workers AI binding (`env.AI`). Use this OR credentials.                                           |
+| `accountId`       | `string`                        | Cloudflare account ID. Required with `apiKey`.                                                    |
+| `apiKey`          | `string`                        | Cloudflare API token. Required with `accountId`.                                                  |
+| `gateway`         | `GatewayOptions`                | Optional [AI Gateway](https://developers.cloudflare.com/ai-gateway/) config.                      |
+| `providers`       | `ProviderPlugin[]`              | _Experimental._ Wire-format plugins that enable routing `"<provider>/<model>"` slugs via gateway. |
+| `resume`          | `boolean`                       | _Experimental._ Default resume behavior for gateway-routed catalog models. Defaults to `true`.    |
+| `onResumeExpired` | `"error"` \| `"accept-partial"` | _Experimental._ Default policy when the gateway resume buffer expires. Defaults to `"error"`.     |
 Returns a provider with model factories. Each factory accepts an optional second argument for per-model settings:
 ```ts
-workersai("@cf/moonshotai/kimi-k2.5", {
+workersai("@cf/moonshotai/kimi-k2.7-code", {
 	sessionAffinity: "my-unique-session-id",
 });
 ```
@@ -356,7 +508,7 @@ workersai("@cf/moonshotai/kimi-k2.5", {
 Model factories:
 ```ts
-// Chat — for generateText / streamText
+// Chat — for generateText / streamText (also accepts third-party slugs when `providers` is set)
 workersai(modelId);
 workersai.chat(modelId);

package/dist/anthropic.d.mts ADDED Viewed

@@ -0,0 +1,14 @@
+import { o as ProviderPlugin } from "./gateway-delegate-BfaUTwDZ.mjs";
+//#region src/anthropic.d.ts
+/**
+ * Anthropic-wire provider plugin for the gateway delegate. Pass to
+ * `createGatewayDelegate({ providers: [anthropic] })` to handle
+ * `"anthropic/<model>"` slugs.
+ *
+ * Requires `@ai-sdk/anthropic` (an optional peer dependency — install it yourself).
+ */
+declare const anthropic: ProviderPlugin;
+//#endregion
+export { anthropic };
+//# sourceMappingURL=anthropic.d.mts.map

package/dist/anthropic.mjs ADDED Viewed

@@ -0,0 +1,21 @@
+import { createAnthropic } from "@ai-sdk/anthropic";
+//#region src/anthropic.ts
+/**
+* Anthropic-wire provider plugin for the gateway delegate. Pass to
+* `createGatewayDelegate({ providers: [anthropic] })` to handle
+* `"anthropic/<model>"` slugs.
+*
+* Requires `@ai-sdk/anthropic` (an optional peer dependency — install it yourself).
+*/
+const anthropic = {
+	wireFormat: "anthropic",
+	create: ({ modelId, fetch, baseURL }) => createAnthropic({
+		apiKey: "unused",
+		fetch,
+		...baseURL ? { baseURL } : {}
+	})(modelId)
+};
+//#endregion
+export { anthropic };
+//# sourceMappingURL=anthropic.mjs.map

package/dist/anthropic.mjs.map ADDED Viewed

@@ -0,0 +1 @@

+ {"version":3,"file":"anthropic.mjs","names":[],"sources":["../src/anthropic.ts"],"sourcesContent":["import { createAnthropic } from \"@ai-sdk/anthropic\";\nimport type { ProviderPlugin } from \"./gateway-delegate\";\n\n/**\n * Anthropic-wire provider plugin for the gateway delegate. Pass to\n * `createGatewayDelegate({ providers: [anthropic] })` to handle\n * `\"anthropic/<model>\"` slugs.\n *\n * Requires `@ai-sdk/anthropic` (an optional peer dependency — install it yourself).\n */\nexport const anthropic: ProviderPlugin = {\n\twireFormat: \"anthropic\",\n\tcreate: ({ modelId, fetch, baseURL }) =>\n\t\t// apiKey is a placeholder — the gateway handles auth (unified billing / BYOK)\n\t\t// and the delegate strips the x-api-key header on the gateway path.\n\t\tcreateAnthropic({ apiKey: \"unused\", fetch, ...(baseURL ? { baseURL } : {}) })(modelId),\n};\n"],"mappings":";;;;;;;;;AAUA,MAAa,YAA4B;CACxC,YAAY;CACZ,SAAS,EAAE,SAAS,OAAO,cAG1B,gBAAgB;EAAE,QAAQ;EAAU;EAAO,GAAI,UAAU,EAAE,QAAQ,IAAI,CAAC;CAAG,CAAC,CAAC,CAAC,OAAO;AACvF"}