npm - @mastra/mcp-docs-server - Versions diffs - 1.1.34 → 1.1.35-alpha.10 - Mend

@mastra/mcp-docs-server 1.1.34 → 1.1.35-alpha.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (31)

package/.docs/docs/agents/background-tasks.md +62 -2
package/.docs/docs/agents/processors.md +26 -2
package/.docs/docs/memory/observational-memory.md +2 -1
package/.docs/docs/memory/overview.md +2 -1
package/.docs/guides/deployment/inngest.md +45 -16
package/.docs/guides/guide/web-search.md +7 -7
package/.docs/models/gateways/azure-openai.md +94 -23
package/.docs/models/gateways/netlify.md +2 -1
package/.docs/models/gateways/openrouter.md +2 -1
package/.docs/models/index.md +1 -1
package/.docs/models/providers/deepinfra.md +2 -1
package/.docs/models/providers/digitalocean.md +2 -1
package/.docs/models/providers/kiro.md +110 -0
package/.docs/models/providers/llmgateway.md +1 -1
package/.docs/models/providers/opencode-go.md +2 -4
package/.docs/models/providers/opencode.md +1 -1
package/.docs/models/providers/qiniu-ai.md +2 -2
package/.docs/models/providers/xiaomi.md +2 -2
package/.docs/models/providers/zenmux.md +1 -1
package/.docs/models/providers.md +1 -0
package/.docs/reference/cli/mastra.md +464 -0
package/.docs/reference/client-js/agents.md +26 -1
package/.docs/reference/harness/harness-class.md +2 -0
package/.docs/reference/index.md +1 -0
package/.docs/reference/processors/processor-interface.md +74 -12
package/.docs/reference/processors/provider-history-compat.md +132 -0
package/.docs/reference/streaming/ChunkType.md +44 -0
package/.docs/reference/streaming/agents/stream.md +18 -2
package/.docs/reference/tools/mcp-client.md +47 -0
package/CHANGELOG.md +35 -0
package/package.json +4 -4

package/.docs/docs/agents/background-tasks.md CHANGED

@@ -127,10 +127,12 @@ When a tool call dispatches as a background task, two streams may surface lifecy
 | `background-task-completed` | The task finished successfully. The `payload.result` matches the eventual tool result. | Manager stream |
 | `background-task-failed`    | The task threw or timed out.                                                           | Manager stream |
 | `background-task-cancelled` | The task was cancelled before completing.                                              | Manager stream |
+| `background-task-suspended` | The tool called `suspend()` from inside its execute.                                   | Manager stream |
+| `background-task-resumed`   | A suspended task was resumed via `manager.resume(taskId, resumeData)`.                 | Manager stream |
-`agent.stream().fullStream` only emits the agent-loop chunks (`background-task-started`, `background-task-progress`) on its own. `agent.streamUntilIdle()` emits the same two chunks and additionally subscribes to the manager pubsub for the run's memory scope and pipes the five manager chunks (`background-task-running`, `background-task-output`, `background-task-completed`, `background-task-failed`, `background-task-cancelled`) into the same `fullStream`, so consumers of `streamUntilIdle().fullStream` see all seven types.
+`agent.stream().fullStream` only emits the agent-loop chunks (`background-task-started`, `background-task-progress`) on its own. `agent.streamUntilIdle()` emits the same two chunks and additionally subscribes to the manager pubsub for the run's memory scope and pipes the seven manager chunks (`background-task-running`, `background-task-output`, `background-task-completed`, `background-task-failed`, `background-task-cancelled`, `background-task-suspended`, `background-task-resumed`) into the same `fullStream`.
-`backgroundTaskManager.stream()` only emits the five manager chunks.
+`backgroundTaskManager.stream()` only emits the seven manager chunks.
 The full payload shapes are documented in the [background task chunks reference](https://mastra.ai/reference/streaming/ChunkType).
@@ -210,6 +212,64 @@ When this `researchAgent` is delegated to from a supervisor that has no backgrou
 Use this pattern when you want a subagent to behave consistently in the background regardless of which supervisor invokes it. Use the supervisor-side opt-in (above) when you want to tune background behavior centrally per supervisor.
+## Suspending and resuming
+A background task can pause itself mid-execution and wait for an external signal before continuing. This is useful for human approvals, webhooks, or any flow where the next step depends on data that arrives later.
+A tool calls `suspend(data)` from inside its `execute`, which:
+- Persists `status: 'suspended'` and the `data` payload on the task record.
+- Saves the workflow snapshot so the run survives process restarts.
+- Emits a `background-task-suspended` chunk on the manager stream.
+- Releases the concurrency slot so other tasks can run.
+Resume the task with `mastra.backgroundTaskManager.resume(taskId, resumeData)`. The `resumeData` arrives in the tool's `execute` options on the resumed run, and the task transitions back to `running`.
+```typescript
+import { createTool } from '@mastra/core/tools'
+import { z } from 'zod'
+export const reviewTool = createTool({
+  id: 'review',
+  description: 'Submit a draft for human review.',
+  inputSchema: z.object({ draft: z.string() }),
+  outputSchema: z.object({ approvedBy: z.string(), edits: z.string().optional() }),
+  background: { enabled: true },
+  execute: async ({ draft }, context) => {
+    const { suspend, resumeData } = context.agent
+    if (!resumeData) {
+      await suspend?.({ awaiting: 'approval', draft })
+      return { approvedBy: '', edits: undefined }
+    }
+    const { reviewer, edits } = resumeData as { reviewer: string; edits?: string }
+    return { approvedBy: reviewer, edits }
+  },
+})
+```
+The first invocation of `execute` sees `resumeData === undefined` and calls `suspend`. After the task is resumed, the runtime restarts the tool with `resumeData` populated; the `if` branch falls through and the tool returns its real result.
+To resume the task once an approval arrives:
+```typescript
+await mastra.backgroundTaskManager?.resume(taskId, {
+  reviewer: 'alice@example.com',
+  edits: 'Reworded paragraph 3.',
+})
+```
+### What happens to the agent loop
+When a task suspends mid-`streamUntilIdle()`, the wrapper treats it as terminal for the current iteration and closes. To continue the agent immediately when the resume payload is in hand, call `agent.resumeStreamUntilIdle(resumeData, { runId, toolCallId, memory })`: the resumed bg task runs to completion, its result lands in the message list, and the agent runs a follow-up turn — all on the same SSE connection. If you'd rather drive the resume out-of-band, call `mastra.backgroundTaskManager.resume(taskId, resumeData)` directly and the result still writes into the thread for the next user turn to pick up.
+### Re-registering the executor on resume
+The manager keeps tool executors in process memory. If the process restarts while a task is suspended, the executor closure is gone — the caller of `resume()` must re-register it first via `manager.registerTaskContext(taskId, ...)`. Tasks dispatched and resumed inside the same process don't need this.
+### Cancelling a suspended task
+`manager.cancel(taskId)` works against suspended tasks the same way it works for running ones: the row flips to `cancelled`, the workflow snapshot is cleaned up, and a `task.cancelled` event fires.
 ## Lifecycle callbacks
 Each layer can register terminal-state callbacks. They don't replace one another, and success/failure hooks fire for their respective outcomes:
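The new chunk types change the totals in the stream-routing paragraph above. The following is a hypothetical, self-contained summary of which `background-task-*` chunks each stream carries after this release. The chunk names come from the table. The `StreamSource` labels and the `chunkTypesFor()` helper are illustrative assumptions, not Mastra APIs. With the two new manager chunks, `streamUntilIdle().fullStream` now surfaces nine types: two agent-loop chunks plus seven manager chunks.

```typescript
// Hypothetical model of the documented chunk routing; not a Mastra API.
const AGENT_LOOP_CHUNKS = ['background-task-started', 'background-task-progress'] as const

const MANAGER_CHUNKS = [
  'background-task-running',
  'background-task-output',
  'background-task-completed',
  'background-task-failed',
  'background-task-cancelled',
  'background-task-suspended', // added in this release
  'background-task-resumed', // added in this release
] as const

type BackgroundTaskChunkType =
  | (typeof AGENT_LOOP_CHUNKS)[number]
  | (typeof MANAGER_CHUNKS)[number]

// Illustrative labels for the three streams described in the docs.
type StreamSource = 'agent.stream' | 'agent.streamUntilIdle' | 'backgroundTaskManager.stream'

function chunkTypesFor(source: StreamSource): BackgroundTaskChunkType[] {
  switch (source) {
    case 'agent.stream':
      // Agent-loop chunks only.
      return [...AGENT_LOOP_CHUNKS]
    case 'backgroundTaskManager.stream':
      // Manager chunks only.
      return [...MANAGER_CHUNKS]
    case 'agent.streamUntilIdle':
      // Agent-loop chunks plus the manager chunks piped in from pubsub.
      return [...AGENT_LOOP_CHUNKS, ...MANAGER_CHUNKS]
  }
}

console.log(chunkTypesFor('agent.streamUntilIdle').length) // 9
```

A switch over these names is also a natural place to branch in your own `fullStream` consumer. For example, you could surface an approval prompt when a `background-task-suspended` chunk arrives.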

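You can exercise the suspend/resume contract without a running Mastra instance. The sketch below is a hypothetical, synchronous, in-memory model. `FakeTaskManager`, `dispatch()`, `get()`, and the flattened `ToolContext` are invented for illustration; the real tool reads `suspend` and `resumeData` from `context.agent`, as in the `reviewTool` example. Only three things come from the text: the status transitions, `resume(taskId, resumeData)`, and the "first call suspends, resumed call returns the real result" flow. Snapshot persistence, chunk emission, and concurrency slots are left out.

```typescript
// Hypothetical, synchronous, in-memory model of the background-task suspend/resume flow.
type TaskStatus = 'running' | 'suspended' | 'completed' | 'cancelled'

// Flattened stand-in for the `context.agent` fields used by the real tool.
interface ToolContext<R> {
  resumeData: R | undefined
  suspend: (data: unknown) => void
}

type Execute<I, R, O> = (input: I, ctx: ToolContext<R>) => O

interface TaskRecord<I, R, O> {
  status: TaskStatus
  input: I
  execute: Execute<I, R, O> // in-process executor closure
  suspendPayload?: unknown
  result?: O
}

class FakeTaskManager<I, R, O> {
  private tasks = new Map<string, TaskRecord<I, R, O>>()
  private nextId = 1

  dispatch(input: I, execute: Execute<I, R, O>): string {
    const taskId = `task-${this.nextId++}`
    const record: TaskRecord<I, R, O> = { status: 'running', input, execute }
    this.tasks.set(taskId, record)
    this.run(record, undefined) // first run: no resumeData yet
    return taskId
  }

  resume(taskId: string, resumeData: R): void {
    const record = this.get(taskId)
    if (record.status !== 'suspended') throw new Error(`${taskId} is ${record.status}, not suspended`)
    record.status = 'running'
    this.run(record, resumeData) // re-run the tool with resumeData populated
  }

  cancel(taskId: string): void {
    // Works for running and suspended tasks; snapshot cleanup and events are omitted.
    this.get(taskId).status = 'cancelled'
  }

  get(taskId: string): TaskRecord<I, R, O> {
    const record = this.tasks.get(taskId)
    if (!record) throw new Error(`unknown task ${taskId}`)
    return record
  }

  private run(record: TaskRecord<I, R, O>, resumeData: R | undefined): void {
    const output = record.execute(record.input, {
      resumeData,
      suspend: (data) => {
        record.status = 'suspended'
        record.suspendPayload = data
      },
    })
    // A suspended run's placeholder return value is discarded.
    if (record.status !== 'suspended') {
      record.status = 'completed'
      record.result = output
    }
  }
}

interface ReviewInput { draft: string }
interface ReviewResume { reviewer: string; edits?: string }
interface ReviewOutput { approvedBy: string; edits?: string }

// Mirrors the reviewTool logic: suspend on the first run, return on the resumed run.
const reviewExecute: Execute<ReviewInput, ReviewResume, ReviewOutput> = ({ draft }, { suspend, resumeData }) => {
  if (!resumeData) {
    suspend({ awaiting: 'approval', draft })
    return { approvedBy: '', edits: undefined }
  }
  return { approvedBy: resumeData.reviewer, edits: resumeData.edits }
}

const manager = new FakeTaskManager<ReviewInput, ReviewResume, ReviewOutput>()
const taskId = manager.dispatch({ draft: 'First draft' }, reviewExecute)
console.log(manager.get(taskId).status) // suspended
manager.resume(taskId, { reviewer: 'alice@example.com', edits: 'Reworded paragraph 3.' })
console.log(manager.get(taskId).status) // completed
console.log(manager.get(taskId).result?.approvedBy) // alice@example.com
```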
package/.docs/docs/agents/processors.md CHANGED

@@ -211,6 +211,22 @@ The method receives the current `stepNumber`, `model`, `tools`, `toolChoice`, `m
 See the [`Processor` reference](https://mastra.ai/reference/processors/processor-interface) for all available arguments and return types.
+### Rewrite the LLM request before the provider call
+Use `processLLMRequest()` when you need to rewrite the final prompt that Mastra sends to the model. This hook runs after Mastra converts the `MessageList` into the provider-facing prompt format (`LanguageModelV2Prompt`) and immediately before the provider call.
+Use the message-based hooks for conversation changes:
+- `processInput()`: Change the conversation once before the agentic loop starts.
+- `processInputStep()`: Change messages or step configuration before each LLM call.
+- `processLLMRequest()`: Change only the outbound prompt for the current provider call.
+Changes returned from `processLLMRequest()` are transient. They don't persist back to `MessageList`, memory, UI history, or future provider calls. This makes the hook a good fit for provider compatibility rewrites, role/content normalization, or other model-specific prompt changes that shouldn't alter stored conversation history.
+The method receives `prompt`, `model`, `stepNumber`, `steps`, `state`, and the shared processor context. Calling `abort()` from `processLLMRequest()` emits the normal tripwire response and stops the call.
+See the [`Processor` reference](https://mastra.ai/reference/processors/processor-interface) for all available arguments and return types.
 ### Use the `prepareStep()` callback
 The `prepareStep()` callback on `generate()` or `stream()` is a shorthand for `processInputStep()`. Internally, Mastra wraps it in a processor that calls your function at each step. It accepts the same arguments and return type as `processInputStep()`, but doesn't require creating a class:
@@ -317,7 +333,7 @@ For more on retry behavior, see [Retry mechanism](#retry-mechanism) in Advanced
 ### Persist data across chunks and steps
-Output methods receive a `state` object that persists for the lifetime of one request. State is keyed by the processor's `id`, so each processor sees only its own data, and it is shared between `processOutputStream`, `processOutputStep`, and `processOutputResult`. A new state object is created for every new `agent.generate()` or `agent.stream()` call.
+Output methods receive a `state` object that persists for the lifetime of one request. State is keyed by the processor's `id`, so each processor sees only its own data, and it's shared between `processOutputStream`, `processOutputStep`, and `processOutputResult`. A new state object is created for every new `agent.generate()` or `agent.stream()` call.
 ```typescript
 import type { Processor } from '@mastra/core/processors'
@@ -383,6 +399,14 @@ Enables dynamic tool discovery for agents with large tool libraries. Instead of
 See the [`ToolSearchProcessor` reference](https://mastra.ai/reference/processors/tool-search-processor) for configuration options and usage examples.
+### `ProviderHistoryCompat`
+Handles provider-specific history incompatibilities when agents reuse messages across model providers. It can rewrite the outbound LLM request before the provider call, or recover from known provider API errors and retry.
+Add `ProviderHistoryCompat` explicitly when you need provider history compatibility rules, reactive API error recovery, custom compatibility rules, or predictable processor ordering.
+See the [`ProviderHistoryCompat` reference](https://mastra.ai/reference/processors/provider-history-compat) for setup, built-in rules, and custom rule options.
 ## Advanced patterns
 ### Ensure a final response with `maxSteps`
@@ -494,7 +518,7 @@ for await (const chunk of stream.fullStream) {
 Custom chunk types must use the `data-` prefix (e.g., `data-moderation-update`, `data-status`).
-By default, `processOutputStream()` skips `data-*` chunks so it does not accidentally operate on tool telemetry or other processors' output. To inspect, modify, or block these chunks in a processor, set `processDataParts = true` on that processor:
+By default, `processOutputStream()` skips `data-*` chunks so it doesn't accidentally operate on tool telemetry or other processors' output. To inspect, modify, or block these chunks in a processor, set `processDataParts = true` on that processor:
 ```typescript
 class ModerationCollector implements Processor {

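A pure function makes it easy to see that `processLLMRequest()` rewrites are transient. The sketch uses a simplified, hypothetical `PromptMessage` type, which is not the real `LanguageModelV2Prompt` shape. It applies a hypothetical normalization that merges consecutive user turns. The function returns a new array and leaves the stored history untouched, which mirrors how changes from this hook don't flow back into `MessageList` or memory.

```typescript
// Simplified, hypothetical stand-in for a provider-facing prompt message.
interface PromptMessage {
  role: 'system' | 'user' | 'assistant'
  content: string
}

// Hypothetical normalization: merge consecutive user turns into one.
// Builds a new array and never mutates the input, so stored history is unaffected.
function mergeConsecutiveUserMessages(prompt: readonly PromptMessage[]): PromptMessage[] {
  const out: PromptMessage[] = []
  for (const message of prompt) {
    const last = out[out.length - 1]
    if (last !== undefined && last.role === 'user' && message.role === 'user') {
      out[out.length - 1] = { role: 'user', content: `${last.content}\n\n${message.content}` }
    } else {
      out.push({ ...message }) // shallow copy so later edits can't leak back
    }
  }
  return out
}

const storedHistory: PromptMessage[] = [
  { role: 'system', content: 'Be brief.' },
  { role: 'user', content: 'Hi' },
  { role: 'user', content: 'What is 2 + 2?' },
]

const outbound = mergeConsecutiveUserMessages(storedHistory)
console.log(outbound.length) // 2
console.log(storedHistory.length) // 3
```

In a real processor, you would apply a rewrite like this to the `prompt` argument inside `processLLMRequest()`. Check the `Processor` reference for the exact prompt type and return shape before adapting it.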
package/.docs/docs/memory/observational-memory.md CHANGED

@@ -458,4 +458,5 @@ In practical terms, OM replaces both working memory and message history, and has
 - [Observational Memory Reference](https://mastra.ai/reference/memory/observational-memory)
 - [Memory Overview](https://mastra.ai/docs/memory/overview)
 - [Message History](https://mastra.ai/docs/memory/message-history)
-- [Memory Processors](https://mastra.ai/docs/memory/memory-processors)
+- [Memory Processors](https://mastra.ai/docs/memory/memory-processors)
+- [Mastra Code](https://code.mastra.ai/): A coding agent using Observational Memory

package/.docs/docs/memory/overview.md CHANGED

@@ -237,4 +237,5 @@ export const memoryAgent = new Agent({
 - [`Memory` reference](https://mastra.ai/reference/memory/memory-class)
 - [Tracing](https://mastra.ai/docs/observability/tracing/overview)
-- [Request Context](https://mastra.ai/docs/server/request-context)
+- [Request Context](https://mastra.ai/docs/server/request-context)
+- [Mastra Code](https://code.mastra.ai/): A coding agent using Mastra's memory system

package/.docs/guides/deployment/inngest.md CHANGED

@@ -21,27 +21,29 @@ Install the required packages:
 **npm**:
 ```bash
-npm install @mastra/inngest@latest inngest @inngest/realtime
+npm install @mastra/inngest@latest inngest
 ```
 **pnpm**:
 ```bash
-pnpm add @mastra/inngest@latest inngest @inngest/realtime
+pnpm add @mastra/inngest@latest inngest
 ```
 **Yarn**:
 ```bash
-yarn add @mastra/inngest@latest inngest @inngest/realtime
+yarn add @mastra/inngest@latest inngest
 ```
 **Bun**:
 ```bash
-bun add @mastra/inngest@latest inngest @inngest/realtime
+bun add @mastra/inngest@latest inngest
 ```
+> **Note:** Requires `inngest@^4` and Inngest Dev Server `v1.18.0` or later. Realtime is built into the SDK in v4, so `@inngest/realtime` and `realtimeMiddleware` are no longer used.
 ## Building an Inngest workflow
 This guide walks through creating a workflow with Inngest and Mastra, demonstrating a counter application that increments a value until it reaches 10.
@@ -54,13 +56,11 @@ In development:
 ```ts
 import { Inngest } from 'inngest'
-import { realtimeMiddleware } from '@inngest/realtime/middleware'
 export const inngest = new Inngest({
   id: 'mastra',
   baseUrl: 'http://localhost:8288',
   isDev: true,
-  middleware: [realtimeMiddleware()],
 })
 ```
@@ -68,11 +68,9 @@ In production:
 ```ts
 import { Inngest } from 'inngest'
-import { realtimeMiddleware } from '@inngest/realtime/middleware'
 export const inngest = new Inngest({
   id: 'mastra',
-  middleware: [realtimeMiddleware()],
 })
 ```
@@ -141,7 +139,7 @@ export const mastra = new Mastra({
     host: '0.0.0.0',
     apiRoutes: [
       {
-        path: '/api/inngest',
+        path: '/inngest/api',
         method: 'ALL',
         createHandler: async ({ mastra }) => {
           return serve({ mastra, inngest })
@@ -153,6 +151,8 @@ export const mastra = new Mastra({
 })
 ```
+> **Note:** The path is `/inngest/api`, not `/api/inngest`. Mastra reserves the `/api` prefix for built-in routes (agents, workflows, memory). Custom `apiRoutes` paths that start with the server's `apiPrefix` (default `/api`) throw at startup. See [#15743](https://github.com/mastra-ai/mastra/pull/15743) for context, or skip to [Using a custom `apiPrefix`](#using-a-custom-apiprefix) if you need to keep `/api/inngest`.
 ## Running workflows
 ### Running locally
@@ -162,10 +162,10 @@ export const mastra = new Mastra({
 2. Start the Inngest Dev Server. In a new terminal, run:
    ```bash
-   npx inngest-cli@latest dev -u http://localhost:4111/api/inngest
+   npx inngest-cli@latest dev -u http://localhost:4111/inngest/api
    ```
-   > **Note:** The URL after `-u` tells the Inngest dev server where to find your Mastra `/api/inngest` endpoint
+   > **Note:** The URL after `-u` tells the Inngest dev server where to find your Mastra `/inngest/api` endpoint
 3. Open the Inngest Dashboard at <http://localhost:8288> and go to the **Apps** section in the sidebar to verify your Mastra workflow is registered
@@ -229,6 +229,8 @@ Before you begin, make sure you have:
 5. Sync with the [Inngest dashboard](https://app.inngest.com/env/production/apps) by selecting **Sync new app with Vercel** and following the instructions
+   > **Warning:** Inngest's auto-discover convention assumes `/api/inngest`. Because this guide uses `/inngest/api`, set the **URL** field on the Inngest app to your deployed origin plus `/inngest/api` (for example `https://your-app.vercel.app/inngest/api`). If you leave it on the default, the Inngest dashboard will not find your app's functions.
 6. Invoke the workflow by going to **Functions**, selecting `workflow.increment-workflow`, selecting **All actions** > **Invoke**, and providing the following input:
    ```json
@@ -294,7 +296,7 @@ export const mastra = new Mastra({
     host: '0.0.0.0',
     apiRoutes: [
       {
-        path: '/api/inngest',
+        path: '/inngest/api',
         method: 'ALL',
         createHandler: async ({ mastra }) => {
           return serve({
@@ -316,7 +318,7 @@ When you include custom functions:
 1. Mastra workflows are automatically converted to Inngest functions with IDs like `workflow.${workflowId}`
 2. Custom functions retain their specified IDs (e.g., `send-welcome-email`, `process-webhook`)
-3. All functions are served together on the same `/api/inngest` endpoint
+3. All functions are served together on the same `/inngest/api` endpoint
 This allows you to combine Mastra's workflow orchestration with your existing Inngest functions.
@@ -338,7 +340,7 @@ const app = express()
 app.use(express.json())
 const handler = createServe(expressAdapter)({ mastra, inngest })
-app.use('/api/inngest', handler)
+app.use('/inngest/api', handler)
 app.listen(3000)
 ```
@@ -358,7 +360,7 @@ const handler = createServe(fastifyAdapter)({ mastra, inngest })
 fastify.route({
   method: ['GET', 'POST', 'PUT'],
-  url: '/api/inngest',
+  url: '/inngest/api',
   handler,
 })
@@ -382,7 +384,7 @@ const router = new Router()
 app.use(bodyParser())
 const handler = createServe(koaAdapter)({ mastra, inngest })
-router.all('/api/inngest', handler)
+router.all('/inngest/api', handler)
 app.use(router.routes())
 app.use(router.allowedMethods())
@@ -406,6 +408,33 @@ export { handler as GET, handler as POST, handler as PUT }
 The `createServe` function works with any Inngest adapter. See the [Inngest serve documentation](https://www.inngest.com/docs/reference/serve) for a complete list of available adapters including AWS Lambda, Cloudflare Workers, and more.
+## Using a custom `apiPrefix`
+If you need to keep `/api/inngest` (for example to match Inngest's auto-discover convention without changing the dashboard URL), set `server.apiPrefix` to relocate Mastra's built-in routes:
+```ts
+import { Mastra } from '@mastra/core'
+import { serve } from '@mastra/inngest'
+import { inngest } from './inngest'
+export const mastra = new Mastra({
+  server: {
+    apiPrefix: '/_mastra',
+    apiRoutes: [
+      {
+        path: '/api/inngest',
+        method: 'ALL',
+        createHandler: async ({ mastra }) => serve({ mastra, inngest }),
+      },
+    ],
+  },
+})
+```
+Mastra's built-in routes now resolve under `/_mastra/agents`, `/_mastra/workflows`, and so on, freeing the `/api/inngest` path for your custom route.
+> **Warning:** The default auth configuration protects `/api/*` and treats `/api`, `/api/auth/*` as public. When you change `apiPrefix`, those defaults no longer match and built-in routes fall outside the protected pattern. Update `server.auth.protected` and `server.auth.public` to reference the new prefix, and update any client code (including [`MastraClient`](https://mastra.ai/docs/server/mastra-client) `apiPrefix`) that hits `/api/*`.
 ## Flow control
 Inngest workflows support flow control features including concurrency limits, rate limiting, throttling, debouncing, and priority queuing. These options are configured in the `createWorkflow()` call and help manage workflow execution at scale.
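The startup check from the first note can be pictured as a path-prefix test. The function below is a hypothetical illustration, not Mastra's implementation. It assumes matching on whole path segments, so `/api` itself and `/api/...` conflict but `/inngest/api` does not. See PR #15743 for the actual rule.

```typescript
// Hypothetical sketch of the documented startup rule: custom apiRoutes paths
// must not start with the server's apiPrefix. Segment-wise matching is assumed.
function normalizePrefix(prefix: string): string {
  const withSlash = prefix.startsWith('/') ? prefix : `/${prefix}`
  return withSlash.replace(/\/+$/, '') // drop trailing slashes
}

function conflictsWithApiPrefix(path: string, apiPrefix = '/api'): boolean {
  const prefix = normalizePrefix(apiPrefix)
  return path === prefix || path.startsWith(`${prefix}/`)
}

// Throw at "startup" if any custom route collides with the prefix.
function assertCustomRoutes(paths: readonly string[], apiPrefix = '/api'): void {
  for (const path of paths) {
    if (conflictsWithApiPrefix(path, apiPrefix)) {
      throw new Error(`Custom route ${path} collides with apiPrefix ${apiPrefix}`)
    }
  }
}

console.log(conflictsWithApiPrefix('/api/inngest')) // true
console.log(conflictsWithApiPrefix('/inngest/api')) // false
console.log(conflictsWithApiPrefix('/api/inngest', '/_mastra')) // false
```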

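If you move Mastra's routes with `apiPrefix`, the auth defaults quoted in the warning need the same move: `/api/*` is protected, and `/api` and `/api/auth/*` are public. The `rebasePatterns()` helper below is a hypothetical sketch that re-roots those patterns on a new prefix. Before using its output, confirm the pattern syntax that `server.auth.protected` and `server.auth.public` accept. Whether auth should also cover your custom `/api/inngest` route is a separate decision.

```typescript
// Default auth patterns as quoted in the warning above.
const DEFAULT_PROTECTED = ['/api/*']
const DEFAULT_PUBLIC = ['/api', '/api/auth/*']

// Hypothetical helper: swap a leading `from` path segment for `to`.
function rebasePatterns(patterns: readonly string[], from: string, to: string): string[] {
  return patterns.map((pattern) => {
    if (pattern === from) return to
    if (pattern.startsWith(`${from}/`)) return to + pattern.slice(from.length)
    return pattern // unrelated patterns are left alone
  })
}

const protectedPatterns = rebasePatterns(DEFAULT_PROTECTED, '/api', '/_mastra')
const publicPatterns = rebasePatterns(DEFAULT_PUBLIC, '/api', '/_mastra')
console.log(protectedPatterns.join(', ')) // /_mastra/*
console.log(publicPatterns.join(', ')) // /_mastra, /_mastra/auth/*
```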
package/.docs/guides/guide/web-search.md CHANGED

@@ -17,7 +17,7 @@ Some LLM providers include built-in web search capabilities that can be used dir
 1. Install dependencies
-   **Open AI**:
+   **OpenAI**:
    **npm**:
@@ -119,7 +119,7 @@ Some LLM providers include built-in web search capabilities that can be used dir
 2. Create a new file `src/mastra/agents/searchAgent.ts` and define your agent:
-   **Open AI**:
+   **OpenAI**:
    ```ts
    import { Agent } from '@mastra/core/agent'
@@ -128,7 +128,7 @@ Some LLM providers include built-in web search capabilities that can be used dir
      id: 'search-agent',
      name: 'Search Agent',
      instructions: 'You are a search agent that can search the web for information.',
-     model: 'openai/gpt-5.4',
+     model: 'openai/gpt-5.5',
    })
    ```
@@ -147,7 +147,7 @@ Some LLM providers include built-in web search capabilities that can be used dir
 3. Setup the tool:
-   **Open AI**:
+   **OpenAI**:
    ```ts
    import { openai } from '@ai-sdk/openai'
@@ -157,7 +157,7 @@ Some LLM providers include built-in web search capabilities that can be used dir
      id: 'search-agent',
      name: 'Search Agent',
      instructions: 'You are a search agent that can search the web for information.',
-     model: 'openai/gpt-5.4',
+     model: 'openai/gpt-5.5',
      tools: {
        webSearch: openai.tools.webSearch(),
      },
@@ -241,7 +241,7 @@ For more control over search behavior, you can integrate external search APIs as
      id: 'search-agent',
      name: 'Search Agent',
      instructions: 'You are a search agent that can search the web for information.',
-     model: 'openai/gpt-5.4',
+     model: 'openai/gpt-5.5',
    })
    ```
@@ -293,7 +293,7 @@ For more control over search behavior, you can integrate external search APIs as
      id: 'search-agent',
      name: 'Search Agent',
      instructions: 'You are a search agent that can search the web for information.',
-     model: 'openai/gpt-5.4',
+     model: 'openai/gpt-5.5',
      tools: {
        webSearch,
      },

package/.docs/models/gateways/azure-openai.md CHANGED

@@ -13,7 +13,7 @@ const agent = new Agent({
   id: "my-agent",
   name: "My Agent",
   instructions: "You are a helpful assistant",
-  model: "azure-openai/my-gpt4-deployment"  // Use your Azure deployment name (autocompleted in dev mode)
+  model: "azure-openai/my-gpt-5-4-deployment"  // Use your Azure deployment name (autocompleted in dev mode)
 });
 // Generate a response
@@ -34,9 +34,9 @@ Azure model IDs follow this pattern: `azure-openai/your-deployment-name`
 The deployment name is **specific to your Azure account** and chosen when you create a deployment in Azure Portal. Common examples:
-- `azure-openai/my-gpt4-deployment`
-- `azure-openai/production-gpt-35-turbo`
-- `azure-openai/staging-gpt-4o`
+- `azure-openai/my-gpt-5-4-deployment`
+- `azure-openai/production-gpt-5-4`
+- `azure-openai/staging-gpt-5-4-mini`
 ## Setup
@@ -44,7 +44,7 @@ Create deployments in [Azure OpenAI Studio](https://oai.azure.com/). The resourc
 ## Configuration
-Instantiate the gateway and pass it to Mastra. Three configuration modes are available.
+Instantiate the gateway and pass it to Mastra. The common configuration modes are shown below.
 ### Static Deployments
@@ -59,7 +59,7 @@ export const mastra = new Mastra({
     new AzureOpenAIGateway({
       resourceName: "my-openai-resource",
       apiKey: process.env.AZURE_API_KEY!,
-      deployments: ["gpt-4-prod", "gpt-35-turbo-dev"],
+      deployments: ["gpt-5-4-prod", "gpt-5-4-mini-dev"],
     }),
   ],
 });
@@ -111,7 +111,7 @@ export const mastra = new Mastra({
         type: "entraId",
         credential: new DefaultAzureCredential(),
       },
-      deployments: ["gpt-4-prod", "gpt-35-turbo-dev"],
+      deployments: ["gpt-5-4-prod", "gpt-5-4-mini-dev"],
     }),
   ],
 });
@@ -145,23 +145,94 @@ export const mastra = new Mastra({
 });
 ```
+### Azure Responses API
+Azure OpenAI supports the Responses API through the `v1` API path used by the AI SDK Azure provider. Set `useResponsesAPI: true` when your Azure resource and deployment support that route. The gateway then uses `apiVersion: "v1"` and `useDeploymentBasedUrls: false` by default.
+```typescript
+import { Mastra } from "@mastra/core";
+import { AzureOpenAIGateway } from "@mastra/core/llm";
+export const mastra = new Mastra({
+  gateways: [
+    new AzureOpenAIGateway({
+      resourceName: "my-openai-resource",
+      apiKey: process.env.AZURE_API_KEY!,
+      useResponsesAPI: true,
+      deployments: ["my-gpt-5-4-deployment"],
+    }),
+  ],
+});
+```
+Keep `useResponsesAPI` omitted or set it to `false` for the existing Azure chat completions route. That path keeps `apiVersion: "2024-04-01-preview"` and deployment-based URLs by default for compatibility.
+You can still configure `apiVersion` and `useDeploymentBasedUrls` directly. For example, set `useDeploymentBasedUrls: false` to use the Azure `v1` URL shape with the chat model constructor; the gateway defaults `apiVersion` to `"v1"` for that route. Passing `apiVersion: "v1"` by itself keeps the existing deployment-based URL default for compatibility.
+Do not combine `useResponsesAPI: true` with `useDeploymentBasedUrls: true`; the gateway rejects that configuration because Responses API support uses the Azure `v1` route.
+Use `apiVersion: "v1"` for the GA `v1` route. Microsoft currently exposes preview `v1` features through feature-specific headers, such as `"aoai-evals": "preview"`, or through preview/alpha API paths. The gateway still accepts `apiVersion: "preview"` with `useDeploymentBasedUrls: false` for Azure provider configurations that require the preview query value. Date-based API versions are only for the legacy deployment-based route, so the gateway rejects them when `useResponsesAPI` is `true` or `useDeploymentBasedUrls` is `false`.
+The same API key and Microsoft Entra ID authentication modes work with the `v1` route.
+### Azure Responses WebSocket transport
+Azure OpenAI also supports WebSocket mode on the Responses API. Use it for agent or tool loops with many model-tool round trips. Keep the standard HTTP transport for single-shot requests and short conversations.
+WebSocket transport requires `useResponsesAPI: true`, because Azure exposes it on the `v1` Responses path. Then opt in per stream request with `providerOptions.azure.transport: "websocket"`.
+```typescript
+import { Agent } from "@mastra/core/agent";
+const agent = new Agent({
+  id: "azure-ws-agent",
+  name: "Azure WebSocket Agent",
+  instructions: "Use tools when they are useful.",
+  model: "azure-openai/my-gpt-5-4-deployment",
+});
+const stream = await agent.stream("Find and improve the slow function.", {
+  providerOptions: {
+    azure: {
+      transport: "websocket",
+      store: false,
+      websocket: {
+        closeOnFinish: false,
+      },
+    },
+  },
+});
+for await (const chunk of stream.textStream) {
+  process.stdout.write(chunk);
+}
+stream.transport?.close();
+```
+Set `closeOnFinish: false` when you want to keep the socket open across follow-up turns. Azure keeps one response chain in connection-local memory, so continuing from the most recent `previous_response_id` can reduce continuation latency. The connection runs one response at a time and does not multiplex parallel runs.
+Do not send overlapping follow-up requests with `previous_response_id` on the same WebSocket transport. Mastra rejects overlapping continuation requests because Azure only keeps one in-flight response per connection. Wait for the active stream to finish before continuing the response chain.
 ## Configuration Reference
-| Option                      | Type              | Required | Description                                                           |
-| --------------------------- | ----------------- | -------- | --------------------------------------------------------------------- |
-| `resourceName`              | `string`          | Yes      | Azure OpenAI resource name                                            |
-| `apiKey`                    | `string`          | Yes\*    | API key from "Keys and Endpoint"                                      |
-| `authentication`            | `object`          | No       | Microsoft Entra ID authentication                                     |
-| `authentication.type`       | `"entraId"`       | Yes\*    | Authentication mode                                                   |
-| `authentication.credential` | `TokenCredential` | Yes\*    | Azure SDK-compatible credential for `entraId` authentication mode     |
-| `authentication.scope`      | `string`          | No       | Token scope (default: `https://cognitiveservices.azure.com/.default`) |
-| `apiVersion`                | `string`          | No       | API version (default: `2024-04-01-preview`)                           |
-| `deployments`               | `string[]`        | No       | Deployment names for static mode                                      |
-| `management`                | `object`          | No       | Management API credentials                                            |
-| `management.tenantId`       | `string`          | Yes\*    | Azure AD tenant ID                                                    |
-| `management.clientId`       | `string`          | Yes\*    | Service Principal client ID                                           |
-| `management.clientSecret`   | `string`          | Yes\*    | Service Principal secret                                              |
-| `management.subscriptionId` | `string`          | Yes\*    | Azure subscription ID                                                 |
-| `management.resourceGroup`  | `string`          | Yes\*    | Resource group name                                                   |
+| Option                      | Type              | Required | Description                                                                                                                  |
+| --------------------------- | ----------------- | -------- | ---------------------------------------------------------------------------------------------------------------------------- |
+| `resourceName`              | `string`          | Yes      | Azure OpenAI resource name                                                                                                   |
+| `apiKey`                    | `string`          | Yes\*    | API key from "Keys and Endpoint"                                                                                             |
+| `authentication`            | `object`          | No       | Microsoft Entra ID authentication                                                                                            |
+| `authentication.type`       | `"entraId"`       | Yes\*    | Authentication mode                                                                                                          |
+| `authentication.credential` | `TokenCredential` | Yes\*    | Azure SDK-compatible credential for `entraId` authentication mode                                                            |
+| `authentication.scope`      | `string`          | No       | Token scope (default: `https://cognitiveservices.azure.com/.default`)                                                        |
+| `apiVersion`                | `string`          | No       | API version (default: `2024-04-01-preview`, or `v1` when `useResponsesAPI` is `true` or `useDeploymentBasedUrls` is `false`) |
+| `useResponsesAPI`           | `boolean`         | No       | Resolve deployments through the Azure OpenAI Responses API (default: `false`)                                                |
+| `useDeploymentBasedUrls`    | `boolean`         | No       | Use Azure deployment-based URLs (default: `true`, or `false` when `useResponsesAPI` is `true`)                               |
+| `deployments`               | `string[]`        | No       | Deployment names for static mode                                                                                             |
+| `management`                | `object`          | No       | Management API credentials                                                                                                   |
+| `management.tenantId`       | `string`          | Yes\*    | Azure AD tenant ID                                                                                                           |
+| `management.clientId`       | `string`          | Yes\*    | Service Principal client ID                                                                                                  |
+| `management.clientSecret`   | `string`          | Yes\*    | Service Principal secret                                                                                                     |
+| `management.subscriptionId` | `string`          | Yes\*    | Azure subscription ID                                                                                                        |
+| `management.resourceGroup`  | `string`          | Yes\*    | Resource group name                                                                                                          |
 \* Provide either `apiKey` or `authentication.type: "entraId"`. Management fields are required if `management` is provided.
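The option table above has several defaults that depend on each other (`apiVersion`, `useResponsesAPI`, `useDeploymentBasedUrls`), plus a rule about which authentication and management fields are required. The TypeScript sketch below shows one way to resolve and check them. `AzureOpenAIOptions` and `resolveAzureOptions` are hypothetical names used only for illustration. They are not Mastra's exported API. To keep the block self-contained, `credential` is typed as `unknown` in place of the Azure SDK `TokenCredential`.

```typescript
// Hypothetical option shape mirroring the documented table (not Mastra's exported type).
interface AzureOpenAIOptions {
  resourceName: string;
  apiKey?: string;
  authentication?: {
    type: "entraId";
    credential: unknown; // stands in for an Azure SDK TokenCredential
    scope?: string;
  };
  apiVersion?: string;
  useResponsesAPI?: boolean;
  useDeploymentBasedUrls?: boolean;
  deployments?: string[];
  management?: {
    tenantId: string;
    clientId: string;
    clientSecret: string;
    subscriptionId: string;
    resourceGroup: string;
  };
}

interface ResolvedAzureOptions {
  apiVersion: string;
  useResponsesAPI: boolean;
  useDeploymentBasedUrls: boolean;
  scope?: string; // only set for entraId authentication
}

const DEFAULT_SCOPE = "https://cognitiveservices.azure.com/.default";
const MANAGEMENT_FIELDS = [
  "tenantId",
  "clientId",
  "clientSecret",
  "subscriptionId",
  "resourceGroup",
] as const;

function resolveAzureOptions(opts: AzureOpenAIOptions): ResolvedAzureOptions {
  if (!opts.resourceName) {
    throw new Error("resourceName is required");
  }

  // Authentication: an API key or Entra ID must be supplied.
  // This sketch only checks that at least one of them is present.
  const auth = opts.authentication;
  const usesEntraId = auth?.type === "entraId";
  if (!opts.apiKey && !usesEntraId) {
    throw new Error('Provide either apiKey or authentication.type: "entraId"');
  }
  if (usesEntraId && !auth?.credential) {
    throw new Error("authentication.credential is required for entraId");
  }

  // Management block: every field is required once the block is present.
  const management = opts.management;
  if (management) {
    for (const field of MANAGEMENT_FIELDS) {
      if (!management[field]) {
        throw new Error(`management.${field} is required when management is provided`);
      }
    }
  }

  // Interdependent defaults, as listed in the table.
  const useResponsesAPI = opts.useResponsesAPI ?? false;
  const useDeploymentBasedUrls = opts.useDeploymentBasedUrls ?? !useResponsesAPI;
  const apiVersion =
    opts.apiVersion ??
    (useResponsesAPI || !useDeploymentBasedUrls ? "v1" : "2024-04-01-preview");

  return {
    apiVersion,
    useResponsesAPI,
    useDeploymentBasedUrls,
    scope: usesEntraId ? (auth?.scope ?? DEFAULT_SCOPE) : undefined,
  };
}
```

Suppose you set `useResponsesAPI: true` and leave `apiVersion` unset. The sketch then resolves to `apiVersion: "v1"` and `useDeploymentBasedUrls: false`, which matches the defaults in the table. An explicitly set `apiVersion` always takes precedence.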

package/.docs/models/gateways/netlify.md CHANGED

@@ -1,6 +1,6 @@
 # Netlify
-Netlify AI Gateway provides unified access to multiple providers with built-in caching and observability. Access 66 models through Mastra's model router.
+Netlify AI Gateway provides unified access to multiple providers with built-in caching and observability. Access 67 models through Mastra's model router.
 Learn more in the [Netlify documentation](https://docs.netlify.com/build/ai-gateway/overview/).
@@ -62,6 +62,7 @@ ANTHROPIC_API_KEY=ant-...
 | `gemini/gemini-3.1-pro-preview-customtools` |
 | `gemini/gemini-flash-latest`                |
 | `gemini/gemini-flash-lite-latest`           |
+| `openai/chat-latest`                        |
 | `openai/gpt-4.1`                            |
 | `openai/gpt-4.1-mini`                       |
 | `openai/gpt-4.1-nano`                       |

package/.docs/models/gateways/openrouter.md CHANGED

@@ -1,6 +1,6 @@
 # ![OpenRouter logo](https://models.dev/logos/openrouter.svg)OpenRouter
-OpenRouter aggregates models from multiple providers with enhanced features like rate limiting and failover. Access 185 models through Mastra's model router.
+OpenRouter aggregates models from multiple providers with enhanced features like rate limiting and failover. Access 186 models through Mastra's model router.
 Learn more in the [OpenRouter documentation](https://openrouter.ai/models).
@@ -172,6 +172,7 @@ ANTHROPIC_API_KEY=ant-...
 | `poolside/laguna-xs.2:free`                                     |
 | `prime-intellect/intellect-3`                                   |
 | `qwen/qwen-2.5-coder-32b-instruct`                              |
+| `qwen/qwen-3.6-27b`                                             |
 | `qwen/qwen2.5-vl-72b-instruct`                                  |
 | `qwen/qwen3-235b-a22b-07-25`                                    |
 | `qwen/qwen3-235b-a22b-thinking-2507`                            |

package/.docs/models/index.md CHANGED

@@ -1,6 +1,6 @@
 # Model Providers
-Mastra provides a unified interface for working with LLMs across multiple providers, giving you access to 3875 models from 107 providers through a single API.
+Mastra provides a unified interface for working with LLMs across multiple providers, giving you access to 3889 models from 108 providers through a single API.
 ## Features

package/.docs/models/providers/deepinfra.md CHANGED

@@ -1,6 +1,6 @@
 # ![Deep Infra logo](https://models.dev/logos/deepinfra.svg)Deep Infra
-Access 35 Deep Infra models through Mastra's model router. Authentication is handled automatically using the `DEEPINFRA_API_KEY` environment variable.
+Access 36 Deep Infra models through Mastra's model router. Authentication is handled automatically using the `DEEPINFRA_API_KEY` environment variable.
 Learn more in the [Deep Infra documentation](https://deepinfra.com/models).
@@ -36,6 +36,7 @@ for await (const chunk of stream) {
 | `deepinfra/anthropic/claude-4-opus`                           | 200K    |       |           |       |       |       | $17        | $83         |
 | `deepinfra/deepseek-ai/DeepSeek-R1-0528`                      | 164K    |       |           |       |       |       | $0.50      | $2          |
 | `deepinfra/deepseek-ai/DeepSeek-V3.2`                         | 164K    |       |           |       |       |       | $0.26      | $0.38       |
+| `deepinfra/deepseek-ai/DeepSeek-V4-Flash`                     | 1.0M    |       |           |       |       |       | $0.14      | $0.28       |
 | `deepinfra/deepseek-ai/DeepSeek-V4-Pro`                       | 66K     |       |           |       |       |       | $2         | $3          |
 | `deepinfra/google/gemma-4-26B-A4B-it`                         | 256K    |       |           |       |       |       | $0.07      | $0.34       |
 | `deepinfra/google/gemma-4-31B-it`                             | 256K    |       |           |       |       |       | $0.13      | $0.38       |

package/.docs/models/providers/digitalocean.md CHANGED

@@ -1,6 +1,6 @@
 # ![DigitalOcean logo](https://models.dev/logos/digitalocean.svg)DigitalOcean
-Access 63 DigitalOcean models through Mastra's model router. Authentication is handled automatically using the `DIGITALOCEAN_ACCESS_TOKEN` environment variable.
+Access 64 DigitalOcean models through Mastra's model router. Authentication is handled automatically using the `DIGITALOCEAN_ACCESS_TOKEN` environment variable.
 Learn more in the [DigitalOcean documentation](https://docs.digitalocean.com/products/gradient-ai-platform/details/models/).
@@ -60,6 +60,7 @@ for await (const chunk of stream) {
 | `digitalocean/glm-5`                                 | 203K    |       |           |       |       |       | $1         | $3          |
 | `digitalocean/gte-large-en-v1.5`                     | 8K      |       |           |       |       |       | $0.09      | —           |
 | `digitalocean/kimi-k2.5`                             | 262K    |       |           |       |       |       | $0.50      | $3          |
+| `digitalocean/kimi-k2.6`                             | 262K    |       |           |       |       |       | $0.95      | $4          |
 | `digitalocean/llama-4-maverick`                      | 1.0M    |       |           |       |       |       | $0.25      | $0.87       |
 | `digitalocean/llama-guard-4-12b`                     | 128K    |       |           |       |       |       | —          | —           |
 | `digitalocean/llama3.3-70b-instruct`                 | 128K    |       |           |       |       |       | $0.65      | $0.65       |