@mastra/mcp-docs-server 1.1.35-alpha.13 → 1.1.35-alpha.17

Files changed (29)
  1. package/.docs/course/03-agent-memory/18-advanced-configuration-semantic-recall.md +48 -4
  2. package/.docs/docs/agents/response-caching.md +148 -0
  3. package/.docs/docs/agents/using-tools.md +8 -0
  4. package/.docs/docs/memory/observational-memory.md +56 -12
  5. package/.docs/docs/memory/semantic-recall.md +68 -6
  6. package/.docs/docs/observability/tracing/exporters/arize.md +5 -5
  7. package/.docs/docs/observability/tracing/exporters/default.md +36 -1
  8. package/.docs/docs/observability/tracing/overview.md +1 -1
  9. package/.docs/docs/workflows/suspend-and-resume.md +28 -1
  10. package/.docs/models/gateways/openrouter.md +2 -1
  11. package/.docs/models/index.md +1 -1
  12. package/.docs/models/providers/llmgateway.md +7 -1
  13. package/.docs/models/providers/nebius.md +2 -1
  14. package/.docs/models/providers/opencode.md +1 -2
  15. package/.docs/models/providers/poe.md +4 -1
  16. package/.docs/reference/agents/agent.md +2 -0
  17. package/.docs/reference/client-js/responses.md +4 -0
  18. package/.docs/reference/configuration.md +4 -4
  19. package/.docs/reference/harness/harness-class.md +21 -8
  20. package/.docs/reference/index.md +2 -0
  21. package/.docs/reference/memory/observational-memory.md +11 -1
  22. package/.docs/reference/observability/tracing/exporters/arize.md +1 -1
  23. package/.docs/reference/observability/tracing/exporters/default-exporter.md +2 -0
  24. package/.docs/reference/observability/tracing/interfaces.md +36 -1
  25. package/.docs/reference/processors/response-cache.md +114 -0
  26. package/.docs/reference/tools/create-tool.md +46 -0
  27. package/.docs/reference/workflows/workflow-state-reader.md +113 -0
  28. package/CHANGELOG.md +15 -0
  29. package/package.json +3 -3
@@ -1,6 +1,6 @@
1
- # Advanced Configuration of Semantic Recall
1
+ # Advanced configuration of semantic recall
2
2
 
3
- We can configure semantic recall in more detail by setting options for the `semanticRecall` option:
3
+ Configure semantic recall with the `semanticRecall` option:
4
4
 
5
5
  ```typescript
6
6
  const memory = new Memory({
@@ -19,11 +19,55 @@ const memory = new Memory({
19
19
  before: 2,
20
20
  after: 1,
21
21
  },
22
+ scope: 'resource', // Search all threads for this resource
23
+ filter: { projectId: { $eq: 'project-a' } },
22
24
  },
23
25
  },
24
26
  })
25
27
  ```
26
28
 
27
- The `topK` parameter controls how many semantically similar messages are retrieved. A higher value will retrieve more messages, which can be helpful for complex topics but may also include less relevant information. The default value is `4`.
29
+ The `topK` parameter controls how many similar messages Mastra retrieves. A higher value retrieves more messages, which can help with complex topics but may include less relevant information. The default value is `4`.
28
30
 
29
- The `messageRange` parameter controls how much context is included with each match. This is important because the matching message alone might not provide enough context to understand the conversation. Including messages before and after the match helps the agent understand the context of the matched message.
31
+ The `messageRange` parameter controls how much context Mastra includes with each match. Messages before and after the match help the agent understand the matched message.
32
+
33
+ The `scope` parameter controls whether Mastra searches the current thread (`'thread'`) or all threads owned by a resource (`'resource'`). Use `scope: 'resource'` to let the agent recall information from past conversations for the same resource.
34
+
35
+ The `filter` parameter restricts semantic recall results to messages with matching thread metadata, such as a project ID or category.
36
+
37
+ Filters match metadata stored on message embeddings when messages are saved. If thread metadata changes later, existing embeddings keep their previous metadata until those messages are saved or indexed again.
38
+
39
+ Supported filter operators:
40
+
41
+ - `$and`: Logical AND
42
+ - `$eq`: Equal to
43
+ - `$gt`: Greater than
44
+ - `$gte`: Greater than or equal
45
+ - `$in`: In array
46
+ - `$lt`: Less than
47
+ - `$lte`: Less than or equal
48
+ - `$ne`: Not equal to
49
+ - `$nin`: Not in array
50
+ - `$or`: Logical OR
51
+
52
+ The following example demonstrates metadata filters for common use cases:
53
+
54
+ ```typescript
55
+ // Filter by project
56
+ const options = {
57
+ semanticRecall: { filter: { projectId: { $eq: 'my-project' } } },
58
+ }
59
+
60
+ // Filter by multiple categories
61
+ const options = {
62
+ semanticRecall: { filter: { category: { $in: ['work', 'research'] } } },
63
+ }
64
+
65
+ // Filter by project and priority
66
+ const options = {
67
+ semanticRecall: {
68
+ filter: {
69
+ $and: [{ projectId: { $eq: 'project-a' } }, { priority: { $gte: 3 } }],
70
+ },
71
+ },
72
+ }
73
+ ```
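Reviewer note: the operator list added in this hunk can be read as a small predicate language over thread metadata. The sketch below is a minimal in-memory evaluator for those operators, written only to illustrate their semantics. It is not Mastra's implementation (in practice the vector store evaluates the filter), and the names `Filter`, `FieldCondition`, `matchesField`, and `matches` are hypothetical. It also assumes that several fields at the same level combine with an implicit AND and that range operators apply to numeric values only.

```typescript
// Illustrative sketch only: NOT Mastra's implementation.
type Scalar = string | number | boolean
type Metadata = Record<string, Scalar>

// Hypothetical shape for a per-field condition using the documented operators.
type FieldCondition = {
  $eq?: Scalar
  $ne?: Scalar
  $gt?: number
  $gte?: number
  $lt?: number
  $lte?: number
  $in?: Scalar[]
  $nin?: Scalar[]
}

// A filter maps field names to conditions, or `$and` / `$or` to nested filters.
type Filter = { [key: string]: FieldCondition | Filter[] }

function matchesField(value: Scalar | undefined, cond: FieldCondition): boolean {
  if (cond.$eq !== undefined && value !== cond.$eq) return false
  if (cond.$ne !== undefined && value === cond.$ne) return false
  if (cond.$in !== undefined && (value === undefined || !cond.$in.includes(value))) return false
  if (cond.$nin !== undefined && value !== undefined && cond.$nin.includes(value)) return false
  // Assumption: range operators only match numeric metadata values.
  if (cond.$gt !== undefined && !(typeof value === 'number' && value > cond.$gt)) return false
  if (cond.$gte !== undefined && !(typeof value === 'number' && value >= cond.$gte)) return false
  if (cond.$lt !== undefined && !(typeof value === 'number' && value < cond.$lt)) return false
  if (cond.$lte !== undefined && !(typeof value === 'number' && value <= cond.$lte)) return false
  return true
}

function matches(metadata: Metadata, filter: Filter): boolean {
  // Assumption: sibling keys combine with an implicit AND.
  return Object.entries(filter).every(([key, cond]) => {
    if (key === '$and') return (cond as Filter[]).every(f => matches(metadata, f))
    if (key === '$or') return (cond as Filter[]).some(f => matches(metadata, f))
    return matchesField(metadata[key], cond as FieldCondition)
  })
}

// The "project and priority" filter from the hunk above.
const filter: Filter = {
  $and: [{ projectId: { $eq: 'project-a' } }, { priority: { $gte: 3 } }],
}
const hit = matches({ projectId: 'project-a', priority: 4 }, filter)
const miss = matches({ projectId: 'project-a', priority: 1 }, filter)
```

Here `hit` is `true` and `miss` is `false`, because the second metadata object fails the `$gte` condition inside `$and`.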
@@ -0,0 +1,148 @@
1
+ # Response caching
2
+
3
+ Response caching skips the LLM call and replays a previously cached response when an agent receives an identical request. Use it to drop latency to single-digit milliseconds and avoid paying for repeated calls.
4
+
5
+ Caching is implemented as the [`ResponseCache`](https://mastra.ai/reference/processors/response-cache) input processor. There is no agent-level option — to enable caching, register the processor explicitly. This keeps the API surface small while we collect feedback; per-call overrides flow through `RequestContext`.
6
+
7
+ ## When to use response caching
8
+
9
+ Reach for it when the same request shape repeats across users or sessions, for example prompt templates, suggested-prompt buttons, agentic search re-asks, or guardrail LLMs that classify the same input over and over. Skip it when calls trigger external side effects through tools, since cache hits replay tool calls without re-executing them.
10
+
11
+ ## Quickstart
12
+
13
+ Add a `ResponseCache` to the agent's `inputProcessors` and pass any `MastraServerCache` as the backend. For development, `InMemoryServerCache` works out of the box:
14
+
15
+ ```typescript
16
+ import { Agent } from '@mastra/core/agent'
17
+ import { InMemoryServerCache } from '@mastra/core/cache'
18
+ import { ResponseCache } from '@mastra/core/processors'
19
+
20
+ const cache = new InMemoryServerCache()
21
+
22
+ export const searchAgent = new Agent({
23
+ name: 'Search Agent',
24
+ instructions: 'You answer questions concisely.',
25
+ model: 'openai/gpt-5',
26
+ inputProcessors: [new ResponseCache({ cache, ttl: 600 })], // 10 minutes
27
+ })
28
+ ```
29
+
30
+ The first call runs the LLM normally and writes the response to the cache. Subsequent calls with an identical resolved prompt return the cached response without invoking the LLM.
31
+
32
+ ## Per-call overrides via RequestContext
33
+
34
+ Per-call config flows through `RequestContext`. Use `ResponseCache.context()` to build a fresh context, or `ResponseCache.applyContext()` to merge into one you already have:
35
+
36
+ ```typescript
37
+ import { ResponseCache } from '@mastra/core/processors'
38
+ import { RequestContext } from '@mastra/core/request-context'
39
+
40
+ // Fresh context with the override
41
+ await agent.stream('hello', {
42
+ requestContext: ResponseCache.context({ key: 'custom-key', bust: true }),
43
+ })
44
+
45
+ // Or merge into an existing context
46
+ const ctx = new RequestContext()
47
+ ctx.set('caller-meta', { userId: 'u-123' })
48
+ ResponseCache.applyContext(ctx, { bust: true })
49
+ await agent.stream('hello', { requestContext: ctx })
50
+ ```
51
+
52
+ Three fields are overridable per call:
53
+
54
+ - `key` — string or function. Overrides the auto-derived cache key for this request only.
55
+ - `scope` — string or `null`. Overrides the tenant/user scope for this request only. `null` opts out of scoping.
56
+ - `bust` — boolean. Skips the cache read but still writes on completion (useful for "force refresh" buttons).
57
+
58
+ `cache`, `ttl`, and `agentId` stay on the constructor — they are instance-level concerns and not safe to vary per call.
59
+
60
+ ## Tenant scoping
61
+
62
+ By default, `ResponseCache` looks up `MASTRA_RESOURCE_ID_KEY` on the request context and uses it as the cache scope. This means an agent that already populates the resource id (e.g. via memory) gets per-user isolation automatically — two users never see each other's cached responses.
63
+
64
+ Override explicitly when you need a different scope:
65
+
66
+ ```typescript
67
+ new Agent({
68
+ // ...
69
+ inputProcessors: [
70
+ new ResponseCache({
71
+ cache,
72
+ scope: 'org-123', // explicit tenant scope
73
+ }),
74
+ ],
75
+ })
76
+ ```
77
+
78
+ Pass `scope: null` to deliberately share entries across all callers — only use this for known-public, non-personalized content.
79
+
80
+ ## Custom cache backend
81
+
82
+ `ResponseCache` accepts any `MastraServerCache`. For production, use `RedisCache` from `@mastra/redis`:
83
+
84
+ ```typescript
85
+ import { Agent } from '@mastra/core/agent'
86
+ import { ResponseCache } from '@mastra/core/processors'
87
+ import { RedisCache } from '@mastra/redis'
88
+
89
+ const cache = new RedisCache({ url: process.env.REDIS_URL })
90
+
91
+ export const agent = new Agent({
92
+ name: 'Cached Agent',
93
+ instructions: '...',
94
+ model: 'openai/gpt-5',
95
+ inputProcessors: [new ResponseCache({ cache })],
96
+ })
97
+ ```
98
+
99
+ For a custom backend, extend `MastraServerCache` and implement its abstract methods (the processor only calls `get` and `set`).
100
+
101
+ ## How caching is implemented
102
+
103
+ `ResponseCache` hooks into `processLLMRequest` (cache lookup, short-circuits on hit) and `processLLMResponse` (cache write on completion). Both run inside the agentic loop _after_ memory has loaded and earlier input processors have transformed the prompt.
104
+
105
+ This means the cache key is derived from the resolved `LanguageModelV2Prompt` Mastra is about to send to the model — i.e. _after_ memory has loaded and earlier input processors have run — and each step in an agentic tool loop is independently cached.
106
+
107
+ ## What's in the cache key
108
+
109
+ When you don't supply `key`, the processor derives one deterministically from the inputs that change the LLM's response at this step: `agentId`, `stepNumber` (so each step in a tool loop has its own cache entry), `scope`, model identity (`provider`, `modelId`, spec version), and the resolved `prompt` (post-memory + post-processors). Any change to these inputs automatically invalidates the cache.
110
+
111
+ ### Customize the cache key
112
+
113
+ Pass `key` as a function on the constructor or per-call to derive your own cache key from any subset of those inputs. The function receives the same inputs the deterministic hash would have consumed and returns a string (or a `Promise<string>`):
114
+
115
+ ```typescript
116
+ import { ResponseCache, buildResponseCacheKey } from '@mastra/core/processors'
117
+
118
+ await agent.stream(input, {
119
+ requestContext: ResponseCache.context({
120
+ // Cache only on the model id and the resolved prompt tail — ignore
121
+ // step number, scope, etc.
122
+ key: ({ model, prompt }) => `qa:${model.modelId}:${JSON.stringify(prompt).slice(-200)}`,
123
+ }),
124
+ })
125
+
126
+ // Or reuse the deterministic helper while overriding individual fields:
127
+ await agent.stream(input, {
128
+ requestContext: ResponseCache.context({
129
+ key: inputs => buildResponseCacheKey({ ...inputs, scope: 'global' }),
130
+ }),
131
+ })
132
+ ```
133
+
134
+ If the function throws, the processor falls back to the default key derivation so the call still benefits from caching.
135
+
136
+ ## How cache hits work
137
+
138
+ When the processor finds a cache hit, it short-circuits the LLM call by returning the cached chunks from `processLLMRequest`. The agentic loop synthesizes a stream from those chunks instead of calling the model. `agent.generate()` collects them into a `FullOutput`; `agent.stream()` returns a `MastraModelOutput` whose chunks come from the cached buffer, so consumers iterating `fullStream` or awaiting `text`, `usage`, and `finishReason` see the cached values.
139
+
140
+ Cache writes happen after the response completes. Failed runs (errors, tripwire activations) are not cached, so the next call retries cleanly.
141
+
142
+ ## Related
143
+
144
+ - [`ResponseCache` reference](https://mastra.ai/reference/processors/response-cache)
145
+ - [Processors](https://mastra.ai/docs/agents/processors)
146
+ - [Guardrails](https://mastra.ai/docs/agents/guardrails)
147
+ - [Agent.stream()](https://mastra.ai/reference/streaming/agents/stream)
148
+ - [Agent.generate()](https://mastra.ai/reference/agents/generate)
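Reviewer note: the new page describes a deterministic cache key, a TTL, and a `bust` flag that skips the read but still writes. The sketch below shows one way those three ideas fit together. It is a simplified model under stated assumptions, not the `ResponseCache` processor: `KeyInputs`, `stableStringify`, `deriveKey`, and `SketchResponseCache` are hypothetical names, the SHA-256 hash is an assumed choice, the TTL is taken to be in seconds (as the `ttl: 600` "10 minutes" comment suggests), and the compute step is synchronous only to keep the example short.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical shape: the inputs the page says feed the default key.
interface KeyInputs {
  agentId: string
  stepNumber: number
  scope: string | null
  model: { provider: string; modelId: string }
  prompt: unknown
}

// Serialize with sorted object keys so equal inputs always produce the same string.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(v => stableStringify(v)).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>
    const parts = Object.keys(obj)
      .sort()
      .map(k => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
    return `{${parts.join(',')}}`
  }
  return JSON.stringify(value === undefined ? null : value)
}

// Assumed hashing scheme: SHA-256 over the stable serialization.
function deriveKey(inputs: KeyInputs): string {
  return createHash('sha256').update(stableStringify(inputs)).digest('hex')
}

class SketchResponseCache {
  private entries = new Map<string, { value: string; expiresAt: number }>()
  private ttlMs: number
  private now: () => number

  constructor(ttlSeconds: number, now: () => number = Date.now) {
    this.ttlMs = ttlSeconds * 1000
    this.now = now
  }

  // `bust` skips the cache read but still writes the fresh value.
  getOrCompute(inputs: KeyInputs, compute: () => string, bust = false): string {
    const key = deriveKey(inputs)
    const entry = this.entries.get(key)
    if (!bust && entry && entry.expiresAt > this.now()) return entry.value
    const value = compute() // if this throws, nothing is written
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs })
    return value
  }
}
```

Because every field of `KeyInputs` feeds the hash, changing the step number, scope, model, or prompt yields a different entry, which matches the invalidation behaviour the page describes.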
@@ -224,6 +224,14 @@ export const weatherTool = createTool({
224
224
  })
225
225
  ```
226
226
 
227
+ ## Transform tool payloads for UI and transcripts
228
+
229
+ Use `transform` when a tool returns raw data your application needs, but browser-facing streams or user-visible transcript messages should receive a smaller or safer shape. `transform` is separate from `toModelOutput`: `toModelOutput` shapes the payload sent back to the model, while `transform` shapes tool input, output, errors, approval payloads, and suspension payloads for `display` and `transcript` targets.
230
+
231
+ If a transform is configured and it fails, Mastra does not fall back to the raw payload for display or transcript targets. Input deltas are suppressed when no safe `inputDelta` transform is available.
232
+
233
+ See the [`createTool()` reference](https://mastra.ai/reference/tools/create-tool) for a `transform` example. For shared rules across several tools, configure the agent-level `transform` policy in the [`Agent` constructor](https://mastra.ai/reference/agents/agent).
234
+
227
235
  ## Control tool selection
228
236
 
229
237
  Pass `toolChoice` or `activeTools` to `.generate()` or `.stream()` to control which tools the agent uses at runtime.
@@ -77,6 +77,48 @@ The observer also sees these markers when it processes the thread, so the observ
77
77
 
78
78
  See [the API reference](https://mastra.ai/reference/memory/observational-memory) for the full configuration shape.
79
79
 
80
+ ## Early activation
81
+
82
+ OM can activate buffered observations before the token threshold is reached. This is useful when a prompt cache is likely to expire, or when the agent changes model providers.
83
+
84
+ Top-level early activation settings apply to observations by default:
85
+
86
+ ```typescript
87
+ const memory = new Memory({
88
+ options: {
89
+ observationalMemory: {
90
+ model: 'google/gemini-2.5-flash',
91
+ activateAfterIdle: '5m',
92
+ activateOnProviderChange: true,
93
+ },
94
+ },
95
+ })
96
+ ```
97
+
98
+ Use nested `observation` and `reflection` settings for per-phase control. Reflection early activation is opt-in, so top-level settings affect only observations.
99
+
100
+ ```typescript
101
+ const memory = new Memory({
102
+ options: {
103
+ observationalMemory: {
104
+ model: 'google/gemini-2.5-flash',
105
+ activateAfterIdle: '5m',
106
+ observation: {
107
+ activateAfterIdle: false,
108
+ },
109
+ reflection: {
110
+ activateAfterIdle: '10m',
111
+ activateOnProviderChange: true,
112
+ },
113
+ },
114
+ },
115
+ })
116
+ ```
117
+
118
+ In this example, the top-level idle setting is disabled for observations, while reflections opt into idle and provider-change activation.
119
+
120
+ See [the API reference](https://mastra.ai/reference/memory/observational-memory) for the full configuration shape.
121
+
80
122
  ## Benefits
81
123
 
82
124
  - **Prompt caching**: OM's context is stable and observations append over time rather than being dynamically retrieved each turn. This keeps the prompt prefix cacheable, which reduces costs.
@@ -368,17 +410,19 @@ Reflection works similarly — the Reflector runs in the background when observa
368
410
 
369
411
  ### Settings
370
412
 
371
- | Setting | Default | What it controls |
372
- | ------------------------------ | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
373
- | `observation.bufferTokens` | `0.2` | How often to buffer. `0.2` means every 20% of `messageTokens` — with the default 30k threshold, that's roughly every 6k tokens. Can also be an absolute token count (e.g. `5000`). |
374
- | `observation.bufferActivation` | `0.8` | How aggressively to clear the message window on activation. `0.8` means remove enough messages to keep only 20% of `messageTokens` remaining. Lower values keep more message history. |
375
- | `observation.blockAfter` | `1.2` | Safety threshold as a multiplier of `messageTokens`. At `1.2`, synchronous observation is forced at 36k tokens (1.2 × 30k). Only matters if buffering can't keep up. |
376
- | `activateAfterIdle` | none | Forces buffered observations and buffered reflections to activate after a period of inactivity, even if their token thresholds have not been reached yet. Accepts milliseconds or duration strings like `300_000`, `"5m"`, or `"1hr"`. Set this to your prompt cache TTL if you want activation to happen before the next cold prompt. |
377
- | `activateOnProviderChange` | `false` | Forces buffered observations and reflections to activate when the next step uses a different `provider/model` than the one that produced the latest assistant step. Use this when switching providers or models would invalidate prompt cache reuse. |
378
- | `reflection.bufferActivation` | `0.5` | When to start background reflection. `0.5` means reflection begins when observations reach 50% of the `observationTokens` threshold. |
379
- | `reflection.blockAfter` | `1.2` | Safety threshold for reflection, same logic as observation. |
413
+ | Setting | Default | What it controls |
414
+ | ------------------------------------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
415
+ | `observation.bufferTokens` | `0.2` | How often to buffer. `0.2` means every 20% of `messageTokens` — with the default 30k threshold, that's roughly every 6k tokens. Can also be an absolute token count (e.g. `5000`). |
416
+ | `observation.bufferActivation` | `0.8` | How aggressively to clear the message window on activation. `0.8` means remove enough messages to keep only 20% of `messageTokens` remaining. Lower values keep more message history. |
417
+ | `observation.blockAfter` | `1.2` | Safety threshold as a multiplier of `messageTokens`. At `1.2`, synchronous observation is forced at 36k tokens (1.2 × 30k). Only matters if buffering can't keep up. |
418
+ | `activateAfterIdle` | none | Forces buffered observations to activate after a period of inactivity, even before `observation.messageTokens` is reached. Accepts a numeric millisecond value such as `300_000`, or duration strings like `"5m"` or `"1hr"`. Set this to your prompt cache TTL if you want activation to happen before the next cold prompt. |
419
+ | `activateOnProviderChange` | `false` | Forces buffered observations to activate when the next step uses a different `provider/model` than the one that produced the latest assistant step. Use this when switching providers or models would invalidate prompt cache reuse. |
420
+ | `reflection.bufferActivation` | `0.5` | When to start background reflection. `0.5` means reflection begins when observations reach 50% of the `observationTokens` threshold. |
421
+ | `reflection.activateAfterIdle` | none | Opts buffered reflections into idle activation. Reflections don't inherit top-level `activateAfterIdle`. |
422
+ | `reflection.activateOnProviderChange` | `false` | Opts buffered reflections into provider-change activation. Reflections don't inherit top-level `activateOnProviderChange`. |
423
+ | `reflection.blockAfter` | `1.2` | Safety threshold for reflection, same logic as observation. |
380
424
 
381
- If you're relying on prompt caching, set `activateAfterIdle` to match your cache TTL. That way, once a thread has been idle long enough for the cache to expire, the next request can activate buffered observations or reflections first and send a smaller compressed context window.
425
+ If you're relying on prompt caching, set `activateAfterIdle` to match your cache TTL. That way, once a thread has been idle long enough for the cache to expire, the next request can activate buffered observations first and send a smaller compressed context window.
382
426
 
383
427
  ```typescript
384
428
  const memory = new Memory({
@@ -392,9 +436,9 @@ const memory = new Memory({
392
436
  })
393
437
  ```
394
438
 
395
- With a 5-minute prompt cache TTL, this activates buffered context after 5 minutes of inactivity so the next uncached prompt uses observations and reflections instead of a larger raw message window. If you prefer, `300_000` works the same way.
439
+ With a 5-minute prompt cache TTL, this activates buffered observations after 5 minutes of inactivity so the next uncached prompt uses compressed observations instead of a larger raw message window. If you prefer, `300_000` works the same way.
396
440
 
397
- Changing model or providers mid-thread will invalidate the prompt cache. If your agent can switch between providers or models mid-thread, `activateOnProviderChange: true` forces buffered context to activate before the new provider runs. That avoids sending a large raw window to a provider that cannot reuse the previous prompt cache.
441
+ Changing model or providers mid-thread will invalidate the prompt cache. If your agent can switch between providers or models mid-thread, `activateOnProviderChange: true` forces buffered observations to activate before the new provider runs. That avoids sending a large raw window to a provider that can't reuse the previous prompt cache.
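Reviewer note: the inheritance rule introduced in these hunks (observations inherit the top-level early-activation settings, reflections opt in only through their nested settings, and `false` disables an inherited value) can be sketched as a small resolver. This is an illustration of the documented behaviour, not Mastra's code: `PhaseSettings`, `OmSettings`, `toMs`, `resolvePhase`, and `effectiveSettings` are hypothetical names, and the set of accepted duration units is an assumption beyond the `"5m"` and `"1hr"` examples in the table.

```typescript
type Idle = number | string | false

interface PhaseSettings {
  activateAfterIdle?: Idle
  activateOnProviderChange?: boolean
}

interface OmSettings extends PhaseSettings {
  observation?: PhaseSettings
  reflection?: PhaseSettings
}

interface Effective {
  idleMs: number | null
  onProviderChange: boolean
}

// Convert 300_000, "5m", or "1hr" to milliseconds. Unit spellings are assumed.
function toMs(value: number | string): number {
  if (typeof value === 'number') return value
  const match = /^(\d+)\s*(ms|s|m|h|hr)$/.exec(value.trim())
  if (!match) throw new Error(`Unrecognized duration: ${value}`)
  const unitMs: Record<string, number> = { ms: 1, s: 1_000, m: 60_000, h: 3_600_000, hr: 3_600_000 }
  return Number(match[1]) * unitMs[match[2]]
}

function resolvePhase(phase: PhaseSettings | undefined, inherited: PhaseSettings): Effective {
  // `??` keeps an explicit `false`, so a nested `false` disables the inherited value.
  const idle = phase?.activateAfterIdle ?? inherited.activateAfterIdle
  return {
    idleMs: idle === undefined || idle === false ? null : toMs(idle),
    onProviderChange: phase?.activateOnProviderChange ?? inherited.activateOnProviderChange ?? false,
  }
}

function effectiveSettings(om: OmSettings): { observation: Effective; reflection: Effective } {
  return {
    // Observations inherit top-level values unless the nested setting overrides them.
    observation: resolvePhase(om.observation, om),
    // Reflections are opt-in: only nested `reflection` values count.
    reflection: resolvePhase(om.reflection, {}),
  }
}

// The per-phase example from the hunk above.
const resolved = effectiveSettings({
  activateAfterIdle: '5m',
  observation: { activateAfterIdle: false },
  reflection: { activateAfterIdle: '10m', activateOnProviderChange: true },
})
```

With that input, `resolved.observation.idleMs` is `null` (idle activation disabled) and `resolved.reflection.idleMs` is `600000`.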
398
442
 
399
443
  ### Disabling
400
444
 
@@ -121,26 +121,88 @@ Each vector store page below includes installation instructions, configuration p
121
121
 
122
122
  ## Recall configuration
123
123
 
124
- The three main parameters that control semantic recall behavior are:
124
+ The following options control semantic recall behavior:
125
125
 
126
- 1. **topK**: How many semantically similar messages to retrieve
127
- 2. **messageRange**: How much surrounding context to include with each match
128
- 3. **scope**: Whether to search within the current thread or across all threads owned by a resource (the default is resource scope).
126
+ 1. **topK**: The number of similar messages to retrieve
127
+ 2. **messageRange**: The surrounding messages to include with each match
128
+ 3. **scope**: Whether to search the current thread or all threads for a resource
129
+ 4. **filter**: Metadata criteria that restrict search results
129
130
 
130
131
  ```typescript
131
132
  const agent = new Agent({
132
133
  memory: new Memory({
133
134
  options: {
134
135
  semanticRecall: {
135
- topK: 3, // Retrieve 3 most similar messages
136
+ topK: 3, // Retrieve 3 similar messages
136
137
  messageRange: 2, // Include 2 messages before and after each match
137
- scope: 'resource', // Search across all threads for this user (default setting if omitted)
138
+ scope: 'resource', // Search all threads for this resource
139
+ filter: { projectId: { $eq: 'project-a' } },
138
140
  },
139
141
  },
140
142
  }),
141
143
  })
142
144
  ```
143
145
 
146
+ > **Note:** `scope: 'resource'` is supported by the LibSQL, PostgreSQL, and Upstash storage adapters.
147
+
148
+ ### Metadata filtering
149
+
150
+ The `filter` option restricts semantic recall results to messages with matching thread metadata.
151
+
152
+ ```typescript
153
+ const agent = new Agent({
154
+ memory: new Memory({
155
+ options: {
156
+ semanticRecall: {
157
+ scope: 'resource',
158
+ filter: {
159
+ projectId: { $eq: 'project-a' },
160
+ category: { $in: ['work', 'personal'] },
161
+ },
162
+ },
163
+ },
164
+ }),
165
+ })
166
+ ```
167
+
168
+ Filters match metadata stored on message embeddings when messages are saved. If thread metadata changes later, existing embeddings keep their previous metadata until those messages are saved or indexed again.
169
+
170
+ Supported filter operators:
171
+
172
+ - `$and`: Logical AND
173
+ - `$eq`: Equal to
174
+ - `$gt`: Greater than
175
+ - `$gte`: Greater than or equal
176
+ - `$in`: In array
177
+ - `$lt`: Less than
178
+ - `$lte`: Less than or equal
179
+ - `$ne`: Not equal to
180
+ - `$nin`: Not in array
181
+ - `$or`: Logical OR
182
+
183
+ The following example demonstrates metadata filters for common use cases:
184
+
185
+ ```typescript
186
+ // Filter by project
187
+ const options = {
188
+ semanticRecall: { filter: { projectId: { $eq: 'my-project' } } },
189
+ }
190
+
191
+ // Filter by multiple categories
192
+ const options = {
193
+ semanticRecall: { filter: { category: { $in: ['work', 'research'] } } },
194
+ }
195
+
196
+ // Filter by project and priority
197
+ const options = {
198
+ semanticRecall: {
199
+ filter: {
200
+ $and: [{ projectId: { $eq: 'project-a' } }, { priority: { $gte: 3 } }],
201
+ },
202
+ },
203
+ }
204
+ ```
205
+
144
206
  ## Embedder configuration
145
207
 
146
208
  Semantic recall relies on an [embedding model](https://mastra.ai/reference/memory/memory-class) to convert messages into embeddings. Mastra supports embedding models through the model router using `provider/model` strings, or you can use any [embedding model](https://sdk.vercel.ai/docs/ai-sdk-core/embeddings) compatible with the AI SDK.
@@ -43,7 +43,7 @@ Phoenix is an open-source observability platform that can be self-hosted or used
43
43
 
44
44
  ```bash
45
45
  # Required
46
- PHOENIX_ENDPOINT=http://localhost:6006/v1/traces # Or your Phoenix Cloud URL
46
+ PHOENIX_COLLECTOR_ENDPOINT=http://localhost:6006/v1/traces # Or your Phoenix Cloud URL
47
47
 
48
48
  # Optional
49
49
  PHOENIX_API_KEY=your-api-key # For authenticated Phoenix instances
@@ -87,7 +87,7 @@ export const mastra = new Mastra({
87
87
  serviceName: process.env.PHOENIX_PROJECT_NAME || 'mastra-service',
88
88
  exporters: [
89
89
  new ArizeExporter({
90
- endpoint: process.env.PHOENIX_ENDPOINT!,
90
+ endpoint: process.env.PHOENIX_COLLECTOR_ENDPOINT!,
91
91
  apiKey: process.env.PHOENIX_API_KEY,
92
92
  projectName: process.env.PHOENIX_PROJECT_NAME,
93
93
  }),
@@ -106,7 +106,7 @@ export const mastra = new Mastra({
106
106
  > arizephoenix/phoenix:latest
107
107
  > ```
108
108
  >
109
- > Set `PHOENIX_ENDPOINT=http://localhost:6006/v1/traces` and run your Mastra agent to see traces at [localhost:6006](http://localhost:6006).
109
+ > Set `PHOENIX_COLLECTOR_ENDPOINT=http://localhost:6006/v1/traces` and run your Mastra agent to see traces at [localhost:6006](http://localhost:6006).
110
110
 
111
111
  ### Arize AX Setup
112
112
 
@@ -218,7 +218,7 @@ Control how traces are batched and exported:
218
218
 
219
219
  ```typescript
220
220
  new ArizeExporter({
221
- endpoint: process.env.PHOENIX_ENDPOINT!,
221
+ endpoint: process.env.PHOENIX_COLLECTOR_ENDPOINT!,
222
222
  apiKey: process.env.PHOENIX_API_KEY,
223
223
 
224
224
  // Batch processing configuration
@@ -233,7 +233,7 @@ Add custom attributes to all exported spans:
233
233
 
234
234
  ```typescript
235
235
  new ArizeExporter({
236
- endpoint: process.env.PHOENIX_ENDPOINT!,
236
+ endpoint: process.env.PHOENIX_COLLECTOR_ENDPOINT!,
237
237
  resourceAttributes: {
238
238
  'deployment.environment': process.env.NODE_ENV,
239
239
  'service.namespace': 'production',
@@ -174,7 +174,42 @@ The DefaultExporter includes robust error handling for production use:
174
174
  - **Persistent Failures**: Drop batch after 4 failed attempts
175
175
  - **Buffer Overflow**: Prevent memory issues during storage outages
176
176
 
177
- ### Configuration Examples
177
+ ## Dropped observability events
178
+
179
+ `DefaultExporter` emits structured drop events when it cannot persist observability data. Register an exporter or bridge with `onDroppedEvent` to forward these drops to alerting or monitoring.
180
+
181
+ There are two drop reasons:
182
+
183
+ - `unsupported-storage`: The storage provider does not implement the signal type.
184
+ - `retry-exhausted`: The exporter retried a batch up to `maxRetries` times and then dropped it.
185
+
186
+ The following example demonstrates forwarding drop details to a monitoring endpoint:
187
+
188
+ ```typescript
189
+ import { BaseExporter } from '@mastra/observability'
190
+ import type { ObservabilityDropEvent, TracingEvent } from '@mastra/core/observability'
191
+
192
+ class DropAlertExporter extends BaseExporter {
193
+ name = 'drop-alerts'
194
+
195
+ async onDroppedEvent(event: ObservabilityDropEvent) {
196
+ await fetch('https://monitoring.example.com/observability-drops', {
197
+ method: 'POST',
198
+ headers: { 'content-type': 'application/json' },
199
+ body: JSON.stringify({
200
+ count: event.count,
201
+ signal: event.signal,
202
+ reason: event.reason,
203
+ exporterName: event.exporterName,
204
+ }),
205
+ })
206
+ }
207
+
208
+ protected async _exportTracingEvent(_event: TracingEvent) {}
209
+ }
210
+ ```
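Reviewer note: the `retry-exhausted` reason added here implies a retry loop that ends in a drop event rather than an exception. The sketch below models that flow in isolation. It is not the `DefaultExporter` source: `flushWithRetries` and the local `DropEvent` type are hypothetical, the `'tracing'` signal and `'sketch-exporter'` name are placeholder values, and it assumes one initial attempt followed by up to `maxRetries` retries. A real exporter would be asynchronous and back off between attempts.

```typescript
type DropReason = 'unsupported-storage' | 'retry-exhausted'

// Local stand-in with the four fields the example above forwards.
interface DropEvent {
  count: number
  signal: string
  reason: DropReason
  exporterName: string
}

// Try to persist a batch; once retries run out, report a drop instead of throwing.
function flushWithRetries<T>(
  batch: T[],
  persist: (batch: T[]) => void,
  maxRetries: number,
  onDropped: (event: DropEvent) => void,
): boolean {
  // Assumption: attempt 0 is the initial try, attempts 1..maxRetries are retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      persist(batch)
      return true
    } catch {
      // Swallow the failure and try again until the budget is spent.
    }
  }
  onDropped({
    count: batch.length,
    signal: 'tracing', // placeholder signal name
    reason: 'retry-exhausted',
    exporterName: 'sketch-exporter', // placeholder exporter name
  })
  return false
}
```

Returning a boolean and emitting an event keeps the caller's hot path free of exceptions, which is the property that makes drops observable without interrupting the application.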
211
+
212
+ ## Configuration examples
178
213
 
179
214
  ```typescript
180
215
  // Zero config - recommended for most users
@@ -292,7 +292,7 @@ export const mastra = new Mastra({
292
292
  serviceName: 'my-service',
293
293
  exporters: [
294
294
  new ArizeExporter({
295
- endpoint: process.env.PHOENIX_ENDPOINT,
295
+ endpoint: process.env.PHOENIX_COLLECTOR_ENDPOINT,
296
296
  apiKey: process.env.PHOENIX_API_KEY,
297
297
  }),
298
298
  new DefaultExporter(), // Keep Studio access
@@ -188,6 +188,32 @@ The `suspended` array contains the IDs of any suspended workflows and steps from
188
188
  ['nested-workflow', 'step-1']
189
189
  ```
190
190
 
191
+ ## Recovering suspended runs
192
+
193
+ Use `workflow.getWorkflowRunById()` with `createWorkflowStateReader()` when your application needs to recover a suspended run from storage. The reader exposes suspended steps, resume labels, step payloads, and step outputs without reading the raw snapshot shape.
194
+
195
+ ```typescript
196
+ import { createWorkflowStateReader } from '@mastra/core/workflows'
197
+
198
+ const workflow = mastra.getWorkflow('testWorkflow')
199
+ const state = await workflow.getWorkflowRunById('run-123')
200
+
201
+ if (state?.status === 'suspended') {
202
+ const reader = createWorkflowStateReader(state)
203
+ const suspendedStep = reader.getSuspendedStep()
204
+ const approvalLabel = reader.getResumeLabel('approve')
205
+ const run = await workflow.createRun({ runId: state.runId })
206
+
207
+ await run.resume({
208
+ step: approvalLabel?.stepId ?? suspendedStep?.path,
209
+ resumeData: { approved: true },
210
+ forEachIndex: approvalLabel?.foreachIndex,
211
+ })
212
+ }
213
+ ```
214
+
215
+ For nested workflows, `suspendedStep.path` contains the resume path. For `foreach` suspensions, matching resume labels include `foreachIndex` when the label points to a specific iteration.
216
+
191
217
  ## Sleep
192
218
 
193
219
  Sleep methods can be used to pause execution at the workflow level, which sets the status to `waiting`. By comparison, `suspend()` pauses execution within a specific step and sets the status to `suspended`.
@@ -202,4 +228,5 @@ Sleep methods can be used to pause execution at the workflow level, which sets t
  - [Control Flow](https://mastra.ai/docs/workflows/control-flow)
  - [Human-in-the-loop](https://mastra.ai/docs/workflows/human-in-the-loop)
  - [Snapshots](https://mastra.ai/docs/workflows/snapshots)
- - [Time Travel](https://mastra.ai/docs/workflows/time-travel)
+ - [Time Travel](https://mastra.ai/docs/workflows/time-travel)
+ - [Workflow state reader](https://mastra.ai/reference/workflows/workflow-state-reader)
@@ -1,6 +1,6 @@
  # ![OpenRouter logo](https://models.dev/logos/openrouter.svg)OpenRouter

- OpenRouter aggregates models from multiple providers with enhanced features like rate limiting and failover. Access 188 models through Mastra's model router.
+ OpenRouter aggregates models from multiple providers with enhanced features like rate limiting and failover. Access 189 models through Mastra's model router.

  Learn more in the [OpenRouter documentation](https://openrouter.ai/models).

@@ -195,6 +195,7 @@ ANTHROPIC_API_KEY=ant-...
  | `sourceful/riverflow-v2-max-preview` |
  | `sourceful/riverflow-v2-standard-preview` |
  | `stepfun/step-3.5-flash` |
+ | `tencent/hy3-preview` |
  | `x-ai/grok-3` |
  | `x-ai/grok-3-beta` |
  | `x-ai/grok-3-mini` |
@@ -1,6 +1,6 @@
  # Model Providers

- Mastra provides a unified interface for working with LLMs across multiple providers, giving you access to 3887 models from 108 providers through a single API.
+ Mastra provides a unified interface for working with LLMs across multiple providers, giving you access to 3897 models from 108 providers through a single API.

  ## Features

@@ -1,6 +1,6 @@
  # ![LLM Gateway logo](https://models.dev/logos/llmgateway.svg)LLM Gateway

- Access 189 LLM Gateway models through Mastra's model router. Authentication is handled automatically using the `LLMGATEWAY_API_KEY` environment variable.
+ Access 195 LLM Gateway models through Mastra's model router. Authentication is handled automatically using the `LLMGATEWAY_API_KEY` environment variable.

  Learn more in the [LLM Gateway documentation](https://llmgateway.io/docs).

@@ -66,6 +66,7 @@ for await (const chunk of stream) {
  | `llmgateway/gemini-2.5-flash-lite-preview-09-2025` | 1.0M | | | | | | $0.10 | $0.40 |
  | `llmgateway/gemini-2.5-pro` | 1.0M | | | | | | $1 | $10 |
  | `llmgateway/gemini-3-flash-preview` | 1.0M | | | | | | $0.50 | $3 |
+ | `llmgateway/gemini-3.1-flash-lite` | 1.0M | | | | | | $0.25 | $2 |
  | `llmgateway/gemini-3.1-flash-lite-preview` | 1.0M | | | | | | $0.25 | $2 |
  | `llmgateway/gemini-3.1-pro-preview` | 1.0M | | | | | | $2 | $12 |
  | `llmgateway/gemini-pro-latest` | 1.0M | | | | | | $2 | $12 |
@@ -132,6 +133,7 @@ for await (const chunk of stream) {
  | `llmgateway/grok-4-1-fast-reasoning` | 2.0M | | | | | | $0.20 | $0.50 |
  | `llmgateway/grok-4-20-beta-0309-non-reasoning` | 2.0M | | | | | | $2 | $6 |
  | `llmgateway/grok-4-20-beta-0309-reasoning` | 2.0M | | | | | | $2 | $6 |
+ | `llmgateway/grok-4-3` | 1.0M | | | | | | $1 | $3 |
  | `llmgateway/grok-4-fast` | 2.0M | | | | | | $0.20 | $0.50 |
  | `llmgateway/grok-4-fast-non-reasoning` | 2.0M | | | | | | $0.20 | $0.50 |
  | `llmgateway/grok-4-fast-reasoning` | 2.0M | | | | | | $0.20 | $0.50 |
@@ -154,6 +156,10 @@ for await (const chunk of stream) {
  | `llmgateway/llama-4-scout` | 33K | | | | | | $0.18 | $0.59 |
  | `llmgateway/llama-4-scout-17b-instruct` | 8K | | | | | | $0.17 | $0.66 |
  | `llmgateway/mimo-v2-flash` | 262K | | | | | | $0.10 | $0.30 |
+ | `llmgateway/mimo-v2-omni` | 262K | | | | | | $0.40 | $2 |
+ | `llmgateway/mimo-v2-pro` | 1.0M | | | | | | $1 | $3 |
+ | `llmgateway/mimo-v2.5` | 1.0M | | | | | | $0.40 | $2 |
+ | `llmgateway/mimo-v2.5-pro` | 1.0M | | | | | | $1 | $3 |
  | `llmgateway/minimax-m2` | 197K | | | | | | $0.30 | $1 |
  | `llmgateway/minimax-m2.1` | 205K | | | | | | $0.30 | $1 |
  | `llmgateway/minimax-m2.1-lightning` | 197K | | | | | | $0.12 | $0.48 |
@@ -1,6 +1,6 @@
  # ![Nebius Token Factory logo](https://models.dev/logos/nebius.svg)Nebius Token Factory

- Access 30 Nebius Token Factory models through Mastra's model router. Authentication is handled automatically using the `NEBIUS_API_KEY` environment variable.
+ Access 31 Nebius Token Factory models through Mastra's model router. Authentication is handled automatically using the `NEBIUS_API_KEY` environment variable.

  Learn more in the [Nebius Token Factory documentation](https://docs.tokenfactory.nebius.com/).

@@ -36,6 +36,7 @@ for await (const chunk of stream) {
  | ------------------------------------------------ | ------- | ----- | --------- | ----- | ----- | ----- | ---------- | ----------- |
  | `nebius/deepseek-ai/DeepSeek-V3.2` | 163K | | | | | | $0.30 | $0.45 |
  | `nebius/deepseek-ai/DeepSeek-V3.2-fast` | 8K | | | | | | $0.40 | $2 |
+ | `nebius/deepseek-ai/DeepSeek-V4-Pro` | 1.0M | | | | | | $2 | $4 |
  | `nebius/google/gemma-2-2b-it` | 8K | | | | | | $0.02 | $0.06 |
  | `nebius/google/gemma-3-27b-it` | 110K | | | | | | $0.10 | $0.30 |
  | `nebius/meta-llama/Llama-3.3-70B-Instruct` | 128K | | | | | | $0.13 | $0.40 |