@mastra/mcp-docs-server 1.1.39-alpha.8 → 1.1.39
- package/.docs/docs/agents/acp.md +238 -0
- package/.docs/docs/agents/agent-approval.md +2 -0
- package/.docs/docs/agents/background-tasks.md +9 -6
- package/.docs/docs/agents/response-caching.md +2 -0
- package/.docs/docs/agents/signals.md +29 -3
- package/.docs/docs/evals/evals-with-memory.md +146 -0
- package/.docs/docs/evals/running-in-ci.md +1 -0
- package/.docs/docs/memory/multi-user-threads.md +206 -0
- package/.docs/docs/memory/observational-memory.md +53 -17
- package/.docs/docs/memory/overview.md +1 -0
- package/.docs/docs/memory/working-memory.md +1 -1
- package/.docs/models/gateways/netlify.md +2 -1
- package/.docs/models/gateways/openrouter.md +2 -1
- package/.docs/models/index.md +1 -1
- package/.docs/models/providers/deepinfra.md +2 -2
- package/.docs/models/providers/fireworks-ai.md +23 -22
- package/.docs/models/providers/google.md +29 -46
- package/.docs/models/providers/llmgateway.md +186 -191
- package/.docs/models/providers/opencode.md +1 -1
- package/.docs/models/providers/orcarouter.md +2 -2
- package/.docs/models/providers/poe.md +2 -1
- package/.docs/models/providers/routing-run.md +27 -40
- package/.docs/models/providers/the-grid-ai.md +15 -9
- package/.docs/models/providers/xai.md +2 -1
- package/.docs/reference/agents/agent.md +13 -5
- package/.docs/reference/agents/channels.md +4 -2
- package/.docs/reference/client-js/agents.md +1 -1
- package/.docs/reference/configuration.md +1 -1
- package/.docs/reference/memory/observational-memory.md +5 -3
- package/.docs/reference/server/register-api-route.md +1 -1
- package/.docs/reference/storage/convex.md +74 -12
- package/.docs/reference/tools/mcp-client.md +27 -2
- package/.docs/reference/vectors/convex.md +129 -7
- package/CHANGELOG.md +66 -0
- package/package.json +6 -6
|
@@ -0,0 +1,238 @@
|
|
|
1
|
+
# ACP (Agent Client Protocol)
|
|
2
|
+
|
|
3
|
+
Mastra supports the [Agent Client Protocol (ACP)](https://agentclientprotocol.com/overview/introduction) for running ACP-compatible coding agents from a Mastra agent. Use `@mastra/acp` to wrap a coding agent process as a Mastra tool or as a subagent.
|
|
4
|
+
|
|
5
|
+
ACP is useful for coding agents such as Claude Code, Amp, Codex, or any other executable that implements the Agent Client Protocol over standard input and output.
|
|
6
|
+
|
|
7
|
+
## When to use ACP
|
|
8
|
+
|
|
9
|
+
- A Mastra agent should delegate code inspection, editing, or repository tasks to an external coding agent.
|
|
10
|
+
- An ACP-compatible agent process should stay alive across calls so it can keep session context.
|
|
11
|
+
- A parent agent needs real-time output from a coding agent while the task runs.
|
|
12
|
+
- An ACP-compatible agent needs permission prompts before it reads files, writes files, or runs actions.
|
|
13
|
+
- File access should go through Mastra's workspace abstraction instead of direct process-only file access.
|
|
14
|
+
|
|
15
|
+
## How ACP works
|
|
16
|
+
|
|
17
|
+
`@mastra/acp` starts the configured ACP agent command as a child process and communicates with it using newline-delimited JSON over standard input and output.
|
|
18
|
+
|
|
19
|
+
The flow is:
|
|
20
|
+
|
|
21
|
+
1. Configure `command`, `args`, and optional connection settings.
|
|
22
|
+
2. `@mastra/acp` spawns the ACP agent process on first use.
|
|
23
|
+
3. The client sends ACP `initialize` and `session/new` requests.
|
|
24
|
+
4. Mastra sends the user task to the ACP agent with `session/prompt`.
|
|
25
|
+
5. The ACP agent streams session updates and message chunks back to Mastra.
|
|
26
|
+
6. Mastra returns the buffered output, emits streaming chunks, or suspends for permission input.
|
|
27
|
+
7. The ACP process stays alive by default, or stops after the prompt when `persistSession` is `false`.
|
|
28
|
+
|
|
29
|
+
During execution, the ACP client also handles permission requests and file operations. File reads and writes go through Mastra's `Workspace`, so the ACP agent operates inside the workspace you provide.
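As a rough illustration of the transport described above, the sketch below frames messages as newline-delimited JSON and reassembles them from arbitrary chunk boundaries. `encodeMessage` and `NdjsonDecoder` are hypothetical names, not part of `@mastra/acp`, and the JSON-RPC 2.0 envelope is an assumption based on the ACP specification; only the method names come from the flow above.

```typescript
// Hypothetical helpers for illustration; not the @mastra/acp implementation.
// Assumption: ACP messages use a JSON-RPC 2.0 envelope.
type JsonRpcRequest = {
  jsonrpc: '2.0'
  id: number
  method: string
  params?: unknown
}

// One message per line: serialize, then terminate with "\n".
function encodeMessage(message: JsonRpcRequest): string {
  return JSON.stringify(message) + '\n'
}

// Buffers partial chunks and returns every complete message seen so far.
class NdjsonDecoder {
  private buffer = ''

  push(chunk: string): unknown[] {
    this.buffer += chunk
    const lines = this.buffer.split('\n')
    // The last element is an incomplete line (or '' when the chunk ended on "\n").
    this.buffer = lines.pop() ?? ''
    return lines.filter(line => line.trim() !== '').map(line => JSON.parse(line))
  }
}

const wire =
  encodeMessage({ jsonrpc: '2.0', id: 1, method: 'initialize' }) +
  encodeMessage({ jsonrpc: '2.0', id: 2, method: 'session/new', params: { cwd: '/repo' } })

const decoder = new NdjsonDecoder()
// Split mid-message to mimic a stream that delivers arbitrary chunk boundaries.
const firstBatch = decoder.push(wire.slice(0, 20))
const secondBatch = decoder.push(wire.slice(20))
```

Buffering until a newline arrives matters because a child process's standard output can deliver a message in several pieces, or several messages in one piece.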
|
|
30
|
+
|
|
31
|
+
## Getting started
|
|
32
|
+
|
|
33
|
+
Install `@mastra/acp` in a project that already uses `@mastra/core`:
|
|
34
|
+
|
|
35
|
+
**npm**:

```bash
npm install @mastra/acp
```

**pnpm**:

```bash
pnpm add @mastra/acp
```

**Yarn**:

```bash
yarn add @mastra/acp
```

**Bun**:

```bash
bun add @mastra/acp
```
|
|
58
|
+
|
|
59
|
+
`@mastra/acp` exports two APIs:
|
|
60
|
+
|
|
61
|
+
- `createACPTool`: Create a Mastra tool that sends a `task` string to an ACP agent and returns an `output` string.
|
|
62
|
+
- `AcpAgent`: Wrap an ACP agent as a Mastra subagent with `generate()` and `stream()` support.
|
|
63
|
+
|
|
64
|
+
The package requires `@mastra/core` version `1.34.0` or later.
|
|
65
|
+
|
|
66
|
+
## Use ACP as a subagent
|
|
67
|
+
|
|
68
|
+
Use `AcpAgent` when a parent Mastra agent should delegate directly to an ACP-compatible coding agent as a subagent. Create the ACP agent, then register it in the parent agent's `agents` map.
|
|
69
|
+
|
|
70
|
+
```typescript
import { AcpAgent } from '@mastra/acp'
import { Agent } from '@mastra/core/agent'

const codeAgent = new AcpAgent({
  id: 'code-agent',
  name: 'Code Agent',
  description: 'An ACP-compatible coding agent that can inspect and edit files',
  command: 'acp-agent',
  args: ['--stdio'],
  cwd: process.cwd(),
})

export const codeSupervisor = new Agent({
  id: 'code-supervisor',
  name: 'Code Supervisor',
  instructions: 'Delegate code editing tasks to the code-agent subagent.',
  model: 'openai/gpt-5.4',
  agents: {
    codeAgent,
  },
})
```
|
|
93
|
+
|
|
94
|
+
`AcpAgent.generate()` buffers the ACP response and returns it as text. `AcpAgent.stream()` emits Mastra `text-delta` chunks as ACP `agent_message_chunk` updates arrive.
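To make the difference between the two modes concrete, here is a minimal sketch over an in-memory list of updates. The `AcpUpdate` and `TextDeltaChunk` shapes are simplified assumptions for illustration (the real ACP and Mastra types carry more fields), and `streamText` and `generateText` are hypothetical helpers, not `AcpAgent` methods.

```typescript
// Simplified, assumed shapes; the real ACP and Mastra types differ in detail.
type AcpUpdate =
  | { sessionUpdate: 'agent_message_chunk'; text: string }
  | { sessionUpdate: 'tool_call'; title: string }

type TextDeltaChunk = { type: 'text-delta'; text: string }

// Streaming view: hand each message chunk to the caller as soon as it is seen.
function streamText(updates: AcpUpdate[], onChunk: (chunk: TextDeltaChunk) => void): void {
  for (const update of updates) {
    if (update.sessionUpdate === 'agent_message_chunk') {
      onChunk({ type: 'text-delta', text: update.text })
    }
  }
}

// Buffered view: collect every delta and return the joined text at the end.
function generateText(updates: AcpUpdate[]): string {
  let text = ''
  streamText(updates, chunk => {
    text += chunk.text
  })
  return text
}

const sampleUpdates: AcpUpdate[] = [
  { sessionUpdate: 'agent_message_chunk', text: 'Renamed ' },
  { sessionUpdate: 'tool_call', title: 'edit file' },
  { sessionUpdate: 'agent_message_chunk', text: 'the helper.' },
]

const seenDeltas: string[] = []
streamText(sampleUpdates, chunk => seenDeltas.push(chunk.text))
const bufferedText = generateText(sampleUpdates)
```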
|
|
95
|
+
|
|
96
|
+
## Use ACP as a tool
|
|
97
|
+
|
|
98
|
+
Use `createACPTool` when the parent Mastra agent should decide when to call the ACP agent as a tool. The following example creates a code editing tool and registers it on a parent agent:
|
|
99
|
+
|
|
100
|
+
```typescript
import { createACPTool } from '@mastra/acp'
import { Agent } from '@mastra/core/agent'

const codeAgentTool = createACPTool({
  id: 'code-agent',
  description: 'Use an ACP-compatible coding agent to inspect and edit code',
  command: 'acp-agent',
  args: ['--stdio'],
  cwd: process.cwd(),
})

export const codeSupervisor = new Agent({
  id: 'code-supervisor',
  name: 'Code Supervisor',
  instructions: 'Use the code-agent tool when a task requires repository inspection or code edits.',
  model: 'openai/gpt-5.4',
  tools: {
    codeAgentTool,
  },
})
```
|
|
122
|
+
|
|
123
|
+
Use the `command` and `args` required by the ACP-compatible agent you run. The tool input schema has a single `task` string, and the output schema returns the final ACP response as `output`.
|
|
124
|
+
|
|
125
|
+
If the ACP agent requests permission, the tool can suspend and resume through Mastra's tool suspension flow. Use `onPermissionRequest` when you need custom permission behavior.
|
|
126
|
+
|
|
127
|
+
## Options reference
|
|
128
|
+
|
|
129
|
+
`createACPTool` and `AcpAgent` accept the same ACP connection options. `AcpAgent` also accepts `name` to set the display name used during agent delegation.
|
|
130
|
+
|
|
131
|
+
| Option                | Type                             | Description                                                                             |
| --------------------- | -------------------------------- | --------------------------------------------------------------------------------------- |
| `id`                  | `string`                         | Unique tool or subagent identifier.                                                     |
| `description`         | `string`                         | Description shown to the model when it can call the tool or delegate to the subagent.   |
| `command`             | `string`                         | ACP agent executable to spawn.                                                          |
| `args`                | `string[]`                       | Arguments passed to the ACP agent executable.                                           |
| `env`                 | `Record<string, string>`         | Environment variables to merge with the current process environment.                    |
| `cwd`                 | `string`                         | Working directory for the ACP process, ACP session, and default workspace.              |
| `session`             | `Partial<NewSessionRequest>`     | ACP session creation options. Defaults to `cwd` or `process.cwd()` and no MCP servers.  |
| `initialize`          | `Partial<InitializeRequest>`     | ACP initialization options. Defaults to Mastra client information and protocol version. |
| `authMethodId`        | `string`                         | ACP authentication method ID to invoke after initialization.                            |
| `persistSession`      | `boolean`                        | Keep the ACP process alive after execution. Defaults to `true`.                         |
| `onPermissionRequest` | `(request) => Promise<Response>` | Callback for ACP permission requests. Defaults to selecting the first option.           |
| `workspace`           | `Workspace`                      | Workspace used for ACP file reads and writes.                                           |
|
|
145
|
+
|
|
146
|
+
## Session lifecycle
|
|
147
|
+
|
|
148
|
+
`createACPTool` and `AcpAgent` start the configured command on first use and create an ACP session. By default, `persistSession` is `true`, so the child process stays alive across calls.
|
|
149
|
+
|
|
150
|
+
Use the default persistent session when:
|
|
151
|
+
|
|
152
|
+
- The ACP agent benefits from keeping conversation or repository context.
|
|
153
|
+
- Startup is expensive and repeated calls should reuse the same process.
|
|
154
|
+
- A parent agent may delegate several related tasks to the same coding agent.
|
|
155
|
+
|
|
156
|
+
Set `persistSession: false` when each prompt should run in an isolated process:
|
|
157
|
+
|
|
158
|
+
```typescript
import { AcpAgent } from '@mastra/acp'

export const codeAgent = new AcpAgent({
  id: 'code-agent',
  description: 'Run one isolated ACP coding task',
  command: 'acp-agent',
  args: ['--stdio'],
  cwd: process.cwd(),
  persistSession: false,
})
```
|
|
170
|
+
|
|
171
|
+
With `persistSession: false`, `@mastra/acp` stops the ACP process after each prompt completes.
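The lifecycle can be modeled with a small in-memory sketch. `LazySession` is a hypothetical stand-in that only counts starts and stops; the real client spawns and stops a child process, and nothing here reflects the actual `@mastra/acp` internals.

```typescript
// Hypothetical model of the lifecycle; counters stand in for a real child process.
type Session = { id: number }

class LazySession {
  private current: Session | undefined
  private readonly persistSession: boolean
  spawnCount = 0
  stopCount = 0

  constructor(persistSession = true) {
    this.persistSession = persistSession
  }

  // Runs one prompt, starting the session on first use.
  prompt(task: string): string {
    if (!this.current) {
      this.spawnCount += 1
      this.current = { id: this.spawnCount }
    }
    const output = `session ${this.current.id} handled: ${task}`
    if (!this.persistSession) {
      // Isolated mode: tear down after each prompt so the next call starts fresh.
      this.stopCount += 1
      this.current = undefined
    }
    return output
  }
}

const persistent = new LazySession()
persistent.prompt('inspect the repo')
const persistentReply = persistent.prompt('apply the fix')

const isolated = new LazySession(false)
isolated.prompt('inspect the repo')
const isolatedReply = isolated.prompt('apply the fix')
```

In this model the persistent instance answers both prompts from session 1, while the isolated instance starts a second session for the second prompt and therefore carries no context over.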
|
|
172
|
+
|
|
173
|
+
## Permission handling
|
|
174
|
+
|
|
175
|
+
ACP agents may ask the client to choose a permission option before they continue. By default, `@mastra/acp` selects the first option returned by the ACP agent.
|
|
176
|
+
|
|
177
|
+
Pass `onPermissionRequest` to inspect the request and return the selected option yourself:
|
|
178
|
+
|
|
179
|
+
```typescript
import { createACPTool } from '@mastra/acp'

export const codeAgentTool = createACPTool({
  id: 'code-agent',
  description: 'Use an ACP-compatible coding agent',
  command: 'acp-agent',
  args: ['--stdio'],
  async onPermissionRequest(request) {
    const allowOption = request.options.find(option => option.name === 'Allow')

    if (!allowOption) {
      return { outcome: { outcome: 'cancelled' } }
    }

    return {
      outcome: {
        outcome: 'selected',
        optionId: allowOption.optionId,
      },
    }
  },
})
```
|
|
203
|
+
|
|
204
|
+
Use this callback to enforce local policy, inspect the permission title, or route the decision to your own approval flow.
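One possible local policy is sketched below as a pure function that could be called from `onPermissionRequest`. Matching on a `kind` field instead of the display `name` is an assumption modeled on the ACP schema (check the `PermissionOption` type in your SDK version), and `choosePermission` is a hypothetical helper; the returned object follows the response shape used in the example above.

```typescript
// Assumed shapes, modeled loosely on the ACP schema; verify against your SDK version.
type PermissionOption = {
  optionId: string
  name: string
  kind: 'allow_once' | 'allow_always' | 'reject_once' | 'reject_always'
}

type PermissionOutcome =
  | { outcome: 'selected'; optionId: string }
  | { outcome: 'cancelled' }

// Policy: accept a one-time allow, and never grant a standing permission automatically.
function choosePermission(options: PermissionOption[]): { outcome: PermissionOutcome } {
  const allowOnce = options.find(option => option.kind === 'allow_once')
  if (!allowOnce) {
    return { outcome: { outcome: 'cancelled' } }
  }
  return { outcome: { outcome: 'selected', optionId: allowOnce.optionId } }
}

const granted = choosePermission([
  { optionId: 'always', name: 'Always allow', kind: 'allow_always' },
  { optionId: 'once', name: 'Allow', kind: 'allow_once' },
])

const refused = choosePermission([
  { optionId: 'always', name: 'Always allow', kind: 'allow_always' },
  { optionId: 'no', name: 'Reject', kind: 'reject_once' },
])
```

Keeping the decision in a pure function makes the policy easy to unit test without starting an ACP process.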
|
|
205
|
+
|
|
206
|
+
## Workspace integration
|
|
207
|
+
|
|
208
|
+
ACP file operations go through Mastra's workspace abstraction. If you don't pass `workspace`, `@mastra/acp` creates a `Workspace` backed by `LocalFilesystem` and uses `cwd` as the filesystem root.
|
|
209
|
+
|
|
210
|
+
Pass a custom `Workspace` when the ACP agent should read and write through a specific filesystem implementation:
|
|
211
|
+
|
|
212
|
+
```typescript
import { AcpAgent } from '@mastra/acp'
import { LocalFilesystem, Workspace } from '@mastra/core/workspace'

const workspace = new Workspace({
  filesystem: new LocalFilesystem({
    root: process.cwd(),
  }),
})

export const codeAgent = new AcpAgent({
  id: 'code-agent',
  description: 'Run coding tasks in a controlled workspace',
  command: 'acp-agent',
  args: ['--stdio'],
  workspace,
})
```
|
|
230
|
+
|
|
231
|
+
Use `cwd` and `workspace` together when the ACP process should start in one directory but file operations should use an explicitly configured workspace root.
|
|
232
|
+
|
|
233
|
+
## Related
|
|
234
|
+
|
|
235
|
+
- [Agent reference](https://mastra.ai/reference/agents/agent)
|
|
236
|
+
- [Subagents](https://mastra.ai/docs/agents/supervisor-agents)
|
|
237
|
+
- [Agent Client Protocol introduction](https://agentclientprotocol.com/overview/introduction)
|
|
238
|
+
- [Agent Client Protocol schema](https://agentclientprotocol.com/protocol/schema)
|
|
@@ -92,6 +92,8 @@ A tool can also pause _during_ its `execute` function by calling `suspend()`. Th
|
|
|
92
92
|
|
|
93
93
|
The stream emits a `tool-call-suspended` chunk with a custom payload defined by the tool's `suspendSchema`. You resume by calling `resumeStream()` with data matching the tool's `resumeSchema`.
|
|
94
94
|
|
|
95
|
+
> **Note:** `suspend()` does not throw — return immediately after calling it (e.g. `return await suspend({ ... })`). Code after `await suspend(...)` still runs before the tool pauses.
|
|
96
|
+
|
|
95
97
|
## Tool approval with `generate()`
|
|
96
98
|
|
|
97
99
|
Tool approval also works with `generate()` for non-streaming use cases. When a tool requires approval, `generate()` returns immediately with `finishReason: 'suspended'`, a `suspendPayload` containing the tool call details (`toolCallId`, `toolName`, `args`), and a `runId`:
|
|
@@ -40,11 +40,12 @@ The full set of options is listed in the [backgroundTasks configuration referenc
|
|
|
40
40
|
|
|
41
41
|
## Run a tool in the background
|
|
42
42
|
|
|
43
|
-
Enabling the manager doesn't run anything in the background by itself as every tool defaults to foreground execution.
|
|
43
|
+
Enabling the manager doesn't run anything in the background by itself as every tool defaults to foreground execution. Tools opt in at one of two layers:
|
|
44
44
|
|
|
45
|
-
1. **
|
|
45
|
+
1. **Tool-level config**: the tool itself declares it as background-eligible.
|
|
46
46
|
2. **Agent-level config**: the agent declares which of its tools are background-eligible.
|
|
47
|
-
|
|
47
|
+
|
|
48
|
+
Once a tool has opted in, the LLM can optionally include a `_background` field in the tool arguments to override the resolved config for a specific call (timeout, retries, or to flip the call back to foreground).
|
|
48
49
|
|
|
49
50
|
### Tool-level
|
|
50
51
|
|
|
@@ -103,13 +104,15 @@ When a tool is registered on an agent that has background tasks enabled, the mod
|
|
|
103
104
|
}
|
|
104
105
|
```
|
|
105
106
|
|
|
107
|
+
The `_background` override is a _modifier_ on tools the developer has already opted in at the tool or agent layer — it is not a standalone opt-in. If a tool hasn't been opted in, `_background.enabled: true` from the model is ignored and the tool runs in the foreground. This keeps deterministic, foreground-only tools (calculators, lookups, schema validators) from being silently dispatched as tasks.
|
|
108
|
+
|
|
106
109
|
### Resolution order
|
|
107
110
|
|
|
108
111
|
When a tool call is dispatched, the resolved background config is computed in this priority order:
|
|
109
112
|
|
|
110
|
-
1.
|
|
111
|
-
2.
|
|
112
|
-
3.
|
|
113
|
+
1. Agent-level `backgroundTasks.tools` entry for the tool.
|
|
114
|
+
2. Tool-level `backgroundTasks` config.
|
|
115
|
+
3. LLM `_background.enabled` override (only used to enable background dispatch when the tool was opted in at one of the layers above).
|
|
113
116
|
4. Manager defaults (`defaultTimeoutMs`, `defaultRetries`).
|
|
114
117
|
|
|
115
118
|
If the agent has `backgroundTasks.disabled: true`, every tool call runs synchronously regardless of the layers above.
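The opt-in gate can be sketched as a small decision function. This models only the foreground or background decision, under the assumptions that an opted-in tool runs in the background unless the model flips it back and that the agent-level entry wins over the tool-level config; `LayerConfig`, `BackgroundInputs`, and `runsInBackground` are hypothetical names, not Mastra APIs.

```typescript
// Hypothetical types; timeouts and retries are left out of this sketch.
type LayerConfig = { enabled?: boolean }

type BackgroundInputs = {
  agentDisabled?: boolean // agent-level `backgroundTasks.disabled`
  agentEntry?: LayerConfig // agent-level `backgroundTasks.tools` entry for the tool
  toolConfig?: LayerConfig // tool-level `backgroundTasks` config
  llmOverride?: LayerConfig // `_background` field sent by the model
}

function runsInBackground(inputs: BackgroundInputs): boolean {
  // A disabled agent forces every call to run synchronously.
  if (inputs.agentDisabled) return false
  // Assumption: the agent-level entry takes priority over the tool-level config.
  const optedIn = inputs.agentEntry?.enabled ?? inputs.toolConfig?.enabled ?? false
  // Without a developer opt-in, a model-supplied `enabled: true` is ignored.
  if (!optedIn) return false
  // An opted-in tool can still be flipped back to foreground for one call.
  return inputs.llmOverride?.enabled ?? true
}

const modelOnly = runsInBackground({ llmOverride: { enabled: true } })
const toolOptIn = runsInBackground({ toolConfig: { enabled: true } })
const flippedBack = runsInBackground({ toolConfig: { enabled: true }, llmOverride: { enabled: false } })
const agentOff = runsInBackground({ agentDisabled: true, toolConfig: { enabled: true } })
```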
|
|
@@ -1,5 +1,7 @@
|
|
|
1
1
|
# Response Caching
|
|
2
2
|
|
|
3
|
+
> **Experimental:** This feature is in alpha. Breaking changes may occur without a major version bump until the API is stable.
|
|
4
|
+
|
|
3
5
|
Response caching skips the LLM call and replays a previously cached response when an agent receives an identical request. Use it to reduce latency and avoid paying for repeated calls.
|
|
4
6
|
|
|
5
7
|
Caching is implemented as the [`ResponseCache`](https://mastra.ai/reference/processors/response-cache) input processor. Mastra doesn't provide an agent-level option. To enable caching, register the processor explicitly. This keeps the API surface small while Mastra collects feedback; per-call overrides flow through `RequestContext`.
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Signals
|
|
2
2
|
|
|
3
|
-
> **Experimental:**
|
|
3
|
+
> **Experimental:** This feature is in alpha. Breaking changes may occur without a major version bump until the API is stable.
|
|
4
4
|
|
|
5
5
|
Signals are a way to interact with an agent through a thread. Instead of starting every interaction with `agent.stream()`, subscribe to a thread and send signals. Mastra either wakes the agent when the thread is idle or drops the signal into the running agent loop.
|
|
6
6
|
|
|
@@ -86,6 +86,32 @@ agent.sendSignal(
|
|
|
86
86
|
)
|
|
87
87
|
```
|
|
88
88
|
|
|
89
|
+
## Identify users with attributes
|
|
90
|
+
|
|
91
|
+
Use `attributes` to tag each signal with user identity. The signal type and attributes are rendered as XML so the model can distinguish who said what in a multi-user thread:
|
|
92
|
+
|
|
93
|
+
```typescript
agent.sendSignal(
  {
    type: 'user',
    contents: 'Can we simplify the API surface?',
    attributes: { name: 'Devin', from: 'slack' },
  },
  {
    resourceId: 'user_123',
    threadId: 'thread_456',
  },
)
```
|
|
106
|
+
|
|
107
|
+
The model receives:
|
|
108
|
+
|
|
109
|
+
```xml
<user name="Devin" from="slack">Can we simplify the API surface?</user>
```
|
|
112
|
+
|
|
113
|
+
The UI sees just the message contents but can also read `attributes` and `metadata` off the signal message for custom rendering (e.g. showing user names, avatars, or platform badges).
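The mapping from a signal to the XML the model receives can be approximated with the sketch below. `renderSignal` and `escapeAttribute` are hypothetical helpers written for illustration; how Mastra actually escapes attribute values or contents is not stated here, so the escaping is an assumption.

```typescript
// Hypothetical rendering helper; Mastra's real escaping rules may differ.
type Signal = {
  type: string
  contents: string
  attributes?: Record<string, string>
}

// Escape characters that would break a double-quoted XML attribute value.
function escapeAttribute(value: string): string {
  return value.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/"/g, '&quot;')
}

// The signal type becomes the tag name; attributes are emitted in insertion order.
function renderSignal(signal: Signal): string {
  const attrs = Object.entries(signal.attributes ?? {})
    .map(([key, value]) => ` ${key}="${escapeAttribute(value)}"`)
    .join('')
  return `<${signal.type}${attrs}>${signal.contents}</${signal.type}>`
}

const renderedSignal = renderSignal({
  type: 'user',
  contents: 'Can we simplify the API surface?',
  attributes: { name: 'Devin', from: 'slack' },
})
// renderedSignal === '<user name="Devin" from="slack">Can we simplify the API surface?</user>'
```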
|
|
114
|
+
|
|
89
115
|
## Send external event context
|
|
90
116
|
|
|
91
117
|
Use custom signal types for system-generated context. Non-user signal types are rendered as XML-style user-role context so they can appear inside conversation history without looking like assistant output.
|
|
@@ -96,7 +122,7 @@ agent.sendSignal(
|
|
|
96
122
|
type: 'system-reminder',
|
|
97
123
|
contents: 'User X has left a new PR comment asking for a smaller API surface.',
|
|
98
124
|
attributes: {
|
|
99
|
-
|
|
125
|
+
source: 'github',
|
|
100
126
|
pr: '123',
|
|
101
127
|
},
|
|
102
128
|
},
|
|
@@ -110,7 +136,7 @@ agent.sendSignal(
|
|
|
110
136
|
The model receives the custom signal as context like this:
|
|
111
137
|
|
|
112
138
|
```xml
|
|
113
|
-
<system-reminder
|
|
139
|
+
<system-reminder source="github" pr="123">User X has left a new PR comment asking for a smaller API surface.</system-reminder>
|
|
114
140
|
```
|
|
115
141
|
|
|
116
142
|
Use XML-safe signal type names and attribute names. Signal type names and attribute names can contain letters, numbers, underscores, periods, and hyphens. They must start with a letter or underscore.
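A naming rule like this is straightforward to check before sending a signal. The sketch below is one way to do it, assuming "letters" means ASCII letters; `isXmlSafeName` is a hypothetical helper, not a Mastra export.

```typescript
// Assumption: "letters" means ASCII letters. Widen the character classes if needed.
const XML_SAFE_NAME = /^[A-Za-z_][A-Za-z0-9_.-]*$/

// True when the name starts with a letter or underscore and then contains only
// letters, numbers, underscores, periods, and hyphens.
function isXmlSafeName(name: string): boolean {
  return XML_SAFE_NAME.test(name)
}

const nameChecks = {
  systemReminder: isXmlSafeName('system-reminder'),
  underscoreStart: isXmlSafeName('_meta.v2'),
  digitStart: isXmlSafeName('1st-signal'),
  withSpace: isXmlSafeName('pr comment'),
  empty: isXmlSafeName(''),
}
```

Validating both the signal type and every attribute key up front avoids producing context that the model sees as malformed markup.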
|
|
@@ -0,0 +1,146 @@
|
|
|
1
|
+
# Evals with memory
|
|
2
|
+
|
|
3
|
+
Agents that use memory in `thread` scope — including observational memory — require a thread ID at run time. When an eval invokes the agent without one, you'll see:
|
|
4
|
+
|
|
5
|
+
```text
ObservationalMemory (scope: 'thread') requires a threadId, but none was found in RequestContext or MessageList.
```
|
|
8
|
+
|
|
9
|
+
This page covers the three working patterns for running Mastra evals against memory-enabled agents, what each path supports, and which one to pick. A complete runnable repro for all three approaches lives in [`examples/evals-with-memory`](https://github.com/mastra-ai/mastra/tree/main/examples/evals-with-memory).
|
|
10
|
+
|
|
11
|
+
## When to use which approach
|
|
12
|
+
|
|
13
|
+
| Goal                                            | Approach                                                                                  |
| ----------------------------------------------- | ----------------------------------------------------------------------------------------- |
| One shared conversation across every item       | [`runEvals` with global `targetOptions.memory`](#shared-thread-with-runevals)             |
| One independent thread per item, simple CI loop | [`runEvals` per item](#per-item-threads-with-runevals)                                    |
| Per-item threads driven by a stored `Dataset`   | [`dataset.startExperiment` with an inline task](#dataset-experiments-with-an-inline-task) |
|
|
18
|
+
|
|
19
|
+
Pre-seeding `RequestContext` with `MastraMemory` is **not** a supported way to drive memory into an agent. Thread resolution reads `args.memory.thread` — `RequestContext.MastraMemory` is populated by `prepare-memory-step` after the agent has already resolved its thread.
|
|
20
|
+
|
|
21
|
+
## Shared thread with `runEvals`
|
|
22
|
+
|
|
23
|
+
`runEvals` accepts `targetOptions`, which is forwarded to `agent.generate()`. Passing `memory: { thread, resource }` runs every data item against the same thread — useful for testing recall across a multi-turn conversation.
|
|
24
|
+
|
|
25
|
+
```typescript
import { runEvals } from '@mastra/core/evals'
import { supportAgent } from './support-agent'
import { recallScorer } from '../scorers/recall-scorer'

const memory = await supportAgent.getMemory()
await memory!.createThread({ threadId: 'eval-thread', resourceId: 'ci-user' })

const result = await runEvals({
  target: supportAgent,
  scorers: [recallScorer],
  targetOptions: {
    memory: { thread: 'eval-thread', resource: 'ci-user' },
  },
  data: [
    { input: 'My order number is 12345' },
    { input: 'What is my order number?', groundTruth: '12345' },
  ],
})
```
|
|
45
|
+
|
|
46
|
+
`targetOptions` is **global per call**. There is no per-item override on `RunEvalsDataItem` today.
|
|
47
|
+
|
|
48
|
+
## Per-item threads with `runEvals`
|
|
49
|
+
|
|
50
|
+
When each data item needs its own thread (the common CI shape), call `runEvals` once per item with a unique `targetOptions.memory` and aggregate the scores yourself.
|
|
51
|
+
|
|
52
|
+
```typescript
import { randomUUID } from 'node:crypto'
import { runEvals } from '@mastra/core/evals'
import { supportAgent } from './support-agent'
import { recallScorer } from '../scorers/recall-scorer'

const memory = await supportAgent.getMemory()
const resourceId = 'ci-user'

const items = [
  { input: 'Cats are mammals', groundTruth: 'mammals' },
  { input: 'Dogs are mammals too', groundTruth: 'mammals' },
]

// `runEvals` returns `{ scores: Record<string, number>; summary: { totalItems } }`.
const scores: number[] = []
for (const item of items) {
  const threadId = `eval-${randomUUID()}`
  await memory!.createThread({ threadId, resourceId, title: item.input })

  const result = await runEvals({
    target: supportAgent,
    scorers: [recallScorer],
    targetOptions: { memory: { thread: threadId, resource: resourceId } },
    data: [item],
  })

  scores.push(result.scores[recallScorer.id])
}

const average = scores.reduce((a, b) => a + b, 0) / scores.length
```
|
|
84
|
+
|
|
85
|
+
> **Note:** Create the thread before running the eval. Observational memory in `thread` scope reads from a record that must already exist.
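When aggregating per-item results yourself, it may help to guard against an empty run or a scorer that produced no value, since dividing by zero yields `NaN` and a missing score would otherwise poison the average. The helper below is a hypothetical sketch, not part of `@mastra/core/evals`.

```typescript
// Hypothetical aggregation helper for per-item eval loops.
type ScoreSummary = {
  average: number | undefined // undefined when no usable score exists
  missing: number // how many items produced no numeric score
}

function summarizeScores(scores: Array<number | undefined>): ScoreSummary {
  const valid = scores.filter(
    (score): score is number => typeof score === 'number' && !Number.isNaN(score),
  )
  const missing = scores.length - valid.length
  if (valid.length === 0) {
    return { average: undefined, missing }
  }
  const total = valid.reduce((sum, score) => sum + score, 0)
  return { average: total / valid.length, missing }
}

const fullRun = summarizeScores([1, 0.5])
const partialRun = summarizeScores([1, undefined])
const emptyRun = summarizeScores([])
```

Reporting the `missing` count separately keeps a silent scorer failure from looking like a passing run in CI.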
|
|
86
|
+
|
|
87
|
+
## Dataset experiments with an inline task
|
|
88
|
+
|
|
89
|
+
`dataset.startExperiment({ target: agent })` does **not** forward a `memory` option to the agent — only `requestContext`. To run a stored dataset against a memory-enabled agent, use an inline `task` function and stash `{ threadId, resourceId }` in each item's `metadata`. The scorer pipeline still runs as normal.
|
|
90
|
+
|
|
91
|
+
```typescript
import { randomUUID } from 'node:crypto'
import { mastra } from '../index'
import { supportAgent } from '../agents/support-agent'
import { recallScorer } from '../scorers/recall-scorer'

const memory = await supportAgent.getMemory()
const resourceId = 'ci-user'

const items = [
  { input: 'Cats are mammals', groundTruth: 'mammals', thread: `ds-${randomUUID()}` },
  { input: 'Dogs are mammals too', groundTruth: 'mammals', thread: `ds-${randomUUID()}` },
]

for (const it of items) {
  await memory!.createThread({ threadId: it.thread, resourceId, title: it.input })
}

const dataset = await mastra.datasets.create({
  name: 'support-recall',
  description: 'Per-item memory via inline task + item metadata',
})

await dataset.addItems({
  items: items.map(it => ({
    input: it.input,
    groundTruth: it.groundTruth,
    metadata: { threadId: it.thread, resourceId },
  })),
})

const summary = await dataset.startExperiment({
  scorers: [recallScorer],
  task: async ({ input, metadata }) => {
    const { threadId, resourceId: rid } = (metadata ?? {}) as {
      threadId: string
      resourceId: string
    }
    const result = await supportAgent.generate(input as string, {
      memory: { thread: threadId, resource: rid },
    })
    return result.text
  },
})
```
|
|
136
|
+
|
|
137
|
+
The inline `task` receives the item's `metadata`, so each row can drive its own thread without changing the agent or any scorer.
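Because a type cast on `metadata` does not check anything at run time, a row with missing fields would reach the agent with an undefined thread. A small guard, sketched here with the hypothetical names `ThreadRef` and `readThreadRef`, could fail fast inside the inline `task` instead.

```typescript
// Hypothetical guard for dataset item metadata; not part of the Mastra API.
type ThreadRef = { threadId: string; resourceId: string }

function readThreadRef(metadata: unknown): ThreadRef {
  const record = (metadata ?? {}) as Record<string, unknown>
  const { threadId, resourceId } = record
  if (typeof threadId !== 'string' || threadId === '') {
    throw new Error('Dataset item metadata is missing `threadId`')
  }
  if (typeof resourceId !== 'string' || resourceId === '') {
    throw new Error('Dataset item metadata is missing `resourceId`')
  }
  return { threadId, resourceId }
}

const threadRef = readThreadRef({ threadId: 'ds-1', resourceId: 'ci-user' })

let missingThreadMessage = ''
try {
  readThreadRef({ resourceId: 'ci-user' })
} catch (error) {
  missingThreadMessage = (error as Error).message
}
```

Failing on the offending row makes the cause obvious, whereas an undefined thread would surface later as the `requires a threadId` error.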
|
|
138
|
+
|
|
139
|
+
> **Note:** Visit [runEvals reference](https://mastra.ai/reference/evals/run-evals) and [Dataset reference](https://mastra.ai/reference/datasets/dataset) for full configuration.
|
|
140
|
+
|
|
141
|
+
## Related
|
|
142
|
+
|
|
143
|
+
- [Running scorers in CI](https://mastra.ai/docs/evals/running-in-ci)
|
|
144
|
+
- [Running experiments](https://mastra.ai/docs/evals/datasets/running-experiments)
|
|
145
|
+
- [Observational memory](https://mastra.ai/docs/memory/observational-memory)
|
|
146
|
+
- [runEvals API reference](https://mastra.ai/reference/evals/run-evals)
|
|
@@ -121,4 +121,5 @@ describe('Weather Agent Tests', () => {
|
|
|
121
121
|
|
|
122
122
|
- Learn about [creating custom scorers](https://mastra.ai/docs/evals/custom-scorers)
|
|
123
123
|
- Explore [built-in scorers](https://mastra.ai/docs/evals/built-in-scorers)
|
|
124
|
+
- Run scorers against [memory-enabled agents](https://mastra.ai/docs/evals/evals-with-memory)
|
|
124
125
|
- Read the [runEvals API reference](https://mastra.ai/reference/evals/run-evals)
|