@mastra/mcp-docs-server 0.0.1 → 0.0.2-alpha.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.docs/organized/changelogs/%40mastra%2Fastra.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fchroma.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fclient-js.md +27 -27
- package/.docs/organized/changelogs/%40mastra%2Fcomposio.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fcore.md +24 -24
- package/.docs/organized/changelogs/%40mastra%2Fdeployer-cloudflare.md +37 -37
- package/.docs/organized/changelogs/%40mastra%2Fdeployer-netlify.md +37 -37
- package/.docs/organized/changelogs/%40mastra%2Fdeployer-vercel.md +37 -37
- package/.docs/organized/changelogs/%40mastra%2Fdeployer.md +36 -36
- package/.docs/organized/changelogs/%40mastra%2Fevals.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Ffirecrawl.md +29 -29
- package/.docs/organized/changelogs/%40mastra%2Fgithub.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Floggers.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fmcp-docs-server.md +26 -0
- package/.docs/organized/changelogs/%40mastra%2Fmcp.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fmemory.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fpg.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fpinecone.md +29 -29
- package/.docs/organized/changelogs/%40mastra%2Fplayground-ui.md +34 -34
- package/.docs/organized/changelogs/%40mastra%2Fqdrant.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Frag.md +27 -27
- package/.docs/organized/changelogs/%40mastra%2Fragie.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fspeech-azure.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fspeech-deepgram.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fspeech-elevenlabs.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fspeech-google.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fspeech-ibm.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fspeech-murf.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fspeech-openai.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fspeech-playai.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fspeech-replicate.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fspeech-speechify.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fstabilityai.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fturbopuffer.md +24 -0
- package/.docs/organized/changelogs/%40mastra%2Fupstash.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fvectorize.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fvoice-deepgram.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fvoice-elevenlabs.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fvoice-google.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fvoice-murf.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fvoice-openai-realtime.md +26 -0
- package/.docs/organized/changelogs/%40mastra%2Fvoice-openai.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fvoice-playai.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fvoice-sarvam.md +25 -0
- package/.docs/organized/changelogs/%40mastra%2Fvoice-speechify.md +25 -25
- package/.docs/organized/changelogs/create-mastra.md +16 -16
- package/.docs/organized/changelogs/mastra.md +52 -52
- package/.docs/organized/code-examples/agent.md +8 -3
- package/.docs/organized/code-examples/ai-sdk-useChat.md +1 -0
- package/.docs/organized/code-examples/weather-agent.md +1 -0
- package/.docs/raw/deployment/client.mdx +120 -0
- package/.docs/raw/deployment/server.mdx +1 -1
- package/.docs/raw/evals/00-overview.mdx +58 -75
- package/.docs/raw/evals/01-textual-evals.mdx +53 -0
- package/.docs/raw/evals/02-custom-eval.mdx +6 -170
- package/.docs/raw/evals/03-running-in-ci.mdx +78 -0
- package/.docs/raw/getting-started/installation.mdx +24 -13
- package/.docs/raw/getting-started/mcp-docs-server.mdx +138 -0
- package/.docs/raw/index.mdx +2 -2
- package/.docs/raw/local-dev/add-to-existing-project.mdx +48 -0
- package/.docs/raw/local-dev/creating-a-new-project.mdx +54 -0
- package/.docs/raw/local-dev/mastra-dev.mdx +78 -35
- package/.docs/raw/reference/agents/createTool.mdx +128 -89
- package/.docs/raw/reference/agents/stream.mdx +19 -18
- package/.docs/raw/reference/cli/dev.mdx +58 -21
- package/.docs/raw/storage/overview.mdx +331 -0
- package/package.json +2 -2
- package/.docs/raw/evals/01-supported-evals.mdx +0 -31
- package/.docs/raw/local-dev/creating-projects.mdx +0 -74
- package/.docs/raw/reference/client-js/index.mdx +0 -127
- /package/.docs/raw/{local-dev/integrations.mdx → integrations/index.mdx} +0 -0
@@ -1,5 +1,56 @@
 # mastra
 
+## 0.4.1-alpha.1
+
+### Patch Changes
+
+- 2f6a8b6: Update port handling in dev command to ensure CLI port takes precedence over environment variables and add warning when overriding PORT environment variable.
+- Updated dependencies [beaf1c2]
+- Updated dependencies [3084e13]
+  - @mastra/core@0.6.2-alpha.0
+  - @mastra/deployer@0.2.2-alpha.1
+
+## 0.4.1-alpha.0
+
+### Patch Changes
+
+- aede1ea: Add non english support to weather example
+- Updated dependencies [4e6732b]
+  - @mastra/deployer@0.2.2-alpha.0
+
+## 0.4.0
+
+### Minor Changes
+
+- f9b6ab5: add Cerebras as a llm provider to create-mastra@latest
+
+### Patch Changes
+
+- 5052613: Added a new `mastra create --project-name <string>` flag so coder agents can create new Mastra projects with a one line command.
+- 1291e89: Add resizable-panel to playground-ui and use in agent and workflow sidebars
+- 1405e46: update the Groq model the create-mastra@latest sets
+- da8d9bb: Enable public dir copying if it exists
+- 9ba1e97: update playground ui for mastra and create-mastra
+- 5baf1ec: animate new traces
+- 65f2a4c: Add Mastra Docs MCP to the pnpm create mastra TUI with the option to install in Cursor or Windsurf
+- 9116d70: Handle the different workflow methods in workflow graph
+- 0709d99: add prop for dynamic empty text
+- Updated dependencies [cc7f392]
+- Updated dependencies [fc2f89c]
+- Updated dependencies [dfbb131]
+- Updated dependencies [f4854ee]
+- Updated dependencies [afaf73f]
+- Updated dependencies [0850b4c]
+- Updated dependencies [7bcfaee]
+- Updated dependencies [da8d9bb]
+- Updated dependencies [44631b1]
+- Updated dependencies [9116d70]
+- Updated dependencies [6e559a0]
+- Updated dependencies [5f43505]
+- Updated dependencies [61ad5a4]
+  - @mastra/deployer@0.2.1
+  - @mastra/core@0.6.1
+
 ## 0.4.0-alpha.2
 
 ### Patch Changes
@@ -247,56 +298,5 @@
 - Updated dependencies [c151ae6]
 - Updated dependencies [52e0418]
 - Updated dependencies [03236ec]
-- Updated dependencies [3764e71]
-- Updated dependencies [df982db]
-- Updated dependencies [0461849]
-- Updated dependencies [2259379]
-- Updated dependencies [358f069]
-  - @mastra/core@0.5.0-alpha.5
-  - @mastra/deployer@0.1.8-alpha.5
-
-## 0.2.9-alpha.4
-
-### Patch Changes
-
-- 144b3d5: Update traces table UI, agent Chat UI
-  Fix get workflows breaking
-- Updated dependencies [d79aedf]
-- Updated dependencies [144b3d5]
-  - @mastra/core@0.5.0-alpha.4
-  - @mastra/deployer@0.1.8-alpha.4
-
-## 0.2.9-alpha.3
-
-### Patch Changes
-
-- Updated dependencies [3d0e290]
-  - @mastra/core@0.5.0-alpha.3
-  - @mastra/deployer@0.1.8-alpha.3
-
-## 0.2.9-alpha.2
-
-### Patch Changes
-
-- Updated dependencies [02ffb7b]
-  - @mastra/core@0.5.0-alpha.2
-  - @mastra/deployer@0.1.8-alpha.2
-
-## 0.2.9-alpha.1
-
-### Patch Changes
-
-- e5149bb: Fix playground-ui agent-evals tab-content
-- Updated dependencies [dab255b]
-  - @mastra/core@0.5.0-alpha.1
-  - @mastra/deployer@0.1.8-alpha.1
-
-## 0.2.9-alpha.0
-
-### Patch Changes
-
-- 5fae49e: Configurable timeout on npm create mastra
-- 960690d: Improve client-js workflow watch dx
-- 62565c1: --no-timeout npm create mastra flag
 
-...
+... 2010 more lines hidden. See full changelog in package directory.
@@ -375,9 +375,14 @@ import { z } from 'zod';
 export const cookingTool = createTool({
   id: 'cooking-tool',
   description: 'My tool description',
-  inputSchema: z.object({
-
-
+  inputSchema: z.object({
+    ingredient: z.string(),
+  }),
+  execute: async ({ context }, options) => {
+    console.log('My tool is running!', context.ingredient);
+    if (options?.toolCallId) {
+      console.log('Tool call ID:', options.toolCallId);
+    }
     return 'My tool result';
   },
 });
@@ -48,6 +48,7 @@ export const weatherAgent = new Agent({
 
 Your primary function is to help users get weather details for specific locations. When responding:
 - Always ask for a location if none is provided
+- If the location name isn’t in English, please translate it
 - If giving a location with multiple parts (e.g. "New York, NY"), use the most relevant part (e.g. "New York")
 - Include relevant details like humidity, wind conditions, and precipitation
 - Keep responses concise but informative
@@ -35,6 +35,7 @@ export const weatherAgent = new Agent({
 
 Your primary function is to help users get weather details for specific locations. When responding:
 - Always ask for a location if none is provided
+- If the location name isn’t in English, please translate it
 - If giving a location with multiple parts (e.g. "New York, NY"), use the most relevant part (e.g. "New York")
 - Include relevant details like humidity, wind conditions, and precipitation
 - Keep responses concise but informative
@@ -0,0 +1,120 @@
+---
+title: "MastraClient"
+description: "Learn how to set up and use the Mastra Client SDK"
+---
+
+# Mastra Client SDK
+
+The Mastra Client SDK provides a simple and type-safe interface for interacting with your [Mastra Server](/docs/deployment/server) from your client environment.
+
+## Development Requirements
+
+To ensure smooth local development, make sure you have:
+
+- Node.js 18.x or later installed
+- TypeScript 4.7+ (if using TypeScript)
+- A modern browser environment with Fetch API support
+- Your local Mastra server running (typically on port 4111)
+
+## Installation
+
+import { Tabs } from "nextra/components";
+
+<Tabs items={["npm", "yarn", "pnpm"]}>
+  <Tabs.Tab>
+    ```bash copy
+    npm install @mastra/client-js
+    ```
+  </Tabs.Tab>
+  <Tabs.Tab>
+    ```bash copy
+    yarn add @mastra/client-js
+    ```
+  </Tabs.Tab>
+  <Tabs.Tab>
+    ```bash copy
+    pnpm add @mastra/client-js
+    ```
+  </Tabs.Tab>
+</Tabs>
+
+## Initialize Mastra Client
+
+To get started you'll need to initialize your MastraClient with necessary parameters:
+
+```typescript
+import { MastraClient } from "@mastra/client-js";
+
+const client = new MastraClient({
+  baseUrl: "http://localhost:4111", // Default Mastra development server port
+});
+```
+
+### Configuration Options
+
+You can customize the client with various options:
+
+```typescript
+const client = new MastraClient({
+  // Required
+  baseUrl: "http://localhost:4111",
+
+  // Optional configurations for development
+  retries: 3, // Number of retry attempts
+  backoffMs: 300, // Initial retry backoff time
+  maxBackoffMs: 5000, // Maximum retry backoff time
+  headers: { // Custom headers for development
+    "X-Development": "true"
+  }
+});
+```
+
+## Example
+
+Once your MastraClient is initialized you can start making client calls via the type-safe
+interface
+
+```typescript
+// Get a reference to your local agent
+const agent = client.getAgent("dev-agent-id");
+
+// Generate responses
+const response = await agent.generate({
+  messages: [
+    {
+      role: "user",
+      content: "Hello, I'm testing the local development setup!"
+    }
+  ]
+});
+```
+
+## Available Features
+
+Mastra client exposes all resources served by the Mastra Server
+
+- [**Agents**](/docs/reference/client-js/agents): Create and manage AI agents, generate responses, and handle streaming interactions
+- [**Memory**](/docs/reference/client-js/memory): Manage conversation threads and message history
+- [**Tools**](/docs/reference/client-js/tools): Access and execute tools available to agents
+- [**Workflows**](/docs/reference/client-js/workflows): Create and manage automated workflows
+- [**Vectors**](/docs/reference/client-js/vectors): Handle vector operations for semantic search and similarity matching
+
+
+## Best Practices
+1. **Error Handling**: Implement proper error handling for development scenarios
+2. **Environment Variables**: Use environment variables for configuration
+3. **Debugging**: Enable detailed logging when needed
+
+```typescript
+// Example with error handling and logging
+try {
+  const agent = client.getAgent("dev-agent-id");
+  const response = await agent.generate({
+    messages: [{ role: "user", content: "Test message" }]
+  });
+  console.log("Response:", response);
+} catch (error) {
+  console.error("Development error:", error);
+}
+```
+
@@ -5,7 +5,7 @@ description: "Configure and customize the Mastra server with middleware and othe
 
 # Mastra Server
 
-
+While developing or when you deploy a Mastra application, it runs as an HTTP server that exposes your agents, workflows, and other functionality as API endpoints. This page explains how to configure and customize the server behavior.
 
 ## Server Architecture
 
@@ -1,106 +1,89 @@
 ---
 title: "Overview"
-description: "
+description: "Understanding how to evaluate and measure AI agent quality using Mastra evals."
 ---
 
 # Testing your agents with evals
 
+While traditional software tests have clear pass/fail conditions, AI outputs are non-deterministic — they can vary with the same input. Evals help bridge this gap by providing quantifiable metrics for measuring agent quality.
+
 Evals are automated tests that evaluate Agents outputs using model-graded, rule-based, and statistical methods. Each eval returns a normalized score between 0-1 that can be logged and compared. Evals can be customized with your own prompts and scoring functions.
 
 Evals can be run in the cloud, capturing real-time results. But evals can also be part of your CI/CD pipeline, allowing you to test and monitor your agents over time.
 
-##
+## Types of Evals
+
+There are different kinds of evals, each serving a specific purpose. Here are some common types:
+
+1. **Textual Evals**: Evaluate accuracy, reliability, and context understanding of agent responses
+2. **Classification Evals**: Measure accuracy in categorizing data based on predefined categories
+3. **Tool Usage Evals**: Assess how effectively an agent uses external tools or APIs
+4. **Prompt Engineering Evals**: Explore impact of different instructions and input formats
 
-
+## Getting Started
+
+Evals need to be added to an agent. Here's an example using the faithfulness, content similarity, and hallucination metrics:
 
 ```typescript copy showLineNumbers filename="src/mastra/agents/index.ts"
 import { Agent } from "@mastra/core/agent";
 import { openai } from "@ai-sdk/openai";
-import {
+import {
+  FaithfulnessMetric,
+  ContentSimilarityMetric,
+  HallucinationMetric
+} from "@mastra/evals/nlp";
 
 export const myAgent = new Agent({
-  name: "
-  instructions: "You are a
-  model: openai("gpt-4o
-  evals:
-
-
-
-
-
-
-
-### Executing evals in your CI/CD pipeline
-
-We support any testing framework that supports ESM modules. For example, you can use [Vitest](https://vitest.dev/), [Jest](https://jestjs.io/) or [Mocha](https://mochajs.org/) to run evals in your CI/CD pipeline.
-
-```typescript copy showLineNumbers filename="src/mastra/agents/index.test.ts"
-import { describe, it, expect } from 'vitest';
-import { evaluate } from '@mastra/core/eval';
-import { myAgent } from './index';
-
-describe('My Agent', () => {
-  it('should be able to validate tone consistency', async () => {
-    const metric = new ToneConsistencyMetric();
-    const result = await evaluate(myAgent, 'Hello, world!', metric)
-
-    expect(result.score).toBe(1);
-  });
+  name: "ContentWriter",
+  instructions: "You are a content writer that creates accurate summaries",
+  model: openai("gpt-4o"),
+  evals: [
+    new FaithfulnessMetric(), // Checks if output matches source material
+    new ContentSimilarityMetric({
+      threshold: 0.8 // Require 80% similarity with expected output
+    }),
+    new HallucinationMetric()
+  ]
 });
-
 ```
 
-You
-
-
-#### Vitest
+You can view eval results in the Mastra dashboard when using `mastra dev`.
 
-
-Without these files, the evals will still run and fail when necessary but you won't be able to see the results in the Mastra dashboard.
+## Beyond Automated Testing
 
-
-import { globalSetup } from '@mastra/evals';
+While automated evals are valuable, high-performing AI teams often combine them with:
 
-
-
-
-```
+1. **A/B Testing**: Compare different versions with real users
+2. **Human Review**: Regular review of production data and traces
+3. **Continuous Monitoring**: Track eval metrics over time to detect regressions
 
-
-import { beforeAll } from 'vitest';
-import { attachListeners } from '@mastra/evals';
+## Understanding Eval Results
 
-
-await attachListeners();
-});
-```
+Each eval metric measures a specific aspect of your agent's output. Here's how to interpret and improve your results:
 
-
-
+### Understanding Scores
+For any metric:
+1. Check the metric documentation to understand the scoring process
+2. Look for patterns in when scores change
+3. Compare scores across different inputs and contexts
+4. Track changes over time to spot trends
 
-
+### Improving Results
+When scores aren't meeting your targets:
+1. Check your instructions - Are they clear? Try making them more specific
+2. Look at your context - Is it giving the agent what it needs?
+3. Simplify your prompts - Break complex tasks into smaller steps
+4. Add guardrails - Include specific rules for tricky cases
 
-
-
+### Maintaining Quality
+Once you're hitting your targets:
+1. Monitor stability - Do scores remain consistent?
+2. Document what works - Keep notes on successful approaches
+3. Test edge cases - Add examples that cover unusual scenarios
+4. Fine-tune - Look for ways to improve efficiency
 
-
-// Store evals in Mastra Storage (requires storage to be enabled)
-await attachListeners(mastra);
-});
-```
+See [Textual Evals](/docs/evals/textual-evals) for more info on what evals can do.
 
-
-With file storage, evals persist and can be queried later.
-With memory storage, evals are isolated to the test process.
-</details>
+For more info on how to create your own evals, see the [Custom Evals](/docs/evals/02-custom-eval) guide.
 
-
-import { defineConfig } from 'vitest/config'
-
-export default defineConfig({
-  test: {
-    globalSetup: './globalSetup.ts',
-    setupFiles: ['./testSetup.ts'],
-  },
-})
-```
+For running evals in your CI pipeline, see the [Running in CI](/docs/evals/running-in-ci) guide.
@@ -0,0 +1,53 @@
+---
+title: "Textual Evals"
+description: "Understand how Mastra uses LLM-as-judge methodology to evaluate text quality."
+---
+
+# Textual Evals
+
+Textual evals use an LLM-as-judge methodology to evaluate agent outputs. This approach leverages language models to assess various aspects of text quality, similar to how a teaching assistant might grade assignments using a rubric.
+
+Each eval focuses on specific quality aspects and returns a score between 0 and 1, providing quantifiable metrics for non-deterministic AI outputs.
+
+Mastra provides several eval metrics for assessing Agent outputs. Mastra is not limited to these metrics, and you can also [define your own evals](/docs/evals/02-custom-eval).
+
+## Why Use Textual Evals?
+
+Textual evals help ensure your agent:
+- Produces accurate and reliable responses
+- Uses context effectively
+- Follows output requirements
+- Maintains consistent quality over time
+
+## Available Metrics
+
+### Accuracy and Reliability
+
+These metrics evaluate how correct, truthful, and complete your agent's answers are:
+
+- [`hallucination`](/docs/reference/evals/hallucination): Detects facts or claims not present in provided context
+- [`faithfulness`](/docs/reference/evals/faithfulness): Measures how accurately responses represent provided context
+- [`content-similarity`](/docs/reference/evals/content-similarity): Evaluates consistency of information across different phrasings
+- [`completeness`](/docs/reference/evals/completeness): Checks if responses include all necessary information
+- [`answer-relevancy`](/docs/reference/evals/answer-relevancy): Assesses how well responses address the original query
+- [`textual-difference`](/docs/reference/evals/textual-difference): Measures textual differences between strings
+
+### Understanding Context
+
+These metrics evaluate how well your agent uses provided context:
+
+- [`context-position`](/docs/reference/evals/context-position): Analyzes where context appears in responses
+- [`context-precision`](/docs/reference/evals/context-precision): Evaluates whether context chunks are grouped logically
+- [`context-relevancy`](/docs/reference/evals/context-relevancy): Measures use of appropriate context pieces
+- [`contextual-recall`](/docs/reference/evals/contextual-recall): Assesses completeness of context usage
+
+### Output Quality
+
+These metrics evaluate adherence to format and style requirements:
+
+- [`tone`](/docs/reference/evals/tone-consistency): Measures consistency in formality, complexity, and style
+- [`toxicity`](/docs/reference/evals/toxicity): Detects harmful or inappropriate content
+- [`bias`](/docs/reference/evals/bias): Detects potential biases in the output
+- [`prompt-alignment`](/docs/reference/evals/prompt-alignment): Checks adherence to explicit instructions like length restrictions, formatting requirements, or other constraints
+- [`summarization`](/docs/reference/evals/summarization): Evaluates information retention and conciseness
+- [`keyword-coverage`](/docs/reference/evals/keyword-coverage): Assesses technical terminology usage