@mastra/mcp-docs-server 0.0.1 → 0.0.2-alpha.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (71)
  1. package/.docs/organized/changelogs/%40mastra%2Fastra.md +25 -25
  2. package/.docs/organized/changelogs/%40mastra%2Fchroma.md +25 -25
  3. package/.docs/organized/changelogs/%40mastra%2Fclient-js.md +27 -27
  4. package/.docs/organized/changelogs/%40mastra%2Fcomposio.md +25 -25
  5. package/.docs/organized/changelogs/%40mastra%2Fcore.md +24 -24
  6. package/.docs/organized/changelogs/%40mastra%2Fdeployer-cloudflare.md +37 -37
  7. package/.docs/organized/changelogs/%40mastra%2Fdeployer-netlify.md +37 -37
  8. package/.docs/organized/changelogs/%40mastra%2Fdeployer-vercel.md +37 -37
  9. package/.docs/organized/changelogs/%40mastra%2Fdeployer.md +36 -36
  10. package/.docs/organized/changelogs/%40mastra%2Fevals.md +25 -25
  11. package/.docs/organized/changelogs/%40mastra%2Ffirecrawl.md +29 -29
  12. package/.docs/organized/changelogs/%40mastra%2Fgithub.md +25 -25
  13. package/.docs/organized/changelogs/%40mastra%2Floggers.md +25 -25
  14. package/.docs/organized/changelogs/%40mastra%2Fmcp-docs-server.md +26 -0
  15. package/.docs/organized/changelogs/%40mastra%2Fmcp.md +25 -25
  16. package/.docs/organized/changelogs/%40mastra%2Fmemory.md +25 -25
  17. package/.docs/organized/changelogs/%40mastra%2Fpg.md +25 -25
  18. package/.docs/organized/changelogs/%40mastra%2Fpinecone.md +29 -29
  19. package/.docs/organized/changelogs/%40mastra%2Fplayground-ui.md +34 -34
  20. package/.docs/organized/changelogs/%40mastra%2Fqdrant.md +25 -25
  21. package/.docs/organized/changelogs/%40mastra%2Frag.md +27 -27
  22. package/.docs/organized/changelogs/%40mastra%2Fragie.md +25 -25
  23. package/.docs/organized/changelogs/%40mastra%2Fspeech-azure.md +25 -25
  24. package/.docs/organized/changelogs/%40mastra%2Fspeech-deepgram.md +25 -25
  25. package/.docs/organized/changelogs/%40mastra%2Fspeech-elevenlabs.md +25 -25
  26. package/.docs/organized/changelogs/%40mastra%2Fspeech-google.md +25 -25
  27. package/.docs/organized/changelogs/%40mastra%2Fspeech-ibm.md +25 -25
  28. package/.docs/organized/changelogs/%40mastra%2Fspeech-murf.md +25 -25
  29. package/.docs/organized/changelogs/%40mastra%2Fspeech-openai.md +25 -25
  30. package/.docs/organized/changelogs/%40mastra%2Fspeech-playai.md +25 -25
  31. package/.docs/organized/changelogs/%40mastra%2Fspeech-replicate.md +25 -25
  32. package/.docs/organized/changelogs/%40mastra%2Fspeech-speechify.md +25 -25
  33. package/.docs/organized/changelogs/%40mastra%2Fstabilityai.md +25 -25
  34. package/.docs/organized/changelogs/%40mastra%2Fturbopuffer.md +24 -0
  35. package/.docs/organized/changelogs/%40mastra%2Fupstash.md +25 -25
  36. package/.docs/organized/changelogs/%40mastra%2Fvectorize.md +25 -25
  37. package/.docs/organized/changelogs/%40mastra%2Fvoice-deepgram.md +25 -25
  38. package/.docs/organized/changelogs/%40mastra%2Fvoice-elevenlabs.md +25 -25
  39. package/.docs/organized/changelogs/%40mastra%2Fvoice-google.md +25 -25
  40. package/.docs/organized/changelogs/%40mastra%2Fvoice-murf.md +25 -25
  41. package/.docs/organized/changelogs/%40mastra%2Fvoice-openai-realtime.md +26 -0
  42. package/.docs/organized/changelogs/%40mastra%2Fvoice-openai.md +25 -25
  43. package/.docs/organized/changelogs/%40mastra%2Fvoice-playai.md +25 -25
  44. package/.docs/organized/changelogs/%40mastra%2Fvoice-sarvam.md +25 -0
  45. package/.docs/organized/changelogs/%40mastra%2Fvoice-speechify.md +25 -25
  46. package/.docs/organized/changelogs/create-mastra.md +16 -16
  47. package/.docs/organized/changelogs/mastra.md +52 -52
  48. package/.docs/organized/code-examples/agent.md +8 -3
  49. package/.docs/organized/code-examples/ai-sdk-useChat.md +1 -0
  50. package/.docs/organized/code-examples/weather-agent.md +1 -0
  51. package/.docs/raw/deployment/client.mdx +120 -0
  52. package/.docs/raw/deployment/server.mdx +1 -1
  53. package/.docs/raw/evals/00-overview.mdx +58 -75
  54. package/.docs/raw/evals/01-textual-evals.mdx +53 -0
  55. package/.docs/raw/evals/02-custom-eval.mdx +6 -170
  56. package/.docs/raw/evals/03-running-in-ci.mdx +78 -0
  57. package/.docs/raw/getting-started/installation.mdx +24 -13
  58. package/.docs/raw/getting-started/mcp-docs-server.mdx +138 -0
  59. package/.docs/raw/index.mdx +2 -2
  60. package/.docs/raw/local-dev/add-to-existing-project.mdx +48 -0
  61. package/.docs/raw/local-dev/creating-a-new-project.mdx +54 -0
  62. package/.docs/raw/local-dev/mastra-dev.mdx +78 -35
  63. package/.docs/raw/reference/agents/createTool.mdx +128 -89
  64. package/.docs/raw/reference/agents/stream.mdx +19 -18
  65. package/.docs/raw/reference/cli/dev.mdx +58 -21
  66. package/.docs/raw/storage/overview.mdx +331 -0
  67. package/package.json +2 -2
  68. package/.docs/raw/evals/01-supported-evals.mdx +0 -31
  69. package/.docs/raw/local-dev/creating-projects.mdx +0 -74
  70. package/.docs/raw/reference/client-js/index.mdx +0 -127
  71. package/.docs/raw/{local-dev/integrations.mdx → integrations/index.mdx} +0 -0
@@ -1,5 +1,56 @@
  # mastra
 
+ ## 0.4.1-alpha.1
+
+ ### Patch Changes
+
+ - 2f6a8b6: Update port handling in dev command to ensure CLI port takes precedence over environment variables and add warning when overriding PORT environment variable.
+ - Updated dependencies [beaf1c2]
+ - Updated dependencies [3084e13]
+   - @mastra/core@0.6.2-alpha.0
+   - @mastra/deployer@0.2.2-alpha.1
+
+ ## 0.4.1-alpha.0
+
+ ### Patch Changes
+
+ - aede1ea: Add non english support to weather example
+ - Updated dependencies [4e6732b]
+   - @mastra/deployer@0.2.2-alpha.0
+
+ ## 0.4.0
+
+ ### Minor Changes
+
+ - f9b6ab5: add Cerebras as a llm provider to create-mastra@latest
+
+ ### Patch Changes
+
+ - 5052613: Added a new `mastra create --project-name <string>` flag so coder agents can create new Mastra projects with a one line command.
+ - 1291e89: Add resizable-panel to playground-ui and use in agent and workflow sidebars
+ - 1405e46: update the Groq model the create-mastra@latest sets
+ - da8d9bb: Enable public dir copying if it exists
+ - 9ba1e97: update playground ui for mastra and create-mastra
+ - 5baf1ec: animate new traces
+ - 65f2a4c: Add Mastra Docs MCP to the pnpm create mastra TUI with the option to install in Cursor or Windsurf
+ - 9116d70: Handle the different workflow methods in workflow graph
+ - 0709d99: add prop for dynamic empty text
+ - Updated dependencies [cc7f392]
+ - Updated dependencies [fc2f89c]
+ - Updated dependencies [dfbb131]
+ - Updated dependencies [f4854ee]
+ - Updated dependencies [afaf73f]
+ - Updated dependencies [0850b4c]
+ - Updated dependencies [7bcfaee]
+ - Updated dependencies [da8d9bb]
+ - Updated dependencies [44631b1]
+ - Updated dependencies [9116d70]
+ - Updated dependencies [6e559a0]
+ - Updated dependencies [5f43505]
+ - Updated dependencies [61ad5a4]
+   - @mastra/deployer@0.2.1
+   - @mastra/core@0.6.1
+
  ## 0.4.0-alpha.2
 
  ### Patch Changes
@@ -247,56 +298,5 @@
  - Updated dependencies [c151ae6]
  - Updated dependencies [52e0418]
  - Updated dependencies [03236ec]
- - Updated dependencies [3764e71]
- - Updated dependencies [df982db]
- - Updated dependencies [0461849]
- - Updated dependencies [2259379]
- - Updated dependencies [358f069]
-   - @mastra/core@0.5.0-alpha.5
-   - @mastra/deployer@0.1.8-alpha.5
-
- ## 0.2.9-alpha.4
-
- ### Patch Changes
-
- - 144b3d5: Update traces table UI, agent Chat UI
-   Fix get workflows breaking
- - Updated dependencies [d79aedf]
- - Updated dependencies [144b3d5]
-   - @mastra/core@0.5.0-alpha.4
-   - @mastra/deployer@0.1.8-alpha.4
-
- ## 0.2.9-alpha.3
-
- ### Patch Changes
-
- - Updated dependencies [3d0e290]
-   - @mastra/core@0.5.0-alpha.3
-   - @mastra/deployer@0.1.8-alpha.3
-
- ## 0.2.9-alpha.2
-
- ### Patch Changes
-
- - Updated dependencies [02ffb7b]
-   - @mastra/core@0.5.0-alpha.2
-   - @mastra/deployer@0.1.8-alpha.2
-
- ## 0.2.9-alpha.1
-
- ### Patch Changes
-
- - e5149bb: Fix playground-ui agent-evals tab-content
- - Updated dependencies [dab255b]
-   - @mastra/core@0.5.0-alpha.1
-   - @mastra/deployer@0.1.8-alpha.1
-
- ## 0.2.9-alpha.0
-
- ### Patch Changes
-
- - 5fae49e: Configurable timeout on npm create mastra
- - 960690d: Improve client-js workflow watch dx
- - 62565c1: --no-timeout npm create mastra flag
 
- ... 1959 more lines hidden. See full changelog in package directory.
+ ... 2010 more lines hidden. See full changelog in package directory.
@@ -375,9 +375,14 @@ import { z } from 'zod';
  export const cookingTool = createTool({
    id: 'cooking-tool',
    description: 'My tool description',
-   inputSchema: z.object({}),
-   execute: async () => {
-     console.log('My tool is running!');
+   inputSchema: z.object({
+     ingredient: z.string(),
+   }),
+   execute: async ({ context }, options) => {
+     console.log('My tool is running!', context.ingredient);
+     if (options?.toolCallId) {
+       console.log('Tool call ID:', options.toolCallId);
+     }
      return 'My tool result';
    },
  });
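The updated `execute` signature in the hunk above takes a destructured `context` plus an optional options bag. A minimal, self-contained sketch of that shape, assuming stand-in `ToolContext` and `ExecuteOptions` interfaces rather than the actual `@mastra/core` types:

```typescript
// Hypothetical stand-ins for the shapes used by the updated execute signature.
interface ToolContext {
  ingredient: string;
}

interface ExecuteOptions {
  toolCallId?: string;
}

// Mirrors the diff: execute receives { context } plus an optional options
// bag carrying per-call metadata such as the tool call ID.
async function execute(
  { context }: { context: ToolContext },
  options?: ExecuteOptions,
): Promise<string> {
  let log = `running with ${context.ingredient}`;
  if (options?.toolCallId) {
    log += ` (tool call ${options.toolCallId})`;
  }
  return log;
}
```

The second parameter stays optional, so existing single-argument callers keep working.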
@@ -48,6 +48,7 @@ export const weatherAgent = new Agent({
 
  Your primary function is to help users get weather details for specific locations. When responding:
  - Always ask for a location if none is provided
+ - If the location name isn’t in English, please translate it
  - If giving a location with multiple parts (e.g. "New York, NY"), use the most relevant part (e.g. "New York")
  - Include relevant details like humidity, wind conditions, and precipitation
  - Keep responses concise but informative
@@ -35,6 +35,7 @@ export const weatherAgent = new Agent({
 
  Your primary function is to help users get weather details for specific locations. When responding:
  - Always ask for a location if none is provided
+ - If the location name isn’t in English, please translate it
  - If giving a location with multiple parts (e.g. "New York, NY"), use the most relevant part (e.g. "New York")
  - Include relevant details like humidity, wind conditions, and precipitation
  - Keep responses concise but informative
@@ -0,0 +1,120 @@
+ ---
+ title: "MastraClient"
+ description: "Learn how to set up and use the Mastra Client SDK"
+ ---
+
+ # Mastra Client SDK
+
+ The Mastra Client SDK provides a simple and type-safe interface for interacting with your [Mastra Server](/docs/deployment/server) from your client environment.
+
+ ## Development Requirements
+
+ To ensure smooth local development, make sure you have:
+
+ - Node.js 18.x or later installed
+ - TypeScript 4.7+ (if using TypeScript)
+ - A modern browser environment with Fetch API support
+ - Your local Mastra server running (typically on port 4111)
+
+ ## Installation
+
+ import { Tabs } from "nextra/components";
+
+ <Tabs items={["npm", "yarn", "pnpm"]}>
+   <Tabs.Tab>
+     ```bash copy
+     npm install @mastra/client-js
+     ```
+   </Tabs.Tab>
+   <Tabs.Tab>
+     ```bash copy
+     yarn add @mastra/client-js
+     ```
+   </Tabs.Tab>
+   <Tabs.Tab>
+     ```bash copy
+     pnpm add @mastra/client-js
+     ```
+   </Tabs.Tab>
+ </Tabs>
+
+ ## Initialize Mastra Client
+
+ To get started, you'll need to initialize your MastraClient with the necessary parameters:
+
+ ```typescript
+ import { MastraClient } from "@mastra/client-js";
+
+ const client = new MastraClient({
+   baseUrl: "http://localhost:4111", // Default Mastra development server port
+ });
+ ```
+
+ ### Configuration Options
+
+ You can customize the client with various options:
+
+ ```typescript
+ const client = new MastraClient({
+   // Required
+   baseUrl: "http://localhost:4111",
+
+   // Optional configurations for development
+   retries: 3,         // Number of retry attempts
+   backoffMs: 300,     // Initial retry backoff time
+   maxBackoffMs: 5000, // Maximum retry backoff time
+   headers: {          // Custom headers for development
+     "X-Development": "true"
+   }
+ });
+ ```
+
+ ## Example
+
+ Once your MastraClient is initialized, you can start making client calls via the type-safe interface:
+
+ ```typescript
+ // Get a reference to your local agent
+ const agent = client.getAgent("dev-agent-id");
+
+ // Generate responses
+ const response = await agent.generate({
+   messages: [
+     {
+       role: "user",
+       content: "Hello, I'm testing the local development setup!"
+     }
+   ]
+ });
+ ```
+
+ ## Available Features
+
+ The Mastra client exposes all resources served by the Mastra Server:
+
+ - [**Agents**](/docs/reference/client-js/agents): Create and manage AI agents, generate responses, and handle streaming interactions
+ - [**Memory**](/docs/reference/client-js/memory): Manage conversation threads and message history
+ - [**Tools**](/docs/reference/client-js/tools): Access and execute tools available to agents
+ - [**Workflows**](/docs/reference/client-js/workflows): Create and manage automated workflows
+ - [**Vectors**](/docs/reference/client-js/vectors): Handle vector operations for semantic search and similarity matching
+
+ ## Best Practices
+
+ 1. **Error Handling**: Implement proper error handling for development scenarios
+ 2. **Environment Variables**: Use environment variables for configuration
+ 3. **Debugging**: Enable detailed logging when needed
+
+ ```typescript
+ // Example with error handling and logging
+ try {
+   const agent = client.getAgent("dev-agent-id");
+   const response = await agent.generate({
+     messages: [{ role: "user", content: "Test message" }]
+   });
+   console.log("Response:", response);
+ } catch (error) {
+   console.error("Development error:", error);
+ }
+ ```
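The `retries`, `backoffMs`, and `maxBackoffMs` options documented in the new client page above imply a capped, growing retry schedule. The following is a standalone sketch, assuming a doubling (exponential) schedule for illustration; the client's actual implementation may differ:

```typescript
// A minimal sketch of the retry delays the retries / backoffMs /
// maxBackoffMs options suggest. The doubling schedule is an assumption
// for illustration, not the documented client behavior.
function backoffDelays(
  retries: number,
  backoffMs: number,
  maxBackoffMs: number,
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    // Double the delay each attempt, capped at maxBackoffMs.
    delays.push(Math.min(backoffMs * 2 ** attempt, maxBackoffMs));
  }
  return delays;
}
```

Under that assumption, the example configuration (`retries: 3`, `backoffMs: 300`, `maxBackoffMs: 5000`) would wait 300, 600, and 1200 ms between attempts.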
@@ -5,7 +5,7 @@ description: "Configure and customize the Mastra server with middleware and othe
 
  # Mastra Server
 
- When you deploy a Mastra application, it runs as an HTTP server that exposes your agents, workflows, and other functionality as API endpoints. This page explains how to configure and customize the server behavior.
+ While developing or when you deploy a Mastra application, it runs as an HTTP server that exposes your agents, workflows, and other functionality as API endpoints. This page explains how to configure and customize the server behavior.
 
  ## Server Architecture
 
@@ -1,106 +1,89 @@
  ---
  title: "Overview"
- description: "Mastra evals help you measure LLM output quality with metrics for relevance, bias, hallucination, and more."
+ description: "Understanding how to evaluate and measure AI agent quality using Mastra evals."
  ---
 
  # Testing your agents with evals
 
+ While traditional software tests have clear pass/fail conditions, AI outputs are non-deterministic — they can vary with the same input. Evals help bridge this gap by providing quantifiable metrics for measuring agent quality.
+
  Evals are automated tests that evaluate agent outputs using model-graded, rule-based, and statistical methods. Each eval returns a normalized score between 0 and 1 that can be logged and compared. Evals can be customized with your own prompts and scoring functions.
 
  Evals can be run in the cloud, capturing real-time results. But evals can also be part of your CI/CD pipeline, allowing you to test and monitor your agents over time.
 
- ## How to use evals
+ ## Types of Evals
+
+ There are different kinds of evals, each serving a specific purpose. Here are some common types:
+
+ 1. **Textual Evals**: Evaluate accuracy, reliability, and context understanding of agent responses
+ 2. **Classification Evals**: Measure accuracy in categorizing data based on predefined categories
+ 3. **Tool Usage Evals**: Assess how effectively an agent uses external tools or APIs
+ 4. **Prompt Engineering Evals**: Explore the impact of different instructions and input formats
 
- Evals need to be added to an agent. To use any of [the default metrics](/docs/evals/01-supported-evals), you can do the following:
+ ## Getting Started
+
+ Evals need to be added to an agent. Here's an example using the faithfulness, content similarity, and hallucination metrics:
 
  ```typescript copy showLineNumbers filename="src/mastra/agents/index.ts"
  import { Agent } from "@mastra/core/agent";
  import { openai } from "@ai-sdk/openai";
- import { ToneConsistencyMetric } from "@mastra/evals/nlp";
+ import {
+   FaithfulnessMetric,
+   ContentSimilarityMetric,
+   HallucinationMetric
+ } from "@mastra/evals/nlp";
 
  export const myAgent = new Agent({
-   name: "My Agent",
-   instructions: "You are a helpful assistant.",
-   model: openai("gpt-4o-mini"),
-   evals: {
-     tone: new ToneConsistencyMetric()
-   },
- });
- ```
-
- You can now view the evals in the Mastra dashboard, when using `mastra dev`.
-
- ### Executing evals in your CI/CD pipeline
-
- We support any testing framework that supports ESM modules. For example, you can use [Vitest](https://vitest.dev/), [Jest](https://jestjs.io/) or [Mocha](https://mochajs.org/) to run evals in your CI/CD pipeline.
-
- ```typescript copy showLineNumbers filename="src/mastra/agents/index.test.ts"
- import { describe, it, expect } from 'vitest';
- import { evaluate } from '@mastra/core/eval';
- import { myAgent } from './index';
-
- describe('My Agent', () => {
-   it('should be able to validate tone consistency', async () => {
-     const metric = new ToneConsistencyMetric();
-     const result = await evaluate(myAgent, 'Hello, world!', metric)
-
-     expect(result.score).toBe(1);
-   });
+   name: "ContentWriter",
+   instructions: "You are a content writer that creates accurate summaries",
+   model: openai("gpt-4o"),
+   evals: [
+     new FaithfulnessMetric(), // Checks if output matches source material
+     new ContentSimilarityMetric({
+       threshold: 0.8 // Require 80% similarity with expected output
+     }),
+     new HallucinationMetric()
+   ]
  });
-
  ```
 
- You will need to configure a testSetup and globalSetup script for your testing framework to capture the eval results. It allows us to show these results in your mastra dashboard.
-
-
- #### Vitest
+ You can view eval results in the Mastra dashboard when using `mastra dev`.
 
- These are the files you need to add to your project to run evals in your CI/CD pipeline and allow us to capture the results.
- Without these files, the evals will still run and fail when necessary but you won't be able to see the results in the Mastra dashboard.
+ ## Beyond Automated Testing
 
- ```typescript copy showLineNumbers filename="globalSetup.ts"
- import { globalSetup } from '@mastra/evals';
+ While automated evals are valuable, high-performing AI teams often combine them with:
 
- export default function setup() {
-   globalSetup()
- }
- ```
+ 1. **A/B Testing**: Compare different versions with real users
+ 2. **Human Review**: Regular review of production data and traces
+ 3. **Continuous Monitoring**: Track eval metrics over time to detect regressions
 
- ```typescript copy showLineNumbers filename="testSetup.ts"
- import { beforeAll } from 'vitest';
- import { attachListeners } from '@mastra/evals';
+ ## Understanding Eval Results
 
- beforeAll(async () => {
-   await attachListeners();
- });
- ```
+ Each eval metric measures a specific aspect of your agent's output. Here's how to interpret and improve your results:
 
- <details>
- <summary>Store evals in Mastra Storage</summary>
+ ### Understanding Scores
+ For any metric:
+ 1. Check the metric documentation to understand the scoring process
+ 2. Look for patterns in when scores change
+ 3. Compare scores across different inputs and contexts
+ 4. Track changes over time to spot trends
 
- Pass your Mastra instance to store evals in the configured storage:
+ ### Improving Results
+ When scores aren't meeting your targets:
+ 1. Check your instructions - Are they clear? Try making them more specific
+ 2. Look at your context - Is it giving the agent what it needs?
+ 3. Simplify your prompts - Break complex tasks into smaller steps
+ 4. Add guardrails - Include specific rules for tricky cases
 
- ```typescript
- import { mastra } from './your-mastra-setup';
+ ### Maintaining Quality
+ Once you're hitting your targets:
+ 1. Monitor stability - Do scores remain consistent?
+ 2. Document what works - Keep notes on successful approaches
+ 3. Test edge cases - Add examples that cover unusual scenarios
+ 4. Fine-tune - Look for ways to improve efficiency
 
- beforeAll(async () => {
-   // Store evals in Mastra Storage (requires storage to be enabled)
-   await attachListeners(mastra);
- });
- ```
+ See [Textual Evals](/docs/evals/textual-evals) for more info on what evals can do.
 
- This allows you to save evals in Mastra Storage.
- With file storage, evals persist and can be queried later.
- With memory storage, evals are isolated to the test process.
- </details>
+ For more info on how to create your own evals, see the [Custom Evals](/docs/evals/02-custom-eval) guide.
 
- ```typescript copy showLineNumbers filename="vitest.config.ts"
- import { defineConfig } from 'vitest/config'
-
- export default defineConfig({
-   test: {
-     globalSetup: './globalSetup.ts',
-     setupFiles: ['./testSetup.ts'],
-   },
- })
- ```
+ For running evals in your CI pipeline, see the [Running in CI](/docs/evals/running-in-ci) guide.
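The overview above says each eval returns a normalized score between 0 and 1. A minimal, self-contained sketch of a rule-based metric in that spirit; the `MetricResult` shape and `keywordCoverage` function here are illustrative inventions, not the `@mastra/evals` API:

```typescript
// A rule-based eval sketch: score the fraction of required keywords
// present in an agent's output, normalized to the 0-1 range the docs
// describe. Illustrative only, not the @mastra/evals interface.
interface MetricResult {
  score: number; // normalized to 0-1
}

function keywordCoverage(output: string, keywords: string[]): MetricResult {
  if (keywords.length === 0) return { score: 1 };
  const text = output.toLowerCase();
  const hits = keywords.filter((k) => text.includes(k.toLowerCase())).length;
  return { score: hits / keywords.length };
}
```

Because the score is always a fraction of matched keywords, it can be logged and compared across runs the same way model-graded eval scores are.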
@@ -0,0 +1,53 @@
+ ---
+ title: "Textual Evals"
+ description: "Understand how Mastra uses LLM-as-judge methodology to evaluate text quality."
+ ---
+
+ # Textual Evals
+
+ Textual evals use an LLM-as-judge methodology to evaluate agent outputs. This approach leverages language models to assess various aspects of text quality, similar to how a teaching assistant might grade assignments using a rubric.
+
+ Each eval focuses on specific quality aspects and returns a score between 0 and 1, providing quantifiable metrics for non-deterministic AI outputs.
+
+ Mastra provides several eval metrics for assessing agent outputs. Mastra is not limited to these metrics, and you can also [define your own evals](/docs/evals/02-custom-eval).
+
+ ## Why Use Textual Evals?
+
+ Textual evals help ensure your agent:
+ - Produces accurate and reliable responses
+ - Uses context effectively
+ - Follows output requirements
+ - Maintains consistent quality over time
+
+ ## Available Metrics
+
+ ### Accuracy and Reliability
+
+ These metrics evaluate how correct, truthful, and complete your agent's answers are:
+
+ - [`hallucination`](/docs/reference/evals/hallucination): Detects facts or claims not present in provided context
+ - [`faithfulness`](/docs/reference/evals/faithfulness): Measures how accurately responses represent provided context
+ - [`content-similarity`](/docs/reference/evals/content-similarity): Evaluates consistency of information across different phrasings
+ - [`completeness`](/docs/reference/evals/completeness): Checks if responses include all necessary information
+ - [`answer-relevancy`](/docs/reference/evals/answer-relevancy): Assesses how well responses address the original query
+ - [`textual-difference`](/docs/reference/evals/textual-difference): Measures textual differences between strings
+
+ ### Understanding Context
+
+ These metrics evaluate how well your agent uses provided context:
+
+ - [`context-position`](/docs/reference/evals/context-position): Analyzes where context appears in responses
+ - [`context-precision`](/docs/reference/evals/context-precision): Evaluates whether context chunks are grouped logically
+ - [`context-relevancy`](/docs/reference/evals/context-relevancy): Measures use of appropriate context pieces
+ - [`contextual-recall`](/docs/reference/evals/contextual-recall): Assesses completeness of context usage
+
+ ### Output Quality
+
+ These metrics evaluate adherence to format and style requirements:
+
+ - [`tone`](/docs/reference/evals/tone-consistency): Measures consistency in formality, complexity, and style
+ - [`toxicity`](/docs/reference/evals/toxicity): Detects harmful or inappropriate content
+ - [`bias`](/docs/reference/evals/bias): Detects potential biases in the output
+ - [`prompt-alignment`](/docs/reference/evals/prompt-alignment): Checks adherence to explicit instructions like length restrictions, formatting requirements, or other constraints
+ - [`summarization`](/docs/reference/evals/summarization): Evaluates information retention and conciseness
+ - [`keyword-coverage`](/docs/reference/evals/keyword-coverage): Assesses technical terminology usage