npm - @mastra/mcp-docs-server - Versions diffs - 0.13.39 → 1.0.0-beta.1 - Mend

@mastra/mcp-docs-server 0.13.39 → 1.0.0-beta.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (494)

package/.docs/raw/reference/workflows/run-methods/watch.mdx DELETED

@@ -1,73 +0,0 @@
----
-title: "Reference: Run.watch() | Workflows | Mastra Docs"
-description: Documentation for the `Run.watch()` method in workflows, which allows you to monitor the execution of a workflow run.
----
-# Run.watch()
-The `.watch()` method allows you to monitor the execution of a workflow run, providing real-time updates on the status of steps.
-## Usage example
-```typescript showLineNumbers copy
-const run = await workflow.createRunAsync();
-run.watch((event) => {
-  console.log(event?.payload?.currentStep?.id);
-});
-const result = await run.start({ inputData: { value: "initial data" } });
-```
-## Parameters
-<PropertiesTable
-  content={[
-    {
-      name: "callback",
-      type: "(event: WatchEvent) => void",
-      description:
-        "A callback function that is called whenever a step is completed or the workflow state changes. The event parameter contains: type ('watch'), payload (currentStep and workflowState), and eventTimestamp",
-      isOptional: false,
-    },
-    {
-      name: "type",
-      type: "'watch' | 'watch-v2'",
-      description:
-        "The type of watch events to listen for. 'watch' for step completion events, 'watch-v2' for data stream events",
-      isOptional: true,
-      defaultValue: "'watch'",
-    },
-  ]}
-/>
-## Returns
-<PropertiesTable
-  content={[
-    {
-      name: "unwatch",
-      type: "() => void",
-      description:
-        "A function that can be called to stop watching the workflow run",
-    },
-  ]}
-/>
-## Extended usage example
-```typescript showLineNumbers copy
-const run = await workflow.createRunAsync();
-run.watch((event) => {
-  console.log(event?.payload?.currentStep?.id);
-}, "watch");
-const result = await run.start({ inputData: { value: "initial data" } });
-```
-## Related
-- [Workflows overview](/docs/workflows/overview#running-workflows)
-- [Workflow.createRunAsync()](../workflow-methods/create-run)
-- [Watch Workflow](/docs/workflows/overview)
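The deleted `Run.watch()` page shows the callback but never uses the returned `unwatch` function. The sketch below, written against the 0.13.x shapes described on that page, records step ids and then stops listening. `WatchEvent`, `WatchableRun`, `trackSteps`, and `FakeRun` are hypothetical local names, not exports of `@mastra/core`. `FakeRun` is an in-memory stand-in so the sketch runs without Mastra. This diff does not show whether `run.watch()` still exists in 1.0.0-beta.1.

```typescript
// Local stand-ins for the event shape described in the deleted Run.watch()
// reference (type, payload.currentStep / payload.workflowState, eventTimestamp).
// These are assumptions for illustration, not imports from @mastra/core.
interface WatchEvent {
  type: "watch";
  payload?: {
    currentStep?: { id: string };
    workflowState?: unknown;
  };
  eventTimestamp?: number;
}

type Unwatch = () => void;

interface WatchableRun {
  watch(callback: (event: WatchEvent) => void, type?: "watch" | "watch-v2"): Unwatch;
}

// Records the id of every step reported by "watch" events and exposes the
// unwatch function returned by run.watch() as stop().
function trackSteps(run: WatchableRun): { stepIds: string[]; stop: Unwatch } {
  const stepIds: string[] = [];
  const stop = run.watch((event) => {
    const id = event?.payload?.currentStep?.id;
    if (id !== undefined) stepIds.push(id);
  }, "watch");
  return { stepIds, stop };
}

// Hypothetical in-memory run, used only to exercise trackSteps() without Mastra.
class FakeRun implements WatchableRun {
  private listeners = new Set<(event: WatchEvent) => void>();

  watch(callback: (event: WatchEvent) => void): Unwatch {
    this.listeners.add(callback);
    return () => {
      this.listeners.delete(callback);
    };
  }

  emitStep(id: string): void {
    this.listeners.forEach((listener) =>
      listener({ type: "watch", payload: { currentStep: { id } }, eventTimestamp: Date.now() }),
    );
  }
}

const run = new FakeRun();
const tracker = trackSteps(run);
run.emitStep("fetch");
run.emitStep("summarize");
tracker.stop(); // events emitted after this are no longer recorded
run.emitStep("ignored");
console.log(tracker.stepIds.join(", ")); // fetch, summarize
```

With a real 0.13.x run, you would pass the object returned by `workflow.createRunAsync()` to `trackSteps()` before calling `run.start(...)`. This assumes its event type matches the shape above. You would then call `stop()` once the steps you care about have been reported.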

package/.docs/raw/scorers/evals-old-api/custom-eval.mdx DELETED

@@ -1,24 +0,0 @@
----
-title: "Create a Custom Eval | Scorers | Mastra Docs"
-description: "Mastra allows you to create your own evals. Here is how."
----
-# Create a Custom Eval
-:::info Scorers
-This documentation refers to the legacy evals API. For the latest scorer features, see [Scorers](/docs/scorers/overview).
-:::
-Create a custom eval by extending the `Metric` class and implementing the `measure` method. This gives you full control over how scores are calculated and what information is returned. For LLM-based evaluations, extend the `MastraAgentJudge` class to define how the model reasons and scores output.
-## Native JavaScript evaluation
-You can write lightweight custom metrics using plain JavaScript/TypeScript. These are ideal for simple string comparisons, pattern checks, or other rule-based logic.
-See our [Word Inclusion example](/examples/evals/custom-native-javascript-eval), which scores responses based on the number of reference words found in the output.
-## LLM as a judge evaluation
-For more complex evaluations, you can build a judge powered by an LLM. This lets you capture more nuanced criteria, like factual accuracy, tone, or reasoning.
-See the [Real World Countries example](/examples/evals/custom-llm-judge-eval) for a complete walkthrough of building a custom judge and metric that evaluates real-world factual accuracy.
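The deleted Custom Eval page links to a Word Inclusion example instead of showing code. A minimal rule-based metric in that spirit is sketched below. It assumes the score is the fraction of reference words found in the output; the linked example may compute it differently. `Metric` and `MetricResult` are local stand-ins for the legacy base class the page mentions. Their exact shape here, with `measure(input, output)` resolving to `{ score, info }`, is an assumption.

```typescript
// Local stand-ins for the legacy eval types (assumed shape, not imported).
interface MetricResult {
  score: number; // normalized to 0-1
  info?: Record<string, unknown>;
}

abstract class Metric {
  abstract measure(input: string, output: string): Promise<MetricResult>;
}

// Rule-based metric: the fraction of reference words that appear in the output.
class WordInclusionMetric extends Metric {
  private readonly references: string[];

  constructor(referenceWords: string[]) {
    super();
    // Lowercase and de-duplicate the reference words.
    this.references = Array.from(new Set(referenceWords.map((w) => w.toLowerCase())));
  }

  // Synchronous core so the scoring logic is easy to test directly.
  computeScore(output: string): MetricResult {
    const tokens = new Set(output.toLowerCase().match(/[a-z0-9']+/g) ?? []);
    const matched = this.references.filter((word) => tokens.has(word));
    // With no reference words there is nothing to match, so score 0.
    const score = this.references.length === 0 ? 0 : matched.length / this.references.length;
    return { score, info: { matched, total: this.references.length } };
  }

  async measure(_input: string, output: string): Promise<MetricResult> {
    return this.computeScore(output);
  }
}

const metric = new WordInclusionMetric(["Paris", "France", "Berlin"]);
const { score } = metric.computeScore("Paris is the capital of France.");
console.log(score.toFixed(2)); // 0.67
```

Exact-token matching keeps the metric deterministic and cheap. The trade-off is that it misses synonyms and inflected forms, which is where an LLM-judge metric is more appropriate.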

package/.docs/raw/scorers/evals-old-api/overview.mdx DELETED

@@ -1,106 +0,0 @@
----
-title: "Testing your agents with evals | Scorers | Mastra Docs"
-description: "Understanding how to evaluate and measure AI agent quality using Mastra evals."
----
-# Testing your agents with evals
-:::info Scorers
-This documentation refers to the legacy evals API. For the latest scorer features, see [Scorers](/docs/scorers/overview).
-:::
-While traditional software tests have clear pass/fail conditions, AI outputs are non-deterministic — they can vary with the same input. Evals help bridge this gap by providing quantifiable metrics for measuring agent quality.
-Evals are automated tests that evaluate agent outputs using model-graded, rule-based, and statistical methods. Each eval returns a normalized score between 0 and 1 that can be logged and compared. Evals can be customized with your own prompts and scoring functions.
-Evals can run in the cloud, capturing results in real time, or as part of your CI/CD pipeline, allowing you to test and monitor your agents over time.
-## Types of Evals
-There are different kinds of evals, each serving a specific purpose. Here are some common types:
-1. **Textual Evals**: Evaluate accuracy, reliability, and context understanding of agent responses
-2. **Classification Evals**: Measure accuracy in categorizing data based on predefined categories
-3. **Prompt Engineering Evals**: Explore the impact of different instructions and input formats
-## Installation
-To use Mastra's evals feature, install the `@mastra/evals` package.
-```bash copy
-npm install @mastra/evals@latest
-```
-## Getting Started
-Evals need to be added to an agent. Here's an example using the summarization, content similarity, and tone consistency metrics:
-```typescript copy showLineNumbers title="src/mastra/agents/index.ts"
-import { Agent } from "@mastra/core/agent";
-import { openai } from "@ai-sdk/openai";
-import { SummarizationMetric } from "@mastra/evals/llm";
-import {
-  ContentSimilarityMetric,
-  ToneConsistencyMetric,
-} from "@mastra/evals/nlp";
-const model = openai("gpt-4o");
-export const myAgent = new Agent({
-  name: "ContentWriter",
-  instructions: "You are a content writer that creates accurate summaries",
-  model,
-  evals: {
-    summarization: new SummarizationMetric(model),
-    contentSimilarity: new ContentSimilarityMetric(),
-    tone: new ToneConsistencyMetric(),
-  },
-});
-```
-You can view eval results in the Mastra dashboard when using `mastra dev`.
-## Beyond Automated Testing
-While automated evals are valuable, high-performing AI teams often combine them with:
-1. **A/B Testing**: Compare different versions with real users
-2. **Human Review**: Regular review of production data and traces
-3. **Continuous Monitoring**: Track eval metrics over time to detect regressions
-## Understanding Eval Results
-Each eval metric measures a specific aspect of your agent's output. Here's how to interpret and improve your results:
-### Understanding Scores
-For any metric:
-1. Check the metric documentation to understand the scoring process
-2. Look for patterns in when scores change
-3. Compare scores across different inputs and contexts
-4. Track changes over time to spot trends
-### Improving Results
-When scores aren't meeting your targets:
-1. Check your instructions - Are they clear? Try making them more specific
-2. Look at your context - Is it giving the agent what it needs?
-3. Simplify your prompts - Break complex tasks into smaller steps
-4. Add guardrails - Include specific rules for tricky cases
-### Maintaining Quality
-Once you're hitting your targets:
-1. Monitor stability - Do scores remain consistent?
-2. Document what works - Keep notes on successful approaches
-3. Test edge cases - Add examples that cover unusual scenarios
-4. Fine-tune - Look for ways to improve efficiency
-See [Textual Evals](/docs/scorers/evals-old-api/textual-evals) for more info on what evals can do.
-For more info on how to create your own evals, see the [Custom Evals](/docs/scorers/evals-old-api/custom-eval) guide.
-For running evals in your CI pipeline, see the [Running in CI](/docs/scorers/evals-old-api/running-in-ci) guide.
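The deleted overview lists continuous monitoring of eval metrics as a way to detect regressions. Because every eval returns a normalized 0-1 score, a simple check compares the latest score with a rolling baseline. The sketch below assumes you already persist results somewhere. `EvalRecord` and `detectRegression` are hypothetical, and the defaults (a window of 5 runs, a tolerance of 0.1) are arbitrary. The `direction` parameter covers metrics where lower is better, such as toxicity or bias.

```typescript
// Hypothetical record of a single eval result; not a Mastra type.
interface EvalRecord {
  metric: string;
  score: number; // normalized 0-1
  timestamp: number; // ms since epoch
}

type Direction = "higher-is-better" | "lower-is-better";

// Returns true when the most recent score for `metric` is worse than the mean
// of the preceding `window` scores by more than `tolerance`.
function detectRegression(
  history: EvalRecord[],
  metric: string,
  direction: Direction = "higher-is-better",
  window = 5,
  tolerance = 0.1,
): boolean {
  // filter() copies the array, so sorting does not mutate `history`.
  const scores = history
    .filter((r) => r.metric === metric)
    .sort((a, b) => a.timestamp - b.timestamp)
    .map((r) => r.score);
  if (scores.length < 2) return false; // nothing to compare against yet

  const latest = scores[scores.length - 1];
  const previous = scores.slice(Math.max(0, scores.length - 1 - window), -1);
  const baseline = previous.reduce((sum, s) => sum + s, 0) / previous.length;
  const worsening = direction === "higher-is-better" ? baseline - latest : latest - baseline;
  return worsening > tolerance;
}

const history: EvalRecord[] = [
  { metric: "tone", score: 0.9, timestamp: 1 },
  { metric: "tone", score: 0.92, timestamp: 2 },
  { metric: "tone", score: 0.88, timestamp: 3 },
  { metric: "tone", score: 0.6, timestamp: 4 },
];
console.log(detectRegression(history, "tone")); // true
```

A rolling mean smooths over run-to-run noise in non-deterministic outputs. A fixed absolute threshold is simpler, but it does not adapt as an agent improves.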

package/.docs/raw/scorers/evals-old-api/running-in-ci.mdx DELETED

@@ -1,85 +0,0 @@
----
-title: "Running Evals in CI | Scorers | Mastra Docs"
-description: "Learn how to run Mastra evals in your CI/CD pipeline to monitor agent quality over time."
----
-# Running Evals in CI
-:::info Scorers
-This documentation refers to the legacy evals API. For the latest scorer features, see [Scorers](/docs/scorers/overview).
-:::
-Because AI outputs are non-deterministic, running evals in your CI pipeline provides quantifiable metrics for measuring agent quality over time.
-## Setting Up CI Integration
-Any testing framework that supports ES modules works. For example, you can use [Vitest](https://vitest.dev/), [Jest](https://jestjs.io/) or [Mocha](https://mochajs.org/) to run evals in your CI/CD pipeline.
-```typescript copy showLineNumbers title="src/mastra/agents/index.test.ts"
-import { describe, it, expect } from "vitest";
-import { evaluate } from "@mastra/evals";
-import { ToneConsistencyMetric } from "@mastra/evals/nlp";
-import { myAgent } from "./index";
-describe("My Agent", () => {
-  it("should validate tone consistency", async () => {
-    const metric = new ToneConsistencyMetric();
-    const result = await evaluate(myAgent, "Hello, world!", metric);
-    expect(result.score).toBe(1);
-  });
-});
-```
-You need to configure `globalSetup` and `testSetup` scripts for your testing framework to capture the eval results. This allows the results to appear in your Mastra dashboard.
-## Framework Configuration
-### Vitest Setup
-Add these files to your project to run evals in your CI/CD pipeline:
-```typescript copy showLineNumbers title="globalSetup.ts"
-import { globalSetup } from "@mastra/evals";
-export default function setup() {
-  globalSetup();
-}
-```
-```typescript copy showLineNumbers title="testSetup.ts"
-import { beforeAll } from "vitest";
-import { attachListeners } from "@mastra/evals";
-beforeAll(async () => {
-  await attachListeners();
-});
-```
-```typescript copy showLineNumbers title="vitest.config.ts"
-import { defineConfig } from "vitest/config";
-export default defineConfig({
-  test: {
-    globalSetup: "./globalSetup.ts",
-    setupFiles: ["./testSetup.ts"],
-  },
-});
-```
-## Storage Configuration
-To store eval results in Mastra Storage and capture results in the Mastra dashboard:
-```typescript copy showLineNumbers title="testSetup.ts"
-import { beforeAll } from "vitest";
-import { attachListeners } from "@mastra/evals";
-import { mastra } from "./your-mastra-setup";
-beforeAll(async () => {
-  // Store evals in Mastra Storage (requires storage to be enabled)
-  await attachListeners(mastra);
-});
-```
-With file storage, evals persist and can be queried later. With memory storage, evals are isolated to the test process.
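The deleted CI example asserts `expect(result.score).toBe(1)`. Agent outputs can vary from run to run, so an exact-equality assertion may be brittle for some metrics. One alternative is to repeat an eval and assert on a minimum mean plus a per-run floor. `summarizeScores` is a hypothetical helper, not part of `@mastra/evals`, and the thresholds are placeholders to tune per metric.

```typescript
// Hypothetical CI helper: summarizes repeated eval scores for one test case
// and checks them against a minimum mean and a per-run floor.
interface ScoreSummary {
  mean: number;
  worst: number;
  passed: boolean;
}

function summarizeScores(scores: number[], minMean: number, floor = 0): ScoreSummary {
  if (scores.length === 0) throw new Error("summarizeScores needs at least one score");
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  const worst = Math.min(...scores);
  // Pass only if the average is high enough and no single run is unacceptably low.
  return { mean, worst, passed: mean >= minMean && worst >= floor };
}

console.log(summarizeScores([0.9, 0.8, 1], 0.85).passed); // true
```

In the deleted Vitest test, you could collect `result.score` from several `evaluate(myAgent, ..., metric)` calls into an array and then assert `expect(summarizeScores(scores, 0.8, 0.5).passed).toBe(true)`. Repeating calls costs more model invocations per CI run, so keep the repeat count small.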

package/.docs/raw/scorers/evals-old-api/textual-evals.mdx DELETED

@@ -1,58 +0,0 @@
----
-title: "Textual Evals | Scorers | Mastra Docs"
-description: "Understand how Mastra uses LLM-as-judge methodology to evaluate text quality."
----
-# Textual Evals
-:::info Scorers
-This documentation refers to the legacy evals API. For the latest scorer features, see [Scorers](/docs/scorers/overview).
-:::
-Textual evals use an LLM-as-judge methodology to evaluate agent outputs. This approach leverages language models to assess various aspects of text quality, similar to how a teaching assistant might grade assignments using a rubric.
-Each eval focuses on specific quality aspects and returns a score between 0 and 1, providing quantifiable metrics for non-deterministic AI outputs.
-Mastra provides several eval metrics for assessing agent outputs. You are not limited to these metrics: you can also [define your own evals](/docs/scorers/evals-old-api/custom-eval).
-## Why Use Textual Evals?
-Textual evals help ensure your agent:
-- Produces accurate and reliable responses
-- Uses context effectively
-- Follows output requirements
-- Maintains consistent quality over time
-## Available Metrics
-### Accuracy and Reliability
-These metrics evaluate how correct, truthful, and complete your agent's answers are:
-- [`hallucination`](/reference/evals/hallucination): Detects facts or claims not present in provided context
-- [`faithfulness`](/reference/evals/faithfulness): Measures how accurately responses represent provided context
-- [`content-similarity`](/reference/evals/content-similarity): Evaluates consistency of information across different phrasings
-- [`completeness`](/reference/evals/completeness): Checks if responses include all necessary information
-- [`answer-relevancy`](/reference/evals/answer-relevancy): Assesses how well responses address the original query
-- [`textual-difference`](/reference/evals/textual-difference): Measures textual differences between strings
-### Understanding Context
-These metrics evaluate how well your agent uses provided context:
-- [`context-position`](/reference/evals/context-position): Analyzes where context appears in responses
-- [`context-precision`](/reference/evals/context-precision): Evaluates whether context chunks are grouped logically
-- [`context-relevancy`](/reference/evals/context-relevancy): Measures use of appropriate context pieces
-- [`contextual-recall`](/reference/evals/contextual-recall): Assesses completeness of context usage
-### Output Quality
-These metrics evaluate adherence to format and style requirements:
-- [`tone`](/reference/evals/tone-consistency): Measures consistency in formality, complexity, and style
-- [`toxicity`](/reference/evals/toxicity): Detects harmful or inappropriate content
-- [`bias`](/reference/evals/bias): Detects potential biases in the output
-- [`prompt-alignment`](/reference/evals/prompt-alignment): Checks adherence to explicit instructions like length restrictions, formatting requirements, or other constraints
-- [`summarization`](/reference/evals/summarization): Evaluates information retention and conciseness
-- [`keyword-coverage`](/reference/evals/keyword-coverage): Assesses technical terminology usage
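The deleted Textual Evals page describes LLM-as-judge scoring only in prose, comparing it to grading with a rubric. The sketch below shows one common pattern for that idea. The model returns a pass/fail verdict per rubric criterion, and the score is the fraction of criteria that pass. `Criterion`, `Verdict`, `JudgeModel`, `buildJudgePrompt`, `scoreVerdicts`, and `judge` are hypothetical names. The model call is injected, so no provider SDK or Mastra API is assumed, and this is not necessarily how Mastra's built-in metrics work internally.

```typescript
interface Criterion {
  name: string;
  description: string;
}

interface Verdict {
  criterion: string;
  pass: boolean;
  reason?: string;
}

// Any function that sends a prompt to a model and resolves to its text reply.
type JudgeModel = (prompt: string) => Promise<string>;

// Builds a rubric-style grading prompt that asks for JSON verdicts only.
function buildJudgePrompt(criteria: Criterion[], input: string, output: string): string {
  const rubric = criteria.map((c) => `- ${c.name}: ${c.description}`).join("\n");
  return [
    "You are grading an AI assistant's answer against a rubric.",
    `Rubric:\n${rubric}`,
    `User input:\n${input}`,
    `Assistant output:\n${output}`,
    'Reply with JSON only: {"verdicts":[{"criterion":"<name>","pass":true|false,"reason":"<short>"}]}',
  ].join("\n\n");
}

// Converts the judge's JSON reply into a 0-1 score. Criteria the judge omitted
// count as failures, so the denominator is always the full rubric.
// JSON.parse throws on malformed replies, surfacing judge errors instead of hiding them.
function scoreVerdicts(raw: string, criteria: Criterion[]): number {
  if (criteria.length === 0) return 0;
  const parsed = JSON.parse(raw) as { verdicts?: Verdict[] };
  const passed = new Set(
    (parsed.verdicts ?? []).filter((v) => v.pass === true).map((v) => v.criterion),
  );
  return criteria.filter((c) => passed.has(c.name)).length / criteria.length;
}

async function judge(model: JudgeModel, criteria: Criterion[], input: string, output: string): Promise<number> {
  const reply = await model(buildJudgePrompt(criteria, input, output));
  return scoreVerdicts(reply, criteria);
}

const rubric: Criterion[] = [
  { name: "accurate", description: "Claims are supported by the provided context." },
  { name: "concise", description: "The answer stays under three sentences." },
];
const reply = '{"verdicts":[{"criterion":"accurate","pass":true},{"criterion":"concise","pass":false}]}';
console.log(scoreVerdicts(reply, rubric)); // 0.5
```

Per-criterion binary verdicts tend to be easier for a model to produce consistently than a single free-form number. The trade-off is coarser resolution when the rubric is short.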

package/.docs/raw/scorers/off-the-shelf-scorers.mdx DELETED

@@ -1,50 +0,0 @@
----
-title: "Built-in Scorers | Scorers | Mastra Docs"
-description: "Overview of Mastra's ready-to-use scorers for evaluating AI outputs across quality, safety, and performance dimensions."
----
-# Built-in Scorers
-Mastra provides a comprehensive set of built-in scorers for evaluating AI outputs. These scorers are optimized for common evaluation scenarios and are ready to use in your agents and workflows.
-## Available Scorers
-### Accuracy and Reliability
-These scorers evaluate how correct, truthful, and complete your agent's answers are:
-- [`answer-relevancy`](/reference/scorers/answer-relevancy): Evaluates how well responses address the input query (`0-1`, higher is better)
-- [`answer-similarity`](/reference/scorers/answer-similarity): Compares agent outputs against ground truth answers for CI/CD testing using semantic analysis (`0-1`, higher is better)
-- [`faithfulness`](/reference/scorers/faithfulness): Measures how accurately responses represent provided context (`0-1`, higher is better)
-- [`hallucination`](/reference/scorers/hallucination): Detects factual contradictions and unsupported claims (`0-1`, lower is better)
-- [`completeness`](/reference/scorers/completeness): Checks if responses include all necessary information (`0-1`, higher is better)
-- [`content-similarity`](/reference/scorers/content-similarity): Measures textual similarity using character-level matching (`0-1`, higher is better)
-- [`textual-difference`](/reference/scorers/textual-difference): Measures textual differences between strings (`0-1`, higher means more similar)
-- [`tool-call-accuracy`](/reference/scorers/tool-call-accuracy): Evaluates whether the LLM selects the correct tool from available options (`0-1`, higher is better)
-- [`prompt-alignment`](/reference/scorers/prompt-alignment): Measures how well agent responses align with user prompt intent, requirements, completeness, and format (`0-1`, higher is better)
-### Context Quality
-These scorers evaluate the quality and relevance of context used in generating responses:
-- [`context-precision`](/reference/scorers/context-precision): Evaluates context relevance and ranking using Mean Average Precision, rewarding early placement of relevant context (`0-1`, higher is better)
-- [`context-relevance`](/reference/scorers/context-relevance): Measures context utility with nuanced relevance levels, usage tracking, and missing context detection (`0-1`, higher is better)
-> tip Context Scorer Selection
-- Use **Context Precision** when context ordering matters and you need standard IR metrics (ideal for RAG ranking evaluation)
-- Use **Context Relevance** when you need detailed relevance assessment and want to track context usage and identify gaps
-Both context scorers support:
-- **Static context**: Pre-defined context arrays
-- **Dynamic context extraction**: Extract context from runs using custom functions (ideal for RAG systems, vector databases, etc.)
-### Output Quality
-These scorers evaluate adherence to format, style, and safety requirements:
-- [`tone-consistency`](/reference/scorers/tone-consistency): Measures consistency in formality, complexity, and style (`0-1`, higher is better)
-- [`toxicity`](/reference/scorers/toxicity): Detects harmful or inappropriate content (`0-1`, lower is better)
-- [`bias`](/reference/scorers/bias): Detects potential biases in the output (`0-1`, lower is better)
-- [`keyword-coverage`](/reference/scorers/keyword-coverage): Assesses technical terminology usage (`0-1`, higher is better)