npm - @mastra/mcp-docs-server - Versions diffs - 0.13.39 → 1.0.0-beta.0 - Mend

@mastra/mcp-docs-server 0.13.39 → 1.0.0-beta.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (480)

package/.docs/raw/reference/evals/keyword-coverage.mdx CHANGED

@@ -1,94 +1,68 @@
 ---
-title: "Reference: KeywordCoverageMetric | Evals | Mastra Docs"
-description: Documentation for the Keyword Coverage Metric in Mastra, which evaluates how well LLM outputs cover important keywords from the input.
+title: "Reference: Keyword Coverage Scorer | Evals | Mastra Docs"
+description: Documentation for the Keyword Coverage Scorer in Mastra, which evaluates how well LLM outputs cover important keywords from the input.
 ---
-# KeywordCoverageMetric
+# Keyword Coverage Scorer
-:::info Scorers
-This documentation refers to the legacy evals API. For the latest scorer features, see [Scorers](/docs/scorers/overview).
-:::
+The `createKeywordCoverageScorer()` function evaluates how well an LLM's output covers the important keywords from the input. It analyzes keyword presence and matches while ignoring common words and stop words.
-The `KeywordCoverageMetric` class evaluates how well an LLM's output covers the important keywords from the input. It analyzes keyword presence and matches while ignoring common words and stop words.
+## Parameters
-## Basic Usage
+The `createKeywordCoverageScorer()` function does not take any options.
-```typescript
-import { KeywordCoverageMetric } from "@mastra/evals/nlp";
-const metric = new KeywordCoverageMetric();
-const result = await metric.measure(
-  "What are the key features of Python programming language?",
-  "Python is a high-level programming language known for its simple syntax and extensive libraries.",
-);
-console.log(result.score); // Coverage score from 0-1
-console.log(result.info); // Object containing detailed metrics about keyword coverage
-```
+This function returns an instance of the MastraScorer class. See the [MastraScorer reference](./mastra-scorer) for details on the `.run()` method and its input/output.
-## measure() Parameters
+## .run() Returns
 <PropertiesTable
   content={[
     {
-      name: "input",
+      name: "runId",
       type: "string",
-      description: "The original text containing keywords to be matched",
-      isOptional: false,
+      description: "The id of the run (optional).",
     },
     {
-      name: "output",
-      type: "string",
-      description: "The text to evaluate for keyword coverage",
-      isOptional: false,
+      name: "preprocessStepResult",
+      type: "object",
+      description:
+        "Object with extracted keywords: { referenceKeywords: Set<string>, responseKeywords: Set<string> }",
     },
-  ]}
-/>
-## Returns
-<PropertiesTable
-  content={[
     {
-      name: "score",
-      type: "number",
+      name: "analyzeStepResult",
+      type: "object",
       description:
-        "Coverage score (0-1) representing the proportion of matched keywords",
+        "Object with keyword coverage: { totalKeywords: number, matchedKeywords: number }",
     },
     {
-      name: "info",
-      type: "object",
-      description: "Object containing detailed metrics about keyword coverage",
-      properties: [
-        {
-          type: "number",
-          parameters: [
-            {
-              name: "matchedKeywords",
-              type: "number",
-              description: "Number of keywords found in the output",
-            },
-          ],
-        },
-        {
-          type: "number",
-          parameters: [
-            {
-              name: "totalKeywords",
-              type: "number",
-              description: "Total number of keywords from the input",
-            },
-          ],
-        },
-      ],
+      name: "score",
+      type: "number",
+      description:
+        "Coverage score (0-1) representing the proportion of matched keywords.",
     },
   ]}
 />
+`.run()` returns a result in the following shape:
+```typescript
+{
+  runId: string,
+  preprocessStepResult: {
+    referenceKeywords: Set<string>,
+    responseKeywords: Set<string>
+  },
+  analyzeStepResult: {
+    totalKeywords: number,
+    matchedKeywords: number
+  },
+  score: number
+}
+```
 ## Scoring Details
-The metric evaluates keyword coverage by matching keywords with the following features:
+The scorer evaluates keyword coverage by matching keywords with the following features:
 - Common word and stop word filtering (e.g., "the", "a", "and")
 - Case-insensitive matching
@@ -101,7 +75,6 @@ The metric evaluates keyword coverage by matching keywords with the following fe
    - Filters out common words and stop words
    - Normalizes case and word forms
    - Handles special terms and compounds
 2. Calculates keyword coverage:
    - Matches keywords between texts
    - Counts successful matches
@@ -111,75 +84,146 @@ Final score: `(matched_keywords / total_keywords) * scale`
 ### Score interpretation
-(0 to scale, default 0-1)
+A coverage score between 0 and 1:
+- **1.0**: Complete coverage – all keywords present.
+- **0.7–0.9**: High coverage – most keywords included.
+- **0.4–0.6**: Partial coverage – some keywords present.
+- **0.1–0.3**: Low coverage – few keywords matched.
+- **0.0**: No coverage – no keywords found.
+### Special Cases
+The scorer handles several special cases:
+- Empty input/output: Returns score of 1.0 if both empty, 0.0 if only one is empty
+- Single word: Treated as a single keyword
+- Technical terms: Preserves compound technical terms (e.g., "React.js", "machine learning")
+- Case differences: "JavaScript" matches "javascript"
+- Common words: Ignored in scoring to focus on meaningful keywords
+## Examples
+### Full coverage example
+In this example, the response fully reflects the key terms from the input. All required keywords are present, resulting in complete coverage with no omissions.
+```typescript title="src/example-full-keyword-coverage.ts" showLineNumbers copy
+import { createKeywordCoverageScorer } from "@mastra/evals/scorers/prebuilt";
-- 1.0: Perfect keyword coverage
-- 0.7-0.9: Good coverage with most keywords present
-- 0.4-0.6: Moderate coverage with some keywords missing
-- 0.1-0.3: Poor coverage with many keywords missing
-- 0.0: No keyword matches
+const scorer = createKeywordCoverageScorer();
-## Examples with Analysis
+const input = "JavaScript frameworks like React and Vue";
+const output =
+  "Popular JavaScript frameworks include React and Vue for web development";
+const result = await scorer.run({
+  input: [{ role: "user", content: input }],
+  output: { role: "assistant", text: output },
+});
+console.log("Score:", result.score);
+console.log("AnalyzeStepResult:", result.analyzeStepResult);
+```
+#### Full coverage output
+A score of 1 indicates that all expected keywords were found in the response. The `analyzeStepResult` field confirms that the number of matched keywords equals the total number extracted from the input.
 ```typescript
-import { KeywordCoverageMetric } from "@mastra/evals/nlp";
+{
+  score: 1,
+  analyzeStepResult: {
+    totalKeywords: 4,
+    matchedKeywords: 4
+  }
+}
+```
-const metric = new KeywordCoverageMetric();
+### Partial coverage example
+In this example, the response includes some, but not all, of the important keywords from the input. The score reflects partial coverage, with key terms either missing or only partially matched.
+```typescript title="src/example-partial-keyword-coverage.ts" showLineNumbers copy
+import { createKeywordCoverageScorer } from "@mastra/evals/scorers/prebuilt";
+const scorer = createKeywordCoverageScorer();
-// Perfect coverage example
-const result1 = await metric.measure(
-  "The quick brown fox jumps over the lazy dog",
-  "A quick brown fox jumped over a lazy dog",
-);
-// {
-//   score: 1.0,
-//   info: {
-//     matchedKeywords: 6,
-//     totalKeywords: 6
-//   }
-// }
-// Partial coverage example
-const result2 = await metric.measure(
-  "Python features include easy syntax, dynamic typing, and extensive libraries",
-  "Python has simple syntax and many libraries",
-);
-// {
-//   score: 0.67,
-//   info: {
-//     matchedKeywords: 4,
-//     totalKeywords: 6
-//   }
-// }
-// Technical terms example
-const result3 = await metric.measure(
-  "Discuss React.js component lifecycle and state management",
-  "React components have lifecycle methods and manage state",
-);
-// {
-//   score: 1.0,
-//   info: {
-//     matchedKeywords: 4,
-//     totalKeywords: 4
-//   }
-// }
+const input = "TypeScript offers interfaces, generics, and type inference";
+const output = "TypeScript provides type inference and some advanced features";
+const result = await scorer.run({
+  input: [{ role: "user", content: input }],
+  output: { role: "assistant", text: output },
+});
+console.log("Score:", result.score);
+console.log("AnalyzeStepResult:", result.analyzeStepResult);
 ```
-## Special Cases
+#### Partial coverage output
-The metric handles several special cases:
+A score of 0.5 indicates that only half of the expected keywords were found in the response. The `analyzeStepResult` field shows how many terms were matched compared to the total identified in the input.
-- Empty input/output: Returns score of 1.0 if both empty, 0.0 if only one is empty
-- Single word: Treated as a single keyword
-- Technical terms: Preserves compound technical terms (e.g., "React.js", "machine learning")
-- Case differences: "JavaScript" matches "javascript"
-- Common words: Ignored in scoring to focus on meaningful keywords
+```typescript
+{
+  score: 0.5,
+  analyzeStepResult: {
+    totalKeywords: 6,
+    matchedKeywords: 3
+  }
+}
+```
+### Minimal coverage example
+In this example, the response includes very few of the important keywords from the input. The score reflects minimal coverage, with most key terms missing or unaccounted for.
+```typescript title="src/example-minimal-keyword-coverage.ts" showLineNumbers copy
+import { createKeywordCoverageScorer } from "@mastra/evals/scorers/prebuilt";
+const scorer = createKeywordCoverageScorer();
+const input =
+  "Machine learning models require data preprocessing, feature engineering, and hyperparameter tuning";
+const output = "Data preparation is important for models";
+const result = await scorer.run({
+  input: [{ role: "user", content: input }],
+  output: { role: "assistant", text: output },
+});
+console.log("Score:", result.score);
+console.log("AnalyzeStepResult:", result.analyzeStepResult);
+```
+#### Minimal coverage output
+A low score indicates that only a small number of the expected keywords were present in the response. The `analyzeStepResult` field highlights the gap between total and matched keywords, signaling insufficient coverage.
+```typescript
+{
+  score: 0.2,
+  analyzeStepResult: {
+    totalKeywords: 10,
+    matchedKeywords: 2
+  }
+}
+```
+### Scorer configuration
+You can create a keyword coverage scorer with `createKeywordCoverageScorer()` using default settings. No additional configuration is required.
+```typescript
+const scorer = createKeywordCoverageScorer();
+```
+> See [KeywordCoverageScorer](/reference/v1/evals/keyword-coverage) for a full list of configuration options.
 ## Related
-- [Completeness Metric](./completeness)
-- [Content Similarity Metric](./content-similarity)
-- [Answer Relevancy Metric](./answer-relevancy)
-- [Textual Difference Metric](./textual-difference)
-- [Context Relevancy Metric](./context-relevancy)
+- [Completeness Scorer](./completeness)
+- [Content Similarity Scorer](./content-similarity)
+- [Answer Relevancy Scorer](./answer-relevancy)
+- [Textual Difference Scorer](./textual-difference)
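
Reviewer note: the scoring rule described in this file (filter stop words, match case-insensitively, divide matched keywords by total keywords) can be sketched as below. This is a minimal illustration under stated assumptions, not the package's implementation: `STOP_WORDS`, `extractKeywords`, and `keywordCoverage` are hypothetical names, the stop-word list is made up for the example, and word-form normalization and compound terms such as "React.js" are not handled.

```typescript
// Illustrative stop-word list (an assumption; the real scorer's list is not shown in this diff).
const STOP_WORDS = new Set([
  "the", "a", "an", "and", "or", "of", "for", "like", "is", "in", "to",
]);

// Lowercase the text, split it into alphanumeric tokens, and drop stop words.
function extractKeywords(text: string): Set<string> {
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return new Set(tokens.filter((token) => !STOP_WORDS.has(token)));
}

interface CoverageResult {
  score: number;
  totalKeywords: number;
  matchedKeywords: number;
}

// Coverage = matched reference keywords / total reference keywords.
function keywordCoverage(input: string, output: string): CoverageResult {
  const referenceKeywords = extractKeywords(input);
  const responseKeywords = extractKeywords(output);

  if (referenceKeywords.size === 0) {
    // Special case from the docs: both empty scores 1.0, only one empty scores 0.0.
    const score = responseKeywords.size === 0 ? 1 : 0;
    return { score, totalKeywords: 0, matchedKeywords: 0 };
  }

  let matchedKeywords = 0;
  referenceKeywords.forEach((keyword) => {
    if (responseKeywords.has(keyword)) matchedKeywords++;
  });

  return {
    score: matchedKeywords / referenceKeywords.size,
    totalKeywords: referenceKeywords.size,
    matchedKeywords,
  };
}

const partial = keywordCoverage(
  "TypeScript offers interfaces, generics, and type inference",
  "TypeScript provides type inference and some advanced features",
);
console.log(partial);
```

With this particular stop-word list, the sketch yields 4/4, 3/6, and 2/10 on the three example inputs documented in this file. That agreement depends on the assumed list and should not be read as confirmation of how the scorer tokenizes text.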

package/.docs/raw/reference/{scorers → evals}/mastra-scorer.mdx RENAMED

@@ -1,5 +1,5 @@
 ---
-title: "Reference: MastraScorer | Scorers | Mastra Docs"
+title: "Reference: MastraScorer | Evals | Mastra Docs"
 description: Documentation for the MastraScorer base class in Mastra, which provides the foundation for all custom and built-in scorers.
 ---
@@ -34,7 +34,7 @@ const result = await scorer.run({
   input: "What is machine learning?",
   output: "Machine learning is a subset of artificial intelligence...",
   runId: "optional-run-id",
-  runtimeContext: {
+  requestContext: {
     /* optional context */
   },
 });
@@ -65,18 +65,18 @@ const result = await scorer.run({
       description: "Optional unique identifier for this scoring run.",
     },
     {
-      name: "runtimeContext",
+      name: "requestContext",
       type: "any",
       required: false,
       description:
-        "Optional runtime context from the agent or workflow step being evaluated.",
+        "Optional request context from the agent or workflow step being evaluated.",
     },
     {
       name: "groundTruth",
       type: "any",
       required: false,
       description:
-        "Optional expected or reference output for comparison during scoring. Automatically passed when using runExperiment.",
+        "Optional expected or reference output for comparison during scoring. Automatically passed when using runEvals.",
     },
   ]}
 />
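
Reviewer note: this file renames the `.run()` option `runtimeContext` to `requestContext`. Callers that still build the old shape could bridge the rename with a small adapter like the one below. It is a sketch only: `LegacyRunOptions`, `RunOptions`, and `migrateRunOptions` are hypothetical names written for illustration, the field types are simplified to `unknown`, and nothing here is exported by the package.

```typescript
// Option shape before the rename (field names taken from the removed lines of this diff).
interface LegacyRunOptions {
  input: unknown;
  output: unknown;
  runId?: string;
  runtimeContext?: unknown;
  groundTruth?: unknown;
}

// Option shape after the rename (field names taken from the added lines of this diff).
interface RunOptions {
  input: unknown;
  output: unknown;
  runId?: string;
  requestContext?: unknown;
  groundTruth?: unknown;
}

// Copy every field unchanged, moving runtimeContext to requestContext when it is set.
function migrateRunOptions(legacy: LegacyRunOptions): RunOptions {
  const { runtimeContext, ...rest } = legacy;
  return runtimeContext === undefined
    ? rest
    : { ...rest, requestContext: runtimeContext };
}

const migrated = migrateRunOptions({
  input: "What is machine learning?",
  output: "Machine learning is a subset of artificial intelligence...",
  runId: "optional-run-id",
  runtimeContext: { tenant: "example" },
});
console.log(migrated);
```

The adapter leaves `groundTruth` and `runId` untouched, since the diff changes only their descriptions, not their names.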

package/.docs/raw/reference/{scorers → evals}/noise-sensitivity.mdx RENAMED

@@ -1,11 +1,11 @@
 ---
-title: "Reference: Noise Sensitivity Scorer (CI/Testing Only) | Scorers | Mastra Docs"
+title: "Reference: Noise Sensitivity Scorer | Evals | Mastra Docs"
 description: Documentation for the Noise Sensitivity Scorer in Mastra. A CI/testing scorer that evaluates agent robustness by comparing responses between clean and noisy inputs in controlled test environments.
 ---
 import PropertiesTable from "@site/src/components/PropertiesTable";
-# Noise Sensitivity Scorer (CI/Testing Only)
+# Noise Sensitivity Scorer
 The `createNoiseSensitivityScorerLLM()` function creates a **CI/testing scorer** that evaluates how robust an agent is when exposed to irrelevant, distracting, or misleading information. Unlike live scorers that evaluate single production runs, this scorer requires predetermined test data including both baseline responses and noisy variations.
@@ -160,7 +160,7 @@ To use this scorer effectively, you need to prepare:
 ```typescript
 import { describe, it, expect } from "vitest";
-import { createNoiseSensitivityScorerLLM } from "@mastra/evals/scorers/llm";
+import { createNoiseSensitivityScorerLLM } from "@mastra/evals/scorers/prebuilt";
 import { myAgent } from "./agents";
 describe("Agent Noise Resistance Tests", () => {
@@ -352,7 +352,7 @@ Based on noise sensitivity results:
 ```typescript title="agent-noise.test.ts"
 import { describe, it, expect, beforeAll } from "vitest";
-import { createNoiseSensitivityScorerLLM } from "@mastra/evals/scorers/llm";
+import { createNoiseSensitivityScorerLLM } from "@mastra/evals/scorers/prebuilt";
 import { myAgent } from "./agents";
 // Test data preparation
@@ -467,7 +467,7 @@ console.log(result);
 This example shows an agent partially distracted by irrelevant requests:
 ```typescript
-import { createNoiseSensitivityScorerLLM } from "@mastra/evals";
+import { createNoiseSensitivityScorerLLM } from "@mastra/evals/scorers/prebuilt";
 const scorer = createNoiseSensitivityScorerLLM({
   model: "openai/gpt-4o-mini",
@@ -811,7 +811,7 @@ jobs:
 ## Related
-- [Running in CI](/docs/scorers/overview) - Setting up scorers in CI/CD pipelines
-- [Hallucination Scorer](/reference/scorers/hallucination) - Evaluates fabricated content
-- [Answer Relevancy Scorer](/reference/scorers/answer-relevancy) - Measures response focus
-- [Custom Scorers](/docs/scorers/custom-scorers) - Creating your own evaluation metrics
+- [Scorers Overview](/docs/v1/evals/overview) - Setting up scorer pipelines
+- [Hallucination Scorer](/reference/v1/evals/hallucination) - Evaluates fabricated content
+- [Answer Relevancy Scorer](/reference/v1/evals/answer-relevancy) - Measures response focus
+- [Custom Scorers](/docs/v1/evals/custom-scorers) - Creating your own evaluation metrics
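
Reviewer note: the three hunks above move the `createNoiseSensitivityScorerLLM` import from `@mastra/evals/scorers/llm` and from the bare `@mastra/evals` entry point to `@mastra/evals/scorers/prebuilt`. A text-level rewrite of those two specifiers might look like the sketch below. `rewriteScorerImports` is a hypothetical helper, not part of any Mastra tooling. It assumes every import from the two old paths should move to the new one, which this diff shows only for this one function, so review each rewritten import by hand.

```typescript
// Matches `from "<old path>"` for the two old specifiers seen in this diff.
// The backreference \1 requires the same closing quote, so longer paths such as
// "@mastra/evals/nlp" or "@mastra/evals/scorers/prebuilt" are left alone.
const OLD_SPECIFIER = /from\s+(["'])@mastra\/evals(?:\/scorers\/llm)?\1/g;

// Rewrite old import specifiers to the new prebuilt entry point, keeping the quote style.
function rewriteScorerImports(source: string): string {
  return source.replace(OLD_SPECIFIER, "from $1@mastra/evals/scorers/prebuilt$1");
}

const before = [
  'import { createNoiseSensitivityScorerLLM } from "@mastra/evals/scorers/llm";',
  'import { myAgent } from "./agents";',
].join("\n");

console.log(rewriteScorerImports(before));
```

A regex pass like this ignores comments, string literals, and dynamic `import()` calls; an AST-based codemod would be the safer choice for a large codebase.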