npm - @mastra/mcp-docs-server - Versions diffs - 0.13.39 → 1.0.0-beta.0 - Mend

@mastra/mcp-docs-server 0.13.39 → 1.0.0-beta.0


Files changed (480)

package/.docs/raw/reference/{scorers/run-experiment.mdx → evals/run-evals.mdx} RENAMED

@@ -1,20 +1,20 @@
 ---
-title: "Reference: runExperiment | Scorers | Mastra Docs"
-description: "Documentation for the runExperiment function in Mastra, which enables batch evaluation of agents and workflows using multiple scorers."
+title: "Reference: runEvals | Evals | Mastra Docs"
+description: "Documentation for the runEvals function in Mastra, which enables batch evaluation of agents and workflows using multiple scorers."
 ---
-# runExperiment
+# runEvals
-The `runExperiment` function enables batch evaluation of agents and workflows by running multiple test cases against scorers concurrently. This is essential for systematic testing, performance analysis, and validation of AI systems.
+The `runEvals` function enables batch evaluation of agents and workflows by running multiple test cases against scorers concurrently. This is essential for systematic testing, performance analysis, and validation of AI systems.
 ## Usage Example
 ```typescript
-import { runExperiment } from "@mastra/core/scores";
+import { runEvals } from "@mastra/core/evals";
 import { myAgent } from "./agents/my-agent";
 import { myScorer1, myScorer2 } from "./scorers";
-const result = await runExperiment({
+const result = await runEvals({
   target: myAgent,
   data: [
     { input: "What is machine learning?" },
@@ -45,7 +45,7 @@ console.log(`Processed ${result.summary.totalItems} items`);
     },
     {
       name: "data",
-      type: "RunExperimentDataItem[]",
+      type: "RunEvalsDataItem[]",
       description:
         "Array of test cases with input data and optional ground truth.",
       isOptional: false,
@@ -93,8 +93,8 @@ console.log(`Processed ${result.summary.totalItems} items`);
       isOptional: true,
     },
     {
-      name: "runtimeContext",
-      type: "RuntimeContext",
+      name: "requestContext",
+      type: "RequestContext",
       description: "Request Context to pass to the target during execution.",
       isOptional: true,
     },
@@ -157,8 +157,7 @@ For workflows, you can specify scorers at different levels using `WorkflowScorer
 ### Agent Evaluation
 ```typescript
-import { runExperiment } from "@mastra/core/scores";
-import { createScorer } from "@mastra/core/scores";
+import { createScorer, runEvals } from "@mastra/core/evals";
 const myScorer = createScorer({
   name: "My Scorer",
@@ -170,7 +169,7 @@ const myScorer = createScorer({
   return response.includes(expectedResponse) ? 1 : 0;
 });
-const result = await runExperiment({
+const result = await runEvals({
   target: chatAgent,
   data: [
     {
@@ -192,7 +191,7 @@ const result = await runExperiment({
 ### Workflow Evaluation
 ```typescript
-const workflowResult = await runExperiment({
+const workflowResult = await runEvals({
   target: myWorkflow,
   data: [
     { input: { query: "Process this data", priority: "high" } },
@@ -206,7 +205,7 @@ const workflowResult = await runExperiment({
     },
   },
   onItemComplete: ({ item, targetResult, scorerResults }) => {
-    console.log(`Workflow completed for: ${item.input.query}`);
+    console.log(`Workflow completed for: ${item.inputData.query}`);
     if (scorerResults.workflow) {
       console.log("Workflow scores:", scorerResults.workflow);
     }
@@ -219,7 +218,7 @@ const workflowResult = await runExperiment({
 ## Related
-- [createScorer()](../../reference/scorers/create-scorer) - Create custom scorers for experiments
-- [MastraScorer](../../reference/scorers/mastra-scorer) - Learn about scorer structure and methods
-- [Custom Scorers](../../docs/scorers/custom-scorers) - Guide to building evaluation logic
-- [Scorers Overview](../../docs/scorers/overview) - Understanding scorer concepts
+- [createScorer()](/reference/v1/evals/create-scorer) - Create custom scorers for experiments
+- [MastraScorer](/reference/v1/evals/mastra-scorer) - Learn about scorer structure and methods
+- [Custom Scorers](/docs/v1/evals/custom-scorers) - Guide to building evaluation logic
+- [Scorers Overview](/docs/v1/evals/overview) - Understanding scorer concepts
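The renames in this file (`runExperiment` → `runEvals`, `@mastra/core/scores` → `@mastra/core/evals`, `RunExperimentDataItem` → `RunEvalsDataItem`, `runtimeContext`/`RuntimeContext` → `requestContext`/`RequestContext`) are mechanical, so existing call sites can be updated with a text pass. The sketch below is a hypothetical helper (`migrateSource` is not part of Mastra) that does plain whole-word regex replacement. It will not merge two separate imports into one line the way the updated Agent Evaluation example does, it leaves the workflow example's `item.input` → `item.inputData` change untouched, and it will rename unrelated identifiers that happen to share these names, so review its output before committing.

```typescript
// Hypothetical migration helper: applies the renames visible in the run-evals.mdx diff.
// Naive whole-word text replacement; NOT an official Mastra codemod.
const RENAMES: ReadonlyArray<[RegExp, string]> = [
  [/@mastra\/core\/scores\b/g, "@mastra/core/evals"], // import path
  [/\bRunExperimentDataItem\b/g, "RunEvalsDataItem"], // data item type
  [/\brunExperiment\b/g, "runEvals"], // function name
  [/\bRuntimeContext\b/g, "RequestContext"], // context type
  [/\bruntimeContext\b/g, "requestContext"], // option name
];

// Applies every rename in order and returns the rewritten source text.
function migrateSource(source: string): string {
  return RENAMES.reduce((text, [pattern, replacement]) => text.replace(pattern, replacement), source);
}

const before = `import { runExperiment } from "@mastra/core/scores";
const result = await runExperiment({ target: myAgent, data, scorers, runtimeContext });`;

console.log(migrateSource(before));
```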

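Conceptually, `runEvals` runs each data item through the target, scores the result with every scorer, and reports a summary such as `result.summary.totalItems`. The self-contained sketch below illustrates that batch pattern with a bounded worker pool. `runBatch`, `Scorer`, `Item`, and the `concurrency` parameter are hypothetical names for illustration, not Mastra's API; the real function also handles workflow-level scorers, optional ground truth, `onItemComplete`, and `requestContext`, none of which are modeled here.

```typescript
// Hypothetical types for illustration only; these are not Mastra's exported types.
type Scorer = { name: string; score: (input: string, output: string) => number };
type Item = { input: string };

interface BatchSummary {
  totalItems: number;
  averageScores: Record<string, number>;
}

// Runs `target` over every item with at most `concurrency` items in flight,
// scores each output with every scorer, and averages the scores per scorer.
async function runBatch(
  target: (input: string) => Promise<string>,
  data: Item[],
  scorers: Scorer[],
  concurrency = 2,
): Promise<BatchSummary> {
  const totals: Record<string, number> = {};
  for (const s of scorers) totals[s.name] = 0;

  let next = 0;
  // Each worker claims the next unprocessed index until all items are done.
  // `next++` is safe because JavaScript runs this code on a single thread.
  async function worker(): Promise<void> {
    while (next < data.length) {
      const item = data[next++];
      const output = await target(item.input);
      for (const s of scorers) totals[s.name] += s.score(item.input, output);
    }
  }
  await Promise.all(Array.from({ length: Math.min(concurrency, data.length) }, worker));

  const averageScores: Record<string, number> = {};
  for (const s of scorers) averageScores[s.name] = data.length ? totals[s.name] / data.length : 0;
  return { totalItems: data.length, averageScores };
}

// Usage with a stub target and a stub scorer (both hypothetical).
const upperCaseAgent = async (input: string): Promise<string> => input.toUpperCase();
const mentionsLearning: Scorer = {
  name: "mentionsLearning",
  score: (_input, output) => (output.includes("LEARNING") ? 1 : 0),
};

runBatch(upperCaseAgent, [{ input: "What is machine learning?" }, { input: "Explain neural networks" }], [mentionsLearning])
  .then((summary) => console.log(`Processed ${summary.totalItems} items`, summary.averageScores));
```

A shared index with a fixed number of workers keeps at most `concurrency` target calls pending at once, which is a common way to bound load on a model provider during batch evaluation.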
package/.docs/raw/reference/evals/textual-difference.mdx CHANGED

@@ -1,115 +1,59 @@
 ---
-title: "Reference: TextualDifferenceMetric | Evals | Mastra Docs"
-description: Documentation for the Textual Difference Metric in Mastra, which measures textual differences between strings using sequence matching.
+title: "Reference: Textual Difference Scorer | Evals | Mastra Docs"
+description: Documentation for the Textual Difference Scorer in Mastra, which measures textual differences between strings using sequence matching.
 ---
-# TextualDifferenceMetric
+# Textual Difference Scorer
-:::info Scorers
-This documentation refers to the legacy evals API. For the latest scorer features, see [Scorers](/docs/scorers/overview).
-:::
+The `createTextualDifferenceScorer()` function uses sequence matching to measure the textual differences between two strings. It provides detailed information about changes, including the number of operations needed to transform one text into another.
-The `TextualDifferenceMetric` class uses sequence matching to measure the textual differences between two strings. It provides detailed information about changes, including the number of operations needed to transform one text into another.
+## Parameters
-## Basic Usage
+The `createTextualDifferenceScorer()` function does not take any options.
-```typescript
-import { TextualDifferenceMetric } from "@mastra/evals/nlp";
-const metric = new TextualDifferenceMetric();
+This function returns an instance of the MastraScorer class. See the [MastraScorer reference](./mastra-scorer) for details on the `.run()` method and its input/output.
-const result = await metric.measure(
-  "The quick brown fox",
-  "The fast brown fox",
-);
-console.log(result.score); // Similarity ratio from 0-1
-console.log(result.info); // Detailed change metrics
-```
-## measure() Parameters
+## .run() Returns
 <PropertiesTable
   content={[
     {
-      name: "input",
+      name: "runId",
       type: "string",
-      description: "The original text to compare against",
-      isOptional: false,
+      description: "The id of the run (optional).",
     },
     {
-      name: "output",
-      type: "string",
-      description: "The text to evaluate for differences",
-      isOptional: false,
+      name: "analyzeStepResult",
+      type: "object",
+      description:
+        "Object with difference metrics: { confidence: number, changes: number, lengthDiff: number }",
     },
-  ]}
-/>
-## Returns
-<PropertiesTable
-  content={[
     {
       name: "score",
       type: "number",
-      description: "Similarity ratio (0-1) where 1 indicates identical texts",
-    },
-    {
-      name: "info",
-      description: "Detailed metrics about the differences",
-      properties: [
-        {
-          type: "number",
-          parameters: [
-            {
-              name: "confidence",
-              type: "number",
-              description:
-                "Confidence score based on length difference between texts (0-1)",
-            },
-          ],
-        },
-        {
-          type: "number",
-          parameters: [
-            {
-              name: "ratio",
-              type: "number",
-              description: "Raw similarity ratio between the texts",
-            },
-          ],
-        },
-        {
-          type: "number",
-          parameters: [
-            {
-              name: "changes",
-              type: "number",
-              description:
-                "Number of change operations (insertions, deletions, replacements)",
-            },
-          ],
-        },
-        {
-          type: "number",
-          parameters: [
-            {
-              name: "lengthDiff",
-              type: "number",
-              description:
-                "Normalized difference in length between input and output (0-1)",
-            },
-          ],
-        },
-      ],
+      description: "Similarity ratio (0-1) where 1 indicates identical texts.",
     },
   ]}
 />
+`.run()` returns a result in the following shape:
+```typescript
+{
+  runId: string,
+  analyzeStepResult: {
+    confidence: number,
+    ratio: number,
+    changes: number,
+    lengthDiff: number
+  },
+  score: number
+}
+```
 ## Scoring Details
-The metric calculates several measures:
+The scorer calculates several measures:
 - **Similarity Ratio**: Based on sequence matching between texts (0-1)
 - **Changes**: Count of non-matching operations needed
@@ -122,7 +66,6 @@ The metric calculates several measures:
    - Performs sequence matching between input and output
    - Counts the number of change operations required
    - Measures length differences
 2. Calculates metrics:
    - Computes similarity ratio
    - Determines confidence score
@@ -132,40 +75,129 @@ Final score: `(similarity_ratio * confidence) * scale`
 ### Score interpretation
-(0 to scale, default 0-1)
+A textual difference score between 0 and 1:
+- **1.0**: Identical texts – no differences detected.
+- **0.7–0.9**: Minor differences – few changes needed.
+- **0.4–0.6**: Moderate differences – noticeable changes required.
+- **0.1–0.3**: Major differences – extensive changes needed.
+- **0.0**: Completely different texts.
+## Examples
+### No differences example
+In this example, the texts are exactly the same. The scorer identifies complete similarity with a perfect score and no detected changes.
+```typescript title="src/example-no-differences.ts" showLineNumbers copy
+import { createTextualDifferenceScorer } from "@mastra/evals/scorers/prebuilt";
+const scorer = createTextualDifferenceScorer();
+const input = "The quick brown fox jumps over the lazy dog";
+const output = "The quick brown fox jumps over the lazy dog";
+const result = await scorer.run({
+  input: [{ role: "user", content: input }],
+  output: { role: "assistant", text: output },
+});
+console.log("Score:", result.score);
+console.log("AnalyzeStepResult:", result.analyzeStepResult);
+```
+#### No differences output
+The scorer returns a high score, indicating the texts are identical. The detailed info confirms zero changes and no length difference.
+```typescript
+{
+  score: 1,
+  analyzeStepResult: {
+    confidence: 1,
+    ratio: 1,
+    changes: 0,
+    lengthDiff: 0,
+  },
+}
+```
+### Minor differences example
+In this example, the texts have small variations. The scorer detects these minor differences and returns a moderate similarity score.
+```typescript title="src/example-minor-differences.ts" showLineNumbers copy
+import { createTextualDifferenceScorer } from "@mastra/evals/scorers/prebuilt";
+const scorer = createTextualDifferenceScorer();
+const input = "Hello world! How are you?";
+const output = "Hello there! How is it going?";
+const result = await scorer.run({
+  input: [{ role: "user", content: input }],
+  output: { role: "assistant", text: output },
+});
+console.log("Score:", result.score);
+console.log("AnalyzeStepResult:", result.analyzeStepResult);
+```
+#### Minor differences output
+The scorer returns a moderate score reflecting the small variations between the texts. The detailed info includes the number of changes and length difference observed.
+```typescript
+{
+  score: 0.5925925925925926,
+  analyzeStepResult: {
+    confidence: 0.8620689655172413,
+    ratio: 0.5925925925925926,
+    changes: 5,
+    lengthDiff: 0.13793103448275862
+  }
+}
+```
+### Major differences example
+In this example, the texts differ significantly. The scorer detects extensive changes and returns a low similarity score.
+```typescript title="src/example-major-differences.ts" showLineNumbers copy
+import { createTextualDifferenceScorer } from "@mastra/evals/scorers/prebuilt";
+const scorer = createTextualDifferenceScorer();
+const input = "Python is a high-level programming language";
+const output = "JavaScript is used for web development";
+const result = await scorer.run({
+  input: [{ role: "user", content: input }],
+  output: { role: "assistant", text: output },
+});
+console.log("Score:", result.score);
+console.log("AnalyzeStepResult:", result.analyzeStepResult);
+```
-- 1.0: Identical texts - no differences
-- 0.7-0.9: Minor differences - few changes needed
-- 0.4-0.6: Moderate differences - significant changes
-- 0.1-0.3: Major differences - extensive changes
-- 0.0: Completely different texts
+#### Major differences output
-## Example with Analysis
+The scorer returns a low score due to significant differences between the texts. The detailed `analyzeStepResult` shows numerous changes and a notable length difference.
 ```typescript
-import { TextualDifferenceMetric } from "@mastra/evals/nlp";
-const metric = new TextualDifferenceMetric();
-const result = await metric.measure(
-  "Hello world! How are you?",
-  "Hello there! How is it going?",
-);
-// Example output:
-// {
-//   score: 0.65,
-//   info: {
-//     confidence: 0.95,
-//     ratio: 0.65,
-//     changes: 2,
-//     lengthDiff: 0.05
-//   }
-// }
+{
+  score: 0.3170731707317073,
+  analyzeStepResult: {
+    confidence: 0.8636363636363636,
+    ratio: 0.3170731707317073,
+    changes: 8,
+    lengthDiff: 0.13636363636363635
+  }
+}
 ```
 ## Related
-- [Content Similarity Metric](./content-similarity)
-- [Completeness Metric](./completeness)
-- [Keyword Coverage Metric](./keyword-coverage)
+- [Content Similarity Scorer](./content-similarity)
+- [Completeness Scorer](./completeness)
+- [Keyword Coverage Scorer](./keyword-coverage)
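The measures listed under Scoring Details in the textual-difference diff (similarity ratio, change count, length difference, confidence) can be approximated in plain TypeScript. This is a sketch under stated assumptions, not the scorer's implementation: `textDiffMetrics` is a hypothetical name; the ratio is computed as 2 x LCS / (combined length) over characters, a swapped-in stand-in for the library's sequence matcher; `changes` counts maximal runs of unmatched characters; and `lengthDiff` and `confidence` use the assumed formulas |len(a) - len(b)| / max length and 1 - lengthDiff. Results are therefore not expected to reproduce the documented example outputs exactly.

```typescript
// Sketch only: approximates the measures described for the Textual Difference Scorer.
// ASSUMPTION: ratio uses a character-level LCS, not the scorer's own sequence matcher.
interface TextDiffMetrics {
  ratio: number; // 2 * LCS / (a.length + b.length); 1 for two empty strings
  changes: number; // maximal runs of unmatched characters along one LCS alignment
  lengthDiff: number; // |a.length - b.length| / max length (assumed formula)
  confidence: number; // 1 - lengthDiff (assumed formula)
}

function textDiffMetrics(a: string, b: string): TextDiffMetrics {
  const n = a.length;
  const m = b.length;

  // dp[i][j] = LCS length of the suffixes a[i..] and b[j..].
  const dp: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      dp[i][j] = a[i] === b[j] ? dp[i + 1][j + 1] + 1 : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }

  // Walk one optimal alignment left to right; each maximal run of
  // deletions/insertions between matches counts as a single change.
  let changes = 0;
  let inChange = false;
  let i = 0;
  let j = 0;
  while (i < n || j < m) {
    if (i < n && j < m && a[i] === b[j]) {
      i++;
      j++;
      inChange = false;
    } else {
      if (!inChange) {
        changes++;
        inChange = true;
      }
      // Skip a character of `a` if that keeps the LCS at least as long; otherwise skip one of `b`.
      if (j >= m || (i < n && dp[i + 1][j] >= dp[i][j + 1])) i++;
      else j++;
    }
  }

  const total = n + m;
  const maxLen = Math.max(n, m);
  const ratio = total === 0 ? 1 : (2 * dp[0][0]) / total;
  const lengthDiff = maxLen === 0 ? 0 : Math.abs(n - m) / maxLen;
  return { ratio, changes, lengthDiff, confidence: 1 - lengthDiff };
}

console.log(textDiffMetrics("The quick brown fox", "The fast brown fox"));
```

LCS was chosen here because it is simple and exact; a production matcher may use a faster heuristic that finds fewer matching characters and so reports a lower ratio.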