npm - @mastra/evals - Versions diffs - 1.1.2-alpha.0 → 1.2.0-alpha.0 - Mend

@mastra/evals 1.1.2-alpha.0 → 1.2.0-alpha.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (60)

package/CHANGELOG.md +59 -2
package/LICENSE.md +15 -0
package/dist/chunk-EVBNIL5M.js +606 -0
package/dist/chunk-EVBNIL5M.js.map +1 -0
package/dist/chunk-XRUR5PBK.cjs +632 -0
package/dist/chunk-XRUR5PBK.cjs.map +1 -0
package/dist/docs/SKILL.md +20 -19
package/dist/docs/assets/SOURCE_MAP.json +1 -1
package/dist/docs/references/docs-evals-built-in-scorers.md +2 -1
package/dist/docs/references/docs-evals-overview.md +11 -16
package/dist/docs/references/reference-evals-answer-relevancy.md +25 -25
package/dist/docs/references/reference-evals-answer-similarity.md +33 -35
package/dist/docs/references/reference-evals-bias.md +24 -24
package/dist/docs/references/reference-evals-completeness.md +19 -20
package/dist/docs/references/reference-evals-content-similarity.md +20 -20
package/dist/docs/references/reference-evals-context-precision.md +36 -36
package/dist/docs/references/reference-evals-context-relevance.md +136 -141
package/dist/docs/references/reference-evals-faithfulness.md +24 -24
package/dist/docs/references/reference-evals-hallucination.md +52 -69
package/dist/docs/references/reference-evals-keyword-coverage.md +18 -18
package/dist/docs/references/reference-evals-noise-sensitivity.md +167 -177
package/dist/docs/references/reference-evals-prompt-alignment.md +111 -116
package/dist/docs/references/reference-evals-scorer-utils.md +285 -105
package/dist/docs/references/reference-evals-textual-difference.md +18 -18
package/dist/docs/references/reference-evals-tone-consistency.md +19 -19
package/dist/docs/references/reference-evals-tool-call-accuracy.md +165 -165
package/dist/docs/references/reference-evals-toxicity.md +21 -21
package/dist/docs/references/reference-evals-trajectory-accuracy.md +613 -0
package/dist/scorers/code/index.d.ts +1 -0
package/dist/scorers/code/index.d.ts.map +1 -1
package/dist/scorers/code/trajectory/index.d.ts +147 -0
package/dist/scorers/code/trajectory/index.d.ts.map +1 -0
package/dist/scorers/llm/answer-similarity/index.d.ts +2 -2
package/dist/scorers/llm/context-precision/index.d.ts +2 -2
package/dist/scorers/llm/context-relevance/index.d.ts +1 -1
package/dist/scorers/llm/faithfulness/index.d.ts +1 -1
package/dist/scorers/llm/hallucination/index.d.ts +2 -2
package/dist/scorers/llm/index.d.ts +1 -0
package/dist/scorers/llm/index.d.ts.map +1 -1
package/dist/scorers/llm/noise-sensitivity/index.d.ts +1 -1
package/dist/scorers/llm/prompt-alignment/index.d.ts +5 -5
package/dist/scorers/llm/tool-call-accuracy/index.d.ts +1 -1
package/dist/scorers/llm/toxicity/index.d.ts +1 -1
package/dist/scorers/llm/trajectory/index.d.ts +58 -0
package/dist/scorers/llm/trajectory/index.d.ts.map +1 -0
package/dist/scorers/llm/trajectory/prompts.d.ts +20 -0
package/dist/scorers/llm/trajectory/prompts.d.ts.map +1 -0
package/dist/scorers/prebuilt/index.cjs +638 -59
package/dist/scorers/prebuilt/index.cjs.map +1 -1
package/dist/scorers/prebuilt/index.js +578 -2
package/dist/scorers/prebuilt/index.js.map +1 -1
package/dist/scorers/utils.cjs +41 -17
package/dist/scorers/utils.d.ts +171 -1
package/dist/scorers/utils.d.ts.map +1 -1
package/dist/scorers/utils.js +1 -1
package/package.json +14 -11
package/dist/chunk-OEOE7ZHN.js +0 -195
package/dist/chunk-OEOE7ZHN.js.map +0 -1
package/dist/chunk-W3U7MMDX.cjs +0 -212
package/dist/chunk-W3U7MMDX.cjs.map +0 -1

package/dist/docs/references/reference-evals-scorer-utils.md CHANGED

@@ -1,4 +1,4 @@
-# Scorer Utils
+# Scorer utils
 Mastra provides utility functions to help extract and process data from scorer run inputs and outputs. These utilities are particularly useful in the `preprocess` step of custom scorers.
@@ -14,35 +14,47 @@ import {
   extractToolCalls,
   extractInputMessages,
   extractAgentResponseMessages,
-} from "@mastra/evals/scorers/utils";
+  compareTrajectories,
+  createTrajectoryTestRun,
+} from '@mastra/evals/scorers/utils'
 ```
-## Message Extraction
+Trajectory extraction functions are available from `@mastra/core/evals`:
-### getAssistantMessageFromRunOutput
+```typescript
+import {
+  extractTrajectory,
+  extractWorkflowTrajectory,
+  extractTrajectoryFromTrace,
+} from '@mastra/core/evals'
+```
+## Message extraction
+### `getAssistantMessageFromRunOutput`
 Extracts the text content from the first assistant message in the run output.
 ```typescript
 const scorer = createScorer({
-  id: "my-scorer",
-  description: "My scorer",
-  type: "agent",
+  id: 'my-scorer',
+  description: 'My scorer',
+  type: 'agent',
 })
   .preprocess(({ run }) => {
-    const response = getAssistantMessageFromRunOutput(run.output);
-    return { response };
+    const response = getAssistantMessageFromRunOutput(run.output)
+    return { response }
   })
   .generateScore(({ results }) => {
-    return results.preprocessStepResult?.response ? 1 : 0;
-  });
+    return results.preprocessStepResult?.response ? 1 : 0
+  })
 ```
-**output?:** (`ScorerRunOutputForAgent`): The scorer run output (array of MastraDBMessage)
+**output** (`ScorerRunOutputForAgent`): The scorer run output (array of MastraDBMessage)
 **Returns:** `string | undefined` - The assistant message text, or undefined if no assistant message is found.
-### getUserMessageFromRunInput
+### `getUserMessageFromRunInput`
 Extracts the text content from the first user message in the run input.
@@ -53,11 +65,11 @@ Extracts the text content from the first user message in the run input.
 })
 ```
-**input?:** (`ScorerRunInputForAgent`): The scorer run input containing input messages
+**input** (`ScorerRunInputForAgent`): The scorer run input containing input messages
 **Returns:** `string | undefined` - The user message text, or undefined if no user message is found.
-### extractInputMessages
+### `extractInputMessages`
 Extracts text content from all input messages as an array.
@@ -70,7 +82,7 @@ Extracts text content from all input messages as an array.
 **Returns:** `string[]` - Array of text strings from each input message.
-### extractAgentResponseMessages
+### `extractAgentResponseMessages`
 Extracts text content from all assistant response messages as an array.
@@ -83,9 +95,9 @@ Extracts text content from all assistant response messages as an array.
 **Returns:** `string[]` - Array of text strings from each assistant message.
-## Reasoning Extraction
+## Reasoning extraction
-### getReasoningFromRunOutput
+### `getReasoningFromRunOutput`
 Extracts reasoning text from the run output. This is particularly useful when evaluating responses from reasoning models like `deepseek-reasoner` that produce chain-of-thought reasoning.
@@ -97,50 +109,50 @@ Reasoning can be stored in two places:
 ```typescript
 import {
   getReasoningFromRunOutput,
-  getAssistantMessageFromRunOutput
-} from "@mastra/evals/scorers/utils";
+  getAssistantMessageFromRunOutput,
+} from '@mastra/evals/scorers/utils'
 const reasoningQualityScorer = createScorer({
-  id: "reasoning-quality",
-  name: "Reasoning Quality",
-  description: "Evaluates the quality of model reasoning",
-  type: "agent",
+  id: 'reasoning-quality',
+  name: 'Reasoning Quality',
+  description: 'Evaluates the quality of model reasoning',
+  type: 'agent',
 })
   .preprocess(({ run }) => {
-    const reasoning = getReasoningFromRunOutput(run.output);
-    const response = getAssistantMessageFromRunOutput(run.output);
-    return { reasoning, response };
+    const reasoning = getReasoningFromRunOutput(run.output)
+    const response = getAssistantMessageFromRunOutput(run.output)
+    return { reasoning, response }
   })
   .analyze(({ results }) => {
-    const { reasoning } = results.preprocessStepResult || {};
+    const { reasoning } = results.preprocessStepResult || {}
     return {
       hasReasoning: !!reasoning,
       reasoningLength: reasoning?.length || 0,
-      hasStepByStep: reasoning?.includes("step") || false,
-    };
+      hasStepByStep: reasoning?.includes('step') || false,
+    }
   })
   .generateScore(({ results }) => {
-    const { hasReasoning, reasoningLength } = results.analyzeStepResult || {};
-    if (!hasReasoning) return 0;
+    const { hasReasoning, reasoningLength } = results.analyzeStepResult || {}
+    if (!hasReasoning) return 0
     // Score based on reasoning length (normalized to 0-1)
-    return Math.min(reasoningLength / 500, 1);
+    return Math.min(reasoningLength / 500, 1)
   })
   .generateReason(({ results, score }) => {
-    const { hasReasoning, reasoningLength } = results.analyzeStepResult || {};
+    const { hasReasoning, reasoningLength } = results.analyzeStepResult || {}
     if (!hasReasoning) {
-      return "No reasoning was provided by the model.";
+      return 'No reasoning was provided by the model.'
     }
-    return `Model provided ${reasoningLength} characters of reasoning. Score: ${score}`;
-  });
+    return `Model provided ${reasoningLength} characters of reasoning. Score: ${score}`
+  })
 ```
-**output?:** (`ScorerRunOutputForAgent`): The scorer run output (array of MastraDBMessage)
+**output** (`ScorerRunOutputForAgent`): The scorer run output (array of MastraDBMessage)
 **Returns:** `string | undefined` - The reasoning text, or undefined if no reasoning is present.
-## System Message Extraction
+## System message extraction
-### getSystemMessagesFromRunInput
+### `getSystemMessagesFromRunInput`
 Extracts all system messages from the run input, including both standard system messages and tagged system messages (specialized prompts like memory instructions).
@@ -156,7 +168,7 @@ Extracts all system messages from the run input, including both standard system
 **Returns:** `string[]` - Array of system message strings.
-### getCombinedSystemPrompt
+### `getCombinedSystemPrompt`
 Combines all system messages into a single prompt string, joined with double newlines.
@@ -169,31 +181,31 @@ Combines all system messages into a single prompt string, joined with double new
 **Returns:** `string` - Combined system prompt string.
-## Tool Call Extraction
+## Tool call extraction
-### extractToolCalls
+### `extractToolCalls`
 Extracts information about all tool calls from the run output, including tool names, call IDs, and their positions in the message array.
 ```typescript
 const toolUsageScorer = createScorer({
-  id: "tool-usage",
-  description: "Evaluates tool usage patterns",
-  type: "agent",
+  id: 'tool-usage',
+  description: 'Evaluates tool usage patterns',
+  type: 'agent',
 })
   .preprocess(({ run }) => {
-    const { tools, toolCallInfos } = extractToolCalls(run.output);
+    const { tools, toolCallInfos } = extractToolCalls(run.output)
     return {
       toolsUsed: tools,
       toolCount: tools.length,
       toolDetails: toolCallInfos,
-    };
+    }
   })
   .generateScore(({ results }) => {
-    const { toolCount } = results.preprocessStepResult || {};
+    const { toolCount } = results.preprocessStepResult || {}
     // Score based on appropriate tool usage
-    return toolCount > 0 ? 1 : 0;
-  });
+    return toolCount > 0 ? 1 : 0
+  })
 ```
 **Returns:**
@@ -209,94 +221,262 @@ Where `ToolCallInfo` is:
 ```typescript
 type ToolCallInfo = {
-  toolName: string;      // Name of the tool
-  toolCallId: string;    // Unique call identifier
-  messageIndex: number;  // Index in the output array
-  invocationIndex: number; // Index within message's tool invocations
-};
+  toolName: string // Name of the tool
+  toolCallId: string // Unique call identifier
+  messageIndex: number // Index in the output array
+  invocationIndex: number // Index within message's tool invocations
+}
 ```
-## Test Utilities
+## Test utilities
 These utilities help create test data for scorer development.
-### createTestMessage
+### `createTestMessage`
 Creates a `MastraDBMessage` object for testing purposes.
 ```typescript
-import { createTestMessage } from "@mastra/evals/scorers/utils";
+import { createTestMessage } from '@mastra/evals/scorers/utils'
 const userMessage = createTestMessage({
-  content: "What is the weather?",
-  role: "user",
-});
+  content: 'What is the weather?',
+  role: 'user',
+})
 const assistantMessage = createTestMessage({
-  content: "The weather is sunny.",
-  role: "assistant",
+  content: 'The weather is sunny.',
+  role: 'assistant',
   toolInvocations: [
     {
-      toolCallId: "call-1",
-      toolName: "weatherTool",
-      args: { location: "London" },
+      toolCallId: 'call-1',
+      toolName: 'weatherTool',
+      args: { location: 'London' },
       result: { temp: 20 },
-      state: "result",
+      state: 'result',
     },
   ],
-});
+})
 ```
-### createAgentTestRun
+### `createAgentTestRun`
 Creates a complete test run object for testing scorers.
 ```typescript
-import { createAgentTestRun, createTestMessage } from "@mastra/evals/scorers/utils";
+import { createAgentTestRun, createTestMessage } from '@mastra/evals/scorers/utils'
 const testRun = createAgentTestRun({
-  inputMessages: [
-    createTestMessage({ content: "Hello", role: "user" }),
-  ],
-  output: [
-    createTestMessage({ content: "Hi there!", role: "assistant" }),
-  ],
-});
+  inputMessages: [createTestMessage({ content: 'Hello', role: 'user' })],
+  output: [createTestMessage({ content: 'Hi there!', role: 'assistant' })],
+})
 // Run your scorer with the test data
 const result = await myScorer.run({
   input: testRun.input,
   output: testRun.output,
-});
+})
+```
+## Trajectory utilities
+### `extractTrajectory`
+Extracts a `Trajectory` from agent output messages (`MastraDBMessage[]`). Converts tool invocations into `ToolCallStep` objects. The `runEvals` pipeline calls this automatically for trajectory scorers — you only need it for direct testing.
+Available from `@mastra/core/evals`.
+```typescript
+import { extractTrajectory } from '@mastra/core/evals'
+const trajectory = extractTrajectory(agentOutputMessages)
+// trajectory.steps — ToolCallStep[] extracted from toolInvocations
+// trajectory.rawOutput — the original MastraDBMessage[] array
+```
+**Returns:** `Trajectory` — Contains `steps: TrajectoryStep[]`, `totalDurationMs`, and `rawOutput`.
+### `extractWorkflowTrajectory`
+Extracts a `Trajectory` from workflow step results. Converts `StepResult` records into `WorkflowStepStep` objects, respecting the execution path ordering.
+Available from `@mastra/core/evals`.
+```typescript
+import { extractWorkflowTrajectory } from '@mastra/core/evals'
+const trajectory = extractWorkflowTrajectory(
+  workflowResult.steps, // Record<string, StepResult>
+  workflowResult.stepExecutionPath, // string[] (optional)
+)
+// trajectory.steps — WorkflowStepStep[] in execution order
 ```
-## Complete Example
+**Returns:** `Trajectory` — Contains `steps: TrajectoryStep[]`, `totalDurationMs`, and `rawWorkflowResult`.
+### `extractTrajectoryFromTrace`
+Builds a hierarchical `Trajectory` from observability trace spans (`SpanRecord[]`). Reconstructs the parent-child span tree and maps each span to the appropriate `TrajectoryStep` discriminated union type with nested `children`.
+This is the preferred extraction method when storage is available. The `runEvals` pipeline calls this automatically when the target's `Mastra` instance has a configured storage backend. It produces richer trajectories than `extractTrajectory` or `extractWorkflowTrajectory` because it captures the full execution tree, including nested agent runs, tool calls, and model generations.
+Available from `@mastra/core/evals`.
+```typescript
+import { extractTrajectoryFromTrace } from '@mastra/core/evals'
+// After fetching a trace from the observability store
+const traceData = await observabilityStore.getTrace({ traceId })
+const trajectory = extractTrajectoryFromTrace(traceData.spans, rootSpanId)
+// trajectory.steps — hierarchical TrajectoryStep[] with children
+```
+**Parameters:**
+- `spans` (`SpanRecord[]`) — Array of span records from a trace query.
+- `rootSpanId` (`string`, optional) — Span ID to use as the starting point. When omitted, uses spans with no parent.
+**Returns:** `Trajectory` — Contains `steps: TrajectoryStep[]` with recursive `children` and `totalDurationMs`.
+#### Span type mapping
+| Span type              | Trajectory step type   | Key fields extracted                                          |
+| ---------------------- | ---------------------- | ------------------------------------------------------------- |
+| `TOOL_CALL`            | `tool_call`            | `toolArgs`, `toolResult`, `success`                           |
+| `MCP_TOOL_CALL`        | `mcp_tool_call`        | `toolArgs`, `toolResult`, `mcpServer`, `success`              |
+| `MODEL_GENERATION`     | `model_generation`     | `modelId`, `promptTokens`, `completionTokens`, `finishReason` |
+| `AGENT_RUN`            | `agent_run`            | `agentId` (from entity ID)                                    |
+| `WORKFLOW_RUN`         | `workflow_run`         | `workflowId` (from entity ID)                                 |
+| `WORKFLOW_STEP`        | `workflow_step`        | `output`                                                      |
+| `WORKFLOW_CONDITIONAL` | `workflow_conditional` | `conditionCount`, `selectedSteps`                             |
+| `WORKFLOW_PARALLEL`    | `workflow_parallel`    | `branchCount`, `parallelSteps`                                |
+| `WORKFLOW_LOOP`        | `workflow_loop`        | `loopType`, `totalIterations`                                 |
+| `WORKFLOW_SLEEP`       | `workflow_sleep`       | `sleepDurationMs`, `sleepType`                                |
+| `WORKFLOW_WAIT_EVENT`  | `workflow_wait_event`  | `eventName`, `eventReceived`                                  |
+| `PROCESSOR_RUN`        | `processor_run`        | `processorId`                                                 |
+Spans with types `GENERIC`, `MODEL_STEP`, `MODEL_CHUNK`, and `WORKFLOW_CONDITIONAL_EVAL` are skipped as noise.
+### `compareTrajectories`
+Compares an actual trajectory against an expected trajectory and returns a detailed comparison result. Used internally by `createTrajectoryAccuracyScorerCode`.
+The `expected` parameter accepts either a `Trajectory` (actual trajectory) or `{ steps: ExpectedStep[] }`. When using `ExpectedStep[]`, you can match by name only, name + stepType, or include data for comparison. See [Expected steps](https://mastra.ai/reference/evals/trajectory-accuracy) for details.
+```typescript
+import { compareTrajectories } from '@mastra/evals/scorers/utils'
+// Using ExpectedStep[] (recommended for expectations)
+const result = compareTrajectories(
+  actualTrajectory,
+  { steps: [{ name: 'search' }, { name: 'summarize', stepType: 'tool_call' }] },
+  { compareStepData: false, allowRepeatedSteps: true },
+)
+// result.score — 0.0 to 1.0
+// result.missingSteps — step names not found
+// result.extraSteps — unexpected step names
+// result.outOfOrderSteps — steps found but in wrong order
+```
+**Returns:** `TrajectoryComparisonResult`
+### `createTrajectoryTestRun`
+Creates a test run object for trajectory scorers. Wraps a `Trajectory` into the expected `ScorerRun` format.
+```typescript
+import { createTrajectoryTestRun } from '@mastra/evals/scorers/utils'
+const run = createTrajectoryTestRun({
+  steps: [
+    { stepType: 'tool_call', name: 'search', toolArgs: { q: 'test' } },
+    { stepType: 'tool_call', name: 'summarize' },
+  ],
+})
+const result = await trajectoryScorer.run(run)
+```
+### `checkTrajectoryEfficiency`
+Evaluates trajectory efficiency against step, token, and duration budgets. Also detects redundant calls (same tool with same arguments).
+```typescript
+import { checkTrajectoryEfficiency } from '@mastra/evals/scorers/utils'
+const result = checkTrajectoryEfficiency(trajectory, {
+  maxSteps: 5,
+  maxTotalTokens: 2000,
+  maxTotalDurationMs: 5000,
+  noRedundantCalls: true,
+})
+// result.score — 1.0 if within all budgets, lower with penalties
+// result.redundantCalls — duplicate tool+args combos
+// result.overBudget — which budgets were exceeded
+```
+**Returns:** `TrajectoryEfficiencyResult`
+### `checkTrajectoryBlacklist`
+Checks whether a trajectory contains forbidden tools or tool call sequences.
+```typescript
+import { checkTrajectoryBlacklist } from '@mastra/evals/scorers/utils'
+const result = checkTrajectoryBlacklist(trajectory, {
+  blacklistedTools: ['deleteAll', 'admin-override'],
+  blacklistedSequences: [['escalate', 'admin-override']],
+})
+// result.passed — true if no violations
+// result.violations — list of violations with type and details
+```
+**Returns:** `TrajectoryBlacklistResult`
+### `analyzeToolFailures`
+Detects tool failure patterns including retries, fallbacks, and argument corrections.
+```typescript
+import { analyzeToolFailures } from '@mastra/evals/scorers/utils'
+const result = analyzeToolFailures(trajectory, {
+  maxRetriesPerTool: 3,
+})
+// result.score — 1.0 if no failure patterns, lower if patterns detected
+// result.patterns — detected patterns (retry, fallback, arg_correction)
+```
+**Returns:** `ToolFailureAnalysisResult`
+## Complete example
 Here's a complete example showing how to use multiple utilities together:
 ```typescript
-import { createScorer } from "@mastra/core/evals";
+import { createScorer } from '@mastra/core/evals'
 import {
   getAssistantMessageFromRunOutput,
   getReasoningFromRunOutput,
   getUserMessageFromRunInput,
   getCombinedSystemPrompt,
   extractToolCalls,
-} from "@mastra/evals/scorers/utils";
+} from '@mastra/evals/scorers/utils'
 const comprehensiveScorer = createScorer({
-  id: "comprehensive-analysis",
-  name: "Comprehensive Analysis",
-  description: "Analyzes all aspects of an agent response",
-  type: "agent",
+  id: 'comprehensive-analysis',
+  name: 'Comprehensive Analysis',
+  description: 'Analyzes all aspects of an agent response',
+  type: 'agent',
 })
   .preprocess(({ run }) => {
     // Extract all relevant data
-    const userMessage = getUserMessageFromRunInput(run.input);
-    const response = getAssistantMessageFromRunOutput(run.output);
-    const reasoning = getReasoningFromRunOutput(run.output);
-    const systemPrompt = getCombinedSystemPrompt(run.input);
-    const { tools, toolCallInfos } = extractToolCalls(run.output);
+    const userMessage = getUserMessageFromRunInput(run.input)
+    const response = getAssistantMessageFromRunOutput(run.output)
+    const reasoning = getReasoningFromRunOutput(run.output)
+    const systemPrompt = getCombinedSystemPrompt(run.input)
+    const { tools, toolCallInfos } = extractToolCalls(run.output)
     return {
       userMessage,
@@ -305,26 +485,26 @@ const comprehensiveScorer = createScorer({
       systemPrompt,
       toolsUsed: tools,
       toolCount: tools.length,
-    };
+    }
   })
   .generateScore(({ results }) => {
-    const { response, reasoning, toolCount } = results.preprocessStepResult || {};
+    const { response, reasoning, toolCount } = results.preprocessStepResult || {}
-    let score = 0;
-    if (response && response.length > 0) score += 0.4;
-    if (reasoning) score += 0.3;
-    if (toolCount > 0) score += 0.3;
+    let score = 0
+    if (response && response.length > 0) score += 0.4
+    if (reasoning) score += 0.3
+    if (toolCount > 0) score += 0.3
-    return score;
+    return score
   })
   .generateReason(({ results, score }) => {
-    const { response, reasoning, toolCount } = results.preprocessStepResult || {};
+    const { response, reasoning, toolCount } = results.preprocessStepResult || {}
-    const parts = [];
-    if (response) parts.push("provided a response");
-    if (reasoning) parts.push("included reasoning");
-    if (toolCount > 0) parts.push(`used ${toolCount} tool(s)`);
+    const parts = []
+    if (response) parts.push('provided a response')
+    if (reasoning) parts.push('included reasoning')
+    if (toolCount > 0) parts.push(`used ${toolCount} tool(s)`)
-    return `Score: ${score}. The agent ${parts.join(", ")}.`;
-  });
+    return `Score: ${score}. The agent ${parts.join(', ')}.`
+  })
 ```
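
Note on the `checkTrajectoryEfficiency` docs added in this hunk: they describe redundant calls as "same tool with same arguments" but do not show how that comparison might work. The following is a minimal sketch of one way to detect such duplicates, not the library's implementation. The `ToolCallStepSketch` type and the `stableStringify` and `findRedundantCalls` helpers are hypothetical names introduced here for illustration, and the key-order-insensitive argument comparison is an assumption.

```typescript
// Hypothetical, simplified step shape (assumption): the real TrajectoryStep type has more fields.
type ToolCallStepSketch = {
  stepType: 'tool_call'
  name: string
  toolArgs?: Record<string, unknown>
}

// Serialize a value with object keys sorted, so { a: 1, b: 2 } and { b: 2, a: 1 } produce the same string.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([key, inner]) => `${JSON.stringify(key)}:${stableStringify(inner)}`)
    return `{${entries.join(',')}}`
  }
  // String() covers the case where JSON.stringify returns undefined.
  return String(JSON.stringify(value))
}

// Group steps by tool name plus serialized arguments and report groups seen more than once.
function findRedundantCalls(steps: ToolCallStepSketch[]): { name: string; count: number }[] {
  const counts = new Map<string, { name: string; count: number }>()
  for (const step of steps) {
    const key = `${step.name}:${stableStringify(step.toolArgs ?? {})}`
    const entry = counts.get(key) ?? { name: step.name, count: 0 }
    entry.count += 1
    counts.set(key, entry)
  }
  return Array.from(counts.values()).filter(entry => entry.count > 1)
}
```

Under this sketch, two `search` calls with `{ q: 'a' }` count as one redundant pair, while a third `search` call with `{ q: 'b' }` is treated as a distinct call.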

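Note on the `compareTrajectories` docs added in the diff above: the result is documented as exposing `score`, `missingSteps`, `extraSteps`, and `outOfOrderSteps`. To make those fields concrete, here is a rough sketch of a name-only comparison. It is an illustration under stated assumptions, not the algorithm the package uses: `compareStepNames` and `NameComparison` are hypothetical names, and the greedy in-order matching and the score formula (in-order matches divided by expected steps) are choices made for this example only.

```typescript
// Hypothetical result shape (assumption), mirroring the field names documented in the diff.
type NameComparison = {
  score: number
  missingSteps: string[]
  extraSteps: string[]
  outOfOrderSteps: string[]
}

// Compare step names only, walking the actual sequence left to right.
function compareStepNames(actual: string[], expected: string[]): NameComparison {
  const missingSteps: string[] = []
  const outOfOrderSteps: string[] = []
  let cursor = 0
  let matchedInOrder = 0

  for (const name of expected) {
    const foundAt = actual.indexOf(name, cursor)
    if (foundAt !== -1) {
      // Found at or after the cursor: counts as an in-order match.
      matchedInOrder += 1
      cursor = foundAt + 1
    } else if (actual.includes(name)) {
      // Present, but only before the cursor: it ran in the wrong order.
      outOfOrderSteps.push(name)
    } else {
      missingSteps.push(name)
    }
  }

  // Anything in the actual run that was never expected is reported as extra.
  const expectedNames = new Set(expected)
  const extraSteps = actual.filter(name => !expectedNames.has(name))

  // Assumed scoring: fraction of expected steps matched in order; an empty expectation scores 1.
  const score = expected.length === 0 ? 1 : matchedInOrder / expected.length
  return { score, missingSteps, extraSteps, outOfOrderSteps }
}
```

For an actual run of `['summarize', 'search', 'log']` against an expectation of `['search', 'summarize', 'publish']`, this sketch reports `summarize` as out of order, `publish` as missing, `log` as extra, and a score of one third. The real function also supports options such as `compareStepData` and `allowRepeatedSteps`, which this sketch ignores.
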
package/dist/docs/references/reference-evals-textual-difference.md CHANGED

@@ -1,20 +1,20 @@
-# Textual Difference Scorer
+# Textual difference scorer
 The `createTextualDifferenceScorer()` function uses sequence matching to measure the textual differences between two strings. It provides detailed information about changes, including the number of operations needed to transform one text into another.
 ## Parameters
-The `createTextualDifferenceScorer()` function does not take any options.
+The `createTextualDifferenceScorer()` function doesn't take any options.
 This function returns an instance of the MastraScorer class. See the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer) for details on the `.run()` method and its input/output.
-## .run() Returns
+## `.run()` returns
-**runId:** (`string`): The id of the run (optional).
+**runId** (`string`): The id of the run (optional).
-**analyzeStepResult:** (`object`): Object with difference metrics: { confidence: number, changes: number, lengthDiff: number }
+**analyzeStepResult** (`object`): Object with difference metrics: { confidence: number, changes: number, lengthDiff: number }
-**score:** (`number`): Similarity ratio (0-1) where 1 indicates identical texts.
+**score** (`number`): Similarity ratio (0-1) where 1 indicates identical texts.
 `.run()` returns a result in the following shape:
@@ -31,7 +31,7 @@ This function returns an instance of the MastraScorer class. See the [MastraScor
 }
 ```
-## Scoring Details
+## Scoring details
 The scorer calculates several measures:
@@ -71,22 +71,22 @@ A textual difference score between 0 and 1:
 Measure textual differences between expected and actual agent outputs:
 ```typescript
-import { runEvals } from "@mastra/core/evals";
-import { createTextualDifferenceScorer } from "@mastra/evals/scorers/prebuilt";
-import { myAgent } from "./agent";
+import { runEvals } from '@mastra/core/evals'
+import { createTextualDifferenceScorer } from '@mastra/evals/scorers/prebuilt'
+import { myAgent } from './agent'
-const scorer = createTextualDifferenceScorer();
+const scorer = createTextualDifferenceScorer()
 const result = await runEvals({
   data: [
     {
-      input: "Summarize the concept of recursion",
+      input: 'Summarize the concept of recursion',
       groundTruth:
-        "Recursion is when a function calls itself to solve a problem by breaking it into smaller subproblems.",
+        'Recursion is when a function calls itself to solve a problem by breaking it into smaller subproblems.',
     },
     {
-      input: "What is the capital of France?",
-      groundTruth: "The capital of France is Paris.",
+      input: 'What is the capital of France?',
+      groundTruth: 'The capital of France is Paris.',
     },
   ],
   scorers: [scorer],
@@ -95,11 +95,11 @@ const result = await runEvals({
     console.log({
       score: scorerResults[scorer.id].score,
       groundTruth: scorerResults[scorer.id].groundTruth,
-    });
+    })
   },
-});
+})
-console.log(result.scores);
+console.log(result.scores)
 ```
 For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).
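
Note on the textual difference docs in the diff above: the score is described as a similarity ratio from 0 to 1 produced by sequence matching, where 1 indicates identical texts. The visible hunks do not show the formula, so the following is only a sketch of one common definition of such a ratio, twice the length of the longest common subsequence divided by the combined length of both strings. The `lcsLength` and `similarityRatio` functions are hypothetical, and the scorer's actual matching algorithm may differ.

```typescript
// Length of the longest common subsequence of two strings, using two rolling rows.
function lcsLength(a: string, b: string): number {
  let previous = new Array<number>(b.length + 1).fill(0)
  for (let i = 1; i <= a.length; i++) {
    const current = new Array<number>(b.length + 1).fill(0)
    for (let j = 1; j <= b.length; j++) {
      current[j] =
        a[i - 1] === b[j - 1]
          ? previous[j - 1] + 1
          : Math.max(previous[j], current[j - 1])
    }
    previous = current
  }
  return previous[b.length]
}

// Assumed ratio: 2 * matched characters / total characters, so identical strings give 1.
function similarityRatio(a: string, b: string): number {
  const total = a.length + b.length
  if (total === 0) return 1 // two empty strings are treated as identical
  return (2 * lcsLength(a, b)) / total
}
```

With this definition, `similarityRatio('abcd', 'abxd')` shares the subsequence `abd` (3 characters) out of 8 total, giving 0.75, and strings with no characters in common give 0.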