npm - @mastra/core - Versions diffs - 1.7.0 → 1.8.0 - Mend

@mastra/core 1.7.0 → 1.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (498)

package/dist/docs/references/reference-evals-scorer-utils.md ADDED

@@ -0,0 +1,326 @@
+# Scorer Utils
+Mastra provides utility functions to help extract and process data from scorer run inputs and outputs. These utilities are particularly useful in the `preprocess` step of custom scorers.
+## Import
+```typescript
+import {
+  getAssistantMessageFromRunOutput,
+  getReasoningFromRunOutput,
+  getUserMessageFromRunInput,
+  getSystemMessagesFromRunInput,
+  getCombinedSystemPrompt,
+  extractToolCalls,
+  extractInputMessages,
+  extractAgentResponseMessages,
+} from '@mastra/evals/scorers/utils'
+```
+## Message Extraction
+### getAssistantMessageFromRunOutput
+Extracts the text content from the first assistant message in the run output.
+```typescript
+const scorer = createScorer({
+  id: 'my-scorer',
+  description: 'My scorer',
+  type: 'agent',
+})
+  .preprocess(({ run }) => {
+    const response = getAssistantMessageFromRunOutput(run.output)
+    return { response }
+  })
+  .generateScore(({ results }) => {
+    return results.preprocessStepResult?.response ? 1 : 0
+  })
+```
+**output?:** (`ScorerRunOutputForAgent`): The scorer run output (array of MastraDBMessage)
+**Returns:** `string | undefined` - The assistant message text, or undefined if no assistant message is found.
+### getUserMessageFromRunInput
+Extracts the text content from the first user message in the run input.
+```typescript
+.preprocess(({ run }) => {
+  const userMessage = getUserMessageFromRunInput(run.input);
+  return { userMessage };
+})
+```
+**input?:** (`ScorerRunInputForAgent`): The scorer run input containing input messages
+**Returns:** `string | undefined` - The user message text, or undefined if no user message is found.
+### extractInputMessages
+Extracts text content from all input messages as an array.
+```typescript
+.preprocess(({ run }) => {
+  const allUserMessages = extractInputMessages(run.input);
+  return { conversationHistory: allUserMessages.join("\n") };
+})
+```
+**Returns:** `string[]` - Array of text strings from each input message.
+### extractAgentResponseMessages
+Extracts text content from all assistant response messages as an array.
+```typescript
+.preprocess(({ run }) => {
+  const allResponses = extractAgentResponseMessages(run.output);
+  return { allResponses };
+})
+```
+**Returns:** `string[]` - Array of text strings from each assistant message.
+## Reasoning Extraction
+### getReasoningFromRunOutput
+Extracts reasoning text from the run output. This is particularly useful when evaluating responses from reasoning models like `deepseek-reasoner` that produce chain-of-thought reasoning.
+Reasoning can be stored in two places:
+1. `content.reasoning` - a string field on the message content
+2. `content.parts` - as parts with `type: 'reasoning'` containing `details`
+```typescript
+import {
+  getReasoningFromRunOutput,
+  getAssistantMessageFromRunOutput,
+} from '@mastra/evals/scorers/utils'
+const reasoningQualityScorer = createScorer({
+  id: 'reasoning-quality',
+  name: 'Reasoning Quality',
+  description: 'Evaluates the quality of model reasoning',
+  type: 'agent',
+})
+  .preprocess(({ run }) => {
+    const reasoning = getReasoningFromRunOutput(run.output)
+    const response = getAssistantMessageFromRunOutput(run.output)
+    return { reasoning, response }
+  })
+  .analyze(({ results }) => {
+    const { reasoning } = results.preprocessStepResult || {}
+    return {
+      hasReasoning: !!reasoning,
+      reasoningLength: reasoning?.length || 0,
+      hasStepByStep: reasoning?.includes('step') || false,
+    }
+  })
+  .generateScore(({ results }) => {
+    const { hasReasoning, reasoningLength } = results.analyzeStepResult || {}
+    if (!hasReasoning) return 0
+    // Score based on reasoning length (normalized to 0-1)
+    return Math.min(reasoningLength / 500, 1)
+  })
+  .generateReason(({ results, score }) => {
+    const { hasReasoning, reasoningLength } = results.analyzeStepResult || {}
+    if (!hasReasoning) {
+      return 'No reasoning was provided by the model.'
+    }
+    return `Model provided ${reasoningLength} characters of reasoning. Score: ${score}`
+  })
+```
+**output?:** (`ScorerRunOutputForAgent`): The scorer run output (array of MastraDBMessage)
+**Returns:** `string | undefined` - The reasoning text, or undefined if no reasoning is present.
+## System Message Extraction
+### getSystemMessagesFromRunInput
+Extracts all system messages from the run input, including both standard system messages and tagged system messages (specialized prompts like memory instructions).
+```typescript
+.preprocess(({ run }) => {
+  const systemMessages = getSystemMessagesFromRunInput(run.input);
+  return {
+    systemPromptCount: systemMessages.length,
+    systemPrompts: systemMessages
+  };
+})
+```
+**Returns:** `string[]` - Array of system message strings.
+### getCombinedSystemPrompt
+Combines all system messages into a single prompt string, joined with double newlines.
+```typescript
+.preprocess(({ run }) => {
+  const fullSystemPrompt = getCombinedSystemPrompt(run.input);
+  return { fullSystemPrompt };
+})
+```
+**Returns:** `string` - Combined system prompt string.
+## Tool Call Extraction
+### extractToolCalls
+Extracts information about all tool calls from the run output, including tool names, call IDs, and their positions in the message array.
+```typescript
+const toolUsageScorer = createScorer({
+  id: 'tool-usage',
+  description: 'Evaluates tool usage patterns',
+  type: 'agent',
+})
+  .preprocess(({ run }) => {
+    const { tools, toolCallInfos } = extractToolCalls(run.output)
+    return {
+      toolsUsed: tools,
+      toolCount: tools.length,
+      toolDetails: toolCallInfos,
+    }
+  })
+  .generateScore(({ results }) => {
+    const { toolCount = 0 } = results.preprocessStepResult || {}
+    // Score based on appropriate tool usage
+    return toolCount > 0 ? 1 : 0
+  })
+```
+**Returns:**
+```typescript
+{
+  tools: string[];           // Array of tool names
+  toolCallInfos: ToolCallInfo[];  // Detailed tool call information
+}
+```
+Where `ToolCallInfo` is:
+```typescript
+type ToolCallInfo = {
+  toolName: string // Name of the tool
+  toolCallId: string // Unique call identifier
+  messageIndex: number // Index in the output array
+  invocationIndex: number // Index within message's tool invocations
+}
+```
+## Test Utilities
+These utilities help create test data for scorer development.
+### createTestMessage
+Creates a `MastraDBMessage` object for testing purposes.
+```typescript
+import { createTestMessage } from '@mastra/evals/scorers/utils'
+const userMessage = createTestMessage({
+  content: 'What is the weather?',
+  role: 'user',
+})
+const assistantMessage = createTestMessage({
+  content: 'The weather is sunny.',
+  role: 'assistant',
+  toolInvocations: [
+    {
+      toolCallId: 'call-1',
+      toolName: 'weatherTool',
+      args: { location: 'London' },
+      result: { temp: 20 },
+      state: 'result',
+    },
+  ],
+})
+```
+### createAgentTestRun
+Creates a complete test run object for testing scorers.
+```typescript
+import { createAgentTestRun, createTestMessage } from '@mastra/evals/scorers/utils'
+const testRun = createAgentTestRun({
+  inputMessages: [createTestMessage({ content: 'Hello', role: 'user' })],
+  output: [createTestMessage({ content: 'Hi there!', role: 'assistant' })],
+})
+// Run your scorer with the test data
+const result = await myScorer.run({
+  input: testRun.input,
+  output: testRun.output,
+})
+```
+## Complete Example
+Here's a complete example showing how to use multiple utilities together:
+```typescript
+import { createScorer } from '@mastra/core/evals'
+import {
+  getAssistantMessageFromRunOutput,
+  getReasoningFromRunOutput,
+  getUserMessageFromRunInput,
+  getCombinedSystemPrompt,
+  extractToolCalls,
+} from '@mastra/evals/scorers/utils'
+const comprehensiveScorer = createScorer({
+  id: 'comprehensive-analysis',
+  name: 'Comprehensive Analysis',
+  description: 'Analyzes all aspects of an agent response',
+  type: 'agent',
+})
+  .preprocess(({ run }) => {
+    // Extract all relevant data
+    const userMessage = getUserMessageFromRunInput(run.input)
+    const response = getAssistantMessageFromRunOutput(run.output)
+    const reasoning = getReasoningFromRunOutput(run.output)
+    const systemPrompt = getCombinedSystemPrompt(run.input)
+    const { tools, toolCallInfos } = extractToolCalls(run.output)
+    return {
+      userMessage,
+      response,
+      reasoning,
+      systemPrompt,
+      toolsUsed: tools,
+      toolCount: tools.length,
+    }
+  })
+  .generateScore(({ results }) => {
+    const { response, reasoning, toolCount = 0 } = results.preprocessStepResult || {}
+    let score = 0
+    if (response && response.length > 0) score += 0.4
+    if (reasoning) score += 0.3
+    if (toolCount > 0) score += 0.3
+    return score
+  })
+  .generateReason(({ results, score }) => {
+    const { response, reasoning, toolCount = 0 } = results.preprocessStepResult || {}
+    const parts = []
+    if (response) parts.push('provided a response')
+    if (reasoning) parts.push('included reasoning')
+    if (toolCount > 0) parts.push(`used ${toolCount} tool(s)`)
+    return `Score: ${score}. The agent ${parts.join(', ')}.`
+  })
+```
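
To illustrate the two reasoning locations described for `getReasoningFromRunOutput` (`content.reasoning` and `content.parts` entries with `type: 'reasoning'`), here is a minimal sketch of how such a lookup could work. It is not the library's implementation: `SketchMessage`, `extractReasoningSketch`, and the shape of each `details` item (`{ type, text }`) are assumptions made for illustration, and the real `MastraDBMessage` type has more fields.

```typescript
// Hypothetical, simplified message shape (assumption, not the real MastraDBMessage).
type ReasoningPart = { type: 'reasoning'; details: { type: string; text?: string }[] }
type TextPart = { type: 'text'; text: string }
type SketchMessage = {
  role: 'user' | 'assistant' | 'system'
  content: {
    reasoning?: string
    parts?: (ReasoningPart | TextPart)[]
  }
}

// Returns reasoning text from the first assistant message, or undefined if none is found.
function extractReasoningSketch(output: SketchMessage[]): string | undefined {
  const message = output.find(m => m.role === 'assistant')
  if (!message) return undefined

  // Location 1: a plain string field on the message content.
  if (message.content.reasoning) return message.content.reasoning

  // Location 2: parts with type 'reasoning', each carrying a list of details.
  const texts: string[] = []
  for (const part of message.content.parts ?? []) {
    if (part.type !== 'reasoning') continue
    for (const detail of part.details) {
      if (detail.text) texts.push(detail.text)
    }
  }
  return texts.length > 0 ? texts.join('\n') : undefined
}
```

Checking the string field first keeps the common case cheap; the join separator for multiple reasoning details is a choice made for this sketch.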

package/dist/docs/references/reference-evals-textual-difference.md ADDED

@@ -0,0 +1,113 @@
+# Textual Difference Scorer
+The `createTextualDifferenceScorer()` function uses sequence matching to measure the textual differences between two strings. It provides detailed information about changes, including the number of operations needed to transform one text into another.
+## Parameters
+The `createTextualDifferenceScorer()` function does not take any options.
+This function returns an instance of the MastraScorer class. See the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer) for details on the `.run()` method and its input/output.
+## .run() Returns
+**runId:** (`string`): The id of the run (optional).
+**analyzeStepResult:** (`object`): Object with difference metrics: { confidence: number, ratio: number, changes: number, lengthDiff: number }
+**score:** (`number`): Similarity ratio (0-1) where 1 indicates identical texts.
+`.run()` returns a result in the following shape:
+```typescript
+{
+  runId: string,
+  analyzeStepResult: {
+    confidence: number,
+    ratio: number,
+    changes: number,
+    lengthDiff: number
+  },
+  score: number
+}
+```
+## Scoring Details
+The scorer calculates several measures:
+- **Similarity Ratio**: Based on sequence matching between texts (0-1)
+- **Changes**: Count of non-matching operations needed
+- **Length Difference**: Normalized difference in text lengths
+- **Confidence**: Inversely proportional to length difference
+### Scoring Process
+1. Analyzes textual differences:
+   - Performs sequence matching between input and output
+   - Counts the number of change operations required
+   - Measures length differences
+2. Calculates metrics:
+   - Computes similarity ratio
+   - Determines confidence score
+   - Combines into weighted score
+Final score: `(similarity_ratio * confidence) * scale`
+### Score interpretation
+A textual difference score between 0 and 1:
+- **1.0**: Identical texts – no differences detected.
+- **0.7–0.9**: Minor differences – few changes needed.
+- **0.4–0.6**: Moderate differences – noticeable changes required.
+- **0.1–0.3**: Major differences – extensive changes needed.
+- **0.0**: Completely different texts.
+## Example
+Measure textual differences between expected and actual agent outputs:
+```typescript
+import { runEvals } from '@mastra/core/evals'
+import { createTextualDifferenceScorer } from '@mastra/evals/scorers/prebuilt'
+import { myAgent } from './agent'
+const scorer = createTextualDifferenceScorer()
+const result = await runEvals({
+  data: [
+    {
+      input: 'Summarize the concept of recursion',
+      groundTruth:
+        'Recursion is when a function calls itself to solve a problem by breaking it into smaller subproblems.',
+    },
+    {
+      input: 'What is the capital of France?',
+      groundTruth: 'The capital of France is Paris.',
+    },
+  ],
+  scorers: [scorer],
+  target: myAgent,
+  onItemComplete: ({ scorerResults }) => {
+    console.log({
+      score: scorerResults[scorer.id].score,
+      groundTruth: scorerResults[scorer.id].groundTruth,
+    })
+  },
+})
+console.log(result.scores)
+```
+For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).
+To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.
+## Related
+- [Content Similarity Scorer](https://mastra.ai/reference/evals/content-similarity)
+- [Completeness Scorer](https://mastra.ai/reference/evals/completeness)
+- [Keyword Coverage Scorer](https://mastra.ai/reference/evals/keyword-coverage)
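
The scoring formula described for the textual difference scorer, `(similarity_ratio * confidence) * scale`, can be sketched as follows. This is a rough approximation under stated assumptions, not the package's implementation: the similarity ratio is computed here from a character-level longest common subsequence (`2 * matches / total length`), confidence is taken as `1 - lengthDiff`, and `scale` is fixed at 1. The names `lcsLength` and `sketchTextualDifference` are hypothetical, and the `changes` metric is not reproduced.

```typescript
// Length of the longest common subsequence of two strings (single-row dynamic programming).
function lcsLength(a: string, b: string): number {
  const row = new Array<number>(b.length + 1).fill(0)
  for (let i = 1; i <= a.length; i++) {
    let diagonal = 0 // LCS value for (i - 1, j - 1)
    for (let j = 1; j <= b.length; j++) {
      const above = row[j] // LCS value for (i - 1, j), saved before overwriting
      row[j] = a[i - 1] === b[j - 1] ? diagonal + 1 : Math.max(row[j], row[j - 1])
      diagonal = above
    }
  }
  return row[b.length]
}

// Assumed metrics: ratio in [0, 1], lengthDiff normalized by the longer text.
function sketchTextualDifference(input: string, output: string) {
  const total = input.length + output.length
  const ratio = total === 0 ? 1 : (2 * lcsLength(input, output)) / total
  const maxLength = Math.max(input.length, output.length)
  const lengthDiff = maxLength === 0 ? 0 : Math.abs(input.length - output.length) / maxLength
  const confidence = 1 - lengthDiff
  return { ratio, lengthDiff, confidence, score: ratio * confidence }
}
```

Under these assumptions, identical strings score 1 and strings with no characters in common score 0, which matches the two ends of the interpretation table.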

package/dist/docs/references/reference-evals-tone-consistency.md ADDED

@@ -0,0 +1,119 @@
+# Tone Consistency Scorer
+The `createToneScorer()` function evaluates the text's emotional tone and sentiment consistency. It can operate in two modes: comparing tone between input/output pairs or analyzing tone stability within a single text.
+## Parameters
+The `createToneScorer()` function does not take any options.
+This function returns an instance of the MastraScorer class. See the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer) for details on the `.run()` method and its input/output.
+## .run() Returns
+**runId:** (`string`): The id of the run (optional).
+**analyzeStepResult:** (`object`): Object with tone metrics: { responseSentiment: number, referenceSentiment: number, difference: number } (for comparison mode) OR { avgSentiment: number, sentimentVariance: number } (for stability mode)
+**score:** (`number`): Tone consistency/stability score (0-1).
+`.run()` returns a result in the following shape:
+```typescript
+{
+  runId: string,
+  analyzeStepResult: {
+    responseSentiment?: number,
+    referenceSentiment?: number,
+    difference?: number,
+    avgSentiment?: number,
+    sentimentVariance?: number,
+  },
+  score: number
+}
+```
+## Scoring Details
+The scorer evaluates sentiment consistency through tone pattern analysis and mode-specific scoring.
+### Scoring Process
+1. Analyzes tone patterns:
+   - Extracts sentiment features
+   - Computes sentiment scores
+   - Measures tone variations
+2. Calculates mode-specific score:
+   **Tone Consistency** (input and output):
+   - Compares sentiment between texts
+   - Calculates sentiment difference
+   - Score = 1 - (sentiment\_difference / max\_difference)
+   **Tone Stability** (single input):
+   - Analyzes sentiment across sentences
+   - Calculates sentiment variance
+   - Score = 1 - (sentiment\_variance / max\_variance)
+Final score: `mode_specific_score * scale`
+### Score interpretation
+A tone consistency score from 0 to scale (default 0-1):
+- 1.0: Perfect tone consistency/stability
+- 0.7-0.9: Strong consistency with minor variations
+- 0.4-0.6: Moderate consistency with noticeable shifts
+- 0.1-0.3: Poor consistency with major tone changes
+- 0.0: No consistency - completely different tones
+### analyzeStepResult
+Object with tone metrics:
+- **responseSentiment**: Sentiment score for the response (comparison mode).
+- **referenceSentiment**: Sentiment score for the input/reference (comparison mode).
+- **difference**: Absolute difference between sentiment scores (comparison mode).
+- **avgSentiment**: Average sentiment across sentences (stability mode).
+- **sentimentVariance**: Variance of sentiment across sentences (stability mode).
+## Example
+Evaluate tone consistency between related agent responses:
+```typescript
+import { runEvals } from '@mastra/core/evals'
+import { createToneScorer } from '@mastra/evals/scorers/prebuilt'
+import { myAgent } from './agent'
+const scorer = createToneScorer()
+const result = await runEvals({
+  data: [
+    {
+      input: 'How was your experience with our service?',
+      groundTruth: 'The service was excellent and exceeded expectations!',
+    },
+    {
+      input: 'Tell me about the customer support',
+      groundTruth: 'The support team was friendly and very helpful.',
+    },
+  ],
+  scorers: [scorer],
+  target: myAgent,
+  onItemComplete: ({ scorerResults }) => {
+    console.log({
+      score: scorerResults[scorer.id].score,
+    })
+  },
+})
+console.log(result.scores)
+```
+For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).
+To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.
+## Related
+- [Content Similarity Scorer](https://mastra.ai/reference/evals/content-similarity)
+- [Toxicity Scorer](https://mastra.ai/reference/evals/toxicity)
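
The two mode-specific formulas for the tone scorer can be sketched as plain arithmetic over sentiment values that have already been computed. This is only an illustration under assumptions: the function names, the caller-supplied `maxDifference` and `maxVariance` bounds, the use of population variance, and the clamping to 0 are all choices made here, and the package may derive these values differently.

```typescript
// Comparison mode: Score = 1 - (sentiment_difference / max_difference).
function sketchToneConsistency(responseSentiment: number, referenceSentiment: number, maxDifference: number) {
  const difference = Math.abs(responseSentiment - referenceSentiment)
  // Clamp at 0 in case the difference exceeds the assumed maximum.
  const score = Math.max(0, 1 - difference / maxDifference)
  return { responseSentiment, referenceSentiment, difference, score }
}

// Stability mode: Score = 1 - (sentiment_variance / max_variance),
// using one sentiment value per sentence.
function sketchToneStability(sentenceSentiments: number[], maxVariance: number) {
  const count = sentenceSentiments.length
  const avgSentiment = count === 0 ? 0 : sentenceSentiments.reduce((sum, s) => sum + s, 0) / count
  // Population variance of the per-sentence sentiment values.
  const sentimentVariance =
    count === 0 ? 0 : sentenceSentiments.reduce((sum, s) => sum + (s - avgSentiment) ** 2, 0) / count
  const score = Math.max(0, 1 - sentimentVariance / maxVariance)
  return { avgSentiment, sentimentVariance, score }
}
```

The returned objects mirror the `analyzeStepResult` fields listed for each mode, with `score` added for convenience.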

package/dist/docs/references/reference-evals-toxicity.md ADDED

@@ -0,0 +1,123 @@
+# Toxicity Scorer
+The `createToxicityScorer()` function evaluates whether an LLM's output contains racist, biased, or toxic elements. It uses a judge-based system to analyze responses for various forms of toxicity including personal attacks, mockery, hate speech, dismissive statements, and threats.
+## Parameters
+The `createToxicityScorer()` function accepts a single options object with the following properties:
+**model:** (`LanguageModel`): Configuration for the model used to evaluate toxicity.
+**scale:** (`number`): Maximum score value. (Default: `1`)
+This function returns an instance of the MastraScorer class. The `.run()` method accepts the same input as other scorers (see the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer)), but the return value includes LLM-specific fields as documented below.
+## .run() Returns
+**runId:** (`string`): The id of the run (optional).
+**analyzeStepResult:** (`object`): Object with verdicts: { verdicts: Array<{ verdict: 'yes' | 'no', reason: string }> }
+**analyzePrompt:** (`string`): The prompt sent to the LLM for the analyze step (optional).
+**score:** (`number`): Toxicity score (0 to scale, default 0-1).
+**reason:** (`string`): Detailed explanation of the toxicity assessment.
+**generateReasonPrompt:** (`string`): The prompt sent to the LLM for the generateReason step (optional).
+`.run()` returns a result in the following shape:
+```typescript
+{
+  runId: string,
+  analyzeStepResult: {
+    verdicts: Array<{ verdict: 'yes' | 'no', reason: string }>
+  },
+  analyzePrompt: string,
+  score: number,
+  reason: string,
+  generateReasonPrompt: string
+}
+```
+## Scoring Details
+The scorer evaluates toxicity through multiple aspects:
+- Personal attacks
+- Mockery or sarcasm
+- Hate speech
+- Dismissive statements
+- Threats or intimidation
+### Scoring Process
+1. Analyzes toxic elements:
+   - Identifies personal attacks and mockery
+   - Detects hate speech and threats
+   - Evaluates dismissive statements
+   - Assesses severity levels
+2. Calculates toxicity score:
+   - Weighs detected elements
+   - Combines severity ratings
+   - Normalizes to scale
+Final score: `(toxicity_weighted_sum / max_toxicity) * scale`
+### Score interpretation
+A toxicity score between 0 and 1:
+- **0.8–1.0**: Severe toxicity.
+- **0.4–0.7**: Moderate toxicity.
+- **0.1–0.3**: Mild toxicity.
+- **0.0**: No toxic elements detected.
+## Example
+Evaluate agent responses for toxic, biased, or harmful content:
+```typescript
+import { runEvals } from '@mastra/core/evals'
+import { createToxicityScorer } from '@mastra/evals/scorers/prebuilt'
+import { myAgent } from './agent'
+const scorer = createToxicityScorer({ model: 'openai/gpt-4o' })
+const result = await runEvals({
+  data: [
+    {
+      input: 'What do you think about the new team member?',
+    },
+    {
+      input: 'How was the meeting discussion?',
+    },
+    {
+      input: 'Can you provide feedback on the project proposal?',
+    },
+  ],
+  scorers: [scorer],
+  target: myAgent,
+  onItemComplete: ({ scorerResults }) => {
+    console.log({
+      score: scorerResults[scorer.id].score,
+      reason: scorerResults[scorer.id].reason,
+    })
+  },
+})
+console.log(result.scores)
+```
+For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).
+To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.
+## Related
+- [Tone Consistency Scorer](https://mastra.ai/reference/evals/tone-consistency)
+- [Bias Scorer](https://mastra.ai/reference/evals/bias)
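
To make the relationship between the `verdicts` array and the final toxicity score concrete, here is a simplified sketch of `(toxicity_weighted_sum / max_toxicity) * scale`. It assumes every verdict carries equal weight, so the score reduces to the fraction of `'yes'` verdicts times `scale`. The actual weighting used by the package is not shown in this diff, and `sketchToxicityScore` is a hypothetical name.

```typescript
type Verdict = { verdict: 'yes' | 'no'; reason: string }

// Equal-weight assumption: each 'yes' verdict contributes 1, max_toxicity is the verdict count.
function sketchToxicityScore(verdicts: Verdict[], scale = 1): number {
  if (verdicts.length === 0) return 0 // nothing judged, so nothing toxic detected
  const toxicCount = verdicts.filter(v => v.verdict === 'yes').length
  return (toxicCount / verdicts.length) * scale
}
```

With this simplification, one toxic verdict out of two yields a score of 0.5 at the default scale, which falls in the "moderate toxicity" band of the interpretation table.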