npm - @mastra/evals - Versions diffs - 1.1.2-alpha.0 → 1.2.0-alpha.0 - Mend

@mastra/evals 1.1.2-alpha.0 → 1.2.0-alpha.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (60)

package/CHANGELOG.md +59 -2
package/LICENSE.md +15 -0
package/dist/chunk-EVBNIL5M.js +606 -0
package/dist/chunk-EVBNIL5M.js.map +1 -0
package/dist/chunk-XRUR5PBK.cjs +632 -0
package/dist/chunk-XRUR5PBK.cjs.map +1 -0
package/dist/docs/SKILL.md +20 -19
package/dist/docs/assets/SOURCE_MAP.json +1 -1
package/dist/docs/references/docs-evals-built-in-scorers.md +2 -1
package/dist/docs/references/docs-evals-overview.md +11 -16
package/dist/docs/references/reference-evals-answer-relevancy.md +25 -25
package/dist/docs/references/reference-evals-answer-similarity.md +33 -35
package/dist/docs/references/reference-evals-bias.md +24 -24
package/dist/docs/references/reference-evals-completeness.md +19 -20
package/dist/docs/references/reference-evals-content-similarity.md +20 -20
package/dist/docs/references/reference-evals-context-precision.md +36 -36
package/dist/docs/references/reference-evals-context-relevance.md +136 -141
package/dist/docs/references/reference-evals-faithfulness.md +24 -24
package/dist/docs/references/reference-evals-hallucination.md +52 -69
package/dist/docs/references/reference-evals-keyword-coverage.md +18 -18
package/dist/docs/references/reference-evals-noise-sensitivity.md +167 -177
package/dist/docs/references/reference-evals-prompt-alignment.md +111 -116
package/dist/docs/references/reference-evals-scorer-utils.md +285 -105
package/dist/docs/references/reference-evals-textual-difference.md +18 -18
package/dist/docs/references/reference-evals-tone-consistency.md +19 -19
package/dist/docs/references/reference-evals-tool-call-accuracy.md +165 -165
package/dist/docs/references/reference-evals-toxicity.md +21 -21
package/dist/docs/references/reference-evals-trajectory-accuracy.md +613 -0
package/dist/scorers/code/index.d.ts +1 -0
package/dist/scorers/code/index.d.ts.map +1 -1
package/dist/scorers/code/trajectory/index.d.ts +147 -0
package/dist/scorers/code/trajectory/index.d.ts.map +1 -0
package/dist/scorers/llm/answer-similarity/index.d.ts +2 -2
package/dist/scorers/llm/context-precision/index.d.ts +2 -2
package/dist/scorers/llm/context-relevance/index.d.ts +1 -1
package/dist/scorers/llm/faithfulness/index.d.ts +1 -1
package/dist/scorers/llm/hallucination/index.d.ts +2 -2
package/dist/scorers/llm/index.d.ts +1 -0
package/dist/scorers/llm/index.d.ts.map +1 -1
package/dist/scorers/llm/noise-sensitivity/index.d.ts +1 -1
package/dist/scorers/llm/prompt-alignment/index.d.ts +5 -5
package/dist/scorers/llm/tool-call-accuracy/index.d.ts +1 -1
package/dist/scorers/llm/toxicity/index.d.ts +1 -1
package/dist/scorers/llm/trajectory/index.d.ts +58 -0
package/dist/scorers/llm/trajectory/index.d.ts.map +1 -0
package/dist/scorers/llm/trajectory/prompts.d.ts +20 -0
package/dist/scorers/llm/trajectory/prompts.d.ts.map +1 -0
package/dist/scorers/prebuilt/index.cjs +638 -59
package/dist/scorers/prebuilt/index.cjs.map +1 -1
package/dist/scorers/prebuilt/index.js +578 -2
package/dist/scorers/prebuilt/index.js.map +1 -1
package/dist/scorers/utils.cjs +41 -17
package/dist/scorers/utils.d.ts +171 -1
package/dist/scorers/utils.d.ts.map +1 -1
package/dist/scorers/utils.js +1 -1
package/package.json +14 -11
package/dist/chunk-OEOE7ZHN.js +0 -195
package/dist/chunk-OEOE7ZHN.js.map +0 -1
package/dist/chunk-W3U7MMDX.cjs +0 -212
package/dist/chunk-W3U7MMDX.cjs.map +0 -1

package/dist/docs/references/reference-evals-context-relevance.md CHANGED

@@ -1,10 +1,10 @@
-# Context Relevance Scorer
+# Context relevance scorer
 The `createContextRelevanceScorerLLM()` function creates a scorer that evaluates how relevant and useful provided context was for generating agent responses. It uses weighted relevance levels and applies penalties for unused high-relevance context and missing information.
-It is especially useful for these use cases:
+It's especially useful for these use cases:
-**Content Generation Evaluation**
+## Content generation evaluation
 Best for evaluating context quality in:
@@ -12,7 +12,7 @@ Best for evaluating context quality in:
 - RAG pipelines needing nuanced relevance assessment
 - Systems where missing context affects quality
-**Context Selection Optimization**
+## Context selection optimization
 Use when optimizing for:
@@ -22,19 +22,19 @@ Use when optimizing for:
 ## Parameters
-**model:** (`MastraModelConfig`): The language model to use for evaluating context relevance
+**model** (`MastraModelConfig`): The language model to use for evaluating context relevance
-**options:** (`ContextRelevanceOptions`): Configuration options for the scorer
+**options** (`ContextRelevanceOptions`): Configuration options for the scorer
 Note: Either `context` or `contextExtractor` must be provided. If both are provided, `contextExtractor` takes precedence.
-## .run() Returns
+## `.run()` returns
-**score:** (`number`): Weighted relevance score between 0 and scale (default 0-1)
+**score** (`number`): Weighted relevance score between 0 and scale (default 0-1)
-**reason:** (`string`): Human-readable explanation of the context relevance evaluation
+**reason** (`string`): Human-readable explanation of the context relevance evaluation
-## Scoring Details
+## Scoring details
 ### Weighted Relevance Scoring
@@ -115,16 +115,16 @@ Use results to improve your system:
 Control how penalties are applied for unused and missing context:
 ```typescript
-import { createContextRelevanceScorerLLM } from "@mastra/evals";
+import { createContextRelevanceScorerLLM } from '@mastra/evals'
 // Stricter penalty configuration
 const strictScorer = createContextRelevanceScorerLLM({
-  model: "openai/gpt-5.1",
+  model: 'openai/gpt-5.4',
   options: {
     context: [
-      "Einstein won the Nobel Prize for photoelectric effect",
-      "He developed the theory of relativity",
-      "Einstein was born in Germany",
+      'Einstein won the Nobel Prize for photoelectric effect',
+      'He developed the theory of relativity',
+      'Einstein was born in Germany',
     ],
     penalties: {
       unusedHighRelevanceContext: 0.2, // 20% penalty per unused high-relevance context
@@ -133,16 +133,16 @@ const strictScorer = createContextRelevanceScorerLLM({
     },
     scale: 1,
   },
-});
+})
 // Lenient penalty configuration
 const lenientScorer = createContextRelevanceScorerLLM({
-  model: "openai/gpt-5.1",
+  model: 'openai/gpt-5.4',
   options: {
     context: [
-      "Einstein won the Nobel Prize for photoelectric effect",
-      "He developed the theory of relativity",
-      "Einstein was born in Germany",
+      'Einstein won the Nobel Prize for photoelectric effect',
+      'He developed the theory of relativity',
+      'Einstein was born in Germany',
     ],
     penalties: {
       unusedHighRelevanceContext: 0.05, // 5% penalty per unused high-relevance context
@@ -151,69 +151,68 @@ const lenientScorer = createContextRelevanceScorerLLM({
     },
     scale: 1,
   },
-});
+})
 const testRun = {
   input: {
     inputMessages: [
       {
-        id: "1",
-        role: "user",
-        content: "What did Einstein achieve in physics?",
+        id: '1',
+        role: 'user',
+        content: 'What did Einstein achieve in physics?',
       },
     ],
   },
   output: [
     {
-      id: "2",
-      role: "assistant",
-      content:
-        "Einstein won the Nobel Prize for his work on the photoelectric effect.",
+      id: '2',
+      role: 'assistant',
+      content: 'Einstein won the Nobel Prize for his work on the photoelectric effect.',
     },
   ],
-};
+}
-const strictResult = await strictScorer.run(testRun);
-const lenientResult = await lenientScorer.run(testRun);
+const strictResult = await strictScorer.run(testRun)
+const lenientResult = await lenientScorer.run(testRun)
-console.log("Strict penalties:", strictResult.score); // Lower score due to unused context
-console.log("Lenient penalties:", lenientResult.score); // Higher score, less penalty
+console.log('Strict penalties:', strictResult.score) // Lower score due to unused context
+console.log('Lenient penalties:', lenientResult.score) // Higher score, less penalty
 ```
 ### Dynamic Context Extraction
 ```typescript
 const scorer = createContextRelevanceScorerLLM({
-  model: "openai/gpt-5.1",
+  model: 'openai/gpt-5.4',
   options: {
     contextExtractor: (input, output) => {
       // Extract context based on the query
-      const userQuery = input?.inputMessages?.[0]?.content || "";
-      if (userQuery.includes("Einstein")) {
+      const userQuery = input?.inputMessages?.[0]?.content || ''
+      if (userQuery.includes('Einstein')) {
         return [
-          "Einstein won the Nobel Prize for the photoelectric effect",
-          "He developed the theory of relativity",
-        ];
+          'Einstein won the Nobel Prize for the photoelectric effect',
+          'He developed the theory of relativity',
+        ]
       }
-      return ["General physics information"];
+      return ['General physics information']
     },
     penalties: {
       unusedHighRelevanceContext: 0.15,
     },
   },
-});
+})
 ```
 ### Custom scale factor
 ```typescript
 const scorer = createContextRelevanceScorerLLM({
-  model: "openai/gpt-5.1",
+  model: 'openai/gpt-5.4',
   options: {
-    context: ["Relevant information...", "Supporting details..."],
+    context: ['Relevant information...', 'Supporting details...'],
     scale: 100, // Scale scores from 0-100 instead of 0-1
   },
-});
+})
 // Result will be scaled: score: 85 instead of 0.85
 ```
@@ -222,21 +221,21 @@ const scorer = createContextRelevanceScorerLLM({
 ```typescript
 const scorer = createContextRelevanceScorerLLM({
-  model: "openai/gpt-5.1",
+  model: 'openai/gpt-5.4',
   options: {
     contextExtractor: (input, output) => {
-      const query = input?.inputMessages?.[0]?.content || "";
+      const query = input?.inputMessages?.[0]?.content || ''
       // Combine from multiple sources
-      const kbContext = knowledgeBase.search(query);
-      const docContext = documentStore.retrieve(query);
-      const cacheContext = contextCache.get(query);
+      const kbContext = knowledgeBase.search(query)
+      const docContext = documentStore.retrieve(query)
+      const cacheContext = contextCache.get(query)
-      return [...kbContext, ...docContext, ...cacheContext];
+      return [...kbContext, ...docContext, ...cacheContext]
     },
     scale: 1,
   },
-});
+})
 ```
 ## Examples
@@ -246,41 +245,41 @@ const scorer = createContextRelevanceScorerLLM({
 This example shows excellent context relevance where all context directly supports the response:
 ```typescript
-import { createContextRelevanceScorerLLM } from "@mastra/evals";
+import { createContextRelevanceScorerLLM } from '@mastra/evals'
 const scorer = createContextRelevanceScorerLLM({
-  model: "openai/gpt-5.1",
+  model: 'openai/gpt-5.4',
   options: {
     context: [
-      "Einstein won the Nobel Prize for his discovery of the photoelectric effect in 1921.",
-      "He published his theory of special relativity in 1905.",
-      "His general relativity theory, published in 1915, revolutionized our understanding of gravity.",
+      'Einstein won the Nobel Prize for his discovery of the photoelectric effect in 1921.',
+      'He published his theory of special relativity in 1905.',
+      'His general relativity theory, published in 1915, revolutionized our understanding of gravity.',
     ],
     scale: 1,
   },
-});
+})
 const result = await scorer.run({
   input: {
     inputMessages: [
       {
-        id: "1",
-        role: "user",
+        id: '1',
+        role: 'user',
         content: "What were Einstein's major scientific achievements?",
       },
     ],
   },
   output: [
     {
-      id: "2",
-      role: "assistant",
+      id: '2',
+      role: 'assistant',
       content:
         "Einstein's major achievements include the Nobel Prize for the photoelectric effect, special relativity in 1905, and general relativity in 1915.",
     },
   ],
-});
+})
-console.log(result);
+console.log(result)
 // Output:
 // {
 //   score: 1.0,
@@ -293,43 +292,43 @@ console.log(result);
 This example shows moderate relevance with some context being irrelevant or unused:
 ```typescript
-import { createContextRelevanceScorerLLM } from "@mastra/evals";
+import { createContextRelevanceScorerLLM } from '@mastra/evals'
 const scorer = createContextRelevanceScorerLLM({
-  model: "openai/gpt-5.1",
+  model: 'openai/gpt-5.4',
   options: {
     context: [
-      "Solar eclipses occur when the Moon blocks the Sun.",
-      "The Moon moves between the Earth and Sun during eclipses.",
-      "The Moon is visible at night.",
-      "Stars twinkle due to atmospheric interference.",
-      "Total eclipses can last up to 7.5 minutes.",
+      'Solar eclipses occur when the Moon blocks the Sun.',
+      'The Moon moves between the Earth and Sun during eclipses.',
+      'The Moon is visible at night.',
+      'Stars twinkle due to atmospheric interference.',
+      'Total eclipses can last up to 7.5 minutes.',
     ],
     scale: 1,
   },
-});
+})
 const result = await scorer.run({
   input: {
     inputMessages: [
       {
-        id: "1",
-        role: "user",
-        content: "What causes solar eclipses?",
+        id: '1',
+        role: 'user',
+        content: 'What causes solar eclipses?',
       },
     ],
   },
   output: [
     {
-      id: "2",
-      role: "assistant",
+      id: '2',
+      role: 'assistant',
       content:
-        "Solar eclipses happen when the Moon moves between Earth and the Sun, blocking sunlight.",
+        'Solar eclipses happen when the Moon moves between Earth and the Sun, blocking sunlight.',
     },
   ],
-});
+})
-console.log(result);
+console.log(result)
 // Output with default penalties:
 // {
 //   score: 0.64,
@@ -338,14 +337,14 @@ console.log(result);
 // With custom penalty configuration
 const customScorer = createContextRelevanceScorerLLM({
-  model: "openai/gpt-5.1",
+  model: 'openai/gpt-5.4',
   options: {
     context: [
-      "Solar eclipses occur when the Moon blocks the Sun.",
-      "The Moon moves between the Earth and Sun during eclipses.",
-      "The Moon is visible at night.",
-      "Stars twinkle due to atmospheric interference.",
-      "Total eclipses can last up to 7.5 minutes.",
+      'Solar eclipses occur when the Moon blocks the Sun.',
+      'The Moon moves between the Earth and Sun during eclipses.',
+      'The Moon is visible at night.',
+      'Stars twinkle due to atmospheric interference.',
+      'Total eclipses can last up to 7.5 minutes.',
     ],
     penalties: {
       unusedHighRelevanceContext: 0.05, // Lower penalty for unused context
@@ -353,25 +352,23 @@ const customScorer = createContextRelevanceScorerLLM({
       maxMissingContextPenalty: 0.3,
     },
   },
-});
+})
 const customResult = await customScorer.run({
   input: {
-    inputMessages: [
-      { id: "1", role: "user", content: "What causes solar eclipses?" },
-    ],
+    inputMessages: [{ id: '1', role: 'user', content: 'What causes solar eclipses?' }],
   },
   output: [
     {
-      id: "2",
-      role: "assistant",
+      id: '2',
+      role: 'assistant',
       content:
-        "Solar eclipses happen when the Moon moves between Earth and the Sun, blocking sunlight.",
+        'Solar eclipses happen when the Moon moves between Earth and the Sun, blocking sunlight.',
     },
   ],
-});
+})
-console.log(customResult);
+console.log(customResult)
 // Output with lenient penalties:
 // {
 //   score: 0.69, // Higher score due to reduced penalty for unused context
@@ -384,42 +381,42 @@ console.log(customResult);
 This example shows poor context relevance with mostly irrelevant information:
 ```typescript
-import { createContextRelevanceScorerLLM } from "@mastra/evals";
+import { createContextRelevanceScorerLLM } from '@mastra/evals'
 const scorer = createContextRelevanceScorerLLM({
-  model: "openai/gpt-5.1",
+  model: 'openai/gpt-5.4',
   options: {
     context: [
-      "The Great Barrier Reef is located in Australia.",
-      "Coral reefs need warm water to survive.",
-      "Many fish species live in coral reefs.",
-      "Australia has six states and two territories.",
-      "The capital of Australia is Canberra.",
+      'The Great Barrier Reef is located in Australia.',
+      'Coral reefs need warm water to survive.',
+      'Many fish species live in coral reefs.',
+      'Australia has six states and two territories.',
+      'The capital of Australia is Canberra.',
     ],
     scale: 1,
   },
-});
+})
 const result = await scorer.run({
   input: {
     inputMessages: [
       {
-        id: "1",
-        role: "user",
-        content: "What is the capital of Australia?",
+        id: '1',
+        role: 'user',
+        content: 'What is the capital of Australia?',
       },
     ],
   },
   output: [
     {
-      id: "2",
-      role: "assistant",
-      content: "The capital of Australia is Canberra.",
+      id: '2',
+      role: 'assistant',
+      content: 'The capital of Australia is Canberra.',
     },
   ],
-});
+})
-console.log(result);
+console.log(result)
 // Output:
 // {
 //   score: 0.26,
@@ -432,33 +429,33 @@ console.log(result);
 Extract context dynamically based on the run input:
 ```typescript
-import { createContextRelevanceScorerLLM } from "@mastra/evals";
+import { createContextRelevanceScorerLLM } from '@mastra/evals'
 const scorer = createContextRelevanceScorerLLM({
-  model: "openai/gpt-5.1",
+  model: 'openai/gpt-5.4',
   options: {
     contextExtractor: (input, output) => {
       // Extract query from input
-      const query = input?.inputMessages?.[0]?.content || "";
+      const query = input?.inputMessages?.[0]?.content || ''
       // Dynamically retrieve context based on query
-      if (query.toLowerCase().includes("einstein")) {
+      if (query.toLowerCase().includes('einstein')) {
         return [
-          "Einstein developed E=mc²",
-          "He won the Nobel Prize in 1921",
-          "His theories revolutionized physics",
-        ];
+          'Einstein developed E=mc²',
+          'He won the Nobel Prize in 1921',
+          'His theories revolutionized physics',
+        ]
       }
-      if (query.toLowerCase().includes("climate")) {
+      if (query.toLowerCase().includes('climate')) {
         return [
-          "Global temperatures are rising",
-          "CO2 levels affect climate",
-          "Renewable energy reduces emissions",
-        ];
+          'Global temperatures are rising',
+          'CO2 levels affect climate',
+          'Renewable energy reduces emissions',
+        ]
       }
-      return ["General knowledge base entry"];
+      return ['General knowledge base entry']
     },
     penalties: {
       unusedHighRelevanceContext: 0.15, // 15% penalty for unused relevant context
@@ -467,7 +464,7 @@ const scorer = createContextRelevanceScorerLLM({
     },
     scale: 1,
   },
-});
+})
 ```
 ### RAG system integration
@@ -475,19 +472,17 @@ const scorer = createContextRelevanceScorerLLM({
 Integrate with RAG pipelines to evaluate retrieved context:
 ```typescript
-import { createContextRelevanceScorerLLM } from "@mastra/evals";
+import { createContextRelevanceScorerLLM } from '@mastra/evals'
 const scorer = createContextRelevanceScorerLLM({
-  model: "openai/gpt-5.1",
+  model: 'openai/gpt-5.4',
   options: {
     contextExtractor: (input, output) => {
       // Extract from RAG retrieval results
-      const ragResults = inputData.metadata?.ragResults || [];
+      const ragResults = inputData.metadata?.ragResults || []
       // Return the text content of retrieved documents
-      return ragResults
-        .filter((doc) => doc.relevanceScore > 0.5)
-        .map((doc) => doc.content);
+      return ragResults.filter(doc => doc.relevanceScore > 0.5).map(doc => doc.content)
     },
     penalties: {
       unusedHighRelevanceContext: 0.12, // Moderate penalty for unused RAG context
@@ -496,28 +491,28 @@ const scorer = createContextRelevanceScorerLLM({
     },
     scale: 1,
   },
-});
+})
 // Evaluate RAG system performance
-const evaluateRAG = async (testCases) => {
-  const results = [];
+const evaluateRAG = async testCases => {
+  const results = []
   for (const testCase of testCases) {
-    const score = await scorer.run(testCase);
+    const score = await scorer.run(testCase)
     results.push({
       query: testCase.inputData.inputMessages[0].content,
       relevanceScore: score.score,
       feedback: score.reason,
-      unusedContext: score.reason.includes("unused"),
-      missingContext: score.reason.includes("missing"),
-    });
+      unusedContext: score.reason.includes('unused'),
+      missingContext: score.reason.includes('missing'),
+    })
   }
-  return results;
-};
+  return results
+}
 ```
-## Comparison with Context Precision
+## Comparison with context precision
 Choose the right scorer for your needs:

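Two details in the RAG system integration example in this file do not line up with the rest of the page. First, `inputData` is not defined anywhere in the snippet, even though the extractor receives `input` and `output`. Second, `evaluateRAG` reads `testCase.inputData.inputMessages`, while every other `scorer.run()` call on this page passes the messages under `input`. The TypeScript sketch below shows the evaluation loop reading the query from `input` instead. The `ChatMessage`, `TestCase`, `ScoreResult`, and `RunnableScorer` types and the stub scorer are hypothetical stand-ins that let the loop run without a model. They are not the `@mastra/evals` type definitions. This diff does not show where RAG retrieval results live on the run input, so the extractor itself is not reproduced here.

```typescript
// Hypothetical shapes mirroring the run input/output used on this page
interface ChatMessage {
  id: string
  role: 'user' | 'assistant'
  content: string
}

interface TestCase {
  input: { inputMessages: ChatMessage[] }
  output: ChatMessage[]
}

interface ScoreResult {
  score: number
  reason: string
}

// Anything with a run() method that resolves to { score, reason }
interface RunnableScorer {
  run(testCase: TestCase): Promise<ScoreResult>
}

interface RagEvaluation {
  query: string
  relevanceScore: number
  feedback: string
  unusedContext: boolean
  missingContext: boolean
}

const evaluateRAG = async (
  scorer: RunnableScorer,
  testCases: TestCase[],
): Promise<RagEvaluation[]> => {
  const results: RagEvaluation[] = []
  for (const testCase of testCases) {
    const result = await scorer.run(testCase)
    results.push({
      // Read the query from `input`, the same object passed to scorer.run()
      query: testCase.input.inputMessages[0]?.content ?? '',
      relevanceScore: result.score,
      feedback: result.reason,
      // Keyword heuristics carried over from the original example
      unusedContext: result.reason.includes('unused'),
      missingContext: result.reason.includes('missing'),
    })
  }
  return results
}

// Hypothetical stub scorer so the loop can run without calling a model
const stubScorer: RunnableScorer = {
  run: async () => ({ score: 1, reason: 'All provided context was relevant.' }),
}

evaluateRAG(stubScorer, [
  {
    input: { inputMessages: [{ id: '1', role: 'user', content: 'What causes solar eclipses?' }] },
    output: [{ id: '2', role: 'assistant', content: 'The Moon blocks the Sun.' }],
  },
]).then(results => console.log(results))
```

The sketch keeps the original example's practice of matching the words "unused" and "missing" in the free-text `reason`. Treat that as a rough heuristic, because the wording of LLM-generated reasons can vary between runs.
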
package/dist/docs/references/reference-evals-faithfulness.md CHANGED

@@ -1,4 +1,4 @@
-# Faithfulness Scorer
+# Faithfulness scorer
 The `createFaithfulnessScorer()` function evaluates how factually accurate an LLM's output is compared to the provided context. It extracts claims from the output and verifies them against the context, making it essential to measure RAG pipeline responses' reliability.
@@ -6,33 +6,33 @@ The `createFaithfulnessScorer()` function evaluates how factually accurate an LL
 The `createFaithfulnessScorer()` function accepts a single options object with the following properties:
-**model:** (`LanguageModel`): Configuration for the model used to evaluate faithfulness.
+**model** (`LanguageModel`): Configuration for the model used to evaluate faithfulness.
-**context:** (`string[]`): Array of context chunks against which the output's claims will be verified.
+**context** (`string[]`): Array of context chunks against which the output's claims will be verified.
-**scale:** (`number`): The maximum score value. The final score will be normalized to this scale. (Default: `1`)
+**scale** (`number`): The maximum score value. The final score will be normalized to this scale. (Default: `1`)
 This function returns an instance of the MastraScorer class. The `.run()` method accepts the same input as other scorers (see the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer)), but the return value includes LLM-specific fields as documented below.
-## .run() Returns
+## `.run()` returns
-**runId:** (`string`): The id of the run (optional).
+**runId** (`string`): The id of the run (optional).
-**preprocessStepResult:** (`string[]`): Array of extracted claims from the output.
+**preprocessStepResult** (`string[]`): Array of extracted claims from the output.
-**preprocessPrompt:** (`string`): The prompt sent to the LLM for the preprocess step (optional).
+**preprocessPrompt** (`string`): The prompt sent to the LLM for the preprocess step (optional).
-**analyzeStepResult:** (`object`): Object with verdicts: { verdicts: Array<{ verdict: 'yes' | 'no' | 'unsure', reason: string }> }
+**analyzeStepResult** (`object`): Object with verdicts: { verdicts: Array<{ verdict: 'yes' | 'no' | 'unsure', reason: string }> }
-**analyzePrompt:** (`string`): The prompt sent to the LLM for the analyze step (optional).
+**analyzePrompt** (`string`): The prompt sent to the LLM for the analyze step (optional).
-**score:** (`number`): A score between 0 and the configured scale, representing the proportion of claims that are supported by the context.
+**score** (`number`): A score between 0 and the configured scale, representing the proportion of claims that are supported by the context.
-**reason:** (`string`): A detailed explanation of the score, including which claims were supported, contradicted, or marked as unsure.
+**reason** (`string`): A detailed explanation of the score, including which claims were supported, contradicted, or marked as unsure.
-**generateReasonPrompt:** (`string`): The prompt sent to the LLM for the generateReason step (optional).
+**generateReasonPrompt** (`string`): The prompt sent to the LLM for the generateReason step (optional).
-## Scoring Details
+## Scoring details
 The scorer evaluates faithfulness through claim verification against provided context.
@@ -73,22 +73,22 @@ A faithfulness score between 0 and 1:
 Evaluate agent responses for faithfulness to provided context:
 ```typescript
-import { runEvals } from "@mastra/core/evals";
-import { createFaithfulnessScorer } from "@mastra/evals/scorers/prebuilt";
-import { myAgent } from "./agent";
+import { runEvals } from '@mastra/core/evals'
+import { createFaithfulnessScorer } from '@mastra/evals/scorers/prebuilt'
+import { myAgent } from './agent'
 // Context is typically populated from agent tool calls or RAG retrieval
 const scorer = createFaithfulnessScorer({
-  model: "openai/gpt-4o",
-});
+  model: 'openai/gpt-5.4',
+})
 const result = await runEvals({
   data: [
     {
-      input: "Tell me about the Tesla Model 3.",
+      input: 'Tell me about the Tesla Model 3.',
     },
     {
-      input: "What are the key features of this electric vehicle?",
+      input: 'What are the key features of this electric vehicle?',
     },
   ],
   scorers: [scorer],
@@ -97,11 +97,11 @@ const result = await runEvals({
     console.log({
       score: scorerResults[scorer.id].score,
       reason: scorerResults[scorer.id].reason,
-    });
+    })
   },
-});
+})
-console.log(result.scores);
+console.log(result.scores)
 ```
 For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).
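The faithfulness `score` field is described above as the proportion of claims supported by the context, scaled to `scale`. The following minimal TypeScript sketch shows that arithmetic over the documented `analyzeStepResult.verdicts` shape. It rests on two illustrative assumptions: only `'yes'` verdicts count as supported, with `'no'` and `'unsure'` counting against the score, and an empty claim list scores 0. The shipped scorer may handle either case differently. The `faithfulnessScore` function and the sample verdicts are hypothetical.

```typescript
// Verdict shape as documented for analyzeStepResult.verdicts
type Verdict = { verdict: 'yes' | 'no' | 'unsure'; reason: string }

// Sketch: fraction of claims judged supported, multiplied by the scale.
// Assumptions: only 'yes' counts as supported; no claims yields 0.
function faithfulnessScore(verdicts: Verdict[], scale = 1): number {
  if (verdicts.length === 0) return 0
  const supported = verdicts.filter(v => v.verdict === 'yes').length
  return (supported / verdicts.length) * scale
}

// Hypothetical verdicts for four extracted claims
const verdicts: Verdict[] = [
  { verdict: 'yes', reason: 'Stated directly in the context' },
  { verdict: 'yes', reason: 'Paraphrases a context chunk' },
  { verdict: 'no', reason: 'Contradicts the context' },
  { verdict: 'unsure', reason: 'Context does not mention this' },
]

console.log(faithfulnessScore(verdicts)) // 0.5
console.log(faithfulnessScore(verdicts, 10)) // 5
```

Under these assumptions, each `'unsure'` verdict lowers the score as much as a contradicted claim. A stricter or more lenient policy would change only the `filter` predicate.
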