npm - @mastra/core - Versions diffs - 1.6.0 → 1.7.0 - Mend

@mastra/core 1.6.0 → 1.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (453)

package/dist/docs/references/reference-evals-completeness.md DELETED

@@ -1,137 +0,0 @@
-# Completeness Scorer
-The `createCompletenessScorer()` function evaluates how thoroughly an LLM's output covers the key elements present in the input. It analyzes nouns, verbs, topics, and terms to determine coverage and provides a detailed completeness score.
-## Parameters
-The `createCompletenessScorer()` function does not take any options.
-This function returns an instance of the MastraScorer class. See the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer) for details on the `.run()` method and its input/output.
-## .run() Returns
-**runId:** (`string`): The id of the run (optional).
-**preprocessStepResult:** (`object`): Object with extracted elements and coverage details: { inputElements: string\[], outputElements: string\[], missingElements: string\[], elementCounts: { input: number, output: number } }
-**score:** (`number`): Completeness score (0-1) representing the proportion of input elements covered in the output.
-The `.run()` method returns a result in the following shape:
-```typescript
-{
-  runId: string,
-  extractStepResult: {
-    inputElements: string[],
-    outputElements: string[],
-    missingElements: string[],
-    elementCounts: { input: number, output: number }
-  },
-  score: number
-}
-```
-## Element Extraction Details
-The scorer extracts and analyzes several types of elements:
-- Nouns: Key objects, concepts, and entities
-- Verbs: Actions and states (converted to infinitive form)
-- Topics: Main subjects and themes
-- Terms: Individual significant words
-The extraction process includes:
-- Normalization of text (removing diacritics, converting to lowercase)
-- Splitting camelCase words
-- Handling of word boundaries
-- Special handling of short words (3 characters or less)
-- Deduplication of elements
-### extractStepResult
-From the `.run()` method, you can get the `extractStepResult` object with the following properties:
-- **inputElements**: Key elements found in the input (e.g., nouns, verbs, topics, terms).
-- **outputElements**: Key elements found in the output.
-- **missingElements**: Input elements not found in the output.
-- **elementCounts**: The number of elements in the input and output.
-## Scoring Details
-The scorer evaluates completeness through linguistic element coverage analysis.
-### Scoring Process
-1. Extracts key elements:
-   - Nouns and named entities
-   - Action verbs
-   - Topic-specific terms
-   - Normalized word forms
-2. Calculates coverage of input elements:
-   - Exact matches for short terms (≤3 chars)
-   - Substantial overlap (>60%) for longer terms
-Final score: `(covered_elements / total_input_elements) * scale`
-### Score interpretation
-A completeness score between 0 and 1:
-- **1.0**: Thoroughly addresses all aspects of the query with comprehensive detail.
-- **0.7–0.9**: Covers most important aspects with good detail, minor gaps.
-- **0.4–0.6**: Addresses some key points but missing important aspects or lacking detail.
-- **0.1–0.3**: Only partially addresses the query with significant gaps.
-- **0.0**: Fails to address the query or provides irrelevant information.
-## Example
-Evaluate agent responses for completeness across different query complexities:
-```typescript
-import { runEvals } from "@mastra/core/evals";
-import { createCompletenessScorer } from "@mastra/evals/scorers/prebuilt";
-import { myAgent } from "./agent";
-const scorer = createCompletenessScorer();
-const result = await runEvals({
-  data: [
-    {
-      input:
-        "Explain the process of photosynthesis, including the inputs, outputs, and stages involved.",
-    },
-    {
-      input:
-        "What are the benefits and drawbacks of remote work for both employees and employers?",
-    },
-    {
-      input:
-        "Compare renewable and non-renewable energy sources in terms of cost, environmental impact, and sustainability.",
-    },
-  ],
-  scorers: [scorer],
-  target: myAgent,
-  onItemComplete: ({ scorerResults }) => {
-    console.log({
-      score: scorerResults[scorer.id].score,
-    });
-  },
-});
-console.log(result.scores);
-```
-For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).
-To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.
-## Related
-- [Answer Relevancy Scorer](https://mastra.ai/reference/evals/answer-relevancy)
-- [Content Similarity Scorer](https://mastra.ai/reference/evals/content-similarity)
-- [Textual Difference Scorer](https://mastra.ai/reference/evals/textual-difference)
-- [Keyword Coverage Scorer](https://mastra.ai/reference/evals/keyword-coverage)
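
Note on the deleted Completeness Scorer reference above: its "Scoring Process" section describes exact matching for short terms (≤3 characters), "substantial overlap (>60%)" for longer terms, and a final score of `(covered_elements / total_input_elements) * scale`. The sketch below illustrates that rule only. It is not the package's implementation: `isCovered` and `coverageScore` are hypothetical names, measuring overlap as substring containment plus a length ratio is an assumption (the deleted doc does not define "overlap"), and returning 0 for an empty input is also an assumption.

```typescript
// Hypothetical helper: is one input element covered by any output element?
function isCovered(element: string, outputElements: string[]): boolean {
  return outputElements.some((candidate) => {
    // Short terms (3 characters or fewer) must match exactly.
    if (element.length <= 3) return candidate === element;
    // Assumed overlap measure: the shorter string must occur inside the
    // longer one and make up more than 60% of its length.
    const shorter = element.length <= candidate.length ? element : candidate;
    const longer = element.length <= candidate.length ? candidate : element;
    return longer.indexOf(shorter) !== -1 && shorter.length / longer.length > 0.6;
  });
}

// Hypothetical scorer: (covered_elements / total_input_elements) * scale.
function coverageScore(
  inputElements: string[],
  outputElements: string[],
  scale = 1,
): number {
  if (inputElements.length === 0) return 0; // assumption: nothing to cover
  const covered = inputElements.filter((el) => isCovered(el, outputElements)).length;
  return (covered / inputElements.length) * scale;
}

// Three of the four input elements are covered; "oxygen" is missing.
console.log(
  coverageScore(
    ["sun", "water", "photosynthesis", "oxygen"],
    ["sun", "photosynthesi", "water"],
  ),
);
```

The element lists here are hand-written; in the deleted reference they come from the extraction step (normalization, camelCase splitting, deduplication), which this sketch does not attempt to reproduce.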

package/dist/docs/references/reference-evals-content-similarity.md DELETED

@@ -1,101 +0,0 @@
-# Content Similarity Scorer
-The `createContentSimilarityScorer()` function measures the textual similarity between two strings, providing a score that indicates how closely they match. It supports configurable options for case sensitivity and whitespace handling.
-## Parameters
-The `createContentSimilarityScorer()` function accepts a single options object with the following properties:
-**ignoreCase:** (`boolean`): Whether to ignore case differences when comparing strings. (Default: `true`)
-**ignoreWhitespace:** (`boolean`): Whether to normalize whitespace when comparing strings. (Default: `true`)
-This function returns an instance of the MastraScorer class. See the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer) for details on the `.run()` method and its input/output.
-## .run() Returns
-**runId:** (`string`): The id of the run (optional).
-**preprocessStepResult:** (`object`): Object with processed input and output: { processedInput: string, processedOutput: string }
-**analyzeStepResult:** (`object`): Object with similarity: { similarity: number }
-**score:** (`number`): Similarity score (0-1) where 1 indicates perfect similarity.
-## Scoring Details
-The scorer evaluates textual similarity through character-level matching and configurable text normalization.
-### Scoring Process
-1. Normalizes text:
-   - Case normalization (if ignoreCase: true)
-   - Whitespace normalization (if ignoreWhitespace: true)
-2. Compares processed strings using string-similarity algorithm:
-   - Analyzes character sequences
-   - Aligns word boundaries
-   - Considers relative positions
-   - Accounts for length differences
-Final score: `similarity_value * scale`
-## Example
-Evaluate textual similarity between expected and actual agent outputs:
-```typescript
-import { runEvals } from "@mastra/core/evals";
-import { createContentSimilarityScorer } from "@mastra/evals/scorers/prebuilt";
-import { myAgent } from "./agent";
-const scorer = createContentSimilarityScorer();
-const result = await runEvals({
-  data: [
-    {
-      input: "Summarize the benefits of TypeScript",
-      groundTruth:
-        "TypeScript provides static typing, better tooling support, and improved code maintainability.",
-    },
-    {
-      input: "What is machine learning?",
-      groundTruth:
-        "Machine learning is a subset of AI that enables systems to learn from data without explicit programming.",
-    },
-  ],
-  scorers: [scorer],
-  target: myAgent,
-  onItemComplete: ({ scorerResults }) => {
-    console.log({
-      score: scorerResults[scorer.id].score,
-      groundTruth: scorerResults[scorer.id].groundTruth,
-    });
-  },
-});
-console.log(result.scores);
-```
-For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).
-To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.
-### Score interpretation
-A similarity score between 0 and 1:
-- **1.0**: Perfect match – content is nearly identical.
-- **0.7–0.9**: High similarity – minor differences in word choice or structure.
-- **0.4–0.6**: Moderate similarity – general overlap with noticeable variation.
-- **0.1–0.3**: Low similarity – few common elements or shared meaning.
-- **0.0**: No similarity – completely different content.
-## Related
-- [Completeness Scorer](https://mastra.ai/reference/evals/completeness)
-- [Textual Difference Scorer](https://mastra.ai/reference/evals/textual-difference)
-- [Answer Relevancy Scorer](https://mastra.ai/reference/evals/answer-relevancy)
-- [Keyword Coverage Scorer](https://mastra.ai/reference/evals/keyword-coverage)
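
Note on the deleted Content Similarity Scorer reference above: it describes normalizing case and whitespace (both on by default) and then comparing the processed strings with a string-similarity algorithm, with a final score of `similarity_value * scale`. The sketch below shows one way that pipeline could look. The deleted doc does not specify the comparison algorithm, so the Sørensen-Dice coefficient over character bigrams is used here purely as a stand-in assumption; `normalize`, `diceSimilarity`, and `similarityScore` are hypothetical names.

```typescript
interface SimilarityOptions {
  ignoreCase?: boolean;
  ignoreWhitespace?: boolean;
}

// Step 1: normalization, mirroring the documented defaults (both true).
function normalize(text: string, options: SimilarityOptions = {}): string {
  const { ignoreCase = true, ignoreWhitespace = true } = options;
  let result = text;
  if (ignoreCase) result = result.toLowerCase();
  if (ignoreWhitespace) result = result.replace(/\s+/g, " ").trim();
  return result;
}

// Count every pair of adjacent characters in the text.
function bigramCounts(text: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (let i = 0; i < text.length - 1; i++) {
    const pair = text.slice(i, i + 2);
    counts[pair] = (counts[pair] ?? 0) + 1;
  }
  return counts;
}

// Step 2 (assumed algorithm): Dice coefficient, 2 * shared / total bigrams.
function diceSimilarity(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  const left = bigramCounts(a);
  const right = bigramCounts(b);
  let shared = 0;
  Object.keys(left).forEach((pair) => {
    shared += Math.min(left[pair], right[pair] ?? 0);
  });
  return (2 * shared) / (a.length - 1 + (b.length - 1));
}

// Final score: similarity_value * scale.
function similarityScore(
  input: string,
  output: string,
  options: SimilarityOptions = {},
  scale = 1,
): number {
  return diceSimilarity(normalize(input, options), normalize(output, options)) * scale;
}

console.log(similarityScore("Hello   World", "hello world"));
console.log(similarityScore("night", "nacht"));
```

With the default options the first call compares two identical normalized strings; passing `{ ignoreCase: false }` makes `"ABC"` and `"abc"` share no bigrams.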

package/dist/docs/references/reference-evals-context-precision.md DELETED

@@ -1,196 +0,0 @@
-# Context Precision Scorer
-The `createContextPrecisionScorer()` function creates a scorer that evaluates how relevant and well-positioned retrieved context pieces are for generating expected outputs. It uses **Mean Average Precision (MAP)** to reward systems that place relevant context earlier in the sequence.
-It is especially useful for these use cases:
-**RAG System Evaluation**
-Ideal for evaluating retrieved context in RAG pipelines where:
-- Context ordering matters for model performance
-- You need to measure retrieval quality beyond simple relevance
-- Early relevant context is more valuable than later relevant context
-**Context Window Optimization**
-Use when optimizing context selection for:
-- Limited context windows
-- Token budget constraints
-- Multi-step reasoning tasks
-## Parameters
-**model:** (`MastraModelConfig`): The language model to use for evaluating context relevance
-**options:** (`ContextPrecisionMetricOptions`): Configuration options for the scorer
-**Note**: Either `context` or `contextExtractor` must be provided. If both are provided, `contextExtractor` takes precedence.
-## .run() Returns
-**score:** (`number`): Mean Average Precision score between 0 and scale (default 0-1)
-**reason:** (`string`): Human-readable explanation of the context precision evaluation
-## Scoring Details
-### Mean Average Precision (MAP)
-Context Precision uses **Mean Average Precision** to evaluate both relevance and positioning:
-1. **Context Evaluation**: Each context piece is classified as relevant or irrelevant for generating the expected output
-2. **Precision Calculation**: For each relevant context at position `i`, precision = `relevant_items_so_far / (i + 1)`
-3. **Average Precision**: Sum all precision values and divide by total relevant items
-4. **Final Score**: Multiply by scale factor and round to 2 decimals
-### Scoring Formula
-```text
-MAP = (Σ Precision@k) / R
-Where:
-- Precision@k = (relevant items in positions 1...k) / k
-- R = total number of relevant items
-- Only calculated at positions where relevant items appear
-```
-### Score Interpretation
-- **0.9-1.0**: Excellent precision - all relevant context early in sequence
-- **0.7-0.8**: Good precision - most relevant context well-positioned
-- **0.4-0.6**: Moderate precision - relevant context mixed with irrelevant
-- **0.1-0.3**: Poor precision - little relevant context or poorly positioned
-- **0.0**: No relevant context found
-### Reason analysis
-The reason field explains:
-- Which context pieces were deemed relevant/irrelevant
-- How positioning affected the MAP calculation
-- Specific relevance criteria used in evaluation
-### Optimization insights
-Use results to:
-- **Improve retrieval**: Filter out irrelevant context before ranking
-- **Optimize ranking**: Ensure relevant context appears early
-- **Tune chunk size**: Balance context detail vs. relevance precision
-- **Evaluate embeddings**: Test different embedding models for better retrieval
-### Example Calculation
-Given context: `[relevant, irrelevant, relevant, irrelevant]`
-- Position 0: Relevant → Precision = 1/1 = 1.0
-- Position 1: Skip (irrelevant)
-- Position 2: Relevant → Precision = 2/3 = 0.67
-- Position 3: Skip (irrelevant)
-MAP = (1 + 2/3) / 2 ≈ 0.833 ≈ **0.83**
-## Scorer configuration
-### Dynamic context extraction
-```typescript
-const scorer = createContextPrecisionScorer({
-  model: "openai/gpt-5.1",
-  options: {
-    contextExtractor: (input, output) => {
-      // Extract context dynamically based on the query
-      const query = input?.inputMessages?.[0]?.content || "";
-      // Example: Retrieve from a vector database
-      const searchResults = vectorDB.search(query, { limit: 10 });
-      return searchResults.map((result) => result.content);
-    },
-    scale: 1,
-  },
-});
-```
-### Large context evaluation
-```typescript
-const scorer = createContextPrecisionScorer({
-  model: "openai/gpt-5.1",
-  options: {
-    context: [
-      // Simulate retrieved documents from vector database
-      "Document 1: Highly relevant content...",
-      "Document 2: Somewhat related content...",
-      "Document 3: Tangentially related...",
-      "Document 4: Not relevant...",
-      "Document 5: Highly relevant content...",
-      // ... up to dozens of context pieces
-    ],
-  },
-});
-```
-## Example
-Evaluate RAG system context retrieval precision for different queries:
-```typescript
-import { runEvals } from "@mastra/core/evals";
-import { createContextPrecisionScorer } from "@mastra/evals/scorers/prebuilt";
-import { myAgent } from "./agent";
-const scorer = createContextPrecisionScorer({
-  model: "openai/gpt-4o",
-  options: {
-    contextExtractor: (input, output) => {
-      // Extract context from agent's retrieved documents
-      return output.metadata?.retrievedContext || [];
-    },
-  },
-});
-const result = await runEvals({
-  data: [
-    {
-      input: "How does photosynthesis work in plants?",
-    },
-    {
-      input: "What are the mental and physical benefits of exercise?",
-    },
-  ],
-  scorers: [scorer],
-  target: myAgent,
-  onItemComplete: ({ scorerResults }) => {
-    console.log({
-      score: scorerResults[scorer.id].score,
-      reason: scorerResults[scorer.id].reason,
-    });
-  },
-});
-console.log(result.scores);
-```
-For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).
-To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.
-## Comparison with Context Relevance
-Choose the right scorer for your needs:
-| Use Case                 | Context Relevance    | Context Precision         |
-| ------------------------ | -------------------- | ------------------------- |
-| **RAG evaluation**       | When usage matters   | When ranking matters      |
-| **Context quality**      | Nuanced levels       | Binary relevance          |
-| **Missing detection**    | ✓ Identifies gaps    | ✗ Not evaluated           |
-| **Usage tracking**       | ✓ Tracks utilization | ✗ Not considered          |
-| **Position sensitivity** | ✗ Position agnostic  | ✓ Rewards early placement |
-## Related
-- [Answer Relevancy Scorer](https://mastra.ai/reference/evals/answer-relevancy) - Evaluates if answers address the question
-- [Faithfulness Scorer](https://mastra.ai/reference/evals/faithfulness) - Measures answer groundedness in context
-- [Custom Scorers](https://mastra.ai/docs/evals/custom-scorers) - Creating your own evaluation metrics
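
Note on the deleted Context Precision Scorer reference above: the MAP steps it lists (precision at each relevant position, averaged over the number of relevant items, multiplied by the scale and rounded to 2 decimals) can be sketched as follows. This is an illustration of the formula only, not the package's code: `meanAveragePrecision` is a hypothetical name, and the relevance verdicts are passed in as booleans here, whereas the deleted doc has a language model produce them.

```typescript
// relevance[i] is true when the context piece at 0-based position i
// was judged relevant for the expected output.
function meanAveragePrecision(relevance: boolean[], scale = 1): number {
  let relevantSoFar = 0;
  let precisionSum = 0;

  for (let i = 0; i < relevance.length; i++) {
    if (!relevance[i]) continue; // irrelevant positions are skipped
    relevantSoFar += 1;
    // Precision at this position: relevant items so far / (i + 1).
    precisionSum += relevantSoFar / (i + 1);
  }

  // No relevant context found: score is 0.
  if (relevantSoFar === 0) return 0;

  const map = precisionSum / relevantSoFar;
  // Multiply by the scale factor and round to 2 decimals.
  return Math.round(map * scale * 100) / 100;
}

// The doc's example: [relevant, irrelevant, relevant, irrelevant]
console.log(meanAveragePrecision([true, false, true, false])); // 0.83
```

Moving the second relevant item from position 2 to position 1 raises the score to 1, which is the position sensitivity the comparison table attributes to Context Precision.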