@mastra/mcp-docs-server 0.13.16 → 0.13.17-alpha.1
- package/.docs/organized/changelogs/%40internal%2Fstorage-test-utils.md +8 -8
- package/.docs/organized/changelogs/%40internal%2Ftypes-builder.md +2 -0
- package/.docs/organized/changelogs/%40mastra%2Fastra.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fauth.md +6 -0
- package/.docs/organized/changelogs/%40mastra%2Fchroma.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fclickhouse.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fclient-js.md +26 -26
- package/.docs/organized/changelogs/%40mastra%2Fcloud.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fcloudflare-d1.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fcloudflare.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fcore.md +35 -35
- package/.docs/organized/changelogs/%40mastra%2Fcouchbase.md +11 -11
- package/.docs/organized/changelogs/%40mastra%2Fdeployer-cloud.md +27 -0
- package/.docs/organized/changelogs/%40mastra%2Fdeployer-cloudflare.md +19 -19
- package/.docs/organized/changelogs/%40mastra%2Fdeployer-netlify.md +19 -19
- package/.docs/organized/changelogs/%40mastra%2Fdeployer-vercel.md +20 -20
- package/.docs/organized/changelogs/%40mastra%2Fdeployer.md +31 -31
- package/.docs/organized/changelogs/%40mastra%2Fdynamodb.md +11 -11
- package/.docs/organized/changelogs/%40mastra%2Fevals.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Ffastembed.md +6 -0
- package/.docs/organized/changelogs/%40mastra%2Ffirecrawl.md +19 -19
- package/.docs/organized/changelogs/%40mastra%2Fgithub.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Flance.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Flibsql.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Floggers.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fmcp-docs-server.md +26 -26
- package/.docs/organized/changelogs/%40mastra%2Fmcp-registry-registry.md +11 -11
- package/.docs/organized/changelogs/%40mastra%2Fmcp.md +19 -19
- package/.docs/organized/changelogs/%40mastra%2Fmem0.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fmemory.md +19 -19
- package/.docs/organized/changelogs/%40mastra%2Fmongodb.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fmssql.md +9 -0
- package/.docs/organized/changelogs/%40mastra%2Fopensearch.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fpg.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fpinecone.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fplayground-ui.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fqdrant.md +11 -11
- package/.docs/organized/changelogs/%40mastra%2Frag.md +23 -23
- package/.docs/organized/changelogs/%40mastra%2Fragie.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fschema-compat.md +12 -0
- package/.docs/organized/changelogs/%40mastra%2Fserver.md +26 -26
- package/.docs/organized/changelogs/%40mastra%2Fturbopuffer.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fupstash.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fvectorize.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fvoice-azure.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fvoice-cloudflare.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fvoice-deepgram.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fvoice-elevenlabs.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fvoice-gladia.md +9 -0
- package/.docs/organized/changelogs/%40mastra%2Fvoice-google-gemini-live.md +9 -0
- package/.docs/organized/changelogs/%40mastra%2Fvoice-google.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fvoice-murf.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fvoice-openai-realtime.md +11 -11
- package/.docs/organized/changelogs/%40mastra%2Fvoice-openai.md +11 -11
- package/.docs/organized/changelogs/%40mastra%2Fvoice-playai.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fvoice-sarvam.md +10 -10
- package/.docs/organized/changelogs/%40mastra%2Fvoice-speechify.md +10 -10
- package/.docs/organized/changelogs/create-mastra.md +21 -21
- package/.docs/organized/changelogs/mastra.md +32 -32
- package/.docs/raw/agents/overview.mdx +29 -47
- package/.docs/raw/deployment/monorepo.mdx +107 -0
- package/.docs/raw/frameworks/web-frameworks/astro.mdx +1 -0
- package/.docs/raw/frameworks/web-frameworks/next-js.mdx +5 -201
- package/.docs/raw/frameworks/web-frameworks/sveltekit.mdx +5 -0
- package/.docs/raw/frameworks/web-frameworks/vite-react.mdx +5 -0
- package/.docs/raw/reference/core/mastra-class.mdx +5 -3
- package/.docs/raw/reference/scorers/context-precision.mdx +130 -0
- package/.docs/raw/reference/scorers/context-relevance.mdx +222 -0
- package/.docs/raw/reference/tools/graph-rag-tool.mdx +7 -0
- package/.docs/raw/reference/tools/vector-query-tool.mdx +7 -0
- package/.docs/raw/scorers/off-the-shelf-scorers.mdx +17 -1
- package/.docs/raw/server-db/local-dev-playground.mdx +20 -1
- package/.docs/raw/workflows/control-flow.mdx +0 -46
- package/.docs/raw/workflows/error-handling.mdx +213 -0
- package/package.json +4 -4
package/.docs/raw/reference/scorers/context-precision.mdx
@@ -0,0 +1,130 @@
+---
+title: "Reference: Context Precision Scorer | Scorers | Mastra Docs"
+description: Documentation for the Context Precision Scorer in Mastra. Evaluates the relevance and precision of retrieved context for generating expected outputs using Mean Average Precision.
+---
+
+import { PropertiesTable } from "@/components/properties-table";
+
+# Context Precision Scorer
+
+The `createContextPrecisionScorer()` function creates a scorer that evaluates how relevant and well-positioned retrieved context pieces are for generating expected outputs. It uses **Mean Average Precision (MAP)** to reward systems that place relevant context earlier in the sequence.
+
+## Parameters
+
+<PropertiesTable
+  content={[
+    {
+      name: "model",
+      type: "MastraLanguageModel",
+      description: "The language model to use for evaluating context relevance",
+      required: true,
+    },
+    {
+      name: "options",
+      type: "ContextPrecisionMetricOptions",
+      description: "Configuration options for the scorer",
+      required: true,
+      children: [
+        {
+          name: "context",
+          type: "string[]",
+          description: "Array of context pieces to evaluate for relevance",
+          required: false,
+        },
+        {
+          name: "contextExtractor",
+          type: "(input, output) => string[]",
+          description: "Function to dynamically extract context from the run input and output",
+          required: false,
+        },
+        {
+          name: "scale",
+          type: "number",
+          description: "Scale factor to multiply the final score (default: 1)",
+          required: false,
+        },
+      ],
+    },
+  ]}
+/>
+
+:::note
+Either `context` or `contextExtractor` must be provided. If both are provided, `contextExtractor` takes precedence.
+:::
+
+## .run() Returns
+
+<PropertiesTable
+  content={[
+    {
+      name: "score",
+      type: "number",
+      description: "Mean Average Precision score between 0 and scale (default 0-1)",
+    },
+    {
+      name: "reason",
+      type: "string",
+      description: "Human-readable explanation of the context precision evaluation",
+    },
+  ]}
+/>
+
+## Scoring Details
+
+### Mean Average Precision (MAP)
+
+Context Precision uses **Mean Average Precision** to evaluate both relevance and positioning:
+
+1. **Context Evaluation**: Each context piece is classified as relevant or irrelevant for generating the expected output
+2. **Precision Calculation**: For each relevant context at position `i`, precision = `relevant_items_so_far / (i + 1)`
+3. **Average Precision**: Sum all precision values and divide by total relevant items
+4. **Final Score**: Multiply by scale factor and round to 2 decimals
+
+### Scoring Formula
+
+```
+MAP = (Σ Precision@k) / R
+
+Where:
+- Precision@k = (relevant items in positions 1...k) / k
+- R = total number of relevant items
+- Only calculated at positions where relevant items appear
+```
+
+### Score Interpretation
+
+- **1.0** = Perfect precision (all relevant context appears first)
+- **0.5-0.9** = Good precision with some relevant context well-positioned
+- **0.1-0.4** = Poor precision with relevant context buried or scattered
+- **0.0** = No relevant context found
+
+### Example Calculation
+
+Given context: `[relevant, irrelevant, relevant, irrelevant]`
+
+- Position 0: Relevant → Precision = 1/1 = 1.0
+- Position 1: Skip (irrelevant)
+- Position 2: Relevant → Precision = 2/3 ≈ 0.67
+- Position 3: Skip (irrelevant)
+
+MAP = (1.0 + 0.67) / 2 ≈ **0.83**
+
+## Usage Patterns
+
+### RAG System Evaluation
+Ideal for evaluating retrieved context in RAG pipelines where:
+- Context ordering matters for model performance
+- You need to measure retrieval quality beyond simple relevance
+- Early relevant context is more valuable than later relevant context
+
+### Context Window Optimization
+Use when optimizing context selection for:
+- Limited context windows
+- Token budget constraints
+- Multi-step reasoning tasks
+
+## Related
+
+- [Answer Relevancy Scorer](/reference/scorers/answer-relevancy) - Evaluates if answers address the question
+- [Faithfulness Scorer](/reference/scorers/faithfulness) - Measures answer groundedness in context
+- [Custom Scorers](/docs/scorers/custom-scorers) - Creating your own evaluation metrics
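The MAP arithmetic in the new page is easy to sanity-check in code. Below is a minimal TypeScript sketch of the documented four-step procedure (a hypothetical helper for illustration, not something the package ships) that reproduces the worked example:

```typescript
// Sketch of the documented MAP procedure (hypothetical helper, not part of the package).
function meanAveragePrecision(relevance: boolean[], scale = 1): number {
  let relevantSoFar = 0;
  let precisionSum = 0;
  relevance.forEach((isRelevant, position) => {
    if (!isRelevant) return; // precision is only sampled at relevant positions
    relevantSoFar += 1;
    precisionSum += relevantSoFar / (position + 1); // Precision@k = relevant so far / k
  });
  if (relevantSoFar === 0) return 0; // no relevant context found
  const map = precisionSum / relevantSoFar; // average precision over R relevant items
  return Math.round(map * scale * 100) / 100; // apply scale, round to 2 decimals
}

// Worked example: [relevant, irrelevant, relevant, irrelevant]
meanAveragePrecision([true, false, true, false]); // (1/1 + 2/3) / 2 ≈ 0.83
```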
package/.docs/raw/reference/scorers/context-relevance.mdx
@@ -0,0 +1,222 @@
+---
+title: "Reference: Context Relevance Scorer | Scorers | Mastra Docs"
+description: Documentation for the Context Relevance Scorer in Mastra. Evaluates the relevance and utility of provided context for generating agent responses using weighted relevance scoring.
+---
+
+import { PropertiesTable } from "@/components/properties-table";
+
+# Context Relevance Scorer
+
+The `createContextRelevanceScorerLLM()` function creates a scorer that evaluates how relevant and useful provided context was for generating agent responses. It uses weighted relevance levels and applies penalties for unused high-relevance context and missing information.
+
+## Parameters
+
+<PropertiesTable
+  content={[
+    {
+      name: "model",
+      type: "MastraLanguageModel",
+      description: "The language model to use for evaluating context relevance",
+      required: true,
+    },
+    {
+      name: "options",
+      type: "ContextRelevanceOptions",
+      description: "Configuration options for the scorer",
+      required: true,
+      children: [
+        {
+          name: "context",
+          type: "string[]",
+          description: "Array of context pieces to evaluate for relevance",
+          required: false,
+        },
+        {
+          name: "contextExtractor",
+          type: "(input, output) => string[]",
+          description: "Function to dynamically extract context from the run input and output",
+          required: false,
+        },
+        {
+          name: "scale",
+          type: "number",
+          description: "Scale factor to multiply the final score (default: 1)",
+          required: false,
+        },
+        {
+          name: "penalties",
+          type: "object",
+          description: "Configurable penalty settings for scoring",
+          required: false,
+          children: [
+            {
+              name: "unusedHighRelevanceContext",
+              type: "number",
+              description: "Penalty per unused high-relevance context (default: 0.1)",
+              required: false,
+            },
+            {
+              name: "missingContextPerItem",
+              type: "number",
+              description: "Penalty per missing context item (default: 0.15)",
+              required: false,
+            },
+            {
+              name: "maxMissingContextPenalty",
+              type: "number",
+              description: "Maximum total missing context penalty (default: 0.5)",
+              required: false,
+            },
+          ],
+        },
+      ],
+    },
+  ]}
+/>
+
+:::note
+Either `context` or `contextExtractor` must be provided. If both are provided, `contextExtractor` takes precedence.
+:::
+
+## .run() Returns
+
+<PropertiesTable
+  content={[
+    {
+      name: "score",
+      type: "number",
+      description: "Weighted relevance score between 0 and scale (default 0-1)",
+    },
+    {
+      name: "reason",
+      type: "string",
+      description: "Human-readable explanation of the context relevance evaluation",
+    },
+  ]}
+/>
+
+## Scoring Details
+
+### Weighted Relevance Scoring
+
+Context Relevance uses a sophisticated scoring algorithm that considers:
+
+1. **Relevance Levels**: Each context piece is classified with weighted values:
+   - `high` = 1.0 (directly addresses the query)
+   - `medium` = 0.7 (supporting information)
+   - `low` = 0.3 (tangentially related)
+   - `none` = 0.0 (completely irrelevant)
+
+2. **Usage Detection**: Tracks whether relevant context was actually used in the response
+
+3. **Penalties Applied** (configurable via `penalties` options):
+   - **Unused High-Relevance**: `unusedHighRelevanceContext` penalty per unused high-relevance context (default: 0.1)
+   - **Missing Context**: Up to `maxMissingContextPenalty` for identified missing information (default: 0.5)
+
+### Scoring Formula
+
+```
+Base Score = Σ(relevance_weights) / (num_contexts × 1.0)
+Usage Penalty = count(unused_high_relevance) × unusedHighRelevanceContext
+Missing Penalty = min(count(missing_context) × missingContextPerItem, maxMissingContextPenalty)
+
+Final Score = max(0, Base Score - Usage Penalty - Missing Penalty) × scale
+```
+
+**Default Values**:
+- `unusedHighRelevanceContext` = 0.1 (10% penalty per unused high-relevance context)
+- `missingContextPerItem` = 0.15 (15% penalty per missing context item)
+- `maxMissingContextPenalty` = 0.5 (maximum 50% penalty for missing context)
+- `scale` = 1
+
+### Score Interpretation
+
+- **0.9-1.0** = Excellent relevance with minimal gaps
+- **0.7-0.8** = Good relevance with some unused or missing context
+- **0.4-0.6** = Mixed relevance with significant gaps
+- **0.0-0.3** = Poor relevance or mostly irrelevant context
+
+### Difference from Context Precision
+
+| Aspect | Context Relevance | Context Precision |
+|--------|-------------------|-------------------|
+| **Algorithm** | Weighted levels with penalties | Mean Average Precision (MAP) |
+| **Relevance** | Multiple levels (high/medium/low/none) | Binary (yes/no) |
+| **Position** | Not considered | Critical (rewards early placement) |
+| **Usage** | Tracks and penalizes unused context | Not considered |
+| **Missing** | Identifies and penalizes gaps | Not evaluated |
+
+## Usage Examples
+
+### Basic Configuration
+
+```typescript
+const scorer = createContextRelevanceScorerLLM({
+  model: openai('gpt-4o'),
+  options: {
+    context: ['Einstein won the Nobel Prize for his work on the photoelectric effect'],
+    scale: 1,
+  },
+});
+```
+
+### Custom Penalty Configuration
+
+```typescript
+const scorer = createContextRelevanceScorerLLM({
+  model: openai('gpt-4o'),
+  options: {
+    context: ['Context information...'],
+    penalties: {
+      unusedHighRelevanceContext: 0.05, // Lower penalty for unused context
+      missingContextPerItem: 0.2, // Higher penalty per missing item
+      maxMissingContextPenalty: 0.4, // Lower maximum penalty cap
+    },
+    scale: 2, // Double the final score
+  },
+});
+```
+
+### Dynamic Context Extraction
+
+```typescript
+const scorer = createContextRelevanceScorerLLM({
+  model: openai('gpt-4o'),
+  options: {
+    contextExtractor: (input, output) => {
+      // Extract context based on the query
+      const userQuery = input?.inputMessages?.[0]?.content || '';
+      if (userQuery.includes('Einstein')) {
+        return [
+          'Einstein won the Nobel Prize for the photoelectric effect',
+          'He developed the theory of relativity'
+        ];
+      }
+      return ['General physics information'];
+    },
+    penalties: {
+      unusedHighRelevanceContext: 0.15,
+    },
+  },
+});
+```
+
+## Usage Patterns
+
+### Content Generation Evaluation
+Best for evaluating context quality in:
+- Chat systems where context usage matters
+- RAG pipelines needing nuanced relevance assessment
+- Systems where missing context affects quality
+
+### Context Selection Optimization
+Use when optimizing for:
+- Comprehensive context coverage
+- Effective context utilization
+- Identifying context gaps
+
+## Related
+
+- [Context Precision Scorer](/reference/scorers/context-precision) - Evaluates context ranking using MAP
+- [Faithfulness Scorer](/reference/scorers/faithfulness) - Measures answer groundedness in context
+- [Custom Scorers](/docs/scorers/custom-scorers) - Creating your own evaluation metrics
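As with the precision page, the documented formula and defaults can be sanity-checked directly. The following is a hypothetical TypeScript illustration of the math, not the package's implementation:

```typescript
// Sketch of the documented weighted-relevance formula (hypothetical, not part of the package).
type Relevance = 'high' | 'medium' | 'low' | 'none';
const WEIGHTS: Record<Relevance, number> = { high: 1.0, medium: 0.7, low: 0.3, none: 0.0 };

function contextRelevanceScore(
  contexts: { level: Relevance; used: boolean }[],
  missingContextCount: number,
  penalties = { unusedHighRelevanceContext: 0.1, missingContextPerItem: 0.15, maxMissingContextPenalty: 0.5 },
  scale = 1,
): number {
  // Base Score = Σ(relevance_weights) / (num_contexts × 1.0)
  const base = contexts.reduce((sum, c) => sum + WEIGHTS[c.level], 0) / contexts.length;
  // Usage Penalty = count(unused_high_relevance) × unusedHighRelevanceContext
  const usagePenalty =
    contexts.filter((c) => c.level === 'high' && !c.used).length * penalties.unusedHighRelevanceContext;
  // Missing Penalty = min(count(missing) × missingContextPerItem, maxMissingContextPenalty)
  const missingPenalty = Math.min(
    missingContextCount * penalties.missingContextPerItem,
    penalties.maxMissingContextPenalty,
  );
  return Math.max(0, base - usagePenalty - missingPenalty) * scale;
}

// Two high-relevance pieces, one unused, one missing item:
// base = 1.0; penalties = 0.1 + 0.15 → final score 0.75
contextRelevanceScore([{ level: 'high', used: true }, { level: 'high', used: false }], 1);
```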
package/.docs/raw/reference/tools/graph-rag-tool.mdx
@@ -98,6 +98,13 @@ const graphTool = createGraphRAGTool({
       isOptional: true,
       defaultValue: "Default graph options",
     },
+    {
+      name: "providerOptions",
+      type: "Record<string, Record<string, any>>",
+      description:
+        "Provider-specific options for the embedding model (e.g., outputDimensionality). **Important**: Only works with AI SDK EmbeddingModelV2 models. For V1 models, configure options when creating the model itself.",
+      isOptional: true,
+    },
   ]}
 />
 
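The new `providerOptions` row documents the parameter only; a sketch of how it might be passed to `createGraphRAGTool` follows. The store name, index name, and Google embedding model are illustrative assumptions, and per the description the options only take effect with AI SDK EmbeddingModelV2 models:

```typescript
import { createGraphRAGTool } from '@mastra/rag';
import { google } from '@ai-sdk/google';

const graphTool = createGraphRAGTool({
  vectorStoreName: 'pgVector', // illustrative store and index names
  indexName: 'embeddings',
  model: google.textEmbedding('text-embedding-004'), // assumed V2 embedding model
  providerOptions: {
    google: { outputDimensionality: 768 }, // provider-specific embedding options
  },
});
```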
package/.docs/raw/reference/tools/vector-query-tool.mdx
@@ -108,6 +108,13 @@ const queryTool = createVectorQueryTool({
         "Database-specific configuration options for optimizing queries. (Can be set at creation or overridden at runtime.)",
       isOptional: true,
     },
+    {
+      name: "providerOptions",
+      type: "Record<string, Record<string, any>>",
+      description:
+        "Provider-specific options for the embedding model (e.g., outputDimensionality). **Important**: Only works with AI SDK EmbeddingModelV2 models. For V1 models, configure options when creating the model itself.",
+      isOptional: true,
+    },
   ]}
 />
 
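The same option lands on `createVectorQueryTool`; under the same assumptions, a sketch looks like this. Note that a reduced `outputDimensionality` must match the dimension the index was created with:

```typescript
import { createVectorQueryTool } from '@mastra/rag';
import { google } from '@ai-sdk/google';

const queryTool = createVectorQueryTool({
  vectorStoreName: 'pgVector', // illustrative store and index names
  indexName: 'embeddings',
  model: google.textEmbedding('text-embedding-004'), // assumed V2 embedding model
  providerOptions: {
    google: { outputDimensionality: 768 }, // must match the indexed vector dimension
  },
});
```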
package/.docs/raw/scorers/off-the-shelf-scorers.mdx
@@ -21,6 +21,22 @@ These scorers evaluate how correct, truthful, and complete your agent's answers
 - [`textual-difference`](/reference/scorers/textual-difference): Measures textual differences between strings (`0-1`, higher means more similar)
 - [`tool-call-accuracy`](/reference/scorers/tool-call-accuracy): Evaluates whether the LLM selects the correct tool from available options (`0-1`, higher is better)
 
+### Context Quality
+
+These scorers evaluate the quality and relevance of context used in generating responses:
+
+- [`context-precision`](/reference/scorers/context-precision): Evaluates context relevance and ranking using Mean Average Precision, rewarding early placement of relevant context (`0-1`, higher is better)
+- [`context-relevance`](/reference/scorers/context-relevance): Measures context utility with nuanced relevance levels, usage tracking, and missing context detection (`0-1`, higher is better)
+
+:::tip Context Scorer Selection
+- Use **Context Precision** when context ordering matters and you need standard IR metrics (ideal for RAG ranking evaluation)
+- Use **Context Relevance** when you need detailed relevance assessment and want to track context usage and identify gaps
+
+Both context scorers support:
+- **Static context**: Pre-defined context arrays
+- **Dynamic context extraction**: Extract context from runs using custom functions (ideal for RAG systems, vector databases, etc.)
+:::
+
 ### Output Quality
 
 These scorers evaluate adherence to format, style, and safety requirements:
@@ -28,4 +44,4 @@ These scorers evaluate adherence to format, style, and safety requirements:
 - [`tone-consistency`](/reference/scorers/tone-consistency): Measures consistency in formality, complexity, and style (`0-1`, higher is better)
 - [`toxicity`](/reference/scorers/toxicity): Detects harmful or inappropriate content (`0-1`, lower is better)
 - [`bias`](/reference/scorers/bias): Detects potential biases in the output (`0-1`, lower is better)
-- [`keyword-coverage`](/reference/scorers/keyword-coverage): Assesses technical terminology usage (`0-1`, higher is better)
+- [`keyword-coverage`](/reference/scorers/keyword-coverage): Assesses technical terminology usage (`0-1`, higher is better)
package/.docs/raw/server-db/local-dev-playground.mdx
@@ -116,7 +116,7 @@ Key features:
 
 ## REST API Endpoints
 
-The local development server exposes a set of REST API routes via the [Mastra Server](/docs/deployment/server), allowing you to test and interact with your agents and workflows before deployment.
+The local development server exposes a set of REST API routes via the [Mastra Server](/docs/deployment/server-deployment), allowing you to test and interact with your agents and workflows before deployment.
 
 For a full overview of available API routes, including agents, tools, and workflows, see the [Routes reference](/reference/cli/dev#routes).
 
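Once `mastra dev` is running, the routes behind the corrected link can be exercised directly. A quick sketch, assuming the default dev-server port (4111) and the agents route listed in the Routes reference:

```typescript
// Assumes the default `mastra dev` address; adjust host/port if configured differently.
const res = await fetch('http://localhost:4111/api/agents');
console.log(await res.json()); // registered agents exposed by the local server
```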
package/.docs/raw/server-db/local-dev-playground.mdx
@@ -184,6 +184,25 @@ export const mastra = new Mastra({
 });
 ```
 
+## Bundler options
+
+Use `transpilePackages` to compile TypeScript packages or libraries. Use `externals` to exclude dependencies resolved at runtime, and `sourcemap` to emit readable stack traces.
+
+```typescript filename="src/mastra/index.ts" showLineNumbers copy
+import { Mastra } from "@mastra/core/mastra";
+
+export const mastra = new Mastra({
+  // ...
+  bundler: {
+    transpilePackages: ["utils"],
+    externals: ["ui"],
+    sourcemap: true
+  }
+});
+```
+
+> See [Mastra Class](../../reference/core/mastra-class.mdx) for more configuration options.
+
 
 ## Next steps
 
package/.docs/raw/workflows/control-flow.mdx
@@ -190,52 +190,6 @@ export const testWorkflow = createWorkflow({...})
   .commit();
 ```
 
-## Exiting early with `bail()`
-
-Use `bail()` in a step to exit early with a successful result. This returns the provided payload as the step output and ends workflow execution.
-
-```typescript {7} filename="src/mastra/workflows/test-workflow.ts" showLineNumbers copy
-import { createWorkflow, createStep } from "@mastra/core/workflows";
-import { z } from "zod";
-
-const step1 = createStep({
-  id: 'step1',
-  execute: async ({ bail }) => {
-    return bail({ result: 'bailed' });
-  },
-  inputSchema: z.object({ value: z.string() }),
-  outputSchema: z.object({ result: z.string() }),
-});
-
-export const testWorkflow = createWorkflow({...})
-  .then(step1)
-  .commit();
-```
-
-## Exiting early with `Error()`
-
-Use `throw new Error()` in a step to exit with an error.
-
-```typescript {7} filename="src/mastra/workflows/test-workflow.ts" showLineNumbers copy
-import { createWorkflow, createStep } from "@mastra/core/workflows";
-import { z } from "zod";
-
-const step1 = createStep({
-  id: 'step1',
-  execute: async () => {
-    throw new Error('bailed');
-  },
-  inputSchema: z.object({ value: z.string() }),
-  outputSchema: z.object({ result: z.string() }),
-});
-
-export const testWorkflow = createWorkflow({...})
-  .then(step1)
-  .commit();
-```
-
-This throws an error from the step and stops workflow execution, returning the error as the result.
-
 ## Example Run Instance
 
 The following example demonstrates how to start a run with multiple inputs. Each input will pass through the `mapStep` sequentially.
|