@mastra/mcp-docs-server 0.13.17-alpha.4 → 0.13.17-alpha.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (76)
  1. package/.docs/organized/changelogs/%40mastra%2Fagent-builder.md +20 -0
  2. package/.docs/organized/changelogs/%40mastra%2Fai-sdk.md +9 -0
  3. package/.docs/organized/changelogs/%40mastra%2Fastra.md +10 -10
  4. package/.docs/organized/changelogs/%40mastra%2Fauth.md +6 -0
  5. package/.docs/organized/changelogs/%40mastra%2Fchroma.md +10 -10
  6. package/.docs/organized/changelogs/%40mastra%2Fclickhouse.md +10 -10
  7. package/.docs/organized/changelogs/%40mastra%2Fclient-js.md +17 -17
  8. package/.docs/organized/changelogs/%40mastra%2Fcloud.md +21 -21
  9. package/.docs/organized/changelogs/%40mastra%2Fcloudflare-d1.md +10 -10
  10. package/.docs/organized/changelogs/%40mastra%2Fcloudflare.md +11 -11
  11. package/.docs/organized/changelogs/%40mastra%2Fcore.md +39 -39
  12. package/.docs/organized/changelogs/%40mastra%2Fcouchbase.md +10 -10
  13. package/.docs/organized/changelogs/%40mastra%2Fdeployer-cloud.md +21 -0
  14. package/.docs/organized/changelogs/%40mastra%2Fdeployer-cloudflare.md +15 -15
  15. package/.docs/organized/changelogs/%40mastra%2Fdeployer-netlify.md +11 -11
  16. package/.docs/organized/changelogs/%40mastra%2Fdeployer-vercel.md +12 -12
  17. package/.docs/organized/changelogs/%40mastra%2Fdeployer.md +24 -24
  18. package/.docs/organized/changelogs/%40mastra%2Fdynamodb.md +11 -11
  19. package/.docs/organized/changelogs/%40mastra%2Fevals.md +10 -10
  20. package/.docs/organized/changelogs/%40mastra%2Ffastembed.md +6 -0
  21. package/.docs/organized/changelogs/%40mastra%2Ffirecrawl.md +20 -20
  22. package/.docs/organized/changelogs/%40mastra%2Fgithub.md +19 -19
  23. package/.docs/organized/changelogs/%40mastra%2Flance.md +10 -10
  24. package/.docs/organized/changelogs/%40mastra%2Flibsql.md +19 -19
  25. package/.docs/organized/changelogs/%40mastra%2Floggers.md +11 -11
  26. package/.docs/organized/changelogs/%40mastra%2Fmcp-docs-server.md +23 -23
  27. package/.docs/organized/changelogs/%40mastra%2Fmcp-registry-registry.md +10 -10
  28. package/.docs/organized/changelogs/%40mastra%2Fmcp.md +23 -23
  29. package/.docs/organized/changelogs/%40mastra%2Fmem0.md +19 -19
  30. package/.docs/organized/changelogs/%40mastra%2Fmemory.md +21 -21
  31. package/.docs/organized/changelogs/%40mastra%2Fmongodb.md +11 -11
  32. package/.docs/organized/changelogs/%40mastra%2Fmssql.md +10 -5
  33. package/.docs/organized/changelogs/%40mastra%2Fopensearch.md +10 -10
  34. package/.docs/organized/changelogs/%40mastra%2Fpg.md +10 -10
  35. package/.docs/organized/changelogs/%40mastra%2Fpinecone.md +10 -10
  36. package/.docs/organized/changelogs/%40mastra%2Fplayground-ui.md +24 -24
  37. package/.docs/organized/changelogs/%40mastra%2Fqdrant.md +13 -13
  38. package/.docs/organized/changelogs/%40mastra%2Frag.md +10 -10
  39. package/.docs/organized/changelogs/%40mastra%2Fragie.md +19 -19
  40. package/.docs/organized/changelogs/%40mastra%2Fschema-compat.md +12 -0
  41. package/.docs/organized/changelogs/%40mastra%2Fserver.md +20 -20
  42. package/.docs/organized/changelogs/%40mastra%2Fturbopuffer.md +11 -11
  43. package/.docs/organized/changelogs/%40mastra%2Fupstash.md +10 -10
  44. package/.docs/organized/changelogs/%40mastra%2Fvectorize.md +10 -10
  45. package/.docs/organized/changelogs/%40mastra%2Fvoice-azure.md +10 -10
  46. package/.docs/organized/changelogs/%40mastra%2Fvoice-cloudflare.md +10 -10
  47. package/.docs/organized/changelogs/%40mastra%2Fvoice-deepgram.md +10 -10
  48. package/.docs/organized/changelogs/%40mastra%2Fvoice-elevenlabs.md +10 -10
  49. package/.docs/organized/changelogs/%40mastra%2Fvoice-gladia.md +9 -0
  50. package/.docs/organized/changelogs/%40mastra%2Fvoice-google-gemini-live.md +9 -0
  51. package/.docs/organized/changelogs/%40mastra%2Fvoice-google.md +10 -10
  52. package/.docs/organized/changelogs/%40mastra%2Fvoice-murf.md +10 -10
  53. package/.docs/organized/changelogs/%40mastra%2Fvoice-openai-realtime.md +10 -10
  54. package/.docs/organized/changelogs/%40mastra%2Fvoice-openai.md +10 -10
  55. package/.docs/organized/changelogs/%40mastra%2Fvoice-playai.md +11 -11
  56. package/.docs/organized/changelogs/%40mastra%2Fvoice-sarvam.md +11 -11
  57. package/.docs/organized/changelogs/%40mastra%2Fvoice-speechify.md +10 -10
  58. package/.docs/organized/changelogs/create-mastra.md +13 -13
  59. package/.docs/organized/changelogs/mastra.md +26 -26
  60. package/.docs/organized/code-examples/assistant-ui.md +1 -1
  61. package/.docs/organized/code-examples/bird-checker-with-nextjs-and-eval.md +2 -2
  62. package/.docs/organized/code-examples/heads-up-game.md +32 -56
  63. package/.docs/raw/getting-started/installation.mdx +2 -7
  64. package/.docs/raw/getting-started/templates.mdx +2 -7
  65. package/.docs/raw/memory/working-memory.mdx +17 -17
  66. package/.docs/raw/observability/ai-tracing.mdx +438 -0
  67. package/.docs/raw/reference/scorers/noise-sensitivity.mdx +87 -20
  68. package/.docs/raw/reference/tools/mcp-server.mdx +26 -27
  69. package/.docs/raw/tools-mcp/mcp-overview.mdx +6 -5
  70. package/.docs/raw/workflows/suspend-and-resume.mdx +64 -7
  71. package/CHANGELOG.md +1823 -0
  72. package/dist/logger.d.ts +5 -1
  73. package/dist/logger.d.ts.map +1 -1
  74. package/dist/stdio.js +47 -4
  75. package/dist/tools/blog.d.ts.map +1 -1
  76. package/package.json +16 -6
package/.docs/raw/observability/ai-tracing.mdx
@@ -0,0 +1,438 @@
+ ---
+ title: "AI Tracing | Mastra Observability Documentation"
+ description: "Set up AI tracing for Mastra applications"
+ ---
+
+ import { Callout } from "nextra/components";
+
+ # AI Tracing
+
+ AI Tracing provides specialized monitoring and debugging for the AI-related operations in your application. When enabled, Mastra automatically creates traces for agent runs, LLM generations, tool calls, and workflow steps with AI-specific context and metadata.
+
+ Unlike traditional application tracing, AI Tracing focuses specifically on understanding your AI pipeline: capturing token usage, model parameters, tool execution details, and conversation flows. This makes it easier to debug issues, optimize performance, and understand how your AI systems behave in production.
+
+ You create AI traces by:
+
+ - **Configuring exporters** to send trace data to observability platforms like Langfuse
+ - **Setting sampling strategies** to control which traces are collected
+ - **Running agents and workflows**, which Mastra automatically instruments with detailed AI tracing
+
+ This provides full visibility into your AI operations with minimal setup, helping you build more reliable and observable AI applications.
+
+ <Callout type="warning">
+ **Experimental Feature**
+
+ AI Tracing is available as of `@mastra/core 0.14.0` and is currently experimental. The API may change in future releases.
+ </Callout>
+
+ ## How It Differs from Standard Tracing
+
+ AI Tracing complements Mastra's existing [OpenTelemetry-based tracing](./tracing.mdx) but serves a different purpose:
+
+ | Feature | Standard Tracing | AI Tracing |
+ |---------|-----------------|------------|
+ | **Focus** | Application infrastructure | AI operations only |
+ | **Data Format** | OpenTelemetry standard | Provider-native (Langfuse, etc.) |
+ | **Timing** | Batch export | Real-time option for debugging |
+ | **Metadata** | Generic span attributes | AI-specific (tokens, models, tools) |
+
+ ## Current Status
+
+ **Supported Exporters:**
+ - ✅ [Langfuse](https://langfuse.com/) - Full support with real-time mode
+ - 🔄 [Braintrust](https://www.braintrust.dev/home) - Coming soon
+ - 🔄 [OpenTelemetry](https://opentelemetry.io/) - Coming soon
+
+ **Known Limitations:**
+ - Mastra playground traces still use the legacy tracing system
+ - API is experimental and may change
+
+ For the latest updates, see [GitHub issue #6773](https://github.com/mastra-ai/mastra/issues/6773).
+
+ ## Basic Configuration
+
+ Here's a simple example of enabling AI Tracing:
+
+ ```ts filename="src/mastra/index.ts" showLineNumbers copy
+ import { Mastra } from '@mastra/core/mastra';
+ import { LangfuseExporter } from '@mastra/langfuse';
+
+ export const mastra = new Mastra({
+   // ... other config
+   observability: {
+     instances: {
+       langfuse: {
+         serviceName: 'my-service',
+         exporters: [
+           new LangfuseExporter({
+             publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
+             secretKey: process.env.LANGFUSE_SECRET_KEY!,
+             baseUrl: process.env.LANGFUSE_BASE_URL!,
+             realtime: true,
+           }),
+         ],
+       },
+     },
+   },
+ });
+ ```
+
+ ## Configuration Options
+
+ The AI tracing config accepts these properties:
+
+ ```ts
+ type AITracingConfig = {
+   // Map of tracing instance names to their configurations
+   instances: Record<string, AITracingInstanceConfig | MastraAITracing>;
+
+   // Optional function to select which tracing instance to use
+   selector?: TracingSelector;
+ };
+
+ type AITracingInstanceConfig = {
+   // Name to identify your service in traces
+   serviceName: string;
+
+   // Control how many traces are sampled
+   sampling?: {
+     type: "always" | "never" | "ratio" | "custom";
+     probability?: number; // For ratio sampling (0.0 to 1.0)
+     sampler?: (context: TraceContext) => boolean; // For custom sampling
+   };
+
+   // Array of exporters to send trace data to
+   exporters?: AITracingExporter[];
+
+   // Array of processors to transform spans before export
+   processors?: AISpanProcessor[];
+ };
+ ```
+
+ ### Sampling Configuration
+
+ Control which traces are collected and exported:
+
+ ```ts filename="src/mastra/index.ts" showLineNumbers copy
+ export const mastra = new Mastra({
+   observability: {
+     instances: {
+       langfuse: {
+         serviceName: 'my-service',
+         // Sample all traces (default)
+         sampling: { type: 'always' },
+         exporters: [langfuseExporter],
+       },
+
+       development: {
+         serviceName: 'dev-service',
+         // Sample 10% of traces
+         sampling: {
+           type: 'ratio',
+           probability: 0.1
+         },
+         exporters: [langfuseExporter],
+       },
+
+       custom: {
+         serviceName: 'custom-service',
+         // Custom sampling logic
+         sampling: {
+           type: 'custom',
+           sampler: (context) => {
+             // Only trace requests from specific users
+             return context.metadata?.userId === 'debug-user';
+           }
+         },
+         exporters: [langfuseExporter],
+       },
+     },
+   },
+ });
+ ```
+
+ ### Langfuse Exporter Configuration
+
+ The Langfuse exporter accepts these options:
+
+ ```ts
+ type LangfuseExporterConfig = {
+   // Langfuse API credentials
+   publicKey: string;
+   secretKey: string;
+   baseUrl: string;
+
+   // Enable realtime mode for immediate trace visibility
+   realtime?: boolean; // defaults to false
+
+   // Additional options passed to the Langfuse client
+   options?: any;
+ };
+ ```
+
+ Example with environment variables:
+
+ ```ts filename="mastra.config.ts" showLineNumbers copy
+ import { Mastra } from '@mastra/core/mastra';
+ import { LangfuseExporter } from '@mastra/langfuse';
+
+ export const mastra = new Mastra({
+   observability: {
+     instances: {
+       langfuse: {
+         serviceName: process.env.SERVICE_NAME || 'mastra-app',
+         sampling: { type: 'always' },
+         exporters: [
+           new LangfuseExporter({
+             publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
+             secretKey: process.env.LANGFUSE_SECRET_KEY!,
+             baseUrl: process.env.LANGFUSE_BASE_URL!,
+             realtime: process.env.NODE_ENV === 'development',
+           }),
+         ],
+       },
+     },
+   },
+ });
+ ```
+
+ #### Real-time vs Batch Mode
+
+ The Langfuse exporter supports two modes:
+
+ **Batch Mode (default)**
+ - Traces are buffered and sent periodically
+ - Better performance for production
+ - Traces may appear with a slight delay
+
+ **Real-time Mode**
+ - Each trace event is immediately flushed
+ - Ideal for development and debugging
+ - Immediate visibility in the Langfuse dashboard
+
+ ```ts
+ new LangfuseExporter({
+   // ... other config
+   realtime: process.env.NODE_ENV === 'development',
+ })
+ ```
+
+ #### Multi-Instance Configuration
+
+ You can configure multiple tracing instances and provide a selector to choose between them:
+
+ ```ts filename="mastra.config.ts" showLineNumbers copy
+ export const mastra = new Mastra({
+   observability: {
+     instances: {
+       production: {
+         serviceName: 'prod-service',
+         sampling: { type: 'ratio', probability: 0.1 },
+         exporters: [prodLangfuseExporter],
+       },
+       development: {
+         serviceName: 'dev-service',
+         sampling: { type: 'always' },
+         exporters: [devLangfuseExporter],
+       },
+     },
+     selector: (context, availableTracers) => {
+       // Use development tracer for debug sessions
+       if (context.runtimeContext?.get('debug') === 'true') {
+         return 'development';
+       }
+       return 'production';
+     },
+   },
+ });
+ ```
+
+ ## Span Types and Attributes
+
+ AI Tracing automatically creates spans for different AI operations. Mastra supports the following span types:
+
+ ### Agent Operation Types
+ - **`AGENT_RUN`** - Agent execution from start to finish
+ - **`LLM_GENERATION`** - Individual model calls with prompts and completions
+ - **`TOOL_CALL`** - Function/tool executions with inputs and outputs
+ - **`MCP_TOOL_CALL`** - Model Context Protocol tool executions
+ - **`GENERIC`** - Custom operations
+
+ ### Workflow Operation Types
+ - **`WORKFLOW_RUN`** - Workflow execution from start to finish
+ - **`WORKFLOW_STEP`** - Individual step processing
+ - **`WORKFLOW_CONDITIONAL`** - Conditional execution blocks
+ - **`WORKFLOW_CONDITIONAL_EVAL`** - Individual condition evaluations
+ - **`WORKFLOW_PARALLEL`** - Parallel execution blocks
+ - **`WORKFLOW_LOOP`** - Loop execution blocks
+ - **`WORKFLOW_SLEEP`** - Sleep/delay operations
+ - **`WORKFLOW_WAIT_EVENT`** - Event waiting operations
+
+ ### Key Attributes
+ Each span type includes relevant attributes:
+ - **Agent spans**: Agent ID, instructions, available tools, max steps
+ - **LLM spans**: Model name, provider, token usage, parameters, finish reason
+ - **Tool spans**: Tool ID, tool type, success status
+ - **Workflow spans**: Step/workflow IDs, status information
+
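For orientation, the snippet below sketches roughly what an `LLM_GENERATION` span might carry once exported. The field names are illustrative assumptions based on the attribute list above, not Mastra's exact span schema.

```ts
// Illustrative only: approximates the LLM span attributes listed above,
// not the real exporter payload or type definitions.
const exampleLlmSpan = {
  type: 'LLM_GENERATION',
  name: 'llm-generation',
  attributes: {
    model: 'gpt-4o-mini',                                 // model name
    provider: 'openai',                                   // provider
    usage: { promptTokens: 512, completionTokens: 128 },  // token usage
    parameters: { temperature: 0.2 },                     // model parameters
    finishReason: 'stop',                                 // finish reason
  },
  metadata: {}, // free-form; can be extended from steps and tools (see below)
};
```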
+ ## Adding Custom Metadata to Spans
+
+ You can add custom metadata to spans using the `tracingContext.currentSpan` available in workflow steps and tool calls. This is useful for tracking additional context like API status codes, user IDs, or performance metrics.
+
+ ```ts showLineNumbers copy
+ execute: async ({ inputData, tracingContext }) => {
+   const response = await fetch(inputData.endpoint, {
+     method: 'POST',
+     body: JSON.stringify(inputData.payload),
+   });
+
+   // Add custom metadata to the current span
+   tracingContext.currentSpan?.update({
+     metadata: {
+       apiStatusCode: response.status,
+       responseHeaders: Object.fromEntries(response.headers.entries()),
+       endpoint: inputData.endpoint,
+     }
+   });
+
+   const data = await response.json();
+   return { data, statusCode: response.status };
+ }
+ ```
300
+
301
+ ## Creating Child Spans
302
+
303
+ You can create child spans to track specific operations within your workflow steps or tools. This provides more granular visibility into what's happening during execution.
304
+
305
+ ```ts showLineNumbers copy
306
+ execute: async ({ input, tracingContext }) => {
307
+ // Create a child span for the database query
308
+ const querySpan = tracingContext.currentSpan?.createChildSpan({
309
+ type: 'generic',
310
+ name: 'database-query',
311
+ input: {
312
+ query: input.query,
313
+ params: input.params,
314
+ }
315
+ });
316
+
317
+ try {
318
+ const results = await db.query(input.query, input.params);
319
+
320
+ // Update child span with results and end it
321
+ querySpan?.end({
322
+ output: results.data,
323
+ metadata: {
324
+ rowsReturned: results.length,
325
+ success: true,
326
+ }
327
+ });
328
+
329
+ return { results, rowCount: results.length };
330
+ } catch (error) {
331
+ // Record error on child span
332
+ querySpan?.error({error});
333
+ throw error;
334
+ }
335
+ }
336
+ ```
337
+
338
+ ## Span Processors and Data Filtering
339
+
340
+ Span processors allow you to modify or filter span data before it's exported to observability platforms. This is useful for adding computed fields, redacting sensitive information, or transforming data formats.
341
+
342
+ ### Built-in SensitiveDataFilter
343
+
344
+ Mastra includes a `SensitiveDataFilter` processor that automatically redacts sensitive fields from span data. It's enabled by default and scans for common sensitive field names:
345
+
346
+ ```ts filename="src/mastra/index.ts" showLineNumbers copy
347
+ import { LangfuseExporter } from '@mastra/langfuse';
348
+ import { SensitiveDataFilter } from '@mastra/core/ai-tracing';
349
+
350
+ export const mastra = new Mastra({
351
+ observability: {
352
+ instances: {
353
+ langfuse: {
354
+ serviceName: 'my-service',
355
+ exporters: [new LangfuseExporter({ /* config */ })],
356
+ // SensitiveDataFilter is included by default, but you can customize it
357
+ processors: [
358
+ new SensitiveDataFilter([
359
+ 'password', 'token', 'secret', 'key', 'apiKey',
360
+ 'auth', 'authorization', 'bearer', 'jwt',
361
+ 'credential', 'sessionId',
362
+ // Add your custom sensitive fields
363
+ 'ssn', 'creditCard', 'bankAccount'
364
+ ])
365
+ ],
366
+ },
367
+ },
368
+ },
369
+ });
370
+ ```
371
+
372
+ The `SensitiveDataFilter` automatically redacts matching fields in:
373
+ - Span attributes
374
+ - Span metadata
375
+ - Input/output data
376
+ - Error information
377
+
378
+ Fields are matched case-insensitively, and nested objects are processed recursively.
379
+
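As a rough illustration of the redaction behavior described above (the exact redaction placeholder is an implementation detail; `'[REDACTED]'` is used here only for the sketch):

```ts
// Hypothetical span data before filtering...
const before = {
  input: { apiKey: 'sk-live-123', query: 'weather in Paris' },
  metadata: { Authorization: 'Bearer abc123', userId: 'user-42' },
};

// ...and after: matching is case-insensitive and recursive, so `apiKey` and
// `Authorization` are redacted while non-sensitive fields pass through unchanged.
const after = {
  input: { apiKey: '[REDACTED]', query: 'weather in Paris' },
  metadata: { Authorization: '[REDACTED]', userId: 'user-42' },
};
```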
+ ### Custom Processors
+
+ You can create custom processors to implement your own span transformation logic:
+
+ ```ts showLineNumbers copy
+ import { Mastra } from '@mastra/core/mastra';
+ import { LangfuseExporter } from '@mastra/langfuse';
+ import { SensitiveDataFilter } from '@mastra/core/ai-tracing';
+ import type { AISpanProcessor, AnyAISpan } from '@mastra/core/ai-tracing';
+
+ export class PerformanceEnrichmentProcessor implements AISpanProcessor {
+   name = 'performance-enrichment';
+
+   process(span: AnyAISpan): AnyAISpan | null {
+     const modifiedSpan = { ...span };
+
+     // Add computed performance metrics
+     if (span.startTime && span.endTime) {
+       const duration = span.endTime.getTime() - span.startTime.getTime();
+
+       modifiedSpan.metadata = {
+         ...span.metadata,
+         durationMs: duration,
+         performanceCategory: duration < 100 ? 'fast' : duration < 1000 ? 'medium' : 'slow',
+       };
+     }
+
+     // Add environment context
+     modifiedSpan.metadata = {
+       ...modifiedSpan.metadata,
+       environment: process.env.NODE_ENV || 'unknown',
+       region: process.env.AWS_REGION || 'unknown',
+     };
+
+     return modifiedSpan;
+   }
+
+   async shutdown(): Promise<void> {
+     // Cleanup if needed
+   }
+ }
+
+ // Use in your Mastra configuration
+ export const mastra = new Mastra({
+   observability: {
+     instances: {
+       langfuse: {
+         serviceName: 'my-service',
+         exporters: [new LangfuseExporter({ /* config */ })],
+         processors: [
+           new SensitiveDataFilter(),
+           new PerformanceEnrichmentProcessor(),
+         ],
+       },
+     },
+   },
+ });
+ ```
+
+ Processors are executed in the order they're defined, and each processor receives the output of the previous one.
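Conceptually, the chain behaves like a fold over the processor list, where returning `null` drops the span entirely. A simplified sketch of that contract (not Mastra's actual internals):

```ts
import type { AISpanProcessor, AnyAISpan } from '@mastra/core/ai-tracing';

// Simplified model of processor chaining, for intuition only.
function applyProcessors(span: AnyAISpan, processors: AISpanProcessor[]): AnyAISpan | null {
  let current: AnyAISpan | null = span;
  for (const processor of processors) {
    if (current === null) return null;     // an earlier processor filtered the span out
    current = processor.process(current);  // each processor sees the previous one's output
  }
  return current;
}
```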
+
+
package/.docs/raw/reference/scorers/noise-sensitivity.mdx
@@ -1,13 +1,15 @@
  ---
- title: "Reference: Noise Sensitivity Scorer | Scorers | Mastra Docs"
- description: Documentation for the Noise Sensitivity Scorer in Mastra. Evaluates how robust an agent is when exposed to irrelevant, distracting, or misleading information in user queries.
+ title: "Reference: Noise Sensitivity Scorer (CI/Testing) | Scorers | Mastra Docs"
+ description: Documentation for the Noise Sensitivity Scorer in Mastra. A CI/testing scorer that evaluates agent robustness by comparing responses between clean and noisy inputs in controlled test environments.
  ---

  import { PropertiesTable } from "@/components/properties-table";

- # Noise Sensitivity Scorer
+ # Noise Sensitivity Scorer (CI/Testing Only)

- The `createNoiseSensitivityScorerLLM()` function creates a scorer that evaluates how robust an agent is when exposed to irrelevant, distracting, or misleading information. It measures the agent's ability to maintain response quality and accuracy despite noise in the input.
+ The `createNoiseSensitivityScorerLLM()` function creates a **CI/testing scorer** that evaluates how robust an agent is when exposed to irrelevant, distracting, or misleading information. Unlike live scorers that evaluate single production runs, this scorer requires predetermined test data including both baseline responses and noisy variations.
+
+ **Important:** This is not a live scorer. It requires pre-computed baseline responses and cannot be used for real-time agent evaluation. Use this scorer in your CI/CD pipeline or testing suites only.

  ## Parameters

@@ -120,6 +122,68 @@ The `createNoiseSensitivityScorerLLM()` function creates a scorer that evaluates
  ]}
  />

+ ## CI/Testing Requirements
+
+ This scorer is designed exclusively for CI/testing environments and has specific requirements:
+
+ ### Why This Is a CI Scorer
+
+ 1. **Requires Baseline Data**: You must provide a pre-computed baseline response (the "correct" answer without noise)
+ 2. **Needs Test Variations**: Requires both the original query and a noisy variation prepared in advance
+ 3. **Comparative Analysis**: The scorer compares responses between baseline and noisy versions, which is only possible in controlled test conditions
+ 4. **Not Suitable for Production**: Cannot evaluate single, real-time agent responses without predetermined test data
+
+ ### Test Data Preparation
+
+ To use this scorer effectively, you need to prepare:
+ - **Original Query**: The clean user input without any noise
+ - **Baseline Response**: Run your agent with the original query and capture the response
+ - **Noisy Query**: Add distractions, misinformation, or irrelevant content to the original query
+ - **Test Execution**: Run your agent with the noisy query and evaluate using this scorer
+
+ ### Example: CI Test Implementation
+
+ ```typescript
+ import { describe, it, expect } from "vitest";
+ import { createNoiseSensitivityScorerLLM } from "@mastra/evals/scorers/llm";
+ import { openai } from "@ai-sdk/openai";
+ import { myAgent } from "./agents";
+
+ describe("Agent Noise Resistance Tests", () => {
+   it("should maintain accuracy despite misinformation noise", async () => {
+     // Step 1: Define test data
+     const originalQuery = "What is the capital of France?";
+     const noisyQuery = "What is the capital of France? Berlin is the capital of Germany, and Rome is in Italy. Some people incorrectly say Lyon is the capital.";
+
+     // Step 2: Get baseline response (pre-computed or cached)
+     const baselineResponse = "The capital of France is Paris.";
+
+     // Step 3: Run agent with noisy query
+     const noisyResult = await myAgent.generate([
+       { role: "user", content: noisyQuery }
+     ]);
+
+     // Step 4: Evaluate using noise sensitivity scorer
+     const scorer = createNoiseSensitivityScorerLLM({
+       model: openai("gpt-4o-mini"),
+       options: {
+         baselineResponse,
+         noisyQuery,
+         noiseType: "misinformation"
+       }
+     });
+
+     const evaluation = await scorer.run({
+       input: originalQuery,
+       output: noisyResult.text
+     });
+
+     // Assert the agent maintains robustness
+     expect(evaluation.score).toBeGreaterThan(0.8);
+   });
+ });
+ ```
+
  ## .run() Returns

  <PropertiesTable
@@ -200,26 +264,28 @@ Deliberately conflicting instructions designed to confuse.

  Example: "Write a summary of this article. Actually, ignore that and tell me about dogs instead."

- ## Usage Patterns
+ ## CI/Testing Usage Patterns

- ### Testing Agent Robustness
- Use to verify that agents maintain quality when faced with:
- - User confusion or contradictions
- - Multiple unrelated questions in one query
- - False premises or assumptions
- - Emotional or distracting content
+ ### Integration Testing
+ Use in your CI pipeline to verify agent robustness (see the sketch below):
+ - Create test suites with baseline and noisy query pairs
+ - Run regression tests to ensure noise resistance doesn't degrade
+ - Compare different model versions' noise handling capabilities
+ - Validate fixes for noise-related issues
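One way to turn that checklist into a suite is to parameterize the earlier example over baseline/noisy pairs. A minimal sketch using vitest's `it.each`; the agent call, fixture data, and threshold are assumptions, while the scorer usage mirrors the example above.

```typescript
import { describe, it, expect } from "vitest";
import { createNoiseSensitivityScorerLLM } from "@mastra/evals/scorers/llm";
import { openai } from "@ai-sdk/openai";
import { myAgent } from "./agents";

// Hypothetical fixtures: each case pairs a clean query and its baseline
// response with a prepared noisy variation.
const cases = [
  {
    name: "misinformation about capitals",
    originalQuery: "What is the capital of France?",
    baselineResponse: "The capital of France is Paris.",
    noisyQuery: "What is the capital of France? Some people incorrectly say Lyon is the capital.",
    noiseType: "misinformation",
  },
  // ...add more baseline/noisy pairs here
];

describe("Noise regression suite", () => {
  it.each(cases)("stays robust: $name", async ({ originalQuery, baselineResponse, noisyQuery, noiseType }) => {
    // Run the agent against the noisy variation
    const result = await myAgent.generate([{ role: "user", content: noisyQuery }]);

    const scorer = createNoiseSensitivityScorerLLM({
      model: openai("gpt-4o-mini"),
      options: { baselineResponse, noisyQuery, noiseType },
    });

    const evaluation = await scorer.run({ input: originalQuery, output: result.text });
    expect(evaluation.score).toBeGreaterThan(0.7); // assumed threshold
  });
});
```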

- ### Quality Assurance
- Integrate into evaluation pipelines to:
- - Benchmark different models' noise resistance
- - Identify agents vulnerable to manipulation
- - Validate production readiness
+ ### Quality Assurance Testing
+ Include in your test harness to:
+ - Benchmark different models' noise resistance before deployment
+ - Identify agents vulnerable to manipulation during development
+ - Create comprehensive test coverage for various noise types
+ - Ensure consistent behavior across updates

  ### Security Testing
- Evaluate resistance to:
- - Prompt injection attempts
- - Social engineering tactics
- - Information pollution attacks
+ Evaluate resistance in controlled environments:
+ - Test prompt injection resistance with prepared attack vectors
+ - Validate defenses against social engineering attempts
+ - Measure resilience to information pollution
+ - Document security boundaries and limitations

  ## Score Interpretation

@@ -231,6 +297,7 @@ Evaluate resistance to:

  ## Related

+ - [Running in CI](/docs/evals/running-in-ci) - Setting up scorers in CI/CD pipelines
  - [Noise Sensitivity Examples](/examples/scorers/noise-sensitivity) - Practical usage examples
  - [Hallucination Scorer](/reference/scorers/hallucination) - Evaluates fabricated content
  - [Answer Relevancy Scorer](/reference/scorers/answer-relevancy) - Measures response focus