npm - @mastra/mcp-docs-server - Versions diffs - 1.0.0-beta.0 → 1.0.0-beta.10 - Mend

@mastra/mcp-docs-server 1.0.0-beta.0 → 1.0.0-beta.10

Files changed (496)

package/.docs/raw/reference/evals/create-scorer.mdx CHANGED

@@ -1,5 +1,5 @@
 ---
-title: "Reference: createScorer | Evals | Mastra Docs"
+title: "Reference: createScorer | Evals"
 description: Documentation for creating custom scorers in Mastra, allowing users to define their own evaluation logic using either JavaScript functions or LLM-based prompts.
 ---
@@ -12,8 +12,11 @@ Mastra provides a unified `createScorer` factory that allows you to define custo
 Use the `createScorer` factory to define your scorer with a name, description, and optional judge configuration. Then chain step methods to build your evaluation pipeline. You must provide at least a `generateScore` step.
 ```typescript
+import { createScorer } from "@mastra/core/evals";
 const scorer = createScorer({
-  name: "My Custom Scorer",
+  id: "my-custom-scorer",
+  name: "My Custom Scorer", // Optional, defaults to id
   description: "Evaluates responses based on custom criteria",
   type: "agent", // Optional: for agent evaluation with automatic typing
   judge: {
@@ -39,29 +42,35 @@ const scorer = createScorer({
 <PropertiesTable
   content={[
+    {
+      name: "id",
+      type: "string",
+      isOptional: false,
+      description: "Unique identifier for the scorer. Used as the name if `name` is not provided.",
+    },
     {
       name: "name",
       type: "string",
-      required: true,
-      description: "Name of the scorer.",
+      isOptional: true,
+      description: "Name of the scorer. Defaults to `id` if not provided.",
     },
     {
       name: "description",
       type: "string",
-      required: true,
+      isOptional: false,
       description: "Description of what the scorer does.",
     },
     {
       name: "judge",
       type: "object",
-      required: false,
+      isOptional: true,
       description:
         "Optional judge configuration for LLM-based steps. See Judge Object section below.",
     },
     {
       name: "type",
       type: "string",
-      required: false,
+      isOptional: true,
       description:
         "Type specification for input/output. Use 'agent' for automatic agent types. For custom types, use the generic approach instead.",
     },
@@ -77,13 +86,13 @@ This function returns a scorer builder that you can chain step methods onto. See
     {
       name: "model",
       type: "LanguageModel",
-      required: true,
+      isOptional: false,
       description: "The LLM model instance to use for evaluation.",
     },
     {
       name: "instructions",
       type: "string",
-      required: true,
+      isOptional: false,
       description: "System prompt/instructions for the LLM.",
     },
   ]}
@@ -98,7 +107,7 @@ You can specify input/output types when creating scorers for better type inferen
 For evaluating agents, use `type: 'agent'` to automatically get the correct types for agent input/output:
 ```typescript
-import { createScorer } from "@mastra/core/scorers";
+import { createScorer } from "@mastra/core/evals";
 // Agent scorer with automatic typing
 const agentScorer = createScorer({
@@ -123,7 +132,7 @@ const agentScorer = createScorer({
 For custom input/output types, use the generic approach:
 ```typescript
-import { createScorer } from "@mastra/core/scorers";
+import { createScorer } from "@mastra/core/evals";
 type CustomInput = { query: string; context: string[] };
 type CustomOutput = { answer: string; confidence: number };
@@ -185,34 +194,34 @@ Function: `({ run, results }) => any`
     {
       name: "run.input",
       type: "any",
-      required: true,
+      isOptional: false,
       description:
         "Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.",
     },
     {
       name: "run.output",
       type: "any",
-      required: true,
+      isOptional: false,
       description:
         "Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.",
     },
     {
       name: "run.runId",
       type: "string",
-      required: true,
+      isOptional: false,
       description: "Unique identifier for this scoring run.",
     },
     {
       name: "run.requestContext",
       type: "object",
-      required: false,
+      isOptional: true,
       description:
         "Request Context from the agent or workflow step being evaluated (optional).",
     },
     {
       name: "results",
       type: "object",
-      required: true,
+      isOptional: false,
       description: "Empty object (no previous steps).",
     },
   ]}
@@ -228,26 +237,26 @@ The method can return any value. The returned value will be available to subsequ
     {
       name: "description",
       type: "string",
-      required: true,
+      isOptional: false,
       description: "Description of what this preprocessing step does.",
     },
     {
       name: "outputSchema",
       type: "ZodSchema",
-      required: true,
+      isOptional: false,
       description: "Zod schema for the expected output of the preprocess step.",
     },
     {
       name: "createPrompt",
       type: "function",
-      required: true,
+      isOptional: false,
       description:
         "Function: ({ run, results }) => string. Returns the prompt for the LLM.",
     },
     {
       name: "judge",
       type: "object",
-      required: false,
+      isOptional: true,
       description:
         "(Optional) LLM judge for this step (can override main judge). See Judge Object section.",
     },
@@ -266,34 +275,34 @@ Function: `({ run, results }) => any`
     {
       name: "run.input",
       type: "any",
-      required: true,
+      isOptional: false,
       description:
         "Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.",
     },
     {
       name: "run.output",
       type: "any",
-      required: true,
+      isOptional: false,
       description:
         "Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.",
     },
     {
       name: "run.runId",
       type: "string",
-      required: true,
+      isOptional: false,
       description: "Unique identifier for this scoring run.",
     },
     {
       name: "run.requestContext",
       type: "object",
-      required: false,
+      isOptional: true,
       description:
         "Request Context from the agent or workflow step being evaluated (optional).",
     },
     {
       name: "results.preprocessStepResult",
       type: "any",
-      required: false,
+      isOptional: true,
       description: "Result from preprocess step, if defined (optional).",
     },
   ]}
@@ -309,26 +318,26 @@ The method can return any value. The returned value will be available to subsequ
     {
       name: "description",
       type: "string",
-      required: true,
+      isOptional: false,
       description: "Description of what this analysis step does.",
     },
     {
       name: "outputSchema",
       type: "ZodSchema",
-      required: true,
+      isOptional: false,
       description: "Zod schema for the expected output of the analyze step.",
     },
     {
       name: "createPrompt",
       type: "function",
-      required: true,
+      isOptional: false,
       description:
         "Function: ({ run, results }) => string. Returns the prompt for the LLM.",
     },
     {
       name: "judge",
       type: "object",
-      required: false,
+      isOptional: true,
       description:
         "(Optional) LLM judge for this step (can override main judge). See Judge Object section.",
     },
@@ -347,40 +356,40 @@ Function: `({ run, results }) => number`
     {
       name: "run.input",
       type: "any",
-      required: true,
+      isOptional: false,
       description:
         "Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.",
     },
     {
       name: "run.output",
       type: "any",
-      required: true,
+      isOptional: false,
       description:
         "Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.",
     },
     {
       name: "run.runId",
       type: "string",
-      required: true,
+      isOptional: false,
       description: "Unique identifier for this scoring run.",
     },
     {
       name: "run.requestContext",
       type: "object",
-      required: false,
+      isOptional: true,
       description:
         "Request Context from the agent or workflow step being evaluated (optional).",
     },
     {
       name: "results.preprocessStepResult",
       type: "any",
-      required: false,
+      isOptional: true,
       description: "Result from preprocess step, if defined (optional).",
     },
     {
       name: "results.analyzeStepResult",
       type: "any",
-      required: false,
+      isOptional: true,
       description: "Result from analyze step, if defined (optional).",
     },
   ]}
@@ -396,27 +405,27 @@ The method must return a numerical score.
     {
       name: "description",
       type: "string",
-      required: true,
+      isOptional: false,
       description: "Description of what this scoring step does.",
     },
     {
       name: "outputSchema",
       type: "ZodSchema",
-      required: true,
+      isOptional: false,
       description:
         "Zod schema for the expected output of the generateScore step.",
     },
     {
       name: "createPrompt",
       type: "function",
-      required: true,
+      isOptional: false,
       description:
         "Function: ({ run, results }) => string. Returns the prompt for the LLM.",
     },
     {
       name: "judge",
       type: "object",
-      required: false,
+      isOptional: true,
       description:
         "(Optional) LLM judge for this step (can override main judge). See Judge Object section.",
     },
@@ -430,7 +439,7 @@ When using prompt object mode, you must also provide a `calculateScore` function
     {
       name: "calculateScore",
       type: "function",
-      required: true,
+      isOptional: false,
       description:
         "Function: ({ run, results, analyzeStepResult }) => number. Converts the LLM's structured output into a numerical score.",
     },
@@ -449,46 +458,46 @@ Function: `({ run, results, score }) => string`
     {
       name: "run.input",
       type: "any",
-      required: true,
+      isOptional: false,
       description:
         "Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.",
     },
     {
       name: "run.output",
       type: "any",
-      required: true,
+      isOptional: false,
       description:
         "Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.",
     },
     {
       name: "run.runId",
       type: "string",
-      required: true,
+      isOptional: false,
       description: "Unique identifier for this scoring run.",
     },
     {
       name: "run.requestContext",
       type: "object",
-      required: false,
+      isOptional: true,
       description:
         "Request Context from the agent or workflow step being evaluated (optional).",
     },
     {
       name: "results.preprocessStepResult",
       type: "any",
-      required: false,
+      isOptional: true,
       description: "Result from preprocess step, if defined (optional).",
     },
     {
       name: "results.analyzeStepResult",
       type: "any",
-      required: false,
+      isOptional: true,
       description: "Result from analyze step, if defined (optional).",
     },
     {
       name: "score",
       type: "number",
-      required: true,
+      isOptional: false,
       description: "Score computed by the generateScore step.",
     },
   ]}
@@ -504,20 +513,20 @@ The method must return a string explaining the score.
     {
       name: "description",
       type: "string",
-      required: true,
+      isOptional: false,
       description: "Description of what this reasoning step does.",
     },
     {
       name: "createPrompt",
       type: "function",
-      required: true,
+      isOptional: false,
       description:
         "Function: ({ run, results, score }) => string. Returns the prompt for the LLM.",
     },
     {
       name: "judge",
       type: "object",
-      required: false,
+      isOptional: true,
       description:
         "(Optional) LLM judge for this step (can override main judge). See Judge Object section.",
     },
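
Review note: the hunks above make two mechanical changes to this file. Every `PropertiesTable` entry swaps `required` for `isOptional` with the boolean inverted, and `createScorer` now takes a required `id` with `name` falling back to it. The sketch below restates both rules as plain functions. The `LegacyProp` and `Prop` shapes and the helpers `toIsOptional` and `resolveScorerName` are hypothetical names invented for illustration; they are not part of Mastra, and the fallback logic is only inferred from the description text in the diff.

```typescript
// Hypothetical shapes for illustration only; not Mastra's actual types.
interface LegacyProp {
  name: string;
  type: string;
  required: boolean;
  description: string;
}

interface Prop {
  name: string;
  type: string;
  isOptional: boolean;
  description: string;
}

// Convert an old-style table entry to the new shape.
// Assumption drawn from the diff: isOptional is the negation of required.
function toIsOptional({ required, ...rest }: LegacyProp): Prop {
  return { ...rest, isOptional: !required };
}

// Mirror the documented fallback: `name` defaults to `id` when omitted.
function resolveScorerName(config: { id: string; name?: string }): string {
  return config.name ?? config.id;
}

const migrated = toIsOptional({
  name: "judge",
  type: "object",
  required: false,
  description: "Optional judge configuration for LLM-based steps.",
});

console.log(migrated.isOptional); // true
console.log(resolveScorerName({ id: "my-custom-scorer" })); // "my-custom-scorer"
```

A quick check against the hunks: `required: true` entries such as `description` and `run.runId` become `isOptional: false`, and `required: false` entries such as `judge` and `run.requestContext` become `isOptional: true`.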

package/.docs/raw/reference/evals/faithfulness.mdx CHANGED

@@ -1,5 +1,5 @@
 ---
-title: "Reference: Faithfulness Scorer | Evals | Mastra Docs"
+title: "Reference: Faithfulness Scorer | Evals"
 description: Documentation for the Faithfulness Scorer in Mastra, which evaluates the factual accuracy of LLM outputs compared to the provided context.
 ---
@@ -122,118 +122,45 @@ A faithfulness score between 0 and 1:
 - **0.1–0.3**: Most of the content is inaccurate or unsupported.
 - **0.0**: All claims are false or contradict the context.
-## Examples
+## Example
-### High faithfulness example
+Evaluate agent responses for faithfulness to provided context:
-In this example, the response closely aligns with the context. Each statement in the output is verifiable and supported by the provided context entries, resulting in a high score.
-```typescript title="src/example-high-faithfulness.ts" showLineNumbers copy
-import { createFaithfulnessScorer } from "@mastra/evals/scorers/prebuilt";
-const scorer = createFaithfulnessScorer({ model: 'openai/gpt-4o-mini', options: {
-  context: [
-    "The Tesla Model 3 was launched in 2017.",
-    "It has a range of up to 358 miles.",
-    "The base model accelerates 0-60 mph in 5.8 seconds."
-  ]
-});
-const query = "Tell me about the Tesla Model 3.";
-const response = "The Tesla Model 3 was introduced in 2017. It can travel up to 358 miles on a single charge and the base version goes from 0 to 60 mph in 5.8 seconds.";
-const result = await scorer.run({
-  input: [{ role: 'user', content: query }],
-  output: { text: response },
-});
-console.log(result);
-```
-#### High faithfulness output
-The output receives a score of 1 because all the information it provides can be directly traced to the context. There are no missing or contradictory facts.
-```typescript
-{
-  score: 1,
-  reason: 'The score is 1 because all claims made in the output are supported by the provided context.'
-}
-```
-### Mixed faithfulness example
-In this example, there are a mix of supported and unsupported claims. Some parts of the response are backed by the context, while others introduce new information not found in the source material.
-```typescript title="src/example-mixed-faithfulness.ts" showLineNumbers copy
-import { createFaithfulnessScorer } from "@mastra/evals/scorers/prebuilt";
-const scorer = createFaithfulnessScorer({ model: 'openai/gpt-4o-mini', options: {
-  context: [
-    "Python was created by Guido van Rossum.",
-    "The first version was released in 1991.",
-    "Python emphasizes code readability."
-  ]
-});
-const query = "What can you tell me about Python?";
-const response = "Python was created by Guido van Rossum and released in 1991. It is the most popular programming language today and is used by millions of developers worldwide.";
-const result = await scorer.run({
-  input: [{ role: 'user', content: query }],
-  output: { text: response },
-});
-console.log(result);
-```
-#### Mixed faithfulness output
-The score is lower because only a portion of the response is verifiable. While some claims match the context, others are unconfirmed or out of scope, reducing the overall faithfulness.
-```typescript
-{
-  score: 0.5,
-  reason: "The score is 0.5 because while two claims are supported by the context (Python was created by Guido van Rossum and Python was released in 1991), the other two claims regarding Python's popularity and usage cannot be verified as they are not mentioned in the context."
-}
-```
-### Low faithfulness example
-In this example, the response directly contradicts the context. None of the claims are supported, and several conflict with the facts provided.
-```typescript title="src/example-low-faithfulness.ts" showLineNumbers copy
+```typescript title="src/example-faithfulness.ts" showLineNumbers copy
+import { runEvals } from "@mastra/core/evals";
 import { createFaithfulnessScorer } from "@mastra/evals/scorers/prebuilt";
+import { myAgent } from "./agent";
-const scorer = createFaithfulnessScorer({ model: 'openai/gpt-4o-mini', options: {
-  context: [
-    "Mars is the fourth planet from the Sun.",
-    "It has a thin atmosphere of mostly carbon dioxide.",
-    "Two small moons orbit Mars: Phobos and Deimos."
-  ]
+// Context is typically populated from agent tool calls or RAG retrieval
+const scorer = createFaithfulnessScorer({
+  model: "openai/gpt-4o",
 });
-const query = "What do we know about Mars?";
-const response = "Mars is the third planet from the Sun. It has a thick atmosphere rich in oxygen and nitrogen, and is orbited by three large moons.";
-const result = await scorer.run({
-  input: [{ role: 'user', content: query }],
-  output: { text: response },
+const result = await runEvals({
+  data: [
+    {
+      input: "Tell me about the Tesla Model 3.",
+    },
+    {
+      input: "What are the key features of this electric vehicle?",
+    },
+  ],
+  scorers: [scorer],
+  target: myAgent,
+  onItemComplete: ({ scorerResults }) => {
+    console.log({
+      score: scorerResults[scorer.id].score,
+      reason: scorerResults[scorer.id].reason,
+    });
+  },
 });
-console.log(result);
+console.log(result.scores);
 ```
-#### Low faithfulness output
+For more details on `runEvals`, see the [runEvals reference](/reference/v1/evals/run-evals).
-Each claim is inaccurate or conflicts with the context, resulting in a score of 0.
-```typescript
-{
-  score: 0,
-  reason: "The score is 0 because all claims made in the output contradict the provided context. The output states that Mars is the third planet from the Sun, while the context clearly states it is the fourth. Additionally, it claims that Mars has a thick atmosphere rich in oxygen and nitrogen, contradicting the context's description of a thin atmosphere mostly composed of carbon dioxide. Finally, the output mentions that Mars is orbited by three large moons, while the context specifies that it has only two small moons, Phobos and Deimos. Therefore, there are no supported claims, leading to a score of 0."
-}
-```
+To add this scorer to an agent, see the [Scorers overview](/docs/v1/evals/overview#adding-scorers-to-agents) guide.
 ## Related