@mastra/mcp-docs-server 0.13.7-alpha.0 → 0.13.7-alpha.2
- package/.docs/organized/changelogs/%40mastra%2Fclient-js.md +39 -39
- package/.docs/organized/changelogs/%40mastra%2Fcloudflare-d1.md +18 -18
- package/.docs/organized/changelogs/%40mastra%2Fcloudflare.md +18 -18
- package/.docs/organized/changelogs/%40mastra%2Fcore.md +45 -45
- package/.docs/organized/changelogs/%40mastra%2Fdeployer-cloudflare.md +21 -21
- package/.docs/organized/changelogs/%40mastra%2Fdeployer.md +44 -44
- package/.docs/organized/changelogs/%40mastra%2Fevals.md +11 -11
- package/.docs/organized/changelogs/%40mastra%2Flibsql.md +29 -29
- package/.docs/organized/changelogs/%40mastra%2Fmcp-docs-server.md +25 -25
- package/.docs/organized/changelogs/%40mastra%2Fmemory.md +39 -39
- package/.docs/organized/changelogs/%40mastra%2Fmongodb.md +20 -20
- package/.docs/organized/changelogs/%40mastra%2Fmssql.md +17 -0
- package/.docs/organized/changelogs/%40mastra%2Fpg.md +29 -29
- package/.docs/organized/changelogs/%40mastra%2Fplayground-ui.md +12 -12
- package/.docs/organized/changelogs/%40mastra%2Fserver.md +38 -38
- package/.docs/organized/changelogs/%40mastra%2Fupstash.md +29 -29
- package/.docs/organized/changelogs/%40mastra%2Fvectorize.md +18 -18
- package/.docs/organized/changelogs/%40mastra%2Fvoice-cloudflare.md +18 -18
- package/.docs/organized/changelogs/create-mastra.md +7 -7
- package/.docs/organized/changelogs/mastra.md +32 -32
- package/.docs/organized/code-examples/agent.md +93 -3
- package/.docs/organized/code-examples/ai-sdk-v5.md +4 -4
- package/.docs/raw/agents/input-processors.mdx +268 -0
- package/.docs/raw/agents/using-tools-and-mcp.mdx +39 -0
- package/.docs/raw/community/contributing-templates.mdx +192 -0
- package/.docs/raw/getting-started/installation.mdx +16 -0
- package/.docs/raw/getting-started/templates.mdx +95 -0
- package/.docs/raw/observability/tracing.mdx +44 -0
- package/.docs/raw/reference/agents/agent.mdx +7 -0
- package/.docs/raw/reference/agents/generate.mdx +18 -1
- package/.docs/raw/reference/agents/stream.mdx +18 -1
- package/.docs/raw/reference/cli/dev.mdx +6 -0
- package/.docs/raw/reference/client-js/memory.mdx +18 -0
- package/.docs/raw/reference/core/mastra-class.mdx +1 -1
- package/.docs/raw/reference/memory/Memory.mdx +1 -0
- package/.docs/raw/reference/memory/deleteMessages.mdx +95 -0
- package/.docs/raw/reference/memory/getThreadsByResourceId.mdx +33 -1
- package/.docs/raw/reference/rag/upstash.mdx +112 -5
- package/.docs/raw/reference/scorers/answer-relevancy.mdx +114 -0
- package/.docs/raw/reference/scorers/bias.mdx +127 -0
- package/.docs/raw/reference/scorers/completeness.mdx +89 -0
- package/.docs/raw/reference/scorers/content-similarity.mdx +96 -0
- package/.docs/raw/reference/scorers/custom-code-scorer.mdx +155 -0
- package/.docs/raw/reference/scorers/faithfulness.mdx +122 -0
- package/.docs/raw/reference/scorers/hallucination.mdx +133 -0
- package/.docs/raw/reference/scorers/keyword-coverage.mdx +92 -0
- package/.docs/raw/reference/scorers/llm-scorer.mdx +210 -0
- package/.docs/raw/reference/scorers/mastra-scorer.mdx +218 -0
- package/.docs/raw/reference/scorers/textual-difference.mdx +76 -0
- package/.docs/raw/reference/scorers/tone-consistency.mdx +75 -0
- package/.docs/raw/reference/scorers/toxicity.mdx +109 -0
- package/.docs/raw/reference/storage/libsql.mdx +7 -4
- package/.docs/raw/reference/storage/mssql.mdx +7 -3
- package/.docs/raw/reference/storage/postgresql.mdx +7 -3
- package/.docs/raw/reference/templates.mdx +228 -0
- package/.docs/raw/scorers/custom-scorers.mdx +319 -0
- package/.docs/raw/scorers/off-the-shelf-scorers.mdx +30 -0
- package/.docs/raw/scorers/overview.mdx +124 -0
- package/package.json +4 -4
@@ -0,0 +1,127 @@
---
title: "Reference: Bias | Scorers | Mastra Docs"
description: Documentation for the Bias Scorer in Mastra, which evaluates LLM outputs for various forms of bias, including gender, political, racial/ethnic, or geographical bias.
---

# Bias Scorer

The `createBiasScorer()` function creates a scorer that evaluates LLM outputs for various forms of bias, including gender, political, racial/ethnic, and geographical bias. It accepts a single options object whose properties are listed under Parameters.

For a usage example, see the [Bias Examples](/examples/scorers/bias).

## Parameters

<PropertiesTable
  content={[
    {
      name: "model",
      type: "LanguageModel",
      required: true,
      description: "Configuration for the model used to evaluate bias.",
    },
    {
      name: "scale",
      type: "number",
      required: false,
      defaultValue: "1",
      description: "Maximum score value.",
    },
  ]}
/>

This function returns an instance of the MastraScorer class. The `.run()` method accepts the same input as other scorers (see the [MastraScorer reference](./mastra-scorer)), but the return value includes LLM-specific fields as documented below.

## .run() Returns

<PropertiesTable
  content={[
    {
      name: "runId",
      type: "string",
      description: "The id of the run (optional).",
    },
    {
      name: "extractStepResult",
      type: "object",
      description: "Object with extracted opinions: { opinions: string[] }",
    },
    {
      name: "extractPrompt",
      type: "string",
      description: "The prompt sent to the LLM for the extract step (optional).",
    },
    {
      name: "analyzeStepResult",
      type: "object",
      description: "Object with results: { results: Array<{ result: 'yes' | 'no', reason: string }> }",
    },
    {
      name: "analyzePrompt",
      type: "string",
      description: "The prompt sent to the LLM for the analyze step (optional).",
    },
    {
      name: "score",
      type: "number",
      description: "Bias score (0 to scale, default 0-1). Higher scores indicate more bias.",
    },
    {
      name: "reason",
      type: "string",
      description: "Explanation of the score.",
    },
    {
      name: "reasonPrompt",
      type: "string",
      description: "The prompt sent to the LLM for the reason step (optional).",
    },
  ]}
/>

## Bias Categories

The scorer evaluates several types of bias:

1. **Gender Bias**: Discrimination or stereotypes based on gender
2. **Political Bias**: Prejudice against political ideologies or beliefs
3. **Racial/Ethnic Bias**: Discrimination based on race, ethnicity, or national origin
4. **Geographical Bias**: Prejudice based on location or regional stereotypes

## Scoring Details

The scorer evaluates bias through opinion analysis based on:

- Opinion identification and extraction
- Presence of discriminatory language
- Use of stereotypes or generalizations
- Balance in perspective presentation
- Loaded or prejudicial terminology

### Scoring Process

1. Extracts opinions from text:
   - Identifies subjective statements
   - Excludes factual claims
   - Includes cited opinions
2. Evaluates each opinion:
   - Checks for discriminatory language
   - Assesses stereotypes and generalizations
   - Analyzes perspective balance

Final score: `(biased_opinions / total_opinions) * scale`
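
The bias formula can be sketched in a few lines of TypeScript. This is a minimal illustration and not Mastra's implementation: `OpinionVerdict` and `biasScore` are hypothetical names, the sketch assumes that a `result` of `'yes'` marks an opinion as biased, and returning 0 when no opinions were extracted is likewise an assumption.

```typescript
// Hypothetical shape of one analyzed opinion, modeled on analyzeStepResult.results.
type OpinionVerdict = { result: "yes" | "no"; reason: string };

// Sketch of the documented formula: (biased_opinions / total_opinions) * scale.
// Assumption: "yes" means the opinion was judged biased.
// Assumption: an output with no opinions scores 0.
function biasScore(results: OpinionVerdict[], scale = 1): number {
  if (results.length === 0) return 0;
  const biased = results.filter((r) => r.result === "yes").length;
  return (biased / results.length) * scale;
}

const opinionVerdicts: OpinionVerdict[] = [
  { result: "yes", reason: "Relies on a gender stereotype." },
  { result: "no", reason: "Balanced statement." },
  { result: "no", reason: "Neutral wording." },
  { result: "yes", reason: "Uses loaded regional terminology." },
];

console.log(biasScore(opinionVerdicts)); // 0.5
console.log(biasScore(opinionVerdicts, 10)); // 5
```

With two of four opinions flagged, the sketch yields 0.5 on the default scale, which falls in the "moderate bias" band.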

### Score interpretation

(0 to scale, default 0-1)

- 1.0: Complete bias - all opinions contain bias
- 0.7-0.9: Significant bias - majority of opinions show bias
- 0.4-0.6: Moderate bias - mix of biased and neutral opinions
- 0.1-0.3: Minimal bias - most opinions show balanced perspective
- 0.0: No detectable bias - opinions are balanced and neutral

## Related

- [Toxicity Scorer](./toxicity)
- [Faithfulness Scorer](./faithfulness)
- [Hallucination Scorer](./hallucination)
@@ -0,0 +1,89 @@
---
title: "Reference: Completeness | Scorers | Mastra Docs"
description: Documentation for the Completeness Scorer in Mastra, which evaluates how thoroughly LLM outputs cover key elements present in the input.
---

# Completeness Scorer

The `createCompletenessScorer()` function evaluates how thoroughly an LLM's output covers the key elements present in the input. It analyzes nouns, verbs, topics, and terms to determine coverage and provides a detailed completeness score.

For a usage example, see the [Completeness Examples](/examples/scorers/completeness).

## Parameters

The `createCompletenessScorer()` function does not take any options.

This function returns an instance of the MastraScorer class. See the [MastraScorer reference](./mastra-scorer) for details on the `.run()` method and its input/output.

## .run() Returns

<PropertiesTable
  content={[
    {
      name: "runId",
      type: "string",
      description: "The id of the run (optional).",
    },
    {
      name: "extractStepResult",
      type: "object",
      description: "Object with extracted elements and coverage details: { inputElements: string[], outputElements: string[], missingElements: string[], elementCounts: { input: number, output: number } }",
    },
    {
      name: "score",
      type: "number",
      description: "Completeness score (0-1) representing the proportion of input elements covered in the output.",
    },
  ]}
/>

## Element Extraction Details

The scorer extracts and analyzes several types of elements:

- Nouns: Key objects, concepts, and entities
- Verbs: Actions and states (converted to infinitive form)
- Topics: Main subjects and themes
- Terms: Individual significant words

The extraction process includes:

- Normalization of text (removing diacritics, converting to lowercase)
- Splitting camelCase words
- Handling of word boundaries
- Special handling of short words (3 characters or less)
- Deduplication of elements
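
The normalization steps listed above can be illustrated with a short TypeScript sketch. It covers only the text clean-up (diacritics, case, camelCase, word boundaries, deduplication) and not the noun, verb, or topic detection; `normalizeElements` is a hypothetical helper, and the exact rules the scorer applies may differ.

```typescript
// Hypothetical helper: reduces a text to a deduplicated list of normalized words.
function normalizeElements(text: string): string[] {
  const words = text
    // Split camelCase before lowercasing: "userName" -> "user Name".
    .replace(/([a-z])([A-Z])/g, "$1 $2")
    // Decompose accented characters, then drop the combining marks.
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    // Treat any run of non-alphanumeric characters as a word boundary.
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0);
  // Deduplicate while preserving first-seen order.
  return Array.from(new Set(words));
}

console.log(normalizeElements("Café userName fetches the userName"));
// [ 'cafe', 'user', 'name', 'fetches', 'the' ]
```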

## Scoring Details

The scorer evaluates completeness through linguistic element coverage analysis.

### Scoring Process

1. Extracts key elements:
   - Nouns and named entities
   - Action verbs
   - Topic-specific terms
   - Normalized word forms
2. Calculates coverage of input elements:
   - Exact matches for short terms (≤3 chars)
   - Substantial overlap (>60%) for longer terms

Final score: `(covered_elements / total_input_elements) * scale`
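
A rough TypeScript sketch of the coverage rule follows. It is an illustration under stated assumptions, not the scorer's code: `isCovered` and `completenessScore` are hypothetical names, "substantial overlap" is assumed to mean that one term contains the other and the shorter term is more than 60% of the longer term's length, and an empty input is assumed to count as fully covered. Because `createCompletenessScorer()` takes no options, the sketch applies no scale factor.

```typescript
// Assumption: short terms (3 characters or fewer) must match exactly; longer
// terms match when one contains the other and the length ratio exceeds 0.6.
function isCovered(inputElement: string, outputElements: string[]): boolean {
  return outputElements.some((out) => {
    if (inputElement.length <= 3) return out === inputElement;
    const shorter = out.length < inputElement.length ? out : inputElement;
    const longer = out.length < inputElement.length ? inputElement : out;
    return longer.indexOf(shorter) !== -1 && shorter.length / longer.length > 0.6;
  });
}

// covered_elements / total_input_elements
function completenessScore(inputElements: string[], outputElements: string[]): number {
  if (inputElements.length === 0) return 1; // assumption: nothing to cover
  const covered = inputElements.filter((el) => isCovered(el, outputElements)).length;
  return covered / inputElements.length;
}

const sampleInput = ["cat", "garden", "water", "sun"];
const sampleOutput = ["cat", "gardens", "wat"];

// "cat" matches exactly, "garden" overlaps "gardens" (6/7), "water" vs "wat"
// is only 3/5 = 0.6 (not above the threshold), and "sun" is absent.
console.log(completenessScore(sampleInput, sampleOutput)); // 0.5
```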

### Score interpretation

(0 to scale, default 0-1)

- 1.0: Complete coverage - contains all input elements
- 0.7-0.9: High coverage - includes most key elements
- 0.4-0.6: Partial coverage - contains some key elements
- 0.1-0.3: Low coverage - missing most key elements
- 0.0: No coverage - output lacks all input elements

## Related

- [Answer Relevancy Scorer](./answer-relevancy)
- [Content Similarity Scorer](./content-similarity)
- [Textual Difference Scorer](./textual-difference)
- [Keyword Coverage Scorer](./keyword-coverage)
@@ -0,0 +1,96 @@
---
title: "Reference: Content Similarity | Scorers | Mastra Docs"
description: Documentation for the Content Similarity Scorer in Mastra, which measures textual similarity between strings and provides a matching score.
---

# Content Similarity Scorer

The `createContentSimilarityScorer()` function measures the textual similarity between two strings, providing a score that indicates how closely they match. It supports configurable options for case sensitivity and whitespace handling.

For a usage example, see the [Content Similarity Examples](/examples/scorers/content-similarity).

## Parameters

The `createContentSimilarityScorer()` function accepts a single options object with the following properties:

<PropertiesTable
  content={[
    {
      name: "ignoreCase",
      type: "boolean",
      required: false,
      defaultValue: "true",
      description: "Whether to ignore case differences when comparing strings.",
    },
    {
      name: "ignoreWhitespace",
      type: "boolean",
      required: false,
      defaultValue: "true",
      description: "Whether to normalize whitespace when comparing strings.",
    },
  ]}
/>

This function returns an instance of the MastraScorer class. See the [MastraScorer reference](./mastra-scorer) for details on the `.run()` method and its input/output.

## .run() Returns

<PropertiesTable
  content={[
    {
      name: "runId",
      type: "string",
      description: "The id of the run (optional).",
    },
    {
      name: "extractStepResult",
      type: "object",
      description: "Object with processed input and output: { processedInput: string, processedOutput: string }",
    },
    {
      name: "analyzeStepResult",
      type: "object",
      description: "Object with similarity: { similarity: number }",
    },
    {
      name: "score",
      type: "number",
      description: "Similarity score (0-1) where 1 indicates perfect similarity.",
    },
  ]}
/>

## Scoring Details

The scorer evaluates textual similarity through character-level matching and configurable text normalization.

### Scoring Process

1. Normalizes text:
   - Case normalization (if ignoreCase: true)
   - Whitespace normalization (if ignoreWhitespace: true)
2. Compares processed strings using string-similarity algorithm:
   - Analyzes character sequences
   - Aligns word boundaries
   - Considers relative positions
   - Accounts for length differences

Final score: `similarity_value * scale`
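
To make the two steps concrete, here is a self-contained TypeScript sketch. It assumes the comparison is a Sørensen-Dice coefficient over character bigrams, which is one common string-similarity algorithm; the text above does not confirm which algorithm the scorer uses. `normalizeText`, `diceSimilarity`, and `contentSimilarity` are hypothetical names, and only the `ignoreCase` and `ignoreWhitespace` options come from the Parameters table.

```typescript
type SimilarityOptions = { ignoreCase?: boolean; ignoreWhitespace?: boolean };

// Step 1: optional case folding and whitespace collapsing (both default to true).
function normalizeText(text: string, options: SimilarityOptions = {}): string {
  const { ignoreCase = true, ignoreWhitespace = true } = options;
  let result = text;
  if (ignoreCase) result = result.toLowerCase();
  if (ignoreWhitespace) result = result.replace(/\s+/g, " ").trim();
  return result;
}

// Counts each adjacent character pair in the text.
function bigrams(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (let i = 0; i < text.length - 1; i++) {
    const pair = text.slice(i, i + 2);
    counts.set(pair, (counts.get(pair) ?? 0) + 1);
  }
  return counts;
}

// Step 2 (assumed algorithm): Dice coefficient = 2 * shared bigrams / total bigrams.
function diceSimilarity(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  const right = bigrams(b);
  let shared = 0;
  bigrams(a).forEach((count, pair) => {
    shared += Math.min(count, right.get(pair) ?? 0);
  });
  return (2 * shared) / (a.length - 1 + (b.length - 1));
}

function contentSimilarity(input: string, output: string, options: SimilarityOptions = {}): number {
  return diceSimilarity(normalizeText(input, options), normalizeText(output, options));
}

console.log(contentSimilarity("Hello   World", "hello world")); // 1
console.log(contentSimilarity("night", "nacht")); // 0.25
console.log(contentSimilarity("Hello World", "hello world", { ignoreCase: false })); // 0.7
```

The last call shows the effect of the options: with case sensitivity on, the bigrams containing `H` and `W` no longer match, so the score drops from 1 to 0.7.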

### Score interpretation

(0 to scale, default 0-1)

- 1.0: Perfect match - identical texts
- 0.7-0.9: High similarity - mostly matching content
- 0.4-0.6: Moderate similarity - partial matches
- 0.1-0.3: Low similarity - few matching patterns
- 0.0: No similarity - completely different texts

## Related

- [Completeness Scorer](./completeness)
- [Textual Difference Scorer](./textual-difference)
- [Answer Relevancy Scorer](./answer-relevancy)
- [Keyword Coverage Scorer](./keyword-coverage)
@@ -0,0 +1,155 @@
---
title: "Reference: Create Custom Scorer | Scorers | Mastra Docs"
description: Documentation for creating custom code scorers in Mastra, allowing users to define their own evaluation logic.
---

# createScorer

Mastra allows you to define your own custom code scorers for evaluating input/output pairs using any logic you choose. Custom scorers integrate seamlessly with the Mastra scoring framework and can be used anywhere built-in scorers are used.

For a usage example, see the [Custom Code Scorer Examples](/examples/scorers/custom-native-javascript-eval).

## How to Create a Custom Scorer

Use the `createScorer` factory to define your scorer. You must provide at least a `name`, `description`, and an `analyze` function. Optionally, you can provide `extract` and `reason` functions for multi-step or more advanced logic.

## createScorer Options

<PropertiesTable
  content={[
    {
      name: "name",
      type: "string",
      required: true,
      description: "Name of the scorer.",
    },
    {
      name: "description",
      type: "string",
      required: true,
      description: "Description of what the scorer does.",
    },
    {
      name: "analyze",
      type: "function",
      required: true,
      description: "Main scoring logic",
    },
    {
      name: "extract",
      type: "function",
      required: false,
      description: "Optional pre-processing step.",
    },
    {
      name: "reason",
      type: "function",
      required: false,
      description: "Optional reason/explanation step.",
    },
    {
      name: "metadata",
      type: "object",
      required: false,
      description: "Optional metadata for the scorer.",
    },
  ]}
/>

This function returns an instance of the MastraScorer class. See the [MastraScorer reference](./mastra-scorer) for details on the `.run()` method and its input/output.

## Step Function Signatures

### extract
<PropertiesTable
  content={[
    {
      name: "input",
      type: "Record<string, any>[]",
      required: false,
      description:
        "Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.",
    },
    {
      name: "output",
      type: "Record<string, any>",
      required: true,
      description:
        "Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.",
    },
  ]}
/>
Returns: `{ results: any }`
The method must return an object with a `results` property. The value of `results` will be passed to the analyze function as `extractStepResult`.

### analyze
<PropertiesTable
  content={[
    {
      name: "input",
      type: "Record<string, any>[]",
      required: true,
      description:
        "Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.",
    },
    {
      name: "output",
      type: "Record<string, any>",
      required: true,
      description:
        "Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.",
    },
    {
      name: "extractStepResult",
      type: "object",
      required: false,
      description: "Result of the extract step, if defined (optional).",
    },
  ]}
/>
Returns: `{ score: number, results?: any }`
The method must return an object with a `score` property (required). Optionally, it may return a `results` property. The value of `results` will be passed to the reason function as `analyzeStepResult`.

### reason
<PropertiesTable
  content={[
    {
      name: "input",
      type: "Record<string, any>[]",
      required: true,
      description:
        "Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.",
    },
    {
      name: "output",
      type: "Record<string, any>",
      required: true,
      description:
        "Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.",
    },
    {
      name: "score",
      type: "number",
      required: true,
      description: "Score computed by the analyze step.",
    },
    {
      name: "analyzeStepResult",
      type: "object",
      required: true,
      description: "Result of the analyze step.",
    },
    {
      name: "extractStepResult",
      type: "object",
      required: false,
      description: "Result of the extract step, if defined (optional).",
    },
  ]}
/>
Returns: `{ reason: string }`
The method must return an object with a `reason` property, which should be a string explaining the score.

All step functions can be async.
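
The sketch below shows how the three step functions could fit together, following the signatures and return shapes documented above. It is deliberately standalone and does not call `createScorer`: the steps are chained by hand to show the data flow, the local types are hypothetical stand-ins, and reading the response from `output.text` is an assumption about the output shape. The steps are written synchronously for brevity, although they may also be async.

```typescript
// Local stand-ins for the documented parameter types; not imported from Mastra.
type ScorerInput = Record<string, any>[];
type ScorerOutput = Record<string, any>;
type ExtractResult = { inputWords: string[]; outputWords: string[] };
type AnalyzeResult = { missing: string[] };

const toWords = (text: string): string[] =>
  text.toLowerCase().split(/\W+/).filter((word) => word.length > 0);

// extract: pre-process input and output; the returned `results` becomes extractStepResult.
function extract(args: { input: ScorerInput; output: ScorerOutput }): { results: ExtractResult } {
  const userText = args.input.map((message) => String(message.content ?? "")).join(" ");
  // Assumption: the response text lives on `output.text`.
  return { results: { inputWords: toWords(userText), outputWords: toWords(String(args.output.text ?? "")) } };
}

// analyze: compute the score; the returned `results` becomes analyzeStepResult.
function analyze(args: {
  input: ScorerInput;
  output: ScorerOutput;
  extractStepResult: ExtractResult;
}): { score: number; results: AnalyzeResult } {
  const { inputWords, outputWords } = args.extractStepResult;
  const missing = inputWords.filter((word) => outputWords.indexOf(word) === -1);
  const score = inputWords.length === 0 ? 1 : 1 - missing.length / inputWords.length;
  return { score, results: { missing } };
}

// reason: explain the score as a string.
function reason(args: {
  input: ScorerInput;
  output: ScorerOutput;
  score: number;
  analyzeStepResult: AnalyzeResult;
}): { reason: string } {
  const { missing } = args.analyzeStepResult;
  return {
    reason:
      missing.length === 0
        ? `Score ${args.score}: every input word appears in the output.`
        : `Score ${args.score}: missing ${missing.join(", ")}.`,
  };
}

// Chain the steps by hand to show how each result feeds the next step.
const sampleMessages: ScorerInput = [{ role: "user", content: "Compare apples and oranges" }];
const sampleResponse: ScorerOutput = { text: "Apples are sweet, oranges are tangy." };

const extracted = extract({ input: sampleMessages, output: sampleResponse });
const analyzed = analyze({ input: sampleMessages, output: sampleResponse, extractStepResult: extracted.results });
const explained = reason({
  input: sampleMessages,
  output: sampleResponse,
  score: analyzed.score,
  analyzeStepResult: analyzed.results,
});

console.log(analyzed.score); // 0.5
console.log(explained.reason); // Score 0.5: missing compare, and.
```

In a real scorer these functions would be passed as the `extract`, `analyze`, and `reason` options, together with `name` and `description`.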

@@ -0,0 +1,122 @@
---
title: "Reference: Faithfulness | Scorers | Mastra Docs"
description: Documentation for the Faithfulness Scorer in Mastra, which evaluates the factual accuracy of LLM outputs compared to the provided context.
---

# Faithfulness Scorer

The `createFaithfulnessScorer()` function evaluates how factually accurate an LLM's output is compared to the provided context. It extracts claims from the output and verifies them against the context, which makes it useful for measuring the reliability of RAG pipeline responses.

For a usage example, see the [Faithfulness Examples](/examples/scorers/faithfulness).

## Parameters

The `createFaithfulnessScorer()` function accepts a single options object with the following properties:

<PropertiesTable
  content={[
    {
      name: "model",
      type: "LanguageModel",
      required: true,
      description: "Configuration for the model used to evaluate faithfulness.",
    },
    {
      name: "context",
      type: "string[]",
      required: true,
      description: "Array of context chunks against which the output's claims will be verified.",
    },
    {
      name: "scale",
      type: "number",
      required: false,
      defaultValue: "1",
      description: "The maximum score value. The final score will be normalized to this scale.",
    },
  ]}
/>

This function returns an instance of the MastraScorer class. The `.run()` method accepts the same input as other scorers (see the [MastraScorer reference](./mastra-scorer)), but the return value includes LLM-specific fields as documented below.

## .run() Returns

<PropertiesTable
  content={[
    {
      name: "runId",
      type: "string",
      description: "The id of the run (optional).",
    },
    {
      name: "extractStepResult",
      type: "string[]",
      description: "Array of extracted claims from the output.",
    },
    {
      name: "extractPrompt",
      type: "string",
      description: "The prompt sent to the LLM for the extract step (optional).",
    },
    {
      name: "analyzeStepResult",
      type: "object",
      description: "Object with verdicts: { verdicts: Array<{ verdict: 'yes' | 'no' | 'unsure', reason: string }> }",
    },
    {
      name: "analyzePrompt",
      type: "string",
      description: "The prompt sent to the LLM for the analyze step (optional).",
    },
    {
      name: "score",
      type: "number",
      description: "A score between 0 and the configured scale, representing the proportion of claims that are supported by the context.",
    },
    {
      name: "reason",
      type: "string",
      description: "A detailed explanation of the score, including which claims were supported, contradicted, or marked as unsure.",
    },
    {
      name: "reasonPrompt",
      type: "string",
      description: "The prompt sent to the LLM for the reason step (optional).",
    },
  ]}
/>

## Scoring Details

The scorer evaluates faithfulness through claim verification against provided context.

### Scoring Process

1. Analyzes claims and context:
   - Extracts all claims (factual and speculative)
   - Verifies each claim against context
   - Assigns one of three verdicts:
     - "yes" - claim supported by context
     - "no" - claim contradicts context
     - "unsure" - claim unverifiable
2. Calculates faithfulness score:
   - Counts supported claims
   - Divides by total claims
   - Scales to configured range

Final score: `(supported_claims / total_claims) * scale`
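
As a minimal sketch of the calculation above, the following TypeScript counts the `"yes"` verdicts and scales the ratio. `ClaimVerdict` and `faithfulnessScore` are hypothetical names modeled on the documented `analyzeStepResult.verdicts` shape, and returning 0 when no claims were extracted is an assumption rather than documented behavior.

```typescript
// Hypothetical shape of one verdict, modeled on analyzeStepResult.verdicts.
type ClaimVerdict = { verdict: "yes" | "no" | "unsure"; reason: string };

// Sketch of the documented formula: (supported_claims / total_claims) * scale.
// Only "yes" verdicts count as supported; "no" and "unsure" both lower the score.
function faithfulnessScore(verdicts: ClaimVerdict[], scale = 1): number {
  if (verdicts.length === 0) return 0; // assumption: no claims means a score of 0
  const supported = verdicts.filter((v) => v.verdict === "yes").length;
  return (supported / verdicts.length) * scale;
}

const claimVerdicts: ClaimVerdict[] = [
  { verdict: "yes", reason: "Stated directly in the context." },
  { verdict: "yes", reason: "Follows from the second context chunk." },
  { verdict: "unsure", reason: "The context does not mention this." },
  { verdict: "yes", reason: "Matches the context." },
];

console.log(faithfulnessScore(claimVerdicts)); // 0.75
console.log(faithfulnessScore(claimVerdicts, 10)); // 7.5
```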

### Score interpretation

(0 to scale, default 0-1)

- 1.0: All claims supported by context
- 0.7-0.9: Most claims supported, few unverifiable
- 0.4-0.6: Mixed support with some contradictions
- 0.1-0.3: Limited support, many contradictions
- 0.0: No supported claims

## Related

- [Answer Relevancy Scorer](./answer-relevancy)
- [Hallucination Scorer](./hallucination)