@mastra/evals 0.0.0-a2a-20250421213654

package/LICENSE.md ADDED
@@ -0,0 +1,46 @@
+ # Elastic License 2.0 (ELv2)
+
+ Copyright (c) 2025 Mastra AI, Inc.
+
+ **Acceptance**
+ By using the software, you agree to all of the terms and conditions below.
+
+ **Copyright License**
+ The licensor grants you a non-exclusive, royalty-free, worldwide, non-sublicensable, non-transferable license to use, copy, distribute, make available, and prepare derivative works of the software, in each case subject to the limitations and conditions below.
+
+ **Limitations**
+ You may not provide the software to third parties as a hosted or managed service, where the service provides users with access to any substantial set of the features or functionality of the software.
+
+ You may not move, change, disable, or circumvent the license key functionality in the software, and you may not remove or obscure any functionality in the software that is protected by the license key.
+
+ You may not alter, remove, or obscure any licensing, copyright, or other notices of the licensor in the software. Any use of the licensor’s trademarks is subject to applicable law.
+
+ **Patents**
+ The licensor grants you a license, under any patent claims the licensor can license, or becomes able to license, to make, have made, use, sell, offer for sale, import and have imported the software, in each case subject to the limitations and conditions in this license. This license does not cover any patent claims that you cause to be infringed by modifications or additions to the software. If you or your company make any written claim that the software infringes or contributes to infringement of any patent, your patent license for the software granted under these terms ends immediately. If your company makes such a claim, your patent license ends immediately for work on behalf of your company.
+
+ **Notices**
+ You must ensure that anyone who gets a copy of any part of the software from you also gets a copy of these terms.
+
+ If you modify the software, you must include in any modified copies of the software prominent notices stating that you have modified the software.
+
+ **No Other Rights**
+ These terms do not imply any licenses other than those expressly granted in these terms.
+
+ **Termination**
+ If you use the software in violation of these terms, such use is not licensed, and your licenses will automatically terminate. If the licensor provides you with a notice of your violation, and you cease all violation of this license no later than 30 days after you receive that notice, your licenses will be reinstated retroactively. However, if you violate these terms after such reinstatement, any additional violation of these terms will cause your licenses to terminate automatically and permanently.
+
+ **No Liability**
+ As far as the law allows, the software comes as is, without any warranty or condition, and the licensor will not be liable to you for any damages arising out of these terms or the use or nature of the software, under any kind of legal claim.
+
+ **Definitions**
+ The _licensor_ is the entity offering these terms, and the _software_ is the software the licensor makes available under these terms, including any portion of it.
+
+ _you_ refers to the individual or entity agreeing to these terms.
+
+ _your company_ is any legal entity, sole proprietorship, or other kind of organization that you work for, plus all organizations that have control over, are under the control of, or are under common control with that organization. _control_ means ownership of substantially all the assets of an entity, or the power to direct its management and policies by vote, contract, or otherwise. Control can be direct or indirect.
+
+ _your licenses_ are all the licenses granted to you for the software under these terms.
+
+ _use_ means anything you do with the software requiring one of your licenses.
+
+ _trademark_ means trademarks, service marks, and similar rights.
package/README.md ADDED
@@ -0,0 +1,185 @@
+ # @mastra/evals
+
+ A comprehensive evaluation framework for assessing AI model outputs across multiple dimensions.
+
+ ## Installation
+
+ ```bash
+ npm install @mastra/evals
+ ```
+
+ ## Overview
+
+ `@mastra/evals` provides a suite of evaluation metrics for assessing AI model outputs. The package includes both LLM-based and NLP-based metrics, enabling both automated and model-assisted evaluation of AI responses.
+
+ ## Features
+
+ ### LLM-Based Metrics
+
+ 1. **Answer Relevancy**
+
+    - Evaluates how well an answer addresses the input question
+    - Considers uncertainty weighting for more nuanced scoring
+    - Returns detailed reasoning for scores
+
+ 2. **Bias Detection**
+
+    - Identifies potential biases in model outputs
+    - Analyzes opinions and statements for bias indicators
+    - Provides explanations for detected biases
+    - Configurable scoring scale
+
+ 3. **Context Precision & Relevancy**
+
+    - Assesses how well responses use provided context
+    - Evaluates accuracy of context usage
+    - Measures relevance of context to the response
+    - Analyzes context positioning in responses
+
+ 4. **Faithfulness**
+
+    - Verifies that responses are faithful to provided context
+    - Detects hallucinations or fabricated information
+    - Evaluates claims against provided context
+    - Provides detailed analysis of faithfulness breaches
+
+ 5. **Prompt Alignment**
+
+    - Measures how well responses follow given instructions
+    - Evaluates adherence to multiple instruction criteria
+    - Provides per-instruction scoring
+    - Supports custom instruction sets
+
+ 6. **Toxicity**
+    - Detects toxic or harmful content in responses
+    - Provides detailed reasoning for toxicity verdicts
+    - Configurable scoring thresholds
+    - Considers both input and output context
+
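Several of the LLM-based metrics above mention per-claim or per-instruction verdicts that feed a final score. The sketch below illustrates one plausible way such verdicts could roll up into a number. It is not the package's implementation: the `Verdict` shape, the `scoreFromVerdicts` helper, and the "supported verdicts divided by total, times scale" rule are all assumptions made for illustration.

```typescript
// Hypothetical verdict shape; the package's real types may differ.
type Verdict = { claim: string; verdict: 'yes' | 'no' | 'unsure'; reason: string };

// Assumed rule: share of 'yes' verdicts, multiplied by the scale, rounded to two decimals.
function scoreFromVerdicts(verdicts: Verdict[], scale = 1): number {
  if (verdicts.length === 0) return 0;
  const supported = verdicts.filter(v => v.verdict === 'yes').length;
  return Math.round((supported / verdicts.length) * scale * 100) / 100;
}

const sampleVerdicts: Verdict[] = [
  { claim: 'Paris is the capital of France', verdict: 'yes', reason: 'Stated in the context' },
  { claim: 'Paris has 2.2 million residents', verdict: 'yes', reason: 'Stated in the context' },
  { claim: 'Paris is the largest city in Europe', verdict: 'no', reason: 'Not in the context' },
];

console.log(scoreFromVerdicts(sampleVerdicts)); // 0.67
```

Keeping the individual verdicts alongside the score is what makes it possible to explain which claim lowered the result.
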
+ ### NLP-Based Metrics
+
+ 1. **Completeness**
+
+    - Analyzes structural completeness of responses
+    - Identifies missing elements from input requirements
+    - Provides detailed element coverage analysis
+    - Tracks input-output element ratios
+
+ 2. **Content Similarity**
+
+    - Measures text similarity between inputs and outputs
+    - Configurable for case and whitespace sensitivity
+    - Returns normalized similarity scores
+    - Uses string comparison algorithms for accuracy
+
+ 3. **Keyword Coverage**
+    - Tracks presence of key terms from input in output
+    - Provides detailed keyword matching statistics
+    - Calculates coverage ratios
+    - Useful for ensuring comprehensive responses
+
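To make the idea of a coverage ratio concrete, here is a minimal, self-contained sketch of keyword coverage. It does not reproduce the package's algorithm: the stop-word list, the `extractKeywords` and `keywordCoverage` helpers, and the `info` field names are hypothetical and chosen only for illustration.

```typescript
// Hypothetical stop-word list; a real implementation would likely use a proper NLP library.
const STOP_WORDS = new Set(['a', 'an', 'the', 'is', 'of', 'what', 'to', 'and', 'in']);

// Lowercase the text, split on non-alphanumeric characters, and drop stop words.
function extractKeywords(text: string): Set<string> {
  const words = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(w => w.length > 0 && !STOP_WORDS.has(w));
  return new Set(words);
}

// Coverage = input keywords that also appear in the output / all input keywords.
function keywordCoverage(input: string, output: string) {
  const expected = extractKeywords(input);
  const found = extractKeywords(output);
  const matched = Array.from(expected).filter(k => found.has(k));
  const score = expected.size === 0 ? 1 : matched.length / expected.size;
  return { score, info: { totalKeywords: expected.size, matchedKeywords: matched.length } };
}

const coverage = keywordCoverage('What is the capital of France?', 'Paris is the capital of France.');
console.log(coverage.score); // 1
```

In this sketch an input with no keywords is treated as fully covered, which is a design choice rather than something the README specifies.
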
+ ## Usage
+
+ ### Basic Example
+
+ ```typescript
+ // Model provider for the LLM-based metric (assumes the AI SDK OpenAI provider is installed)
+ import { openai } from '@ai-sdk/openai';
+ // LLM-based and NLP-based metrics are imported from their own subpaths
+ import { ToxicityMetric } from '@mastra/evals/llm';
+ import { ContentSimilarityMetric } from '@mastra/evals/nlp';
+
+ // Initialize metrics
+ const similarityMetric = new ContentSimilarityMetric({
+   ignoreCase: true,
+   ignoreWhitespace: true,
+ });
+
+ const toxicityMetric = new ToxicityMetric({
+   model: openai('gpt-4'),
+   scale: 1, // Optional: adjust scoring scale
+ });
+
+ // Evaluate outputs
+ const input = 'What is the capital of France?';
+ const output = 'Paris is the capital of France.';
+
+ const similarityResult = await similarityMetric.measure(input, output);
+ const toxicityResult = await toxicityMetric.measure(input, output);
+
+ console.log('Similarity Score:', similarityResult.score);
+ console.log('Toxicity Score:', toxicityResult.score);
+ ```
+
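When several metrics are applied to the same response, it can help to run them together and collect the scores in one object. The following sketch assumes only that each metric exposes an async `measure(input, output)` returning an object with a `score`, as in the example above. The `Metric` and `MetricResult` interfaces, the `runMetrics` helper, and the two stand-in metrics are hypothetical and exist so the snippet runs without an API key.

```typescript
// Assumed minimal shapes, mirroring the measure(input, output) calls in the README.
interface MetricResult {
  score: number;
  info?: Record<string, unknown>;
}

interface Metric {
  measure(input: string, output: string): Promise<MetricResult>;
}

// Run every metric concurrently and collect the scores by name.
async function runMetrics(
  metrics: Record<string, Metric>,
  input: string,
  output: string,
): Promise<Record<string, number>> {
  const scores: Record<string, number> = {};
  await Promise.all(
    Object.keys(metrics).map(async name => {
      const result = await metrics[name].measure(input, output);
      scores[name] = result.score;
    }),
  );
  return scores;
}

// Stand-in metrics for illustration only.
const exactMatch: Metric = {
  async measure(input, output) {
    return { score: input === output ? 1 : 0 };
  },
};

const nonEmpty: Metric = {
  async measure(_input, output) {
    return { score: output.trim().length > 0 ? 1 : 0 };
  },
};

runMetrics({ exactMatch, nonEmpty }, 'What is the capital of France?', 'Paris is the capital of France.')
  .then(scores => console.log(scores)); // exactMatch scores 0, nonEmpty scores 1
```

Real metric instances could be passed in place of the stand-ins, provided they satisfy the assumed `measure` signature.
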
+ ### Context-Aware Evaluation
+
+ ```typescript
+ // Model provider for the LLM-based metric (assumes the AI SDK OpenAI provider is installed)
+ import { openai } from '@ai-sdk/openai';
+ import { FaithfulnessMetric } from '@mastra/evals/llm';
+
+ // Initialize with context
+ const faithfulnessMetric = new FaithfulnessMetric({
+   model: openai('gpt-4'),
+   context: ['Paris is the capital of France', 'Paris has a population of 2.2 million'],
+   scale: 1,
+ });
+
+ // Evaluate response against context
+ const result = await faithfulnessMetric.measure(
+   'Tell me about Paris',
+   'Paris is the capital of France with 2.2 million residents',
+ );
+
+ console.log('Faithfulness Score:', result.score);
+ console.log('Reasoning:', result.reason);
+ ```
+
+ ## Metric Results
+
+ Each metric returns a standardized result object containing:
+
+ - `score`: Normalized score (typically 0-1)
+ - `info`: Detailed information about the evaluation
+ - Additional metric-specific data (e.g., matched keywords, missing elements)
+
+ Some metrics also provide:
+
+ - `reason`: Detailed explanation of the score
+ - `verdicts`: Individual judgments that contributed to the final score
+
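The fields listed above can be modeled as a type with optional members, so that code consuming results handles metrics with and without `reason` or `verdicts`. This is a sketch based only on the field names in this section. The `EvalResult` interface, the shape of each verdict, and the `summarizeResult` helper are assumptions, and the package's actual types may differ.

```typescript
// Assumed result shape derived from the field list; not the package's real type.
interface EvalResult {
  score: number;
  info?: Record<string, unknown>;
  reason?: string;
  verdicts?: { verdict: string; reason: string }[];
}

// Build a one-line summary, including optional fields only when present.
function summarizeResult(name: string, result: EvalResult): string {
  const parts = [`${name}: ${result.score.toFixed(2)}`];
  if (result.reason) parts.push(`reason: ${result.reason}`);
  if (result.verdicts) parts.push(`verdicts: ${result.verdicts.length}`);
  return parts.join(' | ');
}

console.log(summarizeResult('similarity', { score: 0.5 }));
// similarity: 0.50
console.log(summarizeResult('faithfulness', { score: 1, reason: 'All claims are supported' }));
// faithfulness: 1.00 | reason: All claims are supported
```
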
+ ## Telemetry and Logging
+
+ The package includes built-in telemetry and logging capabilities:
+
+ - Automatic evaluation tracking through Mastra Storage
+ - Integration with OpenTelemetry for performance monitoring
+ - Detailed evaluation traces for debugging
+
+ ```typescript
+ import { attachListeners } from '@mastra/evals';
+
+ // Enable basic evaluation tracking
+ await attachListeners();
+
+ // Or, to store evals in Mastra Storage (if storage is enabled), pass your Mastra instance
+ await attachListeners(mastra);
+ // Note: When using in-memory storage, evaluations are isolated to the test process.
+ // When using file storage, evaluations are persisted and can be queried later.
+ ```
+
+ ## Environment Variables
+
+ Required for LLM-based metrics:
+
+ - `OPENAI_API_KEY`: For OpenAI model access
+ - Additional provider keys as needed (Cohere, Anthropic, etc.)
+
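Because LLM-based metrics depend on provider keys, it can be useful to check for them before constructing any metric so that a missing key fails early with a clear message. The helper below is a hypothetical sketch, not part of the package. It takes the environment as a plain object so it can be exercised without real credentials.

```typescript
// Hypothetical helper: return the names of required variables that are unset or blank.
function missingEnvVars(env: Record<string, string | undefined>, required: string[]): string[] {
  return required.filter(name => (env[name] ?? '').trim() === '');
}

// Example with a stand-in environment object.
const sampleEnv: Record<string, string | undefined> = { OPENAI_API_KEY: '' };
const missing = missingEnvVars(sampleEnv, ['OPENAI_API_KEY']);

if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(', ')}`);
}
```

In a Node.js application the first argument would typically be `process.env`.
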
+ ## Package Exports
+
+ ```typescript
+ // Main package exports
+ import { evaluate } from '@mastra/evals';
+ // NLP-specific metrics
+ import { ContentSimilarityMetric } from '@mastra/evals/nlp';
+ ```
+
+ ## Related Packages
+
+ - `@mastra/core`: Core framework functionality
+ - `@mastra/engine`: LLM execution engine
+ - `@mastra/mcp`: Model Context Protocol integration