@mastra/evals 1.2.0 → 1.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +28 -0
- package/dist/docs/SKILL.md +1 -1
- package/dist/docs/assets/SOURCE_MAP.json +1 -1
- package/dist/docs/references/docs-evals-built-in-scorers.md +1 -1
- package/dist/docs/references/docs-evals-overview.md +8 -9
- package/dist/scorers/prebuilt/index.cjs +1 -1
- package/dist/scorers/prebuilt/index.cjs.map +1 -1
- package/dist/scorers/prebuilt/index.js +1 -1
- package/dist/scorers/prebuilt/index.js.map +1 -1
- package/package.json +7 -7
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,33 @@
 # @mastra/evals
 
+## 1.2.1
+
+### Patch Changes
+
+- Fix answer-similarity scorer to align prompt guidelines with allowed match types ([#15001](https://github.com/mastra-ai/mastra/pull/15001))
+
+  The answer-similarity scorer could throw a ZodError when the LLM returned
+  "contradiction" as a matchType, since only exact/semantic/partial/missing are
+  valid. The prompt now correctly directs contradictory information to the
+  existing contradictions array instead.
+
+- Updated dependencies [[`f32b9e1`](https://github.com/mastra-ai/mastra/commit/f32b9e115a3c754d1c8cfa3f4256fba87b09cfb7), [`7d6f521`](https://github.com/mastra-ai/mastra/commit/7d6f52164d0cca099f0b07cb2bba334360f1c8ab), [`a50d220`](https://github.com/mastra-ai/mastra/commit/a50d220b01ecbc5644d489a3d446c3bd4ab30245), [`665477b`](https://github.com/mastra-ai/mastra/commit/665477bc104fd52cfef8e7610d7664781a70c220), [`4cc2755`](https://github.com/mastra-ai/mastra/commit/4cc2755a7194cb08720ff2ab4dffb4b4a5103dfd), [`ac7baf6`](https://github.com/mastra-ai/mastra/commit/ac7baf66ef1db15e03975ef4ebb02724f015a391), [`ed425d7`](https://github.com/mastra-ai/mastra/commit/ed425d78e7c66cbda8209fee910856f98c6c6b82), [`1371703`](https://github.com/mastra-ai/mastra/commit/1371703835080450ef3f9aea58059a95d0da2e5a), [`0df8321`](https://github.com/mastra-ai/mastra/commit/0df832196eeb2450ab77ce887e8553abdd44c5a6), [`98f8a8b`](https://github.com/mastra-ai/mastra/commit/98f8a8bdf5761b9982f3ad3acbe7f1cc3efa71f3), [`ba6f7e9`](https://github.com/mastra-ai/mastra/commit/ba6f7e9086d8281393f2acae60fda61de3bff1f9), [`7eb2596`](https://github.com/mastra-ai/mastra/commit/7eb25960d607e07468c9a10c5437abd2deaf1e9a), [`1805ddc`](https://github.com/mastra-ai/mastra/commit/1805ddc9c9b3b14b63749735a13c05a45af43a80), [`fff91cf`](https://github.com/mastra-ai/mastra/commit/fff91cf914de0e731578aacebffdeebef82f0440), [`61109b3`](https://github.com/mastra-ai/mastra/commit/61109b34feb0e38d54bee4b8ca83eb7345b1d557), [`33f1ead`](https://github.com/mastra-ai/mastra/commit/33f1eadfa19c86953f593478e5fa371093b33779)]:
+  - @mastra/core@1.23.0
+
+## 1.2.1-alpha.0
+
+### Patch Changes
+
+- Fix answer-similarity scorer to align prompt guidelines with allowed match types ([#15001](https://github.com/mastra-ai/mastra/pull/15001))
+
+  The answer-similarity scorer could throw a ZodError when the LLM returned
+  "contradiction" as a matchType, since only exact/semantic/partial/missing are
+  valid. The prompt now correctly directs contradictory information to the
+  existing contradictions array instead.
+
+- Updated dependencies [[`ac7baf6`](https://github.com/mastra-ai/mastra/commit/ac7baf66ef1db15e03975ef4ebb02724f015a391), [`0df8321`](https://github.com/mastra-ai/mastra/commit/0df832196eeb2450ab77ce887e8553abdd44c5a6), [`61109b3`](https://github.com/mastra-ai/mastra/commit/61109b34feb0e38d54bee4b8ca83eb7345b1d557), [`33f1ead`](https://github.com/mastra-ai/mastra/commit/33f1eadfa19c86953f593478e5fa371093b33779)]:
+  - @mastra/core@1.23.0-alpha.8
+
 ## 1.2.0
 
 ### Minor Changes
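The changelog entry above describes a schema-validation failure: the answer-similarity judge's structured output only allows four matchType values, so an LLM answering "contradiction" blew up parsing. The sketch below illustrates that failure mode in plain TypeScript; it is not Mastra's actual schema (which uses a Zod enum), and the names here are made up for illustration.

```typescript
// Hypothetical stand-in for the judge-output schema: only these four
// matchType values validate, mirroring something like
// z.enum(["exact", "semantic", "partial", "missing"]).
const ALLOWED_MATCH_TYPES = ["exact", "semantic", "partial", "missing"] as const;
type MatchType = (typeof ALLOWED_MATCH_TYPES)[number];

interface SimilarityMatch {
  matchType: MatchType;
}

// Validate the judge's raw JSON the way an enum schema would:
// anything outside the allowed set is rejected (this rejection is
// what surfaced as a ZodError before the 1.2.1 prompt fix).
function parseMatch(raw: { matchType: string }): SimilarityMatch {
  if (!(ALLOWED_MATCH_TYPES as readonly string[]).includes(raw.matchType)) {
    throw new Error(`Invalid enum value: ${raw.matchType}`);
  }
  return { matchType: raw.matchType as MatchType };
}

// Before the fix, the prompt left room for the LLM to answer with
// matchType: "contradiction", which fails validation:
let failed = false;
try {
  parseMatch({ matchType: "contradiction" });
} catch {
  failed = true;
}
console.log(failed); // prints "true"

// The fixed prompt instead marks such matches "missing" and routes the
// wrong fact into the separate contradictions array:
const fixed = {
  match: parseMatch({ matchType: "missing" }),
  contradictions: ["output states an incorrect name"],
};
console.log(fixed.match.matchType); // prints "missing"
```

The point of the prompt change is that no schema change was needed: the enum stays four-valued, and contradictory content flows through a field that already existed.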
package/dist/docs/SKILL.md
CHANGED
@@ -28,7 +28,7 @@ These scorers evaluate the quality and relevance of context used in generating r
 - [`context-precision`](https://mastra.ai/reference/evals/context-precision): Evaluates context relevance and ranking using Mean Average Precision, rewarding early placement of relevant context (`0-1`, higher is better)
 - [`context-relevance`](https://mastra.ai/reference/evals/context-relevance): Measures context utility with nuanced relevance levels, usage tracking, and missing context detection (`0-1`, higher is better)
 
->
+> **Context Scorer Selection:**
 >
 > - Use **Context Precision** when context ordering matters and you need standard IR metrics (ideal for RAG ranking evaluation)
 > - Use **Context Relevance** when you need detailed relevance assessment and want to track context usage and identify gaps
@@ -111,9 +111,9 @@ export const contentWorkflow = createWorkflow({ ... })
 
 In addition to live evaluations, you can use scorers to evaluate historical traces from your agent interactions and workflows. This is particularly useful for analyzing past performance, debugging issues, or running batch evaluations.
 
-> **Observability
+> **Observability required:** To score traces, you must first configure observability in your Mastra instance to collect trace data. See [Tracing documentation](https://mastra.ai/docs/observability/tracing/overview) for setup instructions.
 
-
+## Studio
 
 To score traces, you first need to register your scorers with your Mastra instance:
 
@@ -126,16 +126,15 @@ const mastra = new Mastra({
 })
 ```
 
-Once registered, you can score traces interactively within Studio under the Observability section.
+Once registered, you can score traces interactively within Studio under the **Observability** section. Open Studio to manage scorers, review scores, and run experiments.
 
-
-
-
-
-For more details, see [Studio](https://mastra.ai/docs/getting-started/studio) docs.
+- **Scorers list**: Browse all registered scorers with their description, and the number of agents and workflows each scorer is attached to.
+- **Score results**: Select a scorer to see a paginated list of every score it has produced. Click a row to open the detail panel, which shows the score value, reason, input, output, and the prompts used by the judge. From this panel, save any result as a dataset item for future experiments.
+- **Agent Evaluate tab**: Open the Evaluate tab on any agent to attach or detach scorers, create or edit stored scorers inline, manage datasets, and run experiments. Experiment results display per-item scores alongside pass/fail status and version tags.
+- **Trace scoring**: In the Observability section, run a scorer against any historical trace or span to evaluate past interactions. Filter scores by agent or workflow.
 
 ## Next steps
 
 - Learn how to create your own scorers in the [Creating Custom Scorers](https://mastra.ai/docs/evals/custom-scorers) guide
 - Explore built-in scorers in the [Built-in Scorers](https://mastra.ai/docs/evals/built-in-scorers) section
-- Test scorers with [Studio](https://mastra.ai/docs/
+- Test scorers with [Studio](https://mastra.ai/docs/studio/overview)
@@ -333,7 +333,7 @@ Matching Guidelines:
 - "semantic": The same concept or fact expressed differently but with equivalent meaning
 - "partial": Some overlap but missing important details or context
 - "missing": No corresponding information found in the output
--
+- For factually incorrect information (wrong facts, incorrect names), mark the match as "missing" and add it to the "contradictions" array
 
 CRITICAL: If the output contains factually incorrect information (wrong names, wrong facts, opposite claims), you MUST identify contradictions and mark relevant matches as "missing" while adding entries to the contradictions array.
 
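The guideline change above routes wrong facts into a contradictions array rather than inventing a fifth matchType. To make the shape concrete, here is a toy aggregation over judge output structured the way the prompt describes. This is purely illustrative, not the actual answer-similarity implementation: the weights and the contradiction penalty are invented for the sketch.

```typescript
// Toy model of the judge's structured output: per-sentence matches plus a
// separate list of factually wrong claims (the "contradictions" array).
type MatchType = "exact" | "semantic" | "partial" | "missing";

interface JudgeOutput {
  matches: { matchType: MatchType }[];
  contradictions: string[];
}

// Hypothetical credit per match type; only the ordering
// (exact > semantic > partial > missing) is implied by the prompt.
const WEIGHTS: Record<MatchType, number> = {
  exact: 1.0,
  semantic: 0.9,
  partial: 0.5,
  missing: 0.0,
};

// Average the per-match credit, subtract a penalty per contradiction,
// and clamp to the scorer's documented 0-1 range.
function score(out: JudgeOutput, contradictionPenalty = 0.25): number {
  if (out.matches.length === 0) return 0;
  const base =
    out.matches.reduce((sum, m) => sum + WEIGHTS[m.matchType], 0) /
    out.matches.length;
  const penalized = base - contradictionPenalty * out.contradictions.length;
  return Math.min(1, Math.max(0, penalized));
}

console.log(
  score({
    matches: [{ matchType: "exact" }, { matchType: "missing" }],
    contradictions: ["output names the wrong person"],
  }),
); // prints 0.25 (average credit 0.5 minus one 0.25 penalty)
```

This shows why the fix matters downstream: a contradiction both zeroes its match (as "missing") and carries an explicit penalty signal, instead of crashing validation.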