@mastra/evals 1.0.0-beta.2 → 1.0.0-beta.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +29 -0
- package/dist/{chunk-CKKVCGRB.js → chunk-6EA6D7JG.js} +2 -2
- package/dist/chunk-6EA6D7JG.js.map +1 -0
- package/dist/{chunk-AT7HXT3U.cjs → chunk-DSXZHUHI.cjs} +2 -2
- package/dist/chunk-DSXZHUHI.cjs.map +1 -0
- package/dist/docs/README.md +31 -0
- package/dist/docs/SKILL.md +32 -0
- package/dist/docs/SOURCE_MAP.json +6 -0
- package/dist/docs/evals/01-overview.md +130 -0
- package/dist/docs/evals/02-built-in-scorers.md +49 -0
- package/dist/docs/evals/03-reference.md +4018 -0
- package/dist/index.cjs +4 -0
- package/dist/index.cjs.map +1 -0
- package/dist/index.d.ts +12 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +3 -0
- package/dist/index.js.map +1 -0
- package/dist/scorers/prebuilt/index.cjs +59 -59
- package/dist/scorers/prebuilt/index.cjs.map +1 -1
- package/dist/scorers/prebuilt/index.js +1 -1
- package/dist/scorers/prebuilt/index.js.map +1 -1
- package/dist/scorers/utils.cjs +16 -16
- package/dist/scorers/utils.d.ts +1 -2
- package/dist/scorers/utils.d.ts.map +1 -1
- package/dist/scorers/utils.js +1 -1
- package/package.json +8 -10
- package/dist/chunk-AT7HXT3U.cjs.map +0 -1
- package/dist/chunk-CKKVCGRB.js.map +0 -1
package/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,34 @@
|
|
|
1
1
|
# @mastra/evals
|
|
2
2
|
|
|
3
|
+
## 1.0.0-beta.4
|
|
4
|
+
|
|
5
|
+
### Patch Changes
|
|
6
|
+
|
|
7
|
+
- Add embedded documentation support for Mastra packages ([#11472](https://github.com/mastra-ai/mastra/pull/11472))
|
|
8
|
+
|
|
9
|
+
Mastra packages now include embedded documentation in the published npm package under `dist/docs/`. This enables coding agents and AI assistants to understand and use the framework by reading documentation directly from `node_modules`.
|
|
10
|
+
|
|
11
|
+
Each package includes:
|
|
12
|
+
- **SKILL.md** - Entry point explaining the package's purpose and capabilities
|
|
13
|
+
- **SOURCE_MAP.json** - Machine-readable index mapping exports to types and implementation files
|
|
14
|
+
- **Topic folders** - Conceptual documentation organized by feature area
|
|
15
|
+
|
|
16
|
+
Documentation is driven by the `packages` frontmatter field in MDX files, which maps docs to their corresponding packages. CI validation ensures all docs include this field.
|
|
17
|
+
|
|
18
|
+
- Users no longer need to install the `ai` package as a peer dependency; this package now includes the necessary types internally. ([#11625](https://github.com/mastra-ai/mastra/pull/11625))
|
|
19
|
+
|
|
20
|
+
- Updated dependencies [[`d2d3e22`](https://github.com/mastra-ai/mastra/commit/d2d3e22a419ee243f8812a84e3453dd44365ecb0), [`bc72b52`](https://github.com/mastra-ai/mastra/commit/bc72b529ee4478fe89ecd85a8be47ce0127b82a0), [`05b8bee`](https://github.com/mastra-ai/mastra/commit/05b8bee9e50e6c2a4a2bf210eca25ee212ca24fa), [`c042bd0`](https://github.com/mastra-ai/mastra/commit/c042bd0b743e0e86199d0cb83344ca7690e34a9c), [`940a2b2`](https://github.com/mastra-ai/mastra/commit/940a2b27480626ed7e74f55806dcd2181c1dd0c2), [`e0941c3`](https://github.com/mastra-ai/mastra/commit/e0941c3d7fc75695d5d258e7008fd5d6e650800c), [`0c0580a`](https://github.com/mastra-ai/mastra/commit/0c0580a42f697cd2a7d5973f25bfe7da9055038a), [`28f5f89`](https://github.com/mastra-ai/mastra/commit/28f5f89705f2409921e3c45178796c0e0d0bbb64), [`e601b27`](https://github.com/mastra-ai/mastra/commit/e601b272c70f3a5ecca610373aa6223012704892), [`3d3366f`](https://github.com/mastra-ai/mastra/commit/3d3366f31683e7137d126a3a57174a222c5801fb), [`5a4953f`](https://github.com/mastra-ai/mastra/commit/5a4953f7d25bb15ca31ed16038092a39cb3f98b3), [`eb9e522`](https://github.com/mastra-ai/mastra/commit/eb9e522ce3070a405e5b949b7bf5609ca51d7fe2), [`20e6f19`](https://github.com/mastra-ai/mastra/commit/20e6f1971d51d3ff6dd7accad8aaaae826d540ed), [`4f0b3c6`](https://github.com/mastra-ai/mastra/commit/4f0b3c66f196c06448487f680ccbb614d281e2f7), [`74c4f22`](https://github.com/mastra-ai/mastra/commit/74c4f22ed4c71e72598eacc346ba95cdbc00294f), [`81b6a8f`](https://github.com/mastra-ai/mastra/commit/81b6a8ff79f49a7549d15d66624ac1a0b8f5f971), [`e4d366a`](https://github.com/mastra-ai/mastra/commit/e4d366aeb500371dd4210d6aa8361a4c21d87034), [`a4f010b`](https://github.com/mastra-ai/mastra/commit/a4f010b22e4355a5fdee70a1fe0f6e4a692cc29e), [`73b0bb3`](https://github.com/mastra-ai/mastra/commit/73b0bb394dba7c9482eb467a97ab283dbc0ef4db), [`5627a8c`](https://github.com/mastra-ai/mastra/commit/5627a8c6dc11fe3711b3fa7a6ffd6eb34100a306), [`3ff45d1`](https://github.com/mastra-ai/mastra/commit/3ff45d10e0c80c5335a957ab563da72feb623520), [`251df45`](https://github.com/mastra-ai/mastra/commit/251df4531407dfa46d805feb40ff3fb49769f455), [`f894d14`](https://github.com/mastra-ai/mastra/commit/f894d148946629af7b1f452d65a9cf864cec3765), [`c2b9547`](https://github.com/mastra-ai/mastra/commit/c2b9547bf435f56339f23625a743b2147ab1c7a6), [`580b592`](https://github.com/mastra-ai/mastra/commit/580b5927afc82fe460dfdf9a38a902511b6b7e7f), [`58e3931`](https://github.com/mastra-ai/mastra/commit/58e3931af9baa5921688566210f00fb0c10479fa), [`08bb631`](https://github.com/mastra-ai/mastra/commit/08bb631ae2b14684b2678e3549d0b399a6f0561e), [`4fba91b`](https://github.com/mastra-ai/mastra/commit/4fba91bec7c95911dc28e369437596b152b04cd0), [`12b0cc4`](https://github.com/mastra-ai/mastra/commit/12b0cc4077d886b1a552637dedb70a7ade93528c)]:
|
|
21
|
+
- @mastra/core@1.0.0-beta.20
|
|
22
|
+
|
|
23
|
+
## 1.0.0-beta.3
|
|
24
|
+
|
|
25
|
+
### Patch Changes
|
|
26
|
+
|
|
27
|
+
- Add main entry point to @mastra/evals package. Packages with exports field should define a "." entry point. The package now exports commonly used utilities and prebuilt scorers from the main entry, allowing users to import directly from '@mastra/evals' in addition to the existing subpath exports. ([#11307](https://github.com/mastra-ai/mastra/pull/11307))
|
|
28
|
+
|
|
29
|
+
- Updated dependencies [[`33a4d2e`](https://github.com/mastra-ai/mastra/commit/33a4d2e4ed8af51f69256232f00c34d6b6b51d48), [`4aaa844`](https://github.com/mastra-ai/mastra/commit/4aaa844a4f19d054490f43638a990cc57bda8d2f), [`4a1a6cb`](https://github.com/mastra-ai/mastra/commit/4a1a6cb3facad54b2bb6780b00ce91d6de1edc08), [`31d13d5`](https://github.com/mastra-ai/mastra/commit/31d13d5fdc2e2380e2e3ee3ec9fb29d2a00f265d), [`4c62166`](https://github.com/mastra-ai/mastra/commit/4c621669f4a29b1f443eca3ba70b814afa286266), [`7bcbf10`](https://github.com/mastra-ai/mastra/commit/7bcbf10133516e03df964b941f9a34e9e4ab4177), [`4353600`](https://github.com/mastra-ai/mastra/commit/43536005a65988a8eede236f69122e7f5a284ba2), [`6986fb0`](https://github.com/mastra-ai/mastra/commit/6986fb064f5db6ecc24aa655e1d26529087b43b3), [`053e979`](https://github.com/mastra-ai/mastra/commit/053e9793b28e970086b0507f7f3b76ea32c1e838), [`e26dc9c`](https://github.com/mastra-ai/mastra/commit/e26dc9c3ccfec54ae3dc3e2b2589f741f9ae60a6), [`55edf73`](https://github.com/mastra-ai/mastra/commit/55edf7302149d6c964fbb7908b43babfc2b52145), [`27c0009`](https://github.com/mastra-ai/mastra/commit/27c0009777a6073d7631b0eb7b481d94e165b5ca), [`dee388d`](https://github.com/mastra-ai/mastra/commit/dee388dde02f2e63c53385ae69252a47ab6825cc), [`3f3fc30`](https://github.com/mastra-ai/mastra/commit/3f3fc3096f24c4a26cffeecfe73085928f72aa63), [`d90ea65`](https://github.com/mastra-ai/mastra/commit/d90ea6536f7aa51c6545a4e9215b55858e98e16d), [`d171e55`](https://github.com/mastra-ai/mastra/commit/d171e559ead9f52ec728d424844c8f7b164c4510), [`10c2735`](https://github.com/mastra-ai/mastra/commit/10c27355edfdad1ee2b826b897df74125eb81fb8), [`1924cf0`](https://github.com/mastra-ai/mastra/commit/1924cf06816e5e4d4d5333065ec0f4bb02a97799), [`b339816`](https://github.com/mastra-ai/mastra/commit/b339816df0984d0243d944ac2655d6ba5f809cde)]:
|
|
30
|
+
- @mastra/core@1.0.0-beta.15
|
|
31
|
+
|
|
3
32
|
## 1.0.0-beta.2
|
|
4
33
|
|
|
5
34
|
### Minor Changes
|
|
@@ -173,5 +173,5 @@ var extractAgentResponseMessages = (runOutput) => {
|
|
|
173
173
|
};
|
|
174
174
|
|
|
175
175
|
export { createAgentTestRun, createTestMessage, createTestRun, createToolInvocation, extractAgentResponseMessages, extractInputMessages, extractToolCalls, getAssistantMessageFromRunOutput, getCombinedSystemPrompt, getReasoningFromRunOutput, getSystemMessagesFromRunInput, getTextContentFromMastraDBMessage, getUserMessageFromRunInput, isCloserTo, roundToTwoDecimals };
|
|
176
|
-
//# sourceMappingURL=chunk-
|
|
177
|
-
//# sourceMappingURL=chunk-
|
|
176
|
+
//# sourceMappingURL=chunk-6EA6D7JG.js.map
|
|
177
|
+
//# sourceMappingURL=chunk-6EA6D7JG.js.map
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"sources":["../src/scorers/utils.ts"],"names":[],"mappings":";;;AAyBO,SAAS,kCAAkC,OAAA,EAAkC;AAClF,EAAA,IAAI,OAAO,QAAQ,OAAA,CAAQ,OAAA,KAAY,YAAY,OAAA,CAAQ,OAAA,CAAQ,YAAY,EAAA,EAAI;AACjF,IAAA,OAAO,QAAQ,OAAA,CAAQ,OAAA;AAAA,EACzB;AACA,EAAA,IAAI,OAAA,CAAQ,QAAQ,KAAA,IAAS,KAAA,CAAM,QAAQ,OAAA,CAAQ,OAAA,CAAQ,KAAK,CAAA,EAAG;AAEjE,IAAA,MAAM,SAAA,GAAY,QAAQ,OAAA,CAAQ,KAAA,CAAM,OAAO,CAAA,CAAA,KAAK,CAAA,CAAE,SAAS,MAAM,CAAA;AACrE,IAAA,OAAO,SAAA,CAAU,SAAS,CAAA,GAAI,SAAA,CAAU,UAAU,MAAA,GAAS,CAAC,CAAA,EAAG,IAAA,IAAQ,EAAA,GAAK,EAAA;AAAA,EAC9E;AACA,EAAA,OAAO,EAAA;AACT;AAgBO,IAAM,kBAAA,GAAqB,CAAC,GAAA,KAAgB;AACjD,EAAA,OAAO,KAAK,KAAA,CAAA,CAAO,GAAA,GAAM,MAAA,CAAO,OAAA,IAAW,GAAG,CAAA,GAAI,GAAA;AACpD;AAgBO,SAAS,UAAA,CAAW,KAAA,EAAe,OAAA,EAAiB,OAAA,EAA0B;AACnF,EAAA,OAAO,IAAA,CAAK,IAAI,KAAA,GAAQ,OAAO,IAAI,IAAA,CAAK,GAAA,CAAI,QAAQ,OAAO,CAAA;AAC7D;AA6CO,IAAM,aAAA,GAAgB,CAC3B,KAAA,EACA,MAAA,EACA,mBACA,cAAA,KACiB;AACjB,EAAA,OAAO;AAAA,IACL,OAAO,CAAC,EAAE,MAAM,MAAA,EAAQ,OAAA,EAAS,OAAO,CAAA;AAAA,IACxC,MAAA,EAAQ,EAAE,IAAA,EAAM,WAAA,EAAa,MAAM,MAAA,EAAO;AAAA,IAC1C,iBAAA,EAAmB,qBAAqB,EAAC;AAAA,IACzC,cAAA,EAAgB,kBAAkB;AAAC,GACrC;AACF;AAmBO,IAAM,0BAAA,GAA6B,CAAC,KAAA,KAAuD;AAChG,EAAA,MAAM,OAAA,GAAU,OAAO,aAAA,CAAc,IAAA,CAAK,CAAC,EAAE,IAAA,EAAK,KAAM,IAAA,KAAS,MAAM,CAAA;AACvE,EAAA,OAAO,OAAA,GAAU,iCAAA,CAAkC,OAAO,CAAA,GAAI,MAAA;AAChE;AAoBO,IAAM,6BAAA,GAAgC,CAAC,KAAA,KAA6C;AACzF,EAAA,MAAM,iBAA2B,EAAC;AAGlC,EAAA,IAAI,OAAO,cAAA,EAAgB;AACzB,IAAA,cAAA,CAAe,IAAA;AAAA,MACb,GAAG,KAAA,CAAM,cAAA,CACN,GAAA,CAAI,CAAA,GAAA,KAAO;AAEV,QAAA,IAAI,OAAO,GAAA,CAAI,OAAA,KAAY,QAAA,EAAU;AACnC,UAAA,OAAO,GAAA,CAAI,OAAA;AAAA,QACb,CAAA,MAAA,IAAW,KAAA,CAAM,OAAA,CAAQ,GAAA,CAAI,OAAO,CAAA,EAAG;AAErC,UAAA,OAAO,IAAI,OAAA,CACR,MAAA,CAAO,CAAC,IAAA,KAAc,KAAK,IAAA,KAAS,MAAM,CAAA,CAC1C,GAAA,CAAI,CAAC,IAAA,KAAc,IAAA,CAAK,QAAQ,EAAE,CAAA,CAClC,KAAK,GAAG,CAAA;AAAA,QACb;AACA,QAAA,OAAO,EAAA;AAAA,MACT,CAAC,CAAA,CACA,MAAA,CAAO,CAAA,OAAA,KAAW,OAAO;AAAA,KAC9B;AAAA,EACF;AAGA,EAAA,IAAI,OAAO,oBAAA,EAAsB;AAC/B,IAAA,MAAA,CAAO,MAAA,CAAO,KAAA,CAAM,oBAAoB,CAAA,CAAE,QAAQ,CAAA,QAAA,KAAY;AAC5D,MAAA,QAAA,CAAS,QAAQ,CAAA,GAAA,KAAO;AACtB,QAAA,IAAI,OAAO,GAAA,CAAI,OAAA,KAAY,QAAA,EAAU;AACnC,UAAA,cAAA,CAAe,IAAA,CAAK,IAAI,OAAO,CAAA;AAAA,QACjC;AAAA,MACF,CAAC,CAAA;AAAA,IACH,CAAC,CAAA;AAAA,EACH;AAEA,EAAA,OAAO,cAAA;AACT;AAmBO,IAAM,uBAAA,GAA0B,CAAC,KAAA,KAA2C;AACjF,EAAA,MAAM,cAAA,GAAiB,8BAA8B,KAAK,CAAA;AAC1D,EAAA,OAAO,cAAA,CAAe,KAAK,MAAM,CAAA;AACnC;AAmBO,IAAM,gCAAA,GAAmC,CAAC,MAAA,KAAqC;AACpF,EAAA,MAAM,OAAA,GAAU,QAAQ,IAAA,CAAK,CAAC,EAAE,IAAA,EAAK,KAAM,SAAS,WAAW,CAAA;AAC/D,EAAA,OAAO,OAAA,GAAU,iCAAA,CAAkC,OAAO,CAAA,GAAI,MAAA;AAChE;AAiCO,IAAM,yBAAA,GAA4B,CAAC,MAAA,KAAyD;AACjG,EAAA,IAAI,CAAC,QAAQ,OAAO,MAAA;AAEpB,EAAA,MAAM,OAAA,GAAU,OAAO,IAAA,CAAK,CAAC,EAAE,IAAA,EAAK,KAAM,SAAS,WAAW,CAAA;AAC9D,EAAA,IAAI,CAAC,SAAS,OAAO,MAAA;AAGrB,EAAA,IAAI,OAAA,CAAQ,QAAQ,SAAA,EAAW;AAC7B,IAAA,OAAO,QAAQ,OAAA,CAAQ,SAAA;AAAA,EACzB;AAIA,EAAA,MAAM,cAAA,GAAiB,QAAQ,OAAA,CAAQ,KAAA,EAAO,OAAO,CAAC,CAAA,KAAW,CAAA,CAAE,IAAA,KAAS,WAAW,CAAA;AACvF,EAAA,IAAI,cAAA,IAAkB,cAAA,CAAe,MAAA,GAAS,CAAA,EAAG;AAC/C,IAAA,MAAM,cAAA,GAAiB,cAAA,CACpB,GAAA,CAAI,CAAC,CAAA,KAAW;AAEf,MAAA,IAAI,EAAE,OAAA,IAAW,KAAA,CAAM,OAAA,CAAQ,CAAA,CAAE,OAAO,CAAA,EAAG;AACzC,QAAA,OAAO,EAAE,OAAA,CACN,MAAA,CAAO,CAAC,CAAA,KAAW,EAAE,IAAA,KAAS,MAAM,CAAA,CACpC,GAAA,CAAI,CAAC,CAAA,KAAW,CAAA,CAAE,IAAI,CAAA,CACtB,KAAK,EAAE,CAAA;AAAA,MACZ;AACA,MAAA,OAAO,EAAE,SAAA,IAAa,EAAA;AAAA,IACxB,CAAC,CAAA,CACA,MAAA,CAAO,OAAO,CAAA;AAEjB,IAAA,OAAO,eAAe,MAAA,GAAS,CAAA,GAAI,cAAA,CAAe,IAAA,CAAK,IAAI,CAAA,GAAI,MAAA;AAAA,EACjE;AAEA,EAAA,OAAO,MAAA;AACT;AAuBO,IAAM,uBAAuB,CAAC;AAAA,EACnC,UAAA;AAAA,EACA,QAAA;AAAA,EACA,IAAA;AAAA,EACA,MAAA;AAAA,EACA,KAAA,GAAQ;AACV,CAAA,KAMuH;AACrH,EAAA,OAAO;AAAA,IACL,UAAA;AAAA,IACA,QAAA;AAAA,IACA,IAAA;AAAA,IACA,MAAA;AAAA,IACA;AAAA,GACF;AACF;AAmCO,SAAS,iBAAA,CAAkB;AAAA,EAChC,OAAA;AAAA,EACA,IAAA;AAAA,EACA,EAAA,GAAK,cAAA;AAAA,EACL,kBAAkB;AACpB,CAAA,EAWoB;AAClB,EAAA,OAAO;AAAA,IACL,EAAA;AAAA,IACA,IAAA;AAAA,IACA,OAAA,EAAS;AAAA,MACP,MAAA,EAAQ,CAAA;AAAA,MACR,OAAO,CAAC,EAAE,MAAM,MAAA,EAAQ,IAAA,EAAM,SAAS,CAAA;AAAA,MACvC,OAAA;AAAA,MACA,GAAI,eAAA,CAAgB,MAAA,GAAS,CAAA,IAAK;AAAA,QAChC,eAAA,EAAiB,eAAA,CAAgB,GAAA,CAAI,CAAA,EAAA,MAAO;AAAA,UAC1C,YAAY,EAAA,CAAG,UAAA;AAAA,UACf,UAAU,EAAA,CAAG,QAAA;AAAA,UACb,MAAM,EAAA,CAAG,IAAA;AAAA,UACT,QAAQ,EAAA,CAAG,MAAA;AAAA,UACX,OAAO,EAAA,CAAG;AAAA,SACZ,CAAE;AAAA;AACJ,KACF;AAAA,IACA,SAAA,sBAAe,IAAA;AAAK,GACtB;AACF;AA+BO,IAAM,qBAAqB,CAAC;AAAA,EACjC,gBAAgB,EAAC;AAAA,EACjB,MAAA;AAAA,EACA,qBAAqB,EAAC;AAAA,EACtB,iBAAiB,EAAC;AAAA,EAClB,uBAAuB,EAAC;AAAA,EACxB,cAAA,GAAiB,IAAI,cAAA,EAAe;AAAA,EACpC,KAAA,GAAQ,OAAO,UAAA;AACjB,CAAA,KAaK;AACH,EAAA,OAAO;AAAA,IACL,KAAA,EAAO;AAAA,MACL,aAAA;AAAA,MACA,kBAAA;AAAA,MACA,cAAA;AAAA,MACA;AAAA,KACF;AAAA,IACA,MAAA;AAAA,IACA,cAAA;AAAA,IACA;AAAA,GACF;AACF;AAqCO,SAAS,iBAAiB,MAAA,EAAqF;AACpH,EAAA,MAAM,YAAsB,EAAC;AAC7B,EAAA,MAAM,gBAAgC,EAAC;AAEvC,EAAA,KAAA,IAAS,YAAA,GAAe,CAAA,EAAG,YAAA,GAAe,MAAA,CAAO,QAAQ,YAAA,EAAA,EAAgB;AACvE,IAAA,MAAM,OAAA,GAAU,OAAO,YAAY,CAAA;AAEnC,IAAA,IAAI,OAAA,EAAS,SAAS,eAAA,EAAiB;AACrC,MAAA,KAAA,IAAS,kBAAkB,CAAA,EAAG,eAAA,GAAkB,QAAQ,OAAA,CAAQ,eAAA,CAAgB,QAAQ,eAAA,EAAA,EAAmB;AACzG,QAAA,MAAM,UAAA,GAAa,OAAA,CAAQ,OAAA,CAAQ,eAAA,CAAgB,eAAe,CAAA;AAClE,QAAA,IAAI,UAAA,IAAc,WAAW,QAAA,KAAa,UAAA,CAAW,UAAU,QAAA,IAAY,UAAA,CAAW,UAAU,MAAA,CAAA,EAAS;AACvG,UAAA,SAAA,CAAU,IAAA,CAAK,WAAW,QAAQ,CAAA;AAClC,UAAA,aAAA,CAAc,IAAA,CAAK;AAAA,YACjB,UAAU,UAAA,CAAW,QAAA;AAAA,YACrB,YAAY,UAAA,CAAW,UAAA,IAAc,CAAA,EAAG,YAAY,IAAI,eAAe,CAAA,CAAA;AAAA,YACvE,YAAA;AAAA,YACA;AAAA,WACD,CAAA;AAAA,QACH;AAAA,MACF;AAAA,IACF;AAAA,EACF;AAEA,EAAA,OAAO,EAAE,KAAA,EAAO,SAAA,EAAW,aAAA,EAAc;AAC3C;AAiBO,IAAM,oBAAA,GAAuB,CAAC,QAAA,KAA2D;AAC9F,EAAA,OAAO,QAAA,EAAU,eAAe,GAAA,CAAI,CAAA,GAAA,KAAO,kCAAkC,GAAG,CAAC,KAAK,EAAC;AACzF;AAmBO,IAAM,4BAAA,GAA+B,CAAC,SAAA,KAAiD;AAC5F,EAAA,OAAO,SAAA,CAAU,MAAA,CAAO,CAAA,GAAA,KAAO,GAAA,CAAI,IAAA,KAAS,WAAW,CAAA,CAAE,GAAA,CAAI,CAAA,GAAA,KAAO,iCAAA,CAAkC,GAAG,CAAC,CAAA;AAC5G","file":"chunk-6EA6D7JG.js","sourcesContent":["import type { MastraDBMessage } from '@mastra/core/agent';\nimport type { ScorerRunInputForAgent, ScorerRunOutputForAgent, ScoringInput } from '@mastra/core/evals';\nimport { RequestContext } from '@mastra/core/request-context';\n\n/**\n * Extracts text content from a MastraDBMessage.\n *\n * This function matches the logic used in `MessageList.mastraDBMessageToAIV4UIMessage`.\n * It first checks for a string `content.content` field, then falls back to extracting\n * text from the `parts` array (returning only the last text part, like AI SDK does).\n *\n * @param message - The MastraDBMessage to extract text from\n * @returns The extracted text content, or an empty string if no text is found\n *\n * @example\n * ```ts\n * const message: MastraDBMessage = {\n * id: 'msg-1',\n * role: 'assistant',\n * content: { format: 2, parts: [{ type: 'text', text: 'Hello!' }] },\n * createdAt: new Date(),\n * };\n * const text = getTextContentFromMastraDBMessage(message); // 'Hello!'\n * ```\n */\nexport function getTextContentFromMastraDBMessage(message: MastraDBMessage): string {\n if (typeof message.content.content === 'string' && message.content.content !== '') {\n return message.content.content;\n }\n if (message.content.parts && Array.isArray(message.content.parts)) {\n // Return only the last text part like AI SDK does\n const textParts = message.content.parts.filter(p => p.type === 'text');\n return textParts.length > 0 ? textParts[textParts.length - 1]?.text || '' : '';\n }\n return '';\n}\n\n/**\n * Rounds a number to two decimal places.\n *\n * Uses `Number.EPSILON` to handle floating-point precision issues.\n *\n * @param num - The number to round\n * @returns The number rounded to two decimal places\n *\n * @example\n * ```ts\n * roundToTwoDecimals(0.1 + 0.2); // 0.3\n * roundToTwoDecimals(1.005); // 1.01\n * ```\n */\nexport const roundToTwoDecimals = (num: number) => {\n return Math.round((num + Number.EPSILON) * 100) / 100;\n};\n\n/**\n * Determines if a value is closer to the first target than the second.\n *\n * @param value - The value to compare\n * @param target1 - The first target value\n * @param target2 - The second target value\n * @returns `true` if `value` is closer to `target1` than `target2`\n *\n * @example\n * ```ts\n * isCloserTo(0.6, 1, 0); // true (0.6 is closer to 1)\n * isCloserTo(0.3, 1, 0); // false (0.3 is closer to 0)\n * ```\n */\nexport function isCloserTo(value: number, target1: number, target2: number): boolean {\n return Math.abs(value - target1) < Math.abs(value - target2);\n}\n\n/**\n * Represents a test case for scorer evaluation.\n */\nexport type TestCase = {\n /** The input text to evaluate */\n input: string;\n /** The output text to evaluate */\n output: string;\n /** The expected result of the evaluation */\n expectedResult: {\n /** The expected score */\n score: number;\n /** The optional expected reason */\n reason?: string;\n };\n};\n\n/**\n * Represents a test case with additional context for scorer evaluation.\n */\nexport type TestCaseWithContext = TestCase & {\n /** Additional context strings for the evaluation */\n context: string[];\n};\n\n/**\n * Creates a scoring input object for testing purposes.\n *\n * @param input - The user input text\n * @param output - The assistant output text\n * @param additionalContext - Optional additional context data\n * @param requestContext - Optional request context data\n * @returns A ScoringInput object ready for use in scorer tests\n *\n * @example\n * ```ts\n * const run = createTestRun(\n * 'What is 2+2?',\n * 'The answer is 4.',\n * { topic: 'math' }\n * );\n * ```\n */\nexport const createTestRun = (\n input: string,\n output: string,\n additionalContext?: Record<string, any>,\n requestContext?: Record<string, any>,\n): ScoringInput => {\n return {\n input: [{ role: 'user', content: input }],\n output: { role: 'assistant', text: output },\n additionalContext: additionalContext ?? {},\n requestContext: requestContext ?? {},\n };\n};\n\n/**\n * Extracts the user message text from a scorer run input.\n *\n * Finds the first message with role 'user' and extracts its text content.\n *\n * @param input - The scorer run input containing input messages\n * @returns The user message text, or `undefined` if no user message is found\n *\n * @example\n * ```ts\n * const scorer = createScorer({ ... })\n * .preprocess(({ run }) => {\n * const userText = getUserMessageFromRunInput(run.input);\n * return { userText };\n * });\n * ```\n */\nexport const getUserMessageFromRunInput = (input?: ScorerRunInputForAgent): string | undefined => {\n const message = input?.inputMessages.find(({ role }) => role === 'user');\n return message ? getTextContentFromMastraDBMessage(message) : undefined;\n};\n\n/**\n * Extracts all system messages from a scorer run input.\n *\n * Collects text from both standard system messages and tagged system messages\n * (specialized system prompts like memory instructions).\n *\n * @param input - The scorer run input containing system messages\n * @returns An array of system message strings\n *\n * @example\n * ```ts\n * const scorer = createScorer({ ... })\n * .preprocess(({ run }) => {\n * const systemMessages = getSystemMessagesFromRunInput(run.input);\n * return { systemPrompt: systemMessages.join('\\n') };\n * });\n * ```\n */\nexport const getSystemMessagesFromRunInput = (input?: ScorerRunInputForAgent): string[] => {\n const systemMessages: string[] = [];\n\n // Add standard system messages\n if (input?.systemMessages) {\n systemMessages.push(\n ...input.systemMessages\n .map(msg => {\n // Handle different content types - extract text if it's an array of parts\n if (typeof msg.content === 'string') {\n return msg.content;\n } else if (Array.isArray(msg.content)) {\n // Extract text from parts array\n return msg.content\n .filter((part: any) => part.type === 'text')\n .map((part: any) => part.text || '')\n .join(' ');\n }\n return '';\n })\n .filter(content => content),\n );\n }\n\n // Add tagged system messages (these are specialized system prompts)\n if (input?.taggedSystemMessages) {\n Object.values(input.taggedSystemMessages).forEach(messages => {\n messages.forEach(msg => {\n if (typeof msg.content === 'string') {\n systemMessages.push(msg.content);\n }\n });\n });\n }\n\n return systemMessages;\n};\n\n/**\n * Combines all system messages into a single prompt string.\n *\n * Joins all system messages (standard and tagged) with double newlines.\n *\n * @param input - The scorer run input containing system messages\n * @returns A combined system prompt string\n *\n * @example\n * ```ts\n * const scorer = createScorer({ ... })\n * .preprocess(({ run }) => {\n * const systemPrompt = getCombinedSystemPrompt(run.input);\n * return { systemPrompt };\n * });\n * ```\n */\nexport const getCombinedSystemPrompt = (input?: ScorerRunInputForAgent): string => {\n const systemMessages = getSystemMessagesFromRunInput(input);\n return systemMessages.join('\\n\\n');\n};\n\n/**\n * Extracts the assistant message text from a scorer run output.\n *\n * Finds the first message with role 'assistant' and extracts its text content.\n *\n * @param output - The scorer run output (array of MastraDBMessage)\n * @returns The assistant message text, or `undefined` if no assistant message is found\n *\n * @example\n * ```ts\n * const scorer = createScorer({ ... })\n * .preprocess(({ run }) => {\n * const response = getAssistantMessageFromRunOutput(run.output);\n * return { response };\n * });\n * ```\n */\nexport const getAssistantMessageFromRunOutput = (output?: ScorerRunOutputForAgent) => {\n const message = output?.find(({ role }) => role === 'assistant');\n return message ? getTextContentFromMastraDBMessage(message) : undefined;\n};\n\n/**\n * Extracts reasoning text from a scorer run output.\n *\n * This function extracts reasoning content from assistant messages, which is\n * produced by reasoning models like `deepseek-reasoner`. The reasoning can be\n * stored in two places:\n * 1. `content.reasoning` - a string field on the message content\n * 2. `content.parts` - as parts with `type: 'reasoning'` containing `details`\n *\n * @param output - The scorer run output (array of MastraDBMessage)\n * @returns The reasoning text, or `undefined` if no reasoning is present\n *\n * @example\n * ```ts\n * const reasoningScorer = createScorer({\n * id: 'reasoning-scorer',\n * name: 'Reasoning Quality',\n * description: 'Evaluates the quality of model reasoning',\n * type: 'agent',\n * })\n * .preprocess(({ run }) => {\n * const reasoning = getReasoningFromRunOutput(run.output);\n * const response = getAssistantMessageFromRunOutput(run.output);\n * return { reasoning, response };\n * })\n * .generateScore(({ results }) => {\n * // Score based on reasoning quality\n * return results.preprocessStepResult?.reasoning ? 1 : 0;\n * });\n * ```\n */\nexport const getReasoningFromRunOutput = (output?: ScorerRunOutputForAgent): string | undefined => {\n if (!output) return undefined;\n\n const message = output.find(({ role }) => role === 'assistant');\n if (!message) return undefined;\n\n // Check for reasoning in content.reasoning (string format)\n if (message.content.reasoning) {\n return message.content.reasoning;\n }\n\n // Check for reasoning in parts with type 'reasoning'\n // Reasoning models store reasoning in parts as { type: 'reasoning', details: [{ type: 'text', text: '...' }] }\n const reasoningParts = message.content.parts?.filter((p: any) => p.type === 'reasoning');\n if (reasoningParts && reasoningParts.length > 0) {\n const reasoningTexts = reasoningParts\n .map((p: any) => {\n // The reasoning text can be in p.reasoning or in p.details[].text\n if (p.details && Array.isArray(p.details)) {\n return p.details\n .filter((d: any) => d.type === 'text')\n .map((d: any) => d.text)\n .join('');\n }\n return p.reasoning || '';\n })\n .filter(Boolean);\n\n return reasoningTexts.length > 0 ? reasoningTexts.join('\\n') : undefined;\n }\n\n return undefined;\n};\n\n/**\n * Creates a tool invocation object for testing purposes.\n *\n * @param options - The tool invocation configuration\n * @param options.toolCallId - Unique identifier for the tool call\n * @param options.toolName - Name of the tool being called\n * @param options.args - Arguments passed to the tool\n * @param options.result - Result returned by the tool\n * @param options.state - State of the invocation (default: 'result')\n * @returns A tool invocation object\n *\n * @example\n * ```ts\n * const invocation = createToolInvocation({\n * toolCallId: 'call-123',\n * toolName: 'weatherTool',\n * args: { location: 'London' },\n * result: { temperature: 20, condition: 'sunny' },\n * });\n * ```\n */\nexport const createToolInvocation = ({\n toolCallId,\n toolName,\n args,\n result,\n state = 'result',\n}: {\n toolCallId: string;\n toolName: string;\n args: Record<string, any>;\n result: Record<string, any>;\n state?: 'call' | 'partial-call' | 'result';\n}): { toolCallId: string; toolName: string; args: Record<string, any>; result: Record<string, any>; state: string } => {\n return {\n toolCallId,\n toolName,\n args,\n result,\n state,\n };\n};\n\n/**\n * Creates a MastraDBMessage object for testing purposes.\n *\n * Supports optional tool invocations for testing tool call scenarios.\n *\n * @param options - The message configuration\n * @param options.content - The text content of the message\n * @param options.role - The role of the message sender ('user', 'assistant', or 'system')\n * @param options.id - Optional message ID (default: 'test-message')\n * @param options.toolInvocations - Optional array of tool invocations\n * @returns A MastraDBMessage object\n *\n * @example\n * ```ts\n * const message = createTestMessage({\n * content: 'Hello, how can I help?',\n * role: 'assistant',\n * });\n *\n * // With tool invocations\n * const messageWithTools = createTestMessage({\n * content: 'Let me check the weather.',\n * role: 'assistant',\n * toolInvocations: [{\n * toolCallId: 'call-1',\n * toolName: 'weatherTool',\n * args: { location: 'Paris' },\n * result: { temp: 22 },\n * state: 'result',\n * }],\n * });\n * ```\n */\nexport function createTestMessage({\n content,\n role,\n id = 'test-message',\n toolInvocations = [],\n}: {\n content: string;\n role: 'user' | 'assistant' | 'system';\n id?: string;\n toolInvocations?: Array<{\n toolCallId: string;\n toolName: string;\n args: Record<string, any>;\n result: Record<string, any>;\n state: any;\n }>;\n}): MastraDBMessage {\n return {\n id,\n role,\n content: {\n format: 2,\n parts: [{ type: 'text', text: content }],\n content,\n ...(toolInvocations.length > 0 && {\n toolInvocations: toolInvocations.map(ti => ({\n toolCallId: ti.toolCallId,\n toolName: ti.toolName,\n args: ti.args,\n result: ti.result,\n state: ti.state,\n })),\n }),\n },\n createdAt: new Date(),\n };\n}\n\n/**\n * Creates a complete agent test run object for testing scorers.\n *\n * Provides a convenient way to construct the full run object that scorers receive,\n * including input messages, output, system messages, and request context.\n *\n * @param options - The test run configuration\n * @param options.inputMessages - Array of input messages (default: [])\n * @param options.output - The output messages (required)\n * @param options.rememberedMessages - Array of remembered messages from memory (default: [])\n * @param options.systemMessages - Array of system messages (default: [])\n * @param options.taggedSystemMessages - Tagged system messages map (default: {})\n * @param options.requestContext - Request context (default: new RequestContext())\n * @param options.runId - Unique run ID (default: random UUID)\n * @returns A complete test run object\n *\n * @example\n * ```ts\n * const testRun = createAgentTestRun({\n * inputMessages: [createTestMessage({ content: 'Hello', role: 'user' })],\n * output: [createTestMessage({ content: 'Hi there!', role: 'assistant' })],\n * });\n *\n * const result = await scorer.run({\n * input: testRun.input,\n * output: testRun.output,\n * });\n * ```\n */\nexport const createAgentTestRun = ({\n inputMessages = [],\n output,\n rememberedMessages = [],\n systemMessages = [],\n taggedSystemMessages = {},\n requestContext = new RequestContext(),\n runId = crypto.randomUUID(),\n}: {\n inputMessages?: ScorerRunInputForAgent['inputMessages'];\n output: ScorerRunOutputForAgent;\n rememberedMessages?: ScorerRunInputForAgent['rememberedMessages'];\n systemMessages?: ScorerRunInputForAgent['systemMessages'];\n taggedSystemMessages?: ScorerRunInputForAgent['taggedSystemMessages'];\n requestContext?: RequestContext;\n runId?: string;\n}): {\n input: ScorerRunInputForAgent;\n output: ScorerRunOutputForAgent;\n requestContext: RequestContext;\n runId: string;\n} => {\n return {\n input: {\n inputMessages,\n rememberedMessages,\n systemMessages,\n taggedSystemMessages,\n },\n output,\n requestContext,\n runId,\n };\n};\n\n/**\n * Information about a tool call extracted from scorer output.\n */\nexport type ToolCallInfo = {\n /** Name of the tool that was called */\n toolName: string;\n /** Unique identifier for the tool call */\n toolCallId: string;\n /** Index of the message containing this tool call */\n messageIndex: number;\n /** Index of the invocation within the message's tool invocations */\n invocationIndex: number;\n};\n\n/**\n * Extracts all tool calls from a scorer run output.\n *\n * Iterates through all messages and their tool invocations to collect\n * information about tools that were called (with state 'result' or 'call').\n *\n * @param output - The scorer run output (array of MastraDBMessage)\n * @returns An object containing tool names and detailed tool call info\n *\n * @example\n * ```ts\n * const scorer = createScorer({ ... })\n * .preprocess(({ run }) => {\n * const { tools, toolCallInfos } = extractToolCalls(run.output);\n * return {\n * toolsUsed: tools,\n * toolCount: tools.length,\n * };\n * });\n * ```\n */\nexport function extractToolCalls(output: ScorerRunOutputForAgent): { tools: string[]; toolCallInfos: ToolCallInfo[] } {\n const toolCalls: string[] = [];\n const toolCallInfos: ToolCallInfo[] = [];\n\n for (let messageIndex = 0; messageIndex < output.length; messageIndex++) {\n const message = output[messageIndex];\n // Tool invocations are now nested under content\n if (message?.content?.toolInvocations) {\n for (let invocationIndex = 0; invocationIndex < message.content.toolInvocations.length; invocationIndex++) {\n const invocation = message.content.toolInvocations[invocationIndex];\n if (invocation && invocation.toolName && (invocation.state === 'result' || invocation.state === 'call')) {\n toolCalls.push(invocation.toolName);\n toolCallInfos.push({\n toolName: invocation.toolName,\n toolCallId: invocation.toolCallId || `${messageIndex}-${invocationIndex}`,\n messageIndex,\n invocationIndex,\n });\n }\n }\n }\n }\n\n return { tools: toolCalls, toolCallInfos };\n}\n\n/**\n * Extracts text content from all input messages.\n *\n * @param runInput - The scorer run input\n * @returns An array of text strings from each input message\n *\n * @example\n * ```ts\n * const scorer = createScorer({ ... })\n * .preprocess(({ run }) => {\n * const messages = extractInputMessages(run.input);\n * return { allUserMessages: messages.join('\\n') };\n * });\n * ```\n */\nexport const extractInputMessages = (runInput: ScorerRunInputForAgent | undefined): string[] => {\n return runInput?.inputMessages?.map(msg => getTextContentFromMastraDBMessage(msg)) || [];\n};\n\n/**\n * Extracts text content from all assistant response messages.\n *\n * Filters for messages with role 'assistant' and extracts their text content.\n *\n * @param runOutput - The scorer run output (array of MastraDBMessage)\n * @returns An array of text strings from each assistant message\n *\n * @example\n * ```ts\n * const scorer = createScorer({ ... })\n * .preprocess(({ run }) => {\n * const responses = extractAgentResponseMessages(run.output);\n * return { allResponses: responses.join('\\n') };\n * });\n * ```\n */\nexport const extractAgentResponseMessages = (runOutput: ScorerRunOutputForAgent): string[] => {\n return runOutput.filter(msg => msg.role === 'assistant').map(msg => getTextContentFromMastraDBMessage(msg));\n};\n"]}
|
|
@@ -189,5 +189,5 @@ exports.getTextContentFromMastraDBMessage = getTextContentFromMastraDBMessage;
|
|
|
189
189
|
exports.getUserMessageFromRunInput = getUserMessageFromRunInput;
|
|
190
190
|
exports.isCloserTo = isCloserTo;
|
|
191
191
|
exports.roundToTwoDecimals = roundToTwoDecimals;
|
|
192
|
-
//# sourceMappingURL=chunk-
|
|
193
|
-
//# sourceMappingURL=chunk-
|
|
192
|
+
//# sourceMappingURL=chunk-DSXZHUHI.cjs.map
|
|
193
|
+
//# sourceMappingURL=chunk-DSXZHUHI.cjs.map
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"sources":["../src/scorers/utils.ts"],"names":["requestContext","RequestContext"],"mappings":";;;;;AAyBO,SAAS,kCAAkC,OAAA,EAAkC;AAClF,EAAA,IAAI,OAAO,QAAQ,OAAA,CAAQ,OAAA,KAAY,YAAY,OAAA,CAAQ,OAAA,CAAQ,YAAY,EAAA,EAAI;AACjF,IAAA,OAAO,QAAQ,OAAA,CAAQ,OAAA;AAAA,EACzB;AACA,EAAA,IAAI,OAAA,CAAQ,QAAQ,KAAA,IAAS,KAAA,CAAM,QAAQ,OAAA,CAAQ,OAAA,CAAQ,KAAK,CAAA,EAAG;AAEjE,IAAA,MAAM,SAAA,GAAY,QAAQ,OAAA,CAAQ,KAAA,CAAM,OAAO,CAAA,CAAA,KAAK,CAAA,CAAE,SAAS,MAAM,CAAA;AACrE,IAAA,OAAO,SAAA,CAAU,SAAS,CAAA,GAAI,SAAA,CAAU,UAAU,MAAA,GAAS,CAAC,CAAA,EAAG,IAAA,IAAQ,EAAA,GAAK,EAAA;AAAA,EAC9E;AACA,EAAA,OAAO,EAAA;AACT;AAgBO,IAAM,kBAAA,GAAqB,CAAC,GAAA,KAAgB;AACjD,EAAA,OAAO,KAAK,KAAA,CAAA,CAAO,GAAA,GAAM,MAAA,CAAO,OAAA,IAAW,GAAG,CAAA,GAAI,GAAA;AACpD;AAgBO,SAAS,UAAA,CAAW,KAAA,EAAe,OAAA,EAAiB,OAAA,EAA0B;AACnF,EAAA,OAAO,IAAA,CAAK,IAAI,KAAA,GAAQ,OAAO,IAAI,IAAA,CAAK,GAAA,CAAI,QAAQ,OAAO,CAAA;AAC7D;AA6CO,IAAM,aAAA,GAAgB,CAC3B,KAAA,EACA,MAAA,EACA,mBACA,cAAA,KACiB;AACjB,EAAA,OAAO;AAAA,IACL,OAAO,CAAC,EAAE,MAAM,MAAA,EAAQ,OAAA,EAAS,OAAO,CAAA;AAAA,IACxC,MAAA,EAAQ,EAAE,IAAA,EAAM,WAAA,EAAa,MAAM,MAAA,EAAO;AAAA,IAC1C,iBAAA,EAAmB,qBAAqB,EAAC;AAAA,IACzC,cAAA,EAAgB,kBAAkB;AAAC,GACrC;AACF;AAmBO,IAAM,0BAAA,GAA6B,CAAC,KAAA,KAAuD;AAChG,EAAA,MAAM,OAAA,GAAU,OAAO,aAAA,CAAc,IAAA,CAAK,CAAC,EAAE,IAAA,EAAK,KAAM,IAAA,KAAS,MAAM,CAAA;AACvE,EAAA,OAAO,OAAA,GAAU,iCAAA,CAAkC,OAAO,CAAA,GAAI,MAAA;AAChE;AAoBO,IAAM,6BAAA,GAAgC,CAAC,KAAA,KAA6C;AACzF,EAAA,MAAM,iBAA2B,EAAC;AAGlC,EAAA,IAAI,OAAO,cAAA,EAAgB;AACzB,IAAA,cAAA,CAAe,IAAA;AAAA,MACb,GAAG,KAAA,CAAM,cAAA,CACN,GAAA,CAAI,CAAA,GAAA,KAAO;AAEV,QAAA,IAAI,OAAO,GAAA,CAAI,OAAA,KAAY,QAAA,EAAU;AACnC,UAAA,OAAO,GAAA,CAAI,OAAA;AAAA,QACb,CAAA,MAAA,IAAW,KAAA,CAAM,OAAA,CAAQ,GAAA,CAAI,OAAO,CAAA,EAAG;AAErC,UAAA,OAAO,IAAI,OAAA,CACR,MAAA,CAAO,CAAC,IAAA,KAAc,KAAK,IAAA,KAAS,MAAM,CAAA,CAC1C,GAAA,CAAI,CAAC,IAAA,KAAc,IAAA,CAAK,QAAQ,EAAE,CAAA,CAClC,KAAK,GAAG,CAAA;AAAA,QACb;AACA,QAAA,OAAO,EAAA;AAAA,MACT,CAAC,CAAA,CACA,MAAA,CAAO,CAAA,OAAA,KAAW,OAAO;AAAA,KAC9B;AAAA,EACF;AAGA,EAAA,IAAI,OAAO,oBAAA,EAAsB;AAC/B,IAAA,MAAA,CAAO,MAAA,CAAO,KAAA,CAAM,oBAAoB,CAAA,CAAE,QAAQ,CAAA,QAAA,KAAY;AAC5D,MAAA,QAAA,CAAS,QAAQ,CAAA,GAAA,KAAO;AACtB,QAAA,IAAI,OAAO,GAAA,CAAI,OAAA,KAAY,QAAA,EAAU;AACnC,UAAA,cAAA,CAAe,IAAA,CAAK,IAAI,OAAO,CAAA;AAAA,QACjC;AAAA,MACF,CAAC,CAAA;AAAA,IACH,CAAC,CAAA;AAAA,EACH;AAEA,EAAA,OAAO,cAAA;AACT;AAmBO,IAAM,uBAAA,GAA0B,CAAC,KAAA,KAA2C;AACjF,EAAA,MAAM,cAAA,GAAiB,8BAA8B,KAAK,CAAA;AAC1D,EAAA,OAAO,cAAA,CAAe,KAAK,MAAM,CAAA;AACnC;AAmBO,IAAM,gCAAA,GAAmC,CAAC,MAAA,KAAqC;AACpF,EAAA,MAAM,OAAA,GAAU,QAAQ,IAAA,CAAK,CAAC,EAAE,IAAA,EAAK,KAAM,SAAS,WAAW,CAAA;AAC/D,EAAA,OAAO,OAAA,GAAU,iCAAA,CAAkC,OAAO,CAAA,GAAI,MAAA;AAChE;AAiCO,IAAM,yBAAA,GAA4B,CAAC,MAAA,KAAyD;AACjG,EAAA,IAAI,CAAC,QAAQ,OAAO,MAAA;AAEpB,EAAA,MAAM,OAAA,GAAU,OAAO,IAAA,CAAK,CAAC,EAAE,IAAA,EAAK,KAAM,SAAS,WAAW,CAAA;AAC9D,EAAA,IAAI,CAAC,SAAS,OAAO,MAAA;AAGrB,EAAA,IAAI,OAAA,CAAQ,QAAQ,SAAA,EAAW;AAC7B,IAAA,OAAO,QAAQ,OAAA,CAAQ,SAAA;AAAA,EACzB;AAIA,EAAA,MAAM,cAAA,GAAiB,QAAQ,OAAA,CAAQ,KAAA,EAAO,OAAO,CAAC,CAAA,KAAW,CAAA,CAAE,IAAA,KAAS,WAAW,CAAA;AACvF,EAAA,IAAI,cAAA,IAAkB,cAAA,CAAe,MAAA,GAAS,CAAA,EAAG;AAC/C,IAAA,MAAM,cAAA,GAAiB,cAAA,CACpB,GAAA,CAAI,CAAC,CAAA,KAAW;AAEf,MAAA,IAAI,EAAE,OAAA,IAAW,KAAA,CAAM,OAAA,CAAQ,CAAA,CAAE,OAAO,CAAA,EAAG;AACzC,QAAA,OAAO,EAAE,OAAA,CACN,MAAA,CAAO,CAAC,CAAA,KAAW,EAAE,IAAA,KAAS,MAAM,CAAA,CACpC,GAAA,CAAI,CAAC,CAAA,KAAW,CAAA,CAAE,IAAI,CAAA,CACtB,KAAK,EAAE,CAAA;AAAA,MACZ;AACA,MAAA,OAAO,EAAE,SAAA,IAAa,EAAA;AAAA,IACxB,CAAC,CAAA,CACA,MAAA,CAAO,OAAO,CAAA;AAEjB,IAAA,OAAO,eAAe,MAAA,GAAS,CAAA,GAAI,cAAA,CAAe,IAAA,CAAK,IAAI,CAAA,GAAI,MAAA;AAAA,EACjE;AAEA,EAAA,OAAO,MAAA;AACT;AAuBO,IAAM,uBAAuB,CAAC;AAAA,EACnC,UAAA;AAAA,EACA,QAAA;AAAA,EACA,IAAA;AAAA,EACA,MAAA;AAAA,EACA,KAAA,GAAQ;AACV,CAAA,KAMuH;AACrH,EAAA,OAAO;AAAA,IACL,UAAA;AAAA,IACA,QAAA;AAAA,IACA,IAAA;AAAA,IACA,MAAA;AAAA,IACA;AAAA,GACF;AACF;AAmCO,SAAS,iBAAA,CAAkB;AAAA,EAChC,OAAA;AAAA,EACA,IAAA;AAAA,EACA,EAAA,GAAK,cAAA;AAAA,EACL,kBAAkB;AACpB,CAAA,EAWoB;AAClB,EAAA,OAAO;AAAA,IACL,EAAA;AAAA,IACA,IAAA;AAAA,IACA,OAAA,EAAS;AAAA,MACP,MAAA,EAAQ,CAAA;AAAA,MACR,OAAO,CAAC,EAAE,MAAM,MAAA,EAAQ,IAAA,EAAM,SAAS,CAAA;AAAA,MACvC,OAAA;AAAA,MACA,GAAI,eAAA,CAAgB,MAAA,GAAS,CAAA,IAAK;AAAA,QAChC,eAAA,EAAiB,eAAA,CAAgB,GAAA,CAAI,CAAA,EAAA,MAAO;AAAA,UAC1C,YAAY,EAAA,CAAG,UAAA;AAAA,UACf,UAAU,EAAA,CAAG,QAAA;AAAA,UACb,MAAM,EAAA,CAAG,IAAA;AAAA,UACT,QAAQ,EAAA,CAAG,MAAA;AAAA,UACX,OAAO,EAAA,CAAG;AAAA,SACZ,CAAE;AAAA;AACJ,KACF;AAAA,IACA,SAAA,sBAAe,IAAA;AAAK,GACtB;AACF;AA+BO,IAAM,qBAAqB,CAAC;AAAA,EACjC,gBAAgB,EAAC;AAAA,EACjB,MAAA;AAAA,EACA,qBAAqB,EAAC;AAAA,EACtB,iBAAiB,EAAC;AAAA,EAClB,uBAAuB,EAAC;AAAA,kBACxBA,gBAAA,GAAiB,IAAIC,6BAAA,EAAe;AAAA,EACpC,KAAA,GAAQ,OAAO,UAAA;AACjB,CAAA,KAaK;AACH,EAAA,OAAO;AAAA,IACL,KAAA,EAAO;AAAA,MACL,aAAA;AAAA,MACA,kBAAA;AAAA,MACA,cAAA;AAAA,MACA;AAAA,KACF;AAAA,IACA,MAAA;AAAA,oBACAD,gBAAA;AAAA,IACA;AAAA,GACF;AACF;AAqCO,SAAS,iBAAiB,MAAA,EAAqF;AACpH,EAAA,MAAM,YAAsB,EAAC;AAC7B,EAAA,MAAM,gBAAgC,EAAC;AAEvC,EAAA,KAAA,IAAS,YAAA,GAAe,CAAA,EAAG,YAAA,GAAe,MAAA,CAAO,QAAQ,YAAA,EAAA,EAAgB;AACvE,IAAA,MAAM,OAAA,GAAU,OAAO,YAAY,CAAA;AAEnC,IAAA,IAAI,OAAA,EAAS,SAAS,eAAA,EAAiB;AACrC,MAAA,KAAA,IAAS,kBAAkB,CAAA,EAAG,eAAA,GAAkB,QAAQ,OAAA,CAAQ,eAAA,CAAgB,QAAQ,eAAA,EAAA,EAAmB;AACzG,QAAA,MAAM,UAAA,GAAa,OAAA,CAAQ,OAAA,CAAQ,eAAA,CAAgB,eAAe,CAAA;AAClE,QAAA,IAAI,UAAA,IAAc,WAAW,QAAA,KAAa,UAAA,CAAW,UAAU,QAAA,IAAY,UAAA,CAAW,UAAU,MAAA,CAAA,EAAS;AACvG,UAAA,SAAA,CAAU,IAAA,CAAK,WAAW,QAAQ,CAAA;AAClC,UAAA,aAAA,CAAc,IAAA,CAAK;AAAA,YACjB,UAAU,UAAA,CAAW,QAAA;AAAA,YACrB,YAAY,UAAA,CAAW,UAAA,IAAc,CAAA,EAAG,YAAY,IAAI,eAAe,CAAA,CAAA;AAAA,YACvE,YAAA;AAAA,YACA;AAAA,WACD,CAAA;AAAA,QACH;AAAA,MACF;AAAA,IACF;AAAA,EACF;AAEA,EAAA,OAAO,EAAE,KAAA,EAAO,SAAA,EAAW,aAAA,EAAc;AAC3C;AAiBO,IAAM,oBAAA,GAAuB,CAAC,QAAA,KAA2D;AAC9F,EAAA,OAAO,QAAA,EAAU,eAAe,GAAA,CAAI,CAAA,GAAA,KAAO,kCAAkC,GAAG,CAAC,KAAK,EAAC;AACzF;AAmBO,IAAM,4BAAA,GAA+B,CAAC,SAAA,KAAiD;AAC5F,EAAA,OAAO,SAAA,CAAU,MAAA,CAAO,CAAA,GAAA,KAAO,GAAA,CAAI,IAAA,KAAS,WAAW,CAAA,CAAE,GAAA,CAAI,CAAA,GAAA,KAAO,iCAAA,CAAkC,GAAG,CAAC,CAAA;AAC5G","file":"chunk-DSXZHUHI.cjs","sourcesContent":["import type { MastraDBMessage } from '@mastra/core/agent';\nimport type { ScorerRunInputForAgent, ScorerRunOutputForAgent, ScoringInput } from '@mastra/core/evals';\nimport { RequestContext } from '@mastra/core/request-context';\n\n/**\n * Extracts text content from a MastraDBMessage.\n *\n * This function matches the logic used in `MessageList.mastraDBMessageToAIV4UIMessage`.\n * It first checks for a string `content.content` field, then falls back to extracting\n * text from the `parts` array (returning only the last text part, like AI SDK does).\n *\n * @param message - The MastraDBMessage to extract text from\n * @returns The extracted text content, or an empty string if no text is found\n *\n * @example\n * ```ts\n * const message: MastraDBMessage = {\n * id: 'msg-1',\n * role: 'assistant',\n * content: { format: 2, parts: [{ type: 'text', text: 'Hello!' }] },\n * createdAt: new Date(),\n * };\n * const text = getTextContentFromMastraDBMessage(message); // 'Hello!'\n * ```\n */\nexport function getTextContentFromMastraDBMessage(message: MastraDBMessage): string {\n if (typeof message.content.content === 'string' && message.content.content !== '') {\n return message.content.content;\n }\n if (message.content.parts && Array.isArray(message.content.parts)) {\n // Return only the last text part like AI SDK does\n const textParts = message.content.parts.filter(p => p.type === 'text');\n return textParts.length > 0 ? textParts[textParts.length - 1]?.text || '' : '';\n }\n return '';\n}\n\n/**\n * Rounds a number to two decimal places.\n *\n * Uses `Number.EPSILON` to handle floating-point precision issues.\n *\n * @param num - The number to round\n * @returns The number rounded to two decimal places\n *\n * @example\n * ```ts\n * roundToTwoDecimals(0.1 + 0.2); // 0.3\n * roundToTwoDecimals(1.005); // 1.01\n * ```\n */\nexport const roundToTwoDecimals = (num: number) => {\n return Math.round((num + Number.EPSILON) * 100) / 100;\n};\n\n/**\n * Determines if a value is closer to the first target than the second.\n *\n * @param value - The value to compare\n * @param target1 - The first target value\n * @param target2 - The second target value\n * @returns `true` if `value` is closer to `target1` than `target2`\n *\n * @example\n * ```ts\n * isCloserTo(0.6, 1, 0); // true (0.6 is closer to 1)\n * isCloserTo(0.3, 1, 0); // false (0.3 is closer to 0)\n * ```\n */\nexport function isCloserTo(value: number, target1: number, target2: number): boolean {\n return Math.abs(value - target1) < Math.abs(value - target2);\n}\n\n/**\n * Represents a test case for scorer evaluation.\n */\nexport type TestCase = {\n /** The input text to evaluate */\n input: string;\n /** The output text to evaluate */\n output: string;\n /** The expected result of the evaluation */\n expectedResult: {\n /** The expected score */\n score: number;\n /** The optional expected reason */\n reason?: string;\n };\n};\n\n/**\n * Represents a test case with additional context for scorer evaluation.\n */\nexport type TestCaseWithContext = TestCase & {\n /** Additional context strings for the evaluation */\n context: string[];\n};\n\n/**\n * Creates a scoring input object for testing purposes.\n *\n * @param input - The user input text\n * @param output - The assistant output text\n * @param additionalContext - Optional additional context data\n * @param requestContext - Optional request context data\n * @returns A ScoringInput object ready for use in scorer tests\n *\n * @example\n * ```ts\n * const run = createTestRun(\n * 'What is 2+2?',\n * 'The answer is 4.',\n * { topic: 'math' }\n * );\n * ```\n */\nexport const createTestRun = (\n input: string,\n output: string,\n additionalContext?: Record<string, any>,\n requestContext?: Record<string, any>,\n): ScoringInput => {\n return {\n input: [{ role: 'user', content: input }],\n output: { role: 'assistant', text: output },\n additionalContext: additionalContext ?? {},\n requestContext: requestContext ?? {},\n };\n};\n\n/**\n * Extracts the user message text from a scorer run input.\n *\n * Finds the first message with role 'user' and extracts its text content.\n *\n * @param input - The scorer run input containing input messages\n * @returns The user message text, or `undefined` if no user message is found\n *\n * @example\n * ```ts\n * const scorer = createScorer({ ... })\n * .preprocess(({ run }) => {\n * const userText = getUserMessageFromRunInput(run.input);\n * return { userText };\n * });\n * ```\n */\nexport const getUserMessageFromRunInput = (input?: ScorerRunInputForAgent): string | undefined => {\n const message = input?.inputMessages.find(({ role }) => role === 'user');\n return message ? getTextContentFromMastraDBMessage(message) : undefined;\n};\n\n/**\n * Extracts all system messages from a scorer run input.\n *\n * Collects text from both standard system messages and tagged system messages\n * (specialized system prompts like memory instructions).\n *\n * @param input - The scorer run input containing system messages\n * @returns An array of system message strings\n *\n * @example\n * ```ts\n * const scorer = createScorer({ ... })\n * .preprocess(({ run }) => {\n * const systemMessages = getSystemMessagesFromRunInput(run.input);\n * return { systemPrompt: systemMessages.join('\\n') };\n * });\n * ```\n */\nexport const getSystemMessagesFromRunInput = (input?: ScorerRunInputForAgent): string[] => {\n const systemMessages: string[] = [];\n\n // Add standard system messages\n if (input?.systemMessages) {\n systemMessages.push(\n ...input.systemMessages\n .map(msg => {\n // Handle different content types - extract text if it's an array of parts\n if (typeof msg.content === 'string') {\n return msg.content;\n } else if (Array.isArray(msg.content)) {\n // Extract text from parts array\n return msg.content\n .filter((part: any) => part.type === 'text')\n .map((part: any) => part.text || '')\n .join(' ');\n }\n return '';\n })\n .filter(content => content),\n );\n }\n\n // Add tagged system messages (these are specialized system prompts)\n if (input?.taggedSystemMessages) {\n Object.values(input.taggedSystemMessages).forEach(messages => {\n messages.forEach(msg => {\n if (typeof msg.content === 'string') {\n systemMessages.push(msg.content);\n }\n });\n });\n }\n\n return systemMessages;\n};\n\n/**\n * Combines all system messages into a single prompt string.\n *\n * Joins all system messages (standard and tagged) with double newlines.\n *\n * @param input - The scorer run input containing system messages\n * @returns A combined system prompt string\n *\n * @example\n * ```ts\n * const scorer = createScorer({ ... })\n * .preprocess(({ run }) => {\n * const systemPrompt = getCombinedSystemPrompt(run.input);\n * return { systemPrompt };\n * });\n * ```\n */\nexport const getCombinedSystemPrompt = (input?: ScorerRunInputForAgent): string => {\n const systemMessages = getSystemMessagesFromRunInput(input);\n return systemMessages.join('\\n\\n');\n};\n\n/**\n * Extracts the assistant message text from a scorer run output.\n *\n * Finds the first message with role 'assistant' and extracts its text content.\n *\n * @param output - The scorer run output (array of MastraDBMessage)\n * @returns The assistant message text, or `undefined` if no assistant message is found\n *\n * @example\n * ```ts\n * const scorer = createScorer({ ... })\n * .preprocess(({ run }) => {\n * const response = getAssistantMessageFromRunOutput(run.output);\n * return { response };\n * });\n * ```\n */\nexport const getAssistantMessageFromRunOutput = (output?: ScorerRunOutputForAgent) => {\n const message = output?.find(({ role }) => role === 'assistant');\n return message ? getTextContentFromMastraDBMessage(message) : undefined;\n};\n\n/**\n * Extracts reasoning text from a scorer run output.\n *\n * This function extracts reasoning content from assistant messages, which is\n * produced by reasoning models like `deepseek-reasoner`. The reasoning can be\n * stored in two places:\n * 1. `content.reasoning` - a string field on the message content\n * 2. `content.parts` - as parts with `type: 'reasoning'` containing `details`\n *\n * @param output - The scorer run output (array of MastraDBMessage)\n * @returns The reasoning text, or `undefined` if no reasoning is present\n *\n * @example\n * ```ts\n * const reasoningScorer = createScorer({\n * id: 'reasoning-scorer',\n * name: 'Reasoning Quality',\n * description: 'Evaluates the quality of model reasoning',\n * type: 'agent',\n * })\n * .preprocess(({ run }) => {\n * const reasoning = getReasoningFromRunOutput(run.output);\n * const response = getAssistantMessageFromRunOutput(run.output);\n * return { reasoning, response };\n * })\n * .generateScore(({ results }) => {\n * // Score based on reasoning quality\n * return results.preprocessStepResult?.reasoning ? 1 : 0;\n * });\n * ```\n */\nexport const getReasoningFromRunOutput = (output?: ScorerRunOutputForAgent): string | undefined => {\n if (!output) return undefined;\n\n const message = output.find(({ role }) => role === 'assistant');\n if (!message) return undefined;\n\n // Check for reasoning in content.reasoning (string format)\n if (message.content.reasoning) {\n return message.content.reasoning;\n }\n\n // Check for reasoning in parts with type 'reasoning'\n // Reasoning models store reasoning in parts as { type: 'reasoning', details: [{ type: 'text', text: '...' }] }\n const reasoningParts = message.content.parts?.filter((p: any) => p.type === 'reasoning');\n if (reasoningParts && reasoningParts.length > 0) {\n const reasoningTexts = reasoningParts\n .map((p: any) => {\n // The reasoning text can be in p.reasoning or in p.details[].text\n if (p.details && Array.isArray(p.details)) {\n return p.details\n .filter((d: any) => d.type === 'text')\n .map((d: any) => d.text)\n .join('');\n }\n return p.reasoning || '';\n })\n .filter(Boolean);\n\n return reasoningTexts.length > 0 ? reasoningTexts.join('\\n') : undefined;\n }\n\n return undefined;\n};\n\n/**\n * Creates a tool invocation object for testing purposes.\n *\n * @param options - The tool invocation configuration\n * @param options.toolCallId - Unique identifier for the tool call\n * @param options.toolName - Name of the tool being called\n * @param options.args - Arguments passed to the tool\n * @param options.result - Result returned by the tool\n * @param options.state - State of the invocation (default: 'result')\n * @returns A tool invocation object\n *\n * @example\n * ```ts\n * const invocation = createToolInvocation({\n * toolCallId: 'call-123',\n * toolName: 'weatherTool',\n * args: { location: 'London' },\n * result: { temperature: 20, condition: 'sunny' },\n * });\n * ```\n */\nexport const createToolInvocation = ({\n toolCallId,\n toolName,\n args,\n result,\n state = 'result',\n}: {\n toolCallId: string;\n toolName: string;\n args: Record<string, any>;\n result: Record<string, any>;\n state?: 'call' | 'partial-call' | 'result';\n}): { toolCallId: string; toolName: string; args: Record<string, any>; result: Record<string, any>; state: string } => {\n return {\n toolCallId,\n toolName,\n args,\n result,\n state,\n };\n};\n\n/**\n * Creates a MastraDBMessage object for testing purposes.\n *\n * Supports optional tool invocations for testing tool call scenarios.\n *\n * @param options - The message configuration\n * @param options.content - The text content of the message\n * @param options.role - The role of the message sender ('user', 'assistant', or 'system')\n * @param options.id - Optional message ID (default: 'test-message')\n * @param options.toolInvocations - Optional array of tool invocations\n * @returns A MastraDBMessage object\n *\n * @example\n * ```ts\n * const message = createTestMessage({\n * content: 'Hello, how can I help?',\n * role: 'assistant',\n * });\n *\n * // With tool invocations\n * const messageWithTools = createTestMessage({\n * content: 'Let me check the weather.',\n * role: 'assistant',\n * toolInvocations: [{\n * toolCallId: 'call-1',\n * toolName: 'weatherTool',\n * args: { location: 'Paris' },\n * result: { temp: 22 },\n * state: 'result',\n * }],\n * });\n * ```\n */\nexport function createTestMessage({\n content,\n role,\n id = 'test-message',\n toolInvocations = [],\n}: {\n content: string;\n role: 'user' | 'assistant' | 'system';\n id?: string;\n toolInvocations?: Array<{\n toolCallId: string;\n toolName: string;\n args: Record<string, any>;\n result: Record<string, any>;\n state: any;\n }>;\n}): MastraDBMessage {\n return {\n id,\n role,\n content: {\n format: 2,\n parts: [{ type: 'text', text: content }],\n content,\n ...(toolInvocations.length > 0 && {\n toolInvocations: toolInvocations.map(ti => ({\n toolCallId: ti.toolCallId,\n toolName: ti.toolName,\n args: ti.args,\n result: ti.result,\n state: ti.state,\n })),\n }),\n },\n createdAt: new Date(),\n };\n}\n\n/**\n * Creates a complete agent test run object for testing scorers.\n *\n * Provides a convenient way to construct the full run object that scorers receive,\n * including input messages, output, system messages, and request context.\n *\n * @param options - The test run configuration\n * @param options.inputMessages - Array of input messages (default: [])\n * @param options.output - The output messages (required)\n * @param options.rememberedMessages - Array of remembered messages from memory (default: [])\n * @param options.systemMessages - Array of system messages (default: [])\n * @param options.taggedSystemMessages - Tagged system messages map (default: {})\n * @param options.requestContext - Request context (default: new RequestContext())\n * @param options.runId - Unique run ID (default: random UUID)\n * @returns A complete test run object\n *\n * @example\n * ```ts\n * const testRun = createAgentTestRun({\n * inputMessages: [createTestMessage({ content: 'Hello', role: 'user' })],\n * output: [createTestMessage({ content: 'Hi there!', role: 'assistant' })],\n * });\n *\n * const result = await scorer.run({\n * input: testRun.input,\n * output: testRun.output,\n * });\n * ```\n */\nexport const createAgentTestRun = ({\n inputMessages = [],\n output,\n rememberedMessages = [],\n systemMessages = [],\n taggedSystemMessages = {},\n requestContext = new RequestContext(),\n runId = crypto.randomUUID(),\n}: {\n inputMessages?: ScorerRunInputForAgent['inputMessages'];\n output: ScorerRunOutputForAgent;\n rememberedMessages?: ScorerRunInputForAgent['rememberedMessages'];\n systemMessages?: ScorerRunInputForAgent['systemMessages'];\n taggedSystemMessages?: ScorerRunInputForAgent['taggedSystemMessages'];\n requestContext?: RequestContext;\n runId?: string;\n}): {\n input: ScorerRunInputForAgent;\n output: ScorerRunOutputForAgent;\n requestContext: RequestContext;\n runId: string;\n} => {\n return {\n input: {\n inputMessages,\n rememberedMessages,\n systemMessages,\n taggedSystemMessages,\n },\n output,\n requestContext,\n runId,\n };\n};\n\n/**\n * Information about a tool call extracted from scorer output.\n */\nexport type ToolCallInfo = {\n /** Name of the tool that was called */\n toolName: string;\n /** Unique identifier for the tool call */\n toolCallId: string;\n /** Index of the message containing this tool call */\n messageIndex: number;\n /** Index of the invocation within the message's tool invocations */\n invocationIndex: number;\n};\n\n/**\n * Extracts all tool calls from a scorer run output.\n *\n * Iterates through all messages and their tool invocations to collect\n * information about tools that were called (with state 'result' or 'call').\n *\n * @param output - The scorer run output (array of MastraDBMessage)\n * @returns An object containing tool names and detailed tool call info\n *\n * @example\n * ```ts\n * const scorer = createScorer({ ... })\n * .preprocess(({ run }) => {\n * const { tools, toolCallInfos } = extractToolCalls(run.output);\n * return {\n * toolsUsed: tools,\n * toolCount: tools.length,\n * };\n * });\n * ```\n */\nexport function extractToolCalls(output: ScorerRunOutputForAgent): { tools: string[]; toolCallInfos: ToolCallInfo[] } {\n const toolCalls: string[] = [];\n const toolCallInfos: ToolCallInfo[] = [];\n\n for (let messageIndex = 0; messageIndex < output.length; messageIndex++) {\n const message = output[messageIndex];\n // Tool invocations are now nested under content\n if (message?.content?.toolInvocations) {\n for (let invocationIndex = 0; invocationIndex < message.content.toolInvocations.length; invocationIndex++) {\n const invocation = message.content.toolInvocations[invocationIndex];\n if (invocation && invocation.toolName && (invocation.state === 'result' || invocation.state === 'call')) {\n toolCalls.push(invocation.toolName);\n toolCallInfos.push({\n toolName: invocation.toolName,\n toolCallId: invocation.toolCallId || `${messageIndex}-${invocationIndex}`,\n messageIndex,\n invocationIndex,\n });\n }\n }\n }\n }\n\n return { tools: toolCalls, toolCallInfos };\n}\n\n/**\n * Extracts text content from all input messages.\n *\n * @param runInput - The scorer run input\n * @returns An array of text strings from each input message\n *\n * @example\n * ```ts\n * const scorer = createScorer({ ... })\n * .preprocess(({ run }) => {\n * const messages = extractInputMessages(run.input);\n * return { allUserMessages: messages.join('\\n') };\n * });\n * ```\n */\nexport const extractInputMessages = (runInput: ScorerRunInputForAgent | undefined): string[] => {\n return runInput?.inputMessages?.map(msg => getTextContentFromMastraDBMessage(msg)) || [];\n};\n\n/**\n * Extracts text content from all assistant response messages.\n *\n * Filters for messages with role 'assistant' and extracts their text content.\n *\n * @param runOutput - The scorer run output (array of MastraDBMessage)\n * @returns An array of text strings from each assistant message\n *\n * @example\n * ```ts\n * const scorer = createScorer({ ... })\n * .preprocess(({ run }) => {\n * const responses = extractAgentResponseMessages(run.output);\n * return { allResponses: responses.join('\\n') };\n * });\n * ```\n */\nexport const extractAgentResponseMessages = (runOutput: ScorerRunOutputForAgent): string[] => {\n return runOutput.filter(msg => msg.role === 'assistant').map(msg => getTextContentFromMastraDBMessage(msg));\n};\n"]}
|
|
@@ -0,0 +1,31 @@
|
|
|
1
|
+
# @mastra/evals Documentation
|
|
2
|
+
|
|
3
|
+
> Embedded documentation for coding agents
|
|
4
|
+
|
|
5
|
+
## Quick Start
|
|
6
|
+
|
|
7
|
+
```bash
|
|
8
|
+
# Read the skill overview
|
|
9
|
+
cat docs/SKILL.md
|
|
10
|
+
|
|
11
|
+
# Get the source map
|
|
12
|
+
cat docs/SOURCE_MAP.json
|
|
13
|
+
|
|
14
|
+
# Read topic documentation
|
|
15
|
+
cat docs/<topic>/01-overview.md
|
|
16
|
+
```
|
|
17
|
+
|
|
18
|
+
## Structure
|
|
19
|
+
|
|
20
|
+
```
|
|
21
|
+
docs/
|
|
22
|
+
├── SKILL.md # Entry point
|
|
23
|
+
├── README.md # This file
|
|
24
|
+
├── SOURCE_MAP.json # Export index
|
|
25
|
+
├── evals/ (19 files)
|
|
26
|
+
```
|
|
27
|
+
|
|
28
|
+
## Version
|
|
29
|
+
|
|
30
|
+
Package: @mastra/evals
|
|
31
|
+
Version: 1.0.0-beta.4
|
|
@@ -0,0 +1,32 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: mastra-evals-docs
|
|
3
|
+
description: Documentation for @mastra/evals. Includes links to type definitions and readable implementation code in dist/.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# @mastra/evals Documentation
|
|
7
|
+
|
|
8
|
+
> **Version**: 1.0.0-beta.4
|
|
9
|
+
> **Package**: @mastra/evals
|
|
10
|
+
|
|
11
|
+
## Quick Navigation
|
|
12
|
+
|
|
13
|
+
Use SOURCE_MAP.json to find any export:
|
|
14
|
+
|
|
15
|
+
```bash
|
|
16
|
+
cat docs/SOURCE_MAP.json
|
|
17
|
+
```
|
|
18
|
+
|
|
19
|
+
Each export maps to:
|
|
20
|
+
- **types**: `.d.ts` file with JSDoc and API signatures
|
|
21
|
+
- **implementation**: `.js` chunk file with readable source
|
|
22
|
+
- **docs**: Conceptual documentation in `docs/`
|
|
23
|
+
|
|
24
|
+
## Top Exports
|
|
25
|
+
|
|
26
|
+
|
|
27
|
+
|
|
28
|
+
See SOURCE_MAP.json for the complete list.
|
|
29
|
+
|
|
30
|
+
## Available Topics
|
|
31
|
+
|
|
32
|
+
- [Evals](evals/) - 19 file(s)
|
|
@@ -0,0 +1,130 @@
|
|
|
1
|
+
> Overview of scorers in Mastra, detailing their capabilities for evaluating AI outputs and measuring performance.
|
|
2
|
+
|
|
3
|
+
# Scorers overview
|
|
4
|
+
|
|
5
|
+
While traditional software tests have clear pass/fail conditions, AI outputs are non-deterministic — they can vary with the same input. **Scorers** help bridge this gap by providing quantifiable metrics for measuring agent quality.
|
|
6
|
+
|
|
7
|
+
Scorers are automated tests that evaluate Agents outputs using model-graded, rule-based, and statistical methods. Scorers return **scores**: numerical values (typically between 0 and 1) that quantify how well an output meets your evaluation criteria. These scores enable you to objectively track performance, compare different approaches, and identify areas for improvement in your AI systems. Scorers can be customized with your own prompts and scoring functions.
|
|
8
|
+
|
|
9
|
+
Scorers can be run in the cloud, capturing real-time results. But scorers can also be part of your CI/CD pipeline, allowing you to test and monitor your agents over time.
|
|
10
|
+
|
|
11
|
+
## Types of Scorers
|
|
12
|
+
|
|
13
|
+
There are different kinds of scorers, each serving a specific purpose. Here are some common types:
|
|
14
|
+
|
|
15
|
+
1. **Textual Scorers**: Evaluate accuracy, reliability, and context understanding of agent responses
|
|
16
|
+
2. **Classification Scorers**: Measure accuracy in categorizing data based on predefined categories
|
|
17
|
+
3. **Prompt Engineering Scorers**: Explore impact of different instructions and input formats
|
|
18
|
+
|
|
19
|
+
## Installation
|
|
20
|
+
|
|
21
|
+
To access Mastra's scorers feature install the `@mastra/evals` package.
|
|
22
|
+
|
|
23
|
+
```bash
|
|
24
|
+
npm install @mastra/evals@beta
|
|
25
|
+
```
|
|
26
|
+
|
|
27
|
+
## Live evaluations
|
|
28
|
+
|
|
29
|
+
**Live evaluations** allow you to automatically score AI outputs in real-time as your agents and workflows operate. Instead of running evaluations manually or in batches, scorers run asynchronously alongside your AI systems, providing continuous quality monitoring.
|
|
30
|
+
|
|
31
|
+
### Adding scorers to agents
|
|
32
|
+
|
|
33
|
+
You can add built-in scorers to your agents to automatically evaluate their outputs. See the [full list of built-in scorers](https://mastra.ai/docs/v1/evals/built-in-scorers) for all available options.
|
|
34
|
+
|
|
35
|
+
```typescript title="src/mastra/agents/evaluated-agent.ts"
|
|
36
|
+
import { Agent } from "@mastra/core/agent";
|
|
37
|
+
import {
|
|
38
|
+
createAnswerRelevancyScorer,
|
|
39
|
+
createToxicityScorer,
|
|
40
|
+
} from "@mastra/evals/scorers/prebuilt";
|
|
41
|
+
|
|
42
|
+
export const evaluatedAgent = new Agent({
|
|
43
|
+
scorers: {
|
|
44
|
+
relevancy: {
|
|
45
|
+
scorer: createAnswerRelevancyScorer({ model: "openai/gpt-4.1-nano" }),
|
|
46
|
+
sampling: { type: "ratio", rate: 0.5 },
|
|
47
|
+
},
|
|
48
|
+
safety: {
|
|
49
|
+
scorer: createToxicityScorer({ model: "openai/gpt-4.1-nano" }),
|
|
50
|
+
sampling: { type: "ratio", rate: 1 },
|
|
51
|
+
},
|
|
52
|
+
},
|
|
53
|
+
});
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
### Adding scorers to workflow steps
|
|
57
|
+
|
|
58
|
+
You can also add scorers to individual workflow steps to evaluate outputs at specific points in your process:
|
|
59
|
+
|
|
60
|
+
```typescript title="src/mastra/workflows/content-generation.ts"
|
|
61
|
+
import { createWorkflow, createStep } from "@mastra/core/workflows";
|
|
62
|
+
import { z } from "zod";
|
|
63
|
+
import { customStepScorer } from "../scorers/custom-step-scorer";
|
|
64
|
+
|
|
65
|
+
const contentStep = createStep({
|
|
66
|
+
scorers: {
|
|
67
|
+
customStepScorer: {
|
|
68
|
+
scorer: customStepScorer(),
|
|
69
|
+
sampling: {
|
|
70
|
+
type: "ratio",
|
|
71
|
+
rate: 1, // Score every step execution
|
|
72
|
+
}
|
|
73
|
+
}
|
|
74
|
+
},
|
|
75
|
+
});
|
|
76
|
+
|
|
77
|
+
export const contentWorkflow = createWorkflow({ ... })
|
|
78
|
+
.then(contentStep)
|
|
79
|
+
.commit();
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
### How live evaluations work
|
|
83
|
+
|
|
84
|
+
**Asynchronous execution**: Live evaluations run in the background without blocking your agent responses or workflow execution. This ensures your AI systems maintain their performance while still being monitored.
|
|
85
|
+
|
|
86
|
+
**Sampling control**: The `sampling.rate` parameter (0-1) controls what percentage of outputs get scored:
|
|
87
|
+
|
|
88
|
+
- `1.0`: Score every single response (100%)
|
|
89
|
+
- `0.5`: Score half of all responses (50%)
|
|
90
|
+
- `0.1`: Score 10% of responses
|
|
91
|
+
- `0.0`: Disable scoring
|
|
92
|
+
|
|
93
|
+
**Automatic storage**: All scoring results are automatically stored in the `mastra_scorers` table in your configured database, allowing you to analyze performance trends over time.
|
|
94
|
+
|
|
95
|
+
## Trace evaluations
|
|
96
|
+
|
|
97
|
+
In addition to live evaluations, you can use scorers to evaluate historical traces from your agent interactions and workflows. This is particularly useful for analyzing past performance, debugging issues, or running batch evaluations.
|
|
98
|
+
|
|
99
|
+
> **Note:**
|
|
100
|
+
|
|
101
|
+
**Observability Required**
|
|
102
|
+
|
|
103
|
+
To score traces, you must first configure observability in your Mastra instance to collect trace data. See [Tracing documentation](../observability/tracing/overview) for setup instructions.
|
|
104
|
+
|
|
105
|
+
### Scoring traces with Studio
|
|
106
|
+
|
|
107
|
+
To score traces, you first need to register your scorers with your Mastra instance:
|
|
108
|
+
|
|
109
|
+
```typescript
|
|
110
|
+
const mastra = new Mastra({
|
|
111
|
+
scorers: {
|
|
112
|
+
answerRelevancy: myAnswerRelevancyScorer,
|
|
113
|
+
responseQuality: myResponseQualityScorer,
|
|
114
|
+
},
|
|
115
|
+
});
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
Once registered, you can score traces interactively within Studio under the Observability section. This provides a user-friendly interface for running scorers against historical traces.
|
|
119
|
+
|
|
120
|
+
## Testing scorers locally
|
|
121
|
+
|
|
122
|
+
Mastra provides a CLI command `mastra dev` to test your scorers. Studio includes a scorers section where you can run individual scorers against test inputs and view detailed results.
|
|
123
|
+
|
|
124
|
+
For more details, see [Studio](https://mastra.ai/docs/v1/getting-started/studio) docs.
|
|
125
|
+
|
|
126
|
+
## Next steps
|
|
127
|
+
|
|
128
|
+
- Learn how to create your own scorers in the [Creating Custom Scorers](https://mastra.ai/docs/v1/evals/custom-scorers) guide
|
|
129
|
+
- Explore built-in scorers in the [Built-in Scorers](https://mastra.ai/docs/v1/evals/built-in-scorers) section
|
|
130
|
+
- Test scorers with [Studio](https://mastra.ai/docs/v1/getting-started/studio)
|
|
@@ -0,0 +1,49 @@
|
|
|
1
|
+
> Overview of Mastra
|
|
2
|
+
|
|
3
|
+
# Built-in Scorers
|
|
4
|
+
|
|
5
|
+
Mastra provides a comprehensive set of built-in scorers for evaluating AI outputs. These scorers are optimized for common evaluation scenarios and are ready to use in your agents and workflows.
|
|
6
|
+
|
|
7
|
+
To create your own scorers, see the [Custom Scorers](https://mastra.ai/docs/v1/evals/custom-scorers) guide.
|
|
8
|
+
|
|
9
|
+
## Available scorers
|
|
10
|
+
|
|
11
|
+
### Accuracy and reliability
|
|
12
|
+
|
|
13
|
+
These scorers evaluate how correct, truthful, and complete your agent's answers are:
|
|
14
|
+
|
|
15
|
+
- [`answer-relevancy`](https://mastra.ai/reference/v1/evals/answer-relevancy): Evaluates how well responses address the input query (`0-1`, higher is better)
|
|
16
|
+
- [`answer-similarity`](https://mastra.ai/reference/v1/evals/answer-similarity): Compares agent outputs against ground-truth answers for CI/CD testing using semantic analysis (`0-1`, higher is better)
|
|
17
|
+
- [`faithfulness`](https://mastra.ai/reference/v1/evals/faithfulness): Measures how accurately responses represent provided context (`0-1`, higher is better)
|
|
18
|
+
- [`hallucination`](https://mastra.ai/reference/v1/evals/hallucination): Detects factual contradictions and unsupported claims (`0-1`, lower is better)
|
|
19
|
+
- [`completeness`](https://mastra.ai/reference/v1/evals/completeness): Checks if responses include all necessary information (`0-1`, higher is better)
|
|
20
|
+
- [`content-similarity`](https://mastra.ai/reference/v1/evals/content-similarity): Measures textual similarity using character-level matching (`0-1`, higher is better)
|
|
21
|
+
- [`textual-difference`](https://mastra.ai/reference/v1/evals/textual-difference): Measures textual differences between strings (`0-1`, higher means more similar)
|
|
22
|
+
- [`tool-call-accuracy`](https://mastra.ai/reference/v1/evals/tool-call-accuracy): Evaluates whether the LLM selects the correct tool from available options (`0-1`, higher is better)
|
|
23
|
+
- [`prompt-alignment`](https://mastra.ai/reference/v1/evals/prompt-alignment): Measures how well agent responses align with user prompt intent, requirements, completeness, and format (`0-1`, higher is better)
|
|
24
|
+
|
|
25
|
+
### Context quality
|
|
26
|
+
|
|
27
|
+
These scorers evaluate the quality and relevance of context used in generating responses:
|
|
28
|
+
|
|
29
|
+
- [`context-precision`](https://mastra.ai/reference/v1/evals/context-precision): Evaluates context relevance and ranking using Mean Average Precision, rewarding early placement of relevant context (`0-1`, higher is better)
|
|
30
|
+
- [`context-relevance`](https://mastra.ai/reference/v1/evals/context-relevance): Measures context utility with nuanced relevance levels, usage tracking, and missing context detection (`0-1`, higher is better)
|
|
31
|
+
|
|
32
|
+
> tip Context Scorer Selection
|
|
33
|
+
>
|
|
34
|
+
>- Use **Context Precision** when context ordering matters and you need standard IR metrics (ideal for RAG ranking evaluation)
|
|
35
|
+
>- Use **Context Relevance** when you need detailed relevance assessment and want to track context usage and identify gaps
|
|
36
|
+
>
|
|
37
|
+
>Both context scorers support:
|
|
38
|
+
>
|
|
39
|
+
>- **Static context**: Pre-defined context arrays
|
|
40
|
+
>- **Dynamic context extraction**: Extract context from runs using custom functions (ideal for RAG systems, vector databases, etc.)
|
|
41
|
+
|
|
42
|
+
### Output quality
|
|
43
|
+
|
|
44
|
+
These scorers evaluate adherence to format, style, and safety requirements:
|
|
45
|
+
|
|
46
|
+
- [`tone-consistency`](https://mastra.ai/reference/v1/evals/tone-consistency): Measures consistency in formality, complexity, and style (`0-1`, higher is better)
|
|
47
|
+
- [`toxicity`](https://mastra.ai/reference/v1/evals/toxicity): Detects harmful or inappropriate content (`0-1`, lower is better)
|
|
48
|
+
- [`bias`](https://mastra.ai/reference/v1/evals/bias): Detects potential biases in the output (`0-1`, lower is better)
|
|
49
|
+
- [`keyword-coverage`](https://mastra.ai/reference/v1/evals/keyword-coverage): Assesses technical terminology usage (`0-1`, higher is better)
|