@agentv/sdk 4.42.1-next.1 → 4.42.2-next.1
- package/README.md +43 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -1,6 +1,6 @@
 # @agentv/sdk
 
-
+Public lightweight SDK for AgentV - build YAML-aligned eval suites, custom graders, and prompt templates around the canonical AgentV eval model.
 
 ## Installation
 
@@ -8,6 +8,21 @@ Evaluation SDK for AgentV - build YAML-aligned eval suites, custom graders, and
 npm install @agentv/sdk
 ```
 
+## Migrating from `@agentv/eval`
+
+Use `@agentv/sdk` for new code:
+
+```bash
+npm uninstall @agentv/eval
+npm install @agentv/sdk
+```
+
+```typescript
+import { defineCodeGrader } from '@agentv/sdk';
+```
+
+`@agentv/eval` remains only as a temporary deprecated compatibility package that re-exports this SDK for existing consumers. New docs, examples, scaffolds, and skills should not import from it.
+
 ## Quick Start
 
 ### defineAssertion (simplest way)
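The migration note in the hunk above amounts to swapping the module specifier in existing imports. A minimal sketch of that rewrite follows, assuming the import surface is otherwise unchanged (the README says `@agentv/eval` re-exports this SDK). The helper name `rewriteAgentvImports` is hypothetical and is not part of `@agentv/sdk`; it only handles the bare package specifier, not subpath imports.

```typescript
// Hypothetical helper for illustration; NOT provided by @agentv/sdk.
// Rewrites quoted module specifiers that are exactly '@agentv/eval'
// (single or double quotes) to '@agentv/sdk'. Subpath imports such as
// '@agentv/eval/foo' are deliberately left untouched.
function rewriteAgentvImports(source: string): string {
  const pattern = /(['"])@agentv\/eval\1/g;
  return source.replace(
    pattern,
    (_match: string, quote: string) => `${quote}@agentv/sdk${quote}`,
  );
}

const before = "import { defineCodeGrader } from '@agentv/eval';";
const after = rewriteAgentvImports(before);
console.log(after);
// prints: import { defineCodeGrader } from '@agentv/sdk';
```

A text-level rewrite like this is only a starting point; re-running the type checker after the swap is the real confirmation that nothing else needs to change.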
@@ -97,6 +112,33 @@ export default defineEval({
 
 The helpers return ordinary `assertions` entries such as `type: contains`, `type: llm-grader`, and `type: code-grader`. CamelCase SDK options such as `minScore` and `maxSteps` lower to canonical YAML keys such as `min_score` and `max_steps`.
 
+If you are coming from Braintrust `scores` or DeepEval metrics, model reusable checks as small AgentV-native helper factories that return these grader configs. They still lower to the same YAML/runtime contract:
+
+```typescript
+import { defineEval, graders } from '@agentv/sdk';
+
+function ragFaithfulness() {
+  return graders.llmGrader({
+    name: 'rag-faithfulness',
+    target: 'grader-target',
+    prompt: 'Grade whether the answer is supported by the provided context.',
+  });
+}
+
+export default defineEval({
+  name: 'rag-suite',
+  tests: [
+    {
+      id: 'grounded-answer',
+      input: 'Answer using the retrieved context.',
+      assertions: [ragFaithfulness()],
+    },
+  ],
+});
+```
+
+Python workflows should emit canonical YAML/JSONL or implement code graders over the stdin/stdout contract. The repo-local helper under `examples/features/sdk-python/` is an example, not a promised published Python package.
+
 ## Exports
 
 - `defineAssertion(handler)` - Define a custom assertion (pass/fail + optional score)
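The README text in the hunk above says camelCase SDK options such as `minScore` and `maxSteps` lower to snake_case YAML keys such as `min_score` and `max_steps`. The sketch below illustrates that key mapping only. It is not the SDK's own lowering code; the function names `camelToSnake` and `lowerKeys` and the sample values are assumptions made for illustration.

```typescript
// Illustrative sketch only; NOT the implementation used by @agentv/sdk.
// Converts a camelCase key to snake_case by prefixing each uppercase
// letter with an underscore and lowercasing it.
function camelToSnake(key: string): string {
  return key.replace(/[A-Z]/g, (ch: string) => `_${ch.toLowerCase()}`);
}

// Applies the key conversion to the top-level keys of an options object.
// Nested objects are left as-is in this sketch.
function lowerKeys(options: Record<string, unknown>): Record<string, unknown> {
  const lowered: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(options)) {
    lowered[camelToSnake(key)] = value;
  }
  return lowered;
}

// Sample values are made up for the example.
const lowered = lowerKeys({ minScore: 0.8, maxSteps: 5 });
console.log(lowered);
// prints: { min_score: 0.8, max_steps: 5 }
```

Keys that are already lowercase, such as `name` or `prompt`, pass through unchanged under this mapping.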