agentv 0.20.1 → 0.21.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +53 -0
- package/dist/{chunk-GDGNKNKP.js → chunk-WOCXZEH4.js} +806 -230
- package/dist/chunk-WOCXZEH4.js.map +1 -0
- package/dist/cli.js +5 -2
- package/dist/cli.js.map +1 -1
- package/dist/index.js +3 -3
- package/dist/templates/.claude/skills/agentv-eval-builder/SKILL.md +3 -3
- package/dist/templates/agentv/.env.template +23 -0
- package/package.json +4 -4
- package/dist/chunk-GDGNKNKP.js.map +0 -1
package/README.md
CHANGED
@@ -334,6 +334,59 @@ Evaluation criteria and guidelines...
 }
 ```
 
+## Rubric-Based Evaluation
+
+AgentV supports structured evaluation through rubrics - lists of criteria that define what makes a good response. Rubrics are checked by an LLM judge and scored based on weights and requirements.
+
+### Basic Usage
+
+Define rubrics inline using simple strings:
+
+```yaml
+- id: example-1
+  expected_outcome: Explain quicksort algorithm
+  rubrics:
+    - Mentions divide-and-conquer approach
+    - Explains the partition step
+    - States time complexity correctly
+```
+
+Or use detailed objects for fine-grained control:
+
+```yaml
+rubrics:
+  - id: structure
+    description: Has clear headings and organization
+    weight: 1.0
+    required: true
+  - id: examples
+    description: Includes practical examples
+    weight: 0.5
+    required: false
+```
+
+### Generate Rubrics
+
+Automatically generate rubrics from `expected_outcome` fields:
+
+```bash
+# Generate rubrics for all eval cases without rubrics
+agentv generate rubrics evals/my-eval.yaml
+
+# Use a specific LLM target for generation
+agentv generate rubrics evals/my-eval.yaml --target openai:gpt-4o
+```
+
+### Scoring and Verdicts
+
+- **Score**: (sum of satisfied weights) / (total weights)
+- **Verdicts**:
+  - `pass`: Score ≥ 0.8 and all required rubrics met
+  - `borderline`: Score ≥ 0.6 and all required rubrics met
+  - `fail`: Score < 0.6 or any required rubric failed
+
+For complete examples and detailed patterns, see [examples/features/evals/rubric/](examples/features/evals/rubric/).
+
 ## Advanced Configuration
 
 ### Retry Configuration