agentv 2.5.4 → 2.5.6

package/README.md CHANGED
@@ -31,13 +31,9 @@ evalcases:
  - id: addition
  expected_outcome: Correctly calculates 15 + 27 = 42
 
- input_messages:
- - role: user
- content: What is 15 + 27?
+ input: What is 15 + 27?
 
- expected_messages:
- - role: assistant
- content: "42"
+ expected_output: "42"
 
  execution:
  evaluators:
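The first hunk replaces the message-list form (`input_messages` / `expected_messages`) with the shorthand string fields `input` and `expected_output`. The Python sketch below shows that mapping on a plain dict. It assumes, as the diff suggests but does not state, that the shorthand is equivalent to one user message and one assistant message. The helper name `to_shorthand` is hypothetical and not part of AgentV.

```python
from typing import Any


def to_shorthand(case: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: collapse single-message lists into the 2.5.6 shorthand.

    Assumption: `input` equals one user message and `expected_output` equals
    one assistant message. Multi-turn lists are left unchanged, because a
    single string cannot express them.
    """
    out = dict(case)  # shallow copy so the caller's dict is not mutated
    inputs = case.get("input_messages")
    if isinstance(inputs, list) and len(inputs) == 1 and inputs[0].get("role") == "user":
        out["input"] = inputs[0]["content"]
        del out["input_messages"]
    expected = case.get("expected_messages")
    if isinstance(expected, list) and len(expected) == 1 and expected[0].get("role") == "assistant":
        out["expected_output"] = expected[0]["content"]
        del out["expected_messages"]
    return out


old_case = {
    "id": "addition",
    "expected_outcome": "Correctly calculates 15 + 27 = 42",
    "input_messages": [{"role": "user", "content": "What is 15 + 27?"}],
    "expected_messages": [{"role": "assistant", "content": "42"}],
}
print(to_shorthand(old_case))
```

Keeping multi-turn cases untouched is deliberate: the old list form remains the only way to express them in this sketch.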
@@ -108,8 +104,8 @@ See [AGENTS.md](AGENTS.md) for development guidelines and design principles.
  For large-scale evaluations, AgentV supports JSONL (JSON Lines) format as an alternative to YAML:
 
  ```jsonl
- {"id": "test-1", "expected_outcome": "Calculates correctly", "input_messages": [{"role": "user", "content": "What is 2+2?"}]}
- {"id": "test-2", "expected_outcome": "Provides explanation", "input_messages": [{"role": "user", "content": "Explain variables"}]}
+ {"id": "test-1", "expected_outcome": "Calculates correctly", "input": "What is 2+2?"}
+ {"id": "test-2", "expected_outcome": "Provides explanation", "input": "Explain variables"}
  ```
 
  Optional sidecar YAML metadata file (`dataset.yaml` alongside `dataset.jsonl`):
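After this change each JSONL line carries the prompt as a plain `input` string. A small Python sketch for writing and re-reading such a file with the standard `json` module follows. The required-field set (`id`, `expected_outcome`, `input`) is inferred from the two example lines and is an assumption, not AgentV's own validation logic; the function names are hypothetical.

```python
import io
import json

# Assumed required fields, inferred from the README's example lines.
REQUIRED = ("id", "expected_outcome", "input")


def write_jsonl(cases, fh):
    """Write one JSON object per line (the JSON Lines convention)."""
    for case in cases:
        missing = [k for k in REQUIRED if k not in case]
        if missing:
            raise ValueError(f"case {case.get('id')!r} missing {missing}")
        fh.write(json.dumps(case) + "\n")


def read_jsonl(fh):
    """Parse each non-blank line as an independent JSON object."""
    return [json.loads(line) for line in fh if line.strip()]


cases = [
    {"id": "test-1", "expected_outcome": "Calculates correctly", "input": "What is 2+2?"},
    {"id": "test-2", "expected_outcome": "Provides explanation", "input": "Explain variables"},
]
buf = io.StringIO()  # stands in for open("dataset.jsonl", "w", encoding="utf-8")
write_jsonl(cases, buf)
buf.seek(0)
print(read_jsonl(buf) == cases)
```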
@@ -184,7 +180,7 @@ execution:
  script: ./validators/check_answer.py
  ```
 
- For complete templates, examples, and evaluator patterns, see: [custom-evaluators.md](.claude/skills/agentv-eval-builder/references/custom-evaluators.md)
+ For complete templates, examples, and evaluator patterns, see: [custom-evaluators](https://agentv.dev/evaluators/custom-evaluators/)
 
  ### Compare Evaluation Results
 
@@ -238,7 +234,7 @@ Write validators in any language (Python, TypeScript, Node, etc.):
  ```
 
  For complete examples and patterns, see:
- - [custom-evaluators skill](.claude/skills/agentv-eval-builder/references/custom-evaluators.md)
+ - [custom-evaluators](https://agentv.dev/evaluators/custom-evaluators/)
  - [code-judge-sdk example](examples/features/code-judge-sdk)
 
  ### LLM Judges
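The validator hunks only relink the custom-evaluator guide; the code-judge input/output contract itself is not part of this diff. The sketch below is therefore just a possible shape for `./validators/check_answer.py`. It assumes the judge receives one JSON payload and emits one JSON result, and the field names `candidate_answer`, `score`, `hits`, and `misses` are placeholders to be checked against the custom-evaluators guide before use.

```python
import io
import json
from typing import TextIO


def judge(payload: dict) -> dict:
    """Score a candidate answer against the expected value "42".

    Assumption: `candidate_answer` and the result field names are
    placeholders, not a confirmed AgentV schema.
    """
    answer = str(payload.get("candidate_answer", "")).strip()
    passed = answer == "42"
    return {
        "score": 1.0 if passed else 0.0,
        "hits": ["answer is 42"] if passed else [],
        "misses": [] if passed else [f"expected 42, got {answer!r}"],
    }


def main(stream_in: TextIO, stream_out: TextIO) -> None:
    """Read one JSON payload and write one JSON result (assumed I/O contract)."""
    payload = json.load(stream_in)
    stream_out.write(json.dumps(judge(payload)) + "\n")


# Demo with in-memory streams. A real check_answer.py would instead call
# main(sys.stdin, sys.stdout) under an `if __name__ == "__main__":` guard.
demo_out = io.StringIO()
main(io.StringIO('{"candidate_answer": " 42 "}'), demo_out)
print(demo_out.getvalue().strip())
```

Separating `judge` from the I/O in `main` keeps the scoring logic testable without spawning a process.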
@@ -264,9 +260,7 @@ evalcases:
  - id: quicksort-explain
  expected_outcome: Explain how quicksort works
 
- input_messages:
- - role: user
- content: Explain quicksort algorithm
+ input: Explain quicksort algorithm
 
  rubrics:
  - Mentions divide-and-conquer approach
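Rubrics sit under the LLM Judges section, so each item is presumably judged by a model; the diff does not show how per-item verdicts become a score. The sketch below illustrates one plausible aggregation, the fraction of rubric items satisfied, with a trivial keyword check standing in for the LLM verdict. Both the keyword stand-in and the scoring formula are assumptions for illustration, and the second rubric item is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RubricVerdict:
    item: str
    satisfied: bool


def naive_verdicts(answer: str, rubric_keywords: dict[str, list[str]]) -> list[RubricVerdict]:
    """Stand-in for an LLM judge: an item passes if any of its keywords appears."""
    text = answer.lower()
    return [
        RubricVerdict(item, any(k.lower() in text for k in keywords))
        for item, keywords in rubric_keywords.items()
    ]


def checklist_score(verdicts: list[RubricVerdict]) -> float:
    """Fraction of satisfied items; 0.0 for an empty rubric (assumed convention)."""
    if not verdicts:
        return 0.0
    return sum(v.satisfied for v in verdicts) / len(verdicts)


rubric = {
    "Mentions divide-and-conquer approach": ["divide-and-conquer", "divide and conquer"],
    "Explains partitioning": ["partition"],  # hypothetical second item
}
answer = "Quicksort is a divide and conquer algorithm that picks a pivot."
verdicts = naive_verdicts(answer, rubric)
print(checklist_score(verdicts))
```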
@@ -281,7 +275,7 @@ Auto-generate rubrics from expected outcomes:
  agentv generate rubrics evals/my-eval.yaml
  ```
 
- See [rubric-evaluator skill](.claude/skills/agentv-eval-builder/references/rubric-evaluator.md) for detailed patterns.
+ See [rubric evaluator](https://agentv.dev/evaluation/rubrics/) for detailed patterns.
 
  ## Advanced Configuration
 
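The `agentv generate rubrics` command in this hunk takes a single eval file. To cover a whole directory of evals, a Python wrapper could build one command per file, as sketched below. Only the `agentv generate rubrics <file>` form comes from the README; the `*.yaml` glob, the directory layout, and the `dry_run` switch are assumptions of this sketch.

```python
import subprocess
from pathlib import Path


def rubric_commands(eval_dir: str) -> list[list[str]]:
    """Build one `agentv generate rubrics <file>` argv per YAML eval file (sorted)."""
    files = sorted(Path(eval_dir).glob("*.yaml"))
    return [["agentv", "generate", "rubrics", str(f)] for f in files]


def generate_all(eval_dir: str, dry_run: bool = True) -> list[list[str]]:
    """Print each command; execute it only when dry_run is False."""
    cmds = rubric_commands(eval_dir)
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the CLI exits non-zero.
            subprocess.run(cmd, check=True)
    return cmds


generate_all("evals")
```

Defaulting to a dry run lets you review the file list before any rubrics are written.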
@@ -310,9 +304,15 @@ Automatically retries on rate limits, transient 5xx errors, and network failures
  - AI agents: Ask Claude Code to `/agentv-eval-builder` to create and iterate on evals
 
  **Detailed Guides:**
- - [Evaluation format and structure](.claude/skills/agentv-eval-builder/SKILL.md)
- - [Custom evaluators](.claude/skills/agentv-eval-builder/references/custom-evaluators.md)
- - [Structured data evaluation](.claude/skills/agentv-eval-builder/references/structured-data-evaluators.md)
+ - [Evaluation format and structure](https://agentv.dev/evaluation/eval-files/)
+ - [Custom evaluators](https://agentv.dev/evaluators/custom-evaluators/)
+ - [Rubric evaluator](https://agentv.dev/evaluation/rubrics/)
+ - [Composite evaluator](https://agentv.dev/evaluators/composite/)
+ - [Tool trajectory evaluator](https://agentv.dev/evaluators/tool-trajectory/)
+ - [Structured data evaluators](https://agentv.dev/evaluators/structured-data/)
+ - [Batch CLI evaluation](https://agentv.dev/evaluation/batch-cli/)
+ - [Compare results](https://agentv.dev/tools/compare/)
+ - [Example evaluations](https://agentv.dev/evaluation/examples/)
 
  **Reference:**
  - Monorepo structure: `packages/core/` (engine), `packages/eval/` (evaluation logic), `apps/cli/` (commands)
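The last hunk's header quotes the README's note that AgentV "Automatically retries on rate limits, transient 5xx errors, and network failures". The diff does not show how that is implemented. The sketch below illustrates the general pattern, exponential backoff with full jitter, and is not AgentV's code: the retryable status set, the attempt count, the delays, and the `RequestError` type are all assumptions.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Assumed retryable conditions: HTTP 429 (rate limit) and common 5xx errors.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class RequestError(Exception):
    """Hypothetical error carrying an HTTP status; None means a network failure."""

    def __init__(self, status: Optional[int] = None) -> None:
        super().__init__(f"request failed (status={status})")
        self.status = status


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying retryable failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RequestError as err:
            retryable = err.status is None or err.status in RETRYABLE_STATUS
            if not retryable or attempt == max_attempts:
                raise
            # Full jitter: wait a random time in [0, base_delay * 2**(attempt - 1)],
            # i.e. windows of 0.5s, 1s, 2s, ... with the default base_delay.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
    raise RuntimeError("max_attempts must be at least 1")


calls = {"n": 0}


def flaky() -> str:
    """Fails twice with HTTP 503, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RequestError(503)
    return "ok"


# A no-op sleep keeps the demo instant; real code would keep time.sleep.
print(with_retries(flaky, sleep=lambda s: None), calls["n"])
```

Jitter spreads retries from many concurrent eval cases over time, so they do not all hit a rate-limited provider at the same instant.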