agentv 2.1.0 → 2.2.0

package/README.md CHANGED
@@ -101,7 +101,27 @@ See [AGENTS.md](AGENTS.md) for development guidelines and design principles.
 
 ## Core Concepts
 
- **Evaluation files** (`.yaml`) define test cases with expected outcomes. **Targets** specify which agent/provider to evaluate. **Judges** (code or LLM) score results. **Results** are written as JSONL/YAML for analysis and comparison.
+ **Evaluation files** (`.yaml` or `.jsonl`) define test cases with expected outcomes. **Targets** specify which agent/provider to evaluate. **Judges** (code or LLM) score results. **Results** are written as JSONL/YAML for analysis and comparison.
+
+ ### JSONL Format Support
+
+ For large-scale evaluations, AgentV supports JSONL (JSON Lines) format as an alternative to YAML:
+
+ ```jsonl
+ {"id": "test-1", "expected_outcome": "Calculates correctly", "input_messages": [{"role": "user", "content": "What is 2+2?"}]}
+ {"id": "test-2", "expected_outcome": "Provides explanation", "input_messages": [{"role": "user", "content": "Explain variables"}]}
+ ```
+
+ Optional sidecar YAML metadata file (`dataset.yaml` alongside `dataset.jsonl`):
+ ```yaml
+ description: Math evaluation dataset
+ dataset: math-tests
+ execution:
+   target: azure_base
+   evaluator: llm_judge
+ ```
+
+ Benefits: streaming-friendly, Git-friendly diffs, programmatic generation, and compatibility with industry-standard tooling (DeepEval, LangWatch, Hugging Face).
 
 ## Usage
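
The "programmatic generation" benefit in the new README text is easy to illustrate. The sketch below is hypothetical and not part of the agentv package; it only reuses the field names and values from the JSONL example in the diff to show how such a dataset might be emitted from code:

```ts
// Hypothetical generator script (not shipped with agentv): writes the
// dataset.jsonl shown in the README diff above, one JSON object per line.
import { writeFileSync } from "node:fs";

const cases = [
  {
    id: "test-1",
    expected_outcome: "Calculates correctly",
    input_messages: [{ role: "user", content: "What is 2+2?" }],
  },
  {
    id: "test-2",
    expected_outcome: "Provides explanation",
    input_messages: [{ role: "user", content: "Explain variables" }],
  },
];

// JSONL = one serialized record per line, newline-terminated.
writeFileSync(
  "dataset.jsonl",
  cases.map((c) => JSON.stringify(c)).join("\n") + "\n",
);
```

Because every record is an independent line, a file like this can be appended to incrementally and diffed line-by-line, which is where the streaming- and Git-friendliness claims come from.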