agentv 0.2.6 → 0.2.11

package/README.md CHANGED
@@ -74,35 +74,43 @@ You are now ready to start development. The monorepo contains:
 
  ## Quick Start
 
- ### Linting Eval Files
+ ### Configuring Guideline Patterns
 
- Validate your eval and targets files before running them:
+ AgentV automatically detects guideline files and treats them differently from regular file content. You can customize which files are considered guidelines using an optional `.agentv/config.yaml` configuration file.
 
- ```bash
- # Lint a single file
- agentv lint evals/my-test.yaml
+ **Config file discovery:**
+ - AgentV searches for `.agentv/config.yaml` starting from the eval file's directory
+ - Walks up the directory tree to the repository root
+ - Uses the first config file found (similar to how `targets.yaml` is discovered)
+ - This allows you to place one config file at the project root for all evals
 
- # Lint multiple files
- agentv lint evals/test1.yaml evals/test2.yaml
+ **Custom patterns** (create `.agentv/config.yaml` in same directory as your eval file):
 
- # Lint entire directory (recursively finds all YAML files)
- agentv lint evals/
+ ```yaml
+ # .agentv/config.yaml
+ guideline_patterns:
+ - "**/*.guide.md" # Match all .guide.md files
+ - "**/guidelines/**" # Match all files in /guidelines/ dirs
+ - "docs/AGENTS.md" # Match specific files
+ - "**/*.rules.md" # Match by naming convention
+ ```
 
- # Enable strict mode for additional checks
- agentv lint --strict evals/
+ See [config.yaml example](docs/examples/simple/.agentv/config.yaml) for more pattern examples.
 
- # Output results in JSON format
- agentv lint --json evals/
- ```
+ ### Validating Eval Files
 
- **Linter features:**
+ Validate your eval and targets files before running them:
+
+ ```bash
+ # Validate a single file
+ agentv validate evals/my-eval.yaml
 
- - Validates `$schema` field is present and correct
- - Checks required fields and structure for eval and targets files
- - Validates file references exist and are accessible
- - Provides clear error messages with file path and location context
- - Exits with non-zero code on validation failures (CI-friendly)
- - Supports strict mode for additional checks (e.g., non-empty file content)
+ # Validate multiple files
+ agentv validate evals/eval1.yaml evals/eval2.yaml
+
+ # Validate entire directory (recursively finds all YAML files)
+ agentv validate evals/
+ ```
 
  **File type detection:**
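The config file discovery rules added in the hunk above (start at the eval file's directory, walk up to the repository root, take the first `.agentv/config.yaml` found) can be illustrated with a minimal Python sketch. This is not AgentV's implementation: the function name `find_config` is hypothetical, and the repository root is passed in explicitly because the README does not say how AgentV detects it.

```python
from pathlib import Path
from typing import Optional


def find_config(eval_file: Path, repo_root: Path) -> Optional[Path]:
    """Return the nearest .agentv/config.yaml at or above eval_file's directory.

    Hypothetical helper: the search stops at repo_root (supplied by the
    caller, an assumption) or at the filesystem root, whichever comes first.
    """
    current = eval_file.resolve().parent
    root = repo_root.resolve()
    while True:
        candidate = current / ".agentv" / "config.yaml"
        if candidate.is_file():
            return candidate  # first match wins
        if current == root or current.parent == current:
            return None  # reached the repository or filesystem root
        current = current.parent
```

Under these assumptions, a single config file at the project root is found for every eval file below it, while a config placed next to a particular eval file takes priority for that file.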
 
@@ -112,7 +120,7 @@ All AgentV files must include a `$schema` field:
  # Eval files
  $schema: agentv-eval-v2
  evalcases:
- - id: test-1
+ - id: eval-1
  # ...
 
  # Targets files
@@ -126,29 +134,29 @@ Files without a `$schema` field will be rejected with a clear error message.
 
  ### Running Evals
 
- Run eval (target auto-selected from test file or CLI override):
+ Run eval (target auto-selected from eval file or CLI override):
 
  ```bash
- # If your test.yaml contains "target: azure_base", it will be used automatically
- agentv eval "path/to/test.yaml"
+ # If your eval.yaml contains "target: azure_base", it will be used automatically
+ agentv eval "path/to/eval.yaml"
 
- # Override the test file's target with CLI flag
- agentv eval --target vscode_projectx "path/to/test.yaml"
+ # Override the eval file's target with CLI flag
+ agentv eval --target vscode_projectx "path/to/eval.yaml"
  ```
 
- Run a specific test case with custom targets path:
+ Run a specific eval case with custom targets path:
 
  ```bash
- agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --eval-id "my-test-case" "path/to/test.yaml"
+ agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --eval-id "my-eval-case" "path/to/eval.yaml"
  ```
 
  ### Command Line Options
 
- - `test_file`: Path to test YAML file (required, positional argument)
- - `--target TARGET`: Execution target name from targets.yaml (overrides target specified in test file)
+ - `eval_file`: Path to eval YAML file (required, positional argument)
+ - `--target TARGET`: Execution target name from targets.yaml (overrides target specified in eval file)
  - `--targets TARGETS`: Path to targets.yaml file (default: ./.agentv/targets.yaml)
- - `--eval-id EVAL_ID`: Run only the test case with this specific ID
- - `--out OUTPUT_FILE`: Output file path (default: results/{testname}_{timestamp}.jsonl)
+ - `--eval-id EVAL_ID`: Run only the eval case with this specific ID
+ - `--out OUTPUT_FILE`: Output file path (default: results/{evalname}_{timestamp}.jsonl)
  - `--format FORMAT`: Output format: 'jsonl' or 'yaml' (default: jsonl)
  - `--dry-run`: Run with mock model for testing
  - `--agent-timeout SECONDS`: Timeout in seconds for agent response polling (default: 120)
@@ -162,12 +170,12 @@ agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --eval-id
  The CLI determines which execution target to use with the following precedence:
 
  1. CLI flag override: `--target my_target` (when provided and not 'default')
- 2. Test file specification: `target: my_target` key in the .test.yaml file
+ 2. Eval file specification: `target: my_target` key in the .eval.yaml file
  3. Default fallback: Uses the 'default' target (original behavior)
 
- This allows test files to specify their preferred target while still allowing command-line overrides for flexibility, and maintains backward compatibility with existing workflows.
+ This allows eval files to specify their preferred target while still allowing command-line overrides for flexibility, and maintains backward compatibility with existing workflows.
 
- Output goes to `.agentv/results/{testname}_{timestamp}.jsonl` (or `.yaml`) unless `--out` is provided.
+ Output goes to `.agentv/results/{evalname}_{timestamp}.jsonl` (or `.yaml`) unless `--out` is provided.
 
  ### Tips for VS Code Copilot Evals
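The three-step target precedence listed in the hunk above can be expressed as a small Python sketch. The function name `resolve_target` and its parameters are hypothetical, chosen only for illustration; the logic follows the README's stated order and is not taken from AgentV's source.

```python
from typing import Optional


def resolve_target(cli_target: Optional[str], file_target: Optional[str]) -> str:
    """Pick the execution target name using the README's precedence order."""
    # 1. CLI flag override, when provided and not the literal 'default'
    if cli_target and cli_target != "default":
        return cli_target
    # 2. `target:` key from the eval file, when present
    if file_target:
        return file_target
    # 3. Fall back to the 'default' target
    return "default"
```

For instance, `resolve_target("vscode_projectx", "azure_base")` selects the CLI value, while `resolve_target(None, "azure_base")` falls through to the eval file's target.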
 
@@ -189,7 +197,7 @@ Environment keys (configured via targets.yaml):
 
  ## Targets and Environment Variables
 
- Execution targets in `.agentv/targets.yaml` decouple tests from providers/settings and provide flexible environment variable mapping.
+ Execution targets in `.agentv/targets.yaml` decouple evals from providers/settings and provide flexible environment variable mapping.
 
  ### Target Configuration Structure
 
@@ -251,8 +259,8 @@ Each target specifies:
  When using VS Code or other AI agents that may experience timeouts, the evaluator includes automatic retry functionality:
 
  - **Timeout detection:** Automatically detects when agents timeout
- - **Automatic retries:** When a timeout occurs, the same test case is retried up to `--max-retries` times (default: 2)
- - **Retry behavior:** Only timeouts trigger retries; other errors proceed to the next test case
+ - **Automatic retries:** When a timeout occurs, the same eval case is retried up to `--max-retries` times (default: 2)
+ - **Retry behavior:** Only timeouts trigger retries; other errors proceed to the next eval case
  - **Timeout configuration:** Use `--agent-timeout` to adjust how long to wait for agent responses
 
  Example with custom timeout settings:
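The retry rules in the hunk above (only timeouts are retried, up to `--max-retries` times, default 2) might look roughly like the following Python sketch. `AgentTimeout` and `run_with_retries` are hypothetical names introduced for illustration; how AgentV actually signals a timeout is not described in the README.

```python
from typing import Callable


class AgentTimeout(Exception):
    """Hypothetical exception standing in for an agent response timeout."""


def run_with_retries(run_case: Callable[[], str], max_retries: int = 2) -> str:
    """Run one eval case, retrying only when it times out.

    Any other exception propagates immediately, so the caller can record
    the error and move on to the next eval case.
    """
    attempt = 0
    while True:
        try:
            return run_case()
        except AgentTimeout:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the timeout
            attempt += 1
```

With the default of 2 retries, a case that keeps timing out is attempted three times in total before the timeout is reported.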
@@ -263,7 +271,7 @@ agentv eval evals/projectx/example.yaml --target vscode_projectx --agent-timeout
 
  ## How the Evals Work
 
- For each test case in a `.yaml` file:
+ For each eval case in a `.yaml` file:
 
  1. Parse YAML and collect user messages (inline text and referenced files)
  2. Extract code blocks from text for structured prompting
@@ -306,12 +314,12 @@ AgentV uses an AI-powered quality grader that:
 
  ### Summary Statistics
 
- After running all test cases, AgentV displays:
+ After running all eval cases, AgentV displays:
 
  - Mean, median, min, max scores
  - Standard deviation
  - Distribution histogram
- - Total test count and execution time
+ - Total eval count and execution time
 
  ## Architecture
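As a rough illustration of the summary statistics listed in the hunk above, the following Python sketch computes them with the standard library. It rests on several assumptions not stated in the README: scores lie in the range 0 to 1, the histogram uses ten equal-width bins, and the standard deviation is the population form. The name `summarize` is hypothetical.

```python
import statistics
from typing import Dict, List


def summarize(scores: List[float], bins: int = 10) -> Dict[str, object]:
    """Compute summary statistics for a list of eval scores in [0, 1]."""
    if not scores:
        raise ValueError("no scores to summarize")
    histogram = [0] * bins
    for score in scores:
        # Clamp the bin index so that a score of exactly 1.0 lands in the last bin
        index = max(0, min(int(score * bins), bins - 1))
        histogram[index] += 1
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "min": min(scores),
        "max": max(scores),
        "stdev": statistics.pstdev(scores),  # population form (assumption)
        "histogram": histogram,
        "count": len(scores),
    }
```

Swapping `statistics.pstdev` for `statistics.stdev` would give the sample standard deviation instead; the README does not say which one AgentV reports.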