agentv 0.2.6 → 0.2.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -74,35 +74,60 @@ You are now ready to start development. The monorepo contains:
 
  ## Quick Start
 
- ### Linting Eval Files
+ ### Configuring Guideline Patterns
 
- Validate your eval and targets files before running them:
+ AgentV automatically detects guideline files (instructions, prompts) and treats them differently from regular file content. You can customize which files are considered guidelines using an optional `.agentv/config.yaml` configuration file.
 
- ```bash
- # Lint a single file
- agentv lint evals/my-test.yaml
+ **Config file discovery:**
+ - AgentV searches for `.agentv/config.yaml` starting from the eval file's directory
+ - Walks up the directory tree to the repository root
+ - Uses the first config file found (similar to how `targets.yaml` is discovered)
+ - This allows you to place one config file at the project root for all evals
 
- # Lint multiple files
- agentv lint evals/test1.yaml evals/test2.yaml
+ **Default patterns** (used when `.agentv/config.yaml` is absent):
 
- # Lint entire directory (recursively finds all YAML files)
- agentv lint evals/
+ ```yaml
+ guideline_patterns:
+ - "**/*.instructions.md"
+ - "**/instructions/**"
+ - "**/*.prompt.md"
+ - "**/prompts/**"
+ ```
 
- # Enable strict mode for additional checks
- agentv lint --strict evals/
+ **Custom patterns** (create `.agentv/config.yaml` in same directory as your eval file):
 
- # Output results in JSON format
- agentv lint --json evals/
+ ```yaml
+ # .agentv/config.yaml
+ guideline_patterns:
+ - "**/*.guide.md" # Match all .guide.md files
+ - "**/guidelines/**" # Match all files in /guidelines/ dirs
+ - "docs/AGENTS.md" # Match specific files
+ - "**/*.rules.md" # Match by naming convention
  ```
 
- **Linter features:**
+ **How it works:**
+
+ - Files matching guideline patterns are loaded as separate guideline context
+ - Files NOT matching are treated as regular file content in user messages
+ - Patterns use standard glob syntax (via [micromatch](https://github.com/micromatch/micromatch))
+ - Paths are normalized to forward slashes for cross-platform compatibility
+
+ See [config.yaml example](docs/examples/simple/.agentv/config.yaml) for more pattern examples.
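The config discovery and pattern matching described in this hunk can be approximated in a few lines. The TypeScript below is a minimal sketch under stated assumptions, not AgentV's code: `findAgentvConfig`, `isGuidelineFile`, and `globToRegExp` are hypothetical names; the repository root is passed in rather than detected; and a small hand-written glob-to-regex converter stands in for micromatch, supporting only `**`, `*`, and `?` (no brace expansion, character classes, or micromatch's dotfile rules).

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Default guideline patterns, copied from the README text above.
const DEFAULT_GUIDELINE_PATTERNS = [
  "**/*.instructions.md",
  "**/instructions/**",
  "**/*.prompt.md",
  "**/prompts/**",
];

// Hypothetical helper: starting at the eval file's directory, walk up to
// repoRoot and return the first .agentv/config.yaml that exists.
// `exists` is injectable so the lookup can be tested without touching disk.
function findAgentvConfig(
  evalFile: string,
  repoRoot: string,
  exists: (p: string) => boolean = fs.existsSync,
): string | undefined {
  const root = path.resolve(repoRoot);
  let dir = path.dirname(path.resolve(evalFile));
  while (true) {
    const candidate = path.join(dir, ".agentv", "config.yaml");
    if (exists(candidate)) return candidate;
    const parent = path.dirname(dir);
    // Stop after checking the repository root (or the filesystem root).
    if (dir === root || parent === dir) return undefined;
    dir = parent;
  }
}

// Simplified stand-in for micromatch (an assumption, not its real behavior):
// "**/" spans zero or more directories, a trailing "**" spans everything
// below, and "*" and "?" stay within a single path segment.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*" && glob[i + 1] === "*") {
      const atSegmentStart = i === 0 || glob[i - 1] === "/";
      if (atSegmentStart && glob[i + 2] === "/") {
        source += "(?:.*/)?";
        i += 2; // consume the second "*" and the "/"
      } else {
        source += ".*";
        i += 1; // consume the second "*"
      }
    } else if (ch === "*") {
      source += "[^/]*";
    } else if (ch === "?") {
      source += "[^/]";
    } else {
      // Escape regex metacharacters so literal characters match literally.
      source += ch.replace(/[.+^${}()|\[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

// A file counts as a guideline if its forward-slash path matches any pattern.
function isGuidelineFile(
  filePath: string,
  patterns: string[] = DEFAULT_GUIDELINE_PATTERNS,
): boolean {
  const normalized = filePath.replace(/\\/g, "/");
  return patterns.some((p) => globToRegExp(p).test(normalized));
}
```

Under these assumptions, `isGuidelineFile("src/prompts/review.md")` is `true` and `isGuidelineFile("src/main.ts")` is `false`. Returning the nearest existing file mirrors the "uses the first config file found" rule.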
 
- - Validates `$schema` field is present and correct
- - Checks required fields and structure for eval and targets files
- - Validates file references exist and are accessible
- - Provides clear error messages with file path and location context
- - Exits with non-zero code on validation failures (CI-friendly)
- - Supports strict mode for additional checks (e.g., non-empty file content)
+ ### Validating Eval Files
+
+ Validate your eval and targets files before running them:
+
+ ```bash
+ # Validate a single file
+ agentv validate evals/my-eval.yaml
+
+ # Validate multiple files
+ agentv validate evals/eval1.yaml evals/eval2.yaml
+
+ # Validate entire directory (recursively finds all YAML files)
+ agentv validate evals/
+ ```
 
  **File type detection:**
 
@@ -112,7 +137,7 @@ All AgentV files must include a `$schema` field:
  # Eval files
  $schema: agentv-eval-v2
  evalcases:
- - id: test-1
+ - id: eval-1
  # ...
 
  # Targets files
@@ -126,29 +151,29 @@ Files without a `$schema` field will be rejected with a clear error message.
 
  ### Running Evals
 
- Run eval (target auto-selected from test file or CLI override):
+ Run eval (target auto-selected from eval file or CLI override):
 
  ```bash
- # If your test.yaml contains "target: azure_base", it will be used automatically
- agentv eval "path/to/test.yaml"
+ # If your eval.yaml contains "target: azure_base", it will be used automatically
+ agentv eval "path/to/eval.yaml"
 
- # Override the test file's target with CLI flag
- agentv eval --target vscode_projectx "path/to/test.yaml"
+ # Override the eval file's target with CLI flag
+ agentv eval --target vscode_projectx "path/to/eval.yaml"
  ```
 
- Run a specific test case with custom targets path:
+ Run a specific eval case with custom targets path:
 
  ```bash
- agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --eval-id "my-test-case" "path/to/test.yaml"
+ agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --eval-id "my-eval-case" "path/to/eval.yaml"
  ```
 
  ### Command Line Options
 
- - `test_file`: Path to test YAML file (required, positional argument)
- - `--target TARGET`: Execution target name from targets.yaml (overrides target specified in test file)
+ - `eval_file`: Path to eval YAML file (required, positional argument)
+ - `--target TARGET`: Execution target name from targets.yaml (overrides target specified in eval file)
  - `--targets TARGETS`: Path to targets.yaml file (default: ./.agentv/targets.yaml)
- - `--eval-id EVAL_ID`: Run only the test case with this specific ID
- - `--out OUTPUT_FILE`: Output file path (default: results/{testname}_{timestamp}.jsonl)
+ - `--eval-id EVAL_ID`: Run only the eval case with this specific ID
+ - `--out OUTPUT_FILE`: Output file path (default: results/{evalname}_{timestamp}.jsonl)
  - `--format FORMAT`: Output format: 'jsonl' or 'yaml' (default: jsonl)
  - `--dry-run`: Run with mock model for testing
  - `--agent-timeout SECONDS`: Timeout in seconds for agent response polling (default: 120)
@@ -162,12 +187,12 @@ agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --eval-id
  The CLI determines which execution target to use with the following precedence:
 
  1. CLI flag override: `--target my_target` (when provided and not 'default')
- 2. Test file specification: `target: my_target` key in the .test.yaml file
+ 2. Eval file specification: `target: my_target` key in the .eval.yaml file
  3. Default fallback: Uses the 'default' target (original behavior)
 
- This allows test files to specify their preferred target while still allowing command-line overrides for flexibility, and maintains backward compatibility with existing workflows.
+ This allows eval files to specify their preferred target while still allowing command-line overrides for flexibility, and maintains backward compatibility with existing workflows.
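The three-step precedence above reduces to a small function. This TypeScript sketch is illustrative only: `resolveTarget` is a hypothetical name, and it assumes the eval file's `target:` value has already been read from YAML (or is `undefined` when the key is absent).

```typescript
// Hypothetical helper mirroring the documented target precedence.
function resolveTarget(cliTarget?: string, evalFileTarget?: string): string {
  // 1. CLI flag wins when it is provided and is not "default".
  if (cliTarget !== undefined && cliTarget !== "default") {
    return cliTarget;
  }
  // 2. Otherwise use the `target:` key from the eval file, if present.
  if (evalFileTarget !== undefined && evalFileTarget !== "") {
    return evalFileTarget;
  }
  // 3. Fall back to the "default" target.
  return "default";
}
```

Note that `resolveTarget("default", "azure_base")` returns `"azure_base"`: an explicit `--target default` does not override the eval file, matching the "when provided and not 'default'" condition in step 1.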
 
- Output goes to `.agentv/results/{testname}_{timestamp}.jsonl` (or `.yaml`) unless `--out` is provided.
+ Output goes to `.agentv/results/{evalname}_{timestamp}.jsonl` (or `.yaml`) unless `--out` is provided.
 
  ### Tips for VS Code Copilot Evals
 
@@ -189,7 +214,7 @@ Environment keys (configured via targets.yaml):
 
  ## Targets and Environment Variables
 
- Execution targets in `.agentv/targets.yaml` decouple tests from providers/settings and provide flexible environment variable mapping.
+ Execution targets in `.agentv/targets.yaml` decouple evals from providers/settings and provide flexible environment variable mapping.
 
  ### Target Configuration Structure
 
@@ -251,8 +276,8 @@ Each target specifies:
  When using VS Code or other AI agents that may experience timeouts, the evaluator includes automatic retry functionality:
 
  - **Timeout detection:** Automatically detects when agents timeout
- - **Automatic retries:** When a timeout occurs, the same test case is retried up to `--max-retries` times (default: 2)
- - **Retry behavior:** Only timeouts trigger retries; other errors proceed to the next test case
+ - **Automatic retries:** When a timeout occurs, the same eval case is retried up to `--max-retries` times (default: 2)
+ - **Retry behavior:** Only timeouts trigger retries; other errors proceed to the next eval case
  - **Timeout configuration:** Use `--agent-timeout` to adjust how long to wait for agent responses
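The retry rules above can be expressed as a loop that retries only on timeouts. This is a minimal TypeScript sketch, not AgentV's implementation: it assumes a hypothetical `AgentTimeoutError` class marks timeouts, `runWithTimeoutRetries` and `runEvalCase` are hypothetical names, and it reads "retried up to `--max-retries` times" as one initial attempt plus at most `maxRetries` retries.

```typescript
// Hypothetical error type used to mark agent timeouts.
class AgentTimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "AgentTimeoutError";
    // Keeps `instanceof` working when compiled to ES5.
    Object.setPrototypeOf(this, AgentTimeoutError.prototype);
  }
}

type EvalOutcome<T> =
  | { ok: true; value: T; attempts: number }
  | { ok: false; error: unknown; attempts: number };

// Runs one eval case. Timeouts are retried up to maxRetries times; any other
// error is returned immediately so the caller can move on to the next case.
async function runWithTimeoutRetries<T>(
  runEvalCase: () => Promise<T>,
  maxRetries = 2,
): Promise<EvalOutcome<T>> {
  let attempts = 0;
  while (true) {
    attempts++;
    try {
      const value = await runEvalCase();
      return { ok: true, value, attempts };
    } catch (error) {
      const canRetry = error instanceof AgentTimeoutError && attempts <= maxRetries;
      if (!canRetry) return { ok: false, error, attempts };
    }
  }
}
```

Returning an outcome instead of rethrowing keeps the "proceed to the next eval case" decision with the caller, which fits a runner that loops over many cases.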
 
  Example with custom timeout settings:
@@ -263,7 +288,7 @@ agentv eval evals/projectx/example.yaml --target vscode_projectx --agent-timeout
 
  ## How the Evals Work
 
- For each test case in a `.yaml` file:
+ For each eval case in a `.yaml` file:
 
  1. Parse YAML and collect user messages (inline text and referenced files)
  2. Extract code blocks from text for structured prompting
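Step 2 does not say how code blocks are recognized. Assuming they are Markdown fenced blocks, a minimal TypeScript sketch of the extraction could look like the following; `extractCodeBlocks` and `CodeBlock` are hypothetical names, and the regex ignores edge cases such as `~~~` fences, indented fences, and backtick fences appearing inside a block body.

```typescript
interface CodeBlock {
  language: string; // language tag after the opening fence, or "" if none
  code: string;
}

// Three backticks, written as escapes so this example contains no literal fence.
const FENCE = "\u0060\u0060\u0060";

function extractCodeBlocks(text: string): CodeBlock[] {
  // Opening fence with optional language tag, lazily matched body, closing fence.
  const pattern = new RegExp(FENCE + "([\\w+-]*)[^\\n]*\\n([\\s\\S]*?)\\n?" + FENCE, "g");
  const blocks: CodeBlock[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    blocks.push({ language: match[1], code: match[2] });
  }
  return blocks;
}
```

The lazy body match stops at the first closing fence, so consecutive blocks in one message are returned separately and in order.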
@@ -306,12 +331,12 @@ AgentV uses an AI-powered quality grader that:
 
  ### Summary Statistics
 
- After running all test cases, AgentV displays:
+ After running all eval cases, AgentV displays:
 
  - Mean, median, min, max scores
  - Standard deviation
  - Distribution histogram
- - Total test count and execution time
+ - Total eval count and execution time
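The statistics listed above are standard; the TypeScript sketch below shows one way to compute them. It assumes scores lie in [0, 1], uses the population standard deviation, and defaults to a hypothetical five equal-width histogram bins; AgentV's actual score range, variance convention, and bin layout are not stated in this diff. `summarize` and `SummaryStats` are hypothetical names.

```typescript
interface SummaryStats {
  count: number;
  mean: number;
  median: number;
  min: number;
  max: number;
  stdDev: number; // population standard deviation (divides by n)
  histogram: number[]; // counts per equal-width bin over [0, 1]
}

function summarize(scores: number[], bins = 5): SummaryStats {
  if (scores.length === 0) throw new Error("summarize: no scores");
  const sorted = scores.slice().sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((sum, x) => sum + x, 0) / n;
  const mid = Math.floor(n / 2);
  // Odd count: middle element; even count: average of the two middle elements.
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const variance = sorted.reduce((sum, x) => sum + (x - mean) * (x - mean), 0) / n;

  const histogram: number[] = [];
  for (let i = 0; i < bins; i++) histogram.push(0);
  for (const x of sorted) {
    // Clamp so a perfect score of 1 lands in the last bin rather than past it.
    const bin = Math.min(bins - 1, Math.max(0, Math.floor(x * bins)));
    histogram[bin]++;
  }

  return {
    count: n,
    mean,
    median,
    min: sorted[0],
    max: sorted[n - 1],
    stdDev: Math.sqrt(variance),
    histogram,
  };
}
```

For example, `summarize([0, 0.25, 0.5, 0.75, 1])` gives a mean and median of 0.5 and a population standard deviation of about 0.354.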
 
  ## Architecture