agentv 0.2.3 → 0.2.8
- package/README.md +67 -42
- package/dist/{chunk-S3RN2GSO.js → chunk-RLBRJX7V.js} +611 -428
- package/dist/chunk-RLBRJX7V.js.map +1 -0
- package/dist/cli.js +1 -1
- package/dist/index.js +1 -1
- package/dist/templates/config-schema.json +27 -0
- package/dist/templates/eval-build.prompt.md +3 -3
- package/dist/templates/eval-schema.json +3 -3
- package/package.json +3 -2
- package/dist/chunk-S3RN2GSO.js.map +0 -1
package/README.md
CHANGED
@@ -74,35 +74,60 @@ You are now ready to start development. The monorepo contains:
 
 ## Quick Start
 
-###
+### Configuring Guideline Patterns
 
-
+AgentV automatically detects guideline files (instructions, prompts) and treats them differently from regular file content. You can customize which files are considered guidelines using an optional `.agentv/config.yaml` configuration file.
 
-
-
-
+**Config file discovery:**
+- AgentV searches for `.agentv/config.yaml` starting from the eval file's directory
+- Walks up the directory tree to the repository root
+- Uses the first config file found (similar to how `targets.yaml` is discovered)
+- This allows you to place one config file at the project root for all evals
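The discovery steps listed above can be sketched roughly as follows. This is a minimal illustration and not AgentV's actual implementation: `findConfigPath` is a hypothetical name, paths are assumed to use forward slashes, and the `exists` callback is an assumed stand-in for a file-system check so that the sketch needs no imports.

```typescript
// Hypothetical helper (not part of AgentV's API): walk from the eval file's
// directory up to the repository root and return the first config file found.
function findConfigPath(
  evalDir: string,
  repoRoot: string,
  exists: (path: string) => boolean,
): string | undefined {
  const root = repoRoot.replace(/\/+$/, ""); // drop trailing slashes
  let dir = evalDir.replace(/\/+$/, "");
  while (true) {
    const candidate = dir + "/.agentv/config.yaml";
    if (exists(candidate)) return candidate; // first match wins
    if (dir === root) return undefined; // reached the repository root
    const cut = dir.lastIndexOf("/");
    if (cut < 0 || cut < root.length) return undefined; // left the repository
    dir = dir.slice(0, cut); // move to the parent directory
  }
}
```

Because the nearest config file wins, a single `.agentv/config.yaml` at the project root covers every eval beneath it unless a closer one exists.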
 
-
-agentv lint evals/test1.yaml evals/test2.yaml
+**Default patterns** (used when `.agentv/config.yaml` is absent):
 
-
-
+```yaml
+guideline_patterns:
+- "**/*.instructions.md"
+- "**/instructions/**"
+- "**/*.prompt.md"
+- "**/prompts/**"
+```
 
-
-agentv lint --strict evals/
+**Custom patterns** (create `.agentv/config.yaml` in same directory as your eval file):
 
-
-agentv
+```yaml
+# .agentv/config.yaml
+guideline_patterns:
+- "**/*.guide.md" # Match all .guide.md files
+- "**/guidelines/**" # Match all files in /guidelines/ dirs
+- "docs/AGENTS.md" # Match specific files
+- "**/*.rules.md" # Match by naming convention
 ```
 
-**
+**How it works:**
+
+- Files matching guideline patterns are loaded as separate guideline context
+- Files NOT matching are treated as regular file content in user messages
+- Patterns use standard glob syntax (via [micromatch](https://github.com/micromatch/micromatch))
+- Paths are normalized to forward slashes for cross-platform compatibility
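As a rough illustration of the classification described above, the sketch below normalizes a path and tests it against the default patterns. It deliberately does not call micromatch: `globToRegExp` is a simplified, hypothetical stand-in that understands only `**/`, `**`, and `*`, so its behavior may differ from micromatch in edge cases. `isGuidelineFile` is likewise an assumed name.

```typescript
// Simplified stand-in for a glob matcher; supports only `**/`, `**`, and `*`.
function globToRegExp(pattern: string): RegExp {
  let source = "";
  let i = 0;
  while (i < pattern.length) {
    if (pattern.slice(i, i + 3) === "**/") {
      source += "(?:.*/)?"; // zero or more leading directories
      i += 3;
    } else if (pattern.slice(i, i + 2) === "**") {
      source += ".*"; // anything, including "/"
      i += 2;
    } else if (pattern.charAt(i) === "*") {
      source += "[^/]*"; // anything within a single path segment
      i += 1;
    } else {
      // Escape regex metacharacters in literal characters.
      source += pattern.charAt(i).replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + source + "$");
}

// Hypothetical helper: true when the path matches any guideline pattern.
function isGuidelineFile(filePath: string, patterns: string[]): boolean {
  const normalized = filePath.replace(/\\/g, "/"); // backslashes to forward slashes
  return patterns.some((p) => globToRegExp(p).test(normalized));
}

const defaultPatterns = [
  "**/*.instructions.md",
  "**/instructions/**",
  "**/*.prompt.md",
  "**/prompts/**",
];
```

Under these assumptions, `isGuidelineFile("src\\prompts\\review.md", defaultPatterns)` is true because the path is normalized before matching, while `isGuidelineFile("src/main.ts", defaultPatterns)` is false.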
+
+See [config.yaml example](docs/examples/simple/.agentv/config.yaml) for more pattern examples.
 
-
-
-
-
-
-
+### Validating Eval Files
+
+Validate your eval and targets files before running them:
+
+```bash
+# Validate a single file
+agentv validate evals/my-eval.yaml
+
+# Validate multiple files
+agentv validate evals/eval1.yaml evals/eval2.yaml
+
+# Validate entire directory (recursively finds all YAML files)
+agentv validate evals/
+```
 
 **File type detection:**
 
@@ -112,7 +137,7 @@ All AgentV files must include a `$schema` field:
 # Eval files
 $schema: agentv-eval-v2
 evalcases:
-- id:
+- id: eval-1
 # ...
 
 # Targets files
@@ -126,29 +151,29 @@ Files without a `$schema` field will be rejected with a clear error message.
 
 ### Running Evals
 
-Run eval (target auto-selected from
+Run eval (target auto-selected from eval file or CLI override):
 
 ```bash
-# If your
-agentv eval "path/to/
+# If your eval.yaml contains "target: azure_base", it will be used automatically
+agentv eval "path/to/eval.yaml"
 
-# Override the
-agentv eval --target vscode_projectx "path/to/
+# Override the eval file's target with CLI flag
+agentv eval --target vscode_projectx "path/to/eval.yaml"
 ```
 
-Run a specific
+Run a specific eval case with custom targets path:
 
 ```bash
-agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --
+agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --eval-id "my-eval-case" "path/to/eval.yaml"
 ```
 
 ### Command Line Options
 
-- `
-- `--target TARGET`: Execution target name from targets.yaml (overrides target specified in
+- `eval_file`: Path to eval YAML file (required, positional argument)
+- `--target TARGET`: Execution target name from targets.yaml (overrides target specified in eval file)
 - `--targets TARGETS`: Path to targets.yaml file (default: ./.agentv/targets.yaml)
-- `--
-- `--out OUTPUT_FILE`: Output file path (default: results/{
+- `--eval-id EVAL_ID`: Run only the eval case with this specific ID
+- `--out OUTPUT_FILE`: Output file path (default: results/{evalname}_{timestamp}.jsonl)
 - `--format FORMAT`: Output format: 'jsonl' or 'yaml' (default: jsonl)
 - `--dry-run`: Run with mock model for testing
 - `--agent-timeout SECONDS`: Timeout in seconds for agent response polling (default: 120)
@@ -162,12 +187,12 @@ agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --test-id
 The CLI determines which execution target to use with the following precedence:
 
 1. CLI flag override: `--target my_target` (when provided and not 'default')
-2.
+2. Eval file specification: `target: my_target` key in the .eval.yaml file
 3. Default fallback: Uses the 'default' target (original behavior)
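The three-step precedence above can be expressed as a small pure function. This is only a sketch of the documented rule: `resolveTarget` and its parameter names are hypothetical and are not taken from the AgentV source.

```typescript
// Hypothetical helper: pick the execution target using the documented precedence.
function resolveTarget(cliTarget?: string, evalFileTarget?: string): string {
  // 1. A CLI flag wins when provided and not the literal 'default'.
  if (cliTarget && cliTarget !== "default") return cliTarget;
  // 2. Otherwise use the `target:` key from the eval file, if present.
  if (evalFileTarget) return evalFileTarget;
  // 3. Fall back to the 'default' target.
  return "default";
}
```

For example, `resolveTarget("vscode_projectx", "azure_base")` yields `"vscode_projectx"`, while `resolveTarget(undefined, "azure_base")` yields `"azure_base"`.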
 
-This allows
+This allows eval files to specify their preferred target while still allowing command-line overrides for flexibility, and maintains backward compatibility with existing workflows.
 
-Output goes to `.agentv/results/{
+Output goes to `.agentv/results/{evalname}_{timestamp}.jsonl` (or `.yaml`) unless `--out` is provided.
 
 ### Tips for VS Code Copilot Evals
 
@@ -189,7 +214,7 @@ Environment keys (configured via targets.yaml):
 
 ## Targets and Environment Variables
 
-Execution targets in `.agentv/targets.yaml` decouple
+Execution targets in `.agentv/targets.yaml` decouple evals from providers/settings and provide flexible environment variable mapping.
 
 ### Target Configuration Structure
 
@@ -251,8 +276,8 @@ Each target specifies:
 When using VS Code or other AI agents that may experience timeouts, the evaluator includes automatic retry functionality:
 
 - **Timeout detection:** Automatically detects when agents timeout
-- **Automatic retries:** When a timeout occurs, the same
-- **Retry behavior:** Only timeouts trigger retries; other errors proceed to the next
+- **Automatic retries:** When a timeout occurs, the same eval case is retried up to `--max-retries` times (default: 2)
+- **Retry behavior:** Only timeouts trigger retries; other errors proceed to the next eval case
 - **Timeout configuration:** Use `--agent-timeout` to adjust how long to wait for agent responses
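The retry rules above might look roughly like the loop below. It is a simplified sketch, not the evaluator's code: the `Outcome` type and `runWithRetries` are hypothetical, the runner is synchronous for brevity (real agent calls would presumably be asynchronous), and it assumes the retry limit counts attempts after the first one.

```typescript
// Hypothetical result of running one eval case once.
type Outcome =
  | { kind: "ok"; answer: string }
  | { kind: "timeout" }
  | { kind: "error"; message: string };

// Retry only on timeouts, up to `maxRetries` extra attempts (assumed default: 2).
function runWithRetries(
  runOnce: () => Outcome,
  maxRetries: number = 2,
): { outcome: Outcome; attempts: number } {
  let outcome = runOnce();
  let attempts = 1;
  while (outcome.kind === "timeout" && attempts <= maxRetries) {
    outcome = runOnce(); // same eval case, tried again
    attempts += 1;
  }
  // Non-timeout errors are returned immediately so the caller can move on.
  return { outcome, attempts };
}
```

With the default limit, a case that always times out is attempted three times in total, and a case that fails with any other error is attempted once.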
 
 Example with custom timeout settings:
@@ -263,7 +288,7 @@ agentv eval evals/projectx/example.yaml --target vscode_projectx --agent-timeout
 
 ## How the Evals Work
 
-For each
+For each eval case in a `.yaml` file:
 
 1. Parse YAML and collect user messages (inline text and referenced files)
 2. Extract code blocks from text for structured prompting
@@ -296,7 +321,7 @@ AgentV uses an AI-powered quality grader that:
 **JSONL format (default):**
 
 - One JSON object per line (newline-delimited)
-- Fields: `
+- Fields: `eval_id`, `score`, `hits`, `misses`, `model_answer`, `expected_aspect_count`, `target`, `timestamp`, `reasoning`, `raw_request`, `grader_raw_request`
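A results file in this format could be read with a few lines of code. The sketch below is illustrative only: `parseJsonl` and `ResultRecord` are hypothetical names, and because the README lists field names but not their value types, each record is kept loosely typed.

```typescript
// Field names come from the list above; value types are not documented there,
// so every field is treated as `unknown` in this sketch.
type ResultRecord = { [field: string]: unknown };

// Hypothetical helper: parse newline-delimited JSON, skipping blank lines.
function parseJsonl(text: string): ResultRecord[] {
  const records: ResultRecord[] = [];
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue; // tolerate a trailing newline
    records.push(JSON.parse(line) as ResultRecord);
  }
  return records;
}
```

Each returned object corresponds to one eval case, so a caller could, for instance, read `record["eval_id"]` and `record["score"]` to build a report.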
 
 **YAML format (with `--format yaml`):**
 
@@ -306,12 +331,12 @@ AgentV uses an AI-powered quality grader that:
 
 ### Summary Statistics
 
-After running all
+After running all eval cases, AgentV displays:
 
 - Mean, median, min, max scores
 - Standard deviation
 - Distribution histogram
-- Total
+- Total eval count and execution time
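For readers who want to reproduce these figures from a results file, the sketch below computes the scalar statistics from a list of scores. It is an assumption-laden illustration: `summarize` and `ScoreSummary` are hypothetical names, and the population form of the standard deviation is assumed because the README does not say which form AgentV uses. The histogram is omitted.

```typescript
interface ScoreSummary {
  count: number;
  mean: number;
  median: number;
  min: number;
  max: number;
  stdDev: number;
}

// Hypothetical helper: summary statistics over eval scores.
function summarize(scores: number[]): ScoreSummary {
  if (scores.length === 0) throw new Error("no scores to summarize");
  const sorted = scores.slice().sort((a, b) => a - b); // numeric ascending copy
  const n = sorted.length;
  const at = (i: number): number => sorted[i] ?? NaN;
  const mean = sorted.reduce((sum, s) => sum + s, 0) / n;
  const mid = Math.floor(n / 2);
  // Odd count: middle element; even count: average of the two middle elements.
  const median = n % 2 === 1 ? at(mid) : (at(mid - 1) + at(mid)) / 2;
  // Population variance (divide by n); an assumption, see the note above.
  const variance = sorted.reduce((sum, s) => sum + (s - mean) * (s - mean), 0) / n;
  return { count: n, mean, median, min: at(0), max: at(n - 1), stdDev: Math.sqrt(variance) };
}
```

Switching to the sample standard deviation would only require dividing by `n - 1` instead of `n` when at least two scores are present.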
 
 ## Architecture
 