agentv 0.2.6 → 0.2.11
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +50 -42
- package/dist/chunk-7MGIZBZG.js +15238 -0
- package/dist/chunk-7MGIZBZG.js.map +1 -0
- package/dist/{chunk-32ZAVIQY.js → chunk-JT3E7T7V.js} +582 -436
- package/dist/chunk-JT3E7T7V.js.map +1 -0
- package/dist/cli.js +1 -1
- package/dist/index.js +1 -1
- package/dist/templates/config-schema.json +27 -0
- package/dist/templates/eval-build.prompt.md +3 -3
- package/dist/templates/eval-schema.json +3 -3
- package/package.json +3 -2
- package/dist/chunk-32ZAVIQY.js.map +0 -1
package/README.md
CHANGED
@@ -74,35 +74,43 @@ You are now ready to start development. The monorepo contains:
 
 ## Quick Start
 
-###
+### Configuring Guideline Patterns
 
-
+AgentV automatically detects guideline files and treats them differently from regular file content. You can customize which files are considered guidelines using an optional `.agentv/config.yaml` configuration file.
 
-
-
-
+**Config file discovery:**
+- AgentV searches for `.agentv/config.yaml` starting from the eval file's directory
+- Walks up the directory tree to the repository root
+- Uses the first config file found (similar to how `targets.yaml` is discovered)
+- This allows you to place one config file at the project root for all evals
 
-
-agentv lint evals/test1.yaml evals/test2.yaml
+**Custom patterns** (create `.agentv/config.yaml` in same directory as your eval file):
 
-
-agentv
+```yaml
+# .agentv/config.yaml
+guideline_patterns:
+- "**/*.guide.md" # Match all .guide.md files
+- "**/guidelines/**" # Match all files in /guidelines/ dirs
+- "docs/AGENTS.md" # Match specific files
+- "**/*.rules.md" # Match by naming convention
+```
 
-
-agentv lint --strict evals/
+See [config.yaml example](docs/examples/simple/.agentv/config.yaml) for more pattern examples.
 
-
-agentv lint --json evals/
-```
+### Validating Eval Files
 
-
+Validate your eval and targets files before running them:
+
+```bash
+# Validate a single file
+agentv validate evals/my-eval.yaml
 
-
-
-
-
-
-
+# Validate multiple files
+agentv validate evals/eval1.yaml evals/eval2.yaml
+
+# Validate entire directory (recursively finds all YAML files)
+agentv validate evals/
+```
 
 **File type detection:**
 
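The config file discovery rules added in the hunk above (start in the eval file's directory, walk up to the repository root, use the first file found) can be sketched as follows. This is an illustrative sketch only, not AgentV's actual implementation: the function names `candidateConfigPaths` and `findConfig` are hypothetical, absolute POSIX-style paths are assumed, and the file-existence check is passed in as a callback so the lookup order can be seen without touching the filesystem.

```typescript
// Hypothetical helpers illustrating the lookup order described in the README.
// Assumption: absolute POSIX-style paths; not taken from the AgentV source.

// Lists candidate config paths from the eval directory up to the repo root.
function candidateConfigPaths(evalDir: string, repoRoot: string): string[] {
  if (evalDir.charAt(0) !== "/" || repoRoot.charAt(0) !== "/") {
    throw new Error("absolute POSIX paths expected");
  }
  // Drop trailing slashes, but keep "/" itself intact.
  const normalize = (p: string): string => p.replace(/\/+$/, "") || "/";
  const root = normalize(repoRoot);
  let dir = normalize(evalDir);
  const candidates: string[] = [];
  while (true) {
    candidates.push(`${dir === "/" ? "" : dir}/.agentv/config.yaml`);
    // Stop at the repository root (or at "/" if evalDir lies outside it).
    if (dir === root || dir === "/") break;
    dir = dir.slice(0, dir.lastIndexOf("/")) || "/";
  }
  return candidates;
}

// Returns the first candidate that exists, or undefined if none does.
function findConfig(
  evalDir: string,
  repoRoot: string,
  exists: (path: string) => boolean
): string | undefined {
  const candidates = candidateConfigPaths(evalDir, repoRoot);
  for (let i = 0; i < candidates.length; i++) {
    if (exists(candidates[i])) return candidates[i];
  }
  return undefined;
}
```

In a real tool the `exists` callback would typically wrap a filesystem check; injecting it here keeps the sketch independent of any runtime.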
@@ -112,7 +120,7 @@ All AgentV files must include a `$schema` field:
 # Eval files
 $schema: agentv-eval-v2
 evalcases:
-- id:
+- id: eval-1
 # ...
 
 # Targets files
@@ -126,29 +134,29 @@ Files without a `$schema` field will be rejected with a clear error message.
 
 ### Running Evals
 
-Run eval (target auto-selected from
+Run eval (target auto-selected from eval file or CLI override):
 
 ```bash
-# If your
-agentv eval "path/to/
+# If your eval.yaml contains "target: azure_base", it will be used automatically
+agentv eval "path/to/eval.yaml"
 
-# Override the
-agentv eval --target vscode_projectx "path/to/
+# Override the eval file's target with CLI flag
+agentv eval --target vscode_projectx "path/to/eval.yaml"
 ```
 
-Run a specific
+Run a specific eval case with custom targets path:
 
 ```bash
-agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --eval-id "my-
+agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --eval-id "my-eval-case" "path/to/eval.yaml"
 ```
 
 ### Command Line Options
 
-- `
-- `--target TARGET`: Execution target name from targets.yaml (overrides target specified in
+- `eval_file`: Path to eval YAML file (required, positional argument)
+- `--target TARGET`: Execution target name from targets.yaml (overrides target specified in eval file)
 - `--targets TARGETS`: Path to targets.yaml file (default: ./.agentv/targets.yaml)
-- `--eval-id EVAL_ID`: Run only the
-- `--out OUTPUT_FILE`: Output file path (default: results/{
+- `--eval-id EVAL_ID`: Run only the eval case with this specific ID
+- `--out OUTPUT_FILE`: Output file path (default: results/{evalname}_{timestamp}.jsonl)
 - `--format FORMAT`: Output format: 'jsonl' or 'yaml' (default: jsonl)
 - `--dry-run`: Run with mock model for testing
 - `--agent-timeout SECONDS`: Timeout in seconds for agent response polling (default: 120)
@@ -162,12 +170,12 @@ agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --eval-id
 The CLI determines which execution target to use with the following precedence:
 
 1. CLI flag override: `--target my_target` (when provided and not 'default')
-2.
+2. Eval file specification: `target: my_target` key in the .eval.yaml file
 3. Default fallback: Uses the 'default' target (original behavior)
 
-This allows
+This allows eval files to specify their preferred target while still allowing command-line overrides for flexibility, and maintains backward compatibility with existing workflows.
 
-Output goes to `.agentv/results/{
+Output goes to `.agentv/results/{evalname}_{timestamp}.jsonl` (or `.yaml`) unless `--out` is provided.
 
 ### Tips for VS Code Copilot Evals
 
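The three-step target precedence in the hunk above is small enough to state as a single function. The sketch below is an assumption about how such a resolver might look, not code from the AgentV package; `resolveTarget` and its parameter names are hypothetical.

```typescript
// Hypothetical resolver mirroring the README's precedence list.
// cliTarget: value of --target, if the flag was passed.
// evalFileTarget: value of the `target:` key in the eval file, if present.
function resolveTarget(
  cliTarget: string | undefined,
  evalFileTarget: string | undefined
): string {
  // 1. CLI flag wins when provided and not the literal 'default'.
  if (cliTarget && cliTarget !== "default") return cliTarget;
  // 2. Otherwise use the target named in the eval file.
  if (evalFileTarget) return evalFileTarget;
  // 3. Fall back to the 'default' target.
  return "default";
}
```

Under this reading, passing `--target default` explicitly behaves the same as omitting the flag, so an eval file's own `target:` still applies.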
@@ -189,7 +197,7 @@ Environment keys (configured via targets.yaml):
 
 ## Targets and Environment Variables
 
-Execution targets in `.agentv/targets.yaml` decouple
+Execution targets in `.agentv/targets.yaml` decouple evals from providers/settings and provide flexible environment variable mapping.
 
 ### Target Configuration Structure
 
@@ -251,8 +259,8 @@ Each target specifies:
 When using VS Code or other AI agents that may experience timeouts, the evaluator includes automatic retry functionality:
 
 - **Timeout detection:** Automatically detects when agents timeout
-- **Automatic retries:** When a timeout occurs, the same
-- **Retry behavior:** Only timeouts trigger retries; other errors proceed to the next
+- **Automatic retries:** When a timeout occurs, the same eval case is retried up to `--max-retries` times (default: 2)
+- **Retry behavior:** Only timeouts trigger retries; other errors proceed to the next eval case
 - **Timeout configuration:** Use `--agent-timeout` to adjust how long to wait for agent responses
 
 Example with custom timeout settings:
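The retry rules in the hunk above (retry only on timeout, up to `--max-retries` times, default 2) might be modelled roughly as below. This is a simplified, synchronous sketch under stated assumptions: the `Outcome` type and `runWithRetries` function are hypothetical, and the real evaluator presumably works asynchronously against an agent.

```typescript
// Hypothetical result type for one attempt at an eval case.
type Outcome<T> =
  | { kind: "ok"; value: T }
  | { kind: "timeout" }
  | { kind: "error"; message: string };

// Runs `attempt` and retries only when the outcome is a timeout.
// maxRetries counts retries, so the total attempts are at most maxRetries + 1.
function runWithRetries<T>(
  attempt: () => Outcome<T>,
  maxRetries: number = 2
): { outcome: Outcome<T>; attempts: number } {
  let attempts = 0;
  while (true) {
    const outcome = attempt();
    attempts++;
    const retriesUsed = attempts - 1;
    // Successes and non-timeout errors are returned immediately.
    if (outcome.kind !== "timeout" || retriesUsed >= maxRetries) {
      return { outcome, attempts };
    }
  }
}
```

With the default of 2 retries, an eval case that keeps timing out is attempted three times in total before the caller moves on to the next case.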
@@ -263,7 +271,7 @@ agentv eval evals/projectx/example.yaml --target vscode_projectx --agent-timeout
 
 ## How the Evals Work
 
-For each
+For each eval case in a `.yaml` file:
 
 1. Parse YAML and collect user messages (inline text and referenced files)
 2. Extract code blocks from text for structured prompting
@@ -306,12 +314,12 @@ AgentV uses an AI-powered quality grader that:
 
 ### Summary Statistics
 
-After running all
+After running all eval cases, AgentV displays:
 
 - Mean, median, min, max scores
 - Standard deviation
 - Distribution histogram
-- Total
+- Total eval count and execution time
 
 ## Architecture
 
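For reference, the summary figures listed in the hunk above (mean, median, min, max, standard deviation, count) can be computed as in the sketch below. The `summarize` function and `Summary` shape are hypothetical, and the choice of population rather than sample standard deviation is an assumption, since the README does not say which one AgentV reports.

```typescript
// Hypothetical summary shape; field names are illustrative only.
interface Summary {
  count: number;
  mean: number;
  median: number;
  min: number;
  max: number;
  stdDev: number; // population standard deviation (assumption)
}

function summarize(scores: number[]): Summary {
  if (scores.length === 0) throw new Error("no scores to summarize");
  // Sort a copy so the caller's array is left untouched.
  const sorted = scores.slice().sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((sum, x) => sum + x, 0) / n;
  const mid = Math.floor(n / 2);
  // Even-length inputs use the average of the two middle values.
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const variance = sorted.reduce((sum, x) => sum + (x - mean) * (x - mean), 0) / n;
  return {
    count: n,
    mean: mean,
    median: median,
    min: sorted[0],
    max: sorted[n - 1],
    stdDev: Math.sqrt(variance),
  };
}
```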