agentv 0.5.1 → 0.6.1
- package/README.md +19 -145
- package/dist/{chunk-HPH4YWGU.js → chunk-GURDWEMI.js} +4358 -1081
- package/dist/chunk-GURDWEMI.js.map +1 -0
- package/dist/cli.js +1 -1
- package/dist/index.js +1 -1
- package/dist/templates/agentv/config.yaml +2 -3
- package/dist/templates/agentv/targets.yaml +31 -31
- package/package.json +5 -3
- package/dist/chunk-HPH4YWGU.js.map +0 -1
package/README.md
CHANGED
|
@@ -64,39 +64,17 @@ You are now ready to start development. The monorepo contains:
|
|
|
64
64
|
|
|
65
65
|
### Environment Setup
|
|
66
66
|
|
|
67
|
-
1.
|
|
68
|
-
-
|
|
69
|
-
-
|
|
67
|
+
1. Initialize your workspace:
|
|
68
|
+
- Run `agentv init` at the root of your repository
|
|
69
|
+
- This command automatically sets up the `.agentv/` directory structure and configuration files
|
|
70
70
|
|
|
71
|
-
2.
|
|
72
|
-
-
|
|
73
|
-
-
|
|
71
|
+
2. Configure environment variables:
|
|
72
|
+
- The init command creates a `.env.template` file in your project root
|
|
73
|
+
- Copy `.env.template` to `.env` and fill in your API keys, endpoints, and other configuration values
|
|
74
|
+
- Update the environment variable names in `.agentv/targets.yaml` to match those defined in your `.env` file
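A mismatch between the names in `.env` and the names referenced from `.agentv/targets.yaml` is easy to miss. The sketch below is one possible check, not part of AgentV: it parses `.env`-style text and reports which required names are absent. The helper names and the example variable names (`AZURE_KEY`, `ENDPOINT`, `MODEL`) are hypothetical, and the required names are supplied by hand because the `targets.yaml` schema is not shown here.

```python
import re

# Hypothetical helper: collect variable names from .env-style text.
def parse_env_names(text: str) -> set[str]:
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, _value = line.partition("=")
        key = key.strip()
        # Keep only lines that look like NAME=value
        if sep and re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", key):
            names.add(key)
    return names

# Hypothetical helper: names you expect targets.yaml to reference
# but that the .env text does not define.
def missing_names(env_text: str, required: list[str]) -> list[str]:
    return sorted(set(required) - parse_env_names(env_text))
```

In practice you would read the text with `Path(".env").read_text()` and pass the variable names you entered in `.agentv/targets.yaml`.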
|
|
74
75
|
|
|
75
76
|
## Quick Start
|
|
76
77
|
|
|
77
|
-
### Configuring Guideline Patterns
|
|
78
|
-
|
|
79
|
-
AgentV automatically detects guideline files and treats them differently from regular file content. You can customize which files are considered guidelines using an optional `.agentv/config.yaml` configuration file.
|
|
80
|
-
|
|
81
|
-
**Config file discovery:**
|
|
82
|
-
- AgentV searches for `.agentv/config.yaml` starting from the eval file's directory
|
|
83
|
-
- Walks up the directory tree to the repository root
|
|
84
|
-
- Uses the first config file found (similar to how `targets.yaml` is discovered)
|
|
85
|
-
- This allows you to place one config file at the project root for all evals
|
|
86
|
-
|
|
87
|
-
**Custom patterns** (create `.agentv/config.yaml` in same directory as your eval file):
|
|
88
|
-
|
|
89
|
-
```yaml
|
|
90
|
-
# .agentv/config.yaml
|
|
91
|
-
guideline_patterns:
|
|
92
|
-
- "**/*.guide.md" # Match all .guide.md files
|
|
93
|
-
- "**/guidelines/**" # Match all files in /guidelines/ dirs
|
|
94
|
-
- "docs/AGENTS.md" # Match specific files
|
|
95
|
-
- "**/*.rules.md" # Match by naming convention
|
|
96
|
-
```
|
|
97
|
-
|
|
98
|
-
See [config.yaml example](docs/examples/simple/.agentv/config.yaml) for more pattern examples.
|
|
99
|
-
|
|
100
78
|
### Validating Eval Files
|
|
101
79
|
|
|
102
80
|
Validate your eval and targets files before running them:
|
|
@@ -142,6 +120,9 @@ agentv eval "path/to/eval.yaml"
|
|
|
142
120
|
|
|
143
121
|
# Override the eval file's target with CLI flag
|
|
144
122
|
agentv eval --target vscode_projectx "path/to/eval.yaml"
|
|
123
|
+
|
|
124
|
+
# Run multiple evals via glob
|
|
125
|
+
agentv eval "path/to/evals/**/*.yaml"
|
|
145
126
|
```
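Before running a glob such as `path/to/evals/**/*.yaml`, it can help to preview which files it would select. The sketch below uses Python's standard `glob` module for that preview. It is an assumption that AgentV's own glob matching follows the same rules, so treat the result as an approximation; `expand_eval_paths` is a hypothetical helper name.

```python
import glob

# Hypothetical helper: expand patterns with Python's glob rules,
# which may differ from the matching AgentV performs internally.
def expand_eval_paths(patterns: list[str]) -> list[str]:
    seen: dict[str, None] = {}
    for pattern in patterns:
        # recursive=True lets "**" match zero or more directories
        for match in sorted(glob.glob(pattern, recursive=True)):
            seen.setdefault(match, None)  # keep first occurrence, drop duplicates
    return list(seen)
```

Calling `expand_eval_paths(["path/to/evals/**/*.yaml"])` returns the matching paths in sorted order per pattern.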
|
|
146
127
|
|
|
147
128
|
Run a specific eval case with custom targets path:
|
|
@@ -152,17 +133,18 @@ agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --eval-id
|
|
|
152
133
|
|
|
153
134
|
### Command Line Options
|
|
154
135
|
|
|
155
|
-
- `
|
|
136
|
+
- `eval_paths...`: Path(s) or glob(s) to eval YAML files (required; e.g., `evals/**/*.yaml`)
|
|
156
137
|
- `--target TARGET`: Execution target name from targets.yaml (overrides target specified in eval file)
|
|
157
138
|
- `--targets TARGETS`: Path to targets.yaml file (default: ./.agentv/targets.yaml)
|
|
158
139
|
- `--eval-id EVAL_ID`: Run only the eval case with this specific ID
|
|
159
|
-
- `--out OUTPUT_FILE`: Output file path (default: results/
|
|
160
|
-
- `--format FORMAT`: Output format: 'jsonl' or 'yaml' (default: jsonl)
|
|
140
|
+
- `--out OUTPUT_FILE`: Output file path (default: .agentv/results/eval_<timestamp>.jsonl)
|
|
141
|
+
- `--output-format FORMAT`: Output format: 'jsonl' or 'yaml' (default: jsonl)
|
|
161
142
|
- `--dry-run`: Run with mock model for testing
|
|
162
143
|
- `--agent-timeout SECONDS`: Timeout in seconds for agent response polling (default: 120)
|
|
163
144
|
- `--max-retries COUNT`: Maximum number of retries for timeout cases (default: 2)
|
|
164
145
|
- `--cache`: Enable caching of LLM responses (default: disabled)
|
|
165
146
|
- `--dump-prompts`: Save all prompts to `.agentv/prompts/` directory
|
|
147
|
+
- `--workers COUNT`: Parallel workers for eval cases (default: 3; target `workers` setting used when provided)
|
|
166
148
|
- `--verbose`: Verbose output
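When `agentv eval` is driven from a script, the options above can be assembled into an argument list. The following is a minimal sketch rather than an official wrapper: the function name `build_eval_command` and its keyword names are hypothetical, and only flags listed in this section are used.

```python
# Hypothetical helper: build an argv list for `agentv eval`
# from the flags documented in the 0.6.1 README.
def build_eval_command(eval_paths, *, target=None, targets=None, eval_id=None,
                       out=None, output_format=None, agent_timeout=None,
                       max_retries=None, workers=None, dry_run=False,
                       cache=False, dump_prompts=False, verbose=False):
    if not eval_paths:
        raise ValueError("at least one eval path or glob is required")
    cmd = ["agentv", "eval"]
    valued = [
        ("--target", target),
        ("--targets", targets),
        ("--eval-id", eval_id),
        ("--out", out),
        ("--output-format", output_format),
        ("--agent-timeout", agent_timeout),
        ("--max-retries", max_retries),
        ("--workers", workers),
    ]
    for flag, value in valued:
        if value is not None:  # omit unset options so CLI defaults apply
            cmd += [flag, str(value)]
    switches = [("--dry-run", dry_run), ("--cache", cache),
                ("--dump-prompts", dump_prompts), ("--verbose", verbose)]
    cmd += [flag for flag, enabled in switches if enabled]
    cmd += list(eval_paths)
    return cmd
```

Passing the returned list to `subprocess.run` without `shell=True` hands a pattern such as `evals/**/*.yaml` to the CLI unexpanded, which matches the quoted form used in the README examples.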
|
|
167
149
|
|
|
168
150
|
### Target Selection Priority
|
|
@@ -175,7 +157,7 @@ The CLI determines which execution target to use with the following precedence:
|
|
|
175
157
|
|
|
176
158
|
This allows eval files to specify their preferred target while still allowing command-line overrides for flexibility, and maintains backward compatibility with existing workflows.
|
|
177
159
|
|
|
178
|
-
Output goes to `.agentv/results/
|
|
160
|
+
Output goes to `.agentv/results/eval_<timestamp>.jsonl` (or `.yaml`) unless `--out` is provided.
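Because the default output is JSONL, a results file can be summarized with a few lines of scripting. The sketch below assumes each line is a JSON object with a numeric `score` field, as listed in the 0.5.1 README's JSONL field list; whether 0.6.1 results keep that field is an assumption. `summarize_scores` is a hypothetical helper, not an AgentV API.

```python
import json
import statistics

# Hypothetical helper: summarize the `score` field of JSONL results text.
# Assumes one JSON object per line; records without a numeric score are skipped.
def summarize_scores(jsonl_text: str):
    scores = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        score = record.get("score")
        if isinstance(score, (int, float)) and not isinstance(score, bool):
            scores.append(float(score))
    if not scores:
        return None
    return {
        "count": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "min": min(scores),
        "max": max(scores),
    }
```

To use it, read the file written under `.agentv/results/` (or the path given to `--out`) and pass its text to `summarize_scores`.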
|
|
179
161
|
|
|
180
162
|
### Tips for VS Code Copilot Evals
|
|
181
163
|
|
|
@@ -256,21 +238,6 @@ Each target specifies:
|
|
|
256
238
|
Codex targets require the standalone `codex` CLI and a configured profile (via `codex configure`) so credentials are stored in `~/.codex/config` (or whatever path the CLI already uses). AgentV mirrors all guideline and attachment files into a fresh scratch workspace, so the `file://` preread links remain valid even when the CLI runs outside your repo tree.
|
|
257
239
|
Confirm the CLI works by running `codex exec --json --profile <name> "ping"` (or any supported dry run) before starting an eval. This prints JSONL events; seeing `item.completed` messages indicates the CLI is healthy.
|
|
258
240
|
|
|
259
|
-
## Timeout Handling and Retries
|
|
260
|
-
|
|
261
|
-
When using VS Code or other AI agents that may experience timeouts, the evaluator includes automatic retry functionality:
|
|
262
|
-
|
|
263
|
-
- **Timeout detection:** Automatically detects when agents timeout
|
|
264
|
-
- **Automatic retries:** When a timeout occurs, the same eval case is retried up to `--max-retries` times (default: 2)
|
|
265
|
-
- **Retry behavior:** Only timeouts trigger retries; other errors proceed to the next eval case
|
|
266
|
-
- **Timeout configuration:** Use `--agent-timeout` to adjust how long to wait for agent responses
|
|
267
|
-
|
|
268
|
-
Example with custom timeout settings:
|
|
269
|
-
|
|
270
|
-
```bash
|
|
271
|
-
agentv eval evals/projectx/example.yaml --target vscode_projectx --agent-timeout 180 --max-retries 3
|
|
272
|
-
```
|
|
273
|
-
|
|
274
241
|
## Writing Custom Evaluators
|
|
275
242
|
|
|
276
243
|
### Code Evaluator I/O Contract
|
|
@@ -370,110 +337,17 @@ Evaluation criteria and guidelines...
|
|
|
370
337
|
|
|
371
338
|
## Next Steps
|
|
372
339
|
|
|
373
|
-
- Review
|
|
374
|
-
- Create your own eval
|
|
375
|
-
- Write custom evaluator scripts for
|
|
376
|
-
- Create LLM judge
|
|
340
|
+
- Review [docs/examples/simple/evals/example-eval.yaml](docs/examples/simple/evals/example-eval.yaml) to understand the schema
|
|
341
|
+
- Create your own eval dataset following the schema
|
|
342
|
+
- Write custom evaluator scripts for deterministic evaluation
|
|
343
|
+
- Create LLM judge prompts for semantic evaluation
|
|
377
344
|
- Set up optimizer configs when ready to improve prompts
|
|
378
345
|
|
|
379
346
|
## Resources
|
|
380
347
|
|
|
381
348
|
- [Simple Example README](docs/examples/simple/README.md)
|
|
382
|
-
- [Schema Specification](docs/openspec/changes/update-eval-schema-v2/)
|
|
383
349
|
- [Ax ACE Documentation](https://github.com/ax-llm/ax/blob/main/docs/ACE.md)
|
|
384
350
|
|
|
385
|
-
## Scoring and Outputs
|
|
386
|
-
|
|
387
|
-
Run with `--verbose` to print detailed information and stack traces on errors.
|
|
388
|
-
|
|
389
|
-
### Scoring Methodology
|
|
390
|
-
|
|
391
|
-
AgentV uses an AI-powered quality grader that:
|
|
392
|
-
|
|
393
|
-
- Extracts key aspects from the expected answer
|
|
394
|
-
- Compares model output against those aspects
|
|
395
|
-
- Provides detailed hit/miss analysis with reasoning
|
|
396
|
-
- Returns a normalized score (0.0 to 1.0)
|
|
397
|
-
|
|
398
|
-
### Output Formats
|
|
399
|
-
|
|
400
|
-
**JSONL format (default):**
|
|
401
|
-
|
|
402
|
-
- One JSON object per line (newline-delimited)
|
|
403
|
-
- Fields: `eval_id`, `score`, `hits`, `misses`, `model_answer`, `expected_aspect_count`, `target`, `timestamp`, `reasoning`, `raw_request`, `grader_raw_request`
|
|
404
|
-
|
|
405
|
-
**YAML format (with `--format yaml`):**
|
|
406
|
-
|
|
407
|
-
- Human-readable YAML documents
|
|
408
|
-
- Same fields as JSONL, properly formatted for readability
|
|
409
|
-
- Multi-line strings use literal block style
|
|
410
|
-
|
|
411
|
-
### Summary Statistics
|
|
412
|
-
|
|
413
|
-
After running all eval cases, AgentV displays:
|
|
414
|
-
|
|
415
|
-
- Mean, median, min, max scores
|
|
416
|
-
- Standard deviation
|
|
417
|
-
- Distribution histogram
|
|
418
|
-
- Total eval count and execution time
|
|
419
|
-
|
|
420
|
-
## Architecture
|
|
421
|
-
|
|
422
|
-
AgentV is built as a TypeScript monorepo using:
|
|
423
|
-
|
|
424
|
-
- **pnpm workspaces:** Efficient dependency management
|
|
425
|
-
- **Turbo:** Build system and task orchestration
|
|
426
|
-
- **@ax-llm/ax:** Unified LLM provider abstraction
|
|
427
|
-
- **Vercel AI SDK:** Streaming and tool use capabilities
|
|
428
|
-
- **Zod:** Runtime type validation
|
|
429
|
-
- **Commander.js:** CLI argument parsing
|
|
430
|
-
- **Vitest:** Testing framework
|
|
431
|
-
|
|
432
|
-
### Package Structure
|
|
433
|
-
|
|
434
|
-
- `@agentv/core` - Core evaluation engine, providers, grading logic
|
|
435
|
-
- `agentv` - Main package that bundles CLI functionality
|
|
436
|
-
|
|
437
|
-
## Troubleshooting
|
|
438
|
-
|
|
439
|
-
### Installation Issues
|
|
440
|
-
|
|
441
|
-
**Problem:** Package installation fails or command not found.
|
|
442
|
-
|
|
443
|
-
**Solution:**
|
|
444
|
-
|
|
445
|
-
```bash
|
|
446
|
-
# Clear npm cache and reinstall
|
|
447
|
-
npm cache clean --force
|
|
448
|
-
npm uninstall -g agentv
|
|
449
|
-
npm install -g agentv
|
|
450
|
-
|
|
451
|
-
# Or use npx without installing
|
|
452
|
-
npx agentv@latest --help
|
|
453
|
-
```
|
|
454
|
-
|
|
455
|
-
### VS Code Integration Issues
|
|
456
|
-
|
|
457
|
-
**Problem:** VS Code workspace doesn't open or prompts aren't injected.
|
|
458
|
-
|
|
459
|
-
**Solution:**
|
|
460
|
-
|
|
461
|
-
- Ensure the `subagent` package is installed (should be automatic)
|
|
462
|
-
- Verify your workspace path in `.env` is correct and points to a `.code-workspace` file
|
|
463
|
-
- Close any other VS Code instances before running evals
|
|
464
|
-
- Use `--verbose` flag to see detailed workspace switching logs
|
|
465
|
-
|
|
466
|
-
### Provider Configuration Issues
|
|
467
|
-
|
|
468
|
-
**Problem:** API authentication errors or missing credentials.
|
|
469
|
-
|
|
470
|
-
**Solution:**
|
|
471
|
-
|
|
472
|
-
- Double-check environment variables in your `.env` file
|
|
473
|
-
- Verify the variable names in `targets.yaml` match your `.env` file
|
|
474
|
-
- Use `--dry-run` first to test without making API calls
|
|
475
|
-
- Check provider-specific documentation for required environment variables
|
|
476
|
-
|
|
477
351
|
## License
|
|
478
352
|
|
|
479
353
|
MIT License - see [LICENSE](LICENSE) for details.
|