agentv 0.5.1 → 0.6.1

package/README.md CHANGED
@@ -64,39 +64,17 @@ You are now ready to start development. The monorepo contains:
 
  ### Environment Setup
 
- 1. Configure environment variables:
- - Copy [.env.template](docs/examples/simple/.env.template) to `.env` in your project root
- - Fill in your API keys, endpoints, and other configuration values
+ 1. Initialize your workspace:
+ - Run `agentv init` at the root of your repository
+ - This command automatically sets up the `.agentv/` directory structure and configuration files
 
- 2. Set up targets:
- - Copy [targets.yaml](docs/examples/simple/.agentv/targets.yaml) to `.agentv/targets.yaml`
- - Update the environment variable names in targets.yaml to match those defined in your `.env` file
+ 2. Configure environment variables:
+ - The init command creates a `.env.template` file in your project root
+ - Copy `.env.template` to `.env` and fill in your API keys, endpoints, and other configuration values
+ - Update the environment variable names in `.agentv/targets.yaml` to match those defined in your `.env` file
 
  ## Quick Start
 
- ### Configuring Guideline Patterns
-
- AgentV automatically detects guideline files and treats them differently from regular file content. You can customize which files are considered guidelines using an optional `.agentv/config.yaml` configuration file.
-
- **Config file discovery:**
- - AgentV searches for `.agentv/config.yaml` starting from the eval file's directory
- - Walks up the directory tree to the repository root
- - Uses the first config file found (similar to how `targets.yaml` is discovered)
- - This allows you to place one config file at the project root for all evals
-
- **Custom patterns** (create `.agentv/config.yaml` in same directory as your eval file):
-
- ```yaml
- # .agentv/config.yaml
- guideline_patterns:
- - "**/*.guide.md" # Match all .guide.md files
- - "**/guidelines/**" # Match all files in /guidelines/ dirs
- - "docs/AGENTS.md" # Match specific files
- - "**/*.rules.md" # Match by naming convention
- ```
-
- See [config.yaml example](docs/examples/simple/.agentv/config.yaml) for more pattern examples.
-
  ### Validating Eval Files
 
  Validate your eval and targets files before running them:
@@ -142,6 +120,9 @@ agentv eval "path/to/eval.yaml"
 
  # Override the eval file's target with CLI flag
  agentv eval --target vscode_projectx "path/to/eval.yaml"
+
+ # Run multiple evals via glob
+ agentv eval "path/to/evals/**/*.yaml"
  ```
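The glob form added in this hunk implies that one quoted argument can stand for many eval files. A minimal sketch of that expansion in Python follows, assuming recursive `**` matching and de-duplication of overlapping patterns. `expand_eval_paths` is a hypothetical helper written for illustration; it is not AgentV's implementation, and the real CLI may resolve or order paths differently.

```python
import glob
import os
import tempfile


def expand_eval_paths(patterns, root="."):
    """Expand literal paths and globs into a sorted, de-duplicated file list."""
    found = set()
    for pattern in patterns:
        # recursive=True lets "**" match zero or more directory levels.
        for match in glob.glob(os.path.join(root, pattern), recursive=True):
            if os.path.isfile(match):
                found.add(os.path.normpath(match))
    return sorted(found)


if __name__ == "__main__":
    # Build a throwaway tree so the demo does not depend on a real repository.
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "evals", "nested"))
        for rel in ("evals/a.yaml", "evals/nested/b.yaml", "evals/notes.txt"):
            open(os.path.join(tmp, *rel.split("/")), "w").close()
        for path in expand_eval_paths(["evals/**/*.yaml"], root=tmp):
            print(os.path.relpath(path, tmp))
```

Quoting the pattern on the command line, as the README example does, keeps the shell from expanding it first, so the tool sees the pattern itself.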
 
  Run a specific eval case with custom targets path:
@@ -152,17 +133,18 @@ agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --eval-id
 
  ### Command Line Options
 
- - `eval_file`: Path to eval YAML file (required, positional argument)
+ - `eval_paths...`: Path(s) or glob(s) to eval YAML files (required; e.g., `evals/**/*.yaml`)
  - `--target TARGET`: Execution target name from targets.yaml (overrides target specified in eval file)
  - `--targets TARGETS`: Path to targets.yaml file (default: ./.agentv/targets.yaml)
  - `--eval-id EVAL_ID`: Run only the eval case with this specific ID
- - `--out OUTPUT_FILE`: Output file path (default: results/{evalname}_{timestamp}.jsonl)
- - `--format FORMAT`: Output format: 'jsonl' or 'yaml' (default: jsonl)
+ - `--out OUTPUT_FILE`: Output file path (default: .agentv/results/eval_<timestamp>.jsonl)
+ - `--output-format FORMAT`: Output format: 'jsonl' or 'yaml' (default: jsonl)
  - `--dry-run`: Run with mock model for testing
  - `--agent-timeout SECONDS`: Timeout in seconds for agent response polling (default: 120)
  - `--max-retries COUNT`: Maximum number of retries for timeout cases (default: 2)
  - `--cache`: Enable caching of LLM responses (default: disabled)
  - `--dump-prompts`: Save all prompts to `.agentv/prompts/` directory
+ - `--workers COUNT`: Parallel workers for eval cases (default: 3; target `workers` setting used when provided)
  - `--verbose`: Verbose output
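The new `--workers` option describes bounded parallelism over eval cases. As a rough illustration of what a worker limit means, the Python sketch below runs a list of cases with at most three in flight at once. `run_cases` and `run_one` are hypothetical names, the default of 3 is taken from the option text, and how a target's `workers` setting interacts with the flag is not modelled here.

```python
from concurrent.futures import ThreadPoolExecutor

DEFAULT_WORKERS = 3  # default stated in the option list; everything else is illustrative


def run_cases(case_ids, run_one, workers=DEFAULT_WORKERS):
    """Run run_one(case_id) for every case with at most `workers` running at once.

    Results are returned in the same order as case_ids, regardless of which
    case finishes first.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, case_ids))


if __name__ == "__main__":
    print(run_cases(["case-1", "case-2", "case-3", "case-4"], str.upper))
```

Threads suit this sketch because each case mostly waits on an external agent or API; a CPU-bound evaluator would call for a different pool type.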
 
  ### Target Selection Priority
@@ -175,7 +157,7 @@ The CLI determines which execution target to use with the following precedence:
 
  This allows eval files to specify their preferred target while still allowing command-line overrides for flexibility, and maintains backward compatibility with existing workflows.
 
- Output goes to `.agentv/results/{evalname}_{timestamp}.jsonl` (or `.yaml`) unless `--out` is provided.
+ Output goes to `.agentv/results/eval_<timestamp>.jsonl` (or `.yaml`) unless `--out` is provided.
 
  ### Tips for VS Code Copilot Evals
 
@@ -256,21 +238,6 @@ Each target specifies:
  Codex targets require the standalone `codex` CLI and a configured profile (via `codex configure`) so credentials are stored in `~/.codex/config` (or whatever path the CLI already uses). AgentV mirrors all guideline and attachment files into a fresh scratch workspace, so the `file://` preread links remain valid even when the CLI runs outside your repo tree.
  Confirm the CLI works by running `codex exec --json --profile <name> "ping"` (or any supported dry run) before starting an eval. This prints JSONL events; seeing `item.completed` messages indicates the CLI is healthy.
 
- ## Timeout Handling and Retries
-
- When using VS Code or other AI agents that may experience timeouts, the evaluator includes automatic retry functionality:
-
- - **Timeout detection:** Automatically detects when agents timeout
- - **Automatic retries:** When a timeout occurs, the same eval case is retried up to `--max-retries` times (default: 2)
- - **Retry behavior:** Only timeouts trigger retries; other errors proceed to the next eval case
- - **Timeout configuration:** Use `--agent-timeout` to adjust how long to wait for agent responses
-
- Example with custom timeout settings:
-
- ```bash
- agentv eval evals/projectx/example.yaml --target vscode_projectx --agent-timeout 180 --max-retries 3
- ```
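The retry rule in the removed section (only timeouts are retried, up to `--max-retries` times, while other errors are not) can be sketched in a few lines of Python. This is an illustration of the documented rule, not AgentV's code: `AgentTimeout` and `run_with_retries` are hypothetical names, and the sketch simply re-raises once the retry budget is spent.

```python
class AgentTimeout(Exception):
    """Hypothetical marker for an agent that did not answer in time."""


def run_with_retries(run_case, max_retries=2):
    """Call run_case(); retry only on AgentTimeout, up to max_retries extra attempts.

    Any other exception propagates immediately, mirroring the rule that only
    timeouts trigger a retry.
    """
    retries_used = 0
    while True:
        try:
            return run_case()
        except AgentTimeout:
            if retries_used >= max_retries:
                raise
            retries_used += 1


if __name__ == "__main__":
    attempts = []

    def flaky_case():
        attempts.append(1)
        if len(attempts) < 3:
            raise AgentTimeout("no response yet")
        return "ok"

    print(run_with_retries(flaky_case), "after", len(attempts), "attempts")
```

With the default of two retries, a case is attempted at most three times in this sketch.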
-
  ## Writing Custom Evaluators
 
  ### Code Evaluator I/O Contract
@@ -370,110 +337,17 @@ Evaluation criteria and guidelines...
 
  ## Next Steps
 
- - Review `docs/examples/simple/evals/example-eval.yaml` to understand the schema
- - Create your own eval cases following the schema
- - Write custom evaluator scripts for domain-specific validation
- - Create LLM judge templates for semantic evaluation
+ - Review [docs/examples/simple/evals/example-eval.yaml](docs/examples/simple/evals/example-eval.yaml) to understand the schema
+ - Create your own eval dataset following the schema
+ - Write custom evaluator scripts for deterministic evaluation
+ - Create LLM judge prompts for semantic evaluation
  - Set up optimizer configs when ready to improve prompts
 
  ## Resources
 
  - [Simple Example README](docs/examples/simple/README.md)
- - [Schema Specification](docs/openspec/changes/update-eval-schema-v2/)
  - [Ax ACE Documentation](https://github.com/ax-llm/ax/blob/main/docs/ACE.md)
 
- ## Scoring and Outputs
-
- Run with `--verbose` to print detailed information and stack traces on errors.
-
- ### Scoring Methodology
-
- AgentV uses an AI-powered quality grader that:
-
- - Extracts key aspects from the expected answer
- - Compares model output against those aspects
- - Provides detailed hit/miss analysis with reasoning
- - Returns a normalized score (0.0 to 1.0)
-
- ### Output Formats
-
- **JSONL format (default):**
-
- - One JSON object per line (newline-delimited)
- - Fields: `eval_id`, `score`, `hits`, `misses`, `model_answer`, `expected_aspect_count`, `target`, `timestamp`, `reasoning`, `raw_request`, `grader_raw_request`
-
- **YAML format (with `--format yaml`):**
-
- - Human-readable YAML documents
- - Same fields as JSONL, properly formatted for readability
- - Multi-line strings use literal block style
-
- ### Summary Statistics
-
- After running all eval cases, AgentV displays:
-
- - Mean, median, min, max scores
- - Standard deviation
- - Distribution histogram
- - Total eval count and execution time
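The removed text describes results as newline-delimited JSON with a `score` field in the range 0.0 to 1.0, followed by summary statistics. A small Python sketch of reading such output and computing those statistics is shown here. `summarize_scores` is a hypothetical helper, the sample records are invented for the demo, and the choice of population standard deviation is an assumption, since the README does not say which variant AgentV reports.

```python
import json
import statistics


def summarize_scores(jsonl_text):
    """Parse one JSON object per line and summarize the `score` field."""
    scores = [
        json.loads(line)["score"]
        for line in jsonl_text.splitlines()
        if line.strip()  # tolerate blank lines, e.g. a trailing newline
    ]
    if not scores:
        raise ValueError("no results to summarize")
    return {
        "count": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "min": min(scores),
        "max": max(scores),
        # Assumption: population standard deviation over the scores seen.
        "stdev": statistics.pstdev(scores),
    }


if __name__ == "__main__":
    sample = "\n".join([
        '{"eval_id": "case-1", "score": 0.5}',
        '{"eval_id": "case-2", "score": 1.0}',
        '{"eval_id": "case-3", "score": 0.0}',
    ])
    print(summarize_scores(sample))
```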
-
- ## Architecture
-
- AgentV is built as a TypeScript monorepo using:
-
- - **pnpm workspaces:** Efficient dependency management
- - **Turbo:** Build system and task orchestration
- - **@ax-llm/ax:** Unified LLM provider abstraction
- - **Vercel AI SDK:** Streaming and tool use capabilities
- - **Zod:** Runtime type validation
- - **Commander.js:** CLI argument parsing
- - **Vitest:** Testing framework
-
- ### Package Structure
-
- - `@agentv/core` - Core evaluation engine, providers, grading logic
- - `agentv` - Main package that bundles CLI functionality
-
- ## Troubleshooting
-
- ### Installation Issues
-
- **Problem:** Package installation fails or command not found.
-
- **Solution:**
-
- ```bash
- # Clear npm cache and reinstall
- npm cache clean --force
- npm uninstall -g agentv
- npm install -g agentv
-
- # Or use npx without installing
- npx agentv@latest --help
- ```
-
- ### VS Code Integration Issues
-
- **Problem:** VS Code workspace doesn't open or prompts aren't injected.
-
- **Solution:**
-
- - Ensure the `subagent` package is installed (should be automatic)
- - Verify your workspace path in `.env` is correct and points to a `.code-workspace` file
- - Close any other VS Code instances before running evals
- - Use `--verbose` flag to see detailed workspace switching logs
-
- ### Provider Configuration Issues
-
- **Problem:** API authentication errors or missing credentials.
-
- **Solution:**
-
- - Double-check environment variables in your `.env` file
- - Verify the variable names in `targets.yaml` match your `.env` file
- - Use `--dry-run` first to test without making API calls
- - Check provider-specific documentation for required environment variables
-
  ## License
 
  MIT License - see [LICENSE](LICENSE) for details.