agentv 0.2.3 → 0.2.8 (npm)

This diff shows the changes between two publicly released versions of the package, as they appear in the public registry. It is provided for informational purposes only.

Files changed (10)

package/README.md +67 -42
package/dist/{chunk-S3RN2GSO.js → chunk-RLBRJX7V.js} +611 -428
package/dist/chunk-RLBRJX7V.js.map +1 -0
package/dist/cli.js +1 -1
package/dist/index.js +1 -1
package/dist/templates/config-schema.json +27 -0
package/dist/templates/eval-build.prompt.md +3 -3
package/dist/templates/eval-schema.json +3 -3
package/package.json +3 -2
package/dist/chunk-S3RN2GSO.js.map +0 -1

package/README.md CHANGED

@@ -74,35 +74,60 @@ You are now ready to start development. The monorepo contains:
 ## Quick Start
-### Linting Eval Files
+### Configuring Guideline Patterns
-Validate your eval and targets files before running them:
+AgentV automatically detects guideline files (instructions, prompts) and treats them differently from regular file content. You can customize which files are considered guidelines using an optional `.agentv/config.yaml` configuration file.
-```bash
-# Lint a single file
-agentv lint evals/my-test.yaml
+**Config file discovery:**
+- AgentV searches for `.agentv/config.yaml` starting from the eval file's directory
+- Walks up the directory tree to the repository root
+- Uses the first config file found (similar to how `targets.yaml` is discovered)
+- This allows you to place one config file at the project root for all evals
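Note on the discovery rules added above: the walk from the eval file's directory up to the repository root could look roughly like the sketch below. This is not taken from the package source. The function name `findConfig`, the injected `exists` callback, and the assumption of absolute forward-slash paths are all hypothetical, chosen so the logic can be read without touching the file system.

```typescript
// Hypothetical sketch of the upward search for `.agentv/config.yaml`.
// `exists` is injected so the walk itself stays a pure function.
function findConfig(
  evalDir: string,
  repoRoot: string,
  exists: (path: string) => boolean,
): string | undefined {
  let dir = evalDir;
  while (true) {
    const candidate = `${dir}/.agentv/config.yaml`;
    if (exists(candidate)) return candidate; // first config found wins
    if (dir === repoRoot) return undefined; // do not search above the repo root
    const idx = dir.lastIndexOf("/");
    if (idx <= 0) return undefined; // reached the file system root
    dir = dir.slice(0, idx); // step up one directory
  }
}
```

With a config at both `/repo/.agentv/config.yaml` and `/repo/evals/.agentv/config.yaml`, an eval under `/repo/evals/projectx` would resolve to the nearer one in this sketch.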
-# Lint multiple files
-agentv lint evals/test1.yaml evals/test2.yaml
+**Default patterns** (used when `.agentv/config.yaml` is absent):
-# Lint entire directory (recursively finds all YAML files)
-agentv lint evals/
+```yaml
+guideline_patterns:
+  - "**/*.instructions.md"
+  - "**/instructions/**"
+  - "**/*.prompt.md"
+  - "**/prompts/**"
+```
-# Enable strict mode for additional checks
-agentv lint --strict evals/
+**Custom patterns** (create `.agentv/config.yaml` in same directory as your eval file):
-# Output results in JSON format
-agentv lint --json evals/
+```yaml
+# .agentv/config.yaml
+guideline_patterns:
+  - "**/*.guide.md"           # Match all .guide.md files
+  - "**/guidelines/**"        # Match all files in /guidelines/ dirs
+  - "docs/AGENTS.md"          # Match specific files
+  - "**/*.rules.md"           # Match by naming convention
 ```
-**Linter features:**
+**How it works:**
+- Files matching guideline patterns are loaded as separate guideline context
+- Files NOT matching are treated as regular file content in user messages
+- Patterns use standard glob syntax (via [micromatch](https://github.com/micromatch/micromatch))
+- Paths are normalized to forward slashes for cross-platform compatibility
+See [config.yaml example](docs/examples/simple/.agentv/config.yaml) for more pattern examples.
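Note on the "How it works" bullets added above: the classification step can be illustrated with a small sketch. The README says the package uses micromatch; the code below does not, and instead uses a deliberately simplified stand-in matcher that understands only `**` and `*`. The names `globToRegExp` and `isGuideline` are hypothetical, and edge cases such as dotfiles may behave differently from micromatch.

```typescript
// Default patterns as listed in the README hunk above.
const DEFAULT_GUIDELINE_PATTERNS: string[] = [
  "**/*.instructions.md",
  "**/instructions/**",
  "**/*.prompt.md",
  "**/prompts/**",
];

// Simplified stand-in for micromatch: handles `**/`, `**`, `*`, and literals only.
function globToRegExp(pattern: string): RegExp {
  let source = "";
  let i = 0;
  while (i < pattern.length) {
    if (pattern.startsWith("**/", i)) {
      source += "(?:.*/)?"; // zero or more leading directories
      i += 3;
    } else if (pattern.startsWith("**", i)) {
      source += ".*"; // anything, including slashes
      i += 2;
    } else if (pattern.charAt(i) === "*") {
      source += "[^/]*"; // anything within a single path segment
      i += 1;
    } else {
      // Escape regex metacharacters in literal text.
      source += pattern.charAt(i).replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp(`^${source}$`);
}

// Returns true when the path should be loaded as guideline context.
function isGuideline(
  filePath: string,
  patterns: string[] = DEFAULT_GUIDELINE_PATTERNS,
): boolean {
  const normalized = filePath.replace(/\\/g, "/"); // forward slashes on all platforms
  return patterns.some((p) => globToRegExp(p).test(normalized));
}
```

In this sketch, `isGuideline("docs\\prompts\\review.md")` is true because the path is normalized before matching, while `isGuideline("src/main.ts")` is false and the file would be treated as regular content.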
-- Validates `$schema` field is present and correct
-- Checks required fields and structure for eval and targets files
-- Validates file references exist and are accessible
-- Provides clear error messages with file path and location context
-- Exits with non-zero code on validation failures (CI-friendly)
-- Supports strict mode for additional checks (e.g., non-empty file content)
+### Validating Eval Files
+Validate your eval and targets files before running them:
+```bash
+# Validate a single file
+agentv validate evals/my-eval.yaml
+# Validate multiple files
+agentv validate evals/eval1.yaml evals/eval2.yaml
+# Validate entire directory (recursively finds all YAML files)
+agentv validate evals/
+```
 **File type detection:**
@@ -112,7 +137,7 @@ All AgentV files must include a `$schema` field:
 # Eval files
 $schema: agentv-eval-v2
 evalcases:
-  - id: test-1
+  - id: eval-1
     # ...
 # Targets files
@@ -126,29 +151,29 @@ Files without a `$schema` field will be rejected with a clear error message.
 ### Running Evals
-Run eval (target auto-selected from test file or CLI override):
+Run eval (target auto-selected from eval file or CLI override):
 ```bash
-# If your test.yaml contains "target: azure_base", it will be used automatically
-agentv eval "path/to/test.yaml"
+# If your eval.yaml contains "target: azure_base", it will be used automatically
+agentv eval "path/to/eval.yaml"
-# Override the test file's target with CLI flag
-agentv eval --target vscode_projectx "path/to/test.yaml"
+# Override the eval file's target with CLI flag
+agentv eval --target vscode_projectx "path/to/eval.yaml"
 ```
-Run a specific test case with custom targets path:
+Run a specific eval case with custom targets path:
 ```bash
-agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --test-id "my-test-case" "path/to/test.yaml"
+agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --eval-id "my-eval-case" "path/to/eval.yaml"
 ```
 ### Command Line Options
-- `test_file`: Path to test YAML file (required, positional argument)
-- `--target TARGET`: Execution target name from targets.yaml (overrides target specified in test file)
+- `eval_file`: Path to eval YAML file (required, positional argument)
+- `--target TARGET`: Execution target name from targets.yaml (overrides target specified in eval file)
 - `--targets TARGETS`: Path to targets.yaml file (default: ./.agentv/targets.yaml)
-- `--test-id TEST_ID`: Run only the test case with this specific ID
-- `--out OUTPUT_FILE`: Output file path (default: results/{testname}_{timestamp}.jsonl)
+- `--eval-id EVAL_ID`: Run only the eval case with this specific ID
+- `--out OUTPUT_FILE`: Output file path (default: results/{evalname}_{timestamp}.jsonl)
 - `--format FORMAT`: Output format: 'jsonl' or 'yaml' (default: jsonl)
 - `--dry-run`: Run with mock model for testing
 - `--agent-timeout SECONDS`: Timeout in seconds for agent response polling (default: 120)
@@ -162,12 +187,12 @@ agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --test-id
 The CLI determines which execution target to use with the following precedence:
 1. CLI flag override: `--target my_target` (when provided and not 'default')
-2. Test file specification: `target: my_target` key in the .test.yaml file
+2. Eval file specification: `target: my_target` key in the .eval.yaml file
 3. Default fallback: Uses the 'default' target (original behavior)
-This allows test files to specify their preferred target while still allowing command-line overrides for flexibility, and maintains backward compatibility with existing workflows.
+This allows eval files to specify their preferred target while still allowing command-line overrides for flexibility, and maintains backward compatibility with existing workflows.
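Note on the precedence list in this hunk: the three rules reduce to a short function. The sketch below is an illustration only; `resolveTarget` and its parameter names are hypothetical and not taken from the package.

```typescript
// Hypothetical sketch of the documented target precedence.
function resolveTarget(cliTarget?: string, evalFileTarget?: string): string {
  // 1. CLI flag wins when provided and not the literal 'default'.
  if (cliTarget && cliTarget !== "default") return cliTarget;
  // 2. Otherwise use the `target:` key from the eval file, if present.
  if (evalFileTarget) return evalFileTarget;
  // 3. Fall back to the 'default' target.
  return "default";
}
```

For example, `resolveTarget("vscode_projectx", "azure_base")` yields `"vscode_projectx"`, and `resolveTarget(undefined, "azure_base")` yields `"azure_base"`.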
-Output goes to `.agentv/results/{testname}_{timestamp}.jsonl` (or `.yaml`) unless `--out` is provided.
+Output goes to `.agentv/results/{evalname}_{timestamp}.jsonl` (or `.yaml`) unless `--out` is provided.
 ### Tips for VS Code Copilot Evals
@@ -189,7 +214,7 @@ Environment keys (configured via targets.yaml):
 ## Targets and Environment Variables
-Execution targets in `.agentv/targets.yaml` decouple tests from providers/settings and provide flexible environment variable mapping.
+Execution targets in `.agentv/targets.yaml` decouple evals from providers/settings and provide flexible environment variable mapping.
 ### Target Configuration Structure
@@ -251,8 +276,8 @@ Each target specifies:
 When using VS Code or other AI agents that may experience timeouts, the evaluator includes automatic retry functionality:
 - **Timeout detection:** Automatically detects when agents timeout
-- **Automatic retries:** When a timeout occurs, the same test case is retried up to `--max-retries` times (default: 2)
-- **Retry behavior:** Only timeouts trigger retries; other errors proceed to the next test case
+- **Automatic retries:** When a timeout occurs, the same eval case is retried up to `--max-retries` times (default: 2)
+- **Retry behavior:** Only timeouts trigger retries; other errors proceed to the next eval case
 - **Timeout configuration:** Use `--agent-timeout` to adjust how long to wait for agent responses
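Note on the retry bullets above: the sketch below shows one way "only timeouts trigger retries" could be expressed. It is synchronous for brevity, whereas a real runner would presumably be asynchronous. `AgentTimeoutError`, `isTimeout`, and `runWithRetries` are hypothetical names, and the sketch assumes that `--max-retries` counts retries after the first attempt.

```typescript
// Hypothetical error type used to mark a timeout.
class AgentTimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "AgentTimeoutError";
  }
}

function isTimeout(err: unknown): boolean {
  return err instanceof Error && err.name === "AgentTimeoutError";
}

// Runs one eval case, retrying only when it times out.
function runWithRetries<T>(runCase: () => T, maxRetries = 2): T {
  let attempt = 0;
  while (true) {
    try {
      return runCase();
    } catch (err) {
      // Non-timeout errors, or exhausted retries, propagate to the caller,
      // which can then move on to the next eval case.
      if (!isTimeout(err) || attempt >= maxRetries) throw err;
      attempt += 1;
    }
  }
}
```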
 Example with custom timeout settings:
@@ -263,7 +288,7 @@ agentv eval evals/projectx/example.yaml --target vscode_projectx --agent-timeout
 ## How the Evals Work
-For each test case in a `.yaml` file:
+For each eval case in a `.yaml` file:
 1. Parse YAML and collect user messages (inline text and referenced files)
 2. Extract code blocks from text for structured prompting
@@ -296,7 +321,7 @@ AgentV uses an AI-powered quality grader that:
 **JSONL format (default):**
 - One JSON object per line (newline-delimited)
-- Fields: `test_id`, `score`, `hits`, `misses`, `model_answer`, `expected_aspect_count`, `target`, `timestamp`, `reasoning`, `raw_request`, `grader_raw_request`
+- Fields: `eval_id`, `score`, `hits`, `misses`, `model_answer`, `expected_aspect_count`, `target`, `timestamp`, `reasoning`, `raw_request`, `grader_raw_request`
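Note on the renamed `eval_id` field: a consumer of the JSONL output might read it as sketched below. Only `eval_id` and `score` are typed here, and their types (string and number) are assumptions; the remaining fields are left as `unknown`. `EvalResult` and `parseResults` are hypothetical names.

```typescript
// Hypothetical shape of one result line; field types are assumed.
interface EvalResult {
  eval_id: string;
  score: number;
  [field: string]: unknown; // hits, misses, model_answer, etc.
}

// Parses newline-delimited JSON, skipping blank lines.
function parseResults(jsonl: string): EvalResult[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as EvalResult);
}
```

Scripts written against 0.2.3 output that read `test_id` would need to switch to `eval_id` when consuming files produced by 0.2.8.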
 **YAML format (with `--format yaml`):**
@@ -306,12 +331,12 @@ AgentV uses an AI-powered quality grader that:
 ### Summary Statistics
-After running all test cases, AgentV displays:
+After running all eval cases, AgentV displays:
 - Mean, median, min, max scores
 - Standard deviation
 - Distribution histogram
-- Total test count and execution time
+- Total eval count and execution time
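Note on the summary statistics listed above: the scalar statistics can be computed from the per-case scores as in the sketch below. The README does not say whether the standard deviation is the population or sample form; this sketch assumes the population form. `summarize` and `ScoreSummary` are hypothetical names, and the histogram and execution time are omitted.

```typescript
interface ScoreSummary {
  count: number;
  mean: number;
  median: number;
  min: number;
  max: number;
  stdDev: number; // population standard deviation (assumed)
}

function summarize(scores: number[]): ScoreSummary {
  if (scores.length === 0) throw new Error("no scores to summarize");
  const sorted = [...scores].sort((a, b) => a - b);
  const count = sorted.length;
  const mean = sorted.reduce((sum, s) => sum + s, 0) / count;
  const mid = Math.floor(count / 2);
  // Even-length input: average the two middle values.
  const median = count % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const variance = sorted.reduce((sum, s) => sum + (s - mean) ** 2, 0) / count;
  return {
    count,
    mean,
    median,
    min: sorted[0],
    max: sorted[count - 1],
    stdDev: Math.sqrt(variance),
  };
}
```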
 ## Architecture