npm - agentv - Versions diffs - 1.0.0 → 1.2.0 - Mend

agentv 1.0.0 → 1.2.0

This diff shows the changes between two publicly released versions of the package, as they appear in the public registry. It is provided for informational purposes only.

Files changed (9)

package/dist/{chunk-RIJO5WBF.js → chunk-IVIT4U6S.js} +52 -256
package/dist/chunk-IVIT4U6S.js.map +1 -0
package/dist/cli.js +1 -1
package/dist/index.js +1 -1
package/dist/templates/.claude/skills/agentv-eval-builder/SKILL.md +0 -16
package/dist/templates/.claude/skills/agentv-eval-builder/references/example-evals.md +0 -27
package/dist/templates/.claude/skills/agentv-eval-builder/references/tool-trajectory-evaluator.md +10 -68
package/package.json +1 -1
package/dist/chunk-RIJO5WBF.js.map +0 -1

package/dist/cli.js CHANGED

@@ -1,7 +1,7 @@
 #!/usr/bin/env node
 import {
   runCli
-} from "./chunk-RIJO5WBF.js";
+} from "./chunk-IVIT4U6S.js";
 import "./chunk-UE4GLFVL.js";
 // src/cli.ts

package/dist/index.js CHANGED

@@ -1,7 +1,7 @@
 import {
   app,
   runCli
-} from "./chunk-RIJO5WBF.js";
+} from "./chunk-IVIT4U6S.js";
 import "./chunk-UE4GLFVL.js";
 export {
   app,

package/dist/templates/.claude/skills/agentv-eval-builder/SKILL.md CHANGED

@@ -79,22 +79,6 @@ execution:
 See `references/tool-trajectory-evaluator.md` for modes and configuration.
-### Expected Tool Calls Evaluators
-Validate tool calls and inputs inline with conversation flow:
-```yaml
-expected_messages:
-  - role: assistant
-    tool_calls:
-      - tool: getMetrics
-        input: { server: "prod-1" }
-execution:
-  evaluators:
-    - name: input_check
-      type: expected_tool_calls
-```
 ### Multiple Evaluators
 Define multiple evaluators to run sequentially. The final score is a weighted average of all results.

package/dist/templates/.claude/skills/agentv-eval-builder/references/example-evals.md CHANGED

@@ -142,33 +142,6 @@ evalcases:
             - tool: generateToken
 ```
-## Expected Messages with Tool Calls
-Validate precise tool inputs inline with expected messages.
-```yaml
-$schema: agentv-eval-v2
-description: Tool input validation
-target: mock_agent
-evalcases:
-  - id: precise-inputs
-    expected_outcome: Agent calls tools with correct parameters
-    input_messages:
-      - role: user
-        content: Check CPU metrics for prod-1
-    expected_messages:
-      - role: assistant
-        content: Checking metrics...
-        tool_calls:
-          - tool: getCpuMetrics
-            input: { server: "prod-1" }
-    execution:
-      evaluators:
-        - name: input-validator
-          type: expected_tool_calls
-```
 ## Static Trace Evaluation
 Evaluate pre-existing trace files without running an agent.

package/dist/templates/.claude/skills/agentv-eval-builder/references/tool-trajectory-evaluator.md CHANGED

@@ -2,13 +2,6 @@
 Tool trajectory evaluators validate that an agent used the expected tools during execution. They work with trace data returned by agent providers (codex, vscode, cli with trace support).
-## Evaluator Types
-AgentV provides two ways to validate tool usage:
-1. **`tool_trajectory`** - Dedicated evaluator with configurable matching modes
-2. **`expected_messages`** - Inline tool_calls in expected_messages for simpler cases
 ## Tool Trajectory Evaluator
 ### Modes
@@ -76,50 +69,6 @@ execution:
 - Strict protocol validation
 - Regression testing specific behavior
-## Expected Tool Calls Evaluator
-For simpler cases, specify tool_calls inline in `expected_messages`:
-```yaml
-evalcases:
-  - id: research-task
-    expected_outcome: Agent searches and retrieves documents
-    input_messages:
-      - role: user
-        content: Research REST vs GraphQL differences
-    expected_messages:
-      - role: assistant
-        content: I'll research this topic.
-        tool_calls:
-          - tool: knowledgeSearch
-          - tool: knowledgeSearch
-          - tool: documentRetrieve
-    execution:
-      evaluators:
-        - name: tool-validator
-          type: expected_tool_calls
-```
-### With Input Matching
-Validate specific inputs were passed to tools:
-```yaml
-expected_messages:
-  - role: assistant
-    content: Checking metrics...
-    tool_calls:
-      - tool: getCpuMetrics
-        input:
-          server: prod-1
-      - tool: getMemoryMetrics
-        input:
-          server: prod-1
-```
 ## Scoring
 ### tool_trajectory Scoring
@@ -130,10 +79,6 @@ expected_messages:
 | `in_order` | (matched tools in sequence) / (expected tools count) |
 | `exact` | (correctly positioned tools) / (expected tools count) |
-### expected_tool_calls Scoring
-Sequential matching: `(matched tool_calls) / (expected tool_calls)`
 ## Trace Data Requirements
 Tool trajectory evaluators require trace data from the agent provider. Supported providers:
@@ -198,24 +143,21 @@ evalcases:
 evalcases:
   - id: data-pipeline
     expected_outcome: Process data through complete pipeline
     input_messages:
       - role: user
         content: Process the customer dataset
-    expected_messages:
-      - role: assistant
-        content: Processing data...
-        tool_calls:
-          - tool: loadData
-          - tool: validate
-          - tool: transform
-          - tool: export
     execution:
       evaluators:
         - name: pipeline-check
-          type: expected_tool_calls
+          type: tool_trajectory
+          mode: exact
+          expected:
+            - tool: loadData
+            - tool: validate
+            - tool: transform
+            - tool: export
 ```
 ## CLI Options for Traces
@@ -234,4 +176,4 @@ agentv eval evals/test.yaml --include-trace
 2. **Start with any_order** - Then tighten to `in_order` or `exact` as needed
 3. **Combine with other evaluators** - Use tool trajectory for execution, LLM judge for output quality
 4. **Test with --dump-traces** - Inspect actual traces to understand agent behavior
-5. **Use expected_tool_calls for simple cases** - It's more readable for basic tool validation
+5. **Use code evaluators for custom validation** - Write custom tool validation scripts with access to trace data
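
The scoring table retained in `tool-trajectory-evaluator.md` describes `in_order` as (matched tools in sequence) / (expected tools count) and `exact` as (correctly positioned tools) / (expected tools count). The following is a minimal sketch of those two ratios, not AgentV's actual implementation. The function names, the greedy left-to-right matching for `in_order`, and the score of 1 for an empty expectation list are all assumptions made for illustration.

```typescript
// Hypothetical helpers illustrating the two scoring ratios from the table.
// They are NOT taken from the agentv source; names and edge cases are assumed.

/** in_order: expected tools found in sequence, divided by the expected count. */
function inOrderScore(expected: string[], actual: string[]): number {
  if (expected.length === 0) return 1; // assumption: nothing expected counts as a pass
  let cursor = 0; // position in `actual` after the most recent match
  let matched = 0;
  for (const tool of expected) {
    // Greedy search: look for the next occurrence at or after the cursor.
    const idx = actual.indexOf(tool, cursor);
    if (idx !== -1) {
      matched += 1;
      cursor = idx + 1;
    }
  }
  return matched / expected.length;
}

/** exact: tools found at the same index in both lists, divided by the expected count. */
function exactScore(expected: string[], actual: string[]): number {
  if (expected.length === 0) return 1; // assumption: nothing expected counts as a pass
  let matched = 0;
  for (let i = 0; i < expected.length; i++) {
    if (actual[i] === expected[i]) matched += 1;
  }
  return matched / expected.length;
}

// Usage with the tool names from the data-pipeline example in the diff.
const expectedTools = ["loadData", "validate", "transform", "export"];
const observedTools = ["loadData", "transform", "validate", "export"];

console.log(inOrderScore(expectedTools, observedTools)); // 0.75
console.log(exactScore(expectedTools, observedTools)); // 0.5
```

In this example the agent swapped `validate` and `transform`. The greedy `in_order` sketch still credits three of the four expected tools, while `exact` credits only the two that sit at the expected positions. The real evaluator may resolve out-of-order calls differently.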

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agentv",
-  "version": "1.0.0",
+  "version": "1.2.0",
   "description": "CLI entry point for AgentV",
   "type": "module",
   "repository": {