agentv 0.26.0 → 1.0.0

@@ -76,7 +76,7 @@ execution:
  - Strict protocol validation
  - Regression testing specific behavior
 
- ## Expected Messages Evaluator
+ ## Expected Tool Calls Evaluator
 
  For simpler cases, specify tool_calls inline in `expected_messages`:
 
@@ -84,11 +84,11 @@ For simpler cases, specify tool_calls inline in `expected_messages`:
  evalcases:
    - id: research-task
      expected_outcome: Agent searches and retrieves documents
-
+
      input_messages:
        - role: user
          content: Research REST vs GraphQL differences
-
+
      expected_messages:
        - role: assistant
          content: I'll research this topic.
@@ -96,11 +96,11 @@ evalcases:
            - tool: knowledgeSearch
            - tool: knowledgeSearch
            - tool: documentRetrieve
-
+
      execution:
        evaluators:
          - name: tool-validator
-           type: expected_messages
+           type: expected_tool_calls
  ```
 
  ### With Input Matching
@@ -130,7 +130,7 @@ expected_messages:
  | `in_order` | (matched tools in sequence) / (expected tools count) |
  | `exact` | (correctly positioned tools) / (expected tools count) |
 
- ### expected_messages Scoring
+ ### expected_tool_calls Scoring
 
  Sequential matching: `(matched tool_calls) / (expected tool_calls)`
 
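The ratios in this hunk can be illustrated with a minimal sketch. Only the two formulas come from the documentation; the function names, the greedy left-to-right matching, and the treatment of an empty expectation list are assumptions made for illustration and may differ from what AgentV actually does.

```python
def in_order_score(expected: list[str], actual: list[str]) -> float:
    """(matched tools in sequence) / (expected tools count).

    Assumption: matching is greedy and left to right; an expected tool that
    cannot be found after the previous match is counted as a miss.
    """
    if not expected:
        return 1.0  # assumption: nothing expected means nothing can fail
    matched = 0
    pos = 0  # index in `actual` where the next search starts
    for tool in expected:
        try:
            idx = actual.index(tool, pos)
        except ValueError:
            continue  # miss: leave `pos` where it is and try the next tool
        matched += 1
        pos = idx + 1
    return matched / len(expected)


def exact_score(expected: list[str], actual: list[str]) -> float:
    """(correctly positioned tools) / (expected tools count)."""
    if not expected:
        return 1.0  # same assumption as above
    correct = sum(
        1
        for i, tool in enumerate(expected)
        if i < len(actual) and actual[i] == tool
    )
    return correct / len(expected)


if __name__ == "__main__":
    expected = ["knowledgeSearch", "knowledgeSearch", "documentRetrieve"]
    actual = ["knowledgeSearch", "documentRetrieve"]
    print("in_order:", in_order_score(expected, actual))
    print("exact:", exact_score(expected, actual))
```

With the tool names from the `research-task` example, an agent that calls `knowledgeSearch` only once matches two of three expected calls under this sketch's `in_order` rule but only one of three under `exact`, which is why tightening the mode lowers the score for the same trace.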
@@ -215,7 +215,7 @@ evalcases:
      execution:
        evaluators:
          - name: pipeline-check
-           type: expected_messages
+           type: expected_tool_calls
  ```
 
  ## CLI Options for Traces
@@ -234,4 +234,4 @@ agentv eval evals/test.yaml --include-trace
  2. **Start with any_order** - Then tighten to `in_order` or `exact` as needed
  3. **Combine with other evaluators** - Use tool trajectory for execution, LLM judge for output quality
  4. **Test with --dump-traces** - Inspect actual traces to understand agent behavior
- 5. **Use expected_messages for simple cases** - It's more readable for basic tool validation
+ 5. **Use expected_tool_calls for simple cases** - It's more readable for basic tool validation
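The hunks above change the evaluator `type` value from `expected_messages` to `expected_tool_calls`, while the `expected_messages:` key on each eval case stays as it is. For eval files written against 0.26.0, the edit could be sketched as follows. The helper name `migrate_evaluator_type` is hypothetical, the regex assumes the type sits on its own `type: expected_messages` line, and the diff does not say whether 1.0.0 still accepts the old value, so treat this as an illustration rather than an official migration step.

```python
import re

# Assumption: the old evaluator type appears on its own line, optionally as the
# first key of a list item ("- type: ..."). The `expected_messages:` key of an
# eval case does not match this pattern and is left alone.
_OLD_TYPE = re.compile(
    r"^([ \t]*(?:-[ \t]+)?type:[ \t]*)expected_messages([ \t]*)$",
    re.MULTILINE,
)


def migrate_evaluator_type(yaml_text: str) -> str:
    """Rewrite `type: expected_messages` to `type: expected_tool_calls`."""
    return _OLD_TYPE.sub(r"\1expected_tool_calls\2", yaml_text)


if __name__ == "__main__":
    sample = (
        "expected_messages:\n"
        "  - role: assistant\n"
        "execution:\n"
        "  evaluators:\n"
        "    - name: tool-validator\n"
        "      type: expected_messages\n"
    )
    print(migrate_evaluator_type(sample))
```

A plain text substitution is used here instead of a YAML parser so that comments and formatting in the eval file are preserved; the trade-off is that unusual layouts, such as a quoted type value, would be missed.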
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "agentv",
-   "version": "0.26.0",
+   "version": "1.0.0",
    "description": "CLI entry point for AgentV",
    "type": "module",
    "repository": {