agentv 0.20.1 → 0.21.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +53 -0
- package/dist/{chunk-GDGNKNKP.js → chunk-WOCXZEH4.js} +806 -230
- package/dist/chunk-WOCXZEH4.js.map +1 -0
- package/dist/cli.js +5 -2
- package/dist/cli.js.map +1 -1
- package/dist/index.js +3 -3
- package/dist/templates/.claude/skills/agentv-eval-builder/SKILL.md +3 -3
- package/dist/templates/agentv/.env.template +23 -0
- package/package.json +4 -4
- package/dist/chunk-GDGNKNKP.js.map +0 -1
package/README.md
CHANGED
@@ -334,6 +334,59 @@ Evaluation criteria and guidelines...
 }
 ```
 
+## Rubric-Based Evaluation
+
+AgentV supports structured evaluation through rubrics - lists of criteria that define what makes a good response. Rubrics are checked by an LLM judge and scored based on weights and requirements.
+
+### Basic Usage
+
+Define rubrics inline using simple strings:
+
+```yaml
+- id: example-1
+  expected_outcome: Explain quicksort algorithm
+  rubrics:
+    - Mentions divide-and-conquer approach
+    - Explains the partition step
+    - States time complexity correctly
+```
+
+Or use detailed objects for fine-grained control:
+
+```yaml
+rubrics:
+  - id: structure
+    description: Has clear headings and organization
+    weight: 1.0
+    required: true
+  - id: examples
+    description: Includes practical examples
+    weight: 0.5
+    required: false
+```
+
+### Generate Rubrics
+
+Automatically generate rubrics from `expected_outcome` fields:
+
+```bash
+# Generate rubrics for all eval cases without rubrics
+agentv generate rubrics evals/my-eval.yaml
+
+# Use a specific LLM target for generation
+agentv generate rubrics evals/my-eval.yaml --target openai:gpt-4o
+```
+
+### Scoring and Verdicts
+
+- **Score**: (sum of satisfied weights) / (total weights)
+- **Verdicts**:
+  - `pass`: Score ≥ 0.8 and all required rubrics met
+  - `borderline`: Score ≥ 0.6 and all required rubrics met
+  - `fail`: Score < 0.6 or any required rubric failed
+
+For complete examples and detailed patterns, see [examples/features/evals/rubric/](examples/features/evals/rubric/).
+
 ## Advanced Configuration
 
 ### Retry Configuration