agentv 0.20.1 → 0.21.0

package/README.md CHANGED
@@ -334,6 +334,59 @@ Evaluation criteria and guidelines...
  }
  ```
 
+ ## Rubric-Based Evaluation
+
+ AgentV supports structured evaluation through rubrics: lists of criteria that define what makes a good response. Rubrics are checked by an LLM judge and scored based on their weights and `required` flags.
+
+ ### Basic Usage
+
+ Define rubrics inline using simple strings:
+
+ ```yaml
+ - id: example-1
+   expected_outcome: Explain quicksort algorithm
+   rubrics:
+     - Mentions divide-and-conquer approach
+     - Explains the partition step
+     - States time complexity correctly
+ ```
+
+ Or use detailed objects for fine-grained control:
+
+ ```yaml
+ rubrics:
+   - id: structure
+     description: Has clear headings and organization
+     weight: 1.0
+     required: true
+   - id: examples
+     description: Includes practical examples
+     weight: 0.5
+     required: false
+ ```
+
+ ### Generate Rubrics
+
+ Automatically generate rubrics from `expected_outcome` fields:
+
+ ```bash
+ # Generate rubrics for all eval cases without rubrics
+ agentv generate rubrics evals/my-eval.yaml
+
+ # Use a specific LLM target for generation
+ agentv generate rubrics evals/my-eval.yaml --target openai:gpt-4o
+ ```
+
+ ### Scoring and Verdicts
+
+ - **Score**: (sum of satisfied rubric weights) / (sum of all rubric weights)
+ - **Verdicts**:
+   - `pass`: Score ≥ 0.8 and all required rubrics met
+   - `borderline`: Score ≥ 0.6 and all required rubrics met
+   - `fail`: Score < 0.6 or any required rubric failed
+
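The scoring rule above can be sketched in TypeScript. This is an editor's illustration of the arithmetic, not code from the package; the `Rubric` shape and the `satisfied` field (the LLM judge's per-criterion result) are hypothetical, and agentv's actual implementation may differ:

```typescript
// Hypothetical model of a judged rubric; `satisfied` is assumed to be
// the LLM judge's verdict for that single criterion.
interface Rubric {
  id: string;
  weight: number;     // string-form rubrics would default to 1.0
  required: boolean;
  satisfied: boolean;
}

// Score = (sum of satisfied weights) / (total weights)
function score(rubrics: Rubric[]): number {
  const total = rubrics.reduce((s, r) => s + r.weight, 0);
  const met = rubrics
    .filter((r) => r.satisfied)
    .reduce((s, r) => s + r.weight, 0);
  return total > 0 ? met / total : 0;
}

// Verdict thresholds as documented: 0.8 for pass, 0.6 for borderline,
// and any failed required rubric forces a fail.
function verdict(rubrics: Rubric[]): "pass" | "borderline" | "fail" {
  const requiredMet = rubrics.every((r) => !r.required || r.satisfied);
  const s = score(rubrics);
  if (requiredMet && s >= 0.8) return "pass";
  if (requiredMet && s >= 0.6) return "borderline";
  return "fail";
}

// For the detailed-objects example above: structure (weight 1.0) satisfied,
// examples (weight 0.5) not satisfied → score 1.0 / 1.5 ≈ 0.67 → borderline.
```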
+ For complete examples and detailed patterns, see [examples/features/evals/rubric/](examples/features/evals/rubric/).
+
  ## Advanced Configuration
 
  ### Retry Configuration