@axlsdk/studio 0.13.3 → 0.13.5
package/README.md
CHANGED
@@ -107,7 +107,7 @@ Browse active sessions with conversation history. Replay sessions step by step.
 Browse all registered tools with their schemas rendered as forms. Test any tool directly with custom input and see the result.
 
 ### Eval Runner
-Run evaluations from the UI. View per-item results with scores. Compare two
+Run evaluations from the UI. View per-item results with scores, timing, and cost. Drill into individual items to see LLM scorer reasoning, per-scorer timing/cost, and annotations. Filter items by error state or score threshold, sort by score/duration/cost. Score distribution chart shows how scores are spread across bins. Compare two runs with timing/cost tradeoff analysis and expandable regression detail showing side-by-side outputs and reasoning. History tab tracks mean scores across runs with an eval name filter. Requires `@axlsdk/eval` as an optional peer dependency.
 
 ## What gets registered
 
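The filtering, sorting, and score-binning behavior described in the added README line can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `EvalItem` shape, the function names, and the premise that scores fall in the range 0 to 1 are hypothetical and are not taken from `@axlsdk/studio` or `@axlsdk/eval`.

```typescript
// Hypothetical item shape; the real result type in @axlsdk/eval may differ.
interface EvalItem {
  id: string;
  score: number | null; // null when the item errored; scores assumed to be in [0, 1]
  durationMs: number;
  costUsd: number;
  error?: string;
}

type SortKey = "score" | "durationMs" | "costUsd";

// Keep only errored items and/or items at or below a score threshold.
// Items without a score never pass a threshold filter.
function filterItems(
  items: EvalItem[],
  opts: { errorsOnly?: boolean; maxScore?: number },
): EvalItem[] {
  return items.filter((item) => {
    if (opts.errorsOnly && item.error === undefined) return false;
    if (opts.maxScore !== undefined) {
      if (item.score === null || item.score > opts.maxScore) return false;
    }
    return true;
  });
}

// Return a sorted copy; a missing score is treated as the lowest possible value.
function sortItems(items: EvalItem[], key: SortKey, descending = true): EvalItem[] {
  const valueOf = (item: EvalItem): number => {
    const v = item[key];
    return v === null ? Number.NEGATIVE_INFINITY : v;
  };
  const sign = descending ? -1 : 1;
  return [...items].sort((a, b) => {
    const va = valueOf(a);
    const vb = valueOf(b);
    return va < vb ? -sign : va > vb ? sign : 0;
  });
}

// Count scored items per equal-width bin over [0, 1]; a score of 1 lands in the last bin.
function scoreHistogram(items: EvalItem[], binCount = 10): number[] {
  const bins = new Array<number>(binCount).fill(0);
  for (const item of items) {
    if (item.score === null) continue;
    const clamped = Math.min(1, Math.max(0, item.score));
    const index = Math.min(binCount - 1, Math.floor(clamped * binCount));
    bins[index] += 1;
  }
  return bins;
}
```

Each function returns a new array rather than mutating its input, so a UI could apply a filter and a sort in sequence while keeping the original run results available for the comparison and history views.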