judgeval 0.0.53__tar.gz → 0.0.54__tar.gz

This diff shows the changes between publicly released versions of the package as they appear in their public registry, and is provided for informational purposes only.
Files changed (102)
  1. {judgeval-0.0.53 → judgeval-0.0.54}/PKG-INFO +5 -5
  2. {judgeval-0.0.53 → judgeval-0.0.54}/README.md +4 -4
  3. {judgeval-0.0.53 → judgeval-0.0.54}/pyproject.toml +1 -1
  4. {judgeval-0.0.53 → judgeval-0.0.54}/.github/ISSUE_TEMPLATE/bug_report.md +0 -0
  5. {judgeval-0.0.53 → judgeval-0.0.54}/.github/ISSUE_TEMPLATE/feature_request.md +0 -0
  6. {judgeval-0.0.53 → judgeval-0.0.54}/.github/pull_request_template.md +0 -0
  7. {judgeval-0.0.53 → judgeval-0.0.54}/.github/workflows/blocked-pr.yaml +0 -0
  8. {judgeval-0.0.53 → judgeval-0.0.54}/.github/workflows/ci.yaml +0 -0
  9. {judgeval-0.0.53 → judgeval-0.0.54}/.github/workflows/lint.yaml +0 -0
  10. {judgeval-0.0.53 → judgeval-0.0.54}/.github/workflows/merge-branch-check.yaml +0 -0
  11. {judgeval-0.0.53 → judgeval-0.0.54}/.github/workflows/release.yaml +0 -0
  12. {judgeval-0.0.53 → judgeval-0.0.54}/.github/workflows/validate-branch.yaml +0 -0
  13. {judgeval-0.0.53 → judgeval-0.0.54}/.gitignore +0 -0
  14. {judgeval-0.0.53 → judgeval-0.0.54}/.pre-commit-config.yaml +0 -0
  15. {judgeval-0.0.53 → judgeval-0.0.54}/LICENSE.md +0 -0
  16. {judgeval-0.0.53 → judgeval-0.0.54}/"assets/Screenshot 2025-05-17 at 8.14.27\342\200\257PM.png" +0 -0
  17. {judgeval-0.0.53 → judgeval-0.0.54}/assets/agent.gif +0 -0
  18. {judgeval-0.0.53 → judgeval-0.0.54}/assets/data.gif +0 -0
  19. {judgeval-0.0.53 → judgeval-0.0.54}/assets/dataset_clustering_screenshot.png +0 -0
  20. {judgeval-0.0.53 → judgeval-0.0.54}/assets/dataset_clustering_screenshot_dm.png +0 -0
  21. {judgeval-0.0.53 → judgeval-0.0.54}/assets/datasets_preview_screenshot.png +0 -0
  22. {judgeval-0.0.53 → judgeval-0.0.54}/assets/document.gif +0 -0
  23. {judgeval-0.0.53 → judgeval-0.0.54}/assets/error_analysis_dashboard.png +0 -0
  24. {judgeval-0.0.53 → judgeval-0.0.54}/assets/experiments_dashboard_screenshot.png +0 -0
  25. {judgeval-0.0.53 → judgeval-0.0.54}/assets/experiments_page.png +0 -0
  26. {judgeval-0.0.53 → judgeval-0.0.54}/assets/experiments_pagev2.png +0 -0
  27. {judgeval-0.0.53 → judgeval-0.0.54}/assets/logo-dark.svg +0 -0
  28. {judgeval-0.0.53 → judgeval-0.0.54}/assets/logo-light.svg +0 -0
  29. {judgeval-0.0.53 → judgeval-0.0.54}/assets/monitoring_screenshot.png +0 -0
  30. {judgeval-0.0.53 → judgeval-0.0.54}/assets/new_darkmode.svg +0 -0
  31. {judgeval-0.0.53 → judgeval-0.0.54}/assets/new_lightmode.svg +0 -0
  32. {judgeval-0.0.53 → judgeval-0.0.54}/assets/product_shot.png +0 -0
  33. {judgeval-0.0.53 → judgeval-0.0.54}/assets/trace.gif +0 -0
  34. {judgeval-0.0.53 → judgeval-0.0.54}/assets/trace_demo.png +0 -0
  35. {judgeval-0.0.53 → judgeval-0.0.54}/assets/trace_screenshot.png +0 -0
  36. {judgeval-0.0.53 → judgeval-0.0.54}/assets/trace_screenshot_old.png +0 -0
  37. {judgeval-0.0.53 → judgeval-0.0.54}/pytest.ini +0 -0
  38. {judgeval-0.0.53 → judgeval-0.0.54}/src/.coveragerc +0 -0
  39. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/__init__.py +0 -0
  40. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/clients.py +0 -0
  41. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/common/__init__.py +0 -0
  42. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/common/exceptions.py +0 -0
  43. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/common/logger.py +0 -0
  44. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/common/s3_storage.py +0 -0
  45. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/common/tracer.py +0 -0
  46. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/common/utils.py +0 -0
  47. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/constants.py +0 -0
  48. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/data/__init__.py +0 -0
  49. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/data/datasets/__init__.py +0 -0
  50. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/data/datasets/dataset.py +0 -0
  51. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/data/datasets/eval_dataset_client.py +0 -0
  52. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/data/example.py +0 -0
  53. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/data/judgment_types.py +0 -0
  54. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/data/result.py +0 -0
  55. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/data/scorer_data.py +0 -0
  56. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/data/scripts/fix_default_factory.py +0 -0
  57. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/data/scripts/openapi_transform.py +0 -0
  58. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/data/tool.py +0 -0
  59. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/data/trace.py +0 -0
  60. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/data/trace_run.py +0 -0
  61. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/evaluation_run.py +0 -0
  62. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/integrations/langgraph.py +0 -0
  63. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/judges/__init__.py +0 -0
  64. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/judges/base_judge.py +0 -0
  65. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/judges/litellm_judge.py +0 -0
  66. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/judges/mixture_of_judges.py +0 -0
  67. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/judges/together_judge.py +0 -0
  68. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/judges/utils.py +0 -0
  69. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/judgment_client.py +0 -0
  70. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/rules.py +0 -0
  71. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/run_evaluation.py +0 -0
  72. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/__init__.py +0 -0
  73. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/agent_scorer.py +0 -0
  74. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/api_scorer.py +0 -0
  75. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/base_scorer.py +0 -0
  76. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/example_scorer.py +0 -0
  77. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/exceptions.py +0 -0
  78. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/judgeval_scorers/__init__.py +0 -0
  79. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/judgeval_scorers/api_scorers/__init__.py +0 -0
  80. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/judgeval_scorers/api_scorers/answer_correctness.py +0 -0
  81. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/judgeval_scorers/api_scorers/answer_relevancy.py +0 -0
  82. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/judgeval_scorers/api_scorers/classifier_scorer.py +0 -0
  83. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/judgeval_scorers/api_scorers/derailment_scorer.py +0 -0
  84. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/judgeval_scorers/api_scorers/execution_order.py +0 -0
  85. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/judgeval_scorers/api_scorers/faithfulness.py +0 -0
  86. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/judgeval_scorers/api_scorers/hallucination.py +0 -0
  87. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/judgeval_scorers/api_scorers/instruction_adherence.py +0 -0
  88. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/judgeval_scorers/api_scorers/tool_dependency.py +0 -0
  89. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/judgeval_scorers/api_scorers/tool_order.py +0 -0
  90. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/judgeval_scorers/classifiers/__init__.py +0 -0
  91. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/judgeval_scorers/classifiers/text2sql/__init__.py +0 -0
  92. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/judgeval_scorers/classifiers/text2sql/text2sql_scorer.py +0 -0
  93. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/score.py +0 -0
  94. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/scorers/utils.py +0 -0
  95. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/tracer/__init__.py +0 -0
  96. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/utils/alerts.py +0 -0
  97. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/utils/file_utils.py +0 -0
  98. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/utils/requests.py +0 -0
  99. {judgeval-0.0.53 → judgeval-0.0.54}/src/judgeval/version_check.py +0 -0
  100. {judgeval-0.0.53 → judgeval-0.0.54}/src/update_types.sh +0 -0
  101. {judgeval-0.0.53 → judgeval-0.0.54}/update_version.py +0 -0
  102. {judgeval-0.0.53 → judgeval-0.0.54}/uv.lock +0 -0
{judgeval-0.0.53 → judgeval-0.0.54}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: judgeval
-Version: 0.0.53
+Version: 0.0.54
 Summary: Judgeval Package
 Project-URL: Homepage, https://github.com/JudgmentLabs/judgeval
 Project-URL: Issues, https://github.com/JudgmentLabs/judgeval/issues
@@ -151,10 +151,10 @@ You'll see your trace exported to the Judgment Platform:
 
 | | |
 |:---|:---:|
-| <h3>🔍 Tracing</h3>Automatic agent tracing integrated with common frameworks (LangGraph, OpenAI, Anthropic): **tracking inputs/outputs, agent tool calls, latency, and cost** at every step.<br><br>Online evals can be applied to traces to measure quality on production data in real-time. Export data per individual trace for detailed analysis.<br><br>**Useful for:**<br>• 🐛 Debugging agent runs <br>• 📋 Collecting agent environment data <br>• 🔬 Pinpointing performance bottlenecks| <p align="center"><img src="assets/trace_screenshot.png" alt="Tracing visualization" width="1200"/></p> |
-| <h3>🧪 Evals</h3>Evals are the key to regression testing for agents. Judgeval provides 15+ research-backed metrics including tool call accuracy, hallucinations, instruction adherence, and retrieval context recall.<br><br>Judgeval supports LLM-as-a-judge, manual labeling, and custom evaluators that connect with our metric-tracking infrastructure. <br><br>**Useful for:**<br>• ⚠️ Unit-testing <br>• 🔬 Experimental prompt testing<br>• 🛡️ Online guardrails | <p align="center"><img src="assets/experiments_page.png" alt="Evaluation metrics" width="800"/></p> |
-| <h3>📡 Monitoring</h3>Track all your agent metrics in production. **Catch production regressions early.**<br><br>Configure alerts to trigger automated actions when metric thresholds are exceeded (add agent trace to review queue/dataset, Slack notification, etc.).<br><br> **Useful for:** <br>• 📉 Identifying degradation early <br>• 📈 Visualizing performance trends across agent versions and time | <p align="center"><img src="assets/error_analysis_dashboard.png" alt="Monitoring Dashboard" width="1200"/></p> |
-| <h3>📊 Datasets</h3>Export comprehensive agent-environment interaction data or import external testcases to datasets for scaled analysis and optimization. Move datasets to/from Parquet, S3, etc. <br><br>Run evals on datasets as unit tests or to A/B test different agent configurations, enabling continuous learning from production interactions. <br><br> **Useful for:**<br>• 🗃️ Agent environment interaction data for optimization<br>• 🔄 Scaled analysis for A/B tests | <p align="center"><img src="assets/datasets_preview_screenshot.png" alt="Dataset management" width="1200"/></p> |
+| <h3>🔍 Tracing</h3>Automatic agent tracing integrated with common frameworks (LangGraph, OpenAI, Anthropic). **Tracks inputs/outputs, agent tool calls, latency, cost, and custom metadata** at every step.<br><br>**Useful for:**<br>• 🐛 Debugging agent runs <br>• 📋 Collecting agent environment data <br>• 🔬 Pinpointing performance bottlenecks| <p align="center"><img src="assets/trace_screenshot.png" alt="Tracing visualization" width="1200"/></p> |
+| <h3>🧪 Evals</h3>Build custom evaluators on top of your agents. Judgeval supports LLM-as-a-judge, manual labeling, and code-based evaluators that connect with our metric-tracking infrastructure. <br><br>**Useful for:**<br>• ⚠️ Unit-testing <br>• 🔬 A/B testing <br>• 🛡️ Online guardrails | <p align="center"><img src="assets/experiments_page.png" alt="Evaluation metrics" width="800"/></p> |
+| <h3>📡 Monitoring</h3>Get Slack alerts when your agent fails in production. Add custom hooks to address production regressions.<br><br> **Useful for:** <br>• 📉 Identifying degradation early <br>• 📈 Visualizing performance trends across agent versions and time | <p align="center"><img src="assets/error_analysis_dashboard.png" alt="Monitoring Dashboard" width="1200"/></p> |
+| <h3>📊 Datasets</h3>Export traces and test cases to datasets for scaled analysis and optimization. Move datasets to/from Parquet, S3, etc. <br><br>Run evals on datasets as unit tests or to A/B test different agent configurations, enabling continuous learning from production interactions. <br><br> **Useful for:**<br>• 🗃️ Agent environment interaction data for optimization<br>• 🔄 Scaled analysis for A/B tests | <p align="center"><img src="assets/datasets_preview_screenshot.png" alt="Dataset management" width="1200"/></p> |
 
 ## 🏢 Self-Hosting
 
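For context on the Tracing row above: the tracer ships in src/judgeval/common/tracer.py (unchanged in this release). A minimal sketch of how it is typically wired up, based on judgeval's documented quickstart and assuming the Tracer/wrap/observe API those modules expose; exact names and defaults may differ in 0.0.54:

```python
# Minimal tracing sketch (illustrative; based on judgeval's README quickstart).
# Assumes JUDGMENT_API_KEY and JUDGMENT_ORG_ID are set in the environment.
from judgeval.common.tracer import Tracer, wrap
from openai import OpenAI

judgment = Tracer(project_name="my_agent")  # "my_agent" is a placeholder project name
client = wrap(OpenAI())  # calls through the wrapped client are recorded as LLM spans

@judgment.observe(span_type="function")  # captures inputs/outputs, latency, and cost
def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content
```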
{judgeval-0.0.53 → judgeval-0.0.54}/README.md

@@ -121,10 +121,10 @@ You'll see your trace exported to the Judgment Platform:
 
 | | |
 |:---|:---:|
-| <h3>🔍 Tracing</h3>Automatic agent tracing integrated with common frameworks (LangGraph, OpenAI, Anthropic): **tracking inputs/outputs, agent tool calls, latency, and cost** at every step.<br><br>Online evals can be applied to traces to measure quality on production data in real-time. Export data per individual trace for detailed analysis.<br><br>**Useful for:**<br>• 🐛 Debugging agent runs <br>• 📋 Collecting agent environment data <br>• 🔬 Pinpointing performance bottlenecks| <p align="center"><img src="assets/trace_screenshot.png" alt="Tracing visualization" width="1200"/></p> |
-| <h3>🧪 Evals</h3>Evals are the key to regression testing for agents. Judgeval provides 15+ research-backed metrics including tool call accuracy, hallucinations, instruction adherence, and retrieval context recall.<br><br>Judgeval supports LLM-as-a-judge, manual labeling, and custom evaluators that connect with our metric-tracking infrastructure. <br><br>**Useful for:**<br>• ⚠️ Unit-testing <br>• 🔬 Experimental prompt testing<br>• 🛡️ Online guardrails | <p align="center"><img src="assets/experiments_page.png" alt="Evaluation metrics" width="800"/></p> |
-| <h3>📡 Monitoring</h3>Track all your agent metrics in production. **Catch production regressions early.**<br><br>Configure alerts to trigger automated actions when metric thresholds are exceeded (add agent trace to review queue/dataset, Slack notification, etc.).<br><br> **Useful for:** <br>• 📉 Identifying degradation early <br>• 📈 Visualizing performance trends across agent versions and time | <p align="center"><img src="assets/error_analysis_dashboard.png" alt="Monitoring Dashboard" width="1200"/></p> |
-| <h3>📊 Datasets</h3>Export comprehensive agent-environment interaction data or import external testcases to datasets for scaled analysis and optimization. Move datasets to/from Parquet, S3, etc. <br><br>Run evals on datasets as unit tests or to A/B test different agent configurations, enabling continuous learning from production interactions. <br><br> **Useful for:**<br>• 🗃️ Agent environment interaction data for optimization<br>• 🔄 Scaled analysis for A/B tests | <p align="center"><img src="assets/datasets_preview_screenshot.png" alt="Dataset management" width="1200"/></p> |
+| <h3>🔍 Tracing</h3>Automatic agent tracing integrated with common frameworks (LangGraph, OpenAI, Anthropic). **Tracks inputs/outputs, agent tool calls, latency, cost, and custom metadata** at every step.<br><br>**Useful for:**<br>• 🐛 Debugging agent runs <br>• 📋 Collecting agent environment data <br>• 🔬 Pinpointing performance bottlenecks| <p align="center"><img src="assets/trace_screenshot.png" alt="Tracing visualization" width="1200"/></p> |
+| <h3>🧪 Evals</h3>Build custom evaluators on top of your agents. Judgeval supports LLM-as-a-judge, manual labeling, and code-based evaluators that connect with our metric-tracking infrastructure. <br><br>**Useful for:**<br>• ⚠️ Unit-testing <br>• 🔬 A/B testing <br>• 🛡️ Online guardrails | <p align="center"><img src="assets/experiments_page.png" alt="Evaluation metrics" width="800"/></p> |
+| <h3>📡 Monitoring</h3>Get Slack alerts when your agent fails in production. Add custom hooks to address production regressions.<br><br> **Useful for:** <br>• 📉 Identifying degradation early <br>• 📈 Visualizing performance trends across agent versions and time | <p align="center"><img src="assets/error_analysis_dashboard.png" alt="Monitoring Dashboard" width="1200"/></p> |
+| <h3>📊 Datasets</h3>Export traces and test cases to datasets for scaled analysis and optimization. Move datasets to/from Parquet, S3, etc. <br><br>Run evals on datasets as unit tests or to A/B test different agent configurations, enabling continuous learning from production interactions. <br><br> **Useful for:**<br>• 🗃️ Agent environment interaction data for optimization<br>• 🔄 Scaled analysis for A/B tests | <p align="center"><img src="assets/datasets_preview_screenshot.png" alt="Dataset management" width="1200"/></p> |
 
 ## 🏢 Self-Hosting
 
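For context on the Evals rows: the scorer pipeline lives under src/judgeval/scorers/ (also unchanged here). A minimal sketch adapted from the package's documented quickstart, using the FaithfulnessScorer from the file list above; treat parameter names as indicative rather than authoritative:

```python
# Minimal evaluation sketch (illustrative; adapted from judgeval's quickstart).
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import FaithfulnessScorer

client = JudgmentClient()  # reads JUDGMENT_API_KEY from the environment

# One test case: did the output stay faithful to the retrieved context?
example = Example(
    input="What if these shoes don't fit?",
    actual_output="We offer a 30-day full refund at no extra cost.",
    retrieval_context=["All customers are eligible for a 30 day full refund at no extra cost."],
)

results = client.run_evaluation(
    examples=[example],
    scorers=[FaithfulnessScorer(threshold=0.5)],  # pass/fail cutoff on the 0-1 score
    model="gpt-4.1",  # LLM used as the judge
)
print(results)
```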
{judgeval-0.0.53 → judgeval-0.0.54}/pyproject.toml

@@ -1,6 +1,6 @@
 [project]
 name = "judgeval"
-version = "0.0.53"
+version = "0.0.54"
 authors = [
   { name="Andrew Li", email="andrew@judgmentlabs.ai" },
   { name="Alex Shan", email="alex@judgmentlabs.ai" },
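
The only functional change in this release is the version bump itself; the PKG-INFO and README diffs are copy edits. Consumers can pick it up with a plain pin update, e.g. `pip install --upgrade judgeval==0.0.54`.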