PyPI - judgeval - Versions diffs - 0.2.0__py3-none-any.whl → 0.3.1__py3-none-any.whl - Mend

judgeval 0.2.0__py3-none-any.whl → 0.3.1__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (30)

judgeval/common/api/api.py +38 -7
judgeval/common/api/constants.py +9 -1
judgeval/common/storage/s3_storage.py +2 -3
judgeval/common/tracer/core.py +66 -32
judgeval/common/tracer/otel_span_processor.py +4 -50
judgeval/common/tracer/span_transformer.py +16 -10
judgeval/common/utils.py +46 -38
judgeval/constants.py +2 -0
judgeval/data/example.py +9 -37
judgeval/data/judgment_types.py +23 -45
judgeval/data/result.py +8 -14
judgeval/data/scripts/openapi_transform.py +5 -5
judgeval/data/trace.py +3 -4
judgeval/dataset.py +192 -0
judgeval/evaluation_run.py +1 -0
judgeval/judges/litellm_judge.py +2 -2
judgeval/judges/mixture_of_judges.py +6 -6
judgeval/judges/together_judge.py +6 -3
judgeval/judgment_client.py +9 -71
judgeval/run_evaluation.py +41 -9
judgeval/scorers/score.py +11 -7
judgeval/scorers/utils.py +3 -3
judgeval/utils/file_utils.py +40 -25
{judgeval-0.2.0.dist-info → judgeval-0.3.1.dist-info}/METADATA +10 -6
{judgeval-0.2.0.dist-info → judgeval-0.3.1.dist-info}/RECORD +27 -29
judgeval/data/datasets/__init__.py +0 -4
judgeval/data/datasets/dataset.py +0 -341
judgeval/data/datasets/eval_dataset_client.py +0 -214
{judgeval-0.2.0.dist-info → judgeval-0.3.1.dist-info}/WHEEL +0 -0
{judgeval-0.2.0.dist-info → judgeval-0.3.1.dist-info}/licenses/LICENSE.md +0 -0

{judgeval-0.2.0.dist-info → judgeval-0.3.1.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: judgeval
-Version: 0.2.0
+Version: 0.3.1
 Summary: Judgeval Package
 Project-URL: Homepage, https://github.com/JudgmentLabs/judgeval
 Project-URL: Issues, https://github.com/JudgmentLabs/judgeval/issues
@@ -14,6 +14,7 @@ Requires-Dist: anthropic
 Requires-Dist: boto3
 Requires-Dist: datamodel-code-generator>=0.31.1
 Requires-Dist: google-genai
+Requires-Dist: groq>=0.30.0
 Requires-Dist: langchain-anthropic
 Requires-Dist: langchain-core
 Requires-Dist: langchain-huggingface
@@ -22,6 +23,9 @@ Requires-Dist: litellm>=1.61.15
 Requires-Dist: matplotlib>=3.10.3
 Requires-Dist: nest-asyncio
 Requires-Dist: openai
+Requires-Dist: opentelemetry-api>=1.34.1
+Requires-Dist: opentelemetry-sdk>=1.34.1
+Requires-Dist: orjson>=3.9.0
 Requires-Dist: pandas
 Requires-Dist: python-dotenv==1.0.1
 Requires-Dist: python-slugify>=8.0.4
@@ -39,7 +43,7 @@ Description-Content-Type: text/markdown
     Enable self-learning agents with traces, evals, and environment data.
 </div>
-## [Docs](https://docs.judgmentlabs.ai/)  •  [Judgment Cloud](https://app.judgmentlabs.ai/register)  • [Self-Host](https://docs.judgmentlabs.ai/documentation/self-hosting/get-started)
+## [Docs](https://docs.judgmentlabs.ai/)  •  [Judgment Cloud](https://app.judgmentlabs.ai/register)  • [Self-Host](https://docs.judgmentlabs.ai/documentation/self-hosting/get-started)  • [Landing Page](https://judgmentlabs.ai/)
  [Demo](https://www.youtube.com/watch?v=1S4LixpVbcc) • [Bug Reports](https://github.com/JudgmentLabs/judgeval/issues) • [Changelog](https://docs.judgmentlabs.ai/changelog/2025-04-21)
@@ -139,7 +143,7 @@ run_agent("What is the capital of the United States?")
 ```
 You'll see your trace exported to the Judgment Platform:
-<p align="center"><img src="assets/trace_demo.png" alt="Judgment Platform Trace Example" width="800" /></p>
+<p align="center"><img src="assets/online_eval.png" alt="Judgment Platform Trace Example" width="1500" /></p>
 [Click here](https://docs.judgmentlabs.ai/documentation/tracing/introduction) for a more detailed explanation.
@@ -152,9 +156,9 @@ You'll see your trace exported to the Judgment Platform:
 |  |  |
 |:---|:---:|
-| <h3>🔍 Tracing</h3>Automatic agent tracing integrated with common frameworks (LangGraph, OpenAI, Anthropic). **Tracks inputs/outputs, agent tool calls, latency, cost, and custom metadata** at every step.<br><br>**Useful for:**<br>• 🐛 Debugging agent runs <br>• 📋 Collecting agent environment data <br>• 🔬 Pinpointing performance bottlenecks| <p align="center"><img src="assets/trace_screenshot.png" alt="Tracing visualization" width="1200"/></p> |
-| <h3>🧪 Evals</h3>Build custom evaluators on top of your agents. Judgeval supports LLM-as-a-judge, manual labeling, and code-based evaluators that connect with our metric-tracking infrastructure. <br><br>**Useful for:**<br>• ⚠️ Unit-testing <br>• 🔬 A/B testing <br>• 🛡️ Online guardrails | <p align="center"><img src="assets/experiments_page.png" alt="Evaluation metrics" width="800"/></p> |
-| <h3>📡 Monitoring</h3>Get Slack alerts for agent failures in production. Add custom hooks to address production regressions.<br><br> **Useful for:** <br>• 📉 Identifying degradation early <br>• 📈 Visualizing performance trends across agent versions and time | <p align="center"><img src="assets/error_analysis_dashboard.png" alt="Monitoring Dashboard" width="1200"/></p> |
+| <h3>🔍 Tracing</h3>Automatic agent tracing integrated with common frameworks (LangGraph, OpenAI, Anthropic). **Tracks inputs/outputs, agent tool calls, latency, cost, and custom metadata** at every step.<br><br>**Useful for:**<br>• 🐛 Debugging agent runs <br>• 📋 Collecting agent environment data <br>• 🔬 Pinpointing performance bottlenecks| <p align="center"><img src="assets/agent_trace_example.png" alt="Tracing visualization" width="1200"/></p> |
+| <h3>🧪 Evals</h3>Build custom evaluators on top of your agents. Judgeval supports LLM-as-a-judge, manual labeling, and code-based evaluators that connect with our metric-tracking infrastructure. <br><br>**Useful for:**<br>• ⚠️ Unit-testing <br>• 🔬 A/B testing <br>• 🛡️ Online guardrails | <p align="center"><img src="assets/test.png" alt="Evaluation metrics" width="800"/></p> |
+| <h3>📡 Monitoring</h3>Get Slack alerts for agent failures in production. Add custom hooks to address production regressions.<br><br> **Useful for:** <br>• 📉 Identifying degradation early <br>• 📈 Visualizing performance trends across agent versions and time | <p align="center"><img src="assets/errors.png" alt="Monitoring Dashboard" width="1200"/></p> |
 | <h3>📊 Datasets</h3>Export traces and test cases to datasets for scaled analysis and optimization. Move datasets to/from Parquet, S3, etc. <br><br>Run evals on datasets as unit tests or to A/B test different agent configurations, enabling continuous learning from production interactions. <br><br> **Useful for:**<br>• 🗃️ Agent environment interaction data for optimization<br>• 🔄 Scaled analysis for A/B tests | <p align="center"><img src="assets/datasets_preview_screenshot.png" alt="Dataset management" width="1200"/></p> |
 ## 🏢 Self-Hosting
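
The dependency additions in the METADATA hunk above can also be extracted programmatically. The following is a minimal sketch, assuming both METADATA files have already been read into strings. The helper names `requirements` and `added_requirements` and the abbreviated `OLD`/`NEW` samples are illustrative and not part of judgeval. It relies on METADATA using RFC 822-style headers, which the standard-library `email` parser can read.

```python
from email.parser import Parser


def requirements(metadata_text: str) -> set:
    """Return the set of Requires-Dist values found in a METADATA document."""
    headers = Parser().parsestr(metadata_text, headersonly=True)
    return set(headers.get_all("Requires-Dist") or [])


def added_requirements(old_text: str, new_text: str) -> list:
    """Requirements present in the new METADATA but absent from the old one."""
    return sorted(requirements(new_text) - requirements(old_text))


# Abbreviated samples based on the hunk above (not the full METADATA files).
OLD = """\
Metadata-Version: 2.4
Name: judgeval
Version: 0.2.0
Requires-Dist: google-genai
Requires-Dist: openai
Requires-Dist: pandas
"""

NEW = """\
Metadata-Version: 2.4
Name: judgeval
Version: 0.3.1
Requires-Dist: google-genai
Requires-Dist: groq>=0.30.0
Requires-Dist: openai
Requires-Dist: opentelemetry-api>=1.34.1
Requires-Dist: opentelemetry-sdk>=1.34.1
Requires-Dist: orjson>=3.9.0
Requires-Dist: pandas
"""

for requirement in added_requirements(OLD, NEW):
    print("+", requirement)
```

Comparing sets of full requirement strings means a changed version pin shows up as an addition. Reporting it as a modification instead would require parsing the requirement names separately.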

{judgeval-0.2.0.dist-info → judgeval-0.3.1.dist-info}/RECORD RENAMED

@@ -1,47 +1,45 @@
 judgeval/__init__.py,sha256=HM1M8hmqRum6G554QKkXhB4DF4f5eh_xtYo0Kf-t3kw,332
 judgeval/clients.py,sha256=JnB8n90GyXiYaGmSEYaA67mdJSnr3SIrzArao7NGebw,980
-judgeval/constants.py,sha256=rfl4gW9_4irxgamjTC-jvDj2ATSUrjEu0UAHZ4pLLtY,4081
-judgeval/evaluation_run.py,sha256=PZeoKS7JCsO2gzdo8jeq8786yn01Ccrq0xuCNUu9CPo,2797
-judgeval/judgment_client.py,sha256=tUgKS2sV8QZUxjdh3mP2PSBnC7Bci1e8ur8muvrgzBM,14011
+judgeval/constants.py,sha256=hWed25HwGUJy-tePbtoUZ0_Zg0X_MkAH84KiH-OHHFI,4150
+judgeval/dataset.py,sha256=rjV54XNTslNNtf-Uu2ndDIh602ZwSCFhPg2NuckDJ-w,6081
+judgeval/evaluation_run.py,sha256=edNpO444Fwt2ykWsflIzlYdDJUlUfbpXHHQSKfFS4y0,2876
+judgeval/judgment_client.py,sha256=vPoxbmxAlhbG5rXXqxWjMbyEqOI044BaQanr1fev2CE,11723
 judgeval/rules.py,sha256=CoQjqmP8daEXewMkplmA-7urubDtweOr5O6z8klVwLI,20031
-judgeval/run_evaluation.py,sha256=h05MI7S5q6cmm_mFuM_nqHqNIu-IHXkisoZat4YOSDE,26405
+judgeval/run_evaluation.py,sha256=7J6FHhWhB-IDPMSOcWkrjTpSNm2v3s_KBq8Np3y2pys,27652
 judgeval/version_check.py,sha256=FoLEtpCjDw2HuDQdpw5yT29UtwumSc6ZZN6AV_c9Mnw,1057
 judgeval/common/__init__.py,sha256=KH-QJyWtQ60R6yFIBDYS3WGRiNpEu1guynpxivZvpBQ,309
 judgeval/common/exceptions.py,sha256=OkgDznu2wpBQZMXiZarLJYNk1HIcC8qYW7VypDC3Ook,556
 judgeval/common/logger.py,sha256=514eFLYWS_UL8VY-zAR2ePUlpQe4rbYlleLASFllLE4,1511
-judgeval/common/utils.py,sha256=GhCEv8i_7JK4DJeUlMmibqEUy9ZVHxJAlFO_CriAzg4,34323
+judgeval/common/utils.py,sha256=oxGDRVWOICKWeyGgsoc36_yAyHSYF4XtH842Mkznwis,34739
 judgeval/common/api/__init__.py,sha256=-E7lpZz1fG8puR_aYUMfPmQ-Vyhd0bgzoaU5EhIuFjQ,114
-judgeval/common/api/api.py,sha256=BGtAGGRDqxs8DrA0ye8BPZ6KBsgJ2C0Dca4vvA55d6g,13049
-judgeval/common/api/constants.py,sha256=azA0eyz4q33SWS795NHhaKDKNmVHBWAAGe5_sk37nDU,4297
+judgeval/common/api/api.py,sha256=wty02HYANeOYlM8fHOLc33ux5bu9Ieq7iRqCr-UP0ng,14157
+judgeval/common/api/constants.py,sha256=vAW94pbyTS6rv1TKpt7z6xxMJvTaAxFiy1D4kzuLHeg,4567
 judgeval/common/storage/__init__.py,sha256=a-PI7OL-ydyzugGUKmJKRBASnK-Q-gs82L9K9rSyJP8,90
-judgeval/common/storage/s3_storage.py,sha256=UvAKGSa0S1BnNprzDKHMAfyT-8zlMAOM5kCrXcVN0HE,3743
+judgeval/common/storage/s3_storage.py,sha256=0-bNKheqJJyBZ92KGrzQtd1zocIRWBlfn_58L4a-Ay0,3719
 judgeval/common/tracer/__init__.py,sha256=tJCJsmVmrL89Phv88gNCJ-j0ITPez6lh8vhMAAlLNSc,795
 judgeval/common/tracer/constants.py,sha256=yu5y8gMe5yb1AaBkPtAH-BNwIaAR3NwYCRoSf45wp5U,621
-judgeval/common/tracer/core.py,sha256=Ij-KDD3dVXHK_6NPk-VbTH_Mo8GZq5h4Zl5ii5QMjnE,72403
+judgeval/common/tracer/core.py,sha256=6a67h8WfI4T5YV4TXqZqAAbOPptA0yaIV38pe7Urf_0,73813
 judgeval/common/tracer/otel_exporter.py,sha256=kZLlOQ6afQE4dmb9H1wgU4P3H5PG1D_zKyvnpWcT5Ak,3899
-judgeval/common/tracer/otel_span_processor.py,sha256=3cMETvrNlwrTkS_XDdTNRhjVw_6TdgnojpQhDK9sbOs,7484
+judgeval/common/tracer/otel_span_processor.py,sha256=W7SM62KnxJ48vC9WllIHRKaLlvxkCwqYoT4KqZLfGNs,6497
 judgeval/common/tracer/span_processor.py,sha256=eFjTgSWSkM6BWE94CrvgafDg_WkxLsFL_MafwBG-p9M,1145
-judgeval/common/tracer/span_transformer.py,sha256=YIHEmr35o6_uX931JbD1PFIcLIWTVumWrJ198Ys391k,7544
+judgeval/common/tracer/span_transformer.py,sha256=nCnwRC52OKfYRFnsOwGdPaqb_U17yn5S_9jfhv1GaLM,7803
 judgeval/common/tracer/trace_manager.py,sha256=7KLWBrz5GE_138DHL_eRjhx4-LNfXKz1q_XIDfg6nw8,2992
 judgeval/data/__init__.py,sha256=1QagDcSQtfnJ632t9Dnq8d7XjAqhmY4mInOWt8qH9tM,455
-judgeval/data/example.py,sha256=6xtPTwWUsZ0HdErU-g954nCv64fsbnS1I5xuEvs14EA,2027
-judgeval/data/judgment_types.py,sha256=s1oea01AEBQBdqQntXhTbMiuDGAxvs2iGoxrR2uLnUw,9538
-judgeval/data/result.py,sha256=hHKiMMEl9Qv3EvK5UH8Y5YDu8VyvrHzNqlKatrq4UUY,2450
+judgeval/data/example.py,sha256=kRskIgsjwcvv2Y8jaPwV-PND7zlmMbFsvRVQ_b7SZY0,914
+judgeval/data/judgment_types.py,sha256=KE1HrFLfSxiu1zutaiZ7B7La9PGXIAsoWpo_5iy645c,8336
+judgeval/data/result.py,sha256=OtSnBUrdQpjyAqxXRLTW3wC9v9lOm_GqzL14ccRQxrg,2124
 judgeval/data/scorer_data.py,sha256=5QBHtvOIWOq0Rn9_uPJzAMRYMlWxMB-rXnG_6kV4Z4Y,2955
 judgeval/data/tool.py,sha256=iWQSdy5uNbIeACu3gQy1DC2oGYxRVYNfkkczWdQMAiA,99
-judgeval/data/trace.py,sha256=_cyCsyg2gwG7lyyv186xo4OvGH2QlJDuyIg-qh-CZNA,6994
+judgeval/data/trace.py,sha256=tDOuYFPUssQInjsmwyxcXq-W3IB29Vq340VzqafuKJc,6942
 judgeval/data/trace_run.py,sha256=c6pRSv09Vj016hxM49I3kMftCwWg8hhkfT_1kBXluSI,1600
-judgeval/data/datasets/__init__.py,sha256=IdNKhQv9yYZ_op0rdBacrFaFVmiiYQ3JTzXzxOTsEVQ,176
-judgeval/data/datasets/dataset.py,sha256=dDmTYSBRj4YEUhgYOebAcDm4N14nj3tcCqHj9y2Z1z0,12725
-judgeval/data/datasets/eval_dataset_client.py,sha256=8tiuwRC3oebc19KY-5b99Cxj0qq6ADW1NMDd1R1RhLc,7258
 judgeval/data/scripts/fix_default_factory.py,sha256=lvp2JwYZqz-XpD9LZNa3mANZVP-jJSZoNzolI6JWERM,591
-judgeval/data/scripts/openapi_transform.py,sha256=Rye-fErFtENAq3KKBKRUVR_oJdjYZtNzKRBKFkYS0XQ,3857
+judgeval/data/scripts/openapi_transform.py,sha256=Sm04JClzyP1ga8KA3gkIdsae8Hlx-XU7-x0gHCQYOhg,3877
 judgeval/integrations/langgraph.py,sha256=kJXLsgBY7DgsUTZyVQ47deDgHm887brFHfyIbuyerGw,29986
 judgeval/judges/__init__.py,sha256=6X7VSwrwsdxGBNxCyapVRWGghhKOy3MVxFNMQ62kCXM,308
 judgeval/judges/base_judge.py,sha256=_dz0qWsKRxzXxpRY9l6mrxTRYPSF2FE4ZXkrzhZ4gbY,986
-judgeval/judges/litellm_judge.py,sha256=LX4_KXb1Jp8IXif3vvOiKfRYH7ZkbQLs9AtWPGmj544,2483
-judgeval/judges/mixture_of_judges.py,sha256=wcHwLi9zU0uwKMqRVhcPdjiYKgWflX4dpUbU2kS9yg0,14825
-judgeval/judges/together_judge.py,sha256=r5k8ZcC6lnsFttGHhrocFtmglx2Cb3G-4ORKAeK-Nmw,2253
+judgeval/judges/litellm_judge.py,sha256=yt6QvwKMmxZcrUtjbn3EiO5aVg7CHM2YZkBCSQLS8jk,2509
+judgeval/judges/mixture_of_judges.py,sha256=cecQ8mRmz2-dDoZl2MGsrhZICkpIvRovGPK3su0kc8s,14889
+judgeval/judges/together_judge.py,sha256=5FADUhs6-FN1ZVV_1D3-8_gu9mPbZiG0PYTpme41SfM,2336
 judgeval/judges/utils.py,sha256=0CF9qtIUQUL3-W-qTGpmTjZbkUUBAM6TslDsrCHnTBU,2725
 judgeval/scorers/__init__.py,sha256=4H_cinTQ4EogZv59YEV-3U9EOTLppNwgAPTi1-jI9Fw,746
 judgeval/scorers/agent_scorer.py,sha256=TjwD_YglSywr3EowEojiCyg5qDgCRa5LRGc5nFdmIBc,703
@@ -49,8 +47,8 @@ judgeval/scorers/api_scorer.py,sha256=xlhqkeMUBFxl8daSXOTWOYwZjBAz7o6b4sVD5f8cIH
 judgeval/scorers/base_scorer.py,sha256=eDfQk8N8TQfM1ayJDWr0NTdSQxcbk9-VZHd0Igb9EbI,2878
 judgeval/scorers/example_scorer.py,sha256=2n45y3LMV1Q-ARyXLHqvVWETlnY1DqS7OLzPu9IBGz8,716
 judgeval/scorers/exceptions.py,sha256=ACDHK5-TWiF3NTk-wycaedpbrdobm-CvvC1JA_iP-Mk,179
-judgeval/scorers/score.py,sha256=t9prkpDapcOAyuOXtDHMmwrqVGW0C_Hvx1UIEGyafmI,6610
-judgeval/scorers/utils.py,sha256=WM7mTCQSa2Z_rJ-0Iv9dhuBmtkTfV0pFN7XEhxHdzsM,3959
+judgeval/scorers/score.py,sha256=2-M_AmOjIQR2c0qvuB4WIIQD-7zSNdzsWC8ttqltw2g,6601
+judgeval/scorers/utils.py,sha256=HQOYTJtNnsi_aPfMssePAaBbXpAv7LXgwUlWlDFuN2g,3965
 judgeval/scorers/judgeval_scorers/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 judgeval/scorers/judgeval_scorers/api_scorers/__init__.py,sha256=GX4KkwPR2p-c0Y5mZingJa8EUfjAbMGhrmRBDBunOGw,1484
 judgeval/scorers/judgeval_scorers/api_scorers/answer_correctness.py,sha256=zJsU0VrUmRhY9qav48c6jTyDqUwI3JzhV9ajtlJCe0M,544
@@ -65,9 +63,9 @@ judgeval/scorers/judgeval_scorers/api_scorers/tool_dependency.py,sha256=Mcp1CjMN
 judgeval/scorers/judgeval_scorers/api_scorers/tool_order.py,sha256=Z2FLGBC7m_CLx-CMgXVuTvYvN0vY5yOcWA0ImBkeBfY,787
 judgeval/tracer/__init__.py,sha256=wkuXtOGDCrwgPPXlh_sSJmvGuWaAMHyNzk1TzB5f9aI,148
 judgeval/utils/alerts.py,sha256=3w_AjQrgfmOZvfqCridW8WAnHVxHHXokX9jNzVFyGjA,3297
-judgeval/utils/file_utils.py,sha256=wIEn8kjM0WrP216RGU_yhZhFOMWIS5ckigyHbzFSOMk,1774
+judgeval/utils/file_utils.py,sha256=PWHRs8dUr8iDwpglSSk4Yjd7C6ZhDzUaO-jV3m7riHM,1987
 judgeval/utils/requests.py,sha256=K3gUKrwL6TvwYKVYO5OeLWdUHn9NiUPmnIXhZEiEaHU,1534
-judgeval-0.2.0.dist-info/METADATA,sha256=1AYfJLsYTlofcz1PDkd9Np71U_NvOSWKG_T387xdQ-0,10188
-judgeval-0.2.0.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
-judgeval-0.2.0.dist-info/licenses/LICENSE.md,sha256=tKmCg7k5QOmxPK19XMfzim04QiQJPmgIm0pAn55IJwk,11352
-judgeval-0.2.0.dist-info/RECORD,,
+judgeval-0.3.1.dist-info/METADATA,sha256=rMctXqjJ8pY2MEfeXMA9Ot_8GQiZUDZwErzCZ6rommQ,10348
+judgeval-0.3.1.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
+judgeval-0.3.1.dist-info/licenses/LICENSE.md,sha256=tKmCg7k5QOmxPK19XMfzim04QiQJPmgIm0pAn55IJwk,11352
+judgeval-0.3.1.dist-info/RECORD,,
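
Each RECORD line above has the form `path,sha256=<digest>,<size>`, where the digest is the file's SHA-256 hash in URL-safe base64 with the trailing `=` padding removed. The following is a minimal sketch of how one entry could be checked. The helper names `record_digest` and `parse_record_line` are illustrative. The sample line is the zero-byte `judgeval/scorers/judgeval_scorers/__init__.py` entry from this RECORD, so its digest should be the hash of empty input.

```python
import base64
import csv
import hashlib
from typing import Optional, Tuple


def record_digest(data: bytes) -> str:
    """Encode a SHA-256 digest as RECORD does: URL-safe base64, no '=' padding."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def parse_record_line(line: str) -> Tuple[str, str, Optional[int]]:
    """Split one RECORD row into (path, digest, size).

    RECORD is a CSV file, so the csv module handles any quoted paths.
    The RECORD file's own row has empty hash and size fields.
    """
    path, hash_field, size = next(csv.reader([line]))
    _algorithm, _, digest = hash_field.partition("=")
    return path, digest, int(size) if size else None


line = (
    "judgeval/scorers/judgeval_scorers/__init__.py,"
    "sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0"
)
path, digest, size = parse_record_line(line)

# The file is recorded as 0 bytes, so hash the empty byte string.
print(path, size, record_digest(b"") == digest)
```

For a non-empty file, the same comparison would use the file's bytes as read from the unpacked wheel.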

judgeval/data/datasets/__init__.py DELETED

@@ -1,4 +0,0 @@
-from judgeval.data.datasets.dataset import EvalDataset
-from judgeval.data.datasets.eval_dataset_client import EvalDatasetClient
-__all__ = ["EvalDataset", "EvalDatasetClient"]

judgeval/data/datasets/dataset.py DELETED

@@ -1,341 +0,0 @@
-import ast
-import csv
-import datetime
-import json
-import os
-import yaml
-from dataclasses import dataclass, field
-from typing import List, Union, Literal, Optional
-from judgeval.data import Example, Trace
-from judgeval.common.logger import judgeval_logger
-from judgeval.utils.file_utils import get_examples_from_yaml
-@dataclass
-class EvalDataset:
-    examples: List[Example]
-    traces: List[Trace]
-    _alias: Union[str, None] = field(default=None)
-    _id: Union[str, None] = field(default=None)
-    judgment_api_key: str = field(default="")
-    organization_id: str = field(default="")
-    def __init__(
-        self,
-        judgment_api_key: str = os.getenv("JUDGMENT_API_KEY", ""),
-        organization_id: str = os.getenv("JUDGMENT_ORG_ID", ""),
-        examples: Optional[List[Example]] = None,
-        traces: Optional[List[Trace]] = None,
-    ):
-        if not judgment_api_key:
-            judgeval_logger.error("No judgment_api_key provided")
-        self.examples = examples or []
-        self.traces = traces or []
-        self._alias = None
-        self._id = None
-        self.judgment_api_key = judgment_api_key
-        self.organization_id = organization_id
-    def add_from_json(self, file_path: str) -> None:
-        """
-        Adds examples from a JSON file.
-        The format of the JSON file is expected to be a dictionary with one key: "examples".
-        The value of the key is a list of dictionaries, where each dictionary represents an example.
-        The JSON file is expected to have the following format:
-        {
-        "examples": [
-            {
-                "input": "test input",
-                "actual_output": "test output",
-                "expected_output": "expected output",
-                "context": [
-                    "context1",
-                    "context2"
-                ],
-                "retrieval_context": [
-                    "retrieval1"
-                ],
-                "additional_metadata": {
-                    "key": "value"
-                },
-                "tools_called": [
-                    "tool1"
-                ],
-                "expected_tools": [
-                    "tool1",
-                    "tool2"
-                ],
-                "name": "test example",
-                "example_id": null,
-                "timestamp": "20241230_160117",
-                "trace_id": "123"
-            }
-            ]
-        }
-        """
-        try:
-            with open(file_path, "r") as file:
-                payload = json.load(file)
-                examples = payload.get("examples", [])
-        except FileNotFoundError:
-            judgeval_logger.error(f"JSON file not found: {file_path}")
-            raise FileNotFoundError(f"The file {file_path} was not found.")
-        except json.JSONDecodeError:
-            judgeval_logger.error(f"Invalid JSON file: {file_path}")
-            raise ValueError(f"The file {file_path} is not a valid JSON file.")
-        new_examples = [Example(**e) for e in examples]
-        for e in new_examples:
-            self.add_example(e)
-    def add_from_csv(
-        self,
-        file_path: str,
-        header_mapping: dict,
-        primary_delimiter: str = ",",
-        secondary_delimiter: str = ";",
-    ) -> None:
-        """
-        Add Examples from a CSV file.
-        Args:
-            file_path (str): Path to the CSV file
-            header_mapping (dict): Dictionary mapping Example headers to custom headers
-            primary_delimiter (str, optional): Main delimiter used in CSV file. Defaults to ","
-            secondary_delimiter (str, optional): Secondary delimiter for list fields. Defaults to ";"
-        """
-        try:
-            import pandas as pd
-        except ModuleNotFoundError:
-            raise ModuleNotFoundError(
-                "Please install pandas to use this method. 'pip install pandas'"
-            )
-        # Pandas naturally reads numbers in data files as ints, not strings (can lead to unexpected behavior)
-        df = pd.read_csv(file_path, dtype={"trace_id": str}, sep=primary_delimiter)
-        """
-        The user should pass in a dict mapping from Judgment Example headers to their custom defined headers.
-        Available headers for Example objects are as follows:
-        "input", "actual_output", "expected_output", "context", \
-        "retrieval_context", "additional_metadata", "tools_called", \
-        "expected_tools", "name", "comments", "source_file", "example", \
-        "trace_id"
-        We want to collect the examples separately which can
-        be determined by the "example" column. If the value is True, then it is an
-        example, and we expect the `input` and `actual_output` fields to be non-null.
-        We also assume that if there are multiple retrieval contexts, contexts, or tools called, they are separated by semicolons.
-        This can be adjusted using the `secondary_delimiter` parameter.
-        """
-        examples = []
-        def process_csv_row(value, header):
-            """
-            Maps a singular value in the CSV file to the appropriate type based on the header.
-            If value exists and can be split into type List[*], we will split upon the user's provided secondary delimiter.
-            """
-            # check that the CSV value is not null for entry
-            null_replacement = dict() if header == "additional_metadata" else None
-            if pd.isna(value) or value == "":
-                return null_replacement
-            try:
-                value = (
-                    ast.literal_eval(value)
-                    if header == "additional_metadata"
-                    else str(value)
-                )
-            except (ValueError, SyntaxError):
-                value = str(value)
-            if header in [
-                "context",
-                "retrieval_context",
-                "tools_called",
-                "expected_tools",
-            ]:
-                # attempt to split the value by the secondary delimiter
-                value = value.split(secondary_delimiter)
-            return value
-        for _, row in df.iterrows():
-            data = {
-                header: process_csv_row(row[header_mapping[header]], header)
-                for header in header_mapping
-            }
-            if "example" in header_mapping and row[header_mapping["example"]]:
-                if "name" in header_mapping:
-                    data["name"] = (
-                        row[header_mapping["name"]]
-                        if pd.notna(row[header_mapping["name"]])
-                        else None
-                    )
-                # every Example has `input` and `actual_output` fields
-                if data["input"] is not None and data["actual_output"] is not None:
-                    e = Example(**data)
-                    examples.append(e)
-                else:
-                    raise ValueError(
-                        "Every example must have an 'input' and 'actual_output' field."
-                    )
-        for e in examples:
-            self.add_example(e)
-    def add_from_yaml(self, file_path: str) -> None:
-        """
-        Adds examples from a YAML file.
-        The format of the YAML file is expected to be a dictionary with one key: "examples".
-        The value of the key is a list of dictionaries, where each dictionary represents an example.
-        The YAML file is expected to have the following format:
-        examples:
-          - input: "test input"
-            actual_output: "test output"
-            expected_output: "expected output"
-            context:
-              - "context1"
-              - "context2"
-            retrieval_context:
-              - "retrieval1"
-            additional_metadata:
-              key: "value"
-            tools_called:
-              - "tool1"
-            expected_tools:
-              - "tool1"
-              - "tool2"
-            name: "test example"
-            example_id: null
-            timestamp: "20241230_160117"
-            trace_id: "123"
-        """
-        examples = get_examples_from_yaml(file_path)
-        for e in examples:
-            self.add_example(e)
-    def add_example(self, e: Example) -> None:
-        self.examples.append(e)
-        # TODO if we need to add rank, then we need to do it here
-    def add_trace(self, t: Trace) -> None:
-        self.traces.append(t)
-    def save_as(
-        self,
-        file_type: Literal["json", "csv", "yaml"],
-        dir_path: str,
-        save_name: str | None = None,
-    ) -> None:
-        """
-        Saves the dataset as a file. Save only the examples.
-        Args:
-            file_type (Literal["json", "csv"]): The file type to save the dataset as.
-            dir_path (str): The directory path to save the file to.
-            save_name (str, optional): The name of the file to save. Defaults to None.
-        """
-        if not os.path.exists(dir_path):
-            os.makedirs(dir_path)
-        file_name = (
-            datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
-            if save_name is None
-            else save_name
-        )
-        complete_path = os.path.join(dir_path, f"{file_name}.{file_type}")
-        if file_type == "json":
-            with open(complete_path, "w") as file:
-                json.dump(
-                    {
-                        "examples": [e.to_dict() for e in self.examples],
-                    },
-                    file,
-                    indent=4,
-                )
-        elif file_type == "csv":
-            with open(complete_path, "w", newline="") as file:
-                writer = csv.writer(file)
-                writer.writerow(
-                    [
-                        "input",
-                        "actual_output",
-                        "expected_output",
-                        "context",
-                        "retrieval_context",
-                        "additional_metadata",
-                        "tools_called",
-                        "expected_tools",
-                        "name",
-                        "comments",
-                        "source_file",
-                        "example",
-                        "trace_id",
-                    ]
-                )
-                for e in self.examples:
-                    writer.writerow(
-                        [
-                            e.input,
-                            e.actual_output,
-                            e.expected_output,
-                            ";".join(e.context),
-                            ";".join(e.retrieval_context),
-                            e.additional_metadata,
-                            ";".join(e.tools_called),
-                            ";".join(e.expected_tools),
-                            e.name,
-                            None,  # Example does not have comments
-                            None,  # Example does not have source file
-                            True,  # Adding an Example
-                        ]
-                    )
-        elif file_type == "yaml":
-            with open(complete_path, "w") as file:
-                yaml_data = {
-                    "examples": [
-                        {
-                            "input": e.input,
-                            "actual_output": e.actual_output,
-                            "expected_output": e.expected_output,
-                            "context": e.context,
-                            "retrieval_context": e.retrieval_context,
-                            "additional_metadata": e.additional_metadata,
-                            "tools_called": e.tools_called,
-                            "expected_tools": e.expected_tools,
-                            "name": e.name,
-                            "comments": None,  # Example does not have comments
-                            "source_file": None,  # Example does not have source file
-                            "example": True,  # Adding an Example
-                        }
-                        for e in self.examples
-                    ],
-                }
-                yaml.dump(yaml_data, file, default_flow_style=False)
-        else:
-            ACCEPTABLE_FILE_TYPES = ["json", "csv", "yaml"]
-            raise TypeError(
-                f"Invalid file type: {file_type}. Please choose from {ACCEPTABLE_FILE_TYPES}"
-            )
-    def __iter__(self):
-        return iter(self.examples)
-    def __len__(self):
-        return len(self.examples)
-    def __str__(self):
-        return (
-            f"{self.__class__.__name__}("
-            f"examples={self.examples}, "
-            f"traces={self.traces}, "
-            f"_alias={self._alias}, "
-            f"_id={self._id}"
-            f")"
-        )
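
The per-cell conversion rules of the removed `add_from_csv` helper (empty cells become `None` or `{}`, `additional_metadata` is parsed as a Python literal, list-valued columns are split on the secondary delimiter) can be restated as a small standalone function. This is an illustrative sketch that uses only the standard library in place of pandas. The names `parse_cell` and `LIST_HEADERS` are hypothetical, and this is not the code that replaces `EvalDataset` in 0.3.1, since `judgeval/dataset.py` is not shown in this section.

```python
import ast
import csv
import io
from typing import Any, Optional

# Columns whose values hold several items joined by the secondary delimiter.
LIST_HEADERS = {"context", "retrieval_context", "tools_called", "expected_tools"}


def parse_cell(value: Optional[str], header: str, secondary_delimiter: str = ";") -> Any:
    """Convert one CSV cell to the type implied by its Example header."""
    if value is None or value == "":
        # Missing metadata becomes an empty dict; every other field becomes None.
        return {} if header == "additional_metadata" else None
    if header == "additional_metadata":
        try:
            return ast.literal_eval(value)  # e.g. "{'key': 'value'}" -> dict
        except (ValueError, SyntaxError):
            return value  # fall back to the raw string
    if header in LIST_HEADERS:
        return value.split(secondary_delimiter)
    return value


# Hypothetical CSV content used only for this demonstration.
sample = io.StringIO(
    "input,actual_output,context,additional_metadata\n"
    "test input,test output,context1;context2,{'key': 'value'}\n"
)
rows = [
    {header: parse_cell(cell, header) for header, cell in row.items()}
    for row in csv.DictReader(sample)
]
print(rows[0])
```

Unlike the removed method, this sketch assumes the CSV column names already match the Example headers, so it has no `header_mapping` step.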
