judgeval 0.2.0__tar.gz → 0.3.1__tar.gz
This diff compares the contents of two publicly released versions of the package as they appear in their respective public registries. It is provided for informational purposes only.
- {judgeval-0.2.0 → judgeval-0.3.1}/.github/workflows/lint.yaml +0 -13
- judgeval-0.3.1/.github/workflows/mypy.yaml +25 -0
- judgeval-0.3.1/.github/workflows/pre-commit-autoupdate.yaml +38 -0
- judgeval-0.3.1/.pre-commit-config.yaml +23 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/PKG-INFO +10 -6
- {judgeval-0.2.0 → judgeval-0.3.1}/README.md +5 -5
- judgeval-0.3.1/assets/agent_trace_example.png +0 -0
- judgeval-0.3.1/assets/errors.png +0 -0
- judgeval-0.3.1/assets/online_eval.png +0 -0
- judgeval-0.3.1/assets/product_shot.png +0 -0
- judgeval-0.3.1/assets/test.png +0 -0
- judgeval-0.3.1/assets/tests.png +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/pyproject.toml +29 -1
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/api/api.py +38 -7
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/api/constants.py +9 -1
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/storage/s3_storage.py +2 -3
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/tracer/core.py +66 -32
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/tracer/otel_span_processor.py +4 -50
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/tracer/span_transformer.py +16 -10
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/utils.py +46 -38
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/constants.py +2 -0
- judgeval-0.3.1/src/judgeval/data/example.py +33 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/data/judgment_types.py +23 -45
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/data/result.py +8 -14
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/data/scripts/openapi_transform.py +5 -5
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/data/trace.py +3 -4
- judgeval-0.3.1/src/judgeval/dataset.py +192 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/evaluation_run.py +1 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/judges/litellm_judge.py +2 -2
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/judges/mixture_of_judges.py +6 -6
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/judges/together_judge.py +6 -3
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/judgment_client.py +9 -71
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/run_evaluation.py +41 -9
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/score.py +11 -7
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/utils.py +3 -3
- judgeval-0.3.1/src/judgeval/utils/file_utils.py +66 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/update_types.sh +1 -1
- {judgeval-0.2.0 → judgeval-0.3.1}/uv.lock +449 -0
- judgeval-0.2.0/.pre-commit-config.yaml +0 -21
- judgeval-0.2.0/assets/product_shot.png +0 -0
- judgeval-0.2.0/src/judgeval/data/datasets/__init__.py +0 -4
- judgeval-0.2.0/src/judgeval/data/datasets/dataset.py +0 -341
- judgeval-0.2.0/src/judgeval/data/datasets/eval_dataset_client.py +0 -214
- judgeval-0.2.0/src/judgeval/data/example.py +0 -61
- judgeval-0.2.0/src/judgeval/utils/file_utils.py +0 -51
- {judgeval-0.2.0 → judgeval-0.3.1}/.github/ISSUE_TEMPLATE/bug_report.md +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/.github/ISSUE_TEMPLATE/config.yml +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/.github/ISSUE_TEMPLATE/feature_request.md +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/.github/pull_request_template.md +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/.github/workflows/blocked-pr.yaml +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/.github/workflows/ci.yaml +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/.github/workflows/merge-branch-check.yaml +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/.github/workflows/release.yaml +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/.github/workflows/validate-branch.yaml +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/.gitignore +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/LICENSE.md +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/Screenshot 2025-05-17 at 8.14.27 PM.png +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/agent.gif +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/data.gif +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/dataset_clustering_screenshot.png +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/dataset_clustering_screenshot_dm.png +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/datasets_preview_screenshot.png +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/document.gif +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/error_analysis_dashboard.png +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/experiments_dashboard_screenshot.png +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/experiments_page.png +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/experiments_pagev2.png +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/logo-dark.svg +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/logo-light.svg +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/monitoring_screenshot.png +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/new_darkmode.svg +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/new_lightmode.svg +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/trace.gif +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/trace_demo.png +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/trace_screenshot.png +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/assets/trace_screenshot_old.png +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/pytest.ini +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/.coveragerc +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/__init__.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/clients.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/__init__.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/api/__init__.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/exceptions.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/logger.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/storage/__init__.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/tracer/__init__.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/tracer/constants.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/tracer/otel_exporter.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/tracer/span_processor.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/tracer/trace_manager.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/data/__init__.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/data/scorer_data.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/data/scripts/fix_default_factory.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/data/tool.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/data/trace_run.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/integrations/langgraph.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/judges/__init__.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/judges/base_judge.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/judges/utils.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/rules.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/__init__.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/agent_scorer.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/api_scorer.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/base_scorer.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/example_scorer.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/exceptions.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/judgeval_scorers/__init__.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/judgeval_scorers/api_scorers/__init__.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/judgeval_scorers/api_scorers/answer_correctness.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/judgeval_scorers/api_scorers/answer_relevancy.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/judgeval_scorers/api_scorers/derailment_scorer.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/judgeval_scorers/api_scorers/execution_order.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/judgeval_scorers/api_scorers/faithfulness.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/judgeval_scorers/api_scorers/hallucination.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/judgeval_scorers/api_scorers/instruction_adherence.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/judgeval_scorers/api_scorers/prompt_scorer.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/judgeval_scorers/api_scorers/tool_dependency.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/scorers/judgeval_scorers/api_scorers/tool_order.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/tracer/__init__.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/utils/alerts.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/utils/requests.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/version_check.py +0 -0
- {judgeval-0.2.0 → judgeval-0.3.1}/update_version.py +0 -0
{judgeval-0.2.0 → judgeval-0.3.1}/.github/workflows/lint.yaml
@@ -10,20 +10,11 @@ jobs:
     steps:
       - uses: actions/checkout@v4
 
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.11'
-
       - name: Install ruff
         uses: astral-sh/ruff-action@v3
         with:
          args: "--version"
 
-      - name: Install mypy and dependencies
-        run: |
-          pip install mypy types-requests types-PyYAML
-
       - name: Run ruff formatter
         if: always()
         run: ruff format --check .
@@ -31,7 +22,3 @@ jobs:
       - name: Run ruff linter
         if: always()
         run: ruff check .
-
-      - name: Run mypy
-        if: always()
-        run: mypy --explicit-package-bases --ignore-missing-imports .

judgeval-0.3.1/.github/workflows/mypy.yaml (new file)
@@ -0,0 +1,25 @@
+name: MyPy Check
+
+on:
+  pull_request:
+    branches: [ main, staging ]
+
+jobs:
+  mypy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Install dependencies
+        run: |
+          pip install uv
+          uv sync --dev
+
+      - name: Run mypy
+        if: always()
+        run: uv run mypy ./src/judgeval/

judgeval-0.3.1/.github/workflows/pre-commit-autoupdate.yaml (new file)
@@ -0,0 +1,38 @@
+name: Pre-commit auto-update
+on:
+  schedule:
+    - cron: '0 0 * * 1' # Weekly on Monday at midnight UTC
+  workflow_dispatch:
+
+jobs:
+  auto-update:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          ref: staging
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.11'
+
+      - name: Install and update pre-commit
+        run: |
+          pip install pre-commit
+          pre-commit autoupdate
+
+      - name: Create Pull Request
+        uses: peter-evans/create-pull-request@v7
+        with:
+          commit-message: 'chore: update pre-commit hooks'
+          title: 'chore: update pre-commit hooks'
+          body: |
+            Auto-generated PR to update pre-commit hook versions.
+
+            Please review the changes and merge if everything looks good.
+
+            Updated by GitHub Actions on {{ date }}.
+          branch: update-pre-commit-hooks
+          base: staging

judgeval-0.3.1/.pre-commit-config.yaml (new file)
@@ -0,0 +1,23 @@
+repos:
+  - repo: https://github.com/astral-sh/uv-pre-commit
+    rev: 0.8.0
+    hooks:
+      - id: uv-lock
+
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.12.4
+    hooks:
+      - id: ruff
+        name: ruff (linter)
+        args: [--fix]
+      - id: ruff-format
+        name: ruff (formatter)
+
+  # - repo: https://github.com/pre-commit/mirrors-mypy
+  #   rev: v1.17.0
+  #   hooks:
+  #     - id: mypy
+  #       language: system
+  #       # These next two lines allow commits even if mypy fails, REMOVE once we fix all mypy errors
+  #       verbose: true
+  #       entry: bash -c 'mypy src/judgeval/ || true'

{judgeval-0.2.0 → judgeval-0.3.1}/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: judgeval
-Version: 0.2.0
+Version: 0.3.1
 Summary: Judgeval Package
 Project-URL: Homepage, https://github.com/JudgmentLabs/judgeval
 Project-URL: Issues, https://github.com/JudgmentLabs/judgeval/issues
@@ -14,6 +14,7 @@ Requires-Dist: anthropic
 Requires-Dist: boto3
 Requires-Dist: datamodel-code-generator>=0.31.1
 Requires-Dist: google-genai
+Requires-Dist: groq>=0.30.0
 Requires-Dist: langchain-anthropic
 Requires-Dist: langchain-core
 Requires-Dist: langchain-huggingface
@@ -22,6 +23,9 @@ Requires-Dist: litellm>=1.61.15
 Requires-Dist: matplotlib>=3.10.3
 Requires-Dist: nest-asyncio
 Requires-Dist: openai
+Requires-Dist: opentelemetry-api>=1.34.1
+Requires-Dist: opentelemetry-sdk>=1.34.1
+Requires-Dist: orjson>=3.9.0
 Requires-Dist: pandas
 Requires-Dist: python-dotenv==1.0.1
 Requires-Dist: python-slugify>=8.0.4
@@ -39,7 +43,7 @@ Description-Content-Type: text/markdown
 Enable self-learning agents with traces, evals, and environment data.
 </div>
 
-## [Docs](https://docs.judgmentlabs.ai/) • [Judgment Cloud](https://app.judgmentlabs.ai/register) • [Self-Host](https://docs.judgmentlabs.ai/documentation/self-hosting/get-started)
+## [Docs](https://docs.judgmentlabs.ai/) • [Judgment Cloud](https://app.judgmentlabs.ai/register) • [Self-Host](https://docs.judgmentlabs.ai/documentation/self-hosting/get-started) • [Landing Page](https://judgmentlabs.ai/)
 
 [Demo](https://www.youtube.com/watch?v=1S4LixpVbcc) • [Bug Reports](https://github.com/JudgmentLabs/judgeval/issues) • [Changelog](https://docs.judgmentlabs.ai/changelog/2025-04-21)
 
@@ -139,7 +143,7 @@ run_agent("What is the capital of the United States?")
 ```
 You'll see your trace exported to the Judgment Platform:
 
-<p align="center"><img src="assets/
+<p align="center"><img src="assets/online_eval.png" alt="Judgment Platform Trace Example" width="1500" /></p>
 
 
 [Click here](https://docs.judgmentlabs.ai/documentation/tracing/introduction) for a more detailed explanation.
@@ -152,9 +156,9 @@ You'll see your trace exported to the Judgment Platform:
 
 | | |
 |:---|:---:|
-| <h3>🔍 Tracing</h3>Automatic agent tracing integrated with common frameworks (LangGraph, OpenAI, Anthropic). **Tracks inputs/outputs, agent tool calls, latency, cost, and custom metadata** at every step.<br><br>**Useful for:**<br>• 🐛 Debugging agent runs <br>• 📋 Collecting agent environment data <br>• 🔬 Pinpointing performance bottlenecks| <p align="center"><img src="assets/
-| <h3>🧪 Evals</h3>Build custom evaluators on top of your agents. Judgeval supports LLM-as-a-judge, manual labeling, and code-based evaluators that connect with our metric-tracking infrastructure. <br><br>**Useful for:**<br>• ⚠️ Unit-testing <br>• 🔬 A/B testing <br>• 🛡️ Online guardrails | <p align="center"><img src="assets/
-| <h3>📡 Monitoring</h3>Get Slack alerts for agent failures in production. Add custom hooks to address production regressions.<br><br> **Useful for:** <br>• 📉 Identifying degradation early <br>• 📈 Visualizing performance trends across agent versions and time | <p align="center"><img src="assets/
+| <h3>🔍 Tracing</h3>Automatic agent tracing integrated with common frameworks (LangGraph, OpenAI, Anthropic). **Tracks inputs/outputs, agent tool calls, latency, cost, and custom metadata** at every step.<br><br>**Useful for:**<br>• 🐛 Debugging agent runs <br>• 📋 Collecting agent environment data <br>• 🔬 Pinpointing performance bottlenecks| <p align="center"><img src="assets/agent_trace_example.png" alt="Tracing visualization" width="1200"/></p> |
+| <h3>🧪 Evals</h3>Build custom evaluators on top of your agents. Judgeval supports LLM-as-a-judge, manual labeling, and code-based evaluators that connect with our metric-tracking infrastructure. <br><br>**Useful for:**<br>• ⚠️ Unit-testing <br>• 🔬 A/B testing <br>• 🛡️ Online guardrails | <p align="center"><img src="assets/test.png" alt="Evaluation metrics" width="800"/></p> |
+| <h3>📡 Monitoring</h3>Get Slack alerts for agent failures in production. Add custom hooks to address production regressions.<br><br> **Useful for:** <br>• 📉 Identifying degradation early <br>• 📈 Visualizing performance trends across agent versions and time | <p align="center"><img src="assets/errors.png" alt="Monitoring Dashboard" width="1200"/></p> |
 | <h3>📊 Datasets</h3>Export traces and test cases to datasets for scaled analysis and optimization. Move datasets to/from Parquet, S3, etc. <br><br>Run evals on datasets as unit tests or to A/B test different agent configurations, enabling continuous learning from production interactions. <br><br> **Useful for:**<br>• 🗃️ Agent environment interaction data for optimization<br>• 🔄 Scaled analysis for A/B tests | <p align="center"><img src="assets/datasets_preview_screenshot.png" alt="Dataset management" width="1200"/></p> |
 
 ## 🏢 Self-Hosting

{judgeval-0.2.0 → judgeval-0.3.1}/README.md
@@ -8,7 +8,7 @@
 Enable self-learning agents with traces, evals, and environment data.
 </div>
 
-## [Docs](https://docs.judgmentlabs.ai/) • [Judgment Cloud](https://app.judgmentlabs.ai/register) • [Self-Host](https://docs.judgmentlabs.ai/documentation/self-hosting/get-started)
+## [Docs](https://docs.judgmentlabs.ai/) • [Judgment Cloud](https://app.judgmentlabs.ai/register) • [Self-Host](https://docs.judgmentlabs.ai/documentation/self-hosting/get-started) • [Landing Page](https://judgmentlabs.ai/)
 
 [Demo](https://www.youtube.com/watch?v=1S4LixpVbcc) • [Bug Reports](https://github.com/JudgmentLabs/judgeval/issues) • [Changelog](https://docs.judgmentlabs.ai/changelog/2025-04-21)
 
@@ -108,7 +108,7 @@ run_agent("What is the capital of the United States?")
 ```
 You'll see your trace exported to the Judgment Platform:
 
-<p align="center"><img src="assets/
+<p align="center"><img src="assets/online_eval.png" alt="Judgment Platform Trace Example" width="1500" /></p>
 
 
 [Click here](https://docs.judgmentlabs.ai/documentation/tracing/introduction) for a more detailed explanation.
@@ -121,9 +121,9 @@ You'll see your trace exported to the Judgment Platform:
 
 | | |
 |:---|:---:|
-| <h3>🔍 Tracing</h3>Automatic agent tracing integrated with common frameworks (LangGraph, OpenAI, Anthropic). **Tracks inputs/outputs, agent tool calls, latency, cost, and custom metadata** at every step.<br><br>**Useful for:**<br>• 🐛 Debugging agent runs <br>• 📋 Collecting agent environment data <br>• 🔬 Pinpointing performance bottlenecks| <p align="center"><img src="assets/
-| <h3>🧪 Evals</h3>Build custom evaluators on top of your agents. Judgeval supports LLM-as-a-judge, manual labeling, and code-based evaluators that connect with our metric-tracking infrastructure. <br><br>**Useful for:**<br>• ⚠️ Unit-testing <br>• 🔬 A/B testing <br>• 🛡️ Online guardrails | <p align="center"><img src="assets/
-| <h3>📡 Monitoring</h3>Get Slack alerts for agent failures in production. Add custom hooks to address production regressions.<br><br> **Useful for:** <br>• 📉 Identifying degradation early <br>• 📈 Visualizing performance trends across agent versions and time | <p align="center"><img src="assets/
+| <h3>🔍 Tracing</h3>Automatic agent tracing integrated with common frameworks (LangGraph, OpenAI, Anthropic). **Tracks inputs/outputs, agent tool calls, latency, cost, and custom metadata** at every step.<br><br>**Useful for:**<br>• 🐛 Debugging agent runs <br>• 📋 Collecting agent environment data <br>• 🔬 Pinpointing performance bottlenecks| <p align="center"><img src="assets/agent_trace_example.png" alt="Tracing visualization" width="1200"/></p> |
+| <h3>🧪 Evals</h3>Build custom evaluators on top of your agents. Judgeval supports LLM-as-a-judge, manual labeling, and code-based evaluators that connect with our metric-tracking infrastructure. <br><br>**Useful for:**<br>• ⚠️ Unit-testing <br>• 🔬 A/B testing <br>• 🛡️ Online guardrails | <p align="center"><img src="assets/test.png" alt="Evaluation metrics" width="800"/></p> |
+| <h3>📡 Monitoring</h3>Get Slack alerts for agent failures in production. Add custom hooks to address production regressions.<br><br> **Useful for:** <br>• 📉 Identifying degradation early <br>• 📈 Visualizing performance trends across agent versions and time | <p align="center"><img src="assets/errors.png" alt="Monitoring Dashboard" width="1200"/></p> |
 | <h3>📊 Datasets</h3>Export traces and test cases to datasets for scaled analysis and optimization. Move datasets to/from Parquet, S3, etc. <br><br>Run evals on datasets as unit tests or to A/B test different agent configurations, enabling continuous learning from production interactions. <br><br> **Useful for:**<br>• 🗃️ Agent environment interaction data for optimization<br>• 🔄 Scaled analysis for A/B tests | <p align="center"><img src="assets/datasets_preview_screenshot.png" alt="Dataset management" width="1200"/></p> |
 
 ## 🏢 Self-Hosting

Binary files (6 new image assets): no textual diff shown.

{judgeval-0.2.0 → judgeval-0.3.1}/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "judgeval"
-version = "0.2.0"
+version = "0.3.1"
 authors = [
     { name="Andrew Li", email="andrew@judgmentlabs.ai" },
     { name="Alex Shan", email="alex@judgmentlabs.ai" },
@@ -33,6 +33,10 @@ dependencies = [
     "matplotlib>=3.10.3",
     "python-slugify>=8.0.4",
     "datamodel-code-generator>=0.31.1",
+    "groq>=0.30.0",
+    "opentelemetry-api>=1.34.1",
+    "opentelemetry-sdk>=1.34.1",
+    "orjson>=3.9.0",
 ]
 
 [project.urls]
@@ -62,6 +66,30 @@ dev = [
     "langgraph>=0.4.3",
     "pre-commit>=4.2.0",
     "types-requests>=2.32.4.20250611",
+    "mypy>=1.17.0",
+    "types-pyyaml>=6.0.12.20250516",
+    "pandas-stubs>=2.3.0.250703",
+    "lxml-stubs>=0.5.1",
+    "types-pygments>=2.19.0.20250715",
+    "types-beautifulsoup4>=4.12.0.20250516",
+    "types-cachetools>=6.1.0.20250717",
+    "types-cffi>=1.17.0.20250523",
+    "types-defusedxml>=0.7.0.20250708",
+    "types-greenlet>=3.2.0.20250417",
+    "types-jsonschema>=4.24.0.20250708",
+    "types-objgraph>=3.6.0.20240907",
+    "types-pexpect>=4.9.0.20250516",
+    "types-protobuf>=6.30.2.20250703",
+    "types-psutil>=7.0.0.20250601",
+    "types-pyopenssl>=24.1.0.20240722",
+    "types-pyasn1>=0.6.0.20250516",
+    "types-regex>=2024.11.6.20250403",
+    "types-reportlab>=4.4.1.20250602",
+    "types-simplejson>=3.20.0.20250326",
+    "types-tensorflow>=2.18.0.20250516",
+    "types-tqdm>=4.67.0.20250516",
+    "types-tree-sitter-languages>=1.10.0.20250530",
+    "types-xmltodict>=0.14.0.20241009",
 ]
 
 [tool.hatch.build]

{judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/api/api.py
@@ -1,4 +1,4 @@
-from typing import Literal, List, Dict, Any
+from typing import Literal, List, Dict, Any, Union
 from requests import exceptions
 from judgeval.common.api.constants import (
     JUDGMENT_TRACES_FETCH_API_URL,
@@ -25,6 +25,8 @@ from judgeval.common.api.constants import (
     JUDGMENT_SCORER_SAVE_API_URL,
     JUDGMENT_SCORER_FETCH_API_URL,
     JUDGMENT_SCORER_EXISTS_API_URL,
+    JUDGMENT_DATASETS_APPEND_TRACES_API_URL,
+    JUDGMENT_CHECK_EXAMPLE_KEYS_API_URL,
 )
 from judgeval.common.api.constants import (
     TraceFetchPayload,
@@ -48,9 +50,12 @@ from judgeval.common.api.constants import (
     ScorerSavePayload,
     ScorerFetchPayload,
     ScorerExistsPayload,
+    CheckExampleKeysPayload,
 )
 from judgeval.utils.requests import requests
 
+import orjson
+
 
 class JudgmentAPIException(exceptions.HTTPError):
     """
@@ -65,7 +70,7 @@ class JudgmentAPIException(exceptions.HTTPError):
         self.request = request
 
     @property
-    def status_code(self) -> int
+    def status_code(self) -> Union[int, None]:
         """Get the HTTP status code from the response."""
         return self.response.status_code if self.response else None
 
@@ -114,8 +119,15 @@ class JudgmentApiClient:
         try:
             r.raise_for_status()
         except exceptions.HTTPError as e:
+            try:
+                detail = r.json().get("detail", "")
+            except Exception:
+                detail = r.text
+
             raise JudgmentAPIException(
-                f"HTTP {r.status_code}: {r.reason}
+                f"HTTP {r.status_code}: {r.reason}, {detail}",
+                response=r,
+                request=e.request,
             )
 
         return r.json()
@@ -218,6 +230,14 @@ class JudgmentApiClient:
         }
         return self._do_request("POST", JUDGMENT_EVAL_RUN_NAME_EXISTS_API_URL, payload)
 
+    def check_example_keys(self, keys: List[str], eval_name: str, project_name: str):
+        payload: CheckExampleKeysPayload = {
+            "keys": keys,
+            "eval_name": eval_name,
+            "project_name": project_name,
+        }
+        return self._do_request("POST", JUDGMENT_CHECK_EXAMPLE_KEYS_API_URL, payload)
+
     def save_scorer(self, name: str, prompt: str, options: dict):
         payload: ScorerSavePayload = {
             "name": name,
@@ -279,7 +299,7 @@ class JudgmentApiClient:
         project_name: str,
         examples: List[Dict[str, Any]],
         traces: List[Dict[str, Any]],
-        overwrite: bool,
+        overwrite: bool = False,
     ):
         payload: DatasetPushPayload = {
             "dataset_alias": dataset_alias,
@@ -302,6 +322,18 @@ class JudgmentApiClient:
             "POST", JUDGMENT_DATASETS_APPEND_EXAMPLES_API_URL, payload
         )
 
+    def append_traces(
+        self, dataset_alias: str, project_name: str, traces: List[Dict[str, Any]]
+    ):
+        payload: DatasetAppendPayload = {
+            "dataset_alias": dataset_alias,
+            "project_name": project_name,
+            "traces": traces,
+        }
+        return self._do_request(
+            "POST", JUDGMENT_DATASETS_APPEND_TRACES_API_URL, payload
+        )
+
     def pull_dataset(self, dataset_alias: str, project_name: str):
         payload: DatasetPullPayload = {
             "dataset_alias": dataset_alias,
@@ -347,6 +379,5 @@ class JudgmentApiClient:
         except Exception as e:
             return f"<Unserializable object of type {type(obj).__name__}: {e}>"
 
-
-
-        return json.dumps(data, default=fallback_encoder)
+        # orjson returns bytes, so we need to decode to str
+        return orjson.dumps(data, default=fallback_encoder).decode("utf-8")

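The `_do_request` change above attaches the server's `detail` message to raised errors instead of reporting only the HTTP status line. A minimal sketch of the same pattern with plain `requests` (the helper name and endpoint are illustrative, not part of judgeval):

```python
import requests


def post_with_detail(url: str, payload: dict) -> dict:
    """Illustrative helper: include the server's `detail` field in raised errors."""
    r = requests.post(url, json=payload)
    try:
        r.raise_for_status()
    except requests.exceptions.HTTPError as e:
        # Prefer the structured "detail" field when the error body is JSON;
        # fall back to the raw response text otherwise.
        try:
            detail = r.json().get("detail", "")
        except Exception:
            detail = r.text
        raise requests.exceptions.HTTPError(
            f"HTTP {r.status_code}: {r.reason}, {detail}",
            response=r,
            request=e.request,
        ) from e
    return r.json()
```
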
{judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/api/constants.py
@@ -51,6 +51,7 @@ JUDGMENT_ADD_TO_RUN_EVAL_QUEUE_API_URL = f"{ROOT_API}/add_to_run_eval_queue/"
 JUDGMENT_GET_EVAL_STATUS_API_URL = f"{ROOT_API}/get_evaluation_status/"
 JUDGMENT_CHECK_EXPERIMENT_TYPE_API_URL = f"{ROOT_API}/check_experiment_type/"
 JUDGMENT_EVAL_RUN_NAME_EXISTS_API_URL = f"{ROOT_API}/eval-run-name-exists/"
+JUDGMENT_CHECK_EXAMPLE_KEYS_API_URL = f"{ROOT_API}/check_example_keys/"
 
 
 # Evaluation API Payloads
@@ -90,9 +91,16 @@ class EvalRunNameExistsPayload(TypedDict):
     judgment_api_key: str
 
 
+class CheckExampleKeysPayload(TypedDict):
+    keys: List[str]
+    eval_name: str
+    project_name: str
+
+
 # Datasets API
 JUDGMENT_DATASETS_PUSH_API_URL = f"{ROOT_API}/datasets/push/"
 JUDGMENT_DATASETS_APPEND_EXAMPLES_API_URL = f"{ROOT_API}/datasets/insert_examples/"
+JUDGMENT_DATASETS_APPEND_TRACES_API_URL = f"{ROOT_API}/traces/add_to_dataset/"
 JUDGMENT_DATASETS_PULL_API_URL = f"{ROOT_API}/datasets/pull_for_judgeval/"
 JUDGMENT_DATASETS_DELETE_API_URL = f"{ROOT_API}/datasets/delete/"
 JUDGMENT_DATASETS_EXPORT_JSONL_API_URL = f"{ROOT_API}/datasets/export_jsonl/"
@@ -134,7 +142,7 @@ class DatasetStatsPayload(TypedDict):
 
 
 # Projects API
-JUDGMENT_PROJECT_DELETE_API_URL = f"{ROOT_API}/projects/
+JUDGMENT_PROJECT_DELETE_API_URL = f"{ROOT_API}/projects/delete_from_judgeval/"
 JUDGMENT_PROJECT_CREATE_API_URL = f"{ROOT_API}/projects/add/"
 
 

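The new `CheckExampleKeysPayload` TypedDict backs the `check_example_keys` client method added in api.py. A small, hypothetical illustration of a payload that satisfies it (the field names come from the diff; the values are invented):

```python
from typing import List, TypedDict


class CheckExampleKeysPayload(TypedDict):
    keys: List[str]
    eval_name: str
    project_name: str


# Example payload as it would be POSTed to JUDGMENT_CHECK_EXAMPLE_KEYS_API_URL.
payload: CheckExampleKeysPayload = {
    "keys": ["input", "actual_output", "expected_output"],
    "eval_name": "qa-regression-run",
    "project_name": "my-project",
}
```
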
{judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/storage/s3_storage.py
@@ -1,6 +1,6 @@
 import os
-import json
 import boto3
+import orjson
 from typing import Optional
 from datetime import datetime, UTC
 from botocore.exceptions import ClientError
@@ -85,8 +85,7 @@ class S3Storage:
         timestamp = datetime.now(UTC).strftime("%Y%m%d_%H%M%S")
         s3_key = f"traces/{project_name}/{trace_id}_{timestamp}.json"
 
-
-        trace_json = json.dumps(trace_data)
+        trace_json = orjson.dumps(trace_data).decode("utf-8")
 
         self.s3_client.put_object(
             Bucket=self.bucket_name,

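Both this file and api.py replace `json.dumps` with `orjson.dumps`. Unlike the standard library, `orjson.dumps` returns `bytes`, hence the `.decode("utf-8")` wherever a `str` is needed (S3 object bodies, request payloads). A short sketch of the pattern, with a stand-in fallback encoder:

```python
import orjson


def fallback_encoder(obj):
    # Last-resort representation for objects orjson cannot serialize natively.
    return f"<Unserializable object of type {type(obj).__name__}>"


trace_data = {"trace_id": "abc123", "duration": 1.25, "tags": {"env", "prod"}}

# orjson returns bytes; decode when a str is required (e.g. an S3 Body).
trace_json: str = orjson.dumps(trace_data, default=fallback_encoder).decode("utf-8")
print(trace_json)
```
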
{judgeval-0.2.0 → judgeval-0.3.1}/src/judgeval/common/tracer/core.py
@@ -32,6 +32,7 @@ from typing import (
 )
 import types
 
+
 from judgeval.common.tracer.constants import _TRACE_FILEPATH_BLOCKLIST
 
 from judgeval.common.tracer.otel_span_processor import JudgmentSpanProcessor
@@ -45,6 +46,7 @@ from openai.types.chat import ParsedChatCompletion
 from together import Together, AsyncTogether
 from anthropic import Anthropic, AsyncAnthropic
 from google import genai
+from groq import Groq, AsyncGroq
 
 from judgeval.data import Example, Trace, TraceSpan, TraceUsage
 from judgeval.scorers import APIScorerConfig, BaseScorer
@@ -67,6 +69,8 @@ ApiClient: TypeAlias = Union[
     AsyncTogether,
     genai.Client,
     genai.client.AsyncClient,
+    Groq,
+    AsyncGroq,
 ]
 SpanType: TypeAlias = str
 
@@ -79,7 +83,7 @@ class TraceClient:
         tracer: Tracer,
         trace_id: Optional[str] = None,
         name: str = "default",
-        project_name: str
+        project_name: Union[str, None] = None,
         enable_monitoring: bool = True,
         enable_evaluations: bool = True,
         parent_trace_id: Optional[str] = None,
@@ -414,8 +418,6 @@
                 self.start_time or time.time(), timezone.utc
             ).isoformat(),
             "duration": total_duration,
-            "trace_spans": [span.model_dump() for span in self.trace_spans],
-            "evaluation_runs": [run.model_dump() for run in self.evaluation_runs],
             "offline_mode": self.tracer.offline_mode,
             "parent_trace_id": self.parent_trace_id,
             "parent_name": self.parent_name,
@@ -850,9 +852,9 @@ class Tracer:
 
     def __init__(
         self,
-        api_key: str
-        organization_id: str
-        project_name: str
+        api_key: Union[str, None] = os.getenv("JUDGMENT_API_KEY"),
+        organization_id: Union[str, None] = os.getenv("JUDGMENT_ORG_ID"),
+        project_name: Union[str, None] = None,
         deep_tracing: bool = False, # Deep tracing is disabled by default
         enable_monitoring: bool = os.getenv("JUDGMENT_MONITORING", "true").lower()
         == "true",
@@ -905,8 +907,8 @@ class Tracer:
         self.class_identifiers: Dict[
             str, str
         ] = {}  # Dictionary to store class identifiers
-        self.span_id_to_previous_span_id: Dict[str, str
-        self.trace_id_to_previous_trace: Dict[str, TraceClient
+        self.span_id_to_previous_span_id: Dict[str, Union[str, None]] = {}
+        self.trace_id_to_previous_trace: Dict[str, Union[TraceClient, None]] = {}
         self.current_span_id: Optional[str] = None
         self.current_trace: Optional[TraceClient] = None
         self.trace_across_async_contexts: bool = trace_across_async_contexts
@@ -958,7 +960,9 @@ class Tracer:
             self.enable_monitoring = False
             self.enable_evaluations = False
 
-    def set_current_span(
+    def set_current_span(
+        self, span_id: str
+    ) -> Optional[contextvars.Token[Union[str, None]]]:
         self.span_id_to_previous_span_id[span_id] = self.current_span_id
         self.current_span_id = span_id
         Tracer.current_span_id = span_id
@@ -981,7 +985,7 @@ class Tracer:
 
     def reset_current_span(
         self,
-        token: Optional[contextvars.Token[str
+        token: Optional[contextvars.Token[Union[str, None]]] = None,
         span_id: Optional[str] = None,
     ):
         try:
@@ -997,7 +1001,7 @@ class Tracer:
 
     def set_current_trace(
         self, trace: TraceClient
-    ) -> Optional[contextvars.Token[TraceClient
+    ) -> Optional[contextvars.Token[Union[TraceClient, None]]]:
         """
         Set the current trace context in contextvars
         """
@@ -1030,7 +1034,7 @@ class Tracer:
 
     def reset_current_trace(
         self,
-        token: Optional[contextvars.Token[TraceClient
+        token: Optional[contextvars.Token[Union[TraceClient, None]]] = None,
         trace_id: Optional[str] = None,
     ):
         try:
@@ -1046,7 +1050,7 @@ class Tracer:
 
     @contextmanager
     def trace(
-        self, name: str, project_name: str
+        self, name: str, project_name: Union[str, None] = None
    ) -> Generator[TraceClient, None, None]:
        """Start a new trace context using a context manager"""
        trace_id = str(uuid.uuid4())
@@ -1692,25 +1696,31 @@ def wrap(
         return wrapper
 
     if isinstance(client, (OpenAI)):
-        client.chat.completions
-        client.responses
-        client.beta.chat.completions
+        setattr(client.chat.completions, "create", wrapped(original_create))
+        setattr(client.responses, "create", wrapped(original_responses_create))
+        setattr(client.beta.chat.completions, "parse", wrapped(original_beta_parse))
     elif isinstance(client, (AsyncOpenAI)):
-        client.chat.completions
-        client.responses
-
+        setattr(client.chat.completions, "create", wrapped_async(original_create))
+        setattr(client.responses, "create", wrapped_async(original_responses_create))
+        setattr(
+            client.beta.chat.completions, "parse", wrapped_async(original_beta_parse)
+        )
     elif isinstance(client, (Together)):
-        client.chat.completions
+        setattr(client.chat.completions, "create", wrapped(original_create))
     elif isinstance(client, (AsyncTogether)):
-        client.chat.completions
+        setattr(client.chat.completions, "create", wrapped_async(original_create))
     elif isinstance(client, (Anthropic)):
-        client.messages
+        setattr(client.messages, "create", wrapped(original_create))
     elif isinstance(client, (AsyncAnthropic)):
-        client.messages
+        setattr(client.messages, "create", wrapped_async(original_create))
     elif isinstance(client, (genai.Client)):
-        client.models
+        setattr(client.models, "generate_content", wrapped(original_create))
     elif isinstance(client, (genai.client.AsyncClient)):
-        client.models
+        setattr(client.models, "generate_content", wrapped_async(original_create))
+    elif isinstance(client, (Groq)):
+        setattr(client.chat.completions, "create", wrapped(original_create))
+    elif isinstance(client, (AsyncGroq)):
+        setattr(client.chat.completions, "create", wrapped_async(original_create))
 
     return client
 
@@ -1745,6 +1755,8 @@ def _get_client_config(
             None,
             client.beta.chat.completions.parse,
         )
+    elif isinstance(client, (Groq, AsyncGroq)):
+        return "GROQ_API_CALL", client.chat.completions.create, None, None, None
     elif isinstance(client, (Together, AsyncTogether)):
         return "TOGETHER_API_CALL", client.chat.completions.create, None, None, None
     elif isinstance(client, (Anthropic, AsyncAnthropic)):
@@ -1783,9 +1795,17 @@ def _format_output_data(
     if isinstance(client, (OpenAI, AsyncOpenAI)):
         if isinstance(response, ChatCompletion):
             model_name = response.model
-            prompt_tokens = response.usage.prompt_tokens
-            completion_tokens =
-
+            prompt_tokens = response.usage.prompt_tokens if response.usage else 0
+            completion_tokens = (
+                response.usage.completion_tokens if response.usage else 0
+            )
+            cache_read_input_tokens = (
+                response.usage.prompt_tokens_details.cached_tokens
+                if response.usage
+                and response.usage.prompt_tokens_details
+                and response.usage.prompt_tokens_details.cached_tokens
+                else 0
+            )
 
             if isinstance(response, ParsedChatCompletion):
                 message_content = response.choices[0].message.parsed
@@ -1793,10 +1813,19 @@ def _format_output_data(
                 message_content = response.choices[0].message.content
         elif isinstance(response, Response):
             model_name = response.model
-            prompt_tokens = response.usage.input_tokens
-            completion_tokens = response.usage.output_tokens
-            cache_read_input_tokens =
-
+            prompt_tokens = response.usage.input_tokens if response.usage else 0
+            completion_tokens = response.usage.output_tokens if response.usage else 0
+            cache_read_input_tokens = (
+                response.usage.input_tokens_details.cached_tokens
+                if response.usage and response.usage.input_tokens_details
+                else 0
+            )
+            if hasattr(response.output[0], "content"):
+                message_content = "".join(
+                    seg.text
+                    for seg in response.output[0].content
+                    if hasattr(seg, "text")
+                )
 
     # Note: LiteLLM seems to use cache_read_input_tokens to calculate the cost for OpenAI
     elif isinstance(client, (Together, AsyncTogether)):
@@ -1821,6 +1850,11 @@ def _format_output_data(
         cache_read_input_tokens = response.usage.cache_read_input_tokens
         cache_creation_input_tokens = response.usage.cache_creation_input_tokens
         message_content = response.content[0].text
+    elif isinstance(client, (Groq, AsyncGroq)):
+        model_name = "groq/" + response.model
+        prompt_tokens = response.usage.prompt_tokens
+        completion_tokens = response.usage.completion_tokens
+        message_content = response.choices[0].message.content
     else:
         judgeval_logger.warning(f"Unsupported client type: {type(client)}")
         return None, None

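The tracer changes add Groq clients to the types `wrap()` can patch and let `Tracer` credentials default to the `JUDGMENT_API_KEY` / `JUDGMENT_ORG_ID` environment variables. A minimal usage sketch, assuming `Tracer` and `wrap` are importable from `judgeval.common.tracer` (where this file lives) and that the Judgment and Groq keys are set; the model name is illustrative:

```python
import os

from groq import Groq
from judgeval.common.tracer import Tracer, wrap

# Credentials fall back to JUDGMENT_API_KEY / JUDGMENT_ORG_ID when omitted.
tracer = Tracer(project_name="groq-demo")

# wrap() patches client.chat.completions.create so calls made inside an
# active trace are recorded as spans.
client = wrap(Groq(api_key=os.environ["GROQ_API_KEY"]))

with tracer.trace("capital-question"):
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # illustrative model name
        messages=[{"role": "user", "content": "What is the capital of the United States?"}],
    )

print(response.choices[0].message.content)
```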