PyPI - judgeval - Versions diffs - 0.0.51__tar.gz → 0.0.53__tar.gz - Mend

judgeval 0.0.51.tar.gz → 0.0.53.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (128)

judgeval-0.0.53/.github/ISSUE_TEMPLATE/bug_report.md ADDED

@@ -0,0 +1,41 @@
+---
+name: Bug report
+about: Create a report to help us improve Judgeval
+title: "[BUG]"
+labels: potential bug
+---
+## Describe the bug
+A clear and concise description of what the bug is.
+## To Reproduce
+Steps to reproduce the behavior:
+1. Go to '...'
+2. Click on '....'
+3. Scroll down to '....'
+4. See error
+## Expected behavior
+A clear and concise description of what you expected to happen.
+## Screenshots
+If applicable, add screenshots to help explain your problem.
+## Environment (please complete the following information):
+ - OS: [e.g. MacOS, Linux, Windows]
+ - Browser (if website issue): [e.g. Chrome, Safari, Firefox]
+ - Browser Version (if website issue): [e.g. 22]
+ - SDK Version: [e.g. 1.2.3]
+ - Programming Language/Runtime (if SDK issue): [e.g. Python 3.11, Python 3.12, etc.]
+ - Package Manager (if SDK issue): [e.g. uv, pip, pipenv]
+## Additional context
+Add any other context about the problem here.
+## Are you interested to contribute a fix for this bug?
+If this is a confirmed bug, the Judgment community is happy to support with guidance and review via [Discord](https://discord.com/invite/tGVFf8UBUY).
+- [ ] Yes
+- [ ] No

judgeval-0.0.53/.github/ISSUE_TEMPLATE/feature_request.md ADDED

@@ -0,0 +1,43 @@
+---
+name: Feature Request
+about: Suggest an idea for Judgeval
+title: "[FEATURE]"
+labels: feature-request
+---
+## Is your feature request related to a problem? Please describe.
+A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
+## Describe the solution you'd like
+A clear and concise description of what you want to happen.
+## Describe alternatives you've considered
+A clear and concise description of any alternative solutions or features you've considered.
+## Which component(s) does this affect?
+- [ ] SDK (open for community contributions)
+- [ ] Website (internal development only)
+- [ ] Documentation (open for community contributions)
+- [ ] Not sure
+## Use case and impact
+Describe your specific use case and how this feature would benefit you or other users. Include:
+- How often would you use this feature?
+- How many users might benefit from this?
+- Is this blocking your current implementation?
+## Proposed API/Interface (if applicable)
+If you have ideas about how this feature should be exposed (API methods, UI elements, etc.), please describe them here.
+## Additional context
+Add any other context, screenshots, code examples, or links to related issues/discussions about the feature request here.
+## Are you interested in contributing this feature?
+The Judgment community is happy to provide guidance and review for contributions via [Discord](https://discord.com/invite/tGVFf8UBUY).
+- [ ] Yes, I'd like to implement this
+- [ ] Yes, I'd like to help with design/planning
+- [ ] No, but I'd be happy to test it
+- [ ] No

judgeval-0.0.53/.github/pull_request_template.md ADDED

@@ -0,0 +1,23 @@
+## 📝 Summary
+<!-- Add your list of changes, make it a list to improve the PR reviewers' experience. Ie:
+- [ ] 1. Remove duplicate filter table
+- [ ] 2. Reenabled filtering on new ExperimentRunsTableClient component, reapplied filtering changes
+- [ ] 3. Added only search and filter when enter is pressed or apply filter is pressed
+- [ ] 4. Error message for applying incomplete filters
+- [ ] 5. Deletion should now work again for table
+- [ ] 6. Comparison should now work again for table
+-->
+- [ ] 1. ...
+## 🎥 Demo of Changes
+<!-- Add a short 1-3 minute video describing/demoing the changes -->
+## ✅ Checklist
+- [ ] Tagged Linear ticket in PR title. Ie. PR Title (JUD-XXXX)
+- [ ] Video demo of changes
+- [ ] Reviewers assigned
+- [ ] Docs updated ([if necessary](https://github.com/JudgmentLabs/docs))
+- [ ] Cookbooks updated ([if necessary](https://github.com/JudgmentLabs/judgment-cookbook))

{judgeval-0.0.51 → judgeval-0.0.53}/.gitignore RENAMED

@@ -110,4 +110,9 @@ test-results.xml
 # Logs
 ./logs
-demo
+demo
+# OpenAPI json file
+src/judgeval/data/openapi_new.json
+CLAUDE.md

{judgeval-0.0.51 → judgeval-0.0.53}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: judgeval
-Version: 0.0.51
+Version: 0.0.53
 Summary: Judgeval Package
 Project-URL: Homepage, https://github.com/JudgmentLabs/judgeval
 Project-URL: Issues, https://github.com/JudgmentLabs/judgeval/issues
@@ -12,6 +12,7 @@ Classifier: Programming Language :: Python :: 3
 Requires-Python: >=3.11
 Requires-Dist: anthropic
 Requires-Dist: boto3
+Requires-Dist: datamodel-code-generator>=0.31.1
 Requires-Dist: google-genai
 Requires-Dist: langchain-anthropic
 Requires-Dist: langchain-core
@@ -51,7 +52,7 @@ We're hiring! Join us in our mission to enable self-learning agents by providing
 </div>
-Judgeval offers **open-source tooling** for tracing, evaluating, and monitoring LLM agents. **Provides comprehensive data from agent-environment interactions** for continuous learning and self-improvement—**enabling the future of autonomous agents**.
+Judgeval offers **open-source tooling** for tracing and evaluating autonomous, stateful agents. It **provides runtime data from agent-environment interactions** for continuous learning and self-improvement.
 ## 🎬 See Judgeval in Action

{judgeval-0.0.51 → judgeval-0.0.53}/README.md RENAMED

@@ -22,7 +22,7 @@ We're hiring! Join us in our mission to enable self-learning agents by providing
 </div>
-Judgeval offers **open-source tooling** for tracing, evaluating, and monitoring LLM agents. **Provides comprehensive data from agent-environment interactions** for continuous learning and self-improvement—**enabling the future of autonomous agents**.
+Judgeval offers **open-source tooling** for tracing and evaluating autonomous, stateful agents. It **provides runtime data from agent-environment interactions** for continuous learning and self-improvement.
 ## 🎬 See Judgeval in Action

judgeval-0.0.53/assets/agent.gif ADDED

Binary file

judgeval-0.0.53/assets/data.gif ADDED

Binary file

judgeval-0.0.53/assets/document.gif ADDED

Binary file

judgeval-0.0.53/assets/trace.gif ADDED

Binary file

{judgeval-0.0.51 → judgeval-0.0.53}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "judgeval"
-version = "0.0.51"
+version = "0.0.53"
 authors = [
     { name="Andrew Li", email="andrew@judgmentlabs.ai" },
     { name="Alex Shan", email="alex@judgmentlabs.ai" },
@@ -31,6 +31,7 @@ dependencies = [
     "google-genai",
     "boto3",
     "matplotlib>=3.10.3",
+    "datamodel-code-generator>=0.31.1",
 ]
 [project.urls]

judgeval-0.0.53/src/judgeval/common/logger.py ADDED

@@ -0,0 +1,60 @@
+# logger.py
+import logging
+import sys
+import os
+# ANSI escape sequences
+RESET = "\033[0m"
+RED = "\033[31m"
+YELLOW = "\033[33m"
+BLUE = "\033[34m"
+GRAY = "\033[90m"
+class ColorFormatter(logging.Formatter):
+    """
+    Wrap the final formatted log record in ANSI color codes based on level.
+    """
+    COLORS = {
+        logging.DEBUG: GRAY,
+        logging.INFO: GRAY,
+        logging.WARNING: YELLOW,
+        logging.ERROR: RED,
+        logging.CRITICAL: RED,
+    }
+    def __init__(self, fmt=None, datefmt=None, use_color=True):
+        super().__init__(fmt=fmt, datefmt=datefmt)
+        self.use_color = use_color and sys.stdout.isatty()
+    def format(self, record):
+        message = super().format(record)
+        if self.use_color:
+            color = self.COLORS.get(record.levelno, "")
+            if color:
+                message = f"{color}{message}{RESET}"
+        return message
+def _setup_judgeval_logger():
+    use_color = sys.stdout.isatty() and os.getenv("NO_COLOR") is None
+    handler = logging.StreamHandler(sys.stdout)
+    handler.setLevel(logging.DEBUG)
+    handler.setFormatter(
+        ColorFormatter(
+            fmt="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
+            datefmt="%Y-%m-%d %H:%M:%S",
+            use_color=use_color,
+        )
+    )
+    logger = logging.getLogger("judgeval")
+    logger.setLevel(logging.DEBUG)
+    logger.addHandler(handler)
+    return logger
+# Global logger you can import elsewhere
+judgeval_logger = _setup_judgeval_logger()
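
The level-to-color wrapping in the added `logger.py` can be illustrated with a minimal standard-library sketch. This is not the packaged module: `SketchColorFormatter` and `make_record` are hypothetical names introduced here, the timestamp is left out of the format string, and the `sys.stdout.isatty()` and `NO_COLOR` checks are skipped so the result does not depend on the terminal.

```python
import logging

# ANSI escape sequences, same values as in the diff above
RESET = "\033[0m"
RED = "\033[31m"
YELLOW = "\033[33m"
GRAY = "\033[90m"


class SketchColorFormatter(logging.Formatter):
    """Illustrative stand-in: wraps the formatted record in a per-level color."""

    COLORS = {
        logging.DEBUG: GRAY,
        logging.INFO: GRAY,
        logging.WARNING: YELLOW,
        logging.ERROR: RED,
        logging.CRITICAL: RED,
    }

    def __init__(self, fmt=None, use_color=True):
        super().__init__(fmt=fmt)
        # Assumption for the sketch: no isatty() check, so behavior is deterministic
        self.use_color = use_color

    def format(self, record):
        message = super().format(record)
        color = self.COLORS.get(record.levelno, "") if self.use_color else ""
        return f"{color}{message}{RESET}" if color else message


def make_record(level, text):
    """Build a bare LogRecord for the hypothetical 'judgeval' logger name."""
    return logging.LogRecord(
        name="judgeval",
        level=level,
        pathname="sketch.py",
        lineno=0,
        msg=text,
        args=(),
        exc_info=None,
    )


FMT = "%(name)s - %(levelname)s - %(message)s"
colored = SketchColorFormatter(fmt=FMT, use_color=True).format(
    make_record(logging.WARNING, "bucket exists")
)
plain = SketchColorFormatter(fmt=FMT, use_color=False).format(
    make_record(logging.WARNING, "bucket exists")
)
```

With color enabled the whole formatted line, not just the level name, is wrapped in the escape sequence; with color disabled the line is returned unchanged.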

{judgeval-0.0.51 → judgeval-0.0.53}/src/judgeval/common/s3_storage.py RENAMED Viewed

@@ -4,7 +4,7 @@ import boto3
 from typing import Optional
 from datetime import datetime, UTC
 from botocore.exceptions import ClientError
-from judgeval.common.logger import warning, info
+from judgeval.common.logger import judgeval_logger
 class S3Storage:
@@ -42,7 +42,6 @@ class S3Storage:
             error_code = e.response["Error"]["Code"]
             if error_code == "404":
                 # Bucket doesn't exist, create it
-                info(f"Bucket {self.bucket_name} doesn't exist, creating it ...")
                 try:
                     self.s3_client.create_bucket(
                         Bucket=self.bucket_name,
@@ -52,14 +51,13 @@ class S3Storage:
                     ) if self.s3_client.meta.region_name != "us-east-1" else self.s3_client.create_bucket(
                         Bucket=self.bucket_name
                     )
-                    info(f"Created S3 bucket: {self.bucket_name}")
                 except ClientError as create_error:
                     if (
                         create_error.response["Error"]["Code"]
                         == "BucketAlreadyOwnedByYou"
                     ):
                         # Bucket was just created by another process
-                        warning(
+                        judgeval_logger.warning(
                             f"Bucket {self.bucket_name} was just created by another process"
                         )
                         pass
@@ -90,8 +88,6 @@ class S3Storage:
         # Convert trace data to JSON string
         trace_json = json.dumps(trace_data)
-        # Upload to S3
-        info(f"Uploading trace to S3 at key {s3_key}, in bucket {self.bucket_name} ...")
         self.s3_client.put_object(
             Bucket=self.bucket_name,
             Key=s3_key,
