judgeval 0.0.26__tar.gz → 0.0.28__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (169)
  1. {judgeval-0.0.26 → judgeval-0.0.28}/PKG-INFO +1 -1
  2. judgeval-0.0.28/docs/alerts/notifications.mdx +191 -0
  3. judgeval-0.0.28/docs/alerts/platform_notifications.mdx +74 -0
  4. judgeval-0.0.28/docs/alerts/rules.mdx +111 -0
  5. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/custom_scorers.mdx +29 -3
  6. judgeval-0.0.28/docs/images/notifications_page.png +0 -0
  7. judgeval-0.0.28/docs/images/reports_modal.png +0 -0
  8. {judgeval-0.0.26 → judgeval-0.0.28}/docs/mint.json +8 -0
  9. {judgeval-0.0.26 → judgeval-0.0.28}/pyproject.toml +1 -1
  10. judgeval-0.0.28/src/demo/cookbooks/JNPR_Mist/test.py +21 -0
  11. judgeval-0.0.28/src/demo/cookbooks/linkd/text2sql.py +14 -0
  12. judgeval-0.0.28/src/demo/custom_example_demo/osiris_test.py +22 -0
  13. judgeval-0.0.28/src/demo/custom_example_demo/qodo_scorer.py +78 -0
  14. judgeval-0.0.28/src/demo/new_trace/example_complex_async.py +232 -0
  15. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/common/tracer.py +515 -193
  16. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/constants.py +4 -2
  17. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/data/__init__.py +0 -3
  18. judgeval-0.0.26/src/judgeval/data/api_example.py → judgeval-0.0.28/src/judgeval/data/custom_api_example.py +12 -19
  19. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/data/datasets/eval_dataset_client.py +59 -20
  20. judgeval-0.0.28/src/judgeval/data/result.py +76 -0
  21. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/evaluation_run.py +1 -0
  22. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/judgment_client.py +47 -15
  23. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/run_evaluation.py +20 -36
  24. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/score.py +9 -11
  25. judgeval-0.0.28/src/test.py +21 -0
  26. judgeval-0.0.26/src/judgeval/data/result.py +0 -98
  27. {judgeval-0.0.26 → judgeval-0.0.28}/.github/workflows/ci.yaml +0 -0
  28. {judgeval-0.0.26 → judgeval-0.0.28}/.gitignore +0 -0
  29. {judgeval-0.0.26 → judgeval-0.0.28}/LICENSE.md +0 -0
  30. {judgeval-0.0.26 → judgeval-0.0.28}/Pipfile +0 -0
  31. {judgeval-0.0.26 → judgeval-0.0.28}/Pipfile.lock +0 -0
  32. {judgeval-0.0.26 → judgeval-0.0.28}/README.md +0 -0
  33. {judgeval-0.0.26 → judgeval-0.0.28}/docs/README.md +0 -0
  34. {judgeval-0.0.26 → judgeval-0.0.28}/docs/api_reference/judgment_client.mdx +0 -0
  35. {judgeval-0.0.26 → judgeval-0.0.28}/docs/api_reference/trace.mdx +0 -0
  36. {judgeval-0.0.26 → judgeval-0.0.28}/docs/development.mdx +0 -0
  37. {judgeval-0.0.26 → judgeval-0.0.28}/docs/essentials/code.mdx +0 -0
  38. {judgeval-0.0.26 → judgeval-0.0.28}/docs/essentials/images.mdx +0 -0
  39. {judgeval-0.0.26 → judgeval-0.0.28}/docs/essentials/markdown.mdx +0 -0
  40. {judgeval-0.0.26 → judgeval-0.0.28}/docs/essentials/navigation.mdx +0 -0
  41. {judgeval-0.0.26 → judgeval-0.0.28}/docs/essentials/reusable-snippets.mdx +0 -0
  42. {judgeval-0.0.26 → judgeval-0.0.28}/docs/essentials/settings.mdx +0 -0
  43. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/data_datasets.mdx +0 -0
  44. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/data_examples.mdx +0 -0
  45. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/introduction.mdx +0 -0
  46. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/judges.mdx +0 -0
  47. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/answer_correctness.mdx +0 -0
  48. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/answer_relevancy.mdx +0 -0
  49. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/classifier_scorer.mdx +0 -0
  50. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/comparison.mdx +0 -0
  51. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/contextual_precision.mdx +0 -0
  52. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/contextual_recall.mdx +0 -0
  53. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/contextual_relevancy.mdx +0 -0
  54. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/execution_order.mdx +0 -0
  55. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/faithfulness.mdx +0 -0
  56. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/groundedness.mdx +0 -0
  57. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/hallucination.mdx +0 -0
  58. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/introduction.mdx +0 -0
  59. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/json_correctness.mdx +0 -0
  60. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/summarization.mdx +0 -0
  61. {judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/unit_testing.mdx +0 -0
  62. {judgeval-0.0.26 → judgeval-0.0.28}/docs/favicon.svg +0 -0
  63. {judgeval-0.0.26 → judgeval-0.0.28}/docs/getting_started.mdx +0 -0
  64. {judgeval-0.0.26 → judgeval-0.0.28}/docs/images/basic_trace_example.png +0 -0
  65. {judgeval-0.0.26 → judgeval-0.0.28}/docs/images/checks-passed.png +0 -0
  66. {judgeval-0.0.26 → judgeval-0.0.28}/docs/images/create_aggressive_scorer.png +0 -0
  67. {judgeval-0.0.26 → judgeval-0.0.28}/docs/images/create_scorer.png +0 -0
  68. {judgeval-0.0.26 → judgeval-0.0.28}/docs/images/evaluation_diagram.png +0 -0
  69. {judgeval-0.0.26 → judgeval-0.0.28}/docs/images/hero-dark.svg +0 -0
  70. {judgeval-0.0.26 → judgeval-0.0.28}/docs/images/hero-light.svg +0 -0
  71. {judgeval-0.0.26 → judgeval-0.0.28}/docs/images/online_eval_fault.png +0 -0
  72. {judgeval-0.0.26 → judgeval-0.0.28}/docs/images/trace_ss.png +0 -0
  73. {judgeval-0.0.26 → judgeval-0.0.28}/docs/integration/langgraph.mdx +0 -0
  74. {judgeval-0.0.26 → judgeval-0.0.28}/docs/introduction.mdx +0 -0
  75. {judgeval-0.0.26 → judgeval-0.0.28}/docs/judgment/introduction.mdx +0 -0
  76. {judgeval-0.0.26 → judgeval-0.0.28}/docs/logo/dark.svg +0 -0
  77. {judgeval-0.0.26 → judgeval-0.0.28}/docs/logo/light.svg +0 -0
  78. {judgeval-0.0.26 → judgeval-0.0.28}/docs/monitoring/introduction.mdx +0 -0
  79. {judgeval-0.0.26 → judgeval-0.0.28}/docs/monitoring/production_insights.mdx +0 -0
  80. {judgeval-0.0.26 → judgeval-0.0.28}/docs/monitoring/tracing.mdx +0 -0
  81. {judgeval-0.0.26 → judgeval-0.0.28}/docs/notebooks/create_dataset.ipynb +0 -0
  82. {judgeval-0.0.26 → judgeval-0.0.28}/docs/notebooks/create_scorer.ipynb +0 -0
  83. {judgeval-0.0.26 → judgeval-0.0.28}/docs/notebooks/demo.ipynb +0 -0
  84. {judgeval-0.0.26 → judgeval-0.0.28}/docs/notebooks/prompt_scorer.ipynb +0 -0
  85. {judgeval-0.0.26 → judgeval-0.0.28}/docs/notebooks/quickstart.ipynb +0 -0
  86. {judgeval-0.0.26 → judgeval-0.0.28}/docs/quickstart.mdx +0 -0
  87. {judgeval-0.0.26 → judgeval-0.0.28}/docs/snippets/snippet-intro.mdx +0 -0
  88. {judgeval-0.0.26 → judgeval-0.0.28}/pytest.ini +0 -0
  89. {judgeval-0.0.26 → judgeval-0.0.28}/src/demo/demo.py +0 -0
  90. {judgeval-0.0.26 → judgeval-0.0.28}/src/demo/travel_agent.py +0 -0
  91. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/__init__.py +0 -0
  92. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/clients.py +0 -0
  93. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/common/__init__.py +0 -0
  94. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/common/exceptions.py +0 -0
  95. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/common/logger.py +0 -0
  96. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/common/utils.py +0 -0
  97. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/data/datasets/__init__.py +0 -0
  98. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/data/datasets/dataset.py +0 -0
  99. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/data/example.py +0 -0
  100. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/data/scorer_data.py +0 -0
  101. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/integrations/langgraph.py +0 -0
  102. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/judges/__init__.py +0 -0
  103. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/judges/base_judge.py +0 -0
  104. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/judges/litellm_judge.py +0 -0
  105. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/judges/mixture_of_judges.py +0 -0
  106. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/judges/together_judge.py +0 -0
  107. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/judges/utils.py +0 -0
  108. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/rules.py +0 -0
  109. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/__init__.py +0 -0
  110. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/api_scorer.py +0 -0
  111. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/base_scorer.py +0 -0
  112. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/exceptions.py +0 -0
  113. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorer.py +0 -0
  114. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/__init__.py +0 -0
  115. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/api_scorers/__init__.py +0 -0
  116. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/api_scorers/answer_correctness.py +0 -0
  117. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/api_scorers/answer_relevancy.py +0 -0
  118. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/api_scorers/comparison.py +0 -0
  119. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/api_scorers/contextual_precision.py +0 -0
  120. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/api_scorers/contextual_recall.py +0 -0
  121. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/api_scorers/contextual_relevancy.py +0 -0
  122. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/api_scorers/execution_order.py +0 -0
  123. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/api_scorers/faithfulness.py +0 -0
  124. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/api_scorers/groundedness.py +0 -0
  125. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/api_scorers/hallucination.py +0 -0
  126. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/api_scorers/instruction_adherence.py +0 -0
  127. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/api_scorers/json_correctness.py +0 -0
  128. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/api_scorers/summarization.py +0 -0
  129. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/classifiers/__init__.py +0 -0
  130. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/classifiers/text2sql/__init__.py +0 -0
  131. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/classifiers/text2sql/text2sql_scorer.py +0 -0
  132. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/__init__.py +0 -0
  133. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/answer_correctness/__init__.py +0 -0
  134. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/answer_correctness/answer_correctness_scorer.py +0 -0
  135. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/answer_correctness/prompts.py +0 -0
  136. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/answer_relevancy/__init__.py +0 -0
  137. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/answer_relevancy/answer_relevancy_scorer.py +0 -0
  138. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/answer_relevancy/prompts.py +0 -0
  139. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/comparison/__init__.py +0 -0
  140. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/comparison/comparison_scorer.py +0 -0
  141. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/comparison/prompts.py +0 -0
  142. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/contextual_precision/__init__.py +0 -0
  143. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/contextual_precision/contextual_precision_scorer.py +0 -0
  144. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/contextual_precision/prompts.py +0 -0
  145. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/contextual_recall/__init__.py +0 -0
  146. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/contextual_recall/contextual_recall_scorer.py +0 -0
  147. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/contextual_recall/prompts.py +0 -0
  148. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/contextual_relevancy/__init__.py +0 -0
  149. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/contextual_relevancy/contextual_relevancy_scorer.py +0 -0
  150. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/contextual_relevancy/prompts.py +0 -0
  151. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/execution_order/__init__.py +0 -0
  152. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/execution_order/execution_order.py +0 -0
  153. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/faithfulness/__init__.py +0 -0
  154. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/faithfulness/faithfulness_scorer.py +0 -0
  155. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/faithfulness/prompts.py +0 -0
  156. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/hallucination/__init__.py +0 -0
  157. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/hallucination/hallucination_scorer.py +0 -0
  158. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/hallucination/prompts.py +0 -0
  159. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/instruction_adherence/instruction_adherence.py +0 -0
  160. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/instruction_adherence/prompt.py +0 -0
  161. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/json_correctness/__init__.py +0 -0
  162. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/json_correctness/json_correctness_scorer.py +0 -0
  163. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/summarization/__init__.py +0 -0
  164. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/summarization/prompts.py +0 -0
  165. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/judgeval_scorers/local_implementations/summarization/summarization_scorer.py +0 -0
  166. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/prompt_scorer.py +0 -0
  167. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/scorers/utils.py +0 -0
  168. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/tracer/__init__.py +0 -0
  169. {judgeval-0.0.26 → judgeval-0.0.28}/src/judgeval/utils/alerts.py +0 -0
{judgeval-0.0.26 → judgeval-0.0.28}/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: judgeval
- Version: 0.0.26
+ Version: 0.0.28
  Summary: Judgeval Package
  Project-URL: Homepage, https://github.com/JudgmentLabs/judgeval
  Project-URL: Issues, https://github.com/JudgmentLabs/judgeval/issues
judgeval-0.0.28/docs/alerts/notifications.mdx
@@ -0,0 +1,191 @@
+ ---
+ title: 'Notifications'
+ description: 'Get alerted when your rules trigger through multiple communication channels'
+ ---
+
+ # Notifications
+
+ Notifications allow you to receive alerts through various communication channels when your [rules](/alerts/rules) are triggered. This feature helps you stay informed about potential issues with your AI system's performance in real time.
+
+ ## Overview
+
+ The notification system works with [rules](/alerts/rules) to:
+
+ 1. Monitor your evaluation metrics
+ 2. Check if they meet your defined [conditions](/alerts/rules#conditions)
+ 3. Send alerts through your preferred channels when conditions are met
+
+ Notifications can be configured globally or per rule, allowing you to customize how you're alerted based on the specific rule that was triggered.
+
+ <Warning>
+ Rules and notifications only work with built-in APIScorers. Local scorers and custom scorers are not supported for triggering notifications.
+ </Warning>
+
+ ## Notification Configuration
+
+ Notifications are configured using the `NotificationConfig` class from the `judgeval.rules` module.
+
+ ### Configuration Options
+
+ | Parameter | Type | Description |
+ |-----------|------|-------------|
+ | `enabled` | boolean | Whether notifications are enabled (default: `True`) |
+ | `communication_methods` | list of strings | The methods to use for sending notifications (e.g., `["email", "slack"]`) |
+ | `email_addresses` | list of strings | Email addresses to send notifications to |
+ | `send_at` | integer (Unix timestamp) | Schedule notifications for a specific time ([learn more](#scheduled-notifications)) |
+
+ <Note>
+ For aggregated reports and periodic summaries of multiple alerts, use the [Scheduled Reports feature](/alerts/platform_notifications#scheduled-reports-recaps) in the Judgment Platform.
+ </Note>
+
+ ### Basic Configuration
+
+ ```python
+ from judgeval.rules import NotificationConfig
+
+ # Create a notification configuration
+ notification_config = NotificationConfig(
+     enabled=True,
+     communication_methods=["slack", "email"],
+     email_addresses=["user@example.com"],
+     send_at=None  # Send immediately
+ )
+ ```
+
+ ## Communication Methods
+
+ Judgeval supports multiple communication methods for notifications:
+
+ - `"email"`: Send emails to specified email addresses
+ - `"slack"`: Send messages to configured Slack channels
+
+ You can configure multiple methods to be used simultaneously.
+
+ ## Slack Integration
+
+ For detailed information on integrating Slack with Judgment notifications, see the [Platform Notification Center documentation](/alerts/platform_notifications#slack-integration).
+
+ ## Attaching Notifications to Rules
+
+ Notifications can be attached to [rules](/alerts/rules) during rule creation or configured later.
+
+ ### During Rule Creation
+
+ ```python
+ from judgeval.rules import Rule, Condition, NotificationConfig
+ from judgeval.scorers import FaithfulnessScorer
+
+ # Create notification config
+ notification_config = NotificationConfig(
+     enabled=True,
+     communication_methods=["slack", "email"],
+     email_addresses=["user@example.com"]
+ )
+
+ # Create rule with notification config
+ rule = Rule(
+     name="Faithfulness Check",
+     description="Check if faithfulness meets threshold",
+     conditions=[
+         # Note: only built-in APIScorers are supported
+         Condition(metric=FaithfulnessScorer(threshold=0.7))
+     ],
+     combine_type="all",  # Trigger when all conditions fail (see Combine Types in the Rules documentation)
+     notification=notification_config
+ )
+ ```
+
+ ## Scheduled Notifications
+
+ You can schedule one-time notifications to be sent at a specific time using the `send_at` parameter:
+
+ ```python
+ from judgeval.rules import NotificationConfig
+ import time
+
+ # Schedule notification for 1 hour from now
+ one_hour_from_now = int(time.time()) + 3600
+
+ notification_config = NotificationConfig(
+     enabled=True,
+     communication_methods=["email"],
+     email_addresses=["user@example.com"],
+     send_at=one_hour_from_now
+ )
+ ```
+
+ The `send_at` parameter accepts a Unix timestamp (integer) that specifies when the notification should be sent. This is useful for delaying notifications or grouping them to be sent at a specific time of day.
+
+ <Warning>
+ The `send_at` parameter only delays the delivery of a single notification. It doesn't create recurring notifications or group multiple alerts together. Each time a rule is triggered, a separate notification is generated.
+ </Warning>
+
+ ## Notification Types in the Platform
+
+ The Judgment Platform offers two main types of notifications:
+
+ 1. **Evaluation Alerts** - Real-time notifications sent when specific rules are triggered. When using the API, these can be scheduled for a specific time using the `send_at` parameter.
+
+ 2. **Custom Alert Recaps** - Periodic summaries (daily, weekly, monthly) of evaluation metrics and alerts. These are configured in the [Platform Notification Center](/alerts/platform_notifications).
+
+ ### Setting Up Custom Alert Recaps
+
+ To set up periodic notification summaries:
+
+ 1. Navigate to the Notifications page in your Judgment account settings
+ 2. Under "Custom Alert Recaps," click the "+" button to create a new report
+ 3. Configure your preferred frequency (Daily, Weekly, Monthly) and delivery time
+ 4. Add recipient email addresses
+
+ For more details, see the [Scheduled Reports](/alerts/platform_notifications#scheduled-reports-recaps) documentation.
+
+ ## Judgment Platform Features
+
+ For information about configuring notifications in the Judgment web platform, including email alerts, scheduled reports, and Slack integration, see the [Platform Notification Center](/alerts/platform_notifications) documentation.
+
+ ## Practical Example
+
+ Here's a complete example showing how to set up rules with notifications and integrate them with the Tracer:
+
+ ```python
+ import os
+ from judgeval.common.tracer import Tracer, wrap
+ from judgeval.scorers import FaithfulnessScorer, AnswerRelevancyScorer
+ from judgeval.rules import Rule, Condition, NotificationConfig
+ from openai import OpenAI
+
+ # Create notification config
+ notification_config = NotificationConfig(
+     enabled=True,
+     communication_methods=["slack", "email"],
+     email_addresses=["alerts@example.com"],
+     send_at=None  # Send immediately
+ )
+
+ # Create rules with notification config
+ rules = [
+     Rule(
+         name="Quality Check",
+         description="Check if all quality metrics meet thresholds",
+         conditions=[
+             # Only built-in APIScorers can be used as metrics
+             Condition(metric=FaithfulnessScorer(threshold=0.7)),
+             Condition(metric=AnswerRelevancyScorer(threshold=0.8))
+         ],
+         combine_type="all",  # Trigger when all conditions fail
+         notification=notification_config
+     )
+ ]
+
+ # Initialize tracer with rules for notifications
+ judgment = Tracer(
+     api_key=os.getenv("JUDGMENT_API_KEY"),
+     project_name="my_project",
+     rules=rules
+ )
+
+ # Wrap OpenAI client for tracing
+ client = wrap(OpenAI())
+
+ # Now any evaluations that trigger the rules will send notifications
+ ```
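The `send_at` snippets in this new page use a relative offset from `time.time()`. If you instead want alerts batched to land at a fixed local time, as the page suggests, a small timestamp helper works. The sketch below is illustrative: the `tomorrow_at` helper is not part of judgeval; only `NotificationConfig` and its fields come from the documentation above.

```python
from datetime import datetime, timedelta

from judgeval.rules import NotificationConfig

def tomorrow_at(hour: int = 9) -> int:
    # Unix timestamp for `hour`:00 tomorrow, local time (illustrative helper).
    target = (datetime.now() + timedelta(days=1)).replace(
        hour=hour, minute=0, second=0, microsecond=0
    )
    return int(target.timestamp())

# Deliver the alert at 9:00 AM tomorrow instead of immediately.
notification_config = NotificationConfig(
    enabled=True,
    communication_methods=["email"],
    email_addresses=["user@example.com"],
    send_at=tomorrow_at(9),
)
```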
judgeval-0.0.28/docs/alerts/platform_notifications.mdx
@@ -0,0 +1,74 @@
+ ---
+ title: 'Platform Notification Center'
+ description: 'Configure and manage notifications through the Judgment web interface'
+ ---
+
+ # Platform Notification Center
+
+ The Judgment Platform provides a comprehensive notification system through its web interface, allowing you to configure email notifications, scheduled reports, and app integrations like Slack.
+
+ <Frame>
+     <img src="/images/notifications_page.png" alt="Notifications Settings Page" />
+ </Frame>
+
+ ## Slack Integration
+
+ Judgment allows you to receive notifications directly in your Slack workspace.
+
+ ### Connecting Slack
+
+ 1. Navigate to the Notifications page in your Judgment account settings
+ 2. In the "App Integrations" section, find the Slack card
+ 3. Click the "Connect" button
+ 4. You'll be redirected to Slack's authorization page
+ 5. Select the workspace you want to connect and authorize the Judgment application
+ 6. Once connected, you'll be redirected back to Judgment
+
+ ### Slack Notification Features
+
+ After connecting Slack, you can:
+
+ - Receive real-time alerts when evaluation rules are triggered
+ - Get notifications about model performance issues
+ - Track Judgment activity in your Slack channels
+
+ ### Managing Slack Notifications
+
+ Once connected, you can:
+
+ - Disconnect your Slack workspace at any time
+ - Add specific channels for different types of notifications
+ - Configure which notifications are sent to Slack
+
+ ## Email Notifications
+
+ On the Notifications settings page, you can configure:
+
+ 1. **Evaluation Alerts** - Receive real-time email notifications whenever an evaluation alert is triggered
+ 2. **Custom Alert Recaps** - Receive periodic email summaries of evaluations, traces, and metric scores
+
+ ## Scheduled Reports (Recaps)
+
+ You can create custom scheduled reports to receive regular updates on your agent's performance.
+
+ ### Creating a Report
+
+ 1. Navigate to the Notifications page in your Judgment account settings
+ 2. Under "Custom Alert Recaps," click the "+" button to create a new report
+ 3. Configure your report with the following options:
+
+ <Frame>
+     <img src="/images/reports_modal.png" alt="Scheduled Reports Modal" />
+ </Frame>
+
+ | Setting | Description |
+ |---------|-------------|
+ | Report Name | A descriptive name for your report (e.g., "Daily Alert Summary") |
+ | Recipient Emails | Email addresses that will receive the report |
+ | Frequency | How often the report should be sent (Daily, Weekly, Monthly) |
+ | Select Days | For weekly reports, specify which days of the week |
+ | Time | When the report should be sent |
+ | Timezone | Your local timezone for accurate scheduling |
+ | Compare to Previous Period | Enable to see performance changes over time |
+
+ Your reports will be sent automatically based on your schedule settings, providing insights into your model's performance over time.
judgeval-0.0.28/docs/alerts/rules.mdx
@@ -0,0 +1,111 @@
+ ---
+ title: 'Rules'
+ description: 'Define custom triggers and conditions for your evaluation metrics'
+ ---
+
+ # Rules
+
+ Rules allow you to define specific conditions for your evaluation metrics that can trigger alerts and [notifications](/alerts/notifications) when met. They serve as the foundation for the alerting system and help you monitor your AI system's performance against predetermined thresholds.
+
+ ## Overview
+
+ A rule consists of one or more [conditions](#conditions), each tied to a specific metric supported by our scorers (like Faithfulness or AnswerRelevancy). When evaluations are performed, the rules engine checks whether the measured scores satisfy the conditions set in your rules. Based on the rule's configuration, alerts can be triggered and notifications sent through various channels.
+
+ <Note>
+ Rules and notifications only work with built-in APIScorers. Local scorers and custom scorers are not supported for triggering rules.
+ </Note>
+
+ ## Creating Rules
+
+ Rules can be created using the `Rule` class from the `judgeval.rules` module. Each rule requires:
+
+ - A name
+ - A list of [conditions](#conditions)
+ - A [combine type](#combine-types) (how conditions should be evaluated together)
+
+ Optional parameters include:
+
+ - A description
+ - [Notification configuration](/alerts/notifications#notification-configuration)
+
+ ### Basic Rule Structure
+
+ ```python
+ from judgeval.rules import Rule, Condition
+ from judgeval.scorers import FaithfulnessScorer, AnswerRelevancyScorer
+
+ # Create a rule
+ rule = Rule(
+     name="Quality Check",
+     description="Check if quality metrics meet thresholds",
+     conditions=[
+         Condition(metric=FaithfulnessScorer(threshold=0.7)),
+         Condition(metric=AnswerRelevancyScorer(threshold=0.8))
+     ],
+     combine_type="all"  # "all" = AND, "any" = OR
+ )
+ ```
+
+ ## Conditions
+
+ Conditions are the building blocks of rules. Each condition specifies a metric, which must be a built-in APIScorer like FaithfulnessScorer or AnswerRelevancyScorer. The condition is met when the score for that metric is greater than or equal to the threshold specified in the scorer.
+
+ ### Creating Conditions
+
+ ```python
+ from judgeval.rules import Condition
+ from judgeval.scorers import FaithfulnessScorer
+
+ # Create a condition that passes when the faithfulness score is at least 0.7
+ condition = Condition(
+     metric=FaithfulnessScorer(threshold=0.7)
+ )
+ ```
+
+ ### How Conditions are Evaluated
+
+ When a condition is evaluated, it uses the scorer's threshold and internal evaluation logic:
+
+ 1. By default, a condition passes when the actual score is greater than or equal to the threshold
+ 2. If the scorer has a custom `success_check()` method, that method is used instead
+ 3. The threshold is retrieved from the scorer's `threshold` attribute
+
+ ## Combine Types
+
+ Rules support two combine types that determine how multiple conditions are evaluated:
+
+ - `"all"`: The rule triggers when all conditions fail (logical AND)
+ - `"any"`: The rule triggers when any condition fails (logical OR)
+
+ This design is meant for setting up alerts that trigger when your metrics indicate a problem with your AI system's performance.
+
+ ## Using Rules with the Tracer
+
+ Rules are most commonly used with the `Tracer` to monitor your AI system's performance:
+
+ ```python
+ from judgeval.common.tracer import Tracer
+ from judgeval.rules import Rule, Condition
+ from judgeval.scorers import FaithfulnessScorer, AnswerRelevancyScorer
+
+ # Create rules
+ rules = [
+     Rule(
+         name="Quality Check",
+         description="Check if quality metrics meet thresholds",
+         conditions=[
+             Condition(metric=FaithfulnessScorer(threshold=0.7)),
+             Condition(metric=AnswerRelevancyScorer(threshold=0.8))
+         ],
+         combine_type="all"  # Trigger when all conditions fail
+     )
+ ]
+
+ # Initialize tracer with rules
+ judgment = Tracer(
+     api_key="your_api_key",
+     project_name="your_project",
+     rules=rules
+ )
+ ```
+
+ For more information on configuring notifications with rules, see the [Notifications documentation](/alerts/notifications#attaching-notifications-to-rules).
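Taken together, the condition and combine-type semantics this new rules.mdx page documents amount to a small decision procedure. The sketch below is one illustrative reading of the documented behavior, not judgeval's actual implementation; in particular, keying scores by the scorer's `score_type` is an assumption.

```python
def condition_passes(scorer, score: float) -> bool:
    # Documented behavior: prefer a custom success_check() if the scorer
    # defines one, otherwise pass when score >= threshold.
    check = getattr(scorer, "success_check", None)
    if callable(check):
        return check()
    return score >= scorer.threshold

def rule_triggers(conditions, scores, combine_type: str) -> bool:
    # A rule alerts on failures: "all" fires when every condition fails,
    # "any" fires when at least one condition fails.
    failures = [
        not condition_passes(cond.metric, scores[cond.metric.score_type])  # score_type keying assumed
        for cond in conditions
    ]
    return all(failures) if combine_type == "all" else any(failures)
```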
{judgeval-0.0.26 → judgeval-0.0.28}/docs/evaluation/scorers/custom_scorers.mdx
@@ -4,6 +4,7 @@ description: ""
  ---

  If none of `judgeval`'s built-in scorers fit your evaluation criteria, you can easily build your own custom metric to be run through a `JudgevalScorer`.
+
  `JudgevalScorer`s are **automatically integrated** within `judgeval`'s infrastructure, so you can:
  - Run your own scorer with the same syntax as any other `judgeval` scorer.
  - Use `judgeval`'s batched evaluation infrastructure to execute **scalable evaluation runs**.
@@ -78,7 +79,6 @@ You can optionally set the self.reason attribute, depending on your preference.
  </Note>

  These methods are the core of your scorer, and you can implement them in any way you want. **Be creative!**
- Check out this list of examples our users have implemented if you need inspiration: TODO add link here

  #### Handling Errors
  If you want to handle errors gracefully, you can use a `try` block and, in the `except` block, set the `self.error` attribute to the error message.
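A minimal sketch of that error-handling pattern, under the assumption that the scorer records its outcome on `self.score`/`self.success` as in the surrounding examples; `compute_metric` is a hypothetical helper standing in for your metric logic:

```python
def score_example(self, example):
    try:
        self.score = self.compute_metric(example)  # hypothetical metric logic
        self.success = self.score >= self.threshold
    except Exception as e:
        # Record the failure on the scorer instead of raising, so the
        # evaluation run can report it gracefully.
        self.error = str(e)
        self.success = False
```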
@@ -144,11 +144,37 @@ class SampleScorer(JudgevalScorer):
      def __name__(self):
          return "Sample Scorer"
  ```
-
  **Congratulations!** 🎉

  You've made your first custom judgeval scorer! Now that your scorer is implemented, you can run it on your own datasets
  just like any other `judgeval` scorer. Your scorer is fully integrated with `judgeval`'s infrastructure, so you can view it on
  the [Judgment platform](/judgment/introduction) too.

- For more examples, check out some of the custom scorers our users have implemented: TODO add link here.
+ ## Using a Custom Scorer
+
+ Once you've implemented your custom scorer, you can use it in the same way as any other scorer in `judgeval`.
+ Custom scorers can be run in conjunction with other scorers in a single evaluation run!
+
+ ```python run_custom_scorer.py
+ from judgeval import JudgmentClient
+ from your_custom_scorer import SampleScorer
+
+ client = JudgmentClient()
+ sample_scorer = SampleScorer()
+
+ results = client.run_evaluation(
+     examples=[example1],
+     scorers=[sample_scorer],
+     model="gpt-4o"
+ )
+ ```
+
+ ## Real World Examples
+
+ You can find real-world examples of how our community has used custom `JudgevalScorer`s to evaluate their LLM systems in our [cookbook repository](https://github.com/JudgmentLabs/judgment-cookbook/tree/main/cookbooks/custom_scorers)!
+ Here are some of our favorites:
+
+ - [Code Style Scorer](https://github.com/JudgmentLabs/judgment-cookbook/blob/main/cookbooks/custom_scorers/code_style_scorer.py) - Evaluates code quality and style
+ - [Cold Email Scorer](https://github.com/JudgmentLabs/judgment-cookbook/blob/main/cookbooks/custom_scorers/cold_email_scorer.py) - Evaluates the effectiveness of cold emails
+
+ For more examples and detailed documentation on custom scorers, check out our [Custom Scorers Cookbook](https://github.com/JudgmentLabs/judgment-cookbook/blob/main/cookbooks/custom_scorers/README.md).
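The page above notes that custom scorers can run alongside other scorers in a single evaluation run, though its snippet shows the custom scorer on its own. A mixed run might look like the following sketch; `your_custom_scorer` is the same hypothetical module used above, and the `Example` fields mirror the demo script later in this diff:

```python
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import FaithfulnessScorer
from your_custom_scorer import SampleScorer  # hypothetical module, as above

client = JudgmentClient()

example1 = Example(
    input="What if these shoes don't fit?",
    actual_output="We offer a 30-day full refund at no extra cost.",
    retrieval_context=["All customers are eligible for a 30 day full refund at no extra cost."],
)

# Custom and built-in scorers can share one evaluation run.
results = client.run_evaluation(
    examples=[example1],
    scorers=[SampleScorer(), FaithfulnessScorer(threshold=0.5)],
    model="gpt-4o",
)
```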
{judgeval-0.0.26 → judgeval-0.0.28}/docs/mint.json
@@ -89,6 +89,14 @@
          "integration/langgraph"
        ]
      },
+     {
+       "group": "Alerts",
+       "pages": [
+         "alerts/rules",
+         "alerts/notifications",
+         "alerts/platform_notifications"
+       ]
+     },
      {
        "group": "Judgment Platform",
        "pages": [
{judgeval-0.0.26 → judgeval-0.0.28}/pyproject.toml
@@ -1,6 +1,6 @@
  [project]
  name = "judgeval"
- version = "0.0.26"
+ version = "0.0.28"
  authors = [
      { name="Andrew Li", email="andrew@judgmentlabs.ai" },
      { name="Alex Shan", email="alex@judgmentlabs.ai" },
judgeval-0.0.28/src/demo/cookbooks/JNPR_Mist/test.py
@@ -0,0 +1,21 @@
+ from judgeval import JudgmentClient
+ from judgeval.data import Example
+ from judgeval.scorers import FaithfulnessScorer
+
+ client = JudgmentClient()
+
+ example = Example(
+     input="What if these shoes don't fit?",
+     actual_output="We offer a 30-day full refund at no extra cost.",
+     retrieval_context=["All customers are eligible for a 30 day full refund at no extra cost."],
+ )
+
+ scorer = FaithfulnessScorer(threshold=0.5)
+ results = client.run_evaluation(
+     examples=[example],
+     scorers=[scorer],
+     model="gpt-4o",
+     eval_run_name="TestRun",
+     project_name="TestProject",
+ )
+ print(results)
judgeval-0.0.28/src/demo/cookbooks/linkd/text2sql.py
@@ -0,0 +1,14 @@
+ """
+ ClassifierScorer implementation for basic Text-to-SQL evaluation.
+
+ Takes a natural language query, a corresponding LLM-generated SQL query, and a table schema + (optional) metadata.
+ Determines if the LLM-generated SQL query is valid and works for the natural language query.
+ """
+ from judgeval.scorers import ClassifierScorer
+ from judgeval import JudgmentClient
+ from judgeval.scorers.judgeval_scorers.classifiers.text2sql.text2sql_scorer import Text2SQLScorer
+
+ judgment_client = JudgmentClient()
+
+ # Push the classifier scorer to the platform, then fetch it back by slug.
+ print(judgment_client.push_classifier_scorer(Text2SQLScorer, slug="text2sql-eric-linkd"))
+ print(judgment_client.fetch_classifier_scorer("text2sql-eric-linkd"))
judgeval-0.0.28/src/demo/custom_example_demo/osiris_test.py
@@ -0,0 +1,22 @@
+ from judgeval.data import CustomExample
+ from judgeval import JudgmentClient
+ from qodo_scorer import QodoScorer
+
+ judgment = JudgmentClient()
+
+ # CustomExample carries the fields this custom scorer expects.
+ custom_example = CustomExample(
+     code="print('Hello, world!')",
+     original_code="print('Hello, world!')",
+ )
+
+ qodo_scorer = QodoScorer()
+ results = judgment.run_evaluation(
+     examples=[custom_example],
+     scorers=[qodo_scorer],
+     model="gpt-4o",
+     project_name="QoDoDemo",
+     eval_run_name="QoDoDemoRun1",
+ )
+
+ print(f"{results=}")
judgeval-0.0.28/src/demo/custom_example_demo/qodo_scorer.py
@@ -0,0 +1,78 @@
+ from judgeval.data import Example
+ from judgeval.common.tracer import Tracer, wrap
+ from judgeval.scorers import JudgevalScorer, AnswerCorrectnessScorer
+ from judgeval import JudgmentClient
+ from openai import OpenAI, AsyncOpenAI
+ import os
+
+ client = OpenAI()
+ async_client = AsyncOpenAI()
+
+ # Shared prompts for the reviewer and judge calls.
+ REVIEWER_SYSTEM_PROMPT = "You are a Qodo reviewer. You will be given CODE, a PR_REQUEST, and Qodo's improved summary of the PR_REQUEST as well as its review of the PR_REQUEST, given as PR_QUALITY. Your job is to review the CODE and PR_REQUEST and determine how factually accurate and thorough Qodo is. Give reasoning for why or why not you think Qodo's review is accurate and thorough."
+ JUDGE_SYSTEM_PROMPT = "You are a judge. You will be given a review of the performance of Qodo (a code review tool) on the accuracy and thoroughness of its review of a PR_REQUEST, given as PR_QUALITY. Your job is to give a score from 0 to 1 on how well Qodo performed based on the REVIEW given to you. Do not output anything except the score."
+
+
+ class QodoScorer(JudgevalScorer):
+
+     def __init__(self,
+                  threshold=0.5,
+                  score_type="CodeReviewScorer",
+                  include_reason=True,
+                  async_mode=True,
+                  strict_mode=False,
+                  verbose_mode=True):
+         super().__init__(
+             threshold=threshold,
+             score_type=score_type,
+             include_reason=include_reason,
+             async_mode=async_mode,
+             strict_mode=strict_mode,
+             verbose_mode=verbose_mode)
+
+     def score_example(self, example: Example) -> float:
+         """
+         Score the example against the code review criteria.
+         """
+         # First call: generate a free-form review of Qodo's output.
+         response = client.chat.completions.create(
+             model="gpt-4o",
+             messages=[
+                 {"role": "system", "content": REVIEWER_SYSTEM_PROMPT},
+                 {"role": "user", "content": f"INPUT: {example.input}, CONTEXT: {example.context}, Qodo's REVIEW: {example.actual_output}"},
+             ],
+         )
+         self.reason = response.choices[0].message.content
+
+         # Second call: distill the review into a 0-1 score.
+         score_response = client.chat.completions.create(
+             model="gpt-4o",
+             messages=[
+                 {"role": "system", "content": JUDGE_SYSTEM_PROMPT},
+                 {"role": "user", "content": f"REVIEW: {self.reason}"},
+             ],
+         )
+         self.score = float(score_response.choices[0].message.content)
+         return self.score
+
+     async def a_score_example(self, example: Example) -> float:
+         """
+         Async version of score_example, using the async OpenAI client.
+         """
+         response = await async_client.chat.completions.create(
+             model="gpt-4o",
+             messages=[
+                 {"role": "system", "content": REVIEWER_SYSTEM_PROMPT},
+                 {"role": "user", "content": f"INPUT: {example.input}, CONTEXT: {example.context}, Qodo's REVIEW: {example.actual_output}"},
+             ],
+         )
+         self.reason = response.choices[0].message.content
+
+         score_response = await async_client.chat.completions.create(
+             model="gpt-4o",
+             messages=[
+                 {"role": "system", "content": JUDGE_SYSTEM_PROMPT},
+                 {"role": "user", "content": f"REVIEW: {self.reason}"},
+             ],
+         )
+         self.score = float(score_response.choices[0].message.content)
+         return self.score
+
+     def _success_check(self):
+         if self.error is not None:
+             return False
+         return self.score >= self.threshold
+
+     @property
+     def __name__(self):
+         return "Qodo Scorer"