PyPI - judgeval - Versions diffs - 0.0.35__tar.gz → 0.0.37__tar.gz - Mend

judgeval 0.0.35__tar.gz → 0.0.37__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (177)

{judgeval-0.0.35 → judgeval-0.0.37}/.github/workflows/ci.yaml RENAMED

@@ -1,8 +1,8 @@
-name: CI
+name: CI Tests
 on:
-  pull_request_review:
-    types: [submitted]
+  pull_request:
+    types: [opened, synchronize, reopened]
     branches:
       - main
@@ -20,6 +20,7 @@ jobs:
       PYTHONPATH: "."
       OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
       TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
+      JUDGMENT_DEV: true
     steps:
       - name: Checkout code
@@ -39,7 +40,7 @@ jobs:
       - name: Run tests
         run: |
           cd src
-          pipenv run pytest
+          pipenv run pytest tests
   run-e2e-tests:
     if: "!contains(github.actor, '[bot]')"  # Exclude if the actor is a bot

judgeval-0.0.37/PKG-INFO ADDED

@@ -0,0 +1,214 @@
+Metadata-Version: 2.4
+Name: judgeval
+Version: 0.0.37
+Summary: Judgeval Package
+Project-URL: Homepage, https://github.com/JudgmentLabs/judgeval
+Project-URL: Issues, https://github.com/JudgmentLabs/judgeval/issues
+Author-email: Andrew Li <andrew@judgmentlabs.ai>, Alex Shan <alex@judgmentlabs.ai>, Joseph Camyre <joseph@judgmentlabs.ai>
+License-Expression: Apache-2.0
+License-File: LICENSE.md
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Requires-Python: >=3.11
+Requires-Dist: anthropic
+Requires-Dist: boto3
+Requires-Dist: google-genai
+Requires-Dist: langchain-anthropic
+Requires-Dist: langchain-core
+Requires-Dist: langchain-huggingface
+Requires-Dist: langchain-openai
+Requires-Dist: litellm==1.38.12
+Requires-Dist: nest-asyncio
+Requires-Dist: openai
+Requires-Dist: pandas
+Requires-Dist: python-dotenv==1.0.1
+Requires-Dist: requests
+Requires-Dist: together
+Description-Content-Type: text/markdown
+<div align="center">
+<img src="assets/logo-light.svg#gh-light-mode-only" alt="Judgment Logo" width="400" />
+<img src="assets/logo-dark.svg#gh-dark-mode-only" alt="Judgment Logo" width="400" />
+**Build monitoring & evaluation pipelines for complex agents**
+[Website](https://www.judgmentlabs.ai/) • [Twitter/X](https://x.com/JudgmentLabs) • [LinkedIn](https://www.linkedin.com/company/judgmentlabs) • [Documentation](https://judgment.mintlify.app/getting_started) • [Demos](https://www.youtube.com/@AlexShan-j3o)
+</div>
+## 🚀 What is Judgeval?
+Judgeval is an open-source tool for testing, monitoring, and optimizing AI agents. Judgeval is created and maintained by [Judgment Labs](https://judgmentlabs.ai/).
+**🔍 Tracing**
+* Automatic agent tracing for common agent frameworks and SDKs (LangGraph, OpenAI, Anthropic, etc.)
+* Track input/output, latency, cost, token usage at every step
+* Function tracing with `@judgment.observe` decorator
+**🧪 Evals**
+* Plug-and-measure 15+ metrics, including:
+  * Tool call accuracy
+  * Hallucinations
+  * Instruction adherence
+  * Retrieval context recall
+    Our metric implementations are research-backed by Stanford and Berkeley AI labs. Check out our [research](https://judgmentlabs.ai/research)!
+* Build custom evaluators that seamlessly connect with our infrastructure!
+* Use our evals for:
+    * ⚠️ Unit-testing your agent
+    * 🔬 Experimentally testing new prompts and models
+    * 🛡️ Online evaluations to guardrail your agent's actions and responses
+**📊 Datasets**
+* Export trace data to datasets hosted on Judgment's Platform and export to JSON, Parquet, S3, etc.
+* Run evals on datasets as unit-tests or to A/B test agent configs
+**💡 Insights**
+* Error clustering groups agent failures to uncover failure patterns and speed up root cause analysis
+* Trace agent failures to their exact source. Judgment's Osiris agent localizes errors to specific agent components, enabling precise, targeted fixes.
+## 🛠️ Installation
+Get started with Judgeval by installing our SDK using pip:
+```bash
+pip install judgeval
+```
+Ensure you have your `JUDGMENT_API_KEY` environment variable set to connect to the [Judgment platform](https://app.judgmentlabs.ai/). If you don't have a key, create an account on the platform!
+## 🏁 Get Started
+Here's how you can quickly start using Judgeval:
+### 🛰️ Tracing
+Get full observability into your agent's execution with just a few lines of code.
+Create a file named `traces.py` with the following code:
+```python
+from judgeval.common.tracer import Tracer, wrap
+from openai import OpenAI
+client = wrap(OpenAI())
+judgment = Tracer(project_name="my_project")
+@judgment.observe(span_type="tool")
+def my_tool():
+    return "What's the capital of the U.S.?"
+@judgment.observe(span_type="function")
+def main():
+    task_input = my_tool()
+    res = client.chat.completions.create(
+        model="gpt-4.1",
+        messages=[{"role": "user", "content": f"{task_input}"}]
+    )
+    return res.choices[0].message.content
+main()
+```
+[Click here](https://judgment.mintlify.app/getting_started#create-your-first-trace) for a more detailed explanation.
+### 📝 Offline Evaluations
+You can evaluate your agent's execution to measure quality metrics such as hallucination.
+Create a file named `evaluate.py` with the following code:
+```python evaluate.py
+from judgeval import JudgmentClient
+from judgeval.data import Example
+from judgeval.scorers import FaithfulnessScorer
+client = JudgmentClient()
+example = Example(
+    input="What if these shoes don't fit?",
+    actual_output="We offer a 30-day full refund at no extra cost.",
+    retrieval_context=["All customers are eligible for a 30 day full refund at no extra cost."],
+)
+scorer = FaithfulnessScorer(threshold=0.5)
+results = client.run_evaluation(
+    examples=[example],
+    scorers=[scorer],
+    model="gpt-4.1",
+)
+print(results)
+```
+[Click here](https://judgment.mintlify.app/getting_started#create-your-first-experiment) for a more detailed explanation.
+### 📡 Online Evaluations
+Apply performance monitoring to measure the quality of your systems in production, not just on traces.
+Using the same `traces.py` file we created earlier, modify the `main` function:
+```python
+from judgeval.common.tracer import Tracer, wrap
+from judgeval.scorers import AnswerRelevancyScorer
+from openai import OpenAI
+client = wrap(OpenAI())
+judgment = Tracer(project_name="my_project")
+@judgment.observe(span_type="tool")
+def my_tool():
+    return "Hello world!"
+@judgment.observe(span_type="function")
+def main():
+    task_input = my_tool()
+    res = client.chat.completions.create(
+        model="gpt-4.1",
+        messages=[{"role": "user", "content": f"{task_input}"}]
+    ).choices[0].message.content
+    judgment.get_current_trace().async_evaluate(
+        scorers=[AnswerRelevancyScorer(threshold=0.5)],
+        input=task_input,
+        actual_output=res,
+        model="gpt-4.1"
+    )
+    print("Online evaluation submitted.")
+    return res
+main()
+```
+[Click here](https://judgment.mintlify.app/getting_started#create-your-first-online-evaluation) for a more detailed explanation.
+## 🏢 Self-Hosting
+Run Judgment on your own infrastructure: we provide comprehensive self-hosting capabilities that give you full control over the backend and data plane that Judgeval interfaces with.
+### Key Features
+* Deploy Judgment on your own AWS account
+* Store data in your own Supabase instance
+* Access Judgment through your own custom domain
+### Getting Started
+1. Check out our [self-hosting documentation](https://judgment.mintlify.app/self_hosting/get_started) for detailed setup instructions, along with how your self-hosted instance can be accessed
+2. Use the [Judgment CLI](https://github.com/JudgmentLabs/judgment-cli) to deploy your self-hosted environment
+3. After your self-hosted instance is set up, make sure the `JUDGMENT_API_URL` environment variable is set to your self-hosted backend endpoint
+## ⭐ Star Us on GitHub
+If you find Judgeval useful, please consider giving us a star on GitHub! Your support helps us grow our community and continue improving the product.
+## 🤝 Contributing
+There are many ways to contribute to Judgeval:
+- Submit [bug reports](https://github.com/JudgmentLabs/judgeval/issues) and [feature requests](https://github.com/JudgmentLabs/judgeval/issues)
+- Review the documentation and submit [Pull Requests](https://github.com/JudgmentLabs/judgeval/pulls) to improve it
+- Speak or write about Judgment and let us know!
+## Documentation and Demos
+For more detailed documentation, please check out our [developer docs](https://judgment.mintlify.app/getting_started) and some of our [demo videos](https://www.youtube.com/@AlexShan-j3o) for reference!
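
The header block at the top of the PKG-INFO above uses an RFC 822-style `Key: value` layout, so it can be read with Python's standard `email.parser` module. The following is a minimal sketch, not part of the package: `pinned_requirements` is a hypothetical helper, the `PKG_INFO` string is a shortened excerpt of the headers shown in the diff (with the `+` markers removed), and the `==` check is a simplification that ignores environment markers and other version specifiers.

```python
from email.parser import HeaderParser

# Shortened excerpt of the PKG-INFO headers from the diff above.
PKG_INFO = """\
Metadata-Version: 2.4
Name: judgeval
Version: 0.0.37
Requires-Python: >=3.11
Requires-Dist: anthropic
Requires-Dist: litellm==1.38.12
Requires-Dist: openai
Requires-Dist: python-dotenv==1.0.1
Requires-Dist: requests
Description-Content-Type: text/markdown
"""


def pinned_requirements(pkg_info_text: str) -> dict[str, str]:
    """Return {name: version} for every Requires-Dist entry pinned with '=='."""
    headers = HeaderParser().parsestr(pkg_info_text)
    pins = {}
    # Requires-Dist repeats once per dependency, so collect all occurrences.
    for requirement in headers.get_all("Requires-Dist", []):
        if "==" in requirement:
            name, version = requirement.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


headers = HeaderParser().parsestr(PKG_INFO)
print(headers["Name"], headers["Version"])
print(pinned_requirements(PKG_INFO))
```

Of the dependencies listed in the diff, only `litellm` and `python-dotenv` carry exact pins; the rest are unconstrained.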

{judgeval-0.0.35 → judgeval-0.0.37}/Pipfile RENAMED Viewed

@@ -6,24 +6,16 @@ name = "pypi"
 [packages]
 litellm = "==1.38.12"
 python-dotenv = "==1.0.1"
-fastapi = "*"
-uvicorn = "*"
-supabase = "*"
 requests = "*"
 pandas = "*"
 openai = "*"
 together = "*"
 anthropic = "*"
-asyncio = "*"
 nest-asyncio = "*"
-pika = "*"
-openpyxl = "*"
-langchain = "*"
 langchain-huggingface = "*"
 langchain-openai = "*"
 langchain-anthropic = "*"
 langchain-core = "*"
-langchain-community = "*"
 langgraph = "*"
 google-genai = "*"
 boto3 = "*"
@@ -33,6 +25,8 @@ pytest = "*"
 pytest-asyncio = "*"
 pytest-mock = "*"
 tavily-python = "*"
+chromadb = "*"
+langchain-community = "*"
 [requires]
 python_version = "3.11"
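
The Pipfile changes above can be summarized by collecting the `-` and `+` lines of the two hunks. The sketch below is illustrative only: `summarize_changes` is a hypothetical helper, and `CHANGED_LINES` copies just the changed lines from the diff. A name that appears on both sides was most likely moved rather than dropped, although the section header for the second hunk is not visible in the diff, so the destination section is an assumption.

```python
# Changed lines copied from the two Pipfile hunks above (context lines omitted).
CHANGED_LINES = """\
-fastapi = "*"
-uvicorn = "*"
-supabase = "*"
-asyncio = "*"
-pika = "*"
-openpyxl = "*"
-langchain = "*"
-langchain-community = "*"
+chromadb = "*"
+langchain-community = "*"
"""


def summarize_changes(diff_text: str) -> tuple[list[str], list[str]]:
    """Return (removed, added) package names from unified-diff body lines."""
    removed, added = [], []
    for line in diff_text.splitlines():
        # Skip file headers, hunk headers, and anything that is not "name = spec".
        if line.startswith(("---", "+++", "@@")) or "=" not in line:
            continue
        name = line[1:].split("=", 1)[0].strip()
        if line.startswith("-"):
            removed.append(name)
        elif line.startswith("+"):
            added.append(name)
    return removed, added


removed, added = summarize_changes(CHANGED_LINES)
moved = sorted(set(removed) & set(added))       # present on both sides
dropped = [n for n in removed if n not in moved]
new = [n for n in added if n not in moved]

print("dropped:", dropped)
print("new:", new)
print("moved:", moved)
```

Run on the lines above, this reports seven dropped packages, one new package (`chromadb`), and `langchain-community` as present on both sides.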
