PyPI - deepeval - Versions diffs - 3.7.9__tar.gz → 3.8.1__tar.gz - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (528)

{deepeval-3.7.9 → deepeval-3.8.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: deepeval
-Version: 3.7.9
+Version: 3.8.1
 Summary: The LLM Evaluation Framework
 Home-page: https://github.com/confident-ai/deepeval
 License: Apache-2.0
@@ -100,7 +100,7 @@ Description-Content-Type: text/markdown
     <a href="https://www.readme-i18n.com/confident-ai/deepeval?lang=zh">中文</a>
 </p>
-**DeepEval** is a simple-to-use, open-source LLM evaluation framework, for evaluating and testing large-language model systems. It is similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research to evaluate LLM outputs based on metrics such as G-Eval, task completion, answer relevancy, hallucination, etc., which uses LLM-as-a-judge and other NLP models that runs **locally on your machine** for evaluation.
+**DeepEval** is a simple-to-use, open-source LLM evaluation framework, for evaluating and testing large-language model systems. It is similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research to evaluate LLM outputs based on metrics such as G-Eval, task completion, answer relevancy, hallucination, etc., which uses LLM-as-a-judge and other NLP models that run **locally on your machine** for evaluation.
 Whether your LLM applications are AI agents, RAG pipelines, or chatbots, implemented via LangChain or OpenAI, DeepEval has you covered. With it, you can easily determine the optimal models, prompts, and architecture to improve your RAG pipeline, agentic workflows, prevent prompt drifting, or even transition from OpenAI to hosting your own Deepseek R1 with confidence.
@@ -118,7 +118,7 @@ Whether your LLM applications are AI agents, RAG pipelines, or chatbots, impleme
 > 🥳 You can now share DeepEval's test results on the cloud directly on [Confident AI](https://confident-ai.com?utm_source=GitHub)
 - Supports both end-to-end and component-level LLM evaluation.
-- Large variety of ready-to-use LLM evaluation metrics (all with explanations) powered by **ANY** LLM of your choice, statistical methods, or NLP models that runs **locally on your machine**:
+- Large variety of ready-to-use LLM evaluation metrics (all with explanations) powered by **ANY** LLM of your choice, statistical methods, or NLP models that run **locally on your machine**:
   - G-Eval
   - DAG ([deep acyclic graph](https://deepeval.com/docs/metrics-dag))
   - **RAG metrics:**

{deepeval-3.7.9 → deepeval-3.8.1}/README.md RENAMED

@@ -53,7 +53,7 @@
     <a href="https://www.readme-i18n.com/confident-ai/deepeval?lang=zh">中文</a>
 </p>
-**DeepEval** is a simple-to-use, open-source LLM evaluation framework, for evaluating and testing large-language model systems. It is similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research to evaluate LLM outputs based on metrics such as G-Eval, task completion, answer relevancy, hallucination, etc., which uses LLM-as-a-judge and other NLP models that runs **locally on your machine** for evaluation.
+**DeepEval** is a simple-to-use, open-source LLM evaluation framework, for evaluating and testing large-language model systems. It is similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research to evaluate LLM outputs based on metrics such as G-Eval, task completion, answer relevancy, hallucination, etc., which uses LLM-as-a-judge and other NLP models that run **locally on your machine** for evaluation.
 Whether your LLM applications are AI agents, RAG pipelines, or chatbots, implemented via LangChain or OpenAI, DeepEval has you covered. With it, you can easily determine the optimal models, prompts, and architecture to improve your RAG pipeline, agentic workflows, prevent prompt drifting, or even transition from OpenAI to hosting your own Deepseek R1 with confidence.
@@ -71,7 +71,7 @@ Whether your LLM applications are AI agents, RAG pipelines, or chatbots, impleme
 > 🥳 You can now share DeepEval's test results on the cloud directly on [Confident AI](https://confident-ai.com?utm_source=GitHub)
 - Supports both end-to-end and component-level LLM evaluation.
-- Large variety of ready-to-use LLM evaluation metrics (all with explanations) powered by **ANY** LLM of your choice, statistical methods, or NLP models that runs **locally on your machine**:
+- Large variety of ready-to-use LLM evaluation metrics (all with explanations) powered by **ANY** LLM of your choice, statistical methods, or NLP models that run **locally on your machine**:
   - G-Eval
   - DAG ([deep acyclic graph](https://deepeval.com/docs/metrics-dag))
   - **RAG metrics:**

deepeval-3.8.1/deepeval/_version.py ADDED

@@ -0,0 +1 @@
+__version__: str = "3.8.1"
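The new `deepeval/_version.py` exposes the release as a plain string. Because this diff adds the OpenRouter provider, code that depends on it could check for 3.8.1 or later (the diff does not show whether 3.8.0 already shipped it). A minimal sketch, assuming numeric `MAJOR.MINOR.PATCH` strings with no pre-release suffix; `parse_version` and `is_at_least` are hypothetical helpers, not deepeval APIs:

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    # Hypothetical helper: "3.8.1" -> (3, 8, 1).
    # Assumes purely numeric components (no "rc"/"dev" suffixes).
    return tuple(int(part) for part in version.strip().split("."))


def is_at_least(installed: str, minimum: str) -> bool:
    # Tuple comparison orders versions component by component.
    return parse_version(installed) >= parse_version(minimum)


# In an environment with deepeval 3.8.1 installed, the value could come from
# `from deepeval._version import __version__`; literals are used here instead.
print(is_at_least("3.8.1", "3.8.1"))  # True
print(is_at_least("3.7.9", "3.8.1"))  # False
```

Comparing tuples rather than raw strings keeps `"3.10.0"` ordered after `"3.8.1"`; a more robust version check would typically use `packaging.version.Version`.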

{deepeval-3.7.9 → deepeval-3.8.1}/deepeval/annotation/annotation.py RENAMED

@@ -14,7 +14,7 @@ def send_annotation(
     explanation: Optional[str] = None,
     user_id: Optional[str] = None,
     type: Optional[AnnotationType] = AnnotationType.THUMBS_RATING,
-) -> str:
+) -> None:
     api_annotation = APIAnnotation(
         rating=rating,
         traceUuid=trace_uuid,
@@ -50,7 +50,7 @@ async def a_send_annotation(
     explanation: Optional[str] = None,
     type: Optional[AnnotationType] = AnnotationType.THUMBS_RATING,
     user_id: Optional[str] = None,
-) -> str:
+) -> None:
     api_annotation = APIAnnotation(
         rating=rating,
         traceUuid=trace_uuid,

{deepeval-3.7.9 → deepeval-3.8.1}/deepeval/cli/main.py RENAMED

@@ -2937,5 +2937,173 @@ def unset_portkey_model_env(
             )
+#############################################
+# OpenRouter Integration ####################
+#############################################
+@app.command(name="set-openrouter")
+def set_openrouter_model_env(
+    model: Optional[str] = typer.Option(
+        None,
+        "-m",
+        "--model",
+        help="Model identifier to use for this provider (e.g., `openai/gpt-4.1`).",
+    ),
+    prompt_api_key: bool = typer.Option(
+        False,
+        "-k",
+        "--prompt-api-key",
+        help=(
+            "Prompt for OPENROUTER_API_KEY (input hidden). Not suitable for CI. "
+            "If --save (or DEEPEVAL_DEFAULT_SAVE) is used, the key is written to dotenv in plaintext."
+        ),
+    ),
+    base_url: Optional[str] = typer.Option(
+        None,
+        "-u",
+        "--base-url",
+        help="Override the API endpoint/base URL used by this provider (default: https://openrouter.ai/api/v1).",
+    ),
+    temperature: Optional[float] = typer.Option(
+        None,
+        "-t",
+        "--temperature",
+        help="Override the global TEMPERATURE used by LLM providers (e.g., 0.0 for deterministic behavior).",
+    ),
+    cost_per_input_token: Optional[float] = typer.Option(
+        None,
+        "-i",
+        "--cost-per-input-token",
+        help=(
+            "USD per input token used for cost tracking. "
+            "If unset and OpenRouter does not return pricing metadata, "
+            "costs will not be calculated."
+        ),
+    ),
+    cost_per_output_token: Optional[float] = typer.Option(
+        None,
+        "-o",
+        "--cost-per-output-token",
+        help=(
+            "USD per output token used for cost tracking. "
+            "If unset and OpenRouter does not return pricing metadata, "
+            "costs will not be calculated."
+        ),
+    ),
+    save: Optional[str] = typer.Option(
+        None,
+        "-s",
+        "--save",
+        help="Persist CLI parameters as environment variables in a dotenv file. "
+        "Usage: --save=dotenv[:path] (default: .env.local)",
+    ),
+    quiet: bool = typer.Option(
+        False,
+        "-q",
+        "--quiet",
+        help="Suppress printing to the terminal (useful for CI).",
+    ),
+):
+    api_key = None
+    if prompt_api_key:
+        api_key = coerce_blank_to_none(
+            typer.prompt("OpenRouter API key", hide_input=True)
+        )
+    model = coerce_blank_to_none(model)
+    base_url = coerce_blank_to_none(base_url)
+    settings = get_settings()
+    with settings.edit(save=save) as edit_ctx:
+        edit_ctx.switch_model_provider(ModelKeyValues.USE_OPENROUTER_MODEL)
+        if model is not None:
+            settings.OPENROUTER_MODEL_NAME = model
+        if api_key is not None:
+            settings.OPENROUTER_API_KEY = api_key
+        if base_url is not None:
+            settings.OPENROUTER_BASE_URL = base_url
+        if temperature is not None:
+            settings.TEMPERATURE = temperature
+        if cost_per_input_token is not None:
+            settings.OPENROUTER_COST_PER_INPUT_TOKEN = cost_per_input_token
+        if cost_per_output_token is not None:
+            settings.OPENROUTER_COST_PER_OUTPUT_TOKEN = cost_per_output_token
+    handled, path, updates = edit_ctx.result
+    effective_model = settings.OPENROUTER_MODEL_NAME
+    if not effective_model:
+        raise typer.BadParameter(
+            "OpenRouter model name is not set. Pass --model (or set OPENROUTER_MODEL_NAME).",
+            param_hint="--model",
+        )
+    _handle_save_result(
+        handled=handled,
+        path=path,
+        updates=updates,
+        save=save,
+        quiet=quiet,
+        success_msg=(
+            f":raising_hands: Congratulations! You're now using OpenRouter `{escape(effective_model)}` for all evals that require an LLM."
+        ),
+    )
+@app.command(name="unset-openrouter")
+def unset_openrouter_model_env(
+    save: Optional[str] = typer.Option(
+        None,
+        "-s",
+        "--save",
+        help="Remove only the OpenRouter model related environment variables from a dotenv file. "
+        "Usage: --save=dotenv[:path] (default: .env.local)",
+    ),
+    clear_secrets: bool = typer.Option(
+        False,
+        "-x",
+        "--clear-secrets",
+        help="Also remove OPENROUTER_API_KEY from the dotenv store.",
+    ),
+    quiet: bool = typer.Option(
+        False,
+        "-q",
+        "--quiet",
+        help="Suppress printing to the terminal (useful for CI).",
+    ),
+):
+    settings = get_settings()
+    with settings.edit(save=save) as edit_ctx:
+        settings.USE_OPENROUTER_MODEL = None
+        settings.OPENROUTER_MODEL_NAME = None
+        settings.OPENROUTER_BASE_URL = None
+        settings.OPENROUTER_COST_PER_INPUT_TOKEN = None
+        settings.OPENROUTER_COST_PER_OUTPUT_TOKEN = None
+        # Intentionally do NOT touch TEMPERATURE here; it's a global dial.
+        if clear_secrets:
+            settings.OPENROUTER_API_KEY = None
+    handled, path, updates = edit_ctx.result
+    if _handle_save_result(
+        handled=handled,
+        path=path,
+        updates=updates,
+        save=save,
+        quiet=quiet,
+        updated_msg="Removed OpenRouter model environment variables from {path}.",
+        tip_msg=None,
+    ):
+        if is_openai_configured():
+            print(
+                ":raised_hands: OpenAI will still be used by default because OPENAI_API_KEY is set."
+            )
+        else:
+            print(
+                "The OpenRouter model configuration has been removed. No model is currently configured, but you can set one with the CLI or add credentials to .env[.local]."
+            )
 if __name__ == "__main__":
     app()
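`set_openrouter_model_env` applies an override only when the corresponding option is supplied (blank strings count as unset), then fails if no model name is in effect. The sketch below reproduces that precedence logic outside deepeval. `OpenRouterConfig`, `apply_overrides`, and the local `coerce_blank_to_none` are hypothetical stand-ins; the real helper's exact behavior is assumed, not shown in the diff:

```python
from dataclasses import dataclass, replace
from typing import Optional


def coerce_blank_to_none(value: Optional[str]) -> Optional[str]:
    # Stand-in for the helper called in main.py; assumed to map "" and
    # whitespace-only strings to None and leave other values unchanged.
    if value is None or not value.strip():
        return None
    return value


@dataclass
class OpenRouterConfig:
    # Hypothetical container mirroring the OPENROUTER_* settings fields.
    model_name: Optional[str] = None
    api_key: Optional[str] = None
    base_url: Optional[str] = None
    cost_per_input_token: Optional[float] = None
    cost_per_output_token: Optional[float] = None


def apply_overrides(
    current: OpenRouterConfig,
    model: Optional[str] = None,
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    cost_per_input_token: Optional[float] = None,
    cost_per_output_token: Optional[float] = None,
) -> OpenRouterConfig:
    # Work on a copy so the caller's config is left untouched.
    updated = replace(current)
    model = coerce_blank_to_none(model)
    api_key = coerce_blank_to_none(api_key)
    base_url = coerce_blank_to_none(base_url)
    # Only values the user actually supplied overwrite stored ones,
    # mirroring the `if ... is not None:` checks in the CLI command.
    if model is not None:
        updated.model_name = model
    if api_key is not None:
        updated.api_key = api_key
    if base_url is not None:
        updated.base_url = base_url
    if cost_per_input_token is not None:
        updated.cost_per_input_token = cost_per_input_token
    if cost_per_output_token is not None:
        updated.cost_per_output_token = cost_per_output_token
    # Refuse to finish without an effective model name.
    if not updated.model_name:
        raise ValueError(
            "OpenRouter model name is not set. Pass --model (or set OPENROUTER_MODEL_NAME)."
        )
    return updated


stored = OpenRouterConfig(model_name="openai/gpt-4.1")
# A blank --model is ignored, so the stored model survives.
print(apply_overrides(stored, model="   ").model_name)  # openai/gpt-4.1
```

One difference: the CLI checks `OPENROUTER_MODEL_NAME` only after the `settings.edit(...)` block has exited, whereas this sketch validates before returning the new config.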

{deepeval-3.7.9 → deepeval-3.8.1}/deepeval/confident/api.py RENAMED

@@ -106,6 +106,8 @@ class Endpoints(Enum):
     EVALUATE_TRACE_ENDPOINT = "/v1/evaluate/traces/:traceUuid"
     EVALUATE_SPAN_ENDPOINT = "/v1/evaluate/spans/:spanUuid"
+    METRICS_ENDPOINT = "/v1/metrics"
 class Api:
     def __init__(self, api_key: Optional[str] = None):
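The `Endpoints` enum stores paths as templates with `:name` placeholders (`:traceUuid`, `:spanUuid`), while the new `METRICS_ENDPOINT` has none. The diff does not show how `Api` fills these in. A minimal sketch, assuming plain substitution with URL-encoded values and using a hypothetical `build_path` helper:

```python
import re
from enum import Enum
from urllib.parse import quote


class Endpoints(Enum):
    # Subset of the members shown in the diff.
    EVALUATE_TRACE_ENDPOINT = "/v1/evaluate/traces/:traceUuid"
    EVALUATE_SPAN_ENDPOINT = "/v1/evaluate/spans/:spanUuid"
    METRICS_ENDPOINT = "/v1/metrics"


# Matches ":name" placeholders such as ":traceUuid".
_PLACEHOLDER = re.compile(r":([A-Za-z_][A-Za-z0-9_]*)")


def build_path(endpoint: Endpoints, **params: str) -> str:
    # Hypothetical helper: replace each placeholder with a URL-encoded value.
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing path parameter: {name}")
        return quote(params[name], safe="")

    return _PLACEHOLDER.sub(substitute, endpoint.value)


print(build_path(Endpoints.EVALUATE_TRACE_ENDPOINT, traceUuid="abc-123"))
# /v1/evaluate/traces/abc-123
print(build_path(Endpoints.METRICS_ENDPOINT))  # /v1/metrics
```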

{deepeval-3.7.9 → deepeval-3.8.1}/deepeval/config/settings.py RENAMED

@@ -447,6 +447,9 @@ class Settings(BaseSettings):
     AZURE_OPENAI_API_KEY: Optional[SecretStr] = Field(
         None, description="Azure OpenAI API key."
     )
+    AZURE_OPENAI_AD_TOKEN: Optional[SecretStr] = Field(
+        None, description="Azure OpenAI Ad Token."
+    )
     AZURE_OPENAI_ENDPOINT: Optional[AnyUrl] = Field(
         None, description="Azure OpenAI endpoint URL."
     )
@@ -627,6 +630,16 @@ class Settings(BaseSettings):
     PORTKEY_PROVIDER_NAME: Optional[str] = Field(
         None, description="Provider name/routing hint for Portkey."
     )
+    # OpenRouter
+    USE_OPENROUTER_MODEL: Optional[bool] = None
+    OPENROUTER_API_KEY: Optional[SecretStr] = None
+    OPENROUTER_MODEL_NAME: Optional[str] = None
+    OPENROUTER_COST_PER_INPUT_TOKEN: Optional[float] = None
+    OPENROUTER_COST_PER_OUTPUT_TOKEN: Optional[float] = None
+    OPENROUTER_BASE_URL: Optional[AnyUrl] = Field(
+        None, description="OpenRouter base URL (if using a custom endpoint)."
+    )
     # Vertex AI
     VERTEX_AI_MODEL_NAME: Optional[str] = Field(
         None,
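The `--cost-per-input-token` and `--cost-per-output-token` help text says costs are not calculated when neither a configured rate nor OpenRouter pricing metadata is available. A minimal sketch of that rule for the new `OPENROUTER_COST_PER_*` fields, assuming a configured rate takes priority over a provider-reported one and that a missing rate on either side means no cost; `estimate_cost` and `resolve_rate` are hypothetical:

```python
from typing import Optional


def resolve_rate(configured: Optional[float], reported: Optional[float]) -> Optional[float]:
    # Assumption: an explicitly configured rate wins over provider-reported pricing.
    return configured if configured is not None else reported


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    cost_per_input_token: Optional[float] = None,
    cost_per_output_token: Optional[float] = None,
    reported_input_rate: Optional[float] = None,
    reported_output_rate: Optional[float] = None,
) -> Optional[float]:
    """Return the USD cost of one call, or None when a rate is unknown."""
    input_rate = resolve_rate(cost_per_input_token, reported_input_rate)
    output_rate = resolve_rate(cost_per_output_token, reported_output_rate)
    if input_rate is None or output_rate is None:
        # No usable rate, so no cost is calculated.
        return None
    return input_tokens * input_rate + output_tokens * output_rate


print(estimate_cost(1_000, 500, 2e-6, 8e-6))
print(estimate_cost(1_000, 500))  # None
```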

{deepeval-3.7.9 → deepeval-3.8.1}/deepeval/constants.py RENAMED

@@ -35,6 +35,7 @@ class ProviderSlug(str, Enum):
     LITELLM = "litellm"
     LOCAL = "local"
     OLLAMA = "ollama"
+    OPENROUTER = "openrouter"
 def slugify(value: Union[str, ProviderSlug]) -> str:
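`ProviderSlug` subclasses both `str` and `Enum`, so the new `OPENROUTER` member compares equal to the string `"openrouter"`. The body of `slugify` is not in the diff; the sketch below assumes it returns an enum member's value and normalizes plain strings by trimming and lower-casing:

```python
from enum import Enum
from typing import Union


class ProviderSlug(str, Enum):
    # Subset of the members visible in the diff.
    LITELLM = "litellm"
    LOCAL = "local"
    OLLAMA = "ollama"
    OPENROUTER = "openrouter"


def slugify(value: Union[str, ProviderSlug]) -> str:
    # Assumed behavior. Check the enum first: members are also str instances.
    if isinstance(value, ProviderSlug):
        return value.value
    return value.strip().lower()


print(slugify(ProviderSlug.OPENROUTER))  # openrouter
print(slugify("  OpenRouter "))  # openrouter
```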

{deepeval-3.7.9 → deepeval-3.8.1}/deepeval/dataset/dataset.py RENAMED

@@ -84,9 +84,11 @@ class EvaluationDataset:
     def __init__(
         self,
         goldens: Union[List[Golden], List[ConversationalGolden]] = [],
+        confident_api_key: Optional[str] = None,
     ):
         self._alias = None
         self._id = None
+        self.confident_api_key = confident_api_key
         if len(goldens) > 0:
             self._multi_turn = (
                 True if isinstance(goldens[0], ConversationalGolden) else False
@@ -722,7 +724,7 @@ class EvaluationDataset:
                 "Unable to push empty dataset to Confident AI, there must be at least one golden in dataset."
             )
-        api = Api()
+        api = Api(api_key=self.confident_api_key)
         api_dataset = APIDataset(
             goldens=self.goldens if not self._multi_turn else None,
             conversationalGoldens=(self.goldens if self._multi_turn else None),
@@ -755,7 +757,7 @@ class EvaluationDataset:
         auto_convert_goldens_to_test_cases: bool = False,
         public: bool = False,
     ):
-        api = Api()
+        api = Api(api_key=self.confident_api_key)
         with capture_pull_dataset():
             with Progress(
                 SpinnerColumn(style="rgb(106,0,255)"),
@@ -839,7 +841,7 @@ class EvaluationDataset:
             raise ValueError(
                 f"Can't queue empty list of goldens to dataset with alias: {alias} on Confident AI."
             )
-        api = Api()
+        api = Api(api_key=self.confident_api_key)
         multi_turn = isinstance(goldens[0], ConversationalGolden)
@@ -871,7 +873,7 @@ class EvaluationDataset:
         self,
         alias: str,
     ):
-        api = Api()
+        api = Api(api_key=self.confident_api_key)
         api.send_request(
             method=HttpMethods.DELETE,
             endpoint=Endpoints.DATASET_ALIAS_ENDPOINT,
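`EvaluationDataset` now takes an optional `confident_api_key` and passes it as `Api(api_key=...)` in its push, pull, queue, and delete paths, so different datasets in one process can use different Confident AI keys. A minimal sketch of that pattern with stand-in classes. Falling back to a `CONFIDENT_API_KEY` environment variable when no key is passed is an assumption, since the diff only shows that `Api.__init__` accepts `api_key: Optional[str] = None`:

```python
import os
from typing import List, Optional


class Api:
    # Hypothetical stand-in for deepeval.confident.api.Api.
    def __init__(self, api_key: Optional[str] = None):
        # Assumed fallback: use the environment when no explicit key is given.
        self.api_key = api_key or os.environ.get("CONFIDENT_API_KEY")
        if not self.api_key:
            raise ValueError("No Confident AI API key available.")


class EvaluationDataset:
    # Trimmed stand-in: stores an optional per-dataset key.
    def __init__(
        self,
        goldens: Optional[List[str]] = None,
        confident_api_key: Optional[str] = None,
    ):
        self.goldens = list(goldens) if goldens else []
        self.confident_api_key = confident_api_key

    def _api(self) -> Api:
        # Every push/pull/queue/delete builds its client with the same key.
        return Api(api_key=self.confident_api_key)


ds = EvaluationDataset(goldens=["golden-1"], confident_api_key="key-for-project-a")
print(ds._api().api_key)  # key-for-project-a
```

The stand-in uses `None` instead of a `[]` default for `goldens`, a common Python precaution against shared mutable defaults.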
