azure-ai-evaluation 1.8.0.tar.gz → 1.10.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of azure-ai-evaluation has been flagged for review; consult the registry's advisory page for details.
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/CHANGELOG.md +45 -0
- {azure_ai_evaluation-1.8.0/azure_ai_evaluation.egg-info → azure_ai_evaluation-1.10.0}/PKG-INFO +46 -3
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/TROUBLESHOOTING.md +0 -3
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/__init__.py +51 -6
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_aoai/__init__.py +1 -1
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_aoai/aoai_grader.py +21 -11
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_aoai/label_grader.py +3 -2
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/_aoai/python_grader.py +84 -0
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/_aoai/score_model_grader.py +91 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_aoai/string_check_grader.py +3 -2
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_aoai/text_similarity_grader.py +3 -2
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_azure/_envs.py +9 -10
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_azure/_token_manager.py +7 -1
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/constants.py +11 -2
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/evaluation_onedp_client.py +32 -26
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/__init__.py +32 -32
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/_client.py +136 -139
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/_configuration.py +70 -73
- {azure_ai_evaluation-1.8.0/azure/ai/evaluation/_common/onedp/models → azure_ai_evaluation-1.10.0/azure/ai/evaluation/_common/onedp}/_patch.py +21 -21
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/_common/onedp/_utils/__init__.py +6 -0
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/_common/onedp/_utils/model_base.py +1232 -0
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/_common/onedp/_utils/serialization.py +2032 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/_validation.py +50 -50
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/_version.py +9 -9
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/aio/__init__.py +29 -29
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/aio/_client.py +138 -143
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/aio/_configuration.py +70 -75
- {azure_ai_evaluation-1.8.0/azure/ai/evaluation/_common/onedp/aio/operations → azure_ai_evaluation-1.10.0/azure/ai/evaluation/_common/onedp/aio}/_patch.py +21 -21
- {azure_ai_evaluation-1.8.0/azure/ai/evaluation/_common/onedp → azure_ai_evaluation-1.10.0/azure/ai/evaluation/_common/onedp/aio}/operations/__init__.py +37 -39
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/aio/operations/_operations.py +4832 -4494
- {azure_ai_evaluation-1.8.0/azure/ai/evaluation/_common/onedp → azure_ai_evaluation-1.10.0/azure/ai/evaluation/_common/onedp/aio/operations}/_patch.py +21 -21
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/models/__init__.py +168 -142
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/models/_enums.py +230 -162
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/models/_models.py +2685 -2228
- {azure_ai_evaluation-1.8.0/azure/ai/evaluation/_common/onedp/aio → azure_ai_evaluation-1.10.0/azure/ai/evaluation/_common/onedp/models}/_patch.py +21 -21
- {azure_ai_evaluation-1.8.0/azure/ai/evaluation/_common/onedp/aio → azure_ai_evaluation-1.10.0/azure/ai/evaluation/_common/onedp}/operations/__init__.py +37 -39
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/operations/_operations.py +6106 -5657
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/_common/onedp/operations/_patch.py +21 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/rai_service.py +88 -52
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/__init__.py +1 -1
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/operations/_operations.py +14 -1
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/utils.py +188 -10
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_constants.py +2 -1
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_converters/_ai_services.py +9 -8
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_converters/_models.py +46 -0
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/_converters/_sk_services.py +495 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_eval_mapping.py +2 -2
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/_evaluate/_batch_run/_run_submitter_client.py +166 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluate/_batch_run/eval_run_context.py +2 -2
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluate/_evaluate.py +210 -94
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluate/_evaluate_aoai.py +132 -89
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluate/_telemetry/__init__.py +0 -1
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluate/_utils.py +25 -17
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_bleu/_bleu.py +4 -4
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_code_vulnerability/_code_vulnerability.py +20 -12
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_coherence/_coherence.py +6 -6
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_common/_base_eval.py +45 -11
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_common/_base_prompty_eval.py +24 -9
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_common/_base_rai_svc_eval.py +24 -9
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_content_safety/_content_safety.py +28 -18
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_content_safety/_hate_unfairness.py +11 -8
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_content_safety/_self_harm.py +11 -8
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_content_safety/_sexual.py +12 -9
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_content_safety/_violence.py +10 -7
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_document_retrieval/__init__.py +1 -5
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_document_retrieval/_document_retrieval.py +37 -64
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_eci/_eci.py +6 -3
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_f1_score/_f1_score.py +5 -5
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_fluency/_fluency.py +3 -3
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_gleu/_gleu.py +4 -4
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_groundedness/_groundedness.py +12 -8
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py +31 -26
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/_evaluators/_intent_resolution/intent_resolution.prompty +275 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_meteor/_meteor.py +3 -4
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_protected_material/_protected_material.py +14 -7
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_qa/_qa.py +5 -5
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_relevance/_relevance.py +62 -15
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/_evaluators/_relevance/relevance.prompty +181 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_response_completeness/_response_completeness.py +21 -26
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_retrieval/_retrieval.py +5 -5
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_rouge/_rouge.py +22 -22
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_service_groundedness/_service_groundedness.py +7 -6
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_similarity/_similarity.py +4 -4
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_task_adherence/_task_adherence.py +27 -24
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/_evaluators/_task_adherence/task_adherence.prompty +405 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_tool_call_accuracy/_tool_call_accuracy.py +175 -183
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/_evaluators/_tool_call_accuracy/tool_call_accuracy.prompty +149 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_ungrounded_attributes/_ungrounded_attributes.py +20 -12
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_xpia/xpia.py +10 -7
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_exceptions.py +10 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_http_utils.py +3 -3
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_batch_engine/_config.py +6 -3
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_batch_engine/_engine.py +117 -32
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_batch_engine/_openai_injector.py +5 -2
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_batch_engine/_result.py +2 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_batch_engine/_run.py +2 -2
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_batch_engine/_run_submitter.py +33 -41
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_batch_engine/_utils.py +1 -4
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_common/_async_token_provider.py +12 -19
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_common/_thread_pool_executor_with_context.py +2 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/prompty/_prompty.py +11 -5
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_safety_evaluation/_safety_evaluation.py +195 -111
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/_user_agent.py +37 -0
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/_vendor/__init__.py +3 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_version.py +1 -1
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/__init__.py +3 -1
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/red_team/_agent/__init__.py +3 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_agent/_agent_functions.py +68 -71
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_agent/_agent_tools.py +103 -145
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_agent/_agent_utils.py +26 -6
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_agent/_semantic_kernel_plugin.py +62 -71
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_attack_objective_generator.py +94 -52
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_attack_strategy.py +2 -1
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_callback_chat_target.py +4 -9
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_default_converter.py +1 -1
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_red_team.py +1947 -1040
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_red_team_result.py +49 -38
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/red_team/_utils/__init__.py +3 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_utils/_rai_service_eval_chat_target.py +39 -34
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_utils/_rai_service_target.py +163 -138
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_utils/_rai_service_true_false_scorer.py +14 -14
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_utils/constants.py +1 -13
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_utils/formatting_utils.py +41 -44
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_utils/logging_utils.py +17 -17
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/red_team/_utils/metric_mapping.py +50 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/red_team/_utils/strategy_utils.py +33 -25
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_adversarial_scenario.py +2 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_adversarial_simulator.py +31 -17
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_conversation/__init__.py +2 -2
- azure_ai_evaluation-1.10.0/azure/ai/evaluation/simulator/_data_sources/__init__.py +3 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_direct_attack_simulator.py +8 -8
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_indirect_attack_simulator.py +18 -6
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_model_tools/_generated_rai_client.py +54 -24
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_model_tools/_identity_manager.py +7 -1
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_model_tools/_proxy_completion_model.py +30 -10
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_model_tools/_rai_client.py +19 -31
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_model_tools/_template_handler.py +20 -6
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_model_tools/models.py +1 -1
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_simulator.py +21 -8
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0/azure_ai_evaluation.egg-info}/PKG-INFO +46 -3
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure_ai_evaluation.egg-info/SOURCES.txt +14 -1
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure_ai_evaluation.egg-info/requires.txt +0 -2
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/pyproject.toml +2 -2
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/agent_evaluators/tool_call_accuracy.ipynb +7 -4
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/agent_evaluators/user_functions.py +9 -2
- azure_ai_evaluation-1.10.0/samples/aoai_score_model_grader_sample.py +257 -0
- azure_ai_evaluation-1.10.0/samples/evaluation_samples_common.py +128 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/evaluation_samples_evaluate.py +62 -72
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/evaluation_samples_evaluate_fdp.py +99 -92
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/evaluation_samples_safety_evaluation.py +118 -85
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/evaluation_samples_threshold.py +35 -58
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/red_team_agent_tool_sample.py +16 -17
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/red_team_samples.py +106 -126
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/red_team_skip_upload.py +15 -9
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/semantic_kernel_red_team_agent_sample.py +13 -17
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/setup.py +1 -7
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/conftest.py +12 -1
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/converters/ai_agent_converter/serialization_helper.py +34 -54
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/converters/ai_agent_converter/test_ai_agent_converter_internals.py +17 -6
- azure_ai_evaluation-1.10.0/tests/converters/ai_agent_converter/test_run_ids_from_conversation.py +67 -0
- azure_ai_evaluation-1.10.0/tests/converters/ai_agent_converter/test_sk_agent_converter_internals.py +128 -0
- azure_ai_evaluation-1.10.0/tests/converters/ai_agent_converter/test_sk_turn_idxs_from_conversation.py +112 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/e2etests/test_adv_simulator.py +6 -6
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/e2etests/test_aoai_graders.py +129 -38
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/e2etests/test_builtin_evaluators.py +208 -125
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/e2etests/test_evaluate.py +67 -7
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/e2etests/test_mass_evaluate.py +9 -9
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/e2etests/test_metrics_upload.py +6 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/e2etests/test_remote_evaluation.py +3 -5
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/e2etests/test_sim_and_eval.py +46 -41
- azure_ai_evaluation-1.10.0/tests/unittests/test_agent_evaluators.py +105 -0
- azure_ai_evaluation-1.10.0/tests/unittests/test_aoai_evaluation_pagination.py +244 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_aoai_integration_features.py +17 -26
- azure_ai_evaluation-1.10.0/tests/unittests/test_aoai_python_grader.py +54 -0
- azure_ai_evaluation-1.10.0/tests/unittests/test_aoai_score_model_grader.py +951 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_batch_run_context.py +2 -2
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_completeness_evaluator.py +29 -16
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_document_retrieval_evaluator.py +106 -57
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_evaluate.py +36 -25
- azure_ai_evaluation-1.10.0/tests/unittests/test_evaluate_mismatch.py +488 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_evaluate_performance.py +2 -3
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_evaluators/test_conversation_thresholds.py +28 -106
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_evaluators/test_service_evaluator_thresholds.py +45 -68
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_evaluators/test_threshold_behavior.py +91 -63
- azure_ai_evaluation-1.10.0/tests/unittests/test_lazy_imports.py +135 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_redteam/__init__.py +3 -2
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_redteam/test_attack_objective_generator.py +34 -49
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_redteam/test_attack_strategy.py +4 -8
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_redteam/test_callback_chat_target.py +22 -27
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_redteam/test_constants.py +7 -23
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_redteam/test_formatting_utils.py +36 -40
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_redteam/test_rai_service_eval_chat_target.py +33 -16
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_redteam/test_rai_service_target.py +108 -52
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_redteam/test_rai_service_true_false_scorer.py +17 -8
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_redteam/test_red_team.py +589 -458
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_redteam/test_red_team_result.py +32 -40
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_redteam/test_strategy_utils.py +41 -58
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_remote_evaluation_features.py +10 -5
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_safety_evaluation.py +57 -18
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_save_eval.py +6 -2
- azure_ai_evaluation-1.10.0/tests/unittests/test_tool_call_accuracy_evaluator.py +398 -0
- azure_ai_evaluation-1.10.0/tests/unittests/test_utils.py +847 -0
- azure_ai_evaluation-1.8.0/azure/ai/evaluation/_common/onedp/aio/_vendor.py +0 -40
- azure_ai_evaluation-1.8.0/azure/ai/evaluation/_common/onedp/operations/_patch.py +0 -21
- azure_ai_evaluation-1.8.0/azure/ai/evaluation/_converters/__init__.py +0 -3
- azure_ai_evaluation-1.8.0/azure/ai/evaluation/_evaluate/_batch_run/_run_submitter_client.py +0 -118
- azure_ai_evaluation-1.8.0/azure/ai/evaluation/_evaluators/_intent_resolution/intent_resolution.prompty +0 -161
- azure_ai_evaluation-1.8.0/azure/ai/evaluation/_evaluators/_relevance/relevance.prompty +0 -100
- azure_ai_evaluation-1.8.0/azure/ai/evaluation/_evaluators/_task_adherence/task_adherence.prompty +0 -117
- azure_ai_evaluation-1.8.0/azure/ai/evaluation/_evaluators/_tool_call_accuracy/tool_call_accuracy.prompty +0 -71
- azure_ai_evaluation-1.8.0/azure/ai/evaluation/_safety_evaluation/__init__.py +0 -3
- azure_ai_evaluation-1.8.0/azure/ai/evaluation/_user_agent.py +0 -6
- azure_ai_evaluation-1.8.0/azure/ai/evaluation/red_team/_agent/__init__.py +0 -3
- azure_ai_evaluation-1.8.0/azure/ai/evaluation/red_team/_utils/__init__.py +0 -3
- azure_ai_evaluation-1.8.0/azure/ai/evaluation/red_team/_utils/metric_mapping.py +0 -23
- azure_ai_evaluation-1.8.0/samples/evaluation_samples_common.py +0 -60
- azure_ai_evaluation-1.8.0/tests/converters/ai_agent_converter/test_run_ids_from_conversation.py +0 -35
- azure_ai_evaluation-1.8.0/tests/unittests/test_agent_evaluators.py +0 -117
- azure_ai_evaluation-1.8.0/tests/unittests/test_tool_call_accuracy_evaluator.py +0 -446
- azure_ai_evaluation-1.8.0/tests/unittests/test_utils.py +0 -258
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/MANIFEST.in +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/NOTICE.txt +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/README.md +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_azure/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_azure/_clients.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_azure/_models.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/_experimental.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/math.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/_model_base.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/_serialization.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/_types.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/_vendor.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/py.typed +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/aio/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/aio/operations/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/aio/operations/_operations.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/aio/operations/_patch.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/buildingblocks/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/buildingblocks/aio/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/buildingblocks/aio/operations/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/buildingblocks/aio/operations/_operations.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/buildingblocks/aio/operations/_patch.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/buildingblocks/operations/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/buildingblocks/operations/_operations.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/buildingblocks/operations/_patch.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/operations/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/operations/_operations.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/onedp/servicepatterns/operations/_patch.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/_client.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/_configuration.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/_model_base.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/_patch.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/_serialization.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/_version.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/aio/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/aio/_client.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/aio/_configuration.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/aio/_patch.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/aio/operations/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/aio/operations/_operations.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/aio/operations/_patch.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/models/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/models/_enums.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/models/_models.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/models/_patch.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/operations/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/operations/_patch.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/raiclient/py.typed +0 -0
- {azure_ai_evaluation-1.8.0/azure/ai/evaluation/_evaluate → azure_ai_evaluation-1.10.0/azure/ai/evaluation/_converters}/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0/azure/ai/evaluation/_evaluators → azure_ai_evaluation-1.10.0/azure/ai/evaluation/_evaluate}/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluate/_batch_run/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluate/_batch_run/batch_clients.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluate/_batch_run/code_client.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluate/_batch_run/proxy_client.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluate/_batch_run/target_run_context.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluate/_eval_run.py +0 -0
- {azure_ai_evaluation-1.8.0/azure/ai/evaluation/_legacy → azure_ai_evaluation-1.10.0/azure/ai/evaluation/_evaluators}/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_bleu/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_code_vulnerability/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_coherence/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_coherence/coherence.prompty +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_common/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_common/_base_multi_eval.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_common/_conversation_aggregators.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_content_safety/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_eci/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_f1_score/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_fluency/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_fluency/fluency.prompty +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_gleu/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_groundedness/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_groundedness/groundedness_with_query.prompty +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_groundedness/groundedness_without_query.prompty +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_intent_resolution/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_meteor/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_protected_material/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_qa/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_relevance/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_response_completeness/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_response_completeness/response_completeness.prompty +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_retrieval/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_retrieval/retrieval.prompty +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_rouge/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_service_groundedness/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_similarity/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_similarity/similarity.prompty +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_task_adherence/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_tool_call_accuracy/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_ungrounded_attributes/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_evaluators/_xpia/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0/azure/ai/evaluation/_legacy/_common → azure_ai_evaluation-1.10.0/azure/ai/evaluation/_legacy}/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_adapters/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_adapters/_check.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_adapters/_configuration.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_adapters/_constants.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_adapters/_errors.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_adapters/_flows.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_adapters/_service.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_adapters/client.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_adapters/entities.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_adapters/tracing.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_adapters/types.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_adapters/utils.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_batch_engine/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_batch_engine/_exceptions.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_batch_engine/_run_storage.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_batch_engine/_status.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_batch_engine/_trace.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_batch_engine/_utils_deprecated.py +0 -0
- {azure_ai_evaluation-1.8.0/azure/ai/evaluation/_vendor → azure_ai_evaluation-1.10.0/azure/ai/evaluation/_legacy/_common}/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/_common/_logging.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/prompty/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/prompty/_connection.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/prompty/_exceptions.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/prompty/_utils.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_legacy/prompty/_yaml_utils.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_model_configurations.py +0 -0
- {azure_ai_evaluation-1.8.0/azure/ai/evaluation/simulator/_data_sources → azure_ai_evaluation-1.10.0/azure/ai/evaluation/_safety_evaluation}/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_safety_evaluation/_generated_rai_client.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_vendor/rouge_score/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_vendor/rouge_score/rouge_scorer.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_vendor/rouge_score/scoring.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_vendor/rouge_score/tokenize.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_vendor/rouge_score/tokenizers.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/py.typed +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_constants.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_conversation/_conversation.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_conversation/constants.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_data_sources/grounding.json +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_helpers/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_helpers/_language_suffix_mapping.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_helpers/_simulator_data_classes.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_model_tools/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_prompty/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_prompty/task_query_response.prompty +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_prompty/task_simulate.prompty +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/simulator/_utils.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure_ai_evaluation.egg-info/dependency_links.txt +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure_ai_evaluation.egg-info/not-zip-safe +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure_ai_evaluation.egg-info/top_level.txt +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/migration_guide.md +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/README.md +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/agent_evaluators/agent_evaluation.ipynb +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/agent_evaluators/instructions.md +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/agent_evaluators/intent_resolution.ipynb +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/agent_evaluators/response_completeness.ipynb +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/agent_evaluators/sample_synthetic_conversations.jsonl +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/agent_evaluators/task_adherence.ipynb +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/data/evaluate_test_data.jsonl +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/samples/evaluation_samples_simulate.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/setup.cfg +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/__openai_patcher.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/e2etests/__init__.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/e2etests/custom_evaluators/answer_length_with_aggregation.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/e2etests/target_fn.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/e2etests/test_lite_management_client.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/e2etests/test_prompty_async.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_built_in_evaluator.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_content_safety_defect_rate.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_content_safety_rai_script.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_eval_run.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_evaluators/slow_eval.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_evaluators/test_inputs_evaluators.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_jailbreak_simulator.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_non_adv_simulator.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_simulator.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_synthetic_callback_conv_bot.py +0 -0
- {azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/tests/unittests/test_synthetic_conversation_bot.py +0 -0
|
@@ -1,5 +1,50 @@
|
|
|
1
1
|
# Release History
|
|
2
2
|
|
|
3
|
+
## 1.10.0 (2025-07-31)
|
|
4
|
+
|
|
5
|
+
### Breaking Changes
|
|
6
|
+
|
|
7
|
+
- Added `evaluate_query` parameter to all RAI service evaluators that can be passed as a keyword argument. This parameter controls whether queries are included in evaluation data when evaluating query-response pairs. Previously, queries were always included in evaluations. When set to `True`, both query and response will be evaluated; when set to `False` (default), only the response will be evaluated. This parameter is available across all RAI service evaluators including `ContentSafetyEvaluator`, `ViolenceEvaluator`, `SexualEvaluator`, `SelfHarmEvaluator`, `HateUnfairnessEvaluator`, `ProtectedMaterialEvaluator`, `IndirectAttackEvaluator`, `CodeVulnerabilityEvaluator`, `UngroundedAttributesEvaluator`, `GroundednessProEvaluator`, and `EciEvaluator`. Existing code that relies on queries being evaluated will need to explicitly set `evaluate_query=True` to maintain the previous behavior.
|
|
8
|
+
|
|
9
|
+
### Features Added
|
|
10
|
+
|
|
11
|
+
- Added support for Azure OpenAI Python grader via `AzureOpenAIPythonGrader` class, which serves as a wrapper around Azure Open AI Python grader configurations. This new grader object can be supplied to the main `evaluate` method as if it were a normal callable evaluator.
|
|
12
|
+
- Added `attack_success_thresholds` parameter to `RedTeam` class for configuring custom thresholds that determine attack success. This allows users to set specific threshold values for each risk category, with scores greater than the threshold considered successful attacks (i.e. higher threshold means higher
|
|
13
|
+
tolerance for harmful responses).
|
|
14
|
+
- Enhanced threshold reporting in RedTeam results to include default threshold values when custom thresholds aren't specified, providing better transparency about the evaluation criteria used.
|
|
15
|
+
|
|
16
|
+
|
|
17
|
+
### Bugs Fixed
|
|
18
|
+
|
|
19
|
+
- Fixed red team scan `output_path` issue where individual evaluation results were overwriting each other instead of being preserved as separate files. Individual evaluations now create unique files while the user's `output_path` is reserved for final aggregated results.
|
|
20
|
+
- Significant improvements to TaskAdherence evaluator. New version has less variance, is much faster and consumes fewer tokens.
|
|
21
|
+
- Significant improvements to Relevance evaluator. New version has more concrete rubrics and has less variance, is much faster and consumes fewer tokens.
|
|
22
|
+
|
|
23
|
+
|
|
24
|
+
### Other Changes
|
|
25
|
+
|
|
26
|
+
- The default engine for evaluation was changed from `promptflow` (PFClient) to an in-SDK batch client (RunSubmitterClient)
|
|
27
|
+
- Note: We've temporarily kept an escape hatch to fall back to the legacy `promptflow` implementation by setting `_use_pf_client=True` when invoking `evaluate()`.
|
|
28
|
+
This is due to be removed in a future release.
|
|
29
|
+
|
|
30
|
+
|
|
31
|
+
## 1.9.0 (2025-07-02)
|
|
32
|
+
|
|
33
|
+
### Features Added
|
|
34
|
+
|
|
35
|
+
- Added support for Azure Open AI evaluation via `AzureOpenAIScoreModelGrader` class, which serves as a wrapper around Azure Open AI score model configurations. This new grader object can be supplied to the main `evaluate` method as if it were a normal callable evaluator.
|
|
36
|
+
- Added new experimental risk categories ProtectedMaterial and CodeVulnerability for redteam agent scan.
|
|
37
|
+
|
|
38
|
+
|
|
39
|
+
### Bugs Fixed
|
|
40
|
+
|
|
41
|
+
- Significant improvements to IntentResolution evaluator. New version has less variance, is nearly 2x faster and consumes fewer tokens.
|
|
42
|
+
|
|
43
|
+
- Fixes and improvements to ToolCallAccuracy evaluator. New version has less variance. and now works on all tool calls that happen in a turn at once. Previously, it worked on each tool call independently without having context on the other tool calls that happen in the same turn, and then aggregated the results to a score in the range [0-1]. The score range is now [1-5].
|
|
44
|
+
- Fixed MeteorScoreEvaluator and other threshold-based evaluators returning incorrect binary results due to integer conversion of decimal scores. Previously, decimal scores like 0.9375 were incorrectly converted to integers (0) before threshold comparison, causing them to fail even when above the threshold. [#41415](https://github.com/Azure/azure-sdk-for-python/issues/41415)
|
|
45
|
+
- Added a new enum `ADVERSARIAL_QA_DOCUMENTS` which moves all the "file_content" type prompts away from `ADVERSARIAL_QA` to the new enum
|
|
46
|
+
- `AzureOpenAIScoreModelGrader` evaluator now supports `pass_threshold` parameter to set the minimum score required for a response to be considered passing. This allows users to define custom thresholds for evaluation results, enhancing flexibility in grading AI model responses.
|
|
47
|
+
|
|
3
48
|
## 1.8.0 (2025-05-29)
|
|
4
49
|
|
|
5
50
|
### Features Added
|
{azure_ai_evaluation-1.8.0/azure_ai_evaluation.egg-info → azure_ai_evaluation-1.10.0}/PKG-INFO
RENAMED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.1
|
|
2
2
|
Name: azure-ai-evaluation
|
|
3
|
-
Version: 1.
|
|
3
|
+
Version: 1.10.0
|
|
4
4
|
Summary: Microsoft Azure Evaluation Library for Python
|
|
5
5
|
Home-page: https://github.com/Azure/azure-sdk-for-python
|
|
6
6
|
Author: Microsoft Corporation
|
|
@@ -21,8 +21,6 @@ Classifier: Operating System :: OS Independent
|
|
|
21
21
|
Requires-Python: >=3.9
|
|
22
22
|
Description-Content-Type: text/markdown
|
|
23
23
|
License-File: NOTICE.txt
|
|
24
|
-
Requires-Dist: promptflow-devkit>=1.17.1
|
|
25
|
-
Requires-Dist: promptflow-core>=1.17.1
|
|
26
24
|
Requires-Dist: pyjwt>=2.8.0
|
|
27
25
|
Requires-Dist: azure-identity>=1.16.0
|
|
28
26
|
Requires-Dist: azure-core>=1.30.2
|
|
@@ -400,6 +398,51 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
|
|
|
400
398
|
|
|
401
399
|
# Release History
|
|
402
400
|
|
|
401
|
+
## 1.10.0 (2025-07-31)
|
|
402
|
+
|
|
403
|
+
### Breaking Changes
|
|
404
|
+
|
|
405
|
+
- Added `evaluate_query` parameter to all RAI service evaluators that can be passed as a keyword argument. This parameter controls whether queries are included in evaluation data when evaluating query-response pairs. Previously, queries were always included in evaluations. When set to `True`, both query and response will be evaluated; when set to `False` (default), only the response will be evaluated. This parameter is available across all RAI service evaluators including `ContentSafetyEvaluator`, `ViolenceEvaluator`, `SexualEvaluator`, `SelfHarmEvaluator`, `HateUnfairnessEvaluator`, `ProtectedMaterialEvaluator`, `IndirectAttackEvaluator`, `CodeVulnerabilityEvaluator`, `UngroundedAttributesEvaluator`, `GroundednessProEvaluator`, and `EciEvaluator`. Existing code that relies on queries being evaluated will need to explicitly set `evaluate_query=True` to maintain the previous behavior.
|
|
406
|
+
|
|
407
|
+
### Features Added
|
|
408
|
+
|
|
409
|
+
- Added support for Azure OpenAI Python grader via `AzureOpenAIPythonGrader` class, which serves as a wrapper around Azure Open AI Python grader configurations. This new grader object can be supplied to the main `evaluate` method as if it were a normal callable evaluator.
|
|
410
|
+
- Added `attack_success_thresholds` parameter to `RedTeam` class for configuring custom thresholds that determine attack success. This allows users to set specific threshold values for each risk category, with scores greater than the threshold considered successful attacks (i.e. higher threshold means higher
|
|
411
|
+
tolerance for harmful responses).
|
|
412
|
+
- Enhanced threshold reporting in RedTeam results to include default threshold values when custom thresholds aren't specified, providing better transparency about the evaluation criteria used.
|
|
413
|
+
|
|
414
|
+
|
|
415
|
+
### Bugs Fixed
|
|
416
|
+
|
|
417
|
+
- Fixed red team scan `output_path` issue where individual evaluation results were overwriting each other instead of being preserved as separate files. Individual evaluations now create unique files while the user's `output_path` is reserved for final aggregated results.
|
|
418
|
+
- Significant improvements to TaskAdherence evaluator. New version has less variance, is much faster and consumes fewer tokens.
|
|
419
|
+
- Significant improvements to Relevance evaluator. New version has more concrete rubrics and has less variance, is much faster and consumes fewer tokens.
|
|
420
|
+
|
|
421
|
+
|
|
422
|
+
### Other Changes
|
|
423
|
+
|
|
424
|
+
- The default engine for evaluation was changed from `promptflow` (PFClient) to an in-SDK batch client (RunSubmitterClient)
|
|
425
|
+
- Note: We've temporarily kept an escape hatch to fall back to the legacy `promptflow` implementation by setting `_use_pf_client=True` when invoking `evaluate()`.
|
|
426
|
+
This is due to be removed in a future release.
|
|
427
|
+
|
|
428
|
+
|
|
429
|
+
## 1.9.0 (2025-07-02)
|
|
430
|
+
|
|
431
|
+
### Features Added
|
|
432
|
+
|
|
433
|
+
- Added support for Azure Open AI evaluation via `AzureOpenAIScoreModelGrader` class, which serves as a wrapper around Azure Open AI score model configurations. This new grader object can be supplied to the main `evaluate` method as if it were a normal callable evaluator.
|
|
434
|
+
- Added new experimental risk categories ProtectedMaterial and CodeVulnerability for redteam agent scan.
|
|
435
|
+
|
|
436
|
+
|
|
437
|
+
### Bugs Fixed
|
|
438
|
+
|
|
439
|
+
- Significant improvements to IntentResolution evaluator. New version has less variance, is nearly 2x faster and consumes fewer tokens.
|
|
440
|
+
|
|
441
|
+
- Fixes and improvements to ToolCallAccuracy evaluator. New version has less variance. and now works on all tool calls that happen in a turn at once. Previously, it worked on each tool call independently without having context on the other tool calls that happen in the same turn, and then aggregated the results to a score in the range [0-1]. The score range is now [1-5].
|
|
442
|
+
- Fixed MeteorScoreEvaluator and other threshold-based evaluators returning incorrect binary results due to integer conversion of decimal scores. Previously, decimal scores like 0.9375 were incorrectly converted to integers (0) before threshold comparison, causing them to fail even when above the threshold. [#41415](https://github.com/Azure/azure-sdk-for-python/issues/41415)
|
|
443
|
+
- Added a new enum `ADVERSARIAL_QA_DOCUMENTS` which moves all the "file_content" type prompts away from `ADVERSARIAL_QA` to the new enum
|
|
444
|
+
- `AzureOpenAIScoreModelGrader` evaluator now supports `pass_threshold` parameter to set the minimum score required for a response to be considered passing. This allows users to define custom thresholds for evaluation results, enhancing flexibility in grading AI model responses.
|
|
445
|
+
|
|
403
446
|
## 1.8.0 (2025-05-29)
|
|
404
447
|
|
|
405
448
|
### Features Added
|
|
@@ -46,9 +46,6 @@ This guide walks you through how to investigate failures, common errors in the `
|
|
|
46
46
|
- Risk and safety evaluators depend on the Azure AI Studio safety evaluation backend service. For a list of supported regions, please refer to the documentation [here](https://aka.ms/azureaisafetyeval-regionsupport).
|
|
47
47
|
- If you encounter a 403 Unauthorized error when using safety evaluators, verify that you have the `Contributor` role assigned to your Azure AI project. `Contributor` role is currently required to run safety evaluations.
|
|
48
48
|
|
|
49
|
-
### Troubleshoot Quality Evaluator Issues
|
|
50
|
-
- For `ToolCallAccuracyEvaluator`, if your input did not have a tool to evaluate, the current behavior is to output `null`.
|
|
51
|
-
|
|
52
49
|
## Handle Simulation Errors
|
|
53
50
|
|
|
54
51
|
### Adversarial Simulation Supported Regions
|
|
@@ -45,6 +45,8 @@ from ._aoai.aoai_grader import AzureOpenAIGrader
|
|
|
45
45
|
from ._aoai.label_grader import AzureOpenAILabelGrader
|
|
46
46
|
from ._aoai.string_check_grader import AzureOpenAIStringCheckGrader
|
|
47
47
|
from ._aoai.text_similarity_grader import AzureOpenAITextSimilarityGrader
|
|
48
|
+
from ._aoai.score_model_grader import AzureOpenAIScoreModelGrader
|
|
49
|
+
from ._aoai.python_grader import AzureOpenAIPythonGrader
|
|
48
50
|
|
|
49
51
|
|
|
50
52
|
_patch_all = []
|
|
@@ -52,13 +54,47 @@ _patch_all = []
|
|
|
52
54
|
# The converter from the AI service to the evaluator schema requires a dependency on
|
|
53
55
|
# ai.projects, but we also don't want to force users installing ai.evaluations to pull
|
|
54
56
|
# in ai.projects. So we only import it if it's available and the user has ai.projects.
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
_patch_all.append("AIAgentConverter")
|
|
58
|
-
except ImportError:
|
|
59
|
-
print("[INFO] Could not import AIAgentConverter. Please install the dependency with `pip install azure-ai-projects`.")
|
|
57
|
+
# We use lazy loading to avoid printing messages during import unless the classes are actually used.
|
|
58
|
+
_lazy_imports = {}
|
|
60
59
|
|
|
61
60
|
|
|
61
|
+
def _create_lazy_import(class_name, module_path, dependency_name):
|
|
62
|
+
"""Create a lazy import function for optional dependencies.
|
|
63
|
+
|
|
64
|
+
Args:
|
|
65
|
+
class_name: Name of the class to import
|
|
66
|
+
module_path: Module path to import from
|
|
67
|
+
dependency_name: Name of the dependency package for error message
|
|
68
|
+
|
|
69
|
+
Returns:
|
|
70
|
+
A function that performs the lazy import when called
|
|
71
|
+
"""
|
|
72
|
+
|
|
73
|
+
def lazy_import():
|
|
74
|
+
try:
|
|
75
|
+
module = __import__(module_path, fromlist=[class_name])
|
|
76
|
+
cls = getattr(module, class_name)
|
|
77
|
+
_patch_all.append(class_name)
|
|
78
|
+
return cls
|
|
79
|
+
except ImportError:
|
|
80
|
+
raise ImportError(
|
|
81
|
+
f"Could not import {class_name}. Please install the dependency with `pip install {dependency_name}`."
|
|
82
|
+
)
|
|
83
|
+
|
|
84
|
+
return lazy_import
|
|
85
|
+
|
|
86
|
+
|
|
87
|
+
_lazy_imports["AIAgentConverter"] = _create_lazy_import(
|
|
88
|
+
"AIAgentConverter",
|
|
89
|
+
"azure.ai.evaluation._converters._ai_services",
|
|
90
|
+
"azure-ai-projects",
|
|
91
|
+
)
|
|
92
|
+
_lazy_imports["SKAgentConverter"] = _create_lazy_import(
|
|
93
|
+
"SKAgentConverter",
|
|
94
|
+
"azure.ai.evaluation._converters._sk_services",
|
|
95
|
+
"semantic-kernel",
|
|
96
|
+
)
|
|
97
|
+
|
|
62
98
|
__all__ = [
|
|
63
99
|
"evaluate",
|
|
64
100
|
"CoherenceEvaluator",
|
|
@@ -99,6 +135,15 @@ __all__ = [
|
|
|
99
135
|
"AzureOpenAILabelGrader",
|
|
100
136
|
"AzureOpenAIStringCheckGrader",
|
|
101
137
|
"AzureOpenAITextSimilarityGrader",
|
|
138
|
+
"AzureOpenAIScoreModelGrader",
|
|
139
|
+
"AzureOpenAIPythonGrader",
|
|
102
140
|
]
|
|
103
141
|
|
|
104
|
-
__all__.extend([p for p in _patch_all if p not in __all__])
|
|
142
|
+
__all__.extend([p for p in _patch_all if p not in __all__])
|
|
143
|
+
|
|
144
|
+
|
|
145
|
+
def __getattr__(name):
|
|
146
|
+
"""Handle lazy imports for optional dependencies."""
|
|
147
|
+
if name in _lazy_imports:
|
|
148
|
+
return _lazy_imports[name]()
|
|
149
|
+
raise AttributeError(f"module '{__name__}' has no attribute '{name}'")
|
{azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_aoai/aoai_grader.py
RENAMED
|
@@ -5,12 +5,13 @@ from azure.ai.evaluation._model_configurations import AzureOpenAIModelConfigurat
|
|
|
5
5
|
|
|
6
6
|
from azure.ai.evaluation._constants import DEFAULT_AOAI_API_VERSION
|
|
7
7
|
from azure.ai.evaluation._exceptions import ErrorBlame, ErrorCategory, ErrorTarget, EvaluationException
|
|
8
|
+
from azure.ai.evaluation._user_agent import UserAgentSingleton
|
|
8
9
|
from typing import Any, Dict, Union
|
|
9
10
|
from azure.ai.evaluation._common._experimental import experimental
|
|
10
11
|
|
|
11
12
|
|
|
12
13
|
@experimental
|
|
13
|
-
class AzureOpenAIGrader
|
|
14
|
+
class AzureOpenAIGrader:
|
|
14
15
|
"""
|
|
15
16
|
Base class for Azure OpenAI grader wrappers, recommended only for use by experienced OpenAI API users.
|
|
16
17
|
Combines a model configuration and any grader configuration
|
|
@@ -35,9 +36,15 @@ class AzureOpenAIGrader():
|
|
|
35
36
|
|
|
36
37
|
"""
|
|
37
38
|
|
|
38
|
-
id = "
|
|
39
|
+
id = "azureai://built-in/evaluators/azure-openai/custom_grader"
|
|
39
40
|
|
|
40
|
-
def __init__(
|
|
41
|
+
def __init__(
|
|
42
|
+
self,
|
|
43
|
+
*,
|
|
44
|
+
model_config: Union[AzureOpenAIModelConfiguration, OpenAIModelConfiguration],
|
|
45
|
+
grader_config: Dict[str, Any],
|
|
46
|
+
**kwargs: Any,
|
|
47
|
+
):
|
|
41
48
|
self._model_config = model_config
|
|
42
49
|
self._grader_config = grader_config
|
|
43
50
|
|
|
@@ -45,8 +52,6 @@ class AzureOpenAIGrader():
|
|
|
45
52
|
self._validate_model_config()
|
|
46
53
|
self._validate_grader_config()
|
|
47
54
|
|
|
48
|
-
|
|
49
|
-
|
|
50
55
|
def _validate_model_config(self) -> None:
|
|
51
56
|
"""Validate the model configuration that this grader wrapper is using."""
|
|
52
57
|
if "api_key" not in self._model_config or not self._model_config.get("api_key"):
|
|
@@ -57,7 +62,7 @@ class AzureOpenAIGrader():
|
|
|
57
62
|
category=ErrorCategory.INVALID_VALUE,
|
|
58
63
|
target=ErrorTarget.AOAI_GRADER,
|
|
59
64
|
)
|
|
60
|
-
|
|
65
|
+
|
|
61
66
|
def _validate_grader_config(self) -> None:
|
|
62
67
|
"""Validate the grader configuration that this grader wrapper is using."""
|
|
63
68
|
|
|
@@ -71,19 +76,24 @@ class AzureOpenAIGrader():
|
|
|
71
76
|
:return: The OpenAI client.
|
|
72
77
|
:rtype: [~openai.OpenAI, ~openai.AzureOpenAI]
|
|
73
78
|
"""
|
|
79
|
+
default_headers = {"User-Agent": UserAgentSingleton().value}
|
|
74
80
|
if "azure_endpoint" in self._model_config:
|
|
75
|
-
|
|
76
|
-
|
|
77
|
-
|
|
81
|
+
from openai import AzureOpenAI
|
|
82
|
+
|
|
83
|
+
# TODO set default values?
|
|
84
|
+
return AzureOpenAI(
|
|
78
85
|
azure_endpoint=self._model_config["azure_endpoint"],
|
|
79
|
-
api_key=self._model_config.get("api_key", None),
|
|
80
|
-
api_version=DEFAULT_AOAI_API_VERSION,
|
|
86
|
+
api_key=self._model_config.get("api_key", None), # Default-style access to appease linters.
|
|
87
|
+
api_version=DEFAULT_AOAI_API_VERSION, # Force a known working version
|
|
81
88
|
azure_deployment=self._model_config.get("azure_deployment", ""),
|
|
89
|
+
default_headers=default_headers,
|
|
82
90
|
)
|
|
83
91
|
from openai import OpenAI
|
|
92
|
+
|
|
84
93
|
# TODO add default values for base_url and organization?
|
|
85
94
|
return OpenAI(
|
|
86
95
|
api_key=self._model_config["api_key"],
|
|
87
96
|
base_url=self._model_config.get("base_url", ""),
|
|
88
97
|
organization=self._model_config.get("organization", ""),
|
|
98
|
+
default_headers=default_headers,
|
|
89
99
|
)
|
{azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_aoai/label_grader.py
RENAMED
|
@@ -9,6 +9,7 @@ from azure.ai.evaluation._common._experimental import experimental
|
|
|
9
9
|
|
|
10
10
|
from .aoai_grader import AzureOpenAIGrader
|
|
11
11
|
|
|
12
|
+
|
|
12
13
|
@experimental
|
|
13
14
|
class AzureOpenAILabelGrader(AzureOpenAIGrader):
|
|
14
15
|
"""
|
|
@@ -42,12 +43,12 @@ class AzureOpenAILabelGrader(AzureOpenAIGrader):
|
|
|
42
43
|
|
|
43
44
|
"""
|
|
44
45
|
|
|
45
|
-
id = "
|
|
46
|
+
id = "azureai://built-in/evaluators/azure-openai/label_grader"
|
|
46
47
|
|
|
47
48
|
def __init__(
|
|
48
49
|
self,
|
|
49
50
|
*,
|
|
50
|
-
model_config
|
|
51
|
+
model_config: Union[AzureOpenAIModelConfiguration, OpenAIModelConfiguration],
|
|
51
52
|
input: List[Dict[str, str]],
|
|
52
53
|
labels: List[str],
|
|
53
54
|
model: str,
|
|
@@ -0,0 +1,84 @@
|
|
|
1
|
+
# ---------------------------------------------------------
|
|
2
|
+
# Copyright (c) Microsoft Corporation. All rights reserved.
|
|
3
|
+
# ---------------------------------------------------------
|
|
4
|
+
from typing import Any, Dict, Union, Optional
|
|
5
|
+
|
|
6
|
+
from azure.ai.evaluation._model_configurations import AzureOpenAIModelConfiguration, OpenAIModelConfiguration
|
|
7
|
+
from openai.types.graders import PythonGrader
|
|
8
|
+
from azure.ai.evaluation._common._experimental import experimental
|
|
9
|
+
|
|
10
|
+
from .aoai_grader import AzureOpenAIGrader
|
|
11
|
+
|
|
12
|
+
|
|
13
|
+
@experimental
|
|
14
|
+
class AzureOpenAIPythonGrader(AzureOpenAIGrader):
|
|
15
|
+
"""
|
|
16
|
+
Wrapper class for OpenAI's Python code graders.
|
|
17
|
+
|
|
18
|
+
Enables custom Python-based evaluation logic with flexible scoring and
|
|
19
|
+
pass/fail thresholds. The grader executes user-provided Python code
|
|
20
|
+
to evaluate outputs against custom criteria.
|
|
21
|
+
|
|
22
|
+
Supplying a PythonGrader to the `evaluate` method will cause an
|
|
23
|
+
asynchronous request to evaluate the grader via the OpenAI API. The
|
|
24
|
+
results of the evaluation will then be merged into the standard
|
|
25
|
+
evaluation results.
|
|
26
|
+
|
|
27
|
+
:param model_config: The model configuration to use for the grader.
|
|
28
|
+
:type model_config: Union[
|
|
29
|
+
~azure.ai.evaluation.AzureOpenAIModelConfiguration,
|
|
30
|
+
~azure.ai.evaluation.OpenAIModelConfiguration
|
|
31
|
+
]
|
|
32
|
+
:param name: The name of the grader.
|
|
33
|
+
:type name: str
|
|
34
|
+
:param image_tag: The image tag for the Python execution environment.
|
|
35
|
+
:type image_tag: str
|
|
36
|
+
:param pass_threshold: Score threshold for pass/fail classification.
|
|
37
|
+
Scores >= threshold are considered passing.
|
|
38
|
+
:type pass_threshold: float
|
|
39
|
+
:param source: Python source code containing the grade function.
|
|
40
|
+
Must define: def grade(sample: dict, item: dict) -> float
|
|
41
|
+
:type source: str
|
|
42
|
+
:param kwargs: Additional keyword arguments to pass to the grader.
|
|
43
|
+
:type kwargs: Any
|
|
44
|
+
|
|
45
|
+
|
|
46
|
+
.. admonition:: Example:
|
|
47
|
+
|
|
48
|
+
.. literalinclude:: ../samples/evaluation_samples_common.py
|
|
49
|
+
:start-after: [START python_grader_example]
|
|
50
|
+
:end-before: [END python_grader_example]
|
|
51
|
+
:language: python
|
|
52
|
+
:dedent: 8
|
|
53
|
+
:caption: Using AzureOpenAIPythonGrader for custom evaluation logic.
|
|
54
|
+
"""
|
|
55
|
+
|
|
56
|
+
id = "azureai://built-in/evaluators/azure-openai/python_grader"
|
|
57
|
+
|
|
58
|
+
def __init__(
|
|
59
|
+
self,
|
|
60
|
+
*,
|
|
61
|
+
model_config: Union[AzureOpenAIModelConfiguration, OpenAIModelConfiguration],
|
|
62
|
+
name: str,
|
|
63
|
+
image_tag: str,
|
|
64
|
+
pass_threshold: float,
|
|
65
|
+
source: str,
|
|
66
|
+
**kwargs: Any,
|
|
67
|
+
):
|
|
68
|
+
# Validate pass_threshold
|
|
69
|
+
if not 0.0 <= pass_threshold <= 1.0:
|
|
70
|
+
raise ValueError("pass_threshold must be between 0.0 and 1.0")
|
|
71
|
+
|
|
72
|
+
# Store pass_threshold as instance attribute for potential future use
|
|
73
|
+
self.pass_threshold = pass_threshold
|
|
74
|
+
|
|
75
|
+
# Create OpenAI PythonGrader instance
|
|
76
|
+
grader = PythonGrader(
|
|
77
|
+
name=name,
|
|
78
|
+
image_tag=image_tag,
|
|
79
|
+
pass_threshold=pass_threshold,
|
|
80
|
+
source=source,
|
|
81
|
+
type="python",
|
|
82
|
+
)
|
|
83
|
+
|
|
84
|
+
super().__init__(model_config=model_config, grader_config=grader, **kwargs)
|
|
@@ -0,0 +1,91 @@
|
|
|
1
|
+
# ---------------------------------------------------------
|
|
2
|
+
# Copyright (c) Microsoft Corporation. All rights reserved.
|
|
3
|
+
# ---------------------------------------------------------
|
|
4
|
+
from typing import Any, Dict, Union, List, Optional
|
|
5
|
+
|
|
6
|
+
from azure.ai.evaluation._model_configurations import AzureOpenAIModelConfiguration, OpenAIModelConfiguration
|
|
7
|
+
from openai.types.graders import ScoreModelGrader
|
|
8
|
+
from azure.ai.evaluation._common._experimental import experimental
|
|
9
|
+
|
|
10
|
+
from .aoai_grader import AzureOpenAIGrader
|
|
11
|
+
|
|
12
|
+
|
|
13
|
+
@experimental
|
|
14
|
+
class AzureOpenAIScoreModelGrader(AzureOpenAIGrader):
|
|
15
|
+
"""
|
|
16
|
+
Wrapper class for OpenAI's score model graders.
|
|
17
|
+
|
|
18
|
+
Enables continuous scoring evaluation with custom prompts and flexible
|
|
19
|
+
conversation-style inputs. Supports configurable score ranges and
|
|
20
|
+
pass thresholds for binary classification.
|
|
21
|
+
|
|
22
|
+
Supplying a ScoreModelGrader to the `evaluate` method will cause an
|
|
23
|
+
asynchronous request to evaluate the grader via the OpenAI API. The
|
|
24
|
+
results of the evaluation will then be merged into the standard
|
|
25
|
+
evaluation results.
|
|
26
|
+
|
|
27
|
+
:param model_config: The model configuration to use for the grader.
|
|
28
|
+
:type model_config: Union[
|
|
29
|
+
~azure.ai.evaluation.AzureOpenAIModelConfiguration,
|
|
30
|
+
~azure.ai.evaluation.OpenAIModelConfiguration
|
|
31
|
+
]
|
|
32
|
+
:param input: The input messages for the grader. List of conversation
|
|
33
|
+
messages with role and content.
|
|
34
|
+
:type input: List[Dict[str, str]]
|
|
35
|
+
:param model: The model to use for the evaluation.
|
|
36
|
+
:type model: str
|
|
37
|
+
:param name: The name of the grader.
|
|
38
|
+
:type name: str
|
|
39
|
+
:param range: The range of the score. Defaults to [0, 1].
|
|
40
|
+
:type range: Optional[List[float]]
|
|
41
|
+
:param pass_threshold: Score threshold for pass/fail classification.
|
|
42
|
+
Defaults to midpoint of range.
|
|
43
|
+
:type pass_threshold: Optional[float]
|
|
44
|
+
:param sampling_params: The sampling parameters for the model.
|
|
45
|
+
:type sampling_params: Optional[Dict[str, Any]]
|
|
46
|
+
:param kwargs: Additional keyword arguments to pass to the grader.
|
|
47
|
+
:type kwargs: Any
|
|
48
|
+
"""
|
|
49
|
+
|
|
50
|
+
id = "azureai://built-in/evaluators/azure-openai/score_model_grader"
|
|
51
|
+
|
|
52
|
+
def __init__(
|
|
53
|
+
self,
|
|
54
|
+
*,
|
|
55
|
+
model_config: Union[AzureOpenAIModelConfiguration, OpenAIModelConfiguration],
|
|
56
|
+
input: List[Dict[str, str]],
|
|
57
|
+
model: str,
|
|
58
|
+
name: str,
|
|
59
|
+
range: Optional[List[float]] = None,
|
|
60
|
+
pass_threshold: Optional[float] = None,
|
|
61
|
+
sampling_params: Optional[Dict[str, Any]] = None,
|
|
62
|
+
**kwargs: Any,
|
|
63
|
+
):
|
|
64
|
+
# Validate range and pass_threshold
|
|
65
|
+
if range is not None:
|
|
66
|
+
if len(range) != 2 or range[0] >= range[1]:
|
|
67
|
+
raise ValueError("range must be a list of two numbers [min, max] where min < max")
|
|
68
|
+
else:
|
|
69
|
+
range = [0.0, 1.0] # Default range
|
|
70
|
+
|
|
71
|
+
if pass_threshold is not None:
|
|
72
|
+
if range and (pass_threshold < range[0] or pass_threshold > range[1]):
|
|
73
|
+
raise ValueError(f"pass_threshold {pass_threshold} must be within range {range}")
|
|
74
|
+
else:
|
|
75
|
+
pass_threshold = (range[0] + range[1]) / 2 # Default to midpoint
|
|
76
|
+
|
|
77
|
+
# Store pass_threshold as instance attribute
|
|
78
|
+
self.pass_threshold = pass_threshold
|
|
79
|
+
|
|
80
|
+
# Create OpenAI ScoreModelGrader instance
|
|
81
|
+
grader_kwargs = {"input": input, "model": model, "name": name, "type": "score_model"}
|
|
82
|
+
|
|
83
|
+
if range is not None:
|
|
84
|
+
grader_kwargs["range"] = range
|
|
85
|
+
if sampling_params is not None:
|
|
86
|
+
grader_kwargs["sampling_params"] = sampling_params
|
|
87
|
+
grader_kwargs["pass_threshold"] = self.pass_threshold
|
|
88
|
+
|
|
89
|
+
grader = ScoreModelGrader(**grader_kwargs)
|
|
90
|
+
|
|
91
|
+
super().__init__(model_config=model_config, grader_config=grader, **kwargs)
|
|
@@ -10,6 +10,7 @@ from azure.ai.evaluation._common._experimental import experimental
|
|
|
10
10
|
|
|
11
11
|
from .aoai_grader import AzureOpenAIGrader
|
|
12
12
|
|
|
13
|
+
|
|
13
14
|
@experimental
|
|
14
15
|
class AzureOpenAIStringCheckGrader(AzureOpenAIGrader):
|
|
15
16
|
"""
|
|
@@ -38,12 +39,12 @@ class AzureOpenAIStringCheckGrader(AzureOpenAIGrader):
|
|
|
38
39
|
|
|
39
40
|
"""
|
|
40
41
|
|
|
41
|
-
id = "
|
|
42
|
+
id = "azureai://built-in/evaluators/azure-openai/string_check_grader"
|
|
42
43
|
|
|
43
44
|
def __init__(
|
|
44
45
|
self,
|
|
45
46
|
*,
|
|
46
|
-
model_config
|
|
47
|
+
model_config: Union[AzureOpenAIModelConfiguration, OpenAIModelConfiguration],
|
|
47
48
|
input: str,
|
|
48
49
|
name: str,
|
|
49
50
|
operation: Literal[
|
|
@@ -10,6 +10,7 @@ from azure.ai.evaluation._common._experimental import experimental
|
|
|
10
10
|
|
|
11
11
|
from .aoai_grader import AzureOpenAIGrader
|
|
12
12
|
|
|
13
|
+
|
|
13
14
|
@experimental
|
|
14
15
|
class AzureOpenAITextSimilarityGrader(AzureOpenAIGrader):
|
|
15
16
|
"""
|
|
@@ -52,12 +53,12 @@ class AzureOpenAITextSimilarityGrader(AzureOpenAIGrader):
|
|
|
52
53
|
|
|
53
54
|
"""
|
|
54
55
|
|
|
55
|
-
id = "
|
|
56
|
+
id = "azureai://built-in/evaluators/azure-openai/text_similarity_grader"
|
|
56
57
|
|
|
57
58
|
def __init__(
|
|
58
59
|
self,
|
|
59
60
|
*,
|
|
60
|
-
model_config
|
|
61
|
+
model_config: Union[AzureOpenAIModelConfiguration, OpenAIModelConfiguration],
|
|
61
62
|
evaluation_metric: Literal[
|
|
62
63
|
"fuzzy_match",
|
|
63
64
|
"bleu",
|
{azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_azure/_envs.py
RENAMED
|
@@ -19,6 +19,7 @@ from azure.core.pipeline.policies import ProxyPolicy, AsyncRetryPolicy
|
|
|
19
19
|
|
|
20
20
|
class AzureEnvironmentMetadata(TypedDict):
|
|
21
21
|
"""Configuration for various Azure environments. All endpoints include a trailing slash."""
|
|
22
|
+
|
|
22
23
|
portal_endpoint: str
|
|
23
24
|
"""The management portal for the Azure environment (e.g. https://portal.azure.com/)"""
|
|
24
25
|
resource_manager_endpoint: str
|
|
@@ -107,15 +108,15 @@ class AzureEnvironmentClient:
|
|
|
107
108
|
|
|
108
109
|
def case_insensitive_match(d: Mapping[str, Any], key: str) -> Optional[Any]:
|
|
109
110
|
key = key.strip().lower()
|
|
110
|
-
return next((v for k,v in d.items() if k.strip().lower() == key), None)
|
|
111
|
+
return next((v for k, v in d.items() if k.strip().lower() == key), None)
|
|
111
112
|
|
|
112
113
|
async with _ASYNC_LOCK:
|
|
113
114
|
cloud = _KNOWN_AZURE_ENVIRONMENTS.get(name) or case_insensitive_match(_KNOWN_AZURE_ENVIRONMENTS, name)
|
|
114
115
|
if cloud:
|
|
115
116
|
return cloud
|
|
116
|
-
default_endpoint = (
|
|
117
|
-
|
|
118
|
-
|
|
117
|
+
default_endpoint = _KNOWN_AZURE_ENVIRONMENTS.get(_DEFAULT_AZURE_ENV_NAME, {}).get(
|
|
118
|
+
"resource_manager_endpoint"
|
|
119
|
+
)
|
|
119
120
|
|
|
120
121
|
metadata_url = self.get_default_metadata_url(default_endpoint)
|
|
121
122
|
clouds = await self.get_clouds_async(metadata_url=metadata_url, update_cached=update_cached)
|
|
@@ -124,10 +125,7 @@ class AzureEnvironmentClient:
|
|
|
124
125
|
return cloud_metadata
|
|
125
126
|
|
|
126
127
|
async def get_clouds_async(
|
|
127
|
-
self,
|
|
128
|
-
*,
|
|
129
|
-
metadata_url: Optional[str] = None,
|
|
130
|
-
update_cached: bool = True
|
|
128
|
+
self, *, metadata_url: Optional[str] = None, update_cached: bool = True
|
|
131
129
|
) -> Mapping[str, AzureEnvironmentMetadata]:
|
|
132
130
|
metadata_url = metadata_url or self.get_default_metadata_url()
|
|
133
131
|
|
|
@@ -149,7 +147,8 @@ class AzureEnvironmentClient:
|
|
|
149
147
|
default_endpoint = default_endpoint or "https://management.azure.com/"
|
|
150
148
|
metadata_url = os.getenv(
|
|
151
149
|
_ENV_ARM_CLOUD_METADATA_URL,
|
|
152
|
-
f"{default_endpoint}metadata/endpoints?api-version={AzureEnvironmentClient.DEFAULT_API_VERSION}"
|
|
150
|
+
f"{default_endpoint}metadata/endpoints?api-version={AzureEnvironmentClient.DEFAULT_API_VERSION}",
|
|
151
|
+
)
|
|
153
152
|
return metadata_url
|
|
154
153
|
|
|
155
154
|
@staticmethod
|
|
@@ -197,7 +196,7 @@ class AzureEnvironmentClient:
|
|
|
197
196
|
|
|
198
197
|
def recursive_update(d: Dict, u: Mapping) -> None:
|
|
199
198
|
"""Recursively update a dictionary.
|
|
200
|
-
|
|
199
|
+
|
|
201
200
|
:param Dict d: The dictionary to update.
|
|
202
201
|
:param Mapping u: The mapping to update from.
|
|
203
202
|
"""
|
|
@@ -73,7 +73,13 @@ class AzureMLTokenManager(APITokenManager):
|
|
|
73
73
|
return super().get_aad_credential()
|
|
74
74
|
|
|
75
75
|
def get_token(
|
|
76
|
-
|
|
76
|
+
self,
|
|
77
|
+
scopes=None,
|
|
78
|
+
claims: Union[str, None] = None,
|
|
79
|
+
tenant_id: Union[str, None] = None,
|
|
80
|
+
enable_cae: bool = False,
|
|
81
|
+
**kwargs: Any
|
|
82
|
+
) -> AccessToken:
|
|
77
83
|
"""Get the API token. If the token is not available or has expired, refresh the token.
|
|
78
84
|
|
|
79
85
|
:return: API token
|
{azure_ai_evaluation-1.8.0 → azure_ai_evaluation-1.10.0}/azure/ai/evaluation/_common/constants.py
RENAMED
|
@@ -5,8 +5,17 @@ from enum import Enum
|
|
|
5
5
|
|
|
6
6
|
from azure.core import CaseInsensitiveEnumMeta
|
|
7
7
|
|
|
8
|
-
PROMPT_BASED_REASON_EVALUATORS = [
|
|
9
|
-
|
|
8
|
+
PROMPT_BASED_REASON_EVALUATORS = [
|
|
9
|
+
"coherence",
|
|
10
|
+
"relevance",
|
|
11
|
+
"retrieval",
|
|
12
|
+
"groundedness",
|
|
13
|
+
"fluency",
|
|
14
|
+
"intent_resolution",
|
|
15
|
+
"tool_call_accurate",
|
|
16
|
+
"response_completeness",
|
|
17
|
+
"task_adherence",
|
|
18
|
+
]
|
|
10
19
|
|
|
11
20
|
|
|
12
21
|
class CommonConstants:
|