azure-ai-evaluation 0.0.0b0__py3-none-any.whl → 1.0.0b1__py3-none-any.whl

This diff compares the contents of two publicly available versions of the package as released to a supported registry. It is provided for informational purposes only and reflects the changes between the versions as they appear in the public registry.

Potentially problematic release: this version of azure-ai-evaluation has been flagged as potentially problematic.

Files changed (100)
  1. azure/ai/evaluation/__init__.py +60 -0
  2. azure/ai/evaluation/_common/__init__.py +16 -0
  3. azure/ai/evaluation/_common/constants.py +65 -0
  4. azure/ai/evaluation/_common/rai_service.py +452 -0
  5. azure/ai/evaluation/_common/utils.py +87 -0
  6. azure/ai/evaluation/_constants.py +50 -0
  7. azure/ai/evaluation/_evaluate/__init__.py +3 -0
  8. azure/ai/evaluation/_evaluate/_batch_run_client/__init__.py +8 -0
  9. azure/ai/evaluation/_evaluate/_batch_run_client/batch_run_context.py +72 -0
  10. azure/ai/evaluation/_evaluate/_batch_run_client/code_client.py +150 -0
  11. azure/ai/evaluation/_evaluate/_batch_run_client/proxy_client.py +61 -0
  12. azure/ai/evaluation/_evaluate/_eval_run.py +494 -0
  13. azure/ai/evaluation/_evaluate/_evaluate.py +689 -0
  14. azure/ai/evaluation/_evaluate/_telemetry/__init__.py +174 -0
  15. azure/ai/evaluation/_evaluate/_utils.py +237 -0
  16. azure/ai/evaluation/_evaluators/__init__.py +3 -0
  17. azure/ai/evaluation/_evaluators/_bleu/__init__.py +9 -0
  18. azure/ai/evaluation/_evaluators/_bleu/_bleu.py +73 -0
  19. azure/ai/evaluation/_evaluators/_chat/__init__.py +9 -0
  20. azure/ai/evaluation/_evaluators/_chat/_chat.py +350 -0
  21. azure/ai/evaluation/_evaluators/_chat/retrieval/__init__.py +9 -0
  22. azure/ai/evaluation/_evaluators/_chat/retrieval/_retrieval.py +163 -0
  23. azure/ai/evaluation/_evaluators/_chat/retrieval/retrieval.prompty +48 -0
  24. azure/ai/evaluation/_evaluators/_coherence/__init__.py +7 -0
  25. azure/ai/evaluation/_evaluators/_coherence/_coherence.py +122 -0
  26. azure/ai/evaluation/_evaluators/_coherence/coherence.prompty +62 -0
  27. azure/ai/evaluation/_evaluators/_content_safety/__init__.py +21 -0
  28. azure/ai/evaluation/_evaluators/_content_safety/_content_safety.py +108 -0
  29. azure/ai/evaluation/_evaluators/_content_safety/_content_safety_base.py +66 -0
  30. azure/ai/evaluation/_evaluators/_content_safety/_content_safety_chat.py +296 -0
  31. azure/ai/evaluation/_evaluators/_content_safety/_hate_unfairness.py +78 -0
  32. azure/ai/evaluation/_evaluators/_content_safety/_self_harm.py +76 -0
  33. azure/ai/evaluation/_evaluators/_content_safety/_sexual.py +76 -0
  34. azure/ai/evaluation/_evaluators/_content_safety/_violence.py +76 -0
  35. azure/ai/evaluation/_evaluators/_eci/__init__.py +0 -0
  36. azure/ai/evaluation/_evaluators/_eci/_eci.py +99 -0
  37. azure/ai/evaluation/_evaluators/_f1_score/__init__.py +9 -0
  38. azure/ai/evaluation/_evaluators/_f1_score/_f1_score.py +141 -0
  39. azure/ai/evaluation/_evaluators/_fluency/__init__.py +9 -0
  40. azure/ai/evaluation/_evaluators/_fluency/_fluency.py +122 -0
  41. azure/ai/evaluation/_evaluators/_fluency/fluency.prompty +61 -0
  42. azure/ai/evaluation/_evaluators/_gleu/__init__.py +9 -0
  43. azure/ai/evaluation/_evaluators/_gleu/_gleu.py +71 -0
  44. azure/ai/evaluation/_evaluators/_groundedness/__init__.py +9 -0
  45. azure/ai/evaluation/_evaluators/_groundedness/_groundedness.py +123 -0
  46. azure/ai/evaluation/_evaluators/_groundedness/groundedness.prompty +54 -0
  47. azure/ai/evaluation/_evaluators/_meteor/__init__.py +9 -0
  48. azure/ai/evaluation/_evaluators/_meteor/_meteor.py +96 -0
  49. azure/ai/evaluation/_evaluators/_protected_material/__init__.py +5 -0
  50. azure/ai/evaluation/_evaluators/_protected_material/_protected_material.py +104 -0
  51. azure/ai/evaluation/_evaluators/_protected_materials/__init__.py +5 -0
  52. azure/ai/evaluation/_evaluators/_protected_materials/_protected_materials.py +104 -0
  53. azure/ai/evaluation/_evaluators/_qa/__init__.py +9 -0
  54. azure/ai/evaluation/_evaluators/_qa/_qa.py +111 -0
  55. azure/ai/evaluation/_evaluators/_relevance/__init__.py +9 -0
  56. azure/ai/evaluation/_evaluators/_relevance/_relevance.py +131 -0
  57. azure/ai/evaluation/_evaluators/_relevance/relevance.prompty +69 -0
  58. azure/ai/evaluation/_evaluators/_rouge/__init__.py +10 -0
  59. azure/ai/evaluation/_evaluators/_rouge/_rouge.py +98 -0
  60. azure/ai/evaluation/_evaluators/_similarity/__init__.py +9 -0
  61. azure/ai/evaluation/_evaluators/_similarity/_similarity.py +130 -0
  62. azure/ai/evaluation/_evaluators/_similarity/similarity.prompty +71 -0
  63. azure/ai/evaluation/_evaluators/_xpia/__init__.py +5 -0
  64. azure/ai/evaluation/_evaluators/_xpia/xpia.py +140 -0
  65. azure/ai/evaluation/_exceptions.py +107 -0
  66. azure/ai/evaluation/_http_utils.py +395 -0
  67. azure/ai/evaluation/_model_configurations.py +27 -0
  68. azure/ai/evaluation/_user_agent.py +6 -0
  69. azure/ai/evaluation/_version.py +5 -0
  70. azure/ai/evaluation/py.typed +0 -0
  71. azure/ai/evaluation/simulator/__init__.py +15 -0
  72. azure/ai/evaluation/simulator/_adversarial_scenario.py +27 -0
  73. azure/ai/evaluation/simulator/_adversarial_simulator.py +450 -0
  74. azure/ai/evaluation/simulator/_constants.py +17 -0
  75. azure/ai/evaluation/simulator/_conversation/__init__.py +315 -0
  76. azure/ai/evaluation/simulator/_conversation/_conversation.py +178 -0
  77. azure/ai/evaluation/simulator/_conversation/constants.py +30 -0
  78. azure/ai/evaluation/simulator/_direct_attack_simulator.py +252 -0
  79. azure/ai/evaluation/simulator/_helpers/__init__.py +4 -0
  80. azure/ai/evaluation/simulator/_helpers/_language_suffix_mapping.py +17 -0
  81. azure/ai/evaluation/simulator/_helpers/_simulator_data_classes.py +93 -0
  82. azure/ai/evaluation/simulator/_indirect_attack_simulator.py +207 -0
  83. azure/ai/evaluation/simulator/_model_tools/__init__.py +23 -0
  84. azure/ai/evaluation/simulator/_model_tools/_identity_manager.py +147 -0
  85. azure/ai/evaluation/simulator/_model_tools/_proxy_completion_model.py +228 -0
  86. azure/ai/evaluation/simulator/_model_tools/_rai_client.py +157 -0
  87. azure/ai/evaluation/simulator/_model_tools/_template_handler.py +157 -0
  88. azure/ai/evaluation/simulator/_model_tools/models.py +616 -0
  89. azure/ai/evaluation/simulator/_prompty/task_query_response.prompty +69 -0
  90. azure/ai/evaluation/simulator/_prompty/task_simulate.prompty +36 -0
  91. azure/ai/evaluation/simulator/_tracing.py +92 -0
  92. azure/ai/evaluation/simulator/_utils.py +111 -0
  93. azure/ai/evaluation/simulator/simulator.py +579 -0
  94. azure_ai_evaluation-1.0.0b1.dist-info/METADATA +377 -0
  95. azure_ai_evaluation-1.0.0b1.dist-info/RECORD +97 -0
  96. {azure_ai_evaluation-0.0.0b0.dist-info → azure_ai_evaluation-1.0.0b1.dist-info}/WHEEL +1 -1
  97. azure_ai_evaluation-1.0.0b1.dist-info/top_level.txt +1 -0
  98. azure_ai_evaluation-0.0.0b0.dist-info/METADATA +0 -7
  99. azure_ai_evaluation-0.0.0b0.dist-info/RECORD +0 -4
  100. azure_ai_evaluation-0.0.0b0.dist-info/top_level.txt +0 -1
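
The hunks excerpted below appear to correspond to items 41-53 of this list (fluency.prompty through the _qa package __init__). As a quick orientation, here is a minimal sketch of how one of the new string-metric evaluators is invoked, taken from the GleuScoreEvaluator docstring shown further down; the top-level import path is an assumption based on the +60-line azure/ai/evaluation/__init__.py listed above.

.. code-block:: python

    # Sketch only: assumes azure/ai/evaluation/__init__.py re-exports GleuScoreEvaluator.
    from azure.ai.evaluation import GleuScoreEvaluator

    eval_fn = GleuScoreEvaluator()
    result = eval_fn(
        response="Tokyo is the capital of Japan.",
        ground_truth="The capital of Japan is Tokyo.",
    )
    print(result)  # e.g. {"gleu_score": 0.41}, per the docstring in _gleu.py below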
azure/ai/evaluation/_evaluators/_fluency/fluency.prompty
@@ -0,0 +1,61 @@
+ ---
+ name: Fluency
+ description: Evaluates fluency score for QA scenario
+ model:
+   api: chat
+   configuration:
+     type: azure_openai
+     azure_deployment: ${env:AZURE_DEPLOYMENT}
+     api_key: ${env:AZURE_OPENAI_API_KEY}
+     azure_endpoint: ${env:AZURE_OPENAI_ENDPOINT}
+   parameters:
+     temperature: 0.0
+     max_tokens: 1
+     top_p: 1.0
+     presence_penalty: 0
+     frequency_penalty: 0
+     response_format:
+       type: text
+
+ inputs:
+   query:
+     type: string
+   response:
+     type: string
+
+ ---
+ system:
+ You are an AI assistant. You will be given the definition of an evaluation metric for assessing the quality of an answer in a question-answering task. Your job is to compute an accurate evaluation score using the provided evaluation metric. You should return a single integer value between 1 to 5 representing the evaluation metric. You will include no other text or information.
+ user:
+ Fluency measures the quality of individual sentences in the answer, and whether they are well-written and grammatically correct. Consider the quality of individual sentences when evaluating fluency. Given the question and answer, score the fluency of the answer between one to five stars using the following rating scale:
+ One star: the answer completely lacks fluency
+ Two stars: the answer mostly lacks fluency
+ Three stars: the answer is partially fluent
+ Four stars: the answer is mostly fluent
+ Five stars: the answer has perfect fluency
+
+ This rating value should always be an integer between 1 and 5. So the rating produced should be 1 or 2 or 3 or 4 or 5.
+
+ question: What did you have for breakfast today?
+ answer: Breakfast today, me eating cereal and orange juice very good.
+ stars: 1
+
+ question: How do you feel when you travel alone?
+ answer: Alone travel, nervous, but excited also. I feel adventure and like its time.
+ stars: 2
+
+ question: When was the last time you went on a family vacation?
+ answer: Last family vacation, it took place in last summer. We traveled to a beach destination, very fun.
+ stars: 3
+
+ question: What is your favorite thing about your job?
+ answer: My favorite aspect of my job is the chance to interact with diverse people. I am constantly learning from their experiences and stories.
+ stars: 4
+
+ question: Can you describe your morning routine?
+ answer: Every morning, I wake up at 6 am, drink a glass of water, and do some light stretching. After that, I take a shower and get dressed for work. Then, I have a healthy breakfast, usually consisting of oatmeal and fruits, before leaving the house around 7:30 am.
+ stars: 5
+
+ question: {{query}}
+ answer: {{response}}
+ stars:
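
The prompty above defines only the grading prompt (inputs: query and response; output: a 1-5 integer). The FluencyEvaluator that wraps it lives in _fluency/_fluency.py (+122 lines, listed above but not shown in this excerpt). Assuming it follows the same pattern as the GroundednessEvaluator further down, a model_config constructor plus keyword-only call arguments matching the prompty inputs, a minimal, hypothetical sketch:

.. code-block:: python

    # Sketch only: FluencyEvaluator's exact signature is not shown in this diff excerpt.
    from azure.ai.evaluation import FluencyEvaluator  # assumed top-level re-export

    # Hypothetical Azure OpenAI settings mirroring the prompty's azure_openai configuration.
    model_config = {
        "azure_endpoint": "https://<resource>.openai.azure.com",
        "api_key": "<api-key>",
        "azure_deployment": "<deployment>",
    }

    eval_fn = FluencyEvaluator(model_config)
    result = eval_fn(
        query="What is your favorite thing about your job?",
        response="I enjoy learning from the diverse people I meet every day.",
    )
    # Expected shape, by analogy with the groundedness evaluator: {"gpt_fluency": <1-5>}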
azure/ai/evaluation/_evaluators/_gleu/__init__.py
@@ -0,0 +1,9 @@
+ # ---------------------------------------------------------
+ # Copyright (c) Microsoft Corporation. All rights reserved.
+ # ---------------------------------------------------------
+
+ from ._gleu import GleuScoreEvaluator
+
+ __all__ = [
+     "GleuScoreEvaluator",
+ ]
azure/ai/evaluation/_evaluators/_gleu/_gleu.py
@@ -0,0 +1,71 @@
+ # ---------------------------------------------------------
+ # Copyright (c) Microsoft Corporation. All rights reserved.
+ # ---------------------------------------------------------
+ from nltk.translate.gleu_score import sentence_gleu
+
+ from promptflow._utils.async_utils import async_run_allowing_running_loop
+ from azure.ai.evaluation._common.utils import nltk_tokenize
+
+
+ class _AsyncGleuScoreEvaluator:
+     def __init__(self):
+         pass
+
+     async def __call__(self, *, ground_truth: str, response: str, **kwargs):
+         reference_tokens = nltk_tokenize(ground_truth)
+         hypothesis_tokens = nltk_tokenize(response)
+
+         score = sentence_gleu([reference_tokens], hypothesis_tokens)
+
+         return {
+             "gleu_score": score,
+         }
+
+
+ class GleuScoreEvaluator:
+     """
+     Evaluator that computes the GLEU score between two strings.
+
+     The GLEU (Google-BLEU) score evaluator measures the similarity between generated and reference texts by
+     evaluating n-gram overlap, considering both precision and recall. This balanced evaluation, designed for
+     sentence-level assessment, makes it ideal for detailed analysis of translation quality. GLEU is well-suited for
+     use cases such as machine translation, text summarization, and text generation.
+
+     **Usage**
+
+     .. code-block:: python
+
+         eval_fn = GleuScoreEvaluator()
+         result = eval_fn(
+             response="Tokyo is the capital of Japan.",
+             ground_truth="The capital of Japan is Tokyo.")
+
+     **Output format**
+
+     .. code-block:: python
+
+         {
+             "gleu_score": 0.41
+         }
+     """
+
+     def __init__(self):
+         self._async_evaluator = _AsyncGleuScoreEvaluator()
+
+     def __call__(self, *, ground_truth: str, response: str, **kwargs):
+         """
+         Evaluate the GLEU score between the response and the ground truth.
+
+         :keyword response: The response to be evaluated.
+         :paramtype response: str
+         :keyword ground_truth: The ground truth to be compared against.
+         :paramtype ground_truth: str
+         :return: The GLEU score.
+         :rtype: dict
+         """
+         return async_run_allowing_running_loop(
+             self._async_evaluator, ground_truth=ground_truth, response=response, **kwargs
+         )
+
+     def _to_async(self):
+         return self._async_evaluator
azure/ai/evaluation/_evaluators/_groundedness/__init__.py
@@ -0,0 +1,9 @@
+ # ---------------------------------------------------------
+ # Copyright (c) Microsoft Corporation. All rights reserved.
+ # ---------------------------------------------------------
+
+ from ._groundedness import GroundednessEvaluator
+
+ __all__ = [
+     "GroundednessEvaluator",
+ ]
azure/ai/evaluation/_evaluators/_groundedness/_groundedness.py
@@ -0,0 +1,123 @@
+ # ---------------------------------------------------------
+ # Copyright (c) Microsoft Corporation. All rights reserved.
+ # ---------------------------------------------------------
+
+ import os
+ import re
+ from typing import Union
+
+ import numpy as np
+
+ from promptflow._utils.async_utils import async_run_allowing_running_loop
+ from azure.ai.evaluation._exceptions import EvaluationException, ErrorBlame, ErrorCategory, ErrorTarget
+ from promptflow.core import AsyncPrompty
+
+ from ..._model_configurations import AzureOpenAIModelConfiguration, OpenAIModelConfiguration
+ from ..._common.utils import (
+     check_and_add_api_version_for_aoai_model_config,
+     check_and_add_user_agent_for_aoai_model_config,
+ )
+
+ try:
+     from ..._user_agent import USER_AGENT
+ except ImportError:
+     USER_AGENT = None
+
+
+ class _AsyncGroundednessEvaluator:
+     # Constants must be defined within eval's directory to be saveable/loadable
+     PROMPTY_FILE = "groundedness.prompty"
+     LLM_CALL_TIMEOUT = 600
+     DEFAULT_OPEN_API_VERSION = "2024-02-15-preview"
+
+     def __init__(self, model_config: dict):
+         check_and_add_api_version_for_aoai_model_config(model_config, self.DEFAULT_OPEN_API_VERSION)
+
+         prompty_model_config = {"configuration": model_config, "parameters": {"extra_headers": {}}}
+
+         # Handle "RuntimeError: Event loop is closed" from httpx AsyncClient
+         # https://github.com/encode/httpx/discussions/2959
+         prompty_model_config["parameters"]["extra_headers"].update({"Connection": "close"})
+
+         check_and_add_user_agent_for_aoai_model_config(
+             model_config,
+             prompty_model_config,
+             USER_AGENT,
+         )
+
+         current_dir = os.path.dirname(__file__)
+         prompty_path = os.path.join(current_dir, "groundedness.prompty")
+         self._flow = AsyncPrompty.load(source=prompty_path, model=prompty_model_config)
+
+     async def __call__(self, *, response: str, context: str, **kwargs):
+         # Validate input parameters
+         response = str(response or "")
+         context = str(context or "")
+
+         if not response.strip() or not context.strip():
+             msg = "Both 'response' and 'context' must be non-empty strings."
+             raise EvaluationException(
+                 message=msg,
+                 internal_message=msg,
+                 error_category=ErrorCategory.MISSING_FIELD,
+                 error_blame=ErrorBlame.USER_ERROR,
+                 error_target=ErrorTarget.F1_EVALUATOR,
+             )
+
+         # Run the evaluation flow
+         llm_output = await self._flow(response=response, context=context, timeout=self.LLM_CALL_TIMEOUT, **kwargs)
+
+         score = np.nan
+         if llm_output:
+             match = re.search(r"\d", llm_output)
+             if match:
+                 score = float(match.group())
+
+         return {"gpt_groundedness": float(score)}
+
+
+ class GroundednessEvaluator:
+     """
+     Initialize a groundedness evaluator configured for a specific Azure OpenAI model.
+
+     :param model_config: Configuration for the Azure OpenAI model.
+     :type model_config: Union[~azure.ai.evaluation.AzureOpenAIModelConfiguration,
+         ~azure.ai.evaluation.OpenAIModelConfiguration]
+
+     **Usage**
+
+     .. code-block:: python
+
+         eval_fn = GroundednessEvaluator(model_config)
+         result = eval_fn(
+             response="The capital of Japan is Tokyo.",
+             context="Tokyo is Japan's capital, known for its blend of traditional culture \
+                 and technological advancements.")
+
+     **Output format**
+
+     .. code-block:: python
+
+         {
+             "gpt_groundedness": 5
+         }
+     """
+
+     def __init__(self, model_config: dict):
+         self._async_evaluator = _AsyncGroundednessEvaluator(model_config)
+
+     def __call__(self, *, response: str, context: str, **kwargs):
+         """
+         Evaluate groundedness of the response in the context.
+
+         :keyword response: The response to be evaluated.
+         :paramtype response: str
+         :keyword context: The context in which the response is evaluated.
+         :paramtype context: str
+         :return: The groundedness score.
+         :rtype: dict
+         """
+         return async_run_allowing_running_loop(self._async_evaluator, response=response, context=context, **kwargs)
+
+     def _to_async(self):
+         return self._async_evaluator
azure/ai/evaluation/_evaluators/_groundedness/groundedness.prompty
@@ -0,0 +1,54 @@
+ ---
+ name: Groundedness
+ description: Evaluates groundedness score for QA scenario
+ model:
+   api: chat
+   configuration:
+     type: azure_openai
+     azure_deployment: ${env:AZURE_DEPLOYMENT}
+     api_key: ${env:AZURE_OPENAI_API_KEY}
+     azure_endpoint: ${env:AZURE_OPENAI_ENDPOINT}
+   parameters:
+     temperature: 0.0
+     max_tokens: 1
+     top_p: 1.0
+     presence_penalty: 0
+     frequency_penalty: 0
+     response_format:
+       type: text
+
+ inputs:
+   response:
+     type: string
+   context:
+     type: string
+
+ ---
+ system:
+ You are an AI assistant. You will be given the definition of an evaluation metric for assessing the quality of an answer in a question-answering task. Your job is to compute an accurate evaluation score using the provided evaluation metric. You should return a single integer value between 1 to 5 representing the evaluation metric. You will include no other text or information.
+ user:
+ You will be presented with a CONTEXT and an ANSWER about that CONTEXT. You need to decide whether the ANSWER is entailed by the CONTEXT by choosing one of the following rating:
+ 1. 5: The ANSWER follows logically from the information contained in the CONTEXT.
+ 2. 1: The ANSWER is logically false from the information contained in the CONTEXT.
+ 3. an integer score between 1 and 5 and if such integer score does not exist, use 1: It is not possible to determine whether the ANSWER is true or false without further information. Read the passage of information thoroughly and select the correct answer from the three answer labels. Read the CONTEXT thoroughly to ensure you know what the CONTEXT entails. Note the ANSWER is generated by a computer system, it can contain certain symbols, which should not be a negative factor in the evaluation.
+ Independent Examples:
+ ## Example Task #1 Input:
+ {"CONTEXT": "Some are reported as not having been wanted at all.", "QUESTION": "", "ANSWER": "All are reported as being completely and fully wanted."}
+ ## Example Task #1 Output:
+ 1
+ ## Example Task #2 Input:
+ {"CONTEXT": "Ten new television shows appeared during the month of September. Five of the shows were sitcoms, three were hourlong dramas, and two were news-magazine shows. By January, only seven of these new shows were still on the air. Five of the shows that remained were sitcoms.", "QUESTION": "", "ANSWER": "At least one of the shows that were cancelled was an hourlong drama."}
+ ## Example Task #2 Output:
+ 5
+ ## Example Task #3 Input:
+ {"CONTEXT": "In Quebec, an allophone is a resident, usually an immigrant, whose mother tongue or home language is neither French nor English.", "QUESTION": "", "ANSWER": "In Quebec, an allophone is a resident, usually an immigrant, whose mother tongue or home language is not French."}
+ ## Example Task #3 Output:
+ 5
+ ## Example Task #4 Input:
+ {"CONTEXT": "Some are reported as not having been wanted at all.", "QUESTION": "", "ANSWER": "All are reported as being completely and fully wanted."}
+ ## Example Task #4 Output:
+ 1
+ ## Actual Task Input:
+ {"CONTEXT": {{context}}, "QUESTION": "", "ANSWER": {{response}}}
+ Reminder: The return values for each task should be correctly formatted as an integer between 1 and 5. Do not repeat the context and question.
+ Actual Task Output:
azure/ai/evaluation/_evaluators/_meteor/__init__.py
@@ -0,0 +1,9 @@
+ # ---------------------------------------------------------
+ # Copyright (c) Microsoft Corporation. All rights reserved.
+ # ---------------------------------------------------------
+
+ from ._meteor import MeteorScoreEvaluator
+
+ __all__ = [
+     "MeteorScoreEvaluator",
+ ]
azure/ai/evaluation/_evaluators/_meteor/_meteor.py
@@ -0,0 +1,96 @@
+ # ---------------------------------------------------------
+ # Copyright (c) Microsoft Corporation. All rights reserved.
+ # ---------------------------------------------------------
+ import nltk
+ from nltk.translate.meteor_score import meteor_score
+ from promptflow._utils.async_utils import async_run_allowing_running_loop
+ from azure.ai.evaluation._common.utils import nltk_tokenize
+
+
+ class _AsyncMeteorScoreEvaluator:
+     def __init__(self, alpha: float = 0.9, beta: float = 3.0, gamma: float = 0.5):
+         self._alpha = alpha
+         self._beta = beta
+         self._gamma = gamma
+
+         try:
+             nltk.find("corpora/wordnet.zip")
+         except LookupError:
+             nltk.download("wordnet")
+
+     async def __call__(self, *, ground_truth: str, response: str, **kwargs):
+         reference_tokens = nltk_tokenize(ground_truth)
+         hypothesis_tokens = nltk_tokenize(response)
+
+         score = meteor_score(
+             [reference_tokens],
+             hypothesis_tokens,
+             alpha=self._alpha,
+             beta=self._beta,
+             gamma=self._gamma,
+         )
+
+         return {
+             "meteor_score": score,
+         }
+
+
+ class MeteorScoreEvaluator:
+     """
+     Evaluator that computes the METEOR Score between two strings.
+
+     The METEOR (Metric for Evaluation of Translation with Explicit Ordering) score grader evaluates generated text by
+     comparing it to reference texts, focusing on precision, recall, and content alignment. It addresses limitations of
+     other metrics like BLEU by considering synonyms, stemming, and paraphrasing. METEOR score considers synonyms and
+     word stems to more accurately capture meaning and language variations. In addition to machine translation and
+     text summarization, paraphrase detection is an optimal use case for the METEOR score.
+
+     :param alpha: The METEOR score alpha parameter. Default is 0.9.
+     :type alpha: float
+     :param beta: The METEOR score beta parameter. Default is 3.0.
+     :type beta: float
+     :param gamma: The METEOR score gamma parameter. Default is 0.5.
+     :type gamma: float
+
+     **Usage**
+
+     .. code-block:: python
+
+         eval_fn = MeteorScoreEvaluator(
+             alpha=0.9,
+             beta=3.0,
+             gamma=0.5
+         )
+         result = eval_fn(
+             response="Tokyo is the capital of Japan.",
+             ground_truth="The capital of Japan is Tokyo.")
+
+     **Output format**
+
+     .. code-block:: python
+
+         {
+             "meteor_score": 0.62
+         }
+     """
+
+     def __init__(self, alpha: float = 0.9, beta: float = 3.0, gamma: float = 0.5):
+         self._async_evaluator = _AsyncMeteorScoreEvaluator(alpha=alpha, beta=beta, gamma=gamma)
+
+     def __call__(self, *, ground_truth: str, response: str, **kwargs):
+         """
+         Evaluate the METEOR score between the response and the ground truth.
+
+         :keyword response: The response to be evaluated.
+         :paramtype response: str
+         :keyword ground_truth: The ground truth to be compared against.
+         :paramtype ground_truth: str
+         :return: The METEOR score.
+         :rtype: dict
+         """
+         return async_run_allowing_running_loop(
+             self._async_evaluator, ground_truth=ground_truth, response=response, **kwargs
+         )
+
+     def _to_async(self):
+         return self._async_evaluator
azure/ai/evaluation/_evaluators/_protected_material/__init__.py
@@ -0,0 +1,5 @@
+ from ._protected_material import ProtectedMaterialEvaluator
+
+ __all__ = [
+     "ProtectedMaterialEvaluator",
+ ]
azure/ai/evaluation/_evaluators/_protected_material/_protected_material.py
@@ -0,0 +1,104 @@
+ # ---------------------------------------------------------
+ # Copyright (c) Microsoft Corporation. All rights reserved.
+ # ---------------------------------------------------------
+ from promptflow._utils.async_utils import async_run_allowing_running_loop
+ from azure.ai.evaluation._common.constants import EvaluationMetrics
+ from azure.ai.evaluation._common.rai_service import evaluate_with_rai_service
+ from azure.ai.evaluation._exceptions import EvaluationException, ErrorBlame, ErrorCategory, ErrorTarget
+ from azure.ai.evaluation._model_configurations import AzureAIProject
+
+
+ class _AsyncProtectedMaterialEvaluator:
+     def __init__(self, azure_ai_project: dict, credential=None):
+         self._azure_ai_project = azure_ai_project
+         self._credential = credential
+
+     async def __call__(self, *, query: str, response: str, **kwargs):
+         """
+         Evaluates content according to this evaluator's metric.
+
+         :keyword query: The query to be evaluated.
+         :paramtype query: str
+         :keyword response: The response to be evaluated.
+         :paramtype response: str
+         :return: The evaluation score computation based on the Content Safety metric (self.metric).
+         :rtype: Any
+         """
+         # Validate inputs
+         # Raises value error if failed, so execution alone signifies success.
+         if not (query and query.strip() and query != "None") or not (
+             response and response.strip() and response != "None"
+         ):
+             msg = "Both 'query' and 'response' must be non-empty strings."
+             raise EvaluationException(
+                 message=msg,
+                 internal_message=msg,
+                 error_category=ErrorCategory.MISSING_FIELD,
+                 error_blame=ErrorBlame.USER_ERROR,
+                 error_target=ErrorTarget.PROTECTED_MATERIAL_EVALUATOR,
+             )
+
+         # Run score computation based on supplied metric.
+         result = await evaluate_with_rai_service(
+             metric_name=EvaluationMetrics.PROTECTED_MATERIAL,
+             query=query,
+             response=response,
+             project_scope=self._azure_ai_project,
+             credential=self._credential,
+         )
+         return result
+
+
+ class ProtectedMaterialEvaluator:
+     """
+     Initialize a protected material evaluator to detect whether protected material
+     is present in your AI system's response. Outputs True or False with AI-generated reasoning.
+
+     :param azure_ai_project: The scope of the Azure AI project.
+         It contains subscription id, resource group, and project name.
+     :type azure_ai_project: ~azure.ai.evaluation.AzureAIProject
+     :param credential: The credential for connecting to Azure AI project.
+     :type credential: ~azure.core.credentials.TokenCredential
+     :return: Whether or not protected material was found in the response, with AI-generated reasoning.
+     :rtype: Dict[str, str]
+
+     **Usage**
+
+     .. code-block:: python
+
+         azure_ai_project = {
+             "subscription_id": "<subscription_id>",
+             "resource_group_name": "<resource_group_name>",
+             "project_name": "<project_name>",
+         }
+         eval_fn = ProtectedMaterialEvaluator(azure_ai_project)
+         result = eval_fn(query="What is the capital of France?", response="Paris.")
+
+     **Output format**
+
+     .. code-block:: python
+
+         {
+             "protected_material_label": "False",
+             "protected_material_reason": "This query does not contain any protected material."
+         }
+     """
+
+     def __init__(self, azure_ai_project: dict, credential=None):
+         self._async_evaluator = _AsyncProtectedMaterialEvaluator(azure_ai_project, credential)
+
+     def __call__(self, *, query: str, response: str, **kwargs):
+         """
+         Evaluates protected material content.
+
+         :keyword query: The query to be evaluated.
+         :paramtype query: str
+         :keyword response: The response to be evaluated.
+         :paramtype response: str
+         :return: A dictionary containing a boolean label and reasoning.
+         :rtype: dict
+         """
+         return async_run_allowing_running_loop(self._async_evaluator, query=query, response=response, **kwargs)
+
+     def _to_async(self):
+         return self._async_evaluator
azure/ai/evaluation/_evaluators/_protected_materials/__init__.py
@@ -0,0 +1,5 @@
+ from ._protected_materials import ProtectedMaterialsEvaluator
+
+ __all__ = [
+     "ProtectedMaterialsEvaluator",
+ ]
azure/ai/evaluation/_evaluators/_protected_materials/_protected_materials.py
@@ -0,0 +1,104 @@
+ # ---------------------------------------------------------
+ # Copyright (c) Microsoft Corporation. All rights reserved.
+ # ---------------------------------------------------------
+ from promptflow._utils.async_utils import async_run_allowing_running_loop
+ from azure.ai.evaluation._common.constants import EvaluationMetrics
+ from azure.ai.evaluation._common.rai_service import evaluate_with_rai_service
+ from azure.ai.evaluation._exceptions import EvaluationException, ErrorBlame, ErrorCategory, ErrorTarget
+ from azure.ai.evaluation._model_configurations import AzureAIProject
+
+
+ class _AsyncProtectedMaterialsEvaluator:
+     def __init__(self, azure_ai_project: dict, credential=None):
+         self._azure_ai_project = azure_ai_project
+         self._credential = credential
+
+     async def __call__(self, *, query: str, response: str, **kwargs):
+         """
+         Evaluates content according to this evaluator's metric.
+
+         :keyword query: The query to be evaluated.
+         :paramtype query: str
+         :keyword response: The response to be evaluated.
+         :paramtype response: str
+         :return: The evaluation score computation based on the Content Safety metric (self.metric).
+         :rtype: Any
+         """
+         # Validate inputs
+         # Raises value error if failed, so execution alone signifies success.
+         if not (query and query.strip() and query != "None") or not (
+             response and response.strip() and response != "None"
+         ):
+             msg = "Both 'query' and 'response' must be non-empty strings."
+             raise EvaluationException(
+                 message=msg,
+                 internal_message=msg,
+                 error_category=ErrorCategory.MISSING_FIELD,
+                 error_blame=ErrorBlame.USER_ERROR,
+                 error_target=ErrorTarget.PROTECTED_MATERIAL_EVALUATOR,
+             )
+
+         # Run score computation based on supplied metric.
+         result = await evaluate_with_rai_service(
+             metric_name=EvaluationMetrics.PROTECTED_MATERIAL,
+             query=query,
+             response=response,
+             project_scope=self._azure_ai_project,
+             credential=self._credential,
+         )
+         return result
+
+
+ class ProtectedMaterialsEvaluator:
+     """
+     Initialize a protected materials evaluator to detect whether protected material
+     is present in your AI system's response. Outputs True or False with AI-generated reasoning.
+
+     :param azure_ai_project: The scope of the Azure AI project.
+         It contains subscription id, resource group, and project name.
+     :type azure_ai_project: ~azure.ai.evaluation.AzureAIProject
+     :param credential: The credential for connecting to Azure AI project.
+     :type credential: ~azure.core.credentials.TokenCredential
+     :return: Whether or not protected material was found in the response, with AI-generated reasoning.
+     :rtype: Dict[str, str]
+
+     **Usage**
+
+     .. code-block:: python
+
+         azure_ai_project = {
+             "subscription_id": "<subscription_id>",
+             "resource_group_name": "<resource_group_name>",
+             "project_name": "<project_name>",
+         }
+         eval_fn = ProtectedMaterialsEvaluator(azure_ai_project)
+         result = eval_fn(query="What is the capital of France?", response="Paris.")
+
+     **Output format**
+
+     .. code-block:: python
+
+         {
+             "label": "False",
+             "reasoning": "This query does not contain any protected material."
+         }
+     """
+
+     def __init__(self, azure_ai_project: dict, credential=None):
+         self._async_evaluator = _AsyncProtectedMaterialsEvaluator(azure_ai_project, credential)
+
+     def __call__(self, *, query: str, response: str, **kwargs):
+         """
+         Evaluates protected materials content.
+
+         :keyword query: The query to be evaluated.
+         :paramtype query: str
+         :keyword response: The response to be evaluated.
+         :paramtype response: str
+         :return: A dictionary containing a boolean label and reasoning.
+         :rtype: dict
+         """
+         return async_run_allowing_running_loop(self._async_evaluator, query=query, response=response, **kwargs)
+
+     def _to_async(self):
+         return self._async_evaluator
azure/ai/evaluation/_evaluators/_qa/__init__.py
@@ -0,0 +1,9 @@
+ # ---------------------------------------------------------
+ # Copyright (c) Microsoft Corporation. All rights reserved.
+ # ---------------------------------------------------------
+
+ from ._qa import QAEvaluator
+
+ __all__ = [
+     "QAEvaluator",
+ ]
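
This __init__.py only re-exports QAEvaluator; its implementation (_qa/_qa.py, +111 lines) is listed above but not included in this excerpt. Assuming it is a composite that fans out to the individual quality evaluators added in this release, a hypothetical sketch of the call shape:

.. code-block:: python

    # Sketch only: QAEvaluator's inputs and output keys are assumptions, not shown in this excerpt.
    from azure.ai.evaluation import QAEvaluator  # assumed top-level re-export

    qa_eval = QAEvaluator(model_config)  # model_config as in the fluency sketch above
    result = qa_eval(
        query="What is the capital of Japan?",
        response="The capital of Japan is Tokyo.",
        context="Tokyo is Japan's capital and largest city.",
        ground_truth="Tokyo is the capital of Japan.",
    )
    # Assumed keys, one per wrapped evaluator, e.g.:
    # gpt_groundedness, gpt_relevance, gpt_coherence, gpt_fluency, gpt_similarity, f1_score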