azure-ai-evaluation 1.2.0__py3-none-any.whl → 1.4.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of azure-ai-evaluation might be problematic.

Files changed (134)
  1. azure/ai/evaluation/__init__.py +42 -14
  2. azure/ai/evaluation/_azure/_models.py +6 -6
  3. azure/ai/evaluation/_common/constants.py +6 -2
  4. azure/ai/evaluation/_common/rai_service.py +38 -4
  5. azure/ai/evaluation/_common/raiclient/__init__.py +34 -0
  6. azure/ai/evaluation/_common/raiclient/_client.py +128 -0
  7. azure/ai/evaluation/_common/raiclient/_configuration.py +87 -0
  8. azure/ai/evaluation/_common/raiclient/_model_base.py +1235 -0
  9. azure/ai/evaluation/_common/raiclient/_patch.py +20 -0
  10. azure/ai/evaluation/_common/raiclient/_serialization.py +2050 -0
  11. azure/ai/evaluation/_common/raiclient/_version.py +9 -0
  12. azure/ai/evaluation/_common/raiclient/aio/__init__.py +29 -0
  13. azure/ai/evaluation/_common/raiclient/aio/_client.py +130 -0
  14. azure/ai/evaluation/_common/raiclient/aio/_configuration.py +87 -0
  15. azure/ai/evaluation/_common/raiclient/aio/_patch.py +20 -0
  16. azure/ai/evaluation/_common/raiclient/aio/operations/__init__.py +25 -0
  17. azure/ai/evaluation/_common/raiclient/aio/operations/_operations.py +981 -0
  18. azure/ai/evaluation/_common/raiclient/aio/operations/_patch.py +20 -0
  19. azure/ai/evaluation/_common/raiclient/models/__init__.py +60 -0
  20. azure/ai/evaluation/_common/raiclient/models/_enums.py +18 -0
  21. azure/ai/evaluation/_common/raiclient/models/_models.py +651 -0
  22. azure/ai/evaluation/_common/raiclient/models/_patch.py +20 -0
  23. azure/ai/evaluation/_common/raiclient/operations/__init__.py +25 -0
  24. azure/ai/evaluation/_common/raiclient/operations/_operations.py +1225 -0
  25. azure/ai/evaluation/_common/raiclient/operations/_patch.py +20 -0
  26. azure/ai/evaluation/_common/raiclient/py.typed +1 -0
  27. azure/ai/evaluation/_common/utils.py +30 -10
  28. azure/ai/evaluation/_constants.py +10 -0
  29. azure/ai/evaluation/_converters/__init__.py +3 -0
  30. azure/ai/evaluation/_converters/_ai_services.py +804 -0
  31. azure/ai/evaluation/_converters/_models.py +302 -0
  32. azure/ai/evaluation/_evaluate/_batch_run/__init__.py +10 -3
  33. azure/ai/evaluation/_evaluate/_batch_run/_run_submitter_client.py +104 -0
  34. azure/ai/evaluation/_evaluate/_batch_run/batch_clients.py +82 -0
  35. azure/ai/evaluation/_evaluate/_eval_run.py +1 -1
  36. azure/ai/evaluation/_evaluate/_evaluate.py +36 -4
  37. azure/ai/evaluation/_evaluators/_bleu/_bleu.py +23 -3
  38. azure/ai/evaluation/_evaluators/_code_vulnerability/__init__.py +5 -0
  39. azure/ai/evaluation/_evaluators/_code_vulnerability/_code_vulnerability.py +120 -0
  40. azure/ai/evaluation/_evaluators/_coherence/_coherence.py +21 -2
  41. azure/ai/evaluation/_evaluators/_common/_base_eval.py +43 -3
  42. azure/ai/evaluation/_evaluators/_common/_base_multi_eval.py +3 -1
  43. azure/ai/evaluation/_evaluators/_common/_base_prompty_eval.py +43 -4
  44. azure/ai/evaluation/_evaluators/_common/_base_rai_svc_eval.py +16 -4
  45. azure/ai/evaluation/_evaluators/_content_safety/_content_safety.py +42 -5
  46. azure/ai/evaluation/_evaluators/_content_safety/_hate_unfairness.py +15 -0
  47. azure/ai/evaluation/_evaluators/_content_safety/_self_harm.py +15 -0
  48. azure/ai/evaluation/_evaluators/_content_safety/_sexual.py +15 -0
  49. azure/ai/evaluation/_evaluators/_content_safety/_violence.py +15 -0
  50. azure/ai/evaluation/_evaluators/_f1_score/_f1_score.py +28 -4
  51. azure/ai/evaluation/_evaluators/_fluency/_fluency.py +21 -2
  52. azure/ai/evaluation/_evaluators/_gleu/_gleu.py +26 -3
  53. azure/ai/evaluation/_evaluators/_groundedness/_groundedness.py +21 -3
  54. azure/ai/evaluation/_evaluators/_intent_resolution/__init__.py +7 -0
  55. azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py +152 -0
  56. azure/ai/evaluation/_evaluators/_intent_resolution/intent_resolution.prompty +161 -0
  57. azure/ai/evaluation/_evaluators/_meteor/_meteor.py +26 -3
  58. azure/ai/evaluation/_evaluators/_qa/_qa.py +51 -7
  59. azure/ai/evaluation/_evaluators/_relevance/_relevance.py +26 -2
  60. azure/ai/evaluation/_evaluators/_response_completeness/__init__.py +7 -0
  61. azure/ai/evaluation/_evaluators/_response_completeness/_response_completeness.py +157 -0
  62. azure/ai/evaluation/_evaluators/_response_completeness/response_completeness.prompty +99 -0
  63. azure/ai/evaluation/_evaluators/_retrieval/_retrieval.py +21 -2
  64. azure/ai/evaluation/_evaluators/_rouge/_rouge.py +113 -4
  65. azure/ai/evaluation/_evaluators/_service_groundedness/_service_groundedness.py +23 -3
  66. azure/ai/evaluation/_evaluators/_similarity/_similarity.py +24 -5
  67. azure/ai/evaluation/_evaluators/_task_adherence/__init__.py +7 -0
  68. azure/ai/evaluation/_evaluators/_task_adherence/_task_adherence.py +148 -0
  69. azure/ai/evaluation/_evaluators/_task_adherence/task_adherence.prompty +117 -0
  70. azure/ai/evaluation/_evaluators/_tool_call_accuracy/__init__.py +9 -0
  71. azure/ai/evaluation/_evaluators/_tool_call_accuracy/_tool_call_accuracy.py +292 -0
  72. azure/ai/evaluation/_evaluators/_tool_call_accuracy/tool_call_accuracy.prompty +71 -0
  73. azure/ai/evaluation/_evaluators/_ungrounded_attributes/__init__.py +5 -0
  74. azure/ai/evaluation/_evaluators/_ungrounded_attributes/_ungrounded_attributes.py +103 -0
  75. azure/ai/evaluation/_evaluators/_xpia/xpia.py +2 -0
  76. azure/ai/evaluation/_exceptions.py +5 -1
  77. azure/ai/evaluation/_legacy/__init__.py +3 -0
  78. azure/ai/evaluation/_legacy/_batch_engine/__init__.py +9 -0
  79. azure/ai/evaluation/_legacy/_batch_engine/_config.py +45 -0
  80. azure/ai/evaluation/_legacy/_batch_engine/_engine.py +368 -0
  81. azure/ai/evaluation/_legacy/_batch_engine/_exceptions.py +88 -0
  82. azure/ai/evaluation/_legacy/_batch_engine/_logging.py +292 -0
  83. azure/ai/evaluation/_legacy/_batch_engine/_openai_injector.py +23 -0
  84. azure/ai/evaluation/_legacy/_batch_engine/_result.py +99 -0
  85. azure/ai/evaluation/_legacy/_batch_engine/_run.py +121 -0
  86. azure/ai/evaluation/_legacy/_batch_engine/_run_storage.py +128 -0
  87. azure/ai/evaluation/_legacy/_batch_engine/_run_submitter.py +217 -0
  88. azure/ai/evaluation/_legacy/_batch_engine/_status.py +25 -0
  89. azure/ai/evaluation/_legacy/_batch_engine/_trace.py +105 -0
  90. azure/ai/evaluation/_legacy/_batch_engine/_utils.py +82 -0
  91. azure/ai/evaluation/_legacy/_batch_engine/_utils_deprecated.py +131 -0
  92. azure/ai/evaluation/_legacy/prompty/__init__.py +36 -0
  93. azure/ai/evaluation/_legacy/prompty/_connection.py +182 -0
  94. azure/ai/evaluation/_legacy/prompty/_exceptions.py +59 -0
  95. azure/ai/evaluation/_legacy/prompty/_prompty.py +313 -0
  96. azure/ai/evaluation/_legacy/prompty/_utils.py +545 -0
  97. azure/ai/evaluation/_legacy/prompty/_yaml_utils.py +99 -0
  98. azure/ai/evaluation/_red_team/__init__.py +3 -0
  99. azure/ai/evaluation/_red_team/_attack_objective_generator.py +192 -0
  100. azure/ai/evaluation/_red_team/_attack_strategy.py +42 -0
  101. azure/ai/evaluation/_red_team/_callback_chat_target.py +74 -0
  102. azure/ai/evaluation/_red_team/_default_converter.py +21 -0
  103. azure/ai/evaluation/_red_team/_red_team.py +1858 -0
  104. azure/ai/evaluation/_red_team/_red_team_result.py +246 -0
  105. azure/ai/evaluation/_red_team/_utils/__init__.py +3 -0
  106. azure/ai/evaluation/_red_team/_utils/constants.py +64 -0
  107. azure/ai/evaluation/_red_team/_utils/formatting_utils.py +164 -0
  108. azure/ai/evaluation/_red_team/_utils/logging_utils.py +139 -0
  109. azure/ai/evaluation/_red_team/_utils/strategy_utils.py +188 -0
  110. azure/ai/evaluation/_safety_evaluation/__init__.py +3 -0
  111. azure/ai/evaluation/_safety_evaluation/_generated_rai_client.py +0 -0
  112. azure/ai/evaluation/_safety_evaluation/_safety_evaluation.py +741 -0
  113. azure/ai/evaluation/_version.py +2 -1
  114. azure/ai/evaluation/simulator/_adversarial_scenario.py +3 -1
  115. azure/ai/evaluation/simulator/_adversarial_simulator.py +61 -27
  116. azure/ai/evaluation/simulator/_conversation/__init__.py +4 -5
  117. azure/ai/evaluation/simulator/_conversation/_conversation.py +4 -0
  118. azure/ai/evaluation/simulator/_model_tools/_generated_rai_client.py +145 -0
  119. azure/ai/evaluation/simulator/_model_tools/_proxy_completion_model.py +2 -0
  120. azure/ai/evaluation/simulator/_model_tools/_rai_client.py +71 -1
  121. {azure_ai_evaluation-1.2.0.dist-info → azure_ai_evaluation-1.4.0.dist-info}/METADATA +75 -15
  122. azure_ai_evaluation-1.4.0.dist-info/RECORD +197 -0
  123. {azure_ai_evaluation-1.2.0.dist-info → azure_ai_evaluation-1.4.0.dist-info}/WHEEL +1 -1
  124. azure/ai/evaluation/_evaluators/_multimodal/__init__.py +0 -20
  125. azure/ai/evaluation/_evaluators/_multimodal/_content_safety_multimodal.py +0 -132
  126. azure/ai/evaluation/_evaluators/_multimodal/_content_safety_multimodal_base.py +0 -55
  127. azure/ai/evaluation/_evaluators/_multimodal/_hate_unfairness.py +0 -100
  128. azure/ai/evaluation/_evaluators/_multimodal/_protected_material.py +0 -124
  129. azure/ai/evaluation/_evaluators/_multimodal/_self_harm.py +0 -100
  130. azure/ai/evaluation/_evaluators/_multimodal/_sexual.py +0 -100
  131. azure/ai/evaluation/_evaluators/_multimodal/_violence.py +0 -100
  132. azure_ai_evaluation-1.2.0.dist-info/RECORD +0 -125
  133. {azure_ai_evaluation-1.2.0.dist-info → azure_ai_evaluation-1.4.0.dist-info}/NOTICE.txt +0 -0
  134. {azure_ai_evaluation-1.2.0.dist-info → azure_ai_evaluation-1.4.0.dist-info}/top_level.txt +0 -0
azure/ai/evaluation/_version.py
@@ -1,5 +1,6 @@
 # ---------------------------------------------------------
 # Copyright (c) Microsoft Corporation. All rights reserved.
 # ---------------------------------------------------------
+# represents upcoming version
 
-VERSION = "1.2.0"
+VERSION = "1.4.0"
azure/ai/evaluation/simulator/_adversarial_scenario.py
@@ -5,7 +5,7 @@
 from enum import Enum
 from azure.ai.evaluation._common._experimental import experimental
 
-
+# cspell:ignore vuln
 @experimental
 class AdversarialScenario(Enum):
     """Adversarial scenario types
@@ -28,6 +28,8 @@ class AdversarialScenario(Enum):
     ADVERSARIAL_CONTENT_GEN_UNGROUNDED = "adv_content_gen_ungrounded"
     ADVERSARIAL_CONTENT_GEN_GROUNDED = "adv_content_gen_grounded"
     ADVERSARIAL_CONTENT_PROTECTED_MATERIAL = "adv_content_protected_material"
+    ADVERSARIAL_CODE_VULNERABILITY = "adv_code_vuln"
+    ADVERSARIAL_UNGROUNDED_ATTRIBUTES = "adv_isa"
 
 
 @experimental
azure/ai/evaluation/simulator/_adversarial_simulator.py
@@ -7,6 +7,7 @@ import asyncio
 import logging
 import random
 from typing import Any, Callable, Dict, List, Optional, Union, cast
+import uuid
 
 from tqdm import tqdm
 
@@ -187,6 +188,8 @@ class AdversarialSimulator:
         )
         self._ensure_service_dependencies()
         templates = await self.adversarial_template_handler._get_content_harm_template_collections(scenario.value)
+        simulation_id = str(uuid.uuid4())
+        logger.warning("Use simulation_id to help debug the issue: %s", str(simulation_id))
         concurrent_async_task = min(concurrent_async_task, 1000)
         semaphore = asyncio.Semaphore(concurrent_async_task)
         sim_results = []
azure/ai/evaluation/simulator/_adversarial_simulator.py
@@ -217,32 +220,54 @@
         if randomization_seed is not None:
             random.seed(randomization_seed)
             random.shuffle(templates)
-        parameter_lists = [t.template_parameters for t in templates]
-        zipped_parameters = list(zip(*parameter_lists))
-        for param_group in zipped_parameters:
-            for template, parameter in zip(templates, param_group):
-                if _jailbreak_type == "upia":
-                    parameter = self._add_jailbreak_parameter(parameter, random.choice(jailbreak_dataset))
-                tasks.append(
-                    asyncio.create_task(
-                        self._simulate_async(
-                            target=target,
-                            template=template,
-                            parameters=parameter,
-                            max_conversation_turns=max_conversation_turns,
-                            api_call_retry_limit=api_call_retry_limit,
-                            api_call_retry_sleep_sec=api_call_retry_sleep_sec,
-                            api_call_delay_sec=api_call_delay_sec,
-                            language=language,
-                            semaphore=semaphore,
-                            scenario=scenario,
-                        )
+
+        # Prepare task parameters based on scenario - but use a single append call for all scenarios
+        tasks = []
+        template_parameter_pairs = []
+
+        if scenario == AdversarialScenario.ADVERSARIAL_CONVERSATION:
+            # For ADVERSARIAL_CONVERSATION, flatten the parameters
+            for i, template in enumerate(templates):
+                if not template.template_parameters:
+                    continue
+                for parameter in template.template_parameters:
+                    template_parameter_pairs.append((template, parameter))
+        else:
+            # Use original logic for other scenarios - zip parameters
+            parameter_lists = [t.template_parameters for t in templates]
+            zipped_parameters = list(zip(*parameter_lists))
+
+            for param_group in zipped_parameters:
+                for template, parameter in zip(templates, param_group):
+                    template_parameter_pairs.append((template, parameter))
+
+        # Limit to max_simulation_results if needed
+        if len(template_parameter_pairs) > max_simulation_results:
+            template_parameter_pairs = template_parameter_pairs[:max_simulation_results]
+
+        # Single task append loop for all scenarios
+        for template, parameter in template_parameter_pairs:
+            if _jailbreak_type == "upia":
+                parameter = self._add_jailbreak_parameter(parameter, random.choice(jailbreak_dataset))
+
+            tasks.append(
+                asyncio.create_task(
+                    self._simulate_async(
+                        target=target,
+                        template=template,
+                        parameters=parameter,
+                        max_conversation_turns=max_conversation_turns,
+                        api_call_retry_limit=api_call_retry_limit,
+                        api_call_retry_sleep_sec=api_call_retry_sleep_sec,
+                        api_call_delay_sec=api_call_delay_sec,
+                        language=language,
+                        semaphore=semaphore,
+                        scenario=scenario,
+                        simulation_id=simulation_id,
                     )
                 )
-                if len(tasks) >= max_simulation_results:
-                    break
-            if len(tasks) >= max_simulation_results:
-                break
+            )
+
         for task in asyncio.as_completed(tasks):
             sim_results.append(await task)
             progress_bar.update(1)
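The rewrite above changes how (template, parameter) pairs are scheduled: `ADVERSARIAL_CONVERSATION` now flattens every parameter of every template, while other scenarios keep the original zip-based interleaving, which silently drops parameters beyond the shortest list. A minimal sketch of the two strategies, using a hypothetical dict-based stand-in for the SDK's template objects:

```python
# Sketch of the two pairing strategies from the diff above.
# The dicts below are a hypothetical stand-in for the SDK's template objects.

def flatten_pairs(templates):
    """ADVERSARIAL_CONVERSATION path: every parameter of every template."""
    pairs = []
    for template in templates:
        for parameter in template.get("template_parameters") or []:
            pairs.append((template["name"], parameter))
    return pairs

def zipped_pairs(templates):
    """Other scenarios: interleave the i-th parameter of each template."""
    parameter_lists = [t.get("template_parameters") or [] for t in templates]
    pairs = []
    for param_group in zip(*parameter_lists):  # stops at the shortest list
        for template, parameter in zip(templates, param_group):
            pairs.append((template["name"], parameter))
    return pairs

templates = [
    {"name": "t1", "template_parameters": ["a1", "a2"]},
    {"name": "t2", "template_parameters": ["b1", "b2", "b3"]},
]

flat = flatten_pairs(templates)    # 5 pairs, "b3" included
zipped = zipped_pairs(templates)   # 4 pairs, "b3" is dropped by zip
```

Note how the zip-based path never schedules `("t2", "b3")`, which is why the conversation scenario switches to flattening.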
azure/ai/evaluation/simulator/_adversarial_simulator.py
@@ -298,9 +323,14 @@ class AdversarialSimulator:
         language: SupportedLanguages,
         semaphore: asyncio.Semaphore,
         scenario: Union[AdversarialScenario, AdversarialScenarioJailbreak],
+        simulation_id: str = "",
     ) -> List[Dict]:
         user_bot = self._setup_bot(
-            role=ConversationRole.USER, template=template, parameters=parameters, scenario=scenario
+            role=ConversationRole.USER,
+            template=template,
+            parameters=parameters,
+            scenario=scenario,
+            simulation_id=simulation_id,
         )
         system_bot = self._setup_bot(
             target=target, role=ConversationRole.ASSISTANT, template=template, parameters=parameters, scenario=scenario
@@ -329,7 +359,7 @@ class AdversarialSimulator:
         )
 
     def _get_user_proxy_completion_model(
-        self, template_key: str, template_parameters: TemplateParameters
+        self, template_key: str, template_parameters: TemplateParameters, simulation_id: str = ""
    ) -> ProxyChatCompletionsModel:
         return ProxyChatCompletionsModel(
             name="raisvc_proxy_model",
@@ -340,6 +370,7 @@ class AdversarialSimulator:
             api_version="2023-07-01-preview",
             max_tokens=1200,
             temperature=0.0,
+            simulation_id=simulation_id,
         )
 
     def _setup_bot(
@@ -350,10 +381,13 @@ class AdversarialSimulator:
         parameters: TemplateParameters,
         target: Optional[Callable] = None,
         scenario: Union[AdversarialScenario, AdversarialScenarioJailbreak],
+        simulation_id: str = "",
     ) -> ConversationBot:
         if role is ConversationRole.USER:
             model = self._get_user_proxy_completion_model(
-                template_key=template.template_name, template_parameters=parameters
+                template_key=template.template_name,
+                template_parameters=parameters,
+                simulation_id=simulation_id,
             )
             return ConversationBot(
                 role=role,
azure/ai/evaluation/simulator/_conversation/__init__.py
@@ -128,19 +128,15 @@ class ConversationBot:
         self.conversation_starter: Optional[Union[str, jinja2.Template, Dict]] = None
         if role == ConversationRole.USER:
             if "conversation_starter" in self.persona_template_args:
-                print(self.persona_template_args)
                 conversation_starter_content = self.persona_template_args["conversation_starter"]
                 if isinstance(conversation_starter_content, dict):
                     self.conversation_starter = conversation_starter_content
-                    print(f"Conversation starter content: {conversation_starter_content}")
                 else:
                     try:
                         self.conversation_starter = jinja2.Template(
                             conversation_starter_content, undefined=jinja2.StrictUndefined
                         )
-                        print("Successfully created a Jinja2 template for the conversation starter.")
                     except jinja2.exceptions.TemplateSyntaxError as e:  # noqa: F841
-                        print(f"Template syntax error: {e}. Using raw content.")
                         self.conversation_starter = conversation_starter_content
             else:
                 self.logger.info(
@@ -153,6 +149,7 @@ class ConversationBot:
         conversation_history: List[ConversationTurn],
         max_history: int,
         turn_number: int = 0,
+        session_state: Optional[Dict[str, Any]] = None,
     ) -> Tuple[dict, dict, float, dict]:
         """
         Prompt the ConversationBot for a response.
@@ -262,6 +259,7 @@ class CallbackConversationBot(ConversationBot):
         conversation_history: List[Any],
         max_history: int,
         turn_number: int = 0,
+        session_state: Optional[Dict[str, Any]] = None,
     ) -> Tuple[dict, dict, float, dict]:
         chat_protocol_message = self._to_chat_protocol(
             self.user_template, conversation_history, self.user_template_parameters
@@ -269,7 +267,7 @@ class CallbackConversationBot(ConversationBot):
         msg_copy = copy.deepcopy(chat_protocol_message)
         result = {}
         start_time = time.time()
-        result = await self.callback(msg_copy)
+        result = await self.callback(msg_copy, session_state=session_state)
         end_time = time.time()
         if not result:
             result = {
@@ -348,6 +346,7 @@ class MultiModalConversationBot(ConversationBot):
         conversation_history: List[Any],
         max_history: int,
         turn_number: int = 0,
+        session_state: Optional[Dict[str, Any]] = None,
     ) -> Tuple[dict, dict, float, dict]:
         previous_prompt = conversation_history[-1]
         chat_protocol_message = await self._to_chat_protocol(conversation_history, self.user_template_parameters)
azure/ai/evaluation/simulator/_conversation/_conversation.py
@@ -101,6 +101,7 @@ async def simulate_conversation(
     :rtype: Tuple[Optional[str], List[ConversationTurn]]
     """
 
+    session_state = {}
     # Read the first prompt.
     (first_response, request, _, full_response) = await bots[0].generate_response(
         session=session,
@@ -149,7 +150,10 @@ async def simulate_conversation(
             conversation_history=conversation_history,
             max_history=history_limit,
             turn_number=current_turn,
+            session_state=session_state,
         )
+        if "session_state" in full_response and full_response["session_state"] is not None:
+            session_state.update(full_response["session_state"])
 
         # check if conversation id is null, which means conversation starter was used. use id from next turn
         if conversation_id is None and "id" in response:
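The session_state plumbing above lets a callback target carry state across turns: the simulator passes the accumulated dict into each response call and merges back whatever the target returns under "session_state". A minimal sketch of that round trip, with a hypothetical callback in the chat-protocol shape:

```python
import asyncio

# Hypothetical callback target: receives the message plus the accumulated
# session_state and may return an updated "session_state" in its response.
async def callback(message, session_state=None):
    state = dict(session_state or {})
    state["turns_seen"] = state.get("turns_seen", 0) + 1
    return {
        "messages": message["messages"] + [{"role": "assistant", "content": "ok"}],
        "session_state": state,
    }

async def run_three_turns():
    # Mirrors the merge loop in simulate_conversation above.
    session_state = {}
    for _ in range(3):
        full_response = await callback({"messages": []}, session_state=session_state)
        if "session_state" in full_response and full_response["session_state"] is not None:
            session_state.update(full_response["session_state"])
    return session_state

final_state = asyncio.run(run_three_turns())  # {"turns_seen": 3}
```

Because the merge uses `dict.update`, a target that returns `None` (or omits the key) simply leaves the accumulated state untouched.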
azure/ai/evaluation/simulator/_model_tools/_generated_rai_client.py (new file)
@@ -0,0 +1,145 @@
+# ---------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# ---------------------------------------------------------
+
+import os
+from typing import Dict, List, Optional
+
+from azure.core.credentials import TokenCredential
+from azure.ai.evaluation._model_configurations import AzureAIProject
+from azure.ai.evaluation.simulator._model_tools import ManagedIdentityAPITokenManager
+from azure.ai.evaluation._common.raiclient import MachineLearningServicesClient
+import jwt
+import time
+import ast
+
+class GeneratedRAIClient:
+    """Client for the Responsible AI Service using the auto-generated MachineLearningServicesClient.
+
+    :param azure_ai_project: The scope of the Azure AI project. It contains subscription id, resource group, and project name.
+    :type azure_ai_project: ~azure.ai.evaluation.AzureAIProject
+    :param token_manager: The token manager
+    :type token_manager: ~azure.ai.evaluation.simulator._model_tools._identity_manager.APITokenManager
+    """
+
+    def __init__(self, azure_ai_project: AzureAIProject, token_manager: ManagedIdentityAPITokenManager):
+        self.azure_ai_project = azure_ai_project
+        self.token_manager = token_manager
+
+        # Service URL construction
+        if "RAI_SVC_URL" in os.environ:
+            endpoint = os.environ["RAI_SVC_URL"].rstrip("/")
+        else:
+            endpoint = self._get_service_discovery_url()
+
+        # Create the autogenerated client
+        self._client = MachineLearningServicesClient(
+            endpoint=endpoint,
+            subscription_id=self.azure_ai_project["subscription_id"],
+            resource_group_name=self.azure_ai_project["resource_group_name"],
+            workspace_name=self.azure_ai_project["project_name"],
+            credential=self.token_manager,
+        )
+
+    def _get_service_discovery_url(self):
+        """Get the service discovery URL.
+
+        :return: The service discovery URL
+        :rtype: str
+        """
+        import requests
+        bearer_token = self._fetch_or_reuse_token(self.token_manager)
+        headers = {"Authorization": f"Bearer {bearer_token}", "Content-Type": "application/json"}
+
+        response = requests.get(
+            f"https://management.azure.com/subscriptions/{self.azure_ai_project['subscription_id']}/"
+            f"resourceGroups/{self.azure_ai_project['resource_group_name']}/"
+            f"providers/Microsoft.MachineLearningServices/workspaces/{self.azure_ai_project['project_name']}?"
+            f"api-version=2023-08-01-preview",
+            headers=headers,
+            timeout=5,
+        )
+
+        if response.status_code != 200:
+            msg = (
+                f"Failed to connect to your Azure AI project. Please check if the project scope is configured "
+                f"correctly, and make sure you have the necessary access permissions. "
+                f"Status code: {response.status_code}."
+            )
+            raise Exception(msg)
+
+        # Parse the discovery URL
+        from urllib.parse import urlparse
+        base_url = urlparse(response.json()["properties"]["discoveryUrl"])
+        return f"{base_url.scheme}://{base_url.netloc}"
+
+    async def get_attack_objectives(self, risk_category: Optional[str] = None, application_scenario: str = None, strategy: Optional[str] = None) -> Dict:
+        """Get attack objectives using the auto-generated operations.
+
+        :param risk_category: Optional risk category to filter the attack objectives
+        :type risk_category: Optional[str]
+        :param application_scenario: Optional description of the application scenario for context
+        :type application_scenario: str
+        :param strategy: Optional strategy to filter the attack objectives
+        :type strategy: Optional[str]
+        :return: The attack objectives
+        :rtype: Dict
+        """
+        try:
+            # Send the request using the autogenerated client
+            response = self._client.rai_svc.get_attack_objectives(
+                risk_types=[risk_category],
+                lang="en",
+                strategy=strategy,
+            )
+            return response
+
+        except Exception as e:
+            # Log the exception for debugging purposes
+            import logging
+            logging.error(f"Error in get_attack_objectives: {str(e)}")
+            raise
+
+    async def get_jailbreak_prefixes(self) -> List[str]:
+        """Get jailbreak prefixes using the auto-generated operations.
+
+        :return: The jailbreak prefixes
+        :rtype: List[str]
+        """
+        try:
+            # Send the request using the autogenerated client
+            response = self._client.rai_svc.get_jail_break_dataset_with_type(type="upia")
+            if isinstance(response, list):
+                return response
+            else:
+                self.logger.error("Unexpected response format from get_jail_break_dataset_with_type")
+                raise ValueError("Unexpected response format from get_jail_break_dataset_with_type")
+
+        except Exception as e:
+            return [""]
+
+    def _fetch_or_reuse_token(self, credential: TokenCredential, token: Optional[str] = None) -> str:
+        """Get token. Fetch a new token if the current token is near expiry
+
+        :param credential: The Azure authentication credential.
+        :type credential:
+            ~azure.core.credentials.TokenCredential
+        :param token: The Azure authentication token. Defaults to None. If none, a new token will be fetched.
+        :type token: str
+        :return: The Azure authentication token.
+        """
+        if token:
+            # Decode the token to get its expiration time
+            try:
+                decoded_token = jwt.decode(token, options={"verify_signature": False})
+            except jwt.PyJWTError:
+                pass
+            else:
+                exp_time = decoded_token["exp"]
+                current_time = time.time()
+
+                # Return current token if not near expiry
+                if (exp_time - current_time) >= 300:
+                    return token
+
+        return credential.get_token("https://management.azure.com/.default").token
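The `_fetch_or_reuse_token` method above reuses a cached token only when its `exp` claim is at least 300 seconds away, decoding the JWT without verifying its signature. The same check can be sketched with the standard library alone; this is a stdlib re-implementation of what PyJWT's `jwt.decode(..., options={"verify_signature": False})` does to the payload, and the token builder is a toy helper for illustration only:

```python
import base64
import json
import time

def _decode_jwt_payload(token: str) -> dict:
    """Decode a JWT's payload segment without verifying the signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def token_is_fresh(token: str, margin_sec: int = 300) -> bool:
    """True when the token's exp claim is at least margin_sec in the future."""
    exp_time = _decode_jwt_payload(token)["exp"]
    return (exp_time - time.time()) >= margin_sec

def make_unsigned_token(exp: float) -> str:
    """Build a toy header.payload.signature JWT for this sketch."""
    enc = lambda obj: base64.urlsafe_b64encode(json.dumps(obj).encode()).decode().rstrip("=")
    return f'{enc({"alg": "none"})}.{enc({"exp": exp})}.sig'

fresh = make_unsigned_token(time.time() + 3600)  # expires in an hour: reusable
stale = make_unsigned_token(time.time() + 60)    # inside the 300 s margin: refetch
```

The 300-second margin means a token is refreshed slightly before it actually expires, so an in-flight request never carries a token that lapses mid-call.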
azure/ai/evaluation/simulator/_model_tools/_proxy_completion_model.py
@@ -89,6 +89,7 @@ class ProxyChatCompletionsModel(OpenAIChatCompletionsModel):
         self.tkey = template_key
         self.tparam = template_parameters
         self.result_url: Optional[str] = None
+        self.simulation_id: Optional[str] = kwargs.pop("simulation_id", "")
 
         super().__init__(name=name, **kwargs)
 
@@ -169,6 +170,7 @@ class ProxyChatCompletionsModel(OpenAIChatCompletionsModel):
             "Content-Type": "application/json",
             "X-CV": f"{uuid.uuid4()}",
             "X-ModelType": self.model or "",
+            "x-ms-client-request-id": self.simulation_id,
         }
         # add all additional headers
         headers.update(self.additional_headers)  # type: ignore[arg-type]
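Propagating `simulation_id` as `x-ms-client-request-id` gives every service call in a run a shared correlation id (the same value the simulator logs at startup), while `X-CV` stays unique per request. A sketch of that header construction; the helper function and values here are illustrative, not part of the SDK:

```python
import uuid

def build_headers(simulation_id: str, model: str = "") -> dict:
    """Mirror of the header block above: a per-request X-CV id plus a
    per-simulation correlation id shared by every request in the run."""
    return {
        "Content-Type": "application/json",
        "X-CV": str(uuid.uuid4()),                # unique per request
        "X-ModelType": model or "",
        "x-ms-client-request-id": simulation_id,  # shared across the simulation
    }

simulation_id = str(uuid.uuid4())
h1 = build_headers(simulation_id)
h2 = build_headers(simulation_id)
# h1 and h2 share x-ms-client-request-id but have different X-CV values
```

Filtering service-side logs on the shared id then reconstructs one simulation end to end, which is exactly what the "Use simulation_id to help debug the issue" warning is for.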
azure/ai/evaluation/simulator/_model_tools/_rai_client.py
@@ -2,9 +2,10 @@
 # Copyright (c) Microsoft Corporation. All rights reserved.
 # ---------------------------------------------------------
 import os
-from typing import Any
+from typing import Any, Dict, List
 from urllib.parse import urljoin, urlparse
 import base64
+import json
 
 from azure.ai.evaluation._exceptions import ErrorBlame, ErrorCategory, ErrorTarget, EvaluationException
 from azure.ai.evaluation._http_utils import AsyncHttpPipeline, get_async_http_client, get_http_client
@@ -62,6 +63,7 @@ class RAIClient:  # pylint: disable=client-accepts-api-version-keyword
         self.jailbreaks_json_endpoint = urljoin(self.api_url, "simulation/jailbreak")
         self.simulation_submit_endpoint = urljoin(self.api_url, "simulation/chat/completions/submit")
         self.xpia_jailbreaks_json_endpoint = urljoin(self.api_url, "simulation/jailbreak/xpia")
+        self.attack_objectives_endpoint = urljoin(self.api_url, "simulation/attackobjectives")
 
     def _get_service_discovery_url(self):
         bearer_token = self.token_manager.get_token()
azure/ai/evaluation/simulator/_model_tools/_rai_client.py
@@ -206,3 +208,71 @@ class RAIClient:  # pylint: disable=client-accepts-api-version-keyword
             category=ErrorCategory.UNKNOWN,
             blame=ErrorBlame.USER_ERROR,
         )
+
+    async def get_attack_objectives(self, risk_categories: List[str], application_scenario: str = None, strategy: str = None) -> Any:
+        """Get the attack objectives based on risk categories and application scenario
+
+        :param risk_categories: List of risk categories to generate attack objectives for
+        :type risk_categories: List[str]
+        :param application_scenario: Optional description of the application scenario for context
+        :type application_scenario: str
+        :param strategy: Optional attack strategy to get specific objectives for
+        :type strategy: str
+        :return: The attack objectives
+        :rtype: Any
+        """
+        # Create query parameters for the request
+        if application_scenario:
+            raise NotImplementedError("Application scenario is not supported yet")
+
+        params = {
+            "api-version": "2022-11-01-preview",
+            "riskTypes": ",".join(risk_categories),
+            "lang": "en"  # Default to English
+        }
+
+        # Add strategy parameter if provided
+        if strategy:
+            params["strategy"] = strategy
+
+        try:
+            # Make the request using the existing get method
+            result = await self.get(self.attack_objectives_endpoint)
+            # from collections import defaultdict
+            # counts_by_risk = defaultdict(int)
+            # for item in result:
+            #     target_harms = item.get("Metadata", {}).get("TargetHarms", [])
+            #     if not target_harms:
+            #         # No risk type specified
+            #         counts_by_risk["empty"] += 1
+            #     else:
+            #         for harm in target_harms:
+            #             # Use "empty" if the risk type field is missing
+            #             risk_type = harm.get("RiskType", "") or "empty"
+            #             counts_by_risk[risk_type] += 1
+            return result
+        except Exception:
+            # If the API fails or isn't implemented yet, return a mock response
+            # This is temporary until the API endpoint is fully implemented
+            return [{
+                "metadata": {
+                    "lang": "en",
+                    "target_harms": [
+                        {
+                            "risk-type": "violence",
+                            "risk-subtype": ""
+                        }
+                    ]
+                },
+                "messages": [
+                    {
+                        "role": "user",
+                        "content": "Risky content"
+                    }
+                ],
+                "modality": "text",
+                "source": [
+                    "source"
+                ]
+            }]
+
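The new endpoint takes its filters as query parameters: `riskTypes` is a comma-joined list, `lang` defaults to "en", and `strategy` is only included when provided. A sketch of the resulting query string, reusing the parameter names from the diff above (the helper function itself is illustrative):

```python
from urllib.parse import urlencode

def build_attack_objectives_query(risk_categories, strategy=None):
    """Assemble the query parameters used by get_attack_objectives above."""
    params = {
        "api-version": "2022-11-01-preview",
        "riskTypes": ",".join(risk_categories),
        "lang": "en",  # default to English
    }
    if strategy:
        params["strategy"] = strategy
    return urlencode(params)

query = build_attack_objectives_query(["violence", "hate_unfairness"], strategy="upia")
# api-version=2022-11-01-preview&riskTypes=violence%2Chate_unfairness&lang=en&strategy=upia
```

Note that `urlencode` percent-escapes the comma separator (`%2C`), which the service must decode back into a list of risk types.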
{azure_ai_evaluation-1.2.0.dist-info → azure_ai_evaluation-1.4.0.dist-info}/METADATA
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: azure-ai-evaluation
-Version: 1.2.0
+Version: 1.4.0
 Summary: Microsoft Azure Evaluation Library for Python
 Home-page: https://github.com/Azure/azure-sdk-for-python
 Author: Microsoft Corporation
@@ -21,13 +21,15 @@ Classifier: Operating System :: OS Independent
 Requires-Python: >=3.9
 Description-Content-Type: text/markdown
 License-File: NOTICE.txt
-Requires-Dist: promptflow-devkit >=1.17.1
-Requires-Dist: promptflow-core >=1.17.1
-Requires-Dist: pyjwt >=2.8.0
-Requires-Dist: azure-identity >=1.16.0
-Requires-Dist: azure-core >=1.30.2
-Requires-Dist: nltk >=3.9.1
-Requires-Dist: azure-storage-blob >=12.10.0
+Requires-Dist: promptflow-devkit>=1.17.1
+Requires-Dist: promptflow-core>=1.17.1
+Requires-Dist: pyjwt>=2.8.0
+Requires-Dist: azure-identity>=1.16.0
+Requires-Dist: azure-core>=1.30.2
+Requires-Dist: nltk>=3.9.1
+Requires-Dist: azure-storage-blob>=12.10.0
+Provides-Extra: redteam
+Requires-Dist: pyrit>=0.8.0; extra == "redteam"
 
 # Azure AI Evaluation client library for Python
 
@@ -54,7 +56,7 @@ Azure AI SDK provides following to evaluate Generative AI Applications:
54
56
  ### Prerequisites
55
57
 
56
58
  - Python 3.9 or later is required to use this package.
57
- - [Optional] You must have [Azure AI Project][ai_project] or [Azure Open AI][azure_openai] to use AI-assisted evaluators
59
+ - [Optional] You must have [Azure AI Foundry Project][ai_project] or [Azure Open AI][azure_openai] to use AI-assisted evaluators
58
60
 
59
61
  ### Install the package
60
62
 
@@ -63,10 +65,6 @@ Install the Azure AI Evaluation SDK for Python with [pip][pip_link]:
  ```bash
  pip install azure-ai-evaluation
  ```
- If you want to track results in [AI Studio][ai_studio], install `remote` extra:
- ```python
- pip install azure-ai-evaluation[remote]
- ```
 
  ## Key concepts
 
@@ -175,9 +173,9 @@ result = evaluate(
  }
  }
  }
- # Optionally provide your AI Studio project information to track your evaluation results in your Azure AI Studio project
+ # Optionally provide your AI Foundry project information to track your evaluation results in your Azure AI Foundry project
  azure_ai_project = azure_ai_project,
- # Optionally provide an output path to dump a json of metric summary, row level data and metric and studio URL
+ # Optionally provide an output path to dump a json of metric summary, row level data and metric and AI Foundry URL
  output_path="./evaluation_results.json"
  )
  ```
@@ -375,8 +373,70 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
  [simulate_with_conversation_starter]: https://github.com/Azure-Samples/azureai-samples/tree/main/scenarios/evaluate/Simulators/Simulate_Context-Relevant_Data/Simulate_From_Conversation_Starter
  [adversarial_jailbreak]: https://learn.microsoft.com/azure/ai-studio/how-to/develop/simulator-interaction-data#simulating-jailbreak-attacks
 
+
  # Release History
 
+ ## 1.4.0 (2025-03-27)
+
+ ### Features Added
+ - Enhanced binary evaluation results with customizable thresholds
+   - Added threshold support for QA and ContentSafety evaluators
+   - Evaluation results now include both the score and threshold values
+   - Configurable threshold parameter allows custom binary classification boundaries
+   - Default thresholds provided for backward compatibility
+   - Quality evaluators use "higher is better" scoring (score ≥ threshold is positive)
+   - Content safety evaluators use "lower is better" scoring (score ≤ threshold is positive)
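The two scoring directions above can be sketched in a few lines of plain Python. The function names and the default threshold of 3.0 here are illustrative only, not the SDK's actual API:

```python
# Minimal sketch of the binary threshold behavior described above.
# Function names and the default threshold are illustrative, not the SDK's API.

def quality_result(score: float, threshold: float = 3.0) -> bool:
    """Quality evaluators: higher is better, so score >= threshold passes."""
    return score >= threshold

def safety_result(score: float, threshold: float = 3.0) -> bool:
    """Content safety evaluators: lower is better, so score <= threshold passes."""
    return score <= threshold

print(quality_result(4.0))  # True: 4.0 >= 3.0
print(safety_result(1.0))   # True: severity 1.0 <= 3.0
print(safety_result(5.0))   # False: severity 5.0 exceeds the threshold
```

Per the notes above, results carry both the raw score and the threshold, so the binary label can always be recomputed or re-derived with a custom boundary.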
+ - New built-in evaluator called `CodeVulnerabilityEvaluator` is added.
+   - It provides capabilities to identify the following code vulnerabilities:
+     - path-injection
+     - sql-injection
+     - code-injection
+     - stack-trace-exposure
+     - incomplete-url-substring-sanitization
+     - flask-debug
+     - clear-text-logging-sensitive-data
+     - incomplete-hostname-regexp
+     - server-side-unvalidated-url-redirection
+     - weak-cryptographic-algorithm
+     - full-ssrf
+     - bind-socket-all-network-interfaces
+     - client-side-unvalidated-url-redirection
+     - likely-bugs
+     - reflected-xss
+     - clear-text-storage-sensitive-data
+     - tarslip
+     - hardcoded-credentials
+     - insecure-randomness
+   - It also supports multiple programming languages, including Python, Java, C++, C#, Go, JavaScript, and SQL.
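To make one of the listed categories concrete, here is a minimal, self-contained illustration of the sql-injection pattern such an evaluator is designed to flag, contrasted with the parameterized form. This is illustrative context only, not SDK usage:

```python
import sqlite3

# The classic sql-injection pattern: untrusted input interpolated directly
# into a SQL string, versus the safe parameterized equivalent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

user_input = "alice' OR '1'='1"  # attacker-controlled value

# Vulnerable: the injected OR clause matches every row.
vulnerable = conn.execute(
    f"SELECT name FROM users WHERE name = '{user_input}'"
).fetchall()

# Safe: the placeholder makes the driver treat the input as a literal value.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(vulnerable)  # [('alice',)] -- the injection bypassed the filter
print(safe)        # [] -- no user is literally named "alice' OR '1'='1"
```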
+
+ - New built-in evaluator called `UngroundedAttributesEvaluator` is added.
+   - It evaluates ungrounded inference of human attributes for a given query, response, and context in a single-turn evaluation, where query represents the user query and response represents the AI system response given the provided context.
+   - It checks whether the response is ungrounded and, if so, whether it contains information about the protected class or emotional state of a person.
+   - It identifies the following attributes:
+     - emotional_state
+     - protected_class
+     - groundedness
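As a rough illustration of what "ungrounded" means here, the toy check below flags emotional-state terms in a response that the context does not support. This is a naive stand-in invented for the example, not the evaluator's actual method; the term list and function name are made up:

```python
# Toy illustration of an "ungrounded emotional state" check: flag emotion
# words in the response that never appear in the context. This is NOT the
# evaluator's real method; the term list is invented for the example.
EMOTION_TERMS = {"furious", "happy", "anxious", "sad"}

def ungrounded_emotions(response: str, context: str) -> set:
    words = lambda s: {w.strip(".,!?").lower() for w in s.split()}
    return (words(response) & EMOTION_TERMS) - words(context)

print(ungrounded_emotions(
    "The customer is furious.",
    "Customer asked for a refund and thanked the agent.",
))  # {'furious'} -- the context never supports that emotional state
```

The real evaluator performs this kind of judgment with an AI-assisted check over query, response, and context rather than keyword matching.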
+ - New built-in evaluators for Agent Evaluation (Preview)
+   - IntentResolutionEvaluator - Evaluates the intent resolution of an agent's response to a user query.
+   - ResponseCompletenessEvaluator - Evaluates the response completeness of an agent's response to a user query.
+   - TaskAdherenceEvaluator - Evaluates the task adherence of an agent's response to a user query.
+   - ToolCallAccuracyEvaluator - Evaluates the accuracy of tool calls made by an agent in response to a user query.
+
+ ### Bugs Fixed
+ - Fixed error in `GroundednessProEvaluator` when handling non-numeric values like "n/a" returned from the service.
+ - Uploading local evaluation results from `evaluate` with the same run name will no longer result in each online run sharing (and overwriting) result files.
+
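The `GroundednessProEvaluator` fix amounts to tolerating non-numeric values coming back from the service. A minimal sketch of that kind of guard follows; the helper name is hypothetical, not the SDK's actual code:

```python
import math

# Hypothetical guard for service scores that may be non-numeric (e.g. "n/a").
# Not the SDK's implementation; shown only to illustrate the shape of the fix.
def coerce_score(value, default=math.nan):
    try:
        return float(value)
    except (TypeError, ValueError):
        return default

print(coerce_score("4"))    # 4.0
print(coerce_score("n/a"))  # nan -- no longer raises
print(coerce_score(None))   # nan
```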
+ ## 1.3.0 (2025-02-28)
+
+ ### Breaking Changes
+ - Multimodal-specific evaluators `ContentSafetyMultimodalEvaluator`, `ViolenceMultimodalEvaluator`, `SexualMultimodalEvaluator`, `SelfHarmMultimodalEvaluator`, `HateUnfairnessMultimodalEvaluator` and `ProtectedMaterialMultimodalEvaluator` have been removed. Please use `ContentSafetyEvaluator`, `ViolenceEvaluator`, `SexualEvaluator`, `SelfHarmEvaluator`, `HateUnfairnessEvaluator` and `ProtectedMaterialEvaluator` instead.
+ - Metric name in ProtectedMaterialEvaluator's output is changed from `protected_material.fictional_characters_label` to `protected_material.fictional_characters_defect_rate`. It's now consistent with other evaluators' metric names (ending with `_defect_rate`).
+
  ## 1.2.0 (2025-01-27)
 
  ### Features Added