strands-agents-evals 0.1.0__tar.gz → 0.1.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (113)
  1. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/.github/workflows/pypi-publish-on-release.yml +2 -2
  2. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/PKG-INFO +72 -3
  3. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/README.md +71 -2
  4. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/pyproject.toml +2 -1
  5. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/__init__.py +0 -2
  6. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/experiment.py +3 -3
  7. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/extractors/tools_use_extractor.py +5 -1
  8. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/extractors/test_tools_use_extractor.py +34 -0
  9. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/test_integration.py +42 -0
  10. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/.github/ISSUE_TEMPLATE/bug_report.yml +0 -0
  11. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/.github/ISSUE_TEMPLATE/config.yml +0 -0
  12. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/.github/ISSUE_TEMPLATE/feature_request.yml +0 -0
  13. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/.github/PULL_REQUEST_TEMPLATE.md +0 -0
  14. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/.github/dependabot.yml +0 -0
  15. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/.github/workflows/integration-test.yml +0 -0
  16. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/.github/workflows/pr-and-push.yml +0 -0
  17. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/.github/workflows/test-lint.yml +0 -0
  18. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/.gitignore +0 -0
  19. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/.pre-commit-config.yaml +0 -0
  20. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/CODE_OF_CONDUCT.md +0 -0
  21. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/CONTRIBUTING.md +0 -0
  22. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/LICENSE +0 -0
  23. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/NOTICE +0 -0
  24. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/STYLE_GUIDE.md +0 -0
  25. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/__init__.py +0 -0
  26. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/case.py +0 -0
  27. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/display/display_console.py +0 -0
  28. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/__init__.py +0 -0
  29. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/evaluator.py +0 -0
  30. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/faithfulness_evaluator.py +0 -0
  31. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/goal_success_rate_evaluator.py +0 -0
  32. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/harmfulness_evaluator.py +0 -0
  33. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/helpfulness_evaluator.py +0 -0
  34. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/interactions_evaluator.py +0 -0
  35. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/output_evaluator.py +0 -0
  36. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/prompt_templates/case_prompt_template.py +0 -0
  37. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/prompt_templates/faithfulness/__init__.py +0 -0
  38. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/prompt_templates/faithfulness/faithfulness_v0.py +0 -0
  39. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/prompt_templates/goal_success_rate/__init__.py +0 -0
  40. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/prompt_templates/goal_success_rate/goal_success_rate_v0.py +0 -0
  41. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/prompt_templates/harmfulness/__init__.py +0 -0
  42. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/prompt_templates/harmfulness/harmfulness_v0.py +0 -0
  43. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/prompt_templates/helpfulness/__init__.py +0 -0
  44. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/prompt_templates/helpfulness/helpfulness_v0.py +0 -0
  45. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/prompt_templates/prompt_templates.py +0 -0
  46. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/prompt_templates/tool_parameter_accuracy/__init__.py +0 -0
  47. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/prompt_templates/tool_parameter_accuracy/tool_parameter_accuracy_v0.py +0 -0
  48. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/prompt_templates/tool_selection_accuracy/__init__.py +0 -0
  49. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/prompt_templates/tool_selection_accuracy/tool_selection_accuracy_v0.py +0 -0
  50. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/tool_parameter_accuracy_evaluator.py +0 -0
  51. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/tool_selection_accuracy_evaluator.py +0 -0
  52. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/evaluators/trajectory_evaluator.py +0 -0
  53. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/extractors/__init__.py +0 -0
  54. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/extractors/graph_extractor.py +0 -0
  55. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/extractors/swarm_extractor.py +0 -0
  56. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/extractors/trace_extractor.py +0 -0
  57. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/generators/__init__.py +0 -0
  58. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/generators/experiment_generator.py +0 -0
  59. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/generators/prompt_template/prompt_templates.py +0 -0
  60. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/generators/topic_planner.py +0 -0
  61. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/mappers/__init__.py +0 -0
  62. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/mappers/session_mapper.py +0 -0
  63. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/mappers/strands_in_memory_session_mapper.py +0 -0
  64. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/simulation/README.md +0 -0
  65. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/simulation/__init__.py +0 -0
  66. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/simulation/actor_simulator.py +0 -0
  67. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/simulation/profiles/__init__.py +0 -0
  68. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/simulation/profiles/actor_profile.py +0 -0
  69. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/simulation/prompt_templates/__init__.py +0 -0
  70. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/simulation/prompt_templates/actor_profile_extraction.py +0 -0
  71. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/simulation/prompt_templates/actor_system_prompt.py +0 -0
  72. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/simulation/prompt_templates/goal_completion.py +0 -0
  73. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/simulation/tools/__init__.py +0 -0
  74. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/simulation/tools/goal_completion.py +0 -0
  75. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/telemetry/__init__.py +0 -0
  76. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/telemetry/_cloudwatch_logger.py +0 -0
  77. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/telemetry/config.py +0 -0
  78. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/telemetry/tracer.py +0 -0
  79. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/tools/evaluation_tools.py +0 -0
  80. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/types/__init__.py +0 -0
  81. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/types/evaluation.py +0 -0
  82. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/types/evaluation_report.py +0 -0
  83. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/types/simulation/__init__.py +0 -0
  84. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/types/simulation/actor.py +0 -0
  85. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/src/strands_evals/types/trace.py +0 -0
  86. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/__init__.py +0 -0
  87. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/evaluators/test_evaluator.py +0 -0
  88. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/evaluators/test_faithfulness_evaluator.py +0 -0
  89. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/evaluators/test_goal_success_rate_evaluator.py +0 -0
  90. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/evaluators/test_harmfulness_evaluator.py +0 -0
  91. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/evaluators/test_helpfulness_evaluator.py +0 -0
  92. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/evaluators/test_interactions_evaluator.py +0 -0
  93. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/evaluators/test_output_evaluator.py +0 -0
  94. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/evaluators/test_tool_parameter_accuracy_evaluator.py +0 -0
  95. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/evaluators/test_tool_selection_accuracy_evaluator.py +0 -0
  96. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/evaluators/test_trajectory_evaluator.py +0 -0
  97. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/extractors/test_graph_extractor.py +0 -0
  98. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/extractors/test_swarm_extractor.py +0 -0
  99. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/extractors/test_trace_extractor.py +0 -0
  100. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/generators/test_experiment_generator.py +0 -0
  101. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/generators/test_topic_planner.py +0 -0
  102. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/mappers/__init__.py +0 -0
  103. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/mappers/test_strands_in_memory_mapper.py +0 -0
  104. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/simulation/__init__.py +0 -0
  105. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/simulation/test_actor_simulator.py +0 -0
  106. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/simulation/test_goal_completion.py +0 -0
  107. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/telemetry/test_config.py +0 -0
  108. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/telemetry/test_tracer.py +0 -0
  109. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/test_cases.py +0 -0
  110. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/test_experiment.py +0 -0
  111. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/tools/test_evaluation_tools.py +0 -0
  112. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests/strands_evals/types/test_trace.py +0 -0
  113. {strands_agents_evals-0.1.0 → strands_agents_evals-0.1.1}/tests_integ/test_output_evaluator.py +0 -0
--- a/.github/workflows/pypi-publish-on-release.yml
+++ b/.github/workflows/pypi-publish-on-release.yml
@@ -52,7 +52,7 @@ jobs:
           hatch build
 
       - name: Store the distribution packages
-        uses: actions/upload-artifact@v5
+        uses: actions/upload-artifact@v6
         with:
           name: python-package-distributions
           path: dist/
@@ -74,7 +74,7 @@ jobs:
 
     steps:
       - name: Download all the dists
-        uses: actions/download-artifact@v4
+        uses: actions/download-artifact@v7
        with:
           name: python-package-distributions
           path: dist/
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: strands-agents-evals
-Version: 0.1.0
+Version: 0.1.1
 Summary: Evaluation framework for Strands
 Author-email: AWS <opensource@amazon.com>
 License: Apache-2.0
@@ -68,6 +68,7 @@ Strands Evaluation is a powerful framework for evaluating AI agents and LLM appl
 ## Feature Overview
 
 - **Multiple Evaluation Types**: Output evaluation, trajectory analysis, tool usage assessment, and interaction evaluation
+- **Dynamic Simulators**: Multi-turn conversation simulation with realistic user behavior and goal-oriented interactions
 - **LLM-as-a-Judge**: Built-in evaluators using language models for sophisticated assessment with structured scoring
 - **Trace-based Evaluation**: Analyze agent behavior through OpenTelemetry execution traces
 - **Automated Experiment Generation**: Generate comprehensive test suites from context descriptions
@@ -226,6 +227,73 @@ reports = experiment.run_evaluations(user_task_function)
 reports[0].run_display()
 ```
 
+### Multi-turn Conversation Simulation
+
+Simulate realistic user interactions with dynamic, goal-oriented conversations using ActorSimulator:
+
+```python
+from strands import Agent
+from strands_evals import Case, Experiment, ActorSimulator
+from strands_evals.evaluators import HelpfulnessEvaluator, GoalSuccessRateEvaluator
+from strands_evals.mappers import StrandsInMemorySessionMapper
+from strands_evals.telemetry import StrandsEvalsTelemetry
+
+# Setup telemetry
+telemetry = StrandsEvalsTelemetry().setup_in_memory_exporter()
+memory_exporter = telemetry.in_memory_exporter
+
+def task_function(case: Case) -> dict:
+    # Create simulator to drive conversation
+    simulator = ActorSimulator.from_case_for_user_simulator(
+        case=case,
+        max_turns=10
+    )
+
+    # Create agent to evaluate
+    agent = Agent(
+        trace_attributes={
+            "gen_ai.conversation.id": case.session_id,
+            "session.id": case.session_id
+        },
+        callback_handler=None
+    )
+
+    # Run multi-turn conversation
+    all_spans = []
+    user_message = case.input
+
+    while simulator.has_next():
+        memory_exporter.clear()
+        agent_response = agent(user_message)
+        turn_spans = list(memory_exporter.get_finished_spans())
+        all_spans.extend(turn_spans)
+
+        user_result = simulator.act(str(agent_response))
+        user_message = str(user_result.structured_output.message)
+
+    # Map to session for evaluation
+    mapper = StrandsInMemorySessionMapper()
+    session = mapper.map_to_session(all_spans, session_id=case.session_id)
+
+    return {"output": str(agent_response), "trajectory": session}
+
+# Use evaluators to assess simulated conversations
+evaluators = [
+    HelpfulnessEvaluator(),
+    GoalSuccessRateEvaluator()
+]
+
+experiment = Experiment(cases=test_cases, evaluators=evaluators)
+reports = experiment.run_evaluations(task_function)
+```
+
+**Key Benefits:**
+- **Dynamic Interactions**: Simulator adapts responses based on agent behavior
+- **Goal-Oriented Testing**: Verify agents can complete user objectives through dialogue
+- **Realistic Conversations**: Generate authentic multi-turn interaction patterns
+- **No Predefined Scripts**: Test agents without hardcoded conversation paths
+- **Comprehensive Evaluation**: Combine with trace-based evaluators for full assessment
+
 ### Automated Experiment Generation
 
 Generate comprehensive test suites automatically from context descriptions:
@@ -388,8 +456,9 @@ reports[0].run_display() # Interactive display with metrics breakdown
 
 For detailed guidance & examples, explore our documentation:
 
-- [User Guide](https://strandsagents.com/latest//user-guide/evals-sdk/quickstart.md)
-- [Evaluator Reference](https://strandsagents.com/latest/user-guide/evals-sdk/evaluators/)
+- [User Guide](https://strandsagents.com/latest/documentation/docs/user-guide/evals-sdk/quickstart/)
+- [Evaluator Reference](https://strandsagents.com/latest/documentation/docs/user-guide/evals-sdk/evaluators/)
+- [Simulators Guide](https://strandsagents.com/latest/documentation/docs/user-guide/evals-sdk/simulators/)
 
 ## Contributing ❤️
 
--- a/README.md
+++ b/README.md
@@ -36,6 +36,7 @@ Strands Evaluation is a powerful framework for evaluating AI agents and LLM appl
 ## Feature Overview
 
 - **Multiple Evaluation Types**: Output evaluation, trajectory analysis, tool usage assessment, and interaction evaluation
+- **Dynamic Simulators**: Multi-turn conversation simulation with realistic user behavior and goal-oriented interactions
 - **LLM-as-a-Judge**: Built-in evaluators using language models for sophisticated assessment with structured scoring
 - **Trace-based Evaluation**: Analyze agent behavior through OpenTelemetry execution traces
 - **Automated Experiment Generation**: Generate comprehensive test suites from context descriptions
@@ -194,6 +195,73 @@ reports = experiment.run_evaluations(user_task_function)
 reports[0].run_display()
 ```
 
+### Multi-turn Conversation Simulation
+
+Simulate realistic user interactions with dynamic, goal-oriented conversations using ActorSimulator:
+
+```python
+from strands import Agent
+from strands_evals import Case, Experiment, ActorSimulator
+from strands_evals.evaluators import HelpfulnessEvaluator, GoalSuccessRateEvaluator
+from strands_evals.mappers import StrandsInMemorySessionMapper
+from strands_evals.telemetry import StrandsEvalsTelemetry
+
+# Setup telemetry
+telemetry = StrandsEvalsTelemetry().setup_in_memory_exporter()
+memory_exporter = telemetry.in_memory_exporter
+
+def task_function(case: Case) -> dict:
+    # Create simulator to drive conversation
+    simulator = ActorSimulator.from_case_for_user_simulator(
+        case=case,
+        max_turns=10
+    )
+
+    # Create agent to evaluate
+    agent = Agent(
+        trace_attributes={
+            "gen_ai.conversation.id": case.session_id,
+            "session.id": case.session_id
+        },
+        callback_handler=None
+    )
+
+    # Run multi-turn conversation
+    all_spans = []
+    user_message = case.input
+
+    while simulator.has_next():
+        memory_exporter.clear()
+        agent_response = agent(user_message)
+        turn_spans = list(memory_exporter.get_finished_spans())
+        all_spans.extend(turn_spans)
+
+        user_result = simulator.act(str(agent_response))
+        user_message = str(user_result.structured_output.message)
+
+    # Map to session for evaluation
+    mapper = StrandsInMemorySessionMapper()
+    session = mapper.map_to_session(all_spans, session_id=case.session_id)
+
+    return {"output": str(agent_response), "trajectory": session}
+
+# Use evaluators to assess simulated conversations
+evaluators = [
+    HelpfulnessEvaluator(),
+    GoalSuccessRateEvaluator()
+]
+
+experiment = Experiment(cases=test_cases, evaluators=evaluators)
+reports = experiment.run_evaluations(task_function)
+```
+
+**Key Benefits:**
+- **Dynamic Interactions**: Simulator adapts responses based on agent behavior
+- **Goal-Oriented Testing**: Verify agents can complete user objectives through dialogue
+- **Realistic Conversations**: Generate authentic multi-turn interaction patterns
+- **No Predefined Scripts**: Test agents without hardcoded conversation paths
+- **Comprehensive Evaluation**: Combine with trace-based evaluators for full assessment
+
 ### Automated Experiment Generation
 
 Generate comprehensive test suites automatically from context descriptions:
@@ -356,8 +424,9 @@ reports[0].run_display() # Interactive display with metrics breakdown
 
 For detailed guidance & examples, explore our documentation:
 
-- [User Guide](https://strandsagents.com/latest//user-guide/evals-sdk/quickstart.md)
-- [Evaluator Reference](https://strandsagents.com/latest/user-guide/evals-sdk/evaluators/)
+- [User Guide](https://strandsagents.com/latest/documentation/docs/user-guide/evals-sdk/quickstart/)
+- [Evaluator Reference](https://strandsagents.com/latest/documentation/docs/user-guide/evals-sdk/evaluators/)
+- [Simulators Guide](https://strandsagents.com/latest/documentation/docs/user-guide/evals-sdk/simulators/)
 
 ## Contributing ❤️
 
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -138,7 +138,8 @@ disable_error_code = [
 disallow_untyped_decorators = false
 
 [tool.hatch.version]
-path = "src/strands_evals/__init__.py"
+source = "vcs" # Use git tags for versioning
+
 [tool.pytest.ini_options]
 asyncio_mode = "auto"
 testpaths = ["tests"]
--- a/src/strands_evals/__init__.py
+++ b/src/strands_evals/__init__.py
@@ -1,5 +1,3 @@
-__version__ = "0.1.0"
-
 from . import evaluators, extractors, generators, simulation, telemetry, types
 from .case import Case
 from .experiment import Experiment
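
Note on the pair of versioning changes above: with `source = "vcs"` the version is derived from git tags at build time (hatch's vcs source, typically provided by the hatch-vcs plugin), so the hardcoded `__version__` in `__init__.py` becomes redundant and is dropped. Code that still needs the version at runtime can read it from the installed package metadata. A minimal standard-library sketch, assuming the PyPI distribution name from the PKG-INFO above:

```python
# Sketch: recover the installed version now that strands_evals
# no longer defines __version__ itself.
from importlib.metadata import PackageNotFoundError, version

try:
    # Use the distribution name on PyPI, not the import name.
    pkg_version = version("strands-agents-evals")
except PackageNotFoundError:
    # Not installed (e.g., running from a bare source checkout).
    pkg_version = "0.0.0+unknown"

print(pkg_version)
```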
--- a/src/strands_evals/experiment.py
+++ b/src/strands_evals/experiment.py
@@ -577,8 +577,8 @@ class Experiment(Generic[InputT, OutputT]):
 
         file_path.parent.mkdir(parents=True, exist_ok=True)
 
-        with open(file_path, "w") as f:
-            json.dump(self.to_dict(), f, indent=2)
+        with open(file_path, "w", encoding="utf-8") as f:
+            json.dump(self.to_dict(), f, indent=2, ensure_ascii=False)
 
     @classmethod
     def from_dict(cls, data: dict, custom_evaluators: list[type[Evaluator]] | None = None):
@@ -646,7 +646,7 @@ class Experiment(Generic[InputT, OutputT]):
                 f"Only .json format is supported. Got file: {path}. Please provide a path with .json extension."
             )
 
-        with open(file_path, "r") as f:
+        with open(file_path, "r", encoding="utf-8") as f:
             data = json.load(f)
 
         return cls.from_dict(data, custom_evaluators)
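
The `experiment.py` change is about round-tripping non-ASCII content: without `encoding="utf-8"`, `open()` falls back to the platform's locale encoding (e.g. cp1252 on Windows), and without `ensure_ascii=False`, `json.dump` escapes every non-ASCII character as `\uXXXX`. A self-contained sketch of the difference:

```python
import json
import tempfile
from pathlib import Path

data = {"output": "café ✓"}

# Default: non-ASCII characters are escaped to \uXXXX sequences.
print(json.dumps(data))                      # {"output": "caf\u00e9 \u2713"}
# With ensure_ascii=False the serialized text stays human-readable.
print(json.dumps(data, ensure_ascii=False))  # {"output": "café ✓"}

# Writing readable UTF-8 also requires pinning the file encoding,
# since open() otherwise uses the platform's locale encoding.
path = Path(tempfile.gettempdir()) / "experiment.json"
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)
with open(path, "r", encoding="utf-8") as f:
    assert json.load(f) == data  # lossless round trip
```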
--- a/src/strands_evals/extractors/tools_use_extractor.py
+++ b/src/strands_evals/extractors/tools_use_extractor.py
@@ -33,6 +33,7 @@ def extract_agent_tools_used_from_messages(agent_messages):
                 tool_id = tool.get("toolUseId")
                 # get the tool result from the next message
                 tool_result = None
+                is_error = False
                 next_message_i = i + 1
                 while next_message_i < len(agent_messages):
                     next_message = agent_messages[next_message_i]
@@ -46,9 +47,12 @@
                         tool_result_content = tool_result_dict.get("content", [])
                         if len(tool_result_content) > 0:
                             tool_result = tool_result_content[0].get("text")
+                            is_error = tool_result_dict.get("status") == "error"
                         break
 
-                tools_used.append({"name": tool_name, "input": tool_input, "tool_result": tool_result})
+                tools_used.append(
+                    {"name": tool_name, "input": tool_input, "tool_result": tool_result, "is_error": is_error}
+                )
     return tools_used
 
 
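With this change, each extracted tool-use entry carries an `is_error` flag taken from the `toolResult` status, so downstream evaluators can separate failed tool calls from successful ones. A short usage sketch; the message fixture below is hypothetical, in the same Bedrock-style format the package's tests use:

```python
from strands_evals.extractors.tools_use_extractor import extract_agent_tools_used_from_messages

# Hypothetical conversation containing one failing tool call.
messages = [
    {"role": "user", "content": [{"text": "Divide 1 by 0"}]},
    {
        "role": "assistant",
        "content": [{"toolUse": {"toolUseId": "t1", "name": "calculator", "input": {"expression": "1/0"}}}],
    },
    {
        "role": "user",
        "content": [{"toolResult": {"status": "error", "content": [{"text": "division by zero"}], "toolUseId": "t1"}}],
    },
]

tools_used = extract_agent_tools_used_from_messages(messages)

# Partition calls by outcome using the new flag.
# Each entry has keys: name, input, tool_result, is_error.
failed = [t for t in tools_used if t["is_error"]]
succeeded = [t for t in tools_used if not t["is_error"]]
print(f"{len(failed)} failed, {len(succeeded)} succeeded")
```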
--- a/tests/strands_evals/extractors/test_tools_use_extractor.py
+++ b/tests/strands_evals/extractors/test_tools_use_extractor.py
@@ -45,6 +45,7 @@ def test_tools_use_extractor_extract_from_messages_with_tools():
     assert result[0]["name"] == "calculator"
     assert result[0]["input"] == {"expression": "2+2"}
     assert result[0]["tool_result"] == "Result: 4"
+    assert result[0]["is_error"] is False
 
 
 def test_tools_use_extractor_extract_from_messages_no_tools():
@@ -59,6 +60,38 @@ def test_tools_use_extractor_extract_from_messages_no_tools():
     assert result == []
 
 
+def test_tools_use_extractor_extract_from_messages_with_error():
+    """Test extracting tool usage from messages with error status"""
+    messages = [
+        {"role": "user", "content": [{"text": "Calculate invalid"}]},
+        {
+            "role": "assistant",
+            "content": [
+                {"toolUse": {"toolUseId": "tool_123", "name": "calculator", "input": {"expression": "invalid"}}},
+            ],
+        },
+        {
+            "role": "user",
+            "content": [
+                {
+                    "toolResult": {
+                        "status": "error",
+                        "content": [{"text": "Invalid expression"}],
+                        "toolUseId": "tool_123",
+                    }
+                }
+            ],
+        },
+    ]
+
+    result = extract_agent_tools_used_from_messages(messages)
+
+    assert len(result) == 1
+    assert result[0]["name"] == "calculator"
+    assert result[0]["tool_result"] == "Invalid expression"
+    assert result[0]["is_error"] is True
+
+
 def test_tools_use_extractor_extract_from_messages_empty():
     """Test extracting tool usage from empty messages"""
     result = extract_agent_tools_used_from_messages([])
@@ -96,6 +129,7 @@ def test_tools_use_extractor_extract_from_messages_no_tool_result():
     assert result[0]["name"] == "calculator"
     assert result[0]["input"] == {"expression": "2+2"}
     assert result[0]["tool_result"] is None
+    assert result[0]["is_error"] is False
 
 
 def test_tools_use_extractor_extract_from_messages_malformed_tool_result():
--- a/tests/test_integration.py
+++ b/tests/test_integration.py
@@ -348,3 +348,45 @@ async def test_async_dataset_with_interactions(interaction_case):
     assert len(report.cases) == 1
     assert report.cases[0].get("actual_interactions") is not None
     assert len(report.cases[0].get("actual_interactions")) == 2
+
+
+def test_integration_tool_error_extraction():
+    """Test that is_error field is correctly extracted from tool execution"""
+    from strands_evals.extractors.tools_use_extractor import extract_agent_tools_used_from_messages
+
+    # Create mock messages simulating tool success and error
+    messages = [
+        {"role": "user", "content": [{"text": "test"}]},
+        {
+            "role": "assistant",
+            "content": [
+                {"toolUse": {"toolUseId": "tool1", "name": "success_tool", "input": {}}},
+            ],
+        },
+        {
+            "role": "user",
+            "content": [
+                {"toolResult": {"status": "success", "content": [{"text": "ok"}], "toolUseId": "tool1"}},
+            ],
+        },
+        {
+            "role": "assistant",
+            "content": [
+                {"toolUse": {"toolUseId": "tool2", "name": "error_tool", "input": {}}},
+            ],
+        },
+        {
+            "role": "user",
+            "content": [
+                {"toolResult": {"status": "error", "content": [{"text": "failed"}], "toolUseId": "tool2"}},
+            ],
+        },
+    ]
+
+    tools_used = extract_agent_tools_used_from_messages(messages)
+
+    assert len(tools_used) == 2
+    assert tools_used[0]["name"] == "success_tool"
+    assert tools_used[0]["is_error"] is False
+    assert tools_used[1]["name"] == "error_tool"
+    assert tools_used[1]["is_error"] is True