langwatch-scenario 0.1.1__py3-none-any.whl → 0.1.3__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {langwatch_scenario-0.1.1.dist-info → langwatch_scenario-0.1.3.dist-info}/METADATA +61 -12
- langwatch_scenario-0.1.3.dist-info/RECORD +15 -0
- {langwatch_scenario-0.1.1.dist-info → langwatch_scenario-0.1.3.dist-info}/WHEEL +1 -1
- scenario/error_messages.py +12 -2
- scenario/pytest_plugin.py +19 -3
- scenario/scenario.py +2 -0
- scenario/scenario_executor.py +10 -10
- scenario/testing_agent.py +4 -1
- langwatch_scenario-0.1.1.dist-info/RECORD +0 -15
- {langwatch_scenario-0.1.1.dist-info → langwatch_scenario-0.1.3.dist-info}/entry_points.txt +0 -0
- {langwatch_scenario-0.1.1.dist-info → langwatch_scenario-0.1.3.dist-info}/top_level.txt +0 -0
{langwatch_scenario-0.1.1.dist-info → langwatch_scenario-0.1.3.dist-info}/METADATA
CHANGED
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: langwatch-scenario
-Version: 0.1.1
+Version: 0.1.3
 Summary: The end-to-end agent testing library
 Author-email: LangWatch Team <support@langwatch.ai>
 License: MIT
@@ -24,14 +24,14 @@ Requires-Dist: pydantic>=2.7.0
 Requires-Dist: joblib>=1.4.2
 Requires-Dist: wrapt>=1.17.2
 Requires-Dist: pytest-asyncio>=0.26.0
-Requires-Dist: rich
+Requires-Dist: rich<15.0.0,>=13.3.3
 Provides-Extra: dev
 Requires-Dist: black; extra == "dev"
 Requires-Dist: isort; extra == "dev"
 Requires-Dist: mypy; extra == "dev"
 Requires-Dist: pytest-cov; extra == "dev"
 
-
+
 
 <div align="center">
 <!-- Discord, PyPI, Docs, etc links -->
@@ -43,6 +43,8 @@ Scenario is a library for testing agents end-to-end as a human would, but withou
 
 You define the scenarios, and the testing agent will simulate your users as it follows them, it will keep chatting and evaluating your agent until it reaches the desired goal or detects an unexpected behavior.
 
+[📺 Video Tutorial](https://www.youtube.com/watch?v=f8NLpkY0Av4)
+
 ## Getting Started
 
 Install pytest and scenario:
@@ -51,12 +53,12 @@ Install pytest and scenario:
 pip install pytest langwatch-scenario
 ```
 
-Now create your first scenario
+Now create your first scenario and save it as `tests/test_vegetarian_recipe_agent.py`:
 
 ```python
 import pytest
 
-from scenario import Scenario, TestingAgent
+from scenario import Scenario, TestingAgent, scenario_cache
 
 Scenario.configure(testing_agent=TestingAgent(model="openai/gpt-4o-mini"))
 
@@ -64,37 +66,78 @@ Scenario.configure(testing_agent=TestingAgent(model="openai/gpt-4o-mini"))
 @pytest.mark.agent_test
 @pytest.mark.asyncio
 async def test_vegetarian_recipe_agent():
+    agent = VegetarianRecipeAgent()
+
     def vegetarian_recipe_agent(message, context):
         # Call your agent here
-
-
-        return {"message": response}
+        return agent.run(message)
 
+    # Define the scenario
     scenario = Scenario(
         "User is looking for a dinner idea",
         agent=vegetarian_recipe_agent,
         success_criteria=[
             "Recipe agent generates a vegetarian recipe",
+            "Recipe includes a list of ingredients",
             "Recipe includes step-by-step cooking instructions",
         ],
         failure_criteria=[
-            "The recipe includes meat",
+            "The recipe is not vegetarian or includes meat",
             "The agent asks more than two follow-up questions",
         ],
     )
 
+    # Run the scenario and get results
     result = await scenario.run()
 
+    # Assert for pytest to know whether the test passed
     assert result.success
+
+
+# Example agent implementation
+import litellm
+
+
+class VegetarianRecipeAgent:
+    def __init__(self):
+        self.history = []
+
+    @scenario_cache()
+    def run(self, message: str):
+        self.history.append({"role": "user", "content": message})
+
+        response = litellm.completion(
+            model="openai/gpt-4o-mini",
+            messages=[
+                {
+                    "role": "system",
+                    "content": """You are a vegetarian recipe agent.
+                    Given the user request, ask AT MOST ONE follow-up question,
+                    then provide a complete recipe. Keep your responses concise and focused.""",
+                },
+                *self.history,
+            ],
+        )
+        message = response.choices[0].message  # type: ignore
+        self.history.append(message)
+
+        return {"messages": [message]}
+
+```
+
+Create a `.env` file and put your OpenAI API key in it:
+
+```bash
+OPENAI_API_KEY=<your-api-key>
 ```
 
-
+Now run it with pytest:
 
 ```bash
 pytest -s tests/test_vegetarian_recipe_agent.py
 ```
 
-
+This is how it will look like:
 
 [](https://asciinema.org/a/nvO5GWGzqKTTCd8gtNSezQw11)
 
@@ -132,7 +175,7 @@ You can find a fully working Lovable Clone example in [examples/test_lovable_clo
 
 ## Debug mode
 
-You can enable debug mode by setting the `debug` field to `True` in the `Scenario.configure` method or in the specific scenario you are running.
+You can enable debug mode by setting the `debug` field to `True` in the `Scenario.configure` method or in the specific scenario you are running, or by passing the `--debug` flag to pytest.
 
 Debug mode allows you to see the messages in slow motion step by step, and intervene with your own inputs to debug your agent from the middle of the conversation.
 
@@ -140,6 +183,12 @@ Debug mode allows you to see the messages in slow motion step by step, and inter
 Scenario.configure(testing_agent=TestingAgent(model="openai/gpt-4o-mini"), debug=True)
 ```
 
+or
+
+```bash
+pytest -s tests/test_vegetarian_recipe_agent.py --debug
+```
+
 ## Cache
 
 Each time the scenario runs, the testing agent might chose a different input to start, this is good to make sure it covers the variance of real users as well, however we understand that the non-deterministic nature of it might make it less repeatable, costly and harder to debug. To solve for it, you can use the `cache_key` field in the `Scenario.configure` method or in the specific scenario you are running, this will make the testing agent give the same input for given the same scenario:
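The Cache paragraph above ends at the hunk boundary before its code sample. A minimal sketch of what pinning a `cache_key` might look like, based on the `Scenario.configure` signature shown in the `scenario/scenario.py` diff below (the key value is illustrative, not from the package):

```python
from scenario import Scenario, TestingAgent

# Pinning a cache_key makes the testing agent reuse the same generated user
# inputs for the same scenario across runs (the value below is illustrative).
Scenario.configure(
    testing_agent=TestingAgent(model="openai/gpt-4o-mini"),
    cache_key="dinner-idea-scenario-v1",
)
```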
langwatch_scenario-0.1.3.dist-info/RECORD
ADDED
@@ -0,0 +1,15 @@
+scenario/__init__.py,sha256=LfCjOpbn55jYBBZHyMSZtRAWeCDFn4z4OhAyFnu8aMg,602
+scenario/cache.py,sha256=sYu16SAf-BnVYkWSlEDzpyynJGIQyNYsgMXPgCqEnmk,1719
+scenario/config.py,sha256=5UVBmuQDtni0Yu00bMh5p0xMGsrymYVRftXBGTsi2fI,802
+scenario/error_messages.py,sha256=ZMcAOKJmKaLIinMZ0yBIOgDhPfeJH0uZxIEmolRArtc,2344
+scenario/pytest_plugin.py,sha256=BuBbyKLa-t9AFVn9EETl7OvGSt__dFO7KnbZynfS1UM,5789
+scenario/result.py,sha256=SGF8uYNtkP7cJy4KsshUozZRevmdiyX2TFzr6VreTv8,2717
+scenario/scenario.py,sha256=tYn3Y1sK6_7pg7hFb_5w0TW6nun-za_4F8kqcnrXXU4,4077
+scenario/scenario_executor.py,sha256=c8xV6GoJgO2JoZBWpYPQN5YwwQ3G9iJUtXV9UGSf1q8,7919
+scenario/testing_agent.py,sha256=eS-c_io5cHgzJ88wwRvU_vve-pmB2HsGWN6qwlq0sPg,10865
+scenario/utils.py,sha256=tMESosrxesA1B5zZB3IJ-sNSXDmnpNNib-DHobveVLA,3918
+langwatch_scenario-0.1.3.dist-info/METADATA,sha256=7OIolGcZ3fkCXFmE6JHkckVCeJb1r3yYSYveJ6iE9zw,8801
+langwatch_scenario-0.1.3.dist-info/WHEEL,sha256=pxyMxgL8-pra_rKaQ4drOZAegBVuX-G_4nRHjjgWbmo,91
+langwatch_scenario-0.1.3.dist-info/entry_points.txt,sha256=WlEnJ_gku0i18bIa3DSuGqXRX-QDQLe_s0YmRzK45TI,45
+langwatch_scenario-0.1.3.dist-info/top_level.txt,sha256=45Mn28aedJsetnBMB5xSmrJ-yo701QLH89Zlz4r1clE,9
+langwatch_scenario-0.1.3.dist-info/RECORD,,
scenario/error_messages.py
CHANGED
@@ -1,3 +1,5 @@
+from textwrap import indent
+from typing import Any
 import termcolor
 
 
@@ -37,9 +39,17 @@ default_config_error_message = f"""
 """
 
 
-message_return_error_message
+def message_return_error_message(got: Any):
+    got_ = got.__repr__()
+    if len(got_) > 100:
+        got_ = got_[:100] + "..."
 
-
+    return f"""
+{termcolor.colored("->", "cyan")} Your agent returned:
+
+{indent(got_, ' ' * 4)}
+
+{termcolor.colored("->", "cyan")} But your agent should return a dict with either a "message" string key or a "messages" key in OpenAI messages format so the testing agent can understand what happened. For example:
 
 def my_agent_under_test(message, context):
     response = call_my_agent(message)
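The reworked `message_return_error_message` now shows the offending return value and reminds the author that the agent under test must return a dict with a `message` string or OpenAI-format `messages`. A minimal sketch of a compliant return shape, reusing the `call_my_agent` placeholder from the error message's own example:

```python
def my_agent_under_test(message, context):
    # call_my_agent is a placeholder for the real agent call.
    response = call_my_agent(message)

    # Either shape passes the validity check in scenario_executor.py:
    return {"message": response}
    # or: return {"messages": [{"role": "assistant", "content": response}]}
```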
scenario/pytest_plugin.py
CHANGED
@@ -11,14 +11,16 @@ from scenario.result import ScenarioResult
 
 from .scenario import Scenario
 
+
 class ScenarioReporterResults(TypedDict):
     scenario: Scenario
     result: ScenarioResult
 
+
 # ScenarioReporter class definition moved outside the fixture for global use
 class ScenarioReporter:
     def __init__(self):
-        self.results
+        self.results: list[ScenarioReporterResults] = []
 
     def add_result(self, scenario, result):
         """Add a test result to the reporter."""
@@ -83,7 +85,12 @@ class ScenarioReporter:
             f"\n{idx}. {scenario.description} - {colored(status, status_color, attrs=['bold'])}{time}"
         )
 
-        print(
+        print(
+            colored(
+                f"   Reasoning: {result.reasoning}",
+                "green" if result.success else "red",
+            )
+        )
 
         if hasattr(result, "met_criteria") and result.met_criteria:
             criteria_count = len(result.met_criteria)
@@ -119,6 +126,10 @@ def pytest_configure(config):
         "markers", "agent_test: mark test as an agent scenario test"
    )
 
+    if config.getoption("--debug"):
+        print(colored("\nScenario debug mode enabled (--debug).", "yellow"))
+        Scenario.configure(verbose=True, debug=True)
+
    # Create a global reporter instance
    config._scenario_reporter = ScenarioReporter()
 
@@ -128,7 +139,12 @@ def pytest_configure(config):
        result = await original_run(self, *args, **kwargs)
 
        # Always report to the global reporter
-
+        # Ensure the reporter exists before adding result
+        if hasattr(config, "_scenario_reporter"):
+            config._scenario_reporter.add_result(self, result)
+        else:
+            # Handle case where reporter might not be initialized (should not happen with current setup)
+            print(colored("Warning: Scenario reporter not found during run.", "yellow"))
 
        return result
 
scenario/scenario.py
CHANGED
@@ -105,6 +105,7 @@ class Scenario(ScenarioConfig):
         max_turns: Optional[int] = None,
         verbose: Optional[Union[bool, int]] = None,
         cache_key: Optional[str] = None,
+        debug: Optional[bool] = None,
     ) -> None:
         existing_config = getattr(cls, "default_config", ScenarioConfig())
 
@@ -114,5 +115,6 @@ class Scenario(ScenarioConfig):
                 max_turns=max_turns,
                 verbose=verbose,
                 cache_key=cache_key,
+                debug=debug,
             )
         )
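The new `debug` parameter on `Scenario.configure` backs the README section above; a minimal sketch of enabling it programmatically rather than via the pytest `--debug` flag:

```python
from scenario import Scenario, TestingAgent

# Enables the step-by-step debug mode described in the README; the 0.1.3
# pytest plugin does roughly the same (plus verbose=True) when --debug is passed.
Scenario.configure(
    testing_agent=TestingAgent(model="openai/gpt-4o-mini"),
    debug=True,
)
```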
scenario/scenario_executor.py
CHANGED
@@ -52,17 +52,17 @@ class ScenarioExecutor:
         # Run the initial testing agent prompt to get started
         total_start_time = time.time()
         context_scenario.set(self.scenario)
-
+        next_message = self._generate_next_message(
             self.scenario, self.conversation, first_message=True
         )
 
-        if isinstance(
+        if isinstance(next_message, ScenarioResult):
             raise Exception(
                 "Unexpectedly generated a ScenarioResult for the initial message",
-
+                next_message.__repr__(),
             )
         elif self.scenario.verbose:
-            print(self._scenario_name() + termcolor.colored("User:", "green"),
+            print(self._scenario_name() + termcolor.colored("User:", "green"), next_message)
 
         # Execute the conversation
         current_turn = 0
@@ -72,14 +72,14 @@ class ScenarioExecutor:
         # Start the test with the initial message
         while current_turn < max_turns:
             # Record the testing agent's message
-            self.conversation.append({"role": "user", "content":
+            self.conversation.append({"role": "user", "content": next_message})
 
             # Get response from the agent under test
             start_time = time.time()
 
             context_scenario.set(self.scenario)
             with show_spinner(text="Agent:", color="blue", enabled=self.scenario.verbose):
-                agent_response = self.scenario.agent(
+                agent_response = self.scenario.agent(next_message, context)
             if isinstance(agent_response, Awaitable):
                 agent_response = await agent_response
 
@@ -97,10 +97,10 @@ class ScenarioExecutor:
                 )
             )
             if not has_valid_message and not has_valid_messages:
-                raise Exception(message_return_error_message)
+                raise Exception(message_return_error_message(agent_response))
 
             messages: list[ChatCompletionMessageParam] = []
-            if has_valid_messages:
+            if has_valid_messages and len(agent_response["messages"]) > 0:
                 messages = agent_response["messages"]
 
             # Drop the first messages both if they are system or user messages
@@ -110,7 +110,7 @@ class ScenarioExecutor:
                 messages = messages[1:]
 
             if has_valid_message and self.scenario.verbose:
-                print(self._scenario_name()
+                print(self._scenario_name() + termcolor.colored("Agent:", "blue"), agent_response["message"])
 
             if messages and self.scenario.verbose:
                 print_openai_messages(self._scenario_name(), messages)
@@ -159,7 +159,7 @@ class ScenarioExecutor:
                 print(self._scenario_name() + termcolor.colored("User:", "green"), result)
 
             # Otherwise, it's the next message to send to the agent
-
+            next_message = result
 
             # Increment turn counter
             current_turn += 1
scenario/testing_agent.py
CHANGED
@@ -249,9 +249,12 @@ if you don't have enough information to make a verdict, say inconclusive with ma
         except json.JSONDecodeError:
             logger.error("Failed to parse tool call arguments")
 
-        # If no tool call
+        # If no tool call use the message content as next message
         message_content = message.content
         if message_content is None:
+            # If invalid tool call, raise an error
+            if message.tool_calls:
+                raise Exception(f"Invalid tool call from testing agent: {message.tool_calls.__repr__()}")
             raise Exception(f"No response from LLM: {response.__repr__()}")
 
         return message_content
langwatch_scenario-0.1.1.dist-info/RECORD
DELETED
@@ -1,15 +0,0 @@
-scenario/__init__.py,sha256=LfCjOpbn55jYBBZHyMSZtRAWeCDFn4z4OhAyFnu8aMg,602
-scenario/cache.py,sha256=sYu16SAf-BnVYkWSlEDzpyynJGIQyNYsgMXPgCqEnmk,1719
-scenario/config.py,sha256=5UVBmuQDtni0Yu00bMh5p0xMGsrymYVRftXBGTsi2fI,802
-scenario/error_messages.py,sha256=8bTwG_iKz7FjGp50FU0anQ1fmI6eJE4NeaoXtiifbBg,2099
-scenario/pytest_plugin.py,sha256=ydtQxaN09qzoo12nNT8BQY_UPPHAt-AH92HWnPEN6bI,5212
-scenario/result.py,sha256=SGF8uYNtkP7cJy4KsshUozZRevmdiyX2TFzr6VreTv8,2717
-scenario/scenario.py,sha256=MqsyiNue1KC4mtvTHnJqJ6Fj3u0TTAdAYann8P8WBBQ,4010
-scenario/scenario_executor.py,sha256=bDzoatslbp80dG6DU-i2VUlOa9SMtyw2VIhcF7knwis,7883
-scenario/testing_agent.py,sha256=wMK2GqmN4QDr0kFoxgqcAPsU6gjCx8HBJQv1wmsdSb4,10683
-scenario/utils.py,sha256=tMESosrxesA1B5zZB3IJ-sNSXDmnpNNib-DHobveVLA,3918
-langwatch_scenario-0.1.1.dist-info/METADATA,sha256=SL8rtzUuSwBthrIfjiSLpPNxFt1kX8Vd1TzETBw1oys,7435
-langwatch_scenario-0.1.1.dist-info/WHEEL,sha256=CmyFI0kx5cdEMTLiONQRbGQwjIoR1aIYB7eCAQ4KPJ0,91
-langwatch_scenario-0.1.1.dist-info/entry_points.txt,sha256=WlEnJ_gku0i18bIa3DSuGqXRX-QDQLe_s0YmRzK45TI,45
-langwatch_scenario-0.1.1.dist-info/top_level.txt,sha256=45Mn28aedJsetnBMB5xSmrJ-yo701QLH89Zlz4r1clE,9
-langwatch_scenario-0.1.1.dist-info/RECORD,,
{langwatch_scenario-0.1.1.dist-info → langwatch_scenario-0.1.3.dist-info}/entry_points.txt
File without changes

{langwatch_scenario-0.1.1.dist-info → langwatch_scenario-0.1.3.dist-info}/top_level.txt
File without changes