vision-agent 0.2.55__tar.gz → 0.2.57__tar.gz

This diff represents the content of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between those versions as they appear in their respective public registries.
Files changed (38)
  1. {vision_agent-0.2.55 → vision_agent-0.2.57}/PKG-INFO +48 -15
  2. {vision_agent-0.2.55 → vision_agent-0.2.57}/README.md +47 -14
  3. {vision_agent-0.2.55 → vision_agent-0.2.57}/pyproject.toml +1 -1
  4. vision_agent-0.2.57/vision_agent/__init__.py +2 -0
  5. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/agent/agent.py +3 -1
  6. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/agent/vision_agent.py +110 -81
  7. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/agent/vision_agent_prompts.py +1 -1
  8. vision_agent-0.2.57/vision_agent/lmm/__init__.py +1 -0
  9. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/lmm/lmm.py +54 -116
  10. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/tools/__init__.py +2 -1
  11. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/tools/tools.py +3 -3
  12. vision_agent-0.2.55/vision_agent/__init__.py +0 -3
  13. vision_agent-0.2.55/vision_agent/agent/agent_coder.py +0 -216
  14. vision_agent-0.2.55/vision_agent/agent/agent_coder_prompts.py +0 -135
  15. vision_agent-0.2.55/vision_agent/agent/data_interpreter.py +0 -475
  16. vision_agent-0.2.55/vision_agent/agent/data_interpreter_prompts.py +0 -186
  17. vision_agent-0.2.55/vision_agent/agent/easytool.py +0 -346
  18. vision_agent-0.2.55/vision_agent/agent/easytool_prompts.py +0 -89
  19. vision_agent-0.2.55/vision_agent/agent/easytool_v2.py +0 -781
  20. vision_agent-0.2.55/vision_agent/agent/easytool_v2_prompts.py +0 -152
  21. vision_agent-0.2.55/vision_agent/agent/reflexion.py +0 -299
  22. vision_agent-0.2.55/vision_agent/agent/reflexion_prompts.py +0 -100
  23. vision_agent-0.2.55/vision_agent/llm/__init__.py +0 -1
  24. vision_agent-0.2.55/vision_agent/llm/llm.py +0 -176
  25. vision_agent-0.2.55/vision_agent/lmm/__init__.py +0 -1
  26. vision_agent-0.2.55/vision_agent/tools/easytool_tools.py +0 -1242
  27. {vision_agent-0.2.55 → vision_agent-0.2.57}/LICENSE +0 -0
  28. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/agent/__init__.py +0 -0
  29. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/fonts/__init__.py +0 -0
  30. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/fonts/default_font_ch_en.ttf +0 -0
  31. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/tools/prompts.py +0 -0
  32. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/tools/tool_utils.py +0 -0
  33. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/utils/__init__.py +0 -0
  34. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/utils/execute.py +0 -0
  35. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/utils/image_utils.py +0 -0
  36. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/utils/sim.py +0 -0
  37. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/utils/type_defs.py +0 -0
  38. {vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/utils/video.py +0 -0
{vision_agent-0.2.55 → vision_agent-0.2.57}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: vision-agent
-Version: 0.2.55
+Version: 0.2.57
 Summary: Toolset for Vision Agent
 Author: Landing AI
 Author-email: dev@landing.ai
@@ -38,7 +38,6 @@ Description-Content-Type: text/markdown
 <img alt="vision_agent" height="200px" src="https://github.com/landing-ai/vision-agent/blob/main/assets/logo.jpg?raw=true">
 
 # 🔍🤖 Vision Agent
-
 [![](https://dcbadge.vercel.app/api/server/wPdN8RCYew?compact=true&style=flat)](https://discord.gg/wPdN8RCYew)
 ![ci_status](https://github.com/landing-ai/vision-agent/actions/workflows/ci_cd.yml/badge.svg)
 [![PyPI version](https://badge.fury.io/py/vision-agent.svg)](https://badge.fury.io/py/vision-agent)
@@ -52,9 +51,14 @@ accomplish the task you want. Vision Agent aims to provide an in-seconds experie
 allowing users to describe their problem in text and have the agent framework generate
 code to solve the task for them. Check out our discord for updates and roadmaps!
 
+
+## Web Application
+
+Try Vision Agent live on [va.landing.ai](https://va.landing.ai/)
+
 ## Documentation
 
-- [Vision Agent Library Docs](https://landing-ai.github.io/vision-agent/)
+[Vision Agent Library Docs](https://landing-ai.github.io/vision-agent/)
 
 
 ## Getting Started
@@ -73,6 +77,7 @@ export OPENAI_API_KEY="your-api-key"
 ```
 
 ### Vision Agent
+#### Basic Usage
 You can interact with the agent as you would with any LLM or LMM model:
 
 ```python
@@ -88,28 +93,28 @@ from vision_agent.tools import load_image, grounding_sam
 def calculate_filled_percentage(image_path: str) -> float:
     # Step 1: Load the image
     image = load_image(image_path)
-
+
     # Step 2: Segment the jar
     jar_segments = grounding_sam(prompt="jar", image=image)
-
+
     # Step 3: Segment the coffee beans
     coffee_beans_segments = grounding_sam(prompt="coffee beans", image=image)
-
+
     # Step 4: Calculate the area of the segmented jar
     jar_area = 0
     for segment in jar_segments:
        jar_area += segment['mask'].sum()
-
+
     # Step 5: Calculate the area of the segmented coffee beans
     coffee_beans_area = 0
     for segment in coffee_beans_segments:
        coffee_beans_area += segment['mask'].sum()
-
+
     # Step 6: Compute the percentage of the jar area that is filled with coffee beans
     if jar_area == 0:
        return 0.0  # To avoid division by zero
     filled_percentage = (coffee_beans_area / jar_area) * 100
-
+
     # Step 7: Return the computed percentage
     return filled_percentage
 ```
@@ -121,10 +126,12 @@ mode by passing in the verbose argument:
 >>> agent = VisionAgent(verbose=2)
 ```
 
-You can also have it return more information by calling `chat_with_workflow`:
+#### Detailed Usage
+You can also have it return more information by calling `chat_with_workflow`. The format
+of the input is a list of dictionaries with the keys `role`, `content`, and `media`:
 
 ```python
->>> results = agent.chat_with_workflow([{"role": "user", "content": "What percentage of the area of the jar is filled with coffee beans?"}], media="jar.jpg")
+>>> results = agent.chat_with_workflow([{"role": "user", "content": "What percentage of the area of the jar is filled with coffee beans?", "media": ["jar.jpg"]}])
 >>> print(results)
 {
     "code": "from vision_agent.tools import ..."
@@ -135,19 +142,45 @@ You can also have it return more information by calling `chat_with_workflow`:
 }
 ```
 
-With this you can examine more detailed information such as the etesting code, testing
+With this you can examine more detailed information such as the testing code, testing
 results, plan or working memory it used to complete the task.
 
+#### Multi-turn conversations
+You can have multi-turn conversations with vision-agent as well, giving it feedback on
+the code and having it update. You just need to add the code as a response from the
+assistant:
+
+```python
+agent = va.agent.VisionAgent(verbosity=2)
+conv = [
+    {
+        "role": "user",
+        "content": "Are these workers wearing safety gear? Output only a True or False value.",
+        "media": ["workers.png"],
+    }
+]
+result = agent.chat_with_workflow(conv)
+code = result["code"]
+conv.append({"role": "assistant", "content": code})
+conv.append(
+    {
+        "role": "user",
+        "content": "Can you also return the number of workers wearing safety gear?",
+    }
+)
+result = agent.chat_with_workflow(conv)
+```
+
 ### Tools
 There are a variety of tools for the model or the user to use. Some are executed locally
-while others are hosted for you. You can also ask an LLM directly to build a tool for
+while others are hosted for you. You can also ask an LMM directly to build a tool for
 you. For example:
 
 ```python
 >>> import vision_agent as va
->>> llm = va.llm.OpenAILLM()
+>>> llm = va.llm.OpenAILMM()
 >>> detector = llm.generate_detector("Can you build a jar detector for me?")
->>> detector("jar.jpg")
+>>> detector(va.tools.load_image("jar.jpg"))
 [{"labels": ["jar",],
   "scores": [0.99],
   "bboxes": [
{vision_agent-0.2.55 → vision_agent-0.2.57}/README.md

@@ -2,7 +2,6 @@
 <img alt="vision_agent" height="200px" src="https://github.com/landing-ai/vision-agent/blob/main/assets/logo.jpg?raw=true">
 
 # 🔍🤖 Vision Agent
-
 [![](https://dcbadge.vercel.app/api/server/wPdN8RCYew?compact=true&style=flat)](https://discord.gg/wPdN8RCYew)
 ![ci_status](https://github.com/landing-ai/vision-agent/actions/workflows/ci_cd.yml/badge.svg)
 [![PyPI version](https://badge.fury.io/py/vision-agent.svg)](https://badge.fury.io/py/vision-agent)
@@ -16,9 +15,14 @@ accomplish the task you want. Vision Agent aims to provide an in-seconds experie
 allowing users to describe their problem in text and have the agent framework generate
 code to solve the task for them. Check out our discord for updates and roadmaps!
 
+
+## Web Application
+
+Try Vision Agent live on [va.landing.ai](https://va.landing.ai/)
+
 ## Documentation
 
-- [Vision Agent Library Docs](https://landing-ai.github.io/vision-agent/)
+[Vision Agent Library Docs](https://landing-ai.github.io/vision-agent/)
 
 
 ## Getting Started
@@ -37,6 +41,7 @@ export OPENAI_API_KEY="your-api-key"
 ```
 
 ### Vision Agent
+#### Basic Usage
 You can interact with the agent as you would with any LLM or LMM model:
 
 ```python
@@ -52,28 +57,28 @@ from vision_agent.tools import load_image, grounding_sam
 def calculate_filled_percentage(image_path: str) -> float:
     # Step 1: Load the image
     image = load_image(image_path)
-
+
     # Step 2: Segment the jar
     jar_segments = grounding_sam(prompt="jar", image=image)
-
+
     # Step 3: Segment the coffee beans
     coffee_beans_segments = grounding_sam(prompt="coffee beans", image=image)
-
+
     # Step 4: Calculate the area of the segmented jar
     jar_area = 0
     for segment in jar_segments:
        jar_area += segment['mask'].sum()
-
+
     # Step 5: Calculate the area of the segmented coffee beans
     coffee_beans_area = 0
     for segment in coffee_beans_segments:
        coffee_beans_area += segment['mask'].sum()
-
+
     # Step 6: Compute the percentage of the jar area that is filled with coffee beans
     if jar_area == 0:
        return 0.0  # To avoid division by zero
     filled_percentage = (coffee_beans_area / jar_area) * 100
-
+
     # Step 7: Return the computed percentage
     return filled_percentage
 ```
@@ -85,10 +90,12 @@ mode by passing in the verbose argument:
 >>> agent = VisionAgent(verbose=2)
 ```
 
-You can also have it return more information by calling `chat_with_workflow`:
+#### Detailed Usage
+You can also have it return more information by calling `chat_with_workflow`. The format
+of the input is a list of dictionaries with the keys `role`, `content`, and `media`:
 
 ```python
->>> results = agent.chat_with_workflow([{"role": "user", "content": "What percentage of the area of the jar is filled with coffee beans?"}], media="jar.jpg")
+>>> results = agent.chat_with_workflow([{"role": "user", "content": "What percentage of the area of the jar is filled with coffee beans?", "media": ["jar.jpg"]}])
 >>> print(results)
 {
     "code": "from vision_agent.tools import ..."
@@ -99,19 +106,45 @@ You can also have it return more information by calling `chat_with_workflow`:
 }
 ```
 
-With this you can examine more detailed information such as the etesting code, testing
+With this you can examine more detailed information such as the testing code, testing
 results, plan or working memory it used to complete the task.
 
+#### Multi-turn conversations
+You can have multi-turn conversations with vision-agent as well, giving it feedback on
+the code and having it update. You just need to add the code as a response from the
+assistant:
+
+```python
+agent = va.agent.VisionAgent(verbosity=2)
+conv = [
+    {
+        "role": "user",
+        "content": "Are these workers wearing safety gear? Output only a True or False value.",
+        "media": ["workers.png"],
+    }
+]
+result = agent.chat_with_workflow(conv)
+code = result["code"]
+conv.append({"role": "assistant", "content": code})
+conv.append(
+    {
+        "role": "user",
+        "content": "Can you also return the number of workers wearing safety gear?",
+    }
+)
+result = agent.chat_with_workflow(conv)
+```
+
 ### Tools
 There are a variety of tools for the model or the user to use. Some are executed locally
-while others are hosted for you. You can also ask an LLM directly to build a tool for
+while others are hosted for you. You can also ask an LMM directly to build a tool for
 you. For example:
 
 ```python
 >>> import vision_agent as va
->>> llm = va.llm.OpenAILLM()
+>>> llm = va.llm.OpenAILMM()
 >>> detector = llm.generate_detector("Can you build a jar detector for me?")
->>> detector("jar.jpg")
+>>> detector(va.tools.load_image("jar.jpg"))
 [{"labels": ["jar",],
   "scores": [0.99],
   "bboxes": [
{vision_agent-0.2.55 → vision_agent-0.2.57}/pyproject.toml

@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"
 
 [tool.poetry]
 name = "vision-agent"
-version = "0.2.55"
+version = "0.2.57"
 description = "Toolset for Vision Agent"
 authors = ["Landing AI <dev@landing.ai>"]
 readme = "README.md"
vision_agent-0.2.57/vision_agent/__init__.py

@@ -0,0 +1,2 @@
+from .agent import Agent
+from .lmm import LMM, OpenAILMM
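The replacement top-level `__init__.py` re-exports `Agent`, `LMM`, and `OpenAILMM` instead of pulling in the deleted `llm` subpackage. A small sketch of what that makes importable, assuming the subpackages expose the names used elsewhere in this diff:

```python
import vision_agent as va
from vision_agent import LMM, OpenAILMM  # re-exported by the new __init__.py

agent = va.agent.VisionAgent(verbosity=2)           # agent entry point shown in the README diff
planner: LMM = OpenAILMM(temperature=0.0, json_mode=True)
```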
{vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/agent/agent.py

@@ -2,12 +2,14 @@ from abc import ABC, abstractmethod
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Union
 
+from vision_agent.lmm import Message
+
 
 class Agent(ABC):
     @abstractmethod
     def __call__(
         self,
-        input: Union[List[Dict[str, str]], str],
+        input: Union[str, List[Message]],
         media: Optional[Union[str, Path]] = None,
     ) -> str:
        pass
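The `Agent` interface now accepts either a plain string or a list of `Message` dictionaries. A hedged sketch of both call forms against `VisionAgent`, based on the docstring added later in this diff (the exact `Message` type lives in `vision_agent/lmm/lmm.py`, which is not shown here):

```python
import vision_agent as va

agent = va.agent.VisionAgent(verbosity=2)

# Form 1: a plain string plus an optional media path; the agent wraps this into
# a single user message internally (see the __call__ hunk further down).
answer = agent("How many coffee beans are in the jar?", media="jar.jpg")

# Form 2: an explicit Message-style chat, with media attached per message.
answer = agent(
    [
        {
            "role": "user",
            "content": "How many coffee beans are in the jar?",
            "media": ["jar.jpg"],
        }
    ]
)
```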
{vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/agent/vision_agent.py

@@ -13,7 +13,6 @@ from rich.style import Style
 from rich.syntax import Syntax
 from tabulate import tabulate
 
-from vision_agent.llm.llm import AzureOpenAILLM
 import vision_agent.tools as T
 from vision_agent.agent import Agent
 from vision_agent.agent.vision_agent_prompts import (
@@ -25,8 +24,7 @@ from vision_agent.agent.vision_agent_prompts import (
     SIMPLE_TEST,
     USER_REQ,
 )
-from vision_agent.llm import LLM, OpenAILLM
-from vision_agent.lmm import LMM, OpenAILMM
+from vision_agent.lmm import LMM, AzureOpenAILMM, Message, OpenAILMM
 from vision_agent.utils import CodeInterpreterFactory, Execution
 from vision_agent.utils.execute import CodeInterpreter
 from vision_agent.utils.image_utils import b64_to_pil
@@ -133,11 +131,10 @@ def extract_image(
 
 
 def write_plan(
-    chat: List[Dict[str, str]],
+    chat: List[Message],
     tool_desc: str,
     working_memory: str,
-    model: Union[LLM, LMM],
-    media: Optional[Sequence[Union[str, Path]]] = None,
+    model: LMM,
 ) -> List[Dict[str, str]]:
     chat = copy.deepcopy(chat)
     if chat[-1]["role"] != "user":
@@ -147,18 +144,58 @@
     context = USER_REQ.format(user_request=user_request)
     prompt = PLAN.format(context=context, tool_desc=tool_desc, feedback=working_memory)
     chat[-1]["content"] = prompt
-    if isinstance(model, OpenAILMM):
-        media = extract_image(media)
-        return extract_json(model.chat(chat, images=media))["plan"]  # type: ignore
-    else:
-        return extract_json(model.chat(chat))["plan"]  # type: ignore
+    return extract_json(model.chat(chat))["plan"]  # type: ignore
+
+
+def write_code(
+    coder: LMM,
+    chat: List[Message],
+    tool_info: str,
+    feedback: str,
+) -> str:
+    chat = copy.deepcopy(chat)
+    if chat[-1]["role"] != "user":
+        raise ValueError("Last chat message must be from the user.")
+
+    user_request = chat[-1]["content"]
+    prompt = CODE.format(
+        docstring=tool_info,
+        question=user_request,
+        feedback=feedback,
+    )
+    chat[-1]["content"] = prompt
+    return extract_code(coder(chat))
+
+
+def write_test(
+    tester: LMM,
+    chat: List[Message],
+    tool_utils: str,
+    code: str,
+    feedback: str,
+    media: Optional[Sequence[Union[str, Path]]] = None,
+) -> str:
+    chat = copy.deepcopy(chat)
+    if chat[-1]["role"] != "user":
+        raise ValueError("Last chat message must be from the user.")
+
+    user_request = chat[-1]["content"]
+    prompt = SIMPLE_TEST.format(
+        docstring=tool_utils,
+        question=user_request,
+        code=code,
+        feedback=feedback,
+        media=media,
+    )
+    chat[-1]["content"] = prompt
+    return extract_code(tester(chat))
 
 
 def reflect(
-    chat: List[Dict[str, str]],
+    chat: List[Message],
     plan: str,
     code: str,
-    model: Union[LLM, LMM],
+    model: LMM,
 ) -> Dict[str, Union[str, bool]]:
     chat = copy.deepcopy(chat)
     if chat[-1]["role"] != "user":
@@ -168,22 +205,22 @@ def reflect(
     context = USER_REQ.format(user_request=user_request)
     prompt = REFLECT.format(context=context, plan=plan, code=code)
     chat[-1]["content"] = prompt
-    return extract_json(model.chat(chat))
+    return extract_json(model(chat))
 
 
 def write_and_test_code(
-    task: str,
+    chat: List[Message],
     tool_info: str,
     tool_utils: str,
     working_memory: List[Dict[str, str]],
-    coder: LLM,
-    tester: LLM,
-    debugger: LLM,
+    coder: LMM,
+    tester: LMM,
+    debugger: LMM,
     code_interpreter: CodeInterpreter,
     log_progress: Callable[[Dict[str, Any]], None],
     verbosity: int = 0,
     max_retries: int = 3,
-    input_media: Optional[Union[str, Path]] = None,
+    media: Optional[Sequence[Union[str, Path]]] = None,
 ) -> Dict[str, Any]:
     log_progress(
         {
@@ -191,25 +228,9 @@
             "status": "started",
         }
     )
-    code = extract_code(
-        coder(
-            CODE.format(
-                docstring=tool_info,
-                question=task,
-                feedback=format_memory(working_memory),
-            )
-        )
-    )
-    test = extract_code(
-        tester(
-            SIMPLE_TEST.format(
-                docstring=tool_utils,
-                question=task,
-                code=code,
-                feedback=working_memory,
-                media=input_media,
-            )
-        )
+    code = write_code(coder, chat, tool_info, format_memory(working_memory))
+    test = write_test(
+        tester, chat, tool_utils, code, format_memory(working_memory), media
     )
 
     log_progress(
@@ -392,10 +413,10 @@ class VisionAgent(Agent):
 
     def __init__(
         self,
-        planner: Optional[Union[LLM, LMM]] = None,
-        coder: Optional[LLM] = None,
-        tester: Optional[LLM] = None,
-        debugger: Optional[LLM] = None,
+        planner: Optional[LMM] = None,
+        coder: Optional[LMM] = None,
+        tester: Optional[LMM] = None,
+        debugger: Optional[LMM] = None,
         tool_recommender: Optional[Sim] = None,
         verbosity: int = 0,
         report_progress_callback: Optional[Callable[[Dict[str, Any]], None]] = None,
@@ -403,10 +424,10 @@ class VisionAgent(Agent):
         """Initialize the Vision Agent.
 
         Parameters:
-            planner (Optional[LLM]): The planner model to use. Defaults to OpenAILLM.
-            coder (Optional[LLM]): The coder model to use. Defaults to OpenAILLM.
-            tester (Optional[LLM]): The tester model to use. Defaults to OpenAILLM.
-            debugger (Optional[LLM]): The debugger model to
+            planner (Optional[LMM]): The planner model to use. Defaults to OpenAILMM.
+            coder (Optional[LMM]): The coder model to use. Defaults to OpenAILMM.
+            tester (Optional[LMM]): The tester model to use. Defaults to OpenAILMM.
+            debugger (Optional[LMM]): The debugger model to
             tool_recommender (Optional[Sim]): The tool recommender model to use.
             verbosity (int): The verbosity level of the agent. Defaults to 0. 2 is the
                 highest verbosity level which will output all intermediate debugging
@@ -418,12 +439,12 @@ class VisionAgent(Agent):
         """
 
         self.planner = (
-            OpenAILLM(temperature=0.0, json_mode=True) if planner is None else planner
+            OpenAILMM(temperature=0.0, json_mode=True) if planner is None else planner
         )
-        self.coder = OpenAILLM(temperature=0.0) if coder is None else coder
-        self.tester = OpenAILLM(temperature=0.0) if tester is None else tester
+        self.coder = OpenAILMM(temperature=0.0) if coder is None else coder
+        self.tester = OpenAILMM(temperature=0.0) if tester is None else tester
         self.debugger = (
-            OpenAILLM(temperature=0.0, json_mode=True) if debugger is None else debugger
+            OpenAILMM(temperature=0.0, json_mode=True) if debugger is None else debugger
         )
 
         self.tool_recommender = (
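All four roles now default to `OpenAILMM`, so any of them can be replaced by another `LMM` instance. A brief sketch, assuming any `LMM` subclass is accepted wherever the defaults above are created:

```python
import vision_agent as va
from vision_agent.lmm import AzureOpenAILMM, OpenAILMM

agent = va.agent.VisionAgent(
    planner=OpenAILMM(temperature=0.0, json_mode=True),  # mirrors the default above
    coder=AzureOpenAILMM(temperature=0.0),               # swap in an Azure-backed coder
    verbosity=2,
)
```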
@@ -437,7 +458,7 @@ class VisionAgent(Agent):
 
     def __call__(
         self,
-        input: Union[List[Dict[str, str]], str],
+        input: Union[str, List[Message]],
         media: Optional[Union[str, Path]] = None,
     ) -> str:
         """Chat with Vision Agent and return intermediate information regarding the task.
@@ -454,23 +475,26 @@ class VisionAgent(Agent):
 
         if isinstance(input, str):
             input = [{"role": "user", "content": input}]
-        results = self.chat_with_workflow(input, media)
+            if media is not None:
+                input[0]["media"] = [media]
+        results = self.chat_with_workflow(input)
         results.pop("working_memory")
         return results  # type: ignore
 
     def chat_with_workflow(
         self,
-        chat: List[Dict[str, str]],
-        media: Optional[Union[str, Path]] = None,
+        chat: List[Message],
         self_reflection: bool = False,
         display_visualization: bool = False,
     ) -> Dict[str, Any]:
         """Chat with Vision Agent and return intermediate information regarding the task.
 
         Parameters:
-            chat (List[Dict[str, str]]): A conversation in the format of
-                [{"role": "user", "content": "describe your task here..."}].
-            media (Optional[Union[str, Path]]): The media file to be used in the task.
+            chat (List[MediaChatItem]): A conversation
+                in the format of:
+                [{"role": "user", "content": "describe your task here..."}]
+                or if it contains media files, it should be in the format of:
+                [{"role": "user", "content": "describe your task here...", "media": ["image1.jpg", "image2.jpg"]}]
             self_reflection (bool): Whether to reflect on the task and debug the code.
             display_visualization (bool): If True, it opens a new window locally to
                 show the image(s) created by visualization code (if there is any).
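`chat_with_workflow` no longer takes a separate `media` argument; images travel inside the chat itself. A usage sketch based on the updated docstring (only the `code`, `success`, and `working_memory` result keys are visible in this diff, so other keys are not shown):

```python
import vision_agent as va

agent = va.agent.VisionAgent(verbosity=2)
results = agent.chat_with_workflow(
    [
        {
            "role": "user",
            "content": "What percentage of the area of the jar is filled with coffee beans?",
            "media": ["jar.jpg"],
        }
    ],
    self_reflection=False,
)
print(results["code"], results["success"])
```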
@@ -485,11 +509,19 @@ class VisionAgent(Agent):
 
         # NOTE: each chat should have a dedicated code interpreter instance to avoid concurrency issues
         with CodeInterpreterFactory.new_instance() as code_interpreter:
-            if media is not None:
-                media = code_interpreter.upload_file(media)
-                for chat_i in chat:
-                    if chat_i["role"] == "user":
-                        chat_i["content"] += f" Image name {media}"
+            chat = copy.deepcopy(chat)
+            media_list = []
+            for chat_i in chat:
+                if "media" in chat_i:
+                    for media in chat_i["media"]:
+                        media = code_interpreter.upload_file(media)
+                        chat_i["content"] += f" Media name {media}"  # type: ignore
+                        media_list.append(media)
+
+            int_chat = cast(
+                List[Message],
+                [{"role": c["role"], "content": c["content"]} for c in chat],
+            )
 
             code = ""
             test = ""
@@ -507,11 +539,10 @@
                     }
                 )
                 plan_i = write_plan(
-                    chat,
+                    int_chat,
                     T.TOOL_DESCRIPTIONS,
                     format_memory(working_memory),
                     self.planner,
-                    media=[media] if media else None,
                 )
                 plan_i_str = "\n-".join([e["instructions"] for e in plan_i])
 
@@ -534,9 +565,7 @@
                     self.verbosity,
                 )
                 results = write_and_test_code(
-                    task=FULL_TASK.format(
-                        user_request=chat[0]["content"], subtasks=plan_i_str
-                    ),
+                    chat=int_chat,
                     tool_info=tool_info,
                     tool_utils=T.UTILITIES_DOCSTRING,
                     working_memory=working_memory,
@@ -546,7 +575,7 @@
                     code_interpreter=code_interpreter,
                     log_progress=self.log_progress,
                     verbosity=self.verbosity,
-                    input_media=media,
+                    media=media_list,
                 )
                 success = cast(bool, results["success"])
                 code = cast(str, results["code"])
@@ -564,7 +593,7 @@
                         }
                     )
                     reflection = reflect(
-                        chat,
+                        int_chat,
                         FULL_TASK.format(
                             user_request=chat[0]["content"], subtasks=plan_i_str
                         ),
@@ -634,10 +663,10 @@ class AzureVisionAgent(VisionAgent):
 
     def __init__(
         self,
-        planner: Optional[Union[LLM, LMM]] = None,
-        coder: Optional[LLM] = None,
-        tester: Optional[LLM] = None,
-        debugger: Optional[LLM] = None,
+        planner: Optional[LMM] = None,
+        coder: Optional[LMM] = None,
+        tester: Optional[LMM] = None,
+        debugger: Optional[LMM] = None,
         tool_recommender: Optional[Sim] = None,
         verbosity: int = 0,
         report_progress_callback: Optional[Callable[[Dict[str, Any]], None]] = None,
@@ -645,10 +674,10 @@ class AzureVisionAgent(VisionAgent):
         """Initialize the Vision Agent.
 
         Parameters:
-            planner (Optional[LLM]): The planner model to use. Defaults to OpenAILLM.
-            coder (Optional[LLM]): The coder model to use. Defaults to OpenAILLM.
-            tester (Optional[LLM]): The tester model to use. Defaults to OpenAILLM.
-            debugger (Optional[LLM]): The debugger model to
+            planner (Optional[LMM]): The planner model to use. Defaults to OpenAILMM.
+            coder (Optional[LMM]): The coder model to use. Defaults to OpenAILMM.
+            tester (Optional[LMM]): The tester model to use. Defaults to OpenAILMM.
+            debugger (Optional[LMM]): The debugger model to
             tool_recommender (Optional[Sim]): The tool recommender model to use.
             verbosity (int): The verbosity level of the agent. Defaults to 0. 2 is the
                 highest verbosity level which will output all intermediate debugging
@@ -660,14 +689,14 @@ class AzureVisionAgent(VisionAgent):
         """
         super().__init__(
             planner=(
-                AzureOpenAILLM(temperature=0.0, json_mode=True)
+                AzureOpenAILMM(temperature=0.0, json_mode=True)
                 if planner is None
                 else planner
             ),
-            coder=AzureOpenAILLM(temperature=0.0) if coder is None else coder,
-            tester=AzureOpenAILLM(temperature=0.0) if tester is None else tester,
+            coder=AzureOpenAILMM(temperature=0.0) if coder is None else coder,
+            tester=AzureOpenAILMM(temperature=0.0) if tester is None else tester,
             debugger=(
-                AzureOpenAILLM(temperature=0.0, json_mode=True)
+                AzureOpenAILMM(temperature=0.0, json_mode=True)
                 if debugger is None
                 else debugger
            ),
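`AzureVisionAgent` now wires every role to `AzureOpenAILMM`. A minimal construction sketch; the import path assumes the class is reachable from the module diffed above, and the credential comment is an assumption rather than something shown in this diff:

```python
# Sketch only; AzureOpenAILMM presumably reads the usual Azure OpenAI
# credentials from the environment, as its OpenAI counterpart does.
from vision_agent.agent.vision_agent import AzureVisionAgent

agent = AzureVisionAgent(verbosity=2)
results = agent.chat_with_workflow(
    [{"role": "user", "content": "Are these workers wearing safety gear?", "media": ["workers.png"]}]
)
```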
{vision_agent-0.2.55 → vision_agent-0.2.57}/vision_agent/agent/vision_agent_prompts.py

@@ -171,7 +171,7 @@ This is the documentation for the functions you have access to. You may call any
 **Instructions**:
 1. Verify the fundamental functionality under normal conditions.
 2. Ensure each test case is well-documented with comments explaining the scenario it covers.
-3. Your test case MUST run only on the given image which is {media}
+3. Your test case MUST run only on the given images which are {media}
 4. Your test case MUST run only with the given values which is available in the question - {question}
 5. DO NOT use any non-existent or dummy image or video files that are not provided by the user's instructions.
 6. DO NOT mock any functions, you must test their functionality as is.
vision_agent-0.2.57/vision_agent/lmm/__init__.py

@@ -0,0 +1 @@
+from .lmm import LMM, AzureOpenAILMM, Message, OpenAILMM
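With the `llm` subpackage deleted in this release (see the file list above), the multimodal classes in `vision_agent.lmm` take over both roles. A hedged migration sketch, using only calls that appear elsewhere in this diff:

```python
from typing import List

from vision_agent.lmm import LMM, AzureOpenAILMM, Message, OpenAILMM

# 0.2.55 imports that no longer exist in 0.2.57:
#   from vision_agent.llm import LLM, OpenAILLM
#   from vision_agent.llm.llm import AzureOpenAILLM
planner: LMM = OpenAILMM(temperature=0.0, json_mode=True)
chat: List[Message] = [
    {"role": "user", "content": "Describe this image.", "media": ["jar.jpg"]}
]
plan = planner.chat(chat)  # LMM instances are also callable, e.g. planner(chat)
```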