vision-agent 0.2.30__tar.gz → 0.2.31__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- vision_agent-0.2.31/PKG-INFO +175 -0
- vision_agent-0.2.31/README.md +141 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/pyproject.toml +1 -1
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/agent/__init__.py +2 -2
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/agent/agent.py +1 -1
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/agent/agent_coder.py +7 -7
- vision_agent-0.2.30/vision_agent/agent/vision_agent_v2.py → vision_agent-0.2.31/vision_agent/agent/data_interpreter.py +12 -12
- vision_agent-0.2.30/vision_agent/agent/vision_agent_v2_prompts.py → vision_agent-0.2.31/vision_agent/agent/data_interpreter_prompts.py +3 -3
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/agent/easytool.py +8 -8
- vision_agent-0.2.30/vision_agent/agent/vision_agent.py → vision_agent-0.2.31/vision_agent/agent/easytool_v2.py +20 -20
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/agent/reflexion.py +8 -8
- vision_agent-0.2.30/vision_agent/agent/vision_agent_v3.py → vision_agent-0.2.31/vision_agent/agent/vision_agent.py +68 -15
- vision_agent-0.2.30/vision_agent/agent/vision_agent_v3_prompts.py → vision_agent-0.2.31/vision_agent/agent/vision_agent_prompts.py +4 -4
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/llm/llm.py +3 -4
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/lmm/lmm.py +6 -6
- vision_agent-0.2.31/vision_agent/tools/__init__.py +24 -0
- vision_agent-0.2.30/PKG-INFO +0 -226
- vision_agent-0.2.30/README.md +0 -192
- vision_agent-0.2.30/vision_agent/tools/__init__.py +0 -25
- {vision_agent-0.2.30 → vision_agent-0.2.31}/LICENSE +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/__init__.py +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/agent/agent_coder_prompts.py +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/agent/easytool_prompts.py +0 -0
- /vision_agent-0.2.30/vision_agent/agent/vision_agent_prompts.py → /vision_agent-0.2.31/vision_agent/agent/easytool_v2_prompts.py +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/agent/reflexion_prompts.py +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/fonts/__init__.py +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/fonts/default_font_ch_en.ttf +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/llm/__init__.py +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/lmm/__init__.py +0 -0
- /vision_agent-0.2.30/vision_agent/tools/tools.py → /vision_agent-0.2.31/vision_agent/tools/easytool_tools.py +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/tools/prompts.py +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/tools/tool_utils.py +0 -0
- /vision_agent-0.2.30/vision_agent/tools/tools_v2.py → /vision_agent-0.2.31/vision_agent/tools/tools.py +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/utils/__init__.py +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/utils/execute.py +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/utils/image_utils.py +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/utils/sim.py +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/utils/type_defs.py +0 -0
- {vision_agent-0.2.30 → vision_agent-0.2.31}/vision_agent/utils/video.py +0 -0
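The rename list above retires the old versioned module names: `vision_agent_v2.py` becomes `data_interpreter.py`, `vision_agent.py` becomes `easytool_v2.py`, `vision_agent_v3.py` becomes the new `vision_agent.py`, and `tools_v2.py` becomes `tools.py`. A minimal sketch of how import paths shift between the two versions follows; the 0.2.30 lines are assumptions inferred from the removed imports in the `vision_agent/agent/__init__.py` hunk further down, while the 0.2.31 lines come from the new `__init__.py` and README in this diff.

```python
# vision-agent 0.2.30 (old layout, inferred from the removed imports shown below)
# from vision_agent.agent import VisionAgentV2, VisionAgentV3
# from vision_agent.tools.tools_v2 import ...   # hypothetical old path for the renamed tools module

# vision-agent 0.2.31 (new layout, per the renamed modules in this diff)
from vision_agent.agent import VisionAgent, DataInterpreter, EasyToolV2
from vision_agent.tools import load_image, grounding_sam  # formerly tools_v2.py
```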
@@ -0,0 +1,175 @@
+Metadata-Version: 2.1
+Name: vision-agent
+Version: 0.2.31
+Summary: Toolset for Vision Agent
+Author: Landing AI
+Author-email: dev@landing.ai
+Requires-Python: >=3.9,<4.0
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Requires-Dist: ipykernel (>=6.29.4,<7.0.0)
+Requires-Dist: langsmith (>=0.1.58,<0.2.0)
+Requires-Dist: moviepy (>=1.0.0,<2.0.0)
+Requires-Dist: nbclient (>=0.10.0,<0.11.0)
+Requires-Dist: nbformat (>=5.10.4,<6.0.0)
+Requires-Dist: numpy (>=1.21.0,<2.0.0)
+Requires-Dist: openai (>=1.0.0,<2.0.0)
+Requires-Dist: opencv-python-headless (>=4.0.0,<5.0.0)
+Requires-Dist: pandas (>=2.0.0,<3.0.0)
+Requires-Dist: pillow (>=10.0.0,<11.0.0)
+Requires-Dist: pydantic-settings (>=2.2.1,<3.0.0)
+Requires-Dist: requests (>=2.0.0,<3.0.0)
+Requires-Dist: rich (>=13.7.1,<14.0.0)
+Requires-Dist: scipy (>=1.13.0,<1.14.0)
+Requires-Dist: tabulate (>=0.9.0,<0.10.0)
+Requires-Dist: tqdm (>=4.64.0,<5.0.0)
+Requires-Dist: typing_extensions (>=4.0.0,<5.0.0)
+Project-URL: Homepage, https://landing.ai
+Project-URL: documentation, https://github.com/landing-ai/vision-agent
+Project-URL: repository, https://github.com/landing-ai/vision-agent
+Description-Content-Type: text/markdown
+
+<div align="center">
+<img alt="vision_agent" height="200px" src="https://github.com/landing-ai/vision-agent/blob/main/assets/logo.jpg?raw=true">
+
+# 🔍🤖 Vision Agent
+
+[](https://discord.gg/wPdN8RCYew)
+
+[](https://badge.fury.io/py/vision-agent)
+
+</div>
+
+Vision Agent is a library that helps you utilize agent frameworks to generate code to
+solve your vision task. Many current vision problems can easily take hours or days to
+solve, you need to find the right model, figure out how to use it and program it to
+accomplish the task you want. Vision Agent aims to provide an in-seconds experience by
+allowing users to describe their problem in text and have the agent framework generate
+code to solve the task for them. Check out our discord for updates and roadmaps!
+
+## Documentation
+
+- [Vision Agent Library Docs](https://landing-ai.github.io/vision-agent/)
+
+
+## Getting Started
+### Installation
+To get started, you can install the library using pip:
+
+```bash
+pip install vision-agent
+```
+
+Ensure you have an OpenAI API key and set it as an environment variable (if you are
+using Azure OpenAI please see the Azure setup section):
+
+```bash
+export OPENAI_API_KEY="your-api-key"
+```
+
+### Vision Agent
+You can interact with the agent as you would with any LLM or LMM model:
+
+```python
+>>> from vision_agent.agent import VisionAgent
+>>> agent = VisionAgent()
+>>> code = agent("What percentage of the area of the jar is filled with coffee beans?", media="jar.jpg")
+```
+
+Which produces the following code:
+```python
+from vision_agent.tools import load_image, grounding_sam
+
+def calculate_filled_percentage(image_path: str) -> float:
+    # Step 1: Load the image
+    image = load_image(image_path)
+
+    # Step 2: Segment the jar
+    jar_segments = grounding_sam(prompt="jar", image=image)
+
+    # Step 3: Segment the coffee beans
+    coffee_beans_segments = grounding_sam(prompt="coffee beans", image=image)
+
+    # Step 4: Calculate the area of the segmented jar
+    jar_area = 0
+    for segment in jar_segments:
+        jar_area += segment['mask'].sum()
+
+    # Step 5: Calculate the area of the segmented coffee beans
+    coffee_beans_area = 0
+    for segment in coffee_beans_segments:
+        coffee_beans_area += segment['mask'].sum()
+
+    # Step 6: Compute the percentage of the jar area that is filled with coffee beans
+    if jar_area == 0:
+        return 0.0  # To avoid division by zero
+    filled_percentage = (coffee_beans_area / jar_area) * 100
+
+    # Step 7: Return the computed percentage
+    return filled_percentage
+```
+
+To better understand how the model came up with it's answer, you can run it in debug
+mode by passing in the verbose argument:
+
+```python
+>>> agent = VisionAgent(verbose=2)
+```
+
+You can also have it return more information by calling `chat_with_workflow`:
+
+```python
+>>> results = agent.chat_with_workflow([{"role": "user", "content": "What percentage of the area of the jar is filled with coffee beans?"}], media="jar.jpg")
+>>> print(results)
+{
+    "code": "from vision_agent.tools import ..."
+    "test": "calculate_filled_percentage('jar.jpg')",
+    "test_result": "...",
+    "plan": [{"code": "...", "test": "...", "plan": "..."}, ...],
+    "working_memory": ...,
+}
+```
+
+With this you can examine more detailed information such as the etesting code, testing
+results, plan or working memory it used to complete the task.
+
+### Tools
+There are a variety of tools for the model or the user to use. Some are executed locally
+while others are hosted for you. You can also ask an LLM directly to build a tool for
+you. For example:
+
+```python
+>>> import vision_agent as va
+>>> llm = va.llm.OpenAILLM()
+>>> detector = llm.generate_detector("Can you build a jar detector for me?")
+>>> detector("jar.jpg")
+[{"labels": ["jar",],
+  "scores": [0.99],
+  "bboxes": [
+    [0.58, 0.2, 0.72, 0.45],
+  ]
+}]
+```
+
+### Azure Setup
+If you want to use Azure OpenAI models, you can set the environment variable:
+
+```bash
+export AZURE_OPENAI_API_KEY="your-api-key"
+export AZURE_OPENAI_ENDPOINT="your-endpoint"
+```
+
+You can then run Vision Agent using the Azure OpenAI models:
+
+```python
+>>> import vision_agent as va
+>>> agent = va.agent.VisionAgent(
+>>>     task_model=va.llm.AzureOpenAILLM(),
+>>>     answer_model=va.lmm.AzureOpenAILMM(),
+>>>     reflection_model=va.lmm.AzureOpenAILMM(),
+>>> )
+```
+
+
@@ -0,0 +1,141 @@
+<div align="center">
+<img alt="vision_agent" height="200px" src="https://github.com/landing-ai/vision-agent/blob/main/assets/logo.jpg?raw=true">
+
+# 🔍🤖 Vision Agent
+
+[](https://discord.gg/wPdN8RCYew)
+
+[](https://badge.fury.io/py/vision-agent)
+
+</div>
+
+Vision Agent is a library that helps you utilize agent frameworks to generate code to
+solve your vision task. Many current vision problems can easily take hours or days to
+solve, you need to find the right model, figure out how to use it and program it to
+accomplish the task you want. Vision Agent aims to provide an in-seconds experience by
+allowing users to describe their problem in text and have the agent framework generate
+code to solve the task for them. Check out our discord for updates and roadmaps!
+
+## Documentation
+
+- [Vision Agent Library Docs](https://landing-ai.github.io/vision-agent/)
+
+
+## Getting Started
+### Installation
+To get started, you can install the library using pip:
+
+```bash
+pip install vision-agent
+```
+
+Ensure you have an OpenAI API key and set it as an environment variable (if you are
+using Azure OpenAI please see the Azure setup section):
+
+```bash
+export OPENAI_API_KEY="your-api-key"
+```
+
+### Vision Agent
+You can interact with the agent as you would with any LLM or LMM model:
+
+```python
+>>> from vision_agent.agent import VisionAgent
+>>> agent = VisionAgent()
+>>> code = agent("What percentage of the area of the jar is filled with coffee beans?", media="jar.jpg")
+```
+
+Which produces the following code:
+```python
+from vision_agent.tools import load_image, grounding_sam
+
+def calculate_filled_percentage(image_path: str) -> float:
+    # Step 1: Load the image
+    image = load_image(image_path)
+
+    # Step 2: Segment the jar
+    jar_segments = grounding_sam(prompt="jar", image=image)
+
+    # Step 3: Segment the coffee beans
+    coffee_beans_segments = grounding_sam(prompt="coffee beans", image=image)
+
+    # Step 4: Calculate the area of the segmented jar
+    jar_area = 0
+    for segment in jar_segments:
+        jar_area += segment['mask'].sum()
+
+    # Step 5: Calculate the area of the segmented coffee beans
+    coffee_beans_area = 0
+    for segment in coffee_beans_segments:
+        coffee_beans_area += segment['mask'].sum()
+
+    # Step 6: Compute the percentage of the jar area that is filled with coffee beans
+    if jar_area == 0:
+        return 0.0  # To avoid division by zero
+    filled_percentage = (coffee_beans_area / jar_area) * 100
+
+    # Step 7: Return the computed percentage
+    return filled_percentage
+```
+
+To better understand how the model came up with it's answer, you can run it in debug
+mode by passing in the verbose argument:
+
+```python
+>>> agent = VisionAgent(verbose=2)
+```
+
+You can also have it return more information by calling `chat_with_workflow`:
+
+```python
+>>> results = agent.chat_with_workflow([{"role": "user", "content": "What percentage of the area of the jar is filled with coffee beans?"}], media="jar.jpg")
+>>> print(results)
+{
+    "code": "from vision_agent.tools import ..."
+    "test": "calculate_filled_percentage('jar.jpg')",
+    "test_result": "...",
+    "plan": [{"code": "...", "test": "...", "plan": "..."}, ...],
+    "working_memory": ...,
+}
+```
+
+With this you can examine more detailed information such as the etesting code, testing
+results, plan or working memory it used to complete the task.
+
+### Tools
+There are a variety of tools for the model or the user to use. Some are executed locally
+while others are hosted for you. You can also ask an LLM directly to build a tool for
+you. For example:
+
+```python
+>>> import vision_agent as va
+>>> llm = va.llm.OpenAILLM()
+>>> detector = llm.generate_detector("Can you build a jar detector for me?")
+>>> detector("jar.jpg")
+[{"labels": ["jar",],
+  "scores": [0.99],
+  "bboxes": [
+    [0.58, 0.2, 0.72, 0.45],
+  ]
+}]
+```
+
+### Azure Setup
+If you want to use Azure OpenAI models, you can set the environment variable:
+
+```bash
+export AZURE_OPENAI_API_KEY="your-api-key"
+export AZURE_OPENAI_ENDPOINT="your-endpoint"
+```
+
+You can then run Vision Agent using the Azure OpenAI models:
+
+```python
+>>> import vision_agent as va
+>>> agent = va.agent.VisionAgent(
+>>>     task_model=va.llm.AzureOpenAILLM(),
+>>>     answer_model=va.lmm.AzureOpenAILMM(),
+>>>     reflection_model=va.lmm.AzureOpenAILMM(),
+>>> )
+```
+
@@ -1,7 +1,7 @@
 from .agent import Agent
 from .agent_coder import AgentCoder
+from .data_interpreter import DataInterpreter
 from .easytool import EasyTool
+from .easytool_v2 import EasyToolV2
 from .reflexion import Reflexion
 from .vision_agent import VisionAgent
-from .vision_agent_v2 import VisionAgentV2
-from .vision_agent_v3 import VisionAgentV3
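Taken together with the README shipped in this release, the new export list suggests that upgrading code constructs the renamed classes directly. The snippet below is a sketch only: `VisionAgent()` with no arguments and the `media=` keyword follow the README examples earlier in this diff, while `DataInterpreter` and `EasyToolV2` are assumed to allow similar default construction.

```python
from vision_agent.agent import VisionAgent, DataInterpreter, EasyToolV2

# VisionAgent (formerly VisionAgentV3) generates code for a vision task.
agent = VisionAgent()
code = agent(
    "What percentage of the area of the jar is filled with coffee beans?",
    media="jar.jpg",
)
```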
@@ -18,7 +18,7 @@ from vision_agent.agent.agent_coder_prompts import (
 )
 from vision_agent.llm import LLM, OpenAILLM
 from vision_agent.lmm import LMM, OpenAILMM
-from vision_agent.tools
+from vision_agent.tools import TOOL_DOCSTRING, UTILITIES_DOCSTRING
 from vision_agent.utils import Execute

 IMPORT_HELPER = """
@@ -38,7 +38,7 @@ import numpy as np
 import string
 from typing import *
 from collections import *
-from vision_agent.tools
+from vision_agent.tools import *
 """
 logging.basicConfig(stream=sys.stdout)
 _LOGGER = logging.getLogger(__name__)
@@ -150,20 +150,20 @@ class AgentCoder(Agent):
     def __call__(
         self,
         input: Union[List[Dict[str, str]], str],
-
+        media: Optional[Union[str, Path]] = None,
     ) -> str:
         if isinstance(input, str):
             input = [{"role": "user", "content": input}]
-        return self.chat(input,
+        return self.chat(input, media)

     def chat(
         self,
         input: List[Dict[str, str]],
-
+        media: Optional[Union[str, Path]] = None,
     ) -> str:
         question = input[0]["content"]
-        if
-            question += f" Input file path: {os.path.abspath(
+        if media:
+            question += f" Input file path: {os.path.abspath(media)}"

         code = ""
         feedback = ""
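This hunk shows `AgentCoder.__call__` and `AgentCoder.chat` gaining a `media` parameter, with the file's absolute path appended to the question. A hedged usage sketch follows; the default constructor is an assumption, since `AgentCoder.__init__` is not part of this diff.

```python
from vision_agent.agent import AgentCoder

coder = AgentCoder()  # default construction assumed; constructor args are not shown in this diff
answer = coder(
    "Count the number of jars in the image",
    media="jar.jpg",  # per the hunk, appended as "Input file path: <absolute path>"
)
```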
@@ -10,7 +10,7 @@ from rich.syntax import Syntax
 from tabulate import tabulate

 from vision_agent.agent import Agent
-from vision_agent.agent.
+from vision_agent.agent.data_interpreter_prompts import (
     CODE,
     CODE_SYS_MSG,
     DEBUG,
@@ -25,7 +25,7 @@ from vision_agent.agent.vision_agent_v2_prompts import (
     USER_REQ_SUBTASK_WM_CONTEXT,
 )
 from vision_agent.llm import LLM, OpenAILLM
-from vision_agent.tools
+from vision_agent.tools import TOOL_DESCRIPTIONS, TOOLS_DF
 from vision_agent.utils import Execute, Sim

 logging.basicConfig(level=logging.INFO)
@@ -331,11 +331,11 @@ def run_plan(
     return current_code, current_test, plan, working_memory


-class
-    """
-    solve vision tasks. It is inspired by MetaGPT's Data
-    https://arxiv.org/abs/2402.18679.
-    generate code:
+class DataInterpreter(Agent):
+    """This version of Data Interpreter is an AI agentic framework geared towards
+    outputting Python code to solve vision tasks. It is inspired by MetaGPT's Data
+    Interpreter https://arxiv.org/abs/2402.18679. This version of Data Interpreter has
+    several key features to help it generate code:

     - A planner to generate a plan of tasks to solve a user requirement. The planner
     can output code tasks or test tasks, where test tasks are used to verify the code.
@@ -379,29 +379,29 @@ class VisionAgentV2(Agent):
     def __call__(
         self,
         input: Union[List[Dict[str, str]], str],
-
+        media: Optional[Union[str, Path]] = None,
         plan: Optional[List[Dict[str, Any]]] = None,
     ) -> str:
         if isinstance(input, str):
             input = [{"role": "user", "content": input}]
-        results = self.chat_with_workflow(input,
+        results = self.chat_with_workflow(input, media, plan)
         return results["code"]  # type: ignore

     @traceable
     def chat_with_workflow(
         self,
         chat: List[Dict[str, str]],
-
+        media: Optional[Union[str, Path]] = None,
         plan: Optional[List[Dict[str, Any]]] = None,
     ) -> Dict[str, Any]:
         if len(chat) == 0:
             raise ValueError("Input cannot be empty.")

-        if
+        if media is not None:
             # append file names to all user messages
             for chat_i in chat:
                 if chat_i["role"] == "user":
-                    chat_i["content"] += f" Image name {
+                    chat_i["content"] += f" Image name {media}"

         working_code = ""
         if plan is not None:
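`DataInterpreter` picks up the same `media` parameter, and `chat_with_workflow` additionally accepts an optional pre-computed `plan`. A minimal sketch, again assuming a default constructor (its arguments are not shown in this diff):

```python
from vision_agent.agent import DataInterpreter

interpreter = DataInterpreter()  # constructor arguments are not shown in this diff
results = interpreter.chat_with_workflow(
    [{"role": "user", "content": "Count the coffee beans in the jar"}],
    media="jar.jpg",  # the image name is appended to the user messages
    plan=None,        # optionally resume from a previously generated plan
)
```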
@@ -74,15 +74,15 @@ CODE = """
 
 # Constraints
 - Write a function that accomplishes the 'Current Subtask'. You are supplied code from a previous task under 'Previous Code', do not delete or change previous code unless it contains a bug or it is necessary to complete the 'Current Subtask'.
-- Always prioritize using pre-defined tools or code for the same functionality from 'Tool Info' when working on 'Current Subtask'. You have access to all these tools through the `from vision_agent.tools
+- Always prioritize using pre-defined tools or code for the same functionality from 'Tool Info' when working on 'Current Subtask'. You have access to all these tools through the `from vision_agent.tools import *` import.
 - You may recieve previous trials and errors under 'Previous Task', this is code, output and reflections from previous tasks. You can use these to avoid running in to the same issues when writing your code.
-- Use the `save_json` function from `vision_agent.tools
+- Use the `save_json` function from `vision_agent.tools` to save your output as a json file.
 - Write clean, readable, and well-documented code.

 # Output
 While some concise thoughts are helpful, code is absolutely required. If possible, execute your defined functions in the code output. Output code in the following format:
 ```python
-from vision_agent.tools
+from vision_agent.tools imoprt *

 # your code goes here
 ```
@@ -6,7 +6,7 @@ from typing import Any, Callable, Dict, List, Optional, Tuple, Union
 
 from vision_agent.llm import LLM, OpenAILLM
 from vision_agent.lmm import LMM
-from vision_agent.tools import TOOLS
+from vision_agent.tools.easytool_tools import TOOLS

 from .agent import Agent
 from .easytool_prompts import (
@@ -272,7 +272,7 @@ class EasyTool(Agent):
     def __call__(
         self,
         input: Union[List[Dict[str, str]], str],
-
+        media: Optional[Union[str, Path]] = None,
     ) -> str:
         """Invoke the vision agent.

@@ -285,14 +285,14 @@ class EasyTool(Agent):
         """
         if isinstance(input, str):
             input = [{"role": "user", "content": input}]
-        return self.chat(input,
+        return self.chat(input, media=media)

     def chat_with_workflow(
-        self, chat: List[Dict[str, str]],
+        self, chat: List[Dict[str, str]], media: Optional[Union[str, Path]] = None
     ) -> Tuple[str, List[Dict]]:
         question = chat[0]["content"]
-        if
-            question += f" Image name: {
+        if media:
+            question += f" Image name: {media}"
         tasks = task_decompose(
             self.task_model,
             question,
@@ -340,7 +340,7 @@ class EasyTool(Agent):
         return answer_summarize(self.answer_model, question, answers), all_tool_results

     def chat(
-        self, chat: List[Dict[str, str]],
+        self, chat: List[Dict[str, str]], media: Optional[Union[str, Path]] = None
     ) -> str:
-        answer, _ = self.chat_with_workflow(chat,
+        answer, _ = self.chat_with_workflow(chat, media=media)
         return answer
@@ -17,7 +17,7 @@ from vision_agent.agent.easytool_prompts import (
     TASK_DECOMPOSE,
     TASK_TOPOLOGY,
 )
-from vision_agent.agent.
+from vision_agent.agent.easytool_v2_prompts import (
     ANSWER_GENERATE_DEPENDS,
     ANSWER_SUMMARIZE_DEPENDS,
     CHOOSE_PARAMETER_DEPENDS,
@@ -27,7 +27,7 @@ from vision_agent.agent.vision_agent_prompts import (
 )
 from vision_agent.llm import LLM, OpenAILLM
 from vision_agent.lmm import LMM, OpenAILMM
-from vision_agent.tools import TOOLS
+from vision_agent.tools.easytool_tools import TOOLS
 from vision_agent.utils.image_utils import (
     convert_to_b64,
     overlay_bboxes,
@@ -427,9 +427,9 @@ def visualize_result(all_tool_results: List[Dict]) -> Sequence[Union[str, Path]]
     return visualized_images


-class
-    r"""
-    reflection to accomplish tasks, in particular vision tasks.
+class EasyToolV2(Agent):
+    r"""EasyToolV2 is an agent framework that utilizes tools as well as self
+    reflection to accomplish tasks, in particular vision tasks. EasyToolV2 is based
     off of EasyTool https://arxiv.org/abs/2401.06201 and Reflexion
     https://arxiv.org/abs/2303.11366 where it will attempt to complete a task and then
     reflect on whether or not it was able to accomplish the task based off of the plan
@@ -437,8 +437,8 @@ class VisionAgent(Agent):
 
     Example
     -------
-    >>> from vision_agent.agent import
-    >>> agent =
+    >>> from vision_agent.agent import EasyToolV2
+    >>> agent = EasyToolV2()
     >>> resp = agent("If red tomatoes cost $5 each and yellow tomatoes cost $2.50 each, what is the total cost of all the tomatoes in the image?", image="tomatoes.jpg")
     >>> print(resp)
     "The total cost is $57.50."
@@ -453,7 +453,7 @@ class VisionAgent(Agent):
         verbose: bool = False,
         report_progress_callback: Optional[Callable[[Dict[str, Any]], None]] = None,
     ):
-        """
+        """EasyToolV2 constructor.

         Parameters:
             task_model: the model to use for task decomposition.
@@ -461,7 +461,7 @@ class VisionAgent(Agent):
             reflect_model: the model to use for self reflection.
             max_retries: maximum number of retries to attempt to complete the task.
             verbose: whether to print more logs.
-            report_progress_callback: a callback to report the progress of the agent. This is useful for streaming logs in a web application where multiple
+            report_progress_callback: a callback to report the progress of the agent. This is useful for streaming logs in a web application where multiple EasyToolV2 instances are running in parallel. This callback ensures that the progress are not mixed up.
         """
         self.task_model = (
             OpenAILLM(model_name="gpt-4-turbo", json_mode=True, temperature=0.0)
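The expanded docstring describes `report_progress_callback` as a way to keep logs separate when several EasyToolV2 instances run in parallel, for example when streaming to a web application. A minimal sketch of such a callback, assuming only the `Callable[[Dict[str, Any]], None]` type shown in the constructor signature and the `"log"` key used later in this diff:

```python
import logging
from typing import Any, Dict

logging.basicConfig(level=logging.INFO)
_progress_logger = logging.getLogger("easytool_v2.progress")

def report_progress(payload: Dict[str, Any]) -> None:
    # Only the "log" key is visible in this diff; fall back to the whole payload otherwise.
    _progress_logger.info(payload.get("log", payload))

# Hypothetical wiring, based on the constructor parameter shown above:
# agent = EasyToolV2(report_progress_callback=report_progress)
```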
@@ -487,7 +487,7 @@ class VisionAgent(Agent):
     def __call__(
         self,
         input: Union[List[Dict[str, str]], str],
-
+        media: Optional[Union[str, Path]] = None,
         reference_data: Optional[Dict[str, str]] = None,
         visualize_output: Optional[bool] = False,
         self_reflection: Optional[bool] = True,
@@ -512,7 +512,7 @@ class VisionAgent(Agent):
             input = [{"role": "user", "content": input}]
         return self.chat(
             input,
-
+            media=media,
             visualize_output=visualize_output,
             reference_data=reference_data,
             self_reflection=self_reflection,
@@ -539,12 +539,12 @@ class VisionAgent(Agent):
     def chat_with_workflow(
         self,
         chat: List[Dict[str, str]],
-
+        media: Optional[Union[str, Path]] = None,
         reference_data: Optional[Dict[str, str]] = None,
         visualize_output: Optional[bool] = False,
         self_reflection: Optional[bool] = True,
     ) -> Tuple[str, List[Dict]]:
-        """Chat with
+        """Chat with EasyToolV2 and return the final answer and all tool results.

         Parameters:
             chat: A conversation in the format of
@@ -566,8 +566,8 @@ class VisionAgent(Agent):
             raise ValueError("Input cannot be empty.")

         question = chat[0]["content"]
-        if
-            question += f" Image name: {
+        if media:
+            question += f" Image name: {media}"
         if reference_data:
             question += (
                 f" Reference image: {reference_data['image']}"
@@ -630,8 +630,8 @@ class VisionAgent(Agent):
         all_tool_results.append({"visualized_output": visualized_output})
         if len(visualized_output) > 0:
             reflection_images = sample_n_evenly_spaced(visualized_output, 3)
-        elif
-            reflection_images = [
+        elif media is not None:
+            reflection_images = [media]
         else:
             reflection_images = None

@@ -658,7 +658,7 @@ class VisionAgent(Agent):
         # '<ANSWER>' is a symbol to indicate the end of the chat, which is useful for streaming logs.
         self.log_progress(
             {
-                "log": f"
+                "log": f"EasyToolV2 has concluded this chat. <ANSWER>{final_answer}</ANSWER>"
             }
         )

@@ -675,14 +675,14 @@ class VisionAgent(Agent):
     def chat(
         self,
         chat: List[Dict[str, str]],
-
+        media: Optional[Union[str, Path]] = None,
         reference_data: Optional[Dict[str, str]] = None,
         visualize_output: Optional[bool] = False,
         self_reflection: Optional[bool] = True,
     ) -> str:
         answer, _ = self.chat_with_workflow(
             chat,
-
+            media=media,
             visualize_output=visualize_output,
             reference_data=reference_data,
             self_reflection=self_reflection,