vision-agent 0.0.48__py3-none-any.whl → 0.0.49__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- vision_agent/agent/agent.py +7 -0
- vision_agent/agent/easytool_prompts.py +14 -14
- vision_agent/agent/reflexion_prompts.py +1 -1
- vision_agent/agent/vision_agent.py +113 -82
- vision_agent/agent/vision_agent_prompts.py +20 -20
- vision_agent/image_utils.py +1 -1
- vision_agent/llm/__init__.py +1 -1
- vision_agent/llm/llm.py +38 -3
- vision_agent/lmm/__init__.py +1 -1
- vision_agent/lmm/lmm.py +37 -2
- vision_agent/tools/prompts.py +3 -3
- vision_agent/tools/tools.py +95 -50
- {vision_agent-0.0.48.dist-info → vision_agent-0.0.49.dist-info}/METADATA +23 -2
- vision_agent-0.0.49.dist-info/RECORD +26 -0
- vision_agent-0.0.48.dist-info/RECORD +0 -26
- {vision_agent-0.0.48.dist-info → vision_agent-0.0.49.dist-info}/LICENSE +0 -0
- {vision_agent-0.0.48.dist-info → vision_agent-0.0.49.dist-info}/WHEEL +0 -0
vision_agent/agent/agent.py
CHANGED
@@ -11,3 +11,10 @@ class Agent(ABC):
         image: Optional[Union[str, Path]] = None,
     ) -> str:
         pass
+
+    @abstractmethod
+    def log_progress(self, description: str) -> None:
+        """Log the progress of the agent.
+        This is a hook that is intended for reporting the progress of the agent.
+        """
+        pass
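The new abstract `log_progress` hook means every `Agent` subclass now has to provide a progress-reporting method. Below is a minimal sketch of a conforming subclass; the `EchoAgent` name, its echo behavior, and the first parameter of `__call__` (not visible in this hunk) are assumptions, only the `log_progress` signature comes from the diff above.

```python
from pathlib import Path
from typing import Dict, List, Optional, Union

from vision_agent.agent.agent import Agent


class EchoAgent(Agent):
    """Hypothetical minimal Agent subclass illustrating the new abstract hook."""

    def __call__(
        self,
        input: Union[List[Dict[str, str]], str],
        image: Optional[Union[str, Path]] = None,
    ) -> str:
        # A real agent would plan and call tools here; this sketch just echoes.
        self.log_progress("received a request")
        return input if isinstance(input, str) else str(input)

    def log_progress(self, description: str) -> None:
        # Required by the 0.0.49 Agent ABC; printing is the simplest possible sink.
        print(description)
```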
vision_agent/agent/easytool_prompts.py
CHANGED
@@ -1,11 +1,11 @@
-TASK_DECOMPOSE = """You need to decompose a
+TASK_DECOMPOSE = """You need to decompose a user's complex question into some simple subtasks and let the model execute it step by step.
 This is the user's question: {question}
-This is tool list:
+This is the tool list:
 {tools}

 Please note that:
 1. You should only decompose this complex user's question into some simple subtasks which can be executed easily by using one single tool in the tool list.
-2. If one subtask
+2. If one subtask needs the results from another subtask, you should write clearly. For example:
 {{"Tasks": ["Convert 23 km/h to X km/min by 'divide_'", "Multiply X km/min by 45 min to get Y by 'multiply_'"]}}
 3. You must ONLY output in a parsible JSON format. An example output looks like:

@@ -13,7 +13,7 @@ Please note that:

 Output: """

-TASK_TOPOLOGY = """Given a
+TASK_TOPOLOGY = """Given a user's complex question, I have decomposed this question into some simple subtasks. I think there exist logical connections and order among the tasks. Thus, you need to help me output these logical connections and order.
 You must ONLY output in a parsible JSON format with the following format:

 {{"Tasks": [{{"task": task, "id", task_id, "dep": [dependency_task_id1, dependency_task_id2, ...]}}]}}
@@ -21,7 +21,7 @@ You must ONLY output in a parsible JSON format with the following format:
 The "dep" field denotes the id of the previous task which generates a new resource upon which the current task depends. If there are no dependencies, set "dep" to -1.


-This is user's question: {question}
+This is the user's question: {question}

 These are subtasks of this question:

@@ -34,7 +34,7 @@ These are the tools you can select to solve the question:
 {tools}

 Please note that:
-1. You should only
+1. You should only choose one tool from the Tool List to solve this question.
 2. You must ONLY output the ID of the tool you chose in a parsible JSON format. Two example outputs look like:

 Example 1: {{"ID": 1}}
@@ -42,22 +42,22 @@ Example 2: {{"ID": 2}}

 Output: """

-CHOOSE_PARAMETER = """Given a user's question and
+CHOOSE_PARAMETER = """Given a user's question and an API tool documentation, you need to output parameters according to the API tool documentation to successfully call the API to solve the user's question.
 Please note that:
 1. The Example in the API tool documentation can help you better understand the use of the API.
-2. Ensure the parameters you output are correct. The output must contain the required parameters, and can contain the optional parameters based on the question. If no paremters in the required parameters and optional parameters, just leave it as {{"Parameters":{{}}}}
+2. Ensure the parameters you output are correct. The output must contain the required parameters, and can contain the optional parameters based on the question. If there are no paremters in the required parameters and optional parameters, just leave it as {{"Parameters":{{}}}}
 3. If the user's question mentions other APIs, you should ONLY consider the API tool documentation I give and do not consider other APIs.
 4. The question may have dependencies on answers of other questions, so we will provide logs of previous questions and answers for your reference.
-5. If you need to use this API multiple times
-6. You must ONLY output in a parsible JSON format. Two
+5. If you need to use this API multiple times, please set "Parameters" to a list.
+6. You must ONLY output in a parsible JSON format. Two example outputs looks like:

 Example 1: {{"Parameters":{{"input": [1,2,3]}}}}
 Example 2: {{"Parameters":[{{"input": [1,2,3]}}, {{"input": [2,3,4]}}]}}

-
+These are logs of previous questions and answers:
 {previous_log}
 This is the current user's question: {question}
-This is API tool documentation: {tool_usage}
+This is the API tool documentation: {tool_usage}
 Output: """


@@ -67,7 +67,7 @@ Please note that:
 2. We will not show the API response to the user, thus you need to make full use of the response and give the information in the response that can satisfy the user's question in as much detail as possible.
 3. If the API tool does not provide useful information in the response, please answer with your knowledge.
 4. The question may have dependencies on answers of other questions, so we will provide logs of previous questions and answers.
-
+These are logs of previous questions and answers:
 {previous_log}
 This is the user's question: {question}
 This is the response output by the API tool:
@@ -75,7 +75,7 @@ This is the response output by the API tool:
 We will not show the API response to the user, thus you need to make full use of the response and give the information in the response that can satisfy the user's question in as much detail as possible.
 Output: """

-ANSWER_SUMMARIZE = """We break down a complex user's problems into simple subtasks and provide answers to each simple subtask. You need to organize these answers to each subtask and form a self-consistent final answer to the user's question
+ANSWER_SUMMARIZE = """We break down a complex user's problems into simple subtasks and provide answers to each simple subtask. You need to organize these answers to each subtask and form a self-consistent final answer to the user's question.
 This is the user's question: {question}
 These are subtasks and their answers: {answers}
 Final answer: """
vision_agent/agent/reflexion_prompts.py
CHANGED
@@ -9,7 +9,7 @@ Relevant Context: {context}
 Question: {question}{scratchpad}"""


-COT_REFLECT_INSTRUCTION = """You are an advanced reasoning agent that can improve based on self
+COT_REFLECT_INSTRUCTION = """You are an advanced reasoning agent that can improve based on self-refection. You will be given a previous reasoning trial in which you were given access to relevant context and a question to answer. You were unsuccessful in answering the question either because you guessed the wrong answer with Finish[<answer>] or there is a phrasing discrepancy with your provided answer and the answer key. In a few sentences, diagnose a possible reason for failure or phrasing discrepancy and devise a new, concise, high level plan that aims to mitigate the same failure. Use complete sentences.
 Here are some examples:
 {examples}
 (END OF EXAMPLES)
vision_agent/agent/vision_agent.py
CHANGED
@@ -244,79 +244,6 @@ def function_call(tool: Callable, parameters: Dict[str, Any]) -> Any:
         return str(e)


-def retrieval(
-    model: Union[LLM, LMM, Agent],
-    question: str,
-    tools: Dict[int, Any],
-    previous_log: str,
-    reflections: str,
-) -> Tuple[Dict, str]:
-    tool_id = choose_tool(
-        model, question, {k: v["description"] for k, v in tools.items()}, reflections
-    )
-    if tool_id is None:
-        return {}, ""
-
-    tool_instructions = tools[tool_id]
-    tool_usage = tool_instructions["usage"]
-    tool_name = tool_instructions["name"]
-
-    parameters = choose_parameter(
-        model, question, tool_usage, previous_log, reflections
-    )
-    if parameters is None:
-        return {}, ""
-    tool_results = {"task": question, "tool_name": tool_name, "parameters": parameters}
-
-    _LOGGER.info(
-        f"""Going to run the following tool(s) in sequence:
-{tabulate([tool_results], headers="keys", tablefmt="mixed_grid")}"""
-    )
-
-    def parse_tool_results(result: Dict[str, Union[Dict, List]]) -> Any:
-        call_results: List[Any] = []
-        if isinstance(result["parameters"], Dict):
-            call_results.append(
-                function_call(tools[tool_id]["class"], result["parameters"])
-            )
-        elif isinstance(result["parameters"], List):
-            for parameters in result["parameters"]:
-                call_results.append(function_call(tools[tool_id]["class"], parameters))
-        return call_results
-
-    call_results = parse_tool_results(tool_results)
-    tool_results["call_results"] = call_results
-
-    call_results_str = str(call_results)
-    # _LOGGER.info(f"\tCall Results: {call_results_str}")
-    return tool_results, call_results_str
-
-
-def create_tasks(
-    task_model: Union[LLM, LMM], question: str, tools: Dict[int, Any], reflections: str
-) -> List[Dict]:
-    tasks = task_decompose(
-        task_model,
-        question,
-        {k: v["description"] for k, v in tools.items()},
-        reflections,
-    )
-    if tasks is not None:
-        task_list = [{"task": task, "id": i + 1} for i, task in enumerate(tasks)]
-        task_list = task_topology(task_model, question, task_list)
-        try:
-            task_list = topological_sort(task_list)
-        except Exception:
-            _LOGGER.error(f"Failed topological_sort on: {task_list}")
-    else:
-        task_list = []
-    _LOGGER.info(
-        f"""Planned tasks:
-{tabulate(task_list, headers="keys", tablefmt="mixed_grid")}"""
-    )
-    return task_list
-
-
 def self_reflect(
     reflect_model: Union[LLM, LMM],
     question: str,
@@ -350,7 +277,7 @@ def parse_reflect(reflect: str) -> bool:
 def visualize_result(all_tool_results: List[Dict]) -> List[str]:
     image_to_data: Dict[str, Dict] = {}
     for tool_result in all_tool_results:
-        if
+        if tool_result["tool_name"] not in ["grounding_sam_", "grounding_dino_"]:
             continue

         parameters = tool_result["parameters"]
@@ -368,7 +295,6 @@ def visualize_result(all_tool_results: List[Dict]) -> List[str]:
             continue

         for param, call_result in zip(parameters, tool_result["call_results"]):
-
             # calls can fail, so we need to check if the call was successful
             if not isinstance(call_result, dict):
                 continue
@@ -421,7 +347,18 @@ class VisionAgent(Agent):
         reflect_model: Optional[Union[LLM, LMM]] = None,
         max_retries: int = 2,
         verbose: bool = False,
+        report_progress_callback: Optional[Callable[[str], None]] = None,
     ):
+        """VisionAgent constructor.
+
+        Parameters
+            task_model: the model to use for task decomposition.
+            answer_model: the model to use for reasoning and concluding the answer.
+            reflect_model: the model to use for self reflection.
+            max_retries: maximum number of retries to attempt to complete the task.
+            verbose: whether to print more logs.
+            report_progress_callback: a callback to report the progress of the agent. This is useful for streaming logs in a web application where multiple VisionAgent instances are running in parallel. This callback ensures that the progress are not mixed up.
+        """
         self.task_model = (
             OpenAILLM(json_mode=True, temperature=0.1)
             if task_model is None
@@ -434,8 +371,8 @@ class VisionAgent(Agent):
             OpenAILMM(temperature=0.1) if reflect_model is None else reflect_model
         )
         self.max_retries = max_retries
-
         self.tools = TOOLS
+        self.report_progress_callback = report_progress_callback
         if verbose:
             _LOGGER.setLevel(logging.INFO)

@@ -458,6 +395,11 @@ class VisionAgent(Agent):
             input = [{"role": "user", "content": input}]
         return self.chat(input, image=image)

+    def log_progress(self, description: str) -> None:
+        _LOGGER.info(description)
+        if self.report_progress_callback:
+            self.report_progress_callback(description)
+
     def chat_with_workflow(
         self, chat: List[Dict[str, str]], image: Optional[Union[str, Path]] = None
     ) -> Tuple[str, List[Dict]]:
@@ -470,7 +412,9 @@ class VisionAgent(Agent):
         all_tool_results: List[Dict] = []

         for _ in range(self.max_retries):
-            task_list = create_tasks(
+            task_list = self.create_tasks(
+                self.task_model, question, self.tools, reflections
+            )

             task_depend = {"Original Quesiton": question}
             previous_log = ""
@@ -482,7 +426,7 @@ class VisionAgent(Agent):
             for task in task_list:
                 task_str = task["task"]
                 previous_log = str(task_depend)
-                tool_results, call_results = retrieval(
+                tool_results, call_results = self.retrieval(
                     self.task_model,
                     task_str,
                     self.tools,
@@ -496,8 +440,8 @@ class VisionAgent(Agent):
                 tool_results["answer"] = answer
                 all_tool_results.append(tool_results)

-
-
+                self.log_progress(f"\tCall Result: {call_results}")
+                self.log_progress(f"\tAnswer: {answer}")
                 answers.append({"task": task_str, "answer": answer})
                 task_depend[task["id"]]["answer"] = answer  # type: ignore
                 task_depend[task["id"]]["call_result"] = call_results  # type: ignore
@@ -515,12 +459,15 @@ class VisionAgent(Agent):
                 final_answer,
                 visualized_images[0] if len(visualized_images) > 0 else image,
             )
-
+            self.log_progress(f"Reflection: {reflection}")
             if parse_reflect(reflection):
                 break
             else:
                 reflections += reflection
-
+        # '<END>' is a symbol to indicate the end of the chat, which is useful for streaming logs.
+        self.log_progress(
+            f"The Vision Agent has concluded this chat. <ANSWER>{final_answer}</<ANSWER>"
+        )
         return final_answer, all_tool_results

     def chat(
@@ -528,3 +475,87 @@ class VisionAgent(Agent):
     ) -> str:
         answer, _ = self.chat_with_workflow(chat, image=image)
         return answer
+
+    def retrieval(
+        self,
+        model: Union[LLM, LMM, Agent],
+        question: str,
+        tools: Dict[int, Any],
+        previous_log: str,
+        reflections: str,
+    ) -> Tuple[Dict, str]:
+        tool_id = choose_tool(
+            model,
+            question,
+            {k: v["description"] for k, v in tools.items()},
+            reflections,
+        )
+        if tool_id is None:
+            return {}, ""
+
+        tool_instructions = tools[tool_id]
+        tool_usage = tool_instructions["usage"]
+        tool_name = tool_instructions["name"]
+
+        parameters = choose_parameter(
+            model, question, tool_usage, previous_log, reflections
+        )
+        if parameters is None:
+            return {}, ""
+        tool_results = {
+            "task": question,
+            "tool_name": tool_name,
+            "parameters": parameters,
+        }
+
+        self.log_progress(
+            f"""Going to run the following tool(s) in sequence:
+{tabulate([tool_results], headers="keys", tablefmt="mixed_grid")}"""
+        )
+
+        def parse_tool_results(result: Dict[str, Union[Dict, List]]) -> Any:
+            call_results: List[Any] = []
+            if isinstance(result["parameters"], Dict):
+                call_results.append(
+                    function_call(tools[tool_id]["class"], result["parameters"])
+                )
+            elif isinstance(result["parameters"], List):
+                for parameters in result["parameters"]:
+                    call_results.append(
+                        function_call(tools[tool_id]["class"], parameters)
+                    )
+            return call_results
+
+        call_results = parse_tool_results(tool_results)
+        tool_results["call_results"] = call_results
+
+        call_results_str = str(call_results)
+        return tool_results, call_results_str
+
+    def create_tasks(
+        self,
+        task_model: Union[LLM, LMM],
+        question: str,
+        tools: Dict[int, Any],
+        reflections: str,
+    ) -> List[Dict]:
+        tasks = task_decompose(
+            task_model,
+            question,
+            {k: v["description"] for k, v in tools.items()},
+            reflections,
+        )
+        if tasks is not None:
+            task_list = [{"task": task, "id": i + 1} for i, task in enumerate(tasks)]
+            task_list = task_topology(task_model, question, task_list)
+            try:
+                task_list = topological_sort(task_list)
+            except Exception:
+                _LOGGER.error(f"Failed topological_sort on: {task_list}")
+        else:
+            task_list = []
+        self.log_progress(
+            f"""Planned tasks:
+{tabulate(task_list, headers="keys", tablefmt="mixed_grid")}"""
+        )
+        return task_list
vision_agent/agent/vision_agent_prompts.py
CHANGED
@@ -1,4 +1,4 @@
-VISION_AGENT_REFLECTION = """You are an advanced reasoning agent that can improve based on self
+VISION_AGENT_REFLECTION = """You are an advanced reasoning agent that can improve based on self-refection. You will be given a previous reasoning trial in which you were given the user's question, the available tools that the agent has, the decomposed tasks and tools that the agent used to answer the question and the final answer the agent provided. You must determine if the agent's answer was correct or incorrect. If the agent's answer was correct, respond with Finish. If the agent's answer was incorrect, you must diagnose a possible reason for failure or phrasing discrepancy and devise a new, concise, high level plan that aims to mitigate the same failure with the tools available. Use complete sentences.

 User's question: {question}

@@ -13,14 +13,14 @@ Final answer:

 Reflection: """

-TASK_DECOMPOSE = """You need to decompose a
+TASK_DECOMPOSE = """You need to decompose a user's complex question into some simple subtasks and let the model execute it step by step.
 This is the user's question: {question}
-This is tool list:
+This is the tool list:
 {tools}

 Please note that:
-1. You should only decompose this
-2. If one subtask
+1. You should only decompose this user's complex question into some simple subtasks which can be executed easily by using one single tool in the tool list.
+2. If one subtask needs the results from another subtask, you should write clearly. For example:
 {{"Tasks": ["Convert 23 km/h to X km/min by 'divide_'", "Multiply X km/min by 45 min to get Y by 'multiply_'"]}}
 3. You must ONLY output in a parsible JSON format. An example output looks like:

@@ -28,18 +28,18 @@ Please note that:

 Output: """

-TASK_DECOMPOSE_DEPENDS = """You need to decompose a
+TASK_DECOMPOSE_DEPENDS = """You need to decompose a user's complex question into some simple subtasks and let the model execute it step by step.
 This is the user's question: {question}

-This is tool list:
+This is the tool list:
 {tools}

 This is a reflection from a previous failed attempt:
 {reflections}

 Please note that:
-1. You should only decompose this
-2. If one subtask
+1. You should only decompose this user's complex question into some simple subtasks which can be executed easily by using one single tool in the tool list.
+2. If one subtask needs the results from another subtask, you should write clearly. For example:
 {{"Tasks": ["Convert 23 km/h to X km/min by 'divide_'", "Multiply X km/min by 45 min to get Y by 'multiply_'"]}}
 3. You must ONLY output in a parsible JSON format. An example output looks like:

@@ -53,7 +53,7 @@ These are the tools you can select to solve the question:
 {tools}

 Please note that:
-1. You should only
+1. You should only choose one tool from the Tool List to solve this question.
 2. You must ONLY output the ID of the tool you chose in a parsible JSON format. Two example outputs look like:

 Example 1: {{"ID": 1}}
@@ -70,7 +70,7 @@ This is a reflection from a previous failed attempt:
 {reflections}

 Please note that:
-1. You should only
+1. You should only choose one tool from the Tool List to solve this question.
 2. You must ONLY output the ID of the tool you chose in a parsible JSON format. Two example outputs look like:

 Example 1: {{"ID": 1}}
@@ -78,14 +78,14 @@ Example 2: {{"ID": 2}}

 Output: """

-CHOOSE_PARAMETER_DEPENDS = """Given a user's question and
+CHOOSE_PARAMETER_DEPENDS = """Given a user's question and an API tool documentation, you need to output parameters according to the API tool documentation to successfully call the API to solve the user's question.
 Please note that:
 1. The Example in the API tool documentation can help you better understand the use of the API.
-2. Ensure the parameters you output are correct. The output must contain the required parameters, and can contain the optional parameters based on the question. If no paremters in the required parameters and optional parameters, just leave it as {{"Parameters":{{}}}}
+2. Ensure the parameters you output are correct. The output must contain the required parameters, and can contain the optional parameters based on the question. If there are no paremters in the required parameters and optional parameters, just leave it as {{"Parameters":{{}}}}
 3. If the user's question mentions other APIs, you should ONLY consider the API tool documentation I give and do not consider other APIs.
 4. The question may have dependencies on answers of other questions, so we will provide logs of previous questions and answers for your reference.
-5. If you need to use this API multiple times
-6. You must ONLY output in a parsible JSON format. Two
+5. If you need to use this API multiple times, please set "Parameters" to a list.
+6. You must ONLY output in a parsible JSON format. Two example outputs look like:

 Example 1: {{"Parameters":{{"input": [1,2,3]}}}}
 Example 2: {{"Parameters":[{{"input": [1,2,3]}}, {{"input": [2,3,4]}}]}}
@@ -93,16 +93,16 @@ Example 2: {{"Parameters":[{{"input": [1,2,3]}}, {{"input": [2,3,4]}}]}}
 This is a reflection from a previous failed attempt:
 {reflections}

-
+These are logs of previous questions and answers:
 {previous_log}

 This is the current user's question: {question}
-This is API tool documentation: {tool_usage}
+This is the API tool documentation: {tool_usage}
 Output: """

 ANSWER_GENERATE_DEPENDS = """You should answer the question based on the response output by the API tool.
 Please note that:
-1.
+1. You should try to organize the response into a natural language answer.
 2. We will not show the API response to the user, thus you need to make full use of the response and give the information in the response that can satisfy the user's question in as much detail as possible.
 3. If the API tool does not provide useful information in the response, please answer with your knowledge.
 4. The question may have dependencies on answers of other questions, so we will provide logs of previous questions and answers.
@@ -110,7 +110,7 @@ Please note that:
 This is a reflection from a previous failed attempt:
 {reflections}

-
+These are logs of previous questions and answers:
 {previous_log}

 This is the user's question: {question}
@@ -121,7 +121,7 @@ This is the response output by the API tool:
 We will not show the API response to the user, thus you need to make full use of the response and give the information in the response that can satisfy the user's question in as much detail as possible.
 Output: """

-ANSWER_SUMMARIZE_DEPENDS = """We break down a
+ANSWER_SUMMARIZE_DEPENDS = """We break down a user's complex problems into simple subtasks and provide answers to each simple subtask. You need to organize these answers to each subtask and form a self-consistent final answer to the user's question
 This is the user's question: {question}

 These are subtasks and their answers:
vision_agent/image_utils.py
CHANGED
@@ -78,7 +78,7 @@ def convert_to_b64(data: Union[str, Path, np.ndarray, ImageType]) -> str:
         data = Image.open(data)
     if isinstance(data, Image.Image):
         buffer = BytesIO()
-        data.save(buffer, format="
+        data.convert("RGB").save(buffer, format="JPEG")
         return base64.b64encode(buffer.getvalue()).decode("utf-8")
     else:
         arr_bytes = data.tobytes()
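For context on the `convert_to_b64` change: Pillow refuses to encode RGBA or palette images as JPEG, so converting to RGB first is what keeps the call from raising. A small standalone sketch of that behavior (plain Pillow, not the package's code):

```python
import base64
from io import BytesIO

from PIL import Image

# An RGBA image (e.g. a PNG with transparency) cannot be written as JPEG directly;
# img.save(buffer, format="JPEG") would raise OSError for this mode.
img = Image.new("RGBA", (4, 4), (255, 0, 0, 128))

buffer = BytesIO()
img.convert("RGB").save(buffer, format="JPEG")  # dropping alpha makes the encode succeed
b64 = base64.b64encode(buffer.getvalue()).decode("utf-8")
print(b64[:32], "...")
```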
vision_agent/llm/__init__.py
CHANGED
@@ -1 +1 @@
-from .llm import LLM, OpenAILLM
+from .llm import LLM, AzureOpenAILLM, OpenAILLM
vision_agent/llm/llm.py
CHANGED
@@ -1,8 +1,9 @@
 import json
+import os
 from abc import ABC, abstractmethod
-from typing import Any, Callable, Dict, List, Mapping, Union, cast
+from typing import Any, Callable, Dict, List, Mapping, Optional, Union, cast

-from openai import OpenAI
+from openai import AzureOpenAI, OpenAI

 from vision_agent.tools import (
     CHOOSE_PARAMS,
@@ -33,11 +34,16 @@ class OpenAILLM(LLM):
     def __init__(
         self,
         model_name: str = "gpt-4-turbo-preview",
+        api_key: Optional[str] = None,
         json_mode: bool = False,
         **kwargs: Any
     ):
+        if not api_key:
+            self.client = OpenAI()
+        else:
+            self.client = OpenAI(api_key=api_key)
+
         self.model_name = model_name
-        self.client = OpenAI()
         self.kwargs = kwargs
         if json_mode:
             self.kwargs["response_format"] = {"type": "json_object"}
@@ -120,3 +126,32 @@ class OpenAILLM(LLM):
         ]

         return lambda x: GroundingSAM()(**{"prompt": params["prompt"], "image": x})
+
+
+class AzureOpenAILLM(OpenAILLM):
+    def __init__(
+        self,
+        model_name: str = "gpt-4-turbo-preview",
+        api_key: Optional[str] = None,
+        api_version: str = "2024-02-01",
+        azure_endpoint: Optional[str] = None,
+        json_mode: bool = False,
+        **kwargs: Any
+    ):
+        if not api_key:
+            api_key = os.getenv("AZURE_OPENAI_API_KEY")
+        if not azure_endpoint:
+            azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
+
+        if not api_key:
+            raise ValueError("Azure OpenAI API key is required.")
+        if not azure_endpoint:
+            raise ValueError("Azure OpenAI endpoint is required.")
+
+        self.client = AzureOpenAI(
+            api_key=api_key, api_version=api_version, azure_endpoint=azure_endpoint
+        )
+        self.model_name = model_name
+        self.kwargs = kwargs
+        if json_mode:
+            self.kwargs["response_format"] = {"type": "json_object"}
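A short usage sketch for the new `AzureOpenAILLM`, based only on the constructor above. The deployment name, key, and endpoint are placeholders; leaving `api_key`/`azure_endpoint` unset falls back to the `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_ENDPOINT` environment variables, and on Azure the `model_name` has to match a deployment name.

```python
import vision_agent as va
from vision_agent.llm import AzureOpenAILLM

# Explicit credentials (placeholders); omit them to use the environment variables instead.
llm = AzureOpenAILLM(
    model_name="my-gpt4-deployment",                    # Azure deployment name (placeholder)
    api_key="your-azure-api-key",                       # placeholder
    azure_endpoint="https://example.openai.azure.com",  # placeholder
    json_mode=True,  # sets response_format={"type": "json_object"}, as in the constructor above
)

# The model can then be handed to VisionAgent as the task/answer model.
agent = va.agent.VisionAgent(task_model=llm, answer_model=llm)
```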
vision_agent/lmm/__init__.py
CHANGED
@@ -1 +1 @@
-from .lmm import LMM, LLaVALMM, OpenAILMM, get_lmm
+from .lmm import LMM, AzureOpenAILMM, LLaVALMM, OpenAILMM, get_lmm
vision_agent/lmm/lmm.py
CHANGED
@@ -1,12 +1,13 @@
 import base64
 import json
 import logging
+import os
 from abc import ABC, abstractmethod
 from pathlib import Path
 from typing import Any, Callable, Dict, List, Optional, Union, cast

 import requests
-from openai import OpenAI
+from openai import AzureOpenAI, OpenAI

 from vision_agent.tools import (
     CHOOSE_PARAMS,
@@ -99,12 +100,18 @@ class OpenAILMM(LMM):
     def __init__(
         self,
         model_name: str = "gpt-4-vision-preview",
+        api_key: Optional[str] = None,
         max_tokens: int = 1024,
         **kwargs: Any,
     ):
+        if not api_key:
+            self.client = OpenAI()
+        else:
+            self.client = OpenAI(api_key=api_key)
+
+        self.client = OpenAI(api_key=api_key)
         self.model_name = model_name
         self.max_tokens = max_tokens
-        self.client = OpenAI()
         self.kwargs = kwargs

     def __call__(
@@ -248,6 +255,34 @@ class OpenAILMM(LMM):
         return lambda x: GroundingSAM()(**{"prompt": params["prompt"], "image": x})


+class AzureOpenAILMM(OpenAILMM):
+    def __init__(
+        self,
+        model_name: str = "gpt-4-vision-preview",
+        api_key: Optional[str] = None,
+        api_version: str = "2024-02-01",
+        azure_endpoint: Optional[str] = None,
+        max_tokens: int = 1024,
+        **kwargs: Any,
+    ):
+        if not api_key:
+            api_key = os.getenv("AZURE_OPENAI_API_KEY")
+        if not azure_endpoint:
+            azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
+
+        if not api_key:
+            raise ValueError("OpenAI API key is required.")
+        if not azure_endpoint:
+            raise ValueError("Azure OpenAI endpoint is required.")
+
+        self.client = AzureOpenAI(
+            api_key=api_key, api_version=api_version, azure_endpoint=azure_endpoint
+        )
+        self.model_name = model_name
+        self.max_tokens = max_tokens
+        self.kwargs = kwargs
+
+
 def get_lmm(name: str) -> LMM:
     if name == "openai":
         return OpenAILMM(name)
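The multimodal counterpart works the same way. A minimal sketch, assuming `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_ENDPOINT` are already exported (the constructor above raises `ValueError` if either is missing) and using a placeholder deployment name:

```python
import os

from vision_agent.lmm import AzureOpenAILMM

# Credentials come from the environment here; both must be set.
assert os.getenv("AZURE_OPENAI_API_KEY") and os.getenv("AZURE_OPENAI_ENDPOINT")

# model_name must be the name of a GPT-4 Vision deployment in your Azure resource (placeholder).
lmm = AzureOpenAILMM(model_name="my-gpt4v-deployment", max_tokens=512)
```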
vision_agent/tools/prompts.py
CHANGED
@@ -6,14 +6,14 @@ CHOOSE_PARAMS = (
     "This is the API tool documentation: {api_doc}\n"
     "Please note that: \n"
     "1. The Example in the API tool documentation can help you better understand the use of the API.\n"
-    '2. Ensure the parameters you output are correct. The output must contain the required parameters, and can contain the optional parameters based on the question. If no
+    '2. Ensure the parameters you output are correct. The output must contain the required parameters, and can contain the optional parameters based on the question. If there are no parameters in the required parameters and optional parameters, just leave it as {{"Parameters":{{}}}}\n'
     "3. If the user's question mentions other APIs, you should ONLY consider the API tool documentation I give and do not consider other APIs.\n"
     '4. If you need to use this API multiple times, please set "Parameters" to a list.\n'
-    "5. You must ONLY output in a parsible JSON format. Two
+    "5. You must ONLY output in a parsible JSON format. Two example outputs look like:\n"
     "'''\n"
     'Example 1: {{"Parameters":{{"keyword": "Artificial Intelligence", "language": "English"}}}}\n'
     'Example 2: {{"Parameters":[{{"keyword": "Artificial Intelligence", "language": "English"}}, {{"keyword": "Machine Learning", "language": "English"}}]}}\n'
     "'''\n"
-    "This is user's question: {question}\n"
+    "This is the user's question: {question}\n"
     "Output:\n"
 )
vision_agent/tools/tools.py
CHANGED
@@ -78,32 +78,32 @@ class CLIP(Tool):
     -------
         >>> import vision_agent as va
         >>> clip = va.tools.CLIP()
-        >>> clip(
+        >>> clip("red line, yellow dot", "ct_scan1.jpg"))
         [{"labels": ["red line", "yellow dot"], "scores": [0.98, 0.02]}]
     """

-    _ENDPOINT = "https://
+    _ENDPOINT = "https://soi4ewr6fjqqdf5vuss6rrilee0kumxq.lambda-url.us-east-2.on.aws"

     name = "clip_"
     description = "'clip_' is a tool that can classify or tag any image given a set of input classes or tags."
     usage = {
         "required_parameters": [
-            {"name": "prompt", "type": "
+            {"name": "prompt", "type": "str"},
             {"name": "image", "type": "str"},
         ],
         "examples": [
             {
                 "scenario": "Can you classify this image as a cat? Image name: cat.jpg",
-                "parameters": {"prompt":
+                "parameters": {"prompt": "cat", "image": "cat.jpg"},
             },
             {
                 "scenario": "Can you tag this photograph with cat or dog? Image name: cat_dog.jpg",
-                "parameters": {"prompt":
+                "parameters": {"prompt": "cat, dog", "image": "cat_dog.jpg"},
             },
             {
                 "scenario": "Can you build me a classifier that classifies red shirts, green shirts and other? Image name: shirts.jpg",
                 "parameters": {
-                    "prompt":
+                    "prompt": "red shirt, green shirt, other",
                     "image": "shirts.jpg",
                 },
             },
@@ -111,11 +111,11 @@ class CLIP(Tool):
     }

     # TODO: Add support for input multiple images, which aligns with the output type.
-    def __call__(self, prompt:
+    def __call__(self, prompt: str, image: Union[str, ImageType]) -> Dict:
         """Invoke the CLIP model.

         Parameters:
-            prompt: a list of classes or tags to classify the image.
+            prompt: a string includes a list of classes or tags to classify the image.
             image: the input image to classify.

         Returns:
@@ -123,8 +123,9 @@ class CLIP(Tool):
         """
         image_b64 = convert_to_b64(image)
         data = {
-            "
-            "
+            "prompt": prompt,
+            "image": image_b64,
+            "tool": "closed_set_image_classification",
         }
         res = requests.post(
             self._ENDPOINT,
@@ -138,10 +139,11 @@ class CLIP(Tool):
             _LOGGER.error(f"Request failed: {resp_json}")
             raise ValueError(f"Request failed: {resp_json}")

-
-
-
-
+        resp_json["data"]["scores"] = [
+            round(prob, 4) for prob in resp_json["data"]["scores"]
+        ]
+
+        return resp_json["data"]  # type: ignore


 class GroundingDINO(Tool):
@@ -158,7 +160,7 @@ class GroundingDINO(Tool):
         'scores': [0.98, 0.02]}]
     """

-    _ENDPOINT = "https://
+    _ENDPOINT = "https://soi4ewr6fjqqdf5vuss6rrilee0kumxq.lambda-url.us-east-2.on.aws"

     name = "grounding_dino_"
     description = "'grounding_dino_' is a tool that can detect arbitrary objects with inputs such as category names or referring expressions."
@@ -167,6 +169,10 @@ class GroundingDINO(Tool):
             {"name": "prompt", "type": "str"},
             {"name": "image", "type": "str"},
         ],
+        "optional_parameters": [
+            {"name": "box_threshold", "type": "float"},
+            {"name": "iou_threshold", "type": "float"},
+        ],
         "examples": [
             {
                 "scenario": "Can you build me a car detector?",
@@ -181,32 +187,44 @@ class GroundingDINO(Tool):
                 "parameters": {
                     "prompt": "red shirt. green shirt",
                     "image": "shirts.jpg",
+                    "box_threshold": 0.20,
+                    "iou_threshold": 0.75,
                 },
             },
         ],
     }

     # TODO: Add support for input multiple images, which aligns with the output type.
-    def __call__(
+    def __call__(
+        self,
+        prompt: str,
+        image: Union[str, Path, ImageType],
+        box_threshold: float = 0.20,
+        iou_threshold: float = 0.75,
+    ) -> Dict:
         """Invoke the Grounding DINO model.

         Parameters:
             prompt: one or multiple class names to detect. The classes should be separated by a period if there are multiple classes. E.g. "big dog . small cat"
             image: the input image to run against.
+            box_threshold: the threshold to filter out the bounding boxes with low scores.
+            iou_threshold: the threshold for intersection over union used in nms algorithm. It will suppress the boxes which have iou greater than this threshold.

         Returns:
             A list of dictionaries containing the labels, scores, and bboxes. Each dictionary contains the detection result for an image.
         """
         image_size = get_image_size(image)
         image_b64 = convert_to_b64(image)
-
+        request_data = {
             "prompt": prompt,
-            "
+            "image": image_b64,
+            "tool": "visual_grounding",
+            "kwargs": {"box_threshold": box_threshold, "iou_threshold": iou_threshold},
         }
         res = requests.post(
             self._ENDPOINT,
             headers={"Content-Type": "application/json"},
-            json=
+            json=request_data,
         )
         resp_json: Dict[str, Any] = res.json()
         if (
@@ -214,16 +232,15 @@ class GroundingDINO(Tool):
         ) or "statusCode" not in resp_json:
             _LOGGER.error(f"Request failed: {resp_json}")
             raise ValueError(f"Request failed: {resp_json}")
-
-
-
-
-
-
-
-
-
-        return cast(Dict, resp_data)
+        data: Dict[str, Any] = resp_json["data"]
+        if "bboxes" in data:
+            data["bboxes"] = [normalize_bbox(box, image_size) for box in data["bboxes"]]
+        if "scores" in data:
+            data["scores"] = [round(score, 2) for score in data["scores"]]
+        if "labels" in data:
+            data["labels"] = [label for label in data["labels"]]
+        data["size"] = (image_size[1], image_size[0])
+        return data


 class GroundingSAM(Tool):
@@ -234,7 +251,7 @@ class GroundingSAM(Tool):
     -------
         >>> import vision_agent as va
         >>> t = va.tools.GroundingSAM()
-        >>> t(
+        >>> t("red line, yellow dot", "ct_scan1.jpg"])
         [{'labels': ['yellow dot', 'red line'],
         'bboxes': [[0.38, 0.15, 0.59, 0.7], [0.48, 0.25, 0.69, 0.71]],
         'masks': [array([[0, 0, 0, ..., 0, 0, 0],
@@ -249,55 +266,71 @@ class GroundingSAM(Tool):
         [1, 1, 1, ..., 1, 1, 1]], dtype=uint8)]}]
     """

-    _ENDPOINT = "https://
+    _ENDPOINT = "https://soi4ewr6fjqqdf5vuss6rrilee0kumxq.lambda-url.us-east-2.on.aws"

     name = "grounding_sam_"
     description = "'grounding_sam_' is a tool that can detect and segment arbitrary objects with inputs such as category names or referring expressions."
     usage = {
         "required_parameters": [
-            {"name": "prompt", "type": "
+            {"name": "prompt", "type": "str"},
             {"name": "image", "type": "str"},
         ],
+        "optional_parameters": [
+            {"name": "box_threshold", "type": "float"},
+            {"name": "iou_threshold", "type": "float"},
+        ],
         "examples": [
             {
                 "scenario": "Can you build me a car segmentor?",
-                "parameters": {"prompt":
+                "parameters": {"prompt": "car", "image": ""},
             },
             {
                 "scenario": "Can you segment the person on the left? Image name: person.jpg",
-                "parameters": {"prompt":
+                "parameters": {"prompt": "person on the left", "image": "person.jpg"},
             },
             {
                 "scenario": "Can you build me a tool that segments red shirts and green shirts? Image name: shirts.jpg",
                 "parameters": {
-                    "prompt":
+                    "prompt": "red shirt, green shirt",
                     "image": "shirts.jpg",
+                    "box_threshold": 0.20,
+                    "iou_threshold": 0.75,
                 },
             },
         ],
     }

     # TODO: Add support for input multiple images, which aligns with the output type.
-    def __call__(
+    def __call__(
+        self,
+        prompt: str,
+        image: Union[str, ImageType],
+        box_threshold: float = 0.2,
+        iou_threshold: float = 0.75,
+    ) -> Dict:
         """Invoke the Grounding SAM model.

         Parameters:
             prompt: a list of classes to segment.
             image: the input image to segment.
+            box_threshold: the threshold to filter out the bounding boxes with low scores.
+            iou_threshold: the threshold for intersection over union used in nms algorithm. It will suppress the boxes which have iou greater than this threshold.

         Returns:
             A list of dictionaries containing the labels, scores, bboxes and masks. Each dictionary contains the segmentation result for an image.
         """
         image_size = get_image_size(image)
         image_b64 = convert_to_b64(image)
-
-            "
+        request_data = {
+            "prompt": prompt,
             "image": image_b64,
+            "tool": "visual_grounding_segment",
+            "kwargs": {"box_threshold": box_threshold, "iou_threshold": iou_threshold},
         }
         res = requests.post(
             self._ENDPOINT,
             headers={"Content-Type": "application/json"},
-            json=
+            json=request_data,
         )
         resp_json: Dict[str, Any] = res.json()
         if (
@@ -305,14 +338,19 @@ class GroundingSAM(Tool):
         ) or "statusCode" not in resp_json:
             _LOGGER.error(f"Request failed: {resp_json}")
             raise ValueError(f"Request failed: {resp_json}")
-
+        data: Dict[str, Any] = resp_json["data"]
         ret_pred: Dict[str, List] = {"labels": [], "bboxes": [], "masks": []}
-
-
-
-
-
-        ret_pred["masks"]
+        if "bboxes" in data:
+            ret_pred["bboxes"] = [
+                normalize_bbox(box, image_size) for box in data["bboxes"]
+            ]
+        if "masks" in data:
+            ret_pred["masks"] = [
+                rle_decode(mask_rle=mask, shape=data["mask_shape"])
+                for mask in data["masks"]
+            ]
+        ret_pred["labels"] = data["labels"]
+        ret_pred["scores"] = data["scores"]
         return ret_pred


@@ -321,8 +359,14 @@ class AgentGroundingSAM(GroundingSAM):
     returns the file name. This makes it easier for agents to use.
     """

-    def __call__(
-
+    def __call__(
+        self,
+        prompt: str,
+        image: Union[str, ImageType],
+        box_threshold: float = 0.2,
+        iou_threshold: float = 0.75,
+    ) -> Dict:
+        rets = super().__call__(prompt, image, box_threshold, iou_threshold)
         mask_files = []
         for mask in rets["masks"]:
             with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
@@ -403,7 +447,7 @@ class BboxArea(Tool):
     name = "bbox_area_"
     description = "'bbox_area_' returns the area of the bounding box in pixels normalized to 2 decimal places."
     usage = {
-        "required_parameters": [{"name": "
+        "required_parameters": [{"name": "bboxes", "type": "List[int]"}],
         "examples": [
             {
                 "scenario": "If you want to calculate the area of the bounding box [0.2, 0.21, 0.34, 0.42]",
@@ -445,7 +489,8 @@ class SegArea(Tool):
     def __call__(self, masks: Union[str, Path]) -> float:
         pil_mask = Image.open(str(masks))
         np_mask = np.array(pil_mask)
-
+        np_mask = np.clip(np_mask, 0, 1)
+        return cast(float, round(np.sum(np_mask), 2))


 class BboxIoU(Tool):
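A usage sketch for the new detection thresholds, following the `GroundingDINO.__call__` signature added above (the same keywords apply to `GroundingSAM`). The image path is a placeholder, and `GroundingDINO` is assumed to be exported from `va.tools` like the other tools shown in the docstrings.

```python
import vision_agent as va

dino = va.tools.GroundingDINO()

# Per the docstring above: a lower box_threshold keeps more low-confidence boxes,
# and a lower iou_threshold makes NMS suppress overlapping boxes more aggressively.
result = dino(
    "car . person",       # multiple classes are separated by a period
    "street.jpg",         # placeholder image path
    box_threshold=0.25,
    iou_threshold=0.6,
)
print(result["labels"], result["scores"], result["bboxes"])
```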
{vision_agent-0.0.48.dist-info → vision_agent-0.0.49.dist-info}/METADATA
CHANGED
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: vision-agent
-Version: 0.0.48
+Version: 0.0.49
 Summary: Toolset for Vision Agent
 Author: Landing AI
 Author-email: dev@landing.ai
@@ -59,7 +59,8 @@ To get started, you can install the library using pip:
 pip install vision-agent
 ```

-Ensure you have an OpenAI API key and set it as an environment variable
+Ensure you have an OpenAI API key and set it as an environment variable (if you are
+using Azure OpenAI please see the additional setup section):

 ```bash
 export OPENAI_API_KEY="your-api-key"
@@ -139,3 +140,23 @@ you. For example:

 It also has a basic set of calculate tools such as add, subtract, multiply and divide.

+### Additional Setup
+If you want to use Azure OpenAI models, you can set the environment variable:
+
+```bash
+export AZURE_OPENAI_API_KEY="your-api-key"
+export AZURE_OPENAI_ENDPOINT="your-endpoint"
+```
+
+You can then run Vision Agent using the Azure OpenAI models:
+
+```python
+>>> import vision_agent as va
+>>> agent = va.agent.VisionAgent(
+>>>     task_model=va.llm.AzureOpenAILLM(),
+>>>     answer_model=va.lmm.AzureOpenAILMM(),
+>>>     reflection_model=va.lmm.AzureOpenAILMM(),
+>>> )
+```
+
+
vision_agent-0.0.49.dist-info/RECORD
@@ -0,0 +1,26 @@
+vision_agent/__init__.py,sha256=wD1cssVTAJ55uTViNfBGooqJUV0p9fmVAuTMHHrmUBU,229
+vision_agent/agent/__init__.py,sha256=B4JVrbY4IRVCJfjmrgvcp7h1mTUEk8MZvL0Zmej4Ka0,127
+vision_agent/agent/agent.py,sha256=X7kON-g9ePUKumCDaYfQNBX_MEFE-ax5PnRp7-Cc5Wo,529
+vision_agent/agent/easytool.py,sha256=oMHnBg7YBtIPgqQUNcZgq7uMgpPThs99_UnO7ERkMVg,11511
+vision_agent/agent/easytool_prompts.py,sha256=dYzWa_RaiaFSQ-CowoQOcFmjZtBTTljRyA809bLgrvU,4519
+vision_agent/agent/reflexion.py,sha256=wzpptfALNZIh9Q5jgkK3imGL5LWjTW_n_Ypsvxdh07Q,10101
+vision_agent/agent/reflexion_prompts.py,sha256=G7UAeNz_g2qCb2yN6OaIC7bQVUkda4m3z42EG8wAyfE,9342
+vision_agent/agent/vision_agent.py,sha256=DgvRra_1e05xyo8vIwD8TwZDcd5v-KdfaGB_QJLh62o,19101
+vision_agent/agent/vision_agent_prompts.py,sha256=fYnOT6z7DmuVTfUknUuc6b_vPmO0vgCyVJRQSR5M-G8,6192
+vision_agent/data/__init__.py,sha256=YU-5g3LbEQ6a4drz0RLGTagXMVU2Z4Xr3RlfWE-R0jU,46
+vision_agent/data/data.py,sha256=pgtSGZdAnbQ8oGsuapLtFTMPajnCGDGekEXTnFuBwsY,5122
+vision_agent/emb/__init__.py,sha256=YmCkGrJBtXb6X6Z3lnKiFoQYKXMgHMJp8JJyMLVvqcI,75
+vision_agent/emb/emb.py,sha256=la9lhEzk7jqUCjYYQ5oRgVNSnC9_EJBJIpE_B9c6PJo,1375
+vision_agent/image_utils.py,sha256=_hDikKa40U-2nQufKMRDgU9t-OmwCK9Rb_6O3v1U3nE,4436
+vision_agent/llm/__init__.py,sha256=BoUm_zSAKnLlE8s-gKTSQugXDqVZKPqYlWwlTLdhcz4,48
+vision_agent/llm/llm.py,sha256=tgL6ZtuwZKuxSNiCxJCuP2ETjNMrosdgxXkZJb0_00E,5024
+vision_agent/lmm/__init__.py,sha256=nnNeKD1k7q_4vLb1x51O_EUTYaBgGfeiCx5F433gr3M,67
+vision_agent/lmm/lmm.py,sha256=LxwxCArp7DfnPbjf_Gl55xBxPwo2Qx8eDp1gCnGYSO0,9535
+vision_agent/tools/__init__.py,sha256=AKN-T659HpwVearRnkCd6wWNoJ6K5kW9gAZwb8IQSLE,235
+vision_agent/tools/prompts.py,sha256=V1z4YJLXZuUl_iZ5rY0M5hHc_2tmMEUKr0WocXKGt4E,1430
+vision_agent/tools/tools.py,sha256=bYc3Xeg0wDjpfd8WGxRPCSaGQxUHRLI2PJk-SThqjHY,25644
+vision_agent/tools/video.py,sha256=40rscP8YvKN3lhZ4PDcOK4XbdFX2duCRpHY_krmBYKU,7476
+vision_agent-0.0.49.dist-info/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
+vision_agent-0.0.49.dist-info/METADATA,sha256=PgExhHIptlfP38agIfQIqbj0LEhjlBLcapULWU3o2YM,6142
+vision_agent-0.0.49.dist-info/WHEEL,sha256=7Z8_27uaHI_UZAc4Uox4PpBhQ9Y5_modZXWMxtUi4NU,88
+vision_agent-0.0.49.dist-info/RECORD,,
vision_agent-0.0.48.dist-info/RECORD
@@ -1,26 +0,0 @@
-vision_agent/__init__.py,sha256=wD1cssVTAJ55uTViNfBGooqJUV0p9fmVAuTMHHrmUBU,229
-vision_agent/agent/__init__.py,sha256=B4JVrbY4IRVCJfjmrgvcp7h1mTUEk8MZvL0Zmej4Ka0,127
-vision_agent/agent/agent.py,sha256=PRLItaPfMc94H6mAIPj_gBvJ8RezDEPanB6Cmu81A0M,306
-vision_agent/agent/easytool.py,sha256=oMHnBg7YBtIPgqQUNcZgq7uMgpPThs99_UnO7ERkMVg,11511
-vision_agent/agent/easytool_prompts.py,sha256=uNp12LOFRLr3i2zLhNuLuyFms2-s8es2t6P6h76QDow,4493
-vision_agent/agent/reflexion.py,sha256=wzpptfALNZIh9Q5jgkK3imGL5LWjTW_n_Ypsvxdh07Q,10101
-vision_agent/agent/reflexion_prompts.py,sha256=UPGkt_qgHBMUY0VPVoF-BqhR0d_6WPjjrhbYLBYOtnQ,9342
-vision_agent/agent/vision_agent.py,sha256=P2melU6XQCCiiL1C_4QsxGUaWbwahuJA90eIcQJTR4U,17449
-vision_agent/agent/vision_agent_prompts.py,sha256=fSYO-6D-7rExS8tyZyZewrzAWsn2ZiqjBfoODL9m5Yk,6152
-vision_agent/data/__init__.py,sha256=YU-5g3LbEQ6a4drz0RLGTagXMVU2Z4Xr3RlfWE-R0jU,46
-vision_agent/data/data.py,sha256=pgtSGZdAnbQ8oGsuapLtFTMPajnCGDGekEXTnFuBwsY,5122
-vision_agent/emb/__init__.py,sha256=YmCkGrJBtXb6X6Z3lnKiFoQYKXMgHMJp8JJyMLVvqcI,75
-vision_agent/emb/emb.py,sha256=la9lhEzk7jqUCjYYQ5oRgVNSnC9_EJBJIpE_B9c6PJo,1375
-vision_agent/image_utils.py,sha256=XiOLpHAvlk55URw6iG7hl1OY71FVRA9_25b650amZXA,4420
-vision_agent/llm/__init__.py,sha256=fBKsIjL4z08eA0QYx6wvhRe4Nkp2pJ4VrZK0-uUL5Ec,32
-vision_agent/llm/llm.py,sha256=l8ZVh6vCZOJBHfenfOoHwPySXEUQoNt_gbL14gkvu2g,3904
-vision_agent/lmm/__init__.py,sha256=I8mbeNUajTfWVNqLsuFQVOaNBDlkIhYp9DFU8H4kB7g,51
-vision_agent/lmm/lmm.py,sha256=s_A3SKCoWm2biOt-gS9PXOsa9l-zrmR6mInLjAqam-A,8438
-vision_agent/tools/__init__.py,sha256=AKN-T659HpwVearRnkCd6wWNoJ6K5kW9gAZwb8IQSLE,235
-vision_agent/tools/prompts.py,sha256=9RBbyqlNlExsGKlJ89Jkph83DAEJ8PCVGaHoNbyN7TM,1416
-vision_agent/tools/tools.py,sha256=VD80cINHyesmGAfiCMrK506Q-G9QU_Srzey5wJ3aJGQ,23884
-vision_agent/tools/video.py,sha256=40rscP8YvKN3lhZ4PDcOK4XbdFX2duCRpHY_krmBYKU,7476
-vision_agent-0.0.48.dist-info/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
-vision_agent-0.0.48.dist-info/METADATA,sha256=y5wDj2u8p8zlIhxBh87SRWXAlc1hcMWd_aaLyuOKTbI,5581
-vision_agent-0.0.48.dist-info/WHEEL,sha256=7Z8_27uaHI_UZAc4Uox4PpBhQ9Y5_modZXWMxtUi4NU,88
-vision_agent-0.0.48.dist-info/RECORD,,
{vision_agent-0.0.48.dist-info → vision_agent-0.0.49.dist-info}/LICENSE
File without changes
{vision_agent-0.0.48.dist-info → vision_agent-0.0.49.dist-info}/WHEEL
File without changes