vision-agent 0.2.229__tar.gz → 0.2.230__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. vision_agent-0.2.230/PKG-INFO +156 -0
  2. vision_agent-0.2.230/README.md +110 -0
  3. {vision_agent-0.2.229 → vision_agent-0.2.230}/pyproject.toml +1 -1
  4. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/.sim_tools/df.csv +10 -8
  5. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/agent_utils.py +10 -9
  6. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/vision_agent.py +3 -4
  7. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/vision_agent_coder_prompts.py +6 -6
  8. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/vision_agent_coder_v2.py +41 -26
  9. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/vision_agent_planner_prompts.py +6 -6
  10. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/vision_agent_planner_prompts_v2.py +16 -50
  11. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/vision_agent_planner_v2.py +10 -12
  12. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/vision_agent_prompts.py +11 -11
  13. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/vision_agent_prompts_v2.py +18 -3
  14. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/vision_agent_v2.py +29 -30
  15. vision_agent-0.2.230/vision_agent/configs/__init__.py +1 -0
  16. vision_agent-0.2.230/vision_agent/configs/anthropic_config.py +150 -0
  17. vision_agent-0.2.230/vision_agent/configs/anthropic_openai_config.py +150 -0
  18. vision_agent-0.2.230/vision_agent/configs/config.py +150 -0
  19. vision_agent-0.2.230/vision_agent/configs/openai_config.py +160 -0
  20. vision_agent-0.2.230/vision_agent/lmm/__init__.py +2 -0
  21. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/lmm/lmm.py +63 -9
  22. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/tools/planner_tools.py +60 -40
  23. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/tools/tools.py +10 -8
  24. vision_agent-0.2.229/PKG-INFO +0 -562
  25. vision_agent-0.2.229/README.md +0 -516
  26. vision_agent-0.2.229/vision_agent/lmm/__init__.py +0 -2
  27. {vision_agent-0.2.229 → vision_agent-0.2.230}/LICENSE +0 -0
  28. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/.sim_tools/embs.npy +0 -0
  29. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/__init__.py +0 -0
  30. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/README.md +0 -0
  31. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/__init__.py +0 -0
  32. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/agent.py +0 -0
  33. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/types.py +0 -0
  34. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/vision_agent_coder.py +0 -0
  35. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/vision_agent_coder_prompts_v2.py +0 -0
  36. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/agent/vision_agent_planner.py +0 -0
  37. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/clients/__init__.py +0 -0
  38. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/clients/http.py +0 -0
  39. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/clients/landing_public_api.py +0 -0
  40. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/fonts/__init__.py +0 -0
  41. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/fonts/default_font_ch_en.ttf +0 -0
  42. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/lmm/types.py +0 -0
  43. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/tools/__init__.py +0 -0
  44. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/tools/meta_tools.py +0 -0
  45. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/tools/prompts.py +0 -0
  46. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/tools/tool_utils.py +0 -0
  47. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/tools/tools_types.py +0 -0
  48. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/utils/__init__.py +0 -0
  49. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/utils/exceptions.py +0 -0
  50. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/utils/execute.py +0 -0
  51. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/utils/image_utils.py +0 -0
  52. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/utils/sim.py +0 -0
  53. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/utils/type_defs.py +0 -0
  54. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/utils/video.py +0 -0
  55. {vision_agent-0.2.229 → vision_agent-0.2.230}/vision_agent/utils/video_tracking.py +0 -0
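The most visible addition in 0.2.230 is the new `vision_agent/configs` package (items 15-19 above), which the agents now use to build their default models instead of hard-coding `AnthropicLMM` (see the `vision_agent_coder_v2.py` hunks below). The following is only a rough usage sketch inferred from the `Config` calls visible in that diff, not documented API:

```python
# Sketch based solely on the Config usage shown in the vision_agent_coder_v2.py
# hunk below (CONFIG = Config(); CONFIG.create_coder(), etc.); treat the factory
# names as assumptions, not documented API.
from vision_agent.configs import Config

config = Config()

# VisionAgentCoderV2 falls back to these factories when no explicit
# coder / tester / debugger LMM is passed to its constructor.
coder = config.create_coder()
tester = config.create_tester()
debugger = config.create_debugger()
```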
@@ -0,0 +1,156 @@
+ Metadata-Version: 2.1
+ Name: vision-agent
+ Version: 0.2.230
+ Summary: Toolset for Vision Agent
+ Author: Landing AI
+ Author-email: dev@landing.ai
+ Requires-Python: >=3.9,<4.0
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Requires-Dist: anthropic (>=0.31.0,<0.32.0)
+ Requires-Dist: av (>=11.0.0,<12.0.0)
+ Requires-Dist: e2b (>=0.17.2a50,<0.18.0)
+ Requires-Dist: e2b-code-interpreter (==0.0.11a37)
+ Requires-Dist: flake8 (>=7.0.0,<8.0.0)
+ Requires-Dist: ipykernel (>=6.29.4,<7.0.0)
+ Requires-Dist: langsmith (>=0.1.58,<0.2.0)
+ Requires-Dist: libcst (>=1.5.0,<2.0.0)
+ Requires-Dist: matplotlib (>=3.9.2,<4.0.0)
+ Requires-Dist: nbclient (>=0.10.0,<0.11.0)
+ Requires-Dist: nbformat (>=5.10.4,<6.0.0)
+ Requires-Dist: numpy (>=1.21.0,<2.0.0)
+ Requires-Dist: openai (>=1.0.0,<2.0.0)
+ Requires-Dist: opencv-python (>=4.0.0,<5.0.0)
+ Requires-Dist: opentelemetry-api (>=1.29.0,<2.0.0)
+ Requires-Dist: pandas (>=2.0.0,<3.0.0)
+ Requires-Dist: pillow (>=10.0.0,<11.0.0)
+ Requires-Dist: pillow-heif (>=0.16.0,<0.17.0)
+ Requires-Dist: pydantic (==2.7.4)
+ Requires-Dist: pydantic-settings (>=2.2.1,<3.0.0)
+ Requires-Dist: pytube (==15.0.0)
+ Requires-Dist: requests (>=2.0.0,<3.0.0)
+ Requires-Dist: rich (>=13.7.1,<14.0.0)
+ Requires-Dist: scikit-learn (>=1.5.2,<2.0.0)
+ Requires-Dist: scipy (>=1.13.0,<1.14.0)
+ Requires-Dist: tabulate (>=0.9.0,<0.10.0)
+ Requires-Dist: tenacity (>=8.3.0,<9.0.0)
+ Requires-Dist: tqdm (>=4.64.0,<5.0.0)
+ Requires-Dist: typing_extensions (>=4.0.0,<5.0.0)
+ Project-URL: Homepage, https://landing.ai
+ Project-URL: documentation, https://github.com/landing-ai/vision-agent
+ Project-URL: repository, https://github.com/landing-ai/vision-agent
+ Description-Content-Type: text/markdown
+
+ <div align="center">
+ <picture>
+ <source media="(prefers-color-scheme: dark)" srcset="https://github.com/landing-ai/vision-agent/blob/main/assets/logo_light.svg?raw=true">
+ <source media="(prefers-color-scheme: light)" srcset="https://github.com/landing-ai/vision-agent/blob/main/assets/logo_dark.svg?raw=true">
+ <img alt="VisionAgent" height="200px" src="https://github.com/landing-ai/vision-agent/blob/main/assets/logo_light.svg?raw=true">
+ </picture>
+
+ [![](https://dcbadge.vercel.app/api/server/wPdN8RCYew?compact=true&style=flat)](https://discord.gg/wPdN8RCYew)
+ ![ci_status](https://github.com/landing-ai/vision-agent/actions/workflows/ci_cd.yml/badge.svg)
+ [![PyPI version](https://badge.fury.io/py/vision-agent.svg)](https://badge.fury.io/py/vision-agent)
+ ![version](https://img.shields.io/pypi/pyversions/vision-agent)
+ </div>
+
+ ## VisionAgent
+ VisionAgent is a library that helps you utilize agent frameworks to generate code to
+ solve your vision task. Check out our discord for updates and roadmaps! The fastest
+ way to test out VisionAgent is to use our web application which you can find [here](https://va.landing.ai/).
+
+ ## Installation
+ ```bash
+ pip install vision-agent
+ ```
+
+ ```bash
+ export ANTHROPIC_API_KEY="your-api-key"
+ export OPENAI_API_KEY="your-api-key"
+ ```
+
+ ---
+ **NOTE**
+ We found using both Anthropic Claude-3.5 and OpenAI o1 to be provide the best performance
+ for VisionAgent. If you want to use a different LLM provider or only one, see
+ 'Using Other LLM Providers' below.
+ ---
+
+ ## Documentation
+
+ [VisionAgent Library Docs](https://landing-ai.github.io/vision-agent/)
+
+ ## Examples
+ ### Counting cans in an image
+ You can run VisionAgent in a local Jupyter Notebook [Counting cans in an image](https://github.com/landing-ai/vision-agent/blob/main/examples/notebooks/counting_cans.ipynb)
+
+ ### Generating code
+ You can use VisionAgent to generate code to count the number of people in an image:
+ ```python
+ from vision_agent.agent import VisionAgentCoderV2
+ from vision_agent.agent.types import AgentMessage
+
+ agent = VisionAgentCoderV2(verbose=True)
+ code_context = agent.generate_code(
+     [
+         AgentMessage(
+             role="user",
+             content="Count the number of people in this image",
+             media=["people.png"]
+         )
+     ]
+ )
+
+ with open("generated_code.py", "w") as f:
+     f.write(code_context.code + "\n" + code_context.test)
+ ```
+
+ ### Using the tools directly
+ VisionAgent produces code that utilizes our tools. You can also use the tools directly.
+ For example if you wanted to detect people in an image and visualize the results:
+ ```python
+ import vision_agent.tools as T
+ import matplotlib.pyplot as plt
+
+ image = T.load_image("people.png")
+ dets = T.countgd_object_detection("person", image)
+ # visualize the countgd bounding boxes on the image
+ viz = T.overlay_bounding_boxes(image, dets)
+
+ # save the visualization to a file
+ T.save_image(viz, "people_detected.png")
+
+ # display the visualization
+ plt.imshow(viz)
+ plt.show()
+ ```
+
+ You can also use the tools for running on video files:
+ ```python
+ import vision_agent.tools as T
+
+ frames_and_ts = T.extract_frames_and_timestamps("people.mp4")
+ # extract the frames from the frames_and_ts list
+ frames = [f["frame"] for f in frames_and_ts]
+
+ # run the countgd tracking on the frames
+ tracks = T.countgd_sam2_video_tracking("person", frames)
+ # visualize the countgd tracking results on the frames and save the video
+ viz = T.overlay_segmentation_masks(frames, tracks)
+ T.save_video(viz, "people_detected.mp4")
+ ```
+
+ ## Using Other LLM Providers
+ You can use other LLM providers by changing `config.py` in the `vision_agent/configs`
+ directory. For example to change to Anthropic simply just run:
+ ```bash
+ cp vision_agent/configs/anthropic_config.py vision_agent/configs/config.py
+ ```
+
+ **NOTE**
+ VisionAgent moves fast and we are constantly updating and changing the library. If you
+ have any questions or need help, please reach out to us on our discord channel.
+ ---
+
@@ -0,0 +1,110 @@
+ <div align="center">
+ <picture>
+ <source media="(prefers-color-scheme: dark)" srcset="https://github.com/landing-ai/vision-agent/blob/main/assets/logo_light.svg?raw=true">
+ <source media="(prefers-color-scheme: light)" srcset="https://github.com/landing-ai/vision-agent/blob/main/assets/logo_dark.svg?raw=true">
+ <img alt="VisionAgent" height="200px" src="https://github.com/landing-ai/vision-agent/blob/main/assets/logo_light.svg?raw=true">
+ </picture>
+
+ [![](https://dcbadge.vercel.app/api/server/wPdN8RCYew?compact=true&style=flat)](https://discord.gg/wPdN8RCYew)
+ ![ci_status](https://github.com/landing-ai/vision-agent/actions/workflows/ci_cd.yml/badge.svg)
+ [![PyPI version](https://badge.fury.io/py/vision-agent.svg)](https://badge.fury.io/py/vision-agent)
+ ![version](https://img.shields.io/pypi/pyversions/vision-agent)
+ </div>
+
+ ## VisionAgent
+ VisionAgent is a library that helps you utilize agent frameworks to generate code to
+ solve your vision task. Check out our discord for updates and roadmaps! The fastest
+ way to test out VisionAgent is to use our web application which you can find [here](https://va.landing.ai/).
+
+ ## Installation
+ ```bash
+ pip install vision-agent
+ ```
+
+ ```bash
+ export ANTHROPIC_API_KEY="your-api-key"
+ export OPENAI_API_KEY="your-api-key"
+ ```
+
+ ---
+ **NOTE**
+ We found using both Anthropic Claude-3.5 and OpenAI o1 to be provide the best performance
+ for VisionAgent. If you want to use a different LLM provider or only one, see
+ 'Using Other LLM Providers' below.
+ ---
+
+ ## Documentation
+
+ [VisionAgent Library Docs](https://landing-ai.github.io/vision-agent/)
+
+ ## Examples
+ ### Counting cans in an image
+ You can run VisionAgent in a local Jupyter Notebook [Counting cans in an image](https://github.com/landing-ai/vision-agent/blob/main/examples/notebooks/counting_cans.ipynb)
+
+ ### Generating code
+ You can use VisionAgent to generate code to count the number of people in an image:
+ ```python
+ from vision_agent.agent import VisionAgentCoderV2
+ from vision_agent.agent.types import AgentMessage
+
+ agent = VisionAgentCoderV2(verbose=True)
+ code_context = agent.generate_code(
+     [
+         AgentMessage(
+             role="user",
+             content="Count the number of people in this image",
+             media=["people.png"]
+         )
+     ]
+ )
+
+ with open("generated_code.py", "w") as f:
+     f.write(code_context.code + "\n" + code_context.test)
+ ```
+
+ ### Using the tools directly
+ VisionAgent produces code that utilizes our tools. You can also use the tools directly.
+ For example if you wanted to detect people in an image and visualize the results:
+ ```python
+ import vision_agent.tools as T
+ import matplotlib.pyplot as plt
+
+ image = T.load_image("people.png")
+ dets = T.countgd_object_detection("person", image)
+ # visualize the countgd bounding boxes on the image
+ viz = T.overlay_bounding_boxes(image, dets)
+
+ # save the visualization to a file
+ T.save_image(viz, "people_detected.png")
+
+ # display the visualization
+ plt.imshow(viz)
+ plt.show()
+ ```
+
+ You can also use the tools for running on video files:
+ ```python
+ import vision_agent.tools as T
+
+ frames_and_ts = T.extract_frames_and_timestamps("people.mp4")
+ # extract the frames from the frames_and_ts list
+ frames = [f["frame"] for f in frames_and_ts]
+
+ # run the countgd tracking on the frames
+ tracks = T.countgd_sam2_video_tracking("person", frames)
+ # visualize the countgd tracking results on the frames and save the video
+ viz = T.overlay_segmentation_masks(frames, tracks)
+ T.save_video(viz, "people_detected.mp4")
+ ```
+
+ ## Using Other LLM Providers
+ You can use other LLM providers by changing `config.py` in the `vision_agent/configs`
+ directory. For example to change to Anthropic simply just run:
+ ```bash
+ cp vision_agent/configs/anthropic_config.py vision_agent/configs/config.py
+ ```
+
+ **NOTE**
+ VisionAgent moves fast and we are constantly updating and changing the library. If you
+ have any questions or need help, please reach out to us on our discord channel.
+ ---
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"
 
  [tool.poetry]
  name = "vision-agent"
- version = "0.2.229"
+ version = "0.2.230"
  description = "Toolset for Vision Agent"
  authors = ["Landing AI <dev@landing.ai>"]
  readme = "README.md"
@@ -244,7 +244,8 @@ desc,doc,name
  1.0.
 
  Parameters:
- prompt (str): The prompt to ground to the image.
+ prompt (str): The prompt to ground to the image. Use exclusive categories that
+     do not overlap such as 'person, car' and NOT 'person, athlete'.
  image (np.ndarray): The image to ground the prompt to.
  fine_tune_id (Optional[str]): If you have a fine-tuned model, you can pass the
      fine-tuned model ID here to use it.
@@ -281,7 +282,8 @@ desc,doc,name
  is useful for tracking and counting without duplicating counts.
 
  Parameters:
- prompt (str): The prompt to ground to the video.
+ prompt (str): The prompt to ground to the image. Use exclusive categories that
+     do not overlap such as 'person, car' and NOT 'person, athlete'.
  frames (List[np.ndarray]): The list of frames to ground the prompt to.
  chunk_length (Optional[int]): The number of frames to re-run florence2 to find
      new objects.
@@ -317,14 +319,14 @@ desc,doc,name
  ]
  ",florence2_sam2_video_tracking
  "'florence2_object_detection' is a tool that can detect multiple objects given a text prompt which can be object names or caption. You can optionally separate the object names in the text with commas. It returns a list of bounding boxes with normalized coordinates, label names and associated confidence scores of 1.0.","florence2_object_detection(prompt: str, image: numpy.ndarray, fine_tune_id: Optional[str] = None) -> List[Dict[str, Any]]:
- 'florence2_object_detection' is a tool that can detect multiple
- objects given a text prompt which can be object names or caption. You
- can optionally separate the object names in the text with commas. It returns a list
- of bounding boxes with normalized coordinates, label names and associated
- confidence scores of 1.0.
+ 'florence2_object_detection' is a tool that can detect multiple objects given a
+ text prompt which can be object names or caption. You can optionally separate the
+ object names in the text with commas. It returns a list of bounding boxes with
+ normalized coordinates, label names and associated confidence scores of 1.0.
 
  Parameters:
- prompt (str): The prompt to ground to the image.
+ prompt (str): The prompt to ground to the image. Use exclusive categories that
+     do not overlap such as 'person, car' and NOT 'person, athlete'.
  image (np.ndarray): The image to used to detect objects
  fine_tune_id (Optional[str]): If you have a fine-tuned model, you can pass the
      fine-tuned model ID here to use it.
@@ -157,10 +157,11 @@ def format_conversation(chat: List[AgentMessage]) -> str:
      chat = copy.deepcopy(chat)
      prompt = ""
      for chat_i in chat:
-         if chat_i.role == "user":
-             prompt += f"USER: {chat_i.content}\n\n"
-         elif chat_i.role == "observation" or chat_i.role == "coder":
-             prompt += f"OBSERVATION: {chat_i.content}\n\n"
+         if chat_i.role == "user" or chat_i.role == "coder":
+             if "<final_code>" in chat_i.role:
+                 prompt += f"OBSERVATION: {chat_i.content}\n\n"
+             elif chat_i.role == "user":
+                 prompt += f"USER: {chat_i.content}\n\n"
          elif chat_i.role == "conversation":
              prompt += f"AGENT: {chat_i.content}\n\n"
      return prompt
@@ -332,26 +333,26 @@ def strip_function_calls( # noqa: C901
      def __init__(self, exclusions: List[str]):
          # Store exclusions to skip removing certain function calls
          self.exclusions = exclusions
-         self.in_function_or_class = False
+         self.in_function_or_class: List[bool] = []
 
      def visit_FunctionDef(self, node: cst.FunctionDef) -> Optional[bool]:
-         self.in_function_or_class = True
+         self.in_function_or_class.append(True)
          return True
 
      def leave_FunctionDef(
          self, original_node: cst.FunctionDef, updated_node: cst.FunctionDef
      ) -> cst.BaseStatement:
-         self.in_function_or_class = False
+         self.in_function_or_class.pop()
          return updated_node
 
      def visit_ClassDef(self, node: cst.ClassDef) -> Optional[bool]:
-         self.in_function_or_class = True
+         self.in_function_or_class.append(True)
          return True
 
      def leave_ClassDef(
          self, node: cst.ClassDef, updated_node: cst.ClassDef
      ) -> cst.BaseStatement:
-         self.in_function_or_class = False
+         self.in_function_or_class.pop()
          return updated_node
 
      def leave_Expr(
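The `strip_function_calls` hunk above replaces a single boolean with a list used as a stack: with a plain flag, leaving a nested `def` or `class` would mark the visitor as being outside any definition even though the enclosing one is still open. A minimal standalone sketch of the same pattern with libcst (illustrative only, not code from the package):

```python
import libcst as cst
from typing import List


class ScopeTracker(cst.CSTVisitor):
    """Toy visitor using a stack so nesting depth survives inner defs."""

    def __init__(self) -> None:
        self.in_function: List[bool] = []

    def visit_FunctionDef(self, node: cst.FunctionDef) -> bool:
        self.in_function.append(True)
        return True

    def leave_FunctionDef(self, original_node: cst.FunctionDef) -> None:
        # Only empty once the outermost def has been left.
        self.in_function.pop()


module = cst.parse_module(
    "def outer():\n    def inner():\n        pass\n    x = 1\n"
)
module.visit(ScopeTracker())
```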
@@ -291,10 +291,9 @@ class VisionAgent(Agent):
              verbosity (int): The verbosity level of the agent.
              callback_message (Optional[Callable[[Dict[str, Any]], None]]): Callback
                  function to send intermediate update messages.
-             code_interpreter (Optional[Union[str, CodeInterpreter]]): For string values
-                 it can be one of: None, "local" or "e2b". If None, it will read from
-                 the environment variable "CODE_SANDBOX_RUNTIME". If a CodeInterpreter
-                 object is provided it will use that.
+             code_sandbox_runtime (Optional[str]): For string values it can be one of:
+                 None, "local" or "e2b". If None, it will read from the environment
+                 variable "CODE_SANDBOX_RUNTIME".
          """
 
          self.agent = AnthropicLMM(temperature=0.0) if agent is None else agent
@@ -44,22 +44,22 @@ Can you write a program to check if each person is wearing a helmet? First detec
 
  ## Subtasks
 
- This plan uses the owl_v2_image tool to detect both people and helmets in a single pass, which should be efficient and accurate. We can then compare the detections to determine if each person is wearing a helmet.
- -Use owl_v2_image with prompt 'person, helmet' to detect both people and helmets in the image
+ This plan uses the owlv2_object_detection tool to detect both people and helmets in a single pass, which should be efficient and accurate. We can then compare the detections to determine if each person is wearing a helmet.
+ -Use owlv2_object_detection with prompt 'person, helmet' to detect both people and helmets in the image
  -Process the detections to match helmets with people based on bounding box proximity
  -Count people with and without helmets based on the matching results
  -Return a dictionary with the counts
 
 
  **Tool Tests and Outputs**:
- After examining the image, I can see 4 workers in total, with 3 wearing yellow safety helmets and 1 not wearing a helmet. Plan 1 using owl_v2_image seems to be the most accurate in detecting both people and helmets. However, it needs some modifications to improve accuracy. We should increase the confidence threshold to 0.15 to filter out the lowest confidence box, and implement logic to associate helmets with people based on their bounding box positions. Plan 2 and Plan 3 seem less reliable given the tool outputs, as they either failed to distinguish between people with and without helmets or misclassified all workers as not wearing helmets.
+ After examining the image, I can see 4 workers in total, with 3 wearing yellow safety helmets and 1 not wearing a helmet. Plan 1 using owlv2_object_detection seems to be the most accurate in detecting both people and helmets. However, it needs some modifications to improve accuracy. We should increase the confidence threshold to 0.15 to filter out the lowest confidence box, and implement logic to associate helmets with people based on their bounding box positions. Plan 2 and Plan 3 seem less reliable given the tool outputs, as they either failed to distinguish between people with and without helmets or misclassified all workers as not wearing helmets.
 
  **Tool Output Thoughts**:
  ```python
  ...
  ```
  ----- stdout -----
- Plan 1 - owl_v2_image:
+ Plan 1 - owlv2_object_detection:
 
  [{{'label': 'helmet', 'score': 0.15, 'bbox': [0.85, 0.41, 0.87, 0.45]}}, {{'label': 'helmet', 'score': 0.3, 'bbox': [0.8, 0.43, 0.81, 0.46]}}, {{'label': 'helmet', 'score': 0.31, 'bbox': [0.85, 0.45, 0.86, 0.46]}}, {{'label': 'person', 'score': 0.31, 'bbox': [0.84, 0.45, 0.88, 0.58]}}, {{'label': 'person', 'score': 0.31, 'bbox': [0.78, 0.43, 0.82, 0.57]}}, {{'label': 'helmet', 'score': 0.33, 'bbox': [0.3, 0.65, 0.32, 0.67]}}, {{'label': 'person', 'score': 0.29, 'bbox': [0.28, 0.65, 0.36, 0.84]}}, {{'label': 'helmet', 'score': 0.29, 'bbox': [0.13, 0.82, 0.15, 0.85]}}, {{'label': 'person', 'score': 0.3, 'bbox': [0.1, 0.82, 0.24, 1.0]}}]
 
@@ -67,12 +67,12 @@ Plan 1 - owl_v2_image:
 
  **Input Code Snippet**:
  ```python
- from vision_agent.tools import load_image, owl_v2_image
+ from vision_agent.tools import load_image, owlv2_object_detection
 
  def check_helmets(image_path):
      image = load_image(image_path)
      # Detect people and helmets, filter out the lowest confidence helmet score of 0.15
-     detections = owl_v2_image("person, helmet", image, box_threshold=0.15)
+     detections = owlv2_object_detection("person, helmet", image, box_threshold=0.15)
      height, width = image.shape[:2]
 
      # Separate people and helmets
@@ -26,7 +26,8 @@ from vision_agent.agent.types import (
  )
  from vision_agent.agent.vision_agent_coder_prompts_v2 import CODE, FIX_BUG, TEST
  from vision_agent.agent.vision_agent_planner_v2 import VisionAgentPlannerV2
- from vision_agent.lmm import LMM, AnthropicLMM
+ from vision_agent.configs import Config
+ from vision_agent.lmm import LMM
  from vision_agent.lmm.types import Message
  from vision_agent.tools.meta_tools import get_diff
  from vision_agent.utils.execute import (
@@ -36,6 +37,7 @@ from vision_agent.utils.execute import (
  )
  from vision_agent.utils.sim import Sim, get_tool_recommender
 
+ CONFIG = Config()
  _CONSOLE = Console()
 
 
@@ -185,23 +187,17 @@ def debug_code(
      return code, test, debug_info
 
 
- def write_and_test_code(
-     coder: LMM,
+ def test_code(
      tester: LMM,
      debugger: LMM,
      chat: List[AgentMessage],
      plan: str,
+     code: str,
      tool_docs: str,
      code_interpreter: CodeInterpreter,
      media_list: List[Union[str, Path]],
      verbose: bool,
  ) -> CodeContext:
-     code = write_code(
-         coder=coder,
-         chat=chat,
-         tool_docs=tool_docs,
-         plan=plan,
-     )
      try:
          code = strip_function_calls(code)
      except Exception:
@@ -257,6 +253,36 @@ def write_and_test_code(
      )
 
 
+ def write_and_test_code(
+     coder: LMM,
+     tester: LMM,
+     debugger: LMM,
+     chat: List[AgentMessage],
+     plan: str,
+     tool_docs: str,
+     code_interpreter: CodeInterpreter,
+     media_list: List[Union[str, Path]],
+     verbose: bool,
+ ) -> CodeContext:
+     code = write_code(
+         coder=coder,
+         chat=chat,
+         tool_docs=tool_docs,
+         plan=plan,
+     )
+     return test_code(
+         tester,
+         debugger,
+         chat,
+         plan,
+         code,
+         tool_docs,
+         code_interpreter,
+         media_list,
+         verbose,
+     )
+
+
  class VisionAgentCoderV2(AgentCoder):
      """VisionAgentCoderV2 is an agent that will write vision code for you."""
 
@@ -300,21 +326,9 @@ class VisionAgentCoderV2(AgentCoder):
              )
          )
 
-         self.coder = (
-             coder
-             if coder is not None
-             else AnthropicLMM(model_name="claude-3-5-sonnet-20241022", temperature=0.0)
-         )
-         self.tester = (
-             tester
-             if tester is not None
-             else AnthropicLMM(model_name="claude-3-5-sonnet-20241022", temperature=0.0)
-         )
-         self.debugger = (
-             debugger
-             if debugger is not None
-             else AnthropicLMM(model_name="claude-3-5-sonnet-20241022", temperature=0.0)
-         )
+         self.coder = coder if coder is not None else CONFIG.create_coder()
+         self.tester = tester if tester is not None else CONFIG.create_tester()
+         self.debugger = debugger if debugger is not None else CONFIG.create_debugger()
          if tool_recommender is not None:
              if isinstance(tool_recommender, str):
                  self.tool_recommender = Sim.load(tool_recommender)
@@ -440,12 +454,13 @@ class VisionAgentCoderV2(AgentCoder):
          ) as code_interpreter:
              int_chat, _, media_list = add_media_to_chat(chat, code_interpreter)
              tool_docs = retrieve_tools(plan_context.instructions, self.tool_recommender)
-             code_context = write_and_test_code(
-                 coder=self.coder,
+
+             code_context = test_code(
                  tester=self.tester,
                  debugger=self.debugger,
                  chat=int_chat,
                  plan=format_plan_v2(plan_context),
+                 code=plan_context.code,
                  tool_docs=tool_docs,
                  code_interpreter=code_interpreter,
                  media_list=media_list,
@@ -55,27 +55,27 @@ This is the documentation for the functions you have access to. You may call any
  --- EXAMPLE1 ---
  plan1:
  - Load the image from the provided file path 'image.jpg'.
- - Use the 'owl_v2_image' tool with the prompt 'person' to detect and count the number of people in the image.
+ - Use the 'owlv2_object_detection' tool with the prompt 'person' to detect and count the number of people in the image.
  plan2:
  - Load the image from the provided file path 'image.jpg'.
- - Use the 'florence2_sam2_image' tool with the prompt 'person' to detect and count the number of people in the image.
+ - Use the 'florence2_sam2_instance_segmentation' tool with the prompt 'person' to detect and count the number of people in the image.
  - Count the number of detected objects labeled as 'person'.
  plan3:
  - Load the image from the provided file path 'image.jpg'.
  - Use the 'countgd_object_detection' tool to count the dominant foreground object, which in this case is people.
 
  ```python
- from vision_agent.tools import load_image, owl_v2_image, florence2_sam2_image, countgd_object_detection
+ from vision_agent.tools import load_image, owlv2_object_detection, florence2_sam2_instance_segmentation, countgd_object_detection
  image = load_image("image.jpg")
- owl_v2_out = owl_v2_image("person", image)
+ owl_v2_out = owlv2_object_detection("person", image)
 
- f2s2_out = florence2_sam2_image("person", image)
+ f2s2_out = florence2_sam2_instance_segmentation("person", image)
  # strip out the masks from the output becuase they don't provide useful information when printed
  f2s2_out = [{{k: v for k, v in o.items() if k != "mask"}} for o in f2s2_out]
 
  cgd_out = countgd_object_detection("person", image)
 
- final_out = {{"owl_v2_image": owl_v2_out, "florence2_sam2_image": f2s2, "countgd_object_detection": cgd_out}}
+ final_out = {{"owlv2_object_detection": owl_v2_out, "florence2_sam2_instance_segmentation": f2s2, "countgd_object_detection": cgd_out}}
  print(final_out)
  --- END EXAMPLE1 ---