PyPI - vision-agent - Versions diffs - 0.2.190__tar.gz → 0.2.192__tar.gz - Mend

vision-agent 0.2.190 (tar.gz) → 0.2.192 (tar.gz)

Files changed (35)

{vision_agent-0.2.190 → vision_agent-0.2.192}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: vision-agent
-Version: 0.2.190
+Version: 0.2.192
 Summary: Toolset for Vision Agent
 Author: Landing AI
 Author-email: dev@landing.ai
@@ -54,11 +54,7 @@ Description-Content-Type: text/markdown
 </div>
 VisionAgent is a library that helps you utilize agent frameworks to generate code to
-solve your vision task. Many current vision problems can easily take hours or days to
-solve, you need to find the right model, figure out how to use it and program it to
-accomplish the task you want. VisionAgent aims to provide an in-seconds experience by
-allowing users to describe their problem in text and have the agent framework generate
-code to solve the task for them. Check out our discord for updates and roadmaps!
+solve your vision task. Check out our discord for updates and roadmaps!
 ## Table of Contents
 - [🚀Quick Start](#quick-start)
@@ -82,19 +78,19 @@ To get started with the python library, you can install it using pip:
 pip install vision-agent
 ```
-Ensure you have an Anthropic key and an OpenAI API key and set in your environment
+Ensure you have both an Anthropic key and an OpenAI API key and set in your environment
 variables (if you are using Azure OpenAI please see the Azure setup section):
 ```bash
-export ANTHROPIC_API_KEY="your-api-key"
-export OPENAI_API_KEY="your-api-key"
+export ANTHROPIC_API_KEY="your-api-key" # needed for VisionAgent and VisionAgentCoder
+export OPENAI_API_KEY="your-api-key" # needed for ToolRecommender
 ```
 ### Basic Usage
 To get started you can just import the `VisionAgent` and start chatting with it:
 ```python
 >>> from vision_agent.agent import VisionAgent
->>> agent = VisionAgent()
+>>> agent = VisionAgent(verbosity=2)
 >>> resp = agent("Hello")
 >>> print(resp)
 [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "{'thoughts': 'The user has greeted me. I will respond with a greeting and ask how I can assist them.', 'response': 'Hello! How can I assist you today?', 'let_user_respond': True}"}]
@@ -103,7 +99,7 @@ To get started you can just import the `VisionAgent` and start chatting with it:
 ```
 The chat messages are similar to `OpenAI`'s format with `role` and `content` keys but
-in addition to those you can add `medai` which is a list of media files that can either
+in addition to those you can add `media` which is a list of media files that can either
 be images or video files.
 ## Documentation
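
The comments added to the `export` lines in this hunk tie each key to the components that need it. A minimal sketch of a pre-flight check built on that mapping follows; the helper `missing_api_keys` and the `REQUIRED_KEYS` table are hypothetical names for illustration and are not part of vision-agent.

```python
import os

# Key-to-component mapping copied from the comments added in this diff.
# The table and the helper below are illustrative, not part of the library.
REQUIRED_KEYS = {
    "ANTHROPIC_API_KEY": "VisionAgent and VisionAgentCoder",
    "OPENAI_API_KEY": "ToolRecommender",
}


def missing_api_keys(env=None):
    """Return {variable: component} for every required key that is unset or empty."""
    env = os.environ if env is None else env
    return {
        name: used_by
        for name, used_by in REQUIRED_KEYS.items()
        if not env.get(name)
    }


if __name__ == "__main__":
    # Report every missing key together with the component that needs it.
    for name, used_by in missing_api_keys().items():
        print(f"{name} is not set (needed for {used_by})")
```

Running the check before constructing an agent surfaces a missing key up front; it does not cover the Azure OpenAI setup that the README handles in a separate section.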

{vision_agent-0.2.190 → vision_agent-0.2.192}/README.md RENAMED

@@ -12,11 +12,7 @@
 </div>
 VisionAgent is a library that helps you utilize agent frameworks to generate code to
-solve your vision task. Many current vision problems can easily take hours or days to
-solve, you need to find the right model, figure out how to use it and program it to
-accomplish the task you want. VisionAgent aims to provide an in-seconds experience by
-allowing users to describe their problem in text and have the agent framework generate
-code to solve the task for them. Check out our discord for updates and roadmaps!
+solve your vision task. Check out our discord for updates and roadmaps!
 ## Table of Contents
 - [🚀Quick Start](#quick-start)
@@ -40,19 +36,19 @@ To get started with the python library, you can install it using pip:
 pip install vision-agent
 ```
-Ensure you have an Anthropic key and an OpenAI API key and set in your environment
+Ensure you have both an Anthropic key and an OpenAI API key and set in your environment
 variables (if you are using Azure OpenAI please see the Azure setup section):
 ```bash
-export ANTHROPIC_API_KEY="your-api-key"
-export OPENAI_API_KEY="your-api-key"
+export ANTHROPIC_API_KEY="your-api-key" # needed for VisionAgent and VisionAgentCoder
+export OPENAI_API_KEY="your-api-key" # needed for ToolRecommender
 ```
 ### Basic Usage
 To get started you can just import the `VisionAgent` and start chatting with it:
 ```python
 >>> from vision_agent.agent import VisionAgent
->>> agent = VisionAgent()
+>>> agent = VisionAgent(verbosity=2)
 >>> resp = agent("Hello")
 >>> print(resp)
 [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "{'thoughts': 'The user has greeted me. I will respond with a greeting and ask how I can assist them.', 'response': 'Hello! How can I assist you today?', 'let_user_respond': True}"}]
@@ -61,7 +57,7 @@ To get started you can just import the `VisionAgent` and start chatting with it:
 ```
 The chat messages are similar to `OpenAI`'s format with `role` and `content` keys but
-in addition to those you can add `medai` which is a list of media files that can either
+in addition to those you can add `media` which is a list of media files that can either
 be images or video files.
 ## Documentation
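
The corrected README sentence describes chat messages as dictionaries with `role` and `content` keys plus an optional `media` list of image or video files. A small sketch of such a message is shown here; the helper `make_message`, the prompt text, and the file name `cars.jpg` are assumptions for illustration only, and the sketch does not call the agent.

```python
from typing import Any, Dict, List, Optional


def make_message(
    role: str, content: str, media: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Build a chat message; the "media" key is included only when files are given."""
    message: Dict[str, Any] = {"role": role, "content": content}
    if media:
        # Copy the list so later changes by the caller do not alter the message.
        message["media"] = list(media)
    return message


# Hypothetical conversation: one user turn that refers to a single image file.
chat = [make_message("user", "How many cars are in this image?", media=["cars.jpg"])]
print(chat)
```

A message without attachments simply omits the `media` key, which keeps it compatible with the plain `role`/`content` shape the README compares it to.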

{vision_agent-0.2.190 → vision_agent-0.2.192}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"
 [tool.poetry]
 name = "vision-agent"
-version = "0.2.190"
+version = "0.2.192"
 description = "Toolset for Vision Agent"
 authors = ["Landing AI <dev@landing.ai>"]
 readme = "README.md"

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/agent/vision_agent_coder.py RENAMED

@@ -527,9 +527,6 @@ class VisionAgentCoder(Agent):
                 [{"role": "user", "content": "describe your task here..."}].
             plan_context (PlanContext): The context of the plan, including the plans,
                 best_plan, plan_thoughts, tool_doc, and tool_output.
-            test_multi_plan (bool): Whether to test multiple plans or just the best plan.
-            custom_tool_names (Optional[List[str]]): A list of custom tool names to use
-                for the planner.
         Returns:
             Dict[str, Any]: A dictionary containing the code output by the

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/agent/vision_agent_planner.py RENAMED

@@ -519,11 +519,7 @@ class OpenAIVisionAgentPlanner(VisionAgentPlanner):
         code_interpreter: Optional[Union[str, CodeInterpreter]] = None,
     ) -> None:
         super().__init__(
-            planner=(
-                OpenAILMM(temperature=0.0, json_mode=True)
-                if planner is None
-                else planner
-            ),
+            planner=(OpenAILMM(temperature=0.0) if planner is None else planner),
             tool_recommender=tool_recommender,
             verbosity=verbosity,
             report_progress_callback=report_progress_callback,
@@ -567,11 +563,7 @@ class AzureVisionAgentPlanner(VisionAgentPlanner):
         code_interpreter: Optional[Union[str, CodeInterpreter]] = None,
     ) -> None:
         super().__init__(
-            planner=(
-                AzureOpenAILMM(temperature=0.0, json_mode=True)
-                if planner is None
-                else planner
-            ),
+            planner=(AzureOpenAILMM(temperature=0.0) if planner is None else planner),
             tool_recommender=(
                 AzureSim(T.TOOLS_DF, sim_key="desc")
                 if tool_recommender is None

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/tools/tools.py RENAMED

@@ -27,10 +27,7 @@ from vision_agent.tools.tool_utils import (
     send_inference_request,
     send_task_inference_request,
 )
-from vision_agent.tools.tools_types import (
-    JobStatus,
-    ODResponseData,
-)
+from vision_agent.tools.tools_types import JobStatus, ODResponseData
 from vision_agent.utils.exceptions import FineTuneModelIsNotReady
 from vision_agent.utils.execute import FileSerializer, MimeType
 from vision_agent.utils.image_utils import (
@@ -641,8 +638,8 @@ def loca_visual_prompt_counting(
     Parameters:
         image (np.ndarray): The image that contains lot of instances of a single object
-        visual_prompt (Dict[str, List[float]]): Bounding box of the object in format
-        [xmin, ymin, xmax, ymax]. Only 1 bounding box can be provided.
+            visual_prompt (Dict[str, List[float]]): Bounding box of the object in
+            format [xmin, ymin, xmax, ymax]. Only 1 bounding box can be provided.
     Returns:
         Dict[str, Any]: A dictionary containing the key 'count' and the count as a
@@ -750,10 +747,10 @@ def countgd_example_based_counting(
     Parameters:
         visual_prompts (List[List[float]]): Bounding boxes of the object in format
-        [xmin, ymin, xmax, ymax]. Upto 3 bounding boxes can be provided.
-        image (np.ndarray): The image that contains multiple instances of the object.
-        box_threshold (float, optional): The threshold for detection. Defaults
-            to 0.23.
+            [xmin, ymin, xmax, ymax]. Upto 3 bounding boxes can be provided. image
+            (np.ndarray): The image that contains multiple instances of the object.
+            box_threshold (float, optional): The threshold for detection. Defaults to
+            0.23.
     Returns:
         List[Dict[str, Any]]: A list of dictionaries containing the score, label, and
@@ -1809,6 +1806,12 @@ def flux_image_inpainting(
     ):
         raise ValueError("The image or mask does not have enough size for inpainting")
+    if image.shape[0] % 8 != 0 or image.shape[1] % 8 != 0:
+        new_height = (image.shape[0] // 8) * 8
+        new_width = (image.shape[1] // 8) * 8
+        image = cv2.resize(image, (new_width, new_height))
+        mask = cv2.resize(mask, (new_width, new_height))
     if np.array_equal(mask, mask.astype(bool).astype(int)):
         mask = np.where(mask > 0, 255, 0).astype(np.uint8)
     else:
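
The lines added to `flux_image_inpainting` round the image height and width down to the nearest multiple of 8 and resize both the image and the mask to that size. The arithmetic can be sketched on its own as follows; the helper names `round_down_to_multiple` and `inpainting_size` are hypothetical and do not appear in the package.

```python
def round_down_to_multiple(value: int, multiple: int = 8) -> int:
    """Largest multiple of `multiple` that is less than or equal to `value`."""
    return (value // multiple) * multiple


def inpainting_size(height: int, width: int, multiple: int = 8):
    """Return (new_width, new_height), the argument order used in the diff's cv2.resize calls."""
    return (
        round_down_to_multiple(width, multiple),
        round_down_to_multiple(height, multiple),
    )


# A 515 x 1030 (height x width) image shrinks to 512 x 1024.
print(inpainting_size(515, 1030))  # (1024, 512)
# Dimensions that are already multiples of 8 are left unchanged.
print(inpainting_size(512, 512))  # (512, 512)
```

Note the swapped order: NumPy shapes are (height, width), while the diff passes `(new_width, new_height)` to `cv2.resize`. A dimension below 8 would round down to 0; the size check that raises `ValueError` just before the added lines presumably guards against that, although its exact condition is not visible in this hunk.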

{vision_agent-0.2.190 → vision_agent-0.2.192}/LICENSE RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/__init__.py RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/agent/__init__.py RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/agent/agent.py RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/agent/agent_utils.py RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/agent/vision_agent.py RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/agent/vision_agent_coder_prompts.py RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/agent/vision_agent_planner_prompts.py RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/agent/vision_agent_prompts.py RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/clients/__init__.py RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/clients/http.py RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/clients/landing_public_api.py RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/fonts/__init__.py RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/fonts/default_font_ch_en.ttf RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/lmm/__init__.py RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/lmm/lmm.py RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/lmm/types.py RENAMED

File without changes

{vision_agent-0.2.190 → vision_agent-0.2.192}/vision_agent/tools/__init__.py RENAMED

@@ -40,6 +40,7 @@ from .tools import (
     florence2_roberta_vqa,
     florence2_sam2_image,
     florence2_sam2_video_tracking,
+    flux_image_inpainting,
     generate_pose_image,
     generate_soft_edge_image,
     get_tool_documentation,
@@ -59,17 +60,16 @@ from .tools import (
     overlay_segmentation_masks,
     owl_v2_image,
     owl_v2_video,
+    qwen2_vl_images_vqa,
+    qwen2_vl_video_vqa,
     save_image,
     save_json,
     save_video,
+    siglip_classification,
     template_match,
+    video_temporal_localization,
     vit_image_classification,
     vit_nsfw_classification,
-    qwen2_vl_images_vqa,
-    qwen2_vl_video_vqa,
-    video_temporal_localization,
-    flux_image_inpainting,
-    siglip_classification,
 )
 __new_tools__ = [