vision-agent 1.1.14__py3-none-any.whl → 1.1.15__py3-none-any.whl
This diff compares two publicly released versions of the package as they appear in their public registry. It is provided for informational purposes only.
- vision_agent/.sim_tools/df.csv +39 -68
- vision_agent/.sim_tools/embs.npy +0 -0
- vision_agent/agent/vision_agent_planner_prompts_v2.py +1 -1
- vision_agent/agent/vision_agent_prompts_v2.py +1 -1
- vision_agent/tools/__init__.py +1 -2
- vision_agent/tools/tools.py +43 -234
- {vision_agent-1.1.14.dist-info → vision_agent-1.1.15.dist-info}/METADATA +1 -1
- {vision_agent-1.1.14.dist-info → vision_agent-1.1.15.dist-info}/RECORD +10 -10
- {vision_agent-1.1.14.dist-info → vision_agent-1.1.15.dist-info}/WHEEL +0 -0
- {vision_agent-1.1.14.dist-info → vision_agent-1.1.15.dist-info}/licenses/LICENSE +0 -0
vision_agent/.sim_tools/df.csv
CHANGED
@@ -406,6 +406,29 @@ desc,doc,name
 [
 {'label': 'hello world', 'bbox': [0.1, 0.11, 0.35, 0.4], 'score': 0.99},
 ]",ocr
+"'gemini_image_generation' performs either image inpainting given an image and text prompt, or image generation given a prompt. It can be used to edit parts of an image or the entire image according to the prompt given.","gemini_image_generation(prompt: str, image: Optional[numpy.ndarray] = None) -> numpy.ndarray:
+'gemini_image_generation' performs either image inpainting given an image and text prompt, or image generation given a prompt.
+It can be used to edit parts of an image or the entire image according to the prompt given.
+
+Parameters:
+prompt (str): A detailed text description guiding what should be generated
+in the image. More detailed and specific prompts typically yield
+better results.
+image (np.ndarray, optional): The source image to be inpainted. The image will serve as
+the base context for the inpainting process.
+
+Returns:
+np.ndarray: The generated image(s) as a numpy array in RGB format with values
+ranging from 0 to 255.
+
+-------
+Example:
+>>> # Generate inpainting
+>>> result = gemini_image_generation(
+... prompt=""a modern black leather sofa with white pillows"",
+... image=image,
+... )
+>>> save_image(result, ""inpainted_room.png"")",gemini_image_generation
 'qwen25_vl_images_vqa' is a tool that can answer any questions about arbitrary images including regular images or images of documents or presentations. It can be very useful for document QA or OCR text extraction. It returns text as an answer to the question.,"qwen25_vl_images_vqa(prompt: str, images: List[numpy.ndarray]) -> str:
 'qwen25_vl_images_vqa' is a tool that can answer any questions about arbitrary
 images including regular images or images of documents or presentations. It can be
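The entry above documents the reworked gemini_image_generation tool, whose image argument is now optional. A minimal usage sketch, not part of the diff, assuming save_image is importable from vision_agent.tools as in the docstring example and using a zero-filled placeholder array in place of a real photo:

import numpy as np
from vision_agent.tools import gemini_image_generation, save_image

# Text-only generation: with `image` now Optional, no source image is required.
sofa = gemini_image_generation(prompt="a modern black leather sofa with white pillows")
save_image(sofa, "sofa.png")

# Prompt-guided edit of an existing RGB image (placeholder array shown here).
source = np.zeros((512, 512, 3), dtype=np.uint8)
edited = gemini_image_generation(prompt="add two white pillows to the sofa", image=source)
save_image(edited, "sofa_edited.png")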
@@ -439,27 +462,28 @@ desc,doc,name
 -------
 >>> qwen25_vl_video_vqa('Which football player made the goal?', frames)
 'Lionel Messi'",qwen25_vl_video_vqa
-'
-'
-
-happen in a video and returns a list of 0s and 1s to indicate the activity.
+"'agentic_activity_recognition' is a tool that allows you to detect multiple activities within a video. It can be used to identify when specific activities or actions happen in a video, along with a description of the activity.","agentic_activity_recognition(prompt: str, frames: List[numpy.ndarray], fps: Optional[float] = 5, specificity: str = 'max', with_audio: bool = False) -> List[Dict[str, Any]]:
+'agentic_activity_recognition' is a tool that allows you to detect multiple activities within a video.
+It can be used to identify when specific activities or actions happen in a video, along with a description of the activity.
 
 Parameters:
-prompt (str): The
-
-
-
-
-chunk_length_frames (int): length of each chunk in frames
+prompt (str): The prompt for activity recognition. Multiple activieties can be separated by semi-colon.
+frames (List[np.ndarray]): The list of frames corresponding to the video.
+fps (float, optional): The frame rate per second to extract the frames at. Defaults to 5.
+specificity (str, optional): Specificity or precision level for activity recognition - low, medium, high, max. Default is max.
+with_audio (bool, optional): Whether to include audio processing in activity recognition. Set it to false if there is no audio in the video. Default is false.
 
 Returns:
-List[
-the
+List[Dict[str, Any]]: A list of dictionaries containing the start time, end time, location, description, and label for each detected activity.
+The start and end times are in seconds, the location is a string, the description is a string, and the label is an integer.
 
 Example
 -------
->>>
-[
+>>> agentic_activity_recognition('Person gets on bike; Person gets off bike', frames)
+[
+{'start_time': 2, 'end_time': 4, 'location': 'Outdoor area', 'description': 'A person approaches a white bicycle parked in a row. The person then swings their leg over the bike and gets on it.', 'label': 0},
+{'start_time': 10, 'end_time': 13, 'location': 'Outdoor area', 'description': 'A person gets off a white bicycle parked in a row. The person swings their leg over the bike and dismounts.', 'label': 1},
+]",agentic_activity_recognition
 'depth_anything_v2' is a tool that runs depth anything v2 model to generate a depth image from a given RGB image. The returned depth image is monochrome and represents depth values as pixel intensities with pixel values ranging from 0 to 255.,"depth_anything_v2(image: numpy.ndarray) -> numpy.ndarray:
 'depth_anything_v2' is a tool that runs depth anything v2 model to generate a
 depth image from a given RGB image. The returned depth image is monochrome and
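A short sketch of consuming the event list returned by the new agentic_activity_recognition tool described in the entry above; this is not part of the diff, and the placeholder frames stand in for a real decoded video:

from typing import Any, Dict, List

import numpy as np
from vision_agent.tools import agentic_activity_recognition

# Placeholder clip: 30 black 720p frames, treated as if sampled at the default 5 fps.
frames: List[np.ndarray] = [np.zeros((720, 1280, 3), dtype=np.uint8) for _ in range(30)]

events: List[Dict[str, Any]] = agentic_activity_recognition(
    "Person gets on bike; Person gets off bike",  # activities separated by semi-colons
    frames,
    fps=5,
    specificity="max",
    with_audio=False,  # placeholder frames carry no audio
)

# Per the docstring, each event reports start/end times in seconds and a label
# index into the semicolon-separated prompt.
for event in events:
    duration = event["end_time"] - event["start_time"]
    print(f"label={event['label']} ({duration}s): {event['description']}")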
@@ -514,59 +538,6 @@ desc,doc,name
 -------
 >>> vit_nsfw_classification(image)
 {""label"": ""normal"", ""scores"": 0.68},",vit_nsfw_classification
-"'flux_image_inpainting' performs image inpainting to fill the masked regions, given by mask, in the image, given image based on the text prompt and surrounding image context. It can be used to edit regions of an image according to the prompt given.","flux_image_inpainting(prompt: str, image: numpy.ndarray, mask: numpy.ndarray) -> numpy.ndarray:
-'flux_image_inpainting' performs image inpainting to fill the masked regions,
-given by mask, in the image, given image based on the text prompt and surrounding
-image context. It can be used to edit regions of an image according to the prompt
-given.
-
-Parameters:
-prompt (str): A detailed text description guiding what should be generated
-in the masked area. More detailed and specific prompts typically yield
-better results.
-image (np.ndarray): The source image to be inpainted. The image will serve as
-the base context for the inpainting process.
-mask (np.ndarray): A binary mask image with 0's and 1's, where 1 indicates
-areas to be inpainted and 0 indicates areas to be preserved.
-
-Returns:
-np.ndarray: The generated image(s) as a numpy array in RGB format with values
-ranging from 0 to 255.
-
--------
-Example:
->>> # Generate inpainting
->>> result = flux_image_inpainting(
-... prompt=""a modern black leather sofa with white pillows"",
-... image=image,
-... mask=mask,
-... )
->>> save_image(result, ""inpainted_room.png"")
-",flux_image_inpainting
-"'gemini_image_generation' performs image inpainting given an image and text prompt. It can be used to edit parts of an image or the entire image according to the prompt given.","gemini_image_generation(prompt: str, image: numpy.ndarray) -> numpy.ndarray:
-'gemini_image_generation' performs image inpainting given an image and text prompt.
-It can be used to edit parts of an image or the entire image according to the prompt given.
-
-Parameters:
-prompt (str): A detailed text description guiding what should be generated
-in the image. More detailed and specific prompts typically yield
-better results.
-image (np.ndarray): The source image to be inpainted. The image will serve as
-the base context for the inpainting process.
-
-Returns:
-np.ndarray: The generated image(s) as a numpy array in RGB format with values
-ranging from 0 to 255.
-
--------
-Example:
->>> # Generate inpainting
->>> result = gemini_image_generation(
-... prompt="a modern black leather sofa with white pillows",
-... image=image,
-... )
->>> save_image(result, ""inpainted_room.png"")
-",gemini_image_generation
 'siglip_classification' is a tool that can classify an image or a cropped detection given a list of input labels or tags. It returns the same list of the input labels along with their probability scores based on image content.,"siglip_classification(image: numpy.ndarray, labels: List[str]) -> Dict[str, Any]:
 'siglip_classification' is a tool that can classify an image or a cropped detection given a list
 of input labels or tags. It returns the same list of the input labels along with
@@ -718,4 +689,4 @@ desc,doc,name
 [0, 0, 0, ..., 0, 0, 0],
 [0, 0, 0, ..., 0, 0, 0]], dtype=uint8),
 }],
-)",overlay_segmentation_masks
+)",overlay_segmentation_masks
vision_agent/.sim_tools/embs.npy
CHANGED
Binary file
vision_agent/agent/vision_agent_planner_prompts_v2.py
CHANGED
@@ -519,7 +519,7 @@ You are given a task: "{task}" from the user. You must extract the type of categ
 - "video object tracking" - tracking objects in a video.
 - "depth and pose estimation" - estimating the depth or pose of objects in an image.
 - "activity recognition" - identifying time period(s) an event occurs in a video.
-- "
+- "image generation" - generating images from a text prompt.
 
 Return the category or categories (comma separated) inside tags <category># your categories here</category>. If you are unsure about a task, it is better to include more categories than less.
 """
vision_agent/agent/vision_agent_prompts_v2.py
CHANGED
@@ -55,7 +55,7 @@ AGENT: <response>I am VisionAgent, an agent built by LandingAI, to help users wr
 - Pose estimation
 - Visual question answering for both images and videos
 - Activity recognition in videos
-- Image
+- Image generation
 
 How can I help you?</response>
 --- END EXAMPLE2 ---
vision_agent/tools/__init__.py
CHANGED
@@ -7,7 +7,7 @@ from .meta_tools import (
 from .planner_tools import judge_od_results
 from .prompts import CHOOSE_PARAMS, SYSTEM_PROMPT
 from .tools import (
-
+agentic_activity_recognition,
 agentic_document_extraction,
 agentic_object_detection,
 agentic_sam2_instance_segmentation,
@@ -30,7 +30,6 @@ from .tools import (
 florence2_ocr,
 florence2_sam2_instance_segmentation,
 florence2_sam2_video_tracking,
-flux_image_inpainting,
 gemini_image_generation,
 generate_pose_image,
 get_tools,
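Because flux_image_inpainting is removed from the public exports above, downstream imports need updating. A hedged migration sketch, not taken from the package: the replacement tool has no mask parameter, so the desired edit must be described in the prompt, and the image below is an illustrative placeholder:

import numpy as np
from vision_agent.tools import gemini_image_generation  # was: flux_image_inpainting

image = np.zeros((512, 512, 3), dtype=np.uint8)  # placeholder RGB source image

# 1.1.14: result = flux_image_inpainting(prompt, image, mask)
# 1.1.15: no mask argument; state the region and content to change in the prompt.
result = gemini_image_generation(
    prompt="replace the sofa with a modern black leather sofa with white pillows",
    image=image,
)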
vision_agent/tools/tools.py
CHANGED
@@ -24,7 +24,7 @@ import pymupdf  # type: ignore
 from google import genai  # type: ignore
 from google.genai import types  # type: ignore
 
-from vision_agent.lmm.lmm import
+from vision_agent.lmm.lmm import AnthropicLMM
 from vision_agent.utils.execute import FileSerializer, MimeType
 from vision_agent.utils.image_utils import (
 b64_to_pil,
@@ -2337,140 +2337,55 @@ Answer the question directly using only the information from the document, do no
 return llm_output
 
 
-def
-sample_indices = np.linspace(0, len(frames) - 1, sample_size, dtype=int)
-sampled_frames = []
-
-for i, frame in enumerate(frames):
-if i in sample_indices:
-sampled_frames.append(frame)
-if len(sampled_frames) >= sample_size:
-break
-return sampled_frames
-
-
-def _lmm_activity_recognition(
-lmm: LMM,
-segment: List[np.ndarray],
-prompt: str,
-) -> List[float]:
-frames = _sample(segment, 10)
-media = []
-for frame in frames:
-buffer = io.BytesIO()
-image_pil = Image.fromarray(frame)
-if image_pil.size[0] > 768:
-image_pil.thumbnail((768, 768))
-image_pil.save(buffer, format="PNG")
-image_bytes = buffer.getvalue()
-image_b64 = "data:image/png;base64," + encode_image_bytes(image_bytes)
-media.append(image_b64)
-
-response = cast(str, lmm.generate(prompt, media))
-if "yes" in response.lower():
-return [1.0] * len(segment)
-return [0.0] * len(segment)
-
-
-def _qwenvl_activity_recognition(
-segment: List[np.ndarray], prompt: str, model_name: str = "qwen2vl"
-) -> List[float]:
-payload: Dict[str, Any] = {
-"prompt": prompt,
-"model": model_name,
-"function_name": f"{model_name}_vl_video_vqa",
-}
-segment_buffer_bytes = [("video", frames_to_bytes(segment))]
-response = send_inference_request(
-payload, "image-to-text", files=segment_buffer_bytes, v2=True
-)
-if "yes" in response.lower():
-return [1.0] * len(segment)
-return [0.0] * len(segment)
-
-
-def activity_recognition(
+def agentic_activity_recognition(
 prompt: str,
 frames: List[np.ndarray],
-
-
-
-
-
-happen in a video
+fps: Optional[float] = 5,
+specificity: str = "max",
+with_audio: bool = False,
+) -> List[Dict[str, Any]]:
+"""'agentic_activity_recognition' is a tool that allows you to detect multiple activities within a video.
+It can be used to identify when specific activities or actions happen in a video, along with a description of the activity.
 
 Parameters:
-prompt (str): The
-
-
-
-
-chunk_length_frames (int): length of each chunk in frames
+prompt (str): The prompt for activity recognition. Multiple activieties can be separated by semi-colon.
+frames (List[np.ndarray]): The list of frames corresponding to the video.
+fps (float, optional): The frame rate per second to extract the frames at. Defaults to 5.
+specificity (str, optional): Specificity or precision level for activity recognition - low, medium, high, max. Default is max.
+with_audio (bool, optional): Whether to include audio processing in activity recognition. Set it to false if there is no audio in the video. Default is false.
 
 Returns:
-List[
-the
+List[Dict[str, Any]]: A list of dictionaries containing the start time, end time, location, description, and label for each detected activity.
+The start and end times are in seconds, the location is a string, the description is a string, and the label is an integer.
 
 Example
 -------
->>>
-[
+>>> agentic_activity_recognition('Person gets on bike; Person gets off bike', frames)
+[
+{'start_time': 2, 'end_time': 4, 'location': 'Outdoor area', 'description': 'A person approaches a white bicycle parked in a row. The person then swings their leg over the bike and gets on it.', 'label': 0},
+{'start_time': 10, 'end_time': 13, 'location': 'Outdoor area', 'description': 'A person gets off a white bicycle parked in a row. The person swings their leg over the bike and dismounts.', 'label': 1},
+]
 """
-
-buffer_bytes = frames_to_bytes(frames)
+fps = fps if fps is not None else 5
+buffer_bytes = frames_to_bytes(frames, fps=fps)
 files = [("video", buffer_bytes)]
 
-
-frames, segment_size=chunk_length_frames, overlap=0
-)
+payload = {"prompt": prompt, "specificity": specificity, "with_audio": with_audio}
 
-
-
+response = send_inference_request(
+payload=payload, endpoint_name="activity-recognition", files=files, v2=True
 )
 
-if model == "claude-35":
-
-def _apply_activity_recognition(segment: List[np.ndarray]) -> List[float]:
-return _lmm_activity_recognition(AnthropicLMM(), segment, prompt)
-
-elif model == "gpt-4o":
-
-def _apply_activity_recognition(segment: List[np.ndarray]) -> List[float]:
-return _lmm_activity_recognition(OpenAILMM(), segment, prompt)
-
-elif model == "qwen2vl":
-
-def _apply_activity_recognition(segment: List[np.ndarray]) -> List[float]:
-return _qwenvl_activity_recognition(segment, prompt, model_name="qwen2vl")
-
-elif model == "qwen25vl":
-
-def _apply_activity_recognition(segment: List[np.ndarray]) -> List[float]:
-return _qwenvl_activity_recognition(segment, prompt, model_name="qwen25vl")
-
-else:
-raise ValueError(f"Invalid model: {model}")
-
-with ThreadPoolExecutor() as executor:
-futures = {
-executor.submit(_apply_activity_recognition, segment): segment_index
-for segment_index, segment in enumerate(segments)
-}
-
-return_value_tuples = []
-for future in as_completed(futures):
-segment_index = futures[future]
-return_value_tuples.append((segment_index, future.result()))
-return_values = [x[1] for x in sorted(return_value_tuples, key=lambda x: x[0])]
-return_values_flattened = cast(List[float], [e for o in return_values for e in o])
-
 _display_tool_trace(
-
-{"prompt": prompt, "
-
+agentic_activity_recognition.__name__,
+{"prompt": prompt, "specificity": specificity, "with_audio": with_audio},
+response,
 files,
 )
-
+
+events: List[Dict[str, Any]] = response["events"]
+
+return events
 
 
 def vit_image_classification(image: np.ndarray) -> Dict[str, Any]:
@@ -2751,104 +2666,6 @@ def template_match(
 return return_data
 
 
-def flux_image_inpainting(
-prompt: str,
-image: np.ndarray,
-mask: np.ndarray,
-) -> np.ndarray:
-"""'flux_image_inpainting' performs image inpainting to fill the masked regions,
-given by mask, in the image, given image based on the text prompt and surrounding
-image context. It can be used to edit regions of an image according to the prompt
-given.
-
-Parameters:
-prompt (str): A detailed text description guiding what should be generated
-in the masked area. More detailed and specific prompts typically yield
-better results.
-image (np.ndarray): The source image to be inpainted. The image will serve as
-the base context for the inpainting process.
-mask (np.ndarray): A binary mask image with 0's and 1's, where 1 indicates
-areas to be inpainted and 0 indicates areas to be preserved.
-
-Returns:
-np.ndarray: The generated image(s) as a numpy array in RGB format with values
-ranging from 0 to 255.
-
--------
-Example:
->>> # Generate inpainting
->>> result = flux_image_inpainting(
-... prompt="a modern black leather sofa with white pillows",
-... image=image,
-... mask=mask,
-... )
->>> save_image(result, "inpainted_room.png")
-"""
-
-min_dim = 8
-
-if any(dim < min_dim for dim in image.shape[:2] + mask.shape[:2]):
-raise ValueError(f"Image and mask must be at least {min_dim}x{min_dim} pixels")
-
-max_size = (512, 512)
-
-if image.shape[0] > max_size[0] or image.shape[1] > max_size[1]:
-scaling_factor = min(max_size[0] / image.shape[0], max_size[1] / image.shape[1])
-new_size = (
-int(image.shape[1] * scaling_factor),
-int(image.shape[0] * scaling_factor),
-)
-new_size = ((new_size[0] // 8) * 8, (new_size[1] // 8) * 8)
-image = cv2.resize(image, new_size, interpolation=cv2.INTER_AREA)
-mask = cv2.resize(mask, new_size, interpolation=cv2.INTER_NEAREST)
-
-elif image.shape[0] % 8 != 0 or image.shape[1] % 8 != 0:
-new_size = ((image.shape[1] // 8) * 8, (image.shape[0] // 8) * 8)
-image = cv2.resize(image, new_size, interpolation=cv2.INTER_AREA)
-mask = cv2.resize(mask, new_size, interpolation=cv2.INTER_NEAREST)
-
-if np.array_equal(mask, mask.astype(bool).astype(int)):
-mask = np.where(mask > 0, 255, 0).astype(np.uint8)
-else:
-raise ValueError("Mask should contain only binary values (0 or 1)")
-
-image_file = numpy_to_bytes(image)
-mask_file = numpy_to_bytes(mask)
-
-files = [
-("image", image_file),
-("mask_image", mask_file),
-]
-
-payload = {
-"prompt": prompt,
-"task": "inpainting",
-"height": image.shape[0],
-"width": image.shape[1],
-"strength": 0.99,
-"guidance_scale": 18,
-"num_inference_steps": 20,
-"seed": None,
-}
-
-response = send_inference_request(
-payload=payload,
-endpoint_name="flux1",
-files=files,
-v2=True,
-metadata_payload={"function_name": "flux_image_inpainting"},
-)
-
-output_image = np.array(b64_to_pil(response[0]).convert("RGB"))
-_display_tool_trace(
-flux_image_inpainting.__name__,
-payload,
-output_image,
-files,
-)
-return output_image
-
-
 def gemini_image_generation(
 prompt: str,
 image: Optional[np.ndarray] = None,
@@ -2894,24 +2711,18 @@ def gemini_image_generation(
 ),
 )
 
-if
-not resp.candidates
-or not resp.candidates[0].content
-or not resp.candidates[0].content.parts
-or not resp.candidates[0].content.parts[0].inline_data
-or not resp.candidates[0].content.parts[0].inline_data.data
-):
+if not resp.candidates or not resp.candidates[0].content:
 _LOGGER.warning(f"Attempt {attempt + 1}: No candidates returned")
 time.sleep(5)
 continue
-
-
-
-
-
-)
-
-
+
+for part in resp.candidates[0].content.parts:
+if (
+hasattr(part, "inline_data")
+and part.inline_data
+and isinstance(data := part.inline_data.data, bytes)
+):
+return data
 
 except genai.errors.ClientError as e:
 _LOGGER.warning(f"Attempt {attempt + 1} failed: {str(e)}")
@@ -2932,8 +2743,6 @@ def gemini_image_generation(
 )
 image = cv2.resize(image, new_size, interpolation=cv2.INTER_AREA)
 
-# Convert to RGB
-image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
 image_file = numpy_to_bytes(image)
 files = [("image", image_file)]
 
@@ -3756,13 +3565,13 @@ FUNCTION_TOOLS = [
 agentic_document_extraction,
 document_qa,
 ocr,
+gemini_image_generation,
 qwen25_vl_images_vqa,
 qwen25_vl_video_vqa,
-
+agentic_activity_recognition,
 depth_anything_v2,
 generate_pose_image,
 vit_nsfw_classification,
-flux_image_inpainting,
 siglip_classification,
 minimum_distance,
 ]
{vision_agent-1.1.14.dist-info → vision_agent-1.1.15.dist-info}/RECORD
CHANGED
@@ -1,14 +1,14 @@
 vision_agent/__init__.py,sha256=EAb4-f9iyuEYkBrX4ag1syM8Syx8118_t0R6_C34M9w,57
-vision_agent/.sim_tools/df.csv,sha256=
-vision_agent/.sim_tools/embs.npy,sha256=
+vision_agent/.sim_tools/df.csv,sha256=i732_U1KQf55UNhT-9srtZXF91XvDnfWBDdc8EqDmpw,41215
+vision_agent/.sim_tools/embs.npy,sha256=XCu3LnLS10IS3npfPMqX2VHIbDPq9iY_NPDBwq5AEj0,245888
 vision_agent/agent/README.md,sha256=3XSPG_VO7-6y6P8COvcgSSonWj5uvfgvfmOkBpfKK8Q,5527
 vision_agent/agent/__init__.py,sha256=_-nGLHhRTLViXxBSb9D4OwLTqk9HXKPEkTBkvK8c7OU,206
 vision_agent/agent/agent.py,sha256=o1Zuhl6h2R7uVwvUur0Aj38kak8U08plfeFWPst_ErM,1576
 vision_agent/agent/vision_agent_coder_prompts_v2.py,sha256=53b_DhQtffX5wxLuCbNQ83AJhB0P_3wEnuKr-v5bx-o,4866
 vision_agent/agent/vision_agent_coder_v2.py,sha256=ELc_J8Q4NKPs7YETu3a9O0Vk1zN3k6QfHBgu0M0IWGk,17450
-vision_agent/agent/vision_agent_planner_prompts_v2.py,sha256=
+vision_agent/agent/vision_agent_planner_prompts_v2.py,sha256=O24BpRhMRZx7D_WdaRv-a2K6fLpin0o7oWxlvL70WpM,35944
 vision_agent/agent/vision_agent_planner_v2.py,sha256=Aww_BJhTFKZ5XjYe8FW57z2Gwp2se0vg1t1DKLGRAyQ,22050
-vision_agent/agent/vision_agent_prompts_v2.py,sha256=
+vision_agent/agent/vision_agent_prompts_v2.py,sha256=NG1xnZvZGi4DcqdfqZCkPkS7oka3gr6h42ekUKUKcqY,4231
 vision_agent/agent/vision_agent_v2.py,sha256=iPW6DowH7wCFIA5vb1SdSLfZFWbn_oSC7Xa8uO8KIJI,11675
 vision_agent/clients/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 vision_agent/clients/http.py,sha256=k883i6M_4nl7zwwHSI-yP5sAgQZIDPM1nrKD6YFJ3Xs,2009
@@ -26,11 +26,11 @@ vision_agent/models/lmm_types.py,sha256=v04h-NjbczHOIN8UWa1vvO5-1BDuZ4JQhD2mge1c
 vision_agent/models/tools_types.py,sha256=8hYf2OZhI58gvf65KGaeGkt4EQ56nwLFqIQDPHioOBc,2339
 vision_agent/sim/__init__.py,sha256=Aouz6HEPPTYcLxR5_0fTYCL1OvPKAH1RMWAF90QXAlA,135
 vision_agent/sim/sim.py,sha256=WQY_x9A4VT647qGDBScJ3R8_Iv0aoYLHTgwcQSCXwv4,10059
-vision_agent/tools/__init__.py,sha256=
+vision_agent/tools/__init__.py,sha256=zf8HzjcMSgxKhtrxbqYe9hmvsfuweeDMrOc8eVA8Ya8,2477
 vision_agent/tools/meta_tools.py,sha256=9iJilpGYEiXW0nYPTYAWHa7l23wGN8IM5KbE7mWDOT0,6798
 vision_agent/tools/planner_tools.py,sha256=iQWtTgXdomn0IWrbmvXXM-y8Q_RSEOxyP04HIRLrgWI,19576
 vision_agent/tools/prompts.py,sha256=V1z4YJLXZuUl_iZ5rY0M5hHc_2tmMEUKr0WocXKGt4E,1430
-vision_agent/tools/tools.py,sha256=
+vision_agent/tools/tools.py,sha256=i9GGGu8tvo2M6O5fF4UUBTpn_Ul2KEN9mG3ZlJ95qao,124929
 vision_agent/utils/__init__.py,sha256=mANUs_84VL-3gpZbXryvV2mWU623eWnRlJCSUHtMjuw,122
 vision_agent/utils/agent.py,sha256=2ifTP5QElItnr4YHOJR6L5P1PUzV0GhChTTqVxuVyQg,15153
 vision_agent/utils/exceptions.py,sha256=zis8smCbdEylBVZBTVfEUfAh7Rb7cWV3MSPambu6FsQ,1837
@@ -40,7 +40,7 @@ vision_agent/utils/tools.py,sha256=Days0dETPRQLSDamMKPnXFsc5g5IKX9QJcPPNmSHNdM,8
 vision_agent/utils/tools_doc.py,sha256=PKcXXbJktiuPi9q6Q1zXzFx24Dh229SNgWBDtZ2fQSQ,2730
 vision_agent/utils/video.py,sha256=rjsQ1sKKisaQ6AVjJz0zd_G4g-ovRweS_rs4JEhenoI,5340
 vision_agent/utils/video_tracking.py,sha256=DZLFpNCuzuPJQzbQoVNcp-m4dKxgiKdCNM5QTh_zURE,12245
-vision_agent-1.1.
-vision_agent-1.1.
-vision_agent-1.1.
-vision_agent-1.1.
+vision_agent-1.1.15.dist-info/METADATA,sha256=EkYUNPMuq2WuDoBFVhKMT9H06z7-wzjWjV4EQGeIf8E,12673
+vision_agent-1.1.15.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
+vision_agent-1.1.15.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
+vision_agent-1.1.15.dist-info/RECORD,,
{vision_agent-1.1.14.dist-info → vision_agent-1.1.15.dist-info}/WHEEL
File without changes
{vision_agent-1.1.14.dist-info → vision_agent-1.1.15.dist-info}/licenses/LICENSE
File without changes