PyPI - vision-agent - Versions diffs - 1.1.6__tar.gz → 1.1.8__tar.gz - Mend

vision-agent 1.1.6__tar.gz → 1.1.8__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (48)

vision_agent-1.1.8/.gitignore ADDED

@@ -0,0 +1,99 @@
+# Prerequisites
+*.d
+# Object files
+*.o
+*.ko
+*.obj
+*.elf
+# Env files
+.env
+# Precompiled Headers
+*.gch
+*.pch
+# Libraries
+*.lib
+*.a
+*.la
+*.lo
+# Shared objects (inc. Windows DLLs)
+*.dll
+*.so
+*.so.*
+*.dylib
+# Executables
+*.exe
+*.out
+*.app
+*.i*86
+*.x86_64
+*.hex
+# Debug files
+*.dSYM/
+*.su
+# Mac files
+.DS_Store
+.DS_STORE
+# Old HG stuff
+.hg
+.hgignore
+.hgtags
+.git
+__pycache__
+.ipynb_checkpoints
+*/__pycache__
+*/.ipynb_checkpoints
+.local
+.jupyter
+.ipython
+*/.terraform
+terraform.*
+.terraform.*
+shinobi-dvr/*
+.vscode/
+# mypy
+.mypy_cache/*
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+# Output from various tools
+examples/output
+tests/output
+docs-build
+site
+# Local or WIP files
+local/
+vision-agent-benchmark/
+vision_agent/tools/suggestion.py
+vision_agent/agent/visual_design_patterns.py

{vision_agent-1.1.6 → vision_agent-1.1.8}/PKG-INFO RENAMED

@@ -1,46 +1,42 @@
-Metadata-Version: 2.3
+Metadata-Version: 2.4
 Name: vision-agent
-Version: 1.1.6
+Version: 1.1.8
 Summary: Toolset for Vision Agent
-Author: Landing AI
-Author-email: dev@landing.ai
-Requires-Python: >=3.9,<4.0
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.9
-Classifier: Programming Language :: Python :: 3.10
-Classifier: Programming Language :: Python :: 3.11
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Programming Language :: Python :: 3.13
-Requires-Dist: anthropic (>=0.31.0,<0.32.0)
-Requires-Dist: av (>=11.0.0,<12.0.0)
-Requires-Dist: dotenv (>=0.9.9,<0.10.0)
-Requires-Dist: flake8 (>=7.0.0,<8.0.0)
-Requires-Dist: ipykernel (>=6.29.4,<7.0.0)
-Requires-Dist: libcst (>=1.5.0,<2.0.0)
-Requires-Dist: matplotlib (>=3.9.2,<4.0.0)
-Requires-Dist: nbclient (>=0.10.0,<0.11.0)
-Requires-Dist: nbformat (>=5.10.4,<6.0.0)
-Requires-Dist: numpy (>=1.21.0,<2.0.0)
-Requires-Dist: openai (==1.*)
-Requires-Dist: opencv-python (==4.*)
-Requires-Dist: opentelemetry-api (>=1.29.0,<2.0.0)
-Requires-Dist: pandas (==2.*)
-Requires-Dist: pillow (==10.*)
-Requires-Dist: pillow-heif (>=0.16.0,<0.17.0)
-Requires-Dist: pydantic (>=2.0.0,<3.0.0)
-Requires-Dist: pymupdf (>=1.23.0,<2.0.0)
-Requires-Dist: pytube (==15.0.0)
-Requires-Dist: requests (==2.*)
-Requires-Dist: rich (>=13.7.1,<14.0.0)
-Requires-Dist: scikit-learn (>=1.5.2,<2.0.0)
-Requires-Dist: scipy (==1.13.*)
-Requires-Dist: tabulate (>=0.9.0,<0.10.0)
-Requires-Dist: tenacity (>=8.3.0,<9.0.0)
-Requires-Dist: tqdm (>=4.64.0,<5.0.0)
-Requires-Dist: typing_extensions (==4.*)
 Project-URL: Homepage, https://landing.ai
-Project-URL: documentation, https://github.com/landing-ai/vision-agent
 Project-URL: repository, https://github.com/landing-ai/vision-agent
+Project-URL: documentation, https://github.com/landing-ai/vision-agent
+Author-email: Landing AI <dev@landing.ai>
+License-File: LICENSE
+Requires-Python: <4.0,>=3.9
+Requires-Dist: anthropic<0.32,>=0.31.0
+Requires-Dist: av<12,>=11.0.0
+Requires-Dist: dotenv<0.10,>=0.9.9
+Requires-Dist: flake8<8,>=7.0.0
+Requires-Dist: google-genai<2,>=1.0.0
+Requires-Dist: httpx==0.27.2
+Requires-Dist: ipykernel<7,>=6.29.4
+Requires-Dist: libcst<2,>=1.5.0
+Requires-Dist: matplotlib<4,>=3.9.2
+Requires-Dist: nbclient<0.11,>=0.10.0
+Requires-Dist: nbformat<6,>=5.10.4
+Requires-Dist: numpy<2.0.0,>=1.21.0
+Requires-Dist: openai==1.55.3
+Requires-Dist: opencv-python==4.*
+Requires-Dist: opentelemetry-api<2,>=1.29.0
+Requires-Dist: pandas==2.*
+Requires-Dist: pillow-heif<0.17,>=0.16.0
+Requires-Dist: pillow==10.*
+Requires-Dist: pydantic<3,>=2.0.0
+Requires-Dist: pymupdf<2,>=1.23.0
+Requires-Dist: pytube==15.0.0
+Requires-Dist: requests==2.*
+Requires-Dist: rich<14,>=13.7.1
+Requires-Dist: scikit-learn<2,>=1.5.2
+Requires-Dist: scipy==1.13.*
+Requires-Dist: tabulate<0.10,>=0.9.0
+Requires-Dist: tenacity<9,>=8.3.0
+Requires-Dist: tqdm<5.0.0,>=4.64.0
+Requires-Dist: typing-extensions==4.*
 Description-Content-Type: text/markdown
 <div align="center">
@@ -81,7 +77,7 @@ The most important step is to [signup](https://va.landing.ai/agent) and obtain y
 ### Other Prerequisites
 - Python version 3.9 or higher
 - [Anthropic API key](#get-an-anthropic-api-key)
-- [Gemini API key](#get-a-gemini-api-key)
+- [Google API key](#get-a-google-api-key)
 ### Why do I need Anthropic and Google API Keys?
 VisionAgent uses models from Anthropic and Google to respond to prompts and generate code.
@@ -90,7 +86,7 @@ When you run the web-based version of VisionAgent, the app uses the LandingAI AP
 When you run VisionAgent programmatically, the app will need to use your API keys to access the Anthropic and Google models. This ensures that any projects you run with VisionAgent aren’t limited by the rate limits in place with the LandingAI accounts, and it also prevents many users from overloading the LandingAI rate limits.
-Anthropic and Gemini each have their own rate limits and paid tiers. Refer to their documentation and pricing to learn more.
+Anthropic and Google each have their own rate limits and paid tiers. Refer to their documentation and pricing to learn more.
 > **_NOTE:_** In VisionAgent v1.0.2 and earlier, VisionAgent was powered by Anthropic Claude-3.5 and OpenAI o1. If using one of these VisionAgent versions, you get an OpenAI API key and set it as an environment variable.
@@ -100,13 +96,14 @@ Anthropic and Gemini each have their own rate limits and paid tiers. Refer to th
 2. In the Anthropic Console, go to the [API Keys](https://console.anthropic.com/settings/keys) page.
 3. Generate an API key.
-### Get a Gemini API Key
+### Get a Google API Key
 1. If you don’t have one yet, create a [Google AI Studio account](https://aistudio.google.com/).
 2. In Google AI Studio, go to the [Get API Key](https://aistudio.google.com/app/apikey) page.
 3. Generate an API key.
 ## Installation
 ```bash
 pip install vision-agent
 ```
@@ -114,8 +111,8 @@ pip install vision-agent
 ## Quickstart: Prompt VisionAgent
 Follow this quickstart to learn how to prompt VisionAgent. After learning the basics, customize your prompt and workflow to meet your needs.
-1. Get your Anthropic, Gemini, and VisionAgent API keys.
-2. [Set the Anthropic, Gemini, and VisionAgent API keys as environment variables](#set-api-keys-as-environment-variables).
+1. Get your Anthropic, Google, and VisionAgent API keys.
+2. [Set the Anthropic, Google, and VisionAgent API keys as environment variables](#set-api-keys-as-environment-variables).
 3. [Install VisionAgent](#installation).
 4. Create a folder called `quickstart`.
 5. Find an image you want to analyze and save it to the `quickstart` folder.
@@ -124,13 +121,13 @@ Follow this quickstart to learn how to prompt VisionAgent. After learning the ba
 8. VisionAgent creates a file called `generated_code.py` and saves the generated code there.
 ### Set API Keys as Environment Variables
-Before running VisionAgent code, you must set the Anthropic, Gemini, and VisionAgent API keys as environment variables. Each operating system offers different ways to do this.
+Before running VisionAgent code, you must set the Anthropic, Google, and VisionAgent API keys as environment variables. Each operating system offers different ways to do this.
 Here is the code for setting the variables:
 ```bash
 export VISION_AGENT_API_KEY="your-api-key"
 export ANTHROPIC_API_KEY="your-api-key"
-export GEMINI_API_KEY="your-api-key"
+export GOOGLE_API_KEY="your-api-key"
 ```
 ### Sample Script: Prompt VisionAgent
 To use VisionAgent to generate code, use the following script as a starting point:
@@ -269,4 +266,3 @@ with this code:
 - [VisionAgent Library Docs](https://landing-ai.github.io/vision-agent/): Learn how to use this library.
 - [VisionAgent Web App Docs](https://support.landing.ai/docs/agentic-ai): Learn how to use the web-based version of VisionAgent.
 - [Video Tutorials](https://www.youtube.com/playlist?list=PLrKGAzovU85fvo22OnVtPl90mxBygIf79): Watch the latest video tutorials to see how VisionAgent is used in a variety of use cases.
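Most of the Requires-Dist churn in this PKG-INFO comes from the switch from a Poetry-built sdist to a Hatchling-built one. The specifiers are reordered, the `(...)` wrappers are dropped, some trailing `.0` components disappear, and `typing_extensions` is written in its normalized form. One rough way to separate this formatting noise from real dependency changes is to normalize both lists and then compare them. The sketch below is a hypothetical helper and is not part of vision-agent. It uses only the standard library and ignores environment markers and extras. It also treats trailing-zero padding as insignificant for the `==`, `!=`, `<`, `<=`, `>` and `>=` clauses it rewrites, which PEP 440's zero-padding rule allows. Wildcards and `~=` are left untouched.

```python
import re
from typing import Dict, FrozenSet, Iterable, List

# Accepts both Requires-Dist styles that appear in this diff:
#   "Requires-Dist: anthropic (>=0.31.0,<0.32.0)"  (1.1.6, Poetry-built)
#   "Requires-Dist: anthropic<0.32,>=0.31.0"       (1.1.8, Hatchling-built)
# Environment markers (";...") and extras ("[...]") are NOT handled.
_REQ = re.compile(r"^Requires-Dist:\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*\(?([^)]*)\)?$")
_CLAUSE = re.compile(r"^(==|!=|<=|>=|<|>)(\d+(?:\.\d+)*)$")


def normalize_name(name: str) -> str:
    """PEP 503 normalization, e.g. typing_extensions -> typing-extensions."""
    return re.sub(r"[-_.]+", "-", name).lower()


def normalize_clause(clause: str) -> str:
    """Drop trailing '.0' release components, e.g. '<0.32.0' -> '<0.32'.

    Wildcards ('==1.*'), '~=' and anything else unusual are returned unchanged,
    because zero padding is significant there.
    """
    match = _CLAUSE.match(clause)
    if not match:
        return clause
    op, version = match.groups()
    parts = version.split(".")
    while len(parts) > 1 and parts[-1] == "0":
        parts.pop()
    return op + ".".join(parts)


def parse_requires(lines: Iterable[str]) -> Dict[str, FrozenSet[str]]:
    """Map normalized project name -> set of normalized specifier clauses."""
    reqs: Dict[str, FrozenSet[str]] = {}
    for line in lines:
        match = _REQ.match(line.strip())
        if match:
            name, spec = match.groups()
            clauses = (c.strip() for c in spec.split(","))
            reqs[normalize_name(name)] = frozenset(normalize_clause(c) for c in clauses if c)
    return reqs


def diff_requires(old_lines: Iterable[str], new_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Report projects that were added, removed, or whose constraints really changed."""
    old, new = parse_requires(old_lines), parse_requires(new_lines)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


if __name__ == "__main__":
    old = ["Requires-Dist: anthropic (>=0.31.0,<0.32.0)", "Requires-Dist: openai (==1.*)",
           "Requires-Dist: typing_extensions (==4.*)"]
    new = ["Requires-Dist: anthropic<0.32,>=0.31.0", "Requires-Dist: openai==1.55.3",
           "Requires-Dist: typing-extensions==4.*", "Requires-Dist: google-genai<2,>=1.0.0",
           "Requires-Dist: httpx==0.27.2"]
    print(diff_requires(old, new))
    # {'added': ['google-genai', 'httpx'], 'removed': [], 'changed': ['openai']}
```

If you run it over the full Requires-Dist lists in this diff, it should report `google-genai` and `httpx` as added and `openai` (`==1.*` → `==1.55.3`) as the only changed constraint. Every other difference turns out to be formatting.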

{vision_agent-1.1.6 → vision_agent-1.1.8}/README.md RENAMED

@@ -36,7 +36,7 @@ The most important step is to [signup](https://va.landing.ai/agent) and obtain y
 ### Other Prerequisites
 - Python version 3.9 or higher
 - [Anthropic API key](#get-an-anthropic-api-key)
-- [Gemini API key](#get-a-gemini-api-key)
+- [Google API key](#get-a-google-api-key)
 ### Why do I need Anthropic and Google API Keys?
 VisionAgent uses models from Anthropic and Google to respond to prompts and generate code.
@@ -45,7 +45,7 @@ When you run the web-based version of VisionAgent, the app uses the LandingAI AP
 When you run VisionAgent programmatically, the app will need to use your API keys to access the Anthropic and Google models. This ensures that any projects you run with VisionAgent aren’t limited by the rate limits in place with the LandingAI accounts, and it also prevents many users from overloading the LandingAI rate limits.
-Anthropic and Gemini each have their own rate limits and paid tiers. Refer to their documentation and pricing to learn more.
+Anthropic and Google each have their own rate limits and paid tiers. Refer to their documentation and pricing to learn more.
 > **_NOTE:_** In VisionAgent v1.0.2 and earlier, VisionAgent was powered by Anthropic Claude-3.5 and OpenAI o1. If using one of these VisionAgent versions, you get an OpenAI API key and set it as an environment variable.
@@ -55,13 +55,14 @@ Anthropic and Gemini each have their own rate limits and paid tiers. Refer to th
 2. In the Anthropic Console, go to the [API Keys](https://console.anthropic.com/settings/keys) page.
 3. Generate an API key.
-### Get a Gemini API Key
+### Get a Google API Key
 1. If you don’t have one yet, create a [Google AI Studio account](https://aistudio.google.com/).
 2. In Google AI Studio, go to the [Get API Key](https://aistudio.google.com/app/apikey) page.
 3. Generate an API key.
 ## Installation
 ```bash
 pip install vision-agent
 ```
@@ -69,8 +70,8 @@ pip install vision-agent
 ## Quickstart: Prompt VisionAgent
 Follow this quickstart to learn how to prompt VisionAgent. After learning the basics, customize your prompt and workflow to meet your needs.
-1. Get your Anthropic, Gemini, and VisionAgent API keys.
-2. [Set the Anthropic, Gemini, and VisionAgent API keys as environment variables](#set-api-keys-as-environment-variables).
+1. Get your Anthropic, Google, and VisionAgent API keys.
+2. [Set the Anthropic, Google, and VisionAgent API keys as environment variables](#set-api-keys-as-environment-variables).
 3. [Install VisionAgent](#installation).
 4. Create a folder called `quickstart`.
 5. Find an image you want to analyze and save it to the `quickstart` folder.
@@ -79,13 +80,13 @@ Follow this quickstart to learn how to prompt VisionAgent. After learning the ba
 8. VisionAgent creates a file called `generated_code.py` and saves the generated code there.
 ### Set API Keys as Environment Variables
-Before running VisionAgent code, you must set the Anthropic, Gemini, and VisionAgent API keys as environment variables. Each operating system offers different ways to do this.
+Before running VisionAgent code, you must set the Anthropic, Google, and VisionAgent API keys as environment variables. Each operating system offers different ways to do this.
 Here is the code for setting the variables:
 ```bash
 export VISION_AGENT_API_KEY="your-api-key"
 export ANTHROPIC_API_KEY="your-api-key"
-export GEMINI_API_KEY="your-api-key"
+export GOOGLE_API_KEY="your-api-key"
 ```
 ### Sample Script: Prompt VisionAgent
 To use VisionAgent to generate code, use the following script as a starting point:
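The README now names the Google key `GOOGLE_API_KEY` instead of `GEMINI_API_KEY`. A small preflight check can catch shells that were configured from the 1.1.6 instructions. The sketch below is a hypothetical helper and is not part of vision-agent. The variable names come from the README's `export` block, and the check makes no assumption about which names the Google client library itself reads.

```python
import os
from typing import List, Mapping, Optional

# Names taken from the 1.1.8 README's "Set API Keys as Environment Variables" block.
REQUIRED_KEYS = ("VISION_AGENT_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")
# Name used by the 1.1.6 README; checked only to flag stale setups.
LEGACY_KEY = "GEMINI_API_KEY"


def check_api_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems with the API-key environment (empty list means OK)."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_KEYS if not env.get(name)]
    if not env.get("GOOGLE_API_KEY") and env.get(LEGACY_KEY):
        problems.append(f"{LEGACY_KEY} is set, but the 1.1.8 README expects GOOGLE_API_KEY")
    return problems


if __name__ == "__main__":
    for problem in check_api_keys():
        print(f"warning: {problem}")
```

Passing an explicit mapping instead of reading `os.environ` keeps the check easy to test.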

vision_agent-1.1.8/pyproject.toml ADDED

@@ -0,0 +1,122 @@
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+[project]
+name = "vision-agent"
+version = "1.1.8"
+description = "Toolset for Vision Agent"
+authors = [{ name = "Landing AI", email = "dev@landing.ai" }]
+requires-python = ">=3.9,<4.0"
+readme = "README.md"
+dependencies = [
+    "numpy>=1.21.0,<2.0.0",
+    "pillow==10.*",
+    "requests==2.*",
+    "tqdm>=4.64.0,<5.0.0",
+    "pandas==2.*",
+    "openai==1.55.3",
+    "httpx==0.27.2",
+    "flake8>=7.0.0,<8",
+    "typing_extensions==4.*",
+    "opencv-python==4.*",
+    "tabulate>=0.9.0,<0.10",
+    "scipy==1.13.*",
+    "nbclient>=0.10.0,<0.11",
+    "nbformat>=5.10.4,<6",
+    "rich>=13.7.1,<14",
+    "ipykernel>=6.29.4,<7",
+    "tenacity>=8.3.0,<9",
+    "pillow-heif>=0.16.0,<0.17",
+    "pytube==15.0.0",
+    "anthropic>=0.31.0,<0.32",
+    "pydantic>=2.0.0,<3",
+    "av>=11.0.0,<12",
+    "libcst>=1.5.0,<2",
+    "matplotlib>=3.9.2,<4",
+    "scikit-learn>=1.5.2,<2",
+    "opentelemetry-api>=1.29.0,<2",
+    "dotenv>=0.9.9,<0.10",
+    "pymupdf>=1.23.0,<2",
+    "google-genai>=1.0.0,<2",
+]
+[project.urls]
+Homepage = "https://landing.ai"
+repository = "https://github.com/landing-ai/vision-agent"
+documentation = "https://github.com/landing-ai/vision-agent"
+[dependency-groups]
+dev = [
+    "autoflake==1.*",
+    "pytest==7.*",
+    "black>=23,<25",
+    "isort==5.*",
+    "responses>=0.23.1,<0.24",
+    "mypy<1.8.0",
+    "types-requests>=2.31.0.0,<3",
+    "types-pillow>=9.5.0.4,<10",
+    "data-science-types>=0.2.23,<0.3",
+    "types-tqdm>=4.65.0.1,<5",
+    "setuptools>=68.0.0,<69",
+    "griffe>=0.45.3,<0.46",
+    "mkdocs>=1.5.3,<2",
+    "mkdocstrings[python]>=0.23.0,<0.24",
+    "mkdocs-material>=9.4.2,<10",
+    "types-tabulate>=0.9.0.20240106,<0.10",
+    "scikit-image<0.23.1",
+    "pre-commit>=3.8.0,<4",
+]
+[tool.hatch.build.targets.wheel]
+include = [
+    "vision_agent",
+    "vision_agent/.sim_tools/*",
+]
+[tool.hatch.build.targets.sdist]
+include = [
+    "vision_agent",
+    "vision_agent/.sim_tools/*",
+]
+[tool.pytest.ini_options]
+log_cli = true
+log_cli_level = "INFO"
+log_cli_format = "%(asctime)s [%(levelname)s] %(message)s (%(filename)s:%(lineno)s)"
+log_cli_date_format = "%Y-%m-%d %H:%M:%S"
+[tool.black]
+exclude = '.vscode|.eggs|venv'
+line-length = 88  # suggested by black official site
+[tool.isort]
+line_length = 88
+profile = "black"
+[tool.mypy]
+plugins = "pydantic.mypy"
+exclude = "tests"
+show_error_context = true
+pretty = true
+check_untyped_defs = true
+disallow_untyped_defs = true
+no_implicit_optional = true
+strict_optional = true
+strict_equality = true
+extra_checks = true
+warn_redundant_casts = true
+warn_unused_configs = true
+warn_unused_ignores = true
+warn_return_any = true
+show_error_codes = true
+[[tool.mypy.overrides]]
+ignore_missing_imports = true
+module = [
+    "cv2.*",
+    "openai.*",
+    "sentence_transformers.*",
+]
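The new `pyproject.toml` uses standard PEP 621 `[project]` metadata with Hatchling as the build backend. It also moves the development tools into a top-level PEP 735 `[dependency-groups]` table. PEP 735 groups are not written into the built distribution's metadata, which is consistent with the PKG-INFO above listing only runtime requirements. The sketch below reads these tables with the standard-library `tomllib`, which requires Python 3.11+. On 3.9 and 3.10 it assumes the `tomli` backport is installed. The embedded TOML is a trimmed excerpt of the file above.

```python
import sys
from typing import Any, Dict

if sys.version_info >= (3, 11):
    import tomllib
else:  # assumption: Python 3.9/3.10 have the third-party "tomli" backport installed
    import tomli as tomllib

# Trimmed excerpt of the 1.1.8 pyproject.toml shown above.
PYPROJECT = """
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "vision-agent"
version = "1.1.8"
requires-python = ">=3.9,<4.0"
dependencies = ["numpy>=1.21.0,<2.0.0", "google-genai>=1.0.0,<2"]

[dependency-groups]
dev = ["pytest==7.*", "mypy<1.8.0"]
"""


def summarize(text: str) -> Dict[str, Any]:
    """Pull the build backend, runtime dependencies and dependency groups out of pyproject text."""
    data = tomllib.loads(text)
    return {
        "backend": data["build-system"]["build-backend"],
        "runtime": data["project"]["dependencies"],
        # PEP 735 groups are a top-level table, not part of [project], so they
        # are not emitted as Requires-Dist in the built metadata.
        "groups": dict(data.get("dependency-groups", {})),
    }


if __name__ == "__main__":
    info = summarize(PYPROJECT)
    print(info["backend"])
    print(sorted(info["groups"]))
```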

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/.sim_tools/df.csv RENAMED

@@ -559,6 +559,30 @@ desc,doc,name
         ... )
         >>> save_image(result, ""inpainted_room.png"")
     ",flux_image_inpainting
+"'gemini_image_generation' performs image inpainting given an image and text prompt. It can be used to edit parts of an image or the entire image according to the prompt given.","gemini_image_generation(prompt: str, image: numpy.ndarray) -> numpy.ndarray:
+'gemini_image_generation' performs image inpainting given an image and text prompt.
+    It can be used to edit parts of an image or the entire image according to the prompt given.
+    Parameters:
+        prompt (str): A detailed text description guiding what should be generated
+            in the image. More detailed and specific prompts typically yield
+            better results.
+        image (np.ndarray): The source image to be inpainted. The image will serve as
+            the base context for the inpainting process.
+    Returns:
+        np.ndarray: The generated image(s) as a numpy array in RGB format with values
+            ranging from 0 to 255.
+    -------
+    Example:
+        >>> # Generate inpainting
+        >>> result = gemini_image_generation(
+        ...     prompt="a modern black leather sofa with white pillows",
+        ...     image=image,
+        ... )
+        >>> save_image(result, ""inpainted_room.png"")
+    ",gemini_image_generation
 'siglip_classification' is a tool that can classify an image or a cropped detection given a list of input labels or tags. It returns the same list of the input labels along with their probability scores based on image content.,"siglip_classification(image: numpy.ndarray, labels: List[str]) -> Dict[str, Any]:
 'siglip_classification' is a tool that can classify an image or a cropped detection given a list
     of input labels or tags. It returns the same list of the input labels along with
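Each row of `df.csv` holds one tool as `desc,doc,name`. The multi-line docstring sits in a quoted field, and a literal double quote inside it is written twice (`""inpainted_room.png""`). The sketch below reads rows in that layout with the standard-library `csv` module. The sample is a shortened excerpt of the row added above, and `load_tool_docs` is a hypothetical helper, not the loader vision-agent actually uses.

```python
import csv
import io
from typing import Dict

# Shortened excerpt in the same desc,doc,name layout as df.csv.
SAMPLE = '''desc,doc,name
"'gemini_image_generation' performs image inpainting given an image and text prompt.","gemini_image_generation(prompt: str, image: numpy.ndarray) -> numpy.ndarray:
    Example:
        >>> save_image(result, ""inpainted_room.png"")
    ",gemini_image_generation
'''


def load_tool_docs(text: str) -> Dict[str, Dict[str, str]]:
    """Index rows by the 'name' column; csv handles embedded newlines and doubled quotes."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return {row["name"]: row for row in reader}


if __name__ == "__main__":
    print(load_tool_docs(SAMPLE)["gemini_image_generation"]["doc"])
```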

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/tools/__init__.py RENAMED

@@ -31,6 +31,7 @@ from .tools import (
     florence2_sam2_instance_segmentation,
     florence2_sam2_video_tracking,
     flux_image_inpainting,
+    gemini_image_generation,
     generate_pose_image,
     get_tools,
     get_tools_descriptions,

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/tools/tools.py RENAMED

@@ -10,6 +10,7 @@ from importlib import resources
 from pathlib import Path
 from typing import IO, Any, Callable, Dict, List, Optional, Tuple, Union, cast
 from warnings import warn
+import time
 import cv2
 import numpy as np
@@ -20,6 +21,8 @@ from PIL import Image, ImageDraw, ImageFont
 from pillow_heif import register_heif_opener  # type: ignore
 from pytube import YouTube  # type: ignore
 import pymupdf  # type: ignore
+from google import genai  # type: ignore
+from google.genai import types  # type: ignore
 from vision_agent.lmm.lmm import LMM, AnthropicLMM, OpenAILMM
 from vision_agent.utils.execute import FileSerializer, MimeType
@@ -2841,6 +2844,147 @@ def flux_image_inpainting(
     return output_image
+def gemini_image_generation(
+    prompt: str,
+    image: Optional[np.ndarray] = None,
+) -> np.ndarray:
+    """'gemini_image_generation' performs either image inpainting given an image and text prompt, or image generation given a prompt.
+    It can be used to edit parts of an image or the entire image according to the prompt given.
+    Parameters:
+        prompt (str): A detailed text description guiding what should be generated
+            in the image. More detailed and specific prompts typically yield
+            better results.
+        image (np.ndarray, optional): The source image to be inpainted. The image will serve as
+            the base context for the inpainting process.
+    Returns:
+        np.ndarray: The generated image(s) as a numpy array in RGB format with values
+            ranging from 0 to 255.
+    -------
+    Example:
+        >>> # Generate inpainting
+        >>> result = gemini_image_generation(
+        ...     prompt="a modern black leather sofa with white pillows",
+        ...     image=image,
+        ... )
+        >>> save_image(result, "inpainted_room.png")
+    """
+    client = genai.Client()
+    files = []
+    image_file = None
+    def try_generate_content(
+        input_prompt: types.Content, num_retries: int = 3
+    ) -> Optional[bytes]:
+        """Try to generate content with multiple attempts."""
+        for attempt in range(num_retries):
+            try:
+                resp = client.models.generate_content(
+                    model="gemini-2.0-flash-exp-image-generation",
+                    contents=input_prompt,
+                    config=types.GenerateContentConfig(
+                        response_modalities=["Text", "Image"]
+                    ),
+                )
+                if (
+                    not resp.candidates
+                    or not resp.candidates[0].content
+                    or not resp.candidates[0].content.parts
+                    or not resp.candidates[0].content.parts[0].inline_data
+                    or not resp.candidates[0].content.parts[0].inline_data.data
+                ):
+                    _LOGGER.warning(f"Attempt {attempt + 1}: No candidates returned")
+                    time.sleep(5)
+                    continue
+                else:
+                    return (
+                        resp.candidates[0].content.parts[0].inline_data.data
+                        if isinstance(
+                            resp.candidates[0].content.parts[0].inline_data.data, bytes
+                        )
+                        else None
+                    )
+            except genai.errors.ClientError as e:
+                _LOGGER.warning(f"Attempt {attempt + 1} failed: {str(e)}")
+                time.sleep(5)
+        return None
+    if image is not None:
+        # Resize if needed
+        max_size = (512, 512)
+        if image.shape[0] > max_size[0] or image.shape[1] > max_size[1]:
+            scaling_factor = min(
+                max_size[0] / image.shape[0], max_size[1] / image.shape[1]
+            )
+            new_size = (
+                int(image.shape[1] * scaling_factor),
+                int(image.shape[0] * scaling_factor),
+            )
+            image = cv2.resize(image, new_size, interpolation=cv2.INTER_AREA)
+        # Convert to RGB
+        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
+        image_file = numpy_to_bytes(image)
+        files = [("image", image_file)]
+        input_prompt = types.Content(
+            parts=[
+                types.Part(
+                    text="I want you to edit this image given this prompt: " + prompt
+                ),
+                types.Part(inline_data={"mime_type": "image/png", "data": image_file}),
+            ]
+        )
+    else:
+        input_prompt = types.Content(parts=[types.Part(text=prompt)])
+    # Try to generate content
+    output_image_bytes = try_generate_content(input_prompt)
+    # Handle fallback if all attempts failed
+    if output_image_bytes is None:
+        if image is not None:
+            _LOGGER.warning("Returning original image after all retries failed.")
+            return image
+        else:
+            try:
+                _LOGGER.warning("All retries failed; prompting for fresh generation.")
+                time.sleep(10)
+                output_image_bytes = try_generate_content(
+                    types.Content(parts=[types.Part(text="Generate an image.")]),
+                    num_retries=1,
+                )
+            except Exception as e:
+                raise ValueError(f"Fallback generation failed: {str(e)}")
+    # Convert bytes to image
+    if output_image_bytes is not None:
+        output_image_temp = io.BytesIO(output_image_bytes)
+        output_image_pil = Image.open(output_image_temp)
+        final_image = np.array(output_image_pil)
+    else:
+        raise ValueError("Fallback generation failed")
+    _display_tool_trace(
+        gemini_image_generation.__name__,
+        {
+            "prompt": prompt,
+            "model": "gemini-2.0-flash-exp-image-generation",
+        },
+        final_image,
+        files,
+    )
+    return final_image
 def siglip_classification(image: np.ndarray, labels: List[str]) -> Dict[str, Any]:
     """'siglip_classification' is a tool that can classify an image or a cropped detection given a list
     of input labels or tags. It returns the same list of the input labels along with
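In the new `gemini_image_generation`, `try_generate_content` inspects only `parts[0]` of the first candidate. The request asks for both `Text` and `Image` modalities, so a response could plausibly place a text part ahead of the image part. In that case this check would treat a successful response as empty and retry. A more tolerant approach is to scan every part for inline data. The sketch below relies only on the attribute names already used in the diff (`candidates`, `content`, `parts`, `inline_data`, `data`). Its example uses `SimpleNamespace` stand-ins rather than real `google.genai` response objects, and it is not the library's own code.

```python
from types import SimpleNamespace
from typing import Any, Optional


def first_image_bytes(resp: Any) -> Optional[bytes]:
    """Return the first non-empty inline-data payload found in any part of any candidate.

    Uses only the attribute path seen in the diff:
    resp.candidates[i].content.parts[j].inline_data.data
    """
    for candidate in getattr(resp, "candidates", None) or []:
        content = getattr(candidate, "content", None)
        for part in getattr(content, "parts", None) or []:
            inline = getattr(part, "inline_data", None)
            data = getattr(inline, "data", None)
            if isinstance(data, bytes) and data:
                return data
    return None  # caller decides whether to retry, as try_generate_content does


if __name__ == "__main__":
    # Stand-in response whose first part is text and whose second part is an image.
    fake = SimpleNamespace(candidates=[SimpleNamespace(content=SimpleNamespace(parts=[
        SimpleNamespace(text="Here is your image", inline_data=None),
        SimpleNamespace(text=None, inline_data=SimpleNamespace(mime_type="image/png", data=b"\x89PNG")),
    ]))])
    print(first_image_bytes(fake))  # b'\x89PNG'
```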

vision_agent-1.1.6/pyproject.toml DELETED

@@ -1,108 +0,0 @@
-[build-system]
-requires = ["poetry-core"]
-build-backend = "poetry.core.masonry.api"
-[tool.poetry]
-name = "vision-agent"
-version = "1.1.6"
-description = "Toolset for Vision Agent"
-authors = ["Landing AI <dev@landing.ai>"]
-readme = "README.md"
-packages = [{include = "vision_agent"}]
-include = [{path = "vision_agent/.sim_tools/*"}]
-[tool.poetry.urls]
-"Homepage" = "https://landing.ai"
-"repository" = "https://github.com/landing-ai/vision-agent"
-"documentation" = "https://github.com/landing-ai/vision-agent"
-[tool.poetry.dependencies]  # main dependency group
-python = ">=3.9,<4.0"
-numpy = ">=1.21.0,<2.0.0"
-pillow = "10.*"
-requests = "2.*"
-tqdm = ">=4.64.0,<5.0.0"
-pandas = "2.*"
-openai = "1.*"
-flake8 = "^7.0.0"
-typing_extensions = "4.*"
-opencv-python = "4.*"
-tabulate = "^0.9.0"
-scipy = "1.13.*"
-nbclient = "^0.10.0"
-nbformat = "^5.10.4"
-rich = "^13.7.1"
-ipykernel = "^6.29.4"
-tenacity = "^8.3.0"
-pillow-heif = "^0.16.0"
-pytube = "15.0.0"
-anthropic = "^0.31.0"
-pydantic = "^2.0.0"
-av = "^11.0.0"
-libcst = "^1.5.0"
-matplotlib = "^3.9.2"
-scikit-learn = "^1.5.2"
-opentelemetry-api = "^1.29.0"
-dotenv = "^0.9.9"
-pymupdf = "^1.23.0"
-[tool.poetry.group.dev.dependencies]
-autoflake = "1.*"
-pytest = "7.*"
-black = ">=23,<25"
-isort = "5.*"
-responses = "^0.23.1"
-mypy = "<1.8.0"
-types-requests = "^2.31.0.0"
-types-pillow = "^9.5.0.4"
-data-science-types = "^0.2.23"
-types-tqdm = "^4.65.0.1"
-setuptools = "^68.0.0"
-griffe = "^0.45.3"
-mkdocs = "^1.5.3"
-mkdocstrings = {extras = ["python"], version = "^0.23.0"}
-mkdocs-material = "^9.4.2"
-types-tabulate = "^0.9.0.20240106"
-scikit-image = "<0.23.1"
-pre-commit = "^3.8.0"
-[tool.pytest.ini_options]
-log_cli = true
-log_cli_level = "INFO"
-log_cli_format = "%(asctime)s [%(levelname)s] %(message)s (%(filename)s:%(lineno)s)"
-log_cli_date_format = "%Y-%m-%d %H:%M:%S"
-[tool.black]
-exclude = '.vscode|.eggs|venv'
-line-length = 88  # suggested by black official site
-[tool.isort]
-line_length = 88
-profile = "black"
-[tool.mypy]
-plugins = "pydantic.mypy"
-exclude = "tests"
-show_error_context = true
-pretty = true
-check_untyped_defs = true
-disallow_untyped_defs = true
-no_implicit_optional = true
-strict_optional = true
-strict_equality = true
-extra_checks = true
-warn_redundant_casts = true
-warn_unused_configs = true
-warn_unused_ignores = true
-warn_return_any = true
-show_error_codes = true
-[[tool.mypy.overrides]]
-ignore_missing_imports = true
-module = [
-    "cv2.*",
-    "openai.*",
-    "sentence_transformers.*",
-]
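The deleted Poetry table expressed most constraints as caret requirements (`^0.31.0`). The new `[project]` list spells them out as explicit PEP 440 ranges (`>=0.31.0,<0.32`). Poetry's documented caret rule allows any update that does not change the leftmost non-zero version component. The sketch below implements that translation for plain numeric release versions only, with no pre-releases or local versions. It is illustrative and not necessarily how the maintainers converted the file.

```python
def caret_to_pep440(spec: str) -> str:
    """Translate a Poetry caret constraint such as '^0.31.0' into a PEP 440 range."""
    if not spec.startswith("^"):
        raise ValueError(f"not a caret constraint: {spec!r}")
    version = spec[1:]
    parts = [int(p) for p in version.split(".")]  # fails on pre-release/local versions
    # Bump the leftmost non-zero component; if every component is zero, bump the last one.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[: idx + 1]
    upper[idx] += 1
    return f">={version},<{'.'.join(map(str, upper))}"


if __name__ == "__main__":
    for old in ("^0.31.0", "^7.0.0", "^0.9.0.20240106", "^0.0.3"):
        print(old, "->", caret_to_pep440(old))
```

For every caret entry in the deleted file, this rule yields the range that appears in the new `pyproject.toml`. For example, `^5.10.4` becomes `>=5.10.4,<6` and `^0.2.23` becomes `>=0.2.23,<0.3`.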

{vision_agent-1.1.6 → vision_agent-1.1.8}/LICENSE RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/.sim_tools/embs.npy RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/__init__.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/agent/README.md RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/agent/__init__.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/agent/agent.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/agent/vision_agent_coder_prompts_v2.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/agent/vision_agent_coder_v2.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/agent/vision_agent_planner_prompts_v2.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/agent/vision_agent_planner_v2.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/agent/vision_agent_prompts_v2.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/agent/vision_agent_v2.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/clients/__init__.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/clients/http.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/configs/__init__.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/configs/anthropic_config.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/configs/config.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/configs/openai_config.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/fonts/__init__.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/fonts/default_font_ch_en.ttf RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/lmm/__init__.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/lmm/lmm.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/models/__init__.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/models/agent_types.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/models/lmm_types.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/models/tools_types.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/sim/__init__.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/sim/sim.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/tools/meta_tools.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/tools/planner_tools.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/tools/prompts.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/utils/__init__.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/utils/agent.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/utils/exceptions.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/utils/execute.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/utils/image_utils.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/utils/tools.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/utils/tools_doc.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/utils/video.py RENAMED

File without changes

{vision_agent-1.1.6 → vision_agent-1.1.8}/vision_agent/utils/video_tracking.py RENAMED

File without changes
