vision-agent 0.2.30__py3-none-any.whl → 0.2.31__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,226 +0,0 @@
- Metadata-Version: 2.1
- Name: vision-agent
- Version: 0.2.30
- Summary: Toolset for Vision Agent
- Author: Landing AI
- Author-email: dev@landing.ai
- Requires-Python: >=3.9,<4.0
- Classifier: Programming Language :: Python :: 3
- Classifier: Programming Language :: Python :: 3.9
- Classifier: Programming Language :: Python :: 3.10
- Classifier: Programming Language :: Python :: 3.11
- Requires-Dist: ipykernel (>=6.29.4,<7.0.0)
- Requires-Dist: langsmith (>=0.1.58,<0.2.0)
- Requires-Dist: moviepy (>=1.0.0,<2.0.0)
- Requires-Dist: nbclient (>=0.10.0,<0.11.0)
- Requires-Dist: nbformat (>=5.10.4,<6.0.0)
- Requires-Dist: numpy (>=1.21.0,<2.0.0)
- Requires-Dist: openai (>=1.0.0,<2.0.0)
- Requires-Dist: opencv-python-headless (>=4.0.0,<5.0.0)
- Requires-Dist: pandas (>=2.0.0,<3.0.0)
- Requires-Dist: pillow (>=10.0.0,<11.0.0)
- Requires-Dist: pydantic-settings (>=2.2.1,<3.0.0)
- Requires-Dist: requests (>=2.0.0,<3.0.0)
- Requires-Dist: rich (>=13.7.1,<14.0.0)
- Requires-Dist: scipy (>=1.13.0,<1.14.0)
- Requires-Dist: tabulate (>=0.9.0,<0.10.0)
- Requires-Dist: tqdm (>=4.64.0,<5.0.0)
- Requires-Dist: typing_extensions (>=4.0.0,<5.0.0)
- Project-URL: Homepage, https://landing.ai
- Project-URL: documentation, https://github.com/landing-ai/vision-agent
- Project-URL: repository, https://github.com/landing-ai/vision-agent
- Description-Content-Type: text/markdown
- 
- <div align="center">
- <img alt="vision_agent" height="200px" src="https://github.com/landing-ai/vision-agent/blob/main/assets/logo.jpg?raw=true">
- 
- # 🔍🤖 Vision Agent
- 
- [![](https://dcbadge.vercel.app/api/server/wPdN8RCYew?compact=true&style=flat)](https://discord.gg/wPdN8RCYew)
- ![ci_status](https://github.com/landing-ai/vision-agent/actions/workflows/ci_cd.yml/badge.svg)
- [![PyPI version](https://badge.fury.io/py/vision-agent.svg)](https://badge.fury.io/py/vision-agent)
- ![version](https://img.shields.io/pypi/pyversions/vision-agent)
- </div>
- 
- Vision Agent is a library that helps you utilize agent frameworks for your vision tasks.
- Many current vision problems can easily take hours or days to solve: you need to find the
- right model, figure out how to use it, possibly write programming logic around it to
- accomplish the task you want, or, at even greater cost, train your own model. Vision Agent
- aims to provide an in-seconds experience by allowing users to describe their problem in
- text and using agent frameworks to solve the task for them. Check out our Discord
- for updates and roadmaps!
- 
- ## Documentation
- 
- - [Vision Agent Library Docs](https://landing-ai.github.io/vision-agent/)
- 
- 
- ## Getting Started
- ### Installation
- To get started, you can install the library using pip:
- 
- ```bash
- pip install vision-agent
- ```
- 
- Ensure you have an OpenAI API key and set it as an environment variable (if you are
- using Azure OpenAI, please see the Azure Setup section):
- 
- ```bash
- export OPENAI_API_KEY="your-api-key"
- ```
- 
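If you are working in a notebook or another environment where exporting shell variables is awkward, the key can also be set from Python before creating an agent. This is a minimal sketch using only the standard library; the placeholder value is not a real key:

```python
import os

# Equivalent to the shell export above; must run before the agent makes its first OpenAI call.
os.environ["OPENAI_API_KEY"] = "your-api-key"
```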
- ### Vision Agents
- You can interact with the agents as you would with any LLM or LMM:
- 
- ```python
- >>> from vision_agent.agent import VisionAgent
- >>> agent = VisionAgent()
- >>> agent("What percentage of the area of this jar is filled with coffee beans?", image="jar.jpg")
- "The percentage of area of the jar filled with coffee beans is 25%."
- ```
- 
- To better understand how the model came up with its answer, you can also run it in
- debug mode by passing in the verbose argument:
- 
- ```python
- >>> agent = VisionAgent(verbose=True)
- ```
- 
- You can also have it return the workflow it used to complete the task along with all
- the individual steps and tools to get the answer:
- 
- ```python
- >>> resp, workflow = agent.chat_with_workflow([{"role": "user", "content": "What percentage of the area of this jar is filled with coffee beans?"}], image="jar.jpg")
- >>> print(workflow)
- [{"task": "Segment the jar using 'grounding_sam_'.",
-   "tool": "grounding_sam_",
-   "parameters": {"prompt": "jar", "image": "jar.jpg"},
-   "call_results": [[
-       {
-           "labels": ["jar"],
-           "scores": [0.99],
-           "bboxes": [
-               [0.58, 0.2, 0.72, 0.45],
-           ],
-           "masks": "mask.png"
-       }
-   ]],
-   "answer": "The jar is located at [0.58, 0.2, 0.72, 0.45].",
- },
- {"visualize_output": "final_output.png"}]
- ```
- 
- You can also provide reference data for the model to use. For example, if you want
- to use VisualPromptCounting:
- 
- ```python
- agent(
-     "How many apples are in this image?",
-     image="apples.jpg",
-     reference_data={"bbox": [0.1, 0.11, 0.24, 0.25]},
- )
- ```
- Here, `[0.1, 0.11, 0.24, 0.25]` are the normalized bounding box coordinates of an apple.
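The coordinates are fractions of the image size rather than pixels. The helper below is a hypothetical illustration (not part of vision-agent) of one way to produce them, assuming x values are divided by the image width and y values by the image height:

```python
def normalize_bbox(bbox_px, width, height):
    """Convert an (x1, y1, x2, y2) box in pixels to normalized [0, 1] coordinates."""
    x1, y1, x2, y2 = bbox_px
    return [x1 / width, y1 / height, x2 / width, y2 / height]

# For a 640x640 image, the pixel box (64, 70, 154, 160) becomes roughly [0.1, 0.11, 0.24, 0.25].
print(normalize_bbox((64, 70, 154, 160), 640, 640))
```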
- Similarly, for DINOv you can provide a reference image and mask:
- 
- ```python
- agent(
-     "Can you detect all of the objects similar to the mask I've provided?",
-     image="image.jpg",
-     reference_data={"mask": "reference_mask.png", "image": "reference_image.png"},
- )
- ```
- Here, `reference_mask.png` and `reference_image.png` in `reference_data` can be any
- image with its corresponding mask for the object you want to detect in `image.jpg`.
- You can find a demo app to generate masks for DINOv [here](examples/mask_app/).
- 
- ### Tools
- There are a variety of tools for the model or the user to use. Some are executed locally
- while others are hosted for you. You can also ask an LLM directly to build a tool for
- you. For example:
- 
- ```python
- >>> import vision_agent as va
- >>> llm = va.llm.OpenAILLM()
- >>> detector = llm.generate_detector("Can you build a jar detector for me?")
- >>> detector("jar.jpg")
- [{"labels": ["jar",],
-   "scores": [0.99],
-   "bboxes": [
-     [0.58, 0.2, 0.72, 0.45],
-   ]
- }]
- ```
- 
- #### Custom Tools
- You can also add your own custom tools for your vision agent to use:
- 
- ```python
- from vision_agent.tools import Tool, register_tool
- @register_tool
- class NumItems(Tool):
-     name = "num_items_"
-     description = "Returns the number of items in a list."
-     usage = {
-         "required_parameters": [{"name": "prompt", "type": "list"}],
-         "examples": [
-             {
-                 "scenario": "How many items are in this list? ['a', 'b', 'c']",
-                 "parameters": {"prompt": "['a', 'b', 'c']"},
-             }
-         ],
-     }
-     def __call__(self, prompt: list[str]) -> int:
-         return len(prompt)
- ```
- This will register it with the list of tools Vision Agent has access to. The agent can pick
- the tool based on its description and call it according to the usage examples provided. You can
- find an example that creates a custom tool for template matching [here](examples/custom_tools/).
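Once registered, a request that matches the tool's usage examples lets the agent pick it; the snippet below is illustrative only, and the exact routing depends on the agent's planning:

```python
>>> from vision_agent.agent import VisionAgent
>>> agent = VisionAgent()
>>> agent("How many items are in this list? ['a', 'b', 'c']")  # the agent can now select num_items_
```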
- 
- #### Tool List
- | Tool | Description |
- | --- | --- |
- | CLIP | CLIP is a tool that can classify or tag any image given a set of input classes or tags. |
- | ImageCaption | ImageCaption is a tool that can generate a caption for an image. |
- | GroundingDINO | GroundingDINO is a tool that can detect arbitrary objects with inputs such as category names or referring expressions. |
- | GroundingSAM | GroundingSAM is a tool that can detect and segment arbitrary objects with inputs such as category names or referring expressions. |
- | DINOv | DINOv is a tool that can detect arbitrary objects using a referring mask. |
- | Crop | Crop crops an image given a bounding box and returns a file name of the cropped image. |
- | BboxArea | BboxArea returns the area of the bounding box in pixels normalized to 2 decimal places. |
- | SegArea | SegArea returns the area of the segmentation mask in pixels normalized to 2 decimal places. |
- | BboxIoU | BboxIoU returns the intersection over union of two bounding boxes normalized to 2 decimal places. |
- | SegIoU | SegIoU returns the intersection over union of two segmentation masks normalized to 2 decimal places. |
- | BoxDistance | BoxDistance returns the minimum distance between two bounding boxes normalized to 2 decimal places. |
- | MaskDistance | MaskDistance returns the minimum distance between two segmentation masks in pixel units. |
- | BboxContains | BboxContains returns the intersection of two boxes over the target box area. It is useful for checking if one box is contained within another box. |
- | ExtractFrames | ExtractFrames extracts frames with motion from a video. |
- | ZeroShotCounting | ZeroShotCounting returns the total number of objects belonging to a single class in a given image. |
- | VisualPromptCounting | VisualPromptCounting returns the total number of objects belonging to a single class given an image and visual prompt. |
- | VisualQuestionAnswering | VisualQuestionAnswering is a tool that can explain the contents of an image and answer questions about the image. |
- | ImageQuestionAnswering | ImageQuestionAnswering is similar to VisualQuestionAnswering but does not rely on OpenAI and instead uses a dedicated model for the task. |
- | OCR | OCR returns the text detected in an image along with the location. |
- 
- 
- It also has a basic set of calculation tools such as add, subtract, multiply, and divide.
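The tools in the table can also be called directly rather than through the agent. The sketch below assumes they are exposed as callable classes under `vision_agent.tools` taking a text `prompt` and an `image` path, mirroring the `grounding_sam_` parameters shown in the workflow output above; check the tool docstrings for the exact signatures:

```python
>>> import vision_agent as va
>>> detector = va.tools.GroundingDINO()  # assumed: tool classes are constructed with no arguments
>>> detector(prompt="jar", image="jar.jpg")  # returns detections with labels, scores and bounding boxes
```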
- 
- ### Azure Setup
- If you want to use Azure OpenAI models, you can set the environment variables:
- 
- ```bash
- export AZURE_OPENAI_API_KEY="your-api-key"
- export AZURE_OPENAI_ENDPOINT="your-endpoint"
- ```
- 
- You can then run Vision Agent using the Azure OpenAI models:
- 
- ```python
- >>> import vision_agent as va
- >>> agent = va.agent.VisionAgent(
- >>>     task_model=va.llm.AzureOpenAILLM(),
- >>>     answer_model=va.lmm.AzureOpenAILMM(),
- >>>     reflection_model=va.lmm.AzureOpenAILMM(),
- >>> )
- ```
- 
- 
@@ -1,36 +0,0 @@
- vision_agent/__init__.py,sha256=GVLHCeK_R-zgldpbcPmOzJat-BkadvkuRCMxDvTIcXs,108
- vision_agent/agent/__init__.py,sha256=jpmL6z5e4PFfQM21JbSsRwcERRXn58XFmURAMwWeoRM,249
- vision_agent/agent/agent.py,sha256=4buKL_7PA6q_Ktlf26FxfX0JxRGrL-swYk0xJuYNVz4,538
- vision_agent/agent/agent_coder.py,sha256=4Neo6qM9-J8sJ-PKqSaUHr28SYm43IjEvhDK8BfDosE,7006
- vision_agent/agent/agent_coder_prompts.py,sha256=CJe3v7xvHQ32u3RQAXQga_Tk_4UgU64RBAMHZ3S70KY,5538
- vision_agent/agent/easytool.py,sha256=oMHnBg7YBtIPgqQUNcZgq7uMgpPThs99_UnO7ERkMVg,11511
- vision_agent/agent/easytool_prompts.py,sha256=Bikw-PPLkm78dwywTlnv32Y1Tw6JMeC-R7oCnXWLcTk,4656
- vision_agent/agent/reflexion.py,sha256=4gz30BuFMeGxSsTzoDV4p91yE0R8LISXp28IaOI6wdM,10506
- vision_agent/agent/reflexion_prompts.py,sha256=G7UAeNz_g2qCb2yN6OaIC7bQVUkda4m3z42EG8wAyfE,9342
- vision_agent/agent/vision_agent.py,sha256=Rs7O0PXc2J9FlrpBa3UGs5NjqQT51Y507klQf9fC0UY,27281
- vision_agent/agent/vision_agent_prompts.py,sha256=MZSIwovYgB-f-kdJ6btaNDVXptJn47bfOL3-Zn6NiC0,8573
- vision_agent/agent/vision_agent_v2.py,sha256=t2D1mMUYEv1dFeMrkEUVbDEdArunb7F1ZeYB8qijU2w,15109
- vision_agent/agent/vision_agent_v2_prompts.py,sha256=b_0BMq6GrbGfl09MHrv4mj-mqyE1FxMl3Xq44qD4S1E,6161
- vision_agent/agent/vision_agent_v3.py,sha256=jPU__NueKQwFzIoJd0zzg6z9q7IDQa9QPaxt8Qlca98,12403
- vision_agent/agent/vision_agent_v3_prompts.py,sha256=ejedMNDluVYZjHOIXKN98LzX-pOHin2DJhCyZUWULNE,8070
- vision_agent/fonts/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
- vision_agent/fonts/default_font_ch_en.ttf,sha256=1YM0Z3XqLDjSNbF7ihQFSAIUdjF9m1rtHiNC_6QosTE,1594400
- vision_agent/llm/__init__.py,sha256=BoUm_zSAKnLlE8s-gKTSQugXDqVZKPqYlWwlTLdhcz4,48
- vision_agent/llm/llm.py,sha256=_Klwngc35JdRuzezWe1P5BMBRkfRQSGJqNOtS44rM9s,5891
- vision_agent/lmm/__init__.py,sha256=nnNeKD1k7q_4vLb1x51O_EUTYaBgGfeiCx5F433gr3M,67
- vision_agent/lmm/lmm.py,sha256=gK90vMxh0OcGSuIZQikBkDXm4pfkdFk1R2y7rtWDl84,10539
- vision_agent/tools/__init__.py,sha256=dRHXGpjhItXZRQs0r_l3Z3bQIreaZaYP0CJrl8mOJxM,452
- vision_agent/tools/prompts.py,sha256=V1z4YJLXZuUl_iZ5rY0M5hHc_2tmMEUKr0WocXKGt4E,1430
- vision_agent/tools/tool_utils.py,sha256=wzRacbUpqk9hhfX_Y08rL8qP0XCN2w-8IZoYLi3Upn4,869
- vision_agent/tools/tools.py,sha256=pZc5dQlYINlV4nYbbzsDi3-wauA-fCeD2iGmJUMoUfE,47373
- vision_agent/tools/tools_v2.py,sha256=mio0A1l5QcyRC5IgaD4Trfqg7hFTZ8rOjx1dYivwb4Q,21585
- vision_agent/utils/__init__.py,sha256=xsHFyJSDbLdonB9Dh74cwZnVTiT__2OQF3Brd3Nmglc,116
- vision_agent/utils/execute.py,sha256=8_SfK-IkHH4lXF0JVyV7sDFszZn9HKsh1bFITKGCJ1g,3881
- vision_agent/utils/image_utils.py,sha256=_cdiS5YrLzqkq_ZgFUO897m5M4_SCIThwUy4lOklfB8,7700
- vision_agent/utils/sim.py,sha256=oUZ-6eu8Io-UNt9GXJ0XRKtP-Wc0sPWVzYGVpB2yDFk,3001
- vision_agent/utils/type_defs.py,sha256=BlI8ywWHAplC7kYWLvt4AOdnKpEW3qWEFm-GEOSkrFQ,1792
- vision_agent/utils/video.py,sha256=xTElFSFp1Jw4ulOMnk81Vxsh-9dTxcWUO6P9fzEi3AM,7653
- vision_agent-0.2.30.dist-info/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
- vision_agent-0.2.30.dist-info/METADATA,sha256=uVj7XfG4Hat1Bed9FYM2dipIseooN4AHY-Tl4rSPOIg,9212
- vision_agent-0.2.30.dist-info/WHEEL,sha256=7Z8_27uaHI_UZAc4Uox4PpBhQ9Y5_modZXWMxtUi4NU,88
- vision_agent-0.2.30.dist-info/RECORD,,