vision-agent 0.2.9__py3-none-any.whl → 0.2.10__py3-none-any.whl

@@ -585,7 +585,7 @@ class VisionAgent(Agent):
  self.task_model, question, self.tools, reflections
  )
 
- task_depend = {"Original Quesiton": question}
+ task_depend = {"Original Question": question}
  previous_log = ""
  answers = []
  for task in task_list:
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: vision-agent
- Version: 0.2.9
+ Version: 0.2.10
  Summary: Toolset for Vision Agent
  Author: Landing AI
  Author-email: dev@landing.ai
@@ -105,6 +105,30 @@ the individual steps and tools to get the answer:
  {"visualize_output": "final_output.png"}]
  ```
 
+ You can also provide reference data for the model to utilize. For example, if you want
+ to utilize VisualPromptCounting:
+
+ ```python
+ agent(
+     "How many apples are in this image?",
+     image="apples.jpg",
+     reference_data={"bbox": [0.1, 0.11, 0.24, 0.25]},
+ )
+ ```
+ Where `[0.1, 0.11, 0.24, 0.25]` is the normalized bounding box coordinates of an apple.
+ Similarly for DINOv you can provide a reference image and mask:
+
+ ```python
+ agent(
+     "Can you detect all of the objects similar to the mask I've provided?",
+     image="image.jpg",
+     reference_data={"mask": "reference_mask.png", "image": "reference_image.png"},
+ )
+ ```
+ Here, `reference_mask.png` and `reference_image.png` in `reference_data` could be any
+ image with it's corresponding mask that is the object you want to detect in `image.jpg`.
+ You can find a demo app to generate masks for DINOv [here](examples/mask_app/).
+
  ### Tools
  There are a variety of tools for the model or the user to use. Some are executed locally
  while others are hosted for you. You can also ask an LLM directly to build a tool for
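Note on the `reference_data` example in the hunk above: the README text calls `[0.1, 0.11, 0.24, 0.25]` normalized bounding box coordinates but does not say how to derive them from pixel values. The following is a minimal sketch, assuming the order is `[x_min, y_min, x_max, y_max]` and that x values are divided by the image width and y values by the image height. `normalize_bbox` is a hypothetical helper written for illustration; it is not part of vision-agent.

```python
from typing import List, Sequence, Tuple


def normalize_bbox(
    box_px: Sequence[float], image_size: Tuple[int, int], ndigits: int = 2
) -> List[float]:
    """Scale a pixel-space box into the 0..1 range.

    Assumptions (not confirmed by the diff): box_px is
    [x_min, y_min, x_max, y_max] and image_size is (width, height).
    """
    x_min, y_min, x_max, y_max = box_px
    width, height = image_size
    if width <= 0 or height <= 0:
        raise ValueError("image_size must contain positive width and height")
    # x coordinates scale by width, y coordinates scale by height.
    return [
        round(x_min / width, ndigits),
        round(y_min / height, ndigits),
        round(x_max / width, ndigits),
        round(y_max / height, ndigits),
    ]


# A 640x480 image with an apple at pixels (64, 53) to (154, 120).
bbox = normalize_bbox((64, 53, 154, 120), (640, 480))
print(bbox)
```

Under these assumptions the result could be passed as `reference_data={"bbox": bbox}`, mirroring the call shown in the added README lines.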
@@ -127,25 +151,26 @@ you. For example:
  You can also add your own custom tools for your vision agent to use:
 
  ```python
- >>> from vision_agent.tools import Tool, register_tool
- >>> @register_tool
- >>> class NumItems(Tool):
- >>>     name = "num_items_"
- >>>     description = "Returns the number of items in a list."
- >>>     usage = {
- >>>         "required_parameters": [{"name": "prompt", "type": "list"}],
- >>>         "examples": [
- >>>             {
- >>>                 "scenario": "How many items are in this list? ['a', 'b', 'c']",
- >>>                 "parameters": {"prompt": "['a', 'b', 'c']"},
- >>>             }
- >>>         ],
- >>>     }
- >>>     def __call__(self, prompt: list[str]) -> int:
- >>>         return len(prompt)
+ from vision_agent.tools import Tool, register_tool
+ @register_tool
+ class NumItems(Tool):
+     name = "num_items_"
+     description = "Returns the number of items in a list."
+     usage = {
+         "required_parameters": [{"name": "prompt", "type": "list"}],
+         "examples": [
+             {
+                 "scenario": "How many items are in this list? ['a', 'b', 'c']",
+                 "parameters": {"prompt": "['a', 'b', 'c']"},
+             }
+         ],
+     }
+     def __call__(self, prompt: list[str]) -> int:
+         return len(prompt)
  ```
  This will register it with the list of tools Vision Agent has access to. It will be able
- to pick it based on the tool description and use it based on the usage provided.
+ to pick it based on the tool description and use it based on the usage provided. You can
+ find an example that creates a custom tool for template matching [here](examples/custom_tools/).
 
  #### Tool List
  | Tool | Description |
@@ -164,8 +189,10 @@ to pick it based on the tool description and use it based on the usage provided.
  | BoxDistance | BoxDistance returns the minimum distance between two bounding boxes normalized to 2 decimal places. |
  | BboxContains | BboxContains returns the intersection of two boxes over the target box area. It is good for check if one box is contained within another box. |
  | ExtractFrames | ExtractFrames extracts frames with motion from a video. |
- | ZeroShotCounting | ZeroShotCounting returns the total number of objects belonging to a single class in a given image |
- | VisualPromptCounting | VisualPromptCounting returns the total number of objects belonging to a single class given an image and visual prompt |
+ | ZeroShotCounting | ZeroShotCounting returns the total number of objects belonging to a single class in a given image. |
+ | VisualPromptCounting | VisualPromptCounting returns the total number of objects belonging to a single class given an image and visual prompt. |
+ | VisualQuestionAnswering | VisualQuestionAnswering is a tool that can explain the contents of an image and answer questions about the image. |
+ | ImageQuestionAnswering | ImageQuestionAnswering is similar to VisualQuestionAnswering but does not rely on OpenAI and instead uses a dedicated model for the task. |
  | OCR | OCR returns the text detected in an image along with the location. |
 
 
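Note on the Tool List rows above: BboxContains is described as "the intersection of two boxes over the target box area". A rough sketch of that ratio follows, assuming boxes are given as `[x_min, y_min, x_max, y_max]`. `containment_ratio` is a hypothetical name used for illustration only; the library's actual implementation and argument order are not shown in this diff.

```python
from typing import Sequence


def containment_ratio(target: Sequence[float], region: Sequence[float]) -> float:
    """Return area(target ∩ region) / area(target).

    Assumption: both boxes are [x_min, y_min, x_max, y_max].
    1.0 means the target lies fully inside the region, 0.0 means no overlap.
    """
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(target[0], region[0])
    iy_min = max(target[1], region[1])
    ix_max = min(target[2], region[2])
    iy_max = min(target[3], region[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    target_area = (target[2] - target[0]) * (target[3] - target[1])
    if target_area <= 0:
        return 0.0  # degenerate target box
    return intersection / target_area


print(containment_ratio([0, 0, 2, 2], [1, 0, 3, 2]))
```

Unlike intersection-over-union, this ratio is asymmetric: swapping the two arguments generally changes the result, which is what makes it suitable for a "is this box inside that one" check.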
@@ -5,7 +5,7 @@ vision_agent/agent/easytool.py,sha256=oMHnBg7YBtIPgqQUNcZgq7uMgpPThs99_UnO7ERkMV
  vision_agent/agent/easytool_prompts.py,sha256=Bikw-PPLkm78dwywTlnv32Y1Tw6JMeC-R7oCnXWLcTk,4656
  vision_agent/agent/reflexion.py,sha256=4gz30BuFMeGxSsTzoDV4p91yE0R8LISXp28IaOI6wdM,10506
  vision_agent/agent/reflexion_prompts.py,sha256=G7UAeNz_g2qCb2yN6OaIC7bQVUkda4m3z42EG8wAyfE,9342
- vision_agent/agent/vision_agent.py,sha256=PyAtzDl5h1Uasd-Fjzdl-NK9gdZ2ARxoF9y3tvap7PU,26243
+ vision_agent/agent/vision_agent.py,sha256=DVcvT02GjY85mCjhHgJGrhI_dpUvjZhoYzYik9bkHQA,26243
  vision_agent/agent/vision_agent_prompts.py,sha256=moihXFhEzFw8xnf2sUSgd_k9eoxQam3T6XUkB0fyp5o,8570
  vision_agent/fonts/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
  vision_agent/fonts/default_font_ch_en.ttf,sha256=1YM0Z3XqLDjSNbF7ihQFSAIUdjF9m1rtHiNC_6QosTE,1594400
@@ -19,7 +19,7 @@ vision_agent/tools/prompts.py,sha256=V1z4YJLXZuUl_iZ5rY0M5hHc_2tmMEUKr0WocXKGt4E
  vision_agent/tools/tools.py,sha256=EvNDLUxe-Ed8-meHInTIiX3aySLUXFBsAWwL0Is5S1o,43823
  vision_agent/tools/video.py,sha256=xTElFSFp1Jw4ulOMnk81Vxsh-9dTxcWUO6P9fzEi3AM,7653
  vision_agent/type_defs.py,sha256=4LTnTL4HNsfYqCrDn9Ppjg9bSG2ZGcoKSSd9YeQf4Bw,1792
- vision_agent-0.2.9.dist-info/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
- vision_agent-0.2.9.dist-info/METADATA,sha256=jyfAwSfDnObeILoLyfB8ijuLLpZUWd-Fvg-xncEMCYc,7697
- vision_agent-0.2.9.dist-info/WHEEL,sha256=7Z8_27uaHI_UZAc4Uox4PpBhQ9Y5_modZXWMxtUi4NU,88
- vision_agent-0.2.9.dist-info/RECORD,,
+ vision_agent-0.2.10.dist-info/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
+ vision_agent-0.2.10.dist-info/METADATA,sha256=2uCVxAWBCbaFvxFnd6xoRoPNSo1UXaTLkeZ5qVOSM84,8930
+ vision_agent-0.2.10.dist-info/WHEEL,sha256=7Z8_27uaHI_UZAc4Uox4PpBhQ9Y5_modZXWMxtUi4NU,88
+ vision_agent-0.2.10.dist-info/RECORD,,
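Note on reading the RECORD entries above: each line has the form `path,sha256=<digest>,<size in bytes>`, where the digest is the file's SHA-256 hash encoded as URL-safe base64 with the trailing `=` padding removed. That explains why `vision_agent.py` shows a new hash with an unchanged size of 26243: the typo fix swapped two characters without changing the file length. A small sketch of how such an entry can be reproduced is below. `record_entry` is a hypothetical helper for illustration; the entry for the empty `vision_agent/fonts/__init__.py` is used as a check because its content (zero bytes) is known from the listing.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build a wheel RECORD line for a file with the given contents."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 with the '=' padding stripped.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


# The empty file from the listing above.
print(record_entry("vision_agent/fonts/__init__.py", b""))
```

To verify a file from an unpacked wheel, read it in binary mode and pass the bytes as `data`; the returned line should match the corresponding RECORD entry exactly.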