vision-agent 0.2.9.tar.gz → 0.2.10.tar.gz

Files changed (25)
  1. {vision_agent-0.2.9 → vision_agent-0.2.10}/PKG-INFO +47 -20
  2. {vision_agent-0.2.9 → vision_agent-0.2.10}/README.md +46 -19
  3. {vision_agent-0.2.9 → vision_agent-0.2.10}/pyproject.toml +1 -1
  4. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/agent/vision_agent.py +1 -1
  5. {vision_agent-0.2.9 → vision_agent-0.2.10}/LICENSE +0 -0
  6. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/__init__.py +0 -0
  7. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/agent/__init__.py +0 -0
  8. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/agent/agent.py +0 -0
  9. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/agent/easytool.py +0 -0
  10. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/agent/easytool_prompts.py +0 -0
  11. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/agent/reflexion.py +0 -0
  12. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/agent/reflexion_prompts.py +0 -0
  13. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/agent/vision_agent_prompts.py +0 -0
  14. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/fonts/__init__.py +0 -0
  15. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/fonts/default_font_ch_en.ttf +0 -0
  16. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/image_utils.py +0 -0
  17. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/llm/__init__.py +0 -0
  18. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/llm/llm.py +0 -0
  19. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/lmm/__init__.py +0 -0
  20. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/lmm/lmm.py +0 -0
  21. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/tools/__init__.py +0 -0
  22. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/tools/prompts.py +0 -0
  23. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/tools/tools.py +0 -0
  24. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/tools/video.py +0 -0
  25. {vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/type_defs.py +0 -0
{vision_agent-0.2.9 → vision_agent-0.2.10}/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: vision-agent
- Version: 0.2.9
+ Version: 0.2.10
  Summary: Toolset for Vision Agent
  Author: Landing AI
  Author-email: dev@landing.ai
@@ -105,6 +105,30 @@ the individual steps and tools to get the answer:
  {"visualize_output": "final_output.png"}]
  ```
  
+ You can also provide reference data for the model to utilize. For example, if you want
+ to utilize VisualPromptCounting:
+
+ ```python
+ agent(
+     "How many apples are in this image?",
+     image="apples.jpg",
+     reference_data={"bbox": [0.1, 0.11, 0.24, 0.25]},
+ )
+ ```
+ Where `[0.1, 0.11, 0.24, 0.25]` is the normalized bounding box coordinates of an apple.
+ Similarly for DINOv you can provide a reference image and mask:
+
+ ```python
+ agent(
+     "Can you detect all of the objects similar to the mask I've provided?",
+     image="image.jpg",
+     reference_data={"mask": "reference_mask.png", "image": "reference_image.png"},
+ )
+ ```
+ Here, `reference_mask.png` and `reference_image.png` in `reference_data` could be any
+ image with it's corresponding mask that is the object you want to detect in `image.jpg`.
+ You can find a demo app to generate masks for DINOv [here](examples/mask_app/).
+
  ### Tools
  There are a variety of tools for the model or the user to use. Some are executed locally
  while others are hosted for you. You can also ask an LLM directly to build a tool for
@@ -127,25 +151,26 @@ you. For example:
  You can also add your own custom tools for your vision agent to use:
  
  ```python
- >>> from vision_agent.tools import Tool, register_tool
- >>> @register_tool
- >>> class NumItems(Tool):
- >>>     name = "num_items_"
- >>>     description = "Returns the number of items in a list."
- >>>     usage = {
- >>>         "required_parameters": [{"name": "prompt", "type": "list"}],
- >>>         "examples": [
- >>>             {
- >>>                 "scenario": "How many items are in this list? ['a', 'b', 'c']",
- >>>                 "parameters": {"prompt": "['a', 'b', 'c']"},
- >>>             }
- >>>         ],
- >>>     }
- >>>     def __call__(self, prompt: list[str]) -> int:
- >>>         return len(prompt)
+ from vision_agent.tools import Tool, register_tool
+ @register_tool
+ class NumItems(Tool):
+     name = "num_items_"
+     description = "Returns the number of items in a list."
+     usage = {
+         "required_parameters": [{"name": "prompt", "type": "list"}],
+         "examples": [
+             {
+                 "scenario": "How many items are in this list? ['a', 'b', 'c']",
+                 "parameters": {"prompt": "['a', 'b', 'c']"},
+             }
+         ],
+     }
+     def __call__(self, prompt: list[str]) -> int:
+         return len(prompt)
  ```
  This will register it with the list of tools Vision Agent has access to. It will be able
- to pick it based on the tool description and use it based on the usage provided.
+ to pick it based on the tool description and use it based on the usage provided. You can
+ find an example that creates a custom tool for template matching [here](examples/custom_tools/).
  
  #### Tool List
  | Tool | Description |
@@ -164,8 +189,10 @@ to pick it based on the tool description and use it based on the usage provided.
  | BoxDistance | BoxDistance returns the minimum distance between two bounding boxes normalized to 2 decimal places. |
  | BboxContains | BboxContains returns the intersection of two boxes over the target box area. It is good for check if one box is contained within another box. |
  | ExtractFrames | ExtractFrames extracts frames with motion from a video. |
- | ZeroShotCounting | ZeroShotCounting returns the total number of objects belonging to a single class in a given image |
- | VisualPromptCounting | VisualPromptCounting returns the total number of objects belonging to a single class given an image and visual prompt |
+ | ZeroShotCounting | ZeroShotCounting returns the total number of objects belonging to a single class in a given image. |
+ | VisualPromptCounting | VisualPromptCounting returns the total number of objects belonging to a single class given an image and visual prompt. |
+ | VisualQuestionAnswering | VisualQuestionAnswering is a tool that can explain the contents of an image and answer questions about the image. |
+ | ImageQuestionAnswering | ImageQuestionAnswering is similar to VisualQuestionAnswering but does not rely on OpenAI and instead uses a dedicated model for the task. |
  | OCR | OCR returns the text detected in an image along with the location. |
  
  
{vision_agent-0.2.9 → vision_agent-0.2.10}/README.md
@@ -78,6 +78,30 @@ the individual steps and tools to get the answer:
  {"visualize_output": "final_output.png"}]
  ```
  
+ You can also provide reference data for the model to utilize. For example, if you want
+ to utilize VisualPromptCounting:
+
+ ```python
+ agent(
+     "How many apples are in this image?",
+     image="apples.jpg",
+     reference_data={"bbox": [0.1, 0.11, 0.24, 0.25]},
+ )
+ ```
+ Where `[0.1, 0.11, 0.24, 0.25]` is the normalized bounding box coordinates of an apple.
+ Similarly for DINOv you can provide a reference image and mask:
+
+ ```python
+ agent(
+     "Can you detect all of the objects similar to the mask I've provided?",
+     image="image.jpg",
+     reference_data={"mask": "reference_mask.png", "image": "reference_image.png"},
+ )
+ ```
+ Here, `reference_mask.png` and `reference_image.png` in `reference_data` could be any
+ image with it's corresponding mask that is the object you want to detect in `image.jpg`.
+ You can find a demo app to generate masks for DINOv [here](examples/mask_app/).
+
  ### Tools
  There are a variety of tools for the model or the user to use. Some are executed locally
  while others are hosted for you. You can also ask an LLM directly to build a tool for
@@ -100,25 +124,26 @@ you. For example:
  You can also add your own custom tools for your vision agent to use:
  
  ```python
- >>> from vision_agent.tools import Tool, register_tool
- >>> @register_tool
- >>> class NumItems(Tool):
- >>>     name = "num_items_"
- >>>     description = "Returns the number of items in a list."
- >>>     usage = {
- >>>         "required_parameters": [{"name": "prompt", "type": "list"}],
- >>>         "examples": [
- >>>             {
- >>>                 "scenario": "How many items are in this list? ['a', 'b', 'c']",
- >>>                 "parameters": {"prompt": "['a', 'b', 'c']"},
- >>>             }
- >>>         ],
- >>>     }
- >>>     def __call__(self, prompt: list[str]) -> int:
- >>>         return len(prompt)
+ from vision_agent.tools import Tool, register_tool
+ @register_tool
+ class NumItems(Tool):
+     name = "num_items_"
+     description = "Returns the number of items in a list."
+     usage = {
+         "required_parameters": [{"name": "prompt", "type": "list"}],
+         "examples": [
+             {
+                 "scenario": "How many items are in this list? ['a', 'b', 'c']",
+                 "parameters": {"prompt": "['a', 'b', 'c']"},
+             }
+         ],
+     }
+     def __call__(self, prompt: list[str]) -> int:
+         return len(prompt)
  ```
  This will register it with the list of tools Vision Agent has access to. It will be able
- to pick it based on the tool description and use it based on the usage provided.
+ to pick it based on the tool description and use it based on the usage provided. You can
+ find an example that creates a custom tool for template matching [here](examples/custom_tools/).
  
  #### Tool List
  | Tool | Description |
@@ -137,8 +162,10 @@ to pick it based on the tool description and use it based on the usage provided.
  | BoxDistance | BoxDistance returns the minimum distance between two bounding boxes normalized to 2 decimal places. |
  | BboxContains | BboxContains returns the intersection of two boxes over the target box area. It is good for check if one box is contained within another box. |
  | ExtractFrames | ExtractFrames extracts frames with motion from a video. |
- | ZeroShotCounting | ZeroShotCounting returns the total number of objects belonging to a single class in a given image |
- | VisualPromptCounting | VisualPromptCounting returns the total number of objects belonging to a single class given an image and visual prompt |
+ | ZeroShotCounting | ZeroShotCounting returns the total number of objects belonging to a single class in a given image. |
+ | VisualPromptCounting | VisualPromptCounting returns the total number of objects belonging to a single class given an image and visual prompt. |
+ | VisualQuestionAnswering | VisualQuestionAnswering is a tool that can explain the contents of an image and answer questions about the image. |
+ | ImageQuestionAnswering | ImageQuestionAnswering is similar to VisualQuestionAnswering but does not rely on OpenAI and instead uses a dedicated model for the task. |
  | OCR | OCR returns the text detected in an image along with the location. |
  
  
{vision_agent-0.2.9 → vision_agent-0.2.10}/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"
  
  [tool.poetry]
  name = "vision-agent"
- version = "0.2.9"
+ version = "0.2.10"
  description = "Toolset for Vision Agent"
  authors = ["Landing AI <dev@landing.ai>"]
  readme = "README.md"
{vision_agent-0.2.9 → vision_agent-0.2.10}/vision_agent/agent/vision_agent.py
@@ -585,7 +585,7 @@ class VisionAgent(Agent):
              self.task_model, question, self.tools, reflections
          )
  
-         task_depend = {"Original Quesiton": question}
+         task_depend = {"Original Question": question}
          previous_log = ""
          answers = []
          for task in task_list:
File without changes
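
The README text added in this release passes a normalized bounding box as `reference_data={"bbox": [...]}` but does not show how to get one from pixel coordinates. The following is a minimal sketch, not part of the package: `normalize_bbox` is a hypothetical helper, and it assumes the box is ordered `[x_min, y_min, x_max, y_max]` and that each coordinate is divided by the image width or height. The diff does not confirm either assumption, so check the tool's documentation before relying on them.

```python
from typing import List, Sequence


def normalize_bbox(
    bbox_px: Sequence[float], image_width: int, image_height: int
) -> List[float]:
    """Convert a pixel box to normalized coordinates rounded to 2 decimals.

    ASSUMPTION: the box order is [x_min, y_min, x_max, y_max]; the diff above
    does not state the order expected by the agent.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    x_min, y_min, x_max, y_max = bbox_px
    if not (0 <= x_min < x_max <= image_width and 0 <= y_min < y_max <= image_height):
        raise ValueError("bounding box must lie inside the image")
    # x coordinates scale by width, y coordinates by height.
    return [
        round(x_min / image_width, 2),
        round(y_min / image_height, 2),
        round(x_max / image_width, 2),
        round(y_max / image_height, 2),
    ]


# A hypothetical 640x480 image with an apple at pixels (64, 53)-(154, 120).
reference_data = {"bbox": normalize_bbox([64, 53, 154, 120], 640, 480)}
print(reference_data)  # {'bbox': [0.1, 0.11, 0.24, 0.25]}
```

The resulting dictionary has the same shape as the `reference_data` argument shown in the added README example.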