vision-agent 0.2.90__py3-none-any.whl → 0.2.92__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- vision_agent/agent/__init__.py +2 -1
- vision_agent/agent/agent.py +1 -1
- vision_agent/agent/agent_utils.py +43 -0
- vision_agent/agent/vision_agent.py +116 -824
- vision_agent/agent/vision_agent_coder.py +897 -0
- vision_agent/agent/vision_agent_coder_prompts.py +328 -0
- vision_agent/agent/vision_agent_prompts.py +89 -302
- vision_agent/lmm/__init__.py +2 -1
- vision_agent/lmm/lmm.py +3 -5
- vision_agent/lmm/types.py +5 -0
- vision_agent/tools/__init__.py +1 -0
- vision_agent/tools/meta_tools.py +402 -0
- vision_agent/tools/tool_utils.py +48 -2
- vision_agent/tools/tools.py +7 -49
- vision_agent/utils/execute.py +52 -76
- vision_agent/utils/image_utils.py +1 -1
- vision_agent/utils/type_defs.py +1 -1
- {vision_agent-0.2.90.dist-info → vision_agent-0.2.92.dist-info}/METADATA +42 -12
- vision_agent-0.2.92.dist-info/RECORD +29 -0
- vision_agent-0.2.90.dist-info/RECORD +0 -24
- {vision_agent-0.2.90.dist-info → vision_agent-0.2.92.dist-info}/LICENSE +0 -0
- {vision_agent-0.2.90.dist-info → vision_agent-0.2.92.dist-info}/WHEEL +0 -0
vision_agent/agent/vision_agent_coder_prompts.py (new file)
@@ -0,0 +1,328 @@
USER_REQ = """
## User Request
{user_request}
"""

FULL_TASK = """
## User Request
{user_request}

## Subtasks
{subtasks}
"""

FEEDBACK = """
## This contains code and feedback from previous runs and is used for providing context so you do not make the same mistake again.

{feedback}
"""

PLAN = """
|
22
|
+
**Context**:
|
23
|
+
{context}
|
24
|
+
|
25
|
+
**Tools Available**:
|
26
|
+
{tool_desc}
|
27
|
+
|
28
|
+
**Previous Feedback**:
|
29
|
+
{feedback}
|
30
|
+
|
31
|
+
**Instructions**:
|
32
|
+
1. Based on the context and tools you have available, create a plan of subtasks to achieve the user request.
|
33
|
+
2. Output three different plans each utilize a different strategy or tool.
|
34
|
+
|
35
|
+
Output a list of jsons in the following format
|
36
|
+
|
37
|
+
```json
|
38
|
+
{{
|
39
|
+
"plan1":
|
40
|
+
[
|
41
|
+
{{
|
42
|
+
"instructions": str # what you should do in this task associated with a tool
|
43
|
+
}}
|
44
|
+
],
|
45
|
+
"plan2": ...,
|
46
|
+
"plan3": ...
|
47
|
+
}}
|
48
|
+
```
|
49
|
+
"""
|
50
|
+
|
51
|
+
|
52
|
+
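The doubled braces in PLAN (and in the other templates below) are `str.format` escapes: `{{` survives as a literal `{` while single-brace fields such as `{context}` are substituted. A minimal rendering sketch, not part of the packaged file, with the context, tool description, and feedback values invented purely for illustration:

```python
# Hypothetical usage; the real call site is in vision_agent_coder.py (added in this
# release but not shown in this hunk).
from vision_agent.agent.vision_agent_coder_prompts import PLAN

prompt = PLAN.format(
    context="## User Request\ncount the number of people in image.jpg",      # invented
    tool_desc="owl_v2(prompt, image) -> list[dict]: open-vocabulary detection",  # invented
    feedback="",  # empty on the first pass
)
print(prompt)  # the JSON example keeps its literal braces after formatting
```
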
TEST_PLANS = """
|
53
|
+
**Role**: You are a software programmer responsible for testing different tools.
|
54
|
+
|
55
|
+
**Task**: Your responsibility is to take a set of several plans and test the different tools for each plan.
|
56
|
+
|
57
|
+
**Documentation**:
|
58
|
+
This is the documentation for the functions you have access to. You may call any of these functions to help you complete the task. They are available through importing `from vision_agent.tools import *`.
|
59
|
+
|
60
|
+
{docstring}
|
61
|
+
|
62
|
+
**Plans**:
|
63
|
+
{plans}
|
64
|
+
|
65
|
+
{previous_attempts}
|
66
|
+
|
67
|
+
**Instructions**:
|
68
|
+
1. Write a program to load the media and call each tool and save it's output.
|
69
|
+
2. Create a dictionary where the keys are the tool name and the values are the tool outputs. Remove numpy arrays from the printed dictionary.
|
70
|
+
3. Your test case MUST run only on the given images which are {media}
|
71
|
+
4. Print this final dictionary.
|
72
|
+
|
73
|
+
**Example**:
|
74
|
+
plan1:
|
75
|
+
- Load the image from the provided file path 'image.jpg'.
|
76
|
+
- Use the 'owl_v2' tool with the prompt 'person' to detect and count the number of people in the image.
|
77
|
+
plan2:
|
78
|
+
- Load the image from the provided file path 'image.jpg'.
|
79
|
+
- Use the 'grounding_sam' tool with the prompt 'person' to detect and count the number of people in the image.
|
80
|
+
- Count the number of detected objects labeled as 'person'.
|
81
|
+
plan3:
|
82
|
+
- Load the image from the provided file path 'image.jpg'.
|
83
|
+
- Use the 'loca_zero_shot_counting' tool to count the dominant foreground object, which in this case is people.
|
84
|
+
|
85
|
+
```python
|
86
|
+
from vision_agent.tools import load_image, owl_v2, grounding_sam, loca_zero_shot_counting
|
87
|
+
image = load_image("image.jpg")
|
88
|
+
owl_v2_out = owl_v2("person", image)
|
89
|
+
|
90
|
+
gsam_out = grounding_sam("person", image)
|
91
|
+
gsam_out = [{{k: v for k, v in o.items() if k != "mask"}} for o in gsam_out]
|
92
|
+
|
93
|
+
loca_out = loca_zero_shot_counting(image)
|
94
|
+
loca_out = loca_out["count"]
|
95
|
+
|
96
|
+
final_out = {{"owl_v2": owl_v2_out, "florencev2_object_detection": florencev2_out, "loca_zero_shot_counting": loca_out}}
|
97
|
+
print(final_out)
|
98
|
+
```
|
99
|
+
"""
|
100
|
+
|
101
|
+
|
102
|
+
PREVIOUS_FAILED = """
|
103
|
+
**Previous Failed Attempts**:
|
104
|
+
You previously ran this code:
|
105
|
+
```python
|
106
|
+
{code}
|
107
|
+
```
|
108
|
+
|
109
|
+
But got the following error or no stdout:
|
110
|
+
{error}
|
111
|
+
"""
|
112
|
+
|
113
|
+
|
114
|
+
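TEST_PLANS leaves a `{previous_attempts}` slot, and PREVIOUS_FAILED looks designed to fill it when an earlier tool-testing run failed. One plausible composition, not part of the packaged file, with the failing code and error text invented for illustration:

```python
# Hypothetical retry wiring; the actual logic in vision_agent_coder.py may differ.
from vision_agent.agent.vision_agent_coder_prompts import PREVIOUS_FAILED, TEST_PLANS

failed_code = "final_out = {'owl_v2': owl_v2_out}"          # invented earlier attempt
error_msg = "NameError: name 'owl_v2_out' is not defined"   # invented error text

prompt = TEST_PLANS.format(
    docstring="owl_v2(prompt, image) -> list[dict]: open-vocabulary detection",  # invented
    plans="plan1:\n- Load 'image.jpg'\n- Run owl_v2 with the prompt 'person'",
    previous_attempts=PREVIOUS_FAILED.format(code=failed_code, error=error_msg),
    media=["image.jpg"],
)
```
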
PICK_PLAN = """
|
115
|
+
**Role**: You are a software programmer.
|
116
|
+
|
117
|
+
**Task**: Your responsibility is to pick the best plan from the three plans provided.
|
118
|
+
|
119
|
+
**Context**:
|
120
|
+
{context}
|
121
|
+
|
122
|
+
**Plans**:
|
123
|
+
{plans}
|
124
|
+
|
125
|
+
**Tool Output**:
|
126
|
+
{tool_output}
|
127
|
+
|
128
|
+
**Instructions**:
|
129
|
+
1. Given the plans, image, and tool outputs, decide which plan is the best to achieve the user request.
|
130
|
+
2. Output a JSON object with the following format:
|
131
|
+
{{
|
132
|
+
"thoughts": str # your thought process for choosing the best plan
|
133
|
+
"best_plan": str # the best plan you have chosen
|
134
|
+
}}
|
135
|
+
"""
|
136
|
+
|
137
|
+
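PICK_PLAN, FIX_BUG, and REFLECT all ask the model for a JSON object, and a raw reply often arrives wrapped in a ```json fence. This diff also touches `vision_agent/agent/agent_utils.py` (+43 lines), which presumably handles that parsing; the helper below is only an assumed, generic sketch of the idea, not the package's implementation:

```python
import json
import re


def parse_json_reply(reply: str) -> dict:
    """Best-effort parse of an LLM reply that should contain a JSON object.

    Strips an optional ```json ... ``` fence before calling json.loads.
    Hypothetical helper for illustration only.
    """
    match = re.search(r"```(?:json)?\s*(.*?)```", reply, re.DOTALL)
    text = match.group(1) if match else reply
    return json.loads(text)


# Invented example reply in the shape PICK_PLAN requests.
reply = '```json\n{"thoughts": "plan2 output matches the request", "best_plan": "plan2"}\n```'
print(parse_json_reply(reply)["best_plan"])  # -> plan2
```
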
CODE = """
|
138
|
+
**Role**: You are a software programmer.
|
139
|
+
|
140
|
+
**Task**: As a programmer, you are required to complete the function. Use a Chain-of-Thought approach to break down the problem, create pseudocode, and then write the code in Python language. Ensure that your code is efficient, readable, and well-commented. Return the requested information from the function you create. Do not call your code, a test will be run after the code is submitted.
|
141
|
+
|
142
|
+
**Documentation**:
|
143
|
+
This is the documentation for the functions you have access to. You may call any of these functions to help you complete the task. They are available through importing `from vision_agent.tools import *`.
|
144
|
+
|
145
|
+
{docstring}
|
146
|
+
|
147
|
+
**Input Code Snippet**:
|
148
|
+
```python
|
149
|
+
# Your code here
|
150
|
+
```
|
151
|
+
|
152
|
+
**User Instructions**:
|
153
|
+
{question}
|
154
|
+
|
155
|
+
**Tool Output**:
|
156
|
+
{tool_output}
|
157
|
+
|
158
|
+
**Previous Feedback**:
|
159
|
+
{feedback}
|
160
|
+
|
161
|
+
**Instructions**:
|
162
|
+
1. **Understand and Clarify**: Make sure you understand the task.
|
163
|
+
2. **Algorithm/Method Selection**: Decide on the most efficient way.
|
164
|
+
3. **Pseudocode Creation**: Write down the steps you will follow in pseudocode.
|
165
|
+
4. **Code Generation**: Translate your pseudocode into executable Python code. Ensure you use correct arguments, remember coordinates are always returned normalized from `vision_agent.tools`. All images from `vision_agent.tools` are in RGB format, red is (255, 0, 0) and blue is (0, 0, 255).
|
166
|
+
"""
|
167
|
+
|
168
|
+
TEST = """
|
169
|
+
**Role**: As a tester, your task is to create comprehensive test cases for the provided code. These test cases should encompass Basic and Edge case scenarios to ensure the code's robustness and reliability if possible.
|
170
|
+
|
171
|
+
**Documentation**:
|
172
|
+
This is the documentation for the functions you have access to. You may call any of these functions to help you complete the task. They are available through importing `from vision_agent.tools import *`. You do not need to test these functions. Test only the code provided by the user.
|
173
|
+
|
174
|
+
{docstring}
|
175
|
+
|
176
|
+
**User Instructions**:
|
177
|
+
{question}
|
178
|
+
|
179
|
+
**Input Code Snippet**:
|
180
|
+
```python
|
181
|
+
### Please decided how would you want to generate test cases. Based on incomplete code or completed version.
|
182
|
+
{code}
|
183
|
+
```
|
184
|
+
|
185
|
+
**Instructions**:
|
186
|
+
1. Verify the fundamental functionality under normal conditions.
|
187
|
+
2. Ensure each test case is well-documented with comments explaining the scenario it covers.
|
188
|
+
3. DO NOT use any files that are not provided by the user's instructions, your test must be run and will crash if it tries to load a non-existent file.
|
189
|
+
4. DO NOT mock any functions, you must test their functionality as is.
|
190
|
+
|
191
|
+
You should format your test cases at the end of your response wrapped in ```python ``` tags like in the following example:
|
192
|
+
```python
|
193
|
+
# You can run assertions to ensure the function is working as expected
|
194
|
+
assert function(input) == expected_output, "Test case description"
|
195
|
+
|
196
|
+
# You can simply call the function to ensure it runs
|
197
|
+
function(input)
|
198
|
+
|
199
|
+
# Or you can visualize the output
|
200
|
+
output = function(input)
|
201
|
+
visualize(output)
|
202
|
+
```
|
203
|
+
|
204
|
+
**Examples**:
|
205
|
+
## Prompt 1:
|
206
|
+
```python
|
207
|
+
def detect_cats_and_dogs(image_path: str) -> Dict[str, List[List[float]]]:
|
208
|
+
\""" Detects cats and dogs in an image. Returns a dictionary with
|
209
|
+
{{
|
210
|
+
"cats": [[x1, y1, x2, y2], ...], "dogs": [[x1, y1, x2, y2], ...]
|
211
|
+
}}
|
212
|
+
\"""
|
213
|
+
```
|
214
|
+
|
215
|
+
## Completion 1:
|
216
|
+
```python
|
217
|
+
# We can test to ensure the output has the correct structure but we cannot test the
|
218
|
+
# content of the output without knowing the image. We can test on "image.jpg" because
|
219
|
+
# it is provided by the user so we know it exists.
|
220
|
+
output = detect_cats_and_dogs("image.jpg")
|
221
|
+
assert "cats" in output, "The output should contain 'cats'
|
222
|
+
assert "dogs" in output, "The output should contain 'dogs'
|
223
|
+
```
|
224
|
+
|
225
|
+
## Prompt 2:
|
226
|
+
```python
|
227
|
+
def find_text(image_path: str, text: str) -> str:
|
228
|
+
\""" Finds the text in the image and returns the text. \"""
|
229
|
+
|
230
|
+
## Completion 2:
|
231
|
+
```python
|
232
|
+
# Because we do not know ahead of time what text is in the image, we can only run the
|
233
|
+
# code and print the results. We can test on "image.jpg" because it is provided by the
|
234
|
+
# user so we know it exists.
|
235
|
+
found_text = find_text("image.jpg", "Hello World")
|
236
|
+
print(found_text)
|
237
|
+
```
|
238
|
+
"""
|
239
|
+
|
240
|
+
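TEST asks for the test cases to come back "wrapped in ```python ``` tags", which implies the caller extracts the fenced block from the reply before executing it. An assumed sketch of that extraction step, not taken from the package:

```python
import re


def extract_python_block(reply: str) -> str:
    """Return the contents of the last ```python ...``` fence in an LLM reply.

    Hypothetical helper for illustration only; vision-agent's own extraction
    code may behave differently.
    """
    blocks = re.findall(r"```python\s*(.*?)```", reply, re.DOTALL)
    if not blocks:
        raise ValueError("no ```python``` block found in reply")
    return blocks[-1].strip()


# Invented reply in the shape the TEST prompt requests.
reply = "Here are the tests:\n```python\noutput = find_text(\"image.jpg\", \"Hello World\")\nprint(output)\n```"
print(extract_python_block(reply))
```
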
SIMPLE_TEST = """
|
241
|
+
**Role**: As a tester, your task is to create a simple test case for the provided code. This test case should verify the fundamental functionality under normal conditions.
|
242
|
+
|
243
|
+
**Documentation**:
|
244
|
+
This is the documentation for the functions you have access to. You may call any of these functions to help you complete the task. They are available through importing `from vision_agent.tools import *`. You do not need to test these functions, only the code provided by the user.
|
245
|
+
|
246
|
+
{docstring}
|
247
|
+
|
248
|
+
**User Instructions**:
|
249
|
+
{question}
|
250
|
+
|
251
|
+
**Input Code Snippet**:
|
252
|
+
```python
|
253
|
+
### Please decide how would you want to generate test cases. Based on incomplete code or completed version.
|
254
|
+
{code}
|
255
|
+
```
|
256
|
+
|
257
|
+
**Previous Feedback**:
|
258
|
+
{feedback}
|
259
|
+
|
260
|
+
**Instructions**:
|
261
|
+
1. Verify the fundamental functionality under normal conditions.
|
262
|
+
2. Ensure each test case is well-documented with comments explaining the scenario it covers.
|
263
|
+
3. Your test case MUST run only on the given images which are {media}
|
264
|
+
4. Your test case MUST run only with the given values which is available in the question - {question}
|
265
|
+
5. DO NOT use any non-existent or dummy image or video files that are not provided by the user's instructions.
|
266
|
+
6. DO NOT mock any functions, you must test their functionality as is.
|
267
|
+
7. DO NOT assert the output value, run the code and assert only the output format or data structure.
|
268
|
+
8. DO NOT use try except block to handle the error, let the error be raised if the code is incorrect.
|
269
|
+
9. DO NOT import the testing function as it will available in the testing environment.
|
270
|
+
10. Print the output of the function that is being tested.
|
271
|
+
11. Use the output of the function that is being tested as the return value of the testing function.
|
272
|
+
12. Run the testing function in the end and don't assign a variable to its output.
|
273
|
+
"""
|
274
|
+
|
275
|
+
|
276
|
+
FIX_BUG = """
|
277
|
+
**Role** As a coder, your job is to find the error in the code and fix it. You are running in a notebook setting so you can run !pip install to install missing packages.
|
278
|
+
|
279
|
+
**Instructions**:
|
280
|
+
Please re-complete the code to fix the error message. Here is the previous version:
|
281
|
+
```python
|
282
|
+
{code}
|
283
|
+
```
|
284
|
+
|
285
|
+
When we run this test code:
|
286
|
+
```python
|
287
|
+
{tests}
|
288
|
+
```
|
289
|
+
|
290
|
+
It raises this error:
|
291
|
+
{result}
|
292
|
+
|
293
|
+
This is previous feedback provided on the code:
|
294
|
+
{feedback}
|
295
|
+
|
296
|
+
Please fix the bug by follow the error information and return a JSON object with the following format:
|
297
|
+
{{
|
298
|
+
"reflections": str # any thoughts you have about the bug and how you fixed it
|
299
|
+
"code": str # the fixed code if any, else an empty string
|
300
|
+
"test": str # the fixed test code if any, else an empty string
|
301
|
+
}}
|
302
|
+
"""
|
303
|
+
|
304
|
+
|
305
|
+
REFLECT = """
|
306
|
+
**Role**: You are a reflection agent. Your job is to look at the original user request and the code produced and determine if the code satisfies the user's request. If it does not, you must provide feedback on how to improve the code. You are concerned only if the code meets the user request, not if the code is good or bad.
|
307
|
+
|
308
|
+
**Context**:
|
309
|
+
{context}
|
310
|
+
|
311
|
+
**Plan**:
|
312
|
+
{plan}
|
313
|
+
|
314
|
+
**Code**:
|
315
|
+
{code}
|
316
|
+
|
317
|
+
**Instructions**:
|
318
|
+
1. **Understand the User Request**: Read the user request and understand what the user is asking for.
|
319
|
+
2. **Review the Plan**: Check the plan to see if it is a viable approach to solving the user request.
|
320
|
+
3. **Review the Code**: Check the code to see if it solves the user request.
|
321
|
+
4. DO NOT add any reflections for test cases, these are taken care of.
|
322
|
+
|
323
|
+
Respond in JSON format with the following structure:
|
324
|
+
{{
|
325
|
+
"feedback": str # the feedback you would give to the coder and tester
|
326
|
+
"success": bool # whether the code and tests meet the user request
|
327
|
+
}}
|
328
|
+
"""
|
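Taken together, the constants suggest a plan → test-tools → pick-plan → code → test → fix/reflect loop, presumably driven by the new `vision_agent/agent/vision_agent_coder.py` (+897 lines, not shown in this hunk). The sketch below is a rough, hypothetical illustration of how the templates chain, inferred only from their names and placeholders; `llm` stands in for whatever chat model the agent calls, and the wiring of the chosen plan into the CODE prompt is a guess:

```python
# Hypothetical orchestration only -- the real loop in vision_agent_coder.py is not
# reproduced here and may differ substantially.
from typing import Tuple

from vision_agent.agent.vision_agent_coder_prompts import (
    CODE, PICK_PLAN, PLAN, SIMPLE_TEST,
)


def run_coder(llm, user_request: str, tool_desc: str, docstring: str, media: list) -> Tuple[str, str]:
    feedback = ""  # would accumulate FEEDBACK-formatted notes across retries
    plans = llm(PLAN.format(context=user_request, tool_desc=tool_desc, feedback=feedback))
    choice = llm(PICK_PLAN.format(context=user_request, plans=plans, tool_output=""))
    code = llm(CODE.format(
        docstring=docstring,
        question=user_request + "\n\nChosen plan:\n" + choice,  # guessed wiring
        tool_output="",
        feedback=feedback,
    ))
    tests = llm(SIMPLE_TEST.format(
        docstring=docstring, question=user_request, code=code,
        feedback=feedback, media=media,
    ))
    # A real driver would execute `tests`, and on failure format FIX_BUG (and REFLECT)
    # with the error output and loop; that part is omitted from this sketch.
    return code, tests
```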