vision-agent 1.0.4__py3-none-any.whl → 1.0.7__py3-none-any.whl
This diff shows the changes between two publicly released versions of the package, as published to a supported registry. It is provided for informational purposes only.
- vision_agent/.sim_tools/df.csv +46 -47
- vision_agent/.sim_tools/embs.npy +0 -0
- vision_agent/agent/__init__.py +0 -16
- vision_agent/agent/vision_agent_planner_prompts_v2.py +57 -58
- vision_agent/agent/vision_agent_planner_v2.py +3 -2
- vision_agent/configs/anthropic_config.py +29 -16
- vision_agent/configs/config.py +14 -15
- vision_agent/configs/openai_config.py +10 -10
- vision_agent/lmm/lmm.py +2 -2
- vision_agent/tools/__init__.py +0 -6
- vision_agent/tools/meta_tools.py +1 -492
- vision_agent/tools/planner_tools.py +13 -14
- vision_agent/tools/tools.py +16 -27
- {vision_agent-1.0.4.dist-info → vision_agent-1.0.7.dist-info}/METADATA +31 -3
- {vision_agent-1.0.4.dist-info → vision_agent-1.0.7.dist-info}/RECORD +17 -24
- vision_agent/agent/vision_agent.py +0 -605
- vision_agent/agent/vision_agent_coder.py +0 -742
- vision_agent/agent/vision_agent_coder_prompts.py +0 -290
- vision_agent/agent/vision_agent_planner.py +0 -564
- vision_agent/agent/vision_agent_planner_prompts.py +0 -199
- vision_agent/agent/vision_agent_prompts.py +0 -312
- vision_agent/configs/anthropic_openai_config.py +0 -164
- {vision_agent-1.0.4.dist-info → vision_agent-1.0.7.dist-info}/LICENSE +0 -0
- {vision_agent-1.0.4.dist-info → vision_agent-1.0.7.dist-info}/WHEEL +0 -0
vision_agent/agent/vision_agent_coder_prompts.py (deleted)
@@ -1,290 +0,0 @@
-FULL_TASK = """
-## User Request
-{user_request}
-
-## Subtasks
-{subtasks}
-"""
-
-FEEDBACK = """
-## This contains code and feedback from previous runs and is used for providing context so you do not make the same mistake again.
-
-{feedback}
-"""
-
-
-CODE = """
-**Role**: You are a software programmer.
-
-**Task**: As a programmer, you are required to complete the function. Use a Chain-of-Thought approach to break down the problem, create pseudocode, and then write the code in Python language. Ensure that your code is efficient, readable, and well-commented. Return the requested information from the function you create. Do not call your code, a test will be run after the code is submitted.
-
-**Documentation**:
-This is the documentation for the functions you have access to. You may call any of these functions to help you complete the task. They are available through importing `from vision_agent.tools import *`.
-
-{docstring}
-
-**User Instructions**:
-{question}
-
-**Tool Tests and Outputs**:
-{tool_output}
-
-**Tool Output Thoughts**:
-{plan_thoughts}
-
-**Previous Feedback**:
-{feedback}
-
-**Examples**:
---- EXAMPLE1 ---
-**User Instructions**:
-
-## User Request
-Can you write a program to check if each person is wearing a helmet? First detect all the people in the image, then detect the helmets, check whether or not a person is wearing a helmet if the helmet is on the worker. Return a dictionary with the count of people with helmets and people without helmets. Media name worker_helmets.webp
-
-## Subtasks
-
-This plan uses the owlv2_object_detection tool to detect both people and helmets in a single pass, which should be efficient and accurate. We can then compare the detections to determine if each person is wearing a helmet.
--Use owlv2_object_detection with prompt 'person, helmet' to detect both people and helmets in the image
--Process the detections to match helmets with people based on bounding box proximity
--Count people with and without helmets based on the matching results
--Return a dictionary with the counts
-
-
-**Tool Tests and Outputs**:
-After examining the image, I can see 4 workers in total, with 3 wearing yellow safety helmets and 1 not wearing a helmet. Plan 1 using owlv2_object_detection seems to be the most accurate in detecting both people and helmets. However, it needs some modifications to improve accuracy. We should increase the confidence threshold to 0.15 to filter out the lowest confidence box, and implement logic to associate helmets with people based on their bounding box positions. Plan 2 and Plan 3 seem less reliable given the tool outputs, as they either failed to distinguish between people with and without helmets or misclassified all workers as not wearing helmets.
-
-**Tool Output Thoughts**:
-```python
-...
-```
------ stdout -----
-Plan 1 - owlv2_object_detection:
-
-[{{'label': 'helmet', 'score': 0.15, 'bbox': [0.85, 0.41, 0.87, 0.45]}}, {{'label': 'helmet', 'score': 0.3, 'bbox': [0.8, 0.43, 0.81, 0.46]}}, {{'label': 'helmet', 'score': 0.31, 'bbox': [0.85, 0.45, 0.86, 0.46]}}, {{'label': 'person', 'score': 0.31, 'bbox': [0.84, 0.45, 0.88, 0.58]}}, {{'label': 'person', 'score': 0.31, 'bbox': [0.78, 0.43, 0.82, 0.57]}}, {{'label': 'helmet', 'score': 0.33, 'bbox': [0.3, 0.65, 0.32, 0.67]}}, {{'label': 'person', 'score': 0.29, 'bbox': [0.28, 0.65, 0.36, 0.84]}}, {{'label': 'helmet', 'score': 0.29, 'bbox': [0.13, 0.82, 0.15, 0.85]}}, {{'label': 'person', 'score': 0.3, 'bbox': [0.1, 0.82, 0.24, 1.0]}}]
-
-...
-
-**Input Code Snippet**:
-```python
-from vision_agent.tools import load_image, owlv2_object_detection
-
-def check_helmets(image_path):
-    image = load_image(image_path)
-    # Detect people and helmets, filter out the lowest confidence helmet score of 0.15
-    detections = owlv2_object_detection("person, helmet", image, box_threshold=0.15)
-    height, width = image.shape[:2]
-
-    # Separate people and helmets
-    people = [d for d in detections if d['label'] == 'person']
-    helmets = [d for d in detections if d['label'] == 'helmet']
-
-    people_with_helmets = 0
-    people_without_helmets = 0
-
-    for person in people:
-        person_x = (person['bbox'][0] + person['bbox'][2]) / 2
-        person_y = person['bbox'][1]  # Top of the bounding box
-
-        helmet_found = False
-        for helmet in helmets:
-            helmet_x = (helmet['bbox'][0] + helmet['bbox'][2]) / 2
-            helmet_y = (helmet['bbox'][1] + helmet['bbox'][3]) / 2
-
-            # Check if the helmet is within 20 pixels of the person's head. Unnormalize
-            # the coordinates so we can better compare them.
-            if (abs((helmet_x - person_x) * width) < 20 and
-                    -5 < ((helmet_y - person_y) * height) < 20):
-                helmet_found = True
-                break
-
-        if helmet_found:
-            people_with_helmets += 1
-        else:
-            people_without_helmets += 1
-
-    return {{
-        "people_with_helmets": people_with_helmets,
-        "people_without_helmets": people_without_helmets
-    }}
-```
---- END EXAMPLE1 ---
-
-**Instructions**:
-1. **Understand and Clarify**: Make sure you understand the task.
-2. **Algorithm/Method Selection**: Decide on the most efficient method, use the tool outputs and tool thoughts to guide you.
-3. **Pseudocode Creation**: Write down the steps you will follow in pseudocode.
-4. **Code Generation**: Translate your pseudocode into executable Python code.
-    4.1. Take in the media path as an argument and load with either `load_image` or `extract_frames_and_timestamps`.
-    4.2. Coordinates are always returned normalized from `vision_agent.tools`.
-    4.3. Do not create dummy input or functions, the code must be usable if the user provides new media.
-    4.4. Use unnormalized coordinates when comparing bounding boxes.
-"""
-
-TEST = """
-**Role**: As a tester, your task is to create comprehensive test cases for the provided code. These test cases should encompass Basic and Edge case scenarios to ensure the code's robustness and reliability if possible.
-
-**Documentation**:
-This is the documentation for the functions you have access to. You may call any of these functions to help you complete the task. They are available through importing `from vision_agent.tools import *`. You do not need to test these functions. Test only the code provided by the user.
-
-{docstring}
-
-**User Instructions**:
-{question}
-
-**Input Code Snippet**:
-```python
-### Please decide how you would want to generate test cases. Based on incomplete code or completed version.
-{code}
-```
-
-**Instructions**:
-1. Verify the fundamental functionality under normal conditions.
-2. Ensure each test case is well-documented with comments explaining the scenario it covers.
-3. DO NOT use any files that are not provided by the user's instructions, your test must be run and will crash if it tries to load a non-existent file.
-4. DO NOT mock any functions, you must test their functionality as is.
-
-You should format your test cases at the end of your response wrapped in ```python ``` tags like in the following example:
-```python
-# You can run assertions to ensure the function is working as expected
-assert function(input) == expected_output, "Test case description"
-
-# You can simply call the function to ensure it runs
-function(input)
-
-# Or you can visualize the output
-output = function(input)
-visualize(output)
-```
-
-**Examples**:
-## Prompt 1:
-```python
-def detect_cats_and_dogs(image_path: str) -> Dict[str, List[List[float]]]:
-    \""" Detects cats and dogs in an image. Returns a dictionary with
-    {{
-        "cats": [[x1, y1, x2, y2], ...], "dogs": [[x1, y1, x2, y2], ...]
-    }}
-    \"""
-```
-
-## Completion 1:
-```python
-# We can test to ensure the output has the correct structure but we cannot test the
-# content of the output without knowing the image. We can test on "image.jpg" because
-# it is provided by the user so we know it exists.
-output = detect_cats_and_dogs("image.jpg")
-assert "cats" in output, "The output should contain 'cats'"
-assert "dogs" in output, "The output should contain 'dogs'"
-```
-
-## Prompt 2:
-```python
-def find_text(image_path: str, text: str) -> str:
-    \""" Finds the text in the image and returns the text. \"""
-
-## Completion 2:
-```python
-# Because we do not know ahead of time what text is in the image, we can only run the
-# code and print the results. We can test on "image.jpg" because it is provided by the
-# user so we know it exists.
-found_text = find_text("image.jpg", "Hello World")
-print(found_text)
-```
-"""
-
-SIMPLE_TEST = """
-**Role**: As a tester, your task is to create a simple test case for the provided code. This test case should verify the fundamental functionality under normal conditions.
-
-**Documentation**:
-This is the documentation for the functions you have access to. You may call any of these functions to help you complete the task. They are available through importing `from vision_agent.tools import *`. You do not need to test these functions, only the code provided by the user.
-
-{docstring}
-
-**User Instructions**:
-{question}
-
-**Input Code Snippet**:
-```python
-### Please decide how you would want to generate test cases. Based on incomplete code or completed version.
-{code}
-```
-
-**Previous Feedback**:
-{feedback}
-
-**Instructions**:
-1. Verify the fundamental functionality under normal conditions.
-2. Ensure each test case is well-documented with comments explaining the scenario it covers.
-3. Your test case MUST run only on the given images which are {media}
-4. Your test case MUST run only with the given values which are available in the question - {question}
-5. DO NOT use any non-existent or dummy image or video files that are not provided by the user's instructions.
-6. DO NOT mock any functions, you must test their functionality as is.
-7. DO NOT assert the output value, run the code and assert only the output format or data structure.
-8. DO NOT use try except blocks to handle the error, let the error be raised if the code is incorrect.
-9. DO NOT import the testing function as it will be available in the testing environment.
-10. Print the output of the function that is being tested.
-11. Use the output of the function that is being tested as the return value of the testing function.
-12. Run the testing function in the end and don't assign a variable to its output.
-"""
-
-
-FIX_BUG = """
-**Role**: As a coder, your job is to find the error in the code and fix it. You are running in a notebook setting but do not run !pip install to install new packages.
-
-**Documentation**:
-This is the documentation for the functions you have access to. You may call any of these functions to help you complete the task. They are available through importing `from vision_agent.tools import *`.
-
-{docstring}
-
-**Instructions**:
-Please re-complete the code to fix the error message. Here is the current version of the CODE:
-<code>
-{code}
-</code>
-
-When we run the TEST code:
-<test>
-{tests}
-</test>
-
-It raises this error:
-<error>
-{result}
-</error>
-
-This is previous feedback provided on the code:
-{feedback}
-
-Please fix the bug by correcting the error. Return the thoughts you have about the bug and how you fixed it in <thoughts> tags, followed by the fixed CODE in <code> tags and the fixed TEST in <test> tags. For example:
-
-<thoughts>Your thoughts here...</thoughts>
-<code># your fixed code here</code>
-<test># your fixed test here</test>
-"""
-
-
-REFLECT = """
-**Role**: You are a reflection agent. Your job is to look at the original user request and the code produced and determine if the code satisfies the user's request. If it does not, you must provide feedback on how to improve the code. You are concerned only with whether the code meets the user request, not whether the code is good or bad.
-
-**Context**:
-{context}
-
-**Plan**:
-{plan}
-
-**Code**:
-{code}
-
-**Instructions**:
-1. **Understand the User Request**: Read the user request and understand what the user is asking for.
-2. **Review the Plan**: Check the plan to see if it is a viable approach to solving the user request.
-3. **Review the Code**: Check the code to see if it solves the user request.
-4. DO NOT add any reflections for test cases, these are taken care of.
-
-Respond in JSON format with the following structure:
-{{
-    "feedback": str # the feedback you would give to the coder and tester
-    "success": bool # whether the code and tests meet the user request
-}}
-"""
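The one-shot example in the deleted CODE prompt hinges on a single technique: converting normalized bounding-box coordinates to pixels before comparing them, then matching each person to a nearby helmet. A minimal standalone sketch of that matching logic is below; the function name `count_helmets`, the tolerances, and the sample detections are illustrative and not part of the package API, and only the geometry mirrors the prompt's example:

```python
# Sketch of the bbox-proximity matching from the deleted CODE prompt's example.
# Detections use normalized [x1, y1, x2, y2] boxes; width/height unnormalize them.

def count_helmets(detections, width, height, x_tol=20, y_min=-5, y_max=20):
    """Count people with/without helmets by pixel-space proximity."""
    people = [d for d in detections if d["label"] == "person"]
    helmets = [d for d in detections if d["label"] == "helmet"]
    with_h = without_h = 0
    for person in people:
        # Person reference point: horizontal center of the box, top edge (head).
        px = (person["bbox"][0] + person["bbox"][2]) / 2
        py = person["bbox"][1]
        found = False
        for helmet in helmets:
            hx = (helmet["bbox"][0] + helmet["bbox"][2]) / 2
            hy = (helmet["bbox"][1] + helmet["bbox"][3]) / 2
            # Unnormalize before comparing so the tolerances are in pixels.
            if abs((hx - px) * width) < x_tol and y_min < (hy - py) * height < y_max:
                found = True
                break
        if found:
            with_h += 1
        else:
            without_h += 1
    return {"people_with_helmets": with_h, "people_without_helmets": without_h}

# Hypothetical detections on a 1000x1000 image: one person with a helmet
# directly above their box's top edge, one person with no helmet nearby.
dets = [
    {"label": "person", "score": 0.9, "bbox": [0.10, 0.50, 0.20, 0.90]},
    {"label": "helmet", "score": 0.8, "bbox": [0.13, 0.49, 0.17, 0.53]},
    {"label": "person", "score": 0.9, "bbox": [0.60, 0.50, 0.70, 0.90]},
]
print(count_helmets(dets, 1000, 1000))
# → {'people_with_helmets': 1, 'people_without_helmets': 1}
```

Note the doubled braces (`{{`/`}}`) in the diff above are `str.format` escapes in the prompt template; the actual code an agent emits uses single braces as here.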