vision-agent 0.2.147__tar.gz → 0.2.148__tar.gz
- {vision_agent-0.2.147 → vision_agent-0.2.148}/PKG-INFO +1 -1
- {vision_agent-0.2.147 → vision_agent-0.2.148}/pyproject.toml +1 -1
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/agent/vision_agent_coder_prompts.py +90 -14
- {vision_agent-0.2.147 → vision_agent-0.2.148}/LICENSE +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/README.md +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/__init__.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/agent/__init__.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/agent/agent.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/agent/agent_utils.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/agent/vision_agent.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/agent/vision_agent_coder.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/agent/vision_agent_prompts.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/clients/__init__.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/clients/http.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/clients/landing_public_api.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/fonts/__init__.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/fonts/default_font_ch_en.ttf +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/lmm/__init__.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/lmm/lmm.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/lmm/types.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/tools/__init__.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/tools/meta_tools.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/tools/prompts.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/tools/tool_utils.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/tools/tools.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/tools/tools_types.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/utils/__init__.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/utils/exceptions.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/utils/execute.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/utils/image_utils.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/utils/sim.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/utils/type_defs.py +0 -0
- {vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/utils/video.py +0 -0
{vision_agent-0.2.147 → vision_agent-0.2.148}/vision_agent/agent/vision_agent_coder_prompts.py
RENAMED
@@ -67,14 +67,7 @@ This is the documentation for the functions you have access to. You may call any
 **Previous Attempts**:
 {previous_attempts}
 
-**
-1. Write a program to load the media and call each tool and print it's output along with other relevant information.
-2. Create a dictionary where the keys are the tool name and the values are the tool outputs. Remove numpy arrays from the printed dictionary.
-3. Your test case MUST run only on the given images which are {media}
-4. Print this final dictionary.
-5. For video input, sample at 1 FPS and use the first 10 frames only to reduce processing time.
-
-**Example**:
+**Examples**:
 --- EXAMPLE1 ---
 plan1:
 - Load the image from the provided file path 'image.jpg'.
@@ -100,6 +93,7 @@ cgd_out = countgd_counting(image)
 
 final_out = {{"owl_v2_image": owl_v2_out, "florence2_sam2_image": f2s2, "countgd_counting": cgd_out}}
 print(final_out)
+--- END EXAMPLE1 ---
 
 --- EXAMPLE2 ---
 plan1:
@@ -173,6 +167,14 @@ print(final_out)
 print(labels_and_scores)
 print(counts)
 ```
+--- END EXAMPLE2 ---
+
+**Instructions**:
+1. Write a program to load the media and call each tool and print it's output along with other relevant information.
+2. Create a dictionary where the keys are the tool name and the values are the tool outputs. Remove numpy arrays from the printed dictionary.
+3. Your test case MUST run only on the given images which are {media}
+4. Print this final dictionary.
+5. For video input, sample at 1 FPS and use the first 10 frames only to reduce processing time.
 """
 
 
@@ -224,11 +226,6 @@ This is the documentation for the functions you have access to. You may call any
 
 {docstring}
 
-**Input Code Snippet**:
-```python
-# Your code here
-```
-
 **User Instructions**:
 {question}
 
@@ -241,11 +238,90 @@ This is the documentation for the functions you have access to. You may call any
 **Previous Feedback**:
 {feedback}
 
+**Examples**:
+--- EXAMPLE1 ---
+**User Instructions**:
+
+## User Request
+Can you write a program to check if each person is wearing a helmet? First detect all the people in the image, then detect the helmets, check whether or not a person is wearing a helmet if the helmet is on the worker. Return a dictionary with the count of people with helments and people without helmets. Media name worker_helmets.webp
+
+## Subtasks
+
+This plan uses the owl_v2_image tool to detect both people and helmets in a single pass, which should be efficient and accurate. We can then compare the detections to determine if each person is wearing a helmet.
+-Use owl_v2_image with prompt 'person, helmet' to detect both people and helmets in the image
+-Process the detections to match helmets with people based on bounding box proximity
+-Count people with and without helmets based on the matching results
+-Return a dictionary with the counts
+
+
+**Tool Tests and Outputs**:
+After examining the image, I can see 4 workers in total, with 3 wearing yellow safety helmets and 1 not wearing a helmet. Plan 1 using owl_v2_image seems to be the most accurate in detecting both people and helmets. However, it needs some modifications to improve accuracy. We should increase the confidence threshold to 0.15 to filter out the lowest confidence box, and implement logic to associate helmets with people based on their bounding box positions. Plan 2 and Plan 3 seem less reliable given the tool outputs, as they either failed to distinguish between people with and without helmets or misclassified all workers as not wearing helmets.
+
+**Tool Output Thoughts**:
+```python
+...
+```
+----- stdout -----
+Plan 1 - owl_v2_image:
+
+[{{'label': 'helmet', 'score': 0.15, 'bbox': [0.85, 0.41, 0.87, 0.45]}}, {{'label': 'helmet', 'score': 0.3, 'bbox': [0.8, 0.43, 0.81, 0.46]}}, {{'label': 'helmet', 'score': 0.31, 'bbox': [0.85, 0.45, 0.86, 0.46]}}, {{'label': 'person', 'score': 0.31, 'bbox': [0.84, 0.45, 0.88, 0.58]}}, {{'label': 'person', 'score': 0.31, 'bbox': [0.78, 0.43, 0.82, 0.57]}}, {{'label': 'helmet', 'score': 0.33, 'bbox': [0.3, 0.65, 0.32, 0.67]}}, {{'label': 'person', 'score': 0.29, 'bbox': [0.28, 0.65, 0.36, 0.84]}}, {{'label': 'helmet', 'score': 0.29, 'bbox': [0.13, 0.82, 0.15, 0.85]}}, {{'label': 'person', 'score': 0.3, 'bbox': [0.1, 0.82, 0.24, 1.0]}}]
+
+...
+
+**Input Code Snippet**:
+```python
+from vision_agent.tools import load_image, owl_v2_image
+
+def check_helmets(image_path):
+    image = load_image(image_path)
+    # Detect people and helmets, filter out the lowest confidence helmet score of 0.15
+    detections = owl_v2_image("person, helmet", image, box_threshold=0.15)
+    height, width = image.shape[:2]
+
+    # Separate people and helmets
+    people = [d for d in detections if d['label'] == 'person']
+    helmets = [d for d in detections if d['label'] == 'helmet']
+
+    people_with_helmets = 0
+    people_without_helmets = 0
+
+    for person in people:
+        person_x = (person['bbox'][0] + person['bbox'][2]) / 2
+        person_y = person['bbox'][1]  # Top of the bounding box
+
+        helmet_found = False
+        for helmet in helmets:
+            helmet_x = (helmet['bbox'][0] + helmet['bbox'][2]) / 2
+            helmet_y = (helmet['bbox'][1] + helmet['bbox'][3]) / 2
+
+            # Check if the helmet is within 20 pixels of the person's head. Unnormalize
+            # the coordinates so we can better compare them.
+            if (abs((helmet_x - person_x) * width) < 20 and
+                    -5 < ((helmet_y - person_y) * height) < 20):
+                helmet_found = True
+                break
+
+        if helmet_found:
+            people_with_helmets += 1
+        else:
+            people_without_helmets += 1
+
+    return {{
+        "people_with_helmets": people_with_helmets,
+        "people_without_helmets": people_without_helmets
+    }}
+```
+--- END EXAMPLE1 ---
+
 **Instructions**:
 1. **Understand and Clarify**: Make sure you understand the task.
 2. **Algorithm/Method Selection**: Decide on the most efficient method, use the tool outputs and tool thoughts to guide you.
 3. **Pseudocode Creation**: Write down the steps you will follow in pseudocode.
-4. **Code Generation**: Translate your pseudocode into executable Python code.
+4. **Code Generation**: Translate your pseudocode into executable Python code.
+4.1. Take in the media path as an argument and load with either `load_image` or `extract_frames_and_timestamps`.
+4.2. Coordinates are always returned normalized from `vision_agent.tools`.
+4.3. Do not create dummy input or functions, the code must be usable if the user provides new media.
+4.4. Use unnormalized coordinates when comparing bounding boxes.
 """
 
 TEST = """
File without changes
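Instructions 4.2 and 4.4 added to the prompt in this release (normalized coordinates in, unnormalized coordinates for comparison) can be illustrated with a minimal sketch. The helper names `unnormalize_bbox` and `helmet_on_person` are hypothetical and are not part of `vision_agent`. The sketch assumes boxes are `[x_min, y_min, x_max, y_max]` in the 0 to 1 range, as in the example output in the diff, and it reuses the 20-pixel and -5-pixel tolerances from the added `check_helmets` example. The 1000x1000 image size in the usage lines is an assumption for illustration only.

```python
from typing import List, Sequence


def unnormalize_bbox(bbox: Sequence[float], width: int, height: int) -> List[float]:
    """Scale a normalized [x_min, y_min, x_max, y_max] box to pixel units."""
    x_min, y_min, x_max, y_max = bbox
    return [x_min * width, y_min * height, x_max * width, y_max * height]


def helmet_on_person(
    person_bbox: Sequence[float],
    helmet_bbox: Sequence[float],
    width: int,
    height: int,
    tol_px: float = 20.0,
) -> bool:
    """Return True if the helmet center lies near the top-center of the person box.

    The comparison is done in pixels, so the tolerance means the same thing
    regardless of the image's aspect ratio.
    """
    px0, py0, px1, _ = unnormalize_bbox(person_bbox, width, height)
    hx0, hy0, hx1, hy1 = unnormalize_bbox(helmet_bbox, width, height)

    person_x = (px0 + px1) / 2   # horizontal center of the person
    helmet_x = (hx0 + hx1) / 2   # horizontal center of the helmet
    helmet_y = (hy0 + hy1) / 2   # vertical center of the helmet

    # py0 is the top edge of the person box, where the head is expected.
    return abs(helmet_x - person_x) < tol_px and -5 < (helmet_y - py0) < tol_px


# Usage with two detections copied from the example output in the diff,
# on a hypothetical 1000x1000 image.
person = [0.84, 0.45, 0.88, 0.58]
near_helmet = [0.85, 0.45, 0.86, 0.46]
far_helmet = [0.3, 0.65, 0.32, 0.67]
print(helmet_on_person(person, near_helmet, 1000, 1000))
print(helmet_on_person(person, far_helmet, 1000, 1000))
```

Comparing in pixel units rather than normalized units keeps a fixed tolerance meaningful when width and height differ, which appears to be the motivation for instruction 4.4.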