vision-agent 0.0.33__tar.gz → 0.0.35__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (27)
  1. vision_agent-0.0.35/PKG-INFO +130 -0
  2. vision_agent-0.0.35/README.md +104 -0
  3. {vision_agent-0.0.33 → vision_agent-0.0.35}/pyproject.toml +1 -1
  4. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/agent/easytool.py +2 -2
  5. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/agent/reflexion.py +4 -4
  6. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/agent/vision_agent.py +1 -1
  7. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/tools/tools.py +17 -17
  8. vision_agent-0.0.33/PKG-INFO +0 -100
  9. vision_agent-0.0.33/README.md +0 -74
  10. {vision_agent-0.0.33 → vision_agent-0.0.35}/LICENSE +0 -0
  11. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/__init__.py +0 -0
  12. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/agent/__init__.py +0 -0
  13. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/agent/agent.py +0 -0
  14. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/agent/easytool_prompts.py +0 -0
  15. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/agent/reflexion_prompts.py +0 -0
  16. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/agent/vision_agent_prompts.py +0 -0
  17. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/data/__init__.py +0 -0
  18. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/data/data.py +0 -0
  19. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/emb/__init__.py +0 -0
  20. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/emb/emb.py +0 -0
  21. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/image_utils.py +0 -0
  22. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/llm/__init__.py +0 -0
  23. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/llm/llm.py +0 -0
  24. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/lmm/__init__.py +0 -0
  25. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/lmm/lmm.py +0 -0
  26. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/tools/__init__.py +0 -0
  27. {vision_agent-0.0.33 → vision_agent-0.0.35}/vision_agent/tools/prompts.py +0 -0
@@ -0,0 +1,130 @@
+ Metadata-Version: 2.1
+ Name: vision-agent
+ Version: 0.0.35
+ Summary: Toolset for Vision Agent
+ Author: Landing AI
+ Author-email: dev@landing.ai
+ Requires-Python: >=3.10,<3.12
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Requires-Dist: faiss-cpu (>=1.0.0,<2.0.0)
+ Requires-Dist: numpy (>=1.21.0,<2.0.0)
+ Requires-Dist: openai (>=1.0.0,<2.0.0)
+ Requires-Dist: pandas (>=2.0.0,<3.0.0)
+ Requires-Dist: pillow (>=10.0.0,<11.0.0)
+ Requires-Dist: requests (>=2.0.0,<3.0.0)
+ Requires-Dist: sentence-transformers (>=2.0.0,<3.0.0)
+ Requires-Dist: torch (>=2.1.0,<2.2.0)
+ Requires-Dist: tqdm (>=4.64.0,<5.0.0)
+ Requires-Dist: typing_extensions (>=4.0.0,<5.0.0)
+ Project-URL: Homepage, https://landing.ai
+ Project-URL: documentation, https://github.com/landing-ai/vision-agent
+ Project-URL: repository, https://github.com/landing-ai/vision-agent
+ Description-Content-Type: text/markdown
+
+ <div align="center">
+ <img alt="vision_agent" height="200px" src="https://github.com/landing-ai/vision-agent/blob/main/assets/logo.jpg?raw=true">
+
+ # 🔍🤖 Vision Agent
+
+ [![](https://dcbadge.vercel.app/api/server/wPdN8RCYew?compact=true&style=flat)](https://discord.gg/wPdN8RCYew)
+ ![ci_status](https://github.com/landing-ai/vision-agent/actions/workflows/ci_cd.yml/badge.svg)
+ [![PyPI version](https://badge.fury.io/py/vision-agent.svg)](https://badge.fury.io/py/vision-agent)
+ ![version](https://img.shields.io/pypi/pyversions/vision-agent)
+ </div>
+
+ Vision Agent is a library that helps you utilize agent frameworks for your vision tasks.
+ Many current vision problems can easily take hours or days to solve, you need to find the
+ right model, figure out how to use it, possibly write programming logic around it to
+ accomplish the task you want or even more expensive, train your own model. Vision Agent
+ aims to provide an in-seconds experience by allowing users to describe their problem in
+ text and utilizing agent frameworks to solve the task for them. Check out our discord
+ for updates and roadmaps!
+
+ ## Getting Started
+ ### Installation
+ To get started, you can install the library using pip:
+
+ ```bash
+ pip install vision-agent
+ ```
+
+ Ensure you have an OpenAI API key and set it as an environment variable:
+
+ ```bash
+ export OPENAI_API_KEY="your-api-key"
+ ```
+
+ ### Vision Agents
+ You can interact with the agents as you would with any LLM or LMM model:
+
+ ```python
+ >>> import vision_agent as va
+ >>> agent = VisionAgent()
+ >>> agent("How many apples are in this image?", image="apples.jpg")
+ "There are 2 apples in the image."
+ ```
+
+ To better understand how the model came up with it's answer, you can also run it in
+ debug mode by passing in the verbose argument:
+
+ ```python
+ >>> agent = VisionAgent(verbose=True)
+ ```
+
+ You can also have it return the workflow it used to complete the task along with all
+ the individual steps and tools to get the answer:
+
+ ```python
+ >>> resp, workflow = agent.chat_with_workflow([{"role": "user", "content": "How many apples are in this image?"}], image="apples.jpg")
+ >>> print(workflow)
+ [{"task": "Count the number of apples using 'grounding_dino_'.",
+ "tool": "grounding_dino_",
+ "parameters": {"prompt": "apple", "image": "apples.jpg"},
+ "call_results": [[
+ {
+ "labels": ["apple", "apple"],
+ "scores": [0.99, 0.95],
+ "bboxes": [
+ [0.58, 0.2, 0.72, 0.45],
+ [0.94, 0.57, 0.98, 0.66],
+ ]
+ }
+ ]],
+ "answer": "There are 2 apples in the image.",
+ }]
+ ```
+
+ ### Tools
+ There are a variety of tools for the model or the user to use. Some are executed locally
+ while others are hosted for you. You can also ask an LLM directly to build a tool for
+ you. For example:
+
+ ```python
+ >>> import vision_agent as va
+ >>> llm = va.llm.OpenAILLM()
+ >>> detector = llm.generate_detector("Can you build an apple detector for me?")
+ >>> detector("apples.jpg")
+ [{"labels": ["apple", "apple"],
+ "scores": [0.99, 0.95],
+ "bboxes": [
+ [0.58, 0.2, 0.72, 0.45],
+ [0.94, 0.57, 0.98, 0.66],
+ ]
+ }]
+ ```
+
+ | Tool | Description |
+ | --- | --- |
+ | CLIP | CLIP is a tool that can classify or tag any image given a set of input classes or tags. |
+ | GroundingDINO | GroundingDINO is a tool that can detect arbitrary objects with inputs such as category names or referring expressions. |
+ | GroundingSAM | GroundingSAM is a tool that can detect and segment arbitrary objects with inputs such as category names or referring expressions. |
+ | Counter | Counter detects and counts the number of objects in an image given an input such as a category name or referring expression. |
+ | Crop | Crop crops an image given a bounding box and returns a file name of the cropped image. |
+ | BboxArea | BboxArea returns the area of the bounding box in pixels normalized to 2 decimal places. |
+ | SegArea | SegArea returns the area of the segmentation mask in pixels normalized to 2 decimal places. |
+
+
+ It also has a basic set of calculate tools such as add, subtract, multiply and divide.
+
@@ -0,0 +1,104 @@
+ <div align="center">
+ <img alt="vision_agent" height="200px" src="https://github.com/landing-ai/vision-agent/blob/main/assets/logo.jpg?raw=true">
+
+ # 🔍🤖 Vision Agent
+
+ [![](https://dcbadge.vercel.app/api/server/wPdN8RCYew?compact=true&style=flat)](https://discord.gg/wPdN8RCYew)
+ ![ci_status](https://github.com/landing-ai/vision-agent/actions/workflows/ci_cd.yml/badge.svg)
+ [![PyPI version](https://badge.fury.io/py/vision-agent.svg)](https://badge.fury.io/py/vision-agent)
+ ![version](https://img.shields.io/pypi/pyversions/vision-agent)
+ </div>
+
+ Vision Agent is a library that helps you utilize agent frameworks for your vision tasks.
+ Many current vision problems can easily take hours or days to solve, you need to find the
+ right model, figure out how to use it, possibly write programming logic around it to
+ accomplish the task you want or even more expensive, train your own model. Vision Agent
+ aims to provide an in-seconds experience by allowing users to describe their problem in
+ text and utilizing agent frameworks to solve the task for them. Check out our discord
+ for updates and roadmaps!
+
+ ## Getting Started
+ ### Installation
+ To get started, you can install the library using pip:
+
+ ```bash
+ pip install vision-agent
+ ```
+
+ Ensure you have an OpenAI API key and set it as an environment variable:
+
+ ```bash
+ export OPENAI_API_KEY="your-api-key"
+ ```
+
+ ### Vision Agents
+ You can interact with the agents as you would with any LLM or LMM model:
+
+ ```python
+ >>> import vision_agent as va
+ >>> agent = VisionAgent()
+ >>> agent("How many apples are in this image?", image="apples.jpg")
+ "There are 2 apples in the image."
+ ```
+
+ To better understand how the model came up with it's answer, you can also run it in
+ debug mode by passing in the verbose argument:
+
+ ```python
+ >>> agent = VisionAgent(verbose=True)
+ ```
+
+ You can also have it return the workflow it used to complete the task along with all
+ the individual steps and tools to get the answer:
+
+ ```python
+ >>> resp, workflow = agent.chat_with_workflow([{"role": "user", "content": "How many apples are in this image?"}], image="apples.jpg")
+ >>> print(workflow)
+ [{"task": "Count the number of apples using 'grounding_dino_'.",
+ "tool": "grounding_dino_",
+ "parameters": {"prompt": "apple", "image": "apples.jpg"},
+ "call_results": [[
+ {
+ "labels": ["apple", "apple"],
+ "scores": [0.99, 0.95],
+ "bboxes": [
+ [0.58, 0.2, 0.72, 0.45],
+ [0.94, 0.57, 0.98, 0.66],
+ ]
+ }
+ ]],
+ "answer": "There are 2 apples in the image.",
+ }]
+ ```
+
+ ### Tools
+ There are a variety of tools for the model or the user to use. Some are executed locally
+ while others are hosted for you. You can also ask an LLM directly to build a tool for
+ you. For example:
+
+ ```python
+ >>> import vision_agent as va
+ >>> llm = va.llm.OpenAILLM()
+ >>> detector = llm.generate_detector("Can you build an apple detector for me?")
+ >>> detector("apples.jpg")
+ [{"labels": ["apple", "apple"],
+ "scores": [0.99, 0.95],
+ "bboxes": [
+ [0.58, 0.2, 0.72, 0.45],
+ [0.94, 0.57, 0.98, 0.66],
+ ]
+ }]
+ ```
+
+ | Tool | Description |
+ | --- | --- |
+ | CLIP | CLIP is a tool that can classify or tag any image given a set of input classes or tags. |
+ | GroundingDINO | GroundingDINO is a tool that can detect arbitrary objects with inputs such as category names or referring expressions. |
+ | GroundingSAM | GroundingSAM is a tool that can detect and segment arbitrary objects with inputs such as category names or referring expressions. |
+ | Counter | Counter detects and counts the number of objects in an image given an input such as a category name or referring expression. |
+ | Crop | Crop crops an image given a bounding box and returns a file name of the cropped image. |
+ | BboxArea | BboxArea returns the area of the bounding box in pixels normalized to 2 decimal places. |
+ | SegArea | SegArea returns the area of the segmentation mask in pixels normalized to 2 decimal places. |
+
+
+ It also has a basic set of calculate tools such as add, subtract, multiply and divide.
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"
 
  [tool.poetry]
  name = "vision-agent"
- version = "0.0.33"
+ version = "0.0.35"
  description = "Toolset for Vision Agent"
  authors = ["Landing AI <dev@landing.ai>"]
  readme = "README.md"
@@ -246,10 +246,10 @@ class EasyTool(Agent):
  >>> agent = EasyTool()
  >>> resp = agent("If a car is traveling at 64 km/h, how many kilometers does it travel in 29 minutes?")
  >>> print(resp)
- >>> "It will travel approximately 31.03 kilometers in 29 minutes."
+ "It will travel approximately 31.03 kilometers in 29 minutes."
  >>> resp = agent("How many cards are in this image?", image="cards.jpg")
  >>> print(resp)
- >>> "There are 2 cards in this image."
+ "There are 2 cards in this image."
  """
 
  def __init__(
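The hunk above only drops the stray `>>>` prefixes from the expected outputs in the `EasyTool` docstring, but that docstring also documents the calling convention. Below is a minimal sketch of that usage, assuming `EasyTool` can be imported from the `vision_agent.agent.easytool` module listed above and that `OPENAI_API_KEY` is set in the environment; the printed answers are illustrative, not guaranteed outputs.

```python
from vision_agent.agent.easytool import EasyTool  # module path taken from the file list above

agent = EasyTool()

# Text-only question: the agent plans tool calls (e.g. the calculate tools)
# and returns a natural-language answer.
resp = agent("If a car is traveling at 64 km/h, how many kilometers does it travel in 29 minutes?")
print(resp)  # e.g. "It will travel approximately 31.03 kilometers in 29 minutes."

# Vision question: pass an image path alongside the prompt.
resp = agent("How many cards are in this image?", image="cards.jpg")
print(resp)  # e.g. "There are 2 cards in this image."
```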
@@ -74,14 +74,14 @@ class Reflexion(Agent):
  >>> question = "How many tires does a truck have?"
  >>> resp = agent(question)
  >>> print(resp)
- >>> "18"
+ "18"
  >>> resp = agent([
  >>> {"role": "user", "content": question},
  >>> {"role": "assistant", "content": resp},
  >>> {"role": "user", "content": "No I mean those regular trucks but where the back tires are double."}
  >>> ])
  >>> print(resp)
- >>> "6"
+ "6"
  >>> agent = Reflexion(
  >>> self_reflect_model=va.lmm.OpenAILMM(),
  >>> action_agent=va.lmm.OpenAILMM()
@@ -89,14 +89,14 @@ class Reflexion(Agent):
  >>> quesiton = "How many hearts are in this image?"
  >>> resp = agent(question, image="cards.png")
  >>> print(resp)
- >>> "6"
+ "6"
  >>> resp = agent([
  >>> {"role": "user", "content": question},
  >>> {"role": "assistant", "content": resp},
  >>> {"role": "user", "content": "No, please count the hearts on the bottom card."}
  >>> ], image="cards.png")
  >>> print(resp)
- >>> "4"
+ "4"
  )
  """
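The corrected docstring above also shows how `Reflexion` is driven in a multi-turn exchange. A minimal sketch of that flow, assuming the module path `vision_agent.agent.reflexion` (taken from the file list) and an OpenAI key in the environment; printed values are illustrative.

```python
import vision_agent as va
from vision_agent.agent.reflexion import Reflexion  # module path taken from the file list above

# Both the self-reflection model and the action agent are LMMs here so that an
# image can be passed in, mirroring the docstring example.
agent = Reflexion(
    self_reflect_model=va.lmm.OpenAILMM(),
    action_agent=va.lmm.OpenAILMM(),
)

question = "How many hearts are in this image?"
resp = agent(question, image="cards.png")
print(resp)  # e.g. "6"

# Follow up by replaying the conversation so far as a list of chat messages.
resp = agent(
    [
        {"role": "user", "content": question},
        {"role": "assistant", "content": resp},
        {"role": "user", "content": "No, please count the hearts on the bottom card."},
    ],
    image="cards.png",
)
print(resp)  # e.g. "4"
```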
 
@@ -344,7 +344,7 @@ class VisionAgent(Agent):
  >>> agent = VisionAgent()
  >>> resp = agent("If red tomatoes cost $5 each and yellow tomatoes cost $2.50 each, what is the total cost of all the tomatoes in the image?", image="tomatoes.jpg")
  >>> print(resp)
- >>> "The total cost is $57.50."
+ "The total cost is $57.50."
  """
 
  def __init__(
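The `VisionAgent` docstring fix above matches the usage documented in the new README. A minimal sketch combining the `verbose` flag and `chat_with_workflow` from that README, assuming the class lives in `vision_agent.agent.vision_agent` per the file list; outputs are illustrative.

```python
from vision_agent.agent.vision_agent import VisionAgent  # module path taken from the file list above

# verbose=True prints the intermediate planning and tool-call steps (per the README).
agent = VisionAgent(verbose=True)

resp = agent(
    "If red tomatoes cost $5 each and yellow tomatoes cost $2.50 each, "
    "what is the total cost of all the tomatoes in the image?",
    image="tomatoes.jpg",
)
print(resp)  # e.g. "The total cost is $57.50."

# chat_with_workflow additionally returns the tool calls that produced the answer.
resp, workflow = agent.chat_with_workflow(
    [{"role": "user", "content": "How many apples are in this image?"}],
    image="apples.jpg",
)
print(workflow)  # list of {"task", "tool", "parameters", "call_results", "answer"} dicts
```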
@@ -58,13 +58,13 @@ class CLIP(Tool):
  >>> import vision_agent as va
  >>> clip = va.tools.CLIP()
  >>> clip(["red line", "yellow dot"], "ct_scan1.jpg"))
- >>> [{"labels": ["red line", "yellow dot"], "scores": [0.98, 0.02]}]
+ [{"labels": ["red line", "yellow dot"], "scores": [0.98, 0.02]}]
  """
 
  _ENDPOINT = "https://rb4ii6dfacmwqfxivi4aedyyfm0endsv.lambda-url.us-east-2.on.aws"
 
  name = "clip_"
- description = "'clip_' is a tool that can classify or tag any image given a set if input classes or tags."
+ description = "'clip_' is a tool that can classify or tag any image given a set of input classes or tags."
  usage = {
  "required_parameters": [
  {"name": "prompt", "type": "List[str]"},
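Besides fixing the "set if input" typo in the tool description, this hunk shows the `CLIP` call signature in its docstring: a list of candidate classes plus an image path. A minimal sketch of invoking the hosted tool, with illustrative scores:

```python
import vision_agent as va

# CLIP classifies/tags an image against the candidate classes given as the prompt.
clip = va.tools.CLIP()
result = clip(["red line", "yellow dot"], "ct_scan1.jpg")
print(result)  # e.g. [{"labels": ["red line", "yellow dot"], "scores": [0.98, 0.02]}]
```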
@@ -121,9 +121,9 @@ class GroundingDINO(Tool):
  >>> import vision_agent as va
  >>> t = va.tools.GroundingDINO()
  >>> t("red line. yellow dot", "ct_scan1.jpg")
- >>> [{'labels': ['red line', 'yellow dot'],
- >>> 'bboxes': [[0.38, 0.15, 0.59, 0.7], [0.48, 0.25, 0.69, 0.71]],
- >>> 'scores': [0.98, 0.02]}]
+ [{'labels': ['red line', 'yellow dot'],
+ 'bboxes': [[0.38, 0.15, 0.59, 0.7], [0.48, 0.25, 0.69, 0.71]],
+ 'scores': [0.98, 0.02]}]
  """
 
  _ENDPOINT = "https://chnicr4kes5ku77niv2zoytggq0qyqlp.lambda-url.us-east-2.on.aws"
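The cleaned-up docstring above documents the `GroundingDINO` interface; a minimal sketch of the same call, with illustrative output:

```python
import vision_agent as va

# GroundingDINO takes a period-separated prompt of object descriptions and an
# image, and returns labels, normalized bounding boxes and confidence scores.
detector = va.tools.GroundingDINO()
result = detector("red line. yellow dot", "ct_scan1.jpg")
print(result)
# e.g. [{'labels': ['red line', 'yellow dot'],
#        'bboxes': [[0.38, 0.15, 0.59, 0.7], [0.48, 0.25, 0.69, 0.71]],
#        'scores': [0.98, 0.02]}]
```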
@@ -192,18 +192,18 @@ class GroundingSAM(Tool):
  >>> import vision_agent as va
  >>> t = va.tools.GroundingSAM()
  >>> t(["red line", "yellow dot"], ct_scan1.jpg"])
- >>> [{'labels': ['yellow dot', 'red line'],
- >>> 'bboxes': [[0.38, 0.15, 0.59, 0.7], [0.48, 0.25, 0.69, 0.71]],
- >>> 'masks': [array([[0, 0, 0, ..., 0, 0, 0],
- >>> [0, 0, 0, ..., 0, 0, 0],
- >>> ...,
- >>> [0, 0, 0, ..., 0, 0, 0],
- >>> [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)},
- >>> array([[0, 0, 0, ..., 0, 0, 0],
- >>> [0, 0, 0, ..., 0, 0, 0],
- >>> ...,
- >>> [1, 1, 1, ..., 1, 1, 1],
- >>> [1, 1, 1, ..., 1, 1, 1]], dtype=uint8)]}]
+ [{'labels': ['yellow dot', 'red line'],
+ 'bboxes': [[0.38, 0.15, 0.59, 0.7], [0.48, 0.25, 0.69, 0.71]],
+ 'masks': [array([[0, 0, 0, ..., 0, 0, 0],
+ [0, 0, 0, ..., 0, 0, 0],
+ ...,
+ [0, 0, 0, ..., 0, 0, 0],
+ [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)},
+ array([[0, 0, 0, ..., 0, 0, 0],
+ [0, 0, 0, ..., 0, 0, 0],
+ ...,
+ [1, 1, 1, ..., 1, 1, 1],
+ [1, 1, 1, ..., 1, 1, 1]], dtype=uint8)]}]
  """
 
  _ENDPOINT = "https://cou5lfmus33jbddl6hoqdfbw7e0qidrw.lambda-url.us-east-2.on.aws"
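The same docstring cleanup is applied to `GroundingSAM`. Based on the output format it shows (segmentation masks as `uint8` numpy arrays alongside labels and boxes), here is a minimal sketch of consuming the result; the mask-area computation at the end is an assumption about how you might use the masks, not part of the library itself.

```python
import vision_agent as va

segmenter = va.tools.GroundingSAM()
# The docstring passes a list of class prompts plus an image path.
result = segmenter(["red line", "yellow dot"], "ct_scan1.jpg")

for pred in result:
    # Each prediction carries parallel lists of labels, bboxes and binary masks.
    for label, mask in zip(pred["labels"], pred["masks"]):
        # Summing a binary uint8 mask gives the segmented area in pixels.
        print(label, int(mask.sum()))
```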
@@ -1,100 +0,0 @@
- Metadata-Version: 2.1
- Name: vision-agent
- Version: 0.0.33
- Summary: Toolset for Vision Agent
- Author: Landing AI
- Author-email: dev@landing.ai
- Requires-Python: >=3.10,<3.12
- Classifier: Programming Language :: Python :: 3
- Classifier: Programming Language :: Python :: 3.10
- Classifier: Programming Language :: Python :: 3.11
- Requires-Dist: faiss-cpu (>=1.0.0,<2.0.0)
- Requires-Dist: numpy (>=1.21.0,<2.0.0)
- Requires-Dist: openai (>=1.0.0,<2.0.0)
- Requires-Dist: pandas (>=2.0.0,<3.0.0)
- Requires-Dist: pillow (>=10.0.0,<11.0.0)
- Requires-Dist: requests (>=2.0.0,<3.0.0)
- Requires-Dist: sentence-transformers (>=2.0.0,<3.0.0)
- Requires-Dist: torch (>=2.1.0,<2.2.0)
- Requires-Dist: tqdm (>=4.64.0,<5.0.0)
- Requires-Dist: typing_extensions (>=4.0.0,<5.0.0)
- Project-URL: Homepage, https://landing.ai
- Project-URL: documentation, https://github.com/landing-ai/vision-agent
- Project-URL: repository, https://github.com/landing-ai/vision-agent
- Description-Content-Type: text/markdown
-
- <div align="center">
- <img alt="vision_agent" height="200px" src="https://github.com/landing-ai/vision-agent/blob/main/assets/logo.jpg?raw=true">
-
- # 🔍🤖 Vision Agent
-
- [![](https://dcbadge.vercel.app/api/server/wPdN8RCYew?compact=true&style=flat)](https://discord.gg/wPdN8RCYew)
- ![ci_status](https://github.com/landing-ai/vision-agent/actions/workflows/ci_cd.yml/badge.svg)
- [![PyPI version](https://badge.fury.io/py/vision-agent.svg)](https://badge.fury.io/py/vision-agent)
- ![version](https://img.shields.io/pypi/pyversions/vision-agent)
- </div>
-
-
- Vision Agent is a library for that helps you to use multimodal models to organize and structure your image data. Check out our discord for roadmaps and updates!
-
- One of the problems of dealing with image data is it can be difficult to organize and search. For example, you might have a bunch of pictures of houses and want to count how many yellow houses you have, or how many houses with adobe roofs. The vision agent library uses LMMs to help create tags or descriptions of images to allow you to search over them, or use them in a database to carry out other operations.
-
- ## Getting Started
- ### LMMs
- To get started, you can use an LMM to start generating text from images. The following code will use the LLaVA-1.6 34B model to generate a description of the image you pass it.
-
- ```python
- import vision_agent as va
-
- model = va.lmm.get_lmm("llava")
- model.generate("Describe this image", "image.png")
- >>> "A yellow house with a green lawn."
- ```
-
- **WARNING** We are hosting the LLaVA-1.6 34B model, if it times out please wait ~3-5 min for the server to warm up as it shuts down when usage is low.
-
- ### DataStore
- You can use the `DataStore` class to store your images, add new metadata to them such as descriptions, and search over different columns.
-
- ```python
- import vision_agent as va
- import pandas as pd
-
- df = pd.DataFrame({"image_paths": ["image1.png", "image2.png", "image3.png"]})
- ds = va.data.DataStore(df)
- ds = ds.add_lmm(va.lmm.get_lmm("llava"))
- ds = ds.add_embedder(va.emb.get_embedder("sentence-transformer"))
-
- ds = ds.add_column("descriptions", "Describe this image.")
- ```
-
- This will use the prompt you passed, "Describe this image.", and the LMM to create a new column of descriptions for your image. Your data will now contain a new column with the descriptions of each image:
-
- | image\_paths | image\_id | descriptions |
- | --- | --- | --- |
- | image1.png | 1 | "A yellow house with a green lawn." |
- | image2.png | 2 | "A white house with a two door garage." |
- | image3.png | 3 | "A wooden house in the middle of the forest." |
-
- You can now create an index on the descriptions column and search over it to find images that match your query.
-
- ```python
- ds = ds.build_index("descriptions")
- ds.search("A yellow house.", top_k=1)
- >>> [{'image_paths': 'image1.png', 'image_id': 1, 'descriptions': 'A yellow house with a green lawn.'}]
- ```
-
- You can also create other columns for you data such as `is_yellow`:
-
- ```python
- ds = ds.add_column("is_yellow", "Is the house in this image yellow? Please answer yes or no.")
- ```
-
- which would give you a dataset similar to this:
-
- | image\_paths | image\_id | descriptions | is\_yellow |
- | --- | --- | --- | --- |
- | image1.png | 1 | "A yellow house with a green lawn." | "yes" |
- | image2.png | 2 | "A white house with a two door garage." | "no" |
- | image3.png | 3 | "A wooden house in the middle of the forest." | "no" |
-
@@ -1,74 +0,0 @@
- <div align="center">
- <img alt="vision_agent" height="200px" src="https://github.com/landing-ai/vision-agent/blob/main/assets/logo.jpg?raw=true">
-
- # 🔍🤖 Vision Agent
-
- [![](https://dcbadge.vercel.app/api/server/wPdN8RCYew?compact=true&style=flat)](https://discord.gg/wPdN8RCYew)
- ![ci_status](https://github.com/landing-ai/vision-agent/actions/workflows/ci_cd.yml/badge.svg)
- [![PyPI version](https://badge.fury.io/py/vision-agent.svg)](https://badge.fury.io/py/vision-agent)
- ![version](https://img.shields.io/pypi/pyversions/vision-agent)
- </div>
-
-
- Vision Agent is a library for that helps you to use multimodal models to organize and structure your image data. Check out our discord for roadmaps and updates!
-
- One of the problems of dealing with image data is it can be difficult to organize and search. For example, you might have a bunch of pictures of houses and want to count how many yellow houses you have, or how many houses with adobe roofs. The vision agent library uses LMMs to help create tags or descriptions of images to allow you to search over them, or use them in a database to carry out other operations.
-
- ## Getting Started
- ### LMMs
- To get started, you can use an LMM to start generating text from images. The following code will use the LLaVA-1.6 34B model to generate a description of the image you pass it.
-
- ```python
- import vision_agent as va
-
- model = va.lmm.get_lmm("llava")
- model.generate("Describe this image", "image.png")
- >>> "A yellow house with a green lawn."
- ```
-
- **WARNING** We are hosting the LLaVA-1.6 34B model, if it times out please wait ~3-5 min for the server to warm up as it shuts down when usage is low.
-
- ### DataStore
- You can use the `DataStore` class to store your images, add new metadata to them such as descriptions, and search over different columns.
-
- ```python
- import vision_agent as va
- import pandas as pd
-
- df = pd.DataFrame({"image_paths": ["image1.png", "image2.png", "image3.png"]})
- ds = va.data.DataStore(df)
- ds = ds.add_lmm(va.lmm.get_lmm("llava"))
- ds = ds.add_embedder(va.emb.get_embedder("sentence-transformer"))
-
- ds = ds.add_column("descriptions", "Describe this image.")
- ```
-
- This will use the prompt you passed, "Describe this image.", and the LMM to create a new column of descriptions for your image. Your data will now contain a new column with the descriptions of each image:
-
- | image\_paths | image\_id | descriptions |
- | --- | --- | --- |
- | image1.png | 1 | "A yellow house with a green lawn." |
- | image2.png | 2 | "A white house with a two door garage." |
- | image3.png | 3 | "A wooden house in the middle of the forest." |
-
- You can now create an index on the descriptions column and search over it to find images that match your query.
-
- ```python
- ds = ds.build_index("descriptions")
- ds.search("A yellow house.", top_k=1)
- >>> [{'image_paths': 'image1.png', 'image_id': 1, 'descriptions': 'A yellow house with a green lawn.'}]
- ```
-
- You can also create other columns for you data such as `is_yellow`:
-
- ```python
- ds = ds.add_column("is_yellow", "Is the house in this image yellow? Please answer yes or no.")
- ```
-
- which would give you a dataset similar to this:
-
- | image\_paths | image\_id | descriptions | is\_yellow |
- | --- | --- | --- | --- |
- | image1.png | 1 | "A yellow house with a green lawn." | "yes" |
- | image2.png | 2 | "A white house with a two door garage." | "no" |
- | image3.png | 3 | "A wooden house in the middle of the forest." | "no" |