nomic 3.2.0__tar.gz → 3.3.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of nomic might be problematic.
- nomic-3.3.2/PKG-INFO +247 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic/data_inference.py +5 -1
- {nomic-3.2.0 → nomic-3.3.2}/nomic/dataset.py +5 -42
- nomic-3.3.2/nomic.egg-info/PKG-INFO +247 -0
- {nomic-3.2.0 → nomic-3.3.2}/setup.py +18 -2
- nomic-3.2.0/PKG-INFO +0 -18
- nomic-3.2.0/nomic.egg-info/PKG-INFO +0 -18
- {nomic-3.2.0 → nomic-3.3.2}/README.md +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic/__init__.py +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic/atlas.py +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic/aws/__init__.py +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic/aws/sagemaker.py +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic/cli.py +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic/data_operations.py +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic/embed.py +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic/pl_callbacks/__init__.py +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic/pl_callbacks/pl_callback.py +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic/settings.py +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic/utils.py +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic.egg-info/SOURCES.txt +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic.egg-info/dependency_links.txt +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic.egg-info/entry_points.txt +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic.egg-info/requires.txt +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/nomic.egg-info/top_level.txt +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/pyproject.toml +0 -0
- {nomic-3.2.0 → nomic-3.3.2}/setup.cfg +0 -0
nomic-3.3.2/PKG-INFO
ADDED
@@ -0,0 +1,247 @@
+Metadata-Version: 2.1
+Name: nomic
+Version: 3.3.2
+Summary: The official Nomic python client.
+Home-page: https://github.com/nomic-ai/nomic
+Author: nomic.ai
+Author-email: support@nomic.ai
+License: UNKNOWN
+Platform: UNKNOWN
+Classifier: License :: OSI Approved :: Apache Software License
+Classifier: Programming Language :: Python :: 3
+Description-Content-Type: text/markdown
+Provides-Extra: local
+Provides-Extra: aws
+Provides-Extra: all
+Provides-Extra: dev
+
+<h1 align="center">Nomic Atlas Python Client</h1>
+<h3 align="center">Explore, label, search and share massive datasets in your web browser.</h3>
+<p>This repository contains Python bindings for working with <a href="https://atlas.nomic.ai/">Nomic Atlas</a>, the world’s most powerful unstructured data interaction platform. Atlas supports datasets from hundreds to tens of millions of points, and supports data modalities ranging from text to image to audio to video. </p>
+
+With Nomic Atlas, you can:
+
+- Generate, store and retrieve embeddings for your unstructured data.
+- Find insights in your unstructured data and embeddings all from your web browser.
+- Share and present your datasets and data findings to anyone.
+
+### Where to find us?
+
+[https://atlas.nomic.ai/](https://atlas.nomic.ai/)
+
+
+
+## Table of Contents
+
+- [Quick resources](#quick-resources)
+- [Example maps](#example-maps)
+- [Features](#features)
+- [Quickstart](#quickstart)
+- [Installation](#installation)
+- [Make your first map](#make-your-first-map)
+- [Atlas usage examples](#atlas-usage-examples)
+- [Access your embeddings](#access-your-embeddings)
+- [View your data's topic model](#view-your-datas-topic-model)
+- [Search for data semantically](#search-for-data-semantically)
+- [Documentation](#documentation)
+- [Discussion](#discussion)
+- [Community](#community)
+
+## Quick Resources
+
+<p >
+Try the <a href="https://colab.research.google.com/drive/1CZBo3LV0FoRTVRN3v068tvNJgbeWpcSX?usp=sharing">:notebook: Colab Demo</a> to get started in Python
+</p>
+
+<p>
+Read the <a href="https://docs.nomic.ai">:closed_book: Atlas Docs</a>
+</p>
+
+<p>
+Join our <a href="https://discord.gg/myY5YDR8z8">:hut: Discord</a> to start chatting and get help
+</p>
+
+#### Example maps
+
+<a href="https://atlas.nomic.ai/map/twitter">:world_map: Map of Twitter</a> (5.4 million tweets)
+<br> <br>
+<a href="https://atlas.nomic.ai/map/stablediffusion">:world_map: Map of StableDiffusion Generations</a> (6.4 million images)
+<br> <br>
+<a href="https://atlas.nomic.ai/map/neurips">:world_map: Map of NeurIPS Proceedings</a> (16,623 abstracts)
+
+</p>
+
+## Features
+
+Here are just a few of the features which Atlas offers:
+
+- Organize your **text, image, and embedding data**
+- Create **beautiful and shareable** maps **with or without coding knowledge**
+- Have easy access to both **high-level data structures** and **individual datapoints**
+- **Search** millions of datapoints **instantly**
+- **Cluster data** into semantic topics
+- **Tag and clean** your dataset
+- **Deduplicate** text, images, video, audio
+
+
+
+## Quickstart
+
+### Installation
+
+1. Install the Nomic library
+
+```bash
+pip install nomic
+```
+
+2. Login or create your Nomic account:
+
+```bash
+nomic login
+```
+
+3. Follow the instructions to obtain your access token.
+
+```bash
+nomic login [token]
+```
+
+### Make your first map
+
+```python
+from nomic import atlas
+import numpy as np
+
+# Randomly generate a set of 10,000 high-dimensional embeddings
+num_embeddings = 10000
+embeddings = np.random.rand(num_embeddings, 256)
+
+# Create Atlas project
+dataset = atlas.map_data(embeddings=embeddings)
+
+print(dataset)
+```
+
+## Atlas usage examples
+
+### Access your embeddings
+
+Atlas stores, manages and generates embeddings for your unstructured data.
+
+You can access Atlas latent embeddings (e.g. high dimensional) or the two-dimensional embeddings generated for web display.
+
+```python
+# Access your Atlas map and download your embeddings
+map = dataset.maps[0]
+
+projected_embeddings = map.embeddings.projected
+latent_embeddings = map.embeddings.latent
+```
+
+```python
+print(projected_embeddings)
+```
+
+```
+# Response:
+id x y
+0 9.815330 -8.105308
+1 -8.725819 5.980628
+2 13.199472 -1.103389
+... ... ... ...
+```
+
+```python
+print(latent_embeddings)
+```
+
+```
+# Response:
+n x d numpy.ndarray where n = number of datapoints and d = number of latent dimensions
+```
+
+### View your data’s topic model
+
+Atlas automatically organizes your data into topics informed by the latent contents of your embeddings. Visually, these are represented by regions of homogenous color on an Atlas map.
+
+You can access and operate on topics programmatically by using the `topics` attribute
+of an AtlasMap.
+
+```python
+# Access your Atlas map
+map = dataset.maps[0]
+
+# Access a pandas DataFrame associating each datum on your map to their topics at each topic depth.
+topic_df = map.topics.df
+
+print(map.topics.df)
+
+```
+
+```
+Response:
+
+id topic_depth_1 topic_depth_2
+0 Oil Prices mergers and acquisitions
+1 Iraq War Trial of Thatcher
+2 Oil Prices Economic Growth
+... ... ... ...
+9997 Oil Prices Economic Growth
+9998 Baseball Giambi's contract
+9999 Olympic Gold Medal European Football
+
+```
+
+### Search for data semantically
+
+Use Atlas to automatically find nearest neighbors in your vector database.
+
+```python
+# Load map and perform vector search for the five nearest neighbors of datum with id "my_query_point"
+map = dataset.maps[0]
+
+with dataset.wait_for_dataset_lock():
+    neighbors, _ = map.embeddings.vector_search(ids=['my_query_point'], k=5)
+
+# Return similar data points
+similar_datapoints = dataset.get_data(ids=neighbors[0])
+
+print(similar_datapoints)
+```
+
+```
+Response:
+
+Original query point:
+"Intel abandons digital TV chip project NEW YORK, October 22 (newratings.com) - Global semiconductor giant Intel Corporation (INTC.NAS) has called off its plan to develop a new chip for the digital projection televisions."
+
+Nearest neighbors:
+"Intel awaits government move on expensing options Figuring it's had enough of fighting over options, the chip giant is waiting to see what Congress comes up with."
+"Citigroup Takes On Intel The financial services giant takes over non-memory semiconductor chip production."
+"Intel Seen Readying New Wi-Fi Chips SAN FRANCISCO (Reuters) - Intel Corp. this week is expected to introduce a chip that adds support for a relatively obscure version of Wi-Fi, analysts said on Monday, in a move that could help ease congestion on wireless networks."
+"Intel pledges to bring Itanic down to Xeon price-point EM64T a stand-in until the real anti-AMD64 kit arrives"
+```
+
+## Background
+
+Atlas is developed by the [Nomic AI](https://home.nomic.ai/) team, which is based in NYC. Nomic also developed and maintains [GPT4All](https://gpt4all.io/index.html), an open-source LLM chatbot ecosystem.
+
+## Discussion
+
+Join the discussion on our [:hut: Discord](https://discord.gg/myY5YDR8z8) to ask questions, get help, and chat with others about Atlas, Nomic, GPT4All, and related topics. Our doors are open to enthusiasts of all skill levels.
+
+## Community
+
+- Blog: [https://blog.nomic.ai/](https://blog.nomic.ai/)
+- Twitter: [https://twitter.com/nomic_ai](https://twitter.com/nomic_ai)
+- Nomic Website: [https://home.nomic.ai/](https://home.nomic.ai/)
+- Atlas Website: [https://atlas.nomic.ai/](https://atlas.nomic.ai/)
+- GPT4All Website: [https://gpt4all.io/index.html](https://gpt4all.io/index.html)
+- LinkedIn: [https://www.linkedin.com/company/nomic-ai](https://www.linkedin.com/company/nomic-ai)
+
+<br>
+
+[Go to top](#)
+
+
{nomic-3.2.0 → nomic-3.3.2}/nomic/data_inference.py
@@ -115,5 +115,9 @@ class NomicEmbedOptions(BaseModel):
     """
 
     model: Literal[
-        "nomic-embed-text-v1",
+        "nomic-embed-text-v1",
+        "nomic-embed-vision-v1",
+        "nomic-embed-text-v1.5",
+        "nomic-embed-vision-v1.5",
+        "gte-multilingual-base",
     ] = "nomic-embed-text-v1.5"
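The hunk above rewrites the list of values accepted by the `model` field of `NomicEmbedOptions`. As a rough illustration of what a `Literal` constraint of this kind expresses, the sketch below uses only the standard library. The `EmbedModel` alias, `DEFAULT_MODEL` constant, and `check_model` helper are hypothetical names introduced here, not part of nomic; the real class presumably relies on its `BaseModel` base for validation rather than a hand-written check.

```python
from typing import Literal, get_args

# Hypothetical alias mirroring the values on the added lines of the hunk.
EmbedModel = Literal[
    "nomic-embed-text-v1",
    "nomic-embed-vision-v1",
    "nomic-embed-text-v1.5",
    "nomic-embed-vision-v1.5",
    "gte-multilingual-base",
]

# Same default as the unchanged context line in the hunk.
DEFAULT_MODEL = "nomic-embed-text-v1.5"


def check_model(name: str = DEFAULT_MODEL) -> str:
    """Return name if it is one of the Literal's values, else raise ValueError."""
    allowed = get_args(EmbedModel)  # tuple of the five strings above
    if name not in allowed:
        raise ValueError(f"unknown model {name!r}; expected one of {allowed}")
    return name
```

Calling `check_model()` with no argument falls back to the default, and any string outside the listed values is rejected.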
{nomic-3.2.0 → nomic-3.3.2}/nomic/dataset.py
@@ -1688,42 +1688,9 @@ class AtlasDataset(AtlasClass):
 
         """
 
-
-
-
-            raise ValueError(msg)
-
-        if self.modality == "text" and embeddings is not None:
-            msg = "Please dont specify embeddings for updating a text project"
-            raise ValueError(msg)
-
-        if embeddings is not None and len(data) != embeddings.shape[0]:
-            msg = (
-                "Expected data and embeddings to be the same length but found lengths {} and {} respectively.".format()
-            )
-            raise ValueError(msg)
-
-        shard_size = 2000 # TODO someone removed shard size from params and didn't update
-        # Add new data
-        logger.info("Uploading data to Nomic's neural database Atlas.")
-        with tqdm(total=len(data) // shard_size) as pbar:
-            for i in range(0, len(data), MAX_MEMORY_CHUNK):
-                if self.modality == "embedding" and embeddings is not None:
-                    self._add_embeddings(
-                        embeddings=embeddings[i : i + MAX_MEMORY_CHUNK, :],
-                        data=data[i : i + MAX_MEMORY_CHUNK],
-                        pbar=pbar,
-                    )
-                else:
-                    self._add_text(
-                        data=data[i : i + MAX_MEMORY_CHUNK],
-                        pbar=pbar,
-                    )
-        logger.info("Upload succeeded.")
-
-        # Update maps
-        # finally, update all the indices
-        return self.update_indices()
+        raise DeprecationWarning(
+            f"The function AtlasDataset.update_maps is deprecated. Use AtlasDataset.add_data() instead."
+        )
 
     def update_indices(self, rebuild_topic_models: bool = False):
         """
@@ -1734,10 +1701,6 @@ class AtlasDataset(AtlasClass):
             rebuild_topic_models: (Default False) - If true, will create new topic models when updating these indices.
         """
 
-
-
-            headers=self.header,
-            json={"project_id": self.id, "rebuild_topic_models": rebuild_topic_models},
+        raise DeprecationWarning(
+            f"The function AtlasDataset.update_indices is deprecated. Use AtlasDataset.add_data() instead."
         )
-
-        logger.info(f"Updating maps in dataset `{self.identifier}`")
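Both hunks in `dataset.py` replace the method bodies with `raise DeprecationWarning(...)`. Because `DeprecationWarning` is an exception class, raising it aborts the call instead of merely emitting a warning. The sketch below contrasts the two behaviors; `DatasetStub`, `update_indices_soft`, and `call_hard` are hypothetical names used only for illustration, and the `warnings.warn` variant is an alternative pattern, not what the diff does.

```python
import warnings


class DatasetStub:
    """Hypothetical stand-in for AtlasDataset; only the deprecation behavior is modeled."""

    def update_indices(self, rebuild_topic_models: bool = False):
        # Same pattern as the added lines: the warning class is raised as an exception.
        raise DeprecationWarning(
            "The function AtlasDataset.update_indices is deprecated. "
            "Use AtlasDataset.add_data() instead."
        )

    def update_indices_soft(self, rebuild_topic_models: bool = False):
        # Alternative (not what the diff does): emit a warning and keep running.
        warnings.warn("update_indices is deprecated", DeprecationWarning, stacklevel=2)
        return "continued"


def call_hard(ds: DatasetStub) -> str:
    """Show that callers of the raising variant must handle an exception."""
    try:
        ds.update_indices()
    except DeprecationWarning as exc:
        return f"raised: {exc}"
    return "no error"
```

Under this reading, existing callers of `update_maps` or `update_indices` would fail on upgrade rather than see a soft warning, so they would need to move to `add_data()` as the message suggests.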
nomic-3.3.2/nomic.egg-info/PKG-INFO
ADDED
@@ -0,0 +1,247 @@
(247 added lines, identical to nomic-3.3.2/PKG-INFO above)
{nomic-3.2.0 → nomic-3.3.2}/setup.py
@@ -6,12 +6,28 @@ from setuptools import setup, find_packages
 
 description = "The official Nomic python client."
 
+# Read README.md and remove tables and images
+with open("README.md") as f:
+    content = f.read()
+    # Remove table sections including content
+    while "<table>" in content and "</table>" in content:
+        start = content.find("<table>")
+        end = content.find("</table>") + 8
+        content = content[:start] + content[end:]
+    # Remove img tags and content
+    while "<img" in content and ">" in content:
+        start = content.find("<img")
+        end = content.find(">", start) + 1
+        content = content[:start] + content[end:]
+    long_description = content
+
 setup(
     name="nomic",
-    version="3.2
+    version="3.3.2",
     url="https://github.com/nomic-ai/nomic",
     description=description,
-    long_description=
+    long_description=long_description,
+    long_description_content_type="text/markdown",
     packages=find_packages(include=["nomic", "nomic.*"]),
     author_email="support@nomic.ai",
     author="nomic.ai",
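The `setup.py` hunk strips `<table>` blocks and `<img>` tags from `README.md` before using it as the long description, which is why the new PKG-INFO carries the README text. A minimal sketch of the same idea as a testable function follows. The name `strip_tables_and_images` is hypothetical, and the sketch deliberately differs from the diff in one respect: it looks for each closing delimiter only after the matching opening one and stops when none is found, whereas the loops in the hunk appear to assume well-formed input.

```python
def strip_tables_and_images(content: str) -> str:
    """Remove <table>...</table> blocks and <img ...> tags from a README string."""
    # Drop each table, searching for the closing tag only after the opening one.
    while True:
        start = content.find("<table>")
        end = content.find("</table>", start) if start != -1 else -1
        if end == -1:
            break  # no complete table left
        content = content[:start] + content[end + len("</table>"):]
    # Drop each <img ...> tag up to its closing '>'.
    while True:
        start = content.find("<img")
        end = content.find(">", start) if start != -1 else -1
        if end == -1:
            break  # no complete img tag left
        content = content[:start] + content[end + 1:]
    return content
```

Like the code in the hunk, this only matches a bare `<table>` opening tag, so a table written with attributes would be left in place.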
nomic-3.2.0/PKG-INFO
DELETED
@@ -1,18 +0,0 @@
-Metadata-Version: 2.1
-Name: nomic
-Version: 3.2.0
-Summary: The official Nomic python client.
-Home-page: https://github.com/nomic-ai/nomic
-Author: nomic.ai
-Author-email: support@nomic.ai
-License: UNKNOWN
-Platform: UNKNOWN
-Classifier: License :: OSI Approved :: Apache Software License
-Classifier: Programming Language :: Python :: 3
-Provides-Extra: local
-Provides-Extra: aws
-Provides-Extra: all
-Provides-Extra: dev
-
-The official Nomic python client.
-
nomic-3.2.0/nomic.egg-info/PKG-INFO
DELETED
@@ -1,18 +0,0 @@
(18 removed lines, identical to nomic-3.2.0/PKG-INFO above)
File without changes (the 19 files listed above with +0 -0)