flatnav 0.1.2__cp38-cp38-macosx_10_14_x86_64.whl

flatnav/__init__.py ADDED
@@ -0,0 +1,35 @@
1
+ import sys
2
+ from ._core import (
3
+ MetricType,
4
+ data_type,
5
+ __version__,
6
+ __doc__
7
+ )
8
+
9
+ class _DataTypeModule:
10
+ from ._core.data_type import DataType
11
+
12
+
13
+ class _IndexModule:
14
+ from ._core.index import (
15
+ IndexL2Float,
16
+ IndexIPFloat,
17
+ IndexL2Uint8,
18
+ IndexIPUint8,
19
+ IndexL2Int8,
20
+ IndexIPInt8,
21
+ create,
22
+ )
23
+
24
+
25
+ index = _IndexModule
26
+ sys.modules['flatnav.index'] = _IndexModule
27
+ sys.modules['flatnav.data_type'] = _DataTypeModule
28
+
29
+ __all__ = [
30
+ 'MetricType',
31
+ 'data_type',
32
+ 'index',
33
+ '__version__',
34
+ '__doc__'
35
+ ]
Binary file
@@ -0,0 +1,299 @@
1
+ Metadata-Version: 2.1
2
+ Name: flatnav
3
+ Version: 0.1.2
4
+ Summary: A performant graph-based kNN search library with re-ordering.
5
+ Home-page: https://flatnav.net
6
+ Author: Benjamin Coleman, Blaise Munyampirwa, Vihan Lakshman
7
+ Author-email: benjamin.ray.coleman@gmail.com, blaisemunyampirwa@gmail.com, vihan@mit.edu
8
+ Maintainer-email: blaisemunyampirwa@gmail.com
9
+ License: Apache License, Version 2.0
10
+ Project-URL: Source Code, https://github.com/BlaiseMuhirwa/flatnav
11
+ Project-URL: Documentation, https://blaisemuhirwa.github.io/flatnav
12
+ Project-URL: Bug Tracker, https://github.com/BlaiseMuhirwa/flatnav/issues
13
+ Keywords: similarity search,vector databases,machine learning
14
+ Classifier: Development Status :: 4 - Beta
15
+ Classifier: Environment :: Console
16
+ Classifier: Operating System :: POSIX :: Linux
17
+ Classifier: Intended Audience :: Science/Research
18
+ Classifier: Intended Audience :: Developers
19
+ Classifier: Intended Audience :: Other Audience
20
+ Classifier: License :: OSI Approved :: Apache Software License
21
+ Classifier: Programming Language :: C++
22
+ Classifier: Programming Language :: Python :: 3
23
+ Classifier: Programming Language :: Python :: 3.8
24
+ Classifier: Programming Language :: Python :: 3.9
25
+ Classifier: Programming Language :: Python :: 3.10
26
+ Classifier: Programming Language :: Python :: 3.11
27
+ Classifier: Programming Language :: Python :: 3.12
28
+ Classifier: Topic :: System
29
+ Classifier: Topic :: Scientific/Engineering
30
+ Classifier: Topic :: Software Development
31
+ Description-Content-Type: text/markdown
32
+ Requires-Dist: numpy<2,>=1.21.0
33
+ Requires-Dist: h5py==3.11.0
34
+
35
+ ## FlatNav
36
+
37
+ FlatNav is a fast and header-only graph-based index for Approximate Nearest Neighbor Search (ANNS). FlatNav is inspired by the influential [Hierarchical Navigable Small World (HNSW) index](https://github.com/nmslib/hnswlib), but with the hierarchical component removed. As detailed in our [research paper](https://arxiv.org/pdf/2412.01940), we found that FlatNav achieved identical performance to HNSW on high-dimensional datasets (dimensionality > 32) with approximately 38% less peak memory consumption and a simplified implementation.
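To make the idea of a graph index without a hierarchy concrete, here is a small pure-Python sketch of best-first (beam) search over a single-layer proximity graph. It is an illustration of the general technique only, not FlatNav's implementation: the function names, the adjacency-dict graph format, and the fixed entry node are all assumptions made for this example.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def beam_search(graph, vectors, query, entry, ef_search, k):
    """Best-first search on a flat graph, keeping the ef_search closest nodes seen."""
    d0 = squared_l2(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    best = [(-d0, entry)]       # max-heap via negation: current best results
    while candidates:
        dist, node = heapq.heappop(candidates)
        # Stop once the closest candidate is worse than the worst kept result.
        if len(best) >= ef_search and dist > -best[0][0]:
            break
        for neighbor in graph[node]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            d = squared_l2(vectors[neighbor], query)
            if len(best) < ef_search or d < -best[0][0]:
                heapq.heappush(candidates, (d, neighbor))
                heapq.heappush(best, (-d, neighbor))
                if len(best) > ef_search:
                    heapq.heappop(best)  # drop the current worst result
    return sorted((-neg_d, n) for neg_d, n in best)[:k]


# Toy example: five points on a line, each linked to its neighbors.
vectors = [[0.0], [1.0], [2.0], [3.0], [4.0]]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
results = beam_search(graph, vectors, [3.2], entry=0, ef_search=2, k=2)
print([node for _, node in results])
```

A larger `ef_search` keeps more candidates alive during the traversal, which generally trades query time for recall.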
38
+
39
+ We hope to maintain this open source library as a resource for the broader community. Please consider opening a GitHub issue for bugs and feature requests, or get in touch with us directly for discussions.
40
+
41
+
42
+ ### Installation
43
+ FlatNav is implemented in C++ and ships with a complete Python extension; [cereal](https://uscilab.github.io/cereal/) is the only external dependency. The C++ library is header-only, so there is nothing to build: just include the necessary headers in your existing code.
44
+
45
+ FlatNav is supported on x86-64 machines running Linux and macOS (we can extend this to Windows if there is sufficient interest). To get the C++ library working and run the examples under the [tools](https://github.com/BlaiseMuhirwa/flatnav/blob/main/tools) directory, you will need:
46
+
47
+ * C++17 compiler with OpenMP support (version >= 2.0)
48
+ * CMake (version >= 3.14)
49
+
50
+ We provide some helpful scripts for installing the above in the [bin](https://github.com/BlaiseMuhirwa/flatnav/tree/main/bin) directory.
51
+
52
+ To generate the library with CMake and compile examples, run
53
+
54
+ ```shell
55
+ $ git clone https://github.com/BlaiseMuhirwa/flatnav.git --recurse-submodules
56
+ $ cd flatnav
57
+ $ ./bin/build.sh -e
58
+ ```
59
+
60
+ You can get all options available with the `build.sh` script by passing it the `-h` argument.
61
+
62
+ The output lists all available build options:
63
+
64
+ ```shell
65
+ Usage ./build.sh [OPTIONS]
66
+
67
+ Available Options:
68
+ -t, --tests: Build tests
69
+ -e, --examples: Build examples
70
+ -v, --verbose: Make verbose
71
+ -b, --benchmark: Build benchmarks
72
+ -bt, --build_type: Build type (Debug, Release, RelWithDebInfo, MinSizeRel)
73
+ -nmv, --no_simd_vectorization:Disable SIMD instructions
74
+ -h, --help: Print this help message
75
+
76
+ Example Usage:
77
+ ./build.sh -t -e -v
78
+ ```
79
+
80
+ To build the Python bindings, follow the instructions [here](https://github.com/BlaiseMuhirwa/flatnav/blob/main/flatnav_python/README.md). Examples of how to use the library to build an index and run queries against it are available [here](https://github.com/BlaiseMuhirwa/flatnav/blob/main/flatnav_python/unit_tests/test_index.py).
81
+
82
+ ### Support for SIMD Extensions
83
+
84
+ We currently support SIMD extensions for certain platforms as detailed below.
85
+
86
+ | Operation | x86_64 | arm64v8 | Apple silicon |
87
+ |-----------|--------|---------|-----------------|
88
+ | FP32 Inner product |SSE, AVX, AVX512 | No SIMD support | No SIMD support |
89
+ | FP32 L2 distance |SSE, AVX, AVX512| No SIMD support | No SIMD support |
90
+ | UINT8 L2 distance |AVX512 | No SIMD support | No SIMD support |
91
+ | INT8 L2 distance | SSE | No SIMD support | No SIMD support |
92
+
93
+
94
+ ### Getting Started in Python
95
+
96
+ Currently, we provide Python wheels for versions 3.8 through 3.12 on x86_64 architectures (Intel and AMD processors, including macOS on Intel). Support for
97
+ ARM wheels is planned as a future improvement.
98
+
99
+ The Python library can be installed from PyPI with
100
+ ```shell
101
+ $ pip install flatnav
102
+ ```
103
+
104
+ Alternatively, `flatnav` can be installed from source via [cibuildwheel](https://cibuildwheel.pypa.io/en/stable/), which
105
+ builds cross-platform wheels. Follow these steps:
106
+
107
+ ```shell
108
+ $ git clone https://github.com/BlaiseMuhirwa/flatnav.git --recurse-submodules
109
+ $ cd flatnav
110
+ $ make install-cibuildwheel
111
+
112
+ # This will build flatnav for the current version in your environment. If you want to build wheels
113
+ # for all supported python versions (3.8 to 3.12), remove the --current-version flag.
114
+ $ ./cibuild.sh --current-version
115
+
116
+ $ pip install wheelhouse/flatnav*.whl --force-reinstall
117
+ ```
118
+
119
+ Once you have the Python library installed and a dataset you want to index as a NumPy array, you can construct the index as shown below. This allocates memory and creates a directed graph with vectors as nodes.
120
+
121
+ ```python
122
+ import numpy as np
123
+ import flatnav
124
+ from flatnav.data_type import DataType
125
+
126
+ # Get your numpy-formatted dataset.
127
+ dataset_size = 1_000_000
128
+ dataset_dimension = 128
129
+ dataset_to_index = np.random.randn(dataset_size, dataset_dimension).astype(np.float32)
130
+
131
+ # Define index construction parameters.
132
+ distance_type = "l2"
133
+ max_edges_per_node = 32
134
+ ef_construction = 100
135
+ num_build_threads = 16
136
+
137
+ # Create index configuration and pre-allocate memory
138
+ index = flatnav.index.create(
139
+ distance_type=distance_type,
140
+ index_data_type=DataType.float32,
141
+ dim=dataset_dimension,
142
+ dataset_size=dataset_size,
143
+ max_edges_per_node=max_edges_per_node,
144
+ verbose=True,
145
+ collect_stats=True,
146
+ )
147
+ index.set_num_threads(num_build_threads)
148
+
149
+ # Now index the dataset
150
+ index.add(data=dataset_to_index, ef_construction=ef_construction)
151
+ ```
152
+
153
+ Note that we specified `DataType.float32` to indicate that we want to build an index with vectors represented with `float` type. If you want to use a different precision, such as `uint8_t` or `int8_t` (which are the only other ones currently supported), you can use `DataType.uint8` or `DataType.int8`.
154
+ The distance type can either be `l2` or `angular`. The `collect_stats` flag will record the number of distance evaluations.
155
+
156
+ To query the index we just created, we generate IID query vectors from the standard normal distribution and search as follows:
157
+
158
+ ```python
159
+
160
+ # Set query-time parameters
161
+ k = 100
162
+ ef_search = 100
163
+
164
+ # Run k-NN query with a single thread.
165
+ index.set_num_threads(1)
166
+
167
+ queries = np.random.randn(1000, dataset_to_index.shape[1]).astype(np.float32)
168
+ for query in queries:
169
+ distances, indices = index.search_single(
170
+ query=query,
171
+ ef_search=ef_search,
172
+ K=k,
173
+ )
174
+
175
+ ```
176
+
177
+ You can parallelize the search by setting the desired number of threads and using the batch `search` API, which returns the same results as `search_single`.
178
+
179
+ ```python
180
+ index.set_num_threads(16)
181
+ distances, indices = index.search(queries=queries, ef_search=ef_search, K=k)
182
+ ```
183
+
184
+ ### Getting Started in C++
185
+
186
+ As mentioned earlier, there is nothing to build since the library is header-only. We will translate the above Python code into C++ to illustrate how to use the C++ API.
187
+
188
+ ```c++
189
+ #include <cstdint>
190
+ #include <flatnav/index/Index.h>
191
+ #include <flatnav/distances/SquaredL2Distance.h>
192
+ #include <flatnav/distances/DistanceInterface.h>
193
+
194
+ template <typename dist_t>
195
+ void run_knn_search(Index<dist_t, int>* index, float *queries, int* gtruth,
196
+ int ef_search, int K, int num_queries, int num_gtruth, int dim) {
197
+
198
+ float mean_recall = 0;
199
+ for (int i = 0; i < num_queries; i++) {
200
+ float *q = queries + dim * i;
201
+ int *g = gtruth + num_gtruth * i;
202
+ std::vector<std::pair<float, int>> result =
203
+ index->search(q, K, ef_search);
204
+
205
+ float recall = 0;
206
+ for (int j = 0; j < K; j++) {
207
+ for (int l = 0; l < K; l++) {
208
+ if (result[j].second == g[l]) {
209
+ recall = recall + 1;
210
+ }
211
+ }
212
+ }
213
+ recall = recall / K;
214
+ mean_recall = mean_recall + recall;
215
+ }
216
+ }
217
+
218
+
219
+ int main(int argc, char** argv) {
220
+ uint32_t dataset_size = 1000000;
221
+ uint32_t dataset_dimension = 128;
222
+
223
+ // We skip the random data generation, but you can do that with std::mt19937, std::random_device
224
+ // and std::normal_distribution
225
+ // std::vector<float> dataset_to_index;
226
+
227
+ uint32_t max_edges_per_node = 32;
228
+ uint32_t ef_construction = 100;
229
+
230
+ // Create an index with l2 distance
231
+ auto distance = SquaredL2Distance<>::create(dataset_dimension);
232
+ auto* index = new Index<SquaredL2Distance<DataType::float32>, int>(
233
+ /* dist = */ std::move(distance), /* dataset_size = */ dataset_size,
234
+ /* max_edges_per_node = */ max_edges_per_node);
235
+
236
+ index->setNumThreads(/* num_threads = */ 16);
237
+
238
+ std::vector<int> labels(dataset_size);
239
+ std::iota(labels.begin(), labels.end(), 0);
240
+ index->template addBatch<float>(/* data = */ (void *)dataset_to_index,
241
+ /* labels = */ labels,
242
+ /* ef_construction */ ef_construction);
243
+
244
+ // Now query the index and compute the recall
245
+ // We assume you have a ground truth (int*) array and a queries (float*) array
246
+ uint32_t ef_search = 100;
247
+ uint32_t k = 100;
248
+ uint32_t num_queries = 1000;
249
+ uint32_t num_gtruth = 1000;
250
+
251
+ // Query the index and compute the recall.
252
+ run_knn_search(index, queries, gtruth, ef_search, k, num_queries, num_gtruth, dataset_dimension);
253
+ }
254
+
255
+ ```
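The recall bookkeeping in `run_knn_search` can also be written in a few lines of Python, which may be convenient when evaluating results returned by the Python bindings. This is a sketch under stated assumptions: `recall_at_k` and `mean_recall` are hypothetical helper names, labels are assumed to be unique within each result list, and, unlike the C++ listing, the per-query recalls are averaged at the end.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k labels found among the first k retrieved labels."""
    hits = len(set(retrieved[:k]) & set(ground_truth[:k]))
    return hits / k


def mean_recall(all_retrieved, all_ground_truth, k):
    """Average recall@k over all queries."""
    recalls = [
        recall_at_k(retrieved, truth, k)
        for retrieved, truth in zip(all_retrieved, all_ground_truth)
    ]
    return sum(recalls) / len(recalls)


# Two toy queries with k = 2: the first is fully correct, the second misses both.
print(mean_recall([[1, 2], [5, 6]], [[2, 1], [7, 8]], k=2))
```

Using set intersection avoids the nested loop over `K` while counting the same matches when labels are unique.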
256
+
257
+ ### Datasets from ANN-Benchmarks
258
+
259
+ ANN-Benchmarks provides HDF5 files for a standard benchmark of near-neighbor datasets, queries and ground-truth results. To index any of these datasets you can use the `construct_npy.cpp` and `query_npy.cpp` files linked above.
260
+
261
+ To generate the [ANNS benchmark datasets](https://github.com/erikbern/ann-benchmarks?tab=readme-ov-file#data-sets), run the following script
262
+
263
+ ```shell
264
+ $ ./bin/download_anns_datasets.sh <dataset-name> [--normalize]
265
+ ```
266
+
267
+ For datasets that use angular/cosine similarity, you will need to pass the `--normalize` option so that the distances are computed correctly.
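One general reason normalization matters for angular datasets: for unit-length vectors, squared L2 distance and cosine similarity are related by ||a - b||^2 = 2 - 2 cos(a, b), so both rank neighbors identically. The sketch below checks this identity in plain Python; it illustrates the math only and is not a description of what the download script does internally. The helper names are made up for this example.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = normalize([3.0, 4.0])
b = normalize([1.0, 0.0])

# On unit vectors the two quantities differ only by an affine transformation.
print(squared_l2(a, b), 2 - 2 * cosine_similarity(a, b))
```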
268
+
269
+ Available dataset names include:
270
+
271
+ ```shell
272
+ - mnist-784-euclidean
273
+ - sift-128-euclidean
274
+ - glove-25-angular
275
+ - glove-50-angular
276
+ - glove-100-angular
277
+ - glove-200-angular
278
+ - deep-image-96-angular
279
+ - gist-960-euclidean
280
+ - nytimes-256-angular
281
+ ```
282
+
283
+ ### Experimental API and Future Extensions
284
+
285
+ You can find work that is currently in development under the [development-features](https://github.com/BlaiseMuhirwa/flatnav/blob/main/development-features) directory.
286
+ While some of these features may be usable, they are not guaranteed to be stable. Stable features are expected to be part of the PyPI releases.
287
+ The most notable ongoing extension is product quantization.
288
+
289
+ ## Citation
290
+ If you find this library useful, please consider citing our associated paper:
291
+
292
+ ```
293
+ @article{munyampirwa2024down,
294
+ title={Down with the Hierarchy: The 'H' in HNSW Stands for "Hubs"},
295
+ author={Munyampirwa, Blaise and Lakshman, Vihan and Coleman, Benjamin},
296
+ journal={arXiv preprint arXiv:2412.01940},
297
+ year={2024}
298
+ }
299
+ ```
@@ -0,0 +1,6 @@
1
+ flatnav/__init__.py,sha256=Rrz04JKfpOf0cP1T8yy_sfT2z5dL9HMrnL9lsIzCCBg,579
2
+ flatnav/_core.cpython-38-darwin.so,sha256=oxJWP5Pjk_V9RAf67YRg5e5nuk5y3nsV0w_i18CjH6s,682920
3
+ flatnav-0.1.2.dist-info/METADATA,sha256=2Zsista1n0hTA5dqRQj6KwxySEAQnOjk78w3FKGen2s,11855
4
+ flatnav-0.1.2.dist-info/WHEEL,sha256=1Q0ilh_UmUoDz5aBvEtcYwUkqD8WzqUHpTXCjcN17Ek,104
5
+ flatnav-0.1.2.dist-info/top_level.txt,sha256=FVUKVYK356G2MlNoIaTtjmGUzJNV_2wLRmcHtuSUP3Y,8
6
+ flatnav-0.1.2.dist-info/RECORD,,
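Each `RECORD` line above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed, and the size is in bytes (the `RECORD` file's own entry leaves both fields empty). The following sketch shows how such an entry can be recomputed for verification; `record_entry` is a hypothetical helper name, not part of any packaging tool.

```python
import base64
import hashlib


def record_entry(path, data):
    """Build a RECORD-style line for a file path and its raw bytes."""
    digest = hashlib.sha256(data).digest()
    # URL-safe base64 without '=' padding, as used in wheel RECORD files.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


# Example with an empty file; for a real check, read the bytes of the
# installed file and compare the result with the matching RECORD line.
print(record_entry("empty.txt", b""))
```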
@@ -0,0 +1,5 @@
1
+ Wheel-Version: 1.0
2
+ Generator: skbuild 0.18.1
3
+ Root-Is-Purelib: false
4
+ Tag: cp38-cp38-macosx_10_14_x86_64
5
+
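The `Tag` value in the `WHEEL` file combines a Python tag, an ABI tag, and a platform tag, separated by hyphens. As a rough illustration, the sketch below splits the tag shown above into its parts; `parse_wheel_tag` is a hypothetical helper, and the extra macOS breakdown assumes a platform tag of the form `macosx_<major>_<minor>_<arch>`.

```python
def parse_wheel_tag(tag):
    """Split a 'python-abi-platform' compatibility tag into named parts."""
    python_tag, abi_tag, platform_tag = tag.split("-", 2)
    parsed = {"python": python_tag, "abi": abi_tag, "platform": platform_tag}
    if platform_tag.startswith("macosx_"):
        # e.g. macosx_10_14_x86_64 -> minimum macOS version 10.14, arch x86_64
        _, major, minor, arch = platform_tag.split("_", 3)
        parsed["min_macos"] = (int(major), int(minor))
        parsed["arch"] = arch
    return parsed


info = parse_wheel_tag("cp38-cp38-macosx_10_14_x86_64")
print(info)
```

Here `cp38` indicates CPython 3.8 for both the interpreter and the ABI, which is consistent with the `_core.cpython-38-darwin.so` extension listed in `RECORD`.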
@@ -0,0 +1 @@
1
+ flatnav