mrpt 2.0__tar.gz

mrpt-2.0/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2017 developers of MRPT
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
mrpt-2.0/PKG-INFO ADDED
@@ -0,0 +1,182 @@
1
+ Metadata-Version: 2.2
2
+ Name: mrpt
3
+ Version: 2.0
4
+ Summary: Fast nearest neighbor search with random projection
5
+ Home-page: http://github.com/vioshyvo/mrpt
6
+ Author: Ville Hyvönen
7
+ Author-email: ville.o.hyvonen@helsinki.fi
8
+ License: MIT
9
+ Keywords: vector search,approximate nearest neighbor search
10
+ Classifier: Development Status :: 5 - Production/Stable
11
+ Classifier: License :: OSI Approved :: MIT License
12
+ Classifier: Programming Language :: C++
13
+ Classifier: Programming Language :: Python
14
+ Classifier: Topic :: Database :: Database Engines/Servers
15
+ Classifier: Programming Language :: Python :: 3.9
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Classifier: Programming Language :: Python :: 3.13
20
+ Classifier: Operating System :: MacOS
21
+ Classifier: Operating System :: POSIX :: Linux
22
+ Classifier: Operating System :: Microsoft :: Windows
23
+ Description-Content-Type: text/markdown
24
+ License-File: LICENSE.txt
25
+ Requires-Dist: numpy
26
+ Dynamic: author
27
+ Dynamic: author-email
28
+ Dynamic: classifier
29
+ Dynamic: description
30
+ Dynamic: description-content-type
31
+ Dynamic: home-page
32
+ Dynamic: keywords
33
+ Dynamic: license
34
+ Dynamic: requires-dist
35
+ Dynamic: summary
36
+
37
+ # MRPT - fast nearest neighbor search with random projection
38
+
39
+ ![Fifty shades of green](docs/img/voting-candidates2.png)
40
+
41
+ [![Documentation](https://img.shields.io/badge/api-reference-blue.svg)](http://vioshyvo.github.io/mrpt/html/index.html)
42
+
43
+ MRPT is a lightweight and easy-to-use library for approximate nearest neighbor search. It is written in C++11 and has Python bindings. The index building has an integrated hyperparameter tuning algorithm, so the only hyperparameter required to construct the index is the target recall level!
44
+
45
+ According to [our experiments](https://github.com/ejaasaari/mrpt-comparison/) MRPT is one of the fastest libraries for approximate nearest neighbor search.
46
+
47
+ In the offline phase of the algorithm, MRPT indexes the data with a collection of *random projection trees*. In the online phase, the index structure allows queries to be answered quickly. A detailed description of the algorithm, including its time and space complexities, and the aforementioned comparisons can be found in [our article](https://www.cs.helsinki.fi/u/ttonteri/pub/bigdata2016.pdf), which was published at the IEEE International Conference on Big Data 2016.
48
+
49
+ The algorithm for automatic hyperparameter tuning is described in detail in our article for the Pacific-Asia Conference on Knowledge Discovery and Data Mining 2019 ([arXiv preprint](https://arxiv.org/abs/1812.07484)).
50
+
51
+ Currently, Euclidean distance is the only supported distance metric.
52
+
53
+ The tests for MRPT are in a separate [repo](https://github.com/vioshyvo/RP-test).
54
+ ## New
55
+
56
+ - Release [MRPT 2.0.0](https://github.com/vioshyvo/mrpt/releases/tag/release-2.0.0) : pip package and bug fixes. (2025/02/28)
57
+
58
+ - Release [MRPT 1.1.1](https://github.com/vioshyvo/mrpt/releases/tag/release-1.1.1) : faster autotuning and bug fixes. (2018/12/07)
59
+
60
+ - Release [MRPT 1.1.0](https://github.com/vioshyvo/mrpt/releases/tag/release-1.1.0) : autotuning now also works without a separate set of test queries. (2018/11/24)
61
+
62
+ - Release [MRPT 1.0.0](https://github.com/vioshyvo/mrpt/releases) (2018/11/22)
63
+
64
+ - Add [documentation](http://vioshyvo.github.io/mrpt/html/index.html) for C++ API (2018/11/22)
65
+
66
+ - Add index building with autotuning: no more manual hyperparameter tuning! (2018/11/21)
67
+
68
+ ## Python installation
69
+
70
+ Install the module with `pip install mrpt`
71
+
72
+ On macOS, it is highly recommended to use the Homebrew version of Clang as the compiler:
73
+ ```
74
+ brew install llvm libomp
75
+ CC=/opt/homebrew/opt/llvm/bin/clang CXX=/opt/homebrew/opt/llvm/bin/clang++ LDFLAGS=-L/opt/homebrew/opt/llvm/lib pip install mrpt
76
+ ```
77
+
78
+ ## Minimal examples
79
+
80
+ ### Python
81
+
82
+ This example first generates a 200-dimensional data set of 10000 points and a single query point. The `exact_search` function can be used to find the indices of the true 10 nearest neighbors of the query.
83
+
84
+ The `build_autotune_sample` function then builds an index for approximate k-nn search; it uses automatic parameter tuning, so only the target recall level (90% in this example) and the number of neighbors searched for have to be specified.
85
+
86
+ ```python
87
+ import mrpt
88
+ import numpy as np
89
+
90
+ n, d, k = 10000, 200, 10
91
+ target_recall = 0.9
92
+
93
+ data = np.random.rand(n, d).astype(np.float32)
94
+ q = np.random.rand(d).astype(np.float32)
95
+
96
+ index = mrpt.MRPTIndex(data)
97
+ print(index.exact_search(q, k, return_distances=False))
98
+
99
+ index.build_autotune_sample(target_recall, k)
100
+ print(index.ann(q, return_distances=False))
101
+ ```
102
+
103
+ The approximate nearest neighbors are then searched by the function `ann`; because the index was autotuned, no arguments other than the query point are required.
104
+
105
+ Here is a sample output:
106
+ ```
107
+ [9738 5033 6520 2108 9216 9164 112 1442 1871 8020]
108
+ [9738 5033 6520 2108 9216 9164 112 1442 1871 6789]
109
+ ```
110
+
111
+ ### C++
112
+
113
+ MRPT is a header-only library, so it does not need to be built separately: just include the header `cpp/Mrpt.h`. The only dependency is the Eigen linear algebra library (Eigen 3.3.5 is bundled in `cpp/lib`), so with g++ the minimal example below can be compiled, for instance, as follows:
114
+ ```
115
+ g++ -std=c++14 -Ofast -march=native -Icpp -Icpp/lib ex1.cpp -o ex1 -fopenmp -lgomp
116
+ ```
117
+
118
+ Let's first generate a 200-dimensional data set of 10000 points, and a query point (row = dimension, column = data point). Then `Mrpt::exact_knn` can be used to find the indices of the true 10 nearest neighbors of the test query.
119
+
120
+ The `grow_autotune` function builds an index for approximate k-nn search; it uses automatic parameter tuning, so only the target recall level (90% in this example) and the number of neighbors searched for have to be specified. This version automatically samples a test set of 100 query points from the data set to tune the parameters, so no separate test set is required.
121
+
122
+ ```c++
123
+ #include <iostream>
124
+ #include "Eigen/Dense"
125
+ #include "Mrpt.h"
126
+
127
+ int main() {
128
+ int n = 10000, d = 200, k = 10;
129
+ double target_recall = 0.9;
130
+ Eigen::MatrixXf X = Eigen::MatrixXf::Random(d, n);
131
+ Eigen::VectorXf q = Eigen::VectorXf::Random(d);
132
+
133
+ Eigen::VectorXi indices(k), indices_exact(k);
134
+
135
+ Mrpt::exact_knn(q, X, k, indices_exact.data());
136
+ std::cout << indices_exact.transpose() << std::endl;
137
+
138
+ Mrpt mrpt(X);
139
+ mrpt.grow_autotune(target_recall, k);
140
+
141
+ mrpt.query(q, indices.data());
142
+ std::cout << indices.transpose() << std::endl;
143
+ }
144
+ ```
145
+
146
+ The approximate nearest neighbors are then searched by the function `query`; because the index was autotuned, no arguments other than a query point and an output buffer for indices are required.
147
+
148
+ Here is a sample output:
149
+ ```
150
+ 8108 1465 6963 2165 83 5900 662 8112 3592 5505
151
+ 8108 1465 6963 2165 83 5900 8112 3592 5505 7992
152
+ ```
153
+ The approximate nearest neighbor search found 9 of the 10 true nearest neighbors, so this time the observed recall happened to match the target recall exactly (results vary between runs because the algorithm is randomized).
154
+
155
+ ## Citation
156
+ Automatic hyperparameter tuning:
157
+ ~~~~
158
+ @inproceedings{Jaasaari2019,
159
+ title={Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search},
160
+ author={J{\"a}{\"a}saari, Elias and Hyv{\"o}nen, Ville and Roos, Teemu},
161
+ booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining},
162
+ pages={In press},
163
+ year={2019},
164
+ organization={Springer}
165
+ }
166
+ ~~~~
167
+
168
+ MRPT algorithm:
169
+ ~~~~
170
+ @inproceedings{Hyvonen2016,
171
+ title={Fast nearest neighbor search through sparse random projections and voting},
172
+ author={Hyv{\"o}nen, Ville and Pitk{\"a}nen, Teemu and Tasoulis, Sotiris and J{\"a}{\"a}saari, Elias and Tuomainen, Risto and Wang, Liang and Corander, Jukka and Roos, Teemu},
173
+ booktitle={Big Data (Big Data), 2016 IEEE International Conference on},
174
+ pages={881--888},
175
+ year={2016},
176
+ organization={IEEE}
177
+ }
178
+ ~~~~
179
+
180
+ ## License
181
+
182
+ MRPT is available under the MIT License (see [LICENSE.txt](LICENSE.txt)). Note that third-party libraries in the cpp/lib folder may be distributed under other open source licenses. The Eigen library is licensed under the MPL2.
mrpt-2.0/README.md ADDED
@@ -0,0 +1,146 @@
1
+ # MRPT - fast nearest neighbor search with random projection
2
+
3
+ ![Fifty shades of green](docs/img/voting-candidates2.png)
4
+
5
+ [![Documentation](https://img.shields.io/badge/api-reference-blue.svg)](http://vioshyvo.github.io/mrpt/html/index.html)
6
+
7
+ MRPT is a lightweight and easy-to-use library for approximate nearest neighbor search. It is written in C++11 and has Python bindings. The index building has an integrated hyperparameter tuning algorithm, so the only hyperparameter required to construct the index is the target recall level!
8
+
9
+ According to [our experiments](https://github.com/ejaasaari/mrpt-comparison/) MRPT is one of the fastest libraries for approximate nearest neighbor search.
10
+
11
+ In the offline phase of the algorithm, MRPT indexes the data with a collection of *random projection trees*. In the online phase, the index structure allows queries to be answered quickly. A detailed description of the algorithm, including its time and space complexities, and the aforementioned comparisons can be found in [our article](https://www.cs.helsinki.fi/u/ttonteri/pub/bigdata2016.pdf), which was published at the IEEE International Conference on Big Data 2016.
12
+
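To make the idea of indexing with random projection trees more concrete, here is a minimal pure-Python sketch. It is a simplified illustration and not MRPT's actual implementation: the helper names (`dot`, `build_tree`, `leaf_for`, `vote`), the dense Gaussian projection directions, the median split, and the fixed depth are all assumptions chosen for readability. Each tree projects the points onto a random direction, splits them at the median, and recurses; a query is routed to one leaf per tree, and points that share a leaf with the query collect votes as candidate neighbors.

```python
import random
from collections import Counter


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def build_tree(points, indices, depth, rng):
    """Recursively split `indices` at the median of a random 1-D projection."""
    if depth == 0 or len(indices) <= 1:
        return {"leaf": list(indices)}
    # Hypothetical choice: one dense Gaussian direction per node.
    direction = [rng.gauss(0.0, 1.0) for _ in range(len(points[0]))]
    proj = {i: dot(points[i], direction) for i in indices}
    ordered = sorted(indices, key=proj.__getitem__)
    mid = len(ordered) // 2
    return {
        "direction": direction,
        "split": proj[ordered[mid]],  # projections below this value go left
        "left": build_tree(points, ordered[:mid], depth - 1, rng),
        "right": build_tree(points, ordered[mid:], depth - 1, rng),
    }


def leaf_for(tree, q):
    """Route the query down one tree and return the indices in its leaf."""
    while "leaf" not in tree:
        side = "left" if dot(q, tree["direction"]) < tree["split"] else "right"
        tree = tree[side]
    return tree["leaf"]


def vote(trees, q):
    """Count, for every data point, in how many trees it shares a leaf with q."""
    votes = Counter()
    for tree in trees:
        votes.update(leaf_for(tree, q))
    return votes


rng = random.Random(0)
points = [[rng.random() for _ in range(8)] for _ in range(200)]
trees = [build_tree(points, list(range(len(points))), 4, rng) for _ in range(5)]
votes = vote(trees, points[0])
print(votes.most_common(3))  # the points that most often share a leaf with the query
```

In this sketch only the points with enough votes would be compared to the query exactly, which is what keeps the online phase cheap compared with scanning the whole data set.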
13
+ The algorithm for automatic hyperparameter tuning is described in detail in our article for the Pacific-Asia Conference on Knowledge Discovery and Data Mining 2019 ([arXiv preprint](https://arxiv.org/abs/1812.07484)).
14
+
15
+ Currently, Euclidean distance is the only supported distance metric.
16
+
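For reference, the exact Euclidean nearest neighbors that an approximate search is measured against can be found with a brute-force scan. The sketch below uses only the Python standard library; the function name `brute_force_knn` and the toy data are hypothetical and are not part of the MRPT API.

```python
import heapq
import math


def brute_force_knn(data, q, k):
    """Return the indices of the k points in `data` closest to `q` (Euclidean)."""
    return heapq.nsmallest(k, range(len(data)), key=lambda i: math.dist(data[i], q))


data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
print(brute_force_knn(data, [0.9, 0.1], 2))  # [1, 0]
```

This scan touches every data point for every query, which is the cost an approximate index is meant to avoid.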
17
+ The tests for MRPT are in a separate [repo](https://github.com/vioshyvo/RP-test).
18
+ ## New
19
+
20
+ - Release [MRPT 2.0.0](https://github.com/vioshyvo/mrpt/releases/tag/release-2.0.0) : pip package and bug fixes. (2025/02/28)
21
+
22
+ - Release [MRPT 1.1.1](https://github.com/vioshyvo/mrpt/releases/tag/release-1.1.1) : faster autotuning and bug fixes. (2018/12/07)
23
+
24
+ - Release [MRPT 1.1.0](https://github.com/vioshyvo/mrpt/releases/tag/release-1.1.0) : autotuning now also works without a separate set of test queries. (2018/11/24)
25
+
26
+ - Release [MRPT 1.0.0](https://github.com/vioshyvo/mrpt/releases) (2018/11/22)
27
+
28
+ - Add [documentation](http://vioshyvo.github.io/mrpt/html/index.html) for C++ API (2018/11/22)
29
+
30
+ - Add index building with autotuning: no more manual hyperparameter tuning! (2018/11/21)
31
+
32
+ ## Python installation
33
+
34
+ Install the module with `pip install mrpt`
35
+
36
+ On macOS, it is highly recommended to use the Homebrew version of Clang as the compiler:
37
+ ```
38
+ brew install llvm libomp
39
+ CC=/opt/homebrew/opt/llvm/bin/clang CXX=/opt/homebrew/opt/llvm/bin/clang++ LDFLAGS=-L/opt/homebrew/opt/llvm/lib pip install mrpt
40
+ ```
41
+
42
+ ## Minimal examples
43
+
44
+ ### Python
45
+
46
+ This example first generates a 200-dimensional data set of 10000 points and a single query point. The `exact_search` function can be used to find the indices of the true 10 nearest neighbors of the query.
47
+
48
+ The `build_autotune_sample` function then builds an index for approximate k-nn search; it uses automatic parameter tuning, so only the target recall level (90% in this example) and the number of neighbors searched for have to be specified.
49
+
50
+ ```python
51
+ import mrpt
52
+ import numpy as np
53
+
54
+ n, d, k = 10000, 200, 10
55
+ target_recall = 0.9
56
+
57
+ data = np.random.rand(n, d).astype(np.float32)
58
+ q = np.random.rand(d).astype(np.float32)
59
+
60
+ index = mrpt.MRPTIndex(data)
61
+ print(index.exact_search(q, k, return_distances=False))
62
+
63
+ index.build_autotune_sample(target_recall, k)
64
+ print(index.ann(q, return_distances=False))
65
+ ```
66
+
67
+ The approximate nearest neighbors are then searched by the function `ann`; because the index was autotuned, no arguments other than the query point are required.
68
+
69
+ Here is a sample output:
70
+ ```
71
+ [9738 5033 6520 2108 9216 9164 112 1442 1871 8020]
72
+ [9738 5033 6520 2108 9216 9164 112 1442 1871 6789]
73
+ ```
74
+
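The recall of a single query is the fraction of the true nearest neighbors that the approximate search returned. A minimal sketch, using the two index lists from the sample output above (the helper name `recall` is hypothetical and not part of the MRPT API):

```python
def recall(exact, approx):
    """Fraction of the exact neighbors that also appear in the approximate result."""
    return len(set(exact) & set(approx)) / len(exact)


exact = [9738, 5033, 6520, 2108, 9216, 9164, 112, 1442, 1871, 8020]
approx = [9738, 5033, 6520, 2108, 9216, 9164, 112, 1442, 1871, 6789]
print(recall(exact, approx))  # 0.9
```

Here 9 of the 10 true neighbors were found, so the recall for this query is 0.9. Individual queries can land above or below the target because the algorithm is randomized.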
75
+ ### C++
76
+
77
+ MRPT is a header-only library, so it does not need to be built separately: just include the header `cpp/Mrpt.h`. The only dependency is the Eigen linear algebra library (Eigen 3.3.5 is bundled in `cpp/lib`), so with g++ the minimal example below can be compiled, for instance, as follows:
78
+ ```
79
+ g++ -std=c++14 -Ofast -march=native -Icpp -Icpp/lib ex1.cpp -o ex1 -fopenmp -lgomp
80
+ ```
81
+
82
+ Let's first generate a 200-dimensional data set of 10000 points, and a query point (row = dimension, column = data point). Then `Mrpt::exact_knn` can be used to find the indices of the true 10 nearest neighbors of the test query.
83
+
84
+ The `grow_autotune` function builds an index for approximate k-nn search; it uses automatic parameter tuning, so only the target recall level (90% in this example) and the number of neighbors searched for have to be specified. This version automatically samples a test set of 100 query points from the data set to tune the parameters, so no separate test set is required.
85
+
86
+ ```c++
87
+ #include <iostream>
88
+ #include "Eigen/Dense"
89
+ #include "Mrpt.h"
90
+
91
+ int main() {
92
+ int n = 10000, d = 200, k = 10;
93
+ double target_recall = 0.9;
94
+ Eigen::MatrixXf X = Eigen::MatrixXf::Random(d, n);
95
+ Eigen::VectorXf q = Eigen::VectorXf::Random(d);
96
+
97
+ Eigen::VectorXi indices(k), indices_exact(k);
98
+
99
+ Mrpt::exact_knn(q, X, k, indices_exact.data());
100
+ std::cout << indices_exact.transpose() << std::endl;
101
+
102
+ Mrpt mrpt(X);
103
+ mrpt.grow_autotune(target_recall, k);
104
+
105
+ mrpt.query(q, indices.data());
106
+ std::cout << indices.transpose() << std::endl;
107
+ }
108
+ ```
109
+
110
+ The approximate nearest neighbors are then searched by the function `query`; because the index was autotuned, no arguments other than a query point and an output buffer for indices are required.
111
+
112
+ Here is a sample output:
113
+ ```
114
+ 8108 1465 6963 2165 83 5900 662 8112 3592 5505
115
+ 8108 1465 6963 2165 83 5900 8112 3592 5505 7992
116
+ ```
117
+ The approximate nearest neighbor search found 9 of the 10 true nearest neighbors, so this time the observed recall happened to match the target recall exactly (results vary between runs because the algorithm is randomized).
118
+
119
+ ## Citation
120
+ Automatic hyperparameter tuning:
121
+ ~~~~
122
+ @inproceedings{Jaasaari2019,
123
+ title={Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search},
124
+ author={J{\"a}{\"a}saari, Elias and Hyv{\"o}nen, Ville and Roos, Teemu},
125
+ booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining},
126
+ pages={In press},
127
+ year={2019},
128
+ organization={Springer}
129
+ }
130
+ ~~~~
131
+
132
+ MRPT algorithm:
133
+ ~~~~
134
+ @inproceedings{Hyvonen2016,
135
+ title={Fast nearest neighbor search through sparse random projections and voting},
136
+ author={Hyv{\"o}nen, Ville and Pitk{\"a}nen, Teemu and Tasoulis, Sotiris and J{\"a}{\"a}saari, Elias and Tuomainen, Risto and Wang, Liang and Corander, Jukka and Roos, Teemu},
137
+ booktitle={Big Data (Big Data), 2016 IEEE International Conference on},
138
+ pages={881--888},
139
+ year={2016},
140
+ organization={IEEE}
141
+ }
142
+ ~~~~
143
+
144
+ ## License
145
+
146
+ MRPT is available under the MIT License (see [LICENSE.txt](LICENSE.txt)). Note that third-party libraries in the cpp/lib folder may be distributed under other open source licenses. The Eigen library is licensed under the MPL2.