native-vector-store 0.1.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (53)
  1. package/README.md +242 -12
  2. package/binding.gyp +22 -10
  3. package/deps/simdjson/simdjson.cpp +56403 -0
  4. package/deps/simdjson/simdjson.h +123534 -0
  5. package/docs/PERFORMANCE_CASE_STUDY.md +130 -0
  6. package/docs/PREBUILDS.md +69 -0
  7. package/docs/VectorStore.html +180 -0
  8. package/docs/VectorStoreWrapper.html +1356 -0
  9. package/docs/fonts/OpenSans-Bold-webfont.eot +0 -0
  10. package/docs/fonts/OpenSans-Bold-webfont.svg +1830 -0
  11. package/docs/fonts/OpenSans-Bold-webfont.woff +0 -0
  12. package/docs/fonts/OpenSans-BoldItalic-webfont.eot +0 -0
  13. package/docs/fonts/OpenSans-BoldItalic-webfont.svg +1830 -0
  14. package/docs/fonts/OpenSans-BoldItalic-webfont.woff +0 -0
  15. package/docs/fonts/OpenSans-Italic-webfont.eot +0 -0
  16. package/docs/fonts/OpenSans-Italic-webfont.svg +1830 -0
  17. package/docs/fonts/OpenSans-Italic-webfont.woff +0 -0
  18. package/docs/fonts/OpenSans-Light-webfont.eot +0 -0
  19. package/docs/fonts/OpenSans-Light-webfont.svg +1831 -0
  20. package/docs/fonts/OpenSans-Light-webfont.woff +0 -0
  21. package/docs/fonts/OpenSans-LightItalic-webfont.eot +0 -0
  22. package/docs/fonts/OpenSans-LightItalic-webfont.svg +1835 -0
  23. package/docs/fonts/OpenSans-LightItalic-webfont.woff +0 -0
  24. package/docs/fonts/OpenSans-Regular-webfont.eot +0 -0
  25. package/docs/fonts/OpenSans-Regular-webfont.svg +1831 -0
  26. package/docs/fonts/OpenSans-Regular-webfont.woff +0 -0
  27. package/docs/global.html +561 -0
  28. package/docs/index.html +570 -0
  29. package/docs/scripts/linenumber.js +25 -0
  30. package/docs/scripts/prettify/Apache-License-2.0.txt +202 -0
  31. package/docs/scripts/prettify/lang-css.js +2 -0
  32. package/docs/scripts/prettify/prettify.js +28 -0
  33. package/docs/styles/jsdoc-default.css +358 -0
  34. package/docs/styles/prettify-jsdoc.css +111 -0
  35. package/docs/styles/prettify-tomorrow.css +132 -0
  36. package/index.js +162 -0
  37. package/package.json +30 -7
  38. package/prebuilds/darwin-arm64/native-vector-store.node +0 -0
  39. package/prebuilds/darwin-x64/native-vector-store.node +0 -0
  40. package/prebuilds/linux-arm64/native-vector-store.node +0 -0
  41. package/prebuilds/linux-x64/native-vector-store.node +0 -0
  42. package/prebuilds/linux-x64-musl/napi-v9/native-vector-store.node +0 -0
  43. package/prebuilds/linux-x64-musl/native-vector-store.node +0 -0
  44. package/prebuilds/win32-x64/native-vector-store.node +0 -0
  45. package/src/Makefile +87 -0
  46. package/src/test_main.cpp +173 -0
  47. package/src/test_stress.cpp +394 -0
  48. package/src/vector_store.cpp +344 -0
  49. package/src/vector_store.h +21 -323
  50. package/native-vector-store-0.1.0.tgz +0 -0
  51. package/scripts/build-prebuilds.sh +0 -23
  52. /package/{src → deps/atomic_queue}/atomic_queue.h +0 -0
  53. /package/{src → deps/atomic_queue}/defs.h +0 -0
@@ -0,0 +1,130 @@
1
+ # Performance Case Study: Adaptive File Loading in Native Vector Store
2
+
3
+ ## Executive Summary
4
+
5
+ We made loading in our native vector store substantially faster by implementing an adaptive file loading strategy that picks the I/O method for each file based on its size. For directories of many small files, loading became up to **3x faster** (2.98x in the benchmarks below), while large files load as fast as before and users need no extra configuration.
6
+
7
+ ## Background
8
+
9
+ Our vector store loads JSON documents containing embeddings from disk. The initial implementation used standard buffered file I/O, which performed well but left room for improvement, especially for directories containing many files of varying sizes.
10
+
11
+ ## The Challenge
12
+
13
+ We discovered that optimal file loading strategies vary dramatically based on file characteristics:
14
+ - **Large files (>5MB)**: Sequential reads with pre-allocated buffers perform best
15
+ - **Small files (<5MB)**: Memory-mapped I/O significantly reduces overhead
16
+
17
+ ## Implementation Journey
18
+
19
+ ### Phase 1: Baseline Optimization
20
+ First, we optimized the standard loader:
21
+ - Pre-allocated reusable buffers (1MB initial, grows as needed)
22
+ - Used `std::filesystem::file_size()` to avoid redundant syscalls
23
+ - Implemented producer-consumer pattern with lock-free queues
24
+
25
+ **Result**: ~10-15% improvement over the naive implementation
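The Phase 1 buffer reuse can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual loader: the class name `ReusableFileReader` is hypothetical, and the sketch assumes each worker thread owns one reader, so the buffer is never shared between threads.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: one buffer per worker thread, reused across files
// and grown only when a file is larger than the current buffer.
class ReusableFileReader {
public:
    explicit ReusableFileReader(std::size_t initial_capacity = 1 << 20)  // 1MB
        : buffer_(initial_capacity) {}

    // Reads the whole file into the reused buffer and returns the byte count.
    // The data stays valid until the next call to read().
    std::size_t read(const std::filesystem::path& path) {
        // Learn the size up front so the buffer is resized at most once per file.
        const auto size = static_cast<std::size_t>(std::filesystem::file_size(path));
        if (size > buffer_.size()) {
            buffer_.resize(size);  // grow only when needed; never shrink
        }
        std::ifstream in(path, std::ios::binary);
        if (!in) throw std::runtime_error("cannot open " + path.string());
        in.read(buffer_.data(), static_cast<std::streamsize>(size));
        if (static_cast<std::size_t>(in.gcount()) != size) {
            throw std::runtime_error("short read on " + path.string());
        }
        return size;
    }

    const char* data() const { return buffer_.data(); }
    std::size_t capacity() const { return buffer_.size(); }

private:
    std::vector<char> buffer_;
};
```

Because the buffer only grows, a worker that has already seen its largest file performs no further allocations.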
26
+
27
+ ### Phase 2: Memory-Mapped I/O
28
+ We added memory-mapped file support:
29
+ - Zero-copy access to file data
30
+ - Cross-platform support (mmap on POSIX, MapViewOfFile on Windows)
31
+ - Eliminated buffer allocation overhead
32
+
33
+ **Result**: Mixed: faster for small files, slower for large files
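Below is a minimal POSIX sketch of the zero-copy access described in Phase 2. `MappedFile` is a hypothetical name, not the package's type. The Windows path (`CreateFileMapping`/`MapViewOfFile`) is omitted, and error handling is reduced to exceptions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper: the file is mapped read-only and its bytes are
// accessed in place, so no read() call copies them into a user buffer.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed: " + path);
        struct stat st {};
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed: " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ > 0) {  // mmap rejects zero-length mappings
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("mmap failed: " + path);
            }
            data_ = static_cast<const char*>(p);
        }
        ::close(fd);  // the mapping remains valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_) ::munmap(const_cast<char*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    // View of the mapped bytes; valid only while this object is alive.
    std::string_view view() const { return {data_, size_}; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

The page cache backs the mapping, which is where the OS-managed caching and prefetching mentioned later come from.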
34
+
35
+ ### Phase 3: Adaptive Strategy
36
+ The key insight was that no single approach works best for all cases:
37
+
38
+ ```cpp
39
+ // Adaptive loader chooses the best method per file
40
+ constexpr size_t SIZE_THRESHOLD = 5 * 1024 * 1024; // 5MB
41
+
42
+ for (const auto& file_info : file_infos) {
43
+ if (file_info.size < SIZE_THRESHOLD) {
44
+ // Use memory mapping for small files
45
+ load_with_mmap(file_info);
46
+ } else {
47
+ // Use standard I/O for large files
48
+ load_with_standard_io(file_info);
49
+ }
50
+ }
51
+ ```
52
+
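Here is one hedged way to make the loop above self-contained: the size-based decision plus the directory scan that feeds it. `LoadStrategy`, `choose_strategy`, `FileInfo`, and `scan_json_files` are hypothetical names. Filtering on the `.json` extension is an assumption about which files the loader reads.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <vector>

// Same threshold as the case study.
constexpr std::uintmax_t SIZE_THRESHOLD = 5 * 1024 * 1024;  // 5MB

enum class LoadStrategy { MemoryMapped, StandardIO };

// Files strictly below the threshold are memory-mapped; files at or above it
// use standard I/O, matching the `<` comparison in the loop above.
constexpr LoadStrategy choose_strategy(std::uintmax_t file_size) {
    return file_size < SIZE_THRESHOLD ? LoadStrategy::MemoryMapped
                                      : LoadStrategy::StandardIO;
}

struct FileInfo {
    std::filesystem::path path;
    std::uintmax_t size;
};

// Records each .json file's size once during the directory scan, so the
// per-file decision afterwards is a single comparison.
std::vector<FileInfo> scan_json_files(const std::filesystem::path& dir) {
    std::vector<FileInfo> infos;
    for (const auto& entry : std::filesystem::directory_iterator(dir)) {
        if (entry.is_regular_file() && entry.path().extension() == ".json") {
            infos.push_back({entry.path(), entry.file_size()});
        }
    }
    return infos;
}
```

Note the boundary: a file of exactly 5MB goes to standard I/O because the comparison is strict.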
53
+ ## Benchmark Results
54
+
55
+ ### Test Dataset 1: Large Files (2 files, ~340MB total)
56
+ | Method | Load Time | Relative Performance |
57
+ |--------|-----------|---------------------|
58
+ | Standard Loader | 731ms | 1.0x (baseline) |
59
+ | Memory-Mapped | 1070ms | 0.68x (slower) |
60
+ | **Adaptive** | **735ms** | **0.99x** |
61
+
62
+ ### Test Dataset 2: Partitioned Files (66 files, ~340MB total)
63
+ | Method | Load Time | Relative Performance |
64
+ |--------|-----------|---------------------|
65
+ | Standard Loader | 415ms | 1.0x (baseline) |
66
+ | Memory-Mapped | 283ms | 1.47x (faster) |
67
+ | **Adaptive** | **278ms** | **1.49x (faster)** |
68
+
69
+ ### Test Dataset 3: Small Files (465 files, ~45MB total)
70
+ | Method | Load Time | Relative Performance |
71
+ |--------|-----------|---------------------|
72
+ | Standard Loader | 146ms | 1.0x (baseline) |
73
+ | Memory-Mapped | 51ms | 2.86x (faster) |
74
+ | **Adaptive** | **49ms** | **2.98x (faster)** |
75
+
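In all three tables, Relative Performance is the Standard Loader time divided by each method's time, so values above 1.0x mean faster. The helper below is only an arithmetic check of that definition against the published numbers. The function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Relative performance = baseline time / method time (higher is faster).
constexpr double relative_performance(double baseline_ms, double method_ms) {
    return baseline_ms / method_ms;
}

// Rounds to two decimal places, the precision used in the tables.
inline double round2(double x) { return std::round(x * 100.0) / 100.0; }
```

For example, the small-file dataset gives 146 / 49 ≈ 2.98, which is the source of the "up to 3x" figure. The large-file dataset gives 731 / 1070 ≈ 0.68 for memory mapping.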
76
+ ## Key Findings
77
+
78
+ 1. **File size matters more than total data volume**
79
+ - Memory mapping excels with many small files
80
+ - Standard I/O wins for few large files
81
+
82
+ 2. **A 5MB threshold works well for these datasets**
83
+ - Below 5MB: Memory mapping eliminates per-file overhead
84
+ - Above 5MB: simdjson's padding requirement forces a copy, which cancels out the benefits of mmap
85
+
86
+ 3. **Adaptive loading provides consistent best performance**
87
+ - Automatically selects optimal strategy
88
+ - No configuration required
89
+ - Negligible decision overhead (<1μs per file)
90
+
91
+ ## Technical Details
92
+
93
+ ### Why Memory Mapping Helps Small Files
94
+ - Eliminates buffer allocation (saves ~1-2ms per file)
95
+ - OS handles caching and prefetching
96
+ - Reduces memory copies
97
+
98
+ ### Why Standard I/O Helps Large Files
99
+ - simdjson requires padded input, so mapped data would have to be copied into a padded buffer anyway
100
+ - Sequential reads are highly optimized by OS
101
+ - A single large allocation is cheaper than the overhead of mapping the file
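simdjson may read a little past the end of a document while scanning with SIMD instructions, so it requires extra readable bytes after the input. A memory mapping offers no guaranteed readable slack after the file's last byte, which forces a copy. The padding point can be illustrated without simdjson itself. The sketch assumes a padding of 64 bytes, mirroring `SIMDJSON_PADDING` in recent simdjson releases. `make_padded_copy` is a hypothetical helper. In practice, simdjson's own `simdjson::padded_string` type fills this role.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>
#include <vector>

// Assumption: mirrors SIMDJSON_PADDING; this sketch does not include simdjson.
constexpr std::size_t kJsonPadding = 64;

// Copies the JSON bytes into a buffer with zero-filled padding appended, so a
// parser that reads in SIMD-width chunks can safely overrun the logical end.
std::vector<char> make_padded_copy(std::string_view json) {
    std::vector<char> buf(json.size() + kJsonPadding, '\0');
    if (!json.empty()) {
        std::memcpy(buf.data(), json.data(), json.size());
    }
    return buf;
}
```

For a large mapped file, this copy touches every byte a second time. A plain `read` can target a padded buffer directly, which is consistent with standard I/O winning above the threshold.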
102
+
103
+ ### Thread Safety Considerations
104
+ - Both strategies use the same producer-consumer pattern
105
+ - Lock-free atomic queues for work distribution
106
+ - No overlapping OpenMP regions (prevents TSAN warnings)
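Lock-free work distribution can be illustrated with a simpler scheme than a queue: workers claim file indices through an atomic counter. This is a swapped-in technique for illustration, not the package's `atomic_queue`-based code. It also uses `std::thread` rather than OpenMP, and `process_in_parallel` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Workers claim indices with fetch_add instead of taking a lock, so each
// index in [0, item_count) is handed to exactly one thread.
void process_in_parallel(std::size_t item_count, unsigned thread_count,
                         const std::function<void(std::size_t)>& work) {
    assert(thread_count > 0);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> workers;
    workers.reserve(thread_count);
    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back([&] {
            for (std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
                 i < item_count;
                 i = next.fetch_add(1, std::memory_order_relaxed)) {
                work(i);
            }
        });
    }
    // join() makes every worker's writes visible to the caller.
    for (auto& w : workers) w.join();
}
```

Only one parallel region runs at a time here, which matches the "no overlapping regions" constraint above.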
107
+
108
+ ## Usage
109
+
110
+ The adaptive loader is now the default:
111
+
112
+ ```javascript
113
+ const store = new VectorStore(1536);
114
+ store.loadDir('./documents'); // Automatically uses adaptive strategy
115
+ ```
116
+
117
+ For specific use cases, individual strategies remain available:
118
+ ```javascript
119
+ store.loadDirMMap('./documents'); // Force memory mapping
120
+ store.loadDirAdaptive('./documents'); // Explicit adaptive
121
+ ```
122
+
123
+ ## Conclusion
124
+
125
+ By implementing an adaptive loading strategy, we achieved:
126
+ - **Up to 3x faster loading** for directories of many small files
127
+ - **Zero configuration** - it just works
128
+ - **Consistent performance** across diverse file distributions
129
+
130
+ The lesson: Sometimes the best optimization is knowing when to use which technique. Our adaptive loader makes this decision automatically, giving users optimal performance without complexity.
@@ -0,0 +1,69 @@
1
+ # Prebuilt Binaries
2
+
3
+ This package includes prebuilt binaries for common platforms to make installation easier. Users don't need build tools or system dependencies when installing from npm if their platform is supported.
4
+
5
+ ## Supported Platforms
6
+
7
+ Prebuilds are automatically created for:
8
+ - **Linux**: x64, arm64, and x64 musl (e.g., Alpine)
9
+ - **macOS**: x64 (Intel), arm64 (Apple Silicon)
10
+ - **Windows**: x64
11
+
12
+ ## How It Works
13
+
14
+ 1. When users run `npm install native-vector-store`, the install script uses `node-gyp-build`
15
+ 2. `node-gyp-build` checks if a prebuild exists for the current platform
16
+ 3. If found, it uses the prebuild (fast, no compilation needed)
17
+ 4. If not found, it falls back to building from source
18
+
19
+ ## Building Prebuilds
20
+
21
+ ### Locally (for current platform)
22
+ ```bash
23
+ npm run prebuildify
24
+ ```
25
+
26
+ ### For all platforms (using GitHub Actions)
27
+ 1. Push a tag starting with 'v' (e.g., v0.1.0)
28
+ 2. GitHub Actions will automatically build for all platforms
29
+ 3. Prebuilds will be attached to the GitHub release
30
+
31
+ ### Manual trigger
32
+ You can also manually trigger the prebuild workflow from the Actions tab on GitHub.
33
+
34
+ ## Including Prebuilds in npm Package
35
+
36
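+ Prebuilds present in the `prebuilds/` directory are included when you run `npm publish`, as long as that directory is not excluded from the package (see Troubleshooting). The directory structure is: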
37
+ ```
38
+ native-vector-store/
39
+ ├── prebuilds/
40
+ │ ├── linux-x64/
41
+ │ ├── linux-arm64/
42
+ │ ├── darwin-x64/
43
+ │ ├── darwin-arm64/
44
+ │ └── win32-x64/
45
+ └── ... other files
46
+ ```
47
+
48
+ ## Fallback Behavior
49
+
50
+ If a prebuild isn't available, users will need:
51
+ - C++17-compatible compiler
52
+ - simdjson library
53
+ - OpenMP support
54
+ - Python and build tools
55
+
56
+ ## Testing Prebuilds
57
+
58
+ After building prebuilds:
59
+ ```bash
60
+ # Test that it loads correctly
61
+ node -e "console.log(require('.'))"
62
+ ```
63
+
64
+ ## Troubleshooting
65
+
66
+ If prebuilds aren't working:
67
+ 1. Check that `node-gyp-build` is in `dependencies` (not `devDependencies`)
68
+ 2. Ensure the `prebuilds/` directory is not listed in `.npmignore` (and, if `package.json` has a `files` field, that it includes `prebuilds/`)
69
+ 3. Verify the binaries are in `prebuilds/<platform>-<arch>/` directories with file names that `node-gyp-build` recognizes
@@ -0,0 +1,180 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="utf-8">
5
+ <title>JSDoc: Class: VectorStore</title>
6
+
7
+ <script src="scripts/prettify/prettify.js"> </script>
8
+ <script src="scripts/prettify/lang-css.js"> </script>
9
+ <!--[if lt IE 9]>
10
+ <script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
11
+ <![endif]-->
12
+ <link type="text/css" rel="stylesheet" href="styles/prettify-tomorrow.css">
13
+ <link type="text/css" rel="stylesheet" href="styles/jsdoc-default.css">
14
+ </head>
15
+
16
+ <body>
17
+
18
+ <div id="main">
19
+
20
+ <h1 class="page-title">Class: VectorStore</h1>
21
+
22
+
23
+
24
+
25
+
26
+
27
+ <section>
28
+
29
+ <header>
30
+
31
+ <h2><span class="attribs"><span class="type-signature"></span></span>VectorStore<span class="signature">()</span><span class="type-signature"></span></h2>
32
+
33
+
34
+ </header>
35
+
36
+ <article>
37
+ <div class="container-overview">
38
+
39
+
40
+
41
+
42
+
43
+
44
+ <h4 class="name" id="VectorStore"><span class="type-signature"></span>new VectorStore<span class="signature">()</span><span class="type-signature"></span></h4>
45
+
46
+
47
+
48
+
49
+
50
+
51
+ <div class="description">
52
+ <p>High-performance vector store with SIMD optimization for similarity search.
53
+ Designed for immutable, one-time loading scenarios with fast searches over focused corpora.</p>
54
+ </div>
55
+
56
+
57
+
58
+
59
+
60
+
61
+
62
+
63
+
64
+
65
+
66
+
67
+
68
+ <dl class="details">
69
+
70
+
71
+
72
+
73
+
74
+
75
+
76
+
77
+
78
+
79
+
80
+
81
+
82
+
83
+
84
+
85
+
86
+
87
+
88
+
89
+
90
+
91
+
92
+
93
+
94
+
95
+
96
+
97
+
98
+
99
+
100
+
101
+ </dl>
102
+
103
+
104
+
105
+
106
+
107
+
108
+
109
+
110
+
111
+
112
+
113
+
114
+
115
+
116
+
117
+
118
+
119
+
120
+
121
+ <h5>Examples</h5>
122
+
123
+ <pre class="prettyprint"><code>// Basic usage
124
+ const store = new VectorStore(1536);
125
+ store.loadDir('./documents');
126
+ const results = store.search(queryEmbedding, 10);</code></pre>
127
+
128
+ <pre class="prettyprint"><code>// Multiple domain-specific stores
129
+ const productStore = new VectorStore(1536);
130
+ const supportStore = new VectorStore(1536);
131
+ productStore.loadDir('./knowledge/products');
132
+ supportStore.loadDir('./knowledge/support');</code></pre>
133
+
134
+
135
+
136
+
137
+ </div>
138
+
139
+
140
+
141
+
142
+
143
+
144
+
145
+
146
+
147
+
148
+
149
+
150
+
151
+
152
+
153
+
154
+
155
+
156
+
157
+
158
+ </article>
159
+
160
+ </section>
161
+
162
+
163
+
164
+
165
+ </div>
166
+
167
+ <nav>
168
+ <h2><a href="index.html">Home</a></h2><h3>Classes</h3><ul><li><a href="VectorStore.html">VectorStore</a></li><li><a href="VectorStoreWrapper.html">VectorStoreWrapper</a></li></ul><h3><a href="global.html">Global</a></h3>
169
+ </nav>
170
+
171
+ <br class="clear">
172
+
173
+ <footer>
174
+ Documentation generated by <a href="https://github.com/jsdoc/jsdoc">JSDoc 4.0.4</a>
175
+ </footer>
176
+
177
+ <script> prettyPrint(); </script>
178
+ <script src="scripts/linenumber.js"> </script>
179
+ </body>
180
+ </html>