@llm-dev-ops/agentics-cli 2.7.38 → 2.7.39
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/pipeline/auto-chain.d.ts.map +1 -1
- package/dist/pipeline/auto-chain.js +82 -6
- package/dist/pipeline/auto-chain.js.map +1 -1
- package/dist/pipeline/exemplars.d.ts +34 -0
- package/dist/pipeline/exemplars.d.ts.map +1 -0
- package/dist/pipeline/exemplars.js +101 -0
- package/dist/pipeline/exemplars.js.map +1 -0
- package/dist/pipeline/output-validator.d.ts +38 -0
- package/dist/pipeline/output-validator.d.ts.map +1 -0
- package/dist/pipeline/output-validator.js +152 -0
- package/dist/pipeline/output-validator.js.map +1 -0
- package/dist/pipeline/phase2/phases/adr-generator.d.ts.map +1 -1
- package/dist/pipeline/phase2/phases/adr-generator.js +21 -3
- package/dist/pipeline/phase2/phases/adr-generator.js.map +1 -1
- package/dist/pipeline/phase4-adrs/phase4-adrs-coordinator.d.ts.map +1 -1
- package/dist/pipeline/phase4-adrs/phase4-adrs-coordinator.js +17 -5
- package/dist/pipeline/phase4-adrs/phase4-adrs-coordinator.js.map +1 -1
- package/dist/pipeline/ruflo-phase-executor.d.ts.map +1 -1
- package/dist/pipeline/ruflo-phase-executor.js +52 -60
- package/dist/pipeline/ruflo-phase-executor.js.map +1 -1
- package/docs/templates/ADR-Good-Example.md +787 -0
- package/docs/templates/Implementation-Prompts-Good-Example.md +1158 -0
- package/docs/templates/promotion-changelog.md +94 -0
- package/docs/templates/regression-changelog.md +86 -0
- package/docs/templates/sparc-specification-good-example.md +600 -0
- package/package.json +2 -1
|
@@ -0,0 +1,787 @@
# ADR-001: Ruvector Core Architecture

**Status**: Proposed
**Date**: 2026-01-18
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
**SDK**: Claude-Flow

**Note**: The storage layer described in this ADR is superseded by ADR-029 (RVF as Canonical Binary Format). All vector persistence now uses the RVF segment model.

## Version History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-18 | ruv.io | Initial architecture proposal |

---

## Context

### The Vector Database Challenge

Modern AI applications require vector databases that can:

1. **Store high-dimensional embeddings** from LLMs and embedding models
2. **Search with sub-millisecond latency** for real-time inference
3. **Scale to billions of vectors** while maintaining performance
4. **Deploy anywhere** - edge devices, browsers (WASM), cloud servers
5. **Integrate seamlessly** with LLM inference pipelines

### Current State of Vector Databases

Existing solutions fall into several categories:

| Category | Examples | Limitations |
|----------|----------|-------------|
| **Cloud-only** | Pinecone | No edge deployment, vendor lock-in |
| **Heavy native** | Milvus, Qdrant | Complex deployment, high memory |
| **Python-first** | ChromaDB, FAISS | Performance overhead, no WASM |
| **Learning-capable** | None | No existing solutions learn from usage |

### The Ruvector Vision

Ruvector is designed as a **high-performance, learning-capable vector database** implemented in Rust that:

- Achieves **61us p50 latency** for k=10 search on 384-dim vectors
- Provides **2-32x memory compression** through tiered quantization
- Runs **anywhere** - native (x86_64, ARM64), WASM (browser, edge), PostgreSQL extension
- **Learns from usage** via GNN layers that improve search quality over time
- Integrates with **AI agent memory systems** for policy, session state, and audit logs

---

## Decision

### Adopt a Layered, SIMD-Optimized Architecture

We implement ruvector-core as the foundational vector database engine with the following architecture:

```
+-----------------------------------------------------------------------------+
|                              APPLICATION LAYER                              |
|  AgenticDB | VectorDB API | Cypher Queries | REST/gRPC Server               |
+-----------------------------------------------------------------------------+
                                       |
+-----------------------------------------------------------------------------+
|                                 INDEX LAYER                                 |
|  HNSW Index | Flat Index | Filtered Search | Hybrid Search | MMR            |
+-----------------------------------------------------------------------------+
                                       |
+-----------------------------------------------------------------------------+
|                             QUANTIZATION LAYER                              |
|  Scalar (4x) | Product (8-16x) | Binary (32x) | Conformal Prediction        |
+-----------------------------------------------------------------------------+
                                       |
+-----------------------------------------------------------------------------+
|                               DISTANCE LAYER                                |
|  Euclidean | Cosine | Dot Product | Manhattan | SIMD Dispatch               |
+-----------------------------------------------------------------------------+
                                       |
+-----------------------------------------------------------------------------+
|                            SIMD INTRINSICS LAYER                            |
|  AVX2/AVX-512 (x86_64) | NEON (ARM64/Apple Silicon) | Scalar Fallback       |
+-----------------------------------------------------------------------------+
                                       |
+-----------------------------------------------------------------------------+
|                                STORAGE LAYER                                |
|  REDB (native) | Memory-only (WASM) | PostgreSQL Extension                  |
+-----------------------------------------------------------------------------+
```

---

## Key Components

### 1. SIMD Intrinsics Layer (`simd_intrinsics.rs`)

The performance foundation of ruvector, providing hardware-accelerated distance calculations.

#### Architecture Dispatch

```rust
pub fn euclidean_distance_simd(a: &[f32], b: &[f32]) -> f32 {
    // x86_64: detect AVX2 at runtime and fall back to scalar code without it.
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            unsafe { euclidean_distance_avx2_impl(a, b) }
        } else {
            euclidean_distance_scalar(a, b)
        }
    }

    // aarch64: NEON is a baseline feature, so no runtime check is needed.
    #[cfg(target_arch = "aarch64")]
    {
        unsafe { euclidean_distance_neon_impl(a, b) }
    }

    // Every other target (including WASM) uses the scalar implementation.
    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    {
        euclidean_distance_scalar(a, b)
    }
}
```

#### Supported Operations

| Operation | AVX2 (x86_64) | NEON (ARM64) | Scalar Fallback |
|-----------|---------------|--------------|-----------------|
| Euclidean Distance | 8 floats/cycle | 4 floats/cycle | 1 float/cycle |
| Dot Product | 8 floats/cycle | 4 floats/cycle | 1 float/cycle |
| Cosine Similarity | 8 floats/cycle | 4 floats/cycle | 1 float/cycle |
| Manhattan Distance | N/A | 4 floats/cycle | 1 float/cycle |

#### Performance Characteristics

| Metric | AVX2 | NEON | Scalar |
|--------|------|------|--------|
| **512-dim Euclidean** | ~16M ops/sec | ~8M ops/sec | ~2M ops/sec |
| **1536-dim Cosine** | ~143ns | ~200ns | ~800ns |
| **384-dim Dot Product** | ~33ns | ~50ns | ~150ns |

#### Security Guarantees

- Length checking via `assert_eq!(a.len(), b.len())` prevents out-of-bounds reads
- Unaligned loads (`_mm256_loadu_ps`, `vld1q_f32`) handle arbitrary alignment
- Scalar fallback handles remainder elements after SIMD processing
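
The three guarantees above can be illustrated without any intrinsics. The following is a minimal sketch, not the ruvector-core implementation: `euclidean_distance_chunked` is a hypothetical name, and the 8-lane inner loop only stands in for what an AVX2 kernel would do with one 256-bit register. It checks lengths up front, works on full 8-element chunks, and finishes the leftover elements with a scalar loop.

```rust
/// Portable stand-in for a SIMD kernel (hypothetical; not taken from ruvector-core).
pub fn euclidean_distance_chunked(a: &[f32], b: &[f32]) -> f32 {
    // Length check first: mismatched inputs panic instead of reading out of bounds.
    assert_eq!(a.len(), b.len(), "vectors must have equal length");

    const LANES: usize = 8; // one 256-bit register holds 8 f32 values
    let mut acc = [0.0f32; LANES];

    let a_chunks = a.chunks_exact(LANES);
    let b_chunks = b.chunks_exact(LANES);
    // Elements that do not fill a whole chunk are handled separately below.
    let a_rem = a_chunks.remainder();
    let b_rem = b_chunks.remainder();

    // "Vector" part: accumulate squared differences lane by lane.
    for (ca, cb) in a_chunks.zip(b_chunks) {
        for i in 0..LANES {
            let d = ca[i] - cb[i];
            acc[i] += d * d;
        }
    }

    // Horizontal sum of the lanes, then the scalar remainder loop.
    let mut sum: f32 = acc.iter().sum();
    for (x, y) in a_rem.iter().zip(b_rem) {
        let d = x - y;
        sum += d * d;
    }
    sum.sqrt()
}
```

With a 10-element input, the first 8 elements go through the chunked loop and the last 2 through the remainder loop, which is the same split a real AVX2 kernel would make.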
### 2. Distance Metrics Layer (`distance.rs`)

High-level distance API with optional SimSIMD integration for additional acceleration.

#### Supported Metrics

```rust
pub enum DistanceMetric {
    Euclidean,   // L2 distance: sqrt(sum((a[i] - b[i])^2))
    Cosine,      // 1 - cosine_similarity
    DotProduct,  // Negative dot product (for maximization)
    Manhattan,   // L1 distance: sum(|a[i] - b[i]|)
}
```
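
The formulas in the enum comments translate directly into scalar code. The sketch below is an illustration under assumptions: it redefines `DistanceMetric` locally so it compiles on its own, its `distance` function returns a plain `f32` rather than whatever result type the crate uses, and treating a zero-norm vector as maximally distant under cosine is a choice made here, not something the source specifies.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DistanceMetric {
    Euclidean,
    Cosine,
    DotProduct,
    Manhattan,
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Scalar reference implementation of the four metrics (illustrative only).
pub fn distance(a: &[f32], b: &[f32], metric: DistanceMetric) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    match metric {
        // sqrt(sum((a[i] - b[i])^2))
        DistanceMetric::Euclidean => a
            .iter()
            .zip(b)
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt(),
        // 1 - cosine_similarity; zero-norm input is an assumption (distance 1.0).
        DistanceMetric::Cosine => {
            let denom = dot(a, a).sqrt() * dot(b, b).sqrt();
            if denom == 0.0 { 1.0 } else { 1.0 - dot(a, b) / denom }
        }
        // Negated so that a larger dot product means a smaller "distance".
        DistanceMetric::DotProduct => -dot(a, b),
        // sum(|a[i] - b[i]|)
        DistanceMetric::Manhattan => a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum::<f32>(),
    }
}
```

Negating the dot product lets every metric share the same "smaller is closer" ordering, so index code can sort results without special-casing one metric.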
#### Feature Flags

| Feature | Description | Use Case |
|---------|-------------|----------|
| `simd` | SimSIMD acceleration | Native builds |
| `parallel` | Rayon batch processing | Multi-core systems |
| None | Pure Rust fallback | WASM builds |

#### Batch Distance API

```rust
pub fn batch_distances(
    query: &[f32],
    vectors: &[Vec<f32>],
    metric: DistanceMetric,
) -> Result<Vec<f32>> {
    #[cfg(all(feature = "parallel", not(target_arch = "wasm32")))]
    {
        use rayon::prelude::*;
        vectors.par_iter()
            .map(|v| distance(query, v, metric))
            .collect()
    }
    // Sequential fallback for WASM...
}
```

### 3. Index Structures (`index/`)

#### HNSW Index (`index/hnsw.rs`)

Hierarchical Navigable Small World graph for approximate nearest neighbor search.

**Configuration Parameters:**

| Parameter | Default | Description |
|-----------|---------|-------------|
| `m` | 32 | Connections per layer (higher = better recall, more memory) |
| `ef_construction` | 200 | Build-time search depth (higher = better graph, slower build) |
| `ef_search` | 100 | Query-time search depth (higher = better recall, slower query) |
| `max_elements` | 10M | Pre-allocated capacity |

**Complexity Analysis:**

| Operation | Time Complexity | Space Complexity |
|-----------|-----------------|------------------|
| Insert | O(log n * m * ef_construction) | O(m * log n) per vector |
| Search | O(log n * m * ef_search) | O(ef_search) |
| Delete | O(1)* | O(1) |

*Note: HNSW deletion marks vectors as removed but does not restructure the graph.

**Serialization:**

```rust
pub struct HnswState {
    vectors: Vec<(String, Vec<f32>)>,
    id_to_idx: Vec<(String, usize)>,
    idx_to_id: Vec<(usize, String)>,
    next_idx: usize,
    config: SerializableHnswConfig,
    dimensions: usize,
    metric: SerializableDistanceMetric,
}
```

#### Flat Index

Linear scan index for small datasets or exact search.

**Use Cases:**
- Datasets < 10K vectors
- Exact k-NN required
- Benchmarking HNSW recall
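
To make the exact-search and recall-benchmarking use cases concrete, here is a minimal sketch of a linear scan and a recall@k measure. Both `flat_search` and `recall_at_k` are hypothetical helper names introduced for illustration, and squared Euclidean distance is assumed as the metric; neither is taken from the `index/flat` module.

```rust
/// Exact k-NN by linear scan using squared Euclidean distance.
/// Returns corpus indices sorted nearest-first (hypothetical helper).
pub fn flat_search(corpus: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = corpus
        .iter()
        .enumerate()
        .map(|(i, v)| {
            let d = v.iter().zip(query).map(|(x, q)| (x - q) * (x - q)).sum::<f32>();
            (i, d)
        })
        .collect();
    // total_cmp gives a total order on f32, so NaN cannot break the sort.
    scored.sort_by(|l, r| l.1.total_cmp(&r.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

/// recall@k: fraction of the exact top-k that the approximate result also contains.
pub fn recall_at_k(exact: &[usize], approx: &[usize]) -> f32 {
    if exact.is_empty() {
        return 1.0;
    }
    let hits = exact.iter().filter(|id| approx.contains(*id)).count();
    hits as f32 / exact.len() as f32
}
```

In a benchmark, the flat index result would serve as the `exact` ground truth and the HNSW result for the same query as `approx`.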
### 4. Quantization Strategies (`quantization.rs`)

Memory compression techniques trading precision for storage efficiency.

#### Scalar Quantization (4x compression)

Quantizes f32 to u8 using min-max scaling.

```rust
pub struct ScalarQuantized {
    pub data: Vec<u8>,  // Quantized values
    pub min: f32,       // Minimum for dequantization
    pub scale: f32,     // Scale factor
}
```

**Characteristics:**
- Compression: 4x (f32 -> u8)
- Distance calculation: Uses average scale for symmetric distance
- Reconstruction error: < 0.4% for typical embedding distributions
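
Min-max scaling can be sketched in a few lines. The version below mirrors the struct fields shown above but is an illustration only: the guard for constant vectors and the rounding mode are assumptions, and the real `quantize` and `reconstruct` may differ.

```rust
pub struct ScalarQuantized {
    pub data: Vec<u8>,
    pub min: f32,
    pub scale: f32,
}

impl ScalarQuantized {
    /// Map each value from [min, max] onto the 256 levels of a u8.
    pub fn quantize(v: &[f32]) -> Self {
        let min = v.iter().copied().fold(f32::INFINITY, f32::min);
        let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        // Assumption: a constant (or empty) vector gets scale 1.0 to avoid dividing by zero.
        let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
        let data = v.iter().map(|x| ((x - min) / scale).round() as u8).collect();
        Self { data, min, scale }
    }

    /// Invert the mapping; each value is recovered to within about scale / 2.
    pub fn reconstruct(&self) -> Vec<f32> {
        self.data.iter().map(|&q| self.min + q as f32 * self.scale).collect()
    }
}
```

Each f32 (4 bytes) becomes one u8, which is where the 4x figure comes from; the two extra f32 fields are a fixed per-vector overhead.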
#### Product Quantization (8-16x compression)

Divides vectors into subspaces, each quantized independently via k-means codebooks.

```rust
pub struct ProductQuantized {
    pub codes: Vec<u8>,                 // One code per subspace
    pub codebooks: Vec<Vec<Vec<f32>>>,  // Learned centroids
}
```

**Training:**
- K-means clustering on subspace vectors
- Codebook size typically 256 (fits in u8)
- Iterations: 10-100 for convergence
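
Once codebooks exist, encoding a vector means picking the nearest centroid in each subspace. The sketch below assumes the codebooks are already trained and that the vector length divides evenly by the number of subspaces; `pq_encode` is a hypothetical free function, not part of the crate's API. As a worked example of the ratio: a 384-dim f32 vector is 1,536 bytes, so 96 one-byte codes give 16x and 192 codes give 8x, ignoring the shared codebook storage.

```rust
/// Encode `vector` as one centroid index per subspace (hypothetical helper).
/// `codebooks[s]` holds the centroids for subspace `s`.
pub fn pq_encode(vector: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<u8> {
    assert!(!codebooks.is_empty(), "need at least one subspace");
    assert!(
        !vector.is_empty() && vector.len() % codebooks.len() == 0,
        "vector length must be a non-zero multiple of the subspace count"
    );
    let sub_dim = vector.len() / codebooks.len();

    vector
        .chunks_exact(sub_dim)
        .zip(codebooks)
        .map(|(sub, book)| {
            // At most 256 centroids so that every index fits in a u8.
            assert!(!book.is_empty() && book.len() <= 256);
            let mut best = (0usize, f32::INFINITY);
            for (idx, centroid) in book.iter().enumerate() {
                assert_eq!(centroid.len(), sub_dim);
                let d: f32 = sub
                    .iter()
                    .zip(centroid)
                    .map(|(x, c)| (x - c) * (x - c))
                    .sum();
                if d < best.1 {
                    best = (idx, d);
                }
            }
            best.0 as u8
        })
        .collect()
}
```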
#### Binary Quantization (32x compression)

Single-bit representation based on sign.

```rust
pub struct BinaryQuantized {
    pub bits: Vec<u8>,      // Packed bits (8 dimensions per byte)
    pub dimensions: usize,
}
```

**Characteristics:**
- Compression: 32x (f32 -> 1 bit)
- Distance: Hamming distance (XOR + popcount)
- Best for: Filtering stage before exact distance on candidates
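
Sign-based packing and the XOR + popcount distance fit in a short sketch. It reuses the field names from the struct above, but the bit order within a byte and the choice to map zero to a 0 bit are assumptions made for this example.

```rust
pub struct BinaryQuantized {
    pub bits: Vec<u8>,
    pub dimensions: usize,
}

impl BinaryQuantized {
    /// Set bit i when v[i] is strictly positive (assumed threshold).
    pub fn quantize(v: &[f32]) -> Self {
        let mut bits = vec![0u8; (v.len() + 7) / 8];
        for (i, &x) in v.iter().enumerate() {
            if x > 0.0 {
                bits[i / 8] |= 1 << (i % 8);
            }
        }
        Self { bits, dimensions: v.len() }
    }

    /// Hamming distance: XOR the bytes, then count the differing bits.
    pub fn hamming(&self, other: &Self) -> u32 {
        assert_eq!(self.dimensions, other.dimensions);
        self.bits
            .iter()
            .zip(&other.bits)
            .map(|(a, b)| (a ^ b).count_ones())
            .sum()
    }
}
```

Because unused padding bits in the last byte are zero in both operands, they never contribute to the distance.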
#### Tiered Compression Strategy

Ruvector automatically manages compression based on access patterns:

| Access Frequency | Format | Compression | Latency |
|-----------------|--------|-------------|---------|
| Hot (>80%) | f32 | 1x | Instant |
| Warm (40-80%) | f16 | 2x | ~1us |
| Cool (10-40%) | Scalar | 4x | ~10us |
| Cold (1-10%) | Product | 8-16x | ~100us |
| Archive (<1%) | Binary | 32x | ~1ms |
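
The table above amounts to a threshold lookup. The sketch below shows one way to express it; `Tier` and `select_tier` are hypothetical names, the input is assumed to be an access frequency already normalized to the range 0.0 to 1.0, and the handling of values that sit exactly on a boundary is a choice made here because the table's ranges share their endpoints.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Tier {
    Hot,     // f32, 1x
    Warm,    // f16, 2x
    Cool,    // scalar quantization, 4x
    Cold,    // product quantization, 8-16x
    Archive, // binary quantization, 32x
}

/// Pick a storage tier from a normalized access frequency (hypothetical helper).
pub fn select_tier(access_frequency: f32) -> Tier {
    match access_frequency {
        f if f > 0.80 => Tier::Hot,
        f if f >= 0.40 => Tier::Warm,
        f if f >= 0.10 => Tier::Cool,
        f if f >= 0.01 => Tier::Cold,
        _ => Tier::Archive,
    }
}
```

How the frequency itself is measured (time window, decay, per-vector or per-segment) is not described in the source and is left out of the sketch.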
### 5. Memory Management

#### Arena Allocator (`arena.rs`)

Bump allocator for batch operations, reducing allocation overhead.

#### Lock-Free Structures (`lockfree.rs`)

- Crossbeam-based concurrent data structures
- Lock-free queues for batch ingestion
- Available only with the `parallel` feature (not on WASM)

#### Cache-Optimized Operations (`cache_optimized.rs`)

- Prefetching hints for sequential access
- Cache-line aligned storage
- NUMA-aware allocation on supported platforms

### 6. Storage Layer (`storage.rs`)

#### Native Storage (REDB)

- ACID transactions
- Memory-mapped vectors
- Configuration persistence
- Connection pooling for multiple VectorDB instances

```rust
const VECTORS_TABLE: TableDefinition<&str, &[u8]> = TableDefinition::new("vectors");
const METADATA_TABLE: TableDefinition<&str, &str> = TableDefinition::new("metadata");
const CONFIG_TABLE: TableDefinition<&str, &str> = TableDefinition::new("config");
```

**Security:**
- Path traversal protection
- Validates that relative paths do not escape the working directory

#### Memory-Only Storage (`storage_memory.rs`)

- Pure in-memory for WASM
- No persistence
- DashMap for concurrent access

---

## Integration Points

### 1. Policy Memory Store

Ruvector serves as the backing store for AI agent policy memory:

```
+-------------------+       +-------------------+       +-------------------+
| AI Agent          |       | Policy Memory     |       | ruvector-core     |
|                   | ----> | (AgenticDB)       | ----> |                   |
| "What action for  |       | Search similar    |       | HNSW search       |
| this situation?"  |       | past situations   |       | with metadata     |
+-------------------+       +-------------------+       +-------------------+
```

**Use Cases:**
- Q-learning state-action lookups
- Contextual bandit policy retrieval
- Episodic memory for reasoning

### 2. Session State Index

Real-time session context for conversational AI:

```
+-------------------+       +-------------------+       +-------------------+
| Chat Session      |       | Session Index     |       | ruvector-core     |
|                   | ----> |                   | ----> |                   |
| Current context   |       | Find relevant     |       | Cosine similarity |
| embedding         |       | past turns        |       | top-k search      |
+-------------------+       +-------------------+       +-------------------+
```

**Requirements:**
- < 10ms latency for interactive use
- Session isolation via namespaces
- TTL-based cleanup

### 3. Witness Log for Audit

Cryptographically linked audit trail:

```
+-------------------+       +-------------------+       +-------------------+
| Agent Action      |       | Witness Log       |       | ruvector-core     |
|                   | ----> |                   | ----> |                   |
| Action embedding  |       | Store with hash   |       | Append-only       |
| + metadata        |       | chain reference   |       | with timestamps   |
+-------------------+       +-------------------+       +-------------------+
```

**Properties:**
- Immutable entries
- Hash-chain linking
- Semantic searchability
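
Hash-chain linking means each entry stores the hash of its predecessor, so changing any earlier entry invalidates everything after it. The sketch below illustrates only that linking and verification logic. It is not the witness log implementation: `WitnessEntry`, `WitnessLog`, and `link_hash` are hypothetical names, timestamps and embeddings are omitted, and the standard library's `DefaultHasher` is used purely to keep the example dependency-free. It is not a cryptographic hash; a real audit log would need one, such as SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub struct WitnessEntry {
    pub payload: String,
    pub prev_hash: u64, // hash of the previous entry (0 for the first)
    pub hash: u64,      // hash over (prev_hash, payload)
}

// Stand-in hash; NOT cryptographic (see the note above).
fn link_hash(prev_hash: u64, payload: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev_hash.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

#[derive(Default)]
pub struct WitnessLog {
    pub entries: Vec<WitnessEntry>,
}

impl WitnessLog {
    /// Append-only: new entries always link to the current tail.
    pub fn append(&mut self, payload: &str) {
        let prev_hash = self.entries.last().map_or(0, |e| e.hash);
        let hash = link_hash(prev_hash, payload);
        self.entries.push(WitnessEntry { payload: payload.to_string(), prev_hash, hash });
    }

    /// Recompute every link; any edited payload or broken link fails verification.
    pub fn verify(&self) -> bool {
        let mut prev = 0u64;
        for e in &self.entries {
            if e.prev_hash != prev || e.hash != link_hash(prev, &e.payload) {
                return false;
            }
            prev = e.hash;
        }
        true
    }
}
```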

---

## Decision Drivers

### 1. Performance (Sub-millisecond Latency)

| Requirement | Implementation |
|-------------|----------------|
| 61us p50 search | SIMD-optimized distance + HNSW |
| 16,400 QPS | Parallel search with Rayon |
| Batch ingestion | Lock-free queues + bulk insert |

### 2. Memory Efficiency (Quantization Support)

| Requirement | Implementation |
|-------------|----------------|
| 4x compression | Scalar quantization |
| 8-16x compression | Product quantization |
| 32x compression | Binary quantization |
| Automatic tiering | Access pattern tracking |

### 3. Cross-Platform Portability (WASM, Native)

| Platform | Features Available |
|----------|-------------------|
| x86_64 Linux/macOS | Full (SIMD, parallel, storage) |
| ARM64 macOS (Apple Silicon) | Full (NEON, parallel, storage) |
| WASM (browser) | Memory-only, scalar fallback |
| PostgreSQL extension | Full + SQL integration |

### 4. LLM Integration

| Requirement | Implementation |
|-------------|----------------|
| Embedding ingestion | API-based and local providers |
| Semantic search | Cosine/dot product metrics |
| RAG pipeline | Hybrid search + metadata filtering |

---

## Alternatives Considered

### Alternative 1: Pure Python Implementation (NumPy/FAISS)

**Rejected because:**
- 10-100x slower than Rust SIMD
- No WASM support
- GIL contention in concurrent workloads

### Alternative 2: C++ with Bindings

**Rejected because:**
- Memory safety concerns
- Complex cross-compilation
- Build system complexity (CMake)

### Alternative 3: Qdrant/Milvus Integration

**Rejected because:**
- External service dependency
- No WASM support
- Complex deployment for edge use cases

### Alternative 4: GPU-Only Acceleration (CUDA/ROCm)

**Rejected because:**
- Not portable to edge/mobile
- Driver dependencies
- Overkill for < 100M vectors

---

## Consequences

### Benefits

1. **Performance**: Sub-millisecond latency enables real-time AI applications
2. **Portability**: Single codebase runs native, WASM, and PostgreSQL
3. **Memory Efficiency**: 2-32x compression makes large datasets practical on edge
4. **Integration**: Native Rust means zero-cost abstractions for embedding in other systems
5. **Learning**: GNN layers can improve search quality without reindexing

### Risks and Mitigations

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| HNSW recall < 100% | High | Medium | ef_search tuning, hybrid with exact search |
| Quantization accuracy loss | Medium | Medium | Conformal prediction bounds |
| WASM performance gap | Medium | Low | Specialized WASM-optimized builds |
| API embeddings require external call | High | Low | Local embedding option via ONNX |

### Performance Targets

| Metric | Target | Achieved |
|--------|--------|----------|
| HNSW Search (k=10, 384-dim) | < 100us p50 | 61us |
| HNSW Search (k=100, 384-dim) | < 200us p50 | 164us |
| Cosine Distance (1536-dim) | < 200ns | 143ns |
| Dot Product (384-dim) | < 50ns | 33ns |
| Batch Distance (1000 vectors) | < 500us | 237us |
| QPS (10K vectors, k=10) | > 10K | 16,400 |

---

## Implementation Status

### Completed (v0.1.x)

| Module | Status | Description |
|--------|--------|-------------|
| `simd_intrinsics` | Complete | AVX2/NEON dispatch with scalar fallback |
| `distance` | Complete | All 4 metrics with SimSIMD integration |
| `index/hnsw` | Complete | Full HNSW with serialization |
| `index/flat` | Complete | Linear scan baseline |
| `quantization` | Complete | Scalar, Product, Binary |
| `storage` | Complete | REDB-based with connection pooling |
| `storage_memory` | Complete | In-memory for WASM |
| `types` | Complete | Core types with serde |
| `error` | Complete | Error types with thiserror |
| `vector_db` | Complete | High-level API |
| `agenticdb` | Complete | AI agent memory interface |

### Advanced Features

| Module | Status | Description |
|--------|--------|-------------|
| `advanced_features/filtered_search` | Complete | Metadata-based filtering |
| `advanced_features/hybrid_search` | Complete | Dense + sparse (BM25) |
| `advanced_features/mmr` | Complete | Maximal Marginal Relevance |
| `advanced_features/conformal_prediction` | Complete | Uncertainty quantification |
| `advanced_features/product_quantization` | Complete | Enhanced PQ with training |

### Research Features (`advanced/`)

| Module | Status | Description |
|--------|--------|-------------|
| `hypergraph` | Experimental | Hyperedge relationships |
| `learned_index` | Experimental | Neural index structures |
| `neural_hash` | Experimental | LSH with neural tuning |
| `tda` | Experimental | Topological data analysis |

---

## Feature Flags

| Feature | Default | Description |
|---------|---------|-------------|
| `default` | Yes | simd, storage, hnsw, api-embeddings, parallel |
| `simd` | Yes | SimSIMD acceleration |
| `parallel` | Yes | Rayon parallel processing |
| `storage` | Yes | REDB file-based storage |
| `hnsw` | Yes | HNSW index support |
| `api-embeddings` | Yes | HTTP-based embedding providers |
| `memory-only` | No | Pure in-memory (WASM) |
| `real-embeddings` | No | Deprecated, use api-embeddings |

---

## Dependencies

### Core Dependencies

| Dependency | Version | Purpose |
|------------|---------|---------|
| `hnsw_rs` | workspace | HNSW implementation |
| `simsimd` | workspace | SIMD distance functions |
| `rayon` | workspace | Parallel iteration |
| `redb` | workspace | Embedded database |
| `bincode` | workspace | Binary serialization |
| `dashmap` | workspace | Concurrent hash map |
| `parking_lot` | workspace | Optimized locks |

### Optional Dependencies

| Dependency | Feature | Purpose |
|------------|---------|---------|
| `reqwest` | api-embeddings | HTTP client for embedding APIs |
| `memmap2` | storage | Memory-mapped files |
| `crossbeam` | parallel | Lock-free data structures |

---

## API Examples

### Basic Vector Search

```rust
use ruvector_core::{VectorDB, DistanceMetric, HnswConfig};

// Create database
let config = HnswConfig {
    m: 32,
    ef_construction: 200,
    ef_search: 100,
    max_elements: 1_000_000,
};
let mut db = VectorDB::new(384, DistanceMetric::Cosine, config)?;

// Insert vectors
db.insert("doc_1".to_string(), vec![0.1; 384])?;
db.insert("doc_2".to_string(), vec![0.2; 384])?;

// Search
let query = vec![0.15; 384];
let results = db.search(&query, 10)?;
```

### Quantized Search

```rust
use ruvector_core::quantization::{ScalarQuantized, QuantizedVector};

// Example inputs (placeholder values)
let vector = vec![0.1_f32; 384];
let other = vec![0.2_f32; 384];

// Quantize vectors for storage
let quantized = ScalarQuantized::quantize(&vector);
let other_quantized = ScalarQuantized::quantize(&other);

// Distance in quantized space
let distance = quantized.distance(&other_quantized);

// Reconstruct if needed
let reconstructed = quantized.reconstruct();
```

### Batch Operations

```rust
use ruvector_core::DistanceMetric;
use ruvector_core::distance::batch_distances;

// Example inputs (placeholder values)
let query = vec![0.15_f32; 384];
let corpus_vectors: Vec<Vec<f32>> = vec![vec![0.1; 384], vec![0.2; 384]];

// Calculate distances to many vectors in parallel
let distances = batch_distances(
    &query,
    &corpus_vectors,
    DistanceMetric::Cosine,
)?;
```

---

## References

1. Malkov, Y., & Yashunin, D. (2018). "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." arXiv:1603.09320.

2. Jegou, H., Douze, M., & Schmid, C. (2011). "Product quantization for nearest neighbor search." IEEE TPAMI.

3. RuVector Team. "ruvector-core Benchmarks." /crates/ruvector-core/benches/

4. SimSIMD Documentation. https://github.com/ashvardanian/SimSIMD

---

## Appendix A: SIMD Register Usage

### AVX2 (256-bit registers)

```
+-------+-------+-------+-------+-------+-------+-------+-------+
|  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |
+-------+-------+-------+-------+-------+-------+-------+-------+
   [0]     [1]     [2]     [3]     [4]     [5]     [6]     [7]

Operations per cycle:
- _mm256_loadu_ps: Load 8 floats
- _mm256_sub_ps: 8 subtractions
- _mm256_mul_ps: 8 multiplications
- _mm256_add_ps: 8 additions
```

### NEON (128-bit registers)

```
+-------+-------+-------+-------+
|  f32  |  f32  |  f32  |  f32  |
+-------+-------+-------+-------+
   [0]     [1]     [2]     [3]

Operations per cycle:
- vld1q_f32: Load 4 floats
- vsubq_f32: 4 subtractions
- vfmaq_f32: 4 fused multiply-add
- vaddvq_f32: Horizontal sum
```

---

## Appendix B: Memory Layout

### VectorEntry

```
+------------------+------------------+------------------+
|  id: String      |  vector: Vec<f32>|  metadata: JSON  |
|  (optional)      |  (required)      |  (optional)      |
+------------------+------------------+------------------+
```

### HNSW Graph Structure

```
Level 3: [v0] -------- [v5]
\ /
Level 2: [v0] -- [v3] -- [v5] -- [v9]
\ / \ / \
Level 1: [v0]-[v1]-[v3]-[v4]-[v5]-[v7]-[v9]
| | | | | | |
Level 0: [v0]-[v1]-[v2]-[v3]-[v4]-[v5]-[v6]-[v7]-[v8]-[v9]
```

---

## Appendix C: Benchmark Results

### Platform: Apple M2 (ARM64 NEON)

```
HNSW Search k=10 (10K vectors, 384-dim):
  p50: 61us
  p95: 89us
  p99: 112us
  Throughput: 16,400 QPS

HNSW Search k=100 (10K vectors, 384-dim):
  p50: 164us
  p95: 203us
  p99: 245us
  Throughput: 6,100 QPS

Distance Operations (1536-dim):
  Cosine: 143ns
  Euclidean: 156ns
  Dot Product: 33ns (384-dim)

Batch Distance (1000 vectors, 384-dim):
  Parallel (Rayon): 237us
  Sequential: 890us
```

### Platform: Intel i7 (AVX2)

```
HNSW Search k=10 (10K vectors, 384-dim):
  p50: 72us
  p95: 105us
  p99: 134us
  Throughput: 13,900 QPS

Distance Operations (1536-dim):
  Cosine: 128ns
  Euclidean: 141ns
  Dot Product: 29ns (384-dim)
```

---

## Related Decisions

- **ADR-002**: RuvLLM Integration with Ruvector
- **ADR-003**: SIMD Optimization Strategy
- **ADR-004**: KV Cache Management
- **ADR-005**: WASM Runtime Integration
- **ADR-006**: Memory Management
- **ADR-007**: Security Review & Technical Debt

---

## Implementation Status (v2.1)

| Component | Status | Notes |
|-----------|--------|-------|
| HNSW Index | ✅ Implemented | M=32, ef_construct=256, 16K QPS |
| SIMD Distance | ✅ Implemented | AVX2/NEON with fallback |
| Scalar Quantization | ✅ Implemented | 8-bit with min/max scaling |
| Batch Operations | ✅ Implemented | Rayon parallel distances |
| Graph Store | ✅ Implemented | Adjacency list with metadata |
| Persistence | ✅ Implemented | Binary format with versioning |

**Security Status:** Core components reviewed. No critical vulnerabilities in ruvector-core. See ADR-007 for full audit (RuvLLM-specific issues).

---

## Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-18 | Ruvector Architecture Team | Initial version |
| 1.1 | 2026-01-19 | Security Review Agent | Added implementation status, related decisions |