npm - elid - Versions diffs - 0.2.1 → 0.3.0 - Mend

elid 0.2.1 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md CHANGED

@@ -1,11 +1,31 @@
-# ELID - Efficient Levenshtein and String Similarity Library
+# ELID - Embedding Locality IDentifier
 [![CI](https://github.com/ZachHandley/ELID/actions/workflows/ci.yml/badge.svg)](https://github.com/ZachHandley/ELID/actions)
 [![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue.svg)](LICENSE-MIT)
-A fast, zero-dependency Rust library for computing string similarity metrics with bindings for Python, JavaScript (WASM), and C.
+**ELID** enables vector search without a vector store by encoding high-dimensional embeddings into sortable string IDs that preserve locality. Similar vectors produce similar IDs, allowing you to use standard database indexes for similarity search.
-## Algorithms
+ELID also includes a complete suite of fast string similarity algorithms.
+## Features
+### Embedding Encoding (Vector Search Without Vector Stores)
+Convert embeddings from any ML model into compact, sortable identifiers:
+| Profile | Output | Best For |
+|---------|--------|----------|
+| **Mini128** | 26-char base32hex | Fast similarity via Hamming distance |
+| **Morton10x10** | 20-char base32hex | Database range queries (Z-order) |
+| **Hilbert10x10** | 20-char base32hex | Maximum locality preservation |
+**Key benefits:**
+- Similar vectors produce similar IDs (locality preservation)
+- IDs are lexicographically sortable for database indexing
+- No vector store required - use any database with string indexes
+- Deterministic: same embedding always produces the same ID
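Reviewer note: the locality and determinism claims above can be illustrated with a generic random-hyperplane SimHash. This is a minimal sketch and not ELID's actual encoder. The function names, the seed handling, and the left-aligned bit packing are assumptions made for illustration; only the 128-bit width and the 26-character base32hex output come from the Mini128 row of the table.

```python
import random

BASE32HEX = "0123456789abcdefghijklmnopqrstuv"


def simhash_bits(embedding, n_bits=128, seed=0):
    """Hypothetical encoder: one sign bit per seeded random hyperplane."""
    rng = random.Random(seed)  # fixed seed, so the same hyperplanes every call
    bits = 0
    for _ in range(n_bits):
        plane = [rng.gauss(0.0, 1.0) for _ in embedding]
        dot = sum(p * x for p, x in zip(plane, embedding))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def to_base32hex(value, n_bits=128):
    """Pack n_bits into ceil(n_bits / 5) base32hex characters, left-aligned."""
    n_chars = -(-n_bits // 5)
    value <<= n_chars * 5 - n_bits  # pad the low end with zero bits
    return "".join(
        BASE32HEX[(value >> (5 * i)) & 31] for i in reversed(range(n_chars))
    )


def encode_sketch(embedding):
    return to_base32hex(simhash_bits(embedding))


def bit_distance(u, v):
    """Number of differing SimHash bits between two embeddings."""
    return bin(simhash_bits(u) ^ simhash_bits(v)).count("1")
```

In this sketch a slightly perturbed vector flips few bits, while the negated vector flips all of them, which is the sense in which "similar vectors produce similar IDs".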
+### String Similarity Algorithms
 | Algorithm | Type | Best For |
 |-----------|------|----------|
@@ -23,8 +43,17 @@ A fast, zero-dependency Rust library for computing string similarity metrics wit
 ### Rust
 ```toml
+# String similarity only (zero dependencies)
 [dependencies]
-elid = "0.2.1"
+elid = "0.1"
+# Embedding encoding
+[dependencies]
+elid = { version = "0.1", features = ["embeddings"] }
+# Both features
+[dependencies]
+elid = { version = "0.1", features = ["strings", "embeddings"] }
 ```
 ### Python
@@ -45,6 +74,58 @@ Build with `cargo build --release --features ffi` to get `libelid.so` and `elid.
 ## Quick Start
+### Embedding Encoding (Rust)
+```rust
+use elid::embeddings::{encode, Profile, Elid};
+// Get an embedding from your ML model (e.g., OpenAI, Cohere, sentence-transformers)
+let embedding: Vec<f32> = model.embed("Hello, world!")?;
+// Encode to a sortable ELID
+let profile = Profile::default(); // Mini128
+let elid: Elid = encode(&embedding, &profile)?;
+println!("ELID: {}", elid); // e.g., "01a3f5g7h9jklmnopqrstuv"
+// Similar texts produce similar ELIDs
+let elid2 = encode(&model.embed("Hello, universe!")?, &profile)?;
+// Compare similarity via Hamming distance
+use elid::embeddings::hamming_distance;
+let distance = hamming_distance(&elid, &elid2)?; // Lower = more similar
+```
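Reviewer note: the comparison step in the Rust example can be sketched in Python under one loud assumption, namely that every character of the ID carries five payload bits and there is no header or version prefix. The real ELID string layout is not shown in this diff, so `hamming_base32hex` is a hypothetical helper, not the library's `hamming_distance`.

```python
def hamming_base32hex(id_a, id_b):
    """Count differing bits between two equal-length base32hex strings.

    Assumption: the whole string is payload (5 bits per character).
    """
    if len(id_a) != len(id_b):
        raise ValueError("IDs must have the same length")
    # int(..., 32) accepts digits 0-9 and a-v, which is the base32hex alphabet.
    return bin(int(id_a, 32) ^ int(id_b, 32)).count("1")
```

A lower count means more hyperplane signs agree, which is why the README describes lower distances as more similar.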
+### Encoding Profiles
+```rust
+use elid::embeddings::Profile;
+// Mini128: 128-bit SimHash (default)
+// Best for: Fast similarity search via Hamming distance
+let mini = Profile::Mini128 {
+    seed: 0x454c4944_53494d48, // Deterministic seed
+};
+// Morton10x10: Z-order curve encoding
+// Best for: Database range queries
+let morton = Profile::Morton10x10 {
+    dims: 10,
+    bits_per_dim: 10,
+    transform_id: None,
+};
+// Hilbert10x10: Hilbert curve encoding
+// Best for: Maximum locality preservation
+let hilbert = Profile::Hilbert10x10 {
+    dims: 10,
+    bits_per_dim: 10,
+    transform_id: None,
+};
+```
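Reviewer note: the `dims: 10, bits_per_dim: 10` parameters imply 100 bits, which fits exactly into the 20 base32hex characters listed for Morton10x10. The sketch below shows plain Z-order bit interleaving under that reading. How ELID reduces a high-dimensional embedding to 10 coordinates is not shown in this diff, so the `quantize` helper and its `[-1, 1]` input range are assumptions, and all function names are hypothetical.

```python
BASE32HEX = "0123456789abcdefghijklmnopqrstuv"


def quantize(x, bits_per_dim, lo=-1.0, hi=1.0):
    """Map a float in [lo, hi] to an integer grid cell (assumed range)."""
    levels = (1 << bits_per_dim) - 1
    clamped = min(max(x, lo), hi)
    return round((clamped - lo) / (hi - lo) * levels)


def morton_code(coords, bits_per_dim):
    """Interleave bits: the most significant bit of every dimension comes first."""
    code = 0
    for bit in range(bits_per_dim - 1, -1, -1):
        for c in coords:
            code = (code << 1) | ((c >> bit) & 1)
    return code


def to_base32hex(value, n_chars):
    return "".join(
        BASE32HEX[(value >> (5 * i)) & 31] for i in reversed(range(n_chars))
    )


def morton_id(point, bits_per_dim=10):
    coords = [quantize(x, bits_per_dim) for x in point]
    total_bits = len(coords) * bits_per_dim
    n_chars = -(-total_bits // 5)  # ceiling division
    code = morton_code(coords, bits_per_dim) << (n_chars * 5 - total_bits)
    return to_base32hex(code, n_chars)
```

Because the high-order bits of every coordinate land at the front of the string, points that fall in the same coarse grid cell share a prefix, which is what makes range and prefix queries meaningful.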
+### String Similarity (Rust)
 ```rust
 use elid::*;
@@ -71,9 +152,14 @@ let (idx, score) = find_best_match("app", &candidates);
 ```python
 import elid
+# String similarity
 elid.levenshtein("kitten", "sitting")  # 3
 elid.jaro_winkler("martha", "marhta")  # 0.961
 elid.simhash_similarity("iPhone 14", "iPhone 15")  # 0.922
+# Embedding encoding (with embeddings feature)
+embedding = model.embed("Hello, world!")
+elid_str = elid.encode_embedding(embedding)
 ```
 ### JavaScript
@@ -102,12 +188,62 @@ let opts = SimilarityOpts {
 let distance = levenshtein_with_opts("  HELLO  ", "hello", &opts); // 0
 ```
+## Feature Flags
+| Feature | Description | Dependencies |
+|---------|-------------|--------------|
+| `strings` | String similarity algorithms (default) | None |
+| `embeddings` | Embedding encoding (default) | rand, blake3, etc. |
+| `wasm` | WebAssembly bindings (includes embeddings) | wasm-bindgen, js-sys, getrandom |
+| `python` | Python bindings via PyO3 (includes embeddings) | pyo3, numpy, rayon |
+| `ffi` | C FFI bindings | None (enables unsafe) |
 ## Performance
-- Zero external dependencies for core algorithms
+- Zero external dependencies for string-only use
 - O(min(m,n)) space-optimized Levenshtein
 - 1.4M+ string comparisons per second (Python benchmarks)
-- ~96KB WASM binary
+- ~96KB WASM binary (strings only)
+- Embedding encoding: <1ms per vector
+## Use Cases
+### Vector Search Without Vector Stores
+Store ELIDs directly in PostgreSQL, SQLite, or any database:
+```sql
+-- Create index on ELID column
+CREATE INDEX idx_documents_elid ON documents(elid);
+-- Find similar documents using string prefix matching
+SELECT * FROM documents
+WHERE elid LIKE 'abc%'  -- Prefix match for locality
+ORDER BY elid;
+```
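Reviewer note: the prefix query can be tried end to end with Python's built-in `sqlite3` module. The table, column, and index names follow the SQL above; the `name` column, the sample IDs, and the `find_by_prefix` helper are made up for illustration.

```python
import sqlite3


def find_by_prefix(conn, prefix):
    """Return (name, elid) rows whose ELID starts with the given prefix."""
    cursor = conn.execute(
        "SELECT name, elid FROM documents WHERE elid LIKE ? ORDER BY elid",
        (prefix + "%",),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (name TEXT, elid TEXT)")
conn.execute("CREATE INDEX idx_documents_elid ON documents(elid)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [
        ("doc1", "abc12"),  # placeholder IDs, not real ELID output
        ("doc2", "abcvv"),
        ("doc3", "abd00"),
        ("doc4", "00000"),
    ],
)
```

A shorter prefix widens the neighbourhood and a longer one narrows it. Note that prefix matching only returns rows in the same curve cell, so close vectors that straddle a cell boundary can be missed.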
+### Deduplication
+Use SimHash to find near-duplicate content:
+```rust
+let hash1 = simhash("The quick brown fox");
+let hash2 = simhash("The quick brown dog");
+let similarity = simhash_similarity_from_hashes(hash1, hash2);
+if similarity > 0.9 {
+    println!("Likely duplicates!");
+}
+```
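Reviewer note: for readers unfamiliar with text SimHash, here is a minimal 64-bit sketch. It assumes whitespace tokenization and BLAKE2b token hashes; elid's own tokenization and hash function are not shown in this diff, so scores from this sketch will not match the library's, and the 0.9 threshold above should not be carried over to it.

```python
import hashlib


def simhash64(text):
    """Hypothetical 64-bit SimHash over lowercase whitespace tokens."""
    counts = [0] * 64
    for token in text.lower().split():
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(64):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit is set when more tokens voted 1 than 0 at that position.
    return sum(1 << i for i, c in enumerate(counts) if c > 0)


def similarity_from_hashes(h1, h2):
    """Fraction of the 64 bit positions on which the two hashes agree."""
    return 1.0 - bin(h1 ^ h2).count("1") / 64
```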
+### Fuzzy Search
+Find matches with typo tolerance:
+```rust
+let candidates = vec!["apple", "application", "apply", "banana"];
+let matches = find_matches_above_threshold("aple", &candidates, 0.7);
+// Returns: [("apple", 0.8), ...]
+```
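Reviewer note: the `0.8` score in the example is consistent with a length-normalized Levenshtein similarity, `1 - distance / max(len)`, since "aple" and "apple" differ by one edit over five characters. That normalization is an assumption and is not confirmed by this diff. The sketch also keeps only one row sized to the shorter string, in the spirit of the O(min(m,n)) space claim in the Performance section. All names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance with rows sized to the shorter string."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so each row has len(b) + 1 cells
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution, free on a match
            ))
        prev = cur
    return prev[-1]


def normalized_similarity(a, b):
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def matches_above(query, candidates, threshold):
    """Return (candidate, score) pairs at or above threshold, best first."""
    scored = [(c, normalized_similarity(query, c)) for c in candidates]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: -pair[1])
```

Under this normalization "apply" scores 0.6 against "aple" and falls below the 0.7 threshold, so only "apple" survives.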
 ## Building