PyPI - simplevecdb - Versions diffs - 1.2.0__tar.gz → 2.0.0__tar.gz - Mend

simplevecdb 1.2.0.tar.gz → 2.0.0.tar.gz


Files changed (112)

simplevecdb-2.0.0/.bandit ADDED

@@ -0,0 +1,9 @@
+exclude_dirs:
+- /tests
+- /examples
+skips:
+- B104
+- B608  # SQL injection false positive: table names are validated via _validate_table_name()
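
Note on the B608 skip: the comment above says table names are validated via `_validate_table_name()`, but this diff does not show that function. A minimal sketch of what such a check could look like, assuming an allow-list of plain identifiers; the names `validate_table_name` and `count_rows_sql` and the 64-character limit are hypothetical and not taken from the package:

```python
import re

# Assumed allow-list: ASCII letters, digits, and underscores, not starting
# with a digit, at most 64 characters. The real rule may differ.
_TABLE_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def validate_table_name(name: str) -> str:
    """Return name unchanged if it is a plain identifier, else raise ValueError."""
    if not isinstance(name, str) or _TABLE_NAME_RE.fullmatch(name) is None:
        raise ValueError(f"unsafe table name: {name!r}")
    return name


def count_rows_sql(table: str) -> str:
    # Identifiers cannot be bound as SQL parameters, so the name is
    # validated before it is interpolated into the statement.
    return f'SELECT COUNT(*) FROM "{validate_table_name(table)}"'
```

Values such as row contents should still go through bound parameters; only the identifier is interpolated, which is the pattern Bandit's B608 check flags.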

{simplevecdb-1.2.0 → simplevecdb-2.0.0}/.github/workflows/ci.yml RENAMED

@@ -1,4 +1,6 @@
 name: CI
+permissions:
+  contents: read
 on:
   push:

simplevecdb-2.0.0/.github/workflows/publish.yml ADDED

@@ -0,0 +1,121 @@
+name: Publish to PyPI
+on:
+  push:
+    tags:
+      - "v*.*.*"
+permissions:
+  id-token: write
+  contents: write
+jobs:
+  # Verify version matches before publishing
+  verify:
+    runs-on: ubuntu-latest
+    outputs:
+      version: ${{ steps.version.outputs.VERSION }}
+    steps:
+      - uses: actions/checkout@v4
+      - name: Extract version from tag
+        id: version
+        run: echo "VERSION=${GITHUB_REF#refs/tags/v}" >> $GITHUB_OUTPUT
+      - name: Install uv
+        uses: astral-sh/setup-uv@v3
+      - name: Set up Python
+        run: uv python install 3.11
+      - name: Verify __version__ matches tag
+        run: |
+          PACKAGE_VERSION=$(uv run python -c "from simplevecdb import __version__; print(__version__)")
+          TAG_VERSION="${{ steps.version.outputs.VERSION }}"
+          if [ "$PACKAGE_VERSION" != "$TAG_VERSION" ]; then
+            echo "❌ Version mismatch!"
+            echo "   Tag version:     $TAG_VERSION"
+            echo "   Package version: $PACKAGE_VERSION"
+            echo ""
+            echo "Update __version__ in src/simplevecdb/__init__.py to match the tag."
+            exit 1
+          fi
+          echo "✅ Version match: $PACKAGE_VERSION"
+  release:
+    needs: verify
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - name: Extract changelog for release
+        id: changelog
+        run: |
+          VERSION="${{ needs.verify.outputs.version }}"
+          # Extract section for this version from CHANGELOG.md
+          # Matches from "## [VERSION]" until the next "## [" or end of file
+          awk -v ver="$VERSION" '
+            /^## \[/ {
+              if (found) exit
+              if (index($0, "## [" ver "]") == 1) found=1
+            }
+            found
+          ' CHANGELOG.md > release_notes.md
+          # If empty, provide a fallback
+          if [ ! -s release_notes.md ]; then
+            echo "## Release v$VERSION" > release_notes.md
+            echo "" >> release_notes.md
+            echo "See [CHANGELOG.md](https://github.com/${{ github.repository }}/blob/main/CHANGELOG.md) for details." >> release_notes.md
+          fi
+          echo "📋 Release notes:"
+          cat release_notes.md
+      - name: Create GitHub Release
+        uses: softprops/action-gh-release@v2
+        with:
+          body_path: release_notes.md
+          draft: false
+          prerelease: ${{ contains(needs.verify.outputs.version, 'rc') || contains(needs.verify.outputs.version, 'beta') || contains(needs.verify.outputs.version, 'alpha') }}
+          generate_release_notes: false
+  publish:
+    needs: [verify, release]
+    runs-on: ubuntu-latest
+    environment: pypi
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v3
+      - name: Set up Python
+        run: uv python install 3.11
+      - name: Build package
+        run: uv build
+      - name: Verify build artifacts
+        run: |
+          echo "📦 Built packages:"
+          ls -la dist/
+          # Verify version in built package
+          uv run python -c "
+          import zipfile
+          import glob
+          whl = glob.glob('dist/*.whl')[0]
+          with zipfile.ZipFile(whl) as z:
+              for name in z.namelist():
+                  if name.endswith('METADATA'):
+                      content = z.read(name).decode()
+                      for line in content.split('\n'):
+                          if line.startswith('Version:'):
+                              print(f'✅ Package version: {line}')
+                              break
+          "
+      - name: Publish to PyPI
+        run: uv publish --token ${{ secrets.PYPI_API_TOKEN }}
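
Note on the changelog extraction step: the awk program in the `release` job keeps everything from the `## [VERSION]` heading up to, but not including, the next `## [` heading. A minimal Python sketch of the same selection rule may make it easier to check locally; `extract_release_notes` is a hypothetical helper name, not something shipped in the package:

```python
def extract_release_notes(changelog: str, version: str) -> str:
    """Return the changelog section for one version, or "" if it is absent."""
    heading = f"## [{version}]"
    kept = []
    found = False
    for line in changelog.splitlines():
        if line.startswith("## ["):
            if found:
                # The next version heading ends the section.
                break
            if line.startswith(heading):
                found = True
        if found:
            kept.append(line)
    return "\n".join(kept)
```

An empty result corresponds to the workflow's fallback branch, which writes a generic "Release vX.Y.Z" note instead.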

{simplevecdb-1.2.0 → simplevecdb-2.0.0}/.github/workflows/security.yml RENAMED

@@ -22,12 +22,12 @@ jobs:
       - name: Create venv and install dependencies
         run: |
           uv venv
-          uv pip install safety pip-audit
+          uv pip install pip-audit
+          uv pip install ".[dev,server]"
-      - name: Scan dependencies
+      - name: Scan dependencies with pip-audit
         run: |
-          uv run safety check --json || true
-          uv run pip-audit --requirement pyproject.toml || true
+          uv run pip-audit --strict --ignore-vuln PYSEC-2024-142 || true
   code-scan:
     runs-on: ubuntu-latest

{simplevecdb-1.2.0 → simplevecdb-2.0.0}/CHANGELOG.md RENAMED

@@ -5,6 +5,168 @@ All notable changes to SimpleVecDB will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [2.0.0] - 2025-12-23
+### Breaking Changes
+- **Backend Migration: sqlite-vec → usearch HNSW**
+  - Vector search now uses usearch's high-performance HNSW algorithm
+  - 10-100x faster similarity search for large collections
+  - Vector data stored in separate `.usearch` files per collection (e.g., `mydb.db.default.usearch`)
+  - SQLite still stores metadata, text, and FTS5 index
+- **Removed `DistanceStrategy.L1`** - Manhattan distance not supported by usearch
+- **Storage Format Change**
+  - Embeddings now stored in both usearch index AND SQLite (for MMR support)
+  - Existing sqlite-vec databases will auto-migrate on first open
+  - Migration is one-way; backup before upgrading
+### Added
+- **`usearch_index.py`** - New UsearchIndex wrapper class:
+  - Thread-safe HNSW index operations (lock on writes, lock-free reads)
+  - Automatic persistence to `.usearch` files
+  - Upsert support (removes existing keys before add)
+  - BIT quantization using Hamming metric with bit packing
+  - Configurable HNSW parameters (connectivity, expansion_add, expansion_search)
+- **Proper MMR Implementation** - Max Marginal Relevance now computes actual pairwise similarity between candidates and selected documents using stored embeddings
+- **Embedding Storage in SQLite** - Embeddings stored as BLOB for:
+  - Accurate MMR diversity computation
+  - Future index rebuild from SQLite backup
+  - Schema auto-migrates existing tables
+- **`VectorCollection.rebuild_index()`** - Reconstruct usearch HNSW index from SQLite embeddings:
+  - Useful for index corruption recovery
+  - Tune HNSW parameters (connectivity, expansion_add, expansion_search)
+  - Reclaim space after many deletions
+- **`VectorDB.check_migration(path)`** - Dry-run migration check:
+  - Reports which collections need migration
+  - Shows total vector count and estimated storage
+  - Provides detailed rollback instructions
+- **Adaptive Search** - Automatically optimizes search strategy based on collection size:
+  - Collections < 10k vectors use brute-force (`exact=True`) for perfect recall
+  - Collections ≥ 10k vectors use HNSW for faster approximate search
+  - Threshold configurable via `constants.USEARCH_BRUTEFORCE_THRESHOLD`
+- **`exact` parameter** - Force search mode in `similarity_search()`:
+  - `None` (default): adaptive based on collection size
+  - `True`: force brute-force for perfect recall
+  - `False`: force HNSW approximate search
+- **`Quantization.FLOAT16`** - Half-precision floating point:
+  - 2x memory savings compared to FLOAT32
+  - 1.5x faster search with minimal precision loss
+  - Ideal for embeddings where full precision isn't needed
+- **`threads` parameter** - Parallel execution control:
+  - Added to `add_texts()` and `similarity_search()`
+  - `0` (default): auto-detect optimal thread count
+  - Explicit value: control parallelism for batch operations
+- **Auto Memory-Mapping** - Large indexes automatically use memory-mapped mode:
+  - Indexes >100k vectors use `view=True` for instant startup
+  - Lower memory footprint for large collections
+  - Transparent upgrade to writable mode on add operations
+  - Configurable via `constants.USEARCH_MMAP_THRESHOLD`
+- **`similarity_search_batch()`** - Multi-query batch search:
+  - ~10x throughput for batch query workloads
+  - Uses usearch's native batch search under the hood
+  - Same parameters as `similarity_search()` but accepts list of queries
+- **`examples/backend_benchmark.py`** - Benchmark script comparing usearch vs brute-force:
+  - Measures speedup, recall, and storage efficiency
+  - Supports all quantization levels
+  - Validates 10-100x performance claims
+### Changed
+- **Dependencies**: Replaced `sqlite-vec>=0.1.6` with `usearch>=2.12`
+- **CatalogManager**: Removed vec0 virtual table operations, added embedding column
+- **SearchEngine**: Rewrote to use UsearchIndex for all vector operations
+- **VectorCollection**: Creates usearch index at `{db_path}.{collection}.usearch`
+### Migration Notes
+1. **Backup your database** before upgrading
+2. On first open, existing sqlite-vec data will be migrated automatically
+3. New `.usearch` files will be created alongside your `.db` file
+4. The legacy sqlite-vec table is dropped after successful migration
+## [1.3.0] - 2025-12-07
+### Added
+- **Structured Logging Module** - New `simplevecdb.logging` module for production-grade observability
+  - `get_logger(name)` - Get namespaced loggers under `simplevecdb.*`
+  - `configure_logging(level, format, handler)` - One-call logging setup
+  - `log_operation(name, **context)` - Context manager for operation timing and error tracking
+  - `log_error(operation, error, **context)` - Consistent error logging with context
+- **SQLite Lock Retry Logic** - Automatic retry with exponential backoff for database lock contention
+  - `@retry_on_lock(max_retries, base_delay, max_delay, jitter)` decorator
+  - `DatabaseLockedError` exception for exhausted retries with attempt/wait metrics
+  - Applied to `add_texts()` and `delete_by_ids()` operations in CatalogManager
+- **Filter Validation** - Early validation of metadata filter dictionaries
+  - `validate_filter(filter_dict)` - Validates keys are strings, values are supported types
+  - Clear error messages for invalid filter structures
+  - Automatically called in `build_filter_clause()` before SQL generation
+- **New Exports** - Added to `simplevecdb.__all__`:
+  - `get_logger`, `configure_logging`, `log_operation`
+  - `DatabaseLockedError`, `retry_on_lock`, `validate_filter`
+### Changed
+- **CatalogManager** internal refactoring:
+  - `add_texts()` now delegates to `_insert_batch()` which has retry logic
+  - `delete_by_ids()` now has retry logic for lock contention
+  - `build_filter_clause()` validates filters before processing
+- **`delete_by_ids()` no longer auto-vacuums** - Call `VectorDB.vacuum()` separately to reclaim disk space after large deletions. This improves performance for batch deletions.
+- **RateLimiter** now includes TTL-based cleanup to prevent memory exhaustion on long-running servers with many unique clients (default: 1 hour TTL, 10k max buckets).
+- **AsyncVectorDB.close()** now guarantees database connection is closed even if executor shutdown fails.
+### Testing
+- Added 25 new tests in `tests/unit/test_error_handling.py`:
+  - 7 tests for `retry_on_lock` decorator behavior
+  - 2 tests for `DatabaseLockedError` exception
+  - 4 tests for `validate_filter` function
+  - 8 tests for logging utilities
+  - 4 integration tests for error handling in VectorDB operations
+### Example
+```python
+import logging
+from simplevecdb import (
+    VectorDB,
+    configure_logging,
+    get_logger,
+    log_operation,
+    DatabaseLockedError,
+)
+# Enable debug logging
+configure_logging(level=logging.DEBUG)
+logger = get_logger(__name__)
+try:
+    with log_operation("bulk_insert", collection="docs", count=1000):
+        db = VectorDB("data.db")
+        collection = db.collection("docs")
+        collection.add_texts(texts, embeddings=embeddings)
+except DatabaseLockedError as e:
+    logger.error(f"Insert failed after {e.attempts} attempts")
+```
 ## [1.2.0] - 2025-11-25
 ### Added
@@ -210,6 +372,7 @@ Benchmarks on i9-13900K & RTX 4090 with 10k vectors (384-dim):
 - **Documentation**: https://coderdayton.github.io/simplevecdb/
 - **License**: MIT
+[1.3.0]: https://github.com/coderdayton/simplevecdb/releases/tag/v1.3.0
 [1.2.0]: https://github.com/coderdayton/simplevecdb/releases/tag/v1.2.0
 [1.1.1]: https://github.com/coderdayton/simplevecdb/releases/tag/v1.1.1
 [1.1.0]: https://github.com/coderdayton/simplevecdb/releases/tag/v1.1.0
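
Note on the 1.3.0 "SQLite Lock Retry Logic" entry: the changelog names a `@retry_on_lock(max_retries, base_delay, max_delay, jitter)` decorator and a `DatabaseLockedError` that carries attempt and wait metrics, but the implementation is not part of this excerpt. The following is a rough sketch of retry with exponential backoff under stated assumptions: the names `retry_on_lock_sketch` and `LockRetryExhausted`, the default values, the injectable `sleep` argument, and the way jitter is applied are all illustrative choices, not the library's code.

```python
import functools
import random
import sqlite3
import time


class LockRetryExhausted(Exception):
    """Hypothetical stand-in for an error raised once retries run out."""

    def __init__(self, attempts: int, total_wait: float):
        super().__init__(f"database still locked after {attempts} attempts")
        self.attempts = attempts
        self.total_wait = total_wait


def retry_on_lock_sketch(max_retries=5, base_delay=0.05, max_delay=1.0,
                         jitter=0.0, sleep=time.sleep):
    """Retry a callable when SQLite reports a locked database."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            total_wait = 0.0
            # One initial attempt plus up to max_retries retries.
            for attempt in range(1, max_retries + 2):
                try:
                    return func(*args, **kwargs)
                except sqlite3.OperationalError as exc:
                    if "locked" not in str(exc).lower():
                        raise  # unrelated SQLite error: do not retry
                    if attempt > max_retries:
                        raise LockRetryExhausted(attempt, total_wait) from exc
                    # Delay doubles each attempt, capped at max_delay.
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    # Assumed jitter model: add up to jitter * delay on top.
                    delay += random.uniform(0.0, jitter * delay)
                    sleep(delay)
                    total_wait += delay
        return wrapper
    return decorator
```

Passing `sleep` as a parameter is only a testing convenience here; it lets the backoff schedule be inspected without actually waiting.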

simplevecdb-1.2.0/README.md → simplevecdb-2.0.0/PKG-INFO RENAMED

@@ -1,3 +1,23 @@
+Metadata-Version: 2.4
+Name: simplevecdb
+Version: 2.0.0
+Summary: Dead-simple local vector database powered by usearch HNSW.
+Author-email: Dayton Dunbar <coderdayton14@gmail.com>
+License: MIT
+License-File: LICENSE
+Requires-Python: >=3.10
+Requires-Dist: numpy>=2.0
+Requires-Dist: psutil>=5.9.0
+Requires-Dist: python-dotenv>=1.2.1
+Requires-Dist: usearch>=2.12
+Provides-Extra: examples
+Requires-Dist: ollama; extra == 'examples'
+Provides-Extra: server
+Requires-Dist: fastapi>=0.115; extra == 'server'
+Requires-Dist: sentence-transformers>=5.0; extra == 'server'
+Requires-Dist: uvicorn[standard]>=0.30; extra == 'server'
+Description-Content-Type: text/markdown
 # SimpleVecDB
 [![CI](https://github.com/coderdayton/simplevecdb/actions/workflows/ci.yml/badge.svg)](https://github.com/coderdayton/simplevecdb/actions)
@@ -7,12 +27,12 @@
 **The dead-simple, local-first vector database.**
-SimpleVecDB brings **Chroma-like simplicity** to a single **SQLite file**. Built on `sqlite-vec`, it offers high-performance vector search, quantization, and zero infrastructure headaches. Perfect for local RAG, offline agents, and indie hackers who need production-grade vector search without the operational overhead.
+SimpleVecDB brings **Chroma-like simplicity** to a single **SQLite file**. Built on `usearch` HNSW indexing, it offers high-performance vector search, quantization, and zero infrastructure headaches. Perfect for local RAG, offline agents, and indie hackers who need production-grade vector search without the operational overhead.
 ## Why SimpleVecDB?
 - **Zero Infrastructure** — Just a `.db` file. No Docker, no Redis, no cloud bills.
-- **Blazing Fast** — ~2ms queries on consumer hardware with 32x storage efficiency via quantization.
+- **Blazing Fast** — 10-100x faster search via usearch HNSW. Adaptive: brute-force for <10k vectors (perfect recall), HNSW for larger collections.
 - **Truly Portable** — Runs anywhere SQLite runs: Linux, macOS, Windows, even WASM.
 - **Async Ready** — Full async/await support for web servers and concurrent workloads.
 - **Batteries Included** — Optional FastAPI embeddings server + LangChain/LlamaIndex integrations.
@@ -178,8 +198,8 @@ Organize vectors by domain within a single database file:
 from simplevecdb import VectorDB, Quantization
 db = VectorDB("app.db")
-users = db.collection("users", quantization=Quantization.INT8)
-products = db.collection("products", quantization=Quantization.BIT)
+users = db.collection("users", quantization=Quantization.FLOAT16)  # 2x memory savings
+products = db.collection("products", quantization=Quantization.BIT)  # 32x compression
 # Isolated namespaces
 users.add_texts(["Alice likes hiking"], embeddings=[[0.1]*384])
@@ -189,9 +209,22 @@ products.add_texts(["Hiking boots"], embeddings=[[0.9]*384])
 ### Search Capabilities
 ```python
-# Vector similarity (cosine/L2/inner product)
+# Vector similarity (cosine/L2) - adaptive search by default
 results = collection.similarity_search(query_vector, k=10)
+# Force exact search for perfect recall (brute-force)
+results = collection.similarity_search(query_vector, k=10, exact=True)
+# Force HNSW approximate search (faster, may miss some results)
+results = collection.similarity_search(query_vector, k=10, exact=False)
+# Parallel search with explicit thread count
+results = collection.similarity_search(query_vector, k=10, threads=8)
+# Batch search - 10x throughput for multiple queries
+queries = [query1, query2, query3]  # List of embedding vectors
+batch_results = collection.similarity_search_batch(queries, k=10)
 # Keyword search (BM25)
 results = collection.keyword_search("exact phrase", k=10)
@@ -211,36 +244,37 @@ results = collection.similarity_search(
 ## Feature Matrix
-| Feature                   | Status | Description                                                |
-| :------------------------ | :----- | :--------------------------------------------------------- |
-| **Single-File Storage**   | ✅     | SQLite `.db` file or in-memory mode                        |
-| **Multi-Collection**      | ✅     | Isolated namespaces per database                           |
-| **Vector Search**         | ✅     | Cosine, Euclidean, Inner Product metrics                   |
-| **Hybrid Search**         | ✅     | BM25 + vector fusion (Reciprocal Rank Fusion)              |
-| **Quantization**          | ✅     | FLOAT32, INT8, BIT (1-bit) for 4-32x compression           |
-| **Metadata Filtering**    | ✅     | SQL `WHERE` clause support                                 |
-| **Framework Integration** | ✅     | LangChain \& LlamaIndex adapters                           |
-| **Hardware Acceleration** | ✅     | Auto-detects CUDA/MPS/CPU                                  |
-| **Local Embeddings**      | ✅     | HuggingFace models via `[server]` extras                   |
-| **HNSW Indexing**         | 🔜     | Approximate nearest neighbor (pending `sqlite-vec` update) |
-| **Built-in Encryption**   | 🔜     | SQLCipher integration for at-rest encryption               |
+| Feature                   | Status | Description                                                  |
+| :------------------------ | :----- | :----------------------------------------------------------- |
+| **Single-File Storage**   | ✅     | SQLite `.db` file or in-memory mode                          |
+| **Multi-Collection**      | ✅     | Isolated namespaces per database                             |
+| **HNSW Indexing**         | ✅     | usearch HNSW for 10-100x faster search                       |
+| **Adaptive Search**       | ✅     | Auto brute-force for <10k vectors, HNSW for larger           |
+| **Vector Search**         | ✅     | Cosine, Euclidean metrics (L1 removed in v2.0)               |
+| **Hybrid Search**         | ✅     | BM25 + vector fusion (Reciprocal Rank Fusion)                |
+| **Quantization**          | ✅     | FLOAT32, FLOAT16, INT8, BIT for 2-32x compression            |
+| **Parallel Operations**   | ✅     | `threads` parameter for add/search                           |
+| **Metadata Filtering**    | ✅     | SQL `WHERE` clause support                                   |
+| **Framework Integration** | ✅     | LangChain \& LlamaIndex adapters                             |
+| **Hardware Acceleration** | ✅     | Auto-detects CUDA/MPS/CPU + SIMD via usearch                 |
+| **Local Embeddings**      | ✅     | HuggingFace models via `[server]` extras                     |
+| **Built-in Encryption**   | 🔜     | SQLCipher integration for at-rest encryption                 |
 ## Performance Benchmarks
-**Test Environment:** Intel i9-13900K, NVIDIA RTX 4090, `sqlite-vec` v0.1.6
-**Dataset:** 10,000 vectors × 384 dimensions
-| Quantization | Storage Size | Insert Speed | Query Latency (k=10) | Compression Ratio |
-| :----------- | :----------- | :----------- | :------------------- | :---------------- |
-| **FLOAT32**  | 15.50 MB     | 15,585 vec/s | 3.55 ms              | 1x (baseline)     |
-| **INT8**     | 4.23 MB      | 27,893 vec/s | 3.93 ms              | 3.7x smaller      |
-| **BIT**      | 0.95 MB      | 32,321 vec/s | 0.27 ms              | 16.3x smaller     |
+**10,000 vectors, 384 dimensions, k=10 search** — [Full benchmarks →](https://coderdayton.github.io/SimpleVecDB/benchmarks)
-**Key Takeaways:**
+| Quantization | Storage  | Query Time | Compression |
+| :----------- | :------- | :--------- | :---------- |
+| FLOAT32      | 36.0 MB  | 0.20 ms    | 1x          |
+| FLOAT16      | 28.7 MB  | 0.20 ms    | 2x          |
+| INT8         | 25.0 MB  | 0.16 ms    | 4x          |
+| BIT          | 21.8 MB  | 0.08 ms    | 32x         |
-- BIT quantization delivers 13x faster queries with 16x storage reduction
-- INT8 offers balanced performance (79% faster inserts, minimal query overhead)
-- Sub-4ms query latency on consumer hardware
+**Key highlights:**
+- **3-34x faster** than brute-force for collections >10k vectors
+- **Adaptive search**: perfect recall for small collections, HNSW for large
+- **FLOAT16 recommended**: best balance of speed, memory, and precision
 ## Documentation
@@ -280,14 +314,16 @@ pip install torch --index-url https://download.pytorch.org/whl/cu118
 **Slow Queries on Large Datasets**
 - Enable quantization: `collection = db.collection("docs", quantization=Quantization.INT8)`
-- Consider HNSW indexing when available (roadmap item)
+- For >10k vectors, HNSW is automatic; tune with `rebuild_index(connectivity=32)`
+- Use `exact=False` to force HNSW even on smaller collections
 - Use metadata filtering to reduce search space
 ## Roadmap
 - [x] Hybrid Search (BM25 + Vector)
 - [x] Multi-collection support
-- [ ] HNSW indexing (pending `sqlite-vec` upstream)
+- [x] HNSW indexing (usearch backend)
+- [x] Adaptive search (brute-force/HNSW)
 - [ ] SQLCipher encryption (at-rest data protection)
 - [ ] Streaming insert API for large-scale ingestion
 - [ ] Graph-based metadata relationships
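
Note on adaptive search: the README and changelog describe the `exact` parameter as three-valued (`None` selects by collection size, `True` forces brute-force, `False` forces HNSW), with a cutoff near 10k vectors. The decision rule can be sketched as below. The function name `use_exact_search` and the constant `BRUTEFORCE_THRESHOLD` are hypothetical, and the value 10,000 is assumed from the "<10k" wording rather than read from `constants.USEARCH_BRUTEFORCE_THRESHOLD`.

```python
from typing import Optional

# Assumed cutoff, mirroring the "< 10k vectors" wording in the changelog.
BRUTEFORCE_THRESHOLD = 10_000


def use_exact_search(collection_size: int,
                     exact: Optional[bool] = None,
                     threshold: int = BRUTEFORCE_THRESHOLD) -> bool:
    """Return True for brute-force search, False for approximate HNSW search."""
    if exact is not None:
        # An explicit caller choice always wins over the size heuristic.
        return exact
    return collection_size < threshold
```

Under this rule a collection of exactly 10,000 vectors falls on the HNSW side, matching the changelog's "≥ 10k" wording.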

simplevecdb-1.2.0/PKG-INFO → simplevecdb-2.0.0/README.md RENAMED

@@ -1,23 +1,3 @@
-Metadata-Version: 2.4
-Name: simplevecdb
-Version: 1.2.0
-Summary: Dead-simple local vector database powered by sqlite-vec.
-Author-email: Dayton Dunbar <coderdayton14@gmail.com>
-License: MIT
-License-File: LICENSE
-Requires-Python: >=3.10
-Requires-Dist: numpy>=2.0
-Requires-Dist: psutil>=5.9.0
-Requires-Dist: python-dotenv>=1.2.1
-Requires-Dist: sqlite-vec>=0.1.6
-Provides-Extra: examples
-Requires-Dist: ollama; extra == 'examples'
-Provides-Extra: server
-Requires-Dist: fastapi>=0.115; extra == 'server'
-Requires-Dist: sentence-transformers[onnx]==5.1.2; extra == 'server'
-Requires-Dist: uvicorn[standard]>=0.30; extra == 'server'
-Description-Content-Type: text/markdown
 # SimpleVecDB
 [![CI](https://github.com/coderdayton/simplevecdb/actions/workflows/ci.yml/badge.svg)](https://github.com/coderdayton/simplevecdb/actions)
@@ -27,12 +7,12 @@ Description-Content-Type: text/markdown
 **The dead-simple, local-first vector database.**
-SimpleVecDB brings **Chroma-like simplicity** to a single **SQLite file**. Built on `sqlite-vec`, it offers high-performance vector search, quantization, and zero infrastructure headaches. Perfect for local RAG, offline agents, and indie hackers who need production-grade vector search without the operational overhead.
+SimpleVecDB brings **Chroma-like simplicity** to a single **SQLite file**. Built on `usearch` HNSW indexing, it offers high-performance vector search, quantization, and zero infrastructure headaches. Perfect for local RAG, offline agents, and indie hackers who need production-grade vector search without the operational overhead.
 ## Why SimpleVecDB?
 - **Zero Infrastructure** — Just a `.db` file. No Docker, no Redis, no cloud bills.
-- **Blazing Fast** — ~2ms queries on consumer hardware with 32x storage efficiency via quantization.
+- **Blazing Fast** — 10-100x faster search via usearch HNSW. Adaptive: brute-force for <10k vectors (perfect recall), HNSW for larger collections.
 - **Truly Portable** — Runs anywhere SQLite runs: Linux, macOS, Windows, even WASM.
 - **Async Ready** — Full async/await support for web servers and concurrent workloads.
 - **Batteries Included** — Optional FastAPI embeddings server + LangChain/LlamaIndex integrations.
@@ -198,8 +178,8 @@ Organize vectors by domain within a single database file:
 from simplevecdb import VectorDB, Quantization
 db = VectorDB("app.db")
-users = db.collection("users", quantization=Quantization.INT8)
-products = db.collection("products", quantization=Quantization.BIT)
+users = db.collection("users", quantization=Quantization.FLOAT16)  # 2x memory savings
+products = db.collection("products", quantization=Quantization.BIT)  # 32x compression
 # Isolated namespaces
 users.add_texts(["Alice likes hiking"], embeddings=[[0.1]*384])
@@ -209,9 +189,22 @@ products.add_texts(["Hiking boots"], embeddings=[[0.9]*384])
 ### Search Capabilities
 ```python
-# Vector similarity (cosine/L2/inner product)
+# Vector similarity (cosine/L2) - adaptive search by default
 results = collection.similarity_search(query_vector, k=10)
+# Force exact search for perfect recall (brute-force)
+results = collection.similarity_search(query_vector, k=10, exact=True)
+# Force HNSW approximate search (faster, may miss some results)
+results = collection.similarity_search(query_vector, k=10, exact=False)
+# Parallel search with explicit thread count
+results = collection.similarity_search(query_vector, k=10, threads=8)
+# Batch search - 10x throughput for multiple queries
+queries = [query1, query2, query3]  # List of embedding vectors
+batch_results = collection.similarity_search_batch(queries, k=10)
 # Keyword search (BM25)
 results = collection.keyword_search("exact phrase", k=10)
@@ -231,36 +224,37 @@ results = collection.similarity_search(
 ## Feature Matrix
-| Feature                   | Status | Description                                                |
-| :------------------------ | :----- | :--------------------------------------------------------- |
-| **Single-File Storage**   | ✅     | SQLite `.db` file or in-memory mode                        |
-| **Multi-Collection**      | ✅     | Isolated namespaces per database                           |
-| **Vector Search**         | ✅     | Cosine, Euclidean, Inner Product metrics                   |
-| **Hybrid Search**         | ✅     | BM25 + vector fusion (Reciprocal Rank Fusion)              |
-| **Quantization**          | ✅     | FLOAT32, INT8, BIT (1-bit) for 4-32x compression           |
-| **Metadata Filtering**    | ✅     | SQL `WHERE` clause support                                 |
-| **Framework Integration** | ✅     | LangChain \& LlamaIndex adapters                           |
-| **Hardware Acceleration** | ✅     | Auto-detects CUDA/MPS/CPU                                  |
-| **Local Embeddings**      | ✅     | HuggingFace models via `[server]` extras                   |
-| **HNSW Indexing**         | 🔜     | Approximate nearest neighbor (pending `sqlite-vec` update) |
-| **Built-in Encryption**   | 🔜     | SQLCipher integration for at-rest encryption               |
+| Feature                   | Status | Description                                                  |
+| :------------------------ | :----- | :----------------------------------------------------------- |
+| **Single-File Storage**   | ✅     | SQLite `.db` file or in-memory mode                          |
+| **Multi-Collection**      | ✅     | Isolated namespaces per database                             |
+| **HNSW Indexing**         | ✅     | usearch HNSW for 10-100x faster search                       |
+| **Adaptive Search**       | ✅     | Auto brute-force for <10k vectors, HNSW for larger           |
+| **Vector Search**         | ✅     | Cosine, Euclidean metrics (L1 removed in v2.0)               |
+| **Hybrid Search**         | ✅     | BM25 + vector fusion (Reciprocal Rank Fusion)                |
+| **Quantization**          | ✅     | FLOAT32, FLOAT16, INT8, BIT for 2-32x compression            |
+| **Parallel Operations**   | ✅     | `threads` parameter for add/search                           |
+| **Metadata Filtering**    | ✅     | SQL `WHERE` clause support                                   |
+| **Framework Integration** | ✅     | LangChain \& LlamaIndex adapters                             |
+| **Hardware Acceleration** | ✅     | Auto-detects CUDA/MPS/CPU + SIMD via usearch                 |
+| **Local Embeddings**      | ✅     | HuggingFace models via `[server]` extras                     |
+| **Built-in Encryption**   | 🔜     | SQLCipher integration for at-rest encryption                 |
 ## Performance Benchmarks
-**Test Environment:** Intel i9-13900K, NVIDIA RTX 4090, `sqlite-vec` v0.1.6
-**Dataset:** 10,000 vectors × 384 dimensions
-| Quantization | Storage Size | Insert Speed | Query Latency (k=10) | Compression Ratio |
-| :----------- | :----------- | :----------- | :------------------- | :---------------- |
-| **FLOAT32**  | 15.50 MB     | 15,585 vec/s | 3.55 ms              | 1x (baseline)     |
-| **INT8**     | 4.23 MB      | 27,893 vec/s | 3.93 ms              | 3.7x smaller      |
-| **BIT**      | 0.95 MB      | 32,321 vec/s | 0.27 ms              | 16.3x smaller     |
+**10,000 vectors, 384 dimensions, k=10 search** — [Full benchmarks →](https://coderdayton.github.io/SimpleVecDB/benchmarks)
-**Key Takeaways:**
+| Quantization | Storage  | Query Time | Compression |
+| :----------- | :------- | :--------- | :---------- |
+| FLOAT32      | 36.0 MB  | 0.20 ms    | 1x          |
+| FLOAT16      | 28.7 MB  | 0.20 ms    | 2x          |
+| INT8         | 25.0 MB  | 0.16 ms    | 4x          |
+| BIT          | 21.8 MB  | 0.08 ms    | 32x         |
-- BIT quantization delivers 13x faster queries with 16x storage reduction
-- INT8 offers balanced performance (79% faster inserts, minimal query overhead)
-- Sub-4ms query latency on consumer hardware
+**Key highlights:**
+- **3-34x faster** than brute-force for collections >10k vectors
+- **Adaptive search**: perfect recall for small collections, HNSW for large
+- **FLOAT16 recommended**: best balance of speed, memory, and precision
 ## Documentation
@@ -300,14 +294,16 @@ pip install torch --index-url https://download.pytorch.org/whl/cu118
 **Slow Queries on Large Datasets**
 - Enable quantization: `collection = db.collection("docs", quantization=Quantization.INT8)`
-- Consider HNSW indexing when available (roadmap item)
+- For >10k vectors, HNSW is automatic; tune with `rebuild_index(connectivity=32)`
+- Use `exact=False` to force HNSW even on smaller collections
 - Use metadata filtering to reduce search space
 ## Roadmap
 - [x] Hybrid Search (BM25 + Vector)
 - [x] Multi-collection support
-- [ ] HNSW indexing (pending `sqlite-vec` upstream)
+- [x] HNSW indexing (usearch backend)
+- [x] Adaptive search (brute-force/HNSW)
 - [ ] SQLCipher encryption (at-rest data protection)
 - [ ] Streaming insert API for large-scale ingestion
 - [ ] Graph-based metadata relationships
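
Note on the new benchmark table: the "Compression" column (1x, 2x, 4x, 32x) matches the per-dimension encoding width of each quantization level, while the "Storage" column shrinks far less (36.0 MB to 21.8 MB). A short arithmetic sketch of the raw vector payload shows where the nominal ratios come from. It assumes the standard widths of 32, 16, 8, and 1 bit per dimension; `raw_vector_bytes` is a hypothetical helper, not a package function.

```python
# Bits used to encode one dimension at each quantization level.
BITS_PER_DIM = {"FLOAT32": 32, "FLOAT16": 16, "INT8": 8, "BIT": 1}


def raw_vector_bytes(n_vectors: int, dim: int, quantization: str) -> int:
    """Bytes needed for the encoded vectors alone, with no index or metadata."""
    bits = BITS_PER_DIM[quantization]
    # Round each vector up to whole bytes (relevant for BIT packing).
    bytes_per_vector = (dim * bits + 7) // 8
    return n_vectors * bytes_per_vector


for level in BITS_PER_DIM:
    size = raw_vector_bytes(10_000, 384, level)
    print(f"{level:8s} {size / 1_000_000:6.2f} MB")
```

For 10,000 vectors of 384 dimensions this gives 15.36 MB, 7.68 MB, 3.84 MB, and 0.48 MB. The gap between these figures and the reported storage is likely explained by data that does not shrink with quantization, such as the copy of each embedding that the changelog says is also kept in SQLite, though the diff does not break the totals down.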
