simplevecdb 1.2.0__tar.gz → 2.0.0__tar.gz

Files changed (112)
  1. simplevecdb-2.0.0/.bandit +9 -0
  2. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/.github/workflows/ci.yml +2 -0
  3. simplevecdb-2.0.0/.github/workflows/publish.yml +121 -0
  4. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/.github/workflows/security.yml +4 -4
  5. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/CHANGELOG.md +163 -0
  6. simplevecdb-1.2.0/README.md → simplevecdb-2.0.0/PKG-INFO +68 -32
  7. simplevecdb-1.2.0/PKG-INFO → simplevecdb-2.0.0/README.md +48 -52
  8. simplevecdb-2.0.0/docs/CHANGELOG.md +379 -0
  9. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/docs/api/async.md +6 -2
  10. simplevecdb-2.0.0/docs/api/core.md +79 -0
  11. simplevecdb-2.0.0/docs/api/engine/search.md +46 -0
  12. simplevecdb-2.0.0/docs/benchmarks.md +122 -0
  13. simplevecdb-2.0.0/docs/examples.md +154 -0
  14. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/docs/index.md +26 -18
  15. simplevecdb-2.0.0/examples/auto_embed.py +27 -0
  16. simplevecdb-2.0.0/examples/backend_benchmark.py +326 -0
  17. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/examples/embeddings/perf_benchmark.py +20 -10
  18. simplevecdb-2.0.0/examples/quant_benchmark.py +58 -0
  19. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/examples/rag/langchain_rag.ipynb +7 -10
  20. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/examples/rag/llama_rag.ipynb +6 -11
  21. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/examples/rag/ollama_rag.ipynb +1 -1
  22. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/pyproject.toml +5 -6
  23. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/src/simplevecdb/__init__.py +17 -2
  24. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/src/simplevecdb/async_core.py +63 -14
  25. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/src/simplevecdb/config.py +3 -1
  26. simplevecdb-2.0.0/src/simplevecdb/constants.py +86 -0
  27. simplevecdb-2.0.0/src/simplevecdb/core.py +938 -0
  28. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/src/simplevecdb/embeddings/models.py +4 -6
  29. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/src/simplevecdb/embeddings/server.py +94 -3
  30. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/src/simplevecdb/engine/__init__.py +2 -1
  31. simplevecdb-2.0.0/src/simplevecdb/engine/catalog.py +497 -0
  32. simplevecdb-2.0.0/src/simplevecdb/engine/search.py +450 -0
  33. simplevecdb-2.0.0/src/simplevecdb/engine/usearch_index.py +421 -0
  34. simplevecdb-2.0.0/src/simplevecdb/logging.py +214 -0
  35. simplevecdb-2.0.0/src/simplevecdb/types.py +70 -0
  36. simplevecdb-2.0.0/src/simplevecdb/utils.py +176 -0
  37. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/core/test_core_additional_coverage.py +0 -77
  38. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/core/test_factory_methods.py +1 -16
  39. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/core/test_filters.py +7 -7
  40. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/core/test_initialization.py +6 -31
  41. simplevecdb-2.0.0/tests/unit/core/test_quantization.py +86 -0
  42. simplevecdb-2.0.0/tests/unit/core/test_similarity_search.py +127 -0
  43. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/embeddings/test_models.py +1 -2
  44. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/embeddings/test_server.py +2 -1
  45. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/test_async.py +23 -0
  46. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/test_core.py +175 -31
  47. simplevecdb-2.0.0/tests/unit/test_error_handling.py +495 -0
  48. simplevecdb-2.0.0/tests/unit/test_search_coverage.py +120 -0
  49. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/test_types.py +1 -1
  50. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/uv.lock +171 -251
  51. simplevecdb-1.2.0/.bandit +0 -9
  52. simplevecdb-1.2.0/.github/workflows/publish.yml +0 -57
  53. simplevecdb-1.2.0/docs/CHANGELOG.md +0 -124
  54. simplevecdb-1.2.0/docs/api/core.md +0 -5
  55. simplevecdb-1.2.0/docs/api/engine/search.md +0 -5
  56. simplevecdb-1.2.0/docs/benchmarks.md +0 -36
  57. simplevecdb-1.2.0/docs/examples.md +0 -31
  58. simplevecdb-1.2.0/examples/auto_embed.py +0 -8
  59. simplevecdb-1.2.0/examples/quant_benchmark.py +0 -42
  60. simplevecdb-1.2.0/src/simplevecdb/constants.py +0 -38
  61. simplevecdb-1.2.0/src/simplevecdb/core.py +0 -673
  62. simplevecdb-1.2.0/src/simplevecdb/engine/catalog.py +0 -349
  63. simplevecdb-1.2.0/src/simplevecdb/engine/search.py +0 -418
  64. simplevecdb-1.2.0/src/simplevecdb/types.py +0 -34
  65. simplevecdb-1.2.0/src/simplevecdb/utils.py +0 -19
  66. simplevecdb-1.2.0/tests/unit/core/test_brute_force.py +0 -142
  67. simplevecdb-1.2.0/tests/unit/core/test_quantization.py +0 -51
  68. simplevecdb-1.2.0/tests/unit/core/test_similarity_search.py +0 -47
  69. simplevecdb-1.2.0/tests/unit/test_search_coverage.py +0 -198
  70. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/.env.example +0 -0
  71. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/.github/FUNDING.yml +0 -0
  72. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/.github/dependabot.yml +0 -0
  73. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/.github/workflows/update-sponsors.yml +0 -0
  74. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/.gitignore +0 -0
  75. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/.pre-commit-config.yaml +0 -0
  76. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/.python-version +0 -0
  77. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/CODE_OF_CONDUCT.md +0 -0
  78. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/CONTRIBUTING.md +0 -0
  79. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/LICENSE +0 -0
  80. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/SECURITY.md +0 -0
  81. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/docs/CONTRIBUTING.md +0 -0
  82. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/docs/ENV_SETUP.md +0 -0
  83. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/docs/LICENSE +0 -0
  84. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/docs/api/config.md +0 -0
  85. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/docs/api/embeddings.md +0 -0
  86. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/docs/api/engine/catalog.md +0 -0
  87. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/docs/api/engine/quantization.md +0 -0
  88. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/docs/api/integrations.md +0 -0
  89. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/examples/smoke_test.py +0 -0
  90. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/mkdocs.yml +0 -0
  91. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/src/simplevecdb/embeddings/__init__.py +0 -0
  92. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/src/simplevecdb/engine/quantization.py +0 -0
  93. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/src/simplevecdb/integrations/__init__.py +0 -0
  94. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/src/simplevecdb/integrations/langchain.py +0 -0
  95. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/src/simplevecdb/integrations/llamaindex.py +0 -0
  96. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/conftest.py +0 -0
  97. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/integration/test_langchain.py +0 -0
  98. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/integration/test_llamaindex.py +0 -0
  99. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/integration/test_rag.py +0 -0
  100. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/integration/test_server.py +0 -0
  101. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/perf/test_batch_detection.py +0 -0
  102. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/perf/test_performance.py +0 -0
  103. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/core/__init__.py +0 -0
  104. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/core/test_batch_detection.py +0 -0
  105. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/embeddings/__init__.py +0 -0
  106. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/integrations/__init__.py +0 -0
  107. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/integrations/test_langchain_coverage.py +0 -0
  108. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/integrations/test_llamaindex_coverage.py +0 -0
  109. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/test_config.py +0 -0
  110. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/test_multi_collection.py +0 -0
  111. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/test_search.py +0 -0
  112. {simplevecdb-1.2.0 → simplevecdb-2.0.0}/tests/unit/test_utils.py +0 -0
@@ -0,0 +1,9 @@
+ exclude_dirs:
+
+ - /tests
+ - /examples
+
+ skips:
+
+ - B104
+ - B608 # SQL injection false positive: table names are validated via _validate_table_name()
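The `B608` skip above relies on table names being checked before they are interpolated into SQL. The diff does not show `_validate_table_name()` itself, so the following is only a minimal sketch of what such a check might look like. The function name `validate_table_name`, the regular expression, and the length limit are assumptions made for illustration, not the package's actual code.

```python
import re

# Hypothetical allow-list: letters, digits and underscores, not starting with a digit.
_TABLE_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_MAX_LEN = 64  # arbitrary illustrative limit


def validate_table_name(name: str) -> str:
    """Return name unchanged if it is a safe SQL identifier, else raise ValueError."""
    if not isinstance(name, str) or len(name) > _MAX_LEN:
        raise ValueError(f"invalid table name: {name!r}")
    if _TABLE_NAME_RE.fullmatch(name) is None:
        raise ValueError(f"invalid table name: {name!r}")
    return name


# Only a validated identifier is formatted into the statement;
# row values still go through bound parameters (the "?" placeholder).
table = validate_table_name("items_default")
query = f"SELECT id, text FROM {table} WHERE id = ?"
```

With a check of this kind in front of every formatted statement, a name such as `docs; DROP TABLE x` is rejected before any SQL is built, which is the reasoning behind treating the Bandit finding as a false positive.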
@@ -1,4 +1,6 @@
  name: CI
+ permissions:
+   contents: read

  on:
    push:
@@ -0,0 +1,121 @@
+ name: Publish to PyPI
+
+ on:
+   push:
+     tags:
+       - "v*.*.*"
+
+ permissions:
+   id-token: write
+   contents: write
+
+ jobs:
+   # Verify version matches before publishing
+   verify:
+     runs-on: ubuntu-latest
+     outputs:
+       version: ${{ steps.version.outputs.VERSION }}
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Extract version from tag
+         id: version
+         run: echo "VERSION=${GITHUB_REF#refs/tags/v}" >> $GITHUB_OUTPUT
+
+       - name: Install uv
+         uses: astral-sh/setup-uv@v3
+
+       - name: Set up Python
+         run: uv python install 3.11
+
+       - name: Verify __version__ matches tag
+         run: |
+           PACKAGE_VERSION=$(uv run python -c "from simplevecdb import __version__; print(__version__)")
+           TAG_VERSION="${{ steps.version.outputs.VERSION }}"
+           if [ "$PACKAGE_VERSION" != "$TAG_VERSION" ]; then
+             echo "❌ Version mismatch!"
+             echo " Tag version: $TAG_VERSION"
+             echo " Package version: $PACKAGE_VERSION"
+             echo ""
+             echo "Update __version__ in src/simplevecdb/__init__.py to match the tag."
+             exit 1
+           fi
+           echo "✅ Version match: $PACKAGE_VERSION"
+
+   release:
+     needs: verify
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+         with:
+           fetch-depth: 0
+
+       - name: Extract changelog for release
+         id: changelog
+         run: |
+           VERSION="${{ needs.verify.outputs.version }}"
+           # Extract section for this version from CHANGELOG.md
+           # Matches from "## [VERSION]" until the next "## [" or end of file
+           awk -v ver="$VERSION" '
+             /^## \[/ {
+               if (found) exit
+               if (index($0, "## [" ver "]") == 1) found=1
+             }
+             found
+           ' CHANGELOG.md > release_notes.md
+
+           # If empty, provide a fallback
+           if [ ! -s release_notes.md ]; then
+             echo "## Release v$VERSION" > release_notes.md
+             echo "" >> release_notes.md
+             echo "See [CHANGELOG.md](https://github.com/${{ github.repository }}/blob/main/CHANGELOG.md) for details." >> release_notes.md
+           fi
+
+           echo "📋 Release notes:"
+           cat release_notes.md
+
+       - name: Create GitHub Release
+         uses: softprops/action-gh-release@v2
+         with:
+           body_path: release_notes.md
+           draft: false
+           prerelease: ${{ contains(needs.verify.outputs.version, 'rc') || contains(needs.verify.outputs.version, 'beta') || contains(needs.verify.outputs.version, 'alpha') }}
+           generate_release_notes: false
+
+   publish:
+     needs: [verify, release]
+     runs-on: ubuntu-latest
+     environment: pypi
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Install uv
+         uses: astral-sh/setup-uv@v3
+
+       - name: Set up Python
+         run: uv python install 3.11
+
+       - name: Build package
+         run: uv build
+
+       - name: Verify build artifacts
+         run: |
+           echo "📦 Built packages:"
+           ls -la dist/
+           # Verify version in built package
+           uv run python -c "
+           import zipfile
+           import glob
+           whl = glob.glob('dist/*.whl')[0]
+           with zipfile.ZipFile(whl) as z:
+               for name in z.namelist():
+                   if name.endswith('METADATA'):
+                       content = z.read(name).decode()
+                       for line in content.split('\n'):
+                           if line.startswith('Version:'):
+                               print(f'✅ Package version: {line}')
+                               break
+           "
+
+       - name: Publish to PyPI
+         run: uv publish --token ${{ secrets.PYPI_API_TOKEN }}
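The `awk` program in the `release` job is compact enough that its control flow is easy to misread. As a reading aid, here is a rough Python equivalent of the same logic: start copying at the `## [VERSION]` heading and stop at the next `## [` heading. The function name `extract_release_notes` is made up for this sketch and is not part of the workflow or the package.

```python
def extract_release_notes(changelog: str, version: str) -> str:
    """Return the changelog section for `version`, heading included, or "" if absent."""
    header = f"## [{version}]"
    collected = []
    found = False
    for line in changelog.splitlines():
        if line.startswith("## ["):
            if found:
                break  # reached the next version heading: stop, like awk's `exit`
            if line.startswith(header):
                found = True  # same test as index($0, "## [" ver "]") == 1
        if found:
            collected.append(line)
    return "\n".join(collected)


sample = """# Changelog

## [2.0.0] - 2025-12-23

- Backend migration to usearch

## [1.3.0] - 2025-12-07

- Structured logging
"""
notes = extract_release_notes(sample, "2.0.0")
```

An empty result corresponds to the workflow's fallback branch, where a generic "See CHANGELOG.md" note is written instead.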
@@ -22,12 +22,12 @@ jobs:
        - name: Create venv and install dependencies
          run: |
            uv venv
-           uv pip install safety pip-audit
+           uv pip install pip-audit
+           uv pip install ".[dev,server]"

-       - name: Scan dependencies
+       - name: Scan dependencies with pip-audit
          run: |
-           uv run safety check --json || true
-           uv run pip-audit --requirement pyproject.toml || true
+           uv run pip-audit --strict --ignore-vuln PYSEC-2024-142 || true

    code-scan:
      runs-on: ubuntu-latest
@@ -5,6 +5,168 @@ All notable changes to SimpleVecDB will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [2.0.0] - 2025-12-23
9
+
10
+ ### Breaking Changes
11
+
12
+ - **Backend Migration: sqlite-vec → usearch HNSW**
13
+ - Vector search now uses usearch's high-performance HNSW algorithm
14
+ - 10-100x faster similarity search for large collections
15
+ - Vector data stored in separate `.usearch` files per collection (e.g., `mydb.db.default.usearch`)
16
+ - SQLite still stores metadata, text, and FTS5 index
17
+
18
+ - **Removed `DistanceStrategy.L1`** - Manhattan distance not supported by usearch
19
+
20
+ - **Storage Format Change**
21
+ - Embeddings now stored in both usearch index AND SQLite (for MMR support)
22
+ - Existing sqlite-vec databases will auto-migrate on first open
23
+ - Migration is one-way; backup before upgrading
24
+
25
+ ### Added
26
+
27
+ - **`usearch_index.py`** - New UsearchIndex wrapper class:
28
+ - Thread-safe HNSW index operations (lock on writes, lock-free reads)
29
+ - Automatic persistence to `.usearch` files
30
+ - Upsert support (removes existing keys before add)
31
+ - BIT quantization using Hamming metric with bit packing
32
+ - Configurable HNSW parameters (connectivity, expansion_add, expansion_search)
33
+
34
+ - **Proper MMR Implementation** - Max Marginal Relevance now computes actual pairwise similarity between candidates and selected documents using stored embeddings
35
+
36
+ - **Embedding Storage in SQLite** - Embeddings stored as BLOB for:
37
+ - Accurate MMR diversity computation
38
+ - Future index rebuild from SQLite backup
39
+ - Schema auto-migrates existing tables
40
+
41
+ - **`VectorCollection.rebuild_index()`** - Reconstruct usearch HNSW index from SQLite embeddings:
42
+ - Useful for index corruption recovery
43
+ - Tune HNSW parameters (connectivity, expansion_add, expansion_search)
44
+ - Reclaim space after many deletions
45
+
46
+ - **`VectorDB.check_migration(path)`** - Dry-run migration check:
47
+ - Reports which collections need migration
48
+ - Shows total vector count and estimated storage
49
+ - Provides detailed rollback instructions
50
+
51
+ - **Adaptive Search** - Automatically optimizes search strategy based on collection size:
52
+ - Collections < 10k vectors use brute-force (`exact=True`) for perfect recall
53
+ - Collections ≥ 10k vectors use HNSW for faster approximate search
54
+ - Threshold configurable via `constants.USEARCH_BRUTEFORCE_THRESHOLD`
55
+
56
+ - **`exact` parameter** - Force search mode in `similarity_search()`:
57
+ - `None` (default): adaptive based on collection size
58
+ - `True`: force brute-force for perfect recall
59
+ - `False`: force HNSW approximate search
60
+
61
+ - **`Quantization.FLOAT16`** - Half-precision floating point:
62
+ - 2x memory savings compared to FLOAT32
63
+ - 1.5x faster search with minimal precision loss
64
+ - Ideal for embeddings where full precision isn't needed
65
+
66
+ - **`threads` parameter** - Parallel execution control:
67
+ - Added to `add_texts()` and `similarity_search()`
68
+ - `0` (default): auto-detect optimal thread count
69
+ - Explicit value: control parallelism for batch operations
70
+
71
+ - **Auto Memory-Mapping** - Large indexes automatically use memory-mapped mode:
72
+ - Indexes >100k vectors use `view=True` for instant startup
73
+ - Lower memory footprint for large collections
74
+ - Transparent upgrade to writable mode on add operations
75
+ - Configurable via `constants.USEARCH_MMAP_THRESHOLD`
76
+
77
+ - **`similarity_search_batch()`** - Multi-query batch search:
78
+ - ~10x throughput for batch query workloads
79
+ - Uses usearch's native batch search under the hood
80
+ - Same parameters as `similarity_search()` but accepts list of queries
81
+
82
+ - **`examples/backend_benchmark.py`** - Benchmark script comparing usearch vs brute-force:
83
+ - Measures speedup, recall, and storage efficiency
84
+ - Supports all quantization levels
85
+ - Validates 10-100x performance claims
86
+
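The adaptive-search and `exact` entries above describe a three-way choice. A minimal sketch of that decision rule follows, assuming the documented 10k default. The helper name `resolve_exact` and the local constant are illustrative only; the real threshold lives in `constants.USEARCH_BRUTEFORCE_THRESHOLD`, and the package's internal logic is not shown in this diff.

```python
from typing import Optional

# Local stand-in for the documented default of 10k vectors (assumption for this sketch).
BRUTEFORCE_THRESHOLD = 10_000


def resolve_exact(
    collection_size: int,
    exact: Optional[bool] = None,
    threshold: int = BRUTEFORCE_THRESHOLD,
) -> bool:
    """Return True for brute-force search, False for HNSW approximate search."""
    if exact is not None:
        return exact  # an explicit True/False always wins
    # Adaptive default: small collections are searched exhaustively.
    return collection_size < threshold


small = resolve_exact(500)                  # adaptive, below threshold
large = resolve_exact(250_000)              # adaptive, above threshold
forced = resolve_exact(250_000, exact=True) # caller forces perfect recall
```

Under these assumptions a collection of exactly 10,000 vectors already takes the HNSW path, matching the "≥ 10k" wording in the changelog.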
87
+ ### Changed
88
+
89
+ - **Dependencies**: Replaced `sqlite-vec>=0.1.6` with `usearch>=2.12`
90
+ - **CatalogManager**: Removed vec0 virtual table operations, added embedding column
91
+ - **SearchEngine**: Rewrote to use UsearchIndex for all vector operations
92
+ - **VectorCollection**: Creates usearch index at `{db_path}.{collection}.usearch`
93
+
94
+ ### Migration Notes
95
+
96
+ 1. **Backup your database** before upgrading
97
+ 2. On first open, existing sqlite-vec data will be migrated automatically
98
+ 3. New `.usearch` files will be created alongside your `.db` file
99
+ 4. The legacy sqlite-vec table is dropped after successful migration
100
+
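Because the migration is described as one-way, the backup step matters. Below is a small sketch of one way to take that backup with the standard library, copying the `.db` file together with any `.usearch` files that follow the `{db_path}.{collection}.usearch` naming shown above. The helper `backup_database` is hypothetical and not part of simplevecdb; it assumes no process has the database open while the copy runs.

```python
import shutil
from pathlib import Path


def backup_database(db_path: str, backup_dir: str) -> list[Path]:
    """Copy the SQLite file and any sibling .usearch index files into backup_dir."""
    src = Path(db_path)
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Index files sit next to the database as "<db file>.<collection>.usearch".
    candidates = [src, *sorted(src.parent.glob(f"{src.name}.*.usearch"))]
    copied = []
    for path in candidates:
        if path.is_file():
            copied.append(Path(shutil.copy2(path, dest / path.name)))
    return copied
```

A 1.x database will have no `.usearch` files yet, so only the `.db` file is copied before the first upgrade; after migration the same helper also picks up the per-collection index files.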
101
+ ## [1.3.0] - 2025-12-07
102
+
103
+ ### Added
104
+
105
+ - **Structured Logging Module** - New `simplevecdb.logging` module for production-grade observability
106
+ - `get_logger(name)` - Get namespaced loggers under `simplevecdb.*`
107
+ - `configure_logging(level, format, handler)` - One-call logging setup
108
+ - `log_operation(name, **context)` - Context manager for operation timing and error tracking
109
+ - `log_error(operation, error, **context)` - Consistent error logging with context
110
+
111
+ - **SQLite Lock Retry Logic** - Automatic retry with exponential backoff for database lock contention
112
+ - `@retry_on_lock(max_retries, base_delay, max_delay, jitter)` decorator
113
+ - `DatabaseLockedError` exception for exhausted retries with attempt/wait metrics
114
+ - Applied to `add_texts()` and `delete_by_ids()` operations in CatalogManager
115
+
116
+ - **Filter Validation** - Early validation of metadata filter dictionaries
117
+ - `validate_filter(filter_dict)` - Validates keys are strings, values are supported types
118
+ - Clear error messages for invalid filter structures
119
+ - Automatically called in `build_filter_clause()` before SQL generation
120
+
121
+ - **New Exports** - Added to `simplevecdb.__all__`:
122
+ - `get_logger`, `configure_logging`, `log_operation`
123
+ - `DatabaseLockedError`, `retry_on_lock`, `validate_filter`
124
+
125
+ ### Changed
126
+
127
+ - **CatalogManager** internal refactoring:
128
+ - `add_texts()` now delegates to `_insert_batch()` which has retry logic
129
+ - `delete_by_ids()` now has retry logic for lock contention
130
+ - `build_filter_clause()` validates filters before processing
131
+ - **`delete_by_ids()` no longer auto-vacuums** - Call `VectorDB.vacuum()` separately to reclaim disk space after large deletions. This improves performance for batch deletions.
132
+ - **RateLimiter** now includes TTL-based cleanup to prevent memory exhaustion on long-running servers with many unique clients (default: 1 hour TTL, 10k max buckets).
133
+ - **AsyncVectorDB.close()** now guarantees database connection is closed even if executor shutdown fails.
134
+
135
+ ### Testing
136
+
137
+ - Added 25 new tests in `tests/unit/test_error_handling.py`:
138
+ - 7 tests for `retry_on_lock` decorator behavior
139
+ - 2 tests for `DatabaseLockedError` exception
140
+ - 4 tests for `validate_filter` function
141
+ - 8 tests for logging utilities
142
+ - 4 integration tests for error handling in VectorDB operations
143
+
144
+ ### Example
145
+
146
+ ```python
147
+ import logging
148
+ from simplevecdb import (
149
+     VectorDB,
150
+     configure_logging,
151
+     get_logger,
152
+     log_operation,
153
+     DatabaseLockedError,
154
+ )
155
+
156
+ # Enable debug logging
157
+ configure_logging(level=logging.DEBUG)
158
+
159
+ logger = get_logger(__name__)
160
+
161
+ try:
162
+     with log_operation("bulk_insert", collection="docs", count=1000):
163
+         db = VectorDB("data.db")
164
+         collection = db.collection("docs")
165
+         collection.add_texts(texts, embeddings=embeddings)
166
+ except DatabaseLockedError as e:
167
+     logger.error(f"Insert failed after {e.attempts} attempts")
168
+ ```
169
+
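The 1.3.0 notes mention retry with exponential backoff but the diff excerpt does not include the decorator's body. The sketch below shows the general technique with only the standard library. The name `retry_on_locked` and its behaviour are assumptions for illustration: it borrows the parameter names listed in the changelog, and, unlike the documented `retry_on_lock`, it simply re-raises the original `sqlite3.OperationalError` once retries are exhausted rather than wrapping it in `DatabaseLockedError`.

```python
import functools
import random
import sqlite3
import time


def retry_on_locked(max_retries=5, base_delay=0.05, max_delay=2.0, jitter=0.1):
    """Retry a function when SQLite reports a lock, doubling the wait each time."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except sqlite3.OperationalError as exc:
                    is_lock = "locked" in str(exc).lower()
                    if not is_lock or attempt == max_retries:
                        raise  # unrelated error, or retries exhausted
                    # Exponential backoff, capped, plus a little random jitter.
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    time.sleep(delay + random.uniform(0, jitter * delay))

        return wrapper

    return decorator
```

The jitter term spreads out retries from several writers so they are less likely to collide again on the same schedule.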
8
170
  ## [1.2.0] - 2025-11-25
9
171
 
10
172
  ### Added
@@ -210,6 +372,7 @@ Benchmarks on i9-13900K & RTX 4090 with 10k vectors (384-dim):
210
372
  - **Documentation**: https://coderdayton.github.io/simplevecdb/
211
373
  - **License**: MIT
212
374
 
375
+ [1.3.0]: https://github.com/coderdayton/simplevecdb/releases/tag/v1.3.0
213
376
  [1.2.0]: https://github.com/coderdayton/simplevecdb/releases/tag/v1.2.0
214
377
  [1.1.1]: https://github.com/coderdayton/simplevecdb/releases/tag/v1.1.1
215
378
  [1.1.0]: https://github.com/coderdayton/simplevecdb/releases/tag/v1.1.0
@@ -1,3 +1,23 @@
1
+ Metadata-Version: 2.4
2
+ Name: simplevecdb
3
+ Version: 2.0.0
4
+ Summary: Dead-simple local vector database powered by usearch HNSW.
5
+ Author-email: Dayton Dunbar <coderdayton14@gmail.com>
6
+ License: MIT
7
+ License-File: LICENSE
8
+ Requires-Python: >=3.10
9
+ Requires-Dist: numpy>=2.0
10
+ Requires-Dist: psutil>=5.9.0
11
+ Requires-Dist: python-dotenv>=1.2.1
12
+ Requires-Dist: usearch>=2.12
13
+ Provides-Extra: examples
14
+ Requires-Dist: ollama; extra == 'examples'
15
+ Provides-Extra: server
16
+ Requires-Dist: fastapi>=0.115; extra == 'server'
17
+ Requires-Dist: sentence-transformers>=5.0; extra == 'server'
18
+ Requires-Dist: uvicorn[standard]>=0.30; extra == 'server'
19
+ Description-Content-Type: text/markdown
20
+
1
21
  # SimpleVecDB
2
22
 
3
23
  [![CI](https://github.com/coderdayton/simplevecdb/actions/workflows/ci.yml/badge.svg)](https://github.com/coderdayton/simplevecdb/actions)
@@ -7,12 +27,12 @@
7
27
 
8
28
  **The dead-simple, local-first vector database.**
9
29
 
10
- SimpleVecDB brings **Chroma-like simplicity** to a single **SQLite file**. Built on `sqlite-vec`, it offers high-performance vector search, quantization, and zero infrastructure headaches. Perfect for local RAG, offline agents, and indie hackers who need production-grade vector search without the operational overhead.
30
+ SimpleVecDB brings **Chroma-like simplicity** to a single **SQLite file**. Built on `usearch` HNSW indexing, it offers high-performance vector search, quantization, and zero infrastructure headaches. Perfect for local RAG, offline agents, and indie hackers who need production-grade vector search without the operational overhead.
11
31
 
12
32
  ## Why SimpleVecDB?
13
33
 
14
34
  - **Zero Infrastructure** — Just a `.db` file. No Docker, no Redis, no cloud bills.
15
- - **Blazing Fast** — ~2ms queries on consumer hardware with 32x storage efficiency via quantization.
35
+ - **Blazing Fast** — 10-100x faster search via usearch HNSW. Adaptive: brute-force for <10k vectors (perfect recall), HNSW for larger collections.
16
36
  - **Truly Portable** — Runs anywhere SQLite runs: Linux, macOS, Windows, even WASM.
17
37
  - **Async Ready** — Full async/await support for web servers and concurrent workloads.
18
38
  - **Batteries Included** — Optional FastAPI embeddings server + LangChain/LlamaIndex integrations.
@@ -178,8 +198,8 @@ Organize vectors by domain within a single database file:
178
198
  from simplevecdb import VectorDB, Quantization
179
199
 
180
200
  db = VectorDB("app.db")
181
- users = db.collection("users", quantization=Quantization.INT8)
182
- products = db.collection("products", quantization=Quantization.BIT)
201
+ users = db.collection("users", quantization=Quantization.FLOAT16) # 2x memory savings
202
+ products = db.collection("products", quantization=Quantization.BIT) # 32x compression
183
203
 
184
204
  # Isolated namespaces
185
205
  users.add_texts(["Alice likes hiking"], embeddings=[[0.1]*384])
@@ -189,9 +209,22 @@ products.add_texts(["Hiking boots"], embeddings=[[0.9]*384])
189
209
  ### Search Capabilities
190
210
 
191
211
  ```python
192
- # Vector similarity (cosine/L2/inner product)
212
+ # Vector similarity (cosine/L2) - adaptive search by default
193
213
  results = collection.similarity_search(query_vector, k=10)
194
214
 
215
+ # Force exact search for perfect recall (brute-force)
216
+ results = collection.similarity_search(query_vector, k=10, exact=True)
217
+
218
+ # Force HNSW approximate search (faster, may miss some results)
219
+ results = collection.similarity_search(query_vector, k=10, exact=False)
220
+
221
+ # Parallel search with explicit thread count
222
+ results = collection.similarity_search(query_vector, k=10, threads=8)
223
+
224
+ # Batch search - 10x throughput for multiple queries
225
+ queries = [query1, query2, query3] # List of embedding vectors
226
+ batch_results = collection.similarity_search_batch(queries, k=10)
227
+
195
228
  # Keyword search (BM25)
196
229
  results = collection.keyword_search("exact phrase", k=10)
197
230
 
@@ -211,36 +244,37 @@ results = collection.similarity_search(
211
244
 
212
245
  ## Feature Matrix
213
246
 
214
- | Feature | Status | Description |
215
- | :------------------------ | :----- | :--------------------------------------------------------- |
216
- | **Single-File Storage** | ✅ | SQLite `.db` file or in-memory mode |
217
- | **Multi-Collection** | ✅ | Isolated namespaces per database |
218
- | **Vector Search** | ✅ | Cosine, Euclidean, Inner Product metrics |
219
- | **Hybrid Search** | ✅ | BM25 + vector fusion (Reciprocal Rank Fusion) |
220
- | **Quantization** | ✅ | FLOAT32, INT8, BIT (1-bit) for 4-32x compression |
221
- | **Metadata Filtering** | ✅ | SQL `WHERE` clause support |
222
- | **Framework Integration** | ✅ | LangChain \& LlamaIndex adapters |
223
- | **Hardware Acceleration** | ✅ | Auto-detects CUDA/MPS/CPU |
224
- | **Local Embeddings** | ✅ | HuggingFace models via `[server]` extras |
225
- | **HNSW Indexing** | 🔜 | Approximate nearest neighbor (pending `sqlite-vec` update) |
226
- | **Built-in Encryption** | 🔜 | SQLCipher integration for at-rest encryption |
247
+ | Feature | Status | Description |
248
+ | :------------------------ | :----- | :----------------------------------------------------------- |
249
+ | **Single-File Storage** | ✅ | SQLite `.db` file or in-memory mode |
250
+ | **Multi-Collection** | ✅ | Isolated namespaces per database |
251
+ | **HNSW Indexing** | ✅ | usearch HNSW for 10-100x faster search |
252
+ | **Adaptive Search** | ✅ | Auto brute-force for <10k vectors, HNSW for larger |
253
+ | **Vector Search** | ✅ | Cosine, Euclidean metrics (L1 removed in v2.0) |
254
+ | **Hybrid Search** | ✅ | BM25 + vector fusion (Reciprocal Rank Fusion) |
255
+ | **Quantization** | ✅ | FLOAT32, FLOAT16, INT8, BIT for 2-32x compression |
256
+ | **Parallel Operations** | ✅ | `threads` parameter for add/search |
257
+ | **Metadata Filtering** | ✅ | SQL `WHERE` clause support |
258
+ | **Framework Integration** | ✅ | LangChain \& LlamaIndex adapters |
259
+ | **Hardware Acceleration** | ✅ | Auto-detects CUDA/MPS/CPU + SIMD via usearch |
260
+ | **Local Embeddings** | ✅ | HuggingFace models via `[server]` extras |
261
+ | **Built-in Encryption** | 🔜 | SQLCipher integration for at-rest encryption |
227
262
 
228
263
  ## Performance Benchmarks
229
264
 
230
- **Test Environment:** Intel i9-13900K, NVIDIA RTX 4090, `sqlite-vec` v0.1.6
231
- **Dataset:** 10,000 vectors × 384 dimensions
232
-
233
- | Quantization | Storage Size | Insert Speed | Query Latency (k=10) | Compression Ratio |
234
- | :----------- | :----------- | :----------- | :------------------- | :---------------- |
235
- | **FLOAT32** | 15.50 MB | 15,585 vec/s | 3.55 ms | 1x (baseline) |
236
- | **INT8** | 4.23 MB | 27,893 vec/s | 3.93 ms | 3.7x smaller |
237
- | **BIT** | 0.95 MB | 32,321 vec/s | 0.27 ms | 16.3x smaller |
265
+ **10,000 vectors, 384 dimensions, k=10 search** [Full benchmarks →](https://coderdayton.github.io/SimpleVecDB/benchmarks)
238
266
 
239
- **Key Takeaways:**
267
+ | Quantization | Storage | Query Time | Compression |
268
+ | :----------- | :------- | :--------- | :---------- |
269
+ | FLOAT32 | 36.0 MB | 0.20 ms | 1x |
270
+ | FLOAT16 | 28.7 MB | 0.20 ms | 2x |
271
+ | INT8 | 25.0 MB | 0.16 ms | 4x |
272
+ | BIT | 21.8 MB | 0.08 ms | 32x |
240
273
 
241
- - BIT quantization delivers 13x faster queries with 16x storage reduction
242
- - INT8 offers balanced performance (79% faster inserts, minimal query overhead)
243
- - Sub-4ms query latency on consumer hardware
274
+ **Key highlights:**
275
+ - **3-34x faster** than brute-force for collections >10k vectors
276
+ - **Adaptive search**: perfect recall for small collections, HNSW for large
277
+ - **FLOAT16 recommended**: best balance of speed, memory, and precision
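The Compression column in the new table (1x, 2x, 4x, 32x) matches the raw per-vector payload rather than the on-disk totals. The short calculation below works through that arithmetic for 384-dimensional vectors. The bits-per-component values are the standard widths of these types, and the helper `vector_bytes` is made up for this sketch, not a library function.

```python
# Bits used per vector component under each quantization level.
BITS_PER_COMPONENT = {"FLOAT32": 32, "FLOAT16": 16, "INT8": 8, "BIT": 1}


def vector_bytes(dim: int, quantization: str) -> int:
    """Raw payload size of one vector, rounding up to whole bytes."""
    bits = dim * BITS_PER_COMPONENT[quantization]
    return (bits + 7) // 8


dim = 384
baseline = vector_bytes(dim, "FLOAT32")
for level in BITS_PER_COMPONENT:
    size = vector_bytes(dim, level)
    print(f"{level:8s} {size:5d} bytes/vector  {baseline / size:.0f}x")
```

The Storage column shrinks far less than these ratios. A plausible reason, going by the changelog, is that SQLite also holds text, metadata, and a copy of the embeddings, so the file totals include more than the quantized index.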
244
278
 
245
279
  ## Documentation
246
280
 
@@ -280,14 +314,16 @@ pip install torch --index-url https://download.pytorch.org/whl/cu118
280
314
  **Slow Queries on Large Datasets**
281
315
 
282
316
  - Enable quantization: `collection = db.collection("docs", quantization=Quantization.INT8)`
283
- - Consider HNSW indexing when available (roadmap item)
317
+ - For >10k vectors, HNSW is automatic; tune with `rebuild_index(connectivity=32)`
318
+ - Use `exact=False` to force HNSW even on smaller collections
284
319
  - Use metadata filtering to reduce search space
285
320
 
286
321
  ## Roadmap
287
322
 
288
323
  - [x] Hybrid Search (BM25 + Vector)
289
324
  - [x] Multi-collection support
290
- - [ ] HNSW indexing (pending `sqlite-vec` upstream)
325
+ - [x] HNSW indexing (usearch backend)
326
+ - [x] Adaptive search (brute-force/HNSW)
291
327
  - [ ] SQLCipher encryption (at-rest data protection)
292
328
  - [ ] Streaming insert API for large-scale ingestion
293
329
  - [ ] Graph-based metadata relationships
@@ -1,23 +1,3 @@
1
- Metadata-Version: 2.4
2
- Name: simplevecdb
3
- Version: 1.2.0
4
- Summary: Dead-simple local vector database powered by sqlite-vec.
5
- Author-email: Dayton Dunbar <coderdayton14@gmail.com>
6
- License: MIT
7
- License-File: LICENSE
8
- Requires-Python: >=3.10
9
- Requires-Dist: numpy>=2.0
10
- Requires-Dist: psutil>=5.9.0
11
- Requires-Dist: python-dotenv>=1.2.1
12
- Requires-Dist: sqlite-vec>=0.1.6
13
- Provides-Extra: examples
14
- Requires-Dist: ollama; extra == 'examples'
15
- Provides-Extra: server
16
- Requires-Dist: fastapi>=0.115; extra == 'server'
17
- Requires-Dist: sentence-transformers[onnx]==5.1.2; extra == 'server'
18
- Requires-Dist: uvicorn[standard]>=0.30; extra == 'server'
19
- Description-Content-Type: text/markdown
20
-
21
1
  # SimpleVecDB
22
2
 
23
3
  [![CI](https://github.com/coderdayton/simplevecdb/actions/workflows/ci.yml/badge.svg)](https://github.com/coderdayton/simplevecdb/actions)
@@ -27,12 +7,12 @@ Description-Content-Type: text/markdown
27
7
 
28
8
  **The dead-simple, local-first vector database.**
29
9
 
30
- SimpleVecDB brings **Chroma-like simplicity** to a single **SQLite file**. Built on `sqlite-vec`, it offers high-performance vector search, quantization, and zero infrastructure headaches. Perfect for local RAG, offline agents, and indie hackers who need production-grade vector search without the operational overhead.
10
+ SimpleVecDB brings **Chroma-like simplicity** to a single **SQLite file**. Built on `usearch` HNSW indexing, it offers high-performance vector search, quantization, and zero infrastructure headaches. Perfect for local RAG, offline agents, and indie hackers who need production-grade vector search without the operational overhead.
31
11
 
32
12
  ## Why SimpleVecDB?
33
13
 
34
14
  - **Zero Infrastructure** — Just a `.db` file. No Docker, no Redis, no cloud bills.
35
- - **Blazing Fast** — ~2ms queries on consumer hardware with 32x storage efficiency via quantization.
15
+ - **Blazing Fast** — 10-100x faster search via usearch HNSW. Adaptive: brute-force for <10k vectors (perfect recall), HNSW for larger collections.
36
16
  - **Truly Portable** — Runs anywhere SQLite runs: Linux, macOS, Windows, even WASM.
37
17
  - **Async Ready** — Full async/await support for web servers and concurrent workloads.
38
18
  - **Batteries Included** — Optional FastAPI embeddings server + LangChain/LlamaIndex integrations.
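
The adaptive behaviour described in the "Blazing Fast" bullet can be pictured as a size-based dispatch. The following is a minimal sketch, not SimpleVecDB's implementation: `choose_strategy`, `brute_force_search`, and `HNSW_THRESHOLD` are hypothetical names, and the 10k cut-off is taken from the README wording above. Only the exact (brute-force) path is implemented here; the HNSW path is represented by a label.

```python
import math

# Assumed cut-off, taken from the README's "<10k vectors" wording.
HNSW_THRESHOLD = 10_000


def choose_strategy(n_vectors, exact=None):
    """Pick a search path (hypothetical helper, not the library API).

    exact=True forces brute force, exact=False forces HNSW,
    exact=None decides by collection size.
    """
    if exact is True:
        return "brute_force"
    if exact is False:
        return "hnsw"
    return "brute_force" if n_vectors < HNSW_THRESHOLD else "hnsw"


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(vectors, query, k):
    """Exact top-k: score every vector, return indices best-first."""
    order = sorted(
        range(len(vectors)),
        key=lambda i: cosine(vectors[i], query),
        reverse=True,
    )
    return order[:k]
```

Brute force compares the query against every stored vector, so it cannot miss a neighbour; that is why it gives perfect recall, and why its cost grows linearly with collection size.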
@@ -198,8 +178,8 @@ Organize vectors by domain within a single database file:
198
178
  from simplevecdb import VectorDB, Quantization
199
179
 
200
180
  db = VectorDB("app.db")
201
- users = db.collection("users", quantization=Quantization.INT8)
202
- products = db.collection("products", quantization=Quantization.BIT)
181
+ users = db.collection("users", quantization=Quantization.FLOAT16) # 2x memory savings
182
+ products = db.collection("products", quantization=Quantization.BIT) # 32x compression
203
183
 
204
184
  # Isolated namespaces
205
185
  users.add_texts(["Alice likes hiking"], embeddings=[[0.1]*384])
@@ -209,9 +189,22 @@ products.add_texts(["Hiking boots"], embeddings=[[0.9]*384])
209
189
  ### Search Capabilities
210
190
 
211
191
  ```python
212
- # Vector similarity (cosine/L2/inner product)
192
+ # Vector similarity (cosine/L2) - adaptive search by default
213
193
  results = collection.similarity_search(query_vector, k=10)
214
194
 
195
+ # Force exact search for perfect recall (brute-force)
196
+ results = collection.similarity_search(query_vector, k=10, exact=True)
197
+
198
+ # Force HNSW approximate search (faster, may miss some results)
199
+ results = collection.similarity_search(query_vector, k=10, exact=False)
200
+
201
+ # Parallel search with explicit thread count
202
+ results = collection.similarity_search(query_vector, k=10, threads=8)
203
+
204
+ # Batch search - 10x throughput for multiple queries
205
+ queries = [query1, query2, query3] # List of embedding vectors
206
+ batch_results = collection.similarity_search_batch(queries, k=10)
207
+
215
208
  # Keyword search (BM25)
216
209
  results = collection.keyword_search("exact phrase", k=10)
217
210
 
@@ -231,36 +224,37 @@ results = collection.similarity_search(
231
224
 
232
225
  ## Feature Matrix
233
226
 
234
- | Feature | Status | Description |
235
- | :------------------------ | :----- | :--------------------------------------------------------- |
236
- | **Single-File Storage** | ✅ | SQLite `.db` file or in-memory mode |
237
- | **Multi-Collection** | ✅ | Isolated namespaces per database |
238
- | **Vector Search** | ✅ | Cosine, Euclidean, Inner Product metrics |
239
- | **Hybrid Search** | ✅ | BM25 + vector fusion (Reciprocal Rank Fusion) |
240
- | **Quantization** | ✅ | FLOAT32, INT8, BIT (1-bit) for 4-32x compression |
241
- | **Metadata Filtering** | ✅ | SQL `WHERE` clause support |
242
- | **Framework Integration** | ✅ | LangChain \& LlamaIndex adapters |
243
- | **Hardware Acceleration** | ✅ | Auto-detects CUDA/MPS/CPU |
244
- | **Local Embeddings** | ✅ | HuggingFace models via `[server]` extras |
245
- | **HNSW Indexing** | 🔜 | Approximate nearest neighbor (pending `sqlite-vec` update) |
246
- | **Built-in Encryption** | 🔜 | SQLCipher integration for at-rest encryption |
227
+ | Feature | Status | Description |
228
+ | :------------------------ | :----- | :----------------------------------------------------------- |
229
+ | **Single-File Storage** | ✅ | SQLite `.db` file or in-memory mode |
230
+ | **Multi-Collection** | ✅ | Isolated namespaces per database |
231
+ | **HNSW Indexing** | ✅ | usearch HNSW for 10-100x faster search |
232
+ | **Adaptive Search** | ✅ | Auto brute-force for <10k vectors, HNSW for larger |
233
+ | **Vector Search** | ✅ | Cosine, Euclidean metrics (L1 removed in v2.0) |
234
+ | **Hybrid Search** | ✅ | BM25 + vector fusion (Reciprocal Rank Fusion) |
235
+ | **Quantization** | ✅ | FLOAT32, FLOAT16, INT8, BIT for 2-32x compression |
236
+ | **Parallel Operations** | ✅ | `threads` parameter for add/search |
237
+ | **Metadata Filtering** | ✅ | SQL `WHERE` clause support |
238
+ | **Framework Integration** | ✅ | LangChain \& LlamaIndex adapters |
239
+ | **Hardware Acceleration** | ✅ | Auto-detects CUDA/MPS/CPU + SIMD via usearch |
240
+ | **Local Embeddings** | ✅ | HuggingFace models via `[server]` extras |
241
+ | **Built-in Encryption** | 🔜 | SQLCipher integration for at-rest encryption |
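
The "Hybrid Search" row names Reciprocal Rank Fusion (RRF) as the way BM25 and vector results are combined. The sketch below shows the general RRF technique under stated assumptions: `reciprocal_rank_fusion` and `rrf_k` are illustrative names, and `rrf_k=60` is the constant conventionally used for RRF, not a value confirmed for SimpleVecDB.

```python
def reciprocal_rank_fusion(rankings, rrf_k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (rrf_k + rank) per list it appears in
    (rank starts at 1); scores are summed across lists.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result lists: ids ordered best-first by each retriever.
vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

RRF uses only rank positions, so BM25 scores and vector distances never need to be put on a common scale.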
247
242
 
248
243
  ## Performance Benchmarks
249
244
 
250
- **Test Environment:** Intel i9-13900K, NVIDIA RTX 4090, `sqlite-vec` v0.1.6
251
- **Dataset:** 10,000 vectors × 384 dimensions
252
-
253
- | Quantization | Storage Size | Insert Speed | Query Latency (k=10) | Compression Ratio |
254
- | :----------- | :----------- | :----------- | :------------------- | :---------------- |
255
- | **FLOAT32** | 15.50 MB | 15,585 vec/s | 3.55 ms | 1x (baseline) |
256
- | **INT8** | 4.23 MB | 27,893 vec/s | 3.93 ms | 3.7x smaller |
257
- | **BIT** | 0.95 MB | 32,321 vec/s | 0.27 ms | 16.3x smaller |
245
+ **10,000 vectors, 384 dimensions, k=10 search** [Full benchmarks →](https://coderdayton.github.io/SimpleVecDB/benchmarks)
258
246
 
259
- **Key Takeaways:**
247
+ | Quantization | Storage | Query Time | Compression |
248
+ | :----------- | :------- | :--------- | :---------- |
249
+ | FLOAT32 | 36.0 MB | 0.20 ms | 1x |
250
+ | FLOAT16 | 28.7 MB | 0.20 ms | 2x |
251
+ | INT8 | 25.0 MB | 0.16 ms | 4x |
252
+ | BIT | 21.8 MB | 0.08 ms | 32x |
260
253
 
261
- - BIT quantization delivers 13x faster queries with 16x storage reduction
262
- - INT8 offers balanced performance (79% faster inserts, minimal query overhead)
263
- - Sub-4ms query latency on consumer hardware
254
+ **Key highlights:**
255
+ - **3-34x faster** than brute-force for collections >10k vectors
256
+ - **Adaptive search**: perfect recall for small collections, HNSW for large
257
+ - **FLOAT16 recommended**: best balance of speed, memory, and precision
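
The figures in the Compression column follow from the number of bits stored per dimension. The arithmetic can be checked with a short sketch; `BITS_PER_DIM`, `bytes_per_vector`, and `compression_ratio` are illustrative names, not part of the library. The ratios describe raw vector payload only and are not a prediction of on-disk file size.

```python
import math

# Bits stored per dimension for each quantization level named in the table.
BITS_PER_DIM = {"FLOAT32": 32, "FLOAT16": 16, "INT8": 8, "BIT": 1}


def bytes_per_vector(dim, quantization):
    """Raw payload size of one vector, rounded up to whole bytes."""
    return math.ceil(dim * BITS_PER_DIM[quantization] / 8)


def compression_ratio(dim, quantization):
    """Payload size relative to the FLOAT32 baseline."""
    return bytes_per_vector(dim, "FLOAT32") / bytes_per_vector(dim, quantization)
```

For the 384-dimensional vectors used in the benchmark, this gives 1536, 768, 384, and 48 bytes per vector for FLOAT32, FLOAT16, INT8, and BIT respectively.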
264
258
 
265
259
  ## Documentation
266
260
 
@@ -300,14 +294,16 @@ pip install torch --index-url https://download.pytorch.org/whl/cu118
300
294
  **Slow Queries on Large Datasets**
301
295
 
302
296
  - Enable quantization: `collection = db.collection("docs", quantization=Quantization.INT8)`
303
- - Consider HNSW indexing when available (roadmap item)
297
+ - For >10k vectors, HNSW is automatic; tune with `rebuild_index(connectivity=32)`
298
+ - Use `exact=False` to force HNSW even on smaller collections
304
299
  - Use metadata filtering to reduce search space
305
300
 
306
301
  ## Roadmap
307
302
 
308
303
  - [x] Hybrid Search (BM25 + Vector)
309
304
  - [x] Multi-collection support
310
- - [ ] HNSW indexing (pending `sqlite-vec` upstream)
305
+ - [x] HNSW indexing (usearch backend)
306
+ - [x] Adaptive search (brute-force/HNSW)
311
307
  - [ ] SQLCipher encryption (at-rest data protection)
312
308
  - [ ] Streaming insert API for large-scale ingestion
313
309
  - [ ] Graph-based metadata relationships