simplevecdb 2.4.0__tar.gz → 2.5.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (120)
  1. {simplevecdb-2.4.0/docs → simplevecdb-2.5.0}/CHANGELOG.md +49 -0
  2. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/PKG-INFO +20 -3
  3. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/README.md +19 -2
  4. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/pyproject.toml +1 -1
  5. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/__init__.py +9 -1
  6. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/async_core.py +26 -3
  7. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/constants.py +2 -1
  8. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/core.py +159 -29
  9. simplevecdb-2.5.0/src/simplevecdb/embeddings/__init__.py +12 -0
  10. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/embeddings/server.py +194 -33
  11. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/engine/catalog.py +93 -25
  12. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/engine/quantization.py +6 -0
  13. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/engine/search.py +68 -27
  14. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/engine/usearch_index.py +63 -51
  15. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/utils.py +140 -1
  16. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/integration/test_langchain.py +1 -1
  17. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/integration/test_server.py +10 -8
  18. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/core/test_core_additional_coverage.py +1 -1
  19. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/core/test_initialization.py +2 -2
  20. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/core/test_missing_coverage.py +1 -1
  21. simplevecdb-2.5.0/tests/unit/core/test_v25_correctness.py +265 -0
  22. simplevecdb-2.5.0/tests/unit/core/test_v25_features.py +344 -0
  23. simplevecdb-2.5.0/tests/unit/core/test_v25_robustness.py +312 -0
  24. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/embeddings/test_server.py +33 -25
  25. simplevecdb-2.5.0/tests/unit/embeddings/test_v25_enhancements.py +382 -0
  26. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_core.py +6 -6
  27. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/uv.lock +1 -1
  28. simplevecdb-2.4.0/.claude/settings.local.json +0 -8
  29. simplevecdb-2.4.0/src/simplevecdb/embeddings/__init__.py +0 -0
  30. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/.bandit +0 -0
  31. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/.env.example +0 -0
  32. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/.github/FUNDING.yml +0 -0
  33. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/.github/ISSUE_TEMPLATE/bug_report.yml +0 -0
  34. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/.github/ISSUE_TEMPLATE/config.yml +0 -0
  35. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/.github/ISSUE_TEMPLATE/feature_request.yml +0 -0
  36. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/.github/dependabot.yml +0 -0
  37. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/.github/workflows/ci.yml +0 -0
  38. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/.github/workflows/publish.yml +0 -0
  39. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/.github/workflows/security.yml +0 -0
  40. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/.github/workflows/update-sponsors.yml +0 -0
  41. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/.gitignore +0 -0
  42. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/.pre-commit-config.yaml +0 -0
  43. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/.python-version +0 -0
  44. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/CODE_OF_CONDUCT.md +0 -0
  45. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/CONTRIBUTING.md +0 -0
  46. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/LICENSE +0 -0
  47. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/SECURITY.md +0 -0
  48. {simplevecdb-2.4.0 → simplevecdb-2.5.0/docs}/CHANGELOG.md +0 -0
  49. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/CONTRIBUTING.md +0 -0
  50. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/ENV_SETUP.md +0 -0
  51. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/LICENSE +0 -0
  52. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/api/async.md +0 -0
  53. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/api/config.md +0 -0
  54. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/api/core.md +0 -0
  55. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/api/embeddings.md +0 -0
  56. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/api/encryption.md +0 -0
  57. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/api/engine/catalog.md +0 -0
  58. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/api/engine/quantization.md +0 -0
  59. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/api/engine/search.md +0 -0
  60. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/api/integrations.md +0 -0
  61. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/api/types.md +0 -0
  62. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/benchmarks.md +0 -0
  63. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/examples.md +0 -0
  64. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/guides/clustering.md +0 -0
  65. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/docs/index.md +0 -0
  66. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/examples/auto_embed.py +0 -0
  67. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/examples/backend_benchmark.py +0 -0
  68. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/examples/embeddings/perf_benchmark.py +0 -0
  69. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/examples/quant_benchmark.py +0 -0
  70. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/examples/rag/langchain_rag.ipynb +0 -0
  71. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/examples/rag/llama_rag.ipynb +0 -0
  72. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/examples/rag/ollama_rag.ipynb +0 -0
  73. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/examples/smoke_test.py +0 -0
  74. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/mkdocs.yml +0 -0
  75. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/config.py +0 -0
  76. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/embeddings/models.py +0 -0
  77. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/encryption.py +0 -0
  78. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/engine/__init__.py +0 -0
  79. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/engine/clustering.py +0 -0
  80. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/integrations/__init__.py +0 -0
  81. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/integrations/langchain.py +0 -0
  82. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/integrations/llamaindex.py +0 -0
  83. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/logging.py +0 -0
  84. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/src/simplevecdb/types.py +0 -0
  85. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/conftest.py +0 -0
  86. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/integration/test_llamaindex.py +0 -0
  87. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/integration/test_rag.py +0 -0
  88. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/integration/test_v21_features.py +0 -0
  89. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/perf/test_batch_detection.py +0 -0
  90. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/perf/test_performance.py +0 -0
  91. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/core/__init__.py +0 -0
  92. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/core/test_batch_detection.py +0 -0
  93. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/core/test_factory_methods.py +0 -0
  94. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/core/test_filters.py +0 -0
  95. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/core/test_quantization.py +0 -0
  96. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/core/test_similarity_search.py +0 -0
  97. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/embeddings/__init__.py +0 -0
  98. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/embeddings/test_models.py +0 -0
  99. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/embeddings/test_server_coverage.py +0 -0
  100. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/integrations/__init__.py +0 -0
  101. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/integrations/test_langchain_coverage.py +0 -0
  102. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/integrations/test_llamaindex_coverage.py +0 -0
  103. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_async.py +0 -0
  104. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_async_coverage.py +0 -0
  105. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_catalog_coverage.py +0 -0
  106. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_clustering.py +0 -0
  107. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_config.py +0 -0
  108. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_cross_collection_search.py +0 -0
  109. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_encryption.py +0 -0
  110. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_encryption_coverage.py +0 -0
  111. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_error_handling.py +0 -0
  112. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_hierarchy.py +0 -0
  113. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_multi_collection.py +0 -0
  114. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_search.py +0 -0
  115. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_search_coverage.py +0 -0
  116. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_search_missing_coverage.py +0 -0
  117. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_streaming.py +0 -0
  118. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_types.py +0 -0
  119. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_usearch_index_missing_coverage.py +0 -0
  120. {simplevecdb-2.4.0 → simplevecdb-2.5.0}/tests/unit/test_utils.py +0 -0
@@ -5,6 +5,55 @@ All notable changes to SimpleVecDB will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [2.5.0] - 2026-04-07
9
+
10
+ ### Added
11
+
12
+ - **`delete_collection(name)`** — drop a collection's SQLite tables, FTS index, and usearch file in one call. Available on both `VectorDB` and `AsyncVectorDB`.
13
+ - **`store_embeddings` parameter** on `collection()` — opt into storing embedding BLOBs in SQLite (default `False`). Leaving it disabled roughly halves on-disk storage; MMR transparently fetches vectors from the usearch index when BLOBs are absent.
14
+ - **`async_retry_on_lock` decorator** — async variant of `retry_on_lock` using `asyncio.sleep` instead of `time.sleep`, avoiding executor thread blocking.
15
+ - **`file_lock` context manager** — advisory cross-process file locking (`fcntl`/`msvcrt`) for usearch index files. Prevents corruption from concurrent processes.
16
+ - **`__repr__`** on `VectorDB`, `VectorCollection`, `AsyncVectorDB`, `AsyncVectorCollection` for debuggable string representations.
17
+ - **FLOAT16 quantization** fully implemented in `serialize()`/`deserialize()` — was previously defined in the enum but raised `ValueError` at runtime.
18
+ - **Pagination** on `get_documents(limit=, offset=)` and catalog methods (`find_ids_by_filter`, `find_ids_by_texts`) — previously returned unbounded result sets.
19
+ - **Embeddings server enhancements:**
20
+ - Graceful shutdown with SIGTERM/SIGINT draining (10s timeout)
21
+ - CORS middleware with configurable origins for browser-based clients
22
+ - Model warm-up on startup (skip with `--no-warmup`)
23
+ - Input validation: rejects empty strings (422) and texts exceeding 100k chars (413)
24
+ - Proper `argparse` CLI with `--host`, `--port`, `--no-warmup`, `--help`
25
+ - Startup banner logging config summary (host, port, model, auth, rate limits)
26
+ - Nested token array normalization (`list[list[int]]` input format)
27
+ - Async executor offload for `embed_texts` (non-blocking event loop)
28
+ - OpenAPI version synced from package metadata
29
+ - Module `__init__.py` exports (`embed_texts`, `get_embedder`, `load_model`, `app`, `run_server`)
30
+
31
+ ### Fixed
32
+
33
+ - **`delete_by_ids` ordering** — SQLite deletion now happens first (transactional, can rollback), then usearch. Previously usearch removed first, leaving orphaned catalog entries on SQLite failure.
34
+ - **`_matches_filter` string semantics** — now uses exact equality, consistent with SQL `build_filter_clause`. Was using substring match (`value in str(meta_value)`).
35
+ - **`list_collections`** — scans `sqlite_master` for persisted collection tables instead of returning only session-cached names. Works across reopened databases.
36
+ - **WAL mode for encrypted databases** — `PRAGMA journal_mode=WAL` and `PRAGMA synchronous=NORMAL` now set for SQLCipher connections (was only set for unencrypted).
37
+ - **`collection()` cache key** — includes `distance_strategy` and `quantization` in cache key (sync version). Previously cached by name only, silently ignoring differing params on cache hit.
38
+ - **`_ensure_fts_table`** — retries up to 3 times on transient "database is locked" errors instead of permanently disabling FTS on first failure.
39
+ - **Connection health check** — `SELECT 1` probe after connection creation; raises `RuntimeError` immediately on corrupt databases.
40
+
41
+ ### Improved
42
+
43
+ - **Usearch batch operations** — `add()`, `remove()`, and `get()` now use batch usearch APIs instead of per-key loops. Significant speedup for large operations.
44
+ - **Filtered search iterative deepening** — replaces fixed `k*3` overfetch with adaptive doubling (up to `k*30`). Highly selective filters now reliably return `k` results.
45
+ - **Memory-map heuristic** — uses file size threshold (50MB) instead of inaccurate `file_size // 100` vector count estimate for mmap vs load decision.
46
+ - **Apple chip detection** — uses `platform.processor()` instead of spawning a `sysctl` subprocess.
47
+
48
+ ### Removed
49
+
50
+ - **Duplicate `_dim` property** — removed in favor of the public `dim` property.
51
+
52
+ ### Breaking Changes
53
+
54
+ - String metadata filters now use exact equality (was substring match).
55
+ - `store_embeddings` now defaults to `False` — `rebuild_index()` only works on collections created with `store_embeddings=True`, or after re-adding documents with it enabled.
56
+
8
57
  ## [2.4.0] - 2026-03-22
9
58
 
10
59
  ### Added
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: simplevecdb
3
- Version: 2.4.0
3
+ Version: 2.5.0
4
4
  Summary: Dead-simple local vector database powered by usearch HNSW.
5
5
  Author-email: Dayton Dunbar <coderdayton14@gmail.com>
6
6
  License: MIT
@@ -169,10 +169,13 @@ hybrid = collection.hybrid_search("powerhouse cell", k=2)
169
169
  **Optional: Run embeddings server (OpenAI-compatible)**
170
170
 
171
171
  ```bash
172
- simplevecdb-server --port 8000
172
+ simplevecdb-server --port 8000 # Default model, auto warm-up
173
+ simplevecdb-server --host 0.0.0.0 --port 9000 # Bind to all interfaces
174
+ simplevecdb-server --no-warmup # Skip model preload on startup
175
+ simplevecdb-server --help # Show all options
173
176
  ```
174
177
 
175
- See [Setup Guide](ENV_SETUP.md) for configuration: model registry, rate limits, API keys, CUDA optimization.
178
+ See [Setup Guide](ENV_SETUP.md) for configuration: model registry, rate limits, API keys, CORS, CUDA optimization.
176
179
 
177
180
  ### Option 3: With LangChain or LlamaIndex
178
181
 
@@ -331,6 +334,10 @@ docs = collection.get_documents(filter_dict={"category": "tech"})
331
334
  for doc_id, text, metadata in docs:
332
335
  print(f"[{doc_id}] {text[:50]}...")
333
336
 
337
+ # Paginated access (v2.5+)
338
+ page1 = collection.get_documents(limit=100)
339
+ page2 = collection.get_documents(limit=100, offset=100)
340
+
334
341
  # Fetch stored embeddings
335
342
  embeddings = collection.get_embeddings_by_ids([1, 2, 3])
336
343
 
@@ -342,6 +349,9 @@ collection.update_metadata([
342
349
 
343
350
  # Quick stats
344
351
  print(f"Collection has {collection.count()} documents, dim={collection.dim}")
352
+
353
+ # Delete an entire collection (v2.5+)
354
+ db.delete_collection("old_data")
345
355
  ```
346
356
 
347
357
  ### Vector Clustering (v2.2+)
@@ -384,6 +394,10 @@ Supports K-means, MiniBatch K-means, and HDBSCAN. See [Clustering Guide](https:/
384
394
  | **Cluster Persistence** | ✅ | Save/load cluster centroids for fast assignment (v2.2+) |
385
395
  | **Public Catalog API** | ✅ | `get_documents`, `get_embeddings_by_ids`, `update_metadata` (v2.4+) |
386
396
  | **Executor Injection** | ✅ | Share thread pool across async instances for ONNX safety (v2.4+) |
397
+ | **Collection Management** | ✅ | `delete_collection()`, paginated `get_documents(limit=, offset=)` (v2.5+) |
398
+ | **Cross-Process Safety** | ✅ | Advisory file locking on usearch index files (v2.5+) |
399
+ | **FLOAT16 Quantization** | ✅ | Half-precision storage with 2x compression (v2.5+) |
400
+ | **Embeddings Server** | ✅ | CORS, graceful shutdown, input validation, model warm-up (v2.5+) |
387
401
 
388
402
  ## Performance Benchmarks
389
403
 
@@ -456,6 +470,9 @@ pip install torch --index-url https://download.pytorch.org/whl/cu118
456
470
  - [x] Vector clustering and auto-tagging (v2.2)
457
471
  - [x] Public catalog API for document management (v2.4)
458
472
  - [x] Async executor injection for thread-safe sharing (v2.4)
473
+ - [x] Collection management: `delete_collection()`, pagination (v2.5)
474
+ - [x] Cross-process file locking and connection health checks (v2.5)
475
+ - [x] Embeddings server hardening: CORS, graceful shutdown, input validation (v2.5)
459
476
  - [ ] Incremental clustering (online learning)
460
477
  - [ ] Cluster visualization exports
461
478
 
@@ -140,10 +140,13 @@ hybrid = collection.hybrid_search("powerhouse cell", k=2)
140
140
  **Optional: Run embeddings server (OpenAI-compatible)**
141
141
 
142
142
  ```bash
143
- simplevecdb-server --port 8000
143
+ simplevecdb-server --port 8000 # Default model, auto warm-up
144
+ simplevecdb-server --host 0.0.0.0 --port 9000 # Bind to all interfaces
145
+ simplevecdb-server --no-warmup # Skip model preload on startup
146
+ simplevecdb-server --help # Show all options
144
147
  ```
145
148
 
146
- See [Setup Guide](ENV_SETUP.md) for configuration: model registry, rate limits, API keys, CUDA optimization.
149
+ See [Setup Guide](ENV_SETUP.md) for configuration: model registry, rate limits, API keys, CORS, CUDA optimization.
147
150
 
148
151
  ### Option 3: With LangChain or LlamaIndex
149
152
 
@@ -302,6 +305,10 @@ docs = collection.get_documents(filter_dict={"category": "tech"})
302
305
  for doc_id, text, metadata in docs:
303
306
  print(f"[{doc_id}] {text[:50]}...")
304
307
 
308
+ # Paginated access (v2.5+)
309
+ page1 = collection.get_documents(limit=100)
310
+ page2 = collection.get_documents(limit=100, offset=100)
311
+
305
312
  # Fetch stored embeddings
306
313
  embeddings = collection.get_embeddings_by_ids([1, 2, 3])
307
314
 
@@ -313,6 +320,9 @@ collection.update_metadata([
313
320
 
314
321
  # Quick stats
315
322
  print(f"Collection has {collection.count()} documents, dim={collection.dim}")
323
+
324
+ # Delete an entire collection (v2.5+)
325
+ db.delete_collection("old_data")
316
326
  ```
317
327
 
318
328
  ### Vector Clustering (v2.2+)
@@ -355,6 +365,10 @@ Supports K-means, MiniBatch K-means, and HDBSCAN. See [Clustering Guide](https:/
355
365
  | **Cluster Persistence** | ✅ | Save/load cluster centroids for fast assignment (v2.2+) |
356
366
  | **Public Catalog API** | ✅ | `get_documents`, `get_embeddings_by_ids`, `update_metadata` (v2.4+) |
357
367
  | **Executor Injection** | ✅ | Share thread pool across async instances for ONNX safety (v2.4+) |
368
+ | **Collection Management** | ✅ | `delete_collection()`, paginated `get_documents(limit=, offset=)` (v2.5+) |
369
+ | **Cross-Process Safety** | ✅ | Advisory file locking on usearch index files (v2.5+) |
370
+ | **FLOAT16 Quantization** | ✅ | Half-precision storage with 2x compression (v2.5+) |
371
+ | **Embeddings Server** | ✅ | CORS, graceful shutdown, input validation, model warm-up (v2.5+) |
358
372
 
359
373
  ## Performance Benchmarks
360
374
 
@@ -427,6 +441,9 @@ pip install torch --index-url https://download.pytorch.org/whl/cu118
427
441
  - [x] Vector clustering and auto-tagging (v2.2)
428
442
  - [x] Public catalog API for document management (v2.4)
429
443
  - [x] Async executor injection for thread-safe sharing (v2.4)
444
+ - [x] Collection management: `delete_collection()`, pagination (v2.5)
445
+ - [x] Cross-process file locking and connection health checks (v2.5)
446
+ - [x] Embeddings server hardening: CORS, graceful shutdown, input validation (v2.5)
430
447
  - [ ] Incremental clustering (online learning)
431
448
  - [ ] Cluster visualization exports
432
449
 
@@ -1,6 +1,6 @@
1
1
  [project]
2
2
  name = "simplevecdb"
3
- version = "2.4.0"
3
+ version = "2.5.0"
4
4
  description = "Dead-simple local vector database powered by usearch HNSW."
5
5
  authors = [{ name = "Dayton Dunbar", email = "coderdayton14@gmail.com" }]
6
6
  license = { text = "MIT" }
@@ -16,7 +16,13 @@ try:
16
16
  except ImportError:
17
17
  pass
18
18
  from .logging import get_logger, configure_logging, log_operation
19
- from .utils import DatabaseLockedError, retry_on_lock, validate_filter
19
+ from .utils import (
20
+ DatabaseLockedError,
21
+ async_retry_on_lock,
22
+ file_lock,
23
+ retry_on_lock,
24
+ validate_filter,
25
+ )
20
26
  from .encryption import EncryptionError, EncryptionUnavailableError
21
27
 
22
28
  from importlib.metadata import version as _pkg_version
@@ -49,6 +55,8 @@ __all__ = [
49
55
  "MigrationRequiredError",
50
56
  "EncryptionError",
51
57
  "EncryptionUnavailableError",
58
+ "async_retry_on_lock",
59
+ "file_lock",
52
60
  "retry_on_lock",
53
61
  "validate_filter",
54
62
  ]
@@ -60,6 +60,9 @@ class AsyncVectorCollection:
60
60
  """Collection name."""
61
61
  return self._collection.name
62
62
 
63
+ def __repr__(self) -> str:
64
+ return f"AsyncVectorCollection(name={self._collection.name!r})"
65
+
63
66
  async def add_texts(
64
67
  self,
65
68
  texts: Sequence[str],
@@ -210,15 +213,20 @@ class AsyncVectorCollection:
210
213
  async def get_documents(
211
214
  self,
212
215
  filter_dict: dict[str, Any] | None = None,
216
+ *,
217
+ limit: int | None = None,
218
+ offset: int | None = None,
213
219
  ) -> list[tuple[int, str, dict[str, Any]]]:
214
- """Get all documents with text content and metadata.
220
+ """Get documents with text content and metadata.
215
221
 
216
222
  See VectorCollection.get_documents for full documentation.
217
223
  """
218
224
  loop = asyncio.get_running_loop()
219
225
  return await loop.run_in_executor(
220
226
  self._executor,
221
- lambda: self._collection.get_documents(filter_dict=filter_dict),
227
+ lambda: self._collection.get_documents(
228
+ filter_dict=filter_dict, limit=limit, offset=offset
229
+ ),
222
230
  )
223
231
 
224
232
  async def get_embeddings_by_ids(self, ids: Sequence[int]) -> dict[int, Any]:
@@ -599,9 +607,21 @@ class AsyncVectorDB:
599
607
  return self._collections[cache_key]
600
608
 
601
609
  def list_collections(self) -> list[str]:
602
- """Return names of all initialized collections."""
610
+ """Return names of all persisted collections in the database."""
603
611
  return self._db.list_collections()
604
612
 
613
+ async def delete_collection(self, name: str) -> None:
614
+ """Delete a collection and all its data."""
615
+ loop = asyncio.get_running_loop()
616
+ await loop.run_in_executor(
617
+ self._executor, lambda: self._db.delete_collection(name)
618
+ )
619
+ # Evict from async-level cache too
620
+ with self._collections_lock:
621
+ keys_to_remove = [k for k in self._collections if k[0] == name]
622
+ for k in keys_to_remove:
623
+ del self._collections[k]
624
+
605
625
  async def search_collections(
606
626
  self,
607
627
  query: Sequence[float],
@@ -644,6 +664,9 @@ class AsyncVectorDB:
644
664
  self._executor, lambda: self._db.vacuum(checkpoint_wal)
645
665
  )
646
666
 
667
+ def __repr__(self) -> str:
668
+ return f"AsyncVectorDB(path={self._db.path!r})"
669
+
647
670
  async def close(self) -> None:
648
671
  """Close the database connection and shutdown executor."""
649
672
  try:
@@ -74,7 +74,8 @@ USEARCH_BRUTEFORCE_THRESHOLD = 10000
74
74
  # - Instant startup (no full load into RAM)
75
75
  # - Lower memory footprint (OS manages page cache)
76
76
  # - Slight latency increase for cold pages (acceptable trade-off)
77
- USEARCH_MMAP_THRESHOLD = 100000
77
+ # Threshold in bytes — 50MB covers ~30k 384-dim f32 vectors.
78
+ USEARCH_MMAP_THRESHOLD = 50 * 1024 * 1024 # 50 MB
78
79
 
79
80
  # Batch search threshold: auto-batch queries when > this count
80
81
  # usearch batch search provides ~10x throughput for multi-query workloads
@@ -178,6 +178,7 @@ class VectorCollection:
178
178
  distance_strategy: DistanceStrategy,
179
179
  quantization: Quantization,
180
180
  encryption_key: str | bytes | None = None,
181
+ store_embeddings: bool = False,
181
182
  ):
182
183
  self.conn = conn
183
184
  self._db_path = db_path
@@ -186,6 +187,7 @@ class VectorCollection:
186
187
  self.quantization = quantization
187
188
  self._quantizer = QuantizationStrategy(quantization)
188
189
  self._encryption_key = encryption_key
190
+ self._store_embeddings = store_embeddings
189
191
 
190
192
  # Sanitize name to prevent issues
191
193
  if not re.match(constants.COLLECTION_NAME_PATTERN, name):
@@ -397,12 +399,12 @@ class VectorCollection:
397
399
  batch_ids = ids[batch_start:batch_end] if ids else None
398
400
  batch_parent_ids = parent_ids[batch_start:batch_end] if parent_ids else None
399
401
 
400
- # Add to SQLite metadata store (with embeddings for MMR support)
402
+ # Add to SQLite metadata store
401
403
  doc_ids = self._catalog.add_documents(
402
404
  batch_texts,
403
405
  list(batch_metas),
404
406
  batch_ids,
405
- embeddings=batch_embeds,
407
+ embeddings=batch_embeds if self._store_embeddings else None,
406
408
  parent_ids=batch_parent_ids,
407
409
  )
408
410
 
@@ -748,12 +750,13 @@ class VectorCollection:
748
750
  if not ids_list:
749
751
  return
750
752
 
751
- # Delete from usearch
752
- self._index.remove(ids_list)
753
-
754
- # Delete from SQLite
753
+ # Delete from SQLite first (transactional, can rollback on failure)
755
754
  self._catalog.delete_by_ids(ids_list)
756
755
 
756
+ # Then remove from usearch (if this fails, catalog is clean and
757
+ # rebuild_index() can recover the index from stored data)
758
+ self._index.remove(ids_list)
759
+
757
760
  def remove_texts(
758
761
  self,
759
762
  texts: Sequence[str] | None = None,
@@ -846,6 +849,13 @@ class VectorCollection:
846
849
  # Fetch embeddings from SQLite
847
850
  embeddings_map = self._catalog.get_embeddings_by_ids(all_ids)
848
851
 
852
+ if not embeddings_map and not self._store_embeddings:
853
+ raise RuntimeError(
854
+ "Cannot rebuild index: no embeddings stored in SQLite. "
855
+ "Create the collection with store_embeddings=True to enable "
856
+ "rebuild_index(), or re-add documents with store_embeddings=True."
857
+ )
858
+
849
859
  # Filter to only docs with embeddings
850
860
  valid_pairs = [
851
861
  (doc_id, emb)
@@ -1260,7 +1270,7 @@ class VectorCollection:
1260
1270
 
1261
1271
  centroids = None
1262
1272
  if centroids_bytes is not None:
1263
- dim = self._dim
1273
+ dim = self.dim
1264
1274
  if dim:
1265
1275
  centroids = np.frombuffer(centroids_bytes, dtype=np.float32).reshape(
1266
1276
  n_clusters, dim
@@ -1353,19 +1363,26 @@ class VectorCollection:
1353
1363
  def get_documents(
1354
1364
  self,
1355
1365
  filter_dict: dict[str, Any] | None = None,
1366
+ *,
1367
+ limit: int | None = None,
1368
+ offset: int | None = None,
1356
1369
  ) -> list[tuple[int, str, dict[str, Any]]]:
1357
- """Get all documents with text content and metadata.
1370
+ """Get documents with text content and metadata.
1358
1371
 
1359
1372
  Args:
1360
1373
  filter_dict: Optional metadata filter to narrow results.
1374
+ limit: Maximum number of documents to return (None = all).
1375
+ offset: Number of documents to skip (None = 0).
1361
1376
 
1362
1377
  Returns:
1363
- List of (doc_id, text, metadata) tuples.
1378
+ List of (doc_id, text, metadata) tuples, ordered by ID.
1364
1379
  """
1365
1380
  filter_builder = self._catalog.build_filter_clause if filter_dict else None
1366
1381
  return self._catalog.get_all_docs_with_text(
1367
1382
  filter_dict=filter_dict,
1368
1383
  filter_builder=filter_builder,
1384
+ limit=limit,
1385
+ offset=offset,
1369
1386
  )
1370
1387
 
1371
1388
  def get_embeddings_by_ids(self, ids: Sequence[int]) -> dict[int, Any]:
@@ -1395,10 +1412,11 @@ class VectorCollection:
1395
1412
  """Vector dimension (None if no vectors added yet)."""
1396
1413
  return self._index.ndim
1397
1414
 
1398
- @property
1399
- def _dim(self) -> int | None:
1400
- """Vector dimension (None if no vectors added yet)."""
1401
- return self._index.ndim
1415
+ def __repr__(self) -> str:
1416
+ return (
1417
+ f"VectorCollection(name={self.name!r}, dim={self.dim}, "
1418
+ f"size={self.count()}, distance={self.distance_strategy.value})"
1419
+ )
1402
1420
 
1403
1421
 
1404
1422
  class VectorDB:
@@ -1452,7 +1470,7 @@ class VectorDB:
1452
1470
  self.quantization = quantization
1453
1471
  self.auto_migrate = auto_migrate
1454
1472
  self._encryption_key = encryption_key
1455
- self._collections: dict[str, VectorCollection] = {}
1473
+ self._collections: dict[tuple, VectorCollection] = {}
1456
1474
 
1457
1475
  # Create connection (encrypted or plain)
1458
1476
  if encryption_key is not None:
@@ -1467,6 +1485,8 @@ class VectorDB:
1467
1485
  check_same_thread=False,
1468
1486
  timeout=30.0,
1469
1487
  )
1488
+ self.conn.execute("PRAGMA journal_mode=WAL")
1489
+ self.conn.execute("PRAGMA synchronous=NORMAL")
1470
1490
  self._encrypted = True
1471
1491
  _logger.info("Opened encrypted database: %s", self.path)
1472
1492
  else:
@@ -1477,6 +1497,13 @@ class VectorDB:
1477
1497
  self.conn.execute("PRAGMA synchronous=NORMAL")
1478
1498
  self._encrypted = False
1479
1499
 
1500
+ # Verify connection is healthy
1501
+ try:
1502
+ self.conn.execute("SELECT 1")
1503
+ except sqlite3.DatabaseError as e:
1504
+ self.conn.close()
1505
+ raise RuntimeError(f"Database health check failed: {e}") from e
1506
+
1480
1507
  # Check for required migration before allowing collection access
1481
1508
  if not auto_migrate and self.path != ":memory:":
1482
1509
  migration_info = VectorDB.check_migration(self.path)
@@ -1491,22 +1518,103 @@ class VectorDB:
1491
1518
 
1492
1519
  def list_collections(self) -> list[str]:
1493
1520
  """
1494
- Return names of all initialized collections.
1521
+ Return names of all persisted collections in the database.
1495
1522
 
1496
- Only returns collections that have been accessed via `collection()` in this
1497
- session. Does not scan the database for collections created in previous sessions.
1523
+ Scans the database schema for collection tables, returning both
1524
+ collections accessed this session and those created in previous sessions.
1498
1525
 
1499
1526
  Returns:
1500
- List of collection names currently cached in this VectorDB instance.
1527
+ Sorted list of collection names stored in this database.
1501
1528
 
1502
1529
  Example:
1503
1530
  >>> db = VectorDB("app.db")
1504
1531
  >>> db.collection("users")
1505
- >>> db.collection("products")
1506
- >>> db.list_collections()
1507
- ['users', 'products']
1532
+ >>> db.close()
1533
+ >>> db2 = VectorDB("app.db")
1534
+ >>> db2.list_collections()
1535
+ ['users']
1536
+ """
1537
+ rows = self.conn.execute(
1538
+ "SELECT name FROM sqlite_master WHERE type='table' "
1539
+ "AND (name = 'tinyvec_items' OR name LIKE 'items_%')"
1540
+ ).fetchall()
1541
+ # Collect all table names, then filter out FTS/cluster derivatives.
1542
+ # FTS5 creates shadow tables: items_<name>_fts, items_<name>_fts_data,
1543
+ # items_<name>_fts_idx, items_<name>_fts_content, items_<name>_fts_docsize,
1544
+ # items_<name>_fts_config. Cluster tables: items_<name>_clusters.
1545
+ # We identify derivatives by checking if a suffix is <coll>_fts*
1546
+ # or <coll>_clusters for some other known collection suffix.
1547
+ all_suffixes: set[str] = set()
1548
+ has_default = False
1549
+ for (table_name,) in rows:
1550
+ if table_name == "tinyvec_items":
1551
+ has_default = True
1552
+ elif table_name.startswith("items_"):
1553
+ all_suffixes.add(table_name[6:])
1554
+
1555
+ # A suffix is a real collection if no other suffix is a prefix of it
1556
+ # followed by _fts* or _clusters.
1557
+ _fts_suffixes = ("_fts", "_fts_data", "_fts_idx", "_fts_content",
1558
+ "_fts_docsize", "_fts_config")
1559
+ derivative_suffixes: set[str] = set()
1560
+ for s in all_suffixes:
1561
+ for fts in _fts_suffixes:
1562
+ derivative_suffixes.add(f"{s}{fts}")
1563
+ derivative_suffixes.add(f"{s}_clusters")
1564
+
1565
+ names: list[str] = []
1566
+ if has_default:
1567
+ names.append("default")
1568
+ for s in sorted(all_suffixes - derivative_suffixes):
1569
+ names.append(s)
1570
+ return names
1571
+
1572
+ def delete_collection(self, name: str) -> None:
1508
1573
  """
1509
- return list(self._collections.keys())
1574
+ Delete a collection and all its data.
1575
+
1576
+ Drops the SQLite tables (items, FTS, clusters) and deletes
1577
+ the usearch index file. Removes the collection from the cache.
1578
+
1579
+ Args:
1580
+ name: Collection name to delete.
1581
+
1582
+ Raises:
1583
+ ValueError: If the collection name is invalid.
1584
+ KeyError: If the collection does not exist.
1585
+ """
1586
+ if not re.match(constants.COLLECTION_NAME_PATTERN, name):
1587
+ raise ValueError(
1588
+ f"Invalid collection name '{name}'. Must be alphanumeric + underscores."
1589
+ )
1590
+ if name not in self.list_collections():
1591
+ raise KeyError(f"Collection '{name}' does not exist.")
1592
+
1593
+ table_name = "tinyvec_items" if name == "default" else f"items_{name}"
1594
+ fts_table = f"{table_name}_fts"
1595
+ cluster_table = f"{table_name}_clusters"
1596
+
1597
+ # Drop SQLite tables
1598
+ self.conn.execute(f"DROP TABLE IF EXISTS {fts_table}")
1599
+ self.conn.execute(f"DROP TABLE IF EXISTS {cluster_table}")
1600
+ self.conn.execute(f"DROP TABLE IF EXISTS {table_name}")
1601
+ self.conn.commit()
1602
+
1603
+ # Delete usearch index file (and encrypted variant if present)
1604
+ if self.path != ":memory:":
1605
+ index_path = Path(self.path + f".{name}.usearch")
1606
+ if index_path.exists():
1607
+ index_path.unlink()
1608
+ encrypted_path = Path(str(index_path) + ".enc")
1609
+ if encrypted_path.exists():
1610
+ encrypted_path.unlink()
1611
+
1612
+ # Remove from cache (match any tuple key with this name)
1613
+ keys_to_remove = [k for k in self._collections if k[0] == name]
1614
+ for k in keys_to_remove:
1615
+ del self._collections[k]
1616
+
1617
+ _logger.info("Deleted collection: %s", name)
1510
1618
 
1511
1619
  def search_collections(
1512
1620
  self,
@@ -1562,14 +1670,28 @@ class VectorDB:
1562
1670
  # Resolve and validate collections
1563
1671
  targets: list[VectorCollection] = []
1564
1672
  dims: set[int | None] = set()
1673
+ # Validate explicit collection names exist in DB
1674
+ if collections is not None:
1675
+ persisted = set(self.list_collections())
1676
+ for name in target_names:
1677
+ if name not in persisted:
1678
+ # Check cache too (collection may exist but not yet persisted)
1679
+ if not any(k[0] == name for k in self._collections):
1680
+ raise KeyError(
1681
+ f"Collection '{name}' not initialized. "
1682
+ f"Call db.collection('{name}') first."
1683
+ )
1684
+
1565
1685
  for name in target_names:
1566
- if name not in self._collections:
1567
- raise KeyError(
1568
- f"Collection '{name}' not initialized. Call db.collection('{name}') first."
1569
- )
1570
- coll = self._collections[name]
1686
+ # Find cached collection by name (may have any strategy/quantization)
1687
+ matched = [v for k, v in self._collections.items() if k[0] == name]
1688
+ if matched:
1689
+ coll = matched[0]
1690
+ else:
1691
+ # Auto-initialize with defaults for persisted but uncached collections
1692
+ coll = self.collection(name)
1571
1693
  targets.append(coll)
1572
- dims.add(coll._dim)
1694
+ dims.add(coll.dim)
1573
1695
 
1574
1696
  # Check dimension consistency (ignore None for empty collections)
1575
1697
  dims.discard(None)
@@ -1629,6 +1751,7 @@ class VectorDB:
1629
1751
  name: str = "default",
1630
1752
  distance_strategy: DistanceStrategy | None = None,
1631
1753
  quantization: Quantization | None = None,
1754
+ store_embeddings: bool = False,
1632
1755
  ) -> VectorCollection:
1633
1756
  """
1634
1757
  Get or create a named collection.
@@ -1640,6 +1763,9 @@ class VectorDB:
1640
1763
  name: Collection name (alphanumeric + underscore only).
1641
1764
  distance_strategy: Override database-level distance metric.
1642
1765
  quantization: Override database-level quantization.
1766
+ store_embeddings: If True, store embeddings as BLOBs in SQLite
1767
+ alongside the usearch index. Required for rebuild_index().
1768
+ Default False to save ~2x storage.
1643
1769
 
1644
1770
  Returns:
1645
1771
  VectorCollection instance.
@@ -1647,7 +1773,7 @@ class VectorDB:
1647
1773
  Raises:
1648
1774
  ValueError: If collection name contains invalid characters.
1649
1775
  """
1650
- cache_key = name
1776
+ cache_key = (name, distance_strategy, quantization, store_embeddings)
1651
1777
  if cache_key not in self._collections:
1652
1778
  self._collections[cache_key] = VectorCollection(
1653
1779
  conn=self.conn,
@@ -1656,6 +1782,7 @@ class VectorDB:
1656
1782
  distance_strategy=distance_strategy or self.distance_strategy,
1657
1783
  quantization=quantization or self.quantization,
1658
1784
  encryption_key=self._encryption_key,
1785
+ store_embeddings=store_embeddings,
1659
1786
  )
1660
1787
  return self._collections[cache_key]
1661
1788
 
@@ -1844,6 +1971,9 @@ MIGRATION ROLLBACK INSTRUCTIONS:
1844
1971
  for collection in self._collections.values():
1845
1972
  collection.save()
1846
1973
 
1974
+ def __repr__(self) -> str:
1975
+ return f"VectorDB(path={self.path!r}, collections={self.list_collections()})"
1976
+
1847
1977
  def close(self) -> None:
1848
1978
  """Close the database connection and save indexes."""
1849
1979
  if getattr(self, "_closed", False):
@@ -0,0 +1,12 @@
1
+ """Embeddings module — local embedding models and OpenAI-compatible server."""
2
+
3
+ from .models import embed_texts, get_embedder, load_model
4
+ from .server import app, run_server
5
+
6
+ __all__ = [
7
+ "app",
8
+ "embed_texts",
9
+ "get_embedder",
10
+ "load_model",
11
+ "run_server",
12
+ ]