PyPI - simplevecdb - Versions diffs - 2.0.0__tar.gz → 2.1.0__tar.gz - Mend

simplevecdb 2.0.0__tar.gz → 2.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (104)

simplevecdb-2.1.0/.github/ISSUE_TEMPLATE/bug_report.yml ADDED

@@ -0,0 +1,82 @@
+name: Bug Report
+description: Report a bug in SimpleVecDB
+labels: ["bug"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thanks for taking the time to report a bug. Please fill out the sections below.
+  - type: textarea
+    id: description
+    attributes:
+      label: Describe the bug
+      description: A clear and concise description of what the bug is.
+    validations:
+      required: true
+  - type: textarea
+    id: reproduction
+    attributes:
+      label: Reproduction steps
+      description: Minimal code to reproduce the issue.
+      placeholder: |
+        ```python
+        from simplevecdb import VectorDB
+        db = VectorDB(":memory:")
+        # ... code that triggers bug
+        ```
+    validations:
+      required: true
+  - type: textarea
+    id: expected
+    attributes:
+      label: Expected behavior
+      description: What you expected to happen.
+    validations:
+      required: true
+  - type: textarea
+    id: actual
+    attributes:
+      label: Actual behavior
+      description: What actually happened. Include error messages if applicable.
+    validations:
+      required: true
+  - type: input
+    id: version
+    attributes:
+      label: SimpleVecDB version
+      placeholder: "2.0.0"
+    validations:
+      required: true
+  - type: input
+    id: python-version
+    attributes:
+      label: Python version
+      placeholder: "3.11"
+    validations:
+      required: true
+  - type: dropdown
+    id: os
+    attributes:
+      label: Operating System
+      options:
+        - Linux
+        - macOS
+        - Windows
+        - Other
+    validations:
+      required: true
+  - type: textarea
+    id: additional
+    attributes:
+      label: Additional context
+      description: Any other context about the problem (logs, screenshots, etc.)
+    validations:
+      required: false

simplevecdb-2.1.0/.github/ISSUE_TEMPLATE/config.yml ADDED

@@ -0,0 +1,8 @@
+blank_issues_enabled: false
+contact_links:
+  - name: Documentation
+    url: https://simplevecdb.dev
+    about: Check the docs before opening an issue
+  - name: Discussions
+    url: https://github.com/coderdayton/simplevecdb/discussions
+    about: Ask questions and share ideas

simplevecdb-2.1.0/.github/ISSUE_TEMPLATE/feature_request.yml ADDED

@@ -0,0 +1,58 @@
+name: Feature Request
+description: Suggest a new feature or enhancement
+labels: ["enhancement"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thanks for suggesting a feature! Please describe what you'd like to see.
+  - type: textarea
+    id: problem
+    attributes:
+      label: Problem or motivation
+      description: What problem does this feature solve? Why do you need it?
+      placeholder: "I'm trying to do X but currently have to..."
+    validations:
+      required: true
+  - type: textarea
+    id: solution
+    attributes:
+      label: Proposed solution
+      description: How would you like this to work?
+      placeholder: |
+        ```python
+        # Example API usage
+        db = VectorDB("my.db")
+        db.new_feature(...)
+        ```
+    validations:
+      required: true
+  - type: textarea
+    id: alternatives
+    attributes:
+      label: Alternatives considered
+      description: Any alternative solutions or workarounds you've tried.
+    validations:
+      required: false
+  - type: dropdown
+    id: scope
+    attributes:
+      label: Scope
+      description: How big is this change?
+      options:
+        - Small (docs, minor tweak)
+        - Medium (new method, config option)
+        - Large (new module, breaking change)
+    validations:
+      required: true
+  - type: checkboxes
+    id: contribution
+    attributes:
+      label: Contribution
+      options:
+        - label: I'm willing to submit a PR for this feature

{simplevecdb-2.0.0 → simplevecdb-2.1.0}/.gitignore RENAMED

@@ -21,6 +21,10 @@ build/
 *.db
 *.sqlite
+# OpenCode
+.opencode/
+opencode.json
 # Project specific
 simplevecdb_plan.md
 AGENTS.md

{simplevecdb-2.0.0/docs → simplevecdb-2.1.0}/CHANGELOG.md RENAMED

@@ -5,6 +5,49 @@ All notable changes to SimpleVecDB will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [2.1.0] - 2026-01-01
+### Added
+- **SQLCipher Encryption Support** - Full at-rest encryption for sensitive data:
+  - `VectorDB(path, encryption_key="...")` enables AES-256 page-level database encryption
+  - Uses SQLCipher for transparent SQLite encryption (PRAGMA key)
+  - Usearch index files encrypted with AES-256-GCM (`.usearch.enc`)
+  - Zero performance overhead during search (decrypt on load, encrypt on save only)
+  - Key derivation: PBKDF2-SHA256 with 480,000 iterations for passphrases
+  - Install with `pip install simplevecdb[encryption]`
+- **New encryption module** (`simplevecdb.encryption`):
+  - `create_encrypted_connection()` - SQLCipher connection factory
+  - `is_database_encrypted()` - Check if a database file is encrypted
+  - `encrypt_index_file()` / `decrypt_index_file()` - Index file encryption
+  - `EncryptionError` / `EncryptionUnavailableError` - New exception types
+- **Streaming Insert API** - Memory-efficient large-scale ingestion:
+  - `collection.add_texts_streaming(iterable)` - Process from any iterator/generator
+  - Configurable `batch_size` parameter (default: config.EMBEDDING_BATCH_SIZE)
+  - Yields `StreamingProgress` after each batch for monitoring
+  - Optional `on_progress` callback for custom logging/UI updates
+  - New types: `StreamingProgress`, `ProgressCallback`
+- **Hierarchical Document Relationships** - Parent/child document structure:
+  - `parent_ids` parameter in `add_texts()` to link documents
+  - `get_children(doc_id)` - Get direct child documents
+  - `get_parent(doc_id)` - Get parent document
+  - `get_descendants(doc_id, max_depth)` - Recursive children traversal
+  - `get_ancestors(doc_id, max_depth)` - Path to root
+  - `set_parent(doc_id, parent_id)` - Update relationships
+  - Uses SQLite recursive CTE for efficient traversal
+  - Auto-migrates existing databases (adds `parent_id` column)
+### Changed
+- `check_migration()` now gracefully handles encrypted databases (returns `needs_migration=False`)
+### Dependencies
+- New optional dependency group `[encryption]`: `sqlcipher3-binary>=0.5.0`, `cryptography>=41.0`
 ## [2.0.0] - 2025-12-23
 ### Breaking Changes
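
Note on the 2.1.0 changelog entry above: it states that passphrases are stretched with PBKDF2-SHA256 at 480,000 iterations. The diff shown here does not include the `simplevecdb.encryption` source, so the following is only a minimal standard-library sketch of that kind of derivation, not the package's implementation. The helper name `derive_key`, the 16-byte salt, and the way the salt is generated are assumptions made for illustration.

```python
import hashlib
import os

PBKDF2_ITERATIONS = 480_000  # iteration count stated in the changelog
KEY_LENGTH = 32              # 32 bytes = 256 bits, the key size AES-256 expects
SALT_LENGTH = 16             # assumption: the salt size is not given in the diff


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Hypothetical helper: stretch a passphrase into a 256-bit key."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        salt,
        PBKDF2_ITERATIONS,
        dklen=KEY_LENGTH,
    )


# A random salt must be stored alongside the ciphertext so the same key
# can be derived again when the database is reopened.
salt = os.urandom(SALT_LENGTH)
key = derive_key("your-secret-key", salt)
```

The same passphrase and salt always yield the same key, which is why reopening an encrypted database requires the original passphrase.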

simplevecdb-2.0.0/README.md → simplevecdb-2.1.0/PKG-INFO RENAMED

@@ -1,3 +1,26 @@
+Metadata-Version: 2.4
+Name: simplevecdb
+Version: 2.1.0
+Summary: Dead-simple local vector database powered by usearch HNSW.
+Author-email: Dayton Dunbar <coderdayton14@gmail.com>
+License: MIT
+License-File: LICENSE
+Requires-Python: >=3.10
+Requires-Dist: numpy>=2.0
+Requires-Dist: psutil>=5.9.0
+Requires-Dist: python-dotenv>=1.2.1
+Requires-Dist: usearch>=2.12
+Provides-Extra: encryption
+Requires-Dist: cryptography>=41.0; extra == 'encryption'
+Requires-Dist: sqlcipher3-binary>=0.5.0; extra == 'encryption'
+Provides-Extra: examples
+Requires-Dist: ollama; extra == 'examples'
+Provides-Extra: server
+Requires-Dist: fastapi>=0.115; extra == 'server'
+Requires-Dist: sentence-transformers>=5.0; extra == 'server'
+Requires-Dist: uvicorn[standard]>=0.30; extra == 'server'
+Description-Content-Type: text/markdown
 # SimpleVecDB
 [![CI](https://github.com/coderdayton/simplevecdb/actions/workflows/ci.yml/badge.svg)](https://github.com/coderdayton/simplevecdb/actions)
@@ -51,6 +74,9 @@ pip install simplevecdb
 # With local embeddings server + HuggingFace models (500MB+)
 pip install "simplevecdb[server]"
+# With encryption support (SQLCipher)
+pip install "simplevecdb[encryption]"
 ```
 **Verify Installation:**
@@ -222,6 +248,63 @@ results = collection.similarity_search(
 > **Tip:** LangChain and LlamaIndex integrations support all search methods.
+### Encryption (v2.1+)
+Protect sensitive data with AES-256 at-rest encryption:
+```bash
+pip install "simplevecdb[encryption]"
+```
+```python
+from simplevecdb import VectorDB
+# Create encrypted database
+db = VectorDB("secure.db", encryption_key="your-secret-key")
+collection = db.collection("confidential")
+collection.add_texts(["sensitive data"], embeddings=[[0.1]*384])
+db.close()
+# Reopen requires same key
+db = VectorDB("secure.db", encryption_key="your-secret-key")
+```
+### Streaming Insert (v2.1+)
+Memory-efficient ingestion for large datasets:
+```python
+def load_documents():
+    for line in open("large_file.jsonl"):
+        doc = json.loads(line)
+        yield (doc["text"], doc.get("metadata"), doc.get("embedding"))
+for progress in collection.add_texts_streaming(load_documents(), batch_size=1000):
+    print(f"Processed {progress['docs_processed']} documents")
+```
+### Document Hierarchies (v2.1+)
+Organize documents in parent-child relationships:
+```python
+# Add parent document
+parent_ids = collection.add_texts(["Main document"], embeddings=[[0.1]*384])
+# Add children
+child_ids = collection.add_texts(
+    ["Chunk 1", "Chunk 2"],
+    embeddings=[[0.11]*384, [0.12]*384],
+    parent_ids=[parent_ids[0], parent_ids[0]]
+)
+# Navigate hierarchy
+children = collection.get_children(parent_ids[0])
+parent = collection.get_parent(child_ids[0])
+descendants = collection.get_descendants(parent_ids[0])
+```
 ## Feature Matrix
 | Feature                   | Status | Description                                                  |
@@ -238,7 +321,9 @@ results = collection.similarity_search(
 | **Framework Integration** | ✅     | LangChain \& LlamaIndex adapters                             |
 | **Hardware Acceleration** | ✅     | Auto-detects CUDA/MPS/CPU + SIMD via usearch                 |
 | **Local Embeddings**      | ✅     | HuggingFace models via `[server]` extras                     |
-| **Built-in Encryption**   | 🔜     | SQLCipher integration for at-rest encryption                 |
+| **Built-in Encryption**   | ✅     | SQLCipher AES-256 at-rest encryption via `[encryption]` extras |
+| **Streaming Insert**      | ✅     | Memory-efficient large-scale ingestion with progress callbacks |
+| **Document Hierarchies**  | ✅     | Parent/child relationships for chunked docs                  |
 ## Performance Benchmarks
@@ -304,9 +389,11 @@ pip install torch --index-url https://download.pytorch.org/whl/cu118
 - [x] Multi-collection support
 - [x] HNSW indexing (usearch backend)
 - [x] Adaptive search (brute-force/HNSW)
-- [ ] SQLCipher encryption (at-rest data protection)
-- [ ] Streaming insert API for large-scale ingestion
-- [ ] Graph-based metadata relationships
+- [x] SQLCipher encryption (at-rest data protection)
+- [x] Streaming insert API for large-scale ingestion
+- [x] Hierarchical document relationships (parent/child)
+- [ ] Cross-collection search
+- [ ] Vector clustering and auto-tagging
 Vote on features or propose new ones in [GitHub Discussions](https://github.com/coderdayton/simplevecdb/discussions).

simplevecdb-2.0.0/PKG-INFO → simplevecdb-2.1.0/README.md RENAMED

@@ -1,23 +1,3 @@
-Metadata-Version: 2.4
-Name: simplevecdb
-Version: 2.0.0
-Summary: Dead-simple local vector database powered by usearch HNSW.
-Author-email: Dayton Dunbar <coderdayton14@gmail.com>
-License: MIT
-License-File: LICENSE
-Requires-Python: >=3.10
-Requires-Dist: numpy>=2.0
-Requires-Dist: psutil>=5.9.0
-Requires-Dist: python-dotenv>=1.2.1
-Requires-Dist: usearch>=2.12
-Provides-Extra: examples
-Requires-Dist: ollama; extra == 'examples'
-Provides-Extra: server
-Requires-Dist: fastapi>=0.115; extra == 'server'
-Requires-Dist: sentence-transformers>=5.0; extra == 'server'
-Requires-Dist: uvicorn[standard]>=0.30; extra == 'server'
-Description-Content-Type: text/markdown
 # SimpleVecDB
 [![CI](https://github.com/coderdayton/simplevecdb/actions/workflows/ci.yml/badge.svg)](https://github.com/coderdayton/simplevecdb/actions)
@@ -71,6 +51,9 @@ pip install simplevecdb
 # With local embeddings server + HuggingFace models (500MB+)
 pip install "simplevecdb[server]"
+# With encryption support (SQLCipher)
+pip install "simplevecdb[encryption]"
 ```
 **Verify Installation:**
@@ -242,6 +225,63 @@ results = collection.similarity_search(
 > **Tip:** LangChain and LlamaIndex integrations support all search methods.
+### Encryption (v2.1+)
+Protect sensitive data with AES-256 at-rest encryption:
+```bash
+pip install "simplevecdb[encryption]"
+```
+```python
+from simplevecdb import VectorDB
+# Create encrypted database
+db = VectorDB("secure.db", encryption_key="your-secret-key")
+collection = db.collection("confidential")
+collection.add_texts(["sensitive data"], embeddings=[[0.1]*384])
+db.close()
+# Reopen requires same key
+db = VectorDB("secure.db", encryption_key="your-secret-key")
+```
+### Streaming Insert (v2.1+)
+Memory-efficient ingestion for large datasets:
+```python
+def load_documents():
+    for line in open("large_file.jsonl"):
+        doc = json.loads(line)
+        yield (doc["text"], doc.get("metadata"), doc.get("embedding"))
+for progress in collection.add_texts_streaming(load_documents(), batch_size=1000):
+    print(f"Processed {progress['docs_processed']} documents")
+```
+### Document Hierarchies (v2.1+)
+Organize documents in parent-child relationships:
+```python
+# Add parent document
+parent_ids = collection.add_texts(["Main document"], embeddings=[[0.1]*384])
+# Add children
+child_ids = collection.add_texts(
+    ["Chunk 1", "Chunk 2"],
+    embeddings=[[0.11]*384, [0.12]*384],
+    parent_ids=[parent_ids[0], parent_ids[0]]
+)
+# Navigate hierarchy
+children = collection.get_children(parent_ids[0])
+parent = collection.get_parent(child_ids[0])
+descendants = collection.get_descendants(parent_ids[0])
+```
 ## Feature Matrix
 | Feature                   | Status | Description                                                  |
@@ -258,7 +298,9 @@ results = collection.similarity_search(
 | **Framework Integration** | ✅     | LangChain \& LlamaIndex adapters                             |
 | **Hardware Acceleration** | ✅     | Auto-detects CUDA/MPS/CPU + SIMD via usearch                 |
 | **Local Embeddings**      | ✅     | HuggingFace models via `[server]` extras                     |
-| **Built-in Encryption**   | 🔜     | SQLCipher integration for at-rest encryption                 |
+| **Built-in Encryption**   | ✅     | SQLCipher AES-256 at-rest encryption via `[encryption]` extras |
+| **Streaming Insert**      | ✅     | Memory-efficient large-scale ingestion with progress callbacks |
+| **Document Hierarchies**  | ✅     | Parent/child relationships for chunked docs                  |
 ## Performance Benchmarks
@@ -324,9 +366,11 @@ pip install torch --index-url https://download.pytorch.org/whl/cu118
 - [x] Multi-collection support
 - [x] HNSW indexing (usearch backend)
 - [x] Adaptive search (brute-force/HNSW)
-- [ ] SQLCipher encryption (at-rest data protection)
-- [ ] Streaming insert API for large-scale ingestion
-- [ ] Graph-based metadata relationships
+- [x] SQLCipher encryption (at-rest data protection)
+- [x] Streaming insert API for large-scale ingestion
+- [x] Hierarchical document relationships (parent/child)
+- [ ] Cross-collection search
+- [ ] Vector clustering and auto-tagging
 Vote on features or propose new ones in [GitHub Discussions](https://github.com/coderdayton/simplevecdb/discussions).
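
Note on the streaming insert section in the README hunk above: the method consumes any iterator in fixed-size batches and yields a progress record after each batch. The implementation of `add_texts_streaming` is not part of the hunks shown, so the sketch below only illustrates the general batching pattern with the standard library. The function `stream_in_batches`, the `Progress` type, and the `sink` list (a stand-in for embedding and inserting a batch) are hypothetical; only the `batch_num` and `docs_processed` keys come from the documentation in this diff.

```python
from itertools import islice
from typing import Iterable, Iterator, TypedDict


class Progress(TypedDict):
    batch_num: int
    docs_processed: int


def stream_in_batches(
    items: Iterable[str], batch_size: int, sink: list[list[str]]
) -> Iterator[Progress]:
    """Hypothetical helper: consume items lazily, one batch at a time."""
    iterator = iter(items)
    batch_num = 0
    docs_processed = 0
    while True:
        # Pull at most batch_size items; only this batch is held in memory.
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        sink.append(batch)  # stand-in for embedding and inserting the batch
        batch_num += 1
        docs_processed += len(batch)
        yield {"batch_num": batch_num, "docs_processed": docs_processed}


stored: list[list[str]] = []
documents = (f"doc {i}" for i in range(2500))
updates = list(stream_in_batches(documents, 1000, stored))
```

Because the function is a generator, nothing is processed until the caller iterates over it, which matches the `for progress in ...` and `list(...)` usage shown in the documentation.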

simplevecdb-2.1.0/docs/api/core.md ADDED

@@ -0,0 +1,147 @@
+# Core API
+## VectorDB
+The main database class for managing vector collections.
+::: simplevecdb.core.VectorDB
+    options:
+      members:
+        - collection
+        - vacuum
+        - close
+        - check_migration
+## VectorCollection
+A named collection of vectors within a database.
+::: simplevecdb.core.VectorCollection
+    options:
+      members:
+        - add_texts
+        - add_texts_streaming
+        - similarity_search
+        - similarity_search_batch
+        - keyword_search
+        - hybrid_search
+        - max_marginal_relevance_search
+        - delete_by_ids
+        - remove_texts
+        - rebuild_index
+        - get_children
+        - get_parent
+        - get_descendants
+        - get_ancestors
+        - set_parent
+## Quick Reference
+### Search Methods
+| Method | Description | Use Case |
+|--------|-------------|----------|
+| `similarity_search()` | Vector similarity search | Single query, best match |
+| `similarity_search_batch()` | Batch vector search | Multiple queries, ~10x throughput |
+| `keyword_search()` | BM25 full-text search | Keyword matching |
+| `hybrid_search()` | BM25 + vector fusion | Best of both worlds |
+| `max_marginal_relevance_search()` | Diversity-aware search | Avoid redundant results |
+### Search Parameters
+```python
+# Adaptive search (default) - auto-selects brute-force or HNSW
+results = collection.similarity_search(query, k=10)
+# Force exact brute-force search (perfect recall)
+results = collection.similarity_search(query, k=10, exact=True)
+# Force HNSW approximate search (faster)
+results = collection.similarity_search(query, k=10, exact=False)
+# Parallel search with explicit thread count
+results = collection.similarity_search(query, k=10, threads=4)
+# Batch search for multiple queries
+results = collection.similarity_search_batch(queries, k=10)
+```
+### Quantization Options
+```python
+from simplevecdb import Quantization
+# Full precision (default)
+collection = db.collection("docs", quantization=Quantization.FLOAT)
+# Half precision - 2x memory savings, 1.5x faster
+collection = db.collection("docs", quantization=Quantization.FLOAT16)
+# 8-bit quantization - 4x memory savings
+collection = db.collection("docs", quantization=Quantization.INT8)
+# 1-bit quantization - 32x memory savings
+collection = db.collection("docs", quantization=Quantization.BIT)
+```
+### Streaming Insert
+For large-scale ingestion without memory pressure:
+```python
+# From generator/iterator
+def load_documents():
+    for line in open("large_file.jsonl"):
+        doc = json.loads(line)
+        yield (doc["text"], doc.get("metadata"), doc.get("embedding"))
+for progress in collection.add_texts_streaming(load_documents()):
+    print(f"Batch {progress['batch_num']}: {progress['docs_processed']} total")
+# With progress callback
+def log_progress(p):
+    print(f"{p['docs_processed']} docs, batch {p['batch_num']}")
+list(collection.add_texts_streaming(items, batch_size=500, on_progress=log_progress))
+```
+### Hierarchical Relationships
+Organize documents in parent-child hierarchies for chunked documents, threaded conversations, or nested content:
+```python
+# Add documents with parent relationships
+parent_ids = collection.add_texts(["Main document"], metadatas=[{"type": "parent"}])
+parent_id = parent_ids[0]
+# Add children referencing the parent
+child_ids = collection.add_texts(
+    ["Chunk 1", "Chunk 2", "Chunk 3"],
+    parent_ids=[parent_id, parent_id, parent_id]
+)
+# Navigate the hierarchy
+children = collection.get_children(parent_id)         # Direct children
+parent = collection.get_parent(child_ids[0])          # Get parent document
+descendants = collection.get_descendants(parent_id)   # All nested children
+ancestors = collection.get_ancestors(child_ids[0])    # Path to root
+# Reparent or orphan documents
+collection.set_parent(child_ids[0], new_parent_id)    # Move to new parent
+collection.set_parent(child_ids[0], None)             # Make root document
+# Search within a subtree
+results = collection.similarity_search(
+    query_embedding,
+    k=5,
+    filter={"parent_id": parent_id}  # Only search children
+)
+```
+| Method | Description |
+|--------|-------------|
+| `get_children(doc_id)` | Direct children of a document |
+| `get_parent(doc_id)` | Parent document (or None if root) |
+| `get_descendants(doc_id, max_depth)` | All nested children recursively |
+| `get_ancestors(doc_id)` | Path from document to root |
+| `set_parent(doc_id, parent_id)` | Move document to new parent (or None to orphan) |
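
Note on the hierarchy methods documented above: the changelog says traversal uses a SQLite recursive CTE over a `parent_id` column. The actual schema and queries are not in the hunks shown, so this is a sketch under assumptions. The table name `docs`, its other columns, and this standalone `get_descendants` function are hypothetical and only demonstrate how a depth-limited recursive CTE can walk such a tree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the parent_id column name comes from the changelog.
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO docs (id, text, parent_id) VALUES (?, ?, ?)",
    [
        (1, "Main document", None),
        (2, "Chunk 1", 1),
        (3, "Chunk 2", 1),
        (4, "Sub-chunk 1a", 2),
    ],
)


def get_descendants(conn, doc_id, max_depth=None):
    """Return (id, depth) for every descendant of doc_id, nearest first."""
    return conn.execute(
        """
        WITH RECURSIVE tree(id, depth) AS (
            -- Anchor: direct children sit at depth 1.
            SELECT id, 1 FROM docs WHERE parent_id = ?
            UNION ALL
            -- Recursive step: children of rows already in the tree,
            -- stopping once max_depth is reached (no limit when NULL).
            SELECT d.id, tree.depth + 1
            FROM docs AS d
            JOIN tree ON d.parent_id = tree.id
            WHERE ? IS NULL OR tree.depth < ?
        )
        SELECT id, depth FROM tree ORDER BY depth, id
        """,
        (doc_id, max_depth, max_depth),
    ).fetchall()


all_descendants = get_descendants(conn, 1)
direct_only = get_descendants(conn, 1, max_depth=1)
```

With `max_depth=1` the recursive step never fires, so the result is the same set that a direct-children lookup would return.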
