beaver-db 0.13.1__tar.gz → 0.15.0__tar.gz

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: beaver-db
-Version: 0.13.1
+Version: 0.15.0
 Summary: Fast, embedded, and multi-modal DB based on SQLite for AI-powered applications.
 Requires-Python: >=3.13
 Description-Content-Type: text/markdown
@@ -31,12 +31,13 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
 - **Sync/Async High-Efficiency Pub/Sub**: A powerful, thread and process-safe publish-subscribe system for real-time messaging with a fan-out architecture. Sync by default, but with an `as_async` wrapper for async applications.
 - **Namespaced Key-Value Dictionaries**: A Pythonic, dictionary-like interface for storing any JSON-serializable object within separate namespaces with optional TTL for cache implementations.
 - **Pythonic List Management**: A fluent, Redis-like interface for managing persistent, ordered lists.
-- **Persistent Priority Queue**: A high-performance, persistent queue that always returns the item with the highest priority, perfect for task management.
+- **Persistent Priority Queue**: A high-performance, persistent priority queue perfect for task orchestration across multiple processes, with optional async support.
 - **Simple Blob Storage**: A dictionary-like interface for storing medium-sized binary files (like PDFs or images) directly in the database, ensuring transactional integrity with your other data.
 - **High-Performance Vector Storage & Search**: Store vector embeddings and perform fast, crash-safe approximate nearest neighbor searches using a `faiss`-based hybrid index.
 - **Full-Text and Fuzzy Search**: Automatically index and search through document metadata using SQLite's powerful FTS5 engine, enhanced with optional fuzzy search for typo-tolerant matching.
 - **Knowledge Graph**: Create relationships between documents and traverse the graph to find neighbors or perform multi-hop walks.
 - **Single-File & Portable**: All data is stored in a single SQLite file, making it incredibly easy to move, back up, or embed in your application.
+- **Optional Type-Safety**: Although the database is schemaless, you can use a minimalistic typing system for automatic serialization and deserialization that is Pydantic-compatible out of the box.
 
 ## How Beaver is Implemented
 
@@ -222,7 +223,7 @@ avatar = attachments.get("user_123_avatar.png")
 
 ## Type-Safe Data Models
 
-For enhanced data integrity and a better developer experience, BeaverDB supports type-safe operations for dictionaries, lists, and queues. By associating a model with these data structures, you get automatic serialization and deserialization, complete with autocompletion in your editor.
+For enhanced data integrity and a better developer experience, BeaverDB supports type-safe operations for all modalities. By associating a model with these data structures, you get automatic serialization and deserialization, complete with autocompletion in your editor.
 
 This feature is designed to be flexible and works seamlessly with two kinds of models:
 
@@ -252,25 +253,32 @@ retrieved_user = users["alice"]
 print(f"Retrieved: {retrieved_user.name}") # Your editor will provide autocompletion here
 ```
 
+In the same way, you can have typed message payloads in `db.channel`, typed metadata in `db.blobs`, and custom document types in `db.collection`, as well as custom types in lists and queues.
+
+Essentially, anywhere you can store or retrieve an object in BeaverDB, you can opt into a typed version by passing `model=MyClass` to the corresponding wrapper method in `BeaverDB` and enjoy first-class type safety and inference.
+
 ## More Examples
 
 For more in-depth examples, check out the scripts in the `examples/` directory:
 
-- [`examples/async_pubsub.py`](examples/async_pubsub.py): A demonstration of the asynchronous wrapper for the publish/subscribe system.
-- [`examples/blobs.py`](examples/blobs.py): Demonstrates how to store and retrieve binary data in the database.
-- [`examples/cache.py`](examples/cache.py): A practical example of using a dictionary with TTL as a cache for API calls.
-- [`examples/fts.py`](examples/fts.py): A detailed look at full-text search, including targeted searches on specific metadata fields.
-- [`examples/fuzzy.py`](examples/fuzzy.py): Demonstrates fuzzy search capabilities for text search.
-- [`examples/general_test.py`](examples/general_test.py): A general-purpose test to run all operations randomly which allows testing long-running processes and synchronicity issues.
-- [`examples/graph.py`](examples/graph.py): Shows how to create relationships between documents and perform multi-hop graph traversals.
-- [`examples/kvstore.py`](examples/kvstore.py): A comprehensive demo of the namespaced dictionary feature.
-- [`examples/list.py`](examples/list.py): Shows the full capabilities of the persistent list, including slicing and in-place updates.
-- [`examples/publisher.py`](examples/publisher.py) and [`examples/subscriber.py`](examples/subscriber.py): A pair of examples demonstrating inter-process message passing with the publish/subscribe system.
-- [`examples/pubsub.py`](examples/pubsub.py): A demonstration of the synchronous, thread-safe publish/subscribe system in a single process.
-- [`examples/queue.py`](examples/queue.py): A practical example of using the persistent priority queue for task management.
-- [`examples/rerank.py`](examples/rerank.py): Shows how to combine results from vector and text search for more refined results.
-- [`examples/stress_vectors.py`](examples/stress_vectors.py): A stress test for the vector search functionality.
-- [`examples/vector.py`](examples/vector.py): Demonstrates how to index and search vector embeddings, including upserts.
+- [`async_pubsub.py`](examples/async_pubsub.py): A demonstration of the asynchronous wrapper for the publish/subscribe system.
+- [`blobs.py`](examples/blobs.py): Demonstrates how to store and retrieve binary data in the database.
+- [`cache.py`](examples/cache.py): A practical example of using a dictionary with TTL as a cache for API calls.
+- [`fts.py`](examples/fts.py): A detailed look at full-text search, including targeted searches on specific metadata fields.
+- [`fuzzy.py`](examples/fuzzy.py): Demonstrates fuzzy search capabilities for text search.
+- [`general_test.py`](examples/general_test.py): A general-purpose test that runs all operations randomly, which allows testing long-running processes and synchronicity issues.
+- [`graph.py`](examples/graph.py): Shows how to create relationships between documents and perform multi-hop graph traversals.
+- [`kvstore.py`](examples/kvstore.py): A comprehensive demo of the namespaced dictionary feature.
+- [`list.py`](examples/list.py): Shows the full capabilities of the persistent list, including slicing and in-place updates.
+- [`pqueue.py`](examples/pqueue.py): A practical example of using the persistent priority queue for task management.
+- [`producer_consumer.py`](examples/producer_consumer.py): A demonstration of the distributed task queue system in a multi-process environment.
+- [`publisher.py`](examples/publisher.py) and [`subscriber.py`](examples/subscriber.py): A pair of examples demonstrating inter-process message passing with the publish/subscribe system.
+- [`pubsub.py`](examples/pubsub.py): A demonstration of the synchronous, thread-safe publish/subscribe system in a single process.
+- [`rerank.py`](examples/rerank.py): Shows how to combine results from vector and text search for more refined results.
+- [`stress_vectors.py`](examples/stress_vectors.py): A stress test for the vector search functionality.
+- [`textual_chat.py`](examples/textual_chat.py): A chat application built with `textual` and `beaver` to illustrate the use of several primitives (lists, dicts, and channels) at the same time.
+- [`type_hints.py`](examples/type_hints.py): Shows how to use type hints with `beaver` to get better IDE support and type safety.
+- [`vector.py`](examples/vector.py): Demonstrates how to index and search vector embeddings, including upserts.
 
 ## Roadmap
 
@@ -279,7 +287,6 @@ For more in-depth examples, check out the scripts in the `examples/` directory:
 These are some of the features and improvements planned for future releases:
 
 - **Async API**: Extend the async support with on-demand wrappers for all features besides channels.
-- **Type Hints**: Extend type hints for channels and documents.
 
 Check out the [roadmap](roadmap.md) for a detailed list of upcoming features and design ideas.
 
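The `model=MyClass` pattern described in the README additions above can be sketched with a toy stand-in. Note that `TypedStore` and `User` below are illustrative only (they are not part of beaver-db); the sketch shows how binding a model to a wrapper yields automatic serialization and typed retrieval:

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, Generic, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class User:
    name: str
    age: int


class TypedStore(Generic[T]):
    """Hypothetical stand-in for a BeaverDB wrapper created with model=User."""

    def __init__(self, model: Optional[Type[T]] = None):
        self._model = model
        self._rows: Dict[str, str] = {}  # key -> JSON payload, like a SQLite row

    def __setitem__(self, key: str, value: T) -> None:
        # Serialize the model to JSON on write.
        self._rows[key] = json.dumps(asdict(value) if self._model else value)

    def __getitem__(self, key: str) -> T:
        # Deserialize back into the bound model on read.
        payload = json.loads(self._rows[key])
        return self._model(**payload) if self._model else payload


users = TypedStore(model=User)
users["alice"] = User(name="Alice", age=30)
retrieved = users["alice"]  # comes back as a User instance, not a dict
```

Because `TypedStore` is generic in `T`, a type checker infers `users["alice"]` as `User`, which is what enables the editor autocompletion the README mentions.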
@@ -20,12 +20,13 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
 - **Sync/Async High-Efficiency Pub/Sub**: A powerful, thread and process-safe publish-subscribe system for real-time messaging with a fan-out architecture. Sync by default, but with an `as_async` wrapper for async applications.
 - **Namespaced Key-Value Dictionaries**: A Pythonic, dictionary-like interface for storing any JSON-serializable object within separate namespaces with optional TTL for cache implementations.
 - **Pythonic List Management**: A fluent, Redis-like interface for managing persistent, ordered lists.
-- **Persistent Priority Queue**: A high-performance, persistent queue that always returns the item with the highest priority, perfect for task management.
+- **Persistent Priority Queue**: A high-performance, persistent priority queue perfect for task orchestration across multiple processes, with optional async support.
 - **Simple Blob Storage**: A dictionary-like interface for storing medium-sized binary files (like PDFs or images) directly in the database, ensuring transactional integrity with your other data.
 - **High-Performance Vector Storage & Search**: Store vector embeddings and perform fast, crash-safe approximate nearest neighbor searches using a `faiss`-based hybrid index.
 - **Full-Text and Fuzzy Search**: Automatically index and search through document metadata using SQLite's powerful FTS5 engine, enhanced with optional fuzzy search for typo-tolerant matching.
 - **Knowledge Graph**: Create relationships between documents and traverse the graph to find neighbors or perform multi-hop walks.
 - **Single-File & Portable**: All data is stored in a single SQLite file, making it incredibly easy to move, back up, or embed in your application.
+- **Optional Type-Safety**: Although the database is schemaless, you can use a minimalistic typing system for automatic serialization and deserialization that is Pydantic-compatible out of the box.
 
 ## How Beaver is Implemented
 
@@ -211,7 +212,7 @@ avatar = attachments.get("user_123_avatar.png")
 
 ## Type-Safe Data Models
 
-For enhanced data integrity and a better developer experience, BeaverDB supports type-safe operations for dictionaries, lists, and queues. By associating a model with these data structures, you get automatic serialization and deserialization, complete with autocompletion in your editor.
+For enhanced data integrity and a better developer experience, BeaverDB supports type-safe operations for all modalities. By associating a model with these data structures, you get automatic serialization and deserialization, complete with autocompletion in your editor.
 
 This feature is designed to be flexible and works seamlessly with two kinds of models:
 
@@ -241,25 +242,32 @@ retrieved_user = users["alice"]
 print(f"Retrieved: {retrieved_user.name}") # Your editor will provide autocompletion here
 ```
 
+In the same way, you can have typed message payloads in `db.channel`, typed metadata in `db.blobs`, and custom document types in `db.collection`, as well as custom types in lists and queues.
+
+Essentially, anywhere you can store or retrieve an object in BeaverDB, you can opt into a typed version by passing `model=MyClass` to the corresponding wrapper method in `BeaverDB` and enjoy first-class type safety and inference.
+
 ## More Examples
 
 For more in-depth examples, check out the scripts in the `examples/` directory:
 
-- [`examples/async_pubsub.py`](examples/async_pubsub.py): A demonstration of the asynchronous wrapper for the publish/subscribe system.
-- [`examples/blobs.py`](examples/blobs.py): Demonstrates how to store and retrieve binary data in the database.
-- [`examples/cache.py`](examples/cache.py): A practical example of using a dictionary with TTL as a cache for API calls.
-- [`examples/fts.py`](examples/fts.py): A detailed look at full-text search, including targeted searches on specific metadata fields.
-- [`examples/fuzzy.py`](examples/fuzzy.py): Demonstrates fuzzy search capabilities for text search.
-- [`examples/general_test.py`](examples/general_test.py): A general-purpose test to run all operations randomly which allows testing long-running processes and synchronicity issues.
-- [`examples/graph.py`](examples/graph.py): Shows how to create relationships between documents and perform multi-hop graph traversals.
-- [`examples/kvstore.py`](examples/kvstore.py): A comprehensive demo of the namespaced dictionary feature.
-- [`examples/list.py`](examples/list.py): Shows the full capabilities of the persistent list, including slicing and in-place updates.
-- [`examples/publisher.py`](examples/publisher.py) and [`examples/subscriber.py`](examples/subscriber.py): A pair of examples demonstrating inter-process message passing with the publish/subscribe system.
-- [`examples/pubsub.py`](examples/pubsub.py): A demonstration of the synchronous, thread-safe publish/subscribe system in a single process.
-- [`examples/queue.py`](examples/queue.py): A practical example of using the persistent priority queue for task management.
-- [`examples/rerank.py`](examples/rerank.py): Shows how to combine results from vector and text search for more refined results.
-- [`examples/stress_vectors.py`](examples/stress_vectors.py): A stress test for the vector search functionality.
-- [`examples/vector.py`](examples/vector.py): Demonstrates how to index and search vector embeddings, including upserts.
+- [`async_pubsub.py`](examples/async_pubsub.py): A demonstration of the asynchronous wrapper for the publish/subscribe system.
+- [`blobs.py`](examples/blobs.py): Demonstrates how to store and retrieve binary data in the database.
+- [`cache.py`](examples/cache.py): A practical example of using a dictionary with TTL as a cache for API calls.
+- [`fts.py`](examples/fts.py): A detailed look at full-text search, including targeted searches on specific metadata fields.
+- [`fuzzy.py`](examples/fuzzy.py): Demonstrates fuzzy search capabilities for text search.
+- [`general_test.py`](examples/general_test.py): A general-purpose test that runs all operations randomly, which allows testing long-running processes and synchronicity issues.
+- [`graph.py`](examples/graph.py): Shows how to create relationships between documents and perform multi-hop graph traversals.
+- [`kvstore.py`](examples/kvstore.py): A comprehensive demo of the namespaced dictionary feature.
+- [`list.py`](examples/list.py): Shows the full capabilities of the persistent list, including slicing and in-place updates.
+- [`pqueue.py`](examples/pqueue.py): A practical example of using the persistent priority queue for task management.
+- [`producer_consumer.py`](examples/producer_consumer.py): A demonstration of the distributed task queue system in a multi-process environment.
+- [`publisher.py`](examples/publisher.py) and [`subscriber.py`](examples/subscriber.py): A pair of examples demonstrating inter-process message passing with the publish/subscribe system.
+- [`pubsub.py`](examples/pubsub.py): A demonstration of the synchronous, thread-safe publish/subscribe system in a single process.
+- [`rerank.py`](examples/rerank.py): Shows how to combine results from vector and text search for more refined results.
+- [`stress_vectors.py`](examples/stress_vectors.py): A stress test for the vector search functionality.
+- [`textual_chat.py`](examples/textual_chat.py): A chat application built with `textual` and `beaver` to illustrate the use of several primitives (lists, dicts, and channels) at the same time.
+- [`type_hints.py`](examples/type_hints.py): Shows how to use type hints with `beaver` to get better IDE support and type safety.
+- [`vector.py`](examples/vector.py): Demonstrates how to index and search vector embeddings, including upserts.
 
 ## Roadmap
 
@@ -268,7 +276,6 @@ For more in-depth examples, check out the scripts in the `examples/` directory:
 These are some of the features and improvements planned for future releases:
 
 - **Async API**: Extend the async support with on-demand wrappers for all features besides channels.
-- **Type Hints**: Extend type hints for channels and documents.
 
 Check out the [roadmap](roadmap.md) for a detailed list of upcoming features and design ideas.
 
@@ -1,24 +1,43 @@
 import json
 import sqlite3
-from typing import Any, Dict, Iterator, NamedTuple, Optional
+from typing import Any, Dict, Iterator, NamedTuple, Optional, Type, TypeVar
 
+from .types import JsonSerializable
 
-class Blob(NamedTuple):
+
+class Blob[M](NamedTuple):
     """A data class representing a single blob retrieved from the store."""
 
     key: str
     data: bytes
-    metadata: Dict[str, Any]
+    metadata: M
 
 
-class BlobManager:
+class BlobManager[M]:
     """A wrapper providing a Pythonic interface to a blob store in the database."""
 
-    def __init__(self, name: str, conn: sqlite3.Connection):
+    def __init__(self, name: str, conn: sqlite3.Connection, model: Type[M] | None = None):
         self._name = name
         self._conn = conn
+        self._model = model
+
+    def _serialize(self, value: M) -> str | None:
+        """Serializes the given value to a JSON string."""
+        if value is None:
+            return None
+        if isinstance(value, JsonSerializable):
+            return value.model_dump_json()
+
+        return json.dumps(value)
+
+    def _deserialize(self, value: str) -> M:
+        """Deserializes a JSON string into the specified model or a generic object."""
+        if self._model:
+            return self._model.model_validate_json(value)
+
+        return json.loads(value)
 
-    def put(self, key: str, data: bytes, metadata: Optional[Dict[str, Any]] = None):
+    def put(self, key: str, data: bytes, metadata: Optional[M] = None):
         """
         Stores or replaces a blob in the store.
 
@@ -30,7 +49,7 @@ class BlobManager:
         if not isinstance(data, bytes):
             raise TypeError("Blob data must be of type bytes.")
 
-        metadata_json = json.dumps(metadata) if metadata else None
+        metadata_json = self._serialize(metadata) if metadata else None
 
         with self._conn:
             self._conn.execute(
@@ -38,7 +57,7 @@ class BlobManager:
                 (self._name, key, data, metadata_json),
             )
 
-    def get(self, key: str) -> Optional[Blob]:
+    def get(self, key: str) -> Optional[Blob[M]]:
         """
         Retrieves a blob from the store.
 
@@ -60,7 +79,7 @@ class BlobManager:
             return None
 
         data, metadata_json = result
-        metadata = json.loads(metadata_json) if metadata_json else {}
+        metadata = self._deserialize(metadata_json) if metadata_json else None
 
         return Blob(key=key, data=data, metadata=metadata)
 
@@ -4,19 +4,21 @@ import sqlite3
 import threading
 import time
 from queue import Empty, Queue
-from typing import Any, AsyncIterator, Iterator, Set
+from typing import Any, AsyncIterator, Generic, Iterator, Set, Type, TypeVar
+
+from .types import JsonSerializable
 
 # A special message object used to signal the listener to gracefully shut down.
 _SHUTDOWN_SENTINEL = object()
 
 
-class AsyncSubscriber:
+class AsyncSubscriber[T]:
     """A thread-safe async message receiver for a specific channel subscription."""
 
-    def __init__(self, subscriber: "Subscriber"):
+    def __init__(self, subscriber: "Subscriber[T]"):
         self._subscriber = subscriber
 
-    async def __aenter__(self) -> "AsyncSubscriber":
+    async def __aenter__(self) -> "AsyncSubscriber[T]":
         """Registers the listener's queue with the channel to start receiving messages."""
         await asyncio.to_thread(self._subscriber.__enter__)
         return self
@@ -25,7 +27,7 @@ class AsyncSubscriber:
         """Unregisters the listener's queue from the channel to stop receiving messages."""
         await asyncio.to_thread(self._subscriber.__exit__, exc_type, exc_val, exc_tb)
 
-    async def listen(self, timeout: float | None = None) -> AsyncIterator[Any]:
+    async def listen(self, timeout: float | None = None) -> AsyncIterator[T]:
         """
         Returns a blocking async iterator that yields messages as they arrive.
         """
@@ -39,7 +41,7 @@ class AsyncSubscriber:
             raise TimeoutError(f"Timeout {timeout}s expired.")
 
 
-class Subscriber:
+class Subscriber[T]:
     """
     A thread-safe message receiver for a specific channel subscription.
 
@@ -49,11 +51,11 @@ class Subscriber:
     impact others.
     """
 
-    def __init__(self, channel: "ChannelManager"):
+    def __init__(self, channel: "ChannelManager[T]"):
         self._channel = channel
         self._queue: Queue = Queue()
 
-    def __enter__(self) -> "Subscriber":
+    def __enter__(self) -> "Subscriber[T]":
         """Registers the listener's queue with the channel to start receiving messages."""
         self._channel._register(self._queue)
         return self
@@ -62,7 +64,7 @@ class Subscriber:
         """Unregisters the listener's queue from the channel to stop receiving messages."""
         self._channel._unregister(self._queue)
 
-    def listen(self, timeout: float | None = None) -> Iterator[Any]:
+    def listen(self, timeout: float | None = None) -> Iterator[T]:
         """
         Returns a blocking iterator that yields messages as they arrive.
 
@@ -84,29 +86,29 @@ class Subscriber:
         except Empty:
             raise TimeoutError(f"Timeout {timeout}s expired.")
 
-    def as_async(self) -> "AsyncSubscriber":
+    def as_async(self) -> "AsyncSubscriber[T]":
         """Returns an async version of the subscriber."""
         return AsyncSubscriber(self)
 
 
-class AsyncChannelManager:
+class AsyncChannelManager[T]:
     """The central async hub for a named pub/sub channel."""
 
-    def __init__(self, channel: "ChannelManager"):
+    def __init__(self, channel: "ChannelManager[T]"):
         self._channel = channel
 
-    async def publish(self, payload: Any):
+    async def publish(self, payload: T):
         """
         Publishes a JSON-serializable message to the channel asynchronously.
         """
         await asyncio.to_thread(self._channel.publish, payload)
 
-    def subscribe(self) -> "AsyncSubscriber":
+    def subscribe(self) -> "AsyncSubscriber[T]":
         """Creates a new async subscription, returning an AsyncSubscriber context manager."""
         return self._channel.subscribe().as_async()
 
 
-class ChannelManager:
+class ChannelManager[T]:
     """
     The central hub for a named pub/sub channel.
 
@@ -121,16 +123,32 @@ class ChannelManager:
         conn: sqlite3.Connection,
         db_path: str,
         poll_interval: float = 0.1,
+        model: Type[T] | None = None,
     ):
         self._name = name
         self._conn = conn
         self._db_path = db_path
         self._poll_interval = poll_interval
+        self._model = model
         self._listeners: Set[Queue] = set()
         self._lock = threading.Lock()
         self._polling_thread: threading.Thread | None = None
         self._stop_event = threading.Event()
 
+    def _serialize(self, value: T) -> str:
+        """Serializes the given value to a JSON string."""
+        if isinstance(value, JsonSerializable):
+            return value.model_dump_json()
+
+        return json.dumps(value)
+
+    def _deserialize(self, value: str) -> T:
+        """Deserializes a JSON string into the specified model or a generic object."""
+        if self._model:
+            return self._model.model_validate_json(value)
+
+        return json.loads(value)
+
     def _register(self, queue: Queue):
         """Adds a listener's queue and starts the poller if it's the first one."""
 
@@ -186,7 +204,6 @@ class ChannelManager:
         # The poller starts listening for messages from this moment forward.
         last_seen_timestamp = time.time()
 
-
         while not self._stop_event.is_set():
             cursor = thread_conn.cursor()
             cursor.execute(
@@ -206,18 +223,18 @@ class ChannelManager:
                 with self._lock:
                     for queue in self._listeners:
                         for row in messages:
-                            queue.put(json.loads(row["message_payload"]))
+                            queue.put(self._deserialize(row["message_payload"]))
 
             # Wait for the poll interval before checking for new messages again.
             time.sleep(self._poll_interval)
 
         thread_conn.close()
 
-    def subscribe(self) -> Subscriber:
+    def subscribe(self) -> Subscriber[T]:
         """Creates a new subscription, returning a Listener context manager."""
         return Subscriber(self)
 
-    def publish(self, payload: Any):
+    def publish(self, payload: T):
         """
         Publishes a JSON-serializable message to the channel.
 
@@ -225,7 +242,7 @@ class ChannelManager:
         into the database's pub/sub log.
         """
         try:
-            json_payload = json.dumps(payload)
+            json_payload = self._serialize(payload)
         except TypeError as e:
             raise TypeError("Message payload must be JSON-serializable.") from e
 
@@ -235,6 +252,6 @@ class ChannelManager:
             (time.time(), self._name, json_payload),
        )
 
-    def as_async(self) -> "AsyncChannelManager":
+    def as_async(self) -> "AsyncChannelManager[T]":
         """Returns an async version of the channel manager."""
-        return AsyncChannelManager(self)
+        return AsyncChannelManager(self)
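The channel diff above parameterizes the fan-out hub (`ChannelManager[T]`) so every registered subscriber queue receives a typed copy of each message. The core register/publish flow can be sketched standalone; `MiniChannel` below is a hypothetical in-memory reduction (no SQLite log or polling thread), spelled with `typing.Generic` rather than the PEP 695 `[T]` syntax the package uses:

```python
import threading
from queue import Queue
from typing import Generic, Set, TypeVar

T = TypeVar("T")


class MiniChannel(Generic[T]):
    """Toy fan-out hub mirroring ChannelManager's register/publish flow."""

    def __init__(self) -> None:
        self._listeners: Set[Queue] = set()
        self._lock = threading.Lock()

    def subscribe(self) -> Queue:
        # Each subscriber gets its own queue, so a slow consumer
        # does not impact others.
        q: Queue = Queue()
        with self._lock:
            self._listeners.add(q)
        return q

    def publish(self, payload: T) -> None:
        # Fan-out: every registered queue receives its own copy.
        with self._lock:
            for q in self._listeners:
                q.put(payload)


chan: MiniChannel[str] = MiniChannel()
a, b = chan.subscribe(), chan.subscribe()
chan.publish("hello")
```

In the real implementation the publish side writes to a SQLite pub/sub log and a background poller feeds the queues, but the per-subscriber-queue fan-out shape is the same.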
@@ -3,10 +3,11 @@ import sqlite3
 import threading
 import uuid
 from enum import Enum
-from typing import Any, List, Literal, Tuple
+from typing import Any, Iterator, List, Literal, Tuple, Type, TypeVar
 
 import numpy as np
 
+from .types import Model
 from .vectors import VectorIndex
 
 
@@ -71,7 +72,7 @@ class WalkDirection(Enum):
     INCOMING = "incoming"
 
 
-class Document:
+class Document(Model):
     """A data class representing a single item in a collection."""
 
     def __init__(
@@ -88,8 +89,7 @@ class Document:
             raise TypeError("Embedding must be a list of numbers.")
         self.embedding = np.array(embedding, dtype=np.float32)
 
-        for key, value in metadata.items():
-            setattr(self, key, value)
+        super().__init__(**metadata)
 
     def to_dict(self) -> dict[str, Any]:
         """Serializes the document's metadata to a dictionary."""
@@ -103,15 +103,16 @@ class Document:
         return f"Document(id='{self.id}', {metadata_str})"
 
 
-class CollectionManager:
+class CollectionManager[D: Document]:
     """
     A wrapper for multi-modal collection operations, including document storage,
     FTS, fuzzy search, graph traversal, and persistent vector search.
     """
 
-    def __init__(self, name: str, conn: sqlite3.Connection):
+    def __init__(self, name: str, conn: sqlite3.Connection, model: Type[D] | None = None):
         self._name = name
         self._conn = conn
+        self._model = model or Document
         # All vector-related operations are now delegated to the VectorIndex class.
         self._vector_index = VectorIndex(name, conn)
         # A lock to ensure only one compaction thread runs at a time for this collection.
@@ -184,7 +185,7 @@ class CollectionManager:
 
     def index(
         self,
-        document: Document,
+        document: D,
         *,
         fts: bool | list[str] = True,
         fuzzy: bool = False
@@ -266,7 +267,7 @@ class CollectionManager:
         if self._needs_compaction():
             self.compact()
 
-    def __iter__(self):
+    def __iter__(self) -> Iterator[D]:
         """Returns an iterator over all documents in the collection."""
         cursor = self._conn.cursor()
         cursor.execute(
@@ -279,14 +280,14 @@ class CollectionManager:
                 if row["item_vector"]
                 else None
             )
-            yield Document(
+            yield self._model(
                 id=row["item_id"], embedding=embedding, **json.loads(row["metadata"])
             )
         cursor.close()
 
     def search(
         self, vector: list[float], top_k: int = 10
-    ) -> List[Tuple[Document, float]]:
+    ) -> List[Tuple[D, float]]:
         """Performs a fast, persistent approximate nearest neighbor search."""
         if not isinstance(vector, list):
             raise TypeError("Search vector must be a list of floats.")
@@ -307,7 +308,7 @@ class CollectionManager:
         rows = cursor.execute(sql, (self._name, *result_ids)).fetchall()
 
         doc_map = {
-            row["item_id"]: Document(
+            row["item_id"]: self._model(
                 id=row["item_id"],
                 embedding=(np.frombuffer(row["item_vector"], dtype=np.float32).tolist() if row["item_vector"] else None),
                 **json.loads(row["metadata"]),
@@ -331,7 +332,7 @@ class CollectionManager:
         on: str | list[str] | None = None,
         top_k: int = 10,
         fuzziness: int = 0
-    ) -> list[tuple[Document, float]]:
+    ) -> list[tuple[D, float]]:
         """
         Performs a full-text or fuzzy search on indexed string fields.
         """
@@ -345,7 +346,7 @@ class CollectionManager:
 
     def _perform_fts_search(
         self, query: str, on: list[str] | None, top_k: int
-    ) -> list[tuple[Document, float]]:
+    ) -> list[tuple[D, float]]:
         """Performs a standard FTS search."""
         cursor = self._conn.cursor()
         sql_query = """
@@ -371,7 +372,7 @@ class CollectionManager:
                 np.frombuffer(row["item_vector"], dtype=np.float32).tolist()
                 if row["item_vector"] else None
             )
-            doc = Document(id=row["item_id"], embedding=embedding, **json.loads(row["metadata"]))
+            doc = self._model(id=row["item_id"], embedding=embedding, **json.loads(row["metadata"]))
             results.append((doc, row["rank"]))
         return results
 
@@ -410,7 +411,7 @@ class CollectionManager:
 
     def _perform_fuzzy_search(
         self, query: str, on: list[str] | None, top_k: int, fuzziness: int
-    ) -> list[tuple[Document, float]]:
+    ) -> list[tuple[D, float]]:
         """Performs a 3-stage fuzzy search: gather, score, and sort."""
         fts_results = self._perform_fts_search(query, on, top_k)
         fts_candidate_ids = {doc.id for doc, _ in fts_results}
@@ -462,7 +463,7 @@ class CollectionManager:
         id_placeholders = ",".join("?" for _ in top_ids)
         sql_docs = f"SELECT item_id, item_vector, metadata FROM beaver_collections WHERE collection = ? AND item_id IN ({id_placeholders})"
         cursor.execute(sql_docs, (self._name, *top_ids))
-        doc_map = {row["item_id"]: Document(id=row["item_id"], embedding=(np.frombuffer(row["item_vector"], dtype=np.float32).tolist() if row["item_vector"] else None), **json.loads(row["metadata"])) for row in cursor.fetchall()}
+        doc_map = {row["item_id"]: self._model(id=row["item_id"], embedding=(np.frombuffer(row["item_vector"], dtype=np.float32).tolist() if row["item_vector"] else None), **json.loads(row["metadata"])) for row in cursor.fetchall()}
 
         final_results = []
         distance_map = {c["id"]: c["distance"] for c in scored_candidates}
@@ -489,7 +490,7 @@ class CollectionManager:
             ),
         )
 
-    def neighbors(self, doc: Document, label: str | None = None) -> list[Document]:
+    def neighbors(self, doc: D, label: str | None = None) -> list[D]:
         """Retrieves the neighboring documents connected to a given document."""
         sql = "SELECT t1.item_id, t1.item_vector, t1.metadata FROM beaver_collections AS t1 JOIN beaver_edges AS t2 ON t1.item_id = t2.target_item_id AND t1.collection = t2.collection WHERE t2.collection = ? AND t2.source_item_id = ?"
495
496
  params = [self._name, doc.id]
@@ -499,7 +500,7 @@ class CollectionManager:
499
500
 
500
501
  rows = self._conn.cursor().execute(sql, tuple(params)).fetchall()
501
502
  return [
502
- Document(
503
+ self._model(
503
504
  id=row["item_id"],
504
505
  embedding=(
505
506
  np.frombuffer(row["item_vector"], dtype=np.float32).tolist()
@@ -513,16 +514,16 @@ class CollectionManager:
513
514
 
514
515
  def walk(
515
516
  self,
516
- source: Document,
517
+ source: D,
517
518
  labels: List[str],
518
519
  depth: int,
519
520
  *,
520
521
  direction: Literal[
521
522
  WalkDirection.OUTGOING, WalkDirection.INCOMING
522
523
  ] = WalkDirection.OUTGOING,
523
- ) -> List[Document]:
524
+ ) -> List[D]:
524
525
  """Performs a graph traversal (BFS) from a starting document."""
525
- if not isinstance(source, Document):
526
+ if not isinstance(source, D):
526
527
  raise TypeError("The starting point must be a Document object.")
527
528
  if depth <= 0:
528
529
  return []
@@ -548,7 +549,7 @@ class CollectionManager:
548
549
 
549
550
  rows = self._conn.cursor().execute(sql, tuple(params)).fetchall()
550
551
  return [
551
- Document(
552
+ self._model(
552
553
  id=row["item_id"],
553
554
  embedding=(
554
555
  np.frombuffer(row["item_vector"], dtype=np.float32).tolist()
@@ -572,11 +573,11 @@ class CollectionManager:
572
573
  return count
573
574
 
574
575
 
575
- def rerank(
576
- *results: list[Document],
576
+ def rerank[D: Document](
577
+ *results: list[D],
577
578
  weights: list[float] | None = None,
578
579
  k: int = 60
579
- ) -> list[Document]:
580
+ ) -> list[D]:
580
581
  """
581
582
  Reranks documents from multiple search result lists using Reverse Rank Fusion (RRF).
582
583
  """
@@ -590,7 +591,7 @@ def rerank(
590
591
  raise ValueError("The number of result lists must match the number of weights.")
591
592
 
592
593
  rrf_scores: dict[str, float] = {}
593
- doc_store: dict[str, Document] = {}
594
+ doc_store: dict[str, D] = {}
594
595
 
595
596
  for result_list, weight in zip(results, weights):
596
597
  for rank, doc in enumerate(result_list):
@@ -600,5 +601,5 @@ def rerank(
600
601
  score = weight * (1 / (k + rank))
601
602
  rrf_scores[doc_id] = rrf_scores.get(doc_id, 0.0) + score
602
603
 
603
- sorted_doc_ids = sorted(rrf_scores.keys(), key=rrf_scores.get, reverse=True)
604
+ sorted_doc_ids = sorted(rrf_scores.keys(), key=lambda k: rrf_scores[k], reverse=True)
604
605
  return [doc_store[doc_id] for doc_id in sorted_doc_ids]
@@ -1,10 +1,11 @@
 import sqlite3
 import threading
+from typing import Type

 from .types import JsonSerializable
 from .blobs import BlobManager
 from .channels import ChannelManager
-from .collections import CollectionManager
+from .collections import CollectionManager, Document
 from .dicts import DictManager
 from .lists import ListManager
 from .queues import QueueManager
@@ -306,7 +307,7 @@ class BeaverDB:

        return QueueManager(name, self._conn, model)

-    def collection(self, name: str) -> CollectionManager:
+    def collection[D: Document](self, name: str, model: Type[D] | None = None) -> CollectionManager[D]:
        """
        Returns a singleton CollectionManager instance for interacting with a
        document collection.
@@ -319,10 +320,11 @@ class BeaverDB:
        # of the vector index consistently.
        with self._collections_lock:
            if name not in self._collections:
-                self._collections[name] = CollectionManager(name, self._conn)
+                self._collections[name] = CollectionManager(name, self._conn, model=model)
+
        return self._collections[name]

-    def channel(self, name: str) -> ChannelManager:
+    def channel[T](self, name: str, model: type[T] | None = None) -> ChannelManager[T]:
        """
        Returns a singleton Channel instance for high-efficiency pub/sub.
        """
@@ -332,12 +334,12 @@ class BeaverDB:
        # Use a thread-safe lock to ensure only one Channel object is created per name.
        with self._channels_lock:
            if name not in self._channels:
-                self._channels[name] = ChannelManager(name, self._conn, self._db_path)
+                self._channels[name] = ChannelManager(name, self._conn, self._db_path, model=model)
        return self._channels[name]

-    def blobs(self, name: str) -> BlobManager:
+    def blobs[M](self, name: str, model: type[M] | None = None) -> BlobManager[M]:
        """Returns a wrapper object for interacting with a named blob store."""
        if not isinstance(name, str) or not name:
            raise TypeError("Blob store name must be a non-empty string.")

-        return BlobManager(name, self._conn)
+        return BlobManager(name, self._conn, model)
@@ -1,3 +1,4 @@
+import asyncio
 import json
 import sqlite3
 import time
@@ -14,8 +15,34 @@ class QueueItem[T](NamedTuple):
    data: T


+class AsyncQueueManager[T]:
+    """An async wrapper for the producer-consumer priority queue."""
+
+    def __init__(self, queue: "QueueManager[T]"):
+        self._queue = queue
+
+    async def put(self, data: T, priority: float):
+        """Asynchronously adds an item to the queue with a specific priority."""
+        await asyncio.to_thread(self._queue.put, data, priority)
+
+    @overload
+    async def get(self, block: Literal[True] = True, timeout: float | None = None) -> QueueItem[T]: ...
+    @overload
+    async def get(self, block: Literal[False]) -> QueueItem[T]: ...
+
+    async def get(self, block: bool = True, timeout: float | None = None) -> QueueItem[T]:
+        """
+        Asynchronously and atomically retrieves the highest-priority item.
+        This method will run the synchronous blocking logic in a separate thread.
+        """
+        return await asyncio.to_thread(self._queue.get, block=block, timeout=timeout)
+
+
 class QueueManager[T]:
-    """A wrapper providing a Pythonic interface to a persistent priority queue."""
+    """
+    A wrapper providing a Pythonic interface to a persistent, multi-process
+    producer-consumer priority queue.
+    """

    def __init__(self, name: str, conn: sqlite3.Connection, model: Type[T] | None = None):
        self._name = name
@@ -50,19 +77,13 @@ class QueueManager[T]:
            (self._name, priority, time.time(), self._serialize(data)),
        )

-    @overload
-    def get(self, safe: Literal[True]) -> QueueItem[T] | None: ...
-    @overload
-    def get(self) -> QueueItem[T]: ...
-
-    def get(self, safe: bool = False) -> QueueItem[T] | None:
+    def _get_item_atomically(self) -> QueueItem[T] | None:
        """
-        Atomically retrieves and removes the highest-priority item from the queue.
-        If the queue is empty, returns None if safe is True, otherwise (the default) raises IndexError.
+        Performs a single, atomic attempt to retrieve and remove the
+        highest-priority item from the queue. Returns None if the queue is empty.
        """
        with self._conn:
            cursor = self._conn.cursor()
-            # The compound index on (queue_name, priority, timestamp) makes this query efficient.
            cursor.execute(
                """
                SELECT rowid, priority, timestamp, data
@@ -76,19 +97,61 @@ class QueueManager[T]:
            result = cursor.fetchone()

            if result is None:
-                if safe:
-                    return None
-                else:
-                    raise IndexError("No item available.")
+                return None

            rowid, priority, timestamp, data = result
-            # Delete the retrieved item to ensure it's processed only once.
            cursor.execute("DELETE FROM beaver_priority_queues WHERE rowid = ?", (rowid,))

            return QueueItem(
                priority=priority, timestamp=timestamp, data=self._deserialize(data)
            )

+    @overload
+    def get(self, block: Literal[True] = True, timeout: float | None = None) -> QueueItem[T]: ...
+    @overload
+    def get(self, block: Literal[False]) -> QueueItem[T]: ...
+
+    def get(self, block: bool = True, timeout: float | None = None) -> QueueItem[T]:
+        """
+        Atomically retrieves and removes the highest-priority item from the queue.
+
+        This method is designed for producer-consumer patterns and can block
+        until an item becomes available.
+
+        Args:
+            block: If True (default), the method will wait until an item is available.
+            timeout: If `block` is True, this specifies the maximum number of seconds
+                to wait. If the timeout is reached, `TimeoutError` is raised.
+
+        Returns:
+            A `QueueItem` containing the retrieved data.
+
+        Raises:
+            IndexError: If `block` is False and the queue is empty.
+            TimeoutError: If `block` is True and the timeout expires.
+        """
+        if not block:
+            item = self._get_item_atomically()
+            if item is None:
+                raise IndexError("get from an empty queue.")
+            return item
+
+        start_time = time.time()
+        while True:
+            item = self._get_item_atomically()
+            if item is not None:
+                return item
+
+            if timeout is not None and (time.time() - start_time) > timeout:
+                raise TimeoutError("Timeout expired while waiting for an item.")
+
+            # Sleep for a short interval to avoid busy-waiting and consuming CPU.
+            time.sleep(0.1)
+
+    def as_async(self) -> "AsyncQueueManager[T]":
+        """Returns an async version of the queue manager."""
+        return AsyncQueueManager(self)
+
    def __len__(self) -> int:
        """Returns the current number of items in the queue."""
        cursor = self._conn.cursor()
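The blocking `get` above is a poll loop over an atomic fetch. A minimal sketch of that pattern in isolation, where the `try_get` callable and in-memory `backlog` list are hypothetical stand-ins for the SQLite-backed `_get_item_atomically`:

```python
import time

def get_blocking(try_get, block=True, timeout=None, poll=0.01):
    # Mirrors the QueueManager.get contract: non-blocking raises IndexError
    # when empty; blocking polls until an item arrives or the timeout expires.
    if not block:
        item = try_get()
        if item is None:
            raise IndexError("get from an empty queue.")
        return item
    start = time.time()
    while True:
        item = try_get()
        if item is not None:
            return item
        if timeout is not None and (time.time() - start) > timeout:
            raise TimeoutError("Timeout expired while waiting for an item.")
        time.sleep(poll)  # avoid busy-waiting between attempts

backlog = [("send-email", 1.0)]
try_get = lambda: backlog.pop(0) if backlog else None

item = get_blocking(try_get)  # item available: returns immediately
try:
    get_blocking(try_get, timeout=0.05)  # queue now empty: times out
    timed_out = False
except TimeoutError:
    timed_out = True
```

Polling (rather than a condition variable) is what makes this safe across processes: each attempt is an independent SQLite transaction, so any number of consumers can race on the same queue and each item is delivered exactly once.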
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: beaver-db
-Version: 0.13.1
+Version: 0.15.0
 Summary: Fast, embedded, and multi-modal DB based on SQLite for AI-powered applications.
 Requires-Python: >=3.13
 Description-Content-Type: text/markdown
@@ -31,12 +31,13 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
 - **Sync/Async High-Efficiency Pub/Sub**: A powerful, thread and process-safe publish-subscribe system for real-time messaging with a fan-out architecture. Sync by default, but with an `as_async` wrapper for async applications.
 - **Namespaced Key-Value Dictionaries**: A Pythonic, dictionary-like interface for storing any JSON-serializable object within separate namespaces with optional TTL for cache implementations.
 - **Pythonic List Management**: A fluent, Redis-like interface for managing persistent, ordered lists.
-- **Persistent Priority Queue**: A high-performance, persistent queue that always returns the item with the highest priority, perfect for task management.
+- **Persistent Priority Queue**: A high-performance, persistent priority queue, perfect for task orchestration across multiple processes, with optional async support.
 - **Simple Blob Storage**: A dictionary-like interface for storing medium-sized binary files (like PDFs or images) directly in the database, ensuring transactional integrity with your other data.
 - **High-Performance Vector Storage & Search**: Store vector embeddings and perform fast, crash-safe approximate nearest neighbor searches using a `faiss`-based hybrid index.
 - **Full-Text and Fuzzy Search**: Automatically index and search through document metadata using SQLite's powerful FTS5 engine, enhanced with optional fuzzy search for typo-tolerant matching.
 - **Knowledge Graph**: Create relationships between documents and traverse the graph to find neighbors or perform multi-hop walks.
 - **Single-File & Portable**: All data is stored in a single SQLite file, making it incredibly easy to move, back up, or embed in your application.
+- **Optional Type Safety**: Although the database is schemaless, you can use a minimalistic typing system for automatic serialization and deserialization that is Pydantic-compatible out of the box.

 ## How Beaver is Implemented

@@ -222,7 +223,7 @@ avatar = attachments.get("user_123_avatar.png")

 ## Type-Safe Data Models

-For enhanced data integrity and a better developer experience, BeaverDB supports type-safe operations for dictionaries, lists, and queues. By associating a model with these data structures, you get automatic serialization and deserialization, complete with autocompletion in your editor.
+For enhanced data integrity and a better developer experience, BeaverDB supports type-safe operations for all modalities. By associating a model with these data structures, you get automatic serialization and deserialization, complete with autocompletion in your editor.

 This feature is designed to be flexible and works seamlessly with two kinds of models:

@@ -252,25 +253,32 @@ retrieved_user = users["alice"]
 print(f"Retrieved: {retrieved_user.name}")  # Your editor will provide autocompletion here
 ```

+In the same way, you can have typed message payloads in `db.channel`, typed metadata in `db.blobs`, and custom document types in `db.collection`, as well as custom types in lists and queues.
+
+Essentially, anywhere you can store or retrieve an object in BeaverDB, you can use a typed version by adding `model=MyClass` to the corresponding wrapper method in `BeaverDB` and enjoy first-class type safety and inference.
+
 ## More Examples

 For more in-depth examples, check out the scripts in the `examples/` directory:

-- [`examples/async_pubsub.py`](examples/async_pubsub.py): A demonstration of the asynchronous wrapper for the publish/subscribe system.
-- [`examples/blobs.py`](examples/blobs.py): Demonstrates how to store and retrieve binary data in the database.
-- [`examples/cache.py`](examples/cache.py): A practical example of using a dictionary with TTL as a cache for API calls.
-- [`examples/fts.py`](examples/fts.py): A detailed look at full-text search, including targeted searches on specific metadata fields.
-- [`examples/fuzzy.py`](examples/fuzzy.py): Demonstrates fuzzy search capabilities for text search.
-- [`examples/general_test.py`](examples/general_test.py): A general-purpose test to run all operations randomly which allows testing long-running processes and synchronicity issues.
-- [`examples/graph.py`](examples/graph.py): Shows how to create relationships between documents and perform multi-hop graph traversals.
-- [`examples/kvstore.py`](examples/kvstore.py): A comprehensive demo of the namespaced dictionary feature.
-- [`examples/list.py`](examples/list.py): Shows the full capabilities of the persistent list, including slicing and in-place updates.
-- [`examples/publisher.py`](examples/publisher.py) and [`examples/subscriber.py`](examples/subscriber.py): A pair of examples demonstrating inter-process message passing with the publish/subscribe system.
-- [`examples/pubsub.py`](examples/pubsub.py): A demonstration of the synchronous, thread-safe publish/subscribe system in a single process.
-- [`examples/queue.py`](examples/queue.py): A practical example of using the persistent priority queue for task management.
-- [`examples/rerank.py`](examples/rerank.py): Shows how to combine results from vector and text search for more refined results.
-- [`examples/stress_vectors.py`](examples/stress_vectors.py): A stress test for the vector search functionality.
-- [`examples/vector.py`](examples/vector.py): Demonstrates how to index and search vector embeddings, including upserts.
+- [`async_pubsub.py`](examples/async_pubsub.py): A demonstration of the asynchronous wrapper for the publish/subscribe system.
+- [`blobs.py`](examples/blobs.py): Demonstrates how to store and retrieve binary data in the database.
+- [`cache.py`](examples/cache.py): A practical example of using a dictionary with TTL as a cache for API calls.
+- [`fts.py`](examples/fts.py): A detailed look at full-text search, including targeted searches on specific metadata fields.
+- [`fuzzy.py`](examples/fuzzy.py): Demonstrates fuzzy search capabilities for text search.
+- [`general_test.py`](examples/general_test.py): A general-purpose test that runs all operations randomly, which allows testing long-running processes and synchronicity issues.
+- [`graph.py`](examples/graph.py): Shows how to create relationships between documents and perform multi-hop graph traversals.
+- [`kvstore.py`](examples/kvstore.py): A comprehensive demo of the namespaced dictionary feature.
+- [`list.py`](examples/list.py): Shows the full capabilities of the persistent list, including slicing and in-place updates.
+- [`pqueue.py`](examples/pqueue.py): A practical example of using the persistent priority queue for task management.
+- [`producer_consumer.py`](examples/producer_consumer.py): A demonstration of the distributed task queue system in a multi-process environment.
+- [`publisher.py`](examples/publisher.py) and [`subscriber.py`](examples/subscriber.py): A pair of examples demonstrating inter-process message passing with the publish/subscribe system.
+- [`pubsub.py`](examples/pubsub.py): A demonstration of the synchronous, thread-safe publish/subscribe system in a single process.
+- [`rerank.py`](examples/rerank.py): Shows how to combine results from vector and text search for more refined results.
+- [`stress_vectors.py`](examples/stress_vectors.py): A stress test for the vector search functionality.
+- [`textual_chat.py`](examples/textual_chat.py): A chat application built with `textual` and `beaver` to illustrate the use of several primitives (lists, dicts, and channels) at the same time.
+- [`type_hints.py`](examples/type_hints.py): Shows how to use type hints with `beaver` to get better IDE support and type safety.
+- [`vector.py`](examples/vector.py): Demonstrates how to index and search vector embeddings, including upserts.

 ## Roadmap

@@ -279,7 +287,6 @@ For more in-depth examples, check out the scripts in the `examples/` directory:
 These are some of the features and improvements planned for future releases:

 - **Async API**: Extend the async support with on-demand wrappers for all features besides channels.
-- **Type Hints**: Extend type hints for channels and documents.

 Check out the [roadmap](roadmap.md) for a detailed list of upcoming features and design ideas.

@@ -1,6 +1,6 @@
 [project]
 name = "beaver-db"
-version = "0.13.1"
+version = "0.15.0"
 description = "Fast, embedded, and multi-modal DB based on SQLite for AI-powered applications."
 readme = "README.md"
 requires-python = ">=3.13"
@@ -11,3 +11,8 @@ dependencies = [

 [tool.hatch.build.targets.wheel]
 packages = ["beaver"]
+
+[dependency-groups]
+dev = [
+    "textual>=6.1.0",
+]