beaver-db 0.9.2__tar.gz → 0.11.0__tar.gz
Potentially problematic release.
This version of beaver-db might be problematic.
- {beaver_db-0.9.2 → beaver_db-0.11.0}/PKG-INFO +33 -12
- {beaver_db-0.9.2 → beaver_db-0.11.0}/README.md +32 -11
- beaver_db-0.11.0/beaver/collections.py +593 -0
- beaver_db-0.11.0/beaver/core.py +301 -0
- beaver_db-0.11.0/beaver/vectors.py +370 -0
- {beaver_db-0.9.2 → beaver_db-0.11.0}/beaver_db.egg-info/PKG-INFO +33 -12
- {beaver_db-0.9.2 → beaver_db-0.11.0}/beaver_db.egg-info/SOURCES.txt +1 -0
- beaver_db-0.11.0/beaver_db.egg-info/requires.txt +2 -0
- {beaver_db-0.9.2 → beaver_db-0.11.0}/pyproject.toml +2 -2
- beaver_db-0.9.2/beaver/collections.py +0 -403
- beaver_db-0.9.2/beaver/core.py +0 -220
- beaver_db-0.9.2/beaver_db.egg-info/requires.txt +0 -2
- {beaver_db-0.9.2 → beaver_db-0.11.0}/LICENSE +0 -0
- {beaver_db-0.9.2 → beaver_db-0.11.0}/beaver/__init__.py +0 -0
- {beaver_db-0.9.2 → beaver_db-0.11.0}/beaver/channels.py +0 -0
- {beaver_db-0.9.2 → beaver_db-0.11.0}/beaver/dicts.py +0 -0
- {beaver_db-0.9.2 → beaver_db-0.11.0}/beaver/lists.py +0 -0
- {beaver_db-0.9.2 → beaver_db-0.11.0}/beaver/queues.py +0 -0
- {beaver_db-0.9.2 → beaver_db-0.11.0}/beaver_db.egg-info/dependency_links.txt +0 -0
- {beaver_db-0.9.2 → beaver_db-0.11.0}/beaver_db.egg-info/top_level.txt +0 -0
- {beaver_db-0.9.2 → beaver_db-0.11.0}/setup.cfg +0 -0
@@ -1,14 +1,16 @@
 Metadata-Version: 2.4
 Name: beaver-db
-Version: 0.
+Version: 0.11.0
 Summary: Fast, embedded, and multi-modal DB based on SQLite for AI-powered applications.
 Requires-Python: >=3.13
 Description-Content-Type: text/markdown
 License-File: LICENSE
+Requires-Dist: faiss-cpu>=1.12.0
 Requires-Dist: numpy>=2.3.3
-Requires-Dist: scipy>=1.16.2
 Dynamic: license-file
 
+Of course, here is a rewritten README to explain the vector store uses a high performance FAISS-based implementation with in-memory and persistent indices, with an added small section on how is this implemented to explain the basic ideas behind the implementation of beaver.
+
 # beaver 🦫
 
 A fast, single-file, multi-modal database for Python, built with the standard `sqlite3` library.
@@ -19,10 +21,11 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
 
 `beaver` is built with a minimalistic philosophy for small, local use cases where a full-blown database server would be overkill.
 
-- **Minimalistic
-- **
+- **Minimalistic**: Uses only Python's standard libraries (`sqlite3`) and `numpy`/`faiss-cpu`.
+- **Schemaless**: Flexible data storage without rigid schemas across all modalities.
+- **Synchronous, Multi-Process, and Thread-Safe**: Designed for simplicity and safety in multi-threaded and multi-process environments.
 - **Built for Local Applications**: Perfect for local AI tools, RAG prototypes, chatbots, and desktop utilities that need persistent, structured data without network overhead.
-- **Fast by Default**: It's built on SQLite, which is famously fast and reliable for local applications.
+- **Fast by Default**: It's built on SQLite, which is famously fast and reliable for local applications. Vector search is accelerated with a high-performance, persistent `faiss` index.
 - **Standard Relational Interface**: While `beaver` provides high-level features, you can always use the same SQLite file for normal relational tasks with standard SQL.
 
 ## Core Features
@@ -31,11 +34,28 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
 - **Namespaced Key-Value Dictionaries**: A Pythonic, dictionary-like interface for storing any JSON-serializable object within separate namespaces with optional TTL for cache implementations.
 - **Pythonic List Management**: A fluent, Redis-like interface for managing persistent, ordered lists.
 - **Persistent Priority Queue**: A high-performance, persistent queue that always returns the item with the highest priority, perfect for task management.
-- **
-- **Full-Text Search**: Automatically index and search through document metadata using SQLite's powerful FTS5 engine.
-- **Graph
+- **High-Performance Vector Storage & Search**: Store vector embeddings and perform fast, crash-safe approximate nearest neighbor searches using a `faiss`-based hybrid index.
+- **Full-Text and Fuzzy Search**: Automatically index and search through document metadata using SQLite's powerful FTS5 engine, enhanced with optional fuzzy search for typo-tolerant matching.
+- **Knowledge Graph**: Create relationships between documents and traverse the graph to find neighbors or perform multi-hop walks.
 - **Single-File & Portable**: All data is stored in a single SQLite file, making it incredibly easy to move, back up, or embed in your application.
 
+## How Beaver is Implemented
+
+BeaverDB is architected as a set of targeted wrappers around a standard SQLite database. The core `BeaverDB` class manages a single connection to the SQLite file and initializes all the necessary tables for the various features.
+
+When you call a method like `db.dict("my_dict")` or `db.collection("my_docs")`, you get back a specialized manager object (`DictManager`, `CollectionManager`, etc.) that provides a clean, Pythonic API for that specific data modality. These managers translate the simple method calls (e.g., `my_dict["key"] = "value"`) into the appropriate SQL queries, handling all the complexity of data serialization, indexing, and transaction management behind the scenes. This design provides a minimal and intuitive API surface while leveraging the power and reliability of SQLite.
+
+The vector store in BeaverDB is designed for high performance and reliability, using a hybrid faiss-based index that is both fast and persistent. Here's a look at the core ideas behind its implementation:
+
+- **Hybrid Index System**: The vector store uses a two-tiered system to balance fast writes with efficient long-term storage:
+- **Base Index**: A large, optimized faiss index that contains the majority of the vectors. This index is serialized and stored as a BLOB inside a dedicated SQLite table, ensuring it remains part of the single database file.
+- **Delta Index**: A small, in-memory faiss index that holds all newly added vectors. This allows for near-instant write performance without having to rebuild the entire index for every new addition.
+- **Crash-Safe Logging**: To ensure durability, all new vector additions and deletions are first recorded in a dedicated log table in the SQLite database. This means that even if the application crashes, no data is lost.
+- **Automatic Compaction**: When the number of changes in the log reaches a certain threshold, a background process is automatically triggered to "compact" the index. This process rebuilds the base index, incorporating all the recent changes from the delta index, and then clears the log. This ensures that the index remains optimized for fast search performance over time.
+
+This hybrid approach allows BeaverDB to provide a vector search experience that is both fast and durable, without sacrificing the single-file, embedded philosophy of the library.
+
+
 ## Installation
 
 ```bash
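The manager pattern described in the "How Beaver is Implemented" text added by the hunk above can be illustrated with a minimal sketch. This is not beaver's actual code: `SketchDB`, `SketchDictManager`, the `sketch_dicts` table, and its column names are all hypothetical, chosen only to show how a dictionary-style call might be translated into SQL over a single shared `sqlite3` connection.

```python
import json
import sqlite3


class SketchDictManager:
    """Hypothetical stand-in for a manager object such as DictManager."""

    def __init__(self, conn: sqlite3.Connection, name: str) -> None:
        self._conn = conn
        self._name = name  # namespace that isolates this dictionary

    def __setitem__(self, key: str, value) -> None:
        # Serialize the value to JSON and upsert it inside one transaction.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO sketch_dicts "
                "(dict_name, item_key, item_value) VALUES (?, ?, ?)",
                (self._name, key, json.dumps(value)),
            )

    def __getitem__(self, key: str):
        row = self._conn.execute(
            "SELECT item_value FROM sketch_dicts "
            "WHERE dict_name = ? AND item_key = ?",
            (self._name, key),
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return json.loads(row[0])


class SketchDB:
    """Hypothetical core class: one connection, tables created up front."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS sketch_dicts ("
                "dict_name TEXT NOT NULL, "
                "item_key TEXT NOT NULL, "
                "item_value TEXT NOT NULL, "
                "PRIMARY KEY (dict_name, item_key))"
            )

    def dict(self, name: str) -> SketchDictManager:
        # Hand back a thin manager bound to the shared connection.
        return SketchDictManager(self._conn, name)
```

With this sketch, `SketchDB(":memory:").dict("my_dict")` returns a manager, and `my_dict["key"] = "value"` becomes a single parameterized `INSERT OR REPLACE`. The real library may organize its tables and transactions differently.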
@@ -135,7 +155,7 @@ for message in chat_history:
 
 ### 4. Build a RAG (Retrieval-Augmented Generation) System
 
-Combine **vector search** and **full-text search** to build a powerful RAG pipeline for your local documents.
+Combine **vector search** and **full-text search** to build a powerful RAG pipeline for your local documents. The vector search uses a high-performance, persistent `faiss` index that supports incremental additions without downtime.
 
 ```python
 # Get context for a user query like "fast python web frameworks"
@@ -194,14 +214,15 @@ For more in-depth examples, check out the scripts in the `examples/` directory:
 - [`examples/publisher.py`](examples/publisher.py) and [`examples/subscriber.py`](examples/subscriber.py): A pair of examples demonstrating inter-process message passing with the publish/subscribe system.
 - [`examples/cache.py`](examples/cache.py): A practical example of using a dictionary with TTL as a cache for API calls.
 - [`examples/rerank.py`](examples/rerank.py): Shows how to combine results from vector and text search for more refined results.
+- [`examples/fuzzy.py`](examples/fuzzy.py): Demonstrates fuzzy search capabilities for text search.
+- [`examples/stress_vectors.py](examples/stress_vectors.py): A stress test for the vector search functionality.
+- [`examples/general_test.py`](examples/general_test.py): A general-purpose test to run all operations randomly which allows testing long-running processes and synchronicity issues.
 
 ## Roadmap
 
 These are some of the features and improvements planned for future releases:
 
-- **
-- **Faster ANN**: Explore integrating more advanced ANN libraries like `faiss` for improved vector search performance.
-- **Async API**: Comprehensive async support with on-demand wrappers for all collections.
+- **Full Async API**: Comprehensive async support with on-demand wrappers for all collections.
 
 Check out the [roadmap](roadmap.md) for a detailed list of upcoming features and design ideas.
 
@@ -1,3 +1,5 @@
+Of course, here is a rewritten README to explain the vector store uses a high performance FAISS-based implementation with in-memory and persistent indices, with an added small section on how is this implemented to explain the basic ideas behind the implementation of beaver.
+
 # beaver 🦫
 
 A fast, single-file, multi-modal database for Python, built with the standard `sqlite3` library.
@@ -8,10 +10,11 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
 
 `beaver` is built with a minimalistic philosophy for small, local use cases where a full-blown database server would be overkill.
 
-- **Minimalistic
-- **
+- **Minimalistic**: Uses only Python's standard libraries (`sqlite3`) and `numpy`/`faiss-cpu`.
+- **Schemaless**: Flexible data storage without rigid schemas across all modalities.
+- **Synchronous, Multi-Process, and Thread-Safe**: Designed for simplicity and safety in multi-threaded and multi-process environments.
 - **Built for Local Applications**: Perfect for local AI tools, RAG prototypes, chatbots, and desktop utilities that need persistent, structured data without network overhead.
-- **Fast by Default**: It's built on SQLite, which is famously fast and reliable for local applications.
+- **Fast by Default**: It's built on SQLite, which is famously fast and reliable for local applications. Vector search is accelerated with a high-performance, persistent `faiss` index.
 - **Standard Relational Interface**: While `beaver` provides high-level features, you can always use the same SQLite file for normal relational tasks with standard SQL.
 
 ## Core Features
@@ -20,11 +23,28 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
 - **Namespaced Key-Value Dictionaries**: A Pythonic, dictionary-like interface for storing any JSON-serializable object within separate namespaces with optional TTL for cache implementations.
 - **Pythonic List Management**: A fluent, Redis-like interface for managing persistent, ordered lists.
 - **Persistent Priority Queue**: A high-performance, persistent queue that always returns the item with the highest priority, perfect for task management.
-- **
-- **Full-Text Search**: Automatically index and search through document metadata using SQLite's powerful FTS5 engine.
-- **Graph
+- **High-Performance Vector Storage & Search**: Store vector embeddings and perform fast, crash-safe approximate nearest neighbor searches using a `faiss`-based hybrid index.
+- **Full-Text and Fuzzy Search**: Automatically index and search through document metadata using SQLite's powerful FTS5 engine, enhanced with optional fuzzy search for typo-tolerant matching.
+- **Knowledge Graph**: Create relationships between documents and traverse the graph to find neighbors or perform multi-hop walks.
 - **Single-File & Portable**: All data is stored in a single SQLite file, making it incredibly easy to move, back up, or embed in your application.
 
+## How Beaver is Implemented
+
+BeaverDB is architected as a set of targeted wrappers around a standard SQLite database. The core `BeaverDB` class manages a single connection to the SQLite file and initializes all the necessary tables for the various features.
+
+When you call a method like `db.dict("my_dict")` or `db.collection("my_docs")`, you get back a specialized manager object (`DictManager`, `CollectionManager`, etc.) that provides a clean, Pythonic API for that specific data modality. These managers translate the simple method calls (e.g., `my_dict["key"] = "value"`) into the appropriate SQL queries, handling all the complexity of data serialization, indexing, and transaction management behind the scenes. This design provides a minimal and intuitive API surface while leveraging the power and reliability of SQLite.
+
+The vector store in BeaverDB is designed for high performance and reliability, using a hybrid faiss-based index that is both fast and persistent. Here's a look at the core ideas behind its implementation:
+
+- **Hybrid Index System**: The vector store uses a two-tiered system to balance fast writes with efficient long-term storage:
+- **Base Index**: A large, optimized faiss index that contains the majority of the vectors. This index is serialized and stored as a BLOB inside a dedicated SQLite table, ensuring it remains part of the single database file.
+- **Delta Index**: A small, in-memory faiss index that holds all newly added vectors. This allows for near-instant write performance without having to rebuild the entire index for every new addition.
+- **Crash-Safe Logging**: To ensure durability, all new vector additions and deletions are first recorded in a dedicated log table in the SQLite database. This means that even if the application crashes, no data is lost.
+- **Automatic Compaction**: When the number of changes in the log reaches a certain threshold, a background process is automatically triggered to "compact" the index. This process rebuilds the base index, incorporating all the recent changes from the delta index, and then clears the log. This ensures that the index remains optimized for fast search performance over time.
+
+This hybrid approach allows BeaverDB to provide a vector search experience that is both fast and durable, without sacrificing the single-file, embedded philosophy of the library.
+
+
 ## Installation
 
 ```bash
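The hybrid index design described in the hunk above (a persisted base index, an in-memory delta, a change log, and threshold-triggered compaction) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code shipped in `beaver/vectors.py`: it swaps faiss for a brute-force Euclidean search so it runs on the standard library alone, stores the base snapshot as JSON instead of a serialized faiss BLOB, and compacts synchronously instead of in a background process. The class name, table names, and `compact_threshold` parameter are hypothetical.

```python
import json
import math
import sqlite3


class HybridIndexSketch:
    """Toy two-tier index: persisted base snapshot plus in-memory delta."""

    def __init__(self, conn: sqlite3.Connection, compact_threshold: int = 3):
        self.conn = conn
        self.compact_threshold = compact_threshold
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS sketch_base "
                "(id INTEGER PRIMARY KEY CHECK (id = 1), payload TEXT)"
            )
            conn.execute(
                "CREATE TABLE IF NOT EXISTS sketch_log "
                "(seq INTEGER PRIMARY KEY AUTOINCREMENT, "
                "op TEXT, doc_id TEXT, vector TEXT)"
            )
        row = conn.execute("SELECT payload FROM sketch_base").fetchone()
        self.base = json.loads(row[0]) if row else {}
        self.delta, self.deleted = {}, set()
        # Replay the log so changes made before a crash are recovered.
        for op, doc_id, vec in conn.execute(
            "SELECT op, doc_id, vector FROM sketch_log ORDER BY seq"
        ):
            self._apply(op, doc_id, json.loads(vec) if vec else None)

    def _apply(self, op, doc_id, vector):
        if op == "add":
            self.delta[doc_id] = vector
            self.deleted.discard(doc_id)
        else:
            self.delta.pop(doc_id, None)
            self.deleted.add(doc_id)

    def _record(self, op, doc_id, vector=None):
        # Write to the durable log first, then update in-memory state.
        with self.conn:
            self.conn.execute(
                "INSERT INTO sketch_log (op, doc_id, vector) VALUES (?, ?, ?)",
                (op, doc_id, json.dumps(vector) if vector is not None else None),
            )
        self._apply(op, doc_id, vector)
        pending = self.conn.execute("SELECT COUNT(*) FROM sketch_log").fetchone()[0]
        if pending >= self.compact_threshold:
            self.compact()

    def add(self, doc_id, vector):
        self._record("add", doc_id, list(vector))

    def delete(self, doc_id):
        self._record("delete", doc_id)

    def _live(self):
        merged = {k: v for k, v in self.base.items() if k not in self.deleted}
        merged.update(self.delta)
        return merged

    def compact(self):
        # Rebuild the base from base + delta, persist it, and clear the log
        # in a single transaction.
        merged = self._live()
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO sketch_base (id, payload) VALUES (1, ?)",
                (json.dumps(merged),),
            )
            self.conn.execute("DELETE FROM sketch_log")
        self.base, self.delta, self.deleted = merged, {}, set()

    def search(self, query, top_k=1):
        # Brute-force nearest neighbours over both tiers (stand-in for faiss).
        scored = sorted(
            (math.dist(query, vec), doc_id) for doc_id, vec in self._live().items()
        )
        return [doc_id for _, doc_id in scored[:top_k]]
```

Writing to the log before touching in-memory state is what makes the in-memory delta safe to lose: a fresh `HybridIndexSketch` over the same database replays any uncompacted entries on startup.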
@@ -124,7 +144,7 @@ for message in chat_history:
 
 ### 4. Build a RAG (Retrieval-Augmented Generation) System
 
-Combine **vector search** and **full-text search** to build a powerful RAG pipeline for your local documents.
+Combine **vector search** and **full-text search** to build a powerful RAG pipeline for your local documents. The vector search uses a high-performance, persistent `faiss` index that supports incremental additions without downtime.
 
 ```python
 # Get context for a user query like "fast python web frameworks"
@@ -183,17 +203,18 @@ For more in-depth examples, check out the scripts in the `examples/` directory:
 - [`examples/publisher.py`](examples/publisher.py) and [`examples/subscriber.py`](examples/subscriber.py): A pair of examples demonstrating inter-process message passing with the publish/subscribe system.
 - [`examples/cache.py`](examples/cache.py): A practical example of using a dictionary with TTL as a cache for API calls.
 - [`examples/rerank.py`](examples/rerank.py): Shows how to combine results from vector and text search for more refined results.
+- [`examples/fuzzy.py`](examples/fuzzy.py): Demonstrates fuzzy search capabilities for text search.
+- [`examples/stress_vectors.py](examples/stress_vectors.py): A stress test for the vector search functionality.
+- [`examples/general_test.py`](examples/general_test.py): A general-purpose test to run all operations randomly which allows testing long-running processes and synchronicity issues.
 
 ## Roadmap
 
 These are some of the features and improvements planned for future releases:
 
-- **
-- **Faster ANN**: Explore integrating more advanced ANN libraries like `faiss` for improved vector search performance.
-- **Async API**: Comprehensive async support with on-demand wrappers for all collections.
+- **Full Async API**: Comprehensive async support with on-demand wrappers for all collections.
 
 Check out the [roadmap](roadmap.md) for a detailed list of upcoming features and design ideas.
 
 ## License
 
-This project is licensed under the MIT License.
+This project is licensed under the MIT License.