beaver-db 0.17.5__tar.gz → 0.18.0__tar.gz
Potentially problematic release.
This version of beaver-db might be problematic.
- beaver_db-0.18.0/.dockerignore +13 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/PKG-INFO +98 -28
- {beaver_db-0.17.5 → beaver_db-0.18.0}/README.md +97 -27
- {beaver_db-0.17.5 → beaver_db-0.18.0}/beaver/__init__.py +2 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/beaver/cli.py +25 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/beaver/collections.py +10 -3
- {beaver_db-0.17.5 → beaver_db-0.18.0}/beaver/server.py +5 -5
- beaver_db-0.18.0/design.md +91 -0
- beaver_db-0.18.0/dockerfile +28 -0
- beaver_db-0.18.0/makefile +23 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/pyproject.toml +1 -1
- {beaver_db-0.17.5 → beaver_db-0.18.0}/roadmap.md +49 -2
- beaver_db-0.17.5/design.md +0 -118
- beaver_db-0.17.5/makefile +0 -15
- {beaver_db-0.17.5 → beaver_db-0.18.0}/.gitignore +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/.python-version +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/LICENSE +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/beaver/blobs.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/beaver/channels.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/beaver/core.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/beaver/dicts.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/beaver/lists.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/beaver/logs.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/beaver/queues.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/beaver/types.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/beaver/vectors.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/async_pubsub.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/blobs.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/cache.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/fts.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/fuzzy.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/general_test.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/graph.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/kvstore.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/list.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/logs.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/pqueue.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/producer_consumer.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/publisher.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/pubsub.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/rerank.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/stress_vectors.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/subscriber.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/textual_chat.css +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/textual_chat.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/type_hints.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/examples/vector.py +0 -0
- {beaver_db-0.17.5 → beaver_db-0.18.0}/uv.lock +0 -0
--- beaver_db-0.17.5/PKG-INFO
+++ beaver_db-0.18.0/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: beaver-db
-Version: 0.
+Version: 0.18.0
 Summary: Fast, embedded, and multi-modal DB based on SQLite for AI-powered applications.
 License-File: LICENSE
 Classifier: License :: OSI Approved :: MIT License
@@ -42,7 +42,7 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
 
 `beaver` is built with a minimalistic philosophy for small, local use cases where a full-blown database server would be overkill.
 
-- **Minimalistic**: The core library has zero external dependencies. Vector search
+- **Minimalistic**: The core library has zero external dependencies. Vector search, the REST server, and the CLI, which require external libraries, are available as optional features.
 - **Schemaless**: Flexible data storage without rigid schemas across all modalities.
 - **Synchronous, Multi-Process, and Thread-Safe**: Designed for simplicity and safety in multi-threaded and multi-process environments.
 - **Built for Local Applications**: Perfect for local AI tools, RAG prototypes, chatbots, and desktop utilities that need persistent, structured data without network overhead.
@@ -61,6 +61,8 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
 - **Full-Text and Fuzzy Search**: Automatically index and search through document metadata using SQLite's powerful FTS5 engine, enhanced with optional fuzzy search for typo-tolerant matching.
 - **Knowledge Graph**: Create relationships between documents and traverse the graph to find neighbors or perform multi-hop walks.
 - **Single-File & Portable**: All data is stored in a single SQLite file, making it incredibly easy to move, back up, or embed in your application.
+- **Built-in REST API Server (Optional)**: Instantly serve your database over a RESTful API with automatic OpenAPI documentation using FastAPI.
+- **Full-Featured CLI Client (Optional)**: Interact with your database directly from the command line for administrative tasks and data exploration.
 - **Optional Type-Safety:** Although the database is schemaless, you can use a minimalistic typing system for automatic serialization and deserialization that is Pydantic-compatible out of the box.
 
 ## How Beaver is Implemented
@@ -79,7 +81,6 @@ The vector store in BeaverDB is designed for high performance and reliability, u
 
 This hybrid approach allows BeaverDB to provide a vector search experience that is both fast and durable, without sacrificing the single-file, embedded philosophy of the library.
 
-
 ## Installation
 
 Install the core, dependency-free library:
@@ -88,12 +89,29 @@ Install the core, dependency-free library:
 pip install beaver-db
 ```
 
-
+To include optional features, you can install them as extras:
+
+```bash
+# For vector search capabilities
+pip install "beaver-db[vector]"
+
+# For the REST API server and CLI
+pip install "beaver-db[server,cli]"
+
+# To install all optional features at once
+pip install "beaver-db[full]"
+```
+
+### Running with Docker
+For a fully embedded and lightweight solution, you can run the BeaverDB REST API server using Docker. This is the easiest way to get a self-hosted instance up and running.
 
 ```bash
-
+docker run -p 8000:8000 -v $(pwd)/data:/app apiad/beaverdb
 ```
 
+This command will start the BeaverDB server, and your database file will be stored in the data directory on your host machine. You can access the API at <http://localhost:8000>.
+
+
 ## Quickstart
 
 Get up and running in 30 seconds. This example showcases a dictionary, a list, and full-text search in a single script.
@@ -131,6 +149,53 @@ print(f"FTS Result: '{top_doc.content}'")
 db.close()
 ```
 
+## Built-in Server and CLI
+
+Beaver comes with a built-in REST API server and a full-featured command-line client, allowing you to interact with your database without writing any code.
+
+### REST API Server
+
+You can instantly expose all of your database's functionality over a RESTful API. This is perfect for building quick prototypes, microservices, or for interacting with your data from other languages.
+
+**1. Start the server**
+
+```bash
+# Start the server for your database file
+beaver serve --database data.db --port 8000
+```
+
+This starts a `FastAPI` server. You can now access the interactive API documentation at `http://127.0.0.1:8000/docs`.
+
+**2. Interact with the API**
+
+Here are a couple of examples using `curl`:
+
+```bash
+# Set a value in the 'app_config' dictionary
+curl -X PUT http://127.0.0.1:8000/dicts/app_config/api_key
+-H "Content-Type: application/json"
+-d '"your-secret-api-key"'
+
+# Get the value back
+curl http://127.0.0.1:8000/dicts/app_config/api_key
+# Output: "your-secret-api-key"
+```
+
+### Command-Line Client
+
+The CLI client allows you to call any BeaverDB method directly from your terminal.
+
+```bash
+# Set a value in a dictionary
+beaver client --database data.db dict app_config set theme light
+
+# Get the value back
+beaver client --database data.db dict app_config get theme
+
+# Push an item to a list
+beaver client --database data.db list daily_tasks push "Review PRs"
+```
+
 ## Things You Can Build with Beaver
 
 Here are a few ideas to inspire your next project, showcasing how to combine Beaver's features to build powerful local applications.
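The `curl` calls in the hunk above map directly onto any HTTP client. As a minimal sketch, assuming a server started with `beaver serve` is listening on `127.0.0.1:8000` and exposes the `/dicts/{name}/{key}` routes exactly as the README examples show, the same PUT and GET requests could be built with Python's standard library. The helper names `build_put` and `build_get` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed address, taken from the README example


def build_put(dict_name: str, key: str, value) -> urllib.request.Request:
    """Build (but do not send) a PUT request that stores a JSON value."""
    body = json.dumps(value).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/dicts/{dict_name}/{key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_get(dict_name: str, key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that reads the value back."""
    return urllib.request.Request(f"{BASE_URL}/dicts/{dict_name}/{key}", method="GET")


put_req = build_put("app_config", "api_key", "your-secret-api-key")
get_req = build_get("app_config", "api_key")

# Sending requires a running server, for example:
# with urllib.request.urlopen(put_req) as resp:
#     print(resp.status)
```

The requests are only constructed here, so the sketch runs without a server; pass them to `urllib.request.urlopen` once one is available.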
@@ -282,8 +347,10 @@ For enhanced data integrity and a better developer experience, BeaverDB supports
 
 This feature is designed to be flexible and works seamlessly with two kinds of models:
 
-
-
+- **Pydantic Models**: If you're already using Pydantic, your `BaseModel` classes will work out of the box.
+
+- **Lightweight `beaver.Model`**: For a zero-dependency solution, you can inherit from the built-in `beaver.Model` class, which is a standard Python class with serialization methods automatically included.
+
 
 Here’s a quick example of how to use it:
 
@@ -305,7 +372,8 @@ users["alice"] = User(name="Alice", email="alice@example.com")
 
 # The retrieved object is a proper instance of the User class
 retrieved_user = users["alice"]
-
+# Your editor will provide autocompletion here
+print(f"Retrieved: {retrieved_user.name}")
 ```
 
 In the same way you can have typed message payloads in `db.channel`, typed metadata in `db.blobs`, and custom document types in `db.collection`, as well as custom types in lists and queues.
@@ -316,25 +384,25 @@ Basically everywhere you can store or get some object in BeaverDB, you can use a
 
 For more in-depth examples, check out the scripts in the `examples/` directory:
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+- [`async_pubsub.py`](examples/async_pubsub.py): A demonstration of the asynchronous wrapper for the publish/subscribe system.
+- [`blobs.py`](examples/blobs.py): Demonstrates how to store and retrieve binary data in the database.
+- [`cache.py`](examples/cache.py): A practical example of using a dictionary with TTL as a cache for API calls.
+- [`fts.py`](examples/fts.py): A detailed look at full-text search, including targeted searches on specific metadata fields.
+- [`fuzzy.py`](examples/fuzzy.py): Demonstrates fuzzy search capabilities for text search.
+- [`general_test.py`](examples/general_test.py): A general-purpose test to run all operations randomly which allows testing long-running processes and synchronicity issues.
+- [`graph.py`](examples/graph.py): Shows how to create relationships between documents and perform multi-hop graph traversals.
+- [`kvstore.py`](examples/kvstore.py): A comprehensive demo of the namespaced dictionary feature.
+- [`list.py`](examples/list.py): Shows the full capabilities of the persistent list, including slicing and in-place updates.
+- [`logs.py`](examples/logs.py): A short example showing how to build a realtime dashboard with the logging feature.
+- [`pqueue.py`](examples/pqueue.py): A practical example of using the persistent priority queue for task management.
+- [`producer_consumer.py`](examples/producer_consumer.py): A demonstration of the distributed task queue system in a multi-process environment.
+- [`publisher.py`](examples/publisher.p) and [`subscriber.py`](examples/subscriber.py): A pair of examples demonstrating inter-process message passing with the publish/subscribe system.
+- [`pubsub.py`](examples/pubsub.py): A demonstration of the synchronous, thread-safe publish/subscribe system in a single process.
+- [`rerank.py`](examples/rerank.py): Shows how to combine results from vector and text search for more refined results.
+- [`stress_vectors.py`](examples/stress_vectors.py): A stress test for the vector search functionality.
+- [`textual_chat.py`](examples/textual_chat.py): A chat application built with `textual` and `beaver` to illustrate the use of several primitives (lists, dicts, and channels) at the same time.
+- [`type_hints.py`](examples/type_hints.py): Shows how to use type hints with `beaver` to get better IDE support and type safety.
+- [`vector.py`](examples/vector.py): Demonstrates how to index and search vector embeddings, including upserts.
 
 ## Roadmap
 
@@ -342,7 +410,9 @@ For more in-depth examples, check out the scripts in the `examples/` directory:
 
 These are some of the features and improvements planned for future releases:
 
-
+- **Async API**: Extend the async support with on-demand wrappers for all features besides channels.
+- **Type-Safe Models**: Enhance built-in `Model` to handle recursive and embedded types.
+- **Drop-in REST Client**: Implement a `BeaverClient` class that acts as a drop-in replacement for `BeaverDB` but instead of a local database file, it works against a REST API server.
 
 Check out the [roadmap](roadmap.md) for a detailed list of upcoming features and design ideas.
 
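The roadmap entry on the Async API mentions "on-demand wrappers", and the design document in this release describes them as running blocking calls in a background thread pool. The sketch below shows that general pattern with `asyncio.to_thread`. It is an illustration under assumptions, not BeaverDB's implementation: `BlockingStore` and `AsyncStore` are hypothetical classes invented for this example.

```python
import asyncio
import time


class BlockingStore:
    """Hypothetical synchronous store standing in for a BeaverDB object."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        time.sleep(0.01)  # simulate blocking I/O
        self._data[key] = value

    def get(self, key: str) -> str:
        time.sleep(0.01)  # simulate blocking I/O
        return self._data[key]


class AsyncStore:
    """Thin wrapper that offloads each blocking call to a worker thread."""

    def __init__(self, store: BlockingStore) -> None:
        self._store = store

    async def set(self, key: str, value: str) -> None:
        await asyncio.to_thread(self._store.set, key, value)

    async def get(self, key: str) -> str:
        return await asyncio.to_thread(self._store.get, key)


async def demo() -> str:
    store = AsyncStore(BlockingStore())
    await store.set("theme", "light")
    return await store.get("theme")


result = asyncio.run(demo())
```

The synchronous class stays the single source of behavior; the async wrapper only decides where each call runs, which keeps the event loop free while the blocking work happens on another thread.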
--- beaver_db-0.17.5/README.md
+++ beaver_db-0.18.0/README.md
@@ -17,7 +17,7 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
 
 `beaver` is built with a minimalistic philosophy for small, local use cases where a full-blown database server would be overkill.
 
-- **Minimalistic**: The core library has zero external dependencies. Vector search
+- **Minimalistic**: The core library has zero external dependencies. Vector search, the REST server, and the CLI, which require external libraries, are available as optional features.
 - **Schemaless**: Flexible data storage without rigid schemas across all modalities.
 - **Synchronous, Multi-Process, and Thread-Safe**: Designed for simplicity and safety in multi-threaded and multi-process environments.
 - **Built for Local Applications**: Perfect for local AI tools, RAG prototypes, chatbots, and desktop utilities that need persistent, structured data without network overhead.
@@ -36,6 +36,8 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
 - **Full-Text and Fuzzy Search**: Automatically index and search through document metadata using SQLite's powerful FTS5 engine, enhanced with optional fuzzy search for typo-tolerant matching.
 - **Knowledge Graph**: Create relationships between documents and traverse the graph to find neighbors or perform multi-hop walks.
 - **Single-File & Portable**: All data is stored in a single SQLite file, making it incredibly easy to move, back up, or embed in your application.
+- **Built-in REST API Server (Optional)**: Instantly serve your database over a RESTful API with automatic OpenAPI documentation using FastAPI.
+- **Full-Featured CLI Client (Optional)**: Interact with your database directly from the command line for administrative tasks and data exploration.
 - **Optional Type-Safety:** Although the database is schemaless, you can use a minimalistic typing system for automatic serialization and deserialization that is Pydantic-compatible out of the box.
 
 ## How Beaver is Implemented
@@ -54,7 +56,6 @@ The vector store in BeaverDB is designed for high performance and reliability, u
 
 This hybrid approach allows BeaverDB to provide a vector search experience that is both fast and durable, without sacrificing the single-file, embedded philosophy of the library.
 
-
 ## Installation
 
 Install the core, dependency-free library:
@@ -63,12 +64,29 @@ Install the core, dependency-free library:
 pip install beaver-db
 ```
 
-
+To include optional features, you can install them as extras:
+
+```bash
+# For vector search capabilities
+pip install "beaver-db[vector]"
+
+# For the REST API server and CLI
+pip install "beaver-db[server,cli]"
+
+# To install all optional features at once
+pip install "beaver-db[full]"
+```
+
+### Running with Docker
+For a fully embedded and lightweight solution, you can run the BeaverDB REST API server using Docker. This is the easiest way to get a self-hosted instance up and running.
 
 ```bash
-
+docker run -p 8000:8000 -v $(pwd)/data:/app apiad/beaverdb
 ```
 
+This command will start the BeaverDB server, and your database file will be stored in the data directory on your host machine. You can access the API at <http://localhost:8000>.
+
+
 ## Quickstart
 
 Get up and running in 30 seconds. This example showcases a dictionary, a list, and full-text search in a single script.
@@ -106,6 +124,53 @@ print(f"FTS Result: '{top_doc.content}'")
 db.close()
 ```
 
+## Built-in Server and CLI
+
+Beaver comes with a built-in REST API server and a full-featured command-line client, allowing you to interact with your database without writing any code.
+
+### REST API Server
+
+You can instantly expose all of your database's functionality over a RESTful API. This is perfect for building quick prototypes, microservices, or for interacting with your data from other languages.
+
+**1. Start the server**
+
+```bash
+# Start the server for your database file
+beaver serve --database data.db --port 8000
+```
+
+This starts a `FastAPI` server. You can now access the interactive API documentation at `http://127.0.0.1:8000/docs`.
+
+**2. Interact with the API**
+
+Here are a couple of examples using `curl`:
+
+```bash
+# Set a value in the 'app_config' dictionary
+curl -X PUT http://127.0.0.1:8000/dicts/app_config/api_key
+-H "Content-Type: application/json"
+-d '"your-secret-api-key"'
+
+# Get the value back
+curl http://127.0.0.1:8000/dicts/app_config/api_key
+# Output: "your-secret-api-key"
+```
+
+### Command-Line Client
+
+The CLI client allows you to call any BeaverDB method directly from your terminal.
+
+```bash
+# Set a value in a dictionary
+beaver client --database data.db dict app_config set theme light
+
+# Get the value back
+beaver client --database data.db dict app_config get theme
+
+# Push an item to a list
+beaver client --database data.db list daily_tasks push "Review PRs"
+```
+
 ## Things You Can Build with Beaver
 
 Here are a few ideas to inspire your next project, showcasing how to combine Beaver's features to build powerful local applications.
@@ -257,8 +322,10 @@ For enhanced data integrity and a better developer experience, BeaverDB supports
 
 This feature is designed to be flexible and works seamlessly with two kinds of models:
 
-
-
+- **Pydantic Models**: If you're already using Pydantic, your `BaseModel` classes will work out of the box.
+
+- **Lightweight `beaver.Model`**: For a zero-dependency solution, you can inherit from the built-in `beaver.Model` class, which is a standard Python class with serialization methods automatically included.
+
 
 Here’s a quick example of how to use it:
 
@@ -280,7 +347,8 @@ users["alice"] = User(name="Alice", email="alice@example.com")
 
 # The retrieved object is a proper instance of the User class
 retrieved_user = users["alice"]
-
+# Your editor will provide autocompletion here
+print(f"Retrieved: {retrieved_user.name}")
 ```
 
 In the same way you can have typed message payloads in `db.channel`, typed metadata in `db.blobs`, and custom document types in `db.collection`, as well as custom types in lists and queues.
@@ -291,25 +359,25 @@ Basically everywhere you can store or get some object in BeaverDB, you can use a
 
 For more in-depth examples, check out the scripts in the `examples/` directory:
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+- [`async_pubsub.py`](examples/async_pubsub.py): A demonstration of the asynchronous wrapper for the publish/subscribe system.
+- [`blobs.py`](examples/blobs.py): Demonstrates how to store and retrieve binary data in the database.
+- [`cache.py`](examples/cache.py): A practical example of using a dictionary with TTL as a cache for API calls.
+- [`fts.py`](examples/fts.py): A detailed look at full-text search, including targeted searches on specific metadata fields.
+- [`fuzzy.py`](examples/fuzzy.py): Demonstrates fuzzy search capabilities for text search.
+- [`general_test.py`](examples/general_test.py): A general-purpose test to run all operations randomly which allows testing long-running processes and synchronicity issues.
+- [`graph.py`](examples/graph.py): Shows how to create relationships between documents and perform multi-hop graph traversals.
+- [`kvstore.py`](examples/kvstore.py): A comprehensive demo of the namespaced dictionary feature.
+- [`list.py`](examples/list.py): Shows the full capabilities of the persistent list, including slicing and in-place updates.
+- [`logs.py`](examples/logs.py): A short example showing how to build a realtime dashboard with the logging feature.
+- [`pqueue.py`](examples/pqueue.py): A practical example of using the persistent priority queue for task management.
+- [`producer_consumer.py`](examples/producer_consumer.py): A demonstration of the distributed task queue system in a multi-process environment.
+- [`publisher.py`](examples/publisher.p) and [`subscriber.py`](examples/subscriber.py): A pair of examples demonstrating inter-process message passing with the publish/subscribe system.
+- [`pubsub.py`](examples/pubsub.py): A demonstration of the synchronous, thread-safe publish/subscribe system in a single process.
+- [`rerank.py`](examples/rerank.py): Shows how to combine results from vector and text search for more refined results.
+- [`stress_vectors.py`](examples/stress_vectors.py): A stress test for the vector search functionality.
+- [`textual_chat.py`](examples/textual_chat.py): A chat application built with `textual` and `beaver` to illustrate the use of several primitives (lists, dicts, and channels) at the same time.
+- [`type_hints.py`](examples/type_hints.py): Shows how to use type hints with `beaver` to get better IDE support and type safety.
+- [`vector.py`](examples/vector.py): Demonstrates how to index and search vector embeddings, including upserts.
 
 ## Roadmap
 
@@ -317,7 +385,9 @@ For more in-depth examples, check out the scripts in the `examples/` directory:
 
 These are some of the features and improvements planned for future releases:
 
-
+- **Async API**: Extend the async support with on-demand wrappers for all features besides channels.
+- **Type-Safe Models**: Enhance built-in `Model` to handle recursive and embedded types.
+- **Drop-in REST Client**: Implement a `BeaverClient` class that acts as a drop-in replacement for `BeaverDB` but instead of a local database file, it works against a REST API server.
 
 Check out the [roadmap](roadmap.md) for a detailed list of upcoming features and design ideas.
 
--- beaver_db-0.17.5/beaver/cli.py
+++ beaver_db-0.18.0/beaver/cli.py
@@ -1,10 +1,35 @@
 import typer
 import rich
 from typing_extensions import Annotated
+import beaver
 
 app = typer.Typer()
 
 
+def version_callback(value: bool):
+    if value:
+        print(beaver.__version__)
+        raise typer.Exit()
+
+
+@app.callback()
+def main(
+    version: Annotated[
+        bool,
+        typer.Option(
+            "--version",
+            callback=version_callback,
+            is_eager=True,
+            help="Show the version and exit.",
+        ),
+    ] = False,
+):
+    """
+    BeaverDB command-line interface.
+    """
+    pass
+
+
 @app.command()
 def serve(
     database: Annotated[
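The hunk above wires an eager `--version` option into the Typer app, so the version is printed and the process exits before any subcommand runs. For readers without Typer installed, a rough standard-library analogue (not the package's code) is `argparse`'s built-in `version` action. The `__version__` value and the `beaver-sketch` program name below are placeholders for this sketch.

```python
import argparse

__version__ = "0.0.0"  # placeholder; the real CLI prints beaver.__version__


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="beaver-sketch")
    # action="version" prints the version string and exits immediately.
    parser.add_argument(
        "--version",
        action="version",
        version=__version__,
        help="Show the version and exit.",
    )
    subcommands = parser.add_subparsers(dest="command")
    subcommands.add_parser("serve")
    return parser


def run(argv: list[str]) -> int:
    """Parse argv and return the exit code instead of terminating the process."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        return int(exc.code or 0)
    return 0
```

Calling `run(["--version"])` prints the placeholder version and returns the exit code rather than ending the interpreter, which makes the behavior easy to check in a test.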
--- beaver_db-0.17.5/beaver/collections.py
+++ beaver_db-0.18.0/beaver/collections.py
@@ -96,13 +96,20 @@ class Document(Model):
     def to_dict(self) -> dict[str, Any]:
         """Serializes the document's metadata to a dictionary."""
         metadata = self.__dict__.copy()
-        metadata
-        metadata.pop("id", None)
+        metadata["embedding"] = self.embedding.tolist() if self.embedding is not None else None
         return metadata
 
     def __repr__(self):
+        d = self.to_dict()
+        d.pop("embedding")
         metadata_str = ", ".join(f"{k}={v!r}" for k, v in self.to_dict().items())
-        return f"Document(
+        return f"Document({metadata_str})"
+
+    def model_dump_json(self) -> str:
+        d = self.to_dict()
+        d.pop("embedding")
+        d.pop("id")
+        return json.dumps(d)
 
 
 class CollectionManager[D: Document]:
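The change above makes `to_dict()` emit the embedding as a plain list, which keeps the result JSON-serializable, while `model_dump_json()` drops `embedding` and `id` before dumping. The stand-alone sketch below mirrors that pattern with a hypothetical `SketchDocument` class. It uses `array.array` from the standard library as a stand-in for the real embedding type, since the diff only shows that the embedding has a `.tolist()` method.

```python
import json
from array import array
from typing import Any, Optional


class SketchDocument:
    """Hypothetical stand-in for the Document class shown in the hunk above."""

    def __init__(self, id: str, embedding: Optional[array] = None, **metadata: Any):
        self.id = id
        self.embedding = embedding
        self.__dict__.update(metadata)

    def to_dict(self) -> dict[str, Any]:
        data = self.__dict__.copy()
        # Convert the array to a plain list so json.dumps can handle it.
        data["embedding"] = self.embedding.tolist() if self.embedding is not None else None
        return data

    def model_dump_json(self) -> str:
        data = self.to_dict()
        # Only user metadata is dumped; the embedding and id are left out.
        data.pop("embedding")
        data.pop("id")
        return json.dumps(data)


doc = SketchDocument(id="doc-1", embedding=array("d", [0.5, 0.25]), title="hello")
as_dict = doc.to_dict()
as_json = doc.model_dump_json()
```

Because `to_dict()` copies `__dict__` first, popping keys in `model_dump_json()` never mutates the document itself.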
--- beaver_db-0.17.5/beaver/server.py
+++ beaver_db-0.18.0/beaver/server.py
@@ -270,7 +270,7 @@ def build(db: BeaverDB) -> FastAPI:
     def get_all_documents(name: str) -> List[dict]:
         """Retrieves all documents in the collection."""
         collection = db.collection(name)
-        return [doc.
+        return [doc.to_dict() for doc in collection]
 
     @app.post("/collections/{name}/index", tags=["Collections"])
     def index_document(name: str, req: IndexRequest):
@@ -291,7 +291,7 @@ def build(db: BeaverDB) -> FastAPI:
         collection = db.collection(name)
         try:
             results = collection.search(vector=req.vector, top_k=req.top_k)
-            return [{"document": doc.
+            return [{"document": doc.to_dict(), "distance": dist} for doc, dist in results]
         except TypeError as e:
             if "faiss" in str(e):
                 raise HTTPException(status_code=501, detail="Vector search requires the '[faiss]' extra. Install with: pip install \"beaver-db[faiss]\"")
@@ -302,7 +302,7 @@ def build(db: BeaverDB) -> FastAPI:
         """Performs a full-text or fuzzy search on the collection."""
         collection = db.collection(name)
         results = collection.match(query=req.query, on=req.on, top_k=req.top_k, fuzziness=req.fuzziness)
-        return [{"document": doc.
+        return [{"document": doc.to_dict(), "score": score} for doc, score in results]
 
     @app.post("/collections/{name}/connect", tags=["Collections"])
     def connect_documents(name: str, req: ConnectRequest):
@@ -319,7 +319,7 @@ def build(db: BeaverDB) -> FastAPI:
         collection = db.collection(name)
         doc = Document(id=doc_id)
         neighbors = collection.neighbors(doc, label=label)
-        return [n.
+        return [n.to_dict() for n in neighbors]
 
     @app.post("/collections/{name}/{doc_id}/walk", tags=["Collections"])
     def walk_graph(name: str, doc_id: str, req: WalkRequest) -> List[dict]:
@@ -327,7 +327,7 @@ def build(db: BeaverDB) -> FastAPI:
         collection = db.collection(name)
         source_doc = Document(id=doc_id)
         results = collection.walk(source=source_doc, labels=req.labels, depth=req.depth, direction=req.direction)
-        return [doc.
+        return [doc.to_dict() for doc in results]
 
     return app
 
--- /dev/null
+++ beaver_db-0.18.0/design.md
@@ -0,0 +1,91 @@
+# BeaverDB Design Document
+
+- **Version**: 1.2
+- **Status**: Active
+- **Last Updated**: October 4, 2025
+
+## 1. Introduction & Vision
+
+`beaver-db` is a local-first, embedded, multi-modal database for Python. Its primary motivation is to provide a simple, single-file solution for modern applications that need to handle complex data types like vectors, documents, and graphs without the overhead of a database server.
+
+The vision for `beaver-db` is to be the go-to "good enough" database for prototypes, desktop utilities, and small-scale applications. It empowers developers to start quickly with a simple API but provides a seamless path to scale from a local, embedded model to a networked client-server architecture without changing their application code.
+
+## 2. Guiding Principles
+
+These principles guide all development and design decisions for beaver-db.
+
+* **Local-First & Embedded:** The default mode of operation is a single SQLite file. This is the non-negotiable source of truth for local use. While a client-server model is supported via the REST API, the core engine remains fundamentally embedded.
+* **Standard SQLite Compatibility:** The .db file generated by beaver-db must always be a valid SQLite file that can be opened and queried by any standard SQLite tool. This ensures data portability and interoperability.
+* **Minimal & Optional Dependencies:** The core library has zero external dependencies. Features like vector search, the REST server, and the API client are available as optional extras, allowing users to install only what they need.
+* **Simplicity and Pythonic API:** The library must present a simple, intuitive, and "Pythonic" interface. We will always prefer simple method calls with clear parameters over custom Domain Specific Languages (DSLs).
+* **Developer Experience & API Parity**: The primary user experience goal is to provide a clean and minimal public API. This includes ensuring that the remote `BeaverClient` is a drop-in replacement for the local `BeaverDB`, maintaining perfect API parity.
+* **Synchronous Core with Async Potential:** The core library is built on a synchronous foundation for thread safety and simplicity. An async-compatible API is provided via on-demand wrappers that run blocking calls in a background thread pool.
+* **Convention over Configuration:** Features should work well out-of-the-box with sensible defaults.
+
+## 3. Architecture & Core Components
+
+`beaver-db` is architected as a set of targeted wrappers around a standard SQLite database. It supports two primary modes of operation: **Embedded** and **Client-Server**.
+
+### 3.1. Core Engine (Embedded Mode)
+
+* **`BeaverDB` Class**: Manages a thread-safe connection to a local SQLite database file.
+* **Concurrency**: Enables `PRAGMA journal_mode=WAL;` (Write-Ahead Logging) by default to provide good concurrency between one writer and multiple readers.
+* **Schema Management**: All tables are prefixed with `beaver_` to avoid conflicts with user-defined tables.
|
|
34
|
+
|
|
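As a rough illustration of the concurrency bullet above, the following sketch enables WAL on a throwaway SQLite file and reads a committed row from a second connection. It uses only the standard `sqlite3` module; the helper names and the `demo_items` table are hypothetical and are not part of beaver-db.

```python
import os
import sqlite3
import tempfile


def open_wal_connection(path: str) -> sqlite3.Connection:
    """Open a SQLite file and switch it to Write-Ahead Logging."""
    conn = sqlite3.connect(path)
    # The PRAGMA returns the journal mode that is actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL, got {mode!r}")
    return conn


def demo_reader_sees_committed_rows() -> list:
    """Write through one connection, read through another (demo only)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        writer = open_wal_connection(path)
        reader = sqlite3.connect(path)
        try:
            # demo_items is a hypothetical table, not a beaver_* table.
            writer.execute("CREATE TABLE demo_items (value TEXT)")
            writer.execute("INSERT INTO demo_items VALUES ('a')")
            writer.commit()
            return reader.execute("SELECT value FROM demo_items").fetchall()
        finally:
            reader.close()
            writer.close()


if __name__ == "__main__":
    print(demo_reader_sees_committed_rows())
```

WAL mode is a property of the database file, so it persists across connections once set; an in-memory database would report `memory` instead.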
35
|
+
### 3.2. Client-Server Mode
|
|
36
|
+
|
|
37
|
+
* **REST API Server**: An optional feature that exposes the full functionality of a `BeaverDB` instance over a RESTful API, including WebSocket endpoints for real-time features.
|
|
38
|
+
* **`BeaverClient`**: A drop-in replacement for the `BeaverDB` class that interacts with the REST API. Instead of a database connection, it manages an HTTP client session, allowing applications to switch from local to remote operation without code changes.
|
|
39
|
+
|
|
40
|
+
### 3.3. Data Models & Features
|
|
41
|
+
|
|
42
|
+
#### Key-Value Dictionaries (DictManager)
|
|
43
|
+
|
|
44
|
+
* **Implementation**: A single table (`beaver_dicts`) stores key-value pairs partitioned by a `dict_name`.
|
|
45
|
+
* **Design**: The `db.dict("namespace")` method returns a `DictManager` (or `RemoteDictManager`) that provides a complete Pythonic dictionary-like interface.
|
|
46
|
+
|
|
47
|
+
#### Lists (ListManager)
|
|
48
|
+
|
|
49
|
+
* **Implementation:** A table (`beaver_lists`) storing `list_name`, `item_order` (REAL), and `item_value` (TEXT). The use of a floating-point `item_order` allows for O(1) insertions.
|
|
50
|
+
* **Design:** The `ListManager` provides a rich, Pythonic API (`__len__`, `__getitem__`, `push`, `pop`, etc.).
|
|
51
|
+
|
|
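The floating-point `item_order` idea can be sketched as follows: a new item placed between two neighbors takes the midpoint of their order values, so no other row has to be renumbered. This is a minimal sketch under stated assumptions, not the library's implementation. The `demo_lists` table and the helper functions are hypothetical; only the column names come from the design document.

```python
import sqlite3


def make_demo_list(conn: sqlite3.Connection, name: str, values: list) -> None:
    """Create a hypothetical table mirroring the documented columns."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS demo_lists ("
        "list_name TEXT, item_order REAL, item_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO demo_lists VALUES (?, ?, ?)",
        [(name, float(i), v) for i, v in enumerate(values)],
    )


def _order_at(conn: sqlite3.Connection, name: str, position: int):
    """Return the item_order of the row at `position`, or None if absent."""
    row = conn.execute(
        "SELECT item_order FROM demo_lists WHERE list_name = ? "
        "ORDER BY item_order LIMIT 1 OFFSET ?",
        (name, position),
    ).fetchone()
    return None if row is None else row[0]


def insert_at(conn: sqlite3.Connection, name: str, index: int, value: str) -> float:
    """Insert `value` at `index`; assumes 0 <= index <= current length."""
    before = _order_at(conn, name, index - 1) if index > 0 else None
    after = _order_at(conn, name, index)
    if before is None and after is None:
        new_order = 0.0                      # empty list
    elif before is None:
        new_order = after - 1.0              # new head
    elif after is None:
        new_order = before + 1.0             # new tail
    else:
        new_order = (before + after) / 2.0   # midpoint, no renumbering
    conn.execute(
        "INSERT INTO demo_lists VALUES (?, ?, ?)", (name, new_order, value)
    )
    return new_order


def read_list(conn: sqlite3.Connection, name: str) -> list:
    rows = conn.execute(
        "SELECT item_value FROM demo_lists WHERE list_name = ? "
        "ORDER BY item_order",
        (name,),
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
make_demo_list(conn, "letters", ["a", "c"])
middle_order = insert_at(conn, "letters", 1, "b")
```

One caveat of this scheme in general: repeated midpoint insertions at the same spot eventually exhaust floating-point precision, so a real implementation would likely need an occasional renumbering pass.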
52
|
+
#### Pub/Sub System
|
|
53
|
+
|
|
54
|
+
* **Implementation:** A log table (`beaver_pubsub_log`) stores messages with a timestamp and channel name.
|
|
55
|
+
* **Design:** In embedded mode, a background thread polls the table for new messages. In client-server mode, the client subscribes to a WebSocket endpoint for real-time updates.
|
|
56
|
+
|
|
57
|
+
#### Collections (CollectionManager)
|
|
58
|
+
|
|
59
|
+
This is the most complex component, supporting documents, vectors, text, and graphs.
|
|
60
|
+
|
|
61
|
+
* **Document Storage:** Documents are stored in the `beaver_collections` table.
|
|
62
|
+
* **Vector Search (ANN):**
|
|
63
|
+
* **Implementation**: Uses a hybrid index system with a large, on-disk `faiss` base index and a small, in-memory delta index for fast writes. All changes are crash-safe, logged in SQLite tables before being applied.
|
|
64
|
+
* **Compaction**: An automatic background process compacts the delta index into the base index to maintain search performance over time.
|
|
65
|
+
* **Full-Text Search (FTS):**
|
|
66
|
+
* **Implementation:** Uses a virtual table (`beaver_fts_index`) powered by SQLite's FTS5 extension. String values from indexed documents are automatically added to the FTS index.
|
|
67
|
+
* **Graph Engine:**
|
|
68
|
+
* **Implementation:** Relationships are stored as directed edges in the `beaver_edges` table.
|
|
69
|
+
* **Design:** The API is purely functional and Pythonic, using recursive Common Table Expressions (CTEs) for efficient multi-hop graph traversals (`walk()`).
|
|
70
|
+
|
|
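To make the recursive-CTE approach concrete, here is a small sketch of a bounded multi-hop traversal over a directed edge table. It is an assumption-laden illustration rather than the query `walk()` actually runs: the `demo_edges` table and the `walk_outgoing` helper are hypothetical, and the column names `source_item_id`, `target_item_id`, and `label` are taken from the 0.17.5 design document. The sketch follows a single label in the outgoing direction only.

```python
import sqlite3

# Hypothetical query: collect nodes reachable within :depth hops of :source.
WALK_SQL = """
WITH RECURSIVE reachable(item_id, hops) AS (
    SELECT target_item_id, 1
    FROM demo_edges
    WHERE source_item_id = :source AND label = :label
    UNION
    SELECT e.target_item_id, r.hops + 1
    FROM demo_edges AS e
    JOIN reachable AS r ON e.source_item_id = r.item_id
    WHERE e.label = :label AND r.hops < :depth
)
SELECT item_id, MIN(hops) AS hops
FROM reachable
WHERE item_id != :source
GROUP BY item_id
ORDER BY hops, item_id
"""


def walk_outgoing(conn: sqlite3.Connection, source: str, label: str, depth: int) -> list:
    """Return (item_id, hops) pairs reachable from `source` in <= depth hops."""
    params = {"source": source, "label": label, "depth": depth}
    return conn.execute(WALK_SQL, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_edges ("
    "source_item_id TEXT, target_item_id TEXT, label TEXT)"
)
conn.executemany(
    "INSERT INTO demo_edges VALUES (?, ?, ?)",
    [
        ("a", "b", "follows"),
        ("b", "c", "follows"),
        ("c", "d", "follows"),
        ("b", "a", "follows"),  # cycle back to the source
        ("a", "z", "blocks"),   # different label, ignored below
    ],
)
two_hops = walk_outgoing(conn, "a", "follows", depth=2)
```

Because `UNION` removes duplicate `(item_id, hops)` rows and `hops` is capped by `:depth`, the recursion terminates even when the graph contains cycles.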
71
|
+
### 3.4. Developer Experience Features
|
|
72
|
+
|
|
73
|
+
* **Async API**: A `.as_async()` method is available on all synchronous manager objects. It returns a parallel `Async` version of the object where all methods are `async def`. This is achieved by running the blocking database calls in a background thread pool via `asyncio.to_thread`.
|
|
74
|
+
* **Type-Safe Models**: All data-handling methods (`db.dict`, `db.list`, etc.) accept an optional `model` argument. When a Pydantic model or the built-in `beaver.Model` is provided, the library handles automatic data validation, serialization, and deserialization, enabling static analysis and autocompletion.
|
|
75
|
+
|
|
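The `.as_async()` pattern described above can be sketched with a stand-in class. This is only an illustration of wrapping blocking calls with `asyncio.to_thread` (Python 3.9+); `DemoStore` and `AsyncDemoStore` are hypothetical names and do not reflect the library's actual manager classes.

```python
import asyncio
import threading


class DemoStore:
    """Stand-in for a synchronous manager; not a beaver-db class."""

    def __init__(self) -> None:
        self._values: dict = {}
        self._lock = threading.Lock()  # calls may arrive from worker threads

    def set(self, key: str, value: str) -> None:
        with self._lock:
            self._values[key] = value

    def get(self, key: str) -> str:
        with self._lock:
            return self._values[key]

    def as_async(self) -> "AsyncDemoStore":
        return AsyncDemoStore(self)


class AsyncDemoStore:
    """Explicitly typed async twin that delegates to the sync object."""

    def __init__(self, store: DemoStore) -> None:
        self._store = store

    async def set(self, key: str, value: str) -> None:
        # Run the blocking call in the default thread pool.
        await asyncio.to_thread(self._store.set, key, value)

    async def get(self, key: str) -> str:
        return await asyncio.to_thread(self._store.get, key)


async def main() -> str:
    store = DemoStore().as_async()
    await store.set("theme", "dark")
    return await store.get("theme")


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Writing the async twin out by hand, rather than generating it dynamically, keeps every method signature visible to type checkers and editors, which matches the typing goal stated in the roadmap.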
76
|
+
## 4. Roadmap & Future Development
|
|
77
|
+
|
|
78
|
+
With the design of the REST API client and async wrappers, the library is approaching feature-completeness for its core domain. The primary focus for the near future is on stability and usability.
|
|
79
|
+
|
|
80
|
+
1. **Comprehensive Unit Testing:** Increase test coverage for all features, including the new `BeaverClient` and async wrappers.
|
|
81
|
+
2. **Elaborate Examples:** Create more examples that demonstrate how to combine features, particularly in a client-server context.
|
|
82
|
+
3. **Performance Benchmarking:** Develop a standardized suite of performance tests for both embedded and client-server modes to document practical scalability limits.
|
|
83
|
+
|
|
84
|
+
### Explicitly Out of Scope
|
|
85
|
+
|
|
86
|
+
To maintain focus and simplicity, the following features will **not** be implemented:
|
|
87
|
+
|
|
88
|
+
* Replication or distributed operation beyond the single-server model.
|
|
89
|
+
* Multi-file database formats.
|
|
90
|
+
* Any feature that makes the database file incompatible with standard SQLite tools.
|
|
91
|
+
* Custom query languages or DSLs.
|
|
@@ -0,0 +1,28 @@
|
|
|
1
|
+
# Use a lightweight Python base image
|
|
2
|
+
FROM python:3.13-slim
|
|
3
|
+
|
|
4
|
+
# Set the working directory in the container
|
|
5
|
+
WORKDIR /app
|
|
6
|
+
|
|
7
|
+
# Add a build argument for the version, defaulting to the latest
|
|
8
|
+
ARG VERSION=latest
|
|
9
|
+
|
|
10
|
+
# Install the specified version of beaver-db from PyPI
|
|
11
|
+
# If VERSION is "latest", it installs the most recent version.
|
|
12
|
+
# Otherwise, it installs the specified version (e.g., beaver-db==0.17.6)
|
|
13
|
+
RUN if [ "${VERSION}" = "latest" ]; then \
|
|
14
|
+
pip install --no-cache-dir "beaver-db[full]"; \
|
|
15
|
+
else \
|
|
16
|
+
pip install --no-cache-dir "beaver-db[full]==${VERSION}"; \
|
|
17
|
+
fi
|
|
18
|
+
|
|
19
|
+
# Set default environment variables for configuration
|
|
20
|
+
ENV DATABASE=beaver.db
|
|
21
|
+
ENV HOST=0.0.0.0
|
|
22
|
+
ENV PORT=8000
|
|
23
|
+
|
|
24
|
+
# Expose the port the server will run on
|
|
25
|
+
EXPOSE 8000
|
|
26
|
+
|
|
27
|
+
# The command to run when the container starts
|
|
28
|
+
CMD ["beaver", "serve", "--database", "${DATABASE}", "--host", "${HOST}", "--port", "${PORT}"]
|
|
@@ -0,0 +1,23 @@
|
|
|
1
|
+
.PHONY: publish
|
|
2
|
+
publish: clean build
|
|
3
|
+
uv publish --token `dotenv -f .env get PYPI_TOKEN`
|
|
4
|
+
|
|
5
|
+
.PHONY: build
|
|
6
|
+
build:
|
|
7
|
+
uv build
|
|
8
|
+
uv pip install -e .[full]
|
|
9
|
+
|
|
10
|
+
.PHONY: clean
|
|
11
|
+
clean:
|
|
12
|
+
rm -rf dist
|
|
13
|
+
rm -rf beaver_db.egg-info
|
|
14
|
+
find . -name '*.pyc' -exec rm -f {} +
|
|
15
|
+
find . -name '__pycache__' -exec rm -rf {} +
|
|
16
|
+
|
|
17
|
+
.PHONY: push-docker
|
|
18
|
+
push-docker:
|
|
19
|
+
$(eval VERSION := $(shell beaver --version))
|
|
20
|
+
docker build --build-arg VERSION=$(VERSION) -t apiad/beaverdb:$(VERSION) .
|
|
21
|
+
docker tag apiad/beaverdb:$(VERSION) apiad/beaverdb:latest
|
|
22
|
+
docker push apiad/beaverdb:$(VERSION)
|
|
23
|
+
docker push apiad/beaverdb:latest
|
|
@@ -43,8 +43,6 @@ This feature perfectly aligns with the library's guiding principles:
|
|
|
43
43
|
* **Simplicity and Pythonic API**: The `.as_async()` method is an intuitive and Pythonic way to opt into asynchronous behavior, and the chained-call syntax is elegant and clean.
|
|
44
44
|
* **Developer Experience**: By ensuring the `async` wrappers are explicitly typed, the design prioritizes compatibility with modern developer tools, preventing bugs and improving productivity.
|
|
45
45
|
|
|
46
|
-
-----
|
|
47
|
-
|
|
48
46
|
## Feature: Pydantic Model Integration for Type-Safe Operations
|
|
49
47
|
|
|
50
48
|
### 1. Concept
|
|
@@ -103,3 +101,52 @@ This feature aligns with `beaver-db`'s guiding principles:
|
|
|
103
101
|
* **Simplicity and Pythonic API**: The `model` parameter is a simple and intuitive way to enable type safety.
|
|
104
102
|
* **Developer Experience**: This feature directly addresses the developer experience by providing type safety and editor support.
|
|
105
103
|
* **Minimal & Cross-Platform Dependencies**: By making `pydantic` an optional dependency, the core library remains minimalistic.
|
|
104
|
+
|
|
105
|
+
## Feature: Drop-in REST API Client (`BeaverClient`)
|
|
106
|
+
|
|
107
|
+
### 1. Concept
|
|
108
|
+
|
|
109
|
+
This feature introduces a new `BeaverClient` class that acts as a **drop-in replacement** for the core `BeaverDB` class. Instead of interacting directly with a local SQLite file, this client will execute all operations by making requests to a remote BeaverDB REST API server. This allows users to seamlessly switch from a local, embedded database to a client-server architecture without changing their application code.
|
|
110
|
+
|
|
111
|
+
### 2. Use Cases
|
|
112
|
+
|
|
113
|
+
* **Seamless Scaling**: Effortlessly transition a project from a local prototype to a networked service without a code rewrite.
|
|
114
|
+
* **Multi-Process/Multi-Machine Access**: Allow multiple processes or machines to share and interact with a single, centralized BeaverDB instance.
|
|
115
|
+
* **Language Interoperability**: While the client itself is Python, it provides a blueprint for creating clients in other languages to interact with the BeaverDB server.
|
|
116
|
+
|
|
117
|
+
### 3. Proposed API
|
|
118
|
+
|
|
119
|
+
The API is designed for maximum compatibility. A user only needs to change how the database object is instantiated.
|
|
120
|
+
|
|
121
|
+
**Local Implementation:**
|
|
122
|
+
|
|
123
|
+
```python
|
|
124
|
+
from beaver import BeaverDB
|
|
125
|
+
db = BeaverDB("my_local_data.db")
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
**Remote Implementation:**
|
|
129
|
+
|
|
130
|
+
```python
|
|
131
|
+
from beaver.client import BeaverClient
|
|
132
|
+
db = BeaverClient(base_url="http://127.0.0.1:8000")
|
|
133
|
+
```
|
|
134
|
+
|
|
135
|
+
All subsequent code, such as `db.dict("config")["theme"] = "dark"` or `db.collection("docs").search(...)`, remains identical.
|
|
136
|
+
|
|
137
|
+
### 4. Implementation Design: Remote Managers and HTTP Client
|
|
138
|
+
|
|
139
|
+
The implementation will live in a new `beaver/client.py` file and will not depend on any SQLite logic.
|
|
140
|
+
|
|
141
|
+
1. **Core Component**: The `BeaverClient` class will manage a persistent HTTP session using the **`httpx`** library, which provides connection pooling and supports both synchronous and asynchronous operations.
|
|
142
|
+
2. **Remote Managers**: For each existing manager (e.g., `DictManager`, `CollectionManager`), a corresponding `RemoteDictManager` or `RemoteCollectionManager` will be created. These classes will contain no database logic; their methods will simply construct and send the appropriate HTTP requests to the server endpoints.
|
|
143
|
+
3. **WebSocket Handling**: For real-time features like `db.channel("my_channel").subscribe()` and `db.log("metrics").live()`, the remote managers will establish WebSocket connections to the server's streaming endpoints. This will require new `WebSocketSubscriber` and `WebSocketLiveIterator` classes that read from the network stream instead of a local queue.
|
|
144
|
+
4. **Optional Dependency**: `httpx` and any necessary WebSocket libraries will be included as a new optional dependency, such as `pip install "beaver-db[client]"`.
|
|
145
|
+
|
|
146
|
+
### 5. Alignment with Philosophy
|
|
147
|
+
|
|
148
|
+
This feature strongly aligns with the library's guiding principles:
|
|
149
|
+
|
|
150
|
+
* **Simplicity and Pythonic API**: By maintaining perfect API parity, it ensures the remote client is just as intuitive and simple to use as the local database.
|
|
151
|
+
* **Developer Experience**: It provides a frictionless path for scaling applications, which is a major enhancement to the developer experience.
|
|
152
|
+
* **Minimal Dependencies**: By keeping the client and its dependencies optional, the core library remains lightweight and dependency-free.
|
beaver_db-0.17.5/design.md
DELETED
|
@@ -1,118 +0,0 @@
|
|
|
1
|
-
# BeaverDB Design Document
|
|
2
|
-
|
|
3
|
-
Version: 1.1
|
|
4
|
-
Status: In Progress
|
|
5
|
-
Last Updated: September 18, 2025
|
|
6
|
-
|
|
7
|
-
## 1. Introduction & Vision
|
|
8
|
-
|
|
9
|
-
`beaver-db` is a local-first, embedded, multi-modal database for Python. Its primary motivation is to provide a simple, single-file solution for modern applications that need to handle complex data types like vectors, documents, and graphs without the overhead of a database server.
|
|
10
|
-
|
|
11
|
-
The vision for `beaver-db` is to be the go-to "good enough" database for prototypes, desktop utilities, and small-scale applications. It empowers developers to start quickly with a simple API but provides sufficient power and performance so that they do not need to immediately upgrade to a more complex database system as their application's features grow.
|
|
12
|
-
|
|
13
|
-
## 2. Guiding Principles
|
|
14
|
-
|
|
15
|
-
These principleembeddings guide all development and design decisions for beaver-db.
|
|
16
|
-
|
|
17
|
-
* **Local-First & Embedded:** The database will always be a single SQLite file. This is the non-negotiable source of truth. Temporary files for journaling or syncing are acceptable, but the core architecture is serverless. Any feature that requires a client-server model is out of scope.
|
|
18
|
-
* **Standard SQLite Compatibility:** The .db file generated by beaver-db must always be a valid SQLite file that can be opened and queried by any standard SQLite tool. This ensures data portability and interoperability.
|
|
19
|
-
* **Minimal & Cross-Platform Dependencies:** New dependencies beyond numpy and scipy will only be considered if they provide a monumental performance improvement and are fully cross-platform, with a guarantee of easy installation for all users. The core library should strive to use Python's standard library wherever possible.
|
|
20
|
-
* **Simplicity and Pythonic API:** The library must present a simple, intuitive, and "Pythonic" interface. We will always prefer simple function calls with clear parameters over custom Domain Specific Languages (DSLs) or expression parsing. The user should not have to learn a new query language.
|
|
21
|
-
* **Developer Experience & Minimal API Surface:** The primary user experience goal is to provide a clean and minimal public API.
|
|
22
|
-
* **User-facing Classes:** The user should only ever need to instantiate BeaverDB and Document directly.
|
|
23
|
-
* **Fluent Interface:** All other functionalities (like list or collection operations) are exposed through methods on the BeaverDB object (e.g., `db.list("my_list")`, `db.collection("my_docs")`). These methods return internal wrapper objects (`ListWrapper`, `CollectionWrapper`) that provide a rich, fluent interface, preventing the need for the user to learn and manage a large number of classes.
|
|
24
|
-
* **API Design:** Public functions will have short, descriptive names and well-documented parameters, adhering to Python's conventions.
|
|
25
|
-
* **Synchronousembedding Core with Async Potential:** The core library is built on a synchronous foundation, reflecting the synchronous nature of the underlying `sqlite3` driver. This ensures thread safety and simplicity. An async-compatible API may be introduced in the future, but it will likely be a wrapper around the synchronous core (e.g., using a thread pool).
|
|
26
|
-
* **Convention over Configuration:** Features should work well out-of-the-box with sensible defaults. While configuration options can be provided, the user should not be required to tweak many parameters to get good performance and functionality.
|
|
27
|
-
|
|
28
|
-
## 3. Architecture & Core Components
|
|
29
|
-
|
|
30
|
-
`beaver-db` is architected as a set of targeted wrappers around a standard SQLite database, with an in-memory component for performance-critical tasks like vector search.
|
|
31
|
-
|
|
32
|
-
### 3.1. Core Engine (BeaverDB)
|
|
33
|
-
|
|
34
|
-
* **Connection Management:** The BeaverDB class manages a single connection to the SQLite database file.
|
|
35
|
-
* **Concurrency:** It enables `PRAGMA journal_mode=WAL;` (Write-Ahead Logging) by default. This provides a good level of concurrency between one writer and multiple readers, which is a common pattern for the target applications.
|
|
36
|
-
* **Schema Management:** All tables created and managed by the library are prefixed with `beaver_` (or `_beaver_` for internal helpers). This avoids conflicts with user-defined tables within the same database file. The schema is evolved by adding new `beaver_*` tables as new features are introduced.
|
|
37
|
-
|
|
38
|
-
### 3.2. Data Models & Features
|
|
39
|
-
|
|
40
|
-
### Key-Value Dictionaries (DictWrapper)
|
|
41
|
-
|
|
42
|
-
* **Implementation**: A single table (`beaver_dicts`) stores key-value pairs partitioned by a `dict_name` (TEXT). The primary key is a composite of `(dict_name, key)`.
|
|
43
|
-
* **Design**: The `db.dict("namespace")` method returns a `DictWrapper` that provides a complete and standard Pythonic dictionary-like interface. This includes subscripting (`__getitem__`, `__setitem__`), explicit `get()`/`set()` methods, and iterators (`keys()`, `values()`, `items()`). This feature is ideal for managing structured configurations or namespaced key-value data while adhering to the principle of a Simple and Pythonic API.
|
|
44
|
-
|
|
45
|
-
#### Lists (ListWrapper)
|
|
46
|
-
|
|
47
|
-
* **Implementation:** A table (`beaver_lists`) storing list_name, item_order (REAL), and item_value (TEXT).
|
|
48
|
-
* **Design:** The item_order is a floating-point number. This allows for O(1) insertions in the middle of the list by calculating the midpoint between two existing order values. This avoids re-indexing all subsequent items. The ListWrapper provides a rich, Pythonic API (`_len_`, `_getitem_`, `push`, `pop`, etc.).
|
|
49
|
-
|
|
50
|
-
#### Pub/Sub System
|
|
51
|
-
|
|
52
|
-
* **Implementation:** A log table (`beaver_pubsub_log`) that stores messages with a timestamp, channel name, and payload.
|
|
53
|
-
* **Design:** The current SubWrapper is a synchronous iterator that polls the table for new messages since the last seen timestamp. This design is simple and robust. A more performant, event-driven mechanism could be explored in the future, but only if it does not violate the principles of simplicity and minimal dependencies.
|
|
54
|
-
|
|
55
|
-
#### Collections (CollectionWrapper)
|
|
56
|
-
|
|
57
|
-
This is the most complex component, supporting documents, vectors, text, and graphs.
|
|
58
|
-
|
|
59
|
-
* **Document Storage:** Documents are stored in the beaver_collections table. Each document has a collection, item_id, optional item_vector (BLOB), and metadata (TEXT, as JSON). The Document class is a flexible container with no enforced schema.
|
|
60
|
-
* **Vector Search (ANN):**
|
|
61
|
-
* **Indexing:** Vector embeddings are stored as raw bytes (BLOBs) in the beaver_collections table.
|
|
62
|
-
* **Search Algorithm:** For performance, an in-memory k-d tree (`scipy.spatial.cKDTree`) is used for Approximate Nearest Neighbor (ANN) search.
|
|
63
|
-
* **Stale Index Handling:** The database maintains a version number for each collection in beaver_collection_versions. This version is incremented on every index or drop operation. The CollectionWrapper caches the version of its in-memory index and automatically triggers a `refresh()` if it detects its version is older than the database version.
|
|
64
|
-
* **Design Trade-offs:** This in-memory approach is extremely fast for datasets that fit in RAM. The library will not have hard-coded scalability limits, but performance testing will establish practical boundaries which will be published in the documentation.
|
|
65
|
-
* **Full-Text Search (FTS):**
|
|
66
|
-
* **Implementation:** Uses a virtual table (`beaver_fts_index`) powered by SQLite's FTS5 extension.
|
|
67
|
-
* **Indexing:** When a Document is indexed, its metadata is flattened into key-value pairs (e.g., author.name becomes author_name). All string values are automatically inserted into the FTS index.
|
|
68
|
-
* **Querying:** The `match()` method provides a simple interface for searching. It supports FTS5 syntax (like "OR", "AND") passed directly in the query string. The goal is to provide powerful search out-of-the-box without requiring complex configuration.
|
|
69
|
-
* **Graph Engine:**
|
|
70
|
-
* **Implementation:** Relationships are stored as directed edges in the beaver_edges table, linking a source_item_id to a target_item_id with a label.
|
|
71
|
-
* **Design:** The API is purely functional and Pythonic. It does not use a custom graph query language.
|
|
72
|
-
* `connect()`: Creates a directed edge.
|
|
73
|
-
* `neighbors()`: Retrieves immediate (1-hop) neighbors.
|
|
74
|
-
* `walk()`: Uses a recursive Common Table Expression (CTE) in SQL to perform efficient multi-hop graph traversals (BFS). This is powerful yet keeps the implementation clean and contained within SQL.
|
|
75
|
-
|
|
76
|
-
### 3.3. Reliability and Data Handling
|
|
77
|
-
|
|
78
|
-
* **Atomicity and Transactions:** Correctness is paramount. Any public method that performs multiple database modifications (e.g., collection.index, which writes to the documents, FTS, and version tables) **must** be wrapped in a single, atomic database transaction. If any step of the operation fails, the entire transaction will be rolled back, ensuring the database is never left in an inconsistent state.
|
|
79
|
-
* **Error Handling:** The library will use standard Python exceptions whenever possible (e.g., TypeError, ValueError, sqlite3.Error). Custom exceptions will be avoided to ensure the API feels native and predictable to Python developers.
|
|
80
|
-
* **Data Serialization:**
|
|
81
|
-
* The default serialization format is JSON.
|
|
82
|
-
* The library will provide built-in, standardized converters for common Python types not native to JSON, such as `datetime` (to ISO 8601 strings) and `bytes` (to a standard encoding like base64).
|
|
83
|
-
* For custom user-defined objects, it is the user's responsibility to serialize them into a JSON-compatible format (e.g., a string or a dictionary) before storing them.
|
|
84
|
-
|
|
85
|
-
## 4. Roadmap & Future Development
|
|
86
|
-
|
|
87
|
-
### 4.1. Immediate Priorities
|
|
88
|
-
|
|
89
|
-
The primary focus for the near future is on stability and usability.
|
|
90
|
-
|
|
91
|
-
1. **Comprehensive Unit Testing:** Increase test coverage to ensure all features are robust and reliable.
|
|
92
|
-
2. **Elaborate Examples:** Create more examples that demonstrate how to combine features (e.g., a hybrid search that uses FTS, graph traversal, and vector search together).
|
|
93
|
-
3. **Performance Benchmarking:** Develop a standardized suite of performance tests to track regressions and identify optimization opportunities. The results will be used to document the practical scalability limits of the library for various use cases.
|
|
94
|
-
|
|
95
|
-
### 4.2. Long-Term Vision
|
|
96
|
-
|
|
97
|
-
The library is approaching feature-completeness for its core domain. The long-term vision is to make it a stable, trusted, and well-documented tool for its niche.
|
|
98
|
-
|
|
99
|
-
* **Stability:** The API will soon be stabilized, with a commitment to backward compatibility as defined in the versioning policy.
|
|
100
|
-
* **New Modalities:** In the distant future, other data modalities could be considered if they fit the embedded, single-file philosophy.
|
|
101
|
-
|
|
102
|
-
### 4.3. Explicitly Out of Scope
|
|
103
|
-
|
|
104
|
-
To maintain focus and simplicity, the following features will **not** be implemented:
|
|
105
|
-
|
|
106
|
-
* Client-server architecture.
|
|
107
|
-
* Replication or distributed operation.
|
|
108
|
-
* Multi-file database formats.
|
|
109
|
-
* Any feature that makes the database file incompatible with standard SQLite tools.
|
|
110
|
-
* Custom query languages or DSLs.
|
|
111
|
-
|
|
112
|
-
### 4.4. Versioning and Backward Compatibility
|
|
113
|
-
|
|
114
|
-
The project will adhere to **Semantic Versioning (Major.Minor.Patch)**.
|
|
115
|
-
|
|
116
|
-
* **Patch Releases (x.y.Z):** For bug fixes and non-breaking changes. No API or schema changes are permitted.
|
|
117
|
-
* **Minor Releases (x.Y.z):** For adding new, backward-compatible features. Schema additions (e.g., new `beaver_*` tables) are allowed, but schema modifications that break older code are not. Code written for version x.z will be able to open and read a database file from version x.y where z > y.
|
|
118
|
-
* **Major Releases (X.y.z):** For introducing breaking changes to the API or the database schema. There is no guarantee of backward compatibility between major versions, and a migration path will not be provided automatically.
|
beaver_db-0.17.5/makefile
DELETED
|
@@ -1,15 +0,0 @@
|
|
|
1
|
-
.PHONY: publish
|
|
2
|
-
publish: clean build
|
|
3
|
-
uv publish --token `dotenv -f .env get PYPI_TOKEN`
|
|
4
|
-
|
|
5
|
-
.PHONY: build
|
|
6
|
-
build:
|
|
7
|
-
uv build
|
|
8
|
-
uv pip install -e .[full]
|
|
9
|
-
|
|
10
|
-
.PHONY: clean
|
|
11
|
-
clean:
|
|
12
|
-
rm -rf dist
|
|
13
|
-
rm -rf beaver_db.egg-info
|
|
14
|
-
find . -name '*.pyc' -exec rm -f {} +
|
|
15
|
-
find . -name '__pycache__' -exec rm -rf {} +
|
|
File without changes