beaver-db 0.18.6__tar.gz → 0.19.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of beaver-db might be problematic.
- {beaver_db-0.18.6 → beaver_db-0.19.1}/.gitignore +1 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/PKG-INFO +41 -13
- {beaver_db-0.18.6 → beaver_db-0.19.1}/README.md +40 -12
- {beaver_db-0.18.6 → beaver_db-0.19.1}/beaver/__init__.py +1 -1
- {beaver_db-0.18.6 → beaver_db-0.19.1}/beaver/core.py +51 -0
- beaver_db-0.19.1/beaver/locks.py +173 -0
- beaver_db-0.19.1/examples/locks.py +52 -0
- beaver_db-0.19.1/issues/2-comprehensive-async-wrappers.md +52 -0
- beaver_db-0.19.1/issues/6-drop-in-replacement-for-beaver-rest-server-client.md +53 -0
- beaver_db-0.19.1/issues/7-replace-faiss-with-simpler-linear-numpy-vectorial-search.md +139 -0
- beaver_db-0.19.1/issues/9-type-safe-wrappers-based-on-pydantic-compatible-models.md +63 -0
- beaver_db-0.19.1/issues/closed/1-refactor-vector-store-to-use-faiss.md +50 -0
- beaver_db-0.19.1/issues/closed/5-add-dblock-for-inter-process-synchronization.md +132 -0
- beaver_db-0.19.1/issues/closed/8-first-class-synchronization-primitive.md +127 -0
- beaver_db-0.19.1/makefile +55 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/pyproject.toml +1 -1
- beaver_db-0.18.6/makefile +0 -23
- beaver_db-0.18.6/roadmap.md +0 -152
- {beaver_db-0.18.6 → beaver_db-0.19.1}/.dockerignore +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/.python-version +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/LICENSE +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/beaver/blobs.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/beaver/channels.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/beaver/cli.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/beaver/collections.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/beaver/dicts.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/beaver/lists.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/beaver/logs.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/beaver/queues.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/beaver/server.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/beaver/types.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/beaver/vectors.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/design.md +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/dockerfile +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/async_pubsub.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/blobs.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/cache.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/fts.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/fuzzy.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/general_test.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/graph.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/kvstore.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/list.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/logs.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/pqueue.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/producer_consumer.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/publisher.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/pubsub.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/rerank.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/stress_vectors.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/subscriber.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/textual_chat.css +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/textual_chat.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/type_hints.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/examples/vector.py +0 -0
- {beaver_db-0.18.6 → beaver_db-0.19.1}/uv.lock +0 -0
{beaver_db-0.18.6 → beaver_db-0.19.1}/PKG-INFO (+41 -13):

````diff
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: beaver-db
-Version: 0.18.6
+Version: 0.19.1
 Summary: Fast, embedded, and multi-modal DB based on SQLite for AI-powered applications.
 License-File: LICENSE
 Classifier: License :: OSI Approved :: MIT License
@@ -36,7 +36,7 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
 
 `beaver` is the **B**ackend for **E**mbedded, **A**ll-in-one **V**ector, **E**ntity, and **R**elationship storage. It's a simple, local, and embedded database designed to manage complex, modern data types without requiring a database server, built on top of SQLite.
 
-> If you like beaver's minimalist, no-bullshit philosophy, check out [castor](https://github.com/apiad/castor) for an equally minimalistic approach to task orchestration.
+> If you like beaver's minimalist, no-bullshit philosophy, check out [castor](https://github.com/apiad/castor "null") for an equally minimalistic approach to task orchestration.
 
 ## Design Philosophy
 
@@ -55,6 +55,7 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
 - **Namespaced Key-Value Dictionaries**: A Pythonic, dictionary-like interface for storing any JSON-serializable object within separate namespaces with optional TTL for cache implementations.
 - **Pythonic List Management**: A fluent, Redis-like interface for managing persistent, ordered lists.
 - **Persistent Priority Queue**: A high-performance, persistent priority queue perfect for task orchestration across multiple processes. Also with optional async support.
+- **Inter-Process Locking**: A robust, deadlock-proof, and fair (FIFO) distributed lock (`db.lock()`) to coordinate multiple processes and prevent race conditions.
 - **Time-Indexed Log for Monitoring**: A specialized data structure for structured, time-series logs. Query historical data by time range or create a live, aggregated view of the most recent events for real-time dashboards.
 - **Simple Blob Storage**: A dictionary-like interface for storing medium-sized binary files (like PDFs or images) directly in the database, ensuring transactional integrity with your other data.
 - **High-Performance Vector Storage & Search (Optional)**: Store vector embeddings and perform fast approximate nearest neighbor searches using a `faiss`-based hybrid index.
@@ -103,14 +104,14 @@ pip install "beaver-db[full]"
 ```
 
 ### Running with Docker
+
 For a fully embedded and lightweight solution, you can run the BeaverDB REST API server using Docker. This is the easiest way to get a self-hosted instance up and running.
 
 ```bash
 docker run -p 8000:8000 -v $(pwd)/data:/app apiad/beaverdb
 ```
 
-This command will start the BeaverDB server, and your database file will be stored in the data directory on your host machine. You can access the API at
-
+This command will start the BeaverDB server, and your database file will be stored in the data directory on your host machine. You can access the API at [http://localhost:8000](http://localhost:8000).
 
 ## Quickstart
 
@@ -172,12 +173,12 @@ Here are a couple of examples using `curl`:
 
 ```bash
 # Set a value in the 'app_config' dictionary
-curl -X PUT http://127.0.0.1:8000/dicts/app_config/api_key
+curl -X PUT [http://127.0.0.1:8000/dicts/app_config/api_key](http://127.0.0.1:8000/dicts/app_config/api_key)
 -H "Content-Type: application/json"
 -d '"your-secret-api-key"'
 
 # Get the value back
-curl http://127.0.0.1:8000/dicts/app_config/api_key
+curl [http://127.0.0.1:8000/dicts/app_config/api_key](http://127.0.0.1:8000/dicts/app_config/api_key)
 # Output: "your-secret-api-key"
 ```
 
@@ -341,6 +342,34 @@ for summary in live_summary:
     print(f"Live Stats (10s window): Count={summary['count']}, Mean={summary['mean']:.2f}")
 ```
 
+### 9. Coordinate Distributed Web Scrapers
+
+Run multiple scraper processes in parallel and use `db.lock()` to coordinate them. You can ensure only one process refreshes a shared API token or sitemap, preventing race conditions and rate-limiting.
+
+```python
+import time
+
+scrapers_state = db.dict("scraper_state")
+
+last_refresh = scrapers_state.get("last_sitemap_refresh", 0)
+if time.time() - last_refresh > 3600:  # Only refresh once per hour
+    try:
+        # Try to get a lock to refresh the shared sitemap, but don't wait long
+        with db.lock("refresh_sitemap", timeout=1):
+            # We got the lock. Check if it's time to refresh.
+            print(f"PID {os.getpid()} is refreshing the sitemap...")
+            scrapers_state["sitemap"] = ["/page1", "/page2"]  # Your fetch_sitemap()
+            scrapers_state["last_sitemap_refresh"] = time.time()
+
+    except TimeoutError:
+        # Another process is already refreshing, so we can skip
+        print(f"PID {os.getpid()} letting other process handle refresh.")
+
+# All processes can now safely use the shared sitemap
+sitemap = scrapers_state.get("sitemap")
+# ... proceed with scraping ...
+```
+
 ## Type-Safe Data Models
 
 For enhanced data integrity and a better developer experience, BeaverDB supports type-safe operations for all modalities. By associating a model with these data structures, you get automatic serialization and deserialization, complete with autocompletion in your editor.
@@ -348,7 +377,6 @@ For enhanced data integrity and a better developer experience, BeaverDB supports
 This feature is designed to be flexible and works seamlessly with two kinds of models:
 
 - **Pydantic Models**: If you're already using Pydantic, your `BaseModel` classes will work out of the box.
-
 - **Lightweight `beaver.Model`**: For a zero-dependency solution, you can inherit from the built-in `beaver.Model` class, which is a standard Python class with serialization methods automatically included.
 
 
@@ -393,10 +421,11 @@ For more in-depth examples, check out the scripts in the `examples/` directory:
 - [`graph.py`](examples/graph.py): Shows how to create relationships between documents and perform multi-hop graph traversals.
 - [`kvstore.py`](examples/kvstore.py): A comprehensive demo of the namespaced dictionary feature.
 - [`list.py`](examples/list.py): Shows the full capabilities of the persistent list, including slicing and in-place updates.
+- [`locks.py`](examples/lock_test.py): Demonstrates how to use the inter-process lock to create critical sections.
 - [`logs.py`](examples/logs.py): A short example showing how to build a realtime dashboard with the logging feature.
 - [`pqueue.py`](examples/pqueue.py): A practical example of using the persistent priority queue for task management.
 - [`producer_consumer.py`](examples/producer_consumer.py): A demonstration of the distributed task queue system in a multi-process environment.
-- [`publisher.py`](examples/publisher.
+- [`publisher.py`](examples/publisher.py) and [`subscriber.py`](examples/subscriber.py): A pair of examples demonstrating inter-process message passing with the publish/subscribe system.
 - [`pubsub.py`](examples/pubsub.py): A demonstration of the synchronous, thread-safe publish/subscribe system in a single process.
 - [`rerank.py`](examples/rerank.py): Shows how to combine results from vector and text search for more refined results.
 - [`stress_vectors.py`](examples/stress_vectors.py): A stress test for the vector search functionality.
@@ -410,14 +439,13 @@ For more in-depth examples, check out the scripts in the `examples/` directory:
 
 These are some of the features and improvements planned for future releases:
 
-- **
-- **Type-
-- **Drop-in REST
+- **[Issue #2](https://github.com/syalia-srl/beaver/issues/2) Comprehensive async wrappers**: Extend the async support with on-demand wrappers for all data structures, not just channels.
+- **[Issue #9](https://github.com/syalia-srl/beaver/issues/2) Type-safe wrappers based on Pydantic-compatible models**: Enhance the built-in `Model` to handle recursive and embedded types and provide Pydantic compatibility.
+- **[Issue #6](https://github.com/syalia-srl/beaver/issues/2) Drop-in replacement for Beaver REST server client**: Implement a `BeaverClient` class that acts as a drop-in replacement for `BeaverDB` but works against the REST API server.
+- **[Issue #7](https://github.com/syalia-srl/beaver/issues/2) Replace `faiss` with simpler, linear `numpy` vectorial search**: Investigate removing the heavy `faiss` dependency in favor of a pure `numpy` implementation to improve installation simplicity, accepting a trade-off in search performance for O(1) installation.
 
-Check out the [roadmap](roadmap.md) for a detailed list of upcoming features and design ideas.
 
 If you think of something that would make `beaver` more useful for your use case, please open an issue and/or submit a pull request.
-
 ## License
 
 This project is licensed under the MIT License.
````
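The "Type-Safe Data Models" section above describes a zero-dependency `beaver.Model` base class that adds serialization methods to a plain Python class. As a rough illustration of how such a class can work (this is a sketch, not beaver's actual API; `to_dict`/`from_dict` are assumed method names):

```python
from dataclasses import dataclass, asdict, fields

@dataclass
class Model:
    """A minimal stand-in for a zero-dependency serializable model."""

    def to_dict(self) -> dict:
        # Serialize all dataclass fields to a plain JSON-compatible dict.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict):
        # Keep only known fields, so extra keys in stored JSON are ignored.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

@dataclass
class Article(Model):
    title: str
    tags: list

a = Article(title="hello", tags=["db", "sqlite"])
d = a.to_dict()
restored = Article.from_dict({**d, "extra": 1})  # unknown keys are dropped
```

Ignoring unknown keys on deserialization is what lets documents stored by an older schema still load after fields are added.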
{beaver_db-0.18.6 → beaver_db-0.19.1}/README.md (+40 -12): this diff repeats, hunk for hunk, the content changes shown above for PKG-INFO (whose long description embeds the README), with each hunk 25 lines earlier.
{beaver_db-0.18.6 → beaver_db-0.19.1}/beaver/core.py (+51 -0):

```diff
@@ -9,6 +9,7 @@ from .channels import ChannelManager
 from .collections import CollectionManager, Document
 from .dicts import DictManager
 from .lists import ListManager
+from .locks import LockManager
 from .logs import LogManager
 from .queues import QueueManager
 
@@ -99,6 +100,35 @@ class BeaverDB:
         self._create_pubsub_table()
         self._create_trigrams_table()
         self._create_versions_table()
+        self._create_locks_table()
+
+    def _create_locks_table(self):  # <-- Add this new method
+        """Creates the table for managing inter-process lock waiters."""
+        self.connection.execute(
+            """
+            CREATE TABLE IF NOT EXISTS beaver_lock_waiters (
+                lock_name TEXT NOT NULL,
+                waiter_id TEXT NOT NULL,
+                requested_at REAL NOT NULL,
+                expires_at REAL NOT NULL,
+                PRIMARY KEY (lock_name, requested_at)
+            )
+            """
+        )
+        # Index for fast cleanup of expired locks
+        self.connection.execute(
+            """
+            CREATE INDEX IF NOT EXISTS idx_lock_expires
+            ON beaver_lock_waiters (lock_name, expires_at)
+            """
+        )
+        # Index for fast deletion by the lock holder
+        self.connection.execute(
+            """
+            CREATE INDEX IF NOT EXISTS idx_lock_waiter_id
+            ON beaver_lock_waiters (lock_name, waiter_id)
+            """
+        )
 
     def _create_logs_table(self):
         """Creates the table for time-indexed logs."""
@@ -421,3 +451,24 @@
             raise TypeError("The model parameter must be a JsonSerializable class.")
 
         return LogManager(name, self, self._db_path, model)
+
+    def lock(
+        self,
+        name: str,
+        timeout: float | None = None,
+        lock_ttl: float = 60.0,
+        poll_interval: float = 0.1,
+    ) -> LockManager:
+        """
+        Returns an inter-process lock manager for a given lock name.
+
+        Args:
+            name: The unique name of the lock (e.g., "run_compaction").
+            timeout: Max seconds to wait to acquire the lock.
+                If None, it will wait forever.
+            lock_ttl: Max seconds the lock can be held. If the process crashes,
+                the lock will auto-expire after this time.
+            poll_interval: Seconds to wait between polls. Shorter intervals
+                are more responsive but create more DB I/O.
+        """
+        return LockManager(self, name, timeout, lock_ttl, poll_interval)
```
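The `beaver_lock_waiters` table above orders waiters by `requested_at`, so "holding the lock" simply means owning the oldest non-expired row for a given `lock_name`. The FIFO hand-off can be sketched with nothing but the standard-library `sqlite3` module (the table and queries mirror the diff; the `enqueue`/`release`/`holder` helpers are illustrative, not part of beaver):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    """
    CREATE TABLE beaver_lock_waiters (
        lock_name TEXT NOT NULL,
        waiter_id TEXT NOT NULL,
        requested_at REAL NOT NULL,
        expires_at REAL NOT NULL,
        PRIMARY KEY (lock_name, requested_at)
    )
    """
)

def enqueue(name, waiter, ttl=60.0):
    # Joining the queue is a single atomic INSERT.
    now = time.time()
    conn.execute(
        "INSERT INTO beaver_lock_waiters VALUES (?, ?, ?, ?)",
        (name, waiter, now, now + ttl),
    )

def release(name, waiter):
    # Releasing deletes the holder's row, promoting the next waiter.
    conn.execute(
        "DELETE FROM beaver_lock_waiters WHERE lock_name = ? AND waiter_id = ?",
        (name, waiter),
    )

def holder(name):
    # Purge expired entries, then the oldest remaining row owns the lock.
    conn.execute(
        "DELETE FROM beaver_lock_waiters WHERE lock_name = ? AND expires_at < ?",
        (name, time.time()),
    )
    row = conn.execute(
        "SELECT waiter_id FROM beaver_lock_waiters"
        " WHERE lock_name = ? ORDER BY requested_at ASC LIMIT 1",
        (name,),
    ).fetchone()
    return row["waiter_id"] if row else None

enqueue("compaction", "worker-a")
time.sleep(0.01)  # requested_at is part of the PRIMARY KEY, so timestamps must differ
enqueue("compaction", "worker-b")
first = holder("compaction")    # worker-a is at the front
release("compaction", "worker-a")
second = holder("compaction")   # worker-b is promoted
```

Note how `release` only needs the `(lock_name, waiter_id)` pair, which is exactly what the `idx_lock_waiter_id` index in the diff accelerates.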
beaver_db-0.19.1/beaver/locks.py (new file, +173 lines):

```python
import random
import time
import os
import uuid
from typing import Optional
from .types import IDatabase


class LockManager:
    """
    An inter-process, deadlock-proof, and fair (FIFO) lock built on SQLite.

    This class provides a context manager (`with` statement) to ensure that
    only one process (among many) can enter a critical section of code at a
    time.

    It is "fair" because it uses a FIFO queue (based on insertion time).
    It is "deadlock-proof" because locks have a Time-To-Live (TTL); if a
    process crashes, its lock will eventually expire and be cleaned up.
    """

    def __init__(
        self,
        db: IDatabase,
        name: str,
        timeout: Optional[float] = None,
        lock_ttl: float = 60.0,
        poll_interval: float = 0.1,
    ):
        """
        Initializes the lock manager.

        Args:
            db: The BeaverDB instance.
            name: The unique name of the lock (e.g., "run_compaction").
            timeout: Max seconds to wait to acquire the lock. If None,
                it will wait forever.
            lock_ttl: Max seconds the lock can be held. If the process crashes,
                the lock will auto-expire after this time.
            poll_interval: Seconds to wait between polls.
        """
        if not isinstance(name, str) or not name:
            raise ValueError("Lock name must be a non-empty string.")
        if lock_ttl <= 0:
            raise ValueError("lock_ttl must be positive.")
        if poll_interval <= 0:
            raise ValueError("poll_interval must be positive.")

        self._db = db
        self._lock_name = name
        self._timeout = timeout
        self._lock_ttl = lock_ttl
        self._poll_interval = poll_interval
        # A unique ID for this specific lock instance across all processes
        self._waiter_id = f"pid:{os.getpid()}:id:{uuid.uuid4()}"
        self._acquired = False  # State to track if this instance holds the lock

    def acquire(self) -> "LockManager":
        """
        Blocks until the lock is acquired or the timeout expires.

        Raises:
            TimeoutError: If the lock cannot be acquired within the specified timeout.
        """
        if self._acquired:
            # This instance already holds the lock
            return self

        start_time = time.time()
        requested_at = time.time()
        expires_at = requested_at + self._lock_ttl

        conn = self._db.connection

        try:
            # 1. Add self to the FIFO queue (atomic)
            with conn:
                conn.execute(
                    """
                    INSERT INTO beaver_lock_waiters (lock_name, waiter_id, requested_at, expires_at)
                    VALUES (?, ?, ?, ?)
                    """,
                    (self._lock_name, self._waiter_id, requested_at, expires_at),
                )

            # 2. Start polling loop
            while True:
                with conn:
                    # 3. Clean up expired locks from crashed processes
                    now = time.time()
                    conn.execute(
                        "DELETE FROM beaver_lock_waiters WHERE lock_name = ? AND expires_at < ?",
                        (self._lock_name, now),
                    )

                    # 4. Check who is at the front of the queue
                    cursor = conn.cursor()
                    cursor.execute(
                        """
                        SELECT waiter_id FROM beaver_lock_waiters
                        WHERE lock_name = ?
                        ORDER BY requested_at ASC
                        LIMIT 1
                        """,
                        (self._lock_name,),
                    )
                    result = cursor.fetchone()
                    cursor.close()

                if result and result["waiter_id"] == self._waiter_id:
                    # We are at the front. We own the lock.
                    self._acquired = True
                    return self

                # 5. Check for timeout
                if self._timeout is not None:
                    if (time.time() - start_time) > self._timeout:
                        # We timed out. Remove ourselves from the queue and raise.
                        self._release_from_queue()
                        raise TimeoutError(
                            f"Failed to acquire lock '{self._lock_name}' within {self._timeout}s."
                        )

                # 6. Wait politely before polling again
                # Add +/- 10% jitter to the poll interval to avoid thundering herd
                jitter = self._poll_interval * 0.1
                sleep_time = random.uniform(
                    self._poll_interval - jitter, self._poll_interval + jitter
                )
                time.sleep(sleep_time)

        except Exception:
            # If anything goes wrong, try to clean up our waiter entry
            self._release_from_queue()
            raise

    def _release_from_queue(self):
        """
        Atomically removes this instance's entry from the waiter queue.
        This is a best-effort, fire-and-forget operation.
        """
        try:
            with self._db.connection:
                self._db.connection.execute(
                    "DELETE FROM beaver_lock_waiters WHERE lock_name = ? AND waiter_id = ?",
                    (self._lock_name, self._waiter_id),
                )
        except Exception:
            # Don't raise errors during release/cleanup
            pass

    def release(self):
        """
        Releases the lock, allowing the next process in the queue to acquire it.
        This is safe to call multiple times.
        """
        if not self._acquired:
            # We don't hold the lock, so nothing to do.
            return

        self._release_from_queue()
        self._acquired = False

    def __enter__(self) -> "LockManager":
        """Acquires the lock when entering a 'with' statement."""
        return self.acquire()

    def __exit__(self, exc_type, exc_val, exc_tb):
        """Releases the lock when exiting a 'with' statement."""
        self.release()

    def __repr__(self) -> str:
        return f"LockManager(name='{self._lock_name}', acquired={self._acquired})"
```
@@ -0,0 +1,52 @@
import os
import time
from datetime import datetime

from beaver import BeaverDB

DB_PATH = "lock_test.db"
LOCK_NAME = "critical_task_lock"
SHARED_LOG_NAME = "shared_work_log"


def run_lock_demo():
    """
    A test function that demonstrates the inter-process lock.
    """
    pid = os.getpid()
    db = BeaverDB(DB_PATH)

    while True:
        print(f"[PID {pid}] Trying to acquire lock '{LOCK_NAME}' (timeout=5s)...")

        try:
            # Try to acquire the lock with a 5-second timeout.
            # We use a short poll_interval for a responsive test.
            with db.lock(LOCK_NAME, timeout=5.0, poll_interval=0.2):
                # --- CRITICAL SECTION START ---
                # Only one process can be in this block at a time.
                print("--------------------------------------------------")
                print(f"[PID {pid}] ✅ Lock ACQUIRED.")
                log_time = datetime.now().isoformat()

                print(f"[PID {pid}] 👷 Starting work (simulating 3 seconds)...")
                time.sleep(3)
                print(f"[PID {pid}] 🏁 Work finished.")

                print(f"[PID {pid}] 🔑 Releasing lock.")
                # --- CRITICAL SECTION END ---

            # Lock is automatically released when the 'with' block exits.
            print("--------------------------------------------------")

        except TimeoutError:
            # This happens if we couldn't get the lock within the 5-second timeout.
            print(f"[PID {pid}] ❌ Lock acquisition TIMED OUT. Another process is busy.")
        except Exception as e:
            print(f"[PID {pid}] An error occurred: {e}")


if __name__ == "__main__":
    print("--- BeaverDB Lock Test ---")
    print("To test, run this script in 2 or more terminals at the same time.")
    print("Database file: " + DB_PATH)
    print("------------------------\n")
    run_lock_demo()
@@ -0,0 +1,52 @@
---
number: 2
title: "Comprehensive async wrappers"
state: open
labels:
---

## Tasks

- [ ] Dictionary wrapper
- [ ] List wrapper
- [ ] Queue wrapper
- [ ] Collections wrapper

### 1. Concept

A **Comprehensive Async API** will be introduced to allow seamless integration of `beaver-db` into modern `asyncio`-based applications. Instead of making the core library asynchronous, this feature will provide an elegant, on-demand way to get an async-compatible version of any core `beaver-db` object.

The core of the library will remain fully synchronous, respecting its design principle of a "Synchronous Core with Async Potential". The async functionality will be provided through thin, type-safe wrappers that run the blocking database calls in a background thread pool, ensuring the `asyncio` event loop is never blocked.

### 2. Use Cases

This feature is essential for developers using modern Python frameworks and for building highly concurrent applications:

* **Modern Web Backends**: Natively integrate `beaver-db` with frameworks like FastAPI or Starlette without needing to manage a separate thread pool executor for database calls.
* **High-Concurrency Tools**: Use `beaver-db` in applications that manage thousands of concurrent I/O operations (like websocket servers, scrapers, or chatbots) without sacrificing responsiveness.
* **Ergonomic Developer Experience**: Allow developers working in an `async` codebase to use the familiar `await` syntax for all database operations, leading to cleaner and more consistent code.

### 3. Proposed API

The API is designed to be flexible and explicit, allowing the developer to opt in to the async version of an object whenever needed.

* `docs = db.collection("articles")`: The developer starts with the standard, synchronous object.
* `async_docs = docs.as_async()`: A new `.as_async()` method on any synchronous wrapper (`CollectionWrapper`, `ListWrapper`, etc.) will return a parallel `Async` version of that object.
* `await async_docs.index(my_doc)`: All methods on the `Async` wrapper are `async def` and must be awaited. The method names are identical to their synchronous counterparts, providing a clean and consistent API.
* `await docs.as_async().search(vector)`: For one-off calls, the developer can chain the methods for a concise, non-blocking operation.
### 4. Implementation Design: Type-Safe Parallel Wrappers

The implementation will prioritize correctness, flexibility, and compatibility with developer tooling.

1. **Parallel Class Hierarchy**: For each core wrapper (e.g., `CollectionWrapper`), there will be a corresponding `AsyncCollectionWrapper`. This new class will hold a reference to the original synchronous object.
2. **Explicit `async def` Methods**: Every method on the `Async` wrapper will be explicitly defined with `async def`. This ensures that type checkers (like Mypy) and IDEs can correctly identify them as awaitable, preventing common runtime errors and providing proper autocompletion.
3. **`asyncio.to_thread` Execution**: The implementation of each `async` method will simply call the corresponding synchronous method on the original object using `asyncio.to_thread`. This delegates the blocking I/O to a background thread, keeping the `asyncio` event loop free.
### 5. Alignment with Philosophy

This feature perfectly aligns with the library's guiding principles:

* **Synchronous Core with Async Potential**: It adds a powerful `async` layer without altering the simple, robust, and synchronous foundation of the library.
* **Simplicity and Pythonic API**: The `.as_async()` method is an intuitive and Pythonic way to opt into asynchronous behavior, and the chained-call syntax is elegant and clean.
* **Developer Experience**: By ensuring the `async` wrappers are explicitly typed, the design prioritizes compatibility with modern developer tools, preventing bugs and improving productivity.