beaver-db 0.9.2__tar.gz → 0.10.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of beaver-db might be problematic.

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: beaver-db
- Version: 0.9.2
+ Version: 0.10.0
  Summary: Fast, embedded, and multi-modal DB based on SQLite for AI-powered applications.
  Requires-Python: >=3.13
  Description-Content-Type: text/markdown
@@ -19,8 +19,9 @@ A fast, single-file, multi-modal database for Python, built with the standard `s

  `beaver` is built with a minimalistic philosophy for small, local use cases where a full-blown database server would be overkill.

- - **Minimalistic & Zero-Dependency**: Uses only Python's standard libraries (`sqlite3`) and `numpy`/`scipy`.
- - **Synchronous & Thread-Safe**: Designed for simplicity and safety in multi-threaded environments.
+ - **Minimalistic**: Uses only Python's standard libraries (`sqlite3`) and `numpy`/`scipy`.
+ - **Schemaless**: Flexible data storage without rigid schemas across all modalities.
+ - **Synchronous, Multi-Process, and Thread-Safe**: Designed for simplicity and safety in multi-threaded and multi-process environments.
  - **Built for Local Applications**: Perfect for local AI tools, RAG prototypes, chatbots, and desktop utilities that need persistent, structured data without network overhead.
  - **Fast by Default**: It's built on SQLite, which is famously fast and reliable for local applications. The vector search is accelerated with an in-memory k-d tree.
  - **Standard Relational Interface**: While `beaver` provides high-level features, you can always use the same SQLite file for normal relational tasks with standard SQL.
@@ -32,7 +33,7 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
  - **Pythonic List Management**: A fluent, Redis-like interface for managing persistent, ordered lists.
  - **Persistent Priority Queue**: A high-performance, persistent queue that always returns the item with the highest priority, perfect for task management.
  - **Efficient Vector Storage & Search**: Store vector embeddings and perform fast approximate nearest neighbor searches using an in-memory k-d tree.
- - **Full-Text Search**: Automatically index and search through document metadata using SQLite's powerful FTS5 engine.
+ - **Full-Text Search and Fuzzy**: Automatically index and search through document metadata using SQLite's powerful FTS5 engine, enhanced with optional fuzzy search.
  - **Graph Traversal**: Create relationships between documents and traverse the graph to find neighbors or perform multi-hop walks.
  - **Single-File & Portable**: All data is stored in a single SQLite file, making it incredibly easy to move, back up, or embed in your application.

@@ -194,14 +195,14 @@ For more in-depth examples, check out the scripts in the `examples/` directory:
  - [`examples/publisher.py`](examples/publisher.py) and [`examples/subscriber.py`](examples/subscriber.py): A pair of examples demonstrating inter-process message passing with the publish/subscribe system.
  - [`examples/cache.py`](examples/cache.py): A practical example of using a dictionary with TTL as a cache for API calls.
  - [`examples/rerank.py`](examples/rerank.py): Shows how to combine results from vector and text search for more refined results.
+ - [`examples/fuzzy.py`](examples/fuzzy.py): Demonstrates fuzzy search capabilities for text search.

  ## Roadmap

  These are some of the features and improvements planned for future releases:

- - **Fuzzy search**: Implement fuzzy matching capabilities for text search.
  - **Faster ANN**: Explore integrating more advanced ANN libraries like `faiss` for improved vector search performance.
- - **Async API**: Comprehensive async support with on-demand wrappers for all collections.
+ - **Full Async API**: Comprehensive async support with on-demand wrappers for all collections.

  Check out the [roadmap](roadmap.md) for a detailed list of upcoming features and design ideas.

@@ -8,8 +8,9 @@ A fast, single-file, multi-modal database for Python, built with the standard `s

  `beaver` is built with a minimalistic philosophy for small, local use cases where a full-blown database server would be overkill.

- - **Minimalistic & Zero-Dependency**: Uses only Python's standard libraries (`sqlite3`) and `numpy`/`scipy`.
- - **Synchronous & Thread-Safe**: Designed for simplicity and safety in multi-threaded environments.
+ - **Minimalistic**: Uses only Python's standard libraries (`sqlite3`) and `numpy`/`scipy`.
+ - **Schemaless**: Flexible data storage without rigid schemas across all modalities.
+ - **Synchronous, Multi-Process, and Thread-Safe**: Designed for simplicity and safety in multi-threaded and multi-process environments.
  - **Built for Local Applications**: Perfect for local AI tools, RAG prototypes, chatbots, and desktop utilities that need persistent, structured data without network overhead.
  - **Fast by Default**: It's built on SQLite, which is famously fast and reliable for local applications. The vector search is accelerated with an in-memory k-d tree.
  - **Standard Relational Interface**: While `beaver` provides high-level features, you can always use the same SQLite file for normal relational tasks with standard SQL.
@@ -21,7 +22,7 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
  - **Pythonic List Management**: A fluent, Redis-like interface for managing persistent, ordered lists.
  - **Persistent Priority Queue**: A high-performance, persistent queue that always returns the item with the highest priority, perfect for task management.
  - **Efficient Vector Storage & Search**: Store vector embeddings and perform fast approximate nearest neighbor searches using an in-memory k-d tree.
- - **Full-Text Search**: Automatically index and search through document metadata using SQLite's powerful FTS5 engine.
+ - **Full-Text Search and Fuzzy**: Automatically index and search through document metadata using SQLite's powerful FTS5 engine, enhanced with optional fuzzy search.
  - **Graph Traversal**: Create relationships between documents and traverse the graph to find neighbors or perform multi-hop walks.
  - **Single-File & Portable**: All data is stored in a single SQLite file, making it incredibly easy to move, back up, or embed in your application.

@@ -183,14 +184,14 @@ For more in-depth examples, check out the scripts in the `examples/` directory:
  - [`examples/publisher.py`](examples/publisher.py) and [`examples/subscriber.py`](examples/subscriber.py): A pair of examples demonstrating inter-process message passing with the publish/subscribe system.
  - [`examples/cache.py`](examples/cache.py): A practical example of using a dictionary with TTL as a cache for API calls.
  - [`examples/rerank.py`](examples/rerank.py): Shows how to combine results from vector and text search for more refined results.
+ - [`examples/fuzzy.py`](examples/fuzzy.py): Demonstrates fuzzy search capabilities for text search.

  ## Roadmap

  These are some of the features and improvements planned for future releases:

- - **Fuzzy search**: Implement fuzzy matching capabilities for text search.
  - **Faster ANN**: Explore integrating more advanced ANN libraries like `faiss` for improved vector search performance.
- - **Async API**: Comprehensive async support with on-demand wrappers for all collections.
+ - **Full Async API**: Comprehensive async support with on-demand wrappers for all collections.

  Check out the [roadmap](roadmap.md) for a detailed list of upcoming features and design ideas.

@@ -5,7 +5,63 @@ from enum import Enum
  from typing import Any, List, Literal, Set

  import numpy as np
- from scipy.spatial import cKDTree
+ from scipy.spatial import KDTree
+
+
+ # --- Fuzzy Search Helper Functions ---
+
+ def _levenshtein_distance(s1: str, s2: str) -> int:
+     """Calculates the Levenshtein distance between two strings."""
+     if len(s1) < len(s2):
+         return _levenshtein_distance(s2, s1)
+     if len(s2) == 0:
+         return len(s1)
+
+     previous_row = range(len(s2) + 1)
+     for i, c1 in enumerate(s1):
+         current_row = [i + 1]
+         for j, c2 in enumerate(s2):
+             insertions = previous_row[j + 1] + 1
+             deletions = current_row[j] + 1
+             substitutions = previous_row[j] + (c1 != c2)
+             current_row.append(min(insertions, deletions, substitutions))
+         previous_row = current_row
+     return previous_row[-1]
+
+
+ def _get_trigrams(text: str) -> set[str]:
+     """Generates a set of 3-character trigrams from a string."""
+     if not text or len(text) < 3:
+         return set()
+     return {text[i:i+3] for i in range(len(text) - 2)}
+
+
+ def _sliding_window_levenshtein(query: str, content: str, fuzziness: int) -> int:
+     """
+     Finds the best Levenshtein match for a query within a larger text
+     by comparing it against relevant substrings.
+     """
+     query_tokens = query.lower().split()
+     content_tokens = content.lower().split()
+     query_len = len(query_tokens)
+     if query_len == 0:
+         return 0
+
+     min_dist = float('inf')
+     query_norm = " ".join(query_tokens)
+
+     # The window size can be slightly smaller or larger than the query length
+     # to account for missing or extra words in a fuzzy match.
+     for window_size in range(max(1, query_len - fuzziness), query_len + fuzziness + 1):
+         if window_size > len(content_tokens):
+             continue
+         for i in range(len(content_tokens) - window_size + 1):
+             window_text = " ".join(content_tokens[i:i+window_size])
+             dist = _levenshtein_distance(query_norm, window_text)
+             if dist < min_dist:
+                 min_dist = dist
+
+     return int(min_dist)


  class WalkDirection(Enum):
@@ -54,18 +110,18 @@ class CollectionManager:
      def __init__(self, name: str, conn: sqlite3.Connection):
          self._name = name
          self._conn = conn
-         self._kdtree: cKDTree | None = None
+         self._kdtree: KDTree | None = None
          self._doc_ids: List[str] = []
          self._local_index_version = -1 # Version of the in-memory index

-     def _flatten_metadata(self, metadata: dict, prefix: str = "") -> dict[str, str]:
-         """Flattens a nested dictionary and filters for string values."""
+     def _flatten_metadata(self, metadata: dict, prefix: str = "") -> dict[str, Any]:
+         """Flattens a nested dictionary for indexing."""
          flat_dict = {}
          for key, value in metadata.items():
-             new_key = f"{prefix}__{key}" if prefix else key
+             new_key = f"{prefix}.{key}" if prefix else key
              if isinstance(value, dict):
                  flat_dict.update(self._flatten_metadata(value, new_key))
-             elif isinstance(value, str):
+             else:
                  flat_dict[new_key] = value
          return flat_dict

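In 0.10.0, `_flatten_metadata` joins nested keys with `.` instead of `__` and keeps every leaf value; the string-only filter moves into `index()`. A standalone sketch of the new behaviour follows (the function name `flatten` and the sample document are illustrative). It shows that a nested `author.name` becomes the field path that `index(fts=[...])` and `match(on=...)` compare against. Rows already written by 0.9.2 keep their `__`-joined `field_path` values until the document is indexed again, since this diff adds no migration for them.

```python
from typing import Any


def flatten(metadata: dict, prefix: str = "") -> dict[str, Any]:
    """Mirror of the 0.10.0 flattening: dotted paths, all leaf values kept."""
    flat: dict[str, Any] = {}
    for key, value in metadata.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into nested mappings
        else:
            flat[path] = value  # non-string leaves are kept at this stage
    return flat


doc = {"title": "Beaver dams", "author": {"name": "Ana", "year": 2024}, "tags": ["wildlife"]}
flat = flatten(doc)

# Only string leaves reach the FTS index; the integer and the list are skipped later.
string_fields = {k: v for k, v in flat.items() if isinstance(v, str)}
print(flat)
print(string_fields)
```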
@@ -85,39 +141,63 @@ class CollectionManager:
              return True
          return self._local_index_version < self._get_db_version()

-     def index(self, document: Document, *, fts: bool = True):
-         """Indexes a Document, performing an upsert and updating the FTS index."""
+     def index(
+         self,
+         document: Document,
+         *,
+         fts: bool | list[str] = True,
+         fuzzy: bool = False
+     ):
+         """
+         Indexes a Document, including vector, FTS, and fuzzy search data.
+         The entire operation is performed in a single atomic transaction.
+         """
          with self._conn:
-             if fts:
-                 self._conn.execute(
-                     "DELETE FROM beaver_fts_index WHERE collection = ? AND item_id = ?",
-                     (self._name, document.id),
-                 )
-                 string_fields = self._flatten_metadata(document.to_dict())
-                 if string_fields:
-                     fts_data = [
-                         (self._name, document.id, path, content)
-                         for path, content in string_fields.items()
-                     ]
-                     self._conn.executemany(
-                         "INSERT INTO beaver_fts_index (collection, item_id, field_path, field_content) VALUES (?, ?, ?, ?)",
-                         fts_data,
-                     )
-
+             # Step 1: Core Document and Vector Storage (Unaffected by FTS/Fuzzy)
              self._conn.execute(
                  "INSERT OR REPLACE INTO beaver_collections (collection, item_id, item_vector, metadata) VALUES (?, ?, ?, ?)",
                  (
                      self._name,
                      document.id,
-                     (
-                         document.embedding.tobytes()
-                         if document.embedding is not None
-                         else None
-                     ),
+                     document.embedding.tobytes() if document.embedding is not None else None,
                      json.dumps(document.to_dict()),
                  ),
              )
-             # Atomically increment the collection's version number
+
+             # Step 2: FTS and Fuzzy Indexing
+             # First, clean up old index data for this document
+             self._conn.execute("DELETE FROM beaver_fts_index WHERE collection = ? AND item_id = ?", (self._name, document.id))
+             self._conn.execute("DELETE FROM beaver_trigrams WHERE collection = ? AND item_id = ?", (self._name, document.id))
+
+             # Determine which string fields to index
+             flat_metadata = self._flatten_metadata(document.to_dict())
+             fields_to_index: dict[str, str] = {}
+             if isinstance(fts, list):
+                 fields_to_index = {k: v for k, v in flat_metadata.items() if k in fts and isinstance(v, str)}
+             elif fts:
+                 fields_to_index = {k: v for k, v in flat_metadata.items() if isinstance(v, str)}
+
+             if fields_to_index:
+                 # FTS indexing
+                 fts_data = [(self._name, document.id, path, content) for path, content in fields_to_index.items()]
+                 self._conn.executemany(
+                     "INSERT INTO beaver_fts_index (collection, item_id, field_path, field_content) VALUES (?, ?, ?, ?)",
+                     fts_data,
+                 )
+
+                 # Fuzzy indexing (if enabled)
+                 if fuzzy:
+                     trigram_data = []
+                     for path, content in fields_to_index.items():
+                         for trigram in _get_trigrams(content.lower()):
+                             trigram_data.append((self._name, document.id, path, trigram))
+                     if trigram_data:
+                         self._conn.executemany(
+                             "INSERT INTO beaver_trigrams (collection, item_id, field_path, trigram) VALUES (?, ?, ?, ?)",
+                             trigram_data,
+                         )
+
+             # Step 3: Update Collection Version
              self._conn.execute(
                  """
                  INSERT INTO beaver_collection_versions (collection_name, version) VALUES (?, 1)
@@ -139,6 +219,10 @@ class CollectionManager:
                  "DELETE FROM beaver_fts_index WHERE collection = ? AND item_id = ?",
                  (self._name, document.id),
              )
+             self._conn.execute(
+                 "DELETE FROM beaver_trigrams WHERE collection = ? AND item_id = ?",
+                 (self._name, document.id),
+             )
              self._conn.execute(
                  "DELETE FROM beaver_edges WHERE collection = ? AND (source_item_id = ? OR target_item_id = ?)",
                  (self._name, document.id, document.id),
@@ -181,7 +265,7 @@ class CollectionManager:
              self._doc_ids.append(row["item_id"])
              vectors.append(np.frombuffer(row["item_vector"], dtype=np.float32))

-         self._kdtree = cKDTree(vectors) if vectors else None
+         self._kdtree = KDTree(vectors) if vectors else None
          self._local_index_version = self._get_db_version()

      def search(
@@ -222,9 +306,36 @@ class CollectionManager:
          return results

      def match(
-         self, query: str, on_field: str | None = None, top_k: int = 10
+         self,
+         query: str,
+         *,
+         on: str | list[str] | None = None,
+         top_k: int = 10,
+         fuzziness: int = 0
      ) -> list[tuple[Document, float]]:
-         """Performs a full-text search on indexed string fields."""
+         """
+         Performs a full-text or fuzzy search on indexed string fields.
+
+         Args:
+             query: The search query string.
+             on: An optional list of fields to restrict the search to.
+             top_k: The maximum number of results to return.
+             fuzziness: The Levenshtein distance for fuzzy matching.
+                 If 0, performs an exact FTS search.
+                 If > 0, performs a fuzzy search.
+         """
+         if isinstance(on, str):
+             on = [on]
+
+         if fuzziness == 0:
+             return self._perform_fts_search(query, on, top_k)
+         else:
+             return self._perform_fuzzy_search(query, on, top_k, fuzziness)
+
+     def _perform_fts_search(
+         self, query: str, on: list[str] | None, top_k: int
+     ) -> list[tuple[Document, float]]:
+         """Performs a standard FTS search."""
          cursor = self._conn.cursor()
          sql_query = """
              SELECT t1.item_id, t1.item_vector, t1.metadata, fts.rank
@@ -234,30 +345,127 @@ class CollectionManager:
              ) AS fts ON t1.item_id = fts.item_id
              WHERE t1.collection = ? ORDER BY fts.rank
          """
-         params, field_filter_sql = [], ""
-         if on_field:
-             field_filter_sql = "AND field_path = ?"
-             params.extend([query, on_field])
-         else:
-             params.append(query)
-         params.extend([top_k, self._name])
+         params: list[Any] = [query]
+         field_filter_sql = ""
+         if on:
+             placeholders = ",".join("?" for _ in on)
+             field_filter_sql = f"AND field_path IN ({placeholders})"
+             params.extend(on)

-         rows = cursor.execute(
-             sql_query.format(field_filter_sql), tuple(params)
-         ).fetchall()
+         params.extend([top_k, self._name])
+         rows = cursor.execute(sql_query.format(field_filter_sql), tuple(params)).fetchall()
          results = []
          for row in rows:
              embedding = (
                  np.frombuffer(row["item_vector"], dtype=np.float32).tolist()
-                 if row["item_vector"]
-                 else None
-             )
-             doc = Document(
-                 id=row["item_id"], embedding=embedding, **json.loads(row["metadata"])
+                 if row["item_vector"] else None
              )
+             doc = Document(id=row["item_id"], embedding=embedding, **json.loads(row["metadata"]))
              results.append((doc, row["rank"]))
          return results

+     def _get_trigram_candidates(self, query: str, on: list[str] | None) -> set[str]:
+         """
+         Gets document IDs that meet a trigram similarity threshold with the query.
+         """
+         query_trigrams = _get_trigrams(query.lower())
+         if not query_trigrams:
+             return set()
+
+         # Optimization: Only consider documents that share a significant number of trigrams.
+         # This threshold dramatically reduces the number of candidates for the expensive
+         # Levenshtein check. A 30% threshold is a reasonable starting point.
+         similarity_threshold = int(len(query_trigrams) * 0.3)
+         if similarity_threshold == 0:
+             return set()
+
+         cursor = self._conn.cursor()
+         sql = """
+             SELECT item_id FROM beaver_trigrams
+             WHERE collection = ? AND trigram IN ({}) {}
+             GROUP BY item_id
+             HAVING COUNT(DISTINCT trigram) >= ?
+         """
+         params: list[Any] = [self._name]
+         trigram_placeholders = ",".join("?" for _ in query_trigrams)
+         params.extend(query_trigrams)
+
+         field_filter_sql = ""
+         if on:
+             field_placeholders = ",".join("?" for _ in on)
+             field_filter_sql = f"AND field_path IN ({field_placeholders})"
+             params.extend(on)
+
+         params.append(similarity_threshold)
+         cursor.execute(sql.format(trigram_placeholders, field_filter_sql), tuple(params))
+         return {row['item_id'] for row in cursor.fetchall()}
+
+     def _perform_fuzzy_search(
+         self, query: str, on: list[str] | None, top_k: int, fuzziness: int
+     ) -> list[tuple[Document, float]]:
+         """Performs a 3-stage fuzzy search: gather, score, and sort."""
+         # Stage 1: Gather Candidates
+         fts_results = self._perform_fts_search(query, on, top_k)
+         fts_candidate_ids = {doc.id for doc, _ in fts_results}
+         trigram_candidate_ids = self._get_trigram_candidates(query, on)
+         candidate_ids = fts_candidate_ids.union(trigram_candidate_ids)
+         if not candidate_ids:
+             return []
+
+         # Stage 2: Score Candidates
+         cursor = self._conn.cursor()
+         id_placeholders = ",".join("?" for _ in candidate_ids)
+         sql_text = f"SELECT item_id, field_path, field_content FROM beaver_fts_index WHERE collection = ? AND item_id IN ({id_placeholders})"
+         params_text: list[Any] = [self._name]
+         params_text.extend(candidate_ids)
+         if on:
+             sql_text += f" AND field_path IN ({','.join('?' for _ in on)})"
+             params_text.extend(on)
+
+         cursor.execute(sql_text, tuple(params_text))
+         candidate_texts: dict[str, dict[str, str]] = {}
+         for row in cursor.fetchall():
+             item_id = row['item_id']
+             if item_id not in candidate_texts:
+                 candidate_texts[item_id] = {}
+             candidate_texts[item_id][row['field_path']] = row['field_content']
+
+         scored_candidates = []
+         fts_rank_map = {doc.id: rank for doc, rank in fts_results}
+
+         for item_id in candidate_ids:
+             if item_id not in candidate_texts:
+                 continue
+             min_dist = float('inf')
+             for content in candidate_texts[item_id].values():
+                 dist = _sliding_window_levenshtein(query, content, fuzziness)
+                 if dist < min_dist:
+                     min_dist = dist
+             if min_dist <= fuzziness:
+                 scored_candidates.append({
+                     "id": item_id,
+                     "distance": min_dist,
+                     "fts_rank": fts_rank_map.get(item_id, 0)  # Use 0 for non-matches (less relevant)
+                 })
+
+         # Stage 3: Sort and Fetch Results
+         scored_candidates.sort(key=lambda x: (x["distance"], x["fts_rank"]))
+         top_ids = [c["id"] for c in scored_candidates[:top_k]]
+         if not top_ids:
+             return []
+
+         id_placeholders = ",".join("?" for _ in top_ids)
+         sql_docs = f"SELECT item_id, item_vector, metadata FROM beaver_collections WHERE collection = ? AND item_id IN ({id_placeholders})"
+         cursor.execute(sql_docs, (self._name, *top_ids))
+         doc_map = {row["item_id"]: Document(id=row["item_id"], embedding=(np.frombuffer(row["item_vector"], dtype=np.float32).tolist() if row["item_vector"] else None), **json.loads(row["metadata"])) for row in cursor.fetchall()}
+
+         final_results = []
+         distance_map = {c["id"]: c["distance"] for c in scored_candidates}
+         for doc_id in top_ids:
+             if doc_id in doc_map:
+                 final_results.append((doc_map[doc_id], float(distance_map[doc_id])))
+         return final_results
+
      def connect(
          self, source: Document, target: Document, label: str, metadata: dict = None
      ):
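Two numeric details of the fuzzy path are easy to miss. First, `_get_trigram_candidates` requires a candidate to share at least `int(0.3 * n)` distinct trigrams with the query, where `n` is the query's distinct-trigram count. Any query with fewer than four distinct trigrams (for example, any query of five or fewer characters) gets a threshold of 0 and contributes no trigram candidates, so only the FTS hits from stage 1 are re-scored, and a short misspelled query can come back empty. Second, candidates are sorted by `(distance, fts_rank)`. FTS5's default `rank` is bm25-based and numerically smaller for better matches, so the `0` assigned to trigram-only candidates places them after FTS hits at the same distance. The minimal sketch below covers both steps with hypothetical function names (`trigram_threshold`, `rank_candidates`) and plain dictionaries in place of database rows.

```python
def trigrams(text: str) -> set[str]:
    """Overlapping 3-character substrings (empty for strings shorter than 3)."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def trigram_threshold(query: str) -> int:
    """Minimum number of shared distinct trigrams: 30% of the query's, floored."""
    return int(len(trigrams(query.lower())) * 0.3)


def rank_candidates(
    distances: dict[str, int],
    fts_rank: dict[str, float],
    fuzziness: int,
    top_k: int,
) -> list[tuple[str, float]]:
    """Keep candidates within `fuzziness`, then sort by (distance, FTS rank)."""
    scored = [
        (item_id, dist, fts_rank.get(item_id, 0))  # 0 for candidates FTS did not return
        for item_id, dist in distances.items()
        if dist <= fuzziness
    ]
    scored.sort(key=lambda c: (c[1], c[2]))
    # The returned score is the edit distance itself: lower means closer.
    return [(item_id, float(dist)) for item_id, dist, _ in scored[:top_k]]


print(trigram_threshold("cat"), trigram_threshold("beaver"))
print(rank_candidates({"a": 2, "b": 1, "c": 1, "d": 4}, {"c": -2.5}, fuzziness=2, top_k=3))
```

Note that the score returned by the fuzzy path is an edit distance, whereas the exact path returns the FTS5 rank; in both cases a smaller value is a better match.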
@@ -36,6 +36,7 @@ class BeaverDB:
          self._create_list_table()
          self._create_collections_table()
          self._create_fts_table()
+         self._create_trigrams_table()
          self._create_edges_table()
          self._create_versions_table()
          self._create_dict_table()
@@ -139,6 +140,27 @@ class BeaverDB:
                  """
              )

+     def _create_trigrams_table(self):
+         """Creates the table for the fuzzy search trigram index."""
+         with self._conn:
+             self._conn.execute(
+                 """
+                 CREATE TABLE IF NOT EXISTS beaver_trigrams (
+                     collection TEXT NOT NULL,
+                     item_id TEXT NOT NULL,
+                     field_path TEXT NOT NULL,
+                     trigram TEXT NOT NULL,
+                     PRIMARY KEY (collection, field_path, trigram, item_id)
+                 )
+                 """
+             )
+             self._conn.execute(
+                 """
+                 CREATE INDEX IF NOT EXISTS idx_trigram_lookup
+                 ON beaver_trigrams (collection, trigram, field_path)
+                 """
+             )
+
      def _create_edges_table(self):
          """Creates the table for storing relationships between documents."""
          with self._conn:
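The new `beaver_trigrams` table stores one row per distinct (field, trigram) pair of each document, and the candidate lookup in `_get_trigram_candidates` is a `GROUP BY` / `HAVING COUNT(DISTINCT trigram)` query over it. The sketch below recreates that schema in an in-memory database with Python's standard `sqlite3` module and runs the same query shape; the sample titles, the `articles` collection, and the `candidates` helper are illustrative. The secondary index leads with `(collection, trigram)`, matching the lookup's `WHERE collection = ? AND trigram IN (...)`, whereas the primary key leads with `(collection, field_path)`. The last call shows a limitation of the approach on this toy data: the transposition `pyhton` shares no trigram with `python`, so it cannot become a trigram candidate at all.

```python
import sqlite3


def trigrams(text: str) -> set[str]:
    """Overlapping 3-character substrings (empty for strings shorter than 3)."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    """
    CREATE TABLE beaver_trigrams (
        collection TEXT NOT NULL,
        item_id TEXT NOT NULL,
        field_path TEXT NOT NULL,
        trigram TEXT NOT NULL,
        PRIMARY KEY (collection, field_path, trigram, item_id)
    )
    """
)
conn.execute("CREATE INDEX idx_trigram_lookup ON beaver_trigrams (collection, trigram, field_path)")

titles = {"d1": "Python tutorial", "d2": "Pythn basics", "d3": "Rust guide"}
for item_id, title in titles.items():
    conn.executemany(
        "INSERT INTO beaver_trigrams (collection, item_id, field_path, trigram) VALUES (?, ?, ?, ?)",
        [("articles", item_id, "title", t) for t in trigrams(title.lower())],
    )


def candidates(query: str, min_shared: int) -> set[str]:
    """Item IDs sharing at least `min_shared` distinct trigrams with the query."""
    q = sorted(trigrams(query.lower()))
    if not q:
        return set()
    placeholders = ",".join("?" for _ in q)
    sql = f"""
        SELECT item_id FROM beaver_trigrams
        WHERE collection = ? AND trigram IN ({placeholders})
        GROUP BY item_id
        HAVING COUNT(DISTINCT trigram) >= ?
    """
    return {row["item_id"] for row in conn.execute(sql, ["articles", *q, min_shared])}


print(candidates("python", 1))  # d1 shares 4 trigrams, d2 shares 2
print(candidates("python", 3))
print(candidates("pyhton", 1))
```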
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: beaver-db
- Version: 0.9.2
+ Version: 0.10.0
  Summary: Fast, embedded, and multi-modal DB based on SQLite for AI-powered applications.
  Requires-Python: >=3.13
  Description-Content-Type: text/markdown
@@ -19,8 +19,9 @@ A fast, single-file, multi-modal database for Python, built with the standard `s

  `beaver` is built with a minimalistic philosophy for small, local use cases where a full-blown database server would be overkill.

- - **Minimalistic & Zero-Dependency**: Uses only Python's standard libraries (`sqlite3`) and `numpy`/`scipy`.
- - **Synchronous & Thread-Safe**: Designed for simplicity and safety in multi-threaded environments.
+ - **Minimalistic**: Uses only Python's standard libraries (`sqlite3`) and `numpy`/`scipy`.
+ - **Schemaless**: Flexible data storage without rigid schemas across all modalities.
+ - **Synchronous, Multi-Process, and Thread-Safe**: Designed for simplicity and safety in multi-threaded and multi-process environments.
  - **Built for Local Applications**: Perfect for local AI tools, RAG prototypes, chatbots, and desktop utilities that need persistent, structured data without network overhead.
  - **Fast by Default**: It's built on SQLite, which is famously fast and reliable for local applications. The vector search is accelerated with an in-memory k-d tree.
  - **Standard Relational Interface**: While `beaver` provides high-level features, you can always use the same SQLite file for normal relational tasks with standard SQL.
@@ -32,7 +33,7 @@ A fast, single-file, multi-modal database for Python, built with the standard `s
  - **Pythonic List Management**: A fluent, Redis-like interface for managing persistent, ordered lists.
  - **Persistent Priority Queue**: A high-performance, persistent queue that always returns the item with the highest priority, perfect for task management.
  - **Efficient Vector Storage & Search**: Store vector embeddings and perform fast approximate nearest neighbor searches using an in-memory k-d tree.
- - **Full-Text Search**: Automatically index and search through document metadata using SQLite's powerful FTS5 engine.
+ - **Full-Text Search and Fuzzy**: Automatically index and search through document metadata using SQLite's powerful FTS5 engine, enhanced with optional fuzzy search.
  - **Graph Traversal**: Create relationships between documents and traverse the graph to find neighbors or perform multi-hop walks.
  - **Single-File & Portable**: All data is stored in a single SQLite file, making it incredibly easy to move, back up, or embed in your application.

@@ -194,14 +195,14 @@ For more in-depth examples, check out the scripts in the `examples/` directory:
  - [`examples/publisher.py`](examples/publisher.py) and [`examples/subscriber.py`](examples/subscriber.py): A pair of examples demonstrating inter-process message passing with the publish/subscribe system.
  - [`examples/cache.py`](examples/cache.py): A practical example of using a dictionary with TTL as a cache for API calls.
  - [`examples/rerank.py`](examples/rerank.py): Shows how to combine results from vector and text search for more refined results.
+ - [`examples/fuzzy.py`](examples/fuzzy.py): Demonstrates fuzzy search capabilities for text search.

  ## Roadmap

  These are some of the features and improvements planned for future releases:

- - **Fuzzy search**: Implement fuzzy matching capabilities for text search.
  - **Faster ANN**: Explore integrating more advanced ANN libraries like `faiss` for improved vector search performance.
- - **Async API**: Comprehensive async support with on-demand wrappers for all collections.
+ - **Full Async API**: Comprehensive async support with on-demand wrappers for all collections.

  Check out the [roadmap](roadmap.md) for a detailed list of upcoming features and design ideas.

@@ -1,6 +1,6 @@
  [project]
  name = "beaver-db"
- version = "0.9.2"
+ version = "0.10.0"
  description = "Fast, embedded, and multi-modal DB based on SQLite for AI-powered applications."
  readme = "README.md"
  requires-python = ">=3.13"