edgevdb 1.0.0__tar.gz

@@ -0,0 +1,4 @@
1
+ recursive-include edgevdb/lib *.so *.dll *.dylib
2
+ include edgevdb/lib/README.md
3
+ include README.md
4
+ include LICENSE
edgevdb-1.0.0/PKG-INFO ADDED
@@ -0,0 +1,666 @@
1
+ Metadata-Version: 2.4
2
+ Name: edgevdb
3
+ Version: 1.0.0
4
+ Summary: EdgeVDB — On-device vector database with HNSW, hybrid retrieval, knowledge graph, and CRDT sync
5
+ Author-email: XformAI <contact@xformai.in>
6
+ License: Apache-2.0
7
+ Project-URL: Homepage, https://github.com/XformAI/EDGEVDB
8
+ Project-URL: Documentation, https://xformai.github.io/EDGEVDB/
9
+ Project-URL: Repository, https://github.com/XformAI/EDGEVDB
10
+ Project-URL: Issues, https://github.com/XformAI/EDGEVDB/issues
11
+ Keywords: vector-database,hnsw,embedding,rag,on-device,edge-ai,semantic-search
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: License :: OSI Approved :: Apache Software License
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Programming Language :: Python :: 3.8
17
+ Classifier: Programming Language :: Python :: 3.9
18
+ Classifier: Programming Language :: Python :: 3.10
19
+ Classifier: Programming Language :: Python :: 3.11
20
+ Classifier: Programming Language :: Python :: 3.12
21
+ Classifier: Programming Language :: Python :: 3.13
22
+ Classifier: Operating System :: Microsoft :: Windows
23
+ Classifier: Operating System :: POSIX :: Linux
24
+ Classifier: Operating System :: MacOS
25
+ Classifier: Topic :: Database
26
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
27
+ Requires-Python: >=3.8
28
+ Description-Content-Type: text/markdown
29
+
30
+ # EdgeVDB Python SDK
31
+
32
+ > **Python wrapper for EdgeVDB on-device vector database with ctypes FFI binding.**
33
+
34
+ The EdgeVDB Python SDK provides a Pythonic interface to the EdgeVDB C++ core library using ctypes. It enables Python applications to use EdgeVDB's vector database capabilities on desktop and Raspberry Pi platforms.
35
+
36
+ ## Features
37
+
38
+ - **ctypes FFI Binding** — Direct calls to the C API with no third-party Python dependencies
39
+ - **Context Manager Support** — Automatic resource cleanup with `with` statements
40
+ - **Type Hints** — Full type annotations for IDE support
41
+ - **Zero Python Dependencies** — Only the Python standard library (which includes `ctypes`)
42
+ - **Cross-Platform** — Linux, macOS, Windows, Raspberry Pi
43
+ - **Flexible Embedding** — Use any embedding provider or built-in ONNX embedder
44
+
45
+ ## Installation
46
+
47
+ ### From Source
48
+
49
+ ```bash
50
+ # Build the C++ core first
51
+ cd ..
52
+ cmake --preset desktop-release
53
+ cmake --build build/desktop-release
54
+
55
+ # Copy shared library to Python package (platform-specific)
56
+ # Linux:
57
+ cp build/desktop-release/core/libedgevdb_shared.so python/edgevdb/lib/linux/
58
+ # macOS:
59
+ # cp build/desktop-release/core/libedgevdb_shared.dylib python/edgevdb/lib/darwin/
60
+ # Windows:
61
+ # copy build\desktop-release\core\edgevdb_shared.dll python\edgevdb\lib\windows\
62
+
63
+ # Install in development mode
64
+ cd python
65
+ pip install -e .
66
+ ```
67
+
68
+ ### From PyPI (after publishing)
69
+
70
+ ```bash
71
+ pip install edgevdb
72
+ ```
73
+
74
+ ## Quick Start
75
+
76
+ ### Without ONNX (Recommended)
77
+
78
+ Use embeddings from any provider (OpenAI, Cohere, sentence-transformers, etc.):
79
+
80
+ ```python
81
+ from edgevdb import EdgeVDB
82
+
83
+ # Open database
84
+ db = EdgeVDB("./my_database")
85
+
86
+ # Get embeddings from your preferred provider
87
+ # Example with sentence-transformers:
88
+ from sentence_transformers import SentenceTransformer
89
+ model = SentenceTransformer('all-MiniLM-L6-v2')
90
+ embedding = model.encode("Machine learning finds patterns in data")
91
+
92
+ # Insert with pre-computed embedding
93
+ chunk_id = db.insert_chunk(
94
+ text="Machine learning finds patterns in data",
95
+ embedding=embedding,
96
+ doc_id=1,
97
+ page_number=0
98
+ )
99
+
100
+ # Query
101
+ query_emb = model.encode("what is ML?")
102
+ results = db.query_vector(query_emb, query_text="what is ML?", top_k=5)
103
+
104
+ for r in results:
105
+     print(f"score={r.score:.3f} text={r.text}")
106
+
107
+ # Object store
108
+ doc_id = db.put_object("Document", {"title": "ML Intro", "author": "Alice"})
109
+ db.add_relation("has_chunk", doc_id, chunk_id)
110
+
111
+ db.save()
112
+ db.close()
113
+ ```
114
+
115
+ ### With Built-in Embedder
116
+
117
+ ```python
118
+ from edgevdb import EdgeVDB, Embedder
119
+
120
+ # Create embedder
121
+ embedder = Embedder(
122
+ model_path="models/model.onnx",
123
+ vocab_path="models/vocab.txt",
124
+ threads=2
125
+ )
126
+
127
+ # Use with context manager
128
+ with EdgeVDB("./my_database") as db:
129
+     # Auto-embed on insert
130
+     chunk_id = db.insert_text(
131
+         embedder,
132
+         "Deep learning uses neural networks",
133
+         doc_id=1,
134
+         page_number=0
135
+     )
136
+
137
+     # Auto-embed on query
138
+     results = db.query_text(embedder, "neural network architecture", top_k=5)
139
+     print(results.context_string)
140
+ ```
141
+
142
+ ## API Reference
143
+
144
+ ### EdgeVDB
145
+
146
+ Main database class.
147
+
148
+ #### Constructor
149
+
150
+ ```python
151
+ EdgeVDB(storage_dir: str, **kwargs)
152
+ ```
153
+
154
+ **Parameters:**
155
+ - `storage_dir` (str): Directory for database files
156
+ - `hnsw_M` (int): HNSW M parameter (default: 16)
157
+ - `hnsw_ef_construction` (int): HNSW ef_construction (default: 200)
158
+ - `hnsw_ef_search` (int): HNSW ef_search (default: 64)
159
+ - `ranker_alpha` (float): Cosine weight (default: 0.70)
160
+ - `ranker_beta` (float): Page proximity weight (default: 0.20)
161
+ - `ranker_gamma` (float): Keyword weight (default: 0.10)
162
+ - `token_budget` (int): Max tokens in context (default: 3200)
163
+ - `embedding_threads` (int): ONNX thread count (default: 2)
164
+ - `enable_knowledge_graph` (bool): Enable KG (default: True)
165
+ - `enable_sync` (bool): Enable sync (default: False)
166
+ - `device_id` (str): Device ID for sync (default: auto-generated)
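
The default `ranker_alpha`, `ranker_beta`, and `ranker_gamma` values sum to 1.0. The exact scoring formula is not stated in this README, so the sketch below only illustrates one plausible reading: a weighted sum of three signals that are each already normalised to [0, 1]. The function `hybrid_score` and its argument names are hypothetical and are not part of the EdgeVDB API.

```python
def hybrid_score(cosine, page_proximity, keyword,
                 alpha=0.70, beta=0.20, gamma=0.10):
    """Assumed weighted sum of three signals in [0, 1].

    This is an illustration of how the three weights could combine;
    it is not confirmed to be EdgeVDB's actual ranking formula.
    """
    return alpha * cosine + beta * page_proximity + gamma * keyword


# A chunk that matches well semantically but sits far from the query page.
print(round(hybrid_score(cosine=0.9, page_proximity=0.1, keyword=0.5), 3))
```

Under this reading, raising `ranker_gamma` at the expense of `ranker_alpha` would favour exact keyword overlap over semantic similarity.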
167
+
168
+ #### Methods
169
+
170
+ ##### Vector Store
171
+
172
+ **insert_chunk(text, embedding, doc_id=0, page_number=0) -> int**
173
+ - Insert text with pre-computed embedding
174
+ - Returns chunk ID
175
+
176
+ ```python
177
+ chunk_id = db.insert_chunk(
178
+ text="Your text here",
179
+ embedding=[0.1, 0.2, ...], # 384-dim float array
180
+ doc_id=1,
181
+ page_number=0
182
+ )
183
+ ```
184
+
185
+ **insert_text(embedder, text, doc_id=0, page_number=0) -> int**
186
+ - Insert text with auto-embedding via embedder
187
+ - Returns chunk ID
188
+
189
+ ```python
190
+ chunk_id = db.insert_text(
191
+ embedder,
192
+ "Your text here",
193
+ doc_id=1,
194
+ page_number=0
195
+ )
196
+ ```
197
+
198
+ **remove_chunk(chunk_id)**
199
+ - Remove chunk by ID
200
+
201
+ ```python
202
+ db.remove_chunk(chunk_id)
203
+ ```
204
+
205
+ **query_vector(embedding, query_text="", top_k=5) -> QueryResults**
206
+ - Query with pre-computed embedding
207
+ - Returns QueryResults object
208
+
209
+ ```python
210
+ results = db.query_vector(
211
+ embedding=[0.1, 0.2, ...],
212
+ query_text="search query",
213
+ top_k=5
214
+ )
215
+ ```
216
+
217
+ **query_text(embedder, query, top_k=5, use_kg_expansion=False) -> QueryResults**
218
+ - Query with auto-embedding via embedder
219
+ - Returns QueryResults object
220
+
221
+ ```python
222
+ results = db.query_text(
223
+ embedder,
224
+ "search query",
225
+ top_k=5,
226
+ use_kg_expansion=False
227
+ )
228
+ ```
229
+
230
+ ##### Object Store
231
+
232
+ **put_object(type_name, properties) -> int**
233
+ - Store JSON object
234
+ - Returns object ID
235
+
236
+ ```python
237
+ doc_id = db.put_object(
238
+ "Document",
239
+ {"title": "My Doc", "author": "Alice"}
240
+ )
241
+ ```
242
+
243
+ **get_object(object_id) -> Optional[Dict]**
244
+ - Retrieve object by ID
245
+ - Returns dict or None if not found
246
+
247
+ ```python
248
+ obj = db.get_object(doc_id)
249
+ if obj:
250
+     print(obj["title"])
251
+ ```
252
+
253
+ **remove_object(object_id)**
254
+ - Soft delete object
255
+
256
+ ```python
257
+ db.remove_object(doc_id)
258
+ ```
259
+
260
+ ##### Relations
261
+
262
+ **add_relation(name, from_id, to_id)**
263
+ - Add typed edge between objects
264
+
265
+ ```python
266
+ db.add_relation("has_chunk", doc_id, chunk_id)
267
+ ```
268
+
269
+ ##### Lifecycle
270
+
271
+ **save()**
272
+ - Flush all data to disk
273
+
274
+ ```python
275
+ db.save()
276
+ ```
277
+
278
+ **close()**
279
+ - Release native resources
280
+
281
+ ```python
282
+ db.close()
283
+ ```
284
+
285
+ **Context Manager**
286
+
287
+ ```python
288
+ with EdgeVDB("./data") as db:
289
+     # Auto-save and close on exit
290
+     db.insert_chunk("text", embedding, doc_id=1)
291
+ ```
292
+
293
+ ### Embedder
294
+
295
+ ONNX embedding model wrapper.
296
+
297
+ #### Constructor
298
+
299
+ ```python
300
+ Embedder(model_path: str, vocab_path: str, threads: int = 2)
301
+ ```
302
+
303
+ **Parameters:**
304
+ - `model_path` (str): Path to ONNX model file
305
+ - `vocab_path` (str): Path to vocabulary file
306
+ - `threads` (int): Number of inference threads (default: 2)
307
+
308
+ #### Methods
309
+
310
+ **embed(text: str) -> List[float]**
311
+ - Embed text to 384-dim vector
312
+ - Returns list of floats
313
+
314
+ ```python
315
+ embedding = embedder.embed("Hello world")
316
+ ```
317
+
318
+ **destroy()**
319
+ - Release native resources
320
+
321
+ ```python
322
+ embedder.destroy()
323
+ ```
324
+
325
+ ### QueryResults
326
+
327
+ Query result container with lazy access.
328
+
329
+ #### Properties
330
+
331
+ **count** (int): Number of results
332
+
333
+ ```python
334
+ print(f"Found {results.count} results")
335
+ ```
336
+
337
+ **context_string** (str): Pre-assembled RAG context
338
+
339
+ ```python
340
+ print(results.context_string)
341
+ ```
342
+
343
+ #### Methods
344
+
345
+ **__getitem__(index) -> ChunkResult**
346
+ - Access individual result by index
347
+
348
+ ```python
349
+ result = results[0]
350
+ print(result.text)
351
+ ```
352
+
353
+ **__iter__()**
354
+ - Iterate over results
355
+
356
+ ```python
357
+ for r in results:
358
+     print(f"{r.score}: {r.text}")
359
+ ```
360
+
361
+ **to_list() -> List[ChunkResult]**
362
+ - Convert to list
363
+
364
+ ```python
365
+ results_list = results.to_list()
366
+ ```
367
+
368
+ **free()**
369
+ - Free native query handle (called automatically by __del__)
370
+
371
+ ```python
372
+ results.free()
373
+ ```
374
+
375
+ ### ChunkResult
376
+
377
+ Single query result.
378
+
379
+ #### Attributes
380
+
381
+ - **chunk_id** (int): Unique chunk identifier
382
+ - **text** (str): Chunk text content
383
+ - **score** (float): Hybrid similarity score in the range [0.0, 1.0]
384
+ - **page_number** (int): Page number in document
385
+ - **doc_id** (int): Document identifier
386
+
387
+ ```python
388
+ for r in results:
389
+     print(f"ID: {r.chunk_id}")
390
+     print(f"Text: {r.text}")
391
+     print(f"Score: {r.score:.3f}")
392
+     print(f"Page: {r.page_number}")
393
+ ```
394
+
395
+ ## Examples
396
+
397
+ ### RAG Pipeline
398
+
399
+ ```python
400
+ from edgevdb import EdgeVDB
401
+ from sentence_transformers import SentenceTransformer
402
+
403
+ # Initialize
404
+ model = SentenceTransformer('all-MiniLM-L6-v2')
405
+ db = EdgeVDB("./rag_database")
406
+
407
+ # Index documents
408
+ documents = [
409
+ {"id": 1, "text": "Python is a high-level programming language."},
410
+ {"id": 2, "text": "Machine learning is a subset of AI."},
411
+ {"id": 3, "text": "Vector databases enable semantic search."},
412
+ ]
413
+
414
+ for doc in documents:
415
+     embedding = model.encode(doc["text"])
416
+     db.insert_chunk(doc["text"], embedding, doc_id=doc["id"])
417
+
418
+ # Query
419
+ query = "What is semantic search?"
420
+ query_emb = model.encode(query)
421
+ results = db.query_vector(query_emb, query_text=query, top_k=2)
422
+
423
+ # Assemble context
424
+ context = results.context_string
425
+ print(f"Context: {context}")
426
+
427
+ db.save()
428
+ db.close()
429
+ ```
430
+
431
+ ### Object Store + Relations
432
+
433
+ ```python
434
+ from edgevdb import EdgeVDB
435
+
436
+ db = EdgeVDB("./my_database")
437
+
438
+ # Store documents
439
+ doc1_id = db.put_object("Document", {
440
+ "title": "Introduction to ML",
441
+ "author": "Alice",
442
+ "year": 2024
443
+ })
444
+
445
+ doc2_id = db.put_object("Document", {
446
+ "title": "Advanced Topics",
447
+ "author": "Bob",
448
+ "year": 2024
449
+ })
450
+
451
+ # Store chunks with embeddings (`emb` is a 384-dim embedding computed beforehand)
452
+ chunk1_id = db.insert_chunk("ML is fascinating", emb, doc_id=doc1_id)
453
+ chunk2_id = db.insert_chunk("Deep learning is powerful", emb, doc_id=doc2_id)
454
+
455
+ # Link chunks to documents
456
+ db.add_relation("has_chunk", doc1_id, chunk1_id)
457
+ db.add_relation("has_chunk", doc2_id, chunk2_id)
458
+
459
+ db.save()
460
+ db.close()
461
+ ```
462
+
463
+ ### Error Handling
464
+
465
+ ```python
466
+ from edgevdb import EdgeVDB, set_log_level
467
+
468
+ # Enable debug logging
469
+ set_log_level(3)
470
+
471
+ try:
472
+     db = EdgeVDB("./my_database")
473
+
474
+     # Operations (`embedding` is a precomputed 384-dim vector)
475
+     chunk_id = db.insert_chunk("text", embedding, doc_id=1)
476
+
477
+     # A missing object returns None (it does not raise)
478
+     obj = db.get_object(999)
479
+     if obj is None:
480
+         print("Object not found")
481
+
482
+     db.save()
483
+     db.close()
484
+
485
+ except RuntimeError as e:
486
+     print(f"EdgeVDB error: {e}")
487
+ ```
488
+
489
+ ## Library Discovery
490
+
491
+ The Python SDK automatically searches for the EdgeVDB shared library in the following locations:
492
+
493
+ 1. Platform-specific directory (`edgevdb/lib/<platform>/`) — **preferred**
494
+ 2. Package lib directory (`edgevdb/lib/`)
495
+ 3. Package directory (`edgevdb/`)
496
+ 4. Current working directory
497
+ 5. `build/desktop-release/core/`
498
+ 6. `build/desktop-debug/core/`
499
+
500
+ **Library Layout:**
501
+ ```
502
+ python/edgevdb/lib/
503
+   linux/   → libedgevdb_shared.so
504
+   darwin/  → libedgevdb_shared.dylib
505
+   windows/ → edgevdb_shared.dll, libedgevdb_shared.dll
506
+ ```
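
The search order and layout above can be sketched as a small path resolver. This is only an illustration of the documented order, not the SDK's actual loader: the names `candidate_library_paths` and `find_library` are hypothetical, the mapping from `platform.system()` to directory names is an assumption, and the two `build/` directories are assumed to be relative to the current working directory.

```python
import platform
from pathlib import Path

# Assumed mapping from platform.system() to the directory and file names
# shown in the layout above; the SDK's real loader may differ.
_PLATFORM_LIBS = {
    "Linux": ("linux", ["libedgevdb_shared.so"]),
    "Darwin": ("darwin", ["libedgevdb_shared.dylib"]),
    "Windows": ("windows", ["edgevdb_shared.dll", "libedgevdb_shared.dll"]),
}


def candidate_library_paths(package_dir, cwd, system=None):
    """Return candidate library paths in the documented search order."""
    system = system or platform.system()
    subdir, names = _PLATFORM_LIBS[system]
    package_dir, cwd = Path(package_dir), Path(cwd)
    search_dirs = [
        package_dir / "lib" / subdir,                 # 1. platform-specific directory
        package_dir / "lib",                          # 2. package lib directory
        package_dir,                                  # 3. package directory
        cwd,                                          # 4. current working directory
        cwd / "build" / "desktop-release" / "core",   # 5. release build output
        cwd / "build" / "desktop-debug" / "core",     # 6. debug build output
    ]
    return [d / name for d in search_dirs for name in names]


def find_library(package_dir, cwd, system=None):
    """Return the first candidate that exists on disk, or None."""
    for path in candidate_library_paths(package_dir, cwd, system):
        if path.is_file():
            return path
    return None
```

Printing `candidate_library_paths(...)` for your own package directory is a quick way to check where a freshly built library needs to be copied.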
507
+
508
+ ## Performance Considerations
509
+
510
+ ### Embedding Provider Choice
511
+
512
+ | Provider | Speed | Quality | Offline | Cost |
513
+ |----------|-------|--------|---------|------|
514
+ | sentence-transformers | Fast | Good | ✅ | Free |
515
+ | OpenAI API | Slow | Excellent | ❌ | Paid |
516
+ | Cohere API | Medium | Good | ❌ | Paid |
517
+ | Built-in ONNX | Medium | Good | ✅ | Free |
518
+
519
+ ### Batch Operations
520
+
521
+ For large-scale operations, consider batching:
522
+
523
+ ```python
524
+ # Batch insert
525
+ embeddings = model.encode(texts)
526
+ for text, emb in zip(texts, embeddings):
527
+     db.insert_chunk(text, emb, doc_id=doc_id)
528
+
529
+ db.save() # Save once after all inserts
530
+ ```
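
When the full list of texts is too large to encode in one call, the inputs can be split into fixed-size batches first. The helper below is a minimal sketch in plain Python; `batched` is a hypothetical name and is not provided by EdgeVDB.

```python
def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


# Intended use with the objects from the example above (not executed here):
# for batch in batched(texts, 256):
#     for text, emb in zip(batch, model.encode(batch)):
#         db.insert_chunk(text, emb, doc_id=doc_id)
# db.save()
```

The batch size of 256 is an arbitrary illustration; a suitable value depends on the embedding model and available memory.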
531
+
532
+ ### Memory Management
533
+
534
+ - Query results hold native handles; call `results.free()` or use context manager
535
+ - Embedders hold native resources; call `embedder.destroy()` when done
536
+ - Database handles are released by `close()` or context manager
537
+
538
+ ## Platform-Specific Notes
539
+
540
+ ### Linux
541
+
542
+ ```bash
543
+ # Build
544
+ cmake --preset desktop-release
545
+ cmake --build build/desktop-release
546
+
547
+ # Install
548
+ cp build/desktop-release/core/libedgevdb_shared.so python/edgevdb/lib/linux/
549
+ pip install -e python/
550
+ ```
551
+
552
+ ### macOS
553
+
554
+ ```bash
555
+ # Build
556
+ cmake --preset desktop-release
557
+ cmake --build build/desktop-release
558
+
559
+ # Install
560
+ cp build/desktop-release/core/libedgevdb_shared.dylib python/edgevdb/lib/darwin/
561
+ pip install -e python/
562
+ ```
563
+
564
+ ### Windows
565
+
566
+ ```powershell
567
+ # Build
568
+ cmake --preset desktop-release
569
+ cmake --build build/desktop-release
570
+
571
+ # Install
572
+ copy build\desktop-release\core\edgevdb_shared.dll python\edgevdb\lib\windows\
573
+ pip install -e python\
574
+ ```
575
+
576
+ ### Raspberry Pi
577
+
578
+ ```bash
579
+ # Build with NEON support
580
+ cmake --preset desktop-release
581
+ cmake --build build/desktop-release
582
+
583
+ # Install
584
+ cp build/desktop-release/core/libedgevdb_shared.so python/edgevdb/lib/linux/
585
+ pip install -e python/
586
+ ```
587
+
588
+ ## Testing
589
+
590
+ ```bash
591
+ cd python
592
+
593
+ # Run tests
594
+ python -m unittest tests.test_edgevdb -v
595
+
596
+ # Or with pytest
597
+ pytest tests/ -v
598
+ ```
599
+
600
+ ## Troubleshooting
601
+
602
+ ### Library Not Found
603
+
604
+ **Error:** `FileNotFoundError: Could not find EdgeVDB library`
605
+
606
+ **Solution:**
607
+ 1. Build the C++ core: `cmake --preset desktop-release && cmake --build build/desktop-release`
608
+ 2. Copy the shared library to `python/edgevdb/lib/<platform>/`
609
+ 3. Verify the library name matches your platform
610
+
611
+ ### Import Errors
612
+
613
+ **Error:** `ImportError: dynamic module does not define init function`
614
+
615
+ **Solution:**
616
+ - Ensure the shared library was built for your platform
617
+ - Check Python architecture matches library (32-bit vs 64-bit)
618
+ - Rebuild the C++ core for your platform
619
+
620
+ ### Segmentation Faults
621
+
622
+ **Error:** Python crashes with segmentation fault
623
+
624
+ **Solution:**
625
+ - Ensure you're using the correct library version
626
+ - Check that you're not accessing freed handles
627
+ - Verify embedding dimensions are exactly 384
628
+ - Enable debug logging: `set_log_level(3)`
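
The dimension check in the list above can be done in Python before a vector ever reaches the native library. This is a minimal sketch; `validate_embedding` is a hypothetical helper, not part of the EdgeVDB API, and the 384 figure is taken from this README.

```python
import math

EXPECTED_DIM = 384  # dimension stated in this README


def validate_embedding(embedding, expected_dim=EXPECTED_DIM):
    """Return the embedding as a list of floats, or raise ValueError."""
    values = [float(x) for x in embedding]
    if len(values) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(values)}"
        )
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinite values")
    return values
```

Calling this helper on every vector before `insert_chunk` or `query_vector` turns a potential crash into an ordinary Python exception that names the problem.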
629
+
630
+ ## Contributing
631
+
632
+ ### Development Setup
633
+
634
+ ```bash
635
+ # Build C++ core in debug mode
636
+ cmake --preset desktop-debug
637
+ cmake --build build/desktop-debug
638
+
639
+ # Copy debug library
640
+ cp build/desktop-debug/core/libedgevdb_shared.so python/edgevdb/
641
+
642
+ # Install in development mode
643
+ cd python
644
+ pip install -e .
645
+ ```
646
+
647
+ ### Running Tests
648
+
649
+ ```bash
650
+ cd python
651
+ python -m unittest tests.test_edgevdb -v
652
+ ```
653
+
654
+ ### Code Style
655
+
656
+ - Follow PEP 8
657
+ - Use type hints
658
+ - Add docstrings for public APIs
659
+ - Run black and flake8
660
+
661
+ ## See Also
662
+
663
+ - [../README.md](../README.md) — Project overview
664
+ - [../../DEVELOPER_GUIDE.md](../../DEVELOPER_GUIDE.md) — Build and integration guide
665
+ - [../../docs/python_integration.md](../../docs/python_integration.md) — Python integration guide
666
+ - [examples/](examples/) — Example scripts