edgevdb 0.1.0__tar.gz

@@ -0,0 +1,4 @@
1
+ recursive-include edgevdb/lib *.so *.dll *.dylib
2
+ include edgevdb/lib/README.md
3
+ include README.md
4
+ include LICENSE
edgevdb-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,668 @@
1
+ Metadata-Version: 2.4
2
+ Name: edgevdb
3
+ Version: 0.1.0
4
+ Summary: EdgeVDB — On-device vector database with HNSW, hybrid retrieval, knowledge graph, and CRDT sync
5
+ Author-email: XformAI <contact@xformai.in>
6
+ License: Apache-2.0
7
+ Project-URL: Homepage, https://github.com/XformAI/EDGEVDB
8
+ Project-URL: Documentation, https://xformai.github.io/EDGEVDB/
9
+ Project-URL: Repository, https://github.com/XformAI/EDGEVDB
10
+ Project-URL: Issues, https://github.com/XformAI/EDGEVDB/issues
11
+ Keywords: vector-database,hnsw,embedding,rag,on-device,edge-ai,semantic-search
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: License :: OSI Approved :: Apache Software License
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Programming Language :: Python :: 3.8
17
+ Classifier: Programming Language :: Python :: 3.9
18
+ Classifier: Programming Language :: Python :: 3.10
19
+ Classifier: Programming Language :: Python :: 3.11
20
+ Classifier: Programming Language :: Python :: 3.12
21
+ Classifier: Programming Language :: Python :: 3.13
22
+ Classifier: Operating System :: Microsoft :: Windows
23
+ Classifier: Operating System :: POSIX :: Linux
24
+ Classifier: Operating System :: MacOS
25
+ Classifier: Topic :: Database
26
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
27
+ Requires-Python: >=3.8
28
+ Description-Content-Type: text/markdown
29
+
30
+ # EdgeVDB Python SDK
31
+
32
+ > **Python wrapper for the EdgeVDB on-device vector database, built on a ctypes FFI binding.**
33
+
34
+ The EdgeVDB Python SDK provides a Pythonic interface to the EdgeVDB C++ core library using ctypes. It enables Python applications to use EdgeVDB's vector database capabilities on desktop and Raspberry Pi platforms.
35
+
36
+ ## Features
37
+
38
+ - **ctypes FFI Binding** — Direct calls to C API with no Python dependencies
39
+ - **Context Manager Support** — Automatic resource cleanup with `with` statements
40
+ - **Type Hints** — Full type annotations for IDE support
41
+ - **Zero Python Dependencies** — Only standard library and ctypes
42
+ - **Cross-Platform** — Linux, macOS, Windows, Raspberry Pi
43
+ - **Flexible Embedding** — Use any embedding provider or built-in ONNX embedder
44
+
45
+ ## Installation
46
+
47
+ ### From PyPI (Recommended)
48
+
49
+ ```bash
50
+ pip install edgevdb
51
+ ```
52
+
53
+ Pre-built wheels include native libraries for **Linux** (x86_64, glibc 2.28+), **macOS** (arm64/x86_64), and **Windows** (x86_64).
54
+
55
+ ### From Source
56
+
57
+ ```bash
58
+ # Build the C++ core first
59
+ cd ..
60
+ cmake --preset desktop-release
61
+ cmake --build build/desktop-release
62
+
63
+ # Copy shared library to Python package (platform-specific)
64
+ # Linux:
65
+ cp build/desktop-release/core/libedgevdb_shared.so python/edgevdb/lib/linux/
66
+ # macOS:
67
+ # cp build/desktop-release/core/libedgevdb_shared.dylib python/edgevdb/lib/darwin/
68
+ # Windows:
69
+ # copy build\desktop-release\core\edgevdb_shared.dll python\edgevdb\lib\windows\
70
+
71
+ # Install in development mode
72
+ cd python
73
+ pip install -e .
74
+ ```
75
+
76
+ ## Quick Start
77
+
78
+ ### Without ONNX (Recommended)
79
+
80
+ Use embeddings from any provider (OpenAI, Cohere, sentence-transformers, etc.):
81
+
82
+ ```python
83
+ from edgevdb import EdgeVDB
84
+
85
+ # Open database
86
+ db = EdgeVDB("./my_database")
87
+
88
+ # Get embeddings from your preferred provider
89
+ # Example with sentence-transformers:
90
+ from sentence_transformers import SentenceTransformer
91
+ model = SentenceTransformer('all-MiniLM-L6-v2')
92
+ embedding = model.encode("Machine learning finds patterns in data")
93
+
94
+ # Insert with pre-computed embedding
95
+ chunk_id = db.insert_chunk(
96
+ text="Machine learning finds patterns in data",
97
+ embedding=embedding,
98
+ doc_id=1,
99
+ page_number=0
100
+ )
101
+
102
+ # Query
103
+ query_emb = model.encode("what is ML?")
104
+ results = db.query_vector(query_emb, query_text="what is ML?", top_k=5)
105
+
106
+ for r in results:
107
+     print(f"score={r.score:.3f} text={r.text}")
108
+
109
+ # Object store
110
+ doc_id = db.put_object("Document", {"title": "ML Intro", "author": "Alice"})
111
+ db.add_relation("has_chunk", doc_id, chunk_id)
112
+
113
+ db.save()
114
+ db.close()
115
+ ```
116
+
117
+ ### With Built-in Embedder
118
+
119
+ ```python
120
+ from edgevdb import EdgeVDB, Embedder
121
+
122
+ # Create embedder
123
+ embedder = Embedder(
124
+ model_path="models/model.onnx",
125
+ vocab_path="models/vocab.txt",
126
+ threads=2
127
+ )
128
+
129
+ # Use with context manager
130
+ with EdgeVDB("./my_database") as db:
131
+     # Auto-embed on insert
132
+     chunk_id = db.insert_text(
133
+         embedder,
134
+         "Deep learning uses neural networks",
135
+         doc_id=1,
136
+         page_number=0
137
+     )
138
+
139
+     # Auto-embed on query
140
+     results = db.query_text(embedder, "neural network architecture", top_k=5)
141
+     print(results.context_string)
142
+ ```
143
+
144
+ ## API Reference
145
+
146
+ ### EdgeVDB
147
+
148
+ Main database class.
149
+
150
+ #### Constructor
151
+
152
+ ```python
153
+ EdgeVDB(storage_dir: str, **kwargs)
154
+ ```
155
+
156
+ **Parameters:**
157
+ - `storage_dir` (str): Directory for database files
158
+ - `hnsw_M` (int): HNSW M parameter (default: 16)
159
+ - `hnsw_ef_construction` (int): HNSW ef_construction (default: 200)
160
+ - `hnsw_ef_search` (int): HNSW ef_search (default: 64)
161
+ - `ranker_alpha` (float): Cosine weight (default: 0.70)
162
+ - `ranker_beta` (float): Page proximity weight (default: 0.20)
163
+ - `ranker_gamma` (float): Keyword weight (default: 0.10)
164
+ - `token_budget` (int): Max tokens in context (default: 3200)
165
+ - `embedding_threads` (int): ONNX thread count (default: 2)
166
+ - `enable_knowledge_graph` (bool): Enable KG (default: True)
167
+ - `enable_sync` (bool): Enable sync (default: False)
168
+ - `device_id` (str): Device ID for sync (default: auto-generated)
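The three `ranker_*` defaults sum to 1.0, which suggests a weighted blend of cosine similarity, page proximity, and keyword match. The sketch below illustrates that reading only. The linear form, the function name `hybrid_score`, and the assumption that each signal is already normalised to [0, 1] are not confirmed by this README.

```python
def hybrid_score(cosine, page_proximity, keyword,
                 alpha=0.70, beta=0.20, gamma=0.10):
    """Hypothetical linear blend of three signals, each assumed to lie in [0, 1]."""
    return alpha * cosine + beta * page_proximity + gamma * keyword


# A chunk that matches well semantically but sits far from the reference page.
print(round(hybrid_score(cosine=0.9, page_proximity=0.1, keyword=0.5), 3))
```

Under this reading, raising `ranker_gamma` at the expense of `ranker_alpha` would favour exact keyword hits over semantic similarity.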
169
+
170
+ #### Methods
171
+
172
+ ##### Vector Store
173
+
174
+ **insert_chunk(text, embedding, doc_id=0, page_number=0) -> int**
175
+ - Insert text with pre-computed embedding
176
+ - Returns chunk ID
177
+
178
+ ```python
179
+ chunk_id = db.insert_chunk(
180
+ text="Your text here",
181
+ embedding=[0.1, 0.2, ...], # 384-dim float array
182
+ doc_id=1,
183
+ page_number=0
184
+ )
185
+ ```
186
+
187
+ **insert_text(embedder, text, doc_id=0, page_number=0) -> int**
188
+ - Insert text with auto-embedding via embedder
189
+ - Returns chunk ID
190
+
191
+ ```python
192
+ chunk_id = db.insert_text(
193
+ embedder,
194
+ "Your text here",
195
+ doc_id=1,
196
+ page_number=0
197
+ )
198
+ ```
199
+
200
+ **remove_chunk(chunk_id)**
201
+ - Remove chunk by ID
202
+
203
+ ```python
204
+ db.remove_chunk(chunk_id)
205
+ ```
206
+
207
+ **query_vector(embedding, query_text="", top_k=5) -> QueryResults**
208
+ - Query with pre-computed embedding
209
+ - Returns QueryResults object
210
+
211
+ ```python
212
+ results = db.query_vector(
213
+ embedding=[0.1, 0.2, ...],
214
+ query_text="search query",
215
+ top_k=5
216
+ )
217
+ ```
218
+
219
+ **query_text(embedder, query, top_k=5, use_kg_expansion=False) -> QueryResults**
220
+ - Query with auto-embedding via embedder
221
+ - Returns QueryResults object
222
+
223
+ ```python
224
+ results = db.query_text(
225
+ embedder,
226
+ "search query",
227
+ top_k=5,
228
+ use_kg_expansion=False
229
+ )
230
+ ```
231
+
232
+ ##### Object Store
233
+
234
+ **put_object(type_name, properties) -> int**
235
+ - Store JSON object
236
+ - Returns object ID
237
+
238
+ ```python
239
+ doc_id = db.put_object(
240
+ "Document",
241
+ {"title": "My Doc", "author": "Alice"}
242
+ )
243
+ ```
244
+
245
+ **get_object(object_id) -> Optional[Dict]**
246
+ - Retrieve object by ID
247
+ - Returns dict or None if not found
248
+
249
+ ```python
250
+ obj = db.get_object(doc_id)
251
+ if obj:
252
+     print(obj["title"])
253
+ ```
254
+
255
+ **remove_object(object_id)**
256
+ - Soft delete object
257
+
258
+ ```python
259
+ db.remove_object(doc_id)
260
+ ```
261
+
262
+ ##### Relations
263
+
264
+ **add_relation(name, from_id, to_id)**
265
+ - Add typed edge between objects
266
+
267
+ ```python
268
+ db.add_relation("has_chunk", doc_id, chunk_id)
269
+ ```
270
+
271
+ ##### Lifecycle
272
+
273
+ **save()**
274
+ - Flush all data to disk
275
+
276
+ ```python
277
+ db.save()
278
+ ```
279
+
280
+ **close()**
281
+ - Release native resources
282
+
283
+ ```python
284
+ db.close()
285
+ ```
286
+
287
+ **Context Manager**
288
+
289
+ ```python
290
+ with EdgeVDB("./data") as db:
291
+     # Saved and closed automatically on exit; `embedding` is a pre-computed 384-dim vector
292
+     db.insert_chunk("text", embedding, doc_id=1)
293
+ ```
294
+
295
+ ### Embedder
296
+
297
+ ONNX embedding model wrapper.
298
+
299
+ #### Constructor
300
+
301
+ ```python
302
+ Embedder(model_path: str, vocab_path: str, threads: int = 2)
303
+ ```
304
+
305
+ **Parameters:**
306
+ - `model_path` (str): Path to ONNX model file
307
+ - `vocab_path` (str): Path to vocabulary file
308
+ - `threads` (int): Number of inference threads (default: 2)
309
+
310
+ #### Methods
311
+
312
+ **embed(text: str) -> List[float]**
313
+ - Embed text to 384-dim vector
314
+ - Returns list of floats
315
+
316
+ ```python
317
+ embedding = embedder.embed("Hello world")
318
+ ```
319
+
320
+ **destroy()**
321
+ - Release native resources
322
+
323
+ ```python
324
+ embedder.destroy()
325
+ ```
326
+
327
+ ### QueryResults
328
+
329
+ Query result container with lazy access.
330
+
331
+ #### Properties
332
+
333
+ **count** (int): Number of results
334
+
335
+ ```python
336
+ print(f"Found {results.count} results")
337
+ ```
338
+
339
+ **context_string** (str): Pre-assembled RAG context
340
+
341
+ ```python
342
+ print(results.context_string)
343
+ ```
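`context_string` is described as a pre-assembled RAG context, and the constructor exposes a `token_budget`. One plausible way such a context could be built is sketched below: take chunks in descending score order and keep those that still fit. This is an illustration, not the library's algorithm. The helper `assemble_context` is hypothetical, and counting whitespace-separated words is a crude stand-in for real tokenisation.

```python
def assemble_context(chunks, token_budget=3200):
    """Greedily join the highest-scoring chunk texts without exceeding the budget.

    chunks: iterable of (score, text) pairs. Tokens are approximated by
    whitespace-separated words, which is an assumption for illustration.
    """
    picked, used = [], 0
    for _score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = len(text.split())
        if used + cost > token_budget:
            continue  # this chunk would overflow; a smaller one may still fit
        picked.append(text)
        used += cost
    return "\n\n".join(picked)


chunks = [
    (0.91, "Vector databases enable semantic search."),
    (0.42, "Python is a high-level programming language."),
    (0.77, "Machine learning is a subset of AI."),
]
print(assemble_context(chunks, token_budget=12))
```

With a budget of 12 words, the two highest-scoring chunks (5 and 7 words) fit exactly and the third is skipped.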
344
+
345
+ #### Methods
346
+
347
+ **__getitem__(index) -> ChunkResult**
348
+ - Access individual result by index
349
+
350
+ ```python
351
+ result = results[0]
352
+ print(result.text)
353
+ ```
354
+
355
+ **__iter__()**
356
+ - Iterate over results
357
+
358
+ ```python
359
+ for r in results:
360
+     print(f"{r.score}: {r.text}")
361
+ ```
362
+
363
+ **to_list() -> List[ChunkResult]**
364
+ - Convert to list
365
+
366
+ ```python
367
+ results_list = results.to_list()
368
+ ```
369
+
370
+ **free()**
371
+ - Free native query handle (called automatically by __del__)
372
+
373
+ ```python
374
+ results.free()
375
+ ```
376
+
377
+ ### ChunkResult
378
+
379
+ Single query result.
380
+
381
+ #### Attributes
382
+
383
+ - **chunk_id** (int): Unique chunk identifier
384
+ - **text** (str): Chunk text content
385
+ - **score** (float): Hybrid similarity score in the range [0.0, 1.0]
386
+ - **page_number** (int): Page number in document
387
+ - **doc_id** (int): Document identifier
388
+
389
+ ```python
390
+ for r in results:
391
+     print(f"ID: {r.chunk_id}")
392
+     print(f"Text: {r.text}")
393
+     print(f"Score: {r.score:.3f}")
394
+     print(f"Page: {r.page_number}")
395
+ ```
396
+
397
+ ## Examples
398
+
399
+ ### RAG Pipeline
400
+
401
+ ```python
402
+ from edgevdb import EdgeVDB
403
+ from sentence_transformers import SentenceTransformer
404
+
405
+ # Initialize
406
+ model = SentenceTransformer('all-MiniLM-L6-v2')
407
+ db = EdgeVDB("./rag_database")
408
+
409
+ # Index documents
410
+ documents = [
411
+ {"id": 1, "text": "Python is a high-level programming language."},
412
+ {"id": 2, "text": "Machine learning is a subset of AI."},
413
+ {"id": 3, "text": "Vector databases enable semantic search."},
414
+ ]
415
+
416
+ for doc in documents:
417
+     embedding = model.encode(doc["text"])
418
+     db.insert_chunk(doc["text"], embedding, doc_id=doc["id"])
419
+
420
+ # Query
421
+ query = "What is semantic search?"
422
+ query_emb = model.encode(query)
423
+ results = db.query_vector(query_emb, query_text=query, top_k=2)
424
+
425
+ # Assemble context
426
+ context = results.context_string
427
+ print(f"Context: {context}")
428
+
429
+ db.save()
430
+ db.close()
431
+ ```
432
+
433
+ ### Object Store + Relations
434
+
435
+ ```python
436
+ from edgevdb import EdgeVDB
437
+
438
+ db = EdgeVDB("./my_database")
439
+
440
+ # Store documents
441
+ doc1_id = db.put_object("Document", {
442
+ "title": "Introduction to ML",
443
+ "author": "Alice",
444
+ "year": 2024
445
+ })
446
+
447
+ doc2_id = db.put_object("Document", {
448
+ "title": "Advanced Topics",
449
+ "author": "Bob",
450
+ "year": 2024
451
+ })
452
+
453
+ # Store chunks with embeddings (`emb` is a pre-computed 384-dim vector, e.g. from model.encode)
454
+ chunk1_id = db.insert_chunk("ML is fascinating", emb, doc_id=doc1_id)
455
+ chunk2_id = db.insert_chunk("Deep learning is powerful", emb, doc_id=doc2_id)
456
+
457
+ # Link chunks to documents
458
+ db.add_relation("has_chunk", doc1_id, chunk1_id)
459
+ db.add_relation("has_chunk", doc2_id, chunk2_id)
460
+
461
+ db.save()
462
+ db.close()
463
+ ```
464
+
465
+ ### Error Handling
466
+
467
+ ```python
468
+ from edgevdb import EdgeVDB, set_log_level
469
+
470
+ # Enable debug logging
471
+ set_log_level(3)
472
+
473
+ try:
474
+     db = EdgeVDB("./my_database")
475
+
476
+     # Operations (`embedding` is a pre-computed 384-dim vector)
477
+     chunk_id = db.insert_chunk("text", embedding, doc_id=1)
478
+
479
+     # A missing object returns None rather than raising
480
+     obj = db.get_object(999)
481
+     if obj is None:
482
+         print("Object not found")
483
+
484
+     db.save()
485
+     db.close()
486
+
487
+ except RuntimeError as e:
488
+     print(f"EdgeVDB error: {e}")
489
+ ```
490
+
491
+ ## Library Discovery
492
+
493
+ The Python SDK automatically searches for the EdgeVDB shared library in the following locations:
494
+
495
+ 1. Platform-specific directory (`edgevdb/lib/<platform>/`) — **preferred**
496
+ 2. Package lib directory (`edgevdb/lib/`)
497
+ 3. Package directory (`edgevdb/`)
498
+ 4. Current working directory
499
+ 5. `build/desktop-release/core/`
500
+ 6. `build/desktop-debug/core/`
501
+
502
+ **Library Layout:**
503
+ ```
504
+ python/edgevdb/lib/
505
+   linux/    → libedgevdb_shared.so
506
+   darwin/   → libedgevdb_shared.dylib
507
+   windows/  → edgevdb_shared.dll, libedgevdb_shared.dll
508
+ ```
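The search order and layout above can be sketched with the standard library alone. This is an illustration of the documented order, not the SDK's actual loader. The names `candidate_paths` and `find_library` and the mapping from `platform.system()` values to directories are assumptions.

```python
import platform
from pathlib import Path

# Assumed mapping from platform.system() to the directory and file names
# shown in the library layout.
_LIB_NAMES = {
    "Linux": ("linux", ["libedgevdb_shared.so"]),
    "Darwin": ("darwin", ["libedgevdb_shared.dylib"]),
    "Windows": ("windows", ["edgevdb_shared.dll", "libedgevdb_shared.dll"]),
}


def candidate_paths(package_dir, system=None):
    """Return library paths to try, in the documented search order."""
    subdir, names = _LIB_NAMES[system or platform.system()]
    pkg = Path(package_dir)
    dirs = [
        pkg / "lib" / subdir,                 # 1. platform-specific directory
        pkg / "lib",                          # 2. package lib directory
        pkg,                                  # 3. package directory
        Path.cwd(),                           # 4. current working directory
        Path("build/desktop-release/core"),   # 5. release build output
        Path("build/desktop-debug/core"),     # 6. debug build output
    ]
    return [d / name for d in dirs for name in names]


def find_library(package_dir, system=None):
    """Return the first candidate that exists, or raise FileNotFoundError."""
    for path in candidate_paths(package_dir, system):
        if path.is_file():
            return path
    raise FileNotFoundError("Could not find EdgeVDB library")
```

A path returned by `find_library` could then be passed to `ctypes.CDLL(str(path))`. Printing `candidate_paths(...)` is also a quick way to see where a library file needs to be placed.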
509
+
510
+ ## Performance Considerations
511
+
512
+ ### Embedding Provider Choice
513
+
514
+ | Provider | Speed | Quality | Offline | Cost |
515
+ |----------|-------|--------|---------|------|
516
+ | sentence-transformers | Fast | Good | ✅ | Free |
517
+ | OpenAI API | Slow | Excellent | ❌ | Paid |
518
+ | Cohere API | Medium | Good | ❌ | Paid |
519
+ | Built-in ONNX | Medium | Good | ✅ | Free |
520
+
521
+ ### Batch Operations
522
+
523
+ For large-scale operations, consider batching:
524
+
525
+ ```python
526
+ # Batch insert
527
+ embeddings = model.encode(texts)
528
+ for text, emb in zip(texts, embeddings):
529
+     db.insert_chunk(text, emb, doc_id=doc_id)
530
+
531
+ db.save() # Save once after all inserts
532
+ ```
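When `texts` is too large to encode in a single call, the same loop can be fed in fixed-size batches. A minimal sketch, assuming a hypothetical helper `batched` that is not part of EdgeVDB. The embedding and insert calls are left as a comment so the snippet runs without the native library.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


texts = [f"document {i}" for i in range(10)]
for batch in batched(texts, 4):
    # With the real SDK: embeddings = model.encode(batch), then one
    # db.insert_chunk(...) call per (text, embedding) pair.
    print(len(batch))
```

Calling `db.save()` once after the last batch keeps the single-flush behaviour of the example above.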
533
+
534
+ ### Memory Management
535
+
536
+ - Query results hold native handles; call `results.free()` or use a context manager (`__del__` also calls `free()`, but only when the object is garbage-collected)
537
+ - Embedders hold native resources; call `embedder.destroy()` when done
538
+ - Database handles are released by `close()` or by leaving the `with` block
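These cleanup rules can be centralised with `contextlib.ExitStack` from the standard library, which runs registered callbacks even when an exception interrupts the block. The sketch uses stand-in classes (`FakeEmbedder`, `FakeResults`) so it runs without the native library. With the real SDK the registered callbacks would be `embedder.destroy` and `results.free`.

```python
from contextlib import ExitStack


class FakeEmbedder:
    """Stand-in for Embedder; records whether destroy() was called."""

    def __init__(self):
        self.destroyed = False

    def destroy(self):
        self.destroyed = True


class FakeResults:
    """Stand-in for QueryResults; records whether free() was called."""

    def __init__(self):
        self.freed = False

    def free(self):
        self.freed = True


embedder, results = FakeEmbedder(), FakeResults()
try:
    with ExitStack() as stack:
        stack.callback(embedder.destroy)  # runs last (callbacks unwind in LIFO order)
        stack.callback(results.free)      # runs first
        raise RuntimeError("simulated failure while using the results")
except RuntimeError:
    pass

print(embedder.destroyed, results.freed)  # True True
```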
539
+
540
+ ## Platform-Specific Notes
541
+
542
+ ### Linux
543
+
544
+ ```bash
545
+ # Build
546
+ cmake --preset desktop-release
547
+ cmake --build build/desktop-release
548
+
549
+ # Install
550
+ cp build/desktop-release/core/libedgevdb_shared.so python/edgevdb/lib/linux/
551
+ pip install -e python/
552
+ ```
553
+
554
+ ### macOS
555
+
556
+ ```bash
557
+ # Build
558
+ cmake --preset desktop-release
559
+ cmake --build build/desktop-release
560
+
561
+ # Install
562
+ cp build/desktop-release/core/libedgevdb_shared.dylib python/edgevdb/lib/darwin/
563
+ pip install -e python/
564
+ ```
565
+
566
+ ### Windows
567
+
568
+ ```powershell
569
+ # Build
570
+ cmake --preset desktop-release
571
+ cmake --build build/desktop-release
572
+
573
+ # Install
574
+ copy build\desktop-release\core\edgevdb_shared.dll python\edgevdb\lib\windows\
575
+ pip install -e python\
576
+ ```
577
+
578
+ ### Raspberry Pi
579
+
580
+ ```bash
581
+ # Build with NEON support
582
+ cmake --preset desktop-release
583
+ cmake --build build/desktop-release
584
+
585
+ # Install
586
+ cp build/desktop-release/core/libedgevdb_shared.so python/edgevdb/lib/linux/
587
+ pip install -e python/
588
+ ```
589
+
590
+ ## Testing
591
+
592
+ ```bash
593
+ cd python
594
+
595
+ # Run tests
596
+ python -m unittest tests.test_edgevdb -v
597
+
598
+ # Or with pytest
599
+ pytest tests/ -v
600
+ ```
601
+
602
+ ## Troubleshooting
603
+
604
+ ### Library Not Found
605
+
606
+ **Error:** `FileNotFoundError: Could not find EdgeVDB library`
607
+
608
+ **Solution:**
609
+ 1. Build the C++ core: `cmake --preset desktop-release && cmake --build build/desktop-release`
610
+ 2. Copy the shared library to `python/edgevdb/lib/<platform>/`
611
+ 3. Verify the library name matches your platform
612
+
613
+ ### Import Errors
614
+
615
+ **Error:** `ImportError: dynamic module does not define init function`
616
+
617
+ **Solution:**
618
+ - Ensure the shared library was built for your platform
619
+ - Check that the Python architecture matches the library's (32-bit vs 64-bit)
620
+ - Rebuild the C++ core for your platform
621
+
622
+ ### Segmentation Faults
623
+
624
+ **Error:** Python crashes with segmentation fault
625
+
626
+ **Solution:**
627
+ - Ensure you're using the correct library version
628
+ - Check that you're not accessing freed handles
629
+ - Verify embedding dimensions are exactly 384
630
+ - Enable debug logging: `set_log_level(3)`
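Because a wrong-sized embedding is listed as a possible crash cause, it can help to validate vectors in Python before they reach the native layer. A minimal sketch, assuming a hypothetical helper `validate_embedding` that is not part of the SDK. The 384 dimension is the one stated in this README.

```python
import math

EXPECTED_DIM = 384  # dimension stated in this README


def validate_embedding(embedding, dim=EXPECTED_DIM):
    """Return the embedding as a list of floats, or raise ValueError."""
    values = [float(x) for x in embedding]
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinite values")
    return values


try:
    validate_embedding([0.1, 0.2, 0.3])
except ValueError as exc:
    print(f"rejected: {exc}")
```

The checked list can then be passed as the `embedding` argument of `insert_chunk` or `query_vector`.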
631
+
632
+ ## Contributing
633
+
634
+ ### Development Setup
635
+
636
+ ```bash
637
+ # Build C++ core in debug mode
638
+ cmake --preset desktop-debug
639
+ cmake --build build/desktop-debug
640
+
641
+ # Copy debug library
642
+ cp build/desktop-debug/core/libedgevdb_shared.so python/edgevdb/
643
+
644
+ # Install in development mode
645
+ cd python
646
+ pip install -e .
647
+ ```
648
+
649
+ ### Running Tests
650
+
651
+ ```bash
652
+ cd python
653
+ python -m unittest tests.test_edgevdb -v
654
+ ```
655
+
656
+ ### Code Style
657
+
658
+ - Follow PEP 8
659
+ - Use type hints
660
+ - Add docstrings for public APIs
661
+ - Run black and flake8
662
+
663
+ ## See Also
664
+
665
+ - [../README.md](../README.md) — Project overview
666
+ - [../../DEVELOPER_GUIDE.md](../../DEVELOPER_GUIDE.md) — Build and integration guide
667
+ - [../../docs/python_integration.md](../../docs/python_integration.md) — Python integration guide
668
+ - [examples/](examples/) — Example scripts