endee-llamaindex 0.1.2__tar.gz → 0.1.3__tar.gz

@@ -0,0 +1,492 @@
1
+ Metadata-Version: 2.4
2
+ Name: endee-llamaindex
3
+ Version: 0.1.3
4
+ Summary: Vector Database for Fast ANN Searches
5
+ Home-page: https://endee.io
6
+ Author: Endee Labs
7
+ Author-email: vineet@endee.io
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Operating System :: OS Independent
11
+ Requires-Python: >=3.6
12
+ Description-Content-Type: text/markdown
13
+ Requires-Dist: llama-index>=0.12.34
14
+ Requires-Dist: endee>=0.1.4
15
+ Dynamic: author
16
+ Dynamic: author-email
17
+ Dynamic: classifier
18
+ Dynamic: description
19
+ Dynamic: description-content-type
20
+ Dynamic: home-page
21
+ Dynamic: requires-dist
22
+ Dynamic: requires-python
23
+ Dynamic: summary
24
+
25
+ # Endee LlamaIndex Integration
26
+
27
+ Build powerful RAG applications with Endee vector database and LlamaIndex.
28
+
29
+ ---
30
+
31
+ ## Table of Contents
32
+
33
+ 1. [Installation](#1-installation)
34
+ 2. [Setting up Credentials](#2-setting-up-endee-and-openai-credentials)
35
+ 3. [Creating Sample Documents](#3-creating-sample-documents)
36
+ 4. [Setting up Endee with LlamaIndex](#4-setting-up-endee-with-llamaindex)
37
+ 5. [Creating a Vector Index](#5-creating-a-vector-index-from-documents)
38
+ 6. [Basic Retrieval](#6-basic-retrieval-with-query-engine)
39
+ 7. [Using Metadata Filters](#7-using-metadata-filters)
40
+ 8. [Advanced Filtering](#8-advanced-filtering-with-multiple-conditions)
41
+ 9. [Custom Retriever Setup](#9-custom-retriever-setup)
42
+ 10. [Custom Retriever with Query Engine](#10-using-a-custom-retriever-with-a-query-engine)
43
+ 11. [Direct VectorStore Querying](#11-direct-vectorstore-querying)
44
+ 12. [Saving and Loading Indexes](#12-saving-and-loading-indexes)
45
+ 13. [Cleanup](#13-cleanup)
46
+
47
+ ---
48
+
49
+ ## 1. Installation
50
+
51
+ Get started by installing the required package.
52
+
53
+ ```bash
54
+ pip install endee-llamaindex
55
+ ```
56
+
57
+ > **Note:** This will automatically install `endee` and `llama-index` as dependencies.
58
+
59
+ ---
60
+
61
+ ## 2. Setting up Endee and OpenAI credentials
62
+
63
+ Configure your API credentials for Endee and OpenAI.
64
+
65
+ ```python
66
+ import os
67
+ from llama_index.embeddings.openai import OpenAIEmbedding
68
+
69
+ # Set API keys
70
+ os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
71
+ endee_api_token = "your-endee-api-token"
72
+ ```
73
+
74
+ > **Tip:** In production, load your API keys from environment variables or a secrets manager instead of hardcoding them as shown above.
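One way to follow the tip above is a small helper that reads a secret from the environment and fails loudly when it is missing. This is a minimal sketch: the `load_secret` helper and the variable name `ENDEE_API_TOKEN` are assumptions made for illustration, not something the package defines.

```python
import os


def load_secret(name, env=None):
    """Return the value of environment variable `name`, or raise if unset/empty."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical variable name; export it in your shell before running:
#   export ENDEE_API_TOKEN="your-endee-api-token"
# endee_api_token = load_secret("ENDEE_API_TOKEN")
```

Passing a plain dictionary as `env` makes the helper easy to exercise in tests without touching the real environment.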
75
+
76
+ ---
77
+
78
+ ## 3. Creating Sample Documents
79
+
80
+ Create documents with metadata for filtering and organization.
81
+
82
+ ```python
83
+ from llama_index.core import Document
84
+
85
+ # Create sample documents with different categories and metadata
86
+ documents = [
87
+ Document(
88
+ text="Python is a high-level, interpreted programming language known for its readability and simplicity.",
89
+ metadata={"category": "programming", "language": "python", "difficulty": "beginner"}
90
+ ),
91
+ Document(
92
+ text="JavaScript is a scripting language that enables interactive web pages and is an essential part of web applications.",
93
+ metadata={"category": "programming", "language": "javascript", "difficulty": "intermediate"}
94
+ ),
95
+ Document(
96
+ text="Machine learning is a subset of artificial intelligence that provides systems the ability to automatically learn and improve from experience.",
97
+ metadata={"category": "ai", "field": "machine_learning", "difficulty": "advanced"}
98
+ ),
99
+ Document(
100
+ text="Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning.",
101
+ metadata={"category": "ai", "field": "deep_learning", "difficulty": "advanced"}
102
+ ),
103
+ Document(
104
+ text="Vector databases are specialized database systems designed to store and query high-dimensional vectors for similarity search.",
105
+ metadata={"category": "database", "type": "vector", "difficulty": "intermediate"}
106
+ ),
107
+ Document(
108
+ text="Endee is a vector database that provides secure and private vector search capabilities.",
109
+ metadata={"category": "database", "type": "vector", "product": "endee", "difficulty": "intermediate"}
110
+ )
111
+ ]
112
+
113
+ print(f"Created {len(documents)} sample documents")
114
+ ```
115
+
116
+ **Output:**
117
+ ```
118
+ Created 6 sample documents
119
+ ```
120
+
121
+ ---
122
+
123
+ ## 4. Setting up Endee with LlamaIndex
124
+
125
+ Initialize the Endee vector store and connect it to LlamaIndex.
126
+
127
+ ```python
128
+ from endee_llamaindex import EndeeVectorStore
129
+ from llama_index.core import StorageContext
130
+ import time
131
+
132
+ # Create a unique index name with timestamp to avoid conflicts
133
+ timestamp = int(time.time())
134
+ index_name = f"llamaindex_demo_{timestamp}"
135
+
136
+ # Set up the embedding model
137
+ embed_model = OpenAIEmbedding()
138
+
139
+ # Set the embedding dimension (must match the embedding model's output size)
140
+ dimension = 1536 # OpenAI's default embedding dimension
141
+
142
+ # Initialize the Endee vector store
143
+ vector_store = EndeeVectorStore.from_params(
144
+ api_token=endee_api_token,
145
+ index_name=index_name,
146
+ dimension=dimension,
147
+ space_type="cosine", # Can be "cosine", "l2", or "ip"
148
+ precision="medium" # Index precision: "low", "medium", "high", or None
149
+ )
150
+
151
+ # Create storage context with our vector store
152
+ storage_context = StorageContext.from_defaults(vector_store=vector_store)
153
+
154
+ print(f"Initialized Endee vector store with index: {index_name}")
155
+ ```
156
+
157
+ ### Configuration Options
158
+
159
+ | Parameter | Description | Options |
160
+ |-----------|-------------|---------|
161
+ | `space_type` | Distance metric for similarity | `cosine`, `l2`, `ip` |
162
+ | `dimension` | Vector dimension | Must match embedding model |
163
+ | `precision` | Index precision setting | `"low"`, `"medium"` (default), `"high"`, or `None` |
164
+ | `key` | Encryption key for metadata | 256-bit hex key (64 hex characters) |
165
+ | `batch_size` | Vectors per API call | Default: `100` |
166
+
167
+ ---
168
+
169
+ ## 5. Creating a Vector Index from Documents
170
+
171
+ Build a searchable vector index from your documents.
172
+
173
+ ```python
174
+ from llama_index.core import VectorStoreIndex
175
+
176
+ # Create a vector index
177
+ index = VectorStoreIndex.from_documents(
178
+ documents,
179
+ storage_context=storage_context,
180
+ embed_model=embed_model
181
+ )
182
+
183
+ print("Vector index created successfully")
184
+ ```
185
+
186
+ **Output:**
187
+ ```
188
+ Vector index created successfully
189
+ ```
190
+
191
+ ---
192
+
193
+ ## 6. Basic Retrieval with Query Engine
194
+
195
+ Create a query engine and perform semantic search.
196
+
197
+ ```python
198
+ # Create a query engine
199
+ query_engine = index.as_query_engine()
200
+
201
+ # Ask a question
202
+ response = query_engine.query("What is Python?")
203
+
204
+ print("Query: What is Python?")
205
+ print("Response:")
206
+ print(response)
207
+ ```
208
+
209
+ **Example Output:**
210
+ ```
211
+ Query: What is Python?
212
+ Response:
213
+ Python is a high-level, interpreted programming language known for its readability and simplicity.
214
+ ```
215
+
216
+ ---
217
+
218
+ ## 7. Using Metadata Filters
219
+
220
+ Filter search results based on document metadata.
221
+
222
+ ```python
223
+ from llama_index.core.vector_stores.types import MetadataFilters, MetadataFilter, FilterOperator
224
+
225
+ # Create a metadata filter that restricts search to AI-related documents
226
+ ai_filter = MetadataFilter(key="category", value="ai", operator=FilterOperator.EQ)
227
+ ai_filters = MetadataFilters(filters=[ai_filter])
228
+
229
+ # Create a filtered query engine
230
+ filtered_query_engine = index.as_query_engine(filters=ai_filters)
231
+
232
+ # Ask a general question but only using AI documents
233
+ response = filtered_query_engine.query("What is learning from data?")
234
+
235
+ print("Filtered Query (AI category only): What is learning from data?")
236
+ print("Response:")
237
+ print(response)
238
+ ```
239
+
240
+ ### Available Filter Operators
241
+
242
+ | Operator | Description |
243
+ |----------|-------------|
244
+ | `FilterOperator.EQ` | Equal to |
245
+ | `FilterOperator.NE` | Not equal to |
246
+ | `FilterOperator.GT` | Greater than |
247
+ | `FilterOperator.GTE` | Greater than or equal |
248
+ | `FilterOperator.LT` | Less than |
249
+ | `FilterOperator.LTE` | Less than or equal |
250
+ | `FilterOperator.IN` | In list |
251
+ | `FilterOperator.NIN` | Not in list |
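To make the operator semantics concrete, the sketch below evaluates each comparison against a plain metadata dictionary in pure Python. It only illustrates what each operator means; it is not how Endee or LlamaIndex evaluate filters internally. The `matches` helper, the string operator names, and the `year` field are hypothetical.

```python
import operator

# Operator names mirror the FilterOperator members listed above.
OPERATORS = {
    "EQ": operator.eq,
    "NE": operator.ne,
    "GT": operator.gt,
    "GTE": operator.ge,
    "LT": operator.lt,
    "LTE": operator.le,
    "IN": lambda actual, allowed: actual in allowed,
    "NIN": lambda actual, allowed: actual not in allowed,
}


def matches(metadata, key, op, value):
    """Return True if metadata[key] satisfies `op` against `value`."""
    if key not in metadata:
        return False  # assumption for this sketch: a missing key never matches
    return OPERATORS[op](metadata[key], value)


# `year` is a made-up numeric field used to show the ordering operators.
doc = {"category": "ai", "difficulty": "advanced", "year": 2024}

print(matches(doc, "category", "EQ", "ai"))                            # True
print(matches(doc, "difficulty", "IN", ["beginner", "intermediate"]))  # False
print(matches(doc, "year", "GTE", 2020))                               # True
```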
252
+
253
+ ---
254
+
255
+ ## 8. Advanced Filtering with Multiple Conditions
256
+
257
+ Combine multiple metadata filters for precise results.
258
+
259
+ ```python
260
+ # Create a more complex filter: database category AND intermediate difficulty
261
+ category_filter = MetadataFilter(key="category", value="database", operator=FilterOperator.EQ)
262
+ difficulty_filter = MetadataFilter(key="difficulty", value="intermediate", operator=FilterOperator.EQ)
263
+
264
+ complex_filters = MetadataFilters(filters=[category_filter, difficulty_filter])
265
+
266
+ # Create a query engine with the complex filters
267
+ complex_filtered_engine = index.as_query_engine(filters=complex_filters)
268
+
269
+ # Query with the complex filters
270
+ response = complex_filtered_engine.query("Tell me about databases")
271
+
272
+ print("Complex Filtered Query (database category AND intermediate difficulty): Tell me about databases")
273
+ print("Response:")
274
+ print(response)
275
+ ```
276
+
277
+ > **Note:** Multiple filters are combined with AND logic by default.
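The AND behaviour can be illustrated without any service by filtering the sample metadata from section 3 in plain Python. This is only a sketch of the semantics, under the assumption of exact-match conditions; `matches_all` is a hypothetical helper, not part of either library.

```python
# Metadata copied from the sample documents in section 3.
sample_metadata = [
    {"category": "programming", "language": "python", "difficulty": "beginner"},
    {"category": "programming", "language": "javascript", "difficulty": "intermediate"},
    {"category": "ai", "field": "machine_learning", "difficulty": "advanced"},
    {"category": "ai", "field": "deep_learning", "difficulty": "advanced"},
    {"category": "database", "type": "vector", "difficulty": "intermediate"},
    {"category": "database", "type": "vector", "product": "endee", "difficulty": "intermediate"},
]


def matches_all(metadata, conditions):
    """AND semantics: every (key, value) pair must match exactly."""
    return all(metadata.get(key) == value for key, value in conditions.items())


conditions = {"category": "database", "difficulty": "intermediate"}
selected = [m for m in sample_metadata if matches_all(m, conditions)]
print(len(selected))  # 2: both database documents are intermediate
```

A document that satisfies only one of the two conditions, such as the intermediate JavaScript entry, is excluded.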
278
+
279
+ ---
280
+
281
+ ## 9. Custom Retriever Setup
282
+
283
+ Create a custom retriever for fine-grained control over the retrieval process.
284
+
285
+ ```python
286
+ from llama_index.core.retrievers import VectorIndexRetriever
287
+
288
+ # Create a retriever with custom parameters
289
+ retriever = VectorIndexRetriever(
290
+ index=index,
291
+ similarity_top_k=3, # Return top 3 most similar results
292
+ filters=ai_filters # Use our AI category filter from before
293
+ )
294
+
295
+ # Retrieve nodes for a query
296
+ nodes = retriever.retrieve("What is deep learning?")
297
+
298
+ print(f"Retrieved {len(nodes)} nodes for query: 'What is deep learning?' (with AI category filter)")
299
+ print("\nRetrieved content:")
300
+ for i, node in enumerate(nodes):
301
+ print(f"\nNode {i+1}:")
302
+ print(f"Text: {node.node.text}")
303
+ print(f"Metadata: {node.node.metadata}")
304
+ print(f"Score: {node.score:.4f}")
305
+ ```
306
+
307
+ **Example Output:**
308
+ ```
309
+ Retrieved 2 nodes for query: 'What is deep learning?' (with AI category filter)
310
+
311
+ Node 1:
312
+ Text: Deep learning is part of a broader family of machine learning methods...
313
+ Metadata: {'category': 'ai', 'field': 'deep_learning', 'difficulty': 'advanced'}
314
+ Score: 0.8934
315
+
316
+ Node 2:
317
+ Text: Machine learning is a subset of artificial intelligence...
318
+ Metadata: {'category': 'ai', 'field': 'machine_learning', 'difficulty': 'advanced'}
319
+ Score: 0.7821
320
+ ```
321
+
322
+ ---
323
+
324
+ ## 10. Using a Custom Retriever with a Query Engine
325
+
326
+ Combine your custom retriever with a query engine for enhanced control.
327
+
328
+ ```python
329
+ from llama_index.core.query_engine import RetrieverQueryEngine
330
+
331
+ # Create a query engine with our custom retriever
332
+ custom_query_engine = RetrieverQueryEngine.from_args(
333
+ retriever=retriever,
334
+ verbose=True # Enable verbose mode to see the retrieved nodes
335
+ )
336
+
337
+ # Query using the custom retriever query engine
338
+ response = custom_query_engine.query("Explain the difference between machine learning and deep learning")
339
+
340
+ print("\nFinal Response:")
341
+ print(response)
342
+ ```
343
+
344
+ ---
345
+
346
+ ## 11. Direct VectorStore Querying
347
+
348
+ Query the Endee vector store directly, bypassing the LlamaIndex query engine.
349
+
350
+ ```python
351
+ from llama_index.core.vector_stores.types import VectorStoreQuery
352
+
353
+ # Generate an embedding for our query
354
+ query_text = "What are vector databases?"
355
+ query_embedding = embed_model.get_text_embedding(query_text)
356
+
357
+ # Create a VectorStoreQuery
358
+ vector_store_query = VectorStoreQuery(
359
+ query_embedding=query_embedding,
360
+ similarity_top_k=2,
361
+ filters=MetadataFilters(filters=[MetadataFilter(key="category", value="database", operator=FilterOperator.EQ)])
362
+ )
363
+
364
+ # Execute the query directly on the vector store
365
+ query_result = vector_store.query(vector_store_query)
366
+
367
+ print(f"Direct VectorStore query: '{query_text}'")
368
+ print(f"Retrieved {len(query_result.nodes)} results with database category filter:")
369
+ for i, (node, score) in enumerate(zip(query_result.nodes, query_result.similarities)):
370
+ print(f"\nResult {i+1}:")
371
+ print(f"Text: {node.text}")
372
+ print(f"Metadata: {node.metadata}")
373
+ print(f"Similarity score: {score:.4f}")
374
+ ```
375
+
376
+ > **Tip:** Direct querying is useful when you need raw results without LLM processing.
377
+
378
+ ---
379
+
380
+ ## 12. Saving and Loading Indexes
381
+
382
+ Because your vectors are stored in the cloud, you can reconnect to an existing index by name in future sessions.
383
+
384
+ ```python
385
+ # To reconnect to an existing index in a future session:
386
+ def reconnect_to_index(api_token, index_name):
387
+ # Initialize the vector store with existing index
388
+ vector_store = EndeeVectorStore.from_params(
389
+ api_token=api_token,
390
+ index_name=index_name
391
+ )
392
+
393
+ # Create storage context
394
+ storage_context = StorageContext.from_defaults(vector_store=vector_store)
395
+
396
+ # Load the index
397
+ index = VectorStoreIndex.from_vector_store(
398
+ vector_store,
399
+ embed_model=OpenAIEmbedding()
400
+ )
401
+
402
+ return index
403
+
404
+ # Example usage
405
+ reconnected_index = reconnect_to_index(endee_api_token, index_name)
406
+ query_engine = reconnected_index.as_query_engine()
407
+ response = query_engine.query("What is Endee?")
408
+ print(response)
409
+
410
+ print("To reconnect to this index in the future, use:\n")
411
+ print("API Token: (use the same Endee API token as above)")
412
+ print(f"Index Name: {index_name}")
413
+ ```
414
+
415
+ > **Important:** Save your `index_name` to reconnect to your data later.
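A simple way to keep the index name between sessions is to write it to a small local file. The sketch below uses a JSON file; the helper names and the file name `endee_index.json` are assumptions for illustration only.

```python
import json
from pathlib import Path


def save_index_name(index_name, path):
    """Write the index name to a small JSON file."""
    Path(path).write_text(json.dumps({"index_name": index_name}), encoding="utf-8")


def load_index_name(path):
    """Read the index name back from the JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))["index_name"]


# Hypothetical file name; do NOT store the API token in this file.
# save_index_name(index_name, "endee_index.json")
# index_name = load_index_name("endee_index.json")
```

Only the index name is persisted here; the API token is better kept in an environment variable or a secrets manager.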
416
+
417
+ ---
418
+
419
+ ## 13. Cleanup
420
+
421
+ Delete the index when you're done to free up resources.
422
+
423
+ ```python
424
+ # Uncomment to delete your index. Note: `endee` is not defined in this guide; it is assumed to be an initialized Endee client.
425
+ # endee.delete_index(index_name)
426
+ # print(f"Index {index_name} deleted")
427
+ ```
428
+
429
+ > **Warning:** Deleting an index permanently removes all stored vectors and cannot be undone.
430
+
431
+ ---
432
+
433
+ ## Quick Reference
434
+
435
+ ### EndeeVectorStore Parameters
436
+
437
+ | Parameter | Type | Description | Default |
438
+ |-----------|------|-------------|---------|
439
+ | `api_token` | `str` | Your Endee API token | Required |
440
+ | `index_name` | `str` | Name of the index | Required |
441
+ | `dimension` | `int` | Vector dimension | Required |
442
+ | `space_type` | `str` | Distance metric | `"cosine"` |
443
+ | `precision` | `str` | Index precision setting | `"medium"` |
444
+ | `key` | `str` | Encryption key for metadata (256-bit hex) | `None` |
445
+ | `batch_size` | `int` | Vectors per API call | `100` |
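The `batch_size` setting implies that vectors are sent in groups rather than one call per vector. How the package batches internally is not shown in this README; the sketch below only illustrates the general idea of splitting a list into fixed-size batches, and the `batched` helper is hypothetical.

```python
def batched(items, batch_size=100):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


vectors = list(range(250))  # stand-in for 250 embedding vectors
sizes = [len(batch) for batch in batched(vectors)]
print(sizes)  # [100, 100, 50]
```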
446
+
447
+ ### Distance Metrics
448
+
449
+ | Metric | Best For |
450
+ |--------|----------|
451
+ | `cosine` | Text embeddings, normalized vectors |
452
+ | `l2` | Image features, spatial data |
453
+ | `ip` | Recommendation systems, dot product similarity |
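For intuition, the three metrics can be computed by hand on tiny vectors. The functions below use the standard mathematical definitions; how Endee turns these values into the similarity scores it returns is not specified here, so treat this purely as an illustration.

```python
import math


def inner_product(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length

print(cosine_similarity(a, b))  # 1.0: the direction is identical
print(l2_distance(a, b))        # 1.0: the points are one unit apart
print(inner_product(a, b))      # 2.0: grows with vector length
```

Cosine ignores vector length, while `l2` and `ip` are both sensitive to it, which is why cosine is a common choice for text embeddings.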
454
+
455
+ ### Precision Settings
456
+
457
+ The `precision` parameter controls the trade-off between search accuracy and performance:
458
+
459
+ | Precision | Description | Use Case |
460
+ |-----------|-------------|----------|
461
+ | `"low"` | Faster searches, lower accuracy | Large-scale applications where speed is critical |
462
+ | `"medium"` | Balanced performance and accuracy | General purpose applications (default) |
463
+ | `"high"` | Slower searches, higher accuracy | Applications requiring maximum precision |
464
+ | `None` | Default system precision | Use system defaults |
465
+
466
+ ### Encryption Support
467
+
468
+ You can encrypt metadata stored in Endee by providing a 256-bit encryption key (64 hex characters). This ensures sensitive information is encrypted at rest.
469
+
470
+ ```python
471
+ # Generate a 256-bit key. secrets.token_hex is cryptographically secure; in production, generate the key once and load it from a secrets manager.
472
+ import secrets
473
+ encryption_key = secrets.token_hex(32) # 32 bytes = 64 hex characters
474
+
475
+ # Create vector store with encryption
476
+ vector_store = EndeeVectorStore.from_params(
477
+ api_token=endee_api_token,
478
+ index_name=index_name,
479
+ dimension=dimension,
480
+ space_type="cosine",
481
+ precision="medium",
482
+ key=encryption_key # Metadata will be encrypted
483
+ )
484
+
485
+ # Important: Store this key securely! You'll need it to access the index later.
486
+ ```
487
+
488
+ > **Warning:** If you lose the encryption key, you will not be able to decrypt your metadata. Store it securely (e.g., in a secrets manager).
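Since the key must be exactly 64 hexadecimal characters, it can help to check its shape before passing it on. The `is_valid_key` helper below is a hypothetical convenience, not part of the package, and it checks only the format, not the strength of the key.

```python
import secrets
import string


def is_valid_key(key):
    """Return True if `key` is exactly 64 hexadecimal characters (256 bits)."""
    return (
        isinstance(key, str)
        and len(key) == 64
        and all(ch in string.hexdigits for ch in key)
    )


key = secrets.token_hex(32)    # 32 random bytes rendered as 64 hex characters
print(is_valid_key(key))       # True
print(is_valid_key("abc123"))  # False: too short
```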
489
+
490
+ ---
491
+
492
+