omnirag 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
omnirag-1.0.0/PKG-INFO ADDED
@@ -0,0 +1,491 @@
+ Metadata-Version: 2.4
+ Name: omnirag
+ Version: 1.0.0
+ Summary: OmniRAG: Universal RAG System combining Liquid + Agentic + Chain RAG
+ Home-page: https://github.com/Giri530/omnirag
+ Author: Girinath V
+ Author-email: girinathv48@gmail.com
+ Keywords: rag llm ai faiss huggingface qwen machine-learning nlp pdf
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ Requires-Dist: transformers>=4.30.0
+ Requires-Dist: torch>=2.0.0
+ Requires-Dist: sentence-transformers>=2.2.0
+ Requires-Dist: faiss-cpu>=1.7.4
+ Requires-Dist: numpy>=1.24.0
+ Requires-Dist: accelerate>=0.20.0
+ Requires-Dist: PyPDF2>=3.0.0
+ Requires-Dist: duckduckgo-search>=3.9.0
+ Requires-Dist: requests>=2.31.0
+ Dynamic: author
+ Dynamic: author-email
+ Dynamic: classifier
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: home-page
+ Dynamic: keywords
+ Dynamic: requires-dist
+ Dynamic: requires-python
+ Dynamic: summary
+
+ # 🚀 OmniRAG - The Universal RAG System
+
+ **Intelligent RAG combining Liquid + Agentic + Chain architectures**
+
+ 100% FREE using HuggingFace models (Qwen) + FAISS!
+
+ ---
+
+ ## 🎯 What is OmniRAG?
+
+ OmniRAG is an advanced Retrieval-Augmented Generation system that combines three powerful RAG techniques:
+
+ ### 🌊 Liquid RAG
+ Automatically adapts answers to the user's expertise level:
+ - **Beginner**: Simple explanations with examples
+ - **Intermediate**: Balanced technical content
+ - **Expert**: Deep technical details
+
+ ### 🤖 Agentic RAG
+ Intelligently chooses the best information source:
+ - **VectorDB**: For local documents
+ - **Web Search**: For current information
+
+ ### ⛓️ Chain RAG
+ Handles complex multi-part questions (see the sketch below):
+ - Breaks down complex queries
+ - Answers each part separately
+ - Synthesizes a coherent final answer
+
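+ Here is a minimal sketch of how all three layers surface through the public API (the metadata keys follow the `query()` return shape documented in the API Reference below; treat the printed values as illustrative):
+
+ ```python
+ from omnirag import OmniRAG
+
+ rag = OmniRAG(model_name="Qwen/Qwen2.5-1.5B-Instruct")
+ rag.add_documents(["FAISS is a library for fast vector similarity search."])
+
+ # Chain RAG splits a multi-part question, Agentic RAG picks a tool per part,
+ # and Liquid RAG adapts the final wording to the requested level.
+ result = rag.query(
+     "What is FAISS, and how does it differ from a plain linear scan?",
+     user_level="beginner",   # skip auto-detection and force a level
+     return_metadata=True,
+ )
+
+ print(result['answer'])
+ print(result['metadata']['user_level'])   # 'beginner'
+ print(result['metadata']['sub_queries'])  # how the question was decomposed
+ print(result['metadata']['tools_used'])   # which sources were consulted
+ ```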
+ ---
+
+ ## ✨ Features
+
+ ✅ **PDF Support** - Load PDF files directly
+ ✅ **Multiple LLM Models** - Qwen, Flan-T5, Mistral, Phi-2
+ ✅ **FAISS Vector DB** - Fast similarity search
+ ✅ **Web Search** - DuckDuckGo integration (free!)
+ ✅ **Smart User Detection** - Auto expertise level detection
+ ✅ **Query Decomposition** - Handles complex questions
+ ✅ **Fast Caching** - 3x speedup on repeated queries (see the timing sketch below)
+ ✅ **100% FREE** - No API costs!
+ ✅ **Works on CPU** - No GPU required (but faster with GPU)
+
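+ A quick way to see the cache at work is to time the same query twice (a minimal timing sketch; actual numbers depend on your hardware and chosen model):
+
+ ```python
+ import time
+
+ from omnirag import OmniRAG
+
+ rag = OmniRAG(model_name="Qwen/Qwen2.5-0.5B-Instruct")
+ rag.add_documents(["Python is widely used for machine learning."])
+
+ def timed_query(question):
+     start = time.perf_counter()
+     rag.query(question)
+     return time.perf_counter() - start
+
+ first = timed_query("Why is Python popular for ML?")   # cold: runs the full pipeline
+ second = timed_query("Why is Python popular for ML?")  # warm: served from the cache
+ print(f"cold: {first:.2f}s, warm: {second:.2f}s")
+
+ rag.clear_cache()  # drop cached answers, e.g. after the documents change
+ ```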
+ ---
+
+ ## 📦 Installation
+
+ ```bash
+ pip install omnirag
+ ```
+
+ ### From Source
+
+ ```bash
+ git clone https://github.com/Giri530/omnirag.git
+ cd omnirag
+ pip install -e .
+ ```
+
+ ---
+
+ ## 🚀 Quick Start
+
+ ```python
+ from omnirag import OmniRAG
+
+ # Initialize with your preferred model
+ rag = OmniRAG(
+     model_name="Qwen/Qwen2.5-1.5B-Instruct",  # or "google/flan-t5-large"
+     verbose=True
+ )
+
+ # Load documents
+ rag.load_from_file("dataset.pdf")
+
+ # Query
+ result = rag.query("What is the main concept?")
+ print(result['answer'])
+ ```
+
+ **That's it!** OmniRAG automatically:
+ - Detects user expertise level
+ - Retrieves relevant information
+ - Adapts content to user level
+ - Generates a final, level-appropriate answer
+
+ ---
+
+ ## 💡 Usage Examples
+
+ ### Load Different File Types
+
+ ```python
+ # PDF files
+ rag.load_from_file("research_paper.pdf")
+
+ # Text files
+ rag.load_from_file("notes.txt")
+
+ # JSON data
+ rag.load_from_file("data.json")
+
+ # Entire folder
+ rag.load_from_folder("./documents")
+
+ # With chunking for large files
+ rag.load_from_file("big_file.pdf", chunk_size=500)
+
+ # Direct text
+ rag.add_documents([
+     "Python is great for ML.",
+     "Qwen is a powerful language model."
+ ])
+ ```
+
+ ### Different User Levels
+
+ ```python
+ # Auto-detect user level
+ result = rag.query("What is machine learning?")
+
+ # Force specific level
+ result = rag.query("Explain ML", user_level="expert")
+
+ # Get detailed metadata
+ result = rag.query("Question", return_metadata=True)
+ print(result['metadata']['user_level'])
+ print(result['metadata']['sub_queries'])
+ ```
+
+ ### Complex Queries
+
+ ```python
+ # OmniRAG automatically breaks down and answers
+ result = rag.query("""
+ Compare Python vs Java for machine learning.
+ Which is better for beginners?
+ What are the performance differences?
+ """)
+
+ print(result['answer'])
+ ```
+
+ ### Enable Web Search
+
+ ```python
+ rag = OmniRAG(
+     model_name="Qwen/Qwen2.5-1.5B-Instruct",
+     enable_web_search=True  # Free DuckDuckGo search
+ )
+
+ # Queries about "latest" or "recent" automatically use web
+ result = rag.query("Latest AI developments in 2025")
+ ```
+
+ ---
+
+ ## 🎨 Supported Models
+
+ ### Qwen Models (Recommended!)
+
+ ```python
+ # Fast & Efficient
+ rag = OmniRAG(model_name="Qwen/Qwen2.5-0.5B-Instruct")
+
+ # Balanced (Best Choice!)
+ rag = OmniRAG(model_name="Qwen/Qwen2.5-1.5B-Instruct")
+
+ # High Quality
+ rag = OmniRAG(model_name="Qwen/Qwen2.5-3B-Instruct")
+ ```
+
+ ### Flan-T5 Models
+
+ ```python
+ # Small & Fast
+ rag = OmniRAG(model_name="google/flan-t5-base")  # 250M params
+
+ # Larger & Better
+ rag = OmniRAG(model_name="google/flan-t5-large")  # 780M params
+ ```
+
+ ### Other Models
+
+ ```python
+ # Microsoft Phi
+ rag = OmniRAG(model_name="microsoft/phi-2")  # 2.7B params
+
+ # Mistral
+ rag = OmniRAG(model_name="mistralai/Mistral-7B-Instruct-v0.2")  # 7B params
+ ```
+
+ ---
+
+ ## 🏗️ Architecture
+
+ ```
+ User Query
+     ↓
+ 🌊 LIQUID RAG: Detect expertise level
+     ↓
+ ⛓️ CHAIN RAG: Break into sub-queries (if complex)
+     ↓
+ FOR EACH SUB-QUERY:
+     ↓
+     🤖 AGENTIC RAG: Choose tool (VectorDB or Web)
+     ↓
+     Retrieve relevant chunks
+     ↓
+     🌊 LIQUID RAG: Transform to user level
+     ↓
+     Generate sub-answer
+     ↓
+ ⛓️ CHAIN RAG: Synthesize all sub-answers
+     ↓
+ 🌊 LIQUID RAG: Final polish
+     ↓
+ ✨ Perfect Answer!
+ ```
+
+ See [Architecture Diagram](docs/architecture.drawio) for a detailed visualization.
+
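+ In Python terms, the orchestration loop roughly mirrors the flow above. The helper names below (`detect_level`, `decompose`, `choose_tool`, and so on) are illustrative stand-ins, not the package's actual internal API:
+
+ ```python
+ # Illustrative skeleton of the pipeline above; all callables are hypothetical.
+ def answer(query, detect_level, decompose, choose_tool, retrieve,
+            adapt, generate, synthesize, polish):
+     level = detect_level(query)            # 🌊 Liquid: detect expertise level
+     sub_queries = decompose(query)         # ⛓️ Chain: split if complex
+
+     sub_answers = []
+     for sub in sub_queries:
+         tool = choose_tool(sub)            # 🤖 Agentic: VectorDB or Web
+         chunks = retrieve(tool, sub)
+         context = adapt(chunks, level)     # 🌊 Liquid: match the user level
+         sub_answers.append(generate(sub, context))
+
+     draft = synthesize(sub_answers)        # ⛓️ Chain: merge sub-answers
+     return polish(draft, level)            # 🌊 Liquid: final pass
+ ```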
+ ---
+
+ ## 📊 Performance
+
+ | Model | Size | RAM | Speed | Quality |
+ |-------|------|-----|-------|---------|
+ | Qwen-0.5B | 0.5B | 1GB | ⚡⚡⚡ | ⭐⭐ |
+ | **Qwen-1.5B** | 1.5B | 2GB | ⚡⚡ | ⭐⭐⭐⭐ |
+ | Qwen-3B | 3B | 4GB | ⚡ | ⭐⭐⭐⭐⭐ |
+ | Flan-T5-Base | 250M | 1GB | ⚡⚡⚡ | ⭐⭐⭐ |
+ | Flan-T5-Large | 780M | 2GB | ⚡⚡ | ⭐⭐⭐⭐ |
+
+ **Recommended:** Qwen-1.5B for the best balance of speed and quality!
+
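+ If you want to pick a model programmatically, a small helper based on the table above does the job (the helper is a convenience sketch, not part of the package):
+
+ ```python
+ from omnirag import OmniRAG
+
+ # Approximate RAM needs taken from the table above.
+ MODELS_BY_MIN_RAM_GB = [
+     (4, "Qwen/Qwen2.5-3B-Instruct"),
+     (2, "Qwen/Qwen2.5-1.5B-Instruct"),
+     (1, "Qwen/Qwen2.5-0.5B-Instruct"),
+ ]
+
+ def pick_model(available_ram_gb):
+     """Return the largest listed model that fits the given RAM budget."""
+     for min_ram, name in MODELS_BY_MIN_RAM_GB:
+         if available_ram_gb >= min_ram:
+             return name
+     return "google/flan-t5-base"  # smallest option from the table
+
+ rag = OmniRAG(model_name=pick_model(available_ram_gb=2))
+ ```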
+ ---
+
+ ## 🔧 Configuration
+
+ ```python
+ rag = OmniRAG(
+     # LLM Model
+     model_name="Qwen/Qwen2.5-1.5B-Instruct",
+
+     # Embedding Model
+     embedding_model="all-MiniLM-L6-v2",
+
+     # Web Search
+     enable_web_search=True,
+
+     # Verbose Output
+     verbose=True
+ )
+ ```
+
+ ---
+
+ ## 📖 API Reference
+
+ ### OmniRAG Class
+
+ #### `__init__(model_name, embedding_model, enable_web_search, verbose)`
+ Initialize the OmniRAG system.
+
+ #### `load_from_file(file_path, chunk_size=None)`
+ Load documents from a file (.pdf, .txt, .json, .csv, .md).
+
+ #### `load_from_folder(folder_path, file_extensions=None, chunk_size=None)`
+ Load all documents from a folder.
+
+ #### `add_documents(documents)`
+ Add documents directly as a list of strings.
+
+ #### `query(user_query, user_level=None, max_sources=5, return_metadata=False)`
+ Query the system and get an answer.
+
+ **Returns:**
+ ```python
+ {
+     'answer': str,            # Generated answer
+     'metadata': {             # Only present when return_metadata=True
+         'user_level': str,
+         'sub_queries_count': int,
+         'sub_queries': list,
+         'tools_used': list
+     }
+ }
+ ```
+
+ #### `get_stats()`
+ Get system statistics.
+
+ #### `clear_cache()`
+ Clear the query cache.
+
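+ Putting the methods together (a minimal end-to-end sketch; the fields returned by `get_stats()` are not documented here, so the example just prints the whole object):
+
+ ```python
+ from omnirag import OmniRAG
+
+ rag = OmniRAG(model_name="Qwen/Qwen2.5-1.5B-Instruct", verbose=False)
+
+ # Index a folder of markdown/text docs, chunking large files.
+ rag.load_from_folder("./documents", file_extensions=['.md', '.txt'], chunk_size=500)
+
+ # Cap the number of retrieved sources and inspect the metadata.
+ result = rag.query("How do I enable web search?", max_sources=3, return_metadata=True)
+ print(result['answer'])
+ print(result['metadata']['tools_used'])
+
+ print(rag.get_stats())  # system statistics
+ rag.clear_cache()       # clear cached answers, e.g. before re-indexing
+ ```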
+ ---
+
+ ## 🌍 Use Cases
+
+ ### Research Assistant
+ ```python
+ rag.load_from_file("research_papers.pdf")
+ result = rag.query("What are the key findings?")
+ ```
+
+ ### Document Q&A
+ ```python
+ rag.load_from_folder("./company_docs")
+ result = rag.query("What is our refund policy?")
+ ```
+
+ ### Educational Tool
+ ```python
+ rag.load_from_file("textbook.pdf")
+ result = rag.query("Explain photosynthesis simply")
+ # Auto-detects beginner level!
+ ```
+
+ ### Code Documentation
+ ```python
+ rag.load_from_folder("./docs", file_extensions=['.md', '.txt'])
+ result = rag.query("How do I deploy this?")
+ ```
+
+ ---
+
+ ## 🛠️ Development
+
+ ### Install for Development
+
+ ```bash
+ git clone https://github.com/Giri530/omnirag.git
+ cd omnirag
+ pip install -e ".[dev]"
+ ```
+
+ ### Run Tests
+
+ ```bash
+ pytest tests/
+ ```
+
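+ This README does not ship a sample test, so here is a minimal smoke test of the kind one might place under `tests/` (it uses the small Flan-T5 model to keep downloads light and assumes `query()` returns a dict with an `'answer'` key, as documented above):
+
+ ```python
+ # tests/test_smoke.py (illustrative example, not included in the package)
+ from omnirag import OmniRAG
+
+ def test_query_returns_an_answer():
+     rag = OmniRAG(model_name="google/flan-t5-base", verbose=False)
+     rag.add_documents(["OmniRAG combines Liquid, Agentic and Chain RAG."])
+
+     result = rag.query("What does OmniRAG combine?")
+
+     assert isinstance(result, dict)
+     assert isinstance(result['answer'], str)
+     assert result['answer'].strip()  # a non-empty answer was generated
+ ```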
+ ### Project Structure
+
+ ```
+ omnirag/
+ ├── omnirag/
+ │   ├── __init__.py
+ │   ├── omnirag.py              # Main class
+ │   ├── liquid_analyzer.py      # User level detection
+ │   ├── chain_decomposer.py     # Query decomposition
+ │   ├── agentic_planner.py      # Tool selection
+ │   ├── content_transformer.py  # Content adaptation
+ │   ├── vectordb_tool.py        # FAISS database
+ │   ├── web_search_tool.py      # Web search
+ │   ├── llm_client.py           # LLM wrapper
+ │   └── cache.py                # Caching
+ ├── examples/
+ │   └── quickstart.py
+ ├── setup.py
+ ├── requirements.txt
+ └── README.md
+ ```
+
+ ---
+
+ ## 🤝 Contributing
+
+ Contributions are welcome! Please:
+
+ 1. Fork the repository
+ 2. Create a feature branch (`git checkout -b feature/amazing`)
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
+ 4. Push to the branch (`git push origin feature/amazing`)
+ 5. Open a Pull Request
+
+ ---
+
+ ## 📝 Requirements
+
+ - Python 3.8+
+ - 2-4GB RAM (depending on the model)
+ - CPU or GPU (GPU recommended for speed)
+
+ **Dependencies:**
+ - transformers
+ - torch
+ - sentence-transformers
+ - faiss-cpu
+ - numpy
+ - accelerate
+ - PyPDF2
+ - duckduckgo-search
+ - requests
+
+ ---
+
+ ## 📄 License
+
+ MIT License - Free for commercial and personal use!
+
+ See [LICENSE](LICENSE) for details.
+
+ ---
+
+ ## 🙏 Acknowledgments
+
+ - **HuggingFace** for the transformers library
+ - **Qwen Team** for their excellent models
+ - **FAISS** for fast vector search
+ - **Sentence Transformers** for embeddings
+
+ ---
+
+ ## 📧 Contact
+
+ - **GitHub Issues**: [Report bugs or request features](https://github.com/Giri530/omnirag/issues)
+ - **Email**: girinathv48@gmail.com
+
+ ---
+
+ ## 🌟 Star History
+
+ If you find OmniRAG useful, please ⭐ star the repo!
+
+ ---
+
+ ## 📚 Citation
+
+ ```bibtex
+ @software{omnirag2025,
+   title={OmniRAG: The Universal RAG System},
+   author={Girinath V},
+   year={2025},
+   url={https://github.com/Giri530/omnirag}
+ }
+ ```
+
+ ---
+
+ ## 🎯 Roadmap
+
+ - [ ] Support for more file formats (DOCX, XLSX)
+ - [ ] Advanced caching strategies
+ - [ ] Multi-language support
+ - [ ] Custom embedding models
+ - [ ] GUI interface
+ - [ ] Cloud deployment guides
+
+ ---
+
+ **Made with ❤️ - 100% FREE Forever!**
+
+ **Happy RAG-ing! 🚀**