omnirag 1.0.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- omnirag-1.0.0/PKG-INFO +491 -0
- omnirag-1.0.0/README.md +452 -0
- omnirag-1.0.0/omnirag.egg-info/PKG-INFO +491 -0
- omnirag-1.0.0/omnirag.egg-info/SOURCES.txt +7 -0
- omnirag-1.0.0/omnirag.egg-info/dependency_links.txt +1 -0
- omnirag-1.0.0/omnirag.egg-info/requires.txt +9 -0
- omnirag-1.0.0/omnirag.egg-info/top_level.txt +1 -0
- omnirag-1.0.0/setup.cfg +4 -0
- omnirag-1.0.0/setup.py +40 -0
omnirag-1.0.0/PKG-INFO
ADDED
@@ -0,0 +1,491 @@
Metadata-Version: 2.4
Name: omnirag
Version: 1.0.0
Summary: OmniRAG: Universal RAG System combining Liquid + Agentic + Chain RAG
Home-page: https://github.com/Giri530/omnirag
Author: Girinath V
Author-email: girinathv48@gmail.com
Keywords: rag llm ai faiss huggingface qwen machine-learning nlp pdf
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: transformers>=4.30.0
Requires-Dist: torch>=2.0.0
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: faiss-cpu>=1.7.4
Requires-Dist: numpy>=1.24.0
Requires-Dist: accelerate>=0.20.0
Requires-Dist: PyPDF2>=3.0.0
Requires-Dist: duckduckgo-search>=3.9.0
Requires-Dist: requests>=2.31.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# 🚀 OmniRAG - The Universal RAG System

**Intelligent RAG combining Liquid + Agentic + Chain architectures**

100% FREE using HuggingFace models (Qwen) + FAISS!

---

## 🎯 What is OmniRAG?

OmniRAG is an advanced Retrieval-Augmented Generation system that combines three powerful RAG techniques:

### 🌊 Liquid RAG
Automatically adapts answers to the user's expertise level:
- **Beginner**: Simple explanations with examples
- **Intermediate**: Balanced technical content
- **Expert**: Deep technical details

### 🤖 Agentic RAG
Intelligently chooses the best information source:
- **VectorDB**: For local documents
- **Web Search**: For current information

### ⛓️ Chain RAG
Handles complex multi-part questions:
- Breaks down complex queries
- Answers each part separately
- Synthesizes a coherent final answer
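The three techniques above compose into one pipeline. As a rough sketch (hypothetical function names and heuristics, not OmniRAG's actual internals):

```python
# Hypothetical sketch of how the three RAG stages compose.
# None of these names are OmniRAG's real internals.

def detect_level(query):            # Liquid RAG: pick an expertise level
    return "beginner" if "simply" in query.lower() else "intermediate"

def decompose(query):               # Chain RAG: split into sub-queries
    parts = [p.strip() for p in query.split("?") if p.strip()]
    return [p + "?" for p in parts]

def choose_tool(sub_query):         # Agentic RAG: pick an information source
    return "web" if "latest" in sub_query.lower() else "vectordb"

def answer(query):
    level = detect_level(query)
    sub_queries = decompose(query)
    tools = [choose_tool(q) for q in sub_queries]
    # Each sub-query would be retrieved + answered here, then the
    # sub-answers synthesized into one response adapted to `level`.
    return {"user_level": level, "sub_queries": sub_queries, "tools_used": tools}

result = answer("What is RAG? What are the latest models?")
```

The real system does each step with an LLM and a vector index; the sketch only shows how the stages hand off to one another.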
---

## ✨ Features

✅ **PDF Support** - Load PDF files directly
✅ **Multiple LLM Models** - Qwen, Flan-T5, Mistral, Phi-2
✅ **FAISS Vector DB** - Fast similarity search
✅ **Web Search** - DuckDuckGo integration (free!)
✅ **Smart User Detection** - Auto expertise level detection
✅ **Query Decomposition** - Handles complex questions
✅ **Fast Caching** - 3x speedup on repeated queries
✅ **100% FREE** - No API costs!
✅ **Works on CPU** - No GPU required (but faster with GPU)
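The caching feature can be pictured as a simple in-memory lookup keyed by the normalized query (a minimal sketch under that assumption; the package's actual `cache.py` may differ):

```python
# Minimal sketch of a query cache: a repeated query skips the
# expensive retrieve-and-generate step entirely.
class QueryCache:
    def __init__(self):
        self._store = {}
        self.hits = 0

    def get_or_compute(self, query, user_level, compute):
        key = (query.strip().lower(), user_level)   # normalize the query
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = compute()            # the slow RAG pipeline
        return self._store[key]

cache = QueryCache()
slow_pipeline = lambda: "some generated answer"
a1 = cache.get_or_compute("What is ML?", "beginner", slow_pipeline)
a2 = cache.get_or_compute("what is ml?", "beginner", slow_pipeline)  # cache hit
```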
---

## 📦 Installation

```bash
pip install omnirag
```

### From Source

```bash
git clone https://github.com/Giri530/omnirag.git
cd omnirag
pip install -e .
```

---

## 🚀 Quick Start

```python
from omnirag import OmniRAG

# Initialize with your preferred model
rag = OmniRAG(
    model_name="Qwen/Qwen2.5-1.5B-Instruct",  # or "google/flan-t5-large"
    verbose=True
)

# Load documents
rag.load_from_file("dataset.pdf")

# Query
result = rag.query("What is the main concept?")
print(result['answer'])
```

**That's it!** OmniRAG automatically:
- Detects the user's expertise level
- Retrieves relevant information
- Adapts content to the user's level
- Generates a polished final answer

---
## 💡 Usage Examples

### Load Different File Types

```python
# PDF files
rag.load_from_file("research_paper.pdf")

# Text files
rag.load_from_file("notes.txt")

# JSON data
rag.load_from_file("data.json")

# Entire folder
rag.load_from_folder("./documents")

# With chunking for large files
rag.load_from_file("big_file.pdf", chunk_size=500)

# Direct text
rag.add_documents([
    "Python is great for ML.",
    "Qwen is a powerful language model."
])
```
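The `chunk_size` option splits large documents into pieces before indexing. A character-based splitter with overlap is one plausible way such chunking works (illustrative only, not the package's actual implementation):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split `text` into overlapping chunks of at most `chunk_size` chars."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap   # step back so chunks share context
    return chunks

chunks = chunk_text("x" * 1200, chunk_size=500, overlap=50)
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.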
### Different User Levels

```python
# Auto-detect user level
result = rag.query("What is machine learning?")

# Force a specific level
result = rag.query("Explain ML", user_level="expert")

# Get detailed metadata
result = rag.query("Question", return_metadata=True)
print(result['metadata']['user_level'])
print(result['metadata']['sub_queries'])
```
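Auto-detection of the user level could be as simple as scanning the query for signal words. The keyword heuristic below is an assumption for illustration; OmniRAG's `liquid_analyzer.py` may use something more sophisticated:

```python
# Hypothetical cue lists; real detection is likely richer than this.
BEGINNER_CUES = {"simply", "basics", "what is", "explain like"}
EXPERT_CUES = {"internals", "complexity", "trade-offs", "architecture"}

def detect_user_level(query):
    q = query.lower()
    if any(cue in q for cue in EXPERT_CUES):
        return "expert"
    if any(cue in q for cue in BEGINNER_CUES):
        return "beginner"
    return "intermediate"

detect_user_level("Explain photosynthesis simply")   # "beginner"
```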
### Complex Queries

```python
# OmniRAG automatically breaks down and answers
result = rag.query("""
Compare Python vs Java for machine learning.
Which is better for beginners?
What are the performance differences?
""")

print(result['answer'])
```
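Chain RAG's decomposition step can be sketched as splitting the prompt into individual questions (an assumption about the approach, not `chain_decomposer.py`'s actual code):

```python
import re

def decompose_query(query):
    """Split a multi-part prompt into one sub-query per sentence/question."""
    parts = re.split(r"(?<=[.?!])\s+", query.strip())
    return [p.strip() for p in parts if p.strip()]

subs = decompose_query(
    "Compare Python vs Java for machine learning. "
    "Which is better for beginners? "
    "What are the performance differences?"
)
```

Each sub-query is then answered on its own, and the sub-answers are synthesized into one response.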
### Enable Web Search

```python
rag = OmniRAG(
    model_name="Qwen/Qwen2.5-1.5B-Instruct",
    enable_web_search=True  # Free DuckDuckGo search
)

# Queries about "latest" or "recent" automatically use the web
result = rag.query("Latest AI developments in 2025")
```
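Routing between the local index and the web could be a keyword check for freshness terms. This is a hypothetical heuristic consistent with the behavior described above, not the package's actual `agentic_planner.py` logic:

```python
# Hypothetical freshness cues that suggest the local index is stale.
FRESHNESS_TERMS = ("latest", "recent", "today", "this year", "2025")

def choose_tool(query, web_enabled=True):
    """Pick 'web' for time-sensitive queries, else the local vector DB."""
    q = query.lower()
    if web_enabled and any(term in q for term in FRESHNESS_TERMS):
        return "web"
    return "vectordb"

choose_tool("Latest AI developments in 2025")   # "web"
```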
---

## 🎨 Supported Models

### Qwen Models (Recommended!)

```python
# Fast & efficient
rag = OmniRAG(model_name="Qwen/Qwen2.5-0.5B-Instruct")

# Balanced (best choice!)
rag = OmniRAG(model_name="Qwen/Qwen2.5-1.5B-Instruct")

# High quality
rag = OmniRAG(model_name="Qwen/Qwen2.5-3B-Instruct")
```

### Flan-T5 Models

```python
# Small & fast
rag = OmniRAG(model_name="google/flan-t5-base")   # 250M params

# Larger & better
rag = OmniRAG(model_name="google/flan-t5-large")  # 780M params
```

### Other Models

```python
# Microsoft Phi
rag = OmniRAG(model_name="microsoft/phi-2")  # 2.7B params

# Mistral
rag = OmniRAG(model_name="mistralai/Mistral-7B-Instruct-v0.2")  # 7B params
```

---

## 🏗️ Architecture

```
User Query
    ↓
🌊 LIQUID RAG: Detect expertise level
    ↓
⛓️ CHAIN RAG: Break into sub-queries (if complex)
    ↓
FOR EACH SUB-QUERY:
    ↓
  🤖 AGENTIC RAG: Choose tool (VectorDB or Web)
    ↓
  Retrieve relevant chunks
    ↓
  🌊 LIQUID RAG: Transform to user level
    ↓
  Generate sub-answer
    ↓
⛓️ CHAIN RAG: Synthesize all sub-answers
    ↓
🌊 LIQUID RAG: Final polish
    ↓
✨ Perfect Answer!
```

See [Architecture Diagram](docs/architecture.drawio) for a detailed visualization.

---

## 📊 Performance

| Model | Size | RAM | Speed | Quality |
|-------|------|-----|-------|---------|
| Qwen-0.5B | 0.5B | 1GB | ⚡⚡⚡ | ⭐⭐ |
| **Qwen-1.5B** | 1.5B | 2GB | ⚡⚡ | ⭐⭐⭐⭐ |
| Qwen-3B | 3B | 4GB | ⚡ | ⭐⭐⭐⭐⭐ |
| Flan-T5-Base | 250M | 1GB | ⚡⚡⚡ | ⭐⭐⭐ |
| Flan-T5-Large | 780M | 2GB | ⚡⚡ | ⭐⭐⭐⭐ |

**Recommended:** Qwen-1.5B for the best balance of speed and quality.

---

## 🔧 Configuration

```python
rag = OmniRAG(
    # LLM model
    model_name="Qwen/Qwen2.5-1.5B-Instruct",

    # Embedding model
    embedding_model="all-MiniLM-L6-v2",

    # Web search
    enable_web_search=True,

    # Verbose output
    verbose=True
)
```
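What FAISS provides under the hood is fast nearest-neighbor search over embedding vectors. The stdlib-only toy below illustrates that retrieval step; in the real system, vectors come from the `embedding_model` above and the index from `faiss-cpu`:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k document vectors most similar to the query."""
    scored = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return scored[:k]

docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
best = top_k([1.0, 0.1], docs, k=2)   # [0, 2]
```

FAISS does the same ranking with optimized index structures, so it stays fast at millions of vectors.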
---

## 📖 API Reference

### OmniRAG Class

#### `__init__(model_name, embedding_model, enable_web_search, verbose)`
Initialize the OmniRAG system.

#### `load_from_file(file_path, chunk_size=None)`
Load documents from a file (.pdf, .txt, .json, .csv, .md).

#### `load_from_folder(folder_path, file_extensions=None, chunk_size=None)`
Load all documents from a folder.

#### `add_documents(documents)`
Add documents directly as a list of strings.

#### `query(user_query, user_level=None, max_sources=5, return_metadata=False)`
Query the system and get an answer.

**Returns:**
```python
{
    'answer': str,        # Generated answer
    'metadata': {         # Only if return_metadata=True
        'user_level': str,
        'sub_queries_count': int,
        'sub_queries': list,
        'tools_used': list
    }
}
```

#### `get_stats()`
Get system statistics.

#### `clear_cache()`
Clear the query cache.

---
## 🌍 Use Cases

### Research Assistant
```python
rag.load_from_file("research_papers.pdf")
result = rag.query("What are the key findings?")
```

### Document Q&A
```python
rag.load_from_folder("./company_docs")
result = rag.query("What is our refund policy?")
```

### Educational Tool
```python
rag.load_from_file("textbook.pdf")
result = rag.query("Explain photosynthesis simply")
# Auto-detects beginner level!
```

### Code Documentation
```python
rag.load_from_folder("./docs", file_extensions=['.md', '.txt'])
result = rag.query("How do I deploy this?")
```

---

## 🛠️ Development

### Install for Development

```bash
git clone https://github.com/Giri530/omnirag.git
cd omnirag
pip install -e ".[dev]"
```

### Run Tests

```bash
pytest tests/
```

### Project Structure

```
omnirag/
├── omnirag/
│   ├── __init__.py
│   ├── omnirag.py              # Main class
│   ├── liquid_analyzer.py      # User level detection
│   ├── chain_decomposer.py     # Query decomposition
│   ├── agentic_planner.py      # Tool selection
│   ├── content_transformer.py  # Content adaptation
│   ├── vectordb_tool.py        # FAISS database
│   ├── web_search_tool.py      # Web search
│   ├── llm_client.py           # LLM wrapper
│   └── cache.py                # Caching
├── examples/
│   └── quickstart.py
├── setup.py
├── requirements.txt
└── README.md
```

---
## 🤝 Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing`)
5. Open a Pull Request

---

## 📝 Requirements

- Python 3.8+
- 2-4GB RAM (depending on the model)
- CPU or GPU (GPU recommended for speed)

**Dependencies:**
- transformers
- torch
- sentence-transformers
- faiss-cpu
- PyPDF2
- duckduckgo-search

---

## 📄 License

MIT License - free for commercial and personal use.

See [LICENSE](LICENSE) for details.

---

## 🙏 Acknowledgments

- **HuggingFace** for the transformers library
- **Qwen Team** for excellent models
- **FAISS** for fast vector search
- **Sentence Transformers** for embeddings

---

## 📧 Contact

- **GitHub Issues**: [Report bugs or request features](https://github.com/Giri530/omnirag/issues)
- **Email**: girinathv48@gmail.com

---

## 🌟 Star History

If you find OmniRAG useful, please ⭐ star the repo!

---

## 📚 Citation

```bibtex
@software{omnirag2025,
  title={OmniRAG: The Universal RAG System},
  author={Girinath V},
  year={2025},
  url={https://github.com/Giri530/omnirag}
}
```

---

## 🎯 Roadmap

- [ ] Support for more file formats (DOCX, XLSX)
- [ ] Advanced caching strategies
- [ ] Multi-language support
- [ ] Custom embedding models
- [ ] GUI interface
- [ ] Cloud deployment guides

---

**Made with ❤️ - 100% FREE Forever!**

**Happy RAG-ing! 🚀**