QuerySUTRA 0.3.1__py3-none-any.whl → 0.3.3__py3-none-any.whl
- querysutra-0.3.3.dist-info/METADATA +285 -0
- {querysutra-0.3.1.dist-info → querysutra-0.3.3.dist-info}/RECORD +7 -7
- sutra/__init__.py +2 -2
- sutra/sutra.py +281 -463
- querysutra-0.3.1.dist-info/METADATA +0 -429
- {querysutra-0.3.1.dist-info → querysutra-0.3.3.dist-info}/WHEEL +0 -0
- {querysutra-0.3.1.dist-info → querysutra-0.3.3.dist-info}/licenses/LICENSE +0 -0
- {querysutra-0.3.1.dist-info → querysutra-0.3.3.dist-info}/top_level.txt +0 -0
@@ -1,429 +0,0 @@
-Metadata-Version: 2.4
-Name: QuerySUTRA
-Version: 0.3.1
-Summary: SUTRA: Structured-Unstructured-Text-Retrieval-Architecture - AI-powered data analysis with custom visualizations, fuzzy matching, and smart caching
-Home-page: https://github.com/yourusername/querysutra
-Author: Aditya Batta
-Author-email:
-License: MIT
-Classifier: Development Status :: 4 - Beta
-Classifier: Intended Audience :: Developers
-Classifier: Topic :: Database
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 3
-Requires-Python: >=3.8
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: pandas>=1.3.0
-Requires-Dist: numpy>=1.21.0
-Requires-Dist: openai>=1.0.0
-Requires-Dist: plotly>=5.0.0
-Requires-Dist: matplotlib>=3.3.0
-Requires-Dist: PyPDF2>=3.0.0
-Requires-Dist: python-docx>=0.8.11
-Requires-Dist: openpyxl>=3.0.0
-Provides-Extra: mysql
-Requires-Dist: sqlalchemy>=1.4.0; extra == "mysql"
-Requires-Dist: mysql-connector-python>=8.0.0; extra == "mysql"
-Provides-Extra: postgres
-Requires-Dist: sqlalchemy>=1.4.0; extra == "postgres"
-Requires-Dist: psycopg2-binary>=2.9.0; extra == "postgres"
-Provides-Extra: embeddings
-Requires-Dist: sentence-transformers>=2.0.0; extra == "embeddings"
-Provides-Extra: all
-Requires-Dist: sqlalchemy>=1.4.0; extra == "all"
-Requires-Dist: mysql-connector-python>=8.0.0; extra == "all"
-Requires-Dist: psycopg2-binary>=2.9.0; extra == "all"
-Requires-Dist: sentence-transformers>=2.0.0; extra == "all"
-Dynamic: home-page
-Dynamic: license-file
-Dynamic: requires-python
-
-# QuerySUTRA
-
-**SUTRA: Structured-Unstructured-Text-Retrieval-Architecture**
-
-Transform any data into structured, queryable databases with AI-powered entity extraction.
-
-## 🎯 Key Features
-
-✅ **Multi-Table Creation** - Automatically extracts entities and creates multiple related tables
-✅ **Smart Entity Extraction** - Identifies people, contacts, events, organizations from unstructured data
-✅ **Natural Language Queries** - Ask questions in plain English
-✅ **Multiple Data Formats** - CSV, Excel, JSON, PDF, DOCX, TXT, SQL, DataFrames
-✅ **Direct SQL Access** - Query without API costs
-✅ **Auto Visualization** - Built-in charts and graphs
-✅ **Cloud Export** - Save to MySQL, PostgreSQL, or local SQLite
-
-## 📦 Installation
-
-```bash
-pip install QuerySUTRA
-
-# With MySQL support
-pip install QuerySUTRA[mysql]
-
-# With PostgreSQL support
-pip install QuerySUTRA[postgres]
-
-# With all database support
-pip install QuerySUTRA[all]
-```
-
-## 🚀 Quick Start
-
-```python
-from sutra import SUTRA
-
-# Initialize
-sutra = SUTRA(api_key="your-openai-key")
-
-# Upload any data - AI creates multiple structured tables!
-sutra.upload("employee_story.pdf")
-
-# View all created tables
-sutra.tables()
-# Output:
-# 📋 TABLES IN DATABASE
-# 1. employee_story_people (20 rows, 6 columns)
-# Columns: id, name, address, city, email, phone
-# 2. employee_story_contacts (20 rows, 4 columns)
-# Columns: id, person_id, email, phone
-# 3. employee_story_events (15 rows, 4 columns)
-# Columns: id, host_id, description, city
-
-# View detailed schema
-sutra.schema()
-
-# Query with natural language
-result = sutra.ask("Show all people from New York")
-print(result.data)
-
-# With visualization
-result = sutra.ask("Show events by city", viz=True)
-
-# Direct SQL (no API cost!)
-result = sutra.sql("SELECT * FROM employee_story_people WHERE city='Dallas'")
-print(result.data)
-```
-
-## 📊 How It Works
-
-### From Unstructured PDF to Structured Tables
-
-**Input:** PDF with employee information
-
-**AI Automatically Creates:**
-```
-📋 Created 3 structured tables:
-📊 employee_story_people: 20 rows, 6 columns
-- id, name, address, city, email, phone
-📊 employee_story_contacts: 20 rows, 4 columns
-- id, person_id, email, phone
-📊 employee_story_events: 15 rows, 4 columns
-- id, host_id, description, city
-```
-
-## 💡 Usage Examples
-
-### 1. Upload Different Formats
-
-```python
-# CSV file
-sutra.upload("sales_data.csv")
-
-# Excel file
-sutra.upload("quarterly_report.xlsx")
-
-# PDF document (AI extracts entities!)
-sutra.upload("company_directory.pdf")
-
-# Word document
-sutra.upload("meeting_notes.docx")
-
-# Text file
-sutra.upload("log_data.txt")
-
-# DataFrame
-import pandas as pd
-df = pd.DataFrame({'name': ['Alice', 'Bob'], 'score': [95, 87]})
-sutra.upload(df, name="test_scores")
-```
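The removed README does not show what an upload does internally. As a rough illustration only, the sketch below loads CSV text into a SQLite table using the standard library. The function name `load_csv_into_sqlite` and the sample data are hypothetical, and nothing here is taken from the QuerySUTRA source.

```python
import csv
import io
import sqlite3


def quote_identifier(name):
    """Quote a table or column name so unusual characters cannot break the SQL."""
    return '"' + name.replace('"', '""') + '"'


def load_csv_into_sqlite(conn, table, csv_text):
    """Create `table` from CSV text and return the number of rows inserted.

    Hypothetical helper: columns are created without a declared type, so
    values are stored exactly as the CSV reader yields them (as text).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(quote_identifier(col) for col in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {quote_identifier(table)} ({columns})")
    rows = [row for row in reader if row]  # skip blank lines
    conn.executemany(
        f"INSERT INTO {quote_identifier(table)} VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = load_csv_into_sqlite(conn, "test_scores", "name,score\nAlice,95\nBob,87\n")
scores = conn.execute("SELECT name, score FROM test_scores ORDER BY name").fetchall()
```

Because no column types are declared, `score` comes back as a string. A real loader would probably infer types first, for example by going through pandas.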
-
-### 2. View Your Data
-
-```python
-# List all tables with details
-sutra.tables()
-
-# Show schema with data types
-sutra.schema()
-
-# Show schema for specific table
-sutra.schema("employee_story_people")
-
-# Preview data
-sutra.peek("employee_story_people", n=10)
-```
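The README mentions local SQLite, and the same kind of inspection can be done with plain `sqlite3`. This is a minimal sketch under that assumption. The helper names `list_tables`, `describe_table` and `preview_rows` are made up for illustration and are not the library's methods.

```python
import sqlite3


def quote_identifier(name):
    """Quote an identifier for safe interpolation into SQL text."""
    return '"' + name.replace('"', '""') + '"'


def list_tables(conn):
    """Return the names of all user tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def describe_table(conn, table):
    """Return (column_name, declared_type) pairs for one table."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f"PRAGMA table_info({quote_identifier(table)})")
    return [(row[1], row[2]) for row in info]


def preview_rows(conn, table, n=5):
    """Return the first n rows of a table."""
    return conn.execute(
        f"SELECT * FROM {quote_identifier(table)} LIMIT ?", (n,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_story_people (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO employee_story_people (name, city) VALUES (?, ?)",
    [("Alice", "Dallas"), ("Bob", "New York")],
)

tables = list_tables(conn)
columns = describe_table(conn, "employee_story_people")
first_row = preview_rows(conn, "employee_story_people", n=1)
```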
-
-### 3. Query Your Data
-
-```python
-# Natural language (uses OpenAI)
-result = sutra.ask("What are the top 5 sales by region?")
-print(result.data)
-
-# With visualization
-result = sutra.ask("Show sales trends by month", viz=True)
-
-# Interactive mode (asks if you want viz)
-result = sutra.interactive("Compare revenue across quarters")
-
-# Direct SQL (free, no API!)
-result = sutra.sql("SELECT city, COUNT(*) as count FROM employee_story_people GROUP BY city")
-print(result.data)
-```
-
-### 4. Export Your Database
-
-```python
-# Export to MySQL (local or cloud)
-sutra.save_to_mysql(
-    host="localhost",
-    user="root",
-    password="password",
-    database="my_database"
-)
-
-# Export to PostgreSQL
-sutra.save_to_postgres(
-    host="mydb.amazonaws.com",
-    user="admin",
-    password="password",
-    database="production_db"
-)
-
-# Export to SQLite file
-sutra.export_db("backup.db", format="sqlite")
-
-# Export to SQL dump
-sutra.export_db("schema.sql", format="sql")
-
-# Export to JSON
-sutra.export_db("data.json", format="json")
-
-# Export to Excel (all tables as sheets)
-sutra.export_db("data.xlsx", format="excel")
-
-# Complete backup
-sutra.backup("./backups")
-```
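For readers curious how SQL and JSON exports of a SQLite database can be produced, here is a sketch using only the standard library. It assumes a SQLite backend and does not reproduce `export_db`. The `people` table and the `tables_as_json` helper are illustrative.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)", [("Alice",), ("Bob",)])
conn.commit()

# SQL dump: Connection.iterdump() yields the statements needed to rebuild the database.
sql_dump = "\n".join(conn.iterdump())


def tables_as_json(conn):
    """Serialize every user table as {table_name: [row_dict, ...]}."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    payload = {}
    for name in names:
        quoted = '"' + name.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        column_names = [col[0] for col in cursor.description]
        payload[name] = [dict(zip(column_names, row)) for row in cursor]
    return json.dumps(payload, indent=2)


json_dump = tables_as_json(conn)
```

For a file-level copy, `sqlite3.Connection.backup()` (Python 3.7+) copies a live database into another connection without going through text.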
-
-## 🔥 Advanced Features
-
-### Entity Extraction
-
-QuerySUTRA automatically identifies and extracts:
-
-- 👥 **People** - Names, addresses, contact info
-- 📧 **Contacts** - Emails, phone numbers
-- 📅 **Events** - Meetings, activities, locations
-- 🏢 **Organizations** - Companies, departments
-- 📍 **Locations** - Cities, addresses, coordinates
-
-### Multiple Table Relationships
-
-```python
-# AI creates relational structure
-sutra.upload("company_data.pdf")
-
-# Result:
-# people table with person_id
-# contacts table with foreign key to person_id
-# events table with host_id linking to people
-```
-
-### Query Across Tables
-
-```python
-# Natural language handles joins automatically
-result = sutra.ask("Show all events hosted by people from Dallas")
-
-# Or write SQL joins manually
-result = sutra.sql("""
-    SELECT e.description, p.name, p.city
-    FROM employee_story_events e
-    JOIN employee_story_people p ON e.host_id = p.id
-    WHERE p.city = 'Dallas'
-""")
-```
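To make the `host_id` relationship concrete, the sketch below rebuilds two of the tables in an in-memory SQLite database and runs the same join. The table and column names follow the README example. The sample rows and the explicit foreign-key constraint are assumptions, since the README does not say whether the generated tables declare constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE employee_story_people (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT
    );
    CREATE TABLE employee_story_events (
        id          INTEGER PRIMARY KEY,
        host_id     INTEGER NOT NULL REFERENCES employee_story_people(id),
        description TEXT,
        city        TEXT
    );
    INSERT INTO employee_story_people (id, name, city) VALUES
        (1, 'Alice', 'Dallas'),
        (2, 'Bob', 'New York');
    INSERT INTO employee_story_events (id, host_id, description, city) VALUES
        (1, 1, 'Team lunch', 'Dallas'),
        (2, 2, 'Product demo', 'New York'),
        (3, 1, 'Onboarding', 'Austin');
    """
)

# Same join as in the README, with the city passed as a bound parameter.
rows = conn.execute(
    """
    SELECT e.description, p.name, p.city
    FROM employee_story_events e
    JOIN employee_story_people p ON e.host_id = p.id
    WHERE p.city = ?
    ORDER BY e.id
    """,
    ("Dallas",),
).fetchall()

# An event pointing at a non-existent host is rejected by the constraint.
try:
    conn.execute(
        "INSERT INTO employee_story_events (host_id, description) VALUES (99, 'Orphan')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The filter is on the host's city (`p.city`), not the event's city, which is why the Austin onboarding event still appears for a Dallas host.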
-
-## 📈 Visualization
-
-```python
-# Auto-detect best chart type
-result = sutra.ask("Show revenue by product", viz=True)
-
-# Interactive charts with Plotly
-# - Bar charts for categorical data
-# - Line charts for time series
-# - Tables for detailed data
-# - Pie charts for distributions
-```
-
-## 🌐 Cloud Database Integration
-
-### AWS RDS MySQL
-```python
-sutra.save_to_mysql(
-    host="mydb.xxxx.us-east-1.rds.amazonaws.com",
-    user="admin",
-    password="password",
-    database="production",
-    port=3306
-)
-```
-
-### Google Cloud SQL
-```python
-sutra.save_to_postgres(
-    host="35.123.456.789",
-    user="postgres",
-    password="password",
-    database="analytics"
-)
-```
-
-### Heroku Postgres
-```python
-sutra.save_to_postgres(
-    host="ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com",
-    user="username",
-    password="password",
-    database="dbname",
-    port=5432
-)
-```
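The `mysql` and `postgres` extras in the metadata pull in SQLAlchemy together with `mysql-connector-python` and `psycopg2-binary`, so the host, user, password and database arguments plausibly end up in a SQLAlchemy-style connection URL. That is an assumption. The hypothetical `build_db_url` helper below only shows how such a URL can be assembled safely when the password contains reserved characters.

```python
from urllib.parse import quote_plus


def build_db_url(dialect_driver, user, password, host, database, port=None):
    """Assemble a dialect+driver://user:password@host[:port]/database URL.

    Hypothetical helper: user and password are percent-encoded so characters
    such as '@' or '/' cannot be mistaken for URL delimiters.
    """
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    location = f"{host}:{port}" if port is not None else host
    return f"{dialect_driver}://{credentials}@{location}/{database}"


mysql_url = build_db_url(
    "mysql+mysqlconnector", "admin", "p@ss/word", "mydb.example.com", "production", port=3306
)
postgres_url = build_db_url(
    "postgresql+psycopg2", "postgres", "secret", "db.example.com", "analytics"
)
```

The host names here are placeholders. Avoid printing or logging URLs built this way, since they contain the password.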
-
-## ⚡ Performance Tips
-
-```python
-# Use direct SQL for complex queries (faster, no API cost)
-result = sutra.sql("SELECT * FROM data WHERE status='active'")
-
-# Cache is automatic for repeated questions
-result1 = sutra.ask("Show total sales") # Calls API
-result2 = sutra.ask("Show total sales") # From cache ⚡
-
-# Export results for reuse
-result.data.to_csv("results.csv")
-```
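The README says repeated questions are served from a cache but does not describe the mechanism. One simple way to get that behaviour is to memoize on a normalized form of the question, as in the sketch below. `QuestionCache` and `fake_backend` are hypothetical names, and the normalization rule (lowercase, collapsed whitespace) is an assumption rather than the library's actual policy.

```python
class QuestionCache:
    """Memoize answers by normalized question text (illustrative only)."""

    def __init__(self, answer_fn):
        self._answer_fn = answer_fn  # the expensive call, e.g. an API request
        self._store = {}
        self.misses = 0

    @staticmethod
    def _key(question):
        # Lowercase and collapse whitespace so trivial variations share an entry.
        return " ".join(question.lower().split())

    def ask(self, question):
        key = self._key(question)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._answer_fn(question)
        return self._store[key]


calls = []


def fake_backend(question):
    """Stand-in for the expensive call; records each invocation."""
    calls.append(question)
    return f"answer to: {question}"


cache = QuestionCache(fake_backend)
first = cache.ask("Show total sales")
second = cache.ask("  show TOTAL sales ")
```

A cache like this returns stale answers if the underlying tables change, so a real implementation would need to invalidate entries after each upload.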
-
-## 🔒 API Key Security
-
-```python
-# Option 1: Pass directly (not recommended for production)
-sutra = SUTRA(api_key="sk-...")
-
-# Option 2: Environment variable (recommended)
-import os
-os.environ["OPENAI_API_KEY"] = "sk-..."
-sutra = SUTRA()
-
-# Option 3: .env file
-# Create .env file with: OPENAI_API_KEY=sk-...
-from dotenv import load_dotenv
-load_dotenv()
-sutra = SUTRA()
-```
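The environment-variable options work best when the key is set outside the source file, for example in the shell or a `.env` file, and the program only reads it. A minimal sketch of that read-and-validate step follows. The helper names `require_api_key` and `mask_key` are hypothetical and are not part of QuerySUTRA.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key from the environment mapping, or fail with a clear message."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it in the shell or load it from a .env file"
        )
    return key


def mask_key(key, visible=4):
    """Shorten a secret for log output so the full value is never printed."""
    return key[:visible] + "..." if len(key) > visible else "..."


# Demonstration with an explicit mapping instead of the real environment.
key = require_api_key({"OPENAI_API_KEY": "sk-example-not-a-real-key"})
masked = mask_key(key)
```

Failing early with an explicit error avoids the quieter failure mode described under Troubleshooting, where a missing key leads to a fallback to simple parsing.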
-
-## 🎓 Complete Example
-
-```python
-from sutra import SUTRA
-import pandas as pd
-
-# Initialize
-sutra = SUTRA(api_key="your-openai-key")
-
-# Upload PDF - creates multiple tables
-sutra.upload("employee_directory.pdf")
-
-# View what was created
-tables_info = sutra.tables()
-print(f"Created {len(tables_info)} tables")
-
-# View detailed schema
-sutra.schema()
-
-# Query specific table
-result = sutra.ask("How many people are in each city?",
-                   table="employee_directory_people")
-print(result.data)
-
-# Visualize
-result = sutra.ask("Show distribution of people by city", viz=True)
-
-# Export to MySQL
-sutra.save_to_mysql("localhost", "root", "password", "company_db")
-
-# Backup everything
-sutra.backup("./backups")
-
-# Close connection
-sutra.close()
-```
-
-## 📚 Method Reference
-
-### Core Methods
-
-| Method | Description |
-|--------|-------------|
-| `upload(data, name)` | Upload any data format, creates multiple tables |
-| `tables()` | List all tables with row/column counts |
-| `schema(table)` | Show detailed schema with data types |
-| `peek(table, n)` | Preview first n rows |
-| `ask(question, viz)` | Natural language query |
-| `sql(query, viz)` | Direct SQL query |
-| `interactive(question)` | Query with viz prompt |
-
-### Export Methods
-
-| Method | Description |
-|--------|-------------|
-| `export_db(path, format)` | Export database (sqlite/sql/json/excel) |
-| `save_to_mysql(...)` | Save to MySQL database |
-| `save_to_postgres(...)` | Save to PostgreSQL database |
-| `backup(path)` | Complete backup with timestamp |
-
-## 🐛 Troubleshooting
-
-**Q: Only one table created instead of multiple?**
-A: Make sure you have OpenAI API key set. Without it, falls back to simple parsing.
-
-**Q: "No API key" error?**
-A: Set your OpenAI key: `sutra = SUTRA(api_key="sk-...")`
-
-**Q: PDF extraction failed?**
-A: Install PyPDF2: `pip install PyPDF2`
-
-**Q: MySQL export error?**
-A: Install extras: `pip install QuerySUTRA[mysql]`
-
-## 📄 License
-
-MIT License - see LICENSE file
-
-## 🤝 Contributing
-
-Contributions welcome! Open an issue or submit a PR.
-
-## 📞 Support
-
-- Issues: [GitHub Issues](https://github.com/yourusername/querysutra/issues)
-- Email: your@email.com
-
----
-
-**Made with ❤️ by Aditya Batta**
File without changes
File without changes
File without changes