QuerySUTRA 0.3.2__tar.gz → 0.4.0__tar.gz
This diff shows the content of publicly available package versions that have been released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their public registries.
- querysutra-0.4.0/PKG-INFO +438 -0
- querysutra-0.4.0/QuerySUTRA.egg-info/PKG-INFO +438 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/QuerySUTRA.egg-info/SOURCES.txt +0 -4
- querysutra-0.4.0/README.md +405 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/pyproject.toml +6 -19
- querysutra-0.4.0/setup.py +31 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/__init__.py +2 -4
- querysutra-0.4.0/sutra/sutra.py +874 -0
- querysutra-0.3.2/.gitignore +0 -50
- querysutra-0.3.2/PKG-INFO +0 -290
- querysutra-0.3.2/QuerySUTRA.egg-info/PKG-INFO +0 -290
- querysutra-0.3.2/README.md +0 -249
- querysutra-0.3.2/config.py +0 -48
- querysutra-0.3.2/main.py +0 -109
- querysutra-0.3.2/setup.py +0 -50
- querysutra-0.3.2/sutra/sutra.py +0 -1262
- querysutra-0.3.2/test_openapi.py +0 -17
- {querysutra-0.3.2 → querysutra-0.4.0}/LICENSE +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/MANIFEST.in +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/QuerySUTRA.egg-info/dependency_links.txt +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/QuerySUTRA.egg-info/requires.txt +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/QuerySUTRA.egg-info/top_level.txt +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/examples/quickstart.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/examples/sutra_usage_guide.ipynb +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/examples/usage_guide.ipynb +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/requirements.txt +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/setup.cfg +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/cache_manager.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/clear_cache.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/core.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/data_loader.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/database_manager.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/direct_query.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/feedback.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/feedback_matcher.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/nlp_processor.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/schema_embeddings.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/schema_generator.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/sutra_client.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/sutra_core.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/sutra_simple.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/sutra/visualizer.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/tests/__init__.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/tests/test_modules.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/tests/test_sutra.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/utils/__init__.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/utils/file_utils.py +0 -0
- {querysutra-0.3.2 → querysutra-0.4.0}/utils/text_utils.py +0 -0
|
@@ -0,0 +1,438 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: QuerySUTRA
|
|
3
|
+
Version: 0.4.0
|
|
4
|
+
Summary: SUTRA: Structured-Unstructured-Text-Retrieval-Architecture - Complete data extraction
|
|
5
|
+
Author: Aditya Batta
|
|
6
|
+
License: MIT
|
|
7
|
+
Requires-Python: >=3.8
|
|
8
|
+
Description-Content-Type: text/markdown
|
|
9
|
+
License-File: LICENSE
|
|
10
|
+
Requires-Dist: pandas>=1.3.0
|
|
11
|
+
Requires-Dist: numpy>=1.21.0
|
|
12
|
+
Requires-Dist: openai>=1.0.0
|
|
13
|
+
Requires-Dist: plotly>=5.0.0
|
|
14
|
+
Requires-Dist: matplotlib>=3.3.0
|
|
15
|
+
Requires-Dist: PyPDF2>=3.0.0
|
|
16
|
+
Requires-Dist: python-docx>=0.8.11
|
|
17
|
+
Requires-Dist: openpyxl>=3.0.0
|
|
18
|
+
Provides-Extra: mysql
|
|
19
|
+
Requires-Dist: sqlalchemy>=1.4.0; extra == "mysql"
|
|
20
|
+
Requires-Dist: mysql-connector-python>=8.0.0; extra == "mysql"
|
|
21
|
+
Provides-Extra: postgres
|
|
22
|
+
Requires-Dist: sqlalchemy>=1.4.0; extra == "postgres"
|
|
23
|
+
Requires-Dist: psycopg2-binary>=2.9.0; extra == "postgres"
|
|
24
|
+
Provides-Extra: embeddings
|
|
25
|
+
Requires-Dist: sentence-transformers>=2.0.0; extra == "embeddings"
|
|
26
|
+
Provides-Extra: all
|
|
27
|
+
Requires-Dist: sqlalchemy>=1.4.0; extra == "all"
|
|
28
|
+
Requires-Dist: mysql-connector-python>=8.0.0; extra == "all"
|
|
29
|
+
Requires-Dist: psycopg2-binary>=2.9.0; extra == "all"
|
|
30
|
+
Requires-Dist: sentence-transformers>=2.0.0; extra == "all"
|
|
31
|
+
Dynamic: license-file
|
|
32
|
+
Dynamic: requires-python
|
|
33
|
+
|
|
34
|
+
# QuerySUTRA
|
|
35
|
+
|
|
36
|
+
**SUTRA: Structured-Unstructured-Text-Retrieval-Architecture**
|
|
37
|
+
|
|
38
|
+
Professional Python library for AI-powered data analysis with automatic entity extraction, natural language querying, and intelligent caching.
|
|
39
|
+
|
|
40
|
+
## Installation
|
|
41
|
+
|
|
42
|
+
```bash
|
|
43
|
+
pip install QuerySUTRA
|
|
44
|
+
|
|
45
|
+
# With optional features
|
|
46
|
+
pip install QuerySUTRA[embeddings] # Smart caching
|
|
47
|
+
pip install QuerySUTRA[mysql] # MySQL support
|
|
48
|
+
pip install QuerySUTRA[postgres] # PostgreSQL support
|
|
49
|
+
pip install QuerySUTRA[all] # All features
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
## Key Features
|
|
53
|
+
|
|
54
|
+
### 1. Complete Data Extraction from Large Documents
|
|
55
|
+
Processes entire documents in chunks - no data loss on large PDFs.
|
|
56
|
+
|
|
57
|
+
```python
|
|
58
|
+
from sutra import SUTRA
|
|
59
|
+
|
|
60
|
+
sutra = SUTRA(api_key="your-openai-key")
|
|
61
|
+
sutra.upload("large_document.pdf") # Extracts ALL data, not just first page
|
|
62
|
+
|
|
63
|
+
# Automatically creates multiple tables:
|
|
64
|
+
# - document_people (40 rows, 8 columns)
|
|
65
|
+
# - document_skills (50 rows, 5 columns)
|
|
66
|
+
# - document_technologies (30 rows, 4 columns)
|
|
67
|
+
# - document_projects (25 rows, 6 columns)
|
|
68
|
+
# etc.
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
### 2. Natural Language Querying
|
|
72
|
+
|
|
73
|
+
```python
|
|
74
|
+
result = sutra.ask("Show me all people from New York")
|
|
75
|
+
print(result.data)
|
|
76
|
+
|
|
77
|
+
# With visualization
|
|
78
|
+
result = sutra.ask("Show sales by region", viz="pie")
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
### 3. Load Existing Databases
|
|
82
|
+
|
|
83
|
+
```python
|
|
84
|
+
# Load SQLite database
|
|
85
|
+
sutra = SUTRA.load_from_db("sutra.db", api_key="your-key")
|
|
86
|
+
|
|
87
|
+
# Connect to MySQL
|
|
88
|
+
sutra = SUTRA.connect_mysql("localhost", "root", "password", "database")
|
|
89
|
+
|
|
90
|
+
# Connect to PostgreSQL
|
|
91
|
+
sutra = SUTRA.connect_postgres("localhost", "postgres", "password", "database")
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
### 4. Import SQLite to MySQL
|
|
95
|
+
|
|
96
|
+
**Step 1: In Colab or Python - Export Database**
|
|
97
|
+
```python
|
|
98
|
+
# After uploading your data
|
|
99
|
+
sutra.upload("data.pdf")
|
|
100
|
+
sutra.tables()
|
|
101
|
+
|
|
102
|
+
# Export to SQLite
|
|
103
|
+
sutra.export_db("my_data.db", format="sqlite")
|
|
104
|
+
|
|
105
|
+
# In Colab, download the file
|
|
106
|
+
from google.colab import files
|
|
107
|
+
files.download("my_data.db")
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
**Step 2: On Windows - Import to MySQL**
|
|
111
|
+
|
|
112
|
+
Method A: Using QuerySUTRA
|
|
113
|
+
```python
|
|
114
|
+
from sutra import SUTRA
|
|
115
|
+
|
|
116
|
+
# Load the SQLite database
|
|
117
|
+
sutra = SUTRA(api_key="your-key", db="my_data.db")
|
|
118
|
+
|
|
119
|
+
# Verify tables
|
|
120
|
+
sutra.tables()
|
|
121
|
+
|
|
122
|
+
# Export to MySQL
|
|
123
|
+
sutra.save_to_mysql("localhost", "root", "password", "my_database")
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
Method B: Using simple_import.py Script
|
|
127
|
+
|
|
128
|
+
Download the conversion script from the repository, or create `simple_import.py` yourself:
|
|
129
|
+
|
|
130
|
+
```python
|
|
131
|
+
import sqlite3
|
|
132
|
+
import mysql.connector
|
|
133
|
+
import pandas as pd
|
|
134
|
+
|
|
135
|
+
# Configuration
|
|
136
|
+
SQLITE_DB = "my_data.db"
|
|
137
|
+
MYSQL_HOST = "localhost"
|
|
138
|
+
MYSQL_USER = "root"
|
|
139
|
+
MYSQL_PASSWORD = "your_password"
|
|
140
|
+
MYSQL_DATABASE = "my_database"
|
|
141
|
+
|
|
142
|
+
# Connect to SQLite
|
|
143
|
+
sqlite_conn = sqlite3.connect(SQLITE_DB)
|
|
144
|
+
cursor = sqlite_conn.cursor()
|
|
145
|
+
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
|
|
146
|
+
tables = [row[0] for row in cursor.fetchall()]
|
|
147
|
+
|
|
148
|
+
# Connect to MySQL and create database
|
|
149
|
+
mysql_conn_temp = mysql.connector.connect(
|
|
150
|
+
host=MYSQL_HOST, user=MYSQL_USER, password=MYSQL_PASSWORD
|
|
151
|
+
)
|
|
152
|
+
temp_cursor = mysql_conn_temp.cursor()
|
|
153
|
+
temp_cursor.execute(f"CREATE DATABASE IF NOT EXISTS {MYSQL_DATABASE}")
|
|
154
|
+
temp_cursor.close()
|
|
155
|
+
mysql_conn_temp.close()
|
|
156
|
+
|
|
157
|
+
# Connect to database and import
|
|
158
|
+
mysql_conn = mysql.connector.connect(
|
|
159
|
+
host=MYSQL_HOST, user=MYSQL_USER,
|
|
160
|
+
password=MYSQL_PASSWORD, database=MYSQL_DATABASE
|
|
161
|
+
)
|
|
162
|
+
mysql_cursor = mysql_conn.cursor()
|
|
163
|
+
|
|
164
|
+
for table in tables:
|
|
165
|
+
df = pd.read_sql_query(f"SELECT * FROM {table}", sqlite_conn)
|
|
166
|
+
|
|
167
|
+
# Create table
|
|
168
|
+
mysql_cursor.execute(f"DROP TABLE IF EXISTS {table}")
|
|
169
|
+
cols = []
|
|
170
|
+
for col in df.columns:
|
|
171
|
+
dtype = 'INT' if df[col].dtype == 'int64' else 'FLOAT' if df[col].dtype == 'float64' else 'TEXT'
|
|
172
|
+
cols.append(f"`{col}` {dtype}")
|
|
173
|
+
mysql_cursor.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
|
|
174
|
+
|
|
175
|
+
# Insert data
|
|
176
|
+
if len(df) > 0:
|
|
177
|
+
placeholders = ', '.join(['%s'] * len(df.columns))
|
|
178
|
+
for _, row in df.iterrows():
|
|
179
|
+
vals = [None if pd.isna(v) else v for v in row.values]
|
|
180
|
+
mysql_cursor.execute(f"INSERT INTO {table} VALUES ({placeholders})", vals)
|
|
181
|
+
|
|
182
|
+
mysql_conn.commit()
|
|
183
|
+
print(f"Imported {table}: {len(df)} rows")
|
|
184
|
+
|
|
185
|
+
sqlite_conn.close()
|
|
186
|
+
mysql_cursor.close()
|
|
187
|
+
mysql_conn.close()
|
|
188
|
+
print("Complete!")
|
|
189
|
+
```
|
|
190
|
+
|
|
191
|
+
Run: `python simple_import.py`
|
|
192
|
+
|
|
193
|
+
**Step 3: Verify in MySQL**
|
|
194
|
+
```sql
|
|
195
|
+
USE my_database;
|
|
196
|
+
SHOW TABLES;
|
|
197
|
+
SELECT * FROM employee_data_people;
|
|
198
|
+
```
|
|
199
|
+
|
|
200
|
+
### 5. Custom Visualizations
|
|
201
|
+
|
|
202
|
+
```python
|
|
203
|
+
result = sutra.ask("Sales by region", viz="pie") # Pie chart
|
|
204
|
+
result = sutra.ask("Trends", viz="line") # Line chart
|
|
205
|
+
result = sutra.ask("Compare", viz="bar") # Bar chart
|
|
206
|
+
result = sutra.ask("Correlation", viz="scatter") # Scatter plot
|
|
207
|
+
result = sutra.ask("Data", viz="table") # Table view
|
|
208
|
+
result = sutra.ask("Analysis", viz="heatmap") # Heatmap
|
|
209
|
+
result = sutra.ask("Auto", viz=True) # Auto-detect
|
|
210
|
+
```
|
|
211
|
+
|
|
212
|
+
### 6. Smart Fuzzy Matching
|
|
213
|
+
|
|
214
|
+
```python
|
|
215
|
+
sutra = SUTRA(api_key="your-key", fuzzy_match=True)
|
|
216
|
+
|
|
217
|
+
# "New York City" automatically matches "New York"
|
|
218
|
+
result = sutra.ask("Who are from New York City?")
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
### 7. Intelligent Caching with Embeddings
|
|
222
|
+
|
|
223
|
+
```python
|
|
224
|
+
sutra = SUTRA(api_key="your-key", use_embeddings=True)
|
|
225
|
+
|
|
226
|
+
result = sutra.ask("Show sales") # Calls API
|
|
227
|
+
result = sutra.ask("Display sales data") # Uses cache (no API call)
|
|
228
|
+
```
|
|
229
|
+
|
|
230
|
+
### 8. Irrelevant Query Detection
|
|
231
|
+
|
|
232
|
+
```python
|
|
233
|
+
sutra = SUTRA(api_key="your-key", check_relevance=True)
|
|
234
|
+
|
|
235
|
+
result = sutra.ask("What is the weather?")
|
|
236
|
+
# Warns: "Query may be irrelevant to your database"
|
|
237
|
+
```
|
|
238
|
+
|
|
239
|
+
### 9. Direct SQL Access
|
|
240
|
+
|
|
241
|
+
```python
|
|
242
|
+
result = sutra.sql("SELECT * FROM people WHERE city='New York'")
|
|
243
|
+
print(result.data)
|
|
244
|
+
```
|
|
245
|
+
|
|
246
|
+
## Complete Configuration
|
|
247
|
+
|
|
248
|
+
```python
|
|
249
|
+
sutra = SUTRA(
|
|
250
|
+
api_key="your-openai-key",
|
|
251
|
+
db="database.db", # SQLite path
|
|
252
|
+
use_embeddings=True, # Smart caching (saves API calls)
|
|
253
|
+
check_relevance=True, # Detect irrelevant queries
|
|
254
|
+
fuzzy_match=True, # Better NLP
|
|
255
|
+
cache_queries=True # Simple caching
|
|
256
|
+
)
|
|
257
|
+
```
|
|
258
|
+
|
|
259
|
+
## Supported Formats
|
|
260
|
+
|
|
261
|
+
CSV, Excel, JSON, SQL, PDF, Word, Text, Pandas DataFrame
|
|
262
|
+
|
|
263
|
+
## How It Works
|
|
264
|
+
|
|
265
|
+
### Multi-Table Entity Extraction
|
|
266
|
+
|
|
267
|
+
From a single PDF, QuerySUTRA automatically creates multiple related tables:
|
|
268
|
+
|
|
269
|
+
**Input:** Employee PDF with 40 employees
|
|
270
|
+
|
|
271
|
+
**Output Tables:**
|
|
272
|
+
- employee_data_people (40 rows)
|
|
273
|
+
- employee_data_skills (50 rows)
|
|
274
|
+
- employee_data_technologies (30 rows)
|
|
275
|
+
- employee_data_projects (25 rows)
|
|
276
|
+
- employee_data_certifications (20 rows)
|
|
277
|
+
- employee_data_education (40 rows)
|
|
278
|
+
- employee_data_work_experience (35 rows)
|
|
279
|
+
|
|
280
|
+
### Proper Relational Structure
|
|
281
|
+
|
|
282
|
+
Tables have unique primary keys and proper foreign key relationships:
|
|
283
|
+
|
|
284
|
+
```
|
|
285
|
+
people table:
|
|
286
|
+
id=1, name="John Doe", city="Dallas"
|
|
287
|
+
id=2, name="Jane Smith", city="New York"
|
|
288
|
+
|
|
289
|
+
skills table:
|
|
290
|
+
id=1, person_id=1, skill_name="Python"
|
|
291
|
+
id=2, person_id=1, skill_name="SQL"
|
|
292
|
+
id=3, person_id=2, skill_name="Java"
|
|
293
|
+
```
|
|
294
|
+
|
|
295
|
+
### Chunk Processing for Large Documents
|
|
296
|
+
|
|
297
|
+
v0.4.0 processes documents in 10,000-character chunks, ensuring ALL data is extracted:
|
|
298
|
+
|
|
299
|
+
- PDF with 50 employees: extracts all 50 (not just the first 10)
|
|
300
|
+
- Large documents: Processes entire content
|
|
301
|
+
- Merges results with unique IDs across chunks
|
|
302
|
+
|
|
303
|
+
## API Reference
|
|
304
|
+
|
|
305
|
+
### Class Methods
|
|
306
|
+
|
|
307
|
+
`SUTRA.load_from_db(db_path, api_key, **kwargs)` - Load existing SQLite
|
|
308
|
+
|
|
309
|
+
`SUTRA.connect_mysql(host, user, password, database, ...)` - Connect to MySQL
|
|
310
|
+
|
|
311
|
+
`SUTRA.connect_postgres(host, user, password, database, ...)` - Connect to PostgreSQL
|
|
312
|
+
|
|
313
|
+
### Instance Methods
|
|
314
|
+
|
|
315
|
+
`upload(data, name, extract_entities)` - Upload data
|
|
316
|
+
|
|
317
|
+
`ask(question, viz, table)` - Natural language query
|
|
318
|
+
|
|
319
|
+
`sql(query, viz)` - Raw SQL query
|
|
320
|
+
|
|
321
|
+
`tables()` - List all tables
|
|
322
|
+
|
|
323
|
+
`schema(table)` - Show schema
|
|
324
|
+
|
|
325
|
+
`peek(table, n)` - Preview data
|
|
326
|
+
|
|
327
|
+
`export_db(path, format)` - Export database (sqlite/sql/json/excel)
|
|
328
|
+
|
|
329
|
+
`save_to_mysql(...)` - Export to MySQL
|
|
330
|
+
|
|
331
|
+
`save_to_postgres(...)` - Export to PostgreSQL
|
|
332
|
+
|
|
333
|
+
`backup(path)` - Create backup
|
|
334
|
+
|
|
335
|
+
`close()` - Close connection
|
|
336
|
+
|
|
337
|
+
## Common Workflows
|
|
338
|
+
|
|
339
|
+
### Workflow 1: Analyze PDF in Colab, Export to Local MySQL
|
|
340
|
+
|
|
341
|
+
```python
|
|
342
|
+
# In Colab
|
|
343
|
+
from sutra import SUTRA
|
|
344
|
+
|
|
345
|
+
sutra = SUTRA(api_key="your-key")
|
|
346
|
+
sutra.upload("document.pdf")
|
|
347
|
+
sutra.tables()
|
|
348
|
+
|
|
349
|
+
# Export and download
|
|
350
|
+
sutra.export_db("data.db", format="sqlite")
|
|
351
|
+
from google.colab import files
|
|
352
|
+
files.download("data.db")
|
|
353
|
+
|
|
354
|
+
# On Windows
|
|
355
|
+
sutra = SUTRA(api_key="your-key", db="data.db")
|
|
356
|
+
sutra.save_to_mysql("localhost", "root", "password", "my_database")
|
|
357
|
+
```
|
|
358
|
+
|
|
359
|
+
### Workflow 2: Load Existing Database and Query
|
|
360
|
+
|
|
361
|
+
```python
|
|
362
|
+
# No need to re-upload data
|
|
363
|
+
sutra = SUTRA.load_from_db("data.db", api_key="your-key")
|
|
364
|
+
result = sutra.ask("Your question", viz="pie")
|
|
365
|
+
```
|
|
366
|
+
|
|
367
|
+
### Workflow 3: Query MySQL Directly
|
|
368
|
+
|
|
369
|
+
```python
|
|
370
|
+
# Connect and query MySQL database
|
|
371
|
+
sutra = SUTRA.connect_mysql("localhost", "root", "password", "production_db")
|
|
372
|
+
result = sutra.ask("Show me latest transactions")
|
|
373
|
+
```
|
|
374
|
+
|
|
375
|
+
## Performance Tips
|
|
376
|
+
|
|
377
|
+
1. Use `load_from_db()` to avoid re-uploading
|
|
378
|
+
2. Use `sql()` for complex queries (no API cost)
|
|
379
|
+
3. Enable `use_embeddings=True` for caching similar queries
|
|
380
|
+
4. Enable `cache_queries=True` for exact query matches
|
|
381
|
+
5. For large PDFs (50+ pages), allow extra processing time
|
|
382
|
+
|
|
383
|
+
## Troubleshooting
|
|
384
|
+
|
|
385
|
+
**Only extracting 10 records instead of 50:**
|
|
386
|
+
- Fixed in v0.4.0 with chunk processing
|
|
387
|
+
- Upgrade: `pip install --upgrade QuerySUTRA`
|
|
388
|
+
|
|
389
|
+
**MySQL import fails:**
|
|
390
|
+
- Ensure MySQL database exists: `CREATE DATABASE my_database;`
|
|
391
|
+
- Install dependencies: `pip install QuerySUTRA[mysql]`
|
|
392
|
+
- Check MySQL is running: `mysql -u root -p`
|
|
393
|
+
|
|
394
|
+
**Colab disk I/O error:**
|
|
395
|
+
- Fixed in v0.4.0 with better connection handling
|
|
396
|
+
- Restart Colab runtime and try again
|
|
397
|
+
|
|
398
|
+
**connect_mysql() not found:**
|
|
399
|
+
- Update QuerySUTRA: `pip install --upgrade QuerySUTRA`
|
|
400
|
+
- Requires v0.3.0 or higher
|
|
401
|
+
|
|
402
|
+
## System Requirements
|
|
403
|
+
|
|
404
|
+
- Python 3.8+
|
|
405
|
+
- OpenAI API key
|
|
406
|
+
- 100MB disk space (for embeddings)
|
|
407
|
+
- MySQL/PostgreSQL (optional, for database export)
|
|
408
|
+
|
|
409
|
+
## License
|
|
410
|
+
|
|
411
|
+
MIT License
|
|
412
|
+
|
|
413
|
+
## Changelog
|
|
414
|
+
|
|
415
|
+
### v0.4.0 (Latest)
|
|
416
|
+
- FIXED: Complete data extraction from large documents
|
|
417
|
+
- Chunk processing for PDFs with 50+ pages
|
|
418
|
+
- All employees extracted (not just first 10)
|
|
419
|
+
- Improved MySQL/PostgreSQL import
|
|
420
|
+
- Better error handling for Colab
|
|
421
|
+
|
|
422
|
+
### v0.3.x
|
|
423
|
+
- Added MySQL/PostgreSQL connectivity
|
|
424
|
+
- Smart caching with embeddings
|
|
425
|
+
- Fuzzy matching for better NLP
|
|
426
|
+
- Custom visualizations
|
|
427
|
+
- Irrelevant query detection
|
|
428
|
+
|
|
429
|
+
### v0.2.x
|
|
430
|
+
- Multi-table entity extraction
|
|
431
|
+
- Proper primary and foreign keys
|
|
432
|
+
|
|
433
|
+
### v0.1.x
|
|
434
|
+
- Initial release
|
|
435
|
+
|
|
436
|
+
---
|
|
437
|
+
|
|
438
|
+
**Made by Aditya Batta**
|