qasql-1.0.0.tar.gz

qasql-1.0.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Chansokheang
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
qasql-1.0.0/MANIFEST.in ADDED
@@ -0,0 +1,5 @@
1
+ include LICENSE
2
+ include README.md
3
+ include pyproject.toml
4
+ recursive-include qasql *.py
5
+ recursive-include qasql py.typed
qasql-1.0.0/PKG-INFO ADDED
@@ -0,0 +1,640 @@
1
+ Metadata-Version: 2.4
2
+ Name: qasql
3
+ Version: 1.0.0
4
+ Summary: Local-first Text-to-SQL engine for enterprise deployment
5
+ Author-email: Chansokheang <heangs770@gmail.com>
6
+ License-Expression: MIT
7
+ Project-URL: Homepage, https://github.com/Chansokheang/Map-Reduce-Schema-sdk
8
+ Project-URL: Documentation, https://github.com/Chansokheang/Map-Reduce-Schema-sdk#readme
9
+ Project-URL: Repository, https://github.com/Chansokheang/Map-Reduce-Schema-sdk
10
+ Project-URL: Issues, https://github.com/Chansokheang/Map-Reduce-Schema-sdk/issues
11
+ Keywords: text-to-sql,natural-language,sql,database,llm,enterprise,local,ollama
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: Operating System :: OS Independent
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Classifier: Topic :: Database
20
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
21
+ Requires-Python: >=3.10
22
+ Description-Content-Type: text/markdown
23
+ License-File: LICENSE
24
+ Requires-Dist: requests>=2.28.0
25
+ Provides-Extra: anthropic
26
+ Requires-Dist: anthropic>=0.18.0; extra == "anthropic"
27
+ Provides-Extra: openai
28
+ Requires-Dist: openai>=1.0.0; extra == "openai"
29
+ Provides-Extra: postgres
30
+ Requires-Dist: psycopg2-binary>=2.9.0; extra == "postgres"
31
+ Provides-Extra: all
32
+ Requires-Dist: anthropic>=0.18.0; extra == "all"
33
+ Requires-Dist: openai>=1.0.0; extra == "all"
34
+ Requires-Dist: psycopg2-binary>=2.9.0; extra == "all"
35
+ Provides-Extra: dev
36
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
37
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
38
+ Requires-Dist: black>=23.0.0; extra == "dev"
39
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
40
+ Dynamic: license-file
41
+
42
+ # QA-SQL SDK
43
+
44
+ **Local-first Text-to-SQL engine for enterprise deployment.**
45
+
46
+ With a local LLM provider (Ollama), all processing happens locally and sensitive database schemas never leave your network.
47
+
48
+ [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
49
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
50
+
51
+ ---
52
+
53
+ ## Table of Contents
54
+
55
+ - [Features](#features)
56
+ - [Installation](#installation)
57
+ - [Quick Start](#quick-start)
58
+ - [CLI Usage](#cli-usage)
59
+ - [Python SDK](#python-sdk)
60
+ - [Configuration](#configuration)
61
+ - [How It Works](#how-it-works)
62
+ - [Examples](#examples)
63
+ - [API Reference](#api-reference)
64
+ - [Troubleshooting](#troubleshooting)
65
+
66
+ ---
67
+
68
+ ## Features
69
+
70
+ - **Privacy-First**: Use local LLMs (Ollama) - zero data leaves your network
71
+ - **Multi-Strategy Generation**: Generates 4 SQL candidates using different strategies, or 5 when a hint is supplied
72
+ - **Automatic Schema Discovery**: Extracts and profiles database structure
73
+ - **Smart Selection**: LLM-as-a-Judge picks the best SQL candidate
74
+ - **Database Support**: SQLite and PostgreSQL
75
+ - **Flexible LLM Support**: Ollama (local), Anthropic Claude, OpenAI GPT
76
+
77
+ ---
78
+
79
+ ## Installation
80
+
81
+ ### Step 1: Install the SDK
82
+
83
+ ```bash
84
+ # From source (development)
85
+ cd qasql-sdk
86
+ pip install -e .
87
+
88
+ # Or from PyPI (after publish)
89
+ pip install qasql
90
+ ```
91
+
92
+ ### Step 2: Install Optional Dependencies
93
+
94
+ ```bash
95
+ # For Anthropic Claude
96
+ pip install qasql[anthropic]
97
+
98
+ # For OpenAI
99
+ pip install qasql[openai]
100
+
101
+ # For PostgreSQL
102
+ pip install qasql[postgres]
103
+
104
+ # All extras
105
+ pip install qasql[all]
106
+ ```
107
+
108
+ ### Step 3: Setup LLM Provider
109
+
110
+ #### Option A: Ollama (Local - Recommended for Privacy)
111
+
112
+ ```bash
113
+ # Install Ollama
114
+ curl -fsSL https://ollama.ai/install.sh | sh
115
+
116
+ # Start Ollama server (keep running in terminal)
117
+ ollama serve
118
+
119
+ # In another terminal, pull a model
120
+ ollama pull llama3.2
121
+
122
+ # Or for better SQL generation
123
+ ollama pull codellama:13b
124
+ ```
125
+
126
+ #### Option B: Anthropic API
127
+
128
+ ```bash
129
+ export ANTHROPIC_API_KEY='your-anthropic-api-key'
130
+ ```
131
+
132
+ #### Option C: OpenAI API
133
+
134
+ ```bash
135
+ export OPENAI_API_KEY='your-openai-api-key'
136
+ ```
137
+
138
+ ---
139
+
140
+ ## Quick Start
141
+
142
+ ### 1. Test Schema Extraction (No LLM Required)
143
+
144
+ ```bash
145
+ cd qasql-sdk/examples
146
+ python test_schema_only.py
147
+ ```
148
+
149
+ ### 2. Full Text-to-SQL Test (Requires LLM)
150
+
151
+ ```bash
152
+ # Make sure Ollama is running first
153
+ ollama serve
154
+
155
+ # Then run the test
156
+ cd qasql-sdk/examples
157
+ python test_california_schools.py
158
+ ```
159
+
160
+ ### 3. Interactive Demo
161
+
162
+ ```bash
163
+ cd qasql-sdk/examples
164
+ python interactive_demo.py --db-uri sqlite:///../../app/california_schools.sqlite
165
+ ```
166
+
167
+ ---
168
+
169
+ ## CLI Usage
170
+
171
+ ### Using `python -m qasql`
172
+
173
+ ```bash
174
+ cd qasql-sdk
175
+
176
+ # List tables
177
+ python -m qasql tables --db-uri sqlite:///path/to/database.sqlite
178
+
179
+ # Setup database (extract schema)
180
+ python -m qasql setup --db-uri sqlite:///path/to/database.sqlite
181
+
182
+ # Generate SQL from question
183
+ python -m qasql query --db-uri sqlite:///path/to/database.sqlite \
184
+ --question "How many customers are there?"
185
+
186
+ # Generate SQL with hint (enables SME strategy)
187
+ python -m qasql query --db-uri sqlite:///path/to/database.sqlite \
188
+ --question "What is the total revenue?" \
189
+ --hint "revenue = sum(amount) from orders table"
190
+
191
+ # Execute the generated SQL
192
+ python -m qasql query --db-uri sqlite:///path/to/database.sqlite \
193
+ --question "List all products" \
194
+ --execute
195
+
196
+ # Show verbose output with timings
197
+ python -m qasql query --db-uri sqlite:///path/to/database.sqlite \
198
+ --question "Count orders by status" \
199
+ --verbose
200
+ ```
201
+
202
+ ### CLI Options
203
+
204
+ ```
205
+ Global Options:
206
+ --config, -c Path to config file (JSON)
207
+ --db-uri Database URI (sqlite:/// or postgresql://)
208
+ --provider LLM provider: ollama, anthropic, openai (default: ollama)
209
+ --model LLM model name (default: llama3.2)
210
+ --ollama-url Ollama server URL (default: http://localhost:11434)
211
+ --output-dir, -o Output directory (default: ./qasql_output)
212
+
213
+ Commands:
214
+ setup Extract schema and generate descriptions
215
+ --readable-names Path to readable names mapping file
216
+ --force, -f Force regeneration
217
+
218
+ query Generate SQL from natural language
219
+ --question, -q Natural language question (required)
220
+ --hint SME hint for better accuracy
221
+ --execute, -e Execute the generated SQL
222
+ --verbose, -v Show timing information
223
+ --json Output as JSON
224
+
225
+ tables List database tables
226
+ ```
227
+
228
+ ---
229
+
230
+ ## Python SDK
231
+
232
+ ### Basic Usage
233
+
234
+ ```python
235
+ from qasql import QASQLEngine
236
+
237
+ # Initialize engine
238
+ engine = QASQLEngine(
239
+ db_uri="sqlite:///path/to/database.sqlite",
240
+ llm_provider="ollama", # or "anthropic", "openai"
241
+ llm_model="llama3.2", # model name
242
+ output_dir="./qasql_output"
243
+ )
244
+
245
+ # One-time setup (extracts schema, generates column descriptions)
246
+ setup_result = engine.setup()
247
+ print(f"Tables found: {setup_result.tables_found}")
248
+
249
+ # Query WITHOUT hint → generates 4 candidates
250
+ result = engine.query("How many customers are there?")
251
+ print(result.sql)
252
+ print(result.confidence)
253
+
254
+ # Query WITH hint → generates 5 candidates (includes SME strategy)
255
+ result = engine.query(
256
+ question="What is the total revenue by month?",
257
+ hint="revenue = sum(order_amount), use orders table"
258
+ )
259
+ print(result.sql)
260
+ print(result.reasoning)
261
+
262
+ # Execute SQL directly
263
+ rows, columns = engine.execute_sql(result.sql)
264
+ print(columns)
265
+ print(rows)
266
+ ```
267
+
268
+ ### With Configuration File
269
+
270
+ ```python
271
+ from qasql import QASQLEngine
272
+
273
+ # Load from config file
274
+ engine = QASQLEngine(config_file="qasql.config.json")
275
+ engine.setup()
276
+ result = engine.query("Show all orders")
277
+ ```
278
+
279
+ ### Inspect Schema
280
+
281
+ ```python
282
+ # Get table list
283
+ tables = engine.get_tables()
284
+ print(tables) # ['customers', 'orders', 'products']
285
+
286
+ # Get full schema
287
+ schema = engine.get_schema()
288
+ for table_name, info in schema.items():
289
+     print(f"{table_name}: {len(info['columns'])} columns, {info['row_count']} rows")
290
+
291
+ # Get column descriptions
292
+ profile = engine.get_profile()
293
+ ```
294
+
295
+ ### Inspect Query Results
296
+
297
+ ```python
298
+ result = engine.query("Show top 10 customers by revenue")
299
+
300
+ # Access all fields
301
+ print(result.sql) # Generated SQL
302
+ print(result.confidence) # 0.0 - 1.0
303
+ print(result.reasoning) # Why this SQL was selected
304
+ print(result.question) # Original question
305
+ print(result.hint) # Hint if provided
306
+
307
+ # Candidate details
308
+ print(f"Candidates: {result.successful_candidates}/{result.total_candidates}")
309
+ for candidate in result.candidates:
310
+     print(f"  [{candidate.strategy}] {candidate.success} - {candidate.sql[:50]}...")
311
+
312
+ # Timing information
313
+ for stage, ms in result.metadata.get("timings", {}).items():
314
+     print(f"  {stage}: {ms:.0f}ms")
315
+
316
+ # Convert to dictionary
317
+ result_dict = result.to_dict()
318
+ ```
319
+
320
+ ---
321
+
322
+ ## Configuration
323
+
324
+ ### Config File (qasql.config.json)
325
+
326
+ ```json
327
+ {
328
+ "database": {
329
+ "type": "sqlite",
330
+ "uri": "./database.sqlite"
331
+ },
332
+ "llm": {
333
+ "provider": "ollama",
334
+ "model": "llama3.2",
335
+ "base_url": "http://localhost:11434"
336
+ },
337
+ "options": {
338
+ "readable_names": "mappings.json",
339
+ "relevance_threshold": 0.5,
340
+ "query_timeout": 30,
341
+ "output_dir": "./output"
342
+ }
343
+ }
344
+ ```
345
+
346
+ ### PostgreSQL Configuration
347
+
348
+ ```json
349
+ {
350
+ "database": {
351
+ "type": "postgresql",
352
+ "uri": "postgresql://user:password@localhost:5432/mydb"
353
+ },
354
+ "llm": {
355
+ "provider": "ollama",
356
+ "model": "llama3.2"
357
+ }
358
+ }
359
+ ```
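The two config files above share the same top-level shape. As a minimal sketch, assuming the allowed values are exactly those named in this README, a config could be sanity-checked before it is handed to the engine. The `check_config` helper and its rules are hypothetical and are not part of the qasql API.

```python
import json

# Allowed values are taken from the README; the validation rules are an assumption.
ALLOWED_DB_TYPES = {"sqlite", "postgresql"}
ALLOWED_PROVIDERS = {"ollama", "anthropic", "openai"}


def check_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks plausible."""
    problems = []
    database = config.get("database", {})
    llm = config.get("llm", {})

    if database.get("type") not in ALLOWED_DB_TYPES:
        problems.append(f"database.type must be one of {sorted(ALLOWED_DB_TYPES)}")
    if not database.get("uri"):
        problems.append("database.uri is missing")
    if llm.get("provider") not in ALLOWED_PROVIDERS:
        problems.append(f"llm.provider must be one of {sorted(ALLOWED_PROVIDERS)}")
    if not llm.get("model"):
        problems.append("llm.model is missing")
    return problems


# The PostgreSQL example from the README, inlined so the sketch is self-contained.
example = json.loads("""
{
  "database": {"type": "postgresql", "uri": "postgresql://user:password@localhost:5432/mydb"},
  "llm": {"provider": "ollama", "model": "llama3.2"}
}
""")
print(check_config(example))  # prints []
```

The `options` section is left unchecked here because the README does not state which of its keys are required.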
360
+
361
+ ### Environment Variables
362
+
363
+ ```bash
364
+ export QASQL_DB_URI="sqlite:///database.sqlite"
365
+ export QASQL_DB_TYPE="sqlite"
366
+ export QASQL_LLM_PROVIDER="ollama"
367
+ export QASQL_LLM_MODEL="llama3.2"
368
+ export QASQL_OLLAMA_URL="http://localhost:11434"
369
+
370
+ # For cloud providers
371
+ export ANTHROPIC_API_KEY="sk-ant-..."
372
+ export OPENAI_API_KEY="sk-..."
373
+ ```
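How the SDK itself reads these variables, and with what precedence, is not shown here. The sketch below only makes the mapping explicit by collecting them into keyword arguments named after the `QASQLEngine` constructor parameters. The `settings_from_env` helper is hypothetical, and the fallback values are the CLI defaults listed earlier.

```python
import os

# Fallbacks mirror the CLI defaults documented in the README.
DEFAULT_PROVIDER = "ollama"
DEFAULT_MODEL = "llama3.2"
DEFAULT_OLLAMA_URL = "http://localhost:11434"


def settings_from_env(environ=None) -> dict:
    """Map QASQL_* environment variables to constructor-style keyword arguments."""
    environ = os.environ if environ is None else environ
    return {
        "db_uri": environ.get("QASQL_DB_URI"),
        "db_type": environ.get("QASQL_DB_TYPE"),
        "llm_provider": environ.get("QASQL_LLM_PROVIDER", DEFAULT_PROVIDER),
        "llm_model": environ.get("QASQL_LLM_MODEL", DEFAULT_MODEL),
        "llm_base_url": environ.get("QASQL_OLLAMA_URL", DEFAULT_OLLAMA_URL),
    }


# Passing a plain dict instead of os.environ keeps the example deterministic.
settings = settings_from_env({"QASQL_DB_URI": "sqlite:///database.sqlite"})
print(settings["llm_provider"], settings["llm_model"])  # prints: ollama llama3.2
```

With the package installed, the resulting dictionary could be passed on as `QASQLEngine(**settings)`.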
374
+
375
+ ### Readable Names Mapping
376
+
377
+ If your database has cryptic column names, provide a mapping file:
378
+
379
+ **JSON format:**
380
+ ```json
381
+ {
382
+ "tbl_cust_01": {
383
+ "table_readable_name": "Customers",
384
+ "columns": {
385
+ "col_a": "Customer Name",
386
+ "col_b": "Email Address",
387
+ "col_c": "Registration Date"
388
+ }
389
+ },
390
+ "tbl_ord_02": {
391
+ "table_readable_name": "Orders",
392
+ "columns": {
393
+ "ord_id": "Order ID",
394
+ "amt_val": "Order Amount"
395
+ }
396
+ }
397
+ }
398
+ ```
399
+
400
+ **CSV format:**
401
+ ```csv
402
+ table,column,readable_name
403
+ tbl_cust_01,col_a,Customer Name
404
+ tbl_cust_01,col_b,Email Address
405
+ tbl_ord_02,amt_val,Order Amount
406
+ ```
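The two formats carry nearly the same information, so one can be derived from the other. The sketch below converts the CSV layout into the JSON layout using only the standard library. The CSV has no table-level readable name, so the raw table name is reused for `table_readable_name`; that fallback is an assumption, and `csv_to_mapping` is a hypothetical helper rather than part of qasql.

```python
import csv
import io
import json


def csv_to_mapping(csv_text: str) -> dict:
    """Convert 'table,column,readable_name' rows into the nested JSON mapping layout."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        table = mapping.setdefault(
            row["table"],
            # Assumption: fall back to the raw table name, since the CSV has no table label.
            {"table_readable_name": row["table"], "columns": {}},
        )
        table["columns"][row["column"]] = row["readable_name"]
    return mapping


CSV_TEXT = """table,column,readable_name
tbl_cust_01,col_a,Customer Name
tbl_cust_01,col_b,Email Address
tbl_ord_02,amt_val,Order Amount
"""

mapping = csv_to_mapping(CSV_TEXT)
print(json.dumps(mapping, indent=2))
```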
407
+
408
+ ---
409
+
410
+ ## How It Works
411
+
412
+ ### Architecture
413
+
414
+ ```
415
+ ┌─────────────────────────────────────────────────────────────┐
416
+ │ QA-SQL SDK │
417
+ ├─────────────────────────────────────────────────────────────┤
418
+ │ │
419
+ │ ┌──────────┐ ┌──────────────┐ ┌──────────────────┐ │
420
+ │ │ Database │───▶│ QASQLEngine │───▶│ LLM Provider │ │
421
+ │ │ SQLite/ │ │ │ │ Ollama/Anthropic │ │
422
+ │ │ Postgres │◀───│ │◀───│ /OpenAI │ │
423
+ │ └──────────┘ └──────────────┘ └──────────────────┘ │
424
+ │ │
425
+ └─────────────────────────────────────────────────────────────┘
426
+ ```
427
+
428
+ ### Two-Phase Flow
429
+
430
+ **Phase 1: Setup (One-time)**
431
+ ```
432
+ Database → Schema Extraction → Column Descriptions → Ready
433
+ ```
434
+
435
+ **Phase 2: Query (Runtime)**
436
+ ```
437
+ Question → Schema Agent → Candidate Generation → Execution → Judge → SQL
438
+ (Map-Reduce) (4-5 strategies) (retry) (select best)
439
+ ```
440
+
441
+ ### Candidate Generation Strategies
442
+
443
+ | Strategy | Description | With Hint | Without Hint |
444
+ |----------|-------------|-----------|--------------|
445
+ | full_schema | Complete database schema | ✓ | ✓ |
446
+ | sme_metadata | Schema + domain expert hints | ✓ | ✗ (skipped) |
447
+ | minimal_profile | Column names only | ✓ | ✓ |
448
+ | focused_schema | Relevant tables only | ✓ | ✓ |
449
+ | full_profile | Schema + descriptions | ✓ | ✓ |
450
+ | **Total** | | **5** | **4** |
451
+
452
+ When no hint is provided, the SME strategy is skipped since it requires domain knowledge.
453
+
454
+ ---
455
+
456
+ ## Examples
457
+
458
+ ### Example 1: Simple Query
459
+
460
+ ```python
461
+ from qasql import QASQLEngine
462
+
463
+ engine = QASQLEngine(db_uri="sqlite:///sales.sqlite")
464
+ engine.setup()
465
+
466
+ result = engine.query("How many orders were placed last month?")
467
+ print(result.sql)
468
+ # SELECT COUNT(*) FROM orders WHERE order_date >= date('now', '-1 month')
469
+ ```
470
+
471
+ ### Example 2: Query with Hint
472
+
473
+ ```python
474
+ result = engine.query(
475
+ question="What is the average order value by customer segment?",
476
+ hint="order value = quantity * unit_price, segment is in customers table"
477
+ )
478
+ print(result.sql)
479
+ print(result.confidence)  # Typically higher when a hint is supplied
480
+ ```
481
+
482
+ ### Example 3: Execute and Display Results
483
+
484
+ ```python
485
+ result = engine.query("List top 5 customers by total purchases")
486
+
487
+ if result.sql:
488
+     rows, columns = engine.execute_sql(result.sql)
489
+
490
+     # Print as table
491
+     print(" | ".join(columns))
492
+     print("-" * 50)
493
+     for row in rows:
494
+         print(" | ".join(str(v) for v in row))
495
+ ```
496
+
497
+ ### Example 4: Using Anthropic
498
+
499
+ ```python
500
+ import os
501
+ os.environ["ANTHROPIC_API_KEY"] = "your-key"
502
+
503
+ engine = QASQLEngine(
504
+ db_uri="sqlite:///mydb.sqlite",
505
+ llm_provider="anthropic",
506
+ llm_model="claude-sonnet-4-5-20250929"
507
+ )
508
+ ```
509
+
510
+ ---
511
+
512
+ ## API Reference
513
+
514
+ ### QASQLEngine
515
+
516
+ ```python
517
+ class QASQLEngine:
518
+     def __init__(
519
+         self,
520
+         db_uri: str = None,                  # Database URI
521
+         db_type: str = None,                 # "sqlite" or "postgresql"
522
+         llm_provider: str = "ollama",        # "ollama", "anthropic", "openai"
523
+         llm_model: str = "llama3.2",         # Model name
524
+         llm_base_url: str = "http://localhost:11434",
525
+         readable_names: str = None,          # Path to mappings file
526
+         output_dir: str = "./qasql_output",
527
+         config_file: str = None,             # Path to config JSON
528
+     ): ...
529
+
530
+     def setup(self, force: bool = False) -> SetupResult: ...
531
+     def query(self, question: str, hint: str = None) -> QueryResult: ...
532
+     def execute_sql(self, sql: str) -> tuple[list, list]: ...
533
+     def get_tables(self) -> list[str]: ...
534
+     def get_schema(self) -> dict: ...
535
+     def get_profile(self) -> dict: ...
536
+ ```
537
+
538
+ ### QueryResult
539
+
540
+ ```python
541
+ @dataclass
542
+ class QueryResult:
543
+     sql: str                     # Generated SQL
544
+     confidence: float            # 0.0 - 1.0
545
+     question: str                # Original question
546
+     hint: str | None             # Provided hint
547
+     reasoning: str               # Selection reasoning
548
+     candidates: list             # All candidates
549
+     successful_candidates: int   # Count of successful
550
+     total_candidates: int        # Total count
551
+     metadata: dict               # Timings, etc.
552
+
553
+     def to_dict(self) -> dict: ...
554
+ ```
555
+
556
+ ### SetupResult
557
+
558
+ ```python
559
+ @dataclass
560
+ class SetupResult:
561
+     success: bool
562
+     database_name: str
563
+     tables_found: int
564
+     schema_path: str | None
565
+     descriptions_path: str | None
566
+     errors: list[str]
567
+ ```
568
+
569
+ ---
570
+
571
+ ## Troubleshooting
572
+
573
+ ### "command not found: qasql"
574
+
575
+ Use `python -m qasql` instead:
576
+ ```bash
577
+ python -m qasql tables --db-uri sqlite:///mydb.sqlite
578
+ ```
579
+
580
+ ### "Cannot connect to Ollama"
581
+
582
+ Make sure Ollama is running:
583
+ ```bash
584
+ # Terminal 1
585
+ ollama serve
586
+
587
+ # Terminal 2
588
+ ollama pull llama3.2
589
+ ```
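Before running a query, it can help to confirm that the server is reachable from Python. The sketch below probes Ollama's `/api/tags` endpoint (which lists locally pulled models) with the standard library; the `ollama_is_running` helper is hypothetical and is not part of qasql.

```python
import http.client
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if base_url/api/tags answers with HTTP 200, False on any failure."""
    url = f"{base_url.rstrip('/')}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_running())
```

A `False` result only shows that the default URL did not answer; if the server listens elsewhere, pass that address or set `--ollama-url` accordingly.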
590
+
591
+ ### "ANTHROPIC_API_KEY not found"
592
+
593
+ Set the environment variable:
594
+ ```bash
595
+ export ANTHROPIC_API_KEY='your-key'
596
+ ```
597
+
598
+ ### "Database not found"
599
+
600
+ Check the path is correct:
601
+ ```bash
602
+ # Use absolute path
603
+ python -m qasql tables --db-uri sqlite:////absolute/path/to/db.sqlite
604
+
605
+ # Or relative path
606
+ python -m qasql tables --db-uri sqlite:///./relative/path/db.sqlite
607
+ ```
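The slash count in SQLite URIs is easy to get wrong by hand. As a sketch, the URI can be built from a filesystem path so that the file's existence is checked at the same time. The `sqlite_uri` helper is hypothetical; on POSIX systems it yields the four-slash absolute form shown above.

```python
from pathlib import Path


def sqlite_uri(path: str) -> str:
    """Build an absolute sqlite:/// URI, failing early if the file does not exist."""
    absolute = Path(path).expanduser().resolve()
    if not absolute.is_file():
        raise FileNotFoundError(f"No database file at {absolute}")
    # An absolute POSIX path starts with '/', which produces 'sqlite:////...'.
    return "sqlite:///" + absolute.as_posix()
```

The result can then be passed to `--db-uri` or to the `db_uri` parameter of `QASQLEngine`.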
608
+
609
+ ### "No module named 'qasql'"
610
+
611
+ Install the package:
612
+ ```bash
613
+ cd qasql-sdk
614
+ pip install -e .
615
+ ```
616
+
617
+ ---
618
+
619
+ ## Data Privacy
620
+
621
+ | Provider | Data Location | Recommendation |
622
+ |----------|---------------|----------------|
623
+ | **Ollama** | 100% Local | Enterprise / Sensitive data |
624
+ | Anthropic | Cloud (Anthropic servers) | Development / Non-sensitive |
625
+ | OpenAI | Cloud (OpenAI servers) | Development / Non-sensitive |
626
+
627
+ **With Ollama, zero data leaves your network.**
628
+
629
+ ---
630
+
631
+ ## License
632
+
633
+ MIT License - see [LICENSE](LICENSE) for details.
634
+
635
+ ---
636
+
637
+ ## Support
638
+
639
+ - Issues: [GitHub Issues](https://github.com/Chansokheang/Map-Reduce-Schema-sdk/issues)
640
+ - Documentation: [README](https://github.com/Chansokheang/Map-Reduce-Schema-sdk#readme)