thoth-dbmanager 0.4.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- thoth_dbmanager-0.4.0/LICENSE +21 -0
- thoth_dbmanager-0.4.0/MANIFEST.in +3 -0
- thoth_dbmanager-0.4.0/PKG-INFO +384 -0
- thoth_dbmanager-0.4.0/README.md +312 -0
- thoth_dbmanager-0.4.0/pyproject.toml +86 -0
- thoth_dbmanager-0.4.0/setup.cfg +4 -0
- thoth_dbmanager-0.4.0/tests/test_integration_new_architecture.py +363 -0
- thoth_dbmanager-0.4.0/tests/test_lsh_query.py +63 -0
- thoth_dbmanager-0.4.0/tests/test_new_architecture.py +471 -0
- thoth_dbmanager-0.4.0/tests/test_parameter_validation.py +68 -0
- thoth_dbmanager-0.4.0/tests/test_thoth_db_manager_base.py +140 -0
- thoth_dbmanager-0.4.0/tests/test_thoth_informix_manager.py +41 -0
- thoth_dbmanager-0.4.0/tests/test_thoth_mariadb_manager.py +39 -0
- thoth_dbmanager-0.4.0/tests/test_thoth_mysql_manager.py +39 -0
- thoth_dbmanager-0.4.0/tests/test_thoth_oracle_manager.py +39 -0
- thoth_dbmanager-0.4.0/tests/test_thoth_pg_manager.py +292 -0
- thoth_dbmanager-0.4.0/tests/test_thoth_sqlite_manager.py +326 -0
- thoth_dbmanager-0.4.0/tests/test_thoth_sqlserver_manager.py +39 -0
- thoth_dbmanager-0.4.0/tests/test_thoth_supabase_manager.py +376 -0
- thoth_dbmanager-0.4.0/thoth_dbmanager.egg-info/PKG-INFO +384 -0
- thoth_dbmanager-0.4.0/thoth_dbmanager.egg-info/SOURCES.txt +22 -0
- thoth_dbmanager-0.4.0/thoth_dbmanager.egg-info/dependency_links.txt +1 -0
- thoth_dbmanager-0.4.0/thoth_dbmanager.egg-info/requires.txt +56 -0
- thoth_dbmanager-0.4.0/thoth_dbmanager.egg-info/top_level.txt +1 -0
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Marco Pancotti

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,384 @@
Metadata-Version: 2.4
Name: thoth-dbmanager
Version: 0.4.0
Summary: A Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.
Author-email: Marco Pancotti <mp@tylconsulting.it>
License: MIT
Project-URL: Homepage, https://github.com/mptyl/thoth-dbmanager
Project-URL: Bug Tracker, https://github.com/mptyl/thoth-dbmanager/issues
Project-URL: Documentation, https://github.com/mptyl/thoth-dbmanager#readme
Project-URL: Source Code, https://github.com/mptyl/thoth-dbmanager
Keywords: database,sql,lsh,similarity-search,orm
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Development Status :: 4 - Beta
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: datasketch>=1.5.0
Requires-Dist: tqdm>=4.60.0
Requires-Dist: SQLAlchemy>=1.4.0
Requires-Dist: pydantic>=2.0.0
Provides-Extra: postgresql
Requires-Dist: psycopg2-binary>=2.9.0; extra == "postgresql"
Provides-Extra: mysql
Requires-Dist: mysql-connector-python>=8.0.0; extra == "mysql"
Provides-Extra: mariadb
Requires-Dist: mariadb>=1.1.0; extra == "mariadb"
Provides-Extra: sqlserver
Requires-Dist: pyodbc>=4.0.0; extra == "sqlserver"
Provides-Extra: oracle
Requires-Dist: cx_Oracle>=8.3.0; extra == "oracle"
Provides-Extra: informix
Requires-Dist: informixdb>=2.2.0; extra == "informix"
Provides-Extra: supabase
Requires-Dist: supabase>=2.0.0; extra == "supabase"
Requires-Dist: postgrest-py>=0.16.0; extra == "supabase"
Requires-Dist: gotrue-py>=2.0.0; extra == "supabase"
Provides-Extra: sqlite
Provides-Extra: all
Requires-Dist: psycopg2-binary>=2.9.0; extra == "all"
Requires-Dist: mysql-connector-python>=8.0.0; extra == "all"
Requires-Dist: mariadb>=1.1.0; extra == "all"
Requires-Dist: pyodbc>=4.0.0; extra == "all"
Requires-Dist: cx_Oracle>=8.3.0; extra == "all"
Requires-Dist: informixdb>=2.2.0; extra == "all"
Requires-Dist: supabase>=2.0.0; extra == "all"
Requires-Dist: postgrest-py>=0.16.0; extra == "all"
Requires-Dist: gotrue-py>=2.0.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Provides-Extra: test-postgresql
Requires-Dist: pytest>=7.0.0; extra == "test-postgresql"
Requires-Dist: psycopg2-binary>=2.9.0; extra == "test-postgresql"
Provides-Extra: test-mysql
Requires-Dist: pytest>=7.0.0; extra == "test-mysql"
Requires-Dist: mysql-connector-python>=8.0.0; extra == "test-mysql"
Dynamic: license-file

# Thoth Database Manager

A Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.

## Features

- **Multi-Database Support**: PostgreSQL, MySQL, MariaDB, SQLite, SQL Server, Oracle, Informix, and Supabase
- **Plugin Architecture**: Extensible design for adding new database types
- **LSH Search**: Locality-Sensitive Hashing for finding similar values across database columns
- **Type Safety**: Pydantic-based document models for structured data
- **Backward Compatibility**: Maintains compatibility with existing code

## Installation

```bash
pip install thoth-dbmanager
```

## Quick Start

### Basic Usage

```python
from thoth_dbmanager import ThothDbManager

# Create a database manager instance
manager = ThothDbManager.get_instance(
    db_type="postgresql",
    db_root_path="./data",
    db_mode="dev",
    host="localhost",
    port=5432,
    database="mydb",
    user="username",
    password="password"
)

# Get database information
tables = manager.get_tables()
columns = manager.get_columns("my_table")
foreign_keys = manager.get_foreign_keys()

# Execute SQL queries
results = manager.execute_sql("SELECT * FROM my_table LIMIT 10")

# Get example data for understanding table contents
example_data = manager.get_example_data("my_table", number_of_rows=20)
```

### LSH Similarity Search

```python
# Query for similar values across all database columns
similar_values = manager.query_lsh(
    keyword="john doe",
    top_n=10,
    signature_size=30,
    n_gram=3
)

# Results are organized by table and column
for table_name, columns in similar_values.items():
    for column_name, values in columns.items():
        print(f"Similar values in {table_name}.{column_name}: {values}")
```

## Supported Databases

| Database | Connection Parameters |
|----------|----------------------|
| **PostgreSQL** | `host`, `port`, `database`, `user`, `password` |
| **MySQL** | `host`, `port`, `database`, `user`, `password` |
| **MariaDB** | `host`, `port`, `database`, `user`, `password` |
| **SQLite** | `database_path` |
| **Supabase** | `host`, `port`, `database`, `user`, `password`, `project_url`, `api_key` |
| **SQL Server** | `server`, `database`, `user`, `password` |
| **Oracle** | `host`, `port`, `service_name`, `user`, `password` |
| **Informix** | `server`, `database`, `host`, `user`, `password` |

## API Reference

### ThothDbManager

The main class for database operations.

#### Class Methods

##### `get_instance(db_type: str, **kwargs) -> ThothDbManager`

Creates or retrieves a singleton instance of the database manager.

**Parameters:**
- `db_type` (str): Database type identifier
- `db_root_path` (str): Path to store database-related files
- `db_mode` (str): Operating mode ("dev", "prod", etc.)
- `**kwargs`: Database-specific connection parameters

**Returns:** Database manager instance

#### Instance Methods

##### `execute_sql(sql: str, params: dict = None, fetch: str = "all", timeout: int = 60) -> Any`

Execute SQL queries against the database.

**Parameters:**
- `sql` (str): SQL query string
- `params` (dict, optional): Query parameters for prepared statements
- `fetch` (str): How to fetch results ("all", "one", or number)
- `timeout` (int): Query timeout in seconds

**Returns:** Query results
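
As a rough illustration of the `params` and `fetch` options, the sketch below uses Python's standard `sqlite3` module rather than thoth-dbmanager itself. The `run_query` helper, the `:name` placeholder style, and the way each `fetch` value maps to a cursor call are assumptions made for this example; the library's actual behavior may differ.

```python
import sqlite3
from typing import Any, Optional, Union


def run_query(
    conn: sqlite3.Connection,
    sql: str,
    params: Optional[dict] = None,
    fetch: Union[str, int] = "all",
) -> Any:
    """Hypothetical helper: run a parameterized query and fetch rows."""
    # Values travel separately from the SQL text, so they are never
    # interpolated into the statement.
    cursor = conn.execute(sql, params or {})
    if fetch == "all":
        return cursor.fetchall()
    if fetch == "one":
        return cursor.fetchone()
    if isinstance(fetch, int):
        return cursor.fetchmany(fetch)
    raise ValueError(f"Unsupported fetch mode: {fetch!r}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (:name)",
    [{"name": "alice"}, {"name": "bob"}, {"name": "carol"}],
)

all_rows = run_query(conn, "SELECT name FROM users ORDER BY id")
one_row = run_query(
    conn, "SELECT name FROM users WHERE name = :name", {"name": "bob"}, fetch="one"
)
first_two = run_query(conn, "SELECT name FROM users ORDER BY id", fetch=2)
```

Passing values through `params` instead of formatting them into the SQL string is what protects the query from injection, whichever driver sits underneath.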

##### `get_tables() -> List[Dict[str, str]]`

Get list of tables in the database.

**Returns:** List of dictionaries with `name` and `comment` keys

##### `get_columns(table_name: str) -> List[Dict[str, Any]]`

Get column information for a specific table.

**Parameters:**
- `table_name` (str): Name of the table

**Returns:** List of dictionaries with column metadata (`name`, `data_type`, `comment`, `is_pk`)

##### `get_foreign_keys() -> List[Dict[str, str]]`

Get foreign key relationships in the database.

**Returns:** List of dictionaries with foreign key information

##### `get_example_data(table_name: str, number_of_rows: int = 30) -> Dict[str, List[Any]]`

Get the most frequent values for each column in a table.

**Parameters:**
- `table_name` (str): Name of the table
- `number_of_rows` (int): Maximum number of example values per column

**Returns:** Dictionary mapping column names to lists of example values
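
One way to picture "most frequent values for each column" is a `GROUP BY` per column, ordered by count. The sketch below does this with the standard `sqlite3` module; the `most_frequent_values` helper and the `pets` table are hypothetical, and nothing here is taken from the library's own implementation.

```python
import sqlite3
from typing import Any, Dict, List


def most_frequent_values(
    conn: sqlite3.Connection, table_name: str, number_of_rows: int = 30
) -> Dict[str, List[Any]]:
    """Hypothetical helper: top values per column, most frequent first."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # Identifiers are interpolated, so only pass trusted table names.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table_name}")')]
    result: Dict[str, List[Any]] = {}
    for column in columns:
        query = (
            f'SELECT "{column}" FROM "{table_name}" '
            f'WHERE "{column}" IS NOT NULL '
            f'GROUP BY "{column}" '
            f'ORDER BY COUNT(*) DESC, "{column}" ASC LIMIT ?'
        )
        result[column] = [row[0] for row in conn.execute(query, (number_of_rows,))]
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (species TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?)",
    [("cat", "black"), ("cat", "white"), ("dog", "black"), ("cat", "black")],
)

examples = most_frequent_values(conn, "pets", number_of_rows=2)
top_only = most_frequent_values(conn, "pets", number_of_rows=1)
```

The result has the same shape as the documented return value: a dictionary mapping each column name to a list of example values.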

##### `query_lsh(keyword: str, signature_size: int = 30, n_gram: int = 3, top_n: int = 10) -> Dict[str, Dict[str, List[str]]]`

Search for similar values using LSH (Locality-Sensitive Hashing).

**Parameters:**
- `keyword` (str): Search term
- `signature_size` (int): MinHash signature size
- `n_gram` (int): N-gram size for text processing
- `top_n` (int): Number of similar values to return

**Returns:** Nested dictionary: `{table_name: {column_name: [similar_values]}}`

### Factory Pattern (Alternative API)

For more advanced usage, you can use the factory pattern:

```python
from thoth_dbmanager import ThothDbFactory

# Create manager using factory
manager = ThothDbFactory.create_manager(
    db_type="postgresql",
    db_root_path="./data",
    db_mode="dev",
    host="localhost",
    port=5432,
    database="mydb",
    user="username",
    password="password"
)

# List available database types
available_dbs = ThothDbFactory.list_available_databases()

# Get required parameters for a database type
params = ThothDbFactory.get_required_parameters("postgresql")
```

## Document-Based API (Advanced)

The library also provides a document-based API for structured data access:

```python
# Get tables as structured documents
if hasattr(manager, 'get_tables_as_documents'):
    table_docs = manager.get_tables_as_documents()
    for doc in table_docs:
        print(f"Table: {doc.table_name}")
        print(f"Schema: {doc.schema_name}")
        print(f"Comment: {doc.comment}")

# Get columns as structured documents
if hasattr(manager, 'get_columns_as_documents'):
    column_docs = manager.get_columns_as_documents("my_table")
    for doc in column_docs:
        print(f"Column: {doc.column_name} ({doc.data_type})")
        print(f"Nullable: {doc.is_nullable}")
        print(f"Primary Key: {doc.is_pk}")
```

## Configuration Examples

### PostgreSQL
```python
manager = ThothDbManager.get_instance(
    db_type="postgresql",
    db_root_path="./data",
    db_mode="production",
    host="localhost",
    port=5432,
    database="myapp",
    user="dbuser",
    password="dbpass"
)
```

### SQLite
```python
manager = ThothDbManager.get_instance(
    db_type="sqlite",
    db_root_path="./data",
    db_mode="dev",
    database_path="./data/myapp.db"
)
```

### MySQL/MariaDB
```python
manager = ThothDbManager.get_instance(
    db_type="mysql",  # or "mariadb"
    db_root_path="./data",
    db_mode="production",
    host="localhost",
    port=3306,
    database="myapp",
    user="dbuser",
    password="dbpass"
)
```

## Error Handling

The library provides clear error messages for common issues:

```python
try:
    manager = ThothDbManager.get_instance(
        db_type="postgresql",
        db_root_path="./data",
        # Missing required parameters
    )
except ValueError as e:
    print(f"Configuration error: {e}")

try:
    results = manager.execute_sql("SELECT * FROM nonexistent_table")
except Exception as e:
    print(f"Query error: {e}")
```
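
To make the "missing required parameters" case concrete, here is a minimal sketch of how such a check could raise `ValueError`. The parameter lists are copied from the Supported Databases table; the `REQUIRED_PARAMS` mapping, the `validate_connection_params` function, and the wording of the messages are assumptions for illustration, not the library's code.

```python
from typing import Any, Dict, List

# Required connection parameters per database type, copied from the
# "Supported Databases" table (subset shown).
REQUIRED_PARAMS: Dict[str, List[str]] = {
    "postgresql": ["host", "port", "database", "user", "password"],
    "sqlite": ["database_path"],
    "oracle": ["host", "port", "service_name", "user", "password"],
}


def validate_connection_params(db_type: str, **kwargs: Any) -> None:
    """Hypothetical check: raise ValueError naming any missing parameters."""
    if db_type not in REQUIRED_PARAMS:
        raise ValueError(f"Unsupported database type: {db_type!r}")
    missing = [name for name in REQUIRED_PARAMS[db_type] if name not in kwargs]
    if missing:
        raise ValueError(
            f"Missing required parameters for {db_type}: {', '.join(missing)}"
        )


# A complete SQLite configuration passes silently.
validate_connection_params("sqlite", database_path="./data/myapp.db")

# An incomplete PostgreSQL configuration reports every missing name at once.
try:
    validate_connection_params("postgresql", host="localhost", port=5432)
except ValueError as exc:
    error_message = str(exc)
```

Collecting all missing names before raising lets a caller fix the configuration in one pass instead of discovering the gaps one at a time.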

## LSH Search Details

The LSH (Locality-Sensitive Hashing) feature allows you to find similar text values across your database:

1. **Automatic Setup**: LSH indexes are created automatically from your database content
2. **Cross-Column Search**: Search across all text columns in all tables
3. **Fuzzy Matching**: Find similar values even with typos or variations
4. **Configurable**: Adjust similarity sensitivity with parameters
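
The roles of `signature_size` and `n_gram` are easier to see in a small MinHash sketch. The package metadata lists `datasketch` as a dependency, but the code below is plain standard-library Python written only for illustration: the `char_ngrams`, `minhash_signature`, and `estimated_similarity` helpers and the choice of BLAKE2 as the hash are assumptions, not the library's implementation.

```python
import hashlib
from typing import List, Set


def char_ngrams(text: str, n_gram: int = 3) -> Set[str]:
    """Split lowercased text into overlapping character n-grams."""
    text = text.lower()
    if len(text) <= n_gram:
        return {text}
    return {text[i : i + n_gram] for i in range(len(text) - n_gram + 1)}


def _hash64(gram: str, seed: int) -> int:
    """64-bit hash of an n-gram; the seed selects one hash function."""
    digest = hashlib.blake2b(
        gram.encode("utf-8"), digest_size=8, salt=seed.to_bytes(4, "big")
    ).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(
    text: str, signature_size: int = 30, n_gram: int = 3
) -> List[int]:
    """Keep, for each seeded hash function, the minimum over all n-grams."""
    grams = char_ngrams(text, n_gram)
    return [min(_hash64(g, seed) for g in grams) for seed in range(signature_size)]


def estimated_similarity(a: List[int], b: List[int]) -> float:
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


sig_query = minhash_signature("john doe")
sig_typo = minhash_signature("jon doe")
sig_other = minhash_signature("quarterly revenue")

print(estimated_similarity(sig_query, sig_typo))
print(estimated_similarity(sig_query, sig_other))
```

A larger `signature_size` gives a less noisy similarity estimate at the cost of more hashing, while a smaller `n_gram` makes short strings and small typos overlap more.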

### LSH Use Cases

- **Data Deduplication**: Find duplicate or near-duplicate records
- **Data Quality**: Identify inconsistent data entry
- **Search Enhancement**: Provide "did you mean?" functionality
- **Data Exploration**: Discover related content across tables

## Architecture

The library uses a modern plugin architecture:

- **Plugins**: Database-specific implementations
- **Adapters**: Low-level database operations
- **Factory**: Plugin instantiation and management
- **Documents**: Type-safe data models using Pydantic
- **Registry**: Plugin discovery and registration

This design makes it easy to:
- Add support for new database types
- Maintain consistent APIs across databases
- Extend functionality without breaking existing code

## Contributing

To add support for a new database type:

1. Create an adapter class implementing `DbAdapter`
2. Create a plugin class implementing `DbPlugin`
3. Register the plugin with `@register_plugin("db_type")`
4. Add connection parameter validation
5. Implement required abstract methods

## License

This project is licensed under the MIT License.

## Support

For issues, questions, or contributions, please visit the project repository.