thoth-dbmanager 0.4.0__tar.gz

@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Marco Pancotti
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,3 @@
+ prune data
+ include README.md
+ include LICENSE
@@ -0,0 +1,384 @@
+ Metadata-Version: 2.4
+ Name: thoth-dbmanager
+ Version: 0.4.0
+ Summary: A Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.
+ Author-email: Marco Pancotti <mp@tylconsulting.it>
+ License: MIT
+ Project-URL: Homepage, https://github.com/mptyl/thoth-dbmanager
+ Project-URL: Bug Tracker, https://github.com/mptyl/thoth-dbmanager/issues
+ Project-URL: Documentation, https://github.com/mptyl/thoth-dbmanager#readme
+ Project-URL: Source Code, https://github.com/mptyl/thoth-dbmanager
+ Keywords: database,sql,lsh,similarity-search,orm
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Intended Audience :: Developers
+ Classifier: Topic :: Database
+ Classifier: Topic :: Scientific/Engineering :: Information Analysis
+ Classifier: Development Status :: 4 - Beta
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: datasketch>=1.5.0
+ Requires-Dist: tqdm>=4.60.0
+ Requires-Dist: SQLAlchemy>=1.4.0
+ Requires-Dist: pydantic>=2.0.0
+ Provides-Extra: postgresql
+ Requires-Dist: psycopg2-binary>=2.9.0; extra == "postgresql"
+ Provides-Extra: mysql
+ Requires-Dist: mysql-connector-python>=8.0.0; extra == "mysql"
+ Provides-Extra: mariadb
+ Requires-Dist: mariadb>=1.1.0; extra == "mariadb"
+ Provides-Extra: sqlserver
+ Requires-Dist: pyodbc>=4.0.0; extra == "sqlserver"
+ Provides-Extra: oracle
+ Requires-Dist: cx_Oracle>=8.3.0; extra == "oracle"
+ Provides-Extra: informix
+ Requires-Dist: informixdb>=2.2.0; extra == "informix"
+ Provides-Extra: supabase
+ Requires-Dist: supabase>=2.0.0; extra == "supabase"
+ Requires-Dist: postgrest-py>=0.16.0; extra == "supabase"
+ Requires-Dist: gotrue-py>=2.0.0; extra == "supabase"
+ Provides-Extra: sqlite
+ Provides-Extra: all
+ Requires-Dist: psycopg2-binary>=2.9.0; extra == "all"
+ Requires-Dist: mysql-connector-python>=8.0.0; extra == "all"
+ Requires-Dist: mariadb>=1.1.0; extra == "all"
+ Requires-Dist: pyodbc>=4.0.0; extra == "all"
+ Requires-Dist: cx_Oracle>=8.3.0; extra == "all"
+ Requires-Dist: informixdb>=2.2.0; extra == "all"
+ Requires-Dist: supabase>=2.0.0; extra == "all"
+ Requires-Dist: postgrest-py>=0.16.0; extra == "all"
+ Requires-Dist: gotrue-py>=2.0.0; extra == "all"
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
+ Requires-Dist: black>=22.0.0; extra == "dev"
+ Requires-Dist: flake8>=5.0.0; extra == "dev"
+ Requires-Dist: mypy>=1.0.0; extra == "dev"
+ Requires-Dist: pre-commit>=3.0.0; extra == "dev"
+ Provides-Extra: test-postgresql
+ Requires-Dist: pytest>=7.0.0; extra == "test-postgresql"
+ Requires-Dist: psycopg2-binary>=2.9.0; extra == "test-postgresql"
+ Provides-Extra: test-mysql
+ Requires-Dist: pytest>=7.0.0; extra == "test-mysql"
+ Requires-Dist: mysql-connector-python>=8.0.0; extra == "test-mysql"
+ Dynamic: license-file
+
+ # Thoth Database Manager
+
+ A Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.
+
+ ## Features
+
+ - **Multi-Database Support**: PostgreSQL, MySQL, MariaDB, SQLite, SQL Server, Oracle, Informix, and Supabase
+ - **Plugin Architecture**: Extensible design for adding new database types
+ - **LSH Search**: Locality-Sensitive Hashing for finding similar values across database columns
+ - **Type Safety**: Pydantic-based document models for structured data
+ - **Backward Compatibility**: Maintains compatibility with existing code
+
+ ## Installation
+
+ ```bash
+ pip install thoth-dbmanager
+ ```
+
+ ## Quick Start
+
+ ### Basic Usage
+
+ ```python
+ from thoth_dbmanager import ThothDbManager
+
+ # Create a database manager instance
+ manager = ThothDbManager.get_instance(
+     db_type="postgresql",
+     db_root_path="./data",
+     db_mode="dev",
+     host="localhost",
+     port=5432,
+     database="mydb",
+     user="username",
+     password="password"
+ )
+
+ # Get database information
+ tables = manager.get_tables()
+ columns = manager.get_columns("my_table")
+ foreign_keys = manager.get_foreign_keys()
+
+ # Execute SQL queries
+ results = manager.execute_sql("SELECT * FROM my_table LIMIT 10")
+
+ # Get example data for understanding table contents
+ example_data = manager.get_example_data("my_table", number_of_rows=20)
+ ```
+
+ ### LSH Similarity Search
+
+ ```python
+ # Query for similar values across all database columns
+ similar_values = manager.query_lsh(
+     keyword="john doe",
+     top_n=10,
+     signature_size=30,
+     n_gram=3
+ )
+
+ # Results are organized by table and column
+ for table_name, columns in similar_values.items():
+     for column_name, values in columns.items():
+         print(f"Similar values in {table_name}.{column_name}: {values}")
+ ```
+
+ ## Supported Databases
+
+ | Database | Connection Parameters |
+ |----------|----------------------|
+ | **PostgreSQL** | `host`, `port`, `database`, `user`, `password` |
+ | **MySQL** | `host`, `port`, `database`, `user`, `password` |
+ | **MariaDB** | `host`, `port`, `database`, `user`, `password` |
+ | **SQLite** | `database_path` |
+ | **Supabase** | `host`, `port`, `database`, `user`, `password`, `project_url`, `api_key` |
+ | **SQL Server** | `server`, `database`, `user`, `password` |
+ | **Oracle** | `host`, `port`, `service_name`, `user`, `password` |
+ | **Informix** | `server`, `database`, `host`, `user`, `password` |
+
+ ## API Reference
+
+ ### ThothDbManager
+
+ The main class for database operations.
+
+ #### Class Methods
+
+ ##### `get_instance(db_type: str, **kwargs) -> ThothDbManager`
+
+ Creates or retrieves a singleton instance of the database manager.
+
+ **Parameters:**
+ - `db_type` (str): Database type identifier
+ - `db_root_path` (str): Path to store database-related files
+ - `db_mode` (str): Operating mode ("dev", "prod", etc.)
+ - `**kwargs`: Database-specific connection parameters
+
+ **Returns:** Database manager instance
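
One common way to implement "create or retrieve" semantics is a class-level cache keyed by the constructor arguments. The following is a minimal sketch of that general pattern, not the library's implementation: `CachedManager` is a hypothetical stand-in class, and keying the cache on every keyword argument is an assumption about how instances might be distinguished.

```python
from typing import Any, Dict, Tuple


class CachedManager:
    """Hypothetical stand-in illustrating a parameter-keyed singleton."""

    _instances: Dict[Tuple[Any, ...], "CachedManager"] = {}

    def __init__(self, db_type: str, **kwargs: Any) -> None:
        self.db_type = db_type
        self.params = kwargs

    @classmethod
    def get_instance(cls, db_type: str, **kwargs: Any) -> "CachedManager":
        # Sort the keyword arguments so the key does not depend on call order.
        # Values must be hashable for this to work.
        key = (db_type, tuple(sorted(kwargs.items())))
        if key not in cls._instances:
            cls._instances[key] = cls(db_type, **kwargs)
        return cls._instances[key]


first = CachedManager.get_instance("sqlite", database_path="./data/a.db")
second = CachedManager.get_instance("sqlite", database_path="./data/a.db")
other = CachedManager.get_instance("sqlite", database_path="./data/b.db")
print(first is second, first is other)
```

With this design, repeated calls with the same arguments share one object (and therefore one connection), while different arguments produce separate instances.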
+
+ #### Instance Methods
+
+ ##### `execute_sql(sql: str, params: dict = None, fetch: str = "all", timeout: int = 60) -> Any`
+
+ Execute SQL queries against the database.
+
+ **Parameters:**
+ - `sql` (str): SQL query string
+ - `params` (dict, optional): Query parameters for prepared statements
+ - `fetch` (str): How to fetch results ("all", "one", or number)
+ - `timeout` (int): Query timeout in seconds
+
+ **Returns:** Query results
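
To make the `params` and `fetch` options concrete, here is a small sketch of how such a dispatch could look on top of Python's built-in `sqlite3` module. It is an illustration under assumptions, not the library's code: `run_query` is a hypothetical helper, the `timeout` argument is omitted, and treating a numeric `fetch` as "return at most that many rows" is an interpretation of the parameter description.

```python
import sqlite3
from typing import Any, Mapping, Optional, Union


def run_query(
    conn: sqlite3.Connection,
    sql: str,
    params: Optional[Mapping[str, Any]] = None,
    fetch: Union[str, int] = "all",
) -> Any:
    """Run a query and return rows according to the fetch mode."""
    cursor = conn.execute(sql, params) if params else conn.execute(sql)
    if fetch == "all":
        return cursor.fetchall()
    if fetch == "one":
        return cursor.fetchone()
    if isinstance(fetch, int) and not isinstance(fetch, bool) and fetch > 0:
        return cursor.fetchmany(fetch)
    raise ValueError(f"Unsupported fetch mode: {fetch!r}")


# Usage against an in-memory database.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
demo_conn.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)])

print(run_query(demo_conn, "SELECT name FROM items ORDER BY id"))
print(run_query(
    demo_conn,
    "SELECT name FROM items WHERE id > :min_id ORDER BY id",
    {"min_id": 1},
    fetch="one",
))
```

Passing values through `params` instead of formatting them into the SQL string lets the driver handle quoting and avoids SQL injection.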
+
+ ##### `get_tables() -> List[Dict[str, str]]`
+
+ Get list of tables in the database.
+
+ **Returns:** List of dictionaries with `name` and `comment` keys
+
+ ##### `get_columns(table_name: str) -> List[Dict[str, Any]]`
+
+ Get column information for a specific table.
+
+ **Parameters:**
+ - `table_name` (str): Name of the table
+
+ **Returns:** List of dictionaries with column metadata (`name`, `data_type`, `comment`, `is_pk`)
+
+ ##### `get_foreign_keys() -> List[Dict[str, str]]`
+
+ Get foreign key relationships in the database.
+
+ **Returns:** List of dictionaries with foreign key information
+
+ ##### `get_example_data(table_name: str, number_of_rows: int = 30) -> Dict[str, List[Any]]`
+
+ Get the most frequent values for each column in a table.
+
+ **Parameters:**
+ - `table_name` (str): Name of the table
+ - `number_of_rows` (int): Maximum number of example values per column
+
+ **Returns:** Dictionary mapping column names to lists of example values
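
"Most frequent values per column" can be computed with a `GROUP BY` query for each column. The sketch below shows the idea using the standard-library `sqlite3` module; it is not the library's implementation. `most_frequent_values` is a hypothetical helper, and skipping `NULL` values and breaking ties by value are assumptions made for the example.

```python
import sqlite3
from typing import Any, Dict, List


def _quote(identifier: str) -> str:
    """Quote an SQL identifier; identifiers cannot be bound as parameters."""
    return '"' + identifier.replace('"', '""') + '"'


def most_frequent_values(
    conn: sqlite3.Connection, table_name: str, number_of_rows: int = 30
) -> Dict[str, List[Any]]:
    """Map each column name to its most frequent non-NULL values."""
    table = _quote(table_name)
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    result: Dict[str, List[Any]] = {}
    for column in columns:
        col = _quote(column)
        rows = conn.execute(
            f"SELECT {col}, COUNT(*) AS freq FROM {table} "
            f"WHERE {col} IS NOT NULL "
            f"GROUP BY {col} ORDER BY freq DESC, {col} LIMIT ?",
            (number_of_rows,),
        ).fetchall()
        result[column] = [value for value, _ in rows]
    return result


# Usage with a tiny in-memory table.
people_conn = sqlite3.connect(":memory:")
people_conn.execute("CREATE TABLE people (city TEXT, role TEXT)")
people_conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Rome", "dev"), ("Rome", "ops"), ("Milan", "dev"), ("Rome", "dev")],
)
print(most_frequent_values(people_conn, "people", number_of_rows=2))
```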
+
+ ##### `query_lsh(keyword: str, signature_size: int = 30, n_gram: int = 3, top_n: int = 10) -> Dict[str, Dict[str, List[str]]]`
+
+ Search for similar values using LSH (Locality-Sensitive Hashing).
+
+ **Parameters:**
+ - `keyword` (str): Search term
+ - `signature_size` (int): MinHash signature size
+ - `n_gram` (int): N-gram size for text processing
+ - `top_n` (int): Number of similar values to return
+
+ **Returns:** Nested dictionary: `{table_name: {column_name: [similar_values]}}`
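
Because the result is nested by table and then by column, it is often convenient to flatten it into `(table, column, value)` rows for display or further filtering. The helper below is a hypothetical convenience function written for this example, and the sample dictionary is made-up data in the documented shape, not real output.

```python
from typing import Dict, List, Tuple

LshResults = Dict[str, Dict[str, List[str]]]


def flatten_lsh_results(results: LshResults) -> List[Tuple[str, str, str]]:
    """Turn {table: {column: [values]}} into a list of (table, column, value)."""
    return [
        (table_name, column_name, value)
        for table_name, columns in results.items()
        for column_name, values in columns.items()
        for value in values
    ]


# Made-up sample in the documented return shape.
sample: LshResults = {
    "customers": {"full_name": ["John Doe", "Jon Doe"]},
    "orders": {"contact": ["J. Doe"]},
}
for row in flatten_lsh_results(sample):
    print(row)
```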
+
+ ### Factory Pattern (Alternative API)
+
+ For more advanced usage, you can use the factory pattern:
+
+ ```python
+ from thoth_dbmanager import ThothDbFactory
+
+ # Create manager using factory
+ manager = ThothDbFactory.create_manager(
+     db_type="postgresql",
+     db_root_path="./data",
+     db_mode="dev",
+     host="localhost",
+     port=5432,
+     database="mydb",
+     user="username",
+     password="password"
+ )
+
+ # List available database types
+ available_dbs = ThothDbFactory.list_available_databases()
+
+ # Get required parameters for a database type
+ params = ThothDbFactory.get_required_parameters("postgresql")
+ ```
+
+ ## Document-Based API (Advanced)
+
+ The library also provides a document-based API for structured data access:
+
+ ```python
+ # Get tables as structured documents
+ if hasattr(manager, 'get_tables_as_documents'):
+     table_docs = manager.get_tables_as_documents()
+     for doc in table_docs:
+         print(f"Table: {doc.table_name}")
+         print(f"Schema: {doc.schema_name}")
+         print(f"Comment: {doc.comment}")
+
+ # Get columns as structured documents
+ if hasattr(manager, 'get_columns_as_documents'):
+     column_docs = manager.get_columns_as_documents("my_table")
+     for doc in column_docs:
+         print(f"Column: {doc.column_name} ({doc.data_type})")
+         print(f"Nullable: {doc.is_nullable}")
+         print(f"Primary Key: {doc.is_pk}")
+ ```
+
+ ## Configuration Examples
+
+ ### PostgreSQL
+ ```python
+ manager = ThothDbManager.get_instance(
+     db_type="postgresql",
+     db_root_path="./data",
+     db_mode="production",
+     host="localhost",
+     port=5432,
+     database="myapp",
+     user="dbuser",
+     password="dbpass"
+ )
+ ```
+
+ ### SQLite
+ ```python
+ manager = ThothDbManager.get_instance(
+     db_type="sqlite",
+     db_root_path="./data",
+     db_mode="dev",
+     database_path="./data/myapp.db"
+ )
+ ```
+
+ ### MySQL/MariaDB
+ ```python
+ manager = ThothDbManager.get_instance(
+     db_type="mysql",  # or "mariadb"
+     db_root_path="./data",
+     db_mode="production",
+     host="localhost",
+     port=3306,
+     database="myapp",
+     user="dbuser",
+     password="dbpass"
+ )
+ ```
+
+ ## Error Handling
+
+ The library provides clear error messages for common issues:
+
+ ```python
+ try:
+     manager = ThothDbManager.get_instance(
+         db_type="postgresql",
+         db_root_path="./data",
+         # Missing required parameters
+     )
+ except ValueError as e:
+     print(f"Configuration error: {e}")
+
+ try:
+     results = manager.execute_sql("SELECT * FROM nonexistent_table")
+ except Exception as e:
+     print(f"Query error: {e}")
+ ```
+
+ ## LSH Search Details
+
+ The LSH (Locality-Sensitive Hashing) feature allows you to find similar text values across your database:
+
+ 1. **Automatic Setup**: LSH indexes are created automatically from your database content
+ 2. **Cross-Column Search**: Search across all text columns in all tables
+ 3. **Fuzzy Matching**: Find similar values even with typos or variations
+ 4. **Configurable**: Adjust similarity sensitivity with parameters
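
To give a feel for what `signature_size` and `n_gram` control, the sketch below computes MinHash signatures over character n-grams using only the standard library and estimates Jaccard similarity from them. It is a teaching example, not the library's implementation (which depends on `datasketch`); the function names, lowercasing, and the salted BLAKE2b hashing scheme are all assumptions chosen for the illustration.

```python
import hashlib
from typing import List, Set


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of lowercase character n-grams of `text`."""
    text = text.lower()
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def _hash(gram: str, seed: int) -> int:
    """64-bit hash of `gram`; each seed acts as a different hash function."""
    digest = hashlib.blake2b(
        gram.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(text: str, signature_size: int = 30, n_gram: int = 3) -> List[int]:
    """For each of `signature_size` hash functions, keep the minimum hash."""
    grams = char_ngrams(text, n_gram)
    if not grams:
        raise ValueError("Cannot build a signature from empty text")
    return [min(_hash(g, seed) for g in grams) for seed in range(signature_size)]


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching positions estimates the Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("Signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


query = minhash_signature("john doe")
for candidate in ["john doe", "jon doe", "xyzzy plugh"]:
    score = estimated_jaccard(query, minhash_signature(candidate))
    print(f"{candidate!r}: {score:.2f}")
```

A larger `signature_size` gives a less noisy similarity estimate at the cost of more hashing, while a smaller `n_gram` makes short strings and small typos easier to match but also increases accidental overlap.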
+
+ ### LSH Use Cases
+
+ - **Data Deduplication**: Find duplicate or near-duplicate records
+ - **Data Quality**: Identify inconsistent data entry
+ - **Search Enhancement**: Provide "did you mean?" functionality
+ - **Data Exploration**: Discover related content across tables
+
+ ## Architecture
+
+ The library uses a modern plugin architecture:
+
+ - **Plugins**: Database-specific implementations
+ - **Adapters**: Low-level database operations
+ - **Factory**: Plugin instantiation and management
+ - **Documents**: Type-safe data models using Pydantic
+ - **Registry**: Plugin discovery and registration
+
+ This design makes it easy to:
+ - Add support for new database types
+ - Maintain consistent APIs across databases
+ - Extend functionality without breaking existing code
+
+ ## Contributing
+
+ To add support for a new database type:
+
+ 1. Create an adapter class implementing `DbAdapter`
+ 2. Create a plugin class implementing `DbPlugin`
+ 3. Register the plugin with `@register_plugin("db_type")`
+ 4. Add connection parameter validation
+ 5. Implement required abstract methods
+
+ ## License
+
+ This project is licensed under the MIT License.
+
+ ## Support
+
+ For issues, questions, or contributions, please visit the project repository.