sql-testing-library 0.14.0__tar.gz → 0.15.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (23)
  1. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/CHANGELOG.md +13 -0
  2. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/PKG-INFO +117 -35
  3. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/README.md +113 -32
  4. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/pyproject.toml +10 -3
  5. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/__init__.py +3 -1
  6. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/_adapters/bigquery.py +17 -3
  7. sql_testing_library-0.15.0/src/sql_testing_library/_adapters/duckdb.py +474 -0
  8. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/_adapters/snowflake.py +11 -1
  9. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/_core.py +11 -4
  10. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/_pytest_plugin.py +6 -0
  11. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/_types.py +12 -1
  12. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/LICENSE +0 -0
  13. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/_adapters/__init__.py +0 -0
  14. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/_adapters/athena.py +0 -0
  15. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/_adapters/base.py +0 -0
  16. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/_adapters/presto.py +0 -0
  17. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/_adapters/redshift.py +0 -0
  18. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/_adapters/trino.py +0 -0
  19. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/_exceptions.py +0 -0
  20. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/_mock_table.py +0 -0
  21. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/_sql_logger.py +0 -0
  22. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/_sql_utils.py +0 -0
  23. {sql_testing_library-0.14.0 → sql_testing_library-0.15.0}/src/sql_testing_library/py.typed +0 -0
@@ -5,6 +5,19 @@ All notable changes to this project will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## 0.15.0 (2025-07-27)
9
+
10
+ ### Feat
11
+
12
+ - implement duckdb integration (#117)
13
+ - integrate mocksmith for test data generation and simplify relea… (#112)
14
+ - integrate mocksmith for test data generation and simplify release workflow
15
+
16
+ ### Fix
17
+
18
+ - added explicit dependency of faker
19
+ - upgrade mocksmith library version
20
+
8
21
  ## 0.14.0 (2025-06-30)
9
22
 
10
23
  ### Feat
@@ -1,9 +1,9 @@
1
1
  Metadata-Version: 2.3
2
2
  Name: sql-testing-library
3
- Version: 0.14.0
4
- Summary: A powerful Python framework for unit testing SQL queries across BigQuery, Snowflake, Redshift, Athena, and Trino with mock data
3
+ Version: 0.15.0
4
+ Summary: A powerful Python framework for unit testing SQL queries across BigQuery, Snowflake, Redshift, Athena, Trino, and DuckDB with mock data
5
5
  License: MIT
6
- Keywords: sql,testing,unit-testing,mock-data,database-testing,bigquery,snowflake,redshift,athena,trino,data-engineering,etl-testing,sql-validation,query-testing
6
+ Keywords: sql,testing,unit-testing,mock-data,database-testing,bigquery,snowflake,redshift,athena,trino,duckdb,data-engineering,etl-testing,sql-validation,query-testing
7
7
  Author: Gurmeet Saran
8
8
  Author-email: gurmeetx@gmail.com
9
9
  Maintainer: Gurmeet Saran
@@ -35,6 +35,7 @@ Classifier: Typing :: Typed
35
35
  Provides-Extra: all
36
36
  Provides-Extra: athena
37
37
  Provides-Extra: bigquery
38
+ Provides-Extra: duckdb
38
39
  Provides-Extra: redshift
39
40
  Provides-Extra: snowflake
40
41
  Provides-Extra: trino
@@ -57,7 +58,7 @@ Description-Content-Type: text/markdown
57
58
 
58
59
  # SQL Testing Library
59
60
 
60
- A powerful Python framework for unit testing SQL queries with mock data injection across BigQuery, Snowflake, Redshift, Athena, and Trino.
61
+ A powerful Python framework for unit testing SQL queries with mock data injection across BigQuery, Snowflake, Redshift, Athena, Trino, and DuckDB.
61
62
 
62
63
  [![Unit Tests](https://github.com/gurmeetsaran/sqltesting/actions/workflows/tests.yaml/badge.svg)](https://github.com/gurmeetsaran/sqltesting/actions/workflows/tests.yaml)
63
64
  [![Athena Integration](https://github.com/gurmeetsaran/sqltesting/actions/workflows/athena-integration.yml/badge.svg)](https://github.com/gurmeetsaran/sqltesting/actions/workflows/athena-integration.yml)
@@ -104,7 +105,7 @@ For more details on our journey and the engineering challenges we solved, read t
104
105
 
105
106
  ## Features
106
107
 
107
- - **Multi-Database Support**: Test SQL across BigQuery, Athena, Redshift, Trino, and Snowflake
108
+ - **Multi-Database Support**: Test SQL across BigQuery, Athena, Redshift, Trino, Snowflake, and DuckDB
108
109
  - **Mock Data Injection**: Use Python dataclasses for type-safe test data
109
110
  - **CTE or Physical Tables**: Automatic fallback for query size limits
110
111
  - **Type-Safe Results**: Deserialize results to Pydantic models
@@ -117,28 +118,28 @@ The library supports different data types across database engines. All checkmark
117
118
 
118
119
  ### Primitive Types
119
120
 
120
- | Data Type | Python Type | BigQuery | Athena | Redshift | Trino | Snowflake |
121
- |-----------|-------------|----------|--------|----------|-------|-----------|
122
- | **String** | `str` | ✅ | ✅ | ✅ | ✅ | ✅ |
123
- | **Integer** | `int` | ✅ | ✅ | ✅ | ✅ | ✅ |
124
- | **Float** | `float` | ✅ | ✅ | ✅ | ✅ | ✅ |
125
- | **Boolean** | `bool` | ✅ | ✅ | ✅ | ✅ | ✅ |
126
- | **Date** | `date` | ✅ | ✅ | ✅ | ✅ | ✅ |
127
- | **Datetime** | `datetime` | ✅ | ✅ | ✅ | ✅ | ✅ |
128
- | **Decimal** | `Decimal` | ✅ | ✅ | ✅ | ✅ | ✅ |
129
- | **Optional** | `Optional[T]` | ✅ | ✅ | ✅ | ✅ | ✅ |
121
+ | Data Type | Python Type | BigQuery | Athena | Redshift | Trino | Snowflake | DuckDB |
122
+ |-----------|-------------|----------|--------|----------|-------|-----------|--------|
123
+ | **String** | `str` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
124
+ | **Integer** | `int` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
125
+ | **Float** | `float` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
126
+ | **Boolean** | `bool` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
127
+ | **Date** | `date` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
128
+ | **Datetime** | `datetime` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
129
+ | **Decimal** | `Decimal` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
130
+ | **Optional** | `Optional[T]` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
130
131
 
131
132
  ### Complex Types
132
133
 
133
- | Data Type | Python Type | BigQuery | Athena | Redshift | Trino | Snowflake |
134
- |-----------|-------------|----------|--------|----------|-------|-----------|
135
- | **String Array** | `List[str]` | ✅ | ✅ | ✅ | ✅ | ✅ |
136
- | **Integer Array** | `List[int]` | ✅ | ✅ | ✅ | ✅ | ✅ |
137
- | **Decimal Array** | `List[Decimal]` | ✅ | ✅ | ✅ | ✅ | ✅ |
138
- | **Optional Array** | `Optional[List[T]]` | ✅ | ✅ | ✅ | ✅ | ✅ |
139
- | **Map/Dict** | `Dict[K, V]` | ✅ | ✅ | ✅ | ✅ | ✅ |
140
- | **Struct/Record** | `dataclass` | ✅ | ✅ | ❌ | ✅ | ❌ |
141
- | **Nested Arrays** | `List[List[T]]` | ❌ | ❌ | ❌ | ❌ | ❌ |
134
+ | Data Type | Python Type | BigQuery | Athena | Redshift | Trino | Snowflake | DuckDB |
135
+ |-----------|-------------|----------|--------|----------|-------|-----------|--------|
136
+ | **String Array** | `List[str]` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
137
+ | **Integer Array** | `List[int]` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
138
+ | **Decimal Array** | `List[Decimal]` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
139
+ | **Optional Array** | `Optional[List[T]]` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
140
+ | **Map/Dict** | `Dict[K, V]` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
141
+ | **Struct/Record** | `dataclass` | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ |
142
+ | **Nested Arrays** | `List[List[T]]` | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
142
143
 
143
144
  ### Database-Specific Notes
144
145
 
@@ -147,15 +148,16 @@ The library supports different data types across database engines. All checkmark
147
148
  - **Redshift**: Arrays and maps implemented via SUPER type (JSON parsing); 16MB query size limit; struct types not yet supported
148
149
  - **Trino**: Memory catalog for testing; excellent decimal precision; supports arrays, maps, and struct types using `ROW` with named fields (dataclasses and Pydantic models)
149
150
  - **Snowflake**: Column names normalized to lowercase; 1MB query size limit; dict/map types implemented via VARIANT type (JSON parsing); struct types not yet supported
151
+ - **DuckDB**: Fast embedded analytics database; excellent SQL standards compliance; supports arrays, maps, and struct types using `STRUCT` syntax with named fields (dataclasses and Pydantic models)
150
152
 
151
153
  ## Execution Modes Support
152
154
 
153
155
  The library supports two execution modes for mock data injection. **CTE Mode is the default** and is automatically used unless Physical Tables mode is explicitly requested or required due to query size limits.
154
156
 
155
- | Execution Mode | Description | BigQuery | Athena | Redshift | Trino | Snowflake |
156
- |----------------|-------------|----------|--------|----------|-------|-----------|
157
- | **CTE Mode** | Mock data injected as Common Table Expressions | ✅ | ✅ | ✅ | ✅ | ✅ |
158
- | **Physical Tables** | Mock data created as temporary tables | ✅ | ✅ | ✅ | ✅ | ✅ |
157
+ | Execution Mode | Description | BigQuery | Athena | Redshift | Trino | Snowflake | DuckDB |
158
+ |----------------|-------------|----------|--------|----------|-------|-----------|--------|
159
+ | **CTE Mode** | Mock data injected as Common Table Expressions | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
160
+ | **Physical Tables** | Mock data created as temporary tables | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
159
161
 
160
162
  ### Execution Mode Details
161
163
 
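The CTE mode row above can be illustrated with a minimal sketch. This is not the library's implementation: it uses Python's built-in `sqlite3` as a stand-in engine, and the `User` dataclass and the `build_cte_query` and `run_with_mock_cte` helpers are hypothetical names introduced only for this example.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class User:
    # Hypothetical mock row type; not part of sql-testing-library.
    user_id: int
    name: str


def build_cte_query(table_name, rows, query):
    """Prefix `query` with a CTE named `table_name` that holds the mock rows."""
    columns = [f.name for f in fields(rows[0])]
    # One SELECT per mock row, combined with UNION ALL; values are bound as parameters.
    selects = [
        "SELECT " + ", ".join(f"? AS {col}" for col in columns)
        for _ in rows
    ]
    sql = f"WITH {table_name} AS ({' UNION ALL '.join(selects)}) {query}"
    params = [value for row in rows for value in astuple(row)]
    return sql, params


def run_with_mock_cte(table_name, rows, query):
    """Run `query` against mock rows without creating any table."""
    conn = sqlite3.connect(":memory:")
    try:
        sql, params = build_cte_query(table_name, rows, query)
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()


result = run_with_mock_cte(
    "users",
    [User(1, "Alice"), User(2, "Bob")],
    "SELECT user_id, name FROM users WHERE user_id = 1",
)
```

Because the mock rows travel inside the query text, nothing needs cleaning up afterwards, which is presumably why this mode is the default until the query grows too large.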
@@ -179,14 +181,16 @@ The library supports two execution modes for mock data injection. **CTE Mode is
179
181
  | **Redshift** | Temporary tables | Session-specific temp schema | Database automatic | Session end |
180
182
  | **Trino** | Memory tables | `memory.default` schema | Library executes `DROP TABLE` | After each test |
181
183
  | **Snowflake** | Temporary tables | Session-specific temp schema | Database automatic | Session end |
184
+ | **DuckDB** | Temporary tables | Database-specific temp schema | Library executes `DROP TABLE` | After each test |
182
185
 
183
186
  #### **Cleanup Behavior Explained**
184
187
 
185
- **Library-Managed Cleanup (BigQuery, Athena, Trino):**
188
+ **Library-Managed Cleanup (BigQuery, Athena, Trino, DuckDB):**
186
189
  - The SQL Testing Library explicitly calls cleanup methods after each test
187
190
  - **BigQuery**: Creates standard tables in your dataset, then deletes them via `client.delete_table()`
188
191
  - **Athena**: Creates external tables backed by S3 data, then drops table metadata via `DROP TABLE IF EXISTS` (⚠️ **S3 data files remain and require separate cleanup**)
189
192
  - **Trino**: Creates tables in memory catalog, then drops them via `DROP TABLE IF EXISTS`
193
+ - **DuckDB**: Creates temporary tables in the database, then drops them via `DROP TABLE IF EXISTS`
190
194
 
191
195
  **Database-Managed Cleanup (Redshift, Snowflake):**
192
196
  - These databases have built-in temporary table mechanisms
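The library-managed pattern described above (create a table, run the test query, then issue `DROP TABLE IF EXISTS`) can be sketched as follows. This is an illustration under stated assumptions, not the adapter's actual code: `sqlite3` stands in for DuckDB, and `run_with_temp_table` and the `temp_users` table are hypothetical.

```python
import sqlite3


def run_with_temp_table(conn, table_name, rows, query):
    """Create a temporary table, run the query, and always drop the table."""
    conn.execute(f"CREATE TEMP TABLE {table_name} (user_id INTEGER, name TEXT)")
    try:
        conn.executemany(f"INSERT INTO {table_name} VALUES (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        # Runs even if the query raises, so a failing test does not orphan
        # the table. A hard process crash would still skip this step.
        conn.execute(f"DROP TABLE IF EXISTS {table_name}")


conn = sqlite3.connect(":memory:")
rows_out = run_with_temp_table(
    conn,
    "temp_users",
    [(1, "Alice"), (2, "Bob")],
    "SELECT name FROM temp_users ORDER BY user_id",
)
# After the call, no temporary tables should remain on this connection.
remaining = conn.execute(
    "SELECT name FROM sqlite_temp_master WHERE type = 'table'"
).fetchall()
```

The `finally` clause covers ordinary test failures but not a killed process, which matches the README's caveat that library-managed engines may leave tables behind if the library crashes before cleanup.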
@@ -211,7 +215,7 @@ A: Trino's memory catalog doesn't automatically clean up tables when sessions en
211
215
  A: BigQuery tables created by the library are **standard tables without TTL** - they persist until explicitly deleted. The library immediately calls `client.delete_table()` after each test. If you want to set TTL as a safety net, you can configure it at the dataset level (e.g., 24 hours) to auto-delete any orphaned tables.
212
216
 
213
217
  **Q: Which databases leave artifacts if tests crash?**
214
- - **BigQuery, Athena, Trino**: May leave tables if library crashes before cleanup
218
+ - **BigQuery, Athena, Trino, DuckDB**: May leave tables if library crashes before cleanup
215
219
  - **Redshift, Snowflake**: No artifacts - temporary tables auto-cleanup on session end
216
220
 
217
221
  **Q: How to manually clean up orphaned tables?**
@@ -227,6 +231,10 @@ DROP TABLE temp_table_name;
227
231
  -- Trino: List and drop tables with temp prefix
228
232
  SHOW TABLES FROM memory.default LIKE 'temp_%';
229
233
  DROP TABLE memory.default.temp_table_name;
234
+
235
+ -- DuckDB: List and drop tables with temp prefix
236
+ SHOW TABLES;
237
+ DROP TABLE temp_table_name;
230
238
  ```
231
239
 
232
240
  **Q: How to handle S3 cleanup for Athena tables?**
@@ -277,6 +285,7 @@ aws s3api list-objects-v2 --bucket your-athena-results-bucket --prefix "temp_" \
277
285
  | **Redshift** | 16MB | Automatically switches at 16MB |
278
286
  | **Trino** | 16MB (estimated) | Large dataset or complex CTEs |
279
287
  | **Snowflake** | 1MB | Automatically switches at 1MB |
288
+ | **DuckDB** | 32MB (estimated) | Large dataset or complex CTEs |
280
289
 
281
290
  ### How to Control Execution Mode
282
291
 
@@ -399,6 +408,9 @@ pip install sql-testing-library[trino]
399
408
  # Install with Snowflake support
400
409
  pip install sql-testing-library[snowflake]
401
410
 
411
+ # Install with DuckDB support
412
+ pip install sql-testing-library[duckdb]
413
+
402
414
  # Or install with all database adapters
403
415
  pip install sql-testing-library[all]
404
416
  ```
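Since each adapter's driver is installed through an optional extra, a test suite may want to check for the driver before selecting an adapter. The sketch below uses only the standard library; `adapter_dependency_available` is a hypothetical helper, not an API of sql-testing-library.

```python
from importlib.util import find_spec


def adapter_dependency_available(module_name: str) -> bool:
    """Return True if the given top-level module can be imported."""
    return find_spec(module_name) is not None


# `duckdb` is the import name of the DuckDB Python package.
if not adapter_dependency_available("duckdb"):
    print("duckdb is not installed; try: pip install sql-testing-library[duckdb]")
```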
@@ -414,9 +426,10 @@ poetry install --with athena
414
426
  poetry install --with redshift
415
427
  poetry install --with trino
416
428
  poetry install --with snowflake
429
+ poetry install --with duckdb
417
430
 
418
431
  # Install with all database adapters and dev tools
419
- poetry install --with bigquery,athena,redshift,trino,snowflake,dev
432
+ poetry install --with bigquery,athena,redshift,trino,snowflake,duckdb,dev
420
433
  ```
421
434
 
422
435
  ## Quick Start
@@ -425,7 +438,7 @@ poetry install --with bigquery,athena,redshift,trino,snowflake,dev
425
438
 
426
439
  ```ini
427
440
  [sql_testing]
428
- adapter = bigquery # Use 'bigquery', 'athena', 'redshift', 'trino', or 'snowflake'
441
+ adapter = bigquery # Use 'bigquery', 'athena', 'redshift', 'trino', 'snowflake', or 'duckdb'
429
442
 
430
443
  # BigQuery configuration
431
444
  [sql_testing.bigquery]
@@ -483,6 +496,10 @@ credentials_path = <path to credentials json>
483
496
  #
484
497
  # # Option 2: Password authentication (for accounts without MFA)
485
498
  # password = <snowflake_password>
499
+
500
+ # DuckDB configuration
501
+ # [sql_testing.duckdb]
502
+ # database = <path/to/database.duckdb> # Optional: defaults to in-memory database
486
503
  ```
487
504
 
488
505
  ### Database Context Understanding
@@ -496,6 +513,7 @@ Each database adapter uses a different concept for organizing tables and queries
496
513
  | **Redshift** | `{database}` | database only | `"test_db"` | `SELECT * FROM test_db.orders` |
497
514
  | **Snowflake** | `{database}.{schema}` | database + schema | `"test_db.public"` | `SELECT * FROM test_db.public.products` |
498
515
  | **Trino** | `{catalog}.{schema}` | catalog + schema | `"memory.default"` | `SELECT * FROM memory.default.inventory` |
516
+ | **DuckDB** | `{database}` | database only | `"test_db"` | `SELECT * FROM test_db.analytics` |
499
517
 
500
518
  #### Key Points:
501
519
 
@@ -568,6 +586,14 @@ class ProductsMockTable(BaseMockTable):
568
586
 
569
587
  def get_table_name(self) -> str:
570
588
  return "products"
589
+
590
+ # DuckDB Mock Table
591
+ class AnalyticsMockTable(BaseMockTable):
592
+ def get_database_name(self) -> str:
593
+ return "test_db" # database only
594
+
595
+ def get_table_name(self) -> str:
596
+ return "analytics"
571
597
  ```
572
598
 
573
599
  2. **Write a test** using one of the flexible patterns:
@@ -693,7 +719,7 @@ class EmployeesMockTable(BaseMockTable):
693
719
 
694
720
  # Test with struct types
695
721
  @sql_test(
696
- adapter_type="athena", # or "trino" or "bigquery"
722
+ adapter_type="athena", # or "trino", "bigquery", or "duckdb"
697
723
  mock_tables=[
698
724
  EmployeesMockTable([
699
725
  Employee(
@@ -739,7 +765,7 @@ def test_struct_with_dot_notation():
739
765
 
740
766
  # You can also query entire structs
741
767
  @sql_test(
742
- adapter_type="trino", # or "athena" or "bigquery"
768
+ adapter_type="trino", # or "athena", "bigquery", or "duckdb"
743
769
  mock_tables=[EmployeesMockTable([...])],
744
770
  result_class=dict # Returns full struct as dict
745
771
  )
@@ -987,9 +1013,21 @@ def test_snowflake_query():
987
1013
  query="SELECT user_id, name FROM users WHERE user_id = 1",
988
1014
  default_namespace="test_db"
989
1015
  )
1016
+
1017
+ # Use DuckDB adapter for this test
1018
+ @sql_test(
1019
+ adapter_type="duckdb",
1020
+ mock_tables=[...],
1021
+ result_class=UserResult
1022
+ )
1023
+ def test_duckdb_query():
1024
+ return TestCase(
1025
+ query="SELECT user_id, name FROM users WHERE user_id = 1",
1026
+ default_namespace="test_db"
1027
+ )
990
1028
  ```
991
1029
 
992
- The adapter_type parameter will use the configuration from the corresponding section in pytest.ini, such as `[sql_testing.bigquery]`, `[sql_testing.athena]`, `[sql_testing.redshift]`, `[sql_testing.trino]`, or `[sql_testing.snowflake]`.
1030
+ The adapter_type parameter will use the configuration from the corresponding section in pytest.ini, such as `[sql_testing.bigquery]`, `[sql_testing.athena]`, `[sql_testing.redshift]`, `[sql_testing.trino]`, `[sql_testing.snowflake]`, or `[sql_testing.duckdb]`.
993
1031
 
994
1032
  **Default Adapter Behavior:**
995
1033
  - If `adapter_type` is not specified in the test, the library uses the adapter from `[sql_testing]` section's `adapter` setting
@@ -1032,6 +1070,14 @@ The adapter_type parameter will use the configuration from the corresponding sec
1032
1070
  - Supports authentication via username and password
1033
1071
  - Optional support for warehouse, role, and schema specification
1034
1072
 
1073
+ #### DuckDB Adapter
1074
+ - Supports DuckDB embedded analytical database
1075
+ - Uses CTAS (CREATE TABLE AS SELECT) for efficient temporary table creation
1076
+ - Fast local database with excellent SQL standards compliance
1077
+ - Supports both file-based and in-memory databases
1078
+ - No authentication required - perfect for local development and testing
1079
+ - Excellent performance for analytical workloads
1080
+
1035
1081
  **Default Behavior:**
1036
1082
  - If adapter_type is not specified in the TestCase or decorator, the library will use the adapter specified in the `[sql_testing]` section's `adapter` setting.
1037
1083
  - If no adapter is specified in the `[sql_testing]` section, it defaults to "bigquery".
@@ -1240,6 +1286,41 @@ The library automatically:
1240
1286
 
1241
1287
  For detailed usage and configuration options, see the example files included.
1242
1288
 
1289
+ ## Integration with Mocksmith
1290
+
1291
+ SQL Testing Library works seamlessly with [Mocksmith](https://github.com/gurmeetsaran/mocksmith) for automatic test data generation. Mocksmith can reduce your test setup code by ~70% while providing more realistic test data.
1292
+
1293
+ Install mocksmith with: `pip install mocksmith[mock,pydantic]`
1294
+
1295
+ ### Quick Example
1296
+
1297
+ ```python
1298
+ # Without Mocksmith - Manual data creation
1299
+ customers = []
1300
+ for i in range(100):
1301
+ customers.append(Customer(
1302
+ id=i + 1,
1303
+ name=f"Customer {i + 1}",
1304
+ email=f"customer{i + 1}@test.com",
1305
+ balance=Decimal(str(random.uniform(0, 10000)))
1306
+ ))
1307
+
1308
+ # With Mocksmith - Automatic realistic data
1309
+ from mocksmith import mockable, Varchar, Integer, Money
1310
+
1311
+ @mockable
1312
+ @dataclass
1313
+ class Customer:
1314
+ id: Integer()
1315
+ name: Varchar(100)
1316
+ email: Varchar(255)
1317
+ balance: Money()
1318
+
1319
+ customers = [Customer.mock() for _ in range(100)]
1320
+ ```
1321
+
1322
+ See the [Mocksmith Integration Guide](docs/mocksmith_integration.md) and [examples](examples/mocksmith_integration_example.py) for detailed usage patterns.
1323
+
1243
1324
  ## Known Limitations and TODOs
1244
1325
 
1245
1326
  The library has a few known limitations that are planned to be addressed in future updates:
@@ -1268,4 +1349,5 @@ The library has a few known limitations that are planned to be addressed in futu
1268
1349
  - psycopg2-binary for Redshift
1269
1350
  - trino for Trino
1270
1351
  - snowflake-connector-python for Snowflake
1352
+ - duckdb for DuckDB
1271
1353
 
@@ -1,6 +1,6 @@
1
1
  # SQL Testing Library
2
2
 
3
- A powerful Python framework for unit testing SQL queries with mock data injection across BigQuery, Snowflake, Redshift, Athena, and Trino.
3
+ A powerful Python framework for unit testing SQL queries with mock data injection across BigQuery, Snowflake, Redshift, Athena, Trino, and DuckDB.
4
4
 
5
5
  [![Unit Tests](https://github.com/gurmeetsaran/sqltesting/actions/workflows/tests.yaml/badge.svg)](https://github.com/gurmeetsaran/sqltesting/actions/workflows/tests.yaml)
6
6
  [![Athena Integration](https://github.com/gurmeetsaran/sqltesting/actions/workflows/athena-integration.yml/badge.svg)](https://github.com/gurmeetsaran/sqltesting/actions/workflows/athena-integration.yml)
@@ -47,7 +47,7 @@ For more details on our journey and the engineering challenges we solved, read t
47
47
 
48
48
  ## Features
49
49
 
50
- - **Multi-Database Support**: Test SQL across BigQuery, Athena, Redshift, Trino, and Snowflake
50
+ - **Multi-Database Support**: Test SQL across BigQuery, Athena, Redshift, Trino, Snowflake, and DuckDB
51
51
  - **Mock Data Injection**: Use Python dataclasses for type-safe test data
52
52
  - **CTE or Physical Tables**: Automatic fallback for query size limits
53
53
  - **Type-Safe Results**: Deserialize results to Pydantic models
@@ -60,28 +60,28 @@ The library supports different data types across database engines. All checkmark
60
60
 
61
61
  ### Primitive Types
62
62
 
63
- | Data Type | Python Type | BigQuery | Athena | Redshift | Trino | Snowflake |
64
- |-----------|-------------|----------|--------|----------|-------|-----------|
65
- | **String** | `str` | ✅ | ✅ | ✅ | ✅ | ✅ |
66
- | **Integer** | `int` | ✅ | ✅ | ✅ | ✅ | ✅ |
67
- | **Float** | `float` | ✅ | ✅ | ✅ | ✅ | ✅ |
68
- | **Boolean** | `bool` | ✅ | ✅ | ✅ | ✅ | ✅ |
69
- | **Date** | `date` | ✅ | ✅ | ✅ | ✅ | ✅ |
70
- | **Datetime** | `datetime` | ✅ | ✅ | ✅ | ✅ | ✅ |
71
- | **Decimal** | `Decimal` | ✅ | ✅ | ✅ | ✅ | ✅ |
72
- | **Optional** | `Optional[T]` | ✅ | ✅ | ✅ | ✅ | ✅ |
63
+ | Data Type | Python Type | BigQuery | Athena | Redshift | Trino | Snowflake | DuckDB |
64
+ |-----------|-------------|----------|--------|----------|-------|-----------|--------|
65
+ | **String** | `str` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
66
+ | **Integer** | `int` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
67
+ | **Float** | `float` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
68
+ | **Boolean** | `bool` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
69
+ | **Date** | `date` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
70
+ | **Datetime** | `datetime` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
71
+ | **Decimal** | `Decimal` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
72
+ | **Optional** | `Optional[T]` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
73
73
 
74
74
  ### Complex Types
75
75
 
76
- | Data Type | Python Type | BigQuery | Athena | Redshift | Trino | Snowflake |
77
- |-----------|-------------|----------|--------|----------|-------|-----------|
78
- | **String Array** | `List[str]` | ✅ | ✅ | ✅ | ✅ | ✅ |
79
- | **Integer Array** | `List[int]` | ✅ | ✅ | ✅ | ✅ | ✅ |
80
- | **Decimal Array** | `List[Decimal]` | ✅ | ✅ | ✅ | ✅ | ✅ |
81
- | **Optional Array** | `Optional[List[T]]` | ✅ | ✅ | ✅ | ✅ | ✅ |
82
- | **Map/Dict** | `Dict[K, V]` | ✅ | ✅ | ✅ | ✅ | ✅ |
83
- | **Struct/Record** | `dataclass` | ✅ | ✅ | ❌ | ✅ | ❌ |
84
- | **Nested Arrays** | `List[List[T]]` | ❌ | ❌ | ❌ | ❌ | ❌ |
76
+ | Data Type | Python Type | BigQuery | Athena | Redshift | Trino | Snowflake | DuckDB |
77
+ |-----------|-------------|----------|--------|----------|-------|-----------|--------|
78
+ | **String Array** | `List[str]` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
79
+ | **Integer Array** | `List[int]` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
80
+ | **Decimal Array** | `List[Decimal]` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
81
+ | **Optional Array** | `Optional[List[T]]` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
82
+ | **Map/Dict** | `Dict[K, V]` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
83
+ | **Struct/Record** | `dataclass` | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ |
84
+ | **Nested Arrays** | `List[List[T]]` | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
85
85
 
86
86
  ### Database-Specific Notes
87
87
 
@@ -90,15 +90,16 @@ The library supports different data types across database engines. All checkmark
90
90
  - **Redshift**: Arrays and maps implemented via SUPER type (JSON parsing); 16MB query size limit; struct types not yet supported
91
91
  - **Trino**: Memory catalog for testing; excellent decimal precision; supports arrays, maps, and struct types using `ROW` with named fields (dataclasses and Pydantic models)
92
92
  - **Snowflake**: Column names normalized to lowercase; 1MB query size limit; dict/map types implemented via VARIANT type (JSON parsing); struct types not yet supported
93
+ - **DuckDB**: Fast embedded analytics database; excellent SQL standards compliance; supports arrays, maps, and struct types using `STRUCT` syntax with named fields (dataclasses and Pydantic models)
93
94
 
94
95
  ## Execution Modes Support
95
96
 
96
97
  The library supports two execution modes for mock data injection. **CTE Mode is the default** and is automatically used unless Physical Tables mode is explicitly requested or required due to query size limits.
97
98
 
98
- | Execution Mode | Description | BigQuery | Athena | Redshift | Trino | Snowflake |
99
- |----------------|-------------|----------|--------|----------|-------|-----------|
100
- | **CTE Mode** | Mock data injected as Common Table Expressions | ✅ | ✅ | ✅ | ✅ | ✅ |
101
- | **Physical Tables** | Mock data created as temporary tables | ✅ | ✅ | ✅ | ✅ | ✅ |
99
+ | Execution Mode | Description | BigQuery | Athena | Redshift | Trino | Snowflake | DuckDB |
100
+ |----------------|-------------|----------|--------|----------|-------|-----------|--------|
101
+ | **CTE Mode** | Mock data injected as Common Table Expressions | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
102
+ | **Physical Tables** | Mock data created as temporary tables | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
102
103
 
103
104
  ### Execution Mode Details
104
105
 
@@ -122,14 +123,16 @@ The library supports two execution modes for mock data injection. **CTE Mode is
122
123
  | **Redshift** | Temporary tables | Session-specific temp schema | Database automatic | Session end |
123
124
  | **Trino** | Memory tables | `memory.default` schema | Library executes `DROP TABLE` | After each test |
124
125
  | **Snowflake** | Temporary tables | Session-specific temp schema | Database automatic | Session end |
126
+ | **DuckDB** | Temporary tables | Database-specific temp schema | Library executes `DROP TABLE` | After each test |
125
127
 
126
128
  #### **Cleanup Behavior Explained**
127
129
 
128
- **Library-Managed Cleanup (BigQuery, Athena, Trino):**
130
+ **Library-Managed Cleanup (BigQuery, Athena, Trino, DuckDB):**
129
131
  - The SQL Testing Library explicitly calls cleanup methods after each test
130
132
  - **BigQuery**: Creates standard tables in your dataset, then deletes them via `client.delete_table()`
131
133
  - **Athena**: Creates external tables backed by S3 data, then drops table metadata via `DROP TABLE IF EXISTS` (⚠️ **S3 data files remain and require separate cleanup**)
132
134
  - **Trino**: Creates tables in memory catalog, then drops them via `DROP TABLE IF EXISTS`
135
+ - **DuckDB**: Creates temporary tables in the database, then drops them via `DROP TABLE IF EXISTS`
133
136
 
134
137
  **Database-Managed Cleanup (Redshift, Snowflake):**
135
138
  - These databases have built-in temporary table mechanisms
@@ -154,7 +157,7 @@ A: Trino's memory catalog doesn't automatically clean up tables when sessions en
154
157
  A: BigQuery tables created by the library are **standard tables without TTL** - they persist until explicitly deleted. The library immediately calls `client.delete_table()` after each test. If you want to set TTL as a safety net, you can configure it at the dataset level (e.g., 24 hours) to auto-delete any orphaned tables.
155
158
 
156
159
  **Q: Which databases leave artifacts if tests crash?**
157
- - **BigQuery, Athena, Trino**: May leave tables if library crashes before cleanup
160
+ - **BigQuery, Athena, Trino, DuckDB**: May leave tables if library crashes before cleanup
158
161
  - **Redshift, Snowflake**: No artifacts - temporary tables auto-cleanup on session end
159
162
 
160
163
  **Q: How to manually clean up orphaned tables?**
@@ -170,6 +173,10 @@ DROP TABLE temp_table_name;
170
173
  -- Trino: List and drop tables with temp prefix
171
174
  SHOW TABLES FROM memory.default LIKE 'temp_%';
172
175
  DROP TABLE memory.default.temp_table_name;
176
+
177
+ -- DuckDB: List and drop tables with temp prefix
178
+ SHOW TABLES;
179
+ DROP TABLE temp_table_name;
173
180
  ```
174
181
 
175
182
  **Q: How to handle S3 cleanup for Athena tables?**
@@ -220,6 +227,7 @@ aws s3api list-objects-v2 --bucket your-athena-results-bucket --prefix "temp_" \
220
227
  | **Redshift** | 16MB | Automatically switches at 16MB |
221
228
  | **Trino** | 16MB (estimated) | Large dataset or complex CTEs |
222
229
  | **Snowflake** | 1MB | Automatically switches at 1MB |
230
+ | **DuckDB** | 32MB (estimated) | Large dataset or complex CTEs |
223
231
 
224
232
  ### How to Control Execution Mode
225
233
 
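The size limits in the table above suggest a simple decision rule for when to fall back from CTE mode to physical tables. The following is a rough sketch of such a rule, not the library's logic: the mode labels, the `choose_execution_mode` helper, and the reading of "MB" as 1024 × 1024 bytes are all assumptions, and only the engines visible in this hunk are included.

```python
# Limits copied from the table rows visible in this hunk.
# "MB" is assumed to mean 1024 * 1024 bytes.
MB = 1024 * 1024
QUERY_SIZE_LIMITS_BYTES = {
    "redshift": 16 * MB,
    "trino": 16 * MB,    # listed as estimated
    "snowflake": 1 * MB,
    "duckdb": 32 * MB,   # listed as estimated
}


def choose_execution_mode(adapter: str, query: str, force_physical: bool = False) -> str:
    """Return "cte" or "physical_tables" (hypothetical labels) for a generated query."""
    if force_physical:
        return "physical_tables"
    size = len(query.encode("utf-8"))
    if size > QUERY_SIZE_LIMITS_BYTES[adapter]:
        return "physical_tables"
    return "cte"


small_query = "SELECT 1"
large_query = "SELECT '" + "x" * (2 * MB) + "'"
modes = (
    choose_execution_mode("snowflake", small_query),
    choose_execution_mode("snowflake", large_query),
    choose_execution_mode("duckdb", large_query),
)
```

With these assumed thresholds, a query of roughly 2 MB exceeds Snowflake's 1 MB limit but stays well under the 32 MB estimate listed for DuckDB.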
@@ -342,6 +350,9 @@ pip install sql-testing-library[trino]
342
350
  # Install with Snowflake support
343
351
  pip install sql-testing-library[snowflake]
344
352
 
353
+ # Install with DuckDB support
354
+ pip install sql-testing-library[duckdb]
355
+
345
356
  # Or install with all database adapters
346
357
  pip install sql-testing-library[all]
347
358
  ```
@@ -357,9 +368,10 @@ poetry install --with athena
357
368
  poetry install --with redshift
358
369
  poetry install --with trino
359
370
  poetry install --with snowflake
371
+ poetry install --with duckdb
360
372
 
361
373
  # Install with all database adapters and dev tools
362
- poetry install --with bigquery,athena,redshift,trino,snowflake,dev
374
+ poetry install --with bigquery,athena,redshift,trino,snowflake,duckdb,dev
363
375
  ```
364
376
 
365
377
  ## Quick Start
@@ -368,7 +380,7 @@ poetry install --with bigquery,athena,redshift,trino,snowflake,dev
368
380
 
369
381
  ```ini
370
382
  [sql_testing]
371
- adapter = bigquery # Use 'bigquery', 'athena', 'redshift', 'trino', or 'snowflake'
383
+ adapter = bigquery # Use 'bigquery', 'athena', 'redshift', 'trino', 'snowflake', or 'duckdb'
372
384
 
373
385
  # BigQuery configuration
374
386
  [sql_testing.bigquery]
@@ -426,6 +438,10 @@ credentials_path = <path to credentials json>
426
438
  #
427
439
  # # Option 2: Password authentication (for accounts without MFA)
428
440
  # password = <snowflake_password>
441
+
442
+ # DuckDB configuration
443
+ # [sql_testing.duckdb]
444
+ # database = <path/to/database.duckdb> # Optional: defaults to in-memory database
429
445
  ```
430
446
 
431
447
  ### Database Context Understanding
@@ -439,6 +455,7 @@ Each database adapter uses a different concept for organizing tables and queries
439
455
  | **Redshift** | `{database}` | database only | `"test_db"` | `SELECT * FROM test_db.orders` |
440
456
  | **Snowflake** | `{database}.{schema}` | database + schema | `"test_db.public"` | `SELECT * FROM test_db.public.products` |
441
457
  | **Trino** | `{catalog}.{schema}` | catalog + schema | `"memory.default"` | `SELECT * FROM memory.default.inventory` |
458
+ | **DuckDB** | `{database}` | database only | `"test_db"` | `SELECT * FROM test_db.analytics` |
442
459
 
443
460
  #### Key Points:
444
461
 
@@ -511,6 +528,14 @@ class ProductsMockTable(BaseMockTable):
511
528
 
512
529
      def get_table_name(self) -> str:
513
530
          return "products"
531
+
532
+ # DuckDB Mock Table
533
+ class AnalyticsMockTable(BaseMockTable):
534
+     def get_database_name(self) -> str:
535
+         return "test_db" # database only
536
+
537
+     def get_table_name(self) -> str:
538
+         return "analytics"
514
539
  ```
515
540
 
516
541
  2. **Write a test** using one of the flexible patterns:
@@ -636,7 +661,7 @@ class EmployeesMockTable(BaseMockTable):
636
661
 
637
662
  # Test with struct types
638
663
  @sql_test(
639
-     adapter_type="athena", # or "trino" or "bigquery"
664
+     adapter_type="athena", # or "trino", "bigquery", or "duckdb"
640
665
      mock_tables=[
641
666
          EmployeesMockTable([
642
667
              Employee(
@@ -682,7 +707,7 @@ def test_struct_with_dot_notation():
682
707
 
683
708
  # You can also query entire structs
684
709
  @sql_test(
685
-     adapter_type="trino", # or "athena" or "bigquery"
710
+     adapter_type="trino", # or "athena", "bigquery", or "duckdb"
686
711
      mock_tables=[EmployeesMockTable([...])],
687
712
      result_class=dict # Returns full struct as dict
688
713
  )
@@ -930,9 +955,21 @@ def test_snowflake_query():
930
955
          query="SELECT user_id, name FROM users WHERE user_id = 1",
931
956
          default_namespace="test_db"
932
957
      )
958
+
959
+ # Use DuckDB adapter for this test
960
+ @sql_test(
961
+     adapter_type="duckdb",
962
+     mock_tables=[...],
963
+     result_class=UserResult
964
+ )
965
+ def test_duckdb_query():
966
+     return TestCase(
967
+         query="SELECT user_id, name FROM users WHERE user_id = 1",
968
+         default_namespace="test_db"
969
+     )
933
970
  ```
934
971
 
935
- The adapter_type parameter will use the configuration from the corresponding section in pytest.ini, such as `[sql_testing.bigquery]`, `[sql_testing.athena]`, `[sql_testing.redshift]`, `[sql_testing.trino]`, or `[sql_testing.snowflake]`.
972
+ The adapter_type parameter will use the configuration from the corresponding section in pytest.ini, such as `[sql_testing.bigquery]`, `[sql_testing.athena]`, `[sql_testing.redshift]`, `[sql_testing.trino]`, `[sql_testing.snowflake]`, or `[sql_testing.duckdb]`.
936
973
 
937
974
  **Default Adapter Behavior:**
938
975
  - If `adapter_type` is not specified in the test, the library uses the adapter from `[sql_testing]` section's `adapter` setting
@@ -975,6 +1012,14 @@ The adapter_type parameter will use the configuration from the corresponding sec
975
1012
  - Supports authentication via username and password
976
1013
  - Optional support for warehouse, role, and schema specification
977
1014
 
1015
+ #### DuckDB Adapter
1016
+ - Supports DuckDB embedded analytical database
1017
+ - Uses CTAS (CREATE TABLE AS SELECT) for efficient temporary table creation
1018
+ - Fast local database with excellent SQL standards compliance
1019
+ - Supports both file-based and in-memory databases
1020
+ - No authentication required - perfect for local development and testing
1021
+ - Excellent performance for analytical workloads
1022
+
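
The CTAS pattern named in the DuckDB adapter notes can be illustrated independently of the library. The sketch below is an assumption-laden stand-in: it uses Python's built-in `sqlite3` module rather than DuckDB so that it runs with no extra install, and the `users` table and its rows are made up for illustration. DuckDB accepts the same `CREATE TEMP TABLE ... AS SELECT` form; with the `duckdb` package the connection would come from `duckdb.connect()` instead.

```python
import sqlite3

# Stand-in engine: sqlite3 from the standard library (an assumption, not DuckDB itself).
conn = sqlite3.connect(":memory:")

# CTAS: create and populate the mock table in a single statement from inline rows,
# instead of a separate CREATE TABLE followed by INSERTs.
conn.execute(
    """
    CREATE TEMP TABLE users AS
    SELECT 1 AS user_id, 'Alice' AS name
    UNION ALL
    SELECT 2, 'Bob'
    """
)

# The query under test then reads from the temporary table like any other table.
rows = conn.execute(
    "SELECT user_id, name FROM users WHERE user_id = 1"
).fetchall()
print(rows)  # [(1, 'Alice')]
conn.close()
```

Because the table is temporary and the database is in-memory, nothing persists once the connection closes, which is the property that makes this pattern convenient for tests.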
978
1023
  **Default Behavior:**
979
1024
  - If adapter_type is not specified in the TestCase or decorator, the library will use the adapter specified in the `[sql_testing]` section's `adapter` setting.
980
1025
  - If no adapter is specified in the `[sql_testing]` section, it defaults to "bigquery".
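
The fallback order described above can be sketched with the standard library's `configparser`. This is only an illustration of the documented behavior, not the library's own code: `resolve_adapter` and `PYTEST_INI` are hypothetical names, and the database path is a placeholder.

```python
import configparser
from typing import Optional

# Example pytest.ini content selecting DuckDB as the default adapter.
# The database path is a placeholder; per the config template it is optional.
PYTEST_INI = """
[sql_testing]
adapter = duckdb

[sql_testing.duckdb]
database = path/to/database.duckdb
"""


def resolve_adapter(
    config: configparser.ConfigParser, adapter_type: Optional[str] = None
) -> str:
    """Hypothetical helper mirroring the documented fallback order."""
    if adapter_type:
        # 1. An explicit adapter_type on the test takes precedence.
        return adapter_type
    # 2. Otherwise use the [sql_testing] adapter setting; 3. else "bigquery".
    return config.get("sql_testing", "adapter", fallback="bigquery")


config = configparser.ConfigParser()
config.read_string(PYTEST_INI)

print(resolve_adapter(config))                        # duckdb
print(resolve_adapter(config, "snowflake"))           # snowflake
print(resolve_adapter(configparser.ConfigParser()))   # bigquery
```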
@@ -1183,6 +1228,41 @@ The library automatically:
1183
1228
 
1184
1229
  For detailed usage and configuration options, see the example files included.
1185
1230
 
1231
+ ## Integration with Mocksmith
1232
+
1233
+ SQL Testing Library works seamlessly with [Mocksmith](https://github.com/gurmeetsaran/mocksmith) for automatic test data generation. Mocksmith can reduce your test setup code by ~70% while providing more realistic test data.
1234
+
1235
+ Install mocksmith with: `pip install mocksmith[mock,pydantic]`
1236
+
1237
+ ### Quick Example
1238
+
1239
+ ```python
1240
+ # Without Mocksmith - Manual data creation
1241
+ customers = []
1242
+ for i in range(100):
1243
+     customers.append(Customer(
1244
+         id=i + 1,
1245
+         name=f"Customer {i + 1}",
1246
+         email=f"customer{i + 1}@test.com",
1247
+         balance=Decimal(str(random.uniform(0, 10000)))
1248
+     ))
1249
+
1250
+ # With Mocksmith - Automatic realistic data
1251
+ from mocksmith import mockable, Varchar, Integer, Money
1252
+
1253
+ @mockable
1254
+ @dataclass
1255
+ class Customer:
1256
+     id: Integer()
1257
+     name: Varchar(100)
1258
+     email: Varchar(255)
1259
+     balance: Money()
1260
+
1261
+ customers = [Customer.mock() for _ in range(100)]
1262
+ ```
1263
+
1264
+ See the [Mocksmith Integration Guide](docs/mocksmith_integration.md) and [examples](examples/mocksmith_integration_example.py) for detailed usage patterns.
1265
+
1186
1266
  ## Known Limitations and TODOs
1187
1267
 
1188
1268
  The library has a few known limitations that are planned to be addressed in future updates:
@@ -1211,3 +1291,4 @@ The library has a few known limitations that are planned to be addressed in futu
1211
1291
  - psycopg2-binary for Redshift
1212
1292
  - trino for Trino
1213
1293
  - snowflake-connector-python for Snowflake
1294
+ - duckdb for DuckDB