sqlassert-0.1.0.tar.gz
- sqlassert-0.1.0/PKG-INFO +173 -0
- sqlassert-0.1.0/README.md +162 -0
- sqlassert-0.1.0/pyproject.toml +22 -0
- sqlassert-0.1.0/setup.cfg +4 -0
- sqlassert-0.1.0/sqlassert/__init__.py +13 -0
- sqlassert-0.1.0/sqlassert/unique.py +605 -0
- sqlassert-0.1.0/sqlassert.egg-info/PKG-INFO +173 -0
- sqlassert-0.1.0/sqlassert.egg-info/SOURCES.txt +10 -0
- sqlassert-0.1.0/sqlassert.egg-info/dependency_links.txt +1 -0
- sqlassert-0.1.0/sqlassert.egg-info/requires.txt +5 -0
- sqlassert-0.1.0/sqlassert.egg-info/top_level.txt +1 -0
- sqlassert-0.1.0/tests/test_unique_join.py +310 -0
sqlassert-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,173 @@
Metadata-Version: 2.4
Name: sqlassert
Version: 0.1.0
Summary: Generate SQL assertions from unique join markers.
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: sqlglot>=28
Provides-Extra: test
Requires-Dist: duckdb; extra == "test"
Requires-Dist: pytest; extra == "test"

# sqlassert

`sqlassert` is a Python library for adding safety checks to SQL before you run it.

The goal is to catch common query mistakes at test time or build time, using fast static and metadata-backed proofs instead of scanning production data. You can add `sqlassert` to your test suite and validate important queries offline, making them more resilient independent of the current contents of your database.


```bash
pip install sqlassert
```

_Alpha warning: Today `sqlassert` supports only one check: `/**UNIQUE**/` joins. It is also only tested on duckdb._

## Features

### Unique Join

Joins often accidentally multiply rows. A query may look correct against today’s data but silently break when the RHS relation later contains multiple matching rows.

`sqlassert` lets you mark joins that are expected to be unique. That is, the result of the join must never 'grow' the number of rows with respect to the LHS.

```sql
select *
from sessions
/**UNIQUE**/ join users
on sessions.user_id = users.id;
```

The marker is just a SQL comment. Your SQL remains valid SQL and can still run normally. `sqlassert` reads the query separately and validates that the RHS is provably unique for the join keys.
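
To make the failure mode concrete, here is a minimal sketch that uses Python's built-in `sqlite3` instead of DuckDB or `sqlassert` itself (an assumption made only so the snippet is self-contained). The tables, rows, and the `joined_row_count` helper are invented for illustration. It runs the marked query twice and shows the join returning more rows than `sessions` holds once `users` contains a duplicate `id`; it also shows that the `/**UNIQUE**/` comment does not prevent the query from executing.

```python
import sqlite3

QUERY = """
select *
from sessions
/**UNIQUE**/ join users
on sessions.user_id = users.id
"""


def joined_row_count(user_rows):
    """Run QUERY against throwaway in-memory tables and count the result rows."""
    con = sqlite3.connect(":memory:")
    # No PRIMARY KEY on users.id here, so nothing stops duplicate ids.
    con.execute("create table users (id integer, name text)")
    con.execute("create table sessions (session_id integer, user_id integer)")
    con.executemany("insert into users values (?, ?)", user_rows)
    con.executemany("insert into sessions values (?, ?)", [(10, 1), (11, 2)])
    count = len(con.execute(QUERY).fetchall())
    con.close()
    return count


unique_users = [(1, "ada"), (2, "bob")]
duplicated_users = unique_users + [(2, "bob-again")]

print(joined_row_count(unique_users))      # 2: one row per session
print(joined_row_count(duplicated_users))  # 3: session 11 now matches twice
```

Nothing in the query text changed between the two runs, which is why a check based on declared constraints is more robust than one based on today's rows.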

## Usage

Run validation offline, before your application or analytics job executes the query:

```python
import duckdb
from sqlassert import validate_unique_joins

con = duckdb.connect("warehouse.duckdb")

query = """
select *
from sessions
/**UNIQUE**/ join users
on sessions.user_id = users.id
"""

result = validate_unique_joins(con, query)

assert result.valid, result.reason
```

For a test suite, keep your model/query SQL as strings or load them from files, then validate them against a db connection that has the relevant schema:

```python
def test_query_join_contract(con):
    query = load_query("models/session_enrichment.sql")
    result = validate_unique_joins(con, query)

    assert result.valid, result.reason
```
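
`load_query` is not defined in the snippet above and is not among the names exported by `sqlassert/__init__.py`, so it is presumably a project-local helper. One possible version is sketched here, assuming the queries are stored as UTF-8 `.sql` files; the `root` parameter is a hypothetical addition.

```python
from pathlib import Path


def load_query(relative_path, root="."):
    """Return the text of a .sql file resolved relative to `root`.

    Hypothetical helper for illustration; it is not part of sqlassert.
    """
    path = Path(root) / relative_path
    return path.read_text(encoding="utf-8")
```

Keeping the SQL in files means the same text can be validated in tests and executed by the application without copying it.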

`result.checks` contains one check per marker:

```python
for check in result.checks:
    print(check.valid)
    print(check.reason)
    print(check.inferred_key_columns)
    print(check.constrained_key_columns)
```
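
When a query carries several markers, it can help to report every failing reason at once. The sketch below relies only on the attributes shown above (`checks`, `valid`, `reason`). The `failure_report` helper is hypothetical, and the `SimpleNamespace` objects are stand-ins so the snippet runs without a database; in particular, the `reason=None` on the passing check is an assumption, since the README does not say what a passing check reports.

```python
from types import SimpleNamespace


def failure_report(result):
    """Join the reasons of all failing checks into one message."""
    reasons = [check.reason for check in result.checks if not check.valid]
    return "\n".join(reasons)


# Stand-in objects that mimic the attributes used above; not real sqlassert results.
fake_result = SimpleNamespace(
    valid=False,
    checks=[
        SimpleNamespace(valid=True, reason=None),
        SimpleNamespace(valid=False, reason="can't prove that RHS column id is unique"),
    ],
)

print(failure_report(fake_result))
```

In a test this could be used as `assert result.valid, failure_report(result)`.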

## Details

### Unique Join Syntax

Place `/**UNIQUE**/` immediately before the join that should be uniqueness-checked:

```sql
select *
from lhs
/**UNIQUE**/ left join rhs
on lhs.rhs_id = rhs.id;
```

`ON` and `USING` are both supported:

```sql
select *
from users
/**UNIQUE**/ join user_profiles
using (id);
```

The marker applies to the next join after the comment.
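
As a rough illustration of the "next join after the comment" rule, the sketch below finds markers with a regular expression and reports the join that follows each one. This is not how `sqlassert` works internally (the package depends on `sqlglot`, which suggests real SQL parsing), and a regex like this would be fooled by markers inside string literals or by joins against subqueries. `MARKED_JOIN` and `marked_joins` are hypothetical names.

```python
import re

# A marker, optional join-type words, the JOIN keyword, then the relation name.
MARKED_JOIN = re.compile(
    r"/\*\*UNIQUE\*\*/\s*"
    r"((?:(?:inner|left|right|full|outer|cross)\s+)*join)\s+([\w.]+)",
    re.IGNORECASE,
)


def marked_joins(sql):
    """Return (join keywords, relation name) for each marked join."""
    return [
        (" ".join(kind.lower().split()), relation)
        for kind, relation in MARKED_JOIN.findall(sql)
    ]


sql = """
select *
from lhs
/**UNIQUE**/ left join rhs
on lhs.rhs_id = rhs.id
join other
on lhs.other_id = other.id
"""

print(marked_joins(sql))  # only the join directly after the marker is reported
```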

## Proofs, Not Data Checks

`sqlassert` does **not** validate by querying actual table data. It will not run `count(*)`, search for duplicates, or sample rows.

Instead, it proves uniqueness using fast information available from the SQL and database metadata. If uniqueness cannot be proven, validation fails with a reason that names the join and RHS column:

```text
in join "INNER JOIN events ON sessions.event_id = events.id", we can't prove that RHS column id is unique
```

Supported uniqueness proofs today:

- RHS `PRIMARY KEY` and `UNIQUE` constraints from db metadata.
- RHS `GROUP BY` subqueries, when the join covers the grouping keys.
- RHS `SELECT DISTINCT` subqueries, when the join covers the selected distinct columns.
- RHS `QUALIFY row_number() over (partition by ...) = 1` subqueries, when the join covers the partition keys.
- Simple projection views and subqueries that preserve one of the proofs above.

Views can inherit uniqueness when they are simple projections over a source relation with a supported proof. Filters preserve uniqueness; computed expressions, joins inside views, unions, and arbitrary subquery semantics are not guessed.
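
The constraint-based proof can be read as a set-cover test: the join is provably unique when the RHS columns pinned down by equality conditions include every column of some declared key. The sketch below illustrates that idea against SQLite's catalog through Python's built-in `sqlite3`. It is an assumption-laden stand-in, not DuckDB's metadata and not `sqlassert`'s actual code; `declared_keys` and `is_provably_unique` are hypothetical names. It reads only schema metadata and never touches table rows.

```python
import sqlite3


def declared_keys(con, table):
    """Collect PRIMARY KEY and UNIQUE column sets from SQLite's schema metadata."""
    keys = []
    pk_columns = [
        name
        for name, pk_position in con.execute(
            "select name, pk from pragma_table_info(?)", (table,)
        )
        if pk_position > 0
    ]
    if pk_columns:
        keys.append(frozenset(pk_columns))
    unique_indexes = con.execute(
        'select name from pragma_index_list(?) where "unique" = 1 and "partial" = 0',
        (table,),
    ).fetchall()
    for (index_name,) in unique_indexes:
        columns = [
            name
            for (name,) in con.execute(
                "select name from pragma_index_info(?)", (index_name,)
            )
        ]
        # Expression indexes report a NULL column name; skip them rather than guess.
        if columns and all(column is not None for column in columns):
            keys.append(frozenset(columns))
    return keys


def is_provably_unique(con, table, constrained_columns):
    """True when the constrained columns cover at least one declared key."""
    constrained = set(constrained_columns)
    return any(key <= constrained for key in declared_keys(con, table))


con = sqlite3.connect(":memory:")
con.execute("create table users (id integer primary key, email text unique)")
con.execute(
    "create table orders (user_id integer, order_id integer, "
    "primary key (user_id, order_id))"
)
con.execute("create table events (id integer, kind text)")

print(is_provably_unique(con, "users", ["id"]))                    # primary key
print(is_provably_unique(con, "orders", ["user_id"]))              # half of a composite key
print(is_provably_unique(con, "orders", ["user_id", "order_id"]))  # whole composite key
print(is_provably_unique(con, "events", ["id"]))                   # no constraint declared
```

Under this reading, a join on `orders.user_id` combined with an RHS-only filter such as `orders.order_id = 1` constrains both columns of the composite key, which is why the filter contributes to the proof.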

Examples:

```sql
-- Proved by primary key.
select *
from sessions
/**UNIQUE**/ join users
on sessions.user_id = users.id;
```

```sql
-- Proved by composite primary key plus RHS-only filter.
select *
from sessions
/**UNIQUE**/ join orders
on sessions.user_id = orders.user_id
and orders.order_id = 1;
```

```sql
-- Proved by GROUP BY.
with latest_session as (
select user_id, max(ts) as max_ts
from sessions
group by user_id
)
select *
from users
/**UNIQUE**/ join latest_session
on users.id = latest_session.user_id;
```

```sql
-- Proved by QUALIFY row_number() = 1.
with sessions_ranked as (
select user_id, *
from sessions
qualify row_number() over (partition by user_id order by ts) = 1
)
select *
from users
/**UNIQUE**/ join sessions_ranked
on users.id = sessions_ranked.user_id;
```
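
To see why a `GROUP BY` subquery is safe when the join covers the grouping keys, the sketch below runs the `latest_session` example in an in-memory SQLite database (a stand-in for DuckDB; the rows are invented). Unlike `sqlassert`, it inspects data, so it illustrates the property being proved rather than the proof itself.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table users (id integer primary key)")
con.execute("create table sessions (user_id integer, ts integer)")
con.executemany("insert into users values (?)", [(1,), (2,)])
# User 1 has three sessions, user 2 has one.
con.executemany(
    "insert into sessions values (?, ?)",
    [(1, 100), (1, 200), (1, 300), (2, 150)],
)

# Joining the raw sessions table multiplies the user rows.
raw_join = con.execute("""
select *
from users
join sessions
on users.id = sessions.user_id
""").fetchall()

# Grouping by the join key first leaves at most one RHS row per user.
grouped_join = con.execute("""
with latest_session as (
select user_id, max(ts) as max_ts
from sessions
group by user_id
)
select users.id, latest_session.max_ts
from users
/**UNIQUE**/ join latest_session
on users.id = latest_session.user_id
order by users.id
""").fetchall()

print(len(raw_join))   # 4 rows for 2 users
print(grouped_join)    # [(1, 300), (2, 150)]
```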

More compile-time SQL checks can be added under the same model: explicit syntax, fast validation, and clear reasons when a proof is missing.
sqlassert-0.1.0/README.md
ADDED
@@ -0,0 +1,162 @@
# sqlassert

`sqlassert` is a Python library for adding safety checks to SQL before you run it.

The goal is to catch common query mistakes at test time or build time, using fast static and metadata-backed proofs instead of scanning production data. You can add `sqlassert` to your test suite and validate important queries offline, making them more resilient independent of the current contents of your database.


```bash
pip install sqlassert
```

_Alpha warning: Today `sqlassert` supports only one check: `/**UNIQUE**/` joins. It is also only tested on duckdb._

## Features

### Unique Join

Joins often accidentally multiply rows. A query may look correct against today’s data but silently break when the RHS relation later contains multiple matching rows.

`sqlassert` lets you mark joins that are expected to be unique. That is, the result of the join must never 'grow' the number of rows with respect to the LHS.

```sql
select *
from sessions
/**UNIQUE**/ join users
on sessions.user_id = users.id;
```

The marker is just a SQL comment. Your SQL remains valid SQL and can still run normally. `sqlassert` reads the query separately and validates that the RHS is provably unique for the join keys.

## Usage

Run validation offline, before your application or analytics job executes the query:

```python
import duckdb
from sqlassert import validate_unique_joins

con = duckdb.connect("warehouse.duckdb")

query = """
select *
from sessions
/**UNIQUE**/ join users
on sessions.user_id = users.id
"""

result = validate_unique_joins(con, query)

assert result.valid, result.reason
```

For a test suite, keep your model/query SQL as strings or load them from files, then validate them against a db connection that has the relevant schema:

```python
def test_query_join_contract(con):
    query = load_query("models/session_enrichment.sql")
    result = validate_unique_joins(con, query)

    assert result.valid, result.reason
```

`result.checks` contains one check per marker:

```python
for check in result.checks:
    print(check.valid)
    print(check.reason)
    print(check.inferred_key_columns)
    print(check.constrained_key_columns)
```

## Details

### Unique Join Syntax

Place `/**UNIQUE**/` immediately before the join that should be uniqueness-checked:

```sql
select *
from lhs
/**UNIQUE**/ left join rhs
on lhs.rhs_id = rhs.id;
```

`ON` and `USING` are both supported:

```sql
select *
from users
/**UNIQUE**/ join user_profiles
using (id);
```

The marker applies to the next join after the comment.

## Proofs, Not Data Checks

`sqlassert` does **not** validate by querying actual table data. It will not run `count(*)`, search for duplicates, or sample rows.

Instead, it proves uniqueness using fast information available from the SQL and database metadata. If uniqueness cannot be proven, validation fails with a reason that names the join and RHS column:

```text
in join "INNER JOIN events ON sessions.event_id = events.id", we can't prove that RHS column id is unique
```

Supported uniqueness proofs today:

- RHS `PRIMARY KEY` and `UNIQUE` constraints from db metadata.
- RHS `GROUP BY` subqueries, when the join covers the grouping keys.
- RHS `SELECT DISTINCT` subqueries, when the join covers the selected distinct columns.
- RHS `QUALIFY row_number() over (partition by ...) = 1` subqueries, when the join covers the partition keys.
- Simple projection views and subqueries that preserve one of the proofs above.

Views can inherit uniqueness when they are simple projections over a source relation with a supported proof. Filters preserve uniqueness; computed expressions, joins inside views, unions, and arbitrary subquery semantics are not guessed.

Examples:

```sql
-- Proved by primary key.
select *
from sessions
/**UNIQUE**/ join users
on sessions.user_id = users.id;
```

```sql
-- Proved by composite primary key plus RHS-only filter.
select *
from sessions
/**UNIQUE**/ join orders
on sessions.user_id = orders.user_id
and orders.order_id = 1;
```

```sql
-- Proved by GROUP BY.
with latest_session as (
select user_id, max(ts) as max_ts
from sessions
group by user_id
)
select *
from users
/**UNIQUE**/ join latest_session
on users.id = latest_session.user_id;
```

```sql
-- Proved by QUALIFY row_number() = 1.
with sessions_ranked as (
select user_id, *
from sessions
qualify row_number() over (partition by user_id order by ts) = 1
)
select *
from users
/**UNIQUE**/ join sessions_ranked
on users.id = sessions_ranked.user_id;
```

More compile-time SQL checks can be added under the same model: explicit syntax, fast validation, and clear reasons when a proof is missing.
sqlassert-0.1.0/pyproject.toml
ADDED
@@ -0,0 +1,22 @@
[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[project]
name = "sqlassert"
version = "0.1.0"
description = "Generate SQL assertions from unique join markers."
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
    "sqlglot>=28",
]

[project.optional-dependencies]
test = [
    "duckdb",
    "pytest",
]

[tool.pytest.ini_options]
testpaths = ["tests"]
sqlassert-0.1.0/sqlassert/__init__.py
ADDED
@@ -0,0 +1,13 @@
from sqlassert.unique import (
    UniqueJoinCheckResult,
    UniqueJoinValidationResult,
    unique_assertions,
    validate_unique_joins,
)

__all__ = [
    "UniqueJoinCheckResult",
    "UniqueJoinValidationResult",
    "unique_assertions",
    "validate_unique_joins",
]