sqlassert 0.1.0__tar.gz

@@ -0,0 +1,173 @@
1
+ Metadata-Version: 2.4
2
+ Name: sqlassert
3
+ Version: 0.1.0
4
+ Summary: Generate SQL assertions from unique join markers.
5
+ Requires-Python: >=3.10
6
+ Description-Content-Type: text/markdown
7
+ Requires-Dist: sqlglot>=28
8
+ Provides-Extra: test
9
+ Requires-Dist: duckdb; extra == "test"
10
+ Requires-Dist: pytest; extra == "test"
11
+
12
+ # sqlassert
13
+
14
+ `sqlassert` is a Python library for adding safety checks to SQL before you run it.
15
+
16
+ The goal is to catch common query mistakes at test time or build time, using fast static and metadata-backed proofs instead of scanning production data. You can add `sqlassert` to your test suite and validate important queries offline, so the guarantees hold regardless of what your database currently contains.
17
+
18
+
19
+ ```bash
20
+ pip install sqlassert
21
+ ```
22
+
23
+ _Alpha warning: `sqlassert` currently supports only one check: `/**UNIQUE**/` joins. It has only been tested on DuckDB._
24
+
25
+ ## Features
26
+
27
+ ### Unique Join
28
+
29
+ Joins often multiply rows by accident. A query may look correct against today’s data but silently break when the right-hand side (RHS) relation later contains multiple matching rows.
30
+
31
+ `sqlassert` lets you mark joins that are expected to be unique. That is, the join must never return more rows than its left-hand side (LHS) provides.
32
+
33
+ ```sql
34
+ select *
35
+ from sessions
36
+ /**UNIQUE**/ join users
37
+ on sessions.user_id = users.id;
38
+ ```
39
+
40
+ The marker is just a SQL comment. Your SQL remains valid SQL and can still run normally. `sqlassert` reads the query separately and validates that the RHS is provably unique for the join keys.
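To make the failure mode concrete, here is a minimal sketch of a join that silently grows once the RHS gains a duplicate key. It uses the standard-library `sqlite3` module purely as a stand-in database; the table contents are invented for illustration, and `sqlassert` itself is not involved.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table sessions (id integer, user_id integer);
    create table users (id integer, name text);  -- note: no PRIMARY KEY on id
    insert into sessions values (1, 10), (2, 10), (3, 20);
    insert into users values (10, 'ada'), (20, 'bob');
""")

JOIN_COUNT = """
    select count(*)
    from sessions
    join users on sessions.user_id = users.id
"""

# With one user row per id, the join returns one row per session.
rows_before = con.execute(JOIN_COUNT).fetchone()[0]

# A duplicate user row arrives later; nothing in the schema forbids it.
con.execute("insert into users values (10, 'ada (duplicate)')")
rows_after = con.execute(JOIN_COUNT).fetchone()[0]

print(rows_before, rows_after)
```

The query text never changed, yet its row count did. That is the class of bug a `/**UNIQUE**/` marker is meant to rule out before the query runs.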
41
+
42
+ ## Usage
43
+
44
+ Run validation offline, before your application or analytics job executes the query:
45
+
46
+ ```python
47
+ import duckdb
48
+ from sqlassert import validate_unique_joins
49
+
50
+ con = duckdb.connect("warehouse.duckdb")
51
+
52
+ query = """
53
+ select *
54
+ from sessions
55
+ /**UNIQUE**/ join users
56
+ on sessions.user_id = users.id
57
+ """
58
+
59
+ result = validate_unique_joins(con, query)
60
+
61
+ assert result.valid, result.reason
62
+ ```
63
+
64
+ For a test suite, keep your model/query SQL as strings or load them from files, then validate them against a db connection that has the relevant schema:
65
+
66
+ ```python
67
+ def test_query_join_contract(con):
68
+     query = load_query("models/session_enrichment.sql")
69
+     result = validate_unique_joins(con, query)
70
+
71
+     assert result.valid, result.reason
72
+ ```
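The test above calls `load_query`, which the README does not define. One plausible shape for it, assuming queries live as `.sql` files under a project root, is a thin wrapper over `pathlib`. The `root` parameter and the temporary directory below are assumptions made so the sketch runs on its own.

```python
import tempfile
from pathlib import Path


def load_query(path: str, root: Path = Path(".")) -> str:
    """Read a SQL file as text, leaving comments such as /**UNIQUE**/ intact."""
    return (root / path).read_text(encoding="utf-8")


# Demonstration against a throwaway directory (hypothetical file layout).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "models").mkdir()
    (root / "models" / "session_enrichment.sql").write_text(
        "select * from sessions /**UNIQUE**/ join users on sessions.user_id = users.id",
        encoding="utf-8",
    )
    query = load_query("models/session_enrichment.sql", root=root)

print("/**UNIQUE**/" in query)
```

The important property is that the file is read verbatim: any preprocessing that strips SQL comments would also strip the markers.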
73
+
74
+ `result.checks` contains one check per marker:
75
+
76
+ ```python
77
+ for check in result.checks:
78
+     print(check.valid)
78
+     print(check.reason)
79
+     print(check.inferred_key_columns)
80
+     print(check.constrained_key_columns)
82
+ ```
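When a query carries several markers, a test may want every failing reason at once rather than only the first. The sketch below relies only on the `checks`, `valid`, and `reason` attributes shown above. The `failure_report` helper and the `SimpleNamespace` stand-ins are hypothetical, and whether `reason` is `None` for a passing check is an assumption.

```python
from types import SimpleNamespace


def failure_report(result) -> str:
    """Join the reasons of all failed checks into one multi-line message."""
    failed = [check.reason for check in result.checks if not check.valid]
    return "\n".join(failed)


# Stand-in objects for illustration only; real results would come from
# validate_unique_joins(con, query).
fake_result = SimpleNamespace(
    valid=False,
    checks=[
        SimpleNamespace(valid=True, reason=None),  # assumed shape of a passing check
        SimpleNamespace(valid=False, reason="we can't prove that RHS column id is unique"),
    ],
)

report = failure_report(fake_result)
print(report)
```

In a test this could be used as `assert result.valid, failure_report(result)`.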
83
+
84
+ ## Details
85
+
86
+ ### Unique Join Syntax
87
+
88
+ Place `/**UNIQUE**/` immediately before the join that should be uniqueness-checked:
89
+
90
+ ```sql
91
+ select *
92
+ from lhs
93
+ /**UNIQUE**/ left join rhs
94
+ on lhs.rhs_id = rhs.id;
95
+ ```
96
+
97
+ `ON` and `USING` are both supported:
98
+
99
+ ```sql
100
+ select *
101
+ from users
102
+ /**UNIQUE**/ join user_profiles
103
+ using (id);
104
+ ```
105
+
106
+ The marker applies to the next join after the comment.
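As a rough illustration of "the marker applies to the next join", the sketch below pairs each marker with the join that directly follows it using a regular expression. This is not how `sqlassert` works internally: the package depends on `sqlglot`, so it presumably parses the SQL properly. The `MARKER_JOIN` pattern and `marked_joins` helper are hypothetical and only handle plain table names.

```python
import re

# Hypothetical pattern: the marker, optional whitespace, a join clause, then the RHS name.
# IGNORECASE is there for the join keywords; whether sqlassert accepts a
# lowercase marker is not stated in the README.
MARKER_JOIN = re.compile(
    r"/\*\*UNIQUE\*\*/\s*((?:(?:inner|left|right|full|cross)\s+)?(?:outer\s+)?join)\s+(\w+)",
    re.IGNORECASE,
)


def marked_joins(sql: str) -> list[tuple[str, str]]:
    """Return (join_kind, rhs_name) for each join directly preceded by the marker."""
    return [(" ".join(kind.lower().split()), rhs) for kind, rhs in MARKER_JOIN.findall(sql)]


sql = """
select *
from sessions
/**UNIQUE**/ left join users on sessions.user_id = users.id
join events on sessions.event_id = events.id
"""

print(marked_joins(sql))
```

Only the `users` join is reported. The `events` join has no marker in front of it, so it is left unchecked.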
107
+
108
+ ## Proofs, Not Data Checks
109
+
110
+ `sqlassert` does **not** validate by querying actual table data. It will not run `count(*)`, search for duplicates, or sample rows.
111
+
112
+ Instead, it proves uniqueness using fast information available from the SQL and database metadata. If uniqueness cannot be proven, validation fails with a reason that names the join and RHS column:
113
+
114
+ ```text
115
+ in join "INNER JOIN events ON sessions.event_id = events.id", we can't prove that RHS column id is unique
116
+ ```
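To show what a metadata-backed proof can look like, the sketch below collects `PRIMARY KEY` and `UNIQUE` column sets from catalog metadata without reading a single data row. It uses the standard-library `sqlite3` module and SQLite pragmas as a stand-in. `sqlassert` is tested on DuckDB, which has its own catalog, and the `unique_key_sets` helper is hypothetical.

```python
import sqlite3


def unique_key_sets(con: sqlite3.Connection, table: str) -> set[frozenset[str]]:
    """Collect PRIMARY KEY and UNIQUE column sets from catalog metadata only."""
    keys: set[frozenset[str]] = set()

    # table_info rows: (cid, name, type, notnull, dflt_value, pk); pk > 0 marks key columns.
    pk_cols = [row[1] for row in con.execute(f"pragma table_info({table})") if row[5] > 0]
    if pk_cols:
        keys.add(frozenset(pk_cols))

    # index_list rows: (seq, name, unique, origin, partial).
    indexes = con.execute(f"pragma index_list({table})").fetchall()
    for _seq, index_name, is_unique, _origin, is_partial in indexes:
        if is_unique and not is_partial:
            # index_info rows: (seqno, cid, name).
            cols = [r[2] for r in con.execute(f"pragma index_info({index_name})")]
            keys.add(frozenset(cols))
    return keys


con = sqlite3.connect(":memory:")
con.executescript("""
    create table users (id integer primary key, email text unique, name text);
    create table events (id integer, payload text);
""")

print(unique_key_sets(con, "users"))
print(unique_key_sets(con, "events"))
```

Both tables are empty here, which is the point: the answer comes from declared constraints, so it stays the same no matter what data is loaded later. `events` has no declared key, which corresponds to the failure message shown above.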
117
+
118
+ Supported uniqueness proofs today:
119
+
120
+ - RHS `PRIMARY KEY` and `UNIQUE` constraints from db metadata.
121
+ - RHS `GROUP BY` subqueries, when the join covers the grouping keys.
122
+ - RHS `SELECT DISTINCT` subqueries, when the join covers the selected distinct columns.
123
+ - RHS `QUALIFY row_number() over (partition by ...) = 1` subqueries, when the join covers the partition keys.
124
+ - Simple projection views and subqueries that preserve one of the proofs above.
125
+
126
+ Views can inherit uniqueness when they are simple projections over a source relation with a supported proof. Filters preserve uniqueness. `sqlassert` does not try to infer uniqueness through computed expressions, joins inside views, unions, or other subquery shapes; joins against those fail validation.
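One way to read "the join covers the keys" is as a subset test: the join is provably unique if some known unique key of the RHS is fully determined by the join condition. The sketch below expresses that reading. The function name, its parameters, and the treatment of constant filters are assumptions for illustration, and `sqlassert`'s actual logic may differ.

```python
def join_is_provably_unique(
    join_columns: set[str],
    pinned_columns: set[str],
    unique_keys: list[set[str]],
) -> bool:
    """True if some unique key of the RHS is fully determined by the join.

    join_columns: RHS columns equated to LHS columns in the join condition.
    pinned_columns: RHS columns fixed to a constant by an RHS-only filter.
    unique_keys: column sets known to be unique on the RHS, e.g. from a
        PRIMARY KEY, a GROUP BY, or a row_number() partition.
    """
    determined = join_columns | pinned_columns
    return any(key <= determined for key in unique_keys)


# users.id is a primary key and the join is on users.id.
by_primary_key = join_is_provably_unique({"id"}, set(), [{"id"}])

# orders has a composite key (user_id, order_id); joining on user_id alone is not enough...
composite_partial = join_is_provably_unique({"user_id"}, set(), [{"user_id", "order_id"}])

# ...but pinning order_id with an RHS-only filter completes the key.
composite_with_filter = join_is_provably_unique(
    {"user_id"}, {"order_id"}, [{"user_id", "order_id"}]
)

print(by_primary_key, composite_partial, composite_with_filter)
```

A `GROUP BY user_id` subquery fits the same shape: its grouping columns act as the unique key, so a join on `user_id` covers it.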
127
+
128
+ Examples:
129
+
130
+ ```sql
131
+ -- Proved by primary key.
132
+ select *
133
+ from sessions
134
+ /**UNIQUE**/ join users
135
+ on sessions.user_id = users.id;
136
+ ```
137
+
138
+ ```sql
139
+ -- Proved by composite primary key plus RHS-only filter.
140
+ select *
141
+ from sessions
142
+ /**UNIQUE**/ join orders
143
+ on sessions.user_id = orders.user_id
144
+ and orders.order_id = 1;
145
+ ```
146
+
147
+ ```sql
148
+ -- Proved by GROUP BY.
149
+ with latest_session as (
150
+ select user_id, max(ts) as max_ts
151
+ from sessions
152
+ group by user_id
153
+ )
154
+ select *
155
+ from users
156
+ /**UNIQUE**/ join latest_session
157
+ on users.id = latest_session.user_id;
158
+ ```
159
+
160
+ ```sql
161
+ -- Proved by QUALIFY row_number() = 1.
162
+ with sessions_ranked as (
163
+ select user_id, *
164
+ from sessions
165
+ qualify row_number() over (partition by user_id order by ts) = 1
166
+ )
167
+ select *
168
+ from users
169
+ /**UNIQUE**/ join sessions_ranked
170
+ on users.id = sessions_ranked.user_id;
171
+ ```
172
+
173
+ More compile-time SQL checks can be added under the same model: explicit syntax, fast validation, and clear reasons when a proof is missing.
@@ -0,0 +1,162 @@
1
+ # sqlassert
2
+
3
+ `sqlassert` is a Python library for adding safety checks to SQL before you run it.
4
+
5
+ The goal is to catch common query mistakes at test time or build time, using fast static and metadata-backed proofs instead of scanning production data. You can add `sqlassert` to your test suite and validate important queries offline, so the guarantees hold regardless of what your database currently contains.
6
+
7
+
8
+ ```bash
9
+ pip install sqlassert
10
+ ```
11
+
12
+ _Alpha warning: `sqlassert` currently supports only one check: `/**UNIQUE**/` joins. It has only been tested on DuckDB._
13
+
14
+ ## Features
15
+
16
+ ### Unique Join
17
+
18
+ Joins often multiply rows by accident. A query may look correct against today’s data but silently break when the right-hand side (RHS) relation later contains multiple matching rows.
19
+
20
+ `sqlassert` lets you mark joins that are expected to be unique. That is, the join must never return more rows than its left-hand side (LHS) provides.
21
+
22
+ ```sql
23
+ select *
24
+ from sessions
25
+ /**UNIQUE**/ join users
26
+ on sessions.user_id = users.id;
27
+ ```
28
+
29
+ The marker is just a SQL comment. Your SQL remains valid SQL and can still run normally. `sqlassert` reads the query separately and validates that the RHS is provably unique for the join keys.
30
+
31
+ ## Usage
32
+
33
+ Run validation offline, before your application or analytics job executes the query:
34
+
35
+ ```python
36
+ import duckdb
37
+ from sqlassert import validate_unique_joins
38
+
39
+ con = duckdb.connect("warehouse.duckdb")
40
+
41
+ query = """
42
+ select *
43
+ from sessions
44
+ /**UNIQUE**/ join users
45
+ on sessions.user_id = users.id
46
+ """
47
+
48
+ result = validate_unique_joins(con, query)
49
+
50
+ assert result.valid, result.reason
51
+ ```
52
+
53
+ For a test suite, keep your model/query SQL as strings or load them from files, then validate them against a db connection that has the relevant schema:
54
+
55
+ ```python
56
+ def test_query_join_contract(con):
57
+     query = load_query("models/session_enrichment.sql")
58
+     result = validate_unique_joins(con, query)
59
+
60
+     assert result.valid, result.reason
61
+ ```
62
+
63
+ `result.checks` contains one check per marker:
64
+
65
+ ```python
66
+ for check in result.checks:
67
+     print(check.valid)
67
+     print(check.reason)
68
+     print(check.inferred_key_columns)
69
+     print(check.constrained_key_columns)
71
+ ```
72
+
73
+ ## Details
74
+
75
+ ### Unique Join Syntax
76
+
77
+ Place `/**UNIQUE**/` immediately before the join that should be uniqueness-checked:
78
+
79
+ ```sql
80
+ select *
81
+ from lhs
82
+ /**UNIQUE**/ left join rhs
83
+ on lhs.rhs_id = rhs.id;
84
+ ```
85
+
86
+ `ON` and `USING` are both supported:
87
+
88
+ ```sql
89
+ select *
90
+ from users
91
+ /**UNIQUE**/ join user_profiles
92
+ using (id);
93
+ ```
94
+
95
+ The marker applies to the next join after the comment.
96
+
97
+ ## Proofs, Not Data Checks
98
+
99
+ `sqlassert` does **not** validate by querying actual table data. It will not run `count(*)`, search for duplicates, or sample rows.
100
+
101
+ Instead, it proves uniqueness using fast information available from the SQL and database metadata. If uniqueness cannot be proven, validation fails with a reason that names the join and RHS column:
102
+
103
+ ```text
104
+ in join "INNER JOIN events ON sessions.event_id = events.id", we can't prove that RHS column id is unique
105
+ ```
106
+
107
+ Supported uniqueness proofs today:
108
+
109
+ - RHS `PRIMARY KEY` and `UNIQUE` constraints from db metadata.
110
+ - RHS `GROUP BY` subqueries, when the join covers the grouping keys.
111
+ - RHS `SELECT DISTINCT` subqueries, when the join covers the selected distinct columns.
112
+ - RHS `QUALIFY row_number() over (partition by ...) = 1` subqueries, when the join covers the partition keys.
113
+ - Simple projection views and subqueries that preserve one of the proofs above.
114
+
115
+ Views can inherit uniqueness when they are simple projections over a source relation with a supported proof. Filters preserve uniqueness. `sqlassert` does not try to infer uniqueness through computed expressions, joins inside views, unions, or other subquery shapes; joins against those fail validation.
116
+
117
+ Examples:
118
+
119
+ ```sql
120
+ -- Proved by primary key.
121
+ select *
122
+ from sessions
123
+ /**UNIQUE**/ join users
124
+ on sessions.user_id = users.id;
125
+ ```
126
+
127
+ ```sql
128
+ -- Proved by composite primary key plus RHS-only filter.
129
+ select *
130
+ from sessions
131
+ /**UNIQUE**/ join orders
132
+ on sessions.user_id = orders.user_id
133
+ and orders.order_id = 1;
134
+ ```
135
+
136
+ ```sql
137
+ -- Proved by GROUP BY.
138
+ with latest_session as (
139
+ select user_id, max(ts) as max_ts
140
+ from sessions
141
+ group by user_id
142
+ )
143
+ select *
144
+ from users
145
+ /**UNIQUE**/ join latest_session
146
+ on users.id = latest_session.user_id;
147
+ ```
148
+
149
+ ```sql
150
+ -- Proved by QUALIFY row_number() = 1.
151
+ with sessions_ranked as (
152
+ select user_id, *
153
+ from sessions
154
+ qualify row_number() over (partition by user_id order by ts) = 1
155
+ )
156
+ select *
157
+ from users
158
+ /**UNIQUE**/ join sessions_ranked
159
+ on users.id = sessions_ranked.user_id;
160
+ ```
161
+
162
+ More compile-time SQL checks can be added under the same model: explicit syntax, fast validation, and clear reasons when a proof is missing.
@@ -0,0 +1,22 @@
1
+ [build-system]
2
+ requires = ["setuptools>=68"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "sqlassert"
7
+ version = "0.1.0"
8
+ description = "Generate SQL assertions from unique join markers."
9
+ readme = "README.md"
10
+ requires-python = ">=3.10"
11
+ dependencies = [
12
+     "sqlglot>=28",
13
+ ]
14
+
15
+ [project.optional-dependencies]
16
+ test = [
17
+     "duckdb",
18
+     "pytest",
19
+ ]
20
+
21
+ [tool.pytest.ini_options]
22
+ testpaths = ["tests"]
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,13 @@
1
+ from sqlassert.unique import (
2
+     UniqueJoinCheckResult,
3
+     UniqueJoinValidationResult,
4
+     unique_assertions,
5
+     validate_unique_joins,
6
+ )
7
+
8
+ __all__ = [
9
+     "UniqueJoinCheckResult",
10
+     "UniqueJoinValidationResult",
11
+     "unique_assertions",
12
+     "validate_unique_joins",
13
+ ]