data-contract-validator 1.0.5__tar.gz → 1.1.0__tar.gz
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/CHANGELOG.md +51 -1
- data_contract_validator-1.1.0/PKG-INFO +336 -0
- data_contract_validator-1.1.0/README.md +286 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/__init__.py +8 -5
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/cli.py +28 -15
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/core/models.py +20 -0
- data_contract_validator-1.1.0/data_contract_validator/core/types.py +291 -0
- data_contract_validator-1.1.0/data_contract_validator/core/validator.py +248 -0
- data_contract_validator-1.1.0/data_contract_validator/extractors/base.py +77 -0
- data_contract_validator-1.1.0/data_contract_validator/extractors/dbt.py +449 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/extractors/fastapi.py +35 -14
- data_contract_validator-1.1.0/data_contract_validator.egg-info/PKG-INFO +336 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator.egg-info/SOURCES.txt +1 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator.egg-info/requires.txt +1 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/pyproject.toml +3 -2
- data_contract_validator-1.0.5/PKG-INFO +0 -512
- data_contract_validator-1.0.5/README.md +0 -463
- data_contract_validator-1.0.5/data_contract_validator/core/validator.py +0 -187
- data_contract_validator-1.0.5/data_contract_validator/extractors/base.py +0 -45
- data_contract_validator-1.0.5/data_contract_validator/extractors/dbt.py +0 -227
- data_contract_validator-1.0.5/data_contract_validator.egg-info/PKG-INFO +0 -512
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/LICENSE +0 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/MANIFEST.in +0 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/core/__init__.py +0 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/extractors/__init__.py +0 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/integrations/__init__.py +0 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/py.typed +0 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/templates/github-actions-template.yml +0 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator.egg-info/dependency_links.txt +0 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator.egg-info/entry_points.txt +0 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator.egg-info/top_level.txt +0 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/requirements.txt +0 -0
- {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/setup.cfg +0 -0
@@ -7,6 +7,55 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [1.1.0] - 2026-06-30
+
+This release is focused on **accuracy** — making a red check always mean a real
+problem and a green check genuinely safe, so the tool can be trusted to gate a
+deploy.
+
+### Added
+- **Canonical type system** (`core/types.py`): every extractor now normalizes
+  its native types (warehouse SQL types, Python hints) into a shared, neutral
+  vocabulary (`CanonicalType`). The validator compares canonical types instead
+  of raw strings, eliminating the bulk of false "type mismatch" warnings
+  (e.g. dbt `varchar` vs Pydantic `str` are now correctly equal).
+  - Dialect-aware normalization: Snowflake `NUMBER(38,0)`→bigint, BigQuery
+    `INT64`/`FLOAT64`, Redshift `SUPER`, Postgres `jsonb`, and more.
+- **Tiered dbt extraction** with graceful degradation:
+  1. `catalog.json` — real warehouse types (high confidence).
+  2. `sqlglot` — a proper SQL parser. Handles CTEs, `||`, window functions, and
+     quoted identifiers that the old regex parser mangled. Detects `SELECT *`
+     and flags the schema as incomplete.
+  3. regex — last-resort best effort (low confidence, never hard-fails).
+- **Confidence-aware validation**: when source columns can't be fully resolved
+  (e.g. `SELECT *`), a missing column is reported as a **warning, not a
+  build-blocking critical**. Type warnings are suppressed for low-confidence
+  (regex-tier) sources. This is the core false-positive guard.
+- **Explicit mapping config** (`mapping:` in `.retl-validator.yml`) for when
+  name heuristics aren't enough — map a target table/column to a differently
+  named source model/column:
+  ```yaml
+  mapping:
+    tables:
+      user_analytics: user_analytics_summary
+    columns:
+      user_analytics:
+        userId: user_id
+  ```
+- **Name normalization**: tables/columns now match across snake_case, camelCase
+  and casing differences (`userId` == `user_id` == `USER_ID`).
+
+### Changed
+- `Schema` now carries `confidence` and `is_complete` (via `metadata`).
+- `BaseExtractor` no longer contains Python-specific type mapping; type
+  normalization lives in the canonical type system. Added `_make_column` helper.
+- Added `sqlglot` as a dependency (imported optionally; falls back to regex if
+  absent).
+
+### Fixed
+- Hardened GitHub API rate-limit handling against non-dict response headers
+  (previously could raise when headers weren't a mapping).
+
 ## [1.0.5] - 2025-01-24
 
 ### Fixed
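The canonical type system described in the 1.1.0 changelog entry above can be illustrated with a minimal sketch. This is not the package's `core/types.py`: `CanonicalKind`, `normalize_sql_type`, `normalize_python_hint`, and `types_compatible` are hypothetical names, the lookup tables cover only a few of the types the changelog mentions, and folding every integer width into one kind is a simplification made for this example.

```python
import re
from enum import Enum


class CanonicalKind(Enum):
    """Hypothetical neutral vocabulary (stand-in for the package's CanonicalType)."""
    STRING = "string"
    INTEGER = "integer"
    FLOAT = "float"
    BOOLEAN = "boolean"
    JSON = "json"
    UNKNOWN = "unknown"


# Assumed, deliberately small lookup tables; a real table would be per dialect.
_SQL_TYPES = {
    "varchar": CanonicalKind.STRING, "text": CanonicalKind.STRING,
    "string": CanonicalKind.STRING, "char": CanonicalKind.STRING,
    "int": CanonicalKind.INTEGER, "integer": CanonicalKind.INTEGER,
    "bigint": CanonicalKind.INTEGER, "int64": CanonicalKind.INTEGER,
    "float": CanonicalKind.FLOAT, "float64": CanonicalKind.FLOAT,
    "double": CanonicalKind.FLOAT, "real": CanonicalKind.FLOAT,
    "bool": CanonicalKind.BOOLEAN, "boolean": CanonicalKind.BOOLEAN,
    "json": CanonicalKind.JSON, "jsonb": CanonicalKind.JSON,
    "super": CanonicalKind.JSON,  # simplification: treat Redshift SUPER as JSON-like
}
_PY_TYPES = {
    "str": CanonicalKind.STRING, "int": CanonicalKind.INTEGER,
    "float": CanonicalKind.FLOAT, "bool": CanonicalKind.BOOLEAN,
    "dict": CanonicalKind.JSON,
}

_NUMERIC = re.compile(r"(?:number|numeric|decimal)\s*\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\)")


def normalize_sql_type(raw: str) -> CanonicalKind:
    """Map a warehouse type string to a canonical kind."""
    text = raw.strip().lower()
    match = _NUMERIC.fullmatch(text)
    if match:
        # NUMBER(38,0) has scale 0, so it is integral; any other scale is
        # treated as FLOAT here, which is a simplification.
        scale = int(match.group(2) or 0)
        return CanonicalKind.INTEGER if scale == 0 else CanonicalKind.FLOAT
    base = re.sub(r"\(.*\)$", "", text).strip()  # varchar(255) -> varchar
    return _SQL_TYPES.get(base, CanonicalKind.UNKNOWN)


def normalize_python_hint(hint: str) -> CanonicalKind:
    """Map a Python type hint, given as source text, to a canonical kind."""
    text = hint.strip()
    match = re.fullmatch(r"Optional\[(.+)\]", text)
    if match:
        text = match.group(1).strip()
    return _PY_TYPES.get(text, CanonicalKind.UNKNOWN)


def types_compatible(sql_type: str, python_hint: str) -> bool:
    """Compare canonical kinds; an unknown side never counts as a mismatch."""
    a, b = normalize_sql_type(sql_type), normalize_python_hint(python_hint)
    if CanonicalKind.UNKNOWN in (a, b):
        return True
    return a == b
```

Treating an unknown type as compatible mirrors the release's stated goal of avoiding false "type mismatch" warnings; a stricter tool could surface unknowns as a separate informational finding instead.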
@@ -66,6 +115,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Limited type inference from SQL
 - No support for complex nested types
 
-[Unreleased]: https://github.com/OGsiji/data-contract-validator/compare/v1.0
+[Unreleased]: https://github.com/OGsiji/data-contract-validator/compare/v1.1.0...HEAD
+[1.1.0]: https://github.com/OGsiji/data-contract-validator/releases/tag/v1.1.0
 [1.0.5]: https://github.com/OGsiji/data-contract-validator/releases/tag/v1.0.5
 [1.0.0]: https://github.com/OGsiji/data-contract-validator/releases/tag/v1.0.0
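The "Confidence-aware validation" entry in the CHANGELOG.md hunks above amounts to a small severity rule, sketched here under stated assumptions. `SourceInfo`, `missing_column_severity`, and `type_mismatch_severity` are hypothetical names, not the package's API; the `confidence` and `is_complete` fields follow the changelog's description of what `Schema` now carries, and downgrading low-confidence sources to warnings is one reading of "never hard-fails".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceInfo:
    """Assumed per-schema facts: how the columns were obtained and whether all are known."""
    confidence: str   # "high" | "medium" | "low"
    is_complete: bool  # False when e.g. SELECT * hides the real column list


def missing_column_severity(required: bool, source: SourceInfo) -> str:
    """Severity for a target column that the source schema does not list."""
    if not required:
        return "warning"
    # An incomplete or low-confidence schema cannot prove the column is absent,
    # so the finding is downgraded instead of blocking the build.
    if not source.is_complete or source.confidence == "low":
        return "warning"
    return "critical"


def type_mismatch_severity(source: SourceInfo) -> Optional[str]:
    """Type findings are dropped for low-confidence sources, warnings otherwise."""
    return None if source.confidence == "low" else "warning"
```

Under this rule, only a required column missing from a complete, non-regex-tier schema yields a critical finding.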
@@ -0,0 +1,336 @@
+Metadata-Version: 2.4
+Name: data-contract-validator
+Version: 1.1.0
+Summary: Validate data contracts between dbt models and FastAPI/Pydantic APIs with accurate, low-false-positive schema checks
+Author-email: Ogunniran Siji <ogunniransiji@gmail.com>
+Maintainer-email: Ogunniran Siji <ogunniransiji@gmail.com>
+License: MIT
+Project-URL: Homepage, https://github.com/OGsiji/data-contract-validator
+Project-URL: Documentation, https://github.com/OGsiji/data-contract-validator/blob/main/README.md
+Project-URL: Repository, https://github.com/OGsiji/data-contract-validator
+Project-URL: Bug Reports, https://github.com/OGsiji/data-contract-validator/issues
+Project-URL: Changelog, https://github.com/OGsiji/data-contract-validator/blob/main/CHANGELOG.md
+Keywords: dbt,fastapi,contract-testing,api-validation,data-engineering,schema-validation,ci-cd,devops
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Software Development :: Quality Assurance
+Classifier: Topic :: Software Development :: Testing
+Classifier: Topic :: Database
+Classifier: Topic :: Internet :: WWW/HTTP :: HTTP Servers
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: pydantic>=2.0.0
+Requires-Dist: PyYAML>=6.0
+Requires-Dist: requests>=2.25.0
+Requires-Dist: click>=8.0.0
+Requires-Dist: sqlglot>=20.0.0
+Provides-Extra: dev
+Requires-Dist: pytest>=7.0.0; extra == "dev"
+Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
+Requires-Dist: black>=22.0.0; extra == "dev"
+Requires-Dist: flake8>=4.0.0; extra == "dev"
+Requires-Dist: mypy>=0.991; extra == "dev"
+Requires-Dist: pre-commit>=2.20.0; extra == "dev"
+Requires-Dist: build>=0.8.0; extra == "dev"
+Requires-Dist: twine>=4.0.0; extra == "dev"
+Provides-Extra: test
+Requires-Dist: pytest>=7.0.0; extra == "test"
+Requires-Dist: pytest-cov>=4.0.0; extra == "test"
+Requires-Dist: pytest-mock>=3.8.0; extra == "test"
+Dynamic: license-file
+
+# 🛡️ Data Contract Validator
+
+> **Catch breaking changes between your dbt models and your FastAPI/Pydantic APIs — before they hit production.**
+
+[](https://badge.fury.io/py/data-contract-validator)
+[](https://github.com/OGsiji/data-contract-validator/actions)
+[](https://opensource.org/licenses/MIT)
+
+## 🎯 What it solves
+
+Your analytics team changes a dbt model. Your API team's FastAPI service still
+expects the old shape. Nobody notices until production 500s at 2 AM.
+
+This tool sits on that boundary. It extracts the schema your **dbt models
+produce** and the schema your **Pydantic models expect**, compares them, and
+fails CI when the data side can no longer satisfy the API side.
+
+```
+dbt models                 Data Contract Validator            FastAPI / Pydantic
+(what the pipeline  ──▶  extract → normalize → compare  ◀──  (what the API expects)
+ produces)                           ↓
+                          critical issues block the build
+```
+
+### Built for trust
+
+A check that gates a deploy is only useful if it doesn't cry wolf. v1.1
+re-architected extraction around that principle:
+
+- **Canonical types** — dbt `varchar` and Pydantic `str` are understood to be
+  the same thing, so you don't get drowned in fake "type mismatch" warnings.
+- **A real SQL parser** (`sqlglot`) instead of regex — CTEs, `||`
+  concatenation, window functions and quoted identifiers are parsed correctly.
+- **Confidence-aware** — if the tool can't fully resolve a model's columns
+  (e.g. `SELECT *`), it will **warn** rather than falsely **block** your build.
+
+## ⚡ Quick start
+
+```bash
+pip install data-contract-validator
+```
+
+```bash
+# Initialize config + CI workflow in your dbt project
+contract-validator init --interactive
+
+# Sanity-check the setup
+contract-validator test
+
+# Validate
+contract-validator validate
+```
+
+### One-off validation (no config file)
+
+```bash
+# Local dbt project against a local Pydantic models file or directory
+contract-validator validate \
+  --dbt-project ./my-dbt-project \
+  --fastapi-local ./my-api/app/models.py
+
+# dbt project against models in another GitHub repo (microservices)
+contract-validator validate \
+  --dbt-project . \
+  --fastapi-repo "my-org/my-api" \
+  --fastapi-path "app/models.py"
+```
+
+## 🔍 How extraction works (and why it's accurate)
+
+### dbt side — tiered, best-source-wins
+
+| Tier | Source | Types | Confidence | Notes |
+|---|---|---|---|---|
+| 1 | `target/catalog.json` | **Real warehouse types** | high | Produced by `dbt docs generate`. Most accurate. |
+| 2 | `sqlglot` SQL parse | Inferred (often unknown) | medium | Trusted column **names**; enriched with documented types from `manifest.json`. Detects `SELECT *`. |
+| 3 | regex parse | Guessed | low | Last resort. Never used to hard-fail a build. |
+
+The tool auto-detects what's available and degrades gracefully — so it works
+offline in pre-commit **and** with full type fidelity in a warehouse-connected
+CI job.
+
+> 💡 **Tip:** run `dbt docs generate` in CI before validating to unlock Tier 1
+> (real types). Without it, you still get accurate column-presence checks from
+> Tier 2.
+
+### FastAPI side
+
+Pydantic / SQLModel classes are parsed from source with Python's `ast` (no
+imports executed). `Optional[...]` controls whether a field is required;
+`table=True` SQLModel classes (DB tables, not API contracts) are skipped.
+
+## 🚦 What gets flagged
+
+| Severity | Meaning | Example |
+|---|---|---|
+| 🚨 **Critical** | Blocks the build | API requires a column the dbt model no longer produces |
+| ⚠️ **Warning** | Worth a look, non-blocking | A real type mismatch, or a missing column on a model we couldn't fully resolve |
+
+```bash
+$ contract-validator validate
+
+🛡️ Data Contract Validation Results:
+Status: ❌ FAILED
+Critical: 1 | Warnings: 0
+
+🚨 Critical Issues (Must Fix):
+💥 user_analytics
+Column: total_orders
+Problem: Target REQUIRES column 'total_orders' but source doesn't provide it
+🔧 Fix: Add column 'total_orders' to source model for table 'user_analytics'
+```
+
+## 🔧 Configuration (`.retl-validator.yml`)
+
+```yaml
+version: "1.0"
+name: "my-project-contracts"
+
+source:
+  dbt:
+    project_path: "."
+    auto_compile: true
+    # Force Tier 2/3 SQL parsing even if catalog/manifest exist:
+    disable_manifest: false
+
+target:
+  fastapi:
+    # GitHub repo:
+    type: "github"
+    repo: "my-org/my-api"
+    path: "app/models.py"
+    # ...or local:
+    # type: "local"
+    # path: "../my-api/app/models.py"
+
+# Optional: explicit mapping for when names don't line up by convention.
+mapping:
+  tables:
+    # target (Pydantic) table : source (dbt) model
+    user_analytics: user_analytics_summary
+  columns:
+    user_analytics:
+      # target column : source column
+      userId: user_id
+
+validation:
+  fail_on: ["missing_tables", "missing_required_columns"]
+  warn_on: ["type_mismatches", "missing_optional_columns"]
+```
+
+### When do I need `mapping`?
+
+By default, names are matched across `snake_case` / `camelCase` / casing
+(`UserAnalytics` → `user_analytics`, `userId` → `user_id`). Reach for `mapping`
+only when a model or column is named so differently that the convention can't
+bridge it (e.g. Pydantic `user_id` ↔ dbt `customer_identifier`).
+
+## 🐍 Python API
+
+```python
+from data_contract_validator import ContractValidator, DBTExtractor, FastAPIExtractor
+
+dbt = DBTExtractor(project_path="./dbt-project")
+fastapi = FastAPIExtractor.from_github_repo("my-org/my-api", "app/models.py")
+
+validator = ContractValidator(
+    source_extractor=dbt,
+    target_extractor=fastapi,
+    mapping={"tables": {"user_analytics": "user_analytics_summary"}},  # optional
+)
+result = validator.validate()
+
+if not result.success:
+    for issue in result.critical_issues:
+        print(f"💥 {issue.table}.{issue.column}: {issue.message}")
+```
+
+## 🪝 CI / pre-commit integration
+
+### GitHub Actions
+
+`contract-validator init` generates a workflow for you. Minimal version:
+
+```yaml
+name: 🛡️ Data Contract Validation
+on:
+  pull_request:
+    paths: ["models/**/*.sql", "dbt_project.yml", "**/*models*.py"]
+jobs:
+  validate-contracts:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v4
+        with: { python-version: "3.11" }
+      - run: pip install data-contract-validator
+      # Optional: `dbt docs generate` here for real warehouse types (Tier 1)
+      - run: contract-validator validate --output github
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+```
+
+### Pre-commit
+
+```bash
+contract-validator setup-precommit --install-hooks
+```
+
+```yaml
+repos:
+  - repo: https://github.com/OGsiji/data-contract-validator
+    rev: v1.1.0
+    hooks:
+      - id: contract-validation
+```
+
+## 🧪 Output formats
+
+```bash
+contract-validator validate --output terminal   # human-friendly (default)
+contract-validator validate --output json       # machine-readable for CI
+contract-validator validate --output github     # GitHub Actions annotations
+```
+
+## 🚀 Supported frameworks
+
+**Source:** dbt (all adapters — Snowflake, BigQuery, Redshift, Postgres, …).
+**Target:** FastAPI (Pydantic v2 + SQLModel).
+
+The extractor architecture is intentionally pluggable (`BaseExtractor` →
+`Dict[str, Schema]` with canonical types), so additional sources/targets can be
+added without touching the validator. [Open an issue](https://github.com/OGsiji/data-contract-validator/issues)
+to request one.
+
+## 🛠️ Development & testing
+
+```bash
+git clone https://github.com/OGsiji/data-contract-validator
+cd data-contract-validator
+
+python -m venv .venv && source .venv/bin/activate
+pip install -e ".[dev]"   # or: pip install -e ".[test]"
+
+# Run the suite
+pytest
+
+# Lint / format
+black data_contract_validator tests
+```
+
+The test suite covers the canonical type system (`tests/test_core/test_types.py`),
+the tiered dbt extractor including sqlglot CTE handling and `catalog.json`
+(`tests/test_extractors/test_dbt.py`), and the confidence/mapping behavior of
+the validator (`tests/test_core/test_validator.py`).
+
+### Adding an extractor
+
+```python
+from data_contract_validator.extractors.base import BaseExtractor
+from data_contract_validator.core.types import CanonicalType
+
+class MyExtractor(BaseExtractor):
+    def extract_schemas(self):
+        # return Dict[str, Schema]; use self._make_column(...) so each column
+        # carries a canonical_type the validator can compare.
+        ...
+```
+
+## 🗺️ Roadmap
+
+- Real compatibility semantics (nullability, additive vs. breaking changes)
+- Reporter/logging abstraction (quiet/embeddable core)
+- A canonical, language-neutral contract artifact + baseline/snapshot diffing
+- More targets (Django, SQLAlchemy, GraphQL, OpenAPI)
+
+## 📄 License
+
+MIT — see [LICENSE](LICENSE).
+
+## 🆘 Support
+
+- 🐛 Issues: https://github.com/OGsiji/data-contract-validator/issues
+- 📧 Email: ogunniransiji@gmail.com
+
+If this saves you a production incident, please ⭐ the repo.
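The name matching and `mapping` override described in the README text above ("When do I need `mapping`?") can be sketched in a few lines. This is an illustration only: `normalize_name` and `resolve_source_table` are hypothetical helpers, not functions exported by the package, and the real matcher may handle more cases.

```python
import re
from typing import Dict, Iterable, Optional


def normalize_name(name: str) -> str:
    """Fold camelCase, PascalCase and upper case into snake_case."""
    # Insert "_" at each lower-or-digit to upper boundary: userId -> user_Id
    with_breaks = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return with_breaks.replace("-", "_").lower()


def resolve_source_table(
    target_table: str,
    source_tables: Iterable[str],
    table_mapping: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Return the source table matching a target table, or None.

    An explicit mapping entry (target name -> source name) wins; otherwise
    the two names are compared after normalization.
    """
    mapping = table_mapping or {}
    wanted = mapping.get(target_table, target_table)
    by_normalized = {normalize_name(t): t for t in source_tables}
    return by_normalized.get(normalize_name(wanted))
```

For example, `resolve_source_table("UserAnalytics", ["user_analytics"])` matches by convention alone, while a target named `user_analytics` needs the mapping entry from the README's config example to reach `user_analytics_summary`.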
@@ -0,0 +1,286 @@
+# 🛡️ Data Contract Validator
+
+> **Catch breaking changes between your dbt models and your FastAPI/Pydantic APIs — before they hit production.**
+
+[](https://badge.fury.io/py/data-contract-validator)
+[](https://github.com/OGsiji/data-contract-validator/actions)
+[](https://opensource.org/licenses/MIT)
+
+## 🎯 What it solves
+
+Your analytics team changes a dbt model. Your API team's FastAPI service still
+expects the old shape. Nobody notices until production 500s at 2 AM.
+
+This tool sits on that boundary. It extracts the schema your **dbt models
+produce** and the schema your **Pydantic models expect**, compares them, and
+fails CI when the data side can no longer satisfy the API side.
+
+```
+dbt models                 Data Contract Validator            FastAPI / Pydantic
+(what the pipeline  ──▶  extract → normalize → compare  ◀──  (what the API expects)
+ produces)                           ↓
+                          critical issues block the build
+```
+
+### Built for trust
+
+A check that gates a deploy is only useful if it doesn't cry wolf. v1.1
+re-architected extraction around that principle:
+
+- **Canonical types** — dbt `varchar` and Pydantic `str` are understood to be
+  the same thing, so you don't get drowned in fake "type mismatch" warnings.
+- **A real SQL parser** (`sqlglot`) instead of regex — CTEs, `||`
+  concatenation, window functions and quoted identifiers are parsed correctly.
+- **Confidence-aware** — if the tool can't fully resolve a model's columns
+  (e.g. `SELECT *`), it will **warn** rather than falsely **block** your build.
+
+## ⚡ Quick start
+
+```bash
+pip install data-contract-validator
+```
+
+```bash
+# Initialize config + CI workflow in your dbt project
+contract-validator init --interactive
+
+# Sanity-check the setup
+contract-validator test
+
+# Validate
+contract-validator validate
+```
+
+### One-off validation (no config file)
+
+```bash
+# Local dbt project against a local Pydantic models file or directory
+contract-validator validate \
+  --dbt-project ./my-dbt-project \
+  --fastapi-local ./my-api/app/models.py
+
+# dbt project against models in another GitHub repo (microservices)
+contract-validator validate \
+  --dbt-project . \
+  --fastapi-repo "my-org/my-api" \
+  --fastapi-path "app/models.py"
+```
+
+## 🔍 How extraction works (and why it's accurate)
+
+### dbt side — tiered, best-source-wins
+
+| Tier | Source | Types | Confidence | Notes |
+|---|---|---|---|---|
+| 1 | `target/catalog.json` | **Real warehouse types** | high | Produced by `dbt docs generate`. Most accurate. |
+| 2 | `sqlglot` SQL parse | Inferred (often unknown) | medium | Trusted column **names**; enriched with documented types from `manifest.json`. Detects `SELECT *`. |
+| 3 | regex parse | Guessed | low | Last resort. Never used to hard-fail a build. |
+
+The tool auto-detects what's available and degrades gracefully — so it works
+offline in pre-commit **and** with full type fidelity in a warehouse-connected
+CI job.
+
+> 💡 **Tip:** run `dbt docs generate` in CI before validating to unlock Tier 1
+> (real types). Without it, you still get accurate column-presence checks from
+> Tier 2.
+
+### FastAPI side
+
+Pydantic / SQLModel classes are parsed from source with Python's `ast` (no
+imports executed). `Optional[...]` controls whether a field is required;
+`table=True` SQLModel classes (DB tables, not API contracts) are skipped.
+
+## 🚦 What gets flagged
+
+| Severity | Meaning | Example |
+|---|---|---|
+| 🚨 **Critical** | Blocks the build | API requires a column the dbt model no longer produces |
+| ⚠️ **Warning** | Worth a look, non-blocking | A real type mismatch, or a missing column on a model we couldn't fully resolve |
+
+```bash
+$ contract-validator validate
+
+🛡️ Data Contract Validation Results:
+Status: ❌ FAILED
+Critical: 1 | Warnings: 0
+
+🚨 Critical Issues (Must Fix):
+💥 user_analytics
+Column: total_orders
+Problem: Target REQUIRES column 'total_orders' but source doesn't provide it
+🔧 Fix: Add column 'total_orders' to source model for table 'user_analytics'
+```
+
+## 🔧 Configuration (`.retl-validator.yml`)
+
+```yaml
+version: "1.0"
+name: "my-project-contracts"
+
+source:
+  dbt:
+    project_path: "."
+    auto_compile: true
+    # Force Tier 2/3 SQL parsing even if catalog/manifest exist:
+    disable_manifest: false
+
+target:
+  fastapi:
+    # GitHub repo:
+    type: "github"
+    repo: "my-org/my-api"
+    path: "app/models.py"
+    # ...or local:
+    # type: "local"
+    # path: "../my-api/app/models.py"
+
+# Optional: explicit mapping for when names don't line up by convention.
+mapping:
+  tables:
+    # target (Pydantic) table : source (dbt) model
+    user_analytics: user_analytics_summary
+  columns:
+    user_analytics:
+      # target column : source column
+      userId: user_id
+
+validation:
+  fail_on: ["missing_tables", "missing_required_columns"]
+  warn_on: ["type_mismatches", "missing_optional_columns"]
+```
+
+### When do I need `mapping`?
+
+By default, names are matched across `snake_case` / `camelCase` / casing
+(`UserAnalytics` → `user_analytics`, `userId` → `user_id`). Reach for `mapping`
+only when a model or column is named so differently that the convention can't
+bridge it (e.g. Pydantic `user_id` ↔ dbt `customer_identifier`).
+
+## 🐍 Python API
+
+```python
+from data_contract_validator import ContractValidator, DBTExtractor, FastAPIExtractor
+
+dbt = DBTExtractor(project_path="./dbt-project")
+fastapi = FastAPIExtractor.from_github_repo("my-org/my-api", "app/models.py")
+
+validator = ContractValidator(
+    source_extractor=dbt,
+    target_extractor=fastapi,
+    mapping={"tables": {"user_analytics": "user_analytics_summary"}},  # optional
+)
+result = validator.validate()
+
+if not result.success:
+    for issue in result.critical_issues:
+        print(f"💥 {issue.table}.{issue.column}: {issue.message}")
+```
+
+## 🪝 CI / pre-commit integration
+
+### GitHub Actions
+
+`contract-validator init` generates a workflow for you. Minimal version:
+
+```yaml
+name: 🛡️ Data Contract Validation
+on:
+  pull_request:
+    paths: ["models/**/*.sql", "dbt_project.yml", "**/*models*.py"]
+jobs:
+  validate-contracts:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v4
+        with: { python-version: "3.11" }
+      - run: pip install data-contract-validator
+      # Optional: `dbt docs generate` here for real warehouse types (Tier 1)
+      - run: contract-validator validate --output github
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+```
+
+### Pre-commit
+
+```bash
+contract-validator setup-precommit --install-hooks
+```
+
+```yaml
+repos:
+  - repo: https://github.com/OGsiji/data-contract-validator
+    rev: v1.1.0
+    hooks:
+      - id: contract-validation
+```
+
+## 🧪 Output formats
+
+```bash
+contract-validator validate --output terminal   # human-friendly (default)
+contract-validator validate --output json       # machine-readable for CI
+contract-validator validate --output github     # GitHub Actions annotations
+```
+
+## 🚀 Supported frameworks
+
+**Source:** dbt (all adapters — Snowflake, BigQuery, Redshift, Postgres, …).
+**Target:** FastAPI (Pydantic v2 + SQLModel).
+
+The extractor architecture is intentionally pluggable (`BaseExtractor` →
+`Dict[str, Schema]` with canonical types), so additional sources/targets can be
+added without touching the validator. [Open an issue](https://github.com/OGsiji/data-contract-validator/issues)
+to request one.
+
+## 🛠️ Development & testing
+
+```bash
+git clone https://github.com/OGsiji/data-contract-validator
+cd data-contract-validator
+
+python -m venv .venv && source .venv/bin/activate
+pip install -e ".[dev]"   # or: pip install -e ".[test]"
+
+# Run the suite
+pytest
+
+# Lint / format
+black data_contract_validator tests
+```
+
+The test suite covers the canonical type system (`tests/test_core/test_types.py`),
+the tiered dbt extractor including sqlglot CTE handling and `catalog.json`
+(`tests/test_extractors/test_dbt.py`), and the confidence/mapping behavior of
+the validator (`tests/test_core/test_validator.py`).
+
+### Adding an extractor
+
+```python
+from data_contract_validator.extractors.base import BaseExtractor
+from data_contract_validator.core.types import CanonicalType
+
+class MyExtractor(BaseExtractor):
+    def extract_schemas(self):
+        # return Dict[str, Schema]; use self._make_column(...) so each column
+        # carries a canonical_type the validator can compare.
+        ...
+```
+
+## 🗺️ Roadmap
+
+- Real compatibility semantics (nullability, additive vs. breaking changes)
+- Reporter/logging abstraction (quiet/embeddable core)
+- A canonical, language-neutral contract artifact + baseline/snapshot diffing
+- More targets (Django, SQLAlchemy, GraphQL, OpenAPI)
+
+## 📄 License
+
+MIT — see [LICENSE](LICENSE).
+
+## 🆘 Support
+
+- 🐛 Issues: https://github.com/OGsiji/data-contract-validator/issues
+- 📧 Email: ogunniransiji@gmail.com
+
+If this saves you a production incident, please ⭐ the repo.
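The tier table in the README.md hunk above describes a best-source-wins fallback, which might look roughly like the sketch below. `pick_tier` is a hypothetical function, not part of the package; it assumes Tier 1 is available whenever `target/catalog.json` exists under the dbt project, that `disable_manifest` skips it as the README's config comment suggests, and that Tier 2 depends only on `sqlglot` being importable.

```python
import importlib.util
from pathlib import Path
from typing import Optional, Tuple, Union


def pick_tier(
    project_path: Union[str, Path],
    disable_manifest: bool = False,
    sqlglot_available: Optional[bool] = None,
) -> Tuple[str, str]:
    """Return (tier_name, confidence) for the best extraction source available."""
    catalog = Path(project_path) / "target" / "catalog.json"
    # Tier 1: real warehouse types, written by `dbt docs generate`.
    if not disable_manifest and catalog.is_file():
        return ("catalog", "high")
    # Tier 2: SQL parsing, only if sqlglot can be imported. The flag can be
    # passed explicitly so the decision is testable without installing it.
    if sqlglot_available is None:
        sqlglot_available = importlib.util.find_spec("sqlglot") is not None
    if sqlglot_available:
        return ("sqlglot", "medium")
    # Tier 3: regex, the last resort.
    return ("regex", "low")
```

Returning the confidence alongside the tier name lets a validator decide later whether a finding from that schema may block the build.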