data-contract-validator 1.0.5__tar.gz → 1.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (33)
  1. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/CHANGELOG.md +51 -1
  2. data_contract_validator-1.1.0/PKG-INFO +336 -0
  3. data_contract_validator-1.1.0/README.md +286 -0
  4. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/__init__.py +8 -5
  5. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/cli.py +28 -15
  6. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/core/models.py +20 -0
  7. data_contract_validator-1.1.0/data_contract_validator/core/types.py +291 -0
  8. data_contract_validator-1.1.0/data_contract_validator/core/validator.py +248 -0
  9. data_contract_validator-1.1.0/data_contract_validator/extractors/base.py +77 -0
  10. data_contract_validator-1.1.0/data_contract_validator/extractors/dbt.py +449 -0
  11. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/extractors/fastapi.py +35 -14
  12. data_contract_validator-1.1.0/data_contract_validator.egg-info/PKG-INFO +336 -0
  13. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator.egg-info/SOURCES.txt +1 -0
  14. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator.egg-info/requires.txt +1 -0
  15. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/pyproject.toml +3 -2
  16. data_contract_validator-1.0.5/PKG-INFO +0 -512
  17. data_contract_validator-1.0.5/README.md +0 -463
  18. data_contract_validator-1.0.5/data_contract_validator/core/validator.py +0 -187
  19. data_contract_validator-1.0.5/data_contract_validator/extractors/base.py +0 -45
  20. data_contract_validator-1.0.5/data_contract_validator/extractors/dbt.py +0 -227
  21. data_contract_validator-1.0.5/data_contract_validator.egg-info/PKG-INFO +0 -512
  22. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/LICENSE +0 -0
  23. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/MANIFEST.in +0 -0
  24. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/core/__init__.py +0 -0
  25. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/extractors/__init__.py +0 -0
  26. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/integrations/__init__.py +0 -0
  27. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/py.typed +0 -0
  28. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator/templates/github-actions-template.yml +0 -0
  29. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator.egg-info/dependency_links.txt +0 -0
  30. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator.egg-info/entry_points.txt +0 -0
  31. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/data_contract_validator.egg-info/top_level.txt +0 -0
  32. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/requirements.txt +0 -0
  33. {data_contract_validator-1.0.5 → data_contract_validator-1.1.0}/setup.cfg +0 -0
@@ -7,6 +7,55 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [1.1.0] - 2026-06-30
+
+ This release is focused on **accuracy** — making a red check always mean a real
+ problem and a green check genuinely safe, so the tool can be trusted to gate a
+ deploy.
+
+ ### Added
+ - **Canonical type system** (`core/types.py`): every extractor now normalizes
+   its native types (warehouse SQL types, Python hints) into a shared, neutral
+   vocabulary (`CanonicalType`). The validator compares canonical types instead
+   of raw strings, eliminating the bulk of false "type mismatch" warnings
+   (e.g. dbt `varchar` vs Pydantic `str` are now correctly equal).
+ - Dialect-aware normalization: Snowflake `NUMBER(38,0)`→bigint, BigQuery
+   `INT64`/`FLOAT64`, Redshift `SUPER`, Postgres `jsonb`, and more.
+ - **Tiered dbt extraction** with graceful degradation:
+   1. `catalog.json` — real warehouse types (high confidence).
+   2. `sqlglot` — a proper SQL parser. Handles CTEs, `||`, window functions, and
+      quoted identifiers that the old regex parser mangled. Detects `SELECT *`
+      and flags the schema as incomplete.
+   3. regex — last-resort best effort (low confidence, never hard-fails).
+ - **Confidence-aware validation**: when source columns can't be fully resolved
+   (e.g. `SELECT *`), a missing column is reported as a **warning, not a
+   build-blocking critical**. Type warnings are suppressed for low-confidence
+   (regex-tier) sources. This is the core false-positive guard.
+ - **Explicit mapping config** (`mapping:` in `.retl-validator.yml`) for when
+   name heuristics aren't enough — map a target table/column to a differently
+   named source model/column:
+   ```yaml
+   mapping:
+     tables:
+       user_analytics: user_analytics_summary
+     columns:
+       user_analytics:
+         userId: user_id
+   ```
+ - **Name normalization**: tables/columns now match across snake_case, camelCase
+   and casing differences (`userId` == `user_id` == `USER_ID`).
+
+ ### Changed
+ - `Schema` now carries `confidence` and `is_complete` (via `metadata`).
+ - `BaseExtractor` no longer contains Python-specific type mapping; type
+   normalization lives in the canonical type system. Added `_make_column` helper.
+ - Added `sqlglot` as a dependency (imported optionally; falls back to regex if
+   absent).
+
+ ### Fixed
+ - Hardened GitHub API rate-limit handling against non-dict response headers
+   (previously could raise when headers weren't a mapping).
+
  ## [1.0.5] - 2025-01-24
 
  ### Fixed
@@ -66,6 +115,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  - Limited type inference from SQL
  - No support for complex nested types
 
- [Unreleased]: https://github.com/OGsiji/data-contract-validator/compare/v1.0.5...HEAD
+ [Unreleased]: https://github.com/OGsiji/data-contract-validator/compare/v1.1.0...HEAD
+ [1.1.0]: https://github.com/OGsiji/data-contract-validator/releases/tag/v1.1.0
  [1.0.5]: https://github.com/OGsiji/data-contract-validator/releases/tag/v1.0.5
  [1.0.0]: https://github.com/OGsiji/data-contract-validator/releases/tag/v1.0.0
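The `core/types.py` hunk is not rendered on this page, so the canonical type system described in the changelog cannot be inspected here. As a rough illustration only, the sketch below shows one way such a normalizer could work: raw warehouse types and Python hints both map onto a shared enum, and the comparison happens on the enum rather than on raw strings. The enum members, the lookup tables and the helper names (`normalize_sql_type`, `normalize_python_hint`, `types_compatible`) are hypothetical, and unlike the changelog's `NUMBER(38,0)`→bigint mapping, this sketch collapses every integer width into a single `INTEGER` bucket.

```python
import re
from enum import Enum


class CanonicalType(str, Enum):
    """Hypothetical neutral type vocabulary (not the package's real member list)."""
    STRING = "string"
    INTEGER = "integer"  # all integer widths collapsed into one bucket here
    FLOAT = "float"
    DECIMAL = "decimal"
    BOOLEAN = "boolean"
    TIMESTAMP = "timestamp"
    DATE = "date"
    JSON = "json"
    UNKNOWN = "unknown"


# Lower-cased SQL base type (parameters stripped) -> canonical type.
_SQL_TYPES = {
    "varchar": CanonicalType.STRING, "text": CanonicalType.STRING,
    "string": CanonicalType.STRING, "character varying": CanonicalType.STRING,
    "int": CanonicalType.INTEGER, "integer": CanonicalType.INTEGER,
    "bigint": CanonicalType.INTEGER, "int64": CanonicalType.INTEGER,  # BigQuery
    "float": CanonicalType.FLOAT, "float64": CanonicalType.FLOAT,  # BigQuery
    "double precision": CanonicalType.FLOAT,
    "boolean": CanonicalType.BOOLEAN, "bool": CanonicalType.BOOLEAN,
    "timestamp": CanonicalType.TIMESTAMP, "timestamp_ntz": CanonicalType.TIMESTAMP,
    "date": CanonicalType.DATE,
    "json": CanonicalType.JSON, "jsonb": CanonicalType.JSON,  # Postgres
    "super": CanonicalType.JSON,  # Redshift semi-structured
}

# Python annotation base name -> canonical type.
_PY_TYPES = {
    "str": CanonicalType.STRING, "int": CanonicalType.INTEGER,
    "float": CanonicalType.FLOAT, "Decimal": CanonicalType.DECIMAL,
    "bool": CanonicalType.BOOLEAN, "datetime": CanonicalType.TIMESTAMP,
    "date": CanonicalType.DATE, "dict": CanonicalType.JSON, "Dict": CanonicalType.JSON,
}

_SQL_RE = re.compile(r"^\s*([a-z0-9_ ]+?)\s*(?:\((.*)\))?\s*$")
_OPTIONAL_RE = re.compile(r"^Optional\[(.+)\]$")


def normalize_sql_type(raw: str) -> CanonicalType:
    match = _SQL_RE.match(raw.lower())
    if not match:
        return CanonicalType.UNKNOWN
    base, params = match.group(1), match.group(2)
    if base in ("number", "numeric", "decimal"):
        if params:
            args = [a.strip() for a in params.split(",")]
            scale = args[1] if len(args) > 1 else "0"  # NUMBER(p) means scale 0
        else:
            scale = "0" if base == "number" else None  # Snowflake NUMBER = NUMBER(38,0)
        return CanonicalType.INTEGER if scale == "0" else CanonicalType.DECIMAL
    return _SQL_TYPES.get(base, CanonicalType.UNKNOWN)


def normalize_python_hint(hint: str) -> CanonicalType:
    hint = hint.strip()
    optional = _OPTIONAL_RE.match(hint)
    if optional:  # Optional[...] affects requiredness, not the value type
        hint = optional.group(1).strip()
    base = hint.split("[", 1)[0]  # Dict[str, Any] -> Dict
    return _PY_TYPES.get(base, CanonicalType.UNKNOWN)


def types_compatible(source: CanonicalType, target: CanonicalType) -> bool:
    # An unresolved type on either side is treated as "no evidence of a mismatch".
    if CanonicalType.UNKNOWN in (source, target):
        return True
    return source == target
```

Treating `UNKNOWN` as compatible serves the same goal as the release: a type the tool could not resolve should not, on its own, produce a warning.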
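The confidence-aware rule in the changelog (a missing column on an incompletely resolved model is downgraded to a warning, and regex-tier sources get no type warnings) can be sketched as a small decision function. `SourceSchema`, `Confidence` and the string severities are hypothetical stand-ins rather than the package's `Schema` or validator API. The `exit_code` helper only assumes that critical issues are what fail the CI step, as the README's severity table states; it is not taken from `cli.py`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Set


class Confidence(str, Enum):
    HIGH = "high"      # Tier 1: catalog.json
    MEDIUM = "medium"  # Tier 2: sqlglot parse
    LOW = "low"        # Tier 3: regex fallback


@dataclass
class SourceSchema:
    """Hypothetical stand-in for the package's Schema (field names assumed)."""
    columns: Set[str]
    confidence: Confidence
    is_complete: bool  # False when, e.g., a SELECT * could not be expanded


def missing_column_severity(schema: SourceSchema, column: str, required: bool) -> Optional[str]:
    """Return None if the column exists, otherwise "critical" or "warning"."""
    if column in schema.columns:
        return None
    if not required:
        return "warning"  # missing optional column: worth a look, non-blocking
    if not schema.is_complete or schema.confidence is Confidence.LOW:
        return "warning"  # absence cannot be proven, so do not block the build
    return "critical"


def report_type_mismatches(schema: SourceSchema) -> bool:
    # Type warnings are suppressed for low-confidence (regex-tier) sources.
    return schema.confidence is not Confidence.LOW


def exit_code(severities: Iterable[Optional[str]]) -> int:
    # Only critical issues fail the CI step; warnings alone exit 0.
    return 1 if "critical" in severities else 0
```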
@@ -0,0 +1,336 @@
+ Metadata-Version: 2.4
+ Name: data-contract-validator
+ Version: 1.1.0
+ Summary: Validate data contracts between dbt models and FastAPI/Pydantic APIs with accurate, low-false-positive schema checks
+ Author-email: Ogunniran Siji <ogunniransiji@gmail.com>
+ Maintainer-email: Ogunniran Siji <ogunniransiji@gmail.com>
+ License: MIT
+ Project-URL: Homepage, https://github.com/OGsiji/data-contract-validator
+ Project-URL: Documentation, https://github.com/OGsiji/data-contract-validator/blob/main/README.md
+ Project-URL: Repository, https://github.com/OGsiji/data-contract-validator
+ Project-URL: Bug Reports, https://github.com/OGsiji/data-contract-validator/issues
+ Project-URL: Changelog, https://github.com/OGsiji/data-contract-validator/blob/main/CHANGELOG.md
+ Keywords: dbt,fastapi,contract-testing,api-validation,data-engineering,schema-validation,ci-cd,devops
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Software Development :: Quality Assurance
+ Classifier: Topic :: Software Development :: Testing
+ Classifier: Topic :: Database
+ Classifier: Topic :: Internet :: WWW/HTTP :: HTTP Servers
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: pydantic>=2.0.0
+ Requires-Dist: PyYAML>=6.0
+ Requires-Dist: requests>=2.25.0
+ Requires-Dist: click>=8.0.0
+ Requires-Dist: sqlglot>=20.0.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
+ Requires-Dist: black>=22.0.0; extra == "dev"
+ Requires-Dist: flake8>=4.0.0; extra == "dev"
+ Requires-Dist: mypy>=0.991; extra == "dev"
+ Requires-Dist: pre-commit>=2.20.0; extra == "dev"
+ Requires-Dist: build>=0.8.0; extra == "dev"
+ Requires-Dist: twine>=4.0.0; extra == "dev"
+ Provides-Extra: test
+ Requires-Dist: pytest>=7.0.0; extra == "test"
+ Requires-Dist: pytest-cov>=4.0.0; extra == "test"
+ Requires-Dist: pytest-mock>=3.8.0; extra == "test"
+ Dynamic: license-file
+
+ # 🛡️ Data Contract Validator
+
+ > **Catch breaking changes between your dbt models and your FastAPI/Pydantic APIs — before they hit production.**
+
+ [![PyPI version](https://badge.fury.io/py/data-contract-validator.svg)](https://badge.fury.io/py/data-contract-validator)
+ [![Tests](https://github.com/OGsiji/data-contract-validator/workflows/Tests/badge.svg)](https://github.com/OGsiji/data-contract-validator/actions)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+ ## 🎯 What it solves
+
+ Your analytics team changes a dbt model. Your API team's FastAPI service still
+ expects the old shape. Nobody notices until production 500s at 2 AM.
+
+ This tool sits on that boundary. It extracts the schema your **dbt models
+ produce** and the schema your **Pydantic models expect**, compares them, and
+ fails CI when the data side can no longer satisfy the API side.
+
+ ```
+  dbt models                     Data Contract Validator             FastAPI / Pydantic
+  (what the pipeline   ──▶    extract → normalize → compare    ◀──   (what the API expects)
+   produces)                                ↓
+                             critical issues block the build
+ ```
+
+ ### Built for trust
+
+ A check that gates a deploy is only useful if it doesn't cry wolf. v1.1
+ re-architected extraction around that principle:
+
+ - **Canonical types** — dbt `varchar` and Pydantic `str` are understood to be
+   the same thing, so you don't get drowned in fake "type mismatch" warnings.
+ - **A real SQL parser** (`sqlglot`) instead of regex — CTEs, `||`
+   concatenation, window functions and quoted identifiers are parsed correctly.
+ - **Confidence-aware** — if the tool can't fully resolve a model's columns
+   (e.g. `SELECT *`), it will **warn** rather than falsely **block** your build.
+
+ ## ⚡ Quick start
+
+ ```bash
+ pip install data-contract-validator
+ ```
+
+ ```bash
+ # Initialize config + CI workflow in your dbt project
+ contract-validator init --interactive
+
+ # Sanity-check the setup
+ contract-validator test
+
+ # Validate
+ contract-validator validate
+ ```
+
+ ### One-off validation (no config file)
+
+ ```bash
+ # Local dbt project against a local Pydantic models file or directory
+ contract-validator validate \
+   --dbt-project ./my-dbt-project \
+   --fastapi-local ./my-api/app/models.py
+
+ # dbt project against models in another GitHub repo (microservices)
+ contract-validator validate \
+   --dbt-project . \
+   --fastapi-repo "my-org/my-api" \
+   --fastapi-path "app/models.py"
+ ```
+
+ ## 🔍 How extraction works (and why it's accurate)
+
+ ### dbt side — tiered, best-source-wins
+
+ | Tier | Source | Types | Confidence | Notes |
+ |---|---|---|---|---|
+ | 1 | `target/catalog.json` | **Real warehouse types** | high | Produced by `dbt docs generate`. Most accurate. |
+ | 2 | `sqlglot` SQL parse | Inferred (often unknown) | medium | Trusted column **names**; enriched with documented types from `manifest.json`. Detects `SELECT *`. |
+ | 3 | regex parse | Guessed | low | Last resort. Never used to hard-fail a build. |
+
+ The tool auto-detects what's available and degrades gracefully — so it works
+ offline in pre-commit **and** with full type fidelity in a warehouse-connected
+ CI job.
+
+ > 💡 **Tip:** run `dbt docs generate` in CI before validating to unlock Tier 1
+ > (real types). Without it, you still get accurate column-presence checks from
+ > Tier 2.
+
+ ### FastAPI side
+
+ Pydantic / SQLModel classes are parsed from source with Python's `ast` (no
+ imports executed). `Optional[...]` controls whether a field is required;
+ `table=True` SQLModel classes (DB tables, not API contracts) are skipped.
+
+ ## 🚦 What gets flagged
+
+ | Severity | Meaning | Example |
+ |---|---|---|
+ | 🚨 **Critical** | Blocks the build | API requires a column the dbt model no longer produces |
+ | ⚠️ **Warning** | Worth a look, non-blocking | A real type mismatch, or a missing column on a model we couldn't fully resolve |
+
+ ```bash
+ $ contract-validator validate
+
+ 🛡️ Data Contract Validation Results:
+ Status: ❌ FAILED
+ Critical: 1 | Warnings: 0
+
+ 🚨 Critical Issues (Must Fix):
+ 💥 user_analytics
+ Column: total_orders
+ Problem: Target REQUIRES column 'total_orders' but source doesn't provide it
+ 🔧 Fix: Add column 'total_orders' to source model for table 'user_analytics'
+ ```
+
+ ## 🔧 Configuration (`.retl-validator.yml`)
+
+ ```yaml
+ version: "1.0"
+ name: "my-project-contracts"
+
+ source:
+   dbt:
+     project_path: "."
+     auto_compile: true
+     # Force Tier 2/3 SQL parsing even if catalog/manifest exist:
+     disable_manifest: false
+
+ target:
+   fastapi:
+     # GitHub repo:
+     type: "github"
+     repo: "my-org/my-api"
+     path: "app/models.py"
+     # ...or local:
+     # type: "local"
+     # path: "../my-api/app/models.py"
+
+ # Optional: explicit mapping for when names don't line up by convention.
+ mapping:
+   tables:
+     # target (Pydantic) table : source (dbt) model
+     user_analytics: user_analytics_summary
+   columns:
+     user_analytics:
+       # target column : source column
+       userId: user_id
+
+ validation:
+   fail_on: ["missing_tables", "missing_required_columns"]
+   warn_on: ["type_mismatches", "missing_optional_columns"]
+ ```
+
+ ### When do I need `mapping`?
+
+ By default, names are matched across `snake_case` / `camelCase` / casing
+ (`UserAnalytics` → `user_analytics`, `userId` → `user_id`). Reach for `mapping`
+ only when a model or column is named so differently that the convention can't
+ bridge it (e.g. Pydantic `user_id` ↔ dbt `customer_identifier`).
+
+ ## 🐍 Python API
+
+ ```python
+ from data_contract_validator import ContractValidator, DBTExtractor, FastAPIExtractor
+
+ dbt = DBTExtractor(project_path="./dbt-project")
+ fastapi = FastAPIExtractor.from_github_repo("my-org/my-api", "app/models.py")
+
+ validator = ContractValidator(
+     source_extractor=dbt,
+     target_extractor=fastapi,
+     mapping={"tables": {"user_analytics": "user_analytics_summary"}},  # optional
+ )
+ result = validator.validate()
+
+ if not result.success:
+     for issue in result.critical_issues:
+         print(f"💥 {issue.table}.{issue.column}: {issue.message}")
+ ```
+
+ ## 🪝 CI / pre-commit integration
+
+ ### GitHub Actions
+
+ `contract-validator init` generates a workflow for you. Minimal version:
+
+ ```yaml
+ name: 🛡️ Data Contract Validation
+ on:
+   pull_request:
+     paths: ["models/**/*.sql", "dbt_project.yml", "**/*models*.py"]
+ jobs:
+   validate-contracts:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - uses: actions/setup-python@v4
+         with: { python-version: "3.11" }
+       - run: pip install data-contract-validator
+       # Optional: `dbt docs generate` here for real warehouse types (Tier 1)
+       - run: contract-validator validate --output github
+         env:
+           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+ ```
+
+ ### Pre-commit
+
+ ```bash
+ contract-validator setup-precommit --install-hooks
+ ```
+
+ ```yaml
+ repos:
+   - repo: https://github.com/OGsiji/data-contract-validator
+     rev: v1.1.0
+     hooks:
+       - id: contract-validation
+ ```
+
+ ## 🧪 Output formats
+
+ ```bash
+ contract-validator validate --output terminal  # human-friendly (default)
+ contract-validator validate --output json      # machine-readable for CI
+ contract-validator validate --output github    # GitHub Actions annotations
+ ```
+
+ ## 🚀 Supported frameworks
+
+ **Source:** dbt (all adapters — Snowflake, BigQuery, Redshift, Postgres, …).
+ **Target:** FastAPI (Pydantic v2 + SQLModel).
+
+ The extractor architecture is intentionally pluggable (`BaseExtractor` →
+ `Dict[str, Schema]` with canonical types), so additional sources/targets can be
+ added without touching the validator. [Open an issue](https://github.com/OGsiji/data-contract-validator/issues)
+ to request one.
+
+ ## 🛠️ Development & testing
+
+ ```bash
+ git clone https://github.com/OGsiji/data-contract-validator
+ cd data-contract-validator
+
+ python -m venv .venv && source .venv/bin/activate
+ pip install -e ".[dev]"  # or: pip install -e ".[test]"
+
+ # Run the suite
+ pytest
+
+ # Lint / format
+ black data_contract_validator tests
+ ```
+
+ The test suite covers the canonical type system (`tests/test_core/test_types.py`),
+ the tiered dbt extractor including sqlglot CTE handling and `catalog.json`
+ (`tests/test_extractors/test_dbt.py`), and the confidence/mapping behavior of
+ the validator (`tests/test_core/test_validator.py`).
+
+ ### Adding an extractor
+
+ ```python
+ from data_contract_validator.extractors.base import BaseExtractor
+ from data_contract_validator.core.types import CanonicalType
+
+ class MyExtractor(BaseExtractor):
+     def extract_schemas(self):
+         # return Dict[str, Schema]; use self._make_column(...) so each column
+         # carries a canonical_type the validator can compare.
+         ...
+ ```
+
+ ## 🗺️ Roadmap
+
+ - Real compatibility semantics (nullability, additive vs. breaking changes)
+ - Reporter/logging abstraction (quiet/embeddable core)
+ - A canonical, language-neutral contract artifact + baseline/snapshot diffing
+ - More targets (Django, SQLAlchemy, GraphQL, OpenAPI)
+
+ ## 📄 License
+
+ MIT — see [LICENSE](LICENSE).
+
+ ## 🆘 Support
+
+ - 🐛 Issues: https://github.com/OGsiji/data-contract-validator/issues
+ - 📧 Email: ogunniransiji@gmail.com
+
+ If this saves you a production incident, please ⭐ the repo.
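The `extractors/dbt.py` hunk is not rendered on this page either. The tiered, best-source-wins lookup described in the README could look roughly like the sketch below. It assumes dbt's `target/catalog.json` layout: a `nodes` object keyed by unique IDs such as `model.<project>.<name>`, each node carrying `metadata.name` and a `columns` map whose entries have `name` and `type`. `pick_tier`, `catalog_path` and `load_catalog_columns` are hypothetical helpers, not the package's `DBTExtractor`; the sqlglot and regex tiers are only selected here, not implemented.

```python
import importlib.util
import json
from pathlib import Path
from typing import Dict, Optional


def catalog_path(project: Path) -> Path:
    # Written by `dbt docs generate`.
    return project / "target" / "catalog.json"


def pick_tier(project: Path, sqlglot_available: Optional[bool] = None) -> str:
    """Best-source-wins: catalog.json, then sqlglot, then regex."""
    if catalog_path(project).is_file():
        return "catalog"  # high confidence: real warehouse types
    if sqlglot_available is None:
        sqlglot_available = importlib.util.find_spec("sqlglot") is not None
    return "sqlglot" if sqlglot_available else "regex"


def load_catalog_columns(project: Path) -> Optional[Dict[str, Dict[str, str]]]:
    """Tier 1: model name -> {column name: warehouse type}, or None if absent."""
    path = catalog_path(project)
    if not path.is_file():
        return None
    catalog = json.loads(path.read_text(encoding="utf-8"))
    models: Dict[str, Dict[str, str]] = {}
    for unique_id, node in catalog.get("nodes", {}).items():
        if not unique_id.startswith("model."):
            continue  # seeds, snapshots, etc. are not treated as API sources here
        name = node["metadata"]["name"].lower()
        models[name] = {
            column["name"].lower(): column["type"]
            for column in node.get("columns", {}).values()
        }
    return models
```

Lower-casing names here is a simplification so that Snowflake's upper-case identifiers line up with lower-case model names; the raw warehouse type string is kept as-is so a separate normalizer can interpret it.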
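The README says Pydantic / SQLModel classes are read with Python's `ast` module without importing them. A minimal sketch of that idea follows, assuming that direct subclasses of `BaseModel` or `SQLModel` are the contracts and that `Optional[...]` or `X | None` marks a field as not required. That is the README's rule; Pydantic v2 itself treats a field as required whenever it has no default. `extract_models` and its return shape are hypothetical, and classes that inherit from other user-defined models are not followed.

```python
import ast
from typing import Dict, Tuple

CONTRACT_BASES = {"BaseModel", "SQLModel"}


def _name_of(node: ast.expr) -> str:
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):  # e.g. pydantic.BaseModel
        return node.attr
    return ""


def _is_table_model(cls: ast.ClassDef) -> bool:
    # class Row(SQLModel, table=True) describes a DB table, not an API contract.
    return any(
        kw.arg == "table" and isinstance(kw.value, ast.Constant) and kw.value.value is True
        for kw in cls.keywords
    )


def _is_optional(annotation: ast.expr) -> bool:
    if isinstance(annotation, ast.Subscript):  # Optional[int]
        return _name_of(annotation.value) == "Optional"
    if isinstance(annotation, ast.BinOp) and isinstance(annotation.op, ast.BitOr):  # int | None
        return any(
            isinstance(side, ast.Constant) and side.value is None
            for side in (annotation.left, annotation.right)
        )
    return False


def extract_models(source: str) -> Dict[str, Dict[str, Tuple[str, bool]]]:
    """Class name -> {field: (annotation text, required)}; nothing is imported."""
    models: Dict[str, Dict[str, Tuple[str, bool]]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef) or _is_table_model(node):
            continue
        if not any(_name_of(base) in CONTRACT_BASES for base in node.bases):
            continue
        fields: Dict[str, Tuple[str, bool]] = {}
        for stmt in node.body:
            if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
                text = ast.get_source_segment(source, stmt.annotation) or ""
                fields[stmt.target.id] = (text, not _is_optional(stmt.annotation))
        models[node.name] = fields
    return models
```

`ast.get_source_segment` is available from Python 3.8, matching the package's `Requires-Python: >=3.8`.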
@@ -0,0 +1,286 @@
+ # 🛡️ Data Contract Validator
+
+ > **Catch breaking changes between your dbt models and your FastAPI/Pydantic APIs — before they hit production.**
+
+ [![PyPI version](https://badge.fury.io/py/data-contract-validator.svg)](https://badge.fury.io/py/data-contract-validator)
+ [![Tests](https://github.com/OGsiji/data-contract-validator/workflows/Tests/badge.svg)](https://github.com/OGsiji/data-contract-validator/actions)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+ ## 🎯 What it solves
+
+ Your analytics team changes a dbt model. Your API team's FastAPI service still
+ expects the old shape. Nobody notices until production 500s at 2 AM.
+
+ This tool sits on that boundary. It extracts the schema your **dbt models
+ produce** and the schema your **Pydantic models expect**, compares them, and
+ fails CI when the data side can no longer satisfy the API side.
+
+ ```
+  dbt models                     Data Contract Validator             FastAPI / Pydantic
+  (what the pipeline   ──▶    extract → normalize → compare    ◀──   (what the API expects)
+   produces)                                ↓
+                             critical issues block the build
+ ```
+
+ ### Built for trust
+
+ A check that gates a deploy is only useful if it doesn't cry wolf. v1.1
+ re-architected extraction around that principle:
+
+ - **Canonical types** — dbt `varchar` and Pydantic `str` are understood to be
+   the same thing, so you don't get drowned in fake "type mismatch" warnings.
+ - **A real SQL parser** (`sqlglot`) instead of regex — CTEs, `||`
+   concatenation, window functions and quoted identifiers are parsed correctly.
+ - **Confidence-aware** — if the tool can't fully resolve a model's columns
+   (e.g. `SELECT *`), it will **warn** rather than falsely **block** your build.
+
+ ## ⚡ Quick start
+
+ ```bash
+ pip install data-contract-validator
+ ```
+
+ ```bash
+ # Initialize config + CI workflow in your dbt project
+ contract-validator init --interactive
+
+ # Sanity-check the setup
+ contract-validator test
+
+ # Validate
+ contract-validator validate
+ ```
+
+ ### One-off validation (no config file)
+
+ ```bash
+ # Local dbt project against a local Pydantic models file or directory
+ contract-validator validate \
+   --dbt-project ./my-dbt-project \
+   --fastapi-local ./my-api/app/models.py
+
+ # dbt project against models in another GitHub repo (microservices)
+ contract-validator validate \
+   --dbt-project . \
+   --fastapi-repo "my-org/my-api" \
+   --fastapi-path "app/models.py"
+ ```
+
+ ## 🔍 How extraction works (and why it's accurate)
+
+ ### dbt side — tiered, best-source-wins
+
+ | Tier | Source | Types | Confidence | Notes |
+ |---|---|---|---|---|
+ | 1 | `target/catalog.json` | **Real warehouse types** | high | Produced by `dbt docs generate`. Most accurate. |
+ | 2 | `sqlglot` SQL parse | Inferred (often unknown) | medium | Trusted column **names**; enriched with documented types from `manifest.json`. Detects `SELECT *`. |
+ | 3 | regex parse | Guessed | low | Last resort. Never used to hard-fail a build. |
+
+ The tool auto-detects what's available and degrades gracefully — so it works
+ offline in pre-commit **and** with full type fidelity in a warehouse-connected
+ CI job.
+
+ > 💡 **Tip:** run `dbt docs generate` in CI before validating to unlock Tier 1
+ > (real types). Without it, you still get accurate column-presence checks from
+ > Tier 2.
+
+ ### FastAPI side
+
+ Pydantic / SQLModel classes are parsed from source with Python's `ast` (no
+ imports executed). `Optional[...]` controls whether a field is required;
+ `table=True` SQLModel classes (DB tables, not API contracts) are skipped.
+
+ ## 🚦 What gets flagged
+
+ | Severity | Meaning | Example |
+ |---|---|---|
+ | 🚨 **Critical** | Blocks the build | API requires a column the dbt model no longer produces |
+ | ⚠️ **Warning** | Worth a look, non-blocking | A real type mismatch, or a missing column on a model we couldn't fully resolve |
+
+ ```bash
+ $ contract-validator validate
+
+ 🛡️ Data Contract Validation Results:
+ Status: ❌ FAILED
+ Critical: 1 | Warnings: 0
+
+ 🚨 Critical Issues (Must Fix):
+ 💥 user_analytics
+ Column: total_orders
+ Problem: Target REQUIRES column 'total_orders' but source doesn't provide it
+ 🔧 Fix: Add column 'total_orders' to source model for table 'user_analytics'
+ ```
+
+ ## 🔧 Configuration (`.retl-validator.yml`)
+
+ ```yaml
+ version: "1.0"
+ name: "my-project-contracts"
+
+ source:
+   dbt:
+     project_path: "."
+     auto_compile: true
+     # Force Tier 2/3 SQL parsing even if catalog/manifest exist:
+     disable_manifest: false
+
+ target:
+   fastapi:
+     # GitHub repo:
+     type: "github"
+     repo: "my-org/my-api"
+     path: "app/models.py"
+     # ...or local:
+     # type: "local"
+     # path: "../my-api/app/models.py"
+
+ # Optional: explicit mapping for when names don't line up by convention.
+ mapping:
+   tables:
+     # target (Pydantic) table : source (dbt) model
+     user_analytics: user_analytics_summary
+   columns:
+     user_analytics:
+       # target column : source column
+       userId: user_id
+
+ validation:
+   fail_on: ["missing_tables", "missing_required_columns"]
+   warn_on: ["type_mismatches", "missing_optional_columns"]
+ ```
+
+ ### When do I need `mapping`?
+
+ By default, names are matched across `snake_case` / `camelCase` / casing
+ (`UserAnalytics` → `user_analytics`, `userId` → `user_id`). Reach for `mapping`
+ only when a model or column is named so differently that the convention can't
+ bridge it (e.g. Pydantic `user_id` ↔ dbt `customer_identifier`).
+
+ ## 🐍 Python API
+
+ ```python
+ from data_contract_validator import ContractValidator, DBTExtractor, FastAPIExtractor
+
+ dbt = DBTExtractor(project_path="./dbt-project")
+ fastapi = FastAPIExtractor.from_github_repo("my-org/my-api", "app/models.py")
+
+ validator = ContractValidator(
+     source_extractor=dbt,
+     target_extractor=fastapi,
+     mapping={"tables": {"user_analytics": "user_analytics_summary"}},  # optional
+ )
+ result = validator.validate()
+
+ if not result.success:
+     for issue in result.critical_issues:
+         print(f"💥 {issue.table}.{issue.column}: {issue.message}")
+ ```
+
+ ## 🪝 CI / pre-commit integration
+
+ ### GitHub Actions
+
+ `contract-validator init` generates a workflow for you. Minimal version:
+
+ ```yaml
+ name: 🛡️ Data Contract Validation
+ on:
+   pull_request:
+     paths: ["models/**/*.sql", "dbt_project.yml", "**/*models*.py"]
+ jobs:
+   validate-contracts:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - uses: actions/setup-python@v4
+         with: { python-version: "3.11" }
+       - run: pip install data-contract-validator
+       # Optional: `dbt docs generate` here for real warehouse types (Tier 1)
+       - run: contract-validator validate --output github
+         env:
+           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+ ```
+
+ ### Pre-commit
+
+ ```bash
+ contract-validator setup-precommit --install-hooks
+ ```
+
+ ```yaml
+ repos:
+   - repo: https://github.com/OGsiji/data-contract-validator
+     rev: v1.1.0
+     hooks:
+       - id: contract-validation
+ ```
+
+ ## 🧪 Output formats
+
+ ```bash
+ contract-validator validate --output terminal  # human-friendly (default)
+ contract-validator validate --output json      # machine-readable for CI
+ contract-validator validate --output github    # GitHub Actions annotations
+ ```
+
+ ## 🚀 Supported frameworks
+
+ **Source:** dbt (all adapters — Snowflake, BigQuery, Redshift, Postgres, …).
+ **Target:** FastAPI (Pydantic v2 + SQLModel).
+
+ The extractor architecture is intentionally pluggable (`BaseExtractor` →
+ `Dict[str, Schema]` with canonical types), so additional sources/targets can be
+ added without touching the validator. [Open an issue](https://github.com/OGsiji/data-contract-validator/issues)
+ to request one.
235
+
236
+ ## 🛠️ Development & testing
237
+
238
+ ```bash
239
+ git clone https://github.com/OGsiji/data-contract-validator
240
+ cd data-contract-validator
241
+
242
+ python -m venv .venv && source .venv/bin/activate
243
+ pip install -e ".[dev]" # or: pip install -e ".[test]"
244
+
245
+ # Run the suite
246
+ pytest
247
+
248
+ # Lint / format
249
+ black data_contract_validator tests
250
+ ```
251
+
252
+ The test suite covers the canonical type system (`tests/test_core/test_types.py`),
253
+ the tiered dbt extractor including sqlglot CTE handling and `catalog.json`
254
+ (`tests/test_extractors/test_dbt.py`), and the confidence/mapping behavior of
255
+ the validator (`tests/test_core/test_validator.py`).
256
+
257
+ ### Adding an extractor
258
+
259
+ ```python
260
+ from data_contract_validator.extractors.base import BaseExtractor
261
+ from data_contract_validator.core.types import CanonicalType
262
+
263
+ class MyExtractor(BaseExtractor):
264
+ def extract_schemas(self):
265
+ # return Dict[str, Schema]; use self._make_column(...) so each column
266
+ # carries a canonical_type the validator can compare.
267
+ ...
268
+ ```
269
+
270
+ ## 🗺️ Roadmap
271
+
272
+ - Real compatibility semantics (nullability, additive vs. breaking changes)
273
+ - Reporter/logging abstraction (quiet/embeddable core)
274
+ - A canonical, language-neutral contract artifact + baseline/snapshot diffing
275
+ - More targets (Django, SQLAlchemy, GraphQL, OpenAPI)
276
+
277
+ ## 📄 License
278
+
279
+ MIT — see [LICENSE](LICENSE).
280
+
281
+ ## 🆘 Support
282
+
283
+ - 🐛 Issues: https://github.com/OGsiji/data-contract-validator/issues
284
+ - 📧 Email: ogunniransiji@gmail.com
285
+
286
+ If this saves you a production incident, please ⭐ the repo.
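Name matching and the explicit `mapping:` override described in the README can be combined roughly as below: consult the mapping first, then fall back to snake_case/camelCase/casing normalization on both sides. `normalize_name`, `resolve_source_table` and `resolve_source_column` are hypothetical helpers, not the package's validator code. The mapping dict has the same shape as the `mapping:` block in `.retl-validator.yml`, and the `customerRef` → `customer_identifier` pair is an invented example.

```python
import re
from typing import Iterable, Mapping, Optional

# Zero-width position between a lowercase letter/digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def normalize_name(name: str) -> str:
    """userId, UserId, user_id and USER_ID all become user_id."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def _normalized_keys(table: Mapping) -> dict:
    return {normalize_name(key): value for key, value in table.items()}


def _find(wanted: str, candidates: Iterable[str]) -> Optional[str]:
    # Return the candidate's original spelling so reports match the source.
    by_norm = {normalize_name(candidate): candidate for candidate in candidates}
    return by_norm.get(normalize_name(wanted))


def resolve_source_table(target_table: str, source_tables: Iterable[str],
                         mapping: Mapping) -> Optional[str]:
    """Explicit `mapping.tables` entry first, then name normalization."""
    explicit = _normalized_keys(mapping.get("tables", {})).get(normalize_name(target_table))
    return _find(explicit or target_table, source_tables)


def resolve_source_column(target_table: str, target_column: str,
                          source_columns: Iterable[str], mapping: Mapping) -> Optional[str]:
    """Explicit `mapping.columns.<table>` entry first, then name normalization."""
    per_table = _normalized_keys(mapping.get("columns", {})).get(normalize_name(target_table), {})
    explicit = _normalized_keys(per_table).get(normalize_name(target_column))
    return _find(explicit or target_column, source_columns)
```

Acronym-heavy names such as `HTTPStatus` collapse to `httpstatus` with this regex rather than `http_status`; that is exactly the kind of case where an explicit mapping entry is the safer fix.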