data-contract-validator 1.0.5__tar.gz → 1.1.1__tar.gz

Files changed (33)
  1. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/CHANGELOG.md +62 -1
  2. data_contract_validator-1.1.1/PKG-INFO +339 -0
  3. data_contract_validator-1.1.1/README.md +289 -0
  4. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/data_contract_validator/__init__.py +8 -5
  5. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/data_contract_validator/cli.py +28 -15
  6. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/data_contract_validator/core/models.py +20 -0
  7. data_contract_validator-1.1.1/data_contract_validator/core/types.py +347 -0
  8. data_contract_validator-1.1.1/data_contract_validator/core/validator.py +254 -0
  9. data_contract_validator-1.1.1/data_contract_validator/extractors/base.py +77 -0
  10. data_contract_validator-1.1.1/data_contract_validator/extractors/dbt.py +449 -0
  11. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/data_contract_validator/extractors/fastapi.py +35 -14
  12. data_contract_validator-1.1.1/data_contract_validator.egg-info/PKG-INFO +339 -0
  13. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/data_contract_validator.egg-info/SOURCES.txt +1 -0
  14. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/data_contract_validator.egg-info/requires.txt +1 -0
  15. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/pyproject.toml +3 -2
  16. data_contract_validator-1.0.5/PKG-INFO +0 -512
  17. data_contract_validator-1.0.5/README.md +0 -463
  18. data_contract_validator-1.0.5/data_contract_validator/core/validator.py +0 -187
  19. data_contract_validator-1.0.5/data_contract_validator/extractors/base.py +0 -45
  20. data_contract_validator-1.0.5/data_contract_validator/extractors/dbt.py +0 -227
  21. data_contract_validator-1.0.5/data_contract_validator.egg-info/PKG-INFO +0 -512
  22. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/LICENSE +0 -0
  23. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/MANIFEST.in +0 -0
  24. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/data_contract_validator/core/__init__.py +0 -0
  25. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/data_contract_validator/extractors/__init__.py +0 -0
  26. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/data_contract_validator/integrations/__init__.py +0 -0
  27. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/data_contract_validator/py.typed +0 -0
  28. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/data_contract_validator/templates/github-actions-template.yml +0 -0
  29. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/data_contract_validator.egg-info/dependency_links.txt +0 -0
  30. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/data_contract_validator.egg-info/entry_points.txt +0 -0
  31. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/data_contract_validator.egg-info/top_level.txt +0 -0
  32. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/requirements.txt +0 -0
  33. {data_contract_validator-1.0.5 → data_contract_validator-1.1.1}/setup.cfg +0 -0
@@ -7,6 +7,65 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [1.1.1] - 2026-06-30
+
+ ### Added
+ - **Automatic plural/singular table & column matching.** dbt models are
+   conventionally plural (`users`) while Pydantic classes are singular
+   (`User` → `user`); these now match automatically with no `mapping` needed.
+   Candidate forms are only matched against names that actually exist on the
+   other side, so it never over-strips (`address` is never mistaken for
+   `addres`). Explicit `mapping` still takes precedence.
+
+ ## [1.1.0] - 2026-06-30
+
+ This release is focused on **accuracy** — making a red check always mean a real
+ problem and a green check genuinely safe, so the tool can be trusted to gate a
+ deploy.
+
+ ### Added
+ - **Canonical type system** (`core/types.py`): every extractor now normalizes
+   its native types (warehouse SQL types, Python hints) into a shared, neutral
+   vocabulary (`CanonicalType`). The validator compares canonical types instead
+   of raw strings, eliminating the bulk of false "type mismatch" warnings
+   (e.g. dbt `varchar` vs Pydantic `str` are now correctly equal).
+   - Dialect-aware normalization: Snowflake `NUMBER(38,0)`→bigint, BigQuery
+     `INT64`/`FLOAT64`, Redshift `SUPER`, Postgres `jsonb`, and more.
+ - **Tiered dbt extraction** with graceful degradation:
+   1. `catalog.json` — real warehouse types (high confidence).
+   2. `sqlglot` — a proper SQL parser. Handles CTEs, `||`, window functions, and
+      quoted identifiers that the old regex parser mangled. Detects `SELECT *`
+      and flags the schema as incomplete.
+   3. regex — last-resort best effort (low confidence, never hard-fails).
+ - **Confidence-aware validation**: when source columns can't be fully resolved
+   (e.g. `SELECT *`), a missing column is reported as a **warning, not a
+   build-blocking critical**. Type warnings are suppressed for low-confidence
+   (regex-tier) sources. This is the core false-positive guard.
+ - **Explicit mapping config** (`mapping:` in `.retl-validator.yml`) for when
+   name heuristics aren't enough — map a target table/column to a differently
+   named source model/column:
+   ```yaml
+   mapping:
+     tables:
+       user_analytics: user_analytics_summary
+     columns:
+       user_analytics:
+         userId: user_id
+   ```
+ - **Name normalization**: tables/columns now match across snake_case, camelCase
+   and casing differences (`userId` == `user_id` == `USER_ID`).
+
+ ### Changed
+ - `Schema` now carries `confidence` and `is_complete` (via `metadata`).
+ - `BaseExtractor` no longer contains Python-specific type mapping; type
+   normalization lives in the canonical type system. Added `_make_column` helper.
+ - Added `sqlglot` as a dependency (imported optionally; falls back to regex if
+   absent).
+
+ ### Fixed
+ - Hardened GitHub API rate-limit handling against non-dict response headers
+   (previously could raise when headers weren't a mapping).
+
  ## [1.0.5] - 2025-01-24
 
  ### Fixed
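The name matching described in the 1.1.1 and 1.1.0 entries of the hunk above can be sketched roughly as follows. This is an illustration only, not the package's code: `normalize`, `candidate_forms` and `match_name` are hypothetical names, the pluralization rule is deliberately naive, and explicit `mapping` handling is left out.

```python
import re
from typing import Iterable, List, Optional


def normalize(name: str) -> str:
    """Fold camelCase, PascalCase and UPPER_CASE into snake_case."""
    # Put "_" at each lowercase/digit -> uppercase boundary, then lowercase.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def candidate_forms(name: str) -> List[str]:
    """Normalized name first, then naive plural and singular variants."""
    base = normalize(name)
    forms = [base, base + "s", base + "es"]
    if base.endswith("es"):
        forms.append(base[:-2])
    if base.endswith("s"):
        forms.append(base[:-1])
    return forms


def match_name(target: str, source_names: Iterable[str]) -> Optional[str]:
    """Return the existing source name that matches `target`, else None."""
    by_normalized = {normalize(s): s for s in source_names}
    for form in candidate_forms(target):
        # A variant only counts if that exact name exists on the other side.
        if form in by_normalized:
            return by_normalized[form]
    return None
```

Because the exact normalized form is tried first and every variant must exist among the source names, `match_name("address", ["address", "addres"])` resolves to `address`, and `match_name("User", ["users"])` resolves to `users`.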
@@ -66,6 +125,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  - Limited type inference from SQL
  - No support for complex nested types
 
- [Unreleased]: https://github.com/OGsiji/data-contract-validator/compare/v1.0.5...HEAD
+ [Unreleased]: https://github.com/OGsiji/data-contract-validator/compare/v1.1.1...HEAD
+ [1.1.1]: https://github.com/OGsiji/data-contract-validator/releases/tag/v1.1.1
+ [1.1.0]: https://github.com/OGsiji/data-contract-validator/releases/tag/v1.1.0
  [1.0.5]: https://github.com/OGsiji/data-contract-validator/releases/tag/v1.0.5
  [1.0.0]: https://github.com/OGsiji/data-contract-validator/releases/tag/v1.0.0
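The confidence-aware rule from the 1.1.0 changelog entry (a missing column is critical only when the source schema is known to be complete) might look something like the sketch below. `SourceSchema`, `Issue` and `check_required_columns` are hypothetical stand-ins chosen for this illustration; the package's real `Schema` carries `confidence` and `is_complete` via `metadata`, and its validator is not shown in this part of the diff.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SourceSchema:
    """Hypothetical stand-in for a source-side schema."""
    columns: Set[str]
    # False when columns could not be fully resolved, e.g. after SELECT *.
    is_complete: bool = True


@dataclass
class Issue:
    severity: str  # "critical" blocks the build, "warning" does not
    column: str


def check_required_columns(required: Set[str], source: SourceSchema) -> List[Issue]:
    """Report required target columns that the source does not provide."""
    # An incomplete schema may still produce the column, so only warn.
    severity = "critical" if source.is_complete else "warning"
    return [Issue(severity, col) for col in sorted(required - source.columns)]
```

With a complete schema, a missing `total_orders` yields a critical issue; with `is_complete=False` the same gap is downgraded to a warning.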
@@ -0,0 +1,339 @@
+ Metadata-Version: 2.4
+ Name: data-contract-validator
+ Version: 1.1.1
+ Summary: Validate data contracts between dbt models and FastAPI/Pydantic APIs with accurate, low-false-positive schema checks
+ Author-email: Ogunniran Siji <ogunniransiji@gmail.com>
+ Maintainer-email: Ogunniran Siji <ogunniransiji@gmail.com>
+ License: MIT
+ Project-URL: Homepage, https://github.com/OGsiji/data-contract-validator
+ Project-URL: Documentation, https://github.com/OGsiji/data-contract-validator/blob/main/README.md
+ Project-URL: Repository, https://github.com/OGsiji/data-contract-validator
+ Project-URL: Bug Reports, https://github.com/OGsiji/data-contract-validator/issues
+ Project-URL: Changelog, https://github.com/OGsiji/data-contract-validator/blob/main/CHANGELOG.md
+ Keywords: dbt,fastapi,contract-testing,api-validation,data-engineering,schema-validation,ci-cd,devops
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Software Development :: Quality Assurance
+ Classifier: Topic :: Software Development :: Testing
+ Classifier: Topic :: Database
+ Classifier: Topic :: Internet :: WWW/HTTP :: HTTP Servers
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: pydantic>=2.0.0
+ Requires-Dist: PyYAML>=6.0
+ Requires-Dist: requests>=2.25.0
+ Requires-Dist: click>=8.0.0
+ Requires-Dist: sqlglot>=20.0.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
+ Requires-Dist: black>=22.0.0; extra == "dev"
+ Requires-Dist: flake8>=4.0.0; extra == "dev"
+ Requires-Dist: mypy>=0.991; extra == "dev"
+ Requires-Dist: pre-commit>=2.20.0; extra == "dev"
+ Requires-Dist: build>=0.8.0; extra == "dev"
+ Requires-Dist: twine>=4.0.0; extra == "dev"
+ Provides-Extra: test
+ Requires-Dist: pytest>=7.0.0; extra == "test"
+ Requires-Dist: pytest-cov>=4.0.0; extra == "test"
+ Requires-Dist: pytest-mock>=3.8.0; extra == "test"
+ Dynamic: license-file
+
+ # 🛡️ Data Contract Validator
+
+ > **Catch breaking changes between your dbt models and your FastAPI/Pydantic APIs — before they hit production.**
+
+ [![PyPI version](https://badge.fury.io/py/data-contract-validator.svg)](https://badge.fury.io/py/data-contract-validator)
+ [![Tests](https://github.com/OGsiji/data-contract-validator/workflows/Tests/badge.svg)](https://github.com/OGsiji/data-contract-validator/actions)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+ ## 🎯 What it solves
+
+ Your analytics team changes a dbt model. Your API team's FastAPI service still
+ expects the old shape. Nobody notices until production 500s at 2 AM.
+
+ This tool sits on that boundary. It extracts the schema your **dbt models
+ produce** and the schema your **Pydantic models expect**, compares them, and
+ fails CI when the data side can no longer satisfy the API side.
+
+ ```
+ dbt models                Data Contract Validator              FastAPI / Pydantic
+ (what the pipeline  ──▶  extract → normalize → compare  ◀──  (what the API expects)
+  produces)                           ↓
+                       critical issues block the build
+ ```
+
+ ### Built for trust
+
+ A check that gates a deploy is only useful if it doesn't cry wolf. v1.1
+ re-architected extraction around that principle:
+
+ - **Canonical types** — dbt `varchar` and Pydantic `str` are understood to be
+   the same thing, so you don't get drowned in fake "type mismatch" warnings.
+ - **A real SQL parser** (`sqlglot`) instead of regex — CTEs, `||`
+   concatenation, window functions and quoted identifiers are parsed correctly.
+ - **Confidence-aware** — if the tool can't fully resolve a model's columns
+   (e.g. `SELECT *`), it will **warn** rather than falsely **block** your build.
+
+ ## ⚡ Quick start
+
+ ```bash
+ pip install data-contract-validator
+ ```
+
+ ```bash
+ # Initialize config + CI workflow in your dbt project
+ contract-validator init --interactive
+
+ # Sanity-check the setup
+ contract-validator test
+
+ # Validate
+ contract-validator validate
+ ```
+
+ ### One-off validation (no config file)
+
+ ```bash
+ # Local dbt project against a local Pydantic models file or directory
+ contract-validator validate \
+   --dbt-project ./my-dbt-project \
+   --fastapi-local ./my-api/app/models.py
+
+ # dbt project against models in another GitHub repo (microservices)
+ contract-validator validate \
+   --dbt-project . \
+   --fastapi-repo "my-org/my-api" \
+   --fastapi-path "app/models.py"
+ ```
+
+ ## 🔍 How extraction works (and why it's accurate)
+
+ ### dbt side — tiered, best-source-wins
+
+ | Tier | Source | Types | Confidence | Notes |
+ |---|---|---|---|---|
+ | 1 | `target/catalog.json` | **Real warehouse types** | high | Produced by `dbt docs generate`. Most accurate. |
+ | 2 | `sqlglot` SQL parse | Inferred (often unknown) | medium | Trusted column **names**; enriched with documented types from `manifest.json`. Detects `SELECT *`. |
+ | 3 | regex parse | Guessed | low | Last resort. Never used to hard-fail a build. |
+
+ The tool auto-detects what's available and degrades gracefully — so it works
+ offline in pre-commit **and** with full type fidelity in a warehouse-connected
+ CI job.
+
+ > 💡 **Tip:** run `dbt docs generate` in CI before validating to unlock Tier 1
+ > (real types). Without it, you still get accurate column-presence checks from
+ > Tier 2.
+
+ ### FastAPI side
+
+ Pydantic / SQLModel classes are parsed from source with Python's `ast` (no
+ imports executed). `Optional[...]` controls whether a field is required;
+ `table=True` SQLModel classes (DB tables, not API contracts) are skipped.
+
+ ## 🚦 What gets flagged
+
+ | Severity | Meaning | Example |
+ |---|---|---|
+ | 🚨 **Critical** | Blocks the build | API requires a column the dbt model no longer produces |
+ | ⚠️ **Warning** | Worth a look, non-blocking | A real type mismatch, or a missing column on a model we couldn't fully resolve |
+
+ ```bash
+ $ contract-validator validate
+
+ 🛡️ Data Contract Validation Results:
+    Status: ❌ FAILED
+    Critical: 1 | Warnings: 0
+
+ 🚨 Critical Issues (Must Fix):
+    💥 user_analytics
+       Column: total_orders
+       Problem: Target REQUIRES column 'total_orders' but source doesn't provide it
+       🔧 Fix: Add column 'total_orders' to source model for table 'user_analytics'
+ ```
+
+ ## 🔧 Configuration (`.retl-validator.yml`)
+
+ ```yaml
+ version: "1.0"
+ name: "my-project-contracts"
+
+ source:
+   dbt:
+     project_path: "."
+     auto_compile: true
+     # Force Tier 2/3 SQL parsing even if catalog/manifest exist:
+     disable_manifest: false
+
+ target:
+   fastapi:
+     # GitHub repo:
+     type: "github"
+     repo: "my-org/my-api"
+     path: "app/models.py"
+     # ...or local:
+     # type: "local"
+     # path: "../my-api/app/models.py"
+
+ # Optional: explicit mapping for when names don't line up by convention.
+ mapping:
+   tables:
+     # target (Pydantic) table : source (dbt) model
+     user_analytics: user_analytics_summary
+   columns:
+     user_analytics:
+       # target column : source column
+       userId: user_id
+
+ validation:
+   fail_on: ["missing_tables", "missing_required_columns"]
+   warn_on: ["type_mismatches", "missing_optional_columns"]
+ ```
+
+ ### When do I need `mapping`?
+
+ Most of the time you don't. Names are matched automatically across:
+ - `snake_case` / `camelCase` / casing — `UserAnalytics` → `user_analytics`, `userId` → `user_id`
+ - **plural ↔ singular** — dbt's plural `users` matches Pydantic's `User` (→ `user`)
+   with no config (and it won't over-match — `address` is never confused with `addres`).
+
+ Reach for `mapping` only when a model or column is named so differently that
+ convention can't bridge it (e.g. Pydantic `user_id` ↔ dbt `customer_identifier`).
+
+ ## 🐍 Python API
+
+ ```python
+ from data_contract_validator import ContractValidator, DBTExtractor, FastAPIExtractor
+
+ dbt = DBTExtractor(project_path="./dbt-project")
+ fastapi = FastAPIExtractor.from_github_repo("my-org/my-api", "app/models.py")
+
+ validator = ContractValidator(
+     source_extractor=dbt,
+     target_extractor=fastapi,
+     mapping={"tables": {"user_analytics": "user_analytics_summary"}},  # optional
+ )
+ result = validator.validate()
+
+ if not result.success:
+     for issue in result.critical_issues:
+         print(f"💥 {issue.table}.{issue.column}: {issue.message}")
+ ```
+
+ ## 🪝 CI / pre-commit integration
+
+ ### GitHub Actions
+
+ `contract-validator init` generates a workflow for you. Minimal version:
+
+ ```yaml
+ name: 🛡️ Data Contract Validation
+ on:
+   pull_request:
+     paths: ["models/**/*.sql", "dbt_project.yml", "**/*models*.py"]
+ jobs:
+   validate-contracts:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - uses: actions/setup-python@v4
+         with: { python-version: "3.11" }
+       - run: pip install data-contract-validator
+       # Optional: `dbt docs generate` here for real warehouse types (Tier 1)
+       - run: contract-validator validate --output github
+         env:
+           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+ ```
+
+ ### Pre-commit
+
+ ```bash
+ contract-validator setup-precommit --install-hooks
+ ```
+
+ ```yaml
+ repos:
+   - repo: https://github.com/OGsiji/data-contract-validator
+     rev: v1.1.0
+     hooks:
+       - id: contract-validation
+ ```
+
+ ## 🧪 Output formats
+
+ ```bash
+ contract-validator validate --output terminal   # human-friendly (default)
+ contract-validator validate --output json       # machine-readable for CI
+ contract-validator validate --output github     # GitHub Actions annotations
+ ```
+
+ ## 🚀 Supported frameworks
+
+ **Source:** dbt (all adapters — Snowflake, BigQuery, Redshift, Postgres, …).
+ **Target:** FastAPI (Pydantic v2 + SQLModel).
+
+ The extractor architecture is intentionally pluggable (`BaseExtractor` →
+ `Dict[str, Schema]` with canonical types), so additional sources/targets can be
+ added without touching the validator. [Open an issue](https://github.com/OGsiji/data-contract-validator/issues)
+ to request one.
+
+ ## 🛠️ Development & testing
+
+ ```bash
+ git clone https://github.com/OGsiji/data-contract-validator
+ cd data-contract-validator
+
+ python -m venv .venv && source .venv/bin/activate
+ pip install -e ".[dev]"        # or: pip install -e ".[test]"
+
+ # Run the suite
+ pytest
+
+ # Lint / format
+ black data_contract_validator tests
+ ```
+
+ The test suite covers the canonical type system (`tests/test_core/test_types.py`),
+ the tiered dbt extractor including sqlglot CTE handling and `catalog.json`
+ (`tests/test_extractors/test_dbt.py`), and the confidence/mapping behavior of
+ the validator (`tests/test_core/test_validator.py`).
+
+ ### Adding an extractor
+
+ ```python
+ from data_contract_validator.extractors.base import BaseExtractor
+ from data_contract_validator.core.types import CanonicalType
+
+ class MyExtractor(BaseExtractor):
+     def extract_schemas(self):
+         # return Dict[str, Schema]; use self._make_column(...) so each column
+         # carries a canonical_type the validator can compare.
+         ...
+ ```
+
+ ## 🗺️ Roadmap
+
+ - Real compatibility semantics (nullability, additive vs. breaking changes)
+ - Reporter/logging abstraction (quiet/embeddable core)
+ - A canonical, language-neutral contract artifact + baseline/snapshot diffing
+ - More targets (Django, SQLAlchemy, GraphQL, OpenAPI)
+
+ ## 📄 License
+
+ MIT — see [LICENSE](LICENSE).
+
+ ## 🆘 Support
+
+ - 🐛 Issues: https://github.com/OGsiji/data-contract-validator/issues
+ - 📧 Email: ogunniransiji@gmail.com
+
+ If this saves you a production incident, please ⭐ the repo.
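To make the "canonical types" idea from the package description above concrete, here is a minimal sketch of normalizing warehouse SQL types and Python hints into one shared vocabulary before comparing them. It is not the contents of `core/types.py`: the function names and the string labels (`"string"`, `"bigint"`, and so on) are placeholders rather than members of the package's `CanonicalType`, and apart from `varchar` matching `str` and Snowflake `NUMBER(38,0)` becoming bigint, which the changelog states, the individual mappings are assumptions.

```python
import re

# Placeholder canonical names; not the members of the package's CanonicalType.
_SQL_TO_CANONICAL = {
    "varchar": "string",
    "text": "string",
    "int64": "bigint",
    "bigint": "bigint",
    "float64": "double",
    "jsonb": "json",
    "super": "json",
}
_PYTHON_TO_CANONICAL = {
    "str": "string",
    "int": "bigint",
    "float": "double",
    "dict": "json",
}

# Type name with optional (precision) or (precision, scale); captures the scale.
_SQL_TYPE = re.compile(r"\s*(\w+)\s*(?:\(\s*\d+\s*(?:,\s*(\d+)\s*)?\))?\s*")


def canonical_from_sql(sql_type: str) -> str:
    """Map a warehouse type such as 'NUMBER(38,0)' to a canonical label."""
    match = _SQL_TYPE.fullmatch(sql_type)
    if match is None:
        return "unknown"
    base, scale = match.group(1).lower(), match.group(2)
    if base == "number":
        # A zero (or absent) scale holds integers; otherwise it is a decimal.
        return "bigint" if scale in (None, "0") else "decimal"
    return _SQL_TO_CANONICAL.get(base, "unknown")


def canonical_from_python(hint: str) -> str:
    """Map a Python type hint name such as 'str' to a canonical label."""
    return _PYTHON_TO_CANONICAL.get(hint, "unknown")


def types_match(sql_type: str, python_hint: str) -> bool:
    """Compare canonical labels instead of raw type strings."""
    return canonical_from_sql(sql_type) == canonical_from_python(python_hint)
```

Comparing labels rather than raw strings is what lets `types_match("varchar(255)", "str")` succeed where a string comparison would report a mismatch.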
@@ -0,0 +1,289 @@
+ # 🛡️ Data Contract Validator
+
+ > **Catch breaking changes between your dbt models and your FastAPI/Pydantic APIs — before they hit production.**
+
+ [![PyPI version](https://badge.fury.io/py/data-contract-validator.svg)](https://badge.fury.io/py/data-contract-validator)
+ [![Tests](https://github.com/OGsiji/data-contract-validator/workflows/Tests/badge.svg)](https://github.com/OGsiji/data-contract-validator/actions)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+ ## 🎯 What it solves
+
+ Your analytics team changes a dbt model. Your API team's FastAPI service still
+ expects the old shape. Nobody notices until production 500s at 2 AM.
+
+ This tool sits on that boundary. It extracts the schema your **dbt models
+ produce** and the schema your **Pydantic models expect**, compares them, and
+ fails CI when the data side can no longer satisfy the API side.
+
+ ```
+ dbt models                Data Contract Validator              FastAPI / Pydantic
+ (what the pipeline  ──▶  extract → normalize → compare  ◀──  (what the API expects)
+  produces)                           ↓
+                       critical issues block the build
+ ```
+
+ ### Built for trust
+
+ A check that gates a deploy is only useful if it doesn't cry wolf. v1.1
+ re-architected extraction around that principle:
+
+ - **Canonical types** — dbt `varchar` and Pydantic `str` are understood to be
+   the same thing, so you don't get drowned in fake "type mismatch" warnings.
+ - **A real SQL parser** (`sqlglot`) instead of regex — CTEs, `||`
+   concatenation, window functions and quoted identifiers are parsed correctly.
+ - **Confidence-aware** — if the tool can't fully resolve a model's columns
+   (e.g. `SELECT *`), it will **warn** rather than falsely **block** your build.
+
+ ## ⚡ Quick start
+
+ ```bash
+ pip install data-contract-validator
+ ```
+
+ ```bash
+ # Initialize config + CI workflow in your dbt project
+ contract-validator init --interactive
+
+ # Sanity-check the setup
+ contract-validator test
+
+ # Validate
+ contract-validator validate
+ ```
+
+ ### One-off validation (no config file)
+
+ ```bash
+ # Local dbt project against a local Pydantic models file or directory
+ contract-validator validate \
+   --dbt-project ./my-dbt-project \
+   --fastapi-local ./my-api/app/models.py
+
+ # dbt project against models in another GitHub repo (microservices)
+ contract-validator validate \
+   --dbt-project . \
+   --fastapi-repo "my-org/my-api" \
+   --fastapi-path "app/models.py"
+ ```
+
+ ## 🔍 How extraction works (and why it's accurate)
+
+ ### dbt side — tiered, best-source-wins
+
+ | Tier | Source | Types | Confidence | Notes |
+ |---|---|---|---|---|
+ | 1 | `target/catalog.json` | **Real warehouse types** | high | Produced by `dbt docs generate`. Most accurate. |
+ | 2 | `sqlglot` SQL parse | Inferred (often unknown) | medium | Trusted column **names**; enriched with documented types from `manifest.json`. Detects `SELECT *`. |
+ | 3 | regex parse | Guessed | low | Last resort. Never used to hard-fail a build. |
+
+ The tool auto-detects what's available and degrades gracefully — so it works
+ offline in pre-commit **and** with full type fidelity in a warehouse-connected
+ CI job.
+
+ > 💡 **Tip:** run `dbt docs generate` in CI before validating to unlock Tier 1
+ > (real types). Without it, you still get accurate column-presence checks from
+ > Tier 2.
+
+ ### FastAPI side
+
+ Pydantic / SQLModel classes are parsed from source with Python's `ast` (no
+ imports executed). `Optional[...]` controls whether a field is required;
+ `table=True` SQLModel classes (DB tables, not API contracts) are skipped.
+
+ ## 🚦 What gets flagged
+
+ | Severity | Meaning | Example |
+ |---|---|---|
+ | 🚨 **Critical** | Blocks the build | API requires a column the dbt model no longer produces |
+ | ⚠️ **Warning** | Worth a look, non-blocking | A real type mismatch, or a missing column on a model we couldn't fully resolve |
+
+ ```bash
+ $ contract-validator validate
+
+ 🛡️ Data Contract Validation Results:
+    Status: ❌ FAILED
+    Critical: 1 | Warnings: 0
+
+ 🚨 Critical Issues (Must Fix):
+    💥 user_analytics
+       Column: total_orders
+       Problem: Target REQUIRES column 'total_orders' but source doesn't provide it
+       🔧 Fix: Add column 'total_orders' to source model for table 'user_analytics'
+ ```
+
+ ## 🔧 Configuration (`.retl-validator.yml`)
+
+ ```yaml
+ version: "1.0"
+ name: "my-project-contracts"
+
+ source:
+   dbt:
+     project_path: "."
+     auto_compile: true
+     # Force Tier 2/3 SQL parsing even if catalog/manifest exist:
+     disable_manifest: false
+
+ target:
+   fastapi:
+     # GitHub repo:
+     type: "github"
+     repo: "my-org/my-api"
+     path: "app/models.py"
+     # ...or local:
+     # type: "local"
+     # path: "../my-api/app/models.py"
+
+ # Optional: explicit mapping for when names don't line up by convention.
+ mapping:
+   tables:
+     # target (Pydantic) table : source (dbt) model
+     user_analytics: user_analytics_summary
+   columns:
+     user_analytics:
+       # target column : source column
+       userId: user_id
+
+ validation:
+   fail_on: ["missing_tables", "missing_required_columns"]
+   warn_on: ["type_mismatches", "missing_optional_columns"]
+ ```
+
+ ### When do I need `mapping`?
+
+ Most of the time you don't. Names are matched automatically across:
+ - `snake_case` / `camelCase` / casing — `UserAnalytics` → `user_analytics`, `userId` → `user_id`
+ - **plural ↔ singular** — dbt's plural `users` matches Pydantic's `User` (→ `user`)
+   with no config (and it won't over-match — `address` is never confused with `addres`).
+
+ Reach for `mapping` only when a model or column is named so differently that
+ convention can't bridge it (e.g. Pydantic `user_id` ↔ dbt `customer_identifier`).
+
+ ## 🐍 Python API
+
+ ```python
+ from data_contract_validator import ContractValidator, DBTExtractor, FastAPIExtractor
+
+ dbt = DBTExtractor(project_path="./dbt-project")
+ fastapi = FastAPIExtractor.from_github_repo("my-org/my-api", "app/models.py")
+
+ validator = ContractValidator(
+     source_extractor=dbt,
+     target_extractor=fastapi,
+     mapping={"tables": {"user_analytics": "user_analytics_summary"}},  # optional
+ )
+ result = validator.validate()
+
+ if not result.success:
+     for issue in result.critical_issues:
+         print(f"💥 {issue.table}.{issue.column}: {issue.message}")
+ ```
+
+ ## 🪝 CI / pre-commit integration
+
+ ### GitHub Actions
+
+ `contract-validator init` generates a workflow for you. Minimal version:
+
+ ```yaml
+ name: 🛡️ Data Contract Validation
+ on:
+   pull_request:
+     paths: ["models/**/*.sql", "dbt_project.yml", "**/*models*.py"]
+ jobs:
+   validate-contracts:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - uses: actions/setup-python@v4
+         with: { python-version: "3.11" }
+       - run: pip install data-contract-validator
+       # Optional: `dbt docs generate` here for real warehouse types (Tier 1)
+       - run: contract-validator validate --output github
+         env:
+           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+ ```
+
+ ### Pre-commit
+
+ ```bash
+ contract-validator setup-precommit --install-hooks
+ ```
+
+ ```yaml
+ repos:
+   - repo: https://github.com/OGsiji/data-contract-validator
+     rev: v1.1.0
+     hooks:
+       - id: contract-validation
+ ```
+
+ ## 🧪 Output formats
+
+ ```bash
+ contract-validator validate --output terminal   # human-friendly (default)
+ contract-validator validate --output json       # machine-readable for CI
+ contract-validator validate --output github     # GitHub Actions annotations
+ ```
+
+ ## 🚀 Supported frameworks
+
+ **Source:** dbt (all adapters — Snowflake, BigQuery, Redshift, Postgres, …).
+ **Target:** FastAPI (Pydantic v2 + SQLModel).
+
+ The extractor architecture is intentionally pluggable (`BaseExtractor` →
+ `Dict[str, Schema]` with canonical types), so additional sources/targets can be
+ added without touching the validator. [Open an issue](https://github.com/OGsiji/data-contract-validator/issues)
+ to request one.
+
+ ## 🛠️ Development & testing
+
+ ```bash
+ git clone https://github.com/OGsiji/data-contract-validator
+ cd data-contract-validator
+
+ python -m venv .venv && source .venv/bin/activate
+ pip install -e ".[dev]"        # or: pip install -e ".[test]"
+
+ # Run the suite
+ pytest
+
+ # Lint / format
+ black data_contract_validator tests
+ ```
+
+ The test suite covers the canonical type system (`tests/test_core/test_types.py`),
+ the tiered dbt extractor including sqlglot CTE handling and `catalog.json`
+ (`tests/test_extractors/test_dbt.py`), and the confidence/mapping behavior of
+ the validator (`tests/test_core/test_validator.py`).
+
+ ### Adding an extractor
+
+ ```python
+ from data_contract_validator.extractors.base import BaseExtractor
+ from data_contract_validator.core.types import CanonicalType
+
+ class MyExtractor(BaseExtractor):
+     def extract_schemas(self):
+         # return Dict[str, Schema]; use self._make_column(...) so each column
+         # carries a canonical_type the validator can compare.
+         ...
+ ```
+
+ ## 🗺️ Roadmap
+
+ - Real compatibility semantics (nullability, additive vs. breaking changes)
+ - Reporter/logging abstraction (quiet/embeddable core)
+ - A canonical, language-neutral contract artifact + baseline/snapshot diffing
+ - More targets (Django, SQLAlchemy, GraphQL, OpenAPI)
+
+ ## 📄 License
+
+ MIT — see [LICENSE](LICENSE).
+
+ ## 🆘 Support
+
+ - 🐛 Issues: https://github.com/OGsiji/data-contract-validator/issues
+ - 📧 Email: ogunniransiji@gmail.com
+
+ If this saves you a production incident, please ⭐ the repo.
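The README's "FastAPI side" section says Pydantic classes are parsed with Python's `ast` without executing imports. A rough sketch of that approach, using only the standard library, is shown here. `extract_fields` is a hypothetical helper and not the package's `FastAPIExtractor`; it recognizes only direct `BaseModel` subclasses and follows the README's simplification that `Optional[...]` alone decides whether a field is required.

```python
import ast
from typing import Dict


def _is_optional(annotation: ast.expr) -> bool:
    """True for annotations written as Optional[...]."""
    return (
        isinstance(annotation, ast.Subscript)
        and isinstance(annotation.value, ast.Name)
        and annotation.value.id == "Optional"
    )


def extract_fields(source: str) -> Dict[str, Dict[str, bool]]:
    """Map class name -> {field name: required?} for BaseModel subclasses."""
    result: Dict[str, Dict[str, bool]] = {}
    # ast.parse only builds a syntax tree; nothing in `source` is executed.
    for node in ast.parse(source).body:
        if not isinstance(node, ast.ClassDef):
            continue
        bases = {b.id for b in node.bases if isinstance(b, ast.Name)}
        if "BaseModel" not in bases:
            continue
        fields: Dict[str, bool] = {}
        for stmt in node.body:
            # Annotated assignments like `email: Optional[str] = None`.
            if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
                fields[stmt.target.id] = not _is_optional(stmt.annotation)
        result[node.name] = fields
    return result


EXAMPLE = '''
from typing import Optional
from pydantic import BaseModel

class User(BaseModel):
    id: int
    email: Optional[str] = None

class Helper:
    x: int
'''

print(extract_fields(EXAMPLE))
```

Since the source is only parsed, the example runs even where `pydantic` is not installed, which is the property that makes this style of extraction usable in pre-commit hooks and CI.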