data-contract-validator 1.1.1__tar.gz → 1.1.7__tar.gz
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/CHANGELOG.md +103 -0
- {data_contract_validator-1.1.1/data_contract_validator.egg-info → data_contract_validator-1.1.7}/PKG-INFO +182 -7
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/README.md +181 -6
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/__init__.py +1 -1
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/cli.py +214 -49
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/core/types.py +6 -1
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/core/validator.py +14 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/extractors/fastapi.py +62 -22
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7/data_contract_validator.egg-info}/PKG-INFO +182 -7
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/pyproject.toml +1 -1
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/LICENSE +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/MANIFEST.in +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/core/__init__.py +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/core/models.py +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/extractors/__init__.py +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/extractors/base.py +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/extractors/dbt.py +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/integrations/__init__.py +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/py.typed +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/templates/github-actions-template.yml +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator.egg-info/SOURCES.txt +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator.egg-info/dependency_links.txt +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator.egg-info/entry_points.txt +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator.egg-info/requires.txt +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator.egg-info/top_level.txt +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/requirements.txt +0 -0
- {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/setup.cfg +0 -0
|
@@ -7,6 +7,109 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
|
|
|
7
7
|
|
|
8
8
|
## [Unreleased]
|
|
9
9
|
|
|
10
|
+
## [1.1.7] - 2026-07-04
|
|
11
|
+
|
|
12
|
+
### Changed
|
|
13
|
+
- **The generated CI workflow now defaults `GITHUB_TOKEN` to
|
|
14
|
+
`secrets.API_REPO_TOKEN`** — a token you create yourself — instead of the
|
|
15
|
+
auto-provided `secrets.GITHUB_TOKEN`, which only has access to the repo
|
|
16
|
+
the workflow runs in. A personal access token works identically for
|
|
17
|
+
public and private targets, so this removes the silent-failure case
|
|
18
|
+
entirely rather than just documenting it (1.1.6's fix). Still skipped
|
|
19
|
+
entirely for a `local` target.
|
|
20
|
+
|
|
21
|
+
### Added
|
|
22
|
+
- The generated CI workflow now includes a commented scaffold for
|
|
23
|
+
`dbt deps && dbt docs generate`, unlocking Tier 1 (real warehouse types)
|
|
24
|
+
in CI instead of that only being mentioned in prose docs. Commented out
|
|
25
|
+
by default since it needs the user's warehouse adapter and credentials
|
|
26
|
+
filled in, which can't be inferred.
|
|
27
|
+
|
|
28
|
+
## [1.1.6] - 2026-07-03
|
|
29
|
+
|
|
30
|
+
### Fixed
|
|
31
|
+
- **The generated CI workflow silently assumed the default
|
|
32
|
+
`secrets.GITHUB_TOKEN` could read the target API repo.** That token only
|
|
33
|
+
has access to the repo the workflow itself runs in — if `target.*.repo`
|
|
34
|
+
is a *different*, private repo, validation would fail on every PR with no
|
|
35
|
+
indication why. The generated workflow now documents the fix inline
|
|
36
|
+
(a personal-access-token secret pointed at that specific repo) and skips
|
|
37
|
+
the `GITHUB_TOKEN` env block entirely for a `local` target, which never
|
|
38
|
+
talks to the GitHub API at all.
|
|
39
|
+
|
|
40
|
+
## [1.1.5] - 2026-07-03
|
|
41
|
+
|
|
42
|
+
### Fixed
|
|
43
|
+
- **Python `int` was mapped to the narrower `INTEGER` canonical rank,
|
|
44
|
+
producing a false "type mismatch" warning against any dbt column typed
|
|
45
|
+
`bigint`** (a very common type for count/id columns). Python's `int` is
|
|
46
|
+
arbitrary-precision, unlike a fixed-width SQL `INTEGER` column, so there's
|
|
47
|
+
no real truncation risk — it's now mapped to the wider `BIGINT` rank.
|
|
48
|
+
A genuinely fractional source (`DECIMAL`/`FLOAT`) is still flagged.
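The rank comparison this entry describes can be sketched roughly as follows. This is an illustration only, not the package's code: the rank table, the `PYTHON_TO_CANONICAL` mapping, and the function name `is_type_mismatch` are all hypothetical, and the rule assumed here is simply "flag when the source column is wider than the target field".

```python
# Hypothetical canonical ranks: a higher rank is treated as "wider".
CANONICAL_RANK = {"SMALLINT": 1, "INTEGER": 2, "BIGINT": 3, "DECIMAL": 4, "FLOAT": 5}

# Assumed mapping from Python annotations to canonical types.
# `int` maps to BIGINT because Python integers are arbitrary-precision.
PYTHON_TO_CANONICAL = {"int": "BIGINT", "float": "FLOAT"}


def is_type_mismatch(source_sql_type: str, target_python_type: str) -> bool:
    """Return True when the source column is wider than the target field."""
    source_rank = CANONICAL_RANK[source_sql_type.upper()]
    target_rank = CANONICAL_RANK[PYTHON_TO_CANONICAL[target_python_type]]
    return source_rank > target_rank


print(is_type_mismatch("bigint", "int"))   # False
print(is_type_mismatch("decimal", "int"))  # True
```

Under this assumed rule, mapping `int` to `INTEGER` instead would make the first call return `True`, which is the false warning the entry describes.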
|
|
49
|
+
|
|
50
|
+
### Added
|
|
51
|
+
- `init --interactive` now offers to set up a pre-commit hook as part of the
|
|
52
|
+
same wizard, instead of requiring a separate `setup-precommit` invocation.
|
|
53
|
+
(The GitHub Actions CI workflow was already created automatically by
|
|
54
|
+
`init` for both the interactive and non-interactive paths — only the
|
|
55
|
+
pre-commit step needed folding in.)
|
|
56
|
+
|
|
57
|
+
## [1.1.4] - 2026-07-03
|
|
58
|
+
|
|
59
|
+
### Changed
|
|
60
|
+
- **`table=True` SQLModel classes are no longer skipped during extraction.**
|
|
61
|
+
Whether a table is meant to come from dbt is business knowledge that can't
|
|
62
|
+
be recovered from the Python source — two structurally identical
|
|
63
|
+
`table=True` classes can need opposite treatment (one is a normal dbt-fed
|
|
64
|
+
table an API also returns directly; another is populated by a separate
|
|
65
|
+
pipeline like Kafka and was never meant to have a dbt model). Blanket-
|
|
66
|
+
skipping every `table=True` class silently exempted the former case from
|
|
67
|
+
validation too, which is likely the more common pattern — defeating the
|
|
68
|
+
tool's purpose for it. `table=True` classes are now validated like any
|
|
69
|
+
other target.
|
|
70
|
+
- Added `mapping.exclude: [<table>, ...]` so the latter case (genuinely no
|
|
71
|
+
source model, e.g. Kafka-populated) can be stated explicitly instead of
|
|
72
|
+
inferred from `table=True`. Excluded tables are skipped entirely and never
|
|
73
|
+
produce a "missing table" issue.
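As a rough sketch of how an explicit exclude list could feed the "missing table" check (the function name `find_missing_tables` and its signature are hypothetical, not taken from the package):

```python
def find_missing_tables(target_tables, source_tables, exclude=()):
    """Return target tables with no matching source model, ignoring excluded ones."""
    excluded = set(exclude)
    sources = set(source_tables)
    return [t for t in target_tables if t not in excluded and t not in sources]


targets = ["user", "feed_interaction"]
print(find_missing_tables(targets, ["user"]))                                # ['feed_interaction']
print(find_missing_tables(targets, ["user"], exclude=["feed_interaction"]))  # []
```

The point of the design is that the exception is stated by a person in config, so a `table=True` class that is not listed still goes through the normal check.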
|
|
74
|
+
|
|
75
|
+
## [1.1.3] - 2026-07-03
|
|
76
|
+
|
|
77
|
+
Supersedes 1.1.2, which was only ever published to TestPyPI for verification
|
|
78
|
+
and never released to production PyPI.
|
|
79
|
+
|
|
80
|
+
### Fixed
|
|
81
|
+
- **`table=True` SQLModel classes were incorrectly evaluated as required API
|
|
82
|
+
contracts.** The standard `class Foo(SQLModel, table=True)` syntax puts
|
|
83
|
+
`table=True` on the class definition's own keywords, not nested inside a
|
|
84
|
+
`Call` base — the skip check only looked in the latter, so DB-only tables
|
|
85
|
+
never matched and produced permanent, unfixable "missing table" criticals.
|
|
86
|
+
- **Explicit `__tablename__` is now resolved and used as the target table
|
|
87
|
+
name**, instead of only the class-name-derived guess. A class like
|
|
88
|
+
`VideoViewed` with `__tablename__ = "int_unified_video_viewed"` now matches
|
|
89
|
+
its real source model without needing a manual `mapping.tables` entry.
|
|
90
|
+
- **`init --interactive` no longer guesses local vs. GitHub from the path's
|
|
91
|
+
shape.** A local relative path like `app/models` (the wizard's own
|
|
92
|
+
suggested default) is syntactically identical to a GitHub `org/repo`
|
|
93
|
+
string, and was always guessed as a repo, producing a nonsensical
|
|
94
|
+
`app/models/app/models` GitHub target. The wizard now asks explicitly
|
|
95
|
+
("local project or a different GitHub repo?") before asking for the
|
|
96
|
+
path, and asks for the repo and the path within it as separate prompts.
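The first two fixes above come down to where Python's `ast` module stores things. The sketch below is not the package's extractor; the helper name `describe_class` is hypothetical. It only shows that `table=True` appears in `ClassDef.keywords` and that an explicit `__tablename__` is a plain assignment in the class body.

```python
import ast

SOURCE = '''
class VideoViewed(SQLModel, table=True):
    __tablename__ = "int_unified_video_viewed"
    id: int
'''


def describe_class(node: ast.ClassDef) -> dict:
    # `table=True` is a keyword of the class definition itself, so it lives in
    # ClassDef.keywords rather than inside any ast.Call among the bases.
    is_table = any(
        kw.arg == "table"
        and isinstance(kw.value, ast.Constant)
        and kw.value.value is True
        for kw in node.keywords
    )
    # An explicit table name is a simple string assignment in the class body.
    table_name = None
    for stmt in node.body:
        if (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
            and stmt.targets[0].id == "__tablename__"
            and isinstance(stmt.value, ast.Constant)
            and isinstance(stmt.value.value, str)
        ):
            table_name = stmt.value.value
    return {"class": node.name, "is_table": is_table, "tablename": table_name}


tree = ast.parse(SOURCE)  # parsed only; nothing is imported or executed
classes = [describe_class(n) for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]
print(classes)
# [{'class': 'VideoViewed', 'is_table': True, 'tablename': 'int_unified_video_viewed'}]
```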
|
|
97
|
+
|
|
98
|
+
### Added
|
|
99
|
+
- `init --interactive` and `contract-validator test` now verify a configured
|
|
100
|
+
GitHub target path actually exists via the GitHub API, instead of silently
|
|
101
|
+
accepting a stale or typo'd path.
|
|
102
|
+
- GitHub API error messages hint at setting `GITHUB_TOKEN` when an
|
|
103
|
+
unauthenticated 404 is ambiguous with a private repo.
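A minimal sketch of such a check, assuming it goes through GitHub's REST "repository contents" endpoint. The helper names (`contents_url`, `explain_status`, `check_target`) and the message wording are hypothetical and are not the package's own.

```python
import os
import urllib.error
import urllib.request


def contents_url(repo: str, path: str) -> str:
    """Build the REST URL for a path inside an `org/repo` repository."""
    return f"https://api.github.com/repos/{repo}/contents/{path.strip('/')}"


def explain_status(status: int, has_token: bool) -> str:
    """Turn an HTTP status into a hint; a 404 without a token is ambiguous."""
    if status == 200:
        return "ok"
    if status == 404 and not has_token:
        return "not found; if the repo is private, set GITHUB_TOKEN and retry"
    if status == 404:
        return "not found; check the repo and path for typos"
    return f"unexpected HTTP status {status}"


def check_target(repo: str, path: str) -> str:
    """Query GitHub once and explain the outcome (performs a network call)."""
    token = os.environ.get("GITHUB_TOKEN")
    request = urllib.request.Request(contents_url(repo, path))
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code
    return explain_status(status, has_token=bool(token))
```

Keeping the status interpretation in its own function means the ambiguous-404 hint can be tested without touching the network.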
|
|
104
|
+
|
|
105
|
+
### Changed
|
|
106
|
+
- **`contract-validator init` no longer silently overwrites an existing
|
|
107
|
+
`.retl-validator.yml` or generated workflow file.** Re-running `init` (e.g.
|
|
108
|
+
after upgrading to pick up a newer version's config defaults) now refuses
|
|
109
|
+
and exits if either file already exists — pass `--force` to regenerate
|
|
110
|
+
them from scratch. Previously this was an unconditional overwrite with no
|
|
111
|
+
confirmation, which could silently destroy hand-added `mapping` entries.
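The refuse-unless-`--force` behaviour can be illustrated with a small sketch. The function `write_generated_files` and the placeholder file contents are hypothetical; only the file name `.retl-validator.yml` comes from the changelog.

```python
import tempfile
from pathlib import Path


def write_generated_files(files: dict, force: bool = False) -> None:
    """Write each path -> content pair, refusing if any file already exists."""
    existing = [str(path) for path in files if Path(path).exists()]
    if existing and not force:
        raise FileExistsError(f"refusing to overwrite: {', '.join(existing)}")
    for path, content in files.items():
        target = Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / ".retl-validator.yml"
    write_generated_files({config: "# placeholder\n"})              # first run creates it
    config.write_text("mapping:\n  exclude: [feed_interaction]\n")  # hand-added entry
    try:
        write_generated_files({config: "# placeholder\n"})          # second run refuses
        refused = False
    except FileExistsError:
        refused = True
    preserved = "feed_interaction" in config.read_text()
    write_generated_files({config: "# placeholder\n"}, force=True)  # explicit opt-in
    regenerated = config.read_text() == "# placeholder\n"

print(refused, preserved, regenerated)  # True True True
```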
|
|
112
|
+
|
|
10
113
|
## [1.1.1] - 2026-06-30
|
|
11
114
|
|
|
12
115
|
### Added
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: data-contract-validator
|
|
3
|
-
Version: 1.1.
|
|
3
|
+
Version: 1.1.7
|
|
4
4
|
Summary: Validate data contracts between dbt models and FastAPI/Pydantic APIs with accurate, low-false-positive schema checks
|
|
5
5
|
Author-email: Ogunniran Siji <ogunniransiji@gmail.com>
|
|
6
6
|
Maintainer-email: Ogunniran Siji <ogunniransiji@gmail.com>
|
|
@@ -101,6 +101,80 @@ contract-validator test
|
|
|
101
101
|
contract-validator validate
|
|
102
102
|
```
|
|
103
103
|
|
|
104
|
+
## 🚀 Getting started, step by step
|
|
105
|
+
|
|
106
|
+
If you're setting this up on a project for the first time, the order below
|
|
107
|
+
avoids the sharp edges:
|
|
108
|
+
|
|
109
|
+
1. **Install into the same environment dbt runs in** (not a separate venv) —
|
|
110
|
+
the tool needs to see your dbt project:
|
|
111
|
+
```bash
|
|
112
|
+
pip install data-contract-validator
|
|
113
|
+
```
|
|
114
|
+
Already have `.retl-validator.yml` committed by a teammate? Skip to step 5.
|
|
115
|
+
|
|
116
|
+
2. **Generate the config + CI workflow** (one-time):
|
|
117
|
+
```bash
|
|
118
|
+
contract-validator init --interactive
|
|
119
|
+
```
|
|
120
|
+
You'll be asked: where your dbt project is, which API framework you use,
|
|
121
|
+
whether your models live in this local project or a different GitHub
|
|
122
|
+
repo, and then the local path (or the `org/repo` + path within it). It's
|
|
123
|
+
asked explicitly rather than guessed from the path's shape — a local path
|
|
124
|
+
like `app/models` is syntactically identical to a GitHub `org/repo`
|
|
125
|
+
string, so there's no reliable way to infer which one you mean. If you
|
|
126
|
+
pick GitHub, it checks the path actually exists before writing the
|
|
127
|
+
config — so a typo surfaces here instead of at `validate` time.
|
|
128
|
+
|
|
129
|
+
`init` refuses to touch an existing `.retl-validator.yml` or workflow
|
|
130
|
+
file — it won't clobber hand-added `mapping` entries just because you
|
|
131
|
+
upgraded the package and re-ran `init`. Pass `--force` if you really want
|
|
132
|
+
to regenerate them from the new version's defaults.
|
|
133
|
+
|
|
134
|
+
3. **Pre-commit hook**: `init --interactive` asks whether you want one set
|
|
135
|
+
up right after creating the config and CI workflow — say yes there and
|
|
136
|
+
it's done. To add one later (or if you used non-interactive `init`,
|
|
137
|
+
which doesn't prompt), run it standalone:
|
|
138
|
+
```bash
|
|
139
|
+
contract-validator setup-precommit --install-hooks
|
|
140
|
+
```
|
|
141
|
+
|
|
142
|
+
4. **If the target repo is private, set a token** before running anything
|
|
143
|
+
that talks to GitHub locally:
|
|
144
|
+
```bash
|
|
145
|
+
export GITHUB_TOKEN=$(gh auth token) # or a PAT with repo read access
|
|
146
|
+
```
|
|
147
|
+
See [Private GitHub repos need `GITHUB_TOKEN`](#private-github-repos-need-github_token) below for why this is easy to miss.
|
|
148
|
+
|
|
149
|
+
5. **Sanity-check the setup**:
|
|
150
|
+
```bash
|
|
151
|
+
contract-validator test
|
|
152
|
+
```
|
|
153
|
+
Confirms the config parses, the dbt project is found, and the target
|
|
154
|
+
(local path or GitHub path) is reachable. If this fails, `validate` will
|
|
155
|
+
fail the same way — fix it here first.
|
|
156
|
+
|
|
157
|
+
6. **Run it**:
|
|
158
|
+
```bash
|
|
159
|
+
contract-validator validate
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
7. **When it reports a critical issue, diagnose before assuming your dbt
|
|
163
|
+
model is wrong**:
|
|
164
|
+
- Real missing column/table → fix the dbt model.
|
|
165
|
+
- Target name doesn't match the dbt model by convention (renamed/prefixed)
|
|
166
|
+
→ add an entry under `mapping.tables` in `.retl-validator.yml` (see
|
|
167
|
+
[When do I need `mapping`?](#when-do-i-need-mapping)).
|
|
168
|
+
- A table that's genuinely populated by something other than dbt (e.g. a
|
|
169
|
+
separate streaming pipeline) and has no source model on purpose → add
|
|
170
|
+
it to `mapping.exclude`. `table=True` alone is **not** used to infer
|
|
171
|
+
this automatically — see [FastAPI side](#fastapi-side) for why.
|
|
172
|
+
|
|
173
|
+
8. **For accurate type-checking** (not just column-presence checks), run
|
|
174
|
+
`dbt docs generate` before `validate` so it picks up `catalog.json` (Tier 1,
|
|
175
|
+
real warehouse types) instead of inferring from SQL text — see
|
|
176
|
+
[How extraction works](#-how-extraction-works-and-why-its-accurate) below.
|
|
177
|
+
|
|
104
178
|
### One-off validation (no config file)
|
|
105
179
|
|
|
106
180
|
```bash
|
|
@@ -132,13 +206,25 @@ CI job.
|
|
|
132
206
|
|
|
133
207
|
> 💡 **Tip:** run `dbt docs generate` in CI before validating to unlock Tier 1
|
|
134
208
|
> (real types). Without it, you still get accurate column-presence checks from
|
|
135
|
-
> Tier 2.
|
|
209
|
+
> Tier 2. The workflow `init` generates includes this step already, commented
|
|
210
|
+
> out — it needs your warehouse adapter and credentials filled in, which
|
|
211
|
+
> can't be guessed, so it isn't active by default.
|
|
136
212
|
|
|
137
213
|
### FastAPI side
|
|
138
214
|
|
|
139
215
|
Pydantic / SQLModel classes are parsed from source with Python's `ast` (no
|
|
140
|
-
imports executed). `Optional[...]` controls whether a field is required
|
|
141
|
-
`
|
|
216
|
+
imports executed). `Optional[...]` controls whether a field is required.
|
|
217
|
+
An explicit `__tablename__` is used as the table name when present;
|
|
218
|
+
otherwise the class name is converted to `snake_case`.
|
|
219
|
+
|
|
220
|
+
`table=True` SQLModel classes are validated the same as any other class —
|
|
221
|
+
they are **not** skipped. Whether a table is meant to come from dbt is
|
|
222
|
+
business knowledge that isn't recoverable from the Python source: two
|
|
223
|
+
structurally identical `table=True` classes can need opposite treatment (one
|
|
224
|
+
is a normal dbt-fed table your API also returns directly; another is
|
|
225
|
+
populated by a Kafka stream and was never meant to have a dbt model). Use
|
|
226
|
+
`mapping.exclude` to state the latter case explicitly rather than relying on
|
|
227
|
+
`table=True` to imply it.
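The table-name rule described above (explicit `__tablename__` first, otherwise the class name in `snake_case`) can be sketched as below. The helper names are hypothetical, and the regex is one common way to do the conversion rather than necessarily the one the package uses; note that it splits acronyms letter by letter.

```python
import re


def to_snake_case(class_name: str) -> str:
    """Insert "_" before every capital except the first, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


def resolve_table_name(class_name: str, tablename=None) -> str:
    """Prefer an explicit __tablename__ value; fall back to the class name."""
    return tablename if tablename else to_snake_case(class_name)


print(resolve_table_name("VideoViewed"))                              # video_viewed
print(resolve_table_name("VideoViewed", "int_unified_video_viewed"))  # int_unified_video_viewed
```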
|
|
142
228
|
|
|
143
229
|
## 🚦 What gets flagged
|
|
144
230
|
|
|
@@ -193,12 +279,76 @@ mapping:
|
|
|
193
279
|
user_analytics:
|
|
194
280
|
# target column : source column
|
|
195
281
|
userId: user_id
|
|
282
|
+
# Target tables with no source model on purpose (e.g. Kafka-populated,
|
|
283
|
+
# not dbt) -- see "When do I need mapping?" below.
|
|
284
|
+
exclude:
|
|
285
|
+
- feed_interaction
|
|
196
286
|
|
|
197
287
|
validation:
|
|
198
288
|
fail_on: ["missing_tables", "missing_required_columns"]
|
|
199
289
|
warn_on: ["type_mismatches", "missing_optional_columns"]
|
|
200
290
|
```
|
|
201
291
|
|
|
292
|
+
### Private GitHub repos need `GITHUB_TOKEN`
|
|
293
|
+
|
|
294
|
+
If `target.*.repo` points at a private repository, `contract-validator`
|
|
295
|
+
needs a token with read access to it. Where that token comes from is
|
|
296
|
+
different locally vs. in CI — and the CI case has a sharp edge worth
|
|
297
|
+
understanding before it silently fails on a PR.
|
|
298
|
+
|
|
299
|
+
**Locally**, set the `GITHUB_TOKEN` environment variable before running the
|
|
300
|
+
CLI. On bash/zsh that's `export` (there's nothing to install — `export` just
|
|
301
|
+
makes the variable visible to the `contract-validator` process you run
|
|
302
|
+
next):
|
|
303
|
+
|
|
304
|
+
```bash
|
|
305
|
+
export GITHUB_TOKEN=$(gh auth token) # or a PAT with repo read access
|
|
306
|
+
contract-validator validate
|
|
307
|
+
```
|
|
308
|
+
|
|
309
|
+
GitHub's API 404s (not 403s) an unauthenticated request to a private path,
|
|
310
|
+
so without a token this looks identical to a plain typo in `path` —
|
|
311
|
+
`contract-validator init --interactive` and `contract-validator test` both
|
|
312
|
+
check `target.*.path` actually exists and will point you at this if the
|
|
313
|
+
lookup 404s with no token set.
|
|
314
|
+
|
|
315
|
+
**In CI**, the workflow `init` generates for a GitHub target wires up
|
|
316
|
+
`GITHUB_TOKEN: ${{ secrets.API_REPO_TOKEN }}` — a token **you** create,
|
|
317
|
+
*not* the auto-provided `secrets.GITHUB_TOKEN`. That auto-provided token
|
|
318
|
+
only has access to **the repository the workflow is running in**, so if
|
|
319
|
+
your dbt repo and your API repo are different repos, it silently can't read
|
|
320
|
+
the target the first time that target is private — and a PAT works
|
|
321
|
+
identically for a public target too, so there's no reason to default to the
|
|
322
|
+
token that only sometimes works. To finish the setup the generated workflow
|
|
323
|
+
expects:
|
|
324
|
+
|
|
325
|
+
1. Create a token with read access to the *target* repo — a
|
|
326
|
+
[fine-grained PAT](https://github.com/settings/personal-access-tokens/new)
|
|
327
|
+
scoped to just that repo's Contents (read-only) is the least-privilege
|
|
328
|
+
option; a classic PAT with the `repo` scope also works.
|
|
329
|
+
2. In the repo running the workflow (your dbt repo): **Settings → Secrets
|
|
330
|
+
and variables → Actions → New repository secret**. Name it
|
|
331
|
+
**`API_REPO_TOKEN`** exactly (that's the name the generated workflow
|
|
332
|
+
already references) and paste the token as the value.
|
|
333
|
+
|
|
334
|
+
> ⚠️ **GitHub rejects any secret name starting with `GITHUB_`** — it's a
|
|
335
|
+
> reserved prefix. You cannot create a secret literally called
|
|
336
|
+
> `GITHUB_TOKEN`; that's not a naming suggestion, the UI will refuse it.
|
|
337
|
+
> That's exactly why the workflow's secret is named `API_REPO_TOKEN`
|
|
338
|
+
> instead, even though the environment variable it feeds is `GITHUB_TOKEN`
|
|
339
|
+
> — two different things with confusingly similar names:
|
|
340
|
+
> ```yaml
|
|
341
|
+
> env:
|
|
342
|
+
> GITHUB_TOKEN: ${{ secrets.API_REPO_TOKEN }}
|
|
343
|
+
> # ^^^^^^^^^^^ local variable name, can be anything -- the CLI
|
|
344
|
+
> # just needs it called GITHUB_TOKEN to find it
|
|
345
|
+
> # ^^^^^^^^^^^^^^ the *secret's* name --
|
|
346
|
+
> # this is what GitHub restricts
|
|
347
|
+
> ```
|
|
348
|
+
|
|
349
|
+
Skip all of this for a `local` target — `init` omits the whole `env:` block
|
|
350
|
+
since a local target never talks to the GitHub API at all.
|
|
351
|
+
|
|
202
352
|
### When do I need `mapping`?
|
|
203
353
|
|
|
204
354
|
Most of the time you don't. Names are matched automatically across:
|
|
@@ -206,8 +356,26 @@ Most of the time you don't. Names are matched automatically across:
|
|
|
206
356
|
- **plural ↔ singular** — dbt's plural `users` matches Pydantic's `User` (→ `user`)
|
|
207
357
|
with no config (and it won't over-match — `address` is never confused with `addres`).
|
|
208
358
|
|
|
209
|
-
Reach for `mapping` only when a model or column is
|
|
210
|
-
convention can't bridge it (e.g. Pydantic
|
|
359
|
+
Reach for `mapping.tables` / `mapping.columns` only when a model or column is
|
|
360
|
+
named so differently that convention can't bridge it (e.g. Pydantic
|
|
361
|
+
`user_id` ↔ dbt `customer_identifier`).
|
|
362
|
+
|
|
363
|
+
`mapping.exclude` is different — it's not about renamed models, it's for a
|
|
364
|
+
target table that has **no source model on purpose**, because it's
|
|
365
|
+
populated by something other than dbt (a Kafka stream, a cron job, etc.).
|
|
366
|
+
This can't be inferred from the code (a `table=True` SQLModel class looks
|
|
367
|
+
identical whether or not dbt is supposed to feed it), so it has to be a
|
|
368
|
+
deliberate, human-stated exception:
|
|
369
|
+
|
|
370
|
+
```yaml
|
|
371
|
+
mapping:
|
|
372
|
+
  exclude:
|
|
373
|
+
    - feed_interaction
|
|
374
|
+
    - affiliate_reward
|
|
375
|
+
```
|
|
376
|
+
|
|
377
|
+
Anything not listed is validated normally — including `table=True` classes,
|
|
378
|
+
which are treated the same as any other target and are not silently skipped.
|
|
211
379
|
|
|
212
380
|
## 🐍 Python API
|
|
213
381
|
|
|
@@ -251,9 +419,16 @@ jobs:
|
|
|
251
419
|
# Optional: `dbt docs generate` here for real warehouse types (Tier 1)
|
|
252
420
|
- run: contract-validator validate --output github
|
|
253
421
|
env:
|
|
254
|
-
GITHUB_TOKEN: ${{ secrets.
|
|
422
|
+
GITHUB_TOKEN: ${{ secrets.API_REPO_TOKEN }}
|
|
255
423
|
```
|
|
256
424
|
|
|
425
|
+
`GITHUB_TOKEN` here is only needed if `target` is a `github` repo (`init`
|
|
426
|
+
omits the whole `env:` block for a `local` target). `secrets.API_REPO_TOKEN`
|
|
427
|
+
is a token you create yourself, not GitHub's auto-provided
|
|
428
|
+
`secrets.GITHUB_TOKEN` — see
|
|
429
|
+
[Private GitHub repos need `GITHUB_TOKEN`](#private-github-repos-need-github_token)
|
|
430
|
+
above for why, and how to set it up.
|
|
431
|
+
|
|
257
432
|
### Pre-commit
|
|
258
433
|
|
|
259
434
|
```bash
|
|
@@ -51,6 +51,80 @@ contract-validator test
|
|
|
51
51
|
contract-validator validate
|
|
52
52
|
```
|
|
53
53
|
|
|
54
|
+
## 🚀 Getting started, step by step
|
|
55
|
+
|
|
56
|
+
If you're setting this up on a project for the first time, the order below
|
|
57
|
+
avoids the sharp edges:
|
|
58
|
+
|
|
59
|
+
1. **Install into the same environment dbt runs in** (not a separate venv) —
|
|
60
|
+
the tool needs to see your dbt project:
|
|
61
|
+
```bash
|
|
62
|
+
pip install data-contract-validator
|
|
63
|
+
```
|
|
64
|
+
Already have `.retl-validator.yml` committed by a teammate? Skip to step 5.
|
|
65
|
+
|
|
66
|
+
2. **Generate the config + CI workflow** (one-time):
|
|
67
|
+
```bash
|
|
68
|
+
contract-validator init --interactive
|
|
69
|
+
```
|
|
70
|
+
You'll be asked: where your dbt project is, which API framework you use,
|
|
71
|
+
whether your models live in this local project or a different GitHub
|
|
72
|
+
repo, and then the local path (or the `org/repo` + path within it). It's
|
|
73
|
+
asked explicitly rather than guessed from the path's shape — a local path
|
|
74
|
+
like `app/models` is syntactically identical to a GitHub `org/repo`
|
|
75
|
+
string, so there's no reliable way to infer which one you mean. If you
|
|
76
|
+
pick GitHub, it checks the path actually exists before writing the
|
|
77
|
+
config — so a typo surfaces here instead of at `validate` time.
|
|
78
|
+
|
|
79
|
+
`init` refuses to touch an existing `.retl-validator.yml` or workflow
|
|
80
|
+
file — it won't clobber hand-added `mapping` entries just because you
|
|
81
|
+
upgraded the package and re-ran `init`. Pass `--force` if you really want
|
|
82
|
+
to regenerate them from the new version's defaults.
|
|
83
|
+
|
|
84
|
+
3. **Pre-commit hook**: `init --interactive` asks whether you want one set
|
|
85
|
+
up right after creating the config and CI workflow — say yes there and
|
|
86
|
+
it's done. To add one later (or if you used non-interactive `init`,
|
|
87
|
+
which doesn't prompt), run it standalone:
|
|
88
|
+
```bash
|
|
89
|
+
contract-validator setup-precommit --install-hooks
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
4. **If the target repo is private, set a token** before running anything
|
|
93
|
+
that talks to GitHub locally:
|
|
94
|
+
```bash
|
|
95
|
+
export GITHUB_TOKEN=$(gh auth token) # or a PAT with repo read access
|
|
96
|
+
```
|
|
97
|
+
See [Private GitHub repos need `GITHUB_TOKEN`](#private-github-repos-need-github_token) below for why this is easy to miss.
|
|
98
|
+
|
|
99
|
+
5. **Sanity-check the setup**:
|
|
100
|
+
```bash
|
|
101
|
+
contract-validator test
|
|
102
|
+
```
|
|
103
|
+
Confirms the config parses, the dbt project is found, and the target
|
|
104
|
+
(local path or GitHub path) is reachable. If this fails, `validate` will
|
|
105
|
+
fail the same way — fix it here first.
|
|
106
|
+
|
|
107
|
+
6. **Run it**:
|
|
108
|
+
```bash
|
|
109
|
+
contract-validator validate
|
|
110
|
+
```
|
|
111
|
+
|
|
112
|
+
7. **When it reports a critical issue, diagnose before assuming your dbt
|
|
113
|
+
model is wrong**:
|
|
114
|
+
- Real missing column/table → fix the dbt model.
|
|
115
|
+
- Target name doesn't match the dbt model by convention (renamed/prefixed)
|
|
116
|
+
→ add an entry under `mapping.tables` in `.retl-validator.yml` (see
|
|
117
|
+
[When do I need `mapping`?](#when-do-i-need-mapping)).
|
|
118
|
+
- A table that's genuinely populated by something other than dbt (e.g. a
|
|
119
|
+
separate streaming pipeline) and has no source model on purpose → add
|
|
120
|
+
it to `mapping.exclude`. `table=True` alone is **not** used to infer
|
|
121
|
+
this automatically — see [FastAPI side](#fastapi-side) for why.
|
|
122
|
+
|
|
123
|
+
8. **For accurate type-checking** (not just column-presence checks), run
|
|
124
|
+
`dbt docs generate` before `validate` so it picks up `catalog.json` (Tier 1,
|
|
125
|
+
real warehouse types) instead of inferring from SQL text — see
|
|
126
|
+
[How extraction works](#-how-extraction-works-and-why-its-accurate) below.
|
|
127
|
+
|
|
54
128
|
### One-off validation (no config file)
|
|
55
129
|
|
|
56
130
|
```bash
|
|
@@ -82,13 +156,25 @@ CI job.
|
|
|
82
156
|
|
|
83
157
|
> 💡 **Tip:** run `dbt docs generate` in CI before validating to unlock Tier 1
|
|
84
158
|
> (real types). Without it, you still get accurate column-presence checks from
|
|
85
|
-
> Tier 2.
|
|
159
|
+
> Tier 2. The workflow `init` generates includes this step already, commented
|
|
160
|
+
> out — it needs your warehouse adapter and credentials filled in, which
|
|
161
|
+
> can't be guessed, so it isn't active by default.
|
|
86
162
|
|
|
87
163
|
### FastAPI side
|
|
88
164
|
|
|
89
165
|
Pydantic / SQLModel classes are parsed from source with Python's `ast` (no
|
|
90
|
-
imports executed). `Optional[...]` controls whether a field is required
|
|
91
|
-
`
|
|
166
|
+
imports executed). `Optional[...]` controls whether a field is required.
|
|
167
|
+
An explicit `__tablename__` is used as the table name when present;
|
|
168
|
+
otherwise the class name is converted to `snake_case`.
|
|
169
|
+
|
|
170
|
+
`table=True` SQLModel classes are validated the same as any other class —
|
|
171
|
+
they are **not** skipped. Whether a table is meant to come from dbt is
|
|
172
|
+
business knowledge that isn't recoverable from the Python source: two
|
|
173
|
+
structurally identical `table=True` classes can need opposite treatment (one
|
|
174
|
+
is a normal dbt-fed table your API also returns directly; another is
|
|
175
|
+
populated by a Kafka stream and was never meant to have a dbt model). Use
|
|
176
|
+
`mapping.exclude` to state the latter case explicitly rather than relying on
|
|
177
|
+
`table=True` to imply it.
|
|
92
178
|
|
|
93
179
|
## 🚦 What gets flagged
|
|
94
180
|
|
|
@@ -143,12 +229,76 @@ mapping:
|
|
|
143
229
|
user_analytics:
|
|
144
230
|
# target column : source column
|
|
145
231
|
userId: user_id
|
|
232
|
+
# Target tables with no source model on purpose (e.g. Kafka-populated,
|
|
233
|
+
# not dbt) -- see "When do I need mapping?" below.
|
|
234
|
+
exclude:
|
|
235
|
+
- feed_interaction
|
|
146
236
|
|
|
147
237
|
validation:
|
|
148
238
|
fail_on: ["missing_tables", "missing_required_columns"]
|
|
149
239
|
warn_on: ["type_mismatches", "missing_optional_columns"]
|
|
150
240
|
```
|
|
151
241
|
|
|
242
|
+
### Private GitHub repos need `GITHUB_TOKEN`
|
|
243
|
+
|
|
244
|
+
If `target.*.repo` points at a private repository, `contract-validator`
|
|
245
|
+
needs a token with read access to it. Where that token comes from is
|
|
246
|
+
different locally vs. in CI — and the CI case has a sharp edge worth
|
|
247
|
+
understanding before it silently fails on a PR.
|
|
248
|
+
|
|
249
|
+
**Locally**, set the `GITHUB_TOKEN` environment variable before running the
|
|
250
|
+
CLI. On bash/zsh that's `export` (there's nothing to install — `export` just
|
|
251
|
+
makes the variable visible to the `contract-validator` process you run
|
|
252
|
+
next):
|
|
253
|
+
|
|
254
|
+
```bash
|
|
255
|
+
export GITHUB_TOKEN=$(gh auth token) # or a PAT with repo read access
|
|
256
|
+
contract-validator validate
|
|
257
|
+
```
|
|
258
|
+
|
|
259
|
+
GitHub's API 404s (not 403s) an unauthenticated request to a private path,
|
|
260
|
+
so without a token this looks identical to a plain typo in `path` —
|
|
261
|
+
`contract-validator init --interactive` and `contract-validator test` both
|
|
262
|
+
check `target.*.path` actually exists and will point you at this if the
|
|
263
|
+
lookup 404s with no token set.
|
|
264
|
+
|
|
265
|
+
**In CI**, the workflow `init` generates for a GitHub target wires up
|
|
266
|
+
`GITHUB_TOKEN: ${{ secrets.API_REPO_TOKEN }}` — a token **you** create,
|
|
267
|
+
*not* the auto-provided `secrets.GITHUB_TOKEN`. That auto-provided token
|
|
268
|
+
only has access to **the repository the workflow is running in**, so if
|
|
269
|
+
your dbt repo and your API repo are different repos, it silently can't read
|
|
270
|
+
the target the first time that target is private — and a PAT works
|
|
271
|
+
identically for a public target too, so there's no reason to default to the
|
|
272
|
+
token that only sometimes works. To finish the setup the generated workflow
|
|
273
|
+
expects:
|
|
274
|
+
|
|
275
|
+
1. Create a token with read access to the *target* repo — a
|
|
276
|
+
[fine-grained PAT](https://github.com/settings/personal-access-tokens/new)
|
|
277
|
+
scoped to just that repo's Contents (read-only) is the least-privilege
|
|
278
|
+
option; a classic PAT with the `repo` scope also works.
|
|
279
|
+
2. In the repo running the workflow (your dbt repo): **Settings → Secrets
|
|
280
|
+
and variables → Actions → New repository secret**. Name it
|
|
281
|
+
**`API_REPO_TOKEN`** exactly (that's the name the generated workflow
|
|
282
|
+
already references) and paste the token as the value.
|
|
283
|
+
|
|
284
|
+
> ⚠️ **GitHub rejects any secret name starting with `GITHUB_`** — it's a
|
|
285
|
+
> reserved prefix. You cannot create a secret literally called
|
|
286
|
+
> `GITHUB_TOKEN`; that's not a naming suggestion, the UI will refuse it.
|
|
287
|
+
> That's exactly why the workflow's secret is named `API_REPO_TOKEN`
|
|
288
|
+
> instead, even though the environment variable it feeds is `GITHUB_TOKEN`
|
|
289
|
+
> — two different things with confusingly similar names:
|
|
290
|
+
> ```yaml
|
|
291
|
+
> env:
|
|
292
|
+
> GITHUB_TOKEN: ${{ secrets.API_REPO_TOKEN }}
|
|
293
|
+
> # ^^^^^^^^^^^ local variable name, can be anything -- the CLI
|
|
294
|
+
> # just needs it called GITHUB_TOKEN to find it
|
|
295
|
+
> # ^^^^^^^^^^^^^^ the *secret's* name --
|
|
296
|
+
> # this is what GitHub restricts
|
|
297
|
+
> ```
|
|
298
|
+
|
|
299
|
+
Skip all of this for a `local` target — `init` omits the whole `env:` block
|
|
300
|
+
since a local target never talks to the GitHub API at all.
|
|
301
|
+
|
|
152
302
|
### When do I need `mapping`?
|
|
153
303
|
|
|
154
304
|
Most of the time you don't. Names are matched automatically across:
|
|
@@ -156,8 +306,26 @@ Most of the time you don't. Names are matched automatically across:
|
|
|
156
306
|
- **plural ↔ singular** — dbt's plural `users` matches Pydantic's `User` (→ `user`)
|
|
157
307
|
with no config (and it won't over-match — `address` is never confused with `addres`).
|
|
158
308
|
|
|
159
|
-
Reach for `mapping` only when a model or column is
|
|
160
|
-
convention can't bridge it (e.g. Pydantic
|
|
309
|
+
Reach for `mapping.tables` / `mapping.columns` only when a model or column is
|
|
310
|
+
named so differently that convention can't bridge it (e.g. Pydantic
|
|
311
|
+
`user_id` ↔ dbt `customer_identifier`).
|
|
312
|
+
|
|
313
|
+
`mapping.exclude` is different — it's not about renamed models, it's for a
|
|
314
|
+
target table that has **no source model on purpose**, because it's
|
|
315
|
+
populated by something other than dbt (a Kafka stream, a cron job, etc.).
|
|
316
|
+
This can't be inferred from the code (a `table=True` SQLModel class looks
|
|
317
|
+
identical whether or not dbt is supposed to feed it), so it has to be a
|
|
318
|
+
deliberate, human-stated exception:
|
|
319
|
+
|
|
320
|
+
```yaml
|
|
321
|
+
mapping:
|
|
322
|
+
  exclude:
|
|
323
|
+
    - feed_interaction
|
|
324
|
+
    - affiliate_reward
|
|
325
|
+
```
|
|
326
|
+
|
|
327
|
+
Anything not listed is validated normally — including `table=True` classes,
|
|
328
|
+
which are treated the same as any other target and are not silently skipped.
|
|
161
329
|
|
|
162
330
|
## 🐍 Python API
|
|
163
331
|
|
|
@@ -201,9 +369,16 @@ jobs:
|
|
|
201
369
|
# Optional: `dbt docs generate` here for real warehouse types (Tier 1)
|
|
202
370
|
- run: contract-validator validate --output github
|
|
203
371
|
env:
|
|
204
|
-
GITHUB_TOKEN: ${{ secrets.
|
|
372
|
+
GITHUB_TOKEN: ${{ secrets.API_REPO_TOKEN }}
|
|
205
373
|
```
|
|
206
374
|
|
|
375
|
+
`GITHUB_TOKEN` here is only needed if `target` is a `github` repo (`init`
|
|
376
|
+
omits the whole `env:` block for a `local` target). `secrets.API_REPO_TOKEN`
|
|
377
|
+
is a token you create yourself, not GitHub's auto-provided
|
|
378
|
+
`secrets.GITHUB_TOKEN` — see
|
|
379
|
+
[Private GitHub repos need `GITHUB_TOKEN`](#private-github-repos-need-github_token)
|
|
380
|
+
above for why, and how to set it up.
|
|
381
|
+
|
|
207
382
|
### Pre-commit
|
|
208
383
|
|
|
209
384
|
```bash
|