data-contract-validator 1.1.1__tar.gz → 1.1.7__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (27)
  1. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/CHANGELOG.md +103 -0
  2. {data_contract_validator-1.1.1/data_contract_validator.egg-info → data_contract_validator-1.1.7}/PKG-INFO +182 -7
  3. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/README.md +181 -6
  4. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/__init__.py +1 -1
  5. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/cli.py +214 -49
  6. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/core/types.py +6 -1
  7. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/core/validator.py +14 -0
  8. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/extractors/fastapi.py +62 -22
  9. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7/data_contract_validator.egg-info}/PKG-INFO +182 -7
  10. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/pyproject.toml +1 -1
  11. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/LICENSE +0 -0
  12. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/MANIFEST.in +0 -0
  13. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/core/__init__.py +0 -0
  14. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/core/models.py +0 -0
  15. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/extractors/__init__.py +0 -0
  16. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/extractors/base.py +0 -0
  17. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/extractors/dbt.py +0 -0
  18. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/integrations/__init__.py +0 -0
  19. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/py.typed +0 -0
  20. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator/templates/github-actions-template.yml +0 -0
  21. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator.egg-info/SOURCES.txt +0 -0
  22. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator.egg-info/dependency_links.txt +0 -0
  23. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator.egg-info/entry_points.txt +0 -0
  24. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator.egg-info/requires.txt +0 -0
  25. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/data_contract_validator.egg-info/top_level.txt +0 -0
  26. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/requirements.txt +0 -0
  27. {data_contract_validator-1.1.1 → data_contract_validator-1.1.7}/setup.cfg +0 -0
@@ -7,6 +7,109 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [1.1.7] - 2026-07-04
+
+ ### Changed
+ - **The generated CI workflow now defaults `GITHUB_TOKEN` to
+ `secrets.API_REPO_TOKEN`** — a token you create yourself — instead of the
+ auto-provided `secrets.GITHUB_TOKEN`, which only has access to the repo
+ the workflow runs in. A personal access token works identically for
+ public and private targets, so this removes the silent-failure case
+ entirely rather than just documenting it (1.1.6's fix). Still skipped
+ entirely for a `local` target.
+
+ ### Added
+ - The generated CI workflow now includes a commented scaffold for
+ `dbt deps && dbt docs generate`, unlocking Tier 1 (real warehouse types)
+ in CI instead of that only being mentioned in prose docs. Commented out
+ by default since it needs the user's warehouse adapter and credentials
+ filled in, which can't be inferred.
+
+ ## [1.1.6] - 2026-07-03
+
+ ### Fixed
+ - **The generated CI workflow silently assumed the default
+ `secrets.GITHUB_TOKEN` could read the target API repo.** That token only
+ has access to the repo the workflow itself runs in — if `target.*.repo`
+ is a *different*, private repo, validation would fail on every PR with no
+ indication why. The generated workflow now documents the fix inline
+ (a personal-access-token secret pointed at that specific repo) and skips
+ the `GITHUB_TOKEN` env block entirely for a `local` target, which never
+ talks to the GitHub API at all.
+
+ ## [1.1.5] - 2026-07-03
+
+ ### Fixed
+ - **Python `int` was mapped to the narrower `INTEGER` canonical rank,
+ producing a false "type mismatch" warning against any dbt column typed
+ `bigint`** (a very common type for count/id columns). Python's `int` is
+ arbitrary-precision, unlike a fixed-width SQL `INTEGER` column, so there's
+ no real truncation risk — it's now mapped to the wider `BIGINT` rank.
+ A genuinely fractional source (`DECIMAL`/`FLOAT`) is still flagged.
+
+ ### Added
+ - `init --interactive` now offers to set up a pre-commit hook as part of the
+ same wizard, instead of requiring a separate `setup-precommit` invocation.
+ (The GitHub Actions CI workflow was already created automatically by
+ `init` for both the interactive and non-interactive paths — only the
+ pre-commit step needed folding in.)
+
+ ## [1.1.4] - 2026-07-03
+
+ ### Changed
+ - **`table=True` SQLModel classes are no longer skipped during extraction.**
+ Whether a table is meant to come from dbt is business knowledge that can't
+ be recovered from the Python source — two structurally identical
+ `table=True` classes can need opposite treatment (one is a normal dbt-fed
+ table an API also returns directly; another is populated by a separate
+ pipeline like Kafka and was never meant to have a dbt model). Blanket-
+ skipping every `table=True` class silently exempted the former case from
+ validation too, which is likely the more common pattern — defeating the
+ tool's purpose for it. `table=True` classes are now validated like any
+ other target.
+ - Added `mapping.exclude: [<table>, ...]` so the latter case (genuinely no
+ source model, e.g. Kafka-populated) can be stated explicitly instead of
+ inferred from `table=True`. Excluded tables are skipped entirely and never
+ produce a "missing table" issue.
+
+ ## [1.1.3] - 2026-07-03
+
+ Supersedes 1.1.2, which was only ever published to TestPyPI for verification
+ and never released to production PyPI.
+
+ ### Fixed
+ - **`table=True` SQLModel classes were incorrectly evaluated as required API
+ contracts.** The standard `class Foo(SQLModel, table=True)` syntax puts
+ `table=True` on the class definition's own keywords, not nested inside a
+ `Call` base — the skip check only looked in the latter, so DB-only tables
+ never matched and produced permanent, unfixable "missing table" criticals.
+ - **Explicit `__tablename__` is now resolved and used as the target table
+ name**, instead of only the class-name-derived guess. A class like
+ `VideoViewed` with `__tablename__ = "int_unified_video_viewed"` now matches
+ its real source model without needing a manual `mapping.tables` entry.
+ - **`init --interactive` no longer guesses local vs. GitHub from the path's
+ shape.** A local relative path like `app/models` (the wizard's own
+ suggested default) is syntactically identical to a GitHub `org/repo`
+ string, and was always guessed as a repo, producing a nonsensical
+ `app/models/app/models` GitHub target. The wizard now asks explicitly
+ ("local project or a different GitHub repo?") before asking for the
+ path, and asks for the repo and the path within it as separate prompts.
+
+ ### Added
+ - `init --interactive` and `contract-validator test` now verify a configured
+ GitHub target path actually exists via the GitHub API, instead of silently
+ accepting a stale or typo'd path.
+ - GitHub API error messages hint at setting `GITHUB_TOKEN` when an
+ unauthenticated 404 is ambiguous with a private repo.
+
+ ### Changed
+ - **`contract-validator init` no longer silently overwrites an existing
+ `.retl-validator.yml` or generated workflow file.** Re-running `init` (e.g.
+ after upgrading to pick up a newer version's config defaults) now refuses
+ and exits if either file already exists — pass `--force` to regenerate
+ them from scratch. Previously this was an unconditional overwrite with no
+ confirmation, which could silently destroy hand-added `mapping` entries.
+
  ## [1.1.1] - 2026-06-30
 
  ### Added
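The last 1.1.3 entry above (refuse to overwrite an existing `.retl-validator.yml` or workflow file unless `--force` is passed) is a common CLI pattern. The sketch below is a minimal illustration under stated assumptions, not the package's `cli.py`: `blocking_files` and `write_generated` are hypothetical helpers, and any file paths you pass in are placeholders. The property it shows is that the existence check runs before anything is written, so a refusal leaves every existing file untouched.

```python
import sys
from pathlib import Path
from typing import Dict, List


def blocking_files(paths: List[Path], force: bool) -> List[Path]:
    """Return files that already exist and would be overwritten (none when force=True)."""
    if force:
        return []
    return [p for p in paths if p.exists()]


def write_generated(files: Dict[Path, str], force: bool = False) -> int:
    """Hypothetical init step: write all generated files, or none of them.

    Returns a process exit code: 1 if any target file already exists and
    force is False (nothing is written), 0 after writing every file.
    """
    existing = blocking_files(list(files), force)
    if existing:
        for path in existing:
            print(f"{path} already exists; pass --force to regenerate it.", file=sys.stderr)
        return 1  # refuse before writing, so hand-added mapping entries survive
    for path, text in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)  # e.g. .github/workflows/
        path.write_text(text)
    return 0
```

Checking every path up front, rather than file by file, matches the changelog's "refuses and exits if either file already exists": regenerating only one of the pair would leave the config and workflow out of sync.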
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: data-contract-validator
- Version: 1.1.1
+ Version: 1.1.7
  Summary: Validate data contracts between dbt models and FastAPI/Pydantic APIs with accurate, low-false-positive schema checks
  Author-email: Ogunniran Siji <ogunniransiji@gmail.com>
  Maintainer-email: Ogunniran Siji <ogunniransiji@gmail.com>
@@ -101,6 +101,80 @@ contract-validator test
  contract-validator validate
  ```
 
+ ## 🚀 Getting started, step by step
+
+ If you're setting this up on a project for the first time, the order below
+ avoids the sharp edges:
+
+ 1. **Install into the same environment dbt runs in** (not a separate venv) —
+ the tool needs to see your dbt project:
+ ```bash
+ pip install data-contract-validator
+ ```
+ Already have `.retl-validator.yml` committed by a teammate? Skip to step 5.
+
+ 2. **Generate the config + CI workflow** (one-time):
+ ```bash
+ contract-validator init --interactive
+ ```
+ You'll be asked: where your dbt project is, which API framework you use,
+ whether your models live in this local project or a different GitHub
+ repo, and then the local path (or the `org/repo` + path within it). It's
+ asked explicitly rather than guessed from the path's shape — a local path
+ like `app/models` is syntactically identical to a GitHub `org/repo`
+ string, so there's no reliable way to infer which one you mean. If you
+ pick GitHub, it checks the path actually exists before writing the
+ config — so a typo surfaces here instead of at `validate` time.
+
+ `init` refuses to touch an existing `.retl-validator.yml` or workflow
+ file — it won't clobber hand-added `mapping` entries just because you
+ upgraded the package and re-ran `init`. Pass `--force` if you really want
+ to regenerate them from the new version's defaults.
+
+ 3. **Pre-commit hook**: `init --interactive` asks whether you want one set
+ up right after creating the config and CI workflow — say yes there and
+ it's done. To add one later (or if you used non-interactive `init`,
+ which doesn't prompt), run it standalone:
+ ```bash
+ contract-validator setup-precommit --install-hooks
+ ```
+
+ 4. **If the target repo is private, set a token** before running anything
+ that talks to GitHub locally:
+ ```bash
+ export GITHUB_TOKEN=$(gh auth token) # or a PAT with repo read access
+ ```
+ See [Private GitHub repos need `GITHUB_TOKEN`](#private-github-repos-need-github_token) below for why this is easy to miss.
+
+ 5. **Sanity-check the setup**:
+ ```bash
+ contract-validator test
+ ```
+ Confirms the config parses, the dbt project is found, and the target
+ (local path or GitHub path) is reachable. If this fails, `validate` will
+ fail the same way — fix it here first.
+
+ 6. **Run it**:
+ ```bash
+ contract-validator validate
+ ```
+
+ 7. **When it reports a critical issue, diagnose before assuming your dbt
+ model is wrong**:
+ - Real missing column/table → fix the dbt model.
+ - Target name doesn't match the dbt model by convention (renamed/prefixed)
+ → add an entry under `mapping.tables` in `.retl-validator.yml` (see
+ [When do I need `mapping`?](#when-do-i-need-mapping)).
+ - A table that's genuinely populated by something other than dbt (e.g. a
+ separate streaming pipeline) and has no source model on purpose → add
+ it to `mapping.exclude`. `table=True` alone is **not** used to infer
+ this automatically — see [FastAPI side](#fastapi-side) for why.
+
+ 8. **For accurate type-checking** (not just column-presence checks), run
+ `dbt docs generate` before `validate` so it picks up `catalog.json` (Tier 1,
+ real warehouse types) instead of inferring from SQL text — see
+ [How extraction works](#-how-extraction-works-and-why-its-accurate) below.
+
  ### One-off validation (no config file)
 
  ```bash
@@ -132,13 +206,25 @@ CI job.
 
  > 💡 **Tip:** run `dbt docs generate` in CI before validating to unlock Tier 1
  > (real types). Without it, you still get accurate column-presence checks from
- > Tier 2.
+ > Tier 2. The workflow `init` generates includes this step already, commented
+ > out — it needs your warehouse adapter and credentials filled in, which
+ > can't be guessed, so it isn't active by default.
 
  ### FastAPI side
 
  Pydantic / SQLModel classes are parsed from source with Python's `ast` (no
- imports executed). `Optional[...]` controls whether a field is required;
- `table=True` SQLModel classes (DB tables, not API contracts) are skipped.
+ imports executed). `Optional[...]` controls whether a field is required.
+ An explicit `__tablename__` is used as the table name when present;
+ otherwise the class name is converted to `snake_case`.
+
+ `table=True` SQLModel classes are validated the same as any other class —
+ they are **not** skipped. Whether a table is meant to come from dbt is
+ business knowledge that isn't recoverable from the Python source: two
+ structurally identical `table=True` classes can need opposite treatment (one
+ is a normal dbt-fed table your API also returns directly; another is
+ populated by a Kafka stream and was never meant to have a dbt model). Use
+ `mapping.exclude` to state the latter case explicitly rather than relying on
+ `table=True` to imply it.
 
  ## 🚦 What gets flagged
 
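The FastAPI-side rules in the hunk above (parse with `ast` without executing imports, prefer an explicit `__tablename__`, otherwise snake_case the class name) can be sketched with the standard library alone. This is a hypothetical illustration, not the package's `extractors/fastapi.py`: `snake_case`, `is_table_class`, and `resolve_table_name` are made-up names, and the snake_case conversion is deliberately naive (acronyms are not special-cased). `is_table_class` is included only to show where `table=True` lives in the syntax tree, which is the detail behind the 1.1.3 fix; since 1.1.4 the flag no longer causes a class to be skipped.

```python
import ast
import re

# Example source; it is parsed, never executed, so SQLModel/BaseModel need not exist.
SOURCE = '''
class VideoViewed(SQLModel, table=True):
    __tablename__ = "int_unified_video_viewed"
    id: int

class UserAnalytics(BaseModel):
    user_id: int
'''


def snake_case(name: str) -> str:
    """Naive CamelCase -> snake_case, e.g. VideoViewed -> video_viewed."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def is_table_class(node: ast.ClassDef) -> bool:
    """`class Foo(SQLModel, table=True)` stores table=True in the class's own keywords."""
    return any(
        kw.arg == "table" and isinstance(kw.value, ast.Constant) and kw.value.value is True
        for kw in node.keywords
    )


def resolve_table_name(node: ast.ClassDef) -> str:
    """Use a string `__tablename__` set in the class body, else snake_case(class name)."""
    for stmt in node.body:
        if isinstance(stmt, ast.Assign):
            targets = stmt.targets
        elif isinstance(stmt, ast.AnnAssign):  # also accept `__tablename__: str = "..."`
            targets = [stmt.target]
        else:
            continue
        named = any(isinstance(t, ast.Name) and t.id == "__tablename__" for t in targets)
        if named and isinstance(stmt.value, ast.Constant) and isinstance(stmt.value.value, str):
            return stmt.value.value
    return snake_case(node.name)


tree = ast.parse(SOURCE)
for cls in (n for n in tree.body if isinstance(n, ast.ClassDef)):
    print(cls.name, resolve_table_name(cls), is_table_class(cls))
```

Running it prints `VideoViewed int_unified_video_viewed True` and then `UserAnalytics user_analytics False`.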
@@ -193,12 +279,76 @@ mapping:
  user_analytics:
  # target column : source column
  userId: user_id
+ # Target tables with no source model on purpose (e.g. Kafka-populated,
+ # not dbt) -- see "When do I need mapping?" below.
+ exclude:
+ - feed_interaction
 
  validation:
  fail_on: ["missing_tables", "missing_required_columns"]
  warn_on: ["type_mismatches", "missing_optional_columns"]
  ```
 
+ ### Private GitHub repos need `GITHUB_TOKEN`
+
+ If `target.*.repo` points at a private repository, `contract-validator`
+ needs a token with read access to it. Where that token comes from is
+ different locally vs. in CI — and the CI case has a sharp edge worth
+ understanding before it silently fails on a PR.
+
+ **Locally**, set the `GITHUB_TOKEN` environment variable before running the
+ CLI. On bash/zsh that's `export` (there's nothing to install — `export` just
+ makes the variable visible to the `contract-validator` process you run
+ next):
+
+ ```bash
+ export GITHUB_TOKEN=$(gh auth token) # or a PAT with repo read access
+ contract-validator validate
+ ```
+
+ GitHub's API 404s (not 403s) an unauthenticated request to a private path,
+ so without a token this looks identical to a plain typo in `path` —
+ `contract-validator init --interactive` and `contract-validator test` both
+ check `target.*.path` actually exists and will point you at this if the
+ lookup 404s with no token set.
+
+ **In CI**, the workflow `init` generates for a GitHub target wires up
+ `GITHUB_TOKEN: ${{ secrets.API_REPO_TOKEN }}` — a token **you** create,
+ *not* the auto-provided `secrets.GITHUB_TOKEN`. That auto-provided token
+ only has access to **the repository the workflow is running in**, so if
+ your dbt repo and your API repo are different repos, it silently can't read
+ the target the first time that target is private — and a PAT works
+ identically for a public target too, so there's no reason to default to the
+ token that only sometimes works. To finish the setup the generated workflow
+ expects:
+
+ 1. Create a token with read access to the *target* repo — a
+ [fine-grained PAT](https://github.com/settings/personal-access-tokens/new)
+ scoped to just that repo's Contents (read-only) is the least-privilege
+ option; a classic PAT with the `repo` scope also works.
+ 2. In the repo running the workflow (your dbt repo): **Settings → Secrets
+ and variables → Actions → New repository secret**. Name it
+ **`API_REPO_TOKEN`** exactly (that's the name the generated workflow
+ already references) and paste the token as the value.
+
+ > ⚠️ **GitHub rejects any secret name starting with `GITHUB_`** — it's a
+ > reserved prefix. You cannot create a secret literally called
+ > `GITHUB_TOKEN`; that's not a naming suggestion, the UI will refuse it.
+ > That's exactly why the workflow's secret is named `API_REPO_TOKEN`
+ > instead, even though the environment variable it feeds is `GITHUB_TOKEN`
+ > — two different things with confusingly similar names:
+ > ```yaml
+ > env:
+ > GITHUB_TOKEN: ${{ secrets.API_REPO_TOKEN }}
+ > # ^^^^^^^^^^^ local variable name, can be anything -- the CLI
+ > # just needs it called GITHUB_TOKEN to find it
+ > # ^^^^^^^^^^^^^^ the *secret's* name --
+ > # this is what GitHub restricts
+ > ```
+
+ Skip all of this for a `local` target — `init` omits the whole `env:` block
+ since a local target never talks to the GitHub API at all.
+
  ### When do I need `mapping`?
 
  Most of the time you don't. Names are matched automatically across:
@@ -206,8 +356,26 @@ Most of the time you don't. Names are matched automatically across:
  - **plural ↔ singular** — dbt's plural `users` matches Pydantic's `User` (→ `user`)
  with no config (and it won't over-match — `address` is never confused with `addres`).
 
- Reach for `mapping` only when a model or column is named so differently that
- convention can't bridge it (e.g. Pydantic `user_id` ↔ dbt `customer_identifier`).
+ Reach for `mapping.tables` / `mapping.columns` only when a model or column is
+ named so differently that convention can't bridge it (e.g. Pydantic
+ `user_id` ↔ dbt `customer_identifier`).
+
+ `mapping.exclude` is different — it's not about renamed models, it's for a
+ target table that has **no source model on purpose**, because it's
+ populated by something other than dbt (a Kafka stream, a cron job, etc.).
+ This can't be inferred from the code (a `table=True` SQLModel class looks
+ identical whether or not dbt is supposed to feed it), so it has to be a
+ deliberate, human-stated exception:
+
+ ```yaml
+ mapping:
+ exclude:
+ - feed_interaction
+ - affiliate_reward
+ ```
+
+ Anything not listed is validated normally — including `table=True` classes,
+ which are treated the same as any other target and are not silently skipped.
 
  ## 🐍 Python API
 
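How `mapping.exclude` interacts with the missing-table check can be sketched as a filter applied before any lookup. This is an assumption-laden illustration, not the package's `core/validator.py`: `MappingConfig` and `missing_tables` are hypothetical, the direction of `tables` (target name to dbt model name) is assumed, the model name `fct_user_analytics` is invented, and the automatic plural/singular and case matching described in the hunk above is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class MappingConfig:
    """Hypothetical in-memory form of the `mapping:` block in .retl-validator.yml."""
    tables: Dict[str, str] = field(default_factory=dict)  # assumed: target table -> dbt model
    exclude: Set[str] = field(default_factory=set)  # targets with no source model on purpose


def missing_tables(targets: Iterable[str], source_models: Set[str], mapping: MappingConfig) -> List[str]:
    """Return target tables that have no matching dbt model after applying the mapping."""
    missing = []
    for target in targets:
        if target in mapping.exclude:
            continue  # skipped entirely: never reported as a missing table
        source = mapping.tables.get(target, target)  # explicit rename, else same name
        if source not in source_models:
            missing.append(target)
    return missing


mapping = MappingConfig(tables={"user_analytics": "fct_user_analytics"}, exclude={"feed_interaction"})
print(missing_tables(["user_analytics", "feed_interaction", "video_viewed"], {"fct_user_analytics"}, mapping))
# prints ['video_viewed']
```

Nothing in the sketch inspects `table=True`: only the explicit `exclude` list removes a target from the check, which is the point of the 1.1.4 change.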
@@ -251,9 +419,16 @@ jobs:
  # Optional: `dbt docs generate` here for real warehouse types (Tier 1)
  - run: contract-validator validate --output github
  env:
- GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+ GITHUB_TOKEN: ${{ secrets.API_REPO_TOKEN }}
  ```
 
+ `GITHUB_TOKEN` here is only needed if `target` is a `github` repo (`init`
+ omits the whole `env:` block for a `local` target). `secrets.API_REPO_TOKEN`
+ is a token you create yourself, not GitHub's auto-provided
+ `secrets.GITHUB_TOKEN` — see
+ [Private GitHub repos need `GITHUB_TOKEN`](#private-github-repos-need-github_token)
+ above for why, and how to set it up.
+
  ### Pre-commit
 
  ```bash
@@ -51,6 +51,80 @@ contract-validator test
  contract-validator validate
  ```
 
+ ## 🚀 Getting started, step by step
+
+ If you're setting this up on a project for the first time, the order below
+ avoids the sharp edges:
+
+ 1. **Install into the same environment dbt runs in** (not a separate venv) —
+ the tool needs to see your dbt project:
+ ```bash
+ pip install data-contract-validator
+ ```
+ Already have `.retl-validator.yml` committed by a teammate? Skip to step 5.
+
+ 2. **Generate the config + CI workflow** (one-time):
+ ```bash
+ contract-validator init --interactive
+ ```
+ You'll be asked: where your dbt project is, which API framework you use,
+ whether your models live in this local project or a different GitHub
+ repo, and then the local path (or the `org/repo` + path within it). It's
+ asked explicitly rather than guessed from the path's shape — a local path
+ like `app/models` is syntactically identical to a GitHub `org/repo`
+ string, so there's no reliable way to infer which one you mean. If you
+ pick GitHub, it checks the path actually exists before writing the
+ config — so a typo surfaces here instead of at `validate` time.
+
+ `init` refuses to touch an existing `.retl-validator.yml` or workflow
+ file — it won't clobber hand-added `mapping` entries just because you
+ upgraded the package and re-ran `init`. Pass `--force` if you really want
+ to regenerate them from the new version's defaults.
+
+ 3. **Pre-commit hook**: `init --interactive` asks whether you want one set
+ up right after creating the config and CI workflow — say yes there and
+ it's done. To add one later (or if you used non-interactive `init`,
+ which doesn't prompt), run it standalone:
+ ```bash
+ contract-validator setup-precommit --install-hooks
+ ```
+
+ 4. **If the target repo is private, set a token** before running anything
+ that talks to GitHub locally:
+ ```bash
+ export GITHUB_TOKEN=$(gh auth token) # or a PAT with repo read access
+ ```
+ See [Private GitHub repos need `GITHUB_TOKEN`](#private-github-repos-need-github_token) below for why this is easy to miss.
+
+ 5. **Sanity-check the setup**:
+ ```bash
+ contract-validator test
+ ```
+ Confirms the config parses, the dbt project is found, and the target
+ (local path or GitHub path) is reachable. If this fails, `validate` will
+ fail the same way — fix it here first.
+
+ 6. **Run it**:
+ ```bash
+ contract-validator validate
+ ```
+
+ 7. **When it reports a critical issue, diagnose before assuming your dbt
+ model is wrong**:
+ - Real missing column/table → fix the dbt model.
+ - Target name doesn't match the dbt model by convention (renamed/prefixed)
+ → add an entry under `mapping.tables` in `.retl-validator.yml` (see
+ [When do I need `mapping`?](#when-do-i-need-mapping)).
+ - A table that's genuinely populated by something other than dbt (e.g. a
+ separate streaming pipeline) and has no source model on purpose → add
+ it to `mapping.exclude`. `table=True` alone is **not** used to infer
+ this automatically — see [FastAPI side](#fastapi-side) for why.
+
+ 8. **For accurate type-checking** (not just column-presence checks), run
+ `dbt docs generate` before `validate` so it picks up `catalog.json` (Tier 1,
+ real warehouse types) instead of inferring from SQL text — see
+ [How extraction works](#-how-extraction-works-and-why-its-accurate) below.
+
  ### One-off validation (no config file)
 
  ```bash
@@ -82,13 +156,25 @@ CI job.
 
  > 💡 **Tip:** run `dbt docs generate` in CI before validating to unlock Tier 1
  > (real types). Without it, you still get accurate column-presence checks from
- > Tier 2.
+ > Tier 2. The workflow `init` generates includes this step already, commented
+ > out — it needs your warehouse adapter and credentials filled in, which
+ > can't be guessed, so it isn't active by default.
 
  ### FastAPI side
 
  Pydantic / SQLModel classes are parsed from source with Python's `ast` (no
- imports executed). `Optional[...]` controls whether a field is required;
- `table=True` SQLModel classes (DB tables, not API contracts) are skipped.
+ imports executed). `Optional[...]` controls whether a field is required.
+ An explicit `__tablename__` is used as the table name when present;
+ otherwise the class name is converted to `snake_case`.
+
+ `table=True` SQLModel classes are validated the same as any other class —
+ they are **not** skipped. Whether a table is meant to come from dbt is
+ business knowledge that isn't recoverable from the Python source: two
+ structurally identical `table=True` classes can need opposite treatment (one
+ is a normal dbt-fed table your API also returns directly; another is
+ populated by a Kafka stream and was never meant to have a dbt model). Use
+ `mapping.exclude` to state the latter case explicitly rather than relying on
+ `table=True` to imply it.
 
  ## 🚦 What gets flagged
 
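The 1.1.5 changelog fix (Python `int` mapped to a `BIGINT` rank instead of `INTEGER`) amounts to a narrowing check over ordered integer ranks, the kind of type comparison the Tier 1 tip above unlocks. A minimal sketch, assuming a hypothetical rank table and a hypothetical `is_mismatch` helper; the package's real canonical types, ranks, and warning rules may differ. The `py_int_as` parameter exists only to reproduce the pre-1.1.5 false positive for comparison.

```python
from typing import Dict

# Hypothetical canonical integer ranks: a larger number is a wider type.
INT_RANKS: Dict[str, int] = {"SMALLINT": 0, "INTEGER": 1, "BIGINT": 2}
FRACTIONAL = {"DECIMAL", "NUMERIC", "FLOAT", "DOUBLE"}

# Hypothetical Python-annotation -> canonical-type table. Python's int is
# arbitrary-precision, so it sits at the widest integer rank.
PYTHON_TO_CANONICAL: Dict[str, str] = {"int": "BIGINT", "float": "FLOAT", "str": "VARCHAR"}


def is_mismatch(source_sql_type: str, target_py_type: str, py_int_as: str = "BIGINT") -> bool:
    """Return True if a dbt (source) column could lose data in an API (target) field.

    py_int_as="INTEGER" reproduces the narrower, pre-1.1.5 mapping for comparison.
    """
    table = dict(PYTHON_TO_CANONICAL, int=py_int_as)
    source = source_sql_type.upper()
    target = table.get(target_py_type)
    if target not in INT_RANKS:
        return False  # this sketch only checks integer targets
    if source in FRACTIONAL:
        return True  # fractional values would be truncated to a whole number
    if source in INT_RANKS:
        return INT_RANKS[source] > INT_RANKS[target]  # flag narrowing only
    return False
```

With `py_int_as="INTEGER"`, `is_mismatch("bigint", "int", py_int_as="INTEGER")` returns `True`, the spurious warning the changelog describes; with the default `BIGINT` rank it returns `False`, while a `DECIMAL` or `FLOAT` source is still flagged.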
@@ -143,12 +229,76 @@ mapping:
  user_analytics:
  # target column : source column
  userId: user_id
+ # Target tables with no source model on purpose (e.g. Kafka-populated,
+ # not dbt) -- see "When do I need mapping?" below.
+ exclude:
+ - feed_interaction
 
  validation:
  fail_on: ["missing_tables", "missing_required_columns"]
  warn_on: ["type_mismatches", "missing_optional_columns"]
  ```
 
+ ### Private GitHub repos need `GITHUB_TOKEN`
+
+ If `target.*.repo` points at a private repository, `contract-validator`
+ needs a token with read access to it. Where that token comes from is
+ different locally vs. in CI — and the CI case has a sharp edge worth
+ understanding before it silently fails on a PR.
+
+ **Locally**, set the `GITHUB_TOKEN` environment variable before running the
+ CLI. On bash/zsh that's `export` (there's nothing to install — `export` just
+ makes the variable visible to the `contract-validator` process you run
+ next):
+
+ ```bash
+ export GITHUB_TOKEN=$(gh auth token) # or a PAT with repo read access
+ contract-validator validate
+ ```
+
+ GitHub's API 404s (not 403s) an unauthenticated request to a private path,
+ so without a token this looks identical to a plain typo in `path` —
+ `contract-validator init --interactive` and `contract-validator test` both
+ check `target.*.path` actually exists and will point you at this if the
+ lookup 404s with no token set.
+
+ **In CI**, the workflow `init` generates for a GitHub target wires up
+ `GITHUB_TOKEN: ${{ secrets.API_REPO_TOKEN }}` — a token **you** create,
+ *not* the auto-provided `secrets.GITHUB_TOKEN`. That auto-provided token
+ only has access to **the repository the workflow is running in**, so if
+ your dbt repo and your API repo are different repos, it silently can't read
+ the target the first time that target is private — and a PAT works
+ identically for a public target too, so there's no reason to default to the
+ token that only sometimes works. To finish the setup the generated workflow
+ expects:
+
+ 1. Create a token with read access to the *target* repo — a
+ [fine-grained PAT](https://github.com/settings/personal-access-tokens/new)
+ scoped to just that repo's Contents (read-only) is the least-privilege
+ option; a classic PAT with the `repo` scope also works.
+ 2. In the repo running the workflow (your dbt repo): **Settings → Secrets
+ and variables → Actions → New repository secret**. Name it
+ **`API_REPO_TOKEN`** exactly (that's the name the generated workflow
+ already references) and paste the token as the value.
+
+ > ⚠️ **GitHub rejects any secret name starting with `GITHUB_`** — it's a
+ > reserved prefix. You cannot create a secret literally called
+ > `GITHUB_TOKEN`; that's not a naming suggestion, the UI will refuse it.
+ > That's exactly why the workflow's secret is named `API_REPO_TOKEN`
+ > instead, even though the environment variable it feeds is `GITHUB_TOKEN`
+ > — two different things with confusingly similar names:
+ > ```yaml
+ > env:
+ > GITHUB_TOKEN: ${{ secrets.API_REPO_TOKEN }}
+ > # ^^^^^^^^^^^ local variable name, can be anything -- the CLI
+ > # just needs it called GITHUB_TOKEN to find it
+ > # ^^^^^^^^^^^^^^ the *secret's* name --
+ > # this is what GitHub restricts
+ > ```
+
+ Skip all of this for a `local` target — `init` omits the whole `env:` block
+ since a local target never talks to the GitHub API at all.
+
  ### When do I need `mapping`?
 
  Most of the time you don't. Names are matched automatically across:
@@ -156,8 +306,26 @@ Most of the time you don't. Names are matched automatically across:
  - **plural ↔ singular** — dbt's plural `users` matches Pydantic's `User` (→ `user`)
  with no config (and it won't over-match — `address` is never confused with `addres`).
 
- Reach for `mapping` only when a model or column is named so differently that
- convention can't bridge it (e.g. Pydantic `user_id` ↔ dbt `customer_identifier`).
+ Reach for `mapping.tables` / `mapping.columns` only when a model or column is
+ named so differently that convention can't bridge it (e.g. Pydantic
+ `user_id` ↔ dbt `customer_identifier`).
+
+ `mapping.exclude` is different — it's not about renamed models, it's for a
+ target table that has **no source model on purpose**, because it's
+ populated by something other than dbt (a Kafka stream, a cron job, etc.).
+ This can't be inferred from the code (a `table=True` SQLModel class looks
+ identical whether or not dbt is supposed to feed it), so it has to be a
+ deliberate, human-stated exception:
+
+ ```yaml
+ mapping:
+ exclude:
+ - feed_interaction
+ - affiliate_reward
+ ```
+
+ Anything not listed is validated normally — including `table=True` classes,
+ which are treated the same as any other target and are not silently skipped.
 
  ## 🐍 Python API
 
@@ -201,9 +369,16 @@ jobs:
  # Optional: `dbt docs generate` here for real warehouse types (Tier 1)
  - run: contract-validator validate --output github
  env:
- GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+ GITHUB_TOKEN: ${{ secrets.API_REPO_TOKEN }}
  ```
 
+ `GITHUB_TOKEN` here is only needed if `target` is a `github` repo (`init`
+ omits the whole `env:` block for a `local` target). `secrets.API_REPO_TOKEN`
+ is a token you create yourself, not GitHub's auto-provided
+ `secrets.GITHUB_TOKEN` — see
+ [Private GitHub repos need `GITHUB_TOKEN`](#private-github-repos-need-github_token)
+ above for why, and how to set it up.
+
  ### Pre-commit
 
  ```bash
@@ -5,7 +5,7 @@ Prevent production API breaks by validating data contracts between
  your data pipelines and API frameworks.
  """
 
- __version__ = "1.1.1"
+ __version__ = "1.1.7"
  __author__ = "Ogunniran Siji"
  __email__ = "ogunniransiji@gmail.com"
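Several changes in this release (the 1.1.3 path check in `init --interactive` and `contract-validator test`, the token hint on an ambiguous 404, and the private-repo token docs) rest on one GitHub API behavior: an unauthenticated request for a private repo's contents gets a 404, exactly like a typo'd path. A minimal sketch of such a check, using only the standard library and GitHub's REST "get repository content" endpoint (`GET /repos/{owner}/{repo}/contents/{path}`). `check_target_path` and `hint_for_status` are hypothetical names and the hint wording is invented; this is not the package's implementation, and network failures (`urllib.error.URLError`) are deliberately not handled.

```python
import os
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional


def hint_for_status(status: int, has_token: bool) -> Optional[str]:
    """Map a GitHub API status to a user-facing hint; None means the path exists."""
    if status == 200:
        return None
    if status == 404 and not has_token:
        # Private repo without auth and a typo'd path both return 404 here.
        return "Path not found. If the repo is private, set GITHUB_TOKEN and retry."
    if status == 404:
        return "Path not found: check the target repo and path for typos."
    return f"GitHub API returned HTTP {status}."


def check_target_path(repo: str, path: str) -> Optional[str]:
    """Look up `path` in `repo` ("org/repo") via the contents API; return a hint or None."""
    token = os.environ.get("GITHUB_TOKEN")
    url = f"https://api.github.com/repos/{repo}/contents/{urllib.parse.quote(path)}"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            status = response.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses arrive as exceptions
        status = err.code
    return hint_for_status(status, has_token=bool(token))
```

Keeping the status-to-hint logic in a pure function makes the ambiguous-404 case testable without network access; only `check_target_path` talks to GitHub.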