db-schema-guard 0.1.0__tar.gz
- db_schema_guard-0.1.0/PKG-INFO +530 -0
- db_schema_guard-0.1.0/README.md +506 -0
- db_schema_guard-0.1.0/pyproject.toml +40 -0
- db_schema_guard-0.1.0/setup.cfg +4 -0
- db_schema_guard-0.1.0/src/db_schema_guard.egg-info/PKG-INFO +530 -0
- db_schema_guard-0.1.0/src/db_schema_guard.egg-info/SOURCES.txt +17 -0
- db_schema_guard-0.1.0/src/db_schema_guard.egg-info/dependency_links.txt +1 -0
- db_schema_guard-0.1.0/src/db_schema_guard.egg-info/entry_points.txt +2 -0
- db_schema_guard-0.1.0/src/db_schema_guard.egg-info/requires.txt +8 -0
- db_schema_guard-0.1.0/src/db_schema_guard.egg-info/top_level.txt +1 -0
- db_schema_guard-0.1.0/src/schema_guard/alerter.py +54 -0
- db_schema_guard-0.1.0/src/schema_guard/cli.py +124 -0
- db_schema_guard-0.1.0/src/schema_guard/contract.py +95 -0
- db_schema_guard-0.1.0/src/schema_guard/diff_engine.py +100 -0
- db_schema_guard-0.1.0/src/schema_guard/extractors/postgres.py +32 -0
- db_schema_guard-0.1.0/src/schema_guard/snapshot.py +55 -0
- db_schema_guard-0.1.0/tests/test_contract.py +189 -0
- db_schema_guard-0.1.0/tests/test_diff_engine.py +204 -0
- db_schema_guard-0.1.0/tests/test_snapshot.py +94 -0
|
@@ -0,0 +1,530 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: db-schema-guard
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: Stop silent schema drift before it breaks your data pipelines
|
|
5
|
+
Author-email: Anil Solanki <anusolanki2645@gmail.com>
|
|
6
|
+
License: MIT
|
|
7
|
+
Keywords: data-engineering,schema,ci-cd,postgresql
|
|
8
|
+
Classifier: Development Status :: 3 - Alpha
|
|
9
|
+
Classifier: Intended Audience :: Developers
|
|
10
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
11
|
+
Classifier: Programming Language :: Python :: 3
|
|
12
|
+
Classifier: Programming Language :: Python :: 3.8
|
|
13
|
+
Classifier: Programming Language :: Python :: 3.9
|
|
14
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
16
|
+
Description-Content-Type: text/markdown
|
|
17
|
+
Requires-Dist: click>=8.0
|
|
18
|
+
Requires-Dist: pyyaml>=6.0
|
|
19
|
+
Requires-Dist: sqlalchemy>=1.4
|
|
20
|
+
Requires-Dist: psycopg2-binary
|
|
21
|
+
Requires-Dist: python-dotenv
|
|
22
|
+
Provides-Extra: dev
|
|
23
|
+
Requires-Dist: pytest>=7.0; extra == "dev"
|
|
24
|
+
|
|
25
|
+
|
|
26
|
+
# 🛡️ Schema Guard
|
|
27
|
+
|
|
28
|
+
**Stop silent schema drift before it breaks your production data pipelines.**
|
|
29
|
+
|
|
30
|
+
Schema Guard is a lightweight CLI tool that captures database schema snapshots and acts as a CI/CD gate, blocking deployments when unauthorized schema changes are detected. It's the missing guardrail for data engineers who've been burned by unexpected column drops, type changes, or nullability shifts.
|
|
31
|
+
|
|
32
|
+
---
|
|
33
|
+
|
|
34
|
+
## Why Schema Guard?
|
|
35
|
+
|
|
36
|
+
Every data engineer knows the 2 AM nightmare:
|
|
37
|
+
- A source team adds a column, changes a type, or drops a `NOT NULL` without notice.
|
|
38
|
+
- Your ETL pipeline doesn't fail; it silently writes `NULL`s or corrupts downstream data.
|
|
39
|
+
- Executive dashboards break, and you spend hours tracing the issue back to a trivial schema change.
|
|
40
|
+
|
|
41
|
+
Schema Guard stops this at the CI gate. You define a **data contract** (YAML), capture a trusted **schema snapshot**, and then in your deployment pipeline, `schema-guard gate` compares the live source against the snapshot and contract. Any drift fails the build *before* it hits production.
|
|
42
|
+
|
|
43
|
+
---
|
|
44
|
+
|
|
45
|
+
## 🧠 Architecture Overview
|
|
46
|
+
|
|
47
|
+
```
┌─────────────────┐        ┌──────────────┐        ┌─────────────────┐
│    Contract     │───────▶│  CLI (snap)  │───────▶│  Snapshot JSON  │
│  (orders.yaml)  │        └──────────────┘        └─────────────────┘
│                 │                                         │
│  Live Database  │                                         │
└─────────────────┘                                         │
                                                            │
                                                ┌───────────▼───────────┐
                                                │      CLI (gate)       │
                                                │ Compares live schema  │
                                                │ vs snapshot + contract│
                                                └───────────┬───────────┘
                                                            │
                                                ┌───────────▼───────────┐
                                                │      Diff Engine      │
                                                │  Detects violations   │
                                                └───────────┬───────────┘
                                                            │
                                                ┌───────────▼───────────┐
                                                │    Alerter (email)    │
                                                │  Sends notifications  │
                                                └───────────────────────┘
```
|
|
72
|
+
|
|
73
|
+
1. **Contract**: Describes the expected schema (columns, types, nullability, allowed drifts).
2. **Snapshot**: A frozen, integrity-verified JSON representation of the real schema at a trusted point in time.
3. **Gate**: Compares live schema to snapshot + contract. If drift is detected, it logs violations, sends an email, and exits with code 1 to fail the CI pipeline.
|
|
76
|
+
|
|
77
|
+
---
|
|
78
|
+
|
|
79
|
+
## ✨ Key Features
|
|
80
|
+
|
|
81
|
+
- **Snapshot integrity verification**: Every snapshot is SHA-256 hashed. Tampered or corrupted snapshot files are detected and rejected on load.
- **Overwrite protection**: The `snap` command refuses to overwrite an existing baseline unless `--force` is passed, preventing accidental baseline drift.
- **Case-insensitive type comparison**: `INTEGER` vs `integer` and `NUMERIC(10, 2)` vs `numeric(10,2)` are treated as equal. No more false positives.
- **Primary key drift detection**: Dropped or added primary key constraints are flagged as CRITICAL violations.
- **Contract validation**: Malformed YAML contracts fail fast with clear error messages instead of cryptic `KeyError` crashes.
- **Contract-vs-snapshot warnings**: If the snapshot itself doesn't match the contract expectations, you'll see a WARNING notice so you know the baseline may have been captured at a bad time.
- **Allowed drift visibility**: When an `allowed_drift` rule matches, it's logged as an INFO notice instead of being silently swallowed.
- **Reliable alerting**: Email alert failures are reported to stderr. Use `--require-alert` to fail the gate if email delivery fails.
|
|
89
|
+
|
|
90
|
+
---
|
|
91
|
+
|
|
92
|
+
## Installation
|
|
93
|
+
|
|
94
|
+
### From source (for development)
|
|
95
|
+
```bash
|
|
96
|
+
git clone https://github.com/your-username/schema-guard.git
|
|
97
|
+
cd schema-guard
|
|
98
|
+
pip install -e ".[dev]"
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
### From PyPI (once published)
|
|
102
|
+
```bash
|
|
103
|
+
pip install db-schema-guard
|
|
104
|
+
```
|
|
105
|
+
|
|
106
|
+
Requires Python ≥ 3.8. Dependencies are installed automatically.
|
|
107
|
+
|
|
108
|
+
---
|
|
109
|
+
|
|
110
|
+
## ⚡ Quick Start
|
|
111
|
+
|
|
112
|
+
### 1. Create the database table
|
|
113
|
+
```sql
|
|
114
|
+
CREATE TABLE public.orders (
|
|
115
|
+
order_id INT PRIMARY KEY,
|
|
116
|
+
amount DECIMAL(10,2) NOT NULL,
|
|
117
|
+
status VARCHAR(20) NOT NULL
|
|
118
|
+
);
|
|
119
|
+
|
|
120
|
+
INSERT INTO orders VALUES (1, 99.99, 'shipped');
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
### 2. Define a contract: `contracts/orders.yaml`
|
|
124
|
+
```yaml
source:
  name: prod_orders
  type: postgres
  connection: "env:DB_CONNECTION_STRING"   # see Configuration
  schema: public
  table: orders
columns:
  - name: order_id
    type: integer
    nullable: false
  - name: amount
    type: numeric(10,2)
    nullable: false
    allowed_drift:
      - from: "numeric(10,2)"
        to: "numeric(12,2)"
  - name: status
    type: character varying(20)
    nullable: false
```
|
|
145
|
+
|
|
146
|
+
### 3. Set up environment variables
|
|
147
|
+
|
|
148
|
+
Copy the example file and fill in your values:
|
|
149
|
+
```bash
|
|
150
|
+
cp .env.example .env
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
Edit `.env`:
|
|
154
|
+
```env
|
|
155
|
+
DB_CONNECTION_STRING=postgresql://postgres:yourpassword@localhost:5432/schema_guard
|
|
156
|
+
```
|
|
157
|
+
> **Note:** Special characters in the username or password must be URL-encoded, e.g., `@` → `%40`.
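
To illustrate the note above, Python's standard library can percent-encode a password before it is placed in the connection string. This is only a sketch: the user name, password, and host below are placeholders, not values from the project.

```python
from urllib.parse import quote

# Hypothetical credentials, for illustration only.
user = "postgres"
password = "p@ss%word"

# safe="" makes quote() encode every reserved character, including "/" and "@".
encoded = quote(password, safe="")

url = f"postgresql://{user}:{encoded}@localhost:5432/schema_guard"
print(url)
```

The resulting string is what would go into `DB_CONNECTION_STRING` in `.env`.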
|
|
158
|
+
|
|
159
|
+
> **⚠️ Security:** Never commit `.env` to version control. The `.gitignore` is already configured to exclude it. Use `.env.example` as a template reference.
|
|
160
|
+
|
|
161
|
+
### 4. Capture a snapshot of the current schema
|
|
162
|
+
```bash
|
|
163
|
+
schema-guard snap --contract contracts/orders.yaml --snapshot-file snapshots/orders.json
|
|
164
|
+
```
|
|
165
|
+
✅ Snapshot saved to `snapshots/orders.json`
|
|
166
|
+
|
|
167
|
+
### 5. Verify the gate passes
|
|
168
|
+
```bash
|
|
169
|
+
schema-guard gate --contract contracts/orders.yaml --snapshot-file snapshots/orders.json
|
|
170
|
+
```
|
|
171
|
+
✅ Schema matches snapshot. No drift.
|
|
172
|
+
|
|
173
|
+
### 6. Simulate a drift
|
|
174
|
+
```sql
|
|
175
|
+
ALTER TABLE public.orders ALTER COLUMN amount DROP NOT NULL;
|
|
176
|
+
```
|
|
177
|
+
|
|
178
|
+
### 7. Run the gate again (should fail)
|
|
179
|
+
```bash
|
|
180
|
+
schema-guard gate --contract contracts/orders.yaml --snapshot-file snapshots/orders.json
|
|
181
|
+
```
|
|
182
|
+
```text
|
|
183
|
+
❌ Schema drift detected:
|
|
184
|
+
- CRITICAL: Column 'amount' nullable changed from False to True.
|
|
185
|
+
```
|
|
186
|
+
The command exits with code 1 and, if email alerting is configured, sends an alert.
|
|
187
|
+
|
|
188
|
+
### 8. Revert to clean state
|
|
189
|
+
```sql
|
|
190
|
+
ALTER TABLE public.orders ALTER COLUMN amount SET NOT NULL;
|
|
191
|
+
```
|
|
192
|
+
Now the gate passes again.
|
|
193
|
+
|
|
194
|
+
---
|
|
195
|
+
|
|
196
|
+
## Configuration
|
|
197
|
+
|
|
198
|
+
### Database connection
|
|
199
|
+
Define a PostgreSQL connection string in `.env`:
|
|
200
|
+
```env
|
|
201
|
+
DB_CONNECTION_STRING=postgresql://user:password@host:5432/dbname
|
|
202
|
+
```
|
|
203
|
+
In the contract, reference it with `"env:DB_CONNECTION_STRING"`.
|
|
204
|
+
The tool's `contract.py` resolves any value beginning with `env:` to the corresponding environment variable.
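
The `env:` convention can be pictured with a few lines of Python. This is a minimal sketch under assumptions, not the code in `contract.py`: the function name `resolve_env` and the choice of `ValueError` for a missing variable are hypothetical.

```python
import os


def resolve_env(value: str) -> str:
    """Return the environment variable named after an 'env:' prefix.

    Values without the prefix are returned unchanged, so a literal
    connection URL still works.
    """
    prefix = "env:"
    if not value.startswith(prefix):
        return value
    name = value[len(prefix):]
    try:
        return os.environ[name]
    except KeyError:
        # Fail loudly instead of passing an empty connection string on.
        raise ValueError(f"Environment variable '{name}' is not set") from None
```

Failing on a missing variable, rather than falling back to an empty string, keeps a misconfigured CI job from silently connecting to nothing.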
|
|
205
|
+
|
|
206
|
+
### Email alerting (optional)
|
|
207
|
+
Add these to `.env` to receive drift alerts via SMTP:
|
|
208
|
+
```env
|
|
209
|
+
EMAIL_ENABLED=true
|
|
210
|
+
EMAIL_HOST=smtp.gmail.com
|
|
211
|
+
EMAIL_PORT=587
|
|
212
|
+
EMAIL_USER=your_email@gmail.com
|
|
213
|
+
EMAIL_PASSWORD=your_app_password # Use an App Password, not your real password
|
|
214
|
+
EMAIL_FROM=your_email@gmail.com
|
|
215
|
+
EMAIL_TO=oncall@example.com
|
|
216
|
+
EMAIL_SUBJECT=Schema Drift Detected
|
|
217
|
+
```
|
|
218
|
+
If `EMAIL_ENABLED` is not `true`, alerts are silently skipped: the gate still works, but no email is sent.
|
|
219
|
+
|
|
220
|
+
---
|
|
221
|
+
|
|
222
|
+
## Command Reference
|
|
223
|
+
|
|
224
|
+
### `snap`: capture a schema snapshot
|
|
225
|
+
```bash
|
|
226
|
+
schema-guard snap --contract <contract.yaml> --snapshot-file <snapshot.json> [--force]
|
|
227
|
+
```
|
|
228
|
+
| Flag | Description |
|
|
229
|
+
|------|-------------|
|
|
230
|
+
| `--contract` | **(required)** Path to the contract YAML file |
|
|
231
|
+
| `--snapshot-file` | Where to save the snapshot (default: `schema_snapshot.json`) |
|
|
232
|
+
| `--force` | Overwrite an existing snapshot. Without this flag, the command will refuse to overwrite and show a diff of detected changes |
|
|
233
|
+
|
|
234
|
+
- Connects to the source defined in the contract.
|
|
235
|
+
- Inspects the table and saves column metadata (name, type, nullable, primary key) along with a SHA-256 hash and timestamp.
|
|
236
|
+
- **Overwrite protection:** If a snapshot already exists, you must pass `--force` to replace it. This prevents accidentally re-baselining against a drifted schema.
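
The hash-then-verify idea behind the snapshot can be sketched as follows. This is an illustration under assumptions, not the contents of `snapshot.py`: the function names and the JSON layout (a `schema` key next to a `sha256` key) are hypothetical.

```python
import hashlib
import json


def schema_hash(schema: dict) -> str:
    """SHA-256 of a canonical JSON form of the schema."""
    # sort_keys and fixed separators make the serialization, and so the hash, deterministic.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def wrap_snapshot(schema: dict) -> dict:
    """Pair a schema with its hash (hypothetical snapshot layout)."""
    return {"schema": schema, "sha256": schema_hash(schema)}


def verify_snapshot(snapshot: dict) -> dict:
    """Return the schema if the stored hash matches, else raise."""
    schema = snapshot.get("schema")
    stored = snapshot.get("sha256")
    if schema is None or stored is None or stored != schema_hash(schema):
        raise ValueError("Snapshot integrity check FAILED")
    return schema
```

A plain hash like this catches accidental edits and corruption. Anyone who can rewrite the file can also recompute the hash, so it is not a substitute for access control on the snapshot itself.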
|
|
237
|
+
|
|
238
|
+
### `gate`: check for drift
|
|
239
|
+
```bash
|
|
240
|
+
schema-guard gate --contract <contract.yaml> --snapshot-file <snapshot.json> [--require-alert]
|
|
241
|
+
```
|
|
242
|
+
| Flag | Description |
|
|
243
|
+
|------|-------------|
|
|
244
|
+
| `--contract` | **(required)** Path to the contract YAML file |
|
|
245
|
+
| `--snapshot-file` | Baseline snapshot to compare against (default: `schema_snapshot.json`) |
|
|
246
|
+
| `--require-alert` | Exit with code 2 if email alert fails to send |
|
|
247
|
+
|
|
248
|
+
- Verifies snapshot integrity (SHA-256 hash check) before comparing.
|
|
249
|
+
- Extracts the current live schema.
|
|
250
|
+
- Validates contract expectations against the snapshot (emits warnings if mismatched).
|
|
251
|
+
- Compares live schema against the snapshot using the contract rules.
|
|
252
|
+
- **Violation types:**
|
|
253
|
+
|
|
254
|
+
| Violation | Severity |
|
|
255
|
+
|-----------|----------|
|
|
256
|
+
| Column removed | CRITICAL |
|
|
257
|
+
| Type changed (not in `allowed_drift`) | CRITICAL |
|
|
258
|
+
| Nullable changed | CRITICAL |
|
|
259
|
+
| Primary key constraint dropped | CRITICAL |
|
|
260
|
+
| Primary key constraint added | CRITICAL |
|
|
261
|
+
| New column added | WARNING |
|
|
262
|
+
| Allowed drift matched | INFO (notice, non-blocking) |
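
The severities in the table can be illustrated with a small comparison function. This is a minimal sketch under assumptions, not the code in `diff_engine.py`: the name `diff_columns` and most message wording are hypothetical (the nullable message mirrors the sample output in the Quick Start), and type comparison with `allowed_drift` handling is left out for brevity.

```python
def diff_columns(snapshot_cols, live_cols):
    """Return (severity, message) tuples for drift between two column lists.

    Each column is a dict with 'name', 'type', 'nullable' and 'primary_key'.
    """
    old = {col["name"]: col for col in snapshot_cols}
    new = {col["name"]: col for col in live_cols}
    findings = []

    for name, before in old.items():
        if name not in new:
            findings.append(("CRITICAL", f"Column '{name}' was removed."))
            continue
        after = new[name]
        if before["nullable"] != after["nullable"]:
            findings.append((
                "CRITICAL",
                f"Column '{name}' nullable changed from "
                f"{before['nullable']} to {after['nullable']}.",
            ))
        was_pk = before.get("primary_key", False)
        is_pk = after.get("primary_key", False)
        if was_pk != is_pk:
            change = "dropped" if was_pk else "added"
            findings.append(
                ("CRITICAL", f"Primary key constraint {change} on column '{name}'.")
            )

    # Columns that exist only in the live schema are new.
    for name in new:
        if name not in old:
            findings.append(("WARNING", f"New column '{name}' was added."))

    return findings
```

Keying both sides by column name keeps the comparison independent of column order.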
|
|
263
|
+
|
|
264
|
+
### Exit codes
|
|
265
|
+
|
|
266
|
+
| Code | Meaning |
|------|---------|
| `0` | Gate passed: no drift detected |
| `1` | Schema drift detected: violations found |
| `2` | Email alert failed (when `--require-alert` is set) or unsupported source type |
| `3` | Snapshot integrity check failed (file was tampered with or corrupted) |
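
A wrapper script can branch on these codes. The sketch below is only an illustration: `GateExit`, `describe`, and `run_gate` are hypothetical names, and only the `schema-guard gate` flags and the meaning of each code come from the tables above.

```python
import subprocess
from enum import IntEnum


class GateExit(IntEnum):
    """Exit codes documented for the gate command."""
    PASSED = 0
    DRIFT = 1
    ALERT_OR_SOURCE_ERROR = 2
    SNAPSHOT_INTEGRITY = 3


def describe(code: int) -> str:
    """Map an exit code to a readable label, tolerating unknown codes."""
    try:
        return GateExit(code).name
    except ValueError:
        return f"UNKNOWN({code})"


def run_gate(contract: str, snapshot: str) -> int:
    """Run the gate and return its exit code (requires schema-guard on PATH)."""
    cmd = ["schema-guard", "gate", "--contract", contract, "--snapshot-file", snapshot]
    return subprocess.run(cmd).returncode
```

Because code 2 covers two different conditions, a wrapper that needs to tell them apart would have to inspect stderr as well.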
|
|
272
|
+
|
|
273
|
+
---
|
|
274
|
+
|
|
275
|
+
## 🧱 Contract YAML Specification
|
|
276
|
+
|
|
277
|
+
A contract file defines the expected state of a data source. The contract is validated on load; missing or malformed fields produce clear error messages.
|
|
278
|
+
|
|
279
|
+
```yaml
source:
  name: friendly_name                     # Used in logs (optional)
  type: postgres                          # Source type (currently only postgres supported)
  connection: "env:DB_CONNECTION_STRING"  # Database URL or env:VARIABLE
  schema: public                          # Database schema (required)
  table: orders                           # Table name (required)

columns:                                  # Optional section
  - name: order_id                        # Column name (required)
    type: integer                         # Expected type (required, case-insensitive matching)
    nullable: false                       # Expected nullability (required)
    allowed_drift:                        # Optional: type changes allowed without alert
      - from: "numeric(10,2)"
        to: "numeric(12,2)"
```
|
|
295
|
+
|
|
296
|
+
### Required fields
|
|
297
|
+
|
|
298
|
+
| Section | Required Keys |
|
|
299
|
+
|---------|---------------|
|
|
300
|
+
| `source` | `type`, `connection`, `schema`, `table` |
|
|
301
|
+
| Each column in `columns` | `name`, `type`, `nullable` |
|
|
302
|
+
| Each rule in `allowed_drift` | `from`, `to` |
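
The required-key rules in the table could be checked along these lines. This is a sketch under assumptions rather than the validator in `contract.py`: `validate_contract` is a hypothetical name, it takes an already parsed mapping (for example the result of `yaml.safe_load`), and apart from the `Contract 'source' is missing required key` prefix listed under Troubleshooting the messages are invented. It does not check whether `source.type` is supported.

```python
REQUIRED_SOURCE_KEYS = ("type", "connection", "schema", "table")
REQUIRED_COLUMN_KEYS = ("name", "type", "nullable")
REQUIRED_DRIFT_KEYS = ("from", "to")


def validate_contract(contract):
    """Raise ValueError with a readable message if a parsed contract is malformed."""
    # An empty YAML file parses to None, so reject anything that is not a mapping.
    if not isinstance(contract, dict) or not isinstance(contract.get("source"), dict):
        raise ValueError("Contract is missing the 'source' section")

    for key in REQUIRED_SOURCE_KEYS:
        if key not in contract["source"]:
            raise ValueError(f"Contract 'source' is missing required key: '{key}'")

    # 'columns' is optional, so a missing or empty section is accepted.
    for index, column in enumerate(contract.get("columns") or []):
        for key in REQUIRED_COLUMN_KEYS:
            if key not in column:
                raise ValueError(f"Column #{index + 1} is missing required key: '{key}'")
        for rule in column.get("allowed_drift") or []:
            for key in REQUIRED_DRIFT_KEYS:
                if key not in rule:
                    raise ValueError(
                        f"allowed_drift rule on column '{column['name']}' is missing '{key}'"
                    )
    return contract
```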
|
|
303
|
+
|
|
304
|
+
### Matching behavior
|
|
305
|
+
- **`type`** comparison is **case-insensitive** and **whitespace-insensitive**. `INTEGER` matches `integer`, and `NUMERIC(10, 2)` matches `numeric(10,2)`.
|
|
306
|
+
- **`nullable`** must be `true` or `false`.
|
|
307
|
+
- **`allowed_drift`** rules are also matched case-insensitively. When a rule matches, an INFO notice is logged for visibility.
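
The matching rules above amount to normalizing both sides before comparing. The following is a minimal sketch, assuming that normalization means lowercasing and removing all whitespace; the function bodies are illustrative and `is_allowed_drift` is a hypothetical name.

```python
def normalize_type(type_name: str) -> str:
    """Lowercase and drop all whitespace: 'NUMERIC(10, 2)' -> 'numeric(10,2)'."""
    # Note: this also joins multi-word types such as 'character varying(20)',
    # which is harmless as long as both sides are normalized the same way.
    return "".join(type_name.lower().split())


def is_allowed_drift(old_type: str, new_type: str, rules) -> bool:
    """True if some rule permits the change old_type -> new_type."""
    old_n, new_n = normalize_type(old_type), normalize_type(new_type)
    return any(
        normalize_type(rule["from"]) == old_n and normalize_type(rule["to"]) == new_n
        for rule in rules
    )


rules = [{"from": "numeric(10,2)", "to": "numeric(12,2)"}]
print(is_allowed_drift("NUMERIC(10, 2)", "numeric(12,2)", rules))
print(is_allowed_drift("numeric(12,2)", "numeric(10,2)", rules))
```

Rules are directional in this sketch: widening `numeric(10,2)` to `numeric(12,2)` is permitted by the rule shown, while the reverse change is not.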
|
|
308
|
+
|
|
309
|
+
---
|
|
310
|
+
|
|
311
|
+
## Alerting (Email)
|
|
312
|
+
|
|
313
|
+
The `alerter.py` module sends an email with the list of violations. It uses Python's standard `smtplib` and `email` modules, so no extra dependencies are needed.
|
|
314
|
+
|
|
315
|
+
- Alert errors are printed to **stderr** as `[alerter] Failed to send email: ...`.
- By default, a failed email does not change the result: when drift is detected, the gate exits with code 1 whether or not the email was delivered.
- To **require** successful email delivery, use `--require-alert`. If the email fails, the gate exits with code 2.
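
The behavior described above might look roughly like this with `smtplib` and `email`. It is a sketch under assumptions, not the code in `alerter.py`: the function names are hypothetical, STARTTLS on the configured port is assumed, and treating a disabled alerter as success is a guess, since the interaction with `--require-alert` in that case is not documented here.

```python
import os
import smtplib
import sys
from email.message import EmailMessage


def build_alert(violations, env=os.environ) -> EmailMessage:
    """Build the alert email from the EMAIL_* settings."""
    msg = EmailMessage()
    msg["Subject"] = env.get("EMAIL_SUBJECT", "Schema Drift Detected")
    msg["From"] = env["EMAIL_FROM"]
    msg["To"] = env["EMAIL_TO"]
    msg.set_content("\n".join(f"- {v}" for v in violations))
    return msg


def send_alert(violations, env=os.environ) -> bool:
    """Send the alert; return False (and log to stderr) on failure."""
    if env.get("EMAIL_ENABLED", "").lower() != "true":
        return True  # Alerting disabled: nothing to send (assumed to count as success).
    try:
        port = int(env.get("EMAIL_PORT", "587"))
        with smtplib.SMTP(env["EMAIL_HOST"], port, timeout=10) as smtp:
            smtp.starttls()
            smtp.login(env["EMAIL_USER"], env["EMAIL_PASSWORD"])
            smtp.send_message(build_alert(violations, env))
        return True
    except (OSError, smtplib.SMTPException, KeyError, ValueError) as exc:
        print(f"[alerter] Failed to send email: {exc}", file=sys.stderr)
        return False
```

Returning a boolean instead of raising lets the caller decide whether a failed email should change the exit code, which is the choice `--require-alert` exposes.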
|
|
318
|
+
|
|
319
|
+
---
|
|
320
|
+
|
|
321
|
+
## ⚙️ CI/CD Integration
|
|
322
|
+
|
|
323
|
+
### GitHub Actions example: `.github/workflows/schema-check.yml`
|
|
324
|
+
```yaml
name: Schema Drift Gate

on: [push, pull_request]

jobs:
  schema-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install schema-guard
        run: pip install -e .   # or pip install db-schema-guard
      - name: Run schema gate
        run: schema-guard gate --contract contracts/orders.yaml --snapshot-file snapshots/orders.json --require-alert
        env:
          DB_CONNECTION_STRING: ${{ secrets.DB_CONNECTION_STRING }}
          EMAIL_ENABLED: true
          EMAIL_HOST: ${{ secrets.EMAIL_HOST }}
          EMAIL_PORT: ${{ secrets.EMAIL_PORT }}
          EMAIL_USER: ${{ secrets.EMAIL_USER }}
          EMAIL_PASSWORD: ${{ secrets.EMAIL_PASSWORD }}
          EMAIL_FROM: ${{ secrets.EMAIL_FROM }}
          EMAIL_TO: ${{ secrets.EMAIL_TO }}
```
|
|
351
|
+
|
|
352
|
+
Store these values as [GitHub Secrets](https://docs.github.com/en/actions/security-guides/encrypted-secrets).
|
|
353
|
+
Now every push and pull request will be checked for schema drift automatically.
|
|
354
|
+
|
|
355
|
+
---
|
|
356
|
+
|
|
357
|
+
## 🧪 Testing
|
|
358
|
+
|
|
359
|
+
Schema Guard includes a comprehensive test suite with **39 unit tests** covering the core modules. No database connection is required to run the tests; they use in-memory fixtures.
|
|
360
|
+
|
|
361
|
+
### Running the test suite
|
|
362
|
+
|
|
363
|
+
```bash
|
|
364
|
+
# Install with dev dependencies (includes pytest)
|
|
365
|
+
pip install -e ".[dev]"
|
|
366
|
+
|
|
367
|
+
# Run all tests with verbose output
|
|
368
|
+
python -m pytest tests/ -v
|
|
369
|
+
```
|
|
370
|
+
|
|
371
|
+
Expected output:
|
|
372
|
+
```text
|
|
373
|
+
tests/test_contract.py::TestValidContract::test_loads_valid_contract PASSED
|
|
374
|
+
tests/test_contract.py::TestValidContract::test_contract_without_columns_section PASSED
|
|
375
|
+
tests/test_contract.py::TestEnvResolution::test_resolves_env_variable PASSED
|
|
376
|
+
tests/test_contract.py::TestEnvResolution::test_missing_env_variable_raises PASSED
|
|
377
|
+
tests/test_contract.py::TestContractValidation::test_missing_source_section PASSED
|
|
378
|
+
tests/test_contract.py::TestContractValidation::test_missing_source_type PASSED
|
|
379
|
+
tests/test_contract.py::TestContractValidation::test_missing_source_connection PASSED
|
|
380
|
+
tests/test_contract.py::TestContractValidation::test_unsupported_source_type PASSED
|
|
381
|
+
tests/test_contract.py::TestContractValidation::test_column_missing_name PASSED
|
|
382
|
+
tests/test_contract.py::TestContractValidation::test_column_missing_type PASSED
|
|
383
|
+
tests/test_contract.py::TestContractValidation::test_malformed_allowed_drift PASSED
|
|
384
|
+
tests/test_contract.py::TestContractValidation::test_empty_yaml_raises PASSED
|
|
385
|
+
tests/test_diff_engine.py::TestNormalizeType::test_lowercase PASSED
|
|
386
|
+
tests/test_diff_engine.py::TestNormalizeType::test_strip_spaces PASSED
|
|
387
|
+
tests/test_diff_engine.py::TestNormalizeType::test_already_normalized PASSED
|
|
388
|
+
tests/test_diff_engine.py::TestColumnRemoved::test_detects_removed_column PASSED
|
|
389
|
+
tests/test_diff_engine.py::TestColumnAdded::test_detects_added_column PASSED
|
|
390
|
+
tests/test_diff_engine.py::TestNullableChange::test_detects_nullable_change PASSED
|
|
391
|
+
tests/test_diff_engine.py::TestNullableChange::test_no_violation_when_nullable_matches PASSED
|
|
392
|
+
tests/test_diff_engine.py::TestTypeChange::test_detects_type_change PASSED
|
|
393
|
+
tests/test_diff_engine.py::TestTypeChange::test_case_insensitive_no_false_positive PASSED
|
|
394
|
+
tests/test_diff_engine.py::TestTypeChange::test_whitespace_insensitive PASSED
|
|
395
|
+
tests/test_diff_engine.py::TestAllowedDrift::test_allowed_drift_passes PASSED
|
|
396
|
+
tests/test_diff_engine.py::TestAllowedDrift::test_disallowed_drift_fails PASSED
|
|
397
|
+
tests/test_diff_engine.py::TestAllowedDrift::test_allowed_drift_case_insensitive PASSED
|
|
398
|
+
tests/test_diff_engine.py::TestPrimaryKeyChange::test_detects_pk_dropped PASSED
|
|
399
|
+
tests/test_diff_engine.py::TestPrimaryKeyChange::test_detects_pk_added PASSED
|
|
400
|
+
tests/test_diff_engine.py::TestPrimaryKeyChange::test_no_violation_when_pk_matches PASSED
|
|
401
|
+
tests/test_diff_engine.py::TestContractVsSnapshot::test_warns_type_mismatch PASSED
|
|
402
|
+
tests/test_diff_engine.py::TestContractVsSnapshot::test_warns_nullable_mismatch PASSED
|
|
403
|
+
tests/test_diff_engine.py::TestContractVsSnapshot::test_no_warning_when_contract_matches PASSED
|
|
404
|
+
tests/test_diff_engine.py::TestNoDrift::test_identical_schemas_pass PASSED
|
|
405
|
+
tests/test_snapshot.py::TestRoundTrip::test_save_and_load PASSED
|
|
406
|
+
tests/test_snapshot.py::TestRoundTrip::test_hash_is_deterministic PASSED
|
|
407
|
+
tests/test_snapshot.py::TestRoundTrip::test_creates_parent_directories PASSED
|
|
408
|
+
tests/test_snapshot.py::TestIntegrityVerification::test_tampered_schema_raises PASSED
|
|
409
|
+
tests/test_snapshot.py::TestIntegrityVerification::test_tampered_hash_raises PASSED
|
|
410
|
+
tests/test_snapshot.py::TestIntegrityVerification::test_missing_hash_raises PASSED
|
|
411
|
+
tests/test_snapshot.py::TestIntegrityVerification::test_valid_snapshot_passes PASSED
|
|
412
|
+
|
|
413
|
+
============================= 39 passed ==============================
|
|
414
|
+
```
|
|
415
|
+
|
|
416
|
+
### What the tests cover
|
|
417
|
+
|
|
418
|
+
| Test File | Module | What's Tested |
|
|
419
|
+
|-----------|--------|---------------|
|
|
420
|
+
| `test_diff_engine.py` | `diff_engine.py` | Column added/removed, type change detection, case-insensitive comparison, nullable change, primary key drift, allowed drift (with logging), contract-vs-snapshot validation, no-drift happy path |
|
|
421
|
+
| `test_contract.py` | `contract.py` | Valid YAML loading, env variable resolution, missing env var error, missing required keys (`source`, `type`, `connection`, etc.), unsupported source type, malformed `allowed_drift`, empty YAML |
|
|
422
|
+
| `test_snapshot.py` | `snapshot.py` | Save/load round-trip, SHA-256 hash determinism, parent directory auto-creation, tamper detection (modified schema, fake hash, missing hash field) |
|
|
423
|
+
|
|
424
|
+
### Running a specific test file
|
|
425
|
+
```bash
|
|
426
|
+
python -m pytest tests/test_diff_engine.py -v # Only diff engine tests
|
|
427
|
+
python -m pytest tests/test_contract.py -v # Only contract tests
|
|
428
|
+
python -m pytest tests/test_snapshot.py -v # Only snapshot tests
|
|
429
|
+
```
|
|
430
|
+
|
|
431
|
+
### Running a specific test class or method
|
|
432
|
+
```bash
|
|
433
|
+
python -m pytest tests/test_diff_engine.py::TestPrimaryKeyChange -v # One class
|
|
434
|
+
python -m pytest tests/test_snapshot.py::TestIntegrityVerification::test_tampered_schema_raises -v # One test
|
|
435
|
+
```
|
|
436
|
+
|
|
437
|
+
---
|
|
438
|
+
|
|
439
|
+
## Project Structure
|
|
440
|
+
|
|
441
|
+
```
schema-guard/
├── .github/workflows/
│   └── schema-check.yml       # CI/CD pipeline config
├── contracts/
│   └── orders.yaml            # Your data contracts
├── snapshots/                 # Baseline snapshots (auto-created)
│   └── orders.json
├── src/schema_guard/
│   ├── __init__.py
│   ├── cli.py                 # Click CLI (snap & gate commands)
│   ├── contract.py            # YAML parser, env resolver & validator
│   ├── extractors/
│   │   ├── __init__.py
│   │   └── postgres.py        # PostgreSQL schema inspector
│   ├── snapshot.py            # Save/load snapshot JSON with integrity checks
│   ├── diff_engine.py         # Drift detection logic (type, nullable, PK)
│   └── alerter.py             # SMTP email alerting
├── tests/
│   ├── test_diff_engine.py    # 20 tests: drift detection logic
│   ├── test_contract.py       # 12 tests: YAML loading & validation
│   └── test_snapshot.py       # 7 tests: save/load & integrity
├── pyproject.toml             # Build & packaging config
├── .env.example               # Template for environment variables
├── .gitignore
└── README.md
```
|
|
468
|
+
|
|
469
|
+
---
|
|
470
|
+
|
|
471
|
+
## Extending with New Extractors
|
|
472
|
+
|
|
473
|
+
To support another data source (Snowflake, BigQuery, S3 Parquet, etc.):
|
|
474
|
+
|
|
475
|
+
1. Create a new file in `src/schema_guard/extractors/` (e.g., `snowflake.py`).
|
|
476
|
+
2. Implement a function `get_schema(connection_string, schema_name, table_name)` returning:
|
|
477
|
+
   ```python
   {
       "table": "schema.table",
       "columns": [
           {
               "name": "col1",
               "type": "varchar",
               "nullable": True,
               "primary_key": False
           },
           ...
       ]
   }
   ```
|
|
491
|
+
3. In `cli.py`, add an `elif` branch for the new source type.
|
|
492
|
+
4. Add any required packages to `pyproject.toml`.
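
As an illustration of step 2, here is what an extractor with that signature and return shape could look like. SQLite is used only as a stand-in because it ships with Python; this extractor is hypothetical and not part of the package, and for SQLite the `connection_string` argument is simply a database file path.

```python
import sqlite3


def get_schema(connection_string, schema_name, table_name):
    """Hypothetical SQLite extractor returning the documented dictionary shape."""
    conn = sqlite3.connect(connection_string)
    try:
        # PRAGMA arguments cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table_name.replace('"', '""') + '"'
        rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    finally:
        conn.close()

    if not rows:
        raise ValueError(f"Table '{table_name}' not found or has no columns")

    # Each row is (cid, name, type, notnull, dflt_value, pk).
    columns = [
        {
            "name": name,
            "type": (col_type or "").lower(),
            "nullable": not notnull,
            "primary_key": pk > 0,
        }
        for _cid, name, col_type, notnull, _default, pk in rows
    ]
    return {"table": f"{schema_name}.{table_name}", "columns": columns}
```

Returning plain dictionaries keeps the extractor independent of the diff logic, so a new source only has to produce this shape.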
|
|
493
|
+
|
|
494
|
+
Pull requests are welcome!
|
|
495
|
+
|
|
496
|
+
---
|
|
497
|
+
|
|
498
|
+
## 🧩 Troubleshooting
|
|
499
|
+
|
|
500
|
+
| Problem | Solution |
|--------|----------|
| `FileNotFoundError: snapshots/orders.json` | Run `schema-guard snap` first to create the baseline snapshot. The `snapshots/` directory is auto-created. |
| `Snapshot integrity check FAILED` | The snapshot file was manually edited or corrupted. Re-capture a trusted snapshot with `schema-guard snap --force`. |
| `Contract 'source' is missing required key` | Your contract YAML is missing a required field. Check the [Contract YAML Specification](#-contract-yaml-specification). |
| `Unsupported source type` | Only `postgres` is currently supported. Check for typos in `source.type`. |
| `Could not parse SQLAlchemy URL` | The connection string is invalid. Check `.env` and contract; ensure `env:` placeholders resolve correctly. |
| `FATAL: password authentication failed` | Wrong credentials or special characters not URL-encoded. Encode `@` as `%40`, `%` as `%25`, etc. Test with `psql`. |
| Email not sent | Set `EMAIL_ENABLED=true`. Use an App Password for Gmail. Look for `[alerter]` messages in **stderr**. |
| Gate passes when it shouldn't | Ensure the snapshot file is the correct baseline and hasn't been tampered with. The hash check should catch this. |
| `ModuleNotFoundError` when running CLI | Run `pip install -e .` from the project root. |
| Snapshot overwrite refused | Pass `--force` to `schema-guard snap` to overwrite an existing snapshot. |
|
|
512
|
+
|
|
513
|
+
---
|
|
514
|
+
|
|
515
|
+
## Security Best Practices
|
|
516
|
+
|
|
517
|
+
- **Never commit `.env`**: Use `.env.example` as a template. The `.gitignore` excludes `.env` by default.
- **Use GitHub Secrets** for CI/CD: Store `DB_CONNECTION_STRING`, email credentials, etc. as encrypted secrets.
- **Rotate credentials** if they were ever exposed in version control history.
- **Snapshot integrity**: Every snapshot includes a SHA-256 hash. If the file is tampered with, the `gate` command will reject it immediately.
|
|
521
|
+
|
|
522
|
+
---
|
|
523
|
+
|
|
524
|
+
## License
|
|
525
|
+
|
|
526
|
+
Schema Guard is open-source under the **MIT License**. See the [LICENSE](LICENSE) file for details.
|
|
527
|
+
|
|
528
|
+
---
|
|
529
|
+
|
|
530
|
+
**Built with frustration turned into code by a data engineer who just wants to sleep through the night.** ✨
|