db-schema-guard 0.1.0__tar.gz

@@ -0,0 +1,530 @@
1
+ Metadata-Version: 2.4
2
+ Name: db-schema-guard
3
+ Version: 0.1.0
4
+ Summary: Stop silent schema drift before it breaks your data pipelines
5
+ Author-email: Anil Solanki <anusolanki2645@gmail.com>
6
+ License: MIT
7
+ Keywords: data-engineering,schema,ci-cd,postgresql
8
+ Classifier: Development Status :: 3 - Alpha
9
+ Classifier: Intended Audience :: Developers
10
+ Classifier: License :: OSI Approved :: MIT License
11
+ Classifier: Programming Language :: Python :: 3
12
+ Classifier: Programming Language :: Python :: 3.8
13
+ Classifier: Programming Language :: Python :: 3.9
14
+ Classifier: Programming Language :: Python :: 3.10
15
+ Classifier: Programming Language :: Python :: 3.11
16
+ Description-Content-Type: text/markdown
17
+ Requires-Dist: click>=8.0
18
+ Requires-Dist: pyyaml>=6.0
19
+ Requires-Dist: sqlalchemy>=1.4
20
+ Requires-Dist: psycopg2-binary
21
+ Requires-Dist: python-dotenv
22
+ Provides-Extra: dev
23
+ Requires-Dist: pytest>=7.0; extra == "dev"
24
+
25
+
26
+ # 🛡️ Schema Guard
27
+
28
+ **Stop silent schema drift before it breaks your production data pipelines.**
29
+
30
+ Schema Guard is a lightweight CLI tool that captures database schema snapshots and acts as a CI/CD gate, blocking deployments when unauthorized schema changes are detected. It's the missing guardrail for data engineers who've been burned by unexpected column drops, type changes, or nullability shifts.
31
+
32
+ ---
33
+
34
+ ## Why Schema Guard?
35
+
36
+ Every data engineer knows the 2 AM nightmare:
37
+ - A source team adds a column, changes a type, or drops a `NOT NULL` without notice.
38
+ - Your ETL pipeline doesn't fail; it silently writes `NULL`s or corrupts downstream data.
39
+ - Executive dashboards break, and you spend hours tracing the issue back to a trivial schema change.
40
+
41
+ Schema Guard stops this at the CI gate. You define a **data contract** (YAML), capture a trusted **schema snapshot**, and then in your deployment pipeline, `schema-guard gate` compares the live source against the snapshot and contract. Any drift fails the build *before* it hits production.
42
+
43
+ ---
44
+
45
+ ## 🧠 Architecture Overview
46
+
47
+ ```
48
+
49
+ ┌─────────────────┐        ┌──────────────┐       ┌─────────────────┐
50
+ │    Contract     │──────▶ │  CLI (snap)  │──────▶│  Snapshot JSON  │
51
+ │  (orders.yaml)  │        └──────────────┘       └─────────────────┘
52
+ │                 │                                        │
53
+ │  Live Database  │                                        │
54
+ └─────────────────┘                                        │
55
+                                                            │
56
+                                                ┌───────────▼───────────┐
57
+                                                │ CLI (gate)            │
58
+                                                │ Compares live schema  │
59
+                                                │ vs snapshot + contract│
60
+                                                └───────────┬───────────┘
61
+                                                            │
62
+                                                ┌───────────▼───────────┐
63
+                                                │ Diff Engine           │
64
+                                                │ Detects violations    │
65
+                                                └───────────┬───────────┘
66
+                                                            │
67
+                                                ┌───────────▼───────────┐
68
+                                                │ Alerter (email)       │
69
+                                                │ Sends notifications   │
70
+                                                └───────────────────────┘
71
+ ```
72
+
73
+ 1. **Contract** – Describes the expected schema (columns, types, nullability, allowed drifts).
74
+ 2. **Snapshot** – A frozen, integrity-verified JSON representation of the real schema at a trusted point in time.
75
+ 3. **Gate** – Compares live schema to snapshot + contract. If drift is detected, it logs violations, sends an email, and exits with code 1 to fail the CI pipeline.
76
+
77
+ ---
78
+
79
+ ## ✨ Key Features
80
+
81
+ - **Snapshot integrity verification**: Every snapshot is SHA-256 hashed. Tampered or corrupted snapshot files are detected and rejected on load.
82
+ - **Overwrite protection**: The `snap` command refuses to overwrite an existing baseline unless `--force` is passed, preventing accidental baseline drift.
83
+ - **Case-insensitive type comparison**: `INTEGER` vs `integer` and `NUMERIC(10, 2)` vs `numeric(10,2)` are treated as equal. No more false positives.
84
+ - **Primary key drift detection**: Dropped or added primary key constraints are flagged as CRITICAL violations.
85
+ - **Contract validation**: Malformed YAML contracts fail fast with clear error messages instead of cryptic `KeyError` crashes.
86
+ - **Contract-vs-snapshot warnings**: If the snapshot itself doesn't match the contract expectations, you'll see a WARNING notice so you know the baseline may have been captured at a bad time.
87
+ - **Allowed drift visibility**: When an `allowed_drift` rule matches, it's logged as an INFO notice instead of being silently swallowed.
88
+ - **Reliable alerting**: Email alert failures are reported to stderr. Use `--require-alert` to fail the gate if email delivery fails.
89
+
90
+ ---
91
+
92
+ ## 🚀 Installation
93
+
94
+ ### From source (for development)
95
+ ```bash
96
+ git clone https://github.com/your-username/schema-guard.git
97
+ cd schema-guard
98
+ pip install -e ".[dev]"
99
+ ```
100
+
101
+ ### From PyPI (once published)
102
+ ```bash
103
+ pip install db-schema-guard
104
+ ```
105
+
106
+ Requires Python ≥ 3.8. Dependencies are installed automatically.
107
+
108
+ ---
109
+
110
+ ## ⚡ Quick Start
111
+
112
+ ### 1. Create the database table
113
+ ```sql
114
+ CREATE TABLE public.orders (
115
+ order_id INT PRIMARY KEY,
116
+ amount DECIMAL(10,2) NOT NULL,
117
+ status VARCHAR(20) NOT NULL
118
+ );
119
+
120
+ INSERT INTO orders VALUES (1, 99.99, 'shipped');
121
+ ```
122
+
123
+ ### 2. Define a contract – `contracts/orders.yaml`
124
+ ```yaml
125
+ source:
126
+   name: prod_orders
127
+   type: postgres
128
+   connection: "env:DB_CONNECTION_STRING"  # see Configuration
129
+   schema: public
130
+   table: orders
131
+ columns:
132
+   - name: order_id
133
+     type: integer
134
+     nullable: false
135
+   - name: amount
136
+     type: numeric(10,2)
137
+     nullable: false
138
+     allowed_drift:
139
+       - from: "numeric(10,2)"
140
+         to: "numeric(12,2)"
141
+   - name: status
142
+     type: character varying(20)
143
+     nullable: false
144
+ ```
145
+
146
+ ### 3. Set up environment variables
147
+
148
+ Copy the example file and fill in your values:
149
+ ```bash
150
+ cp .env.example .env
151
+ ```
152
+
153
+ Edit `.env`:
154
+ ```env
155
+ DB_CONNECTION_STRING=postgresql://postgres:yourpassword@localhost:5432/schema_guard
156
+ ```
157
+ > **Note:** Special characters must be URL-encoded, e.g., `@` → `%40`
158
+
159
+ > **⚠️ Security:** Never commit `.env` to version control. The `.gitignore` is already configured to exclude it. Use `.env.example` as a template reference.
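The URL-encoding note above can also be handled in code. Below is a minimal sketch using the standard library's `urllib.parse.quote_plus`; the helper name `build_postgres_url` and the sample password are illustrative and not part of Schema Guard.

```python
from urllib.parse import quote_plus


def build_postgres_url(user, password, host, port, dbname):
    """Assemble a PostgreSQL URL, percent-encoding the user and password.

    Hypothetical helper for illustration only.
    """
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{dbname}"
    )


# '@' becomes %40 and '%' becomes %25, so the URL parses unambiguously.
print(build_postgres_url("postgres", "p@ss%word", "localhost", 5432, "schema_guard"))
```

The printed value is what would go into `DB_CONNECTION_STRING` in `.env`.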
160
+
161
+ ### 4. Capture a snapshot of the current schema
162
+ ```bash
163
+ schema-guard snap --contract contracts/orders.yaml --snapshot-file snapshots/orders.json
164
+ ```
165
+ ✅ Snapshot saved to `snapshots/orders.json`
166
+
167
+ ### 5. Verify the gate passes
168
+ ```bash
169
+ schema-guard gate --contract contracts/orders.yaml --snapshot-file snapshots/orders.json
170
+ ```
171
+ ✅ Schema matches snapshot. No drift.
172
+
173
+ ### 6. Simulate a drift
174
+ ```sql
175
+ ALTER TABLE public.orders ALTER COLUMN amount DROP NOT NULL;
176
+ ```
177
+
178
+ ### 7. Run the gate again (should fail)
179
+ ```bash
180
+ schema-guard gate --contract contracts/orders.yaml --snapshot-file snapshots/orders.json
181
+ ```
182
+ ```text
183
+ ❌ Schema drift detected:
184
+ - CRITICAL: Column 'amount' nullable changed from False to True.
185
+ ```
186
+ The command exits with code 1, and an email alert is sent (if configured).
187
+
188
+ ### 8. Revert to clean state
189
+ ```sql
190
+ ALTER TABLE public.orders ALTER COLUMN amount SET NOT NULL;
191
+ ```
192
+ Now the gate passes again.
193
+
194
+ ---
195
+
196
+ ## 📋 Configuration
197
+
198
+ ### Database connection
199
+ Define a PostgreSQL connection string in `.env`:
200
+ ```env
201
+ DB_CONNECTION_STRING=postgresql://user:password@host:5432/dbname
202
+ ```
203
+ In the contract, reference it with `"env:DB_CONNECTION_STRING"`.
204
+ The tool's `contract.py` resolves any value beginning with `env:` to the corresponding environment variable.
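The `env:` resolution described above could look roughly like the sketch below. The function name `resolve_connection` and the `ValueError` raised for a missing variable are assumptions; the real `contract.py` may behave differently.

```python
import os

ENV_PREFIX = "env:"


def resolve_connection(value, environ=None):
    """Return `value` unchanged, or the environment variable it points to.

    Hypothetical sketch: "env:NAME" is looked up in `environ`
    (defaults to os.environ); anything else is treated as a literal URL.
    """
    environ = os.environ if environ is None else environ
    if not value.startswith(ENV_PREFIX):
        return value
    name = value[len(ENV_PREFIX):]
    if name not in environ:
        raise ValueError(f"Environment variable '{name}' is not set")
    return environ[name]
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.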
205
+
206
+ ### Email alerting (optional)
207
+ Add these to `.env` to receive drift alerts via SMTP:
208
+ ```env
209
+ EMAIL_ENABLED=true
210
+ EMAIL_HOST=smtp.gmail.com
211
+ EMAIL_PORT=587
212
+ EMAIL_USER=your_email@gmail.com
213
+ EMAIL_PASSWORD=your_app_password # Use an App Password, not your real password
214
+ EMAIL_FROM=your_email@gmail.com
215
+ EMAIL_TO=oncall@example.com
216
+ EMAIL_SUBJECT=Schema Drift Detected
217
+ ```
218
+ If `EMAIL_ENABLED` is not `true`, alerts are silently skipped: the gate still works, but no email is sent.
219
+
220
+ ---
221
+
222
+ ## 📖 Command Reference
223
+
224
+ ### `snap` – capture a schema snapshot
225
+ ```bash
226
+ schema-guard snap --contract <contract.yaml> --snapshot-file <snapshot.json> [--force]
227
+ ```
228
+ | Flag | Description |
229
+ |------|-------------|
230
+ | `--contract` | **(required)** Path to the contract YAML file |
231
+ | `--snapshot-file` | Where to save the snapshot (default: `schema_snapshot.json`) |
232
+ | `--force` | Overwrite an existing snapshot. Without this flag, the command will refuse to overwrite and show a diff of detected changes |
233
+
234
+ - Connects to the source defined in the contract.
235
+ - Inspects the table and saves column metadata (name, type, nullable, primary key) along with a SHA-256 hash and timestamp.
236
+ - **Overwrite protection:** If a snapshot already exists, you must pass `--force` to replace it. This prevents accidentally re-baselining against a drifted schema.
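To make the hash-and-verify idea concrete, here is a minimal sketch of how a snapshot might be sealed and checked. The field names (`schema`, `sha256`, `captured_at`) and function names are assumptions for illustration; the layout actually written by `snapshot.py` may differ.

```python
import hashlib
import json
from datetime import datetime, timezone


def compute_hash(schema):
    """SHA-256 over a canonical JSON form, so key order does not matter."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_snapshot(schema):
    """Wrap extracted schema metadata with a hash and a UTC timestamp."""
    return {
        "schema": schema,
        "sha256": compute_hash(schema),  # assumed field name
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }


def verify_snapshot(snapshot):
    """Return the schema if the stored hash matches, otherwise raise."""
    stored = snapshot.get("sha256")
    if stored is None or stored != compute_hash(snapshot.get("schema")):
        raise ValueError("Snapshot integrity check FAILED")
    return snapshot["schema"]
```

Hashing a canonical serialization rather than the raw file bytes means harmless reformatting of the JSON does not trip the check, while any change to the schema content does.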
237
+
238
+ ### `gate` – check for drift
239
+ ```bash
240
+ schema-guard gate --contract <contract.yaml> --snapshot-file <snapshot.json> [--require-alert]
241
+ ```
242
+ | Flag | Description |
243
+ |------|-------------|
244
+ | `--contract` | **(required)** Path to the contract YAML file |
245
+ | `--snapshot-file` | Baseline snapshot to compare against (default: `schema_snapshot.json`) |
246
+ | `--require-alert` | Exit with code 2 if email alert fails to send |
247
+
248
+ - Verifies snapshot integrity (SHA-256 hash check) before comparing.
249
+ - Extracts the current live schema.
250
+ - Validates contract expectations against the snapshot (emits warnings if mismatched).
251
+ - Compares live schema against the snapshot using the contract rules.
252
+ - **Violation types:**
253
+
254
+ | Violation | Severity |
255
+ |-----------|----------|
256
+ | Column removed | CRITICAL |
257
+ | Type changed (not in `allowed_drift`) | CRITICAL |
258
+ | Nullable changed | CRITICAL |
259
+ | Primary key constraint dropped | CRITICAL |
260
+ | Primary key constraint added | CRITICAL |
261
+ | New column added | WARNING |
262
+ | Allowed drift matched | INFO (notice, non-blocking) |
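The structural rows of the table above can be sketched as a small comparison function. This is an illustration under assumptions, not the shipped `diff_engine.py`: the function name, the `(severity, message)` tuple shape, and most message wording are hypothetical, and type comparison is left out for brevity.

```python
def diff_columns(snapshot_columns, live_columns):
    """Return (severity, message) tuples for structural drift.

    Each column is a dict with "name", "nullable" and "primary_key" keys.
    """
    old = {c["name"]: c for c in snapshot_columns}
    new = {c["name"]: c for c in live_columns}
    findings = []
    for name, before in old.items():
        if name not in new:
            findings.append(("CRITICAL", f"Column '{name}' was removed."))
            continue
        after = new[name]
        if before["nullable"] != after["nullable"]:
            findings.append((
                "CRITICAL",
                f"Column '{name}' nullable changed from "
                f"{before['nullable']} to {after['nullable']}.",
            ))
        if before["primary_key"] and not after["primary_key"]:
            findings.append(("CRITICAL", f"Primary key dropped on '{name}'."))
        elif after["primary_key"] and not before["primary_key"]:
            findings.append(("CRITICAL", f"Primary key added on '{name}'."))
    for name in new:
        if name not in old:
            findings.append(("WARNING", f"New column '{name}' was added."))
    return findings
```

Indexing both sides by column name keeps the comparison independent of column order.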
263
+
264
+ ### Exit codes
265
+
266
+ | Code | Meaning |
267
+ |------|---------|
268
+ | `0` | Gate passed: no drift detected |
269
+ | `1` | Schema drift detected: violations found |
270
+ | `2` | Email alert failed (when `--require-alert` is set) or unsupported source type |
271
+ | `3` | Snapshot integrity check failed (file was tampered with or corrupted) |
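If you call the gate from a Python deployment script rather than directly from CI, the exit codes above can be mapped to messages. A minimal sketch follows; `run_gate` and `describe_exit_code` are hypothetical helper names, and only the flags documented above are used.

```python
import subprocess

# Meanings copied from the exit-code table above.
EXIT_MEANINGS = {
    0: "Gate passed: no drift detected",
    1: "Schema drift detected",
    2: "Email alert failed or unsupported source type",
    3: "Snapshot integrity check failed",
}


def describe_exit_code(code):
    """Translate a schema-guard exit code into a human-readable message."""
    return EXIT_MEANINGS.get(code, f"Unexpected exit code {code}")


def run_gate(contract, snapshot_file):
    """Run `schema-guard gate` and return (exit_code, description)."""
    result = subprocess.run(
        ["schema-guard", "gate",
         "--contract", contract,
         "--snapshot-file", snapshot_file],
        check=False,  # a non-zero exit is an expected outcome, not an error
    )
    return result.returncode, describe_exit_code(result.returncode)
```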
272
+
273
+ ---
274
+
275
+ ## 🧱 Contract YAML Specification
276
+
277
+ A contract file defines the expected state of a data source. The contract is validated on load: missing or malformed fields produce clear error messages.
278
+
279
+ ```yaml
280
+ source:
281
+   name: friendly_name  # Used in logs (optional)
282
+   type: postgres  # Source type (currently only postgres supported)
283
+   connection: "env:DB_CONNECTION_STRING"  # Database URL or env:VARIABLE
284
+   schema: public  # Database schema (required)
285
+   table: orders  # Table name (required)
286
+
287
+ columns:  # Optional section
288
+   - name: order_id  # Column name (required)
289
+     type: integer  # Expected type (required, case-insensitive matching)
290
+     nullable: false  # Expected nullability (required)
291
+     allowed_drift:  # Optional: type changes allowed without alert
292
+       - from: "numeric(10,2)"
293
+         to: "numeric(12,2)"
294
+ ```
295
+
296
+ ### Required fields
297
+
298
+ | Section | Required Keys |
299
+ |---------|---------------|
300
+ | `source` | `type`, `connection`, `schema`, `table` |
301
+ | Each column in `columns` | `name`, `type`, `nullable` |
302
+ | Each rule in `allowed_drift` | `from`, `to` |
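The required-key rules in the table above can be sketched as a validator over the already-parsed YAML (for example the result of `yaml.safe_load`). The function name, the error wording beyond what appears in Troubleshooting, and the assumption that `columns` is a top-level key are all illustrative.

```python
SUPPORTED_TYPES = {"postgres"}
REQUIRED_SOURCE_KEYS = ("type", "connection", "schema", "table")
REQUIRED_COLUMN_KEYS = ("name", "type", "nullable")
REQUIRED_DRIFT_KEYS = ("from", "to")


def _missing(mapping, keys):
    """List the required keys that are absent from a mapping."""
    return [key for key in keys if key not in mapping]


def validate_contract(contract):
    """Return a list of error strings; an empty list means the contract is valid."""
    if not isinstance(contract, dict):
        return ["Contract is empty or not a mapping"]
    errors = []
    source = contract.get("source")
    if not isinstance(source, dict):
        errors.append("Contract is missing the 'source' section")
    else:
        errors += [f"Contract 'source' is missing required key '{k}'"
                   for k in _missing(source, REQUIRED_SOURCE_KEYS)]
        if "type" in source and source["type"] not in SUPPORTED_TYPES:
            errors.append(f"Unsupported source type '{source['type']}'")
    # `columns` is optional; each entry is assumed to be a mapping.
    for i, column in enumerate(contract.get("columns") or []):
        errors += [f"Column #{i} is missing required key '{k}'"
                   for k in _missing(column, REQUIRED_COLUMN_KEYS)]
        for j, rule in enumerate(column.get("allowed_drift") or []):
            errors += [f"Column #{i} allowed_drift #{j} is missing '{k}'"
                       for k in _missing(rule, REQUIRED_DRIFT_KEYS)]
    return errors
```

Collecting every problem before reporting, rather than stopping at the first, lets a contract author fix all issues in one pass.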
303
+
304
+ ### Matching behavior
305
+ - **`type`** comparison is **case-insensitive** and **whitespace-insensitive**. `INTEGER` matches `integer`, and `NUMERIC(10, 2)` matches `numeric(10,2)`.
306
+ - **`nullable`** must be `true` or `false`.
307
+ - **`allowed_drift`** rules are also matched case-insensitively. When a rule matches, an INFO notice is logged for visibility.
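The matching rules above can be illustrated with a short sketch. The names `normalize_type` and `classify_type_change` are assumptions (the test listing suggests a normalization step exists, but its exact signature is not shown here), and the return values simply reuse the severities from the violation table.

```python
def normalize_type(type_name):
    """Lowercase a type name and strip all whitespace."""
    return "".join(type_name.split()).lower()


def classify_type_change(old_type, new_type, allowed_drift=()):
    """Return None (no change), "INFO" (allowed drift) or "CRITICAL".

    `allowed_drift` is a sequence of {"from": ..., "to": ...} rules,
    compared after the same normalization as the types themselves.
    """
    old, new = normalize_type(old_type), normalize_type(new_type)
    if old == new:
        return None
    for rule in allowed_drift:
        if normalize_type(rule["from"]) == old and normalize_type(rule["to"]) == new:
            return "INFO"
    return "CRITICAL"


rules = [{"from": "numeric(10,2)", "to": "numeric(12,2)"}]
print(classify_type_change("INTEGER", "integer"))                      # None
print(classify_type_change("numeric(10,2)", "NUMERIC(12, 2)", rules))  # INFO
print(classify_type_change("integer", "text", rules))                  # CRITICAL
```

Note that an allowed rule is directional: widening `numeric(10,2)` to `numeric(12,2)` matches, while the reverse change would still be CRITICAL in this sketch.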
308
+
309
+ ---
310
+
311
+ ## 🔔 Alerting (Email)
312
+
313
+ The `alerter.py` module sends an email with the list of violations. It uses Python's standard `smtplib` and `email` libraries, so no extra dependencies are needed.
314
+
315
+ - Alert errors are printed to **stderr** as `[alerter] Failed to send email: ...`
316
+ - The gate still fails with exit code 1 regardless of email success/failure.
317
+ - To **require** successful email delivery, use `--require-alert`. If the email fails, the gate exits with code 2.
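For readers curious how such an alerter can be built from the standard library alone, here is a rough sketch. It reads the `EMAIL_*` variables documented in Configuration; the function names, the use of STARTTLS, and the boolean return value are assumptions rather than a description of the actual `alerter.py`.

```python
import os
import smtplib
import sys
from email.message import EmailMessage


def build_alert(violations, env):
    """Compose the alert message from a list of violation strings."""
    msg = EmailMessage()
    msg["Subject"] = env.get("EMAIL_SUBJECT", "Schema Drift Detected")
    msg["From"] = env["EMAIL_FROM"]
    msg["To"] = env["EMAIL_TO"]
    msg.set_content("\n".join(f"- {v}" for v in violations))
    return msg


def send_alert(violations, env=None):
    """Send the alert; return True on success, False if skipped or failed."""
    env = os.environ if env is None else env
    if env.get("EMAIL_ENABLED", "").lower() != "true":
        return False  # alerting disabled: skip quietly
    try:
        port = int(env.get("EMAIL_PORT", "587"))
        with smtplib.SMTP(env["EMAIL_HOST"], port) as smtp:
            smtp.starttls()  # assumed: port 587 with STARTTLS
            smtp.login(env["EMAIL_USER"], env["EMAIL_PASSWORD"])
            smtp.send_message(build_alert(violations, env))
        return True
    except (smtplib.SMTPException, OSError, KeyError, ValueError) as exc:
        print(f"[alerter] Failed to send email: {exc}", file=sys.stderr)
        return False
```

A caller implementing `--require-alert` could treat a `False` return after an enabled send as the trigger for exit code 2.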
318
+
319
+ ---
320
+
321
+ ## ⚙️ CI/CD Integration
322
+
323
+ ### GitHub Actions example – `.github/workflows/schema-check.yml`
324
+ ```yaml
325
+ name: Schema Drift Gate
326
+
327
+ on: [push, pull_request]
328
+
329
+ jobs:
330
+   schema-drift:
331
+     runs-on: ubuntu-latest
332
+     steps:
333
+       - uses: actions/checkout@v3
334
+       - uses: actions/setup-python@v4
335
+         with:
336
+           python-version: '3.10'
337
+       - name: Install schema-guard
338
+         run: pip install -e .  # or pip install db-schema-guard
339
+       - name: Run schema gate
340
+         run: schema-guard gate --contract contracts/orders.yaml --snapshot-file snapshots/orders.json --require-alert
341
+         env:
342
+           DB_CONNECTION_STRING: ${{ secrets.DB_CONNECTION_STRING }}
343
+           EMAIL_ENABLED: true
344
+           EMAIL_HOST: ${{ secrets.EMAIL_HOST }}
345
+           EMAIL_PORT: ${{ secrets.EMAIL_PORT }}
346
+           EMAIL_USER: ${{ secrets.EMAIL_USER }}
347
+           EMAIL_PASSWORD: ${{ secrets.EMAIL_PASSWORD }}
348
+           EMAIL_FROM: ${{ secrets.EMAIL_FROM }}
349
+           EMAIL_TO: ${{ secrets.EMAIL_TO }}
350
+ ```
351
+
352
+ Store these values as [GitHub Secrets](https://docs.github.com/en/actions/security-guides/encrypted-secrets).
353
+ Now every push and pull request will be checked for schema drift automatically.
354
+
355
+ ---
356
+
357
+ ## 🧪 Testing
358
+
359
+ Schema Guard includes a comprehensive test suite with **39 unit tests** covering the core modules. No database connection is required to run the tests; they use in-memory fixtures.
360
+
361
+ ### Running the test suite
362
+
363
+ ```bash
364
+ # Install with dev dependencies (includes pytest)
365
+ pip install -e ".[dev]"
366
+
367
+ # Run all tests with verbose output
368
+ python -m pytest tests/ -v
369
+ ```
370
+
371
+ Expected output:
372
+ ```text
373
+ tests/test_contract.py::TestValidContract::test_loads_valid_contract PASSED
374
+ tests/test_contract.py::TestValidContract::test_contract_without_columns_section PASSED
375
+ tests/test_contract.py::TestEnvResolution::test_resolves_env_variable PASSED
376
+ tests/test_contract.py::TestEnvResolution::test_missing_env_variable_raises PASSED
377
+ tests/test_contract.py::TestContractValidation::test_missing_source_section PASSED
378
+ tests/test_contract.py::TestContractValidation::test_missing_source_type PASSED
379
+ tests/test_contract.py::TestContractValidation::test_missing_source_connection PASSED
380
+ tests/test_contract.py::TestContractValidation::test_unsupported_source_type PASSED
381
+ tests/test_contract.py::TestContractValidation::test_column_missing_name PASSED
382
+ tests/test_contract.py::TestContractValidation::test_column_missing_type PASSED
383
+ tests/test_contract.py::TestContractValidation::test_malformed_allowed_drift PASSED
384
+ tests/test_contract.py::TestContractValidation::test_empty_yaml_raises PASSED
385
+ tests/test_diff_engine.py::TestNormalizeType::test_lowercase PASSED
386
+ tests/test_diff_engine.py::TestNormalizeType::test_strip_spaces PASSED
387
+ tests/test_diff_engine.py::TestNormalizeType::test_already_normalized PASSED
388
+ tests/test_diff_engine.py::TestColumnRemoved::test_detects_removed_column PASSED
389
+ tests/test_diff_engine.py::TestColumnAdded::test_detects_added_column PASSED
390
+ tests/test_diff_engine.py::TestNullableChange::test_detects_nullable_change PASSED
391
+ tests/test_diff_engine.py::TestNullableChange::test_no_violation_when_nullable_matches PASSED
392
+ tests/test_diff_engine.py::TestTypeChange::test_detects_type_change PASSED
393
+ tests/test_diff_engine.py::TestTypeChange::test_case_insensitive_no_false_positive PASSED
394
+ tests/test_diff_engine.py::TestTypeChange::test_whitespace_insensitive PASSED
395
+ tests/test_diff_engine.py::TestAllowedDrift::test_allowed_drift_passes PASSED
396
+ tests/test_diff_engine.py::TestAllowedDrift::test_disallowed_drift_fails PASSED
397
+ tests/test_diff_engine.py::TestAllowedDrift::test_allowed_drift_case_insensitive PASSED
398
+ tests/test_diff_engine.py::TestPrimaryKeyChange::test_detects_pk_dropped PASSED
399
+ tests/test_diff_engine.py::TestPrimaryKeyChange::test_detects_pk_added PASSED
400
+ tests/test_diff_engine.py::TestPrimaryKeyChange::test_no_violation_when_pk_matches PASSED
401
+ tests/test_diff_engine.py::TestContractVsSnapshot::test_warns_type_mismatch PASSED
402
+ tests/test_diff_engine.py::TestContractVsSnapshot::test_warns_nullable_mismatch PASSED
403
+ tests/test_diff_engine.py::TestContractVsSnapshot::test_no_warning_when_contract_matches PASSED
404
+ tests/test_diff_engine.py::TestNoDrift::test_identical_schemas_pass PASSED
405
+ tests/test_snapshot.py::TestRoundTrip::test_save_and_load PASSED
406
+ tests/test_snapshot.py::TestRoundTrip::test_hash_is_deterministic PASSED
407
+ tests/test_snapshot.py::TestRoundTrip::test_creates_parent_directories PASSED
408
+ tests/test_snapshot.py::TestIntegrityVerification::test_tampered_schema_raises PASSED
409
+ tests/test_snapshot.py::TestIntegrityVerification::test_tampered_hash_raises PASSED
410
+ tests/test_snapshot.py::TestIntegrityVerification::test_missing_hash_raises PASSED
411
+ tests/test_snapshot.py::TestIntegrityVerification::test_valid_snapshot_passes PASSED
412
+
413
+ ============================= 39 passed ==============================
414
+ ```
415
+
416
+ ### What the tests cover
417
+
418
+ | Test File | Module | What's Tested |
419
+ |-----------|--------|---------------|
420
+ | `test_diff_engine.py` | `diff_engine.py` | Column added/removed, type change detection, case-insensitive comparison, nullable change, primary key drift, allowed drift (with logging), contract-vs-snapshot validation, no-drift happy path |
421
+ | `test_contract.py` | `contract.py` | Valid YAML loading, env variable resolution, missing env var error, missing required keys (`source`, `type`, `connection`, etc.), unsupported source type, malformed `allowed_drift`, empty YAML |
422
+ | `test_snapshot.py` | `snapshot.py` | Save/load round-trip, SHA-256 hash determinism, parent directory auto-creation, tamper detection (modified schema, fake hash, missing hash field) |
423
+
424
+ ### Running a specific test file
425
+ ```bash
426
+ python -m pytest tests/test_diff_engine.py -v # Only diff engine tests
427
+ python -m pytest tests/test_contract.py -v # Only contract tests
428
+ python -m pytest tests/test_snapshot.py -v # Only snapshot tests
429
+ ```
430
+
431
+ ### Running a specific test class or method
432
+ ```bash
433
+ python -m pytest tests/test_diff_engine.py::TestPrimaryKeyChange -v # One class
434
+ python -m pytest tests/test_snapshot.py::TestIntegrityVerification::test_tampered_schema_raises -v # One test
435
+ ```
436
+
437
+ ---
438
+
439
+ ## 📁 Project Structure
440
+
441
+ ```
442
+ schema-guard/
443
+ ├── .github/workflows/
444
+ │   └── schema-check.yml        # CI/CD pipeline config
445
+ ├── contracts/
446
+ │   └── orders.yaml             # Your data contracts
447
+ ├── snapshots/                  # Baseline snapshots (auto-created)
448
+ │   └── orders.json
449
+ ├── src/schema_guard/
450
+ │   ├── __init__.py
451
+ │   ├── cli.py                  # Click CLI (snap & gate commands)
452
+ │   ├── contract.py             # YAML parser, env resolver & validator
453
+ │   ├── extractors/
454
+ │   │   ├── __init__.py
455
+ │   │   └── postgres.py         # PostgreSQL schema inspector
456
+ │   ├── snapshot.py             # Save/load snapshot JSON with integrity checks
457
+ │   ├── diff_engine.py          # Drift detection logic (type, nullable, PK)
458
+ │   └── alerter.py              # SMTP email alerting
459
+ ├── tests/
460
+ │   ├── test_diff_engine.py     # 20 tests: drift detection logic
461
+ │   ├── test_contract.py        # 12 tests: YAML loading & validation
462
+ │   └── test_snapshot.py        # 7 tests: save/load & integrity
463
+ ├── pyproject.toml              # Build & packaging config
464
+ ├── .env.example                # Template for environment variables
465
+ ├── .gitignore
466
+ └── README.md
467
+ ```
468
+
469
+ ---
470
+
471
+ ## 🔌 Extending with New Extractors
472
+
473
+ To support another data source (Snowflake, BigQuery, S3 Parquet, etc.):
474
+
475
+ 1. Create a new file in `src/schema_guard/extractors/` (e.g., `snowflake.py`).
476
+ 2. Implement a function `get_schema(connection_string, schema_name, table_name)` returning:
477
+ ```python
478
+ {
479
+ "table": "schema.table",
480
+ "columns": [
481
+ {
482
+ "name": "col1",
483
+ "type": "varchar",
484
+ "nullable": True,
485
+ "primary_key": False
486
+ },
487
+ ...
488
+ ]
489
+ }
490
+ ```
491
+ 3. In `cli.py`, add an `elif` branch for the new source type.
492
+ 4. Add any required packages to `pyproject.toml`.
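As a worked illustration of step 2, the sketch below implements the `get_schema` contract for SQLite, chosen only because Python ships `sqlite3` in the standard library. SQLite is not a source type mentioned by the project, so treat the whole module as hypothetical; it also assumes `connection_string` is a plain file path and `schema_name` is an attached database name such as `main`.

```python
import sqlite3


def get_schema(connection_string, schema_name, table_name):
    """Return table metadata in the dictionary shape shown above."""
    # PRAGMA statements cannot take bound parameters, so restrict the
    # identifiers before interpolating them into the statement.
    for ident in (schema_name, table_name):
        if not ident.isidentifier():
            raise ValueError(f"Unsafe identifier: {ident!r}")
    conn = sqlite3.connect(connection_string)
    try:
        rows = conn.execute(
            f"PRAGMA {schema_name}.table_info('{table_name}')"
        ).fetchall()
    finally:
        conn.close()
    if not rows:
        raise LookupError(f"Table {schema_name}.{table_name} not found")
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {
        "table": f"{schema_name}.{table_name}",
        "columns": [
            {
                "name": name,
                "type": col_type.lower(),
                "nullable": not notnull,
                "primary_key": pk > 0,
            }
            for _cid, name, col_type, notnull, _default, pk in rows
        ],
    }
```

An extractor for Snowflake or BigQuery would follow the same outline: query the engine's catalog, then map each row into the four documented keys.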
493
+
494
+ Pull requests are welcome!
495
+
496
+ ---
497
+
498
+ ## 🧩 Troubleshooting
499
+
500
+ | Problem | Solution |
501
+ |--------|----------|
502
+ | `FileNotFoundError: snapshots/orders.json` | Run `schema-guard snap` first to create the baseline snapshot. The `snapshots/` directory is auto-created. |
503
+ | `Snapshot integrity check FAILED` | The snapshot file was manually edited or corrupted. Re-capture a trusted snapshot with `schema-guard snap --force`. |
504
+ | `Contract 'source' is missing required key` | Your contract YAML is missing a required field. Check the [Contract YAML Specification](#-contract-yaml-specification). |
505
+ | `Unsupported source type` | Only `postgres` is currently supported. Check for typos in `source.type`. |
506
+ | `Could not parse SQLAlchemy URL` | The connection string is invalid. Check `.env` and contract; ensure `env:` placeholders resolve correctly. |
507
+ | `FATAL: password authentication failed` | Wrong credentials or special characters not URL-encoded. Encode `@` as `%40`, `%` as `%25`, etc. Test with `psql`. |
508
+ | Email not sent | Set `EMAIL_ENABLED=true`. Use an App Password for Gmail. Look for `[alerter]` messages in **stderr**. |
509
+ | Gate passes when it shouldn't | Ensure the snapshot file is the correct baseline and hasn't been tampered with. The hash check should catch this. |
510
+ | `ModuleNotFoundError` when running CLI | Run `pip install -e .` from the project root. |
511
+ | Snapshot overwrite refused | Pass `--force` to `schema-guard snap` to overwrite an existing snapshot. |
512
+
513
+ ---
514
+
515
+ ## 🔒 Security Best Practices
516
+
517
+ - **Never commit `.env`**: Use `.env.example` as a template. The `.gitignore` excludes `.env` by default.
518
+ - **Use GitHub Secrets** for CI/CD: Store `DB_CONNECTION_STRING`, email credentials, etc. as encrypted secrets.
519
+ - **Rotate credentials** if they were ever exposed in version control history.
520
+ - **Snapshot integrity**: Every snapshot includes a SHA-256 hash. If the file is tampered with, the `gate` command will reject it immediately.
521
+
522
+ ---
523
+
524
+ ## 📜 License
525
+
526
+ Schema Guard is open-source under the **MIT License**. See the [LICENSE](LICENSE) file for details.
527
+
528
+ ---
529
+
530
+ **Built with frustration turned into code by a data engineer who just wants to sleep through the night.** ✨