sqlever 0.1.0

package/spec/SPEC.md ADDED
@@ -0,0 +1,1521 @@
1
+ # sqlever — Product Specification
2
+
3
+ - **Version:** 0.9 (draft)
4
+ - **Status:** Pre-development — spec review in progress, no implementation yet
5
+ - **License:** Apache 2.0
6
+ - **Changelog:** [SPEC-CHANGELOG.md](SPEC-CHANGELOG.md)
7
+ - **Location:** `spec/`
8
+
9
+ ---
10
+
11
+ ## Table of contents
12
+
13
+ 1. [Inspiration and prior art](#1-inspiration-and-prior-art)
14
+ 2. [Problems we're solving](#2-problems-were-solving)
15
+ 3. [Goals and non-goals](#3-goals-and-non-goals)
16
+ 4. [Requirements](#4-requirements)
17
+ 5. [Feature ideas](#5-feature-ideas)
18
+ 6. [Design decisions](#6-design-decisions)
19
+ 7. [Architecture](#7-architecture)
20
+ 8. [Testing strategy](#8-testing-strategy)
21
+ 9. [Implementation plan](#9-implementation-plan)
22
+
23
+ ---
24
+
25
+ ## 1. Inspiration and prior art
26
+
27
+ ### Sqitch — the foundation
28
+
29
+ https://sqitch.org — the best migration tool that exists today. It gets the fundamentals right:
30
+
31
+ - Dependency-aware changes (not just sequential numbers)
32
+ - Database-native: tracks state in the database itself, not migration files
33
+ - No ORM coupling: plain SQL, works with anything
34
+ - Proper deploy/revert/verify trinity
35
+
36
+ What it gets wrong: written in Perl (distribution nightmare, no contributors), zero awareness of what a migration *does* to a running system, no primitives for zero-downtime or large-table surgery.
37
+
38
+ ### pgroll
39
+
40
+ https://github.com/xataio/pgroll — expand/contract pattern for PostgreSQL. The right idea (dual-schema transition so app and DB deploys decouple) but wrong implementation (full table recreation, heavy). Teaches us what to do and what not to do.
41
+
42
+ ### migrationpilot
43
+
44
+ https://github.com/mickelsamuel/migrationpilot — static analysis of migration SQL. Catches dangerous patterns. Thin implementation but the right instinct. We'll reimplement and extend.
45
+
46
+ ### GitLab migration helpers
47
+
48
+ - Source: https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/migration_helpers.rb
48
+ - Batched background migrations docs: https://docs.gitlab.com/development/database/batched_background_migrations/
49
+ - Migration style guide: https://docs.gitlab.com/development/migration_style_guide/
50
+ - Migration pipeline (how GitLab tests migrations): https://docs.gitlab.com/development/database/database_migration_pipeline/
52
+
53
+ Battle-tested at massive scale. `BatchedMigration` framework: throttled background DML, pause/resume, per-batch transactions, state tracked in Postgres. The gold standard for large-table data migrations. We extract the concepts, drop the Rails dependency.
54
+
55
+ ### SkyTools / PGQ
56
+
57
+ https://github.com/pgq/pgq — 3-partition rotating queue table, entirely inside Postgres. Proven architecture for durable queuing without external systems. Inspiration for our batched DML queue.
58
+
59
+ ### pg_index_pilot
60
+
61
+ https://gitlab.com/postgres-ai/postgresai/-/tree/main/components/index_pilot — pure PL/pgSQL tool for managing `REINDEX INDEX CONCURRENTLY` operations. Key design patterns relevant to sqlever: (1) write-ahead tracking with three-state lifecycle (`in_progress` → `completed` | `failed`) for crash recovery of non-transactional DDL, (2) advisory lock coordination using schema OID and `pg_try_advisory_lock()` to prevent concurrent operations, (3) invalid index cleanup by checking `pg_index.indisvalid` and dropping `_ccnew` suffixed indexes left by failed `REINDEX CONCURRENTLY`, (4) explicit pre-DDL commit to release held locks before executing concurrent DDL. These patterns informed sqlever's non-transactional write-ahead tracking and `sqlever.pending_changes` design.
62
+
63
+ ### Flyway / Liquibase
64
+
65
+ Sequential-numbered files, XML/YAML config, JVM runtime. Wrong philosophy. We take nothing.
66
+
67
+ ---
68
+
69
+ ## 2. Problems we're solving
70
+
71
+ ### Problem 1: Dangerous migrations reach production undetected
72
+
73
+ `ALTER TABLE orders ADD COLUMN processed_at timestamptz NOT NULL DEFAULT '2024-01-01'::timestamptz` — on PostgreSQL < 11 this rewrites the entire table (any `ADD COLUMN ... DEFAULT` causes a table rewrite). On a 500GB orders table at 3am, that's an outage. And even on PG 11+, volatile defaults like `DEFAULT gen_random_uuid()` still cause a full table rewrite (note: `now()` is `STABLE`, not volatile, so it does NOT cause a rewrite on PG 11+). No existing migration tool catches this before deploy.
74
+
75
+ ### Problem 2: Sqitch is Perl
76
+
77
+ Installing Sqitch means managing a Perl runtime. `cpan` in 2026. Binary distribution is painful. Contributing to it is painful. It has accumulated technical debt that will never be paid.
78
+
79
+ ### Problem 3: No tooling for zero-downtime schema changes
80
+
81
+ Renaming a column, changing a type, adding NOT NULL to an existing column — these all require coordinating application and database deploys. No migration tool has first-class primitives for the expand/contract pattern. Teams either skip the pattern (causing downtime) or implement it manually every time (error-prone).
82
+
83
+ ### Problem 4: Large-table data migrations cause incidents
84
+
85
+ Backfilling a new column across 100M rows in a single transaction holds a row lock on every updated row until commit and blocks any DDL on the table in the meantime. The standard advice ("batch it") has no tooling. Engineers write one-off scripts, forget to throttle, forget to handle failures, forget to track progress.
86
+
87
+ ### Problem 5: `\i` includes are not version-aware
88
+
89
+ Sqitch supports `\i file.sql` to include shared SQL. But if `shared/functions.sql` changes after a migration is written, replaying that migration on a fresh database uses the *current* version of the file — not the version that existed when the migration was created. Silent correctness bug, especially painful in CI.
90
+
91
+ **Real-world example:** The PostgresAI Console project (`postgres-ai/platform-ui/db`) has 130+ shared function files, 56+ view files, and 30+ trigger files — all included via `\i` in deploy scripts. One migration includes 96 files. When these shared files evolve (which they do regularly), historical migrations silently change behavior on replay. Git-correlated snapshot includes (Section 5.2) would resolve each `\i` to the exact file version that existed when the migration was added to the plan, making fresh-database deployments reproducible.
92
+
93
+ ### Problem 6: Migration tooling is invisible to AI agents
94
+
95
+ In the agentic development era, engineers increasingly have AI assistants writing and reviewing code. Migration tools have no affordances for this: no machine-readable output, no structured risk assessment, no integration with code review, no explanation of what SQL actually does.
96
+
97
+ ---
98
+
99
+ ## 3. Goals and non-goals
100
+
101
+ ### Goals
102
+
103
+ - Drop-in CLI replacement for Sqitch — alias `sqitch` → `sqlever` and nothing breaks
104
+ - PostgreSQL-only — depth over breadth, know the target platform deeply
105
+ - Single compiled binary — `bun build --compile`, no runtime deps, <50ms startup
106
+ - Static analysis as a first-class citizen — dangerous patterns caught before deploy, not after
107
+ - All advanced features are opt-in — v1.0 is safe to adopt without understanding expand/contract or batching
108
+ - AI-native — structured output, machine-readable risk reports, CI integration
109
+ - Composable — each major feature usable independently, without adopting the full tool. Teams using Flyway, Alembic, Rails migrations, or raw psql scripts should be able to run `sqlever analyze` as a standalone linter in their CI pipeline without touching their migration runner. The batched DML worker should be invokable standalone. No forced adoption of the whole stack.
110
+
111
+ ### Non-goals
112
+
113
+ - MySQL, SQLite, Oracle, CockroachDB support — explicitly out of scope
114
+ - ORM integration (ActiveRecord, Django ORM, Alembic, Prisma) — out of scope
115
+ - GUI or web dashboard — CLI only, composable with other tools
116
+ - Cloud-hosted service — out of scope for now
117
+ - Replacing application-level migration frameworks for teams already happy with them
118
+
119
+ ---
120
+
121
+ ## 4. Requirements
122
+
123
+ ### R1 — Sqitch CLI compatibility (mandatory)
124
+
125
+ All Sqitch commands must be supported with identical flags and semantics:
126
+
127
+ | Command | Description |
128
+ |---------|-------------|
129
+ | `sqlever init [project]` | Initialize project, create `sqitch.conf` and `sqitch.plan` |
130
+ | `sqlever add <name> [-n note] [-r dep] [--conflict dep]` | Add new change (supports `--no-transaction` pragma) |
131
+ | `sqlever deploy [target] [--to change] [--mode [all\|change\|tag]]` | Deploy changes |
132
+ | `sqlever revert [target] [--to change] [-y]` | Revert changes |
133
+ | `sqlever verify [target] [--from change] [--to change]` | Run verify scripts |
134
+ | `sqlever status [target]` | Show deployment status |
135
+ | `sqlever log [target]` | Show deployment history |
136
+ | `sqlever tag [name]` | Tag current deployment state |
137
+ | `sqlever rework <name>` | Rework an existing change (create new version with same name) |
138
+ | `sqlever rebase [--onto change]` | Revert then re-deploy (convenience for `revert` + `deploy`) |
139
+ | `sqlever bundle [--dest-dir dir]` | Package project for distribution |
140
+ | `sqlever checkout <branch>` | Deploy/revert changes to match a VCS branch |
141
+ | `sqlever show <type> <name>` | Display change/tag details or script contents |
142
+ | `sqlever plan [filter]` | Display plan contents in human-readable format |
143
+ | `sqlever upgrade` | Upgrade the Sqitch registry schema to current version |
144
+ | `sqlever engine add\|alter\|remove\|show\|list` | Manage database engines |
145
+ | `sqlever target add\|alter\|remove\|show\|list` | Manage deploy targets |
146
+ | `sqlever config` | Read/write configuration |
147
+ | `sqlever help [command]` | Show help |
148
+
149
+ Flags that must be supported: `--db-uri`, `--db-client`, `--plan-file`, `--top-dir`, `--registry`, `--quiet`, `--verbose`, `--target`, `--set` / `-s` (template variables), `--log-only`, `--no-verify`, `--verify`, `--no-prompt` / `-y`.
150
+
151
+ **`--set` / `-s`:** Template variable substitution. Sqitch supports passing variables at deploy time that are substituted in scripts via psql's `:variable` syntax. Commonly used for environment-specific schema names, roles, or tablespaces.
152
+
153
+ **`--log-only`:** Record a change as deployed without executing the script. Critical for adopting sqlever on databases where changes were already applied manually or by another tool.
154
+
155
+ **`--registry`:** Specifies the schema name for tracking tables (default: `sqitch`). Some teams use non-default registry schemas (e.g., `_sqitch` or `migrations`).
156
+
157
+ **`rework`:** Allows re-deploying a change that has already been deployed by creating a new version of it. Appends a reworked copy to the plan file with the same change name but a new change ID. A tag must exist between the old and new versions. The old version is referenceable via `change@tag` syntax. This is heavily used for iteratively evolving stored procedures, views, and functions.
158
+
159
+ ### R2 — Plan file format compatibility
160
+
161
+ `sqitch.plan` format must be parsed and written without modification. Existing Sqitch projects must be adoptable with zero file changes.
162
+
163
+ **Plan file pragmas:** The plan file begins with pragmas:
164
+ ```
165
+ %syntax-version=1
166
+ %project=myproject
167
+ %uri=urn:uuid:...
168
+ ```
169
+ The `%uri` pragma is set during `sqitch init --uri <uri>` and is used as the stable project identifier in `sqitch.projects.uri`.
170
+
171
+ **Change entry format:** Each change entry has the form:
172
+ ```
173
+ change_name [dependencies] YYYY-MM-DDTHH:MM:SSZ planner_name <planner_email> # note
174
+ ```
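As a rough illustration of the entry format above, a change line could be split into its fields along these lines. This is a minimal sketch, not the parser sqlever or Sqitch actually uses: the `PlanChange` shape, the `parseChangeLine` name, and the regular expression are all hypothetical, and it assumes the bracketed dependency list is whitespace-separated.

```typescript
// Hypothetical shape for one parsed change entry; field names are illustrative.
interface PlanChange {
  name: string;
  dependencies: string[];
  plannedAt: string;
  plannerName: string;
  plannerEmail: string;
  note: string;
}

// name, optional [deps], UTC timestamp, planner name, <email>, optional "# note".
const CHANGE_LINE =
  /^(\S+)(?:\s+\[([^\]]*)\])?\s+(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+(.+?)\s+<([^>]*)>(?:\s*#\s?(.*))?$/;

function parseChangeLine(line: string): PlanChange | null {
  const trimmed = line.trim();
  // Blank lines, pragmas (%...), tags (@...), and comment-only lines are not changes.
  if (trimmed === "" || /^[%@#]/.test(trimmed)) return null;
  const m = CHANGE_LINE.exec(trimmed);
  if (m === null) return null;
  // Unmatched optional groups are undefined at runtime, so defaults fill them in.
  const [, name = "", depList = "", plannedAt = "", plannerName = "", plannerEmail = "", note = ""] = m;
  return {
    name,
    dependencies: depList.split(/\s+/).filter((d) => d.length > 0),
    plannedAt,
    plannerName,
    plannerEmail,
    note,
  };
}
```

A real parser would also need to handle tags, pragmas, and the `project:change` and `change@tag` dependency forms described in this section; the sketch simply returns `null` for anything that is not a change entry.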
175
+
176
+ **Reworked changes:** The plan file supports duplicate change names separated by a tag. References to specific versions use `change@tag` syntax:
177
+ ```
178
+ add_users 2024-01-01T00:00:00Z user <user@example.com> # add users table
179
+ @v1.0 2024-01-01T00:01:00Z user <user@example.com> # tag v1.0
180
+ add_users [add_users@v1.0] 2024-02-01T00:00:00Z user <user@example.com> # rework users
181
+ ```
182
+
183
+ **Cross-project dependencies:** Dependencies may reference changes in other projects using `project:change` syntax. At deploy time, sqlever checks the tracking tables for the other project's changes.
184
+
185
+ **Change ID computation:** Each change has a unique change ID. Sqitch computes this as a SHA-1 hash using an object format (from `App::Sqitch::Plan::Change->info` and `->id`). The input to SHA-1 is:
186
+
187
+ ```
188
+ change <content_length>\0<content>
189
+ ```
190
+
191
+ Where `\0` is a null byte, `<content_length>` is the decimal string length of `<content>`, and `<content>` is the concatenation of the following lines (each terminated by `\n`):
192
+ ```
193
+ project <project_name>
194
+ uri <project_uri> ← conditional: only if project has a URI (%uri pragma)
195
+ change <change_name>
196
+ parent <parent_change_id> ← conditional: only if this change has a preceding change in the plan (i.e., not the first change)
197
+ planner <planner_name> <<planner_email>>
198
+ date <planned_at_iso8601>
199
+ requires ← conditional: only if change has requires dependencies
200
+ + dep1 ← indented with " + " prefix, one per line
201
+ + dep2
202
+ conflicts ← conditional: only if change has conflict dependencies
203
+ - dep1 ← indented with " - " prefix, one per line
204
+ ← blank line separator before note
205
+ <note text> ← conditional: raw note text (no "note" prefix), only if note is non-empty
206
+ ```
207
+
208
+ Key differences from earlier spec versions: (1) `uri` line is conditional on the project having a `%uri` pragma, (2) `parent` line appears for ANY change that has a preceding change in the plan (not just reworked changes — every change except the first has a parent), (3) requires and conflicts use section headers with indented entries (not `require <dep>` per line), (4) note is raw text after a blank line separator (not `note <text>`), (5) the blank line and note are only present when the note is non-empty. Requires and conflicts entries preserve declaration order (not sorted).
209
+
210
+ sqlever must compute identical IDs. Since `change_id` is the primary key in `sqitch.changes`, any divergence will break the mid-deploy handoff scenario.
211
+
212
+ **Tag ID computation:** Each tag also has a SHA-1 ID (stored as `tag_id` in `sqitch.tags`). The format is:
213
+
214
+ ```
215
+ tag <content_length>\0<content>
216
+ ```
217
+
218
+ Where `<content>` is the concatenation of the following lines (each terminated by `\n`):
219
+ ```
220
+ project <project_name>
221
+ uri <project_uri> ← conditional: only if project has a URI
222
+ tag @<tag_name>
223
+ change <change_id>
224
+ planner <planner_name> <<planner_email>>
225
+ date <planned_at_iso8601>
226
+ ← blank line separator before note
227
+ <note text> ← conditional: raw note text, only if note is non-empty
228
+ ```
229
+
230
+ Like change IDs, the `uri` line is conditional, and the note appears as raw text after a blank line separator (only when non-empty). Note: `tag_name` in the info string uses `format_name`, which prepends `@` — so a tag named `v1.0` produces the line `tag @v1.0`, not `tag v1.0`. This is critical for computing correct tag IDs.
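The change and tag ID formats above share the same envelope, `<kind> <content_length>\0<content>`, which could be hashed roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the info string is assumed to be assembled elsewhere exactly as described above, and `<content_length>` is assumed to count bytes of the UTF-8 encoding (the spec text says "string length", so this needs verifying against Sqitch before relying on it).

```typescript
import { createHash } from "node:crypto";

// Wrap an already-assembled info string in the "<kind> <length>\0<content>" envelope.
// ASSUMPTION: the length is the byte length of the UTF-8 encoded content.
function idEnvelope(kind: "change" | "tag", content: string): Uint8Array {
  const encoder = new TextEncoder();
  const body = encoder.encode(content);
  const header = encoder.encode(`${kind} ${body.length}\0`);
  const out = new Uint8Array(header.length + body.length);
  out.set(header, 0);
  out.set(body, header.length);
  return out;
}

// SHA-1 over the envelope, returned as 40 lowercase hex characters.
function objectId(kind: "change" | "tag", content: string): string {
  return createHash("sha1").update(idEnvelope(kind, content)).digest("hex");
}
```

Because `change_id` and `tag_id` are primary keys in the registry, any implementation along these lines would need to be checked against IDs recorded by a real Sqitch deployment, including projects with non-ASCII notes or planner names.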
231
+
232
+ ### R3 — Tracking schema compatibility
233
+
234
+ Sqitch tracking tables must be used as-is. Teams currently using Sqitch must be able to switch to sqlever mid-project without re-deploying all migrations. The tracking schema includes the following tables:
235
+
236
+ **`sqitch.projects`:**
237
+ ```sql
238
+ CREATE TABLE sqitch.projects (
239
+ project TEXT PRIMARY KEY,
240
+ uri TEXT NULL UNIQUE,
241
+ created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
242
+ creator_name TEXT NOT NULL,
243
+ creator_email TEXT NOT NULL
244
+ );
245
+ ```
246
+
247
+ Note: `uri` has a `UNIQUE` constraint (projects are uniquely identified by URI when present). `clock_timestamp()` is used instead of `NOW()` — `clock_timestamp()` returns wall-clock time and advances within a transaction, so in `--mode all` each change gets a distinct timestamp rather than all sharing the transaction start time.
248
+
249
+ **`sqitch.releases`:**
250
+ ```sql
251
+ CREATE TABLE sqitch.releases (
252
+ version REAL PRIMARY KEY,
253
+ installed_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
254
+ installer_name TEXT NOT NULL,
255
+ installer_email TEXT NOT NULL
256
+ );
257
+ ```
258
+
259
+ Note: The `releases` table tracks registry schema versions and is used by `sqlever upgrade` to determine if the tracking schema needs migration.
260
+
261
+ **`sqitch.changes`:**
262
+ ```sql
263
+ CREATE TABLE sqitch.changes (
264
+ change_id TEXT PRIMARY KEY,
265
+ script_hash TEXT,
266
+ change TEXT NOT NULL,
267
+ project TEXT NOT NULL REFERENCES sqitch.projects(project) ON UPDATE CASCADE,
268
+ note TEXT NOT NULL DEFAULT '',
269
+ committed_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
270
+ committer_name TEXT NOT NULL,
271
+ committer_email TEXT NOT NULL,
272
+ planned_at TIMESTAMPTZ NOT NULL,
273
+ planner_name TEXT NOT NULL,
274
+ planner_email TEXT NOT NULL,
275
+ UNIQUE (project, script_hash)
276
+ );
277
+ ```
278
+
279
+ **`sqitch.tags`:**
280
+ ```sql
281
+ CREATE TABLE sqitch.tags (
282
+ tag_id TEXT PRIMARY KEY,
283
+ tag TEXT NOT NULL,
284
+ project TEXT NOT NULL REFERENCES sqitch.projects(project) ON UPDATE CASCADE,
285
+ change_id TEXT NOT NULL REFERENCES sqitch.changes(change_id) ON UPDATE CASCADE,
286
+ note TEXT NOT NULL DEFAULT '',
287
+ committed_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
288
+ committer_name TEXT NOT NULL,
289
+ committer_email TEXT NOT NULL,
290
+ planned_at TIMESTAMPTZ NOT NULL,
291
+ planner_name TEXT NOT NULL,
292
+ planner_email TEXT NOT NULL,
293
+ UNIQUE (project, tag)
294
+ );
295
+ ```
296
+
297
+ **`sqitch.dependencies`:**
298
+ ```sql
299
+ CREATE TABLE sqitch.dependencies (
300
+ change_id TEXT NOT NULL REFERENCES sqitch.changes(change_id) ON UPDATE CASCADE ON DELETE CASCADE,
301
+ type TEXT NOT NULL, -- 'require' or 'conflict'
302
+ dependency TEXT NOT NULL,
303
+ dependency_id TEXT NULL REFERENCES sqitch.changes(change_id) ON UPDATE CASCADE,
304
+ PRIMARY KEY (change_id, dependency)
305
+ -- Note: Sqitch has no CHECK constraint on this table. dependency_id can be
306
+ -- NULL for cross-project dependencies regardless of type, and NOT NULL for
307
+ -- conflicts that reference known changes.
308
+ );
309
+ ```
310
+
311
+ **`sqitch.events`:**
312
+ ```sql
313
+ CREATE TABLE sqitch.events (
314
+ event TEXT NOT NULL CHECK (event IN ('deploy', 'revert', 'fail', 'merge')),
315
+ change_id TEXT NOT NULL,
316
+ change TEXT NOT NULL,
317
+ project TEXT NOT NULL REFERENCES sqitch.projects(project) ON UPDATE CASCADE,
318
+ note TEXT NOT NULL DEFAULT '',
319
+ requires TEXT[] NOT NULL DEFAULT '{}',
320
+ conflicts TEXT[] NOT NULL DEFAULT '{}',
321
+ tags TEXT[] NOT NULL DEFAULT '{}',
322
+ committed_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
323
+ committer_name TEXT NOT NULL,
324
+ committer_email TEXT NOT NULL,
325
+ planned_at TIMESTAMPTZ NOT NULL,
326
+ planner_name TEXT NOT NULL,
327
+ planner_email TEXT NOT NULL,
328
+ PRIMARY KEY (change_id, committed_at)
329
+ );
330
+ ```
331
+
332
+ **`script_hash` computation:** Sqitch computes `script_hash` as a plain SHA-1 hash of the raw file content: `SHA-1(<raw_file_bytes>)`. There is NO git-style `"blob <size>\0"` prefix — Sqitch reads the file in raw binary mode and feeds it directly to SHA-1 (see `App::Sqitch::Plan::Change->_deploy_hash`). sqlever must compute identical hashes. If sqlever adds a `"blob <size>\0"` prefix, `sqlever status` will falsely report every script as "modified" compared to what Sqitch recorded.
333
+
334
+ The hash is computed from the raw file bytes (no line-ending normalization). For snapshot includes, the hash is computed from the deploy script file itself, not the assembled content after `\i` resolution. For reworked changes, the hash reflects the file content at the time of deployment — the current file on disk at deploy time, not the file as it existed when the change was first added to the plan.
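The difference between the plain hash and the git-style blob hash can be sketched as below. The function names are hypothetical; `gitBlobHash` is included only to show the variant that must not be used for `script_hash`.

```typescript
import { createHash } from "node:crypto";

// script_hash as described above: SHA-1 over the raw file bytes, nothing else.
function scriptHash(rawFileBytes: Uint8Array): string {
  return createHash("sha1").update(rawFileBytes).digest("hex");
}

// For contrast only: git's blob hash prepends "blob <size>\0" and so yields
// a different digest for the same bytes.
function gitBlobHash(rawFileBytes: Uint8Array): string {
  return createHash("sha1")
    .update(`blob ${rawFileBytes.length}\0`)
    .update(rawFileBytes)
    .digest("hex");
}
```

Since no line-ending normalization is applied, the same script checked out with CRLF and LF line endings hashes differently under `scriptHash`, so the bytes should be read straight from disk rather than through a text decoder.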
335
+
336
+ **Registry schema creation:** When sqlever first deploys to a database, it must create the `sqitch` schema and all tracking tables using DDL identical to what Sqitch produces. The creation should use `CREATE SCHEMA IF NOT EXISTS` and `CREATE TABLE IF NOT EXISTS` for idempotent first-deploy, and should be protected by an advisory lock to handle concurrent first-deploys from multiple CI runners.
337
+
338
+ ### R4 — Static analysis on deploy
339
+
340
+ `sqlever deploy` must run static analysis before executing any SQL. On `error`-severity findings, deploy must be blocked (unless `--force` is passed). On `warn`, deploy proceeds with output. `--force-rule SA003` bypasses a specific rule while keeping all other guards active (can be specified multiple times). `--force` remains as the blanket "bypass everything" escape hatch.
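The gating behaviour described above might reduce to a filter like the following. This is an illustrative sketch only: the `GateOptions` shape and the `blockingFindings` name are hypothetical, and `Finding` is trimmed to the fields the gate needs.

```typescript
type Severity = "error" | "warn" | "info";

interface Finding {
  ruleId: string;
  severity: Severity;
  message: string;
}

// Hypothetical view of the relevant CLI flags.
interface GateOptions {
  force: boolean;       // --force: bypass everything
  forceRules: string[]; // --force-rule SA003 (repeatable)
}

// Returns the findings that should block the deploy; an empty result means proceed.
function blockingFindings(findings: Finding[], opts: GateOptions): Finding[] {
  if (opts.force) return [];
  const bypassed = new Set(opts.forceRules);
  return findings.filter((f) => f.severity === "error" && !bypassed.has(f.ruleId));
}
```

In this sketch `warn` and `info` findings never block; they would still be printed by the caller.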
341
+
342
+ ### R5 — Machine-readable output
343
+
344
+ All commands must support `--format json` for structured output. Risk reports must be JSON-serializable.
345
+
346
+ ### R6 — Exit codes
347
+
348
+ | Code | Meaning |
349
+ |------|---------|
350
+ | 0 | Success |
351
+ | 1 | Deploy failed |
352
+ | 2 | Analysis blocked deploy (also used by standalone `sqlever analyze` when error-level findings exist) |
353
+ | 3 | Verification failed |
354
+ | 4 | Concurrent deploy detected (another deploy is in progress) |
355
+ | 5 | Lock timeout exceeded (retry may succeed) |
356
+ | 10 | Database unreachable |
357
+
358
+ Note: Previous draft used exit code 127 for "database unreachable." Changed to 10 because POSIX reserves 127 for "command not found," which would make the two conditions indistinguishable in shell scripts and CI systems.
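For callers that wrap the CLI, the table above could be mirrored as constants. The identifier names here are illustrative and not part of the spec; only the numeric values come from the table.

```typescript
// Exit codes copied from the R6 table; the names are hypothetical.
const ExitCode = {
  Success: 0,
  DeployFailed: 1,
  AnalysisBlocked: 2,
  VerificationFailed: 3,
  ConcurrentDeploy: 4,
  LockTimeout: 5,
  DatabaseUnreachable: 10,
} as const;

// Only code 5 is described in the table as "retry may succeed".
function isRetryable(code: number): boolean {
  return code === ExitCode.LockTimeout;
}

// Map a numeric exit code back to its name, or "Unknown" (e.g. for 127).
function describeExit(code: number): string {
  const entry = Object.entries(ExitCode).find(([, value]) => value === code);
  return entry ? entry[0] : "Unknown";
}
```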
359
+
360
+ ---
361
+
362
+ ## 5. Feature ideas
363
+
364
+ ### 5.1 Static analysis (v1.0)
365
+
366
+ Analyze migration SQL before deploy and flag dangerous patterns. Moved from v1.1 to v1.0 — analysis is core safety infrastructure, not optional.
367
+
368
+ **Severity levels:**
369
+ - `error` — blocks deploy
370
+ - `warn` — prints warning, deploy proceeds
371
+ - `info` — informational
372
+
373
+ **Rule classification:**
374
+ - **Static rules** — can run on SQL alone, no database connection needed. These work in standalone linter mode (`sqlever analyze file.sql`).
375
+ - **Connected rules** — require a database connection for schema introspection (e.g., checking indexes, row counts). When no connection is available, connected rules are skipped, and the skip is reported as an `info`-level note.
376
+ - **Hybrid rules** — have both a static check (always runs) and a connected check (runs when a database connection is available). In standalone mode, only the static portion fires. With a connection, the connected portion may refine, suppress, or add to the static findings.
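The three rule types above imply a simple dispatch decision, sketched here. The `RulePlan` shape, the `planRules` name, and the note wording are assumptions made for illustration.

```typescript
type RuleType = "static" | "connected" | "hybrid";

interface RuleInfo {
  id: string;
  type: RuleType;
}

interface RulePlan {
  full: string[];       // rules that run completely
  staticOnly: string[]; // hybrid rules limited to their static portion
  skipped: string[];    // connected rules that cannot run
  notes: string[];      // info-level notes explaining each skip
}

function planRules(rules: RuleInfo[], hasConnection: boolean): RulePlan {
  const plan: RulePlan = { full: [], staticOnly: [], skipped: [], notes: [] };
  for (const rule of rules) {
    if (rule.type === "static" || hasConnection) {
      plan.full.push(rule.id);
    } else if (rule.type === "hybrid") {
      plan.staticOnly.push(rule.id);
    } else {
      plan.skipped.push(rule.id);
      plan.notes.push(`info: ${rule.id} skipped (no database connection)`);
    }
  }
  return plan;
}
```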
377
+
378
+ **Rules:**
379
+
380
+ | Rule ID | Severity | Type | Trigger | Why dangerous |
381
+ |---------|----------|------|---------|---------------|
382
+ | `SA001` | error | static | `ADD COLUMN ... NOT NULL` without default | Fails outright on populated tables (`ERROR: column contains null values`). Does NOT fire when a `DEFAULT` is present — the default satisfies the NOT NULL constraint, and that case is covered by SA002/SA002b. |
383
+ | `SA002` | error | static | `ADD COLUMN ... DEFAULT <volatile>` (any PG version) | Volatile defaults (e.g., `random()`, `gen_random_uuid()`, `clock_timestamp()`, `txid_current()`) cause a full table rewrite on ALL PostgreSQL versions, including PG 11+. The PG 11 optimization only applies to immutable/stable defaults. Note: `now()` is `STABLE` (returns transaction start time), not volatile — `DEFAULT now()` does NOT cause a rewrite on PG 11+. |
384
+ | `SA002b` | warn | static | `ADD COLUMN ... DEFAULT <non-volatile>` on PG < 11 | Non-volatile defaults cause a full table rewrite on PG < 11. Safe on PG 11+ (metadata-only). |
385
+ | `SA003` | error | static | `ALTER COLUMN ... TYPE` (unsafe cast) | Full table rewrite + `AccessExclusiveLock`. See safe cast allowlist below. |
386
+ | `SA004` | warn | static | `CREATE INDEX` without `CONCURRENTLY` | Takes `ShareLock`, blocks INSERT/UPDATE/DELETE for duration. |
387
+ | `SA005` | warn | static | `DROP INDEX` without `CONCURRENTLY` | Takes `AccessExclusiveLock`. |
388
+ | `SA006` | warn | static | `DROP COLUMN` | Data loss, irreversible. |
389
+ | `SA007` | error | static | `DROP TABLE` (non-revert context) | Data loss. In sqitch project context, files under `revert/` are exempt. In standalone mode, always fires. |
390
+ | `SA008` | warn | static | `TRUNCATE` | Data loss. |
391
+ | `SA009` | warn | hybrid | `ADD FOREIGN KEY` without `NOT VALID` | Static: detects `ADD FOREIGN KEY` without `NOT VALID` (lock concern — takes `ShareRowExclusiveLock` on both referencing and referenced tables; lock on the referenced table is brief but still blocks concurrent DDL). Connected: also flags missing index on referencing column (ongoing performance concern). Recommend two-step: `ADD CONSTRAINT ... NOT VALID` then `VALIDATE CONSTRAINT` (takes only `ShareUpdateExclusiveLock`, does not block writes). |
392
+ | `SA010` | warn | static | `UPDATE` or `DELETE` without `WHERE` | Full table DML. Downgraded from `error` to `warn` — full-table DML is often intentional in migrations (backfills, cleanups). Use inline suppression for acknowledged cases. |
393
+ | `SA011` | warn | connected | `UPDATE` or `DELETE` on large table (estimated rows > threshold) | Long-running DML, table bloat. Requires `pg_class.reltuples` from live database. |
394
+ | `SA012` | info | static | `ALTER SEQUENCE RESTART` | May break application assumptions. |
395
+ | `SA013` | warn | static | `SET lock_timeout` missing before risky DDL | Runaway lock wait. "Risky DDL" = any DDL taking `AccessExclusiveLock` or `ShareLock`. If lock timeout guard (5.9) auto-prepends, SA013 does not fire. |
396
+ | `SA014` | warn | static | `VACUUM FULL` or `CLUSTER` | Full table lock + rewrite, avoid in migrations. |
397
+ | `SA015` | warn | static | `ALTER TABLE ... RENAME` (table or column) | Breaks running application. Severity is `warn` (not `error`) until expand/contract (v2.0) exists, since there is no way to satisfy the rule before then. After v2.0, promote to `error` for renames not part of an expand/contract pair. |
398
+ | `SA016` | error | static | `ADD CONSTRAINT ... CHECK` without `NOT VALID` | Full table scan under `AccessExclusiveLock`, which blocks all reads and writes on the table for the duration of the validation scan. Safe pattern: `ADD CONSTRAINT ... NOT VALID` then `VALIDATE CONSTRAINT` (takes only `ShareUpdateExclusiveLock`). |
399
+ | `SA017` | error | hybrid | `ALTER COLUMN ... SET NOT NULL` (existing column) | Static: fires on any `SET NOT NULL` (on PG < 12, full table scan under `AccessExclusiveLock`; on PG 12+, metadata-only if a valid CHECK constraint exists). Connected: checks catalog for existing valid `CHECK (col IS NOT NULL)` constraint and suppresses if found. Recommend three-step: add CHECK NOT VALID, validate, then SET NOT NULL. |
400
+ | `SA018` | warn | hybrid | `ADD PRIMARY KEY` without pre-existing index | Static: fires on `ADD PRIMARY KEY` without `USING INDEX` clause (`ALTER TABLE` takes `AccessExclusiveLock`, and the implicit index creation extends lock duration). Connected: checks catalog for pre-existing unique index on the PK columns and suppresses if found. Safe pattern: create index concurrently first, then `ADD CONSTRAINT ... USING INDEX`. |
401
+ | `SA019` | warn | static | `REINDEX` without `CONCURRENTLY` | Blocks writes to the table for the duration and takes `AccessExclusiveLock` on the index being rebuilt. PG 12+ supports `REINDEX CONCURRENTLY`. |
402
+ | `SA020` | error | static | `CREATE INDEX CONCURRENTLY`, `DROP INDEX CONCURRENTLY`, or `REINDEX CONCURRENTLY` inside transactional deploy | Cannot run inside a transaction block — will fail at runtime. Change must be marked non-transactional. In project mode: checks plan file for non-transactional marker. In standalone mode: warns on any `CONCURRENTLY` usage with message "Ensure this runs outside a transaction block." Also recognizes `-- sqlever:no-transaction` script comment (sqlever-only convention, see non-transactional changes below). |
403
+ | `SA021` | warn | static | `LOCK TABLE` (any mode) | Explicit locking in migrations is a code smell and dangerous in production. |
404
+
405
+ **SA003 safe cast allowlist:** The following type changes are known to be safe (no table rewrite, binary-compatible):
406
+ - `varchar(N)` to `varchar(M)` where M > N (widening)
407
+ - `varchar(N)` to `varchar` (removing limit)
408
+ - `varchar` to `text`
409
+ - `char(N)` to `varchar` or `text`
410
+ - `numeric(P,S)` to `numeric(P2,S)` where P2 > P (widening precision)
411
+ - `numeric(P,S)` to unconstrained `numeric` (removing precision/scale constraint)
412
+
413
+ Known unsafe casts that require a rewrite (commonly assumed safe but are not):
414
+ - `int` to `bigint` — different binary representation, always rewrites
415
+ - `timestamp` to `timestamptz` — rewrite required
416
+
417
+ All other type changes are flagged. When a `USING` clause is present, SA003 ALWAYS fires regardless of whether the cast is in the safe allowlist — PostgreSQL rewrites the table to evaluate the expression, even when the source and target types are binary-compatible. The allowlist only applies to `ALTER COLUMN TYPE` without a `USING` clause. In the absence of a database connection, the rule is conservative and flags all type changes not in the allowlist. With a connection, the rule can consult `pg_cast` to determine if the cast is binary-coercible.
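A static check for part of the allowlist could look roughly like this. It is a sketch covering only the `varchar` and `numeric` entries, it recognises only the short type spellings (not `character varying`), and the helper names are hypothetical. It also treats `varchar(N)` to `text` as safe alongside unbounded `varchar` to `text`.

```typescript
interface SqlType {
  base: string;
  args: number[];
}

// Parse spellings such as "varchar(10)", "numeric(12, 2)", or "text".
function parseType(t: string): SqlType | null {
  const m = /^([a-z_]+)(?:\((\d+)(?:,(\d+))?\))?$/.exec(t.toLowerCase().replace(/\s+/g, ""));
  if (m === null) return null;
  const [, base = "", a, b] = m;
  const args = [a, b].filter((x): x is string => x !== undefined).map(Number);
  return { base, args };
}

// True only for casts this sketch recognises as allowlisted; everything else is flagged.
function isAllowlistedCast(from: string, to: string, hasUsing: boolean): boolean {
  if (hasUsing) return false; // a USING clause always forces the rule to fire
  const src = parseType(from);
  const dst = parseType(to);
  if (src === null || dst === null) return false;
  if (src.base === "varchar") {
    if (dst.base === "text" && dst.args.length === 0) return true;
    if (dst.base !== "varchar") return false;
    if (dst.args.length === 0) return true; // limit removed
    const n = src.args[0];
    const m = dst.args[0];
    return n !== undefined && m !== undefined && m > n; // widening only
  }
  if (src.base === "numeric" && dst.base === "numeric") {
    if (dst.args.length === 0) return true; // precision/scale constraint removed
    const [p, s = 0] = src.args;
    const [p2, s2 = 0] = dst.args;
    return p !== undefined && p2 !== undefined && s === s2 && p2 > p;
  }
  return false;
}
```

Defaulting to `false` keeps the check conservative: `int` to `bigint`, `timestamp` to `timestamptz`, and any unparsed type name are flagged without needing explicit entries.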
418
+
419
+ **OPEN:** Build a comprehensive safe-cast list by auditing `pg_cast.castmethod` across PG 14-18. The allowlist above is a starting point.
420
+
421
+ **PL/pgSQL body exclusion:** DML inside `CREATE FUNCTION`, `CREATE PROCEDURE`, and `DO $$ ... $$` blocks is excluded from SA010, SA011, and SA008. These statements define function bodies, not direct migration operations. Analysis rules operate on top-level statements only.
422
+
423
+ **Inline suppression:** Rules can be suppressed per-statement using SQL comments:
424
+ ```sql
425
+ -- sqlever:disable SA010
426
+ UPDATE users SET tier = 'free';
427
+ -- sqlever:enable SA010
428
+ ```
429
+ Or single-line: `UPDATE users SET tier = 'free'; -- sqlever:disable SA010`
430
+
431
+ **Inline suppression scoping rules:**
432
+ - **Block form:** `-- sqlever:disable` ... `-- sqlever:enable` suppresses findings for all statements between the markers. An unclosed block (no matching `enable`) extends to end of file and produces a warning.
433
+ - **Single-line form:** A trailing `-- sqlever:disable` comment attaches to the immediately preceding statement (determined by source range from the parser).
434
+ - **Multiple rules:** Comma-separated rule IDs are supported: `-- sqlever:disable SA010,SA011`.
435
+ - **Unknown rules:** `-- sqlever:disable SA999` (nonexistent rule) produces a warning, not a silent ignore.
436
+ - **`all` keyword:** `-- sqlever:disable all` is NOT supported — suppressing all rules silently is too dangerous. Suppress rules individually.
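The scoping rules above suggest a small directive parser, sketched here for a single comment. The `Directive` shape and `parseDirective` name are hypothetical, and treating `all` as a warning (rather than a hard error) is an assumption the rules above do not settle.

```typescript
// Rule IDs listed in the rules table: SA001 through SA021, plus SA002b.
const KNOWN_RULES = new Set<string>([
  ...Array.from({ length: 21 }, (_, i) => `SA${String(i + 1).padStart(3, "0")}`),
  "SA002b",
]);

interface Directive {
  action: "disable" | "enable";
  rules: string[];    // recognised rule IDs
  warnings: string[]; // problems to surface to the user
}

// Parse one SQL comment; returns null when it is not a sqlever directive.
function parseDirective(comment: string): Directive | null {
  const m = /^--\s*sqlever:(disable|enable)\s+(.+?)\s*$/.exec(comment.trim());
  if (m === null) return null;
  const [, action = "", list = ""] = m;
  const rules: string[] = [];
  const warnings: string[] = [];
  for (const id of list.split(",").map((s) => s.trim()).filter((s) => s.length > 0)) {
    if (id.toLowerCase() === "all") {
      warnings.push("'all' is not supported; suppress rules individually");
    } else if (!KNOWN_RULES.has(id)) {
      warnings.push(`Unknown rule ${id}`);
    } else {
      rules.push(id);
    }
  }
  return { action: action === "enable" ? "enable" : "disable", rules, warnings };
}
```

Attaching a parsed directive to a statement (block form versus trailing single-line form) depends on source ranges from the SQL parser and is outside this sketch.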
437
+
438
+ **Per-file overrides in `sqlever.toml`:**
439
+ ```toml
440
+ [analysis.overrides."deploy/backfill_tiers.sql"]
441
+ skip = ["SA010"]
442
+ ```
443
+
444
+ **Configuration via `sqlever.toml`:**
445
+ ```toml
446
+ [analysis]
447
+ error_on_warn = false
448
+ max_affected_rows = 10_000
449
+ skip = []
450
+ pg_version = 14 # minimum PG version migrations must support
451
+ ```
452
+
453
+ `pg_version` represents the minimum supported version. If upgrading from PG 14 to PG 17, set `pg_version = 14` so rules fire for patterns unsafe on the oldest supported version.
454
+
455
+ **Rule interface contract:**
456
+ ```typescript
457
+ interface Rule {
458
+ id: string; // "SA001"
459
+ severity: Severity;
460
+ type: "static" | "connected" | "hybrid";
461
+ check(context: AnalysisContext): Finding[];
462
+ }
463
+ interface AnalysisContext {
464
+ ast: ParseResult;
465
+ rawSql: string;
466
+ filePath: string;
467
+ pgVersion: number;
468
+ config: AnalysisConfig;
469
+ db?: DatabaseClient; // present only for connected rules with active connection
470
+ }
471
+ interface Finding {
472
+ ruleId: string;
473
+ severity: Severity;
474
+ message: string;
475
+ location: { file: string; line: number; column: number; endLine?: number; endColumn?: number };
476
+ suggestion?: string;
477
+ }
478
+ ```
479
+
480
+ Note: `libpg_query` provides byte offsets that must be converted to line/column using the original source text. This conversion is part of `analysis/parser.ts`.
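A minimal sketch of that conversion follows. The function name `byteOffsetToPosition` is hypothetical, and two choices are assumptions rather than spec decisions: positions are 1-based, and columns count Unicode code points rather than bytes or UTF-16 code units.

```typescript
// Convert a UTF-8 byte offset into a 1-based line/column position.
// Walks the source one code point at a time and counts the encoded bytes.
function byteOffsetToPosition(
  source: string,
  byteOffset: number,
): { line: number; column: number } {
  const encoder = new TextEncoder();
  let bytesSeen = 0;
  let line = 1;
  let column = 1;

  for (const ch of source) { // for...of iterates by code point
    if (bytesSeen >= byteOffset) break;
    bytesSeen += encoder.encode(ch).length;
    if (ch === "\n") {
      line += 1;
      column = 1;
    } else {
      column += 1;
    }
  }
  return { line, column };
}
```

Each call is linear in the source length. When a file produces many findings, precomputing a table of line-start byte offsets once per file would likely be a better fit.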
481
+
482
+ **Hybrid rule convention:** Hybrid rules implement a single `check()` method and internally branch on `context.db !== undefined`. When `db` is present, the connected portion may refine, suppress, or add to the static findings. A hybrid rule may produce multiple findings from one statement (e.g., SA009 can produce both a "missing NOT VALID" finding and a "missing index" finding). Suppression filtering (inline `-- sqlever:disable` and per-file overrides) happens in the analyzer entry point after rules return findings — rules do not see or reason about suppressions.
483
+
484
+ **Unused suppression warnings:** If a `-- sqlever:disable SA010` comment matches no actual finding in its scope, the analyzer emits a warning: "Unused suppression for SA010." This prevents dead suppression comments from accumulating.
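The post-rule filtering step and the unused-suppression warning can be sketched together. This is an illustration only: the `Suppression` shape (one rule ID covering an inclusive line range) and the `applySuppressions` name are assumptions, and the `Finding` type is reduced to the fields the sketch needs.

```typescript
interface Finding {
  ruleId: string;
  line: number;
  message: string;
}

// Hypothetical shape: one suppressed rule over an inclusive line range.
interface Suppression {
  ruleId: string;
  startLine: number;
  endLine: number;
}

function applySuppressions(
  findings: Finding[],
  suppressions: Suppression[],
): { kept: Finding[]; warnings: string[] } {
  const used = new Set<Suppression>();

  const kept = findings.filter((finding) => {
    const matching = suppressions.filter(
      (s) =>
        s.ruleId === finding.ruleId &&
        finding.line >= s.startLine &&
        finding.line <= s.endLine,
    );
    matching.forEach((s) => used.add(s));
    return matching.length === 0; // keep only findings nobody suppressed
  });

  // Any suppression that matched nothing is reported as unused.
  const warnings = suppressions
    .filter((s) => !used.has(s))
    .map((s) => `Unused suppression for ${s.ruleId}.`);

  return { kept, warnings };
}
```

Because this runs after every rule has returned, rules stay unaware of suppressions, which matches the convention described above.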
485
+
486
+ **`sqlever analyze` scope:**
487
+ - `sqlever analyze <file>` — analyze a single SQL file (standalone linter mode, no `sqitch.plan` required).
488
+ - `sqlever analyze <directory>` — analyze all `.sql` files in the directory.
489
+ - `sqlever analyze` (no args, in sqitch project) — analyze all pending (undeployed) migrations from `sqitch.plan`.
490
+ - `sqlever analyze --all` — analyze all migrations in the project.
491
+ - `sqlever analyze --changed` — analyze only files changed in the current git diff (useful in CI for PRs).
492
+
493
+ When no `sqitch.plan` exists and no arguments are given, analyze all `.sql` files in the current directory. This supports the composability goal — teams using other migration tools can run `sqlever analyze` as a standalone linter.
494
+
495
+ ### 5.2 Snapshot includes (v1.2)
496
+
497
+ `\i` / `\ir` includes are resolved from the git commit where the migration was added, not from HEAD.
498
+
499
+ ```
500
+ sqlever deploy # uses historically-correct included files
501
+ sqlever deploy --no-snapshot # falls back to HEAD (opt-out)
502
+ ```
503
+
504
+ ### 5.3 TUI — interactive deployment dashboard (v1.3)
505
+
506
+ When stdout is a TTY, sqlever shows a live TUI during deploy:
507
+
508
+ ```
509
+ sqlever deploy
510
+ ┌─ Deploying to production ──────────────────────────────┐
511
+ │ [✓] 001_create_users 12ms │
512
+ │ [✓] 002_create_orders 8ms │
513
+ │ [→] 003_add_order_status running... │
514
+ │ [ ] 004_create_indexes pending │
515
+ │ │
516
+ │ Analysis: 0 errors, 1 warning (SA004: missing CONCURRENT) │
517
+ │ Progress: 2/4 changes ████████░░░░ 50% │
518
+ └────────────────────────────────────────────────────────┘
519
+ ```
520
+
521
+ Plain output is used when `--no-tui` is passed or stdout is not a TTY (for example, when piped).
522
+
523
+ ### 5.4 Zero-downtime migrations — expand/contract (v2.0)
524
+
525
+ First-class support for the expand/contract pattern.
526
+
527
+ **Expand phase** (backward-compatible):
528
+ - Add new column alongside old
529
+ - Install trigger: writes to old column → synced to new column and vice versa
530
+ - Deploy application code that reads/writes both
531
+
532
+ **Contract phase** (after full app rollout):
533
+ - Verify all rows backfilled (sqlever checks before proceeding)
534
+ - Drop sync trigger
535
+ - Drop old column or constraint
536
+
537
+ **CLI:**
538
+ ```bash
539
+ sqlever add rename_users_name --expand # generates expand migration pair
540
+ sqlever deploy --phase expand
541
+ sqlever deploy --phase contract
542
+ sqlever status # shows expand/contract state
543
+ ```
544
+
545
+ Inspired by pgroll — but surgical (no full table recreation).
546
+
547
+ **Trigger edge cases and mitigations:**
548
+
549
+ 1. **Infinite recursion:** Bidirectional sync triggers (old→new, new→old) can recurse infinitely. All generated sync triggers must include a recursion guard. The guard uses `pg_trigger_depth()` scoped to sqlever triggers by checking the trigger name: `IF pg_trigger_depth() < 2 AND TG_NAME LIKE 'sqlever_sync_%' THEN ... END IF`. This is preferred over the session variable approach (`SET LOCAL sqlever.syncing = 'true'`) because `SET LOCAL` remains true for the entire transaction, which would suppress ALL subsequent trigger fires within the transaction — not just recursive ones. The `pg_trigger_depth()` approach correctly allows multiple sqlever sync triggers on different tables to fire independently within the same transaction while still preventing recursion. All sqlever-generated sync triggers must use the `sqlever_sync_` name prefix.
550
+
551
+ 2. **Logical replication:** Triggers do not fire on logical replication subscribers by default. If the target database is a subscriber, sync triggers will not fire, leaving columns out of sync. sqlever should document this limitation. Using `ALTER TABLE ... ENABLE ALWAYS TRIGGER` is possible but risky (may cause loops). **OPEN:** Determine the recommended approach for logical replication environments.
552
+
553
+ 3. **Partitioned tables:** Since sqlever targets PG 14+ (test matrix), trigger inheritance from the partitioned parent table is always available (PG 13+ feature). sqlever installs sync triggers on the parent table; they automatically apply to all partitions. Backfills must be partition-aware (iterate per-partition for progress tracking and to keep each batch's locks confined to one partition).
554
+
555
+ 4. **COPY performance:** `BEFORE INSERT` triggers fire during `COPY`, which may significantly impact bulk load performance during the expand phase. Document this trade-off.
556
+
557
+ 5. **Trigger installation lock:** `CREATE TRIGGER` takes a `ShareRowExclusiveLock` on the table. This is weaker than the `AccessExclusiveLock` that the expand/contract pattern is designed to avoid (reads continue), but it conflicts with concurrent writes: it must wait for open transactions that have written to the table, and later writers queue behind it while it waits. The lock is brief (metadata-only), but on a high-traffic table with long-running queries, it may require `lock_timeout` and retry logic.
558
+
559
+ 6. **Concurrency control:** The expand/contract phase tracker must use advisory locks to prevent concurrent phase transitions (e.g., two operators running `--phase contract` simultaneously).
560
+
561
+ ### 5.5 Batched background DML (v2.1)
562
+
563
+ Queue-based large-table data migrations, entirely inside Postgres.
564
+
565
+ **Queue architecture:**
566
+ - 3-partition rotating table (PGQ-inspired from SkyTools)
567
+ - No external dependencies (no Redis, no Kafka)
568
+ - All job state visible in Postgres via querying `sqlever.*` tables. `pg_stat_activity` shows the currently executing batch query and its duration/wait events, but job-level state (pending/running/done/failed) is in the queue tables.
569
+
570
+ **Job lifecycle:**
571
+ ```
572
+ pending → running → done
+ running → failed
+ failed  → (retry) → done
+ failed  → dead (max retries exceeded)
+ dead    → (manual retry) → running
578
+ ```
579
+
580
+ A `dead` job can be manually retried after the operator fixes the underlying issue. sqlever tracks the last processed primary key so that retried jobs resume from where they stopped, not from the beginning.
581
+
582
+ **Worker heartbeat:** The batch queue table includes a `heartbeat_at` column updated at the start of each batch. A configurable staleness threshold (default: 5 minutes, configurable via `sqlever.toml` `[batch] heartbeat_staleness`) determines when a worker is considered dead. On `sqlever batch status` and at the start of any batch operation, sqlever checks for running jobs with stale heartbeats and marks them as `failed` with an error message indicating the worker was unresponsive. This handles the case where a batch worker process dies silently (OOM kill, network partition) and leaves a job in `running` state indefinitely.
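The staleness check itself is a simple comparison. A minimal sketch, assuming the running jobs have already been read from the queue table; `RunningJob` and `findStaleJobs` are hypothetical names, and only the 5-minute default comes from the text above.

```typescript
// Hypothetical projection of a queue row in the `running` state.
interface RunningJob {
  name: string;
  heartbeatAt: Date;
}

const DEFAULT_STALENESS_MS = 5 * 60 * 1000; // 5 minutes, the documented default

// Return the names of running jobs whose last heartbeat is older than the threshold.
function findStaleJobs(
  jobs: RunningJob[],
  now: Date,
  stalenessMs: number = DEFAULT_STALENESS_MS,
): string[] {
  return jobs
    .filter((job) => now.getTime() - job.heartbeatAt.getTime() > stalenessMs)
    .map((job) => job.name);
}
```

In practice the comparison would probably be done in SQL against the database clock, so that clock skew between the worker host and the server does not produce false positives.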
583
+
584
+ **CLI:**
585
+ ```bash
586
+ sqlever batch add backfill_user_tier --table users --batch-size 500 --sleep 100ms
587
+ sqlever batch list
588
+ sqlever batch status backfill_user_tier
589
+ sqlever batch pause backfill_user_tier
590
+ sqlever batch resume backfill_user_tier
591
+ sqlever batch cancel backfill_user_tier
592
+ sqlever batch retry backfill_user_tier # manual retry of dead job
593
+ ```
594
+
595
+ **Features:**
596
+ - Configurable batch size, sleep interval, lock timeout, and statement timeout per batch
597
+ - Progress: rows done / rows remaining / ETA
598
+ - Per-batch transaction — each batch commits independently
599
+ - Replication lag monitoring: query `pg_stat_replication.replay_lag` and pause the batch job when lag exceeds a configurable threshold (default: 10s). Most production databases have replicas; unthrottled batched writes will cause replica lag incidents.
600
+ - VACUUM pressure awareness: monitor `pg_stat_user_tables.n_dead_tup` and pause if dead tuple ratio exceeds a configurable percentage (default: 10%). The ratio is computed as `n_dead_tup / (n_live_tup + n_dead_tup)`. Using a ratio rather than an absolute count ensures the threshold is meaningful regardless of table size. Configurable via `sqlever.toml` `[batch] max_dead_tuple_ratio`. Many small transactions create dead tuples; autovacuum may not keep up on hot tables.
601
+ - Connection management: the batch worker requires a direct PostgreSQL connection (not through PgBouncer in transaction mode) because it uses session-level settings and the connection must persist across sleep intervals. SET statements (`lock_timeout`, `statement_timeout`, `search_path`) are re-issued at the start of each batch transaction as a safety measure.
602
+ - Inspired by GitLab `BatchedMigration`: throttling, pause/resume, retry, state tracking in Postgres
603
+ - https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/migration_helpers.rb
604
+ - https://docs.gitlab.com/development/database/batched_background_migrations/
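The two throttling checks in the feature list above (replication lag and dead tuple ratio) reduce to a small decision function. This is a sketch under assumptions: `pauseReason`, `TableStats`, and `ThrottleLimits` are hypothetical names, the lag is assumed to be already converted to seconds, and a `null` lag (no replica reporting) is treated as "no lag".

```typescript
interface TableStats {
  nLiveTup: number; // pg_stat_user_tables.n_live_tup
  nDeadTup: number; // pg_stat_user_tables.n_dead_tup
}

interface ThrottleLimits {
  maxReplayLagSeconds: number;
  maxDeadTupleRatio: number;
}

// n_dead_tup / (n_live_tup + n_dead_tup), guarding against an empty table.
function deadTupleRatio(stats: TableStats): number {
  const total = stats.nLiveTup + stats.nDeadTup;
  return total === 0 ? 0 : stats.nDeadTup / total;
}

// Returns a human-readable reason to pause, or null when the batch may proceed.
function pauseReason(
  replayLagSeconds: number | null,
  stats: TableStats,
  limits: ThrottleLimits = { maxReplayLagSeconds: 10, maxDeadTupleRatio: 0.1 },
): string | null {
  if (replayLagSeconds !== null && replayLagSeconds > limits.maxReplayLagSeconds) {
    return `replica lag ${replayLagSeconds}s exceeds ${limits.maxReplayLagSeconds}s`;
  }
  const ratio = deadTupleRatio(stats);
  if (ratio > limits.maxDeadTupleRatio) {
    return `dead tuple ratio ${ratio.toFixed(2)} exceeds ${limits.maxDeadTupleRatio}`;
  }
  return null;
}
```

A worker loop could call `pauseReason` before each batch and sleep while it returns a non-null value.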
605
+
606
+ ### 5.6 CI integration (v1.0+)
607
+
608
+ ```bash
609
+ # GitHub Actions
610
+ sqlever analyze --format github-annotations # native GH annotation format
611
+ sqlever analyze --format json | jq . # structured for any CI
612
+
613
+ # GitLab CI
614
+ sqlever analyze --format gitlab-codequality # native GL code quality report
615
+
616
+ # General
617
+ sqlever analyze --strict # exit non-zero on any finding (warnings treated as errors)
618
+ ```
619
+
620
+ Note: `sqlever analyze` returns exit code 2 when error-level findings exist (default behavior). The `--strict` flag additionally treats warnings as errors for exit code purposes; it is the CLI equivalent of `error_on_warn = true` in the `[analysis]` section of `sqlever.toml`. This replaces the earlier `--exit-code` flag, which was redundant with the default behavior.
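The exit code rule can be written down directly. A minimal sketch: `analyzeExitCode` is a hypothetical name, exit code 0 for a clean run is an assumption, and `info` findings are assumed never to affect the exit code.

```typescript
type Severity = "error" | "warn" | "info";

// Exit code for `sqlever analyze`: 2 on error-level findings,
// and also on warnings when strict mode is enabled.
function analyzeExitCode(severities: Severity[], strict: boolean): number {
  if (severities.includes("error")) return 2;
  if (strict && severities.includes("warn")) return 2;
  return 0; // assumed success code
}
```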
621
+
622
+ **Reporter format specifications:**
623
+
624
+ - **`text`** (default): Human-readable output with colors when stdout is a TTY.
625
+ - **`json`**: Structured output following a defined schema:
626
+ ```json
627
+ {
628
+ "version": 1,
629
+ "metadata": { "files_analyzed": 3, "rules_checked": 21, "duration_ms": 42 },
630
+ "findings": [ { "ruleId": "SA004", "severity": "warn", "message": "...", "location": { "file": "...", "line": 5, "column": 1 }, "suggestion": "..." } ],
631
+ "summary": { "errors": 0, "warnings": 1, "info": 0 }
632
+ }
633
+ ```
634
+ - **`github-annotations`**: GitHub Actions workflow commands: `::error file={file},line={line},col={col}::{message}` and `::warning ...`. The resulting annotations appear inline in PR diffs.
635
+ - **`gitlab-codequality`**: GitLab Code Quality JSON schema: `[{"description": "...", "check_name": "SA004", "fingerprint": "...", "severity": "major", "location": {"path": "...", "lines": {"begin": 5}}}]`. Severity mapping from sqlever to GitLab Code Quality: `error` → `critical`, `warn` → `major`, `info` → `minor`. Fingerprint computation: SHA-1 of `(ruleId, filePath, line)` — this produces stable fingerprints for deduplication across CI runs.
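The two CI formats above could be produced roughly as follows. This is a sketch, not the reporter's actual code: `toGithubAnnotation` and `GITLAB_SEVERITY` are hypothetical names, mapping `info` to GitHub's `notice` command is an assumption (the spec only names `::error` and `::warning`), and the escaping follows the percent-encoding that GitHub's workflow command format expects for messages and property values. Fingerprint computation is left out.

```typescript
type Severity = "error" | "warn" | "info";

interface Finding {
  ruleId: string;
  severity: Severity;
  message: string;
  location: { file: string; line: number; column: number };
}

// Percent-encode characters that would break a workflow command.
const escapeData = (s: string): string =>
  s.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
const escapeProperty = (s: string): string =>
  escapeData(s).replace(/:/g, "%3A").replace(/,/g, "%2C");

function toGithubAnnotation(f: Finding): string {
  const level =
    f.severity === "error" ? "error" : f.severity === "warn" ? "warning" : "notice";
  const { file, line, column } = f.location;
  return `::${level} file=${escapeProperty(file)},line=${line},col=${column}::${escapeData(f.message)}`;
}

// Severity mapping from sqlever to GitLab Code Quality, as listed above.
const GITLAB_SEVERITY: Record<Severity, "critical" | "major" | "minor"> = {
  error: "critical",
  warn: "major",
  info: "minor",
};
```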
636
+
637
+ **Example GitHub Actions step:**
638
+ ```yaml
639
+ - name: Analyze migrations
640
+ run: sqlever analyze --format github-annotations
641
+ ```
642
+
643
+ Annotations appear inline in the PR diff, with dangerous migration SQL highlighted at the offending line.
644
+
645
+ ### 5.7 AI integration (v1.2+)
646
+
647
+ **`sqlever explain <migration>`** — plain-English summary of what a migration does and its risk profile, via LLM.
648
+
649
+ **`sqlever review`** — structured risk report suitable for posting as a PR comment (Markdown output). Designed to be called by AI coding agents reviewing PRs.
650
+
651
+ **`sqlever suggest-revert <migration>`** — LLM-assisted revert script generation when no revert was written.
652
+
653
+ **`sqlever chat`** — interactive mode: ask questions about the migration history, planned changes, risk.
654
+
655
+ ### 5.8 DBLab integration (v3.0)
656
+
657
+ Test deploy + revert against a full-size production clone before touching prod.
658
+
659
+ ```bash
660
+ sqlever deploy --dblab-url https://dblab.example.com --dblab-token $TOKEN
661
+ # sqlever provisions a clone, runs deploy+verify+revert, reports result
662
+ # No prod changes until clone test passes
663
+ ```
664
+
665
+ The PostgresAI native advantage — no other migration tool can offer this.
666
+
667
+ ### 5.9 Lock timeout guard (v1.0)
668
+
669
+ Moved from v1.1 to v1.0 — this is core safety infrastructure.
670
+
671
+ Automatically prepend `SET lock_timeout = '5s'` before any DDL that could take a long lock, unless the migration already sets it. Configurable via `sqlever.toml`. Can be disabled.
672
+
673
+ Detection: sqlever scans the deploy script for any `SET lock_timeout` statement at the top level. If found, the auto-prepend is skipped for that script.
674
+
675
+ **Timeout behavior on failure:** When `lock_timeout` fires, the statement fails and the transaction rolls back. sqlever reports the error with actionable guidance: which lock was contended, suggestion to retry, and optionally identify the blocking query via `pg_stat_activity`.
676
+
677
+ **Lock retry for CI:** `sqlever deploy --lock-retries N` (default: 0, no retry) retries acquiring the lock up to N times with exponential backoff (starting at 1 second, doubling each retry, capped at 30 seconds). This is designed for CI pipelines where the operator is not present to manually re-trigger a deploy after a transient lock conflict. Configurable via `sqlever.toml` `[deploy] lock_retries`.
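The backoff schedule described above (start at 1 second, double each retry, cap at 30 seconds) works out as in this sketch. `lockRetryDelaysMs` is a hypothetical helper name; the spec does not say whether jitter is added, so none is included.

```typescript
// Delay before each retry, in milliseconds: 1s, 2s, 4s, ... capped at 30s.
function lockRetryDelaysMs(
  retries: number,
  baseMs: number = 1000,
  capMs: number = 30000,
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(baseMs * Math.pow(2, attempt), capMs));
  }
  return delays;
}
```

For example, `lockRetryDelaysMs(7)` yields 1000, 2000, 4000, 8000, 16000, 30000, 30000: the sixth retry would be 32 seconds uncapped, so it is clamped to 30.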
678
+
679
+ **Per-migration override:** Individual migrations can set their own `lock_timeout` (e.g., a `VALIDATE CONSTRAINT` that needs a longer timeout). The auto-prepend is suppressed when the script contains its own `SET lock_timeout`.
680
+
681
+ ### 5.10 Dry-run mode
682
+
683
+ ```bash
684
+ sqlever deploy --dry-run # prints what would be deployed, runs analysis, exits
685
+ ```
686
+
687
+ **What dry-run does:** Validates the plan, checks dependency order, runs static analysis, reports what changes would be deployed. Zero database modifications (verified in tests via table counts before/after).
688
+
689
+ **What dry-run does NOT do:** Predict lock contention (depends on concurrent queries), estimate DDL duration (depends on table size), verify data-dependent operations (constraints, COPY), or check disk space. Dry-run validates the plan and runs static analysis but does not simulate execution.
690
+
691
+ ### 5.11 Migration diff
692
+
693
+ ```bash
694
+ sqlever diff # show schema diff between deployed and plan
695
+ sqlever diff --from tag_a --to tag_b
696
+ ```
697
+
698
+ ---
699
+
700
+ ## 6. Design decisions
701
+
702
+ ### DD1 — TypeScript + Bun, not Rust
703
+
704
+ Rust would give a faster binary but TypeScript + Bun gives:
705
+ - `bun build --compile` produces a single self-contained executable that starts fast enough (<50ms startup)
706
+ - Easier onboarding for contributors (more TypeScript developers than Rust)
707
+ - Faster iteration — spec is still evolving
708
+ - `pg` (node-postgres) is a mature, battle-tested driver
709
+
710
+ Revisit if performance becomes an issue.
711
+
712
+ ### DD2 — PostgreSQL-only
713
+
714
+ Depth beats breadth. Sqitch's multi-DB support is one reason it can't do PG-specific things (CONCURRENT indexes, advisory locks, partition introspection, `pg_stat_activity`). We know our target. Every feature can assume PG-native primitives.
715
+
716
+ ### DD3 — Sqitch tracking schema compatibility
717
+
718
+ We use the existing Sqitch tables (`sqitch.changes`, etc.) rather than our own schema. Reason: zero migration cost for existing Sqitch users. A team can `alias sqitch=sqlever` and evaluate us before committing. This is the adoption path.
719
+
720
+ **`sqlever.*` schema:** When sqlever-specific features are used, a separate `sqlever.*` schema is created. This includes non-transactional deploy (which uses `sqlever.pending_changes` for write-ahead tracking), expand/contract, and batched DML. The schema is created on first use of any of these features — for example, on the first deploy that includes a non-transactional change. The `sqlever.*` schema is independent of `sqitch.*` and can be safely dropped if reverting to Sqitch (advanced features will stop working, but core migration tracking is unaffected).
721
+
722
+ ### DD4 — SQL parser
723
+
724
+ We use a PostgreSQL-aware SQL parser for static analysis, not regex.
725
+
726
+ Options evaluated:
727
+ - `pgsql-parser` (npm) — JS wrapper around the actual PG parser (`libpg_query`). Exact fidelity, same AST as Postgres. Use this.
728
+ - `pg-query-parser` — older, same approach
729
+ - Hand-rolled regex — too fragile for production rules
730
+
731
+ Decision: **`pgsql-parser`** (or equivalent). If AST is unavailable for some construct, fall back to regex with a clear comment.
732
+
733
+ **psql metacommand pre-processing:** `pgsql-parser` / `libpg_query` parses SQL, not psql metacommands. `\i`, `\ir`, `\set`, `\copy`, etc. are client-side directives. Before passing SQL to the parser, a pre-processing stage must: (1) scan for `\i`/`\ir` lines and resolve includes, (2) strip or record other psql metacommands (`\set`, `\pset`, `\timing`, etc.) with an info-level note, (3) pass the assembled SQL to `pgsql-parser`.
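A simplified sketch of that pre-processing stage is shown below. It is line-based and makes several assumptions: `preprocess` and `PreprocessResult` are hypothetical names, include paths are only recorded (not resolved and inlined), a metacommand is recognized only when it starts a line, and backslash lines inside dollar-quoted bodies or string literals are not handled. Metacommand lines are replaced with empty lines so that the line numbers reported by the parser still match the original file.

```typescript
interface PreprocessResult {
  sql: string;        // SQL with metacommand lines blanked out
  includes: string[]; // paths named by \i and \ir, in order of appearance
  notes: string[];    // info-level notes for other metacommands
}

function preprocess(script: string): PreprocessResult {
  const includes: string[] = [];
  const notes: string[] = [];

  const lines = script.split("\n").map((line, index) => {
    const match = /^\s*\\(\w+)\s*(.*)$/.exec(line);
    if (!match) return line; // plain SQL, pass through unchanged

    const name = match[1];
    const rest = match[2].trim();
    if (name === "i" || name === "ir") {
      includes.push(rest);
    } else {
      notes.push(`line ${index + 1}: psql metacommand \\${name} skipped by static analysis`);
    }
    return ""; // keep the line count stable for accurate locations
  });

  return { sql: lines.join("\n"), includes, notes };
}
```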
734
+
735
+ ### DD5 — Plan file is source of truth
736
+
737
+ sqlever never modifies `sqitch.plan` without an explicit command. The plan file is append-only during `add`, never rewritten during deploy/revert.
738
+
739
+ ### DD6 — No magic sequencing
740
+
741
+ Like Sqitch, sqlever uses explicit dependency declarations (`-r dep1 -r dep2`), not sequential numbers. Sequential numbers create false ordering assumptions and merge conflicts. Dependencies are explicit.
742
+
743
+ **Conflict dependencies:** Sqitch supports `--conflict dep` (or `!dep` in the plan file), meaning "this change cannot be deployed if `dep` is currently deployed." Before deploying a change, sqlever must check that all requires are deployed and no conflicts are deployed. If a conflict is deployed, deploy fails with an error.
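The pre-deploy check described above amounts to two set lookups. A minimal sketch, with `Change` reduced to the fields needed here and `checkDependencies` as a hypothetical name:

```typescript
interface Change {
  name: string;
  requires: string[];  // changes that must already be deployed
  conflicts: string[]; // changes that must NOT be deployed
}

// Returns a list of problems; an empty list means the change may be deployed.
function checkDependencies(change: Change, deployed: ReadonlySet<string>): string[] {
  const errors: string[] = [];
  for (const dep of change.requires) {
    if (!deployed.has(dep)) {
      errors.push(`${change.name}: required change "${dep}" is not deployed`);
    }
  }
  for (const dep of change.conflicts) {
    if (deployed.has(dep)) {
      errors.push(`${change.name}: conflicting change "${dep}" is deployed`);
    }
  }
  return errors;
}
```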
744
+
745
+ ### DD7 — Expand/contract is opt-in
746
+
747
+ The expand/contract pattern requires application-side changes. sqlever never automatically applies it. It provides the primitives and tracks state. Engineers choose when to use it.
748
+
749
+ ### DD8 — All state in Postgres
750
+
751
+ No lock files, no local state files, no `.sqlever/` directory with runtime state. Everything that matters (what's deployed, batch job state, expand/contract phase) lives in the database. This makes sqlever safe to run from multiple machines (CI + developer laptop) without coordination.
752
+
753
+ ### DD9 — 3-partition queue with SKIP LOCKED
754
+
755
+ For batched DML, we use a PGQ-style 3-partition rotating table with `SELECT ... FOR UPDATE SKIP LOCKED` for worker coordination. These are complementary, not alternatives:
756
+
757
+ - **3-partition rotation** solves **bloat and cleanup**. Completed batches accumulate dead tuples if removed via DELETE. Partition rotation allows TRUNCATE of an entire inactive partition — instant, zero bloat, no VACUUM pressure. The three partitions rotate: one active (receiving work), one being processed, one being truncated.
758
+ - **SKIP LOCKED** solves **worker concurrency**. Multiple batch workers can dequeue from the active partition without blocking each other. `SELECT ... FOR UPDATE SKIP LOCKED` lets each worker grab the next available batch without waiting for locks held by other workers.
759
+
760
+ This is the PGQ architecture: partition rotation for cleanup, lock-free dequeue within partitions. Job state tracking (pending/running/done/failed/dead), pause/resume, and progress monitoring are orthogonal — they use status columns and partial indexes within the active partitions.
761
+
762
+ ### DD10 — No hidden network calls
763
+
764
+ sqlever never calls external services without explicit configuration. No telemetry, no update checks, no LLM calls unless `sqlever explain`/`sqlever review` is explicitly invoked.
765
+
766
+ ### DD11 — Sqitch as the oracle for compatibility testing
767
+
768
+ We run Sqitch and sqlever side-by-side against identical databases and compare output, tracking table state, and exit codes. Sqitch is the ground truth. Any divergence is a bug in sqlever.
769
+
770
+ This means Sqitch must be installed in CI. It is available as a Docker image (`sqitch/sqitch`) and as a Perl module on CPAN. We use the Docker image to avoid Perl runtime management.
771
+
772
+ We maintain a corpus of real-world Sqitch projects as test fixtures (anonymized where needed). Each fixture is tested against both tools and outputs compared.
773
+
774
+ ### DD12 — Script execution model: psql vs node-postgres
775
+
776
+ **The problem:** Sqitch shells out to `psql` to execute migration scripts. It does NOT use a programmatic database driver. This matters because many real-world Sqitch migration scripts use psql metacommands: `\i` (include), `\ir` (include relative), `\set` (variable substitution), `\copy` (client-side copy), `\if`/`\elif`/`\else`/`\endif` (conditionals), `\echo`, etc. `node-postgres` cannot handle any of these — they are client-side directives, not SQL.
777
+
778
+ Sqitch also sets `ON_ERROR_STOP=1` when invoking psql, which aborts on the first error. It disables `.psqlrc` via environment variables.
779
+
780
+ **Decision: Shell out to psql.** Like Sqitch, sqlever executes migration scripts by invoking `psql`. This guarantees 100% compatibility with all psql metacommands (`\i`, `\ir`, `\set`, `\copy`, `\if`/`\elif`/`\endif`, `\echo`, `\gset`, etc.) — no subset, no reimplementation, no compatibility gaps.
781
+
782
+ **Implications:**
783
+ - psql must be installed on the deploy machine. The `--db-client` flag specifies the psql path (default: `psql` from `$PATH`).
784
+ - sqlever sets `ON_ERROR_STOP=1`, disables `.psqlrc` (via `PSQLRC=/dev/null` and `--no-psqlrc`), and passes `--single-transaction` or not depending on `--mode` and `--no-transaction`.
785
+ - `--set key=value` is passed directly to psql as `-v key=value`, preserving full psql variable interpolation (`:variable`, `:'variable'`, `:{?variable}`).
786
+ - Error handling: sqlever parses psql's stderr for error messages and exit code for success/failure. Structured error extraction is best-effort — psql does not emit machine-readable errors.
787
+ - `node-postgres` (`pg` package) is still used for tracking table operations (`sqitch.*`, `sqlever.*`), advisory locks, schema introspection, and batch DML. It is NOT used for executing migration scripts.
788
+
789
+ This cleanly separates concerns: psql runs user SQL (with full metacommand support), `pg` manages sqlever's own database state.
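The psql invocation implied by the points above might be assembled like this. It is a sketch only: `buildPsqlInvocation` and its option names are hypothetical, it uses psql's standard `-v`, `-d`, `-f`, `--no-psqlrc`, and `--single-transaction` options, and it only builds the argument list without spawning a process.

```typescript
interface PsqlInvocation {
  command: string;
  args: string[];
  env: Record<string, string>;
}

function buildPsqlInvocation(opts: {
  dbClient?: string;                  // value of --db-client, defaults to psql on $PATH
  dbUri: string;                      // connection string passed to -d
  scriptPath: string;
  singleTransaction: boolean;
  variables?: Record<string, string>; // from --set key=value
}): PsqlInvocation {
  const args = ["--no-psqlrc", "-v", "ON_ERROR_STOP=1", "-d", opts.dbUri];
  if (opts.singleTransaction) args.push("--single-transaction");
  for (const [key, value] of Object.entries(opts.variables ?? {})) {
    args.push("-v", `${key}=${value}`); // forwarded unchanged to psql
  }
  args.push("-f", opts.scriptPath);
  return {
    command: opts.dbClient ?? "psql",
    args,
    env: { PSQLRC: "/dev/null" }, // second guard against a user .psqlrc
  };
}
```

Passing arguments as an array to a process-spawning API, rather than building a shell string, avoids quoting problems in variable values.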
790
+
791
+ **`--mode` transaction semantics (depends on DD12):**
792
+ - `change` mode (default): each change in its own transaction.
793
+ - `all` mode: see note below on Sqitch behavior.
794
+ - `tag` mode: changes grouped by tag, each tag-group in a transaction.
795
+
796
+ **Sqitch `--mode all` behavior (important):** Sqitch's `_deploy_all` does NOT use a single wrapping transaction. It uses per-change transactions with explicit revert on failure — if change N fails, Sqitch explicitly reverts changes N-1, N-2, etc. that were already committed. This means intermediate committed states are visible to other sessions between changes. sqlever may choose to improve upon this by offering true single-transaction semantics for `--mode all` (where supported by the execution model), but must document the behavioral difference. When sqlever uses single-transaction `--mode all`, failure rolls back atomically (no partial state visible); Sqitch's approach shows intermediate states and relies on explicit revert.
797
+
798
+ With psql, Sqitch does NOT wrap deploy scripts in a transaction that Sqitch itself manages: it passes transaction control to psql and manages tracking separately. Because sqlever also executes scripts through psql (see the decision above), `node-postgres` issues `BEGIN`/`COMMIT` only for sqlever's own tracking operations, not for the migration script. The transaction boundary differs by mode.
799
+
800
+ ### DD13 — PgBouncer compatibility and advisory locks
801
+
802
+ **The problem:** Most production PostgreSQL deployments use PgBouncer for connection pooling. PgBouncer in transaction mode has significant implications for sqlever:
803
+
804
+ - `pg_advisory_lock` (session-level) does not work through PgBouncer in transaction mode — the lock is tied to the backend connection, which PgBouncer may reassign between transactions.
805
+ - `pg_advisory_xact_lock` (transaction-level) releases at transaction end — it cannot span the entire deploy in `--mode change` (each change is a separate transaction) or across non-transactional changes. This makes it unsuitable for deploy coordination.
806
+ - Session-level `SET` commands (`lock_timeout`, `statement_timeout`, `search_path`) may leak to other connections or be lost between transactions.
807
+ - The batch worker's sleep interval between batches causes the connection to return to the pool; the next batch may run on a different backend.
808
+
809
+ **Decision:** sqlever deploy, revert, rebase, checkout, and batch operations require direct PostgreSQL connections, not PgBouncer in transaction mode. sqlever will:
810
+
811
+ 1. **Use session-level advisory locks:** Deploy coordination uses session-level advisory locks. The default mode is non-blocking: `pg_try_advisory_lock(<lock_key>)`, which returns `false` immediately if the lock is held by another session. If the lock is not acquired, sqlever exits with code 4 (concurrent deploy detected). For CI environments where waiting is preferred, an alternative wait mode is available: `SET lock_timeout = '<advisory_lock_timeout>'` followed by `pg_advisory_lock(<lock_key>)`. The `advisory_lock_timeout` is configurable (default: 30 seconds) via `sqlever.toml` `[deploy] advisory_lock_timeout`. If the timeout expires, sqlever exits with code 5 (lock timeout). The lock key is computed in the application layer as `pg_advisory_lock(<namespace_constant>, <project_hash>)` using the two-argument form with a fixed namespace constant and a stable application-computed hash of the project name. This avoids using `hashtext()`, whose output is NOT guaranteed stable across PostgreSQL major versions (the hash function can change during upgrades). The lock is held for the entire deploy session and released explicitly on completion via `pg_advisory_unlock()` (or automatically on disconnect for crash recovery). The same lock must be acquired for `revert`, `rebase`, and `checkout` — any command that modifies tracking state or executes DDL/DML. **Note:** Sqitch uses `LOCK TABLE sqitch.changes IN EXCLUSIVE MODE` (a table-level lock inside each change's transaction) for concurrency control. sqlever's advisory lock approach is a sqlever improvement that provides stronger coordination (spans the full deploy session, not just individual transactions).
812
+ 2. **Detect PgBouncer:** Attempt `SHOW pool_mode` (PgBouncer-specific command that returns the pool mode; errors on direct PG connections). This is a best-effort detection — it works for standard PgBouncer installations but may not detect all pooler configurations. If PgBouncer in transaction mode is detected, emit an **error** (not just a warning) for deploy/revert/rebase/checkout operations, as session-level advisory locks are not safe. The `connection_type` config option in `sqlever.toml` is the reliable mechanism: `connection_type = "direct"` for non-PgBouncer poolers, PgBouncer in session mode, or when `SHOW pool_mode` detection is unreliable.
813
+ 3. **Re-issue SET commands:** At the start of each transaction, re-issue any session-level settings (`lock_timeout`, `statement_timeout`, `search_path`) as a safety measure.
814
+ 4. **Document:** Require direct PostgreSQL connections for deploy/batch operations. Application traffic can continue to use PgBouncer.
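One way to obtain the stable, application-computed project hash mentioned in item 1 is a fixed, well-known hash such as 32-bit FNV-1a. This is an assumption for illustration: the spec does not name a hash function, and `SQLEVER_LOCK_NAMESPACE` and `projectLockKey` are hypothetical names. The result is folded into a signed 32-bit integer because the two-argument advisory lock functions take `int4` keys.

```typescript
// Hypothetical fixed namespace constant (must fit in a signed 32-bit integer).
const SQLEVER_LOCK_NAMESPACE = 0x73716c76;

// 32-bit FNV-1a over the UTF-8 bytes of the project name, as a signed int32.
function projectLockKey(projectName: string): number {
  const bytes = new TextEncoder().encode(projectName);
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit wrapping multiply
  }
  return hash | 0;
}
```

The two values would then be bound as parameters, for example `SELECT pg_try_advisory_lock($1, $2)` with `SQLEVER_LOCK_NAMESPACE` and `projectLockKey(name)`. Since the hash is computed outside PostgreSQL, it does not change across server upgrades.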
815
+
816
+ ### DD14 — Deploy connection session settings
817
+
818
+ Deploy connections should set:
819
+ - `application_name = 'sqlever/<command>/<project>'` — e.g., `sqlever/deploy/myproject`. Visible in `pg_stat_activity`, critical for DBAs diagnosing lock contention or long-running queries during incidents.
820
+ - `statement_timeout = 0` (or a configurable high value) — migrations are inherently long-running; a global `statement_timeout` (common in production, e.g., 30s) will kill legitimate operations like `VALIDATE CONSTRAINT`. For non-transactional DDL (e.g., `CREATE INDEX CONCURRENTLY`), a separate configurable timeout applies (default: 4 hours, configurable via `sqlever.toml` `[deploy] non_transactional_statement_timeout`). This prevents indefinite hangs while allowing legitimately long operations.
821
+ - `idle_in_transaction_session_timeout` — set to a configurable generous value (default: 10 minutes), not unlimited. This provides a safety net against hung deploy processes (e.g., operator walks away during a TUI prompt) without interfering with normal operation. Configurable via `sqlever.toml` `[deploy] idle_in_transaction_session_timeout`.
822
+ - `lock_timeout` — set by the lock timeout guard (5.9), per-migration configurable.
823
+ - `search_path` — respect the database/role default (Sqitch-compatible behavior). Sqitch does not set `search_path`; sqlever follows suit. Override available via `sqlever.toml` `[deploy] search_path` for teams that want explicit control.
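The settings listed above could be collected into name/value pairs before being applied to the deploy connection. A sketch under assumptions: `deploySessionSettings` and `DeploySessionOptions` are hypothetical names, `lock_timeout` is left to the lock timeout guard, and `search_path` is only included when explicitly configured.

```typescript
interface DeploySessionOptions {
  command: string;                          // e.g. "deploy"
  project: string;                          // e.g. "myproject"
  statementTimeout?: string;                // default "0" (disabled)
  idleInTransactionSessionTimeout?: string; // default "10min"
  searchPath?: string;                      // unset: keep the database/role default
}

function deploySessionSettings(opts: DeploySessionOptions): Array<[string, string]> {
  const settings: Array<[string, string]> = [
    ["application_name", `sqlever/${opts.command}/${opts.project}`],
    ["statement_timeout", opts.statementTimeout ?? "0"],
    ["idle_in_transaction_session_timeout", opts.idleInTransactionSessionTimeout ?? "10min"],
  ];
  if (opts.searchPath !== undefined) {
    settings.push(["search_path", opts.searchPath]);
  }
  return settings;
}
```

Each pair could be applied with `SELECT set_config($1, $2, false)`, which accepts the setting name and value as parameters and avoids interpolating values into a `SET` statement.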
824
+
825
+ ---
826
+
827
+ ## 7. Architecture
828
+
829
+ ```
830
+ sqlever/
831
+ ├── src/
832
+ │ ├── cli.ts # Entry point, command routing
833
+ │ ├── commands/
834
+ │ │ ├── init.ts
835
+ │ │ ├── add.ts
836
+ │ │ ├── deploy.ts
837
+ │ │ ├── revert.ts
838
+ │ │ ├── verify.ts
839
+ │ │ ├── status.ts
840
+ │ │ ├── log.ts
841
+ │ │ ├── tag.ts
842
+ │ │ ├── rework.ts # Rework existing change
843
+ │ │ ├── rebase.ts # Revert + deploy convenience
844
+ │ │ ├── bundle.ts # Package for distribution
845
+ │ │ ├── checkout.ts # VCS branch switch
846
+ │ │ ├── show.ts # Display change/tag details
847
+ │ │ ├── plan.ts # Display plan contents
848
+ │ │ ├── upgrade.ts # Upgrade registry schema
849
+ │ │ ├── analyze.ts # Static analysis (standalone)
850
+ │ │ ├── explain.ts # AI explain
851
+ │ │ ├── batch.ts # Batched DML commands
852
+ │ │ └── diff.ts
853
+ │ ├── plan/
854
+ │ │ ├── parser.ts # sqitch.plan parser (pragmas, reworked changes, @tag refs)
855
+ │ │ ├── writer.ts # sqitch.plan writer
856
+ │ │ └── types.ts # Change, Tag, Dependency types
857
+ │ ├── db/
858
+ │ │ ├── client.ts # pg connection wrapper
859
+ │ │ ├── registry.ts # sqitch.* table operations
860
+ │ │ └── introspect.ts # Schema introspection for connected analysis rules
861
+ │ ├── analysis/
862
+ │ │ ├── index.ts # Analyzer entry point, rule registry
863
+ │ │ ├── parser.ts # SQL AST parsing (pgsql-parser), byte-offset→line/col conversion
864
+ │ │ ├── preprocess.ts # psql metacommand pre-processing (\i, \set, etc.)
865
+ │ │ ├── rules/
866
+ │ │ │ ├── SA001.ts # One file per rule
867
+ │ │ │ ├── SA002.ts
868
+ │ │ │ └── ...
869
+ │ │ └── reporter.ts # Output formatting (text/json/github/gitlab)
870
+ │ ├── includes/
871
+ │ │ └── snapshot.ts # git-aware \i / \ir resolution
872
+ │ ├── expand-contract/
873
+ │ │ ├── generator.ts # Generate expand/contract migration pairs
874
+ │ │ └── tracker.ts # Track phase state in Postgres
875
+ │ ├── batch/
876
+ │ │ ├── queue.ts # 3-partition queue schema + operations
877
+ │ │ ├── worker.ts # Batch execution loop
878
+ │ │ └── progress.ts # Progress tracking + ETA
879
+ │ ├── tui/
880
+ │ │ └── deploy.ts # Interactive deploy dashboard
881
+ │ ├── ai/
882
+ │ │ ├── explain.ts # Migration explainer
883
+ │ │ └── review.ts # PR review comment generator
884
+ │ ├── config.ts # sqlever.toml + sqitch.conf parsing
885
+ │ └── output.ts # Shared output formatting
886
+ ├── tests/
887
+ │ ├── unit/
888
+ │ ├── integration/
889
+ │ └── fixtures/
890
+ ├── spec/
891
+ │ ├── SPEC.md
892
+ │ └── SPEC-CHANGELOG.md
893
+ ├── README.md
894
+ ├── package.json
895
+ ├── tsconfig.json
896
+ └── sqlever.toml.example
897
+ ```
898
+
899
+ **`sqitch.conf` format:** Sqitch uses a Git-style INI configuration format (not TOML — parsed by `Config::GitLike`). sqlever must parse this format including sections, subsections, multi-valued keys, and includes:
900
+ ```ini
901
+ [core]
902
+ engine = pg
903
+ top_dir = migrations
904
+ plan_file = migrations/sqitch.plan
905
+ [engine "pg"]
906
+ target = db:pg:mydb
907
+ client = /usr/bin/psql
908
+ [deploy]
909
+ verify = true
910
+ mode = change
911
+ [target "production"]
912
+ uri = db:pg://user@host/dbname
913
+ ```
914
+
915
+ The `[deploy]` section controls default deploy behavior: `verify` (default: `true`, run verify scripts after each change), `mode` (default: `change`, transaction scope). These correspond to `--verify`/`--no-verify` and `--mode` command-line flags.
916
+
917
+ **Configuration precedence:** system (`$(prefix)/etc/sqitch/sqitch.conf`) < user (`~/.sqitch/sqitch.conf`) < project (`./sqitch.conf`) < `sqlever.toml` (sqlever-only features) < environment variables < command-line flags.
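
A minimal sketch of this precedence chain, assuming each source has already been parsed into a flat `section.key` map. `ConfigLayer` and `resolveConfig` are hypothetical names, and multi-valued keys are deliberately not handled here.

```typescript
type ConfigLayer = Record<string, string | undefined>;

// Merge layers given in ascending precedence: a later layer overrides an earlier one.
function resolveConfig(...layers: ConfigLayer[]): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const layer of layers) {
    for (const key of Object.keys(layer)) {
      const value = layer[key];
      if (value !== undefined) resolved[key] = value;
    }
  }
  return resolved;
}

// Layers in the order stated above; the values are made up for illustration.
const effective = resolveConfig(
  { "core.engine": "pg", "deploy.verify": "true" }, // system sqitch.conf
  {}, // user sqitch.conf
  { "deploy.mode": "change" }, // project sqitch.conf
  {}, // sqlever.toml
  { "deploy.mode": "tag" }, // environment variables
  { "deploy.verify": "false" }, // command-line flags
);
```

Here `effective` ends up with `deploy.mode` taken from the environment layer and `deploy.verify` from the command-line layer, while `core.engine` survives from the system file.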
918
+
919
+ **Target URI scheme:** Sqitch uses a `db:` URI scheme: `db:pg://user:pass@host:port/dbname`. sqlever must accept both `db:pg:` URIs (Sqitch compat) and standard PostgreSQL URIs (`postgresql://...`).
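
As a rough sketch, accepting both forms can come down to stripping the `db:pg:` prefix. The function name `toPostgresUri` is hypothetical, and treating the opaque form `db:pg:mydb` as "database name only" is an assumption based on the `[engine "pg"]` example above. Other Sqitch engine aliases are out of scope here.

```typescript
// Convert a Sqitch-style target into a standard PostgreSQL URI (illustrative only).
function toPostgresUri(target: string): string {
  // Standard PostgreSQL URIs pass through unchanged.
  if (/^postgres(ql)?:\/\//.test(target)) return target;

  const match = /^db:pg:(.*)$/.exec(target);
  if (match === null) {
    throw new Error(`unsupported target URI: ${target}`);
  }
  const rest = match[1] ?? "";

  // "db:pg://user@host/dbname" is hierarchical: keep authority and path as-is.
  // "db:pg:mydb" is opaque: assumed to name a database on the default host.
  return rest.startsWith("//") ? `postgresql:${rest}` : `postgresql:///${rest}`;
}
```

With this sketch, `db:pg://user@host/dbname` maps to `postgresql://user@host/dbname` and `db:pg:mydb` maps to `postgresql:///mydb`.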
920
+
921
+ **Default paths:** plan file = `./sqitch.plan`, top dir = `.`, deploy dir = `./deploy`, revert dir = `./revert`, verify dir = `./verify`. All overridable in `sqitch.conf` under `[core]`.
922
+
923
+ ### Data flow — deploy
924
+
925
+ ```
926
+ sqlever deploy
+   → parse sqitch.conf + sqlever.toml
+   → connect to database (set application_name, statement_timeout=0, idle_in_transaction_session_timeout=10min)
+   → acquire session-level advisory lock (default: non-blocking)
+       → pg_try_advisory_lock(<lock_key>): returns false immediately if lock held
+       → if lock not acquired: exit 4 (concurrent deploy detected)
+       → alternative wait mode (CI): SET lock_timeout = '<advisory_lock_timeout>'; pg_advisory_lock(<lock_key>)
+           → if timeout exceeded: exit 5 (lock timeout)
+       → lock_key: application-computed stable hash (see I3 note in DD13)
+       → (requires direct connection, not PgBouncer in transaction mode)
+   → read sqitch.* tracking tables
+   → compute pending changes (topological sort by dependency)
+   → check conflict dependencies (no conflicts may be currently deployed)
+   → for each pending change:
+       → resolve \i includes (snapshot or HEAD)
+       → pre-process psql metacommands
+       → run static analysis
+           → if error: abort (unless --force)
+           → if warn: print, continue
+       → if change is non-transactional:
+           → execute deploy script WITHOUT transaction wrapper
+           → on success: BEGIN; update sqitch.changes, sqitch.events, sqitch.dependencies; COMMIT
+           → on failure: report error, note that partial DDL may remain (e.g., INVALID index)
+       → else (normal transactional change):
+           → BEGIN
+           → SET lock_timeout (if guard enabled and script doesn't set its own)
+           → execute deploy script
+           → update sqitch.changes, sqitch.events, sqitch.dependencies
+           → COMMIT
+   → release advisory lock: pg_advisory_unlock(<lock_key>) [always: success, failure, or analysis abort]
+   → print summary
957
+ ```
958
+
959
+ **Advisory lock release:** `pg_advisory_unlock()` must be called on ALL exit paths — successful completion, deploy failure (analysis block, script error, verification failure), and user abort. Disconnect-based release (PG automatically releases session-level advisory locks on disconnect) is the safety net for crashes, not the primary unlock mechanism. Implementations must use a `finally`-style pattern to ensure the unlock call is always reached.
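
The `finally`-style pattern could look like the sketch below. `SqlClient` is a hypothetical minimal interface standing in for the real connection wrapper, `withDeployLock` and `ConcurrentDeployError` are illustrative names, and the lock key is passed as a decimal string because 64-bit keys do not fit a JavaScript `number`. Only the non-blocking mode is shown.

```typescript
// Hypothetical minimal client interface (assumption); a real wrapper would sit on a PostgreSQL driver.
interface SqlClient {
  query(sql: string, params?: string[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Carries the "concurrent deploy detected" exit code from the data flow above.
class ConcurrentDeployError extends Error {
  readonly exitCode = 4;
}

// Run `body` while holding the session-level deploy lock; always attempt the unlock.
async function withDeployLock<T>(
  client: SqlClient,
  lockKey: string,
  body: () => Promise<T>,
): Promise<T> {
  const result = await client.query(
    "SELECT pg_try_advisory_lock($1::bigint) AS acquired",
    [lockKey],
  );
  if (result.rows[0]?.acquired !== true) {
    throw new ConcurrentDeployError("another sqlever process holds the deploy lock");
  }
  try {
    return await body();
  } finally {
    // Reached on success, on thrown errors, and on early returns inside body().
    await client.query("SELECT pg_advisory_unlock($1::bigint)", [lockKey]);
  }
}
```

One design caveat: if the unlock query itself fails (for example because the connection dropped), the error thrown from `finally` replaces the original one, so a real implementation would probably log and swallow unlock failures and rely on disconnect-based release.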
960
+
961
+ **Non-transactional changes:** sqlever marks non-transactional changes via a plan file pragma added by `sqlever add --no-transaction`. It also recognizes a `-- sqlever:no-transaction` comment on the first line of the deploy script. **Note:** the script comment is a sqlever-only convention for standalone linter mode (SA020 detection) and must not be described as Sqitch-compatible. Sqitch has no `-- sqitch-no-transaction` convention (no such mechanism exists in the Sqitch source code) and always wraps changes in `begin_work`/`finish_work`. During deploy, non-transactional changes execute without `BEGIN`/`COMMIT` wrapping. A separate configurable `statement_timeout` applies to non-transactional DDL (default: 4 hours, configurable via `sqlever.toml` `[deploy] non_transactional_statement_timeout`).
962
+
963
+ **Non-transactional write-ahead tracking:** Before executing non-transactional DDL, sqlever writes a "pending" record to `sqlever.pending_changes` (in its own committed transaction). After the DDL succeeds, the record is updated to "complete" and the sqitch tracking tables are updated. On the next deploy, sqlever checks for any "pending" non-transactional changes and verifies their state before deciding to skip or retry. This handles the case where sqlever crashes between DDL execution and tracking table update. **Note:** `sqlever.pending_changes` reads and writes are protected by the same session-level deploy advisory lock (DD13). The advisory lock prevents concurrent deploys from simultaneously reading or acting on pending records — without it, two processes could both attempt to verify and resolve the same pending change.
964
+
965
+ **`sqlever.pending_changes` schema:**
966
+ ```sql
967
+ CREATE TABLE sqlever.pending_changes (
+     change_id     TEXT PRIMARY KEY,
+     change_name   TEXT NOT NULL,
+     project       TEXT NOT NULL,
+     script_path   TEXT NOT NULL,
+     started_at    TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
+     status        TEXT NOT NULL DEFAULT 'pending'
+                   CHECK (status IN ('pending', 'complete', 'failed')),
+     error_message TEXT
+ );
977
+ ```
978
+
979
+ **Non-transactional verify logic:** When sqlever finds a "pending" record on the next deploy, it verifies the change's state. For index operations (`CREATE INDEX CONCURRENTLY`), sqlever checks `pg_index.indisvalid` to determine if the index was successfully created. For other DDL, sqlever runs the change's verify script (if one exists). Automated verification only works for known DDL patterns — for arbitrary DDL without a verify script, sqlever reports the pending state and requires manual resolution.
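
The decision described above can be written as a small pure function. The sketch below is one possible reading, not the spec's algorithm: `PendingProbe`, `PendingAction`, and `resolvePending` are hypothetical names, and mapping a failed verify script to "retry" is an assumption drawn from the "skip or retry" wording.

```typescript
// What sqlever learned about a "pending" change (hypothetical shape).
interface PendingProbe {
  // CREATE INDEX CONCURRENTLY only: pg_index.indisvalid, or null if the index does not exist.
  indexValid?: boolean | null;
  // Other DDL: outcome of the change's verify script; omitted when no script exists.
  verify?: "pass" | "fail";
}

type PendingAction =
  | "record-deployed" // DDL took effect; only the tracking update was lost
  | "retry" // DDL did not take effect; run the change again
  | "report-invalid-index" // operator must drop the INVALID index before retrying
  | "manual-resolution"; // state cannot be determined automatically

function resolvePending(probe: PendingProbe): PendingAction {
  if (probe.indexValid !== undefined) {
    if (probe.indexValid === true) return "record-deployed";
    if (probe.indexValid === false) return "report-invalid-index";
    return "retry"; // null: no index was left behind
  }
  if (probe.verify === "pass") return "record-deployed";
  if (probe.verify === "fail") return "retry"; // assumption, see lead-in
  return "manual-resolution"; // arbitrary DDL without a verify script
}
```

Keeping the probe (database queries, verify script execution) separate from the decision makes each branch easy to unit test.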
980
+
981
+ Failure recovery for non-transactional DDL is fundamentally different: a failed `CREATE INDEX CONCURRENTLY` leaves an `INVALID` index that must be cleaned up. The error message must include the exact command to drop the INVALID index before retrying.
982
+
983
+ **`ALTER TYPE ... ADD VALUE` note:** On PG < 12, `ALTER TYPE ... ADD VALUE` cannot run inside a transaction block and must be marked non-transactional. On PG 12+, it can run inside a transaction, but the new enum value is **not usable within the same transaction** — an `INSERT` using the new value in the same transaction will fail. If a deploy script does `ALTER TYPE ... ADD VALUE 'x'` followed by `INSERT ... VALUES ('x')`, they must be in separate changes or the change must be non-transactional.
984
+
985
+ **Transaction scope by `--mode`:**
986
+ - `change` (default): each change in its own transaction (as shown above).
987
+ - `all`: all changes in a single transaction. Failure rolls back everything including tracking table updates. Note: this is a sqlever improvement — Sqitch uses per-change transactions with explicit revert on failure (see DD12).
988
+ - `tag`: changes grouped by tag, each tag-group in a single transaction.
989
+
990
+ Non-transactional changes always execute outside any transaction regardless of `--mode`. In `--mode all`, non-transactional changes break the surrounding transaction: sqlever issues `COMMIT` before the non-transactional DDL, executes it, then issues `BEGIN` to continue with subsequent changes. This means `--mode all` cannot guarantee atomicity when non-transactional changes are present. If a transactional change fails after a non-transactional change has already committed, the non-transactional DDL remains deployed (its tracking update was committed separately). The subsequent transactional changes roll back, including their tracking records. This leaves a partially-deployed state where `sqlever status` correctly reports which changes are deployed and which are not. sqlever emits a warning at the start of deploy when `--mode all` is used with a plan containing non-transactional changes.
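
To make the transaction boundaries concrete, here is a sketch that splits an ordered list of pending changes into execution groups for each `--mode`. The types and `groupForMode` are hypothetical, and the handling of untagged changes at the end of the plan in `tag` mode (they form a final group) is an assumption.

```typescript
type Mode = "change" | "all" | "tag";

interface PlannedChange {
  name: string;
  transactional: boolean;
  tagged: boolean; // true when a tag is attached to this change in the plan
}

// One group is one BEGIN...COMMIT, or a single change run without a wrapper.
interface TxGroup {
  transactional: boolean;
  changes: string[];
}

function groupForMode(changes: PlannedChange[], mode: Mode): TxGroup[] {
  const groups: TxGroup[] = [];
  let current: string[] = [];

  // Close the currently open transaction, if it contains anything.
  const flush = (): void => {
    if (current.length > 0) groups.push({ transactional: true, changes: current });
    current = [];
  };

  for (const change of changes) {
    if (!change.transactional) {
      flush(); // COMMIT before the non-transactional DDL
      groups.push({ transactional: false, changes: [change.name] });
      continue; // the next transactional change starts a fresh BEGIN
    }
    current.push(change.name);
    if (mode === "change" || (mode === "tag" && change.tagged)) flush();
  }
  flush();
  return groups;
}
```

For a plan `a, idx (non-transactional), c, d` in `all` mode this yields three groups, `[a]`, `[idx]`, `[c, d]`, which is exactly why atomicity is lost: `a` and `idx` are already committed if `c` or `d` fails.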
991
+
992
+ ---
993
+
994
+ ## 8. Testing strategy
995
+
996
+ Reliability is non-negotiable for a migration tool. A bug in a migration runner can corrupt production databases. We invest heavily in testing infrastructure from day one.
997
+
998
+ ### 8.1 Test pyramid
999
+
1000
+ ```
1001
+   ┌──────────────┐
+   │  E2E compat  │   ← Sqitch oracle tests (slowest, highest confidence)
+  ┌┴──────────────┴┐
+  │  Integration   │  ← Real Postgres, real filesystem
+ ┌┴────────────────┴┐
+ │    Unit tests    │ ← Pure functions, no I/O (fastest)
+ └──────────────────┘
1008
+ ```
1009
+
1010
+ ### 8.2 Unit tests
1011
+
1012
+ Fast, no I/O, run on every commit.
1013
+
1014
+ **Plan parser**
1015
+ - Round-trip: parse every valid sqitch.plan format → serialize → parse again, result identical
1016
+ - All comment styles (`#`, blank lines)
1017
+ - All pragmas (`%syntax-version`, `%project`, `%uri`)
1018
+ - Reworked changes: duplicate change names with `@tag` references, change ID disambiguation
1019
+ - All dependency forms (`@tag`, `change`, `project:change`, `!conflict`)
1020
+ - Cross-project dependencies (`project:change` syntax)
1021
+ - Change entry format: timestamp + planner name + email + note
1022
+ - Unicode in change names and notes
1023
+ - Edge cases: empty plan, plan with only pragmas, plan with tags only
1024
+ - Change name character set validation (alphanumeric, hyphens, underscores, forward slashes)
1025
+ - Note parsing: `#` in middle of change line starts the note vs. `#` at beginning of line is a comment
1026
+
1027
+ **Change ID computation**
1028
+ - Compute change IDs for known Sqitch test cases and verify byte-for-byte match
1029
+
1030
+ **Config parser**
1031
+ - `sqitch.conf` Git-style INI format: sections, subsections (`[engine "pg"]`), multi-valued keys
1032
+ - `db:pg:` URI scheme parsing and conversion to standard PostgreSQL URI
1033
+ - `sqlever.toml` overrides: precedence rules (system < user < project < sqlever.toml < env vars < flags)
1034
+ - Invalid config: clear error messages, no panics
1035
+
1036
+ **Analysis rules**
1037
+ For each rule SA001–SA021:
1038
+ - SQL strings that must trigger the rule (positive cases, with location info)
1039
+ - SQL strings that must NOT trigger (false positive prevention)
1040
+ - Version-aware cases: SQL safe on PG 17 but dangerous on PG 14 (e.g. SA002b)
1041
+ - Multi-statement scripts: rule fires on correct statement, not adjacent ones
1042
+ - Rule interactions: two rules on same statement both fire independently
1043
+ - PL/pgSQL body exclusion: DML inside CREATE FUNCTION / DO blocks does not fire SA010/SA011/SA008
1044
+ - Inline suppression: `-- sqlever:disable SA010` prevents rule from firing
1045
+ - SA003 safe-cast allowlist: verify each safe cast does NOT fire, each unsafe cast does fire
1046
+ - SA020: CREATE INDEX CONCURRENTLY detection in transactional context
1047
+
1048
+ For connected rules (SA009, SA011): fixtures include companion `.context.json` files with mock introspection data (table schemas, row estimates, index lists).
1049
+
1050
+ **Snapshot includes**
1051
+ - Mock git: given commit hash → file content mapping, verify correct version resolved
1052
+ - Fallback: no git repo → HEAD used, no error
1053
+ - Missing file: clear error, not silent skip
1054
+ - Nested includes: `\i a.sql` where `a.sql` also has `\i b.sql`
1055
+
1056
+ **Topological sort**
1057
+ - Linear deps: A → B → C deploys in order
1058
+ - Diamond deps: A → B, A → C, B → D, C → D deploys correctly
1059
+ - Cycle detection: circular deps produce clear error before any deploy
1060
+ - Partial deploy `--to <change>`: correct subset selected
1061
+ - Conflict dependencies: change with `!conflict` fails if conflict is deployed
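
For reference, the ordering and cycle-detection cases above can be exercised against something like the following sketch of Kahn's algorithm. It is a generic illustration, not the planned implementation: `topoSort` and its input shape (change name mapped to the changes it requires, in plan order) are assumptions.

```typescript
// requires: change name -> names of changes it depends on, inserted in plan order.
function topoSort(requires: Map<string, string[]>): string[] {
  const names = Array.from(requires.keys());
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const name of names) {
    indegree.set(name, 0);
    dependents.set(name, []);
  }

  for (const name of names) {
    for (const dep of requires.get(name) ?? []) {
      const list = dependents.get(dep);
      if (list === undefined) {
        throw new Error(`missing dependency: ${name} requires ${dep}`);
      }
      list.push(name);
      indegree.set(name, (indegree.get(name) ?? 0) + 1);
    }
  }

  // Start with changes that require nothing; plan order breaks ties.
  const ready = names.filter((name) => indegree.get(name) === 0);
  const order: string[] = [];
  for (let i = 0; i < ready.length; i++) {
    const current = ready[i] as string;
    order.push(current);
    for (const next of dependents.get(current) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }

  // Anything never emitted is part of, or downstream of, a cycle.
  if (order.length !== names.length) {
    const stuck = names.filter((name) => order.indexOf(name) === -1);
    throw new Error(`circular dependency involving: ${stuck.join(", ")}`);
  }
  return order;
}
```

Because the whole order is computed up front, both the missing-dependency and the cycle errors surface before any script runs, which matches the "before any deploy" requirement.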
1062
+
1063
+ **Exit codes**
1064
+ - Each exit code scenario produces the correct code
1065
+ - Analysis error (2) vs deploy failure (1) vs verification failure (3) vs concurrent deploy (4) vs lock timeout (5) vs database unreachable (10) are distinct
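
One way to keep these codes distinct is a single constant table that tests can check for uniqueness. The numeric values below are the ones listed above; the constant and function names are hypothetical, and `Success: 0` is assumed by convention.

```typescript
// Exit codes from the list above; names are illustrative.
const ExitCode = {
  Success: 0, // assumption: conventional success code
  DeployFailure: 1,
  AnalysisError: 2,
  VerificationFailure: 3,
  ConcurrentDeploy: 4,
  LockTimeout: 5,
  DatabaseUnreachable: 10,
} as const;

type Outcome = keyof typeof ExitCode;

// Look up the process exit code for a named outcome.
function exitCodeFor(outcome: Outcome): number {
  return ExitCode[outcome];
}

// True when no two outcomes share a numeric code.
function exitCodesAreDistinct(): boolean {
  const values = Object.keys(ExitCode).map((key) => ExitCode[key as Outcome]);
  return values.every((value, index) => values.indexOf(value) === index);
}
```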
1066
+
1067
+ ### 8.3 Integration tests
1068
+
1069
+ Real Postgres via Docker. No mocks for DB layer. Each test gets a fresh database.
1070
+
1071
+ **PG version matrix: 14, 15, 16, 17, 18.**
1072
+
1073
+ PG < 14 is best-effort/untested. Version-aware rules (SA002b) still fire based on `pg_version` config without integration tests on old versions.
1074
+
1075
+ **Command coverage (per PG version):**
1076
+
1077
+ | Command | What we test |
1078
+ |---------|-------------|
1079
+ | `init` | Creates sqitch.conf, sqitch.plan, deploy/ revert/ verify/ dirs |
1080
+ | `add` | Creates correctly named files, appends to plan, handles `-r` deps and `--conflict` |
1081
+ | `add --no-transaction` | Plan entry contains no-transaction pragma |
1082
+ | `deploy` | Executes SQL, updates sqitch.changes + sqitch.events + sqitch.dependencies, correct timestamps |
1083
+ | `deploy --to` | Stops at specified change, tracking state correct |
1084
+ | `deploy --dry-run` | Zero DB changes (verified via table counts before/after) |
1085
+ | `deploy --mode change` | Each change in own transaction, stops on first failure, tracking state consistent |
1086
+ | `deploy --mode all` | All changes in single transaction (sqlever improvement over Sqitch's per-change txn + explicit revert), failure rolls back everything |
1087
+ | `deploy --mode tag` | Changes grouped by tag, each group in a transaction |
1088
+ | `deploy` (non-transactional) | `CREATE INDEX CONCURRENTLY` executes without transaction wrapper, tracking updated separately |
1089
+ | `revert` | Reverts in reverse dependency order, updates tracking tables |
1090
+ | `revert --to` | Reverts to specified change, not further |
1091
+ | `verify` | Runs verify script, PASS/FAIL per change, correct exit code |
1092
+ | `deploy --verify` | Run verify after each change; if verify script is missing for a change, skip verification for that change (do not error) |
1093
+ | `status` | Correct pending count, deployed count, last deployed change |
1094
+ | `log` | Full history in correct order, timestamps reasonable |
1095
+ | `tag` | Tag appears in plan, visible in status/log |
1096
+ | `rework` | Creates reworked change, plan file has duplicate name with @tag reference |
1097
+
1098
+ **Failure scenarios:**
1099
+ - Deploy script fails (SQL error): tracking tables left consistent, revert possible
1100
+ - Deploy script fails mid-batch (multiple changes): partial state recoverable
1101
+ - Verify script fails: correct exit code 3, clear output
1102
+ - Verify script missing (with `--verify`): skip verification for that change, do not error
1103
+ - Revert script raises exception (non-revertable migration): log failure, record `fail` event, tracking state consistent
1104
+ - Database unreachable: exit code 10, clear error message
1105
+ - Concurrent deploy from two processes: second process gets exit code 4, first completes
1106
+ - Non-transactional deploy fails: INVALID index detected, clear guidance on cleanup
1107
+ - Lock timeout exceeded: exit code 5, actionable error message
1108
+
1109
+ **Advisory lock tests:**
1110
+ - `pg_try_advisory_lock(<lock_key>)` acquired at deploy start, released on completion
1111
+ - Same lock acquired for revert, rebase, and checkout operations
1112
+ - Second concurrent deploy fails immediately (non-blocking mode), reports exit code 4
1113
+ - Wait mode: `pg_advisory_lock` with `lock_timeout` times out, reports exit code 5
1114
+ - Crashed deploy: advisory lock auto-released on disconnect, next deploy succeeds
1115
+ - PgBouncer detection: `SHOW pool_mode` triggers error for deploy/revert in transaction mode
1116
+
1117
+ **Dependency scenarios:**
1118
+ - Diamond dependency deploys correctly
1119
+ - Missing dependency detected before deploy starts
1120
+ - Circular dependency detected before deploy starts
1121
+ - Conflict dependency: deploy blocked if conflicting change is currently deployed
1122
+
1123
+ **Schema isolation:**
1124
+ - `sqitch.*` schema created correctly on first deploy (using IF NOT EXISTS)
1125
+ - Existing `sqitch.*` schema from prior Sqitch deployment used without modification
1126
+ - Concurrent first-deploy (two processes, fresh DB): advisory lock prevents race condition
1127
+
1128
+ ### 8.4 Sqitch oracle / compatibility tests
1129
+
1130
+ The most important test suite. Sqitch is the ground truth. We run identical operations against identical databases with both tools and compare results.
1131
+
1132
+ **Infrastructure:**
1133
+ - Sqitch runs via Docker image `sqitch/sqitch:latest` — no Perl runtime needed
1134
+ - Both tools share the same Postgres container
1135
+ - Test harness executes a command with Sqitch, snapshots DB state, resets, executes same command with sqlever, snapshots DB state, diffs
1136
+
1137
+ **What we compare:**
1138
+
1139
+ | Artifact | How we compare |
1140
+ |----------|----------------|
1141
+ | `sqitch.plan` output | Byte-for-byte identical after `add`, `tag`, `rework` |
1142
+ | `sqitch.changes` table | All columns: change_id, script_hash, change, project, note, committed_at (within tolerance), committer_name, committer_email, planned_at, planner_name, planner_email |
1143
+ | `sqitch.dependencies` table | All columns: change_id, type, dependency, dependency_id |
1144
+ | `sqitch.events` table | All columns: event, change_id, change, project, note, requires, conflicts, tags, committed_at (within tolerance), committer_name, committer_email, planned_at, planner_name, planner_email |
1145
+ | `sqitch.tags` table | All columns: tag_id, tag, project, change_id, note, committed_at (within tolerance), committer_name, committer_email, planned_at, planner_name, planner_email |
1146
+ | `sqlever status` stdout | Semantically equivalent (pending count, deployed count, last change name) |
1147
+ | `sqlever log` stdout | Same changes in same order, same metadata |
1148
+ | Exit codes | Identical for all success and failure scenarios |
1149
+
1150
+ **Timestamp tolerance:** committed_at / planned_at compared within 5 seconds (wall clock differences between runs).
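
A sketch of how the harness might compare one tracking-table row from each tool under this tolerance. `Row`, `rowsMatch`, and the representation of column values as strings are assumptions made for illustration.

```typescript
// A tracking-table row with every column rendered as text (assumption).
type Row = Record<string, string | null>;

const TOLERANCE_MS = 5000;

// Rows match when all columns are equal, except that the named timestamp
// columns may differ by up to toleranceMs.
function rowsMatch(
  sqitchRow: Row,
  sqleverRow: Row,
  timestampColumns: string[],
  toleranceMs: number = TOLERANCE_MS,
): boolean {
  const keys = Object.keys(sqitchRow);
  if (keys.length !== Object.keys(sqleverRow).length) return false;
  return keys.every((key) => {
    if (!(key in sqleverRow)) return false;
    const left = sqitchRow[key];
    const right = sqleverRow[key];
    if (left == null || right == null) return left === right;
    if (timestampColumns.indexOf(key) !== -1) {
      // An unparseable timestamp yields NaN, which fails the comparison.
      return Math.abs(Date.parse(left) - Date.parse(right)) <= toleranceMs;
    }
    return left === right;
  });
}
```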
1151
+
1152
+ **Test fixture corpus** — maintained in `tests/fixtures/sqitch-projects/`:
1153
+
1154
+ | Fixture | Description |
1155
+ |---------|-------------|
1156
+ | `minimal/` | 1 change, no deps, no tags |
1157
+ | `linear/` | 10 changes, linear deps A→B→C... |
1158
+ | `diamond/` | Diamond dependency graph |
1159
+ | `tagged/` | Multiple tags, partial deploy to tag |
1160
+ | `with-includes/` | `\i` shared SQL files |
1161
+ | `reworked/` | Changes reworked with `@tag` references |
1162
+ | `cross-project/` | Cross-project dependencies (`project:change`) |
1163
+ | `conflicts/` | Changes with conflict dependencies |
1164
+ | `non-transactional/` | Changes marked `--no-transaction` |
1165
+ | `mid-deploy/` | Project partially deployed by Sqitch, sqlever continues |
1166
+ | `planner-edge-cases/` | Plan entries with commas in planner names (`First,Last,,`), trailing spaces, blank lines mid-plan |
1167
+ | `missing-verify/` | Deploy scripts with no corresponding verify file; test `--verify` skips gracefully |
1168
+ | `non-revertable/` | Revert script that raises an exception; test graceful failure handling |
1169
+ | `heavy-includes/` | 50+ `\i` includes per migration, both repo-relative and `\ir` script-relative paths |
1170
+ | `real-world-1/` | Anonymized real project, ~50 changes |
1171
+ | `real-world-2/` | Anonymized real project, ~200 changes, multiple tags |
1172
+ | `postgres-ai-console/` | PostgresAI Console (`postgres-ai/platform-ui/db`) — 255 sequential changes, heavy `\i`/`\ir` usage (130+ shared functions, 56+ views, 30+ triggers), `%uri` with URL, commas in planner names, missing verify scripts. **Customer zero: sqlever must be a 100% drop-in replacement for this project.** |
1173
+
1174
+ **The "mid-deploy handoff" test** — most important for adoption:
1175
+ 1. Deploy first half of project with real Sqitch
1176
+ 2. Switch to sqlever for second half
1177
+ 3. Verify tracking tables consistent, all remaining changes deploy correctly
1178
+ 4. Verify sqlever status matches what sqitch status would show for the full deployment
1179
+ 5. Verify change IDs computed by sqlever match those computed by Sqitch for the same changes
1180
+
1181
+ **The "reverse handoff" test** — safety net for adoption:
1182
+ 1. Deploy full project with sqlever
1183
+ 2. Switch to Sqitch
1184
+ 3. Verify `sqitch status` reads sqlever-written tracking tables correctly
1185
+ 4. Verify `sqitch log` shows correct history
1186
+ 5. Add a new change with Sqitch, deploy it, verify tracking tables are consistent
1187
+ 6. Revert a sqlever-deployed change with Sqitch, verify tracking state
1188
+
1189
+ This bidirectional test validates that teams can safely evaluate sqlever and revert to Sqitch if needed.
1190
+
1191
+ ### 8.5 Analysis correctness tests
1192
+
1193
+ For each analysis rule, maintain a fixture directory:
1194
+
1195
+ ```
1196
+ tests/fixtures/analysis/
+   SA001/
+     trigger/
+       add_column_not_null_no_default.sql                  # must trigger
+     no_trigger/
+       add_column_nullable.sql                             # must NOT trigger
+       add_column_not_null_with_default.sql                # must NOT trigger (PG 17)
+     version_aware/
+       add_column_not_null_with_default.pg11.trigger.sql   # triggers on PG < 11
+   SA002/
+     trigger/
+       add_column_default_random.sql                       # must trigger (volatile, all versions)
+       add_column_default_gen_random_uuid.sql              # must trigger (volatile, all versions)
+       add_column_default_clock_timestamp.sql              # must trigger (volatile, all versions)
+     no_trigger/
+       add_column_default_now.sql                          # must NOT trigger on PG >= 11 (now() is STABLE)
+       add_column_default_literal.pg17.sql                 # must NOT trigger on PG >= 11
+   ...
+   SA009/
+     trigger/
+       add_fk_no_index.sql                                 # must trigger
+       add_fk_no_index.context.json                        # mock introspection data
+   ...
+   SA020/
+     trigger/
+       concurrent_index_in_transaction.sql                 # must trigger
+     no_trigger/
+       concurrent_index_no_transaction.sql                 # must NOT trigger
+   ...
1225
+ ```
1226
+
1227
+ Test runner:
1228
+ - For every `trigger/` file: analysis must produce the rule ID at correct severity
1229
+ - For every `no_trigger/` file: analysis must produce zero findings for that rule
1230
+ - Version-aware files: tested against relevant PG versions in matrix
1231
+
1232
+ False positive rate is tracked as a metric. Any new rule must ship with at least 5 `no_trigger/` fixtures. Complex rules (SA003 with many type pairs, SA002 with its version/volatility matrix) should have 10–20 or more.
1233
+
1234
+ ### 8.6 Performance tests
1235
+
1236
+ Migration tooling must be fast even on large plan files.
1237
+
1238
+ - Plan parse: 10,000-change plan file parses in < 500ms
1239
+ - `sqlever status` on 1,000-change deployed project: < 1s (single query, not N queries)
1240
+ - Analysis: 1,000-line migration SQL analyzed in < 200ms
1241
+ - `sqlever log` with 10,000 entries: < 2s (pagination by default)
1242
+
1243
+ Run on every release, not every commit.
1244
+
1245
+ ### 8.7 CI configuration
1246
+
1247
+ ```yaml
1248
+ # .github/workflows/ci.yml
+
+ on: [push, pull_request]
+
+ jobs:
+   unit:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - uses: oven-sh/setup-bun@v2
+       - run: bun install
+       - run: bun test tests/unit/
+
+   integration:
+     runs-on: ubuntu-latest
+     strategy:
+       matrix:
+         pg: ["14", "15", "16", "17", "18"]
+     services:
+       postgres:
+         image: postgres:${{ matrix.pg }}
+         env:
+           POSTGRES_PASSWORD: test
+         ports:
+           - 5432:5432  # publish to the runner so PGHOST=localhost can reach the service
+         options: >-
+           --health-cmd pg_isready
+           --health-interval 5s
+     steps:
+       - uses: actions/checkout@v4
+       - uses: oven-sh/setup-bun@v2
+       - run: bun install
+       - run: bun test tests/integration/
+         env:
+           PGUSER: postgres  # default superuser of the postgres image
+           PGPASSWORD: test
+           PGHOST: localhost
+
+   compat:
+     runs-on: ubuntu-latest
+     strategy:
+       matrix:
+         pg: ["14", "15", "16", "17", "18"]
+     services:
+       postgres:
+         image: postgres:${{ matrix.pg }}
+         env:
+           POSTGRES_PASSWORD: test
+         ports:
+           - 5432:5432
+         options: >-
+           --health-cmd pg_isready
+           --health-interval 5s
+     steps:
+       - uses: actions/checkout@v4
+       - uses: oven-sh/setup-bun@v2
+       - run: docker pull sqitch/sqitch:latest
+       - run: bun install
+       - run: bun test tests/compat/
+         env:
+           PGUSER: postgres
+           PGPASSWORD: test
+           PGHOST: localhost
+           SQITCH_IMAGE: sqitch/sqitch:latest
+
+   analysis:
+     runs-on: ubuntu-latest
+     strategy:
+       matrix:
+         pg: ["14", "17"]  # oldest + newest for version-aware rules
+     steps:
+       - uses: actions/checkout@v4
+       - uses: oven-sh/setup-bun@v2
+       - run: bun install
+       - run: bun test tests/analysis/
+
+   build:
+     runs-on: ${{ matrix.os }}
+     strategy:
+       matrix:
+         os: [ubuntu-latest, macos-latest]
+     steps:
+       - uses: actions/checkout@v4
+       - uses: oven-sh/setup-bun@v2
+       - run: bun install
+       - run: bun run build
+       - run: ./dist/sqlever --version
1326
+ ```
1327
+
1328
+ **CI policy:**
1329
+ - All jobs must pass before merge to `main`
1330
+ - `compat` job is the gate — no merge if Sqitch oracle tests fail
1331
+ - Performance tests run on release tags only
1332
+ - Test coverage reported to codecov
1333
+
1334
+ ### 8.8 Local development
1335
+
1336
+ ```bash
1337
+ # Run all tests locally
1338
+ bun test
1339
+
1340
+ # Run only unit tests (fast, no Docker needed)
1341
+ bun test tests/unit/
1342
+
1343
+ # Run integration tests against local PG
1344
+ PGURI=postgres://postgres:test@localhost/sqlever_test bun test tests/integration/
1345
+
1346
+ # Run Sqitch compat tests (requires Docker)
1347
+ bun test tests/compat/
1348
+
1349
+ # Run a specific rule's analysis tests
1350
+ bun test tests/analysis/SA001
1351
+
1352
+ # Run the full matrix (slow — use before release)
1353
+ bash scripts/test-matrix.sh
1354
+ ```
1355
+
1356
+ `docker-compose.yml` in repo root spins up PG 14–18 on ports 5414–5418 for local matrix testing.
1357
+
1358
+ ---
1359
+
1360
+ ## 9. Implementation plan
1361
+
1362
+ ### Phase 0 — foundation (Sprint 1, ~1 week)
1363
+
1364
+ Core infrastructure. Nothing user-visible yet.
1365
+
1366
+ - [ ] Repo structure: `src/`, `tests/`, `tests/fixtures/`
1367
+ - [ ] `tsconfig.json`, `package.json` with bun
1368
+ - [ ] `bun build --compile` producing single binary
1369
+ - [ ] **Validation spike: `pgsql-parser` + `bun build --compile`** — verify that the native C addon (`libpg_query`) compiles on macOS and Linux, bundles correctly in the compiled binary, and works on a machine without build tools. If it fails, evaluate WASM alternatives (`pg-query-emscripten` or similar). This is a go/no-go for the architecture.
1370
+ - [ ] CI: GitHub Actions running `bun test` on push
1371
+ - [ ] Docker Compose for local PG test matrix (PG 14–18)
1372
+ - [ ] `src/config.ts`: parse `sqitch.conf` (Git-style INI with subsections) and `sqlever.toml`
1373
+ - [ ] `src/db/client.ts`: pg connection wrapper, URI parsing (`db:pg:` and `postgresql://`), error handling
1374
+ - [ ] `src/output.ts`: shared print/error/json output helpers
1375
+ - [ ] `src/cli.ts`: command router with `--help`, `--version`, `--format`
1376
+ - [ ] **DD12 resolved: psql execution model** — implement psql shell-out wrapper: invoke psql with `ON_ERROR_STOP=1`, `--no-psqlrc`, `-v` for `--set` variables, parse stderr for errors, handle exit codes
1377
+
1378
+ ### Phase 1 — Sqitch parity (Sprint 2–4, ~3 weeks)
1379
+
1380
+ After this phase: drop-in replacement for all Sqitch commands.
1381
+
1382
+ **Sprint 2 — plan + tracking**
1383
+ - [ ] `src/plan/parser.ts`: full sqitch.plan format parser (pragmas, reworked changes, `@tag` refs, cross-project deps)
1384
+ - [ ] `src/plan/writer.ts`: sqitch.plan writer (append-only for `add`, rework support)
1385
+ - [ ] `src/plan/types.ts`: Change, Tag, Dependency, Project types
1386
+ - [ ] Change ID computation: SHA-1 algorithm matching Sqitch byte-for-byte
1387
+ - [ ] `src/db/registry.ts`: read/write `sqitch.changes`, `sqitch.events`, `sqitch.tags`, `sqitch.projects`, `sqitch.dependencies`
1388
+ - [ ] `sqlever init`: creates sqitch.conf, sqitch.plan, deploy/revert/verify dirs
1389
+ - [ ] `sqlever add`: creates migration files, appends to plan (supports `--no-transaction`, `--conflict`)
1390
+ - [ ] Tests: plan round-trip, init, add, change ID verification against Sqitch
1391
+
1392
+ **Sprint 3 — deploy + revert**
1393
+ - [ ] `src/commands/deploy.ts`: topological sort, execute deploy scripts, update tracking
1394
+ - [ ] Advisory lock acquisition at deploy start: `pg_try_advisory_lock(<lock_key>)` (non-blocking default) or `pg_advisory_lock(<lock_key>)` with `lock_timeout` (wait mode) — session-level, released on completion or disconnect. Lock key: application-computed stable hash (not `hashtext()`)
1395
+ - [ ] Non-transactional change support (execute without BEGIN/COMMIT, track separately)
1396
+ - [ ] `src/commands/revert.ts`: execute revert scripts in reverse order, update tracking
1397
+ - [ ] `--to <change>` flag for both
1398
+ - [ ] `--dry-run` flag
1399
+ - [ ] `--mode [all|change|tag]` flag with correct transaction boundaries
1400
+ - [ ] `--log-only` flag (record as deployed without executing)
1401
+ - [ ] `--set` variable substitution
1402
+ - [ ] Deploy connection session settings (application_name, statement_timeout=0, idle_in_transaction_session_timeout=10min, non_transactional_statement_timeout=4h)
1403
+ - [ ] Lock timeout guard: auto-prepend `SET lock_timeout` before risky DDL (configurable, enabled by default)
1404
+ - [ ] Conflict dependency checking
1405
+ - [ ] Partial deploy / revert with dependency validation
1406
+ - [ ] Tests: deploy/revert, partial, dry-run, failed deploy recovery, advisory locks, non-transactional deploy
1407
+
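The first Sprint 3 item calls for a topological sort of changes before executing deploy scripts. A minimal sketch of that ordering step using Kahn's algorithm follows. The `PlanChange` shape and the `deployOrder` name are hypothetical, not sqlever's actual types, and reworked changes (which repeat a name) are ignored here.

```typescript
// Hypothetical shape; the real types would live in src/plan/types.ts.
interface PlanChange {
  name: string;
  requires: string[]; // names of changes that must be deployed first
}

// Kahn's algorithm. Among changes whose dependencies are already satisfied,
// the one that appears earliest in the plan is emitted first, so the result
// is deterministic.
function deployOrder(changes: PlanChange[]): string[] {
  const position = new Map<string, number>();
  changes.forEach((c, i) => position.set(c.name, i));

  const unmet = new Map<string, number>(); // change -> unmet dependency count
  const dependents = new Map<string, string[]>(); // change -> changes requiring it
  for (const c of changes) {
    unmet.set(c.name, c.requires.length);
    for (const dep of c.requires) {
      if (!position.has(dep)) {
        throw new Error(`${c.name} requires unknown change ${dep}`);
      }
      dependents.set(dep, [...(dependents.get(dep) ?? []), c.name]);
    }
  }

  const ready = changes.filter((c) => c.requires.length === 0).map((c) => c.name);
  const order: string[] = [];
  while (ready.length > 0) {
    // Pick the ready change that comes first in the plan.
    ready.sort((a, b) => position.get(a)! - position.get(b)!);
    const next = ready.shift()!;
    order.push(next);
    for (const child of dependents.get(next) ?? []) {
      const left = unmet.get(child)! - 1;
      unmet.set(child, left);
      if (left === 0) ready.push(child);
    }
  }

  // Any change never emitted is part of (or blocked by) a cycle.
  if (order.length !== changes.length) {
    throw new Error("dependency cycle detected in plan");
  }
  return order;
}
```

Reverting would walk the same order backwards. Conflict dependencies and cross-project dependencies, both listed above, would need extra checks that this sketch leaves out.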
1408
+ **Sprint 4 — verify + status + log + remaining commands**
1409
+ - [ ] `sqlever verify`: run verify scripts, report pass/fail per change
1410
+ - [ ] `sqlever status`: pending count, deployed count, last deployed, target info, modified script detection
1411
+ - [ ] `sqlever log`: deployment history with timestamps and committers
1412
+ - [ ] `sqlever tag`: create tag at current deployment state
1413
+ - [ ] `sqlever rework`: create reworked version of existing change
1414
+ - [ ] `sqlever rebase`: convenience command that reverts, then redeploys
1415
+ - [ ] `sqlever bundle`: package project for distribution
1416
+ - [ ] `sqlever checkout`: deploy/revert to match VCS branch
1417
+ - [ ] `sqlever show`: display change/tag details
1418
+ - [ ] `sqlever plan`: display plan contents
1419
+ - [ ] `sqlever upgrade`: upgrade registry schema
1420
+ - [ ] `sqlever engine`, `sqlever target`, `sqlever config`: manage configuration
1421
+ - [ ] Compatibility test: adopt existing Sqitch project, verify parity
1422
+ - [ ] Reverse handoff test: deploy with sqlever, verify Sqitch reads tracking tables correctly
1423
+ - [ ] Tests: verify, status, log, tag, rework, all compat tests
1424
+
1425
+ ### Phase 2 — static analysis (Sprint 5–6, ~2 weeks)
1426
+
1427
+ **Sprint 5 — analysis engine**
1428
+ - [ ] Integrate `pgsql-parser` (npm): parse SQL to AST (or WASM alternative if Phase 0 spike failed)
1429
+ - [ ] `src/analysis/preprocess.ts`: psql metacommand pre-processing (`\i`/`\ir` resolution, `\set` handling, strip unsupported metacommands with warning)
1430
+ - [ ] `src/analysis/index.ts`: analyzer entry point, rule registry, static/connected rule distinction
1431
+ - [ ] `src/analysis/reporter.ts`: text / json / github-annotations / gitlab-codequality output
1432
+ - [ ] `sqlever analyze <file>`: standalone analysis command (works without sqitch.plan)
1433
+ - [ ] `sqlever analyze` (no args): analyze pending migrations
1434
+ - [ ] `sqlever analyze --changed`: analyze git-changed files
1435
+ - [ ] Inline suppression: `-- sqlever:disable SA010` comment syntax
1436
+ - [ ] Per-file overrides in `sqlever.toml`
1437
+ - [ ] Analysis integrated into `sqlever deploy` (pre-deploy, blocks on error)
1438
+ - [ ] `--force` flag to bypass analysis errors
1439
+ - [ ] `--format github-annotations` for CI
1440
+ - [ ] Rules SA001–SA010 (column safety, index, drop, DML)
1441
+ - [ ] PL/pgSQL body exclusion for SA008/SA010/SA011
1442
+
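The inline suppression item in Sprint 5 names the comment syntax `-- sqlever:disable SA010`. A minimal sketch of how such comments might be collected per line is shown below. Accepting several rule IDs in one comment and the `findSuppressions` name are assumptions, and the plan does not say whether a comment applies to its own line, the next statement, or the whole file.

```typescript
// Matches a SQL line comment such as `-- sqlever:disable SA010`.
// Allowing several comma- or space-separated rule IDs is an assumption.
// The optional letter suffix covers IDs like SA002b.
const DISABLE_RE = /--\s*sqlever:disable\s+((?:SA\d{3}[a-z]?\b[\s,]*)+)/;

// Returns a map from 1-based line number to the rule IDs disabled on that line.
function findSuppressions(sql: string): Map<number, string[]> {
  const result = new Map<number, string[]>();
  sql.split("\n").forEach((line, index) => {
    const match = DISABLE_RE.exec(line);
    if (match === null) return;
    const rules = match[1].match(/SA\d{3}[a-z]?\b/g) ?? [];
    result.set(index + 1, rules);
  });
  return result;
}
```

A regex scan like this would also match the marker inside a string literal. Since the analyzer already parses SQL to an AST, a real implementation would more likely read comments from the parser or a tokenizer.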
1443
+ **Sprint 6 — remaining rules + version awareness**
1444
+ - [ ] Rules SA011–SA021 (connected rules, sequence, lock timeout, rename, constraints, concurrency)
1445
+ - [ ] PG version awareness in rules (SA002/SA002b volatility + version, SA017 PG 12+ CHECK pattern, SA019 PG 12+ REINDEX CONCURRENTLY)
1446
+ - [ ] `sqlever.toml` `[analysis]` config: skip, error_on_warn, max_affected_rows, pg_version
1447
+ - [ ] Tests: all rules trigger/no-trigger, version-aware behavior, inline suppression, PL/pgSQL exclusion
1448
+ - [ ] Exit code 2 when analysis blocks deploy
1449
+
1450
+ ### Phase 3 — snapshot includes (Sprint 7, ~1 week)
1451
+
1452
+ - [ ] `src/includes/snapshot.ts`: resolve `\i`/`\ir` from git history
1453
+ - [ ] Parse migration SQL for `\i` / `\ir` directives before deploy
1454
+ - [ ] `git show <commit>:path` for each include, using migration's commit from plan
1455
+ - [ ] Fallback to HEAD when no git repo or file has no history
1456
+ - [ ] `--no-snapshot` flag to disable
1457
+ - [ ] Tests: snapshot resolution, fallback, no-git fallback
1458
+
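Phase 3 resolves `\i` / `\ir` directives by reading each included file from git history with `git show <commit>:path`. A minimal sketch of the parsing half follows, under stated assumptions: `findIncludes` and `gitShowArgs` are hypothetical names, quoted paths are not handled, and how the commit is looked up for a migration is left out.

```typescript
interface IncludeDirective {
  kind: "i" | "ir"; // \i resolves against the working directory, \ir against the including script
  path: string;     // path exactly as written in the script
  line: number;     // 1-based line number in the migration script
}

// Finds psql \i and \ir metacommands that start a line.
// Quoted paths and paths containing spaces are not handled in this sketch.
function findIncludes(sql: string): IncludeDirective[] {
  const found: IncludeDirective[] = [];
  sql.split("\n").forEach((text, index) => {
    const match = /^\s*\\(ir|i)\s+(\S+)/.exec(text);
    if (match !== null) {
      found.push({ kind: match[1] as "i" | "ir", path: match[2], line: index + 1 });
    }
  });
  return found;
}

// Builds the argument list for `git show <commit>:<path>`.
// The path must already be relative to the repository root.
function gitShowArgs(commit: string, repoPath: string): string[] {
  return ["show", `${commit}:${repoPath}`];
}
```

For `\ir`, the written path would first have to be resolved against the including script's directory and then made repository-relative before it is passed to `gitShowArgs`.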
1459
+ ### Phase 4 — TUI + CI polish (Sprint 8, ~1 week)
1460
+
1461
+ - [ ] `src/tui/deploy.ts`: live deploy dashboard (TTY detection, plain fallback)
1462
+ - [ ] `--format json` on all commands: structured output
1463
+ - [ ] `sqlever diff`: schema diff between deployed state and plan
1464
+ - [ ] PgBouncer detection and warning (DD13)
1465
+ - [ ] Man page generation
1466
+ - [ ] Homebrew formula
1467
+
1468
+ ### Phase 5 — expand/contract (Sprint 9–10, ~2 weeks)
1469
+
1470
+ - [ ] `src/expand-contract/generator.ts`: generate expand + contract migration pair from column rename/type change
1471
+ - [ ] `src/expand-contract/tracker.ts`: track phase state in Postgres (new table in `sqlever.*` schema)
1472
+ - [ ] `sqlever add --expand`: generate linked pair
1473
+ - [ ] `sqlever deploy --phase expand|contract`
1474
+ - [ ] Trigger generation for old↔new column sync (with `pg_trigger_depth()` + `TG_NAME LIKE 'sqlever_sync_%'` recursion guard)
1475
+ - [ ] Partitioned table detection: install sync triggers on the parent table (row-level triggers created on a partitioned parent are cloned to its partitions, which every PG 14+ target supports)
1476
+ - [ ] View shim for backward compat during transition
1477
+ - [ ] Backfill verification before contract phase
1478
+ - [ ] Advisory lock-based concurrency control for phase transitions
1479
+ - [ ] Tests: full expand/contract cycle, rollback, trigger correctness, partitioned tables
1480
+
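Both the deploy lock in Sprint 3 and the phase-transition lock above need a lock key, and the plan asks for an application-computed stable hash rather than `hashtext()`. The plan does not name an algorithm, so the sketch below assumes FNV-1a (32-bit) purely for illustration. `fnv1a32`, `advisoryLockKey`, and the `sqlever:<scope>:<project>` key string are hypothetical.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units (identical to byte-wise
// FNV-1a for ASCII input). Returns a signed 32-bit integer.
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, multiplied modulo 2^32
  }
  return hash | 0;
}

// Derives a key for pg_try_advisory_lock(key bigint) from the project name
// and a scope label. The key string format is an assumption of this sketch.
function advisoryLockKey(project: string, scope: "deploy" | "phase"): number {
  return fnv1a32(`sqlever:${scope}:${project}`);
}

// The key would then be bound as a query parameter, for example:
//   SELECT pg_try_advisory_lock($1)
```

Because the hash is computed in the application, the same project always maps to the same key regardless of server version. A 32-bit value uses only part of the `bigint` key space, so a real implementation might prefer a 64-bit hash to reduce collisions with other advisory-lock users.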
1481
+ ### Phase 6 — batched background DML (Sprint 11–13, ~3 weeks)
1482
+
1483
+ - [ ] `src/batch/queue.ts`: 3-partition rotating queue schema (PGQ-inspired), DDL migration, partition rotation logic
1484
+ - [ ] `src/batch/worker.ts`: batch execution loop, lock_timeout + statement_timeout per batch, sleep, retry
1485
+ - [ ] `src/batch/progress.ts`: row counting (last processed PK tracking), ETA, dead tuple monitoring
1486
+ - [ ] Replication lag monitoring: query `pg_stat_replication`, pause when lag exceeds threshold
1487
+ - [ ] `sqlever batch add`: register job, create queue entry
1488
+ - [ ] `sqlever batch list`: show all jobs and state
1489
+ - [ ] `sqlever batch status <job>`: progress, ETA, recent errors
1490
+ - [ ] `sqlever batch pause|resume|cancel <job>`
1491
+ - [ ] `sqlever batch retry <job>`: manual retry of dead jobs, resume from last processed PK
1492
+ - [ ] Connection management: direct connection required, SET re-issued per batch
1493
+ - [ ] Tests: full job lifecycle, pause/resume, retry on failure, max retries → dead → manual retry, replication lag pause
1494
+
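The Phase 6 worker combines several rules from the list above: resume from the last processed PK, retry failed batches up to a limit before marking the job dead, and pause while replication lag exceeds a threshold. A minimal sketch of that decision logic as a pure function follows. All type and field names are hypothetical, an integer primary key is assumed, and the order in which the checks run is a design guess, not something the plan specifies.

```typescript
type BatchAction =
  | { kind: "run"; afterPk: number | null } // process the next batch with pk > afterPk
  | { kind: "pause"; reason: string }
  | { kind: "dead" }                        // needs a manual `sqlever batch retry`
  | { kind: "done" };

interface JobSnapshot {
  lastProcessedPk: number | null; // null before the first batch
  maxPk: number;                  // upper bound captured when the job was registered
  failedAttempts: number;         // consecutive failures of the current batch
  replicationLagBytes: number;    // e.g. derived from pg_stat_replication
}

interface BatchLimits {
  maxRetries: number;
  maxLagBytes: number;
}

// Decides what the worker loop should do next for one job.
function nextAction(job: JobSnapshot, limits: BatchLimits): BatchAction {
  if (job.lastProcessedPk !== null && job.lastProcessedPk >= job.maxPk) {
    return { kind: "done" };
  }
  if (job.failedAttempts > limits.maxRetries) {
    return { kind: "dead" };
  }
  if (job.replicationLagBytes > limits.maxLagBytes) {
    return { kind: "pause", reason: "replication lag above threshold" };
  }
  return { kind: "run", afterPk: job.lastProcessedPk };
}
```

Keeping the decision separate from the database calls makes the lifecycle cases in the test item (pause/resume, max retries to dead, manual retry) checkable without a running Postgres instance.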
1495
+ ### Phase 7 — AI + DBLab (Sprint 14–16, ~3 weeks)
1496
+
1497
+ - [ ] `src/ai/explain.ts`: LLM-powered migration explainer (OpenAI/Anthropic/Ollama)
1498
+ - [ ] `sqlever explain <migration>`: plain-English + risk summary
1499
+ - [ ] `sqlever review`: Markdown PR comment with analysis results + explanation
1500
+ - [ ] `sqlever suggest-revert <migration>`: LLM-assisted revert generation
1501
+ - [ ] DBLab integration: `--dblab-url`, `--dblab-token` flags on `deploy`
1502
+ - [ ] Clone provisioning: run deploy + verify + revert on the clone and report results before touching production
1503
+ - [ ] Tests: mock LLM responses, DBLab API mock
1504
+
1505
+ ---
1506
+
1507
+ ## Prior art summary
1508
+
1509
+ | Tool | What we take | What we don't |
1510
+ |------|-------------|---------------|
1511
+ | Sqitch | CLI interface, plan format, tracking schema | Perl runtime, multi-DB |
1512
+ | pgroll | Expand/contract pattern concept | Full table recreation |
1513
+ | migrationpilot | Dangerous pattern detection ideas | Thin implementation |
1514
+ | GitLab migration helpers | Batched DML design, throttling, retry, replication lag monitoring | Rails dependency |
1515
+ | SkyTools PGQ | 3-partition queue architecture | External daemon |
1516
+ | pg_index_pilot | Write-ahead tracking, advisory lock patterns, invalid index cleanup | PL/pgSQL-only, dblink architecture |
1517
+ | Flyway / Liquibase | Nothing | Wrong philosophy |
1518
+
1519
+ ---
1520
+
1521
+ *This spec is a living document. Update it before writing code.*