pcrd 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (72)
  1. checksums.yaml +7 -0
  2. data/CHANGELOG.md +24 -0
  3. data/LICENSE +21 -0
  4. data/README.md +614 -0
  5. data/bin/pcrd +7 -0
  6. data/lib/pcrd/advisory_lock.rb +50 -0
  7. data/lib/pcrd/apply/engine.rb +184 -0
  8. data/lib/pcrd/apply/worker.rb +97 -0
  9. data/lib/pcrd/backfill/batch.rb +158 -0
  10. data/lib/pcrd/backfill/engine.rb +153 -0
  11. data/lib/pcrd/checkpoint/store.rb +217 -0
  12. data/lib/pcrd/cli.rb +274 -0
  13. data/lib/pcrd/commands/analyze.rb +125 -0
  14. data/lib/pcrd/commands/cleanup.rb +112 -0
  15. data/lib/pcrd/commands/demo.rb +152 -0
  16. data/lib/pcrd/commands/readiness.rb +30 -0
  17. data/lib/pcrd/commands/status.rb +129 -0
  18. data/lib/pcrd/commands/verify.rb +172 -0
  19. data/lib/pcrd/config/add_column.rb +7 -0
  20. data/lib/pcrd/config/analyze_config.rb +8 -0
  21. data/lib/pcrd/config/column_spec.rb +10 -0
  22. data/lib/pcrd/config/connection.rb +7 -0
  23. data/lib/pcrd/config/cutover_config.rb +7 -0
  24. data/lib/pcrd/config/load_error.rb +7 -0
  25. data/lib/pcrd/config/loader.rb +158 -0
  26. data/lib/pcrd/config/migrate_config.rb +21 -0
  27. data/lib/pcrd/config/root.rb +9 -0
  28. data/lib/pcrd/config/schema.rb +62 -0
  29. data/lib/pcrd/config/table.rb +9 -0
  30. data/lib/pcrd/config/verify_config.rb +7 -0
  31. data/lib/pcrd/config.rb +7 -0
  32. data/lib/pcrd/connection/client.rb +129 -0
  33. data/lib/pcrd/connection/error.rb +7 -0
  34. data/lib/pcrd/connection/replication.rb +108 -0
  35. data/lib/pcrd/cutover/orchestrator.rb +108 -0
  36. data/lib/pcrd/cutover/sequences.rb +138 -0
  37. data/lib/pcrd/demo/generator.rb +214 -0
  38. data/lib/pcrd/demo/schema.rb +154 -0
  39. data/lib/pcrd/error.rb +12 -0
  40. data/lib/pcrd/migration/orchestrator.rb +272 -0
  41. data/lib/pcrd/monitor/lag.rb +107 -0
  42. data/lib/pcrd/options.rb +15 -0
  43. data/lib/pcrd/output/analyze_printer.rb +173 -0
  44. data/lib/pcrd/output/cutover_printer.rb +128 -0
  45. data/lib/pcrd/output/preflight_printer.rb +119 -0
  46. data/lib/pcrd/output/readiness_printer.rb +72 -0
  47. data/lib/pcrd/preflight.rb +331 -0
  48. data/lib/pcrd/readiness/manifest.rb +201 -0
  49. data/lib/pcrd/replication/consumer.rb +235 -0
  50. data/lib/pcrd/replication/error.rb +10 -0
  51. data/lib/pcrd/replication/pgoutput/messages.rb +68 -0
  52. data/lib/pcrd/replication/pgoutput/parser.rb +316 -0
  53. data/lib/pcrd/reporter/console.rb +46 -0
  54. data/lib/pcrd/reporter/null.rb +14 -0
  55. data/lib/pcrd/schema/column.rb +59 -0
  56. data/lib/pcrd/schema/ddl.rb +71 -0
  57. data/lib/pcrd/schema/diff_entry.rb +36 -0
  58. data/lib/pcrd/schema/differ.rb +175 -0
  59. data/lib/pcrd/schema/object_reader.rb +187 -0
  60. data/lib/pcrd/schema/packer.rb +90 -0
  61. data/lib/pcrd/schema/reader.rb +118 -0
  62. data/lib/pcrd/schema/setup.rb +143 -0
  63. data/lib/pcrd/schema/setup_error.rb +9 -0
  64. data/lib/pcrd/schema/table_not_found.rb +8 -0
  65. data/lib/pcrd/schema/type_registry.rb +116 -0
  66. data/lib/pcrd/sql.rb +55 -0
  67. data/lib/pcrd/transform/row_transformer.rb +69 -0
  68. data/lib/pcrd/transform/type_map.rb +209 -0
  69. data/lib/pcrd/transform/validator.rb +106 -0
  70. data/lib/pcrd/version.rb +5 -0
  71. data/lib/pcrd.rb +11 -0
  72. metadata +231 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 974b78ca232b53584a2aba4742b46dd09c0842701f05649775d817cbef711c09
4
+ data.tar.gz: 8cae02d6f8e799c2d86a383f1b7b27cab7af7f8eb15d41cc402d36683db5b367
5
+ SHA512:
6
+ metadata.gz: e3777c72cf2c9fd9d2accce587906a9425fac44a8cfd7eb7f00f2a7b96814663e11869e0daab62a533270ce3f4d1e8ca4d837ae290d322a3e7fe37ea6122fd84
7
+ data.tar.gz: 25206ce31d6bd995298debbf6109e922d88993df78a2e54d2104f22431f5e065a0edba722b29e6822f01ab14f7ae6c1422148fc969b44d92e2472321edd21090
data/CHANGELOG.md ADDED
@@ -0,0 +1,24 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased]
9
+
10
+ ## [0.1.0] - 2026-06-10
11
+
12
+ ### Added
13
+
14
+ - Initial release.
15
+ - Zero-downtime cross-cluster PostgreSQL migrations via logical replication.
16
+ - Column type changes, renames, additions, drops, and reordering with padding optimization.
17
+ - Preflight validation (connections, WAL level, type cast safety, PK existence).
18
+ - Keyset-paginated, checkpointed `COPY` backfill engine.
19
+ - Concurrent WAL streaming + apply engine with TOAST/TRUNCATE handling.
20
+ - Replication lag monitoring and brief-lock cutover orchestration.
21
+ - `pcrd` CLI built on Thor.
22
+
23
+ [Unreleased]: https://github.com/charlesharris/pcrd/compare/v0.1.0...HEAD
24
+ [0.1.0]: https://github.com/charlesharris/pcrd/releases/tag/v0.1.0
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Charles Harris
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,614 @@
1
+ # pcrd — PostgreSQL Column Rewrite Daemon
2
+
3
+ ![Captain Picard happy dance](https://media.tenor.com/05Jm-dslzLgAAAAM/happy-dance-star-trek.gif)
4
+
5
+ !!! NOTE - This is not a production-ready project... use at your own risk, YMMV, etc.
6
+
7
+ Zero-downtime cross-cluster PostgreSQL migrations using logical replication.
8
+
9
+ pcrd migrates large tables to a new PostgreSQL cluster with column type changes, renames, additions, drops, and column reordering — without locking your source database for more than a few seconds at cutover.
10
+
11
+ ---
12
+
13
+ ## The Problem
14
+
15
+ `ALTER TABLE t ALTER COLUMN id TYPE bigint` on a 500M-row table acquires an `AccessExclusiveLock` and rewrites every row while holding it. That means minutes to hours of complete read/write blackout — unacceptable in production.
16
+
17
+ pcrd solves this by building the new schema on a separate cluster using logical replication, streaming all changes from source to target continuously, and then cutting over with only a brief maintenance window (seconds, not hours).
18
+
19
+ ---
20
+
21
+ ## How It Works
22
+
23
+ ```
24
+ Source cluster         pcrd                       Target cluster
25
+ ──────────────         ────                       ──────────────
26
+ live table ─WAL──────► WAL consumer               new schema table
27
+ (old types)            type transformer ──►       (new types)
28
+      │     ─bulk─────► backfill engine  ──►
29
+      │                 lag monitor
30
+      │                 cutover (brief lock)
31
+
32
+ App ──── DATABASE_URL ──────────────────────────► switch here
33
+ ```
34
+
35
+ **Phases:**
36
+ 1. **Preflight** — validate connections, WAL level, type cast safety, PK existence
37
+ 2. **Setup** — create publication + replication slot on source; DDL on target
38
+ 3. **Backfill** — bulk copy existing rows via keyset-paginated `COPY`, checkpointed
39
+ 4. **Streaming** — consume WAL events, transform, apply to target concurrently with backfill
40
+ 5. **Catchup** — monitor replication lag; display live lag meter
41
+ 6. **Cutover** — operator-triggered; drain lag to zero, advance sequences, signal ready
42
+ 7. **Verify** — row count + spot-check comparison
43
+ 8. **Cleanup** — drop replication slot, publication, archive source tables
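The Backfill phase pairs keyset pagination with a per-batch checkpoint. Below is a minimal Ruby sketch of that pattern, assuming an integer primary key; `batch_copy_sql`, `backfill`, and the Hash used as a checkpoint are hypothetical names, not pcrd's API.

```ruby
# Hypothetical sketch, not pcrd's implementation: keyset-paginated batches
# with a resumable checkpoint. Assumes an integer primary key.

# Builds the COPY statement for one batch. Filtering on the last key seen
# (instead of OFFSET) lets PostgreSQL use the primary-key index to jump
# straight to the start of the next batch.
def batch_copy_sql(table, pk, last_pk, batch_size)
  where = last_pk.nil? ? "" : " WHERE #{pk} > #{Integer(last_pk)}"
  "COPY (SELECT * FROM #{table}#{where} ORDER BY #{pk} " \
    "LIMIT #{Integer(batch_size)}) TO STDOUT"
end

# Simulates the batch loop over a sorted array of primary keys.
# `checkpoint` is a plain Hash standing in for the SQLite checkpoint store.
def backfill(sorted_pks, batch_size, checkpoint)
  copied = []
  loop do
    last = checkpoint[:last_pk]
    batch = sorted_pks.select { |id| last.nil? || id > last }.first(batch_size)
    break if batch.empty?

    copied.concat(batch)              # stand-in for streaming COPY output to the target
    checkpoint[:last_pk] = batch.last # recorded only after the batch has been applied
  end
  copied
end
```

Calling `backfill` again with a checkpoint that already holds `last_pk` skips every recorded batch, which mirrors what `--resume` is described as doing.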
44
+
45
+ ---
46
+
47
+ ## Features
48
+
49
+ - **No long schema lock** — the source database runs normally throughout; the only lock is a brief one at cutover
50
+ - **Cross-cluster** — source and target are separate PostgreSQL servers; works for version upgrades, cloud provider migrations, hardware changes
51
+ - **Type transformation** — widening casts (int→bigint, varchar→text, timestamp→timestamptz) are automatic; narrowing casts require an explicit pre-migration data validation pass
52
+ - **Column padding optimizer** — analyzes column alignment and estimates space savings from reordering; integrated into the migration flow
53
+ - **Resumable** — SQLite checkpoint stores per-batch progress; `pcrd migrate --resume` picks up from the last completed batch
54
+ - **No source extensions required** — uses PostgreSQL's built-in `pgoutput` logical replication (PG 10+)
55
+
56
+ ---
57
+
58
+ ## Requirements
59
+
60
+ - Ruby 3.2+
61
+ - PostgreSQL 10+ on source (with `wal_level = logical`)
62
+ - PostgreSQL 10+ on target
63
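+ - Source user must have the `REPLICATION` attribute, `SELECT` on migrated tables, and the privileges to create a publication (`CREATE` on the database plus ownership of the published tables, or superuser)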
64
+
65
+ ---
66
+
67
+ ## Installation
68
+
69
+ **From RubyGems** (once published):
70
+ ```bash
71
+ gem install pcrd
72
+ ```
73
+
74
+ **From source:**
75
+ ```bash
76
+ git clone https://github.com/charris/pcrd
77
+ cd pcrd
78
+ bundle install
79
+ gem build pcrd.gemspec
80
+ gem install pcrd-0.1.0.gem
81
+ pcrd --version
82
+ ```
83
+
84
+ > `bundle install` installs dependencies only — it does not put `pcrd` on your PATH. The `gem build` + `gem install` steps are required. If you just want to run pcrd without installing, use `bundle exec bin/pcrd` from the repo root.
85
+
86
+ ---
87
+
88
+ ## Quick Start
89
+
90
+ ### 1. Start the demo environment
91
+
92
+ ```bash
93
+ docker compose -f dev/docker-compose.yml up -d
94
+ ```
95
+
96
+ This starts two PostgreSQL 16 containers:
97
+ - **source_db** on port 5433 (with `wal_level=logical`)
98
+ - **target_db** on port 5434
99
+
100
+ ### 2. Create the demo schema and data
101
+
102
+ ```bash
103
+ # Create tables on source (intentionally poor column ordering for demo)
104
+ pcrd demo setup
105
+
106
+ # Seed with 50,000 rows (users → agents → listings)
107
+ pcrd demo seed --rows 50000
108
+ ```
109
+
110
+ ### 3. Analyze column padding
111
+
112
+ ```bash
113
+ # Shows current column layout and how much space can be saved by reordering
114
+ pcrd analyze
115
+
116
+ # Compare source vs. proposed target schema side-by-side
117
+ pcrd analyze --compare-target
118
+ ```
119
+
120
+ ### 4. Run the migration
121
+
122
+ ```bash
123
+ # Check everything looks right first
124
+ pcrd migrate --preflight-only
125
+
126
+ # Run the full migration (backfill + streaming)
127
+ pcrd migrate --yes
128
+
129
+ # Or backfill only (no WAL streaming)
130
+ pcrd migrate --backfill-only --yes
131
+ ```
132
+
133
+ ### 5. Cut over
134
+
135
+ ```bash
136
+ # Once lag is near zero, put the app in maintenance mode, then:
137
+ pcrd cutover --maintenance-confirmed
138
+ ```
139
+
140
+ ---
141
+
142
+ ## Configuration
143
+
144
+ pcrd looks for `pcrd.config.yml` in the current directory by default. Pass `--config path/to/file.yml` to override. Run `pcrd demo setup` to generate a sample file automatically.
145
+
146
+ **Full reference:** [docs/config_reference.md](https://github.com/charlesharris/pcrd/blob/main/docs/config_reference.md)
147
+
148
+ ### The most important rule: only specify what changes
149
+
150
+ The `columns:` map under each table only needs entries for columns you want to **modify**. Any column not listed is migrated automatically — same name, same type, same `NOT NULL`, same `DEFAULT`. You only need a column entry if you want to change its type, rename it, or drop it.
151
+
152
+ For a 20-column table where only `id` needs widening:
153
+
154
+ ```yaml
155
+ migrate:
156
+   tables:
157
+     - name: orders
158
+       columns:
159
+         id:
160
+           type: bigint # only this one column needs to be listed
161
+ ```
162
+
163
+ The other 19 columns require no configuration.
164
+
165
+ ### Annotated config example
166
+
167
+ ```yaml
168
+ # pcrd.config.yml
169
+
170
+ source:
171
+   host: db-primary.old.example.com
172
+   port: 5432 # default: 5432
173
+   database: myapp_production
174
+   user: pcrd_replication
175
+   # password: via PCRD_SOURCE_PASSWORD env var or ~/.pgpass
176
+
177
+ target:
178
+   host: db-primary.new.example.com
179
+   port: 5432
180
+   database: myapp_production # same database name on both clusters
181
+   user: pcrd_writer
182
+   # password: via PCRD_TARGET_PASSWORD env var or ~/.pgpass
183
+
184
+ migrate:
185
+   # replication_slot and publication are auto-derived from the first table
186
+   # name if not set. Set explicitly when running multiple migrations.
187
+   replication_slot: pcrd_listings_v2 # optional
188
+   publication: pcrd_pub_v2 # optional
189
+
190
+   batch_size: 10_000 # rows per backfill batch; default 10,000
191
+   lag_threshold_bytes: 1_048_576 # 1 MB — "ready for cutover" threshold
192
+   checkpoint_db: ./pcrd_checkpoint.sqlite3 # per-batch progress; enables --resume
193
+
194
+   tables:
195
+     - name: listings
196
+
197
+       # Reorder columns for minimal alignment padding (free — pcrd rewrites
198
+       # the table anyway). Run `pcrd analyze` first to see the savings.
199
+       optimize_column_order: true
200
+
201
+       # Only specify columns you want to change.
202
+       # Every other column is copied as-is (same name, type, constraints).
203
+       columns:
204
+         id:
205
+           type: bigint # widen integer → bigint (always safe, no validation)
206
+
207
+         list_price:
208
+           type: numeric(18,4) # widen numeric precision (always safe)
209
+           rename: list_price_precise # rename in the same step
210
+
211
+         status_code:
212
+           rename: listing_status # rename only, keep the same type
213
+
214
+         legacy_notes:
215
+           drop: true # exclude this column from the target entirely
216
+
217
+         # Not listed = copied as-is:
218
+         # active, bedrooms, bathrooms, created_at, latitude, longitude, ...
219
+
220
+       # New columns to add (not present on source).
221
+       # Backfilled rows get the DEFAULT value; NULL if no default specified.
222
+       add_columns:
223
+         - name: updated_at
224
+           type: timestamptz
225
+           default: "now()" # SQL expression evaluated by PostgreSQL
226
+
227
+     - name: users
228
+       columns:
229
+         id:
230
+           type: bigint # all other user columns copied unchanged
231
+
232
+ # Tables to include in `pcrd analyze` output.
233
+ # If omitted, analyzes all tables listed in migrate.tables.
234
+ analyze:
235
+   tables:
236
+     - listings
237
+     - users
238
+
239
+ # Spot-check settings for `pcrd verify`.
240
+ verify:
241
+   sample_size: 1_000 # random rows to compare field-by-field
242
+
243
+ # Cutover behavior.
244
+ cutover:
245
+   sequence_buffer: 1_000 # added to max(id) when setting target sequences
246
+   lag_drain_timeout: 300 # seconds to wait for lag → zero during cutover
247
+ ```
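`lag_threshold_bytes` compares a byte lag against a threshold. One way to derive that lag is to subtract two WAL positions (`pg_lsn` values). The sketch below assumes the inputs are the source's `pg_current_wal_lsn()` and the slot's `confirmed_flush_lsn`; how pcrd itself measures lag is not shown here, and the method names are hypothetical.

```ruby
# Hypothetical sketch: compute replication lag in bytes from two pg_lsn
# strings and compare it with lag_threshold_bytes. Names are illustrative.

# A pg_lsn prints as "X/Y": the high and low 32 bits of a 64-bit WAL
# position, both in hexadecimal.
def lsn_to_i(lsn)
  high, low = lsn.split("/").map { |part| Integer(part, 16) }
  (high << 32) | low
end

# Bytes of WAL the slot has not yet confirmed; never negative.
def lag_bytes(current_wal_lsn, confirmed_flush_lsn)
  [lsn_to_i(current_wal_lsn) - lsn_to_i(confirmed_flush_lsn), 0].max
end

def ready_for_cutover?(lag, threshold_bytes = 1_048_576)
  lag <= threshold_bytes
end
```

PostgreSQL can also do the subtraction server-side with `pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)` against `pg_replication_slots`.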
248
+
249
+ **Passwords** — never put passwords in the config file. Use:
250
+ - `PCRD_SOURCE_PASSWORD` environment variable
251
+ - `PCRD_TARGET_PASSWORD` environment variable
252
+ - `~/.pgpass` (standard PostgreSQL password file)
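A sketch of that lookup order, assuming the config roles map onto the `PCRD_SOURCE_PASSWORD` / `PCRD_TARGET_PASSWORD` variables. `resolve_password` and `connection_params` are hypothetical helpers; the hash keys are libpq connection keywords. When no password is set, nothing is passed, and libpq consults `~/.pgpass` on its own.

```ruby
# Hypothetical sketch of the documented password sources; not pcrd's code.

# Returns the env var value when it is set and non-empty, otherwise nil.
def resolve_password(role, env = ENV)
  value = env["PCRD_#{role.to_s.upcase}_PASSWORD"]
  value.nil? || value.empty? ? nil : value
end

# Builds libpq-style keyword params from a config section. The password key
# is omitted entirely when unset, leaving libpq free to read ~/.pgpass.
def connection_params(cfg, role, env = ENV)
  params = {
    host: cfg[:host],
    port: cfg.fetch(:port, 5432),
    dbname: cfg[:database],
    user: cfg[:user]
  }
  password = resolve_password(role, env)
  params[:password] = password if password
  params
end
```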
253
+
254
+ ### Quick reference: column change options
255
+
256
+ | In `columns:` | Effect |
257
+ |---|---|
258
+ | `type: bigint` | Change type; keep name |
259
+ | `rename: new_name` | Rename; keep type |
260
+ | `type: bigint, rename: new_name` | Change type AND rename |
261
+ | `drop: true` | Exclude from target entirely |
262
+ | *(no entry)* | Copy exactly as-is |
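The rules in this table amount to a merge of the `columns:` map over the source column list. Below is a hypothetical Ruby sketch: the `Column` struct and both methods are illustrative, the DDL omits primary keys and other constraints, and the spec uses string keys as a YAML parser returns them.

```ruby
# Hypothetical sketch of applying a `columns:` map to the source columns.
Column = Struct.new(:name, :type, :not_null, :default, keyword_init: true)

def resolve_columns(source_columns, spec)
  source_columns.filter_map do |col|
    change = spec.fetch(col.name, {}) # no entry => copy exactly as-is
    next nil if change["drop"]        # drop: true => excluded from target
    Column.new(
      name: change.fetch("rename", col.name),
      type: change.fetch("type", col.type),
      not_null: col.not_null,         # NOT NULL and DEFAULT carry over
      default: col.default
    )
  end
end

# Renders a bare CREATE TABLE for the resolved columns.
def create_table_sql(table, columns)
  defs = columns.map do |c|
    parts = ["#{c.name} #{c.type}"]
    parts << "DEFAULT #{c.default}" if c.default
    parts << "NOT NULL" if c.not_null
    "  " + parts.join(" ")
  end
  "CREATE TABLE #{table} (\n#{defs.join(",\n")}\n);"
end
```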
263
+
264
+ ### Supported type changes
265
+
266
+ | Cast | Safety |
267
+ |---|---|
268
+ | `smallint/integer → bigint` | Always safe |
269
+ | `varchar(n) → text` | Always safe |
270
+ | `timestamp → timestamptz` | Always safe |
271
+ | `integer/bigint → numeric` | Always safe |
272
+ | `bigint → integer` | Validated (range check) |
273
+ | `text/varchar → varchar(n)` | Validated (length check) |
274
+ | `float8 → float4` | Validated (warn only) |
275
+ | `timestamptz → timestamp` | Validated (warn only — timezone lost) |
276
+
277
+ Run `pcrd migrate --preflight-only` to see the full safety report and generated DDL before committing.
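The cast table above can be encoded as a lookup. This sketch assumes short type names (`integer`, not `int4` or `character varying`) and strips type modifiers before the lookup; `CAST_RULES` and `cast_safety` are hypothetical, and pcrd's own type map may classify more pairs. An `:unknown` result corresponds to a cast that preflight would reject as not known.

```ruby
# Hypothetical sketch: the cast-safety table as data.
CAST_RULES = {
  %w[smallint bigint]       => :safe,
  %w[integer bigint]        => :safe,
  %w[varchar text]          => :safe,
  %w[timestamp timestamptz] => :safe,
  %w[integer numeric]       => :safe,
  %w[bigint numeric]        => :safe,
  %w[bigint integer]        => :validated, # range check
  %w[text varchar]          => :validated, # length check
  %w[varchar varchar]       => :validated, # length check when n changes
  %w[float8 float4]         => :warn,      # precision may be lost
  %w[timestamptz timestamp] => :warn       # time zone is dropped
}.freeze

# Strips a trailing type modifier: "varchar(255)" -> "varchar".
def base_type(type)
  type.sub(/\(.*\)\z/, "").strip
end

def cast_safety(from, to)
  return :identical if from == to
  CAST_RULES.fetch([base_type(from), base_type(to)], :unknown)
end
```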
278
+
279
+ ---
280
+
281
+ ## CLI Reference
282
+
283
+ ### `pcrd --version`
284
+
285
+ ```bash
286
+ pcrd --version # or: pcrd -v
287
+ # → pcrd 0.1.0
288
+ ```
289
+
290
+ ### `pcrd analyze`
291
+
292
+ Analyze column padding for source tables. Read-only.
293
+
294
+ ```bash
295
+ pcrd analyze [--config FILE] [--table TABLE] [--compare-target]
296
+ ```
297
+
298
+ - `--table TABLE` — analyze only this table (default: all tables in config)
299
+ - `--compare-target` — connect to target and show source vs. target side-by-side, including type changes, renames, added/dropped columns, and padding delta
300
+
301
+ **Example output:**
302
+ ```
303
+ Table: public.listings (50,000 rows)
304
+
305
+ Current layout:
306
+ ┌─────────────────────┬──────────────────┬───────┬──────────┬────────────────┐
307
+ │ Column              │ Type             │ Align │ Size     │ Padding before │
308
+ ├─────────────────────┼──────────────────┼───────┼──────────┼────────────────┤
309
+ │ id                  │ integer          │ 4B    │ 4        │ —              │
310
+ │ active              │ boolean          │ 1B    │ 1        │ —              │
311
+ │ listed_at           │ timestamp        │ 8B    │ 8        │ ← 3 wasted     │
312
+ ...
313
+
314
+ Padding analysis:
315
+ Current row overhead (fixed cols + padding): 104 bytes
316
+ Optimal row overhead (fixed cols only): 84 bytes
317
+ Wasted padding: 20 bytes/row (19.2%)
318
+ At 50,000 rows: ~1.0 MB reclaimed by reordering columns
319
+ ```
320
+
321
+ ---
322
+
323
+ ### `pcrd migrate`
324
+
325
+ Run the migration. Preflight → setup → backfill → streaming.
326
+
327
+ ```bash
328
+ pcrd migrate [--config FILE] [--preflight-only] [--backfill-only] [--dry-run]
329
+ [--resume] [--yes] [--force-overwrite]
330
+ ```
331
+
332
+ - `--preflight-only` — run all safety checks and print target DDL; do not start migration
333
+ - `--dry-run` — same as `--preflight-only`
334
+ - `--backfill-only` — copy existing rows only; do not start WAL streaming
335
+ - `--resume` — resume an interrupted migration from the last checkpoint
336
+ - `--yes` — skip the confirmation prompt
337
+ - `--force-overwrite` — drop and recreate target tables if they already exist
338
+
339
+ **Ctrl-C / SIGINT:** pcrd finishes the current batch or WAL event, writes the checkpoint, and exits cleanly, printing a `--resume` command you can copy. Nothing is lost.
340
+
341
+ ```
342
+ Migration interrupted. Resume with:
343
+ pcrd migrate --config migration.yml --resume
344
+ ```
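The interrupt behavior described above is the usual flag-and-check pattern: the signal handler only records the request, and the work loop honors it at a batch boundary after checkpointing. A minimal sketch; `BatchRunner` is a hypothetical class, and the checkpoint is a plain Hash.

```ruby
# Hypothetical sketch of "finish the current unit, checkpoint, exit".
class BatchRunner
  def initialize(batches, checkpoint)
    @batches = batches
    @checkpoint = checkpoint
    @stop_requested = false
  end

  # Safe to call from a trap handler: it only flips a flag.
  def request_stop
    @stop_requested = true
  end

  def install_sigint_handler
    Signal.trap("INT") { request_stop }
  end

  # Yields batches in order, skipping those already checkpointed. A stop
  # request is honored only after the current batch has been recorded.
  def run
    @batches.each_with_index do |batch, index|
      next if index <= @checkpoint.fetch(:last_batch, -1)
      yield batch
      @checkpoint[:last_batch] = index
      return :interrupted if @stop_requested
    end
    :completed
  end
end
```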
345
+
346
+ **Preflight checks performed:**
347
+ 1. Source and target connectivity
348
+ 2. `wal_level = logical` on source
349
+ 3. `max_replication_slots` headroom
350
+ 4. Source tables exist; row count estimate
351
+ 5. Primary key present on every migrated table (required for upsert semantics)
352
+ 6. Target tables do not already exist
353
+ 7. All spec column names exist on source; all type casts are known
354
+ 8. Data validation for validated casts (bigint→int range, text→varchar(n) length, etc.)
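Check 8 can be expressed as a count of rows that the narrowing cast would reject; a non-zero count means the cast would fail on the target. A sketch for the range and length cases, with `validation_sql` and `quote_ident` as hypothetical helpers. NULLs pass, since `NOT BETWEEN` and `>` yield NULL for them.

```ruby
# Hypothetical sketch of a data-validation query for a narrowing cast.
INT4_MIN = -2_147_483_648
INT4_MAX = 2_147_483_647

# Double-quotes an identifier, escaping embedded quotes.
def quote_ident(name)
  '"' + name.gsub('"', '""') + '"'
end

def validation_sql(table, column, to_type)
  col = quote_ident(column)
  predicate =
    case to_type
    when "integer"
      "#{col} NOT BETWEEN #{INT4_MIN} AND #{INT4_MAX}" # bigint -> integer
    when /\Avarchar\((\d+)\)\z/
      "char_length(#{col}) > #{Regexp.last_match(1)}"  # text -> varchar(n)
    else
      raise ArgumentError, "no validation rule for #{to_type}"
    end
  "SELECT count(*) FROM #{quote_ident(table)} WHERE #{predicate}"
end
```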
355
+
356
+ **Supported type changes:**
357
+
358
+ | Always safe (no validation) | Validated (data check required) |
359
+ |---|---|
360
+ | `smallint → integer/bigint` | `bigint → integer` |
361
+ | `integer → bigint` | `text/varchar → varchar(n)` |
362
+ | `float4 → float8` | `float8 → float4` (warn only) |
363
+ | `varchar(n) → text` | `timestamptz → timestamp` (warn only) |
364
+ | `timestamp → timestamptz` | `numeric → integer/bigint` |
365
+ | `date → timestamp/timestamptz` | |
366
+ | `integer/bigint → numeric` | |
367
+
368
+ ---
369
+
370
+ ### `pcrd demo`
371
+
372
+ Set up and seed a demo database for testing.
373
+
374
+ ```bash
375
+ pcrd demo setup [--config FILE]
376
+ pcrd demo seed [--config FILE] [--rows N] [--seed N]
377
+ pcrd demo reset [--config FILE]
378
+ ```
379
+
380
+ - `demo setup` — creates `users`, `agents`, and `listings` tables on source; writes a sample `pcrd.config.yml` if none exists. The `listings` table is intentionally ordered with poor column alignment to demonstrate the padding optimizer.
381
+ - `demo seed --rows N` — generates realistic fake data (N listings, proportional users and agents). Default: 50,000 rows. Reproducible with `--seed`.
382
+ - `demo reset` — drops all demo tables.
383
+
384
+ ---
385
+
386
+ ### `pcrd cutover`
387
+
388
+ Trigger the cutover sequence after lag reaches near-zero.
389
+
390
+ ```bash
391
+ pcrd cutover [--config FILE] [--maintenance-confirmed]
392
+ ```
393
+
394
+ The application must be in maintenance mode before running this command. See [Cutover Procedure](#cutover-procedure) below.
395
+
396
+ ---
397
+
398
+ ### `pcrd verify`
399
+
400
+ Compare row counts and spot-check rows across clusters.
401
+
402
+ ```bash
403
+ pcrd verify [--config FILE] [--sample-size N]
404
+ ```
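A spot-check of this kind samples primary keys and compares each source row with its target row after applying the config's renames and drops. Below is a hypothetical sketch over in-memory Hashes keyed by primary key; `spot_check` and `counts_match?` are illustrative, not pcrd's verifier.

```ruby
# Hypothetical sketch of row-count and spot-check verification.
# Rows are Hashes of column name => value, keyed by primary key.
def spot_check(source_rows, target_rows, sample_size:, renames: {}, drops: [], rng: Random.new)
  keys = source_rows.keys.sample(sample_size, random: rng)
  keys.each_with_object([]) do |pk, mismatches|
    # Reshape the source row into the target's column names.
    expected = source_rows[pk]
      .reject { |col, _| drops.include?(col) }
      .transform_keys { |col| renames.fetch(col, col) }
    actual = target_rows[pk]
    mismatches << pk unless actual && expected.all? { |col, val| actual[col] == val }
  end
end

def counts_match?(source_rows, target_rows)
  source_rows.size == target_rows.size
end
```

Widened integer types compare equal in Ruby (`1 == 1` whether the column was `integer` or `bigint`), so a type change alone does not register as a mismatch in this model.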
405
+
406
+ ---
407
+
408
+ ### `pcrd status`
409
+
410
+ Show current migration phase, backfill progress, and live replication lag.
411
+
412
+ ---
413
+
414
+ ### `pcrd cleanup`
415
+
416
+ Drop replication slot, publication, and checkpoint. Optionally drop source tables.
417
+
418
+ ---
419
+
420
+ ## Cutover Procedure
421
+
422
+ When the lag meter shows "✓ Ready for cutover":
423
+
424
+ 1. **Put the application in maintenance mode.** Options depending on your stack:
425
+
426
+ | Stack | Approach |
427
+ |---|---|
428
+ | **pgBouncer** | `PAUSE <database>` — queues connections instead of rejecting them |
429
+ | **Rails + Rack** | Enable maintenance middleware via file flag or env var |
430
+ | **Kubernetes** | `kubectl scale --replicas=0 deployment/app` |
431
+ | **Heroku** | `heroku maintenance:on` |
432
+
433
+ 2. **Run cutover:** `pcrd cutover --maintenance-confirmed`
434
+ pcrd drains remaining lag to zero, advances target sequences, and verifies row counts.
435
+
436
+ 3. **Switch connection strings:** Update `DATABASE_URL` (or equivalent) to point at the target cluster.
437
+
438
+ 4. **Restart the application.**
439
+
440
+ 5. **Verify:** `pcrd verify` — confirms row counts match across clusters.
441
+
442
+ 6. **End maintenance mode** once the application is healthy on the target cluster.
443
+
444
+ 7. **Cleanup** (days later, when confident): `pcrd cleanup`
445
+
446
+ **Rollback:** Never cut over → old cluster keeps running unchanged. No data is lost.
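Step 2 has an order to it: wait for lag to drain (bounded by `cutover.lag_drain_timeout`), advance sequences, then check row counts. Below is a hypothetical sketch with every database call injected as a lambda so the control flow runs without a database; `run_cutover` and `CutoverTimeout` are illustrative names.

```ruby
# Hypothetical sketch of the cutover control flow.
class CutoverTimeout < StandardError; end

def run_cutover(lag:, advance_sequences:, verify_counts:, timeout: 300, poll: 1.0,
                clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                sleeper: ->(seconds) { sleep(seconds) })
  deadline = clock.call + timeout
  # Drain: poll the lag until every streamed change has been applied.
  until lag.call.zero?
    raise CutoverTimeout, "lag did not drain within #{timeout}s" if clock.call >= deadline
    sleeper.call(poll)
  end
  advance_sequences.call # only once no more inserts can arrive via WAL
  verify_counts.call ? :ready_to_switch : :count_mismatch
end
```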
447
+
448
+ ---
449
+
450
+ ## Column Padding Analysis
451
+
452
+ PostgreSQL stores columns in definition order. Each column is aligned to its type's natural boundary, which wastes bytes when small-alignment columns (bool, smallint) appear between large-alignment columns (bigint, timestamp).
453
+
454
+ **Alignment rules:**
455
+ - 8 bytes: `bigint`, `float8`, `timestamp`, `timestamptz`
456
+ - 4 bytes: `integer`, `float4`, `date`, `numeric`/`text` headers
457
+ - 2 bytes: `smallint`
458
+ - 1 byte: `boolean`, `"char"` (the single-byte internal type; `char(n)` is variable-length)
459
+
460
+ **Optimal ordering:** 8-byte → 4-byte → 2-byte → 1-byte → variable-length
461
+
462
+ Since pcrd rewrites the table anyway during migration, column reordering is free — set `optimize_column_order: true` in the table config and pcrd applies the optimal ordering automatically.
463
+
464
+ The `pcrd analyze` command shows the current waste and estimated space reclaimed at current row count.
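The alignment rules can be checked by hand. This sketch handles fixed-width columns only and ignores the tuple header and the trailing alignment of the whole tuple; `FixedCol`, `layout`, and `optimal_order` are hypothetical names.

```ruby
# Hypothetical sketch of the padding arithmetic for fixed-width columns.
FixedCol = Struct.new(:name, :size, :align)

# Returns [end_offset, padding_bytes] for columns stored in the given order.
def layout(cols)
  offset = 0
  padding = 0
  cols.each do |c|
    aligned = (offset + c.align - 1) / c.align * c.align # round up to boundary
    padding += aligned - offset
    offset = aligned + c.size
  end
  [offset, padding]
end

# Widest alignment first; ties keep their original relative order.
def optimal_order(cols)
  cols.sort_by.with_index { |c, i| [-c.align, i] }
end
```

With an `integer`, a `boolean`, and a `timestamp` in that order, `layout` finds 3 bytes of padding before the `timestamp` (offset 5 rounds up to 8); the widest-first order removes it.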
465
+
466
+ ---
467
+
468
+ ## Source Database Requirements
469
+
470
+ ```sql
471
+ -- Grant replication capability
472
+ ALTER ROLE pcrd_replication REPLICATION;
473
+
474
+ -- Grant read access to migrated tables
475
+ GRANT SELECT ON TABLE listings, users TO pcrd_replication;
476
+
477
+ -- Allow publication creation (needs CREATE on the database; adding a table to a publication also requires owning that table, or superuser)
478
+ GRANT CREATE ON DATABASE myapp_production TO pcrd_replication;
479
+ ```
480
+
481
+ `postgresql.conf` must have the following (changing any of these settings requires a server restart):
482
+ ```
483
+ wal_level = logical
484
+ max_replication_slots = <current + number of concurrent pcrd migrations>
485
+ max_wal_senders = <current + number of concurrent pcrd migrations>
486
+ ```
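Preflight checks 2 and 3 reduce to comparisons on these settings. Below is a hypothetical sketch. It assumes the values come from `SHOW` as strings and that the in-use counts come from `pg_replication_slots` and `pg_stat_replication`; `check_source_settings` is an illustrative name.

```ruby
# Hypothetical sketch: validate source settings before creating a slot.
# Returns a list of human-readable problems (empty when all checks pass).
def check_source_settings(settings, slots_in_use:, senders_in_use:, new_slots: 1)
  problems = []
  unless settings["wal_level"] == "logical"
    problems << "wal_level is #{settings['wal_level']}, need logical"
  end

  free_slots = Integer(settings["max_replication_slots"]) - slots_in_use
  if free_slots < new_slots
    problems << "no free replication slots (#{free_slots} free, need #{new_slots})"
  end

  free_senders = Integer(settings["max_wal_senders"]) - senders_in_use
  if free_senders < new_slots
    problems << "no free WAL senders (#{free_senders} free, need #{new_slots})"
  end
  problems
end
```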
487
+
488
+ ---
489
+
490
+ ## Example Project
491
+
492
+ `examples/listings_migration/` contains a complete end-to-end demo:
493
+ - **Docker Compose** environment: source cluster, target cluster, Rails API app
494
+ - **Annotated `migration.yml`** showing all supported change types
495
+ - **Operator runbook** walking through every step from setup to cleanup
496
+
497
+ See [`examples/listings_migration/runbook.md`](https://github.com/charlesharris/pcrd/blob/main/examples/listings_migration/runbook.md).
498
+
499
+ ---
500
+
501
+ ## Development
502
+
503
+ ```bash
504
+ git clone https://github.com/charris/pcrd
505
+ cd pcrd
506
+ bundle install
507
+
508
+ # Build and install the gem so `pcrd` is on your PATH
509
+ gem build pcrd.gemspec
510
+ gem install pcrd-0.1.0.gem
511
+
512
+ # Start dev PostgreSQL containers
513
+ docker compose -f dev/docker-compose.yml up -d
514
+
515
+ # Run tests
516
+ bundle exec rspec
517
+
518
+ # Run integration tests only
519
+ bundle exec rspec spec/integration/
520
+
521
+ # Run a quick end-to-end demo
522
+ pcrd demo setup
523
+ pcrd demo seed --rows 10000
524
+ pcrd analyze
525
+ pcrd migrate --preflight-only
526
+ pcrd migrate --backfill-only --yes
527
+ ```
528
+
529
+ > After making code changes locally, re-run `gem build pcrd.gemspec && gem install pcrd-0.1.0.gem` to pick them up, or use `bundle exec bin/pcrd` to run from source without reinstalling.
530
+
531
+ ### Test environment
532
+
533
+ Integration tests require both containers from `dev/docker-compose.yml`. Override connection details with environment variables:
534
+
535
+ ```bash
536
+ PCRD_TEST_SOURCE_HOST=localhost PCRD_TEST_SOURCE_PORT=5433 \
537
+ PCRD_TEST_SOURCE_DB=pcrd_source PCRD_TEST_SOURCE_USER=postgres \
538
+ PCRD_TEST_SOURCE_PASSWORD=postgres \
539
+ PCRD_TEST_TARGET_HOST=localhost PCRD_TEST_TARGET_PORT=5434 \
540
+ PCRD_TEST_TARGET_DB=pcrd_target PCRD_TEST_TARGET_USER=postgres \
541
+ PCRD_TEST_TARGET_PASSWORD=postgres \
542
+ bundle exec rspec
543
+ ```
544
+
545
+ ---
546
+
547
+ ## Architecture Notes
548
+
549
+ ### Why cross-cluster?
550
+
551
+ Running source and target as separate PostgreSQL servers supports more than just schema changes:
552
+ - **Version upgrades**: migrate from PG 14 to PG 16 with zero downtime
553
+ - **Cloud migrations**: move from on-premise to RDS, from AWS to GCP, etc.
554
+ - **Hardware changes**: move to larger instances without downtime
555
+ - **Schema changes**: the original use case — column type changes, renames, reordering
556
+
557
+ ### Why pgoutput?
558
+
559
+ `pgoutput` is PostgreSQL's built-in logical replication plugin (available since PG 10). No extensions are required on the source server. This makes pcrd work with managed PostgreSQL services (RDS, Cloud SQL, etc.) that restrict extension installation.
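Because `pgoutput` uses a documented binary format, decoding it needs only `String#unpack`. This sketch decodes a single Begin message; in the streaming protocol such a message arrives as the payload of an XLogData (`w`) frame. It is illustrative, not pcrd's parser, and `decode_begin` is a hypothetical name.

```ruby
# Hypothetical sketch: decode a pgoutput Begin message.
# Layout: Byte1('B') | Int64 final LSN | Int64 commit time | Int32 xid,
# big-endian; commit time is microseconds since 2000-01-01 00:00:00 UTC.
PG_EPOCH = Time.utc(2000, 1, 1)

BeginMessage = Struct.new(:final_lsn, :commit_time, :xid)

# Formats a 64-bit WAL position the way PostgreSQL prints pg_lsn.
def format_lsn(lsn)
  format("%X/%X", lsn >> 32, lsn & 0xFFFF_FFFF)
end

def decode_begin(bytes)
  tag, lsn, micros, xid = bytes.unpack("a1 Q> q> L>")
  raise ArgumentError, "not a Begin message: #{tag.inspect}" unless tag == "B"
  BeginMessage.new(format_lsn(lsn), PG_EPOCH + Rational(micros, 1_000_000), xid)
end
```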
560
+
561
+ ### Backfill / streaming overlap
562
+
563
+ The replication slot is created before backfill starts. This ensures all WAL changes during backfill are retained. The WAL consumer runs concurrently with backfill, buffering events. When backfill completes, the apply engine replays buffered events before transitioning to live streaming. Because the apply engine uses `INSERT ... ON CONFLICT DO UPDATE`, rows that appear in both the bulk copy and the WAL stream are handled correctly — WAL wins.
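The overlap argument can be checked with a toy model: a Hash keyed by primary key stands in for the target table, the bulk-copy snapshot is taken while writes continue, and the buffered events are replayed in order. This is a hypothetical sketch; real UPDATE events may omit unchanged TOAST values, which this model ignores.

```ruby
# Hypothetical sketch: replaying buffered WAL after a bulk copy converges.
# apply_event mimics INSERT ... ON CONFLICT DO UPDATE, UPDATE and DELETE.
def apply_event(target, event)
  case event[:op]
  when :insert, :update then target[event[:pk]] = event[:row] # upsert: WAL wins
  when :delete then target.delete(event[:pk])
  else raise ArgumentError, "unknown op #{event[:op]}"
  end
end

def backfill_then_replay(snapshot_rows, buffered_events)
  target = snapshot_rows.dup # bulk-copy result; may already reflect some events
  buffered_events.each { |e| apply_event(target, e) } # in WAL commit order
  target
end
```

In the test below the snapshot already contains row 1's latest version and row 2 before its deletion; replaying every buffered event still ends at the source's final state.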
564
+
565
+ ### Primary key requirement
566
+
567
+ Every migrated table must have a primary key or unique not-null index. This is a hard requirement: without a unique key, the apply engine cannot safely handle the backfill/streaming overlap window (it cannot know whether a WAL insert is a concurrent new write or a duplicate of something already bulk-copied).
568
+
569
+ ---
570
+
571
+ ## Known Limitations
572
+
573
+ - **Sequences** — logical replication does not carry sequence values, so target sequences are advanced as part of `pcrd cutover`: the command computes `max(id)` on source and calls `setval` on target with a configurable safety buffer (`cutover.sequence_buffer`).
574
+ - **Foreign keys** — FK constraints on the target are listed in the preflight output but not automatically created. Add them post-cutover.
575
+ - **Non-PK indexes** — like FK constraints, these are listed in the preflight report. Create them on the target before cutover for query performance.
576
+ - **Large objects** — `pg_largeobject` data is not replicated via logical replication.
577
+ - **Generated columns** — pcrd creates these without the GENERATED clause; values are recomputed by the target database.
578
+ - **DDL during migration** — if a column is added or dropped on the source after the migration starts, pcrd halts with a clear error rather than silently corrupting data.
579
+ - **Partitioned tables** — supported but each partition must be listed individually in the config.
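The sequence step in the first bullet is simple arithmetic plus one `setval` call per sequence. Below is a hypothetical sketch. The sequence name would come from something like `pg_get_serial_sequence('listings', 'id')`, and the helper names are illustrative.

```ruby
# Hypothetical sketch of advancing a target sequence at cutover.

# max(id) is NULL (nil) on an empty table; treat that as 0.
def next_sequence_value(source_max_id, buffer)
  (source_max_id || 0) + buffer
end

def quote_literal(str)
  "'" + str.gsub("'", "''") + "'"
end

# With is_called = false, the next nextval() returns exactly `value`.
def setval_sql(sequence_name, value)
  "SELECT setval(#{quote_literal(sequence_name)}, #{Integer(value)}, false)"
end
```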
580
+
581
+ ---
582
+
583
+ ## Project Status
584
+
585
+ | Phase | Status | Description |
586
+ |---|---|---|
587
+ | Config loading | ✅ | YAML config, typed structs, env-var passwords |
588
+ | Schema reader | ✅ | pg_attribute query, column metadata |
589
+ | Padding analyzer | ✅ | Optimal column ordering, space savings estimate |
590
+ | `pcrd analyze` | ✅ | Source-only and --compare-target |
591
+ | Type transformer | ✅ | Cast safety rules, data validation |
592
+ | DDL generation | ✅ | CREATE TABLE from spec + source schema |
593
+ | Preflight | ✅ | All 8 safety checks |
594
+ | `pcrd migrate --preflight-only` | ✅ | Full preflight report + DDL preview |
595
+ | Checkpoint store | ✅ | SQLite per-batch progress tracking |
596
+ | Backfill engine | ✅ | Keyset-paginated COPY, resumable |
597
+ | `pcrd migrate --backfill-only` | ✅ | Full backfill with progress display |
598
+ | pgoutput parser | ✅ | All message types, binary protocol |
599
+ | WAL consumer | ✅ | Background thread, transaction buffering |
600
+ | Apply engine | ✅ | Upsert/update/delete on target |
601
+ | `pcrd migrate` (full) | ✅ | Backfill + streaming + lag meter |
602
+ | `pcrd demo setup/seed` | ✅ | Demo database with realistic schema |
603
+ | `pcrd cutover` | ✅ | Sequence advancement, drain, verify |
604
+ | `pcrd verify` | ✅ | Row counts + spot-check |
605
+ | `pcrd status` | ✅ | Live lag meter from checkpoint |
606
+ | `pcrd cleanup` | ✅ | Drop slot/pub/checkpoint |
607
+ | Docker Compose example | ✅ | Rails app + runbook |
608
+ | Full polish + README | ✅ | |
609
+
610
+ ---
611
+
612
+ ## License
613
+
614
+ MIT
data/bin/pcrd ADDED
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require "pcrd"
5
+ require "pcrd/cli"
6
+
7
+ Pcrd::CLI.start(ARGV)