pcrd 0.1.0
- checksums.yaml +7 -0
- data/CHANGELOG.md +24 -0
- data/LICENSE +21 -0
- data/README.md +614 -0
- data/bin/pcrd +7 -0
- data/lib/pcrd/advisory_lock.rb +50 -0
- data/lib/pcrd/apply/engine.rb +184 -0
- data/lib/pcrd/apply/worker.rb +97 -0
- data/lib/pcrd/backfill/batch.rb +158 -0
- data/lib/pcrd/backfill/engine.rb +153 -0
- data/lib/pcrd/checkpoint/store.rb +217 -0
- data/lib/pcrd/cli.rb +274 -0
- data/lib/pcrd/commands/analyze.rb +125 -0
- data/lib/pcrd/commands/cleanup.rb +112 -0
- data/lib/pcrd/commands/demo.rb +152 -0
- data/lib/pcrd/commands/readiness.rb +30 -0
- data/lib/pcrd/commands/status.rb +129 -0
- data/lib/pcrd/commands/verify.rb +172 -0
- data/lib/pcrd/config/add_column.rb +7 -0
- data/lib/pcrd/config/analyze_config.rb +8 -0
- data/lib/pcrd/config/column_spec.rb +10 -0
- data/lib/pcrd/config/connection.rb +7 -0
- data/lib/pcrd/config/cutover_config.rb +7 -0
- data/lib/pcrd/config/load_error.rb +7 -0
- data/lib/pcrd/config/loader.rb +158 -0
- data/lib/pcrd/config/migrate_config.rb +21 -0
- data/lib/pcrd/config/root.rb +9 -0
- data/lib/pcrd/config/schema.rb +62 -0
- data/lib/pcrd/config/table.rb +9 -0
- data/lib/pcrd/config/verify_config.rb +7 -0
- data/lib/pcrd/config.rb +7 -0
- data/lib/pcrd/connection/client.rb +129 -0
- data/lib/pcrd/connection/error.rb +7 -0
- data/lib/pcrd/connection/replication.rb +108 -0
- data/lib/pcrd/cutover/orchestrator.rb +108 -0
- data/lib/pcrd/cutover/sequences.rb +138 -0
- data/lib/pcrd/demo/generator.rb +214 -0
- data/lib/pcrd/demo/schema.rb +154 -0
- data/lib/pcrd/error.rb +12 -0
- data/lib/pcrd/migration/orchestrator.rb +272 -0
- data/lib/pcrd/monitor/lag.rb +107 -0
- data/lib/pcrd/options.rb +15 -0
- data/lib/pcrd/output/analyze_printer.rb +173 -0
- data/lib/pcrd/output/cutover_printer.rb +128 -0
- data/lib/pcrd/output/preflight_printer.rb +119 -0
- data/lib/pcrd/output/readiness_printer.rb +72 -0
- data/lib/pcrd/preflight.rb +331 -0
- data/lib/pcrd/readiness/manifest.rb +201 -0
- data/lib/pcrd/replication/consumer.rb +235 -0
- data/lib/pcrd/replication/error.rb +10 -0
- data/lib/pcrd/replication/pgoutput/messages.rb +68 -0
- data/lib/pcrd/replication/pgoutput/parser.rb +316 -0
- data/lib/pcrd/reporter/console.rb +46 -0
- data/lib/pcrd/reporter/null.rb +14 -0
- data/lib/pcrd/schema/column.rb +59 -0
- data/lib/pcrd/schema/ddl.rb +71 -0
- data/lib/pcrd/schema/diff_entry.rb +36 -0
- data/lib/pcrd/schema/differ.rb +175 -0
- data/lib/pcrd/schema/object_reader.rb +187 -0
- data/lib/pcrd/schema/packer.rb +90 -0
- data/lib/pcrd/schema/reader.rb +118 -0
- data/lib/pcrd/schema/setup.rb +143 -0
- data/lib/pcrd/schema/setup_error.rb +9 -0
- data/lib/pcrd/schema/table_not_found.rb +8 -0
- data/lib/pcrd/schema/type_registry.rb +116 -0
- data/lib/pcrd/sql.rb +55 -0
- data/lib/pcrd/transform/row_transformer.rb +69 -0
- data/lib/pcrd/transform/type_map.rb +209 -0
- data/lib/pcrd/transform/validator.rb +106 -0
- data/lib/pcrd/version.rb +5 -0
- data/lib/pcrd.rb +11 -0
- metadata +231 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 974b78ca232b53584a2aba4742b46dd09c0842701f05649775d817cbef711c09
+  data.tar.gz: 8cae02d6f8e799c2d86a383f1b7b27cab7af7f8eb15d41cc402d36683db5b367
+SHA512:
+  metadata.gz: e3777c72cf2c9fd9d2accce587906a9425fac44a8cfd7eb7f00f2a7b96814663e11869e0daab62a533270ce3f4d1e8ca4d837ae290d322a3e7fe37ea6122fd84
+  data.tar.gz: 25206ce31d6bd995298debbf6109e922d88993df78a2e54d2104f22431f5e065a0edba722b29e6822f01ab14f7ae6c1422148fc969b44d92e2472321edd21090
data/CHANGELOG.md
ADDED
@@ -0,0 +1,24 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+## [0.1.0] - 2026-06-10
+
+### Added
+
+- Initial release.
+- Zero-downtime cross-cluster PostgreSQL migrations via logical replication.
+- Column type changes, renames, additions, drops, and reordering with padding optimization.
+- Preflight validation (connections, WAL level, type cast safety, PK existence).
+- Keyset-paginated, checkpointed `COPY` backfill engine.
+- Concurrent WAL streaming + apply engine with TOAST/TRUNCATE handling.
+- Replication lag monitoring and brief-lock cutover orchestration.
+- `pcrd` CLI built on Thor.
+
+[Unreleased]: https://github.com/charlesharris/pcrd/compare/v0.1.0...HEAD
+[0.1.0]: https://github.com/charlesharris/pcrd/releases/tag/v0.1.0
data/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Charles Harris
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
data/README.md
ADDED
|
@@ -0,0 +1,614 @@
|
|
|
1
|
+
# pcrd — PostgreSQL Column Rewrite Daemon
|
|
2
|
+
|
|
3
|
+

|
|
4
|
+
|
|
5
|
+
> **Note:** This project is not production-ready. Use at your own risk.
|
|
6
|
+
|
|
7
|
+
Zero-downtime cross-cluster PostgreSQL migrations using logical replication.
|
|
8
|
+
|
|
9
|
+
pcrd migrates large tables to a new PostgreSQL cluster with column type changes, renames, additions, drops, and column reordering — without locking your source database for more than a few seconds at cutover.
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## The Problem
|
|
14
|
+
|
|
15
|
+
`ALTER TABLE t ALTER COLUMN id TYPE bigint` on a 500M-row table acquires an `AccessExclusiveLock` and rewrites every row while holding it. That means minutes to hours of complete read/write blackout — unacceptable in production.
|
|
16
|
+
|
|
17
|
+
pcrd solves this by building the new schema on a separate cluster using logical replication, streaming all changes from source to target continuously, and then cutting over with only a brief maintenance window (seconds, not hours).
|
|
18
|
+
|
|
19
|
+
---
|
|
20
|
+
|
|
21
|
+
## How It Works
|
|
22
|
+
|
|
23
|
+
```
Source cluster              pcrd                       Target cluster
──────────────              ────                       ──────────────
live table    ─WAL─────►    WAL consumer               new schema table
(old types)                 type transformer    ──►    (new types)
    │         ─bulk──────►  backfill engine     ──►
    │                       lag monitor
    │                       cutover (brief lock)
    │
App ──── DATABASE_URL ────────────────────────────► switch here
```
|
|
34
|
+
|
|
35
|
+
**Phases:**
|
|
36
|
+
1. **Preflight** — validate connections, WAL level, type cast safety, PK existence
|
|
37
|
+
2. **Setup** — create publication + replication slot on source; DDL on target
|
|
38
|
+
3. **Backfill** — bulk copy existing rows via keyset-paginated `COPY`, checkpointed
|
|
39
|
+
4. **Streaming** — consume WAL events, transform, apply to target concurrently with backfill
|
|
40
|
+
5. **Catchup** — monitor replication lag; display live lag meter
|
|
41
|
+
6. **Cutover** — operator-triggered; drain lag to zero, advance sequences, signal ready
|
|
42
|
+
7. **Verify** — row count + spot-check comparison
|
|
43
|
+
8. **Cleanup** — drop replication slot, publication, archive source tables
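
The backfill phase relies on keyset pagination with a per-batch checkpoint. The sketch below illustrates that idea against an in-memory array rather than a database; it is not pcrd's actual code, and `ROWS`, `fetch_batch`, and `backfill` are hypothetical names chosen for this example.

```ruby
# Hypothetical in-memory stand-in for a source table keyed by `id`.
ROWS = (1..25).map { |id| { id: id, name: "row-#{id}" } }.freeze

# Return up to `limit` rows whose key is strictly greater than `after_id`.
# The SQL equivalent would be: WHERE id > $1 ORDER BY id LIMIT $2
def fetch_batch(rows, after_id:, limit:)
  rows.select { |row| row[:id] > after_id }
      .sort_by { |row| row[:id] }
      .first(limit)
end

# Copy rows batch by batch, recording the last copied key after each batch
# so that an interrupted run can resume from the checkpoint.
def backfill(rows, batch_size:, checkpoint: { last_id: 0 })
  copied = []
  loop do
    batch = fetch_batch(rows, after_id: checkpoint[:last_id], limit: batch_size)
    break if batch.empty?

    copied.concat(batch)                    # stand-in for COPY to the target
    checkpoint[:last_id] = batch.last[:id]  # stand-in for the checkpoint write
  end
  copied
end

checkpoint = { last_id: 0 }
backfill(ROWS, batch_size: 10, checkpoint: checkpoint)
puts checkpoint[:last_id] # prints 25
```

Because each batch starts from the last copied key instead of an `OFFSET`, the cost of a batch does not grow with how far the copy has progressed, and resuming only requires the stored key.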
|
|
44
|
+
|
|
45
|
+
---
|
|
46
|
+
|
|
47
|
+
## Features
|
|
48
|
+
|
|
49
|
+
- **Zero schema lock** — source database runs normally throughout; `AccessExclusiveLock` held only for milliseconds at cutover
|
|
50
|
+
- **Cross-cluster** — source and target are separate PostgreSQL servers; works for version upgrades, cloud provider migrations, hardware changes
|
|
51
|
+
- **Type transformation** — widening casts (int→bigint, varchar→text, timestamp→timestamptz) are automatic; narrowing casts require an explicit pre-migration data validation pass
|
|
52
|
+
- **Column padding optimizer** — analyzes column alignment and estimates space savings from reordering; integrated into the migration flow
|
|
53
|
+
- **Resumable** — SQLite checkpoint stores per-batch progress; `pcrd migrate --resume` picks up from the last completed batch
|
|
54
|
+
- **No source extensions required** — uses PostgreSQL's built-in `pgoutput` logical replication (PG 10+)
|
|
55
|
+
|
|
56
|
+
---
|
|
57
|
+
|
|
58
|
+
## Requirements
|
|
59
|
+
|
|
60
|
+
- Ruby 3.2+
|
|
61
|
+
- PostgreSQL 10+ on source (with `wal_level = logical`)
|
|
62
|
+
- PostgreSQL 10+ on target
|
|
63
|
+
- Source user must have `REPLICATION` attribute and `SELECT` on migrated tables
|
|
64
|
+
|
|
65
|
+
---
|
|
66
|
+
|
|
67
|
+
## Installation
|
|
68
|
+
|
|
69
|
+
**From RubyGems** (once published):
|
|
70
|
+
```bash
|
|
71
|
+
gem install pcrd
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
**From source:**
|
|
75
|
+
```bash
|
|
76
|
+
git clone https://github.com/charris/pcrd
|
|
77
|
+
cd pcrd
|
|
78
|
+
bundle install
|
|
79
|
+
gem build pcrd.gemspec
|
|
80
|
+
gem install pcrd-0.1.0.gem
|
|
81
|
+
pcrd --version
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
> `bundle install` installs dependencies only — it does not put `pcrd` on your PATH. The `gem build` + `gem install` steps are required. If you just want to run pcrd without installing, use `bundle exec bin/pcrd` from the repo root.
|
|
85
|
+
|
|
86
|
+
---
|
|
87
|
+
|
|
88
|
+
## Quick Start
|
|
89
|
+
|
|
90
|
+
### 1. Start the demo environment
|
|
91
|
+
|
|
92
|
+
```bash
|
|
93
|
+
docker compose -f dev/docker-compose.yml up -d
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
This starts two PostgreSQL 16 containers:
|
|
97
|
+
- **source_db** on port 5433 (with `wal_level=logical`)
|
|
98
|
+
- **target_db** on port 5434
|
|
99
|
+
|
|
100
|
+
### 2. Create the demo schema and data
|
|
101
|
+
|
|
102
|
+
```bash
|
|
103
|
+
# Create tables on source (intentionally poor column ordering for demo)
|
|
104
|
+
pcrd demo setup
|
|
105
|
+
|
|
106
|
+
# Seed with 50,000 rows (users → agents → listings)
|
|
107
|
+
pcrd demo seed --rows 50000
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
### 3. Analyze column padding
|
|
111
|
+
|
|
112
|
+
```bash
|
|
113
|
+
# Shows current column layout and how much space can be saved by reordering
|
|
114
|
+
pcrd analyze
|
|
115
|
+
|
|
116
|
+
# Compare source vs. proposed target schema side-by-side
|
|
117
|
+
pcrd analyze --compare-target
|
|
118
|
+
```
|
|
119
|
+
|
|
120
|
+
### 4. Run the migration
|
|
121
|
+
|
|
122
|
+
```bash
|
|
123
|
+
# Check everything looks right first
|
|
124
|
+
pcrd migrate --preflight-only
|
|
125
|
+
|
|
126
|
+
# Run the full migration (backfill + streaming)
|
|
127
|
+
pcrd migrate --yes
|
|
128
|
+
|
|
129
|
+
# Or backfill only (no WAL streaming)
|
|
130
|
+
pcrd migrate --backfill-only --yes
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
### 5. Cut over
|
|
134
|
+
|
|
135
|
+
```bash
|
|
136
|
+
# Once lag is near zero, put the app in maintenance mode, then:
|
|
137
|
+
pcrd cutover --maintenance-confirmed
|
|
138
|
+
```
|
|
139
|
+
|
|
140
|
+
---
|
|
141
|
+
|
|
142
|
+
## Configuration
|
|
143
|
+
|
|
144
|
+
pcrd looks for `pcrd.config.yml` in the current directory by default. Pass `--config path/to/file.yml` to override. Run `pcrd demo setup` to generate a sample file automatically.
|
|
145
|
+
|
|
146
|
+
**Full reference:** [docs/config_reference.md](https://github.com/charlesharris/pcrd/blob/main/docs/config_reference.md)
|
|
147
|
+
|
|
148
|
+
### The most important rule: only specify what changes
|
|
149
|
+
|
|
150
|
+
The `columns:` map under each table only needs entries for columns you want to **modify**. Any column not listed is migrated automatically — same name, same type, same `NOT NULL`, same `DEFAULT`. You only need a column entry if you want to change its type, rename it, or drop it.
|
|
151
|
+
|
|
152
|
+
For a 20-column table where only `id` needs widening:
|
|
153
|
+
|
|
154
|
+
```yaml
migrate:
  tables:
    - name: orders
      columns:
        id:
          type: bigint   # only this one column needs to be listed
```
|
|
162
|
+
|
|
163
|
+
The other 19 columns require no configuration.
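
One way to picture the rule is as a merge of the source column list with the `columns:` overrides. This is an illustrative sketch only: `resolve_target_columns` and the hash shapes are assumptions made for the example, not pcrd's internal API.

```ruby
# Apply per-column overrides (type, rename, drop) to the source columns.
# Columns without an entry in `spec` pass through unchanged.
def resolve_target_columns(source_columns, spec)
  source_columns.filter_map do |col|
    change = spec.fetch(col[:name], {})
    next if change[:drop] # dropped columns are excluded from the target

    {
      name: change.fetch(:rename, col[:name]),
      type: change.fetch(:type, col[:type])
    }
  end
end

source = [
  { name: "id",           type: "integer" },
  { name: "status_code",  type: "smallint" },
  { name: "legacy_notes", type: "text" },
  { name: "active",       type: "boolean" }
]

spec = {
  "id"           => { type: "bigint" },
  "status_code"  => { rename: "listing_status" },
  "legacy_notes" => { drop: true }
}

resolve_target_columns(source, spec).each do |col|
  puts "#{col[:name]} #{col[:type]}"
end
```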
|
|
164
|
+
|
|
165
|
+
### Annotated config example
|
|
166
|
+
|
|
167
|
+
```yaml
# pcrd.config.yml

source:
  host: db-primary.old.example.com
  port: 5432                          # default: 5432
  database: myapp_production
  user: pcrd_replication
  # password: via PCRD_SOURCE_PASSWORD env var or ~/.pgpass

target:
  host: db-primary.new.example.com
  port: 5432
  database: myapp_production          # same database name on both clusters
  user: pcrd_writer
  # password: via PCRD_TARGET_PASSWORD env var or ~/.pgpass

migrate:
  # replication_slot and publication are auto-derived from the first table
  # name if not set. Set explicitly when running multiple migrations.
  replication_slot: pcrd_listings_v2  # optional
  publication: pcrd_pub_v2            # optional

  batch_size: 10_000                  # rows per backfill batch; default 10,000
  lag_threshold_bytes: 1_048_576      # 1 MB: "ready for cutover" threshold
  checkpoint_db: ./pcrd_checkpoint.sqlite3  # per-batch progress; enables --resume

  tables:
    - name: listings

      # Reorder columns for minimal alignment padding (free, since pcrd rewrites
      # the table anyway). Run `pcrd analyze` first to see the savings.
      optimize_column_order: true

      # Only specify columns you want to change.
      # Every other column is copied as-is (same name, type, constraints).
      columns:
        id:
          type: bigint                # widen integer → bigint (always safe, no validation)

        list_price:
          type: numeric(18,4)         # widen numeric precision (always safe)
          rename: list_price_precise  # rename in the same step

        status_code:
          rename: listing_status      # rename only, keep the same type

        legacy_notes:
          drop: true                  # exclude this column from the target entirely

        # Not listed = copied as-is:
        # active, bedrooms, bathrooms, created_at, latitude, longitude, ...

      # New columns to add (not present on source).
      # Backfilled rows get the DEFAULT value; NULL if no default specified.
      add_columns:
        - name: updated_at
          type: timestamptz
          default: "now()"            # SQL expression evaluated by PostgreSQL

    - name: users
      columns:
        id:
          type: bigint                # all other user columns copied unchanged

# Tables to include in `pcrd analyze` output.
# If omitted, analyzes all tables listed in migrate.tables.
analyze:
  tables:
    - listings
    - users

# Spot-check settings for `pcrd verify`.
verify:
  sample_size: 1_000                  # random rows to compare field-by-field

# Cutover behavior.
cutover:
  sequence_buffer: 1_000              # added to max(id) when setting target sequences
  lag_drain_timeout: 300              # seconds to wait for lag → zero during cutover
```
|
|
248
|
+
|
|
249
|
+
**Passwords** — never put passwords in the config file. Use:
|
|
250
|
+
- `PCRD_SOURCE_PASSWORD` environment variable
|
|
251
|
+
- `PCRD_TARGET_PASSWORD` environment variable
|
|
252
|
+
- `~/.pgpass` (standard PostgreSQL password file)
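
As a rough illustration of the environment-variable lookup, assuming an unset or empty variable means "fall back to `~/.pgpass`" (which libpq consults on its own when no password is supplied). `password_for` is a hypothetical helper, not a pcrd method.

```ruby
# Look up the password for :source or :target from the environment.
# Returns nil when the variable is unset or empty so the caller can omit the
# password and let libpq fall back to ~/.pgpass.
def password_for(role, env: ENV)
  value = env["PCRD_#{role.to_s.upcase}_PASSWORD"]
  (value.nil? || value.empty?) ? nil : value
end

puts password_for(:source).nil? ? "source: using ~/.pgpass" : "source: using env var"
```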
|
|
253
|
+
|
|
254
|
+
### Quick reference: column change options
|
|
255
|
+
|
|
256
|
+
| In `columns:` | Effect |
|
|
257
|
+
|---|---|
|
|
258
|
+
| `type: bigint` | Change type; keep name |
|
|
259
|
+
| `rename: new_name` | Rename; keep type |
|
|
260
|
+
| `type: bigint, rename: new_name` | Change type AND rename |
|
|
261
|
+
| `drop: true` | Exclude from target entirely |
|
|
262
|
+
| *(no entry)* | Copy exactly as-is |
|
|
263
|
+
|
|
264
|
+
### Supported type changes
|
|
265
|
+
|
|
266
|
+
| Cast | Safety |
|
|
267
|
+
|---|---|
|
|
268
|
+
| `smallint/integer → bigint` | Always safe |
|
|
269
|
+
| `varchar(n) → text` | Always safe |
|
|
270
|
+
| `timestamp → timestamptz` | Always safe |
|
|
271
|
+
| `integer/bigint → numeric` | Always safe |
|
|
272
|
+
| `bigint → integer` | Validated (range check) |
|
|
273
|
+
| `text/varchar → varchar(n)` | Validated (length check) |
|
|
274
|
+
| `float8 → float4` | Validated (warn only) |
|
|
275
|
+
| `timestamptz → timestamp` | Validated (warn only — timezone lost) |
|
|
276
|
+
|
|
277
|
+
Run `pcrd migrate --preflight-only` to see the full safety report and generated DDL before committing.
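
The table above can be read as a lookup from (source type, target type) to a safety class. The following is a minimal sketch of such a lookup covering only the rows listed in that table; the constant and method names are hypothetical and pcrd's own rules may be structured differently.

```ruby
# Casts the table lists as "Always safe", keyed by base type.
SAFE_CASTS = {
  "smallint"  => %w[bigint],
  "integer"   => %w[bigint numeric],
  "bigint"    => %w[numeric],
  "varchar"   => %w[text],
  "timestamp" => %w[timestamptz]
}.freeze

# Casts the table lists as "Validated".
VALIDATED_CASTS = {
  "bigint"      => %w[integer],
  "text"        => %w[varchar],
  "varchar"     => %w[varchar],
  "float8"      => %w[float4],
  "timestamptz" => %w[timestamp]
}.freeze

# Strip a trailing type modifier: "varchar(255)" -> "varchar".
def base_type(type)
  type.sub(/\(.*\)\z/, "")
end

# Classify a cast as :unchanged, :safe, :validated, or :unknown.
def cast_safety(from, to)
  return :unchanged if from == to

  source = base_type(from)
  target = base_type(to)
  return :safe if SAFE_CASTS.fetch(source, []).include?(target)
  return :validated if VALIDATED_CASTS.fetch(source, []).include?(target)

  :unknown
end

puts cast_safety("integer", "bigint")      # prints safe
puts cast_safety("text", "varchar(50)")    # prints validated
```

Note that this sketch treats any `varchar(n) → varchar(m)` change as validated, which is conservative when `m` is larger than `n`.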
|
|
278
|
+
|
|
279
|
+
---
|
|
280
|
+
|
|
281
|
+
## CLI Reference
|
|
282
|
+
|
|
283
|
+
### `pcrd --version`
|
|
284
|
+
|
|
285
|
+
```bash
|
|
286
|
+
pcrd --version # or: pcrd -v
|
|
287
|
+
# → pcrd 0.1.0
|
|
288
|
+
```
|
|
289
|
+
|
|
290
|
+
### `pcrd analyze`
|
|
291
|
+
|
|
292
|
+
Analyze column padding for source tables. Read-only.
|
|
293
|
+
|
|
294
|
+
```bash
|
|
295
|
+
pcrd analyze [--config FILE] [--table TABLE] [--compare-target]
|
|
296
|
+
```
|
|
297
|
+
|
|
298
|
+
- `--table TABLE` — analyze only this table (default: all tables in config)
|
|
299
|
+
- `--compare-target` — connect to target and show source vs. target side-by-side, including type changes, renames, added/dropped columns, and padding delta
|
|
300
|
+
|
|
301
|
+
**Example output:**
|
|
302
|
+
```
|
|
303
|
+
Table: public.listings (50,000 rows)
|
|
304
|
+
|
|
305
|
+
Current layout:
|
|
306
|
+
┌─────────────────────┬──────────────────┬───────┬──────────┬────────────────┐
|
|
307
|
+
│ Column │ Type │ Align │ Size │ Padding before │
|
|
308
|
+
├─────────────────────┼──────────────────┼───────┼──────────┼────────────────┤
|
|
309
|
+
│ id │ integer │ 4B │ 4 │ — │
|
|
310
|
+
│ active │ boolean │ 1B │ 1 │ — │
|
|
311
|
+
│ listed_at │ timestamp │ 8B │ 8 │ ← 1 wasted │
|
|
312
|
+
...
|
|
313
|
+
|
|
314
|
+
Padding analysis:
|
|
315
|
+
Current row overhead (fixed cols + padding): 104 bytes
|
|
316
|
+
Optimal row overhead (fixed cols only): 84 bytes
|
|
317
|
+
Wasted padding: 20 bytes/row (19.2%)
|
|
318
|
+
At 50,000 rows: ~1.0 MB reclaimed by reordering columns
|
|
319
|
+
```
|
|
320
|
+
|
|
321
|
+
---
|
|
322
|
+
|
|
323
|
+
### `pcrd migrate`
|
|
324
|
+
|
|
325
|
+
Run the migration. Preflight → setup → backfill → streaming.
|
|
326
|
+
|
|
327
|
+
```bash
|
|
328
|
+
pcrd migrate [--config FILE] [--preflight-only] [--backfill-only] [--dry-run]
|
|
329
|
+
[--resume] [--yes] [--force-overwrite]
|
|
330
|
+
```
|
|
331
|
+
|
|
332
|
+
- `--preflight-only` — run all safety checks and print target DDL; do not start migration
|
|
333
|
+
- `--dry-run` — same as `--preflight-only`
|
|
334
|
+
- `--backfill-only` — copy existing rows only; do not start WAL streaming
|
|
335
|
+
- `--resume` — resume an interrupted migration from the last checkpoint
|
|
336
|
+
- `--yes` — skip the confirmation prompt
|
|
337
|
+
- `--force-overwrite` — drop and recreate target tables if they already exist
|
|
338
|
+
|
|
339
|
+
**Ctrl-C / SIGINT:** pcrd finishes the current batch or WAL event, writes the checkpoint, and exits cleanly, printing a `--resume` command you can copy. No progress is lost.
|
|
340
|
+
|
|
341
|
+
```
|
|
342
|
+
Migration interrupted. Resume with:
|
|
343
|
+
pcrd migrate --config migration.yml --resume
|
|
344
|
+
```
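
A common way to get this behavior is to have the signal handler set a flag and check it only between units of work. The sketch below shows that pattern under the assumption that work is split into batches; `BatchRunner` is a hypothetical class for illustration and not part of pcrd.

```ruby
# Runs batches one at a time and stops cleanly between batches on request.
class BatchRunner
  attr_reader :completed

  def initialize(batches)
    @batches = batches
    @completed = []
    @stop_requested = false
  end

  # Safe to call from a signal handler: it only sets a flag.
  def request_stop!
    @stop_requested = true
  end

  # The flag is checked before each batch, so a batch in progress always
  # finishes and is recorded before the loop exits.
  def run
    @batches.each do |batch|
      break if @stop_requested

      yield batch
      @completed << batch # stand-in for writing the checkpoint
    end
    @completed.size == @batches.size ? :finished : :interrupted
  end
end

runner = BatchRunner.new([1, 2, 3])
Signal.trap("INT") { runner.request_stop! }

status = runner.run { |batch| sleep 0.01 } # stand-in for copying one batch
puts "Migration interrupted. Resume with --resume" if status == :interrupted
```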
|
|
345
|
+
|
|
346
|
+
**Preflight checks performed:**
|
|
347
|
+
1. Source and target connectivity
|
|
348
|
+
2. `wal_level = logical` on source
|
|
349
|
+
3. `max_replication_slots` headroom
|
|
350
|
+
4. Source tables exist; row count estimate
|
|
351
|
+
5. Primary key present on every migrated table (required for upsert semantics)
|
|
352
|
+
6. Target tables do not already exist
|
|
353
|
+
7. All spec column names exist on source; all type casts are known
|
|
354
|
+
8. Data validation for validated casts (bigint→int range, text→varchar(n) length, etc.)
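
Check 8 amounts to scanning existing values for anything the narrower type cannot hold. A simplified sketch over in-memory values follows; in practice such a check would more likely run as a SQL query on the source, and `int4_violations` and `varchar_violations` are hypothetical names.

```ruby
# Range of PostgreSQL's 4-byte integer type.
INT4_RANGE = (-2_147_483_648..2_147_483_647)

# Values that would overflow a bigint -> integer cast. NULLs are ignored.
def int4_violations(values)
  values.compact.reject { |value| INT4_RANGE.cover?(value) }
end

# Strings longer than `limit` characters, which would not fit varchar(limit).
def varchar_violations(values, limit)
  values.compact.select { |value| value.length > limit }
end

ids = [1, 42, 2_147_483_648, nil]
puts int4_violations(ids).inspect # prints [2147483648]
```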
|
|
355
|
+
|
|
356
|
+
**Supported type changes:**
|
|
357
|
+
|
|
358
|
+
| Always safe (no validation) | Validated (data check required) |
|
|
359
|
+
|---|---|
|
|
360
|
+
| `smallint → integer/bigint` | `bigint → integer` |
|
|
361
|
+
| `integer → bigint` | `text/varchar → varchar(n)` |
|
|
362
|
+
| `float4 → float8` | `float8 → float4` (warn only) |
|
|
363
|
+
| `varchar(n) → text` | `timestamptz → timestamp` (warn only) |
|
|
364
|
+
| `timestamp → timestamptz` | `numeric → integer/bigint` |
|
|
365
|
+
| `date → timestamp/timestamptz` | |
|
|
366
|
+
| `integer/bigint → numeric` | |
|
|
367
|
+
|
|
368
|
+
---
|
|
369
|
+
|
|
370
|
+
### `pcrd demo`
|
|
371
|
+
|
|
372
|
+
Set up and seed a demo database for testing.
|
|
373
|
+
|
|
374
|
+
```bash
|
|
375
|
+
pcrd demo setup [--config FILE]
|
|
376
|
+
pcrd demo seed [--config FILE] [--rows N] [--seed N]
|
|
377
|
+
pcrd demo reset [--config FILE]
|
|
378
|
+
```
|
|
379
|
+
|
|
380
|
+
- `demo setup` — creates `users`, `agents`, and `listings` tables on source; writes a sample `pcrd.config.yml` if none exists. The `listings` table is intentionally ordered with poor column alignment to demonstrate the padding optimizer.
|
|
381
|
+
- `demo seed --rows N` — generates realistic fake data (N listings, proportional users and agents). Default: 50,000 rows. Reproducible with `--seed`.
|
|
382
|
+
- `demo reset` — drops all demo tables.
|
|
383
|
+
|
|
384
|
+
---
|
|
385
|
+
|
|
386
|
+
### `pcrd cutover`
|
|
387
|
+
|
|
388
|
+
Trigger the cutover sequence after lag reaches near-zero.
|
|
389
|
+
|
|
390
|
+
```bash
|
|
391
|
+
pcrd cutover [--config FILE] [--maintenance-confirmed]
|
|
392
|
+
```
|
|
393
|
+
|
|
394
|
+
The application must be in maintenance mode before running this command. See [Cutover Procedure](#cutover-procedure) below.
|
|
395
|
+
|
|
396
|
+
---
|
|
397
|
+
|
|
398
|
+
### `pcrd verify`
|
|
399
|
+
|
|
400
|
+
Compare row counts and spot-check rows across clusters.
|
|
401
|
+
|
|
402
|
+
```bash
|
|
403
|
+
pcrd verify [--config FILE] [--sample-size N]
|
|
404
|
+
```
|
|
405
|
+
|
|
406
|
+
---
|
|
407
|
+
|
|
408
|
+
### `pcrd status`
|
|
409
|
+
|
|
410
|
+
Show current migration phase, backfill progress, and live replication lag.
|
|
411
|
+
|
|
412
|
+
---
|
|
413
|
+
|
|
414
|
+
### `pcrd cleanup`
|
|
415
|
+
|
|
416
|
+
Drop replication slot, publication, and checkpoint. Optionally drop source tables.
|
|
417
|
+
|
|
418
|
+
---
|
|
419
|
+
|
|
420
|
+
## Cutover Procedure
|
|
421
|
+
|
|
422
|
+
When the lag meter shows "✓ Ready for cutover":
|
|
423
|
+
|
|
424
|
+
1. **Put the application in maintenance mode.** Options depending on your stack:
|
|
425
|
+
|
|
426
|
+
| Stack | Approach |
|
|
427
|
+
|---|---|
|
|
428
|
+
| **pgBouncer** | `PAUSE <database>` — queues connections instead of rejecting them |
|
|
429
|
+
| **Rails + Rack** | Enable maintenance middleware via file flag or env var |
|
|
430
|
+
| **Kubernetes** | `kubectl scale --replicas=0 deployment/app` |
|
|
431
|
+
| **Heroku** | `heroku maintenance:on` |
|
|
432
|
+
|
|
433
|
+
2. **Run cutover:** `pcrd cutover --maintenance-confirmed`
|
|
434
|
+
pcrd drains remaining lag to zero, advances target sequences, and verifies row counts.
|
|
435
|
+
|
|
436
|
+
3. **Switch connection strings:** Update `DATABASE_URL` (or equivalent) to point at the target cluster.
|
|
437
|
+
|
|
438
|
+
4. **Restart the application.**
|
|
439
|
+
|
|
440
|
+
5. **Verify:** `pcrd verify` — confirms row counts match across clusters.
|
|
441
|
+
|
|
442
|
+
6. **End maintenance mode** once the application is healthy on the target cluster.
|
|
443
|
+
|
|
444
|
+
7. **Cleanup** (days later, when confident): `pcrd cleanup`
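
The "Ready for cutover" signal depends on replication lag measured in bytes. A PostgreSQL LSN such as `16/B374D848` is two hexadecimal halves of a 64-bit WAL position, so lag can be computed by subtraction. The sketch below assumes lag is the distance between the source's current LSN and the position the consumer has confirmed; the method names are hypothetical and pcrd may measure lag differently.

```ruby
# Convert an LSN string ("hi/lo" in hex) to a 64-bit integer position.
def lsn_to_i(lsn)
  hi, lo = lsn.split("/").map { |part| part.to_i(16) }
  (hi << 32) | lo
end

# Bytes of WAL the consumer still has to process.
def lag_bytes(source_lsn, confirmed_lsn)
  lsn_to_i(source_lsn) - lsn_to_i(confirmed_lsn)
end

# Compare lag against a threshold such as `lag_threshold_bytes`.
def ready_for_cutover?(source_lsn, confirmed_lsn, threshold_bytes: 1_048_576)
  lag_bytes(source_lsn, confirmed_lsn) <= threshold_bytes
end

puts lag_bytes("0/200000", "0/100000")          # prints 1048576
puts ready_for_cutover?("0/200000", "0/100000") # prints true
```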
|
|
445
|
+
|
|
446
|
+
**Rollback:** Until you switch connection strings, rolling back means simply not cutting over: the source cluster keeps running unchanged and no data is lost.
|
|
447
|
+
|
|
448
|
+
---
|
|
449
|
+
|
|
450
|
+
## Column Padding Analysis
|
|
451
|
+
|
|
452
|
+
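PostgreSQL stores columns in definition order and aligns each one to its type's boundary, so a column that follows a smaller-alignment column may be preceded by padding. For example, a `boolean` followed by a `bigint` wastes 7 bytes, because the `bigint` must start at an 8-byte offset. The waste adds up when small-alignment columns (bool, smallint) are interleaved with large-alignment columns (bigint, timestamp).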
|
|
453
|
+
|
|
454
|
+
**Alignment rules:**
|
|
455
|
+
- 8 bytes: `bigint`, `float8`, `timestamp`, `timestamptz`
|
|
456
|
+
- 4 bytes: `integer`, `float4`, `date`, `numeric`/`text` headers
|
|
457
|
+
- 2 bytes: `smallint`
|
|
458
|
+
- 1 byte: `boolean`, `char`
|
|
459
|
+
|
|
460
|
+
**Optimal ordering:** 8-byte → 4-byte → 2-byte → 1-byte → variable-length
|
|
461
|
+
|
|
462
|
+
Since pcrd rewrites the table anyway during migration, column reordering is free — set `optimize_column_order: true` in the table config and pcrd applies the optimal ordering automatically.
|
|
463
|
+
|
|
464
|
+
The `pcrd analyze` command shows the current waste and estimated space reclaimed at current row count.
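
The padding arithmetic can be sketched in a few lines. This example covers only fixed-width types from the alignment rules above, ignores the tuple header, variable-length columns, and end-of-row alignment, and uses hypothetical names (`TYPE_LAYOUT`, `row_bytes`, `optimal_order`); it is not the analyzer pcrd ships.

```ruby
# [alignment, size] in bytes for a few fixed-width types.
TYPE_LAYOUT = {
  "bigint"      => [8, 8],
  "float8"      => [8, 8],
  "timestamp"   => [8, 8],
  "timestamptz" => [8, 8],
  "integer"     => [4, 4],
  "float4"      => [4, 4],
  "date"        => [4, 4],
  "smallint"    => [2, 2],
  "boolean"     => [1, 1]
}.freeze

# Bytes used by the columns in the given order, including alignment padding.
def row_bytes(types)
  types.reduce(0) do |offset, type|
    align, size = TYPE_LAYOUT.fetch(type)
    padding = (align - offset % align) % align
    offset + padding + size
  end
end

# Order columns by descending alignment, keeping the original order for ties.
def optimal_order(types)
  types.each_with_index
       .sort_by { |type, index| [-TYPE_LAYOUT.fetch(type).first, index] }
       .map(&:first)
end

columns = %w[boolean bigint smallint timestamp integer]
puts row_bytes(columns)                # prints 36
puts row_bytes(optimal_order(columns)) # prints 23
```

In this example, reordering saves 13 bytes per row: 7 bytes of padding before `bigint` and 6 before `timestamp` disappear.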
|
|
465
|
+
|
|
466
|
+
---
|
|
467
|
+
|
|
468
|
+
## Source Database Requirements
|
|
469
|
+
|
|
470
|
+
```sql
|
|
471
|
+
-- Grant replication capability
|
|
472
|
+
ALTER ROLE pcrd_replication REPLICATION;
|
|
473
|
+
|
|
474
|
+
-- Grant read access to migrated tables
|
|
475
|
+
GRANT SELECT ON TABLE listings, users TO pcrd_replication;
|
|
476
|
+
|
|
477
|
+
-- Allow publication creation (requires CREATE on the database; the role must also own the published tables)
|
|
478
|
+
GRANT CREATE ON DATABASE myapp_production TO pcrd_replication;
|
|
479
|
+
```
|
|
480
|
+
|
|
481
|
+
`postgresql.conf` must have:
|
|
482
|
+
```
|
|
483
|
+
wal_level = logical
|
|
484
|
+
max_replication_slots = <current + number of concurrent pcrd migrations>
|
|
485
|
+
max_wal_senders = <current + number of concurrent pcrd migrations>
|
|
486
|
+
```
|
|
487
|
+
|
|
488
|
+
---
|
|
489
|
+
|
|
490
|
+
## Example Project
|
|
491
|
+
|
|
492
|
+
`examples/listings_migration/` contains a complete end-to-end demo:
|
|
493
|
+
- **Docker Compose** environment: source cluster, target cluster, Rails API app
|
|
494
|
+
- **Annotated `migration.yml`** showing all supported change types
|
|
495
|
+
- **Operator runbook** walking through every step from setup to cleanup
|
|
496
|
+
|
|
497
|
+
See [`examples/listings_migration/runbook.md`](https://github.com/charlesharris/pcrd/blob/main/examples/listings_migration/runbook.md).
|
|
498
|
+
|
|
499
|
+
---
|
|
500
|
+
|
|
501
|
+
## Development
|
|
502
|
+
|
|
503
|
+
```bash
|
|
504
|
+
git clone https://github.com/charris/pcrd
|
|
505
|
+
cd pcrd
|
|
506
|
+
bundle install
|
|
507
|
+
|
|
508
|
+
# Build and install the gem so `pcrd` is on your PATH
|
|
509
|
+
gem build pcrd.gemspec
|
|
510
|
+
gem install pcrd-0.1.0.gem
|
|
511
|
+
|
|
512
|
+
# Start dev PostgreSQL containers
|
|
513
|
+
docker compose -f dev/docker-compose.yml up -d
|
|
514
|
+
|
|
515
|
+
# Run tests
|
|
516
|
+
bundle exec rspec
|
|
517
|
+
|
|
518
|
+
# Run integration tests only
|
|
519
|
+
bundle exec rspec spec/integration/
|
|
520
|
+
|
|
521
|
+
# Run a quick end-to-end demo
|
|
522
|
+
pcrd demo setup
|
|
523
|
+
pcrd demo seed --rows 10000
|
|
524
|
+
pcrd analyze
|
|
525
|
+
pcrd migrate --preflight-only
|
|
526
|
+
pcrd migrate --backfill-only --yes
|
|
527
|
+
```
|
|
528
|
+
|
|
529
|
+
> After making code changes locally, re-run `gem build pcrd.gemspec && gem install pcrd-0.1.0.gem` to pick them up, or use `bundle exec bin/pcrd` to run from source without reinstalling.
|
|
530
|
+
|
|
531
|
+
### Test environment
|
|
532
|
+
|
|
533
|
+
Integration tests require both containers from `dev/docker-compose.yml`. Override connection details with environment variables:
|
|
534
|
+
|
|
535
|
+
```bash
|
|
536
|
+
PCRD_TEST_SOURCE_HOST=localhost PCRD_TEST_SOURCE_PORT=5433 \
|
|
537
|
+
PCRD_TEST_SOURCE_DB=pcrd_source PCRD_TEST_SOURCE_USER=postgres \
|
|
538
|
+
PCRD_TEST_SOURCE_PASSWORD=postgres \
|
|
539
|
+
PCRD_TEST_TARGET_HOST=localhost PCRD_TEST_TARGET_PORT=5434 \
|
|
540
|
+
PCRD_TEST_TARGET_DB=pcrd_target PCRD_TEST_TARGET_USER=postgres \
|
|
541
|
+
PCRD_TEST_TARGET_PASSWORD=postgres \
|
|
542
|
+
bundle exec rspec
|
|
543
|
+
```
|
|
544
|
+
|
|
545
|
+
---
|
|
546
|
+
|
|
547
|
+
## Architecture Notes
|
|
548
|
+
|
|
549
|
+
### Why cross-cluster?
|
|
550
|
+
|
|
551
|
+
Running source and target as separate PostgreSQL servers supports more than just schema changes:
|
|
552
|
+
- **Version upgrades**: migrate from PG 14 to PG 16 with zero downtime
|
|
553
|
+
- **Cloud migrations**: move from on-premise to RDS, from AWS to GCP, etc.
|
|
554
|
+
- **Hardware changes**: move to larger instances without downtime
|
|
555
|
+
- **Schema changes**: the original use case — column type changes, renames, reordering
|
|
556
|
+
|
|
557
|
+
### Why pgoutput?
|
|
558
|
+
|
|
559
|
+
`pgoutput` is PostgreSQL's built-in logical replication plugin (available since PG 10). No extensions are required on the source server. This makes pcrd work with managed PostgreSQL services (RDS, Cloud SQL, etc.) that restrict extension installation.
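
To give a feel for the `pgoutput` wire format, here is a small sketch that decodes a Begin message: a one-byte tag `B`, the transaction's final LSN (64-bit), the commit timestamp in microseconds since 2000-01-01 UTC (64-bit), and the transaction ID (32-bit), all big-endian. `BeginMessage` and `parse_begin` are names invented for this example and do not reflect pcrd's parser.

```ruby
# Microseconds between the Unix epoch and PostgreSQL's epoch (2000-01-01 UTC).
PG_EPOCH_OFFSET_USEC = 946_684_800 * 1_000_000

BeginMessage = Struct.new(:final_lsn, :commit_time, :xid)

# Decode the payload of a pgoutput Begin message.
def parse_begin(payload)
  tag, lsn, pg_usec, xid = payload.unpack("aQ>q>N")
  raise ArgumentError, "not a Begin message" unless tag == "B"

  secs, micros = (pg_usec + PG_EPOCH_OFFSET_USEC).divmod(1_000_000)
  BeginMessage.new(lsn, Time.at(secs, micros, :usec).utc, xid)
end

# Build a sample payload with the same layout and decode it.
sample = ["B", 0x16B374D848, 0, 742].pack("aQ>q>N")
message = parse_begin(sample)
puts message.xid # prints 742
```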
|
|
560
|
+
|
|
561
|
+
### Backfill / streaming overlap
|
|
562
|
+
|
|
563
|
+
The replication slot is created before backfill starts. This ensures all WAL changes during backfill are retained. The WAL consumer runs concurrently with backfill, buffering events. When backfill completes, the apply engine replays buffered events before transitioning to live streaming. Because the apply engine uses `INSERT ... ON CONFLICT DO UPDATE`, rows that appear in both the bulk copy and the WAL stream are handled correctly — WAL wins.
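
The upsert statement mentioned above can be generated from a column list and the primary key. This is an illustrative sketch only: `upsert_sql` is a hypothetical helper, and identifiers are interpolated without quoting, which real code should not do.

```ruby
# Build an INSERT ... ON CONFLICT statement with positional parameters.
# Non-key columns are overwritten from the incoming row (EXCLUDED), so a WAL
# event applied after the bulk copy replaces the copied values.
def upsert_sql(table, columns, primary_key)
  placeholders = (1..columns.size).map { |i| "$#{i}" }.join(", ")
  updates = (columns - primary_key).map { |col| "#{col} = EXCLUDED.#{col}" }
  action = updates.empty? ? "DO NOTHING" : "DO UPDATE SET #{updates.join(', ')}"

  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES (#{placeholders}) " \
    "ON CONFLICT (#{primary_key.join(', ')}) #{action}"
end

puts upsert_sql("listings", %w[id list_price active], %w[id])
```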
|
|
564
|
+
|
|
565
|
+
### Primary key requirement
|
|
566
|
+
|
|
567
|
+
Every migrated table must have a primary key or unique not-null index. This is a hard requirement: without a unique key, the apply engine cannot safely handle the backfill/streaming overlap window (it cannot know whether a WAL insert is a concurrent new write or a duplicate of something already bulk-copied).
|
|
568
|
+
|
|
569
|
+
---
|
|
570
|
+
|
|
571
|
+
## Known Limitations
|
|
572
|
+
|
|
573
|
+
- **Sequences** — target sequences are advanced as part of `pcrd cutover`. The command computes `max(id)` on source and calls `setval` on target with a configurable safety buffer.
|
|
574
|
+
- **Foreign keys** — FK constraints on the target are listed in the preflight output but not automatically created. Add them post-cutover.
|
|
575
|
+
- **Non-PK indexes** — like FK constraints, these are listed in the preflight report. Create them on the target before cutover for query performance.
|
|
576
|
+
- **Large objects** — `pg_largeobject` data is not replicated via logical replication.
|
|
577
|
+
- **Generated columns** — pcrd creates these without the GENERATED clause; values are recomputed by the target database.
|
|
578
|
+
- **DDL during migration** — if a column is added or dropped on the source after the migration starts, pcrd halts with a clear error rather than silently corrupting data.
|
|
579
|
+
- **Partitioned tables** — supported but each partition must be listed individually in the config.
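
For the sequence handling described in the first item, the statement run on the target might look like the output of the sketch below. The sequence name `listings_id_seq` and the helper `setval_sql` are assumptions for illustration; the buffer corresponds to `cutover.sequence_buffer`.

```ruby
# Build a setval call that moves a target sequence past the source's max(id).
# After setval(seq, n), the next nextval(seq) returns n + 1.
def setval_sql(sequence_name, source_max_id, buffer: 1_000)
  next_value = source_max_id.to_i + buffer # nil (empty table) counts as 0
  format("SELECT setval('%s', %d)", sequence_name.gsub("'", "''"), next_value)
end

puts setval_sql("listings_id_seq", 50_000)
# prints SELECT setval('listings_id_seq', 51000)
```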
|
|
580
|
+
|
|
581
|
+
---
|
|
582
|
+
|
|
583
|
+
## Project Status
|
|
584
|
+
|
|
585
|
+
| Phase | Status | Description |
|
|
586
|
+
|---|---|---|
|
|
587
|
+
| Config loading | ✅ | YAML config, typed structs, env-var passwords |
|
|
588
|
+
| Schema reader | ✅ | pg_attribute query, column metadata |
|
|
589
|
+
| Padding analyzer | ✅ | Optimal column ordering, space savings estimate |
|
|
590
|
+
| `pcrd analyze` | ✅ | Source-only and --compare-target |
|
|
591
|
+
| Type transformer | ✅ | Cast safety rules, data validation |
|
|
592
|
+
| DDL generation | ✅ | CREATE TABLE from spec + source schema |
|
|
593
|
+
| Preflight | ✅ | All 8 safety checks |
|
|
594
|
+
| `pcrd migrate --preflight-only` | ✅ | Full preflight report + DDL preview |
|
|
595
|
+
| Checkpoint store | ✅ | SQLite per-batch progress tracking |
|
|
596
|
+
| Backfill engine | ✅ | Keyset-paginated COPY, resumable |
|
|
597
|
+
| `pcrd migrate --backfill-only` | ✅ | Full backfill with progress display |
|
|
598
|
+
| pgoutput parser | ✅ | All message types, binary protocol |
|
|
599
|
+
| WAL consumer | ✅ | Background thread, transaction buffering |
|
|
600
|
+
| Apply engine | ✅ | Upsert/update/delete on target |
|
|
601
|
+
| `pcrd migrate` (full) | ✅ | Backfill + streaming + lag meter |
|
|
602
|
+
| `pcrd demo setup/seed` | ✅ | Demo database with realistic schema |
|
|
603
|
+
| `pcrd cutover` | ✅ | Sequence advancement, drain, verify |
|
|
604
|
+
| `pcrd verify` | ✅ | Row counts + spot-check |
|
|
605
|
+
| `pcrd status` | ✅ | Live lag meter from checkpoint |
|
|
606
|
+
| `pcrd cleanup` | ✅ | Drop slot/pub/checkpoint |
|
|
607
|
+
| Docker Compose example | ✅ | Rails app + runbook |
|
|
608
|
+
| Full polish + README | ✅ | |
|
|
609
|
+
|
|
610
|
+
---
|
|
611
|
+
|
|
612
|
+
## License
|
|
613
|
+
|
|
614
|
+
MIT
|