pg_online_schema_change 0.4.0 → 0.5.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 8ef1069f7d159544838ec27cf518f98eb31609a0a2e1878ebd40246b10a6b534
4
- data.tar.gz: 8e83cbac78164fb10870537bf4867020eae7cf06f4cd91930a2c8846fbc76c7c
3
+ metadata.gz: b14af6aa2b98ab8f1b2aab5ae0a8555e017f92531ce1214e50f2ac62ff354224
4
+ data.tar.gz: 66ba81f4e90f4dc612d863d043838fec8676e468924eeb5c39d21094d002e48a
5
5
  SHA512:
6
- metadata.gz: 2657c94b3b07730aa3bb282f34cd1ee95f0549d3649b39a61be3ce1b6d2ba5832a1914b6a9788778c912464cbd9773b5e967ed4e6c896127d60e62a0daa76c99
7
- data.tar.gz: 4788523d691c25045b7f3c539f9743d880fabcbb01887ae34ec55a6c8a829804a9f4d43f619e79e0c44b1c9fff6ea88361fb5ac1111bc1f491aa17e6e43336d0
6
+ metadata.gz: a272a3749f8f053a528d859cc35b97b5fde7d6353a3c0c710500e35d33df6b4bc8526b18e855fd154b742eef8189b661e8e710a6b25221df6e47d55a47381a2e
7
+ data.tar.gz: c0dd7f41e204840b54b768df1f1e8486f79038bede199021ef13684c4b1369e3de43be6fbfd8f03b7ca652251c59a8a580ecefb235e00e54858d77f88a7334de
data/CHANGELOG.md CHANGED
@@ -1,3 +1,11 @@
1
+ ## [0.4.0] - 2022-02-22
2
+ * Lint sourcecode, setup Rubocop proper and Lint in CI by @shayonj in https://github.com/shayonj/pg-osc/pull/46
3
+ * Uniquely identify operation_type column by @shayonj in https://github.com/shayonj/pg-osc/pull/50
4
+ * Introduce primary key on audit table for ordered reads by @shayonj in https://github.com/shayonj/pg-osc/pull/49
5
+ - This addresses an edge case with replay.
6
+ * Uniquely identify trigger_time column by @shayonj in https://github.com/shayonj/pg-osc/pull/51
7
+ * Abstract assertions into a helper function by @shayonj in https://github.com/shayonj/pg-osc/pull/52
8
+
1
9
  ## [0.3.0] - 2022-02-21
2
10
 
3
11
  - Explicitly call dependencies and bump dependencies by @shayonj https://github.com/shayonj/pg-osc/pull/44
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- pg_online_schema_change (0.3.0)
4
+ pg_online_schema_change (0.4.0)
5
5
  ougai (~> 2.0.0)
6
6
  pg (~> 1.3.2)
7
7
  pg_query (~> 2.1.3)
@@ -14,7 +14,6 @@ GEM
14
14
  coderay (1.1.3)
15
15
  diff-lcs (1.5.0)
16
16
  google-protobuf (3.19.4)
17
- google-protobuf (3.19.4-x86_64-linux)
18
17
  method_source (1.0.0)
19
18
  oj (3.13.11)
20
19
  ougai (2.0.0)
data/README.md CHANGED
@@ -16,8 +16,8 @@ pg-online-schema-change (`pg-osc`) is a tool for making schema changes (any `ALT
16
16
  - [Installation](#installation)
17
17
  - [Requirements](#requirements)
18
18
  - [Usage](#usage)
19
- - [How does it work](#how-does-it-work)
20
19
  - [Prominent features](#prominent-features)
20
+ - [Load test](#load-test)
21
21
  - [Examples](#examples)
22
22
  * [Renaming a column](#renaming-a-column)
23
23
  * [Multiple ALTER statements](#multiple-alter-statements)
@@ -25,6 +25,7 @@ pg-online-schema-change (`pg-osc`) is a tool for making schema changes (any `ALT
25
25
  * [Backfill data](#backfill-data)
26
26
  * [Running using Docker](#running-using-docker)
27
27
  - [Caveats](#caveats)
28
+ - [How does it work](#how-does-it-work)
28
29
  - [Development](#development)
29
30
  - [Releasing](#releasing)
30
31
  - [Contributing](#contributing)
@@ -75,13 +76,17 @@ Options:
75
76
  -u, --username=USERNAME # Username for the Database
76
77
  -p, --port=N # Port for the Database
77
78
  # Default: 5432
78
- -w, --password=PASSWORD # Password for the Database
79
+ -w, --password=PASSWORD # DEPRECATED: Password for the Database. Please pass PGPASSWORD environment variable instead.
79
80
  -v, [--verbose], [--no-verbose] # Emit logs in debug mode
80
81
  -f, [--drop], [--no-drop] # Drop the original table in the end after the swap
81
82
  -k, [--kill-backends], [--no-kill-backends] # Kill other competing queries/backends when trying to acquire lock for the shadow table creation and swap. It will wait for --wait-time-for-lock duration before killing backends and try upto 3 times.
82
83
  -w, [--wait-time-for-lock=N] # Time to wait before killing backends to acquire lock and/or retrying upto 3 times. It will kill backends if --kill-backends is true, otherwise try upto 3 times and exit if it cannot acquire a lock.
83
84
  # Default: 10
84
- -c, [--copy-statement=COPY_STATEMENT] # Takes a .sql file location where you can provide a custom query to be played (ex: backfills) when pg-osc copies data from the primary to the shadow table. More examples in README.
85
+ -c, [--copy-statement=COPY_STATEMENT] # Takes a .sql file location where you can provide a custom query to be played (ex: backfills) when pgosc copies data from the primary to the shadow table. More examples in README.
86
+ -b, [--pull-batch-count=N] # Number of rows to be replayed on each iteration after copy. This can be tuned for faster catch up and swap. Best used with delta-count.
87
+ # Default: 1000
88
+ -e, [--delta-count=N] # Indicates how many rows should be remaining before a swap should be performed. This can be tuned for faster catch up and swap, especially on highly volume tables. Best used with pull-batch-count.
89
+ # Default: 20
85
90
  ```
86
91
 
87
92
  ```
@@ -90,29 +95,6 @@ Usage:
90
95
 
91
96
  print the version
92
97
  ```
93
- ## How does it work
94
-
95
- - **Primary table**: A table against which a potential schema change is to be run
96
- - **Shadow table**: A copy of an existing primary table
97
- - **Audit table**: A table to store any updates/inserts/delete on a primary table
98
-
99
- ![how-it-works](diagrams/how-it-works.png)
100
-
101
-
102
- 1. Create an audit table to record changes made to the parent table.
103
- 2. Acquire a brief `ACCESS EXCLUSIVE` lock to add a trigger on the parent table (for inserts, updates, deletes) to the audit table.
104
- 3. Create a new shadow table and run ALTER/migration on the shadow table.
105
- 4. Copy all rows from the old table.
106
- 5. Build indexes on the new table.
107
- 6. Replay all changes accumulated in the audit table against the shadow table.
108
- - Delete rows in the audit table as they are replayed.
109
- 7. Once the delta (remaining rows) is ~20 rows, acquire an `ACCESS EXCLUSIVE` lock against the parent table within a transaction and:
110
- - swap table names (shadow table <> parent table).
111
- - update references in other tables (FKs) by dropping and re-creating the FKs with a `NOT VALID`.
112
- 8. Runs `ANALYZE` on the new table.
113
- 9. Validates all FKs that were added with `NOT VALID`.
114
- 10. Drop parent (now old) table (OPTIONAL).
115
-
116
98
  ## Prominent features
117
99
  - `pg-osc` supports when a column is being added, dropped or renamed with no data loss.
118
100
  - `pg-osc` acquires minimal locks throughout the process (read more below on the caveats).
@@ -121,26 +103,30 @@ print the version
121
103
  - Backfill old/new columns as data is copied from primary table to shadow table, and then perform the swap. [Example](#backfill-data)
122
104
  - **TBD**: Ability to reverse the change with no data loss. [tracking issue](https://github.com/shayonj/pg-osc/issues/14)
123
105
 
106
+ ## Load test
107
+
108
+ [More about the preliminary load test figures here](docs/load-test.md)
109
+
124
110
  ## Examples
125
111
 
126
112
  ### Renaming a column
127
113
  ```
114
+ export PGPASSWORD=""
128
115
  pg-online-schema-change perform \
129
116
  --alter-statement 'ALTER TABLE books RENAME COLUMN email TO new_email' \
130
117
  --dbname "postgres" \
131
118
  --host "localhost" \
132
119
  --username "jamesbond" \
133
- --password "" \
134
120
  ```
135
121
 
136
122
  ### Multiple ALTER statements
137
123
  ```
124
+ export PGPASSWORD=""
138
125
  pg-online-schema-change perform \
139
126
  --alter-statement 'ALTER TABLE books ADD COLUMN "purchased" BOOLEAN DEFAULT FALSE; ALTER TABLE books RENAME COLUMN email TO new_email;' \
140
127
  --dbname "postgres" \
141
128
  --host "localhost" \
142
129
  --username "jamesbond" \
143
- --password "" \
144
130
  --drop
145
131
  ```
146
132
 
@@ -148,13 +134,30 @@ pg-online-schema-change perform \
148
134
  If the operation is being performed on a busy table, you can use `pg-osc`'s `kill-backend` functionality to kill other backends that may be competing with the `pg-osc` operation to acquire a lock for a brief while. The `ACCESS EXCLUSIVE` lock acquired by `pg-osc` is only held for a brief while and released after. You can tune how long `pg-osc` should wait before killing other backends (or if at all `pg-osc` should kill backends in the first place).
149
135
 
150
136
  ```
137
+ export PGPASSWORD=""
151
138
  pg-online-schema-change perform \
152
139
  --alter-statement 'ALTER TABLE books ADD COLUMN "purchased" BOOLEAN DEFAULT FALSE;' \
153
140
  --dbname "postgres" \
154
141
  --host "localhost" \
155
142
  --username "jamesbond" \
156
- --password "" \
157
- --wait-time-for-lock=5 \
143
+ --wait-time-for-lock 5 \
144
+ --kill-backends \
145
+ --drop
146
+ ```
147
+
148
+ ### Replaying larger workloads
149
+ If you have a table with high write volume, the default replay settings may not suffice. By default, `pg-osc` replays 1000 rows (`pull-batch-count`) from the audit table per iteration and waits until the remaining row count (`delta-count`) in the audit table is 20 or fewer before making the swap. You can raise both values for faster catch-up on these kinds of workloads.
150
+
151
+ ```
152
+ export PGPASSWORD=""
153
+ pg-online-schema-change perform \
154
+ --alter-statement 'ALTER TABLE books ADD COLUMN "purchased" BOOLEAN DEFAULT FALSE;' \
155
+ --dbname "postgres" \
156
+ --host "localhost" \
157
+ --username "jamesbond" \
158
+ --pull-batch-count 2000 \
159
+ --delta-count 500 \
160
+ --wait-time-for-lock 5 \
158
161
  --kill-backends \
159
162
  --drop
160
163
  ```
@@ -183,7 +186,6 @@ pg-online-schema-change perform \
183
186
  --dbname "postgres" \
184
187
  --host "localhost" \
185
188
  --username "jamesbond" \
186
- --password "" \
187
189
  --copy-statement "/src/query.sql" \
188
190
  --drop
189
191
  ```
@@ -197,7 +199,6 @@ docker run --network host -it --rm shayonj/pg-osc:latest \
197
199
  --dbname "postgres" \
198
200
  --host "localhost" \
199
201
  --username "jamesbond" \
200
- --password "" \
201
202
  --drop
202
203
  ```
203
204
  ## Caveats
@@ -215,6 +216,29 @@ docker run --network host -it --rm shayonj/pg-osc:latest \
215
216
  - Can be fixed in future releases. Feel free to open a feature req.
216
217
  - Foreign keys are dropped & re-added to referencing tables with a `NOT VALID`. A follow on `VALIDATE CONSTRAINT` is run.
217
218
  - Ensures that integrity is maintained and re-introducing FKs doesn't acquire additional locks, hence the `NOT VALID`.
219
+ ## How does it work
220
+
221
+ - **Primary table**: A table against which a potential schema change is to be run
222
+ - **Shadow table**: A copy of an existing primary table
223
+ - **Audit table**: A table to store any updates/inserts/delete on a primary table
224
+
225
+ ![how-it-works](docs/how-it-works.png)
226
+
227
+
228
+ 1. Create an audit table to record changes made to the parent table.
229
+ 2. Acquire a brief `ACCESS EXCLUSIVE` lock to add a trigger on the parent table (for inserts, updates, deletes) to the audit table.
230
+ 3. Create a new shadow table and run ALTER/migration on the shadow table.
231
+ 4. Copy all rows from the old table.
232
+ 5. Build indexes on the new table.
233
+ 6. Replay all changes accumulated in the audit table against the shadow table.
234
+ - Delete rows in the audit table as they are replayed.
235
+ 7. Once the delta (remaining rows) is ~20 rows, acquire an `ACCESS EXCLUSIVE` lock against the parent table within a transaction and:
236
+ - swap table names (shadow table <> parent table).
237
+ - update references in other tables (FKs) by dropping and re-creating the FKs with a `NOT VALID`.
238
+ 8. Runs `ANALYZE` on the new table.
239
+ 9. Validates all FKs that were added with `NOT VALID`.
240
+ 10. Drop parent (now old) table (OPTIONAL).
241
+
218
242
  ## Development
219
243
 
220
244
  - Install ruby 3.0
File without changes
File without changes
Binary file
data/docs/load-test.md ADDED
@@ -0,0 +1,138 @@
1
+ # Preliminary Load Test
2
+
3
+ ## pg-osc: No downtime schema changes with 7K+ writes/s & 12k+ reads/s
4
+
5
+ This is a very basic load test performed with `pgbench` against a single-instance PostgreSQL DB running on DigitalOcean with the following configuration:
6
+
7
+ - **128GB RAM**
8
+ - **32vCPU**
9
+ - **695GB Disk**
10
+ - Transaction-based connection pool with **500 pool limit**
11
+
12
+ Total time taken to run schema change: **<3mins**
13
+
14
+ ## Simulating load with pgbench
15
+
16
+ **Initialize**
17
+ ```
18
+ pgbench -p $PORT --initialize -s 20 -F 20 --foreign-keys --host $HOST -U $USERNAME -d $DB
19
+ ```
20
+
21
+ This creates a bunch of pgbench tables. The table being used with `pg-osc` is `pgbench_accounts`, which contains 2M rows, has FKs of its own, and is also referenced by FKs from other tables.
22
+
23
+ **Begin**
24
+ ```
25
+ pgbench -p $PORT -j 72 -c 288 -T 500 -r --host $HOST -U $USERNAME -d $DB
26
+ ```
27
+
28
+ ## Running pg-osc
29
+
30
+ Simple `ALTER` statement for experimentation purposes.
31
+
32
+ ```sql
33
+ ALTER TABLE pgbench_accounts ADD COLUMN "purchased" BOOLEAN DEFAULT FALSE;
34
+ ```
35
+
36
+ **Execution**
37
+
38
+ ```bash
39
+ bundle exec bin/pg-online-schema-change perform \
40
+ -a 'ALTER TABLE pgbench_accounts ADD COLUMN "purchased" BOOLEAN DEFAULT FALSE;' \
41
+ -d "pool" \
42
+ -p 25061 \
43
+ -h "..." \
44
+ -u "..." \
45
+ --pull-batch-count 2000 \
46
+ --delta-count 200
47
+ ```
48
+
49
+ ## Outcome
50
+
51
+ **pgbench results**
52
+
53
+ ```
54
+ number of transactions actually processed: 1060382
55
+ latency average = 144.874 ms
56
+ tps = 1767.057392 (including connections establishing)
57
+ tps = 1777.971823 (excluding connections establishing)
58
+ statement latencies in milliseconds:
59
+ 0.479 \set aid random(1, 100000 * :scale)
60
+ 0.409 \set bid random(1, 1 * :scale)
61
+ 0.247 \set tid random(1, 10 * :scale)
62
+ 0.208 \set delta random(-5000, 5000)
63
+ 3.136 BEGIN;
64
+ 4.243 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
65
+ 4.488 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
66
+ 71.017 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
67
+ 46.689 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
68
+ 4.035 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
69
+ 4.166 END;
70
+ ```
71
+
72
+ **Metrics**
73
+ ![load-test](load-test-1.png)
74
+
75
+ **New table structure**
76
+
77
+ Added `purchased` column.
78
+
79
+ ```
80
+ defaultdb=> \d+ pgbench_accounts;
81
+ Table "public.pgbench_accounts"
82
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
83
+ -----------+---------------+-----------+----------+---------+----------+--------------+-------------
84
+ aid | integer | | not null | | plain | |
85
+ bid | integer | | | | plain | |
86
+ abalance | integer | | | | plain | |
87
+ filler | character(84) | | | | extended | |
88
+ purchased | boolean | | | false | plain | |
89
+ Indexes:
90
+ "pgosc_st_pgbench_accounts_815029_pkey" PRIMARY KEY, btree (aid)
91
+ Foreign-key constraints:
92
+ "pgbench_accounts_bid_fkey" FOREIGN KEY (bid) REFERENCES pgbench_branches(bid)
93
+ Referenced by:
94
+ TABLE "pgbench_history" CONSTRAINT "pgbench_history_aid_fkey" FOREIGN KEY (aid) REFERENCES pgbench_accounts(aid)
95
+ Options: autovacuum_enabled=false, fillfactor=20
96
+ ```
97
+
98
+ **Logs**
99
+
100
+ <details>
101
+ <summary>Logs from pg-osc</summary>
102
+
103
+ ```json
104
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:01.147-05:00","v":0,"msg":"Setting up audit table","audit_table":"pgosc_at_pgbench_accounts_714a8b","version":"0.4.0"}
105
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:01.660-05:00","v":0,"msg":"Setting up triggers","version":"0.4.0"}
106
+ NOTICE: trigger "primary_to_audit_table_trigger" for relation "pgbench_accounts" does not exist, skipping
107
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:01.814-05:00","v":0,"msg":"Setting up shadow table","shadow_table":"pgosc_st_pgbench_accounts_714a8b","version":"0.4.0"}
108
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:02.169-05:00","v":0,"msg":"Running alter statement on shadow table","shadow_table":"pgosc_st_pgbench_accounts_714a8b","parent_table":"pgbench_accounts","version":"0.4.0"}
109
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:02.204-05:00","v":0,"msg":"Clearing contents of audit table before copy..","shadow_table":"pgosc_st_pgbench_accounts_714a8b","parent_table":"pgbench_accounts","version":"0.4.0"}
110
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:02.240-05:00","v":0,"msg":"Copying contents..","shadow_table":"pgosc_st_pgbench_accounts_714a8b","parent_table":"pgbench_accounts","version":"0.4.0"}
111
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:20.481-05:00","v":0,"msg":"Performing ANALYZE!","version":"0.4.0"}
112
+ INFO: analyzing "public.pgbench_accounts"
113
+ INFO: "pgbench_accounts": scanned 30000 of 166667 pages, containing 360000 live rows and 200 dead rows; 30000 rows in sample, 2000004 estimated total rows
114
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:21.078-05:00","v":0,"msg":"Replaying rows, count: 2000","version":"0.4.0"}
115
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:21.580-05:00","v":0,"msg":"Replaying rows, count: 2000","version":"0.4.0"}
116
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:22.022-05:00","v":0,"msg":"Replaying rows, count: 2000","version":"0.4.0"}
117
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:22.490-05:00","v":0,"msg":"Replaying rows, count: 2000","version":"0.4.0"}
118
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:22.866-05:00","v":0,"msg":"Replaying rows, count: 661","version":"0.4.0"}
119
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:23.212-05:00","v":0,"msg":"Replaying rows, count: 533","version":"0.4.0"}
120
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:23.512-05:00","v":0,"msg":"Replaying rows, count: 468","version":"0.4.0"}
121
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:23.809-05:00","v":0,"msg":"Remaining rows below delta count, proceeding towards swap","version":"0.4.0"}
122
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:23.809-05:00","v":0,"msg":"Performing swap!","version":"0.4.0"}
123
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:24.259-05:00","v":0,"msg":"Replaying rows, count: 449","version":"0.4.0"}
124
+ NOTICE: trigger "primary_to_audit_table_trigger" for relation "pgbench_accounts" does not exist, skipping
125
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:24.650-05:00","v":0,"msg":"Performing ANALYZE!","version":"0.4.0"}
126
+ INFO: analyzing "public.pgbench_accounts"
127
+ INFO: "pgbench_accounts": scanned 30000 of 32935 pages, containing 1821834 live rows and 6056 dead rows; 30000 rows in sample, 2000070 estimated total rows
128
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:24.941-05:00","v":0,"msg":"Validating constraints!","version":"0.4.0"}
129
+ NOTICE: table "pgosc_st_pgbench_accounts_714a8b" does not exist, skipping
130
+ {"name":"pg-online-schema-change","hostname":"MacBook-Pro.local","pid":13263,"level":30,"time":"2022-02-25T17:22:26.159-05:00","v":0,"msg":"All tasks successfully completed","version":"0.4.0"}
131
+ ```
132
+
133
+ </details>
134
+
135
+
136
+ ## Conclusion
137
+
138
+ By raising `--pull-batch-count` to `2000` (replay 2k rows at once) and `--delta-count` to `200` (swap once 200 or fewer rows remain), `pg-osc` performed the schema change quickly and with no observed impact on the running workload. Depending on the database size and the load on the table, you can tune these values further. At some point this will plateau: I can imagine the replay approach not working well for, say, 100k commits/s workloads. So, YMMV.
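As a rough cross-check of the run described in load-test.md, the timestamps and replay counts in the `pg-osc` log excerpt can be tallied directly. The snippet below is only an illustration: it hard-codes the first and last log timestamps and the `Replaying rows, count: N` values from that excerpt, and the variable names are made up for this sketch rather than taken from the gem.

```ruby
require "time"

# Values copied by hand from the pg-osc log excerpt (not produced by any pg-osc API).
started_at  = Time.iso8601("2022-02-25T17:22:01.147-05:00") # "Setting up audit table"
finished_at = Time.iso8601("2022-02-25T17:22:26.159-05:00") # "All tasks successfully completed"
replayed_counts = [2000, 2000, 2000, 2000, 661, 533, 468, 449]

elapsed_seconds = (finished_at - started_at).round(3) # wall-clock span of the logged run
total_replayed  = replayed_counts.sum                 # rows replayed from the audit table
full_batches    = replayed_counts.count(2000)         # iterations that hit --pull-batch-count

puts "elapsed: #{elapsed_seconds}s"
puts "rows replayed: #{total_replayed}"
puts "full batches: #{full_batches}"
```

For this particular log, the span between the first and last message is about 25 seconds, which is consistent with the "<3mins" figure quoted at the top of the load test.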
@@ -3,8 +3,10 @@
3
3
  require "thor"
4
4
 
5
5
  module PgOnlineSchemaChange
6
+ PULL_BATCH_COUNT = 1000
7
+ DELTA_COUNT = 20
6
8
  class CLI < Thor
7
- desc "perform", "Perform the set of operations to safely apply the schema change with minimal locks"
9
+ desc "perform", "Safely apply schema changes with minimal locks"
8
10
  method_option :alter_statement, aliases: "-a", type: :string, required: true,
9
11
  desc: "The ALTER statement to perform the schema change"
10
12
  method_option :schema, aliases: "-s", type: :string, required: true, default: "public",
@@ -13,7 +15,7 @@ module PgOnlineSchemaChange
13
15
  method_option :host, aliases: "-h", type: :string, required: true, desc: "Server host where the Database is located"
14
16
  method_option :username, aliases: "-u", type: :string, required: true, desc: "Username for the Database"
15
17
  method_option :port, aliases: "-p", type: :numeric, required: true, default: 5432, desc: "Port for the Database"
16
- method_option :password, aliases: "-w", type: :string, required: true, desc: "Password for the Database"
18
+ method_option :password, aliases: "-w", type: :string, required: true, desc: "DEPRECATED: Password for the Database. Please pass PGPASSWORD environment variable instead."
17
19
  method_option :verbose, aliases: "-v", type: :boolean, default: false, desc: "Emit logs in debug mode"
18
20
  method_option :drop, aliases: "-f", type: :boolean, default: false,
19
21
  desc: "Drop the original table in the end after the swap"
@@ -23,11 +25,19 @@ module PgOnlineSchemaChange
23
25
  desc: "Time to wait before killing backends to acquire lock and/or retrying upto 3 times. It will kill backends if --kill-backends is true, otherwise try upto 3 times and exit if it cannot acquire a lock."
24
26
  method_option :copy_statement, aliases: "-c", type: :string, required: false, default: "",
25
27
  desc: "Takes a .sql file location where you can provide a custom query to be played (ex: backfills) when pgosc copies data from the primary to the shadow table. More examples in README."
28
+ method_option :pull_batch_count, aliases: "-b", type: :numeric, required: false, default: PULL_BATCH_COUNT,
29
+ desc: "Number of rows to be replayed on each iteration after copy. This can be tuned for faster catch up and swap. Best used with delta-count."
30
+ method_option :delta_count, aliases: "-e", type: :numeric, required: false, default: DELTA_COUNT,
31
+ desc: "Indicates how many rows should be remaining before a swap should be performed. This can be tuned for faster catch up and swap, especially on highly volume tables. Best used with pull-batch-count."
26
32
 
27
33
  def perform
28
34
  client_options = Struct.new(*options.keys.map(&:to_sym)).new(*options.values)
29
-
30
35
  PgOnlineSchemaChange.logger(verbose: client_options.verbose)
36
+
37
+ PgOnlineSchemaChange.logger.warn("DEPRECATED: -w is deprecated. Please pass PGPASSWORD environment variable instead.") if client_options.password
38
+
39
+ client_options.password = ENV["PGPASSWORD"] || client_options.password
40
+
31
41
  PgOnlineSchemaChange::Orchestrate.run!(client_options)
32
42
  end
33
43
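The password handling added to `perform` boils down to a simple precedence rule: `PGPASSWORD` from the environment wins, and the deprecated `--password` value is only a fallback. A minimal sketch of that rule follows; `resolve_password` is a hypothetical helper written for this illustration (the gem inlines the expression), and a plain Hash stands in for `ENV` so the example has no side effects.

```ruby
# Hypothetical helper mirroring `ENV["PGPASSWORD"] || client_options.password`.
def resolve_password(env, cli_password)
  env["PGPASSWORD"] || cli_password
end

# Build a Struct from an options hash, the same way `perform` does with Thor's options.
options = { "username" => "jamesbond", "password" => "from-flag", "delta_count" => 20 }
client_options = Struct.new(*options.keys.map(&:to_sym)).new(*options.values)

# PGPASSWORD is present, so it replaces the value passed via the flag.
client_options.password = resolve_password({ "PGPASSWORD" => "from-env" }, client_options.password)
puts client_options.password

# PGPASSWORD is absent, so the flag value is kept.
fallback = resolve_password({}, "from-flag")
puts fallback
```

One detail worth keeping in mind: an empty string is truthy in Ruby, so an exported but empty `PGPASSWORD` (as in the README examples) still takes precedence over the flag value.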
 
@@ -5,7 +5,7 @@ require "pg"
5
5
  module PgOnlineSchemaChange
6
6
  class Client
7
7
  attr_accessor :alter_statement, :schema, :dbname, :host, :username, :port, :password, :connection, :table, :drop,
8
- :kill_backends, :wait_time_for_lock, :copy_statement
8
+ :kill_backends, :wait_time_for_lock, :copy_statement, :pull_batch_count, :delta_count
9
9
 
10
10
  def initialize(options)
11
11
  @alter_statement = options.alter_statement
@@ -18,6 +18,8 @@ module PgOnlineSchemaChange
18
18
  @drop = options.drop
19
19
  @kill_backends = options.kill_backends
20
20
  @wait_time_for_lock = options.wait_time_for_lock
21
+ @pull_batch_count = options.pull_batch_count
22
+ @delta_count = options.delta_count
21
23
 
22
24
  handle_copy_statement(options.copy_statement)
23
25
  handle_validations
@@ -204,7 +204,7 @@ module PgOnlineSchemaChange
204
204
  return Query.run(client.connection, query, true)
205
205
  end
206
206
 
207
- sql = Query.copy_data_statement(client, shadow_table)
207
+ sql = Query.copy_data_statement(client, shadow_table, true)
208
208
  Query.run(client.connection, sql, true)
209
209
  ensure
210
210
  Query.run(client.connection, "COMMIT;") # commit the serializable transaction
@@ -7,9 +7,6 @@ module PgOnlineSchemaChange
7
7
  extend Helper
8
8
 
9
9
  class << self
10
- PULL_BATCH_COUNT = 1000
11
- DELTA_COUNT = 20
12
-
13
10
  # This picks PULL_BATCH_COUNT rows by primary key from audit_table,
14
11
  # replays it on the shadow_table. Once the batch is done,
15
12
  # it then deletes those PULL_BATCH_COUNT rows from audit_table. Then, pull another batch,
@@ -20,7 +17,7 @@ module PgOnlineSchemaChange
20
17
  loop do
21
18
  rows = rows_to_play
22
19
 
23
- raise CountBelowDelta if rows.count <= DELTA_COUNT
20
+ raise CountBelowDelta if rows.count <= client.delta_count
24
21
 
25
22
  play!(rows)
26
23
  end
@@ -28,7 +25,7 @@ module PgOnlineSchemaChange
28
25
 
29
26
  def rows_to_play(reuse_trasaction = false)
30
27
  select_query = <<~SQL
31
- SELECT * FROM #{audit_table} ORDER BY #{audit_table_pk} LIMIT #{PULL_BATCH_COUNT};
28
+ SELECT * FROM #{audit_table} ORDER BY #{audit_table_pk} LIMIT #{client.pull_batch_count};
32
29
  SQL
33
30
 
34
31
  rows = []
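The replay hunks above replace the hard-coded constants with `client.pull_batch_count` and `client.delta_count`. How the two values interact can be seen in a toy model of the loop. This is a sketch under simplifying assumptions, not the gem's implementation: an Array stands in for the audit table, "playing" a row just appends it to an in-memory list, no new writes arrive while replaying, and `replay_until_delta` is a name invented for this example.

```ruby
# Toy model of the replay loop; the real code runs SQL against Postgres.
def replay_until_delta(audit_rows, pull_batch_count:, delta_count:)
  shadow = []
  iterations = 0

  loop do
    batch = audit_rows.first(pull_batch_count) # like ORDER BY pk LIMIT pull_batch_count
    break if batch.size <= delta_count         # mirrors `raise CountBelowDelta`

    shadow.concat(batch)         # replay the batch on the shadow table
    audit_rows.shift(batch.size) # delete the replayed rows from the audit table
    iterations += 1
  end

  { iterations: iterations, replayed: shadow.size, remaining: audit_rows.size }
end

result = replay_until_delta((1..4100).to_a, pull_batch_count: 2000, delta_count: 200)
puts result
```

With 4100 queued rows, the model replays two full batches and stops with 100 rows left, since that final batch is at or below the delta. On a live table, writes keep landing in the audit table during replay, so a larger batch size helps the loop catch up and a larger delta lets it stop sooner.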
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module PgOnlineSchemaChange
4
- VERSION = "0.4.0"
4
+ VERSION = "0.5.0"
5
5
  end
@@ -6,12 +6,12 @@ require "ougai"
6
6
  require "pg_online_schema_change/version"
7
7
  require "pg_online_schema_change/helper"
8
8
  require "pg_online_schema_change/functions"
9
- require "pg_online_schema_change/cli"
10
9
  require "pg_online_schema_change/client"
11
10
  require "pg_online_schema_change/query"
12
11
  require "pg_online_schema_change/store"
13
12
  require "pg_online_schema_change/replay"
14
13
  require "pg_online_schema_change/orchestrate"
14
+ require "pg_online_schema_change/cli"
15
15
 
16
16
  module PgOnlineSchemaChange
17
17
  class Error < StandardError; end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: pg_online_schema_change
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.0
4
+ version: 0.5.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Shayon Mukherjee
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2022-02-22 00:00:00.000000000 Z
11
+ date: 2022-02-26 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: ougai
@@ -204,9 +204,11 @@ files:
204
204
  - bin/console
205
205
  - bin/pg-online-schema-change
206
206
  - bin/setup
207
- - diagrams/how-it-works.excalidraw
208
- - diagrams/how-it-works.png
209
207
  - docker-compose.yml
208
+ - docs/how-it-works.excalidraw
209
+ - docs/how-it-works.png
210
+ - docs/load-test-1.png
211
+ - docs/load-test.md
210
212
  - lib/pg_online_schema_change.rb
211
213
  - lib/pg_online_schema_change/cli.rb
212
214
  - lib/pg_online_schema_change/client.rb