pg_easy_replicate 0.1.0

data/README.md ADDED
@@ -0,0 +1,213 @@
# pg_easy_replicate

`pg_easy_replicate` is a CLI orchestrator tool that simplifies the process of setting up [logical replication](https://www.postgresql.org/docs/current/logical-replication.html) between two PostgreSQL databases. `pg_easy_replicate` also supports switchover: once the source (primary) database is fully replicating, `pg_easy_replicate` puts it into read-only mode and, via logical replication, flushes all remaining data to the new target database. This ensures zero data loss and minimal downtime for the application. This method is useful for performing minimal-downtime major version upgrades between two PostgreSQL databases, load testing with a blue/green database setup, and other similar use cases.

- [Installation](#installation)
- [Requirements](#requirements)
- [Limits](#limits)
- [Usage](#usage)
- [CLI](#cli)
- [Replicating all tables with a single group](#replicating-all-tables-with-a-single-group)
  - [Config check](#config-check)
  - [Bootstrap](#bootstrap)
  - [Start sync](#start-sync)
  - [Stats](#stats)
  - [Performing switchover](#performing-switchover)
- [Replicating single database with custom tables](#replicating-single-database-with-custom-tables)
- [Switchover strategies with minimal downtime](#switchover-strategies-with-minimal-downtime)
  - [Rolling restart strategy](#rolling-restart-strategy)
  - [DNS Failover strategy](#dns-failover-strategy)

## Installation

Add this line to your application's Gemfile:

```ruby
gem "pg_easy_replicate"
```

And then execute:

    $ bundle install

Or install it yourself as:

    $ gem install pg_easy_replicate

This installs all required dependencies as well. Make sure the requirements below are satisfied.

Or via Docker:

    docker pull shayonj/pg_easy_replicate:latest

https://hub.docker.com/r/shayonj/pg_easy_replicate

## Requirements

- PostgreSQL 10 and later
- Ruby 2.7 and later
- The database user should have `SUPERUSER` privileges
- Both databases should have the same schema

## Limits

All [Logical Replication Restrictions](https://www.postgresql.org/docs/current/logical-replication-restrictions.html) apply.

## Usage

Ensure `SOURCE_DB_URL` and `TARGET_DB_URL` are present as environment variables in the runtime environment. The URLs use the PostgreSQL connection string format. Example:

```bash
$ export SOURCE_DB_URL="postgres://USERNAME:PASSWORD@localhost:5432/DATABASE_NAME"
$ export TARGET_DB_URL="postgres://USERNAME:PASSWORD@localhost:5433/DATABASE_NAME"
```
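
Before handing these URLs to any tooling, it can help to sanity-check their shape. The sketch below uses only Ruby's standard library; `valid_pg_url?` is a made-up helper for illustration, and it only validates that the string parses as a `postgres://` URL with a host and database name, not that the database is reachable:

```ruby
require "uri"

# Hypothetical helper: check that a connection string at least parses as a
# postgres:// URL with a host and a database name in the path.
def valid_pg_url?(url)
  uri = URI.parse(url)
  uri.scheme == "postgres" && !uri.host.nil? && uri.path.length > 1
rescue URI::InvalidURIError
  false
end

valid_pg_url?("postgres://user:pass@localhost:5432/mydb") # => true
valid_pg_url?("not a url")                                # => false
```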

Any `pg_easy_replicate` command can be run the same way with the Docker image, as long as the container runs in an environment with access to both databases. Example:

```bash
docker run -it --rm \
  -e SOURCE_DB_URL="postgres://USERNAME:PASSWORD@localhost:5432/DATABASE_NAME" \
  -e TARGET_DB_URL="postgres://USERNAME:PASSWORD@localhost:5433/DATABASE_NAME" \
  shayonj/pg_easy_replicate:latest \
  pg_easy_replicate config_check
```

## CLI

```bash
$ pg_easy_replicate
pg_easy_replicate commands:
  pg_easy_replicate bootstrap -g, --group-name=GROUP_NAME   # Sets up temporary tables for information required during runtime
  pg_easy_replicate cleanup -g, --group-name=GROUP_NAME     # Cleans up all bootstrapped data for the respective group
  pg_easy_replicate config_check                            # Prints if source and target database have the required config
  pg_easy_replicate help [COMMAND]                          # Describe available commands or one specific command
  pg_easy_replicate start_sync -g, --group-name=GROUP_NAME  # Starts the logical replication from source database to target database provisioned in the group
  pg_easy_replicate stats -g, --group-name=GROUP_NAME       # Prints the statistics in JSON for the group
  pg_easy_replicate stop_sync -g, --group-name=GROUP_NAME   # Stop the logical replication from source database to target database provisioned in the group
  pg_easy_replicate switchover -g, --group-name=GROUP_NAME  # Puts the source database in read only mode after all the data is flushed and written
  pg_easy_replicate version                                 # Prints the version
```

## Replicating all tables with a single group

You can create as many groups as you want for a single database. A group is simply a logical isolation of a single replication.

### Config check

```bash
$ pg_easy_replicate config_check

✅ Config is looking good.
```

### Bootstrap

Every sync needs to be bootstrapped before it can be started between the two databases. Bootstrap creates a new superuser to perform the orchestration required during the rest of the process, along with some internal metadata tables for record keeping.

```bash
$ pg_easy_replicate bootstrap --group-name database-cluster-1

{"name":"pg_easy_replicate","hostname":"PKHXQVK6DW","pid":21485,"level":30,"time":"2023-06-19T15:51:11.015-04:00","v":0,"msg":"Setting up schema","version":"0.1.0"}
...
```

### Start sync

Once the bootstrap is complete, you can start the sync. Starting the sync sets up the publication and subscription, and performs other minor housekeeping tasks.

```bash
$ pg_easy_replicate start_sync --group-name database-cluster-1

{"name":"pg_easy_replicate","hostname":"PKHXQVK6DW","pid":22113,"level":30,"time":"2023-06-19T15:54:54.874-04:00","v":0,"msg":"Setting up publication","publication_name":"pger_publication_database_cluster_1","version":"0.1.0"}
...
```

### Stats

You can inspect or watch stats at any time during the sync process. The stats give you an idea of when the sync started, the current flush/write lag, how many tables are in the `replicating`, `copying` or other stages, and more.

You can poll these stats to trigger further actions once the switchover is done. The stats include a `switchover_completed_at` field, which is updated once the switchover is complete.

```bash
$ pg_easy_replicate stats --group-name database-cluster-1

{
  "lag_stats": [
    {
      "pid": 66,
      "client_addr": "192.168.128.2",
      "user_name": "jamesbond",
      "application_name": "pger_subscription_database_cluster_1",
      "state": "streaming",
      "sync_state": "async",
      "write_lag": "0.0",
      "flush_lag": "0.0",
      "replay_lag": "0.0"
    }
  ],
  "message_lsn_receipts": [
    {
      "received_lsn": "0/1674688",
      "last_msg_send_time": "2023-06-19 19:56:35 UTC",
      "last_msg_receipt_time": "2023-06-19 19:56:35 UTC",
      "latest_end_lsn": "0/1674688",
      "latest_end_time": "2023-06-19 19:56:35 UTC"
    }
  ],
  "sync_started_at": "2023-06-19 19:54:54 UTC",
  "sync_failed_at": null,
  "switchover_completed_at": null
  ....
}
```

### Performing switchover

`pg_easy_replicate` doesn't kick off the switchover on its own. When you start the sync via `start_sync`, it starts the replication between the two databases. Once you have had time to monitor stats and any other key metrics, you can kick off the `switchover`.

`switchover` will wait until all tables in the group are replicating and the lag delta is under 200KB (calculated as the `pg_wal_lsn_diff` between `sent_lsn` and `write_lsn`), and then perform the switch.

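The lag check itself reduces to a byte comparison per subscription worker. A simplified, illustrative sketch of that decision follows; `safe_to_switch?` and `DEFAULT_LAG_BYTES` are made-up names, and the real tool obtains the lag by querying `pg_stat_replication` on the source database rather than taking it as an argument:

```ruby
# 200KB default threshold, mirroring the documented default.
DEFAULT_LAG_BYTES = 200 * 1024

# lag_stats: one entry per subscription worker, with lag_bytes being the
# result of pg_wal_lsn_diff(sent_lsn, write_lsn) on the source database.
def safe_to_switch?(lag_stats, threshold = DEFAULT_LAG_BYTES)
  lag_stats.all? do |stat|
    stat[:state] == "streaming" && stat[:lag_bytes] < threshold
  end
end

safe_to_switch?([{ state: "streaming", lag_bytes: 1_024 }])    # => true
safe_to_switch?([{ state: "catchup", lag_bytes: 1_024 }])      # => false
safe_to_switch?([{ state: "streaming", lag_bytes: 500_000 }])  # => false
```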
The switch is made by putting the user on the source database in `READ ONLY` mode, so that it no longer accepts writes, and then waiting for the flush lag to reach `0`. It is up to the user to kick off a rolling restart of the application containers or a DNS failover (more on these strategies below) after the switchover is complete, so that the application stops sending read/write requests to the old source database.

```bash
$ pg_easy_replicate switchover --group-name database-cluster-1

{"name":"pg_easy_replicate","hostname":"PKHXQVK6DW","pid":24192,"level":30,"time":"2023-06-19T16:05:23.033-04:00","v":0,"msg":"Watching lag stats","version":"0.1.0"}
...
```

## Replicating single database with custom tables

By default all tables are added for replication, but you can create multiple groups with custom tables for the same database. Example:

```bash
$ pg_easy_replicate bootstrap --group-name database-cluster-1
$ pg_easy_replicate start_sync --group-name database-cluster-1 --schema-name public --tables "users, posts, events"

...

$ pg_easy_replicate bootstrap --group-name database-cluster-2
$ pg_easy_replicate start_sync --group-name database-cluster-2 --schema-name public --tables "comments, views"

...
$ pg_easy_replicate switchover --group-name database-cluster-1
$ pg_easy_replicate switchover --group-name database-cluster-2
...
```

## Switchover strategies with minimal downtime

For minimal downtime, it's best to watch/tail the stats and wait until `switchover_completed_at` is updated with a timestamp. Once that happens you can perform any of the following strategies. Note: these are just suggestions; `pg_easy_replicate` doesn't provide any built-in functionality for them.

### Rolling restart strategy

In this strategy, you have a change ready to go that instructs your application to start connecting to the new database, for example via an environment variable. Depending on the application type, it may or may not require a rolling restart.

Next, you can set up a program that watches the `stats` and waits until `switchover_completed_at` reports a timestamp. Once that happens, it kicks off a rolling restart of your application containers so they start making connections to the new database.

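Such a watcher can be as small as a loop that polls the stats and checks `switchover_completed_at`. A minimal sketch, assuming the JSON stats format shown earlier; `fetch_stats` and `wait_for_switchover` are made-up names, and the stubbed `fetch_stats` stands in for shelling out to `pg_easy_replicate stats`:

```ruby
require "json"

# Stand-in for something like:
#   JSON.parse(`pg_easy_replicate stats --group-name database-cluster-1`)
def fetch_stats
  { "switchover_completed_at" => nil } # replace with the real CLI call
end

# Poll until switchover_completed_at carries a timestamp, or give up
# after max_attempts polls. Returns the final stats hash, or nil on timeout.
def wait_for_switchover(poll_interval: 5, max_attempts: 720)
  max_attempts.times do
    stats = fetch_stats
    return stats if stats["switchover_completed_at"]
    sleep poll_interval
  end
  nil
end

# Once wait_for_switchover returns non-nil, kick off the rolling restart
# (deployment-specific, e.g. a `kubectl rollout restart` invocation).
```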
### DNS Failover strategy

In this strategy, you have a weight-based DNS system (for example, [AWS Route53 weighted records](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resource-record-sets-values-weighted.html)) where 100% of traffic goes to a primary origin and 0% to a secondary origin. The primary origin here is the DNS host for your source database and the secondary origin is the DNS host for your target database. You can set up your application ahead of time to connect to the database using the DNS name from the weighted group.

Next, you can set up a program that watches the `stats` and waits until `switchover_completed_at` reports a timestamp. Once that happens, it updates the weights in the DNS group so that 100% of requests go to the new/target database. Note: keeping a low `ttl` is recommended.
data/Rakefile ADDED
@@ -0,0 +1,13 @@

```ruby
# frozen_string_literal: true

require "bundler/gem_tasks"
require "rspec/core/rake_task"
require "standalone_migrations"

RSpec::Core::RakeTask.new(:spec)

require "rubocop/rake_task"

RuboCop::RakeTask.new

task default: [:spec, :rubocop]
```
data/bin/console ADDED
@@ -0,0 +1,8 @@

```ruby
#!/usr/bin/env ruby
# frozen_string_literal: true

require "bundler/setup"
require "pg_easy_replicate"
require "pry"

Pry.start
```
@@ -0,0 +1,6 @@

```ruby
#!/usr/bin/env ruby
# frozen_string_literal: true

require "pg_easy_replicate"

PgEasyReplicate::CLI.start(ARGV)
```
data/bin/release.sh ADDED
@@ -0,0 +1,28 @@

```bash
#!/usr/bin/env bash
set -eu

export VERSION=$1
echo "VERSION: ${VERSION}"

echo "=== Pushing tags to github ===="
git tag v"$VERSION"
git push origin --tags

echo "=== Building Gem ===="
gem build pg_easy_replicate.gemspec

echo "=== Pushing gem ===="
gem push pg_easy_replicate-"$VERSION".gem

echo "=== Sleeping for 5s ===="
sleep 5

echo "=== Building Image ===="
docker build . --build-arg VERSION="$VERSION" -t shayonj/pg_easy_replicate:"$VERSION"

echo "=== Tagging Image ===="
docker image tag shayonj/pg_easy_replicate:"$VERSION" shayonj/pg_easy_replicate:latest

echo "=== Pushing Image ===="
docker push shayonj/pg_easy_replicate:"$VERSION"
docker push shayonj/pg_easy_replicate:latest

echo "=== Cleaning up ===="
rm pg_easy_replicate-"$VERSION".gem
```
data/bin/setup ADDED
@@ -0,0 +1,8 @@

```bash
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
set -vx

bundle install

# Do any other automated setup that you need to do here
```
@@ -0,0 +1,34 @@

```yaml
version: "3.7"
services:
  source_db:
    image: postgres:12-alpine
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: jamesbond
      POSTGRES_PASSWORD: jamesbond
      POSTGRES_DB: postgres
    command:
      - "postgres"
      - "-c"
      - "wal_level=logical"
    networks:
      localnet:

  target_db:
    image: postgres:12-alpine
    ports:
      - "5433:5432"
    environment:
      POSTGRES_USER: jamesbond
      POSTGRES_PASSWORD: jamesbond
      POSTGRES_DB: postgres
    command:
      - "postgres"
      - "-c"
      - "wal_level=logical"
    networks:
      localnet:

networks:
  localnet:
```
@@ -0,0 +1,121 @@

```ruby
# frozen_string_literal: true

require "thor"

module PgEasyReplicate
  class CLI < Thor
    package_name "pg_easy_replicate"

    desc "config_check",
         "Prints if source and target database have the required config"
    def config_check
      PgEasyReplicate.assert_config

      puts "✅ Config is looking good."
    end

    method_option :group_name,
                  aliases: "-g",
                  required: true,
                  desc: "Name of the group to provision"
    desc "bootstrap",
         "Sets up temporary tables for information required during runtime"
    def bootstrap
      PgEasyReplicate.bootstrap(options)
    end

    desc "cleanup", "Cleans up all bootstrapped data for the respective group"
    method_option :group_name,
                  aliases: "-g",
                  required: true,
                  desc: "Name of the group previously provisioned"
    method_option :everything,
                  aliases: "-e",
                  desc:
                    "Cleans up all bootstrap tables, users and any publication/subscription"
    method_option :sync,
                  aliases: "-s",
                  desc:
                    "Cleans up the publication and subscription for the respective group"
    def cleanup
      PgEasyReplicate.cleanup(options)
    end

    desc "start_sync",
         "Starts the logical replication from source database to target database provisioned in the group"
    method_option :group_name,
                  aliases: "-g",
                  required: true,
                  desc:
                    "Name of the grouping for this collection of source and target DB"
    method_option :schema_name,
                  aliases: "-s",
                  desc:
                    "Name of the schema tables are in, only required if passing a list of tables"
    method_option :tables,
                  aliases: "-t",
                  desc: "Comma separated list of table names. Default: All tables"
    def start_sync
      PgEasyReplicate::Orchestrate.start_sync(options)
    end

    desc "stop_sync",
         "Stop the logical replication from source database to target database provisioned in the group"
    method_option :group_name,
                  aliases: "-g",
                  required: true,
                  desc: "Name of the group previously provisioned"
    def stop_sync
      PgEasyReplicate::Orchestrate.stop_sync(options[:group_name])
    end

    desc "switchover",
         "Puts the source database in read only mode after all the data is flushed and written"
    method_option :group_name,
                  aliases: "-g",
                  required: true,
                  desc: "Name of the group previously provisioned"
    method_option :lag_delta_size,
                  aliases: "-l",
                  desc:
                    "The size of the lag to watch for before switchover. Default 200KB."
    # method_option :bi_directional,
    #               aliases: "-b",
    #               desc:
    #                 "Setup replication from target database to source database"
    def switchover
      PgEasyReplicate::Orchestrate.switchover(
        group_name: options[:group_name],
        lag_delta_size: options[:lag_delta_size],
      )
    end

    desc "stats", "Prints the statistics in JSON for the group"
    method_option :group_name,
                  aliases: "-g",
                  required: true,
                  desc: "Name of the group previously provisioned"
    method_option :watch, aliases: "-w", desc: "Tail the stats"
    def stats
      if options[:watch]
        PgEasyReplicate::Stats.follow(options[:group_name])
      else
        PgEasyReplicate::Stats.print(options[:group_name])
      end
    end

    desc "version", "Prints the version"
    def version
      puts PgEasyReplicate::VERSION
    end

    def self.exit_on_failure?
      true
    end
  end
end
```
@@ -0,0 +1,92 @@

```ruby
# frozen_string_literal: true

module PgEasyReplicate
  class Group
    extend Helper
    class << self
      def setup
        conn =
          Query.connect(
            connection_url: source_db_url,
            schema: internal_schema_name,
          )
        return if conn.table_exists?("groups")
        conn.create_table("groups") do
          primary_key(:id)
          column(:name, String, null: false)
          column(:table_names, String, text: true)
          column(:schema_name, String)
          column(:created_at, Time, default: Sequel::CURRENT_TIMESTAMP)
          column(:updated_at, Time, default: Sequel::CURRENT_TIMESTAMP)
          column(:started_at, Time)
          column(:failed_at, Time)
          column(:switchover_completed_at, Time)
        end
      ensure
        conn&.disconnect
      end

      def drop
        conn =
          Query.connect(
            connection_url: source_db_url,
            schema: internal_schema_name,
          )
        conn.drop_table?("groups")
      ensure
        conn&.disconnect
      end

      def create(options)
        groups.insert(
          name: options[:name],
          table_names: options[:table_names],
          schema_name: options[:schema_name],
          started_at: options[:started_at],
          failed_at: options[:failed_at],
        )
      rescue => e
        abort_with("Adding group entry failed: #{e.message}")
      end

      def update(
        group_name:,
        started_at: nil,
        switchover_completed_at: nil,
        failed_at: nil
      )
        set = {
          started_at: started_at&.utc,
          switchover_completed_at: switchover_completed_at&.utc,
          failed_at: failed_at&.utc,
          updated_at: Time.now.utc,
        }.compact
        groups.where(name: group_name).update(set)
      rescue => e
        abort_with("Updating group entry failed: #{e.message}")
      end

      def find(group_name)
        groups.first(name: group_name)
      rescue => e
        abort_with("Finding group entry failed: #{e.message}")
      end

      def delete(group_name)
        groups.where(name: group_name).delete
      rescue => e
        abort_with("Deleting group entry failed: #{e.message}")
      end

      private

      def groups
        conn =
          Query.connect(
            connection_url: source_db_url,
            schema: internal_schema_name,
          )
        conn[:groups]
      end
    end
  end
end
```
@@ -0,0 +1,68 @@

```ruby
# frozen_string_literal: true

module PgEasyReplicate
  module Helper
    def source_db_url
      ENV.fetch("SOURCE_DB_URL", nil)
    end

    def secondary_source_db_url
      ENV.fetch("SECONDARY_SOURCE_DB_URL", nil)
    end

    def target_db_url
      ENV.fetch("TARGET_DB_URL", nil)
    end

    def logger
      PgEasyReplicate.logger
    end

    def internal_schema_name
      "pger"
    end

    def internal_user_name
      "pger_su"
    end

    def publication_name(group_name)
      "pger_publication_#{underscore(group_name)}"
    end

    def subscription_name(group_name)
      "pger_subscription_#{underscore(group_name)}"
    end

    def underscore(str)
      str
        .gsub(/::/, "/")
        .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
        .gsub(/([a-z\d])([A-Z])/, '\1_\2')
        .tr("-", "_")
        .downcase
    end

    def test_env?
      ENV.fetch("RACK_ENV", nil) == "test"
    end

    def connection_info(conn_string)
      PG::Connection
        .conninfo_parse(conn_string)
        .each_with_object({}) do |obj, hash|
          hash[obj[:keyword].to_sym] = obj[:val]
        end
        .compact
    end

    def db_user(url)
      connection_info(url)[:user]
    end

    def abort_with(msg)
      raise(msg) if test_env?
      abort(msg)
    end
  end
end
```