pgsync 0.5.5 → 0.6.0

Potentially problematic release.


This version of pgsync might be problematic.

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 32d1e60a133659dde98d3564d21fca4f1866e8f7b58518ef71f6dcaf59cf8c82
4
- data.tar.gz: 650b50aee614ed737d2e99278cbb8910f7980a041dcf58d18288f6bc53826635
3
+ metadata.gz: f6c20cf99ebc0961d8699228883f6c2e9c95c9798cfb6f101cb78803df6b1b43
4
+ data.tar.gz: 7c719442c07e6db1d4704199aa2b95d32e1ba528797b2d21b15be85f2de71d79
5
5
  SHA512:
6
- metadata.gz: 8bd3e4373558c8f8519f0c118f51b55ac27a3a82fb1ab9a38e85bfee858e08a80fb2c37cab490878d2dda06121c13f45d81b3df6c87b8ede0e0060a50b94fee6
7
- data.tar.gz: 922ecb8f57615025e5b92a93f82010bb667fbc968db12ebddeb49507ce90d6dd4c3c16ea3cd5777817bb321a8fefa54f6483b6ffae872f897e7ada64d68831b0
6
+ metadata.gz: 1e3d735e6e3002e2fcb74de574f63f534472821aca0154c13d9d7ce6e9fa94426239cef875cd33724fec5a6cf2567b5656a80a9d92f89d9678ee6bee2c2ec70c
7
+ data.tar.gz: 3ab5d416f4c2364644c015d106631898a3615cc671179f8b1193cda5b1e894512fb39906ec1c27570ca9fbd186c9eb19af3afb88eba093bb21e8bc471c28f18b
@@ -1,9 +1,25 @@
1
+ ## 0.6.0 (2020-06-07)
2
+
3
+ - Added messages for different column types and non-deferrable constraints
4
+ - Added support for wildcards to `--exclude`
5
+ - Improved `--overwrite` and `--preserve` options for foreign keys
6
+ - Improved output for schema sync
7
+ - Fixed `--overwrite` and `--preserve` options for multicolumn primary keys
8
+ - Fixed output for notices
9
+
10
+ Breaking
11
+
12
+ - Syncs shared tables instead of raising an error when tables missing in destination
13
+ - Raise an error when `--config` or `--db` option provided and config not found
14
+ - Removed deprecated options
15
+ - Dropped support for Postgres < 9.5
16
+
1
17
  ## 0.5.5 (2020-05-13)
2
18
 
3
19
  - Added `--jobs` option
4
- - Added experimental `--defer-constraints` option
5
- - Added experimental `--disable-user-triggers` option
6
- - Added experimental `--disable-integrity` option
20
+ - Added `--defer-constraints` option
21
+ - Added `--disable-user-triggers` option
22
+ - Added `--disable-integrity` option
7
23
  - Improved error message for older libpq
8
24
 
9
25
  ## 0.5.4 (2020-05-09)
@@ -1,6 +1,6 @@
1
1
  The MIT License (MIT)
2
2
 
3
- Copyright (c) 2015-2019 Andrew Kane
3
+ Copyright (c) 2015-2020 Andrew Kane
4
4
 
5
5
  Permission is hereby granted, free of charge, to any person obtaining a copy
6
6
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -33,20 +33,26 @@ This creates `.pgsync.yml` for you to customize. We recommend checking this into
33
33
 
34
34
  ## How to Use
35
35
 
36
+ First, make sure your schema is set up in both databases. We recommend using a schema migration tool for this, but pgsync also provides a few [convenience methods](#schema). Once that’s done, you’re ready to sync data.
37
+
36
38
  Sync all tables
37
39
 
38
40
  ```sh
39
41
  pgsync
40
42
  ```
41
43
 
42
- **Note:** pgsync assumes your schema is setup in your `to` database. See the [schema section](#schema) if that’s not the case.
43
-
44
44
  Sync specific tables
45
45
 
46
46
  ```sh
47
47
  pgsync table1,table2
48
48
  ```
49
49
 
50
+ Works with wildcards as well
51
+
52
+ ```sh
53
+ pgsync "table*"
54
+ ```
55
+
50
56
  Sync specific rows (existing rows are overwritten)
51
57
 
52
58
  ```sh
@@ -65,13 +71,15 @@ Or truncate them
65
71
  pgsync products "where store_id = 1" --truncate
66
72
  ```
67
73
 
68
- ## Exclude Tables
74
+ ## Tables
75
+
76
+ Exclude specific tables
69
77
 
70
78
  ```sh
71
- pgsync --exclude users
79
+ pgsync --exclude table1,table2
72
80
  ```
73
81
 
74
- To always exclude, add to `.pgsync.yml`.
82
+ Add to `.pgsync.yml` to exclude by default
75
83
 
76
84
  ```yml
77
85
  exclude:
@@ -79,12 +87,14 @@ exclude:
79
87
  - table2
80
88
  ```
81
89
 
82
- For Rails, you probably want to exclude schema migrations and Active Record metadata.
90
+ Sync tables from all schemas or specific schemas (by default, only the search path is synced)
83
91
 
84
- ```yml
85
- exclude:
86
- - schema_migrations
87
- - ar_internal_metadata
92
+ ```sh
93
+ pgsync --all-schemas
94
+ # or
95
+ pgsync --schemas public,other
96
+ # or
97
+ pgsync public.table1,other.table2
88
98
  ```
89
99
 
90
100
  ## Groups
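Review note: the README changes above add wildcard support (`pgsync "table*"`, and the changelog mentions wildcards for `--exclude`). The matching logic is not part of this hunk, so the following is only a minimal sketch of the idea, assuming shell-style globbing via Ruby's `File.fnmatch`. The helper name `select_tables` is hypothetical, and pgsync's real resolver may work differently.

```ruby
# Hypothetical helper, not pgsync's actual implementation:
# keeps tables that match any include pattern and no exclude pattern.
def select_tables(tables, include_patterns: ["*"], exclude_patterns: [])
  tables.select do |table|
    included = include_patterns.any? { |pattern| File.fnmatch(pattern, table) }
    excluded = exclude_patterns.any? { |pattern| File.fnmatch(pattern, table) }
    included && !excluded
  end
end

tables = %w[table1 table2 users schema_migrations]

# Roughly what `pgsync "table*"` asks for
p select_tables(tables, include_patterns: ["table*"])

# Roughly what `pgsync --exclude "schema_*",users` asks for
p select_tables(tables, exclude_patterns: ["schema_*", "users"])
```

Quoting the pattern on the command line, as the README does, keeps the shell from expanding `*` against local file names before pgsync sees it.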
@@ -104,6 +114,8 @@ And run:
104
114
  pgsync group1
105
115
  ```
106
116
 
117
+ ## Variables
118
+
107
119
  You can also use groups to sync a specific record and associated records in other tables.
108
120
 
109
121
  To get product `123` with its reviews, last 10 coupons, and store, use:
@@ -125,16 +137,12 @@ pgsync product:123
125
137
 
126
138
  ## Schema
127
139
 
128
- **Note:** pgsync is designed to sync data. You should use a schema migration tool to manage schema changes. The methods in this section are provided for convenience but not recommended.
129
-
130
- Sync schema before the data
140
+ Sync schema before the data (this wipes out existing data)
131
141
 
132
142
  ```sh
133
143
  pgsync --schema-first
134
144
  ```
135
145
 
136
- **Note:** This wipes out existing data
137
-
138
146
  Specify tables
139
147
 
140
148
  ```sh
@@ -149,9 +157,9 @@ pgsync --schema-only
149
157
 
150
158
  pgsync does not try to sync Postgres extensions.
151
159
 
152
- ## Sensitive Information
160
+ ## Sensitive Data
153
161
 
154
- Prevent sensitive information like email addresses from leaving the remote server.
162
+ Prevent sensitive data like email addresses from leaving the remote server.
155
163
 
156
164
  Define rules in `.pgsync.yml`:
157
165
 
@@ -184,7 +192,7 @@ Options for replacement are:
184
192
  - `null`
185
193
  - `untouched`
186
194
 
187
- Rules starting with `unique_` require the table to have a primary key. `unique_phone` requires a numeric primary key.
195
+ Rules starting with `unique_` require the table to have a single column primary key. `unique_phone` requires a numeric primary key.
188
196
 
189
197
  ## Foreign Keys
190
198
 
@@ -192,7 +200,7 @@ Foreign keys can make it difficult to sync data. Three options are:
192
200
 
193
201
  1. Manually specify the order of tables
194
202
  2. Use deferrable constraints
195
- 3. Disable triggers, which can silently break referential integrity
203
+ 3. Disable foreign key triggers, which can silently break referential integrity
196
204
 
197
205
  When manually specifying the order, use `--jobs 1` so tables are synced one-at-a-time.
198
206
 
@@ -206,16 +214,12 @@ If your tables have [deferrable constraints](https://begriffs.com/posts/2017-08-
206
214
  pgsync --defer-constraints
207
215
  ```
208
216
 
209
- **Note:** This feature is currently experimental.
210
-
211
- To disable triggers and potentially break referential integrity, use:
217
+ To disable foreign key triggers and potentially break referential integrity, use:
212
218
 
213
219
  ```sh
214
220
  pgsync --disable-integrity
215
221
  ```
216
222
 
217
- **Note:** This feature is currently experimental.
218
-
219
223
  ## Triggers
220
224
 
221
225
  Disable user triggers with:
@@ -224,8 +228,6 @@ Disable user triggers with:
224
228
  pgsync --disable-user-triggers
225
229
  ```
226
230
 
227
- **Note:** This feature is currently experimental.
228
-
229
231
  ## Append-Only Tables
230
232
 
231
233
  For extremely large, append-only tables, sync in batches.
data/config.yml CHANGED
@@ -20,6 +20,10 @@ to: postgres://localhost:5432/myapp_development
20
20
  # - table1
21
21
  # - table2
22
22
 
23
+ # sync specific schemas
24
+ # schemas:
25
+ # - public
26
+
23
27
  # protect sensitive information
24
28
  data_rules:
25
29
  email: unique_email
@@ -10,15 +10,19 @@ require "shellwords"
10
10
  require "tempfile"
11
11
  require "uri"
12
12
  require "yaml"
13
+ require "open3"
13
14
 
14
15
  # modules
15
16
  require "pgsync/utils"
16
17
  require "pgsync/client"
17
18
  require "pgsync/data_source"
18
19
  require "pgsync/init"
20
+ require "pgsync/schema_sync"
19
21
  require "pgsync/sync"
20
- require "pgsync/table_list"
22
+ require "pgsync/table"
21
23
  require "pgsync/table_sync"
24
+ require "pgsync/task"
25
+ require "pgsync/task_resolver"
22
26
  require "pgsync/version"
23
27
 
24
28
  module PgSync
@@ -7,77 +7,72 @@ module PgSync
7
7
  output.sync = true
8
8
  end
9
9
 
10
- def perform(testing: true)
11
- opts = parse_args
10
+ def perform
11
+ result = Slop::Parser.new(slop_options).parse(@args)
12
+ arguments = result.arguments
13
+ options = result.to_h
12
14
 
13
- # TODO throw error in 0.6.0
14
- warn "Specify either --db or --config, not both" if opts[:db] && opts[:config]
15
+ raise Error, "Specify either --db or --config, not both" if options[:db] && options[:config]
16
+ raise Error, "Cannot use --overwrite with --in-batches" if options[:overwrite] && options[:in_batches]
15
17
 
16
- if opts.version?
18
+ if options[:version]
17
19
  log VERSION
18
- elsif opts.help?
19
- log opts
20
- # TODO remove deprecated conditions (last two)
21
- elsif opts.init? || opts.setup? || opts.arguments[0] == "setup"
22
- Init.new.perform(opts)
20
+ elsif options[:help]
21
+ log slop_options
22
+ elsif options[:init]
23
+ Init.new(arguments, options).perform
23
24
  else
24
- Sync.new.perform(opts)
25
+ Sync.new(arguments, options).perform
25
26
  end
26
- rescue Error, PG::ConnectionBad => e
27
- raise e if testing
28
- abort colorize(e.message, :red)
27
+ rescue => e
28
+ # Error, PG::ConnectionBad, Slop::Error
29
+ raise e if options && options[:debug]
30
+ abort colorize(e.message.strip, :red)
29
31
  end
30
32
 
31
33
  def self.start
32
- new(ARGV).perform(testing: false)
34
+ new(ARGV).perform
33
35
  end
34
36
 
35
37
  protected
36
38
 
37
- def parse_args
38
- Slop.parse(@args) do |o|
39
- o.banner = %{Usage:
40
- pgsync [options]
39
+ def slop_options
40
+ o = Slop::Options.new
41
+ o.banner = %{Usage:
42
+ pgsync [options]
41
43
 
42
44
  Options:}
43
- o.string "-d", "--db", "database"
44
- o.string "-t", "--tables", "tables to sync"
45
- o.string "-g", "--groups", "groups to sync"
46
- o.integer "-j", "--jobs", "number of tables to sync at a time"
47
- o.string "--schemas", "schemas to sync"
48
- o.string "--from", "source"
49
- o.string "--to", "destination"
50
- o.string "--where", "where", help: false
51
- o.integer "--limit", "limit", help: false
52
- o.string "--exclude", "exclude tables"
53
- o.string "--config", "config file"
54
- # TODO much better name for this option
55
- o.boolean "--to-safe", "accept danger", default: false
56
- o.boolean "--debug", "debug", default: false
57
- o.boolean "--list", "list", default: false
58
- o.boolean "--overwrite", "overwrite existing rows", default: false, help: false
59
- o.boolean "--preserve", "preserve existing rows", default: false
60
- o.boolean "--truncate", "truncate existing rows", default: false
61
- o.boolean "--schema-first", "schema first", default: false
62
- o.boolean "--schema-only", "schema only", default: false
63
- o.boolean "--all-schemas", "all schemas", default: false
64
- o.boolean "--no-rules", "do not apply data rules", default: false
65
- o.boolean "--no-sequences", "do not sync sequences", default: false
66
- o.boolean "--init", "init", default: false
67
- o.boolean "--setup", "setup", default: false, help: false
68
- o.boolean "--in-batches", "in batches", default: false, help: false
69
- o.integer "--batch-size", "batch size", default: 10000, help: false
70
- o.float "--sleep", "sleep", default: 0, help: false
71
- o.boolean "--fail-fast", "stop on the first failed table", default: false
72
- o.boolean "--defer-constraints", "defer constraints", default: false
73
- o.boolean "--disable-user-triggers", "disable non-system triggers", default: false
74
- o.boolean "--disable-integrity", "disable foreign key triggers", default: false
75
- # o.array "--var", "pass a variable"
76
- o.boolean "-v", "--version", "print the version"
77
- o.boolean "-h", "--help", "prints help"
78
- end
79
- rescue Slop::Error => e
80
- raise Error, e.message
45
+ o.string "-d", "--db", "database"
46
+ o.string "-t", "--tables", "tables to sync"
47
+ o.string "-g", "--groups", "groups to sync"
48
+ o.integer "-j", "--jobs", "number of tables to sync at a time"
49
+ o.string "--schemas", "schemas to sync"
50
+ o.string "--from", "source"
51
+ o.string "--to", "destination"
52
+ o.string "--exclude", "exclude tables"
53
+ o.string "--config", "config file"
54
+ o.boolean "--to-safe", "accept danger", default: false
55
+ o.boolean "--debug", "debug", default: false
56
+ o.boolean "--list", "list", default: false
57
+ o.boolean "--overwrite", "overwrite existing rows", default: false, help: false
58
+ o.boolean "--preserve", "preserve existing rows", default: false
59
+ o.boolean "--truncate", "truncate existing rows", default: false
60
+ o.boolean "--schema-first", "schema first", default: false
61
+ o.boolean "--schema-only", "schema only", default: false
62
+ o.boolean "--all-schemas", "all schemas", default: false
63
+ o.boolean "--no-rules", "do not apply data rules", default: false
64
+ o.boolean "--no-sequences", "do not sync sequences", default: false
65
+ o.boolean "--init", "init", default: false
66
+ o.boolean "--in-batches", "in batches", default: false, help: false
67
+ o.integer "--batch-size", "batch size", default: 10000, help: false
68
+ o.float "--sleep", "sleep", default: 0, help: false
69
+ o.boolean "--fail-fast", "stop on the first failed table", default: false
70
+ o.boolean "--defer-constraints", "defer constraints", default: false
71
+ o.boolean "--disable-user-triggers", "disable non-system triggers", default: false
72
+ o.boolean "--disable-integrity", "disable foreign key triggers", default: false
73
+ o.boolean "-v", "--version", "print the version"
74
+ o.boolean "-h", "--help", "prints help"
75
+ o
81
76
  end
82
77
  end
83
78
  end
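Review note: the rewritten `perform` turns the old `--db`/`--config` warning into a hard error, adds a second guard for `--overwrite` with `--in-batches`, and sends every exception through one `rescue` that re-raises only when `--debug` is set. The sketch below reproduces that control flow with a plain Hash instead of Slop. `validate_options!` and `run` are hypothetical names, and `Error` stands in for `PgSync::Error`.

```ruby
# Stand-in for PgSync::Error (assumption for this sketch).
class Error < StandardError; end

# Hypothetical helper mirroring the two guard clauses in `perform`.
def validate_options!(options)
  raise Error, "Specify either --db or --config, not both" if options[:db] && options[:config]
  raise Error, "Cannot use --overwrite with --in-batches" if options[:overwrite] && options[:in_batches]
end

# Hypothetical wrapper: returns the message that `perform` would hand to
# `abort`, or re-raises the original exception when debugging is enabled.
def run(options)
  validate_options!(options)
  "ok"
rescue => e
  raise e if options && options[:debug]
  e.message.strip
end

puts run({ db: "myapp", config: "other.yml" })
puts run({ overwrite: true })
```

The `options &&` check in the diff matters because option parsing itself can fail, in which case `options` is still `nil` when the `rescue` runs.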
@@ -1,9 +1,11 @@
1
1
  module PgSync
2
2
  class DataSource
3
+ include Utils
4
+
3
5
  attr_reader :url
4
6
 
5
- def initialize(source)
6
- @url = resolve_url(source)
7
+ def initialize(url)
8
+ @url = url
7
9
  end
8
10
 
9
11
  def exists?
@@ -29,8 +31,18 @@ module PgSync
29
31
  # gets visible tables
30
32
  def tables
31
33
  @tables ||= begin
32
- query = "SELECT table_schema, table_name FROM information_schema.tables WHERE table_type = 'BASE TABLE' AND table_schema NOT IN ('information_schema', 'pg_catalog') ORDER BY 1, 2"
33
- execute(query).map { |row| "#{row["table_schema"]}.#{row["table_name"]}" }
34
+ query = <<~SQL
35
+ SELECT
36
+ table_schema AS schema,
37
+ table_name AS table
38
+ FROM
39
+ information_schema.tables
40
+ WHERE
41
+ table_type = 'BASE TABLE' AND
42
+ table_schema NOT IN ('information_schema', 'pg_catalog')
43
+ ORDER BY 1, 2
44
+ SQL
45
+ execute(query).map { |row| Table.new(row["schema"], row["table"]) }
34
46
  end
35
47
  end
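Review note: `tables` now returns `Table` objects instead of `"schema.table"` strings. The `Table` class lives in `pgsync/table`, which is not shown in this diff, so the `Struct` below is a hypothetical stand-in. It illustrates one likely motivation: joining and re-splitting on `.` is lossy when a name itself contains a dot.

```ruby
# Hypothetical stand-in for PgSync::Table; the real class is not in this hunk.
Table = Struct.new(:schema, :name) do
  def full_name
    "#{schema}.#{name}"
  end
end

# A schema name containing a dot breaks the old string round trip.
row = { "schema" => "my.schema", "table" => "events" }

as_string = "#{row["schema"]}.#{row["table"]}"
p as_string.split(".", 2)     # split happens at the wrong dot

table = Table.new(row["schema"], row["table"])
p [table.schema, table.name]  # both parts survive intact
```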
36
48
 
@@ -38,25 +50,21 @@ module PgSync
38
50
  table_set.include?(table)
39
51
  end
40
52
 
41
- def columns(table)
42
- query = "SELECT column_name FROM information_schema.columns WHERE table_schema = $1 AND table_name = $2"
43
- execute(query, table.split(".", 2)).map { |row| row["column_name"] }
44
- end
45
-
46
53
  def sequences(table, columns)
47
- execute("SELECT #{columns.map { |f| "pg_get_serial_sequence(#{escape("#{quote_ident_full(table)}")}, #{escape(f)}) AS #{quote_ident(f)}" }.join(", ")}")[0].values.compact
54
+ execute("SELECT #{columns.map { |f| "pg_get_serial_sequence(#{escape("#{quote_ident_full(table)}")}, #{escape(f)}) AS #{quote_ident(f)}" }.join(", ")}").first.values.compact
48
55
  end
49
56
 
50
57
  def max_id(table, primary_key, sql_clause = nil)
51
- execute("SELECT MAX(#{quote_ident(primary_key)}) FROM #{quote_ident_full(table)}#{sql_clause}")[0]["max"].to_i
58
+ execute("SELECT MAX(#{quote_ident(primary_key)}) FROM #{quote_ident_full(table)}#{sql_clause}").first["max"].to_i
52
59
  end
53
60
 
54
61
  def min_id(table, primary_key, sql_clause = nil)
55
- execute("SELECT MIN(#{quote_ident(primary_key)}) FROM #{quote_ident_full(table)}#{sql_clause}")[0]["min"].to_i
62
+ execute("SELECT MIN(#{quote_ident(primary_key)}) FROM #{quote_ident_full(table)}#{sql_clause}").first["min"].to_i
56
63
  end
57
64
 
65
+ # this value comes from pg_get_serial_sequence which is already quoted
58
66
  def last_value(seq)
59
- execute("select last_value from #{seq}")[0]["last_value"]
67
+ execute("SELECT last_value FROM #{seq}").first["last_value"]
60
68
  end
61
69
 
62
70
  def truncate(table)
@@ -64,28 +72,31 @@ module PgSync
64
72
  end
65
73
 
66
74
  # https://stackoverflow.com/a/20537829
75
+ # TODO can simplify with array_position in Postgres 9.5+
67
76
  def primary_key(table)
68
- query = <<-SQL
77
+ query = <<~SQL
69
78
  SELECT
70
79
  pg_attribute.attname,
71
- format_type(pg_attribute.atttypid, pg_attribute.atttypmod)
80
+ format_type(pg_attribute.atttypid, pg_attribute.atttypmod),
81
+ pg_attribute.attnum,
82
+ pg_index.indkey
72
83
  FROM
73
84
  pg_index, pg_class, pg_attribute, pg_namespace
74
85
  WHERE
75
- pg_class.oid = $2::regclass AND
76
- indrelid = pg_class.oid AND
77
86
  nspname = $1 AND
87
+ relname = $2 AND
88
+ indrelid = pg_class.oid AND
78
89
  pg_class.relnamespace = pg_namespace.oid AND
79
90
  pg_attribute.attrelid = pg_class.oid AND
80
91
  pg_attribute.attnum = any(pg_index.indkey) AND
81
92
  indisprimary
82
93
  SQL
83
- row = execute(query, [table.split(".", 2)[0], quote_ident_full(table)])[0]
84
- row && row["attname"]
94
+ rows = execute(query, [table.schema, table.name])
95
+ rows.sort_by { |r| r["indkey"].split(" ").index(r["attnum"]) }.map { |r| r["attname"] }
85
96
  end
86
97
 
87
98
  def triggers(table)
88
- query = <<-SQL
99
+ query = <<~SQL
89
100
  SELECT
90
101
  tgname AS name,
91
102
  tgisinternal AS internal,
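Review note: the new `primary_key` returns every key column rather than the first row, ordered by each column's position in `pg_index.indkey`. The sketch below runs the same `sort_by` on hand-written rows. The column names and numbers are made up, and it assumes values arrive as strings, which the `split(" ")` call in the diff implies.

```ruby
# Sample rows shaped like the query result (illustrative data only).
# "indkey" lists attribute numbers in primary key order: column 3, then column 1.
rows = [
  { "attname" => "store_id",   "attnum" => "1", "indkey" => "3 1" },
  { "attname" => "product_id", "attnum" => "3", "indkey" => "3 1" }
]

# Same ordering expression as in the diff: position of attnum within indkey.
ordered = rows
  .sort_by { |r| r["indkey"].split(" ").index(r["attnum"]) }
  .map { |r| r["attname"] }

p ordered
```

Sorting by `attnum` alone would give table column order, which can differ from the order declared in the primary key.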
@@ -108,9 +119,10 @@ module PgSync
108
119
  else
109
120
  config = {dbname: @url}
110
121
  end
122
+ @concurrent_id = concurrent_id
111
123
  PG::Connection.new(config)
112
124
  rescue URI::InvalidURIError
113
- raise Error, "Invalid connection string"
125
+ raise Error, "Invalid connection string. Make sure it works with `psql`"
114
126
  end
115
127
  end
116
128
  end
@@ -122,32 +134,17 @@ module PgSync
122
134
  end
123
135
  end
124
136
 
125
- def reconnect
126
- @conn.reset
127
- end
128
-
129
- def dump_command(tables)
130
- tables = tables ? tables.keys.map { |t| "-t #{Shellwords.escape(quote_ident_full(t))}" }.join(" ") : ""
131
- "pg_dump -Fc --verbose --schema-only --no-owner --no-acl #{tables} -d #{@url}"
137
+ # reconnect for new thread or process
138
+ def reconnect_if_needed
139
+ reconnect if @concurrent_id != concurrent_id
132
140
  end
133
141
 
134
- def restore_command
135
- if_exists = Gem::Version.new(pg_restore_version) >= Gem::Version.new("9.4.0")
136
- "pg_restore --verbose --no-owner --no-acl --clean #{if_exists ? "--if-exists" : nil} -d #{@url}"
137
- end
138
-
139
- def fully_resolve_tables(tables)
140
- no_schema_tables = {}
141
- search_path_index = Hash[search_path.map.with_index.to_a]
142
- self.tables.group_by { |t| t.split(".", 2)[-1] }.each do |group, t2|
143
- no_schema_tables[group] = t2.sort_by { |t| [search_path_index[t.split(".", 2)[0]] || 1000000, t] }[0]
144
- end
145
-
146
- Hash[tables.map { |k, v| [no_schema_tables[k] || k, v] }]
142
+ def search_path
143
+ @search_path ||= execute("SELECT unnest(current_schemas(true)) AS schema").map { |r| r["schema"] }
147
144
  end
148
145
 
149
- def search_path
150
- @search_path ||= execute("SELECT current_schemas(true)")[0]["current_schemas"][1..-2].split(",")
146
+ def server_version_num
147
+ @server_version_num ||= execute("SHOW server_version_num").first["server_version_num"].to_i
151
148
  end
152
149
 
153
150
  def execute(query, params = [])
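Review note: `search_path` used to strip the braces from the text form of a Postgres array and split on commas. With `unnest`, each schema arrives as its own row. The sketch below shows why the string approach is fragile, using hand-written inputs rather than a live connection. `parse_array_literal` is a hypothetical name for the old inline expression.

```ruby
# Old approach: drop the surrounding braces and split on commas.
def parse_array_literal(text)
  text[1..-2].split(",")
end

p parse_array_literal("{pg_catalog,public}")

# Postgres double-quotes array elements that contain spaces or commas,
# so the quotes leak into the parsed schema name.
p parse_array_literal('{pg_catalog,"my schema"}')

# New approach: one row per schema, nothing to parse.
# These rows are stand-ins for the result of the unnest query.
rows = [{ "schema" => "pg_catalog" }, { "schema" => "my schema" }]
p rows.map { |r| r["schema"] }
```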
@@ -167,10 +164,13 @@ module PgSync
167
164
 
168
165
  private
169
166
 
170
- def pg_restore_version
171
- `pg_restore --version`.lines[0].chomp.split(" ")[-1].split(/[^\d.]/)[0]
172
- rescue Errno::ENOENT
173
- raise Error, "pg_restore not found"
167
+ def concurrent_id
168
+ [Process.pid, Thread.current.object_id]
169
+ end
170
+
171
+ def reconnect
172
+ @conn.reset
173
+ @concurrent_id = concurrent_id
174
174
  end
175
175
 
176
176
  def table_set
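Review note: `reconnect_if_needed` compares a stored `[Process.pid, Thread.current.object_id]` pair with the current one, so a connection opened in one thread or process is reset before another one uses it. The sketch below isolates that check. `FakeSource` is a hypothetical class that counts reconnects instead of calling `@conn.reset`, and it records the id in `initialize`, whereas the diff records it when the connection is first created.

```ruby
# Hypothetical stand-in for DataSource; no real database connection involved.
class FakeSource
  attr_reader :reconnects

  def initialize
    @concurrent_id = concurrent_id
    @reconnects = 0
  end

  # Reconnect only when called from a different process or thread
  # than the one that last owned the connection.
  def reconnect_if_needed
    reconnect if @concurrent_id != concurrent_id
  end

  private

  def concurrent_id
    [Process.pid, Thread.current.object_id]
  end

  def reconnect
    @reconnects += 1
    @concurrent_id = concurrent_id
  end
end

source = FakeSource.new
source.reconnect_if_needed                      # same thread: nothing happens
Thread.new { source.reconnect_if_needed }.join  # new thread: reconnects once
p source.reconnects
```

Including `Process.pid` in the pair covers forked workers as well as threads.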
@@ -185,41 +185,5 @@ module PgSync
185
185
  conn.conninfo_hash
186
186
  end
187
187
  end
188
-
189
- def quote_ident_full(ident)
190
- ident.split(".", 2).map { |v| quote_ident(v) }.join(".")
191
- end
192
-
193
- def quote_ident(value)
194
- PG::Connection.quote_ident(value)
195
- end
196
-
197
- def escape(value)
198
- if value.is_a?(String)
199
- "'#{quote_string(value)}'"
200
- else
201
- value
202
- end
203
- end
204
-
205
- # activerecord
206
- def quote_string(s)
207
- s.gsub(/\\/, '\&\&').gsub(/'/, "''")
208
- end
209
-
210
- def resolve_url(source)
211
- if source
212
- source = source.dup
213
- source.gsub!(/\$\([^)]+\)/) do |m|
214
- command = m[2..-2]
215
- result = `#{command}`.chomp
216
- unless $?.success?
217
- raise Error, "Command exited with non-zero status:\n#{command}"
218
- end
219
- result
220
- end
221
- end
222
- source
223
- end
224
188
  end
225
189
  end