pgsync 0.2.4 → 0.3.0

Potentially problematic release.

Files changed (6)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +11 -0
  3. data/README.md +36 -26
  4. data/lib/pgsync/version.rb +1 -1
  5. data/lib/pgsync.rb +204 -145
  6. metadata +2 -2
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 792aeac71e01e8b28c8f33eab3392f2afb8cf1ac
- data.tar.gz: e47f1aed5ef45cdb3af86d2a52a93c16f332e079
+ metadata.gz: 06dcfff629b2d6d8e5fb56719c8527b908aa41d8
+ data.tar.gz: c3afdf8c5ac9b352c38169c65fe59ff1d55e61a5
  SHA512:
- metadata.gz: 4c6c37c246796179708f08abd3e43d09a3608545cc02ee4eab03cfeb3138b893e60be4ca8fceb6773ac295346fa194d6275aae660517c870be96b6d93cfb3358
- data.tar.gz: 0fa9979a1a15d6a5e0da7298eed998f9a04a5067ed2be35d0b67d84ef5ffd2fa85b94cdf1d7f52d9ab42185ce476b57cac89b0f5a182eaacfd011a8ad8b12861
+ metadata.gz: fb13e085bd8bb12d027166481a64cf28d6c7dda07aee78996d65dcf8f8beba91af9850d6ebf544d0cd937abfed07f4c209147c08b010306f3ef4223b49591718
+ data.tar.gz: 2019ba48fac3b89f17d0c70069f1c975f1c052bed67bd2f8c63b7d8eaca5b396c2dbafe434dd7f0887a2b062a154210ce41bf76fd5067e190f2121209e7d6312
data/CHANGELOG.md CHANGED
@@ -1,3 +1,14 @@
+ # 0.3.0
+
+ - More powerful groups
+ - Overwrite rows by default when a `WHERE` clause is used (previously truncated)
+ - Added `pgsync users "WHERE id = 1"`
+ - Added `pgsync group1`, shorthand for `pgsync groups group1`
+ - Added `--schema-only` option
+ - Added `--no-rules` option
+ - Added `--setup` option
+ - Added `--truncate` option
+
  # 0.2.4

  - Added `--preserve` option
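The row-overwrite change above can be sketched without a database. This is an illustrative model only (the `overwrite_rows` helper is hypothetical, not pgsync's API): incoming rows replace destination rows that share a primary key, and all other destination rows survive; 0.2.x truncated the whole table first instead.

```ruby
# Hypothetical model of the 0.3.0 default when a WHERE clause is given:
# replace destination rows whose primary key appears in the incoming set,
# keep everything else (0.2.x truncated the table before copying).
def overwrite_rows(destination, incoming, primary_key: :id)
  incoming_keys = incoming.map { |row| row[primary_key] }
  destination.reject { |row| incoming_keys.include?(row[primary_key]) } + incoming
end

dest = [{id: 1, name: "old"}, {id: 2, name: "keep"}]
overwrite_rows(dest, [{id: 1, name: "new"}])
# => [{id: 2, name: "keep"}, {id: 1, name: "new"}]
```

This mirrors the `DELETE FROM ... WHERE pk IN (...)` followed by `INSERT INTO ... SELECT` that the new `sync_table` runs against a temp table.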
data/README.md CHANGED
@@ -2,6 +2,8 @@

  Quickly and securely sync data between environments

+ :tangerine: Battle-tested at [Instacart](https://www.instacart.com/opensource)
+
  ## Installation

  ```sh
@@ -11,7 +13,7 @@ gem install pgsync
  And in your project directory, run:

  ```sh
- pgsync setup
+ pgsync --setup
  ```

  This creates `.pgsync.yml` for you to customize. We recommend checking this into your version control (assuming it doesn’t contain sensitive information). `pgsync` commands can be run from this directory or any subdirectory.
@@ -30,16 +32,10 @@ Fetch specific tables
  pgsync table1,table2
  ```

- Fetch specific rows (truncates destination table first)
-
- ```sh
- pgsync products --where "id < 100"
- ```
-
- To preserve existing rows, use:
+ Fetch specific rows

  ```sh
- pgsync products --where "id < 100" --preserve
+ pgsync products "WHERE id < 1000"
  ```

  ### Exclude Tables
@@ -63,38 +59,52 @@ exclude:
  - schema_migrations
  ```

- ### Schema
+ ### Groups

- Fetch schema
+ Define groups in `.pgsync.yml`:

- ```sh
- pgsync schema
+ ```yml
+ groups:
+   group1:
+     - table1
+     - table2
  ```

- Specify tables
+ And run:

  ```sh
- pgsync schema table1,table2
+ pgsync group1
  ```

- ### Groups
-
- Define groups in `.pgsync.yml`:
+ You can also sync specific rows:

  ```yml
  groups:
-   group1:
-     - table1
-     - table2
-   group2:
-     - table3
-     - table4
+   user:
+     users: "WHERE id = {id}"
+     orders: "WHERE user_id = {id}"
  ```

  And run:

  ```sh
- pgsync groups group1,group2
+ pgsync user:123
+ ```
+
+ to get rows associated with user `123`.
+
+ ### Schema
+
+ Fetch schema
+
+ ```sh
+ pgsync --schema-only
+ ```
+
+ Specify tables
+
+ ```sh
+ pgsync table1,table2 --schema-only
  ```

  ## Sensitive Information
@@ -136,7 +146,7 @@ Options for replacement are:
  To use with multiple databases, run:

  ```sh
- pgsync setup db2
+ pgsync --setup db2
  ```

  This creates `.pgsync-db2.yml` for you to edit. Specify a database in commands with:
data/lib/pgsync/version.rb CHANGED
@@ -1,3 +1,3 @@
  module PgSync
-   VERSION = "0.2.4"
+   VERSION = "0.3.0"
  end
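The group syntax in the README diff above maps a tag like `user:123` onto per-table `WHERE` clauses by substituting the `{id}` placeholder. A minimal sketch of that lookup (the `clauses_for` helper and `GROUPS` constant are illustrative, not pgsync's internal API; the group definition comes from the README):

```ruby
# Illustrative: resolve a "group:id" tag into per-table SQL clauses,
# substituting {id} the way pgsync's table_list does with gsub.
GROUPS = {
  "user" => {
    "users"  => "WHERE id = {id}",
    "orders" => "WHERE user_id = {id}"
  }
}

def clauses_for(tag)
  group, id = tag.split(":", 2)
  GROUPS.fetch(group).map { |table, sql| [table, sql.gsub("{id}", id.to_s)] }.to_h
end

clauses_for("user:123")
# => {"users" => "WHERE id = 123", "orders" => "WHERE user_id = 123"}
```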
data/lib/pgsync.rb CHANGED
@@ -40,8 +40,13 @@ module PgSync
        end
        command = args[0]

+       # setup hack
+       if opts[:setup]
+         command = "setup"
+         args[1] = args[0]
+       end
        if command == "setup"
-         setup(db_config_file(args[1]) || config_file)
+         setup(db_config_file(args[1]) || config_file || ".pgsync.yml")
        else
          source = parse_source(opts[:from])
          abort "No source" unless source
@@ -55,11 +60,16 @@ module PgSync
          print_uri("From", source_uri)
          print_uri("To", destination_uri)

-         if args[0] == "schema"
+         from_uri = source_uri
+         to_uri = destination_uri
+
+         tables = table_list(args, opts, from_uri)
+
+         if args[0] == "schema" || opts[:schema_only]
            time =
              benchmark do
                log "* Dumping schema"
-               tables = to_arr(args[1]).map { |t| "-t #{t}" }.join(" ")
+               tables = tables.keys.map { |t| "-t #{t}" }.join(" ")
                dump_command = "pg_dump --verbose --schema-only --no-owner --no-acl --clean #{tables} #{to_url(source_uri)}"
                restore_command = "psql -q -d #{to_url(destination_uri)}"
                system("#{dump_command} | #{restore_command}")
@@ -67,39 +77,8 @@ module PgSync

            log "* DONE (#{time.round(1)}s)"
          else
-           from_uri = source_uri
-           to_uri = destination_uri
-
-           tables =
-             if args[0] == "groups"
-               specified_groups = to_arr(args[1])
-               specified_groups.map do |group|
-                 if (tables = config["groups"][group])
-                   tables
-                 else
-                   abort "Group not found: #{group}"
-                 end
-               end.flatten
-             elsif args[0] == "tables"
-               to_arr(args[1])
-             elsif args[0]
-               to_arr(args[0])
-             else
-               nil
-             end
-
-           with_connection(from_uri, timeout: 3) do |conn|
-             tables ||= self.tables(conn, "public") - to_arr(opts[:exclude])
-
-             tables.each do |table|
-               unless table_exists?(conn, table, "public")
-                 abort "Table does not exist in source: #{table}"
-               end
-             end
-           end
-
            with_connection(to_uri, timeout: 3) do |conn|
-             tables.each do |table|
+             tables.keys.each do |table|
                unless table_exists?(conn, table, "public")
                  abort "Table does not exist in destination: #{table}"
                end
@@ -110,130 +89,143 @@ module PgSync
            if args[0] == "groups"
              pretty_list (config["groups"] || {}).keys
            else
-             pretty_list tables
+             pretty_list tables.keys
            end
          else
-           in_parallel(tables) do |table|
-             time =
-               benchmark do
-                 with_connection(from_uri) do |from_connection|
-                   with_connection(to_uri) do |to_connection|
-                     bad_fields = config["data_rules"]
-
-                     from_fields = columns(from_connection, table, "public")
-                     to_fields = columns(to_connection, table, "public")
-                     shared_fields = to_fields & from_fields
-                     extra_fields = to_fields - from_fields
-                     missing_fields = from_fields - to_fields
-
-                     from_sequences = sequences(from_connection, table, shared_fields)
-                     to_sequences = sequences(to_connection, table, shared_fields)
-                     shared_sequences = to_sequences & from_sequences
-                     extra_sequences = to_sequences - from_sequences
-                     missing_sequences = from_sequences - to_sequences
-
-                     where = opts[:where]
-                     limit = opts[:limit]
-                     sql_clause = String.new
-
-                     @mutex.synchronize do
-                       log "* Syncing #{table}"
-                       if where
-                         log " #{where}"
-                         sql_clause << " WHERE #{opts[:where]}"
-                       end
-                       if limit
-                         log " LIMIT #{limit}"
-                         sql_clause << " LIMIT #{limit}"
-                       end
-                       log " Extra columns: #{extra_fields.join(", ")}" if extra_fields.any?
-                       log " Missing columns: #{missing_fields.join(", ")}" if missing_fields.any?
-                       log " Extra sequences: #{extra_sequences.join(", ")}" if extra_sequences.any?
-                       log " Missing sequences: #{missing_sequences.join(", ")}" if missing_sequences.any?
+           in_parallel(tables) do |table, table_opts|
+             sync_table(table, opts.merge(table_opts), from_uri, to_uri)
+           end

-                       if shared_fields.empty?
-                         log " No fields to copy"
-                       end
+           time = Time.now - start_time
+           log "Completed in #{time.round(1)}s"
+         end
+       end
+     end
+     true
+   end
+
+   protected
+
+   def sync_table(table, opts, from_uri, to_uri)
+     time =
+       benchmark do
+         with_connection(from_uri) do |from_connection|
+           with_connection(to_uri) do |to_connection|
+             bad_fields = opts[:no_rules] ? [] : config["data_rules"]
+
+             from_fields = columns(from_connection, table, "public")
+             to_fields = columns(to_connection, table, "public")
+             shared_fields = to_fields & from_fields
+             extra_fields = to_fields - from_fields
+             missing_fields = from_fields - to_fields
+
+             from_sequences = sequences(from_connection, table, shared_fields)
+             to_sequences = sequences(to_connection, table, shared_fields)
+             shared_sequences = to_sequences & from_sequences
+             extra_sequences = to_sequences - from_sequences
+             missing_sequences = from_sequences - to_sequences
+
+             where = opts[:where]
+             limit = opts[:limit]
+             sql_clause = String.new
+
+             @mutex.synchronize do
+               log "* Syncing #{table}"
+               if opts[:sql]
+                 log " #{opts[:sql]}"
+                 sql_clause << " #{opts[:sql]}"
+               end
+               if where
+                 log " #{where}"
+                 sql_clause << " WHERE #{opts[:where]}"
+               end
+               if limit
+                 log " LIMIT #{limit}"
+                 sql_clause << " LIMIT #{limit}"
+               end
+               log " Extra columns: #{extra_fields.join(", ")}" if extra_fields.any?
+               log " Missing columns: #{missing_fields.join(", ")}" if missing_fields.any?
+               log " Extra sequences: #{extra_sequences.join(", ")}" if extra_sequences.any?
+               log " Missing sequences: #{missing_sequences.join(", ")}" if missing_sequences.any?
+
+               if shared_fields.empty?
+                 log " No fields to copy"
+               end
+             end
+
+             if shared_fields.any?
+               copy_fields = shared_fields.map { |f| f2 = bad_fields.to_a.find { |bf, bk| rule_match?(table, f, bf) }; f2 ? "#{apply_strategy(f2[1], table, f, from_connection)} AS #{escape_identifier(f)}" : "#{table}.#{escape_identifier(f)}" }.join(", ")
+               fields = shared_fields.map { |f| escape_identifier(f) }.join(", ")
+
+               seq_values = {}
+               shared_sequences.each do |seq|
+                 seq_values[seq] = from_connection.exec("select last_value from #{seq}").to_a[0]["last_value"]
+               end
+
+               copy_to_command = "COPY (SELECT #{copy_fields} FROM #{table}#{sql_clause}) TO STDOUT"
+               if !opts[:truncate] && (opts[:preserve] || !sql_clause.empty?)
+                 primary_key = self.primary_key(from_connection, table, "public")
+                 abort "No primary key" unless primary_key
+
+                 temp_table = "pgsync_#{rand(1_000_000_000)}"
+                 file = Tempfile.new(temp_table)
+                 begin
+                   from_connection.copy_data copy_to_command do
+                     while row = from_connection.get_copy_data
+                       file.write(row)
                     end
+                   end
+                   file.rewind

-                     if shared_fields.any?
-                       copy_fields = shared_fields.map { |f| f2 = bad_fields.to_a.find { |bf, bk| rule_match?(table, f, bf) }; f2 ? "#{apply_strategy(f2[1], f, from_connection)} AS #{escape_identifier(f)}" : escape_identifier(f) }.join(", ")
-                       fields = shared_fields.map { |f| escape_identifier(f) }.join(", ")
+                   to_connection.transaction do
+                     # create a temp table
+                     to_connection.exec("CREATE TABLE #{temp_table} AS SELECT * FROM #{table} WITH NO DATA")

-                       seq_values = {}
-                       shared_sequences.each do |seq|
-                         seq_values[seq] = from_connection.exec("select last_value from #{seq}").to_a[0]["last_value"]
+                     # load file
+                     to_connection.copy_data "COPY #{temp_table} (#{fields}) FROM STDIN" do
+                       file.each do |row|
+                         to_connection.put_copy_data(row)
                       end
+                     end

-                       copy_to_command = "COPY (SELECT #{copy_fields} FROM #{table}#{sql_clause}) TO STDOUT"
-                       if opts[:preserve]
-                         primary_key = self.primary_key(from_connection, table, "public")
-                         abort "No primary key" unless primary_key
-
-                         temp_table = "pgsync_#{rand(1_000_000_000)}"
-                         file = Tempfile.new(temp_table)
-                         begin
-                           from_connection.copy_data copy_to_command do
-                             while row = from_connection.get_copy_data
-                               file.write(row)
-                             end
-                           end
-                           file.rewind
-
-                           to_connection.transaction do
-                             # create a temp table
-                             to_connection.exec("CREATE TABLE #{temp_table} AS SELECT * FROM #{table} WITH NO DATA")
-
-                             # load file
-                             to_connection.copy_data "COPY #{temp_table} (#{fields}) FROM STDIN" do
-                               file.each do |row|
-                                 to_connection.put_copy_data(row)
-                               end
-                             end
-
-                             # insert into
-                             to_connection.exec("INSERT INTO #{table} (SELECT * FROM #{temp_table} WHERE NOT EXISTS (SELECT 1 FROM #{table} WHERE #{table}.#{primary_key} = #{temp_table}.#{primary_key}))")
-
-                             # delete temp table
-                             to_connection.exec("DROP TABLE #{temp_table}")
-                           end
-                         ensure
-                           file.close
-                           file.unlink
-                         end
-                       else
-                         to_connection.exec("TRUNCATE #{table} CASCADE")
-                         to_connection.copy_data "COPY #{table} (#{fields}) FROM STDIN" do
-                           from_connection.copy_data copy_to_command do
-                             while row = from_connection.get_copy_data
-                               to_connection.put_copy_data(row)
-                             end
-                           end
-                         end
-                       end
-                       seq_values.each do |seq, value|
-                         to_connection.exec("SELECT setval(#{escape(seq)}, #{escape(value)})")
-                       end
+                     if opts[:preserve]
+                       # insert into
+                       to_connection.exec("INSERT INTO #{table} (SELECT * FROM #{temp_table} WHERE NOT EXISTS (SELECT 1 FROM #{table} WHERE #{table}.#{primary_key} = #{temp_table}.#{primary_key}))")
+                     else
+                       to_connection.exec("DELETE FROM #{table} WHERE #{primary_key} IN (SELECT #{primary_key} FROM #{temp_table})")
+                       to_connection.exec("INSERT INTO #{table} (SELECT * FROM #{temp_table})")
+                     end
+
+                     # delete temp table
+                     to_connection.exec("DROP TABLE #{temp_table}")
+                   end
+                 ensure
+                   file.close
+                   file.unlink
+                 end
+               else
+                 to_connection.exec("TRUNCATE #{table} CASCADE")
+                 to_connection.copy_data "COPY #{table} (#{fields}) FROM STDIN" do
+                   from_connection.copy_data copy_to_command do
+                     while row = from_connection.get_copy_data
+                       to_connection.put_copy_data(row)
                     end
                   end
                 end
               end
-
-             @mutex.synchronize do
-               log "* DONE #{table} (#{time.round(1)}s)"
+               seq_values.each do |seq, value|
+                 to_connection.exec("SELECT setval(#{escape(seq)}, #{escape(value)})")
+               end
             end
           end
-
-           time = Time.now - start_time
-           log "Completed in #{time.round(1)}s"
         end
       end
+
+     @mutex.synchronize do
+       log "* DONE #{table} (#{time.round(1)}s)"
     end
-     true
   end

-   protected
-
   def parse_args(args)
     opts = Slop.parse(args) do |o|
       o.banner = %{Usage:
@@ -257,7 +249,11 @@ Options:}
       o.boolean "--to-safe", "accept danger", default: false
       o.boolean "--debug", "debug", default: false
       o.boolean "--list", "list", default: false
-       o.boolean "--preserve", "preserve", default: false
+       o.boolean "--preserve", "preserve existing rows", default: false
+       o.boolean "--truncate", "truncate existing rows", default: false
+       o.boolean "--schema-only", "schema only", default: false
+       o.boolean "--no-rules", "do not apply data rules", default: false
+       o.boolean "--setup", "setup", default: false
       o.on "-v", "--version", "print the version" do
         log PgSync::VERSION
         @exit = true
@@ -385,7 +381,7 @@ Options:}
     end

     # TODO wildcard rules
-   def apply_strategy(rule, column, conn)
+   def apply_strategy(rule, table, column, conn)
     if rule.is_a?(Hash)
       if rule.key?("value")
         escape(rule["value"])
@@ -396,13 +392,13 @@ Options:}
       end
     else
       strategies = {
-         "unique_email" => "'email' || id || '@example.org'",
+         "unique_email" => "'email' || #{table}.id || '@example.org'",
         "untouched" => escape_identifier(column),
-         "unique_phone" => "(id + 1000000000)::text",
+         "unique_phone" => "(#{table}.id + 1000000000)::text",
         "random_int" => "(RAND() * 10)::int",
         "random_date" => "'1970-01-01'",
         "random_time" => "NOW()",
-         "unique_secret" => "'secret' || id",
+         "unique_secret" => "'secret' || #{table}.id",
         "random_ip" => "'127.0.0.1'",
         "random_letter" => "'A'",
         "null" => "NULL",
@@ -511,5 +507,68 @@ Options:}
       log item
     end
   end
+
+   def table_list(args, opts, from_uri)
+     tables = nil
+
+     if args[0] == "groups"
+       tables = Hash.new { |hash, key| hash[key] = {} }
+       specified_groups = to_arr(args[1])
+       specified_groups.map do |tag|
+         group, id = tag.split(":", 2)
+         if (t = config["groups"][group])
+           t.each do |table|
+             tables[table] = {}
+             tables[table][:sql] = args[2].to_s.gsub("{id}", cast(id)) if args[2]
+           end
+         else
+           abort "Group not found: #{group}"
+         end
+       end
+     elsif args[0] == "tables"
+       tables = Hash.new { |hash, key| hash[key] = {} }
+       to_arr(args[1]).each do |tag|
+         table, id = tag.split(":", 2)
+         tables[table] = {}
+         tables[table][:sql] = args[2].to_s.gsub("{id}", cast(id)) if args[2]
+       end
+     elsif args[0]
+       # could be a group, table, or mix
+       tables = Hash.new { |hash, key| hash[key] = {} }
+       specified_groups = to_arr(args[0])
+       specified_groups.map do |tag|
+         group, id = tag.split(":", 2)
+         if (t = config["groups"][group])
+           t.each do |table|
+             sql = nil
+             if table.is_a?(Array)
+               table, sql = table
+             end
+             tables[table] = {}
+             tables[table][:sql] = (args[1] || sql).to_s.gsub("{id}", cast(id)) if args[1] || sql
+           end
+         else
+           tables[group] = {}
+           tables[group][:sql] = args[1].to_s.gsub("{id}", cast(id)) if args[1]
+         end
+       end
+     end
+
+     with_connection(from_uri, timeout: 3) do |conn|
+       tables ||= Hash[(self.tables(conn, "public") - to_arr(opts[:exclude])).map { |k| [k, {}] }]
+
+       tables.keys.each do |table|
+         unless table_exists?(conn, table, "public")
+           abort "Table does not exist in source: #{table}"
+         end
+       end
+     end
+
+     tables
+   end
+
+   def cast(value)
+     value.to_s
+   end
   end
 end
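A notable 0.3.0 change in the diff above is that `apply_strategy` now takes the table name, so masking expressions like `'email' || #{table}.id || '@example.org'` are table-qualified to match the new `#{table}.#{column}` select list. A simplified sketch of how such rules turn into an aliased SELECT list (the `masked_select` helper is illustrative and covers only a few strategies, not pgsync's full rule matching):

```ruby
# Illustrative: build the COPY (SELECT ...) column list, replacing columns
# that match a data rule with a table-qualified masking expression aliased
# back to the original column name.
def masked_select(table, columns, rules)
  exprs = columns.map do |col|
    case rules[col]
    when "unique_email"  then "'email' || #{table}.id || '@example.org' AS #{col}"
    when "unique_secret" then "'secret' || #{table}.id AS #{col}"
    when "null"          then "NULL AS #{col}"
    else "#{table}.#{col}"
    end
  end
  "SELECT #{exprs.join(", ")} FROM #{table}"
end

masked_select("users", ["id", "email"], "email" => "unique_email")
# => "SELECT users.id, 'email' || users.id || '@example.org' AS email FROM users"
```

The alias keeps the destination column names stable while the source values are rewritten, which is why the masking happens inside the `COPY (SELECT ...) TO STDOUT` command rather than after the copy.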
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: pgsync
  version: !ruby/object:Gem::Version
-   version: 0.2.4
+   version: 0.3.0
  platform: ruby
  authors:
  - Andrew Kane
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-04-05 00:00:00.000000000 Z
+ date: 2016-04-07 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: slop