cleansweep 1.0.1 → 1.0.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: b3cbb9366208c06f5d514941f464a4f87dcc1a5a
- data.tar.gz: 47d8ffbb89cc27024bbd20d6af95275e0c05f978
+ metadata.gz: 171c5ce6b972df17162909a1538cf8ecc867e347
+ data.tar.gz: 6d91698f6759a599e03683287ea230230d99a475
  SHA512:
- metadata.gz: c60c9e771c8711d67e548e8c68258ab5f6407675f96efdf27f7b31d2308c2b4a75e9c3a87d514fff67b54aab2fedab1621a6560dbb74146badee7930b57f6061
- data.tar.gz: 4fa1bb94eff78d05bd7c2946d567dfb32a09bda50fcae7a305be431b486e0d253288079055771d391bfd21bb3ef2978bafba6044a4f361d259a00a8cbcf2c738
+ metadata.gz: 5373eb62b1acbf097681efde6a1ad08ad94c574ee5676546e0c798ba91d93a8a28efc97fe8f4ce95e8b5f0ee8f1e4b12e340c6cdf0a78e901589ecbc85f4fee5
+ data.tar.gz: 199c96ba5a90457bd6d3310b59a293de1dd185d84b009d1f849ce13637210415606ba29fd704312186c8767aaaf9990e902bfcbc7d4e561f28f200112ef83f0e
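The checksum block above records SHA1 and SHA512 digests of the gem's `metadata.gz` and `data.tar.gz` artifacts. As a sketch of how such digests are produced with Ruby's `Digest` stdlib (the payload here is illustrative, not the real gem files):

```ruby
require 'digest'

# Compute the two digests RubyGems records in checksums.yaml.
# The input is an illustrative string, not an actual gem artifact.
def gem_checksums(data)
  {
    'SHA1'   => Digest::SHA1.hexdigest(data),   # 40 hex chars
    'SHA512' => Digest::SHA512.hexdigest(data)  # 128 hex chars
  }
end

sums = gem_checksums('example payload')
puts sums['SHA1']
puts sums['SHA512']
```

Comparing a freshly computed digest against the published one is how a downloaded gem can be verified out of band.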
data/CHANGES.md CHANGED
@@ -1,5 +1,11 @@
  See the [documentation](http://bkayser.github.io/cleansweep) for details
 
- # Version 1.0.1
+ ### Version 1.0.1
 
- * Initial release
+ * Initial release
+
+ ### Version 1.0.2
+
+ * Changed destination options so you can delete from a different table.
+ * Added `dest_columns` option as a map of column names in the source to column names in the destination.
+ * More testing and bug fixing in real environments
data/README.md CHANGED
@@ -1,5 +1,6 @@
- Cleansweep is a utility for scripting purges using ruby in an efficient, low-impact manner on
- mysql innodb tables. Based on the Percona `pt-archive` utility.
+ Cleansweep is a utility for scripting purges using ruby in an
+ efficient, low-impact manner on mysql innodb tables. Based on the
+ Percona `pt-archive` utility.
 
  ## Installation
 
@@ -35,12 +36,13 @@ Assume there is an active record model for it:
 
  ### Purging by traversing an index
 
- The most efficient way to work through a table is by scanning through an index one chunk
- at a time.
+ The most efficient way to work through a table is by scanning through
+ an index one chunk at a time.
 
  Let's assume we want to purge Comments older than 1 month. We can
- scan the primary key index or the `account`,`timestamp` index. In this case the latter will
- probably work better since we are evaluating the timestamp for the purge.
+ scan the primary key index or the `account`,`timestamp` index. In
+ this case the latter will probably work better since we are evaluating
+ the timestamp for the purge.
 
  ```ruby
  r = CleanSweep::PurgeRunner.new model: Comment,
@@ -62,7 +64,8 @@ Check what it will do:
  r.print_queries($stdout)
  ```
 
- This will show you what it will do by printing out the three different statements used:
+ This will show you what it will do by printing out the three different
+ statements used:
 
  ```sql
  Initial Query:
@@ -82,13 +85,15 @@ This will show you what it will do by printing out the three different statement
  WHERE (`id` = 2)
  ```
 
- It does the initial statement once to get the first chunk of rows. Then it does subsequent queries
- starting at the index where the last chunk left off, thereby avoiding a complete index scan. This works
- fine as long as you don't have rows with duplicate account id and timestamps. If you do, you'll possibly
- miss rows between chunks.
+ It does the initial statement once to get the first chunk of rows.
+ Then it does subsequent queries starting at the index where the last
+ chunk left off, thereby avoiding a complete index scan. This works
+ fine as long as you don't have rows with duplicate account id and
+ timestamps. If you do, you'll possibly miss rows between chunks.
 
- To avoid missing duplicates, you can traverse the index using only the first column with an inclusive comparator
- like `>=` instead of `>`. Here's what that would look like:
+ To avoid missing duplicates, you can traverse the index using only the
+ first column with an inclusive comparator like `>=` instead of `>`.
+ Here's what that would look like:
 
  ```ruby
  r = CleanSweep::PurgeRunner.new model:Comment,
@@ -107,48 +112,70 @@ The chunk query looks like:
  LIMIT 500
  ```
 
- You can scan the index in either direction. To specify descending order, use the `reverse: true` option.
+ You can scan the index in either direction. To specify descending
+ order, use the `reverse: true` option.
 
  ### Copying rows from one table to another
 
- You can use the same technique to copy rows from one table to another. Support in CleanSweep is pretty
- minimal. It won't _move_ rows, only copy them, although it would be easy to fix this.
- I used this to copy ids into a temporary table which I then
- used to delete later.
+ You can use the same technique to copy rows from one table to another.
+ Support in CleanSweep is pretty minimal. It won't _move_ rows, only
+ copy them, although it would be easy to fix this. I used this to copy
+ ids into a temporary table which I then used to delete later.
 
- Here's an example that copies rows from the `Comment` model to the `ExpiredComment` model (`expired_comments`).
- Comments older than one week are copied.
+ Here's an example that copies rows from the `Comment` model to the
+ `ExpiredComment` model (`expired_comments`). Comments older than one
+ week are copied.
 
  ```ruby
  copier = CleanSweep::PurgeRunner.new model: Comment,
  index: 'comments_on_account_timestamp',
  dest_model: ExpiredComment,
+ copy_only: true,
  copy_columns: %w[liked] do do | model |
  model.where('last_used_at < ?', 1.week.ago)
  end
  ```
 
- The `copy_columns` option specifies additional columns to be inserted into the `expired_comments` table.
+ The `copy_columns` option specifies additional columns to be inserted
+ into the `expired_comments` table.
+
+ If the column names are different in the destination table than in the
+ source table, you can specify a mapping with the `dest_columns` option
+ which takes a map of source column name to destination name.
+
+ ### Deleting rows in another table
+
+ What if you want to query one table and delete those rows in another?
+ I needed this when I built a temporary table of account ids that
+ referenced deleted accounts. I then wanted to delete rows in other
+ tables that referenced those account ids. To do that, specify a
+ `dest_table` without specifying `copy_only` mode. This will execute
+ the delete statement on the destination table without removing rows
+ from the source table.
 
  ### Watching the history list and replication lag
 
- You can enter thresholds for the history list size and replication lag that will be used to pause the
- purge if either of those values get into an unsafe territory. The script will pause for 5 minutes and
- only start once the corresponding metric goes back down to 90% of the specified threshold.
+ You can enter thresholds for the history list size and replication lag
+ that will be used to pause the purge if either of those values get
+ into an unsafe territory. The script will pause for 5 minutes and
+ only start once the corresponding metric goes back down to 90% of the
+ specified threshold.
 
  ### Logging and monitoring progress
 
- You pass in a standard log instance to capture all running output. By default it will log to your
- `ActiveRecord::Base` logger, or stdout if that's not set up.
+ You pass in a standard log instance to capture all running output. By
+ default it will log to your `ActiveRecord::Base` logger, or stdout if
+ that's not set up.
 
- If you specify a reporting interval
- with the `report` option it will print the status of the purge at that interval. This is useful to track
- progress and assess the rate of deletion.
+ If you specify a reporting interval with the `report` option it will
+ print the status of the purge at that interval. This is useful to
+ track progress and assess the rate of deletion.
 
  ### Joins and subqueries
 
- You can add subqueries and joins to your query in the scope block, but be careful. The index and order
- clause may work against you if the table you are joining with doesn't have good parity with the indexes
+ You can add subqueries and joins to your query in the scope block, but
+ be careful. The index and order clause may work against you if the
+ table you are joining with doesn't have good parity with the indexes
  in your target table.
 
  ### Limitations
@@ -165,21 +192,24 @@ in your target table.
 
  ### Other options
 
- There are a number of other options you can use to tune the script. For details look at the
- [API on the `PurgeRunner` class](http://bkayser.github.io/cleansweep/rdoc/CleanSweep/PurgeRunner.html)
+ There are a number of other options you can use to tune the script.
+ For details look at the [API on the `PurgeRunner`
+ class](http://bkayser.github.io/cleansweep/rdoc/CleanSweep/PurgeRunner.html)
 
  ### NewRelic integration
 
- The script requires the [New Relic](http://github.com/newrelic/rpm) gem. It won't impact anyting if you
- don't have a New Relic account to report to, but if you do use New Relic it is configured to show you
- detailed metrics. I recommend turning off transaction traces for long purge jobs to reduce your memory
- footprint.
+ The script requires the [New Relic](http://github.com/newrelic/rpm)
+ gem. It won't impact anyting if you don't have a New Relic account to
+ report to, but if you do use New Relic it is configured to show you
+ detailed metrics. I recommend turning off transaction traces for long
+ purge jobs to reduce your memory footprint.
 
  ## Testing
 
- To run the specs, start a local mysql instance. The default user is root with an empty password.
- Override the user/password with environment variables `DB_USER` and `DB_PASSWORD`. The test
- creates a db called 'cstest'.
+ To run the specs, start a local mysql instance. The default user is
+ root with an empty password. Override the user/password with
+ environment variables `DB_USER` and `DB_PASSWORD`. The test creates a
+ db called 'cstest'.
 
  ## Contributing
 
@@ -197,5 +227,6 @@ Covered by the MIT [LICENSE](LICENSE.txt).
 
  ### Credits
 
- This was all inspired and informed by [Percona's `pt-archiver` script](http://www.percona.com/doc/percona-toolkit/2.1/pt-archiver.html)
+ This was all inspired and informed by [Percona's `pt-archiver`
+ script](http://www.percona.com/doc/percona-toolkit/2.1/pt-archiver.html)
  written by Baron Schwartz.
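The chunked traversal the README describes amounts to building a keyset predicate from the last row of the previous chunk. A simplified sketch in plain Ruby (not CleanSweep's actual implementation; the `account`/`timestamp` columns are taken from the README example):

```ruby
# Build the WHERE fragment that resumes an index scan after the last
# row of the previous chunk. With two key columns, strict traversal
# uses (a > x) OR (a = x AND t > y); first-column-only traversal uses
# the inclusive a >= x, revisiting duplicates instead of skipping them.
def chunk_clause(last_account, last_timestamp, first_only: false)
  if first_only
    "(`account` >= #{last_account})"
  else
    "(`account` > #{last_account} OR " \
      "(`account` = #{last_account} AND `timestamp` > '#{last_timestamp}'))"
  end
end

puts chunk_clause(0, '2014-11-18 21:47:43')
puts chunk_clause(5, nil, first_only: true)  # => (`account` >= 5)
```

The strict form matches the chunk query shown earlier in the README; the inclusive form trades a little rescanning for never missing rows with duplicate key prefixes.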
data/cleansweep.gemspec CHANGED
@@ -9,11 +9,15 @@ Gem::Specification.new do |spec|
  spec.authors = ["Bill Kayser"]
  spec.email = ["bkayser@newrelic.com"]
  spec.summary = %q{Utility to purge or archive rows in mysql tables}
+
+ spec.platform = Gem::Platform::RUBY
+ spec.required_ruby_version = '~> 2'
+
  spec.description = <<-EOF
  Purge data from mysql innodb tables efficiently with low overhead and impact.
  Based on the Percona pt-archive utility.
  EOF
- spec.homepage = "http://github.com/bkayser/cleansweep"
+ spec.homepage = "http://bkayser.github.com/cleansweep"
  spec.license = "MIT"
 
  spec.files = `git ls-files -z`.split("\x0")
@@ -23,7 +27,7 @@ Gem::Specification.new do |spec|
 
  spec.add_runtime_dependency 'activerecord', '>= 3.0'
  spec.add_runtime_dependency 'newrelic_rpm'
- spec.add_runtime_dependency 'mysql2', '~> 0.3.17'
+ spec.add_runtime_dependency 'mysql2', '~> 0.3'
 
  spec.add_development_dependency 'pry', '~> 0'
  spec.add_development_dependency 'bundler', '~> 1.7'
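The loosened mysql2 constraint above can be checked with RubyGems' own `Gem::Requirement`: `'~> 0.3'` accepts any 0.x release at or above 0.3, where the old `'~> 0.3.17'` capped versions below 0.4:

```ruby
require 'rubygems'  # ships with Ruby; provides Gem::Requirement and Gem::Version

old_req = Gem::Requirement.new('~> 0.3.17')  # allows >= 0.3.17 and < 0.4
new_req = Gem::Requirement.new('~> 0.3')     # allows >= 0.3 and < 1.0

%w[0.3.17 0.3.21 0.4.0].each do |v|
  version = Gem::Version.new(v)
  puts "#{v}: old=#{old_req.satisfied_by?(version)} new=#{new_req.satisfied_by?(version)}"
end
# 0.4.0 satisfies the new requirement but not the old one.
```

The pessimistic operator drops the last segment and bumps the one before it, so widening the constraint is just a matter of specifying fewer segments.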
@@ -36,11 +36,23 @@ require 'stringio'
  # The log instance to use. Defaults to the <tt>ActiveRecord::Base.logger</tt>
  # if not nil, otherwise it uses _$stdout_
  # [:dest_model]
- # When this option is present nothing is deleted, and instead rows are copied to
- # the table for this model. This model must
- # have identically named columns as the source model. By default, only columns in the
+ # Specifies the model for the delete operation, or the copy operation if in copy mode.
+ # When this option is present nothing is deleted in the model table. Instead, rows
+ # are either inserted into this table or deleted from this table.
+ # The columns in this model must include the primary key columns found in the source
+ # model. If they have different names you need to specify them with the
+ # <tt>dest_columns</tt> option.
+ # [:copy_only]
+ # Specifies copy mode, where rows are inserted into the destination table instead of deleted from
+ # the model table. By default, only columns in the
  # named index and primary key are copied but these can be augmented with columns in the
  # <tt>copy_columns</tt> option.
+ # [:dest_columns]
+ # This is a map of column names in the model to column names in the dest model when the
+ # corresponding models differ. Only column names that are different need to be specified.
+ # For instance your table of account ids might have <tt>account_id</tt>
+ # as the primary key column, but you want to delete rows in the accounts table where the account id is
+ # the column named <tt>id</tt>
  # [:copy_columns]
  # Extra columns to add when copying to a dest model.
  #
@@ -79,11 +91,15 @@ class CleanSweep::PurgeRunner
  @max_history = options[:max_history]
  @max_repl_lag = options[:max_repl_lag]
 
+ @copy_mode = @target_model && options[:copy_only]
+
  @table_schema = CleanSweep::TableSchema.new @model,
  key_name: options[:index],
  ascending: !options[:reverse],
  extra_columns: options[:copy_columns],
- first_only: options[:first_only]
+ first_only: options[:first_only],
+ dest_model: @target_model,
+ dest_columns: options[:dest_columns]
 
  if (@max_history || @max_repl_lag)
  @mysql_status = CleanSweep::PurgeRunner::MysqlStatus.new model: @model,
@@ -106,7 +122,7 @@ class CleanSweep::PurgeRunner
 
 
  def copy_mode?
- @target_model.present?
+ @copy_mode
  end
 
  # Execute the purge in chunks according to the parameters given on instance creation.
@@ -117,7 +133,10 @@ class CleanSweep::PurgeRunner
  #
  def execute_in_batches
 
- print_queries($stdout) and return 0 if @dry_run
+ if @dry_run
+ print_queries($stdout)
+ return 0
+ end
 
  @start = Time.now
  verb = copy_mode? ? "copying" : "purging"
@@ -146,7 +165,7 @@ class CleanSweep::PurgeRunner
  last_row = rows.last
  if copy_mode?
  metric_op_name = 'INSERT'
- statement = @table_schema.insert_statement(@target_model, rows)
+ statement = @table_schema.insert_statement(rows)
  else
  metric_op_name = 'DELETE'
  statement = @table_schema.delete_statement(rows)
@@ -190,11 +209,16 @@ class CleanSweep::PurgeRunner
  io.puts 'Initial Query:'
  io.puts format_query(' ', @query.to_sql)
  rows = @model.connection.select_rows @query.limit(1).to_sql
+ if rows.empty?
+ # Don't have any sample data to use for the sample queries, so use NULL values just
+ # so the query will print out.
+ rows << [nil] * 100
+ end
  io.puts "Chunk Query:"
  io.puts format_query(' ', @table_schema.scope_to_next_chunk(@query, rows.first).to_sql)
  if copy_mode?
  io.puts "Insert Statement:"
- io.puts format_query(' ', @table_schema.insert_statement(@target_model, rows))
+ io.puts format_query(' ', @table_schema.insert_statement(rows))
  else
  io.puts "Delete Statement:"
  io.puts format_query(' ', @table_schema.delete_statement(rows))
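The dry-run change above replaces `print_queries($stdout) and return 0 if @dry_run` with an explicit `if` block. The one-liner only returns when its left side is truthy, and IO-writing methods like `IO#puts` return nil, so the early return silently never fired. A minimal sketch of the pitfall (toy methods, not CleanSweep's code):

```ruby
# `and` short-circuits: `return` runs only if the left side is truthy.
def broken_early_exit(dry_run)
  print_queries = -> { nil }  # IO#puts and friends return nil
  print_queries.call and return :skipped if dry_run
  :executed  # reached even in dry-run mode -- the bug
end

def fixed_early_exit(dry_run)
  if dry_run
    # explicit block: the return no longer depends on a return value
    return :skipped
  end
  :executed
end

puts broken_early_exit(true)  # => executed (falls through)
puts fixed_early_exit(true)   # => skipped
```

This is why `do_thing and return` is a fragile idiom unless `do_thing` is guaranteed to return a truthy value.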
@@ -1,23 +1,39 @@
  class CleanSweep::TableSchema::ColumnSchema
 
- attr_reader :name
+ attr_reader :name, :ar_column
  attr_accessor :select_position
+ attr_writer :dest_name
 
  def initialize(name, model)
  @name = name.to_sym
  col_num = model.column_names.index(name.to_s) or raise "Can't find #{name} in #{model.name}"
  @model = model
- @column = model.columns[col_num]
+ @ar_column = model.columns[col_num]
  end
 
  def quoted_name
- "`#{name}`"
+ quote_column_name(@model, name)
  end
+
+ def quoted_dest_name(dest_model)
+ quote_column_name(dest_model, @dest_name || @name)
+ end
+
  def value(row)
  row[select_position]
  end
+
  def quoted_value(row)
- @model.quote_value(value(row), @column)
+ @model.quote_value(value(row), @ar_column)
+ end
+
+ def == other
+ return other && name == other.name
+ end
+
+ private
+ def quote_column_name(model, column_name)
+ model.connection.quote_table_name(model.table_name) + "." + model.connection.quote_column_name(column_name)
  end
  end
 
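The new quoting above produces table-qualified identifiers (`` `comments`.`account` `` instead of bare `` `account` ``), which keeps generated SQL unambiguous once a destination table enters the picture. A standalone sketch of MySQL-style identifier quoting (hand-rolled here; the real code delegates to the ActiveRecord connection's `quote_table_name`/`quote_column_name`):

```ruby
# Quote a MySQL identifier, doubling any embedded backticks.
def quote_ident(name)
  "`#{name.to_s.gsub('`', '``')}`"
end

# Build the table-qualified form used in the generated SQL.
def qualified_column(table, column)
  "#{quote_ident(table)}.#{quote_ident(column)}"
end

puts qualified_column('comments', 'account')  # => `comments`.`account`
```

Delegating to the connection (as the diff does) is the right call in real code, since each adapter knows its own quoting rules.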
@@ -1,6 +1,6 @@
  class CleanSweep::TableSchema::IndexSchema < Struct.new :name, :model, :ascending
 
- attr_accessor :columns, :name, :model, :ascending, :first_only
+ attr_accessor :columns, :name, :model, :ascending, :first_only, :dest_model
 
  def initialize name, model
  @model = model
@@ -16,12 +16,12 @@ class CleanSweep::TableSchema::IndexSchema < Struct.new :name, :model, :ascendin
  # Take columns referenced by this index and add them to the list if they
  # are not present. Record their position in the list because the position will
  # be where they are located in a row of values passed in later to #scope_to_next_chunk
- def add_columns_to select_columns
+ def add_columns_to columns
  @columns.each do | column |
- pos = select_columns.index column.name
+ pos = columns.index column
  if pos.nil?
- select_columns << column.name
- pos = select_columns.size - 1
+ columns << column
+ pos = columns.size - 1
  end
  column.select_position = pos
  end
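`add_columns_to` above now deduplicates whole column objects rather than names, relying on the new `ColumnSchema#==` that compares by name; each column records where it sits in the shared select list. A reduced sketch of that find-or-append bookkeeping (a toy `Column` struct stands in for `ColumnSchema`):

```ruby
# Minimal stand-in for ColumnSchema: equality by name, plus a
# recorded position in the shared select list.
Column = Struct.new(:name, :select_position) do
  def ==(other)
    other && name == other.name
  end
end

def add_columns_to(index_columns, select_list)
  index_columns.each do |column|
    pos = select_list.index(column)  # uses Column#== (by name)
    if pos.nil?
      select_list << column
      pos = select_list.size - 1
    end
    column.select_position = pos
  end
end

select_list = [Column.new(:id)]
add_columns_to [Column.new(:account), Column.new(:id)], select_list
p select_list.map(&:name)  # => [:id, :account]
```

Because `Array#index` consults the custom `==`, a column shared between the primary key and the traversing index is stored once but both copies learn the same select position.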
@@ -2,7 +2,7 @@
  class CleanSweep::TableSchema
 
  # The list of columns used when selecting, the union of pk and traversing key columns
- attr_reader :select_columns
+ attr_reader :columns
 
  # The schema for the primary key
  attr_reader :primary_key
@@ -18,8 +18,17 @@ class CleanSweep::TableSchema
  ascending = options.include?(:ascending) ? options[:ascending] : true
  first_only = options[:first_only]
  @model = model
+ @dest_model = options[:dest_model] || @model
+
+ # Downcase and symbolize the entries in the column name map:
+ dest_columns_map = Hash[*(options[:dest_columns] || {}).to_a.flatten.map{|n| n.to_s.downcase.to_sym}]
+
  @name = @model.table_name
- @select_columns = (options[:extra_columns] && options[:extra_columns].map(&:to_sym)) || []
+
+ @columns =
+ (options[:extra_columns] || []).map do | extra_col_name |
+ CleanSweep::TableSchema::ColumnSchema.new extra_col_name, model
+ end
 
  key_schemas = build_indexes
 
@@ -28,31 +37,40 @@ class CleanSweep::TableSchema
  raise "Table #{model.table_name} must have a primary key" unless key_schemas.include? 'primary'
 
  @primary_key = key_schemas['primary']
- @primary_key.add_columns_to @select_columns
+ @primary_key.add_columns_to @columns
  if traversing_key_name
  traversing_key_name.downcase!
  raise "BTREE Index #{traversing_key_name} not found" unless key_schemas.include? traversing_key_name
  @traversing_key = key_schemas[traversing_key_name]
- @traversing_key.add_columns_to @select_columns
+ @traversing_key.add_columns_to @columns
  @traversing_key.ascending = ascending
  @traversing_key.first_only = first_only
  end
 
+ # Specify the column names in the destination map, if provided
+ @columns.each do | column |
+ column.dest_name = dest_columns_map[column.name]
+ end
+
  end
 
- def insert_statement(target_model, rows)
- "insert into #{target_model.quoted_table_name} (#{quoted_column_names}) values #{quoted_row_values(rows)}"
+ def column_names
+ @columns.map(&:name)
+ end
+
+ def insert_statement(rows)
+ "insert into #{@dest_model.quoted_table_name} (#{quoted_dest_column_names}) values #{quoted_row_values(rows)}"
  end
 
  def delete_statement(rows)
  rec_criteria = rows.map do | row |
  row_compares = []
  @primary_key.columns.each do |column|
- row_compares << "#{column.quoted_name} = #{column.quoted_value(row)}"
+ row_compares << "#{column.quoted_dest_name(@dest_model)} = #{column.quoted_value(row)}"
  end
  "(" + row_compares.join(" AND ") + ")"
  end
- "DELETE FROM #{@model.quoted_table_name} WHERE #{rec_criteria.join(" OR ")}"
+ "DELETE FROM #{@dest_model.quoted_table_name} WHERE #{rec_criteria.join(" OR ")}"
  end
 
  def initial_scope
@@ -82,15 +100,20 @@ class CleanSweep::TableSchema
  end
 
  def quoted_column_names
- select_columns.map{|c| "`#{c}`"}.join(",")
+ columns.map{|c| "#{c.quoted_name}"}.join(",")
+ end
+
+ def quoted_dest_column_names
+ columns.map{|c| c.quoted_dest_name(@dest_model)}.join(",")
  end
 
  def quoted_row_values(rows)
  rows.map do |vec|
- quoted_column_values = vec.map do |col_value|
- @model.connection.quote(col_value)
- end.join(",")
- "(#{quoted_column_values})"
+ row = []
+ columns.each_with_index do | col, i |
+ row << @model.quote_value(vec[i], col.ar_column)
+ end
+ "(#{row.join(',')})"
  end.join(",")
  end
@@ -1,3 +1,3 @@
  module CleanSweep
- VERSION = "1.0.1"
+ VERSION = "1.0.2"
  end
@@ -12,6 +12,7 @@ class Book < ActiveRecord::Base
  key book_index_by_bin(bin, id)
  )
  EOF
+ Book.delete_all
  end
 
  end
@@ -27,10 +28,17 @@ end
  class BookTemp < ActiveRecord::Base
 
  self.table_name = 'book_vault'
+ self.primary_key= 'book_id'
 
  def self.create_table
  connection.execute <<-EOF
- create temporary table if not exists book_vault like books
+ create temporary table if not exists
+ book_vault (
+ `book_id` int(11) primary key auto_increment,
+ `bin` int(11),
+ `published_by` varchar(64)
+ )
  EOF
+ BookTemp.delete_all
  end
  end
@@ -72,20 +72,20 @@ describe CleanSweep::PurgeRunner do
  purger.print_queries(output)
  expect(output.string).to eq <<EOF
  Initial Query:
- SELECT `id`,`account`,`timestamp`
+ SELECT `comments`.`id`,`comments`.`account`,`comments`.`timestamp`
  FROM `comments` FORCE INDEX(comments_on_account_timestamp)
  WHERE (timestamp < '2014-11-25 21:47:43')
- ORDER BY `account` ASC,`timestamp` ASC
+ ORDER BY `comments`.`account` ASC,`comments`.`timestamp` ASC
  LIMIT 500
  Chunk Query:
- SELECT `id`,`account`,`timestamp`
+ SELECT `comments`.`id`,`comments`.`account`,`comments`.`timestamp`
  FROM `comments` FORCE INDEX(comments_on_account_timestamp)
- WHERE (timestamp < '2014-11-25 21:47:43') AND (`account` > 0 OR (`account` = 0 AND `timestamp` > '2014-11-18 21:47:43'))\n ORDER BY `account` ASC,`timestamp` ASC
+ WHERE (timestamp < '2014-11-25 21:47:43') AND (`comments`.`account` > 0 OR (`comments`.`account` = 0 AND `comments`.`timestamp` > '2014-11-18 21:47:43'))\n ORDER BY `comments`.`account` ASC,`comments`.`timestamp` ASC
  LIMIT 500
  Delete Statement:
  DELETE
  FROM `comments`
- WHERE (`id` = 2)
+ WHERE (`comments`.`id` = 2)
  EOF
  end
  end
@@ -167,13 +167,20 @@ EOF
  it 'copies books' do
  BookTemp.create_table
  purger = CleanSweep::PurgeRunner.new model: Book,
+ copy_columns: ['publisher'],
  dest_model: BookTemp,
+ dest_columns: { 'PUBLISHER' => 'published_by', 'ID' => 'book_id'},
  chunk_size: 4,
+ copy_only: true,
  index: 'book_index_by_bin'
 
  count = purger.execute_in_batches
  expect(count).to be(@total_book_size)
  expect(BookTemp.count).to eq(@total_book_size)
+ last_book = BookTemp.last
+ expect(last_book.book_id).to be 200
+ expect(last_book.bin).to be 2000
+ expect(last_book.published_by).to eq 'Random House'
  end
 
  end
@@ -17,22 +17,22 @@ describe CleanSweep::TableSchema do
  it 'should produce an ascending chunk clause' do
  rows = account_and_timestamp_rows
  expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
- .to include("(`account` > 5 OR (`account` = 5 AND `timestamp` > '2014-12-01 23:13:25'))")
+ .to include("(`comments`.`account` > 5 OR (`comments`.`account` = 5 AND `comments`.`timestamp` > '2014-12-01 23:13:25'))")
  end
 
  it 'should produce all select columns' do
- expect(schema.select_columns).to eq([:id, :account, :timestamp])
+ expect(schema.column_names).to eq([:id, :account, :timestamp])
  end
 
  it 'should produce the ascending order clause' do
- expect(schema.initial_scope.to_sql).to include('`account` ASC,`timestamp` ASC')
+ expect(schema.initial_scope.to_sql).to include('`comments`.`account` ASC,`comments`.`timestamp` ASC')
  end
 
 
  it 'should produce an insert statement' do
  schema = CleanSweep::TableSchema.new Comment, key_name: 'comments_on_account_timestamp'
  rows = account_and_timestamp_rows
- expect(schema.insert_statement(Comment, rows)).to eq("insert into `comments` (`id`,`account`,`timestamp`) values (1001,5,'2014-12-02 01:13:25'),(1002,2,'2014-12-02 00:13:25'),(1005,5,'2014-12-01 23:13:25')")
+ expect(schema.insert_statement(rows)).to eq("insert into `comments` (`comments`.`id`,`comments`.`account`,`comments`.`timestamp`) values (1001,5,'2014-12-02 01:13:25'),(1002,2,'2014-12-02 00:13:25'),(1005,5,'2014-12-01 23:13:25')")
  end
  end
 
@@ -43,14 +43,14 @@ describe CleanSweep::TableSchema do
  it 'should produce a descending where clause' do
  rows = account_and_timestamp_rows
  expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
- .to include("(`account` < 5 OR (`account` = 5 AND `timestamp` < '2014-12-01 23:13:25'))")
+ .to include("(`comments`.`account` < 5 OR (`comments`.`account` = 5 AND `comments`.`timestamp` < '2014-12-01 23:13:25'))")
  end
 
 
  it 'should produce the descending order clause' do
  rows = account_and_timestamp_rows
  expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
- .to include("`account` DESC,`timestamp` DESC")
+ .to include("`comments`.`account` DESC,`comments`.`timestamp` DESC")
  end
 
  end
@@ -59,13 +59,13 @@ describe CleanSweep::TableSchema do
  let(:schema) { CleanSweep::TableSchema.new Comment, key_name:'comments_on_account_timestamp', first_only: true }
 
  it 'should select all the rows' do
- expect(schema.select_columns).to eq([:id, :account, :timestamp])
+ expect(schema.column_names).to eq([:id, :account, :timestamp])
  end
 
  it 'should only query using the first column of the index' do
  rows = account_and_timestamp_rows
  expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
- .to include(" (`account` >= 5) ")
+ .to include(" (`comments`.`account` >= 5) ")
 
  end
 
@@ -83,7 +83,7 @@ describe CleanSweep::TableSchema do
 
  it 'should produce minimal select columns' do
  schema = CleanSweep::TableSchema.new Comment, key_name: 'PRIMARY'
- expect(schema.select_columns).to eq([:id])
+ expect(schema.column_names).to eq([:id])
  end
 
  it 'should produce the from clause with an index' do
@@ -93,10 +93,10 @@ describe CleanSweep::TableSchema do
 
  it 'should include additional columns' do
  schema = CleanSweep::TableSchema.new Comment, key_name: 'comments_on_account_timestamp', extra_columns: %w[seen id]
- expect(schema.select_columns).to eq([:seen, :id, :account, :timestamp])
+ expect(schema.column_names).to eq([:seen, :id, :account, :timestamp])
  rows = account_and_timestamp_rows
  rows.map! { |row| row.unshift 1 } # Insert 'seen' value to beginning of row
- expect(schema.insert_statement(Comment, rows)).to eq("insert into `comments` (`seen`,`id`,`account`,`timestamp`) values (1,1001,5,'2014-12-02 01:13:25'),(1,1002,2,'2014-12-02 00:13:25'),(1,1005,5,'2014-12-01 23:13:25')")
+ expect(schema.insert_statement(rows)).to eq("insert into `comments` (`comments`.`seen`,`comments`.`id`,`comments`.`account`,`comments`.`timestamp`) values (1,1001,5,'2014-12-02 01:13:25'),(1,1002,2,'2014-12-02 00:13:25'),(1,1005,5,'2014-12-01 23:13:25')")
 
  end
 
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: cleansweep
  version: !ruby/object:Gem::Version
- version: 1.0.1
+ version: 1.0.2
  platform: ruby
  authors:
  - Bill Kayser
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-12-02 00:00:00.000000000 Z
+ date: 2014-12-03 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: activerecord
@@ -44,14 +44,14 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 0.3.17
+ version: '0.3'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 0.3.17
+ version: '0.3'
  - !ruby/object:Gem::Dependency
  name: pry
  requirement: !ruby/object:Gem::Requirement
@@ -167,7 +167,7 @@ files:
  - spec/purge_runner_spec.rb
  - spec/spec_helper.rb
  - spec/table_schema_spec.rb
- homepage: http://github.com/bkayser/cleansweep
+ homepage: http://bkayser.github.com/cleansweep
  licenses:
  - MIT
  metadata: {}
@@ -177,9 +177,9 @@ require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - "~>"
  - !ruby/object:Gem::Version
- version: '0'
+ version: '2'
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="