cleansweep 1.0.1 → 1.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: b3cbb9366208c06f5d514941f464a4f87dcc1a5a
- data.tar.gz: 47d8ffbb89cc27024bbd20d6af95275e0c05f978
+ metadata.gz: 171c5ce6b972df17162909a1538cf8ecc867e347
+ data.tar.gz: 6d91698f6759a599e03683287ea230230d99a475
  SHA512:
- metadata.gz: c60c9e771c8711d67e548e8c68258ab5f6407675f96efdf27f7b31d2308c2b4a75e9c3a87d514fff67b54aab2fedab1621a6560dbb74146badee7930b57f6061
- data.tar.gz: 4fa1bb94eff78d05bd7c2946d567dfb32a09bda50fcae7a305be431b486e0d253288079055771d391bfd21bb3ef2978bafba6044a4f361d259a00a8cbcf2c738
+ metadata.gz: 5373eb62b1acbf097681efde6a1ad08ad94c574ee5676546e0c798ba91d93a8a28efc97fe8f4ce95e8b5f0ee8f1e4b12e340c6cdf0a78e901589ecbc85f4fee5
+ data.tar.gz: 199c96ba5a90457bd6d3310b59a293de1dd185d84b009d1f849ce13637210415606ba29fd704312186c8767aaaf9990e902bfcbc7d4e561f28f200112ef83f0e
data/CHANGES.md CHANGED
@@ -1,5 +1,11 @@
  See the [documentation](http://bkayser.github.io/cleansweep) for details

- # Version 1.0.1
+ ### Version 1.0.1

- * Initial release
+ * Initial release
+
+ ### Version 1.0.2
+
+ * Changed destination options so you can delete from a different table.
+ * Added `dest_columns` option as a map of column names in the source to column names in the destination.
+ * More testing and bug fixing in real environments
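The headline 1.0.2 change — deleting from a different table with optionally renamed key columns — boils down to building the DELETE against the destination table while mapping source column names through `dest_columns`. The sketch below is plain Ruby, not CleanSweep's actual API: `delete_statement` here is a standalone stand-in with invented arguments, and values are interpolated without quoting for brevity.

```ruby
# Build a DELETE against a destination table using primary-key values read
# from the source table. `dest_columns` maps a source column name to its
# name in the destination; unmapped columns keep their source name.
def delete_statement(dest_table, pk_columns, dest_columns, pk_rows)
  criteria = pk_rows.map do |row|
    compares = pk_columns.each_with_index.map do |col, i|
      dest = dest_columns.fetch(col, col)  # fall back to the source name
      "`#{dest}` = #{row[i]}"
    end
    "(#{compares.join(' AND ')})"
  end
  "DELETE FROM `#{dest_table}` WHERE #{criteria.join(' OR ')}"
end

# Source rows keyed by account_id; destination table keys the same value as id.
puts delete_statement('accounts', [:account_id], { account_id: :id }, [[5], [9]])
# → DELETE FROM `accounts` WHERE (`id` = 5) OR (`id` = 9)
```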
data/README.md CHANGED
@@ -1,5 +1,6 @@
- Cleansweep is a utility for scripting purges using ruby in an efficient, low-impact manner on
- mysql innodb tables. Based on the Percona `pt-archive` utility.
+ Cleansweep is a utility for scripting purges using ruby in an
+ efficient, low-impact manner on mysql innodb tables. Based on the
+ Percona `pt-archive` utility.

  ## Installation

@@ -35,12 +36,13 @@ Assume there is an active record model for it:

  ### Purging by traversing an index

- The most efficient way to work through a table is by scanning through an index one chunk
- at a time.
+ The most efficient way to work through a table is by scanning through
+ an index one chunk at a time.

  Let's assume we want to purge Comments older than 1 month. We can
- scan the primary key index or the `account`,`timestamp` index. In this case the latter will
- probably work better since we are evaluating the timestamp for the purge.
+ scan the primary key index or the `account`,`timestamp` index. In
+ this case the latter will probably work better since we are evaluating
+ the timestamp for the purge.

  ```ruby
  r = CleanSweep::PurgeRunner.new model: Comment,
@@ -62,7 +64,8 @@ Check what it will do:
  r.print_queries($stdout)
  ```

- This will show you what it will do by printing out the three different statements used:
+ This will show you what it will do by printing out the three different
+ statements used:

  ```sql
  Initial Query:
@@ -82,13 +85,15 @@ This will show you what it will do by printing out the three different statement
  WHERE (`id` = 2)
  ```

- It does the initial statement once to get the first chunk of rows. Then it does subsequent queries
- starting at the index where the last chunk left off, thereby avoiding a complete index scan. This works
- fine as long as you don't have rows with duplicate account id and timestamps. If you do, you'll possibly
- miss rows between chunks.
+ It does the initial statement once to get the first chunk of rows.
+ Then it does subsequent queries starting at the index where the last
+ chunk left off, thereby avoiding a complete index scan. This works
+ fine as long as you don't have rows with duplicate account id and
+ timestamps. If you do, you'll possibly miss rows between chunks.

- To avoid missing duplicates, you can traverse the index using only the first column with an inclusive comparator
- like `>=` instead of `>`. Here's what that would look like:
+ To avoid missing duplicates, you can traverse the index using only the
+ first column with an inclusive comparator like `>=` instead of `>`.
+ Here's what that would look like:

  ```ruby
  r = CleanSweep::PurgeRunner.new model:Comment,
@@ -107,48 +112,70 @@ The chunk query looks like:
  LIMIT 500
  ```

- You can scan the index in either direction. To specify descending order, use the `reverse: true` option.
+ You can scan the index in either direction. To specify descending
+ order, use the `reverse: true` option.

  ### Copying rows from one table to another

- You can use the same technique to copy rows from one table to another. Support in CleanSweep is pretty
- minimal. It won't _move_ rows, only copy them, although it would be easy to fix this.
- I used this to copy ids into a temporary table which I then
- used to delete later.
+ You can use the same technique to copy rows from one table to another.
+ Support in CleanSweep is pretty minimal. It won't _move_ rows, only
+ copy them, although it would be easy to fix this. I used this to copy
+ ids into a temporary table which I then used to delete later.

- Here's an example that copies rows from the `Comment` model to the `ExpiredComment` model (`expired_comments`).
- Comments older than one week are copied.
+ Here's an example that copies rows from the `Comment` model to the
+ `ExpiredComment` model (`expired_comments`). Comments older than one
+ week are copied.

  ```ruby
  copier = CleanSweep::PurgeRunner.new model: Comment,
  index: 'comments_on_account_timestamp',
  dest_model: ExpiredComment,
+ copy_only: true,
  copy_columns: %w[liked] do do | model |
  model.where('last_used_at < ?', 1.week.ago)
  end
  ```

- The `copy_columns` option specifies additional columns to be inserted into the `expired_comments` table.
+ The `copy_columns` option specifies additional columns to be inserted
+ into the `expired_comments` table.
+
+ If the column names are different in the destination table than in the
+ source table, you can specify a mapping with the `dest_columns` option
+ which takes a map of source column name to destination name.
+
+ ### Deleting rows in another table
+
+ What if you want to query one table and delete those rows in another?
+ I needed this when I built a temporary table of account ids that
+ referenced deleted accounts. I then wanted to delete rows in other
+ tables that referenced those account ids. To do that, specify a
+ `dest_table` without specifying `copy_only` mode. This will execute
+ the delete statement on the destination table without removing rows
+ from the source table.

  ### Watching the history list and replication lag

- You can enter thresholds for the history list size and replication lag that will be used to pause the
- purge if either of those values get into an unsafe territory. The script will pause for 5 minutes and
- only start once the corresponding metric goes back down to 90% of the specified threshold.
+ You can enter thresholds for the history list size and replication lag
+ that will be used to pause the purge if either of those values get
+ into an unsafe territory. The script will pause for 5 minutes and
+ only start once the corresponding metric goes back down to 90% of the
+ specified threshold.

  ### Logging and monitoring progress

- You pass in a standard log instance to capture all running output. By default it will log to your
- `ActiveRecord::Base` logger, or stdout if that's not set up.
+ You pass in a standard log instance to capture all running output. By
+ default it will log to your `ActiveRecord::Base` logger, or stdout if
+ that's not set up.

- If you specify a reporting interval
- with the `report` option it will print the status of the purge at that interval. This is useful to track
- progress and assess the rate of deletion.
+ If you specify a reporting interval with the `report` option it will
+ print the status of the purge at that interval. This is useful to
+ track progress and assess the rate of deletion.

  ### Joins and subqueries

- You can add subqueries and joins to your query in the scope block, but be careful. The index and order
- clause may work against you if the table you are joining with doesn't have good parity with the indexes
+ You can add subqueries and joins to your query in the scope block, but
+ be careful. The index and order clause may work against you if the
+ table you are joining with doesn't have good parity with the indexes
  in your target table.

  ### Limitations
@@ -165,21 +192,24 @@ in your target table.

  ### Other options

- There are a number of other options you can use to tune the script. For details look at the
- [API on the `PurgeRunner` class](http://bkayser.github.io/cleansweep/rdoc/CleanSweep/PurgeRunner.html)
+ There are a number of other options you can use to tune the script.
+ For details look at the [API on the `PurgeRunner`
+ class](http://bkayser.github.io/cleansweep/rdoc/CleanSweep/PurgeRunner.html)

  ### NewRelic integration

- The script requires the [New Relic](http://github.com/newrelic/rpm) gem. It won't impact anyting if you
- don't have a New Relic account to report to, but if you do use New Relic it is configured to show you
- detailed metrics. I recommend turning off transaction traces for long purge jobs to reduce your memory
- footprint.
+ The script requires the [New Relic](http://github.com/newrelic/rpm)
+ gem. It won't impact anyting if you don't have a New Relic account to
+ report to, but if you do use New Relic it is configured to show you
+ detailed metrics. I recommend turning off transaction traces for long
+ purge jobs to reduce your memory footprint.

  ## Testing

- To run the specs, start a local mysql instance. The default user is root with an empty password.
- Override the user/password with environment variables `DB_USER` and `DB_PASSWORD`. The test
- creates a db called 'cstest'.
+ To run the specs, start a local mysql instance. The default user is
+ root with an empty password. Override the user/password with
+ environment variables `DB_USER` and `DB_PASSWORD`. The test creates a
+ db called 'cstest'.

  ## Contributing

@@ -197,5 +227,6 @@ Covered by the MIT [LICENSE](LICENSE.txt).

  ### Credits

- This was all inspired and informed by [Percona's `pt-archiver` script](http://www.percona.com/doc/percona-toolkit/2.1/pt-archiver.html)
+ This was all inspired and informed by [Percona's `pt-archiver`
+ script](http://www.percona.com/doc/percona-toolkit/2.1/pt-archiver.html)
  written by Baron Schwartz.
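The README's chunk-traversal discussion condenses to one predicate: strict comparators over the full index for exact traversal, or an inclusive comparator on the first column under `first_only`. This is a plain-Ruby illustration of that clause shape, not CleanSweep's implementation (which builds ActiveRecord scopes); the method name `chunk_condition` and its arguments are invented for the example.

```ruby
# Build the WHERE predicate that resumes an index scan after the last row
# of the previous chunk. `columns` are index column names in order and
# `last_row` holds the values from the last fetched row.
def chunk_condition(columns, last_row, first_only: false, ascending: true)
  op = ascending ? '>' : '<'
  if first_only
    # Inclusive comparison on the first column only, so rows that share the
    # boundary value are revisited rather than skipped.
    return "(`#{columns.first}` #{op}= #{last_row.first})"
  end
  # (a > v1) OR (a = v1 AND b > v2) OR ...
  terms = columns.each_index.map do |i|
    eqs = (0...i).map { |j| "`#{columns[j]}` = #{last_row[j]}" }
    cmp = "`#{columns[i]}` #{op} #{last_row[i]}"
    i.zero? ? cmp : "(#{(eqs + [cmp]).join(' AND ')})"
  end
  "(#{terms.join(' OR ')})"
end

puts chunk_condition(%w[account timestamp], [5, 99])
# → (`account` > 5 OR (`account` = 5 AND `timestamp` > 99))
puts chunk_condition(%w[account timestamp], [5, 99], first_only: true)
# → (`account` >= 5)
```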
data/cleansweep.gemspec CHANGED
@@ -9,11 +9,15 @@ Gem::Specification.new do |spec|
  spec.authors = ["Bill Kayser"]
  spec.email = ["bkayser@newrelic.com"]
  spec.summary = %q{Utility to purge or archive rows in mysql tables}
+
+ spec.platform = Gem::Platform::RUBY
+ spec.required_ruby_version = '~> 2'
+
  spec.description = <<-EOF
  Purge data from mysql innodb tables efficiently with low overhead and impact.
  Based on the Percona pt-archive utility.
  EOF
- spec.homepage = "http://github.com/bkayser/cleansweep"
+ spec.homepage = "http://bkayser.github.com/cleansweep"
  spec.license = "MIT"

  spec.files = `git ls-files -z`.split("\x0")
@@ -23,7 +27,7 @@ Gem::Specification.new do |spec|

  spec.add_runtime_dependency 'activerecord', '>= 3.0'
  spec.add_runtime_dependency 'newrelic_rpm'
- spec.add_runtime_dependency 'mysql2', '~> 0.3.17'
+ spec.add_runtime_dependency 'mysql2', '~> 0.3'

  spec.add_development_dependency 'pry', '~> 0'
  spec.add_development_dependency 'bundler', '~> 1.7'
@@ -36,11 +36,23 @@ require 'stringio'
  # The log instance to use. Defaults to the <tt>ActiveRecord::Base.logger</tt>
  # if not nil, otherwise it uses _$stdout_
  # [:dest_model]
- # When this option is present nothing is deleted, and instead rows are copied to
- # the table for this model. This model must
- # have identically named columns as the source model. By default, only columns in the
+ # Specifies the model for the delete operation, or the copy operation if in copy mode.
+ # When this option is present nothing is deleted in the model table. Instead, rows
+ # are either inserted into this table or deleted from this table.
+ # The columns in this model must include the primary key columns found in the source
+ # model. If they have different names you need to specify them with the
+ # <tt>dest_columns</tt> option.
+ # [:copy_only]
+ # Specifies copy mode, where rows are inserted into the destination table instead of deleted from
+ # the model table. By default, only columns in the
  # named index and primary key are copied but these can be augmented with columns in the
  # <tt>copy_columns</tt> option.
+ # [:dest_columns]
+ # This is a map of column names in the model to column names in the dest model when the
+ # corresponding models differ. Only column names that are different need to be specified.
+ # For instance your table of account ids might have <tt>account_id</tt>
+ # as the primary key column, but you want to delete rows in the accounts table where the account id is
+ # the column named <tt>id</tt>
  # [:copy_columns]
  # Extra columns to add when copying to a dest model.
  #
@@ -79,11 +91,15 @@ class CleanSweep::PurgeRunner
  @max_history = options[:max_history]
  @max_repl_lag = options[:max_repl_lag]

+ @copy_mode = @target_model && options[:copy_only]
+
  @table_schema = CleanSweep::TableSchema.new @model,
  key_name: options[:index],
  ascending: !options[:reverse],
  extra_columns: options[:copy_columns],
- first_only: options[:first_only]
+ first_only: options[:first_only],
+ dest_model: @target_model,
+ dest_columns: options[:dest_columns]

  if (@max_history || @max_repl_lag)
  @mysql_status = CleanSweep::PurgeRunner::MysqlStatus.new model: @model,
@@ -106,7 +122,7 @@ class CleanSweep::PurgeRunner


  def copy_mode?
- @target_model.present?
+ @copy_mode
  end

  # Execute the purge in chunks according to the parameters given on instance creation.
@@ -117,7 +133,10 @@ class CleanSweep::PurgeRunner
  #
  def execute_in_batches

- print_queries($stdout) and return 0 if @dry_run
+ if @dry_run
+ print_queries($stdout)
+ return 0
+ end

  @start = Time.now
  verb = copy_mode? ? "copying" : "purging"
@@ -146,7 +165,7 @@ class CleanSweep::PurgeRunner
  last_row = rows.last
  if copy_mode?
  metric_op_name = 'INSERT'
- statement = @table_schema.insert_statement(@target_model, rows)
+ statement = @table_schema.insert_statement(rows)
  else
  metric_op_name = 'DELETE'
  statement = @table_schema.delete_statement(rows)
@@ -190,11 +209,16 @@ class CleanSweep::PurgeRunner
  io.puts 'Initial Query:'
  io.puts format_query(' ', @query.to_sql)
  rows = @model.connection.select_rows @query.limit(1).to_sql
+ if rows.empty?
+ # Don't have any sample data to use for the sample queries, so use NULL values just
+ # so the query will print out.
+ rows << [nil] * 100
+ end
  io.puts "Chunk Query:"
  io.puts format_query(' ', @table_schema.scope_to_next_chunk(@query, rows.first).to_sql)
  if copy_mode?
  io.puts "Insert Statement:"
- io.puts format_query(' ', @table_schema.insert_statement(@target_model, rows))
+ io.puts format_query(' ', @table_schema.insert_statement(rows))
  else
  io.puts "Delete Statement:"
  io.puts format_query(' ', @table_schema.delete_statement(rows))
@@ -1,23 +1,39 @@
  class CleanSweep::TableSchema::ColumnSchema

- attr_reader :name
+ attr_reader :name, :ar_column
  attr_accessor :select_position
+ attr_writer :dest_name

  def initialize(name, model)
  @name = name.to_sym
  col_num = model.column_names.index(name.to_s) or raise "Can't find #{name} in #{model.name}"
  @model = model
- @column = model.columns[col_num]
+ @ar_column = model.columns[col_num]
  end

  def quoted_name
- "`#{name}`"
+ quote_column_name(@model, name)
  end
+
+ def quoted_dest_name(dest_model)
+ quote_column_name(dest_model, @dest_name || @name)
+ end
+
  def value(row)
  row[select_position]
  end
+
  def quoted_value(row)
- @model.quote_value(value(row), @column)
+ @model.quote_value(value(row), @ar_column)
+ end
+
+ def == other
+ return other && name == other.name
+ end
+
+ private
+ def quote_column_name(model, column_name)
+ model.connection.quote_table_name(model.table_name) + "." + model.connection.quote_column_name(column_name)
  end
  end

@@ -1,6 +1,6 @@
  class CleanSweep::TableSchema::IndexSchema < Struct.new :name, :model, :ascending

- attr_accessor :columns, :name, :model, :ascending, :first_only
+ attr_accessor :columns, :name, :model, :ascending, :first_only, :dest_model

  def initialize name, model
  @model = model
@@ -16,12 +16,12 @@ class CleanSweep::TableSchema::IndexSchema < Struct.new :name, :model, :ascendin
  # Take columns referenced by this index and add them to the list if they
  # are not present. Record their position in the list because the position will
  # be where they are located in a row of values passed in later to #scope_to_next_chunk
- def add_columns_to select_columns
+ def add_columns_to columns
  @columns.each do | column |
- pos = select_columns.index column.name
+ pos = columns.index column
  if pos.nil?
- select_columns << column.name
- pos = select_columns.size - 1
+ columns << column
+ pos = columns.size - 1
  end
  column.select_position = pos
  end
@@ -2,7 +2,7 @@
  class CleanSweep::TableSchema

  # The list of columns used when selecting, the union of pk and traversing key columns
- attr_reader :select_columns
+ attr_reader :columns

  # The schema for the primary key
  attr_reader :primary_key
@@ -18,8 +18,17 @@ class CleanSweep::TableSchema
  ascending = options.include?(:ascending) ? options[:ascending] : true
  first_only = options[:first_only]
  @model = model
+ @dest_model = options[:dest_model] || @model
+
+ # Downcase and symbolize the entries in the column name map:
+ dest_columns_map = Hash[*(options[:dest_columns] || {}).to_a.flatten.map{|n| n.to_s.downcase.to_sym}]
+
  @name = @model.table_name
- @select_columns = (options[:extra_columns] && options[:extra_columns].map(&:to_sym)) || []
+
+ @columns =
+ (options[:extra_columns] || []).map do | extra_col_name |
+ CleanSweep::TableSchema::ColumnSchema.new extra_col_name, model
+ end

  key_schemas = build_indexes

@@ -28,31 +37,40 @@ class CleanSweep::TableSchema
  raise "Table #{model.table_name} must have a primary key" unless key_schemas.include? 'primary'

  @primary_key = key_schemas['primary']
- @primary_key.add_columns_to @select_columns
+ @primary_key.add_columns_to @columns
  if traversing_key_name
  traversing_key_name.downcase!
  raise "BTREE Index #{traversing_key_name} not found" unless key_schemas.include? traversing_key_name
  @traversing_key = key_schemas[traversing_key_name]
- @traversing_key.add_columns_to @select_columns
+ @traversing_key.add_columns_to @columns
  @traversing_key.ascending = ascending
  @traversing_key.first_only = first_only
  end

+ # Specify the column names in the destination map, if provided
+ @columns.each do | column |
+ column.dest_name = dest_columns_map[column.name]
+ end
+
  end

- def insert_statement(target_model, rows)
- "insert into #{target_model.quoted_table_name} (#{quoted_column_names}) values #{quoted_row_values(rows)}"
+ def column_names
+ @columns.map(&:name)
+ end
+
+ def insert_statement(rows)
+ "insert into #{@dest_model.quoted_table_name} (#{quoted_dest_column_names}) values #{quoted_row_values(rows)}"
  end

  def delete_statement(rows)
  rec_criteria = rows.map do | row |
  row_compares = []
  @primary_key.columns.each do |column|
- row_compares << "#{column.quoted_name} = #{column.quoted_value(row)}"
+ row_compares << "#{column.quoted_dest_name(@dest_model)} = #{column.quoted_value(row)}"
  end
  "(" + row_compares.join(" AND ") + ")"
  end
- "DELETE FROM #{@model.quoted_table_name} WHERE #{rec_criteria.join(" OR ")}"
+ "DELETE FROM #{@dest_model.quoted_table_name} WHERE #{rec_criteria.join(" OR ")}"
  end

  def initial_scope
@@ -82,15 +100,20 @@ class CleanSweep::TableSchema
  end

  def quoted_column_names
- select_columns.map{|c| "`#{c}`"}.join(",")
+ columns.map{|c| "#{c.quoted_name}"}.join(",")
+ end
+
+ def quoted_dest_column_names
+ columns.map{|c| c.quoted_dest_name(@dest_model)}.join(",")
  end

  def quoted_row_values(rows)
  rows.map do |vec|
- quoted_column_values = vec.map do |col_value|
- @model.connection.quote(col_value)
- end.join(",")
- "(#{quoted_column_values})"
+ row = []
+ columns.each_with_index do | col, i |
+ row << @model.quote_value(vec[i], col.ar_column)
+ end
+ "(#{row.join(',')})"
  end.join(",")
  end

@@ -1,3 +1,3 @@
  module CleanSweep
- VERSION = "1.0.1"
+ VERSION = "1.0.2"
  end
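The `dest_columns` normalization added to the `TableSchema` constructor above is a self-contained one-liner and can be run standalone. The sample map values below are taken from the spec changes in this diff; only the variable names around the expression are illustrative.

```ruby
# Downcase and symbolize both keys and values of a user-supplied
# dest_columns map, exactly as TableSchema#initialize does: the pairs are
# flattened into one array, each entry normalized, then rebuilt as a Hash.
dest_columns = { 'PUBLISHER' => 'published_by', 'ID' => 'book_id' }
dest_columns_map =
  Hash[*(dest_columns || {}).to_a.flatten.map { |n| n.to_s.downcase.to_sym }]
puts dest_columns_map.inspect
# → {:publisher=>:published_by, :id=>:book_id}
```

Flattening pairs and rebuilding with `Hash[*...]` keeps the expression short, but note it assumes no nested values in the map.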
@@ -12,6 +12,7 @@ class Book < ActiveRecord::Base
  key book_index_by_bin(bin, id)
  )
  EOF
+ Book.delete_all
  end

  end
@@ -27,10 +28,17 @@ end

  class BookTemp < ActiveRecord::Base

  self.table_name = 'book_vault'
+ self.primary_key= 'book_id'

  def self.create_table
  connection.execute <<-EOF
- create temporary table if not exists book_vault like books
+ create temporary table if not exists
+ book_vault (
+ `book_id` int(11) primary key auto_increment,
+ `bin` int(11),
+ `published_by` varchar(64)
+ )
  EOF
+ BookTemp.delete_all
  end
  end
@@ -72,20 +72,20 @@ describe CleanSweep::PurgeRunner
  purger.print_queries(output)
  expect(output.string).to eq <<EOF
  Initial Query:
- SELECT `id`,`account`,`timestamp`
+ SELECT `comments`.`id`,`comments`.`account`,`comments`.`timestamp`
  FROM `comments` FORCE INDEX(comments_on_account_timestamp)
  WHERE (timestamp < '2014-11-25 21:47:43')
- ORDER BY `account` ASC,`timestamp` ASC
+ ORDER BY `comments`.`account` ASC,`comments`.`timestamp` ASC
  LIMIT 500
  Chunk Query:
- SELECT `id`,`account`,`timestamp`
+ SELECT `comments`.`id`,`comments`.`account`,`comments`.`timestamp`
  FROM `comments` FORCE INDEX(comments_on_account_timestamp)
- WHERE (timestamp < '2014-11-25 21:47:43') AND (`account` > 0 OR (`account` = 0 AND `timestamp` > '2014-11-18 21:47:43'))\n ORDER BY `account` ASC,`timestamp` ASC
+ WHERE (timestamp < '2014-11-25 21:47:43') AND (`comments`.`account` > 0 OR (`comments`.`account` = 0 AND `comments`.`timestamp` > '2014-11-18 21:47:43'))\n ORDER BY `comments`.`account` ASC,`comments`.`timestamp` ASC
  LIMIT 500
  Delete Statement:
  DELETE
  FROM `comments`
- WHERE (`id` = 2)
+ WHERE (`comments`.`id` = 2)
  EOF
  end
  end
@@ -167,13 +167,20 @@ EOF
  it 'copies books' do
  BookTemp.create_table
  purger = CleanSweep::PurgeRunner.new model: Book,
+ copy_columns: ['publisher'],
  dest_model: BookTemp,
+ dest_columns: { 'PUBLISHER' => 'published_by', 'ID' => 'book_id'},
  chunk_size: 4,
+ copy_only: true,
  index: 'book_index_by_bin'

  count = purger.execute_in_batches
  expect(count).to be(@total_book_size)
  expect(BookTemp.count).to eq(@total_book_size)
+ last_book = BookTemp.last
+ expect(last_book.book_id).to be 200
+ expect(last_book.bin).to be 2000
+ expect(last_book.published_by).to eq 'Random House'
  end

  end
@@ -17,22 +17,22 @@ describe CleanSweep::TableSchema
  it 'should produce an ascending chunk clause' do
  rows = account_and_timestamp_rows
  expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
- .to include("(`account` > 5 OR (`account` = 5 AND `timestamp` > '2014-12-01 23:13:25'))")
+ .to include("(`comments`.`account` > 5 OR (`comments`.`account` = 5 AND `comments`.`timestamp` > '2014-12-01 23:13:25'))")
  end

  it 'should produce all select columns' do
- expect(schema.select_columns).to eq([:id, :account, :timestamp])
+ expect(schema.column_names).to eq([:id, :account, :timestamp])
  end

  it 'should produce the ascending order clause' do
- expect(schema.initial_scope.to_sql).to include('`account` ASC,`timestamp` ASC')
+ expect(schema.initial_scope.to_sql).to include('`comments`.`account` ASC,`comments`.`timestamp` ASC')
  end


  it 'should produce an insert statement' do
  schema = CleanSweep::TableSchema.new Comment, key_name: 'comments_on_account_timestamp'
  rows = account_and_timestamp_rows
- expect(schema.insert_statement(Comment, rows)).to eq("insert into `comments` (`id`,`account`,`timestamp`) values (1001,5,'2014-12-02 01:13:25'),(1002,2,'2014-12-02 00:13:25'),(1005,5,'2014-12-01 23:13:25')")
+ expect(schema.insert_statement(rows)).to eq("insert into `comments` (`comments`.`id`,`comments`.`account`,`comments`.`timestamp`) values (1001,5,'2014-12-02 01:13:25'),(1002,2,'2014-12-02 00:13:25'),(1005,5,'2014-12-01 23:13:25')")
  end
  end

@@ -43,14 +43,14 @@ describe CleanSweep::TableSchema
  it 'should produce a descending where clause' do
  rows = account_and_timestamp_rows
  expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
- .to include("(`account` < 5 OR (`account` = 5 AND `timestamp` < '2014-12-01 23:13:25'))")
+ .to include("(`comments`.`account` < 5 OR (`comments`.`account` = 5 AND `comments`.`timestamp` < '2014-12-01 23:13:25'))")
  end


  it 'should produce the descending order clause' do
  rows = account_and_timestamp_rows
  expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
- .to include("`account` DESC,`timestamp` DESC")
+ .to include("`comments`.`account` DESC,`comments`.`timestamp` DESC")
  end

  end
@@ -59,13 +59,13 @@ describe CleanSweep::TableSchema
  let(:schema) { CleanSweep::TableSchema.new Comment, key_name:'comments_on_account_timestamp', first_only: true }

  it 'should select all the rows' do
- expect(schema.select_columns).to eq([:id, :account, :timestamp])
+ expect(schema.column_names).to eq([:id, :account, :timestamp])
  end

  it 'should only query using the first column of the index' do
  rows = account_and_timestamp_rows
  expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
- .to include(" (`account` >= 5) ")
+ .to include(" (`comments`.`account` >= 5) ")

  end

@@ -83,7 +83,7 @@ describe CleanSweep::TableSchema

  it 'should produce minimal select columns' do
  schema = CleanSweep::TableSchema.new Comment, key_name: 'PRIMARY'
- expect(schema.select_columns).to eq([:id])
+ expect(schema.column_names).to eq([:id])
  end

  it 'should produce the from clause with an index' do
@@ -93,10 +93,10 @@ describe CleanSweep::TableSchema
  it 'should include additional columns' do
  schema = CleanSweep::TableSchema.new Comment, key_name: 'comments_on_account_timestamp', extra_columns: %w[seen id]
- expect(schema.select_columns).to eq([:seen, :id, :account, :timestamp])
+ expect(schema.column_names).to eq([:seen, :id, :account, :timestamp])
  rows = account_and_timestamp_rows
  rows.map! { |row| row.unshift 1 } # Insert 'seen' value to beginning of row
- expect(schema.insert_statement(Comment, rows)).to eq("insert into `comments` (`seen`,`id`,`account`,`timestamp`) values (1,1001,5,'2014-12-02 01:13:25'),(1,1002,2,'2014-12-02 00:13:25'),(1,1005,5,'2014-12-01 23:13:25')")
+ expect(schema.insert_statement(rows)).to eq("insert into `comments` (`comments`.`seen`,`comments`.`id`,`comments`.`account`,`comments`.`timestamp`) values (1,1001,5,'2014-12-02 01:13:25'),(1,1002,2,'2014-12-02 00:13:25'),(1,1005,5,'2014-12-01 23:13:25')")

  end

metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: cleansweep
  version: !ruby/object:Gem::Version
- version: 1.0.1
+ version: 1.0.2
  platform: ruby
  authors:
  - Bill Kayser
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-12-02 00:00:00.000000000 Z
+ date: 2014-12-03 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: activerecord
@@ -44,14 +44,14 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 0.3.17
+ version: '0.3'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 0.3.17
+ version: '0.3'
  - !ruby/object:Gem::Dependency
  name: pry
  requirement: !ruby/object:Gem::Requirement
@@ -167,7 +167,7 @@ files:
  - spec/purge_runner_spec.rb
  - spec/spec_helper.rb
  - spec/table_schema_spec.rb
- homepage: http://github.com/bkayser/cleansweep
+ homepage: http://bkayser.github.com/cleansweep
  licenses:
  - MIT
  metadata: {}
@@ -177,9 +177,9 @@ require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - "~>"
  - !ruby/object:Gem::Version
- version: '0'
+ version: '2'
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="