cleansweep 1.0.1 → 1.0.2
- checksums.yaml +4 -4
- data/CHANGES.md +8 -2
- data/README.md +72 -41
- data/cleansweep.gemspec +6 -2
- data/lib/clean_sweep/purge_runner.rb +32 -8
- data/lib/clean_sweep/table_schema/column_schema.rb +20 -4
- data/lib/clean_sweep/table_schema/index_schema.rb +5 -5
- data/lib/clean_sweep/table_schema.rb +36 -13
- data/lib/clean_sweep/version.rb +1 -1
- data/spec/factories/books.rb +9 -1
- data/spec/purge_runner_spec.rb +12 -5
- data/spec/table_schema_spec.rb +11 -11
- metadata +7 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 171c5ce6b972df17162909a1538cf8ecc867e347
+  data.tar.gz: 6d91698f6759a599e03683287ea230230d99a475
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 5373eb62b1acbf097681efde6a1ad08ad94c574ee5676546e0c798ba91d93a8a28efc97fe8f4ce95e8b5f0ee8f1e4b12e340c6cdf0a78e901589ecbc85f4fee5
+  data.tar.gz: 199c96ba5a90457bd6d3310b59a293de1dd185d84b009d1f849ce13637210415606ba29fd704312186c8767aaaf9990e902bfcbc7d4e561f28f200112ef83f0e
data/CHANGES.md
CHANGED
@@ -1,5 +1,11 @@
 See the [documentation](http://bkayser.github.io/cleansweep) for details
 
-
+### Version 1.0.1
 
-* Initial release
+* Initial release
+
+### Version 1.0.2
+
+* Changed destination options so you can delete from a different table.
+* Added `dest_columns` option as a map of column names in the source to column names in the destination.
+* More testing and bug fixing in real environments
data/README.md
CHANGED
@@ -1,5 +1,6 @@
-Cleansweep is a utility for scripting purges using ruby in an
-mysql innodb tables. Based on the
+Cleansweep is a utility for scripting purges using ruby in an
+efficient, low-impact manner on mysql innodb tables. Based on the
+Percona `pt-archive` utility.
 
 ## Installation
 
@@ -35,12 +36,13 @@ Assume there is an active record model for it:
 
 ### Purging by traversing an index
 
-The most efficient way to work through a table is by scanning through
-at a time.
+The most efficient way to work through a table is by scanning through
+an index one chunk at a time.
 
 Let's assume we want to purge Comments older than 1 month. We can
-scan the primary key index or the `account`,`timestamp` index. In
-probably work better since we are evaluating
+scan the primary key index or the `account`,`timestamp` index. In
+this case the latter will probably work better since we are evaluating
+the timestamp for the purge.
 
 ```ruby
 r = CleanSweep::PurgeRunner.new model: Comment,
@@ -62,7 +64,8 @@ Check what it will do:
 r.print_queries($stdout)
 ```
 
-This will show you what it will do by printing out the three different
+This will show you what it will do by printing out the three different
+statements used:
 
 ```sql
 Initial Query:
@@ -82,13 +85,15 @@ This will show you what it will do by printing out the three different statement
     WHERE (`id` = 2)
 ```
 
-It does the initial statement once to get the first chunk of rows.
-starting at the index where the last
-
-
+It does the initial statement once to get the first chunk of rows.
+Then it does subsequent queries starting at the index where the last
+chunk left off, thereby avoiding a complete index scan. This works
+fine as long as you don't have rows with duplicate account id and
+timestamps. If you do, you'll possibly miss rows between chunks.
 
-To avoid missing duplicates, you can traverse the index using only the
-like `>=` instead of `>`.
+To avoid missing duplicates, you can traverse the index using only the
+first column with an inclusive comparator like `>=` instead of `>`.
+Here's what that would look like:
 
 ```ruby
 r = CleanSweep::PurgeRunner.new model:Comment,
@@ -107,48 +112,70 @@ The chunk query looks like:
     LIMIT 500
 ```
 
-You can scan the index in either direction. To specify descending
+You can scan the index in either direction. To specify descending
+order, use the `reverse: true` option.
 
 ### Copying rows from one table to another
 
-You can use the same technique to copy rows from one table to another.
-minimal. It won't _move_ rows, only
-
-used to delete later.
+You can use the same technique to copy rows from one table to another.
+Support in CleanSweep is pretty minimal. It won't _move_ rows, only
+copy them, although it would be easy to fix this. I used this to copy
+ids into a temporary table which I then used to delete later.
 
-Here's an example that copies rows from the `Comment` model to the
-Comments older than one
+Here's an example that copies rows from the `Comment` model to the
+`ExpiredComment` model (`expired_comments`). Comments older than one
+week are copied.
 
 ```ruby
 copier = CleanSweep::PurgeRunner.new model: Comment,
                                      index: 'comments_on_account_timestamp',
                                      dest_model: ExpiredComment,
+                                     copy_only: true,
                                      copy_columns: %w[liked] do | model |
   model.where('last_used_at < ?', 1.week.ago)
 end
 ```
 
-The `copy_columns` option specifies additional columns to be inserted
+The `copy_columns` option specifies additional columns to be inserted
+into the `expired_comments` table.
+
+If the column names are different in the destination table than in the
+source table, you can specify a mapping with the `dest_columns` option
+which takes a map of source column name to destination name.
+
+### Deleting rows in another table
+
+What if you want to query one table and delete those rows in another?
+I needed this when I built a temporary table of account ids that
+referenced deleted accounts. I then wanted to delete rows in other
+tables that referenced those account ids. To do that, specify a
+`dest_model` without specifying `copy_only` mode. This will execute
+the delete statement on the destination table without removing rows
+from the source table.
 
 ### Watching the history list and replication lag
 
-You can enter thresholds for the history list size and replication lag
-purge if either of those values get
-
+You can enter thresholds for the history list size and replication lag
+that will be used to pause the purge if either of those values get
+into an unsafe territory. The script will pause for 5 minutes and
+only start once the corresponding metric goes back down to 90% of the
+specified threshold.
 
 ### Logging and monitoring progress
 
-You pass in a standard log instance to capture all running output. By
-`ActiveRecord::Base` logger, or stdout if
+You pass in a standard log instance to capture all running output. By
+default it will log to your `ActiveRecord::Base` logger, or stdout if
+that's not set up.
 
-If you specify a reporting interval
-
-progress and assess the rate of deletion.
+If you specify a reporting interval with the `report` option it will
+print the status of the purge at that interval. This is useful to
+track progress and assess the rate of deletion.
 
 ### Joins and subqueries
 
-You can add subqueries and joins to your query in the scope block, but
-clause may work against you if the
+You can add subqueries and joins to your query in the scope block, but
+be careful. The index and order clause may work against you if the
+table you are joining with doesn't have good parity with the indexes
 in your target table.
 
 ### Limitations
@@ -165,21 +192,24 @@ in your target table.
 
 ### Other options
 
-There are a number of other options you can use to tune the script.
-[API on the `PurgeRunner`
+There are a number of other options you can use to tune the script.
+For details look at the [API on the `PurgeRunner`
+class](http://bkayser.github.io/cleansweep/rdoc/CleanSweep/PurgeRunner.html)
 
 ### NewRelic integration
 
-The script requires the [New Relic](http://github.com/newrelic/rpm)
-
-
-
+The script requires the [New Relic](http://github.com/newrelic/rpm)
+gem. It won't impact anything if you don't have a New Relic account to
+report to, but if you do use New Relic it is configured to show you
+detailed metrics. I recommend turning off transaction traces for long
+purge jobs to reduce your memory footprint.
 
 ## Testing
 
-To run the specs, start a local mysql instance. The default user is
-Override the user/password with
-
+To run the specs, start a local mysql instance. The default user is
+root with an empty password. Override the user/password with
+environment variables `DB_USER` and `DB_PASSWORD`. The test creates a
+db called 'cstest'.
 
 ## Contributing
 
@@ -197,5 +227,6 @@ Covered by the MIT [LICENSE](LICENSE.txt).
 
 ### Credits
 
-This was all inspired and informed by [Percona's `pt-archiver`
+This was all inspired and informed by [Percona's `pt-archiver`
+script](http://www.percona.com/doc/percona-toolkit/2.1/pt-archiver.html)
 written by Baron Schwartz.
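The chunk predicates the README describes can be sketched standalone. This is not gem code: `next_chunk_where` is a hypothetical helper that builds the row-wise comparison used to resume at the last chunk's position, and the `first_only` inclusive form (`>=`) that tolerates duplicate key values.

```ruby
# Build the WHERE fragment that resumes an index scan after the last fetched
# row. Row-wise form: (a > x) OR (a = x AND b > y) ... First-only form uses
# just the leading column with an inclusive comparator.
def next_chunk_where(columns, last_row, first_only: false, ascending: true)
  op = ascending ? '>' : '<'
  return "(`#{columns.first}` #{op}= #{last_row.first})" if first_only

  clauses = columns.each_index.map do |i|
    eqs = (0...i).map { |j| "`#{columns[j]}` = #{last_row[j]}" }
    cmp = "`#{columns[i]}` #{op} #{last_row[i]}"
    '(' + (eqs + [cmp]).join(' AND ') + ')'
  end
  '(' + clauses.join(' OR ') + ')'
end

puts next_chunk_where(%w[account timestamp], [5, 7])
# ((`account` > 5) OR (`account` = 5 AND `timestamp` > 7))
puts next_chunk_where(%w[account timestamp], [5, 7], first_only: true)
# (`account` >= 5)
```

The first form never revisits rows but can skip duplicates between chunks; the `>=` form rescans ties, which is the trade-off the README notes.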
data/cleansweep.gemspec
CHANGED
@@ -9,11 +9,15 @@ Gem::Specification.new do |spec|
   spec.authors     = ["Bill Kayser"]
   spec.email       = ["bkayser@newrelic.com"]
   spec.summary     = %q{Utility to purge or archive rows in mysql tables}
+
+  spec.platform    = Gem::Platform::RUBY
+  spec.required_ruby_version = '~> 2'
+
   spec.description = <<-EOF
 Purge data from mysql innodb tables efficiently with low overhead and impact.
 Based on the Percona pt-archive utility.
 EOF
-  spec.homepage    = "http://github.com/
+  spec.homepage    = "http://bkayser.github.com/cleansweep"
   spec.license     = "MIT"
 
   spec.files       = `git ls-files -z`.split("\x0")
@@ -23,7 +27,7 @@ Gem::Specification.new do |spec|
 
   spec.add_runtime_dependency 'activerecord', '>= 3.0'
   spec.add_runtime_dependency 'newrelic_rpm'
-  spec.add_runtime_dependency 'mysql2', '~> 0.3
+  spec.add_runtime_dependency 'mysql2', '~> 0.3'
 
   spec.add_development_dependency 'pry', '~> 0'
   spec.add_development_dependency 'bundler', '~> 1.7'
data/lib/clean_sweep/purge_runner.rb
CHANGED
@@ -36,11 +36,23 @@ require 'stringio'
 # The log instance to use. Defaults to the <tt>ActiveRecord::Base.logger</tt>
 # if not nil, otherwise it uses _$stdout_
 # [:dest_model]
-#
-#
-#
+#   Specifies the model for the delete operation, or the copy operation if in copy mode.
+#   When this option is present nothing is deleted in the model table. Instead, rows
+#   are either inserted into this table or deleted from this table.
+#   The columns in this model must include the primary key columns found in the source
+#   model. If they have different names you need to specify them with the
+#   <tt>dest_columns</tt> option.
+# [:copy_only]
+#   Specifies copy mode, where rows are inserted into the destination table instead of deleted from
+#   the model table. By default, only columns in the
 # named index and primary key are copied but these can be augmented with columns in the
 # <tt>copy_columns</tt> option.
+# [:dest_columns]
+#   This is a map of column names in the model to column names in the dest model when the
+#   corresponding models differ. Only column names that are different need to be specified.
+#   For instance your table of account ids might have <tt>account_id</tt>
+#   as the primary key column, but you want to delete rows in the accounts table where the account id is
+#   the column named <tt>id</tt>
 # [:copy_columns]
 #   Extra columns to add when copying to a dest model.
 #
@@ -79,11 +91,15 @@ class CleanSweep::PurgeRunner
     @max_history  = options[:max_history]
     @max_repl_lag = options[:max_repl_lag]
 
+    @copy_mode = @target_model && options[:copy_only]
+
     @table_schema = CleanSweep::TableSchema.new @model,
                                                 key_name: options[:index],
                                                 ascending: !options[:reverse],
                                                 extra_columns: options[:copy_columns],
-                                                first_only: options[:first_only]
+                                                first_only: options[:first_only],
+                                                dest_model: @target_model,
+                                                dest_columns: options[:dest_columns]
 
     if (@max_history || @max_repl_lag)
       @mysql_status = CleanSweep::PurgeRunner::MysqlStatus.new model: @model,
@@ -106,7 +122,7 @@ class CleanSweep::PurgeRunner
 
 
   def copy_mode?
-    @
+    @copy_mode
   end
 
   # Execute the purge in chunks according to the parameters given on instance creation.
@@ -117,7 +133,10 @@ class CleanSweep::PurgeRunner
   #
   def execute_in_batches
 
-
+    if @dry_run
+      print_queries($stdout)
+      return 0
+    end
 
     @start = Time.now
     verb = copy_mode? ? "copying" : "purging"
@@ -146,7 +165,7 @@ class CleanSweep::PurgeRunner
         last_row = rows.last
         if copy_mode?
           metric_op_name = 'INSERT'
-          statement = @table_schema.insert_statement(
+          statement = @table_schema.insert_statement(rows)
         else
           metric_op_name = 'DELETE'
           statement = @table_schema.delete_statement(rows)
@@ -190,11 +209,16 @@ class CleanSweep::PurgeRunner
     io.puts 'Initial Query:'
     io.puts format_query('    ', @query.to_sql)
     rows = @model.connection.select_rows @query.limit(1).to_sql
+    if rows.empty?
+      # Don't have any sample data to use for the sample queries, so use NULL values just
+      # so the query will print out.
+      rows << [nil] * 100
+    end
     io.puts "Chunk Query:"
    io.puts format_query('    ', @table_schema.scope_to_next_chunk(@query, rows.first).to_sql)
     if copy_mode?
       io.puts "Insert Statement:"
-      io.puts format_query('    ', @table_schema.insert_statement(
+      io.puts format_query('    ', @table_schema.insert_statement(rows))
     else
       io.puts "Delete Statement:"
       io.puts format_query('    ', @table_schema.delete_statement(rows))
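The chunked DELETE the runner issues per batch can be sketched on its own. This is not the gem's method: `delete_statement` here is a simplified stand-in assuming a single-column primary key, showing the shape of the statement (one equality group per fetched row, OR'd together).

```ruby
# Build one DELETE covering a whole chunk of previously fetched rows:
# each row contributes a (`pk` = value) group, and the groups are OR'd.
def delete_statement(table, pk, rows)
  criteria = rows.map { |row| "(`#{pk}` = #{row.first})" }
  "DELETE FROM `#{table}` WHERE #{criteria.join(' OR ')}"
end

puts delete_statement('comments', 'id', [[1], [2], [5]])
# DELETE FROM `comments` WHERE (`id` = 1) OR (`id` = 2) OR (`id` = 5)
```

Batching rows into a single statement is what keeps the per-row overhead low compared with issuing one DELETE per primary key.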
data/lib/clean_sweep/table_schema/column_schema.rb
CHANGED
@@ -1,23 +1,39 @@
 class CleanSweep::TableSchema::ColumnSchema
 
-  attr_reader :name
+  attr_reader :name, :ar_column
   attr_accessor :select_position
+  attr_writer :dest_name
 
   def initialize(name, model)
     @name = name.to_sym
     col_num = model.column_names.index(name.to_s) or raise "Can't find #{name} in #{model.name}"
     @model = model
-    @
+    @ar_column = model.columns[col_num]
   end
 
   def quoted_name
-
+    quote_column_name(@model, name)
   end
+
+  def quoted_dest_name(dest_model)
+    quote_column_name(dest_model, @dest_name || @name)
+  end
+
   def value(row)
     row[select_position]
   end
+
   def quoted_value(row)
-    @model.quote_value(value(row), @
+    @model.quote_value(value(row), @ar_column)
+  end
+
+  def == other
+    return other && name == other.name
+  end
+
+  private
+  def quote_column_name(model, column_name)
+    model.connection.quote_table_name(model.table_name) + "." + model.connection.quote_column_name(column_name)
   end
 end
 
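The qualified-name change above is why the specs now expect `` `comments`.`account` `` instead of bare `` `account` ``. A minimal stand-in for the new quoting behavior (the real code routes through the ActiveRecord connection's `quote_table_name`/`quote_column_name`; this sketch hardcodes MySQL backticks):

```ruby
# Produce a fully qualified, backtick-quoted `table`.`column` reference,
# mimicking what ColumnSchema#quoted_name now returns for MySQL.
def quote_column_name(table_name, column_name)
  "`#{table_name}`.`#{column_name}`"
end

puts quote_column_name('comments', 'account')
# `comments`.`account`
```

Qualifying every column with its table is what makes the generated SQL unambiguous once a second (destination) table enters the statement.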
data/lib/clean_sweep/table_schema/index_schema.rb
CHANGED
@@ -1,6 +1,6 @@
 class CleanSweep::TableSchema::IndexSchema < Struct.new :name, :model, :ascending
 
-  attr_accessor :columns, :name, :model, :ascending, :first_only
+  attr_accessor :columns, :name, :model, :ascending, :first_only, :dest_model
 
   def initialize name, model
     @model = model
@@ -16,12 +16,12 @@ class CleanSweep::TableSchema::IndexSchema < Struct.new :name, :model, :ascendin
   # Take columns referenced by this index and add them to the list if they
   # are not present. Record their position in the list because the position will
   # be where they are located in a row of values passed in later to #scope_to_next_chunk
-  def add_columns_to
+  def add_columns_to columns
     @columns.each do | column |
-      pos =
+      pos = columns.index column
       if pos.nil?
-
-        pos =
+        columns << column
+        pos = columns.size - 1
      end
       column.select_position = pos
     end
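The merge that `add_columns_to` performs can be shown standalone: index columns are appended to the shared select list only when absent, and each records the position its value will occupy in a fetched row. `Column` below is a simplified stand-in for `ColumnSchema` (whose `==` compares names), not gem code.

```ruby
# Simplified ColumnSchema: just a name plus the recorded select position.
Column = Struct.new(:name, :select_position)

# Append each index column to select_list unless a column of the same name
# is already there; either way, remember where its value sits in a row.
def add_columns_to(index_columns, select_list)
  index_columns.each do |column|
    pos = select_list.index { |c| c.name == column.name }
    if pos.nil?
      select_list << column
      pos = select_list.size - 1
    end
    column.select_position = pos
  end
  select_list
end

list = [Column.new(:id)]
add_columns_to([Column.new(:account), Column.new(:timestamp)], list)
# list names are now [:id, :account, :timestamp]
```

This is why the select list ends up as the union of primary key and traversing-key columns, with no duplicates even when the index repeats a primary key column.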
data/lib/clean_sweep/table_schema.rb
CHANGED
@@ -2,7 +2,7 @@
 class CleanSweep::TableSchema
 
   # The list of columns used when selecting, the union of pk and traversing key columns
-  attr_reader :
+  attr_reader :columns
 
   # The schema for the primary key
   attr_reader :primary_key
@@ -18,8 +18,17 @@ class CleanSweep::TableSchema
     ascending = options.include?(:ascending) ? options[:ascending] : true
     first_only = options[:first_only]
     @model = model
+    @dest_model = options[:dest_model] || @model
+
+    # Downcase and symbolize the entries in the column name map:
+    dest_columns_map = Hash[*(options[:dest_columns] || {}).to_a.flatten.map{|n| n.to_s.downcase.to_sym}]
+
     @name = @model.table_name
-
+
+    @columns =
+      (options[:extra_columns] || []).map do | extra_col_name |
+        CleanSweep::TableSchema::ColumnSchema.new extra_col_name, model
+      end
 
     key_schemas = build_indexes
 
@@ -28,31 +37,40 @@ class CleanSweep::TableSchema
     raise "Table #{model.table_name} must have a primary key" unless key_schemas.include? 'primary'
 
     @primary_key = key_schemas['primary']
-    @primary_key.add_columns_to @
+    @primary_key.add_columns_to @columns
     if traversing_key_name
       traversing_key_name.downcase!
       raise "BTREE Index #{traversing_key_name} not found" unless key_schemas.include? traversing_key_name
       @traversing_key = key_schemas[traversing_key_name]
-      @traversing_key.add_columns_to @
+      @traversing_key.add_columns_to @columns
       @traversing_key.ascending = ascending
       @traversing_key.first_only = first_only
     end
 
+    # Specify the column names in the destination map, if provided
+    @columns.each do | column |
+      column.dest_name = dest_columns_map[column.name]
+    end
+
   end
 
-  def
-
+  def column_names
+    @columns.map(&:name)
+  end
+
+  def insert_statement(rows)
+    "insert into #{@dest_model.quoted_table_name} (#{quoted_dest_column_names}) values #{quoted_row_values(rows)}"
   end
 
   def delete_statement(rows)
     rec_criteria = rows.map do | row |
       row_compares = []
       @primary_key.columns.each do |column|
-        row_compares << "#{column.
+        row_compares << "#{column.quoted_dest_name(@dest_model)} = #{column.quoted_value(row)}"
       end
       "(" + row_compares.join(" AND ") + ")"
     end
-    "DELETE FROM #{@
+    "DELETE FROM #{@dest_model.quoted_table_name} WHERE #{rec_criteria.join(" OR ")}"
   end
 
   def initial_scope
@@ -82,15 +100,20 @@ class CleanSweep::TableSchema
   end
 
   def quoted_column_names
-
+    columns.map{|c| "#{c.quoted_name}"}.join(",")
+  end
+
+  def quoted_dest_column_names
+    columns.map{|c| c.quoted_dest_name(@dest_model)}.join(",")
   end
 
   def quoted_row_values(rows)
     rows.map do |vec|
-
-
-
-
+      row = []
+      columns.each_with_index do | col, i |
+        row << @model.quote_value(vec[i], col.ar_column)
+      end
+      "(#{row.join(',')})"
     end.join(",")
   end
 
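The `dest_columns` normalization in `TableSchema` can be exercised on its own: the `Hash[*...]` idiom flattens the user-supplied map into an alternating key/value list and downcases/symbolizes every entry, so mixed-case string keys still match the symbolized source column names.

```ruby
# Normalize a user-supplied source-to-destination column map the way
# TableSchema does: every key and value becomes a lowercase symbol.
dest_columns = { 'PUBLISHER' => 'published_by', 'ID' => 'book_id' }
dest_columns_map = Hash[*dest_columns.to_a.flatten.map { |n| n.to_s.downcase.to_sym }]
# dest_columns_map is { publisher: :published_by, id: :book_id }
```

This is what lets the spec below pass `{ 'PUBLISHER' => 'published_by', 'ID' => 'book_id' }` and still have the map hit columns named `:publisher` and `:id`.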
data/lib/clean_sweep/version.rb
CHANGED
data/spec/factories/books.rb
CHANGED
@@ -12,6 +12,7 @@ class Book < ActiveRecord::Base
       key book_index_by_bin(bin, id)
     )
     EOF
+    Book.delete_all
   end
 
 end
@@ -27,10 +28,17 @@ end
 class BookTemp < ActiveRecord::Base
 
   self.table_name = 'book_vault'
+  self.primary_key = 'book_id'
 
   def self.create_table
     connection.execute <<-EOF
-    create temporary table if not exists
+    create temporary table if not exists
+    book_vault (
+      `book_id` int(11) primary key auto_increment,
+      `bin` int(11),
+      `published_by` varchar(64)
+    )
     EOF
+    BookTemp.delete_all
   end
 end
data/spec/purge_runner_spec.rb
CHANGED
@@ -72,20 +72,20 @@ describe CleanSweep::PurgeRunner do
       purger.print_queries(output)
       expect(output.string).to eq <<EOF
 Initial Query:
-    SELECT `id`,`account`,`timestamp`
+    SELECT `comments`.`id`,`comments`.`account`,`comments`.`timestamp`
     FROM `comments` FORCE INDEX(comments_on_account_timestamp)
     WHERE (timestamp < '2014-11-25 21:47:43')
-    ORDER BY `account` ASC,`timestamp` ASC
+    ORDER BY `comments`.`account` ASC,`comments`.`timestamp` ASC
     LIMIT 500
 Chunk Query:
-    SELECT `id`,`account`,`timestamp`
+    SELECT `comments`.`id`,`comments`.`account`,`comments`.`timestamp`
     FROM `comments` FORCE INDEX(comments_on_account_timestamp)
-    WHERE (timestamp < '2014-11-25 21:47:43') AND (`account` > 0 OR (`account` = 0 AND `timestamp` > '2014-11-18 21:47:43'))\n    ORDER BY `account` ASC,`timestamp` ASC
+    WHERE (timestamp < '2014-11-25 21:47:43') AND (`comments`.`account` > 0 OR (`comments`.`account` = 0 AND `comments`.`timestamp` > '2014-11-18 21:47:43'))\n    ORDER BY `comments`.`account` ASC,`comments`.`timestamp` ASC
     LIMIT 500
 Delete Statement:
     DELETE
     FROM `comments`
-    WHERE (`id` = 2)
+    WHERE (`comments`.`id` = 2)
 EOF
     end
   end
@@ -167,13 +167,20 @@ EOF
     it 'copies books' do
       BookTemp.create_table
       purger = CleanSweep::PurgeRunner.new model: Book,
+                                           copy_columns: ['publisher'],
                                           dest_model: BookTemp,
+                                           dest_columns: { 'PUBLISHER' => 'published_by', 'ID' => 'book_id'},
                                           chunk_size: 4,
+                                           copy_only: true,
                                           index: 'book_index_by_bin'
 
       count = purger.execute_in_batches
       expect(count).to be(@total_book_size)
       expect(BookTemp.count).to eq(@total_book_size)
+      last_book = BookTemp.last
+      expect(last_book.book_id).to be 200
+      expect(last_book.bin).to be 2000
+      expect(last_book.published_by).to eq 'Random House'
     end
 
   end
data/spec/table_schema_spec.rb
CHANGED
@@ -17,22 +17,22 @@ describe CleanSweep::TableSchema do
   it 'should produce an ascending chunk clause' do
     rows = account_and_timestamp_rows
     expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
-      .to include("(`account` > 5 OR (`account` = 5 AND `timestamp` > '2014-12-01 23:13:25'))")
+      .to include("(`comments`.`account` > 5 OR (`comments`.`account` = 5 AND `comments`.`timestamp` > '2014-12-01 23:13:25'))")
   end
 
   it 'should produce all select columns' do
-    expect(schema.
+    expect(schema.column_names).to eq([:id, :account, :timestamp])
   end
 
   it 'should produce the ascending order clause' do
-    expect(schema.initial_scope.to_sql).to include('`account` ASC,`timestamp` ASC')
+    expect(schema.initial_scope.to_sql).to include('`comments`.`account` ASC,`comments`.`timestamp` ASC')
   end
 
 
   it 'should produce an insert statement' do
     schema = CleanSweep::TableSchema.new Comment, key_name: 'comments_on_account_timestamp'
     rows = account_and_timestamp_rows
-    expect(schema.insert_statement(
+    expect(schema.insert_statement(rows)).to eq("insert into `comments` (`comments`.`id`,`comments`.`account`,`comments`.`timestamp`) values (1001,5,'2014-12-02 01:13:25'),(1002,2,'2014-12-02 00:13:25'),(1005,5,'2014-12-01 23:13:25')")
   end
 end
 
@@ -43,14 +43,14 @@ describe CleanSweep::TableSchema do
   it 'should produce a descending where clause' do
     rows = account_and_timestamp_rows
     expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
-      .to include("(`account` < 5 OR (`account` = 5 AND `timestamp` < '2014-12-01 23:13:25'))")
+      .to include("(`comments`.`account` < 5 OR (`comments`.`account` = 5 AND `comments`.`timestamp` < '2014-12-01 23:13:25'))")
   end
 
 
   it 'should produce the descending order clause' do
     rows = account_and_timestamp_rows
     expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
-      .to include("`account` DESC,`timestamp` DESC")
+      .to include("`comments`.`account` DESC,`comments`.`timestamp` DESC")
   end
 
 end
@@ -59,13 +59,13 @@ describe CleanSweep::TableSchema do
   let(:schema) { CleanSweep::TableSchema.new Comment, key_name:'comments_on_account_timestamp', first_only: true }
 
   it 'should select all the rows' do
-    expect(schema.
+    expect(schema.column_names).to eq([:id, :account, :timestamp])
   end
 
   it 'should only query using the first column of the index' do
     rows = account_and_timestamp_rows
     expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
-      .to include(" (`account` >= 5) ")
+      .to include(" (`comments`.`account` >= 5) ")
 
   end
 
@@ -83,7 +83,7 @@ describe CleanSweep::TableSchema do
 
   it 'should produce minimal select columns' do
     schema = CleanSweep::TableSchema.new Comment, key_name: 'PRIMARY'
-    expect(schema.
+    expect(schema.column_names).to eq([:id])
   end
 
   it 'should produce the from clause with an index' do
@@ -93,10 +93,10 @@ describe CleanSweep::TableSchema do
 
   it 'should include additional columns' do
     schema = CleanSweep::TableSchema.new Comment, key_name: 'comments_on_account_timestamp', extra_columns: %w[seen id]
-    expect(schema.
+    expect(schema.column_names).to eq([:seen, :id, :account, :timestamp])
     rows = account_and_timestamp_rows
     rows.map! { |row| row.unshift 1 } # Insert 'seen' value to beginning of row
-    expect(schema.insert_statement(
+    expect(schema.insert_statement(rows)).to eq("insert into `comments` (`comments`.`seen`,`comments`.`id`,`comments`.`account`,`comments`.`timestamp`) values (1,1001,5,'2014-12-02 01:13:25'),(1,1002,2,'2014-12-02 00:13:25'),(1,1005,5,'2014-12-01 23:13:25')")
 
   end
 
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: cleansweep
 version: !ruby/object:Gem::Version
-  version: 1.0.
+  version: 1.0.2
 platform: ruby
 authors:
 - Bill Kayser
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-12-
+date: 2014-12-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activerecord
@@ -44,14 +44,14 @@ dependencies:
   requirements:
   - - "~>"
     - !ruby/object:Gem::Version
-      version: 0.3
+      version: '0.3'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
      - !ruby/object:Gem::Version
-        version: 0.3
+        version: '0.3'
 - !ruby/object:Gem::Dependency
   name: pry
   requirement: !ruby/object:Gem::Requirement
@@ -167,7 +167,7 @@ files:
 - spec/purge_runner_spec.rb
 - spec/spec_helper.rb
 - spec/table_schema_spec.rb
-homepage: http://github.com/
+homepage: http://bkayser.github.com/cleansweep
 licenses:
 - MIT
 metadata: {}
@@ -177,9 +177,9 @@ require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
-  - - "
+  - - "~>"
   - !ruby/object:Gem::Version
-    version: '
+    version: '2'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="