cleansweep 1.0.1 → 1.0.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGES.md +8 -2
- data/README.md +72 -41
- data/cleansweep.gemspec +6 -2
- data/lib/clean_sweep/purge_runner.rb +32 -8
- data/lib/clean_sweep/table_schema/column_schema.rb +20 -4
- data/lib/clean_sweep/table_schema/index_schema.rb +5 -5
- data/lib/clean_sweep/table_schema.rb +36 -13
- data/lib/clean_sweep/version.rb +1 -1
- data/spec/factories/books.rb +9 -1
- data/spec/purge_runner_spec.rb +12 -5
- data/spec/table_schema_spec.rb +11 -11
- metadata +7 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 171c5ce6b972df17162909a1538cf8ecc867e347
+  data.tar.gz: 6d91698f6759a599e03683287ea230230d99a475
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 5373eb62b1acbf097681efde6a1ad08ad94c574ee5676546e0c798ba91d93a8a28efc97fe8f4ce95e8b5f0ee8f1e4b12e340c6cdf0a78e901589ecbc85f4fee5
+  data.tar.gz: 199c96ba5a90457bd6d3310b59a293de1dd185d84b009d1f849ce13637210415606ba29fd704312186c8767aaaf9990e902bfcbc7d4e561f28f200112ef83f0e
data/CHANGES.md
CHANGED
@@ -1,5 +1,11 @@
 See the [documentation](http://bkayser.github.io/cleansweep) for details
 
-
+### Version 1.0.1
 
-* Initial release
+* Initial release
+
+### Version 1.0.2
+
+* Changed destination options so you can delete from a different table.
+* Added `dest_columns` option as a map of column names in the source to column names in the destination.
+* More testing and bug fixing in real environments
data/README.md
CHANGED
@@ -1,5 +1,6 @@
-Cleansweep is a utility for scripting purges using ruby in an
-mysql innodb tables. Based on the
+Cleansweep is a utility for scripting purges using ruby in an
+efficient, low-impact manner on mysql innodb tables. Based on the
+Percona `pt-archive` utility.
 
 ## Installation
 
@@ -35,12 +36,13 @@ Assume there is an active record model for it:
 
 ### Purging by traversing an index
 
-The most efficient way to work through a table is by scanning through
-at a time.
+The most efficient way to work through a table is by scanning through
+an index one chunk at a time.
 
 Let's assume we want to purge Comments older than 1 month. We can
-scan the primary key index or the `account`,`timestamp` index. In
-probably work better since we are evaluating
+scan the primary key index or the `account`,`timestamp` index. In
+this case the latter will probably work better since we are evaluating
+the timestamp for the purge.
 
 ```ruby
 r = CleanSweep::PurgeRunner.new model: Comment,
@@ -62,7 +64,8 @@ Check what it will do:
 r.print_queries($stdout)
 ```
 
-This will show you what it will do by printing out the three different
+This will show you what it will do by printing out the three different
+statements used:
 
 ```sql
 Initial Query:
@@ -82,13 +85,15 @@ This will show you what it will do by printing out the three different statement
 WHERE (`id` = 2)
 ```
 
-It does the initial statement once to get the first chunk of rows.
-starting at the index where the last
-
-
+It does the initial statement once to get the first chunk of rows.
+Then it does subsequent queries starting at the index where the last
+chunk left off, thereby avoiding a complete index scan. This works
+fine as long as you don't have rows with duplicate account id and
+timestamps. If you do, you'll possibly miss rows between chunks.
 
-To avoid missing duplicates, you can traverse the index using only the
-like `>=` instead of `>`.
+To avoid missing duplicates, you can traverse the index using only the
+first column with an inclusive comparator like `>=` instead of `>`.
+Here's what that would look like:
 
 ```ruby
 r = CleanSweep::PurgeRunner.new model:Comment,
@@ -107,48 +112,70 @@ The chunk query looks like:
 LIMIT 500
 ```
 
-You can scan the index in either direction. To specify descending
+You can scan the index in either direction. To specify descending
+order, use the `reverse: true` option.
 
 ### Copying rows from one table to another
 
-You can use the same technique to copy rows from one table to another.
-minimal. It won't _move_ rows, only
-
-used to delete later.
+You can use the same technique to copy rows from one table to another.
+Support in CleanSweep is pretty minimal. It won't _move_ rows, only
+copy them, although it would be easy to fix this. I used this to copy
+ids into a temporary table which I then used to delete later.
 
-Here's an example that copies rows from the `Comment` model to the
-Comments older than one
+Here's an example that copies rows from the `Comment` model to the
+`ExpiredComment` model (`expired_comments`). Comments older than one
+week are copied.
 
 ```ruby
 copier = CleanSweep::PurgeRunner.new model: Comment,
                                      index: 'comments_on_account_timestamp',
                                      dest_model: ExpiredComment,
+                                     copy_only: true,
                                      copy_columns: %w[liked] do do | model |
   model.where('last_used_at < ?', 1.week.ago)
 end
 ```
 
-The `copy_columns` option specifies additional columns to be inserted
+The `copy_columns` option specifies additional columns to be inserted
+into the `expired_comments` table.
+
+If the column names are different in the destination table than in the
+source table, you can specify a mapping with the `dest_columns` option
+which takes a map of source column name to destination name.
+
+### Deleting rows in another table
+
+What if you want to query one table and delete those rows in another?
+I needed this when I built a temporary table of account ids that
+referenced deleted accounts. I then wanted to delete rows in other
+tables that referenced those account ids. To do that, specify a
+`dest_table` without specifying `copy_only` mode. This will execute
+the delete statement on the destination table without removing rows
+from the source table.
 
 ### Watching the history list and replication lag
 
-You can enter thresholds for the history list size and replication lag
-purge if either of those values get
-
+You can enter thresholds for the history list size and replication lag
+that will be used to pause the purge if either of those values get
+into an unsafe territory. The script will pause for 5 minutes and
+only start once the corresponding metric goes back down to 90% of the
+specified threshold.
 
 ### Logging and monitoring progress
 
-You pass in a standard log instance to capture all running output. By
-`ActiveRecord::Base` logger, or stdout if
+You pass in a standard log instance to capture all running output. By
+default it will log to your `ActiveRecord::Base` logger, or stdout if
+that's not set up.
 
-If you specify a reporting interval
-
-progress and assess the rate of deletion.
+If you specify a reporting interval with the `report` option it will
+print the status of the purge at that interval. This is useful to
+track progress and assess the rate of deletion.
 
 ### Joins and subqueries
 
-You can add subqueries and joins to your query in the scope block, but
-clause may work against you if the
+You can add subqueries and joins to your query in the scope block, but
+be careful. The index and order clause may work against you if the
+table you are joining with doesn't have good parity with the indexes
 in your target table.
 
 ### Limitations
@@ -165,21 +192,24 @@ in your target table.
 
 ### Other options
 
-There are a number of other options you can use to tune the script.
-[API on the `PurgeRunner`
+There are a number of other options you can use to tune the script.
+For details look at the [API on the `PurgeRunner`
+class](http://bkayser.github.io/cleansweep/rdoc/CleanSweep/PurgeRunner.html)
 
 ### NewRelic integration
 
-The script requires the [New Relic](http://github.com/newrelic/rpm)
-
-
-
+The script requires the [New Relic](http://github.com/newrelic/rpm)
+gem. It won't impact anyting if you don't have a New Relic account to
+report to, but if you do use New Relic it is configured to show you
+detailed metrics. I recommend turning off transaction traces for long
+purge jobs to reduce your memory footprint.
 
 ## Testing
 
-To run the specs, start a local mysql instance. The default user is
-Override the user/password with
-
+To run the specs, start a local mysql instance. The default user is
+root with an empty password. Override the user/password with
+environment variables `DB_USER` and `DB_PASSWORD`. The test creates a
+db called 'cstest'.
 
 ## Contributing
 
@@ -197,5 +227,6 @@ Covered by the MIT [LICENSE](LICENSE.txt).
 
 ### Credits
 
-This was all inspired and informed by [Percona's `pt-archiver`
+This was all inspired and informed by [Percona's `pt-archiver`
+script](http://www.percona.com/doc/percona-toolkit/2.1/pt-archiver.html)
 written by Baron Schwartz.
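The duplicate-row pitfall called out in the README changes above (resuming each chunk with a strict `>` can skip rows that share the last chunk's index values) is easy to demonstrate without a database. This is a minimal pure-Ruby simulation, not cleansweep code; `chunk_traverse` and its array tuples are hypothetical stand-ins for the index traversal:

```ruby
# Simulate keyset-style chunking over rows sorted by (account, timestamp).
# Each chunk resumes strictly after the last tuple seen, mirroring the
# strict `>` comparator the README warns about.
def chunk_traverse(rows, chunk_size)
  seen = []
  last = nil
  loop do
    chunk = rows.select { |r| last.nil? || (r <=> last) == 1 }.first(chunk_size)
    break if chunk.empty?
    seen.concat(chunk)
    last = chunk.last # strict '>' next time: remaining duplicates of `last` are lost
  end
  seen
end

rows = [[1, 1], [1, 1], [1, 1], [2, 1]]
# With chunk_size 2 the third [1, 1] falls between chunks and is skipped;
# with a chunk large enough to hold all duplicates, nothing is missed.
```

The `first_only`/`>=` option described above trades this risk for re-reading some rows at each chunk boundary.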
data/cleansweep.gemspec
CHANGED
@@ -9,11 +9,15 @@ Gem::Specification.new do |spec|
   spec.authors = ["Bill Kayser"]
   spec.email = ["bkayser@newrelic.com"]
   spec.summary = %q{Utility to purge or archive rows in mysql tables}
+
+  spec.platform = Gem::Platform::RUBY
+  spec.required_ruby_version = '~> 2'
+
   spec.description = <<-EOF
      Purge data from mysql innodb tables efficiently with low overhead and impact.
      Based on the Percona pt-archive utility.
   EOF
-  spec.homepage = "http://github.com/
+  spec.homepage = "http://bkayser.github.com/cleansweep"
   spec.license = "MIT"
 
   spec.files = `git ls-files -z`.split("\x0")
@@ -23,7 +27,7 @@ Gem::Specification.new do |spec|
 
   spec.add_runtime_dependency 'activerecord', '>= 3.0'
   spec.add_runtime_dependency 'newrelic_rpm'
-  spec.add_runtime_dependency 'mysql2', '~> 0.3
+  spec.add_runtime_dependency 'mysql2', '~> 0.3'
 
   spec.add_development_dependency 'pry', '~> 0'
   spec.add_development_dependency 'bundler', '~> 1.7'
data/lib/clean_sweep/purge_runner.rb
CHANGED
@@ -36,11 +36,23 @@ require 'stringio'
 # The log instance to use. Defaults to the <tt>ActiveRecord::Base.logger</tt>
 # if not nil, otherwise it uses _$stdout_
 # [:dest_model]
-#
-#
-#
+#    Specifies the model for the delete operation, or the copy operation if in copy mode.
+#    When this option is present nothing is deleted in the model table. Instead, rows
+#    are either inserted into this table or deleted from this table.
+#    The columns in this model must include the primary key columns found in the source
+#    model. If they have different names you need to specify them with the
+#    <tt>dest_columns</tt> option.
+# [:copy_only]
+#    Specifies copy mode, where rows are inserted into the destination table instead of deleted from
+#    the model table. By default, only columns in the
 #    named index and primary key are copied but these can be augmented with columns in the
 #    <tt>copy_columns</tt> option.
+# [:dest_columns]
+#    This is a map of column names in the model to column names in the dest model when the
+#    corresponding models differ. Only column names that are different need to be specified.
+#    For instance your table of account ids might have <tt>account_id</tt>
+#    as the primary key column, but you want to delete rows in the accounts table where the account id is
+#    the column named <tt>id</tt>
 # [:copy_columns]
 #    Extra columns to add when copying to a dest model.
 #
@@ -79,11 +91,15 @@ class CleanSweep::PurgeRunner
     @max_history = options[:max_history]
     @max_repl_lag = options[:max_repl_lag]
 
+    @copy_mode = @target_model && options[:copy_only]
+
     @table_schema = CleanSweep::TableSchema.new @model,
                                                 key_name: options[:index],
                                                 ascending: !options[:reverse],
                                                 extra_columns: options[:copy_columns],
-                                                first_only: options[:first_only]
+                                                first_only: options[:first_only],
+                                                dest_model: @target_model,
+                                                dest_columns: options[:dest_columns]
 
     if (@max_history || @max_repl_lag)
       @mysql_status = CleanSweep::PurgeRunner::MysqlStatus.new model: @model,
@@ -106,7 +122,7 @@ class CleanSweep::PurgeRunner
 
 
   def copy_mode?
-    @
+    @copy_mode
   end
 
   # Execute the purge in chunks according to the parameters given on instance creation.
@@ -117,7 +133,10 @@ class CleanSweep::PurgeRunner
   #
   def execute_in_batches
 
-
+    if @dry_run
+      print_queries($stdout)
+      return 0
+    end
 
     @start = Time.now
     verb = copy_mode? ? "copying" : "purging"
@@ -146,7 +165,7 @@ class CleanSweep::PurgeRunner
       last_row = rows.last
       if copy_mode?
         metric_op_name = 'INSERT'
-        statement = @table_schema.insert_statement(
+        statement = @table_schema.insert_statement(rows)
       else
         metric_op_name = 'DELETE'
         statement = @table_schema.delete_statement(rows)
@@ -190,11 +209,16 @@ class CleanSweep::PurgeRunner
     io.puts 'Initial Query:'
     io.puts format_query('    ', @query.to_sql)
     rows = @model.connection.select_rows @query.limit(1).to_sql
+    if rows.empty?
+      # Don't have any sample data to use for the sample queries, so use NULL values just
+      # so the query will print out.
+      rows << [nil] * 100
+    end
     io.puts "Chunk Query:"
     io.puts format_query('    ', @table_schema.scope_to_next_chunk(@query, rows.first).to_sql)
     if copy_mode?
       io.puts "Insert Statement:"
-      io.puts format_query('    ', @table_schema.insert_statement(
+      io.puts format_query('    ', @table_schema.insert_statement(rows))
     else
       io.puts "Delete Statement:"
       io.puts format_query('    ', @table_schema.delete_statement(rows))
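The chunk query printed above resumes with a predicate of the form `(account > x OR (account = x AND timestamp > y))`. As a sketch of that keyset-resume logic (a hypothetical standalone helper, not the gem's `scope_to_next_chunk`, which builds an ActiveRecord scope instead of a raw string):

```ruby
# Build the "resume after the last row seen" predicate for keyset chunking:
# (c1 > v1) OR (c1 = v1 AND c2 > v2) OR ... over the index columns in order.
def next_chunk_predicate(columns, values)
  clauses = columns.each_index.map do |i|
    eqs = columns[0...i].zip(values[0...i]).map { |c, v| "#{c} = #{v}" }
    (eqs + ["#{columns[i]} > #{values[i]}"]).join(" AND ")
  end
  clauses.map { |c| "(#{c})" }.join(" OR ")
end

# → "(account > 5) OR (account = 5 AND timestamp > 10)"
puts next_chunk_predicate(%w[account timestamp], [5, 10])
```

Each clause pins the earlier index columns with equality and advances the next one, which is exactly why the traversal can reuse the index instead of rescanning it.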
data/lib/clean_sweep/table_schema/column_schema.rb
CHANGED
@@ -1,23 +1,39 @@
 class CleanSweep::TableSchema::ColumnSchema
 
-  attr_reader :name
+  attr_reader :name, :ar_column
   attr_accessor :select_position
+  attr_writer :dest_name
 
   def initialize(name, model)
     @name = name.to_sym
     col_num = model.column_names.index(name.to_s) or raise "Can't find #{name} in #{model.name}"
     @model = model
-    @
+    @ar_column = model.columns[col_num]
   end
 
   def quoted_name
-
+    quote_column_name(@model, name)
   end
+
+  def quoted_dest_name(dest_model)
+    quote_column_name(dest_model, @dest_name || @name)
+  end
+
   def value(row)
     row[select_position]
   end
+
   def quoted_value(row)
-    @model.quote_value(value(row), @
+    @model.quote_value(value(row), @ar_column)
+  end
+
+  def == other
+    return other && name == other.name
+  end
+
+  private
+  def quote_column_name(model, column_name)
+    model.connection.quote_table_name(model.table_name) + "." + model.connection.quote_column_name(column_name)
   end
 end
 
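The new `quote_column_name` helper above is what makes every column reference table-qualified (`` `comments`.`id` `` instead of `` `id` ``), which is why the spec expectations later in this diff all change. A minimal sketch of that quoting, with MySQL backticks hardcoded where the real code delegates to the connection adapter:

```ruby
# Backtick-quote an identifier, MySQL style (assumes no backticks in names).
def quote_ident(name)
  "`#{name}`"
end

# Qualify a column with its table so the same column name can safely
# appear for both the source and destination tables in one statement.
def qualified_column(table, column)
  quote_ident(table) + "." + quote_ident(column)
end

# → `comments`.`id`
puts qualified_column("comments", "id")
```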
data/lib/clean_sweep/table_schema/index_schema.rb
CHANGED
@@ -1,6 +1,6 @@
 class CleanSweep::TableSchema::IndexSchema < Struct.new :name, :model, :ascending
 
-  attr_accessor :columns, :name, :model, :ascending, :first_only
+  attr_accessor :columns, :name, :model, :ascending, :first_only, :dest_model
 
   def initialize name, model
     @model = model
@@ -16,12 +16,12 @@ class CleanSweep::TableSchema::IndexSchema < Struct.new :name, :model, :ascendin
   # Take columns referenced by this index and add them to the list if they
   # are not present. Record their position in the list because the position will
   # be where they are located in a row of values passed in later to #scope_to_next_chunk
-  def add_columns_to
+  def add_columns_to columns
     @columns.each do | column |
-      pos =
+      pos = columns.index column
       if pos.nil?
-
-        pos =
+        columns << column
+        pos = columns.size - 1
      end
      column.select_position = pos
    end
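The `add_columns_to` change above merges an index's columns into a shared select list while recording where each one lands, so later code can pull a column's value out of a fetched row by position. A pure-Ruby sketch of that merge (symbols stand in for the ColumnSchema objects):

```ruby
# Merge this index's columns into a shared select list, returning the
# position recorded for each column (its slot in a fetched row of values).
def add_columns_to(index_columns, select_list)
  index_columns.map do |col|
    pos = select_list.index(col)
    if pos.nil?
      select_list << col        # not selected yet: append it
      pos = select_list.size - 1
    end
    pos
  end
end
```

Columns shared between the primary key and the traversing index are therefore selected once and both indexes point at the same position.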
data/lib/clean_sweep/table_schema.rb
CHANGED
@@ -2,7 +2,7 @@
 class CleanSweep::TableSchema
 
   # The list of columns used when selecting, the union of pk and traversing key columns
-  attr_reader :
+  attr_reader :columns
 
   # The schema for the primary key
   attr_reader :primary_key
@@ -18,8 +18,17 @@ class CleanSweep::TableSchema
     ascending = options.include?(:ascending) ? options[:ascending] : true
     first_only = options[:first_only]
     @model = model
+    @dest_model = options[:dest_model] || @model
+
+    # Downcase and symbolize the entries in the column name map:
+    dest_columns_map = Hash[*(options[:dest_columns] || {}).to_a.flatten.map{|n| n.to_s.downcase.to_sym}]
+
     @name = @model.table_name
-
+
+    @columns =
+        (options[:extra_columns] || []).map do | extra_col_name |
+          CleanSweep::TableSchema::ColumnSchema.new extra_col_name, model
+        end
 
     key_schemas = build_indexes
 
@@ -28,31 +37,40 @@ class CleanSweep::TableSchema
     raise "Table #{model.table_name} must have a primary key" unless key_schemas.include? 'primary'
 
     @primary_key = key_schemas['primary']
-    @primary_key.add_columns_to @
+    @primary_key.add_columns_to @columns
     if traversing_key_name
       traversing_key_name.downcase!
       raise "BTREE Index #{traversing_key_name} not found" unless key_schemas.include? traversing_key_name
       @traversing_key = key_schemas[traversing_key_name]
-      @traversing_key.add_columns_to @
+      @traversing_key.add_columns_to @columns
       @traversing_key.ascending = ascending
       @traversing_key.first_only = first_only
     end
 
+    # Specify the column names in the destination map, if provided
+    @columns.each do | column |
+      column.dest_name = dest_columns_map[column.name]
+    end
+
   end
 
-  def
-
+  def column_names
+    @columns.map(&:name)
+  end
+
+  def insert_statement(rows)
+    "insert into #{@dest_model.quoted_table_name} (#{quoted_dest_column_names}) values #{quoted_row_values(rows)}"
   end
 
   def delete_statement(rows)
     rec_criteria = rows.map do | row |
       row_compares = []
       @primary_key.columns.each do |column|
-        row_compares << "#{column.
+        row_compares << "#{column.quoted_dest_name(@dest_model)} = #{column.quoted_value(row)}"
       end
       "(" + row_compares.join(" AND ") + ")"
     end
-    "DELETE FROM #{@
+    "DELETE FROM #{@dest_model.quoted_table_name} WHERE #{rec_criteria.join(" OR ")}"
   end
 
   def initial_scope
@@ -82,15 +100,20 @@ class CleanSweep::TableSchema
   end
 
   def quoted_column_names
-
+    columns.map{|c| "#{c.quoted_name}"}.join(",")
+  end
+
+  def quoted_dest_column_names
+    columns.map{|c| c.quoted_dest_name(@dest_model)}.join(",")
   end
 
   def quoted_row_values(rows)
     rows.map do |vec|
-
-
-
-
+      row = []
+      columns.each_with_index do | col, i |
+        row << @model.quote_value(vec[i], col.ar_column)
+      end
+      "(#{row.join(',')})"
     end.join(",")
   end
 
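The constructor above normalizes the `dest_columns` map with the `Hash[*map.to_a.flatten.map { ... }]` idiom so that lookups by `column.name` (a lowercase symbol) succeed regardless of how the caller wrote the keys. Extracted as a standalone sketch (the method name here is hypothetical; in the gem this happens inline):

```ruby
# Downcase and symbolize every key and value of the dest_columns map.
# to_a.flatten turns {k => v, ...} into [k, v, ...]; Hash[*...] rebuilds
# the pairs after each entry has been normalized.
def normalize_dest_columns(dest_columns)
  Hash[*(dest_columns || {}).to_a.flatten.map { |n| n.to_s.downcase.to_sym }]
end

# → {:publisher=>:published_by, :id=>:book_id}
p normalize_dest_columns('PUBLISHER' => 'published_by', 'ID' => 'book_id')
```

This is why the spec later in this diff can pass uppercase string keys (`'PUBLISHER' => 'published_by'`) and still match the lowercase symbolic column names.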
data/lib/clean_sweep/version.rb
CHANGED
data/spec/factories/books.rb
CHANGED
@@ -12,6 +12,7 @@ class Book < ActiveRecord::Base
          key book_index_by_bin(bin, id)
        )
     EOF
+    Book.delete_all
   end
 
 end
@@ -27,10 +28,17 @@ end
 class BookTemp < ActiveRecord::Base
 
   self.table_name = 'book_vault'
+  self.primary_key= 'book_id'
 
   def self.create_table
     connection.execute <<-EOF
-      create temporary table if not exists
+      create temporary table if not exists
+        book_vault (
+          `book_id` int(11) primary key auto_increment,
+          `bin` int(11),
+          `published_by` varchar(64)
+        )
     EOF
+    BookTemp.delete_all
   end
 end
data/spec/purge_runner_spec.rb
CHANGED
@@ -72,20 +72,20 @@ describe CleanSweep::PurgeRunner do
       purger.print_queries(output)
       expect(output.string).to eq <<EOF
 Initial Query:
-    SELECT `id`,`account`,`timestamp`
+    SELECT `comments`.`id`,`comments`.`account`,`comments`.`timestamp`
     FROM `comments` FORCE INDEX(comments_on_account_timestamp)
     WHERE (timestamp < '2014-11-25 21:47:43')
-    ORDER BY `account` ASC,`timestamp` ASC
+    ORDER BY `comments`.`account` ASC,`comments`.`timestamp` ASC
     LIMIT 500
 Chunk Query:
-    SELECT `id`,`account`,`timestamp`
+    SELECT `comments`.`id`,`comments`.`account`,`comments`.`timestamp`
     FROM `comments` FORCE INDEX(comments_on_account_timestamp)
-    WHERE (timestamp < '2014-11-25 21:47:43') AND (`account` > 0 OR (`account` = 0 AND `timestamp` > '2014-11-18 21:47:43'))\n    ORDER BY `account` ASC,`timestamp` ASC
+    WHERE (timestamp < '2014-11-25 21:47:43') AND (`comments`.`account` > 0 OR (`comments`.`account` = 0 AND `comments`.`timestamp` > '2014-11-18 21:47:43'))\n    ORDER BY `comments`.`account` ASC,`comments`.`timestamp` ASC
     LIMIT 500
 Delete Statement:
     DELETE
     FROM `comments`
-    WHERE (`id` = 2)
+    WHERE (`comments`.`id` = 2)
 EOF
     end
   end
@@ -167,13 +167,20 @@ EOF
     it 'copies books' do
       BookTemp.create_table
       purger = CleanSweep::PurgeRunner.new model: Book,
+                                           copy_columns: ['publisher'],
                                            dest_model: BookTemp,
+                                           dest_columns: { 'PUBLISHER' => 'published_by', 'ID' => 'book_id'},
                                            chunk_size: 4,
+                                           copy_only: true,
                                            index: 'book_index_by_bin'
 
       count = purger.execute_in_batches
       expect(count).to be(@total_book_size)
       expect(BookTemp.count).to eq(@total_book_size)
+      last_book = BookTemp.last
+      expect(last_book.book_id).to be 200
+      expect(last_book.bin).to be 2000
+      expect(last_book.published_by).to eq 'Random House'
     end
 
   end
data/spec/table_schema_spec.rb
CHANGED
@@ -17,22 +17,22 @@ describe CleanSweep::TableSchema do
     it 'should produce an ascending chunk clause' do
       rows = account_and_timestamp_rows
       expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
-        .to include("(`account` > 5 OR (`account` = 5 AND `timestamp` > '2014-12-01 23:13:25'))")
+        .to include("(`comments`.`account` > 5 OR (`comments`.`account` = 5 AND `comments`.`timestamp` > '2014-12-01 23:13:25'))")
     end
 
     it 'should produce all select columns' do
-      expect(schema.
+      expect(schema.column_names).to eq([:id, :account, :timestamp])
     end
 
     it 'should produce the ascending order clause' do
-      expect(schema.initial_scope.to_sql).to include('`account` ASC,`timestamp` ASC')
+      expect(schema.initial_scope.to_sql).to include('`comments`.`account` ASC,`comments`.`timestamp` ASC')
     end
 
 
     it 'should produce an insert statement' do
       schema = CleanSweep::TableSchema.new Comment, key_name: 'comments_on_account_timestamp'
       rows = account_and_timestamp_rows
-      expect(schema.insert_statement(
+      expect(schema.insert_statement(rows)).to eq("insert into `comments` (`comments`.`id`,`comments`.`account`,`comments`.`timestamp`) values (1001,5,'2014-12-02 01:13:25'),(1002,2,'2014-12-02 00:13:25'),(1005,5,'2014-12-01 23:13:25')")
     end
   end
 
@@ -43,14 +43,14 @@ describe CleanSweep::TableSchema do
     it 'should produce a descending where clause' do
       rows = account_and_timestamp_rows
       expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
-        .to include("(`account` < 5 OR (`account` = 5 AND `timestamp` < '2014-12-01 23:13:25'))")
+        .to include("(`comments`.`account` < 5 OR (`comments`.`account` = 5 AND `comments`.`timestamp` < '2014-12-01 23:13:25'))")
     end
 
 
     it 'should produce the descending order clause' do
       rows = account_and_timestamp_rows
       expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
-        .to include("`account` DESC,`timestamp` DESC")
+        .to include("`comments`.`account` DESC,`comments`.`timestamp` DESC")
     end
 
   end
@@ -59,13 +59,13 @@ describe CleanSweep::TableSchema do
     let(:schema) { CleanSweep::TableSchema.new Comment, key_name:'comments_on_account_timestamp', first_only: true }
 
     it 'should select all the rows' do
-      expect(schema.
+      expect(schema.column_names).to eq([:id, :account, :timestamp])
     end
 
     it 'should only query using the first column of the index' do
       rows = account_and_timestamp_rows
       expect(schema.scope_to_next_chunk(schema.initial_scope, rows.last).to_sql)
-        .to include(" (`account` >= 5) ")
+        .to include(" (`comments`.`account` >= 5) ")
 
     end
 
@@ -83,7 +83,7 @@ describe CleanSweep::TableSchema do
 
   it 'should produce minimal select columns' do
     schema = CleanSweep::TableSchema.new Comment, key_name: 'PRIMARY'
-    expect(schema.
+    expect(schema.column_names).to eq([:id])
   end
 
   it 'should produce the from clause with an index' do
@@ -93,10 +93,10 @@ describe CleanSweep::TableSchema do
 
   it 'should include additional columns' do
     schema = CleanSweep::TableSchema.new Comment, key_name: 'comments_on_account_timestamp', extra_columns: %w[seen id]
-    expect(schema.
+    expect(schema.column_names).to eq([:seen, :id, :account, :timestamp])
     rows = account_and_timestamp_rows
     rows.map! { |row| row.unshift 1 } # Insert 'seen' value to beginning of row
-    expect(schema.insert_statement(
+    expect(schema.insert_statement(rows)).to eq("insert into `comments` (`comments`.`seen`,`comments`.`id`,`comments`.`account`,`comments`.`timestamp`) values (1,1001,5,'2014-12-02 01:13:25'),(1,1002,2,'2014-12-02 00:13:25'),(1,1005,5,'2014-12-01 23:13:25')")
 
   end
 
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: cleansweep
 version: !ruby/object:Gem::Version
-  version: 1.0.
+  version: 1.0.2
 platform: ruby
 authors:
 - Bill Kayser
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-12-
+date: 2014-12-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activerecord
@@ -44,14 +44,14 @@ dependencies:
     requirements:
     - - "~>"
      - !ruby/object:Gem::Version
-        version: 0.3
+        version: '0.3'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
      - !ruby/object:Gem::Version
-        version: 0.3
+        version: '0.3'
 - !ruby/object:Gem::Dependency
   name: pry
   requirement: !ruby/object:Gem::Requirement
@@ -167,7 +167,7 @@ files:
 - spec/purge_runner_spec.rb
 - spec/spec_helper.rb
 - spec/table_schema_spec.rb
-homepage: http://github.com/
+homepage: http://bkayser.github.com/cleansweep
 licenses:
 - MIT
 metadata: {}
@@ -177,9 +177,9 @@ require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
-  - - "
+  - - "~>"
   - !ruby/object:Gem::Version
-    version: '
+    version: '2'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="