large-hadron-migrator 0.1.2 → 0.1.3
- data/CHANGES.md +9 -0
- data/{README.markdown → README.md} +10 -10
- data/VERSION +1 -1
- data/lib/large_hadron_migration.rb +23 -17
- data/spec/large_hadron_migration_spec.rb +7 -1
- data/spec/spec_helper.rb +15 -0
- metadata +6 -32
- data/CHANGES.markdown +0 -0
data/CHANGES.md
ADDED
@@ -0,0 +1,9 @@
|
|
1
|
+
# 0.1.3
|
2
|
+
* code cleanup
|
3
|
+
* Merged [Pullrequest #8](https://github.com/soundcloud/large-hadron-migrator/pull/8)
|
4
|
+
* Merged [Pullrequest #7](https://github.com/soundcloud/large-hadron-migrator/pull/7)
|
5
|
+
* Merged [Pullrequest #4](https://github.com/soundcloud/large-hadron-migrator/pull/4)
|
6
|
+
* Merged [Pullrequest #1](https://github.com/soundcloud/large-hadron-migrator/pull/1)
|
7
|
+
|
8
|
+
# 0.1.2
|
9
|
+
* Initial Release
|
data/{README.markdown → README.md}
@@ -5,7 +5,7 @@ an agile manner. Most Rails projects start like this, and at first, making
|
|
5
5
|
changes is fast and easy.
|
6
6
|
|
7
7
|
That is until your tables grow to millions of records. At this point, the
|
8
|
-
locking nature of `ALTER TABLE` may take your site down for an hour
|
8
|
+
locking nature of `ALTER TABLE` may take your site down for an hour or more
|
9
9
|
while critical tables are migrated. In order to avoid this, developers begin
|
10
10
|
to design around the problem by introducing join tables or moving the data
|
11
11
|
into another layer. Development gets less and less agile as tables grow and
|
@@ -89,7 +89,7 @@ there can only ever be one version of the record in the journal table.
|
|
89
89
|
|
90
90
|
If the journalling trigger hits an already persisted record, it will be
|
91
91
|
replaced with the latest data and action. `ON DUPLICATE KEY` comes in handy
|
92
|
-
here. This
|
92
|
+
here. This ensures that all journal records will be consistent with the
|
93
93
|
original table.
|
94
94
|
|
95
95
|
### Perform alter statement on new table
|
@@ -100,20 +100,20 @@ indexes at the end of the copying process.
|
|
100
100
|
|
101
101
|
### Copy in chunks up to max primary key value to new table
|
102
102
|
|
103
|
-
Currently InnoDB
|
103
|
+
Currently InnoDB acquires a read lock on the source rows in `INSERT INTO...
|
104
104
|
SELECT`. LHM reads 35K ranges and pauses for a specified number of milliseconds
|
105
105
|
so that contention can be minimized.
|
106
106
|
|
107
107
|
### Switch new and original table names and remove triggers
|
108
108
|
|
109
|
-
The
|
109
|
+
The original and copy table are now atomically switched with `RENAME TABLE
|
110
110
|
original TO archive_original, copy_table TO original`. The triggers are removed
|
111
111
|
so that journalling stops and all mutations and reads now go against the
|
112
112
|
original table.
|
113
113
|
|
114
114
|
### Replay journal: insert, update, deletes
|
115
115
|
|
116
|
-
Because the chunked copy stops at the
|
116
|
+
Because the chunked copy stops at the initial maximum id, we can simply replay
|
117
117
|
all inserts in the journal table without worrying about collisions.
|
118
118
|
|
119
119
|
Updates and deletes are then replayed.
|
@@ -131,8 +131,8 @@ pass, so this will be quite short compared to the copy phase. The
|
|
131
131
|
inconsistency during replay is similar in effect to a slave which is slightly
|
132
132
|
behind master.
|
133
133
|
|
134
|
-
There is also caveat with the current journalling scheme; stale journal
|
135
|
-
'update' entries are still replayed. Imagine an update to
|
134
|
+
There is also a caveat with the current journalling scheme; stale journal
|
135
|
+
'update' entries are still replayed. Imagine an update to a record in the
|
136
136
|
migrated table while the journal is replaying. The journal may already contain
|
137
137
|
an update for this record, which becomes stale now. When it is replayed, the
|
138
138
|
second change will be lost. So if a record is updated twice, once before and
|
@@ -162,9 +162,9 @@ Several hours into the migration, a critical fix had to be deployed to the
|
|
162
162
|
site. We rolled out the fix and restarted the app servers in mid migration.
|
163
163
|
This was not a good idea.
|
164
164
|
|
165
|
-
TL;DR: Never restart during migrations when removing columns with
|
166
|
-
|
167
|
-
|
165
|
+
TL;DR: Never restart during migrations when removing columns with LHM.
|
166
|
+
You can restart while adding migrations as long as active record reads column
|
167
|
+
definitions from the slave.
|
168
168
|
|
169
169
|
The information below is only relevant if you want to restart your app servers
|
170
170
|
while migrating in a master slave setup.
|
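The README hunk above describes copying in id ranges up to the initial maximum primary key, with a pause between ranges. The following is only a minimal sketch of how such ranges could be computed; `id_chunks` is a hypothetical helper that does not appear in the gem, and it touches no database.

```ruby
# Hypothetical helper, not part of the gem: split the ids 1..max_id into
# inclusive [low, high] windows holding at most chunk_size ids each.
def id_chunks(max_id, chunk_size)
  raise ArgumentError, "chunk_size must be >= 1" unless chunk_size >= 1

  chunks = []
  low = 1
  while low <= max_id
    # Never run past the maximum id captured before the copy started.
    high = [low + chunk_size - 1, max_id].min
    chunks << [low, high]
    low = high + 1
  end
  chunks
end

id_chunks(100_000, 35_000).each do |low, high|
  puts "copy ids #{low}..#{high}"
  # A real copy would run its INSERT INTO ... SELECT for this window here
  # and then sleep for the configured wait before the next window.
end
```

Because the upper bound is fixed before copying begins, rows inserted later fall outside every window and are left to the journal replay, which is why the README can replay inserts without worrying about collisions.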
data/VERSION
CHANGED
@@ -1 +1 @@
|
|
1
|
-
0.1.
|
1
|
+
0.1.3
|
data/lib/large_hadron_migration.rb
CHANGED
@@ -100,16 +100,17 @@ class LargeHadronMigration < ActiveRecord::Migration
|
|
100
100
|
|
101
101
|
raise "chunk_size must be >= 1" unless chunk_size >= 1
|
102
102
|
|
103
|
-
|
104
|
-
|
105
|
-
|
103
|
+
started = Time.now.strftime("%Y_%m_%d_%H_%M_%S_%3N")
|
104
|
+
new_table = "lhmn_%s" % curr_table
|
105
|
+
old_table = "lhmo_%s_%s" % [started, curr_table]
|
106
|
+
journal_table = "lhmj_%s_%s" % [started, curr_table]
|
106
107
|
|
107
108
|
last_insert_id = last_insert_id(curr_table)
|
108
109
|
say "last inserted id in #{curr_table}: #{last_insert_id}"
|
109
110
|
|
110
111
|
begin
|
111
112
|
# clean tables. old tables are never deleted to guard against rollbacks.
|
112
|
-
execute %Q
|
113
|
+
execute %Q{drop table if exists %s} % new_table
|
113
114
|
|
114
115
|
clone_table(curr_table, new_table, id_window)
|
115
116
|
clone_table_for_changes(curr_table, journal_table)
|
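The added lines in this hunk derive the working table names from the current table and a millisecond timestamp. As a rough illustration, the same format strings can be wrapped in a function that accepts the start time as an argument, which makes the result reproducible. `lhm_table_names` is a hypothetical name used only for this sketch.

```ruby
# Hypothetical helper, not part of the gem: builds the three table names
# with the same format strings as the added lines, but takes the start
# time as a parameter instead of always calling Time.now.
def lhm_table_names(curr_table, now = Time.now)
  started = now.strftime("%Y_%m_%d_%H_%M_%S_%3N")
  {
    :new     => "lhmn_%s" % curr_table,                 # copy target, no timestamp
    :old     => "lhmo_%s_%s" % [started, curr_table],   # archived original
    :journal => "lhmj_%s_%s" % [started, curr_table]    # change journal
  }
end

names = lhm_table_names("users", Time.utc(2011, 9, 28, 12, 0, 0))
puts names[:old] # lhmo_2011_09_28_12_00_00_000_users
```

Only the new table has a stable name, which matches the `drop table if exists` on `new_table` in the hunk: the timestamped old and journal tables from earlier runs are left in place.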
@@ -253,8 +254,8 @@ class LargeHadronMigration < ActiveRecord::Migration
|
|
253
254
|
end
|
254
255
|
end
|
255
256
|
|
256
|
-
def self.clone_table(source, dest, window = 0)
|
257
|
-
execute schema_sql(source, dest, window)
|
257
|
+
def self.clone_table(source, dest, window = 0, add_action_column = false)
|
258
|
+
execute schema_sql(source, dest, window, add_action_column)
|
258
259
|
end
|
259
260
|
|
260
261
|
def self.common_columns(t1, t2)
|
@@ -262,11 +263,7 @@ class LargeHadronMigration < ActiveRecord::Migration
|
|
262
263
|
end
|
263
264
|
|
264
265
|
def self.clone_table_for_changes(table, journal_table)
|
265
|
-
clone_table(table, journal_table)
|
266
|
-
execute %Q{
|
267
|
-
alter table %s
|
268
|
-
add column hadron_action varchar(15);
|
269
|
-
} % journal_table
|
266
|
+
clone_table(table, journal_table, 0, true)
|
270
267
|
end
|
271
268
|
|
272
269
|
def self.rename_tables(tables = {})
|
@@ -333,11 +330,15 @@ class LargeHadronMigration < ActiveRecord::Migration
|
|
333
330
|
end
|
334
331
|
|
335
332
|
def self.replay_delete_changes(table, journal_table)
|
336
|
-
|
337
|
-
|
338
|
-
|
339
|
-
|
340
|
-
|
333
|
+
with_master do
|
334
|
+
if connection.select_values("select id from #{journal_table} where hadron_action = 'delete' LIMIT 1").any?
|
335
|
+
execute %Q{
|
336
|
+
delete from #{table} where id in (
|
337
|
+
select id from #{journal_table} where hadron_action = 'delete'
|
338
|
+
)
|
339
|
+
}
|
340
|
+
end
|
341
|
+
end
|
341
342
|
end
|
342
343
|
|
343
344
|
def self.replay_update_changes(table, journal_table, chunk_size = 10000, wait = 0.2)
|
@@ -359,12 +360,17 @@ class LargeHadronMigration < ActiveRecord::Migration
|
|
359
360
|
# behavior with the latter where the auto_increment of the source table
|
360
361
|
# got modified when updating the destination.
|
361
362
|
#
|
362
|
-
def self.schema_sql(source, dest, window)
|
363
|
+
def self.schema_sql(source, dest, window, add_action_column = false)
|
363
364
|
show_create(source).tap do |schema|
|
364
365
|
schema.gsub!(/auto_increment=(\d+)/i) do
|
365
366
|
"auto_increment=#{ $1.to_i + window }"
|
366
367
|
end
|
367
368
|
|
369
|
+
if add_action_column
|
370
|
+
schema.sub!(/\) ENGINE=/,
|
371
|
+
", hadron_action ENUM('update', 'insert', 'delete'), INDEX hadron_action (hadron_action) USING BTREE) ENGINE=")
|
372
|
+
end
|
373
|
+
|
368
374
|
schema.gsub!('CREATE TABLE `%s`' % source, 'CREATE TABLE `%s`' % dest)
|
369
375
|
end
|
370
376
|
end
|
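The `schema_sql` change above rewrites the text of a `SHOW CREATE TABLE` statement instead of issuing a separate `ALTER TABLE` for the journal column. The sketch below applies the same three substitutions to a plain string so the effect can be inspected without MySQL. `rewrite_schema` and the sample schema are assumptions made for illustration; the gem reads the schema through `show_create(source)`.

```ruby
# Hypothetical stand-alone version of the rewrite shown in the hunk.
# It works on a copy of the input string and never talks to a database.
def rewrite_schema(schema, source, dest, window, add_action_column = false)
  sql = schema.dup

  # Shift the auto_increment counter of the clone by the id window.
  sql.gsub!(/auto_increment=(\d+)/i) { "auto_increment=#{$1.to_i + window}" }

  if add_action_column
    # Append the journal column and its index just before the table options.
    sql.sub!(/\) ENGINE=/,
      ", hadron_action ENUM('update', 'insert', 'delete'), INDEX hadron_action (hadron_action) USING BTREE) ENGINE=")
  end

  # Point the statement at the destination table.
  sql.gsub!('CREATE TABLE `%s`' % source, 'CREATE TABLE `%s`' % dest)
  sql
end

# Made-up schema text in the general shape of SHOW CREATE TABLE output.
sample = "CREATE TABLE `users` (`id` int(11) NOT NULL AUTO_INCREMENT, " \
         "PRIMARY KEY (`id`)) ENGINE=InnoDB AUTO_INCREMENT=42"

puts rewrite_schema(sample, "users", "lhmj_users", 10, true)
```

Creating the journal table with the column and index already in place means `clone_table_for_changes` needs a single statement, which is what the simplified method in the earlier hunk relies on.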
data/spec/large_hadron_migration_spec.rb
CHANGED
@@ -99,7 +99,8 @@ describe "LargeHadronMigration", "triggers" do
|
|
99
99
|
end
|
100
100
|
|
101
101
|
it "should create a table for triggered changes" do
|
102
|
-
truthiness_column "triggerme_changes", "hadron_action", "
|
102
|
+
truthiness_column "triggerme_changes", "hadron_action", "enum"
|
103
|
+
truthiness_index "triggerme_changes", "hadron_action", [ "hadron_action" ], false
|
103
104
|
end
|
104
105
|
|
105
106
|
it "should trigger on insert" do
|
@@ -341,6 +342,11 @@ describe "LargeHadronMigration", "replaying changes" do
|
|
341
342
|
end
|
342
343
|
end
|
343
344
|
|
345
|
+
it "doesn't replay delete if there are any" do
|
346
|
+
LargeHadronMigration.should_receive(:execute).never
|
347
|
+
LargeHadronMigration.replay_delete_changes("source", "source_changes")
|
348
|
+
end
|
349
|
+
|
344
350
|
end
|
345
351
|
|
346
352
|
describe "LargeHadronMigration", "units" do
|
data/spec/spec_helper.rb
CHANGED
@@ -90,6 +90,21 @@ module SpecHelper
|
|
90
90
|
|
91
91
|
end
|
92
92
|
|
93
|
+
def truthiness_index(table, expected_index_name, indexed_columns, unique)
|
94
|
+
index = sql("SHOW INDEXES FROM #{table}").all_hashes.inject({}) do |a, part|
|
95
|
+
index_name = part['Key_name']
|
96
|
+
a[index_name] ||= { 'unique' => '0' == part['Non_unique'], 'columns' => [] }
|
97
|
+
column_index = part['Seq_in_index'].to_i - 1
|
98
|
+
a[index_name]['columns'][column_index] = part['Column_name']
|
99
|
+
a
|
100
|
+
end[expected_index_name]
|
101
|
+
|
102
|
+
flunk("no index named #{expected_index_name} found on #{table}") unless index
|
103
|
+
|
104
|
+
index['columns'].should == indexed_columns
|
105
|
+
index['unique'].should == unique
|
106
|
+
end
|
107
|
+
|
93
108
|
end
|
94
109
|
|
95
110
|
# Mock Rails Environment
|
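The new `truthiness_index` helper folds the rows of `SHOW INDEXES` into one entry per index name before asserting on columns and uniqueness. The grouping step can be tried on plain hashes, as in this sketch. `group_indexes` and the sample rows are hypothetical; the real helper gets its rows from `sql(...).all_hashes`.

```ruby
# Hypothetical extraction of the grouping step from truthiness_index.
# Each row is a hash with the string keys and string values the helper reads.
def group_indexes(rows)
  rows.inject({}) do |acc, part|
    name = part['Key_name']
    acc[name] ||= { 'unique' => '0' == part['Non_unique'], 'columns' => [] }
    # Seq_in_index is 1-based, so subtract one to get the array position.
    acc[name]['columns'][part['Seq_in_index'].to_i - 1] = part['Column_name']
    acc
  end
end

# Made-up rows: a primary key plus a two-column, non-unique index whose
# rows arrive out of order.
rows = [
  { 'Key_name' => 'PRIMARY', 'Non_unique' => '0', 'Seq_in_index' => '1', 'Column_name' => 'id' },
  { 'Key_name' => 'by_name', 'Non_unique' => '1', 'Seq_in_index' => '2', 'Column_name' => 'last' },
  { 'Key_name' => 'by_name', 'Non_unique' => '1', 'Seq_in_index' => '1', 'Column_name' => 'first' }
]

p group_indexes(rows)['by_name']
```

Writing each column at its `Seq_in_index` position, rather than appending, keeps the column order correct even if the rows are not returned in sequence.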
metadata
CHANGED
@@ -1,12 +1,8 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: large-hadron-migrator
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
prerelease:
|
5
|
-
|
6
|
-
- 0
|
7
|
-
- 1
|
8
|
-
- 2
|
9
|
-
version: 0.1.2
|
4
|
+
prerelease:
|
5
|
+
version: 0.1.3
|
10
6
|
platform: ruby
|
11
7
|
authors:
|
12
8
|
- SoundCloud
|
@@ -16,8 +12,7 @@ autorequire:
|
|
16
12
|
bindir: bin
|
17
13
|
cert_chain: []
|
18
14
|
|
19
|
-
date: 2011-
|
20
|
-
default_executable:
|
15
|
+
date: 2011-09-28 00:00:00 Z
|
21
16
|
dependencies:
|
22
17
|
- !ruby/object:Gem::Dependency
|
23
18
|
name: activerecord
|
@@ -27,10 +22,6 @@ dependencies:
|
|
27
22
|
requirements:
|
28
23
|
- - ~>
|
29
24
|
- !ruby/object:Gem::Version
|
30
|
-
segments:
|
31
|
-
- 2
|
32
|
-
- 3
|
33
|
-
- 8
|
34
25
|
version: 2.3.8
|
35
26
|
type: :runtime
|
36
27
|
version_requirements: *id001
|
@@ -42,10 +33,6 @@ dependencies:
|
|
42
33
|
requirements:
|
43
34
|
- - ~>
|
44
35
|
- !ruby/object:Gem::Version
|
45
|
-
segments:
|
46
|
-
- 2
|
47
|
-
- 3
|
48
|
-
- 8
|
49
36
|
version: 2.3.8
|
50
37
|
type: :runtime
|
51
38
|
version_requirements: *id002
|
@@ -57,10 +44,6 @@ dependencies:
|
|
57
44
|
requirements:
|
58
45
|
- - "="
|
59
46
|
- !ruby/object:Gem::Version
|
60
|
-
segments:
|
61
|
-
- 2
|
62
|
-
- 8
|
63
|
-
- 1
|
64
47
|
version: 2.8.1
|
65
48
|
type: :runtime
|
66
49
|
version_requirements: *id003
|
@@ -72,10 +55,6 @@ dependencies:
|
|
72
55
|
requirements:
|
73
56
|
- - "="
|
74
57
|
- !ruby/object:Gem::Version
|
75
|
-
segments:
|
76
|
-
- 1
|
77
|
-
- 3
|
78
|
-
- 1
|
79
58
|
version: 1.3.1
|
80
59
|
type: :development
|
81
60
|
version_requirements: *id004
|
@@ -89,11 +68,11 @@ extra_rdoc_files: []
|
|
89
68
|
|
90
69
|
files:
|
91
70
|
- .gitignore
|
92
|
-
- CHANGES.
|
71
|
+
- CHANGES.md
|
93
72
|
- Gemfile
|
94
73
|
- Gemfile.lock
|
95
74
|
- LICENSE
|
96
|
-
- README.
|
75
|
+
- README.md
|
97
76
|
- Rakefile
|
98
77
|
- VERSION
|
99
78
|
- large-hadron-migrator.gemspec
|
@@ -101,7 +80,6 @@ files:
|
|
101
80
|
- spec/large_hadron_migration_spec.rb
|
102
81
|
- spec/migrate/add_new_column.rb
|
103
82
|
- spec/spec_helper.rb
|
104
|
-
has_rdoc: true
|
105
83
|
homepage: http://github.com/soundcloud/large-hadron-migrator
|
106
84
|
licenses: []
|
107
85
|
|
@@ -115,21 +93,17 @@ required_ruby_version: !ruby/object:Gem::Requirement
|
|
115
93
|
requirements:
|
116
94
|
- - ">="
|
117
95
|
- !ruby/object:Gem::Version
|
118
|
-
segments:
|
119
|
-
- 0
|
120
96
|
version: "0"
|
121
97
|
required_rubygems_version: !ruby/object:Gem::Requirement
|
122
98
|
none: false
|
123
99
|
requirements:
|
124
100
|
- - ">="
|
125
101
|
- !ruby/object:Gem::Version
|
126
|
-
segments:
|
127
|
-
- 0
|
128
102
|
version: "0"
|
129
103
|
requirements: []
|
130
104
|
|
131
105
|
rubyforge_project:
|
132
|
-
rubygems_version: 1.
|
106
|
+
rubygems_version: 1.8.10
|
133
107
|
signing_key:
|
134
108
|
specification_version: 3
|
135
109
|
summary: online schema changer for mysql
|
data/CHANGES.markdown
DELETED
File without changes
|