large-hadron-migrator 0.1.2

data/.gitignore ADDED
@@ -0,0 +1,5 @@
1
+ *.gem
2
+ .bundle
3
+ Gemfile.lock
4
+ pkg/*
5
+ .rvmrc
data/CHANGES.markdown ADDED
File without changes
data/Gemfile ADDED
@@ -0,0 +1,3 @@
1
+ source "http://rubygems.org"
2
+
3
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,10 @@
1
+ Copyright (c) 2011, SoundCloud, Rany Keddo and Tobias Bielohlawek
2
+ All rights reserved.
3
+
4
+ Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
5
+
6
+ - Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
7
+ - Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
8
+ - Neither the name of the SoundCloud nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
9
+
10
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/README.markdown ADDED
@@ -0,0 +1,237 @@
1
+ # Large Hadron Migrator
2
+
3
+ Rails style database migrations are a useful way to evolve your data schema in
4
+ an agile manner. Most Rails projects start like this, and at first, making
5
+ changes is fast and easy.
6
+
7
+ That is until your tables grow to millions of records. At this point, the
8
+ locking nature of `ALTER TABLE` may take your site down for an hour or more
9
+ while critical tables are migrated. In order to avoid this, developers begin
10
+ to design around the problem by introducing join tables or moving the data
11
+ into another layer. Development gets less and less agile as tables grow and
12
+ grow. To make the problem worse, adding or changing indices to optimize data
13
+ access becomes just as difficult.
14
+
15
+ > Side effects may include black holes and universe implosion.
16
+
17
+ There are a few things that can be done at the server or engine level. It is
18
+ possible to change default values in an `ALTER TABLE` without locking the table.
19
+ The InnoDB Plugin provides facilities for online index creation, which is
20
+ great if you are using this engine, but only solves half the problem.
21
+
22
+ At SoundCloud we started having migration pains quite a while ago, and after
23
+ looking around for third party solutions [0] [1] [2], we decided to create our
24
+ own. We called it Large Hadron Migrator, and it is a gem for online
25
+ ActiveRecord migrations.
26
+
27
+ ![LHC](http://farm4.static.flickr.com/3093/2844971993_17f2ddf2a8_z.jpg)
28
+
29
+ [The Large Hadron collider at CERN](http://en.wikipedia.org/wiki/Large_Hadron_Collider)
30
+
31
+ ## The idea
32
+
33
+ The basic idea is to perform the migration online while the system is live,
34
+ without locking the table. Similar to OAK (online alter table) [2] and the
35
+ facebook tool [0], we use a copy table, triggers and a journal table.
36
+
37
+ We copy successive ranges of data from the original table to a copy table and
38
+ then rename both at the end. Since `INSERT`, `UPDATE` and `DELETE` statements
39
+ continue to hit the original table while doing this, we add triggers to capture
40
+ these changes into a journal table.
41
+
42
+ At the end of the copying process, the journal table is replayed so that none
43
+ of these intervening mutations are lost.
44
+
45
+ The Large Hadron is a test-driven Ruby solution which can easily be dropped
46
+ into an ActiveRecord migration. It presumes a single auto-incremented
47
+ numerical primary key called `id`, as per the Rails convention. Unlike the
48
+ twitter solution [1], it does not require the presence of an indexed
49
+ `updated_at` column.
50
+
51
+ ## Usage
52
+
53
+ Large Hadron Migration is currently implemented as a Rails ActiveRecord
54
+ Migration.
55
+
56
+     class AddIndexToEmails < LargeHadronMigration
57
+       def self.up
58
+         large_hadron_migrate :emails, :wait => 0.2 do |table_name|
59
+           execute %Q{
60
+             alter table %s
61
+               add index index_emails_on_hashed_address (hashed_address)
62
+           } % table_name
63
+         end
64
+       end
65
+     end
66
+
67
+ ## Migration phases
68
+
69
+ LHM runs through the following phases during a migration.
70
+
71
+ ### Get the maximum primary key value for the table
72
+
73
+ When starting the migration, we remember the last insert id on the original
74
+ table. When the original table is copied into the new table, we stop at this
75
+ id. The rest of the records will be found in the journal table - see below.
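+
+ As a minimal sketch (with `emails` standing in for the migrated table), this
+ boundary is simply the current maximum id:
+
+     SELECT MAX(id) FROM emails;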
76
+
77
+ ### Create new table and journal table
78
+
79
+ The two tables are cloned using `SHOW CREATE TABLE`. The journal table has an
80
+ extra action field (update, delete, insert).
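+
+ The end result is roughly equivalent to the following sketch (`emails` and
+ `emails_changes` are stand-in names; LHM actually rewrites the output of
+ `SHOW CREATE TABLE` instead of using `CREATE TABLE ... LIKE`, which showed odd
+ `auto_increment` behaviour in our tests):
+
+     CREATE TABLE new_emails LIKE emails;
+     CREATE TABLE emails_changes LIKE emails;
+     ALTER TABLE emails_changes ADD COLUMN hadron_action VARCHAR(15);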
81
+
82
+ ### Activate journalling with triggers
83
+
84
+ Triggers are created for each of the action types 'insert', 'update' and
85
+ 'delete'. Triggers are responsible for filling the journal table.
86
+
87
+ Because the journal table has the same primary key as the original table,
88
+ there can only ever be one version of the record in the journal table.
89
+
90
+ If the journalling trigger hits an already persisted record, it will be
91
+ replaced with the latest data and action. `ON DUPLICATE KEY` comes in handy
92
+ here. This ensures that all journal records will be consistent with the
93
+ original table.
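+
+ For illustration, the update trigger for an `emails` table with a single
+ `hashed_address` column would look roughly like this (the column list is a
+ stand-in; LHM builds it from `information_schema`):
+
+     CREATE TRIGGER `after_update_emails`
+       AFTER UPDATE ON emails FOR EACH ROW
+     BEGIN
+       INSERT INTO emails_changes (`id`, `hashed_address`, `hadron_action`)
+       VALUES (NEW.`id`, NEW.`hashed_address`, 'update')
+       ON DUPLICATE KEY UPDATE `id` = NEW.`id`, `hashed_address` = NEW.`hashed_address`;
+     END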
94
+
95
+ ### Perform alter statement on new table
96
+
97
+ The user-supplied `ALTER TABLE` statement(s) or index changes are applied to the
98
+ new table. Our tests using InnoDB showed this to be faster than adding the
99
+ indexes at the end of the copying process.
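+
+ Continuing the usage example above, the block receives the copy table's name,
+ so the statement that actually runs boils down to:
+
+     ALTER TABLE new_emails ADD INDEX index_emails_on_hashed_address (hashed_address);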
100
+
101
+ ### Copy in chunks up to max primary key value to new table
102
+
103
+ Currently InnoDB acquires a read lock on the source rows in `INSERT INTO...
104
+ SELECT`. LHM copies ranges of 35,000 ids at a time and pauses for a configurable interval
105
+ so that contention can be minimized.
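+
+ Each chunk is a plain ranged insert inside its own transaction, along these
+ lines (table and column names as in the sketches above):
+
+     START TRANSACTION;
+     INSERT INTO new_emails (`id`, `hashed_address`)
+       SELECT `id`, `hashed_address`
+       FROM emails
+       WHERE (id BETWEEN 1 AND 35000);
+     COMMIT;
+     -- sleep for the configured wait, then continue with 35001..70000, and so on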
106
+
107
+ ### Switch new and original table names and remove triggers
108
+
109
+ The original and copy tables are now atomically switched with `RENAME TABLE
110
+ original TO archive_original, copy_table TO original`. The triggers are removed
111
+ so that journalling stops and all mutations and reads now go against the
112
+ original table.
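+
+ In terms of the `emails` sketch (the archive name really carries a timestamp
+ prefix), this step amounts to:
+
+     RENAME TABLE emails TO archive_emails, new_emails TO emails;
+     DROP TRIGGER IF EXISTS `after_insert_emails`;
+     DROP TRIGGER IF EXISTS `after_update_emails`;
+     DROP TRIGGER IF EXISTS `after_delete_emails`;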
113
+
114
+ ### Replay journal: insert, update, deletes
115
+
116
+ Because the chunked copy stops at the initial maximum id, we can simply replay
117
+ all inserts in the journal table without worrying about collisions.
118
+
119
+ Updates and deletes are then replayed.
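+
+ A sketch of the replay statements (chunked by id range in practice, exactly
+ like the copy phase; `emails_changes` stands in for the journal table):
+
+     INSERT INTO emails (`id`, `hashed_address`)
+       SELECT `id`, `hashed_address`
+       FROM emails_changes
+       WHERE (id BETWEEN 1 AND 35000) AND hadron_action = 'insert';
+
+     UPDATE emails AS t1
+       JOIN emails_changes AS t2 ON t1.id = t2.id
+       SET t1.`hashed_address` = t2.`hashed_address`
+       WHERE (t2.id BETWEEN 1 AND 35000) AND t2.hadron_action = 'update';
+
+     DELETE FROM emails WHERE id IN (
+       SELECT id FROM emails_changes WHERE hadron_action = 'delete'
+     );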
120
+
121
+ ## Potential issues
122
+
123
+ Locks could be avoided during the copy phase by loading records into an
124
+ outfile and then reading them back into the copy table. The facebook solution
125
+ does this, reading 500,000 rows at a time, and is faster for this reason. We
126
+ plan to add this optimization to LHM.
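+
+ The mechanics would look something like this (a sketch only, not implemented
+ in LHM yet; the path and range are placeholders):
+
+     SELECT * INTO OUTFILE '/tmp/emails_chunk.csv'
+       FROM emails WHERE id BETWEEN 1 AND 500000;
+     LOAD DATA INFILE '/tmp/emails_chunk.csv' INTO TABLE new_emails;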
127
+
128
+ Data is only eventually consistent while the journal is replayed, so some
129
+ changes are applied with a delay. The journal is replayed in a single
130
+ pass, so this will be quite short compared to the copy phase. The
131
+ inconsistency during replay is similar in effect to a slave which is slightly
132
+ behind master.
133
+
134
+ There is also a caveat with the current journalling scheme; stale journal
135
+ 'update' entries are still replayed. Imagine an update to a record in the
136
+ migrated table while the journal is replaying. The journal may already contain
137
+ an update for this record, which is now stale. When it is replayed, the
138
+ second change will be lost. So if a record is updated twice, once before and
139
+ once during the replay window, the second update will be lost.
140
+
141
+ There are several ways this edge case could be resolved. One way would be to
142
+ add an UPDATE trigger to the main table, and delete corresponding records from
143
+ the journal while replaying. This would ensure that the journal does not
144
+ contain stale update entries.
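+
+ One possible shape of that trigger, as a sketch only (this is not part of LHM
+ today):
+
+     CREATE TRIGGER `after_update_emails_replay`
+       AFTER UPDATE ON emails FOR EACH ROW
+       DELETE FROM emails_changes WHERE id = NEW.id AND hadron_action = 'update';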
145
+
146
+ ## Near disaster at the collider
147
+
148
+ Having scratched our itch, we went ahead and got ready to roll schema and
149
+ index changes that would impact hundreds of millions of records across many
150
+ tables. There was a backlog of changes we rolled out in one go.
151
+
152
+ At the time, our MySQL slaves were regularly struggling with their replication
153
+ thread. They were often far behind master. Some of the changes were designed
154
+ to relieve this situation. Because of the slave lag, we configured the LHM to
155
+ add a bit more wait time between chunks, which made the total migration time
156
+ quite long. After running some rehearsals, we agreed on the settings and rolled
157
+ out to live, expecting 5-7 hours to complete the migrations.
158
+
159
+ ![LHC](http://farm2.static.flickr.com/1391/958035425_abb70e79b1.jpg)
160
+
161
+ Several hours into the migration, a critical fix had to be deployed to the
162
+ site. We rolled out the fix and restarted the app servers in mid migration.
163
+ This was not a good idea.
164
+
165
+ TL;DR: Never restart during migrations that remove columns with Large
166
+ Hadron. You can restart while adding columns as long as ActiveRecord reads
167
+ column definitions from the slave.
168
+
169
+ The information below is only relevant if you want to restart your app servers
170
+ while migrating in a master slave setup.
171
+
172
+ ### When adding a column
173
+
174
+ 1. Migration running on master; no effect, nothing has changed.
175
+ 2. Tables switched on master. slave out of sync, migration running.
176
+ a. Given the app server reads columns from slave on restart, nothing
177
+ happens.
178
+ b. Given the app server reads columns from master on restart, bad
179
+ things happen, i.e. queries are built with the new columns; e.g. on :include,
180
+ an explicit column list (rather than *) will be built for the
181
+ included table. Since the new column does not exist on the slave, queries will
182
+ break here.
183
+
184
+ 3. Tables switched on slave
185
+ - Same as 2b. Just do a cap deploy instead of cap deploy:restart.
186
+
187
+ ### When removing a column
188
+
189
+ 1. Migration running on master; no effect, nothing has changed.
190
+ 2. Tables switched on master. slave out of sync, migration running.
191
+ a. Given the app server reads columns from slave on restart
192
+ - Writes against master will fail due to the additional column.
193
+ - Reads will succeed against slaves, but not master.
194
+
195
+ b. Given the app server reads columns from master on restart:
196
+ - Writes against master might succeed. Old code referencing
197
+ removed columns will fail.
198
+ - Reads might or might not succeed, for the same reason.
199
+
200
+ ## Todos
201
+
202
+ Load data into outfile instead of `INSERT INTO... SELECT`. Avoid contention and
203
+ increase speed.
204
+
205
+ Handle invalidation of 'update' entries in journal while replaying. Avoid
206
+ stale update replays.
207
+
208
+ Some other optimizations:
209
+
210
+ Deletions create gaps in the primary key id integer column. LHM has no
211
+ problems with this, but the chunked copy could be sped up by factoring this
212
+ in. Currently a copy range may be completely empty, but there will still be
213
+ an `INSERT INTO... SELECT`.
214
+
215
+ Records inserted after the last insert id is retrieved and before the triggers
216
+ are created are currently lost. The table should be briefly locked while id is
217
+ read and triggers are applied.
218
+
219
+ ## Contributing
220
+
221
+ We'll check out your contribution if you:
222
+
223
+ - Provide a comprehensive suite of tests for your fork.
224
+ - Have a clear and documented rationale for your changes.
225
+ - Package these up in a pull request.
226
+
227
+ We'll do our best to help you out with any contribution issues you may have.
228
+
229
+ ## License
230
+
231
+ The license is included as LICENSE in this directory.
232
+
233
+ ## Footnotes
234
+
235
+ [0]: http://www.facebook.com/note.php?note_id=430801045932 "Facebook"
236
+ [1]: https://github.com/freels/table_migrator "Twitter"
237
+ [2]: http://openarkkit.googlecode.com "OAK online alter table"
data/Rakefile ADDED
@@ -0,0 +1,5 @@
1
+ require 'bundler'
2
+ Bundler::GemHelper.install_tasks
3
+
4
+ require 'spec/rake/spectask'
5
+ Spec::Rake::SpecTask.new
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.1.2
@@ -0,0 +1,24 @@
1
+ # -*- encoding: utf-8 -*-
2
+
3
+ Gem::Specification.new do |s|
4
+ s.name = "large-hadron-migrator"
5
+ s.version = File.read("VERSION").to_s
6
+ s.platform = Gem::Platform::RUBY
7
+ s.authors = ["SoundCloud", "Rany Keddo", "Tobias Bielohlawek"]
8
+ s.email = %q{rany@soundcloud.com, tobi@soundcloud.com}
9
+ s.summary = %q{online schema changer for mysql}
10
+ s.description = %q{Migrate large tables without downtime by copying to a temporary table in chunks. The old table is not dropped. Instead, it is moved to timestamp_table_name for verification.}
11
+ s.homepage = %q{http://github.com/soundcloud/large-hadron-migrator}
12
+ s.files = `git ls-files`.split("\n")
13
+ s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
14
+ s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
15
+ s.require_paths = ["lib"]
16
+
17
+ ['activerecord ~>2.3.8', 'activesupport ~>2.3.8', 'mysql =2.8.1'].each do |gem|
18
+ s.add_dependency *gem.split(' ')
19
+ end
20
+
21
+ ['rspec =1.3.1'].each do |gem|
22
+ s.add_development_dependency *gem.split(' ')
23
+ end
24
+ end
@@ -0,0 +1,379 @@
1
+ require 'benchmark'
2
+
3
+ #
4
+ # Copyright (c) 2011, SoundCloud Ltd., Rany Keddo, Tobias Bielohlawek
5
+ #
6
+ # Migrate large tables without downtime by copying to a temporary table in
7
+ # chunks. The old table is not dropped. Instead, it is moved to
8
+ # timestamp_table_name for verification.
9
+ #
10
+ # WARNING:
11
+ # - this is an unlocked online operation. updates will probably become
12
+ # inconsistent during migration.
13
+ # - may cause the universe to implode.
14
+ #
15
+ # USAGE:
16
+ #
17
+ # class AddIndexToEmails < LargeHadronMigration
18
+ # def self.up
19
+ # large_hadron_migrate :emails, :wait => 0.2 do |table_name|
20
+ # execute %Q{
21
+ # alter table %s
22
+ # add index index_emails_on_hashed_address (hashed_address)
23
+ # } % table_name
24
+ # end
25
+ # end
26
+ # end
27
+ #
28
+ # How to deploy large hadrons with capistrano
29
+ # -------------------------------------------
30
+ #
31
+ # 1. Run cap deploy:update_code. The new release directory is not symlinked,
32
+ # so that restarts will not load the new code.
33
+ #
34
+ # 2. Run rake db:migrate from the new release directory on an appserver,
35
+ # preferably in a screen session.
36
+ #
37
+ # 3. Wait for migrations to sync to all slaves then cap deploy.
38
+ #
39
+ # Restarting before step 2 is done
40
+ # --------------------------------
41
+ #
42
+ # - When adding a column
43
+ #
44
+ # 1. Migration running on master; no effect, nothing has changed.
45
+ # 2. Tables switched on master. slave out of sync, migration running.
46
+ # a. Given the app server reads columns from slave on restart, nothing
47
+ # happens.
48
+ # b. Given the app server reads columns from master on restart, bad
49
+ # things happen, ie: queries are built with the new columns, ie on :include
50
+ # an explicit column list (rather than *) will be built for the
51
+ # included table. Since the new column does not exist on the slave, queries will
52
+ # break here.
53
+ #
54
+ # 3. Tables switched on slave
55
+ # - Same as 2b. Just do a cap deploy instead of cap deploy:restart.
56
+ #
57
+ # - When removing a column
58
+ #
59
+ # 1. Migration running on master; no effect, nothing has changed.
60
+ # 2. Tables switched on master. slave out of sync, migration running.
61
+ # a. Given the app server reads columns from slave on restart
62
+ # - Writes against master will fail due to the additional column.
63
+ # - Reads will succeed against slaves, but not master.
64
+ #
65
+ # b. Given the app server reads columns from master on restart:
66
+ # - Writes against master might succeed. Old code referencing
67
+ # removed columns will fail.
68
+ # - Reads might or might not succeed, for the same reason.
69
+ #
70
+ # tl;dr: Never restart during migrations that remove columns with large
71
+ # hadron. You can restart while adding columns as long as active record
72
+ # reads column definitions from the slave.
73
+ #
74
+ # Pushing out hotfixes while migrating in step 2
75
+ # ----------------------------------------------
76
+ #
77
+ # - Check out the currently running (old) code ref.
78
+ # - Branch from this, make your changes, push it up
79
+ # - Deploy this version.
80
+ #
81
+ # Deploying the new version will hurt your head. Don't do it.
82
+ #
83
+ class LargeHadronMigration < ActiveRecord::Migration
84
+
85
+ # id_window must be larger than the number of inserts
86
+ # added to the journal table. if this is not the case,
87
+ # inserts will be lost in the replay phase.
88
+ def self.large_hadron_migrate(curr_table, *args, &block)
89
+ opts = args.extract_options!.reverse_merge :wait => 0.5,
90
+ :chunk_size => 35_000,
91
+ :id_window => 11_000
92
+
93
+ curr_table = curr_table.to_s
94
+ chunk_size = opts[:chunk_size].to_i
95
+
96
+ # we are in dev/test mode - so speed it up
97
+ chunk_size = 10_000_000.to_i if Rails.env.development? or Rails.env.test?
98
+ wait = opts[:wait].to_f
99
+ id_window = opts[:id_window]
100
+
101
+ raise "chunk_size must be >= 1" unless chunk_size >= 1
102
+
103
+ new_table = "new_#{curr_table}"
104
+ old_table = "%s_#{curr_table}" % Time.now.strftime("%Y_%m_%d_%H_%M_%S_%3N")
105
+ journal_table = "#{old_table}_changes"
106
+
107
+ last_insert_id = last_insert_id(curr_table)
108
+ say "last inserted id in #{curr_table}: #{last_insert_id}"
109
+
110
+ begin
111
+ # clean tables. old tables are never deleted to guard against rollbacks.
112
+ execute %Q/drop table if exists %s/ % new_table
113
+
114
+ clone_table(curr_table, new_table, id_window)
115
+ clone_table_for_changes(curr_table, journal_table)
116
+
117
+ # add triggers
118
+ add_trigger_on_action(curr_table, journal_table, "insert")
119
+ add_trigger_on_action(curr_table, journal_table, "update")
120
+ add_trigger_on_action(curr_table, journal_table, "delete")
121
+
122
+ # alter new table
123
+ default_values = {}
124
+ yield new_table, default_values
125
+
126
+ insertion_columns = prepare_insertion_columns(new_table, curr_table, default_values)
127
+ raise "insertion_columns empty" if insertion_columns.empty?
128
+
129
+ chunked_insert \
130
+ last_insert_id,
131
+ chunk_size,
132
+ new_table,
133
+ insertion_columns,
134
+ curr_table,
135
+ wait
136
+
137
+ rename_tables curr_table => old_table, new_table => curr_table
138
+ cleanup(curr_table)
139
+
140
+ # replay changes from the changes journal
141
+ replay_insert_changes(curr_table, journal_table, chunk_size, wait)
142
+ replay_update_changes(curr_table, journal_table, chunk_size, wait)
143
+ replay_delete_changes(curr_table, journal_table)
144
+
145
+ old_table
146
+ ensure
147
+ cleanup(curr_table)
148
+ end
149
+ end
150
+
151
+ def self.prepare_insertion_columns(new_table, table, default_values = {})
152
+ {}.tap do |columns|
153
+ (common_columns(new_table, table) | default_values.keys).each do |column|
154
+ columns[tick(column)] = default_values[column] || tick(column)
155
+ end
156
+ end
157
+ end
158
+
159
+ def self.chunked_insert(last_insert_id, chunk_size, new_table, insertion_columns, curr_table, wait, where = "")
160
+ # do the inserts in chunks. helps to reduce io contention and keeps the
161
+ # undo log small.
162
+ chunks = (last_insert_id / chunk_size.to_f).ceil
163
+ times = []
164
+ (1..chunks).each do |chunk|
165
+
166
+ times << Benchmark.measure do
167
+ execute "start transaction"
168
+
169
+ execute %Q{
170
+ insert into %s
171
+ (%s)
172
+ select %s
173
+ from %s
174
+ where (id between %d and %d) %s
175
+
176
+ } % [
177
+ new_table,
178
+ insertion_columns.keys.join(","),
179
+ insertion_columns.values.join(","),
180
+ curr_table,
181
+ ((chunk - 1) * chunk_size) + 1,
182
+ [chunk * chunk_size, last_insert_id].min,
183
+ where
184
+ ]
185
+ execute "COMMIT"
186
+ end
187
+
188
+ say_remaining_estimate(times, chunks, chunk, wait)
189
+
190
+ # larger values trade greater inconsistency for less io
191
+ sleep wait
192
+ end
193
+ end
194
+
195
+ def self.chunked_update(last_insert_id, chunk_size, new_table, insertion_columns, curr_table, wait, where = "")
196
+ # do the inserts in chunks. helps to reduce io contention and keeps the
197
+ # undo log small.
198
+ chunks = (last_insert_id / chunk_size.to_f).ceil
199
+ times = []
200
+ (1..chunks).each do |chunk|
201
+
202
+ times << Benchmark.measure do
203
+ execute "start transaction"
204
+
205
+ execute %Q{
206
+ update %s as t1
207
+ join %s as t2 on t1.id = t2.id
208
+ set %s
209
+ where (t2.id between %d and %d) %s
210
+ } % [
211
+ new_table,
212
+ curr_table,
213
+ insertion_columns.keys.map { |keys| "t1.#{keys} = t2.#{keys}"}.join(","),
214
+ ((chunk - 1) * chunk_size) + 1,
215
+ [chunk * chunk_size, last_insert_id].min,
216
+ where
217
+ ]
218
+ execute "COMMIT"
219
+ end
220
+
221
+ say_remaining_estimate(times, chunks, chunk, wait)
222
+
223
+ # larger values trade greater inconsistency for less io
224
+ sleep wait
225
+ end
226
+ end
227
+
228
+ def self.last_insert_id(curr_table)
229
+ with_master do
230
+ connection.select_value("select max(id) from %s" % curr_table).to_i
231
+ end
232
+ end
233
+
234
+ def self.table_column_names(table_name)
235
+ with_master do
236
+ connection.select_values %Q{
237
+ select column_name
238
+ from information_schema.columns
239
+ where table_name = "%s"
240
+ and table_schema = "%s"
241
+
242
+ } % [table_name, connection.current_database]
243
+ end
244
+ end
245
+
246
+ def self.with_master
247
+ if ActiveRecord::Base.respond_to? :with_master
248
+ ActiveRecord::Base.with_master do
249
+ yield
250
+ end
251
+ else
252
+ yield
253
+ end
254
+ end
255
+
256
+ def self.clone_table(source, dest, window = 0)
257
+ execute schema_sql(source, dest, window)
258
+ end
259
+
260
+ def self.common_columns(t1, t2)
261
+ table_column_names(t1) & table_column_names(t2)
262
+ end
263
+
264
+ def self.clone_table_for_changes(table, journal_table)
265
+ clone_table(table, journal_table)
266
+ execute %Q{
267
+ alter table %s
268
+ add column hadron_action varchar(15);
269
+ } % journal_table
270
+ end
271
+
272
+ def self.rename_tables(tables = {})
273
+ execute "rename table %s" % tables.map{ |old_table, new_table| "#{old_table} to #{new_table}" }.join(', ')
274
+ end
275
+
276
+ def self.add_trigger_on_action(table, journal_table, action)
277
+ columns = table_column_names(table)
278
+ table_alias = (action == 'delete') ? 'OLD' : 'NEW'
279
+ fallback = (action == 'delete') ? "`hadron_action` = 'delete'" : columns.map { |c| "#{tick(c)} = #{table_alias}.#{tick(c)}" }.join(",")
280
+
281
+ execute %Q{
282
+ create trigger %s
283
+ after #{action} on %s for each row
284
+ begin
285
+ insert into %s (%s, `hadron_action`)
286
+ values (%s, '#{ action }')
287
+ ON DUPLICATE KEY UPDATE %s;
288
+ end
289
+ } % [trigger_name(action, table),
290
+ table,
291
+ journal_table,
292
+ columns.map { |c| tick(c) }.join(","),
293
+ columns.map { |c| "#{table_alias}.#{tick(c)}" }.join(","),
294
+ fallback
295
+ ]
296
+ end
297
+
298
+ def self.delete_trigger_on_action(table, action)
299
+ execute "drop trigger if exists %s" % trigger_name(action, table)
300
+ end
301
+
302
+ def self.trigger_name(action, table)
303
+ tick("after_#{action}_#{table}")
304
+ end
305
+
306
+ def self.cleanup(table)
307
+ delete_trigger_on_action(table, "insert")
308
+ delete_trigger_on_action(table, "update")
309
+ delete_trigger_on_action(table, "delete")
310
+ end
311
+
312
+ def self.say_remaining_estimate(times, chunks, chunk, wait)
313
+ avg = times.inject(0) { |s, t| s += t.real } / times.size.to_f
314
+ remaining = chunks - chunk
315
+ say "%d more chunks to go, estimated end: %s" % [
316
+ remaining,
317
+ Time.now + (remaining * (avg + wait))
318
+ ]
319
+ end
320
+
321
+ def self.replay_insert_changes(table, journal_table, chunk_size = 10000, wait = 0.2)
322
+ last_insert_id = last_insert_id(journal_table)
323
+ columns = prepare_insertion_columns(table, journal_table)
324
+
325
+ chunked_insert \
326
+ last_insert_id,
327
+ chunk_size,
328
+ table,
329
+ columns,
330
+ journal_table,
331
+ wait,
332
+ "AND hadron_action = 'insert'"
333
+ end
334
+
335
+ def self.replay_delete_changes(table, journal_table)
336
+ execute %Q{
337
+ delete from #{table} where id in (
338
+ select id from #{journal_table} where hadron_action = 'delete'
339
+ )
340
+ }
341
+ end
342
+
343
+ def self.replay_update_changes(table, journal_table, chunk_size = 10000, wait = 0.2)
344
+ last_insert_id = last_insert_id(journal_table)
345
+ columns = prepare_insertion_columns(table, journal_table)
346
+
347
+ chunked_update \
348
+ last_insert_id,
349
+ chunk_size,
350
+ table,
351
+ columns,
352
+ journal_table,
353
+ wait,
354
+ "AND hadron_action = 'update'"
355
+ end
356
+
357
+ #
358
+ # use show create instead of create table like. there was some weird
359
+ # behavior with the latter where the auto_increment of the source table
360
+ # got modified when updating the destination.
361
+ #
362
+ def self.schema_sql(source, dest, window)
363
+ show_create(source).tap do |schema|
364
+ schema.gsub!(/auto_increment=(\d+)/i) do
365
+ "auto_increment=#{ $1.to_i + window }"
366
+ end
367
+
368
+ schema.gsub!('CREATE TABLE `%s`' % source, 'CREATE TABLE `%s`' % dest)
369
+ end
370
+ end
371
+
372
+ def self.show_create(t1)
373
+ (execute "show create table %s" % t1).fetch_row.last
374
+ end
375
+
376
+ def self.tick(col)
377
+ "`#{ col }`"
378
+ end
379
+ end
@@ -0,0 +1,370 @@
1
+ #
2
+ # Copyright (c) 2011, SoundCloud Ltd., Rany Keddo, Tobias Bielohlawek
3
+ #
4
+
5
+ require File.expand_path(File.dirname(__FILE__) + '/spec_helper')
6
+
7
+ require "migrate/add_new_column"
8
+
9
+ describe "LargeHadronMigration", "integration" do
10
+ include SpecHelper
11
+
12
+ before(:each) { recreate }
13
+
14
+ it "should add new column" do
15
+
16
+ table("addscolumn") do |t|
17
+ t.string :title
18
+ t.integer :rating
19
+ t.timestamps
20
+ end
21
+
22
+ truthiness_column "addscolumn", "title", "varchar"
23
+ truthiness_column "addscolumn", "rating", "int"
24
+ truthiness_column "addscolumn", "created_at", "datetime"
25
+ truthiness_column "addscolumn", "updated_at", "datetime"
26
+
27
+ ghost = AddNewColumn.up
28
+
29
+ truthiness_column "addscolumn", "title", "varchar"
30
+ truthiness_column "addscolumn", "rating", "int"
31
+ truthiness_column "addscolumn", "spam", "tinyint"
32
+ truthiness_column "addscolumn", "created_at", "datetime"
33
+ truthiness_column "addscolumn", "updated_at", "datetime"
34
+ end
35
+
36
+ it "should have same row data" do
37
+ table "addscolumn" do |t|
38
+ t.string :text
39
+ t.integer :number
40
+ t.timestamps
41
+ end
42
+
43
+ 1200.times do |i|
44
+ random_string = (0...rand(25)).map{65.+(rand(25)).chr}.join
45
+ sql "INSERT INTO `addscolumn` SET
46
+ `id` = #{i+1},
47
+ `text` = '#{random_string}',
48
+ `number` = '#{rand(255)}',
49
+ `updated_at` = NOW(),
50
+ `created_at` = NOW()"
51
+ end
52
+
53
+ ghost = AddNewColumn.up
54
+
55
+ truthiness_rows "addscolumn", ghost
56
+ end
57
+ end
58
+
59
+
60
+ describe "LargeHadronMigration", "rename" do
61
+ include SpecHelper
62
+
63
+ before(:each) do
64
+ recreate
65
+ end
66
+
67
+ it "should rename multiple tables" do
68
+ table "renameme" do |t|
69
+ t.string :text
70
+ end
71
+
72
+ table "renamemetoo" do |t|
73
+ t.integer :number
74
+ end
75
+
76
+ LargeHadronMigration.rename_tables("renameme" => "renameme_new", "renamemetoo" => "renameme")
77
+
78
+ truthiness_column "renameme", "number", "int"
79
+ truthiness_column "renameme_new", "text", "varchar"
80
+ end
81
+
82
+ end
83
+
84
+ describe "LargeHadronMigration", "triggers" do
85
+ include SpecHelper
86
+
87
+ before(:each) do
88
+ recreate
89
+
90
+ table "triggerme" do |t|
91
+ t.string :text
92
+ t.integer :number
93
+ t.timestamps
94
+ end
95
+
96
+ LargeHadronMigration.clone_table_for_changes \
97
+ "triggerme",
98
+ "triggerme_changes"
99
+ end
100
+
101
+ it "should create a table for triggered changes" do
102
+ truthiness_column "triggerme_changes", "hadron_action", "varchar"
103
+ end
104
+
105
+ it "should trigger on insert" do
106
+ LargeHadronMigration.add_trigger_on_action \
107
+ "triggerme",
108
+ "triggerme_changes",
109
+ "insert"
110
+
111
+ # test
112
+ sql("insert into triggerme values (111, 'hallo', 5, NOW(), NOW())")
113
+ sql("select * from triggerme_changes where id = 111").tap do |res|
114
+ res.fetch_hash.tap do |row|
115
+ row['hadron_action'].should == 'insert'
116
+ row['text'].should == 'hallo'
117
+ end
118
+ end
119
+ end
120
+
121
+ it "should trigger on update" do
122
+
123
+ # setup
124
+ sql "insert into triggerme values (111, 'hallo', 5, NOW(), NOW())"
125
+ LargeHadronMigration.add_trigger_on_action \
126
+ "triggerme",
127
+ "triggerme_changes",
128
+ "update"
129
+
130
+ # test
131
+ sql("update triggerme set text = 'goodbye' where id = '111'")
132
+ sql("select * from triggerme_changes where id = 111").tap do |res|
133
+ res.fetch_hash.tap do |row|
134
+ row['hadron_action'].should == 'update'
135
+ row['text'].should == 'goodbye'
136
+ end
137
+ end
138
+ end
139
+
140
+ it "should trigger on delete" do
141
+
142
+ # setup
143
+ sql "insert into triggerme values (111, 'hallo', 5, NOW(), NOW())"
144
+ LargeHadronMigration.add_trigger_on_action \
145
+ "triggerme",
146
+ "triggerme_changes",
147
+ "delete"
148
+
149
+ # test
150
+ sql("delete from triggerme where id = '111'")
151
+ sql("select * from triggerme_changes where id = 111").tap do |res|
152
+ res.fetch_hash.tap do |row|
153
+ row['hadron_action'].should == 'delete'
154
+ row['text'].should == 'hallo'
155
+ end
156
+ end
157
+ end
158
+
159
+ it "should trigger on create and update" do
160
+ LargeHadronMigration.add_trigger_on_action \
161
+ "triggerme",
162
+ "triggerme_changes",
163
+ "insert"
164
+
165
+ LargeHadronMigration.add_trigger_on_action \
166
+ "triggerme",
167
+ "triggerme_changes",
168
+ "update"
169
+
170
+ # test
171
+ sql "insert into triggerme values (111, 'hallo', 5, NOW(), NOW())"
172
+ sql("update triggerme set text = 'goodbye' where id = '111'")
173
+
174
+ sql("select count(*) AS cnt from triggerme_changes where id = 111").tap do |res|
175
+ res.fetch_hash.tap do |row|
176
+ row['cnt'].should == '1'
177
+ end
178
+ end
179
+ end
180
+
181
+ it "should trigger on multiple update" do
182
+ sql "insert into triggerme values (111, 'hallo', 5, NOW(), NOW())"
183
+ LargeHadronMigration.add_trigger_on_action \
184
+ "triggerme",
185
+ "triggerme_changes",
186
+ "update"
187
+
188
+ # test
189
+ sql("update triggerme set text = 'goodbye' where id = '111'")
190
+ sql("update triggerme set text = 'hallo again' where id = '111'")
191
+
192
+ sql("select count(*) AS cnt from triggerme_changes where id = 111").tap do |res|
193
+ res.fetch_hash.tap do |row|
194
+ row['cnt'].should == '1'
195
+ end
196
+ end
197
+ end
198
+
199
+ it "should trigger on inser, update and delete" do
200
+ LargeHadronMigration.add_trigger_on_action \
201
+ "triggerme",
202
+ "triggerme_changes",
203
+ "insert"
204
+
205
+ LargeHadronMigration.add_trigger_on_action \
206
+ "triggerme",
207
+ "triggerme_changes",
208
+ "update"
209
+
210
+ LargeHadronMigration.add_trigger_on_action \
211
+ "triggerme",
212
+ "triggerme_changes",
213
+ "delete"
214
+
215
+ # test
216
+ sql "insert into triggerme values (111, 'hallo', 5, NOW(), NOW())"
217
+ sql("update triggerme set text = 'goodbye' where id = '111'")
218
+ sql("delete from triggerme where id = '111'")
219
+
220
+ sql("select count(*) AS cnt from triggerme_changes where id = 111").tap do |res|
221
+ res.fetch_hash.tap do |row|
222
+ row['cnt'].should == '1'
223
+ end
224
+ end
225
+ end
226
+
227
+ it "should cleanup triggers" do
228
+ %w(insert update delete).each do |action|
229
+ LargeHadronMigration.add_trigger_on_action \
230
+ "triggerme",
231
+ "triggerme_changes",
232
+ action
233
+ end
234
+
235
+ LargeHadronMigration.cleanup "triggerme"
236
+
237
+ # test
238
+ sql("insert into triggerme values (111, 'hallo', 5, NOW(), NOW())")
239
+ sql("update triggerme set text = 'goodbye' where id = '111'")
240
+ sql("delete from triggerme where id = '111'")
241
+
242
+ sql("select count(*) AS cnt from triggerme_changes where id = 111").tap do |res|
243
+ res.fetch_hash.tap do |row|
244
+ row['cnt'].should == '0'
245
+ end
246
+ end
247
+ end
248
+
249
+ end
250
+
251
+ describe "LargeHadronMigration", "replaying changes" do
252
+ include SpecHelper
253
+
254
+ before(:each) do
255
+ recreate
256
+
257
+ table "source" do |t|
258
+ t.string :text
259
+ t.integer :number
260
+ t.timestamps
261
+ end
262
+
263
+ table "source_changes" do |t|
264
+ t.string :text
265
+ t.integer :number
266
+ t.string :hadron_action
267
+ t.timestamps
268
+ end
269
+ end
270
+
271
+ it "should replay inserts" do
272
+ sql %Q{
273
+ insert into source (id, text, number, created_at, updated_at)
274
+ values (1, 'hallo', 5, NOW(), NOW())
275
+ }
276
+
277
+ sql %Q{
278
+ insert into source_changes (id, text, number, created_at, updated_at, hadron_action)
279
+ values (2, 'goodbye', 5, NOW(), NOW(), 'insert')
280
+ }
281
+
282
+ sql %Q{
283
+ insert into source_changes (id, text, number, created_at, updated_at, hadron_action)
284
+ values (3, 'goodbye', 5, NOW(), NOW(), 'delete')
285
+ }
286
+
287
+ LargeHadronMigration.replay_insert_changes("source", "source_changes")
288
+
289
+ sql("select * from source where id = 2").tap do |res|
290
+ res.fetch_hash.tap do |row|
291
+ row['text'].should == 'goodbye'
292
+ end
293
+ end
294
+
295
+ sql("select count(*) as cnt from source where id = 3").tap do |res|
296
+ res.fetch_hash.tap do |row|
297
+ row['cnt'].should == '0'
298
+ end
299
+ end
300
+ end
301
+
302
+
303
+ it "should replay updates" do
304
+ sql %Q{
305
+ insert into source (id, text, number, created_at, updated_at)
306
+ values (1, 'hallo', 5, NOW(), NOW())
307
+ }
308
+
309
+ sql %Q{
310
+ insert into source_changes (id, text, number, created_at, updated_at, hadron_action)
311
+ values (1, 'goodbye', 5, NOW(), NOW(), 'update')
312
+ }
313
+
314
+ LargeHadronMigration.replay_update_changes("source", "source_changes")
315
+
316
+ sql("select * from source where id = 1").tap do |res|
317
+ res.fetch_hash.tap do |row|
318
+ row['text'].should == 'goodbye'
319
+ end
320
+ end
321
+ end
322
+
323
+ it "should replay deletes" do
324
+ sql %Q{
325
+ insert into source (id, text, number, created_at, updated_at)
326
+ values (1, 'hallo', 5, NOW(), NOW()),
327
+ (2, 'schmu', 5, NOW(), NOW())
328
+ }
329
+
330
+ sql %Q{
331
+ insert into source_changes (id, text, number, created_at, updated_at, hadron_action)
332
+ values (1, 'goodbye', 5, NOW(), NOW(), 'delete')
333
+ }
334
+
335
+ LargeHadronMigration.replay_delete_changes("source", "source_changes")
336
+
337
+ sql("select count(*) as cnt from source").tap do |res|
338
+ res.fetch_hash.tap do |row|
339
+ row['cnt'].should == '1'
340
+ end
341
+ end
342
+ end
343
+
344
+ end
345
+
346
+ describe "LargeHadronMigration", "units" do
347
+ include SpecHelper
348
+
349
+ it "should return correct schema" do
350
+ recreate
351
+ table "source" do |t|
352
+ t.string :text
353
+ t.integer :number
354
+ t.timestamps
355
+ end
356
+
357
+ sql %Q{
358
+ insert into source (id, text, number, created_at, updated_at)
359
+ values (1, 'hallo', 5, NOW(), NOW()),
360
+ (2, 'schmu', 5, NOW(), NOW())
361
+ }
362
+
363
+ schema = LargeHadronMigration.schema_sql("source", "source_changes", 1000)
364
+
365
+ schema.should_not include('`source`')
366
+ schema.should include('`source_changes`')
367
+ schema.should include('1003')
368
+ end
369
+ end
370
+
@@ -0,0 +1,13 @@
1
+ #
2
+ # Copyright (c) 2011, SoundCloud Ltd., Rany Keddo, Tobias Bielohlawek
3
+ #
4
+
5
+ class AddNewColumn < LargeHadronMigration
6
+ def self.up
7
+ large_hadron_migrate "addscolumn", :chunk_size => 100 do |table_name|
8
+ execute %Q{
9
+ alter table %s add column spam tinyint(1)
10
+ } % table_name
11
+ end
12
+ end
13
+ end
@@ -0,0 +1,114 @@
1
+ #
2
+ # Copyright (c) 2011, SoundCloud Ltd., Rany Keddo, Tobias Bielohlawek
3
+ #
4
+
5
+ $LOAD_PATH.unshift(File.dirname(__FILE__))
6
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
7
+
8
+ require 'active_record'
9
+ require 'large_hadron_migration'
10
+ require 'spec'
11
+ require 'spec/autorun'
12
+
13
+ ActiveRecord::Base.establish_connection(
14
+ :adapter => 'mysql',
15
+ :database => 'large_hadron_migration',
16
+ :username => 'root',
17
+ :password => '',
18
+ :host => 'localhost'
19
+ )
20
+
21
+ module SpecHelper
22
+ def connection
23
+ ActiveRecord::Base.connection
24
+ end
25
+
26
+ def sql(args)
27
+ connection.execute(args)
28
+ end
29
+
30
+ def recreate
31
+ sql "drop database large_hadron_migration"
32
+ sql "create database large_hadron_migration character set = 'UTF8'"
33
+
34
+ ActiveRecord::Base.connection.reconnect!
35
+ end
36
+
37
+ def flunk(msg)
38
+ raise Spec::Expectations::ExpectationNotMetError.new(msg)
39
+ end
40
+
41
+ def table(name)
42
+ ActiveRecord::Schema.define do
43
+ create_table(name) do |t|
44
+ yield t
45
+ end
46
+ end
47
+
48
+ name
49
+ end
50
+
51
+ #
52
+ # can't be arsed with rspec matchers
53
+ #
54
+
55
+ def truthiness_column(table, name, type)
56
+ results = connection.select_values %Q{
57
+ select column_name
58
+ from information_schema.columns
59
+ where table_name = "%s"
60
+ and table_schema = "%s"
61
+ and column_name = "%s"
62
+ and data_type = "%s"
63
+ } % [table, connection.current_database, name, type]
64
+
65
+ if results.empty?
66
+ flunk "truthiness column not defined as: %s:%s:%s" % [
67
+ table,
68
+ name,
69
+ type
70
+ ]
71
+ end
72
+ end
73
+
74
+ def truthiness_rows(table_name1, table_name2, offset = 0, limit = 1000)
75
+ res_1 = sql("SELECT * FROM #{table_name1} ORDER BY id ASC LIMIT #{limit} OFFSET #{offset}")
76
+ res_2 = sql("SELECT * FROM #{table_name2} ORDER BY id ASC LIMIT #{limit} OFFSET #{offset}")
77
+
78
+ limit.times do |i|
79
+ res_1_hash = res_1.fetch_hash
80
+ res_2_hash = res_2.fetch_hash
81
+
82
+ if res_1_hash.nil? || res_2_hash.nil?
83
+ flunk("truthiness rows failed: Expected #{limit} rows, but only #{i} found")
84
+ end
85
+
86
+ res_1_hash.keys.each do |key|
87
+ flunk("truthiness rows failed: #{key} is not same") unless res_1_hash[key] == res_2_hash[key]
88
+ end
89
+ end
90
+
91
+ end
92
+
93
+ end
94
+
95
+ # Mock Rails Environment
96
+ class Rails
97
+ class << self
98
+ def env
99
+ self
100
+ end
101
+
102
+ def development?
103
+ false
104
+ end
105
+
106
+ def production?
107
+ true
108
+ end
109
+
110
+ def test?
111
+ false
112
+ end
113
+ end
114
+ end
metadata ADDED
@@ -0,0 +1,139 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: large-hadron-migrator
3
+ version: !ruby/object:Gem::Version
4
+ prerelease: false
5
+ segments:
6
+ - 0
7
+ - 1
8
+ - 2
9
+ version: 0.1.2
10
+ platform: ruby
11
+ authors:
12
+ - SoundCloud
13
+ - Rany Keddo
14
+ - Tobias Bielohlawek
15
+ autorequire:
16
+ bindir: bin
17
+ cert_chain: []
18
+
19
+ date: 2011-05-04 00:00:00 +02:00
20
+ default_executable:
21
+ dependencies:
22
+ - !ruby/object:Gem::Dependency
23
+ name: activerecord
24
+ prerelease: false
25
+ requirement: &id001 !ruby/object:Gem::Requirement
26
+ none: false
27
+ requirements:
28
+ - - ~>
29
+ - !ruby/object:Gem::Version
30
+ segments:
31
+ - 2
32
+ - 3
33
+ - 8
34
+ version: 2.3.8
35
+ type: :runtime
36
+ version_requirements: *id001
37
+ - !ruby/object:Gem::Dependency
38
+ name: activesupport
39
+ prerelease: false
40
+ requirement: &id002 !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ~>
44
+ - !ruby/object:Gem::Version
45
+ segments:
46
+ - 2
47
+ - 3
48
+ - 8
49
+ version: 2.3.8
50
+ type: :runtime
51
+ version_requirements: *id002
52
+ - !ruby/object:Gem::Dependency
53
+ name: mysql
54
+ prerelease: false
55
+ requirement: &id003 !ruby/object:Gem::Requirement
56
+ none: false
57
+ requirements:
58
+ - - "="
59
+ - !ruby/object:Gem::Version
60
+ segments:
61
+ - 2
62
+ - 8
63
+ - 1
64
+ version: 2.8.1
65
+ type: :runtime
66
+ version_requirements: *id003
67
+ - !ruby/object:Gem::Dependency
68
+ name: rspec
69
+ prerelease: false
70
+ requirement: &id004 !ruby/object:Gem::Requirement
71
+ none: false
72
+ requirements:
73
+ - - "="
74
+ - !ruby/object:Gem::Version
75
+ segments:
76
+ - 1
77
+ - 3
78
+ - 1
79
+ version: 1.3.1
80
+ type: :development
81
+ version_requirements: *id004
82
+ description: Migrate large tables without downtime by copying to a temporary table in chunks. The old table is not dropped. Instead, it is moved to timestamp_table_name for verification.
83
+ email: rany@soundcloud.com, tobi@soundcloud.com
84
+ executables: []
85
+
86
+ extensions: []
87
+
88
+ extra_rdoc_files: []
89
+
90
+ files:
91
+ - .gitignore
92
+ - CHANGES.markdown
93
+ - Gemfile
94
+ - Gemfile.lock
95
+ - LICENSE
96
+ - README.markdown
97
+ - Rakefile
98
+ - VERSION
99
+ - large-hadron-migrator.gemspec
100
+ - lib/large_hadron_migration.rb
101
+ - spec/large_hadron_migration_spec.rb
102
+ - spec/migrate/add_new_column.rb
103
+ - spec/spec_helper.rb
104
+ has_rdoc: true
105
+ homepage: http://github.com/soundcloud/large-hadron-migrator
106
+ licenses: []
107
+
108
+ post_install_message:
109
+ rdoc_options: []
110
+
111
+ require_paths:
112
+ - lib
113
+ required_ruby_version: !ruby/object:Gem::Requirement
114
+ none: false
115
+ requirements:
116
+ - - ">="
117
+ - !ruby/object:Gem::Version
118
+ segments:
119
+ - 0
120
+ version: "0"
121
+ required_rubygems_version: !ruby/object:Gem::Requirement
122
+ none: false
123
+ requirements:
124
+ - - ">="
125
+ - !ruby/object:Gem::Version
126
+ segments:
127
+ - 0
128
+ version: "0"
129
+ requirements: []
130
+
131
+ rubyforge_project:
132
+ rubygems_version: 1.3.7
133
+ signing_key:
134
+ specification_version: 3
135
+ summary: online schema changer for mysql
136
+ test_files:
137
+ - spec/large_hadron_migration_spec.rb
138
+ - spec/migrate/add_new_column.rb
139
+ - spec/spec_helper.rb