schema_transformer 0.1.0

data/README.markdown ADDED
@@ -0,0 +1,83 @@
+ Schema Transformer
+ =======
+
+ Summary
+ -------
+ This gem provides a way to alter database schemas on large tables with little downtime. You run 2 commands to ultimately alter the database.
+
+ First, you generate the schema transform definitions and commands to be run later on production. You will check these files into the rails project.
+
+ Second, you run 2 commands on production.
+
+ The first command creates a 'temporary' table with the altered schema and incrementally copies the data over until it is close to synced. You can run this command as many times as you want - it won't hurt. This first command is slow because it takes a while to copy the data over, especially if you have really large tables that are several GBs in size.
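The incremental copy is a batched `INSERT INTO ... SELECT` keyed on id. A rough sketch of the loop (`copy_sql_batches` is an illustrative helper, not the gem's API; the real code resumes from `MAX(id)` in the temp table, selects an explicit column list with defaults for added columns, and supports a stagger delay):

```ruby
# Sketch of the batched copy behind `schema_transformer sync`: walk the
# source table in id order, copying batch_size rows per INSERT..SELECT.
def copy_sql_batches(table, temp_table, max_id, batch_size: 10_000)
  sqls = []
  lower = 1
  while lower <= max_id
    upper = [lower + batch_size - 1, max_id].min
    sqls << "INSERT INTO #{temp_table} " \
            "(SELECT * FROM #{table} WHERE id >= #{lower} AND id <= #{upper})"
    lower = upper + 1
  end
  sqls
end

puts copy_sql_batches("tags", "tags_st_temp", 25_000)
```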
+
+ The second command does a switcheroo with the 'temporary' new table and the current table. It then removes the obsoleted table with the old schema structure. Because it does a rename (which can break replication on a heavily trafficked site), this second command should be run with a maintenance page up. This second command is fast because it only does a final incremental sync before quickly switching the new table into place.
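The switch step boils down to a final catch-up sync followed by two MySQL `RENAME TABLE` statements and a `DROP TABLE`. A minimal sketch of the SQL involved (the `_st_temp`/`_st_trash` suffixes match the gem's naming; `switch_sql` is an illustrative helper, not part of the gem's API):

```ruby
# Sketch of the table switcheroo done by `schema_transformer switch`.
# Only builds the SQL strings; the gem runs them over its ActiveRecord
# connection after the final incremental sync.
def switch_sql(table)
  temp  = "#{table}_st_temp"   # new table carrying the altered schema
  trash = "#{table}_st_trash"  # old table, dropped once the swap succeeds
  [
    "RENAME TABLE #{table} TO #{trash}",  # old table out of the way
    "RENAME TABLE #{temp} TO #{table}",   # new table takes its place
    "DROP TABLE #{trash}",                # discard the obsolete schema
  ]
end

puts switch_sql("tags")
```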
+
+ Install
+ -------
+
+ <pre>
+ gem install --no-ri --no-rdoc schema_transformer # sudo if you need to
+ </pre>
+
+ Usage
+ -------
+
+ Generate the schema transform definitions:
+
+ <pre>
+ tung@walle $ schema_transformer generate
+ What is the name of the table you want to alter?
+ > tags
+ What is the modification to the table?
+ Example 1:
+   ADD COLUMN smart tinyint(1) DEFAULT '0'
+ Example 2:
+   ADD INDEX idx_name (name)
+ Example 3:
+   ADD COLUMN smart tinyint(1) DEFAULT '0', DROP COLUMN full_name
+ > ADD COLUMN special tinyint(1) DEFAULT '0'
+ *** Thanks ***
+ Schema transform definitions have been generated and saved to:
+   config/schema_transformations/tags.json
+ Next you need to run 2 commands to alter the database. As explained in the README, the first
+ can be run with the site still up. The second command should be done with a maintenance page up.
+
+ Here are the 2 commands you'll need to run later after checking in the tags.json file
+ into your version control system:
+   $ schema_transformer sync tags   # can be run over and over, it will just keep syncing the data
+   $ schema_transformer switch tags # should be done with a maintenance page up, switches the tables
+ *** Thank you ***
+ tung@walle $ schema_transformer sync tags
+ Creating temp table and syncing the data... (tail log/schema_transformer.log for status)
+ *** Thanks ***
+ There is now a tags_st_temp table with the new table schema and the data has been synced.
+ Please run the next command after you put a maintenance page up:
+   $ schema_transformer switch tags
+ tung@walle $ schema_transformer switch tags
+ *** Thanks ***
+ The final sync ran and the table tags has been updated with the new schema.
+ Get rid of that maintenance page and re-enable your site.
+ Thank you. Have a very nice day.
+ tung@walle $
+ </pre>
+
+ FAQ
+ -------
+
+ Q: What table alterations are supported?
+ A: I've only tested adding columns and removing columns.
+
+ Q: Can I add and drop multiple columns and indexes at the same time?
+ A: Yes.
+
+ Cautionary Notes
+ -------
+ For speed reasons, the final sync uses the updated_at timestamp if available and syncs only
+ the data updated within the last day. Data modified before that will not get synced in the
+ final sync. So having an updated_at timestamp on the original table, and keeping it current,
+ is very important.
+
+ For tables that do not have an updated_at timestamp, the size of the final update still needs
+ to be limited, so it is capped at the last 100,000 records. That is not much at all, so it is
+ very important to have that updated_at timestamp.
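The limit condition described above can be sketched in plain Ruby (mirroring `get_limit_cond` in `lib/schema_transformer/base.rb`; `limit_condition` and its arguments are illustrative, not the gem's API):

```ruby
# Sketch of how the final sync bounds the rows it re-copies. With an
# updated_at column, only rows touched since yesterday are considered;
# otherwise the gem walks the newest 100,000 ids to find a lower bound.
def limit_condition(table, column_names, fallback_bound)
  if column_names.include?("updated_at")
    cutoff = (Time.now - 24 * 60 * 60).strftime("%Y-%m-%d")
    "#{table}.updated_at >= '#{cutoff}'"
  else
    # fallback_bound stands in for the id of the 100,000th-newest row, which
    # the gem finds via: SELECT id FROM <table> ORDER BY id DESC LIMIT 100000
    "#{table}.id >= #{fallback_bound}"
  end
end

puts limit_condition("tags", %w[id name updated_at], 0)
puts limit_condition("tags", %w[id name], 500_000)
```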
data/Rakefile ADDED
@@ -0,0 +1,29 @@
+ require 'rubygems'
+ require 'rake'
+ require 'rake/gempackagetask'
+ require 'spec/rake/spectask'
+ require 'gemspec'
+
+ desc "Generate gemspec"
+ task :gemspec do
+   File.open("#{Dir.pwd}/#{GEM_NAME}.gemspec", 'w') do |f|
+     f.write(GEM_SPEC.to_ruby)
+   end
+ end
+
+ desc "Install gem"
+ task :install do
+   Rake::Task['gem'].invoke
+   $stdout.puts "Installing gem..."
+   `gem install pkg/#{GEM_NAME}*.gem`
+   `rm -Rf pkg`
+ end
+
+ desc "Package gem"
+ Rake::GemPackageTask.new(GEM_SPEC) do |pkg|
+   pkg.gem_spec = GEM_SPEC
+ end
+
+ desc "Clean up the test project"
+ task :cleanup do
+ end
data/TODO ADDED
@@ -0,0 +1,11 @@
+ 1. create new table with schema
+ 2. batch copy data
+ 3. maintenance page
+ 4. batch copy final data
+ 5. rename tables
+ 6. remove maintenance page
+
+ * TODO:
+   * add logging again: schema_transformer.log
+   * use updated_at if it's available, and use a real time vs some guess
+   * clean up spec: use real mocks, get rid of $testing_books
@@ -0,0 +1,4 @@
+ #!/usr/bin/env ruby
+
+ require File.join(File.dirname(__FILE__), '..', 'lib', 'schema_transformer')
+ SchemaTransformer::CLI.run(ARGV)
data/gemspec.rb ADDED
@@ -0,0 +1,20 @@
+ require 'lib/schema_transformer/version'
+
+ GEM_NAME = 'schema_transformer'
+ GEM_FILES = FileList['**/*'] - FileList['coverage', 'coverage/**/*', 'pkg', 'pkg/**/*']
+ GEM_SPEC = Gem::Specification.new do |s|
+   # == CONFIGURE ==
+   s.author = "Tung Nguyen"
+   s.email = "tongueroo@gmail.com"
+   s.homepage = "http://github.com/tongueroo/#{GEM_NAME}"
+   s.summary = "Alter database schemas on large tables with little downtime"
+   # == CONFIGURE ==
+   s.executables += [GEM_NAME]
+   s.extra_rdoc_files = ["README.markdown"]
+   s.files = GEM_FILES.to_a
+   s.has_rdoc = false
+   s.name = GEM_NAME
+   s.platform = Gem::Platform::RUBY
+   s.require_path = "lib"
+   s.version = SchemaTransformer::VERSION
+ end
@@ -0,0 +1,260 @@
+ module SchemaTransformer
+   class UsageError < RuntimeError; end
+
+   class Base
+     include Help
+     @@stagger = 0
+     def self.run(options)
+       @@stagger = options[:stagger] || 0
+       @transformer = SchemaTransformer::Base.new(options[:base] || Dir.pwd)
+       @transformer.run(options)
+     end
+
+     attr_reader :options, :temp_table, :table
+     def initialize(base = File.expand_path("..", __FILE__), options = {})
+       @base = base
+       @db, @log, @mail = ActiveWrapper.setup(
+         :base => @base,
+         :env => ENV['RAILS_ENV'] || 'development',
+         :log => "schema_transformer"
+       )
+       @db.establish_connection
+       @conn = ActiveRecord::Base.connection
+
+       @batch_size = options[:batch_size] || 10_000
+     end
+
+     def run(options)
+       @action = options[:action].first
+       case @action
+       when "generate"
+         self.generate
+         help(:generate)
+       when "sync"
+         help(:sync_progress)
+         table = options[:action][1]
+         self.gather_info(table)
+         self.create
+         self.sync
+         help(:sync)
+       when "switch"
+         table = options[:action][1]
+         self.gather_info(table)
+         self.switch
+         self.cleanup
+         help(:switch)
+       else
+         raise UsageError, "Invalid action #{@action}"
+       end
+     end
+
+     def generate
+       data = {}
+       ask "What is the name of the table you want to alter?"
+       data[:table] = gets(:table)
+       ask <<-TXT
+ What is the modification to the table?
+ Example 1:
+   ADD COLUMN smart tinyint(1) DEFAULT '0'
+ Example 2:
+   ADD INDEX idx_name (name)
+ Example 3:
+   ADD COLUMN smart tinyint(1) DEFAULT '0', DROP COLUMN full_name
+       TXT
+       data[:mod] = gets(:mod)
+       path = transform_file(data[:table])
+       FileUtils.mkdir_p(File.dirname(path)) unless File.exist?(File.dirname(path))
+       File.open(path, "w") { |f| f << data.to_json }
+       @table = data[:table]
+       data
+     end
+
+     def gather_info(table)
+       if table.nil?
+         raise UsageError, "You need to specify the table name: schema_transformer #{@action} <table_name>"
+       end
+       data = JSON.parse(IO.read(transform_file(table)))
+       @table = data["table"]
+       @mod = data["mod"]
+       # variables needed for the rest of the program
+       @temp_table = "#{@table}_st_temp"
+       @trash_table = "#{@table}_st_trash"
+       @model = define_model(@table)
+     end
+
+     def create
+       if self.temp_table_exists?
+         @temp_model = define_model(@temp_table)
+       else
+         sql_create = %{CREATE TABLE #{@temp_table} LIKE #{@table}}
+         sql_mod = %{ALTER TABLE #{@temp_table} #{@mod}}
+         @conn.execute(sql_create)
+         @conn.execute(sql_mod)
+         @temp_model = define_model(@temp_table)
+       end
+       reset_column_info
+     end
+
+     def sync
+       res = @conn.execute("SELECT max(id) AS max_id FROM `#{@temp_table}`")
+       start = res.fetch_row[0].to_i + 1 # nil case is okay: [nil][0].to_i => 0
+       find_in_batches(@table, :start => start, :batch_size => @batch_size) do |batch|
+         lower = batch.first
+         upper = batch.last
+
+         columns = insert_columns_sql
+         sql = %Q{
+           INSERT INTO #{@temp_table} (
+             SELECT #{columns}
+             FROM #{@table} WHERE id >= #{lower} AND id <= #{upper}
+           )
+         }
+         @conn.execute(sql)
+
+         if @@stagger > 0
+           log("Staggering: delaying for #{@@stagger} seconds before next batch insert")
+           sleep(@@stagger)
+         end
+       end
+     end
+
+     def final_sync
+       @temp_model = define_model(@temp_table)
+       reset_column_info
+
+       sync
+       columns = subset_columns.collect { |x| "#{@temp_table}.`#{x}` = #{@table}.`#{x}`" }.join(", ")
+       # need to limit the final sync; updating the entire table takes too long
+       limit_cond = get_limit_cond
+       sql = %{
+         UPDATE #{@temp_table} INNER JOIN #{@table}
+           ON #{@temp_table}.id = #{@table}.id
+         SET #{columns}
+         WHERE #{limit_cond}
+       }
+       @conn.execute(sql)
+     end
+
+     def switch
+       final_sync
+       to_trash = %Q{RENAME TABLE #{@table} TO #{@trash_table}}
+       from_temp = %Q{RENAME TABLE #{@temp_table} TO #{@table}}
+       @conn.execute(to_trash)
+       @conn.execute(from_temp)
+     end
+
+     def cleanup
+       sql = %Q{DROP TABLE #{@trash_table}}
+       @conn.execute(sql)
+     end
+
+     def get_limit_cond
+       if @model.column_names.include?("updated_at")
+         "#{@table}.updated_at >= '#{1.day.ago.strftime("%Y-%m-%d")}'"
+       else
+         sql = "SELECT id FROM #{@table} ORDER BY id DESC LIMIT 100000"
+         resp = @conn.execute(sql)
+         bound = 0
+         while row = resp.fetch_row do
+           bound = row[0].to_i
+         end
+         "#{@table}.id >= #{bound}"
+       end
+     end
+
+     # the parameter is only for testing
+     def gets(name = nil)
+       STDIN.gets.strip
+     end
+
+     # columns present in both the old and new schema
+     def subset_columns
+       removed = @model.column_names - @temp_model.column_names
+       @model.column_names - removed
+     end
+
+     def insert_columns_sql
+       # existing subset
+       subset = subset_columns
+
+       # added columns get their default value selected in their place
+       added_s = @temp_model.column_names - @model.column_names
+       added = @temp_model.columns.
+         select { |c| added_s.include?(c.name) }.
+         collect { |c| "#{extract_default(c)} AS `#{c.name}`" }
+
+       # combine both
+       columns = subset.collect { |x| "`#{x}`" } + added
+       columns.join(", ")
+     end
+
+     # returns Array of record ids
+     def find(table, cond)
+       sql = "SELECT id FROM #{table} WHERE #{cond}"
+       response = @conn.execute(sql)
+       results = []
+       while row = response.fetch_row do
+         results << row[0].to_i
+       end
+       results
+     end
+
+     # lower-memory version of ActiveRecord's find_in_batches
+     def find_in_batches(table, options = {})
+       raise "You can't specify an order, it's forced to be ordered by id" if options[:order]
+       raise "You can't specify a limit, it's forced to be the batch_size" if options[:limit]
+
+       start = options.delete(:start).to_i
+       batch_size = options.delete(:batch_size) || 1000
+       order_limit = "ORDER BY id LIMIT #{batch_size}"
+
+       records = find(table, "id >= #{start} #{order_limit}")
+       while records.any?
+         yield records
+
+         break if records.size < batch_size
+         records = find(table, "id > #{records.last} #{order_limit}")
+       end
+     end
+
+     def define_model(table)
+       Object.class_eval(<<-code)
+         class #{table.classify} < ActiveRecord::Base
+           set_table_name "#{table}"
+         end
+       code
+       table.classify.constantize # returns the constant
+     end
+
+     def transform_file(table)
+       @base + "/config/schema_transformations/#{table}.json"
+     end
+
+     def temp_table_exists?
+       @conn.table_exists?(@temp_table)
+     end
+
+     def reset_column_info
+       @model.reset_column_information
+       @temp_model.reset_column_information
+     end
+
+     def log(msg)
+       @log.info(msg)
+     end
+
+     private
+
+     def ask(msg)
+       puts msg
+       print "> "
+     end
+
+     def extract_default(col)
+       @conn.quote(col.default)
+     end
+   end
+ end
@@ -0,0 +1,99 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'rubygems'
4
+ require 'active_wrapper'
5
+
6
+ module SchemaTransformer
7
+ class CLI
8
+
9
+ def self.run(args)
10
+ cli = new(args)
11
+ cli.parse_options!
12
+ cli.run
13
+ end
14
+
15
+ # The array of (unparsed) command-line options
16
+ attr_reader :args
17
+ # The hash of (parsed) command-line options
18
+ attr_reader :options
19
+
20
+ def initialize(args)
21
+ @args = args.dup
22
+ end
23
+
24
+ # Return an OptionParser instance that defines the acceptable command
25
+ # line switches for cloud_info, and what their corresponding behaviors
26
+ # are.
27
+ def option_parser
28
+ # @logger = Logger.new
29
+ @option_parser ||= OptionParser.new do |opts|
30
+ opts.banner = "Usage: #{File.basename($0)} [options] [action]"
31
+
32
+ opts.on("-h", "--help", "Display this help message.") do
33
+ puts help_message
34
+ puts opts
35
+ exit
36
+ end
37
+
38
+ opts.on("-v", "--verbose",
39
+ "Verbose mode"
40
+ ) { |value| options[:verbose] = true }
41
+
42
+ opts.on("-s", "--stagger",
43
+ "Number of seconds to wait inbetween each bulk insert. Default 0"
44
+ ) { |value| options[:stagger] = value }
45
+
46
+ opts.on("-V", "--version",
47
+ "Display the schema_transformer version, and exit."
48
+ ) do
49
+ require File.expand_path("../version", __FILE__)
50
+ puts "Schema Transformer v#{SchemaTransformer::VERSION}"
51
+ exit
52
+ end
53
+
54
+ end
55
+ end
56
+
57
+ def parse_options!
58
+ @options = {:action => nil}
59
+
60
+ if args.empty?
61
+ warn "Please specifiy an action to execute."
62
+ warn help_message
63
+ warn option_parser
64
+ exit 1
65
+ end
66
+
67
+ option_parser.parse!(args)
68
+ extract_environment_variables!
69
+
70
+ options[:action] = args # ignore remaining
71
+ end
72
+
73
+ # Extracts name=value pairs from the remaining command-line arguments
74
+ # and assigns them as environment variables.
75
+ def extract_environment_variables! #:nodoc:
76
+ args.delete_if do |arg|
77
+ next unless arg.match(/^(\w+)=(.*)$/)
78
+ ENV[$1] = $2
79
+ end
80
+ end
81
+
82
+ def run
83
+ begin
84
+ SchemaTransformer::Base.run(options)
85
+ rescue UsageError => e
86
+ puts "Usage Error: #{e.message}"
87
+ puts help_message
88
+ puts option_parser
89
+ end
90
+ end
91
+
92
+ private
93
+ def help_message
94
+ "Available actions: generate, sync, switch"
95
+ end
96
+ end
97
+
98
+ end
99
+
@@ -0,0 +1,43 @@
1
+ module SchemaTransformer
2
+ module Help
3
+ def help(action)
4
+ case action
5
+ when :generate
6
+ out =<<-HELP
7
+ ss
8
+ *** Thanks ***
9
+ Schema transform definitions have been generated and saved to:
10
+ config/schema_transformations/#{self.table}.json
11
+ Next you need to run 2 commands to alter the database. As explained in the README, the first
12
+ can be ran with the site still up. The second command should be done with a maintenance page up.
13
+
14
+ Here are the 2 commands you'll need to run later after checking in the #{self.table}.json file
15
+ into your version control system:
16
+ $ schema_transformer sync #{self.table} # can be ran over and over, it will just keep syncing the data
17
+ $ schema_transformer switch #{self.table} # should be done with a maintenance page up, switches the tables
18
+ *** Thank you ***
19
+ HELP
20
+ when :sync_progress
21
+ out =<<-TEXT
22
+ Creating temp table and syncing the data... (tail log/schema_transformer.log for status)
23
+ TEXT
24
+ when :sync
25
+ out =<<-TEXT
26
+ *** Thanks ***
27
+ There is now a #{self.temp_table} table with the new table schema and the data has been synced.
28
+ Please run the next command after you put a maintenance page up:
29
+ $ schema_transformer switch #{self.table}
30
+ TEXT
31
+ when :switch
32
+ out =<<-TEXT
33
+ *** Thanks ***
34
+ The final sync ran and the table #{self.table} has been updated with the new schema.
35
+ Get rid of that maintenance page and re-enable your site.
36
+ Thank you. Have a very nice day.
37
+ TEXT
38
+ end
39
+ puts out
40
+ end
41
+
42
+ end
43
+ end
@@ -0,0 +1,3 @@
1
+ module SchemaTransformer
2
+ VERSION = "0.1.0"
3
+ end
@@ -0,0 +1,10 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'rubygems'
4
+ require 'active_wrapper'
5
+ require 'pp'
6
+ require 'fileutils'
7
+ require File.expand_path('../schema_transformer/version', __FILE__)
8
+ require File.expand_path('../schema_transformer/help', __FILE__)
9
+ require File.expand_path('../schema_transformer/base', __FILE__)
10
+ require File.expand_path('../schema_transformer/cli', __FILE__)
data/notes/copier.rb ADDED
@@ -0,0 +1,14 @@
+ #!/usr/bin/env ruby
+
+ res = conn.execute("SELECT max(`article_revisions_new`.id) AS max_id FROM `article_revisions_new`")
+ start = res.fetch_row[0].to_i # nil case is okay: [nil][0].to_i => 0
+ Article::Revisions.find_in_batches(:start => start, :batch_size => 10_000) do |batch|
+   lower = batch.first.id
+   upper = batch.last.id
+   execute(%{
+     INSERT INTO article_revisions_new (
+       SELECT id, title, body, article_id, number, note, editor_id, created_at, blurb, teaser, source, slide_id
+       FROM article_revisions WHERE id >= #{lower} AND id <= #{upper}
+     );
+   })
+ end
@@ -0,0 +1,45 @@
+ #!/usr/bin/env ruby
+
+ ArticleRevision.find_in_batches
+
+ Activity
+
+ id, title, body, article_id, number, note, editor_id, created_at, blurb, teaser, source, slide_id, NULL test_id
+
+ def find_in_batches(options = {})
+   raise "You can't specify an order, it's forced to be #{batch_order}" if options[:order]
+   raise "You can't specify a limit, it's forced to be the batch_size" if options[:limit]
+
+   start = options.delete(:start).to_i
+   batch_size = options.delete(:batch_size) || 1000
+
+   with_scope(:find => options.merge(:order => batch_order, :limit => batch_size)) do
+     records = find(:all, :conditions => ["#{table_name}.#{primary_key} >= ?", start])
+
+     while records.any?
+       yield records
+
+       break if records.size < batch_size
+       records = find(:all, :conditions => ["#{table_name}.#{primary_key} > ?", records.last.id])
+     end
+   end
+ end
+
+ res = conn.execute("SELECT max(`article_revisions_new`.id) AS max_id FROM `article_revisions_new`")
+ start = res.fetch_row[0].to_i # nil case is okay: [nil][0].to_i => 0
+ Article::Revisions.find_in_batches(:start => start, :batch_size => 10_000) do |batch|
+   lower = batch.first.id
+   upper = batch.last.id
+   execute(%{
+     INSERT INTO article_revisions_new (
+       SELECT id, title, body, article_id, number, note, editor_id, created_at, blurb, teaser, source, slide_id
+       FROM article_revisions WHERE id >= #{lower} AND id <= #{upper}
+     );
+   })
+ end
+
+ pager = Pager.new(:per_page => 10_000, :lower => 300, :upper => 30_000)
+ pager.each do |page|
+   puts page.start_index
+ end