mogbak 0.1.2

data/.gitignore ADDED
@@ -0,0 +1 @@
1
+ .idea/
data/Gemfile ADDED
@@ -0,0 +1,3 @@
1
+ source :rubygems
2
+ gemspec
3
+
data/Gemfile.lock ADDED
@@ -0,0 +1,20 @@
1
+ PATH
2
+ remote: .
3
+ specs:
4
+ mogbak (0.0.1)
5
+
6
+ GEM
7
+ remote: http://rubygems.org/
8
+ specs:
9
+ json (1.6.1)
10
+ rake (0.9.2.2)
11
+ rdoc (3.11)
12
+ json (~> 1.4)
13
+
14
+ PLATFORMS
15
+ ruby
16
+
17
+ DEPENDENCIES
18
+ mogbak!
19
+ rake
20
+ rdoc
data/LICENSE ADDED
@@ -0,0 +1,7 @@
1
+ Copyright (c) 2012 Firespring
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
4
+
5
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
6
+
7
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,57 @@
1
+ Mogbak makes it easy to back up your MogileFS domain to a single self-contained directory. It has the ability to
2
+ update that directory again and again to match your MogileFS domain. This makes it possible for you to use
3
+ LVM snapshots. Mogbak can also fork worker processes to back up or restore files in parallel.
4
+
5
+ ##Need a backup?
6
+ mogbak create --db=mogilefs --dbhost=mysqlserver --dbpass=secret --dbuser=mogile --domain=awesomeapp \\
7
+ --trackerip=10.10.10.10 --workers=10 /backups/awesomeapp
8
+ mogbak backup /backups/awesomeapp
9
+
10
+ ###Perhaps you want an incremental, no problem.
11
+ mogbak backup --workers=10 /backups/awesomeapp
12
+
13
+ ###Maybe you need to see the files in the backup
14
+ mogbak list /backups/awesomeapp
15
+
16
+ ###Backups suck if you can't restore, well good thing we can
17
+ mogbak restore --domain=restoreawesomeapp --trackerip=10.10.10.10 --workers=10 /backups/awesomeapp
18
+
19
+ ###Maybe you just want to restore one file?
20
+ mogbak restore --domain=restoreawesomeapp --trackerip=10.10.10.10 --single-file=abc1234file --workers=10 /backups/awesomeapp
21
+
22
+
23
+ ###Why does Mogbak need to connect to my database?
24
+ MogileFS simply bumps its FID value in the files table when a new file is saved. This makes it quite simple
25
+ for us to query and see what files need to be backed up since our last backup. The problem is that we also need
26
+ to know what files have been deleted from MogileFS but still live within your backup. Since MogileFS has no delete
27
+ log for us to look at we need to query the database in a brute-force manner. This would be extremely painful without
28
+ access to the database. We do this as efficiently as we can: our cluster has about 3 million files and the check takes less than a second.
29
+ You can disable this feature with the --no-delete switch.
30
+
31
+ The good news is that mogbak only needs *SELECT access*.
32
+
33
+ ###What does the self-contained backup directory look like?
34
+
35
+ * db.sqlite - holds the metadata of each file in the backup
36
+ * settings.yml - holds the settings to connect to the mysql database and the tracker
37
+ * Backup files hashed using the same scheme as MogileFS Server
38
+
39
+ ###What's the catch?
40
+
41
+ * Space. Obviously with large clusters the ability to save a full backup onto one device probably isn't possible
42
+ * Database. Right now only MySQL backed trackers are supported
43
+
44
+ There are certainly things that could be done about the above issues. Pull requests are welcome :)
45
+
46
+ ####Requirements
47
+
48
+ * Ruby 1.9 is what we test against. It'll probably work under 1.8 but you'll be a guinea pig.
49
+ * *nix
50
+ * mysql client development libraries (for mysql2 gem dependency)
51
+ * sqlite3 development libraries (for sqlite3 gem dependency)
52
+
53
+ ####How to install?
54
+ gem install mogbak
55
+
56
+ ####Syntax?
57
+ See https://github.com/firespring/mogbak/wiki/Command-syntax
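The README's explanation of why mogbak queries the database can be illustrated with plain arrays. This is a minimal sketch, not mogbak's code: `plan_backup` is a hypothetical helper, and the FID lists stand in for what the real tool reads from the MogileFS `file` table and from `db.sqlite`. New files are those with a FID above the highest one already backed up; deleted files are backed-up FIDs that no longer exist in the source.

```ruby
require 'set'

# Hypothetical helper (not part of mogbak): decide what an incremental run
# would fetch and what it would remove from the backup.
#
# source_fids    - FIDs currently present in the MogileFS domain
# backed_up_fids - FIDs already recorded in the backup
def plan_backup(source_fids, backed_up_fids)
  # FIDs only ever increase, so anything above the backup's max is new.
  max_fid = backed_up_fids.max || 0
  source = Set.new(source_fids)

  {
    :to_fetch  => source_fids.select { |fid| fid > max_fid }.sort,
    # There is no delete log, so every backed-up FID has to be checked.
    :to_delete => backed_up_fids.reject { |fid| source.include?(fid) }.sort
  }
end

plan = plan_backup([1, 3, 4, 5], [1, 2, 3])
puts "fetch:  #{plan[:to_fetch].inspect}"
puts "delete: #{plan[:to_delete].inspect}"
```

The asymmetry is the point: finding new files needs one comparison against the maximum FID, while finding deletions needs a full pass over the backup, which is why the README offers `--no-delete`.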
data/bin/mogbak ADDED
@@ -0,0 +1,193 @@
1
+ #!/usr/bin/env ruby
2
+ #Realpath didn't exist in Ruby 1.8 so we'll add it.
3
+ unless File.respond_to? :realpath
4
+ class File #:nodoc:
5
+ def self.realpath path
6
+ return realpath(File.readlink(path)) if symlink?(path)
7
+ path
8
+ end
9
+ end
10
+ end
11
+ $: << File.expand_path(File.dirname(File.realpath(__FILE__)) + '/../lib')
12
+
13
+ require 'monkey_patch'
14
+ require 'rubygems'
15
+ require 'gli'
16
+ require 'mogbak_version'
17
+ require 'validations'
18
+ require 'create'
19
+ require 'backup'
20
+ require 'restore'
21
+ require 'list'
22
+ require 'active_record'
23
+ require 'activerecord-import'
24
+ require "mogilefs"
25
+ require 'sqlite3'
26
+ require 'yaml'
27
+ require 'forkinator'
28
+ require 'path_helper'
29
+
30
+
31
+ #Set master process name
32
+ $0 = "mogbak [master]"
33
+
34
+ include GLI
35
+ program_desc 'Backup a mogilefs domain to the filesystem. mogbak needs SELECT access to your MogileFS tracker database, which must
36
+ currently be MySQL.'
37
+
38
+ desc 'enable debug mode'
39
+ switch ['debug']
40
+
41
+ version Mogbak::VERSION
42
+
43
+ desc 'Create a new backup profile for a MogileFS domain'
44
+ arg_name '[backup_path]'
45
+ command :create do |c|
46
+ c.desc 'tracker ip'
47
+ c.default_value '127.0.0.1'
48
+ c.flag :trackerip
49
+
50
+ c.desc 'tracker port'
51
+ c.default_value 7001
52
+ c.flag :trackerport
53
+
54
+ c.desc 'mogilefs domain'
55
+ c.default_value 'test'
56
+ c.flag :domain
57
+
58
+
59
+ c.desc 'mysql db server host'
60
+ c.default_value 'localhost'
61
+ c.flag :dbhost
62
+
63
+ c.desc 'mysql db server port'
64
+ c.default_value 3306
65
+ c.flag :dbport
66
+
67
+ c.desc 'database name'
68
+ c.default_value 'mogilefs'
69
+ c.flag :db
70
+
71
+ c.desc 'database user'
72
+ c.default_value 'mogile'
73
+ c.flag :dbuser
74
+
75
+ c.desc 'database password (REQUIRED)'
76
+ c.flag :dbpass
77
+
78
+
79
+ c.action do |global_options, options, args|
80
+
81
+ #required options and arguments
82
+ raise '--dbpass is required - see: mogbak help create' unless options[:dbpass]
83
+ raise '[backup_path] is required - see: mogbak help create' unless args[0]
84
+ $backup_path = args[0]
85
+
86
+ mog = Create.new(:tracker_ip => options[:trackerip],
87
+ :tracker_port => options[:trackerport],
88
+ :domain => options[:domain],
89
+ :db_host => options[:dbhost],
90
+ :db_port => options[:dbport],
91
+ :db => options[:db],
92
+ :db_user => options[:dbuser],
93
+ :db_pass => options[:dbpass],
94
+ :backup_path => args[0])
95
+ puts "Backup profile successfully created. To start a backup run: mogbak backup #{args[0]}"
96
+ end
97
+ end
98
+
99
+ desc 'Start a backup using an existing backup profile'
100
+ arg_name '[backup_path]'
101
+ command :backup do |c|
102
+ c.desc 'do not remove deleted files from the backup (faster)'
103
+ c.switch ['no-delete']
104
+
105
+ c.desc 'Number of worker processes'
106
+ c.default_value 1
107
+ c.flag :workers
108
+
109
+ c.action do |global_options, options, args|
110
+ raise '[backup_path] is required - see: mogbak help backup' unless args[0]
111
+ $backup_path = args[0]
112
+ mog = Backup.new(:backup_path => args[0], :workers => options[:workers].to_i)
113
+ mog.backup(:no_delete => options[:"no-delete"])
114
+ end
115
+
116
+ end
117
+
118
+ desc 'Restore a backup to a new MogileFS domain.'
119
+ long_desc <<EOS
120
+ This function restores all files in backup to the new domain. It does not keep track of what has been restored, so the
121
+ restore cannot be stopped and resumed. If it fails to restore a file it will output an error. It restores the files with
122
+ the same class that they were created with. It is expected that you will create those classes on the domain before restoring.
123
+ EOS
124
+ arg_name '[backup_path]'
125
+ command :restore do |c|
126
+
127
+ c.desc 'restore dest tracker ip'
128
+ c.default_value '127.0.0.1'
129
+ c.flag :trackerip
130
+
131
+ c.desc 'restore dest tracker port'
132
+ c.default_value 7001
133
+ c.flag :trackerport
134
+
135
+ c.desc 'restore dest mogilefs domain (REQUIRED)'
136
+ c.flag :domain
137
+
138
+ c.desc 'Number of worker processes'
139
+ c.default_value 1
140
+ c.flag :workers
141
+
142
+ c.desc 'restore a single file by dkey'
143
+ c.flag :"single-file"
144
+
145
+ c.action do |global_options, options, args|
146
+ raise 'domain parameter is required - see: mogbak help restore' unless options[:domain]
147
+ raise '[backup_path] is required - see: mogbak help restore' unless args[0]
148
+ $backup_path = args[0]
149
+
150
+ restore = Restore.new(:tracker_ip => options[:trackerip],
151
+ :tracker_port => options[:trackerport],
152
+ :domain => options[:domain],
153
+ :backup_path => args[0],
154
+ :workers => options[:workers])
155
+ restore.restore(options[:"single-file"])
156
+ end
157
+ end
158
+
159
+ desc 'List the files in a backup'
160
+ long_desc <<EOS
161
+ Output is: fid,dkey,length,classname
162
+ EOS
163
+ arg_name '[backup_path]'
164
+ command :list do |c|
165
+ c.action do |global_options, options, args|
166
+ raise '[backup_path] is required - see: mogbak help list' unless args[0]
167
+ $backup_path = args[0]
168
+
169
+ list = List.new(:backup_path => args[0])
170
+ list.list
171
+ end
172
+ end
173
+
174
+ #If the user passes in --debug we can detect it here.
175
+ pre do |global_options, command, options, args|
176
+ $debug = global_options[:debug]
177
+ true
178
+ end
179
+
180
+ #If debug is enabled we'll spit out the exception and the entire backtrace. Otherwise it will just output
181
+ #the exception message
182
+ on_error do |exception|
183
+ if $debug
184
+ puts exception
185
+ puts ''
186
+ puts 'Backtrace:'
187
+ puts exception.backtrace
188
+ next false
189
+ end
190
+ true
191
+ end
192
+
193
+ exit GLI.run(ARGV)
@@ -0,0 +1,17 @@
1
+ class CreateInitialSchema < ActiveRecord::Migration
2
+ def self.up
3
+ create_table :bak_files do |t|
4
+ t.integer :fid
5
+ t.string :domain, :limit => 255
6
+ t.string :dkey, :limit => 255
7
+ t.integer :length, :limit => 20
8
+ t.string :classname, :limit => 255
9
+ t.boolean :saved, :default => false
10
+ end
11
+ add_index :bak_files, :fid
12
+ end
13
+
14
+ def self.down
15
+ drop_table :bak_files
16
+ end
17
+ end
data/lib/backup.rb ADDED
@@ -0,0 +1,197 @@
1
+ #Used to backup a mogilefs domain using a backup profile.
2
+ class Backup
3
+ attr_accessor :db, :db_host, :db_port, :db_pass, :db_user, :domain, :tracker_host, :tracker_port, :workers
4
+ include Validations
5
+
6
+ #Run validations and prepare the object for a backup
7
+ #@param [Hash] o hash containing the settings for the backup
8
+ def initialize(o={})
9
+
10
+ #Load up the settings file
11
+ check_settings_file
12
+ settings = YAML::load(File.open("#{$backup_path}/settings.yml"))
13
+ @db = settings['db']
14
+ @db_host = settings['db_host']
15
+ @db_port = settings['db_port']
16
+ @db_pass = settings['db_pass']
17
+ @db_user = settings['db_user']
18
+ @domain = settings['domain']
19
+ @tracker_ip = settings['tracker_ip']
20
+ @tracker_port = settings['tracker_port']
21
+ @workers = o[:workers] if o[:workers]
22
+
23
+
24
+ #run validations and setup
25
+ raise unless check_backup_path
26
+ create_sqlite_db
27
+ connect_sqlite
28
+ migrate_sqlite
29
+ mogile_db_connect
30
+ mogile_tracker_connect
31
+ check_mogile_domain(domain)
32
+
33
+ require ('domain')
34
+ require('file')
35
+ require('bakfile')
36
+ require('fileclass')
37
+ end
38
+
39
+
40
+ #Create a backup of a file using a BakFile object
41
+ #@param [BakFile] file file that needs to be backed up
42
+ #@return [Bool] file save result
43
+ def bak_file(file)
44
+ saved = file.bak_it
45
+ if saved
46
+ puts "Backed up: FID #{file.fid}"
47
+ else
48
+ puts "Error - will try again on next run: FID #{file.fid}"
49
+ end
50
+
51
+ return saved
52
+ end
53
+
54
+ #Launch workers to backup an array of BakFiles
55
+ #@param [Array] files must be an array of BakFiles
56
+ def launch_backup_workers(files)
57
+
58
+ #This proc will process the results of the child proc
59
+ parent = Proc.new { |results|
60
+ fids = []
61
+
62
+ results.each do |result|
63
+ file = result[:file]
64
+ saved = result[:saved]
65
+ fids << file.fid if saved
66
+ end
67
+
68
+ #bulk update all the fids. much faster than doing it one at a time
69
+ BakFile.update_all({:saved => true}, {:fid => fids})
70
+
71
+ #release the connection from the connection pool
72
+ SqliteActiveRecord.clear_active_connections!
73
+ }
74
+
75
+ #This proc receives an array of BakFiles, processes them, and returns a result array to the parent proc
76
+ child = Proc.new { |files|
77
+ result = []
78
+ files.each do |file|
79
+ break if file.nil?
80
+ saved = bak_file(file)
81
+ result << {:saved => saved, :file => file}
82
+ end
83
+ result
84
+ }
85
+
86
+ #launch workers using the above procs and files
87
+ Forkinator.hybrid_fork(self.workers.to_i, files, parent, child)
88
+ end
89
+
90
+ #Launch workers to delete an array of files
91
+ #@param [Array] fids must be an array of fids whose files need to be deleted from the backup
92
+ def launch_delete_workers(fids)
93
+
94
+ #This proc receives an array of fids, deletes each file from the backup, and passes the fids back to the parent.
95
+ child = Proc.new { |fids|
96
+ result = []
97
+ fids.each do |fid|
98
+ break if fid.nil?
99
+ deleted = BakFile.delete_from_fs(fid)
100
+ if deleted
101
+ puts "Deleting from backup: FID #{fid}"
102
+ else
103
+ puts "Failed to delete from backup: FID #{fid}"
104
+ end
105
+
106
+ result << fid
107
+ end
108
+ result
109
+ }
110
+
111
+ #This proc will process the results of the child proc
112
+ parent = Proc.new { |results|
113
+ fids = []
114
+
115
+ results.each do |result|
116
+ fids << result
117
+ end
118
+
119
+ BakFile.delete_all({:fid => fids})
120
+
121
+ #release the connection from the connection pool
122
+ SqliteActiveRecord.clear_active_connections!
123
+ }
124
+
125
+ #launch workers using the above procs and files
126
+ Forkinator.hybrid_fork(self.workers.to_i, fids, parent, child)
127
+
128
+ end
129
+
130
+ #The real logic for backing the domain up. It is pretty careful about making sure that it doesn't report a file
131
+ #as backed up unless it actually was. Supports the ability to remove deleted files from the backup as well. We grab files
132
+ #from the mogilefs mysql server in groups of 500 * number of workers (default is 1 worker)
133
+ #@param [Hash] o if :no_delete then don't remove deleted files from the backup (intensive process)
134
+ def backup(o = {})
135
+
136
+ files = []
137
+ #first we retry files that we haven't been able to backup successfully, if any.
138
+ BakFile.find_each(:conditions => ['saved = ?', false]) do |bak_file|
139
+ files << bak_file
140
+ end
141
+
142
+ launch_backup_workers(files)
143
+
144
+ #now back up any new files. if they fail to be backed up we'll retry them the next time the backup
145
+ #command is run.
146
+ dmid = Domain.find_by_namespace(self.domain)
147
+ results = Fid.find_in_batches(:conditions => ['dmid = ? AND fid > ?', dmid, BakFile.max_fid], :batch_size => 500 * self.workers.to_i, :include => [:domain, :fileclass]) do |batch|
148
+
149
+ #Insert all the files into our bak db with :saved false so that we don't think we backed up something that crashed
150
+ files = []
151
+ batch.each do |file|
152
+ files << BakFile.new(:fid => file.fid,
153
+ :domain => file.domain.namespace,
154
+ :dkey => file.dkey,
155
+ :length => file.length,
156
+ :classname => file.classname,
157
+ :saved => false)
158
+ end
159
+
160
+ #There is no way to do a bulk insert in sqlite so this generates a lot of inserts. wrapping all of the inserts
161
+ #inside a single transaction makes it much much faster.
162
+ BakFile.transaction do
163
+ BakFile.import files, :validate => false
164
+ end
165
+
166
+ #Fire up the workers now that we have work for them to do
167
+ launch_backup_workers(files)
168
+
169
+ end
170
+
171
+ #Delete files from the backup that no longer exist in the mogilefs domain. Unfortunately there is no easy way to detect
172
+ #which files have been deleted from the MogileFS domain. Our only option is to brute force our way through. This is a bulk
173
+ #query that checks a thousand files in each query against the MogileFS database server. The query is kind of tricky because
174
+ #I wanted to do this with nothing but SELECT privileges which meant I couldn't create a temporary table (which would require,
175
+ #create temporary table and insert privileges). You might want to only run this operation every once in a while if you have a
176
+ #very large domain. In my testing, it is able to get through domains with millions of files in a matter of a second. So
177
+ #all in all it's not so bad
178
+ if !o[:no_delete]
179
+ files_to_delete = Array.new
180
+ BakFile.find_in_batches { |bak_files|
181
+
182
+ union = "SELECT #{bak_files.first.fid} as fid"
183
+ bak_files.shift
184
+ bak_files.each do |bakfile|
185
+ union = "#{union} UNION SELECT #{bakfile.fid}"
186
+ end
187
+ connection = ActiveRecord::Base.connection
188
+ files = connection.select_values("SELECT t1.fid FROM (#{union}) as t1 LEFT JOIN file on t1.fid = file.fid WHERE file.fid IS NULL")
189
+ files_to_delete += files
190
+ }
191
+
192
+ launch_delete_workers(files_to_delete)
193
+
194
+ end
195
+
196
+ end
197
+ end
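The deleted-file check in `backup` builds one derived table out of literal `SELECT`s so that only SELECT privileges are needed. The sketch below reproduces that query construction as a standalone function. `missing_fids_sql` is a hypothetical name, and the `Integer()` coercion is an added safeguard that the original code does not have; the table and column names (`file`, `fid`) are the ones used in the source.

```ruby
# Hypothetical helper (not part of mogbak): build the SQL that returns the
# fids from the given list that are absent from the MogileFS `file` table.
def missing_fids_sql(fids)
  raise ArgumentError, 'fids must not be empty' if fids.empty?

  # Coerce to integers so nothing but numbers is interpolated into the SQL.
  first, *rest = fids.map { |fid| Integer(fid) }

  # Only the first SELECT needs the column alias; UNION reuses it.
  selects = ["SELECT #{first} AS fid"] + rest.map { |fid| "SELECT #{fid}" }
  union = selects.join(' UNION ')

  # Rows with no match on the right side of the LEFT JOIN are the deleted files.
  "SELECT t1.fid FROM (#{union}) AS t1 " \
    "LEFT JOIN file ON t1.fid = file.fid WHERE file.fid IS NULL"
end

puts missing_fids_sql([101, 102, 103])
```

Running the resulting string through `connection.select_values`, as the source does, yields the fids that should be removed from the backup.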
data/lib/bakfile.rb ADDED
@@ -0,0 +1,75 @@
1
+ #This is kind of awkward but is the only good way to support multiple database connections using ActiveRecord
2
+ class SqliteActiveRecord < ActiveRecord::Base
3
+ establish_connection(:adapter => 'sqlite3', :database => "#{$backup_path}/db.sqlite")
4
+ self.abstract_class = true
5
+ end
6
+
7
+ #Represents files that are either backed up, about to be backed up, or failed to be backed up.
8
+ class BakFile < SqliteActiveRecord
9
+
10
+ #get the max fid that is backed up
11
+ def self.max_fid
12
+ last_backed_file = BakFile.order("fid").last
13
+ if last_backed_file
14
+ max_fid = last_backed_file.fid
15
+ else
16
+ max_fid = 0
17
+ end
18
+ max_fid
19
+ end
20
+
21
+ #Restore a file back to a MogileFS domain
22
+ #@return [Bool]
23
+ def restore
24
+ path = PathHelper.path(self.fid)
25
+ begin
26
+ $mg.store_file(self.dkey, self.classname, path)
27
+ rescue Exception => e
28
+ if $debug
29
+ raise e
30
+ end
31
+ end
32
+ end
33
+
34
+
35
+ #Get a file from MogileFS and save it to the destination path.
36
+ #@return [Bool]
37
+ def bak_it
38
+ begin
39
+ path = PathHelper.path(self.fid)
40
+ $mg.get_file_data(self.dkey, path)
41
+ rescue Exception => e
42
+ if $debug
43
+ raise e
44
+ end
45
+ return false
46
+ end
47
+ true
48
+ end
49
+
50
+
51
+ #Delete from filesystem using just a fid
52
+ #@return [Bool]
53
+ def self.delete_from_fs(delete_fid)
54
+ begin
55
+ File.delete(PathHelper.path(delete_fid))
56
+ rescue Exception => e
57
+ if $debug
58
+ raise e
59
+ end
60
+ end
61
+ end
62
+
63
+
64
+ #Delete file from filesystem if someone deletes this object through ActiveRecord somehow.
65
+ before_destroy do
66
+ path = PathHelper.path(self.fid)
67
+ begin
68
+ File.delete(path)
69
+ rescue Exception => e
70
+ if $debug
71
+ raise e
72
+ end
73
+ end
74
+ end
75
+ end
data/lib/create.rb ADDED
@@ -0,0 +1,56 @@
1
+ #Creates a backup profile to be used by Backup
2
+ class Create
3
+ attr_accessor :db, :db_host, :db_port, :db_pass, :db_user, :domain, :tracker_host, :tracker_port, :workers
4
+ include Validations
5
+
6
+ #Run validations and create the backup profile
7
+ #@param [Hash] o hash containing the settings for the backup profile
8
+ def initialize(o={})
9
+ @db = o[:db] if o[:db]
10
+ @db_host = o[:db_host] if o[:db_host]
11
+ @db_port = o[:db_port] if o[:db_port]
12
+ @db_pass = o[:db_pass] if o[:db_pass]
13
+ @db_user = o[:db_user] if o[:db_user]
14
+ @domain = o[:domain] if o[:domain]
15
+ @tracker_ip = o[:tracker_ip] if o[:tracker_ip]
16
+ @tracker_port = o[:tracker_port] if o[:tracker_port]
17
+
18
+ #If settings.yml exists then this is an existing backup and you cannot run a create on top of it
19
+ raise "Cannot run create on an existing backup. Try: mogbak backup #{$backup_path} to back up. If you want
20
+ to change settings on this backup profile you will have to edit #{$backup_path}/settings.yml manually." if check_settings_file(nil)
21
+
22
+ check_backup_path
23
+ create_sqlite_db
24
+ connect_sqlite
25
+ migrate_sqlite
26
+ mogile_db_connect
27
+ mogile_tracker_connect
28
+ check_mogile_domain(@domain)
29
+
30
+ #Save settings
31
+ save_settings
32
+ end
33
+
34
+ #Save the settings for the backup into a yaml file (settings.yml) so that an incremental can be run without so many parameters
35
+ #@return [Bool] true or false
36
+ def save_settings
37
+ require ('yaml')
38
+ settings = {
39
+ 'db' => @db,
40
+ 'db_host' => @db_host,
41
+ 'db_port' => @db_port,
42
+ 'db_pass' => @db_pass,
43
+ 'db_user' => @db_user,
44
+ 'domain' => @domain,
45
+ 'tracker_ip' => @tracker_ip,
46
+ 'tracker_port' => @tracker_port,
47
+ 'backup_path' => $backup_path
48
+ }
49
+
50
+ File.open("#{$backup_path}/settings.yml", "w") do |file|
51
+ file.write settings.to_yaml
52
+ end
53
+
54
+ true
55
+ end
56
+ end
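`save_settings` and `Backup#initialize` together form a simple YAML round trip: `create` writes a hash of string keys to `settings.yml`, and later runs read it back. A minimal sketch of that round trip follows, assuming a temporary directory in place of a real backup path. `round_trip_settings` is a hypothetical helper; the example values are the defaults declared in `bin/mogbak`.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper (not part of mogbak): write a settings hash to
# settings.yml inside dir, then load it back the way a later run would.
def round_trip_settings(settings, dir)
  path = File.join(dir, 'settings.yml')
  File.open(path, 'w') { |file| file.write(settings.to_yaml) }
  YAML.load_file(path)
end

example = {
  'db'           => 'mogilefs',
  'db_host'      => 'localhost',
  'db_port'      => 3306,
  'db_user'      => 'mogile',
  'domain'       => 'test',
  'tracker_ip'   => '127.0.0.1',
  'tracker_port' => 7001
}

Dir.mktmpdir do |dir|
  loaded = round_trip_settings(example, dir)
  puts "domain=#{loaded['domain']} tracker=#{loaded['tracker_ip']}:#{loaded['tracker_port']}"
end
```

Note that the real `save_settings` also stores `db_pass` in this file as plain text, so the backup directory's permissions matter.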
data/lib/domain.rb ADDED
@@ -0,0 +1,5 @@
1
+ #Represents domains in a MogileFS database server
2
+ class Domain < ActiveRecord::Base
3
+ self.primary_key = "dmid"
4
+ self.table_name = "domain"
5
+ end
data/lib/file.rb ADDED
@@ -0,0 +1,21 @@
1
+ #represents files that are in the MogileFS database. This model is used for talking to the MogileFS database via
2
+ #ActiveRecord
3
+ class Fid < ActiveRecord::Base
4
+ self.primary_key = "fid"
5
+ self.table_name = "file"
6
+ belongs_to :domain,
7
+ :foreign_key => "dmid"
8
+ belongs_to :fileclass,
9
+ :foreign_key => "classid"
10
+
11
+ #If there is no fileclass then it is the default class
12
+ #@return [String] name of class file belongs to
13
+ def classname
14
+ if fileclass
15
+ fileclass.classname
16
+ else
17
+ return 'default'
18
+ end
19
+ end
20
+
21
+ end
data/lib/fileclass.rb ADDED
@@ -0,0 +1,5 @@
1
+ #Represents classes in a MogileFS database server
2
+ class Fileclass < ActiveRecord::Base
3
+ self.primary_key = "classid"
4
+ self.table_name = "class"
5
+ end
data/lib/forkinator.rb ADDED
@@ -0,0 +1,132 @@
1
+ #The Forkinator makes it easy to fork workers, pass a list of jobs for them to work on, and listen for the results back from
2
+ #the child process. It uses a combination of threading and forking to accomplish this. Marshal is used to pass objects back
3
+ #and forth between the child and parent via IO.pipe.
4
+ class Forkinator
5
+
6
+ #Wait for threads
7
+ #@param [Array] threads an array containing threads we need to wait on
8
+ def self.wait_for_threads(threads)
9
+ threads.compact.each do |t|
10
+ begin
11
+ t.join
12
+ rescue Interrupt
13
+ # no reason to wait on dead threads
14
+ end
15
+ end
16
+ end
17
+
18
+ #Fork a child. Provide a proc of code to run inside child. The child proc expects to be sent an array of jobs
19
+ #@param [Proc] child_proc code for the child to run
20
+ #@return [Hash] :write = pipe for writing to the child, :read = pipe for reading from the child, :pid = pid of the child
21
+ def self.make_child(child_proc)
22
+
23
+ #open pipes for two way communication between the parent and child
24
+ child_read, parent_write = IO.pipe
25
+ parent_read, child_write = IO.pipe
26
+
27
+ #remove our database connection, we don't want it inside the child, as it'll get closed when the child shuts down
28
+ mog_config = ActiveRecord::Base.remove_connection
29
+
30
+ #fork, code inside this block is only run inside the child
31
+ pid = Process.fork do
32
+ begin
33
+
34
+ #Since we're the child now, we'll close the parent's r/w pipes as we don't need them
35
+ parent_write.close
36
+ parent_read.close
37
+
38
+ #child loops through IO pipe, listening for data from the parent, if the parent closes the pipe then we're
39
+ #done
40
+ while !child_read.eof? do
41
+ #rename the process to make it clear that it's a worker in idle status
42
+ $0 = "mogbak [idle]"
43
+ #this call blocks until it receives something from the parent via the pipe
44
+ job = Marshal.load(child_read)
45
+ #since we're working now we'll rename the process
46
+ $0 = "mogbak [working]"
47
+ #call the child proc
48
+ result = child_proc.call(job)
49
+ #hand the child proc response back to the parent
50
+ Marshal.dump(result, child_write)
51
+ end
52
+
53
+ #no matter what happens..make sure we get the pipes closed
54
+ ensure
55
+ child_read.close
56
+ child_write.close
57
+ end
58
+ end
59
+
60
+ #This is the parent executing this -- reconnect to the database we just dropped above.
61
+ ActiveRecord::Base.establish_connection(mog_config)
62
+
63
+ #close the child's handle on the pipes since the parent won't need them
64
+ child_read.close
65
+ child_write.close
66
+
67
+ {:write => parent_write, :read => parent_read, :pid => pid}
68
+ end
69
+
70
+ #Forks children, makes threads for two-way communication, and evenly distributes jobs to each child.
71
+ #@param [Integer] qty number of workers to launch
72
+ #@param [Array] jobs array containing jobs for each child
73
+ #@param [Proc] parent_proc code to be run in the thread used to communicate with the child
74
+ #@param [Proc] child_proc code to be run in the forked child
75
+ def self.hybrid_fork(qty, jobs, parent_proc, child_proc)
76
+ threads = []
77
+
78
+ #mutex is used to ensure that some operations in the threads don't have the potential of happening at the same time
79
+ #in another thread
80
+ semaphore = Mutex.new
81
+
82
+ require('thread')
83
+
84
+ #split the jobs up
85
+ jobs = jobs.in_groups(qty)
86
+
87
+ #spawn the children
88
+ children = []
89
+ qty.times { children << make_child(child_proc)}
90
+
91
+ #register signal handler so that children are killed if the program receives a SIGINT
92
+ #which will happen if the user ctrl c's the parent process
93
+ Signal.trap :SIGINT do
94
+ children.each { |child| Process.kill(:KILL, child[:pid]) if child[:pid]}
95
+ exit 1
96
+ end
97
+
98
+ #For each worker
99
+ qty.times do |i|
100
+
101
+ #start a thread
102
+ threads[i] = Thread.new do
103
+ Thread.current.abort_on_exception = true
104
+
105
+ child = {}
106
+ semaphore.synchronize { child = children.pop }
107
+
108
+ pid = child[:pid]
109
+ njobs = jobs[i - 1]
110
+
111
+ #pass jobs to child
112
+ Marshal.dump(njobs, child[:write])
113
+
114
+ #wait for result
115
+ result = Marshal.load(child[:read])
116
+
117
+ #process result
118
+ semaphore.synchronize { parent_proc.call(result) }
119
+
120
+ #close the pipe
121
+ child[:write].close
122
+
123
+ #wait for process to finish before terminating this thread
124
+ Process.wait(pid)
125
+
126
+ #close db connection
127
+ SqliteActiveRecord.connection.close
128
+ end
129
+ end
130
+ wait_for_threads(threads)
131
+ end
132
+ end
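The core of the Forkinator is the pipe-and-Marshal handshake between one parent and one child. The sketch below isolates that handshake for a single worker, leaving out the threads, the mutex, and the ActiveRecord connection handling. `run_in_child` is a hypothetical name, and the `exit!` call is an addition to keep the child from running the parent's exit handlers. It requires a platform with `fork`.

```ruby
# Hypothetical helper (not part of mogbak): send one job to a forked child,
# run the given block on it there, and return the block's result.
def run_in_child(job)
  # Two pipes give two-way communication: parent -> child and child -> parent.
  child_read, parent_write = IO.pipe
  parent_read, child_write = IO.pipe

  pid = Process.fork do
    # The child does not need the parent's ends of the pipes.
    parent_write.close
    parent_read.close
    begin
      # Loop until the parent closes its write end, which shows up as EOF.
      until child_read.eof?
        input = Marshal.load(child_read)
        Marshal.dump(yield(input), child_write)
      end
    ensure
      child_read.close
      child_write.close
    end
    exit!(0) # leave without running exit handlers inherited from the parent
  end

  # The parent does not need the child's ends of the pipes.
  child_read.close
  child_write.close

  Marshal.dump(job, parent_write)   # pass the job to the child
  result = Marshal.load(parent_read) # block until the child answers
  parent_write.close                 # EOF tells the child to finish
  Process.wait(pid)
  parent_read.close
  result
end

squares = run_in_child([1, 2, 3]) { |numbers| numbers.map { |n| n * n } }
puts squares.inspect
```

Anything passed through the pipes must be marshalable, which is why the backup and restore procs exchange plain hashes, fids, and model objects rather than open connections.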
data/lib/list.rb ADDED
@@ -0,0 +1,28 @@
1
+ #List the files in a MogileFS backup with their metadata
2
+ class List
3
+ attr_accessor :backup_path
4
+ include Validations
5
+
6
+
7
+ #initialize the list object
8
+ #@param[Hash] o :backup_path is required
9
+ def initialize(o={})
10
+
11
+ #If settings file does not exist then this is not a valid mogilefs backup
12
+ check_settings_file('settings.yml not found in path. This must not be a backup profile. Cannot list')
13
+
14
+ connect_sqlite
15
+ migrate_sqlite
16
+
17
+ #Now that database is all setup load the model class
18
+ require('bakfile')
19
+ end
20
+
21
+ #Outputs a list of files in CSV format
22
+ #fid,key,length,class
23
+ def list
24
+ files = BakFile.find_each(:conditions => ['saved = ?', true]) do |file|
25
+ puts "#{file.fid},#{file.dkey},#{file.length},#{file.classname}"
26
+ end
27
+ end
28
+ end
@@ -0,0 +1,3 @@
1
+ module Mogbak
2
+ VERSION = '0.1.2'
3
+ end
@@ -0,0 +1,28 @@
1
+ #pretty useful rails method. Splits an array into groups
2
+ class Array
3
+ def in_groups(number, fill_with = nil)
4
+ # size / number gives minor group size;
5
+ # size % number gives how many objects need extra accommodation;
6
+ # each group hold either division or division + 1 items.
7
+ division = size / number
8
+ modulo = size % number
9
+
10
+ # create a new array avoiding dup
11
+ groups = []
12
+ start = 0
13
+
14
+ number.times do |index|
15
+ length = division + (modulo > 0 && modulo > index ? 1 : 0)
16
+ padding = fill_with != false &&
17
+ modulo > 0 && length == division ? 1 : 0
18
+ groups << slice(start, length).concat([fill_with] * padding)
19
+ start += length
20
+ end
21
+
22
+ if block_given?
23
+ groups.each { |g| yield(g) }
24
+ else
25
+ groups
26
+ end
27
+ end
28
+ end
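The padding behaviour of `in_groups` explains why the worker procs in `backup.rb` and `restore.rb` contain `break if file.nil?`. The following sketch restates the same algorithm as a standalone function so it can be run without patching `Array`. The function form is an assumption made for illustration; the arithmetic follows the method above.

```ruby
# Standalone restatement of the Array#in_groups logic shown above.
# Splits items into `number` groups whose sizes differ by at most one,
# padding the shorter groups with fill_with unless fill_with is false.
def in_groups(items, number, fill_with = nil)
  division, modulo = items.size.divmod(number)
  start = 0

  (0...number).map do |index|
    # The first `modulo` groups take one extra item.
    length = division + (index < modulo ? 1 : 0)
    # Shorter groups get one filler so all groups end up the same length.
    padding = (fill_with != false && modulo > 0 && length == division) ? 1 : 0
    group = items.slice(start, length) + [fill_with] * padding
    start += length
    group
  end
end

p in_groups((1..7).to_a, 3)
p in_groups((1..7).to_a, 3, false)
```

With the default `fill_with`, seven jobs split across three workers leave a trailing `nil` in two of the groups, so each worker stops at the first `nil` it sees.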
@@ -0,0 +1,20 @@
1
+ class PathHelper
2
+ #This produces a hashed path very similar to mogilefs just without the device id. It also recursively creates the
3
+ #directory inside the backup
4
+ def self.path(sfid)
5
+ sfid = "#{sfid}"
6
+ length = sfid.length
7
+ if length < 10
8
+ nfid = "0" * (10 - length) + sfid
9
+ else
10
+ nfid = sfid
11
+ end
12
+ /(?<b>\d)(?<mmm>\d{3})(?<ttt>\d{3})(?<hto>\d{3})/ =~ nfid
13
+
14
+ #create the directory
15
+ directory_path = "#{$backup_path}/#{b}/#{mmm}/#{ttt}"
16
+ FileUtils.mkdir_p(directory_path)
17
+
18
+ return "#{directory_path}/#{nfid}.fid"
19
+ end
20
+ end
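The path scheme in `PathHelper.path` is easier to follow as a pure function. This sketch computes only the relative path and leaves out the `$backup_path` prefix and the `mkdir_p` side effect. `hashed_relative_path` is a hypothetical name, and the `\A` anchor and explicit error are additions not present in the source.

```ruby
# Hypothetical helper (not part of mogbak): map a fid to the relative path
# used inside the backup directory, e.g. 1234567 -> 0/001/234/0001234567.fid
def hashed_relative_path(fid)
  # Left-pad to ten digits, matching the zero padding in PathHelper.path.
  nfid = fid.to_s.rjust(10, '0')

  # b = billions digit, mmm = millions, ttt = thousands; the last three
  # digits (hto) only appear in the file name.
  match = /\A(?<b>\d)(?<mmm>\d{3})(?<ttt>\d{3})(?<hto>\d{3})/.match(nfid)
  raise ArgumentError, "fid must be numeric: #{fid.inspect}" unless match

  File.join(match[:b], match[:mmm], match[:ttt], "#{nfid}.fid")
end

puts hashed_relative_path(1234567)
puts hashed_relative_path(42)
```

Each leaf directory therefore holds at most a thousand files, which keeps directory listings manageable even for domains with millions of fids.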
data/lib/restore.rb ADDED
@@ -0,0 +1,71 @@
1
+ #Restore a mogbak backup to a MogileFS domain
2
+ class Restore
3
+ attr_accessor :domain, :tracker_host, :tracker_port, :backup_path, :workers
4
+ include Validations
5
+
6
+ #Run validations and prepare the object for a restore
7
+ def initialize(o={})
8
+ @domain = o[:domain] if o[:domain]
9
+ @tracker_ip = o[:tracker_ip] if o[:tracker_ip]
10
+ @tracker_port = o[:tracker_port] if o[:tracker_port]
11
+ @backup_path = o[:backup_path] if o[:backup_path]
12
+ @workers = o[:workers] if o[:workers]
13
+
14
+
15
+ #If settings file does not exist then this is not a valid mogilefs backup
16
+ check_settings_file('settings.yml not found in path. This must not be a backup profile. Cannot restore')
17
+
18
+ connect_sqlite
19
+ migrate_sqlite
20
+ mogile_tracker_connect
21
+
22
+ #Now that database is all setup load the model classes
23
+ require ('domain')
24
+ require('file')
25
+ require('bakfile')
26
+ require('fileclass')
27
+ end
28
+
29
+ def output_save(save, fid)
30
+ if save
31
+ puts "Restored: FID #{fid}"
32
+ else
33
+ puts "Error: FID #{fid}"
34
+ end
35
+ end
36
+
37
+ def launch_restore_workers(files)
38
+ child = Proc.new { |files|
39
+ results = []
40
+ files.each do |file|
41
+ break if file.nil?
42
+ save = file.restore
43
+ output_save(save, file.fid)
44
+ results << {:restored => save, :fid => file.fid}
45
+ end
46
+ results
47
+ }
48
+
49
+ parent = Proc.new { |results|
50
+ SqliteActiveRecord.clear_active_connections!
51
+ }
52
+
53
+ Forkinator.hybrid_fork(self.workers.to_i, files, parent, child)
54
+
55
+ end
56
+
57
+ def restore(dkey = false)
58
+ if dkey
59
+ file = BakFile.find_by_dkey(dkey)
60
+ raise 'file not found in backup' unless file
61
+ save = file.restore
62
+ output_save(save, file.fid)
63
+ else
64
+
65
+ BakFile.find_in_batches(:conditions => ['saved = ?', true], :batch_size => 2000) do |batch|
66
+ launch_restore_workers(batch)
67
+ end
68
+
69
+ end
70
+ end
71
+ end
@@ -0,0 +1,120 @@
1
+ #Common methods used for setting up or verifying
2
+ module Validations
3
+
4
+ #Check that the settings.yml file exists in the backup_path
5
+ #@param [String] raise_msg exception to raise if the file is missing. If set to nil, no exception will be raised
6
+ #@return [Bool]
7
+ def check_settings_file(raise_msg = 'settings.yml not found in path. This must not be a backup profile. See: mogbak help create')
8
+ if File.exist?("#{$backup_path}/settings.yml")
9
+ return true
10
+ else
11
+ raise raise_msg if raise_msg
12
+ return false
13
+ end
14
+ end
15
+
16
+
17
+ #Check that the backup_path is valid
18
+ #@param [String] raise_msg exception to raise if the backup_path is not a valid directory. If set to nil, no exception will be raised
19
+ #@return [Bool]
20
+ def check_backup_path(raise_msg = 'backup_path is not a valid directory')
21
+ unless File.directory?($backup_path)
22
+ raise raise_msg if raise_msg
23
+ return false
24
+ end
25
+ true
26
+ end
27
+
28
+ #Create database for metadata
29
+ #@param [String] raise_msg exception to raise if database cannot be created. nil will raise no exception
30
+ #@return [Bool]
31
+ def create_sqlite_db(raise_msg = "Could not create #{$backup_path}/db.sqlite - check permissions")
32
+ begin
33
+ unless File.exist?("#{$backup_path}/db.sqlite")
34
+ SQLite3::Database.new("#{$backup_path}/db.sqlite")
35
+ end
36
+ rescue Exception => e
37
+ raise raise_msg if raise_msg
38
+ return false
39
+ end
40
+ true
41
+ end
42
+
43
+ #Connect to sqlite metadata db
44
+ #@param [String] raise_msg exception to raise if we cannot connect. If set to nil, no exception will be raised
45
+ #@return [Bool]
46
+ def connect_sqlite(raise_msg = nil)
47
+ begin
48
+ ActiveRecord::Base.establish_connection(:adapter => 'sqlite3', :database => "#{$backup_path}/db.sqlite", :timeout => 1000)
49
+ rescue Exception => e
50
+ raise raise_msg if raise_msg
51
+ raise e if $debug
52
+ return false
53
+ end
54
+ true
55
+ end
56
+
57
+ #Run ActiveRecord migrations on the sqlite database
58
+ #@param [String] raise_msg exception to raise if migrations fail. If set to nil, no exception will be raised
59
+ #@return [Bool]
60
+ def migrate_sqlite(raise_msg = "could not run migrations on #{$backup_path}/db.sqlite")
61
+ #run migrations
62
+ begin
63
+ ActiveRecord::Migrator.up(File.expand_path(File.dirname(__FILE__)) + '/../db/migrate/')
64
+ rescue
65
+ raise raise_msg if raise_msg
66
+ return false
67
+ end
68
+ true
69
+ end
70
+
71
+ #Connect to MogileFS mysql server
72
+ #@param [String] raise_msg exception to raise if we cannot connect. If set to nil, no exception will be raised
73
+ #@return [Bool]
74
+ def mogile_db_connect(raise_msg = 'Could not connect to MySQL database')
75
+ #Verify that we can connect to the mogilefs mysql server
76
+ begin
77
+ ActiveRecord::Base.establish_connection({:adapter => "mysql2",
78
+ :host => @db_host,
79
+ :port => @db_port,
80
+ :username => @db_user,
81
+ :password => @db_pass,
82
+ :database => @db,
83
+ :reconnect => true})
84
+ rescue Exception => e
85
+ raise raise_msg if raise_msg
86
+ return false
87
+ end
88
+ true
89
+ end
90
+
91
+ #Connect to mogile tracker
92
+ #@param [String] raise_msg exception to raise if we cannot connect. if set to nil no exception will be raised
93
+ #@return [Bool]
94
+ def mogile_tracker_connect(raise_msg = 'Could not connect to MogileFS tracker')
95
+ host = ["#{@tracker_ip}:#{@tracker_port}"]
96
+ begin
97
+ $mg = MogileFS::MogileFS.new(:domain => @domain, :hosts => host)
98
+ rescue Exception => e
99
+ if $debug
100
+ raise e
101
+ end
102
+ raise raise_msg if raise_msg
103
+ return false
104
+ end
105
+ end
106
+
107
+ #Check if mogile domain is valid
108
+ #@param [String] raise_msg exception to raise if domain does not exist. if set to nil no exception will be raised
109
+ #@return [Bool]
110
+ def check_mogile_domain(domain, raise_msg = 'Domain does not exist in MogileFS')
111
+ require('domain')
112
+ domain = Domain.find_by_namespace(domain)
113
+ if !domain
114
+ raise raise_msg if raise_msg
115
+ return false
116
+ end
117
+ true
118
+ end
119
+
120
+ end
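Every check in `Validations` follows the same convention: return `true` on success, raise `raise_msg` on failure, or return `false` quietly when `raise_msg` is `nil`. The sketch below isolates that convention so both calling modes can be tried. The module name `PathChecks` is hypothetical, and it takes the directory as an argument instead of reading the `$backup_path` global, which is an assumption made only to keep the example self-contained.

```ruby
require 'tmpdir'

# Hypothetical module mirroring the raise_msg convention used by Validations.
module PathChecks
  # Returns true when settings.yml exists in +dir+. Otherwise raises +raise_msg+,
  # or returns false when +raise_msg+ is nil.
  def self.settings_file?(dir, raise_msg = 'settings.yml not found in path')
    return true if File.exist?(File.join(dir, 'settings.yml'))
    raise raise_msg if raise_msg
    false
  end
end

Dir.mktmpdir do |dir|
  # Quiet mode: a nil message turns the failure into a false return value.
  puts PathChecks.settings_file?(dir, nil)

  # Strict mode: the default message is raised as a RuntimeError.
  begin
    PathChecks.settings_file?(dir)
  rescue RuntimeError => e
    puts "raised: #{e.message}"
  end

  File.write(File.join(dir, 'settings.yml'), "--- {}\n")
  puts PathChecks.settings_file?(dir)
end
```

Strict mode suits a command that cannot continue, such as a restore without a backup profile. Quiet mode lets a caller probe for the file and decide what to do next.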
data/mogbak.gemspec ADDED
@@ -0,0 +1,27 @@
1
+ # Ensure we require the local version and not one we might have installed already
2
+ require File.join([File.dirname(__FILE__),'lib','mogbak_version.rb'])
3
+ spec = Gem::Specification.new do |s|
4
+ s.name = 'mogbak'
5
+ s.license = "MIT"
6
+ s.version = Mogbak::VERSION
7
+ s.author = 'Jesse Angell'
8
+ s.email = 'jesse.angell@firespring.com'
9
+ s.homepage = 'http://www.github.com/firespring/mogbak'
10
+ s.platform = Gem::Platform::RUBY
11
+ s.summary = 'Backup utility for MogileFS'
12
+ s.description = 'mogbak makes it easy to backup and restore mogilefs domains'
13
+ s.files = `git ls-files`.split("\n")
14
+ s.require_paths << 'lib'
15
+ s.bindir = 'bin'
16
+ s.executables << 'mogbak'
17
+ s.add_development_dependency('awesome_print')
18
+
19
+ s.add_runtime_dependency('gli', '>= 1.5.1')
20
+ s.add_runtime_dependency('mysql2', '>= 0.3.11')
21
+ s.add_runtime_dependency('mogilefs-client','>= 3.1.1')
22
+ s.add_runtime_dependency('json','>= 1.6.5')
23
+ s.add_runtime_dependency('sqlite3','>=1.3.5')
24
+ s.add_runtime_dependency('activerecord-import','>=0.2.9')
25
+
26
+
27
+ end
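The runtime dependencies above all use open-ended `>=` constraints. As a rough illustration of what that permits, the sketch below evaluates requirement strings with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The helper name `accepts?` and the `~> 1.5` comparison are hypothetical additions for contrast and do not appear in the gemspec.

```ruby
require 'rubygems'

# Hypothetical helper: does +version+ satisfy the +requirement+ string?
def accepts?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

# Constraint taken from the gemspec: gli '>= 1.5.1'
puts accepts?('>= 1.5.1', '1.5.0') # false, older than the minimum
puts accepts?('>= 1.5.1', '2.0.0') # true, any later release is allowed

# Hypothetical pessimistic constraint, shown only for comparison
puts accepts?('~> 1.5', '1.9.9')   # true, stays within 1.x
puts accepts?('~> 1.5', '2.0.0')   # false, the next major version is excluded
```

An open-ended constraint keeps installation flexible but also admits future major releases that may change an API, which is worth keeping in mind when installing an older gem such as this one.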
metadata ADDED
@@ -0,0 +1,180 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: mogbak
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.2
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - Jesse Angell
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2012-03-27 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: awesome_print
16
+ requirement: !ruby/object:Gem::Requirement
17
+ none: false
18
+ requirements:
19
+ - - ! '>='
20
+ - !ruby/object:Gem::Version
21
+ version: '0'
22
+ type: :development
23
+ prerelease: false
24
+ version_requirements: !ruby/object:Gem::Requirement
25
+ none: false
26
+ requirements:
27
+ - - ! '>='
28
+ - !ruby/object:Gem::Version
29
+ version: '0'
30
+ - !ruby/object:Gem::Dependency
31
+ name: gli
32
+ requirement: !ruby/object:Gem::Requirement
33
+ none: false
34
+ requirements:
35
+ - - ! '>='
36
+ - !ruby/object:Gem::Version
37
+ version: 1.5.1
38
+ type: :runtime
39
+ prerelease: false
40
+ version_requirements: !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ! '>='
44
+ - !ruby/object:Gem::Version
45
+ version: 1.5.1
46
+ - !ruby/object:Gem::Dependency
47
+ name: mysql2
48
+ requirement: !ruby/object:Gem::Requirement
49
+ none: false
50
+ requirements:
51
+ - - ! '>='
52
+ - !ruby/object:Gem::Version
53
+ version: 0.3.11
54
+ type: :runtime
55
+ prerelease: false
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ none: false
58
+ requirements:
59
+ - - ! '>='
60
+ - !ruby/object:Gem::Version
61
+ version: 0.3.11
62
+ - !ruby/object:Gem::Dependency
63
+ name: mogilefs-client
64
+ requirement: !ruby/object:Gem::Requirement
65
+ none: false
66
+ requirements:
67
+ - - ! '>='
68
+ - !ruby/object:Gem::Version
69
+ version: 3.1.1
70
+ type: :runtime
71
+ prerelease: false
72
+ version_requirements: !ruby/object:Gem::Requirement
73
+ none: false
74
+ requirements:
75
+ - - ! '>='
76
+ - !ruby/object:Gem::Version
77
+ version: 3.1.1
78
+ - !ruby/object:Gem::Dependency
79
+ name: json
80
+ requirement: !ruby/object:Gem::Requirement
81
+ none: false
82
+ requirements:
83
+ - - ! '>='
84
+ - !ruby/object:Gem::Version
85
+ version: 1.6.5
86
+ type: :runtime
87
+ prerelease: false
88
+ version_requirements: !ruby/object:Gem::Requirement
89
+ none: false
90
+ requirements:
91
+ - - ! '>='
92
+ - !ruby/object:Gem::Version
93
+ version: 1.6.5
94
+ - !ruby/object:Gem::Dependency
95
+ name: sqlite3
96
+ requirement: !ruby/object:Gem::Requirement
97
+ none: false
98
+ requirements:
99
+ - - ! '>='
100
+ - !ruby/object:Gem::Version
101
+ version: 1.3.5
102
+ type: :runtime
103
+ prerelease: false
104
+ version_requirements: !ruby/object:Gem::Requirement
105
+ none: false
106
+ requirements:
107
+ - - ! '>='
108
+ - !ruby/object:Gem::Version
109
+ version: 1.3.5
110
+ - !ruby/object:Gem::Dependency
111
+ name: activerecord-import
112
+ requirement: !ruby/object:Gem::Requirement
113
+ none: false
114
+ requirements:
115
+ - - ! '>='
116
+ - !ruby/object:Gem::Version
117
+ version: 0.2.9
118
+ type: :runtime
119
+ prerelease: false
120
+ version_requirements: !ruby/object:Gem::Requirement
121
+ none: false
122
+ requirements:
123
+ - - ! '>='
124
+ - !ruby/object:Gem::Version
125
+ version: 0.2.9
126
+ description: mogbak makes it easy to backup and restore mogilefs domains
127
+ email: jesse.angell@firespring.com
128
+ executables:
129
+ - mogbak
130
+ extensions: []
131
+ extra_rdoc_files: []
132
+ files:
133
+ - .gitignore
134
+ - Gemfile
135
+ - Gemfile.lock
136
+ - LICENSE
137
+ - README.md
138
+ - bin/mogbak
139
+ - db/migrate/20120316200100_create_initial_schema.rb
140
+ - lib/backup.rb
141
+ - lib/bakfile.rb
142
+ - lib/create.rb
143
+ - lib/domain.rb
144
+ - lib/file.rb
145
+ - lib/fileclass.rb
146
+ - lib/forkinator.rb
147
+ - lib/list.rb
148
+ - lib/mogbak_version.rb
149
+ - lib/monkey_patch.rb
150
+ - lib/path_helper.rb
151
+ - lib/restore.rb
152
+ - lib/validations.rb
153
+ - mogbak.gemspec
154
+ homepage: http://www.github.com/firespring/mogbak
155
+ licenses:
156
+ - MIT
157
+ post_install_message:
158
+ rdoc_options: []
159
+ require_paths:
160
+ - lib
161
+ - lib
162
+ required_ruby_version: !ruby/object:Gem::Requirement
163
+ none: false
164
+ requirements:
165
+ - - ! '>='
166
+ - !ruby/object:Gem::Version
167
+ version: '0'
168
+ required_rubygems_version: !ruby/object:Gem::Requirement
169
+ none: false
170
+ requirements:
171
+ - - ! '>='
172
+ - !ruby/object:Gem::Version
173
+ version: '0'
174
+ requirements: []
175
+ rubyforge_project:
176
+ rubygems_version: 1.8.21
177
+ signing_key:
178
+ specification_version: 3
179
+ summary: Backup utility for MogileFS
180
+ test_files: []