sir-sync-a-lot 0.0.0 → 0.0.1

data/.gitignore ADDED
@@ -0,0 +1,4 @@
+ *.gem
+ .bundle
+ Gemfile.lock
+ pkg/*
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source "http://rubygems.org"
+
+ # Specify your gem's dependencies in sir-sync-a-lot.gemspec
+ gemspec
data/MIT-LICENSE ADDED
@@ -0,0 +1,19 @@
+ Copyright (c) 2010 Ryan Allen, Envato Pty Ltd
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,59 @@
+ # Sir Sync-A-Lot. An optimised S3 sync tool using the power of *nix!
+
+ ## Requirements:
+
+ * Ruby 1.8.something
+ * RubyGems
+ * aws-s3 RubyGem
+ * OpenSSL
+ * find and xargs (should be on your *nix straight outta the box)
+
+ To configure, run `sir-sync-a-lot setup` and follow the prompts; you'll
+ need your S3 keys, the local file path you want to back up, the bucket name
+ to back up to, and any extra options to pass into find (e.g. for ignoring
+ file paths). It'll write the config to `~/.sir-sync-a-lot.yml`.
+
+ Then to sync, run `sir-sync-a-lot sync` and away she goes.
+
+ ## Why?
+
+ This library was written because we needed to be able to back up craploads of
+ data without having to worry about whether we had enough disk space on the remote.
+ That's where S3 is nice.
+
+ We tried s3sync but it blew the crap out of our server load (we do in excess of
+ 500,000 requests a day in page views alone, not including hits for images and whatnot,
+ and the server needs to stay responsive). The secret sauce is using the *nix
+ 'find', 'xargs' and 'openssl' commands to generate MD5 checksums for comparison.
+ Seems to work quite well for us (we have almost 90,000 files to compare).
+
+ Initially the plan was to use find with -ctime, but S3 isn't particularly nice about
+ returning a full list of objects in a bucket (the default is 1,000, I want all
+ 90,000, and it ignores me when I ask for 1,000,000 objects). Manifest generation
+ on a server under load is fast enough and low enough on resources, so we're sticking
+ with that in the interim.
+
+ ## Etc
+
+ FYI, when you run sync, the output will look something like this:
+
+     [Thu Apr 01 11:50:25 +1100 2010] Starting, performing pre-sync checks...
+     [Thu Apr 01 11:50:26 +1100 2010] Generating local manifest...
+     [Thu Apr 01 11:50:26 +1100 2010] Fetching remote manifest...
+     [Thu Apr 01 11:50:27 +1100 2010] Performing checksum comparison...
+     [Thu Apr 01 11:50:27 +1100 2010] Pushing /tmp/backups/deep/four...
+     [Thu Apr 01 11:50:28 +1100 2010] Pushing /tmp/backups/three...
+     [Thu Apr 01 11:50:29 +1100 2010] Pushing /tmp/backups/two...
+     [Thu Apr 01 11:50:30 +1100 2010] Pushing local manifest up to remote...
+     [Thu Apr 01 11:50:31 +1100 2010] Done like a dinner.
+
+ You could pipe sync's output into a log file, which might be nice; this is what
+ our crontab looks like:
+
+     # run sync backups to s3 every day
+     0 1 * * * /usr/local/bin/rvm 1.8.7 ruby /root/sir-sync-a-lot/sir-sync-a-lot 1>> /var/log/sir-sync-a-lot.log 2>> /var/log/sir-sync-a-lot.error
+
+ Have fun!
+
+ Project wholly sponsored by Envato Pty Ltd. They're the shizzy! P.S. We use this
+ in production environments!
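
For reference, setup stores its answers as a Ruby hash that gets YAML-dumped to `~/.sir-sync-a-lot.yml`. The keys below are the ones `perform_setup!` in lib/sir-sync-a-lot.rb collects; the values are made-up placeholders, so treat this as a sketch rather than a real config:

    require 'yaml'

    # Illustrative only: the keys mirror what perform_setup! asks for,
    # the values are placeholders, not real credentials or paths.
    config = {
      :aws_access_key  => "AKIAEXAMPLEKEY",
      :aws_secret_key  => "example-secret",
      :aws_dest_bucket => "my-backup-bucket",
      :local_file_path => "/var/www",
      :find_options    => "\\! -path \"*.git*\""
    }

    File.open(File.join(ENV['HOME'], ".sir-sync-a-lot.yml"), "w") { |f| YAML.dump(config, f) }
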
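The checksum comparison described in the "Why?" section boils down to something like the sketch below. It is not the gem's exact code (that lives in `generate_local_manifest`, `fetch_remote_manifest` and `perform_sync!` further down), and `/var/www` is just a placeholder path:

    # Build a local manifest of MD5 checksums via find | xargs | openssl,
    # then push anything that isn't already present (by path and checksum) on S3.
    manifest = `find /var/www -print0 | xargs -0 openssl md5 2> /dev/null`

    local = manifest.split("\n").collect do |line|
      path, checksum = *line.match(/^MD5\((.*)\)= (.*)$/).captures
      { :path => path, :checksum => checksum }
    end

    remote = [] # the gem fills this from S3 object keys and ETags

    (local - remote).each { |file| puts "would push #{file[:path]}" }
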
data/Rakefile ADDED
@@ -0,0 +1 @@
+ require "bundler/gem_tasks"
data/bin/sir-sync-a-lot ADDED
@@ -0,0 +1,3 @@
+ #!/usr/bin/env ruby
+ require_relative '../lib/sir-sync-a-lot'
+ SirSyncalot.run!(*ARGV)
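
The executable just forwards ARGV to `SirSyncalot.run!`, so the documented invocations are equivalent to the following (a sketch assuming the gem is already on the load path):

    require 'sir-sync-a-lot'

    SirSyncalot.run!("setup") # interactive questions, writes ~/.sir-sync-a-lot.yml
    SirSyncalot.run!("sync")  # pushes new/changed files to the configured bucket
    SirSyncalot.run!("help")  # prints the (cheeky) help message
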
data/lib/sir-sync-a-lot.rb ADDED
@@ -0,0 +1,295 @@
+ require 'aws/s3'
+ require 'yaml'
+
+ class SirSyncalot
+   def self.run!(*args)
+     new(*args).run!
+   end
+
+   [:action, :config].each { |member| attr(member) }
+
+   def initialize(action = "sync")
+     @action = action
+   end
+
+   def run!
+     validate_inputs!
+     perform_action!
+   end
+
+   VERSION = '0.0.1'
+
+   private
+
+   def validate_inputs!
+     if setup_action? and config_exists?
+       exit_with_error!("Can't make a setup, because there's already a configuration in '#{config_path}'.")
+     elsif sync_action? and !config_exists?
+       exit_with_error!("Can't make a sync, because there's no configuration, try '#{__FILE__} setup'.")
+     end
+   end
+
+   def perform_action!
+     if setup_action?
+       aquire_lock! { perform_setup! }
+     elsif sync_action?
+       aquire_lock! { perform_sync! }
+     elsif help_action?
+       display_help!
+     else
+       exit_with_error!("Cannot perform action '#{@action}', try '#{__FILE__} help' for usage.")
+     end
+   end
+
+   def setup_action?
+     action == "setup"
+   end
+
+   def sync_action?
+     action == "sync"
+   end
+
+   def help_action?
+     action == "help"
+   end
+
+   def perform_setup!
+     display("Hello! Ima ask you a few questions, and store the results in #{config_path} for later, OK?")
+
+     config = {}
+
+     config[:aws_access_key] = ask("What is the AWS access key?")
+     config[:aws_secret_key] = ask("What is the AWS secret access key?")
+     display("Just a sec, ima check that works...")
+     if aws_credentials_valid?(config)
+       display("Yep, all good.")
+       config[:aws_dest_bucket] = ask("What bucket should we put your backups in? (If it doesn't exist I'll create it)")
+       if bucket_exists?(config)
+         if bucket_empty?(config)
+           display("I found that the bucket already exists, and it's empty so I'm happy.")
+         else
+           exit_with_error!("I found the bucket to exist, but it's not empty. I can't sync to a bucket that is not empty.")
+         end
+       else
+         display("The bucket doesn't exist, so I'm creating it now...")
+         create_bucket(config)
+         display("OK that's done.")
+       end
+     else
+       exit_with_error!("I couldn't connect to S3 with the credentials you supplied, try again much?")
+     end
+
+     config[:local_file_path] = ask("What is the (absolute) path that you want to back up? (i.e. /var/www not ./www)")
+     if !local_file_path_exists?(config)
+       exit_with_error!("I found that the local file path you supplied doesn't exist, wrong much?")
+     end
+
+     config[:find_options] = ask("Do you have any options for find? (e.g. \! -path \"*.git*\"). Press enter for none")
+
+     display("Right, I'm writing out the details you supplied to '#{config_path}' for my future reference...")
+     write_config!(config)
+     display("You're good to go. Next up is '#{__FILE__} sync' to synchronise your files to S3.")
+   end
+
+   def aws_credentials_valid?(config = read_config())
+     AWS::S3::Base.establish_connection!(:access_key_id => config[:aws_access_key], :secret_access_key => config[:aws_secret_key])
+     begin
+       AWS::S3::Service.buckets # AWS::S3 doesn't try to connect at all until you ask it for something.
+     rescue AWS::S3::InvalidAccessKeyId => e
+       false
+     else
+       true
+     end
+   end
+
+   def bucket_exists?(config = read_config())
+     AWS::S3::Bucket.find(config[:aws_dest_bucket])
+   rescue AWS::S3::NoSuchBucket => e
+     false
+   end
+
+   def bucket_empty?(config = read_config())
+     AWS::S3::Bucket.find(config[:aws_dest_bucket]).empty?
+   end
+
+   def create_bucket(config = read_config())
+     AWS::S3::Bucket.create(config[:aws_dest_bucket])
+   end
+
+   def local_file_path_exists?(config = read_config())
+     File.exist?(config[:local_file_path])
+   end
+
+   def write_config!(config)
+     open(config_path, 'w') { |f| YAML::dump(config, f) }
+   end
+
+   def read_config(reload = false)
+     reload or !@config ? @config = open(config_path, 'r') { |f| YAML::load(f) } : @config
+   end
+
+   def perform_sync!
+     display("Starting, performing pre-sync checks...")
+     if !aws_credentials_valid?
+       exit_with_error!("Couldn't connect to S3 with the credentials in #{config_path}.")
+     end
+
+     if !bucket_exists?
+       exit_with_error!("Can't find the bucket in S3 specified in #{config_path}.")
+     end
+
+     if !local_file_path_exists?
+       exit_with_error!("Local path specified in #{config_path} does not exist.")
+     end
+
+     create_tmp_sync_state
+
+     if last_sync_recorded?
+       display("Performing time-based comparison...")
+       files_modified_since_last_sync
+     else
+       display("Performing (potentially expensive) checksum comparison...")
+       display("Generating local manifest...")
+       generate_local_manifest
+       display("Traversing S3 for remote manifest...")
+       fetch_remote_manifest
+       # note that we do not remove files on s3 that no longer exist on the local host. this behaviour
+       # may be desirable (a la rsync --delete) but we currently don't support it. ok? sweet.
+       display("Performing checksum comparison...")
+       files_on_localhost_with_checksums - files_on_s3
+     end.each { |file| push_file(file) }
+
+     finalize_sync_state
+
+     display("Done like a dinner.")
+   end
+
+   def last_sync_recorded?
+     File.exist?(last_sync_completed)
+   end
+
+   def create_tmp_sync_state
+     `touch #{last_sync_started}`
+   end
+
+   def finalize_sync_state
+     `cp #{last_sync_started} #{last_sync_completed}`
+   end
+
+   def last_sync_started
+     ENV['HOME'] + "/.sir-sync-a-lot.last-sync.started"
+   end
+
+   def last_sync_completed
+     ENV['HOME'] + "/.sir-sync-a-lot.last-sync.completed"
+   end
+
+   def files_modified_since_last_sync
+     # '! -type d' ignores directories; in the local manifest, directories get spat out to stderr, whereas here they would show up in the results
+     `find #{read_config[:local_file_path]} #{read_config[:find_options]} \! -type d -cnewer #{last_sync_completed}`.split("\n").collect { |path| {:path => path } }
+   end
+
+   def update_config_with_sync_state(sync_start)
+     config = read_config()
+     config[:last_sync_at] = sync_start
+     write_config!(config)
+   end
+
+   def generate_local_manifest
+     `find #{read_config[:local_file_path]} #{read_config[:find_options]} -print0 | xargs -0 openssl md5 2> /dev/null > /tmp/sir-sync-a-lot.manifest.local`
+   end
+
+   def fetch_remote_manifest
+     @remote_objects_cache = [] # instance vars feel like global variables somehow
+     traverse_s3_for_objects(AWS::S3::Bucket.find(read_config[:aws_dest_bucket]), @remote_objects_cache)
+   end
+
+   def traverse_s3_for_objects(bucket, collection, n = 1000, upto = 0, marker = nil)
+     objects = bucket.objects(:marker => marker, :max_keys => n)
+     if objects.size == 0
+       return
+     else
+       objects.each { |object| collection << {:path => "/#{object.key}", :checksum => object.etag} }
+       traverse_s3_for_objects(bucket, collection, n, upto+n, objects.last.key)
+     end
+   end
+
+   def files_on_localhost_with_checksums
+     parse_manifest(local_manifest_path)
+   end
+
+   def files_on_s3
+     @remote_objects_cache
+   end
+
+   def local_manifest_path
+     "/tmp/sir-sync-a-lot.manifest.local"
+   end
+
+   def parse_manifest(location)
+     if File.exist?(location)
+       open(location, 'r') do |file|
+         file.collect do |line|
+           path, checksum = *line.chomp.match(/^MD5\((.*)\)= (.*)$/).captures
+           {:path => path, :checksum => checksum}
+         end
+       end
+     else
+       []
+     end
+   end
+
+   def push_file(file)
+     # xfer speed, logging, etc can occur in this method
+     display("Pushing #{file[:path]}...")
+     AWS::S3::S3Object.store(file[:path], open(file[:path]), read_config[:aws_dest_bucket])
+   rescue
+     display("ERROR: Could not push '#{file[:path]}': #{$!.inspect}")
+   end
+
+   def aquire_lock!
+     if File.exist?(lock_path)
+       # a better way would be to write out the pid ($$) and read it back in, to make sure it's the same
+       exit_with_error!("Found a lock at #{lock_path}, is another instance of #{__FILE__} running?")
+     end
+
+     begin
+       system("touch #{lock_path}")
+       yield
+     ensure
+       system("rm #{lock_path}")
+     end
+   end
+
+
+   def display_help!
+     display("Go help yourself buddy!")
+   end
+
+   def exit_with_error!(message)
+     display("Gah! " + message)
+     exit
+   end
+
+   def display(message)
+     puts("[#{Time.now}] #{message}")
+   end
+
+   def ask(question)
+     print(question + ": ")
+     $stdin.readline.chomp # gets doesn't work here!
+   end
+
+   def config_exists?
+     File.exist?(config_path)
+   end
+
+   def config_path
+     ENV['HOME'] + "/.sir-sync-a-lot.yml"
+   end
+
+   def lock_path
+     ENV['HOME'] + "/.sir-sync-a-lot.lock"
+   end
+
+ end
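
For context on `parse_manifest` above: each line of the local manifest written by `openssl md5` has the shape `MD5(<path>)= <digest>`, which is exactly what the regex captures. A tiny worked example (the path and digest are made up):

    line = "MD5(/var/www/index.html)= d41d8cd98f00b204e9800998ecf8427e"
    path, checksum = *line.chomp.match(/^MD5\((.*)\)= (.*)$/).captures

    path     # => "/var/www/index.html"
    checksum # => "d41d8cd98f00b204e9800998ecf8427e"
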
data/sir-sync-a-lot.gemspec ADDED
@@ -0,0 +1,24 @@
+ # -*- encoding: utf-8 -*-
+ $:.push File.expand_path("../lib", __FILE__)
+ require "sir-sync-a-lot"
+
+ Gem::Specification.new do |s|
+   s.name        = "sir-sync-a-lot"
+   s.version     = SirSyncalot::VERSION
+   s.authors     = ["Ryan Allen"]
+   s.email       = ["ryan@yeahnah.org"]
+   s.homepage    = "https://github.com/ryan-allen/sir-sync-a-lot"
+   s.summary     = %q{Baby got backups!}
+   s.description = %q{Optimised S3 backup tool. Uses linux's find and xargs to find updated files as to not exaust your disk IO.}
+
+   s.rubyforge_project = "sir-sync-a-lot"
+
+   s.files         = `git ls-files`.split("\n")
+   s.test_files    = `git ls-files -- {test,spec,features}/*`.split("\n")
+   s.executables   = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
+   s.require_paths = ["lib"]
+
+   # specify any dependencies here; for example:
+   # s.add_development_dependency "rspec"
+   s.add_runtime_dependency "aws-s3"
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: sir-sync-a-lot
  version: !ruby/object:Gem::Version
- version: 0.0.0
+ version: 0.0.1
  prerelease:
  platform: ruby
  authors:
@@ -13,7 +13,7 @@ date: 2011-09-14 00:00:00.000000000Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: aws-s3
- requirement: &70288454419240 !ruby/object:Gem::Requirement
+ requirement: &70302277542320 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
@@ -21,16 +21,25 @@ dependencies:
  version: '0'
  type: :runtime
  prerelease: false
- version_requirements: *70288454419240
+ version_requirements: *70302277542320
  description: Optimised S3 backup tool. Uses linux's find and xargs to find updated
  files as to not exaust your disk IO.
  email:
  - ryan@yeahnah.org
- executables: []
+ executables:
+ - sir-sync-a-lot
  extensions: []
  extra_rdoc_files: []
- files: []
- homepage: ''
+ files:
+ - .gitignore
+ - Gemfile
+ - MIT-LICENSE
+ - README.md
+ - Rakefile
+ - bin/sir-sync-a-lot
+ - lib/sir-sync-a-lot.rb
+ - sir-sync-a-lot.gemspec
+ homepage: https://github.com/ryan-allen/sir-sync-a-lot
  licenses: []
  post_install_message:
  rdoc_options: []