sir-sync-a-lot 0.0.0 → 0.0.1

data/.gitignore ADDED
@@ -0,0 +1,4 @@
1
+ *.gem
2
+ .bundle
3
+ Gemfile.lock
4
+ pkg/*
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source "http://rubygems.org"
2
+
3
+ # Specify your gem's dependencies in sir-sync-a-lot.gemspec
4
+ gemspec
data/MIT-LICENSE ADDED
@@ -0,0 +1,19 @@
1
+ Copyright (c) 2010 Ryan Allen, Envato Pty Ltd
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ of this software and associated documentation files (the "Software"), to deal
5
+ in the Software without restriction, including without limitation the rights
6
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ copies of the Software, and to permit persons to whom the Software is
8
+ furnished to do so, subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in
11
+ all copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,59 @@
1
+ # Sir Sync-A-Lot. An optimised S3 sync tool using the power of *nix!
2
+
3
+ ## Requirements:
4
+
5
+ * Ruby 1.8.something
6
+ * RubyGems
7
+ * aws-s3 RubyGem
8
+ * OpenSSL
9
+ * find and xargs (should be on your *nix straight outta the box)
10
+
11
+ To configure, run <code>sir-sync-a-lot setup</code> and follow the prompts. You'll
12
+ need your S3 keys, the local file path you want to back up, the bucket name
13
+ to back up to, and any extra options to pass into find (e.g. for ignoring
14
+ filepaths etc). It'll write the config to <code>~/.sir-sync-a-lot.yml</code>.
15
+
16
+ Then to sync, run <code>sir-sync-a-lot sync</code> and away she goes.
17
+
18
+ ## Why?
19
+
20
+ This library was written because we needed to be able to back up craploads of
21
+ data without having to worry about if we had enough disk space on the remote.
22
+ That's where S3 is nice.
23
+
24
+ We tried s3sync but it blew the crap out of our server load (we do in excess of
25
+ 500,000 requests a day, counting page views only, not hits for images and whatnot,
26
+ and the server needs to stay responsive). The secret sauce is using the *nix
27
+ 'find', 'xargs' and 'openssl' commands to generate md5 checksums for comparison.
28
+ Seems to work quite well for us (we have almost 90,000 files to compare).
29
+
30
+ Initially the plan was to use find with -ctime but S3 isn't particularly nice about
31
+ returning a full list of objects in a bucket (default is 1000, and I want all
32
+ 90,000, and it ignores me when I ask for 1,000,000 objects). Manifest generation
33
+ on a server under load is fast enough and low enough on resources so we're sticking
34
+ with that in the interim.
35
+
36
+ ## Etc
37
+
38
+ FYI when you run sync, the output will look something like this:
39
+
40
+ [Thu Apr 01 11:50:25 +1100 2010] Starting, performing pre-sync checks...
41
+ [Thu Apr 01 11:50:26 +1100 2010] Generating local manifest...
42
+ [Thu Apr 01 11:50:26 +1100 2010] Fetching remote manifest...
43
+ [Thu Apr 01 11:50:27 +1100 2010] Performing checksum comparison...
44
+ [Thu Apr 01 11:50:27 +1100 2010] Pushing /tmp/backups/deep/four...
45
+ [Thu Apr 01 11:50:28 +1100 2010] Pushing /tmp/backups/three...
46
+ [Thu Apr 01 11:50:29 +1100 2010] Pushing /tmp/backups/two...
47
+ [Thu Apr 01 11:50:30 +1100 2010] Pushing local manifest up to remote...
48
+ [Thu Apr 01 11:50:31 +1100 2010] Done like a dinner.
49
+
50
+ You could pipe sync into a log file, which might be nice. This is what our crontab
51
+ looks like:
52
+
53
+ # run sync backups to s3 every day
54
+ 0 1 * * * /usr/local/bin/rvm 1.8.7 ruby /root/sir-sync-a-lot/sir-sync-a-lot 1>> /var/log/sir-sync-a-lot.log 2>> /var/log/sir-sync-a-lot.error
55
+
56
+ Have fun!
57
+
58
+ Project wholly sponsored by Envato Pty Ltd. They're the shizzy! P.S. We use this
59
+ in production environments!
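The checksum comparison the README describes can be sketched roughly as follows. This is a minimal illustration, not the gem's code: `parse_manifest_lines` and `files_to_push` are hypothetical names, and the sample paths and checksums are made up. It assumes manifest lines in the `MD5(path)= checksum` form that the library's `parse_manifest` matches.

```ruby
# Hypothetical helpers for illustration; the names are not from the gem.
MANIFEST_LINE = /\AMD5\((.*)\)= (\h+)\z/

# Turn "MD5(/path)= checksum" lines into {:path, :checksum} hashes,
# skipping any line that does not match the expected form.
def parse_manifest_lines(lines)
  lines.map do |line|
    match = MANIFEST_LINE.match(line.chomp)
    match && { :path => match[1], :checksum => match[2] }
  end.compact
end

# Array difference on hashes: keep the local entries that have no
# identical path-and-checksum pair on the remote side.
def files_to_push(local, remote)
  local - remote
end

# Made-up sample data.
local = parse_manifest_lines([
  "MD5(/tmp/backups/two)= aaaa1111\n",
  "MD5(/tmp/backups/three)= bbbb2222\n",
  "not a manifest line\n"
])
remote = [{ :path => "/tmp/backups/two", :checksum => "aaaa1111" }]

files_to_push(local, remote).each { |file| puts "would push #{file[:path]}" }
```

A file whose content changed shows up in the result because its checksum no longer equals the remote one. Entries that exist only on the remote side are ignored, so nothing is ever deleted from the bucket.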
data/Rakefile ADDED
@@ -0,0 +1 @@
1
+ require "bundler/gem_tasks"
data/bin/sir-sync-a-lot ADDED
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require_relative '../lib/sir-sync-a-lot'
3
+ SirSyncalot.run!(*ARGV)
data/lib/sir-sync-a-lot.rb ADDED
@@ -0,0 +1,295 @@
1
+ require 'aws/s3'
2
+ require 'yaml'
3
+
4
+ class SirSyncalot
5
+ def self.run!(*args)
6
+ new(*args).run!
7
+ end
8
+
9
+ [:action, :config].each { |member| attr(member) }
10
+
11
+ def initialize(action = "sync")
12
+ @action = action
13
+ end
14
+
15
+ def run!
16
+ validate_inputs!
17
+ perform_action!
18
+ end
19
+
20
+ VERSION = '0.0.1'
21
+
22
+ private
23
+
24
+ def validate_inputs!
25
+ if setup_action? and config_exists?
26
+ exit_with_error!("Can't make a setup, because there's already a configuration in '#{config_path}'.")
27
+ elsif sync_action? and !config_exists?
28
+ exit_with_error!("Can't make a sync, because there's no configuration, try '#{__FILE__} setup'.")
29
+ end
30
+ end
31
+
32
+ def perform_action!
33
+ if setup_action?
34
+ aquire_lock! { perform_setup! }
35
+ elsif sync_action?
36
+ aquire_lock! { perform_sync! }
37
+ elsif help_action?
38
+ display_help!
39
+ else
40
+ exit_with_error!("Cannot perform action '#{@action}', try '#{__FILE__} help' for usage.")
41
+ end
42
+ end
43
+
44
+ def setup_action?
45
+ action == "setup"
46
+ end
47
+
48
+ def sync_action?
49
+ action == "sync"
50
+ end
51
+
52
+ def help_action?
53
+ action == "help"
54
+ end
55
+
56
+ def perform_setup!
57
+ display("Hello! Ima ask you a few questions, and store the results in #{config_path} for later, OK?")
58
+
59
+ config = {}
60
+
61
+ config[:aws_access_key] = ask("What is the AWS access key?")
62
+ config[:aws_secret_key] = ask("What is the AWS secret access key?")
63
+ display("Just a sec, ima check that works...")
64
+ if aws_credentials_valid?(config)
65
+ display("Yep, all good.")
66
+ config[:aws_dest_bucket] = ask("What bucket should we put your backups in? (If it doesn't exist I'll create it)")
67
+ if bucket_exists?(config)
68
+ if bucket_empty?(config)
69
+ display("I found that the bucket already exists, and it's empty so I'm happy.")
70
+ else
71
+ exit_with_error!("I found the bucket to exist, but it's not empty. I can't sync to a bucket that is not empty.")
72
+ end
73
+ else
74
+ display("The bucket doesn't exist, so I'm creating it now...")
75
+ create_bucket(config)
76
+ display("OK that's done.")
77
+ end
78
+ else
79
+ exit_with_error!("I couldn't connect to S3 with the credentials you supplied, try again much?")
80
+ end
81
+
82
+ config[:local_file_path] = ask("What is the (absolute) path that you want to back up? (i.e. /var/www not ./www)")
83
+ if !local_file_path_exists?(config)
84
+ exit_with_error!("I find that the local file path you supplied doesn't exist, wrong much?")
85
+ end
86
+
87
+ config[:find_options] = ask("Do you have any options for find? (e.g. \! -path \"*.git*\"). Press enter for none")
88
+
89
+ display("Right, I'm writing out the details you supplied to '#{config_path}' for my future reference...")
90
+ write_config!(config)
91
+ display("You're good to go. Next up is '#{__FILE__} sync' to synchronise your files to S3.")
92
+ end
93
+
94
+ def aws_credentials_valid?(config = read_config())
95
+ AWS::S3::Base.establish_connection!(:access_key_id => config[:aws_access_key], :secret_access_key => config[:aws_secret_key])
96
+ begin
97
+ AWS::S3::Service.buckets # AWS::S3 doesn't try to connect at all until you ask it for something.
98
+ rescue AWS::S3::InvalidAccessKeyId => e
99
+ false
100
+ else
101
+ true
102
+ end
103
+ end
104
+
105
+ def bucket_exists?(config = read_config())
106
+ AWS::S3::Bucket.find(config[:aws_dest_bucket])
107
+ rescue AWS::S3::NoSuchBucket => e
108
+ false
109
+ end
110
+
111
+ def bucket_empty?(config = read_config())
112
+ AWS::S3::Bucket.find(config[:aws_dest_bucket]).empty?
113
+ end
114
+
115
+ def create_bucket(config = read_config())
116
+ AWS::S3::Bucket.create(config[:aws_dest_bucket])
117
+ end
118
+
119
+ def local_file_path_exists?(config = read_config())
120
+ File.exist?(config[:local_file_path])
121
+ end
122
+
123
+ def write_config!(config)
124
+ open(config_path, 'w') { |f| YAML::dump(config, f) }
125
+ end
126
+
127
+ def read_config(reload = false)
128
+ (reload || !@config) ? @config = open(config_path, 'r') { |f| YAML::load(f) } : @config
129
+ end
130
+
131
+ def perform_sync!
132
+ display("Starting, performing pre-sync checks...")
133
+ if !aws_credentials_valid?
134
+ exit_with_error!("Couldn't connect to S3 with the credentials in #{config_path}.")
135
+ end
136
+
137
+ if !bucket_exists?
138
+ exit_with_error!("Can't find the bucket in S3 specified in #{config_path}.")
139
+ end
140
+
141
+ if !local_file_path_exists?
142
+ exit_with_error!("Local path specified in #{config_path} does not exist.")
143
+ end
144
+
145
+ create_tmp_sync_state
146
+
147
+ if last_sync_recorded?
148
+ display("Performing time based comparison...")
149
+ files_modified_since_last_sync
150
+ else
151
+ display("Performing (potentially expensive) checksum comparison...")
152
+ display("Generating local manifest...")
153
+ generate_local_manifest
154
+ display("Traversing S3 for remote manifest...")
155
+ fetch_remote_manifest
156
+ # note that we do not remove files on s3 that no longer exist on local host. this behaviour
157
+ # may be desirable (ala rsync --delete) but we currently don't support it. ok? sweet.
158
+ display("Performing checksum comparison...")
159
+ files_on_localhost_with_checksums - files_on_s3
160
+ end.each { |file| push_file(file) }
161
+
162
+ finalize_sync_state
163
+
164
+ display("Done like a dinner.")
165
+ end
166
+
167
+ def last_sync_recorded?
168
+ File.exist?(last_sync_completed)
169
+ end
170
+
171
+ def create_tmp_sync_state
172
+ `touch #{last_sync_started}`
173
+ end
174
+
175
+ def finalize_sync_state
176
+ `cp #{last_sync_started} #{last_sync_completed}`
177
+ end
178
+
179
+ def last_sync_started
180
+ ENV['HOME'] + "/.sir-sync-a-lot.last-sync.started"
181
+ end
182
+
183
+ def last_sync_completed
184
+ ENV['HOME'] + "/.sir-sync-a-lot.last-sync.completed"
185
+ end
186
+
187
+ def files_modified_since_last_sync
188
+ # '! -type d' ignores directories, in local manifest directories are spit out to stderr whereas directories pop up in this query
189
+ `find #{read_config[:local_file_path]} #{read_config[:find_options]} \! -type d -cnewer #{last_sync_completed}`.split("\n").collect { |path| {:path => path } }
190
+ end
191
+
192
+ def update_config_with_sync_state(sync_start)
193
+ config = read_config()
194
+ config[:last_sync_at] = sync_start
195
+ write_config!(config)
196
+ end
197
+
198
+ def generate_local_manifest
199
+ `find #{read_config[:local_file_path]} #{read_config[:find_options]} -print0 | xargs -0 openssl md5 2> /dev/null > /tmp/sir-sync-a-lot.manifest.local`
200
+ end
201
+
202
+ def fetch_remote_manifest
203
+ @remote_objects_cache = [] # instance vars feel like global variables somehow
204
+ traverse_s3_for_objects(AWS::S3::Bucket.find(read_config[:aws_dest_bucket]), @remote_objects_cache)
205
+ end
206
+
207
+ def traverse_s3_for_objects(bucket, collection, n = 1000, upto = 0, marker = nil)
208
+ objects = bucket.objects(:marker => marker, :max_keys => n)
209
+ if objects.size == 0
210
+ return
211
+ else
212
+ objects.each { |object| collection << {:path => "/#{object.key}", :checksum => object.etag} }
213
+ traverse_s3_for_objects(bucket, collection, n, upto+n, objects.last.key)
214
+ end
215
+ end
216
+
217
+ def files_on_localhost_with_checksums
218
+ parse_manifest(local_manifest_path)
219
+ end
220
+
221
+ def files_on_s3
222
+ @remote_objects_cache
223
+ end
224
+
225
+ def local_manifest_path
226
+ "/tmp/sir-sync-a-lot.manifest.local"
227
+ end
228
+
229
+ def parse_manifest(location)
230
+ if File.exist?(location)
231
+ open(location, 'r') do |file|
232
+ file.collect do |line|
233
+ path, checksum = *line.chomp.match(/^MD5\((.*)\)= (.*)$/).captures
234
+ {:path => path, :checksum => checksum}
235
+ end
236
+ end
237
+ else
238
+ []
239
+ end
240
+ end
241
+
242
+ def push_file(file)
243
+ # xfer speed, logging, etc can occur in this method
244
+ display("Pushing #{file[:path]}...")
245
+ AWS::S3::S3Object.store(file[:path], open(file[:path]), read_config[:aws_dest_bucket])
246
+ rescue
247
+ display("ERROR: Could not push '#{file[:path]}': #{$!.inspect}")
248
+ end
249
+
250
+ def aquire_lock!
251
+ if File.exist?(lock_path)
252
+ # better way is to write out the pid ($$) and read it back in, to make sure it's the same
253
+ exit_with_error!("Found a lock at #{lock_path}, is another instance of #{__FILE__} running?")
254
+ end
255
+
256
+ begin
257
+ system("touch #{lock_path}")
258
+ yield
259
+ ensure
260
+ system("rm #{lock_path}")
261
+ end
262
+ end
263
+
264
+
265
+ def display_help!
266
+ display("Go help yourself buddy!")
267
+ end
268
+
269
+ def exit_with_error!(message)
270
+ display("Gah! " + message)
271
+ exit(1)
272
+ end
273
+
274
+ def display(message)
275
+ puts("[#{Time.now}] #{message}")
276
+ end
277
+
278
+ def ask(question)
279
+ print(question + ": ")
280
+ $stdin.readline.chomp # gets doesn't work here!
281
+ end
282
+
283
+ def config_exists?
284
+ File.exist?(config_path)
285
+ end
286
+
287
+ def config_path
288
+ ENV['HOME'] + "/.sir-sync-a-lot.yml"
289
+ end
290
+
291
+ def lock_path
292
+ ENV['HOME'] + "/.sir-sync-a-lot.lock"
293
+ end
294
+
295
+ end
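The comment in `aquire_lock!` suggests writing the pid to the lock file and reading it back. Here is a minimal sketch of that idea, assuming a POSIX system where `Process.kill(0, pid)` can probe for a live process. `with_pid_lock`, `process_alive?` and `LockHeldError` are hypothetical names and not part of the gem.

```ruby
require 'tmpdir'

# Hypothetical error class for this sketch.
class LockHeldError < StandardError; end

# Signal 0 only checks whether the process exists; nothing is delivered.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # the process exists but belongs to another user
end

# Hold a pid-stamped lock file for the duration of the block.
def with_pid_lock(lock_path)
  if File.exist?(lock_path)
    pid = File.read(lock_path).to_i
    if pid > 0 && process_alive?(pid)
      raise LockHeldError, "lock at #{lock_path} is held by pid #{pid}"
    end
    File.delete(lock_path) # stale lock left behind by a dead process
  end

  begin
    # File::EXCL makes creation fail if another process got there first.
    File.open(lock_path, File::WRONLY | File::CREAT | File::EXCL) do |f|
      f.write(Process.pid.to_s)
    end
  rescue Errno::EEXIST
    raise LockHeldError, "lock at #{lock_path} was taken by another process"
  end

  begin
    yield
  ensure
    File.delete(lock_path) if File.exist?(lock_path)
  end
end

# Usage with a throwaway directory instead of ENV['HOME'].
Dir.mktmpdir do |dir|
  lock = File.join(dir, "example.lock")
  with_pid_lock(lock) { puts "holding lock as pid #{File.read(lock)}" }
  puts "lock released: #{!File.exist?(lock)}"
end
```

Compared with a bare `touch`, a lock left behind by a crashed run no longer blocks every later sync, because the recorded pid can be checked before giving up.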
data/sir-sync-a-lot.gemspec ADDED
@@ -0,0 +1,24 @@
1
+ # -*- encoding: utf-8 -*-
2
+ $:.push File.expand_path("../lib", __FILE__)
3
+ require "sir-sync-a-lot"
4
+
5
+ Gem::Specification.new do |s|
6
+ s.name = "sir-sync-a-lot"
7
+ s.version = SirSyncalot::VERSION
8
+ s.authors = ["Ryan Allen"]
9
+ s.email = ["ryan@yeahnah.org"]
10
+ s.homepage = "https://github.com/ryan-allen/sir-sync-a-lot"
11
+ s.summary = %q{Baby got backups!}
12
+ s.description = %q{Optimised S3 backup tool. Uses linux's find and xargs to find updated files as to not exaust your disk IO.}
13
+
14
+ s.rubyforge_project = "sir-sync-a-lot"
15
+
16
+ s.files = `git ls-files`.split("\n")
17
+ s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
18
+ s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
19
+ s.require_paths = ["lib"]
20
+
21
+ # specify any dependencies here; for example:
22
+ # s.add_development_dependency "rspec"
23
+ s.add_runtime_dependency "aws-s3"
24
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sir-sync-a-lot
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.0
4
+ version: 0.0.1
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -13,7 +13,7 @@ date: 2011-09-14 00:00:00.000000000Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: aws-s3
16
- requirement: &70288454419240 !ruby/object:Gem::Requirement
16
+ requirement: &70302277542320 !ruby/object:Gem::Requirement
17
17
  none: false
18
18
  requirements:
19
19
  - - ! '>='
@@ -21,16 +21,25 @@ dependencies:
21
21
  version: '0'
22
22
  type: :runtime
23
23
  prerelease: false
24
- version_requirements: *70288454419240
24
+ version_requirements: *70302277542320
25
25
  description: Optimised S3 backup tool. Uses linux's find and xargs to find updated
26
26
  files as to not exaust your disk IO.
27
27
  email:
28
28
  - ryan@yeahnah.org
29
- executables: []
29
+ executables:
30
+ - sir-sync-a-lot
30
31
  extensions: []
31
32
  extra_rdoc_files: []
32
- files: []
33
- homepage: ''
33
+ files:
34
+ - .gitignore
35
+ - Gemfile
36
+ - MIT-LICENSE
37
+ - README.md
38
+ - Rakefile
39
+ - bin/sir-sync-a-lot
40
+ - lib/sir-sync-a-lot.rb
41
+ - sir-sync-a-lot.gemspec
42
+ homepage: https://github.com/ryan-allen/sir-sync-a-lot
34
43
  licenses: []
35
44
  post_install_message:
36
45
  rdoc_options: []