sir-sync-a-lot 0.0.0 → 0.0.1
- data/.gitignore +4 -0
- data/Gemfile +4 -0
- data/MIT-LICENSE +19 -0
- data/README.md +59 -0
- data/Rakefile +1 -0
- data/bin/sir-sync-a-lot +3 -0
- data/lib/sir-sync-a-lot.rb +295 -0
- data/sir-sync-a-lot.gemspec +24 -0
- metadata +15 -6
data/.gitignore
ADDED
data/Gemfile
ADDED
data/MIT-LICENSE
ADDED
@@ -0,0 +1,19 @@
+Copyright (c) 2010 Ryan Allen, Envato Pty Ltd
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,59 @@
+# Sir Sync-A-Lot. An optimised S3 sync tool using the power of *nix!
+
+## Requirements:
+
+* Ruby 1.8.something
+* RubyGems
+* aws-s3 RubyGem
+* OpenSSL
+* find and xargs (should be on your *nix straight outta the box)
+
+To configure, run <code>sir-sync-a-lot setup</code> and follow the prompts; you'll
+need your S3 keys, the local file path you want to back up, the bucket name
+to back up to, and any extra options to pass into find (e.g. for ignoring
+filepaths). It'll write the config to <code>~/.sir-sync-a-lot.yml</code>.
+
+Then to sync, run <code>sir-sync-a-lot sync</code> and away she goes.
+
+## Why?
+
+This library was written because we needed to be able to back up craploads of
+data without having to worry about whether we had enough disk space on the remote.
+That's where S3 is nice.
+
+We tried s3sync but it blew the crap out of our server load (we do in excess of
+500,000 requests a day in page views, not including hits for images and what not,
+and the server needs to stay responsive). The secret sauce is using the *nix
+'find', 'xargs' and 'openssl' commands to generate md5 checksums for comparison.
+Seems to work quite well for us (we have almost 90,000 files to compare).
+
+Initially the plan was to use find with -ctime, but S3 isn't particularly nice about
+returning a full list of objects in a bucket (the default is 1000; I want all
+90,000, and it ignores me when I ask for 1,000,000 objects). Manifest generation
+on a server under load is fast enough and light enough on resources, so we're sticking
+with that in the interim.
+
+## Etc
+
+FYI when you run sync, the output will look something like this:
+
+    [Thu Apr 01 11:50:25 +1100 2010] Starting, performing pre-sync checks...
+    [Thu Apr 01 11:50:26 +1100 2010] Generating local manifest...
+    [Thu Apr 01 11:50:26 +1100 2010] Fetching remote manifest...
+    [Thu Apr 01 11:50:27 +1100 2010] Performing checksum comparison...
+    [Thu Apr 01 11:50:27 +1100 2010] Pushing /tmp/backups/deep/four...
+    [Thu Apr 01 11:50:28 +1100 2010] Pushing /tmp/backups/three...
+    [Thu Apr 01 11:50:29 +1100 2010] Pushing /tmp/backups/two...
+    [Thu Apr 01 11:50:30 +1100 2010] Pushing local manifest up to remote...
+    [Thu Apr 01 11:50:31 +1100 2010] Done like a dinner.
+
+You could pipe sync into a log file, which might be nice; this is what our crontab
+looks like:
+
+    # run sync backups to s3 every day
+    0 1 * * * /usr/local/bin/rvm 1.8.7 ruby /root/sir-sync-a-lot/sir-sync-a-lot 1>> /var/log/sir-sync-a-lot.log 2>> /var/log/sir-sync-a-lot.error
+
+Have fun!
+
+Project wholly sponsored by Envato Pty Ltd. They're the shizzy! P.S. We use this
+in production environments!
data/Rakefile
ADDED
@@ -0,0 +1 @@
+require "bundler/gem_tasks"
data/bin/sir-sync-a-lot
ADDED
data/lib/sir-sync-a-lot.rb
ADDED
@@ -0,0 +1,295 @@
+require 'aws/s3'
+require 'yaml'
+
+class SirSyncalot
+  def self.run!(*args)
+    new(*args).run!
+  end
+
+  [:action, :config].each { |member| attr(member) }
+
+  def initialize(action = "sync")
+    @action = action
+  end
+
+  def run!
+    validate_inputs!
+    perform_action!
+  end
+
+  VERSION = '0.0.1'
+
+  private
+
+  def validate_inputs!
+    if setup_action? and config_exists?
+      exit_with_error!("Can't make a setup, because there's already a configuration in '#{config_path}'.")
+    elsif sync_action? and !config_exists?
+      exit_with_error!("Can't make a sync, because there's no configuration, try '#{__FILE__} setup'.")
+    end
+  end
+
+  def perform_action!
+    if setup_action?
+      aquire_lock! { perform_setup! }
+    elsif sync_action?
+      aquire_lock! { perform_sync! }
+    elsif help_action?
+      display_help!
+    else
+      exit_with_error!("Cannot perform action '#{@action}', try '#{__FILE__} help' for usage.")
+    end
+  end
+
+  def setup_action?
+    action == "setup"
+  end
+
+  def sync_action?
+    action == "sync"
+  end
+
+  def help_action?
+    action == "help"
+  end
+
+  def perform_setup!
+    display("Hello! Ima ask you a few questions, and store the results in #{config_path} for later, OK?")
+
+    config = {}
+
+    config[:aws_access_key] = ask("What is the AWS access key?")
+    config[:aws_secret_key] = ask("What is the AWS secret access key?")
+    display("Just a sec, ima check that works...")
+    if aws_credentials_valid?(config)
+      display("Yep, all good.")
+      config[:aws_dest_bucket] = ask("What bucket should we put your backups in? (If it doesn't exist I'll create it)")
+      if bucket_exists?(config)
+        if bucket_empty?(config)
+          display("I found that the bucket already exists, and it's empty so I'm happy.")
+        else
+          exit_with_error!("I found the bucket to exist, but it's not empty. I can't sync to a bucket that is not empty.")
+        end
+      else
+        display("The bucket doesn't exist, so I'm creating it now...")
+        create_bucket(config)
+        display("OK that's done.")
+      end
+    else
+      exit_with_error!("I couldn't connect to S3 with the credentials you supplied, try again much?")
+    end
+
+    config[:local_file_path] = ask("What is the (absolute) path that you want to back up? (i.e. /var/www not ./www)")
+    if !local_file_path_exists?(config)
+      exit_with_error!("I find that the local file path you supplied doesn't exist, wrong much?")
+    end
+
+    config[:find_options] = ask("Do you have any options for find ? (e.g. \! -path \"*.git*). Press enter for none")
+
+    display("Right, I'm writing out the details you supplied to '#{config_path}' for my future reference...")
+    write_config!(config)
+    display("You're good to go. Next up is '#{__FILE__} sync' to syncronise your files to S3.")
+  end
+
+  def aws_credentials_valid?(config = read_config())
+    AWS::S3::Base.establish_connection!(:access_key_id => config[:aws_access_key], :secret_access_key => config[:aws_secret_key])
+    begin
+      AWS::S3::Service.buckets # AWS::S3 don't try to connect at all until you ask it for something.
+    rescue AWS::S3::InvalidAccessKeyId => e
+      false
+    else
+      true
+    end
+  end
+
+  def bucket_exists?(config = read_config())
+    AWS::S3::Bucket.find(config[:aws_dest_bucket])
+  rescue AWS::S3::NoSuchBucket => e
+    false
+  end
+
+  def bucket_empty?(config = read_config())
+    AWS::S3::Bucket.find(config[:aws_dest_bucket]).empty?
+  end
+
+  def create_bucket(config = read_config())
+    AWS::S3::Bucket.create(config[:aws_dest_bucket])
+  end
+
+  def local_file_path_exists?(config = read_config())
+    File.exist?(config[:local_file_path])
+  end
+
+  def write_config!(config)
+    open(config_path, 'w') { |f| YAML::dump(config, f) }
+  end
+
+  def read_config(reload = false)
+    reload or !@config ? @config = open(config_path, 'r') { |f| YAML::load(f) } : @config
+  end
+
+  def perform_sync!
+    display("Starting, performing pre-sync checks...")
+    if !aws_credentials_valid?
+      exit_with_error!("Couldn't connect to S3 with the credentials in #{config_path}.")
+    end
+
+    if !bucket_exists?
+      exit_with_error!("Can't find the bucket in S3 specified in #{config_path}.")
+    end
+
+    if !local_file_path_exists?
+      exit_with_error!("Local path specified in #{config_path} does not exist.")
+    end
+
+    create_tmp_sync_state
+
+    if last_sync_recorded?
+      display("Performing time based comparison...")
+      files_modified_since_last_sync
+    else
+      display("Performing (potentially expensive) checksum comparison...")
+      display("Generating local manifest...")
+      generate_local_manifest
+      display("Traversing S3 for remote manifest...")
+      fetch_remote_manifest
+      # note that we do not remove files on s3 that no longer exist on local host. this behaviour
+      # may be desirable (ala rsync --delete) but we currently don't support it. ok? sweet.
+      display("Performing checksum comparison...")
+      files_on_localhost_with_checksums - files_on_s3
+    end.each { |file| push_file(file) }
+
+    finalize_sync_state
+
+    display("Done like a dinner.")
+  end
+
+  def last_sync_recorded?
+    File.exist?(last_sync_completed)
+  end
+
+  def create_tmp_sync_state
+    `touch #{last_sync_started}`
+  end
+
+  def finalize_sync_state
+    `cp #{last_sync_started} #{last_sync_completed}`
+  end
+
+  def last_sync_started
+    ENV['HOME'] + "/.sir-sync-a-lot.last-sync.started"
+  end
+
+  def last_sync_completed
+    ENV['HOME'] + "/.sir-sync-a-lot.last-sync.completed"
+  end
+
+  def files_modified_since_last_sync
+    # '! -type d' ignores directories, in local manifest directories are spit out to stderr whereas directories pop up in this query
+    `find #{read_config[:local_file_path]} #{read_config[:find_options]} \! -type d -cnewer #{last_sync_completed}`.split("\n").collect { |path| {:path => path } }
+  end
+
+  def update_config_with_sync_state(sync_start)
+    config = read_config()
+    config[:last_sync_at] = sync_start
+    write_config!(config)
+  end
+
+  def generate_local_manifest
+    `find #{read_config[:local_file_path]} #{read_config[:find_options]} -print0 | xargs -0 openssl md5 2> /dev/null > /tmp/sir-sync-a-lot.manifest.local`
+  end
+
+  def fetch_remote_manifest
+    @remote_objects_cache = [] # instance vars feel like global variables somehow
+    traverse_s3_for_objects(AWS::S3::Bucket.find(read_config[:aws_dest_bucket]), @remote_objects_cache)
+  end
+
+  def traverse_s3_for_objects(bucket, collection, n = 1000, upto = 0, marker = nil)
+    objects = bucket.objects(:marker => marker, :max_keys => n)
+    if objects.size == 0
+      return
+    else
+      objects.each { |object| collection << {:path => "/#{object.key}", :checksum => object.etag} }
+      traverse_s3_for_objects(bucket, collection, n, upto+n, objects.last.key)
+    end
+  end
+
+  def files_on_localhost_with_checksums
+    parse_manifest(local_manifest_path)
+  end
+
+  def files_on_s3
+    @remote_objects_cache
+  end
+
+  def local_manifest_path
+    "/tmp/sir-sync-a-lot.manifest.local"
+  end
+
+  def parse_manifest(location)
+    if File.exist?(location)
+      open(location, 'r') do |file|
+        file.collect do |line|
+          path, checksum = *line.chomp.match(/^MD5\((.*)\)= (.*)$/).captures
+          {:path => path, :checksum => checksum}
+        end
+      end
+    else
+      []
+    end
+  end
+
+  def push_file(file)
+    # xfer speed, logging, etc can occur in this method
+    display("Pushing #{file[:path]}...")
+    AWS::S3::S3Object.store(file[:path], open(file[:path]), read_config[:aws_dest_bucket])
+  rescue
+    display("ERROR: Could not push '#{file[:path]}': #{$!.inspect}")
+  end
+
+  def aquire_lock!
+    if File.exist?(lock_path)
+      # better way is to write out the pid ($$) and read it back in, to make sure it's the same
+      exit_with_error!("Found a lock at #{lock_path}, is another instance of #{__FILE__} running?")
+    end
+
+    begin
+      system("touch #{lock_path}")
+      yield
+    ensure
+      system("rm #{lock_path}")
+    end
+  end
+
+  def display_help!
+    display("Go help yourself buddy!")
+  end
+
+  def exit_with_error!(message)
+    display("Gah! " + message)
+    exit
+  end
+
+  def display(message)
+    puts("[#{Time.now}] #{message}")
+  end
+
+  def ask(question)
+    print(question + ": ")
+    $stdin.readline.chomp # gets doesn't work here!
+  end
+
+  def config_exists?
+    File.exist?(config_path)
+  end
+
+  def config_path
+    ENV['HOME'] + "/.sir-sync-a-lot.yml"
+  end
+
+  def lock_path
+    ENV['HOME'] + "/.sir-sync-a-lot.lock"
+  end
+
+end
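The heart of perform_sync! above is a plain Array difference over {:path, :checksum} hashes: any local entry whose exact path/checksum pair is absent from the remote manifest gets pushed. A minimal self-contained illustration with sample data (the paths and checksums here are made up):

```ruby
# Parse openssl-style manifest lines ("MD5(path)= checksum") the same way
# parse_manifest does, then diff against a remote listing. Ruby hashes
# compare by value, so Array#- drops every local entry that has an
# identical {:path, :checksum} hash on the remote side.
def parse_lines(lines)
  lines.map do |line|
    path, checksum = *line.chomp.match(/^MD5\((.*)\)= (.*)$/).captures
    { :path => path, :checksum => checksum }
  end
end

local = parse_lines([
  "MD5(/tmp/backups/one)= aaa111",
  "MD5(/tmp/backups/two)= bbb222"
])
remote = [{ :path => "/tmp/backups/one", :checksum => "aaa111" }]

to_push = local - remote
puts to_push.inspect # only /tmp/backups/two remains
```

Note this also catches modified files: a changed file keeps its path but gets a new checksum, so its pair no longer matches any remote entry and it is re-pushed.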
data/sir-sync-a-lot.gemspec
ADDED
@@ -0,0 +1,24 @@
+# -*- encoding: utf-8 -*-
+$:.push File.expand_path("../lib", __FILE__)
+require "sir-sync-a-lot"
+
+Gem::Specification.new do |s|
+  s.name        = "sir-sync-a-lot"
+  s.version     = SirSyncalot::VERSION
+  s.authors     = ["Ryan Allen"]
+  s.email       = ["ryan@yeahnah.org"]
+  s.homepage    = "https://github.com/ryan-allen/sir-sync-a-lot"
+  s.summary     = %q{Baby got backups!}
+  s.description = %q{Optimised S3 backup tool. Uses linux's find and xargs to find updated files as to not exhaust your disk IO.}
+
+  s.rubyforge_project = "sir-sync-a-lot"
+
+  s.files         = `git ls-files`.split("\n")
+  s.test_files    = `git ls-files -- {test,spec,features}/*`.split("\n")
+  s.executables   = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
+  s.require_paths = ["lib"]
+
+  # specify any dependencies here; for example:
+  # s.add_development_dependency "rspec"
+  s.add_runtime_dependency "aws-s3"
+end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: sir-sync-a-lot
 version: !ruby/object:Gem::Version
-  version: 0.0.0
+  version: 0.0.1
 prerelease:
 platform: ruby
 authors:
@@ -13,7 +13,7 @@ date: 2011-09-14 00:00:00.000000000Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: aws-s3
-  requirement: &
+  requirement: &70302277542320 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -21,16 +21,25 @@ dependencies:
       version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *70302277542320
 description: Optimised S3 backup tool. Uses linux's find and xargs to find updated
   files as to not exhaust your disk IO.
 email:
 - ryan@yeahnah.org
-executables:
+executables:
+- sir-sync-a-lot
 extensions: []
 extra_rdoc_files: []
-files:
-
+files:
+- .gitignore
+- Gemfile
+- MIT-LICENSE
+- README.md
+- Rakefile
+- bin/sir-sync-a-lot
+- lib/sir-sync-a-lot.rb
+- sir-sync-a-lot.gemspec
 homepage: https://github.com/ryan-allen/sir-sync-a-lot
 licenses: []
 post_install_message:
 rdoc_options: []