sir-sync-a-lot 0.0.0 → 0.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/.gitignore +4 -0
- data/Gemfile +4 -0
- data/MIT-LICENSE +19 -0
- data/README.md +59 -0
- data/Rakefile +1 -0
- data/bin/sir-sync-a-lot +3 -0
- data/lib/sir-sync-a-lot.rb +295 -0
- data/sir-sync-a-lot.gemspec +24 -0
- metadata +15 -6
data/.gitignore
ADDED
data/Gemfile
ADDED
data/MIT-LICENSE
ADDED
@@ -0,0 +1,19 @@
+Copyright (c) 2010 Ryan Allen, Envato Pty Ltd
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,59 @@
+# Sir Sync-A-Lot. An optimised S3 sync tool using the power of *nix!
+
+## Requirements:
+
+* Ruby 1.8.something
+* RubyGems
+* aws-s3 RubyGem
+* OpenSSL
+* find and xargs (should be on your *nix straight outta the box)
+
+To configure, run <code>sir-sync-a-lot setup</code> and follow the prompts, you'll
+need your S3 keys, the local file path you want to back up, the bucket name
+to back up to, and any extra options to pass into find (i.e. for ignoring
+filepaths etc). It'll write the config to <code>~/.sir-sync-a-lot.yml</code>.
+
+Then to sync, run <code>sir-sync-a-lot sync</code> and away she goes.
+
+## Why?
+
+This library was written because we needed to be able to back up craploads of
+data without having to worry about if we had enough disk space on the remote.
+That's where S3 is nice.
+
+We tried s3sync but it blew the crap out of our server load (we do in excess of
+500,000 requests a day (page views, not including hits for images and what not,
+and the server needs to stay responsive). The secret sauce is using the *nix
+'find', 'xargs' and 'openssl' commands to generate md5 checksums for comparison.
+Seems to work quite well for us (we have almost 90,000 files to compare).
+
+Initially the plan was to use find with -ctime but S3 isn't particulary nice about
+returning a full list of objects in a bucket (default is 1000, and I want all
+90,000, and it ignores me when I ask for 1,000,000 objects). Manifest generation
+on a server under load is fast enough and low enough on resources so we're sticking
+with that in the interim.
+
+## Etc
+
+FYI when you run sync, the output will look something like this:
+
+[Thu Apr 01 11:50:25 +1100 2010] Starting, performing pre-sync checks...
+[Thu Apr 01 11:50:26 +1100 2010] Generating local manifest...
+[Thu Apr 01 11:50:26 +1100 2010] Fetching remote manifest...
+[Thu Apr 01 11:50:27 +1100 2010] Performing checksum comparison...
+[Thu Apr 01 11:50:27 +1100 2010] Pushing /tmp/backups/deep/four...
+[Thu Apr 01 11:50:28 +1100 2010] Pushing /tmp/backups/three...
+[Thu Apr 01 11:50:29 +1100 2010] Pushing /tmp/backups/two...
+[Thu Apr 01 11:50:30 +1100 2010] Pushing local manifest up to remote...
+[Thu Apr 01 11:50:31 +1100 2010] Done like a dinner.
+
+You could pipe sync into a log file, which might be nice, this is what our crontab
+looks like:
+
+# run sync backups to s3 every day
+0 1 * * * /usr/local/bin/rvm 1.8.7 ruby /root/sir-sync-a-lot/sir-sync-a-lot 1>> /var/log/sir-sync-a-lot.log 2>> /var/log/sir-sync-a-lot.error
+
+Have fun!
+
+Project wholy sponsored by Envato Pty Ltd. They're the shizzy! P.S. We use this
+in production environments!
data/Rakefile
ADDED
@@ -0,0 +1 @@
+require "bundler/gem_tasks"
data/bin/sir-sync-a-lot
ADDED
data/lib/sir-sync-a-lot.rb
ADDED
@@ -0,0 +1,295 @@
+require 'aws/s3'
+require 'yaml'
+
+class SirSyncalot
+  def self.run!(*args)
+    new(*args).run!
+  end
+
+  [:action, :config].each { |member| attr(member) }
+
+  def initialize(action = "sync")
+    @action = action
+  end
+
+  def run!
+    validate_inputs!
+    perform_action!
+  end
+
+  VERSION = '0.0.1'
+
+private
+
+  def validate_inputs!
+    if setup_action? and config_exists?
+      exit_with_error!("Can't make a setup, because there's already a configuration in '#{config_path}'.")
+    elsif sync_action? and !config_exists?
+      exit_with_error!("Can't make a sync, because there's no configuration, try '#{__FILE__} setup'.")
+    end
+  end
+
+  def perform_action!
+    if setup_action?
+      aquire_lock! { perform_setup! }
+    elsif sync_action?
+      aquire_lock! { perform_sync! }
+    elsif help_action?
+      display_help!
+    else
+      exit_with_error!("Cannot perform action '#{@action}', try '#{__FILE__} help' for usage.")
+    end
+  end
+
+  def setup_action?
+    action == "setup"
+  end
+
+  def sync_action?
+    action == "sync"
+  end
+
+  def help_action?
+    action == "help"
+  end
+
+  def perform_setup!
+    display("Hello! Ima ask you a few questions, and store the results in #{config_path} for later, OK?")
+
+    config = {}
+
+    config[:aws_access_key] = ask("What is the AWS access key?")
+    config[:aws_secret_key] = ask("What is the AWS secret access key?")
+    display("Just a sec, ima check that works...")
+    if aws_credentials_valid?(config)
+      display("Yep, all good.")
+      config[:aws_dest_bucket] = ask("What bucket should we put your backups in? (If it doesn't exist I'll create it)")
+      if bucket_exists?(config)
+        if bucket_empty?(config)
+          display("I found that the bucket already exists, and it's empty so I'm happy.")
+        else
+          exit_with_error!("I found the bucket to exist, but it's not empty. I can't sync to a bucket that is not empty.")
+        end
+      else
+        display("The bucket doesn't exist, so I'm creating it now...")
+        create_bucket(config)
+        display("OK that's done.")
+      end
+    else
+      exit_with_error!("I couldn't connect to S3 with the credentials you supplied, try again much?")
+    end
+
+    config[:local_file_path] = ask("What is the (absolute) path that you want to back up? (i.e. /var/www not ./www)")
+    if !local_file_path_exists?(config)
+      exit_with_error!("I find that the local file path you supplied doesn't exist, wrong much?")
+    end
+
+    config[:find_options] = ask("Do you have any options for find ? (e.g. \! -path \"*.git*). Press enter for none")
+
+    display("Right, I'm writing out the details you supplied to '#{config_path}' for my future reference...")
+    write_config!(config)
+    display("You're good to go. Next up is '#{__FILE__} sync' to syncronise your files to S3.")
+  end
+
+  def aws_credentials_valid?(config = read_config())
+    AWS::S3::Base.establish_connection!(:access_key_id => config[:aws_access_key], :secret_access_key => config[:aws_secret_key])
+    begin
+      AWS::S3::Service.buckets # AWS::S3 don't try to connect at all until you ask it for something.
+    rescue AWS::S3::InvalidAccessKeyId => e
+      false
+    else
+      true
+    end
+  end
+
+  def bucket_exists?(config = read_config())
+    AWS::S3::Bucket.find(config[:aws_dest_bucket])
+  rescue AWS::S3::NoSuchBucket => e
+    false
+  end
+
+  def bucket_empty?(config = read_config())
+    AWS::S3::Bucket.find(config[:aws_dest_bucket]).empty?
+  end
+
+  def create_bucket(config = read_config())
+    AWS::S3::Bucket.create(config[:aws_dest_bucket])
+  end
+
+  def local_file_path_exists?(config = read_config())
+    File.exist?(config[:local_file_path])
+  end
+
+  def write_config!(config)
+    open(config_path, 'w') { |f| YAML::dump(config, f) }
+  end
+
+  def read_config(reload = false)
+    reload or !@config ? @config = open(config_path, 'r') { |f| YAML::load(f) } : @config
+  end
+
+  def perform_sync!
+    display("Starting, performing pre-sync checks...")
+    if !aws_credentials_valid?
+      exit_with_error!("Couldn't connect to S3 with the credentials in #{config_path}.")
+    end
+
+    if !bucket_exists?
+      exit_with_error!("Can't find the bucket in S3 specified in #{config_path}.")
+    end
+
+    if !local_file_path_exists?
+      exit_with_error!("Local path specified in #{config_path} does not exist.")
+    end
+
+    create_tmp_sync_state
+
+    if last_sync_recorded?
+      display("Performing time based comparison...")
+      files_modified_since_last_sync
+    else
+      display("Performing (potentially expensive) checksum comparison...")
+      display("Generating local manifest...")
+      generate_local_manifest
+      display("Traversing S3 for remote manifest...")
+      fetch_remote_manifest
+      # note that we do not remove files on s3 that no longer exist on local host. this behaviour
+      # may be desirable (ala rsync --delete) but we currently don't support it. ok? sweet.
+      display("Performing checksum comparison...")
+      files_on_localhost_with_checksums - files_on_s3
+    end.each { |file| push_file(file) }
+
+    finalize_sync_state
+
+    display("Done like a dinner.")
+  end
+
+  def last_sync_recorded?
+    File.exist?(last_sync_completed)
+  end
+
+  def create_tmp_sync_state
+    `touch #{last_sync_started}`
+  end
+
+  def finalize_sync_state
+    `cp #{last_sync_started} #{last_sync_completed}`
+  end
+
+  def last_sync_started
+    ENV['HOME'] + "/.sir-sync-a-lot.last-sync.started"
+  end
+
+  def last_sync_completed
+    ENV['HOME'] + "/.sir-sync-a-lot.last-sync.completed"
+  end
+
+  def files_modified_since_last_sync
+    # '! -type d' ignores directories, in local manifest directories are spit out to stderr whereas directories pop up in this query
+    `find #{read_config[:local_file_path]} #{read_config[:find_options]} \! -type d -cnewer #{last_sync_completed}`.split("\n").collect { |path| {:path => path } }
+  end
+
+  def update_config_with_sync_state(sync_start)
+    config = read_config()
+    config[:last_sync_at] = sync_start
+    write_config!(config)
+  end
+
+  def generate_local_manifest
+    `find #{read_config[:local_file_path]} #{read_config[:find_options]} -print0 | xargs -0 openssl md5 2> /dev/null > /tmp/sir-sync-a-lot.manifest.local`
+  end
+
+  def fetch_remote_manifest
+    @remote_objects_cache = [] # instance vars feel like global variables somehow
+    traverse_s3_for_objects(AWS::S3::Bucket.find(read_config[:aws_dest_bucket]), @remote_objects_cache)
+  end
+
+  def traverse_s3_for_objects(bucket, collection, n = 1000, upto = 0, marker = nil)
+    objects = bucket.objects(:marker => marker, :max_keys => n)
+    if objects.size == 0
+      return
+    else
+      objects.each { |object| collection << {:path => "/#{object.key}", :checksum => object.etag} }
+      traverse_s3_for_objects(bucket, collection, n, upto+n, objects.last.key)
+    end
+  end
+
+  def files_on_localhost_with_checksums
+    parse_manifest(local_manifest_path)
+  end
+
+  def files_on_s3
+    @remote_objects_cache
+  end
+
+  def local_manifest_path
+    "/tmp/sir-sync-a-lot.manifest.local"
+  end
+
+  def parse_manifest(location)
+    if File.exist?(location)
+      open(location, 'r') do |file|
+        file.collect do |line|
+          path, checksum = *line.chomp.match(/^MD5\((.*)\)= (.*)$/).captures
+          {:path => path, :checksum => checksum}
+        end
+      end
+    else
+      []
+    end
+  end
+
+  def push_file(file)
+    # xfer speed, logging, etc can occur in this method
+    display("Pushing #{file[:path]}...")
+    AWS::S3::S3Object.store(file[:path], open(file[:path]), read_config[:aws_dest_bucket])
+  rescue
+    display("ERROR: Could not push '#{file[:path]}': #{$!.inspect}")
+  end
+
+  def aquire_lock!
+    if File.exist?(lock_path)
+      # better way is to write out the pid ($$) and read it back in, to make sure it's the same
+      exit_with_error!("Found a lock at #{lock_path}, is another instance of #{__FILE__} running?")
+    end
+
+    begin
+      system("touch #{lock_path}")
+      yield
+    ensure
+      system("rm #{lock_path}")
+    end
+  end
+
+
+  def display_help!
+    display("Go help yourself buddy!")
+  end
+
+  def exit_with_error!(message)
+    display("Gah! " + message)
+    exit
+  end
+
+  def display(message)
+    puts("[#{Time.now}] #{message}")
+  end
+
+  def ask(question)
+    print(question + ": ")
+    $stdin.readline.chomp # gets doesn't work here!
+  end
+
+  def config_exists?
+    File.exist?(config_path)
+  end
+
+  def config_path
+    ENV['HOME'] + "/.sir-sync-a-lot.yml"
+  end
+
+  def lock_path
+    ENV['HOME'] + "/.sir-sync-a-lot.lock"
+  end
+
+end
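
A note on the comparison step in perform_sync! above (editorial aside, not part of the package): both manifests are plain arrays of {:path, :checksum} hashes, so `files_on_localhost_with_checksums - files_on_s3` is an Array difference that keeps any local entry with no identical remote counterpart, i.e. new files and files whose checksum changed, while never deleting anything remotely. A minimal sketch of that behaviour, with invented paths and checksums:

```ruby
# Shapes mirror parse_manifest (local) and traverse_s3_for_objects (remote) above.
local  = [
  {:path => "/tmp/backups/one",   :checksum => "aaa"},
  {:path => "/tmp/backups/two",   :checksum => "bbb"},  # checksum changed locally
  {:path => "/tmp/backups/three", :checksum => "ccc"}   # not on S3 yet
]
remote = [
  {:path => "/tmp/backups/one",   :checksum => "aaa"},
  {:path => "/tmp/backups/two",   :checksum => "old"}
]

# Array#- drops local entries that have an exact match on the remote side,
# so unchanged files are skipped and only new/changed files are pushed.
to_push = local - remote
to_push.each { |file| puts file[:path] }
# => /tmp/backups/two
# => /tmp/backups/three
```
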
data/sir-sync-a-lot.gemspec
ADDED
@@ -0,0 +1,24 @@
+# -*- encoding: utf-8 -*-
+$:.push File.expand_path("../lib", __FILE__)
+require "sir-sync-a-lot"
+
+Gem::Specification.new do |s|
+  s.name        = "sir-sync-a-lot"
+  s.version     = SirSyncalot::VERSION
+  s.authors     = ["Ryan Allen"]
+  s.email       = ["ryan@yeahnah.org"]
+  s.homepage    = "https://github.com/ryan-allen/sir-sync-a-lot"
+  s.summary     = %q{Baby got backups!}
+  s.description = %q{Optimised S3 backup tool. Uses linux's find and xargs to find updated files as to not exaust your disk IO.}
+
+  s.rubyforge_project = "sir-sync-a-lot"
+
+  s.files         = `git ls-files`.split("\n")
+  s.test_files    = `git ls-files -- {test,spec,features}/*`.split("\n")
+  s.executables   = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
+  s.require_paths = ["lib"]
+
+  # specify any dependencies here; for example:
+  # s.add_development_dependency "rspec"
+  s.add_runtime_dependency "aws-s3"
+end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: sir-sync-a-lot
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.1
 prerelease:
 platform: ruby
 authors:
@@ -13,7 +13,7 @@ date: 2011-09-14 00:00:00.000000000Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: aws-s3
-  requirement: &
+  requirement: &70302277542320 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -21,16 +21,25 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *70302277542320
 description: Optimised S3 backup tool. Uses linux's find and xargs to find updated
   files as to not exaust your disk IO.
 email:
 - ryan@yeahnah.org
-executables:
+executables:
+- sir-sync-a-lot
 extensions: []
 extra_rdoc_files: []
-files:
-
+files:
+- .gitignore
+- Gemfile
+- MIT-LICENSE
+- README.md
+- Rakefile
+- bin/sir-sync-a-lot
+- lib/sir-sync-a-lot.rb
+- sir-sync-a-lot.gemspec
+homepage: https://github.com/ryan-allen/sir-sync-a-lot
 licenses: []
 post_install_message:
 rdoc_options: []