ssync 0.1.0

data/.gitignore ADDED
@@ -0,0 +1,2 @@
+ .bundle/*
+ pkg/*
data/Gemfile ADDED
@@ -0,0 +1,3 @@
+ source :rubygems
+
+ gemspec
data/Gemfile.lock ADDED
@@ -0,0 +1,23 @@
+ PATH
+   remote: .
+   specs:
+     ssync (0.1.0)
+       aws-s3 (~> 0.6.2)
+
+ GEM
+   remote: http://rubygems.org/
+   specs:
+     aws-s3 (0.6.2)
+       builder
+       mime-types
+       xml-simple
+     builder (2.1.2)
+     mime-types (1.16)
+     xml-simple (1.0.12)
+
+ PLATFORMS
+   ruby
+
+ DEPENDENCIES
+   aws-s3 (~> 0.6.2)
+   ssync!
data/MIT-LICENSE ADDED
@@ -0,0 +1,19 @@
+ Copyright (c) 2010 Ryan Allen, Envato Pty Ltd
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,70 @@
+ # Ssync
+
+ __Ssync__, an optimised S3 sync tool using the power of *nix!
+
+ ## Requirements
+
+ - Ruby 1.8 or 1.9
+ - RubyGems
+ - The `aws-s3` gem
+ - `openssl`
+ - `find` and `xargs`
+
+ ## Installation
+
+     gem install ssync
+
+ ## Configuration
+
+ To configure, run `ssync setup` and follow the prompts. You'll
+ need your AWS keys, the local file path you want to back up, the bucket name
+ to back up to, and any extra options to pass to `find` (e.g. for ignoring
+ file paths). The configuration is written to `~/.ssync.yml`.
+
+ ## Synchronisation
+
+ To sync, run `ssync sync` and away it goes.
+
+ In the case of a corrupted or incomplete synchronisation, run `ssync sync -f`
+ or `ssync sync --force` to force a full checksum comparison.
+
+ ## Why?
+
+ This library was written because we needed to back up loads of
+ data without having to worry about whether we had enough disk space on the remote.
+ That's where S3 is nice.
+
+ We tried [s3sync](http://www.s3sync.net/), but it blew out our server load. We serve in excess of
+ 500,000 requests a day (page views, not including hits for images and whatnot),
+ and the server needs to stay responsive. The secret sauce is using the *nix
+ `find`, `xargs` and `openssl` commands to generate MD5 checksums for comparison.
+ It seems to work quite well for us (we have almost 90,000 files to compare).
+
+ Initially the plan was to use `find` with `-ctime`, but S3 isn't particularly nice about
+ returning a full list of objects in a bucket (a single request returns at most 1,000 keys,
+ I want all 90,000, and it ignores me when I ask for 1,000,000). Manifest generation
+ on a server under load is fast enough and light enough on resources, so we're sticking
+ with that in the interim.
+
+ FYI, when you run sync, the output will look something like this:
+
+     [Thu Apr 01 11:50:25 +1100 2010] Starting, performing pre-sync checks ...
+     [Thu Apr 01 11:50:26 +1100 2010] Generating local manifest ...
+     [Thu Apr 01 11:50:26 +1100 2010] Fetching remote manifest ...
+     [Thu Apr 01 11:50:27 +1100 2010] Performing checksum comparison ...
+     [Thu Apr 01 11:50:27 +1100 2010] Pushing /tmp/backups/deep/four ...
+     [Thu Apr 01 11:50:28 +1100 2010] Pushing /tmp/backups/three ...
+     [Thu Apr 01 11:50:29 +1100 2010] Pushing /tmp/backups/two ...
+     [Thu Apr 01 11:50:30 +1100 2010] Pushing local manifest up to remote ...
+     [Thu Apr 01 11:50:31 +1100 2010] Sync complete!
+
+ Every line is timestamped, so redirecting the output of `ssync sync` into a log file works nicely.
+
+ Have fun!
+
+ ## Authors
+
+ - [Ryan Allen](https://github.com/ryan-allen)
+ - [Fred Wu](https://github.com/fredwu)
+
+ This project is brought to you by [Envato](http://envato.com/) Pty Ltd.
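The checksum comparison described under "Why?" comes down to parsing the `openssl md5` manifest (lines of the form `MD5(path)= checksum`, matched with the same regex that `parse_manifest` uses in `lib/ssync/sync.rb`) and subtracting the remote list. The sketch below shows that step in isolation. `parse_manifest_lines` and `files_to_push` are hypothetical names, the checksums are placeholders, and it assumes the remote entries have already been reduced to `{ :path, :checksum }` hashes whose paths line up with the local ones.

```ruby
# Sketch only: these helpers are hypothetical and not part of Ssync.
MANIFEST_LINE = /^MD5\((.*)\)= (.*)$/

# Turn "MD5(/path)= checksum" lines (the format `openssl md5` prints) into
# { :path, :checksum } hashes, skipping lines that don't match.
def parse_manifest_lines(lines)
  lines.map { |line| line.chomp.match(MANIFEST_LINE) }.compact.map { |m| { :path => m[1], :checksum => m[2] } }
end

# Array#- compares hashes by value, so a local file survives the subtraction
# (and gets pushed) unless a remote entry has the same path AND the same checksum.
def files_to_push(local, remote)
  local - remote
end

local = parse_manifest_lines([
  "MD5(/tmp/backups/two)= aaaa1111\n",
  "MD5(/tmp/backups/three)= bbbb2222\n"
])
remote = [{ :path => "/tmp/backups/two", :checksum => "aaaa1111" }]

files_to_push(local, remote).each { |file| puts "Pushing #{file[:path]} ..." }
# prints: Pushing /tmp/backups/three ...
```

A changed file shows up the same way as a missing one, because its local checksum no longer matches the remote entry for that path.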
data/Rakefile ADDED
@@ -0,0 +1,5 @@
+ begin
+   require 'bundler'
+   Bundler::GemHelper.install_tasks
+ rescue Exception => e
+ end
data/bin/ssync ADDED
@@ -0,0 +1,6 @@
+ #!/usr/bin/env ruby
+
+ $:.unshift File.dirname(__FILE__) + "/../lib"
+ require "ssync/command"
+
+ Ssync::Command.run!(*ARGV)
data/lib/ssync.rb ADDED
@@ -0,0 +1,8 @@
+ # encoding: utf-8
+
+ require "aws/s3"
+ require "yaml"
+ require "ssync/helpers"
+ require "ssync/setup"
+ require "ssync/sync"
+ require "ssync/version"
data/lib/ssync/command.rb ADDED
@@ -0,0 +1,52 @@
+ require "ssync"
+
+ module Ssync
+   class Command
+     include Helpers
+
+     def self.action
+       @@action
+     end
+
+     def self.args
+       @@args
+     end
+
+     def self.run!(*args)
+       new(*args).run!
+     end
+
+     def initialize(action = :sync, *args)
+       @@action = action.to_sym
+       @@args = args
+     end
+
+     def run!
+       pre_run_check!
+       perform_action!
+     end
+
+     private
+
+     def pre_run_check!
+       if action_eq?(:setup) && config_exists?
+         e! "Cannot run the setup, there is already an Ssync configuration in '#{config_path}'."
+       elsif action_eq?(:sync) && !config_exists?
+         e! "Cannot run the sync, there is no Ssync configuration, try 'ssync setup' to create one first."
+       end
+     end
+
+     def perform_action!
+       case @@action
+       when :setup
+         aquire_lock! { Ssync::Setup.run! }
+       when :sync
+         aquire_lock! { Ssync::Sync.run! }
+       when :help
+         display_help!
+       else
+         e! "Cannot perform action '#{@@action}', try 'ssync help' for usage."
+       end
+     end
+   end
+ end
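`Command` treats the first argument as the action and keeps the rest for later, which is how `ssync sync -f` reaches the `-f`/`--force` check in `Sync.run!`. A minimal, self-contained sketch of that flag check, assuming a hypothetical `flag_set?` helper that takes the argument list explicitly instead of reading `Command.args`:

```ruby
# Hypothetical standalone flag check; Ssync's own helper reads Command.args instead.
# `args` is whatever follows the action, e.g. ["-f"] for `ssync sync -f`.
def flag_set?(args, *flags)
  # any? returns true as soon as one flag is present and false otherwise
  flags.flatten.any? { |flag| args.include?(flag.to_s) }
end

argv = ["sync", "--force"]
action, *rest = argv                      # split the action from its options, like Command#initialize
puts action                               # prints: sync
puts flag_set?(rest, "-f", "--force")     # prints: true
puts flag_set?([], "-f", "--force")       # prints: false
```

Matching whole arguments with `include?` (rather than substrings) keeps something like `--forceful` from being mistaken for `--force`.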
data/lib/ssync/helpers.rb ADDED
@@ -0,0 +1,76 @@
+ module Ssync
+   module Helpers
+     def display_help!
+       display("Not implemented yet.")
+     end
+
+     def display(message)
+       puts("[#{Time.now}] #{message}")
+     end
+
+     def display_error(message)
+       display("Error! " + message)
+     end
+
+     def exit_with_error!(message)
+       display_error(message)
+       exit 1
+     end
+
+     alias :e :display_error
+     alias :e! :exit_with_error!
+
+     def ask(config_item, question)
+       print(question + " [#{config_item}]: ")
+       a = $stdin.readline.chomp
+       a.empty? ? config_item : a
+     end
+
+     def action_eq?(action)
+       Ssync::Command.action == action.to_sym
+     end
+
+     def config_exists?
+       File.exist?(config_path)
+     end
+
+     def config_path
+       ENV['HOME'] + "/.ssync.yml"
+     end
+
+     def lock_path
+       ENV['HOME'] + "/.ssync.lock"
+     end
+
+     def aquire_lock!
+       # better way is to write out the pid ($$) and read it back in, to make sure it's the same
+       e! "Found a lock at #{lock_path}, is another instance of Ssync running?" if File.exist?(lock_path)
+
+       begin
+         system "touch #{lock_path}"
+         yield
+       ensure
+         system "rm #{lock_path}"
+       end
+     end
+
+     def read_config
+       begin
+         open(config_path, "r") { |f| YAML::load(f) }
+       rescue
+         {}
+       end
+     end
+
+     def write_config!(config)
+       open(config_path, "w") { |f| YAML::dump(config, f) }
+     end
+
+     def options_set?(*options)
+       # true if any of the given flags (e.g. "-f", "--force") was passed after the action
+       options.flatten.any? do |option|
+         Command.args.include?(option.to_s)
+       end
+     end
+   end
+ end
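The comment in `aquire_lock!` notes that a better lock would record the PID (`$$`) and read it back. One way to sketch that, assuming a hypothetical `with_pid_lock(path)` helper rather than anything Ssync ships: open the lock file with `File::EXCL` so that checking for it and creating it is a single atomic step, write `Process.pid` into it, and on the way out only delete it if it still holds that PID.

```ruby
require "tmpdir"

# Hypothetical PID-based lock, following the suggestion in aquire_lock!'s comment.
def with_pid_lock(path)
  begin
    # File::EXCL makes creation fail if the file already exists, so check-and-create is atomic
    File.open(path, File::WRONLY | File::CREAT | File::EXCL) { |f| f.write(Process.pid.to_s) }
  rescue Errno::EEXIST
    owner = File.read(path).strip
    raise "Found a lock at #{path} (PID #{owner}), is another instance of Ssync running?"
  end

  begin
    yield
  ensure
    # only remove the lock if it still records our PID
    File.delete(path) if File.exist?(path) && File.read(path).strip == Process.pid.to_s
  end
end

lock = File.join(Dir.tmpdir, "ssync-example-#{Process.pid}.lock")
with_pid_lock(lock) { puts "Locked by PID #{File.read(lock)}" }
puts File.exist?(lock)   # prints: false
```

A lock left behind by a crashed run still has to be removed by hand in this sketch; checking whether the recorded PID is alive (for example with `Process.kill(0, pid)`) would be one possible extension.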
data/lib/ssync/setup.rb ADDED
@@ -0,0 +1,93 @@
+ module Ssync
+   class Setup
+     class << self
+       include Helpers
+
+       def config
+         @config
+       end
+
+       def config=(config)
+         @config = config
+       end
+
+       def run!
+         display "Welcome to Ssync! You will now be asked a few questions, the results will be stored at '#{config_path}'."
+
+         config = read_config
+
+         config[:aws_access_key] = ask config[:aws_access_key], "What is the AWS Access Key ID?"
+         config[:aws_secret_key] = ask config[:aws_secret_key], "What is the AWS Secret Access Key?"
+
+         display "Please wait while Ssync is connecting to AWS ..."
+
+         if aws_credentials_is_valid?(config)
+           display "Successfully connected to AWS."
+
+           config[:aws_dest_bucket] = ask config[:aws_dest_bucket], "Which bucket would you like to put your backups in? Ssync will create the bucket for you if it doesn't exist."
+
+           if bucket_exists?(config)
+             if bucket_empty?(config)
+               display "The bucket exists and is empty, great!"
+             else
+               e! "The bucket exists but is not empty, we cannot sync to a bucket that is not empty!"
+             end
+           else
+             display "The bucket doesn't exist, creating it now ..."
+             create_bucket(config)
+             display "The bucket has been created."
+           end
+         else
+           e! "Ssync wasn't able to connect to AWS, please check the credentials you supplied are correct."
+         end
+
+         require "pathname"
+         config[:local_file_path] = ask config[:local_file_path], "What is the path you would like to back up? (e.g. '/var/www')."
+
+         # realpath raises Errno::ENOENT for a missing path, so only resolve it once it is known to exist
+         if local_file_path_exists?(config)
+           config[:local_file_path] = Pathname.new(config[:local_file_path]).realpath.to_s
+           display "The path is set to '#{config[:local_file_path]}'."
+         else
+           e! "The path you specified does not exist!"
+         end
+
+         config[:find_options] = ask config[:find_options], "Do you have any options for 'find'? (e.g. \! -path *.git*)."
+
+         display "Writing the supplied details to '#{config_path}' for future reference ..."
+         write_config!(config)
+         display "All done! You may now use 'ssync sync' to synchronise your files to the S3 bucket."
+       end
+
+       def aws_credentials_is_valid?(config = read_config)
+         AWS::S3::Base.establish_connection!(:access_key_id => config[:aws_access_key], :secret_access_key => config[:aws_secret_key])
+         begin
+           # AWS::S3 doesn't try to connect at all until you ask it for something.
+           AWS::S3::Service.buckets
+         rescue AWS::S3::InvalidAccessKeyId => e
+           false
+         else
+           true
+         end
+       end
+
+       def bucket_exists?(config = read_config)
+         AWS::S3::Bucket.find(config[:aws_dest_bucket])
+       rescue AWS::S3::NoSuchBucket => e
+         false
+       end
+
+       def bucket_empty?(config = read_config)
+         AWS::S3::Bucket.find(config[:aws_dest_bucket]).empty?
+       end
+
+       def create_bucket(config = read_config)
+         AWS::S3::Bucket.create(config[:aws_dest_bucket])
+       end
+
+       def local_file_path_exists?(config = read_config)
+         File.exist?(config[:local_file_path])
+       end
+     end
+   end
+ end
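`Setup.run!` canonicalises the backup path with `Pathname#realpath`, which raises `Errno::ENOENT` when the path does not exist, so the existence check has to happen before the path is resolved. A small sketch of that ordering, using a hypothetical `resolve_backup_path` helper that returns `nil` instead of exiting:

```ruby
require "pathname"
require "tmpdir"

# Hypothetical helper: returns the canonical absolute path (symlinks and "."/".."
# resolved) or nil when nothing exists at `path`.
def resolve_backup_path(path)
  return nil if path.nil? || path.empty? || !File.exist?(path)
  # only reached for existing paths, so realpath won't hit Errno::ENOENT here
  Pathname.new(path).realpath.to_s
end

puts resolve_backup_path(File.join(Dir.tmpdir, "."))
p resolve_backup_path("/no/such/ssync/dir")   # prints: nil
```

Storing the resolved path means later `find` runs see the same absolute prefix regardless of the directory `ssync` is started from.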
data/lib/ssync/sync.rb ADDED
@@ -0,0 +1,127 @@
+ module Ssync
+   class Sync
+     class << self
+       include Helpers
+
+       def run!
+         display "Initialising Ssync, performing pre-sync checks ..."
+
+         e! "Couldn't connect to AWS with the credentials specified in '#{config_path}'." unless Setup.aws_credentials_is_valid?
+         e! "Couldn't find the S3 bucket specified in '#{config_path}'." unless Setup.bucket_exists?
+         e! "The local path specified in '#{config_path}' does not exist." unless Setup.local_file_path_exists?
+
+         if options_set?("-f", "--force")
+           display "Clearing previous sync state ..."
+           clear_sync_state
+         end
+         create_tmp_sync_state
+
+         if last_sync_recorded?
+           display "Performing time-based comparison ..."
+           files_modified_since_last_sync
+         else
+           display "Performing (potentially expensive) MD5 checksum comparison ..."
+           display "Generating local manifest ..."
+           generate_local_manifest
+           display "Traversing S3 for remote manifest ..."
+           fetch_remote_manifest
+           # note that we do not remove files on s3 that no longer exist on the local host.
+           # this behaviour may be desirable (a la rsync --delete) but we currently don't support it.
+           display "Performing checksum comparison ..."
+           files_on_localhost_with_checksums - files_on_s3
+         end.each { |file| push_file(file) }
+
+         finalize_sync_state
+
+         display "Sync complete!"
+       end
+
+       def clear_sync_state
+         `rm -f #{last_sync_started} #{last_sync_completed}`
+       end
+
+       def create_tmp_sync_state
+         `touch #{last_sync_started}`
+       end
+
+       def last_sync_started
+         ENV['HOME'] + "/.ssync.last-sync.started"
+       end
+
+       def last_sync_completed
+         ENV['HOME'] + "/.ssync.last-sync.completed"
+       end
+
+       def last_sync_recorded?
+         File.exist?(last_sync_completed)
+       end
+
+       def finalize_sync_state
+         # -p keeps the sync start time as the mtime, so files changed during this run are caught by the next -cnewer query
+         `cp -p #{last_sync_started} #{last_sync_completed}`
+       end
+
+       def files_modified_since_last_sync
+         # '! -type d' skips directories: the checksum manifest never contains them (openssl's errors for them go to /dev/null), but find would list them here
+         `find #{read_config[:local_file_path]} #{read_config[:find_options]} \! -type d -cnewer #{last_sync_completed}`.split("\n").collect { |path| { :path => path } }
+       end
+
+       def update_config_with_sync_state(sync_start)
+         config = read_config()
+         config[:last_sync_at] = sync_start
+         write_config!(config)
+       end
+
+       def generate_local_manifest
+         `find #{read_config[:local_file_path]} #{read_config[:find_options]} -print0 | xargs -0 openssl md5 2> /dev/null > #{local_manifest_path}`
+       end
+
+       def fetch_remote_manifest
+         @remote_objects_cache = []
+         traverse_s3_for_objects(AWS::S3::Bucket.find(read_config[:aws_dest_bucket]), @remote_objects_cache)
+       end
+
+       def traverse_s3_for_objects(bucket, collection, n = 1000, upto = 0, marker = nil)
+         objects = bucket.objects(:marker => marker, :max_keys => n)
+         if objects.size == 0
+           return
+         else
+           objects.each { |object| collection << { :path => "/#{object.key}", :checksum => object.etag } }
+           traverse_s3_for_objects(bucket, collection, n, upto+n, objects.last.key)
+         end
+       end
+
+       def files_on_localhost_with_checksums
+         parse_manifest(local_manifest_path)
+       end
+
+       def files_on_s3
+         @remote_objects_cache
+       end
+
+       def local_manifest_path
+         "/tmp/ssync.manifest.local"
+       end
+
+       def parse_manifest(location)
+         # without a manifest there is nothing local to compare
+         return [] unless File.exist?(location)
+
+         open(location, "r") do |file|
+           file.collect do |line|
+             path, checksum = *line.chomp.match(/^MD5\((.*)\)= (.*)$/).captures
+             { :path => path, :checksum => checksum }
+           end
+         end
+       end
+
+       def push_file(file)
+         # xfer speed, logging, etc can occur in this method
+         display "Pushing '#{file[:path]}' ..."
+         File.open(file[:path], "rb") { |io| AWS::S3::S3Object.store(file[:path], io, read_config[:aws_dest_bucket]) }
+       rescue
+         e "Could not push '#{file[:path]}': #{$!.inspect}"
+       end
+     end
+   end
+ end
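`traverse_s3_for_objects` pages through the bucket with S3's marker-based listing: each request returns at most 1,000 keys, and the last key of one page is passed as `:marker` for the next until an empty page comes back. The sketch below expresses the same loop iteratively (avoiding one stack frame per page) and runs without AWS by using a hypothetical in-memory `FakeBucket` whose `objects(:marker => ..., :max_keys => ...)` method only imitates that behaviour; with aws-s3 the bucket would come from `AWS::S3::Bucket.find`, as in `fetch_remote_manifest`.

```ruby
# Hypothetical stand-in for an S3 bucket: returns up to :max_keys objects whose
# keys sort after :marker, imitating S3's marker-based pagination.
FakeObject = Struct.new(:key, :etag)

class FakeBucket
  def initialize(keys)
    @objects = keys.sort.map { |k| FakeObject.new(k, "etag-#{k}") }
  end

  def objects(options = {})
    marker = options[:marker]
    after = marker ? @objects.select { |o| o.key > marker } : @objects
    after.first(options[:max_keys] || 1000)
  end
end

# Iterative version of the traversal: keep asking for the page after the last
# key seen until an empty page comes back.
def list_all_objects(bucket, page_size = 1000)
  collection = []
  marker = nil
  loop do
    page = bucket.objects(:marker => marker, :max_keys => page_size)
    break if page.empty?
    page.each { |o| collection << { :path => "/#{o.key}", :checksum => o.etag } }
    marker = page.last.key
  end
  collection
end

bucket = FakeBucket.new((1..2500).map { |i| format("backups/file%04d", i) })
puts list_all_objects(bucket).size   # prints: 2500
```

With 2,500 keys this takes three full or partial pages plus one final empty request, which is the same request pattern the recursive version produces.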
data/lib/ssync/version.rb ADDED
@@ -0,0 +1,3 @@
+ module Ssync
+   VERSION = "0.1.0"
+ end
data/ssync.gemspec ADDED
@@ -0,0 +1,24 @@
+ # -*- encoding: utf-8 -*-
+ require "date"
+ require File.dirname(__FILE__) + "/lib/ssync/version"
+
+ Gem::Specification.new do |s|
+   s.name = "ssync"
+   s.version = Ssync::VERSION
+   s.date = Date.today.to_s
+   s.authors = ["Fred Wu", "Ryan Allen"]
+   s.email = ["fred@envato.com", "ryan@envato.com"]
+   s.summary = %q{Ssync, an optimised S3 sync tool using the power of *nix!}
+   s.description = %q{Ssync, an optimised S3 sync tool using the power of *nix!}
+   s.homepage = %q{http://github.com/fredwu/ssync}
+   s.extra_rdoc_files = ["README.md"]
+   s.rdoc_options = ["--charset=UTF-8"]
+   s.require_paths = ["lib"]
+   s.rubyforge_project = s.name
+
+   s.files = `git ls-files`.split("\n")
+   s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
+   s.executables = `git ls-files -- bin/*`.split("\n").map { |f| File.basename(f) }
+
+   s.add_runtime_dependency(%q<aws-s3>, ["~> 0.6.2"])
+ end
metadata ADDED
@@ -0,0 +1,94 @@
+ --- !ruby/object:Gem::Specification
+ name: ssync
+ version: !ruby/object:Gem::Version
+   prerelease: false
+   segments:
+   - 0
+   - 1
+   - 0
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Fred Wu
+ - Ryan Allen
+ autorequire:
+ bindir: bin
+ cert_chain: []
+
+ date: 2010-11-12 00:00:00 +11:00
+ default_executable:
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: aws-s3
+   prerelease: false
+   requirement: &id001 !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         segments:
+         - 0
+         - 6
+         - 2
+         version: 0.6.2
+   type: :runtime
+   version_requirements: *id001
+ description: Ssync, an optimised S3 sync tool using the power of *nix!
+ email:
+ - fred@envato.com
+ - ryan@envato.com
+ executables:
+ - ssync
+ extensions: []
+
+ extra_rdoc_files:
+ - README.md
+ files:
+ - .gitignore
+ - Gemfile
+ - Gemfile.lock
+ - MIT-LICENSE
+ - README.md
+ - Rakefile
+ - bin/ssync
+ - lib/ssync.rb
+ - lib/ssync/command.rb
+ - lib/ssync/helpers.rb
+ - lib/ssync/setup.rb
+ - lib/ssync/sync.rb
+ - lib/ssync/version.rb
+ - ssync.gemspec
+ has_rdoc: true
+ homepage: http://github.com/fredwu/ssync
+ licenses: []
+
+ post_install_message:
+ rdoc_options:
+ - --charset=UTF-8
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       segments:
+       - 0
+       version: "0"
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       segments:
+       - 0
+       version: "0"
+ requirements: []
+
+ rubyforge_project: ssync
+ rubygems_version: 1.3.7
+ signing_key:
+ specification_version: 3
+ summary: Ssync, an optimised S3 sync tool using the power of *nix!
+ test_files: []
+