tina 1.0.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 5341f9de0800c6433ed74730845512e4bad3e21d
4
+ data.tar.gz: c2614fed632aceb92ee0fd0ea4312716c90415bd
5
+ SHA512:
6
+ metadata.gz: b35c9014deb0bb60fe01b035b82936fdf7d623eba2b3de3d8d630fb9e0b4d7baf5d15d9a9038afcac46e8b726b7f858e8ce8dbd6a75ed1b35cf29e66daa0e643
7
+ data.tar.gz: 4d4d88719c30b2e7bbf1feae0ce8d3a705edb136497e16ad669ee568b948c87b13ac7a3fcd6fc1c9d0074e437ce851e53d60f56afe0748307db385afb662f93c
@@ -0,0 +1,91 @@
1
+ # Tina
2
+
3
+ Tina is a tool for restoring objects from Amazon Glacier into Amazon
4
+ S3, while maintaining control over costs.
5
+
6
+ Amazon Glacier allows for a certain amount of the total storage to be
7
+ restored for free. The pricing model is however very complicated when
8
+ this threshold is exceeded, and it is not trivial to calculate when it
9
+ will be. Tina was written in order to solve this by estimating a price
10
+ for a restore given the total storage, the duration of the restore,
11
+ and what objects to restore.
12
+
13
+ ## Install
14
+
15
+ $ gem install tina
16
+
17
+ ## Usage
18
+
19
+ What you need:
20
+
21
+ * The `total-storage` number, which is the amount of data stored in
22
+ Glacier, in bytes. You can find a good enough estimate for this
23
+ number by looking at the "Amazon Simple Storage Service
24
+ EU-TimedStorage-GlacierByteHrs" line item on your bill for last
25
+ month ("Amazon Simple Storage Service TimedStorage-GlacierByteHrs"
26
+ for US regions).
27
+ * A `PREFIX_FILE` with lines in the format `s3://<BUCKET>/<PREFIX>`, listing
28
+ all the prefixes you want to restore
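Since `total-storage` has to be given in bytes, a small conversion can help when the stored amount is only known in TiB. This is a minimal sketch; the helper name `tib_to_bytes` is hypothetical and not part of tina.

```ruby
# Hypothetical helper, not part of tina: converts tebibytes (TiB) to bytes.
BYTES_PER_TIB = 1024 ** 4

def tib_to_bytes(tib)
  (tib * BYTES_PER_TIB).to_i
end

# 227 TiB is used here only as an example amount.
puts tib_to_bytes(227)
```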
29
+
30
+ An example of how this tool can be used follows.
31
+
32
+ Caroline stores 227 TiB of data in Glacier, which is 249589139505152
33
+ bytes. She wants to restore all photos from June 2014 from the bucket
34
+ `my-photos` and all her horror movies starting with the letters A or B
35
+ from `my-movies`. She prepares a file called `my-restore.txt` with the
36
+ following contents:
37
+
38
+ s3://my-photos/2014/06/
39
+ s3://my-movies/horror/A
40
+ s3://my-movies/horror/B
41
+
42
+ She can now run tina like this:
43
+
44
+ $ tina restore --total-storage=249589139505152 --duration=20h --keep-days=14 my-restore.txt
45
+
46
+ This will instruct tina to prepare a restore over __20 hours__ for all
47
+ objects matching the prefixes in `my-restore.txt` and keep the objects
48
+ on S3 for __14 days__. Using the total storage amount, tina can
49
+ estimate a price for the restore.
50
+
51
+ After printing information about the restore and an estimated price,
52
+ tina will ask Caroline whether to proceed.
53
+
54
+ Please note that tina is a long-running process, which means it is a
55
+ good idea to run it under `screen` or `tmux`, and on a machine that is
56
+ constantly connected, e.g. an EC2 instance.
57
+
58
+ ## Notes
59
+
60
+ * The estimated cost does not include the cost for the restore
61
+ requests or the temporary storage on S3.
62
+ * The estimated cost is based on the assumption that no other restores
63
+ are running in parallel, since that would incur a higher peak
64
+ restore rate and consequently a higher cost.
65
+ * The parameter for specifying the number of days to keep objects on
66
+ S3 is passed directly to the restore request. This means that
67
+ objects restored in one of the first chunks may expire sooner from
68
+ S3 than objects restored in one of the last chunks.
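To put a number on the last note: the CLI issues one chunk every four hours, so the requests for the first chunk go out well before those for the last chunk. The sketch below estimates that gap. It assumes each chunk's requests are sent at the start of its 4-hour window; the helper name is hypothetical, and how S3 itself rounds expiry times is not covered here.

```ruby
# Hypothetical helper, not part of tina.
CHUNK_INTERVAL_HOURS = 4

# Hours between the first and the last chunk's restore requests.
def request_gap_hours(chunk_count)
  [chunk_count - 1, 0].max * CHUNK_INTERVAL_HOURS
end

# A 20 hour restore is split into five 4-hour chunks.
puts request_gap_hours(5)
```

With `--keep-days` applied per request, objects from the first chunk would be expected to expire roughly this many hours before objects from the last chunk.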
69
+
70
+ ## Future improvements
71
+
72
+ * Speed up initial object listing by parallelizing requests
73
+ * Implement a mode where tina figures out the required restore time to
74
+ restore given a specific budget (that might be $0)
75
+ * Implement resume and failure handling. Currently, if tina fails (for
76
+ example due to a restore request failing) the prefix file would have
77
+ to be updated manually in order to resume at the same place later.
78
+ * Use a first-fit algorithm to spread objects into chunks, instead of
79
+ the current naïve ordered chunking
80
+
81
+ ## Disclaimer
82
+
83
+ The authors make no guarantees that the costs calculated using this
84
+ script are correct and will not take any responsibility for any costs
85
+ caused by running this script. Please beware that restoring objects is
86
+ a potentially costly operation, that Amazon's pricing model may change
87
+ at any time and that this script may contain nasty bugs.
88
+
89
+ ## Copyright
90
+
91
+ © 2014 Burt AB, see LICENSE.txt (BSD 3-Clause).
@@ -0,0 +1,6 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ $: << File.expand_path('../../lib', __FILE__)
4
+ require 'tina'
5
+
6
+ Tina::CLI.start
@@ -0,0 +1,6 @@
1
+ module Tina
2
+ S3Object = Struct.new(:bucket, :key, :size)
3
+ end
4
+ require 'tina/s3_client'
5
+ require 'tina/cli'
6
+ require 'tina/restore_plan'
@@ -0,0 +1,92 @@
1
+ require 'thor'
2
+ require 'aws-sdk-core'
3
+
4
+ module Tina
5
+ class CLI < Thor
6
+ desc "restore PREFIX_FILE", "Restore files from Glacier into S3"
7
+ option :total_storage, type: :numeric, required: true, desc: 'the total amount stored in Glacier, in bytes'
8
+ option :duration, type: :string, required: true, desc: 'duration of the restore, in the format <N>h or <N>d'
9
+ option :keep_days, type: :numeric, required: true, desc: 'The number of days to keep objects on S3'
10
+ option :aws_region, type: :string, required: true, default: 'eu-west-1', desc: 'The Amazon region to operate against'
11
+ def restore(prefix_file)
12
+ total_storage = options[:total_storage]
13
+ duration = options[:duration]
14
+ keep_days = options[:keep_days]
15
+ duration_in_seconds = parse_duration(duration)
16
+
17
+ prefixes = File.readlines(prefix_file).map(&:chomp)
18
+ objects = RestorePlan::ObjectCollection.new(s3_client.list_bucket_prefixes(prefixes))
19
+ restore_plan = RestorePlan.new(total_storage.to_i, objects)
20
+ price = restore_plan.price(duration_in_seconds)
21
+ chunks = objects.chunk(duration_in_seconds) # NOTE: returns the chunks memoized by restore_plan.price above; the argument is ignored here
22
+ say
23
+ say "Restores will be performed in the following chunks:"
24
+ say "-" * 60
25
+ chunks.each_with_index do |chunk, index|
26
+ chunk_size = chunk.map(&:size).reduce(&:+)
27
+ say "#{index+1}) #{chunk.size} objects of total size %.2f GiB / %.2f TiB" % [chunk_size.to_f / 1024 ** 3, chunk_size.to_f / 1024 ** 4]
28
+ end
29
+ say "-" * 60
30
+ say "Actual restore time: %i days, %i hours" % [(4 * chunks.size) / 24, (4 * chunks.size) % 24]
31
+ say "Number of objects to restore: #{objects.size}"
32
+ say "Total restore size: %.2f MiB / %.2f GiB / %.2f TiB" % [objects.total_size.to_f / 1024 ** 2, objects.total_size.to_f / 1024 ** 3, objects.total_size.to_f / 1024 ** 4]
33
+ say "Estimated cost: $#{price}"
34
+ say "Days to keep objects on S3: #{keep_days} days"
35
+ say "-" * 60
36
+ say "* Please beware that these costs are not included in estimated cost:"
37
+ say "* - Cost for %i restore requests" % [objects.size]
38
+ say "* - Storage on S3 of %.2f GiB during %i days" % [objects.total_size.to_f / 1024 ** 3, keep_days]
39
+ say "-" * 60
40
+ return unless yes?("Do you feel rich? [y/n]", :yellow)
41
+ restore_chunks(chunks, keep_days)
42
+ end
43
+
44
+ private
45
+
46
+ UNIT_FACTORS = { 'd' => 24 * 3600, 'h' => 1 * 3600 }
47
+ CHUNK_INTERVAL = 4 * 3600
48
+
49
+ def parse_duration(duration)
50
+ duration.match(/^(\d+)(h|d)$/)
51
+ raise "DURATION not in required format, [0-9]+(h|d)" unless (count = $1.to_i rescue nil) && (unit = $2)
52
+ count * UNIT_FACTORS[unit]
53
+ end
54
+
55
+ def restore_chunks(chunks, keep_days)
56
+ chunks.each_with_index do |chunk, index|
57
+ start = Time.now
58
+ say "Restoring #{chunk.size} objects, chunk #{index+1} of #{chunks.size}"
59
+
60
+ chunk.each do |object|
61
+ begin
62
+ s3_client.restore_object(object, keep_days)
63
+ rescue Aws::S3::Errors::RestoreAlreadyInProgress, Aws::S3::Errors::InvalidObjectState => e
64
+ say "Error restoring #{object.bucket} / #{object.key} was ignored: #{e}", :yellow
65
+ else
66
+ say "Restore issued for #{object.bucket} / #{object.key}", :green
67
+ end
68
+ end
69
+
70
+ say "Restore for all objects in chunk %i requested. Took %.1f seconds." % [index+1, Time.now - start]
71
+
72
+ if index + 1 < chunks.size
73
+ next_start = start + CHUNK_INTERVAL
74
+ sleep_time = (next_start - Time.now)
75
+ if sleep_time < 0
76
+ say "Warning! Issuing restores took more than 4 hours, so the end time will be delayed. Proceeding immediately with next chunk.", :yellow
77
+ else
78
+ say "Sleeping for %.1f seconds, until #{next_start}" % sleep_time, :green
79
+ sleep sleep_time
80
+ end
81
+ end
82
+ end
83
+ end
84
+
85
+ def s3_client
86
+ @s3_client ||= begin
87
+ s3 = Aws::S3::Client.new(region: options[:aws_region])
88
+ S3Client.new(s3, shell)
89
+ end
90
+ end
91
+ end
92
+ end
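The duration handling in `parse_duration` can be tried in isolation. The sketch below is a hypothetical stand-alone variant, not the gem's code: it uses the string anchors `\A` and `\z` instead of the line anchors `^` and `$`, so input with embedded newlines is rejected, and it adds an assumed helper that counts the 4-hour windows in a duration.

```ruby
UNIT_SECONDS = { 'h' => 3600, 'd' => 24 * 3600 }.freeze

# Hypothetical stand-alone variant of the CLI's parse_duration.
def duration_to_seconds(text)
  match = /\A(\d+)([hd])\z/.match(text)
  raise ArgumentError, "duration must look like <N>h or <N>d, got #{text.inspect}" unless match
  match[1].to_i * UNIT_SECONDS.fetch(match[2])
end

# Number of whole 4-hour windows in the duration, with a minimum of one.
def chunk_windows(seconds)
  [seconds / (4 * 3600), 1].max
end

seconds = duration_to_seconds('20h')
puts "#{seconds} seconds, #{chunk_windows(seconds)} windows"
```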
@@ -0,0 +1,62 @@
1
+ module Tina
2
+ class RestorePlan
3
+
4
+ MONTHLY_FREE_TIER_ALLOWANCE_FACTOR = 0.05
5
+ DAYS_PER_MONTH = 30
6
+ PRICE_PER_GB_PER_HOUR = 0.011
7
+
8
+ def initialize(total_storage_size, objects, options = {})
9
+ @total_storage_size = total_storage_size
10
+ @objects = objects
11
+ @price_per_gb_per_hour = options[:price_per_gb_per_hour] || PRICE_PER_GB_PER_HOUR
12
+
13
+ @daily_allowance = @total_storage_size * MONTHLY_FREE_TIER_ALLOWANCE_FACTOR / DAYS_PER_MONTH
14
+ end
15
+
16
+ def price(total_time)
17
+ total_time = [total_time, 4 * 3600].max
18
+ chunk_size = quadhourly_restore_rate(total_time)
19
+ chunks = @objects.chunk(chunk_size)
20
+ largest_chunk_object_size = chunks.map { |chunk| chunk.map(&:size).reduce(&:+) }.max
21
+ quadhours = chunks.size
22
+ quadhourly_allowance = @daily_allowance / ( [(24 / 4), quadhours].min * 4)
23
+
24
+ peak_retrieval_rate = largest_chunk_object_size / 4
25
+ peak_billable_retrieval_rate = [0, peak_retrieval_rate - quadhourly_allowance].max
26
+
27
+ peak_billable_retrieval_rate * (@price_per_gb_per_hour / 1024 ** 3) * 720
28
+ end
29
+ private
30
+
31
+ def quadhourly_restore_rate(total_time)
32
+ @objects.total_size / (total_time / (4 * 3600))
33
+ end
34
+
35
+ class ObjectCollection
36
+ attr_reader :total_size
37
+
38
+ def initialize(objects)
39
+ @objects = objects
40
+ @total_size = objects.map(&:size).reduce(&:+)
41
+ end
42
+
43
+ def size
44
+ @objects.size
45
+ end
46
+
47
+ def chunk(max_chunk_size)
48
+ @chunks ||= begin
49
+ chunks = @objects.chunk(sum: 0, index: 0) do |object, state|
50
+ state[:sum] += object.size
51
+ if state[:sum] > max_chunk_size
52
+ state[:sum] = object.size
53
+ state[:index] += 1
54
+ end
55
+ state[:index]
56
+ end
57
+ chunks.map(&:last)
58
+ end
59
+ end
60
+ end
61
+ end
62
+ end
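The arithmetic in `RestorePlan#price` is easier to follow with the chunking taken out. The following sketch is a simplified restatement, not the gem's implementation: it assumes the restore divides evenly into 4-hour chunks, and the function name and its parameters are hypothetical. The constants (5% monthly allowance, 30 days, 720 hours) are the ones used in the class above.

```ruby
GIB = 1024.0 ** 3
TIB = 1024.0 ** 4

# Hypothetical helper, not part of tina. Assumes the restore divides evenly
# into 4-hour chunks, which real object sizes rarely allow.
def sketch_price(total_storage_bytes, restore_bytes, duration_hours, price_per_gib_hour)
  quadhours = [duration_hours / 4, 1].max                 # number of 4-hour chunks
  daily_allowance = total_storage_bytes * 0.05 / 30       # 5% per month, spread over 30 days
  hourly_allowance = daily_allowance / ([6, quadhours].min * 4)
  peak_hourly_rate = restore_bytes.to_f / quadhours / 4   # every chunk has the same size here
  billable_rate = [0.0, peak_hourly_rate - hourly_allowance].max
  billable_rate / GIB * price_per_gib_hour * 720          # same factor of 720 as in RestorePlan#price
end

# 75 TiB stored, 140 GiB restored, at $0.01 per GiB per hour.
[4, 8, 28].each do |hours|
  price = sketch_price(75 * TIB, 140 * GIB, hours, 0.01)
  puts format('%2d hours: $%.2f', hours, price)
end
```

With 75 TiB stored the daily allowance works out to 128 GiB. Restoring 140 GiB in a single 4-hour window gives a peak of 35 GiB per hour against an allowance of 32 GiB per hour, so 3 GiB per hour is billable.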
@@ -0,0 +1,38 @@
1
+ require 'uri'
2
+
3
+ module Tina
4
+ class S3Client
5
+ ClientError = Class.new(StandardError)
6
+
7
+ def initialize(s3, shell)
8
+ @s3 = s3
9
+ @shell = shell
10
+ end
11
+
12
+ def list_bucket_prefixes(prefix_uris)
13
+ bucket_prefixes = prefix_uris.map do |prefix_uri|
14
+ uri = URI.parse(prefix_uri)
15
+ raise ClientError, "Invalid S3 URI: #{uri}" unless uri.scheme == 's3'
16
+ [uri.host, uri.path.sub(%r[^/], '')]
17
+ end
18
+ bucket_prefixes.flat_map do |(bucket,prefix)|
19
+ @shell.say "Listing prefix #{bucket}/#{prefix}..."
20
+ objects = []
21
+ marker = nil
22
+ loop do
23
+ listing = @s3.list_objects(bucket: bucket, prefix: prefix, marker: marker)
24
+ listing.contents.each do |object|
25
+ objects << S3Object.new(bucket, object.key, object.size)
26
+ marker = object.key
27
+ end
28
+ break unless listing.is_truncated
29
+ end
30
+ objects
31
+ end
32
+ end
33
+
34
+ def restore_object(object, keep_days)
35
+ @s3.restore_object(bucket: object.bucket, key: object.key, restore_request: { days: keep_days })
36
+ end
37
+ end
38
+ end
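How `list_bucket_prefixes` turns a line from the prefix file into a bucket and a key prefix can be checked without touching S3. This is a minimal sketch using only Ruby's standard `uri` library; the function name `split_s3_uri` is hypothetical, and it raises `ArgumentError` where the class above raises its own `ClientError`.

```ruby
require 'uri'

# Hypothetical helper, not part of tina: splits an s3:// URI into
# the bucket name and the key prefix (without the leading slash).
def split_s3_uri(text)
  uri = URI.parse(text)
  raise ArgumentError, "Invalid S3 URI: #{text}" unless uri.scheme == 's3'
  [uri.host, uri.path.sub(%r{\A/}, '')]
end

p split_s3_uri('s3://my-photos/2014/06/')
p split_s3_uri('s3://my-movies/horror/A')
```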
@@ -0,0 +1,3 @@
1
+ module Tina
2
+ VERSION = '1.0.0'
3
+ end
@@ -0,0 +1 @@
1
+ require 'tina'
@@ -0,0 +1,107 @@
1
+ require 'spec_helper'
2
+
3
+ module Tina
4
+ describe RestorePlan do
5
+ subject do
6
+ described_class.new(total_storage_size, object_collection, options)
7
+ end
8
+
9
+ let :total_storage_size do
10
+ 75 * (1024 ** 4)
11
+ end
12
+
13
+ let :total_restore_size do
14
+ 140 * (1024 ** 3)
15
+ end
16
+
17
+ let :object_collection do
18
+ SpecHelpers::ObjectCollection.new(total_restore_size)
19
+ end
20
+
21
+ let :options do
22
+ {
23
+ price_per_gb_per_hour: 0.01
24
+ }
25
+ end
26
+
27
+ describe '#price' do
28
+ context 'with perfectly aligned chunks' do
29
+ # http://aws.amazon.com/glacier/faqs/
30
+ context 'with the examples given on the Amazon Glacier pricing FAQ' do
31
+ it 'matches the price for a restore with everything at once' do
32
+ expect(subject.price(4 * 3600)).to be_within(0.05).of(21.6)
33
+ end
34
+
35
+ it 'matches the price for a restore over 8 hours' do
36
+ expect(subject.price(8 * 3600)).to be_within(0.05).of(10.8)
37
+ end
38
+
39
+ it 'matches the price for a restore over 28 hours' do
40
+ expect(subject.price(28 * 3600)).to eq 0
41
+ end
42
+ end
43
+
44
+ # http://calculator.s3.amazonaws.com/index.html
45
+ context 'with arbitrary examples taken from the Amazon calculator' do
46
+ let :options do
47
+ {
48
+ price_per_gb_per_hour: 0.011
49
+ }
50
+ end
51
+
52
+ let :total_storage_size do
53
+ 227 * 1024 ** 4
54
+ end
55
+
56
+ let :total_restore_size do
57
+ 12_000 * 1024 ** 3
58
+ end
59
+
60
+ it 'matches the price for a restore over a month' do
61
+ expect(subject.price(30 * 24 * 3600)).to be_within(0.05).of(4.16)
62
+ end
63
+
64
+ it 'matches the price for a restore over a week' do
65
+ expect(subject.price(7 * 24 * 3600)).to be_within(0.05).of(437.87)
66
+ end
67
+
68
+ it 'matches the price for a restore over a day' do
69
+ expect(subject.price(1 * 24 * 3600)).to be_within(0.05).of(3832.16)
70
+ end
71
+
72
+ it 'matches the price for a restore over a 4 hour period' do
73
+ expect(subject.price(4 * 3600)).to be_within(0.05).of(22992.93)
74
+ end
75
+ end
76
+
77
+ context 'with the examples Amazon supplied in an e-mail' do
78
+ let :total_storage_size do
79
+ 227 * 1024 ** 4
80
+ end
81
+
82
+ let :total_restore_size do
83
+ 12_000 * 1024 ** 3
84
+ end
85
+
86
+ it 'matches the price for a restore over 4 days' do
87
+ expect(subject.price(4 * 24 * 3600)).to be_within(20).of(768)
88
+ end
89
+ end
90
+ end
91
+ end
92
+ end
93
+ end
94
+
95
+ module SpecHelpers
96
+ class ObjectCollection
97
+ attr_reader :total_size
98
+
99
+ def initialize(total_restore_size)
100
+ @total_size = total_restore_size
101
+ end
102
+
103
+ def chunk(max_chunk_size)
104
+ [[Tina::S3Object.new('bucket', 'key', max_chunk_size)]] * (total_size / max_chunk_size)
105
+ end
106
+ end
107
+ end
@@ -0,0 +1,61 @@
1
+ module Tina
2
+ describe S3Client do
3
+ let :s3 do
4
+ double('s3', list_objects: object_list)
5
+ end
6
+
7
+ let :shell do
8
+ double('shell', say: nil)
9
+ end
10
+
11
+ let :object_list do
12
+ double('object list', contents: [double('object', key: 'first', size: 123)], is_truncated: false)
13
+ end
14
+
15
+ subject do
16
+ described_class.new(s3, shell)
17
+ end
18
+
19
+ describe '#list_bucket_prefixes' do
20
+ it 'raises a client error when one of the input URIs does not have the correct scheme' do
21
+ expect { subject.list_bucket_prefixes(%w(http://foo)) }.to raise_error(described_class::ClientError, /Invalid S3 URI/)
22
+ end
23
+
24
+ it 'retrieves s3 objects' do
25
+ allow(object_list).to receive(:contents).and_return([double('object', key: 'foo', size: 123)])
26
+ expect(subject.list_bucket_prefixes(['s3://bucket/prefix'])).to eq [S3Object.new('bucket', 'foo', 123)]
27
+ expect(s3).to have_received(:list_objects).with(hash_including(bucket: 'bucket', prefix: 'prefix'))
28
+ end
29
+
30
+ context 'for truncated response listings' do
31
+ let :object_list2 do
32
+ double('object list 2', contents: [double('object', key: 'second', size: 123)], is_truncated: false)
33
+ end
34
+
35
+ before do
36
+ allow(object_list).to receive(:is_truncated).and_return(true)
37
+ allow(s3).to receive(:list_objects).and_return(object_list, object_list2)
38
+ end
39
+
40
+ it 'specifies the last key of the first request as the marker for the next request when truncated' do
41
+ markers = []
42
+ first = true
43
+ allow(s3).to receive(:list_objects) do |options|
44
+ markers << options[:marker]
45
+ ret = first ? object_list : object_list2
46
+ first = false
47
+ ret
48
+ end
49
+ subject.list_bucket_prefixes(['s3://bucket/prefix'])
50
+ expect(markers).to eq [nil, "first"]
51
+ end
52
+
53
+ it 'returns the complete list of objects' do
54
+ actual_objects = subject.list_bucket_prefixes(['s3://bucket/prefix'])
55
+ expected_objects = object_list.contents + object_list2.contents
56
+ expect(actual_objects.map(&:key)).to eq(expected_objects.map(&:key))
57
+ end
58
+ end
59
+ end
60
+ end
61
+ end
metadata ADDED
@@ -0,0 +1,87 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: tina
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Björn Ramberg
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2014-09-24 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: thor
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - '>='
18
+ - !ruby/object:Gem::Version
19
+ version: 0.19.1
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - '>='
25
+ - !ruby/object:Gem::Version
26
+ version: 0.19.1
27
+ - !ruby/object:Gem::Dependency
28
+ name: aws-sdk-core
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - '>='
32
+ - !ruby/object:Gem::Version
33
+ version: 2.0.0.rc15
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - '>='
39
+ - !ruby/object:Gem::Version
40
+ version: 2.0.0.rc15
41
+ description: CLI tool for restoring objects from Amazon Glacier over time in order
42
+ to control costs
43
+ email:
44
+ - bjorn@burtcorp.com
45
+ executables:
46
+ - tina
47
+ extensions: []
48
+ extra_rdoc_files: []
49
+ files:
50
+ - README.md
51
+ - bin/tina
52
+ - lib/tina.rb
53
+ - lib/tina/cli.rb
54
+ - lib/tina/restore_plan.rb
55
+ - lib/tina/s3_client.rb
56
+ - lib/tina/version.rb
57
+ - spec/spec_helper.rb
58
+ - spec/tina/restore_plan_spec.rb
59
+ - spec/tina/s3_client_spec.rb
60
+ homepage: http://github.com/burtcorp/tina
61
+ licenses:
62
+ - BSD-3-Clause
63
+ metadata: {}
64
+ post_install_message:
65
+ rdoc_options: []
66
+ require_paths:
67
+ - lib
68
+ required_ruby_version: !ruby/object:Gem::Requirement
69
+ requirements:
70
+ - - '>='
71
+ - !ruby/object:Gem::Version
72
+ version: '0'
73
+ required_rubygems_version: !ruby/object:Gem::Requirement
74
+ requirements:
75
+ - - '>='
76
+ - !ruby/object:Gem::Version
77
+ version: '0'
78
+ requirements: []
79
+ rubyforge_project:
80
+ rubygems_version: 2.2.2
81
+ signing_key:
82
+ specification_version: 4
83
+ summary: CLI tool for restoring objects from Amazon Glacier
84
+ test_files:
85
+ - spec/spec_helper.rb
86
+ - spec/tina/restore_plan_spec.rb
87
+ - spec/tina/s3_client_spec.rb