tina 1.0.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 5341f9de0800c6433ed74730845512e4bad3e21d
4
+ data.tar.gz: c2614fed632aceb92ee0fd0ea4312716c90415bd
5
+ SHA512:
6
+ metadata.gz: b35c9014deb0bb60fe01b035b82936fdf7d623eba2b3de3d8d630fb9e0b4d7baf5d15d9a9038afcac46e8b726b7f858e8ce8dbd6a75ed1b35cf29e66daa0e643
7
+ data.tar.gz: 4d4d88719c30b2e7bbf1feae0ce8d3a705edb136497e16ad669ee568b948c87b13ac7a3fcd6fc1c9d0074e437ce851e53d60f56afe0748307db385afb662f93c
@@ -0,0 +1,91 @@
1
+ # Tina
2
+
3
+ Tina is a tool for restoring objects from Amazon Glacier into Amazon
4
+ S3, while maintaining control over costs.
5
+
6
+ Amazon Glacier allows for a certain amount of the total storage to be
7
+ restored for free. However, the pricing model becomes complicated once
8
+ this threshold is exceeded, and it is not trivial to work out when that
9
+ happens. Tina was written to address this by estimating a price
10
+ for a restore given the total storage, the duration of the restore,
11
+ and what objects to restore.
12
+
13
+ ## Install
14
+
15
+ $ gem install tina
16
+
17
+ ## Usage
18
+
19
+ What you need:
20
+
21
+ * The `total-storage` number, which is the amount of data stored in
22
+ Glacier, in bytes. You can find a good enough estimate for this
23
+ number by looking at the "Amazon Simple Storage Service
24
+ EU-TimedStorage-GlacierByteHrs" line item on your bill for last
25
+ month ("Amazon Simple Storage Service TimedStorage-GlacierByteHrs"
26
+ for US regions).
27
+ * A `PREFIX_FILE` with one line per prefix you want to restore, each in
28
+ the format `s3://<BUCKET>/<PREFIX>`
29
+
30
+ An example of how this tool can be used follows.
31
+
32
+ Caroline stores 227 TiB of data in Glacier, which is 249589139505152
33
+ bytes. She wants to restore all photos from June 2014 from the bucket
34
+ `my-photos` and all her horror movies starting with the letters A or B
35
+ from `my-movies`. She prepares a file called `my-restore.txt` with the
36
+ following contents:
37
+
38
+ s3://my-photos/2014/06/
39
+ s3://my-movies/horror/A
40
+ s3://my-movies/horror/B
41
+
42
+ She can now run tina like this:
43
+
44
+ $ tina restore --total-storage=249589139505152 --duration=20h --keep-days=14 my-restore.txt
45
+
46
+ This will instruct tina to prepare a restore over __20 hours__ for all
47
+ objects matching the prefixes in `my-restore.txt` and keep the objects
48
+ on S3 for __14 days__. Using the total storage amount, tina can
49
+ estimate a price for the restore.
50
+
51
+ After printing information about the restore and an estimated price,
52
+ tina will ask Caroline whether to proceed.
53
+
54
+ Please note that tina is a long-running process, which means it is a
55
+ good idea to run it under `screen` or `tmux`, and on a machine that is
56
+ constantly connected, e.g. an EC2 instance.
57
+
58
+ ## Notes
59
+
60
+ * The estimated cost does not include the cost for the restore
61
+ requests or the temporary storage on S3.
62
+ * The estimated cost is based on the assumption that no other restores
63
+ are running in parallel, since that would incur a higher peak
64
+ restore rate and consequently a higher cost.
65
+ * The parameter for specifying the number of days to keep objects on
66
+ S3 is passed directly to the restore request. This means that
67
+ objects restored in one of the first chunks may expire sooner from
68
+ S3 than objects restored in one of the last chunks.
69
+
70
+ ## Future improvements
71
+
72
+ * Speed up initial object listing by parallelizing requests
73
+ * Implement a mode where tina figures out the restore time required to
74
+ stay within a given budget (which might be $0)
75
+ * Implement resume and failure handling. Currently, if tina fails (for
76
+ example due to a restore request failing) the prefix file would have
77
+ to be updated manually in order to resume at the same place later.
78
+ * Use a first-fit algorithm to spread objects into chunks, instead of
79
+ the current naïve ordered chunking
80
+
81
+ ## Disclaimer
82
+
83
+ The authors make no guarantees that the costs calculated using this
84
+ script are correct and will not take any responsibility for any costs
85
+ caused by running this script. Please beware that restoring objects is
86
+ a potentially costly operation, that Amazon's pricing model may change
87
+ at any time and that this script may contain nasty bugs.
88
+
89
+ ## Copyright
90
+
91
+ © 2014 Burt AB, see LICENSE.txt (BSD 3-Clause).
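
As a quick check of the numbers in the README example, the sketch below converts 227 TiB to bytes and counts the nominal number of 4-hour chunks in a 20-hour restore. The variable names are illustrative and not part of tina. The 4-hour interval is an assumption taken from `CHUNK_INTERVAL` in `lib/tina/cli.rb`, and the real chunk count can be higher when objects do not divide evenly.

```ruby
# Illustrative names only; none of these are part of tina's API.
tib = 1024 ** 4
chunk_interval_hours = 4 # mirrors CHUNK_INTERVAL (4 * 3600 seconds) in lib/tina/cli.rb

total_storage = 227 * tib                              # value for --total-storage
restore_hours = 20                                     # value for --duration=20h
nominal_chunks = restore_hours / chunk_interval_hours  # one chunk per 4-hour window

puts total_storage  # 249589139505152
puts nominal_chunks # 5
```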
@@ -0,0 +1,6 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ $: << File.expand_path('../../lib', __FILE__)
4
+ require 'tina'
5
+
6
+ Tina::CLI.start
@@ -0,0 +1,6 @@
1
+ module Tina
2
+ S3Object = Struct.new(:bucket, :key, :size)
3
+ end
4
+ require 'tina/s3_client'
5
+ require 'tina/cli'
6
+ require 'tina/restore_plan'
@@ -0,0 +1,92 @@
1
+ require 'thor'
2
+ require 'aws-sdk-core'
3
+
4
+ module Tina
5
+ class CLI < Thor
6
+ desc "restore PREFIX_FILE", "Restore files from Glacier into S3"
7
+ option :total_storage, type: :numeric, required: true, desc: 'The total amount stored in Glacier, in bytes'
8
+ option :duration, type: :string, required: true, desc: 'Duration of the restore, in the format <N>h or <N>d'
9
+ option :keep_days, type: :numeric, required: true, desc: 'The number of days to keep objects on S3'
10
+ option :aws_region, type: :string, required: true, default: 'eu-west-1', desc: 'The Amazon region to operate against'
11
+ def restore(prefix_file)
12
+ total_storage = options[:total_storage]
13
+ duration = options[:duration]
14
+ keep_days = options[:keep_days]
15
+ duration_in_seconds = parse_duration(duration)
16
+
17
+ prefixes = File.readlines(prefix_file).map(&:chomp)
18
+ objects = RestorePlan::ObjectCollection.new(s3_client.list_bucket_prefixes(prefixes))
19
+ restore_plan = RestorePlan.new(total_storage.to_i, objects)
20
+ price = restore_plan.price(duration_in_seconds)
21
+ chunks = objects.chunk(duration_in_seconds) # NOTE: returns the chunks memoized by restore_plan.price above; this argument is ignored
22
+ say
23
+ say "Restores will be performed in the following chunks:"
24
+ say "-" * 60
25
+ chunks.each_with_index do |chunk, index|
26
+ chunk_size = chunk.map(&:size).reduce(&:+)
27
+ say "#{index+1}) #{chunk.size} objects of total size %.2f GiB / %.2f TiB" % [chunk_size.to_f / 1024 ** 3, chunk_size.to_f / 1024 ** 4]
28
+ end
29
+ say "-" * 60
30
+ say "Actual restore time: %i days, %i hours" % [(4 * chunks.size) / 24, (4 * chunks.size) % 24]
31
+ say "Number of objects to restore: #{objects.size}"
32
+ say "Total restore size: %.2f MiB / %.2f GiB / %.2f TiB" % [objects.total_size.to_f / 1024 ** 2, objects.total_size.to_f / 1024 ** 3, objects.total_size.to_f / 1024 ** 4]
33
+ say "Estimated cost: $#{price}"
34
+ say "Days to keep objects on S3: #{keep_days} days"
35
+ say "-" * 60
36
+ say "* Please beware that these costs are not included in estimated cost:"
37
+ say "* - Cost for %i restore requests" % [objects.size]
38
+ say "* - Storage on S3 of %.2f GiB during %i days" % [objects.total_size.to_f / 1024 ** 3, keep_days]
39
+ say "-" * 60
40
+ return unless yes?("Do you feel rich? [y/n]", :yellow)
41
+ restore_chunks(chunks, keep_days)
42
+ end
43
+
44
+ private
45
+
46
+ UNIT_FACTORS = { 'd' => 24 * 3600, 'h' => 1 * 3600 }
47
+ CHUNK_INTERVAL = 4 * 3600
48
+
49
+ def parse_duration(duration)
50
+ duration.match(/^(\d+)(h|d)$/)
51
+ raise "DURATION not in required format, [0-9]+(h|d)" unless (count = $1.to_i rescue nil) && (unit = $2)
52
+ count * UNIT_FACTORS[unit]
53
+ end
54
+
55
+ def restore_chunks(chunks, keep_days)
56
+ chunks.each_with_index do |chunk, index|
57
+ start = Time.now
58
+ say "Restoring #{chunk.size} objects, chunk #{index+1} of #{chunks.size}"
59
+
60
+ chunk.each do |object|
61
+ begin
62
+ s3_client.restore_object(object, keep_days)
63
+ rescue Aws::S3::Errors::RestoreAlreadyInProgress, Aws::S3::Errors::InvalidObjectState => e
64
+ say "Error restoring #{object.bucket} / #{object.key} was ignored: #{e}", :yellow
65
+ else
66
+ say "Restore issued for #{object.bucket} / #{object.key}", :green
67
+ end
68
+ end
69
+
70
+ say "Restore for all objects in chunk %i requested. Took %.1f seconds." % [index+1, Time.now - start]
71
+
72
+ if index + 1 < chunks.size
73
+ next_start = start + CHUNK_INTERVAL
74
+ sleep_time = (next_start - Time.now)
75
+ if sleep_time < 0
76
+ say "Warning! Issuing restores took more than 4 hours, so the end time will be delayed. Proceeding immediately with next chunk.", :yellow
77
+ else
78
+ say "Sleeping for %.1f seconds, until #{next_start}" % sleep_time, :green
79
+ sleep sleep_time
80
+ end
81
+ end
82
+ end
83
+ end
84
+
85
+ def s3_client
86
+ @s3_client ||= begin
87
+ s3 = Aws::S3::Client.new(region: options[:aws_region])
88
+ S3Client.new(s3, shell)
89
+ end
90
+ end
91
+ end
92
+ end
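
The duration handling in `parse_duration` above can be illustrated in isolation. This is a minimal sketch, not the gem's code: `duration_to_seconds` is a hypothetical helper name, and it uses a match object instead of the `$1`/`$2` globals, but it accepts the same `<N>h` or `<N>d` format.

```ruby
# Hypothetical standalone helper; tina's own version is CLI#parse_duration.
def duration_to_seconds(duration)
  unit_seconds = { 'h' => 3600, 'd' => 24 * 3600 }
  match = /\A(\d+)(h|d)\z/.match(duration)
  raise ArgumentError, "DURATION not in required format, [0-9]+(h|d)" unless match
  match[1].to_i * unit_seconds.fetch(match[2])
end

puts duration_to_seconds('20h') # 72000
puts duration_to_seconds('2d')  # 172800
```

Anything that does not match the pattern, such as `20m` or an empty string, raises instead of silently becoming zero seconds.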
@@ -0,0 +1,62 @@
1
+ module Tina
2
+ class RestorePlan
3
+
4
+ MONTHLY_FREE_TIER_ALLOWANCE_FACTOR = 0.05
5
+ DAYS_PER_MONTH = 30
6
+ PRICE_PER_GB_PER_HOUR = 0.011
7
+
8
+ def initialize(total_storage_size, objects, options = {})
9
+ @total_storage_size = total_storage_size
10
+ @objects = objects
11
+ @price_per_gb_per_hour = options[:price_per_gb_per_hour] || PRICE_PER_GB_PER_HOUR
12
+
13
+ @daily_allowance = @total_storage_size * MONTHLY_FREE_TIER_ALLOWANCE_FACTOR / DAYS_PER_MONTH
14
+ end
15
+
16
+ def price(total_time)
17
+ total_time = [total_time, 4 * 3600].max
18
+ chunk_size = quadhourly_restore_rate(total_time)
19
+ chunks = @objects.chunk(chunk_size)
20
+ largest_chunk_object_size = chunks.map { |chunk| chunk.map(&:size).reduce(&:+) }.max
21
+ quadhours = chunks.size
22
+ quadhourly_allowance = @daily_allowance / ( [(24 / 4), quadhours].min * 4)
23
+
24
+ peak_retrieval_rate = largest_chunk_object_size / 4
25
+ peak_billable_retrieval_rate = [0, peak_retrieval_rate - quadhourly_allowance].max
26
+
27
+ peak_billable_retrieval_rate * (@price_per_gb_per_hour / 1024 ** 3) * 720
28
+ end
29
+ private
30
+
31
+ def quadhourly_restore_rate(total_time)
32
+ @objects.total_size / (total_time / (4 * 3600))
33
+ end
34
+
35
+ class ObjectCollection
36
+ attr_reader :total_size
37
+
38
+ def initialize(objects)
39
+ @objects = objects
40
+ @total_size = objects.map(&:size).reduce(&:+)
41
+ end
42
+
43
+ def size
44
+ @objects.size
45
+ end
46
+
47
+ def chunk(max_chunk_size)
48
+ @chunks ||= begin
49
+ chunks = @objects.chunk(sum: 0, index: 0) do |object, state|
50
+ state[:sum] += object.size
51
+ if state[:sum] > max_chunk_size
52
+ state[:sum] = object.size
53
+ state[:index] += 1
54
+ end
55
+ state[:index]
56
+ end
57
+ chunks.map(&:last)
58
+ end
59
+ end
60
+ end
61
+ end
62
+ end
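
To make the arithmetic in `RestorePlan#price` easier to follow, here is a simplified sketch that assumes the restore splits into equally sized 4-hour chunks (real object lists rarely do). `estimate_price` is a hypothetical helper, not part of the gem; the 5% monthly allowance, the 30-day month, and the 720-hour multiplier are copied from the constants and literals in the file above.

```ruby
# Hypothetical helper mirroring RestorePlan#price under the assumption of
# equally sized 4-hour chunks. Sizes are in bytes, hours is an integer.
def estimate_price(total_storage, restore_size, hours, price_per_gb_per_hour = 0.011)
  gib = 1024.0 ** 3
  quadhours = [hours / 4, 1].max               # number of 4-hour chunks, at least one
  daily_allowance = total_storage * 0.05 / 30  # 5% of storage per month, spread over 30 days
  hourly_allowance = daily_allowance / ([6, quadhours].min * 4)
  peak_hourly_rate = (restore_size.to_f / quadhours) / 4
  billable_rate = [0.0, peak_hourly_rate - hourly_allowance].max
  billable_rate / gib * price_per_gb_per_hour * 720 # peak rate billed for 720 hours
end

storage = 75 * 1024 ** 4  # 75 TiB stored, as in the spec's FAQ example
restore = 140 * 1024 ** 3 # 140 GiB to restore

puts estimate_price(storage, restore, 4, 0.01).round(2)  # about 21.6
puts estimate_price(storage, restore, 8, 0.01).round(2)  # about 10.8
puts estimate_price(storage, restore, 28, 0.01).round(2) # 0.0
```

Spreading the same restore over more 4-hour windows lowers the peak hourly rate, which is why the 28-hour case falls under the free allowance.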
@@ -0,0 +1,38 @@
1
+ require 'uri'
2
+
3
+ module Tina
4
+ class S3Client
5
+ ClientError = Class.new(StandardError)
6
+
7
+ def initialize(s3, shell)
8
+ @s3 = s3
9
+ @shell = shell
10
+ end
11
+
12
+ def list_bucket_prefixes(prefix_uris)
13
+ bucket_prefixes = prefix_uris.map do |prefix_uri|
14
+ uri = URI.parse(prefix_uri)
15
+ raise ClientError, "Invalid S3 URI: #{uri}" unless uri.scheme == 's3'
16
+ [uri.host, uri.path.sub(%r[^/], '')]
17
+ end
18
+ bucket_prefixes.flat_map do |(bucket,prefix)|
19
+ @shell.say "Listing prefix #{bucket}/#{prefix}..."
20
+ objects = []
21
+ marker = nil
22
+ loop do
23
+ listing = @s3.list_objects(bucket: bucket, prefix: prefix, marker: marker)
24
+ listing.contents.each do |object|
25
+ objects << S3Object.new(bucket, object.key, object.size)
26
+ marker = object.key
27
+ end
28
+ break unless listing.is_truncated
29
+ end
30
+ objects
31
+ end
32
+ end
33
+
34
+ def restore_object(object, keep_days)
35
+ @s3.restore_object(bucket: object.bucket, key: object.key, restore_request: { days: keep_days })
36
+ end
37
+ end
38
+ end
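
The first step of `list_bucket_prefixes` splits each `s3://` URI into a bucket and a key prefix. The sketch below shows that step on its own, using only Ruby's standard `uri` library; `split_prefix_uri` is a hypothetical name introduced for illustration.

```ruby
require 'uri'

# Hypothetical helper; mirrors the parsing step in S3Client#list_bucket_prefixes.
def split_prefix_uri(prefix_uri)
  uri = URI.parse(prefix_uri)
  raise ArgumentError, "Invalid S3 URI: #{prefix_uri}" unless uri.scheme == 's3'
  # The bucket is the URI host; the prefix is the path without its leading slash.
  [uri.host, uri.path.sub(%r{\A/}, '')]
end

p split_prefix_uri('s3://my-photos/2014/06/') # ["my-photos", "2014/06/"]
p split_prefix_uri('s3://my-movies/horror/A') # ["my-movies", "horror/A"]
```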
@@ -0,0 +1,3 @@
1
+ module Tina
2
+ VERSION = '1.0.0'
3
+ end
@@ -0,0 +1 @@
1
+ require 'tina'
@@ -0,0 +1,107 @@
1
+ require 'spec_helper'
2
+
3
+ module Tina
4
+ describe RestorePlan do
5
+ subject do
6
+ described_class.new(total_storage_size, object_collection, options)
7
+ end
8
+
9
+ let :total_storage_size do
10
+ 75 * (1024 ** 4)
11
+ end
12
+
13
+ let :total_restore_size do
14
+ 140 * (1024 ** 3)
15
+ end
16
+
17
+ let :object_collection do
18
+ SpecHelpers::ObjectCollection.new(total_restore_size)
19
+ end
20
+
21
+ let :options do
22
+ {
23
+ price_per_gb_per_hour: 0.01
24
+ }
25
+ end
26
+
27
+ describe '#price' do
28
+ context 'with perfectly aligned chunks' do
29
+ # http://aws.amazon.com/glacier/faqs/
30
+ context 'with the examples given on the Amazon Glacier pricing FAQ' do
31
+ it 'matches the price for a restore with everything at once' do
32
+ expect(subject.price(4 * 3600)).to be_within(0.05).of(21.6)
33
+ end
34
+
35
+ it 'matches the price for a restore over 8 hours' do
36
+ expect(subject.price(8 * 3600)).to be_within(0.05).of(10.8)
37
+ end
38
+
39
+ it 'matches the price for a restore over 28 hours' do
40
+ expect(subject.price(28 * 3600)).to eq 0
41
+ end
42
+ end
43
+
44
+ # http://calculator.s3.amazonaws.com/index.html
45
+ context 'with arbitrary examples taken from the Amazon calculator' do
46
+ let :options do
47
+ {
48
+ price_per_gb_per_hour: 0.011
49
+ }
50
+ end
51
+
52
+ let :total_storage_size do
53
+ 227 * 1024 ** 4
54
+ end
55
+
56
+ let :total_restore_size do
57
+ 12_000 * 1024 ** 3
58
+ end
59
+
60
+ it 'matches the price for a restore over a month' do
61
+ expect(subject.price(30 * 24 * 3600)).to be_within(0.05).of(4.16)
62
+ end
63
+
64
+ it 'matches the price for a restore over a week' do
65
+ expect(subject.price(7 * 24 * 3600)).to be_within(0.05).of(437.87)
66
+ end
67
+
68
+ it 'matches the price for a restore over a day' do
69
+ expect(subject.price(1 * 24 * 3600)).to be_within(0.05).of(3832.16)
70
+ end
71
+
72
+ it 'matches the price for a restore over a 4 hour period' do
73
+ expect(subject.price(4 * 3600)).to be_within(0.05).of(22992.93)
74
+ end
75
+ end
76
+
77
+ context 'with the examples Amazon supplied in an e-mail' do
78
+ let :total_storage_size do
79
+ 227 * 1024 ** 4
80
+ end
81
+
82
+ let :total_restore_size do
83
+ 12_000 * 1024 ** 3
84
+ end
85
+
86
+ it 'matches the price for a restore over 4 days' do
87
+ expect(subject.price(4 * 24 * 3600)).to be_within(20).of(768)
88
+ end
89
+ end
90
+ end
91
+ end
92
+ end
93
+ end
94
+
95
+ module SpecHelpers
96
+ class ObjectCollection
97
+ attr_reader :total_size
98
+
99
+ def initialize(total_restore_size)
100
+ @total_size = total_restore_size
101
+ end
102
+
103
+ def chunk(max_chunk_size)
104
+ [[Tina::S3Object.new('bucket', 'key', max_chunk_size)]] * (total_size / max_chunk_size)
105
+ end
106
+ end
107
+ end
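
The `SpecHelpers::ObjectCollection` stub above always returns equally sized chunks. With real, unevenly sized objects, the ordered chunking in `RestorePlan::ObjectCollection#chunk` behaves roughly like the sketch below. `ordered_chunks` is a hypothetical standalone function that works on plain sizes rather than `S3Object` structs, and it uses a simple loop instead of `Enumerable#chunk`.

```ruby
# Hypothetical standalone version of ordered chunking: walk the sizes in
# order and start a new chunk whenever adding the next size would push the
# running sum above max_chunk_size.
def ordered_chunks(sizes, max_chunk_size)
  chunks = []
  sum = 0
  sizes.each do |size|
    if chunks.empty? || sum + size > max_chunk_size
      chunks << []
      sum = 0
    end
    chunks.last << size
    sum += size
  end
  chunks
end

p ordered_chunks([4, 3, 5, 2, 6], 8) # [[4, 3], [5, 2], [6]]
```

Here the total size of 20 would nominally fit in two and a half chunks of 8, but ordered chunking produces three, so the actual restore takes longer than the requested duration suggests.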
@@ -0,0 +1,61 @@
1
+ module Tina
2
+ describe S3Client do
3
+ let :s3 do
4
+ double('s3', list_objects: object_list)
5
+ end
6
+
7
+ let :shell do
8
+ double('shell', say: nil)
9
+ end
10
+
11
+ let :object_list do
12
+ double('object list', contents: [double('object', key: 'first', size: 123)], is_truncated: false)
13
+ end
14
+
15
+ subject do
16
+ described_class.new(s3, shell)
17
+ end
18
+
19
+ describe '#list_bucket_prefixes' do
20
+ it 'raises a client error when one of the input URIs does not have the correct scheme' do
21
+ expect { subject.list_bucket_prefixes(%w(http://foo)) }.to raise_error(described_class::ClientError, /Invalid S3 URI/)
22
+ end
23
+
24
+ it 'retrieves s3 objects' do
25
+ allow(object_list).to receive(:contents).and_return([double('object', key: 'foo', size: 123)])
26
+ expect(subject.list_bucket_prefixes(['s3://bucket/prefix'])).to eq [S3Object.new('bucket', 'foo', 123)]
27
+ expect(s3).to have_received(:list_objects).with(hash_including(bucket: 'bucket', prefix: 'prefix'))
28
+ end
29
+
30
+ context 'for truncated response listings' do
31
+ let :object_list2 do
32
+ double('object list 2', contents: [double('object', key: 'second', size: 123)], is_truncated: false)
33
+ end
34
+
35
+ before do
36
+ allow(object_list).to receive(:is_truncated).and_return(true)
37
+ allow(s3).to receive(:list_objects).and_return(object_list, object_list2)
38
+ end
39
+
40
+ it 'specifies the last key of the first request as the marker for the next request when truncated' do
41
+ markers = []
42
+ first = true
43
+ allow(s3).to receive(:list_objects) do |options|
44
+ markers << options[:marker]
45
+ ret = first ? object_list : object_list2
46
+ first = false
47
+ ret
48
+ end
49
+ subject.list_bucket_prefixes(['s3://bucket/prefix'])
50
+ expect(markers).to eq [nil, "first"]
51
+ end
52
+
53
+ it 'returns the complete list of objects' do
54
+ actual_objects = subject.list_bucket_prefixes(['s3://bucket/prefix'])
55
+ expected_objects = object_list.contents + object_list2.contents
56
+ expect(actual_objects.map(&:key)).to eq(expected_objects.map(&:key))
57
+ end
58
+ end
59
+ end
60
+ end
61
+ end
metadata ADDED
@@ -0,0 +1,87 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: tina
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Björn Ramberg
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2014-09-24 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: thor
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - '>='
18
+ - !ruby/object:Gem::Version
19
+ version: 0.19.1
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - '>='
25
+ - !ruby/object:Gem::Version
26
+ version: 0.19.1
27
+ - !ruby/object:Gem::Dependency
28
+ name: aws-sdk-core
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - '>='
32
+ - !ruby/object:Gem::Version
33
+ version: 2.0.0.rc15
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - '>='
39
+ - !ruby/object:Gem::Version
40
+ version: 2.0.0.rc15
41
+ description: CLI tool for restoring objects from Amazon Glacier over time in order
42
+ to control costs
43
+ email:
44
+ - bjorn@burtcorp.com
45
+ executables:
46
+ - tina
47
+ extensions: []
48
+ extra_rdoc_files: []
49
+ files:
50
+ - README.md
51
+ - bin/tina
52
+ - lib/tina.rb
53
+ - lib/tina/cli.rb
54
+ - lib/tina/restore_plan.rb
55
+ - lib/tina/s3_client.rb
56
+ - lib/tina/version.rb
57
+ - spec/spec_helper.rb
58
+ - spec/tina/restore_plan_spec.rb
59
+ - spec/tina/s3_client_spec.rb
60
+ homepage: http://github.com/burtcorp/tina
61
+ licenses:
62
+ - BSD-3-Clause
63
+ metadata: {}
64
+ post_install_message:
65
+ rdoc_options: []
66
+ require_paths:
67
+ - lib
68
+ required_ruby_version: !ruby/object:Gem::Requirement
69
+ requirements:
70
+ - - '>='
71
+ - !ruby/object:Gem::Version
72
+ version: '0'
73
+ required_rubygems_version: !ruby/object:Gem::Requirement
74
+ requirements:
75
+ - - '>='
76
+ - !ruby/object:Gem::Version
77
+ version: '0'
78
+ requirements: []
79
+ rubyforge_project:
80
+ rubygems_version: 2.2.2
81
+ signing_key:
82
+ specification_version: 4
83
+ summary: CLI tool for restoring objects from Amazon Glacier
84
+ test_files:
85
+ - spec/spec_helper.rb
86
+ - spec/tina/restore_plan_spec.rb
87
+ - spec/tina/s3_client_spec.rb