faster_s3_url 0.1.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 8def7b598503ea770dec6923c611bb3b5fa506ba90f89852ce299457ccbefb1e
4
+ data.tar.gz: 79161290e666ca39b0a9f349dc69fcb326edbeb3b975c8c0ede9c0d97c892737
5
+ SHA512:
6
+ metadata.gz: 62f80a646516296746d851bcdf870c4c0e06a48af6d9e677a53c391b93ec6e4fd14dc86be73f2f540d1ce71587fc3fe2fc4abf19c6b8917e63fa75001e62a261
7
+ data.tar.gz: c3d738c8c3811954540725227a042da8820b9dd0a670e2aa32098c3be5cf015745d5217c904f9f6b0f6f762469f1147db0a76c8f9644d4c29ed8a0ff9d5f4559
@@ -0,0 +1,14 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
9
+
10
+ # rspec failure tracking
11
+ .rspec_status
12
+
13
+ Gemfile.lock
14
+ .byebug_history
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
@@ -0,0 +1,6 @@
1
+ ---
2
+ language: ruby
3
+ cache: bundler
4
+ rvm:
5
+ - 2.6.6
6
+ before_install: gem install bundler -v 2.1.4
data/Gemfile ADDED
@@ -0,0 +1,11 @@
1
+ source "https://rubygems.org"
2
+
3
+ # Specify your gem's dependencies in faster_s3_url.gemspec
4
+ gemspec
5
+
6
+ gem "rake", "~> 12.0"
7
+ gem "rspec", "~> 3.0"
8
+
9
+ gem 'pry-byebug', "~> 3.9"
10
+ # need straight from github to get latest version without deprecations, eg https://github.com/softdevteam/libkalibera/issues/5
11
+ gem 'kalibera', github: "softdevteam/libkalibera"
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2020 Science History Institute, Jonathan Rochkind
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
@@ -0,0 +1,218 @@
1
+ # FasterS3Url
2
+
3
+ Generate public and presigned AWS S3 `GET` URLs faster in ruby
4
+
5
+ [![Build Status](https://travis-ci.com/jrochkind/faster_s3_url.svg?branch=master)](https://travis-ci.com/jrochkind/faster_s3_url)
6
+
7
+ The official [ruby AWS SDK](https://github.com/aws/aws-sdk-ruby) is actually quite slow and unoptimized when generating URLs to access S3 objects. If you are only creating a couple S3 URLs at a time this may not matter. But it can matter on the order of even two or three hundred at a time, especially when creating presigned URLs, for which the AWS SDK is especially un-optimized.
8
+
9
+ This gem provides a much faster implementation, by around an order of magnitude, for both public and presigned S3 `GET` URLs. Additional S3 params such as `response-content-disposition` are supported for presigned URLs.
10
+
11
+ ## Usage
12
+
13
+ ```ruby
14
+ signer = FasterS3Url::Builder.new(
15
+ bucket_name: "my-bucket",
16
+ region: "us-east-1",
17
+ access_key_id: ENV['AWS_ACCESS_KEY'],
18
+ secret_access_key: ENV['AWS_SECRET_KEY']
19
+ )
20
+
21
+ signer.public_url("my/object/key.jpg")
22
+ #=> "https://my-bucket.s3.amazonaws.com/my/object/key.jpg"
23
+ signer.presigned_url("my/object/key.jpg")
24
+ ```
25
+
26
+ You can re-use a signer object for convenience or slightly improved performance. It should be concurrency-safe to share globally between threads.
27
+
28
+ If you are using S3 keys that need to be escaped in the URLs, this gem will escape them properly.
29
+
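As a rough illustration of what "escaped properly" means here, the sketch below follows the same approach as the gem's private `uri_escape_key` helper: start from `CGI.escape`, turn `+` back into `%20`, restore `~`, and leave `/` alone. The method name `escape_s3_key` and the sample key are made up for this example and are not part of the gem's public API.

```ruby
require "cgi"

# Hypothetical stand-alone helper for illustration only; it uses the same
# CGI.escape-plus-gsub approach the gem applies to S3 keys.
def escape_s3_key(key)
  CGI.escape(key.encode("UTF-8"))
     .gsub("+", "%20")  # CGI.escape encodes spaces as "+", URL paths want %20
     .gsub("%7E", "~")  # "~" is unreserved and should stay literal
     .gsub("%2F", "/")  # keep "/" so the key still reads as a path
end

puts escape_s3_key("my dir/file~1 (final).jpg")
```

A literal `+` in a key is not affected by the first `gsub`, because `CGI.escape` has already turned it into `%2B` by that point.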
30
+ When presigning URLs, you can pass the query parameters supported by S3 to control subsequent response headers. You can also supply a version_id for a URL to access a specific version.
31
+
32
+ ```ruby
33
+ signer.presigned_url("my/object/key.jpg",
34
+ response_cache_control: "public, max-age=604800, immutable",
35
+ response_content_disposition: "attachment",
36
+ response_content_language: "de-DE, en-CA",
37
+ response_content_type: "text/html; charset=UTF-8",
38
+ response_content_encoding: "deflate, gzip",
39
+ response_expires: "Wed, 21 Oct 2030 07:28:00 GMT",
40
+ version_id: "BspIL8pXg_52rGXELmqZ7cgmn7u4XJgS"
41
+ )
42
+ ```
43
+
44
+ Use a CNAME or CDN or any other hostname variant other than the default this gem will come up with? Just pass a `host` argument to the initializer. It works with both public and presigned URLs.
45
+
46
+ ```ruby
47
+ builder = FasterS3Url::Builder.new(
48
+ bucket_name: "my-bucket.example.com",
49
+ host: "my-bucket.example.com",
50
+ region: "us-east-1",
51
+ access_key_id: ENV['AWS_ACCESS_KEY'],
52
+ secret_access_key: ENV['AWS_SECRET_KEY']
53
+ )
54
+ ```
55
+
56
+ ### Cache signing keys for further performance
57
+
58
+ Under most usage patterns, the presigned URLs you generate will all use a `time` with the same UTC date. In this case, a performance advantage can be had by asking the Builder to cache and re-use AWS signing keys, which vary only with the calendar date of the `time` arg, not with the time of day, the S3 key, or other args. It will cache up to 5 signing keys. This can result in around a 50% performance improvement with a re-used Builder used for generating presigned URLs.
59
+
60
+ **NOTE WELL: This will technically make the Builder object no longer concurrency-safe under multiple threads.** Although you might get away with it under MRI. This is one reason it is not on by default.
61
+
62
+ ```ruby
63
+ builder = FasterS3Url::Builder.new(
64
+ bucket_name: "my-bucket.example.com",
65
+ region: "us-east-1",
66
+ access_key_id: ENV['AWS_ACCESS_KEY'],
67
+ secret_access_key: ENV['AWS_SECRET_KEY'],
68
+ cache_signing_keys: true
69
+ )
70
+ builder.presigned_url(key) # performance enhanced
71
+ ```
72
+
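To see why caching by date can work, here is a minimal sketch of the SigV4 signing-key derivation, modelled on the chained-HMAC steps in this gem's private `aws_get_signature_key` method. The method name `signing_key`, the fake secret, and the sample times are assumptions for illustration. The inputs are only the secret, the UTC datestamp, the region, and the service name, so two presigned URLs generated on the same UTC day can share one key.

```ruby
require "openssl"

# Illustrative helper (not the gem's public API): derive a SigV4 signing key
# by chaining HMAC-SHA256 over datestamp, region, service and a fixed suffix.
def signing_key(secret_access_key, datestamp, region, service = "s3")
  k_date    = OpenSSL::HMAC.digest("SHA256", "AWS4" + secret_access_key, datestamp)
  k_region  = OpenSSL::HMAC.digest("SHA256", k_date, region)
  k_service = OpenSSL::HMAC.digest("SHA256", k_region, service)
  OpenSSL::HMAC.digest("SHA256", k_service, "aws4_request")
end

# Fake credentials and arbitrary example times.
secret   = "fakeExampleSecretAccessKey"
morning  = Time.utc(2020, 9, 1, 8, 0, 0).strftime("%Y%m%d")
evening  = Time.utc(2020, 9, 1, 22, 30, 0).strftime("%Y%m%d")
next_day = Time.utc(2020, 9, 2, 0, 0, 1).strftime("%Y%m%d")

# Same UTC date gives the same key; a new UTC date gives a new key.
puts signing_key(secret, morning, "us-east-1") == signing_key(secret, evening, "us-east-1")
puts signing_key(secret, morning, "us-east-1") != signing_key(secret, next_day, "us-east-1")
```

The cache itself is a plain Hash keyed by datestamp that is written without locking, which is presumably why the option is flagged as not concurrency-safe.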
73
+
74
+ ### Automatic AWS credentials lookup?
75
+
76
+ Right now, you need to explicitly supply `access_key_id` and `secret_access_key`, in part to avoid a dependency on the AWS SDK (This gem doesn't have such a dependency!). Let us know if this makes you feel a certain kind of way.
77
+
78
+ If you want to look up key/secret/region using the standard SDK methods of checking various places, in order to supply them to the `FasterS3Url::Builder`, you can try this (is there a better way? Cause this is kind of a mess!)
79
+
80
+ ```ruby
81
+ require 'aws-sdk-s3'
82
+ client = Aws::S3::Client.new
83
+ credentials = client.config.credentials
84
+ credentials = credentials.credentials if credentials.respond_to?(:credentials)
85
+
86
+ access_key_id = credentials.access_key_id
87
+ secret_access_key = credentials.secret_access_key
88
+ region = client.config.region
89
+ ```
90
+
91
+ ### Shrine Storage
92
+
93
+ Use [shrine](https://shrinerb.com/)? We do and love it. This gem provides a storage that can be a drop-in replacement to [Shrine::Storage::S3](https://shrinerb.com/docs/storage/s3) (shrine 3.x required), but with faster URL generation.
94
+
95
+ ```ruby
96
+ # Where you might have done:
97
+
98
+ require "shrine/storage/s3"
99
+
100
+ s3 = Shrine::Storage::S3.new(
101
+ bucket: "my-app", # required
102
+ region: "eu-west-1", # required
103
+ access_key_id: "abc",
104
+ secret_access_key: "xyz",
105
+ )
106
+
107
+ # instead do:
108
+
109
+ require "faster_s3_url/shrine/storage"
110
+
111
+ s3 = FasterS3Url::Shrine::Storage.new(
112
+ bucket: "my-app", # required
113
+ region: "eu-west-1", # required
114
+ access_key_id: "abc", # required
115
+ secret_access_key: "xyz", # required
116
+ )
117
+ ```
118
+
119
+ A couple minor differences, let me know if they disrupt you:
120
+ * We don't support the `signer` initializer argument, not clear to me why you'd want to use this gem if you are using it.
121
+ * We support a `host` arg in initializer, but not in #url method.
122
+
123
+ ## Performance Benchmarking
124
+
125
+ Benchmarks were done using scripts checked into repo at `./perf` (which use benchmark-ips with mode `:stats => :bootstrap, :confidence => 95`), on my 2015 Macbook Pro, using ruby MRI 2.6.6. Benchmarking is never an exact science, hopefully this is reasonable.
126
+
127
+ In my narrative, I normalize to how many iterations can happen in **10ms** to have numbers closer to what might be typical use cases.
128
+
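The normalization is simple arithmetic: benchmark-ips reports iterations per second, and there are one hundred 10ms slices in a second. A small sketch, using the public-URL figures from the benchmark output in this README (the helper name `per_10ms` is made up here):

```ruby
# Convert a benchmark-ips iterations-per-second figure into iterations per 10ms.
def per_10ms(iterations_per_second)
  (iterations_per_second / 100.0).round
end

puts per_10ms(18_701)   # aws-sdk-s3 public URLs
puts per_10ms(222_938)  # FasterS3Url public URLs
```

The narrative rounds these results further, to "about 180" and "2,200".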
129
+ ### Public URLs
130
+
131
+ `aws-sdk-s3` can create about 180 public URLs in 10ms, not horrible, but for how simple it seems the operation should be? FasterS3Url can do 2,200 public URLs in 10ms, that's a lot better.
132
+
133
+ ```
134
+ $ bundle exec ruby perf/public_bench.rb
135
+ Warming up --------------------------------------
136
+ aws-sdk-s3 1.265k i/100ms
137
+ FasterS3Url 24.414k i/100ms
138
+ Calculating -------------------------------------
139
+ aws-sdk-s3 18.701k (± 3.2%) i/s - 92.345k in 5.048062s
140
+ FasterS3Url 222.938k (± 3.2%) i/s - 1.123M in 5.106971s
141
+ with 95.0% confidence
142
+ ```
143
+
144
+ ### Presigned URLs
145
+
146
+ Here's where it really starts to matter.
147
+
148
+ `aws-sdk-s3` can only generate about 10 presigned URLs in 10ms, painful. FasterS3URL, with a re-used Builder object, can generate about 220 presigned URLs in 10ms, much better, and actually faster than `aws-sdk-s3` can generate public urls! Even if we re-instantiate a Builder each time, we can generate 180 presigned URLs in 10ms, don't lose too much performance that way.
149
+
150
+ If we re-use the Builder *and* turn on the (not thread-safe) `cache_signing_keys` option, we can get up to 300 presigned URLs generated in 10ms.
151
+
152
+ FasterS3URL supports supplying custom query params to instruct s3 HTTP response headers. This does slow things down since they need to be URI-escaped and constructed. Using this feature with `aws-sdk-s3`, it doesn't lose much speed, down to 9 instead of 10 URLs in 10ms. FasterS3URL goes down from 220 to 180 URLs generated in 10ms (without using the `cache_signing_keys` option).
153
+
154
+ We can compare to the ultra-fast [wt_s3_signer](https://github.com/WeTransfer/wt_s3_signer) gem, which, with a re-used signer object (that assumes the same `time` for all URLs, unlike us; and does not support per-url custom query params) can get all the way up to 680 URLs generated in 10ms, over twice as fast as we can do even with `cache_signing_keys`. If the restrictions and API of wt_s3_signer are amenable to your use case, it's definitely the fastest. But FasterS3URL is in the ballpark, and still more than an order of magnitude faster than `aws-sdk-s3`.
155
+
156
+ ```
157
+ $ bundle exec ruby perf/presigned_bench.rb
158
+ Warming up --------------------------------------
159
+ aws-sdk-s3 113.000 i/100ms
160
+ aws-sdk-s3 with custom headers
161
+ 95.000 i/100ms
162
+ re-used FasterS3Url 1.820k i/100ms
163
+ re-used FasterS3Url with cached signing keys
164
+ 2.920k i/100ms
165
+ re-used FasterS3URL with custom headers
166
+ 1.494k i/100ms
167
+ new FasterS3URL Builder each time
168
+ 1.977k i/100ms
169
+ re-used WT::S3Signer 7.985k i/100ms
170
+ new WT::S3Signer each time
171
+ 1.611k i/100ms
172
+ Calculating -------------------------------------
173
+ aws-sdk-s3 1.084k (± 4.2%) i/s - 5.311k in 5.003981s
174
+ aws-sdk-s3 with custom headers
175
+ 918.315 (± 4.8%) i/s - 4.560k in 5.118770s
176
+ re-used FasterS3Url 21.906k (± 4.0%) i/s - 107.380k in 5.046561s
177
+ re-used FasterS3Url with cached signing keys
178
+ 29.756k (± 3.6%) i/s - 146.000k in 4.999910s
179
+ re-used FasterS3URL with custom headers
180
+ 18.062k (± 4.3%) i/s - 85.158k in 5.025685s
181
+ new FasterS3URL Builder each time
182
+ 18.312k (± 3.9%) i/s - 90.942k in 5.098636s
183
+ re-used WT::S3Signer 68.275k (± 3.5%) i/s - 343.355k in 5.109088s
184
+ new WT::S3Signer each time
185
+ 22.425k (± 2.8%) i/s - 111.159k in 5.036814s
186
+ with 95.0% confidence
187
+ ```
188
+
189
+
190
+ ## Development
191
+
192
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
193
+
194
+ To install local unreleased source of this gem onto your local machine for development, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
195
+
196
+ ## Sources/Acknowledgements
197
+
198
+ [wt_s3_signer](https://github.com/WeTransfer/wt_s3_signer) served as a proof of concept, but was optimized for their particular use case, assuming batch processing where all signed S3 URLs share the same "now" time. I needed to support cases that didn't assume this, and also support custom headers like `response_content_disposition`. This code is also released with a different license. But if the API and use cases of `wt_s3_signer` meet your needs, it is even faster than this code.
199
+
200
+ I tried to figure out how to do the S3 presigned request from some AWS docs:
201
+
202
+ * https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html
203
+ * https://docs.aws.amazon.com/general/latest/gr/sigv4-signed-request-examples.html
204
+
205
+ But these don't really have all the info you need for generating an S3 signature. So I also ended up reverse engineering, with a debugger, the ruby aws-sdk code that generates an S3 presigned url, for instance:
206
+
207
+ * https://github.com/aws/aws-sdk-ruby/blob/47c11bef18a4754ec8a05dfb637dcab120138c27/gems/aws-sdk-s3/lib/aws-sdk-s3/presigner.rb
208
+ * but especially: https://github.com/aws/aws-sdk-ruby/blob/47c11bef18a4754ec8a05dfb637dcab120138c27/gems/aws-sigv4/lib/aws-sigv4/signer.rb
209
+
210
+ ## Contributing
211
+
212
+ Bug reports and pull requests are welcome on GitHub at https://github.com/jrochkind/faster_s3_url.
213
+
214
+ Is there a feature missing that you need? I may not be able to provide it, but I would love to hear from you!
215
+
216
+ ## License
217
+
218
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "faster_s3_url"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
@@ -0,0 +1,29 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ #
5
+ # This file was generated by Bundler.
6
+ #
7
+ # The application 'rspec' is installed as part of a gem, and
8
+ # this file is here to facilitate running it.
9
+ #
10
+
11
+ require "pathname"
12
+ ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../Gemfile",
13
+ Pathname.new(__FILE__).realpath)
14
+
15
+ bundle_binstub = File.expand_path("../bundle", __FILE__)
16
+
17
+ if File.file?(bundle_binstub)
18
+ if File.read(bundle_binstub, 300) =~ /This file was generated by Bundler/
19
+ load(bundle_binstub)
20
+ else
21
+ abort("Your `bin/bundle` was not generated by Bundler, so this binstub cannot run.
22
+ Replace `bin/bundle` by running `bundle binstubs bundler --force`, then run this command again.")
23
+ end
24
+ end
25
+
26
+ require "rubygems"
27
+ require "bundler/setup"
28
+
29
+ load Gem.bin_path("rspec-core", "rspec")
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,35 @@
1
+ require_relative 'lib/faster_s3_url/version'
2
+
3
+ Gem::Specification.new do |spec|
4
+ spec.name = "faster_s3_url"
5
+ spec.version = FasterS3Url::VERSION
6
+ spec.authors = ["Jonathan Rochkind"]
7
+ spec.email = ["jrochkind@sciencehistory.org"]
8
+
9
+ spec.summary = %q{Generate public and presigned AWS S3 GET URLs faster}
10
+ spec.homepage = "https://github.com/jrochkind/faster_s3_url"
11
+ spec.license = "MIT"
12
+ spec.required_ruby_version = Gem::Requirement.new(">= 2.3.0")
13
+
14
+ #spec.metadata["allowed_push_host"] = "TODO: Set to 'http://mygemserver.com'"
15
+
16
+ spec.metadata["homepage_uri"] = spec.homepage
17
+ spec.metadata["source_code_uri"] = "https://github.com/jrochkind/faster_s3_url"
18
+ #spec.metadata["changelog_uri"] = "TODO: Put your gem's CHANGELOG.md URL here."
19
+
20
+ # Specify which files should be added to the gem when it is released.
21
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
22
+ spec.files = Dir.chdir(File.expand_path('..', __FILE__)) do
23
+ `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
24
+ end
25
+ spec.bindir = "exe"
26
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
27
+ spec.require_paths = ["lib"]
28
+
29
+ spec.add_development_dependency "aws-sdk-s3", "~> 1.81"
30
+ spec.add_development_dependency "timecop", "< 2"
31
+ spec.add_development_dependency "benchmark-ips", "~> 2.8"
32
+ #spec.add_development_dependency "kalibera" # for benchmark-ips :bootstrap stats option
33
+ spec.add_development_dependency "wt_s3_signer" # just for benchmarking
34
+ spec.add_development_dependency "shrine", "~> 3.0" # for testing shrine storage
35
+ end
@@ -0,0 +1,5 @@
1
+ require "faster_s3_url/version"
2
+ require 'faster_s3_url/builder'
3
+
4
+ module FasterS3Url
5
+ end
@@ -0,0 +1,290 @@
1
+ # frozen_string_literal: true
2
+
+ require 'cgi'
+ require 'date'
+ require 'digest'
+ require 'openssl'
+ require 'time'
+
3
+ module FasterS3Url
4
+ # Signing algorithm based on Amazon docs at https://docs.aws.amazon.com/general/latest/gr/sigv4-signed-request-examples.html ,
5
+ # as well as some interactive code reading of Aws::Sigv4::Signer
6
+ # https://github.com/aws/aws-sdk-ruby/blob/6114bc9692039ac75c8292c66472dacd14fa6f9a/gems/aws-sigv4/lib/aws-sigv4/signer.rb
7
+ # as used by Aws::S3::Presigner https://github.com/aws/aws-sdk-ruby/blob/6114bc9692039ac75c8292c66472dacd14fa6f9a/gems/aws-sdk-s3/lib/aws-sdk-s3/presigner.rb
8
+ class Builder
9
+ FIFTEEN_MINUTES = 60 * 15
10
+ ONE_WEEK = 60 * 60 * 24 * 7
11
+
12
+ SIGNED_HEADERS = "host".freeze
13
+ METHOD = "GET".freeze
14
+ ALGORITHM = "AWS4-HMAC-SHA256".freeze
15
+ SERVICE = "s3".freeze
16
+
17
+ DEFAULT_EXPIRES_IN = FIFTEEN_MINUTES # 15 minutes, seems to be AWS SDK default
18
+
19
+ MAX_CACHED_SIGNING_KEYS = 5
20
+
21
+ attr_reader :bucket_name, :region, :host, :access_key_id
22
+
23
+ # @option params [String] :bucket_name required
24
+ #
25
+ # @option params [String] :region eg "us-east-1", required
26
+ #
27
+ # @option params [String] :host optional, host to use in generated URLs. If empty, will construct default AWS S3 host for bucket name and region.
28
+ #
29
+ # @option params [String] :access_key_id required at present, change to allow look up from environment using standard aws sdk routines?
30
+ #
31
+ # @option params [String] :secret_access_key required at present, change to allow look up from environment using standard aws sdk routines?
32
+ #
33
+ # @option params [boolean] :default_public (true) default value of `public` when instance method #url is called.
34
+ #
35
+ # @option params [boolean] :cache_signing_keys (false). If set to true, up to five signing keys used for presigned URLs will
36
+ # be cached and re-used, improving performance when generating multiple presigned urls with a single Builder by around 50%.
37
+ # NOTE WELL: This will make the Builder no longer technically concurrency-safe for sharing between multiple threads, which is one
38
+ # reason it is not on by default.
39
+ def initialize(bucket_name:, region:, access_key_id:, secret_access_key:, host:nil, default_public: true, cache_signing_keys: false)
40
+ @bucket_name = bucket_name
41
+ @region = region
42
+ @host = host || default_host(bucket_name)
43
+ @default_public = default_public
44
+ @access_key_id = access_key_id
45
+ @secret_access_key = secret_access_key
46
+ @cache_signing_keys = cache_signing_keys
47
+ if @cache_signing_keys
48
+ @signing_key_cache = {}
49
+ end
50
+
51
+
52
+ @canonical_headers = "host:#{@host}\n"
53
+ end
54
+
55
+ def public_url(key)
56
+ "https://#{self.host}/#{uri_escape_key(key)}"
57
+ end
58
+
59
+ # Generates a presigned GET URL for a specified S3 object key.
60
+ #
61
+ # @param [String] key The S3 key to create a URL pointing to.
62
+ #
63
+ # @option params [Time] :time (Time.now) The starting time for when the
64
+ # presigned url becomes active.
65
+ #
66
+ # @option params [String] :response_cache_control
67
+ # Adds a `response-cache-control` query param to set the `Cache-Control` header of the subsequent response from S3.
68
+ #
69
+ # @option params [String] :response_content_disposition
70
+ # Adds a `response-content-disposition` query param to set the `Content-Disposition` header of the subsequent response from S3
71
+ #
72
+ # @option params [String] :response_content_encoding
73
+ # Adds a `response-content-encoding` query param to set `Content-Encoding` header of the subsequent response from S3
74
+ #
75
+ # @option params [String] :response_content_language
76
+ # Adds a `response-content-language` query param to set the `Content-Language` header of the subsequent response from S3
77
+ #
78
+ # @option params [String] :response_content_type
79
+ # Adds a `response-content-type` query param to set the `Content-Type` header of the subsequent response from S3
80
+ #
81
+ # @option params [String] :response_expires
82
+ # Adds a `response-expires` query param to set the `Expires` header of the subsequent response from S3
83
+ #
84
+ # @option params [String] :version_id
85
+ # Adds a `versionId` query param to reference a specific version of the object from S3.
86
+ def presigned_url(key, time: nil, expires_in: DEFAULT_EXPIRES_IN,
87
+ response_cache_control: nil,
88
+ response_content_disposition: nil,
89
+ response_content_encoding: nil,
90
+ response_content_language: nil,
91
+ response_content_type: nil,
92
+ response_expires: nil,
93
+ version_id: nil)
94
+ validate_expires_in(expires_in)
95
+
96
+ canonical_uri = "/" + uri_escape_key(key)
97
+
98
+ now = time ? time.dup.utc : Time.now.utc # Uh Time#utc is mutating, not nice to do to an argument!
99
+ amz_date = now.strftime("%Y%m%dT%H%M%SZ")
100
+ datestamp = now.strftime("%Y%m%d")
101
+
102
+ credential_scope = datestamp + '/' + region + '/' + SERVICE + '/' + 'aws4_request'
103
+
104
+ canonical_query_string_parts = [
105
+ "X-Amz-Algorithm=#{ALGORITHM}",
106
+ "X-Amz-Credential=" + uri_escape(@access_key_id + "/" + credential_scope),
107
+ "X-Amz-Date=" + amz_date,
108
+ "X-Amz-Expires=" + expires_in.to_s,
109
+ "X-Amz-SignedHeaders=" + SIGNED_HEADERS,
110
+ ]
111
+
112
+ extra_params = {
113
+ :"response-cache-control" => response_cache_control,
114
+ :"response-content-disposition" => response_content_disposition,
115
+ :"response-content-encoding" => response_content_encoding,
116
+ :"response-content-language" => response_content_language,
117
+ :"response-content-type" => response_content_type,
118
+ :"response-expires" => convert_for_timestamp_shape(response_expires),
119
+ :"versionId" => version_id
120
+ }.compact
121
+
122
+
123
+ if extra_params.size > 0
124
+ # These have to be sorted, but sort is case-sensitive, and we have a fixed
125
+ # list of headers we know might be here... turns out they are already sorted?
126
+ extra_param_parts = extra_params.collect {|k, v| "#{k}=#{uri_escape v}" }.join("&")
127
+ canonical_query_string_parts << extra_param_parts
128
+ end
129
+
130
+ canonical_query_string = canonical_query_string_parts.join("&")
131
+
132
+
133
+
134
+ canonical_request = ["GET",
135
+ canonical_uri,
136
+ canonical_query_string,
137
+ @canonical_headers,
138
+ SIGNED_HEADERS,
139
+ 'UNSIGNED-PAYLOAD'
140
+ ].join("\n")
141
+
142
+ string_to_sign = [
143
+ ALGORITHM,
144
+ amz_date,
145
+ credential_scope,
146
+ Digest::SHA256.hexdigest(canonical_request)
147
+ ].join("\n")
148
+
149
+ signing_key = retrieve_signing_key(datestamp)
150
+ signature = OpenSSL::HMAC.hexdigest("SHA256", signing_key, string_to_sign)
151
+
152
+ return "https://" + self.host + canonical_uri + "?" + canonical_query_string + "&X-Amz-Signature=" + signature
153
+ end
154
+
155
+ # just a convenience method that can call public_url or presigned_url based on flag
156
+ #
157
+ # signer.url(object_key, public: true)
158
+ # #=> forwards to signer.public_url(object_key)
159
+ #
160
+ # signer.url(object_key, public: false, response_content_type: "image/jpeg")
161
+ # #=> forwards to signer.presigned_url(object_key, response_content_type: "image/jpeg")
162
+ #
163
+ # Options (such as response_content_type) that are not applicable to #public_url
164
+ # are ignored in public mode.
165
+ #
166
+ # The default value of `public` can be set by initializer arg `default_public`, which
167
+ # is itself default true.
168
+ #
169
+ # builder = FasterS3Url::Builder.new(..., default_public: false)
170
+ # builder.url(object_key) # will call #presigned_url
171
+ def url(key, public: @default_public, **options)
172
+ if public
173
+ public_url(key)
174
+ else
175
+ presigned_url(key, **options)
176
+ end
177
+ end
178
+
179
+
180
+ private
181
+
182
+ def make_signing_key(datestamp)
183
+ aws_get_signature_key(@secret_access_key, datestamp, @region, SERVICE)
184
+ end
185
+
186
+ # If caching of signing keys is turned on, use and cache signing key, while
187
+ # making sure not to cache more than MAX_CACHED_SIGNING_KEYS
188
+ #
189
+ # Otherwise if caching of signing keys is not turned on, just generate and return
190
+ # a signing key.
191
+ def retrieve_signing_key(datestamp)
192
+ if @cache_signing_keys
193
+ if value = @signing_key_cache[datestamp]
194
+ value
195
+ else
196
+ value = @signing_key_cache[datestamp] = make_signing_key(datestamp)
197
+ while @signing_key_cache.size > MAX_CACHED_SIGNING_KEYS
198
+ @signing_key_cache.delete(@signing_key_cache.keys.first)
199
+ end
200
+ value
201
+ end
202
+ else
203
+ make_signing_key(datestamp)
204
+ end
205
+ end
206
+
207
+
208
+ # Because CGI.escape in MRI is written in C, this really does seem
209
+ # to be the fastest way to get the semantics we want, starting with
210
+ # CGI.escape and doing extra gsubs. Alternative would be using something
211
+ # else in pure C that has the semantics we want, but does not seem available.
212
+ def uri_escape(string)
213
+ if string.nil?
214
+ nil
215
+ else
216
+ CGI.escape(string.encode('UTF-8')).gsub('+', '%20').gsub('%7E', '~')
217
+ end
218
+ end
219
+
220
+ # like uri_escape but does NOT escape `/`, leaves it alone. The appropriate
221
+ # escaping algorithm for an S3 key turning into a URL.
222
+ #
223
+ # Faster to un-DRY the code with uri_escape. Yes, faster to actually just gsub
224
+ # %2F back to /
225
+ def uri_escape_key(string)
226
+ if string.nil?
227
+ nil
228
+ else
229
+ CGI.escape(string.encode('UTF-8')).gsub('+', '%20').gsub('%7E', '~').gsub("%2F", "/")
230
+ end
231
+ end
232
+
233
+ def default_host(bucket_name)
234
+ if region == "us-east-1"
235
+ # use legacy one without region, as S3 seems to
236
+ "#{bucket_name}.s3.amazonaws.com".freeze
237
+ else
238
+ "#{bucket_name}.s3.#{region}.amazonaws.com".freeze
239
+ end
240
+ end
241
+
242
+ # `def get_signature_key` from python example at https://docs.aws.amazon.com/general/latest/gr/sigv4-signed-request-examples.html
243
+ def aws_get_signature_key(key, date_stamp, region_name, service_name)
244
+ k_date = aws_sign("AWS4" + key, date_stamp)
245
+ k_region = aws_sign(k_date, region_name)
246
+ k_service = aws_sign(k_region, service_name)
247
+ aws_sign(k_service, "aws4_request")
248
+ end
249
+
250
+ # `def sign` from python example at https://docs.aws.amazon.com/general/latest/gr/sigv4-signed-request-examples.html
251
+ def aws_sign(key, data)
252
+ OpenSSL::HMAC.digest("SHA256", key, data)
253
+ end
254
+
255
+ def validate_expires_in(expires_in)
256
+ if expires_in > ONE_WEEK
257
+ raise ArgumentError.new("expires_in value of #{expires_in} exceeds one-week maximum.")
258
+ elsif expires_in <= 0
259
+ raise ArgumentError.new("expires_in value of #{expires_in} cannot be 0 or less.")
260
+ end
261
+ end
262
+
263
+ # Crazy kind of reverse engineered from aws-sdk-ruby,
264
+ # for compatible handling of Expires header.
265
+ #
266
+ # This honestly seems to violate the HTTP spec, the result will be that for
267
+ # an `response-expires` param, subsequent S3 response will include an Expires
268
+ # header in ISO8601 instead of HTTP-date format.... but for now we'll make
269
+ # our tests pass by behaving equivalently to aws-sdk-s3 anyway? filed
270
+ # with aws-sdk-s3: https://github.com/aws/aws-sdk-ruby/issues/2415
271
+ #
272
+ # Switch last line from `.utc.iso8601` to `.httpdate` if you want to be
273
+ # more correct than aws-sdk-s3?
274
+ def convert_for_timestamp_shape(arg)
275
+ return nil if arg.nil?
276
+
277
+ time_value = case arg
278
+ when Time
279
+ arg
280
+ when Date, DateTime
281
+ arg.to_time
282
+ when Integer, Float
283
+ Time.at(arg)
284
+ else
285
+ Time.parse(arg.to_s)
286
+ end
287
+ time_value.utc.iso8601
288
+ end
289
+ end
290
+ end
@@ -0,0 +1,66 @@
1
+ gem "shrine", "~> 3.0"
2
+ require 'shrine/storage/s3'
3
+
4
+ module FasterS3Url
5
+ module Shrine
6
+ # More or less a drop-in replacement for Shrine::Storage::S3 , that uses FasterS3Url faster S3 URL generation.
7
+ # https://shrinerb.com/docs/storage/s3
8
+ #
9
+ # require 'faster_s3_url/shrine/storage'
10
+ #
11
+ # s3 = FasterS3Url::Shrine::Storage.new(
12
+ # bucket: "my-app", # required
13
+ # region: "eu-west-1", # required
14
+ # access_key_id: "abc",
15
+ # secret_access_key: "xyz"
16
+ # )
17
+ #
18
+ # A couple incompatibilities with Shrine::Storage::S3, which I don't expect to cause problems
19
+ # for anyone but if they do please let me know.
20
+ #
21
+ # * We do not support the :signer option in initializer (why would you want to use that with this? Let me know)
22
+ #
23
+ # * We support a `host` option on initializer, but do NOT support the `host` option on #url (I don't understand
24
+ # why it's per-call in the first place, do you need it to be?)
25
+ #
26
+ class Storage < ::Shrine::Storage::S3
27
+ # Same options as Shrine::Storage::S3, plus `host`
28
+ def initialize(**options)
29
+ if options[:signer]
30
+ raise ArgumentError.new("#{self.class.name} does not support :signer option of Shrine::Storage::S3. Should it? Let us know.")
31
+ end
32
+
33
+ host = options.delete(:host)
34
+ @faster_s3_url_builder = FasterS3Url::Builder.new(
35
+ bucket_name: options[:bucket],
36
+ access_key_id: options[:access_key_id],
37
+ secret_access_key: options[:secret_access_key],
38
+ region: options[:region],
39
+ host: host)
40
+
41
+ super(**options)
42
+ end
43
+
44
+ # unlike base Shrine::Storage::S3, does not support `host` here, do it in
45
+ # initializer instead. Is there a real use case for doing it here?
46
+ # If so let us know.
47
+ #
48
+ # options are ignored when public mode, so you can send options the same
49
+ # for public or not, and not get an error on public for options only appropriate
50
+ # to presigned.
51
+ #
52
+ # Otherwise, same options as Shrine::Storage::S3 should be supported, please
53
+ # see docs there. https://shrinerb.com/docs/storage/s3
54
+ def url(id, public: self.public, **options)
55
+ @faster_s3_url_builder.url(object_key(id), public: public, **options)
56
+ end
57
+
58
+ # For older shrine versions without it, we need this...
59
+ unless self.method_defined?(:object_key)
60
+ def object_key(id)
61
+ [*prefix, id].join("/")
62
+ end
63
+ end
64
+ end
65
+ end
66
+ end
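The `object_key` fallback above joins an optional storage prefix and the file id with `/`. A minimal standalone sketch of that idiom follows; `object_key_sketch` is a hypothetical name used only for illustration, and the prefix is passed as an argument here rather than read from the storage object.

```ruby
# Hypothetical standalone version of the fallback, for illustration only.
# `prefix` may be nil (no prefix configured) or a String.
def object_key_sketch(prefix, id)
  # Splatting nil yields no elements, so a nil prefix is simply dropped
  # and no leading "/" is produced.
  [*prefix, id].join("/")
end

puts object_key_sketch(nil, "photo.jpg")
puts object_key_sketch("cache", "photo.jpg")
```

The splat avoids an explicit nil check: with no prefix the key is just the id, and with a prefix the two parts are joined by a single slash.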
@@ -0,0 +1,3 @@
1
+ module FasterS3Url
2
+ VERSION = "0.1.0"
3
+ end
@@ -0,0 +1,86 @@
1
+ # Run as eg:
2
+ #
3
+ # $ bundle exec ruby perf/presigned_bench.rb
4
+
5
+ require 'benchmark/ips'
6
+ require 'faster_s3_url'
7
+ require 'aws-sdk-s3'
8
+ require 'wt_s3_signer'
9
+
10
+ access_key_id = "fakeExampleAccessKeyId"
11
+ secret_access_key = "fakeExampleSecretAccessKey"
12
+
13
+ bucket_name = "my-bucket"
14
+ object_key = "some/directory/file.jpg"
15
+ region = "us-east-1"
16
+
17
+ aws_client = Aws::S3::Client.new(region: region, access_key_id: access_key_id, secret_access_key: secret_access_key)
18
+ aws_bucket = Aws::S3::Bucket.new(name: bucket_name, client: aws_client)
19
+
20
+ faster_s3_builder = FasterS3Url::Builder.new(region: region, access_key_id: access_key_id, secret_access_key: secret_access_key, bucket_name: bucket_name)
21
+
22
+ faster_s3_builder_with_caching = FasterS3Url::Builder.new(region: region, access_key_id: access_key_id, secret_access_key: secret_access_key, bucket_name: bucket_name, cache_signing_keys: true)
23
+
24
+ wt_signer = WT::S3Signer.new(
25
+ expires_in: 15 * 60,
26
+ aws_region: region,
27
+ bucket_endpoint_url: "https://#{bucket_name}.s3.amazonaws.com",
28
+ bucket_host: "#{bucket_name}.s3.amazonaws.com",
29
+ bucket_name: bucket_name,
30
+ access_key_id: access_key_id,
31
+ secret_access_key: secret_access_key,
32
+ session_token: nil
33
+ )
34
+
35
+ Benchmark.ips do |x|
36
+ begin
37
+ require 'kalibera'
38
+ x.config(:stats => :bootstrap, :confidence => 95)
39
+ rescue LoadError
40
+ # kalibera is optional; without it, benchmark-ips uses its default statistics
41
+ end
42
+
43
+ x.report("aws-sdk-s3") do
44
+ aws_bucket.object(object_key).presigned_url(:get)
45
+ end
46
+
47
+ x.report("aws-sdk-s3 with custom headers") do
48
+ aws_bucket.object(object_key).presigned_url(:get, response_content_type: "image/jpeg", response_content_disposition: "attachment; filename=\"foo bar.baz\"; filename*=UTF-8''foo%20bar.baz")
49
+ end
50
+
51
+ x.report("re-used FasterS3Url") do
52
+ faster_s3_builder.presigned_url(object_key)
53
+ end
54
+
55
+ x.report("re-used FasterS3Url with cached signing keys") do
56
+ faster_s3_builder_with_caching.presigned_url(object_key)
57
+ end
58
+
59
+ x.report("re-used FasterS3Url with custom headers") do
60
+ faster_s3_builder.presigned_url(object_key, response_content_type: "image/jpeg", response_content_disposition: "attachment; filename=\"foo bar.baz\"; filename*=UTF-8''foo%20bar.baz")
61
+ end
62
+
63
+ x.report("new FasterS3Url Builder each time") do
64
+ builder = FasterS3Url::Builder.new(region: region, access_key_id: access_key_id, secret_access_key: secret_access_key, bucket_name: bucket_name)
65
+ builder.presigned_url(object_key)
66
+ end
67
+
68
+ x.report("re-used WT::S3Signer") do
69
+ wt_signer.presigned_get_url(object_key: object_key)
70
+ end
71
+
72
+ x.report("new WT::S3Signer each time") do
73
+ signer = WT::S3Signer.new(
74
+ expires_in: 15 * 60,
75
+ aws_region: region,
76
+ bucket_endpoint_url: "https://#{bucket_name}.s3.amazonaws.com",
77
+ bucket_host: "#{bucket_name}.s3.amazonaws.com",
78
+ bucket_name: bucket_name,
79
+ access_key_id: access_key_id,
80
+ secret_access_key: secret_access_key,
81
+ session_token: nil)
82
+ signer.presigned_get_url(object_key: object_key)
83
+ end
84
+
85
+ end
86
+
@@ -0,0 +1,38 @@
1
+ # Run as eg:
2
+ #
3
+ # $ bundle exec ruby perf/public_bench.rb
4
+
5
+ require 'benchmark/ips'
6
+ require 'faster_s3_url'
7
+ require 'aws-sdk-s3'
8
+
9
+ access_key_id = "fakeExampleAccessKeyId"
10
+ secret_access_key = "fakeExampleSecretAccessKey"
11
+
12
+ bucket_name = "my-bucket"
13
+ object_key = "some/directory/file.jpg"
14
+ region = "us-east-1"
15
+
16
+ aws_client = Aws::S3::Client.new(region: region, access_key_id: access_key_id, secret_access_key: secret_access_key)
17
+ aws_bucket = Aws::S3::Bucket.new(name: bucket_name, client: aws_client)
18
+
19
+ faster_s3_builder = FasterS3Url::Builder.new(region: region, access_key_id: access_key_id, secret_access_key: secret_access_key, bucket_name: bucket_name)
20
+
21
+ Benchmark.ips do |x|
22
+ begin
23
+ require 'kalibera'
24
+ x.config(:stats => :bootstrap, :confidence => 95)
25
+ rescue LoadError
26
+ # kalibera is optional; without it, benchmark-ips uses its default statistics
27
+ end
28
+
29
+
30
+ x.report("aws-sdk-s3") do
31
+ aws_bucket.object(object_key).public_url
32
+ end
33
+
34
+ x.report("FasterS3Url") do
35
+ faster_s3_builder.public_url(object_key)
36
+ end
37
+ end
38
+
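For context on what the public benchmark measures: a public S3 URL needs no signing, so building one is essentially string assembly. The sketch below is an assumption-laden illustration, not the gem's implementation. `public_url_sketch` is a hypothetical helper, and it assumes the virtual-hosted `https://<bucket>.s3.amazonaws.com` endpoint form that the presigned benchmark passes as `bucket_endpoint_url`.

```ruby
require "erb"

# Hypothetical helper, for illustration only; not FasterS3Url's actual code.
# Assumes the virtual-hosted endpoint form "<bucket>.s3.amazonaws.com".
def public_url_sketch(bucket_name, object_key)
  # Percent-encode each path segment separately so the "/" separators survive.
  # The -1 limit keeps trailing empty segments (keys ending in "/").
  escaped_key = object_key.split("/", -1)
                          .map { |segment| ERB::Util.url_encode(segment) }
                          .join("/")
  "https://#{bucket_name}.s3.amazonaws.com/#{escaped_key}"
end

puts public_url_sketch("my-bucket", "some/directory/file.jpg")
```

Because nothing here depends on credentials or the current time, a builder can do this work with very little per-call overhead, which is presumably the kind of saving the benchmark is meant to surface.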
metadata ADDED
@@ -0,0 +1,132 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: faster_s3_url
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Jonathan Rochkind
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2020-10-01 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: aws-sdk-s3
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.81'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.81'
27
+ - !ruby/object:Gem::Dependency
28
+ name: timecop
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "<"
32
+ - !ruby/object:Gem::Version
33
+ version: '2'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "<"
39
+ - !ruby/object:Gem::Version
40
+ version: '2'
41
+ - !ruby/object:Gem::Dependency
42
+ name: benchmark-ips
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '2.8'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '2.8'
55
+ - !ruby/object:Gem::Dependency
56
+ name: wt_s3_signer
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: shrine
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '3.0'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '3.0'
83
+ description:
84
+ email:
85
+ - jrochkind@sciencehistory.org
86
+ executables: []
87
+ extensions: []
88
+ extra_rdoc_files: []
89
+ files:
90
+ - ".gitignore"
91
+ - ".rspec"
92
+ - ".travis.yml"
93
+ - Gemfile
94
+ - LICENSE.txt
95
+ - README.md
96
+ - Rakefile
97
+ - bin/console
98
+ - bin/rspec
99
+ - bin/setup
100
+ - faster_s3_url.gemspec
101
+ - lib/faster_s3_url.rb
102
+ - lib/faster_s3_url/builder.rb
103
+ - lib/faster_s3_url/shrine/storage.rb
104
+ - lib/faster_s3_url/version.rb
105
+ - perf/presigned_bench.rb
106
+ - perf/public_bench.rb
107
+ homepage: https://github.com/jrochkind/faster_s3_url
108
+ licenses:
109
+ - MIT
110
+ metadata:
111
+ homepage_uri: https://github.com/jrochkind/faster_s3_url
112
+ source_code_uri: https://github.com/jrochkind/faster_s3_url
113
+ post_install_message:
114
+ rdoc_options: []
115
+ require_paths:
116
+ - lib
117
+ required_ruby_version: !ruby/object:Gem::Requirement
118
+ requirements:
119
+ - - ">="
120
+ - !ruby/object:Gem::Version
121
+ version: 2.3.0
122
+ required_rubygems_version: !ruby/object:Gem::Requirement
123
+ requirements:
124
+ - - ">="
125
+ - !ruby/object:Gem::Version
126
+ version: '0'
127
+ requirements: []
128
+ rubygems_version: 3.0.3
129
+ signing_key:
130
+ specification_version: 4
131
+ summary: Generate public and presigned AWS S3 GET URLs faster
132
+ test_files: []