faster_s3_url 0.1.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 8def7b598503ea770dec6923c611bb3b5fa506ba90f89852ce299457ccbefb1e
4
+ data.tar.gz: 79161290e666ca39b0a9f349dc69fcb326edbeb3b975c8c0ede9c0d97c892737
5
+ SHA512:
6
+ metadata.gz: 62f80a646516296746d851bcdf870c4c0e06a48af6d9e677a53c391b93ec6e4fd14dc86be73f2f540d1ce71587fc3fe2fc4abf19c6b8917e63fa75001e62a261
7
+ data.tar.gz: c3d738c8c3811954540725227a042da8820b9dd0a670e2aa32098c3be5cf015745d5217c904f9f6b0f6f762469f1147db0a76c8f9644d4c29ed8a0ff9d5f4559
@@ -0,0 +1,14 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
9
+
10
+ # rspec failure tracking
11
+ .rspec_status
12
+
13
+ Gemfile.lock
14
+ .byebug_history
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
@@ -0,0 +1,6 @@
1
+ ---
2
+ language: ruby
3
+ cache: bundler
4
+ rvm:
5
+ - 2.6.6
6
+ before_install: gem install bundler -v 2.1.4
data/Gemfile ADDED
@@ -0,0 +1,11 @@
1
+ source "https://rubygems.org"
2
+
3
+ # Specify your gem's dependencies in faster_s3_url.gemspec
4
+ gemspec
5
+
6
+ gem "rake", "~> 12.0"
7
+ gem "rspec", "~> 3.0"
8
+
9
+ gem 'pry-byebug', "~> 3.9"
10
+ # need straight from github to get latest version without deprecations, eg https://github.com/softdevteam/libkalibera/issues/5
11
+ gem 'kalibera', github: "softdevteam/libkalibera"
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2020 Science History Institute, Jonathan Rochkind
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
@@ -0,0 +1,218 @@
1
+ # FasterS3Url
2
+
3
+ Generate public and presigned AWS S3 `GET` URLs faster in ruby
4
+
5
+ [![Build Status](https://travis-ci.com/jrochkind/faster_s3_url.svg?branch=master)](https://travis-ci.com/jrochkind/faster_s3_url)
6
+
7
+ The official [ruby AWS SDK](https://github.com/aws/aws-sdk-ruby) is actually quite slow and unoptimized when generating URLs to access S3 objects. If you are only creating a couple S3 URLs at a time this may not matter. But it can matter on the order of even two or three hundred at a time, especially when creating presigned URLs, for which the AWS SDK is especially un-optimized.
8
+
9
+ This gem provides a much faster implementation, by around an order of magnitude, for both public and presigned S3 `GET` URLs. Additional S3 params such as `response-content-disposition` are supported for presigned URLs.
10
+
11
+ ## Usage
12
+
13
+ ```ruby
14
+ signer = FasterS3Url::Builder.new(
15
+ bucket_name: "my-bucket",
16
+ region: "us-east-1",
17
+ access_key_id: ENV['AWS_ACCESS_KEY'],
18
+ secret_access_key: ENV['AWS_SECRET_KEY']
19
+ )
20
+
21
+ signer.public_url("my/object/key.jpg")
22
+ #=> "https://my-bucket.s3.amazonaws.com/my/object/key.jpg"
23
+ signer.presigned_url("my/object/key.jpg")
24
+ ```
25
+
26
+ You can re-use a signer object for convenience or slightly improved performance. It should be concurrency-safe to share globally between threads.
27
+
28
+ If you are using S3 keys that need to be escaped in the URLs, this gem will escape them properly.
29
+
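As a rough illustration of that escaping, here is a minimal sketch using only the standard library. The helper name `escape_s3_key` is hypothetical and not part of this gem's public API; it mirrors the `CGI.escape`-plus-`gsub` approach used in the Builder source, where `/` is left alone so key "directories" survive in the URL path.

```ruby
require "cgi"

# Hypothetical helper, for illustration only.
# Percent-encode an S3 key for a URL path: start from CGI.escape, then
# turn "+" (CGI's encoding of a space) into "%20", restore "~", and
# restore "/" so path separators are not escaped.
def escape_s3_key(key)
  CGI.escape(key.encode("UTF-8"))
     .gsub("+", "%20")
     .gsub("%7E", "~")
     .gsub("%2F", "/")
end

puts escape_s3_key("reports/2020 summary~v1.pdf")
# prints "reports/2020%20summary~v1.pdf"
```

A literal `+` in a key is first encoded by `CGI.escape` as `%2B`, so the later `gsub("+", "%20")` only touches spaces.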
30
+ When presigning URLs, you can pass the query parameters supported by S3 to control subsequent response headers. You can also supply a version_id for a URL to access a specific version.
31
+
32
+ ```ruby
33
+ signer.presigned_url("my/object/key.jpg",
34
+ response_cache_control: "public, max-age=604800, immutable",
35
+ response_content_disposition: "attachment",
36
+ response_content_language: "de-DE, en-CA",
37
+ response_content_type: "text/html; charset=UTF-8",
38
+ response_content_encoding: "deflate, gzip",
39
+ response_expires: "Wed, 21 Oct 2030 07:28:00 GMT",
40
+ version_id: "BspIL8pXg_52rGXELmqZ7cgmn7u4XJgS"
41
+ )
42
+ ```
43
+
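To make the mapping from these keyword options to S3 query parameters concrete, here is a small sketch. The constant `RESPONSE_PARAM_NAMES` and the helpers `uri_escape` and `response_query_string` are hypothetical names for illustration; the real Builder does this inline and additionally normalizes `response_expires` into a timestamp, which this sketch skips.

```ruby
require "cgi"

# Hypothetical mapping from Ruby option names to S3 query parameter names.
RESPONSE_PARAM_NAMES = {
  response_cache_control: "response-cache-control",
  response_content_disposition: "response-content-disposition",
  response_content_encoding: "response-content-encoding",
  response_content_language: "response-content-language",
  response_content_type: "response-content-type",
  response_expires: "response-expires",
  version_id: "versionId"
}.freeze

# Percent-encode a query value (spaces become %20, "~" stays literal).
def uri_escape(value)
  CGI.escape(value.encode("UTF-8")).gsub("+", "%20").gsub("%7E", "~")
end

# Build the extra query string, skipping options that were not supplied.
def response_query_string(**options)
  RESPONSE_PARAM_NAMES.map { |option, param_name|
    value = options[option]
    "#{param_name}=#{uri_escape(value)}" unless value.nil?
  }.compact.join("&")
end

puts response_query_string(
  response_content_disposition: "attachment",
  response_content_type: "text/html; charset=UTF-8"
)
# prints "response-content-disposition=attachment&response-content-type=text%2Fhtml%3B%20charset%3DUTF-8"
```

The parameter names are emitted in a fixed order that happens to be sorted already, which matters because the signed query string must list parameters in sorted order.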
44
+ Use a CNAME or CDN or any other hostname variant other than the default this gem will come up with? Just pass in a `host` argument to the initializer. It works with both public and presigned URLs.
45
+
46
+ ```ruby
47
+ builder = FasterS3Url::Builder.new(
48
+ bucket_name: "my-bucket.example.com",
49
+ host: "my-bucket.example.com",
50
+ region: "us-east-1",
51
+ access_key_id: ENV['AWS_ACCESS_KEY'],
52
+ secret_access_key: ENV['AWS_SECRET_KEY']
53
+ )
54
+ ```
55
+
56
+ ### Cache signing keys for further performance
57
+
58
+ Under most usage patterns, the presigned URLs you generate will all use a `time` with the same UTC date. In this case, a performance advantage can be had by asking the Builder to cache and re-use AWS signing keys, which vary only with the calendar date of the `time` arg, not the time of day, the S3 key, or other args. It will cache up to 5 signing keys, evicting the oldest first. This can result in around a 50% performance improvement with a re-used Builder used for generating presigned URLs.
59
+
60
+ **NOTE WELL: This will technically make the Builder object no longer concurrency-safe under multiple threads.** Although you might get away with it under MRI. This is one reason it is not on by default.
61
+
62
+ ```ruby
63
+ builder = FasterS3Url::Builder.new(
64
+ bucket_name: "my-bucket.example.com",
65
+ region: "us-east-1",
66
+ access_key_id: ENV['AWS_ACCESS_KEY'],
67
+ secret_access_key: ENV['AWS_SECRET_KEY'],
68
+ cache_signing_keys: true
69
+ )
70
+ builder.presigned_url(key) # performance enhanced
71
+ ```
72
+
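To show why the signing key can be cached per date, here is a sketch of the SigV4 key derivation, which takes only the secret, the date stamp, the region, and the service as inputs. The `SigningKeyCache` class is hypothetical and is not what the gem does: it guards the hash with a `Mutex` as one possible way to share a cache between threads, at the cost of some locking overhead.

```ruby
require "openssl"

# SigV4 signing key: chained HMAC-SHA256 over the date stamp, region,
# service, and the literal "aws4_request".
def derive_signing_key(secret_access_key, datestamp, region, service = "s3")
  k_date    = OpenSSL::HMAC.digest("SHA256", "AWS4" + secret_access_key, datestamp)
  k_region  = OpenSSL::HMAC.digest("SHA256", k_date, region)
  k_service = OpenSSL::HMAC.digest("SHA256", k_region, service)
  OpenSSL::HMAC.digest("SHA256", k_service, "aws4_request")
end

# Hypothetical thread-safe cache, for illustration only (not part of the gem).
class SigningKeyCache
  def initialize(secret_access_key, region, max_size: 5)
    @secret_access_key = secret_access_key
    @region = region
    @max_size = max_size
    @keys = {}
    @lock = Mutex.new
  end

  # Return the key for a "YYYYMMDD" date stamp, deriving it on first use
  # and evicting the oldest entries once the cache exceeds max_size.
  def fetch(datestamp)
    @lock.synchronize do
      @keys[datestamp] ||= derive_signing_key(@secret_access_key, datestamp, @region)
      @keys.delete(@keys.keys.first) while @keys.size > @max_size
      @keys[datestamp]
    end
  end

  def size
    @lock.synchronize { @keys.size }
  end
end

cache = SigningKeyCache.new("fakeExampleSecretAccessKey", "us-east-1")
key = cache.fetch("20200101")
puts key.bytesize # a SHA256 HMAC digest is 32 bytes
```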
73
+
74
+ ### Automatic AWS credentials lookup?
75
+
76
+ Right now, you need to explicitly supply `access_key_id` and `secret_access_key`, in part to avoid a dependency on the AWS SDK (This gem doesn't have such a dependency!). Let us know if this makes you feel a certain kind of way.
77
+
78
+ If you want to look up key/secret/region using the standard SDK methods of checking various places, in order to supply them to the `FasterS3Url::Builder`, you can try this (is there a better way? Cause this is kind of a mess!)
79
+
80
+ ```ruby
81
+ require 'aws-sdk-s3'
82
+ client = Aws::S3::Client.new
83
+ credentials = client.config.credentials
84
+ credentials = credentials.credentials if credentials.respond_to?(:credentials)
85
+
86
+ access_key_id = credentials.access_key_id
87
+ secret_access_key = credentials.secret_access_key
88
+ region = client.config.region
89
+ ```
90
+
91
+ ### Shrine Storage
92
+
93
+ Use [shrine](https://shrinerb.com/)? We do and love it. This gem provides a storage that can be a drop-in replacement to [Shrine::Storage::S3](https://shrinerb.com/docs/storage/s3) (shrine 3.x required), but with faster URL generation.
94
+
95
+ ```ruby
96
+ # Where you might have done:
97
+
98
+ require "shrine/storage/s3"
99
+
100
+ s3 = Shrine::Storage::S3.new(
101
+ bucket: "my-app", # required
102
+ region: "eu-west-1", # required
103
+ access_key_id: "abc",
104
+ secret_access_key: "xyz",
105
+ )
106
+
107
+ # instead do:
108
+
109
+ require "faster_s3_url/shrine/storage"
110
+
111
+ s3 = FasterS3Url::Shrine::Storage.new(
112
+ bucket: "my-app", # required
113
+ region: "eu-west-1", # required
114
+ access_key_id: "abc", # required
115
+ secret_access_key: "xyz", # required
116
+ )
117
+ ```
118
+
119
+ A couple minor differences, let me know if they disrupt you:
120
+ * We don't support the `signer` initializer argument, not clear to me why you'd want to use this gem if you are using it.
121
+ * We support a `host` arg in initializer, but not in #url method.
122
+
123
+ ## Performance Benchmarking
124
+
125
+ Benchmarks were done using scripts checked into repo at `./perf` (which use benchmark-ips with mode `:stats => :bootstrap, :confidence => 95`), on my 2015 Macbook Pro, using ruby MRI 2.6.6. Benchmarking is never an exact science, hopefully this is reasonable.
126
+
127
+ In my narrative, I normalize to how many iterations can happen in **10ms** to have numbers closer to what might be typical use cases.
128
+
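The normalization is just a division by 100, since 10ms is one hundredth of a second. A tiny sketch, with `per_10ms` as a hypothetical helper name and the inputs taken from the public URL benchmark rates reported in this README:

```ruby
# Convert a benchmark-ips rate (iterations per second) into
# iterations per 10ms, rounded to the nearest whole iteration.
def per_10ms(iterations_per_second)
  (iterations_per_second / 100.0).round
end

puts per_10ms(18_701)   # aws-sdk-s3 public URLs: prints 187
puts per_10ms(222_938)  # FasterS3Url public URLs: prints 2229
```

The narrative rounds these further, to "about 180" and "2,200".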
129
+ ### Public URLs
130
+
131
+ `aws-sdk-s3` can create about 180 public URLs in 10ms, not horrible, but for how simple it seems the operation should be? FasterS3Url can do 2,200 public URLs in 10ms, that's a lot better.
132
+
133
+ ```
134
+ $ bundle exec ruby perf/public_bench.rb
135
+ Warming up --------------------------------------
136
+ aws-sdk-s3 1.265k i/100ms
137
+ FasterS3Url 24.414k i/100ms
138
+ Calculating -------------------------------------
139
+ aws-sdk-s3 18.701k (± 3.2%) i/s - 92.345k in 5.048062s
140
+ FasterS3Url 222.938k (± 3.2%) i/s - 1.123M in 5.106971s
141
+ with 95.0% confidence
142
+ ```
143
+
144
+ ### Presigned URLs
145
+
146
+ Here's where it really starts to matter.
147
+
148
+ `aws-sdk-s3` can only generate about 10 presigned URLs in 10ms, painful. FasterS3Url, with a re-used Builder object, can generate about 220 presigned URLs in 10ms, much better, and actually faster than `aws-sdk-s3` can generate public URLs! Even if we re-instantiate a Builder each time, we can generate 180 presigned URLs in 10ms, so we don't lose too much performance that way.
149
+
150
+ If we re-use the Builder *and* turn on the (not thread-safe) `cache_signing_keys` option, we can get up to 300 presigned URLs generated in 10ms.
151
+
152
+ FasterS3Url supports supplying custom query params to control S3 HTTP response headers. This does slow things down since they need to be URI-escaped and constructed. Using this feature with `aws-sdk-s3`, it doesn't lose much speed, down to 9 instead of 10 URLs in 10ms. FasterS3Url goes down from 220 to 180 URLs generated in 10ms (without using the `cache_signing_keys` option).
153
+
154
+ We can compare to the ultra-fast [wt_s3_signer](https://github.com/WeTransfer/wt_s3_signer) gem, which, with a re-used signer object (that assumes the same `time` for all URLs, unlike us; and does not support per-url custom query params) can get all the way up to 680 URLs generated in 10ms, over twice as fast as we can do even with `cache_signing_keys`. If the restrictions and API of wt_s3_signer are amenable to your use case, it's definitely the fastest. But FasterS3Url is in the ballpark, and still more than an order of magnitude faster than `aws-sdk-s3`.
155
+
156
+ ```
157
+ $ bundle exec ruby perf/presigned_bench.rb
158
+ Warming up --------------------------------------
159
+ aws-sdk-s3 113.000 i/100ms
160
+ aws-sdk-s3 with custom headers
161
+ 95.000 i/100ms
162
+ re-used FasterS3Url 1.820k i/100ms
163
+ re-used FasterS3Url with cached signing keys
164
+ 2.920k i/100ms
165
+ re-used FasterS3URL with custom headers
166
+ 1.494k i/100ms
167
+ new FasterS3URL Builder each time
168
+ 1.977k i/100ms
169
+ re-used WT::S3Signer 7.985k i/100ms
170
+ new WT::S3Signer each time
171
+ 1.611k i/100ms
172
+ Calculating -------------------------------------
173
+ aws-sdk-s3 1.084k (± 4.2%) i/s - 5.311k in 5.003981s
174
+ aws-sdk-s3 with custom headers
175
+ 918.315 (± 4.8%) i/s - 4.560k in 5.118770s
176
+ re-used FasterS3Url 21.906k (± 4.0%) i/s - 107.380k in 5.046561s
177
+ re-used FasterS3Url with cached signing keys
178
+ 29.756k (± 3.6%) i/s - 146.000k in 4.999910s
179
+ re-used FasterS3URL with custom headers
180
+ 18.062k (± 4.3%) i/s - 85.158k in 5.025685s
181
+ new FasterS3URL Builder each time
182
+ 18.312k (± 3.9%) i/s - 90.942k in 5.098636s
183
+ re-used WT::S3Signer 68.275k (± 3.5%) i/s - 343.355k in 5.109088s
184
+ new WT::S3Signer each time
185
+ 22.425k (± 2.8%) i/s - 111.159k in 5.036814s
186
+ with 95.0% confidence
187
+ ```
188
+
189
+
190
+ ## Development
191
+
192
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
193
+
194
+ To install local unreleased source of this gem onto your local machine for development, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
195
+
196
+ ## Sources/Acknowledgements
197
+
198
+ [wt_s3_signer](https://github.com/WeTransfer/wt_s3_signer) served as a proof of concept, but was optimized for their particular use case, assuming batch processing where all signed S3 URLs share the same "now" time. I needed to support cases that didn't assume this, and also support custom headers like `response_content_disposition`. This code is also released with a different license. But if the API and use cases of `wt_s3_signer` meet your needs, it is even faster than this code.
199
+
200
+ I tried to figure out how to do the S3 presigned request from some AWS docs:
201
+
202
+ * https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html
203
+ * https://docs.aws.amazon.com/general/latest/gr/sigv4-signed-request-examples.html
204
+
205
+ But these don't really have all the info you need for generating an S3 signature. So I also ended up reverse engineering the ruby aws-sdk code in a debugger while it generated an S3 presigned URL, for instance:
206
+
207
+ * https://github.com/aws/aws-sdk-ruby/blob/47c11bef18a4754ec8a05dfb637dcab120138c27/gems/aws-sdk-s3/lib/aws-sdk-s3/presigner.rb
208
+ * but especially: https://github.com/aws/aws-sdk-ruby/blob/47c11bef18a4754ec8a05dfb637dcab120138c27/gems/aws-sigv4/lib/aws-sigv4/signer.rb
209
+
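For readers who want to see the overall shape of the signing flow, here is a condensed sketch of a SigV4 query-string presign for an S3 `GET`, using only the standard library. It follows the same steps as the Builder (canonical request, string to sign, derived signing key, HMAC signature) but leaves out response params, validation, and caching. The names `presign_get_url` and `sigv4_escape` are hypothetical, and the credentials in the usage line are fake.

```ruby
require "cgi"
require "digest"
require "openssl"

# Percent-encode per SigV4 rules: spaces as %20, "~" left literal.
def sigv4_escape(string)
  CGI.escape(string).gsub("+", "%20").gsub("%7E", "~")
end

# Hypothetical, simplified presigner for illustration only.
def presign_get_url(host:, key:, region:, access_key_id:, secret_access_key:,
                    time: Time.now, expires_in: 900)
  now = time.getutc # getutc returns a new Time, so the argument is not mutated
  amz_date = now.strftime("%Y%m%dT%H%M%SZ")
  datestamp = now.strftime("%Y%m%d")
  scope = "#{datestamp}/#{region}/s3/aws4_request"

  canonical_uri = "/" + sigv4_escape(key).gsub("%2F", "/")
  query = [
    "X-Amz-Algorithm=AWS4-HMAC-SHA256",
    "X-Amz-Credential=" + sigv4_escape("#{access_key_id}/#{scope}"),
    "X-Amz-Date=#{amz_date}",
    "X-Amz-Expires=#{expires_in}",
    "X-Amz-SignedHeaders=host"
  ].join("&")

  # Only the host header is signed; the payload is declared unsigned.
  canonical_request = ["GET", canonical_uri, query, "host:#{host}\n",
                       "host", "UNSIGNED-PAYLOAD"].join("\n")
  string_to_sign = ["AWS4-HMAC-SHA256", amz_date, scope,
                    Digest::SHA256.hexdigest(canonical_request)].join("\n")

  # Signing key: chained HMACs over date, region, service, "aws4_request".
  signing_key = [datestamp, region, "s3", "aws4_request"]
    .reduce("AWS4" + secret_access_key) { |k, data| OpenSSL::HMAC.digest("SHA256", k, data) }
  signature = OpenSSL::HMAC.hexdigest("SHA256", signing_key, string_to_sign)

  "https://#{host}#{canonical_uri}?#{query}&X-Amz-Signature=#{signature}"
end

puts presign_get_url(host: "my-bucket.s3.amazonaws.com", key: "my/object/key.jpg",
                     region: "us-east-1", access_key_id: "fakeExampleAccessKeyId",
                     secret_access_key: "fakeExampleSecretAccessKey")
```

A sketch like this is handy for comparing intermediate values against `aws-sdk-s3` output in a debugger, which is roughly how the gem's implementation was arrived at.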
210
+ ## Contributing
211
+
212
+ Bug reports and pull requests are welcome on GitHub at https://github.com/jrochkind/faster_s3_url.
213
+
214
+ Is there a feature missing that you need? I may not be able to provide it, but I would love to hear from you!
215
+
216
+ ## License
217
+
218
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "faster_s3_url"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
@@ -0,0 +1,29 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ #
5
+ # This file was generated by Bundler.
6
+ #
7
+ # The application 'rspec' is installed as part of a gem, and
8
+ # this file is here to facilitate running it.
9
+ #
10
+
11
+ require "pathname"
12
+ ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../Gemfile",
13
+ Pathname.new(__FILE__).realpath)
14
+
15
+ bundle_binstub = File.expand_path("../bundle", __FILE__)
16
+
17
+ if File.file?(bundle_binstub)
18
+ if File.read(bundle_binstub, 300) =~ /This file was generated by Bundler/
19
+ load(bundle_binstub)
20
+ else
21
+ abort("Your `bin/bundle` was not generated by Bundler, so this binstub cannot run.
22
+ Replace `bin/bundle` by running `bundle binstubs bundler --force`, then run this command again.")
23
+ end
24
+ end
25
+
26
+ require "rubygems"
27
+ require "bundler/setup"
28
+
29
+ load Gem.bin_path("rspec-core", "rspec")
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,35 @@
1
+ require_relative 'lib/faster_s3_url/version'
2
+
3
+ Gem::Specification.new do |spec|
4
+ spec.name = "faster_s3_url"
5
+ spec.version = FasterS3Url::VERSION
6
+ spec.authors = ["Jonathan Rochkind"]
7
+ spec.email = ["jrochkind@sciencehistory.org"]
8
+
9
+ spec.summary = %q{Generate public and presigned AWS S3 GET URLs faster}
10
+ spec.homepage = "https://github.com/jrochkind/faster_s3_url"
11
+ spec.license = "MIT"
12
+ spec.required_ruby_version = Gem::Requirement.new(">= 2.3.0")
13
+
14
+ #spec.metadata["allowed_push_host"] = "TODO: Set to 'http://mygemserver.com'"
15
+
16
+ spec.metadata["homepage_uri"] = spec.homepage
17
+ spec.metadata["source_code_uri"] = "https://github.com/jrochkind/faster_s3_url"
18
+ #spec.metadata["changelog_uri"] = "TODO: Put your gem's CHANGELOG.md URL here."
19
+
20
+ # Specify which files should be added to the gem when it is released.
21
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
22
+ spec.files = Dir.chdir(File.expand_path('..', __FILE__)) do
23
+ `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
24
+ end
25
+ spec.bindir = "exe"
26
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
27
+ spec.require_paths = ["lib"]
28
+
29
+ spec.add_development_dependency "aws-sdk-s3", "~> 1.81"
30
+ spec.add_development_dependency "timecop", "< 2"
31
+ spec.add_development_dependency "benchmark-ips", "~> 2.8"
32
+ #spec.add_development_dependency "kalibera" # for benchmark-ips :bootstrap stats option
33
+ spec.add_development_dependency "wt_s3_signer" # just for benchmarking
34
+ spec.add_development_dependency "shrine", "~> 3.0" # for testing shrine storage
35
+ end
@@ -0,0 +1,5 @@
1
+ require "faster_s3_url/version"
2
+ require 'faster_s3_url/builder'
3
+
4
+ module FasterS3Url
5
+ end
@@ -0,0 +1,290 @@
1
+ # frozen_string_literal: true
2
+
+ require "cgi"
+ require "digest"
+ require "openssl"
+ require "time"
+
3
+ module FasterS3Url
4
+ # Signing algorithm based on Amazon docs at https://docs.aws.amazon.com/general/latest/gr/sigv4-signed-request-examples.html ,
5
+ # as well as some interactive code reading of Aws::Sigv4::Signer
6
+ # https://github.com/aws/aws-sdk-ruby/blob/6114bc9692039ac75c8292c66472dacd14fa6f9a/gems/aws-sigv4/lib/aws-sigv4/signer.rb
7
+ # as used by Aws::S3::Presigner https://github.com/aws/aws-sdk-ruby/blob/6114bc9692039ac75c8292c66472dacd14fa6f9a/gems/aws-sdk-s3/lib/aws-sdk-s3/presigner.rb
8
+ class Builder
9
+ FIFTEEN_MINUTES = 60 * 15
10
+ ONE_WEEK = 60 * 60 * 24 * 7
11
+
12
+ SIGNED_HEADERS = "host".freeze
13
+ METHOD = "GET".freeze
14
+ ALGORITHM = "AWS4-HMAC-SHA256".freeze
15
+ SERVICE = "s3".freeze
16
+
17
+ DEFAULT_EXPIRES_IN = FIFTEEN_MINUTES # 15 minutes, seems to be AWS SDK default
18
+
19
+ MAX_CACHED_SIGNING_KEYS = 5
20
+
21
+ attr_reader :bucket_name, :region, :host, :access_key_id
22
+
23
+ # @option params [String] :bucket_name required
24
+ #
25
+ # @option params [String] :region eg "us-east-1", required
26
+ #
27
+ # @option params [String] :host optional, host to use in generated URLs. If empty, will construct default AWS S3 host for bucket name and region.
28
+ #
29
+ # @option params [String] :access_key_id required at present, change to allow look up from environment using standard aws sdk routines?
30
+ #
31
+ # @option params [String] :secret_access_key required at present, change to allow look up from environment using standard aws sdk routines?
32
+ #
33
+ # @option params [boolean] :default_public (true) default value of `public` when instance method #url is called.
34
+ #
35
+ # @option params [boolean] :cache_signing_keys (false). If set to true, up to five signing keys used for presigned URLs will
36
+ # be cached and re-used, improving performance when generating multiple presigned urls with a single Builder by around 50%.
37
+ # NOTE WELL: This will make the Builder no longer technically concurrency-safe for sharing between multiple threads, is one
38
+ # reason it is not on by default.
39
+ def initialize(bucket_name:, region:, access_key_id:, secret_access_key:, host:nil, default_public: true, cache_signing_keys: false)
40
+ @bucket_name = bucket_name
41
+ @region = region
42
+ @host = host || default_host(bucket_name)
43
+ @default_public = default_public
44
+ @access_key_id = access_key_id
45
+ @secret_access_key = secret_access_key
46
+ @cache_signing_keys = cache_signing_keys
47
+ if @cache_signing_keys
48
+ @signing_key_cache = {}
49
+ end
50
+
51
+
52
+ @canonical_headers = "host:#{@host}\n"
53
+ end
54
+
55
+ def public_url(key)
56
+ "https://#{self.host}/#{uri_escape_key(key)}"
57
+ end
58
+
59
+ # Generates a presigned GET URL for a specified S3 object key.
60
+ #
61
+ # @param [String] key The S3 key to create a URL pointing to.
62
+ #
63
+ # @option params [Time] :time (Time.now) The starting time for when the
64
+ # presigned url becomes active.
65
+ #
66
+ # @option params [String] :response_cache_control
67
+ # Adds a `response-cache-control` query param to set the `Cache-Control` header of the subsequent response from S3.
68
+ #
69
+ # @option params [String] :response_content_disposition
70
+ # Adds a `response-content-disposition` query param to set the `Content-Disposition` header of the subsequent response from S3
71
+ #
72
+ # @option params [String] :response_content_encoding
73
+ # Adds a `response-content-encoding` query param to set `Content-Encoding` header of the subsequent response from S3
74
+ #
75
+ # @option params [String] :response_content_language
76
+ # Adds a `response-content-language` query param to set the `Content-Language` header of the subsequent response from S3
77
+ #
78
+ # @option params [String] :response_content_type
79
+ # Adds a `response-content-type` query param to set the `Content-Type` header of the subsequent response from S3
80
+ #
81
+ # @option params [String] :response_expires
82
+ # Adds a `response-expires` query param to set the `Expires` header of the subsequent response from S3
83
+ #
84
+ # @option params [String] :version_id
85
+ # Adds a `versionId` query param to reference a specific version of the object from S3.
86
+ def presigned_url(key, time: nil, expires_in: DEFAULT_EXPIRES_IN,
87
+ response_cache_control: nil,
88
+ response_content_disposition: nil,
89
+ response_content_encoding: nil,
90
+ response_content_language: nil,
91
+ response_content_type: nil,
92
+ response_expires: nil,
93
+ version_id: nil)
94
+ validate_expires_in(expires_in)
95
+
96
+ canonical_uri = "/" + uri_escape_key(key)
97
+
98
+ now = time ? time.dup.utc : Time.now.utc # Uh Time#utc is mutating, not nice to do to an argument!
99
+ amz_date = now.strftime("%Y%m%dT%H%M%SZ")
100
+ datestamp = now.strftime("%Y%m%d")
101
+
102
+ credential_scope = datestamp + '/' + region + '/' + SERVICE + '/' + 'aws4_request'
103
+
104
+ canonical_query_string_parts = [
105
+ "X-Amz-Algorithm=#{ALGORITHM}",
106
+ "X-Amz-Credential=" + uri_escape(@access_key_id + "/" + credential_scope),
107
+ "X-Amz-Date=" + amz_date,
108
+ "X-Amz-Expires=" + expires_in.to_s,
109
+ "X-Amz-SignedHeaders=" + SIGNED_HEADERS,
110
+ ]
111
+
112
+ extra_params = {
113
+ :"response-cache-control" => response_cache_control,
114
+ :"response-content-disposition" => response_content_disposition,
115
+ :"response-content-encoding" => response_content_encoding,
116
+ :"response-content-language" => response_content_language,
117
+ :"response-content-type" => response_content_type,
118
+ :"response-expires" => convert_for_timestamp_shape(response_expires),
119
+ :"versionId" => version_id
120
+ }.compact
121
+
122
+
123
+ if extra_params.size > 0
124
+ # These have to be sorted, but sort is case-sensitive, and we have a fixed
125
+ # list of headers we know might be here... turns out they are already sorted?
126
+ extra_param_parts = extra_params.collect {|k, v| "#{k}=#{uri_escape v}" }.join("&")
127
+ canonical_query_string_parts << extra_param_parts
128
+ end
129
+
130
+ canonical_query_string = canonical_query_string_parts.join("&")
131
+
132
+
133
+
134
+ canonical_request = ["GET",
135
+ canonical_uri,
136
+ canonical_query_string,
137
+ @canonical_headers,
138
+ SIGNED_HEADERS,
139
+ 'UNSIGNED-PAYLOAD'
140
+ ].join("\n")
141
+
142
+ string_to_sign = [
143
+ ALGORITHM,
144
+ amz_date,
145
+ credential_scope,
146
+ Digest::SHA256.hexdigest(canonical_request)
147
+ ].join("\n")
148
+
149
+ signing_key = retrieve_signing_key(datestamp)
150
+ signature = OpenSSL::HMAC.hexdigest("SHA256", signing_key, string_to_sign)
151
+
152
+ return "https://" + self.host + canonical_uri + "?" + canonical_query_string + "&X-Amz-Signature=" + signature
153
+ end
154
+
155
+ # just a convenience method that can call public_url or presigned_url based on flag
156
+ #
157
+ # signer.url(object_key, public: true)
158
+ # #=> forwards to signer.public_url(object_key)
159
+ #
160
+ # signer.url(object_key, public: false, response_content_type: "image/jpeg")
161
+ # #=> forwards to signer.presigned_url(object_key, response_content_type: "image/jpeg")
162
+ #
163
+ # Options (such as response_content_type) that are not applicable to #public_url
164
+ # are ignored in public mode.
165
+ #
166
+ # The default value of `public` can be set by initializer arg `default_public`, which
167
+ # is itself default true.
168
+ #
169
+ # builder = FasterS3Url::Builder.new(..., default_public: false)
170
+ # builder.url(object_key) # will call #presigned_url
171
+ def url(key, public: @default_public, **options)
172
+ if public
173
+ public_url(key)
174
+ else
175
+ presigned_url(key, **options)
176
+ end
177
+ end
178
+
179
+
180
+ private
181
+
182
+ def make_signing_key(datestamp)
183
+ aws_get_signature_key(@secret_access_key, datestamp, @region, SERVICE)
184
+ end
185
+
186
+ # If caching of signing keys is turned on, use and cache signing key, while
187
+ # making sure not to cache more than MAX_CACHED_SIGNING_KEYS
188
+ #
189
+ # Otherwise if caching of signing keys is not turned on, just generate and return
190
+ # a signing key.
191
+ def retrieve_signing_key(datestamp)
192
+ if @cache_signing_keys
193
+ if value = @signing_key_cache[datestamp]
194
+ value
195
+ else
196
+ value = @signing_key_cache[datestamp] = make_signing_key(datestamp)
197
+ while @signing_key_cache.size > MAX_CACHED_SIGNING_KEYS
198
+ @signing_key_cache.delete(@signing_key_cache.keys.first)
199
+ end
200
+ value
201
+ end
202
+ else
203
+ make_signing_key(datestamp)
204
+ end
205
+ end
206
+
207
+
208
+ # Because CGI.escape in MRI is written in C, this really does seem
209
+ # to be the fastest way to get the semantics we want, starting with
210
+ # CGI.escape and doing extra gsubs. Alternative would be using something
211
+ # else in pure C that has the semantics we want, but does not seem available.
212
+ def uri_escape(string)
213
+ if string.nil?
214
+ nil
215
+ else
216
+ CGI.escape(string.encode('UTF-8')).gsub('+', '%20').gsub('%7E', '~')
217
+ end
218
+ end
219
+
220
+ # like uri_escape but does NOT escape `/`, leaves it alone. The appropriate
221
+ # escaping algorithm for an S3 key turning into a URL.
222
+ #
223
+ # Faster to un-DRY the code with uri_escape. Yes, faster to actually just gsub
224
+ # %2F back to /
225
+ def uri_escape_key(string)
226
+ if string.nil?
227
+ nil
228
+ else
229
+ CGI.escape(string.encode('UTF-8')).gsub('+', '%20').gsub('%7E', '~').gsub("%2F", "/")
230
+ end
231
+ end
232
+
233
+ def default_host(bucket_name)
234
+ if region == "us-east-1"
235
+ # use legacy one without region, as S3 seems to
236
+ "#{bucket_name}.s3.amazonaws.com".freeze
237
+ else
238
+ "#{bucket_name}.s3.#{region}.amazonaws.com".freeze
239
+ end
240
+ end
241
+
242
+ # `def get_signature_key` from python example at https://docs.aws.amazon.com/general/latest/gr/sigv4-signed-request-examples.html
243
+ def aws_get_signature_key(key, date_stamp, region_name, service_name)
244
+ k_date = aws_sign("AWS4" + key, date_stamp)
245
+ k_region = aws_sign(k_date, region_name)
246
+ k_service = aws_sign(k_region, service_name)
247
+ aws_sign(k_service, "aws4_request")
248
+ end
249
+
250
+ # `def sign` from python example at https://docs.aws.amazon.com/general/latest/gr/sigv4-signed-request-examples.html
251
+ def aws_sign(key, data)
252
+ OpenSSL::HMAC.digest("SHA256", key, data)
253
+ end
254
+
255
+ def validate_expires_in(expires_in)
256
+ if expires_in > ONE_WEEK
257
+ raise ArgumentError.new("expires_in value of #{expires_in} exceeds one-week maximum.")
258
+ elsif expires_in <= 0
259
+ raise ArgumentError.new("expires_in value of #{expires_in} cannot be 0 or less.")
260
+ end
261
+ end
262
+
263
+ # Crazy kind of reverse engineered from aws-sdk-ruby,
264
+ # for compatible handling of Expires header.
265
+ #
266
+ # This honestly seems to violate the HTTP spec, the result will be that for
267
+ # an `response-expires` param, subsequent S3 response will include an Expires
268
+ # header in ISO8601 instead of HTTP-date format.... but for now we'll make
269
+ # our tests pass by behaving equivalently to aws-sdk-s3 anyway? filed
270
+ # with aws-sdk-s3: https://github.com/aws/aws-sdk-ruby/issues/2415
271
+ #
272
+ # Switch last line from `.utc.iso8601` to `.httpdate` if you want to be
273
+ # more correct than aws-sdk-s3?
274
+ def convert_for_timestamp_shape(arg)
275
+ return nil if arg.nil?
276
+
277
+ time_value = case arg
278
+ when Time
279
+ arg
280
+ when Date, DateTime
281
+ arg.to_time
282
+ when Integer, Float
283
+ Time.at(arg)
284
+ else
285
+ Time.parse(arg.to_s)
286
+ end
287
+ time_value.utc.iso8601
288
+ end
289
+ end
290
+ end
@@ -0,0 +1,66 @@
1
+ gem "shrine", "~> 3.0"
2
+ require 'shrine/storage/s3'
3
+
4
+ module FasterS3Url
5
+ module Shrine
6
+ # More or less a drop-in replacement for Shrine::Storage::S3 , that uses FasterS3Url faster S3 URL generation.
7
+ # https://shrinerb.com/docs/storage/s3
8
+ #
9
+ # require 'faster_s3_url/shrine/storage'
10
+ #
11
+ # s3 = FasterS3Url::Shrine::Storage.new(
12
+ # bucket: "my-app", # required
13
+ # region: "eu-west-1", # required
14
+ # access_key_id: "abc",
15
+ # secret_access_key: "xyz"
16
+ # )
17
+ #
18
+ # A couple incompatibilities with Shrine::Storage::S3, which I don't expect to cause problems
19
+ # for anyone but if they do please let me know.
20
+ #
21
+ # * We do not support the :signer option in initializer (why would you want to use that with this? Let me know)
22
+ #
23
+ # * We support a `host` option on initializer, but do NOT support the `host` option on #url (I don't understand
24
+ # why it's per-call in the first place, do you need it to be?)
25
+ #
26
+ class Storage < ::Shrine::Storage::S3
27
+ # Same options as Shrine::Storage::S3, plus `host`
28
+ def initialize(**options)
29
+ if options[:signer]
30
+ raise ArgumentError.new("#{self.class.name} does not support :signer option of Shrine::Storage::S3. Should it? Let us know.")
31
+ end
32
+
33
+ host = options.delete(:host)
34
+ @faster_s3_url_builder = FasterS3Url::Builder.new(
35
+ bucket_name: options[:bucket],
36
+ access_key_id: options[:access_key_id],
37
+ secret_access_key: options[:secret_access_key],
38
+ region: options[:region],
39
+ host: host)
40
+
41
+ super(**options)
42
+ end
43
+
44
+ # unlike base Shrine::Storage::S3, does not support `host` here, do it in
45
+ # initializer instead. Is there a real use case for doing it here?
46
+ # If so let us know.
47
+ #
48
+ # Options are ignored in public mode, so you can pass the same options
49
+ # whether public or not, and not get an error in public mode for options that only apply
50
+ # to presigned URLs.
51
+ #
52
+ # Otherwise, same options as Shrine::Storage::S3 should be supported, please
53
+ # see docs there. https://shrinerb.com/docs/storage/s3
54
+ def url(id, public: self.public, **options)
55
+ @faster_s3_url_builder.url(object_key(id), public: public, **options)
56
+ end
57
+
58
+ # For older shrine versions without it, we need this...
59
+ unless self.method_defined?(:object_key)
60
+ def object_key(id)
61
+ [*prefix, id].join("/")
62
+ end
63
+ end
64
+ end
65
+ end
66
+ end
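The `object_key` fallback above relies on the array splat to cope with an unset prefix: `[*nil, id]` collapses to `[id]`, so no leading slash appears. Below is a small standalone sketch of that behavior. It takes `prefix` as an argument purely for illustration, whereas the method in the diff reads it from the storage instance.

```ruby
# Hypothetical standalone version of the fallback, for illustration only.
def object_key(prefix, id)
  # [*nil, id] => [id]; [*"cache", id] => ["cache", id]
  [*prefix, id].join("/")
end

puts object_key(nil, "photo.jpg")
puts object_key("cache", "photo.jpg")
```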
@@ -0,0 +1,3 @@
1
+ module FasterS3Url
2
+ VERSION = "0.1.0"
3
+ end
@@ -0,0 +1,86 @@
1
+ # Run as eg:
2
+ #
3
+ # $ bundle exec ruby perf/presigned_bench.rb
4
+
5
+ require 'benchmark/ips'
6
+ require 'faster_s3_url'
7
+ require 'aws-sdk-s3'
8
+ require 'wt_s3_signer'
9
+
10
+ access_key_id = "fakeExampleAccessKeyId"
11
+ secret_access_key = "fakeExampleSecretAccessKey"
12
+
13
+ bucket_name = "my-bucket"
14
+ object_key = "some/directory/file.jpg"
15
+ region = "us-east-1"
16
+
17
+ aws_client = Aws::S3::Client.new(region: region, access_key_id: access_key_id, secret_access_key: secret_access_key)
18
+ aws_bucket = Aws::S3::Bucket.new(name: bucket_name, client: aws_client)
19
+
20
+ faster_s3_builder = FasterS3Url::Builder.new(region: region, access_key_id: access_key_id, secret_access_key: secret_access_key, bucket_name: bucket_name)
21
+
22
+ faster_s3_builder_with_caching = FasterS3Url::Builder.new(region: region, access_key_id: access_key_id, secret_access_key: secret_access_key, bucket_name: bucket_name, cache_signing_keys: true)
23
+
24
+ wt_signer = WT::S3Signer.new(
25
+ expires_in: 15 * 60,
26
+ aws_region: "us-east-1",
27
+ bucket_endpoint_url: "https://#{bucket_name}.s3.amazonaws.com",
28
+ bucket_host: "#{bucket_name}.s3.amazonaws.com",
29
+ bucket_name: bucket_name,
30
+ access_key_id: access_key_id,
31
+ secret_access_key: secret_access_key,
32
+ session_token: nil
33
+ )
34
+
35
+ Benchmark.ips do |x|
36
+ begin
37
+ require 'kalibera'
38
+ x.config(:stats => :bootstrap, :confidence => 95)
39
+ rescue LoadError
40
+ # kalibera is optional; without it, benchmark-ips keeps its default statistics
41
+ end
42
+
43
+ x.report("aws-sdk-s3") do
44
+ aws_bucket.object(object_key).presigned_url(:get)
45
+ end
46
+
47
+ x.report("aws-sdk-s3 with custom headers") do
48
+ aws_bucket.object(object_key).presigned_url(:get, response_content_type: "image/jpeg", response_content_disposition: "attachment; filename=\"foo bar.baz\"; filename*=UTF-8''foo%20bar.baz")
49
+ end
50
+
51
+ x.report("re-used FasterS3Url") do
52
+ faster_s3_builder.presigned_url(object_key)
53
+ end
54
+
55
+ x.report("re-used FasterS3Url with cached signing keys") do
56
+ faster_s3_builder_with_caching.presigned_url(object_key)
57
+ end
58
+
59
+ x.report("re-used FasterS3Url with custom headers") do
60
+ faster_s3_builder.presigned_url(object_key, response_content_type: "image/jpeg", response_content_disposition: "attachment; filename=\"foo bar.baz\"; filename*=UTF-8''foo%20bar.baz")
61
+ end
62
+
63
+ x.report("new FasterS3Url Builder each time") do
64
+ builder = FasterS3Url::Builder.new(region: region, access_key_id: access_key_id, secret_access_key: secret_access_key, bucket_name: bucket_name)
65
+ builder.presigned_url(object_key)
66
+ end
67
+
68
+ x.report("re-used WT::S3Signer") do
69
+ wt_signer.presigned_get_url(object_key: object_key)
70
+ end
71
+
72
+ x.report("new WT::S3Signer each time") do
73
+ signer = WT::S3Signer.new(
74
+ expires_in: 15 * 60,
75
+ aws_region: "us-east-1",
76
+ bucket_endpoint_url: "https://#{bucket_name}.s3.amazonaws.com",
77
+ bucket_host: "#{bucket_name}.s3.amazonaws.com",
78
+ bucket_name: bucket_name,
79
+ access_key_id: access_key_id,
80
+ secret_access_key: secret_access_key,
81
+ session_token: nil)
82
+ signer.presigned_get_url(object_key: object_key)
83
+ end
84
+
85
+ end
86
+
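The benchmark above compares a builder with and without `cache_signing_keys: true`. As background, AWS Signature Version 4 derives its signing key through a chain of four HMAC-SHA256 operations over the date, region, and service, so the key only changes once per UTC day and can be memoized. Below is a minimal sketch of that derivation with a per-date cache. The class name `SigningKeyCache` and its structure are hypothetical, and whether the gem caches in exactly this way is an assumption.

```ruby
require 'openssl'

# Hypothetical class, for illustration only; not the gem's implementation.
class SigningKeyCache
  def initialize(secret_access_key:, region:, service: "s3")
    @secret_access_key = secret_access_key
    @region = region
    @service = service
    @keys = {}
  end

  # date is a "YYYYMMDD" string in UTC. The derived key is memoized per date.
  def signing_key(date)
    @keys[date] ||= derive(date)
  end

  private

  def hmac(key, data)
    OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), key, data)
  end

  # SigV4 key derivation: date -> region -> service -> "aws4_request"
  def derive(date)
    k_date    = hmac("AWS4" + @secret_access_key, date)
    k_region  = hmac(k_date, @region)
    k_service = hmac(k_region, @service)
    hmac(k_service, "aws4_request")
  end
end

cache = SigningKeyCache.new(secret_access_key: "fakeExampleSecretAccessKey", region: "us-east-1")
key = cache.signing_key(Time.now.utc.strftime("%Y%m%d"))
puts key.unpack1("H*")
```

With the cache in place, each presigned URL generated on the same day needs only the final signature HMAC rather than all five.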
@@ -0,0 +1,38 @@
1
+ # Run as eg:
2
+ #
3
+ # $ bundle exec ruby perf/public_bench.rb
4
+
5
+ require 'benchmark/ips'
6
+ require 'faster_s3_url'
7
+ require 'aws-sdk-s3'
8
+
9
+ access_key_id = "fakeExampleAccessKeyId"
10
+ secret_access_key = "fakeExampleSecretAccessKey"
11
+
12
+ bucket_name = "my-bucket"
13
+ object_key = "some/directory/file.jpg"
14
+ region = "us-east-1"
15
+
16
+ aws_client = Aws::S3::Client.new(region: region, access_key_id: access_key_id, secret_access_key: secret_access_key)
17
+ aws_bucket = Aws::S3::Bucket.new(name: bucket_name, client: aws_client)
18
+
19
+ faster_s3_builder = FasterS3Url::Builder.new(region: region, access_key_id: access_key_id, secret_access_key: secret_access_key, bucket_name: bucket_name)
20
+
21
+ Benchmark.ips do |x|
22
+ begin
23
+ require 'kalibera'
24
+ x.config(:stats => :bootstrap, :confidence => 95)
25
+ rescue LoadError
26
+ # kalibera is optional; without it, benchmark-ips keeps its default statistics
27
+ end
28
+
29
+
30
+ x.report("aws-sdk-s3") do
31
+ aws_bucket.object(object_key).public_url
32
+ end
33
+
34
+ x.report("FasterS3Url") do
35
+ faster_s3_builder.public_url(object_key)
36
+ end
37
+ end
38
+
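A public URL needs no signature, so generating one is essentially string building. That is why a dedicated builder can beat a general-purpose SDK call here. Below is a rough sketch of what this involves. It assumes the default virtual-hosted `s3.amazonaws.com` endpoint (the same host form the presigned benchmark uses) and no custom `host`. The helper name `sketch_public_url` and its escaping rules are hypothetical and may differ from what `FasterS3Url::Builder#public_url` actually does.

```ruby
require 'erb'

# Hypothetical helper, for illustration only.
def sketch_public_url(bucket_name, object_key)
  # Percent-encode each path segment but keep "/" as the separator.
  escaped_key = object_key.split("/", -1)
                          .map { |segment| ERB::Util.url_encode(segment) }
                          .join("/")
  "https://#{bucket_name}.s3.amazonaws.com/#{escaped_key}"
end

puts sketch_public_url("my-bucket", "some/directory/file.jpg")
```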
metadata ADDED
@@ -0,0 +1,132 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: faster_s3_url
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Jonathan Rochkind
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2020-10-01 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: aws-sdk-s3
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.81'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.81'
27
+ - !ruby/object:Gem::Dependency
28
+ name: timecop
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "<"
32
+ - !ruby/object:Gem::Version
33
+ version: '2'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "<"
39
+ - !ruby/object:Gem::Version
40
+ version: '2'
41
+ - !ruby/object:Gem::Dependency
42
+ name: benchmark-ips
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '2.8'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '2.8'
55
+ - !ruby/object:Gem::Dependency
56
+ name: wt_s3_signer
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: shrine
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '3.0'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '3.0'
83
+ description:
84
+ email:
85
+ - jrochkind@sciencehistory.org
86
+ executables: []
87
+ extensions: []
88
+ extra_rdoc_files: []
89
+ files:
90
+ - ".gitignore"
91
+ - ".rspec"
92
+ - ".travis.yml"
93
+ - Gemfile
94
+ - LICENSE.txt
95
+ - README.md
96
+ - Rakefile
97
+ - bin/console
98
+ - bin/rspec
99
+ - bin/setup
100
+ - faster_s3_url.gemspec
101
+ - lib/faster_s3_url.rb
102
+ - lib/faster_s3_url/builder.rb
103
+ - lib/faster_s3_url/shrine/storage.rb
104
+ - lib/faster_s3_url/version.rb
105
+ - perf/presigned_bench.rb
106
+ - perf/public_bench.rb
107
+ homepage: https://github.com/jrochkind/faster_s3_url
108
+ licenses:
109
+ - MIT
110
+ metadata:
111
+ homepage_uri: https://github.com/jrochkind/faster_s3_url
112
+ source_code_uri: https://github.com/jrochkind/faster_s3_url
113
+ post_install_message:
114
+ rdoc_options: []
115
+ require_paths:
116
+ - lib
117
+ required_ruby_version: !ruby/object:Gem::Requirement
118
+ requirements:
119
+ - - ">="
120
+ - !ruby/object:Gem::Version
121
+ version: 2.3.0
122
+ required_rubygems_version: !ruby/object:Gem::Requirement
123
+ requirements:
124
+ - - ">="
125
+ - !ruby/object:Gem::Version
126
+ version: '0'
127
+ requirements: []
128
+ rubygems_version: 3.0.3
129
+ signing_key:
130
+ specification_version: 4
131
+ summary: Generate public and presigned AWS S3 GET URLs faster
132
+ test_files: []