tus-server 0.10.2 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: a8418797bd1a0d54cfe083534e48a4b014763cf1
-   data.tar.gz: c56a965e748adbd182700f38d32f8a4a530efee8
+   metadata.gz: 9dfd9cb38d7ef46fe3ac684a7ee3569f66e97c84
+   data.tar.gz: 19b75e3d0c222d47c46432f1fd198ab9e27f56bb
  SHA512:
-   metadata.gz: 338316cbd7d8ae243e40a15bbca5b04070b70b6973ad618da9f7524a58af2586254eb73415ec6830d26bf43d470a13eb975b3cc9cf170a915ec8aed0d2309982
-   data.tar.gz: 053cd94a93e0b2f57914240c45c9ed50494a6191acfb62f2b1c01d4331bc6047a249d4dff65367dc8bc29bef15aad9a4f7ff0b8e1bc85edb997bc96b755b6f79
+   metadata.gz: 48c902ba0c5ff439869a93b3b4fa851fe17644c7a41978b25041f4955b5c0697e9a725ece84a5ee1d294acd7c631ad04b9d4348f327a64c1ca0002549b1db509
+   data.tar.gz: 482543723d79aeb1df06a9fb429d8f39584e4891905e795c32df660f860413ffcac4408892114a85c67e4b1c867e1d6b9462157d6f495c9399d3d641e921f6b8
data/README.md CHANGED
@@ -19,8 +19,7 @@ gem "tus-server"
 
  Tus-ruby-server provides a `Tus::Server` Roda app, which you can run in your
  `config.ru`. That way you can run `Tus::Server` both as a standalone app or as
- part of your main app (though it's recommended to run it as a standalone app,
- as explained in the "Performance considerations" section of this README).
+ part of your main app.
 
  ```rb
  # config.ru
@@ -33,6 +32,9 @@ end
  run YourApp
  ```
 
+ While this is the most flexible option, it's not optimal in terms of
+ performance; see the [Goliath](#goliath) section for an alternative approach.
+
  Now you can tell your tus client library (e.g. [tus-js-client]) to use this
  endpoint:
 
@@ -40,7 +42,7 @@ endpoint:
  // using tus-js-client
  new tus.Upload(file, {
    endpoint: "http://localhost:9292/files",
-   chunkSize: 5*1024*1024, // 5MB
+   chunkSize: 5*1024*1024, // required unless using Goliath
    // ...
  })
  ```
@@ -49,6 +51,42 @@ After the upload is complete, you'll probably want to attach the uploaded file
  to a database record. [Shrine] is one file attachment library that integrates
  nicely with tus-ruby-server, see [shrine-tus-demo] for an example integration.
 
+ ### Goliath
+
+ [Goliath] is the ideal web server to run tus-ruby-server on, because by
+ utilizing [EventMachine] it's asynchronous both in reading the request body
+ and writing to the response body, so it's not affected by slow clients.
+ Goliath also allows tus-ruby-server to handle interrupted requests, by saving
+ whatever data was uploaded before the interruption. This means that with
+ Goliath it's **not** mandatory for the client to chunk the upload into
+ multiple requests in order to achieve resumable uploads, which it would be
+ with most other web servers.
+
+ Tus-ruby-server ships with a Goliath integration: you just need to require it
+ in a Ruby file and run that file, which automatically starts up Goliath.
+
+ ```rb
+ # Gemfile
+ gem "tus-server", "~> 1.0"
+ gem "goliath"
+ gem "async-rack", ">= 0.5.1"
+ ```
+ ```rb
+ # tus.rb
+ require "tus/server/goliath"
+
+ # any additional Tus::Server configuration you want to put in here
+ ```
+ ```sh
+ $ ruby tus.rb --stdout # enable logging
+ ```
+
+ Any options provided after the Ruby file will be passed on to the Goliath
+ server; see [this wiki][goliath server options] for all the options Goliath
+ supports. As shown above, running tus-ruby-server on Goliath means you have
+ to run it separately from your main app (unless your main app also runs on
+ Goliath).
+
  ## Storage
 
  ### Filesystem
@@ -207,8 +245,8 @@ Tus::Server.opts[:storage].expire_files(expiration_date)
  ## Download
 
  In addition to implementing the tus protocol, tus-ruby-server also comes with a
- GET endpoint for downloading the uploaded file, which streams the file directly
- from the storage.
+ GET endpoint for downloading the uploaded file, which streams the file from the
+ storage into the response body.
 
  The endpoint will automatically use the following `Upload-Metadata` values if
  they're available:
@@ -237,88 +275,13 @@ The following checksum algorithms are supported for the `checksum` extension:
  * MD5
  * CRC32
 
- ## Performance considerations
-
- ### Buffering
-
- When handling file uploads it's important not be be vulnerable to slow-write
- clients. That means you need to make sure that your web/application server
- buffers the request body locally before handing the request to the request
- worker.
-
- If the request body is not buffered and is read directly from the socket when
- it has already reached your Rack application, your application throughput will
- be severly impacted, because the workers will spend majority of their time
- waiting for request body to be read from the socket, and in that time they
- won't be able to serve new requests.
-
- Puma will automatically buffer the whole request body in a Tempfile, before
- fowarding the request to your Rack app. Unicorn and Passenger will not do that,
- so it's highly recommended to put a frontend server like Nginx in front of
- those web servers, and configure it to buffer the request body.
-
- ### Chunking
-
- The tus protocol specifies
-
- > The Server SHOULD always attempt to store as much of the received data as possible.
-
- The tus-ruby-server Rack application supports saving partial data for if the
- PATCH request gets interrupted before all data has been sent, but I'm not aware
- of any Rack-compliant web server that will forward interrupted requests to the
- Rack app.
-
- This means that for resumable upload to be possible with tus-ruby-server in
- general, the file must be uploaded in multiple chunks; the client shouldn't
- rely that server will store any data if the PATCH request was interrupted.
-
- ```js
- // using tus-js-client
- new tus.Upload(file, {
-   endpoint: "http://localhost:9292/files",
-   chunkSize: 5*1024*1024, // required option
-   // ...
- })
- ```
-
- ### Downloading
-
- Tus-ruby-server has a download endpoint which streams the uploaded file to the
- client. Unfortunately, with most classic web servers this endpoint will be
- vulnerable to slow-read clients, because the worker is only done once the whole
- response body has been received by the client. Web servers that are not
- vulnerable to slow-read clients include [Goliath]/[Thin] ([EventMachine]) and
- [Reel] ([Celluloid::IO]).
-
- So, depending on your requirements, you might want to avoid displaying the
- uploaded file in the browser (making the user download the file directly from
- the tus server), until it has been moved to a permanent storage. You might also
- want to consider copying finished uploads to permanent storage directly from
- the underlying tus storage, instead of downloading them through the app.
-
  ## Tests
 
  Run tests with
 
- ```
- $ rake test
- ```
-
- The S3 tests are excluded by default, but you can include them by setting the
- `$S3` environment variable.
-
- ```
- $ S3=1 rake test
- ```
-
- For running S3 tests you need to create an `.env` with the S3 credentials:
-
  ```sh
- # .env
- S3_BUCKET="..."
- S3_REGION="..."
- S3_ACCESS_KEY_ID="..."
- S3_SECRET_ACCESS_KEY="..."
+ $ bundle exec rake test # unit tests
+ $ bundle exec cucumber  # acceptance tests
  ```
 
  ## Inspiration
@@ -346,8 +309,6 @@ The tus-ruby-server was inspired by [rubytus].
  [`Aws::S3::Client#initialize`]: http://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Client.html#initialize-instance_method
  [`Aws::S3::Client#create_multipart_upload`]: http://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Client.html#create_multipart_upload-instance_method
  [Range requests]: https://tools.ietf.org/html/rfc7233
- [EventMachine]: https://github.com/eventmachine/eventmachine
- [Reel]: https://github.com/celluloid/reel
  [Goliath]: https://github.com/postrank-labs/goliath
- [Thin]: https://github.com/macournoyer/thin
- [Celluloid::IO]: https://github.com/celluloid/celluloid-io
+ [EventMachine]: https://github.com/eventmachine/eventmachine
+ [goliath server options]: https://github.com/postrank-labs/goliath/wiki/Server
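
Editor's note (not part of the diff): the first two README hunks above only quote fragments of the `config.ru` example. A reconstruction of what the published README shows, where the Roda-style `map` mounting is inferred from the visible `# config.ru` and `run YourApp` fragments, and `YourApp` is the README's placeholder for your own Rack app:

```rb
# config.ru (reconstruction, see note above)
require "tus/server"

map "/files" do
  run Tus::Server
end

run YourApp
```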
data/lib/tus/checksum.rb CHANGED
@@ -1,9 +1,7 @@
- require "base64"
- require "digest"
- require "zlib"
-
  module Tus
    class Checksum
+     CHUNK_SIZE = 16*1024
+
      attr_reader :algorithm
 
      def self.generate(algorithm, input)
@@ -47,14 +45,21 @@ module Tus
      end
 
      def generate_crc32(io)
-       crc = 0
-       crc = Zlib.crc32(io.read(16*1024, buffer ||= ""), crc) until io.eof?
-       Base64.encode64(crc.to_s)
+       require "zlib"
+       require "base64"
+       crc = Zlib.crc32("")
+       while (data = io.read(CHUNK_SIZE, buffer ||= ""))
+         crc = Zlib.crc32(data, crc)
+       end
+       Base64.strict_encode64(crc.to_s)
      end
 
      def digest(name, io)
+       require "digest"
        digest = Digest.const_get(name).new
-       digest.update(io.read(16*1024, buffer ||= "")) until io.eof?
+       while (data = io.read(CHUNK_SIZE, buffer ||= ""))
+         digest.update(data)
+       end
        digest.base64digest
      end
    end
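
A hedged usage sketch of the reworked class: `Tus::Checksum.generate` streams the IO in `CHUNK_SIZE` reads and returns a base64-encoded value, as the loops above show. The `"sha1"` algorithm name follows the tus `Upload-Checksum` convention; the dispatch from algorithm name to `digest`/`generate_crc32` isn't visible in this hunk, so treat the exact call path as an assumption.

```rb
require "tus/checksum"
require "stringio"

io = StringIO.new("hello world")

# Streams `io` in 16KB chunks and returns a base64-encoded SHA1 digest,
# the same value the server compares against the Upload-Checksum header.
checksum = Tus::Checksum.generate("sha1", io)
```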
data/lib/tus/errors.rb CHANGED
@@ -1,4 +1,5 @@
  module Tus
-   Error = Class.new(StandardError)
-   NotFound = Class.new(Error)
+   Error           = Class.new(StandardError)
+   NotFound        = Class.new(Error)
+   MaxSizeExceeded = Class.new(Error)
  end
data/lib/tus/input.rb CHANGED
@@ -1,31 +1,33 @@
+ require "tus/errors"
+
  module Tus
    class Input
-     def initialize(input)
+     def initialize(input, limit: nil)
        @input = input
-       @bytes_read = 0
+       @limit = limit
+       @pos   = 0
      end
 
-     def read(*args)
-       result = @input.read(*args)
-       @bytes_read += result.bytesize if result.is_a?(String)
-       result
+     def read(length = nil, outbuf = nil)
+       data = @input.read(length, outbuf)
+
+       @pos += data.bytesize if data
+       raise MaxSizeExceeded if @limit && @pos > @limit
+
+       data
+     rescue => exception
+       raise unless exception.class.name == "Unicorn::ClientShutdown"
+       outbuf = outbuf.to_s.clear
+       outbuf unless length
      end
 
-     def eof?
-       @bytes_read == size
+     def pos
+       @pos
      end
 
      def rewind
        @input.rewind
-       @bytes_read = 0
-     end
-
-     def size
-       if defined?(Rack::Lint) && @input.is_a?(Rack::Lint::InputWrapper)
-         @input.instance_variable_get("@input").size
-       else
-         @input.size
-       end
+       @pos = 0
      end
 
      def close
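
A minimal sketch of the new limit behaviour, with a `StringIO` standing in for the Rack input: reading past the configured limit raises `Tus::MaxSizeExceeded`, which the PATCH route in `lib/tus/server.rb` (below) rescues and turns into a 413 response.

```rb
require "tus/input" # also pulls in tus/errors
require "stringio"

input = Tus::Input.new(StringIO.new("x" * 100), limit: 50)

begin
  input.read # reads all 100 bytes, crossing the 50-byte limit
rescue Tus::MaxSizeExceeded
  input.pos #=> 100, i.e. more than the allowed 50 bytes were consumed
end
```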
data/lib/tus/server/goliath.rb ADDED
@@ -0,0 +1,68 @@
+ require "tus/server"
+ require "goliath"
+
+ class Tus::Server::Goliath < Goliath::API
+   # Called as soon as request headers are parsed.
+   def on_headers(env, headers)
+     # the write end of the pipe is written in #on_body, and the read end is read by Tus::Server
+     env["tus.input-reader"], env["tus.input-writer"] = IO.pipe
+     # use a thread so that the request is processed in parallel
+     env["tus.request-thread"] = Thread.new do
+       call_tus_server env.merge("rack.input" => env["tus.input-reader"])
+     end
+   end
+
+   # Called on each request body chunk received from the client.
+   def on_body(env, data)
+     # append data to the write end of the pipe if open, otherwise do nothing
+     env["tus.input-writer"].write(data) unless env["tus.input-writer"].closed?
+   rescue Errno::EPIPE
+     # the read end of the pipe has been closed, so we close the write end as well
+     env["tus.input-writer"].close
+   end
+
+   # Called at the end of the request (after #response is called), but also on
+   # client disconnect (in which case #response isn't called), so we want to do
+   # the same finalization in both methods.
+   def on_close(env)
+     finalize(env)
+   end
+
+   # Called after all the data has been received from the client.
+   def response(env)
+     status, headers, body = finalize(env)
+
+     env[STREAM_START].call(status, headers)
+
+     operation = proc { body.each { |chunk| env.stream_send(chunk) } }
+     callback  = proc { env.stream_close }
+
+     EM.defer(operation, callback) # use an outside thread pool for streaming
+
+     nil
+   end
+
+   private
+
+   # Calls the actual Roda application with a slightly modified env hash.
+   def call_tus_server(env)
+     Tus::Server.call env.merge(
+       "rack.url_scheme" => (env["options"][:ssl] ? "https" : "http"), # https://github.com/postrank-labs/goliath/issues/210
+       "async.callback"  => nil, # prevent Roda from calling EventMachine when streaming
+     )
+   end
+
+   # This method needs to be idempotent, because it can be called twice (on
+   # normal requests both #response and #on_close will be called, and on client
+   # disconnect only #on_close will be called).
+   def finalize(env)
+     # closing the write end of the pipe will mark EOF on the read end
+     env["tus.input-writer"].close unless env["tus.input-writer"].closed?
+     # wait for the request to finish
+     result = env["tus.request-thread"].value
+     # close the read end of the pipe, since nothing is going to read from it anymore
+     env["tus.input-reader"].close unless env["tus.input-reader"].closed?
+     # return the rack response
+     result
+   end
+ end
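
The adapter above bridges Goliath's evented callbacks to the synchronous Roda app through an `IO.pipe` plus a thread. A standalone sketch of that handoff (an assumed simplification, with no Goliath involved):

```rb
reader, writer = IO.pipe

# stands in for the Tus::Server thread spawned in #on_headers
consumer = Thread.new { reader.read }

# stands in for successive #on_body calls appending request body data
["chunk1", "chunk2"].each { |data| writer.write(data) }

# stands in for #finalize: closing the write end marks EOF on the read end
writer.close

consumer.value #=> "chunk1chunk2"
reader.close
```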
data/lib/tus/server.rb CHANGED
@@ -32,7 +32,6 @@ module Tus
      plugin :request_headers
      plugin :not_allowed
      plugin :streaming
-     plugin :error_handler
 
      route do |r|
        if request.headers["X-HTTP-Method-Override"]
@@ -74,6 +73,8 @@
        )
 
        if info.concatenation?
+         validate_partial_uploads!(info.partial_uploads)
+
          length = storage.concatenate(uid, info.partial_uploads, info.to_h)
          info["Upload-Length"] = length.to_s
          info["Upload-Offset"] = length.to_s
@@ -102,7 +103,11 @@
          no_content!
        end
 
-       info = Tus::Info.new(storage.read_info(uid))
+       begin
+         info = Tus::Info.new(storage.read_info(uid))
+       rescue Tus::NotFound
+         error!(404, "Upload Not Found")
+       end
 
        r.head do
          response.headers.update(info.headers)
@@ -112,23 +117,27 @@
        end
 
        r.patch do
-         input = Tus::Input.new(request.body)
-
          if info.defer_length? && request.headers["Upload-Length"]
            validate_upload_length!
 
-           info["Upload-Length"] = request.headers["Upload-Length"]
+           info["Upload-Length"]       = request.headers["Upload-Length"]
            info["Upload-Defer-Length"] = nil
          end
 
+         input = get_input(info)
+
          validate_content_type!
-         validate_content_length!(info.offset, info.length)
-         validate_upload_offset!(info.offset)
+         validate_upload_offset!(info)
+         validate_content_length!(request.content_length.to_i, info) if request.content_length
          validate_upload_checksum!(input) if request.headers["Upload-Checksum"]
 
-         storage.patch_file(uid, input, info.to_h)
+         begin
+           bytes_uploaded = storage.patch_file(uid, input, info.to_h)
+         rescue Tus::MaxSizeExceeded
+           validate_content_length!(input.pos, info)
+         end
 
-         info["Upload-Offset"] = (info.offset + input.size).to_s
+         info["Upload-Offset"]  = (info.offset + bytes_uploaded).to_s
          info["Upload-Expires"] = (Time.now + expiration_time).httpdate
 
          if info.offset == info.length # last chunk
@@ -142,13 +151,13 @@
        end
 
        r.get do
-         validate_upload_finished!(info.length, info.offset)
+         validate_upload_finished!(info)
          range = handle_range_request!(info.length)
 
          response.headers["Content-Length"] = (range.end - range.begin + 1).to_s
 
          metadata = info.metadata
-         response.headers["Content-Disposition"] = opts[:disposition]
+         response.headers["Content-Disposition"]  = opts[:disposition]
          response.headers["Content-Disposition"] += "; filename=\"#{metadata["filename"]}\"" if metadata["filename"]
          response.headers["Content-Type"] = metadata["content_type"] || "application/octet-stream"
@@ -167,9 +176,12 @@
        end
      end
 
-     error do |exception|
-       not_found! if exception.is_a?(Tus::NotFound)
-       raise
+     def get_input(info)
+       offset = info.offset
+       total  = info.length || max_size
+       limit  = total - offset if total
+
+       Tus::Input.new(request.body, limit: limit)
      end
 
      def validate_content_type!
@@ -197,29 +209,29 @@
        end
      end
 
-     def validate_upload_offset!(current_offset)
+     def validate_upload_offset!(info)
        upload_offset = request.headers["Upload-Offset"]
 
        error!(400, "Missing Upload-Offset header") if upload_offset.to_s == ""
        error!(400, "Invalid Upload-Offset header") if upload_offset =~ /\D/
        error!(400, "Invalid Upload-Offset header") if upload_offset.to_i < 0
 
-       if upload_offset.to_i != current_offset
+       if upload_offset.to_i != info.offset
          error!(409, "Upload-Offset header doesn't match current offset")
        end
      end
 
-     def validate_content_length!(current_offset, length)
-       if length
-         error!(403, "Cannot modify completed upload") if current_offset == length
-         error!(413, "Size of this chunk surpasses Upload-Length") if Integer(request.content_length) + current_offset > length
+     def validate_content_length!(size, info)
+       if info.length
+         error!(403, "Cannot modify completed upload") if info.offset == info.length
+         error!(413, "Size of this chunk surpasses Upload-Length") if info.offset + size > info.length
        elsif max_size
-         error!(413, "Size of this chunk surpasses Tus-Max-Size") if Integer(request.content_length) + current_offset > max_size
+         error!(413, "Size of this chunk surpasses Tus-Max-Size") if info.offset + size > max_size
        end
      end
 
-     def validate_upload_finished!(length, current_offset)
-       error!(403, "Cannot download unfinished upload") unless length && current_offset && length == current_offset
+     def validate_upload_finished!(info)
+       error!(403, "Cannot download unfinished upload") unless info.length == info.offset
      end
 
      def validate_upload_metadata!
@@ -249,6 +261,30 @@
        end
      end
 
+     def validate_partial_uploads!(part_uids)
+       queue = Queue.new
+       part_uids.each { |part_uid| queue << part_uid }
+
+       threads = 10.times.map do
+         Thread.new do
+           results = []
+           loop do
+             part_uid = queue.deq(true) rescue break
+             part_info = storage.read_info(part_uid)
+             results << part_info["Upload-Concat"]
+           end
+           results
+         end
+       end
+
+       upload_concat_values = threads.flat_map(&:value)
+       unless upload_concat_values.all? { |value| value == "partial" }
+         error!(400, "One or more uploads were not partial")
+       end
+     rescue Tus::NotFound
+       error!(404, "One or more partial uploads were not found")
+     end
+
      def validate_upload_checksum!(input)
        algorithm, checksum = request.headers["Upload-Checksum"].split(" ")
 
@@ -256,7 +292,7 @@
        error!(400, "Invalid Upload-Checksum header") unless SUPPORTED_CHECKSUM_ALGORITHMS.include?(algorithm)
 
        generated_checksum = Tus::Checksum.generate(algorithm, input)
-       error!(460, "Checksum from Upload-Checksum header doesn't match generated") if generated_checksum != checksum
+       error!(460, "Upload-Checksum value doesn't match generated checksum") if generated_checksum != checksum
      end
 
      # "Range" header handling logic copied from Rack::File
@@ -312,10 +348,6 @@
        request.halt
      end
 
-     def not_found!(message = "Upload not found")
-       error!(404, message)
-     end
-
      def error!(status, message)
        response.status = status
        response.write(message) unless request.head?
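
The new `validate_partial_uploads!` fans its `read_info` calls out over ten threads draining a shared queue. A standalone sketch of that pattern, with `storage.read_info` stubbed by a hash lookup so the snippet runs on its own:

```rb
# Hypothetical stand-in for the storage: uid => info hash.
infos = {
  "a" => { "Upload-Concat" => "partial" },
  "b" => { "Upload-Concat" => "partial" },
}

queue = Queue.new
infos.each_key { |uid| queue << uid }

# Workers drain the queue; the non-blocking deq raises ThreadError on an
# empty queue, and the inline rescue turns that into a loop break.
threads = 10.times.map do
  Thread.new do
    results = []
    loop do
      uid = queue.deq(true) rescue break
      results << infos.fetch(uid)["Upload-Concat"]
    end
    results
  end
end

threads.flat_map(&:value).all? { |value| value == "partial" } #=> true
```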
data/lib/tus/storage/filesystem.rb CHANGED
@@ -49,35 +49,28 @@
      end
 
      def patch_file(uid, input, info = {})
-       exists!(uid)
-
        file_path(uid).open("ab") { |file| IO.copy_stream(input, file) }
      end
 
      def read_info(uid)
-       exists!(uid)
+       raise Tus::NotFound if !file_path(uid).exist?
 
        JSON.parse(info_path(uid).binread)
      end
 
      def update_info(uid, info)
-       exists!(uid)
-
        info_path(uid).binwrite(JSON.generate(info))
      end
 
      def get_file(uid, info = {}, range: nil)
-       exists!(uid)
-
        file = file_path(uid).open("rb")
-       range ||= 0..(file.size - 1)
-       length = range.end - range.begin + 1
+       length = range ? range.size : file.size
 
        # Create an Enumerator which will yield chunks of the requested file
        # content, allowing tus server to efficiently stream requested content
        # to the client.
        chunks = Enumerator.new do |yielder|
-         file.seek(range.begin)
+         file.seek(range.begin) if range
          remaining_length = length
 
          while remaining_length > 0
@@ -89,11 +82,7 @@
 
        # We return a response object that responds to #each, #length and #close,
        # which the tus server can return directly as the Rack response.
-       Response.new(
-         chunks: chunks,
-         length: length,
-         close: ->{file.close},
-       )
+       Response.new(chunks: chunks, length: length, close: -> { file.close })
      end
 
      def delete_file(uid, info = {})
@@ -116,12 +105,8 @@
        FileUtils.rm_f paths
      end
 
-     def exists!(uid)
-       raise Tus::NotFound if !file_path(uid).exist?
-     end
-
      def file_path(uid)
-       directory.join("#{uid}.file")
+       directory.join("#{uid}")
      end
 
      def info_path(uid)
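
With the `.file` suffix gone, an upload's uid now maps directly to a path under the storage directory. Wiring the storage up is unchanged; a minimal sketch following the `Tus::Server.opts[:storage]` convention quoted in the README hunk above (`"data"` is an arbitrary example directory):

```rb
# config.ru sketch
require "tus/server"
require "tus/storage/filesystem"

Tus::Server.opts[:storage] = Tus::Storage::Filesystem.new("data")

run Tus::Server
```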
data/lib/tus/storage/gridfs.rb CHANGED
@@ -8,6 +8,8 @@ require "digest"
  module Tus
    module Storage
      class Gridfs
+       BATCH_SIZE = 5 * 1024 * 1024
+
        attr_reader :client, :prefix, :bucket, :chunk_size
 
        def initialize(client:, prefix: "fs", chunk_size: 256*1024)
@@ -47,7 +49,10 @@
        grid_infos.inject(0) do |offset, grid_info|
          result = chunks_collection
            .find(files_id: grid_info[:_id])
-           .update_many("$set" => {files_id: grid_file.id}, "$inc" => {n: offset})
+           .update_many(
+             "$set" => { files_id: grid_file.id },
+             "$inc" => { n: offset },
+           )
 
          offset += result.modified_count
        end
@@ -60,47 +65,72 @@
      end
 
      def patch_file(uid, input, info = {})
-       grid_info = find_grid_info!(uid)
+       grid_info      = files_collection.find(filename: uid).first
+       current_length = grid_info[:length]
+       chunk_size     = grid_info[:chunkSize]
+       bytes_saved    = 0
+
+       bytes_saved += patch_last_chunk(input, grid_info) if current_length % chunk_size != 0
+
+       chunks_enumerator = Enumerator.new do |yielder|
+         while (data = input.read(chunk_size))
+           yielder << data
+         end
+       end
 
-       patch_last_chunk(input, grid_info)
+       chunks_in_batch = (BATCH_SIZE.to_f / chunk_size).ceil
+       chunks_offset   = chunks_collection.count(files_id: grid_info[:_id]) - 1
 
-       grid_chunks = split_into_grid_chunks(input, grid_info)
-       chunks_collection.insert_many(grid_chunks)
-       grid_chunks.each { |grid_chunk| grid_chunk.data.data.clear } # deallocate strings
+       chunks_enumerator.each_slice(chunks_in_batch) do |chunks|
+         grid_chunks = chunks.map do |data|
+           Mongo::Grid::File::Chunk.new(
+             data: BSON::Binary.new(data),
+             files_id: grid_info[:_id],
+             n: chunks_offset += 1,
+           )
+         end
+
+         chunks_collection.insert_many(grid_chunks)
+
+         # Update the total length and refresh the upload date on each update,
+         # which are used in #get_file, #concatenate and #expire_files.
+         files_collection.find(filename: uid).update_one(
+           "$inc" => { length: chunks.map(&:bytesize).inject(0, :+) },
+           "$set" => { uploadDate: Time.now.utc },
+         )
+         bytes_saved += chunks.map(&:bytesize).inject(0, :+)
+
+         chunks.each(&:clear) # deallocate strings
+       end
 
-       # Update the total length and refresh the upload date on each update,
-       # which are used in #get_file, #concatenate and #expire_files.
-       files_collection.find(filename: uid).update_one("$set" => {
-         length: grid_info[:length] + input.size,
-         uploadDate: Time.now.utc,
-       })
+       bytes_saved
      end
 
      def read_info(uid)
-       grid_info = find_grid_info!(uid)
+       grid_info = files_collection.find(filename: uid).first or raise Tus::NotFound
 
        grid_info[:metadata]
      end
 
      def update_info(uid, info)
-       grid_info = find_grid_info!(uid)
+       grid_info = files_collection.find(filename: uid).first
 
        files_collection.update_one({filename: uid}, {"$set" => {metadata: info}})
      end
 
      def get_file(uid, info = {}, range: nil)
-       grid_info = find_grid_info!(uid)
+       grid_info = files_collection.find(filename: uid).first
 
-       range ||= 0..(grid_info[:length] - 1)
-       length = range.end - range.begin + 1
+       length = range ? range.size : grid_info[:length]
 
-       chunk_start = range.begin / grid_info[:chunkSize]
-       chunk_stop = range.end / grid_info[:chunkSize]
+       filter = { files_id: grid_info[:_id] }
 
-       filter = {
-         files_id: grid_info[:_id],
-         n: {"$gte" => chunk_start, "$lte" => chunk_stop}
-       }
+       if range
+         chunk_start = range.begin / grid_info[:chunkSize]
+         chunk_stop  = range.end / grid_info[:chunkSize]
+
+         filter[:n] = {"$gte" => chunk_start, "$lte" => chunk_stop}
+       end
 
        # Query only the subset of chunks specified by the range query. We
        # cannot use Mongo::FsBucket#open_download_stream here because it
@@ -137,11 +167,7 @@
 
        # We return a response object that responds to #each, #length and #close,
        # which the tus server can return directly as the Rack response.
-       Response.new(
-         chunks: chunks,
-         length: length,
-         close: ->{chunks_view.close_query},
-       )
+       Response.new(chunks: chunks, length: length, close: ->{chunks_view.close_query})
      end
 
      def delete_file(uid, info = {})
@@ -168,29 +194,19 @@
        grid_file
      end
 
-     def split_into_grid_chunks(io, grid_info)
-       grid_info[:md5] = Digest::MD5.new # hack for `Chunk.split` updating MD5
-       grid_info = Mongo::Grid::File::Info.new(Mongo::Options::Mapper.transform(grid_info, Mongo::Grid::File::Info::MAPPINGS.invert))
-       offset = chunks_collection.count(files_id: grid_info.id)
-
-       Mongo::Grid::File::Chunk.split(io, grid_info, offset)
-     end
-
      def patch_last_chunk(input, grid_info)
-       if grid_info[:length] % grid_info[:chunkSize] != 0
-         last_chunk = chunks_collection.find(files_id: grid_info[:_id]).sort(n: -1).limit(1).first
-         data = last_chunk[:data].data
-         data << input.read(grid_info[:chunkSize] - data.length)
+       last_chunk = chunks_collection.find(files_id: grid_info[:_id]).sort(n: -1).limit(1).first
+       data  = last_chunk[:data].data
+       patch = input.read(grid_info[:chunkSize] - data.bytesize)
+       data << patch
 
-         chunks_collection.find(files_id: grid_info[:_id], n: last_chunk[:n])
-           .update_one("$set" => {data: BSON::Binary.new(data)})
+       chunks_collection.find(files_id: grid_info[:_id], n: last_chunk[:n])
+         .update_one("$set" => { data: BSON::Binary.new(data) })
 
-         data.clear # deallocate string
-       end
-     end
+       files_collection.find(_id: grid_info[:_id])
+         .update_one("$inc" => { length: patch.bytesize })
 
-     def find_grid_info!(uid)
-       files_collection.find(filename: uid).first or raise Tus::NotFound
+       patch.bytesize
      end
 
      def validate_parts!(grid_infos, part_uids)
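
A worked instance of the batching arithmetic in the new `patch_file`: with the default 256KB Mongo chunk size (from `#initialize`) and the 5MB `BATCH_SIZE`, each `insert_many` call writes 20 grid chunks at a time.

```rb
BATCH_SIZE = 5 * 1024 * 1024 # 5MB, as defined above
chunk_size = 256 * 1024      # default chunkSize from #initialize

chunks_in_batch = (BATCH_SIZE.to_f / chunk_size).ceil #=> 20
```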
data/lib/tus/storage/s3.rb CHANGED
@@ -1,18 +1,19 @@
  require "aws-sdk"
 
  require "tus/info"
- require "tus/checksum"
  require "tus/errors"
 
  require "json"
- require "cgi/util"
+ require "cgi"
+ require "fiber"
+ require "stringio"
 
  Aws.eager_autoload!(services: ["S3"])
 
  module Tus
    module Storage
      class S3
-       MIN_PART_SIZE = 5 * 1024 * 1024
+       MIN_PART_SIZE = 5 * 1024 * 1024 # 5MB is the minimum part size for S3 multipart uploads
 
        attr_reader :client, :bucket, :prefix, :upload_options
 
@@ -20,7 +21,7 @@ module Tus
        resource = Aws::S3::Resource.new(**client_options)
 
        @client = resource.client
-       @bucket = resource.bucket(bucket)
+       @bucket = resource.bucket(bucket) or fail(ArgumentError, "the :bucket option was nil")
        @prefix = prefix
        @upload_options = upload_options
        @thread_count = thread_count
@@ -33,8 +34,12 @@ module Tus
        options[:content_type] = tus_info.metadata["content_type"]
 
        if filename = tus_info.metadata["filename"]
+         # Aws-sdk doesn't sign non-ASCII characters correctly, and browsers
+         # will automatically URI-decode filenames.
+         filename = CGI.escape(filename).gsub("+", " ")
+
          options[:content_disposition] ||= "inline"
-         options[:content_disposition] += "; filename=\"#{CGI.escape(filename).gsub("+", " ")}\""
+         options[:content_disposition] += "; filename=\"#{filename}\""
        end
 
        multipart_upload = object(uid).initiate_multipart_upload(options)
@@ -51,9 +56,7 @@ module Tus
        objects = part_uids.map { |part_uid| object(part_uid) }
        parts = copy_parts(objects, multipart_upload)
 
-       parts.each do |part|
-         info["multipart_parts"] << { "part_number" => part[:part_number], "etag" => part[:etag] }
-       end
+       info["multipart_parts"].concat parts
 
        finalize_file(uid, info)
 
@@ -68,21 +71,44 @@ module Tus
      end
 
      def patch_file(uid, input, info = {})
-       upload_id = info["multipart_id"]
-       part_number = info["multipart_parts"].count + 1
+       tus_info = Tus::Info.new(info)
 
-       multipart_upload = object(uid).multipart_upload(upload_id)
-       multipart_part = multipart_upload.part(part_number)
-       md5 = Tus::Checksum.new("md5").generate(input)
+       upload_id      = info["multipart_id"]
+       part_offset    = info["multipart_parts"].count
+       bytes_uploaded = 0
 
-       response = multipart_part.upload(body: input, content_md5: md5)
+       jobs  = []
+       chunk = StringIO.new(input.read(MIN_PART_SIZE).to_s)
 
-       info["multipart_parts"] << {
-         "part_number" => part_number,
-         "etag" => response.etag[/"(.+)"/, 1],
-       }
-     rescue Aws::S3::Errors::NoSuchUpload
-       raise Tus::NotFound
+       loop do
+         next_chunk = StringIO.new(input.read(MIN_PART_SIZE).to_s)
+
+         # merge next chunk into previous if it's smaller than minimum chunk size
+         if next_chunk.size < MIN_PART_SIZE
+           chunk = StringIO.new(chunk.string + next_chunk.string)
+           next_chunk.close
+           next_chunk = nil
+         end
+
+         # abort if chunk is smaller than 5MB and is not the last chunk
+         if chunk.size < MIN_PART_SIZE
+           break if (tus_info.length && tus_info.offset) &&
+                    chunk.size + tus_info.offset < tus_info.length
+         end
+
+         thread = upload_part_thread(chunk, uid, upload_id, part_offset += 1)
+         jobs << [thread, chunk]
+
+         chunk = next_chunk or break
+       end
+
+       jobs.each do |thread, body|
+         info["multipart_parts"] << thread.value
+         bytes_uploaded += body.size
+         body.close
+       end
+
+       bytes_uploaded
      end
 
      def finalize_file(uid, info = {})
@@ -92,7 +118,7 @@ module Tus
      end
 
      multipart_upload = object(uid).multipart_upload(upload_id)
-     multipart_upload.complete(multipart_upload: {parts: parts})
+     multipart_upload.complete(multipart_upload: { parts: parts })
 
      info.delete("multipart_id")
      info.delete("multipart_parts")
@@ -110,26 +136,15 @@ module Tus
      end
 
      def get_file(uid, info = {}, range: nil)
-       object = object(uid)
-       range = "bytes=#{range.begin}-#{range.end}" if range
-
-       raw_chunks = object.enum_for(:get, range: range)
-
-       # Start the request to be notified if the object doesn't exist, and to
-       # get Aws::S3::Object#content_length.
-       first_chunk = raw_chunks.next
+       tus_info = Tus::Info.new(info)
 
-       chunks = Enumerator.new do |yielder|
-         yielder << first_chunk
-         loop { yielder << raw_chunks.next }
-       end
+       length = range ? range.size : tus_info.length
+       range  = "bytes=#{range.begin}-#{range.end}" if range
+       chunks = object(uid).enum_for(:get, range: range)
 
-       Response.new(
-         chunks: chunks,
-         length: object.content_length,
-       )
-     rescue Aws::S3::Errors::NoSuchKey
-       raise Tus::NotFound
+       # We return a response object that responds to #each, #length and #close,
+       # which the tus server can return directly as the Rack response.
+       Response.new(chunks: chunks, length: length)
      end
 
      def delete_file(uid, info = {})
@@ -151,7 +166,9 @@ module Tus
        delete(old_objects)
 
        bucket.multipart_uploads.each do |multipart_upload|
-         next unless multipart_upload.initiated <= expiration_date
+         # no need to check multipart uploads initiated after the expiration date
+         next if multipart_upload.initiated > expiration_date
+
          most_recent_part = multipart_upload.parts.sort_by(&:last_modified).last
         if most_recent_part.nil? || most_recent_part.last_modified <= expiration_date
            abort_multipart_upload(multipart_upload)
@@ -161,10 +178,23 @@ module Tus
 
      private
 
+     def upload_part_thread(body, key, upload_id, part_number)
+       Thread.new { upload_part(body, key, upload_id, part_number) }
+     end
+
+     def upload_part(body, key, upload_id, part_number)
+       multipart_upload = object(key).multipart_upload(upload_id)
+       multipart_part   = multipart_upload.part(part_number)
+
+       response = multipart_part.upload(body: body)
+
+       { "part_number" => part_number, "etag" => response.etag }
+     end
+
      def delete(objects)
        # S3 can delete maximum of 1000 objects in a single request
        objects.each_slice(1000) do |objects_batch|
-         delete_params = {objects: objects_batch.map { |object| {key: object.key} }}
+         delete_params = { objects: objects_batch.map { |object| { key: object.key } } }
          bucket.delete_objects(delete: delete_params)
        end
      end
@@ -187,7 +217,7 @@ module Tus
 
        threads = @thread_count.times.map { copy_part_thread(queue) }
 
-       threads.flat_map(&:value).sort_by { |part| part[:part_number] }
+       threads.flat_map(&:value).sort_by { |part| part["part_number"] }
      end
 
      def compute_parts(objects, multipart_upload)
@@ -204,7 +234,6 @@
      def copy_part_thread(queue)
        Thread.new do
-         Thread.current.abort_on_exception = true
          begin
            results = []
            loop do
@@ -212,9 +241,9 @@
              results << copy_part(part)
            end
            results
-         rescue => error
+         rescue
            queue.clear
-           raise error
+           raise
          end
        end
      end
@@ -222,7 +251,7 @@ module Tus
      def copy_part(part)
        response = client.upload_part_copy(part)
 
-       { part_number: part[:part_number], etag: response.copy_part_result.etag }
+       { "part_number" => part[:part_number], "etag" => response.copy_part_result.etag }
      end
 
      def object(key)
@@ -239,12 +268,28 @@
        @length
      end
 
-     def each(&block)
-       @chunks.each(&block)
+     def each
+       return enum_for(__method__) unless block_given?
+
+       while (chunk = chunks_fiber.resume)
+         yield chunk
+       end
      end
 
      def close
-       # aws-sdk doesn't provide an API to terminate the HTTP connection
+       chunks_fiber.resume(:close) if chunks_fiber.alive?
+     end
+
+     private
+
+     def chunks_fiber
+       @chunks_fiber ||= Fiber.new do
+         @chunks.each do |chunk|
+           action = Fiber.yield chunk
+           break if action == :close
+         end
+         nil
+       end
      end
    end
  end
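
The rewritten `Response#close` stops a download early by resuming the enumeration fiber with a `:close` signal instead of draining it. A minimal standalone sketch of that pattern, with a plain array in place of the aws-sdk chunk enumerator:

```rb
require "fiber" # for Fiber#alive? on older Rubies, as s3.rb itself does

chunks = %w[a b c]

fiber = Fiber.new do
  chunks.each do |chunk|
    action = Fiber.yield chunk # pause after handing out each chunk
    break if action == :close  # stop early when told to close
  end
  nil
end

fiber.resume         #=> "a" (first chunk)
fiber.resume(:close) # the producer sees :close, breaks, and finishes
fiber.alive?         #=> false, no more chunks will be read
```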
data/tus-server.gemspec CHANGED
@@ -1,6 +1,6 @@
  Gem::Specification.new do |gem|
    gem.name    = "tus-server"
-   gem.version = "0.10.2"
+   gem.version = "1.0.0"
 
    gem.required_ruby_version = ">= 2.1"
 
@@ -19,7 +19,10 @@ Gem::Specification.new do |gem|
    gem.add_development_dependency "rake", "~> 11.1"
    gem.add_development_dependency "minitest", "~> 5.8"
    gem.add_development_dependency "rack-test_app"
+   gem.add_development_dependency "cucumber"
+   gem.add_development_dependency "unicorn"
    gem.add_development_dependency "mongo"
    gem.add_development_dependency "aws-sdk", "~> 2.0"
-   gem.add_development_dependency "dotenv"
+   gem.add_development_dependency "goliath"
+   gem.add_development_dependency "async-rack", ">= 0.5.1"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: tus-server
  version: !ruby/object:Gem::Version
-   version: 0.10.2
+   version: 1.0.0
  platform: ruby
  authors:
  - Janko Marohnić
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2017-04-19 00:00:00.000000000 Z
+ date: 2017-07-17 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: roda
@@ -66,6 +66,34 @@ dependencies:
      - - ">="
        - !ruby/object:Gem::Version
          version: '0'
+ - !ruby/object:Gem::Dependency
+   name: cucumber
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: unicorn
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
  - !ruby/object:Gem::Dependency
    name: mongo
    requirement: !ruby/object:Gem::Requirement
@@ -95,7 +123,7 @@ dependencies:
      - - ">="
        - !ruby/object:Gem::Version
          version: '2.0'
  - !ruby/object:Gem::Dependency
-   name: dotenv
+   name: goliath
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - ">="
@@ -108,6 +136,20 @@ dependencies:
      - - ">="
        - !ruby/object:Gem::Version
          version: '0'
+ - !ruby/object:Gem::Dependency
+   name: async-rack
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 0.5.1
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 0.5.1
  description:
  email:
  - janko.marohnic@gmail.com
@@ -123,6 +165,7 @@ files:
  - lib/tus/info.rb
  - lib/tus/input.rb
  - lib/tus/server.rb
+ - lib/tus/server/goliath.rb
  - lib/tus/storage/filesystem.rb
  - lib/tus/storage/gridfs.rb
  - lib/tus/storage/s3.rb
@@ -147,7 +190,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
    version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.5.1
+ rubygems_version: 2.6.11
  signing_key:
  specification_version: 4
  summary: Ruby server implementation of tus.io, the open protocol for resumable file