down 2.5.1 → 3.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 0b0fd2b712dcfee3b7e3da1189103e57b2f67621
4
- data.tar.gz: 9d812256dc29a14cd18d17517f8aef6d360cf37b
3
+ metadata.gz: c16a220821aeb2a11910334c3598ccb5b823473d
4
+ data.tar.gz: 21b1426c6169e82627fb445cbe0526456e9c9f19
5
5
  SHA512:
6
- metadata.gz: 1174fbcef905d92928c24470f2a109fb6b96cdbc02819bfdae719e79959f3da091a76baf599f5ed2821eb42d254a56df43313fc4bcfa36d604a3553155c3e7df
7
- data.tar.gz: c88b14d47f42bc78601e4cb97e86ff5b559dd25edd1c833e1638699eb04810e09cde0a77e700458b8be582f3a06a48439bc72b63b79801945523d1d4481a9f49
6
+ metadata.gz: 97447da54fba5009a0dca7b3013906dc5c0f707620b2ab98bd031f85e0b02180829188bb9df31daeacdcc49138aba024f630a2d9a3c5170d6f4119add90ab2c8
7
+ data.tar.gz: '09498dc3f7ad7256700f42fb536b76293d6f3825e2b686eb091772b22e366116a346130832ea46d28b739fe76e99f477575e3d003efa852696c8fb492128243e'
data/README.md CHANGED
@@ -1,15 +1,19 @@
1
1
  # Down
2
2
 
3
- Down is a wrapper around [open-uri] standard library for safe downloading of
4
- remote files.
3
+ Down is a utility tool for streaming, flexible and safe downloading of remote
4
+ files. It can use [open-uri] + `Net::HTTP` or [HTTP.rb] as the backend HTTP
5
+ library.
5
6
 
6
7
  ## Installation
7
8
 
8
9
  ```rb
9
- gem 'down'
10
+ gem "down"
10
11
  ```
11
12
 
12
- ## Usage
13
+ ## Downloading
14
+
15
+ The primary method is `Down.download`, which downloads the remote file into a
16
+ Tempfile:
13
17
 
14
18
  ```rb
15
19
  require "down"
@@ -17,37 +21,13 @@ tempfile = Down.download("http://example.com/nature.jpg")
17
21
  tempfile #=> #<Tempfile:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/20150925-55456-z7vxqz.jpg>
18
22
  ```
19
23
 
20
- ## Downloading
21
-
22
- If you're downloading files from URLs that come from you, then it's probably
23
- enough to just use `open-uri`. However, if you're accepting URLs from your
24
- users (e.g. through `remote_<avatar>_url` in CarrierWave), then downloading is
25
- suddenly not as simple as it appears to be.
26
-
27
- ### StringIO
28
-
29
- Firstly, you may think that `open-uri` always downloads a file to disk, but
30
- that's not true. If the downloaded file has 10 KB or less, `open-uri` actually
31
- returns a `StringIO`. In my application I needed that the file is always
32
- downloaded to disk. This was obviously a wrong design decision from the MRI
33
- team, so Down patches this behaviour and always returns a `Tempfile`.
34
-
35
- ### File extension
36
-
37
- When using `open-uri` directly, the extension of the remote file is not
38
- preserved. Down patches that behaviour and preserves the file extension.
39
-
40
24
  ### Metadata
41
25
 
42
- `open-uri` adds some metadata to the returned file, like `#content_type`. Down
43
- adds `#original_filename` as well, which is extracted from the URL.
26
+ The returned Tempfile has `#content_type` and `#original_filename` attributes
27
+ determined from the response headers:
44
28
 
45
29
  ```rb
46
- require "down"
47
- tempfile = Down.download("http://example.com/nature.jpg")
48
-
49
- tempfile #=> #<Tempfile:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/20150925-55456-z7vxqz.jpg>
50
- tempfile.content_type #=> "image/jpeg"
30
+ tempfile.content_type #=> "image/jpeg"
51
31
  tempfile.original_filename #=> "nature.jpg"
52
32
  ```
53
33
 
@@ -65,60 +45,47 @@ Down.download("http://example.com/image.jpg", max_size: 5 * 1024 * 1024) # 5 MB
65
45
  What is the advantage over simply checking size after downloading? Well, Down
66
46
  terminates the download very early, as soon as it gets the `Content-Length`
67
47
  header. And if the `Content-Length` header is missing, Down will terminate the
68
- download as soon as it receives a chunk which surpasses the maximum size.
48
+ download as soon as the downloaded content surpasses the maximum size.
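The early termination described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not Down's actual implementation: `TooLarge`, `check_content_length!` and `collect_chunks` are hypothetical names, and the chunks come from an array rather than a real HTTP response.

```ruby
# Hypothetical error class for this sketch (not part of Down's API).
class TooLarge < StandardError; end

# Reject up front when the announced size is above the limit.
# A nil content_length stands for a missing "Content-Length" header.
def check_content_length!(content_length, max_size)
  return if content_length.nil?
  raise TooLarge, "file is too large" if content_length > max_size
end

# Accumulate chunks and stop as soon as the running total passes the limit,
# so the rest of the content is never retrieved.
def collect_chunks(chunks, max_size)
  content = +""
  chunks.each do |chunk|
    content << chunk
    raise TooLarge, "file is too large" if content.bytesize > max_size
  end
  content
end
```

The first check covers servers that send `Content-Length`; the second is the fallback that counts bytes as they arrive.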
69
49
 
70
- ### Redirects
50
+ ### Basic authentication
71
51
 
72
- By default open-uri's redirects are turned off, since open-uri doesn't have a
73
- way to limit maximum number of redirects. Instead Down itself implements
74
- following redirects, by default allowing maximum of 2 redirects.
52
+ `Down.download` and `Down.open` will automatically detect and apply HTTP basic
53
+ authentication from the URL:
75
54
 
76
55
  ```rb
77
- Down.download("http://example.com/image.jpg") # 2 redirects allowed
78
- Down.download("http://example.com/image.jpg", max_redirects: 5) # 5 redirects allowed
79
- Down.download("http://example.com/image.jpg", max_redirects: 0) # 0 redirects allowed
56
+ Down.download("http://user:password@example.org")
57
+ Down.open("http://user:password@example.org")
80
58
  ```
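As a rough illustration of what detecting credentials in a URL involves, the standard `URI` library already exposes them. The helpers `basic_auth_from` and `basic_auth_header` below are hypothetical names for this sketch and are not part of Down's API.

```ruby
require "uri"

# Returns [user, password] when the URL embeds credentials, otherwise nil.
def basic_auth_from(url)
  uri = URI.parse(url)
  return nil unless uri.user
  [uri.user, uri.password]
end

# Builds the value of an "Authorization" header for HTTP basic authentication.
def basic_auth_header(user, password)
  "Basic " + ["#{user}:#{password}"].pack("m0") # "m0" is Base64 without newlines
end
```

For example, `basic_auth_from("http://user:password@example.org")` yields the pair that `basic_auth_header` then encodes.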
81
59
 
82
60
  ### Download errors
83
61
 
84
62
  There are a lot of ways in which a download can fail:
85
63
 
86
- * URL is really invalid (`URI::InvalidURIError`)
87
- * URL is a little bit invalid, e.g. "http:/example.com" (`Errno::ECONNREFUSED`)
88
- * Domain was not found (`SocketError`)
89
- * Domain was found, but status is 4xx or 5xx (`OpenURI::HTTPError`)
90
- * Request timeout (`Timeout::Error`)
64
+ * Response status was 4xx or 5xx
65
+ * Domain was not found
66
+ * Timeout occurred
67
+ * URL is invalid
68
+ * ...
91
69
 
92
- Down unifies all of these errors into one `Down::NotFound` error (because this
93
- is what actually happened from the outside perspective). If you want to get the
94
- actual error raised by open-uri, in Ruby 2.1 or later you can use
70
+ Down attempts to unify all of these exceptions into one `Down::NotFound` error
71
+ (because this is what actually happened from the outside perspective). If you
72
+ want to retrieve the original error raised, in Ruby 2.1+ you can use
95
73
  `Exception#cause`:
96
74
 
97
75
  ```rb
98
76
  begin
99
77
  Down.download("http://example.com")
100
- rescue Down::Error => error
101
- error.cause #=> #<RuntimeError: HTTP redirection loop: http://example.com>
78
+ rescue Down::Error => exception
79
+ exception.cause #=> #<Timeout::Error>
102
80
  end
103
81
  ```
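The way `Exception#cause` gets populated can be shown without any network access. In this sketch the `Sketch::Error` and `Sketch::NotFound` classes and the `fetch_with_unified_errors` method are hypothetical stand-ins; the list of rescued exceptions is an assumption and not the set Down actually handles.

```ruby
require "socket"  # defines SocketError
require "timeout" # defines Timeout::Error
require "uri"     # defines URI::InvalidURIError

# Hypothetical error classes for this sketch.
module Sketch
  class Error < StandardError; end
  class NotFound < Error; end
end

# Raising inside a `rescue` clause makes Ruby (2.1+) record the rescued
# exception as the `#cause` of the new one automatically.
def fetch_with_unified_errors
  yield
rescue Timeout::Error, SocketError, URI::InvalidURIError
  raise Sketch::NotFound, "file not found"
end
```

Callers then rescue a single error class and inspect `#cause` only when they need the underlying reason.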
104
82
 
105
- ### Additional options
106
-
107
- Any additional options will be forwarded to [open-uri], so you can for example
108
- add basic authentication or a timeout:
109
-
110
- ```rb
111
- Down.download "http://example.com/image.jpg",
112
- http_basic_authentication: ['john', 'secret'],
113
- read_timeout: 5
114
- ```
115
-
116
83
  ## Streaming
117
84
 
118
- Down has the ability to access content of the remote file *as it is being
119
- downloaded*. The `Down.open` method returns an IO object which represents the
120
- remote file on the given URL. When you read from it, Down internally downloads
121
- chunks of the remote file, but only how much is needed.
85
+ Down has the ability to retrieve content of the remote file *as it is being
86
+ downloaded*. The `Down.open` method returns a `Down::ChunkedIO` object which
87
+ represents the remote file on the given URL. When you read from it, Down
88
+ internally downloads chunks of the remote file, but only as much as is needed.
122
89
 
123
90
  ```rb
124
91
  remote_file = Down.open("http://example.com/image.jpg")
@@ -126,44 +93,61 @@ remote_file.size # read from the "Content-Length" header
126
93
 
127
94
  remote_file.read(1024) # downloads and returns first 1 KB
128
95
  remote_file.read(1024) # downloads and returns next 1 KB
129
- remote_file.read # downloads and returns the rest of the file
130
96
 
131
- remote_file.eof? #=> true
132
- remote_file.rewind
133
97
  remote_file.eof? #=> false
98
+ remote_file.read # downloads and returns the rest of the file content
99
+ remote_file.eof? #=> true
134
100
 
135
101
  remote_file.close # closes the HTTP connection and deletes the internal Tempfile
136
102
  ```
137
103
 
138
- You can also yield chunks directly as they're downloaded:
104
+ ### Caching
105
+
106
+ By default the downloaded content is internally cached into a `Tempfile`, so
107
+ that when you rewind the `Down::ChunkedIO`, it continues reading the cached
108
+ content that it had already retrieved.
139
109
 
140
110
  ```rb
141
111
  remote_file = Down.open("http://example.com/image.jpg")
142
- remote_file.each_chunk do |chunk|
143
- # ...
144
- end
145
- remote_file.close
112
+ remote_file.read(1*1024*1024) # downloads, caches, and returns first 1MB
113
+ remote_file.rewind
114
+ remote_file.read(1*1024*1024) # reads the cached content
115
+ remote_file.read(1*1024*1024) # downloads the next 1MB
146
116
  ```
147
117
 
148
- You can access the response status and headers of the HTTP request that was made:
118
+ If you want to save on IO calls and on disk usage, and don't need to be able to
119
+ rewind the `Down::ChunkedIO`, you can disable caching downloaded content:
120
+
121
+ ```rb
122
+ Down.open("http://example.com/image.jpg", rewindable: false)
123
+ ```
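To make the caching idea concrete, here is a minimal sketch of a reader that pulls chunks lazily from an `Enumerator` and stores everything it retrieves in a `Tempfile`, so that rewinding replays cached bytes. `CachedChunkReader` is a hypothetical class written for this illustration; it is far simpler than `Down::ChunkedIO` and does not reflect its internals.

```ruby
require "tempfile"

# Hypothetical class: lazy reading from an Enumerator with a Tempfile cache.
class CachedChunkReader
  def initialize(chunks)
    @chunks = chunks # Enumerator yielding String chunks
    @cache  = Tempfile.new("cache", binmode: true)
  end

  # Returns up to `length` bytes, or nil once all content has been read.
  def read(length)
    fill_cache(@cache.pos + length)
    @cache.read(length)
  end

  # Moves back to the start; later reads are served from the cache.
  def rewind
    @cache.rewind
  end

  # Closes and deletes the cache file.
  def close
    @cache.close!
  end

  private

  # Retrieves chunks until the cache holds `target` bytes or chunks run out.
  def fill_cache(target)
    position = @cache.pos
    @cache.seek(0, IO::SEEK_END)
    loop do
      break if @cache.pos >= target
      @cache.write(@chunks.next) # StopIteration ends `loop` when exhausted
    end
    @cache.seek(position)
  end
end
```

Only the chunks needed to satisfy a `read` are pulled from the enumerator, which is the behaviour the `:rewindable` option trades against disk usage.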
124
+
125
+ ### Yielding chunks
126
+
127
+ You can also yield chunks directly as they're downloaded via `#each_chunk`, in
128
+ which case the downloaded content is not cached into a file regardless of the
129
+ `:rewindable` option.
149
130
 
150
131
  ```rb
151
132
  remote_file = Down.open("http://example.com/image.jpg")
152
- remote_file.data[:status] #=> 200
153
- remote_file.data[:headers] #=> { ... }
133
+ remote_file.each_chunk { |chunk| ... }
134
+ remote_file.close
154
135
  ```
155
136
 
156
- Note that `Down::NotFound` error will automatically be raised if response
157
- status was 4xx or 5xx.
137
+ ### Data
158
138
 
159
- `Down.open` accepts `:ssl_verify_mode` and `:ssl_ca_cert` options with the same
160
- semantics as in `open-uri`, and any options with String keys will be
161
- interpreted as request headers.
139
+ You can access the response status and headers of the HTTP request that was made:
162
140
 
163
141
  ```rb
164
- Down.open("http://example.com/image.jpg", {"Authorization" => "..."})
142
+ remote_file = Down.open("http://example.com/image.jpg")
143
+ remote_file.data[:status] #=> 200
144
+ remote_file.data[:headers] #=> { ... }
145
+ remote_file.data[:response] # returns the response object
165
146
  ```
166
147
 
148
+ Note that a `Down::NotFound` error is raised automatically if the response
149
+ status is 4xx or 5xx.
150
+
167
151
  ### `Down::ChunkedIO`
168
152
 
169
153
  The `Down.open` method uses `Down::ChunkedIO` internally. However,
@@ -173,9 +157,12 @@ The `Down.open` method uses `Down::ChunkedIO` internally. However,
173
157
  Down::ChunkedIO.new(...)
174
158
  ```
175
159
 
176
- * `:size` – size of the file, if it's known
177
- * `:chunks` – an `Enumerator` which returns chunks
178
- * `:on_close` – called when streaming finishes
160
+ * `:chunks` – an `Enumerator` which retrieves chunks
161
+ * `:size` – size of the file if it's known (returned by `#size`)
162
+ * `:on_close` – called when streaming finishes or IO is closed
163
+ * `:data` – custom data that you want to store (returned by `#data`)
164
+ * `:rewindable` – whether to cache retrieved data into a file (defaults to `true`)
165
+ * `:encoding` – force content to be returned in specified encoding (defaults to ASCII-8BIT)
179
166
 
180
167
  Here is an example of wrapping streaming MongoDB files:
181
168
 
@@ -185,7 +172,7 @@ require "down/chunked_io"
185
172
  mongo = Mongo::Client.new(...)
186
173
  bucket = mongo.database.fs
187
174
 
188
- content_length = bucket.find(_id: id).first["length"]
175
+ content_length = bucket.find(_id: id).first[:length]
189
176
  stream = bucket.open_download_stream(id)
190
177
 
191
178
  io = Down::ChunkedIO.new(
@@ -195,6 +182,42 @@ io = Down::ChunkedIO.new(
195
182
  )
196
183
  ```
197
184
 
185
+ ## open-uri + Net::HTTP
186
+
187
+ The [open-uri] + Net::HTTP backend is the default, loaded by requiring `down`
188
+ or `down/net_http`:
189
+
190
+ ```rb
191
+ require "down"
192
+ # or
193
+ require "down/net_http"
194
+ ```
195
+
196
+ `Down.download` is implemented as a wrapper around open-uri, and fixes some of
197
+ open-uri's undesired behaviours:
198
+
199
+ * open-uri returns `StringIO` for files smaller than 10KB, and `Tempfile`
200
+ otherwise, but `Down.download` always returns a `Tempfile`
201
+ * open-uri doesn't give any extension to the returned `Tempfile`, but
202
+ `Down.download` adds the extension from the URL
203
+ * ...
204
+
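The two fixes listed above amount to copying whatever IO open-uri returns into a `Tempfile` whose name keeps the URL's extension. The following is a sketch under that assumption; `to_tempfile` is a hypothetical helper and not the method Down uses internally.

```ruby
require "stringio"
require "tempfile"
require "uri"

# Copies any IO-like object (StringIO or Tempfile) into a new Tempfile
# that carries the file extension found in the URL path.
def to_tempfile(io, url)
  extension = File.extname(URI.parse(url).path.to_s)
  tempfile  = Tempfile.new(["download", extension], binmode: true)
  IO.copy_stream(io, tempfile)
  tempfile.rewind # so callers can start reading from the beginning
  tempfile
end
```

Passing an array to `Tempfile.new` is what lets the extension survive, since the second element is appended as a suffix to the generated name.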
205
+ Since open-uri doesn't expose support for partial downloads, `Down.open` is
206
+ implemented using `Net::HTTP` directly.
207
+
208
+ ### Redirects
209
+
210
+ `Down.download` turns off open-uri's following redirects, as open-uri doesn't
211
+ have a way to limit the maximum number of hops, and implements its own. By
212
+ default a maximum of 2 redirects will be followed, but you can change this via the
213
+ `:max_redirects` option:
214
+
215
+ ```rb
216
+ Down.download("http://example.com/image.jpg") # 2 redirects allowed
217
+ Down.download("http://example.com/image.jpg", max_redirects: 5) # 5 redirects allowed
218
+ Down.download("http://example.com/image.jpg", max_redirects: 0) # 0 redirects allowed
219
+ ```
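A hop limit like `:max_redirects` can be sketched as a small loop. Everything here is illustrative: `TooManyRedirects` and `follow_redirects` are hypothetical names, and `fetch` is any callable returning `[status, headers]` for a URL, standing in for a real HTTP request.

```ruby
# Hypothetical error class for this sketch.
class TooManyRedirects < StandardError; end

# Follows "Location" headers until a non-redirect response is reached,
# raising once more than `max_redirects` hops would be needed.
def follow_redirects(url, fetch, max_redirects: 2)
  hops = 0
  loop do
    status, headers = fetch.call(url)
    return url unless (300..399).cover?(status)
    raise TooManyRedirects, "too many redirects" if hops >= max_redirects
    url = headers.fetch("Location")
    hops += 1
  end
end
```

Counting hops explicitly is what makes `max_redirects: 0` meaningful: the first redirect response raises instead of being followed.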
220
+
198
221
  ### Proxy
199
222
 
200
223
  Both `Down.download` and `Down.open` support a `:proxy` option, where you can
@@ -202,26 +225,89 @@ specify a URL to an HTTP proxy which should be used when downloading.
202
225
 
203
226
  ```rb
204
227
  Down.download("http://example.com/image.jpg", proxy: "http://proxy.org")
205
- Down.open("http://example.com/image.jpg", proxy: "http://user:password@proxy.org")
228
+ Down.open("http://example.com/image.jpg", proxy: "http://user:password@proxy.org")
229
+ ```
230
+
231
+ ### Additional options
232
+
233
+ Any additional options passed to `Down.download` will be forwarded to
234
+ [open-uri], so you can for example add basic authentication or a timeout:
235
+
236
+ ```rb
237
+ Down.download "http://example.com/image.jpg",
238
+ http_basic_authentication: ['john', 'secret'],
239
+ read_timeout: 5
240
+ ```
241
+
242
+ `Down.open` accepts `:ssl_verify_mode` and `:ssl_ca_cert` options with the same
243
+ semantics as in open-uri, and any options with String keys will be interpreted
244
+ as request headers, like with open-uri.
245
+
246
+ ```rb
247
+ Down.open("http://example.com/image.jpg", {"Authorization" => "..."})
248
+ ```
249
+
250
+ ## HTTP.rb
251
+
252
+ The [HTTP.rb] backend can be used by requiring `down/http`:
253
+
254
+ ```rb
255
+ gem "http", "~> 2.1"
256
+ gem "down"
257
+ ```
258
+ ```rb
259
+ require "down/http"
206
260
  ```
207
261
 
208
- ### Copying to tempfile
262
+ Some features that give the HTTP.rb backend an advantage over open-uri +
263
+ Net::HTTP include:
264
+
265
+ * Correct URI parsing with [Addressable::URI]
266
+ * Proper support for streaming downloads (`#download` can now reuse `#open`)
267
+ * Proper support for SSL
268
+ * Chainable HTTP client builder API for setting default options
269
+ * Persistent connections
270
+ * Auto-inflating compressed response bodies
271
+ * ...
209
272
 
210
- Down has another "hidden" utility method, `#copy_to_tempfile`, which creates
211
- a Tempfile out of the given file. The `#download` method uses it internally,
212
- but it's also publicly available for direct use:
273
+ ### Default client
274
+
275
+ You can change the default `HTTP::Client` to be used in all download requests
276
+ via `Down::Http.client`:
213
277
 
214
278
  ```rb
215
- io # IO object that you want to copy to tempfile
216
- tempfile = Down.copy_to_tempfile "basename.jpg", io
217
- tempfile.path #=> "/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/down20151116-77262-jgcx65.jpg"
279
+ # reuse Down's default client
280
+ Down::Http.client = Down::Http.client.timeout(read: 3).feature(:auto_inflate)
281
+ Down::Http.client.default_options.merge!(ssl_context: ctx)
282
+
283
+ # or set a new client
284
+ Down::Http.client = HTTP.via("proxy-hostname.local", 8080)
218
285
  ```
219
286
 
287
+ ### Additional options
288
+
289
+ All additional options passed to `Down.download` and `Down.open` will be
290
+ forwarded to `HTTP::Client#request`:
291
+
292
+ ```rb
293
+ Down.download("http://example.org/image.jpg", headers: {"Accept-Encoding" => "gzip"})
294
+ ```
295
+
296
+ If you prefer to add options using the chainable API, you can pass a block:
297
+
298
+ ```rb
299
+ Down.open("http://example.org/image.jpg") do |client|
300
+ client.timeout(read: 3)
301
+ end
302
+ ```
303
+
304
+ ### Thread safety
305
+
306
+ `Down::Http.client` is stored in a thread-local variable, so using the HTTP.rb
307
+ backend is thread safe.
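What a thread-local setting means in practice can be shown with Ruby's own thread variables. `ClientRegistry` below is a hypothetical module for illustration; how `Down::Http` stores its client internally is not shown in this README, so treat the mechanism as an assumption.

```ruby
# Hypothetical module: a setting stored per thread.
module ClientRegistry
  KEY     = :sketch_http_client
  DEFAULT = "default client"

  # Returns this thread's client, falling back to the default.
  def self.client
    Thread.current.thread_variable_get(KEY) || DEFAULT
  end

  # Sets the client for the current thread only.
  def self.client=(value)
    Thread.current.thread_variable_set(KEY, value)
  end
end
```

A consequence of this design is that a client assigned in one thread is not visible in another, so each thread that needs custom settings has to assign them itself.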
308
+
220
309
  ## Supported Ruby versions
221
310
 
222
- * MRI 1.9.3
223
- * MRI 2.0
224
- * MRI 2.1
225
311
  * MRI 2.2
226
312
  * MRI 2.3
227
313
  * MRI 2.4
@@ -229,14 +315,17 @@ tempfile.path #=> "/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/down20151116
229
315
 
230
316
  ## Development
231
317
 
318
+ The test suite runs the http://httpbin.org/ server locally, and uses it to test
319
+ downloads. Httpbin is a Python package which is run with Gunicorn:
320
+
232
321
  ```
233
- $ rake test
322
+ $ pip install gunicorn httpbin
234
323
  ```
235
324
 
236
- If you want to test across Ruby versions and you're using rbenv, run
325
+ Afterwards you can run tests with:
237
326
 
238
327
  ```
239
- $ bin/test-versions
328
+ $ rake test
240
329
  ```
241
330
 
242
331
  ## License
@@ -244,3 +333,5 @@ $ bin/test-versions
244
333
  [MIT](LICENSE.txt)
245
334
 
246
335
  [open-uri]: http://ruby-doc.org/stdlib-2.3.0/libdoc/open-uri/rdoc/OpenURI.html
336
+ [HTTP.rb]: https://github.com/httprb/http
337
+ [Addressable::URI]: https://github.com/sporkmonger/addressable
@@ -1,21 +1,21 @@
1
1
  require File.expand_path("../lib/down/version", __FILE__)
2
2
 
3
3
  Gem::Specification.new do |spec|
4
- spec.name = "down"
5
- spec.version = Down::VERSION
6
- spec.authors = ["Janko Marohnić"]
7
- spec.email = ["janko.marohnic@gmail.com"]
4
+ spec.name = "down"
5
+ spec.version = Down::VERSION
8
6
 
9
- spec.summary = "Robust streaming downloads using net/http."
10
- spec.homepage = "https://github.com/janko-m/down"
11
- spec.license = "MIT"
7
+ spec.required_ruby_version = ">= 2.1"
12
8
 
13
- spec.files = Dir["README.md", "LICENSE.txt", "*.gemspec", "lib/**/*.rb"]
14
- spec.require_paths = ["lib"]
9
+ spec.summary = "Robust streaming downloads using net/http."
10
+ spec.homepage = "https://github.com/janko-m/down"
11
+ spec.authors = ["Janko Marohnić"]
12
+ spec.email = ["janko.marohnic@gmail.com"]
13
+ spec.license = "MIT"
14
+
15
+ spec.files = Dir["README.md", "LICENSE.txt", "*.gemspec", "lib/**/*.rb"]
16
+ spec.require_path = "lib"
15
17
 
16
- spec.add_development_dependency "rake"
17
18
  spec.add_development_dependency "minitest", "~> 5.8"
18
- spec.add_development_dependency "webmock", "~> 2.3"
19
- spec.add_development_dependency "addressable", "< 2.5"
20
19
  spec.add_development_dependency "mocha"
20
+ spec.add_development_dependency "http", "~> 2.1"
21
21
  end