down 2.5.1 → 3.0.0
- checksums.yaml +4 -4
- data/README.md +190 -99
- data/down.gemspec +12 -12
- data/lib/down.rb +3 -215
- data/lib/down/chunked_io.rb +81 -46
- data/lib/down/errors.rb +16 -0
- data/lib/down/http.rb +150 -0
- data/lib/down/net_http.rb +221 -0
- data/lib/down/version.rb +1 -1
- metadata +16 -42
- data/lib/down/wget.rb +0 -20
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: c16a220821aeb2a11910334c3598ccb5b823473d
|
4
|
+
data.tar.gz: 21b1426c6169e82627fb445cbe0526456e9c9f19
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 97447da54fba5009a0dca7b3013906dc5c0f707620b2ab98bd031f85e0b02180829188bb9df31daeacdcc49138aba024f630a2d9a3c5170d6f4119add90ab2c8
|
7
|
+
data.tar.gz: '09498dc3f7ad7256700f42fb536b76293d6f3825e2b686eb091772b22e366116a346130832ea46d28b739fe76e99f477575e3d003efa852696c8fb492128243e'
|
data/README.md
CHANGED
@@ -1,15 +1,19 @@
|
|
1
1
|
# Down
|
2
2
|
|
3
|
-
Down is a
|
4
|
-
|
3
|
+
Down is a utility tool for streaming, flexible and safe downloading of remote
|
4
|
+
files. It can use [open-uri] + `Net::HTTP` or [HTTP.rb] as the backend HTTP
|
5
|
+
library.
|
5
6
|
|
6
7
|
## Installation
|
7
8
|
|
8
9
|
```rb
|
9
|
-
gem
|
10
|
+
gem "down"
|
10
11
|
```
|
11
12
|
|
12
|
-
##
|
13
|
+
## Downloading
|
14
|
+
|
15
|
+
The primary method is `Down.download`, which downloads the remote file into a
|
16
|
+
Tempfile:
|
13
17
|
|
14
18
|
```rb
|
15
19
|
require "down"
|
@@ -17,37 +21,13 @@ tempfile = Down.download("http://example.com/nature.jpg")
|
|
17
21
|
tempfile #=> #<Tempfile:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/20150925-55456-z7vxqz.jpg>
|
18
22
|
```
|
19
23
|
|
20
|
-
## Downloading
|
21
|
-
|
22
|
-
If you're downloading files from URLs that come from you, then it's probably
|
23
|
-
enough to just use `open-uri`. However, if you're accepting URLs from your
|
24
|
-
users (e.g. through `remote_<avatar>_url` in CarrierWave), then downloading is
|
25
|
-
suddenly not as simple as it appears to be.
|
26
|
-
|
27
|
-
### StringIO
|
28
|
-
|
29
|
-
Firstly, you may think that `open-uri` always downloads a file to disk, but
|
30
|
-
that's not true. If the downloaded file has 10 KB or less, `open-uri` actually
|
31
|
-
returns a `StringIO`. In my application I needed that the file is always
|
32
|
-
downloaded to disk. This was obviously a wrong design decision from the MRI
|
33
|
-
team, so Down patches this behaviour and always returns a `Tempfile`.
|
34
|
-
|
35
|
-
### File extension
|
36
|
-
|
37
|
-
When using `open-uri` directly, the extension of the remote file is not
|
38
|
-
preserved. Down patches that behaviour and preserves the file extension.
|
39
|
-
|
40
24
|
### Metadata
|
41
25
|
|
42
|
-
|
43
|
-
|
26
|
+
The returned Tempfile has `#content_type` and `#original_filename` attributes
|
27
|
+
determined from the response headers:
|
44
28
|
|
45
29
|
```rb
|
46
|
-
|
47
|
-
tempfile = Down.download("http://example.com/nature.jpg")
|
48
|
-
|
49
|
-
tempfile #=> #<Tempfile:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/20150925-55456-z7vxqz.jpg>
|
50
|
-
tempfile.content_type #=> "image/jpeg"
|
30
|
+
tempfile.content_type #=> "image/jpeg"
|
51
31
|
tempfile.original_filename #=> "nature.jpg"
|
52
32
|
```
|
53
33
|
|
@@ -65,60 +45,47 @@ Down.download("http://example.com/image.jpg", max_size: 5 * 1024 * 1024) # 5 MB
|
|
65
45
|
What is the advantage over simply checking size after downloading? Well, Down
|
66
46
|
terminates the download very early, as soon as it gets the `Content-Length`
|
67
47
|
header. And if the `Content-Length` header is missing, Down will terminate the
|
68
|
-
download as soon as
|
48
|
+
download as soon as the downloaded content surpasses the maximum size.
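The early-termination behaviour described above can be sketched in plain Ruby. This is only an illustration, not Down's implementation: the `TooLarge` class and the `read_with_limit` method are hypothetical names, and the chunks are assumed to come from any enumerable of strings.

```rb
# Hypothetical error class for this sketch; not Down's own.
class TooLarge < StandardError; end

# Consumes chunks lazily and aborts as soon as the limit is surpassed.
def read_with_limit(chunks, max_size:, content_length: nil)
  # With a known Content-Length we can refuse before reading any body.
  if content_length && content_length > max_size
    raise TooLarge, "file is too large (max is #{max_size} bytes)"
  end

  content = String.new
  chunks.each do |chunk|
    content << chunk
    # Without Content-Length, stop at the first chunk that crosses the limit.
    raise TooLarge, "file is too large (max is #{max_size} bytes)" if content.bytesize > max_size
  end
  content
end
```

Because the check runs after every chunk, the remaining chunks are never requested once the limit is crossed.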
|
69
49
|
|
70
|
-
###
|
50
|
+
### Basic authentication
|
71
51
|
|
72
|
-
|
73
|
-
|
74
|
-
following redirects, by default allowing maximum of 2 redirects.
|
52
|
+
`Down.download` and `Down.open` will automatically detect and apply HTTP basic
|
53
|
+
authentication from the URL:
|
75
54
|
|
76
55
|
```rb
|
77
|
-
Down.download("http://example.
|
78
|
-
Down.
|
79
|
-
Down.download("http://example.com/image.jpg", max_redirects: 0) # 0 redirects allowed
|
56
|
+
Down.download("http://user:password@example.org")
|
57
|
+
Down.open("http://user:password@example.org")
|
80
58
|
```
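As a rough idea of what detecting credentials in the URL involves, here is a minimal sketch using only the standard library. The `basic_auth_header` helper is a hypothetical name, and the sketch ignores percent-encoded credentials; it is not how Down itself is necessarily implemented.

```rb
require "uri"

# Returns the "Authorization" header value for credentials embedded in the
# URL, or nil when the URL carries no user info. (Hypothetical helper.)
def basic_auth_header(url)
  uri = URI.parse(url)
  return nil unless uri.user

  # "m0" packs the string as Base64 without trailing newlines.
  "Basic " + ["#{uri.user}:#{uri.password}"].pack("m0")
end

basic_auth_header("http://user:password@example.org") #=> "Basic dXNlcjpwYXNzd29yZA=="
basic_auth_header("http://example.org")               #=> nil
```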
|
81
59
|
|
82
60
|
### Download errors
|
83
61
|
|
84
62
|
There are a lot of ways in which a download can fail:
|
85
63
|
|
86
|
-
*
|
87
|
-
*
|
88
|
-
*
|
89
|
-
*
|
90
|
-
*
|
64
|
+
* Response status was 4xx or 5xx
|
65
|
+
* Domain was not found
|
66
|
+
* Timeout occurred
|
67
|
+
* URL is invalid
|
68
|
+
* ...
|
91
69
|
|
92
|
-
Down
|
93
|
-
is what actually happened from the outside perspective). If you
|
94
|
-
|
70
|
+
Down attempts to unify all of these exceptions into one `Down::NotFound` error
|
71
|
+
(because this is what actually happened from the outside perspective). If you
|
72
|
+
want to retrieve the original error raised, in Ruby 2.1+ you can use
|
95
73
|
`Exception#cause`:
|
96
74
|
|
97
75
|
```rb
|
98
76
|
begin
|
99
77
|
Down.download("http://example.com")
|
100
|
-
rescue Down::Error =>
|
101
|
-
|
78
|
+
rescue Down::Error => exception
|
79
|
+
exception.cause #=> #<Timeout::Error>
|
102
80
|
end
|
103
81
|
```
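The reason `Exception#cause` works here is that Ruby records the exception being handled whenever a new one is raised inside a `rescue` block. A minimal sketch, with hypothetical `Sketch::Error` and `Sketch::NotFound` classes standing in for Down's error hierarchy:

```rb
require "timeout"

# Hypothetical error classes for illustration only.
module Sketch
  class Error < StandardError; end
  class NotFound < Error; end
end

# Runs the block and re-raises low-level failures as one unified error.
def fetch_with_unified_errors
  yield
rescue Timeout::Error, SystemCallError => error
  # Raising inside `rescue` sets #cause on the new exception automatically.
  raise Sketch::NotFound, error.message
end

begin
  fetch_with_unified_errors { raise Timeout::Error, "execution expired" }
rescue Sketch::Error => exception
  exception.cause #=> #<Timeout::Error: execution expired>
end
```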
|
104
82
|
|
105
|
-
### Additional options
|
106
|
-
|
107
|
-
Any additional options will be forwarded to [open-uri], so you can for example
|
108
|
-
add basic authentication or a timeout:
|
109
|
-
|
110
|
-
```rb
|
111
|
-
Down.download "http://example.com/image.jpg",
|
112
|
-
http_basic_authentication: ['john', 'secret'],
|
113
|
-
read_timeout: 5
|
114
|
-
```
|
115
|
-
|
116
83
|
## Streaming
|
117
84
|
|
118
|
-
Down has the ability to
|
119
|
-
downloaded*. The `Down.open` method returns
|
120
|
-
remote file on the given URL. When you read from it, Down
|
121
|
-
chunks of the remote file, but only how much is needed.
|
85
|
+
Down has the ability to retrieve content of the remote file *as it is being
|
86
|
+
downloaded*. The `Down.open` method returns a `Down::ChunkedIO` object which
|
87
|
+
represents the remote file on the given URL. When you read from it, Down
|
88
|
+
internally downloads chunks of the remote file, but only as much as is needed.
|
122
89
|
|
123
90
|
```rb
|
124
91
|
remote_file = Down.open("http://example.com/image.jpg")
|
@@ -126,44 +93,61 @@ remote_file.size # read from the "Content-Length" header
|
|
126
93
|
|
127
94
|
remote_file.read(1024) # downloads and returns first 1 KB
|
128
95
|
remote_file.read(1024) # downloads and returns next 1 KB
|
129
|
-
remote_file.read # downloads and returns the rest of the file
|
130
96
|
|
131
|
-
remote_file.eof? #=> true
|
132
|
-
remote_file.rewind
|
133
97
|
remote_file.eof? #=> false
|
98
|
+
remote_file.read # downloads and returns the rest of the file content
|
99
|
+
remote_file.eof? #=> true
|
134
100
|
|
135
101
|
remote_file.close # closes the HTTP connection and deletes the internal Tempfile
|
136
102
|
```
|
137
103
|
|
138
|
-
|
104
|
+
### Caching
|
105
|
+
|
106
|
+
By default the downloaded content is internally cached into a `Tempfile`, so
|
107
|
+
that when you rewind the `Down::ChunkedIO`, it continues reading the cached
|
108
|
+
content that it had already retrieved.
|
139
109
|
|
140
110
|
```rb
|
141
111
|
remote_file = Down.open("http://example.com/image.jpg")
|
142
|
-
remote_file.
|
143
|
-
|
144
|
-
|
145
|
-
remote_file.
|
112
|
+
remote_file.read(1*1024*1024) # downloads, caches, and returns first 1MB
|
113
|
+
remote_file.rewind
|
114
|
+
remote_file.read(1*1024*1024) # reads the cached content
|
115
|
+
remote_file.read(1*1024*1024) # downloads the next 1MB
|
146
116
|
```
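To make the caching behaviour concrete, here is a simplified sketch of a reader that caches retrieved chunks into a `Tempfile`. `CachingReader` is a hypothetical class written for this illustration; the real `Down::ChunkedIO` does more than this.

```rb
require "tempfile"

# Hypothetical, simplified stand-in for a rewindable chunked reader.
class CachingReader
  def initialize(chunks)
    @chunks     = chunks          # Enumerator yielding String chunks
    @cache      = Tempfile.new("cache")
    @cache.binmode
    @downloaded = 0               # bytes written to the cache so far
  end

  def read(length)
    position = @cache.pos
    # Retrieve new chunks only if the cache doesn't cover the requested range.
    while @downloaded < position + length && (chunk = next_chunk)
      @cache.seek(0, IO::SEEK_END)
      @cache.write(chunk)
      @downloaded += chunk.bytesize
    end
    @cache.seek(position)
    @cache.read(length)
  end

  def rewind
    @cache.rewind
  end

  def close
    @cache.close! # closes and deletes the Tempfile
  end

  private

  def next_chunk
    @chunks.next
  rescue StopIteration
    nil
  end
end
```

After `rewind`, reads are served from the Tempfile until the read position passes what has already been retrieved, at which point the enumerator is asked for more.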
|
147
117
|
|
148
|
-
|
118
|
+
If you want to save on IO calls and on disk usage, and don't need to be able to
|
119
|
+
rewind the `Down::ChunkedIO`, you can disable caching downloaded content:
|
120
|
+
|
121
|
+
```rb
|
122
|
+
Down.open("http://example.com/image.jpg", rewindable: false)
|
123
|
+
```
|
124
|
+
|
125
|
+
### Yielding chunks
|
126
|
+
|
127
|
+
You can also yield chunks directly as they're downloaded via `#each_chunk`, in
|
128
|
+
which case the downloaded content is not cached into a file regardless of the
|
129
|
+
`:rewindable` option.
|
149
130
|
|
150
131
|
```rb
|
151
132
|
remote_file = Down.open("http://example.com/image.jpg")
|
152
|
-
remote_file.
|
153
|
-
remote_file.
|
133
|
+
remote_file.each_chunk { |chunk| ... }
|
134
|
+
remote_file.close
|
154
135
|
```
|
155
136
|
|
156
|
-
|
157
|
-
status was 4xx or 5xx.
|
137
|
+
### Data
|
158
138
|
|
159
|
-
|
160
|
-
semantics as in `open-uri`, and any options with String keys will be
|
161
|
-
interpreted as request headers.
|
139
|
+
You can access the response status and headers of the HTTP request that was made:
|
162
140
|
|
163
141
|
```rb
|
164
|
-
Down.open("http://example.com/image.jpg"
|
142
|
+
remote_file = Down.open("http://example.com/image.jpg")
|
143
|
+
remote_file.data[:status] #=> 200
|
144
|
+
remote_file.data[:headers] #=> { ... }
|
145
|
+
remote_file.data[:response] # returns the response object
|
165
146
|
```
|
166
147
|
|
148
|
+
Note that a `Down::NotFound` error will automatically be raised if the response
|
149
|
+
status was 4xx or 5xx.
|
150
|
+
|
167
151
|
### `Down::ChunkedIO`
|
168
152
|
|
169
153
|
The `Down.open` method uses `Down::ChunkedIO` internally. However,
|
@@ -173,9 +157,12 @@ The `Down.open` method uses `Down::ChunkedIO` internally. However,
|
|
173
157
|
Down::ChunkedIO.new(...)
|
174
158
|
```
|
175
159
|
|
176
|
-
* `:
|
177
|
-
* `:
|
178
|
-
* `:on_close` – called when streaming finishes
|
160
|
+
* `:chunks` – an `Enumerator` which retrieves chunks
|
161
|
+
* `:size` – size of the file if it's known (returned by `#size`)
|
162
|
+
* `:on_close` – called when streaming finishes or IO is closed
|
163
|
+
* `:data` - custom data that you want to store (returned by `#data`)
|
164
|
+
* `:rewindable` - whether to cache retrieved data into a file (defaults to `true`)
|
165
|
+
* `:encoding` - force content to be returned in specified encoding (defaults to ASCII-8BIT)
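For the `:chunks` option, any `Enumerator` that yields strings should do. Below is a small sketch that builds one from an IO-like object; `chunk_enumerator` is a hypothetical helper, and the commented-out line only shows where the enumerator would be passed, using the options listed above.

```rb
require "stringio"

# Builds an Enumerator that lazily yields fixed-size chunks from an IO-like
# object. (Hypothetical helper for illustration.)
def chunk_enumerator(io, chunk_size: 16 * 1024)
  Enumerator.new do |yielder|
    # IO#read(length) returns nil at EOF, which ends the loop.
    while (chunk = io.read(chunk_size))
      yielder << chunk
    end
  end
end

chunks = chunk_enumerator(StringIO.new("abcdefgh"), chunk_size: 3)
chunks.to_a #=> ["abc", "def", "gh"]

# io = Down::ChunkedIO.new(chunks: chunks, size: 8)
```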
|
179
166
|
|
180
167
|
Here is an example of wrapping streaming MongoDB files:
|
181
168
|
|
@@ -185,7 +172,7 @@ require "down/chunked_io"
|
|
185
172
|
mongo = Mongo::Client.new(...)
|
186
173
|
bucket = mongo.database.fs
|
187
174
|
|
188
|
-
content_length = bucket.find(_id: id).first[
|
175
|
+
content_length = bucket.find(_id: id).first[:length]
|
189
176
|
stream = bucket.open_download_stream(id)
|
190
177
|
|
191
178
|
io = Down::ChunkedIO.new(
|
@@ -195,6 +182,42 @@ io = Down::ChunkedIO.new(
|
|
195
182
|
)
|
196
183
|
```
|
197
184
|
|
185
|
+
## open-uri + Net::HTTP
|
186
|
+
|
187
|
+
The [open-uri] + Net::HTTP backend is the default, loaded by requiring `down`
|
188
|
+
or `down/net_http`:
|
189
|
+
|
190
|
+
```rb
|
191
|
+
require "down"
|
192
|
+
# or
|
193
|
+
require "down/net_http"
|
194
|
+
```
|
195
|
+
|
196
|
+
`Down.download` is implemented as a wrapper around open-uri, and fixes some of
|
197
|
+
open-uri's undesired behaviours:
|
198
|
+
|
199
|
+
* open-uri returns `StringIO` for files smaller than 10KB, and `Tempfile`
|
200
|
+
otherwise, but `Down.download` always returns a `Tempfile`
|
201
|
+
* open-uri doesn't give any extension to the returned `Tempfile`, but
|
202
|
+
`Down.download` adds the extension from the URL
|
203
|
+
* ...
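The extension fix from the list above can be approximated with the standard library alone. This is a sketch under the assumption that the extension is taken from the URL path; `tempfile_for` is a hypothetical helper, not part of Down.

```rb
require "tempfile"
require "uri"

# Creates an empty Tempfile whose name ends with the extension found in the
# URL path. (Hypothetical helper for illustration.)
def tempfile_for(url)
  extension = File.extname(URI.parse(url).path)
  # Passing [basename, extension] makes Tempfile append the extension.
  Tempfile.new(["down", extension], binmode: true)
end

tempfile = tempfile_for("http://example.com/nature.jpg")
File.extname(tempfile.path) #=> ".jpg"
```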
|
204
|
+
|
205
|
+
Since open-uri doesn't expose support for partial downloads, `Down.open` is
|
206
|
+
implemented using `Net::HTTP` directly.
|
207
|
+
|
208
|
+
### Redirects
|
209
|
+
|
210
|
+
`Down.download` turns off open-uri's redirect following, as open-uri doesn't
|
211
|
+
have a way to limit the maximum number of hops, and implements its own. By
|
212
|
+
default a maximum of 2 redirects will be followed, but you can change it via the
|
213
|
+
`:max_redirects` option:
|
214
|
+
|
215
|
+
```rb
|
216
|
+
Down.download("http://example.com/image.jpg") # 2 redirects allowed
|
217
|
+
Down.download("http://example.com/image.jpg", max_redirects: 5) # 5 redirects allowed
|
218
|
+
Down.download("http://example.com/image.jpg", max_redirects: 0) # 0 redirects allowed
|
219
|
+
```
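A hop-limited redirect loop along these lines could look as follows. This is a sketch, not Down's code: `TooManyRedirects` and `follow_redirects` are hypothetical names, and `fetch` is assumed to be any callable returning `[status, headers, body]` for a URL.

```rb
require "uri"

# Hypothetical error class for this sketch.
class TooManyRedirects < StandardError; end

# Follows at most `max_redirects` redirects before giving up.
def follow_redirects(url, max_redirects: 2, fetch:)
  tries = max_redirects + 1 # the initial request plus the allowed redirects

  loop do
    status, headers, body = fetch.call(url)
    return body unless (300..399).cover?(status) && headers["Location"]

    tries -= 1
    raise TooManyRedirects, "too many redirects" if tries.zero?

    # Location may be relative, so resolve it against the current URL.
    url = URI.join(url, headers["Location"]).to_s
  end
end
```

With `max_redirects: 0` the first redirect response already raises, which matches the "0 redirects allowed" case above.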
|
220
|
+
|
198
221
|
### Proxy
|
199
222
|
|
200
223
|
Both `Down.download` and `Down.open` support a `:proxy` option, where you can
|
@@ -202,26 +225,89 @@ specify a URL to an HTTP proxy which should be used when downloading.
|
|
202
225
|
|
203
226
|
```rb
|
204
227
|
Down.download("http://example.com/image.jpg", proxy: "http://proxy.org")
|
205
|
-
Down.open("http://example.com/image.jpg",
|
228
|
+
Down.open("http://example.com/image.jpg", proxy: "http://user:password@proxy.org")
|
229
|
+
```
|
230
|
+
|
231
|
+
### Additional options
|
232
|
+
|
233
|
+
Any additional options passed to `Down.download` will be forwarded to
|
234
|
+
[open-uri], so you can for example add basic authentication or a timeout:
|
235
|
+
|
236
|
+
```rb
|
237
|
+
Down.download "http://example.com/image.jpg",
|
238
|
+
http_basic_authentication: ['john', 'secret'],
|
239
|
+
read_timeout: 5
|
240
|
+
```
|
241
|
+
|
242
|
+
`Down.open` accepts `:ssl_verify_mode` and `:ssl_ca_cert` options with the same
|
243
|
+
semantics as in open-uri, and any options with String keys will be interpreted
|
244
|
+
as request headers, like with open-uri.
|
245
|
+
|
246
|
+
```rb
|
247
|
+
Down.open("http://example.com/image.jpg", {"Authorization" => "..."})
|
248
|
+
```
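Treating String keys as request headers amounts to splitting the options hash by key type. A minimal sketch of that convention, where `split_headers` is a hypothetical helper rather than a Down method:

```rb
# Separates String-keyed entries (request headers) from Symbol-keyed options.
# (Hypothetical helper for illustration.)
def split_headers(options)
  headers, rest = options.partition { |key, _value| key.is_a?(String) }
  [headers.to_h, rest.to_h]
end

headers, options = split_headers("Authorization" => "...", proxy: "http://proxy.org")
headers #=> {"Authorization"=>"..."}
options #=> {:proxy=>"http://proxy.org"}
```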
|
249
|
+
|
250
|
+
## HTTP.rb
|
251
|
+
|
252
|
+
The [HTTP.rb] backend can be used by requiring `down/http`:
|
253
|
+
|
254
|
+
```rb
|
255
|
+
gem "http", "~> 2.1"
|
256
|
+
gem "down"
|
257
|
+
```
|
258
|
+
```rb
|
259
|
+
require "down/http"
|
206
260
|
```
|
207
261
|
|
208
|
-
|
262
|
+
Some features that give the HTTP.rb backend an advantage over open-uri +
|
263
|
+
Net::HTTP include:
|
264
|
+
|
265
|
+
* Correct URI parsing with [Addressable::URI]
|
266
|
+
* Proper support for streaming downloads (`#download` can now reuse `#open`)
|
267
|
+
* Proper support for SSL
|
268
|
+
* Chainable HTTP client builder API for setting default options
|
269
|
+
* Persistent connections
|
270
|
+
* Auto-inflating compressed response bodies
|
271
|
+
* ...
|
209
272
|
|
210
|
-
|
211
|
-
|
212
|
-
|
273
|
+
### Default client
|
274
|
+
|
275
|
+
You can change the default `HTTP::Client` to be used in all download requests
|
276
|
+
via `Down::Http.client`:
|
213
277
|
|
214
278
|
```rb
|
215
|
-
|
216
|
-
|
217
|
-
|
279
|
+
# reuse Down's default client
|
280
|
+
Down::Http.client = Down::Http.client.timeout(read: 3).feature(:auto_inflate)
|
281
|
+
Down::Http.client.default_options.merge!(ssl_context: ctx)
|
282
|
+
|
283
|
+
# or set a new client
|
284
|
+
Down::Http.client = HTTP.via("proxy-hostname.local", 8080)
|
218
285
|
```
|
219
286
|
|
287
|
+
### Additional options
|
288
|
+
|
289
|
+
All additional options passed to `Down.download` and `Down.open` will be
|
290
|
+
forwarded to `HTTP::Client#request`:
|
291
|
+
|
292
|
+
```rb
|
293
|
+
Down.download("http://example.org/image.jpg", headers: {"Accept-Encoding" => "gzip"})
|
294
|
+
```
|
295
|
+
|
296
|
+
If you prefer to add options using the chainable API, you can pass a block:
|
297
|
+
|
298
|
+
```rb
|
299
|
+
Down.open("http://example.org/image.jpg") do |client|
|
300
|
+
client.timeout(read: 3)
|
301
|
+
end
|
302
|
+
```
|
303
|
+
|
304
|
+
### Thread safety
|
305
|
+
|
306
|
+
`Down::Http.client` is stored in a thread-local variable, so using the HTTP.rb
|
307
|
+
backend is thread safe.
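The thread-local pattern mentioned above can be illustrated with `Thread.current`. The `ClientStore` module below is a hypothetical stand-in (it stores plain strings instead of an `HTTP::Client`), so treat it as a sketch of the idea rather than Down's actual code.

```rb
# Hypothetical module showing a per-thread default with an overridable value.
module ClientStore
  def self.client
    # Each thread lazily gets its own default.
    Thread.current[:sketch_client] ||= "default client"
  end

  def self.client=(value)
    Thread.current[:sketch_client] = value
  end
end

ClientStore.client = "customized client"
ClientStore.client                        #=> "customized client"
Thread.new { ClientStore.client }.value   #=> "default client"
```

A change made in one thread is therefore invisible to the others, so concurrent downloads do not overwrite each other's client configuration.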
|
308
|
+
|
220
309
|
## Supported Ruby versions
|
221
310
|
|
222
|
-
* MRI 1.9.3
|
223
|
-
* MRI 2.0
|
224
|
-
* MRI 2.1
|
225
311
|
* MRI 2.2
|
226
312
|
* MRI 2.3
|
227
313
|
* MRI 2.4
|
@@ -229,14 +315,17 @@ tempfile.path #=> "/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/down20151116
|
|
229
315
|
|
230
316
|
## Development
|
231
317
|
|
318
|
+
The test suite runs the http://httpbin.org/ server locally, and uses it to test
|
319
|
+
downloads. Httpbin is a Python package which is run with Gunicorn:
|
320
|
+
|
232
321
|
```
|
233
|
-
$
|
322
|
+
$ pip install gunicorn httpbin
|
234
323
|
```
|
235
324
|
|
236
|
-
|
325
|
+
Afterwards you can run tests with:
|
237
326
|
|
238
327
|
```
|
239
|
-
$
|
328
|
+
$ rake test
|
240
329
|
```
|
241
330
|
|
242
331
|
## License
|
@@ -244,3 +333,5 @@ $ bin/test-versions
|
|
244
333
|
[MIT](LICENSE.txt)
|
245
334
|
|
246
335
|
[open-uri]: http://ruby-doc.org/stdlib-2.3.0/libdoc/open-uri/rdoc/OpenURI.html
|
336
|
+
[HTTP.rb]: https://github.com/httprb/http
|
337
|
+
[Addressable::URI]: https://github.com/sporkmonger/addressable
|
data/down.gemspec
CHANGED
@@ -1,21 +1,21 @@
|
|
1
1
|
require File.expand_path("../lib/down/version", __FILE__)
|
2
2
|
|
3
3
|
Gem::Specification.new do |spec|
|
4
|
-
spec.name
|
5
|
-
spec.version
|
6
|
-
spec.authors = ["Janko Marohnić"]
|
7
|
-
spec.email = ["janko.marohnic@gmail.com"]
|
4
|
+
spec.name = "down"
|
5
|
+
spec.version = Down::VERSION
|
8
6
|
|
9
|
-
spec.
|
10
|
-
spec.homepage = "https://github.com/janko-m/down"
|
11
|
-
spec.license = "MIT"
|
7
|
+
spec.required_ruby_version = ">= 2.1"
|
12
8
|
|
13
|
-
spec.
|
14
|
-
spec.
|
9
|
+
spec.summary = "Robust streaming downloads using net/http."
|
10
|
+
spec.homepage = "https://github.com/janko-m/down"
|
11
|
+
spec.authors = ["Janko Marohnić"]
|
12
|
+
spec.email = ["janko.marohnic@gmail.com"]
|
13
|
+
spec.license = "MIT"
|
14
|
+
|
15
|
+
spec.files = Dir["README.md", "LICENSE.txt", "*.gemspec", "lib/**/*.rb"]
|
16
|
+
spec.require_path = "lib"
|
15
17
|
|
16
|
-
spec.add_development_dependency "rake"
|
17
18
|
spec.add_development_dependency "minitest", "~> 5.8"
|
18
|
-
spec.add_development_dependency "webmock", "~> 2.3"
|
19
|
-
spec.add_development_dependency "addressable", "< 2.5"
|
20
19
|
spec.add_development_dependency "mocha"
|
20
|
+
spec.add_development_dependency "http", "~> 2.1"
|
21
21
|
end
|