down 2.5.1 → 3.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 0b0fd2b712dcfee3b7e3da1189103e57b2f67621
4
- data.tar.gz: 9d812256dc29a14cd18d17517f8aef6d360cf37b
3
+ metadata.gz: c16a220821aeb2a11910334c3598ccb5b823473d
4
+ data.tar.gz: 21b1426c6169e82627fb445cbe0526456e9c9f19
5
5
  SHA512:
6
- metadata.gz: 1174fbcef905d92928c24470f2a109fb6b96cdbc02819bfdae719e79959f3da091a76baf599f5ed2821eb42d254a56df43313fc4bcfa36d604a3553155c3e7df
7
- data.tar.gz: c88b14d47f42bc78601e4cb97e86ff5b559dd25edd1c833e1638699eb04810e09cde0a77e700458b8be582f3a06a48439bc72b63b79801945523d1d4481a9f49
6
+ metadata.gz: 97447da54fba5009a0dca7b3013906dc5c0f707620b2ab98bd031f85e0b02180829188bb9df31daeacdcc49138aba024f630a2d9a3c5170d6f4119add90ab2c8
7
+ data.tar.gz: '09498dc3f7ad7256700f42fb536b76293d6f3825e2b686eb091772b22e366116a346130832ea46d28b739fe76e99f477575e3d003efa852696c8fb492128243e'
data/README.md CHANGED
@@ -1,15 +1,19 @@
1
1
  # Down
2
2
 
3
- Down is a wrapper around [open-uri] standard library for safe downloading of
4
- remote files.
3
+ Down is a utility tool for streaming, flexible and safe downloading of remote
4
+ files. It can use [open-uri] + `Net::HTTP` or [HTTP.rb] as the backend HTTP
5
+ library.
5
6
 
6
7
  ## Installation
7
8
 
8
9
  ```rb
9
- gem 'down'
10
+ gem "down"
10
11
  ```
11
12
 
12
- ## Usage
13
+ ## Downloading
14
+
15
+ The primary method is `Down.download`, which downloads the remote file into a
16
+ Tempfile:
13
17
 
14
18
  ```rb
15
19
  require "down"
@@ -17,37 +21,13 @@ tempfile = Down.download("http://example.com/nature.jpg")
17
21
  tempfile #=> #<Tempfile:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/20150925-55456-z7vxqz.jpg>
18
22
  ```
19
23
 
20
- ## Downloading
21
-
22
- If you're downloading files from URLs that come from you, then it's probably
23
- enough to just use `open-uri`. However, if you're accepting URLs from your
24
- users (e.g. through `remote_<avatar>_url` in CarrierWave), then downloading is
25
- suddenly not as simple as it appears to be.
26
-
27
- ### StringIO
28
-
29
- Firstly, you may think that `open-uri` always downloads a file to disk, but
30
- that's not true. If the downloaded file has 10 KB or less, `open-uri` actually
31
- returns a `StringIO`. In my application I needed that the file is always
32
- downloaded to disk. This was obviously a wrong design decision from the MRI
33
- team, so Down patches this behaviour and always returns a `Tempfile`.
34
-
35
- ### File extension
36
-
37
- When using `open-uri` directly, the extension of the remote file is not
38
- preserved. Down patches that behaviour and preserves the file extension.
39
-
40
24
  ### Metadata
41
25
 
42
- `open-uri` adds some metadata to the returned file, like `#content_type`. Down
43
- adds `#original_filename` as well, which is extracted from the URL.
26
+ The returned Tempfile has `#content_type` and `#original_filename` attributes
27
+ determined from the response headers:
44
28
 
45
29
  ```rb
46
- require "down"
47
- tempfile = Down.download("http://example.com/nature.jpg")
48
-
49
- tempfile #=> #<Tempfile:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/20150925-55456-z7vxqz.jpg>
50
- tempfile.content_type #=> "image/jpeg"
30
+ tempfile.content_type #=> "image/jpeg"
51
31
  tempfile.original_filename #=> "nature.jpg"
52
32
  ```
53
33
 
@@ -65,60 +45,47 @@ Down.download("http://example.com/image.jpg", max_size: 5 * 1024 * 1024) # 5 MB
65
45
  What is the advantage over simply checking size after downloading? Well, Down
66
46
  terminates the download very early, as soon as it gets the `Content-Length`
67
47
  header. And if the `Content-Length` header is missing, Down will terminate the
68
- download as soon as it receives a chunk which surpasses the maximum size.
48
+ download as soon as the downloaded content surpasses the maximum size.
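The early-termination behaviour described above could look roughly like the following. This is a minimal sketch, not Down's actual code: the `TooLarge` class and the `read_with_limit` helper are hypothetical names, and the chunks are plain strings standing in for a response body.

```rb
# Hypothetical error class and helper, for illustration only.
class TooLarge < StandardError; end

# content_length: Integer taken from the Content-Length header, or nil
# chunks:         any Enumerable yielding String chunks of the body
def read_with_limit(content_length, chunks, max_size)
  # Fail before reading any of the body if the header already exceeds the limit.
  if content_length && content_length > max_size
    raise TooLarge, "file is too large"
  end

  body = String.new
  chunks.each do |chunk|
    body << chunk
    # Without a usable header, stop as soon as the limit is crossed.
    raise TooLarge, "file is too large" if body.bytesize > max_size
  end
  body
end

read_with_limit(3, ["ab", "c"], 5) #=> "abc"
```

The second check matters because the `Content-Length` header may be missing, so the size of the body is only known as it arrives.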
69
49
 
70
- ### Redirects
50
+ ### Basic authentication
71
51
 
72
- By default open-uri's redirects are turned off, since open-uri doesn't have a
73
- way to limit maximum number of redirects. Instead Down itself implements
74
- following redirects, by default allowing maximum of 2 redirects.
52
+ `Down.download` and `Down.open` will automatically detect and apply HTTP basic
53
+ authentication from the URL:
75
54
 
76
55
  ```rb
77
- Down.download("http://example.com/image.jpg") # 2 redirects allowed
78
- Down.download("http://example.com/image.jpg", max_redirects: 5) # 5 redirects allowed
79
- Down.download("http://example.com/image.jpg", max_redirects: 0) # 0 redirects allowed
56
+ Down.download("http://user:password@example.org")
57
+ Down.open("http://user:password@example.org")
80
58
  ```
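How credentials might be read out of such a URL can be sketched with Ruby's standard `URI` library. This only illustrates the idea and is not Down's implementation; the `basic_auth_header` helper is a hypothetical name.

```rb
require "uri"

# Hypothetical helper, not part of Down: returns an Authorization header
# value built from the userinfo part of the URL, or nil if there is none.
def basic_auth_header(url)
  uri = URI.parse(url)
  return nil unless uri.user

  credentials = "#{uri.user}:#{uri.password}"
  "Basic " + [credentials].pack("m0") # "m0" is Base64 without line breaks
end

basic_auth_header("http://user:password@example.org") # "Basic " + encoded credentials
basic_auth_header("http://example.org")               #=> nil
```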
81
59
 
82
60
  ### Download errors
83
61
 
84
62
  There are a lot of ways in which a download can fail:
85
63
 
86
- * URL is really invalid (`URI::InvalidURIError`)
87
- * URL is a little bit invalid, e.g. "http:/example.com" (`Errno::ECONNREFUSED`)
88
- * Domain was not found (`SocketError`)
89
- * Domain was found, but status is 4xx or 5xx (`OpenURI::HTTPError`)
90
- * Request timeout (`Timeout::Error`)
64
+ * Response status was 4xx or 5xx
65
+ * Domain was not found
66
+ * Timeout occurred
67
+ * URL is invalid
68
+ * ...
91
69
 
92
- Down unifies all of these errors into one `Down::NotFound` error (because this
93
- is what actually happened from the outside perspective). If you want to get the
94
- actual error raised by open-uri, in Ruby 2.1 or later you can use
70
+ Down attempts to unify all of these exceptions into one `Down::NotFound` error
71
+ (because this is what actually happened from the outside perspective). If you
72
+ want to retrieve the original error raised, in Ruby 2.1+ you can use
95
73
  `Exception#cause`:
96
74
 
97
75
  ```rb
98
76
  begin
99
77
  Down.download("http://example.com")
100
- rescue Down::Error => error
101
- error.cause #=> #<RuntimeError: HTTP redirection loop: http://example.com>
78
+ rescue Down::Error => exception
79
+ exception.cause #=> #<Timeout::Error>
102
80
  end
103
81
  ```
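For readers unfamiliar with `Exception#cause`, here is a small standalone sketch of how Ruby records it. The `DownloadError` class and the `fetch` method are hypothetical stand-ins and do not come from Down.

```rb
require "timeout"

# Hypothetical stand-in for a library's error class.
class DownloadError < StandardError; end

def fetch
  raise Timeout::Error, "execution expired"
rescue Timeout::Error
  # Raising inside a rescue clause records the rescued exception as #cause.
  raise DownloadError, "file not found"
end

begin
  fetch
rescue DownloadError => exception
  exception.message     #=> "file not found"
  exception.cause.class #=> Timeout::Error
end
```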
104
82
 
105
- ### Additional options
106
-
107
- Any additional options will be forwarded to [open-uri], so you can for example
108
- add basic authentication or a timeout:
109
-
110
- ```rb
111
- Down.download "http://example.com/image.jpg",
112
- http_basic_authentication: ['john', 'secret'],
113
- read_timeout: 5
114
- ```
115
-
116
83
  ## Streaming
117
84
 
118
- Down has the ability to access content of the remote file *as it is being
119
- downloaded*. The `Down.open` method returns an IO object which represents the
120
- remote file on the given URL. When you read from it, Down internally downloads
121
- chunks of the remote file, but only how much is needed.
85
+ Down has the ability to retrieve content of the remote file *as it is being
86
+ downloaded*. The `Down.open` method returns a `Down::ChunkedIO` object which
87
+ represents the remote file on the given URL. When you read from it, Down
88
+ internally downloads chunks of the remote file, but only as much as is needed.
122
89
 
123
90
  ```rb
124
91
  remote_file = Down.open("http://example.com/image.jpg")
@@ -126,44 +93,61 @@ remote_file.size # read from the "Content-Length" header
126
93
 
127
94
  remote_file.read(1024) # downloads and returns first 1 KB
128
95
  remote_file.read(1024) # downloads and returns next 1 KB
129
- remote_file.read # downloads and returns the rest of the file
130
96
 
131
- remote_file.eof? #=> true
132
- remote_file.rewind
133
97
  remote_file.eof? #=> false
98
+ remote_file.read # downloads and returns the rest of the file content
99
+ remote_file.eof? #=> true
134
100
 
135
101
  remote_file.close # closes the HTTP connection and deletes the internal Tempfile
136
102
  ```
137
103
 
138
- You can also yield chunks directly as they're downloaded:
104
+ ### Caching
105
+
106
+ By default the downloaded content is internally cached into a `Tempfile`, so
107
+ that when you rewind the `Down::ChunkedIO`, it continues reading the cached
108
+ content that it had already retrieved.
139
109
 
140
110
  ```rb
141
111
  remote_file = Down.open("http://example.com/image.jpg")
142
- remote_file.each_chunk do |chunk|
143
- # ...
144
- end
145
- remote_file.close
112
+ remote_file.read(1*1024*1024) # downloads, caches, and returns first 1MB
113
+ remote_file.rewind
114
+ remote_file.read(1*1024*1024) # reads the cached content
115
+ remote_file.read(1*1024*1024) # downloads the next 1MB
146
116
  ```
147
117
 
148
- You can access the response status and headers of the HTTP request that was made:
118
+ If you want to save on IO calls and on disk usage, and don't need to be able to
119
+ rewind the `Down::ChunkedIO`, you can disable caching downloaded content:
120
+
121
+ ```rb
122
+ Down.open("http://example.com/image.jpg", rewindable: false)
123
+ ```
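The caching idea can be sketched as a small reader that writes every retrieved chunk into a `Tempfile` and serves reads from it. This is a simplified illustration under stated assumptions, not `Down::ChunkedIO` itself; the `CachingReader` class is hypothetical.

```rb
require "tempfile"

# Hypothetical, simplified reader: not Down::ChunkedIO itself.
class CachingReader
  def initialize(chunks)
    @chunks = chunks # Enumerator yielding String chunks
    @cache  = Tempfile.new("cache")
    @cache.binmode
  end

  # Returns up to `length` bytes, serving cached bytes first and pulling
  # new chunks from the enumerator only when the cache runs out.
  def read(length)
    position = @cache.pos
    while @cache.size - position < length
      chunk = next_chunk
      break if chunk.nil?
      @cache.seek(0, IO::SEEK_END)
      @cache.write(chunk)
    end
    @cache.seek(position)
    @cache.read(length).to_s
  end

  def rewind
    @cache.rewind
  end

  def close
    @cache.close! # closes and deletes the cache file
  end

  private

  def next_chunk
    @chunks.next
  rescue StopIteration
    nil
  end
end

reader = CachingReader.new(["abc", "def"].each)
reader.read(2) #=> "ab" (retrieves and caches the first chunk)
reader.rewind
reader.read(2) #=> "ab" (served from the cache)
reader.read(2) #=> "cd" (retrieves the second chunk)
reader.close
```

A non-rewindable variant would skip the `Tempfile` and hand out chunk data directly, which is where the savings in IO calls and disk usage come from.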
124
+
125
+ ### Yielding chunks
126
+
127
+ You can also yield chunks directly as they're downloaded via `#each_chunk`, in
128
+ which case the downloaded content is not cached into a file regardless of the
129
+ `:rewindable` option.
149
130
 
150
131
  ```rb
151
132
  remote_file = Down.open("http://example.com/image.jpg")
152
- remote_file.data[:status] #=> 200
153
- remote_file.data[:headers] #=> { ... }
133
+ remote_file.each_chunk { |chunk| ... }
134
+ remote_file.close
154
135
  ```
155
136
 
156
- Note that `Down::NotFound` error will automatically be raised if response
157
- status was 4xx or 5xx.
137
+ ### Data
158
138
 
159
- `Down.open` accepts `:ssl_verify_mode` and `:ssl_ca_cert` options with the same
160
- semantics as in `open-uri`, and any options with String keys will be
161
- interpreted as request headers.
139
+ You can access the response status and headers of the HTTP request that was made:
162
140
 
163
141
  ```rb
164
- Down.open("http://example.com/image.jpg", {"Authorization" => "..."})
142
+ remote_file = Down.open("http://example.com/image.jpg")
143
+ remote_file.data[:status] #=> 200
144
+ remote_file.data[:headers] #=> { ... }
145
+ remote_file.data[:response] # returns the response object
165
146
  ```
166
147
 
148
+ Note that `Down::NotFound` error will automatically be raised if response
149
+ status was 4xx or 5xx.
150
+
167
151
  ### `Down::ChunkedIO`
168
152
 
169
153
  The `Down.open` method uses `Down::ChunkedIO` internally. However,
@@ -173,9 +157,12 @@ The `Down.open` method uses `Down::ChunkedIO` internally. However,
173
157
  Down::ChunkedIO.new(...)
174
158
  ```
175
159
 
176
- * `:size` – size of the file, if it's known
177
- * `:chunks` – an `Enumerator` which returns chunks
178
- * `:on_close` – called when streaming finishes
160
+ * `:chunks` – an `Enumerator` which retrieves chunks
161
+ * `:size` – size of the file if it's known (returned by `#size`)
162
+ * `:on_close` – called when streaming finishes or IO is closed
163
+ * `:data` – custom data that you want to store (returned by `#data`)
164
+ * `:rewindable` – whether to cache retrieved data into a file (defaults to `true`)
165
+ * `:encoding` – force content to be returned in specified encoding (defaults to ASCII-8BIT)
179
166
 
180
167
  Here is an example of wrapping streaming MongoDB files:
181
168
 
@@ -185,7 +172,7 @@ require "down/chunked_io"
185
172
  mongo = Mongo::Client.new(...)
186
173
  bucket = mongo.database.fs
187
174
 
188
- content_length = bucket.find(_id: id).first["length"]
175
+ content_length = bucket.find(_id: id).first[:length]
189
176
  stream = bucket.open_download_stream(id)
190
177
 
191
178
  io = Down::ChunkedIO.new(
@@ -195,6 +182,42 @@ io = Down::ChunkedIO.new(
195
182
  )
196
183
  ```
197
184
 
185
+ ## open-uri + Net::HTTP
186
+
187
+ The [open-uri] + Net::HTTP backend is the default, loaded by requiring `down`
188
+ or `down/net_http`:
189
+
190
+ ```rb
191
+ require "down"
192
+ # or
193
+ require "down/net_http"
194
+ ```
195
+
196
+ `Down.download` is implemented as a wrapper around open-uri, and fixes some of
197
+ open-uri's undesired behaviours:
198
+
199
+ * open-uri returns `StringIO` for files smaller than 10KB, and `Tempfile`
200
+ otherwise, but `Down.download` always returns a `Tempfile`
201
+ * open-uri doesn't give any extension to the returned `Tempfile`, but
202
+ `Down.download` adds the extension from the URL
203
+ * ...
204
+
205
+ Since open-uri doesn't expose support for partial downloads, `Down.open` is
206
+ implemented using `Net::HTTP` directly.
207
+
208
+ ### Redirects
209
+
210
+ `Down.download` turns off open-uri's following redirects, as open-uri doesn't
211
+ have a way to limit the maximum number of hops, and implements its own. By
212
+ default a maximum of 2 redirects will be followed, but you can change it via the
213
+ `:max_redirects` option:
214
+
215
+ ```rb
216
+ Down.download("http://example.com/image.jpg") # 2 redirects allowed
217
+ Down.download("http://example.com/image.jpg", max_redirects: 5) # 5 redirects allowed
218
+ Down.download("http://example.com/image.jpg", max_redirects: 0) # 0 redirects allowed
219
+ ```
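A hop limit of this kind can be sketched without any networking. In the example below the `TooManyRedirects` class, the `follow_redirects` method and the `responses` table are all hypothetical; the lambda stands in for a real HTTP request.

```rb
# Hypothetical error class, for illustration only.
class TooManyRedirects < StandardError; end

# `fetch` is any callable that returns either [:redirect, new_url] or
# [:ok, body] for a given URL.
def follow_redirects(url, fetch, max_redirects: 2)
  remaining = max_redirects
  loop do
    status, value = fetch.call(url)
    return value if status == :ok

    raise TooManyRedirects, "too many redirects" if remaining.zero?
    remaining -= 1
    url = value
  end
end

# Lookup table standing in for real HTTP responses.
responses = {
  "http://a.test" => [:redirect, "http://b.test"],
  "http://b.test" => [:ok, "content"],
}
fetch = ->(url) { responses.fetch(url) }

follow_redirects("http://a.test", fetch) #=> "content"

begin
  follow_redirects("http://a.test", fetch, max_redirects: 0)
rescue TooManyRedirects => error
  error.message #=> "too many redirects"
end
```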
220
+
198
221
  ### Proxy
199
222
 
200
223
  Both `Down.download` and `Down.open` support a `:proxy` option, where you can
@@ -202,26 +225,89 @@ specify a URL to an HTTP proxy which should be used when downloading.
202
225
 
203
226
  ```rb
204
227
  Down.download("http://example.com/image.jpg", proxy: "http://proxy.org")
205
- Down.open("http://example.com/image.jpg", proxy: "http://user:password@proxy.org")
228
+ Down.open("http://example.com/image.jpg", proxy: "http://user:password@proxy.org")
229
+ ```
230
+
231
+ ### Additional options
232
+
233
+ Any additional options passed to `Down.download` will be forwarded to
234
+ [open-uri], so you can for example add basic authentication or a timeout:
235
+
236
+ ```rb
237
+ Down.download "http://example.com/image.jpg",
238
+ http_basic_authentication: ['john', 'secret'],
239
+ read_timeout: 5
240
+ ```
241
+
242
+ `Down.open` accepts `:ssl_verify_mode` and `:ssl_ca_cert` options with the same
243
+ semantics as in open-uri, and any options with String keys will be interpreted
244
+ as request headers, like with open-uri.
245
+
246
+ ```rb
247
+ Down.open("http://example.com/image.jpg", {"Authorization" => "..."})
248
+ ```
249
+
250
+ ## HTTP.rb
251
+
252
+ The [HTTP.rb] backend can be used by requiring `down/http`:
253
+
254
+ ```rb
255
+ gem "http", "~> 2.1"
256
+ gem "down"
257
+ ```
258
+ ```rb
259
+ require "down/http"
206
260
  ```
207
261
 
208
- ### Copying to tempfile
262
+ Some features that give the HTTP.rb backend an advantage over open-uri +
263
+ Net::HTTP include:
264
+
265
+ * Correct URI parsing with [Addressable::URI]
266
+ * Proper support for streaming downloads (`#download` can now reuse `#open`)
267
+ * Proper support for SSL
268
+ * Chainable HTTP client builder API for setting default options
269
+ * Persistent connections
270
+ * Auto-inflating compressed response bodies
271
+ * ...
209
272
 
210
- Down has another "hidden" utility method, `#copy_to_tempfile`, which creates
211
- a Tempfile out of the given file. The `#download` method uses it internally,
212
- but it's also publicly available for direct use:
273
+ ### Default client
274
+
275
+ You can change the default `HTTP::Client` to be used in all download requests
276
+ via `Down::Http.client`:
213
277
 
214
278
  ```rb
215
- io # IO object that you want to copy to tempfile
216
- tempfile = Down.copy_to_tempfile "basename.jpg", io
217
- tempfile.path #=> "/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/down20151116-77262-jgcx65.jpg"
279
+ # reuse Down's default client
280
+ Down::Http.client = Down::Http.client.timeout(read: 3).feature(:auto_inflate)
281
+ Down::Http.client.default_options.merge!(ssl_context: ctx)
282
+
283
+ # or set a new client
284
+ Down::Http.client = HTTP.via("proxy-hostname.local", 8080)
218
285
  ```
219
286
 
287
+ ### Additional options
288
+
289
+ All additional options passed to `Down.download` and `Down.open` will be
290
+ forwarded to `HTTP::Client#request`:
291
+
292
+ ```rb
293
+ Down.download("http://example.org/image.jpg", headers: {"Accept-Encoding" => "gzip"})
294
+ ```
295
+
296
+ If you prefer to add options using the chainable API, you can pass a block:
297
+
298
+ ```rb
299
+ Down.open("http://example.org/image.jpg") do |client|
300
+ client.timeout(read: 3)
301
+ end
302
+ ```
303
+
304
+ ### Thread safety
305
+
306
+ `Down::Http.client` is stored in a thread-local variable, so using the HTTP.rb
307
+ backend is thread safe.
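As a rough illustration of what thread-local storage means here, the following sketch keeps one value per thread using `Thread.current`. The `ClientStore` module is hypothetical and is not Down's code; plain strings stand in for client objects.

```rb
# Hypothetical module, not Down's code: keeps one client value per thread.
module ClientStore
  def self.client
    # Thread.current[] storage is local to the current thread
    # (strictly speaking, to the current fiber).
    Thread.current[:client_store_client]
  end

  def self.client=(value)
    Thread.current[:client_store_client] = value
  end
end

ClientStore.client = "main client"

worker_value = Thread.new do
  ClientStore.client = "worker client" # does not affect the main thread
  ClientStore.client
end.value

worker_value       #=> "worker client"
ClientStore.client #=> "main client"
```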
308
+
220
309
  ## Supported Ruby versions
221
310
 
222
- * MRI 1.9.3
223
- * MRI 2.0
224
- * MRI 2.1
225
311
  * MRI 2.2
226
312
  * MRI 2.3
227
313
  * MRI 2.4
@@ -229,14 +315,17 @@ tempfile.path #=> "/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/down20151116
229
315
 
230
316
  ## Development
231
317
 
318
+ The test suite runs the http://httpbin.org/ server locally, and uses it to test
319
+ downloads. Httpbin is a Python package which is run with Gunicorn:
320
+
232
321
  ```
233
- $ rake test
322
+ $ pip install gunicorn httpbin
234
323
  ```
235
324
 
236
- If you want to test across Ruby versions and you're using rbenv, run
325
+ Afterwards you can run tests with
237
326
 
238
327
  ```
239
- $ bin/test-versions
328
+ $ rake test
240
329
  ```
241
330
 
242
331
  ## License
@@ -244,3 +333,5 @@ $ bin/test-versions
244
333
  [MIT](LICENSE.txt)
245
334
 
246
335
  [open-uri]: http://ruby-doc.org/stdlib-2.3.0/libdoc/open-uri/rdoc/OpenURI.html
336
+ [HTTP.rb]: https://github.com/httprb/http
337
+ [Addressable::URI]: https://github.com/sporkmonger/addressable
@@ -1,21 +1,21 @@
1
1
  require File.expand_path("../lib/down/version", __FILE__)
2
2
 
3
3
  Gem::Specification.new do |spec|
4
- spec.name = "down"
5
- spec.version = Down::VERSION
6
- spec.authors = ["Janko Marohnić"]
7
- spec.email = ["janko.marohnic@gmail.com"]
4
+ spec.name = "down"
5
+ spec.version = Down::VERSION
8
6
 
9
- spec.summary = "Robust streaming downloads using net/http."
10
- spec.homepage = "https://github.com/janko-m/down"
11
- spec.license = "MIT"
7
+ spec.required_ruby_version = ">= 2.1"
12
8
 
13
- spec.files = Dir["README.md", "LICENSE.txt", "*.gemspec", "lib/**/*.rb"]
14
- spec.require_paths = ["lib"]
9
+ spec.summary = "Robust streaming downloads using net/http."
10
+ spec.homepage = "https://github.com/janko-m/down"
11
+ spec.authors = ["Janko Marohnić"]
12
+ spec.email = ["janko.marohnic@gmail.com"]
13
+ spec.license = "MIT"
14
+
15
+ spec.files = Dir["README.md", "LICENSE.txt", "*.gemspec", "lib/**/*.rb"]
16
+ spec.require_path = "lib"
15
17
 
16
- spec.add_development_dependency "rake"
17
18
  spec.add_development_dependency "minitest", "~> 5.8"
18
- spec.add_development_dependency "webmock", "~> 2.3"
19
- spec.add_development_dependency "addressable", "< 2.5"
20
19
  spec.add_development_dependency "mocha"
20
+ spec.add_development_dependency "http", "~> 2.1"
21
21
  end