down 4.8.1 → 5.0.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +18 -0
- data/README.md +49 -24
- data/lib/down/chunked_io.rb +68 -41
- data/lib/down/errors.rb +8 -8
- data/lib/down/http.rb +1 -5
- data/lib/down/net_http.rb +1 -0
- data/lib/down/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a1f7a1532b638ed92acdb3bdf74211dbd022e7a0de37d1fdbc25f665337d4bf1
+  data.tar.gz: 7bbc2684d53e278376981b4dd4741c49bc0d8da0990ece28fb03776f39aea6fe
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6a3e62293c5fa1e5b43a5c835af3804ab8307babfb573179057ec279239b72a16ff74b2b2e96df9da033a3e4e13be4af6adeb8ec00dd73a27f75942b646419de
+  data.tar.gz: d267672bb2468c8998e271ff3849908c7beead00d8b9777799e6bc0abd1ab51ccf14d5c567c16f3bb30a41eb3c57b66ffbc2755d333055fba972d327a865acb4
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,21 @@
+## 5.0.0 (2019-09-26)
+
+* Change `ChunkedIO#each_chunk` to return chunks in original encoding (@janko)
+
+* Always return binary strings in `ChunkedIO#readpartial` (@janko)
+
+* Handle frozen chunks in `Down::ChunkedIO` (@janko)
+
+* Change `ChunkedIO#gets` to return lines in specified encoding (@janko)
+
+* Halve memory allocation for `ChunkedIO#gets` (@janko)
+
+* Halve memory allocation for `ChunkedIO#read` without arguments (@janko)
+
+* Drop support for `HTTP::Client` argument in `Down::HTTP.new` (@janko)
+
+* Repurpose `Down::NotFound` to be raised on `404 Not Found` response (@janko)
+
 ## 4.8.1 (2019-05-01)
 
 * Make `ChunkedIO#read`/`#readpartial` with length always return strings in binary encoding (@janko)
data/README.md
CHANGED
@@ -1,7 +1,7 @@
 # Down
 
 Down is a utility tool for streaming, flexible and safe downloading of remote
-files. It can use [open-uri] + `Net::HTTP`, [
+files. It can use [open-uri] + `Net::HTTP`, [http.rb] or `wget` as the backend
 HTTP library.
 
 ## Installation
@@ -57,8 +57,12 @@ specific location on disk, you can specify the `:destination` option:
 
 ```rb
 Down.download("http://example.com/image.jpg", destination: "/path/to/destination")
+#=> nil
 ```
 
+In this case `Down.download` won't have any return value, so if you need a File
+object you'll have to create it manually.
+
 ### Basic authentication
 
 `Down.download` and `Down.open` will automatically detect and apply HTTP basic
@@ -103,6 +107,16 @@ remote_file.eof? #=> true
 remote_file.close # closes the HTTP connection and deletes the internal Tempfile
 ```
 
+The following IO methods are implemented:
+
+* `#read` & `#readpartial`
+* `#gets`
+* `#seek`
+* `#pos` & `#tell`
+* `#eof?`
+* `#rewind`
+* `#close`
+
 ### Caching
 
 By default the downloaded content is internally cached into a `Tempfile`, so
@@ -147,10 +161,10 @@ remote_file.data[:headers] #=> { ... }
 remote_file.data[:response] # returns the response object
 ```
 
-Note that `Down::
-status was 4xx or 5xx.
+Note that a `Down::ResponseError` exception will automatically be raised if
+response status was 4xx or 5xx.
 
-###
+### Down::ChunkedIO
 
 The `Down.open` performs HTTP logic and returns an instance of
 `Down::ChunkedIO`. However, `Down::ChunkedIO` is a generic class that can wrap
@@ -196,21 +210,23 @@ the `Down::Error` subclasses. This is Down's exception hierarchy:
 
 * `Down::Error`
 * `Down::TooLarge`
-* `Down::
-
-
-* `Down::
-* `Down::
-
-
-
-
+* `Down::InvalidUrl`
+* `Down::TooManyRedirects`
+* `Down::ResponseError`
+* `Down::ClientError`
+* `Down::NotFound`
+* `Down::ServerError`
+* `Down::ConnectionError`
+* `Down::TimeoutError`
+* `Down::SSLError`
 
 ## Backends
 
-
-
-
+The following backends are available:
+
+* [Down::NetHttp](#downnethttp) (default)
+* [Down::Http](#downhttp)
+* [Down::Wget](#downwget)
 
 You can use the backend directly:
 
@@ -232,7 +248,10 @@ Down.download("...")
 Down.open("...")
 ```
 
-###
+### Down::NetHttp
+
+The `Down::NetHttp` backend implements downloads using [open-uri] and
+[Net::HTTP].
 
 ```rb
 gem "down", "~> 4.4"
@@ -334,7 +353,9 @@ net_http.download("http://example.com/image.jpg")
 net_http.open("http://example.com/image.jpg")
 ```
 
-###
+### Down::Http
+
+The `Down::Http` backend implements downloads using the [http.rb] gem.
 
 ```rb
 gem "down", "~> 4.4"
@@ -350,7 +371,7 @@ io = Down::Http.open("http://nature.com/forest.jpg")
 io #=> #<Down::ChunkedIO ...>
 ```
 
-Some features that give the
+Some features that give the http.rb backend an advantage over `open-uri` and
 `Net::HTTP` include:
 
 * Low memory usage (**10x less** than `open-uri`/`Net::HTTP`)
@@ -401,7 +422,10 @@ down = Down::Http.new(method: :post)
 down.download("http://example.org/image.jpg")
 ```
 
-### Wget (experimental)
+### Down::Wget (experimental)
+
+The `Down::Wget` backend implements downloads using the `wget` command line
+utility.
 
 ```rb
 gem "down", "~> 4.4"
@@ -418,9 +442,8 @@ io = Down::Wget.open("http://nature.com/forest.jpg")
 io #=> #<Down::ChunkedIO ...>
 ```
 
-
-
-interrupted due to network failures, which is very useful when you're
+One major advantage of `wget` is that it automatically resumes downloads that
+were interrupted due to network failures, which is very useful when you're
 downloading large files.
 
 However, the Wget backend should still be considered experimental, as it wasn't
@@ -450,6 +473,8 @@ wget.open("http://nature.com/forest.jpg")
 * MRI 2.2
 * MRI 2.3
 * MRI 2.4
+* MRI 2.5
+* MRI 2.6
 * JRuby
 
 ## Development
@@ -469,6 +494,6 @@ you'll need to have Docker installed and running.
 
 [open-uri]: http://ruby-doc.org/stdlib-2.3.0/libdoc/open-uri/rdoc/OpenURI.html
 [Net::HTTP]: https://ruby-doc.org/stdlib-2.4.1/libdoc/net/http/rdoc/Net/HTTP.html
-[
+[http.rb]: https://github.com/httprb/http
 [Addressable::URI]: https://github.com/sporkmonger/addressable
 [kennethreitz/httpbin]: https://github.com/kennethreitz/httpbin
data/lib/down/chunked_io.rb
CHANGED
@@ -63,21 +63,18 @@ module Down
     def read(length = nil, outbuf = nil)
       fail IOError, "closed stream" if closed?
 
+      data = outbuf.to_s.clear.force_encoding(Encoding::BINARY)
       remaining_length = length
 
-      begin
-        data = readpartial(remaining_length, outbuf)
-        data = data.dup unless outbuf
-        remaining_length = length - data.bytesize if length
-      rescue EOFError
-      end
-
       until remaining_length == 0 || eof?
-        data << readpartial(remaining_length)
+        data << readpartial(remaining_length, buffer ||= String.new)
         remaining_length = length - data.bytesize if length
       end
 
-
+      buffer.clear if buffer # deallocate string
+
+      data.force_encoding(@encoding) unless length
+      data unless data.empty? && length && length > 0
     end
 
     # Implements IO#gets semantics. Without arguments it retrieves lines of
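The 5.0.0 `read` in the hunk above appends every `readpartial` result into one binary accumulator and hands the same scratch string to each call, which is presumably what the changelog entry about halved allocation refers to. Below is a minimal sketch of that loop shape. `ChunkSource` and its in-memory chunk array are hypothetical stand-ins for illustration only; they are not part of Down, and the real `Down::ChunkedIO` also handles caching and closing.

```ruby
# Hypothetical stand-in for an IO that hands out data chunk by chunk.
class ChunkSource
  def initialize(chunks, encoding: Encoding::UTF_8)
    @chunks   = chunks.map(&:b) # binary copies of the chunks, in order
    @encoding = encoding
  end

  def eof?
    @chunks.empty?
  end

  # Returns at most maxlen bytes, reusing outbuf as the destination string.
  def readpartial(maxlen, outbuf)
    raise EOFError, "end of file reached" if eof?

    maxlen ||= 16 * 1024
    chunk = @chunks.shift
    if chunk.bytesize > maxlen
      @chunks.unshift(chunk.byteslice(maxlen..-1)) # keep the tail for later
      chunk = chunk.byteslice(0, maxlen)
    end
    outbuf.replace(chunk)
  end

  # Same loop shape as the 5.0.0 #read shown in the hunk above.
  def read(length = nil)
    data = String.new.force_encoding(Encoding::BINARY)
    remaining_length = length
    buffer = nil

    until remaining_length == 0 || eof?
      data << readpartial(remaining_length, buffer ||= String.new)
      remaining_length = length - data.bytesize if length
    end

    buffer.clear if buffer # release the scratch string's memory

    data.force_encoding(@encoding) unless length   # full read: source encoding
    data unless data.empty? && length && length > 0 # nil at EOF with a length
  end
end

io   = ChunkSource.new(["héllo ", "wörld"])
head = io.read(3) # three bytes, tagged as binary
rest = io.read    # everything else, tagged as UTF-8
```

With a length the result stays binary, because a byte count can cut a multibyte character in half; without a length the whole remainder is returned in the configured encoding.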
@@ -108,27 +105,33 @@ module Down
 
       separator = "\n\n" if separator.empty?
 
-
-        data = readpartial(limit)
+      data = String.new
 
-
-
-
-
+      until data.include?(separator) || data.bytesize == limit || eof?
+        remaining_length = limit - data.bytesize if limit
+        data << readpartial(remaining_length, buffer ||= String.new)
+      end
 
-
-        line << separator if data.include?(separator)
+      buffer.clear if buffer # deallocate buffer
 
+      line, extra = data.split(separator, 2)
+      line << separator if data.include?(separator)
+
+      data.clear # deallocate data
+
+      if extra
         if cache
-          cache.pos -= extra.
+          cache.pos -= extra.bytesize
         else
-
+          if @buffer
+            @buffer.prepend(extra)
+          else
+            @buffer = extra
+          end
         end
-      rescue EOFError
-        line = nil
       end
 
-      line
+      line.force_encoding(@encoding) if line
     end
 
     # Implements IO#readpartial semantics. If there is any content readily
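The line-splitting step in the `gets` hunk above can be tried in isolation. The sketch below assumes the data has already been accumulated as a binary string; `take_line` and `pending` are hypothetical names introduced for illustration (`pending` plays the role of `@buffer` in the no-cache branch), not part of Down.

```ruby
# Hypothetical helper, not part of Down: cuts the first line out of
# already-accumulated binary data and returns it with the leftover bytes.
def take_line(data, separator, encoding)
  line, extra = data.split(separator, 2)        # at most two pieces
  line << separator if data.include?(separator) # split drops the separator
  line.force_encoding(encoding) if line         # only the line is re-tagged
  [line, extra]
end

data = "première ligne\nreste".b # binary, like the output of #readpartial
line, extra = take_line(data, "\n", Encoding::UTF_8)

# Push the leftover bytes back so that the next read sees them first.
pending = nil
if extra
  pending = pending ? pending.prepend(extra) : extra
end
```

Splitting with a limit of 2 means everything after the first separator comes back as one piece, so nothing past the first line is scanned or copied more than once.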
@@ -145,27 +148,25 @@ module Down
     # where the value is replaced with retrieved content.
     #
     # Raises EOFError if end of file is reached. Raises IOError if closed.
-    def readpartial(
+    def readpartial(maxlen = nil, outbuf = nil)
       fail IOError, "closed stream" if closed?
 
-
+      maxlen ||= 16*1024
 
-
+      data = cache.read(maxlen, outbuf) if cache && !cache.eof?
+      data ||= outbuf.to_s.clear
 
-      if
-        data = cache.read(length, outbuf)
-        data.force_encoding(@encoding)
-      end
+      return data if maxlen == 0
 
-      if @buffer.nil? &&
+      if @buffer.nil? && data.empty?
         fail EOFError, "end of file reached" if chunks_depleted?
         @buffer = retrieve_chunk
       end
 
-      remaining_length =
+      remaining_length = maxlen - data.bytesize
 
       unless @buffer.nil? || remaining_length == 0
-        if remaining_length
+        if remaining_length < @buffer.bytesize
           buffered_data = @buffer.byteslice(0, remaining_length)
           @buffer = @buffer.byteslice(remaining_length..-1)
         else
@@ -173,21 +174,46 @@ module Down
           @buffer = nil
         end
 
-
-          data << buffered_data
-        else
-          data = buffered_data
-        end
+        data << buffered_data
 
         cache.write(buffered_data) if cache
 
-        buffered_data.clear unless buffered_data.
+        buffered_data.clear unless buffered_data.frozen?
       end
 
       @position += data.bytesize
 
-      data.force_encoding(Encoding::BINARY)
-
+      data.force_encoding(Encoding::BINARY)
+    end
+
+    # Implements IO#seek semantics.
+    def seek(amount, whence = IO::SEEK_SET)
+      fail Errno::ESPIPE, "Illegal seek" if cache.nil?
+
+      case whence
+      when IO::SEEK_SET, :SET
+        target_pos = amount
+      when IO::SEEK_CUR, :CUR
+        target_pos = @position + amount
+      when IO::SEEK_END, :END
+        unless chunks_depleted?
+          cache.seek(0, IO::SEEK_END)
+          IO.copy_stream(self, File::NULL)
+        end
+
+        target_pos = cache.size + amount
+      else
+        fail ArgumentError, "invalid whence: #{whence.inspect}"
+      end
+
+      if target_pos <= cache.size
+        cache.seek(target_pos)
+      else
+        cache.seek(0, IO::SEEK_END)
+        IO.copy_stream(self, File::NULL, target_pos - cache.size)
+      end
+
+      @position = cache.pos
     end
 
     # Implements IO#pos semantics. Returns the current position of the
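The new `seek` above does two things: it turns `amount` and `whence` into an absolute target position, and when the target lies beyond what has been cached it reads forward and throws the bytes away with `IO.copy_stream(..., File::NULL, n)`. The sketch below shows both steps separately. `target_position` is a hypothetical helper, and a `StringIO` stands in for the stream; neither is part of Down.

```ruby
require "stringio"

# Hypothetical helper: resolves amount/whence to an absolute byte offset,
# given the current position and the total size known so far.
def target_position(amount, whence, position:, size:)
  case whence
  when IO::SEEK_SET, :SET then amount
  when IO::SEEK_CUR, :CUR then position + amount
  when IO::SEEK_END, :END then size + amount
  else raise ArgumentError, "invalid whence: #{whence.inspect}"
  end
end

forward  = target_position(3, :CUR, position: 2, size: 10)
from_end = target_position(-4, IO::SEEK_END, position: 2, size: 10)

# Skipping ahead on a forward-only source: read n bytes and discard them.
source = StringIO.new("0123456789")
copied = IO.copy_stream(source, File::NULL, 4)
```

Seeking relative to the end needs the full size, which is why the hunk above first drains the remaining chunks when they are not yet depleted.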
@@ -195,6 +221,7 @@ module Down
     def pos
       @position
     end
+    alias tell pos
 
     # Implements IO#eof? semantics. Returns whether we've reached end of file.
     # It returns true if cache is at the end and there is no more content to
@@ -272,7 +299,7 @@ module Down
     def retrieve_chunk
       chunk = @next_chunk
       @next_chunk = chunks_fiber.resume
-      chunk
+      chunk
     end
 
     # Returns whether there is any content left to retrieve.
data/lib/down/errors.rb
CHANGED
@@ -7,17 +7,14 @@ module Down
   # raised when the file is larger than the specified maximum size
   class TooLarge < Error; end
 
-  # raised when the file failed to be retrieved for whatever reason
-  class NotFound < Error; end
-
   # raised when the given URL couldn't be parsed
-  class InvalidUrl <
+  class InvalidUrl < Error; end
 
   # raised when the number of redirects was larger than the specified maximum
-  class TooManyRedirects <
+  class TooManyRedirects < Error; end
 
   # raised when response returned 4xx or 5xx response
-  class ResponseError <
+  class ResponseError < Error
     attr_reader :response
 
     def initialize(message, response: nil)
@@ -29,15 +26,18 @@ module Down
   # raised when response returned 4xx response
   class ClientError < ResponseError; end
 
+  # raised when response returned 404 response
+  class NotFound < ClientError; end
+
   # raised when response returned 5xx response
   class ServerError < ResponseError; end
 
   # raised when there was an error connecting to the server
-  class ConnectionError <
+  class ConnectionError < Error; end
 
   # raised when connecting to the server too longer than the specified timeout
   class TimeoutError < ConnectionError; end
 
   # raised when an SSL error was raised
-  class SSLError <
+  class SSLError < Error; end
 end
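What the reparenting means for calling code can be sketched without the gem installed. The module below, deliberately named `DownSketch`, mirrors only the 5.0.0 superclasses visible in the two hunks above. Two details are assumptions because the diff does not show them: that the base `Error` inherits from `StandardError`, and the body of `ResponseError#initialize`. The `handle` method is a hypothetical caller.

```ruby
# Stand-in mirroring the 5.0.0 hierarchy; not the real Down module.
module DownSketch
  class Error < StandardError; end # assumed base class

  class ResponseError < Error
    attr_reader :response

    def initialize(message, response: nil)
      super(message) # assumed body; the hunk cuts off after the signature
      @response = response
    end
  end

  class ClientError < ResponseError; end
  class NotFound < ClientError; end
  class ServerError < ResponseError; end
  class ConnectionError < Error; end
  class TimeoutError < ConnectionError; end
end

# Hypothetical caller: the most specific rescue clause has to come first.
def handle(error)
  raise error
rescue DownSketch::NotFound
  :not_found
rescue DownSketch::ClientError
  :client_error
rescue DownSketch::ResponseError
  :response_error
rescue DownSketch::Error
  :other_down_error
end
```

Because `NotFound` now sits under `ClientError`, code that rescues `ClientError` or `ResponseError` keeps catching 404 responses, while code that rescues only `NotFound` no longer catches connection or timeout failures.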
data/lib/down/http.rb
CHANGED
@@ -13,11 +13,6 @@ module Down
   class Http < Backend
     # Initializes the backend with common defaults.
     def initialize(options = {}, &block)
-      if options.is_a?(HTTP::Client)
-        warn "[Down] Passing an HTTP::Client object to Down::Http#initialize is deprecated and won't be supported in Down 5. Use the block initialization instead."
-        options = options.default_options.to_hash
-      end
-
       @method = options.delete(:method) || :get
       @client = HTTP
         .headers("User-Agent" => "Down/#{Down::VERSION}")
@@ -114,6 +109,7 @@ module Down
       args = [response.status.to_s, response: response]
 
       case response.code
+      when 404 then raise Down::NotFound.new(*args)
       when 400..499 then raise Down::ClientError.new(*args)
       when 500..599 then raise Down::ServerError.new(*args)
       else raise Down::ResponseError.new(*args)
data/lib/down/net_http.rb
CHANGED
@@ -313,6 +313,7 @@ module Down
       args = ["#{code} #{message}", response: response]
 
       case response.code.to_i
+      when 404 then raise Down::NotFound.new(*args)
       when 400..499 then raise Down::ClientError.new(*args)
       when 500..599 then raise Down::ServerError.new(*args)
       else raise Down::ResponseError.new(*args)
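Both backends add the `404` branch ahead of the `400..499` range, and the position matters because `case` takes the first matching `when`. A minimal sketch of that ordering follows; `error_name_for` is a hypothetical helper that returns a symbol instead of raising, so it can be run without the gem.

```ruby
# Hypothetical helper mirroring the branch order in the hunks above.
def error_name_for(code)
  case code
  when 404      then :NotFound      # must precede the range that contains it
  when 400..499 then :ClientError
  when 500..599 then :ServerError
  else               :ResponseError
  end
end

names = [404, 403, 503, 302].map { |code| error_name_for(code) }
```

If the `404` branch came after `400..499`, the range would match first and `Down::NotFound` would never be raised.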
data/lib/down/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: down
 version: !ruby/object:Gem::Version
-  version:
+  version: 5.0.0
 platform: ruby
 authors:
 - Janko Marohnić
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-
+date: 2019-09-26 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: addressable