down 4.1.1 → 4.2.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +274 -0
- data/README.md +8 -4
- data/down.gemspec +1 -1
- data/lib/down/chunked_io.rb +102 -17
- data/lib/down/http.rb +31 -26
- data/lib/down/net_http.rb +122 -103
- data/lib/down/version.rb +1 -1
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 443491ec05f03e75593726f54314b04587637bf2
+  data.tar.gz: a04c39fd140b7266f7ea80bfee2da4f649c079c4
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1a79349a849954e2e68b5bcf187e79825255c9a1c3fd5f1ea81dc12f2d97e4d20a5419443a6e3d3942e82b3af8615fcce24cfcdbd4ded091048d7f02e17d5039
+  data.tar.gz: 2fb8ed137234217f44790f35a767dead06eecff2174fba0960711ebde04d90ff4ee76f20dced2ab2bd6a48b7e5b31c335556504d6892d9603f6c89092cc7512b
data/CHANGELOG.md
ADDED
@@ -0,0 +1,274 @@
+## 4.2.0 (2017-12-22)
+
+* Handle `:max_redirects` in `Down::NetHttp#open` and follow up to 2 redirects by default (@janko-m)
+
+## 4.1.1 (2017-10-15)
+
+* Raise all system call exceptions as `Down::ConnectionError` in `Down::NetHttp` (@janko-m)
+
+* Raise `Errno::ETIMEDOUT` as `Down::TimeoutError` in `Down::NetHttp` (@janko-m)
+
+* Raise `Addressable::URI::InvalidURIError` as `Down::InvalidUrl` in `Down::Http` (@janko-m)
+
+## 4.1.0 (2017-08-29)
+
+* Fix `FiberError` occurring on `Down::NetHttp.open` when response is chunked and gzipped (@janko-m)
+
+* Use a default `User-Agent` in `Down::NetHttp.open` (@janko-m)
+
+* Fix raw read timeout error sometimes being raised instead of `Down::TimeoutError` in `Down.open` (@janko-m)
+
+* `Down::ChunkedIO` can now be parsed by the CSV Ruby standard library (@janko-m)
+
+* Implement `Down::ChunkedIO#gets` (@janko-m)
+
+* Implement `Down::ChunkedIO#pos` (@janko-m)
+
+## 4.0.1 (2017-07-08)
+
+* Load and assign the `NetHttp` backend immediately on `require "down"` (@janko-m)
+
+* Remove undocumented `Down::ChunkedIO#backend=` that was added in 4.0.0 to avoid confusion (@janko-m)
+
+## 4.0.0 (2017-06-24)
+
+* Don't apply `Down.download` and `Down.open` overrides when loading a backend (@janko-m)
+
+* Remove `Down::Http.client` attribute accessor (@janko-m)
+
+* Make `Down::NetHttp`, `Down::Http`, and `Down::Wget` classes instead of modules (@janko-m)
+
+* Remove `Down.copy_to_tempfile` (@janko-m)
+
+* Add Wget backend (@janko-m)
+
+* Add `:content_length_proc` and `:progress_proc` to the HTTP.rb backend (@janko-m)
+
+* Halve string allocations in `Down::ChunkedIO#readpartial` when buffer string is not used (@janko-m)
+
+## 3.2.0 (2017-06-21)
+
+* Add `Down::ChunkedIO#readpartial` for more memory efficient reading (@janko-m)
+
+* Fix `Down::ChunkedIO` not returning second part of the last chunk if it was previously partially read (@janko-m)
+
+* Strip internal variables from `Down::ChunkedIO#inspect` and show only the important ones (@janko-m)
+
+* Add `Down::ChunkedIO#closed?` (@janko-m)
+
+* Add `Down::ChunkedIO#rewindable?` (@janko-m)
+
+* In `Down::ChunkedIO` only create the Tempfile if it's going to be used (@janko-m)
+
+## 3.1.0 (2017-06-16)
+
+* Split `Down::NotFound` into explanatory exceptions (@janko-m)
+
+* Add `:read_timeout` and `:open_timeout` options to `Down::NetHttp.open` (@janko-m)
+
+* Return an `Integer` in `data[:status]` on a result of `Down.open` when using the HTTP.rb strategy (@janko-m)
+
+## 3.0.0 (2017-05-24)
+
+* Make `Down.open` pass encoding from content type charset to `Down::ChunkedIO` (@janko-m)
+
+* Add `:encoding` option to `Down::ChunkedIO.new` for specifying the encoding of returned content (@janko-m)
+
+* Add HTTP.rb backend as an alternative to Net::HTTP (@janko-m)
+
+* Stop testing on MRI 2.1 (@janko-m)
+
+* Forward cookies from the `Set-Cookie` response header when redirecting (@janko-m)
+
+* Add `frozen-string-literal: true` comments for less string allocations on Ruby 2.3+ (@janko-m)
+
+* Modify `#content_type` to return nil instead of `application/octet-stream` when `Content-Type` is blank in `Down.download` (@janko-m)
+
+* `Down::ChunkedIO#read`, `#each_chunk`, `#eof?`, `rewind` now raise an `IOError` when `Down::ChunkedIO` has been closed (@janko-m)
+
+* `Down::ChunkedIO` now caches only the content that has been read (@janko-m)
+
+* Add `Down::ChunkedIO#size=` to allow assigning size after the `Down::ChunkedIO` has been instantiated (@janko-m)
+
+* Make `:size` an optional argument in `Down::ChunkedIO` (@janko-m)
+
+* Call enumerator's `ensure` block when `Down::ChunkedIO#close` is called (@janko-m)
+
+* Add `:rewindable` option to `Down::ChunkedIO` and `Down.open` for disabling caching read content into a file (@janko-m)
+
+* Drop support for MRI 2.0 (@janko-m)
+
+* Drop support for MRI 1.9.3 (@janko-m)
+
+* Remove deprecated `:progress` option (@janko-m)
+
+* Remove deprecated `:timeout` option (@janko-m)
+
+* Reraise only a subset of exceptions as `Down::NotFound` in `Down.download` (@janko-m)
+
+* Support streaming of "Transfer-Encoding: chunked" responses in `Down.open` again (@janko-m)
+
+* Remove deprecated `Down.stream` (@janko-m)
+
+## 2.5.1 (2017-05-13)
+
+* Remove URL from the error messages (@janko-m)
+
+## 2.5.0 (2017-05-03)
+
+* Support both Strings and `URI` objects in `Down.download` and `Down.open` (@olleolleolle)
+
+* Work around a `CGI.unescape` bug in Ruby 2.4.
+
+* Apply HTTP Basic authentication contained in URLs in `Down.open`.
+
+* Raise `Down::NotFound` on 4xx and 5xx responses in `Down.open`.
+
+* Write `:status` and `:headers` information to `Down::ChunkedIO#data` in `Down.open`.
+
+* Add `#data` attribute to `Down::ChunkedIO` for saving custom result data.
+
+* Don't save retrieved chunks into the file in `Down::ChunkedIO#each_chunk`.
+
+* Add `:proxy` option to `Down.download` and `Down.open`.
+
+## 2.4.3 (2017-04-06)
+
+* Show the input URL in the `Down::Error` message.
+
+## 2.4.2 (2017-03-28)
+
+* Don't raise `StopIteration` in `Down::ChunkedIO` when `:chunks` is an empty
+  Enumerator.
+
+## 2.4.1 (2017-03-23)
+
+* Correctly detect empty filename from `Content-Disposition` header, and
+  in this case continue extracting filename from URL.
+
+## 2.4.0 (2017-03-19)
+
+* Allow `Down.open` to accept request headers as options with String keys,
+  just like `Down.download` does.
+
+* Decode URI-decoded filenames from the `Content-Disposition` header
+
+* Parse filenames without quotes from the `Content-Disposition` header
+
+## 2.3.8 (2016-11-07)
+
+* Work around `Transfer-Encoding: chunked` responses by downloading whole
+  response body.
+
+## 2.3.7 (2016-11-06)
+
+* In `Down.open` send requests using the URI *path* instead of the full URI.
+
+## 2.3.6 (2016-07-26)
+
+* Read #original_filename from the "Content-Disposition" header.
+
+* Extract `Down::ChunkedIO` into a file, so that it can be required separately.
+
+* In `Down.stream` close the IO after reading from it.
+
+## 2.3.5 (2016-07-18)
+
+* Prevent reading the whole response body when the IO returned by `Down.open`
+  is closed.
+
+## 2.3.4 (2016-07-14)
+
+* Require `net/http`
+
+## 2.3.3 (2016-06-23)
+
+* Improve `Down::ChunkedIO` (and thus `Down.open`):
+
+  - `#each_chunk` and `#read` now automatically call `:on_close` when all
+    chunks were downloaded
+
+  - `#eof?` had incorrect behaviour, where it would return true if
+    everything was downloaded, instead only when it's also at the end of
+    the file
+
+  - `#close` can now be called multiple times, as the `:on_close` will always
+    be called only once
+
+  - end of download is now detected immediately when the last chunk was
+    downloaded (as opposed to after trying to read the next one)
+
+## 2.3.2 (2016-06-22)
+
+* Add `Down.open` for IO-like streaming, and deprecate `Down.stream` (janko-m)
+
+* Allow URLs with basic authentication (`http://user:password@example.com`) (janko-m)
+
+## ~~2.3.1 (2016-06-22)~~ (yanked)
+
+## ~~2.3.0 (2016-06-22)~~ (yanked)
+
+## 2.2.1 (2016-06-06)
+
+* Make Down work on Windows (martinsefcik)
+
+* Close an internal file descriptor that was left open (martinsefcik)
+
+## 2.2.0 (2016-05-19)
+
+* Add ability to follow redirects, and allow maximum of 2 redirects by default (janko-m)
+
+* Fix a potential Windows issue when extracting `#original_filename` (janko-m)
+
+* Fix `#original_filename` being incomplete if filename contains a slash (janko-m)
+
+## 2.1.0 (2016-04-12)
+
+* Make `:progress_proc` and `:content_length_proc` work with `:max_size` (janko-m)
+
+* Deprecate `:progress` in favor of open-uri's `:progress_proc` (janko-m)
+
+* Deprecate `:timeout` in favor of open-uri's `:open_timeout` and `:read_timeout` (janko-m)
+
+* Add `Down.stream` for streaming remote files in chunks (janko-m)
+
+* Replace deprecated `URI.encode` with `CGI.unescape` in downloaded file's `#original_filename` (janko-m)
+
+## 2.0.1 (2016-03-06)
+
+* Add error message when file was too large, and use a simple error message for other generic download failures (janko-m)
+
+## 2.0.0 (2016-02-03)
+
+* Fix an issue where valid URLs were transformed into invalid URLs (janko-m)
+
+  - All input URLs now have to be properly encoded, which should already be the
+    case in most situations.
+
+* Include the error class when download fails (janko-m)
+
+## 1.1.0 (2016-01-26)
+
+* Forward all additional options to open-uri (janko-m)
+
+## 1.0.5 (2015-12-18)
+
+* Move the open-uri file to the new location instead of copying it (janko-m)
+
+## 1.0.4 (2015-11-19)
+
+* Delete the old open-uri file after using it (janko-m)
+
+## 1.0.3 (2015-11-16)
+
+* Fix `#download` and `#copy_to_tempfile` not preserving the file extension (janko-m)
+
+* Fix `#copy_to_tempfile` not working when given a nested basename (janko-m)
+
+## 1.0.2 (2015-10-24)
+
+* Fix Down not working with Ruby 1.9.3 (janko-m)
+
+## 1.0.1 (2015-10-01)
+
+* Don't allow redirects when downloading files (janko-m)
|
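The 4.1.1 entries in the changelog describe re-raising low-level system call errors as library-specific exceptions. The sketch below illustrates that general pattern only. The `Sketch` module, its error classes, and `translate_errors` are hypothetical stand-ins, not the gem's actual classes or hierarchy.

```ruby
# Hypothetical stand-ins for the gem's error classes; the real hierarchy
# inside Down may differ.
module Sketch
  class Error           < StandardError; end
  class ConnectionError < Error; end
  class TimeoutError    < Error; end

  # Runs the block and re-raises low-level failures as library errors.
  def self.translate_errors
    yield
  rescue Errno::ETIMEDOUT => exception
    # Must be listed before SystemCallError, which is its superclass.
    raise TimeoutError, exception.message
  rescue SystemCallError => exception
    # Covers every Errno::* class (ECONNREFUSED, EHOSTUNREACH, ...).
    raise ConnectionError, exception.message
  end
end

begin
  Sketch.translate_errors { raise Errno::ECONNREFUSED }
rescue Sketch::Error => error
  puts "#{error.class}: #{error.message}"
end
```

Callers can then rescue a single base class instead of enumerating platform-specific `Errno` constants.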
data/README.md
CHANGED
@@ -263,6 +263,10 @@ default maximum of 2 redirects will be followed, but you can change it via the
 Down::NetHttp.download("http://example.com/image.jpg") # 2 redirects allowed
 Down::NetHttp.download("http://example.com/image.jpg", max_redirects: 5) # 5 redirects allowed
 Down::NetHttp.download("http://example.com/image.jpg", max_redirects: 0) # 0 redirects allowed
+
+Down::NetHttp.open("http://example.com/image.jpg") # 2 redirects allowed
+Down::NetHttp.open("http://example.com/image.jpg", max_redirects: 5) # 5 redirects allowed
+Down::NetHttp.open("http://example.com/image.jpg", max_redirects: 0) # 0 redirects allowed
 ```
 
 #### Proxy
@@ -343,7 +347,7 @@ Net::HTTP include:
 All additional options will be forwarded to `HTTP::Client#request`:
 
 ```rb
-Down::Http.download("http://example.org/image.jpg", timeout: {
+Down::Http.download("http://example.org/image.jpg", timeout: { connect: 3 })
 Down::Http.open("http://example.org/image.jpg", follow: { max_hops: 0 })
 ```
 
@@ -351,16 +355,16 @@ If you prefer to add options using the chainable API, you can pass a block:
 
 ```rb
 Down::Http.open("http://example.org/image.jpg") do |client|
-  client.timeout(
+  client.timeout(connect: 3)
 end
 ```
 
 You can also initialize the backend with default options:
 
 ```rb
-http = Down::Http.new(timeout: {
+http = Down::Http.new(timeout: { connect: 3 })
 # or
-http = Down::Http.new(HTTP.timeout(
+http = Down::Http.new(HTTP.timeout(connect: 3))
 
 http.download("http://example.com/image.jpg")
 http.open("http://example.com/image.jpg")
|
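The README change documents that `:max_redirects` now applies to `open` as well as `download`. As a rough illustration of what a redirect budget means, here is a minimal offline sketch. `ROUTES`, `resolve`, and the local `TooManyRedirects` class are hypothetical and only mimic the counting logic; no HTTP request is made.

```ruby
require "uri"

# Hypothetical error class standing in for Down::TooManyRedirects.
class TooManyRedirects < StandardError; end

# Hypothetical in-memory "server": URL => [status, Location header or nil].
ROUTES = {
  "http://example.com/a" => [302, "/b"],
  "http://example.com/b" => [301, "http://example.com/c"],
  "http://example.com/c" => [200, nil],
}.freeze

# Follows redirects until a non-redirect response is found or the budget
# of allowed redirects is used up.
def resolve(url, follows_remaining: 2)
  uri = URI.parse(url)
  status, location = ROUTES.fetch(uri.to_s)
  return uri.to_s unless (300..399).cover?(status)

  raise TooManyRedirects, "too many redirects" if follows_remaining == 0

  target = URI.parse(location)
  target = uri + target if target.relative? # resolve relative Location headers
  resolve(target.to_s, follows_remaining: follows_remaining - 1)
end

puts resolve("http://example.com/a") # two redirects fit the default budget
```

With a budget of 0 any redirect response raises immediately, which matches the `max_redirects: 0` line in the README example.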
data/down.gemspec
CHANGED
@@ -12,7 +12,7 @@ Gem::Specification.new do |spec|
   spec.email = ["janko.marohnic@gmail.com"]
   spec.license = "MIT"
 
-  spec.files = Dir["README.md", "LICENSE.txt", "*.gemspec", "lib/**/*.rb"]
+  spec.files = Dir["README.md", "LICENSE.txt", "CHANGELOG.md", "*.gemspec", "lib/**/*.rb"]
   spec.require_path = "lib"
 
   spec.add_development_dependency "minitest", "~> 5.8"
|
data/lib/down/chunked_io.rb
CHANGED
@@ -4,6 +4,26 @@ require "tempfile"
 require "fiber"
 
 module Down
+  # Wraps an enumerator that yields chunks of content into an IO object. It
+  # implements some essential IO methods:
+  #
+  # * IO#read
+  # * IO#readpartial
+  # * IO#gets
+  # * IO#size
+  # * IO#pos
+  # * IO#eof?
+  # * IO#rewind
+  # * IO#close
+  #
+  # By default the Down::ChunkedIO caches all read content into a tempfile,
+  # allowing it to be rewindable. If rewindability won't be used, it can be
+  # disabled by setting `:rewindable` to false, which eliminates any disk I/O.
+  #
+  # Any cleanup code (i.e. ensure block) that the given enumerator carries is
+  # guaranteed to get executed, either when all content has been retrieved or
+  # when Down::ChunkedIO is closed. One can also specify an `:on_close`
+  # callback that will also get executed in those situations.
   class ChunkedIO
     attr_accessor :size, :data, :encoding
 
@@ -12,23 +32,36 @@ module Down
       @size = size
       @on_close = on_close
       @data = data
-      @encoding = find_encoding(encoding ||
+      @encoding = find_encoding(encoding || "binary")
       @rewindable = rewindable
       @buffer = nil
-      @
+      @position = 0
 
-      retrieve_chunk
+      retrieve_chunk # fetch first chunk so that we know whether the file is empty
     end
 
+    # Yields elements of the underlying enumerator.
     def each_chunk
-
+      fail IOError, "closed stream" if closed?
+
+      return enum_for(__method__) unless block_given?
 
-      return enum_for(__method__) if !block_given?
       yield retrieve_chunk until chunks_depleted?
     end
 
+    # Implements IO#read semantics. Without arguments it retrieves and returns
+    # all content.
+    #
+    # With `length` argument returns exactly that number of bytes if they're
+    # available.
+    #
+    # With `outbuf` argument each call will return that same string object,
+    # where the value is replaced with retrieved content.
+    #
+    # If end of file is reached, returns empty string if called without
+    # arguments, or nil if called with arguments. Raises IOError if closed.
     def read(length = nil, outbuf = nil)
-
+      fail IOError, "closed stream" if closed?
 
       remaining_length = length
 
@@ -47,8 +80,22 @@ module Down
       data.to_s unless length && (data.nil? || data.empty?)
     end
 
+    # Implements IO#gets semantics. Without arguments it retrieves lines of
+    # content separated by newlines.
+    #
+    # With `separator` argument it does the following:
+    #
+    # * if `separator` is a nonempty string returns chunks of content
+    # surrounded with that sequence of bytes
+    # * if `separator` is an empty string returns paragraphs of content
+    # (content delimited by two newlines)
+    # * if `separator` is nil returns all content
+    #
+    # With `limit` argument returns maximum of that amount of bytes.
+    #
+    # Returns nil if end of file is reached. Raises IOError if closed.
     def gets(separator_or_limit = $/, limit = nil)
-
+      fail IOError, "closed stream" if closed?
 
       if separator_or_limit.is_a?(Integer)
         separator = $/
@@ -84,8 +131,22 @@ module Down
       line
     end
 
+    # Implements IO#readpartial semantics. If there is any content readily
+    # available reads from it, otherwise fetches and reads from the next chunk.
+    # It writes to and reads from the cache when needed.
+    #
+    # Without arguments it either returns all content that's readily available,
+    # or the next chunk. This is useful when you don't care about the size of
+    # chunks and you want to minimize string allocations.
+    #
+    # With `length` argument returns maximum of that amount of bytes.
+    #
+    # With `outbuf` argument each call will return that same string object,
+    # where the value is replaced with retrieved content.
+    #
+    # Raises EOFError if end of file is reached. Raises IOError if closed.
     def readpartial(length = nil, outbuf = nil)
-
+      fail IOError, "closed stream" if closed?
 
       data = outbuf.replace("").force_encoding(@encoding) if outbuf
 
@@ -95,7 +156,7 @@ module Down
       end
 
       if @buffer.nil? && (data.nil? || data.empty?)
-
+        fail EOFError, "end of file reached" if chunks_depleted?
         @buffer = retrieve_chunk
       end
 
@@ -123,50 +184,63 @@ module Down
         end
       end
 
-      @
+      @position += data.bytesize
 
       data
     end
 
+    # Implements IO#pos semantics. Returns the current position of the
+    # Down::ChunkedIO.
     def pos
-      @
+      @position
     end
 
+    # Implements IO#eof? semantics. Returns whether we've reached end of file.
+    # It returns true if cache is at the end and there is no more content to
+    # retrieve. Raises IOError if closed.
     def eof?
-
+      fail IOError, "closed stream" if closed?
 
       return false if cache && !cache.eof?
       @buffer.nil? && chunks_depleted?
     end
 
+    # Implements IO#rewind semantics. Rewinds the Down::ChunkedIO by rewinding
+    # the cache and setting the position to the beginning of the file. Raises
+    # IOError if closed or not rewindable.
     def rewind
-
-
+      fail IOError, "closed stream" if closed?
+      fail IOError, "this Down::ChunkedIO is not rewindable" if cache.nil?
 
       cache.rewind
-      @
+      @position = 0
     end
 
+    # Implements IO#close semantics. Closes the Down::ChunkedIO by terminating
+    # chunk retrieval and deleting the cached content.
     def close
       return if @closed
 
       chunks_fiber.resume(:terminate) if chunks_fiber.alive?
-      @buffer = nil
       cache.close! if cache
+      @buffer = nil
       @closed = true
     end
 
+    # Returns whether the Down::ChunkedIO has been closed.
     def closed?
       !!@closed
     end
 
+    # Returns whether the Down::ChunkedIO was specified as rewindable.
     def rewindable?
       @rewindable
     end
 
+    # Returns useful information about the Down::ChunkedIO object.
     def inspect
       string = String.new
-      string << "
+      string << "#<#{self.class.name}"
       string << " chunks=#{@chunks.inspect}"
       string << " size=#{size.inspect}"
       string << " encoding=#{encoding.inspect}"
@@ -179,20 +253,29 @@ module Down
 
     private
 
+    # If Down::ChunkedIO is specified as rewindable, returns a new Tempfile for
+    # writing read content to. This allows the Down::ChunkedIO to be rewound.
     def cache
       @cache ||= Tempfile.new("down-chunked_io", binmode: true) if @rewindable
     end
 
+    # Returns current chunk and retrieves the next chunk. If next chunk is nil,
+    # we know we've reached EOF.
     def retrieve_chunk
       chunk = @next_chunk
       @next_chunk = chunks_fiber.resume
       chunk.force_encoding(@encoding) if chunk
     end
 
+    # Returns whether there is any content left to retrieve.
     def chunks_depleted?
       !chunks_fiber.alive?
     end
 
+    # Creates a Fiber wrapper around the underlying enumerator. The advantage
+    # of using a Fiber here is that we can terminate the chunk retrieval, in a
+    # way that executes any cleanup code that the enumerator carries. At the
+    # end of iteration the :on_close callback is executed if one was specified.
     def chunks_fiber
       @chunks_fiber ||= Fiber.new do
         begin
@@ -206,6 +289,8 @@ module Down
       end
     end
 
+    # Finds encoding by name. If the encoding couldn't be found, falls back to
+    # the generic binary encoding.
     def find_encoding(encoding)
       Encoding.find(encoding)
     rescue ArgumentError
|
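The comments added to `chunked_io.rb` explain that a Fiber wraps the chunk enumerator so that retrieval can be terminated early while still running the enumerator's cleanup code and the `:on_close` callback. The following is a simplified sketch of that idea, not the gem's implementation: `TinyChunkedIO` is a hypothetical class with no caching, encoding, or `read` support.

```ruby
# Hypothetical, stripped-down illustration of the Fiber technique.
# On Ruby older than 3.1, `require "fiber"` is needed for Fiber#alive?.
class TinyChunkedIO
  def initialize(chunks:, on_close: nil)
    @chunks     = chunks
    @on_close   = on_close
    @closed     = false
    @next_chunk = chunks_fiber.resume # prefetch the first chunk
  end

  # Yields each remaining chunk in order.
  def each_chunk
    raise IOError, "closed stream" if @closed
    return enum_for(__method__) unless block_given?

    yield retrieve_chunk until depleted?
  end

  # Stops retrieval; the enumerator's ensure block and :on_close still run.
  def close
    return if @closed

    chunks_fiber.resume(:terminate) if chunks_fiber.alive?
    @closed = true
  end

  private

  # Returns the prefetched chunk and fetches the following one.
  def retrieve_chunk
    chunk = @next_chunk
    @next_chunk = chunks_fiber.resume
    chunk
  end

  def depleted?
    !chunks_fiber.alive?
  end

  def chunks_fiber
    @chunks_fiber ||= Fiber.new do
      begin
        @chunks.each do |chunk|
          action = Fiber.yield chunk
          break if action == :terminate # unwinds through the enumerator
        end
      ensure
        @on_close.call if @on_close
      end
      nil
    end
  end
end

source = Enumerator.new do |yielder|
  begin
    yielder << "first"
    yielder << "second"
  ensure
    puts "enumerator cleanup ran"
  end
end

io = TinyChunkedIO.new(chunks: source, on_close: -> { puts "on_close ran" })
io.close # terminates before any chunk is consumed
```

Because `break` leaves the `each` call from inside the fiber, the enumerator's own `ensure` block runs first and the `:on_close` callback runs afterwards, whether the stream was read to the end or closed early.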
data/lib/down/http.rb
CHANGED
@@ -9,17 +9,14 @@ require "cgi"
 require "base64"
 
 if Gem::Version.new(HTTP::VERSION) < Gem::Version.new("2.1.0")
-  fail "Down requires HTTP.rb version 2.1.0 or higher"
+  fail "Down::Http requires HTTP.rb version 2.1.0 or higher"
 end
 
 module Down
   class Http < Backend
-    def initialize(client_or_options =
-      options
-      options =
-
-      @client = HTTP.headers("User-Agent" => "Down/#{Down::VERSION}").follow(max_hops: 2)
-      @client = HTTP::Client.new(@client.default_options.merge(options)) if options
+    def initialize(client_or_options = {})
+      options = client_or_options.is_a?(HTTP::Client) ? client_or_options.default_options : client_or_options
+      @options = { headers: { "User-Agent" => "Down/#{Down::VERSION}" }, follow: { max_hops: 2 } }.merge(options)
     end
 
     def download(url, max_size: nil, progress_proc: nil, content_length_proc: nil, **options, &block)
@@ -59,47 +56,55 @@ module Down
     end
 
     def open(url, rewindable: true, **options, &block)
-
-        response = get(url, **options, &block)
-      rescue => exception
-        request_error!(exception)
-      end
+      response = get(url, **options, &block)
 
       response_error!(response) unless response.status.success?
 
-      body_chunks = Enumerator.new do |yielder|
-        begin
-          response.body.each { |chunk| yielder << chunk }
-        rescue => exception
-          request_error!(exception)
-        end
-      end
-
       Down::ChunkedIO.new(
-        chunks:
+        chunks: enum_for(:stream_body, response),
         size: response.content_length,
         encoding: response.content_type.charset,
         rewindable: rewindable,
-        on_close: (-> { response.connection.close } unless
+        on_close: (-> { response.connection.close } unless default_client.persistent?),
         data: { status: response.code, headers: response.headers.to_h, response: response },
       )
     end
 
     private
 
+    def default_client
+      @default_client ||= HTTP::Client.new(@options)
+    end
+
     def get(url, **options, &block)
+      url = process_url(url, options)
+
+      client = default_client
+      client = block.call(client) if block
+
+      client.get(url, options)
+    rescue => exception
+      request_error!(exception)
+    end
+
+    def stream_body(response, &block)
+      response.body.each(&block)
+    rescue => exception
+      request_error!(exception)
+    end
+
+    def process_url(url, options)
       uri = HTTP::URI.parse(url)
 
       if uri.user || uri.password
         user, pass = uri.user, uri.password
         authorization = "Basic #{Base64.strict_encode64("#{user}:#{pass}")}"
-
+        options[:headers] ||= {}
+        options[:headers].merge!("Authorization" => authorization)
         uri.user = uri.password = nil
       end
 
-
-      client = block.call(client) if block
-      client.get(url, options)
+      uri.to_s
     end
 
     def response_error!(response)
|
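The new `process_url` method moves credentials embedded in the URL into an `Authorization` header and returns the URL without them. A standalone sketch of the same idea using only Ruby's standard `uri` library follows. `extract_basic_auth` is a hypothetical helper name, and the sketch uses `Array#pack("m0")` (strict Base64 without line breaks) in place of the `Base64` module, and `URI` in place of `HTTP::URI`.

```ruby
require "uri"

# Hypothetical helper: returns the URL stripped of credentials together
# with a headers hash carrying the Basic Authorization header.
def extract_basic_auth(url, headers = {})
  uri = URI.parse(url)

  if uri.user || uri.password
    credentials = ["#{uri.user}:#{uri.password}"].pack("m0") # strict Base64
    headers = headers.merge("Authorization" => "Basic #{credentials}")
    uri.user = uri.password = nil # keep credentials out of the request URL
  end

  [uri.to_s, headers]
end

clean_url, headers = extract_basic_auth("http://user:secret@example.com/image.jpg")
puts clean_url
puts headers.keys.inspect
```

Unlike the diffed code, this sketch returns a new headers hash instead of mutating the caller's options, which is a design choice made here for illustration only.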
data/lib/down/net_http.rb
CHANGED
@@ -12,14 +12,14 @@ require "cgi"
 module Down
   class NetHttp < Backend
     def initialize(options = {})
-      @options = { "User-Agent" => "Down/#{Down::VERSION}" }.merge(options)
+      @options = { "User-Agent" => "Down/#{Down::VERSION}", max_redirects: 2 }.merge(options)
     end
 
-    def download(
+    def download(url, options = {})
       options = @options.merge(options)
 
       max_size = options.delete(:max_size)
-      max_redirects = options.delete(:max_redirects)
+      max_redirects = options.delete(:max_redirects)
       progress_proc = options.delete(:progress_proc)
       content_length_proc = options.delete(:content_length_proc)
 
@@ -56,14 +56,7 @@ module Down
 
       open_uri_options.merge!(options)
 
-
-
-      begin
-        uri = URI(uri)
-        raise Down::InvalidUrl, "URL scheme needs to be http or https" unless uri.is_a?(URI::HTTP)
-      rescue URI::InvalidURIError => exception
-        raise Down::InvalidUrl, exception.message
-      end
+      uri = ensure_uri(url)
 
       if uri.user || uri.password
         open_uri_options[:http_basic_authentication] ||= [uri.user, uri.password]
@@ -71,56 +64,123 @@ module Down
|
|
71
64
|
uri.password = nil
|
72
65
|
end
|
73
66
|
|
74
|
-
|
75
|
-
downloaded_file = uri.open(open_uri_options)
|
76
|
-
rescue OpenURI::HTTPRedirect => exception
|
77
|
-
if (tries -= 1) > 0
|
78
|
-
uri = exception.uri
|
67
|
+
open_uri_file = open_uri(uri, open_uri_options, follows_remaining: max_redirects)
|
79
68
|
|
80
|
-
|
81
|
-
|
82
|
-
end
|
69
|
+
tempfile = ensure_tempfile(open_uri_file)
|
70
|
+
tempfile.extend Down::NetHttp::DownloadedFile
|
83
71
|
|
84
|
-
|
85
|
-
|
86
|
-
|
87
|
-
|
88
|
-
|
89
|
-
|
90
|
-
|
91
|
-
|
92
|
-
|
93
|
-
|
72
|
+
tempfile
|
73
|
+
end
|
74
|
+
|
75
|
+
def open(url, options = {})
|
76
|
+
options = @options.merge(options)
|
77
|
+
|
78
|
+
uri = ensure_uri(url)
|
79
|
+
|
80
|
+
request = Fiber.new do
|
81
|
+
net_http_request(uri, options) do |response|
|
82
|
+
Fiber.yield response
|
94
83
|
end
|
84
|
+
end
|
95
85
|
|
96
|
-
|
97
|
-
|
98
|
-
|
86
|
+
response = request.resume
|
87
|
+
|
88
|
+
response_error!(response) unless response.is_a?(Net::HTTPSuccess)
|
89
|
+
|
90
|
+
Down::ChunkedIO.new(
|
91
|
+
chunks: enum_for(:stream_body, response),
|
92
|
+
size: response["Content-Length"] && response["Content-Length"].to_i,
|
93
|
+
encoding: response.type_params["charset"],
|
94
|
+
rewindable: options.fetch(:rewindable, true),
|
95
|
+
on_close: -> { request.resume }, # close HTTP connnection
|
96
|
+
data: {
|
97
|
+
status: response.code.to_i,
|
98
|
+
headers: response.each_header.inject({}) { |headers, (downcased_name, value)|
|
99
|
+
name = downcased_name.split("-").map(&:capitalize).join("-")
|
100
|
+
headers.merge!(name => value)
|
101
|
+
},
|
102
|
+
response: response,
|
103
|
+
},
|
104
|
+
)
|
105
|
+
end
|
106
|
+
|
107
|
+
private
|
108
|
+
+    def open_uri(uri, options, follows_remaining: 0)
+      downloaded_file = uri.open(options)
+    rescue OpenURI::HTTPRedirect => exception
+      raise Down::TooManyRedirects, "too many redirects" if follows_remaining == 0
+
+      uri = exception.uri
+
+      if !exception.io.meta["set-cookie"].to_s.empty?
+        options["Cookie"] = exception.io.meta["set-cookie"]
       end
 
-
-
-
-
-
-
-
-
-
-
-
+      follows_remaining -= 1
+      retry
+    rescue OpenURI::HTTPError => exception
+      code, message = exception.io.status
+      response_class = Net::HTTPResponse::CODE_TO_OBJ.fetch(code)
+      response = response_class.new(nil, code, message)
+      exception.io.metas.each do |name, values|
+        values.each { |value| response.add_field(name, value) }
+      end
+
+      response_error!(response)
+    rescue => exception
+      request_error!(exception)
     end
 
-
-
+    # Converts the open-uri result file into a Tempfile if it isn't already,
+    # and makes sure the Tempfile has the correct file extension.
+    def ensure_tempfile(open_uri_file)
+      extension = File.extname(open_uri_file.base_uri.path)
+      tempfile = Tempfile.new(["down-net_http", extension], binmode: true)
+
+      if open_uri_file.is_a?(Tempfile)
+        # Windows requires file descriptors to be closed before files are moved
+        open_uri_file.close
+        tempfile.close
+        FileUtils.mv open_uri_file.path, tempfile.path
+      else # open-uri returns a StringIO when there is less than 10KB of content
+        IO.copy_stream(open_uri_file, tempfile)
+        open_uri_file.close
+      end
+
+      tempfile.open
+      OpenURI::Meta.init tempfile, open_uri_file # adds open-uri methods
+
+      tempfile
+    end
+
+    def net_http_request(uri, options, follows_remaining: options.fetch(:max_redirects, 2), &block)
+      http, request = create_net_http(uri, options)
 
       begin
-
-
-
-
+        response = http.start do
+          http.request(request) do |response|
+            unless response.is_a?(Net::HTTPRedirection)
+              yield response
+              response.instance_variable_set("@read", true) # mark response as read
+            end
+          end
+        end
+      rescue => exception
+        request_error!(exception)
+      end
+
+      if response.is_a?(Net::HTTPRedirection)
+        raise Down::TooManyRedirects if follows_remaining == 0
+
+        location = URI.parse(response["Location"])
+        location = uri + location if location.relative?
+
+        net_http_request(location, options, follows_remaining: follows_remaining - 1, &block)
       end
+    end
 
+    def create_net_http(uri, options)
       http_class = Net::HTTP
 
       if options[:proxy]
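The redirect handling added in `net_http_request` above can be tried without a network connection. The following is a minimal sketch, not the gem's code: `SketchTooManyRedirects`, `resolve_location`, `follow_redirects` and the block-based fetcher are hypothetical names introduced here for illustration. It mirrors two steps from the diff: resolving a relative `Location` against the requested URI, and decrementing `follows_remaining` on each hop.

```ruby
require "uri"

# Hypothetical stand-in for Down::TooManyRedirects.
class SketchTooManyRedirects < StandardError; end

# Resolves a Location header value against the URI that was requested,
# the same way the diff does with `uri + location`.
def resolve_location(uri, location_header)
  location = URI.parse(location_header)
  location = uri + location if location.relative?
  location
end

# Follows redirects using a caller-supplied block (an assumption of this
# sketch) that returns [status, location_header] for a given URI.
def follow_redirects(uri, follows_remaining: 2, &fetch)
  status, location_header = fetch.call(uri)
  return uri unless (300..399).cover?(status)

  raise SketchTooManyRedirects, "too many redirects" if follows_remaining == 0

  next_uri = resolve_location(uri, location_header)
  follow_redirects(next_uri, follows_remaining: follows_remaining - 1, &fetch)
end
```

Passing the counter down as a keyword argument keeps each call stateless, which is presumably why the diff recurses here, whereas `open_uri` mutates a local and uses `retry`.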
@@ -154,62 +214,21 @@ module Down
       get = Net::HTTP::Get.new(uri.request_uri, request_headers)
       get.basic_auth(uri.user, uri.password) if uri.user || uri.password
 
-
-        http.start do
-          http.request(get) do |response|
-            Fiber.yield response
-            response.instance_variable_set("@read", true)
-          end
-        end
-      end
-
-      begin
-        response = request.resume
-      rescue => exception
-        request_error!(exception)
-      end
-
-      response_error!(response) unless (200..299).cover?(response.code.to_i)
-
-      body_chunks = Enumerator.new do |yielder|
-        begin
-          response.read_body { |chunk| yielder << chunk }
-        rescue => exception
-          request_error!(exception)
-        end
-      end
-
-      Down::ChunkedIO.new(
-        chunks: body_chunks,
-        size: response["Content-Length"] && response["Content-Length"].to_i,
-        encoding: response.type_params["charset"],
-        rewindable: options.fetch(:rewindable, true),
-        on_close: -> { request.resume }, # close HTTP connnection
-        data: {
-          status: response.code.to_i,
-          headers: response.each_header.inject({}) { |headers, (downcased_name, value)|
-            name = downcased_name.split("-").map(&:capitalize).join("-")
-            headers.merge!(name => value)
-          },
-          response: response,
-        },
-      )
+      [http, get]
     end
 
-
+    def stream_body(response, &block)
+      response.read_body(&block)
+    rescue => exception
+      request_error!(exception)
+    end
 
-    def
-
-
-
-
-
-      else
-        IO.copy_stream(io, tempfile)
-        io.rewind
-      end
-      tempfile.open
-      tempfile
+    def ensure_uri(url)
+      uri = URI(url)
+      raise Down::InvalidUrl, "URL scheme needs to be http or https" unless uri.is_a?(URI::HTTP)
+      uri
+    rescue URI::InvalidURIError => exception
+      raise Down::InvalidUrl, exception.message
     end
 
     def response_error!(response)
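The new `ensure_uri` helper in this hunk rejects anything that does not parse as an HTTP(S) URL. Here is a standalone sketch of the same check, assuming a hypothetical `SketchInvalidUrl` class in place of `Down::InvalidUrl` and a hypothetical method name `ensure_http_uri`. It relies on `URI::HTTPS` being a subclass of `URI::HTTP` in Ruby's standard library, so one `is_a?` test covers both schemes.

```ruby
require "uri"

# Hypothetical stand-in for Down::InvalidUrl.
class SketchInvalidUrl < StandardError; end

# Parses the URL and accepts it only if it is http or https.
# Parse failures are re-raised as the same error class, as in the diff.
def ensure_http_uri(url)
  uri = URI(url)
  raise SketchInvalidUrl, "URL scheme needs to be http or https" unless uri.is_a?(URI::HTTP)
  uri
rescue URI::InvalidURIError => exception
  raise SketchInvalidUrl, exception.message
end
```

With this shape, callers only need to rescue one error class, whether the input was unparseable or used an unsupported scheme such as `ftp`.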
@@ -227,7 +246,7 @@ module Down
 
     def request_error!(exception)
       case exception
-      when
+      when Net::OpenTimeout
         raise Down::TimeoutError, "timed out waiting for connection to open"
       when Net::ReadTimeout
         raise Down::TimeoutError, "timed out while reading data"
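The `request_error!` hunk above maps low-level `Net::HTTP` timeouts onto a single library error. The sketch below reproduces only the two branches visible in the diff; `SketchTimeoutError` and `request_error_sketch` are hypothetical names, and re-raising everything else unchanged is an assumption of this sketch, since the remaining branches are outside the hunk.

```ruby
require "net/http"

# Hypothetical stand-in for Down::TimeoutError.
class SketchTimeoutError < StandardError; end

# Translates Net::HTTP timeout exceptions into one error class.
# Any other exception is re-raised as-is (assumption, see lead-in).
def request_error_sketch(exception)
  case exception
  when Net::OpenTimeout
    raise SketchTimeoutError, "timed out waiting for connection to open"
  when Net::ReadTimeout
    raise SketchTimeoutError, "timed out while reading data"
  else
    raise exception
  end
end
```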
data/lib/down/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: down
 version: !ruby/object:Gem::Version
-  version: 4.
+  version: 4.2.0
 platform: ruby
 authors:
 - Janko Marohnić
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-
+date: 2017-12-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: minitest
@@ -101,6 +101,7 @@ executables: []
 extensions: []
 extra_rdoc_files: []
 files:
+- CHANGELOG.md
 - LICENSE.txt
 - README.md
 - down.gemspec