mechanize 2.1.1 → 2.2
- data.tar.gz.sig +0 -0
- data/CHANGELOG.rdoc +36 -1
- data/EXAMPLES.rdoc +23 -18
- data/GUIDE.rdoc +10 -5
- data/Manifest.txt +4 -0
- data/Rakefile +2 -1
- data/lib/mechanize.rb +88 -18
- data/lib/mechanize/file_request.rb +4 -0
- data/lib/mechanize/file_saver.rb +3 -3
- data/lib/mechanize/http/agent.rb +155 -114
- data/lib/mechanize/image.rb +6 -0
- data/lib/mechanize/page.rb +38 -3
- data/lib/mechanize/page/image.rb +160 -10
- data/lib/mechanize/page/link.rb +5 -0
- data/lib/mechanize/page/meta_refresh.rb +28 -25
- data/lib/mechanize/pluggable_parsers.rb +28 -14
- data/lib/mechanize/util.rb +6 -0
- data/test/htdocs/tc_links.html +2 -0
- data/test/test_mechanize.rb +39 -10
- data/test/test_mechanize_directory_saver.rb +49 -0
- data/test/test_mechanize_file_request.rb +14 -8
- data/test/test_mechanize_http_agent.rb +391 -370
- data/test/test_mechanize_image.rb +8 -0
- data/test/test_mechanize_link.rb +8 -0
- data/test/test_mechanize_page.rb +11 -10
- data/test/test_mechanize_page_image.rb +183 -0
- data/test/test_mechanize_page_meta_refresh.rb +20 -4
- data/test/test_mechanize_pluggable_parser.rb +15 -0
- metadata +56 -27
- metadata.gz.sig +0 -0
data.tar.gz.sig
CHANGED
Binary file
data/CHANGELOG.rdoc
CHANGED
@@ -1,6 +1,41 @@
 = Mechanize CHANGELOG
 
-=== 2.
+=== 2.2 / 2012-02-12
+
+* API changes
+* MetaRefresh#href is not normalized to an absolute URL, but set to the
+original value and resolved later. It is even set to nil when the
+Refresh URL is unspecified or empty.
+
+* Minor enhancements
+* Expose ssl_version from net-http-persistent. Patch by astera.
+* SSL parameters and proxy may now be set at any time. Issue #194 by
+dsisnero.
+* Improved Mechanize::Page with #image_with and #images_with and
+Mechanize::Page::Image various img element attribute accessors, #caption,
+#extname, #mime_type and #fetch. Pull request #173 by kitamomonga
+* Added MIME type parsing for content-types in Mechanize::PluggableParser
+for fine-grained parser choices. Parsers will be chosen based on exact
+match, simplified type or media type in that order. See
+Mechanize::PluggableParser#[]=.
+* Added Mechanize#download which downloads a response body to an IO-like or
+filename.
+* Added Mechanize::DirectorySaver which saves responses in a single
+directory. Issue #187 by yoshie902a.
+* Added Mechanize::Page::Link#noreferrer?
+* The documentation for Mechanize::Page#search and #at now show that both
+XPath and CSS expressions are allowed. Issue #199 by Shane Becker.
+
+* Bug fixes
+* Fixed handling of a HEAD request with Accept-Encoding: gzip. Issue #198
+by Oleg Dashevskii
+* Use #resolve for resolving a Location header value. fixes #197
+* A Refresh value can have whitespaces around the semicolon and equal sign.
+* MetaRefresh#click no longer sends out Referer.
+* A link with an empty href is now resolved correctly where previously the
+query part was dropped.
+
+=== 2.1.1 / 2012-02-03
 
 * Bug fixes
 * Set missing idle_timeout default. Issue #196
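The CHANGELOG says a Refresh value may now have whitespace around the semicolon and equal sign, and that MetaRefresh#href is nil when the Refresh URL is unspecified or empty. A minimal Ruby sketch of that parsing rule follows. `parse_refresh` and `REFRESH_PATTERN` are hypothetical names, and the regular expression is an assumption, not the pattern Mechanize::Page::MetaRefresh actually uses.

```ruby
# Hypothetical helper, not Mechanize's own MetaRefresh code.
# Splits a Refresh value such as "5 ; URL = /next" into a delay and an
# unresolved URL, tolerating whitespace around ";" and "=".
REFRESH_PATTERN = /\A\s*(\d+(?:\.\d*)?)\s*(?:;\s*(?:url\s*=\s*)?(.*?))?\s*\z/im

def parse_refresh(value)
  match = REFRESH_PATTERN.match(value) or return nil

  delay = match[1].to_f
  url = match[2]
  url = nil if url.nil? || url.empty? # unspecified or empty URL => nil

  [delay, url]
end

parse_refresh("5; url=/next")    # => [5.0, "/next"]
parse_refresh("5 ; URL = /next") # => [5.0, "/next"]
parse_refresh("0")               # => [0.0, nil]
parse_refresh("3;")              # => [3.0, nil]
```

A nil URL lines up with the agent.rb change in this release, where `response_follow_meta_refresh` falls back to the current URI (`new_url ? resolve(new_url, page) : uri`).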
data/EXAMPLES.rdoc
CHANGED
@@ -1,8 +1,9 @@
 = Mechanize examples
 
 Note: Several examples show methods chained to the end of do/end blocks.
-
-is the same as { ...
+<code>do...end</code> is the same as curly braces (<code>{...}</code>). For
+example, <code>do ... end.submit</code> is the same as <code>{ ...
+}.submit</code>.
 
 == Google
 
@@ -81,7 +82,8 @@ Upload a file to flickr.
 end
 
 == Pluggable Parsers
-
+
+Lets say you want HTML pages to automatically be parsed with Rubyful Soup.
 This example shows you how:
 
 require 'rubygems'
@@ -115,10 +117,10 @@ Beautiful Soup for that page.
 
 == The transact method
 
-transact runs the given block and then resets the page history. I.e.
-block has been executed, you're back at the original page; no need
-many times to call the back method at the end of a loop (while
-possible exceptions).
+Mechanize#transact runs the given block and then resets the page history. I.e.
+after the block has been executed, you're back at the original page; no need
+count how many times to call the back method at the end of a loop (while
+accounting for possible exceptions).
 
 This example also demonstrates subclassing Mechanize.
 
@@ -154,17 +156,12 @@ This example also demonstrates subclassing Mechanize.
 
 == Client Certificate Authentication (Mutual Auth)
 
-In most cases a client certificate is created as an additional layer of
-for certain websites. The specific case that this was initially
-for automating the download of archived images from a banks
-system. Once the certificate is installed into your
-export it and split the certificate and private key
-
-PKCS #12. You can convert them from p12 to pem format by using the following
-commands:
-
-openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.key -nocerts -nodes
-openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys
+In most cases a client certificate is created as an additional layer of
+security for certain websites. The specific case that this was initially
+tested on was for automating the download of archived images from a banks
+(Wachovia) lockbox system. Once the certificate is installed into your
+browser you will have to export it and split the certificate and private key
+into separate files.
 
 require 'rubygems'
 require 'mechanize'
@@ -185,3 +182,11 @@ openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys
 
 # submit login form
 agent.submit(login_form, login_form.buttons.first)
+
+Exported files are usually in .p12 format (IE 7 & Firefox 2.0) which stands
+for PKCS #12. You can convert them from p12 to pem format by using the
+following commands:
+
+openssl pkcs12 -in input_file.p12 -clcerts -out example.key -nocerts -nodes
+openssl pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys
+
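The openssl commands in this example produce example.key and example.cer, and in this release Mechanize::HTTP::Agent#private_key= accepts either an OpenSSL::PKey::PKey or a path, reading a path with `OpenSSL::PKey::RSA.new File.read(private_key), @pass`. A minimal sketch of that path-or-object logic with Ruby's openssl library follows. `load_private_key` is a hypothetical helper, and the generated key and the passphrase 'secret' are stand-ins for a real exported file.

```ruby
require 'openssl'
require 'tempfile'

# Hypothetical helper mirroring the "key object or path" handling in
# Mechanize::HTTP::Agent#private_key=. A key written with -nodes is
# unencrypted, in which case +pass+ can stay nil.
def load_private_key(key_or_path, pass = nil)
  return key_or_path if key_or_path.is_a?(OpenSSL::PKey::PKey)

  OpenSSL::PKey::RSA.new(File.read(key_or_path), pass)
end

# Stand-in for example.key: a freshly generated RSA key, exported as PEM
# and encrypted with an assumed passphrase.
key = OpenSSL::PKey::RSA.generate(2048)
pem = key.export(OpenSSL::Cipher.new('aes-128-cbc'), 'secret')

loaded = Tempfile.create(['example', '.key']) do |file|
  file.write(pem)
  file.flush
  load_private_key(file.path, 'secret')
end

puts loaded.n == key.n # same modulus, so the same key was read back
```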
data/GUIDE.rdoc
CHANGED
@@ -130,7 +130,7 @@ In this section, I want to touch on using the different types in input fields
 possible with a form. Password and textarea fields can be treated just like
 text input fields. Select fields are very similar to text fields, but they
 have many options associated with them. If you select one option, mechanize
-will
+will de-select the other options (unless it is a multi select!).
 
 For example, lets select an option on a list:
 
@@ -154,10 +154,15 @@ tell it what file name you want to upload:
 
 == Scraping Data
 
-Mechanize uses nokogiri[http://nokogiri.org/] to parse
-
-
-
+Mechanize uses nokogiri[http://nokogiri.org/] to parse HTML. What does this
+mean for you? You can treat a mechanize page like an nokogiri object. After
+you have used Mechanize to navigate to the page that you need to scrape, then
+scrape it using nokogiri methods:
+
+agent.get('http://someurl.com/').search("p.posted")
+
+The expression given to Mechanize::Page#search may be a CSS expression or an
+XPath expression:
 
 agent.get('http://someurl.com/').search(".//p[@class='posted']")
 
data/Manifest.txt
CHANGED
@@ -47,6 +47,7 @@ lib/mechanize/http/auth_challenge.rb
 lib/mechanize/http/auth_realm.rb
 lib/mechanize/http/content_disposition_parser.rb
 lib/mechanize/http/www_authenticate_parser.rb
+lib/mechanize/image.rb
 lib/mechanize/monkey_patch.rb
 lib/mechanize/page.rb
 lib/mechanize/page/base.rb
@@ -144,16 +145,19 @@ test/test_mechanize_http_auth_challenge.rb
 test/test_mechanize_http_auth_realm.rb
 test/test_mechanize_http_content_disposition_parser.rb
 test/test_mechanize_http_www_authenticate_parser.rb
+test/test_mechanize_image.rb
 test/test_mechanize_link.rb
 test/test_mechanize_page.rb
 test/test_mechanize_page_encoding.rb
 test/test_mechanize_page_frame.rb
+test/test_mechanize_page_image.rb
 test/test_mechanize_page_link.rb
 test/test_mechanize_page_meta_refresh.rb
 test/test_mechanize_parser.rb
 test/test_mechanize_pluggable_parser.rb
 test/test_mechanize_redirect_limit_reached_error.rb
 test/test_mechanize_redirect_not_get_or_head_error.rb
+test/test_mechanize_response_read_error.rb
 test/test_mechanize_subclass.rb
 test/test_mechanize_util.rb
 test/test_multi_select.rb
data/Rakefile
CHANGED
@@ -17,7 +17,8 @@ hoe = Hoe.spec 'mechanize' do
 rdoc_locations << 'drbrain@rubyforge.org:/var/www/gforge-projects/mechanize/'
 
 self.extra_deps << ['net-http-digest_auth', '~> 1.1', '>= 1.1.1']
-self.extra_deps << ['net-http-persistent', '~> 2.
+self.extra_deps << ['net-http-persistent', '~> 2.5', '>= 2.5.2']
+self.extra_deps << ['mime-types', '~> 1.17', '>= 1.17.2']
 self.extra_deps << ['nokogiri', '~> 1.4']
 self.extra_deps << ['ntlm-http', '~> 0.1', '>= 0.1.1']
 self.extra_deps << ['webrobots', '~> 0.0', '>= 0.0.9']
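The new mime-types dependency backs the CHANGELOG entry on MIME type parsing in Mechanize::PluggableParser: a parser is chosen by exact match, then simplified type, then media type. The stdlib-only sketch below illustrates that lookup order under stated assumptions. `simplify` and `choose_parser` are hypothetical helpers, the parser values are placeholder symbols, and "simplified" is approximated by lowercasing, dropping parameters and stripping an "x-" prefix; this may not match mime-types' own rules in every case.

```ruby
# Approximation of a "simplified" content type (assumption, not mime-types):
# lowercase, drop parameters such as "; charset=utf-8", strip "x-" prefixes.
def simplify(content_type)
  type = content_type.split(';', 2).first.strip.downcase
  type.split('/', 2).map { |part| part.sub(/\Ax-/, '') }.join('/')
end

# Hypothetical lookup: exact match, then simplified type, then media type,
# then the default parser.
def choose_parser(parsers, content_type, default)
  return default if content_type.nil? || content_type.strip.empty?

  simplified = simplify(content_type)
  media_type = simplified.split('/', 2).first

  parsers[content_type] || parsers[simplified] || parsers[media_type] || default
end

parsers = {
  'text/html'       => :html_page,
  'application/pdf' => :pdf_saver,
  'image'           => :image,
}

choose_parser(parsers, 'text/html; charset=utf-8', :file) # => :html_page
choose_parser(parsers, 'application/x-pdf', :file)        # => :pdf_saver
choose_parser(parsers, 'image/png', :file)                # => :image
choose_parser(parsers, 'application/zip', :file)          # => :file
```

Registering a bare media type such as 'image' acts as a catch-all for every image subtype that has no more specific entry.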
data/lib/mechanize.rb
CHANGED
@@ -1,6 +1,7 @@
 require 'fileutils'
 require 'forwardable'
 require 'iconv' if RUBY_VERSION < '1.9.2'
+require 'mime/types'
 require 'mutex_m'
 require 'net/http/digest_auth'
 require 'net/http/persistent'
@@ -72,7 +73,7 @@ class Mechanize
 ##
 # The version of Mechanize you are using.
 
-VERSION = '2.
+VERSION = '2.2'
 
 ##
 # Base mechanize error class
@@ -176,7 +177,6 @@ class Mechanize
 yield self if block_given?
 
 @agent.set_proxy @proxy_addr, @proxy_port, @proxy_user, @proxy_pass
-@agent.set_http
 end
 
 # :section: History
@@ -293,7 +293,7 @@ class Mechanize
 raise RobotsDisallowedError.new(link.href)
 end
 end
-if link.
+if link.noreferrer?
 href = @agent.resolve(link.href, link.page || current_page)
 referer = Page.new(nil, {'content-type'=>'text/html'})
 else
@@ -319,6 +319,47 @@ class Mechanize
 end
 end
 
+##
+# GETs +uri+ and writes it to +io_or_filename+ without recording the request
+# in the history. If +io_or_filename+ does not respond to #write it will be
+# used as a file name. +parameters+, +referer+ and +headers+ are used as in
+# #get.
+#
+# By default, if the Content-type of the response matches a Mechanize::File
+# or Mechanize::Page parser, the response body will be loaded into memory
+# before being saved. See #pluggable_parser for details on changing this
+# default.
+#
+# For alternate ways of downloading files see Mechanize::FileSaver and
+# Mechanize::DirectorySaver.
+
+def download uri, io_or_filename, parameters = [], referer = nil, headers = {}
+page = transact do
+get uri, parameters, referer, headers
+end
+
+io = if io_or_filename.respond_to? :write then
+io_or_filename
+else
+open io_or_filename, 'wb'
+end
+
+case page
+when Mechanize::File then
+io.write page.body
+else
+body_io = page.body_io
+
+until body_io.eof? do
+io.write body_io.read 16384
+end
+end
+
+page
+ensure
+io.close if io and not io_or_filename.respond_to? :write
+end
+
 ##
 # DELETE +uri+ with +query_params+, and setting +headers+:
 #
@@ -341,18 +382,20 @@ class Mechanize
 
 referer ||=
 if uri.to_s =~ %r{\Ahttps?://}
-Page.new(nil,
+Page.new(nil, 'content-type' => 'text/html')
 else
-current_page || Page.new(nil,
+current_page || Page.new(nil, 'content-type' => 'text/html')
 end
 
 # FIXME: Huge hack so that using a URI as a referer works. I need to
 # refactor everything to pass around URIs but still support
 # Mechanize::Page#base
 unless Mechanize::Parser === referer then
-referer = referer.is_a?(String)
-
-
+referer = if referer.is_a?(String) then
+Page.new URI(referer), 'content-type' => 'text/html'
+else
+Page.new referer, 'content-type' => 'text/html'
+end
 end
 
 # fetch the page
@@ -371,14 +414,15 @@ class Mechanize
 end
 
 ##
-# HEAD +uri+ with +query_params
+# HEAD +uri+ with +query_params+ and +headers+:
 #
 # head('http://example/', {'q' => 'foo'}, {})
 
 def head(uri, query_params = {}, headers = {})
-
-
+page = @agent.fetch uri, :head, headers, query_params
+
 yield page if block_given?
+
 page
 end
 
@@ -537,6 +581,18 @@ class Mechanize
 
 attr_accessor :keep_alive_time
 
+##
+# The pluggable parser maps a response Content-Type to a parser class. The
+# registered Content-Type may be either a full content type like 'image/png'
+# or a media type 'text'. See Mechanize::PluggableParser for further
+# details.
+#
+# Example:
+#
+# agent.pluggable_parser['application/octet-stream'] = Mechanize::Download
+
+attr_reader :pluggable_parser
+
 ##
 # The HTTP proxy address
 
@@ -907,7 +963,7 @@ class Mechanize
 # certificate instance
 
 def cert= cert
-@agent.
+@agent.certificate = cert
 end
 
 ##
@@ -962,10 +1018,11 @@ class Mechanize
 end
 
 ##
-# Sets the OpenSSL client +key+ to the given path or key instance
+# Sets the OpenSSL client +key+ to the given path or key instance. If a
+# path is given, the path must contain an RSA key file.
 
 def key= key
-@agent.
+@agent.private_key = key
 end
 
 ##
@@ -982,6 +1039,21 @@ class Mechanize
 @agent.pass = pass
 end
 
+##
+# SSL version to use. Ruby 1.9 and newer only.
+
+def ssl_version
+@agent.ssl_version
+end if RUBY_VERSION > '1.9'
+
+##
+# Sets the SSL version to use to +version+ without client/server
+# negotiation. Ruby 1.9 and newer only.
+
+def ssl_version= ssl_version
+@agent.ssl_version = ssl_version
+end if RUBY_VERSION > '1.9'
+
 ##
 # A callback for additional certificate verification. See
 # OpenSSL::SSL::SSLContext#verify_callback
@@ -1021,8 +1093,6 @@ class Mechanize
 
 attr_reader :agent # :nodoc:
 
-attr_reader :pluggable_parser # :nodoc:
-
 ##
 # Parses the +body+ of the +response+ from +uri+ using the pluggable parser
 # that matches its content type
@@ -1035,7 +1105,6 @@ class Mechanize
 content_type, = data.downcase.split ',', 2 unless data.nil?
 end
 
-# Find our pluggable parser
 parser_klass = @pluggable_parser.parser content_type
 
 unless parser_klass <= Mechanize::Download then
@@ -1074,7 +1143,6 @@ class Mechanize
 @proxy_pass = password
 
 @agent.set_proxy address, port, user, password
-@agent.set_http
 end
 
 private
@@ -1116,6 +1184,7 @@ require 'mechanize/cookie'
 require 'mechanize/cookie_jar'
 require 'mechanize/parser'
 require 'mechanize/download'
+require 'mechanize/directory_saver'
 require 'mechanize/file'
 require 'mechanize/file_connection'
 require 'mechanize/file_request'
@@ -1128,6 +1197,7 @@ require 'mechanize/http/auth_challenge'
 require 'mechanize/http/auth_realm'
 require 'mechanize/http/content_disposition_parser'
 require 'mechanize/http/www_authenticate_parser'
+require 'mechanize/image'
 require 'mechanize/page'
 require 'mechanize/monkey_patch'
 require 'mechanize/pluggable_parsers'
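Mechanize#download writes to +io_or_filename+ directly when it responds to #write, otherwise opens it as a file, and streams non-Mechanize::File bodies in 16384-byte reads, closing only the file it opened itself. The sketch below reproduces that shape with StringIO so it runs without the gem. `copy_body` and `CHUNK_SIZE` are hypothetical names, and the sketch copies from any IO-like +body_io+ rather than a Mechanize page.

```ruby
require 'stringio'

CHUNK_SIZE = 16384 # the read size used by Mechanize#download in this diff

# Hypothetical helper with the same IO-or-filename handling as
# Mechanize#download: write to an IO-like object if given one, otherwise
# treat the argument as a path, and copy the body in fixed-size chunks.
def copy_body(body_io, io_or_filename)
  io = if io_or_filename.respond_to?(:write)
         io_or_filename
       else
         File.open(io_or_filename, 'wb')
       end

  until body_io.eof?
    io.write body_io.read(CHUNK_SIZE)
  end

  io_or_filename
ensure
  # Close only a file this helper opened; callers own IO objects they pass in.
  io.close if io && !io_or_filename.respond_to?(:write)
end

body = StringIO.new('x' * 40_000)
out = StringIO.new
copy_body(body, out)
out.string.bytesize # => 40000
```

The sketch uses File.open where the diff calls Kernel#open; Kernel#open treats a name starting with "|" as a command to run, so File.open is the more conservative choice when the file name is not fully trusted.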
data/lib/mechanize/file_saver.rb
CHANGED
@@ -1,7 +1,7 @@
 ##
 # This is a pluggable parser that automatically saves every file it
-# encounters.
-# path.
+# encounters. Unlike Mechanize::DirectorySaver, the file saver saves the
+# responses as a tree, reflecting the host and file path.
 #
 # == Example
 #
@@ -11,7 +11,7 @@
 #
 # agent = Mechanize.new
 # agent.pluggable_parser.pdf = Mechanize::FileSaver
-# agent.get
+# agent.get 'http://example.com/foo.pdf'
 #
 # Dir['example.com/*'] # => foo.pdf
 
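The updated FileSaver comment contrasts a tree layout (host plus file path, as in `Dir['example.com/*'] # => foo.pdf`) with Mechanize::DirectorySaver, which keeps every response in a single directory. A hypothetical sketch of the two path mappings follows. `tree_path` and `flat_path` are illustrative names, not Mechanize methods, and a real saver also has to handle URLs ending in "/" and file name collisions, which this sketch ignores.

```ruby
require 'uri'

# Tree layout in the style of FileSaver: host, then the URL path.
def tree_path(uri)
  uri = URI(uri)
  File.join(uri.host, uri.path)
end

# Flat layout in the style of DirectorySaver: only the file name, placed in
# one directory.
def flat_path(uri, directory)
  File.join(directory, File.basename(URI(uri).path))
end

tree_path('http://example.com/foo.pdf')               # => "example.com/foo.pdf"
tree_path('http://example.com/docs/foo.pdf')          # => "example.com/docs/foo.pdf"
flat_path('http://example.com/docs/foo.pdf', 'saved') # => "saved/foo.pdf"
```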
data/lib/mechanize/http/agent.rb
CHANGED
@@ -78,37 +78,11 @@ class Mechanize::HTTP::Agent
 
 # :section: SSL
 
-# Path to an OpenSSL server certificate file
-attr_accessor :ca_file
-
-# An OpenSSL private key or the path to a private key
-attr_accessor :key
-
-# An OpenSSL client certificate or the path to a certificate file.
-attr_accessor :cert
-
-# An SSL certificate store
-attr_accessor :cert_store
-
 # OpenSSL key password
 attr_accessor :pass
 
-# A callback for additional certificate verification. See
-# OpenSSL::SSL::SSLContext#verify_callback
-#
-# The callback can be used for debugging or to ignore errors by always
-# returning +true+. Specifying nil uses the default method that was valid
-# when the SSLContext was created
-attr_accessor :verify_callback
-
-# How to verify SSL connections. Defaults to VERIFY_PEER
-attr_accessor :verify_mode
-
 # :section: Timeouts
 
-# Reset connections that have not been used in this many seconds
-attr_reader :idle_timeout
-
 # Set to false to disable HTTP/1.1 keep-alive requests
 attr_accessor :keep_alive
 
@@ -123,12 +97,6 @@ class Mechanize::HTTP::Agent
 # The cookies for this agent
 attr_accessor :cookie_jar
 
-# URI for a proxy connection
-attr_reader :proxy_uri
-
-# Retry non-idempotent requests?
-attr_reader :retry_change_requests
-
 # Responses larger than this will be written to a Tempfile instead of stored
 # in memory.
 attr_accessor :max_file_buffer
@@ -157,19 +125,15 @@ class Mechanize::HTTP::Agent
 @follow_meta_refresh_self = false
 @gzip_enabled = true
 @history = Mechanize::History.new
-@idle_timeout = 5
 @keep_alive = true
-@keep_alive_time = 300
 @max_file_buffer = 10240
 @open_timeout = nil
 @post_connect_hooks = []
 @pre_connect_hooks = []
-@proxy_uri = nil
 @read_timeout = nil
 @redirect_ok = true
 @redirection_limit = 20
 @request_headers = {}
-@retry_change_requests = false
 @robots = false
 @user_agent = nil
 @webrobots = nil
@@ -188,13 +152,7 @@ class Mechanize::HTTP::Agent
 @domain = nil # NTLM HTTP domain
 
 # SSL
-@
-@cert = nil
-@cert_store = nil
-@key = nil
-@pass = nil
-@verify_callback = nil
-@verify_mode = nil
+@pass = nil
 
 @scheme_handlers = Hash.new { |h, scheme|
 h[scheme] = lambda { |link, page|
@@ -206,6 +164,10 @@ class Mechanize::HTTP::Agent
 @scheme_handlers['https'] = @scheme_handlers['http']
 @scheme_handlers['relative'] = @scheme_handlers['http']
 @scheme_handlers['file'] = @scheme_handlers['http']
+
+@http = Net::HTTP::Persistent.new 'mechanize'
+@http.idle_timeout = 5
+@http.keep_alive = 300
 end
 
 # Retrieves +uri+ and parses it into a page or other object according to
@@ -273,7 +235,8 @@ class Mechanize::HTTP::Agent
 
 hook_content_encoding response, uri, response_body_io
 
-response_body_io = response_content_encoding response, response_body_io
+response_body_io = response_content_encoding response, response_body_io if
+request.response_body_permitted?
 
 post_connect uri, response, response_body_io
 
@@ -306,11 +269,21 @@ class Mechanize::HTTP::Agent
 end
 end
 
+# URI for a proxy connection
+
+def proxy_uri
+@http.proxy_uri
+end
+
+# Retry non-idempotent requests?
+def retry_change_requests
+@http.retry_change_requests
+end
+
 # Retry non-idempotent requests
 
 def retry_change_requests= retri
-@retry_change_requests = retri
-@http.retry_change_requests = retri if @http
+@http.retry_change_requests = retri
 end
 
 # :section: Headers
@@ -568,10 +541,24 @@ class Mechanize::HTTP::Agent
 end
 
 def resolve(uri, referer = current_page)
-
+referer_uri = referer && referer.uri
+if uri.is_a?(URI)
+uri = uri.dup
+elsif uri.nil?
+if referer_uri
+return referer_uri
+end
+raise ArgumentError, "absolute URL needed (not nil)"
+else
+url = uri.to_s.strip
+if url.empty?
+if referer_uri
+return referer_uri.dup.tap { |u| u.fragment = nil }
+end
+raise ArgumentError, "absolute URL needed (not #{uri.inspect})"
+end
 
-
-uri = uri.to_s.strip.gsub(/[^#{0.chr}-#{126.chr}]/o) { |match|
+url.gsub!(/[^#{0.chr}-#{126.chr}]/o) { |match|
 if RUBY_VERSION >= "1.9.0"
 Mechanize::Util.uri_escape(match)
 else
@@ -579,28 +566,25 @@ class Mechanize::HTTP::Agent
 end
 }
 
-
-
-
-escaped_uri = Mechanize::Util.html_unescape(
-unescaped.zip(escaped).map { |x,y|
+escaped_url = Mechanize::Util.html_unescape(
+url.split(/((?:%[0-9A-Fa-f]{2})+|#)/).each_slice(2).map { |x, y|
 "#{WEBrick::HTTPUtils.escape(x)}#{y}"
 }.join('')
 )
 
 begin
-uri = URI.parse(
+uri = URI.parse(escaped_url)
 rescue
-uri = URI.parse(WEBrick::HTTPUtils.escape(
+uri = URI.parse(WEBrick::HTTPUtils.escape(escaped_url))
 end
 end
 
 scheme = uri.relative? ? 'relative' : uri.scheme.downcase
 uri = @scheme_handlers[scheme].call(uri, referer)
 
-if
+if referer_uri
 if uri.path.length == 0 && uri.relative?
-uri.path =
+uri.path = referer_uri.path
 end
 end
 
@@ -608,17 +592,16 @@ class Mechanize::HTTP::Agent
 
 if uri.relative?
 raise ArgumentError, "absolute URL needed (not #{uri})" unless
-
+referer_uri
 
-
-
-base =
+if referer.respond_to?(:bases) && referer.parser &&
+(lbase = referer.bases.last) && lbase.uri && lbase.uri.absolute?
+base = lbase
+else
+base = nil
 end
 
-uri = (
-base.uri :
-referer.uri) + uri
-uri = referer.uri + uri
+uri = referer_uri + (base ? base.uri : referer_uri) + uri
 # Strip initial "/.." bits from the path
 uri.path.sub!(/^(\/\.\.)+(?=\/)/, '')
 end
@@ -791,7 +774,8 @@ class Mechanize::HTTP::Agent
 
 def response_follow_meta_refresh response, uri, page, redirects
 delay, new_url = get_meta_refresh(response, uri, page)
-return nil unless
+return nil unless delay
+new_url = new_url ? resolve(new_url, page) : uri
 
 raise Mechanize::RedirectLimitReachedError.new(page, redirects) if
 redirects + 1 > @redirection_limit
@@ -888,9 +872,8 @@ class Mechanize::HTTP::Agent
 
 redirect_method = method == :head ? :head : :get
 
-
-
-new_uri = from_uri + response['Location'].to_s
+@history.push(page, page.uri)
+new_uri = resolve response['Location'].to_s, page
 
 fetch new_uri, redirect_method, {}, [], referer, redirects + 1
 end
@@ -949,16 +932,104 @@ class Mechanize::HTTP::Agent
 
 # :section: SSL
 
+# Path to an OpenSSL CA certificate file
+def ca_file
+@http.ca_file
+end
+
+# Sets the path to an OpenSSL CA certificate file
+def ca_file= ca_file
+@http.ca_file = ca_file
+end
+
+# The SSL certificate store used for validating connections
+def cert_store
+@http.cert_store
+end
+
+# Sets the SSL certificate store used for validating connections
+def cert_store= cert_store
+@http.cert_store = cert_store
+end
+
+# The client X509 certificate
 def certificate
 @http.certificate
 end
 
+# Sets the client certificate to given X509 certificate. If a path is given
+# the certificate will be loaded and set.
+def certificate= certificate
+certificate = if OpenSSL::X509::Certificate === certificate then
+certificate
+else
+OpenSSL::X509::Certificate.new File.read certificate
+end
+
+@http.certificate = certificate
+end
+
+# An OpenSSL private key or the path to a private key
+def private_key
+@http.private_key
+end
+
+# Sets the client's private key
+def private_key= private_key
+private_key = if OpenSSL::PKey::PKey === private_key then
+private_key
+else
+OpenSSL::PKey::RSA.new File.read(private_key), @pass
+end
+
+@http.private_key = private_key
+end
+
+# SSL version to use
+def ssl_version
+@http.ssl_version
+end if RUBY_VERSION > '1.9'
+
+# Sets the SSL version to use
+def ssl_version= ssl_version
+@http.ssl_version = ssl_version
+end if RUBY_VERSION > '1.9'
+
+# A callback for additional certificate verification. See
+# OpenSSL::SSL::SSLContext#verify_callback
+#
+# The callback can be used for debugging or to ignore errors by always
+# returning +true+. Specifying nil uses the default method that was valid
+# when the SSLContext was created
+def verify_callback
+@http.verify_callback
+end
+
+# Sets the certificate verify callback
+def verify_callback= verify_callback
+@http.verify_callback = verify_callback
+end
+
+# How to verify SSL connections. Defaults to VERIFY_PEER
+def verify_mode
+@http.verify_mode
+end
+
+# Sets the mode for verifying SSL connections
+def verify_mode= verify_mode
+@http.verify_mode = verify_mode
+end
+
 # :section: Timeouts
 
-#
+# Reset connections that have not been used in this many seconds
+def idle_timeout
+@http.idle_timeout
+end
+
+# Sets the connection idle timeout for persistent connections
 def idle_timeout= timeout
-@idle_timeout = timeout
-@http.idle_timeout = timeout if @http
+@http.idle_timeout = timeout
 end
 
 # :section: Utility
@@ -980,47 +1051,17 @@ class Mechanize::HTTP::Agent
 @context.log
 end
 
-def set_http
-@http = Net::HTTP::Persistent.new 'mechanize', @proxy_uri
-
-@http.keep_alive = @keep_alive_time
-@http.idle_timeout = @idle_timeout if @idle_timeout
-@http.retry_change_requests = @retry_change_requests
-
-@http.ca_file = @ca_file
-@http.cert_store = @cert_store if @cert_store
-@http.verify_callback = @verify_callback
-@http.verify_mode = @verify_mode if @verify_mode
-
-# update our cached value
-@verify_mode = @http.verify_mode
-@cert_store = @http.cert_store
-
-if @cert and @key then
-cert = if OpenSSL::X509::Certificate === @cert then
-@cert
-else
-OpenSSL::X509::Certificate.new ::File.read @cert
-end
-
-key = if OpenSSL::PKey::PKey === @key then
-@key
-else
-OpenSSL::PKey::RSA.new ::File.read(@key), @pass
-end
-
-@http.certificate = cert
-@http.private_key = key
-end
-end
-
 ##
 # Sets the proxy address, port, user, and password +addr+ should be a host,
 # with no "http://", +port+ may be a port number, service name or port
 # number string.
 
-def set_proxy
-
+def set_proxy addr, port, user = nil, pass = nil
+unless addr and port then
+@http.proxy = nil
+
+return
+end
 
 unless Integer === port then
 begin
@@ -1034,12 +1075,12 @@ class Mechanize::HTTP::Agent
 end
 end
 
-
-
-
-
+proxy_uri = URI "http://#{addr}"
+proxy_uri.port = port
+proxy_uri.user = user if user
+proxy_uri.password = pass if pass
 
-@proxy_uri
+@http.proxy = proxy_uri
 end
 
 end
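The rewritten Mechanize::HTTP::Agent#set_proxy clears the proxy when either address or port is missing, and otherwise builds an http:// URI from the host, port and optional credentials before handing it to net-http-persistent. A minimal stdlib sketch of just the URI construction follows. `build_proxy_uri` is a hypothetical name, it returns nil instead of clearing a proxy, and it accepts integer ports only; the service-name lookup that set_proxy performs for non-Integer ports is elided in this diff and left out.

```ruby
require 'uri'

# Hypothetical helper mirroring the URI construction in set_proxy.
# Returns nil when addr or port is missing (set_proxy clears the proxy then).
def build_proxy_uri(addr, port, user = nil, pass = nil)
  return nil unless addr && port

  proxy_uri = URI("http://#{addr}")
  proxy_uri.port = port
  proxy_uri.user = user if user
  proxy_uri.password = pass if pass # URI requires the user to be set first
  proxy_uri
end

build_proxy_uri('proxy.example', 8080).to_s
# => "http://proxy.example:8080"
build_proxy_uri('proxy.example', 3128, 'bob', 'secret').to_s
# => "http://bob:secret@proxy.example:3128"
build_proxy_uri(nil, nil)
# => nil
```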
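The CHANGELOG's fix for links with an empty href corresponds to the new branch in Mechanize::HTTP::Agent#resolve: a stripped, empty URL returns a copy of the referer's URI with only the fragment cleared, so the query string is kept, and raises ArgumentError when there is no referer. The sketch isolates that branch with Ruby's URI library; `resolve_empty_href` is a hypothetical name and handles only the empty-href case.

```ruby
require 'uri'

# Hypothetical helper isolating the empty-href branch of resolve.
def resolve_empty_href(href, referer_uri)
  url = href.to_s.strip
  raise ArgumentError, 'expected an empty href' unless url.empty?
  raise ArgumentError, "absolute URL needed (not #{href.inspect})" unless referer_uri

  # dup so the referer's own URI keeps its fragment
  referer_uri.dup.tap { |u| u.fragment = nil }
end

referer = URI('http://example.com/search?q=ruby#results')
resolve_empty_href('', referer).to_s # => "http://example.com/search?q=ruby"
referer.to_s                         # => "http://example.com/search?q=ruby#results"
```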