mechanize 2.1.1 → 2.2


data.tar.gz.sig CHANGED
Binary file
data/CHANGELOG.rdoc CHANGED
@@ -1,6 +1,41 @@
1
1
  = Mechanize CHANGELOG
2
2
 
3
- === 2.1.1 / 2010-02-03
3
+ === 2.2 / 2012-02-12
4
+
5
+ * API changes
6
+ * MetaRefresh#href is not normalized to an absolute URL, but set to the
7
+ original value and resolved later. It is even set to nil when the
8
+ Refresh URL is unspecified or empty.
9
+
10
+ * Minor enhancements
11
+ * Expose ssl_version from net-http-persistent. Patch by astera.
12
+ * SSL parameters and proxy may now be set at any time. Issue #194 by
13
+ dsisnero.
14
+ * Improved Mechanize::Page with #image_with and #images_with and
15
+ Mechanize::Page::Image various img element attribute accessors, #caption,
16
+ #extname, #mime_type and #fetch. Pull request #173 by kitamomonga
17
+ * Added MIME type parsing for content-types in Mechanize::PluggableParser
18
+ for fine-grained parser choices. Parsers will be chosen based on exact
19
+ match, simplified type or media type in that order. See
20
+ Mechanize::PluggableParser#[]=.
21
+ * Added Mechanize#download which downloads a response body to an IO-like or
22
+ filename.
23
+ * Added Mechanize::DirectorySaver which saves responses in a single
24
+ directory. Issue #187 by yoshie902a.
25
+ * Added Mechanize::Page::Link#noreferrer?
26
+ * The documentation for Mechanize::Page#search and #at now show that both
27
+ XPath and CSS expressions are allowed. Issue #199 by Shane Becker.
28
+
29
+ * Bug fixes
30
+ * Fixed handling of a HEAD request with Accept-Encoding: gzip. Issue #198
31
+ by Oleg Dashevskii
32
+ * Use #resolve for resolving a Location header value. fixes #197
33
+ * A Refresh value can have whitespaces around the semicolon and equal sign.
34
+ * MetaRefresh#click no longer sends out Referer.
35
+ * A link with an empty href is now resolved correctly where previously the
36
+ query part was dropped.
37
+
38
+ === 2.1.1 / 2012-02-03
4
39
 
5
40
  * Bug fixes
6
41
  * Set missing idle_timeout default. Issue #196
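
The changelog entry on MIME type parsing says parsers are chosen by exact match, then simplified type, then media type. The sketch below illustrates that lookup order only. `ParserRegistry`, the symbols used as parsers, and the definition of "simplified" (lowercased, parameters dropped, `x-` prefixes removed) are assumptions made for illustration, not the Mechanize::PluggableParser implementation.

```ruby
# Hypothetical stand-in for the lookup order described in the changelog.
# This is not Mechanize::PluggableParser; all names are illustrative.
class ParserRegistry
  def initialize(default)
    @default = default
    @parsers = {}
  end

  # Register a parser under a full content type ('image/png') or a bare
  # media type ('text').
  def []=(content_type, parser)
    @parsers[content_type] = parser
  end

  # Look up by exact match, then simplified type, then media type, and
  # fall back to the default when nothing is registered.
  def parser(content_type)
    return @default if content_type.nil?

    exact = content_type.strip
    simplified = simplify(exact)
    media = simplified.split('/', 2).first

    @parsers[exact] || @parsers[simplified] || @parsers[media] || @default
  end

  private

  # Assumption: "simplified" means lowercased, parameters dropped and
  # "x-" prefixes removed, e.g. "Text/X-Foo; charset=utf-8" -> "text/foo".
  def simplify(content_type)
    type = content_type.split(';', 2).first.to_s.strip.downcase
    type.split('/', 2).map { |part| part.sub(/\Ax-/, '') }.join('/')
  end
end

registry = ParserRegistry.new(:file)
registry['text/html'] = :page
registry['text'] = :plain_text
registry['application/pdf'] = :saver

registry.parser('text/html')         # exact match
registry.parser('Application/X-PDF') # simplified type
registry.parser('text/csv')          # media type
registry.parser('image/png')         # default
```

Registering a bare media type such as 'text' acts as a catch-all for that family, while a full content type still takes precedence when both are present.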
data/EXAMPLES.rdoc CHANGED
@@ -1,8 +1,9 @@
1
1
  = Mechanize examples
2
2
 
3
3
  Note: Several examples show methods chained to the end of do/end blocks.
4
- Do...end is the same as curly braces ({...}). For example, do ... end.submit
5
- is the same as { ... }.submit.
4
+ <code>do...end</code> is the same as curly braces (<code>{...}</code>). For
5
+ example, <code>do ... end.submit</code> is the same as <code>{ ...
6
+ }.submit</code>.
6
7
 
7
8
  == Google
8
9
 
@@ -81,7 +82,8 @@ Upload a file to flickr.
81
82
  end
82
83
 
83
84
  == Pluggable Parsers
84
- Lets say you want html pages to automatically be parsed with Rubyful Soup.
85
+
86
+ Lets say you want HTML pages to automatically be parsed with Rubyful Soup.
85
87
  This example shows you how:
86
88
 
87
89
  require 'rubygems'
@@ -115,10 +117,10 @@ Beautiful Soup for that page.
115
117
 
116
118
  == The transact method
117
119
 
118
- transact runs the given block and then resets the page history. I.e. after the
119
- block has been executed, you're back at the original page; no need count how
120
- many times to call the back method at the end of a loop (while accounting for
121
- possible exceptions).
120
+ Mechanize#transact runs the given block and then resets the page history. I.e.
121
+ after the block has been executed, you're back at the original page; no need
122
+ count how many times to call the back method at the end of a loop (while
123
+ accounting for possible exceptions).
122
124
 
123
125
  This example also demonstrates subclassing Mechanize.
124
126
 
@@ -154,17 +156,12 @@ This example also demonstrates subclassing Mechanize.
154
156
 
155
157
  == Client Certificate Authentication (Mutual Auth)
156
158
 
157
- In most cases a client certificate is created as an additional layer of security
158
- for certain websites. The specific case that this was initially tested on was
159
- for automating the download of archived images from a banks (Wachovia) lockbox
160
- system. Once the certificate is installed into your browser you will have to
161
- export it and split the certificate and private key into separate files.
162
- Exported files are usually in .p12 format (IE 7 & Firefox 2.0) which stands for
163
- PKCS #12. You can convert them from p12 to pem format by using the following
164
- commands:
165
-
166
- openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.key -nocerts -nodes
167
- openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys
159
+ In most cases a client certificate is created as an additional layer of
160
+ security for certain websites. The specific case that this was initially
161
+ tested on was for automating the download of archived images from a banks
162
+ (Wachovia) lockbox system. Once the certificate is installed into your
163
+ browser you will have to export it and split the certificate and private key
164
+ into separate files.
168
165
 
169
166
  require 'rubygems'
170
167
  require 'mechanize'
@@ -185,3 +182,11 @@ openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys
185
182
 
186
183
  # submit login form
187
184
  agent.submit(login_form, login_form.buttons.first)
185
+
186
+ Exported files are usually in .p12 format (IE 7 & Firefox 2.0) which stands
187
+ for PKCS #12. You can convert them from p12 to pem format by using the
188
+ following commands:
189
+
190
+ openssl pkcs12 -in input_file.p12 -clcerts -out example.key -nocerts -nodes
191
+ openssl pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys
192
+
data/GUIDE.rdoc CHANGED
@@ -130,7 +130,7 @@ In this section, I want to touch on using the different types in input fields
130
130
  possible with a form. Password and textarea fields can be treated just like
131
131
  text input fields. Select fields are very similar to text fields, but they
132
132
  have many options associated with them. If you select one option, mechanize
133
- will deselect the other options (unless it is a multi select!).
133
+ will de-select the other options (unless it is a multi select!).
134
134
 
135
135
  For example, lets select an option on a list:
136
136
 
@@ -154,10 +154,15 @@ tell it what file name you want to upload:
154
154
 
155
155
  == Scraping Data
156
156
 
157
- Mechanize uses nokogiri[http://nokogiri.org/] to parse
158
- html. What does this mean for you? You can treat a mechanize page like
159
- an nokogiri object. After you have used Mechanize to navigate to the page
160
- that you need to scrape, then scrape it using nokogiri methods:
157
+ Mechanize uses nokogiri[http://nokogiri.org/] to parse HTML. What does this
158
+ mean for you? You can treat a mechanize page like an nokogiri object. After
159
+ you have used Mechanize to navigate to the page that you need to scrape, then
160
+ scrape it using nokogiri methods:
161
+
162
+ agent.get('http://someurl.com/').search("p.posted")
163
+
164
+ The expression given to Mechanize::Page#search may be a CSS expression or an
165
+ XPath expression:
161
166
 
162
167
  agent.get('http://someurl.com/').search(".//p[@class='posted']")
163
168
 
data/Manifest.txt CHANGED
@@ -47,6 +47,7 @@ lib/mechanize/http/auth_challenge.rb
47
47
  lib/mechanize/http/auth_realm.rb
48
48
  lib/mechanize/http/content_disposition_parser.rb
49
49
  lib/mechanize/http/www_authenticate_parser.rb
50
+ lib/mechanize/image.rb
50
51
  lib/mechanize/monkey_patch.rb
51
52
  lib/mechanize/page.rb
52
53
  lib/mechanize/page/base.rb
@@ -144,16 +145,19 @@ test/test_mechanize_http_auth_challenge.rb
144
145
  test/test_mechanize_http_auth_realm.rb
145
146
  test/test_mechanize_http_content_disposition_parser.rb
146
147
  test/test_mechanize_http_www_authenticate_parser.rb
148
+ test/test_mechanize_image.rb
147
149
  test/test_mechanize_link.rb
148
150
  test/test_mechanize_page.rb
149
151
  test/test_mechanize_page_encoding.rb
150
152
  test/test_mechanize_page_frame.rb
153
+ test/test_mechanize_page_image.rb
151
154
  test/test_mechanize_page_link.rb
152
155
  test/test_mechanize_page_meta_refresh.rb
153
156
  test/test_mechanize_parser.rb
154
157
  test/test_mechanize_pluggable_parser.rb
155
158
  test/test_mechanize_redirect_limit_reached_error.rb
156
159
  test/test_mechanize_redirect_not_get_or_head_error.rb
160
+ test/test_mechanize_response_read_error.rb
157
161
  test/test_mechanize_subclass.rb
158
162
  test/test_mechanize_util.rb
159
163
  test/test_multi_select.rb
data/Rakefile CHANGED
@@ -17,7 +17,8 @@ hoe = Hoe.spec 'mechanize' do
17
17
  rdoc_locations << 'drbrain@rubyforge.org:/var/www/gforge-projects/mechanize/'
18
18
 
19
19
  self.extra_deps << ['net-http-digest_auth', '~> 1.1', '>= 1.1.1']
20
- self.extra_deps << ['net-http-persistent', '~> 2.4', '>= 2.4.1']
20
+ self.extra_deps << ['net-http-persistent', '~> 2.5', '>= 2.5.2']
21
+ self.extra_deps << ['mime-types', '~> 1.17', '>= 1.17.2']
21
22
  self.extra_deps << ['nokogiri', '~> 1.4']
22
23
  self.extra_deps << ['ntlm-http', '~> 0.1', '>= 0.1.1']
23
24
  self.extra_deps << ['webrobots', '~> 0.0', '>= 0.0.9']
data/lib/mechanize.rb CHANGED
@@ -1,6 +1,7 @@
1
1
  require 'fileutils'
2
2
  require 'forwardable'
3
3
  require 'iconv' if RUBY_VERSION < '1.9.2'
4
+ require 'mime/types'
4
5
  require 'mutex_m'
5
6
  require 'net/http/digest_auth'
6
7
  require 'net/http/persistent'
@@ -72,7 +73,7 @@ class Mechanize
72
73
  ##
73
74
  # The version of Mechanize you are using.
74
75
 
75
- VERSION = '2.1.1'
76
+ VERSION = '2.2'
76
77
 
77
78
  ##
78
79
  # Base mechanize error class
@@ -176,7 +177,6 @@ class Mechanize
176
177
  yield self if block_given?
177
178
 
178
179
  @agent.set_proxy @proxy_addr, @proxy_port, @proxy_user, @proxy_pass
179
- @agent.set_http
180
180
  end
181
181
 
182
182
  # :section: History
@@ -293,7 +293,7 @@ class Mechanize
293
293
  raise RobotsDisallowedError.new(link.href)
294
294
  end
295
295
  end
296
- if link.rel?('noreferrer')
296
+ if link.noreferrer?
297
297
  href = @agent.resolve(link.href, link.page || current_page)
298
298
  referer = Page.new(nil, {'content-type'=>'text/html'})
299
299
  else
@@ -319,6 +319,47 @@ class Mechanize
319
319
  end
320
320
  end
321
321
 
322
+ ##
323
+ # GETs +uri+ and writes it to +io_or_filename+ without recording the request
324
+ # in the history. If +io_or_filename+ does not respond to #write it will be
325
+ # used as a file name. +parameters+, +referer+ and +headers+ are used as in
326
+ # #get.
327
+ #
328
+ # By default, if the Content-type of the response matches a Mechanize::File
329
+ # or Mechanize::Page parser, the response body will be loaded into memory
330
+ # before being saved. See #pluggable_parser for details on changing this
331
+ # default.
332
+ #
333
+ # For alternate ways of downloading files see Mechanize::FileSaver and
334
+ # Mechanize::DirectorySaver.
335
+
336
+ def download uri, io_or_filename, parameters = [], referer = nil, headers = {}
337
+ page = transact do
338
+ get uri, parameters, referer, headers
339
+ end
340
+
341
+ io = if io_or_filename.respond_to? :write then
342
+ io_or_filename
343
+ else
344
+ open io_or_filename, 'wb'
345
+ end
346
+
347
+ case page
348
+ when Mechanize::File then
349
+ io.write page.body
350
+ else
351
+ body_io = page.body_io
352
+
353
+ until body_io.eof? do
354
+ io.write body_io.read 16384
355
+ end
356
+ end
357
+
358
+ page
359
+ ensure
360
+ io.close if io and not io_or_filename.respond_to? :write
361
+ end
362
+
322
363
  ##
323
364
  # DELETE +uri+ with +query_params+, and setting +headers+:
324
365
  #
@@ -341,18 +382,20 @@ class Mechanize
341
382
 
342
383
  referer ||=
343
384
  if uri.to_s =~ %r{\Ahttps?://}
344
- Page.new(nil, {'content-type'=>'text/html'})
385
+ Page.new(nil, 'content-type' => 'text/html')
345
386
  else
346
- current_page || Page.new(nil, {'content-type'=>'text/html'})
387
+ current_page || Page.new(nil, 'content-type' => 'text/html')
347
388
  end
348
389
 
349
390
  # FIXME: Huge hack so that using a URI as a referer works. I need to
350
391
  # refactor everything to pass around URIs but still support
351
392
  # Mechanize::Page#base
352
393
  unless Mechanize::Parser === referer then
353
- referer = referer.is_a?(String) ?
354
- Page.new(URI.parse(referer), {'content-type' => 'text/html'}) :
355
- Page.new(referer, {'content-type' => 'text/html'})
394
+ referer = if referer.is_a?(String) then
395
+ Page.new URI(referer), 'content-type' => 'text/html'
396
+ else
397
+ Page.new referer, 'content-type' => 'text/html'
398
+ end
356
399
  end
357
400
 
358
401
  # fetch the page
@@ -371,14 +414,15 @@ class Mechanize
371
414
  end
372
415
 
373
416
  ##
374
- # HEAD +uri+ with +query_params+, and setting +headers+:
417
+ # HEAD +uri+ with +query_params+ and +headers+:
375
418
  #
376
419
  # head('http://example/', {'q' => 'foo'}, {})
377
420
 
378
421
  def head(uri, query_params = {}, headers = {})
379
- # fetch the page
380
- page = @agent.fetch(uri, :head, headers, query_params)
422
+ page = @agent.fetch uri, :head, headers, query_params
423
+
381
424
  yield page if block_given?
425
+
382
426
  page
383
427
  end
384
428
 
@@ -537,6 +581,18 @@ class Mechanize
537
581
 
538
582
  attr_accessor :keep_alive_time
539
583
 
584
+ ##
585
+ # The pluggable parser maps a response Content-Type to a parser class. The
586
+ # registered Content-Type may be either a full content type like 'image/png'
587
+ # or a media type 'text'. See Mechanize::PluggableParser for further
588
+ # details.
589
+ #
590
+ # Example:
591
+ #
592
+ # agent.pluggable_parser['application/octet-stream'] = Mechanize::Download
593
+
594
+ attr_reader :pluggable_parser
595
+
540
596
  ##
541
597
  # The HTTP proxy address
542
598
 
@@ -907,7 +963,7 @@ class Mechanize
907
963
  # certificate instance
908
964
 
909
965
  def cert= cert
910
- @agent.cert = cert
966
+ @agent.certificate = cert
911
967
  end
912
968
 
913
969
  ##
@@ -962,10 +1018,11 @@ class Mechanize
962
1018
  end
963
1019
 
964
1020
  ##
965
- # Sets the OpenSSL client +key+ to the given path or key instance
1021
+ # Sets the OpenSSL client +key+ to the given path or key instance. If a
1022
+ # path is given, the path must contain an RSA key file.
966
1023
 
967
1024
  def key= key
968
- @agent.key = key
1025
+ @agent.private_key = key
969
1026
  end
970
1027
 
971
1028
  ##
@@ -982,6 +1039,21 @@ class Mechanize
982
1039
  @agent.pass = pass
983
1040
  end
984
1041
 
1042
+ ##
1043
+ # SSL version to use. Ruby 1.9 and newer only.
1044
+
1045
+ def ssl_version
1046
+ @agent.ssl_version
1047
+ end if RUBY_VERSION > '1.9'
1048
+
1049
+ ##
1050
+ # Sets the SSL version to use to +version+ without client/server
1051
+ # negotiation. Ruby 1.9 and newer only.
1052
+
1053
+ def ssl_version= ssl_version
1054
+ @agent.ssl_version = ssl_version
1055
+ end if RUBY_VERSION > '1.9'
1056
+
985
1057
  ##
986
1058
  # A callback for additional certificate verification. See
987
1059
  # OpenSSL::SSL::SSLContext#verify_callback
@@ -1021,8 +1093,6 @@ class Mechanize
1021
1093
 
1022
1094
  attr_reader :agent # :nodoc:
1023
1095
 
1024
- attr_reader :pluggable_parser # :nodoc:
1025
-
1026
1096
  ##
1027
1097
  # Parses the +body+ of the +response+ from +uri+ using the pluggable parser
1028
1098
  # that matches its content type
@@ -1035,7 +1105,6 @@ class Mechanize
1035
1105
  content_type, = data.downcase.split ',', 2 unless data.nil?
1036
1106
  end
1037
1107
 
1038
- # Find our pluggable parser
1039
1108
  parser_klass = @pluggable_parser.parser content_type
1040
1109
 
1041
1110
  unless parser_klass <= Mechanize::Download then
@@ -1074,7 +1143,6 @@ class Mechanize
1074
1143
  @proxy_pass = password
1075
1144
 
1076
1145
  @agent.set_proxy address, port, user, password
1077
- @agent.set_http
1078
1146
  end
1079
1147
 
1080
1148
  private
@@ -1116,6 +1184,7 @@ require 'mechanize/cookie'
1116
1184
  require 'mechanize/cookie_jar'
1117
1185
  require 'mechanize/parser'
1118
1186
  require 'mechanize/download'
1187
+ require 'mechanize/directory_saver'
1119
1188
  require 'mechanize/file'
1120
1189
  require 'mechanize/file_connection'
1121
1190
  require 'mechanize/file_request'
@@ -1128,6 +1197,7 @@ require 'mechanize/http/auth_challenge'
1128
1197
  require 'mechanize/http/auth_realm'
1129
1198
  require 'mechanize/http/content_disposition_parser'
1130
1199
  require 'mechanize/http/www_authenticate_parser'
1200
+ require 'mechanize/image'
1131
1201
  require 'mechanize/page'
1132
1202
  require 'mechanize/monkey_patch'
1133
1203
  require 'mechanize/pluggable_parsers'
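
The new Mechanize#download in this file accepts either an IO-like object or a file name and only closes handles it opened itself. A minimal sketch of that pattern using only the standard library follows. `copy_body` and `CHUNK_SIZE` are hypothetical names, and the sketch uses `File.open` where the diff calls `open`; it is not the Mechanize method.

```ruby
require 'stringio'
require 'tmpdir'

CHUNK_SIZE = 16384 # same chunk size as the read loop in the diff

# Hypothetical helper mirroring the duck-typing above: anything that
# responds to #write is used as-is, anything else is treated as a path.
def copy_body(body_io, io_or_filename)
  caller_owns_io = io_or_filename.respond_to?(:write)
  io = caller_owns_io ? io_or_filename : File.open(io_or_filename, 'wb')

  # Copy in fixed-size chunks so a large body is never held in memory at once.
  io.write(body_io.read(CHUNK_SIZE)) until body_io.eof?

  io_or_filename
ensure
  # Only close handles this method opened; the caller keeps its own IO open.
  io.close if io && !caller_owns_io
end

# Writing to a caller-supplied IO leaves it open for further use.
buffer = StringIO.new
copy_body(StringIO.new('hello'), buffer)

# Writing to a path opens the file, copies the body and closes it again.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'body.bin')
  copy_body(StringIO.new('x' * 40_000), path)
end
```

Leaving a caller-supplied IO open lets the caller append more data or rewind it afterwards, which is presumably why the `ensure` clause in the diff checks `respond_to? :write` before closing.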
@@ -22,5 +22,9 @@ class Mechanize::FileRequest
22
22
  def each_header
23
23
  end
24
24
 
25
+ def response_body_permitted?
26
+ true
27
+ end
28
+
25
29
  end
26
30
 
@@ -1,7 +1,7 @@
1
1
  ##
2
2
  # This is a pluggable parser that automatically saves every file it
3
- # encounters. It saves the files as a tree, reflecting the host and file
4
- # path.
3
+ # encounters. Unlike Mechanize::DirectorySaver, the file saver saves the
4
+ # responses as a tree, reflecting the host and file path.
5
5
  #
6
6
  # == Example
7
7
  #
@@ -11,7 +11,7 @@
11
11
  #
12
12
  # agent = Mechanize.new
13
13
  # agent.pluggable_parser.pdf = Mechanize::FileSaver
14
- # agent.get('http://example.com/foo.pdf')
14
+ # agent.get 'http://example.com/foo.pdf'
15
15
  #
16
16
  # Dir['example.com/*'] # => foo.pdf
17
17
 
@@ -78,37 +78,11 @@ class Mechanize::HTTP::Agent
78
78
 
79
79
  # :section: SSL
80
80
 
81
- # Path to an OpenSSL server certificate file
82
- attr_accessor :ca_file
83
-
84
- # An OpenSSL private key or the path to a private key
85
- attr_accessor :key
86
-
87
- # An OpenSSL client certificate or the path to a certificate file.
88
- attr_accessor :cert
89
-
90
- # An SSL certificate store
91
- attr_accessor :cert_store
92
-
93
81
  # OpenSSL key password
94
82
  attr_accessor :pass
95
83
 
96
- # A callback for additional certificate verification. See
97
- # OpenSSL::SSL::SSLContext#verify_callback
98
- #
99
- # The callback can be used for debugging or to ignore errors by always
100
- # returning +true+. Specifying nil uses the default method that was valid
101
- # when the SSLContext was created
102
- attr_accessor :verify_callback
103
-
104
- # How to verify SSL connections. Defaults to VERIFY_PEER
105
- attr_accessor :verify_mode
106
-
107
84
  # :section: Timeouts
108
85
 
109
- # Reset connections that have not been used in this many seconds
110
- attr_reader :idle_timeout
111
-
112
86
  # Set to false to disable HTTP/1.1 keep-alive requests
113
87
  attr_accessor :keep_alive
114
88
 
@@ -123,12 +97,6 @@ class Mechanize::HTTP::Agent
123
97
  # The cookies for this agent
124
98
  attr_accessor :cookie_jar
125
99
 
126
- # URI for a proxy connection
127
- attr_reader :proxy_uri
128
-
129
- # Retry non-idempotent requests?
130
- attr_reader :retry_change_requests
131
-
132
100
  # Responses larger than this will be written to a Tempfile instead of stored
133
101
  # in memory.
134
102
  attr_accessor :max_file_buffer
@@ -157,19 +125,15 @@ class Mechanize::HTTP::Agent
157
125
  @follow_meta_refresh_self = false
158
126
  @gzip_enabled = true
159
127
  @history = Mechanize::History.new
160
- @idle_timeout = 5
161
128
  @keep_alive = true
162
- @keep_alive_time = 300
163
129
  @max_file_buffer = 10240
164
130
  @open_timeout = nil
165
131
  @post_connect_hooks = []
166
132
  @pre_connect_hooks = []
167
- @proxy_uri = nil
168
133
  @read_timeout = nil
169
134
  @redirect_ok = true
170
135
  @redirection_limit = 20
171
136
  @request_headers = {}
172
- @retry_change_requests = false
173
137
  @robots = false
174
138
  @user_agent = nil
175
139
  @webrobots = nil
@@ -188,13 +152,7 @@ class Mechanize::HTTP::Agent
188
152
  @domain = nil # NTLM HTTP domain
189
153
 
190
154
  # SSL
191
- @ca_file = nil
192
- @cert = nil
193
- @cert_store = nil
194
- @key = nil
195
- @pass = nil
196
- @verify_callback = nil
197
- @verify_mode = nil
155
+ @pass = nil
198
156
 
199
157
  @scheme_handlers = Hash.new { |h, scheme|
200
158
  h[scheme] = lambda { |link, page|
@@ -206,6 +164,10 @@ class Mechanize::HTTP::Agent
206
164
  @scheme_handlers['https'] = @scheme_handlers['http']
207
165
  @scheme_handlers['relative'] = @scheme_handlers['http']
208
166
  @scheme_handlers['file'] = @scheme_handlers['http']
167
+
168
+ @http = Net::HTTP::Persistent.new 'mechanize'
169
+ @http.idle_timeout = 5
170
+ @http.keep_alive = 300
209
171
  end
210
172
 
211
173
  # Retrieves +uri+ and parses it into a page or other object according to
@@ -273,7 +235,8 @@ class Mechanize::HTTP::Agent
273
235
 
274
236
  hook_content_encoding response, uri, response_body_io
275
237
 
276
- response_body_io = response_content_encoding response, response_body_io
238
+ response_body_io = response_content_encoding response, response_body_io if
239
+ request.response_body_permitted?
277
240
 
278
241
  post_connect uri, response, response_body_io
279
242
 
@@ -306,11 +269,21 @@ class Mechanize::HTTP::Agent
306
269
  end
307
270
  end
308
271
 
272
+ # URI for a proxy connection
273
+
274
+ def proxy_uri
275
+ @http.proxy_uri
276
+ end
277
+
278
+ # Retry non-idempotent requests?
279
+ def retry_change_requests
280
+ @http.retry_change_requests
281
+ end
282
+
309
283
  # Retry non-idempotent requests
310
284
 
311
285
  def retry_change_requests= retri
312
- @retry_change_requests = retri
313
- @http.retry_change_requests = retri if @http
286
+ @http.retry_change_requests = retri
314
287
  end
315
288
 
316
289
  # :section: Headers
@@ -568,10 +541,24 @@ class Mechanize::HTTP::Agent
568
541
  end
569
542
 
570
543
  def resolve(uri, referer = current_page)
571
- uri = uri.dup if uri.is_a?(URI)
544
+ referer_uri = referer && referer.uri
545
+ if uri.is_a?(URI)
546
+ uri = uri.dup
547
+ elsif uri.nil?
548
+ if referer_uri
549
+ return referer_uri
550
+ end
551
+ raise ArgumentError, "absolute URL needed (not nil)"
552
+ else
553
+ url = uri.to_s.strip
554
+ if url.empty?
555
+ if referer_uri
556
+ return referer_uri.dup.tap { |u| u.fragment = nil }
557
+ end
558
+ raise ArgumentError, "absolute URL needed (not #{uri.inspect})"
559
+ end
572
560
 
573
- unless uri.is_a?(URI)
574
- uri = uri.to_s.strip.gsub(/[^#{0.chr}-#{126.chr}]/o) { |match|
561
+ url.gsub!(/[^#{0.chr}-#{126.chr}]/o) { |match|
575
562
  if RUBY_VERSION >= "1.9.0"
576
563
  Mechanize::Util.uri_escape(match)
577
564
  else
@@ -579,28 +566,25 @@ class Mechanize::HTTP::Agent
579
566
  end
580
567
  }
581
568
 
582
- unescaped = uri.split(/(?:%[0-9A-Fa-f]{2})+|#/)
583
- escaped = uri.scan(/(?:%[0-9A-Fa-f]{2})+|#/)
584
-
585
- escaped_uri = Mechanize::Util.html_unescape(
586
- unescaped.zip(escaped).map { |x,y|
569
+ escaped_url = Mechanize::Util.html_unescape(
570
+ url.split(/((?:%[0-9A-Fa-f]{2})+|#)/).each_slice(2).map { |x, y|
587
571
  "#{WEBrick::HTTPUtils.escape(x)}#{y}"
588
572
  }.join('')
589
573
  )
590
574
 
591
575
  begin
592
- uri = URI.parse(escaped_uri)
576
+ uri = URI.parse(escaped_url)
593
577
  rescue
594
- uri = URI.parse(WEBrick::HTTPUtils.escape(escaped_uri))
578
+ uri = URI.parse(WEBrick::HTTPUtils.escape(escaped_url))
595
579
  end
596
580
  end
597
581
 
598
582
  scheme = uri.relative? ? 'relative' : uri.scheme.downcase
599
583
  uri = @scheme_handlers[scheme].call(uri, referer)
600
584
 
601
- if referer && referer.uri
585
+ if referer_uri
602
586
  if uri.path.length == 0 && uri.relative?
603
- uri.path = referer.uri.path
587
+ uri.path = referer_uri.path
604
588
  end
605
589
  end
606
590
 
@@ -608,17 +592,16 @@ class Mechanize::HTTP::Agent
608
592
 
609
593
  if uri.relative?
610
594
  raise ArgumentError, "absolute URL needed (not #{uri})" unless
611
- referer && referer.uri
595
+ referer_uri
612
596
 
613
- base = nil
614
- if referer.respond_to?(:bases) && referer.parser
615
- base = referer.bases.last
597
+ if referer.respond_to?(:bases) && referer.parser &&
598
+ (lbase = referer.bases.last) && lbase.uri && lbase.uri.absolute?
599
+ base = lbase
600
+ else
601
+ base = nil
616
602
  end
617
603
 
618
- uri = ((base && base.uri && base.uri.absolute?) ?
619
- base.uri :
620
- referer.uri) + uri
621
- uri = referer.uri + uri
604
+ uri = referer_uri + (base ? base.uri : referer_uri) + uri
622
605
  # Strip initial "/.." bits from the path
623
606
  uri.path.sub!(/^(\/\.\.)+(?=\/)/, '')
624
607
  end
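
The hunk above adds explicit handling for nil and empty hrefs in #resolve: nil returns the referer URI, and an empty string returns a copy of the referer URI without its fragment, keeping the query. The sketch below reproduces only that branch with the standard `uri` library. `resolve_blank` is a hypothetical name, and the fallback for non-blank values is plain URI merging rather than the escaping and scheme handling the real method performs.

```ruby
require 'uri'

# Hypothetical, simplified stand-in for the nil/empty branch of #resolve.
def resolve_blank(href, referer_uri)
  if href.nil?
    return referer_uri if referer_uri
    raise ArgumentError, 'absolute URL needed (not nil)'
  end

  url = href.to_s.strip
  if url.empty?
    unless referer_uri
      raise ArgumentError, "absolute URL needed (not #{href.inspect})"
    end
    # Keep path and query; drop only the fragment, on a copy.
    return referer_uri.dup.tap { |u| u.fragment = nil }
  end

  # Assumption: non-blank values are merged with plain URI resolution here.
  referer_uri ? referer_uri + url : URI(url)
end

referer = URI('http://example.com/search?q=ruby#top')
resolve_blank('', referer).to_s  # => "http://example.com/search?q=ruby"
resolve_blank(nil, referer)      # the referer URI itself
```

Working on a `dup` matters because the referer URI belongs to a page in the history; clearing the fragment in place would change that page's recorded URI.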
@@ -791,7 +774,8 @@ class Mechanize::HTTP::Agent
791
774
 
792
775
  def response_follow_meta_refresh response, uri, page, redirects
793
776
  delay, new_url = get_meta_refresh(response, uri, page)
794
- return nil unless new_url
777
+ return nil unless delay
778
+ new_url = new_url ? resolve(new_url, page) : uri
795
779
 
796
780
  raise Mechanize::RedirectLimitReachedError.new(page, redirects) if
797
781
  redirects + 1 > @redirection_limit
@@ -888,9 +872,8 @@ class Mechanize::HTTP::Agent
888
872
 
889
873
  redirect_method = method == :head ? :head : :get
890
874
 
891
- from_uri = page.uri
892
- @history.push(page, from_uri)
893
- new_uri = from_uri + response['Location'].to_s
875
+ @history.push(page, page.uri)
876
+ new_uri = resolve response['Location'].to_s, page
894
877
 
895
878
  fetch new_uri, redirect_method, {}, [], referer, redirects + 1
896
879
  end
@@ -949,16 +932,104 @@ class Mechanize::HTTP::Agent
949
932
 
950
933
  # :section: SSL
951
934
 
935
+ # Path to an OpenSSL CA certificate file
936
+ def ca_file
937
+ @http.ca_file
938
+ end
939
+
940
+ # Sets the path to an OpenSSL CA certificate file
941
+ def ca_file= ca_file
942
+ @http.ca_file = ca_file
943
+ end
944
+
945
+ # The SSL certificate store used for validating connections
946
+ def cert_store
947
+ @http.cert_store
948
+ end
949
+
950
+ # Sets the SSL certificate store used for validating connections
951
+ def cert_store= cert_store
952
+ @http.cert_store = cert_store
953
+ end
954
+
955
+ # The client X509 certificate
952
956
  def certificate
953
957
  @http.certificate
954
958
  end
955
959
 
960
+ # Sets the client certificate to given X509 certificate. If a path is given
961
+ # the certificate will be loaded and set.
962
+ def certificate= certificate
963
+ certificate = if OpenSSL::X509::Certificate === certificate then
964
+ certificate
965
+ else
966
+ OpenSSL::X509::Certificate.new File.read certificate
967
+ end
968
+
969
+ @http.certificate = certificate
970
+ end
971
+
972
+ # An OpenSSL private key or the path to a private key
973
+ def private_key
974
+ @http.private_key
975
+ end
976
+
977
+ # Sets the client's private key
978
+ def private_key= private_key
979
+ private_key = if OpenSSL::PKey::PKey === private_key then
980
+ private_key
981
+ else
982
+ OpenSSL::PKey::RSA.new File.read(private_key), @pass
983
+ end
984
+
985
+ @http.private_key = private_key
986
+ end
987
+
988
+ # SSL version to use
989
+ def ssl_version
990
+ @http.ssl_version
991
+ end if RUBY_VERSION > '1.9'
992
+
993
+ # Sets the SSL version to use
994
+ def ssl_version= ssl_version
995
+ @http.ssl_version = ssl_version
996
+ end if RUBY_VERSION > '1.9'
997
+
998
+ # A callback for additional certificate verification. See
999
+ # OpenSSL::SSL::SSLContext#verify_callback
1000
+ #
1001
+ # The callback can be used for debugging or to ignore errors by always
1002
+ # returning +true+. Specifying nil uses the default method that was valid
1003
+ # when the SSLContext was created
1004
+ def verify_callback
1005
+ @http.verify_callback
1006
+ end
1007
+
1008
+ # Sets the certificate verify callback
1009
+ def verify_callback= verify_callback
1010
+ @http.verify_callback = verify_callback
1011
+ end
1012
+
1013
+ # How to verify SSL connections. Defaults to VERIFY_PEER
1014
+ def verify_mode
1015
+ @http.verify_mode
1016
+ end
1017
+
1018
+ # Sets the mode for verifying SSL connections
1019
+ def verify_mode= verify_mode
1020
+ @http.verify_mode = verify_mode
1021
+ end
1022
+
956
1023
  # :section: Timeouts
957
1024
 
958
- # Sets the conection idle timeout for persistent connections
1025
+ # Reset connections that have not been used in this many seconds
1026
+ def idle_timeout
1027
+ @http.idle_timeout
1028
+ end
1029
+
1030
+ # Sets the connection idle timeout for persistent connections
959
1031
  def idle_timeout= timeout
960
- @idle_timeout = timeout
961
- @http.idle_timeout = timeout if @http
1032
+ @http.idle_timeout = timeout
962
1033
  end
963
1034
 
964
1035
  # :section: Utility
@@ -980,47 +1051,17 @@ class Mechanize::HTTP::Agent
980
1051
  @context.log
981
1052
  end
982
1053
 
983
- def set_http
984
- @http = Net::HTTP::Persistent.new 'mechanize', @proxy_uri
985
-
986
- @http.keep_alive = @keep_alive_time
987
- @http.idle_timeout = @idle_timeout if @idle_timeout
988
- @http.retry_change_requests = @retry_change_requests
989
-
990
- @http.ca_file = @ca_file
991
- @http.cert_store = @cert_store if @cert_store
992
- @http.verify_callback = @verify_callback
993
- @http.verify_mode = @verify_mode if @verify_mode
994
-
995
- # update our cached value
996
- @verify_mode = @http.verify_mode
997
- @cert_store = @http.cert_store
998
-
999
- if @cert and @key then
1000
- cert = if OpenSSL::X509::Certificate === @cert then
1001
- @cert
1002
- else
1003
- OpenSSL::X509::Certificate.new ::File.read @cert
1004
- end
1005
-
1006
- key = if OpenSSL::PKey::PKey === @key then
1007
- @key
1008
- else
1009
- OpenSSL::PKey::RSA.new ::File.read(@key), @pass
1010
- end
1011
-
1012
- @http.certificate = cert
1013
- @http.private_key = key
1014
- end
1015
- end
1016
-
1017
1054
  ##
1018
1055
  # Sets the proxy address, port, user, and password +addr+ should be a host,
1019
1056
  # with no "http://", +port+ may be a port number, service name or port
1020
1057
  # number string.
1021
1058
 
1022
- def set_proxy(addr, port, user = nil, pass = nil)
1023
- return unless addr and port
1059
+ def set_proxy addr, port, user = nil, pass = nil
1060
+ unless addr and port then
1061
+ @http.proxy = nil
1062
+
1063
+ return
1064
+ end
1024
1065
 
1025
1066
  unless Integer === port then
1026
1067
  begin
@@ -1034,12 +1075,12 @@ class Mechanize::HTTP::Agent
1034
1075
  end
1035
1076
  end
1036
1077
 
1037
- @proxy_uri = URI "http://#{addr}"
1038
- @proxy_uri.port = port
1039
- @proxy_uri.user = user if user
1040
- @proxy_uri.password = pass if pass
1078
+ proxy_uri = URI "http://#{addr}"
1079
+ proxy_uri.port = port
1080
+ proxy_uri.user = user if user
1081
+ proxy_uri.password = pass if pass
1041
1082
 
1042
- @proxy_uri
1083
+ @http.proxy = proxy_uri
1043
1084
  end
1044
1085
 
1045
1086
  end
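
With this change set_proxy builds a local proxy URI and hands it to the HTTP connection, and clears the proxy when the address or port is missing. The sketch below shows only the URI assembly with the standard `uri` library. `build_proxy_uri` is a hypothetical helper, and it accepts integer or numeric-string ports only, whereas the documented method also allows service names.

```ruby
require 'uri'

# Hypothetical helper; mirrors how the diff assembles the proxy URI.
# Returns nil when no proxy should be used.
def build_proxy_uri(addr, port, user = nil, pass = nil)
  return nil unless addr && port

  proxy_uri = URI("http://#{addr}")
  proxy_uri.port = Integer(port) # assumption: numeric ports only
  proxy_uri.user = user if user
  proxy_uri.password = pass if pass # requires the user to be set first
  proxy_uri
end

build_proxy_uri('proxy.example', '8080', 'bob', 'secret').to_s
# => "http://bob:secret@proxy.example:8080"
build_proxy_uri(nil, 8080) # => nil
```

Returning nil for a missing address or port corresponds to the `@http.proxy = nil` branch in the diff, which is what makes it possible to clear a previously set proxy.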