mechanize 2.0.pre.1 → 2.0.pre.2

Potentially problematic release.

Files changed (50)
  1. data.tar.gz.sig +2 -2
  2. data/CHANGELOG.rdoc +24 -2
  3. data/Manifest.txt +15 -19
  4. data/Rakefile +6 -3
  5. data/lib/mechanize.rb +168 -28
  6. data/lib/mechanize/form.rb +14 -2
  7. data/lib/mechanize/page.rb +43 -14
  8. data/lib/mechanize/page/link.rb +10 -0
  9. data/lib/mechanize/redirect_not_get_or_head_error.rb +2 -1
  10. data/lib/mechanize/robots_disallowed_error.rb +29 -0
  11. data/lib/mechanize/util.rb +30 -6
  12. data/test/helper.rb +6 -0
  13. data/test/htdocs/canonical_uri.html +9 -0
  14. data/test/htdocs/nofollow.html +9 -0
  15. data/test/htdocs/noindex.html +9 -0
  16. data/test/htdocs/norobots.html +8 -0
  17. data/test/htdocs/rel_nofollow.html +8 -0
  18. data/test/htdocs/robots.html +8 -0
  19. data/test/htdocs/robots.txt +2 -0
  20. data/test/htdocs/tc_links.html +3 -3
  21. data/test/test_links.rb +9 -0
  22. data/test/test_mechanize.rb +617 -2
  23. data/test/{test_forms.rb → test_mechanize_form.rb} +45 -1
  24. data/test/test_mechanize_form_check_box.rb +37 -0
  25. data/test/test_mechanize_form_encoding.rb +118 -0
  26. data/test/{test_field_precedence.rb → test_mechanize_form_field.rb} +4 -16
  27. data/test/test_mechanize_page.rb +60 -1
  28. data/test/test_mechanize_redirect_not_get_or_head_error.rb +18 -0
  29. data/test/test_mechanize_subclass.rb +22 -0
  30. data/test/test_mechanize_util.rb +87 -2
  31. data/test/test_robots.rb +87 -0
  32. metadata +51 -43
  33. metadata.gz.sig +0 -0
  34. data/lib/mechanize/uri_resolver.rb +0 -82
  35. data/test/test_authenticate.rb +0 -71
  36. data/test/test_bad_links.rb +0 -25
  37. data/test/test_blank_form.rb +0 -16
  38. data/test/test_checkboxes.rb +0 -61
  39. data/test/test_content_type.rb +0 -13
  40. data/test/test_encoded_links.rb +0 -20
  41. data/test/test_errors.rb +0 -49
  42. data/test/test_follow_meta.rb +0 -119
  43. data/test/test_get_headers.rb +0 -52
  44. data/test/test_gzipping.rb +0 -22
  45. data/test/test_hash_api.rb +0 -45
  46. data/test/test_mech.rb +0 -283
  47. data/test/test_mech_proxy.rb +0 -16
  48. data/test/test_mechanize_uri_resolver.rb +0 -29
  49. data/test/test_redirect_verb_handling.rb +0 -49
  50. data/test/test_subclass.rb +0 -30
data.tar.gz.sig CHANGED
@@ -1,2 +1,2 @@
(binary gem signature replaced; undisplayable content omitted)
data/CHANGELOG.rdoc CHANGED
@@ -1,11 +1,11 @@
 = Mechanize CHANGELOG
 
-=== 2.0.pre.1 / 2011-04-09
+=== 2.0.pre.2 / 2011-04-17
 
 Mechanize is now under the MIT license
 
 * API changes
-  * WWW::Mechanize has been removed.
+  * WWW::Mechanize has been removed. Use Mechanize.
   * Pre connect hooks are now called with the agent and the request. See
     Mechanize#pre_connect_hooks.
   * Post connect hooks are now called with the agent and the response. See
@@ -23,6 +23,16 @@ Mechanize is now under the MIT license
   * The User-Agent header has changed. It no longer includes the WWW- prefix
     and now includes the ruby version. The URL has been updated as well.
   * Mechanize now requires ruby 1.8.7 or newer.
+  * Hpricot support has been removed as webrobots requires nokogiri.
+  * Mechanize#get no longer accepts the referer as the second argument.
+  * Mechanize#get no longer allows the HTTP method to be changed (:verb
+    option).
+
+* Deprecations
+  * Mechanize#get with an options hash is deprecated and will be removed after
+    October, 2011.
+  * Mechanize::Util::to_native_charset is deprecated as it is no longer used
+    by Mechanize.
 
 * New Features
 
@@ -39,6 +49,17 @@ Mechanize is now under the MIT license
   * Mechanize now allows a certificate and key to be passed directly. GH #71
   * Mechanize::Form::MultiSelectList now implements #option_with and
     #options_with. GH #42
+  * Add Mechanize::Page::Link#rel and #rel?(kind) to read and test the rel
+    attribute.
+  * Add Mechanize::Page#canonical_uri to read a <tt><link
+    rel="canonical"></tt> tag.
+  * Add support for Robots Exclusion Protocol (i.e. robots.txt) and
+    nofollow/noindex in meta tags and the rel attribute. Automatic
+    exclusion can be turned on by setting:
+      agent.robots = true
+  * Manual robots.txt tests can be performed with Mechanize#robots_allowed?
+    and #robots_disallowed?.
+  * Mechanize::Form now supports the accept-charset attribute. GH #96
 
 * Bug Fixes:
 
@@ -63,6 +84,7 @@ Mechanize is now under the MIT license
   * Content-Encoding: x-gzip is now treated like gzip per RFC 2616.
   * Mechanize now unescapes URIs for meta refresh. GH #68
   * Mechanize now has more robust HTML charset detection. GH #43
+  * Mechanize::Form::Textarea is now created from a textarea element. GH #94
 
 === 1.0.0
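The Mechanize#get deprecation listed in the changelog above boils down to an argument-compatibility shim: positional arguments are now canonical, and a leading options hash is detected, warned about, and unpacked. A minimal standalone sketch of that pattern (the `fetch` name and the returned argument tuple are illustrative, not Mechanize's API):

```ruby
# Hypothetical fetch() mimicking the Mechanize#get shim: it takes
# positional arguments but still accepts the deprecated options-hash
# form, emitting a deprecation warning when it does.
def fetch(uri, parameters = [], referer = nil, headers = {})
  if Hash === uri
    options = uri
    warn "fetch with an options hash is deprecated"

    raise ArgumentError, "url must be specified" unless uri = options[:url]
    parameters = options[:params] || []
    referer    = options[:referer]
    headers    = options[:headers] || {}
  end

  # A real implementation would now perform the request; this sketch
  # just returns the normalized arguments for inspection.
  [uri, parameters, referer, headers]
end
```

Both call styles normalize to the same arguments, which is what lets the hash form survive, with a warning, until its removal date.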
 
data/Manifest.txt CHANGED
@@ -46,8 +46,8 @@ lib/mechanize/pluggable_parsers.rb
 lib/mechanize/redirect_limit_reached_error.rb
 lib/mechanize/redirect_not_get_or_head_error.rb
 lib/mechanize/response_code_error.rb
+lib/mechanize/robots_disallowed_error.rb
 lib/mechanize/unsupported_scheme_error.rb
-lib/mechanize/uri_resolver.rb
 lib/mechanize/util.rb
 test/data/htpasswd
 test/data/server.crt
@@ -58,6 +58,7 @@ test/helper.rb
 test/htdocs/alt_text.html
 test/htdocs/bad_form_test.html
 test/htdocs/button.jpg
+test/htdocs/canonical_uri.html
 test/htdocs/dir with spaces/foo.html
 test/htdocs/empty_form.html
 test/htdocs/file_upload.html
@@ -79,8 +80,14 @@ test/htdocs/index.html
 test/htdocs/link with space.html
 test/htdocs/meta_cookie.html
 test/htdocs/no_title_test.html
+test/htdocs/nofollow.html
+test/htdocs/noindex.html
+test/htdocs/norobots.html
 test/htdocs/rails_3_encoding_hack_form_test.html
+test/htdocs/rel_nofollow.html
 test/htdocs/relative/tc_relative_links.html
+test/htdocs/robots.html
+test/htdocs/robots.txt
 test/htdocs/tc_bad_charset.html
 test/htdocs/tc_bad_links.html
 test/htdocs/tc_base_images.html
@@ -106,25 +113,12 @@ test/htdocs/test_click.html
 test/htdocs/unusual______.html
 test/servlets.rb
 test/ssl_server.rb
-test/test_authenticate.rb
-test/test_bad_links.rb
-test/test_blank_form.rb
-test/test_checkboxes.rb
-test/test_content_type.rb
 test/test_cookies.rb
-test/test_encoded_links.rb
-test/test_errors.rb
-test/test_field_precedence.rb
-test/test_follow_meta.rb
 test/test_form_action.rb
 test/test_form_as_hash.rb
 test/test_form_button.rb
 test/test_form_no_inputname.rb
-test/test_forms.rb
 test/test_frames.rb
-test/test_get_headers.rb
-test/test_gzipping.rb
-test/test_hash_api.rb
 test/test_headers.rb
 test/test_history.rb
 test/test_history_added.rb
@@ -132,17 +126,20 @@ test/test_html_unscape_forms.rb
 test/test_if_modified_since.rb
 test/test_images.rb
 test/test_links.rb
-test/test_mech.rb
-test/test_mech_proxy.rb
 test/test_mechanize.rb
 test/test_mechanize_cookie.rb
 test/test_mechanize_cookie_jar.rb
 test/test_mechanize_file.rb
 test/test_mechanize_file_request.rb
 test/test_mechanize_file_response.rb
+test/test_mechanize_form.rb
+test/test_mechanize_form_check_box.rb
+test/test_mechanize_form_encoding.rb
+test/test_mechanize_form_field.rb
 test/test_mechanize_form_image_button.rb
 test/test_mechanize_page.rb
-test/test_mechanize_uri_resolver.rb
+test/test_mechanize_redirect_not_get_or_head_error.rb
+test/test_mechanize_subclass.rb
 test/test_mechanize_util.rb
 test/test_meta.rb
 test/test_multi_select.rb
@@ -153,11 +150,11 @@ test/test_post_form.rb
 test/test_pretty_print.rb
 test/test_radiobutton.rb
 test/test_redirect_limit_reached.rb
-test/test_redirect_verb_handling.rb
 test/test_referer.rb
 test/test_relative_links.rb
 test/test_request.rb
 test/test_response_code.rb
+test/test_robots.rb
 test/test_save_file.rb
 test/test_scheme.rb
 test/test_select.rb
@@ -166,7 +163,6 @@ test/test_select_none.rb
 test/test_select_noopts.rb
 test/test_set_fields.rb
 test/test_ssl_server.rb
-test/test_subclass.rb
 test/test_textarea.rb
 test/test_upload.rb
 test/test_verbs.rb
data/Rakefile CHANGED
@@ -12,9 +12,12 @@ Hoe.spec 'mechanize' do
   self.readme_file = 'README.rdoc'
   self.history_file = 'CHANGELOG.rdoc'
   self.extra_rdoc_files += Dir['*.rdoc']
-  self.extra_deps << ['nokogiri', '~> 1.4']
-  self.extra_deps << ['net-http-persistent', '~> 1.6']
-  self.extra_deps << ['net-http-digest_auth', '~> 1.1', '>= 1.1.1']
+
+  self.extra_deps << ['nokogiri', '~> 1.4']
+  self.extra_deps << ['net-http-persistent', '~> 1.6']
+  self.extra_deps << ['net-http-digest_auth', '~> 1.1', '>= 1.1.1']
+  self.extra_deps << ['webrobots', '~> 0.0', '>= 0.0.6']
+
   self.spec_extras[:required_ruby_version] = '>= 1.8.7'
 end
 
data/lib/mechanize.rb CHANGED
@@ -12,7 +12,6 @@ require 'uri'
 require 'webrick/httputils'
 require 'zlib'
 
-
 # = Synopsis
 # The Mechanize library is used for automating interaction with a website. It
 # can follow links, and submit forms. Form fields can be populated and
@@ -71,7 +70,7 @@ class Mechanize
   attr_accessor :read_timeout
 
   # The identification string for the client initiating a web request
-  attr_accessor :user_agent
+  attr_reader :user_agent
 
   # The value of watch_for_set is passed to pluggable parsers for retrieved
   # content
@@ -96,6 +95,15 @@ class Mechanize
   # redirects are followed.
   attr_accessor :redirect_ok
 
+  # Says this agent should consult the site's robots.txt for each access.
+  attr_reader :robots
+
+  def robots=(value)
+    require 'webrobots' if value
+    @webrobots = nil if value != @robots
+    @robots = value
+  end
+
   # Disables HTTP/1.1 gzip compression (enabled by default)
   attr_accessor :gzip_enabled
@@ -198,6 +206,9 @@ class Mechanize
     @follow_meta_refresh = false
     @redirection_limit = 20
 
+    @robots = false
+    @webrobots = nil
+
     # Connection Cache & Keep alive
     @keep_alive_time = 300
     @keep_alive = true
@@ -208,8 +219,16 @@ class Mechanize
     @proxy_user = nil
     @proxy_pass = nil
 
-    @resolver = Mechanize::URIResolver.new
-    @scheme_handlers = @resolver.scheme_handlers
+    @scheme_handlers = Hash.new { |h, scheme|
+      h[scheme] = lambda { |link, page|
+        raise Mechanize::UnsupportedSchemeError, scheme
+      }
+    }
+
+    @scheme_handlers['http'] = lambda { |link, page| link }
+    @scheme_handlers['https'] = @scheme_handlers['http']
+    @scheme_handlers['relative'] = @scheme_handlers['http']
+    @scheme_handlers['file'] = @scheme_handlers['http']
 
     @pre_connect_hooks = []
     @post_connect_hooks = []
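The hunk above inlines the former URIResolver's handler table as a Hash with a default proc: looking up an unknown scheme memoizes a handler that raises. The same pattern in isolation (a local UnsupportedSchemeError stands in for Mechanize's error class):

```ruby
# Stand-in for Mechanize::UnsupportedSchemeError.
class UnsupportedSchemeError < StandardError; end

# Unknown schemes lazily receive (and memoize) a handler that raises;
# the known schemes share the identity handler.
SCHEME_HANDLERS = Hash.new { |h, scheme|
  h[scheme] = lambda { |link, page|
    raise UnsupportedSchemeError, scheme
  }
}

SCHEME_HANDLERS['http']     = lambda { |link, page| link }
SCHEME_HANDLERS['https']    = SCHEME_HANDLERS['http']
SCHEME_HANDLERS['relative'] = SCHEME_HANDLERS['http']
SCHEME_HANDLERS['file']     = SCHEME_HANDLERS['http']
```

The default proc means every scheme lookup succeeds, so callers never need a nil check; unsupported schemes fail loudly only when the handler is actually invoked.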
@@ -243,6 +262,11 @@ class Mechanize
     nil
   end
 
+  def user_agent=(value)
+    @webrobots = nil if value != @user_agent
+    @user_agent = value
+  end
+
   # Set the user agent for the Mechanize object.
   # See AGENT_ALIASES
   def user_agent_alias=(al)
@@ -263,17 +287,15 @@ class Mechanize
   alias :basic_auth :auth
 
   # Fetches the URL passed in and returns a page.
-  def get(options, parameters = [], referer = nil)
+  def get(uri, parameters = [], referer = nil, headers = {})
     method = :get
 
-    unless options.is_a? Hash
-      url = options
-      unless parameters.respond_to?(:each) # FIXME: Remove this in 0.8.0
-        referer = parameters
-        parameters = []
-      end
-    else
-      raise ArgumentError, "url must be specified" unless url = options[:url]
+    if Hash === uri then
+      options = uri
+      location = Gem.location_of_caller.join ':'
+      warn "#{location}: Mechanize#get with options hash is deprecated and will be removed October 2011"
+
+      raise ArgumentError, "url must be specified" unless uri = options[:url]
       parameters = options[:params] || []
       referer = options[:referer]
       headers = options[:headers]
@@ -281,7 +303,7 @@ class Mechanize
     end
 
     unless referer
-      if url.to_s =~ %r{\Ahttps?://}
+      if uri.to_s =~ %r{\Ahttps?://}
        referer = Page.new(nil, {'content-type'=>'text/html'})
       else
        referer = current_page || Page.new(nil, {'content-type'=>'text/html'})
@@ -299,7 +321,7 @@ class Mechanize
 
     # fetch the page
     headers ||= {}
-    page = fetch_page url, method, headers, parameters, referer
+    page = fetch_page uri, method, headers, parameters, referer
     add_to_history(page)
     yield page if block_given?
     page
@@ -347,6 +369,14 @@ class Mechanize
   # Mechanize::Page::Link object passed in. Returns the page fetched.
   def click(link)
     case link
+    when Page::Link
+      referer = link.page || current_page()
+      if robots
+        if (referer.is_a?(Page) && referer.parser.nofollow?) || link.rel?('nofollow')
+          raise RobotsDisallowedError.new(link.href)
+        end
+      end
+      get link.href, [], referer
     when String, Regexp
       if real_link = page.link_with(:text => link)
         click real_link
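The nofollow check above calls Link#rel?('nofollow'), one of the new accessors added in page/link.rb (that file's diff is not shown in this listing). A plausible sketch of the predicate, assuming the HTML convention that rel is a space-separated token list; the SketchLink class is hypothetical:

```ruby
# Hypothetical link object demonstrating rel-attribute testing as used
# by the nofollow check above.
class SketchLink
  def initialize(rel_attribute)
    @rel_attribute = rel_attribute
  end

  # The space-separated tokens of the rel attribute.
  def rel
    (@rel_attribute || '').split(' ')
  end

  # Does the rel attribute include the given kind of relation?
  def rel?(kind)
    rel.include?(kind)
  end
end
```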
@@ -359,10 +389,10 @@ class Mechanize
         submit form, button if form
       end
     else
-      referer = link.page rescue referer = nil
+      referer = current_page()
       href = link.respond_to?(:href) ? link.href :
         (link['href'] || link['src'])
-      get(:url => href, :referer => (referer || current_page()))
+      get href, [], referer
     end
   end
 
@@ -421,11 +451,10 @@ class Mechanize
     when 'POST'
       post_form(form.action, form, headers)
     when 'GET'
-      get( :url => form.action.gsub(/\?[^\?]*$/, ''),
-           :params => form.build_query,
-           :headers => headers,
-           :referer => form.page
-         )
+      get(form.action.gsub(/\?[^\?]*$/, ''),
+          form.build_query,
+          form.page,
+          headers)
     else
       raise ArgumentError, "unsupported method: #{form.method.upcase}"
     end
@@ -473,6 +502,36 @@ class Mechanize
     end
   end
 
+  # Tests if this agent is allowed to access +url+, consulting the
+  # site's robots.txt.
+  def robots_allowed?(uri)
+    return true if uri.request_uri == '/robots.txt'
+
+    webrobots.allowed?(uri)
+  end
+
+  # Equivalent to !robots_allowed?(url).
+  def robots_disallowed?(url)
+    !webrobots.allowed?(url)
+  end
+
+  # Returns an error object if there is an error in fetching or
+  # parsing robots.txt of the site +url+.
+  def robots_error(url)
+    webrobots.error(url)
+  end
+
+  # Raises the error if there is an error in fetching or parsing
+  # robots.txt of the site +url+.
+  def robots_error!(url)
+    webrobots.error!(url)
+  end
+
+  # Removes robots.txt cache for the site +url+.
+  def robots_reset(url)
+    webrobots.reset(url)
+  end
+
   alias :page :current_page
 
   def connection_for uri
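The robots_* methods above delegate entirely to the webrobots gem. For intuition only, here is a toy allowed-path check that treats every Disallow line as a global path prefix; the real library additionally handles User-agent groups, Allow precedence, and per-site caching:

```ruby
# Toy robots.txt check (illustrative only, not the webrobots gem):
# every Disallow line is treated as a path prefix applying to all
# user agents.
def toy_robots_allowed?(robots_txt, path)
  disallowed = robots_txt.lines.map { |line|
    line[/\ADisallow:\s*(\S+)/, 1]
  }.compact

  disallowed.none? { |prefix| path.start_with?(prefix) }
end
```

Note the special case in robots_allowed? above: fetching /robots.txt itself is always permitted, since otherwise a site disallowing everything could never be checked.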
@@ -608,6 +667,69 @@ class Mechanize
     request['User-Agent'] = @user_agent if @user_agent
   end
 
+  def resolve(uri, referer = current_page())
+    uri = uri.dup if uri.is_a?(URI)
+
+    unless uri.is_a?(URI)
+      uri = uri.to_s.strip.gsub(/[^#{0.chr}-#{126.chr}]/o) { |match|
+        if RUBY_VERSION >= "1.9.0"
+          Mechanize::Util.uri_escape(match)
+        else
+          sprintf('%%%X', match.unpack($KCODE == 'UTF8' ? 'U' : 'C')[0])
+        end
+      }
+
+      unescaped = uri.split(/(?:%[0-9A-Fa-f]{2})+|#/)
+      escaped = uri.scan(/(?:%[0-9A-Fa-f]{2})+|#/)
+
+      escaped_uri = Mechanize::Util.html_unescape(
+        unescaped.zip(escaped).map { |x,y|
+          "#{WEBrick::HTTPUtils.escape(x)}#{y}"
+        }.join('')
+      )
+
+      begin
+        uri = URI.parse(escaped_uri)
+      rescue
+        uri = URI.parse(WEBrick::HTTPUtils.escape(escaped_uri))
+      end
+    end
+
+    scheme = uri.relative? ? 'relative' : uri.scheme.downcase
+    uri = @scheme_handlers[scheme].call(uri, referer)
+
+    if referer && referer.uri
+      if uri.path.length == 0 && uri.relative?
+        uri.path = referer.uri.path
+      end
+    end
+
+    uri.path = '/' if uri.path.length == 0
+
+    if uri.relative?
+      raise ArgumentError, "absolute URL needed (not #{uri})" unless
+        referer && referer.uri
+
+      base = nil
+      if referer.respond_to?(:bases) && referer.parser
+        base = referer.bases.last
+      end
+
+      uri = ((base && base.uri && base.uri.absolute?) ?
+             base.uri :
+             referer.uri) + uri
+      uri = referer.uri + uri
+      # Strip initial "/.." bits from the path
+      uri.path.sub!(/^(\/\.\.)+(?=\/)/, '')
+    end
+
+    unless ['http', 'https', 'file'].include?(uri.scheme.downcase)
+      raise ArgumentError, "unsupported scheme: #{uri.scheme}"
+    end
+
+    uri
+  end
+
   def resolve_parameters uri, method, parameters
     case method
     when :head, :get, :delete, :trace then
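The resolve method above avoids double-escaping by splitting the string into already-escaped runs (%XX sequences and '#') and everything else, escaping only the latter before re-interleaving. A standalone sketch of that split/scan/zip technique; for brevity it escapes only spaces, where Mechanize uses WEBrick::HTTPUtils.escape:

```ruby
# Escape a URI-ish string without touching runs that are already
# percent-escaped. split() yields the unescaped runs, scan() the
# escaped runs (and '#'), and zip() re-interleaves them in order.
def escape_preserving_existing(str)
  unescaped = str.split(/(?:%[0-9A-Fa-f]{2})+|#/)
  escaped   = str.scan(/(?:%[0-9A-Fa-f]{2})+|#/)

  unescaped.zip(escaped).map { |plain, kept|
    "#{plain.gsub(' ', '%20')}#{kept}"
  }.join
end
```

Because split and scan use the same pattern, the two arrays interleave exactly, and kept is nil only at the tail (interpolating nil yields an empty string).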
@@ -707,7 +829,7 @@ class Mechanize
     response.read_body { |part|
       total += part.length
       body.write(part)
-      log.debug("Read #{total} bytes") if log
+      log.debug("Read #{part.length} bytes (#{total} total)") if log
     }
 
     body.rewind
@@ -814,8 +936,15 @@ class Mechanize
 
   private
 
-  def resolve(url, referer = current_page())
-    @resolver.resolve(url, referer).to_s
+  def webrobots_http_get(uri)
+    get_file(uri)
+  rescue Mechanize::ResponseCodeError => e
+    return '' if e.response_code == '404'
+    raise e
+  end
+
+  def webrobots
+    @webrobots ||= WebRobots.new(@user_agent, :http_get => method(:webrobots_http_get))
   end
 
   def set_http proxy = nil
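webrobots_http_get above maps a 404 for robots.txt to an empty string, i.e. an allow-everything robots.txt, while other HTTP errors propagate. The same rescue-and-default pattern in isolation (a small local class stands in for Mechanize::ResponseCodeError, and the fetcher lambda for get_file):

```ruby
# Stand-in for Mechanize::ResponseCodeError.
class ResponseCodeError < StandardError
  attr_reader :response_code

  def initialize(response_code)
    @response_code = response_code
    super("HTTP #{response_code}")
  end
end

# Fetch robots.txt via the supplied fetcher. A 404 means "no
# robots.txt", which is equivalent to an empty file, so return ''.
# Any other error re-raises.
def fetch_robots_txt(fetcher)
  fetcher.call
rescue ResponseCodeError => e
  return '' if e.response_code == '404'
  raise e
end
```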
@@ -868,7 +997,7 @@ class Mechanize
     referer = current_page, redirects = 0
     referer_uri = referer ? referer.uri : nil
 
-    uri = @resolver.resolve uri, referer
+    uri = resolve uri, referer
 
     uri, params = resolve_parameters uri, method, params
 
@@ -889,6 +1018,11 @@ class Mechanize
 
     pre_connect request
 
+    # Consult robots.txt
+    if robots && uri.is_a?(URI::HTTP)
+      robots_allowed?(uri) or raise RobotsDisallowedError.new(uri)
+    end
+
     # Add If-Modified-Since if page is in history
     if (page = visited_page(uri)) and page.response['Last-Modified']
       request['If-Modified-Since'] = page.response['Last-Modified']
@@ -921,7 +1055,13 @@ class Mechanize
     return meta if meta
 
     case response
-    when Net::HTTPSuccess, Mechanize::FileResponse
+    when Net::HTTPSuccess
+      if robots && page.is_a?(Page)
+        page.parser.noindex? and raise RobotsDisallowedError.new(uri)
+      end
+
+      page
+    when Mechanize::FileResponse
       page
     when Net::HTTPNotModified
       log.debug("Got cached page") if log
@@ -958,7 +1098,7 @@ require 'mechanize/pluggable_parsers'
 require 'mechanize/redirect_limit_reached_error'
 require 'mechanize/redirect_not_get_or_head_error'
 require 'mechanize/response_code_error'
+require 'mechanize/robots_disallowed_error'
 require 'mechanize/unsupported_scheme_error'
-require 'mechanize/uri_resolver'
 require 'mechanize/util'