mechanize 0.6.3 → 0.6.4
- data/{CHANGELOG → CHANGELOG.txt} +17 -0
- data/{EXAMPLES → EXAMPLES.txt} +0 -0
- data/{GUIDE → GUIDE.txt} +0 -0
- data/{LICENSE → LICENSE.txt} +0 -0
- data/Manifest.txt +31 -0
- data/{NOTES → NOTES.txt} +14 -0
- data/{README → README.txt} +7 -3
- data/Rakefile +64 -0
- data/eg/flickr_upload.rb +23 -0
- data/eg/mech-dump.rb +7 -0
- data/eg/proxy_req.rb +9 -0
- data/eg/rubyforge.rb +21 -0
- data/eg/spider.rb +11 -0
- data/lib/mechanize/cookie.rb +15 -3
- data/lib/mechanize/errors.rb +1 -3
- data/lib/mechanize/form.rb +3 -1
- data/lib/mechanize/form_elements.rb +1 -1
- data/lib/mechanize/mech_version.rb +1 -1
- data/lib/mechanize/net-overrides/net/http.rb +1 -1
- data/lib/mechanize/page.rb +7 -10
- data/lib/mechanize/page_elements.rb +1 -0
- data/lib/mechanize/parsers/rexml_page.rb +8 -10
- data/lib/mechanize.rb +59 -37
- data/setup.rb +1585 -0
- data/test/htdocs/relative/tc_relative_links.html +19 -0
- data/test/htdocs/tc_relative_links.html +19 -0
- data/test/tc_checkboxes.rb +1 -1
- data/test/tc_cookie_class.rb +18 -2
- data/test/tc_form_as_hash.rb +6 -8
- data/test/tc_form_button.rb +36 -0
- data/test/tc_form_no_inputname.rb +2 -2
- data/test/tc_forms.rb +12 -0
- data/test/tc_no_attributes.rb +1 -1
- data/test/tc_relative_links.rb +40 -0
- data/test/tc_response_code.rb +20 -0
- data/test/tc_subclass.rb +28 -0
- data/test/tc_upload.rb +11 -11
- data/test/{ts_mech.rb → test_all.rb} +24 -34
- data/test/test_includes.rb +112 -0
- data/test/{servlets.rb → test_servlets.rb} +4 -6
- metadata +55 -59
- data/test/README +0 -7
- data/test/proxy.rb +0 -30
- data/test/server.rb +0 -43
- data/test/ssl_server.rb +0 -48
data/{CHANGELOG → CHANGELOG.txt}
RENAMED
@@ -1,5 +1,22 @@
 = Mechanize CHANGELOG
 
+== 0.6.4
+
+* Adding the "redirect_ok" method to Mechanize to stop mechanize from
+  following redirects.
+  http://rubyforge.org/tracker/index.php?func=detail&aid=6571&group_id=1453&atid=5712
+* Added protected method Mechanize#set_headers so that subclasses can set
+  custom headers.
+  http://rubyforge.org/tracker/?func=detail&aid=7208&group_id=1453&atid=5712
+* Aliased Page#referer to Page#page
+* Fixed a bug when clicking relative urls
+  http://rubyforge.org/pipermail/mechanize-users/2006-November/000035.html
+* Fixing a bug when bad version or max age is passed to Cookie::parse
+  http://rubyforge.org/pipermail/mechanize-users/2006-November/000033.html
+* Fixing a bug with response codes. [#6526]
+* Fixed bug [#6548]. Input type of 'button' was not being added as a button.
+* Fixed bug [#7139]. REXML parser calls hpricot parser by accident
+
 == 0.6.3
 
 * Added keys and values methods to Form
data/{EXAMPLES → EXAMPLES.txt}
RENAMED
File without changes
data/{GUIDE → GUIDE.txt}
RENAMED
File without changes
data/{LICENSE → LICENSE.txt}
RENAMED
File without changes
data/Manifest.txt
ADDED
@@ -0,0 +1,31 @@
+CHANGELOG.txt
+EXAMPLES.txt
+GUIDE.txt
+LICENSE.txt
+Manifest.txt
+NOTES.txt
+README.txt
+Rakefile
+eg/flickr_upload.rb
+eg/mech-dump.rb
+eg/proxy_req.rb
+eg/rubyforge.rb
+eg/spider.rb
+lib/mechanize.rb
+lib/mechanize/cookie.rb
+lib/mechanize/errors.rb
+lib/mechanize/form.rb
+lib/mechanize/form_elements.rb
+lib/mechanize/hpricot.rb
+lib/mechanize/inspect.rb
+lib/mechanize/list.rb
+lib/mechanize/mech_version.rb
+lib/mechanize/net-overrides/net/http.rb
+lib/mechanize/net-overrides/net/https.rb
+lib/mechanize/net-overrides/net/protocol.rb
+lib/mechanize/page.rb
+lib/mechanize/page_elements.rb
+lib/mechanize/parsers/rexml_page.rb
+lib/mechanize/pluggable_parsers.rb
+lib/mechanize/rexml.rb
+setup.rb
data/{NOTES → NOTES.txt}
RENAMED
@@ -1,5 +1,19 @@
 = Mechanize Release Notes
 
+== 0.6.4 (Gwendolyn)
+
+Custom request headers can now be added to Mechanize by subclassing mechanize
+and defining the Mechanize#set_headers method. For example:
+  class A < WWW::Mechanize
+    def set_headers(u, r, c)
+      super(uri, request, cur_page)
+      request.add_field('Cookie', 'name=Aaron')
+      request
+    end
+  end
+The Mechanize#redirect_ok method has been added to that you can keep mechanize
+from following redirects.
+
 == 0.6.3 (Big Man)
 
 Mechanize 0.6.3 (Big Man) has a few bug fixes and some new features added to
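The release-notes example declares its parameters as `u, r, c` but then uses `uri`, `request`, and `cur_page` in the body, so it would not run as printed. Below is a minimal sketch of the same subclassing pattern with consistent names. It assumes a hypothetical stand-in class, `FakeAgent`, in place of `WWW::Mechanize` so that it runs with only the standard library; the `build_request` method is likewise invented for illustration.

```ruby
require 'net/http'
require 'uri'

# Hypothetical stand-in for WWW::Mechanize; only the set_headers hook is modelled.
class FakeAgent
  # Builds a GET request and passes it through the protected hook.
  def build_request(url)
    uri = URI.parse(url)
    set_headers(uri, Net::HTTP::Get.new(uri.request_uri), nil)
  end

  protected

  # Default headers; subclasses extend this and must return the request.
  def set_headers(uri, request, cur_page)
    request.add_field('Accept-Language', 'en-us')
    request
  end
end

class CookieAgent < FakeAgent
  protected

  # Parameter names match the names used in the body.
  def set_headers(uri, request, cur_page)
    super(uri, request, cur_page)
    request.add_field('Cookie', 'name=Aaron')
    request
  end
end

request = CookieAgent.new.build_request('http://example.com/index.html')
puts request['Cookie']
```

Calling `super` first keeps the default headers, and returning `request` matters because the caller uses the hook's return value.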
data/{README → README.txt}
RENAMED
@@ -1,5 +1,9 @@
 = WWW::Mechanize
 
+http://mechanize.rubyforge.org/
+
+== DESCRIPTION
+
 The Mechanize library is used for automating interaction with websites.
 Mechanize automatically stores and sends cookies, follows redirects,
 can follow links, and submit forms. Form fields can be populated and
@@ -16,8 +20,8 @@ Note that the files in the net-overrides/ directory are taken from Ruby 1.9.0.
 
 == Examples
 
-If you are just starting, check out the GUIDE[link://files/
-Also, check out the EXAMPLES[link://files/
+If you are just starting, check out the GUIDE[link://files/GUIDE_txt.html].
+Also, check out the EXAMPLES[link://files/EXAMPLES_txt.html] file.
 
 == Authors
 
@@ -33,5 +37,5 @@ Ruby, my favorite language!
 
 == License
 
-This library is distributed under the GPL. Please see the LICENSE file.
+This library is distributed under the GPL. Please see the LICENSE[link://files/LICENSE_txt.html] file.
 
data/Rakefile
ADDED
@@ -0,0 +1,64 @@
+require 'rubygems'
+require 'hoe'
+
+def announce(msg='')
+  STDERR.puts msg
+end
+
+PKG_BUILD = ENV['PKG_BUILD'] ? '.' + ENV['PKG_BUILD'] : ''
+PKG_NAME = 'mechanize'
+PKG_VERSION = '0.6.4' + PKG_BUILD
+
+Hoe.new(PKG_NAME, PKG_VERSION) do |p|
+  p.rubyforge_name = PKG_NAME
+  p.author = 'Aaron Patterson'
+  p.email = 'aaronp@rubyforge.org'
+  p.summary = "Mechanize provides automated web-browsing"
+  p.description = p.paragraphs_of('README.txt', 3).join("\n\n")
+  p.url = p.paragraphs_of('README.txt', 1).first.strip
+  p.changes = p.paragraphs_of('CHANGELOG.txt', 0..2).join("\n\n")
+  files =
+    (p.test_globs + ['test/**/tc_*.rb',
+    "test/htdocs/**/*.{html,jpg}",
+    'test/data/server.*']).map { |x|
+      Dir.glob(x)
+    }.flatten + ['test/data/htpasswd']
+  p.extra_deps = ['hpricot']
+  p.spec_extras = { :test_files => files }
+end
+
+task :update_version do
+  announce "Updating Mechanize Version to #{PKG_VERSION}"
+  File.open("lib/mechanize/mech_version.rb", "w") do |f|
+    f.puts "module WWW"
+    f.puts " class Mechanize"
+    f.puts " Version = '#{PKG_VERSION}'"
+    f.puts " end"
+    f.puts "end"
+  end
+  sh 'svn commit -m"updating version" lib/mechanize/mech_version.rb'
+end
+
+desc "Tag code"
+Rake::Task.define_task("tag") do |p|
+  baseurl = "svn+ssh://#{ENV['USER']}@rubyforge.org//var/svn/#{PKG_NAME}"
+  sh "svn cp -m 'tagged #{ PKG_VERSION }' . #{ baseurl }/tags/REL-#{ PKG_VERSION }"
+end
+
+desc "Branch code"
+Rake::Task.define_task("branch") do |p|
+  baseurl = "svn+ssh://#{ENV['USER']}@rubyforge.org/var/svn/#{PKG_NAME}"
+  sh "svn cp -m 'branched #{ PKG_VERSION }' #{baseurl}/trunk #{ baseurl }/branches/RB-#{ PKG_VERSION }"
+end
+
+desc "Update SSL Certificate"
+Rake::Task.define_task('ssl_cert') do |p|
+  sh "openssl genrsa -des3 -out server.key 1024"
+  sh "openssl req -new -key server.key -out server.csr"
+  sh "cp server.key server.key.org"
+  sh "openssl rsa -in server.key.org -out server.key"
+  sh "openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt"
+  sh "cp server.key server.pem"
+  sh "mv server.key server.csr server.crt server.pem test/data/"
+  sh "rm server.key.org"
+end
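The Rakefile's version handling (an optional `PKG_BUILD` suffix, then a generated `mech_version.rb`) can be sketched in isolation. The helper names `package_version` and `version_source` are hypothetical and are not part of the Rakefile. The sketch only renders the file text; it does not write the file or touch svn.

```ruby
# Returns the package version, appending an optional build suffix,
# the way PKG_VERSION is assembled from PKG_BUILD.
def package_version(base, build = nil)
  build ? "#{base}.#{build}" : base
end

# Renders the Ruby source that an update_version-style task would write out.
def version_source(version)
  [
    "module WWW",
    "  class Mechanize",
    "    Version = '#{version}'",
    "  end",
    "end"
  ].join("\n") + "\n"
end

puts version_source(package_version('0.6.4', ENV['PKG_BUILD']))
```

Keeping the rendering separate from the file write makes the generated text easy to check before anything is committed.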
data/eg/flickr_upload.rb
ADDED
@@ -0,0 +1,23 @@
+$:.unshift File.join(File.dirname(__FILE__), "..", "lib")
+
+require 'rubygems'
+require 'mechanize'
+
+agent = WWW::Mechanize.new
+
+# Get the flickr sign in page
+page = agent.get('http://flickr.com/signin/flickr/')
+
+# Fill out the login form
+form = page.forms.name('flickrloginform').first
+form.email = ARGV[0]
+form.password = ARGV[1]
+page = agent.submit(form)
+
+# Go to the upload page
+page = agent.click page.links.text('Upload')
+
+# Fill out the form
+form = page.forms.action('/photos_upload_process.gne').first
+form.file_uploads.name('file1').first.file_name = ARGV[2]
+agent.submit(form)
data/eg/mech-dump.rb
ADDED
data/eg/proxy_req.rb
ADDED
data/eg/rubyforge.rb
ADDED
@@ -0,0 +1,21 @@
+$:.unshift File.join(File.dirname(__FILE__), "..", "lib")
+
+# This example logs a user in to rubyforge and prints out the body of the
+# page after logging the user in.
+require 'rubygems'
+require 'mechanize'
+
+# Create a new mechanize object
+agent = WWW::Mechanize.new { |a| a.log = Logger.new(STDERR) }
+
+# Load the rubyforge website
+page = agent.get('http://rubyforge.org/')
+page = agent.click page.links.text(/Log In/) # Click the login link
+form = page.forms[1] # Select the first form
+form.form_loginname = ARGV[0]
+form.form_pw = ARGV[1]
+
+# Submit the form
+page = agent.submit(form, form.buttons.first)
+
+puts page.body # Print out the body
data/eg/spider.rb
ADDED
@@ -0,0 +1,11 @@
+$:.unshift File.join(File.dirname(__FILE__), "..", "lib")
+
+require 'rubygems'
+require 'mechanize'
+
+agent = WWW::Mechanize.new
+stack = agent.get(ARGV[0]).links
+while l = stack.pop
+  next unless l.uri.host == agent.history.first.uri.host
+  stack.push(*(agent.click(l).links)) unless agent.visited? l.href
+end
data/lib/mechanize/cookie.rb
CHANGED
@@ -6,7 +6,7 @@ module WWW
   class Mechanize
     # This class is used to represent an HTTP Cookie.
     class Cookie < WEBrick::Cookie
-      def self.parse(uri, str)
+      def self.parse(uri, str, log = nil)
         return str.split(/,(?=[^;,]*=)|,$/).collect { |c|
           cookie_elem = c.split(/;/)
           first_elem = cookie_elem.shift
@@ -28,9 +28,21 @@ module WWW
   rescue
     Time.now
   end
-when "max-age" then
+when "max-age" then
+  begin
+    cookie.max_age = Integer(value)
+  rescue
+    log.warn("Couldn't parse max age '#{value}'") if log
+    cookie.max_age = nil
+  end
 when "comment" then cookie.comment = value
-when "version" then
+when "version" then
+  begin
+    cookie.version = Integer(value)
+  rescue
+    log.warn("Couldn't parse version '#{value}'") if log
+    cookie.version = nil
+  end
 when "secure" then cookie.secure = true
 end
 }
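The cookie change wraps `Integer(value)` in `begin`/`rescue` so that a malformed `max-age` or `version` yields `nil` and an optional warning instead of an exception. A minimal sketch of that pattern follows. The helper name `parse_int_attribute` is hypothetical, and the sketch rescues only `ArgumentError` and `TypeError`, which is narrower than the bare `rescue` in the diff.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: converts a cookie attribute to an Integer,
# returning nil (and logging a warning if a logger is given) when
# the value is malformed.
def parse_int_attribute(name, value, log = nil)
  Integer(value)
rescue ArgumentError, TypeError
  log.warn("Couldn't parse #{name} '#{value}'") if log
  nil
end

sink = StringIO.new          # capture log output in memory
log = Logger.new(sink)

p parse_int_attribute('max age', '3600', log)
p parse_int_attribute('version', 'bogus', log)
```

Because the logger defaults to `nil`, callers that do not care about diagnostics can omit it, which mirrors the new optional `log` parameter on `Cookie.parse`.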
data/lib/mechanize/errors.rb
CHANGED
data/lib/mechanize/form.rb
CHANGED
@@ -118,7 +118,7 @@ module WWW
 node.attributes ||= {}
 type = (node.attributes['type'] || 'text').downcase
 name = node.attributes['name']
-next if type
+next if name.nil? && !(type == 'submit' || type =='button')
 case type
 when 'text', 'password', 'hidden', 'int'
   @fields << Field.new(node.attributes['name'], node.attributes['value'] || '')
@@ -130,6 +130,8 @@ module WWW
   @file_uploads << FileUpload.new(node.attributes['name'], nil)
 when 'submit'
   @buttons << Button.new(node.attributes['name'], node.attributes['value'])
+when 'button'
+  @buttons << Button.new(node.attributes['name'], node.attributes['value'])
 when 'image'
   @buttons << ImageButton.new(node.attributes['name'], node.attributes['value'])
 end
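The two form.rb hunks change which inputs are kept: nameless inputs are skipped unless they are `submit` or `button`, and `button` inputs are now collected as buttons. The sketch below models that rule with plain hashes. `classify_inputs` is a hypothetical name, and the real code builds `Field` and `Button` objects from parsed nodes rather than arrays.

```ruby
# Hypothetical, simplified model of the input-classification loop.
# Each input is a Hash of attributes instead of a parsed HTML node.
def classify_inputs(inputs)
  fields = []
  buttons = []
  inputs.each do |attrs|
    type = (attrs['type'] || 'text').downcase
    name = attrs['name']
    # Skip nameless inputs unless they are buttons.
    next if name.nil? && !(type == 'submit' || type == 'button')
    case type
    when 'text', 'password', 'hidden', 'int'
      fields << [name, attrs['value'] || '']
    when 'submit', 'button'
      buttons << [name, attrs['value']]
    end
  end
  { fields: fields, buttons: buttons }
end

result = classify_inputs([
  { 'type' => 'text', 'name' => 'q', 'value' => 'ruby' },
  { 'type' => 'text' },                       # nameless text input: skipped
  { 'type' => 'button', 'value' => 'Go' },    # nameless button: kept
  { 'type' => 'SUBMIT', 'name' => 'commit', 'value' => 'Save' }
])
p result
```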
data/lib/mechanize/form_elements.rb
CHANGED
@@ -22,7 +22,7 @@ module WWW
 # class, set WWW::FileUpload#file_data= to the data of the file you want
 # to upload and WWW::FileUpload#mime_type= to the appropriate mime type
 # of the file.
-# See the example in EXAMPLES[link://files/
+# See the example in EXAMPLES[link://files/EXAMPLES_txt.html]
 class FileUpload < Field
   attr_accessor :name # Field name
   attr_accessor :file_name # File name
data/lib/mechanize/net-overrides/net/http.rb
CHANGED
@@ -1826,7 +1826,7 @@ module Net # :nodoc:
 '416' => HTTPRequestedRangeNotSatisfiable,
 '417' => HTTPExpectationFailed,
 
-'
+'500' => HTTPInternalServerError,
 '501' => HTTPNotImplemented,
 '502' => HTTPBadGateway,
 '503' => HTTPServiceUnavailable,
data/lib/mechanize/page.rb
CHANGED
@@ -1,5 +1,6 @@
 require 'fileutils'
 require 'hpricot'
+require 'forwardable'
 
 module WWW
   class Mechanize
@@ -15,6 +16,8 @@ module WWW
 # agent.get('http://google.com/').class #=> WWW::Mechanize::Page
 #
 class Page < File
+  extend Forwardable
+
   attr_reader :root, :title, :watch_for_set
   attr_reader :frames, :iframes, :links, :forms, :meta, :watches
   attr_accessor :mech
@@ -31,7 +34,7 @@ module WWW
 
   # construct parser and feed with HTML
   if body && response
-    @root
+    @root ||= Hpricot.parse(body)
     parse_html
   end
 end
@@ -47,16 +50,10 @@ module WWW
 end
 
 # Search through the page like HPricot
-
-
-
+def_delegator :@root, :search, :search
+def_delegator :@root, :/, :/
+def_delegator :@root, :at, :at
 
-def at(*args)
-  @root.at(*args)
-end
-
-alias :/ :search
-
 def watch_for_set=(obj)
   @watch_for_set = obj
   parse_html if @body
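The page.rb hunks replace hand-written wrapper methods with `Forwardable` delegators to `@root`. A minimal sketch of that technique is shown here. `FakeRoot` and `FakePage` are hypothetical stand-ins for the Hpricot document and `WWW::Mechanize::Page`, and only `search` and `at` are delegated in the sketch.

```ruby
require 'forwardable'

# Hypothetical stand-in for a parsed document root.
class FakeRoot
  def initialize(nodes)
    @nodes = nodes
  end

  # Returns every node equal to the given name.
  def search(name)
    @nodes.select { |n| n == name }
  end

  # Returns the first matching node, or nil.
  def at(name)
    search(name).first
  end
end

class FakePage
  extend Forwardable

  # Forward search and at to the @root instance variable.
  def_delegator :@root, :search, :search
  def_delegator :@root, :at, :at

  def initialize(root)
    @root = root
  end
end

page = FakePage.new(FakeRoot.new(%w[a p a]))
p page.search('a')
p page.at('p')
```

Delegating to `:@root` rather than to a reader method means the forwarding keeps working even when a subclass assigns `@root` itself, which is what the REXML page does.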
data/lib/mechanize/parsers/rexml_page.rb
CHANGED
@@ -3,15 +3,10 @@ require 'mechanize/rexml'
 
 class WWW::Mechanize::REXMLPage < WWW::Mechanize::Page
   def initialize(uri=nil, response=nil, body=nil, code=nil, mech=nil)
-
-    @watch_for_set
-    @mech
-
-    yield self if block_given?
-
-    raise Mechanize::ContentTypeError.new(response['content-type']) unless
-      content_type() =~ /^text\/html/
-
+    @body = body
+    @watch_for_set = {}
+    @mech = mech
+
     # construct parser and feed with HTML
     parser = HTMLTree::XMLParser.new
     begin
@@ -32,6 +27,9 @@ class WWW::Mechanize::REXMLPage < WWW::Mechanize::Page
     end
 
     @root = parser.document
-
+
+    yield self if block_given?
+
+    super(uri, response, body, code)
   end
 end
data/lib/mechanize.rb
CHANGED
@@ -71,10 +71,13 @@ class Mechanize
   attr_accessor :key
   attr_accessor :cert
   attr_accessor :pass
+  attr_accessor :redirect_ok
 
   attr_reader :history
   attr_reader :pluggable_parser
 
+  alias :follow_redirect? :redirect_ok
+
   def initialize
     # attr_accessors
     @cookie_jar = CookieJar.new
@@ -88,6 +91,7 @@ class Mechanize
     @cert = nil # OpenSSL Certificate
     @key = nil # OpenSSL Private Key
     @pass = nil # OpenSSL Password
+    @redirect_ok = true # Should we follow redirects?
 
     # attr_readers
     @history = []
@@ -129,14 +133,14 @@ class Mechanize
   end
 
   # Fetches the URL passed in and returns a page.
-  def get(url, referer=nil)
+  def get(url, referer=nil, &block)
     cur_page = referer || current_page ||
       Page.new( nil, {'content-type'=>'text/html'})
 
     # fetch the page
     abs_uri = to_absolute_uri(url, cur_page)
     request = fetch_request(abs_uri)
-    page = fetch_page(abs_uri, request, cur_page)
+    page = fetch_page(abs_uri, request, cur_page, &block)
     add_to_history(page)
     page
   end
@@ -150,15 +154,16 @@ class Mechanize
   # Clicks the WWW::Mechanize::Link object passed in and returns the
   # page fetched.
   def click(link)
-    uri = to_absolute_uri(
-      link.attributes['href'] || link.attributes['src'] || link.href
-    )
     referer =
       begin
         link.page
       rescue
         nil
       end
+    uri = to_absolute_uri(
+      link.attributes['href'] || link.attributes['src'] || link.href,
+      referer || current_page()
+    )
     get(uri, referer)
   end
 
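The `click` fix resolves a link's href against the page the link came from, falling back to the current page, instead of always using the current page. The idea can be sketched with the standard `uri` library. `resolve_link` is a hypothetical helper, and the example.com URLs are placeholders rather than the test fixtures' real addresses.

```ruby
require 'uri'

# Resolves href against the page the link came from, falling back to
# the current page when the link has no known origin.
def resolve_link(href, referer_uri, current_uri)
  base = referer_uri || current_uri
  URI.join(base, href).to_s
end

# The link was found on a page under /relative/, but the agent has
# since moved on to a page at the site root.
puts resolve_link('tc_relative_links.html',
                  'http://example.com/relative/index.html',
                  'http://example.com/index.html')
```

With the referer as the base, the relative href stays inside `/relative/`; resolved against the current page it would point at the site root instead.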
@@ -232,6 +237,35 @@ class Mechanize
 
   alias :page :current_page
 
+  protected
+  def set_headers(uri, request, cur_page)
+    request.add_field('Accept-Encoding', 'gzip,identity')
+    request.add_field('Accept-Language', 'en-us,en;q0.5')
+    request.add_field('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.7')
+
+    unless @cookie_jar.empty?(uri)
+      cookies = @cookie_jar.cookies(uri)
+      cookie = cookies.length > 0 ? cookies.join("; ") : nil
+      if log
+        cookies.each do |c|
+          log.debug("using cookie: #{c}")
+        end
+      end
+      request.add_field('Cookie', cookie)
+    end
+
+    # Add Referer header to request
+    unless cur_page.uri.nil?
+      request.add_field('Referer', cur_page.uri.to_s)
+    end
+
+    # Add User-Agent header to request
+    request.add_field('User-Agent', @user_agent) if @user_agent
+
+    request.basic_auth(@user, @password) if @user || @password
+    request
+  end
+
   private
 
   def to_absolute_uri(url, cur_page=current_page())
@@ -306,28 +340,7 @@ class Mechanize
       end
     end
 
-    request
-
-    unless @cookie_jar.empty?(uri)
-      cookies = @cookie_jar.cookies(uri)
-      cookie = cookies.length > 0 ? cookies.join("; ") : nil
-      if log
-        cookies.each do |c|
-          log.debug("using cookie: #{c}")
-        end
-      end
-      request.add_field('Cookie', cookie)
-    end
-
-    # Add Referer header to request
-    unless cur_page.uri.nil?
-      request.add_field('Referer', cur_page.uri.to_s)
-    end
-
-    # Add User-Agent header to request
-    request.add_field('User-Agent', @user_agent) if @user_agent
-
-    request.basic_auth(@user, @password) if @user || @password
+    request = set_headers(uri, request, cur_page)
 
     # Log specified headers for the request
     if log
@@ -350,13 +363,20 @@ class Mechanize
   end
 
   (response.get_fields('Set-Cookie')||[]).each do |cookie|
-    Cookie::parse(uri, cookie) { |c|
+    Cookie::parse(uri, cookie, log) { |c|
       log.debug("saved cookie: #{c}") if log
       @cookie_jar.add(uri, c)
     }
   end
 
-
+  body = StringIO.new
+  total = 0
+  response.read_body { |part|
+    total += part.length
+    body.write(part)
+    log.debug("Read #{total} bytes") if log
+  }
+  body.rewind
 
   content_type = nil
   unless response['Content-Type'].nil?
@@ -369,12 +389,12 @@ class Mechanize
     case encoding.downcase
     when 'gzip'
       log.debug('gunzip body') if log
-      Zlib::GzipReader.new(
+      Zlib::GzipReader.new(body).read
     else
       raise 'Unsupported content encoding'
     end
   else
-
+    body.read
   end
 
 # Find our pluggable parser
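These two hunks stream the response into a `StringIO` chunk by chunk and then either gunzip or read it. A self-contained sketch of that flow follows. The `read_body` helper here is hypothetical and takes an array of chunks in place of a live `response.read_body` block. Treating `identity` as a no-op encoding is an assumption that is not in the diff.

```ruby
require 'stringio'
require 'zlib'

# Accumulates body chunks in a StringIO, then decodes by content encoding.
def read_body(chunks, encoding = nil)
  body = StringIO.new
  body.binmode                      # chunks may contain raw gzip bytes
  chunks.each { |part| body.write(part) }
  body.rewind                       # read from the start, not the end

  case encoding.to_s.downcase
  when 'gzip'
    Zlib::GzipReader.new(body).read
  when '', 'identity'               # assumption: identity means "as is"
    body.read
  else
    raise 'Unsupported content encoding'
  end
end

compressed = Zlib.gzip('hello mechanize')
halves = [compressed[0, 5], compressed[5..-1]]  # simulate two network chunks
puts read_body(halves, 'gzip')
puts read_body(['plain ', 'text'])
```

The `rewind` call is the step that is easy to miss: after the writes the position sits at the end of the buffer, so reading without it returns an empty string.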
@@ -393,17 +413,19 @@ class Mechanize
     page.watch_for_set = @watch_for_set
   end
 
-
-
-
-
+  res_klass = Net::HTTPResponse::CODE_TO_OBJ[page.code.to_s]
+
+  return page if res_klass <= Net::HTTPSuccess
+
+  if res_klass <= Net::HTTPRedirection
+    return page unless follow_redirect?
     log.info("follow redirect to: #{ response['Location'] }") if log
     abs_uri = to_absolute_uri(response['Location'].to_s, page)
     request = fetch_request(abs_uri)
     return fetch_page(abs_uri, request, page)
-  else
-    raise ResponseCodeError.new(page), "Unhandled response", caller
   end
+
+  raise ResponseCodeError.new(page), "Unhandled response", caller
     }
   }
 end
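The final hunk dispatches on the response class looked up from `Net::HTTPResponse::CODE_TO_OBJ` and consults the new `redirect_ok` flag before following a redirect. A rough sketch of that decision, using only `net/http`, is given here. `classify_response` and its symbolic return values are hypothetical, and the explicit `nil` guard for unknown codes is an addition that is not in the diff.

```ruby
require 'net/http'

# Decides what to do with a response code, mirroring the class-based
# checks in the hunk. Returns :ok, :follow, :stop, or :error.
def classify_response(code, redirect_ok = true)
  klass = Net::HTTPResponse::CODE_TO_OBJ[code.to_s]
  return :error if klass.nil?                # unknown status code (added guard)
  return :ok if klass <= Net::HTTPSuccess    # any 2xx class
  if klass <= Net::HTTPRedirection           # any 3xx class
    return redirect_ok ? :follow : :stop
  end
  :error                                     # everything else is unhandled
end

p classify_response(200)
p classify_response(302)
p classify_response(302, false)
p classify_response(404)
```

`Module#<=` returns `nil` for unrelated classes, so a 404 falls through both checks and ends up in the error branch.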