eric-mechanize 0.9.3.20090623142847
- data/CHANGELOG.rdoc +504 -0
- data/EXAMPLES.rdoc +171 -0
- data/FAQ.rdoc +11 -0
- data/GUIDE.rdoc +122 -0
- data/LICENSE.rdoc +340 -0
- data/Manifest.txt +169 -0
- data/README.rdoc +60 -0
- data/Rakefile +43 -0
- data/examples/flickr_upload.rb +23 -0
- data/examples/mech-dump.rb +7 -0
- data/examples/proxy_req.rb +9 -0
- data/examples/rubyforge.rb +21 -0
- data/examples/spider.rb +11 -0
- data/lib/mechanize.rb +7 -0
- data/lib/www/mechanize.rb +619 -0
- data/lib/www/mechanize/chain.rb +34 -0
- data/lib/www/mechanize/chain/auth_headers.rb +80 -0
- data/lib/www/mechanize/chain/body_decoding_handler.rb +48 -0
- data/lib/www/mechanize/chain/connection_resolver.rb +78 -0
- data/lib/www/mechanize/chain/custom_headers.rb +23 -0
- data/lib/www/mechanize/chain/handler.rb +9 -0
- data/lib/www/mechanize/chain/header_resolver.rb +53 -0
- data/lib/www/mechanize/chain/parameter_resolver.rb +24 -0
- data/lib/www/mechanize/chain/post_connect_hook.rb +0 -0
- data/lib/www/mechanize/chain/pre_connect_hook.rb +22 -0
- data/lib/www/mechanize/chain/request_resolver.rb +32 -0
- data/lib/www/mechanize/chain/response_body_parser.rb +40 -0
- data/lib/www/mechanize/chain/response_header_handler.rb +50 -0
- data/lib/www/mechanize/chain/response_reader.rb +41 -0
- data/lib/www/mechanize/chain/ssl_resolver.rb +42 -0
- data/lib/www/mechanize/chain/uri_resolver.rb +77 -0
- data/lib/www/mechanize/content_type_error.rb +16 -0
- data/lib/www/mechanize/cookie.rb +72 -0
- data/lib/www/mechanize/cookie_jar.rb +191 -0
- data/lib/www/mechanize/file.rb +73 -0
- data/lib/www/mechanize/file_response.rb +62 -0
- data/lib/www/mechanize/file_saver.rb +39 -0
- data/lib/www/mechanize/form.rb +360 -0
- data/lib/www/mechanize/form/button.rb +8 -0
- data/lib/www/mechanize/form/check_box.rb +13 -0
- data/lib/www/mechanize/form/field.rb +28 -0
- data/lib/www/mechanize/form/file_upload.rb +24 -0
- data/lib/www/mechanize/form/image_button.rb +23 -0
- data/lib/www/mechanize/form/multi_select_list.rb +69 -0
- data/lib/www/mechanize/form/option.rb +51 -0
- data/lib/www/mechanize/form/radio_button.rb +38 -0
- data/lib/www/mechanize/form/select_list.rb +45 -0
- data/lib/www/mechanize/headers.rb +12 -0
- data/lib/www/mechanize/history.rb +67 -0
- data/lib/www/mechanize/inspect.rb +90 -0
- data/lib/www/mechanize/monkey_patch.rb +37 -0
- data/lib/www/mechanize/page.rb +181 -0
- data/lib/www/mechanize/page/base.rb +10 -0
- data/lib/www/mechanize/page/frame.rb +22 -0
- data/lib/www/mechanize/page/link.rb +50 -0
- data/lib/www/mechanize/page/meta.rb +51 -0
- data/lib/www/mechanize/pluggable_parsers.rb +103 -0
- data/lib/www/mechanize/redirect_limit_reached_error.rb +18 -0
- data/lib/www/mechanize/redirect_not_get_or_head_error.rb +20 -0
- data/lib/www/mechanize/response_code_error.rb +25 -0
- data/lib/www/mechanize/unsupported_scheme_error.rb +10 -0
- data/lib/www/mechanize/util.rb +76 -0
- data/mechanize.gemspec +41 -0
- data/test/chain/test_argument_validator.rb +14 -0
- data/test/chain/test_auth_headers.rb +25 -0
- data/test/chain/test_custom_headers.rb +18 -0
- data/test/chain/test_header_resolver.rb +28 -0
- data/test/chain/test_parameter_resolver.rb +35 -0
- data/test/chain/test_request_resolver.rb +29 -0
- data/test/chain/test_response_reader.rb +24 -0
- data/test/data/htpasswd +1 -0
- data/test/data/server.crt +16 -0
- data/test/data/server.csr +12 -0
- data/test/data/server.key +15 -0
- data/test/data/server.pem +15 -0
- data/test/helper.rb +129 -0
- data/test/htdocs/alt_text.html +10 -0
- data/test/htdocs/bad_form_test.html +9 -0
- data/test/htdocs/button.jpg +0 -0
- data/test/htdocs/empty_form.html +6 -0
- data/test/htdocs/file_upload.html +26 -0
- data/test/htdocs/find_link.html +41 -0
- data/test/htdocs/form_multi_select.html +16 -0
- data/test/htdocs/form_multival.html +37 -0
- data/test/htdocs/form_no_action.html +18 -0
- data/test/htdocs/form_no_input_name.html +16 -0
- data/test/htdocs/form_select.html +16 -0
- data/test/htdocs/form_select_all.html +16 -0
- data/test/htdocs/form_select_none.html +17 -0
- data/test/htdocs/form_select_noopts.html +10 -0
- data/test/htdocs/form_set_fields.html +14 -0
- data/test/htdocs/form_test.html +188 -0
- data/test/htdocs/frame_test.html +30 -0
- data/test/htdocs/google.html +13 -0
- data/test/htdocs/iframe_test.html +16 -0
- data/test/htdocs/index.html +6 -0
- data/test/htdocs/link with space.html +5 -0
- data/test/htdocs/meta_cookie.html +11 -0
- data/test/htdocs/no_title_test.html +6 -0
- data/test/htdocs/relative/tc_relative_links.html +21 -0
- data/test/htdocs/tc_bad_links.html +5 -0
- data/test/htdocs/tc_base_link.html +8 -0
- data/test/htdocs/tc_blank_form.html +11 -0
- data/test/htdocs/tc_checkboxes.html +19 -0
- data/test/htdocs/tc_encoded_links.html +5 -0
- data/test/htdocs/tc_follow_meta.html +8 -0
- data/test/htdocs/tc_form_action.html +48 -0
- data/test/htdocs/tc_links.html +18 -0
- data/test/htdocs/tc_no_attributes.html +16 -0
- data/test/htdocs/tc_pretty_print.html +17 -0
- data/test/htdocs/tc_radiobuttons.html +17 -0
- data/test/htdocs/tc_referer.html +10 -0
- data/test/htdocs/tc_relative_links.html +19 -0
- data/test/htdocs/tc_textarea.html +23 -0
- data/test/htdocs/unusual______.html +5 -0
- data/test/servlets.rb +365 -0
- data/test/ssl_server.rb +48 -0
- data/test/test_authenticate.rb +71 -0
- data/test/test_bad_links.rb +25 -0
- data/test/test_blank_form.rb +16 -0
- data/test/test_checkboxes.rb +61 -0
- data/test/test_content_type.rb +13 -0
- data/test/test_cookie_class.rb +338 -0
- data/test/test_cookie_jar.rb +362 -0
- data/test/test_cookies.rb +123 -0
- data/test/test_encoded_links.rb +20 -0
- data/test/test_errors.rb +49 -0
- data/test/test_follow_meta.rb +108 -0
- data/test/test_form_action.rb +52 -0
- data/test/test_form_as_hash.rb +61 -0
- data/test/test_form_button.rb +38 -0
- data/test/test_form_no_inputname.rb +15 -0
- data/test/test_forms.rb +564 -0
- data/test/test_frames.rb +25 -0
- data/test/test_get_headers.rb +52 -0
- data/test/test_gzipping.rb +22 -0
- data/test/test_hash_api.rb +45 -0
- data/test/test_history.rb +142 -0
- data/test/test_history_added.rb +16 -0
- data/test/test_html_unscape_forms.rb +39 -0
- data/test/test_if_modified_since.rb +20 -0
- data/test/test_keep_alive.rb +31 -0
- data/test/test_links.rb +120 -0
- data/test/test_mech.rb +268 -0
- data/test/test_mechanize_file.rb +47 -0
- data/test/test_meta.rb +65 -0
- data/test/test_multi_select.rb +106 -0
- data/test/test_no_attributes.rb +13 -0
- data/test/test_option.rb +18 -0
- data/test/test_page.rb +124 -0
- data/test/test_pluggable_parser.rb +145 -0
- data/test/test_post_form.rb +34 -0
- data/test/test_pretty_print.rb +22 -0
- data/test/test_radiobutton.rb +75 -0
- data/test/test_redirect_limit_reached.rb +41 -0
- data/test/test_redirect_verb_handling.rb +45 -0
- data/test/test_referer.rb +39 -0
- data/test/test_relative_links.rb +40 -0
- data/test/test_request.rb +13 -0
- data/test/test_response_code.rb +52 -0
- data/test/test_save_file.rb +48 -0
- data/test/test_scheme.rb +48 -0
- data/test/test_select.rb +106 -0
- data/test/test_select_all.rb +15 -0
- data/test/test_select_none.rb +15 -0
- data/test/test_select_noopts.rb +16 -0
- data/test/test_set_fields.rb +44 -0
- data/test/test_ssl_server.rb +20 -0
- data/test/test_subclass.rb +14 -0
- data/test/test_textarea.rb +45 -0
- data/test/test_upload.rb +109 -0
- data/test/test_verbs.rb +25 -0
- metadata +314 -0
data/EXAMPLES.rdoc
ADDED
@@ -0,0 +1,171 @@
= WWW::Mechanize examples

== Google

  require 'rubygems'
  require 'mechanize'

  a = WWW::Mechanize.new { |agent|
    agent.user_agent_alias = 'Mac Safari'
  }

  a.get('http://google.com/') do |page|
    search_result = page.form_with(:name => 'f') do |search|
      search.q = 'Hello world'
    end.submit

    search_result.links.each do |link|
      puts link.text
    end
  end

== Rubyforge

  a = WWW::Mechanize.new
  a.get('http://rubyforge.org/') do |page|
    # Click the login link
    login_page = a.click(page.link_with(:text => /Log In/))

    # Submit the login form
    my_page = login_page.form_with(:action => '/account/login.php') do |f|
      f.form_loginname = ARGV[0]
      f.form_pw = ARGV[1]
    end.click_button

    my_page.links.each do |link|
      text = link.text.strip
      next unless text.length > 0
      puts text
    end
  end

== File Upload

Upload a file to Flickr.

  a = WWW::Mechanize.new { |agent|
    # Flickr refreshes after login
    agent.follow_meta_refresh = true
  }

  a.get('http://flickr.com/') do |home_page|
    signin_page = a.click(home_page.link_with(:text => /Sign In/))

    my_page = signin_page.form_with(:name => 'login_form') do |form|
      form.login = ARGV[0]
      form.passwd = ARGV[1]
    end.submit

    # Click the upload link
    upload_page = a.click(my_page.link_with(:text => /Upload/))

    # We want the basic upload page.
    upload_page = a.click(upload_page.link_with(:text => /basic Uploader/))

    # Upload the file
    upload_page.form_with(:method => 'POST') do |upload_form|
      upload_form.file_uploads.first.file_name = ARGV[2]
    end.submit
  end

== Pluggable Parsers

Let's say you want HTML pages to be parsed automatically with Rubyful Soup.
This example shows you how:

  require 'rubygems'
  require 'mechanize'
  require 'rubyful_soup'

  class SoupParser < WWW::Mechanize::Page
    attr_reader :soup
    def initialize(uri = nil, response = nil, body = nil, code = nil)
      @soup = BeautifulSoup.new(body)
      super(uri, response, body, code)
    end
  end

  agent = WWW::Mechanize.new
  agent.pluggable_parser.html = SoupParser

Now all HTML pages will be parsed with the SoupParser class, which gives you
a method called 'soup' for accessing the Beautiful Soup parse of that page.
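Conceptually, a pluggable parser is a lookup from a response's content type to the class that should wrap the body, with a fallback for types nobody registered. The plain-Ruby sketch below models that idea only; ParserTable, RawFile, HtmlPage and SoupishPage are hypothetical stand-ins, and Mechanize's real PluggableParser may differ in details such as exactly which content types html= registers.

```ruby
# Hypothetical stand-ins for WWW::Mechanize::File and WWW::Mechanize::Page.
class RawFile
  attr_reader :body

  def initialize(body)
    @body = body
  end
end

class HtmlPage < RawFile; end

# A minimal content-type -> class table with a default fallback.
class ParserTable
  attr_accessor :default

  def initialize
    @default = RawFile
    @parsers = { 'text/html' => HtmlPage }
  end

  # Register a class for HTML responses (assumed to mirror what html= does).
  def html=(klass)
    @parsers['text/html'] = klass
  end

  # Register a class for any other content type.
  def []=(content_type, klass)
    @parsers[content_type] = klass
  end

  # Pick the class for a content type, ignoring parameters like "; charset=...".
  def parser_for(content_type)
    return @default if content_type.nil?
    @parsers[content_type.split(';').first.strip.downcase] || @default
  end
end

# Plays the role of SoupParser in the example above.
class SoupishPage < HtmlPage; end

table = ParserTable.new
table.html = SoupishPage
page = table.parser_for('text/html; charset=utf-8').new('<p>hi</p>')
puts page.class                     # SoupishPage
puts table.parser_for('image/png')  # RawFile
```

The fallback matters: anything without a registered parser still comes back as a plain file object rather than raising.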

== Using a proxy

  require 'rubygems'
  require 'mechanize'

  agent = WWW::Mechanize.new
  agent.set_proxy('localhost', 8000)
  page = agent.get(ARGV[0])
  puts page.body
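Mechanize sits on top of Ruby's Net::HTTP, and the standard library expresses proxying as a Net::HTTP subclass built by Net::HTTP::Proxy. The sketch below only builds that class and inspects it (no connection is opened); it is an assumption, not something this document confirms, that set_proxy ends up configuring something equivalent.

```ruby
require 'net/http'

# Build a Net::HTTP subclass that routes connections through localhost:8000.
# Nothing is opened until a request is actually made.
proxy_class = Net::HTTP::Proxy('localhost', 8000)

puts proxy_class.proxy_class?   # true
puts proxy_class.proxy_address  # localhost
puts proxy_class.proxy_port     # 8000

# A request through the proxy would look like this (not executed here):
#   proxy_class.start('example.com', 80) { |http| http.get('/') }
```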

== The transact method

transact runs the given block and then resets the page history. That is, after
the block has been executed, you're back at the original page; there is no need
to count how many times to call the back method at the end of a loop (while
accounting for possible exceptions).

This example also demonstrates subclassing Mechanize.

  require 'rubygems'
  require 'mechanize'

  class TestMech < WWW::Mechanize
    def process
      get 'http://rubyforge.org/'
      search_form = page.forms.first
      search_form.words = 'WWW'
      submit search_form

      page.links_with(:href => %r{/projects/} ).each do |link|
        next if link.href =~ %r{/projects/support/}

        puts 'Loading %-30s %s' % [link.href, link.text]
        begin
          transact do
            click link
            # Do stuff, maybe click more links.
          end
          # Now we're back at the original page.

        rescue => e
          $stderr.puts "#{e.class}: #{e.message}"
        end
      end
    end
  end

  TestMech.new.process
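The save-and-restore behaviour described above can be modelled in a few lines of plain Ruby. MiniAgent is a hypothetical class, not part of Mechanize; the sketch assumes transact copies the history before yielding and puts the copy back in an ensure clause, which is what keeps the history correct even when the block raises.

```ruby
# Hypothetical stand-in for an agent that keeps a page history.
class MiniAgent
  attr_reader :history

  def initialize
    @history = []
  end

  # Pretend to visit a URL by pushing it onto the history.
  def get(url)
    @history.push(url)
    url
  end

  def current_page
    @history.last
  end

  # Run the block, then restore the saved history even if the block raises.
  def transact
    backup = @history.dup
    begin
      yield self
    ensure
      @history = backup
    end
  end
end

agent = MiniAgent.new
agent.get('/search')
begin
  agent.transact do |a|
    a.get('/projects/one')
    a.get('/projects/one/files')
    raise 'download failed'
  end
rescue => e
  puts "rescued: #{e.message}"   # rescued: download failed
end
puts agent.current_page          # /search
```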

== Client Certificate Authentication (Mutual Auth)

A client certificate is typically used as an additional layer of security
on certain websites. The specific case this was initially tested on was
automating the download of archived images from a bank's (Wachovia) lockbox
system. Once the certificate is installed in your browser, you will have to
export it and split the certificate and private key into separate files. Exported
files are usually in .p12 format (IE 7 & Firefox 2.0), which stands for PKCS #12.
You can convert them from p12 to PEM format with the following commands:

  openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.key -nocerts -nodes
  openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys

  require 'rubygems'
  require 'mechanize'

  # create Mechanize instance
  agent = WWW::Mechanize.new

  # set the path of the certificate file
  agent.cert = 'example.cer'

  # set the path of the private key file
  agent.key = 'example.key'

  # get the login form & fill it out with the username/password
  # (the client certificate is only presented over HTTPS)
  login_form = agent.get("https://example.com/login_page").form('Login')
  login_form.Userid = 'TestUser'
  login_form.Password = 'TestPassword'

  # submit login form
  agent.submit(login_form, login_form.buttons.first)
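Before pointing agent.cert and agent.key at the split files, it can help to confirm that the two files actually belong together. The sketch below uses Ruby's standard OpenSSL bindings; to stay self-contained it generates a throwaway self-signed certificate instead of reading your real example.cer and example.key exports, and the helper name matching_pair? is hypothetical.

```ruby
require 'openssl'
require 'tmpdir'

# Return true if the PEM certificate at cert_path was issued for the
# private key stored (as PEM) at key_path.
def matching_pair?(cert_path, key_path)
  cert = OpenSSL::X509::Certificate.new(File.read(cert_path))
  key  = OpenSSL::PKey.read(File.read(key_path))
  cert.check_private_key(key)
end

# Build a throwaway self-signed certificate so the example runs offline.
key  = OpenSSL::PKey::RSA.new(2048)
cert = OpenSSL::X509::Certificate.new
cert.version    = 2
cert.serial     = 1
cert.subject    = OpenSSL::X509::Name.parse('/CN=TestUser')
cert.issuer     = cert.subject
cert.public_key = key.public_key
cert.not_before = Time.now
cert.not_after  = Time.now + 3600
cert.sign(key, OpenSSL::Digest::SHA256.new)

# Write both halves to disk, as the openssl commands above would, then check them.
result = Dir.mktmpdir do |dir|
  cert_path = File.join(dir, 'example.cer')
  key_path  = File.join(dir, 'example.key')
  File.write(cert_path, cert.to_pem)
  File.write(key_path, key.to_pem)
  matching_pair?(cert_path, key_path)
end
puts result   # true
```

If the check returns false, the most likely cause is that the key and certificate were exported from different entries in the browser's certificate store.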
data/FAQ.rdoc
ADDED
@@ -0,0 +1,11 @@
Q: I keep getting an EOFError:

  protocol.rb:133:in `sysread': end of file reached (EOFError)

A: Some people have experienced an EOFError during normal mechanize usage.
Most of the time this occurs because the remote website claims to support
keep-alive connections but does not implement them correctly. Try turning off
keep-alive on your mechanize object:

  mech.keep_alive = false
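At the HTTP level, turning keep-alive off means asking the server to close the connection after each response instead of reusing it. The stdlib-only sketch below builds (but does not send) such a request with Net::HTTP; it is an assumption that Mechanize sends an equivalent Connection: close header when keep_alive is false.

```ruby
require 'net/http'

# Build a GET request that asks the server not to reuse the connection.
request = Net::HTTP::Get.new('/')
request['Connection'] = 'close'

puts request['Connection']   # close

# Sending it would look like this (not executed here):
#   Net::HTTP.start('example.com', 80) { |http| http.request(request) }
```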
data/GUIDE.rdoc
ADDED
@@ -0,0 +1,122 @@
= Getting Started With WWW::Mechanize

This guide is meant to get you started using Mechanize. By the end of this
guide, you should be able to fetch pages, click links, fill out and submit
forms, scrape data, and do many other hopefully useful things. This guide
really just scratches the surface of what is available, but should be enough
information to get you really going!

== Let's Fetch a Page!

First things first. Make sure that you've required mechanize and that you
instantiate a new mechanize object:

  require 'rubygems'
  require 'mechanize'

  agent = WWW::Mechanize.new

Now we'll use the agent we've created to fetch a page. Let's fetch google
with our mechanize agent:

  page = agent.get('http://google.com/')

What just happened? We told mechanize to go pick up google's main page.
Mechanize stored any cookies that were set, and followed any redirects that
google may have sent. The agent gave us back a page that we can use to
scrape data, find links to click, or find forms to fill out.

Next, let's try finding some links to click.

== Finding Links

Mechanize returns a page object whenever you get a page, post, or submit a
form. When a page is fetched, the agent will parse the page and put a list
of links on the page object.

Now that we've fetched google's homepage, let's try listing all of the links:

  page.links.each do |link|
    puts link.text
  end

We can list the links, but Mechanize gives a few shortcuts to help us find a
link to click on. Let's say we wanted to click the link whose text is 'News'.
Normally, we would have to do this:

  page = agent.click page.links.find { |l| l.text == 'News' }

But Mechanize gives us a shortcut. Instead we can say this:

  page = agent.click page.link_with(:text => 'News')

That shortcut says "find the first link with the text 'News'". You're probably
thinking "there could be multiple links with that text!", and you would be
correct! If you use the plural form, links_with, you get the whole list.
If you wanted to click on the second news link, you could do this:

  agent.click page.links_with(:text => 'News')[1]

We can even find links with a certain href like so:

  page.links_with(:href => '/something')

Or combine criteria to find links with certain text and a certain href:

  page.links_with(:text => 'News', :href => '/something')
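The finders take a hash of criteria, and a link matches when every criterion matches the attribute of the same name. The sketch below models that with a hypothetical Link struct and hypothetical links_with/link_with helpers; it assumes the comparison uses Ruby's case-equality operator (===), which is why a Regexp such as /News/ works as well as an exact String. It illustrates the idea and is not Mechanize's own source.

```ruby
# Hypothetical stand-in for a page link.
Link = Struct.new(:text, :href)

# Return every link for which all criteria match via ===
# (String === String tests equality, Regexp === String pattern-matches).
def links_with(links, criteria)
  links.select do |link|
    criteria.all? { |attr, expected| expected === link.send(attr) }
  end
end

# Singular form: just the first match, or nil.
def link_with(links, criteria)
  links_with(links, criteria).first
end

links = [
  Link.new('News', '/news'),
  Link.new('Images', '/images'),
  Link.new('News', '/something'),
  Link.new('World News', '/world')
]

p links_with(links, :text => 'News').map(&:href)    # ["/news", "/something"]
p links_with(links, :text => /News/).map(&:href)    # ["/news", "/something", "/world"]
p link_with(links, :text => 'News', :href => '/something').href   # "/something"
```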

These shortcuts are available on any list that mechanize lets you fetch,
such as frames, iframes, or forms. Now that we know how to find and click
links, let's try something more complicated, like filling out a form.

== Filling Out Forms

Let's continue with our google example. Here's the code we have so far:

  require 'rubygems'
  require 'mechanize'

  agent = WWW::Mechanize.new
  page = agent.get('http://google.com/')

If we pretty print the page, we can see that there is one form named 'f'
that has a couple of buttons and a few fields:

  pp page

Now that we know the name of the form, let's fetch it off the page:

  google_form = page.form('f')

Mechanize lets you access form input fields in a few different ways, but the
most convenient is to treat input fields as accessors on the form object.
So let's set the form field named 'q' to 'ruby mechanize':

  google_form.q = 'ruby mechanize'

To make sure that we set the value, let's pretty print the form. You should
see a line similar to this:

  #<WWW::Mechanize::Field:0x1403488 @name="q", @value="ruby mechanize">

If you saw that the value of 'q' changed, you're on the right track! Now we
can submit the form, 'press' the submit button, and print the results:

  page = agent.submit(google_form, google_form.buttons.first)
  pp page

What we just did was equivalent to putting text in the search field and
clicking the 'Google Search' button. If we had submitted the form without
a button, it would be like typing in the text field and hitting the return
key.
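The difference between submitting with and without a button comes down to which name/value pairs end up in the request. For a GET form like this one, the fields are URL-encoded into the query string, and pressing a named submit button adds that button's own pair. The sketch below shows this with Ruby's standard URI library; the field list and the button name btnG are hypothetical, and Mechanize builds its query string with its own code, so details such as field order may differ.

```ruby
require 'uri'

# Hypothetical field values for the search form named 'f'.
fields = [['q', 'ruby mechanize'], ['hl', 'en']]

# Hypothetical submit button: pressing it contributes its own name/value pair.
button = ['btnG', 'Google Search']

without_button = URI.encode_www_form(fields)
with_button    = URI.encode_www_form(fields + [button])

puts without_button   # q=ruby+mechanize&hl=en
puts with_button      # q=ruby+mechanize&hl=en&btnG=Google+Search
```

Some sites look at which button was pressed, so submitting with or without it can lead to different result pages.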

Let's take a look at the code all together:

  require 'rubygems'
  require 'mechanize'
  require 'pp'

  agent = WWW::Mechanize.new
  page = agent.get('http://google.com/')
  google_form = page.form('f')
  google_form.q = 'ruby mechanize'
  page = agent.submit(google_form)
  pp page

Before we go on to screen scraping, let's take a look at forms in a little more
depth, unless you want to skip ahead!

== Advanced Form Techniques

In this section, I want to touch on using the different types of input fields
that a form can contain. Password and textarea fields can be treated just like
text input fields. Select fields are very similar to text fields, but they
have many options associated with them. If you select one option, mechanize
will deselect the other options (unless it is a multi select!).

For example, let's select an option on a list:

  form.field_with(:name => 'list').options[0].select

Now let's take a look at checkboxes and radio buttons. To select a checkbox,
just check it like this:

  form.checkbox_with(:name => 'box').check

Radio buttons are very similar to checkboxes, but they know how to uncheck
other radio buttons of the same name. Just check a radio button like you
would a checkbox:

  form.radiobuttons_with(:name => 'box')[1].check
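The "uncheck the others" behaviour of radio buttons can be pictured with the plain-Ruby model below. Radio is a hypothetical class, not Mechanize's; the point is only that checking one button clears every other button sharing its name, while buttons with a different name are left alone.

```ruby
# Hypothetical model of a radio button that can see the rest of its form.
class Radio
  attr_reader :name, :value

  def initialize(form_buttons, name, value)
    @form_buttons = form_buttons
    @name = name
    @value = value
    @checked = false
  end

  def checked?
    @checked
  end

  # Check this button and uncheck every other button with the same name.
  def check
    @form_buttons.each do |other|
      other.uncheck if other.name == name && !other.equal?(self)
    end
    @checked = true
  end

  def uncheck
    @checked = false
  end
end

buttons = []
%w[small medium large].each { |size| buttons << Radio.new(buttons, 'box', size) }
color = Radio.new(buttons, 'color', 'red')
buttons << color

color.check
buttons[0].check
buttons[1].check   # unchecks 'small', leaves 'red' alone

p buttons.select(&:checked?).map(&:value)   # ["medium", "red"]
```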

Mechanize also makes file uploads easy! Just find the file upload field, and
tell it what file name you want to upload:

  form.file_uploads.first.file_name = "somefile.jpg"
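Setting file_name is enough because a form with a file field is sent as multipart/form-data, where the file's contents travel in their own part alongside the other fields. The sketch below assembles such a body by hand to show its shape; the helper multipart_body, the fixed boundary string, the field names and the application/octet-stream content type are illustrative assumptions, and Mechanize generates its own boundary and headers.

```ruby
# Build a multipart/form-data body from plain fields and one file part.
# The fixed boundary is for illustration; real clients use a random one.
def multipart_body(boundary, fields, file_field, file_name, file_data)
  crlf = "\r\n"
  body = String.new
  fields.each do |name, value|
    body << "--#{boundary}#{crlf}"
    body << %(Content-Disposition: form-data; name="#{name}"#{crlf}#{crlf})
    body << "#{value}#{crlf}"
  end
  body << "--#{boundary}#{crlf}"
  body << %(Content-Disposition: form-data; name="#{file_field}"; filename="#{file_name}"#{crlf})
  body << "Content-Type: application/octet-stream#{crlf}#{crlf}"
  body << file_data << crlf
  # The closing boundary has two trailing dashes.
  body << "--#{boundary}--#{crlf}"
end

body = multipart_body('XyZ123', { 'title' => 'holiday' }, 'photo',
                      'somefile.jpg', 'fake image bytes')
puts body
```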

== Scraping Data

Mechanize uses nokogiri[http://nokogiri.rubyforge.org/] to parse
HTML. What does this mean for you? You can treat a mechanize page like
a nokogiri object. After you have used Mechanize to navigate to the page
that you need to scrape, scrape it using nokogiri methods:

  agent.get('http://someurl.com/').search(".//p[@class='posted']")
data/LICENSE.rdoc
ADDED
@@ -0,0 +1,340 @@
|
|
1
|
+
GNU GENERAL PUBLIC LICENSE
|
2
|
+
Version 2, June 1991
|
3
|
+
|
4
|
+
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
|
5
|
+
51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
|
6
|
+
Everyone is permitted to copy and distribute verbatim copies
|
7
|
+
of this license document, but changing it is not allowed.
|
8
|
+
|
9
|
+
Preamble
|
10
|
+
|
11
|
+
The licenses for most software are designed to take away your
|
12
|
+
freedom to share and change it. By contrast, the GNU General Public
|
13
|
+
License is intended to guarantee your freedom to share and change free
|
14
|
+
software--to make sure the software is free for all its users. This
|
15
|
+
General Public License applies to most of the Free Software
|
16
|
+
Foundation's software and to any other program whose authors commit to
|
17
|
+
using it. (Some other Free Software Foundation software is covered by
|
18
|
+
the GNU Library General Public License instead.) You can apply it to
|
19
|
+
your programs, too.
|
20
|
+
|
21
|
+
When we speak of free software, we are referring to freedom, not
|
22
|
+
price. Our General Public Licenses are designed to make sure that you
|
23
|
+
have the freedom to distribute copies of free software (and charge for
|
24
|
+
this service if you wish), that you receive source code or can get it
|
25
|
+
if you want it, that you can change the software or use pieces of it
|
26
|
+
in new free programs; and that you know you can do these things.
|
27
|
+
|
28
|
+
To protect your rights, we need to make restrictions that forbid
|
29
|
+
anyone to deny you these rights or to ask you to surrender the rights.
|
30
|
+
These restrictions translate to certain responsibilities for you if you
|
31
|
+
distribute copies of the software, or if you modify it.
|
32
|
+
|
33
|
+
For example, if you distribute copies of such a program, whether
|
34
|
+
gratis or for a fee, you must give the recipients all the rights that
|
35
|
+
you have. You must make sure that they, too, receive or can get the
|
36
|
+
source code. And you must show them these terms so they know their
|
37
|
+
rights.
|
38
|
+
|
39
|
+
We protect your rights with two steps: (1) copyright the software, and
|
40
|
+
(2) offer you this license which gives you legal permission to copy,
|
41
|
+
distribute and/or modify the software.
|
42
|
+
|
43
|
+
Also, for each author's protection and ours, we want to make certain
|
44
|
+
that everyone understands that there is no warranty for this free
|
45
|
+
software. If the software is modified by someone else and passed on, we
|
46
|
+
want its recipients to know that what they have is not the original, so
|
47
|
+
that any problems introduced by others will not reflect on the original
|
48
|
+
authors' reputations.
|
49
|
+
|
50
|
+
Finally, any free program is threatened constantly by software
|
51
|
+
patents. We wish to avoid the danger that redistributors of a free
|
52
|
+
program will individually obtain patent licenses, in effect making the
|
53
|
+
program proprietary. To prevent this, we have made it clear that any
|
54
|
+
patent must be licensed for everyone's free use or not licensed at all.
|
55
|
+
|
56
|
+
The precise terms and conditions for copying, distribution and
|
57
|
+
modification follow.
|
58
|
+
|
59
|
+
GNU GENERAL PUBLIC LICENSE
|
60
|
+
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
|
61
|
+
|
62
|
+
0. This License applies to any program or other work which contains
|
63
|
+
a notice placed by the copyright holder saying it may be distributed
|
64
|
+
under the terms of this General Public License. The "Program", below,
|
65
|
+
refers to any such program or work, and a "work based on the Program"
|
66
|
+
means either the Program or any derivative work under copyright law:
|
67
|
+
that is to say, a work containing the Program or a portion of it,
|
68
|
+
either verbatim or with modifications and/or translated into another
|
69
|
+
language. (Hereinafter, translation is included without limitation in
|
70
|
+
the term "modification".) Each licensee is addressed as "you".
|
71
|
+
|
72
|
+
Activities other than copying, distribution and modification are not
|
73
|
+
covered by this License; they are outside its scope. The act of
|
74
|
+
running the Program is not restricted, and the output from the Program
|
75
|
+
is covered only if its contents constitute a work based on the
|
76
|
+
Program (independent of having been made by running the Program).
|
77
|
+
Whether that is true depends on what the Program does.
|
78
|
+
|
79
|
+
1. You may copy and distribute verbatim copies of the Program's
|
80
|
+
source code as you receive it, in any medium, provided that you
|
81
|
+
conspicuously and appropriately publish on each copy an appropriate
|
82
|
+
copyright notice and disclaimer of warranty; keep intact all the
|
83
|
+
notices that refer to this License and to the absence of any warranty;
|
84
|
+
and give any other recipients of the Program a copy of this License
|
85
|
+
along with the Program.
|
86
|
+
|
87
|
+
You may charge a fee for the physical act of transferring a copy, and
|
88
|
+
you may at your option offer warranty protection in exchange for a fee.
|
89
|
+
|
90
|
+
2. You may modify your copy or copies of the Program or any portion
|
91
|
+
of it, thus forming a work based on the Program, and copy and
|
92
|
+
distribute such modifications or work under the terms of Section 1
|
93
|
+
above, provided that you also meet all of these conditions:
|
94
|
+
|
95
|
+
a) You must cause the modified files to carry prominent notices
|
96
|
+
stating that you changed the files and the date of any change.
|
97
|
+
|
98
|
+
b) You must cause any work that you distribute or publish, that in
|
99
|
+
whole or in part contains or is derived from the Program or any
|
100
|
+
part thereof, to be licensed as a whole at no charge to all third
|
101
|
+
parties under the terms of this License.
|
102
|
+
|
103
|
+
c) If the modified program normally reads commands interactively
|
104
|
+
when run, you must cause it, when started running for such
|
105
|
+
interactive use in the most ordinary way, to print or display an
|
106
|
+
announcement including an appropriate copyright notice and a
|
107
|
+
notice that there is no warranty (or else, saying that you provide
|
108
|
+
a warranty) and that users may redistribute the program under
|
109
|
+
these conditions, and telling the user how to view a copy of this
|
110
|
+
License. (Exception: if the Program itself is interactive but
|
111
|
+
does not normally print such an announcement, your work based on
|
112
|
+
the Program is not required to print an announcement.)
|
113
|
+
|
114
|
+
These requirements apply to the modified work as a whole. If
|
115
|
+
identifiable sections of that work are not derived from the Program,
|
116
|
+
and can be reasonably considered independent and separate works in
|
117
|
+
themselves, then this License, and its terms, do not apply to those
|
118
|
+
sections when you distribute them as separate works. But when you
|
119
|
+
distribute the same sections as part of a whole which is a work based
|
120
|
+
on the Program, the distribution of the whole must be on the terms of
|
121
|
+
this License, whose permissions for other licensees extend to the
|
122
|
+
entire whole, and thus to each and every part regardless of who wrote it.
|
123
|
+
|
124
|
+
Thus, it is not the intent of this section to claim rights or contest
|
125
|
+
your rights to work written entirely by you; rather, the intent is to
|
126
|
+
exercise the right to control the distribution of derivative or
|
127
|
+
collective works based on the Program.
|
128
|
+
|
129
|
+
In addition, mere aggregation of another work not based on the Program
|
130
|
+
with the Program (or with a work based on the Program) on a volume of
|
131
|
+
a storage or distribution medium does not bring the other work under
|
132
|
+
the scope of this License.
|
133
|
+
|
134
|
+
3. You may copy and distribute the Program (or a work based on it,
|
135
|
+
under Section 2) in object code or executable form under the terms of
|
136
|
+
Sections 1 and 2 above provided that you also do one of the following:
|
137
|
+
|
138
|
+
a) Accompany it with the complete corresponding machine-readable
|
139
|
+
source code, which must be distributed under the terms of Sections
|
140
|
+
1 and 2 above on a medium customarily used for software interchange; or,
|
141
|
+
|
142
|
+
b) Accompany it with a written offer, valid for at least three
|
143
|
+
years, to give any third party, for a charge no more than your
|
144
|
+
cost of physically performing source distribution, a complete
|
145
|
+
machine-readable copy of the corresponding source code, to be
|
146
|
+
distributed under the terms of Sections 1 and 2 above on a medium
|
147
|
+
customarily used for software interchange; or,
|
148
|
+
|
149
|
+
c) Accompany it with the information you received as to the offer
|
150
|
+
to distribute corresponding source code. (This alternative is
|
151
|
+
allowed only for noncommercial distribution and only if you
|
152
|
+
received the program in object code or executable form with such
|
153
|
+
an offer, in accord with Subsection b above.)
|
154
|
+
|
155
|
+
The source code for a work means the preferred form of the work for
|
156
|
+
making modifications to it. For an executable work, complete source
|
157
|
+
code means all the source code for all modules it contains, plus any
|
158
|
+
associated interface definition files, plus the scripts used to
|
159
|
+
control compilation and installation of the executable. However, as a
|
160
|
+
special exception, the source code distributed need not include
|
161
|
+
anything that is normally distributed (in either source or binary
|
162
|
+
form) with the major components (compiler, kernel, and so on) of the
|
163
|
+
operating system on which the executable runs, unless that component
|
164
|
+
itself accompanies the executable.
|
165
|
+
|
166
|
+
If distribution of executable or object code is made by offering
|
167
|
+
access to copy from a designated place, then offering equivalent
|
168
|
+
access to copy the source code from the same place counts as
|
169
|
+
distribution of the source code, even though third parties are not
|
170
|
+
compelled to copy the source along with the object code.
|
171
|
+
|
172
|
+
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.

9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

NO WARRANTY

11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS

How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:

Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.

<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice

This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Library General
Public License instead of this License.