spidr 0.2.7 → 0.3.0
- data/.rspec +1 -0
- data/ChangeLog.md +56 -31
- data/Gemfile +7 -21
- data/LICENSE.txt +1 -2
- data/README.md +7 -6
- data/Rakefile +13 -23
- data/gemspec.yml +19 -0
- data/lib/spidr/actions/actions.rb +1 -1
- data/lib/spidr/agent.rb +21 -6
- data/lib/spidr/auth_store.rb +1 -1
- data/lib/spidr/body.rb +99 -0
- data/lib/spidr/extensions/uri.rb +14 -7
- data/lib/spidr/headers.rb +323 -0
- data/lib/spidr/links.rb +229 -0
- data/lib/spidr/page.rb +32 -536
- data/lib/spidr/sanitizers.rb +3 -3
- data/lib/spidr/session_cache.rb +1 -0
- data/lib/spidr/version.rb +1 -1
- data/spec/actions_spec.rb +6 -8
- data/spec/auth_store_spec.rb +28 -28
- data/spec/cookie_jar_spec.rb +49 -60
- data/spec/extensions/uri_spec.rb +4 -0
- data/spec/filters_spec.rb +8 -0
- data/spec/page_spec.rb +0 -7
- data/spec/rules_spec.rb +8 -6
- data/spec/sanitizers_spec.rb +10 -16
- data/spec/spec_helper.rb +1 -12
- data/spec/spidr_spec.rb +11 -11
- data/spidr.gemspec +11 -110
- metadata +24 -52
- data/.gitignore +0 -9
- data/.specopts +0 -1
- data/Gemfile.lock +0 -39
data/.rspec
ADDED
@@ -0,0 +1 @@
+--colour --format documentation
data/ChangeLog.md
CHANGED
@@ -1,19 +1,44 @@
+### 0.3.0 / 2011-04-14
+
+* Switched from Jeweler to [Ore](http://github.com/ruby-ore/ore).
+* Split all header related methods out of {Spidr::Page} and into
+  {Spidr::Headers}.
+* Split all body related methods out of {Spidr::Page} and into
+  {Spidr::Body}.
+* Split all link related methods out of {Spidr::Page} and into
+  {Spidr::Links}.
+* Added {Spidr::Headers#directory?}.
+* Added {Spidr::Headers#json?}.
+* Added {Spidr::Links#each_url}.
+* Added {Spidr::Links#each_link}.
+* Added {Spidr::Links#each_redirect}.
+* Added {Spidr::Links#each_meta_redirect}.
+* Aliased {Spidr::Headers#raw_cookie} to {Spidr::Headers#cookie}.
+* Aliased {Spidr::Body#to_s} to {Spidr::Body#body}.
+* Also check for `application/xml` in {Spidr::Headers#xml?}.
+* Catch all exceptions when merging URIs in {Spidr::Links#to_absolute}.
+* Always prepend a `/` to all FTP URI paths. Fixes a Ruby 1.8 specific
+  bug, where it expects an absolute path for all FTP URIs.
+* Refactored {URI.expand_path}.
+* Start the session in {Spidr::SessionCache#[]} to prevent multiple
+  `CONNECT` commands being sent to HTTP Proxies (thanks falaise).
+
 ### 0.2.7 / 2010-08-17

 * Added {Spidr::CookieJar#cookies_for_host} (thanks zapnap).
-* Renamed `Spidr::Page#cookie` to
+* Renamed `Spidr::Page#cookie` to `Spidr::Page#raw_cookie`.
 * Rescue `URI::InvalidComponentError` exceptions in
-
+  `Spidr::Page#to_absolute` (thanks zapnap).

 ### 0.2.6 / 2010-07-05

-* Fixed a bug in
+* Fixed a bug in `Spidr::Page#meta_redirect`, by calling
   `Nokogiri::XML::Element#get_attribute` instead of `attr`.

 ### 0.2.5 / 2010-07-02

-* Added
-* Added
+* Added `Spidr::Page#meta_redirect`.
+* Added `Spidr::Page#meta_redirect?`.
 * Manage development dependencies with Bundler.
 * Support following "old-school" meta-refresh redirects (thanks zapnap).
 * Allow {Spidr::CookieJar} inherit cookies set by a parent domain.
@@ -26,10 +51,10 @@
 * Added {Spidr::Filters#visit_urls_like}.
 * Added {Spidr::Filters#ignore_urls}.
 * Added {Spidr::Filters#ignore_urls_like}.
-* Added
-* Default
-* Default
-* Default
+* Added `Spidr::Page#is_content_type?`.
+* Default `Spidr::Page#body` to an empty String.
+* Default `Spidr::Page#content_type` to an empty String.
+* Default `Spidr::Page#content_types` to an empty Array.
 * Improved reliability of {Spidr::Page#is_redirect?}.
 * Improved content type detection in {Spidr::Page} to handle `Content-Type`
   headers containing charsets (thanks Josh Lindsey).
@@ -47,10 +72,10 @@
 * Require Web Spider Obstacle Course (WSOC) >= 0.1.1.
 * Integrated the new WSOC into the specs.
 * Removed the built-in Web Spider Obstacle Course.
-* Added
-* Added
-* Added
-* Added
+* Added `Spidr::Page#content_types`.
+* Added `Spidr::Page#cookie`.
+* Added `Spidr::Page#cookies`.
+* Added `Spidr::Page#cookie_params`.
 * Added {Spidr::Sanitizers}.
 * Added {Spidr::SessionCache}.
 * Added {Spidr::CookieJar} (thanks Nick Plante).
@@ -93,33 +118,33 @@
 ### 0.2.0 / 2009-10-10

 * Added {URI.expand_path}.
-* Added
-* Added
-* Added
+* Added `Spidr::Page#search`.
+* Added `Spidr::Page#at`.
+* Added `Spidr::Page#title`.
 * Added {Spidr::Agent#failures=}.
 * Added a HTTP session cache to {Spidr::Agent}, per suggestion of falter.
 * Added `Spidr::Agent#get_session`.
 * Added `Spidr::Agent#kill_session`.
 * Added {Spidr.proxy=}.
 * Added {Spidr.disable_proxy!}.
-* Aliased `Spidr::Page#txt?` to
-* Aliased `Spidr::Page#ok?` to
-* Aliased `Spidr::Page#redirect?` to
-* Aliased `Spidr::Page#unauthorized?` to
-* Aliased `Spidr::Page#forbidden?` to
-* Aliased `Spidr::Page#missing?` to
+* Aliased `Spidr::Page#txt?` to `Spidr::Page#plain_text?`.
+* Aliased `Spidr::Page#ok?` to `Spidr::Page#is_ok?`.
+* Aliased `Spidr::Page#redirect?` to `Spidr::Page#is_redirect?`.
+* Aliased `Spidr::Page#unauthorized?` to `Spidr::Page#is_unauthorized?`.
+* Aliased `Spidr::Page#forbidden?` to `Spidr::Page#is_forbidden?`.
+* Aliased `Spidr::Page#missing?` to `Spidr::Page#is_missing?`.
 * Split URL filtering code out of {Spidr::Agent} and into
   {Spidr::Filters}.
 * Split URL / Page event code out of {Spidr::Agent} and into
   {Spidr::Events}.
 * Split pause! / continue! / skip_link! / skip_page! methods out of
   {Spidr::Agent} and into {Spidr::Actions}.
-* Fixed a bug in
-* Make sure
+* Fixed a bug in `Spidr::Page#code`, where it was not returning an Integer.
+* Make sure `Spidr::Page#doc` returns `Nokogiri::XML::Document` objects for
   RSS/RDF/Atom pages as well.
-* Fixed the handling of the Location header in
+* Fixed the handling of the Location header in `Spidr::Page#links`
   (thanks falter).
-* Fixed a bug in
+* Fixed a bug in `Spidr::Page#to_absolute` where trailing `/` characters on
   URI paths were not being preserved (thanks falter).
 * Fixed a bug where the URI query was not being sent with the request
   in {Spidr::Agent#get_page} (thanks Damian Steer).
@@ -169,7 +194,7 @@

 * Added `Spidr::Agent#all_headers`.
 * Fixed a bug where {Spidr::Page#headers} was always `nil`.
-* {Spidr::
+* {Spidr::Agent} will now follow the Location header in HTTP 300,
   301, 302, 303 and 307 Redirects.
 * {Spidr::Agent} will now follow iframe and frame tags.

@@ -189,8 +214,8 @@

 ### 0.1.5 / 2009-03-22

-* Catch malformed URIs in
-* Filter out `nil` URIs in
+* Catch malformed URIs in `Spidr::Page#to_absolute` and return `nil`.
+* Filter out `nil` URIs in `Spidr::Page#urls`.

 ### 0.1.4 / 2009-01-15

@@ -204,9 +229,9 @@

 ### 0.1.2 / 2008-11-06

-* Fixed a bug in
+* Fixed a bug in `Spidr::Page#to_absolute` where URLs with no path were not
   receiving a default path of `/`.
-* Fixed a bug in
+* Fixed a bug in `Spidr::Page#to_absolute` where URL paths were not being
   expanded, in order to remove `..` and `.` directories.
 * Fixed a bug where absolute URLs could have a blank path, thus causing
   {Spidr::Agent#get_page} to crash when it performed the HTTP request.
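The `application/xml` change to `Spidr::Headers#xml?`, like the earlier charset fix, comes down to matching a MIME type against a `Content-Type` header that may carry parameters. The following is only a sketch of that idea: `media_types` and `content_type?` are hypothetical helpers written for illustration, not Spidr's actual methods.

```ruby
# Hypothetical helper (not part of Spidr): split a raw Content-Type header
# into bare, lower-cased media types, dropping parameters such as charset.
def media_types(header)
  header.to_s.split(',').map { |value|
    value.split(';').first.to_s.strip.downcase
  }.reject(&:empty?)
end

# Hypothetical helper: true if any media type in the header matches
# one of the given candidates.
def content_type?(header, *candidates)
  (media_types(header) & candidates).any?
end

# Mirrors the changelog entry: accept both text/xml and application/xml.
def xml?(header)
  content_type?(header, 'text/xml', 'application/xml')
end
```

Stripping the parameters before comparing means a header such as `application/xml; charset=UTF-8` is treated the same as the bare type.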
data/Gemfile
CHANGED
@@ -1,27 +1,13 @@
 source 'https://rubygems.org'

-
-  gem 'nokogiri', '>= 1.3.0'
-end
+gemspec

-group
-  gem 'rake',
-  gem 'jeweler', '~> 1.4.0', :git => 'git://github.com/technicalpickles/jeweler.git'
-end
+group :development do
+  gem 'rake', '~> 0.8.7'

-
-
-
-    gem 'maruku', '~> 0.6.0'
-  else
-    gem 'rdiscount', '~> 1.6.3'
-  end
+  gem 'ore-tasks', '~> 0.4'
+  gem 'rspec', '~> 2.4'
+  gem 'wsoc', '~> 0.1.3'

-  gem '
+  gem 'kramdown', '~> 0.12'
 end
-
-group(:test) do
-  gem 'wsoc', '~> 0.1.3'
-end
-
-gem 'rspec', '~> 1.3.0', :group => [:development, :test]
data/LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -1,9 +1,9 @@
 # Spidr

-* [
-* [
-* [
-* [
+* [Homepage](http://spidr.rubyforge.org/)
+* [Source](http://github.com/postmodern/spidr)
+* [Issues](http://github.com/postmodern/spidr/issues)
+* [Mailing List](http://groups.google.com/group/spidr)
 * irc.freenode.net #spidr

 ## Description
@@ -177,7 +177,7 @@ Skip the processing of links:

 ## Requirements

-* [nokogiri](http://nokogiri.rubyforge.org/)
+* [nokogiri](http://nokogiri.rubyforge.org/) ~> 1.3

 ## Install

@@ -185,5 +185,6 @@ Skip the processing of links:

 ## License

-
+Copyright (c) 2008-2011 Hal Brodigan

+See {file:LICENSE.txt} for license information.
data/Rakefile
CHANGED
@@ -1,8 +1,15 @@
 require 'rubygems'
-require 'bundler'

 begin
-
+  require 'bundler'
+rescue LoadError => e
+  STDERR.puts e.message
+  STDERR.puts "Run `gem install bundler` to install Bundler."
+  exit e.status_code
+end
+
+begin
+  Bundler.setup(:development)
 rescue Bundler::BundlerError => e
   STDERR.puts e.message
   STDERR.puts "Run `bundle install` to install missing gems"
@@ -10,29 +17,12 @@ rescue Bundler::BundlerError => e
 end

 require 'rake'
-require 'jeweler'
-require './lib/spidr/version.rb'
-
-Jeweler::Tasks.new do |gem|
-  gem.name = 'spidr'
-  gem.version = Spidr::VERSION
-  gem.license = 'MIT'
-  gem.summary = %Q{A versatile Ruby web spidering library}
-  gem.description = %Q{Spidr is a versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.}
-  gem.email = 'postmodern.mod3@gmail.com'
-  gem.homepage = 'http://github.com/postmodern/spidr'
-  gem.authors = ['Postmodern']
-  gem.has_rdoc = 'yard'
-end
-Jeweler::GemcutterTasks.new

-require '
-
-  spec.libs += ['lib', 'spec']
-  spec.spec_files = FileList['spec/**/*_spec.rb']
-  spec.spec_opts = ['--options', '.specopts']
-end
+require 'ore/tasks'
+Ore::Tasks.new

+require 'rspec/core/rake_task'
+RSpec::Core::RakeTask.new
 task :default => :spec

 require 'yard'
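One detail in the new guard around `require 'bundler'` deserves a second look: it calls `e.status_code` on a `LoadError`. `Bundler::BundlerError` defines `status_code`, but Ruby's `LoadError` does not, so that line would itself raise `NoMethodError` when Bundler is missing. Below is a hedged sketch of a guard that avoids this. `require_or_hint` is a hypothetical helper name, and exiting with status 1 is an assumption rather than anything the Rakefile specifies.

```ruby
# Hypothetical helper (not in the Rakefile): try to load a library.
# Returns true on success; on LoadError prints the error and an install
# hint to the given IO and returns false.
def require_or_hint(library, hint, io = $stderr)
  require library
  true
rescue LoadError => e
  io.puts e.message
  io.puts hint
  false
end

# In a Rakefile this might be used as (exit status 1 is an assumption):
#   unless require_or_hint('bundler', "Run `gem install bundler` to install Bundler.")
#     exit 1
#   end
```

Returning a boolean instead of exiting inside the helper keeps the exit status decision in one visible place at the call site.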
data/gemspec.yml
ADDED
@@ -0,0 +1,19 @@
+name: spidr
+summary: A versatile Ruby web spidering library
+description:
+  Spidr is a versatile Ruby web spidering library that can spider a site,
+  multiple domains, certain links or infinitely. Spidr is designed to be
+  fast and easy to use.
+
+license: MIT
+authors: Postmodern
+email: postmodern.mod3@gmail.com
+homepage: http://github.com/postmodern/spidr
+has_yard: true
+
+dependencies:
+  nokogiri: ~> 1.3
+
+development_dependencies:
+  bundler: ~> 1.0.0
+  yard: ~> 0.6.0
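With the gem metadata moved into `gemspec.yml`, it becomes plain data that any Ruby script can read. As a rough illustration (not Ore's actual loading code), the sketch below parses a shortened copy of the file with Ruby's standard `yaml` library. The `metadata` and `runtime` variable names are arbitrary.

```ruby
require 'yaml'

# A shortened copy of the gemspec.yml shown in the diff.
metadata = YAML.load(<<~YML)
  name: spidr
  summary: A versatile Ruby web spidering library
  license: MIT
  has_yard: true

  dependencies:
    nokogiri: ~> 1.3

  development_dependencies:
    bundler: ~> 1.0.0
    yard: ~> 0.6.0
YML

# Version requirements such as "~> 1.3" are loaded as plain strings.
runtime = metadata.fetch('dependencies', {})
runtime.each do |gem_name, requirement|
  puts "#{gem_name} #{requirement}"
end
```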
data/lib/spidr/agent.rb
CHANGED
@@ -48,6 +48,12 @@ module Spidr

     # Cached cookies
     attr_reader :cookies
+
+    # Maximum depth
+    attr_reader :max_depth
+
+    # The visited URLs and their depth within a site
+    attr_reader :levels

     #
     # Creates a new Agent object.
@@ -91,6 +97,9 @@ module Spidr
     # @option options [Set, Array] :history
     #   The initial list of visited URLs.
     #
+    # @option options [Integer] :max_depth
+    #   The maximum link depth to follow.
+    #
     # @yield [agent]
     #   If a block is given, it will be passed the newly created agent
     #   for further configuration.
@@ -119,6 +128,9 @@ module Spidr
       @failures = Set[]
       @queue = []

+      @levels = Hash.new(0)
+      @max_depth = options[:max_depth]
+
       super(options)

       yield self if block_given?
@@ -450,7 +462,7 @@ module Spidr
     # @return [Boolean]
     #   Specifies whether the URL was enqueued, or ignored.
     #
-    def enqueue(url)
+    def enqueue(url,level=0)
       url = sanitize_url(url)

       if (!(queued?(url)) && visit?(url))
@@ -477,14 +489,15 @@ module Spidr
           return false
         rescue Actions::Action
         end
-
+
         @queue << url
+        @levels[url] = level
         return true
       end

       return false
     end
-
+
     #
     # Requests and creates a new Page object from a given URL.
     #
@@ -568,7 +581,7 @@ module Spidr
     #   for the page failed, or the page was skipped.
     #
     def visit_page(url)
-      url =
+      url = sanitize_url(url)

       get_page(url) do |page|
         @history << page.url
@@ -584,7 +597,7 @@ module Spidr
         rescue Actions::Action
         end

-        page.
+        page.each_url do |next_url|
           begin
             @every_link_blocks.each do |link_block|
               link_block.call(page.url,next_url)
@@ -596,7 +609,9 @@ module Spidr
           rescue Actions::Action
           end

-
+          if (@max_depth.nil? || @max_depth > @levels[url])
+            enqueue(next_url,@levels[url] + 1)
+          end
         end
       end
     end
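Taken together, the `@levels` hash, the `level` argument to `enqueue`, and the `@max_depth > @levels[url]` check add depth-limited crawling to the agent. The standalone sketch below reproduces that bookkeeping against an in-memory link graph instead of real HTTP requests. The `LINKS` table and the `crawl` method are hypothetical and exist only for this illustration.

```ruby
# Hypothetical in-memory "site": each URL maps to the URLs it links to.
LINKS = {
  '/'      => ['/a', '/b'],
  '/a'     => ['/a/1'],
  '/b'     => [],
  '/a/1'   => ['/a/1/x'],
  '/a/1/x' => []
}

# Breadth-first crawl that records the depth of every enqueued URL and
# stops following links once max_depth is reached (nil means unlimited),
# using the same comparison as the diff: max_depth > levels[url].
def crawl(start, links, max_depth = nil)
  levels  = Hash.new(0) # unknown URLs (such as the start URL) are depth 0
  queue   = [start]
  visited = []

  until queue.empty?
    url = queue.shift
    visited << url

    links.fetch(url, []).each do |next_url|
      next if visited.include?(next_url) || queue.include?(next_url)

      if max_depth.nil? || max_depth > levels[url]
        queue << next_url
        levels[next_url] = levels[url] + 1
      end
    end
  end

  visited
end
```

As in the diff, `Hash.new(0)` lets the starting URL count as depth 0 without an explicit entry, and a `max_depth` of 0 visits only the starting page.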
data/lib/spidr/auth_store.rb
CHANGED
@@ -24,7 +24,7 @@ module Spidr
     # Given a URL, return the most specific matching auth credential.
     #
     # @param [URI] url
-    #   A fully qualified url
+    #   A fully qualified url including optional path.
     #
     # @return [AuthCredential, nil]
     #   Closest matching {AuthCredential} values for the URL,
data/lib/spidr/body.rb
ADDED
@@ -0,0 +1,99 @@
+require 'nokogiri'
+
+module Spidr
+  module Body
+    #
+    # The body of the response.
+    #
+    # @return [String]
+    #   The body of the response.
+    #
+    def body
+      (response.body || '')
+    end
+
+    #
+    # Returns a parsed document object for HTML, XML, RSS and Atom pages.
+    #
+    # @return [Nokogiri::HTML::Document, Nokogiri::XML::Document, nil]
+    #   The document that represents HTML or XML pages.
+    #   Returns `nil` if the page is neither HTML, XML, RSS, Atom or if
+    #   the page could not be parsed properly.
+    #
+    # @see http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html
+    # @see http://nokogiri.rubyforge.org/nokogiri/Nokogiri/HTML/Document.html
+    #
+    def doc
+      return nil if body.empty?
+
+      begin
+        if html?
+          return @doc ||= Nokogiri::HTML(body)
+        elsif (xml? || xsl? || rss? || atom?)
+          return @doc ||= Nokogiri::XML(body)
+        end
+      rescue
+        return nil
+      end
+    end
+
+    #
+    # Searches the document for XPath or CSS Path paths.
+    #
+    # @param [Array<String>] paths
+    #   CSS or XPath expressions to search the document with.
+    #
+    # @return [Array]
+    #   The matched nodes from the document.
+    #   Returns an empty Array if no nodes were matched, or if the page
+    #   is not an HTML or XML document.
+    #
+    # @example
+    #   page.search('//a[@href]')
+    #
+    # @see http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Node.html#M000239
+    #
+    def search(*paths)
+      if doc
+        doc.search(*paths)
+      else
+        []
+      end
+    end
+
+    #
+    # Searches for the first occurrence an XPath or CSS Path expression.
+    #
+    # @return [Nokogiri::HTML::Node, Nokogiri::XML::Node, nil]
+    #   The first matched node. Returns `nil` if no nodes could be matched,
+    #   or if the page is not a HTML or XML document.
+    #
+    # @example
+    #   page.at('//title')
+    #
+    # @see http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Node.html#M000251
+    #
+    def at(*arguments)
+      if doc
+        doc.at(*arguments)
+      end
+    end
+
+    alias / search
+    alias % at
+
+    #
+    # The title of the HTML page.
+    #
+    # @return [String]
+    #   The inner-text of the title element of the page.
+    #
+    def title
+      if (node = at('//title'))
+        node.inner_text
+      end
+    end
+
+    alias to_s body
+  end
+end
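`Spidr::Body` is written as a mixin: it never stores the response itself and instead relies on the including class to provide `response` (and, for `doc`, predicates such as `html?`). The sketch below shows that contract in isolation, without Nokogiri. `FakeResponse`, `FakePage`, and `BodySketch` are hypothetical stand-ins introduced for illustration only.

```ruby
# Hypothetical stand-in for an HTTP response: only #body is needed here.
FakeResponse = Struct.new(:body)

# A cut-down imitation of the Body mixin from the diff.
module BodySketch
  # Same nil-guard as in the diff: a missing body becomes an empty String.
  def body
    response.body || ''
  end

  # As in the diff, to_s is simply another name for body.
  alias to_s body
end

# Hypothetical page class that supplies the #response the mixin expects.
class FakePage
  include BodySketch

  attr_reader :response

  def initialize(response)
    @response = response
  end
end
```

Because `body` never returns `nil`, callers such as `doc` can safely call `body.empty?` without a separate nil check.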