yahoo_site_explorer 0.0.1 → 0.0.2

@@ -2,6 +2,8 @@
2
2
 
3
3
  The Yahoo! Site Explorer service provides access to Yahoo!'s information about web pages. The service stores information about links between web pages and can gauge the popularity of a given page.
4
4
 
5
+ The Site Explorer APIs are limited to 5,000 queries per IP address per day and to noncommercial use. See information on {rate limiting}[http://developer.yahoo.com/search/rate.html].
6
+
5
7
  == Installation
6
8
 
7
9
  To install, simply:
@@ -11,17 +13,43 @@ To install, simply:
11
13
 
12
14
  == Example
13
15
 
14
- The following example queries Yahoo! Site Explorer for the number of inlinks (backlinks) for www.google.com:
16
+ The following example queries Yahoo! Site Explorer for backlink (inlink data) information about 'http://www.yahoo.com':
15
17
 
16
18
  require 'yahoo_site_explorer'
17
19
 
18
- service = YahooSiteExplorer.new('myapikey')
19
- results = service.backlinks('http://www.google.com')
20
- puts results.total_results #=> 941822
20
+ service = YahooSiteExplorer.new('1234--MyAPIKeyHere=abcd--')
21
+ backlinks = service.backlinks('http://www.yahoo.com')
22
+
23
+ puts backlinks.total_results_available #=> 941822
24
+ puts backlinks.results.first.title #=> HTML page title
25
+ puts backlinks.results.first.url #=> URL of the linking page
26
+ puts backlinks.results.first.click_url #=> URL to use when linking to the result
27
+
28
+ # Uses a cursor, requeries Yahoo! as necessary, and steps through all
29
+ # results.
30
+ backlinks.each do |link|
31
+ puts link.title
32
+ end
33
+
34
+ == Supported Services
35
+
36
+ This library supports the following endpoints for the Yahoo! Site Explorer web service:
37
+
38
+ Inlink Data::
39
+ Shows the pages from other sites linking in to a page.
40
+
41
+ Page Data::
42
+ Shows a list of all pages belonging to a domain in the Yahoo! index.
43
+
44
+ === Unsupported Services
45
+
46
+ The following endpoints are not currently supported by this library:
21
47
 
22
- == Caveats
48
+ Ping::
49
+ Allows you to notify Yahoo! of changes to your site.
23
50
 
24
- Currently, this library only implements the inlinksData, totalResultsAvailable, as that is the only data I need. I'll extend this to fully support the API, shortly.
51
+ Update Notification::
52
+ Allows you to notify Yahoo! of changes to your site.
25
53
 
26
54
  == Copyright
27
55
 
data/Rakefile CHANGED
@@ -58,7 +58,8 @@ Rake::RDocTask.new do |rdoc|
58
58
 
59
59
  rdoc.rdoc_dir = 'rdoc'
60
60
  rdoc.title = "yahoo_site_explorer #{version}"
61
- rdoc.rdoc_files.include('README*')
61
+ rdoc.main = 'README.rdoc'
62
+ rdoc.rdoc_files.include('README*', 'LICENSE*')
62
63
  rdoc.rdoc_files.include('lib/**/*.rb')
63
64
  end
64
65
 
@@ -0,0 +1,4 @@
1
+ ---
2
+ :minor: 0
3
+ :patch: 2
4
+ :major: 0
@@ -1,9 +1,10 @@
1
1
  require 'relax'
2
2
  require 'yahoo_site_explorer/api'
3
3
  require 'yahoo_site_explorer/backlinks'
4
+ require 'yahoo_site_explorer/page_data'
4
5
 
5
6
  ##
6
- #
7
+ # Provides a Ruby interface to the Yahoo! Site Explorer web service.
7
8
  #
8
9
  class YahooSiteExplorer
9
10
 
@@ -13,9 +14,83 @@ class YahooSiteExplorer
13
14
  end
14
15
 
15
16
 
17
+ ##
18
+ # Queries Yahoo! Site Explorer for backlinks (inlink data) to the given
19
+ # URL.
20
+ #
21
+ # === Example
22
+ #
23
+ # yahoo_site_explorer.backlinks('http://www.yahoo.com', :results => 50)
24
+ #
25
+ # === Options
26
+ #
27
+ # The following options would be passed into the method as a symbolized key
28
+ # and value pair.
29
+ #
30
+ # results::
31
+ # The number of results to return.
32
+ # Default: 50.
33
+ # Maximum: 100.
34
+ #
35
+ # start::
36
+ # The starting result position to return. The finishing position cannot
37
+ # exceed 1000.
38
+ # Default: 1.
39
+ #
40
+ # entire_site::
41
+ # Specifies whether to provide results for the entire site, or just the
42
+ # page referenced by the query. If the query is not a domain URL (i.e.
43
+ # it contains path information, such as
44
+ # http://smallbusiness.yahoo.com/webhosting/), this parameter has no
45
+ # effect.
46
+ # Default: no value.
47
+ # Other possible values: '1'
48
+ #
49
+ # omit_inlinks::
50
+ # If specified, inlinks will not be returned if they are from pages in the same domain/subdomain as the requested page.
51
+ # Default: 'none'
52
+ # Other possible values: 'domain', 'subdomain'
53
+ #
16
54
  def backlinks(url, options = {})
17
55
  options[:query] ||= url
18
- Backlinks.new(api.inlink_data(options))
56
+ Backlinks.new(self, options, api.inlink_data(options))
57
+ end
58
+
59
+ ##
60
+ # Queries Yahoo! Site Explorer for page data on the given URL.
61
+ #
62
+ # === Example
63
+ #
64
+ # yahoo_site_explorer.page_data('http://www.yahoo.com', :results => 50)
65
+ #
66
+ # === Options
67
+ #
68
+ # The following options would be passed into the method as a symbolized key
69
+ # and value pair.
70
+ #
71
+ # results::
72
+ # The number of results to return.
73
+ # Default: 50.
74
+ # Maximum: 100.
75
+ #
76
+ # start::
77
+ # The starting result position to return. The finishing position cannot
78
+ # exceed 1000.
79
+ # Default: 1.
80
+ #
81
+ # domain_only::
82
+ # Specifies whether to provide results for all subdomains (such as
83
+ # http://search.yahoo.com for http://www.yahoo.com) of the domain query,
84
+ # or just the specifically requested domain. If the query is not a domain
85
+ # URL (i.e. it contains path information, such as
86
+ # http://smallbusiness.yahoo.com/webhosting/), this parameter has no
87
+ # effect.
88
+ # Default: no value
89
+ # Other possible values: '1'
90
+ #
91
+ def page_data(url, options = {})
92
+ options[:query] ||= url
93
+ PageData.new(self, options, api.page_data(options))
19
94
  end
20
95
 
21
96
 
@@ -5,20 +5,40 @@ class YahooSiteExplorer
5
5
  defaults do
6
6
  parameter :appid, :required => true
7
7
  parameter :query, :required => true
8
+ parameter :results
9
+ parameter :start
8
10
  end
9
11
 
10
12
  endpoint 'http://search.yahooapis.com/SiteExplorerService/V1/' do
11
13
 
12
14
  action :inlink_data, :url => 'inlinkData' do
13
- parameter :results
14
- parameter :start
15
15
  parameter :entire_site
16
16
  parameter :omit_inlinks
17
- parameter :output
18
- parameter :callback
19
17
 
20
18
  parser 'ResultSet' do
21
- element :total_results, :attribute => 'totalResultsAvailable', :xpath => './/@totalResultsAvailable'
19
+ element :total_results_available, :attribute => 'totalResultsAvailable', :xpath => './/@totalResultsAvailable'
20
+ element :first_result_position, :attribute => 'firstResultPosition', :xpath => './/@firstResultPosition'
21
+ element :total_results_returned, :attribute => 'totalResultsReturned', :xpath => './/@totalResultsReturned'
22
+ elements 'Result', :as => :results do
23
+ element 'Title', :as => :title
24
+ element 'Url', :as => :url
25
+ element 'ClickUrl', :as => :click_url
26
+ end
27
+ end
28
+ end
29
+
30
+ action :page_data, :url => 'pageData' do
31
+ parameter :domain_only
32
+
33
+ parser 'ResultSet' do
34
+ element :total_results_available, :attribute => 'totalResultsAvailable', :xpath => './/@totalResultsAvailable'
35
+ element :first_result_position, :attribute => 'firstResultPosition', :xpath => './/@firstResultPosition'
36
+ element :total_results_returned, :attribute => 'totalResultsReturned', :xpath => './/@totalResultsReturned'
37
+ elements 'Result', :as => :results do
38
+ element 'Title', :as => :title
39
+ element 'Url', :as => :url
40
+ element 'ClickUrl', :as => :click_url
41
+ end
22
42
  end
23
43
  end
24
44
 
@@ -1,15 +1,48 @@
1
+ require 'yahoo_site_explorer/results_container'
2
+ require 'yahoo_site_explorer/result'
3
+
1
4
  class YahooSiteExplorer
2
5
 
3
- class Backlinks #:nodoc:
6
+ class Backlinks < ResultsContainer
4
7
 
5
- attr_reader :total_results
6
-
7
- def initialize(backlinks_hash)
8
- self.total_results = backlinks_hash[:total_results]
8
+ ##
9
+ # This method will step through all of the results supplied by Yahoo! for
10
+ # your given query. This method acts like a cursor and will
11
+ # automatically re-query Yahoo! for each subsequent set of +results+
12
+ # results (i.e. if your original query asked for 50 results at a time,
13
+ # this method will act as a cursor, pulling 50 results at a time over the
14
+ # entire resulting collection).
15
+ #
16
+ def each
17
+ backlinks = self
18
+ records = self.results
19
+ while !records.empty?
20
+ records.each { |record| yield record }
21
+ backlinks = backlinks.next_set
22
+ records = backlinks.results
23
+ end
24
+ self
9
25
  end
10
26
 
11
- def total_results=(count)
12
- @total_results = count ? count.to_i : nil
27
+ ##
28
+ # Returns the next Backlinks set based on available results from Yahoo!
29
+ # specific to the current query and request options.
30
+ #
31
+ def next_set #:nodoc:
32
+ if next_starting_position <= @total_results_available
33
+ @service.backlinks( @request_options[:query],
34
+ @request_options.merge({
35
+ :start => next_starting_position
36
+ })
37
+ )
38
+ else
39
+ Backlinks.new(@service, @request_options, {
40
+ :total_results_available => @total_results_available,
41
+ :total_results_returned => 0,
42
+ :first_result_position => next_starting_position,
43
+ :results => []
44
+ })
45
+ end
13
46
  end
14
47
 
15
48
  end
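
The cursor behaviour that Backlinks#each and Backlinks#next_set describe can be sketched outside the gem. This is a minimal illustration only: Page, FakeService, fetch and each_result are hypothetical names invented for this sketch, and the in-memory service stands in for the real Yahoo! request. It shows the loop requesting the next page at the next starting position until the available results are exhausted.

```ruby
# Hypothetical stand-in for one page of results; not the gem's class.
Page = Struct.new(:first_result_position, :total_results_available, :results) do
  # Position the following request should start from (1-indexed).
  def next_starting_position
    first_result_position + results.size
  end
end

# Hypothetical in-memory "service" that serves slices of a fixed list.
class FakeService
  attr_reader :requests

  def initialize(items, page_size)
    @items = items
    @page_size = page_size
    @requests = 0
  end

  # Returns the page beginning at the 1-indexed position +start+.
  def fetch(start = 1)
    @requests += 1
    slice = @items[start - 1, @page_size] || []
    Page.new(start, @items.size, slice)
  end
end

# Yields every item across all pages, re-querying only while more remain.
def each_result(service)
  page = service.fetch
  until page.results.empty?
    page.results.each { |item| yield item }
    break if page.next_starting_position > page.total_results_available
    page = service.fetch(page.next_starting_position)
  end
end

service = FakeService.new((1..99).to_a, 50)
seen = []
each_result(service) { |item| seen << item }
puts seen.size         # 99 items over two pages
puts service.requests  # 2 requests
```

Unlike the gem's next_set, which returns an empty container when nothing is left, this sketch simply breaks out of the loop; either approach ends the iteration without an extra request.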
@@ -0,0 +1,50 @@
1
+ require 'yahoo_site_explorer/results_container'
2
+ require 'yahoo_site_explorer/result'
3
+
4
+ class YahooSiteExplorer
5
+
6
+ class PageData < ResultsContainer
7
+
8
+ ##
9
+ # This method will step through all of the results supplied by Yahoo! for
10
+ # your given query. This method acts like a cursor and will
11
+ # automatically re-query Yahoo! for each subsequent set of +results+
12
+ # results (i.e. if your original query asked for 50 results at a time,
13
+ # this method will act as a cursor, pulling 50 results at a time over the
14
+ # entire resulting collection).
15
+ #
16
+ def each
17
+ page_data = self
18
+ records = self.results
19
+ while !records.empty?
20
+ records.each { |record| yield record }
21
+ page_data = page_data.next_set
22
+ records = page_data.results
23
+ end
24
+ self
25
+ end
26
+
27
+ ##
28
+ # Returns the next PageData set based on available results from Yahoo!
29
+ # specific to the current query and request options.
30
+ #
31
+ def next_set #:nodoc:
32
+ if next_starting_position <= @total_results_available
33
+ @service.page_data( @request_options[:query],
34
+ @request_options.merge({
35
+ :start => next_starting_position
36
+ })
37
+ )
38
+ else
39
+ PageData.new(@service, @request_options, {
40
+ :total_results_available => @total_results_available,
41
+ :total_results_returned => 0,
42
+ :first_result_position => next_starting_position,
43
+ :results => []
44
+ })
45
+ end
46
+ end
47
+
48
+ end
49
+
50
+ end
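
Because PageData and Backlinks inherit from a container that includes Enumerable, defining each is enough to gain methods such as select, count and first. The sketch below illustrates that with a hypothetical PagedNumbers class (not part of the gem) whose pages are plain arrays. It also suggests why methods that stop early, such as first, may avoid reading later pages.

```ruby
# Hypothetical paged collection; each inner array stands for one result page.
class PagedNumbers
  include Enumerable

  attr_reader :pages_read

  def initialize(pages)
    @pages = pages
    @pages_read = 0
  end

  # Walks every page in order, yielding one element at a time.
  def each
    @pages.each do |page|
      @pages_read += 1
      page.each { |n| yield n }
    end
    self
  end
end

pages = [[1, 2, 3], [4, 5, 6], [7, 8]]

all = PagedNumbers.new(pages)
evens = all.select(&:even?)   # Enumerable#select is built on #each
total = all.count             # so is Enumerable#count

partial = PagedNumbers.new(pages)
first_two = partial.first(2)  # stops iterating once two elements are found

puts evens.inspect
puts total
puts first_two.inspect
puts partial.pages_read
```

In this sketch first(2) only touches the first page, which hints that with a cursor-style each an early exit can save further queries.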
@@ -0,0 +1,16 @@
1
+ class YahooSiteExplorer
2
+
3
+ ##
4
+ # Wraps each result returned by Yahoo! backlinks (inlink data) and page data
5
+ # requests.
6
+ #
7
+ class Result
8
+ attr_accessor :title, :url, :click_url
9
+
10
+ def initialize(title, url, click_url) #:nodoc:
11
+ @title, @url, @click_url = title, url, click_url
12
+ end
13
+
14
+ end
15
+
16
+ end
@@ -0,0 +1,53 @@
1
+ class YahooSiteExplorer
2
+
3
+ class ResultsContainer #:nodoc:
4
+ include Enumerable
5
+
6
+ attr_reader :total_results_available,
7
+ :total_results_returned,
8
+ :first_result_position,
9
+ :results
10
+
11
+
12
+ def initialize(service, request_options, results_hash = {}) #:nodoc:
13
+ @service = service
14
+ @request_options = request_options
15
+ parse_hash(results_hash)
16
+ end
17
+
18
+
19
+
20
+ protected
21
+
22
+
23
+ ##
24
+ # Returns the starting position for a query that would subsequently
25
+ # follow the current result set.
26
+ #
27
+ def next_starting_position
28
+ @first_result_position + @total_results_returned
29
+ end
30
+
31
+ def parse_hash(backlinks_hash) #:nodoc:
32
+ @total_results_available = numeric(backlinks_hash[:total_results_available])
33
+ @total_results_returned = numeric(backlinks_hash[:total_results_returned])
34
+ @first_result_position = numeric(backlinks_hash[:first_result_position])
35
+ @results = collect_results(backlinks_hash[:results])
36
+ end
37
+
38
+ def collect_results(results) #:nodoc:
39
+ collection = []
40
+ return collection unless results.respond_to?(:each)
41
+ results.each do |result|
42
+ collection << Result.new(result[:title], result[:url], result[:click_url])
43
+ end
44
+ collection
45
+ end
46
+
47
+ def numeric(value, nil_value = nil) #:nodoc:
48
+ value ? value.to_i : nil_value
49
+ end
50
+
51
+ end
52
+
53
+ end
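
The paging arithmetic in next_starting_position and the numeric helper can be traced with the figures used in the page data tests further down (99 results available, 50 returned starting at position 1). The standalone functions below are a sketch that mirrors those helpers rather than the gem's own code; more_results? is a hypothetical name for the comparison made in next_set, and the second page size of 49 is assumed from the total of 99.

```ruby
# Converts an attribute string to an integer, passing nil through unchanged.
def numeric(value, nil_value = nil)
  value ? value.to_i : nil_value
end

# Start position of the request that would follow the current result set.
def next_starting_position(first_result_position, total_results_returned)
  first_result_position + total_results_returned
end

# Hypothetical helper: true while another request can still return results.
def more_results?(next_start, total_results_available)
  next_start <= total_results_available
end

total = numeric('99')

# First page: positions 1..50.
second_start = next_starting_position(numeric('1'), numeric('50'))
puts second_start                        # 51
puts more_results?(second_start, total)  # true

# Second page (assumed to hold the remaining 49): positions 51..99.
third_start = next_starting_position(second_start, 49)
puts third_start                         # 100
puts more_results?(third_start, total)   # false

puts numeric(nil).inspect                # nil
```

The value 51 matches the start=51 parameter in the mocked second request of the tests.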
@@ -7,7 +7,7 @@ class YahooSiteExplorerTest < Test::Unit::TestCase
7
7
  context 'backlinks' do
8
8
 
9
9
  setup do
10
- mock_response_for('http://search.yahooapis.com:80/SiteExplorerService/V1/inlinkData?appid=testid&query=http://www.google.com',
10
+ mock_response_for('http://search.yahooapis.com:80/SiteExplorerService/V1/inlinkData?appid=testid&query=http://www.yahoo.com',
11
11
  :response => mock_inlink_data_successful_response)
12
12
  end
13
13
 
@@ -16,11 +16,132 @@ class YahooSiteExplorerTest < Test::Unit::TestCase
16
16
  end
17
17
 
18
18
  should 'return Backlinks' do
19
- assert_kind_of YahooSiteExplorer::Backlinks, site_explorer.backlinks('http://www.google.com')
19
+ assert_kind_of YahooSiteExplorer::Backlinks, site_explorer.backlinks('http://www.yahoo.com')
20
20
  end
21
21
 
22
- should 'return total_results as an integer' do
23
- assert_equal 941822, site_explorer.backlinks('http://www.google.com').total_results
22
+ should 'be Enumerable' do
23
+ assert_kind_of Enumerable, site_explorer.backlinks('http://www.yahoo.com')
24
+ end
25
+
26
+ should 'return total_results_available as a numeric' do
27
+ assert_equal 941822, site_explorer.backlinks('http://www.yahoo.com').total_results_available
28
+ end
29
+
30
+ should 'return first_result_position as a numeric' do
31
+ assert_equal 1, site_explorer.backlinks('http://www.yahoo.com').first_result_position
32
+ end
33
+
34
+ should 'return total_results_returned as a numeric' do
35
+ assert_equal 2, site_explorer.backlinks('http://www.yahoo.com').total_results_returned
36
+ end
37
+
38
+ should 'return a collection of results' do
39
+ assert_equal 2, site_explorer.backlinks('http://www.yahoo.com').results.size
40
+ end
41
+
42
+ should 'contain the result title' do
43
+ assert_equal 'Common Dreams News Center', site_explorer.backlinks('http://www.yahoo.com').results.first.title
44
+ end
45
+
46
+ should 'contain the result url' do
47
+ assert_equal 'http://www.commondreams.org/', site_explorer.backlinks('http://www.yahoo.com').results.first.url
48
+ end
49
+
50
+ should 'contain the result click_url' do
51
+ assert_equal 'http://www.commondreams.org/', site_explorer.backlinks('http://www.yahoo.com').results.first.click_url
52
+ end
53
+
54
+ context 'each' do
55
+
56
+ setup do
57
+ mock_response_for('http://search.yahooapis.com:80/SiteExplorerService/V1/inlinkData?query=http://www.yahoo.com&appid=testid',
58
+ :response => mock_inlink_data_successful_response_set_1)
59
+ mock_response_for('http://search.yahooapis.com:80/SiteExplorerService/V1/inlinkData?start=51&query=http://www.yahoo.com&appid=testid',
60
+ :response => mock_inlink_data_successful_response_set_2)
61
+ end
62
+
63
+ should 'traverse multiple, contiguous query sets' do
64
+ results = 0
65
+ backlinks = site_explorer.backlinks('http://www.yahoo.com')
66
+ backlinks.each { |b| results += 1 }
67
+ assert_equal 99, results
68
+ end
69
+
70
+ should 'yield a Result' do
71
+ assert site_explorer.backlinks('http://www.yahoo.com').all? { |result| result.kind_of?(YahooSiteExplorer::Result) }
72
+ end
73
+
74
+ end
75
+
76
+ end
77
+
78
+ context 'page_data' do
79
+
80
+ setup do
81
+ mock_response_for('http://search.yahooapis.com:80/SiteExplorerService/V1/pageData?appid=testid&query=http://www.yahoo.com',
82
+ :response => mock_page_data_successful_response_set_1)
83
+ end
84
+
85
+ teardown do
86
+ FakeWeb.clean_registry
87
+ end
88
+
89
+ should 'return PageData' do
90
+ assert_kind_of YahooSiteExplorer::PageData, site_explorer.page_data('http://www.yahoo.com')
91
+ end
92
+
93
+ should 'be Enumerable' do
94
+ assert_kind_of Enumerable, site_explorer.page_data('http://www.yahoo.com')
95
+ end
96
+
97
+ should 'return total_results_available as a numeric' do
98
+ assert_equal 99, site_explorer.page_data('http://www.yahoo.com').total_results_available
99
+ end
100
+
101
+ should 'return first_result_position as a numeric' do
102
+ assert_equal 1, site_explorer.page_data('http://www.yahoo.com').first_result_position
103
+ end
104
+
105
+ should 'return total_results_returned as a numeric' do
106
+ assert_equal 50, site_explorer.page_data('http://www.yahoo.com').total_results_returned
107
+ end
108
+
109
+ should 'return a collection of results' do
110
+ assert_equal 50, site_explorer.page_data('http://www.yahoo.com').results.size
111
+ end
112
+
113
+ should 'contain the result title' do
114
+ assert_equal 'Site Explorer - Yahoo! Site Explorer', site_explorer.page_data('http://www.yahoo.com').results.first.title
115
+ end
116
+
117
+ should 'contain the result url' do
118
+ assert_equal 'http://siteexplorer.search.yahoo.com/', site_explorer.page_data('http://www.yahoo.com').results.first.url
119
+ end
120
+
121
+ should 'contain the result click_url' do
122
+ assert_equal 'http://siteexplorer.search.yahoo.com/', site_explorer.page_data('http://www.yahoo.com').results.first.click_url
123
+ end
124
+
125
+ context 'each' do
126
+
127
+ setup do
128
+ mock_response_for('http://search.yahooapis.com:80/SiteExplorerService/V1/pageData?query=http://www.yahoo.com&appid=testid',
129
+ :response => mock_page_data_successful_response_set_1)
130
+ mock_response_for('http://search.yahooapis.com:80/SiteExplorerService/V1/pageData?start=51&query=http://www.yahoo.com&appid=testid',
131
+ :response => mock_page_data_successful_response_set_2)
132
+ end
133
+
134
+ should 'traverse multiple, contiguous query sets' do
135
+ results = 0
136
+ pages = site_explorer.page_data('http://www.yahoo.com')
137
+ pages.each { |b| results += 1 }
138
+ assert_equal 99, results
139
+ end
140
+
141
+ should 'yield a Result' do
142
+ assert site_explorer.page_data('http://www.yahoo.com').all? { |result| result.kind_of?(YahooSiteExplorer::Result) }
143
+ end
144
+
24
145
  end
25
146
 
26
147
  end