yahoo_site_explorer 0.0.1 → 0.0.2

@@ -2,6 +2,8 @@
2
2
 
3
3
  The Yahoo! Site Explorer service provides access to Yahoo!'s information about web pages. The service stores information about links between web pages and can gauge the popularity of a given page.
4
4
 
5
+ The Site Explorer APIs are limited to 5,000 queries per IP address per day and to noncommercial use. See information on {rate limiting}[http://developer.yahoo.com/search/rate.html].
6
+
5
7
  == Installation
6
8
 
7
9
  To install, simply:
@@ -11,17 +13,43 @@ To install, simply:
11
13
 
12
14
  == Example
13
15
 
14
- The following example queries Yahoo! Site Explorer for the number of inlinks (backlinks) for www.google.com:
16
+ The following example queries Yahoo! Site Explorer for backlink (inlink data) information about 'http://www.yahoo.com':
15
17
 
16
18
  require 'yahoo_site_explorer'
17
19
 
18
- service = YahooSiteExplorer.new('myapikey')
19
- results = service.backlinks('http://www.google.com')
20
- puts results.total_results #=> 941822
20
+ service = YahooSiteExplorer.new('1234--MyAPIKeyHere=abcd--')
21
+ backlinks = service.backlinks('http://www.yahoo.com')
22
+
23
+ puts backlinks.total_results_available #=> 941822
24
+ puts backlinks.results.first.title #=> HTML page title
25
+ puts backlinks.results.first.url #=> URL of the page
26
+ puts backlinks.results.first.click_url #=> click-through URL for the page
27
+
28
+ # Uses a cursor, requeries Yahoo! as necessary, and steps through all
29
+ # results.
30
+ backlinks.each do |link|
31
+ puts link.title
32
+ end
33
+
34
+ == Supported Services
35
+
36
+ This library supports the following endpoints for the Yahoo! Site Explorer web service:
37
+
38
+ Inlink Data::
39
+ Shows the pages from other sites linking in to a page.
40
+
41
+ Page Data::
42
+ Shows a list of all pages belonging to a domain in the Yahoo! index.
43
+
44
+ === Unsupported Services
45
+
46
+ The following endpoints are not currently supported by this library:
21
47
 
22
- == Caveats
48
+ Ping::
49
+ Allows you to notify Yahoo! of changes to your site.
23
50
 
24
- Currently, this library only implements the inlinksData, totalResultsAvailable, as that is the only data I need. I'll extend this to fully support the API, shortly.
51
+ Update Notification::
52
+ Allows you to notify Yahoo! of changes to your site.
25
53
 
26
54
  == Copyright
27
55
 
data/Rakefile CHANGED
@@ -58,7 +58,8 @@ Rake::RDocTask.new do |rdoc|
58
58
 
59
59
  rdoc.rdoc_dir = 'rdoc'
60
60
  rdoc.title = "yahoo_site_explorer #{version}"
61
- rdoc.rdoc_files.include('README*')
61
+ rdoc.main = 'README.rdoc'
62
+ rdoc.rdoc_files.include('README*', 'LICENSE*')
62
63
  rdoc.rdoc_files.include('lib/**/*.rb')
63
64
  end
64
65
 
@@ -0,0 +1,4 @@
1
+ ---
2
+ :minor: 0
3
+ :patch: 2
4
+ :major: 0
@@ -1,9 +1,10 @@
1
1
  require 'relax'
2
2
  require 'yahoo_site_explorer/api'
3
3
  require 'yahoo_site_explorer/backlinks'
4
+ require 'yahoo_site_explorer/page_data'
4
5
 
5
6
  ##
6
- #
7
+ # Provides a Ruby interface to the Yahoo! Site Explorer web service.
7
8
  #
8
9
  class YahooSiteExplorer
9
10
 
@@ -13,9 +14,83 @@ class YahooSiteExplorer
13
14
  end
14
15
 
15
16
 
17
+ ##
18
+ # Queries Yahoo! Site Explorer for backlinks (inlink data) to the given
19
+ # URL.
20
+ #
21
+ # === Example
22
+ #
23
+ # yahoo_site_explorer.backlinks('http://www.yahoo.com', :results => 50)
24
+ #
25
+ # === Options
26
+ #
27
+ # The following options may be passed to the method as symbol keys with
28
+ # their values.
29
+ #
30
+ # results::
31
+ # The number of results to return.
32
+ # Default: 50.
33
+ # Maximum: 100.
34
+ #
35
+ # start::
36
+ # The starting result position to return. The finishing position cannot
37
+ # exceed 1000.
38
+ # Default: 1.
39
+ #
40
+ # entire_site::
41
+ # Specifies whether to provide results for the entire site, or just the
42
+ # page referenced by the query. If the query is not a domain URL (i.e.
43
+ # it contains path information, such as
44
+ # http://smallbusiness.yahoo.com/webhosting/), this parameter has no
45
+ # effect.
46
+ # Default: no value.
47
+ # Other possible values: '1'
48
+ #
49
+ # omit_inlinks::
50
+ # If specified, inlinks will not be returned if they are from pages in the same domain/subdomain as the requested page.
51
+ # Default: 'none'
52
+ # Other possible values: 'domain', 'subdomain'
53
+ #
16
54
  def backlinks(url, options = {})
17
55
  options[:query] ||= url
18
- Backlinks.new(api.inlink_data(options))
56
+ Backlinks.new(self, options, api.inlink_data(options))
57
+ end
58
+
59
+ ##
60
+ # Queries Yahoo! Site Explorer for page data on the given URL.
61
+ #
62
+ # === Example
63
+ #
64
+ # yahoo_site_explorer.page_data('http://www.yahoo.com', :results => 50)
65
+ #
66
+ # === Options
67
+ #
68
+ # The following options may be passed to the method as symbol keys with
69
+ # their values.
70
+ #
71
+ # results::
72
+ # The number of results to return.
73
+ # Default: 50.
74
+ # Maximum: 100.
75
+ #
76
+ # start::
77
+ # The starting result position to return. The finishing position cannot
78
+ # exceed 1000.
79
+ # Default: 1.
80
+ #
81
+ # domain_only::
82
+ # Specifies whether to provide results for all subdomains (such as
83
+ # http://search.yahoo.com for http://www.yahoo.com) of the domain query,
84
+ # or just the specifically requested domain. If the query is not a domain
85
+ # URL (i.e. it contains path information, such as
86
+ # http://smallbusiness.yahoo.com/webhosting/), this parameter has no
87
+ # effect.
88
+ # Default: no value
89
+ # Other possible values: '1'
90
+ #
91
+ def page_data(url, options = {})
92
+ options[:query] ||= url
93
+ PageData.new(self, options, api.page_data(options))
19
94
  end
20
95
 
21
96
 
@@ -5,20 +5,40 @@ class YahooSiteExplorer
5
5
  defaults do
6
6
  parameter :appid, :required => true
7
7
  parameter :query, :required => true
8
+ parameter :results
9
+ parameter :start
8
10
  end
9
11
 
10
12
  endpoint 'http://search.yahooapis.com/SiteExplorerService/V1/' do
11
13
 
12
14
  action :inlink_data, :url => 'inlinkData' do
13
- parameter :results
14
- parameter :start
15
15
  parameter :entire_site
16
16
  parameter :omit_inlinks
17
- parameter :output
18
- parameter :callback
19
17
 
20
18
  parser 'ResultSet' do
21
- element :total_results, :attribute => 'totalResultsAvailable', :xpath => './/@totalResultsAvailable'
19
+ element :total_results_available, :attribute => 'totalResultsAvailable', :xpath => './/@totalResultsAvailable'
20
+ element :first_result_position, :attribute => 'firstResultPosition', :xpath => './/@firstResultPosition'
21
+ element :total_results_returned, :attribute => 'totalResultsReturned', :xpath => './/@totalResultsReturned'
22
+ elements 'Result', :as => :results do
23
+ element 'Title', :as => :title
24
+ element 'Url', :as => :url
25
+ element 'ClickUrl', :as => :click_url
26
+ end
27
+ end
28
+ end
29
+
30
+ action :page_data, :url => 'pageData' do
31
+ parameter :domain_only
32
+
33
+ parser 'ResultSet' do
34
+ element :total_results_available, :attribute => 'totalResultsAvailable', :xpath => './/@totalResultsAvailable'
35
+ element :first_result_position, :attribute => 'firstResultPosition', :xpath => './/@firstResultPosition'
36
+ element :total_results_returned, :attribute => 'totalResultsReturned', :xpath => './/@totalResultsReturned'
37
+ elements 'Result', :as => :results do
38
+ element 'Title', :as => :title
39
+ element 'Url', :as => :url
40
+ element 'ClickUrl', :as => :click_url
41
+ end
22
42
  end
23
43
  end
24
44
 
@@ -1,15 +1,48 @@
1
+ require 'yahoo_site_explorer/results_container'
2
+ require 'yahoo_site_explorer/result'
3
+
1
4
  class YahooSiteExplorer
2
5
 
3
- class Backlinks #:nodoc:
6
+ class Backlinks < ResultsContainer
4
7
 
5
- attr_reader :total_results
6
-
7
- def initialize(backlinks_hash)
8
- self.total_results = backlinks_hash[:total_results]
8
+ ##
9
+ # This method will step through all of the results supplied by Yahoo! for
10
+ # your given query. This method acts like a cursor and will
11
+ # automatically re-query Yahoo! for each subsequent set of +results+
12
+ # results (i.e. if your original query asked for 50 results at a time,
13
+ # this method will act as a cursor, pulling 50 results at a time over the
14
+ # entire resulting collection).
15
+ #
16
+ def each
17
+ backlinks = self
18
+ records = self.results
19
+ while !records.empty?
20
+ records.each { |record| yield record }
21
+ backlinks = backlinks.next_set
22
+ records = backlinks.results
23
+ end
24
+ self
9
25
  end
10
26
 
11
- def total_results=(count)
12
- @total_results = count ? count.to_i : nil
27
+ ##
28
+ # Returns the next Backlinks set based on available results from Yahoo!
29
+ # specific to the current query and request options.
30
+ #
31
+ def next_set #:nodoc:
32
+ if next_starting_position <= @total_results_available
33
+ @service.backlinks( @request_options[:query],
34
+ @request_options.merge({
35
+ :start => next_starting_position
36
+ })
37
+ )
38
+ else
39
+ Backlinks.new(@service, @request_options, {
40
+ :total_results_available => @total_results_available,
41
+ :total_results_returned => 0,
42
+ :first_result_position => next_starting_position,
43
+ :results => []
44
+ })
45
+ end
13
46
  end
14
47
 
15
48
  end
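The option comments in this diff state that `results` defaults to 50 with a maximum of 100, that `start` defaults to 1, and that the finishing position cannot exceed 1000. Because `next_set` keeps advancing `:start`, a caller may want to check those limits before issuing a request. The following is a minimal sketch only: `validate_paging` is a hypothetical helper that is not part of the gem, and the lower bound of 1 for `results` is an assumption.

```ruby
# Hypothetical helper (not part of yahoo_site_explorer): checks paging
# options against the limits quoted in the doc comments of this diff.
def validate_paging(options = {})
  results = options.fetch(:results, 50) # documented default
  start   = options.fetch(:start, 1)    # documented default

  # Upper bound of 100 is documented; the lower bound of 1 is an assumption.
  unless (1..100).cover?(results)
    raise ArgumentError, 'results must be between 1 and 100'
  end
  raise ArgumentError, 'start must be at least 1' if start < 1

  # The last position covered by this request.
  finish = start + results - 1
  if finish > 1000
    raise ArgumentError, 'finishing position cannot exceed 1000'
  end

  { :results => results, :start => start }
end

# A request for positions 951..1000 passes; one starting at 952 would raise.
paging = validate_paging(:start => 951)
```

Under these assumptions, a cursor that pages 50 results at a time could issue at most 20 requests before reaching the documented ceiling.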
@@ -0,0 +1,50 @@
1
+ require 'yahoo_site_explorer/results_container'
2
+ require 'yahoo_site_explorer/result'
3
+
4
+ class YahooSiteExplorer
5
+
6
+ class PageData < ResultsContainer
7
+
8
+ ##
9
+ # This method will step through all of the results supplied by Yahoo! for
10
+ # your given query. This method acts like a cursor and will
11
+ # automatically re-query Yahoo! for each subsequent set of +results+
12
+ # results (i.e. if your original query asked for 50 results at a time,
13
+ # this method will act as a cursor, pulling 50 results at a time over the
14
+ # entire resulting collection).
15
+ #
16
+ def each
17
+ page_data = self
18
+ records = self.results
19
+ while !records.empty?
20
+ records.each { |record| yield record }
21
+ page_data = page_data.next_set
22
+ records = page_data.results
23
+ end
24
+ self
25
+ end
26
+
27
+ ##
28
+ # Returns the next PageData set based on available results from Yahoo!
29
+ # specific to the current query and request options.
30
+ #
31
+ def next_set #:nodoc:
32
+ if next_starting_position <= @total_results_available
33
+ @service.page_data( @request_options[:query],
34
+ @request_options.merge({
35
+ :start => next_starting_position
36
+ })
37
+ )
38
+ else
39
+ PageData.new(@service, @request_options, {
40
+ :total_results_available => @total_results_available,
41
+ :total_results_returned => 0,
42
+ :first_result_position => next_starting_position,
43
+ :results => []
44
+ })
45
+ end
46
+ end
47
+
48
+ end
49
+
50
+ end
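Backlinks and PageData use the same cursor pattern: yield the current set, ask for the next set starting at `first_result_position + total_results_returned`, and stop when an empty set comes back. The sketch below reproduces that pattern without any network access. `FakeService` and `PagedSet` are hypothetical names invented for illustration; they are not part of the gem, and the in-memory slicing only stands in for the real web service.

```ruby
# Hypothetical stand-in for the remote service: serves slices of an
# in-memory list using 1-based start positions, like the :start option.
class FakeService
  attr_reader :requests

  def initialize(items)
    @items = items
    @requests = 0
  end

  def fetch(options = {})
    @requests += 1
    start = options.fetch(:start, 1)
    size  = options.fetch(:results, 50)
    page  = @items[start - 1, size] || []
    PagedSet.new(self, options, @items.size, start, page)
  end
end

# Hypothetical result container that mirrors the each/next_set pairing.
class PagedSet
  include Enumerable

  attr_reader :results

  def initialize(service, options, total_available, first_position, results)
    @service = service
    @options = options
    @total_available = total_available
    @first_position = first_position
    @results = results
  end

  # Yields every record, fetching follow-up sets until one comes back empty.
  def each
    set = self
    until set.results.empty?
      set.results.each { |record| yield record }
      set = set.next_set
    end
    self
  end

  # Requests the following set, or returns an empty set once the next
  # starting position would pass the total number of available results.
  def next_set
    next_start = @first_position + @results.size
    if next_start <= @total_available
      @service.fetch(@options.merge(:start => next_start))
    else
      PagedSet.new(@service, @options, @total_available, next_start, [])
    end
  end
end

service = FakeService.new((1..99).to_a)
count = 0
service.fetch(:results => 50).each { |_record| count += 1 }
```

With 99 items and 50 results per request, the loop visits all 99 records using two requests, which matches the shape of the "traverse multiple, contiguous query sets" tests in this diff.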
@@ -0,0 +1,16 @@
1
+ class YahooSiteExplorer
2
+
3
+ ##
4
+ # Wraps each result returned by Yahoo! backlinks (inlink data) and page data
5
+ # requests.
6
+ #
7
+ class Result
8
+ attr_accessor :title, :url, :click_url
9
+
10
+ def initialize(title, url, click_url) #:nodoc:
11
+ @title, @url, @click_url = title, url, click_url
12
+ end
13
+
14
+ end
15
+
16
+ end
@@ -0,0 +1,53 @@
1
+ class YahooSiteExplorer
2
+
3
+ class ResultsContainer #:nodoc:
4
+ include Enumerable
5
+
6
+ attr_reader :total_results_available,
7
+ :total_results_returned,
8
+ :first_result_position,
9
+ :results
10
+
11
+
12
+ def initialize(service, request_options, results_hash = {}) #:nodoc:
13
+ @service = service
14
+ @request_options = request_options
15
+ parse_hash(results_hash)
16
+ end
17
+
18
+
19
+
20
+ protected
21
+
22
+
23
+ ##
24
+ # Returns the starting position for a query that would subsequently
25
+ # follow the current result set.
26
+ #
27
+ def next_starting_position
28
+ @first_result_position + @total_results_returned
29
+ end
30
+
31
+ def parse_hash(backlinks_hash) #:nodoc:
32
+ @total_results_available = numeric(backlinks_hash[:total_results_available])
33
+ @total_results_returned = numeric(backlinks_hash[:total_results_returned])
34
+ @first_result_position = numeric(backlinks_hash[:first_result_position])
35
+ @results = collect_results(backlinks_hash[:results])
36
+ end
37
+
38
+ def collect_results(results) #:nodoc:
39
+ collection = []
40
+ return collection unless results.respond_to?(:each)
41
+ results.each do |result|
42
+ collection << Result.new(result[:title], result[:url], result[:click_url])
43
+ end
44
+ collection
45
+ end
46
+
47
+ def numeric(value, nil_value = nil) #:nodoc:
48
+ value ? value.to_i : nil_value
49
+ end
50
+
51
+ end
52
+
53
+ end
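Since `each` calls `results.empty?`, it helps if `results` is always an Array, even when the parsed response hash carries no `:results` entry. The sketch below shows one way to guarantee that with a guard clause that returns an empty collection. `SketchResult` is a hypothetical stand-in for the gem's Result class, and this standalone `collect_results` is written for illustration rather than copied from the library.

```ruby
# Hypothetical stand-in for YahooSiteExplorer::Result.
SketchResult = Struct.new(:title, :url, :click_url)

# Converts an array of result hashes into SketchResult objects. Anything
# that cannot be iterated (for example nil) becomes an empty Array, so
# callers can safely use #empty?, #size, or #each on the return value.
def collect_results(raw)
  return [] unless raw.respond_to?(:each)
  raw.map do |result|
    SketchResult.new(result[:title], result[:url], result[:click_url])
  end
end

none = collect_results(nil)
some = collect_results([
  { :title => 'Example', :url => 'http://example.com/', :click_url => 'http://example.com/' }
])
```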
@@ -7,7 +7,7 @@ class YahooSiteExplorerTest < Test::Unit::TestCase
7
7
  context 'backlinks' do
8
8
 
9
9
  setup do
10
- mock_response_for('http://search.yahooapis.com:80/SiteExplorerService/V1/inlinkData?appid=testid&query=http://www.google.com',
10
+ mock_response_for('http://search.yahooapis.com:80/SiteExplorerService/V1/inlinkData?appid=testid&query=http://www.yahoo.com',
11
11
  :response => mock_inlink_data_successful_response)
12
12
  end
13
13
 
@@ -16,11 +16,132 @@ class YahooSiteExplorerTest < Test::Unit::TestCase
16
16
  end
17
17
 
18
18
  should 'return Backlinks' do
19
- assert_kind_of YahooSiteExplorer::Backlinks, site_explorer.backlinks('http://www.google.com')
19
+ assert_kind_of YahooSiteExplorer::Backlinks, site_explorer.backlinks('http://www.yahoo.com')
20
20
  end
21
21
 
22
- should 'return total_results as an integer' do
23
- assert_equal 941822, site_explorer.backlinks('http://www.google.com').total_results
22
+ should 'be Enumerable' do
23
+ assert_kind_of Enumerable, site_explorer.backlinks('http://www.yahoo.com')
24
+ end
25
+
26
+ should 'return total_results_available as a numeric' do
27
+ assert_equal 941822, site_explorer.backlinks('http://www.yahoo.com').total_results_available
28
+ end
29
+
30
+ should 'return first_result_position as a numeric' do
31
+ assert_equal 1, site_explorer.backlinks('http://www.yahoo.com').first_result_position
32
+ end
33
+
34
+ should 'return total_results_returned as a numeric' do
35
+ assert_equal 2, site_explorer.backlinks('http://www.yahoo.com').total_results_returned
36
+ end
37
+
38
+ should 'return a collection of results' do
39
+ assert_equal 2, site_explorer.backlinks('http://www.yahoo.com').results.size
40
+ end
41
+
42
+ should 'contain the result title' do
43
+ assert_equal 'Common Dreams News Center', site_explorer.backlinks('http://www.yahoo.com').results.first.title
44
+ end
45
+
46
+ should 'contain the result url' do
47
+ assert_equal 'http://www.commondreams.org/', site_explorer.backlinks('http://www.yahoo.com').results.first.url
48
+ end
49
+
50
+ should 'contain the result click_url' do
51
+ assert_equal 'http://www.commondreams.org/', site_explorer.backlinks('http://www.yahoo.com').results.first.click_url
52
+ end
53
+
54
+ context 'each' do
55
+
56
+ setup do
57
+ mock_response_for('http://search.yahooapis.com:80/SiteExplorerService/V1/inlinkData?query=http://www.yahoo.com&appid=testid',
58
+ :response => mock_inlink_data_successful_response_set_1)
59
+ mock_response_for('http://search.yahooapis.com:80/SiteExplorerService/V1/inlinkData?start=51&query=http://www.yahoo.com&appid=testid',
60
+ :response => mock_inlink_data_successful_response_set_2)
61
+ end
62
+
63
+ should 'traverse multiple, contiguous query sets' do
64
+ results = 0
65
+ backlinks = site_explorer.backlinks('http://www.yahoo.com')
66
+ backlinks.each { |b| results += 1 }
67
+ assert_equal 99, results
68
+ end
69
+
70
+ should 'yield a Result' do
71
+ assert site_explorer.backlinks('http://www.yahoo.com').all? { |result| result.kind_of?(YahooSiteExplorer::Result) }
72
+ end
73
+
74
+ end
75
+
76
+ end
77
+
78
+ context 'page_data' do
79
+
80
+ setup do
81
+ mock_response_for('http://search.yahooapis.com:80/SiteExplorerService/V1/pageData?appid=testid&query=http://www.yahoo.com',
82
+ :response => mock_page_data_successful_response_set_1)
83
+ end
84
+
85
+ teardown do
86
+ FakeWeb.clean_registry
87
+ end
88
+
89
+ should 'return PageData' do
90
+ assert_kind_of YahooSiteExplorer::PageData, site_explorer.page_data('http://www.yahoo.com')
91
+ end
92
+
93
+ should 'be Enumerable' do
94
+ assert_kind_of Enumerable, site_explorer.page_data('http://www.yahoo.com')
95
+ end
96
+
97
+ should 'return total_results_available as a numeric' do
98
+ assert_equal 99, site_explorer.page_data('http://www.yahoo.com').total_results_available
99
+ end
100
+
101
+ should 'return first_result_position as a numeric' do
102
+ assert_equal 1, site_explorer.page_data('http://www.yahoo.com').first_result_position
103
+ end
104
+
105
+ should 'return total_results_returned as a numeric' do
106
+ assert_equal 50, site_explorer.page_data('http://www.yahoo.com').total_results_returned
107
+ end
108
+
109
+ should 'return a collection of results' do
110
+ assert_equal 50, site_explorer.page_data('http://www.yahoo.com').results.size
111
+ end
112
+
113
+ should 'contain the result title' do
114
+ assert_equal 'Site Explorer - Yahoo! Site Explorer', site_explorer.page_data('http://www.yahoo.com').results.first.title
115
+ end
116
+
117
+ should 'contain the result url' do
118
+ assert_equal 'http://siteexplorer.search.yahoo.com/', site_explorer.page_data('http://www.yahoo.com').results.first.url
119
+ end
120
+
121
+ should 'contain the result click_url' do
122
+ assert_equal 'http://siteexplorer.search.yahoo.com/', site_explorer.page_data('http://www.yahoo.com').results.first.click_url
123
+ end
124
+
125
+ context 'each' do
126
+
127
+ setup do
128
+ mock_response_for('http://search.yahooapis.com:80/SiteExplorerService/V1/pageData?query=http://www.yahoo.com&appid=testid',
129
+ :response => mock_page_data_successful_response_set_1)
130
+ mock_response_for('http://search.yahooapis.com:80/SiteExplorerService/V1/pageData?start=51&query=http://www.yahoo.com&appid=testid',
131
+ :response => mock_page_data_successful_response_set_2)
132
+ end
133
+
134
+ should 'traverse multiple, contiguous query sets' do
135
+ results = 0
136
+ pages = site_explorer.page_data('http://www.yahoo.com')
137
+ pages.each { |b| results += 1 }
138
+ assert_equal 99, results
139
+ end
140
+
141
+ should 'yield a Result' do
142
+ assert site_explorer.page_data('http://www.yahoo.com').all? { |result| result.kind_of?(YahooSiteExplorer::Result) }
143
+ end
144
+
24
145
  end
25
146
 
26
147
  end