google-site-search 0.0.5 → 0.0.7
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/README.rdoc +63 -45
- data/google-site-search.gemspec +2 -0
- data/lib/google-site-search.rb +24 -3
- data/lib/google-site-search/search.rb +20 -2
- data/lib/google-site-search/version.rb +1 -1
- data/test/data/utf8_results.xml +2 -2
- data/test/test_google_site_search.rb +37 -1
- data/test/test_search.rb +11 -2
- metadata +34 -6
data/README.rdoc
CHANGED
@@ -10,35 +10,56 @@ In the simplest use case it will query your google site search for a term and su
 
 Add the following to your projects Gemfile.
 
-
+  gem 'google-site-search', :git => "git@github.com:dvallance/google-site-search.git"
 
 Require the code if necessary (_note:_ some frameworks like rails are set to auto-require gems for you by default)
 
-
+  require 'google-site-search'
 
 == Usage
 
 The simpliest way to use the gem is by providing just a *search* *query* *term* and your *search* *engine* *unique* *id* code (_e.g._ looks like this +00255077836266642015+:+u-scht7a-8i+ and is located in your google site search control panel)
 
-
-
+  # just assign the query to an object
+  search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("microsoft", "00255077836266642015:u-scht7a-8i")
 
-
-
-
-
-
-
-
-
-
-
-
-
-
+  # object has search attributes like
+  puts search.next_results_url
+  puts search.previous_results_url
+  puts search.xml
+  puts search.spelling
+  puts search.spelling_url
+
+  # object has an array of each specific result that contains title, description and its link by default
+  search.results.each do |result|
+    puts result.title
+    puts result.description
+    puts result.link
+  end
 
 The _query_ method expects a valid url so if you wanted to supply your own you can! However I have created a builder class to help with proper url creation and to help do some of the work for you.
 
+== Multiple Search
+
+Since google only allows a max of 20 returned results I have added a method that will capture up to *n* number of results.
+
+  # the array will be up to 5 search objects if the query actually has that many results.
+  # has soon as a search doesn't have a next_results_url the method stops.
+  array_of_search_results = GoogleSiteSearch.query_multiple(5, *YOUR_URL*)
+
+== Blocks (and a caching example)
+
+Both the query and query_multiple methods can take a block which executes for each Search object found.
+
+  # here is a rails example using the block
+  url = GoogleSiteSearch::UrlBuilder.new("microsoft", "00255077836266642015:u-scht7a-8i")
+  @search = Rails.cache.fetch(url) do |search|
+    # I can do something with the search objects.
+    # possibly custom analytics for our searchs?
+    search.url # we can access the object
+  end
+
+
 == Advanced Usage
 
 An important requirement for this gem was to be able to use {structured data}[https://developers.google.com/custom-search/docs/structured_data] for:
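The new "Multiple Search" section describes `query_multiple` as collecting up to *n* Search objects and stopping as soon as one has no `next_results_url`. Its implementation is not part of this diff, so the following is only a sketch of that loop under stated assumptions: `FakePage` and `collect_pages` are hypothetical names, and the fetcher is any callable that turns a URL into a page-like object. In the gem itself the follow-up URL would presumably go through `GoogleSiteSearch.paginate` first, since the stored URLs are relative and have the search engine id stripped.

```ruby
# Hypothetical stand-in for a Search object: only the fields the loop reads.
FakePage = Struct.new(:url, :next_results_url)

# Sketch of the "query up to n pages" loop described in the README.
# `fetch_page` is any callable that takes a URL and returns a page-like object.
def collect_pages(max, first_url, fetch_page)
  pages = []
  url = first_url
  while url && pages.size < max
    page = fetch_page.call(url)
    pages << page
    yield page if block_given? # optional per-page block, as the README describes
    url = page.next_results_url # nil ends the loop early
  end
  pages
end

# Usage with a fake three-page result set.
links = { "page1" => "page2", "page2" => "page3", "page3" => nil }
fetch = ->(url) { FakePage.new(url, links[url]) }

p collect_pages(5, "page1", fetch).map(&:url) # ["page1", "page2", "page3"]
p collect_pages(2, "page1", fetch).map(&:url) # ["page1", "page2"]
```

Stopping on a missing `next_results_url` means a query with fewer pages than *n* simply returns a shorter array.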
@@ -49,15 +70,14 @@ Therefore I allow the developer to supply his own "*Results*" class to the query
 
 The default Result class is as follows:
 
-
-
-
-
-
-
-
-
-end
+  class Result
+    attr_reader :title, :link, :description
+    def initialize(node)
+      @title = node.find_first("T").content
+      @link = node.find_first("UE").content
+      @description = node.find_first("S").content
+    end
+  end
 
 As you can see it is very simple. Your class simply needs an initialize method that will recieve an xml node, which it can then do with as it pleases. After it is initialized it is added to the _search.results_ array as shown previously.
 
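Because the only contract for a result class is an `initialize` that receives one result node, a custom class can read extra fields and tolerate missing ones. A sketch, assuming libxml-style nodes that respond to `find_first(path)` and `content`, as the default class above uses. `FakeNode`, `AuthorResult`, and the `AUTHOR` path are hypothetical placeholders, not names from the gem or from Google's XML; for real structured data, inspect the XML your engine returns (values there may sit in attributes rather than text content). How the class is handed to `query` is not shown in this hunk, so that step is left out.

```ruby
# Minimal stand-ins for libxml-ruby nodes: the result classes only call
# #find_first(path) and #content, so a small duck-typed fake is enough here.
FakeText = Struct.new(:content)

class FakeNode
  def initialize(values)
    @values = values # path => text
  end

  # Returns an object responding to #content, or nil if the path is absent.
  def find_first(path)
    @values.key?(path) ? FakeText.new(@values[path]) : nil
  end
end

# Hypothetical custom result class: same shape as the default Result,
# plus one optional field read from a placeholder path.
class AuthorResult
  AUTHOR_PATH = "AUTHOR".freeze # placeholder, not a real GSS element name

  attr_reader :title, :link, :author

  def initialize(node)
    @title = node.find_first("T").content
    @link = node.find_first("UE").content
    author_node = node.find_first(AUTHOR_PATH)
    @author = author_node && author_node.content # nil when the field is missing
  end
end

node = FakeNode.new("T" => "Halloween", "UE" => "http://example.com/halloween", "AUTHOR" => "lisamorton")
result = AuthorResult.new(node)
puts result.author # lisamorton
```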
@@ -67,9 +87,9 @@ See
 
 == Pagination
 
-The google search api
+The google search api does the work of pagination for us, supplying the next and previous urls. The urls are relative paths and contain the search engine id parameter. Since this is a security concern I strip out the search engine id when I store them in the Search.next_result_url and Search.previous_result_url methods. This makes them safe to put in links on views and it is why you must supply the search engine id again on the paginate call; so full url can be rebuilt for the query call.
 
-
+  search2 = GoogleSiteSearch.query(GoogleSiteSearch.paginate(search1.next_results_url, "00255077836266642015:u-scht7a-8i"))
 
 == Pagination Simple Example
 
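The pagination paragraph describes a round trip: the stored URL no longer carries `cx`, so `paginate` must add it back and, for relative paths, supply a scheme and host. A stdlib-only sketch of that rebuilding step, modeled on the new `paginate` in this diff. `rebuild_url` is a hypothetical name, `ArgumentError` stands in for the gem's `StandardError`, `empty?` replaces ActiveSupport's `blank?`, and percent-encoding the id is a choice of this sketch rather than something the gem does.

```ruby
require "uri"

# Rebuilds a full query URL from a (possibly relative) next/previous URL
# by restoring scheme/host when missing and re-appending the engine id.
def rebuild_url(url, search_engine_id)
  raise ArgumentError, "search_engine_id required" if search_engine_id.to_s.empty?
  uri = URI.parse(url.to_s)
  raise ArgumentError, "url seems to be invalid, parameters expected" if uri.query.to_s.empty?
  if uri.relative?
    uri.scheme = "http"
    uri.host = "www.google.com"
  end
  uri.query = "#{uri.query}&cx=#{URI.encode_www_form_component(search_engine_id)}"
  uri.to_s
end

puts rebuild_url("/some/path?q=search", "my_key")
# http://www.google.com/some/path?q=search&cx=my_key
```

Absolute URLs keep their own host; only relative ones get `www.google.com` filled in.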
@@ -78,19 +98,19 @@ This works and is fairly straight forward.
 In your controller:
 
   if params[:move]
-    @search = GoogleSiteSearch.query(GoogleSiteSearch.paginate(params[:move]))
+    @search = GoogleSiteSearch.query(GoogleSiteSearch.paginate(params[:move], "00255077836266642015:u-scht7a-8i"))
   else
     @search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("microsoft", "00255077836266642015:u-scht7a-8i", :num => 5))
   end
 
 In your view:
 
-
-
-
-
-
-
+  <% if @search.previous_results_url %>
+    <%= link_to "Previous", search_url(:move => @search.previous_results_url) %>
+  <% end %>
+  <% if @search.next_results_url %>
+    <%= link_to "More", search_url(:move => @search.next_results_url) %>
+  <% end %>
 
 == Escaping
 
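Escaping matters because an unescaped `&` inside the `move` value is read as a parameter separator. A small stdlib illustration; it uses `URI.encode_www_form` and `URI.decode_www_form` rather than the README's `CGI::escape`/`CGI::unescape`, which percent-encode these particular characters (`/ ? = &`) the same way.

```ruby
require "uri"

next_url = "/next?q=search&start=1"

# Unescaped: the "&" inside the value splits the query string, so "move"
# is cut short and "start" becomes a separate top-level parameter.
naive = URI.decode_www_form("move=#{next_url}")
p naive # [["move", "/next?q=search"], ["start", "1"]]

# Escaped: the whole relative URL survives as the value of "move".
escaped = URI.encode_www_form("move" => next_url)
puts escaped # move=%2Fnext%3Fq%3Dsearch%26start%3D1
p URI.decode_www_form(escaped) # [["move", "/next?q=search&start=1"]]
```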
@@ -98,13 +118,11 @@ If you start passing around the url's in parameters you may run into issues if y
 
 View adds escape:
 
-
+  <%= link_to "Previous", search_url(:move => CGI::escape(@search.previous_results_url)) %>
 
 Controller unescapes:
 
-
-
-
+  @search = GoogleSiteSearch.query(GoogleSiteSearch.paginate(CGI::unescape(params[:move]), "00255077836266642015:u-scht7a-8i"))
 
 == Filtering and Sorting
 
@@ -116,8 +134,8 @@ Google expects filtering to be on the "search query" itself. However I feel my e
 
 From the google reference link above an example filter search query is <b>halloween more:pagemap:document-author:lisamorton</b>
 
-
-
+  # using the example above would look like this.
+  search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("halloween", "00255077836266642015:u-scht7a-8i", :filter => "more:pagemap:document-author:lisamorton")
 
 === Separate Search Term From Filters
 
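Since Google reads filters as part of the query text, a `:filter` option presumably ends up appended to the search term inside `q`, while options such as `:sort` or `:num` become ordinary query parameters. `UrlBuilder` itself is not in this diff, so this is a hypothetical sketch of that idea: `sketch_search_url` and `BASE_URL` are placeholders, not the gem's endpoint or default parameters.

```ruby
require "uri"

# Placeholder endpoint; the gem's UrlBuilder supplies its own, not shown here.
BASE_URL = "http://www.google.com/search".freeze

# Joins term and filter into one "q" value, then adds any extra params.
def sketch_search_url(term, search_engine_id, filter: nil, **extra)
  q = [term, filter].compact.join(" ")
  params = { "q" => q, "cx" => search_engine_id }
  extra.each { |key, value| params[key.to_s] = value.to_s }
  "#{BASE_URL}?#{URI.encode_www_form(params)}"
end

url = sketch_search_url("halloween", "00255077836266642015:u-scht7a-8i",
                        filter: "more:pagemap:document-author:lisamorton", num: 5)
puts url
```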
@@ -125,20 +143,20 @@ The full "search query" is returned by google's api and stored in the Search obj
 
 To separate the search term from the filter use:
 
-
+  search_term, filters = GoogleSiteSearch.separate_search_term_from_filters(@search.search_query)
 
 === Sorting
 
 Sorting would also be done by specifing a *sort* option.
 
-
+  search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("halloween", "00255077836266642015:u-scht7a-8i", :filter => "more:pagemap:document-author:lisamorton", :sort => "data-sdate")
 
 == Other Params
 
 Any <b>[param=value]</b> query string additions you want to add can be assigned like the sorting above. For example to limit the search results return, to 5, would look like...
 
-
-
+  # get only 5 search results with the filtering and sorting from above still applyed.
+  search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("halloween more:pagemap:document-author:lisamorton", "00255077836266642015:u-scht7a-8i", :sort => "date-sdate", :num => "5" )
 
 == Author
 
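`separate_search_term_from_filters` is only called in the README; its body is not in this diff. One plausible way to split a returned query such as `halloween more:pagemap:document-author:lisamorton` is to treat whitespace-separated `more:` tokens as filters. The sketch below is a hypothetical illustration of that idea (`split_term_and_filters` is not a gem method), and the gem's own rules may differ.

```ruby
# Splits a search query into [plain term, filter tokens], treating any
# whitespace-separated token that starts with "more:" as a filter.
def split_term_and_filters(search_query)
  tokens = search_query.to_s.split
  filters, words = tokens.partition { |token| token.start_with?("more:") }
  [words.join(" "), filters.join(" ")]
end

term, filters = split_term_and_filters("halloween more:pagemap:document-author:lisamorton")
puts term    # halloween
puts filters # more:pagemap:document-author:lisamorton
```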
data/google-site-search.gemspec
CHANGED
data/lib/google-site-search.rb
CHANGED
@@ -6,10 +6,12 @@ require "google-site-search/version"
 require "google-site-search/url_builder"
 require "google-site-search/search"
 require "google-site-search/result"
-require "timeout"
 require "net/http"
+require "rsmaz"
+require "timeout"
 require "uri"
 require "xml"
+require "rack/utils"
 
 ##
 # A module to help query and parse the google site search api.
@@ -24,9 +26,28 @@ module GoogleSiteSearch
 
   class << self
 
+    # Takes a url, strips out un-required query params, and compresses
+    # a string representation. The intent is to have a small string to
+    # use as a caching key.
+    def caching_key url
+      params = Rack::Utils.parse_query(URI.parse(url).query)
+      # ei = "Passes on an alphanumeric parameter that decodes the originating SERP where user clicked on a related search". Don't fully understand what it does but it makes my caching less effective.
+      params.delete("ei")
+      key = params.map{|k,v| k.to_s + v.to_s}.sort.join
+      key.blank? ? nil : RSmaz.compress(key)
+    end
+
     # Expects the URL returned by Search#next_results_url or Search#previous_results_url.
-    def paginate url
-
+    def paginate url, search_engine_id
+      raise StandardError, "search_engine_id required" if search_engine_id.blank?
+      uri = URI.parse(url.to_s)
+      raise StandardError, "url seems to be invalid, parameters expected" if uri.query.blank?
+      if uri.relative?
+        uri.host = "www.google.com"
+        uri.scheme = "http"
+      end
+      uri.query = uri.query += "&cx=#{search_engine_id}"
+      uri.to_s
     end
 
     # See Search - This is a convienence method for creating and querying.
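The new `caching_key` normalizes a URL before compressing it: drop `ei`, concatenate each key with its value, sort, and join. A stdlib-only sketch of the normalization step, assuming plain `URI.decode_www_form` in place of `Rack::Utils.parse_query` and skipping the RSmaz compression; `sketch_caching_key` is a hypothetical name. One difference: `to_h` keeps only the last value of a repeated parameter, whereas Rack's `parse_query` collects repeats into an array.

```ruby
require "uri"

# Builds an order-independent key from a URL's query string,
# ignoring the "ei" parameter; returns nil when nothing remains.
def sketch_caching_key(url)
  params = URI.decode_www_form(URI.parse(url).query.to_s).to_h
  params.delete("ei")
  key = params.map { |k, v| "#{k}#{v}" }.sort.join
  key.empty? ? nil : key
end

puts sketch_caching_key("http://domain?q=work&ei=dontshow&do=i") # doiqwork
```

Sorting the `key+value` fragments makes URLs that differ only in parameter order map to the same key, which is what makes the key useful for caching.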
data/lib/google-site-search/search.rb
CHANGED
@@ -44,6 +44,15 @@ module GoogleSiteSearch
       @result_class = result_class
     end
 
+    def next_results_url
+      @next_results_url
+    end
+
+    def previous_results_url
+      @previous_results_url
+    end
+
+
     # Query's Google API, stores the xml and parses values into itself.
     def query
       @xml = GoogleSiteSearch::request_xml(url)
@@ -68,12 +77,21 @@ module GoogleSiteSearch
         @spelling = spelling_node.try(:content)
         @spelling_q = spelling_node.try(:attributes).try(:[],:q)
         @estimated_results_total = doc.find_first("RES/M").try(:content)
-        @next_results_url = doc.find_first("RES/NB/NU").try(:content)
-        @previous_results_url = doc.find_first("RES/NB/PU").try(:content)
+        @next_results_url = remove_search_engine_id(doc.find_first("RES/NB/NU").try(:content))
+        @previous_results_url = remove_search_engine_id(doc.find_first("RES/NB/PU").try(:content))
         @search_query = doc.find_first("Q").try(:content)
       rescue Exception => e
         raise ParsingError, "#{e.message} URL:[#{@url}] XML:[#{@xml}]"
       end
     end
+
+    def remove_search_engine_id url
+      return nil if url.blank?
+      uri = URI.parse(url)
+      params = Rack::Utils::parse_query(uri.query)
+      params.delete("cx")
+      uri.query = params.map{|k,v| "#{k}=#{v}"}.sort.join("&")
+      uri.to_s
+    end
   end
 end
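`remove_search_engine_id` drops `cx` from the query string and rebuilds the remaining parameters in sorted order, which is why the updated tests expect `/next?q=search&start=1`. A stdlib-only sketch of the same step; `sketch_remove_search_engine_id` is a hypothetical name. Unlike the gem's `"#{k}=#{v}"` join, `URI.encode_www_form` re-escapes values, so a search term containing `&` or a space survives the round trip; this sketch also drops the `?` entirely when no parameters remain.

```ruby
require "uri"

# Removes the "cx" (search engine id) parameter from a URL and rebuilds
# the query string with the remaining parameters sorted by key.
def sketch_remove_search_engine_id(url)
  return nil if url.nil? || url.empty?
  uri = URI.parse(url)
  params = URI.decode_www_form(uri.query.to_s).reject { |key, _| key == "cx" }
  uri.query = params.empty? ? nil : URI.encode_www_form(params.sort)
  uri.to_s
end

puts sketch_remove_search_engine_id("/next?cx=my_key&q=search&start=1")
# /next?q=search&start=1
```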
data/test/data/utf8_results.xml
CHANGED
@@ -35,8 +35,8 @@
 <RES SN="3" EN="4">
   <M>29</M>
   <NB>
-    <PU>/previous</PU>
-    <NU>/next</NU>
+    <PU>/previous?start=20&q=search&cx=my_key</PU>
+    <NU>/next?cx=my_key&q=search&start=1</NU>
   </NB>
   <RG START="1" SIZE="2"/>
   <RG START="1" SIZE="1"> </RG>
data/test/test_google_site_search.rb
CHANGED
@@ -2,10 +2,46 @@ require_relative 'test_helper'
 
 describe GoogleSiteSearch do
 
+
+  describe '.caching_key' do
+    let :sample_url do
+      "http://domain?q=work&ei=dontshow&do=i"
+    end
+
+    let :becomes do
+      "doiqwork" #query parameter values sorted and concated.
+    end
+
+    it "properly creates a key" do
+      RSmaz.decompress(GoogleSiteSearch.caching_key(sample_url)).must_equal becomes
+    end
+
+    it "raises error on bad uri" do
+      -> {GoogleSiteSearch.caching_key(nil)}.must_raise URI::InvalidURIError
+    end
+
+  end
+
   describe '.paginate' do
+
+    let :valid_url do
+      "http://www.valid.com/search?q=mysearch"
+    end
+
+    it "raises an error if the url has no parameters and is therefore invalid" do
+      url = "http://www.noparameters.com/"
+      -> {GoogleSiteSearch.paginate(url, "my_key")}.must_raise StandardError
+    end
+
+    it "raises an error if a search engine key is nil or blank" do
+      -> {GoogleSiteSearch.paginate(valid_url, "")}.must_raise StandardError
+      -> {GoogleSiteSearch.paginate(valid_url, nil)}.must_raise StandardError
+    end
+
     it 'completes a valid url for the relative path supplied' do
-      GoogleSiteSearch.paginate("/some/path").must_equal "http://www.google.com/some/path"
+      GoogleSiteSearch.paginate("/some/path?q=search","my_key").must_equal "http://www.google.com/some/path?q=search&cx=my_key"
     end
+
   end
 
   describe '.relative_path' do
data/test/test_search.rb
CHANGED
@@ -27,13 +27,21 @@ describe Search do
   end
 
   it "contains the next results url" do
-    search.next_results_url.
+    search.next_results_url.wont_be :empty?
+  end
+
+  it "next results url removed the search engine id parameter" do
+    search.next_results_url.must_equal "/next?q=search&start=1"
   end
 
   it "contains the previous results url" do
-    search.previous_results_url.
+    search.previous_results_url.wont_be :empty?
   end
 
+  it "next results url removed the search engine id parameter" do
+    search.previous_results_url.must_equal "/previous?q=search&start=20"
+  end
+
   it "stores the original xml" do
     search.xml.must_equal xml
   end
@@ -54,4 +62,5 @@ describe Search do
       search.spelling_q.must_equal "fake suggestion escaped"
     end
   end
+
 end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: google-site-search
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.7
 prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-11-
+date: 2012-11-25 00:00:00.000000000Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activesupport
-  requirement: &
+  requirement: &11771100 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -21,10 +21,10 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *11771100
 - !ruby/object:Gem::Dependency
   name: libxml-ruby
-  requirement: &
+  requirement: &11770260 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -32,7 +32,29 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *11770260
+- !ruby/object:Gem::Dependency
+  name: rsmaz
+  requirement: &11761040 !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: *11761040
+- !ruby/object:Gem::Dependency
+  name: rack
+  requirement: &11760240 !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: *11760240
 description: A gem to aid in the consumption of the google site search service; querys
   the service, populates a result object and has some related helper methods.
 email:
@@ -70,12 +92,18 @@ required_ruby_version: !ruby/object:Gem::Requirement
   - - ! '>='
     - !ruby/object:Gem::Version
       version: '0'
+      segments:
+      - 0
+      hash: -3596847305660679714
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:
   - - ! '>='
     - !ruby/object:Gem::Version
       version: '0'
+      segments:
+      - 0
+      hash: -3596847305660679714
 requirements: []
 rubyforge_project:
 rubygems_version: 1.8.10
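The regenerated metadata lists `rsmaz` and `rack` as new runtime dependencies alongside `activesupport` and `libxml-ruby`, all unconstrained (`>= 0`). The gemspec hunk's own lines are not reproduced in this diff, so this is only a sketch of declaring that dependency set with RubyGems' `Gem::Specification` API; the summary, author, and empty file list are placeholders, and it is not the gem's actual gemspec.

```ruby
require "rubygems"

# Declares the four runtime dependencies listed in the 0.0.7 metadata,
# each without a version constraint (RubyGems defaults to ">= 0").
spec = Gem::Specification.new do |s|
  s.name = "google-site-search"
  s.version = "0.0.7"
  s.summary = "placeholder summary"
  s.authors = ["placeholder"]
  s.files = []
  %w[activesupport libxml-ruby rsmaz rack].each do |dep|
    s.add_runtime_dependency dep
  end
end

spec.runtime_dependencies.each { |dep| puts "#{dep.name} #{dep.requirement}" }
```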