snapsearch-client-ruby 0.1.0 → 1.0.0
- checksums.yaml +4 -4
- data/Gemfile.lock +1 -1
- data/README.md +213 -108
- data/VERSION +1 -1
- data/examples/rack/config.ru +7 -7
- data/examples/sinatra/config.ru +15 -15
- data/examples/sinatra/lib/sinatra_snap_search.rb +59 -1
- data/lib/rack/snap_search.rb +19 -4
- data/lib/rack/snap_search/config.rb +2 -2
- data/lib/snap_search.rb +1 -0
- data/lib/snap_search/client.rb +4 -4
- data/lib/snap_search/{connection_exception.rb → connection_error.rb} +15 -15
- data/lib/snap_search/detector.rb +4 -4
- data/lib/snap_search/{exception.rb → error.rb} +8 -8
- data/lib/snap_search/interceptor.rb +18 -11
- data/lib/snap_search/{validation_exception.rb → validation_error.rb} +17 -17
- metadata +5 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 716cbd63128f5927b2cecd2f741b163b41260647
+data.tar.gz: bc1f48f201d33d57c4218889710470c4bd61f2a0
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 4644c7538ebd5eaa147979d05facf00dfa819aed178965d73c55752a323dc4be362f13d2d75a20749ece4b0a1c915d63ea13f8bc2252c8cc69be8ab360c1bc07
+data.tar.gz: 52a72306554f6ed8548b0b6b6dd27b60ea5f22d96cb9d30b5fdf0f0fe25f9e15894af7afae62e2b6291b097e2d86654ce540e8a7b314c5a02e2bcb672710a9bd
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -1,109 +1,214 @@
-SnapSearch-Client-Ruby
-======================
-
-[![Build Status](https://travis-ci.org/SnapSearch/SnapSearch-Client-Ruby.png?branch=master)](https://travis-ci.org/SnapSearch/SnapSearch-Client-Ruby)
-
-Snapsearch Client Ruby is Ruby based framework agnostic HTTP client library for SnapSearch (https://snapsearch.io/).
-
-SnapSearch provides similar libraries in other languages: https://github.com/SnapSearch/Snapsearch-Clients
-
-Installation
--------------
-
-
-
-
-
-
-
-
-
-```
-gem
-```
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-#
-
-
-#
-
-
-
-
-
-
-
+SnapSearch-Client-Ruby
+======================
+
+[![Build Status](https://travis-ci.org/SnapSearch/SnapSearch-Client-Ruby.png?branch=master)](https://travis-ci.org/SnapSearch/SnapSearch-Client-Ruby)
+
+Snapsearch Client Ruby is Ruby based framework agnostic HTTP client library for SnapSearch (https://snapsearch.io/).
+
+SnapSearch provides similar libraries in other languages: https://github.com/SnapSearch/Snapsearch-Clients
+
+Installation
+------------
+
+SnapSearch-Client-Ruby is available through Rubygems and can be installed via:
+
+```
+gem install snapsearch-client-ruby
+```
+
+or add it to your Gemfile like this:
+
+```
+gem "snapsearch-client-ruby", "~> 0.1.0"
+```
+
+For all supported Ruby versions check out the `.travis.yml` file.
+
+Usage
+-----
+
+SnapSearch Client Ruby is a rack based middleware for SnapSearch. It works with all rack based frameworks including Rails and Sinatra. You should place the SnapSearch middleware on top of other middleware so it gets called relatively early in the request response cycle. The middleware is also available as individual objects, which can be called independently. In non-rack based frameworks, it is best to start SnapSearch at the entry point of your application.
+
+**The examples folder in this repository contains a rack and sinatra example showing the context of using the SnapSearch middleware in your application. The below instructions is an abridged version of the examples.**
+
+For full documentation on the API and API request parameters see: https://snapsearch.io/documentation
+
+### Basic Usage
+
+In your `config.ru` file, import the `rack/snap_search`, then setup the configuration:
+
+```ruby
+require 'rack/snap_search'
+
+use Rack::SnapSearch do |config|
+
+config.email = 'user@example.com'
+
+config.key = 'API_KEY_HERE'
+
+end
+```
+
+This will handle everything from the detection of the robot to outputting the cached snapshot. If it detects the robot, it will skip execution of the application and output the snapshot response. The default configuration is to output only the status, location headers and body content. This is because some headers may cause encoding errors.
+
+Here is an example of the response hash from SnapSearch:
+
+```ruby
+response = {
+"cache" => true/false,
+"callbackResult" => "",
+"date" => 1390382314,
+"headers" => [
+{
+"name" => "Content-Type",
+"value" => "text/html"
+}
+],
+"html" => "<html></html>",
+"message" => "Success/Failed/Validation Errors",
+"pageErrors" => [
+{
+"error" => "Error: document.querySelector(...) is null",
+"trace" => [
+{
+"file" => "filename",
+"function" => "anonymous",
+"line" => "41",
+"sourceURL" => "urltofile"
+}
+]
+}
+],
+"screensot" => "BASE64 ENCODED IMAGE CONTENT",
+"status" => 200
+}
+```
+
+### Advanced Usage
+
+The rack based middleware has many options and if you use the objects independently they are even more flexible. These options can be seen in context in the examples folder:
+
+```ruby
+use Rack::SnapSearch do |config|
+
+config.email = 'user@example.com'
+
+config.key = 'API_KEY_HERE'
+
+# Optional: The API URL to send requests to.
+config.api_url = 'https://snapsearch.io/api/v1/robot' # Default
+
+# Optional: The CA Cert file to use when sending HTTPS requests to the API.
+config.ca_cert_file = SnapSearch.root.join('resources', 'cacert.pem') # Default
+
+# Optional: Check X-Forwarded-Proto if you use a load balancer that proxies https to http connections
+config.x_forwarded_proto = true # Default
+
+# Optional: Extra API parameters that is sent to SnapSearch
+config.parameters = {} # Default
+
+# Optional: Whitelisted routes. Should be an Array of Regexp instances.
+config.matched_routes = [] # Default
+
+# Optional: Blacklisted routes. Should be an Array of Regexp instances.
+config.ignored_routes = [] # Default
+
+# Optional: A path of the JSON file containing the user agent whitelist & blacklist.
+config.robots_json = SnapSearch.root.join('resources', 'robots.json') # Default
+
+# Optional: A path to the JSON file containing a single Hash with the keys `ignore` and `match`. These keys contain Arrays of Strings (user agents)
+config.extensions_json = SnapSearch.root.join('resources', 'extensions.json') # Default
+
+# Optional: Set to `true` to check file extensions in the URL, this will check if the URL contains invalid file extensions.
+#If there is no file extension, then there's no problem. But if there is, it could be a request to a static file. In which case it is not HTML that we want to intercept.
+#It is typically easier to simply whitelist or blacklist file based routes.
+#You do not need this unless your application server (not your HTTP server) is serving up static files. Like binary content, images and non-HTML text files.
+config.check_file_extensions = false # Default
+
+# Optional: A block to run when an exception occurs when making requests to the API.
+config.on_exception do |exception|
+p exception
+end
+
+# Optional: A block to run before the interception of a bot. You can use this to do client side caching.
+config.before_intercept do |url|
+#Get a client side cached snapshot
+end
+
+# Optional: A block to run after the interception of a bot. You can use this to do client side caching.
+config.after_intercept do |url, response|
+#Save the client side cached snapshot (the cached time should be less then the cached time you passed to SnapSearch, we recommend half the SnapSearch cachetime)
+end
+
+# Optional: A block to manipulate the response from the SnapSearch API if a bit is intercepted. The headers in this case represent [{name: "HEADERKEY", value: "HEADERVALUE"}, ...]
+config.response_callback do |status, headers, body|
+[ status, headers, body ]
+end
+
+end
+```
+
+Check out the resources folder containing the `robots.json` and `extensions.json`. The `robots.json` contains all the Search Engine and Social App robot user agents we're currently checking for. The `extensions.json` contains all the valid file extensions that a web application might use for HTML resources. Feel free to edit them and use your own JSON files for the middleware. Always make sure to ignore the "SnapSearch" robot, otherwise you could get into an infinite interception loop.
+
+The Detector instance's robot and extensions hash are publicly accessible and can be modified during runtime.
+
+```
+# Add a user agent to match against:
+detector.robots['match'] << 'NewRobot'
+
+# Add a user agent to ignore:
+detector.robots['ignore'] << 'MyRobot'
+
+detector.extensions['ruby'] << 'myvalidrubyfileextensionforhtmlresources'
+```
+
+Development
+---------
+
+Get the bundler dependency management tool.
+
+```
+gem install bundler
+```
+
+Install/update all dependencies:
+
+```
+bundle install
+```
+
+See all build tasks:
+
+```
+bundle exec rake -T
+```
+
+Make your changes. Release a new version tag with (see the other `rake version:bump:... etc` tasks):
+
+```
+bundle exec rake version:bump
+```
+
+Synchronise and push the tag to Github:
+
+```
+git push
+git push --tags
+```
+
+Create the gem package:
+
+```
+bundle exec rake gem
+```
+
+Push the gem to Ruby Gems:
+
+```
+gem push pkg/snapsearch-client-ruby-MAJOR.MINOR.PATCH.gem
+```
+
+Tests
+----
+
 Tests are written with RSpec. Run tests with `bundle exec rspec spec/`
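The README describes `robots.json` as holding `match` and `ignore` lists of user agents, notes that these lists can be changed at runtime, and warns that the "SnapSearch" robot must always be ignored. A minimal sketch of how such lists could be evaluated follows; `robot?` is a hypothetical helper, not the gem's `Detector`, and it assumes case-insensitive substring matching with `ignore` taking precedence over `match`.

```ruby
# Hypothetical helper, NOT the gem's Detector: illustrates `match`/`ignore`
# user-agent lists like those in resources/robots.json.
# Assumptions: case-insensitive substring matching, and `ignore` wins over `match`.
def robot?(user_agent, robots)
  ua = user_agent.to_s.downcase
  listed = ->(key) { Array(robots[key]).any? { |name| ua.include?(name.downcase) } }

  return false if listed.call('ignore') # ignored agents are never intercepted
  listed.call('match')
end

robots = {
  'match'  => ['Googlebot', 'bingbot'],
  'ignore' => ['SnapSearch'] # always ignore SnapSearch itself to avoid an interception loop
}

# Runtime change, in the spirit of `detector.robots['match'] << 'NewRobot'` from the README
robots['match'] << 'NewRobot'

puts robot?('Mozilla/5.0 (compatible; Googlebot/2.1)', robots) # prints true
puts robot?('NewRobot/1.0', robots)                            # prints true
puts robot?('SnapSearch', robots)                              # prints false
```

Checking `ignore` first is what makes the loop protection hold even if an ignored agent string also happens to contain a matched one.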
data/VERSION
CHANGED
@@ -1 +1 @@
-
+1.0.0
data/examples/rack/config.ru
CHANGED
@@ -1,11 +1,11 @@
 # Notes to run:
-#
-#
-#
+# gem install bundler
+# bundle install
+# rackup
 #
 # Testing:
-#
-#
+# Visit http://localhost:9292/
+# Visit http://localhost:9292/?_escaped_fragment_
 
 require 'bundler/setup'
 require 'rack/snap_search'
@@ -44,8 +44,8 @@ use Rack::SnapSearch do |config|
 # Optional: A path to the JSON file containing a single Hash with the keys `ignore` and `match`. These keys contain Arrays of Strings (user agents)
 config.extensions_json = SnapSearch.root.join('resources', 'extensions.json') # Default
 
-# Optional: Set to `true` to ignore
-config.
+# Optional: Set to `true` to ignore requests that have invalid file extensions
+config.check_file_extensions = false # Default
 
 # Optional: A block to run when an exception occurs when making requests to the API.
 config.on_exception do |exception|
data/examples/sinatra/config.ru
CHANGED
@@ -1,15 +1,15 @@
-# Notes to run:
-#
-#
-#
-#
-# Testing:
-#
-#
-
-require 'bundler/setup'
-require 'pathname'
-$:.unshift( Pathname.new(__FILE__).join('..', 'lib').expand_path.to_s )
-require 'sinatra_snap_search'
-
-run SinatraSnapSearch
+# Notes to run:
+# gem install bundler
+# bundle install
+# rackup
+#
+# Testing:
+# Visit http://localhost:9292/
+# Visit http://localhost:9292/?_escaped_fragment_
+
+require 'bundler/setup'
+require 'pathname'
+$:.unshift( Pathname.new(__FILE__).join('..', 'lib').expand_path.to_s )
+require 'sinatra_snap_search'
+
+run SinatraSnapSearch
data/examples/sinatra/lib/sinatra_snap_search.rb
CHANGED
@@ -9,7 +9,65 @@ class SinatraSnapSearch < Sinatra::Base
 set :root, SnapSearch.root.join('examples', 'sinatra')
 enable :sessions, :logging, :method_override, :static
 
-use Rack::SnapSearch
+use Rack::SnapSearch do |config|
+
+# Required: The email to authenticate with.
+config.email = 'user@example.com'
+
+# Required: The key to authenticate with.
+config.key = 'API_KEY_HERE'
+
+# Optional: The API URL to send requests to.
+config.api_url = 'https://snapsearch.io/api/v1/robot' # Default
+
+# Optional: The CA Cert file to use when sending HTTPS requests to the API.
+config.ca_cert_file = SnapSearch.root.join('resources', 'cacert.pem') # Default
+
+# Optional: Check X-Forwarded-Proto because Heroku SSL Support terminates at the load balancer.
+config.x_forwarded_proto = true # Default
+
+# Optional: Extra parameters to send to the API.
+config.parameters = {} # Default
+
+# Optional: Whitelisted routes. Should be an Array of Regexp instances.
+config.matched_routes = [] # Default
+
+# Optional: Blacklisted routes. Should be an Array of Regexp instances.
+config.ignored_routes = [] # Default
+
+# Optional: A path of the JSON file containing the user agent whitelist & blacklist.
+config.robots_json = SnapSearch.root.join('resources', 'robots.json') # Default
+
+# Optional: A path to the JSON file containing a single Hash with the keys `ignore` and `match`. These keys contain Arrays of Strings (user agents)
+config.extensions_json = SnapSearch.root.join('resources', 'extensions.json') # Default
+
+# Optional: Set to `true` to ignore direct requests to files.
+config.check_file_extensions = false # Default
+
+# Optional: A block to run when an exception occurs when making requests to the API.
+config.on_exception do |exception|
+p exception
+end
+
+# Optional: A block to run before the interception of a bot.
+config.before_intercept do |url|
+puts "Before interception\n URL: #{url}"
+end
+
+# Optional: A block to run after the interception of a bot.
+config.after_intercept do |url, response|
+puts "After interception\n URL: #{url}\n Response: #{response}"
+end
+
+# Optional: A block to manipulate the response from the SnapSearch API if a bit is intercepted.
+config.response_callback do |status, headers, body|
+puts "Response callback\n Status: #{status}\n Headers: #{headers}\n Body: #{body}"
+
+[ status, headers, body ]
+end
+
+end
+
 end
 
 get '/' do
data/lib/rack/snap_search.rb
CHANGED
@@ -41,12 +41,20 @@ module Rack
 setup_api_response_from_environment(environment) # Will set @api_response if one is given from the API
 
 if @api_response
-
+unless @config.response_callback.nil?
+@callback_response = @config.response_callback.call(*@api_response)
+
+setup_attributes_from_callback_response
+else
+setup_attributes_from_api_response
+end
+
+rack_response
 else
 setup_attributes_from_app(environment)
 end
 
-
+
 end
 
 protected
@@ -78,7 +86,7 @@ module Rack
 ignored_routes: @config.ignored_routes,
 robots_json: @config.robots_json,
 extensions_json: @config.extensions_json,
-
+check_file_extensions: @config.check_file_extensions
 )
 end
 
@@ -106,7 +114,7 @@ module Rack
 begin
 request = Rack::Request.new(environment.to_h)
 @api_response = @interceptor.intercept(request: request)
-rescue SnapSearch::
+rescue ::SnapSearch::Error => exception
 @config.on_exception.nil? ? raise(exception) : @config.on_exception.call(exception)
 end
 end
@@ -134,6 +142,13 @@ module Rack
 @status, @headers, @body = @app.call(environment)
 end
 
+def setup_attributes_from_callback_response
+@status, @headers, @body = @callback_response
+
+# Convert the Array of Hashes into a single Hash:
+@headers = @headers.each_with_object({}) { |hash, memo| memo[ hash.first[0] ] = hash.first[1] }
+end
+
 def rack_response
 [ @status, @headers, @body ]
 end
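The new `setup_attributes_from_callback_response` flattens the header Array returned by `response_callback` into the single Hash a Rack response uses. If the headers still have the `{ "name" => ..., "value" => ... }` shape shown in the README's sample response, keep in mind that `hash.first` returns only the first key/value pair of each Hash, for example `["name", "Content-Type"]`. A minimal sketch that reads both keys explicitly, assuming string keys as in that sample; `headers_to_rack` is a hypothetical helper and not part of the gem.

```ruby
# Hypothetical helper, not part of snapsearch-client-ruby: turns the README's
# Array-of-Hashes header format into the single Hash a Rack response uses.
# Assumes every element has the string keys "name" and "value".
def headers_to_rack(header_list)
  header_list.each_with_object({}) do |header, memo|
    memo[header['name']] = header['value']
  end
end

api_headers = [
  { 'name' => 'Content-Type', 'value' => 'text/html' },
  { 'name' => 'Location',     'value' => 'https://example.com/' }
]

status, headers, body = 200, headers_to_rack(api_headers), ['<html></html>']
p [status, headers, body]

# For comparison: `first` yields only the first pair of each Hash.
p api_headers.first.first # prints ["name", "Content-Type"]
```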
data/lib/rack/snap_search/config.rb
CHANGED
@@ -7,7 +7,7 @@ module Rack
 
 ATTRIBUTES = [
 :email, :key, :api_url, :ca_cert_file, :x_forwarded_proto, :parameters, # Client options
-:matched_routes, :ignored_routes, :robots_json, :extensions_json, :
+:matched_routes, :ignored_routes, :robots_json, :extensions_json, :check_file_extensions # Detector options
 ]
 
 attr_accessor *ATTRIBUTES # Setup reader & writer instance methods for each attribute
@@ -25,7 +25,7 @@ module Rack
 # @option options [String] :ignored_routes Blacklisted routes. Should be an Array of Regexp instances.
 # @option options [String] :robots_json A path of the JSON file containing the user agent whitelist & blacklist.
 # @option options [String] :extensions_json A path to the JSON file containing a single Hash with the keys `ignore` and `match`. These keys contain Arrays of Strings (user agents)
-# @option options [String] :
+# @option options [String] :check_file_extensions Set to `true` to ignore direct requests to files.
 # @option options [Proc, #call] :on_exception The block to run when an exception within SnapSearch occurs.
 # @option options [Proc, #call] :before_intercept A block to run before the interception of a bot.
 # @option options [Proc, #call] :after_intercept A block to run after the interception of a bot.
data/lib/snap_search.rb
CHANGED
data/lib/snap_search/client.rb
CHANGED
@@ -1,7 +1,7 @@
 require 'json'
 require 'httpi'
-require 'snap_search/
-require 'snap_search/
+require 'snap_search/connection_error'
+require 'snap_search/validation_error'
 
 module SnapSearch
 
@@ -126,7 +126,7 @@ module SnapSearch
 
 HTTPI.post(request)
 rescue
-raise
+raise ConnectionError
 end
 
 # Retrieve the `content` or raise an error based on the `code` field in the JSON response.
@@ -137,7 +137,7 @@ module SnapSearch
 
 case body['code']
 when 'success' then body['content']
-when 'validation_error' then raise(
+when 'validation_error' then raise( ValidationError, body['content'] )
 else; false # System error on SnapSearch; Nothing we can do # TODO: Raise exception?
 end
 end
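In 1.0.0 the client raises the renamed classes: `ConnectionError` when `HTTPI.post` fails and `ValidationError` when the JSON body has `code: 'validation_error'`. A self-contained sketch of that `case body['code']` dispatch follows; the `ValidationError` class is copied from this diff, while `handle_body` is a hypothetical stand-in for the client's parsing step, whose real name is not visible in the hunk.

```ruby
require 'json'

module SnapSearch
  # Superclass for all SnapSearch errors, as in lib/snap_search/error.rb.
  class Error < StandardError; end

  # As in lib/snap_search/validation_error.rb in this diff.
  class ValidationError < Error
    def initialize(response_content)
      error_messages = response_content.values.collect { |message| " #{message}" }.join("\n")
      super("Validation error from SnapSearch. Check your request parameters:\n#{ error_messages }")
    end
  end
end

# Hypothetical stand-in for the client's response handling:
# returns `content` on success, raises ValidationError on validation errors,
# and returns false for anything else (the `else; false` branch).
def handle_body(json)
  body = JSON.parse(json)

  case body['code']
  when 'success' then body['content']
  when 'validation_error' then raise(SnapSearch::ValidationError, body['content'])
  else false
  end
end

begin
  handle_body('{"code":"validation_error","content":{"url":"The url is required."}}')
rescue SnapSearch::Error => e
  puts e.message
end
```

`raise(SnapSearch::ValidationError, body['content'])` passes the content Hash to `ValidationError.exception`, which forwards it to `initialize`, so the error message is built from the API's per-field messages.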
data/lib/snap_search/{connection_exception.rb → connection_error.rb}
@@ -1,15 +1,15 @@
-require 'snap_search/
-
-module SnapSearch
-
-# Raised when the Client could not connect to the client's `api_url`.
-class
-
-# Create a new
-def initialize
-super('Could not establish a connection to SnapSearch.')
-end
-
-end
-
-end
+require 'snap_search/error'
+
+module SnapSearch
+
+# Raised when the Client could not connect to the client's `api_url`.
+class ConnectionError < Error
+
+# Create a new ConnectionError.
+def initialize
+super('Could not establish a connection to SnapSearch.')
+end
+
+end
+
+end
data/lib/snap_search/detector.rb
CHANGED
@@ -8,7 +8,7 @@ module SnapSearch
 class Detector
 
 attr_reader :matched_routes, :ignored_routes
-attr_reader :
+attr_reader :check_file_extensions
 attr_reader :robots_json, :robots
 attr_reader :extensions_json, :extensions
 
@@ -18,7 +18,7 @@ module SnapSearch
 # @option options [Array<Regexp>] :matched_routes The whitelisted routes.
 # @option options [Array<Regexp>] :ignored_routes The blacklisted routes.
 # @option options [String, #to_s] :robots_json The path of the JSON file containing the user agent whitelist & blacklist.
-# @option options [true, false] :
+# @option options [true, false] :check_file_extensions Set to `true` to ignore direct requests to files.
 # @option options [Rack::Request] :request The Rack request that is to be checked/detected.
 def initialize(options={})
 raise TypeError, 'options must be a Hash or respond to #to_h or #to_hash' unless options.is_a?(Hash) || options.respond_to?(:to_h) || options.respond_to?(:to_hash)
@@ -29,10 +29,10 @@ module SnapSearch
 ignored_routes: [],
 robots_json: SnapSearch.root.join('resources', 'robots.json'),
 extensions_json: SnapSearch.root.join('resources', 'extensions.json'),
-
+check_file_extensions: false
 }.merge(options) # Reverse merge: The hash `merge` is called on is used as the default and the options argument is merged into it
 
-@matched_routes, @ignored_routes, @
+@matched_routes, @ignored_routes, @check_file_extensions = options.values_at(:matched_routes, :ignored_routes, :check_file_extensions)
 
 self.robots_json = @options[:robots_json] # Use the setter method which sets the @robots_json instance variable to the path, then sets @robots to the parsed JSON of the path's contents.
 self.extensions_json = @options[:extensions_json] # Use the setter method which sets the @extensions_json instance variable to the path, then sets @extensions to the parsed JSON of the path's contents.
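The Detector now defaults `check_file_extensions` to `false` through a reverse merge: the defaults Hash is the receiver of `merge`, so any key supplied in `options` wins. A minimal sketch of the pattern with only the route and flag options (the JSON path defaults are omitted because `SnapSearch.root` exists only inside the gem); `detector_settings` is a hypothetical name.

```ruby
# Hypothetical sketch of the Detector's reverse merge of options.
# The defaults Hash is the receiver, so keys passed by the caller win.
def detector_settings(options = {})
  merged = {
    matched_routes: [],
    ignored_routes: [],
    check_file_extensions: false # new default in 1.0.0
  }.merge(options)

  # Same destructuring style as the Detector's `values_at` line.
  merged.values_at(:matched_routes, :ignored_routes, :check_file_extensions)
end

matched_routes, ignored_routes, check_file_extensions =
  detector_settings(ignored_routes: [%r{\A/assets/}], check_file_extensions: true)

p matched_routes         # prints []
p ignored_routes.size    # prints 1
p check_file_extensions  # prints true
```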
data/lib/snap_search/{exception.rb → error.rb}
@@ -1,8 +1,8 @@
-module SnapSearch
-
-# The superclass for all SnapSearch based exceptions.
-class
-
-end
-
-end
+module SnapSearch
+
+# The superclass for all SnapSearch based exceptions.
+class Error < StandardError
+
+end
+
+end
data/lib/snap_search/interceptor.rb
CHANGED
@@ -18,18 +18,25 @@ module SnapSearch
 def intercept(options={})
 encoded_url = @detector.get_encoded_url( options[:request].params, Addressable::URI.parse(options[:request].url) )
 
-
-unless @before_intercept.nil?
-result = @before_intercept.call(encoded_url)
-
-return result.to_hash if !result.nil? && (result.respond_to?(:to_h) || result.respond_to?(:to_hash))
-end
-response = @detector.detect(options) ? @client.request(encoded_url) : false
-
-# call the after response interceptor, and pass in the response Hash (which is always going to be a Hash)
-@after_intercept.call(encoded_url, response) unless @after_intercept.nil?
+detected = @detector.detect(options)
 
-
+if detected
+# all the before interceptor and return an Hash response if it has one
+unless @before_intercept.nil?
+result = @before_intercept.call(encoded_url)
+
+return result.to_hash if !result.nil? && (result.respond_to?(:to_h) || result.respond_to?(:to_hash))
+end
+
+response = @client.request(encoded_url)
+
+# call the after response interceptor, and pass in the response Hash (which is always going to be a Hash)
+@after_intercept.call(encoded_url, response) unless @after_intercept.nil?
+
+response
+else
+false
+end
 end
 
 # Before intercept callback.
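The reworked `intercept` asks the detector first and only runs `before_intercept`, the API request, and `after_intercept` for detected robots; in 0.1.0 `before_intercept` ran on every request. A self-contained sketch of the new control flow, with lambdas standing in for `Detector#detect` and `Client#request` (here keyed on the URL for simplicity; `SketchInterceptor` and its keyword constructor are hypothetical). Unlike the diff, the sketch checks only `respond_to?(:to_hash)`, because `to_hash` is the method it then calls.

```ruby
# Hypothetical, simplified model of the 1.0.0 Interceptor#intercept flow.
# The lambdas stand in for Detector#detect and Client#request.
class SketchInterceptor
  def initialize(detect:, request:, before_intercept: nil, after_intercept: nil)
    @detect = detect
    @request = request
    @before_intercept = before_intercept
    @after_intercept = after_intercept
  end

  def intercept(encoded_url)
    # Not a robot: return false so the application handles the request.
    return false unless @detect.call(encoded_url)

    # A Hash returned by before_intercept (e.g. a client-side cached snapshot)
    # short-circuits the API request.
    unless @before_intercept.nil?
      result = @before_intercept.call(encoded_url)
      return result.to_hash if !result.nil? && result.respond_to?(:to_hash)
    end

    response = @request.call(encoded_url)

    # after_intercept sees the API response, e.g. to store it in a local cache.
    @after_intercept.call(encoded_url, response) unless @after_intercept.nil?

    response
  end
end

api_calls = []
interceptor = SketchInterceptor.new(
  detect: ->(url) { url.include?('_escaped_fragment_') },
  request: ->(url) { api_calls << url; { 'status' => 200, 'html' => '<html></html>' } }
)

interceptor.intercept('http://localhost:9292/')                    # returns false
interceptor.intercept('http://localhost:9292/?_escaped_fragment_') # returns the stubbed response Hash
```

Moving the detection check to the top means the caching callbacks never fire for ordinary browser traffic, which matches how the README positions them (client-side caching of robot snapshots).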
data/lib/snap_search/{validation_exception.rb → validation_error.rb}
@@ -1,17 +1,17 @@
-require 'snap_search/
-
-module SnapSearch
-
-# Raised when the parameters of a request from the Client are not valid.
-class
-
-# Raise a new
-def initialize(response_content)
-error_messages = response_content.values.collect { |message| " #{message}" }.join("\n")
-
-super("Validation error from SnapSearch. Check your request parameters:\n#{ error_messages }")
-end
-
-end
-
-end
+require 'snap_search/error'
+
+module SnapSearch
+
+# Raised when the parameters of a request from the Client are not valid.
+class ValidationError < Error
+
+# Raise a new ValidationError
+def initialize(response_content)
+error_messages = response_content.values.collect { |message| " #{message}" }.join("\n")
+
+super("Validation error from SnapSearch. Check your request parameters:\n#{ error_messages }")
+end
+
+end
+
+end
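After the rename, `ConnectionError` and `ValidationError` both inherit from `SnapSearch::Error < StandardError`, which is what allows the middleware's single `rescue ::SnapSearch::Error => exception`. A self-contained sketch with the classes copied from this diff and a hypothetical `safely` helper mirroring the middleware's rescue: it re-raises when no `on_exception` callback is given and otherwise passes the error to the callback.

```ruby
module SnapSearch
  # As in lib/snap_search/error.rb.
  class Error < StandardError; end

  # As in lib/snap_search/connection_error.rb.
  class ConnectionError < Error
    def initialize
      super('Could not establish a connection to SnapSearch.')
    end
  end

  # As in lib/snap_search/validation_error.rb.
  class ValidationError < Error
    def initialize(response_content)
      error_messages = response_content.values.collect { |message| " #{message}" }.join("\n")
      super("Validation error from SnapSearch. Check your request parameters:\n#{ error_messages }")
    end
  end
end

# Hypothetical helper mirroring the middleware: re-raise when no on_exception
# callback is configured, otherwise hand the error to the callback.
def safely(on_exception = nil)
  yield
rescue ::SnapSearch::Error => exception
  on_exception.nil? ? raise(exception) : on_exception.call(exception)
end

seen = []
safely(->(e) { seen << e.class }) { raise SnapSearch::ConnectionError }
safely(->(e) { seen << e.class }) { raise SnapSearch::ValidationError.new({ 'url' => 'is invalid' }) }
p seen # prints [SnapSearch::ConnectionError, SnapSearch::ValidationError]
```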
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: snapsearch-client-ruby
 version: !ruby/object:Gem::Version
-version:
+version: 1.0.0
 platform: ruby
 authors:
 - Ryan Scott Lewis
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-03-
+date: 2014-03-14 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: version
@@ -153,11 +153,11 @@ files:
 - lib/rack/snap_search/config.rb
 - lib/rack/snap_search.rb
 - lib/snap_search/client.rb
-- lib/snap_search/
+- lib/snap_search/connection_error.rb
 - lib/snap_search/detector.rb
-- lib/snap_search/
+- lib/snap_search/error.rb
 - lib/snap_search/interceptor.rb
-- lib/snap_search/
+- lib/snap_search/validation_error.rb
 - lib/snap_search.rb
 - examples/rack/config.ru
 - examples/rack/Gemfile