browser 2.7.1 → 3.0.0
- checksums.yaml +4 -4
- data/.rubocop.yml +1 -1
- data/CHANGELOG.md +11 -0
- data/README.md +33 -13
- data/Rakefile +8 -1
- data/bot_exceptions.yml +1 -0
- data/bots.yml +3 -1
- data/browser.gemspec +1 -1
- data/lib/browser.rb +2 -2
- data/lib/browser/aliases.rb +1 -1
- data/lib/browser/base.rb +2 -2
- data/lib/browser/bot.rb +34 -26
- data/lib/browser/bot/empty_user_agent_matcher.rb +11 -0
- data/lib/browser/bot/keyword_matcher.rb +11 -0
- data/lib/browser/bot/known_bots_matcher.rb +11 -0
- data/lib/browser/browser.rb +40 -37
- data/lib/browser/chrome.rb +1 -1
- data/lib/browser/device.rb +25 -22
- data/lib/browser/meta.rb +13 -13
- data/lib/browser/middleware.rb +1 -1
- data/lib/browser/middleware/context/additions.rb +1 -1
- data/lib/browser/platform.rb +16 -13
- data/lib/browser/qq.rb +1 -1
- data/lib/browser/rails.rb +3 -3
- data/lib/browser/sputnik.rb +1 -4
- data/lib/browser/version.rb +1 -1
- data/test/browser_test.rb +0 -4
- data/test/ua.yml +3 -2
- data/test/ua_bots.yml +2 -0
- data/test/ua_search_engines.yml +1 -0
- data/test/unit/accept_language_test.rb +6 -0
- data/test/unit/blackberry_test.rb +0 -6
- data/test/unit/bots_test.rb +26 -39
- data/test/unit/chrome_test.rb +8 -3
- data/test/unit/edge_test.rb +0 -4
- data/test/unit/firefox_test.rb +0 -3
- data/test/unit/internet_explorer_test.rb +0 -12
- data/test/unit/ios_test.rb +0 -5
- data/test/unit/kindle_test.rb +0 -2
- data/test/unit/meta_test.rb +0 -1
- data/test/unit/opera_test.rb +0 -2
- data/test/unit/sputnik_test.rb +1 -0
- metadata +9 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 73c5a3410e573b333ce54118589c25ca868b6daa5df15e2e4962da120521139b
+  data.tar.gz: 7d47dbef918c0ec0176c59d81d592285e273d259769a80fc4edcafc335f69c9d
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ad5612535803d139af879d09312cdba4c77bc4c1351a483ed16fc677f4372177da92afeaea6b6fb87912e1cd9ae9450c49d97bc58cd494522ee9b206411e3b49
+  data.tar.gz: f1f8223336528e77c863843b918bd303be591e57b0b26c09bef24fe238406e3896637238b28b573792b8b8f25b20d1a4dbe3369dd30dd9e9671b8e702817b00e
data/.rubocop.yml
CHANGED
data/CHANGELOG.md
CHANGED
@@ -2,6 +2,17 @@
 
 ## Unreleased
 
+- Add ArchiveTeam's ArchiveBot to the bot list.
+- Fix QQ Browser detection.
+- Update modern rules.
+- You can now define new bot matchers by adding a callable object to `Browser::Bot.matchers`.
+- Fix `browser.yandex?` and `browser.sputnik?`.
+- [BREAKING CHANGE] Removed methods to enable the bot's empty user agent detection (`Browser::Bot.detect_empty_ua!` and `Browser::Bot.detect_empty_ua?`).
+- [BREAKING CHANGE] Bot detection is now more aggressive by default. It matches empty user agents, anything that matches `crawl|fetch|search|monitoring|spider|bot`, and anything listed under https://github.com/fnando/browser/blob/master/bots.yml.
+- Add Jaunt to the bot list.
+
+## 2.7.1
+
 - Handle Snapchat user agents that have a space or an empty string instead of a slash before the version.
 - Fix iOS 10+ version detection.
 - Add fallback versions for instagram and snapchat to avoid NoMethodErrors on unexpected user agents.
data/README.md
CHANGED
@@ -58,7 +58,8 @@ browser.sputnik?
 browser.bot.name
 browser.bot.search_engine?
 browser.bot?
-
+browser.bot.why? # shows which matcher detected this user agent as a bot.
+Browser::Bot.why?(ua)
 
 # Get device info
 browser.device
@@ -146,21 +147,21 @@ browser.mobile? #=> false
 
 ### What defines a modern browser?
 
-The current rules that define a modern browser are pretty loose
+The current rules that define a modern browser are pretty loose.
 
-*
-*
-*
-*
-*
-* Opera
+* Chrome 65+
+* Safari 10+
+* Firefox 52+
+* IE11+
+* Microsoft Edge 39+
+* Opera 50+
 
 You can define your own rules. A rule must be a proc/lambda or any object that implements the method === and accepts the browser object. To redefine all rules, clear the existing rules before adding your own.
 
 ```ruby
-# Only Chrome
+# Only Google Chrome 79+ is considered modern.
 Browser.modern_rules.clear
-Browser.modern_rules << -> b { b.chrome? && b.version.to_i >=
+Browser.modern_rules << -> b { b.chrome? && b.version.to_i >= 79 }
 ```
 
 ### Rails integration
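The rule contract described in this README hunk (a proc/lambda, or any object that implements `===` and accepts the browser) can be illustrated without the gem. This is a minimal sketch, not the gem's implementation: `FakeBrowser`, `MinVersionRule`, `RULES`, and the `modern?` helper are hypothetical names, and the "modern when any rule matches" semantics is an assumption.

```ruby
# Hypothetical stand-in for the gem's browser object.
FakeBrowser = Struct.new(:name, :version) do
  def chrome?
    name == :chrome
  end
end

# A rule can be a lambda: Proc#=== invokes the lambda with its argument.
chrome_rule = ->(b) { b.chrome? && b.version.to_i >= 79 }

# A rule can also be any object that implements === and accepts the browser.
class MinVersionRule
  def initialize(name, min_version)
    @name = name
    @min_version = min_version
  end

  def ===(browser)
    browser.name == @name && browser.version.to_i >= @min_version
  end
end

RULES = [chrome_rule, MinVersionRule.new(:firefox, 52)].freeze

# Assumed semantics: a browser is modern when at least one rule matches.
def modern?(browser, rules = RULES)
  rules.any? { |rule| rule === browser }
end
```

Because both kinds of rule respond to `===`, the checking code does not need to know which kind it is calling.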
@@ -213,7 +214,7 @@ language.name
 #=> "English/United States"
 ```
 
-Result is always sorted in quality order from highest
+Result is always sorted in quality order from highest to lowest. As per the HTTP spec:
 
 - omitting the quality value implies 1.0.
 - quality value equal to zero means that is not accepted by the client.
@@ -266,10 +267,29 @@ browser.platform.ios?
 
 ### Bots
 
-
+The bot detection is quite aggressive. Anything that matches at least one of the following requirements will be considered a bot.
+
+- Empty user agent string
+- User agent that matches `/crawl|fetch|search|monitoring|spider|bot/`
+- Any known bot listed under [bots.yml](https://github.com/fnando/browser/blob/master/bots.yml)
+
+To add custom matchers, you can add a callable object to `Browser::Bot.matchers`. The following example matches everything that has a `externalhit` substring on it. The bot name will always be `General Bot`.
+
+```ruby
+Browser::Bot.matchers << ->(ua, _browser) { ua =~ /externalhit/i }
+```
+
+To clear all matchers, including the ones that are bundled, use `Browser::Bot.matchers.clear`. You can re-add built-in matchers by doing the following:
+
+```ruby
+Browser::Bot.matchers += Browser::Bot.default_matchers
+```
+
+To restore v2's bot detection, remove the following matchers:
 
 ```ruby
-Browser::Bot.
+Browser::Bot.matchers.delete(Browser::Bot::KeywordMatcher)
+Browser::Bot.matchers.delete(Browser::Bot::EmptyUserAgentMatcher)
 ```
 
 ### Middleware
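The matcher list described in the Bots section of this README hunk can be modeled in a few lines. This is a minimal sketch under stated assumptions: `BotSketch` and its constants are hypothetical stand-ins for `Browser::Bot` and its bundled matchers, the one-entry `KNOWN_BOTS` hash stands in for `bots.yml`, and the browser argument is passed as `nil` because the sketch never uses it.

```ruby
# Hypothetical stand-in for Browser::Bot; only the matcher list is modeled.
class BotSketch
  KNOWN_BOTS = { "jaunt" => "Jaunt - Java Web Scraping & JSON Querying" }.freeze

  # Each matcher is a callable that takes (ua, browser) and returns truthy/falsy.
  EMPTY_UA = ->(ua, _browser) { ua.empty? }
  KNOWN    = ->(ua, _browser) { KNOWN_BOTS.any? { |key, _| ua.include?(key) } }
  KEYWORD  = ->(ua, _browser) { ua.match?(/crawl|fetch|search|monitoring|spider|bot/) }

  # Returns a fresh array each time, so clearing `matchers` never empties the defaults.
  def self.default_matchers
    [EMPTY_UA, KNOWN, KEYWORD]
  end

  def self.matchers
    @matchers ||= default_matchers
  end

  # A user agent is a bot when at least one matcher accepts it.
  def self.bot?(ua)
    ua = ua.downcase.strip
    matchers.any? { |matcher| matcher.call(ua, nil) }
  end
end

# Add a custom matcher, mirroring the README example.
BotSketch.matchers << ->(ua, _browser) { ua.match?(/externalhit/i) }
```

In this sketch, `BotSketch.matchers.clear` removes every matcher, `BotSketch.matchers.concat(BotSketch.default_matchers)` restores the bundled ones, and deleting `KEYWORD` and `EMPTY_UA` leaves only the known-bots lookup. `concat` is used here because it mutates the existing array; the `+=` form shown in the README would appear to need a `matchers=` writer, which the `bot.rb` in this diff does not define.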
data/Rakefile
CHANGED
@@ -16,7 +16,14 @@ end
 require "rubocop/rake_task"
 desc "Run rubocop"
 task :rubocop do
-  RuboCop::RakeTask.new
+  RuboCop::RakeTask.new do |t|
+    t.options += %w[
+      --display-style-guide
+      --display-cop-names
+      --extra-details
+      --auto-correct
+    ]
+  end
 end
 
 desc "Run specs against all gemfiles"
data/bot_exceptions.yml
CHANGED
data/bots.yml
CHANGED
@@ -1,3 +1,4 @@
+---
 200pleasebot: "200PleaseBot"
 360spider: "360Spider"
 abot: "CrawlDaddy, abot"
@@ -16,6 +17,7 @@ apis-google: APIs-Google
 appengine-google: "Google App Engine"
 applebot: "Apple Bot"
 archive.org_bot: "Internet Archive (archive.org)"
+archiveteam archivebot: "ArchiveTeam ArchiveBot"
 ask jeeves: "Ask Jeeves"
 asynchttpclient: "Java http and WebSocket client library"
 awe.sm: "Awe.sm URL expander"
@@ -109,6 +111,7 @@ insieve: "Insieve Bot"
 insitesbot: "Insitesbot"
 instapaper: "Instapaper"
 istellabot: "IstellaBot"
+jaunt: "Jaunt - Java Web Scraping & JSON Querying"
 jetslide: "Jetslide"
 jobseeker: "jobseeker.com.au/bot.html"
 jooble: "Jooble"
@@ -220,7 +223,6 @@ snapchat: "Snapchat"
 socialrank: "SocialRankIOBot"
 sogou: "Chinese search engine"
 spbot: "OpenLinkProfiler"
-spider: "generic web spider"
 spinn3r: "Spinn3r aggregator"
 sputnikbot: "SputnikBot"
 squider: "Squider"
data/browser.gemspec
CHANGED
@@ -29,6 +29,6 @@ Gem::Specification.new do |s|
   s.add_development_dependency "rails"
   s.add_development_dependency "rake"
   s.add_development_dependency "rubocop"
-  s.add_development_dependency "rubocop-fnando"
+  s.add_development_dependency "rubocop-fnando", "~> 0.0.3"
   s.add_development_dependency "simplecov"
 end
data/lib/browser.rb
CHANGED
data/lib/browser/aliases.rb
CHANGED
data/lib/browser/base.rb
CHANGED
@@ -151,12 +151,12 @@ module Browser
 
     # Detect if browser is Sputnik.
     def sputnik?(expected_version = nil)
-      Sputnik.new(ua) && detect_version?(full_version, expected_version)
+      Sputnik.new(ua).match? && detect_version?(full_version, expected_version)
     end
 
     # Detect if browser is Yandex.
     def yandex?(expected_version = nil)
-      Yandex.new(ua) && detect_version?(full_version, expected_version)
+      Yandex.new(ua).match? && detect_version?(full_version, expected_version)
     end
     alias_method :yandex_browser?, :yandex?
 
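The one-call change in this hunk matters because every Ruby object other than `nil` and `false` is truthy, so a freshly built instance on the left of `&&` can never make the expression false. A minimal sketch of the difference, using a hypothetical `SputnikSketch` class whose regular expression is an assumption for illustration only, not the gem's pattern:

```ruby
# Hypothetical stand-in for a per-browser matcher class.
class SputnikSketch
  def initialize(ua)
    @ua = ua
  end

  # Assumed pattern, for illustration only.
  def match?
    @ua.match?(/SputnikBrowser/)
  end
end

# Old shape: the instance itself is the condition, so this is true for any ua.
def old_check(ua)
  SputnikSketch.new(ua) ? true : false
end

# New shape: the condition is the result of match?.
def new_check(ua)
  SputnikSketch.new(ua).match?
end
```

With a plain Chrome user agent, `old_check` still returns `true`, while `new_check` returns `false`.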
data/lib/browser/bot.rb
CHANGED
@@ -2,68 +2,76 @@
 
 module Browser
   class Bot
-
-
+    GENERIC_NAME = "Generic Bot"
+
+    def self.matchers
+      @matchers ||= default_matchers
+    end
+
+    def self.default_matchers
+      [
+        EmptyUserAgentMatcher,
+        KnownBotsMatcher,
+        KeywordMatcher
+      ]
     end
 
-    def self.
-
+    def self.load_yaml(path)
+      YAML.load_file(Browser.root.join(path))
     end
 
     def self.bots
-      @bots ||=
+      @bots ||= load_yaml("bots.yml")
     end
 
     def self.bot_exceptions
-      @bot_exceptions ||=
-        .load_file(Browser.root.join("bot_exceptions.yml"))
+      @bot_exceptions ||= load_yaml("bot_exceptions.yml")
     end
 
     def self.search_engines
-      @search_engines ||=
-        .load_file(Browser.root.join("search_engines.yml"))
+      @search_engines ||= load_yaml("search_engines.yml")
     end
 
     def self.why?(ua)
-
-
+      ua = ua.downcase.strip
+      browser = Browser.new(ua)
+      matchers.find {|matcher| matcher.call(ua, browser) }
     end
 
-    attr_reader :ua
+    attr_reader :ua, :browser
 
     def initialize(ua)
-      @ua = ua
+      @ua = ua.downcase.strip
+      @browser = Browser.new(@ua)
     end
 
     def bot?
-
+      !bot_exception? && detect_bot?
+    end
+
+    def why?
+      self.class.matchers.find {|matcher| matcher.call(ua, self) }
     end
 
     def search_engine?
-      self.class.search_engines.any? {|key, _|
+      self.class.search_engines.any? {|key, _| ua.include?(key) }
     end
 
     def name
       return unless bot?
-      return "Generic Bot" if bot_with_empty_ua?
-
-      self.class.bots.find {|key, _| downcased_ua.include?(key) }.last
-    end
 
-
-      self.class.detect_empty_ua? && ua.strip == ""
+      self.class.bots.find {|key, _| ua.include?(key) }&.last || GENERIC_NAME
     end
 
     private def bot_exception?
-      self.class.bot_exceptions.any? {|key|
+      self.class.bot_exceptions.any? {|key| ua.include?(key) }
     end
 
     private def detect_bot?
-      self.class.
+      self.class.matchers.any? {|matcher| matcher.call(ua, browser) }
     end
 
-    private
-
-    end
+    private :ua
+    private :browser
   end
 end
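The new `name` lookup in this hunk relies on the safe navigation operator: `find` returns `nil` when no key matches, `&.last` then short-circuits to `nil` instead of raising, and `||` supplies the fallback. A minimal sketch of that pattern, assuming a hypothetical two-entry `BOTS` hash in place of `bots.yml` and a free-standing `bot_name` helper that skips the `bot?` guard:

```ruby
GENERIC_NAME = "Generic Bot"

# Hypothetical subset of the known-bots table (key => display name).
BOTS = {
  "applebot" => "Apple Bot",
  "jaunt" => "Jaunt - Java Web Scraping & JSON Querying"
}.freeze

def bot_name(ua)
  ua = ua.downcase.strip
  # Hash#find yields [key, value] pairs and returns the first matching pair or nil.
  BOTS.find { |key, _| ua.include?(key) }&.last || GENERIC_NAME
end
```

This is why a user agent caught only by the keyword or empty-string matcher still gets a name. Note that the README hunk in this diff calls the fallback `General Bot`, while the constant added here is `"Generic Bot"`.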
data/lib/browser/browser.rb
CHANGED
@@ -4,39 +4,42 @@ require "set"
 require "yaml"
 require "pathname"
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+require_relative "version"
+require_relative "detect_version"
+require_relative "accept_language"
+require_relative "base"
+require_relative "safari"
+require_relative "chrome"
+require_relative "internet_explorer"
+require_relative "firefox"
+require_relative "edge"
+require_relative "opera"
+require_relative "blackberry"
+require_relative "generic"
+require_relative "phantom_js"
+require_relative "uc_browser"
+require_relative "nokia"
+require_relative "micro_messenger"
+require_relative "weibo"
+require_relative "qq"
+require_relative "alipay"
+require_relative "electron"
+require_relative "facebook"
+require_relative "otter"
+require_relative "instagram"
+require_relative "yandex"
+require_relative "sputnik"
+require_relative "snapchat"
 
-
-
+require_relative "bot"
+require_relative "bot/empty_user_agent_matcher"
+require_relative "bot/keyword_matcher"
+require_relative "bot/known_bots_matcher"
 
-
-
-
+require_relative "middleware"
+require_relative "platform"
+require_relative "device"
+require_relative "meta"
 
 module Browser
   EMPTY_STRING = ""
@@ -89,12 +92,12 @@ module Browser
   end
 
   modern_rules.tap do |rules|
-    rules << ->(b) { b.
-    rules << ->(b) { b.
-    rules << ->(b) { b.
-    rules << ->(b) { b.
-    rules << ->(b) { b.
-    rules << ->(b) { b.
+    rules << ->(b) { b.chrome? && b.version.to_i >= 65 }
+    rules << ->(b) { b.safari? && b.version.to_i >= 10 }
+    rules << ->(b) { b.firefox? && b.version.to_i >= 52 }
+    rules << ->(b) { b.ie? && b.version.to_i >= 11 && !b.compatibility_view? }
+    rules << ->(b) { b.edge? && b.version.to_i >= 39 && !b.compatibility_view? }
+    rules << ->(b) { b.opera? && b.version.to_i >= 50 }
   end
 
   def self.new(user_agent, **kwargs)