browser 2.7.1 → 3.0.0
- checksums.yaml +4 -4
- data/.rubocop.yml +1 -1
- data/CHANGELOG.md +11 -0
- data/README.md +33 -13
- data/Rakefile +8 -1
- data/bot_exceptions.yml +1 -0
- data/bots.yml +3 -1
- data/browser.gemspec +1 -1
- data/lib/browser.rb +2 -2
- data/lib/browser/aliases.rb +1 -1
- data/lib/browser/base.rb +2 -2
- data/lib/browser/bot.rb +34 -26
- data/lib/browser/bot/empty_user_agent_matcher.rb +11 -0
- data/lib/browser/bot/keyword_matcher.rb +11 -0
- data/lib/browser/bot/known_bots_matcher.rb +11 -0
- data/lib/browser/browser.rb +40 -37
- data/lib/browser/chrome.rb +1 -1
- data/lib/browser/device.rb +25 -22
- data/lib/browser/meta.rb +13 -13
- data/lib/browser/middleware.rb +1 -1
- data/lib/browser/middleware/context/additions.rb +1 -1
- data/lib/browser/platform.rb +16 -13
- data/lib/browser/qq.rb +1 -1
- data/lib/browser/rails.rb +3 -3
- data/lib/browser/sputnik.rb +1 -4
- data/lib/browser/version.rb +1 -1
- data/test/browser_test.rb +0 -4
- data/test/ua.yml +3 -2
- data/test/ua_bots.yml +2 -0
- data/test/ua_search_engines.yml +1 -0
- data/test/unit/accept_language_test.rb +6 -0
- data/test/unit/blackberry_test.rb +0 -6
- data/test/unit/bots_test.rb +26 -39
- data/test/unit/chrome_test.rb +8 -3
- data/test/unit/edge_test.rb +0 -4
- data/test/unit/firefox_test.rb +0 -3
- data/test/unit/internet_explorer_test.rb +0 -12
- data/test/unit/ios_test.rb +0 -5
- data/test/unit/kindle_test.rb +0 -2
- data/test/unit/meta_test.rb +0 -1
- data/test/unit/opera_test.rb +0 -2
- data/test/unit/sputnik_test.rb +1 -0
- metadata +9 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 73c5a3410e573b333ce54118589c25ca868b6daa5df15e2e4962da120521139b
|
4
|
+
data.tar.gz: 7d47dbef918c0ec0176c59d81d592285e273d259769a80fc4edcafc335f69c9d
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: ad5612535803d139af879d09312cdba4c77bc4c1351a483ed16fc677f4372177da92afeaea6b6fb87912e1cd9ae9450c49d97bc58cd494522ee9b206411e3b49
|
7
|
+
data.tar.gz: f1f8223336528e77c863843b918bd303be591e57b0b26c09bef24fe238406e3896637238b28b573792b8b8f25b20d1a4dbe3369dd30dd9e9671b8e702817b00e
|
data/.rubocop.yml
CHANGED
data/CHANGELOG.md
CHANGED
@@ -2,6 +2,17 @@
|
|
2
2
|
|
3
3
|
## Unreleased
|
4
4
|
|
5
|
+
- Add ArchiveTeam's ArchiveBot to the bot list.
|
6
|
+
- Fix QQ Browser detection.
|
7
|
+
- Update modern rules.
|
8
|
+
- You can now define new bot matchers by adding a callable object to `Browser::Bot.matchers`.
|
9
|
+
- Fix `browser.yandex?` and `browser.sputnik?`.
|
10
|
+
- [BREAKING CHANGE] Removed methods to enable the bot's empty user agent detection (`Browser::Bot.detect_empty_ua!` and `Browser::Bot.detect_empty_ua?`).
|
11
|
+
- [BREAKING CHANGE] Bot detection is now more aggressive by default. It matches empty user agents, anything that matches `crawl|fetch|search|monitoring|spider|bot`, and anything listed under https://github.com/fnando/browser/blob/master/bots.yml.
|
12
|
+
- Add Jaunt to the bot list.
|
13
|
+
|
14
|
+
## 2.7.1
|
15
|
+
|
5
16
|
- Handle Snapchat user agents that have a space or an empty string instead of a slash before the version.
|
6
17
|
- Fix iOS 10+ version detection.
|
7
18
|
- Add fallback versions for instagram and snapchat to avoid NoMethodErrors on unexpected user agents.
|
data/README.md
CHANGED
@@ -58,7 +58,8 @@ browser.sputnik?
|
|
58
58
|
browser.bot.name
|
59
59
|
browser.bot.search_engine?
|
60
60
|
browser.bot?
|
61
|
-
|
61
|
+
browser.bot.why? # shows which matcher detected this user agent as a bot.
|
62
|
+
Browser::Bot.why?(ua)
|
62
63
|
|
63
64
|
# Get device info
|
64
65
|
browser.device
|
@@ -146,21 +147,21 @@ browser.mobile? #=> false
|
|
146
147
|
|
147
148
|
### What defines a modern browser?
|
148
149
|
|
149
|
-
The current rules that define a modern browser are pretty loose
|
150
|
+
The current rules that define a modern browser are pretty loose.
|
150
151
|
|
151
|
-
*
|
152
|
-
*
|
153
|
-
*
|
154
|
-
*
|
155
|
-
*
|
156
|
-
* Opera
|
152
|
+
* Chrome 65+
|
153
|
+
* Safari 10+
|
154
|
+
* Firefox 52+
|
155
|
+
* IE11+
|
156
|
+
* Microsoft Edge 39+
|
157
|
+
* Opera 50+
|
157
158
|
|
158
159
|
You can define your own rules. A rule must be a proc/lambda or any object that implements the method === and accepts the browser object. To redefine all rules, clear the existing rules before adding your own.
|
159
160
|
|
160
161
|
```ruby
|
161
|
-
# Only Chrome
|
162
|
+
# Only Google Chrome 79+ is considered modern.
|
162
163
|
Browser.modern_rules.clear
|
163
|
-
Browser.modern_rules << -> b { b.chrome? && b.version.to_i >=
|
164
|
+
Browser.modern_rules << -> b { b.chrome? && b.version.to_i >= 79 }
|
164
165
|
```
|
165
166
|
|
166
167
|
### Rails integration
|
@@ -213,7 +214,7 @@ language.name
|
|
213
214
|
#=> "English/United States"
|
214
215
|
```
|
215
216
|
|
216
|
-
Result is always sorted in quality order from highest
|
217
|
+
Result is always sorted in quality order from highest to lowest. As per the HTTP spec:
|
217
218
|
|
218
219
|
- omitting the quality value implies 1.0.
|
219
220
|
- a quality value equal to zero means that the language is not accepted by the client.
|
@@ -266,10 +267,29 @@ browser.platform.ios?
|
|
266
267
|
|
267
268
|
### Bots
|
268
269
|
|
269
|
-
|
270
|
+
The bot detection is quite aggressive. Anything that matches at least one of the following requirements will be considered a bot.
|
271
|
+
|
272
|
+
- Empty user agent string
|
273
|
+
- User agent that matches `/crawl|fetch|search|monitoring|spider|bot/`
|
274
|
+
- Any known bot listed under [bots.yml](https://github.com/fnando/browser/blob/master/bots.yml)
|
275
|
+
|
276
|
+
To add custom matchers, you can add a callable object to `Browser::Bot.matchers`. The following example matches every user agent that contains an `externalhit` substring. For user agents that are not listed in `bots.yml`, the bot name will be `Generic Bot`.
|
277
|
+
|
278
|
+
```ruby
|
279
|
+
Browser::Bot.matchers << ->(ua, _browser) { ua =~ /externalhit/i }
|
280
|
+
```
|
281
|
+
|
282
|
+
To clear all matchers, including the ones that are bundled, use `Browser::Bot.matchers.clear`. You can re-add built-in matchers by doing the following:
|
283
|
+
|
284
|
+
```ruby
|
285
|
+
Browser::Bot.matchers += Browser::Bot.default_matchers
|
286
|
+
```
|
287
|
+
|
288
|
+
To restore v2's bot detection, remove the following matchers:
|
270
289
|
|
271
290
|
```ruby
|
272
|
-
Browser::Bot.
|
291
|
+
Browser::Bot.matchers.delete(Browser::Bot::KeywordMatcher)
|
292
|
+
Browser::Bot.matchers.delete(Browser::Bot::EmptyUserAgentMatcher)
|
273
293
|
```
|
274
294
|
|
275
295
|
### Middleware
|
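The matcher contract described in the Bots section above (a list of callables, each taking the user agent and a browser object) can be sketched without the gem. This is a minimal stand-alone illustration, not the gem's implementation: `SketchBot`, `EmptyUA`, `Keyword` and `external_hit` are hypothetical names, and only the keyword regex and the `(ua, browser)` call signature are taken from the README text.

```ruby
# Stand-alone sketch of a matcher chain. SketchBot is a hypothetical
# stand-in for illustration; it is NOT Browser::Bot.
module SketchBot
  # Keyword list as quoted in the README.
  KEYWORDS = /crawl|fetch|search|monitoring|spider|bot/

  # Each matcher is a callable that receives (ua, browser).
  EmptyUA = ->(ua, _browser) { ua.strip.empty? }
  Keyword = ->(ua, _browser) { ua.match?(KEYWORDS) }

  # A fresh array on every call, so the defaults can be re-added later.
  def self.default_matchers
    [EmptyUA, Keyword]
  end

  # Memoized, mutable list that callers may append to or clear.
  def self.matchers
    @matchers ||= default_matchers
  end

  # True when at least one matcher accepts the normalized user agent.
  def self.bot?(ua)
    ua = ua.downcase.strip
    matchers.any? { |matcher| matcher.call(ua, nil) }
  end

  # Returns the first matcher that accepted the user agent, or nil.
  def self.why?(ua)
    ua = ua.downcase.strip
    matchers.find { |matcher| matcher.call(ua, nil) }
  end
end

# Registering a custom matcher, mirroring the README example.
external_hit = ->(ua, _browser) { ua.match?(/externalhit/i) }
SketchBot.matchers << external_hit
```

Because the list is an ordinary array, removing a matcher with `delete` or emptying it with `clear` behaves as it would for any Ruby array, which is why restoring a narrower detection only requires deleting entries.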
data/Rakefile
CHANGED
@@ -16,7 +16,14 @@ end
|
|
16
16
|
require "rubocop/rake_task"
|
17
17
|
desc "Run rubocop"
|
18
18
|
task :rubocop do
|
19
|
-
RuboCop::RakeTask.new
|
19
|
+
RuboCop::RakeTask.new do |t|
|
20
|
+
t.options += %w[
|
21
|
+
--display-style-guide
|
22
|
+
--display-cop-names
|
23
|
+
--extra-details
|
24
|
+
--auto-correct
|
25
|
+
]
|
26
|
+
end
|
20
27
|
end
|
21
28
|
|
22
29
|
desc "Run specs against all gemfiles"
|
data/bot_exceptions.yml
CHANGED
data/bots.yml
CHANGED
@@ -1,3 +1,4 @@
|
|
1
|
+
---
|
1
2
|
200pleasebot: "200PleaseBot"
|
2
3
|
360spider: "360Spider"
|
3
4
|
abot: "CrawlDaddy, abot"
|
@@ -16,6 +17,7 @@ apis-google: APIs-Google
|
|
16
17
|
appengine-google: "Google App Engine"
|
17
18
|
applebot: "Apple Bot"
|
18
19
|
archive.org_bot: "Internet Archive (archive.org)"
|
20
|
+
archiveteam archivebot: "ArchiveTeam ArchiveBot"
|
19
21
|
ask jeeves: "Ask Jeeves"
|
20
22
|
asynchttpclient: "Java http and WebSocket client library"
|
21
23
|
awe.sm: "Awe.sm URL expander"
|
@@ -109,6 +111,7 @@ insieve: "Insieve Bot"
|
|
109
111
|
insitesbot: "Insitesbot"
|
110
112
|
instapaper: "Instapaper"
|
111
113
|
istellabot: "IstellaBot"
|
114
|
+
jaunt: "Jaunt - Java Web Scraping & JSON Querying"
|
112
115
|
jetslide: "Jetslide"
|
113
116
|
jobseeker: "jobseeker.com.au/bot.html"
|
114
117
|
jooble: "Jooble"
|
@@ -220,7 +223,6 @@ snapchat: "Snapchat"
|
|
220
223
|
socialrank: "SocialRankIOBot"
|
221
224
|
sogou: "Chinese search engine"
|
222
225
|
spbot: "OpenLinkProfiler"
|
223
|
-
spider: "generic web spider"
|
224
226
|
spinn3r: "Spinn3r aggregator"
|
225
227
|
sputnikbot: "SputnikBot"
|
226
228
|
squider: "Squider"
|
data/browser.gemspec
CHANGED
@@ -29,6 +29,6 @@ Gem::Specification.new do |s|
|
|
29
29
|
s.add_development_dependency "rails"
|
30
30
|
s.add_development_dependency "rake"
|
31
31
|
s.add_development_dependency "rubocop"
|
32
|
-
s.add_development_dependency "rubocop-fnando"
|
32
|
+
s.add_development_dependency "rubocop-fnando", "~> 0.0.3"
|
33
33
|
s.add_development_dependency "simplecov"
|
34
34
|
end
|
data/lib/browser.rb
CHANGED
data/lib/browser/aliases.rb
CHANGED
data/lib/browser/base.rb
CHANGED
@@ -151,12 +151,12 @@ module Browser
|
|
151
151
|
|
152
152
|
# Detect if browser is Sputnik.
|
153
153
|
def sputnik?(expected_version = nil)
|
154
|
-
Sputnik.new(ua) && detect_version?(full_version, expected_version)
|
154
|
+
Sputnik.new(ua).match? && detect_version?(full_version, expected_version)
|
155
155
|
end
|
156
156
|
|
157
157
|
# Detect if browser is Yandex.
|
158
158
|
def yandex?(expected_version = nil)
|
159
|
-
Yandex.new(ua) && detect_version?(full_version, expected_version)
|
159
|
+
Yandex.new(ua).match? && detect_version?(full_version, expected_version)
|
160
160
|
end
|
161
161
|
alias_method :yandex_browser?, :yandex?
|
162
162
|
|
data/lib/browser/bot.rb
CHANGED
@@ -2,68 +2,76 @@
|
|
2
2
|
|
3
3
|
module Browser
|
4
4
|
class Bot
|
5
|
-
|
6
|
-
|
5
|
+
GENERIC_NAME = "Generic Bot"
|
6
|
+
|
7
|
+
def self.matchers
|
8
|
+
@matchers ||= default_matchers
|
9
|
+
end
|
10
|
+
|
11
|
+
def self.default_matchers
|
12
|
+
[
|
13
|
+
EmptyUserAgentMatcher,
|
14
|
+
KnownBotsMatcher,
|
15
|
+
KeywordMatcher
|
16
|
+
]
|
7
17
|
end
|
8
18
|
|
9
|
-
def self.
|
10
|
-
|
19
|
+
def self.load_yaml(path)
|
20
|
+
YAML.load_file(Browser.root.join(path))
|
11
21
|
end
|
12
22
|
|
13
23
|
def self.bots
|
14
|
-
@bots ||=
|
24
|
+
@bots ||= load_yaml("bots.yml")
|
15
25
|
end
|
16
26
|
|
17
27
|
def self.bot_exceptions
|
18
|
-
@bot_exceptions ||=
|
19
|
-
.load_file(Browser.root.join("bot_exceptions.yml"))
|
28
|
+
@bot_exceptions ||= load_yaml("bot_exceptions.yml")
|
20
29
|
end
|
21
30
|
|
22
31
|
def self.search_engines
|
23
|
-
@search_engines ||=
|
24
|
-
.load_file(Browser.root.join("search_engines.yml"))
|
32
|
+
@search_engines ||= load_yaml("search_engines.yml")
|
25
33
|
end
|
26
34
|
|
27
35
|
def self.why?(ua)
|
28
|
-
|
29
|
-
|
36
|
+
ua = ua.downcase.strip
|
37
|
+
browser = Browser.new(ua)
|
38
|
+
matchers.find {|matcher| matcher.call(ua, browser) }
|
30
39
|
end
|
31
40
|
|
32
|
-
attr_reader :ua
|
41
|
+
attr_reader :ua, :browser
|
33
42
|
|
34
43
|
def initialize(ua)
|
35
|
-
@ua = ua
|
44
|
+
@ua = ua.downcase.strip
|
45
|
+
@browser = Browser.new(@ua)
|
36
46
|
end
|
37
47
|
|
38
48
|
def bot?
|
39
|
-
|
49
|
+
!bot_exception? && detect_bot?
|
50
|
+
end
|
51
|
+
|
52
|
+
def why?
|
53
|
+
self.class.matchers.find {|matcher| matcher.call(ua, self) }
|
40
54
|
end
|
41
55
|
|
42
56
|
def search_engine?
|
43
|
-
self.class.search_engines.any? {|key, _|
|
57
|
+
self.class.search_engines.any? {|key, _| ua.include?(key) }
|
44
58
|
end
|
45
59
|
|
46
60
|
def name
|
47
61
|
return unless bot?
|
48
|
-
return "Generic Bot" if bot_with_empty_ua?
|
49
|
-
|
50
|
-
self.class.bots.find {|key, _| downcased_ua.include?(key) }.last
|
51
|
-
end
|
52
62
|
|
53
|
-
|
54
|
-
self.class.detect_empty_ua? && ua.strip == ""
|
63
|
+
self.class.bots.find {|key, _| ua.include?(key) }&.last || GENERIC_NAME
|
55
64
|
end
|
56
65
|
|
57
66
|
private def bot_exception?
|
58
|
-
self.class.bot_exceptions.any? {|key|
|
67
|
+
self.class.bot_exceptions.any? {|key| ua.include?(key) }
|
59
68
|
end
|
60
69
|
|
61
70
|
private def detect_bot?
|
62
|
-
self.class.
|
71
|
+
self.class.matchers.any? {|matcher| matcher.call(ua, browser) }
|
63
72
|
end
|
64
73
|
|
65
|
-
private
|
66
|
-
|
67
|
-
end
|
74
|
+
private :ua
|
75
|
+
private :browser
|
68
76
|
end
|
69
77
|
end
|
data/lib/browser/browser.rb
CHANGED
@@ -4,39 +4,42 @@ require "set"
|
|
4
4
|
require "yaml"
|
5
5
|
require "pathname"
|
6
6
|
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
24
|
-
|
25
|
-
|
26
|
-
|
27
|
-
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
|
32
|
-
|
7
|
+
require_relative "version"
|
8
|
+
require_relative "detect_version"
|
9
|
+
require_relative "accept_language"
|
10
|
+
require_relative "base"
|
11
|
+
require_relative "safari"
|
12
|
+
require_relative "chrome"
|
13
|
+
require_relative "internet_explorer"
|
14
|
+
require_relative "firefox"
|
15
|
+
require_relative "edge"
|
16
|
+
require_relative "opera"
|
17
|
+
require_relative "blackberry"
|
18
|
+
require_relative "generic"
|
19
|
+
require_relative "phantom_js"
|
20
|
+
require_relative "uc_browser"
|
21
|
+
require_relative "nokia"
|
22
|
+
require_relative "micro_messenger"
|
23
|
+
require_relative "weibo"
|
24
|
+
require_relative "qq"
|
25
|
+
require_relative "alipay"
|
26
|
+
require_relative "electron"
|
27
|
+
require_relative "facebook"
|
28
|
+
require_relative "otter"
|
29
|
+
require_relative "instagram"
|
30
|
+
require_relative "yandex"
|
31
|
+
require_relative "sputnik"
|
32
|
+
require_relative "snapchat"
|
33
33
|
|
34
|
-
|
35
|
-
|
34
|
+
require_relative "bot"
|
35
|
+
require_relative "bot/empty_user_agent_matcher"
|
36
|
+
require_relative "bot/keyword_matcher"
|
37
|
+
require_relative "bot/known_bots_matcher"
|
36
38
|
|
37
|
-
|
38
|
-
|
39
|
-
|
39
|
+
require_relative "middleware"
|
40
|
+
require_relative "platform"
|
41
|
+
require_relative "device"
|
42
|
+
require_relative "meta"
|
40
43
|
|
41
44
|
module Browser
|
42
45
|
EMPTY_STRING = ""
|
@@ -89,12 +92,12 @@ module Browser
|
|
89
92
|
end
|
90
93
|
|
91
94
|
modern_rules.tap do |rules|
|
92
|
-
rules << ->(b) { b.
|
93
|
-
rules << ->(b) { b.
|
94
|
-
rules << ->(b) { b.
|
95
|
-
rules << ->(b) { b.
|
96
|
-
rules << ->(b) { b.
|
97
|
-
rules << ->(b) { b.
|
95
|
+
rules << ->(b) { b.chrome? && b.version.to_i >= 65 }
|
96
|
+
rules << ->(b) { b.safari? && b.version.to_i >= 10 }
|
97
|
+
rules << ->(b) { b.firefox? && b.version.to_i >= 52 }
|
98
|
+
rules << ->(b) { b.ie? && b.version.to_i >= 11 && !b.compatibility_view? }
|
99
|
+
rules << ->(b) { b.edge? && b.version.to_i >= 39 && !b.compatibility_view? }
|
100
|
+
rules << ->(b) { b.opera? && b.version.to_i >= 50 }
|
98
101
|
end
|
99
102
|
|
100
103
|
def self.new(user_agent, **kwargs)
|