browser 2.7.1 → 3.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 17e665a446ca2a9578cae49ad272bbbc596c29bf8518643cbcec39f2654e4bc4
- data.tar.gz: 0dd132f6f496e0221417a5b474878823eed6b7e4eac51764f383cd2825cb1969
+ metadata.gz: 73c5a3410e573b333ce54118589c25ca868b6daa5df15e2e4962da120521139b
+ data.tar.gz: 7d47dbef918c0ec0176c59d81d592285e273d259769a80fc4edcafc335f69c9d
  SHA512:
- metadata.gz: 2902444a66316eb79ccb9fd409d58d21ac764b6f22a86bb9a2af74791f3bd8c86301a7c7b8829fe17f14a24c0827b3584d0c0a7813f85c9a4022a179ea67db7a
- data.tar.gz: 1282a961f637a98449d6425272abcd3f005c003908ce37e0954052b3b5fd8bc6bf3d5d75e3104e6d3705823e3abe60f9eb9fca7046bf94f1c16f625cc6a09b07
+ metadata.gz: ad5612535803d139af879d09312cdba4c77bc4c1351a483ed16fc677f4372177da92afeaea6b6fb87912e1cd9ae9450c49d97bc58cd494522ee9b206411e3b49
+ data.tar.gz: f1f8223336528e77c863843b918bd303be591e57b0b26c09bef24fe238406e3896637238b28b573792b8b8f25b20d1a4dbe3369dd30dd9e9671b8e702817b00e
@@ -16,7 +16,7 @@ AllCops:
  Metrics/ClassLength:
  Enabled: false
 
- Metrics/LineLength:
+ Layout/LineLength:
  Max: 80
 
  Metrics/MethodLength:
@@ -2,6 +2,17 @@
 
  ## Unreleased
 
+ - Add ArchiveTeam's ArchiveBot to the bot list.
+ - Fix QQ Browser detection.
+ - Update modern rules.
+ - You can now define new bot matchers by adding a callable object to `Browser::Bot.matchers`.
+ - Fix `browser.yandex?` and `browser.sputnik?`.
+ - [BREAKING CHANGE] Removed methods to enable the bot's empty user agent detection (`Browser::Bot.detect_empty_ua!` and `Browser::Bot.detect_empty_ua?`).
+ - [BREAKING CHANGE] Bot detection is now more aggressive by default. It matches empty user agents, anything that matches `crawl|fetch|search|monitoring|spider|bot`, and anything listed under https://github.com/fnando/browser/blob/master/bots.yml.
+ - Add Jaunt to the bot list.
+
+ ## 2.7.1
+
  - Handle Snapchat user agents that have a space or an empty string instead of a slash before the version.
  - Fix iOS 10+ version detection.
  - Add fallback versions for instagram and snapchat to avoid NoMethodErrors on unexpected user agents.
data/README.md CHANGED
@@ -58,7 +58,8 @@ browser.sputnik?
  browser.bot.name
  browser.bot.search_engine?
  browser.bot?
- Browser::Bot.why?(ua) # shows which user agent was the offender
+ browser.bot.why? # shows which matcher detected this user agent as a bot.
+ Browser::Bot.why?(ua)
 
  # Get device info
  browser.device
@@ -146,21 +147,21 @@ browser.mobile? #=> false
 
  ### What defines a modern browser?
 
- The current rules that define a modern browser are pretty loose:
+ The current rules that define a modern browser are pretty loose.
 
- * Webkit
- * IE9+
- * Microsoft Edge
- * Firefox 17+
- * Firefox Tablet 14+
- * Opera 12+
+ * Chrome 65+
+ * Safari 10+
+ * Firefox 52+
+ * IE11+
+ * Microsoft Edge 39+
+ * Opera 50+
 
  You can define your own rules. A rule must be a proc/lambda or any object that implements the method === and accepts the browser object. To redefine all rules, clear the existing rules before adding your own.
 
  ```ruby
- # Only Chrome Canary is considered modern.
+ # Only Google Chrome 79+ is considered modern.
  Browser.modern_rules.clear
- Browser.modern_rules << -> b { b.chrome? && b.version.to_i >= 37 }
+ Browser.modern_rules << -> b { b.chrome? && b.version.to_i >= 79 }
  ```
 
  ### Rails integration
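The rule contract described in the README hunk above (a proc/lambda, or any object implementing `===`, that receives the browser object) can be sketched without the gem. `FakeBrowser` and `MinimumChrome` are hypothetical stand-ins, only four of the 3.0.0 thresholds are reproduced, and treating a browser as modern when any single rule matches is an assumption about how `Browser.modern_rules` is consumed.

```ruby
# Hypothetical stand-in for the gem's browser object, exposing only the
# predicates and version string that the rules below call.
FakeBrowser = Struct.new(:name, :version) do
  def chrome?
    name == :chrome
  end

  def safari?
    name == :safari
  end

  def firefox?
    name == :firefox
  end

  def opera?
    name == :opera
  end
end

# Four of the 3.0.0 thresholds from the list above, written as lambdas.
# Proc#=== calls the lambda, so each one satisfies the "implements ===" contract.
MODERN_RULES = [
  ->(b) { b.chrome? && b.version.to_i >= 65 },
  ->(b) { b.safari? && b.version.to_i >= 10 },
  ->(b) { b.firefox? && b.version.to_i >= 52 },
  ->(b) { b.opera? && b.version.to_i >= 50 }
].freeze

# Assumption: a browser counts as modern when any single rule matches.
def modern?(browser, rules = MODERN_RULES)
  rules.any? { |rule| rule === browser }
end

# A plain object with #=== works as a rule too, e.g. for a
# "Chrome N+ only" policy like the README's Chrome 79+ example.
class MinimumChrome
  def initialize(version)
    @version = version
  end

  def ===(browser)
    browser.chrome? && browser.version.to_i >= @version
  end
end

puts modern?(FakeBrowser.new(:chrome, "79.0.3945"))                      # true
puts modern?(FakeBrowser.new(:firefox, "45.0"))                          # false
puts modern?(FakeBrowser.new(:chrome, "70.0"), [MinimumChrome.new(79)])  # false
```

Using `===` as the contract is what lets a lambda and a small policy object sit in the same rule list.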
@@ -213,7 +214,7 @@ language.name
  #=> "English/United States"
  ```
 
- Result is always sorted in quality order from highest -> lowest. As per the HTTP spec:
+ Result is always sorted in quality order from highest to lowest. As per the HTTP spec:
 
  - omitting the quality value implies 1.0.
  - quality value equal to zero means that is not accepted by the client.
@@ -266,10 +267,29 @@ browser.platform.ios?
 
  ### Bots
 
- Browser used to detect empty user agents as bots, but this behavior has changed. If you want to bring this detection back, you can activate it through the following call:
+ The bot detection is quite aggressive. Anything that matches at least one of the following requirements will be considered a bot.
+
+ - Empty user agent string
+ - User agent that matches `/crawl|fetch|search|monitoring|spider|bot/`
+ - Any known bot listed under [bots.yml](https://github.com/fnando/browser/blob/master/bots.yml)
+
+ To add custom matchers, you can add a callable object to `Browser::Bot.matchers`. The following example matches everything that has a `externalhit` substring on it. The bot name will always be `General Bot`.
+
+ ```ruby
+ Browser::Bot.matchers << ->(ua, _browser) { ua =~ /externalhit/i }
+ ```
+
+ To clear all matchers, including the ones that are bundled, use `Browser::Bot.matchers.clear`. You can re-add built-in matchers by doing the following:
+
+ ```ruby
+ Browser::Bot.matchers += Browser::Bot.default_matchers
+ ```
+
+ To restore v2's bot detection, remove the following matchers:
 
  ```ruby
- Browser::Bot.detect_empty_ua!
+ Browser::Bot.matchers.delete(Browser::Bot::KeywordMatcher)
+ Browser::Bot.matchers.delete(Browser::Bot::EmptyUserAgentMatcher)
  ```
 
  ### Middleware
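The matcher contract in this hunk is small: any object that responds to `call(ua, browser)` and returns a truthy value marks the user agent as a bot. The standalone model below (`BotRegistry`, its lambdas and the one-entry `KNOWN_BOTS` table are all hypothetical) walks through adding a custom matcher, resetting to the defaults, and removing the keyword and empty-UA matchers to approximate v2 behaviour. It resets with `concat` because the `Bot` class shown later in this diff defines only a `matchers` reader; `Browser::Bot.matchers += ...` would also need a `matchers=` writer, which that class does not appear to define.

```ruby
# Standalone model of the matcher registry; every name here is hypothetical.
module BotRegistry
  # Hypothetical one-entry excerpt standing in for bots.yml.
  KNOWN_BOTS = { "googlebot" => "Google Bot" }.freeze

  # Each matcher only has to respond to call(ua, browser).
  EMPTY_UA = ->(ua, _browser) { ua == "" }
  KNOWN = ->(ua, _browser) { KNOWN_BOTS.any? { |key, _| ua.include?(key) } }
  KEYWORD = ->(ua, _browser) { ua.match?(/crawl|fetch|search|monitoring|spider|bot/) }

  def self.default_matchers
    [EMPTY_UA, KNOWN, KEYWORD]
  end

  # Lazily initialized, mutable list, like Bot.matchers in the diff.
  def self.matchers
    @matchers ||= default_matchers
  end

  # Normalize the way Bot#initialize does (downcase + strip), then ask each matcher.
  def self.bot?(ua)
    normalized = ua.downcase.strip
    matchers.any? { |matcher| matcher.call(normalized, nil) }
  end
end

fb_ua = "facebookexternalhit/1.1"
before_custom = BotRegistry.bot?(fb_ua)

# 1. Add a custom matcher, as in the README's externalhit example.
BotRegistry.matchers << ->(ua, _browser) { ua.match?(/externalhit/) }
after_custom = BotRegistry.bot?(fb_ua)

# 2. Clear everything, then re-add the defaults in place with concat.
BotRegistry.matchers.clear
BotRegistry.matchers.concat(BotRegistry.default_matchers)

# 3. Approximate v2 behaviour: keep only the known-bots matcher.
BotRegistry.matchers.delete(BotRegistry::KEYWORD)
BotRegistry.matchers.delete(BotRegistry::EMPTY_UA)

puts before_custom                       # false
puts after_custom                        # true
puts BotRegistry.bot?("")                # false
puts BotRegistry.bot?("Googlebot/2.1")   # true
puts BotRegistry.bot?("my-crawler/1.0")  # false
```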
data/Rakefile CHANGED
@@ -16,7 +16,14 @@ end
  require "rubocop/rake_task"
  desc "Run rubocop"
  task :rubocop do
- RuboCop::RakeTask.new
+ RuboCop::RakeTask.new do |t|
+ t.options += %w[
+ --display-style-guide
+ --display-cop-names
+ --extra-details
+ --auto-correct
+ ]
+ end
  end
 
  desc "Run specs against all gemfiles"
@@ -1,3 +1,4 @@
+ ---
  - ruby build
  - pinterest/android
  - pinterest/ios
data/bots.yml CHANGED
@@ -1,3 +1,4 @@
+ ---
  200pleasebot: "200PleaseBot"
  360spider: "360Spider"
  abot: "CrawlDaddy, abot"
@@ -16,6 +17,7 @@ apis-google: APIs-Google
  appengine-google: "Google App Engine"
  applebot: "Apple Bot"
  archive.org_bot: "Internet Archive (archive.org)"
+ archiveteam archivebot: "ArchiveTeam ArchiveBot"
  ask jeeves: "Ask Jeeves"
  asynchttpclient: "Java http and WebSocket client library"
  awe.sm: "Awe.sm URL expander"
@@ -109,6 +111,7 @@ insieve: "Insieve Bot"
  insitesbot: "Insitesbot"
  instapaper: "Instapaper"
  istellabot: "IstellaBot"
+ jaunt: "Jaunt - Java Web Scraping & JSON Querying"
  jetslide: "Jetslide"
  jobseeker: "jobseeker.com.au/bot.html"
  jooble: "Jooble"
@@ -220,7 +223,6 @@ snapchat: "Snapchat"
  socialrank: "SocialRankIOBot"
  sogou: "Chinese search engine"
  spbot: "OpenLinkProfiler"
- spider: "generic web spider"
  spinn3r: "Spinn3r aggregator"
  sputnikbot: "SputnikBot"
  squider: "Squider"
@@ -29,6 +29,6 @@ Gem::Specification.new do |s|
  s.add_development_dependency "rails"
  s.add_development_dependency "rake"
  s.add_development_dependency "rubocop"
- s.add_development_dependency "rubocop-fnando"
+ s.add_development_dependency "rubocop-fnando", "~> 0.0.3"
  s.add_development_dependency "simplecov"
  end
@@ -1,4 +1,4 @@
  # frozen_string_literal: true
 
- require "browser/browser"
- require "browser/rails" if defined?(::Rails)
+ require_relative "browser/browser"
+ require_relative "browser/rails" if defined?(::Rails)
@@ -1,6 +1,6 @@
  # frozen_string_literal: true
 
- require "browser"
+ require_relative "../browser"
  require "forwardable"
 
  module Browser
@@ -151,12 +151,12 @@ module Browser
 
  # Detect if browser is Sputnik.
  def sputnik?(expected_version = nil)
- Sputnik.new(ua) && detect_version?(full_version, expected_version)
+ Sputnik.new(ua).match? && detect_version?(full_version, expected_version)
  end
 
  # Detect if browser is Yandex.
  def yandex?(expected_version = nil)
- Yandex.new(ua) && detect_version?(full_version, expected_version)
+ Yandex.new(ua).match? && detect_version?(full_version, expected_version)
  end
  alias_method :yandex_browser?, :yandex?
 
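In Ruby only `nil` and `false` are falsy, so the 2.7.1 form `Sputnik.new(ua) && detect_version?(...)` reduced to its right-hand side for every user agent, because a freshly built object is always truthy; calling `.match?` makes the matcher's answer gate the expression. A minimal sketch with a hypothetical `VendorMatcher` class, where `version_ok = true` stands in for `detect_version?` when no expected version is passed (an assumption about that helper):

```ruby
# Hypothetical stand-in for a per-vendor class such as Sputnik or Yandex.
class VendorMatcher
  def initialize(ua)
    @ua = ua
  end

  def match?
    @ua.include?("YaBrowser")
  end
end

chrome_ua = "Mozilla/5.0 AppleWebKit/537.36 Chrome/79.0 Safari/537.36"

# Assumption: stands in for detect_version?(full_version, nil), taken to be
# true when no expected version is given.
version_ok = true

# 2.7.1 shape: the object itself is truthy, so only version_ok matters.
buggy = VendorMatcher.new(chrome_ua) && version_ok
# 3.0.0 shape: the matcher's answer gates the expression.
fixed = VendorMatcher.new(chrome_ua).match? && version_ok

puts buggy # true, even though this is not a Yandex user agent
puts fixed # false
```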
@@ -2,68 +2,76 @@
 
  module Browser
  class Bot
- def self.detect_empty_ua!
- @detect_empty_ua = true
+ GENERIC_NAME = "Generic Bot"
+
+ def self.matchers
+ @matchers ||= default_matchers
+ end
+
+ def self.default_matchers
+ [
+ EmptyUserAgentMatcher,
+ KnownBotsMatcher,
+ KeywordMatcher
+ ]
  end
 
- def self.detect_empty_ua?
- @detect_empty_ua
+ def self.load_yaml(path)
+ YAML.load_file(Browser.root.join(path))
  end
 
  def self.bots
- @bots ||= YAML.load_file(Browser.root.join("bots.yml"))
+ @bots ||= load_yaml("bots.yml")
  end
 
  def self.bot_exceptions
- @bot_exceptions ||= YAML
- .load_file(Browser.root.join("bot_exceptions.yml"))
+ @bot_exceptions ||= load_yaml("bot_exceptions.yml")
  end
 
  def self.search_engines
- @search_engines ||= YAML
- .load_file(Browser.root.join("search_engines.yml"))
+ @search_engines ||= load_yaml("search_engines.yml")
  end
 
  def self.why?(ua)
- downcased_ua = ua.downcase
- bots.find {|key, _| downcased_ua.include?(key) }
+ ua = ua.downcase.strip
+ browser = Browser.new(ua)
+ matchers.find {|matcher| matcher.call(ua, browser) }
  end
 
- attr_reader :ua
+ attr_reader :ua, :browser
 
  def initialize(ua)
- @ua = ua
+ @ua = ua.downcase.strip
+ @browser = Browser.new(@ua)
  end
 
  def bot?
- bot_with_empty_ua? || (!bot_exception? && detect_bot?)
+ !bot_exception? && detect_bot?
+ end
+
+ def why?
+ self.class.matchers.find {|matcher| matcher.call(ua, self) }
  end
 
  def search_engine?
- self.class.search_engines.any? {|key, _| downcased_ua.include?(key) }
+ self.class.search_engines.any? {|key, _| ua.include?(key) }
  end
 
  def name
  return unless bot?
- return "Generic Bot" if bot_with_empty_ua?
-
- self.class.bots.find {|key, _| downcased_ua.include?(key) }.last
- end
 
- private def bot_with_empty_ua?
- self.class.detect_empty_ua? && ua.strip == ""
+ self.class.bots.find {|key, _| ua.include?(key) }&.last || GENERIC_NAME
  end
 
  private def bot_exception?
- self.class.bot_exceptions.any? {|key| downcased_ua.include?(key) }
+ self.class.bot_exceptions.any? {|key| ua.include?(key) }
  end
 
  private def detect_bot?
- self.class.bots.any? {|key, _| downcased_ua.include?(key) }
+ self.class.matchers.any? {|matcher| matcher.call(ua, browser) }
  end
 
- private def downcased_ua
- @downcased_ua ||= ua.downcase
- end
+ private :ua
+ private :browser
  end
  end
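Read together, the new `Bot#bot?`, `#why?` and `#name` give `bot_exceptions.yml` the final say, report the first matcher that fires, and fall back to `GENERIC_NAME` (`"Generic Bot"`) when no `bots.yml` key matches, as happens for keyword-only or empty user agents. The sketch below restates that flow outside the gem; `BotSketch`, the two-entry `KNOWN_BOTS` table and the one-entry `BOT_EXCEPTIONS` list are hypothetical, and it passes the bot itself to each matcher rather than a `Browser` instance.

```ruby
# Hypothetical data standing in for bots.yml and bot_exceptions.yml.
KNOWN_BOTS = { "googlebot" => "Google Bot", "jaunt" => "Jaunt" }.freeze
BOT_EXCEPTIONS = ["pinterest/ios"].freeze
GENERIC_NAME = "Generic Bot"

KNOWN_MATCHER = ->(ua, _bot) { KNOWN_BOTS.any? { |key, _| ua.include?(key) } }
KEYWORD_MATCHER = ->(ua, _bot) { ua.match?(/crawl|fetch|search|monitoring|spider|bot/) }
MATCHERS = [KNOWN_MATCHER, KEYWORD_MATCHER].freeze

class BotSketch
  attr_reader :ua

  def initialize(ua)
    # Same normalization as Bot#initialize in the diff.
    @ua = ua.downcase.strip
  end

  # Exceptions win over every matcher: !bot_exception? && detect_bot?
  def bot?
    !exception? && MATCHERS.any? { |matcher| matcher.call(ua, self) }
  end

  # The first matcher that fires, or nil when none does.
  def why?
    MATCHERS.find { |matcher| matcher.call(ua, self) }
  end

  # Known-bot name when a key matches, otherwise the generic fallback.
  def name
    return unless bot?

    KNOWN_BOTS.find { |key, _| ua.include?(key) }&.last || GENERIC_NAME
  end

  private

  def exception?
    BOT_EXCEPTIONS.any? { |key| ua.include?(key) }
  end
end

puts BotSketch.new("Mozilla/5.0 (compatible; Googlebot/2.1)").name  # Google Bot
puts BotSketch.new("uptime-monitoring/1.0").name                   # Generic Bot
puts BotSketch.new("Pinterest/iOS bot").bot?                       # false
p BotSketch.new("Mozilla/5.0 Firefox/72.0").name                   # nil
```

The safe-navigation `&.last` is what lets a keyword-only match fall through to the generic name instead of raising `NoMethodError` on `nil`, which the 2.7.1 `.last` call would have done.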
@@ -0,0 +1,11 @@
+ # frozen_string_literal: true
+
+ module Browser
+ class Bot
+ class EmptyUserAgentMatcher
+ def self.call(ua, _browser)
+ ua == ""
+ end
+ end
+ end
+ end
@@ -0,0 +1,11 @@
+ # frozen_string_literal: true
+
+ module Browser
+ class Bot
+ class KeywordMatcher
+ def self.call(ua, _browser)
+ ua =~ /crawl|fetch|search|monitoring|spider|bot/
+ end
+ end
+ end
+ end
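`KeywordMatcher` runs an unanchored alternation against the already downcased and stripped user agent, so any substring hit counts: `search` also fires inside `research`, and `bot` inside `robot`. Its `=~` returns a match index or `nil`, which is enough because the callers (`any?`, `find`) only test truthiness. The sketch below reuses the same pattern with `String#match?` to get a boolean; the sample user agents are hypothetical.

```ruby
# Same alternation as KeywordMatcher; it is unanchored, so substrings count.
KEYWORD_PATTERN = /crawl|fetch|search|monitoring|spider|bot/

# Bot#initialize downcases and strips the UA before any matcher runs;
# match? returns true/false instead of =~'s index-or-nil.
def keyword_bot?(ua)
  ua.downcase.strip.match?(KEYWORD_PATTERN)
end

# Hypothetical user agents and the result the pattern produces for each.
SAMPLES = {
  "Mozilla/5.0 (compatible; Googlebot/2.1)" => true,
  "ResearchPortal/3.2" => true,        # "search" inside "research"
  "Robot Framework/4.0" => true,       # "bot" inside "robot"
  "Mozilla/5.0 Firefox/72.0" => false
}.freeze

SAMPLES.each_key do |ua|
  puts "#{ua.ljust(42)} #{keyword_bot?(ua)}"
end
```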
@@ -0,0 +1,11 @@
+ # frozen_string_literal: true
+
+ module Browser
+ class Bot
+ class KnownBotsMatcher
+ def self.call(ua, _browser)
+ Browser::Bot.bots.any? {|key, _| ua.include?(key) }
+ end
+ end
+ end
+ end
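`KnownBotsMatcher` treats every key in `bots.yml` as a plain substring, so multi-word keys such as `archiveteam archivebot` work as long as the downcased user agent contains them verbatim. Dropping the `spider` key presumably loses little, because `KeywordMatcher` already matches `spider` anywhere in the UA. The sketch inlines three entries from the `bots.yml` hunks above and parses them with `YAML.safe_load` instead of `Browser::Bot.load_yaml`; the `known_bot?` and `known_bot_key` helpers and the sample user agent are hypothetical.

```ruby
require "yaml"

# Three entries copied from the bots.yml hunks above, inlined instead of
# being read from disk.
BOTS_SOURCE = <<~BOTS
  ---
  archiveteam archivebot: "ArchiveTeam ArchiveBot"
  jaunt: "Jaunt - Java Web Scraping & JSON Querying"
  sputnikbot: "SputnikBot"
BOTS

# safe_load turns the flat mapping into a Hash of String => String.
BOTS = YAML.safe_load(BOTS_SOURCE)

# Mirrors KnownBotsMatcher: is any key a substring of the normalized UA?
def known_bot?(ua)
  BOTS.any? { |key, _| ua.include?(key) }
end

# Hypothetical helper: which key matched (handy when debugging).
def known_bot_key(ua)
  BOTS.keys.find { |key| ua.include?(key) }
end

ua = "ArchiveTeam ArchiveBot/20200101".downcase.strip # hypothetical UA
puts known_bot?(ua)     # true
puts known_bot_key(ua)  # archiveteam archivebot
```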
@@ -4,39 +4,42 @@ require "set"
  require "yaml"
  require "pathname"
 
- require "browser/version"
- require "browser/detect_version"
- require "browser/accept_language"
- require "browser/base"
- require "browser/safari"
- require "browser/chrome"
- require "browser/internet_explorer"
- require "browser/firefox"
- require "browser/edge"
- require "browser/opera"
- require "browser/blackberry"
- require "browser/generic"
- require "browser/phantom_js"
- require "browser/uc_browser"
- require "browser/nokia"
- require "browser/micro_messenger"
- require "browser/weibo"
- require "browser/qq"
- require "browser/alipay"
- require "browser/electron"
- require "browser/facebook"
- require "browser/otter"
- require "browser/instagram"
- require "browser/yandex"
- require "browser/sputnik"
- require "browser/snapchat"
+ require_relative "version"
+ require_relative "detect_version"
+ require_relative "accept_language"
+ require_relative "base"
+ require_relative "safari"
+ require_relative "chrome"
+ require_relative "internet_explorer"
+ require_relative "firefox"
+ require_relative "edge"
+ require_relative "opera"
+ require_relative "blackberry"
+ require_relative "generic"
+ require_relative "phantom_js"
+ require_relative "uc_browser"
+ require_relative "nokia"
+ require_relative "micro_messenger"
+ require_relative "weibo"
+ require_relative "qq"
+ require_relative "alipay"
+ require_relative "electron"
+ require_relative "facebook"
+ require_relative "otter"
+ require_relative "instagram"
+ require_relative "yandex"
+ require_relative "sputnik"
+ require_relative "snapchat"
 
- require "browser/bot"
- require "browser/middleware"
+ require_relative "bot"
+ require_relative "bot/empty_user_agent_matcher"
+ require_relative "bot/keyword_matcher"
+ require_relative "bot/known_bots_matcher"
 
- require "browser/platform"
- require "browser/device"
- require "browser/meta"
+ require_relative "middleware"
+ require_relative "platform"
+ require_relative "device"
+ require_relative "meta"
 
  module Browser
  EMPTY_STRING = ""
@@ -89,12 +92,12 @@ module Browser
  end
 
  modern_rules.tap do |rules|
- rules << ->(b) { b.webkit? }
- rules << ->(b) { b.firefox? && b.version.to_i >= 17 }
- rules << ->(b) { b.ie? && b.version.to_i >= 9 && !b.compatibility_view? }
- rules << ->(b) { b.edge? && !b.compatibility_view? }
- rules << ->(b) { b.opera? && b.version.to_i >= 12 }
- rules << ->(b) { b.firefox? && b.device.tablet? && b.platform.android? && b.version.to_i >= 14 } # rubocop:disable Metrics/LineLength
+ rules << ->(b) { b.chrome? && b.version.to_i >= 65 }
+ rules << ->(b) { b.safari? && b.version.to_i >= 10 }
+ rules << ->(b) { b.firefox? && b.version.to_i >= 52 }
+ rules << ->(b) { b.ie? && b.version.to_i >= 11 && !b.compatibility_view? }
+ rules << ->(b) { b.edge? && b.version.to_i >= 39 && !b.compatibility_view? }
+ rules << ->(b) { b.opera? && b.version.to_i >= 50 }
  end
 
  def self.new(user_agent, **kwargs)