browser 2.7.1 → 3.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 17e665a446ca2a9578cae49ad272bbbc596c29bf8518643cbcec39f2654e4bc4
4
- data.tar.gz: 0dd132f6f496e0221417a5b474878823eed6b7e4eac51764f383cd2825cb1969
3
+ metadata.gz: 73c5a3410e573b333ce54118589c25ca868b6daa5df15e2e4962da120521139b
4
+ data.tar.gz: 7d47dbef918c0ec0176c59d81d592285e273d259769a80fc4edcafc335f69c9d
5
5
  SHA512:
6
- metadata.gz: 2902444a66316eb79ccb9fd409d58d21ac764b6f22a86bb9a2af74791f3bd8c86301a7c7b8829fe17f14a24c0827b3584d0c0a7813f85c9a4022a179ea67db7a
7
- data.tar.gz: 1282a961f637a98449d6425272abcd3f005c003908ce37e0954052b3b5fd8bc6bf3d5d75e3104e6d3705823e3abe60f9eb9fca7046bf94f1c16f625cc6a09b07
6
+ metadata.gz: ad5612535803d139af879d09312cdba4c77bc4c1351a483ed16fc677f4372177da92afeaea6b6fb87912e1cd9ae9450c49d97bc58cd494522ee9b206411e3b49
7
+ data.tar.gz: f1f8223336528e77c863843b918bd303be591e57b0b26c09bef24fe238406e3896637238b28b573792b8b8f25b20d1a4dbe3369dd30dd9e9671b8e702817b00e
@@ -16,7 +16,7 @@ AllCops:
16
16
  Metrics/ClassLength:
17
17
  Enabled: false
18
18
 
19
- Metrics/LineLength:
19
+ Layout/LineLength:
20
20
  Max: 80
21
21
 
22
22
  Metrics/MethodLength:
@@ -2,6 +2,17 @@
2
2
 
3
3
  ## Unreleased
4
4
 
5
+ - Add ArchiveTeam's ArchiveBot to the bot list.
6
+ - Fix QQ Browser detection.
7
+ - Update modern rules.
8
+ - You can now define new bot matchers by adding a callable object to `Browser::Bot.matchers`.
9
+ - Fix `browser.yandex?` and `browser.sputnik?`.
10
+ - [BREAKING CHANGE] Removed methods to enable the bot's empty user agent detection (`Browser::Bot.detect_empty_ua!` and `Browser::Bot.detect_empty_ua?`).
11
+ - [BREAKING CHANGE] Bot detection is now more aggressive by default. It matches empty user agents, anything that matches `crawl|fetch|search|monitoring|spider|bot`, and anything listed under https://github.com/fnando/browser/blob/master/bots.yml.
12
+ - Add Jaunt to the bot list.
13
+
14
+ ## 2.7.1
15
+
5
16
  - Handle Snapchat user agents that have a space or an empty string instead of a slash before the version.
6
17
  - Fix iOS 10+ version detection.
7
18
  - Add fallback versions for instagram and snapchat to avoid NoMethodErrors on unexpected user agents.
data/README.md CHANGED
@@ -58,7 +58,8 @@ browser.sputnik?
58
58
  browser.bot.name
59
59
  browser.bot.search_engine?
60
60
  browser.bot?
61
- Browser::Bot.why?(ua) # shows which user agent was the offender
61
+ browser.bot.why? # shows which matcher detected this user agent as a bot.
62
+ Browser::Bot.why?(ua)
62
63
 
63
64
  # Get device info
64
65
  browser.device
@@ -146,21 +147,21 @@ browser.mobile? #=> false
146
147
 
147
148
  ### What defines a modern browser?
148
149
 
149
- The current rules that define a modern browser are pretty loose:
150
+ The current rules that define a modern browser are pretty loose.
150
151
 
151
- * Webkit
152
- * IE9+
153
- * Microsoft Edge
154
- * Firefox 17+
155
- * Firefox Tablet 14+
156
- * Opera 12+
152
+ * Chrome 65+
153
+ * Safari 10+
154
+ * Firefox 52+
155
+ * IE11+
156
+ * Microsoft Edge 39+
157
+ * Opera 50+
157
158
 
158
159
  You can define your own rules. A rule must be a proc/lambda or any object that implements the method === and accepts the browser object. To redefine all rules, clear the existing rules before adding your own.
159
160
 
160
161
  ```ruby
161
- # Only Chrome Canary is considered modern.
162
+ # Only Google Chrome 79+ is considered modern.
162
163
  Browser.modern_rules.clear
163
- Browser.modern_rules << -> b { b.chrome? && b.version.to_i >= 37 }
164
+ Browser.modern_rules << -> b { b.chrome? && b.version.to_i >= 79 }
164
165
  ```
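Because a rule only needs to implement `===` and accept the browser object, a small class can sit next to lambdas in the same rule list. The sketch below does not load the gem: `FakeBrowser`, `MinimumVersionRule`, `MODERN_RULES`, and the `modern?` helper are hypothetical stand-ins, and "modern if any rule matches" is an assumed evaluation strategy.

```ruby
# Hypothetical stand-in for the browser object; the real object exposes
# predicates such as `chrome?` and a `version` string.
FakeBrowser = Struct.new(:name, :version) do
  def chrome?
    name == "Chrome"
  end
end

# A rule object: anything that implements `===` and accepts the browser.
class MinimumVersionRule
  def initialize(predicate, minimum)
    @predicate = predicate
    @minimum = minimum
  end

  def ===(browser)
    browser.public_send(@predicate) && browser.version.to_i >= @minimum
  end
end

# Lambdas work too, because Proc#=== invokes the proc with its argument.
MODERN_RULES = [
  MinimumVersionRule.new(:chrome?, 79),
  ->(b) { b.name == "Firefox" && b.version.to_i >= 52 }
].freeze

# Assumption: a browser is modern when at least one rule matches it.
def modern?(rules, browser)
  rules.any? { |rule| rule === browser }
end

modern?(MODERN_RULES, FakeBrowser.new("Chrome", "80.0.1"))
modern?(MODERN_RULES, FakeBrowser.new("Chrome", "70.0"))
```

A rule object is mostly useful when the same check is reused with different thresholds; for one-off rules a lambda stays shorter.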
165
166
 
166
167
  ### Rails integration
@@ -213,7 +214,7 @@ language.name
213
214
  #=> "English/United States"
214
215
  ```
215
216
 
216
- Result is always sorted in quality order from highest -> lowest. As per the HTTP spec:
217
+ Result is always sorted in quality order from highest to lowest. As per the HTTP spec:
217
218
 
218
219
  - omitting the quality value implies 1.0.
219
220
  - a quality value of zero means the language is not accepted by the client.
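The ordering rule can be sketched without the gem. The helpers below (`parse_accept_language`, `accepted_languages`) are hypothetical names, not the gem's API; they only illustrate the default quality of 1.0 and the highest-to-lowest sort. Dropping zero-quality entries is a choice made in this sketch, not a statement about what the gem returns.

```ruby
# Parses an Accept-Language header into [tag, quality] pairs.
# A missing quality value defaults to 1.0.
def parse_accept_language(header)
  header.split(",").map do |part|
    tag, *params = part.strip.split(";").map(&:strip)
    q_param = params.find { |param| param.start_with?("q=") }
    quality = q_param ? q_param.delete_prefix("q=").to_f : 1.0
    [tag, quality]
  end
end

# Sorts by quality, highest first. The original position breaks ties so
# that equally weighted languages keep the order the client sent.
def accepted_languages(header)
  parse_accept_language(header)
    .reject { |_tag, quality| quality.zero? }
    .sort_by.with_index { |(_tag, quality), index| [-quality, index] }
end

accepted_languages("en;q=0.7, pt-BR, en-US;q=0.9, fr;q=0")
```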
@@ -266,10 +267,29 @@ browser.platform.ios?
266
267
 
267
268
  ### Bots
268
269
 
269
- Browser used to detect empty user agents as bots, but this behavior has changed. If you want to bring this detection back, you can activate it through the following call:
270
+ The bot detection is quite aggressive. Anything that matches at least one of the following requirements will be considered a bot.
271
+
272
+ - Empty user agent string
273
+ - User agent that matches `/crawl|fetch|search|monitoring|spider|bot/`
274
+ - Any known bot listed under [bots.yml](https://github.com/fnando/browser/blob/master/bots.yml)
275
+
276
+ To add custom matchers, add a callable object to `Browser::Bot.matchers`. The following example matches any user agent that contains the substring `externalhit`. A bot that matches no entry in `bots.yml` is named `Generic Bot`.
277
+
278
+ ```ruby
279
+ Browser::Bot.matchers << ->(ua, _browser) { ua =~ /externalhit/i }
280
+ ```
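A matcher does not have to be a lambda; any object that responds to `call(ua, browser)` fits. The following sketch runs without the gem: `ExternalHitMatcher`, `MATCHERS`, and `first_match` are hypothetical names, with `MATCHERS` standing in for `Browser::Bot.matchers`.

```ruby
# Hypothetical matcher: a class whose `call` receives the user agent
# string and the browser object (unused here).
class ExternalHitMatcher
  def self.call(ua, _browser)
    ua.match?(/externalhit/i)
  end
end

# Stand-in for the matcher list; classes and lambdas can share it.
MATCHERS = [
  ExternalHitMatcher,
  ->(ua, _browser) { ua.strip.empty? }
].freeze

# Returns the first matcher that flags the user agent, or nil.
def first_match(ua, browser = nil)
  MATCHERS.find { |matcher| matcher.call(ua, browser) }
end

first_match("facebookexternalhit/1.1")
first_match("Mozilla/5.0 (X11; Linux x86_64)")
```

A named class shows up more readably than an anonymous lambda when inspecting which matcher flagged a user agent.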
281
+
282
+ To clear all matchers, including the ones that are bundled, use `Browser::Bot.matchers.clear`. You can re-add built-in matchers by doing the following:
283
+
284
+ ```ruby
285
+ Browser::Bot.matchers.concat(Browser::Bot.default_matchers)
286
+ ```
287
+
288
+ To restore v2's bot detection, remove the following matchers:
270
289
 
271
290
  ```ruby
272
- Browser::Bot.detect_empty_ua!
291
+ Browser::Bot.matchers.delete(Browser::Bot::KeywordMatcher)
292
+ Browser::Bot.matchers.delete(Browser::Bot::EmptyUserAgentMatcher)
273
293
  ```
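The effect of removing those two matchers can be illustrated with a standalone sketch. It does not load the gem and ignores bot exceptions; `KNOWN_BOTS` is a small subset of `bots.yml` keys, and `EMPTY_UA`, `KNOWN_BOT`, `KEYWORD`, `V3_MATCHERS`, `V2_MATCHERS`, and `bot?` are hypothetical names that approximate the three bundled checks.

```ruby
# Illustrative subset of bots.yml entries.
KNOWN_BOTS = {
  "applebot" => "Apple Bot",
  "instapaper" => "Instapaper",
  "jaunt" => "Jaunt - Java Web Scraping & JSON Querying"
}.freeze

# Approximations of the three bundled checks.
EMPTY_UA = ->(ua, _browser) { ua == "" }
KNOWN_BOT = ->(ua, _browser) { KNOWN_BOTS.any? { |key, _| ua.include?(key) } }
KEYWORD = lambda do |ua, _browser|
  ua.match?(/crawl|fetch|search|monitoring|spider|bot/)
end

V3_MATCHERS = [EMPTY_UA, KNOWN_BOT, KEYWORD].freeze
# Without the keyword and empty user agent checks, only known bots remain.
V2_MATCHERS = (V3_MATCHERS - [KEYWORD, EMPTY_UA]).freeze

# The user agent is downcased and stripped before matching.
def bot?(matchers, raw_ua)
  ua = raw_ua.downcase.strip
  matchers.any? { |matcher| matcher.call(ua, nil) }
end

bot?(V3_MATCHERS, "MyCrawler/1.0") # flagged by the keyword check
bot?(V2_MATCHERS, "MyCrawler/1.0") # not flagged: no known bot key matches
```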
274
294
 
275
295
  ### Middleware
data/Rakefile CHANGED
@@ -16,7 +16,14 @@ end
16
16
  require "rubocop/rake_task"
17
17
  desc "Run rubocop"
18
18
  task :rubocop do
19
- RuboCop::RakeTask.new
19
+ RuboCop::RakeTask.new do |t|
20
+ t.options += %w[
21
+ --display-style-guide
22
+ --display-cop-names
23
+ --extra-details
24
+ --auto-correct
25
+ ]
26
+ end
20
27
  end
21
28
 
22
29
  desc "Run specs against all gemfiles"
@@ -1,3 +1,4 @@
1
+ ---
1
2
  - ruby build
2
3
  - pinterest/android
3
4
  - pinterest/ios
data/bots.yml CHANGED
@@ -1,3 +1,4 @@
1
+ ---
1
2
  200pleasebot: "200PleaseBot"
2
3
  360spider: "360Spider"
3
4
  abot: "CrawlDaddy, abot"
@@ -16,6 +17,7 @@ apis-google: APIs-Google
16
17
  appengine-google: "Google App Engine"
17
18
  applebot: "Apple Bot"
18
19
  archive.org_bot: "Internet Archive (archive.org)"
20
+ archiveteam archivebot: "ArchiveTeam ArchiveBot"
19
21
  ask jeeves: "Ask Jeeves"
20
22
  asynchttpclient: "Java http and WebSocket client library"
21
23
  awe.sm: "Awe.sm URL expander"
@@ -109,6 +111,7 @@ insieve: "Insieve Bot"
109
111
  insitesbot: "Insitesbot"
110
112
  instapaper: "Instapaper"
111
113
  istellabot: "IstellaBot"
114
+ jaunt: "Jaunt - Java Web Scraping & JSON Querying"
112
115
  jetslide: "Jetslide"
113
116
  jobseeker: "jobseeker.com.au/bot.html"
114
117
  jooble: "Jooble"
@@ -220,7 +223,6 @@ snapchat: "Snapchat"
220
223
  socialrank: "SocialRankIOBot"
221
224
  sogou: "Chinese search engine"
222
225
  spbot: "OpenLinkProfiler"
223
- spider: "generic web spider"
224
226
  spinn3r: "Spinn3r aggregator"
225
227
  sputnikbot: "SputnikBot"
226
228
  squider: "Squider"
@@ -29,6 +29,6 @@ Gem::Specification.new do |s|
29
29
  s.add_development_dependency "rails"
30
30
  s.add_development_dependency "rake"
31
31
  s.add_development_dependency "rubocop"
32
- s.add_development_dependency "rubocop-fnando"
32
+ s.add_development_dependency "rubocop-fnando", "~> 0.0.3"
33
33
  s.add_development_dependency "simplecov"
34
34
  end
@@ -1,4 +1,4 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require "browser/browser"
4
- require "browser/rails" if defined?(::Rails)
3
+ require_relative "browser/browser"
4
+ require_relative "browser/rails" if defined?(::Rails)
@@ -1,6 +1,6 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require "browser"
3
+ require_relative "../browser"
4
4
  require "forwardable"
5
5
 
6
6
  module Browser
@@ -151,12 +151,12 @@ module Browser
151
151
 
152
152
  # Detect if browser is Sputnik.
153
153
  def sputnik?(expected_version = nil)
154
- Sputnik.new(ua) && detect_version?(full_version, expected_version)
154
+ Sputnik.new(ua).match? && detect_version?(full_version, expected_version)
155
155
  end
156
156
 
157
157
  # Detect if browser is Yandex.
158
158
  def yandex?(expected_version = nil)
159
- Yandex.new(ua) && detect_version?(full_version, expected_version)
159
+ Yandex.new(ua).match? && detect_version?(full_version, expected_version)
160
160
  end
161
161
  alias_method :yandex_browser?, :yandex?
162
162
 
@@ -2,68 +2,76 @@
2
2
 
3
3
  module Browser
4
4
  class Bot
5
- def self.detect_empty_ua!
6
- @detect_empty_ua = true
5
+ GENERIC_NAME = "Generic Bot"
6
+
7
+ def self.matchers
8
+ @matchers ||= default_matchers
9
+ end
10
+
11
+ def self.default_matchers
12
+ [
13
+ EmptyUserAgentMatcher,
14
+ KnownBotsMatcher,
15
+ KeywordMatcher
16
+ ]
7
17
  end
8
18
 
9
- def self.detect_empty_ua?
10
- @detect_empty_ua
19
+ def self.load_yaml(path)
20
+ YAML.load_file(Browser.root.join(path))
11
21
  end
12
22
 
13
23
  def self.bots
14
- @bots ||= YAML.load_file(Browser.root.join("bots.yml"))
24
+ @bots ||= load_yaml("bots.yml")
15
25
  end
16
26
 
17
27
  def self.bot_exceptions
18
- @bot_exceptions ||= YAML
19
- .load_file(Browser.root.join("bot_exceptions.yml"))
28
+ @bot_exceptions ||= load_yaml("bot_exceptions.yml")
20
29
  end
21
30
 
22
31
  def self.search_engines
23
- @search_engines ||= YAML
24
- .load_file(Browser.root.join("search_engines.yml"))
32
+ @search_engines ||= load_yaml("search_engines.yml")
25
33
  end
26
34
 
27
35
  def self.why?(ua)
28
- downcased_ua = ua.downcase
29
- bots.find {|key, _| downcased_ua.include?(key) }
36
+ ua = ua.downcase.strip
37
+ browser = Browser.new(ua)
38
+ matchers.find {|matcher| matcher.call(ua, browser) }
30
39
  end
31
40
 
32
- attr_reader :ua
41
+ attr_reader :ua, :browser
33
42
 
34
43
  def initialize(ua)
35
- @ua = ua
44
+ @ua = ua.downcase.strip
45
+ @browser = Browser.new(@ua)
36
46
  end
37
47
 
38
48
  def bot?
39
- bot_with_empty_ua? || (!bot_exception? && detect_bot?)
49
+ !bot_exception? && detect_bot?
50
+ end
51
+
52
+ def why?
53
+ self.class.matchers.find {|matcher| matcher.call(ua, browser) }
40
54
  end
41
55
 
42
56
  def search_engine?
43
- self.class.search_engines.any? {|key, _| downcased_ua.include?(key) }
57
+ self.class.search_engines.any? {|key, _| ua.include?(key) }
44
58
  end
45
59
 
46
60
  def name
47
61
  return unless bot?
48
- return "Generic Bot" if bot_with_empty_ua?
49
-
50
- self.class.bots.find {|key, _| downcased_ua.include?(key) }.last
51
- end
52
62
 
53
- private def bot_with_empty_ua?
54
- self.class.detect_empty_ua? && ua.strip == ""
63
+ self.class.bots.find {|key, _| ua.include?(key) }&.last || GENERIC_NAME
55
64
  end
56
65
 
57
66
  private def bot_exception?
58
- self.class.bot_exceptions.any? {|key| downcased_ua.include?(key) }
67
+ self.class.bot_exceptions.any? {|key| ua.include?(key) }
59
68
  end
60
69
 
61
70
  private def detect_bot?
62
- self.class.bots.any? {|key, _| downcased_ua.include?(key) }
71
+ self.class.matchers.any? {|matcher| matcher.call(ua, browser) }
63
72
  end
64
73
 
65
- private def downcased_ua
66
- @downcased_ua ||= ua.downcase
67
- end
74
+ private :ua
75
+ private :browser
68
76
  end
69
77
  end
@@ -0,0 +1,11 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Browser
4
+ class Bot
5
+ class EmptyUserAgentMatcher
6
+ def self.call(ua, _browser)
7
+ ua == ""
8
+ end
9
+ end
10
+ end
11
+ end
@@ -0,0 +1,11 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Browser
4
+ class Bot
5
+ class KeywordMatcher
6
+ def self.call(ua, _browser)
7
+ ua =~ /crawl|fetch|search|monitoring|spider|bot/
8
+ end
9
+ end
10
+ end
11
+ end
@@ -0,0 +1,11 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Browser
4
+ class Bot
5
+ class KnownBotsMatcher
6
+ def self.call(ua, _browser)
7
+ Browser::Bot.bots.any? {|key, _| ua.include?(key) }
8
+ end
9
+ end
10
+ end
11
+ end
@@ -4,39 +4,42 @@ require "set"
4
4
  require "yaml"
5
5
  require "pathname"
6
6
 
7
- require "browser/version"
8
- require "browser/detect_version"
9
- require "browser/accept_language"
10
- require "browser/base"
11
- require "browser/safari"
12
- require "browser/chrome"
13
- require "browser/internet_explorer"
14
- require "browser/firefox"
15
- require "browser/edge"
16
- require "browser/opera"
17
- require "browser/blackberry"
18
- require "browser/generic"
19
- require "browser/phantom_js"
20
- require "browser/uc_browser"
21
- require "browser/nokia"
22
- require "browser/micro_messenger"
23
- require "browser/weibo"
24
- require "browser/qq"
25
- require "browser/alipay"
26
- require "browser/electron"
27
- require "browser/facebook"
28
- require "browser/otter"
29
- require "browser/instagram"
30
- require "browser/yandex"
31
- require "browser/sputnik"
32
- require "browser/snapchat"
7
+ require_relative "version"
8
+ require_relative "detect_version"
9
+ require_relative "accept_language"
10
+ require_relative "base"
11
+ require_relative "safari"
12
+ require_relative "chrome"
13
+ require_relative "internet_explorer"
14
+ require_relative "firefox"
15
+ require_relative "edge"
16
+ require_relative "opera"
17
+ require_relative "blackberry"
18
+ require_relative "generic"
19
+ require_relative "phantom_js"
20
+ require_relative "uc_browser"
21
+ require_relative "nokia"
22
+ require_relative "micro_messenger"
23
+ require_relative "weibo"
24
+ require_relative "qq"
25
+ require_relative "alipay"
26
+ require_relative "electron"
27
+ require_relative "facebook"
28
+ require_relative "otter"
29
+ require_relative "instagram"
30
+ require_relative "yandex"
31
+ require_relative "sputnik"
32
+ require_relative "snapchat"
33
33
 
34
- require "browser/bot"
35
- require "browser/middleware"
34
+ require_relative "bot"
35
+ require_relative "bot/empty_user_agent_matcher"
36
+ require_relative "bot/keyword_matcher"
37
+ require_relative "bot/known_bots_matcher"
36
38
 
37
- require "browser/platform"
38
- require "browser/device"
39
- require "browser/meta"
39
+ require_relative "middleware"
40
+ require_relative "platform"
41
+ require_relative "device"
42
+ require_relative "meta"
40
43
 
41
44
  module Browser
42
45
  EMPTY_STRING = ""
@@ -89,12 +92,12 @@ module Browser
89
92
  end
90
93
 
91
94
  modern_rules.tap do |rules|
92
- rules << ->(b) { b.webkit? }
93
- rules << ->(b) { b.firefox? && b.version.to_i >= 17 }
94
- rules << ->(b) { b.ie? && b.version.to_i >= 9 && !b.compatibility_view? }
95
- rules << ->(b) { b.edge? && !b.compatibility_view? }
96
- rules << ->(b) { b.opera? && b.version.to_i >= 12 }
97
- rules << ->(b) { b.firefox? && b.device.tablet? && b.platform.android? && b.version.to_i >= 14 } # rubocop:disable Metrics/LineLength
95
+ rules << ->(b) { b.chrome? && b.version.to_i >= 65 }
96
+ rules << ->(b) { b.safari? && b.version.to_i >= 10 }
97
+ rules << ->(b) { b.firefox? && b.version.to_i >= 52 }
98
+ rules << ->(b) { b.ie? && b.version.to_i >= 11 && !b.compatibility_view? }
99
+ rules << ->(b) { b.edge? && b.version.to_i >= 39 && !b.compatibility_view? }
100
+ rules << ->(b) { b.opera? && b.version.to_i >= 50 }
98
101
  end
99
102
 
100
103
  def self.new(user_agent, **kwargs)