ferrum 0.6.2 → 0.7

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 9c0325708ecf4ff29cd0d11a3527207271a1a9a76c945a321810b7f37325c538
4
- data.tar.gz: 9d2f4790ef55ddfb55f1588473dc94511fc95fe7c7cc937f0e49af0980457b9e
3
+ metadata.gz: 34f864b5679986d8580fee118735ea2bf62b5b35f1d2ae9f403cdc08168d1d48
4
+ data.tar.gz: 5899a23fdf4219dfd7aa70b7f1d72d0dcee167a6102b07a5062c36b8c4588f90
5
5
  SHA512:
6
- metadata.gz: 8067b96de131a957317d641afeec6d145f47455b18eb119227dc1de184ed35b8b6a6cbf84aa8e1e6b3c87ff190b2218c68dd43cb1c9187fe7e7acea2cae1ca89
7
- data.tar.gz: a4fb8e6749da0fa5ccbf1ab05c807f176e30c2650e23b263072281065b5ca9632201a596652989636f7b9f565ac8a23ca391e8d366d755538e7b28504b09209b
6
+ metadata.gz: 3732e2ba120edbd7e1d28759d6d9fe75c4416f41cc08a535e63751324faec35117a805de3dfdb1467fa3aa55c2885b8c7058d1fce46bb0b69e642210f89bac26
7
+ data.tar.gz: 426b53078a92ddc04187250335c200dd35dac489eb2fe887f07a7ae8bb07f0652588d278ef18d7389221f692ae3b55488cedfcec0ea8511b57d9ebc9776060db
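Each value under `SHA256:` and `SHA512:` is a hex digest of the matching archive packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). Below is a minimal sketch for checking one of them with Ruby's standard `Digest` library. It assumes you have already unpacked the `.gem`, which is a plain tar archive. `digest_matches?` is a hypothetical helper and is not part of RubyGems.

```ruby
require "digest"

# Hypothetical helper: compares a file's hex digest with an expected value
# copied from checksums.yaml. Only SHA256 and SHA512 are handled here.
def digest_matches?(path, expected_hex, algorithm: :SHA256)
  digest = case algorithm
           when :SHA256 then Digest::SHA256.file(path)
           when :SHA512 then Digest::SHA512.file(path)
           else raise ArgumentError, "unsupported algorithm: #{algorithm}"
           end
  digest.hexdigest == expected_hex.downcase
end

# Example, assuming metadata.gz was extracted from the 0.7 gem:
# digest_matches?("metadata.gz",
#                 "34f864b5679986d8580fee118735ea2bf62b5b35f1d2ae9f403cdc08168d1d48")
```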
data/README.md CHANGED
@@ -12,12 +12,31 @@ It is Ruby clean and high-level API to Chrome. Runs headless by default,
12
12
  but you can configure it to run in a non-headless mode. All you need is Ruby and
13
13
  Chrome/Chromium. Ferrum connects to the browser via DevTools Protocol.
14
14
 
15
- Relation to [Cuprite](https://github.com/machinio/cuprite). Cuprite used to have
16
- this code inside in one form or another but the thing is you don't need capybara
17
- if you are going to crawl sites. You crawl, not test. Besides that clean
18
- lightweight API to browser is what Ruby was missing, so here it comes.
19
-
20
- If you like this project, please consider to [become a backer](https://www.patreon.com/rferrum) on Patreon.
15
+ [Cuprite](https://github.com/machinio/cuprite) used to contain this code in one
16
+ form or another, but you don't need Capybara if you are going to crawl sites.
17
+ You crawl, you don't test. Besides, a clean, lightweight browser API is what
18
+ Ruby was missing, so here it comes.
19
+
20
+ [Vessel](https://github.com/route/vessel) is a high-level web crawling framework
21
+ based on Ferrum.
22
+
23
+ If you like this project, please consider _[becoming a backer](https://www.patreon.com/rferrum)_
24
+ on Patreon.
25
+
26
+ ## Index
27
+
28
+ * [Customization](https://github.com/route/ferrum#customization)
29
+ * [Navigation](https://github.com/route/ferrum#navigation)
30
+ * [Finders](https://github.com/route/ferrum#finders)
31
+ * [Screenshots](https://github.com/route/ferrum#screenshots)
32
+ * [Network](https://github.com/route/ferrum#network)
33
+ * [Mouse](https://github.com/route/ferrum#mouse)
34
+ * [Keyboard](https://github.com/route/ferrum#keyboard)
35
+ * [Cookies](https://github.com/route/ferrum#cookies)
36
+ * [Headers](https://github.com/route/ferrum#headers)
37
+ * [JavaScript](https://github.com/route/ferrum#javascript)
38
+ * [Frames](https://github.com/route/ferrum#frames)
39
+ * [Dialog](https://github.com/route/ferrum#dialog)
21
40
 
22
41
  ## Install
23
42
 
@@ -48,8 +67,8 @@ Interact with a page:
48
67
  browser = Ferrum::Browser.new
49
68
  browser.goto("https://google.com")
50
69
  input = browser.at_xpath("//div[@id='searchform']/form//input[@type='text']")
51
- input.focus.type("Ruby headless driver for Capybara", :Enter)
52
- browser.at_css("a > h3").text # => "machinio/cuprite: Headless Chrome driver for Capybara - GitHub"
70
+ input.focus.type("Ruby headless driver for Chrome", :Enter)
71
+ browser.at_css("a > h3").text # => "route/ferrum: Ruby Chrome/Chromium driver - GitHub"
53
72
  browser.quit
54
73
  ```
55
74
 
@@ -93,22 +112,26 @@ Ferrum::Browser.new(options)
93
112
  ```
94
113
 
95
114
  * options `Hash`
96
- * `:browser_path` (String) - Path to chrome binary, you can also set ENV
97
- variable as `BROWSER_PATH=some/path/chrome bundle exec rspec`.
98
115
  * `:headless` (Boolean) - Set browser as headless or not, `true` by default.
99
- * `:slowmo` (Integer | Float) - Set a delay to wait before sending command.
100
- Usefull companion of headless option, so that you have time to see changes.
116
+ * `:window_size` (Array) - The dimensions of the browser window in which to
117
+ test, expressed as a 2-element array, e.g. [1024, 768]. Default: [1024, 768]
118
+ * `:extensions` (Array[String | Hash]) - An array of paths to files or JS
119
+ source code to be preloaded into the browser e.g.:
120
+ `["/path/to/script.js", { source: "window.secret = 'top'" }]`
101
121
  * `:logger` (Object responding to `puts`) - When present, debug output is
102
122
  written to this object.
123
+ * `:slowmo` (Integer | Float) - Set a delay to wait before sending each command.
124
+ Useful companion to the headless option, so that you have time to see changes.
103
125
  * `:timeout` (Numeric) - The number of seconds we'll wait for a response when
104
126
  communicating with browser. Default is 5.
105
127
  * `:js_errors` (Boolean) - When true, JavaScript errors get re-raised in Ruby.
106
- * `:window_size` (Array) - The dimensions of the browser window in which to
107
- test, expressed as a 2-element array, e.g. [1024, 768]. Default: [1024, 768]
128
+ * `:browser_name` (Symbol) - `:chrome` by default, only experimental support
129
+ for `:firefox` for now.
130
+ * `:browser_path` (String) - Path to the browser binary. You can also set the ENV
131
+ variable instead, e.g. `BROWSER_PATH=some/path/chrome bundle exec rspec`.
108
132
  * `:browser_options` (Hash) - Additional command line options,
109
133
  [see them all](https://peter.sh/experiments/chromium-command-line-switches/)
110
134
  e.g. `{ "ignore-certificate-errors" => nil }`
111
- * `:extensions` (Array) - An array of JS files to be preloaded into the browser
112
135
  * `:port` (Integer) - Remote debugging port for headless Chrome
113
136
  * `:host` (String) - Remote debugging address for headless Chrome
114
137
  * `:url` (String) - URL for a running instance of Chrome. If this is set, a
@@ -342,6 +365,24 @@ browser.goto("https://github.com/")
342
365
  browser.network.status # => 200
343
366
  ```
344
367
 
368
+ #### wait_for_idle(\*\*options)
369
+
370
+ Waits for the network to become idle or raises `Ferrum::TimeoutError`.
371
+
372
+ * options `Hash`
373
+ * :connections `Integer` how many pending connections are still allowed for
374
+ the network to count as idle, `0` by default
375
+ * :duration `Float` how long to sleep between checks, `0.05` by
376
+ default
377
+ * :timeout `Float` how long to keep checking before giving up,
378
+ `browser.timeout` by default
379
+
380
+ ```ruby
381
+ browser.goto("https://example.com/")
382
+ browser.at_xpath("//a[text() = 'No UI changes button']").click
383
+ browser.network.wait_for_idle
384
+ ```
385
+
345
386
  #### clear(type)
346
387
 
347
388
  Clear browser's cache or collected traffic.
@@ -628,6 +669,18 @@ browser.add_script_tag(url: "http://example.com/stylesheet.css") # => true
628
669
 
629
670
  ```ruby
630
671
  browser.add_style_tag(content: "h1 { font-size: 40px; }") # => true
672
+ ```
673
+
674
+ #### bypass_csp(enabled) : `Boolean`
675
+
676
+ * enabled `Boolean`, `true` by default
677
+
678
+ ```ruby
679
+ browser.bypass_csp # => true
680
+ browser.goto("https://github.com/ruby-concurrency/concurrent-ruby/blob/master/docs-source/promises.in.md")
681
+ browser.refresh
682
+ browser.add_script_tag(content: "window.__injected = 42")
683
+ browser.evaluate("window.__injected") # => 42
631
684
  ```
632
685
 
633
686
 
@@ -10,8 +10,14 @@ module Ferrum
10
10
  class NotImplementedError < Error; end
11
11
 
12
12
  class StatusError < Error
13
- def initialize(url)
14
- super("Request to #{url} failed to reach server, check DNS and/or server status")
13
+ def initialize(url, pendings = [])
14
+ message = if pendings.empty?
15
+ "Request to #{url} failed to reach server, check DNS and/or server status"
16
+ else
17
+ "Request to #{url} reached server, but there are still pending connections: #{pendings.join(', ')}"
18
+ end
19
+
20
+ super(message)
15
21
  end
16
22
  end
17
23
 
@@ -31,7 +37,7 @@ module Ferrum
31
37
  end
32
38
 
33
39
  class DeadBrowserError < Error
34
- def initialize(message = "Browser is dead")
40
+ def initialize(message = "Browser is dead or given window is closed")
35
41
  super
36
42
  end
37
43
  end
@@ -16,15 +16,16 @@ module Ferrum
16
16
  extend Forwardable
17
17
  delegate %i[default_context] => :contexts
18
18
  delegate %i[targets create_target create_page page pages windows] => :default_context
19
- delegate %i[goto back forward refresh
19
+ delegate %i[goto back forward refresh reload
20
20
  at_css at_xpath css xpath current_url title body doctype
21
21
  headers cookies network
22
22
  mouse keyboard
23
23
  screenshot pdf viewport_size
24
24
  frames frame_by main_frame
25
25
  evaluate evaluate_on evaluate_async execute
26
- add_script_tag add_style_tag
26
+ add_script_tag add_style_tag bypass_csp
27
27
  on] => :page
28
+ delegate %i[default_user_agent] => :process
28
29
 
29
30
  attr_reader :client, :process, :contexts, :logger, :js_errors,
30
31
  :slowmo, :base_url, :options, :window_size
@@ -67,7 +68,9 @@ module Ferrum
67
68
  end
68
69
 
69
70
  def extensions
70
- @extensions ||= Array(@options[:extensions]).map { |p| File.read(p) }
71
+ @extensions ||= Array(@options[:extensions]).map do |ext|
72
+ (ext.is_a?(Hash) && ext[:source]) || File.read(ext)
73
+ end
71
74
  end
72
75
 
73
76
  def timeout
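With this change, `:extensions` accepts file paths and `{ source: ... }` Hashes in the same Array. The snippet below reproduces the new mapping as a free-standing method so it can run without a browser. `load_extensions` is a hypothetical name; the real logic lives in `Browser#extensions`.

```ruby
# Same rule as Browser#extensions in 0.7: a Hash with :source is used as-is,
# anything else is treated as a path and read from disk.
def load_extensions(extensions)
  Array(extensions).map do |ext|
    (ext.is_a?(Hash) && ext[:source]) || File.read(ext)
  end
end
```

`Array()` turns a bare Hash into key/value pairs (`Array({ source: "x" })` is `[[:source, "x"]]`). As a result, a single inline script still has to be wrapped in an Array, as in the README example.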
@@ -0,0 +1,76 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Ferrum
4
+ class Browser
5
+ class Chrome < Command
6
+ DEFAULT_OPTIONS = {
7
+ "headless" => nil,
8
+ "disable-gpu" => nil,
9
+ "hide-scrollbars" => nil,
10
+ "mute-audio" => nil,
11
+ "enable-automation" => nil,
12
+ "disable-web-security" => nil,
13
+ "disable-session-crashed-bubble" => nil,
14
+ "disable-breakpad" => nil,
15
+ "disable-sync" => nil,
16
+ "no-first-run" => nil,
17
+ "use-mock-keychain" => nil,
18
+ "keep-alive-for-test" => nil,
19
+ "disable-popup-blocking" => nil,
20
+ "disable-extensions" => nil,
21
+ "disable-hang-monitor" => nil,
22
+ "disable-features" => "site-per-process,TranslateUI",
23
+ "disable-translate" => nil,
24
+ "disable-background-networking" => nil,
25
+ "enable-features" => "NetworkService,NetworkServiceInProcess",
26
+ "disable-background-timer-throttling" => nil,
27
+ "disable-backgrounding-occluded-windows" => nil,
28
+ "disable-client-side-phishing-detection" => nil,
29
+ "disable-default-apps" => nil,
30
+ "disable-dev-shm-usage" => nil,
31
+ "disable-ipc-flooding-protection" => nil,
32
+ "disable-prompt-on-repost" => nil,
33
+ "disable-renderer-backgrounding" => nil,
34
+ "force-color-profile" => "srgb",
35
+ "metrics-recording-only" => nil,
36
+ "safebrowsing-disable-auto-update" => nil,
37
+ "password-store" => "basic",
38
+ # Note: --no-sandbox is not needed if you properly setup a user in the container.
39
+ # https://github.com/ebidel/lighthouse-ci/blob/master/builder/Dockerfile#L35-L40
40
+ # "no-sandbox" => nil,
41
+ }.freeze
42
+
43
+ MAC_BIN_PATH = [
44
+ "/Applications/Chromium.app/Contents/MacOS/Chromium",
45
+ "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
46
+ ].freeze
47
+ LINUX_BIN_PATH = %w[chromium google-chrome-unstable google-chrome-beta
48
+ google-chrome chrome chromium-browser
49
+ google-chrome-stable].freeze
50
+
51
+ private
52
+
53
+ def combine_flags
54
+ # Doesn't work on MacOS, so we need to set it by CDP as well
55
+ @flags.merge!("window-size" => options[:window_size].join(","))
56
+
57
+ port = options.fetch(:port, BROWSER_PORT)
58
+ @flags.merge!("remote-debugging-port" => port)
59
+
60
+ host = options.fetch(:host, BROWSER_HOST)
61
+ @flags.merge!("remote-debugging-address" => host)
62
+
63
+ @flags.merge!("user-data-dir" => @user_data_dir)
64
+
65
+ @flags = DEFAULT_OPTIONS.merge(@flags)
66
+
67
+ unless options.fetch(:headless, true)
68
+ @flags.delete("headless")
69
+ @flags.delete("disable-gpu")
70
+ end
71
+
72
+ @flags.merge!(options.fetch(:browser_options, {}))
73
+ end
74
+ end
75
+ end
76
+ end
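`Chrome#combine_flags` layers three sources in order:

1. The computed window, debugging and profile flags override `DEFAULT_OPTIONS`.
2. `headless: false` then removes two switches.
3. `:browser_options` is merged last.

Below is a standalone sketch of that order. It uses a trimmed default set (the three defaults shown are a subset of the real list), and `chrome_flags` is a hypothetical name.

```ruby
# Trimmed stand-in for Chrome::DEFAULT_OPTIONS.
SKETCH_DEFAULTS = {
  "headless" => nil,
  "disable-gpu" => nil,
  "mute-audio" => nil
}.freeze

def chrome_flags(options, user_data_dir)
  flags = {
    "window-size" => options[:window_size].join(","),
    "remote-debugging-port" => options.fetch(:port, "0"),
    "remote-debugging-address" => options.fetch(:host, "127.0.0.1"),
    "user-data-dir" => user_data_dir
  }
  # Computed flags win over defaults with the same key.
  flags = SKETCH_DEFAULTS.merge(flags)

  unless options.fetch(:headless, true)
    flags.delete("headless")
    flags.delete("disable-gpu")
  end

  # User-supplied switches are applied last and override everything above.
  flags.merge(options.fetch(:browser_options, {}))
end
```

A `nil` value is rendered as a bare `--switch`. This means `:browser_options` can add or override switches but has no way to drop one of the defaults.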
@@ -7,20 +7,26 @@ require "ferrum/browser/web_socket"
7
7
  module Ferrum
8
8
  class Browser
9
9
  class Client
10
+ INTERRUPTIONS = %w[Fetch.requestPaused Fetch.authRequired].freeze
11
+
10
12
  def initialize(browser, ws_url, start_id = 0, allow_slowmo = true)
11
13
  @command_id = start_id
12
14
  @pendings = Concurrent::Hash.new
13
15
  @browser = browser
14
16
  @slowmo = @browser.slowmo if allow_slowmo && @browser.slowmo > 0
15
17
  @ws = WebSocket.new(ws_url, @browser.logger)
16
- @subscriber = Subscriber.new
18
+ @subscriber, @interruptor = Subscriber.build(2)
17
19
 
18
20
  @thread = Thread.new do
19
21
  Thread.current.abort_on_exception = true
20
- Thread.current.report_on_exception = true if Thread.current.respond_to?(:report_on_exception=)
22
+ if Thread.current.respond_to?(:report_on_exception=)
23
+ Thread.current.report_on_exception = true
24
+ end
21
25
 
22
26
  while message = @ws.messages.pop
23
- if message.key?("method")
27
+ if INTERRUPTIONS.include?(message["method"])
28
+ @interruptor.async.call(message)
29
+ elsif message.key?("method")
24
30
  @subscriber.async.call(message)
25
31
  else
26
32
  @pendings[message["id"]]&.set(message)
@@ -46,7 +52,12 @@ module Ferrum
46
52
  end
47
53
 
48
54
  def on(event, &block)
49
- @subscriber.on(event, &block)
55
+ case event
56
+ when *INTERRUPTIONS
57
+ @interruptor.on(event, &block)
58
+ else
59
+ @subscriber.on(event, &block)
60
+ end
50
61
  end
51
62
 
52
63
  def close
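The reader thread now sorts incoming CDP messages three ways. Below is a minimal model of that routing. It returns a label instead of dispatching, so it can be checked in isolation; `route` is a hypothetical name.

The second `Subscriber` is presumably there because `Fetch.requestPaused` handlers must answer before the request can proceed. They therefore should not queue behind ordinary event handlers on the same `Concurrent::Async` object. This is an inference and is not stated in the change.

```ruby
INTERRUPTIONS = %w[Fetch.requestPaused Fetch.authRequired].freeze

# Mirrors the branch order in Client's reader thread:
# interception events go to the interruptor, other events to the subscriber,
# and command replies (no "method" key) to the pending command with that id.
def route(message)
  if INTERRUPTIONS.include?(message["method"])
    :interruptor
  elsif message.key?("method")
    :subscriber
  else
    [:pending, message["id"]]
  end
end
```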
@@ -0,0 +1,56 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Ferrum
4
+ class Browser
5
+ class Command
6
+ BROWSER_HOST = "127.0.0.1"
7
+ BROWSER_PORT = "0"
8
+ NOT_FOUND = "Could not find an executable for the browser. Try to make " \
9
+ "it available on the PATH or set environment variable for " \
10
+ "example BROWSER_PATH=\"/usr/bin/chrome\"".freeze
11
+
12
+ # Currently only these browsers support CDP:
13
+ # https://github.com/cyrus-and/chrome-remote-interface#implementations
14
+ def self.build(options, user_data_dir)
15
+ case options[:browser_name]
16
+ when :firefox
17
+ Firefox
18
+ when :chrome, :opera, :edge, nil
19
+ Chrome
20
+ else
21
+ raise NotImplementedError, "not supported browser"
22
+ end.new(options, user_data_dir)
23
+ end
24
+
25
+ attr_reader :path, :flags, :options
26
+
27
+ def initialize(options, user_data_dir)
28
+ @flags = {}
29
+ @options, @user_data_dir = options, user_data_dir
30
+ @path = options[:browser_path] || ENV["BROWSER_PATH"] || detect_path
31
+ raise Cliver::Dependency::NotFound.new(NOT_FOUND) unless @path
32
+
33
+ combine_flags
34
+ end
35
+
36
+ def to_a
37
+ [path] + flags.map { |k, v| v.nil? ? "--#{k}" : "--#{k}=#{v}" }
38
+ end
39
+
40
+ private
41
+
42
+ def detect_path
43
+ if Ferrum.mac?
44
+ self.class::MAC_BIN_PATH.find { |b| File.exist?(b) }
45
+ else
46
+ self.class::LINUX_BIN_PATH
47
+ .find { |b| p = Cliver.detect(b) and break(p) }
48
+ end
49
+ end
50
+
51
+ def combine_flags
52
+ raise NotImplementedError
53
+ end
54
+ end
55
+ end
56
+ end
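`Command#to_a` turns the flags Hash into an argv array. The Linux branch of `detect_path` relies on `and break(p)` so that `find` returns the resolved path instead of the candidate name. Both behaviours are shown in the standalone sketch below. `to_argv`, `first_detected` and the `detector` lambda (standing in for `Cliver.detect`) are hypothetical.

```ruby
# nil values become bare switches, everything else becomes --key=value.
def to_argv(path, flags)
  [path] + flags.map { |k, v| v.nil? ? "--#{k}" : "--#{k}=#{v}" }
end

# `break(found)` makes `find` return the detector's result (a full path)
# rather than the element `b`; if nothing matches, `find` returns nil.
def first_detected(candidates, detector)
  candidates.find { |b| found = detector.call(b) and break(found) }
end
```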
@@ -0,0 +1,34 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Ferrum
4
+ class Browser
5
+ class Firefox < Command
6
+ DEFAULT_OPTIONS = {
7
+ "headless" => nil,
8
+ }.freeze
9
+
10
+ MAC_BIN_PATH = [
11
+ "/Applications/Firefox.app/Contents/MacOS/firefox-bin"
12
+ ].freeze
13
+ LINUX_BIN_PATH = %w[firefox].freeze
14
+
15
+ private
16
+
17
+ def combine_flags
18
+ port = options.fetch(:port, BROWSER_PORT)
19
+ host = options.fetch(:host, BROWSER_HOST)
20
+ @flags.merge!("remote-debugger" => "#{host}:#{port}")
21
+
22
+ @flags.merge!("profile" => @user_data_dir)
23
+
24
+ @flags = DEFAULT_OPTIONS.merge(@flags)
25
+
26
+ unless options.fetch(:headless, true)
27
+ @flags.delete("headless")
28
+ end
29
+
30
+ @flags.merge!(options.fetch(:browser_options, {}))
31
+ end
32
+ end
33
+ end
34
+ end
@@ -5,6 +5,10 @@ require "net/http"
5
5
  require "json"
6
6
  require "addressable"
7
7
  require "tmpdir"
8
+ require "forwardable"
9
+ require "ferrum/browser/command"
10
+ require "ferrum/browser/chrome"
11
+ require "ferrum/browser/firefox"
8
12
 
9
13
  module Ferrum
10
14
  class Browser
@@ -12,52 +16,14 @@ module Ferrum
12
16
  KILL_TIMEOUT = 2
13
17
  WAIT_KILLED = 0.05
14
18
  PROCESS_TIMEOUT = ENV.fetch("FERRUM_PROCESS_TIMEOUT", 2).to_i
15
- BROWSER_PATH = ENV["BROWSER_PATH"]
16
- BROWSER_HOST = "127.0.0.1"
17
- BROWSER_PORT = "0"
18
- DEFAULT_OPTIONS = {
19
- "headless" => nil,
20
- "disable-gpu" => nil,
21
- "hide-scrollbars" => nil,
22
- "mute-audio" => nil,
23
- "enable-automation" => nil,
24
- "disable-web-security" => nil,
25
- "disable-session-crashed-bubble" => nil,
26
- "disable-breakpad" => nil,
27
- "disable-sync" => nil,
28
- "no-first-run" => nil,
29
- "use-mock-keychain" => nil,
30
- "keep-alive-for-test" => nil,
31
- "disable-popup-blocking" => nil,
32
- "disable-extensions" => nil,
33
- "disable-hang-monitor" => nil,
34
- "disable-features" => "site-per-process,TranslateUI",
35
- "disable-translate" => nil,
36
- "disable-background-networking" => nil,
37
- "enable-features" => "NetworkService,NetworkServiceInProcess",
38
- "disable-background-timer-throttling" => nil,
39
- "disable-backgrounding-occluded-windows" => nil,
40
- "disable-client-side-phishing-detection" => nil,
41
- "disable-default-apps" => nil,
42
- "disable-dev-shm-usage" => nil,
43
- "disable-ipc-flooding-protection" => nil,
44
- "disable-prompt-on-repost" => nil,
45
- "disable-renderer-backgrounding" => nil,
46
- "force-color-profile" => "srgb",
47
- "metrics-recording-only" => nil,
48
- "safebrowsing-disable-auto-update" => nil,
49
- "password-store" => "basic",
50
- # Note: --no-sandbox is not needed if you properly setup a user in the container.
51
- # https://github.com/ebidel/lighthouse-ci/blob/master/builder/Dockerfile#L35-L40
52
- # "no-sandbox" => nil,
53
- }.freeze
54
-
55
- NOT_FOUND = "Could not find an executable for chrome. Try to make it " \
56
- "available on the PATH or set environment varible for " \
57
- "example BROWSER_PATH=\"/Applications/Chromium.app/Contents/MacOS/Chromium\""
58
-
59
-
60
- attr_reader :host, :port, :ws_url, :pid, :path, :options, :cmd
19
+
20
+ attr_reader :host, :port, :ws_url, :pid, :command,
21
+ :default_user_agent, :browser_version, :protocol_version,
22
+ :v8_version, :webkit_version
23
+
24
+
25
+ extend Forwardable
26
+ delegate path: :command
61
27
 
62
28
  def self.start(*args)
63
29
  new(*args).tap(&:start)
@@ -85,65 +51,24 @@ module Ferrum
85
51
  end
86
52
 
87
53
  def self.directory_remover(path)
88
- proc do
89
- begin
90
- FileUtils.remove_entry(path)
91
- rescue Errno::ENOENT
92
- end
93
- end
94
- end
95
-
96
- def self.detect_browser_path
97
- if RUBY_PLATFORM.include?("darwin")
98
- [
99
- "/Applications/Chromium.app/Contents/MacOS/Chromium",
100
- "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
101
- ].find { |path| File.exist?(path) }
102
- else
103
- %w[chromium google-chrome-unstable google-chrome-beta google-chrome chrome chromium-browser google-chrome-stable].reduce(nil) do |path, exe|
104
- path = Cliver.detect(exe)
105
- break path if path
106
- end
107
- end
54
+ proc { FileUtils.remove_entry(path) rescue Errno::ENOENT }
108
55
  end
109
56
 
110
57
  def initialize(options)
111
- @options = {}
112
-
113
- @path = options[:browser_path] || BROWSER_PATH || self.class.detect_browser_path
114
-
115
58
  if options[:url]
116
59
  url = URI.join(options[:url].to_s, "/json/version")
117
60
  response = JSON.parse(::Net::HTTP.get(url))
118
61
  set_ws_url(response["webSocketDebuggerUrl"])
62
+ parse_browser_versions
119
63
  return
120
64
  end
121
65
 
122
- # Doesn't work on MacOS, so we need to set it by CDP as well
123
- @options.merge!("window-size" => options[:window_size].join(","))
124
-
125
- port = options.fetch(:port, BROWSER_PORT)
126
- @options.merge!("remote-debugging-port" => port)
127
-
128
- host = options.fetch(:host, BROWSER_HOST)
129
- @options.merge!("remote-debugging-address" => host)
130
-
131
- @temp_user_data_dir = Dir.mktmpdir
132
- ObjectSpace.define_finalizer(self, self.class.directory_remover(@temp_user_data_dir))
133
- @options.merge!("user-data-dir" => @temp_user_data_dir)
134
-
135
- @options = DEFAULT_OPTIONS.merge(@options)
136
-
137
- unless options.fetch(:headless, true)
138
- @options.delete("headless")
139
- @options.delete("disable-gpu")
140
- end
141
-
66
+ @logger = options[:logger]
142
67
  @process_timeout = options.fetch(:process_timeout, PROCESS_TIMEOUT)
143
68
 
144
- @options.merge!(options.fetch(:browser_options, {}))
145
-
146
- @logger = options[:logger]
69
+ tmpdir = Dir.mktmpdir
70
+ ObjectSpace.define_finalizer(self, self.class.directory_remover(tmpdir))
71
+ @command = Command.build(options, tmpdir)
147
72
  end
148
73
 
149
74
  def start
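The new one-line `directory_remover` uses a rescue modifier. In Ruby, `expr rescue value` rescues any `StandardError`, not only the class written after it. `Errno::ENOENT` there is just the value returned when something goes wrong.

The sketch below shows that behaviour next to a narrower block form that ignores only a missing directory. The block form is an alternative, not the released code, and both method names are hypothetical.

```ruby
require "fileutils"

# As written in 0.7: any StandardError is swallowed and the proc
# returns the Errno::ENOENT class object.
def modifier_remover(path)
  proc { FileUtils.remove_entry(path) rescue Errno::ENOENT }
end

# Narrower variant: only a missing path is ignored; other errors propagate.
def strict_remover(path)
  proc do
    FileUtils.remove_entry(path)
  rescue Errno::ENOENT
    nil
  end
end
```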
@@ -156,13 +81,11 @@ module Ferrum
156
81
  process_options[:pgroup] = true unless Ferrum.windows?
157
82
  process_options[:out] = process_options[:err] = write_io
158
83
 
159
- raise Cliver::Dependency::NotFound.new(NOT_FOUND) unless @path
160
-
161
- @cmd = [@path] + @options.map { |k, v| v.nil? ? "--#{k}" : "--#{k}=#{v}" }
162
- @pid = ::Process.spawn(*@cmd, process_options)
84
+ @pid = ::Process.spawn(*@command.to_a, process_options)
163
85
  ObjectSpace.define_finalizer(self, self.class.process_killer(@pid))
164
86
 
165
87
  parse_ws_url(read_io, @process_timeout)
88
+ parse_browser_versions
166
89
  ensure
167
90
  close_io(read_io, write_io)
168
91
  end
@@ -170,7 +93,7 @@ module Ferrum
170
93
 
171
94
  def stop
172
95
  kill if @pid
173
- remove_temp_user_data_dir if @temp_user_data_dir
96
+ remove_user_data_dir if @user_data_dir
174
97
  ObjectSpace.undefine_finalizer(self)
175
98
  end
176
99
 
@@ -186,9 +109,9 @@ module Ferrum
186
109
  @pid = nil
187
110
  end
188
111
 
189
- def remove_temp_user_data_dir
190
- self.class.directory_remover(@temp_user_data_dir).call
191
- @temp_user_data_dir = nil
112
+ def remove_user_data_dir
113
+ self.class.directory_remover(@user_data_dir).call
114
+ @user_data_dir = nil
192
115
  end
193
116
 
194
117
  def parse_ws_url(read_io, timeout)
@@ -211,7 +134,7 @@ module Ferrum
211
134
 
212
135
  unless ws_url
213
136
  @logger.puts output if @logger
214
- raise "Chrome process did not produce websocket url within #{timeout} seconds"
137
+ raise "Browser process did not produce websocket url within #{timeout} seconds"
215
138
  end
216
139
  end
217
140
 
@@ -221,12 +144,25 @@ module Ferrum
221
144
  @port = @ws_url.port
222
145
  end
223
146
 
147
+ def parse_browser_versions
148
+ return unless ws_url.is_a?(Addressable::URI)
149
+
150
+ version_url = URI.parse(ws_url.merge(scheme: "http", path: "/json/version"))
151
+ response = JSON.parse(::Net::HTTP.get(version_url))
152
+
153
+ @v8_version = response["V8-Version"]
154
+ @browser_version = response["Browser"]
155
+ @webkit_version = response["WebKit-Version"]
156
+ @default_user_agent = response["User-Agent"]
157
+ @protocol_version = response["Protocol-Version"]
158
+ end
159
+
224
160
  def close_io(*ios)
225
161
  ios.each do |io|
226
162
  begin
227
163
  io.close unless io.closed?
228
164
  rescue IOError
229
- raise unless RUBY_ENGINE == 'jruby'
165
+ raise unless RUBY_ENGINE == "jruby"
230
166
  end
231
167
  end
232
168
  end
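`parse_browser_versions` derives the DevTools HTTP endpoint from the websocket URL, then reads five fields from `/json/version`. Below is a standalone sketch of the same two steps using only the standard library: `URI` instead of Addressable, and a JSON string instead of a live HTTP request. `version_endpoint` and `browser_versions` are hypothetical names.

```ruby
require "json"
require "uri"

# ws://host:port/devtools/browser/<id>  ->  http://host:port/json/version
def version_endpoint(ws_url)
  ws = URI.parse(ws_url)
  URI::HTTP.build(host: ws.host, port: ws.port, path: "/json/version")
end

# Picks out the same keys that 0.7 stores on the browser process.
def browser_versions(body)
  data = JSON.parse(body)
  {
    browser_version: data["Browser"],
    protocol_version: data["Protocol-Version"],
    default_user_agent: data["User-Agent"],
    v8_version: data["V8-Version"],
    webkit_version: data["WebKit-Version"]
  }
end

# A live call would look like this (needs require "net/http"):
# browser_versions(Net::HTTP.get(version_endpoint(ws_url)))
```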
@@ -7,6 +7,10 @@ module Ferrum
7
7
  class Subscriber
8
8
  include Concurrent::Async
9
9
 
10
+ def self.build(size)
11
+ (0..size).map { new }
12
+ end
13
+
10
14
  def initialize
11
15
  super
12
16
  @on = Hash.new { |h, k| h[k] = [] }
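`Subscriber.build(size)` maps over the inclusive range `0..size`, so `build(2)` returns three subscribers. `Client` destructures the first two, and the third is discarded.

The sketch below checks that range behaviour next to an exact-count alternative. `FakeSubscriber` stands in for `Subscriber` so the snippet runs without concurrent-ruby, and both helper names are hypothetical.

```ruby
# Stand-in for Ferrum's Subscriber (which includes Concurrent::Async).
class FakeSubscriber; end

# As released: 0..size is inclusive, so size + 1 objects are created.
def build_inclusive(size)
  (0..size).map { FakeSubscriber.new }
end

# Exact-count alternative: Array.new(size) runs the block exactly `size` times.
def build_exact(size)
  Array.new(size) { FakeSubscriber.new }
end
```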
@@ -18,8 +18,6 @@ module Ferrum
18
18
  # Can be one of:
19
19
  # * started_loading
20
20
  # * navigated
21
- # * scheduled_navigation
22
- # * cleared_scheduled_navigation
23
21
  # * stopped_loading
24
22
  def state=(value)
25
23
  @state = value
@@ -232,7 +232,7 @@ module Ferrum
232
232
 
233
233
  const seen = [];
234
234
  function detectCycle(obj) {
235
- if (typeof obj === 'object') {
235
+ if (typeof obj === "object") {
236
236
  if (seen.indexOf(obj) !== -1) {
237
237
  return true;
238
238
  }
@@ -40,7 +40,7 @@ module Ferrum
40
40
 
41
41
  def set_overrides(user_agent: nil, accept_language: nil, platform: nil)
42
42
  options = Hash.new
43
- options[:userAgent] = user_agent if user_agent
43
+ options[:userAgent] = user_agent || @page.browser.default_user_agent
44
44
  options[:acceptLanguage] = accept_language if accept_language
45
45
  options[:platform] if platform
46
46
 
@@ -3,6 +3,9 @@
3
3
  require "ferrum/network/exchange"
4
4
  require "ferrum/network/intercepted_request"
5
5
  require "ferrum/network/auth_request"
6
+ require "ferrum/network/error"
7
+ require "ferrum/network/request"
8
+ require "ferrum/network/response"
6
9
 
7
10
  module Ferrum
8
11
  class Network
@@ -20,6 +23,31 @@ module Ferrum
20
23
  @exchange = nil
21
24
  end
22
25
 
26
+ def wait_for_idle(connections: 0, duration: 0.05, timeout: @page.browser.timeout)
27
+ start = Ferrum.monotonic_time
28
+
29
+ until idle?(connections)
30
+ raise TimeoutError if Ferrum.timeout?(start, timeout)
31
+ sleep(duration)
32
+ end
33
+ end
34
+
35
+ def idle?(connections = 0)
36
+ pending_connections <= connections
37
+ end
38
+
39
+ def total_connections
40
+ @traffic.size
41
+ end
42
+
43
+ def finished_connections
44
+ @traffic.count(&:finished?)
45
+ end
46
+
47
+ def pending_connections
48
+ total_connections - finished_connections
49
+ end
50
+
23
51
  def request
24
52
  @exchange&.request
25
53
  end
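`wait_for_idle` is a plain polling loop:

- The network counts as idle once pending exchanges (total minus finished) are at most `connections`.
- The loop sleeps `duration` between checks until `timeout` runs out.

Below is a standalone sketch. It assumes `Ferrum.monotonic_time` and `Ferrum.timeout?` behave like the monotonic-clock arithmetic shown. `IdleTimeout`, `idle?` and `wait_until_idle` are hypothetical names, and Hashes stand in for exchanges.

```ruby
class IdleTimeout < StandardError; end

# Pending = total - finished, as in Network#pending_connections.
def idle?(traffic, connections = 0)
  pending = traffic.size - traffic.count { |e| e[:finished] }
  pending <= connections
end

def wait_until_idle(traffic, connections: 0, duration: 0.05, timeout: 5)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  until idle?(traffic, connections)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    raise IdleTimeout if elapsed > timeout
    sleep(duration)
  end
  true
end
```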
@@ -87,27 +115,43 @@ module Ferrum
87
115
 
88
116
  def subscribe
89
117
  @page.on("Network.requestWillBeSent") do |params|
118
+ request = Network::Request.new(params)
119
+
120
+ # We can build an exchange in two places: here on this event, or when the
121
+ # request is interrupted, so we have to be careful about when to create a
122
+ # new one. We create a new exchange only if there's none with this id, or
123
+ # there is one but it already holds a request, which means this request is
124
+ # new and the params carry the response for a redirect. In that case we
125
+ # attach that response to the previous exchange and build a new one for this request.
126
+ exchange = select(request.id).last
127
+ exchange = build_exchange(request.id) unless exchange&.blank?
128
+
90
129
  # On redirects Chrome doesn't change `requestId` and there's no
91
130
  # `Network.responseReceived` event for such request. If there's already
92
131
  # exchange object with this id then we got redirected and params has
93
132
  # `redirectResponse` key which contains the response.
94
- if exchange = first_by(params["requestId"])
95
- exchange.build_response(params)
133
+ if params["redirectResponse"]
134
+ previous_exchange = select(request.id)[-2]
135
+ response = Network::Response.new(@page, params)
136
+ previous_exchange.response = response
96
137
  end
97
138
 
98
- exchange = Network::Exchange.new(@page, params)
99
- @exchange = exchange if exchange.navigation_request?(@page.main_frame.id)
100
- @traffic << exchange
139
+ exchange.request = request
140
+
141
+ if exchange.navigation_request?(@page.main_frame.id)
142
+ @exchange = exchange
143
+ end
101
144
  end
102
145
 
103
146
  @page.on("Network.responseReceived") do |params|
104
- if exchange = last_by(params["requestId"])
105
- exchange.build_response(params)
147
+ if exchange = select(params["requestId"]).last
148
+ response = Network::Response.new(@page, params)
149
+ exchange.response = response
106
150
  end
107
151
  end
108
152
 
109
153
  @page.on("Network.loadingFinished") do |params|
110
- exchange = last_by(params["requestId"])
154
+ exchange = select(params["requestId"]).last
111
155
  if exchange && exchange.response
112
156
  exchange.response.body_size = params["encodedDataLength"]
113
157
  end
@@ -115,10 +159,10 @@ module Ferrum
115
159
 
116
160
  @page.on("Log.entryAdded") do |params|
117
161
  entry = params["entry"] || {}
118
- if entry["source"] == "network" &&
119
- entry["level"] == "error" &&
120
- exchange = last_by(entry["networkRequestId"])
121
- exchange.build_error(entry)
162
+ if entry["source"] == "network" && entry["level"] == "error"
163
+ exchange = select(entry["networkRequestId"]).last
164
+ error = Network::Error.new(entry)
165
+ exchange.error = error
122
166
  end
123
167
  end
124
168
  end
@@ -135,12 +179,12 @@ module Ferrum
135
179
  end
136
180
  end
137
181
 
138
- def first_by(request_id)
139
- @traffic.find { |e| e.request.id == request_id }
182
+ def select(request_id)
183
+ @traffic.select { |e| e.id == request_id }
140
184
  end
141
185
 
142
- def last_by(request_id)
143
- @traffic.select { |e| e.request.id == request_id }.last
186
+ def build_exchange(id)
187
+ Network::Exchange.new(@page, id).tap { |e| @traffic << e }
144
188
  end
145
189
  end
146
190
  end
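Chrome keeps the same `requestId` across redirects, so `traffic` can now hold several exchanges with one id. Two rules govern this:

- An existing exchange is reused only while it is still blank, for example when an interception created it first.
- A `redirectResponse` is attached to the hop before the newest one.

Below is a simplified model that uses Hashes in place of `Exchange` objects. `record_request` is a hypothetical name, and the `if previous` guard is an addition of this sketch.

```ruby
# Simplified model of the 0.7 Network.requestWillBeSent handler.
def record_request(traffic, params)
  id = params["requestId"]
  exchange = traffic.select { |e| e[:id] == id }.last

  # Build a new exchange unless the latest one with this id is still blank.
  unless exchange && exchange[:request].nil?
    exchange = { id: id, request: nil, response: nil }
    traffic << exchange
  end

  if params["redirectResponse"]
    # The hop that redirected is now the second-to-last with this id.
    previous = traffic.select { |e| e[:id] == id }[-2]
    previous[:response] = params["redirectResponse"] if previous
  end

  exchange[:request] = params["request"]
  exchange
end
```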
@@ -1,39 +1,37 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require "ferrum/network/error"
4
- require "ferrum/network/request"
5
- require "ferrum/network/response"
6
-
7
3
  module Ferrum
8
4
  class Network
9
5
  class Exchange
10
- attr_reader :request, :response, :error
6
+ attr_reader :id
7
+ attr_accessor :intercepted_request
8
+ attr_accessor :request, :response, :error
11
9
 
12
- def initialize(page, params)
13
- @page = page
14
- @response = @error = nil
15
- build_request(params)
10
+ def initialize(page, id)
11
+ @page, @id = page, id
12
+ @intercepted_request = nil
13
+ @request = @response = @error = nil
16
14
  end
17
15
 
18
- def build_request(params)
19
- @request = Network::Request.new(params)
16
+ def navigation_request?(frame_id)
17
+ request.type?(:document) &&
18
+ request.frame_id == frame_id
20
19
  end
21
20
 
22
- def build_response(params)
23
- @response = Network::Response.new(@page, params)
21
+ def blank?
22
+ !request
24
23
  end
25
24
 
26
- def build_error(params)
27
- @error = Network::Error.new(params)
25
+ def blocked?
26
+ intercepted_request && intercepted_request.status?(:aborted)
28
27
  end
29
28
 
30
- def navigation_request?(frame_id)
31
- request.type?(:document) &&
32
- request.frame_id == frame_id
29
+ def finished?
30
+ blocked? || response || error
33
31
  end
34
32
 
35
- def blocked?
36
- response.nil?
33
+ def pending?
34
+ !finished?
37
35
  end
38
36
 
39
37
  def to_a
@@ -41,7 +39,12 @@ module Ferrum
41
39
  end
42
40
 
43
41
  def inspect
44
- %(#<#{self.class} @id=#{@id.inspect} @request=#{@request.inspect} @response=#{@response.inspect} @error=#{@error.inspect}>)
42
+ "#<#{self.class} "\
43
+ "@id=#{@id.inspect} "\
44
+ "@intercepted_request=#{@intercepted_request.inspect} "\
45
+ "@request=#{@request.inspect} "\
46
+ "@response=#{@response.inspect} "\
47
+ "@error=#{@error.inspect}>"
45
48
  end
46
49
  end
47
50
  end
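An exchange now counts as finished once it has a response or an error, or once its intercepted request was aborted. Everything else is pending, and this is what `wait_for_idle` and the new `goto` error message rely on.

Below is a standalone mirror of those predicates. `FakeIntercepted` and `SketchExchange` are hypothetical stand-ins for `InterceptedRequest` and `Exchange`.

```ruby
# Only the status bookkeeping of InterceptedRequest.
class FakeIntercepted
  def initialize
    @status = nil
  end

  def abort
    @status = :aborted
  end

  def continue
    @status = :continued
  end

  def status?(value)
    @status == value.to_sym
  end
end

# Same predicates as Exchange in 0.7.
SketchExchange = Struct.new(:intercepted_request, :response, :error) do
  def blocked?
    intercepted_request && intercepted_request.status?(:aborted)
  end

  def finished?
    blocked? || response || error
  end

  def pending?
    !finished?
  end
end
```

A request that was intercepted and then continued stays pending until its response or error arrives.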
@@ -5,14 +5,20 @@ require "base64"
5
5
  module Ferrum
6
6
  class Network
7
7
  class InterceptedRequest
8
- attr_accessor :request_id, :frame_id, :resource_type
8
+ attr_accessor :request_id, :frame_id, :resource_type, :network_id, :status
9
9
 
10
10
  def initialize(page, params)
11
+ @status = nil
11
12
  @page, @params = page, params
12
13
  @request_id = params["requestId"]
13
14
  @frame_id = params["frameId"]
14
15
  @resource_type = params["resourceType"]
15
16
  @request = params["request"]
17
+ @network_id = params["networkId"]
18
+ end
19
+
20
+ def status?(value)
21
+ @status == value.to_sym
16
22
  end
17
23
 
18
24
  def navigation_request?
@@ -25,7 +31,7 @@ module Ferrum
25
31
 
26
32
  def respond(**options)
27
33
  has_body = options.has_key?(:body)
28
- headers = has_body ? { "content-length" => options.fetch(:body, '').length } : {}
34
+ headers = has_body ? { "content-length" => options.fetch(:body, "").length } : {}
29
35
  headers = headers.merge(options.fetch(:responseHeaders, {}))
30
36
 
31
37
  options = {responseCode: 200}.merge(options)
@@ -33,17 +39,20 @@ module Ferrum
33
39
  requestId: request_id,
34
40
  responseHeaders: header_array(headers),
35
41
  })
36
- options = options.merge(body: Base64.encode64(options.fetch(:body, '')).strip) if has_body
42
+ options = options.merge(body: Base64.encode64(options.fetch(:body, "")).strip) if has_body
37
43
 
44
+ @status = :responded
38
45
  @page.command("Fetch.fulfillRequest", **options)
39
46
  end
40
47
 
41
48
  def continue(**options)
42
49
  options = options.merge(requestId: request_id)
50
+ @status = :continued
43
51
  @page.command("Fetch.continueRequest", **options)
44
52
  end
45
53
 
46
54
  def abort
55
+ @status = :aborted
47
56
  @page.command("Fetch.failRequest", requestId: request_id, errorReason: "BlockedByClient")
48
57
  end
49
58
 
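`InterceptedRequest` now records what the handler did with a paused request. `respond`, `continue`, and `abort` set `@status` to `:responded`, `:continued`, and `:aborted`, and `status?` compares against a String or a Symbol. The sketch below reproduces the `continue`/`abort` bookkeeping against a page double that records commands. `RecordingPage` and `InterceptedRequestSketch` are hypothetical names, and `respond` is left out for brevity.

```ruby
# Hypothetical page double: records DevTools commands instead of sending them.
class RecordingPage
  attr_reader :commands

  def initialize
    @commands = []
  end

  def command(method, **params)
    @commands << [method, params]
    {}
  end
end

# Sketch of the status bookkeeping added to InterceptedRequest in 0.7.
# Only continue/abort are modeled; respond follows the same pattern with :responded.
class InterceptedRequestSketch
  attr_reader :request_id, :status

  def initialize(page, params)
    @status = nil
    @page = page
    @request_id = params["requestId"]
  end

  # Accepts a String or Symbol, as in the diff (`value.to_sym`).
  def status?(value)
    @status == value.to_sym
  end

  def continue(**options)
    @status = :continued
    @page.command("Fetch.continueRequest", **options.merge(requestId: request_id))
  end

  def abort
    @status = :aborted
    @page.command("Fetch.failRequest", requestId: request_id, errorReason: "BlockedByClient")
  end
end

page = RecordingPage.new
request = InterceptedRequestSketch.new(page, { "requestId" => "interception-1" })
request.status?(:aborted)  # => false
request.abort
request.status?("aborted") # => true
page.commands.last.first   # => "Fetch.failRequest"
```

Because `@status` is assigned before `@page.command` is called, the status reflects the handler's decision even if the DevTools command itself raises.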
@@ -34,6 +34,10 @@ module Ferrum
34
34
  def headers_size
35
35
  @response["encodedDataLength"]
36
36
  end
37
+
38
+ def type
39
+ @params["type"]
40
+ end
37
41
 
38
42
  def content_type
39
43
  @content_type ||= headers.find { |k, _| k.downcase == "content-type" }&.last&.sub(/;.*\z/, "")
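`Response#type` returns the `type` field of the params the response was built from. Assuming those are the params of the DevTools Protocol `Network.responseReceived` event, that field is the resource type, such as `"Document"`, `"Script"` or `"Image"`. The sketch below filters recorded responses by this field. `ResponseSketch` is a hypothetical stand-in for Ferrum's response class.

```ruby
# Hypothetical stand-in for Ferrum::Network::Response: keeps the raw event
# params and reads `type` the same way the new method in the diff does.
class ResponseSketch
  def initialize(params)
    @params = params
    @response = params["response"]
  end

  def type
    @params["type"]
  end

  def url
    @response["url"]
  end
end

responses = [
  ResponseSketch.new({ "type" => "Document", "response" => { "url" => "https://example.com/" } }),
  ResponseSketch.new({ "type" => "Script", "response" => { "url" => "https://example.com/app.js" } }),
  ResponseSketch.new({ "type" => "Image", "response" => { "url" => "https://example.com/logo.png" } })
]

scripts = responses.select { |r| r.type == "Script" }.map(&:url)
# => ["https://example.com/app.js"]
```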
@@ -13,6 +13,8 @@ require "ferrum/browser/client"
13
13
 
14
14
  module Ferrum
15
15
  class Page
16
+ GOTO_WAIT = ENV.fetch("FERRUM_GOTO_WAIT", 0.1).to_f
17
+
16
18
  class Event < Concurrent::Event
17
19
  def iteration
18
20
  synchronize { @iteration }
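`GOTO_WAIT` is read once from the `FERRUM_GOTO_WAIT` environment variable when the `Page` class body is evaluated, that is, when Ferrum is required. If the variable is not set, the value falls back to 0.1 seconds. Because it is a constant, the variable has to be set before Ferrum is loaded; changing `ENV` afterwards has no effect. The helper below reproduces the parsing so the fallback behavior is easy to see. `goto_wait` is a hypothetical name and is not part of Ferrum.

```ruby
# Same expression as the new constant, parameterized over the environment
# hash so it can be exercised without touching the real ENV.
def goto_wait(env = ENV)
  env.fetch("FERRUM_GOTO_WAIT", 0.1).to_f
end

goto_wait({})                              # => 0.1
goto_wait({ "FERRUM_GOTO_WAIT" => "0.5" }) # => 0.5
goto_wait({ "FERRUM_GOTO_WAIT" => "abc" }) # => 0.0
```

A non-numeric value silently becomes `0.0`. Since `command` only waits when `wait > 0`, `goto` would then skip the post-navigation wait entirely.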
@@ -65,7 +67,7 @@ module Ferrum
65
67
  def goto(url = nil)
66
68
  options = { url: combine_url!(url) }
67
69
  options.merge!(referrer: referrer) if referrer
68
- response = command("Page.navigate", wait: timeout, **options)
70
+ response = command("Page.navigate", wait: GOTO_WAIT, **options)
69
71
  # https://cs.chromium.org/chromium/src/net/base/net_error_list.h
70
72
  if %w[net::ERR_NAME_NOT_RESOLVED
71
73
  net::ERR_NAME_RESOLUTION_FAILED
@@ -74,6 +76,9 @@ module Ferrum
74
76
  raise StatusError, options[:url]
75
77
  end
76
78
  response["frameId"]
79
+ rescue TimeoutError
80
+ pendings = network.traffic.select(&:pending?).map { |e| e.request.url }
81
+ raise StatusError.new(options[:url], pendings) unless pendings.empty?
77
82
  end
78
83
 
79
84
  def close
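When `command` raises `TimeoutError`, `goto` now collects the URLs of exchanges that are still pending and raises `StatusError` with them. The sketch below models that rescue branch using hypothetical error classes and exchange structs. The real `StatusError` message is not reproduced. The diff as written has one consequence worth noting: if nothing is pending, the rescue clause evaluates to `nil`. The timeout is therefore swallowed, and `goto` returns `nil` instead of a frame id.

```ruby
# Hypothetical error classes standing in for Ferrum::TimeoutError and
# Ferrum::StatusError.
class SketchTimeoutError < StandardError; end

class SketchStatusError < StandardError
  attr_reader :url, :pendings

  def initialize(url, pendings = [])
    @url, @pendings = url, pendings
    super("Request to #{url} timed out; pending: #{pendings.join(', ')}")
  end
end

SketchRequest = Struct.new(:url)
SketchExchange = Struct.new(:request, :response, :error) do
  def pending?
    response.nil? && error.nil?
  end
end

# Mirrors the new rescue branch of Page#goto. The block stands in for
# `command("Page.navigate", wait: GOTO_WAIT, ...)`.
def goto_sketch(url, traffic)
  response = yield
  response["frameId"]
rescue SketchTimeoutError
  pendings = traffic.select(&:pending?).map { |e| e.request.url }
  raise SketchStatusError.new(url, pendings) unless pendings.empty?
end

traffic = [
  SketchExchange.new(SketchRequest.new("https://example.com/"), { "status" => 200 }, nil),
  SketchExchange.new(SketchRequest.new("https://example.com/slow.js"), nil, nil)
]

begin
  goto_sketch("https://example.com/", traffic) { raise SketchTimeoutError }
rescue SketchStatusError => e
  e.pendings # => ["https://example.com/slow.js"]
end

# Nothing pending: the timeout is swallowed and the result is nil.
goto_sketch("https://example.com/", []) { raise SketchTimeoutError } # => nil
```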
@@ -104,6 +109,7 @@ module Ferrum
104
109
  def refresh
105
110
  command("Page.reload", wait: timeout)
106
111
  end
112
+ alias_method :reload, :refresh
107
113
 
108
114
  def back
109
115
  history_navigate(delta: -1)
@@ -113,12 +119,23 @@ module Ferrum
113
119
  history_navigate(delta: 1)
114
120
  end
115
121
 
122
+ def bypass_csp(value = true)
123
+ enabled = !!value
124
+ command("Page.setBypassCSP", enabled: enabled)
125
+ enabled
126
+ end
127
+
116
128
  def command(method, wait: 0, **params)
117
129
  iteration = @event.reset if wait > 0
118
130
  result = @client.command(method, params)
119
131
  if wait > 0
120
- @event.wait(wait)
121
- @event.wait(@browser.timeout) if iteration != @event.iteration
132
+ @event.wait(wait) # Wait a bit after command and check if iteration has
133
+ # changed which means there was some network event for
134
+ # the main frame and it started to load new content.
135
+ if iteration != @event.iteration
136
+ set = @event.wait(@browser.timeout)
137
+ raise TimeoutError unless set
138
+ end
122
139
  end
123
140
  result
124
141
  end
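The reworked `command` waits in two stages. First it waits briefly (`wait` seconds) right after sending the command. Then, only if the event's iteration changed in the meantime, it waits up to the browser timeout for the load to finish, and raises `TimeoutError` if the event is never set. Per the code comment, a changed iteration means a network event for the main frame reset it because new content started loading. The sketch below models this flow with stdlib threading primitives. `IterationEvent`, `command_sketch`, and `SketchTimeoutError` are hypothetical names. In particular, the sketch assumes that `reset` bumps and returns an iteration counter, which is how the diff uses `iteration = @event.reset`.

```ruby
# Hypothetical stand-in for Ferrum::Page::Event (a Concurrent::Event subclass),
# built on Mutex/ConditionVariable so the sketch has no gem dependencies.
# Assumption: reset bumps and returns an iteration counter.
class IterationEvent
  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @set = false
    @iteration = 0
  end

  def iteration
    @mutex.synchronize { @iteration }
  end

  def reset
    @mutex.synchronize do
      @set = false
      @iteration += 1
    end
  end

  def set
    @mutex.synchronize do
      @set = true
      @cond.broadcast
    end
  end

  # Returns true if the event is (or becomes) set within `timeout` seconds.
  def wait(timeout)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @mutex.synchronize do
      until @set
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        return false if remaining <= 0
        @cond.wait(@mutex, remaining)
      end
      true
    end
  end
end

class SketchTimeoutError < StandardError; end

# Mirrors the new Page#command flow; the block stands in for
# @client.command(method, params) and `timeout` for @browser.timeout.
def command_sketch(event, wait:, timeout:)
  iteration = event.reset if wait > 0
  result = yield
  if wait > 0
    event.wait(wait) # short grace period after the command
    # A changed iteration means something reset the event meanwhile,
    # i.e. a load started, so wait for it to finish or time out.
    if iteration != event.iteration
      raise SketchTimeoutError unless event.wait(timeout)
    end
  end
  result
end

event = IterationEvent.new
command_sketch(event, wait: 0.05, timeout: 1) { :no_navigation } # => :no_navigation

# A load starts (reset) and finishes shortly after (set).
command_sketch(event, wait: 0.05, timeout: 1) do
  event.reset
  Thread.new { sleep 0.1; event.set }
  :navigated
end # => :navigated

# A load starts but never finishes.
begin
  command_sketch(event, wait: 0.05, timeout: 0.2) { event.reset; :stuck }
rescue SketchTimeoutError
  :timed_out
end # => :timed_out
```

In the first call, nothing sets the event during the grace period, so the first `event.wait` runs for the full `wait` seconds before `command_sketch` returns.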
@@ -133,6 +150,9 @@ module Ferrum
133
150
  when :request
134
151
  @client.on("Fetch.requestPaused") do |params, index, total|
135
152
  request = Network::InterceptedRequest.new(self, params)
153
+ exchange = network.select(request.network_id).last
154
+ exchange ||= network.build_exchange(request.network_id)
155
+ exchange.intercepted_request = request
136
156
  block.call(request, index, total)
137
157
  end
138
158
  when :auth
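The `Fetch.requestPaused` handler now attaches each intercepted request to the network exchange with the same `networkId`. If no such exchange has been recorded yet, it creates one. According to the DevTools Protocol documentation, `networkId` matches the `requestId` of the corresponding `Network.requestWillBeSent` event when one was fired. The sketch below models `select` and `build_exchange` with an in-memory array. `NetworkSketch`, `LinkedExchange`, and their method bodies are assumptions; only the three lines of lookup logic come from the diff.

```ruby
LinkedExchange = Struct.new(:id, :intercepted_request)

# Hypothetical in-memory stand-in for Ferrum::Network's traffic log.
class NetworkSketch
  attr_reader :traffic

  def initialize
    @traffic = []
  end

  def select(id)
    @traffic.select { |e| e.id == id }
  end

  def build_exchange(id)
    LinkedExchange.new(id).tap { |e| @traffic << e }
  end
end

# The lookup from the Fetch.requestPaused handler, extracted as a function.
def attach_intercepted(network, network_id, request)
  exchange = network.select(network_id).last
  exchange ||= network.build_exchange(network_id)
  exchange.intercepted_request = request
  exchange
end

network = NetworkSketch.new
first = attach_intercepted(network, "net-1", :paused_request)
again = attach_intercepted(network, "net-1", :paused_request_retry)
first.equal?(again)  # => true
network.traffic.size # => 1
```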
@@ -163,14 +183,6 @@ module Ferrum
163
183
  Thread.main.raise JavaScriptError.new(params.dig("exceptionDetails", "exception"))
164
184
  end
165
185
  end
166
-
167
- on("Page.domContentEventFired") do |params|
168
- # `frameStoppedLoading` doesn't occur if status isn't success
169
- if network.status != 200
170
- @event.set
171
- get_document_id
172
- end
173
- end
174
186
  end
175
187
 
176
188
  def prepare_page
@@ -40,18 +40,6 @@ module Ferrum
40
40
  frame.name = name unless name.to_s.empty?
41
41
  end
42
42
 
43
- on("Page.frameScheduledNavigation") do |params|
44
- frame = @frames[params["frameId"]]
45
- frame.state = :scheduled_navigation
46
- @event.reset
47
- end
48
-
49
- on("Page.frameClearedScheduledNavigation") do |params|
50
- frame = @frames[params["frameId"]]
51
- frame.state = :cleared_scheduled_navigation
52
- @event.set if idling?
53
- end
54
-
55
43
  on("Page.frameStoppedLoading") do |params|
56
44
  # `DOM.performSearch` doesn't work without getting #document node first.
57
45
  # It returns node with nodeId 1 and nodeType 9 from which descend the
@@ -136,7 +136,7 @@ module Ferrum
136
136
 
137
137
  def capture_screenshot(options, full)
138
138
  maybe_resize_fullscreen(full) do
139
- command("Page.captureScreenshot", options)
139
+ command("Page.captureScreenshot", **options)
140
140
  end.fetch("data")
141
141
  end
142
142
 
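`capture_screenshot` now splats `options` into `command`, whose signature is `command(method, wait: 0, **params)`. Passing the hash positionally relied on Ruby's implicit hash-to-keywords conversion. Ruby 2.7 deprecates that conversion with a warning. Ruby 3.0 removes it, so the positional call raises `ArgumentError` because `command` accepts only one positional argument. The sketch below uses a hypothetical `command` with the same signature that simply echoes its arguments.

```ruby
# Hypothetical method with the same signature as Ferrum's Page#command.
def command(method, wait: 0, **params)
  { method: method, wait: wait, params: params }
end

options = { format: "png", quality: 80 }

# Explicit splat: behaves the same on Ruby 2.x and 3.x.
command("Page.captureScreenshot", **options)
# => { method: "Page.captureScreenshot", wait: 0, params: { format: "png", quality: 80 } }
```

One side effect of this signature: a `:wait` key inside `options` binds to the `wait:` keyword instead of being forwarded as a protocol parameter.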
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Ferrum
4
- VERSION = "0.6.2"
4
+ VERSION = "0.7"
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: ferrum
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.6.2
4
+ version: '0.7'
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dmitry Vorotilin
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2019-10-30 00:00:00.000000000 Z
11
+ date: 2020-01-28 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: websocket-driver
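The gem metadata now writes the version as `'0.7'`, with quotes, where `0.6.2` appeared unquoted. This is most likely the YAML serializer at work rather than a deliberate change. The metadata is YAML, and an unquoted `0.7` would be read back as a Float, whereas `0.6.2` is not a valid number and stays a String either way. The check below uses Ruby's bundled Psych and RubyGems.

```ruby
require "yaml"
require "rubygems"

# Unquoted, YAML reads the value as a Float; quoted, it stays a String.
YAML.load("version: 0.7")["version"]   # => 0.7 (Float)
YAML.load("version: '0.7'")["version"] # => "0.7"
YAML.load("version: 0.6.2")["version"] # => "0.6.2"

# Dumping the String and loading it back preserves it as a String.
YAML.load(YAML.dump("0.7"))            # => "0.7"

# RubyGems compares versions segment by segment, so 0.7 sorts after 0.6.2.
Gem::Version.new("0.7") > Gem::Version.new("0.6.2") # => true
```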
@@ -181,7 +181,10 @@ files:
181
181
  - README.md
182
182
  - lib/ferrum.rb
183
183
  - lib/ferrum/browser.rb
184
+ - lib/ferrum/browser/chrome.rb
184
185
  - lib/ferrum/browser/client.rb
186
+ - lib/ferrum/browser/command.rb
187
+ - lib/ferrum/browser/firefox.rb
185
188
  - lib/ferrum/browser/process.rb
186
189
  - lib/ferrum/browser/subscriber.rb
187
190
  - lib/ferrum/browser/web_socket.rb