ferrum 0.6.2 → 0.7

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 9c0325708ecf4ff29cd0d11a3527207271a1a9a76c945a321810b7f37325c538
-   data.tar.gz: 9d2f4790ef55ddfb55f1588473dc94511fc95fe7c7cc937f0e49af0980457b9e
+   metadata.gz: 34f864b5679986d8580fee118735ea2bf62b5b35f1d2ae9f403cdc08168d1d48
+   data.tar.gz: 5899a23fdf4219dfd7aa70b7f1d72d0dcee167a6102b07a5062c36b8c4588f90
  SHA512:
-   metadata.gz: 8067b96de131a957317d641afeec6d145f47455b18eb119227dc1de184ed35b8b6a6cbf84aa8e1e6b3c87ff190b2218c68dd43cb1c9187fe7e7acea2cae1ca89
-   data.tar.gz: a4fb8e6749da0fa5ccbf1ab05c807f176e30c2650e23b263072281065b5ca9632201a596652989636f7b9f565ac8a23ca391e8d366d755538e7b28504b09209b
+   metadata.gz: 3732e2ba120edbd7e1d28759d6d9fe75c4416f41cc08a535e63751324faec35117a805de3dfdb1467fa3aa55c2885b8c7058d1fce46bb0b69e642210f89bac26
+   data.tar.gz: 426b53078a92ddc04187250335c200dd35dac489eb2fe887f07a7ae8bb07f0652588d278ef18d7389221f692ae3b55488cedfcec0ea8511b57d9ebc9776060db
data/README.md CHANGED
@@ -12,12 +12,31 @@ It is Ruby clean and high-level API to Chrome. Runs headless by default,
12
12
  but you can configure it to run in a non-headless mode. All you need is Ruby and
13
13
  Chrome/Chromium. Ferrum connects to the browser via DevTools Protocol.
14
14
 
15
- Relation to [Cuprite](https://github.com/machinio/cuprite). Cuprite used to have
16
- this code inside in one form or another but the thing is you don't need capybara
17
- if you are going to crawl sites. You crawl, not test. Besides that clean
18
- lightweight API to browser is what Ruby was missing, so here it comes.
19
-
20
- If you like this project, please consider to [become a backer](https://www.patreon.com/rferrum) on Patreon.
15
+ [Cuprite](https://github.com/machinio/cuprite) used to have this code inside in
16
+ one form or another but the thing is you don't need Capybara if you are going to
17
+ crawl sites. You crawl, not test. Besides that clean lightweight API to browser
18
+ is what Ruby was missing, so here it comes.
19
+
20
+ [Vessel](https://github.com/route/vessel) high-level web crawling framework
21
+ based on Ferrum.
22
+
23
+ If you like this project, please consider to _[become a backer](https://www.patreon.com/rferrum)_
24
+ on Patreon.
25
+
26
+ ## Index
27
+
28
+ * [Customization](https://github.com/route/ferrum#customization)
29
+ * [Navigation](https://github.com/route/ferrum#navigation)
30
+ * [Finders](https://github.com/route/ferrum#finders)
31
+ * [Screenshots](https://github.com/route/ferrum#screenshots)
32
+ * [Network](https://github.com/route/ferrum#network)
33
+ * [Mouse](https://github.com/route/ferrum#mouse)
34
+ * [Keyboard](https://github.com/route/ferrum#keyboard)
35
+ * [Cookies](https://github.com/route/ferrum#cookies)
36
+ * [Headers](https://github.com/route/ferrum#headers)
37
+ * [JavaScript](https://github.com/route/ferrum#javascript)
38
+ * [Frames](https://github.com/route/ferrum#frames)
39
+ * [Dialog](https://github.com/route/ferrum#dialog)
21
40
 
22
41
  ## Install
23
42
 
@@ -48,8 +67,8 @@ Interact with a page:
48
67
  browser = Ferrum::Browser.new
49
68
  browser.goto("https://google.com")
50
69
  input = browser.at_xpath("//div[@id='searchform']/form//input[@type='text']")
51
- input.focus.type("Ruby headless driver for Capybara", :Enter)
52
- browser.at_css("a > h3").text # => "machinio/cuprite: Headless Chrome driver for Capybara - GitHub"
70
+ input.focus.type("Ruby headless driver for Chrome", :Enter)
71
+ browser.at_css("a > h3").text # => "route/ferrum: Ruby Chrome/Chromium driver - GitHub"
53
72
  browser.quit
54
73
  ```
55
74
 
@@ -93,22 +112,26 @@ Ferrum::Browser.new(options)
93
112
  ```
94
113
 
95
114
  * options `Hash`
96
- * `:browser_path` (String) - Path to chrome binary, you can also set ENV
97
- variable as `BROWSER_PATH=some/path/chrome bundle exec rspec`.
98
115
  * `:headless` (Boolean) - Set browser as headless or not, `true` by default.
99
- * `:slowmo` (Integer | Float) - Set a delay to wait before sending command.
100
- Usefull companion of headless option, so that you have time to see changes.
116
+ * `:window_size` (Array) - The dimensions of the browser window in which to
117
+ test, expressed as a 2-element array, e.g. [1024, 768]. Default: [1024, 768]
118
+ * `:extensions` (Array[String | Hash]) - An array of paths to files or JS
119
+ source code to be preloaded into the browser e.g.:
120
+ `["/path/to/script.js", { source: "window.secret = 'top'" }]`
101
121
  * `:logger` (Object responding to `puts`) - When present, debug output is
102
122
  written to this object.
123
+ * `:slowmo` (Integer | Float) - Set a delay to wait before sending command.
124
+ Useful companion to the headless option, so that you have time to see changes.
103
125
  * `:timeout` (Numeric) - The number of seconds we'll wait for a response when
104
126
  communicating with browser. Default is 5.
105
127
  * `:js_errors` (Boolean) - When true, JavaScript errors get re-raised in Ruby.
106
- * `:window_size` (Array) - The dimensions of the browser window in which to
107
- test, expressed as a 2-element array, e.g. [1024, 768]. Default: [1024, 768]
128
+ * `:browser_name` (Symbol) - `:chrome` by default, only experimental support
129
+ for `:firefox` for now.
130
+ * `:browser_path` (String) - Path to chrome binary, you can also set ENV
131
+ variable as `BROWSER_PATH=some/path/chrome bundle exec rspec`.
108
132
  * `:browser_options` (Hash) - Additional command line options,
109
133
  [see them all](https://peter.sh/experiments/chromium-command-line-switches/)
110
134
  e.g. `{ "ignore-certificate-errors" => nil }`
111
- * `:extensions` (Array) - An array of JS files to be preloaded into the browser
112
135
  * `:port` (Integer) - Remote debugging port for headless Chrome
113
136
  * `:host` (String) - Remote debugging address for headless Chrome
114
137
  * `:url` (String) - URL for a running instance of Chrome. If this is set, a
@@ -342,6 +365,24 @@ browser.goto("https://github.com/")
342
365
  browser.network.status # => 200
343
366
  ```
344
367
 
368
+ #### wait_for_idle(\*\*options)
369
+
370
+ Waits for network idle or raises `Ferrum::TimeoutError` error
371
+
372
+ * options `Hash`
373
+ * :connections `Integer` how many connections are allowed for network to be
374
+ idling, `0` by default
375
+ * :duration `Float` sleep for given amount of time and check again, `0.05` by
376
+ default
377
+ * :timeout `Float` during what time we try to check idle, `browser.timeout`
378
+ by default
379
+
380
+ ```ruby
381
+ browser.goto("https://example.com/")
382
+ browser.at_xpath("//a[text() = 'No UI changes button']").click
383
+ browser.network.wait_for_idle
384
+ ```
385
+
345
386
  #### clear(type)
346
387
 
347
388
  Clear browser's cache or collected traffic.
@@ -628,6 +669,18 @@ browser.add_script_tag(url: "http://example.com/stylesheet.css") # => true
628
669
 
629
670
  ```ruby
630
671
  browser.add_style_tag(content: "h1 { font-size: 40px; }") # => true
672
+
673
+ ```
674
+ #### bypass_csp(enabled) : `Boolean`
675
+
676
+ * enabled `Boolean`, `true` by default
677
+
678
+ ```ruby
679
+ browser.bypass_csp # => true
680
+ browser.goto("https://github.com/ruby-concurrency/concurrent-ruby/blob/master/docs-source/promises.in.md")
681
+ browser.refresh
682
+ browser.add_script_tag(content: "window.__injected = 42")
683
+ browser.evaluate("window.__injected") # => 42
631
684
  ```
632
685
 
633
686
 
@@ -10,8 +10,14 @@ module Ferrum
    class NotImplementedError < Error; end

    class StatusError < Error
-     def initialize(url)
-       super("Request to #{url} failed to reach server, check DNS and/or server status")
+     def initialize(url, pendings = [])
+       message = if pendings.empty?
+         "Request to #{url} failed to reach server, check DNS and/or server status"
+       else
+         "Request to #{url} reached server, but there are still pending connections: #{pendings.join(', ')}"
+       end
+
+       super(message)
      end
    end

@@ -31,7 +37,7 @@ module Ferrum
31
37
  end
32
38
 
33
39
  class DeadBrowserError < Error
34
- def initialize(message = "Browser is dead")
40
+ def initialize(message = "Browser is dead or given window is closed")
35
41
  super
36
42
  end
37
43
  end
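Note: a minimal, self-contained sketch of how the two `StatusError` messages in this file are selected. `SketchStatusError` and `status_error_message` are hypothetical stand-ins so the snippet runs without Ferrum loaded, and the URLs are placeholders.

```ruby
# Hypothetical stand-in for Ferrum::StatusError, mirroring the diff above.
class SketchStatusError < StandardError
  def initialize(url, pendings = [])
    message = if pendings.empty?
      "Request to #{url} failed to reach server, check DNS and/or server status"
    else
      "Request to #{url} reached server, but there are still pending connections: #{pendings.join(', ')}"
    end

    super(message)
  end
end

# Convenience wrapper (an assumption of this sketch) that returns only the text.
def status_error_message(url, pendings = [])
  SketchStatusError.new(url, pendings).message
end

puts status_error_message("https://example.com/")
puts status_error_message("https://example.com/", ["https://example.com/app.js"])
```

Because `pendings` defaults to an empty array, existing one-argument callers keep the old message.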
@@ -16,15 +16,16 @@ module Ferrum
16
16
  extend Forwardable
17
17
  delegate %i[default_context] => :contexts
18
18
  delegate %i[targets create_target create_page page pages windows] => :default_context
19
- delegate %i[goto back forward refresh
19
+ delegate %i[goto back forward refresh reload
20
20
  at_css at_xpath css xpath current_url title body doctype
21
21
  headers cookies network
22
22
  mouse keyboard
23
23
  screenshot pdf viewport_size
24
24
  frames frame_by main_frame
25
25
  evaluate evaluate_on evaluate_async execute
26
- add_script_tag add_style_tag
26
+ add_script_tag add_style_tag bypass_csp
27
27
  on] => :page
28
+ delegate %i[default_user_agent] => :process
28
29
 
29
30
  attr_reader :client, :process, :contexts, :logger, :js_errors,
30
31
  :slowmo, :base_url, :options, :window_size
@@ -67,7 +68,9 @@ module Ferrum
67
68
  end
68
69
 
69
70
  def extensions
70
- @extensions ||= Array(@options[:extensions]).map { |p| File.read(p) }
71
+ @extensions ||= Array(@options[:extensions]).map do |ext|
72
+ (ext.is_a?(Hash) && ext[:source]) || File.read(ext)
73
+ end
71
74
  end
72
75
 
73
76
  def timeout
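Note: the new `extensions` rule can be tried in isolation. The sketch below copies the expression from the diff into a hypothetical free function, `resolve_extensions`, and feeds it one temporary file plus one inline source.

```ruby
require "tempfile"

# A Hash with :source is used verbatim; anything else is treated as a path
# and read from disk (same expression as in the diff above).
def resolve_extensions(extensions)
  Array(extensions).map do |ext|
    (ext.is_a?(Hash) && ext[:source]) || File.read(ext)
  end
end

Tempfile.create(["ext", ".js"]) do |file|
  file.write("window.fromFile = true")
  file.flush
  p resolve_extensions([file.path, { source: "window.secret = 'top'" }])
end
```

One thing to keep in mind: `Array()` applied to a bare Hash returns its key/value pairs, so the option is best passed as an array even when it holds a single `{ source: ... }` entry.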
@@ -0,0 +1,76 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Ferrum
4
+ class Browser
5
+ class Chrome < Command
6
+ DEFAULT_OPTIONS = {
7
+ "headless" => nil,
8
+ "disable-gpu" => nil,
9
+ "hide-scrollbars" => nil,
10
+ "mute-audio" => nil,
11
+ "enable-automation" => nil,
12
+ "disable-web-security" => nil,
13
+ "disable-session-crashed-bubble" => nil,
14
+ "disable-breakpad" => nil,
15
+ "disable-sync" => nil,
16
+ "no-first-run" => nil,
17
+ "use-mock-keychain" => nil,
18
+ "keep-alive-for-test" => nil,
19
+ "disable-popup-blocking" => nil,
20
+ "disable-extensions" => nil,
21
+ "disable-hang-monitor" => nil,
22
+ "disable-features" => "site-per-process,TranslateUI",
23
+ "disable-translate" => nil,
24
+ "disable-background-networking" => nil,
25
+ "enable-features" => "NetworkService,NetworkServiceInProcess",
26
+ "disable-background-timer-throttling" => nil,
27
+ "disable-backgrounding-occluded-windows" => nil,
28
+ "disable-client-side-phishing-detection" => nil,
29
+ "disable-default-apps" => nil,
30
+ "disable-dev-shm-usage" => nil,
31
+ "disable-ipc-flooding-protection" => nil,
32
+ "disable-prompt-on-repost" => nil,
33
+ "disable-renderer-backgrounding" => nil,
34
+ "force-color-profile" => "srgb",
35
+ "metrics-recording-only" => nil,
36
+ "safebrowsing-disable-auto-update" => nil,
37
+ "password-store" => "basic",
38
+ # Note: --no-sandbox is not needed if you properly setup a user in the container.
39
+ # https://github.com/ebidel/lighthouse-ci/blob/master/builder/Dockerfile#L35-L40
40
+ # "no-sandbox" => nil,
41
+ }.freeze
42
+
43
+ MAC_BIN_PATH = [
44
+ "/Applications/Chromium.app/Contents/MacOS/Chromium",
45
+ "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
46
+ ].freeze
47
+ LINUX_BIN_PATH = %w[chromium google-chrome-unstable google-chrome-beta
48
+ google-chrome chrome chromium-browser
49
+ google-chrome-stable].freeze
50
+
51
+ private
52
+
53
+ def combine_flags
54
+ # Doesn't work on MacOS, so we need to set it by CDP as well
55
+ @flags.merge!("window-size" => options[:window_size].join(","))
56
+
57
+ port = options.fetch(:port, BROWSER_PORT)
58
+ @flags.merge!("remote-debugging-port" => port)
59
+
60
+ host = options.fetch(:host, BROWSER_HOST)
61
+ @flags.merge!("remote-debugging-address" => host)
62
+
63
+ @flags.merge!("user-data-dir" => @user_data_dir)
64
+
65
+ @flags = DEFAULT_OPTIONS.merge(@flags)
66
+
67
+ unless options.fetch(:headless, true)
68
+ @flags.delete("headless")
69
+ @flags.delete("disable-gpu")
70
+ end
71
+
72
+ @flags.merge!(options.fetch(:browser_options, {}))
73
+ end
74
+ end
75
+ end
76
+ end
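Note: the merge order in `combine_flags` decides which value wins. The sketch below reproduces that order with a shortened, hypothetical defaults table (`SKETCH_DEFAULTS`) and a hypothetical function name. The `[1024, 768]` fallback is an assumption of the sketch; the diff reads `options[:window_size]` directly.

```ruby
# Shortened stand-in for DEFAULT_OPTIONS; the full list is in the diff above.
SKETCH_DEFAULTS = { "headless" => nil, "disable-gpu" => nil, "mute-audio" => nil }.freeze

def sketch_combine_flags(options)
  flags = { "window-size" => options.fetch(:window_size, [1024, 768]).join(",") }

  # Computed flags override the defaults.
  flags = SKETCH_DEFAULTS.merge(flags)

  # Non-headless mode drops the two headless-related switches.
  unless options.fetch(:headless, true)
    flags.delete("headless")
    flags.delete("disable-gpu")
  end

  # User-supplied browser options are merged last, so they override everything.
  flags.merge(options.fetch(:browser_options, {}))
end

p sketch_combine_flags({ headless: false, browser_options: { "mute-audio" => "0" } })
```

Since `:browser_options` is merged after the headless switches are removed, a caller can still pass `"headless" => nil` explicitly and get the flag back.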
@@ -7,20 +7,26 @@ require "ferrum/browser/web_socket"
7
7
  module Ferrum
8
8
  class Browser
9
9
  class Client
10
+ INTERRUPTIONS = %w[Fetch.requestPaused Fetch.authRequired].freeze
11
+
10
12
  def initialize(browser, ws_url, start_id = 0, allow_slowmo = true)
11
13
  @command_id = start_id
12
14
  @pendings = Concurrent::Hash.new
13
15
  @browser = browser
14
16
  @slowmo = @browser.slowmo if allow_slowmo && @browser.slowmo > 0
15
17
  @ws = WebSocket.new(ws_url, @browser.logger)
16
- @subscriber = Subscriber.new
18
+ @subscriber, @interruptor = Subscriber.build(2)
17
19
 
18
20
  @thread = Thread.new do
19
21
  Thread.current.abort_on_exception = true
20
- Thread.current.report_on_exception = true if Thread.current.respond_to?(:report_on_exception=)
22
+ if Thread.current.respond_to?(:report_on_exception=)
23
+ Thread.current.report_on_exception = true
24
+ end
21
25
 
22
26
  while message = @ws.messages.pop
23
- if message.key?("method")
27
+ if INTERRUPTIONS.include?(message["method"])
28
+ @interruptor.async.call(message)
29
+ elsif message.key?("method")
24
30
  @subscriber.async.call(message)
25
31
  else
26
32
  @pendings[message["id"]]&.set(message)
@@ -46,7 +52,12 @@ module Ferrum
46
52
  end
47
53
 
48
54
  def on(event, &block)
49
- @subscriber.on(event, &block)
55
+ case event
56
+ when *INTERRUPTIONS
57
+ @interruptor.on(event, &block)
58
+ else
59
+ @subscriber.on(event, &block)
60
+ end
50
61
  end
51
62
 
52
63
  def close
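Note: the routing rule added to the client loop can be summarised as a pure function. The sketch below only classifies messages; the names `INTERRUPTIONS_SKETCH` and `route_message` are hypothetical, and the real code dispatches to `@interruptor`, `@subscriber`, or `@pendings` instead of returning a symbol. Presumably the separate interruptor keeps `Fetch.*` handlers from queuing behind ordinary events, though the diff does not state the motivation.

```ruby
INTERRUPTIONS_SKETCH = %w[Fetch.requestPaused Fetch.authRequired].freeze

# Returns the destination a raw CDP message would be sent to under the
# rule shown in the diff above.
def route_message(message)
  if INTERRUPTIONS_SKETCH.include?(message["method"])
    :interruptor
  elsif message.key?("method")
    :subscriber
  else
    :pending # a reply to an earlier command, matched by its "id"
  end
end

p route_message({ "method" => "Fetch.requestPaused" })
p route_message({ "method" => "Network.requestWillBeSent" })
p route_message({ "id" => 7, "result" => {} })
```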
@@ -0,0 +1,56 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Ferrum
4
+ class Browser
5
+ class Command
6
+ BROWSER_HOST = "127.0.0.1"
7
+ BROWSER_PORT = "0"
8
+ NOT_FOUND = "Could not find an executable for the browser. Try to make " \
9
+ "it available on the PATH or set environment variable for " \
10
+ "example BROWSER_PATH=\"/usr/bin/chrome\"".freeze
11
+
12
+ # Currently only these browsers support CDP:
13
+ # https://github.com/cyrus-and/chrome-remote-interface#implementations
14
+ def self.build(options, user_data_dir)
15
+ case options[:browser_name]
16
+ when :firefox
17
+ Firefox
18
+ when :chrome, :opera, :edge, nil
19
+ Chrome
20
+ else
21
+ raise NotImplementedError, "not supported browser"
22
+ end.new(options, user_data_dir)
23
+ end
24
+
25
+ attr_reader :path, :flags, :options
26
+
27
+ def initialize(options, user_data_dir)
28
+ @flags = {}
29
+ @options, @user_data_dir = options, user_data_dir
30
+ @path = options[:browser_path] || ENV["BROWSER_PATH"] || detect_path
31
+ raise Cliver::Dependency::NotFound.new(NOT_FOUND) unless @path
32
+
33
+ combine_flags
34
+ end
35
+
36
+ def to_a
37
+ [path] + flags.map { |k, v| v.nil? ? "--#{k}" : "--#{k}=#{v}" }
38
+ end
39
+
40
+ private
41
+
42
+ def detect_path
43
+ if Ferrum.mac?
44
+ self.class::MAC_BIN_PATH.find { |b| File.exist?(b) }
45
+ else
46
+ self.class::LINUX_BIN_PATH
47
+ .find { |b| p = Cliver.detect(b) and break(p) }
48
+ end
49
+ end
50
+
51
+ def combine_flags
52
+ raise NotImplementedError
53
+ end
54
+ end
55
+ end
56
+ end
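Note: two idioms in `Command` are easy to misread, so here is a small sketch of each. `flags_to_argv` repeats the `to_a` expression, and `first_detected` isolates the `find { ... and break(...) }` trick from `detect_path`. Both function names are hypothetical, and the block passed to `first_detected` stands in for `Cliver.detect`.

```ruby
# nil-valued flags become bare switches, everything else becomes --key=value.
def flags_to_argv(path, flags)
  [path] + flags.map { |k, v| v.nil? ? "--#{k}" : "--#{k}=#{v}" }
end

# `find` alone would return the candidate name; `break(found)` makes the
# whole call return the detected path instead. Returns nil if nothing matches.
def first_detected(candidates)
  candidates.find { |name| found = yield(name) and break(found) }
end

p flags_to_argv("/usr/bin/chrome", { "headless" => nil, "remote-debugging-port" => "0" })

# Stub detector for illustration: pretends only "chromium-browser" is installed.
p first_detected(%w[chromium chromium-browser]) { |name|
  name == "chromium-browser" ? "/usr/bin/chromium-browser" : nil
}
```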
@@ -0,0 +1,34 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Ferrum
4
+ class Browser
5
+ class Firefox < Command
6
+ DEFAULT_OPTIONS = {
7
+ "headless" => nil,
8
+ }.freeze
9
+
10
+ MAC_BIN_PATH = [
11
+ "/Applications/Firefox.app/Contents/MacOS/firefox-bin"
12
+ ].freeze
13
+ LINUX_BIN_PATH = %w[firefox].freeze
14
+
15
+ private
16
+
17
+ def combine_flags
18
+ port = options.fetch(:port, BROWSER_PORT)
19
+ host = options.fetch(:host, BROWSER_HOST)
20
+ @flags.merge!("remote-debugger" => "#{host}:#{port}")
21
+
22
+ @flags.merge!("profile" => @user_data_dir)
23
+
24
+ @flags = DEFAULT_OPTIONS.merge(@flags)
25
+
26
+ unless options.fetch(:headless, true)
27
+ @flags.delete("headless")
28
+ end
29
+
30
+ @flags.merge!(options.fetch(:browser_options, {}))
31
+ end
32
+ end
33
+ end
34
+ end
@@ -5,6 +5,10 @@ require "net/http"
5
5
  require "json"
6
6
  require "addressable"
7
7
  require "tmpdir"
8
+ require "forwardable"
9
+ require "ferrum/browser/command"
10
+ require "ferrum/browser/chrome"
11
+ require "ferrum/browser/firefox"
8
12
 
9
13
  module Ferrum
10
14
  class Browser
@@ -12,52 +16,14 @@ module Ferrum
12
16
  KILL_TIMEOUT = 2
13
17
  WAIT_KILLED = 0.05
14
18
  PROCESS_TIMEOUT = ENV.fetch("FERRUM_PROCESS_TIMEOUT", 2).to_i
15
- BROWSER_PATH = ENV["BROWSER_PATH"]
16
- BROWSER_HOST = "127.0.0.1"
17
- BROWSER_PORT = "0"
18
- DEFAULT_OPTIONS = {
19
- "headless" => nil,
20
- "disable-gpu" => nil,
21
- "hide-scrollbars" => nil,
22
- "mute-audio" => nil,
23
- "enable-automation" => nil,
24
- "disable-web-security" => nil,
25
- "disable-session-crashed-bubble" => nil,
26
- "disable-breakpad" => nil,
27
- "disable-sync" => nil,
28
- "no-first-run" => nil,
29
- "use-mock-keychain" => nil,
30
- "keep-alive-for-test" => nil,
31
- "disable-popup-blocking" => nil,
32
- "disable-extensions" => nil,
33
- "disable-hang-monitor" => nil,
34
- "disable-features" => "site-per-process,TranslateUI",
35
- "disable-translate" => nil,
36
- "disable-background-networking" => nil,
37
- "enable-features" => "NetworkService,NetworkServiceInProcess",
38
- "disable-background-timer-throttling" => nil,
39
- "disable-backgrounding-occluded-windows" => nil,
40
- "disable-client-side-phishing-detection" => nil,
41
- "disable-default-apps" => nil,
42
- "disable-dev-shm-usage" => nil,
43
- "disable-ipc-flooding-protection" => nil,
44
- "disable-prompt-on-repost" => nil,
45
- "disable-renderer-backgrounding" => nil,
46
- "force-color-profile" => "srgb",
47
- "metrics-recording-only" => nil,
48
- "safebrowsing-disable-auto-update" => nil,
49
- "password-store" => "basic",
50
- # Note: --no-sandbox is not needed if you properly setup a user in the container.
51
- # https://github.com/ebidel/lighthouse-ci/blob/master/builder/Dockerfile#L35-L40
52
- # "no-sandbox" => nil,
53
- }.freeze
54
-
55
- NOT_FOUND = "Could not find an executable for chrome. Try to make it " \
56
- "available on the PATH or set environment varible for " \
57
- "example BROWSER_PATH=\"/Applications/Chromium.app/Contents/MacOS/Chromium\""
58
-
59
-
60
- attr_reader :host, :port, :ws_url, :pid, :path, :options, :cmd
19
+
20
+ attr_reader :host, :port, :ws_url, :pid, :command,
21
+ :default_user_agent, :browser_version, :protocol_version,
22
+ :v8_version, :webkit_version
23
+
24
+
25
+ extend Forwardable
26
+ delegate path: :command
61
27
 
62
28
  def self.start(*args)
63
29
  new(*args).tap(&:start)
@@ -85,65 +51,24 @@ module Ferrum
85
51
  end
86
52
 
87
53
  def self.directory_remover(path)
88
- proc do
89
- begin
90
- FileUtils.remove_entry(path)
91
- rescue Errno::ENOENT
92
- end
93
- end
94
- end
95
-
96
- def self.detect_browser_path
97
- if RUBY_PLATFORM.include?("darwin")
98
- [
99
- "/Applications/Chromium.app/Contents/MacOS/Chromium",
100
- "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
101
- ].find { |path| File.exist?(path) }
102
- else
103
- %w[chromium google-chrome-unstable google-chrome-beta google-chrome chrome chromium-browser google-chrome-stable].reduce(nil) do |path, exe|
104
- path = Cliver.detect(exe)
105
- break path if path
106
- end
107
- end
54
+ proc { FileUtils.remove_entry(path) rescue Errno::ENOENT }
108
55
  end
109
56
 
110
57
  def initialize(options)
111
- @options = {}
112
-
113
- @path = options[:browser_path] || BROWSER_PATH || self.class.detect_browser_path
114
-
115
58
  if options[:url]
116
59
  url = URI.join(options[:url].to_s, "/json/version")
117
60
  response = JSON.parse(::Net::HTTP.get(url))
118
61
  set_ws_url(response["webSocketDebuggerUrl"])
62
+ parse_browser_versions
119
63
  return
120
64
  end
121
65
 
122
- # Doesn't work on MacOS, so we need to set it by CDP as well
123
- @options.merge!("window-size" => options[:window_size].join(","))
124
-
125
- port = options.fetch(:port, BROWSER_PORT)
126
- @options.merge!("remote-debugging-port" => port)
127
-
128
- host = options.fetch(:host, BROWSER_HOST)
129
- @options.merge!("remote-debugging-address" => host)
130
-
131
- @temp_user_data_dir = Dir.mktmpdir
132
- ObjectSpace.define_finalizer(self, self.class.directory_remover(@temp_user_data_dir))
133
- @options.merge!("user-data-dir" => @temp_user_data_dir)
134
-
135
- @options = DEFAULT_OPTIONS.merge(@options)
136
-
137
- unless options.fetch(:headless, true)
138
- @options.delete("headless")
139
- @options.delete("disable-gpu")
140
- end
141
-
66
+ @logger = options[:logger]
142
67
  @process_timeout = options.fetch(:process_timeout, PROCESS_TIMEOUT)
143
68
 
144
- @options.merge!(options.fetch(:browser_options, {}))
145
-
146
- @logger = options[:logger]
69
+ tmpdir = Dir.mktmpdir
70
+ ObjectSpace.define_finalizer(self, self.class.directory_remover(tmpdir))
71
+ @command = Command.build(options, tmpdir)
147
72
  end
148
73
 
149
74
  def start
@@ -156,13 +81,11 @@ module Ferrum
156
81
  process_options[:pgroup] = true unless Ferrum.windows?
157
82
  process_options[:out] = process_options[:err] = write_io
158
83
 
159
- raise Cliver::Dependency::NotFound.new(NOT_FOUND) unless @path
160
-
161
- @cmd = [@path] + @options.map { |k, v| v.nil? ? "--#{k}" : "--#{k}=#{v}" }
162
- @pid = ::Process.spawn(*@cmd, process_options)
84
+ @pid = ::Process.spawn(*@command.to_a, process_options)
163
85
  ObjectSpace.define_finalizer(self, self.class.process_killer(@pid))
164
86
 
165
87
  parse_ws_url(read_io, @process_timeout)
88
+ parse_browser_versions
166
89
  ensure
167
90
  close_io(read_io, write_io)
168
91
  end
@@ -170,7 +93,7 @@ module Ferrum
170
93
 
171
94
  def stop
172
95
  kill if @pid
173
- remove_temp_user_data_dir if @temp_user_data_dir
96
+ remove_user_data_dir if @user_data_dir
174
97
  ObjectSpace.undefine_finalizer(self)
175
98
  end
176
99
 
@@ -186,9 +109,9 @@ module Ferrum
186
109
  @pid = nil
187
110
  end
188
111
 
189
- def remove_temp_user_data_dir
190
- self.class.directory_remover(@temp_user_data_dir).call
191
- @temp_user_data_dir = nil
112
+ def remove_user_data_dir
113
+ self.class.directory_remover(@user_data_dir).call
114
+ @user_data_dir = nil
192
115
  end
193
116
 
194
117
  def parse_ws_url(read_io, timeout)
@@ -211,7 +134,7 @@ module Ferrum
211
134
 
212
135
  unless ws_url
213
136
  @logger.puts output if @logger
214
- raise "Chrome process did not produce websocket url within #{timeout} seconds"
137
+ raise "Browser process did not produce websocket url within #{timeout} seconds"
215
138
  end
216
139
  end
217
140
 
@@ -221,12 +144,25 @@ module Ferrum
221
144
  @port = @ws_url.port
222
145
  end
223
146
 
147
+ def parse_browser_versions
148
+ return unless ws_url.is_a?(Addressable::URI)
149
+
150
+ version_url = URI.parse(ws_url.merge(scheme: "http", path: "/json/version"))
151
+ response = JSON.parse(::Net::HTTP.get(version_url))
152
+
153
+ @v8_version = response["V8-Version"]
154
+ @browser_version = response["Browser"]
155
+ @webkit_version = response["WebKit-Version"]
156
+ @default_user_agent = response["User-Agent"]
157
+ @protocol_version = response["Protocol-Version"]
158
+ end
159
+
224
160
  def close_io(*ios)
225
161
  ios.each do |io|
226
162
  begin
227
163
  io.close unless io.closed?
228
164
  rescue IOError
229
- raise unless RUBY_ENGINE == 'jruby'
165
+ raise unless RUBY_ENGINE == "jruby"
230
166
  end
231
167
  end
232
168
  end
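Note: `parse_browser_versions` maps five keys of the `/json/version` response onto reader attributes. The sketch below performs the same mapping on a hypothetical payload; every value in `SAMPLE_VERSION_JSON` is invented for illustration, and `extract_versions` is not part of Ferrum.

```ruby
require "json"

# Hypothetical /json/version body; the values are placeholders.
SAMPLE_VERSION_JSON = <<~JSON
  {
    "Browser": "HeadlessChrome/0.0.0.0",
    "Protocol-Version": "1.3",
    "User-Agent": "Mozilla/5.0 (sketch)",
    "V8-Version": "0.0.0",
    "WebKit-Version": "0.0 (sketch)"
  }
JSON

# Same key-to-attribute mapping as in the diff above, returned as a Hash.
def extract_versions(body)
  response = JSON.parse(body)
  {
    v8_version: response["V8-Version"],
    browser_version: response["Browser"],
    webkit_version: response["WebKit-Version"],
    default_user_agent: response["User-Agent"],
    protocol_version: response["Protocol-Version"]
  }
end

p extract_versions(SAMPLE_VERSION_JSON)
```

Keys missing from the response simply come back as `nil`.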
@@ -7,6 +7,10 @@ module Ferrum
7
7
  class Subscriber
8
8
  include Concurrent::Async
9
9
 
10
+ def self.build(size)
11
+ (0..size).map { new }
12
+ end
13
+
10
14
  def initialize
11
15
  super
12
16
  @on = Hash.new { |h, k| h[k] = [] }
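Note: `(0..size)` is an inclusive range, so `Subscriber.build(2)` as written creates three objects; the destructuring assignment in the client keeps the first two and drops the third. The sketch below contrasts this with `Array.new(size)`. `SketchSubscriber` and both method names are hypothetical.

```ruby
class SketchSubscriber
  # Same expression as in the diff: inclusive range, size + 1 objects.
  def self.build_inclusive(size)
    (0..size).map { new }
  end

  # Alternative that creates exactly `size` objects.
  def self.build_exact(size)
    Array.new(size) { new }
  end
end

subscriber, interruptor = SketchSubscriber.build_inclusive(2)
puts SketchSubscriber.build_inclusive(2).size # 3
puts SketchSubscriber.build_exact(2).size     # 2
puts subscriber.equal?(interruptor)           # false, two distinct objects
```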
@@ -18,8 +18,6 @@ module Ferrum
  # Can be one of:
  # * started_loading
  # * navigated
- # * scheduled_navigation
- # * cleared_scheduled_navigation
  # * stopped_loading
  def state=(value)
    @state = value
@@ -232,7 +232,7 @@ module Ferrum

  const seen = [];
  function detectCycle(obj) {
-   if (typeof obj === 'object') {
+   if (typeof obj === "object") {
      if (seen.indexOf(obj) !== -1) {
        return true;
      }
@@ -40,7 +40,7 @@ module Ferrum

  def set_overrides(user_agent: nil, accept_language: nil, platform: nil)
    options = Hash.new
-   options[:userAgent] = user_agent if user_agent
+   options[:userAgent] = user_agent || @page.browser.default_user_agent
    options[:acceptLanguage] = accept_language if accept_language
    options[:platform] if platform

@@ -3,6 +3,9 @@
3
3
  require "ferrum/network/exchange"
4
4
  require "ferrum/network/intercepted_request"
5
5
  require "ferrum/network/auth_request"
6
+ require "ferrum/network/error"
7
+ require "ferrum/network/request"
8
+ require "ferrum/network/response"
6
9
 
7
10
  module Ferrum
8
11
  class Network
@@ -20,6 +23,31 @@ module Ferrum
20
23
  @exchange = nil
21
24
  end
22
25
 
26
+ def wait_for_idle(connections: 0, duration: 0.05, timeout: @page.browser.timeout)
27
+ start = Ferrum.monotonic_time
28
+
29
+ until idle?(connections)
30
+ raise TimeoutError if Ferrum.timeout?(start, timeout)
31
+ sleep(duration)
32
+ end
33
+ end
34
+
35
+ def idle?(connections = 0)
36
+ pending_connections <= connections
37
+ end
38
+
39
+ def total_connections
40
+ @traffic.size
41
+ end
42
+
43
+ def finished_connections
44
+ @traffic.count(&:finished?)
45
+ end
46
+
47
+ def pending_connections
48
+ total_connections - finished_connections
49
+ end
50
+
23
51
  def request
24
52
  @exchange&.request
25
53
  end
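Note: `wait_for_idle` is a poll-and-sleep loop over a monotonic clock. The sketch below isolates that loop. `SketchTimeoutError`, `monotonic_now`, and `wait_for_idle_sketch` are hypothetical names, the block stands in for `pending_connections`, and the elapsed-time comparison is an assumption about what `Ferrum.timeout?` does.

```ruby
class SketchTimeoutError < StandardError; end

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Calls the block until it reports at most `connections` pending requests,
# sleeping `duration` seconds between checks and giving up after `timeout`.
def wait_for_idle_sketch(connections: 0, duration: 0.05, timeout: 5)
  start = monotonic_now

  loop do
    pending = yield
    return true if pending <= connections
    raise SketchTimeoutError if monotonic_now - start > timeout

    sleep(duration)
  end
end

remaining = 3
wait_for_idle_sketch(duration: 0.01) { remaining -= 1 }
puts remaining
```

A monotonic clock is used so the deadline is not affected by wall-clock adjustments.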
@@ -87,27 +115,43 @@ module Ferrum
87
115
 
88
116
  def subscribe
89
117
  @page.on("Network.requestWillBeSent") do |params|
118
+ request = Network::Request.new(params)
119
+
120
+ # We can build exchange in two places, here on the event or when request
121
+ # is interrupted. So we have to be careful when to create new one. We
122
+ # create new exchange only if there's no with such id or there's but
123
+ # it's filled with request which means this one is new but has response
124
+ # for a redirect. So we assign response from the params to previous
125
+ # exchange and build new exchange to assign this request to it.
126
+ exchange = select(request.id).last
127
+ exchange = build_exchange(request.id) unless exchange&.blank?
128
+
90
129
  # On redirects Chrome doesn't change `requestId` and there's no
91
130
  # `Network.responseReceived` event for such request. If there's already
92
131
  # exchange object with this id then we got redirected and params has
93
132
  # `redirectResponse` key which contains the response.
94
- if exchange = first_by(params["requestId"])
95
- exchange.build_response(params)
133
+ if params["redirectResponse"]
134
+ previous_exchange = select(request.id)[-2]
135
+ response = Network::Response.new(@page, params)
136
+ previous_exchange.response = response
96
137
  end
97
138
 
98
- exchange = Network::Exchange.new(@page, params)
99
- @exchange = exchange if exchange.navigation_request?(@page.main_frame.id)
100
- @traffic << exchange
139
+ exchange.request = request
140
+
141
+ if exchange.navigation_request?(@page.main_frame.id)
142
+ @exchange = exchange
143
+ end
101
144
  end
102
145
 
103
146
  @page.on("Network.responseReceived") do |params|
104
- if exchange = last_by(params["requestId"])
105
- exchange.build_response(params)
147
+ if exchange = select(params["requestId"]).last
148
+ response = Network::Response.new(@page, params)
149
+ exchange.response = response
106
150
  end
107
151
  end
108
152
 
109
153
  @page.on("Network.loadingFinished") do |params|
110
- exchange = last_by(params["requestId"])
154
+ exchange = select(params["requestId"]).last
111
155
  if exchange && exchange.response
112
156
  exchange.response.body_size = params["encodedDataLength"]
113
157
  end
@@ -115,10 +159,10 @@ module Ferrum
115
159
 
116
160
  @page.on("Log.entryAdded") do |params|
117
161
  entry = params["entry"] || {}
118
- if entry["source"] == "network" &&
119
- entry["level"] == "error" &&
120
- exchange = last_by(entry["networkRequestId"])
121
- exchange.build_error(entry)
162
+ if entry["source"] == "network" && entry["level"] == "error"
163
+ exchange = select(entry["networkRequestId"]).last
164
+ error = Network::Error.new(entry)
165
+ exchange.error = error
122
166
  end
123
167
  end
124
168
  end
@@ -135,12 +179,12 @@ module Ferrum
135
179
  end
136
180
  end
137
181
 
138
- def first_by(request_id)
139
- @traffic.find { |e| e.request.id == request_id }
182
+ def select(request_id)
183
+ @traffic.select { |e| e.id == request_id }
140
184
  end
141
185
 
142
- def last_by(request_id)
143
- @traffic.select { |e| e.request.id == request_id }.last
186
+ def build_exchange(id)
187
+ Network::Exchange.new(@page, id).tap { |e| @traffic << e }
144
188
  end
145
189
  end
146
190
  end
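Note: the reworked `Network.requestWillBeSent` handler relies on several exchanges sharing one request id during a redirect. The sketch below replays that bookkeeping with plain strings in place of request and response objects. `SketchExchange`, `SketchTraffic`, and `request_will_be_sent` are hypothetical names, and the interception path that can create a blank exchange earlier is left out.

```ruby
SketchExchange = Struct.new(:id, :request, :response) do
  def blank?
    !request
  end
end

class SketchTraffic
  attr_reader :traffic

  def initialize
    @traffic = []
  end

  def select(id)
    @traffic.select { |e| e.id == id }
  end

  def build_exchange(id)
    SketchExchange.new(id).tap { |e| @traffic << e }
  end

  # Mirrors the handler in the diff: reuse a blank exchange if one exists,
  # otherwise build a new one; a redirect response goes to the previous one.
  def request_will_be_sent(id, request, redirect_response = nil)
    exchange = select(id).last
    exchange = build_exchange(id) unless exchange&.blank?
    select(id)[-2].response = redirect_response if redirect_response
    exchange.request = request
    exchange
  end
end

log = SketchTraffic.new
log.request_will_be_sent("1", "GET /old")
log.request_will_be_sent("1", "GET /new", "301 Moved")
p log.traffic.map(&:to_a)
```

After the second call the first exchange holds the redirect response and the second exchange holds the new request, both under id `"1"`.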
@@ -1,39 +1,37 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require "ferrum/network/error"
4
- require "ferrum/network/request"
5
- require "ferrum/network/response"
6
-
7
3
  module Ferrum
8
4
  class Network
9
5
  class Exchange
10
- attr_reader :request, :response, :error
6
+ attr_reader :id
7
+ attr_accessor :intercepted_request
8
+ attr_accessor :request, :response, :error
11
9
 
12
- def initialize(page, params)
13
- @page = page
14
- @response = @error = nil
15
- build_request(params)
10
+ def initialize(page, id)
11
+ @page, @id = page, id
12
+ @intercepted_request = nil
13
+ @request = @response = @error = nil
16
14
  end
17
15
 
18
- def build_request(params)
19
- @request = Network::Request.new(params)
16
+ def navigation_request?(frame_id)
17
+ request.type?(:document) &&
18
+ request.frame_id == frame_id
20
19
  end
21
20
 
22
- def build_response(params)
23
- @response = Network::Response.new(@page, params)
21
+ def blank?
22
+ !request
24
23
  end
25
24
 
26
- def build_error(params)
27
- @error = Network::Error.new(params)
25
+ def blocked?
26
+ intercepted_request && intercepted_request.status?(:aborted)
28
27
  end
29
28
 
30
- def navigation_request?(frame_id)
31
- request.type?(:document) &&
32
- request.frame_id == frame_id
29
+ def finished?
30
+ blocked? || response || error
33
31
  end
34
32
 
35
- def blocked?
36
- response.nil?
33
+ def pending?
34
+ !finished?
37
35
  end
38
36
 
39
37
  def to_a
@@ -41,7 +39,12 @@ module Ferrum
41
39
  end
42
40
 
43
41
  def inspect
44
- %(#<#{self.class} @id=#{@id.inspect} @request=#{@request.inspect} @response=#{@response.inspect} @error=#{@error.inspect}>)
42
+ "#<#{self.class} "\
43
+ "@id=#{@id.inspect} "\
44
+ "@intercepted_request=#{@intercepted_request.inspect} "\
45
+ "@request=#{@request.inspect} "\
46
+ "@response=#{@response.inspect} "\
47
+ "@error=#{@error.inspect}>"
45
48
  end
46
49
  end
47
50
  end
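Note: the new `Exchange` predicates separate "blocked" from "still in flight". In 0.6.2 `blocked?` was `response.nil?`, which was also true for a request that had not been answered yet. The sketch below models the new rules with hypothetical `Struct` stand-ins.

```ruby
SketchIntercepted = Struct.new(:status) do
  def status?(value)
    status == value.to_sym
  end
end

SketchState = Struct.new(:intercepted_request, :response, :error) do
  # Only an explicitly aborted interception counts as blocked.
  def blocked?
    intercepted_request && intercepted_request.status?(:aborted)
  end

  # Truthy (not strictly true/false), as in the diff.
  def finished?
    blocked? || response || error
  end

  def pending?
    !finished?
  end
end

aborted   = SketchState.new(SketchIntercepted.new(:aborted))
in_flight = SketchState.new
answered  = SketchState.new(nil, "200 OK")

puts aborted.pending?   # false
puts in_flight.pending? # true
puts answered.pending?  # false
```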
@@ -5,14 +5,20 @@ require "base64"
5
5
  module Ferrum
6
6
  class Network
7
7
  class InterceptedRequest
8
- attr_accessor :request_id, :frame_id, :resource_type
8
+ attr_accessor :request_id, :frame_id, :resource_type, :network_id, :status
9
9
 
10
10
  def initialize(page, params)
11
+ @status = nil
11
12
  @page, @params = page, params
12
13
  @request_id = params["requestId"]
13
14
  @frame_id = params["frameId"]
14
15
  @resource_type = params["resourceType"]
15
16
  @request = params["request"]
17
+ @network_id = params["networkId"]
18
+ end
19
+
20
+ def status?(value)
21
+ @status == value.to_sym
16
22
  end
17
23
 
18
24
  def navigation_request?
@@ -25,7 +31,7 @@ module Ferrum
25
31
 
26
32
  def respond(**options)
27
33
  has_body = options.has_key?(:body)
28
- headers = has_body ? { "content-length" => options.fetch(:body, '').length } : {}
34
+ headers = has_body ? { "content-length" => options.fetch(:body, "").length } : {}
29
35
  headers = headers.merge(options.fetch(:responseHeaders, {}))
30
36
 
31
37
  options = {responseCode: 200}.merge(options)
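Note: `respond` derives `content-length` from `String#length` and encodes the body with `Base64.encode64(...).strip`. The sketch below shows two standard-library facts that are relevant when reading that code. It does not claim Chrome rejects the current values; the helper names are hypothetical.

```ruby
require "base64"

# Byte count, which is what an HTTP Content-Length header describes.
def content_length_for(body)
  body.bytesize
end

# Base64 without the line breaks that encode64 inserts every 60 characters.
def encode_body(body)
  Base64.strict_encode64(body)
end

body = "h\u00e9llo"
puts body.length              # 5 characters
puts content_length_for(body) # 6 bytes in UTF-8

long = "a" * 100
puts Base64.encode64(long).strip.include?("\n") # true, inner newlines survive strip
puts encode_body(long).include?("\n")           # false
```

For ASCII-only bodies shorter than 46 bytes the two approaches produce identical output.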
@@ -33,17 +39,20 @@ module Ferrum
           requestId: request_id,
           responseHeaders: header_array(headers),
         })
-        options = options.merge(body: Base64.encode64(options.fetch(:body, '')).strip) if has_body
+        options = options.merge(body: Base64.encode64(options.fetch(:body, "")).strip) if has_body
 
+        @status = :responded
         @page.command("Fetch.fulfillRequest", **options)
       end
 
       def continue(**options)
         options = options.merge(requestId: request_id)
+        @status = :continued
         @page.command("Fetch.continueRequest", **options)
       end
 
       def abort
+        @status = :aborted
         @page.command("Fetch.failRequest", requestId: request_id, errorReason: "BlockedByClient")
       end
 
@@ -34,6 +34,10 @@ module Ferrum
       def headers_size
         @response["encodedDataLength"]
       end
+
+      def type
+        @params["type"]
+      end
 
       def content_type
         @content_type ||= headers.find { |k, _| k.downcase == "content-type" }&.last&.sub(/;.*\z/, "")
@@ -13,6 +13,8 @@ require "ferrum/browser/client"
 
 module Ferrum
   class Page
+    GOTO_WAIT = ENV.fetch("FERRUM_GOTO_WAIT", 0.1).to_f
+
     class Event < Concurrent::Event
       def iteration
         synchronize { @iteration }
@@ -65,7 +67,7 @@ module Ferrum
     def goto(url = nil)
       options = { url: combine_url!(url) }
       options.merge!(referrer: referrer) if referrer
-      response = command("Page.navigate", wait: timeout, **options)
+      response = command("Page.navigate", wait: GOTO_WAIT, **options)
       # https://cs.chromium.org/chromium/src/net/base/net_error_list.h
       if %w[net::ERR_NAME_NOT_RESOLVED
             net::ERR_NAME_RESOLUTION_FAILED
@@ -74,6 +76,9 @@ module Ferrum
         raise StatusError, options[:url]
       end
       response["frameId"]
+    rescue TimeoutError
+      pendings = network.traffic.select(&:pending?).map { |e| e.request.url }
+      raise StatusError.new(options[:url], pendings) unless pendings.empty?
     end
 
     def close
@@ -104,6 +109,7 @@ module Ferrum
     def refresh
       command("Page.reload", wait: timeout)
     end
+    alias_method :reload, :refresh
 
     def back
       history_navigate(delta: -1)
@@ -113,12 +119,23 @@ module Ferrum
       history_navigate(delta: 1)
     end
 
+    def bypass_csp(value = true)
+      enabled = !!value
+      command("Page.setBypassCSP", enabled: enabled)
+      enabled
+    end
+
     def command(method, wait: 0, **params)
       iteration = @event.reset if wait > 0
       result = @client.command(method, params)
       if wait > 0
-        @event.wait(wait)
-        @event.wait(@browser.timeout) if iteration != @event.iteration
+        @event.wait(wait) # Wait a bit after command and check if iteration has
+                          # changed which means there was some network event for
+                          # the main frame and it started to load new content.
+        if iteration != @event.iteration
+          set = @event.wait(@browser.timeout)
+          raise TimeoutError unless set
+        end
       end
       result
     end
@@ -133,6 +150,9 @@ module Ferrum
       when :request
         @client.on("Fetch.requestPaused") do |params, index, total|
           request = Network::InterceptedRequest.new(self, params)
+          exchange = network.select(request.network_id).last
+          exchange ||= network.build_exchange(request.network_id)
+          exchange.intercepted_request = request
           block.call(request, index, total)
         end
       when :auth
@@ -163,14 +183,6 @@ module Ferrum
           Thread.main.raise JavaScriptError.new(params.dig("exceptionDetails", "exception"))
         end
       end
-
-      on("Page.domContentEventFired") do |params|
-        # `frameStoppedLoading` doesn't occur if status isn't success
-        if network.status != 200
-          @event.set
-          get_document_id
-        end
-      end
     end
 
     def prepare_page
@@ -40,18 +40,6 @@ module Ferrum
           frame.name = name unless name.to_s.empty?
         end
 
-        on("Page.frameScheduledNavigation") do |params|
-          frame = @frames[params["frameId"]]
-          frame.state = :scheduled_navigation
-          @event.reset
-        end
-
-        on("Page.frameClearedScheduledNavigation") do |params|
-          frame = @frames[params["frameId"]]
-          frame.state = :cleared_scheduled_navigation
-          @event.set if idling?
-        end
-
         on("Page.frameStoppedLoading") do |params|
           # `DOM.performSearch` doesn't work without getting #document node first.
           # It returns node with nodeId 1 and nodeType 9 from which descend the
@@ -136,7 +136,7 @@ module Ferrum
 
       def capture_screenshot(options, full)
         maybe_resize_fullscreen(full) do
-          command("Page.captureScreenshot", options)
+          command("Page.captureScreenshot", **options)
         end.fetch("data")
       end
 
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module Ferrum
-  VERSION = "0.6.2"
+  VERSION = "0.7"
 end
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: ferrum
 version: !ruby/object:Gem::Version
-  version: 0.6.2
+  version: '0.7'
 platform: ruby
 authors:
 - Dmitry Vorotilin
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-10-30 00:00:00.000000000 Z
+date: 2020-01-28 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: websocket-driver
@@ -181,7 +181,10 @@ files:
 - README.md
 - lib/ferrum.rb
 - lib/ferrum/browser.rb
+- lib/ferrum/browser/chrome.rb
 - lib/ferrum/browser/client.rb
+- lib/ferrum/browser/command.rb
+- lib/ferrum/browser/firefox.rb
 - lib/ferrum/browser/process.rb
 - lib/ferrum/browser/subscriber.rb
 - lib/ferrum/browser/web_socket.rb