rubium 0.1.0 → 0.2.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 124da1f21fad244bbdeb8adee803043896502e4c58964753bbba6ea8083df750
-   data.tar.gz: 13cda3fb2bd4f121700d08f03a8fbc3b15e38b785dca30074afa13121f4592b0
+   metadata.gz: e10835f0d5dff1baa1cc0e65f3a916372c9f97ecf02c80ae337b192aca99c082
+   data.tar.gz: 5b5d44b8d329282fe8dfef8d7867ed9d1d57174e7fdc4d1ecc07180ba720e491
  SHA512:
-   metadata.gz: e724053ddf9d97bbaf77db2c3647e1988c5bbace1a56b2f55f72afde1758ace0aaabd1f9d1771647361f8a863a6c33c35a8b616a7b8c4bd4f61278a750b33af0
-   data.tar.gz: 8decfacf86c9751c6ae39e61cae9b8fa1b29a3cb769f8ac3e11e15122364173bf0d20e649e058fe150023e1a7ea38ac13fd919d14abddef6fbfa961d64ca6987
+   metadata.gz: 56fdf04341dbf6b119d5682ad80d69beafe9147552bb26f53ba1c94e40ba509c7846d669de16b79f2ca667c18fceb8dc3d540d0922de934269ecc885f1a549c1
+   data.tar.gz: 611689e94f1846eb5b1f25a2d22b65666c34a28ab4d95239015f575c26f30f4e8dc9b4931aa8ebbb003a5a7f8a29224019036c3f49934d2f3e2c745fa63b4114
data/.gitignore CHANGED
@@ -1,4 +1,5 @@
  /.bundle/
+ /.claude/
  /.yardoc
  /_yardoc/
  /coverage/
data/CHANGELOG.md ADDED
@@ -0,0 +1,28 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [0.2.1] - 2025-12-07
+
+ ### Added
+ - Added CHANGELOG.md to track version changes
+ - Require fileutils to make sure fileutils included in newer ruby versions
+ - Update dependencies
+
+ ### Changed
+ - Updated MAX_CONNECT_WAIT_TIME to 6 seconds
+ - Added option to provide custom data-dir path
+
+ ## [0.2.0] - 2019-02-24
+
+ - Added logging support
+ - Added cookies and restart_after options
+ - Added urls_blacklist and disable_images options
+ - Various bug fixes and improvements
+
+ ## [0.1.0] - 2019-01-XX
+
+ - Initial release
data/LICENSE.txt CHANGED
@@ -1,6 +1,6 @@
  The MIT License (MIT)

- Copyright (c) 2018 Victor Afanasev
+ Copyright (c) 2025 Victor Afanasev

  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -1,6 +1,8 @@
  # Rubium

- Rubium is a handy wrapper around [chrome_remote](https://github.com/cavalle/chrome_remote) gem. It adds browsers instances handling, and some Capybara-like methods. It is very lightweight (200 lines of code in the main `Rubium::Browser` class for now) and doens't use Selenium or Capybara.
+ ## Description
+
+ Rubium is a handy wrapper around [chrome_remote](https://github.com/cavalle/chrome_remote) gem. It adds browsers instances handling, and some Capybara-like methods. It is very lightweight (250 lines of code in the main `Rubium::Browser` class for now) and doens't use Selenium or Capybara. Consider Rubium as a _very simple_ and _basic_ implementation of [Puppeteer](https://github.com/GoogleChrome/puppeteer) in Ruby language.

  You can use Rubium as a lightweight alternative to Selenium/Capybara/Watir if you need to perform some operations (like web scraping) using Headless Chromium and Ruby. Of course, the API currently doesn't has a lot of methods to automate browser, but it has the most frequently used and basic ones.

@@ -22,6 +24,12 @@ browser.click("some selector")
  # Get current cookies:
  browser.cookies

+ # Set cookies (Array of hashes):
+ browser.set_cookies([
+   { name: "some_cookie_name", value: "some_cookie_value", domain: ".some-cookie-domain.com" },
+   { name: "another_cookie_name", value: "another_cookie_value", domain: ".another-cookie-domain.com" }
+ ])
+
  # Fill in some field:
  browser.fill_in("some field selector", "Some text")

@@ -39,7 +47,8 @@ browser.evaluate_on_new_document(File.read "browser_inject.js")
  # Evaluate JS code expression:
  browser.execute_script("JS code string")

- # Access chrome_remote client directly:
+ # Access chrome_remote client (instance of ChromeRemote class) directly:
+ # See more here: https://github.com/cavalle/chrome_remote#using-the-chromeremote-api
  browser.client

  # Close browser:
@@ -49,18 +58,42 @@ browser.close
  browser.restart!
  ```

- There are some options which you can provide while creating browser instance:
+ **There are some options** which you can provide while creating browser instance:

  ```ruby
  browser = Rubium::Browser.new(
-   debugging_port: 9222, # custom debugging port
-   headless: false, # Run browser in normal (not headless) mode
-   user_agent: "Some user agent", # Custom user-agent
-   proxy_server: "http://1.1.1.1:8080", # Set proxy
+   debugging_port: 9222, # custom debugging port. Default is any available port.
+   headless: false, # Run browser in normal (not headless) mode. Default is headless.
+   window_size: [1600, 900], # Custom window size. Default is unset.
+   user_agent: "Some user agent", # Custom user-agent.
+   proxy_server: "http://1.1.1.1:8080", # Set proxy.
+   extension_code: "Some JS code string", # Inject custom JS code on each page. See above `evaluate_on_new_document`
+   cookies: [], # Set custom cookies, see above `set_cookies`
+   restart_after: 25, # Automatically restart browser after N processed requests
+   enable_logger: true, # Enable logger to log info about processing requests
+   max_timeout: 30, # How long to wait (in seconds) until page will be fully loaded. Default 60 sec.
+   urls_blacklist: ["*some-domain.com*"], # Skip all requests which match provided patterns (wildcard allowed).
+   disable_images: true # Do not download images.
+ )
+ ```
+
+ Note that for options `user_agent` and `proxy_server` you can provide `lambda` object instead of string:
+
+ ```ruby
+ USER_AGENTS = ["Safari", "Mozilla", "IE", "Chrome"]
+ PROXIES = ["http://1.1.1.1:8080", "http://2.2.2.2:8080", "http://3.3.3.3:8080"]
+
+ browser = Rubium::Browser.new(
+   user_agent: -> { USER_AGENTS.sample },
+   proxy_server: -> { PROXIES.sample },
+   restart_after: 25
  )
  ```

- You can provide custom Chrome binary path this way:
+ > What for: Chrome doesn't provide an API to change proxies on the fly (after browser has been started). It is possible to set proxy while starting Chrome instance by providing CLI argument only. On the other hand, Rubium allows you to automatically restart browser (`restart_after` option) after N processed requests. On each restart, if options `user_agent` and/or `proxy_server` has lambda format, then lambda will be called to fetch fresh value. Thus it's possible to rotate proxies/user-agents without any much effort.
+
+
+ **You can provide custom Chrome binary** path this way:

  ```ruby
  Rubium.configure do |config|
@@ -68,12 +101,18 @@ Rubium.configure do |config|
  end
  ```

+ Common Chrome path example for MacOS:
+
+ ```ruby
+ Rubium.configure do |config|
+   config.chrome_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
+ end
+ ```
+

  ## Installation
  Rubium tested with `2.3.0` Ruby version and up.

- Rubium is in the alpha stage (and therefore will have breaking updates in the future), so it's recommended to hard-code latest gem version in your Gemfile, like: `gem 'rubium', '0.1.0'`.
-
  ## Contribution
  Sure, feel free to fork and add new functionality.
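
Note on the lambda-valued options in the README: the README states that `user_agent` and `proxy_server` may be lambdas that are called on each restart, but none of the hunks here show how the value is resolved. The following is only a minimal sketch of that idea, assuming a hypothetical helper named `resolve_option` (not part of Rubium) that calls the value when it is callable and returns it unchanged otherwise.

```ruby
# Hypothetical helper, not part of Rubium: returns a plain value as-is,
# or calls it when it responds to #call (for example a lambda or proc).
def resolve_option(value)
  value.respond_to?(:call) ? value.call : value
end

PROXIES = ["http://1.1.1.1:8080", "http://2.2.2.2:8080"].freeze

# A plain string is passed through untouched.
static_proxy = resolve_option("http://1.1.1.1:8080")

# A lambda is invoked, so each call may yield a different proxy.
rotating_proxy = resolve_option(-> { PROXIES.sample })

puts static_proxy
puts rotating_proxy
```

Resolving the option at browser start, rather than once at configuration time, is what would let a restart pick up a fresh proxy or user agent.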
 
@@ -4,6 +4,8 @@ require 'random-port'
  require 'cliver'
  require 'timeout'
  require 'securerandom'
+ require 'logger'
+ require 'fileutils'

  at_exit do
    Rubium::Browser.running_pids.each { |pid| Process.kill("HUP", pid) }
@@ -13,7 +15,8 @@ module Rubium
    class Browser
      class ConfigurationError < StandardError; end

-     MAX_CONNECT_WAIT_TIME = 2
+     MAX_CONNECT_WAIT_TIME = 6
+     MAX_DEFAULT_TIMEOUT = 60

      class << self
        def ports_pool
@@ -25,25 +28,40 @@ module Rubium
        end
      end

-     attr_reader :client, :devtools_url, :pid, :port, :options
+     attr_reader :client, :devtools_url, :pid, :port, :options, :processed_requests_count, :logger

      def initialize(options = {})
        @options = options
+
+       if @options[:enable_logger]
+         @logger = Logger.new(STDOUT)
+         @logger.progname = self.class.to_s
+       end
+
        create_browser
      end

      def restart!
+       logger.info "Restarting..." if options[:enable_logger]
+
        close
        create_browser
      end

      def close
-       unless closed?
+       if closed?
+         logger.info "Browser already has been closed" if options[:enable_logger]
+       else
          Process.kill("HUP", @pid)
          self.class.running_pids.delete(@pid)
          self.class.ports_pool.release(@port)

-         FileUtils.rm_rf(@data_dir) if Dir.exist?(@data_dir)
+         # Delete temp profile directory, if there is no custom one
+         unless options[:data_dir]
+           FileUtils.rm_rf(@data_dir) if Dir.exist?(@data_dir)
+         end
+
+         logger.info "Closed browser" if options[:enable_logger]
          @closed = true
        end
      end
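
Note on the `close` change: the profile directory is now removed only when no custom `data_dir` was supplied. A minimal standalone sketch of that rule follows. `cleanup_profile` and its `custom:` flag are hypothetical names used for illustration; they are not Rubium methods.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper mirroring the cleanup rule: only auto-generated
# profile directories are deleted, user-supplied ones are left alone.
def cleanup_profile(data_dir, custom:)
  return if custom

  FileUtils.rm_rf(data_dir) if Dir.exist?(data_dir)
end

temp_dir   = Dir.mktmpdir("rubium_profile_") # stands in for the generated profile dir
custom_dir = Dir.mktmpdir("my_profile_")     # stands in for a user-provided data_dir

cleanup_profile(temp_dir, custom: false)
cleanup_profile(custom_dir, custom: true)

temp_removed = !Dir.exist?(temp_dir)
custom_kept  = Dir.exist?(custom_dir)

puts "temp profile removed: #{temp_removed}"
puts "custom profile kept: #{custom_kept}"

FileUtils.rm_rf(custom_dir) # tidy up the directory created by this example
```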
@@ -54,14 +72,28 @@ module Rubium
        @closed
      end

-     def goto(url, wait: 30)
+     def goto(url, wait: options[:max_timeout] || MAX_DEFAULT_TIMEOUT)
+       logger.info "Started request: #{url}" if options[:enable_logger]
+       if options[:restart_after] && processed_requests_count >= options[:restart_after]
+         restart!
+       end
+
        response = @client.send_cmd "Page.navigate", url: url

-       if wait
-         Timeout.timeout(wait) { @client.wait_for "Page.loadEventFired" }
-       else
-         response
+       # By default, after Page.navigate we should wait till page will load completely
+       # using Page.loadEventFired. But on some websites with Ajax navigation, Page.loadEventFired
+       # will stuck forever. In this case you can provide `wait: false` option to skip waiting.
+       if wait != false
+         # https://chromedevtools.github.io/devtools-protocol/tot/Page#event-frameStoppedLoading
+         Timeout.timeout(wait) do
+           @client.wait_for do |event_name, event_params|
+             event_name == "Page.frameStoppedLoading" && event_params["frameId"] == response["frameId"]
+           end
+         end
        end
+
+       @processed_requests_count += 1
+       logger.info "Finished request: #{url}" if options[:enable_logger]
      end

      alias_method :visit, :goto
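
Note on `restart_after`: in this hunk `goto` checks the request counter before navigating and restarts once the limit is reached, and a later hunk shows `create_browser` resetting the counter to zero. The sketch below isolates just that counting behaviour. `CountingBrowser` is a stand-in class written for illustration, assuming no real browser or network access; it is not Rubium's implementation.

```ruby
# Stand-in class for illustration only: it mimics the counter logic
# from the diff without launching Chrome.
class CountingBrowser
  attr_reader :processed_requests_count, :restarts

  def initialize(restart_after: nil)
    @restart_after = restart_after
    @processed_requests_count = 0
    @restarts = 0
  end

  def goto(_url)
    # Restart before the request once the configured limit has been reached.
    restart! if @restart_after && @processed_requests_count >= @restart_after

    @processed_requests_count += 1
  end

  private

  def restart!
    @restarts += 1
    @processed_requests_count = 0 # the diff resets the counter in create_browser
  end
end

browser = CountingBrowser.new(restart_after: 2)
5.times { |i| browser.goto("https://example.com/#{i}") }

puts "restarts: #{browser.restarts}"
puts "requests since last restart: #{browser.processed_requests_count}"
```

Because the check happens before navigation, the restart is triggered by the request that follows the Nth one, not by the Nth request itself.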
@@ -106,21 +138,21 @@ module Rubium
      end

      def click(selector)
-       @client.send_cmd "Runtime.evaluate", expression: <<~JS
+       @client.send_cmd "Runtime.evaluate", expression: <<~js
          document.querySelector("#{selector}").click();
-       JS
+       js
      end

      # https://github.com/cyrus-and/chrome-remote-interface/issues/226#issuecomment-320247756
      # https://stackoverflow.com/a/18937620
      def send_key_on(selector, key)
-       @client.send_cmd "Runtime.evaluate", expression: <<~JS
+       @client.send_cmd "Runtime.evaluate", expression: <<~js
          document.querySelector("#{selector}").dispatchEvent(
            new KeyboardEvent("keydown", {
              bubbles: true, cancelable: true, keyCode: #{key}
            })
          );
-       JS
+       js
      end

      # https://github.com/GoogleChrome/puppeteer/blob/master/lib/Page.js#L784
@@ -130,15 +162,24 @@ module Rubium
        @client.send_cmd "Page.addScriptToEvaluateOnNewDocument", source: script
      end

+     ###
+
      def cookies
        response = @client.send_cmd "Network.getCookies"
        response["cookies"]
      end

+     # https://chromedevtools.github.io/devtools-protocol/tot/Network#method-setCookies
+     def set_cookies(cookies)
+       @client.send_cmd "Network.setCookies", cookies: cookies
+     end
+
+     ###
+
      def fill_in(selector, text)
-       execute_script <<~HEREDOC
+       execute_script <<~js
          document.querySelector("#{selector}").value = "#{text}"
-       HEREDOC
+       js
      end

      def execute_script(script)
150
191
  def create_browser
192
+ @processed_requests_count = 0
193
+
151
194
  @port = options[:debugging_port] || self.class.ports_pool.acquire
152
- @data_dir = "/tmp/rubium_profile_#{SecureRandom.hex}"
195
+
196
+ @data_dir = options[:data_dir] || "/tmp/rubium_profile_#{SecureRandom.hex}"
153
197
 
154
198
  chrome_path = Rubium.configuration.chrome_path ||
155
199
  Cliver.detect("chromium-browser") ||
@@ -196,6 +240,26 @@ module Rubium
        @client.send_cmd "Page.enable"

        evaluate_on_new_document(options[:extension_code]) if options[:extension_code]
+
+       set_cookies(options[:cookies]) if options[:cookies]
+
+       if options[:urls_blacklist] || options[:disable_images]
+         urls = []
+
+         if options[:urls_blacklist]
+           urls += options[:urls_blacklist]
+         end
+
+         if options[:disable_images]
+           urls += %w(jpg jpeg png gif swf svg tif).map { |ext| ["*.#{ext}", "*.#{ext}?*"] }.flatten
+           urls << "data:image*"
+         end
+
+         @client.send_cmd "Network.setBlockedURLs", urls: urls
+       end
+
+
+       logger.info "Opened browser" if options[:enable_logger]
      end

      def convert_proxy(proxy_string)
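
Note on the blocked-URL list: the hunk above combines `urls_blacklist` with generated image patterns before sending `Network.setBlockedURLs`. To make the resulting list easier to inspect, here is a standalone sketch of the same list-building step. `blocked_url_patterns` and `IMAGE_EXTENSIONS` are hypothetical names introduced for this example, and nothing is sent to a browser.

```ruby
# Same extension list as in the hunk above.
IMAGE_EXTENSIONS = %w(jpg jpeg png gif swf svg tif).freeze

# Hypothetical standalone helper mirroring the list-building in the diff.
def blocked_url_patterns(urls_blacklist: nil, disable_images: false)
  urls = []
  urls += urls_blacklist if urls_blacklist

  if disable_images
    # Two wildcard patterns per extension: with and without a query string.
    urls += IMAGE_EXTENSIONS.flat_map { |ext| ["*.#{ext}", "*.#{ext}?*"] }
    urls << "data:image*"
  end

  urls
end

patterns = blocked_url_patterns(
  urls_blacklist: ["*some-domain.com*"],
  disable_images: true
)

puts patterns.size
puts patterns.first(3).inspect
```

With one blacklist entry and `disable_images: true`, the list holds 16 patterns: the custom entry, 14 extension patterns, and the `data:image*` entry.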
@@ -1,3 +1,3 @@
  module Rubium
-   VERSION = "0.1.0"
+   VERSION = "0.2.1"
  end
data/lib/rubium.rb CHANGED
@@ -30,6 +30,7 @@ module Rubium
      --mute-audio
      --no-sandbox
      --disable-infobars
+     --disable-blink-features=AutomationControlled
    ).freeze

    def self.configuration
data/rubium.gemspec CHANGED
@@ -21,12 +21,12 @@ Gem::Specification.new do |spec|
    end
    spec.require_paths = ["lib"]

-   spec.add_dependency "chrome_remote", "~> 0.2"
+   spec.add_dependency "chrome_remote", "~> 0.3"
    spec.add_dependency "cliver", "~> 0.3"
    spec.add_dependency "random-port"
    spec.add_dependency "nokogiri"

-   spec.add_development_dependency "bundler", "~> 1.16"
-   spec.add_development_dependency "rake", "~> 10.0"
-   spec.add_development_dependency "minitest", "~> 5.0"
+   spec.add_development_dependency "bundler", "~> 2.4"
+   spec.add_development_dependency "rake", "~> 13.0"
+   spec.add_development_dependency "minitest", "~> 5.22"
  end
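
Note on the `~>` constraints: the gemspec change moves `chrome_remote` from `~> 0.2` to `~> 0.3`. As a quick way to check which versions such a pessimistic constraint admits, the sketch below uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The candidate version numbers are arbitrary examples, not actual `chrome_remote` releases.

```ruby
require "rubygems"

requirement = Gem::Requirement.new("~> 0.3")

# Arbitrary sample versions, chosen only to probe the constraint's bounds.
%w(0.2.9 0.3.0 0.3.5 0.9.0 1.0.0).each do |candidate|
  allowed = requirement.satisfied_by?(Gem::Version.new(candidate))
  puts "#{candidate}: #{allowed}"
end
```

For a two-segment constraint such as `~> 0.3`, RubyGems accepts versions from 0.3 up to, but not including, 1.0.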
metadata CHANGED
@@ -1,14 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: rubium
  version: !ruby/object:Gem::Version
-   version: 0.1.0
+   version: 0.2.1
  platform: ruby
  authors:
  - Victor Afanasev
- autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-12-19 00:00:00.000000000 Z
+ date: 1980-01-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: chrome_remote
@@ -16,14 +15,14 @@ dependencies:
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '0.2'
+         version: '0.3'
    type: :runtime
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '0.2'
+         version: '0.3'
  - !ruby/object:Gem::Dependency
    name: cliver
    requirement: !ruby/object:Gem::Requirement
@@ -72,42 +71,42 @@ dependencies:
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '1.16'
+         version: '2.4'
    type: :development
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '1.16'
+         version: '2.4'
  - !ruby/object:Gem::Dependency
    name: rake
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '10.0'
+         version: '13.0'
    type: :development
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '10.0'
+         version: '13.0'
  - !ruby/object:Gem::Dependency
    name: minitest
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '5.0'
+         version: '5.22'
    type: :development
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '5.0'
+         version: '5.22'
  description: Headless Chromium Ruby API based on ChromeRemote gem
  email:
  - vicfreefly@gmail.com
@@ -116,6 +115,7 @@ extensions: []
  extra_rdoc_files: []
  files:
  - ".gitignore"
+ - CHANGELOG.md
  - Gemfile
  - LICENSE.txt
  - README.md
@@ -130,7 +130,6 @@ homepage: https://github.com/vifreefly/rubium
  licenses:
  - MIT
  metadata: {}
- post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -145,9 +144,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubyforge_project:
- rubygems_version: 2.7.6
- signing_key:
+ rubygems_version: 3.6.7
  specification_version: 4
  summary: Headless Chromium Ruby API based on ChromeRemote gem
  test_files: []