Dhalang 0.7.0 → 0.7.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 7392a94eca6e888d2a81b779cde341fd7a9be0bbddd2d254e708e93aa18b7fe3
- data.tar.gz: dc520a25fcaf30ad5584820bb45b9bcf72b35531024676e77d9ce0f54291f91f
+ metadata.gz: f8181be50a21d8c3b688b3ba8aa7f683f1bd6101acc36b6623b2948ffd0462ca
+ data.tar.gz: 7f04e3437befb446f2e2a45b90f46ffe17e97c56d570299ac1852c4d3fd2ed9a
  SHA512:
- metadata.gz: 3b75659edee50ba18a726be62a32d39336d5cb8ce6ac94625e529a75015ba7dee45fb396bc292a34604c602369ebba39842c127753d537fc882665a48aacf249
- data.tar.gz: 965cbdae8bc88057c4d7a7a366940a5bb8deedf367893fab5d2c7c437bf96dc9530caf96a138ae7b19ef87ce190c9e8740d83f36cf2a411ec745c3c18b5d208b
+ metadata.gz: 5a45db0cb7cbf08828e99ebd1a86fdc8e3dc8ac1fb01d10d3c521b2bbe38168fdd93e79271fe766504a1868d9cec124390ec3bf65cf513354ee6c974f0e50e37
+ data.tar.gz: 6f56ec00f075b58242fdf55c5d49ca2cb545fcf3b249942c1c4a0f1b102623ce7d633b7db25bb5f04450b25b599580fd6a81c502e434710e282cfcd6ed1af8f0
data/Gemfile.lock CHANGED
@@ -1,14 +1,14 @@
  PATH
  remote: .
  specs:
- Dhalang (0.7.0)
+ Dhalang (0.7.2)

  GEM
  remote: https://rubygems.org/
  specs:
- Ascii85 (1.1.0)
+ Ascii85 (1.1.1)
  afm (0.2.2)
- bigdecimal (3.1.7)
+ bigdecimal (3.1.8)
  diff-lcs (1.5.1)
  fastimage (2.2.7)
  hashery (2.1.2)
@@ -23,15 +23,15 @@ GEM
  rspec-core (~> 3.13.0)
  rspec-expectations (~> 3.13.0)
  rspec-mocks (~> 3.13.0)
- rspec-core (3.13.0)
+ rspec-core (3.13.2)
  rspec-support (~> 3.13.0)
- rspec-expectations (3.13.0)
+ rspec-expectations (3.13.3)
  diff-lcs (>= 1.2.0, < 2.0)
  rspec-support (~> 3.13.0)
- rspec-mocks (3.13.0)
+ rspec-mocks (3.13.2)
  diff-lcs (>= 1.2.0, < 2.0)
  rspec-support (~> 3.13.0)
- rspec-support (3.13.1)
+ rspec-support (3.13.2)
  ruby-rc4 (0.1.5)
  ttfunk (1.8.0)
  bigdecimal (~> 3.1)
data/README.md CHANGED
@@ -1,4 +1,4 @@
- # Dhalang [![Build](https://github.com/NielsSteensma/Dhalang/actions/workflows/build.yml/badge.svg)](https://github.com/NielsSteensma/Dhalang/actions/workflows/build.yml)
+ # Dhalang [![Build](https://github.com/NielsSteensma/Dhalang/actions/workflows/build.yml/badge.svg)](https://github.com/NielsSteensma/Dhalang/actions/workflows/build.yml) [![Gem Version](https://badge.fury.io/rb/Dhalang.svg)](https://badge.fury.io/rb/Dhalang)

  > Dhalang is a Ruby wrapper for Google's Puppeteer.

@@ -11,7 +11,11 @@
  * Scrape HTML from webpages


-
+ ## Prerequisites
+ * Node ≥ 18
+ * Puppeteer ≥ 22
+ * Unix shell ( Dhalang will not work on Windows shells )
+
  ## Installation
  Add this line to your application's Gemfile:

@@ -21,11 +25,12 @@ And then execute:

  $ bundle update

- Install puppeteer in your application's root directory:
+ Install puppeteer or puppeteer-core in your application's root directory:

- $ npm install puppeteer
+ $ npm install puppeteer
+ or
+ $ npm install puppeteer-core

- <sub>Dhalang and Puppeteer require Node ≥ 18 and Puppeteer ≥ 22</sub>
  ## Usage
  __PDF of a website url__
  ```ruby
@@ -75,10 +80,9 @@ For example to only take a screenshot of the visible part of the page:
  Dhalang::Screenshot.get_from_url("https://www.google.com", :webp, {fullPage: false})
  ```

- A list of all possible PDF options that can be set, can be found at: https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pagepdfoptions
-
- A list of all possible screenshot options that can be set, can be found at: https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pagescreenshotoptions
+ A list of all possible PDF options that can be set, can be found at: https://github.com/puppeteer/puppeteer/blob/main/docs/api/puppeteer.pdfoptions.md

+ A list of all possible screenshot options that can be set, can be found at: https://github.com/puppeteer/puppeteer/blob/main/docs/api/puppeteer.screenshotoptions.md
  > The default Puppeteer options contain the options `headerTemplate` and `footerTemplate`. Puppeteer expects these to be HTML strings. By default, the Dhalang
  > gem passes all options as arguments in a `node ...` shell command. In case the HTML strings are too long they might surpass the maximum
  > argument length of the host. For example, on Linux the `MAX_ARG_LEN` is 128kB. Therefore, you can also pass the headers and footers as file path using the
@@ -86,24 +90,17 @@ A list of all possible screenshot options that can be set, can be found at: http
  >
  > For example: `Dhalang::PDF.get_from_url("https://www.google.com", {headerTemplateFile: '/tmp/header.html', footerTemplateFile: '/tmp/footer.html'})`

-
- ## Custom user options
- You may want to change the way Dhalang interacts with Puppeteer in general. User options can be set by providing them in a hash as last argument to any calls you make to the library. Are you setting both custom PDF and user options? Then they should be passed as a single hash.
-
- For example to set a custom navigation timeout:
- ```ruby
- Dhalang::Screenshot.get_from_url("https://www.google.com", :jpeg, {navigationTimeout: 20000})
- ```
-
- Below table lists all possible configuration parameters that can be set:
+ Below table lists more configuration parameters that can be set:
  | Key | Description | Default |
  |--------------------|-----------------------------------------------------------------------------------------|---------------------------------|
+ | isHeadless | Indicates if Chromium should be launched headless (useful for debugging) | true |
+ | slowMo | Amount of milliseconds to slow down Puppeteer operations (useful for debugging) | 0 |
+ | browserWebsocketUrl | Websocket url of remote chromium browser to use | None |
  | navigationTimeout | Amount of milliseconds until Puppeteer while timeout when navigating to the given page | 10000 |
  | printToPDFTimeout | Amount of milliseconds until Puppeteer while timeout when calling Page.printToPDF | 0 (unlimited) |
  | navigationWaitForSelector | If set, Dhalang will wait for the specified selector to appear before creating the screenshot or PDF | None |
  | navigationWaitForXPath | If set, Dhalang will wait for the specified XPath to appear before creating the screenshot or PDF | None |
  | userAgent | User agent to send with the request | Default Puppeteer one |
- | isHeadless | Indicates if Chromium should be launched headless | true |
  | isAutoHeight | When set to true the height of generated PDFs will be based on the scrollHeight property of the document body | false |
  | viewPort | Custom viewport to use for the request | Default Puppeteer one |
  | httpAuthenticationCredentials | Custom HTTP authentication credentials to use for the request | None |
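
The rows added to the table (`isHeadless` with its debugging note, `slowMo`, `browserWebsocketUrl`) are passed in the same options hash as the other keys. The following is a minimal sketch of how a caller might use them, assuming the `get_from_url` call shapes shown elsewhere in this README; the websocket endpoint and the `slowMo` value are made-up placeholders, and the calls are guarded so the snippet also loads where the gem is not installed.

```ruby
# Option keys come from the README table above; the values are illustrative only.
debug_options  = { isHeadless: false, slowMo: 250 }               # visible, slowed-down browser
remote_options = { browserWebsocketUrl: 'ws://127.0.0.1:3000' }   # placeholder endpoint (assumption)

# Only call into the gem when it has actually been loaded.
if defined?(Dhalang)
  # Launches a local browser, non-headless, with a 250 ms delay per operation.
  pdf = Dhalang::PDF.get_from_url('https://www.google.com', debug_options)

  # Connects to an already running browser instead of launching one.
  screenshot = Dhalang::Screenshot.get_from_url('https://www.google.com', :jpeg, remote_options)
end
```

Judging by the `launchPuppeteer` change in `dhalang.js`, `isHeadless` and `slowMo` are only passed to `puppeteer.launch`, so they appear to have no effect once `browserWebsocketUrl` is set.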
@@ -3,6 +3,7 @@ module Dhalang
  class Configuration
  NODE_MODULES_PATH = Dir.pwd + '/node_modules/'.freeze
  USER_OPTIONS = {
+ browserWebsocketUrl: '',
  navigationTimeout: 10000,
  printToPDFTimeout: 0, # unlimited
  navigationWaitUntil: 'load',
@@ -13,7 +14,8 @@ module Dhalang
  viewPort: '',
  httpAuthenticationCredentials: '',
  isAutoHeight: false,
- chromeOptions: []
+ chromeOptions: [],
+ slowMo: 0
  }.freeze
  DEFAULT_PDF_OPTIONS = {
  scale: 1,
@@ -48,6 +50,7 @@ module Dhalang
  private_constant :DEFAULT_JPEG_OPTIONS

  private attr_accessor :page_url
+ private attr_accessor :browser_websocket_url
  private attr_accessor :temp_file_path
  private attr_accessor :temp_file_extension
  private attr_accessor :user_options
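
The new `browserWebsocketUrl: ''` default pairs with the `!== ""` check added in `dhalang.js`: an empty string means "launch a local browser". Below is a rough sketch of that convention, assuming user-supplied options are overlaid on the frozen defaults. The constant holds only the keys visible in this diff, and both helper methods are hypothetical, not part of the gem.

```ruby
# Subset of the defaults visible in the hunks above.
USER_OPTION_DEFAULTS = {
  browserWebsocketUrl: '',
  navigationTimeout: 10000,
  printToPDFTimeout: 0, # unlimited
  slowMo: 0
}.freeze

# Hypothetical helper: overlay caller-supplied values on the defaults,
# ignoring keys that are not user options. Returns a new hash.
def effective_user_options(custom = {})
  USER_OPTION_DEFAULTS.merge(custom.slice(*USER_OPTION_DEFAULTS.keys))
end

# Hypothetical helper mirroring the JavaScript check:
# a non-empty websocket URL selects a remote browser.
def remote_browser?(options)
  options[:browserWebsocketUrl] != ''
end

local  = effective_user_options(slowMo: 50)
remote = effective_user_options(browserWebsocketUrl: 'ws://127.0.0.1:3000')
puts remote_browser?(local)
puts remote_browser?(remote)
```

Because the defaults are frozen and `merge` returns a new hash, per-call overrides in this sketch never leak into later calls.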
@@ -6,9 +6,10 @@ module Dhalang
  #
  # @param [String] url The url to validate
  def self.validate(url)
- if (url !~ URI::DEFAULT_PARSER.regexp[:ABS_URI])
- raise URI::InvalidURIError, 'The given url was invalid, use format http://www.example.com'
- end
+ parsed = URI.parse(url) # Raise URI::InvalidURIError on invalid URLs
+ return true if parsed.absolute?
+
+ raise URI::InvalidURIError, 'The given url was invalid, use format http://www.example.com'
  end
  end
- end
+ end
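
The behavior of the rewritten validator can be tried in isolation. The sketch below uses `validate_url`, a hypothetical standalone copy of the new method body (the enclosing class is not named in this hunk), and only relies on Ruby's standard `uri` library. It shows the two failure paths: malformed input is rejected by `URI.parse` itself, while well-formed but scheme-less input is rejected by the `absolute?` check.

```ruby
require 'uri'

# Hypothetical standalone copy of the new validation logic.
def validate_url(url)
  parsed = URI.parse(url) # raises URI::InvalidURIError on malformed input
  return true if parsed.absolute?

  raise URI::InvalidURIError, 'The given url was invalid, use format http://www.example.com'
end

# Small helper for the demonstration: report instead of raising.
def valid_url?(url)
  validate_url(url)
rescue URI::InvalidURIError
  false
end

puts valid_url?('https://www.google.com') # absolute URL with a scheme
puts valid_url?('www.example.com')        # parses, but has no scheme
puts valid_url?('http://exa mple.com')    # space makes URI.parse fail
```

Both failure paths raise the same `URI::InvalidURIError` class, so callers that already rescued that error need no changes.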
@@ -1,3 +1,3 @@
  module Dhalang
- VERSION = "0.7.0"
+ VERSION = "0.7.2"
  end
data/lib/js/dhalang.js CHANGED
@@ -14,6 +14,7 @@ const fs = require('fs')

  /**
  * @typedef {Object} UserOptions
+ * @property {string} browserWebsocketUrl - The websocket url of remote Chromium browser to use.
  * @property {number} navigationTimeout - Maximum in milliseconds until navigation times out, we use a default of 10 seconds as timeout.
  * @property {string} navigationWaitUntil - Determines when the navigation was finished, we wait here until the Window.load event is fired ( meaning all images, stylesheet, etc was loaded ).
  * @property {string} navigationWaitForSelector - If set, specifies the selector Puppeteer should wait for to appear before continuing.
@@ -23,6 +24,7 @@ const fs = require('fs')
  * @property {Object} viewPort - The view port to use.
  * @property {Object} httpAuthenticationCredentials - The credentials to use for HTTP authentication.
  * @property {boolean} isAutoHeight - The height is automatically set
+ * @property {number} slowMo - Amount of milliseconds to slow down Puppeteer operations.
  */

  /**
@@ -47,7 +49,7 @@ exports.getConfiguration = function () {

  /**
  * Launches Puppeteer and returns its instance.
- * @param {UserOptions} configuration - The configuration to use.
+ * @param {Configuration} configuration - The configuration to use.
  * @returns {Promise<Object>}
  * The launched instance of Puppeteer.
  */
@@ -55,10 +57,18 @@ exports.launchPuppeteer = async function (configuration) {
  module.paths.push(configuration.puppeteerPath);
  const puppeteer = require('puppeteer');
  const launchArgs = ['--no-sandbox', '--disable-setuid-sandbox'].concat(configuration.userOptions.chromeOptions).filter((item, index, self) => self.indexOf(item) === index);
- return await puppeteer.launch({
- args: launchArgs,
- headless: configuration.userOptions.isHeadless
- });
+
+ if (configuration.userOptions['browserWebsocketUrl'] !== "") {
+ return await puppeteer.connect( {
+ "browserWSEndpoint": configuration.userOptions.browserWebsocketUrl
+ })
+ } else {
+ return await puppeteer.launch({
+ args: launchArgs,
+ headless: configuration.userOptions.isHeadless,
+ slowMo: configuration.userOptions.slowMo
+ });
+ }
  }

  /**
@@ -148,7 +158,7 @@ exports.getConfiguredPdfOptions = async function (page, configuration) {
  exports.getNavigationParameters = function (configuration) {
  return {
  timeout: configuration.userOptions.navigationTimeout,
- waituntil: configuration.userOptions.navigationWaitUntil
+ waitUntil: configuration.userOptions.navigationWaitUntil
  }
  }

@@ -6,9 +6,10 @@ const scrapeHtml = async () => {
  const configuration = dhalang.getConfiguration();

  let browser;
+ let page;
  try {
  browser = await dhalang.launchPuppeteer(configuration);
- const page = await browser.newPage();
+ page = await browser.newPage();
  await dhalang.configure(page, configuration.userOptions);
  await dhalang.navigate(page, configuration);
  const html = await page.content();
@@ -17,8 +18,10 @@ const scrapeHtml = async () => {
  console.error(error.message);
  process.exit(1);
  } finally {
- if (browser) {
+ if (browser && configuration.userOptions['browserWebsocketUrl'] === "") {
  browser.close();
+ } else {
+ page.close();
  }
  process.exit(0);
  }
@@ -5,9 +5,10 @@ const createPdf = async () => {
  const configuration = dhalang.getConfiguration();

  let browser;
+ let page;
  try {
  browser = await dhalang.launchPuppeteer(configuration);
- const page = await browser.newPage();
+ page = await browser.newPage();
  await dhalang.configure(page, configuration.userOptions);
  await dhalang.navigate(page, configuration);
  const pdfOptions = await dhalang.getConfiguredPdfOptions(page, configuration);
@@ -21,8 +22,10 @@ const createPdf = async () => {
  console.error(error.message);
  process.exit(1);
  } finally {
- if (browser) {
+ if (browser && configuration.userOptions['browserWebsocketUrl'] === "") {
  browser.close();
+ } else {
+ page.close();
  }
  process.exit();
  }
@@ -5,9 +5,10 @@ const createScreenshot = async () => {
  const configuration = dhalang.getConfiguration();

  let browser;
+ let page;
  try {
  browser = await dhalang.launchPuppeteer(configuration);
- const page = await browser.newPage();
+ page = await browser.newPage();
  await dhalang.configure(page, configuration.userOptions);
  await dhalang.navigate(page, configuration);

@@ -23,8 +24,10 @@ const createScreenshot = async () => {
  console.error(error.message);
  process.exit(1);
  } finally {
- if (browser) {
+ if (browser && configuration.userOptions['browserWebsocketUrl'] === "") {
  browser.close();
+ } else {
+ page.close();
  }
  process.exit();
  }