palapala_pdf 0.1.8 → 0.1.10

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 717b57250c5722d66ac2310681959abc3ef56498d026b797f13b293077369d2b
- data.tar.gz: cef8fc4f830a237020cfa6379add09c40da64c818b1758261d560e5ed3482fdc
+ metadata.gz: e06d55c5dca6e14014e1154d4cd4fdcdddcd61844ccd138f38e1d9d803d1094e
+ data.tar.gz: 89ed6d300a9e4c804d3bfcb54516ec921ed741df1718d68b4d160b36e3fd2792
  SHA512:
- metadata.gz: a31ddde9c4617397ff1a1ae13a708dbe4e54085c5d85043bf149bc602b0d6ebc3c1d9ee87d6fa42077146a159b045e398d599c0d3819b64eaeb4e31081afb0dc
- data.tar.gz: a064e6622c5eb1737a0baa43a56fff6fd896b69c850b2120a3441b214fd8165ade5988f8bc971a0ddb37a19c900631f6935337e2682a1b8071e5a8c33ecd167c
+ metadata.gz: f0bd26fe4c402e06f1f75ab4a5ccfad05834b78bcb125024967134149b7738b2cbd8121dd985332907644ecfd167f44fc3109b322099ebb9c0e988971f74f42e
+ data.tar.gz: c95f8931ce1538af9b0cb63d93fcc179016cfe05cd4920b7ead1bf6933f4712bfbaee3ae8d243e1112353c1c2b4af875c92b6e1e6373ad219aa0c409f7db117a
data/README.md CHANGED
@@ -31,49 +31,47 @@ $ gem install palapala_pdf
  ```

  Palapala PDF connects to Chrome over a web socket connection.
- An external Chrome/Chromium is expected. Start it with the following
- command (9222 is the default port):
+ An external Chrome/Chromium is preferred. Start it with the following
+ command (9222 is the default/expected port):

  ```sh
  /path/to/chrome --headless --disable-gpu --remote-debugging-port=9222
  ```

- ### Installing Chrome / Headless Chrome
+ ### Connecting to Chrome

- Seems the august 2024 release 128.0.6613.85 is seriously performance impacted. So to avoid regression issues, it's suggested to install a specific version of Chrome, test it and stick with it. This is easiest using npx and some tooling provided by Puppeteer. Unfortunately it depends on node/npm, but it's worth it. E.g. install a specific version like this:
+ Palapala PDF will go through this process:

- ```
- npx @puppeteer/browsers install chrome@127.0.6533.88
- ````
+ - check whether a Chrome is already running and exposing port 9222 (and if so, use it)
+ - if `Palapala.headless_chrome_path` is defined, launch Chrome as a child process using that path
+ - if **NPX** is available, install a **Chrome-Headless-Shell** variant locally and launch it as a child process. It will install the 'stable' version or the version identified by the `Palapala.chrome_headless_shell_version` setting (or by the `CHROME_HEADLESS_SHELL_VERSION` environment variable).
+ - as a last fallback, guess a Chrome path from the detected OS and try to launch Chrome with that

- This installs chrome in a `chrome` folder in the current working dir and it outputs the path where it's installed when it's finished.
+ A Chrome-Headless-Shell version gives the best performance and resource usage.

- If you installed it using puppeteer from above
+ ### Installing Chrome / Headless Chrome manually
+
+ This is easiest using npx and some tooling provided by Puppeteer. Unfortunately it depends on node/npm, but it's worth it. For example, install a specific version like this:

- ```sh
- ./chrome/mac_arm-127.0.6533.88/chrome-mac-arm64/Google\ Chrome\ for\ Testing.app/Contents/MacOS/Google\ Chrome\ for\ Testing --headless --disable-gpu --remote-debugging-port=9222
  ```
+ npx @puppeteer/browsers install chrome@127.0.6533.88
+ ````

- Currently i'd advise for the `chrome-headless-shell`variant that is a light version meant just for this use case. The chrome-headless-shell is a minimal, headless version of the Chrome browser designed specifically for environments where you need to run Chrome without a graphical user interface (GUI). This is particularly useful in scenarios like server-side rendering, automated testing, web scraping, or any situation where you need the power of the Chrome browser engine without the overhead of displaying a UI. Headless by design, reduced size and overhead but still the same engine.
+ This installs Chrome in a `chrome` folder in the current working directory and prints the installation path when it finishes.
+
+ Currently we recommend the `chrome-headless-shell` variant, a lightweight version meant for exactly this use case. The chrome-headless-shell is a minimal, headless version of the Chrome browser designed specifically for environments where you need to run Chrome without a graphical user interface (GUI). This is particularly useful in scenarios like server-side rendering, automated testing, web scraping, or any situation where you need the power of the Chrome browser engine without the overhead of displaying a UI. It is headless by design, with reduced size and overhead, but still the same engine.

  ```
  npx @puppeteer/browsers install chrome-headless-shell@stable
  ```

- It installs to a path like this `./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell`. As it's headless by design, it only needs one parameter
+ It installs to a path like this `./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell`. As it's headless by design, it only needs one parameter:

  ```
  ./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell --remote-debugging-port=9222
  ```

- Alternatively, Palapala PDF will try to launch Chrome as a child process.
- It guesses the path to Chrome, or you configure it like this:
-
- ```ruby
- Palapala.setup do |config|
-   config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
- end
- ```
+ *Note: The August 2024 release 128.0.6613.85 seems to be seriously performance impacted. To avoid regression issues, it's suggested to install a specific version of Chrome, test it and stick with it. The chrome-headless-shell does not seem to suffer from this, though.*

  ### Installing Node/NPX

@@ -92,7 +90,6 @@ nvm --version
  nvm install node
  ````

-
  ## Usage Instructions

  To create a PDF from HTML content using the `Palapala` library, follow these steps:
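
The connection order listed under "Connecting to Chrome" in the README hunk above can be read as a simple decision chain. The following is only a sketch of that order, not the gem's code: `chrome_strategy` and its keyword arguments are hypothetical names introduced here for illustration.

```ruby
# Hypothetical helper (not part of palapala_pdf): names the strategy that the
# README's four steps would select for a given environment.
def chrome_strategy(port_open:, configured_path:, npx_available:)
  return :use_running_chrome if port_open                 # step 1: something answers on port 9222
  return :spawn_from_configured_path if configured_path   # step 2: headless_chrome_path is set
  return :install_with_npx if npx_available                # step 3: npx can fetch chrome-headless-shell
  :spawn_from_guessed_path                                 # step 4: last fallback, OS-based guess
end

strategy = chrome_strategy(port_open: false, configured_path: nil, npx_available: true)
puts strategy # prints "install_with_npx"
```

Keeping the decision separate from the side effects (spawning, installing) makes the order easy to check without a Chrome installation.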
@@ -1,33 +1,11 @@
  #!/usr/bin/env ruby

- require 'open3'
- require 'pathname'
+ # $LOAD_PATH.unshift File.expand_path("../lib", __dir__)
+ require "palapala"

- # Run the command and capture the output
- puts "Installing latest stable chrome-headless-shell..."
- output, status = Open3.capture2('npx --yes @puppeteer/browsers install chrome-headless-shell@stable')
-
- if status.success?
-   # Extract the path from the output
-   result = output.lines.find { |line| line.include?("chrome-headless-shell@") }
-   if result.nil?
-     puts "Failed to install chrome-headless-shell"
-     exit 1
-   end
-   _, chrome_path = result.split(' ', 2).map(&:strip)
-
-   # Directory you want the relative path from (current working directory)
-   base_dir = Dir.pwd
-
-   # Convert absolute path to relative path
-   relative_path = Pathname.new(chrome_path).relative_path_from(Pathname.new(base_dir)).to_s
-
-   puts "Launching chrome-headless-shell at #{relative_path}"
-   # Display the version
-   system("#{chrome_path} --version")
-   # Launch chrome-headless-shell with the --remote-debugging-port parameter
-   exec("#{chrome_path} --remote-debugging-port=9222")
- else
-   puts "Failed to install chrome-headless-shell"
-   exit 1
+ Palapala.setup do |config|
+   config.debug = true
  end
+
+ # The spawn method is renamed to spawn_chrome_headless_server_with_npx in this release
+ pid = Palapala::ChromeProcess.spawn_chrome_headless_server_with_npx
+ Process.wait(pid)
@@ -25,25 +25,20 @@ HEADER_HTML = <<~HTML
  HTML

  Palapala.setup do |config|
-   config.debug = true
-   config.headless_chrome_url = 'http://localhost:9222' # run against a remote Chrome instance
+   # config.debug = true
+   # config.headless_chrome_url = 'http://localhost:9222' # run against a remote Chrome instance
    # config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
  end

  result = Palapala::Pdf.new(
    # "<style>@page { size: A4 landscape; }</style><p>Hello world #{Time.now}</>",
    "<h1>Title</h1><p>Hello world #{Time.now}</>",
-   header_html: HEADER_HTML,
-   footer_html: '<div style="text-align: center;">Generated with Palapala PDF</div>',
+   header_template: HEADER_HTML,
+   footer_template: '<div style="text-align: center; font-size: 12pt; width: 100%;">Generated with Palapala PDF</div>',
    scale: 0.75,
    prefer_css_page_size: false,
-   margin: { top: 3, bottom: 2 }
- ).save('tmp/headers_and_footers.pdf',
-   generateDocumentOutline: false,
-   # marginTop: 1,
-   # paperWidth: 3,
-   displayHeaderFooter: true,
-   # landscape: false,
-   headerTemplate: HEADER_HTML)
+   margin_top: 3,
+   margin_bottom: 2).save('tmp/headers_and_footers.pdf')

  puts result
+ `open tmp/headers_and_footers.pdf`
@@ -15,8 +15,9 @@ DOCUMENT = <<~HTML
  HTML

  Palapala.setup do |config|
-   config.debug = true
+   # config.debug = true
+   # config.defaults = { header_template: '<div></div>', footer_template: '<div></div>' }
  end

- result = Palapala::Pdf.new(DOCUMENT).save('tmp/js_based_rendering.pdf')
- puts result
+ Palapala::Pdf.new(DOCUMENT).save('tmp/js_based_rendering.pdf')
+ `open tmp/js_based_rendering.pdf`
@@ -25,9 +25,9 @@ module Palapala
        end
      end

-     # Check if a Chrome is running
+     # Check if a Chrome is running locally
      def self.chrome_running?
-       port_in_use? || # Check if the port is in use and Chrome is running externally
+       port_in_use? || # Check if the port is in use
          chrome_process_healthy? # Check if the process is still alive
      end

@@ -59,9 +59,9 @@ module Palapala
        system("which npx > /dev/null 2>&1")
      end

-     def self.spawn_chrome_headless_server
+     def self.spawn_chrome_headless_server_with_npx
        # Run the command and capture the output
-       puts "Installing latest stable chrome-headless-shell..."
+       puts "Installing/launching chrome-headless-shell@#{Palapala.chrome_headless_shell_version}"
        output, status = Open3.capture2("npx --yes @puppeteer/browsers install chrome-headless-shell@#{Palapala.chrome_headless_shell_version}")

        if status.success?
@@ -82,29 +82,37 @@ module Palapala
          # Display the version
          system("#{chrome_path} --version") if Palapala.debug
          # Launch chrome-headless-shell with the --remote-debugging-port parameter
-         if Palapala.debug
-           puts "spawning with output"
-           spawn(chrome_path, "--remote-debugging-port=9222", "--disable-gpu")
+         params = [ "--disable-gpu", "--remote-debugging-port=9222" ]
+         # Array has no merge!; append the extra parameters with concat
+         params.concat(Array(Palapala.chrome_params)) if Palapala.chrome_params
+         pid = if Palapala.debug
+           spawn(chrome_path, *params)
          else
-           spawn(chrome_path, "--remote-debugging-port=9222", "--disable-gpu", out: "/dev/null", err: "/dev/null")
+           spawn(chrome_path, *params, out: "/dev/null", err: "/dev/null")
          end
+         Palapala.headless_chrome_url = "http://localhost:9222"
+         pid
        else
          raise "Failed to install chrome-headless-shell"
        end
      end

+     def self.spawn_chrome_from_path
+       params = [ "--headless", "--disable-gpu", "--remote-debugging-port=9222" ]
+       # Array has no merge!; append the extra parameters with concat
+       params.concat(Array(Palapala.chrome_params)) if Palapala.chrome_params
+       # Spawn an existing chrome with the path and parameters
+       Process.spawn(chrome_path, *params)
+     end
+
      # Spawn a Chrome child process
      def self.spawn_chrome
        return if chrome_running?

-       if self.npx_installed?
-         @chrome_process_id = spawn_chrome_headless_server
-       else
-         params = [ "--headless", "--disable-gpu", "--remote-debugging-port=9222" ]
-         params.merge!(Palapala.chrome_params) if Palapala.chrome_params
-         # Spawn an existing chrome with the path and parameters
-         @chrome_process_id = Process.spawn(chrome_path, *params)
-       end
+       @chrome_process_id =
+         if Palapala.headless_chrome_path.nil? && self.npx_installed?
+           spawn_chrome_headless_server_with_npx
+         else
+           spawn_chrome_from_path
+         end

        # Wait until the port is in use
        sleep 0.1 until port_in_use?
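
The `sleep 0.1 until port_in_use?` loop above polls until the debugging port answers, and as written it has no upper bound if Chrome never comes up. A bounded variant could look like the sketch below. This is an assumption-laden illustration, not the gem's implementation: the gem's own `port_in_use?` is not shown in this diff, so the version here (a plain TCP connect) and the `wait_for_port` helper with its `timeout:` and `interval:` parameters are hypothetical.

```ruby
require "socket"

# Hypothetical stand-in for the gem's port check: true if a TCP connect succeeds.
def port_in_use?(port, host: "127.0.0.1")
  TCPSocket.new(host, port).close
  true
rescue SystemCallError # e.g. Errno::ECONNREFUSED when nothing is listening
  false
end

# Poll until the port answers, giving up after `timeout` seconds.
# Returns true if the port came up in time, false otherwise.
def wait_for_port(port, timeout: 5.0, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until port_in_use?(port)
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
    sleep interval
  end
  true
end
```

A caller could then raise a descriptive error when `wait_for_port(9222)` returns false instead of blocking indefinitely.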
data/lib/palapala/pdf.rb CHANGED
@@ -42,20 +42,22 @@ module Palapala
                     scale: nil)
        @content = content || raise(ArgumentError, "Content is required and can't be nil")
        @opts = {}
-       @opts[:headerTemplate] = header_template || Palapala.defaults[:header_template]
-       @opts[:footerTemplate] = footer_template || Palapala.defaults[:footer_template]
-       @opts[:pageRanges] = page_ranges || Palapala.defaults[:page_ranges]
-       @opts[:generateTaggedPDF] = generate_tagged_pdf || Palapala.defaults[:generate_tagged_pdf]
-       @opts[:paperWidth] = paper_width || Palapala.defaults[:paper_width]
-       @opts[:paperHeight] = paper_height || Palapala.defaults[:paper_height]
-       @opts[:landscape] = landscape || Palapala.defaults[:landscape]
-       @opts[:marginTop] = margin_top || Palapala.defaults[:margin_top]
-       @opts[:marginLeft] = margin_left || Palapala.defaults[:margin_left]
-       @opts[:marginBottom] = margin_bottom || Palapala.defaults[:margin_bottom]
-       @opts[:marginRight] = margin_right || Palapala.defaults[:margin_right]
-       @opts[:preferCSSPageSize] = prefer_css_page_size || Palapala.defaults[:prefer_css_page_size]
-       @opts[:printBackground] = print_background || Palapala.defaults[:print_background]
-       @opts[:scale] = scale || Palapala.defaults[:scale]
+       @opts[:headerTemplate] = header_template || Palapala.defaults[:header_template]
+       @opts[:footerTemplate] = footer_template || Palapala.defaults[:footer_template]
+       @opts[:pageRanges] = page_ranges || Palapala.defaults[:page_ranges]
+       @opts[:generateTaggedPDF] = generate_tagged_pdf || Palapala.defaults[:generate_tagged_pdf]
+       @opts[:paperWidth] = paper_width || Palapala.defaults[:paper_width]
+       @opts[:paperHeight] = paper_height || Palapala.defaults[:paper_height]
+       @opts[:landscape] = landscape || Palapala.defaults[:landscape]
+       @opts[:marginTop] = margin_top || Palapala.defaults[:margin_top]
+       @opts[:marginLeft] = margin_left || Palapala.defaults[:margin_left]
+       @opts[:marginBottom] = margin_bottom || Palapala.defaults[:margin_bottom]
+       @opts[:marginRight] = margin_right || Palapala.defaults[:margin_right]
+       @opts[:preferCSSPageSize] = prefer_css_page_size || Palapala.defaults[:prefer_css_page_size]
+       @opts[:printBackground] = print_background || Palapala.defaults[:print_background]
+       @opts[:scale] = scale || Palapala.defaults[:scale]
+       @opts[:displayHeaderFooter] = true
+       @opts[:encoding] = :binary
        @opts.compact!
      end
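
The option handling in the hunk above follows one pattern: take the snake_case argument, fall back to the configured default, store it under a camelCase key, then drop `nil` entries. The sketch below condenses that pattern for a few options only. `OPTION_NAMES`, `DEFAULTS` and `build_opts` are hypothetical names for illustration; the gem spells each assignment out as shown in the diff.

```ruby
# Illustrative subset of the snake_case => camelCase option names in the hunk.
OPTION_NAMES = {
  header_template: :headerTemplate,
  footer_template: :footerTemplate,
  margin_top: :marginTop,
  landscape: :landscape,
  scale: :scale
}.freeze

# Mirrors the new defaults introduced in this release.
DEFAULTS = { header_template: "<div></div>", footer_template: "<div></div>" }.freeze

# Hypothetical helper: resolve each option as `given || default`, then compact.
def build_opts(given, defaults = DEFAULTS)
  opts = OPTION_NAMES.each_with_object({}) do |(snake, camel), acc|
    acc[camel] = given[snake] || defaults[snake]
  end
  opts[:displayHeaderFooter] = true # always set, as in the hunk
  opts[:encoding] = :binary         # always set, as in the hunk
  opts.compact                      # drop options that resolved to nil
end

opts = build_opts({ margin_top: 3, landscape: false })
```

One property of the `||` pattern worth knowing: an explicit `false` behaves like an omitted argument, so `landscape: false` falls through to the default (and is dropped here, since no default is configured).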
@@ -22,6 +22,13 @@ module Palapala
        send_command_and_wait_for_result("Page.enable")
      end

+     def websocket_url
+       self.class.websocket_url
+     rescue Errno::ECONNREFUSED
+       ChromeProcess.spawn_chrome # Spawn a new Chrome process
+       self.class.websocket_url # Retry (once)
+     end
+
      # Create a thread-local instance of the renderer
      def self.thread_local_instance
        Thread.current[:renderer] ||= Renderer.new
@@ -102,16 +109,8 @@ module Palapala
        @client.close
      end

-     private
-
-     # Convert the HTML content to a data URL
-     def data_url_for_html(html)
-       "data:text/html;base64,#{Base64.strict_encode64(html)}"
-     end
-
      # Open a new tab in the remote chrome and return the WebSocket URL
-     def websocket_url
-       ChromeProcess.spawn_chrome
+     def self.websocket_url
        uri = URI("#{Palapala.headless_chrome_url}/json/new")
        http = Net::HTTP.new(uri.host, uri.port)
        request = Net::HTTP::Put.new(uri)
@@ -122,5 +121,12 @@ module Palapala
        puts "WebSocket URL: #{websocket_url}" if Palapala.debug
        websocket_url
      end
+
+     private
+
+     # Convert the HTML content to a data URL
+     def data_url_for_html(html)
+       "data:text/html;base64,#{Base64.strict_encode64(html)}"
+     end
    end
  end
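
The new instance-level `websocket_url` in the renderer hunks above changes the flow from "always try to spawn Chrome first" to "connect first, spawn only when the connection is refused, then retry once". The shape of that flow, detached from Chrome, is sketched below. `with_one_retry` and its `on_refused:` parameter are hypothetical names, and the URL is a made-up placeholder.

```ruby
# Hypothetical helper: run the block; if the connection is refused, run the
# recovery step once and try the block a second (final) time.
def with_one_retry(on_refused:)
  yield
rescue Errno::ECONNREFUSED
  on_refused.call # e.g. spawn a Chrome process
  yield           # a second refusal propagates to the caller
end

attempts = 0
spawned = 0
url = with_one_retry(on_refused: -> { spawned += 1 }) do
  attempts += 1
  raise Errno::ECONNREFUSED if attempts == 1 # simulate "no Chrome yet"
  "ws://localhost:9222/devtools/page/example" # placeholder URL
end
```

Because the second attempt sits inside the `rescue` clause, a failure there is not rescued again, which is what keeps the retry count at one.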
@@ -1,3 +1,3 @@
  module Palapala
-   VERSION = "0.1.8"
+   VERSION = "0.1.10"
  end
data/lib/palapala.rb CHANGED
@@ -19,16 +19,20 @@ module Palapala
      # path to the headless Chrome executable when using the child process renderer
      attr_accessor :headless_chrome_path

-     # URL to the headless Chrome instance when using the remote renderer
+     # URL to the headless Chrome instance when using the remote renderer (priority)
      attr_accessor :headless_chrome_url

-     # Chrome headless shell version to use
+     # Chrome headless shell version to use (stable, beta, dev, canary, etc.)
+     # when launching a new Chrome instance using npx
      attr_accessor :chrome_headless_shell_version
    end
-
    self.debug = false
-   self.defaults = { displayHeaderFooter: true, encoding: :binary }
+   self.defaults = {
+     header_template: "<div></div>",
+     footer_template: "<div></div>"
+     # footer_template: '<div style="text-align: center; font-size: 12pt; width: 100%;">Generated with Palapala PDF</div>'
+   }
    self.headless_chrome_path = nil
-   self.headless_chrome_url = "http://localhost:9222"
-   self.chrome_headless_shell_version = "stable"
+   self.headless_chrome_url = ENV.fetch("HEADLESS_CHROME_URL", "http://localhost:9222")
+   self.chrome_headless_shell_version = ENV.fetch("CHROME_HEADLESS_SHELL_VERSION", "stable")
  end
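
With the change above, the Chrome URL and the headless-shell version can come from the environment, with the previous hard-coded values as fallbacks. The sketch below shows how `fetch` with a default resolves in both cases. `chrome_settings` is a hypothetical helper that takes a plain hash in place of `ENV` so the behavior can be checked without touching the real environment; the host name in the example is made up.

```ruby
# Hypothetical helper: resolve the two settings from an ENV-like hash,
# using the same keys and fallbacks as the hunk above.
def chrome_settings(env = ENV)
  {
    headless_chrome_url: env.fetch("HEADLESS_CHROME_URL", "http://localhost:9222"),
    chrome_headless_shell_version: env.fetch("CHROME_HEADLESS_SHELL_VERSION", "stable")
  }
end

fallback = chrome_settings({})
custom = chrome_settings({ "HEADLESS_CHROME_URL" => "http://chrome.internal:9222" })
```

Note that `fetch` only falls back when the key is absent: a variable that is set but empty yields an empty string rather than the default.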
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: palapala_pdf
  version: !ruby/object:Gem::Version
-   version: 0.1.8
+   version: 0.1.10
  platform: ruby
  authors:
  - Koen Handekyn