palapala_pdf 0.1.8 → 0.1.10
- checksums.yaml +4 -4
- data/README.md +19 -22
- data/bin/chrome-headless-server +7 -29
- data/examples/headers_and_footers.rb +7 -12
- data/examples/js_based_rendering.rb +4 -3
- data/lib/palapala/chrome_process.rb +24 -16
- data/lib/palapala/pdf.rb +16 -14
- data/lib/palapala/renderer.rb +15 -9
- data/lib/palapala/version.rb +1 -1
- data/lib/palapala.rb +10 -6
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: e06d55c5dca6e14014e1154d4cd4fdcdddcd61844ccd138f38e1d9d803d1094e
|
4
|
+
data.tar.gz: 89ed6d300a9e4c804d3bfcb54516ec921ed741df1718d68b4d160b36e3fd2792
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: f0bd26fe4c402e06f1f75ab4a5ccfad05834b78bcb125024967134149b7738b2cbd8121dd985332907644ecfd167f44fc3109b322099ebb9c0e988971f74f42e
|
7
|
+
data.tar.gz: c95f8931ce1538af9b0cb63d93fcc179016cfe05cd4920b7ead1bf6933f4712bfbaee3ae8d243e1112353c1c2b4af875c92b6e1e6373ad219aa0c409f7db117a
|
data/README.md
CHANGED
@@ -31,49 +31,47 @@ $ gem install palapala_pdf
|
|
31
31
|
```
|
32
32
|
|
33
33
|
Palapala PDF connects to Chrome over a web socket connection.
|
34
|
-
An external Chrome/Chromium is
|
35
|
-
command (9222 is the default port):
|
34
|
+
An external Chrome/Chromium is preferred. Start it with the following
|
35
|
+
command (9222 is the default/expected port):
|
36
36
|
|
37
37
|
```sh
|
38
38
|
/path/to/chrome --headless --disable-gpu --remote-debugging-port=9222
|
39
39
|
```
|
40
40
|
|
41
|
-
###
|
41
|
+
### Connecting to Chrome
|
42
42
|
|
43
|
-
|
43
|
+
Palapala PDF will go through this process:
|
44
44
|
|
45
|
-
|
46
|
-
|
47
|
-
|
45
|
+
- check if a Chrome is running and exposing port 9222 (and if so, use it)
|
46
|
+
- if `Palapala.headless_chrome_path` is defined, launch Chrome as a child process using that path
|
47
|
+
- if **NPX** is available, install a **Chrome-Headless-Shell** variant locally and launch it as a child process. It will install the 'stable' version or the version identified by the `Palapala.chrome_headless_shell_version` setting (or from ENV `CHROME_HEADLESS_SHELL_VERSION`).
|
48
|
+
- as a last fallback it will guess a chrome path from the detected OS and try to launch a Chrome with that
|
48
49
|
|
49
|
-
|
50
|
+
A Chrome-Headless-Shell version gives the best performance and resource usage.
|
50
51
|
|
51
|
-
|
52
|
+
### Installing Chrome / Headless Chrome manually
|
53
|
+
|
54
|
+
This is easiest using npx and some tooling provided by Puppeteer. Unfortunately it depends on node/npm, but it's worth it. E.g. install a specific version like this:
|
52
55
|
|
53
|
-
```sh
|
54
|
-
./chrome/mac_arm-127.0.6533.88/chrome-mac-arm64/Google\ Chrome\ for\ Testing.app/Contents/MacOS/Google\ Chrome\ for\ Testing --headless --disable-gpu --remote-debugging-port=9222
|
55
56
|
```
|
57
|
+
npx @puppeteer/browsers install chrome@127.0.6533.88
|
58
|
+
````
|
56
59
|
|
57
|
-
|
60
|
+
This installs chrome in a `chrome` folder in the current working dir and it outputs the path where it's installed when it's finished which then could be started like this
|
61
|
+
|
62
|
+
Currently we'd advise for the `chrome-headless-shell` variant that is a light version meant just for this use case. The chrome-headless-shell is a minimal, headless version of the Chrome browser designed specifically for environments where you need to run Chrome without a graphical user interface (GUI). This is particularly useful in scenarios like server-side rendering, automated testing, web scraping, or any situation where you need the power of the Chrome browser engine without the overhead of displaying a UI. Headless by design, reduced size and overhead but still the same engine.
|
58
63
|
|
59
64
|
```
|
60
65
|
npx @puppeteer/browsers install chrome-headless-shell@stable
|
61
66
|
```
|
62
67
|
|
63
|
-
It installs to a path like this `./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell`. As it's headless by design, it only needs one parameter
|
68
|
+
It installs to a path like this `./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell`. As it's headless by design, it only needs one parameter:
|
64
69
|
|
65
70
|
```
|
66
71
|
./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell --remote-debugging-port=9222
|
67
72
|
```
|
68
73
|
|
69
|
-
|
70
|
-
It guesses the path to Chrome, or you configure it like this:
|
71
|
-
|
72
|
-
```ruby
|
73
|
-
Palapala.setup do |config|
|
74
|
-
config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
|
75
|
-
end
|
76
|
-
```
|
74
|
+
*Note: The August 2024 release 128.0.6613.85 seems to be seriously performance impacted. To avoid regression issues, it's suggested to install a specific version of Chrome, test it and stick with it. The chrome-headless-shell does not seem to suffer from this though.*
|
77
75
|
|
78
76
|
### Installing Node/NPX
|
79
77
|
|
@@ -92,7 +90,6 @@ nvm --version
|
|
92
90
|
nvm install node
|
93
91
|
````
|
94
92
|
|
95
|
-
|
96
93
|
## Usage Instructions
|
97
94
|
|
98
95
|
To create a PDF from HTML content using the `Palapala` library, follow these steps:
|
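The connection order listed in the README hunk above can be sketched as a small decision function. This is an illustration only, not the gem's code: `choose_chrome_strategy`, its keyword arguments and the returned symbols are hypothetical names chosen for this sketch.

```ruby
# Hypothetical sketch of the fallback order described in the README.
# None of these names come from the gem itself.
def choose_chrome_strategy(port_in_use:, chrome_path:, npx_available:)
  # 1. A Chrome already exposing port 9222 is used as-is.
  return :use_running_chrome if port_in_use
  # 2. An explicitly configured executable path comes next.
  return :spawn_from_configured_path if chrome_path
  # 3. With npx available, chrome-headless-shell is installed and launched.
  return :spawn_headless_shell_via_npx if npx_available

  # 4. Last resort: guess a Chrome path from the detected OS.
  :spawn_from_guessed_os_path
end

puts choose_chrome_strategy(port_in_use: false, chrome_path: nil, npx_available: true)
```

Keeping the decision separate from the actual process spawning makes the order easy to test without a Chrome installation.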
data/bin/chrome-headless-server
CHANGED
@@ -1,33 +1,11 @@
|
|
1
1
|
#!/usr/bin/env ruby
|
2
2
|
|
3
|
-
|
4
|
-
require
|
3
|
+
# $LOAD_PATH.unshift File.expand_path("../lib", __dir__)
|
4
|
+
require "palapala"
|
5
5
|
|
6
|
-
|
7
|
-
|
8
|
-
output, status = Open3.capture2('npx --yes @puppeteer/browsers install chrome-headless-shell@stable')
|
9
|
-
|
10
|
-
if status.success?
|
11
|
-
# Extract the path from the output
|
12
|
-
result = output.lines.find { |line| line.include?("chrome-headless-shell@") }
|
13
|
-
if result.nil?
|
14
|
-
puts "Failed to install chrome-headless-shell"
|
15
|
-
exit 1
|
16
|
-
end
|
17
|
-
_, chrome_path = result.split(' ', 2).map(&:strip)
|
18
|
-
|
19
|
-
# Directory you want the relative path from (current working directory)
|
20
|
-
base_dir = Dir.pwd
|
21
|
-
|
22
|
-
# Convert absolute path to relative path
|
23
|
-
relative_path = Pathname.new(chrome_path).relative_path_from(Pathname.new(base_dir)).to_s
|
24
|
-
|
25
|
-
puts "Launching chrome-headless-shell at #{relative_path}"
|
26
|
-
# Display the version
|
27
|
-
system("#{chrome_path} --version")
|
28
|
-
# Launch chrome-headless-shell with the --remote-debugging-port parameter
|
29
|
-
exec("#{chrome_path} --remote-debugging-port=9222")
|
30
|
-
else
|
31
|
-
puts "Failed to install chrome-headless-shell"
|
32
|
-
exit 1
|
6
|
+
Palapala.setup do |config|
|
7
|
+
config.debug = true
|
33
8
|
end
|
9
|
+
|
10
|
+
pid = Palapala::ChromeProcess.spawn_chrome_headless_server
|
11
|
+
Process.wait(pid)
|
data/examples/headers_and_footers.rb
CHANGED
@@ -25,25 +25,20 @@ HEADER_HTML = <<~HTML
|
|
25
25
|
HTML
|
26
26
|
|
27
27
|
Palapala.setup do |config|
|
28
|
-
config.debug = true
|
29
|
-
config.headless_chrome_url = 'http://localhost:9222' # run against a remote Chrome instance
|
28
|
+
# config.debug = true
|
29
|
+
# config.headless_chrome_url = 'http://localhost:9222' # run against a remote Chrome instance
|
30
30
|
# config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
|
31
31
|
end
|
32
32
|
|
33
33
|
result = Palapala::Pdf.new(
|
34
34
|
# "<style>@page { size: A4 landscape; }</style><p>Hello world #{Time.now}</>",
|
35
35
|
"<h1>Title</h1><p>Hello world #{Time.now}</>",
|
36
|
-
|
37
|
-
|
36
|
+
header_template: HEADER_HTML,
|
37
|
+
footer_template: '<div style="text-align: center; font-size: 12pt; width: 100%;">Generated with Palapala PDF</div>',
|
38
38
|
scale: 0.75,
|
39
39
|
prefer_css_page_size: false,
|
40
|
-
|
41
|
-
).save('tmp/headers_and_footers.pdf'
|
42
|
-
generateDocumentOutline: false,
|
43
|
-
# marginTop: 1,
|
44
|
-
# paperWidth: 3,
|
45
|
-
displayHeaderFooter: true,
|
46
|
-
# landscape: false,
|
47
|
-
headerTemplate: HEADER_HTML)
|
40
|
+
margin_top: 3,
|
41
|
+
margin_bottom: 2).save('tmp/headers_and_footers.pdf')
|
48
42
|
|
49
43
|
puts result
|
44
|
+
`open tmp/headers_and_footers.pdf`
|
data/examples/js_based_rendering.rb
CHANGED
@@ -15,8 +15,9 @@ DOCUMENT = <<~HTML
|
|
15
15
|
HTML
|
16
16
|
|
17
17
|
Palapala.setup do |config|
|
18
|
-
config.debug = true
|
18
|
+
# config.debug = true
|
19
|
+
# config.defaults = { header_template: '<div></div>', footer_template: '<div></div>' }
|
19
20
|
end
|
20
21
|
|
21
|
-
|
22
|
-
|
22
|
+
Palapala::Pdf.new(DOCUMENT).save('tmp/js_based_rendering.pdf')
|
23
|
+
`open tmp/js_based_rendering.pdf`
|
data/lib/palapala/chrome_process.rb
CHANGED
@@ -25,9 +25,9 @@ module Palapala
|
|
25
25
|
end
|
26
26
|
end
|
27
27
|
|
28
|
-
# Check if a Chrome is running
|
28
|
+
# Check if a Chrome is running locally
|
29
29
|
def self.chrome_running?
|
30
|
-
port_in_use? || # Check if the port is in use
|
30
|
+
port_in_use? || # Check if the port is in use
|
31
31
|
chrome_process_healthy? # Check if the process is still alive
|
32
32
|
end
|
33
33
|
|
@@ -59,9 +59,9 @@ module Palapala
|
|
59
59
|
system("which npx > /dev/null 2>&1")
|
60
60
|
end
|
61
61
|
|
62
|
-
def self.
|
62
|
+
def self.spawn_chrome_headless_server_with_npx
|
63
63
|
# Run the command and capture the output
|
64
|
-
puts "Installing
|
64
|
+
puts "Installing/launching chrome-headless-shell@#{Palapala.chrome_headless_shell_version}"
|
65
65
|
output, status = Open3.capture2("npx --yes @puppeteer/browsers install chrome-headless-shell@#{Palapala.chrome_headless_shell_version}")
|
66
66
|
|
67
67
|
if status.success?
|
@@ -82,29 +82,37 @@ module Palapala
|
|
82
82
|
# Display the version
|
83
83
|
system("#{chrome_path} --version") if Palapala.debug
|
84
84
|
# Launch chrome-headless-shell with the --remote-debugging-port parameter
|
85
|
-
|
86
|
-
|
87
|
-
|
85
|
+
params = [ "--disable-gpu", "--remote-debugging-port=9222" ]
|
86
|
+
params.merge!(Palapala.chrome_params) if Palapala.chrome_params
|
87
|
+
pid = if Palapala.debug
|
88
|
+
spawn(chrome_path, *params)
|
88
89
|
else
|
89
|
-
spawn(chrome_path,
|
90
|
+
spawn(chrome_path, *params, out: "/dev/null", err: "/dev/null")
|
90
91
|
end
|
92
|
+
Palapala.headless_chrome_url = "http://localhost:9222"
|
93
|
+
pid
|
91
94
|
else
|
92
95
|
raise "Failed to install chrome-headless-shell"
|
93
96
|
end
|
94
97
|
end
|
95
98
|
|
99
|
+
def self.spawn_chrome_from_path
|
100
|
+
params = [ "--headless", "--disable-gpu", "--remote-debugging-port=9222" ]
|
101
|
+
params.merge!(Palapala.chrome_params) if Palapala.chrome_params
|
102
|
+
# Spawn an existing chrome with the path and parameters
|
103
|
+
Process.spawn(chrome_path, *params)
|
104
|
+
end
|
105
|
+
|
96
106
|
# Spawn a Chrome child process
|
97
107
|
def self.spawn_chrome
|
98
108
|
return if chrome_running?
|
99
109
|
|
100
|
-
|
101
|
-
|
102
|
-
|
103
|
-
|
104
|
-
|
105
|
-
|
106
|
-
@chrome_process_id = Process.spawn(chrome_path, *params)
|
107
|
-
end
|
110
|
+
@chrome_process_id =
|
111
|
+
if Palapala.headless_chrome_path.nil? && self.npx_installed?
|
112
|
+
spawn_chrome_headless_server_with_npx
|
113
|
+
else
|
114
|
+
spawn_chrome_from_path
|
115
|
+
end
|
108
116
|
|
109
117
|
# Wait until the port is in use
|
110
118
|
sleep 0.1 until port_in_use?
|
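One detail in the hunks above: `params` is an `Array`, and Ruby's `Array` does not define `merge!` (that method belongs to `Hash`), so `params.merge!(Palapala.chrome_params)` would presumably raise `NoMethodError` whenever `chrome_params` is set, unless the gem patches `Array` elsewhere. A minimal sketch of combining default and extra flags with array operations, assuming the extra flags arrive as an array of strings; `build_chrome_params` and `DEFAULT_CHROME_PARAMS` are hypothetical names, not part of the gem:

```ruby
# Hypothetical helper, not taken from the gem.
DEFAULT_CHROME_PARAMS = ["--disable-gpu", "--remote-debugging-port=9222"].freeze

def build_chrome_params(extra_params = nil, headless: false)
  params = DEFAULT_CHROME_PARAMS.dup
  # A full Chrome needs --headless; chrome-headless-shell does not.
  params.unshift("--headless") if headless
  # Array(nil) is [], so a missing setting adds nothing.
  params.concat(Array(extra_params))
  params.uniq
end

p build_chrome_params(["--no-sandbox"])
p build_chrome_params(nil, headless: true)
```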
data/lib/palapala/pdf.rb
CHANGED
@@ -42,20 +42,22 @@ module Palapala
|
|
42
42
|
scale: nil)
|
43
43
|
@content = content || raise(ArgumentError, "Content is required and can't be nil")
|
44
44
|
@opts = {}
|
45
|
-
@opts[:headerTemplate]
|
46
|
-
@opts[:footerTemplate]
|
47
|
-
@opts[:pageRanges]
|
48
|
-
@opts[:generateTaggedPDF]
|
49
|
-
@opts[:paperWidth]
|
50
|
-
@opts[:paperHeight]
|
51
|
-
@opts[:landscape]
|
52
|
-
@opts[:marginTop]
|
53
|
-
@opts[:marginLeft]
|
54
|
-
@opts[:marginBottom]
|
55
|
-
@opts[:marginRight]
|
56
|
-
@opts[:preferCSSPageSize]
|
57
|
-
@opts[:printBackground]
|
58
|
-
@opts[:scale]
|
45
|
+
@opts[:headerTemplate] = header_template || Palapala.defaults[:header_template]
|
46
|
+
@opts[:footerTemplate] = footer_template || Palapala.defaults[:footer_template]
|
47
|
+
@opts[:pageRanges] = page_ranges || Palapala.defaults[:page_ranges]
|
48
|
+
@opts[:generateTaggedPDF] = generate_tagged_pdf || Palapala.defaults[:generate_tagged_pdf]
|
49
|
+
@opts[:paperWidth] = paper_width || Palapala.defaults[:paper_width]
|
50
|
+
@opts[:paperHeight] = paper_height || Palapala.defaults[:paper_height]
|
51
|
+
@opts[:landscape] = landscape || Palapala.defaults[:landscape]
|
52
|
+
@opts[:marginTop] = margin_top || Palapala.defaults[:margin_top]
|
53
|
+
@opts[:marginLeft] = margin_left || Palapala.defaults[:margin_left]
|
54
|
+
@opts[:marginBottom] = margin_bottom || Palapala.defaults[:margin_bottom]
|
55
|
+
@opts[:marginRight] = margin_right || Palapala.defaults[:margin_right]
|
56
|
+
@opts[:preferCSSPageSize] = prefer_css_page_size || Palapala.defaults[:prefer_css_page_size]
|
57
|
+
@opts[:printBackground] = print_background || Palapala.defaults[:print_background]
|
58
|
+
@opts[:scale] = scale || Palapala.defaults[:scale]
|
59
|
+
@opts[:displayHeaderFooter] = true
|
60
|
+
@opts[:encoding] = :binary
|
59
61
|
@opts.compact!
|
60
62
|
end
|
61
63
|
|
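The `value || Palapala.defaults[:key]` pattern in the hunk above treats an explicit `false` the same as `nil`, so a caller passing, say, `print_background: false` would get the configured default instead. Whether that is intended is not stated in the source. A sketch of the difference, using a hypothetical `DEFAULTS` hash and hypothetical helper names:

```ruby
# Hypothetical defaults, for illustration only.
DEFAULTS = { print_background: true, scale: 1.0 }.freeze

# Mirrors the `||` pattern: false and nil both fall back to the default.
def option_with_or(value, key)
  value || DEFAULTS[key]
end

# Alternative: only nil falls back, so an explicit false survives.
def option_with_nil_check(value, key)
  value.nil? ? DEFAULTS[key] : value
end

opts = {
  printBackground: option_with_nil_check(false, :print_background),
  scale: option_with_nil_check(nil, :scale),
  pageRanges: option_with_nil_check(nil, :page_ranges)
}.compact # drops keys that have neither a value nor a default

p opts
```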
data/lib/palapala/renderer.rb
CHANGED
@@ -22,6 +22,13 @@ module Palapala
|
|
22
22
|
send_command_and_wait_for_result("Page.enable")
|
23
23
|
end
|
24
24
|
|
25
|
+
def websocket_url
|
26
|
+
self.class.websocket_url
|
27
|
+
rescue Errno::ECONNREFUSED
|
28
|
+
ChromeProcess.spawn_chrome # Spawn a new Chrome process
|
29
|
+
self.class.websocket_url # Retry (once)
|
30
|
+
end
|
31
|
+
|
25
32
|
# Create a thread-local instance of the renderer
|
26
33
|
def self.thread_local_instance
|
27
34
|
Thread.current[:renderer] ||= Renderer.new
|
@@ -102,16 +109,8 @@ module Palapala
|
|
102
109
|
@client.close
|
103
110
|
end
|
104
111
|
|
105
|
-
private
|
106
|
-
|
107
|
-
# Convert the HTML content to a data URL
|
108
|
-
def data_url_for_html(html)
|
109
|
-
"data:text/html;base64,#{Base64.strict_encode64(html)}"
|
110
|
-
end
|
111
|
-
|
112
112
|
# Open a new tab in the remote chrome and return the WebSocket URL
|
113
|
-
def websocket_url
|
114
|
-
ChromeProcess.spawn_chrome
|
113
|
+
def self.websocket_url
|
115
114
|
uri = URI("#{Palapala.headless_chrome_url}/json/new")
|
116
115
|
http = Net::HTTP.new(uri.host, uri.port)
|
117
116
|
request = Net::HTTP::Put.new(uri)
|
@@ -122,5 +121,12 @@ module Palapala
|
|
122
121
|
puts "WebSocket URL: #{websocket_url}" if Palapala.debug
|
123
122
|
websocket_url
|
124
123
|
end
|
124
|
+
|
125
|
+
private
|
126
|
+
|
127
|
+
# Convert the HTML content to a data URL
|
128
|
+
def data_url_for_html(html)
|
129
|
+
"data:text/html;base64,#{Base64.strict_encode64(html)}"
|
130
|
+
end
|
125
131
|
end
|
126
132
|
end
|
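The new instance-level `websocket_url` follows a rescue-and-retry-once pattern: try to connect, and on `Errno::ECONNREFUSED` start Chrome and try again. A self-contained sketch of that pattern, where `with_one_retry`, the lambda and the returned URL are hypothetical stand-ins for the gem's methods and values:

```ruby
# Hypothetical helper: run the block, and if the connection is refused,
# call the recovery hook and run the block exactly one more time.
def with_one_retry(on_refused:)
  yield
rescue Errno::ECONNREFUSED
  on_refused.call # e.g. spawn a Chrome process
  yield           # a second failure propagates to the caller
end

attempts = 0
chrome_started = false

result = with_one_retry(on_refused: -> { chrome_started = true }) do
  attempts += 1
  # Simulate "nothing listening yet" on the first attempt.
  raise Errno::ECONNREFUSED if attempts == 1

  "ws://localhost:9222/devtools/page/placeholder" # placeholder value
end

puts result
```

Because the second `yield` sits inside the `rescue` clause, a repeated failure is not rescued again, which keeps the retry bounded to one attempt.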
data/lib/palapala/version.rb
CHANGED
data/lib/palapala.rb
CHANGED
@@ -19,16 +19,20 @@ module Palapala
|
|
19
19
|
# path to the headless Chrome executable when using the child process renderer
|
20
20
|
attr_accessor :headless_chrome_path
|
21
21
|
|
22
|
-
# URL to the headless Chrome instance when using the remote renderer
|
22
|
+
# URL to the headless Chrome instance when using the remote renderer (priority)
|
23
23
|
attr_accessor :headless_chrome_url
|
24
24
|
|
25
|
-
# Chrome headless shell version to use
|
25
|
+
# Chrome headless shell version to use (stable, beta, dev, canary, etc.)
|
26
|
+
# when launching a new Chrome instance using npx
|
26
27
|
attr_accessor :chrome_headless_shell_version
|
27
28
|
end
|
28
|
-
|
29
29
|
self.debug = false
|
30
|
-
self.defaults = {
|
30
|
+
self.defaults = {
|
31
|
+
header_template: "<div></div>",
|
32
|
+
footer_template: "<div></div>"
|
33
|
+
# footer_template: '<div style="text-align: center; font-size: 12pt; width: 100%;">Generated with Palapala PDF</div>'
|
34
|
+
}
|
31
35
|
self.headless_chrome_path = nil
|
32
|
-
self.headless_chrome_url = "http://localhost:9222"
|
33
|
-
self.chrome_headless_shell_version = "stable"
|
36
|
+
self.headless_chrome_url = ENV.fetch("HEADLESS_CHROME_URL", "http://localhost:9222")
|
37
|
+
self.chrome_headless_shell_version = ENV.fetch("CHROME_HEADLESS_SHELL_VERSION", "stable")
|
34
38
|
end
|
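With the `ENV.fetch` defaults above, both settings can be supplied from the environment without touching Ruby code. A short sketch of the fallback behaviour; the `fetch_setting` wrapper is hypothetical and takes a plain hash so it can be tried without changing the real environment, while the variable names are taken from the diff:

```ruby
# Hypothetical wrapper around the same fetch-with-default call.
# `env` can be ENV or any Hash-like object that responds to fetch.
def fetch_setting(env, name, default)
  env.fetch(name, default)
end

# Variable not set: the default is used.
puts fetch_setting({}, "HEADLESS_CHROME_URL", "http://localhost:9222")

# Variable set: its value wins over the default.
puts fetch_setting({ "CHROME_HEADLESS_SHELL_VERSION" => "beta" },
                   "CHROME_HEADLESS_SHELL_VERSION", "stable")
```

Note that `fetch` only falls back when the key is absent; a variable set to an empty string is returned as an empty string.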