palapala_pdf 0.1.9 → 0.1.10
- checksums.yaml +4 -4
- data/README.md +19 -22
- data/examples/headers_and_footers.rb +7 -12
- data/examples/js_based_rendering.rb +4 -3
- data/lib/palapala/chrome_process.rb +24 -15
- data/lib/palapala/pdf.rb +16 -14
- data/lib/palapala/renderer.rb +15 -9
- data/lib/palapala/version.rb +1 -1
- data/lib/palapala.rb +9 -7
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e06d55c5dca6e14014e1154d4cd4fdcdddcd61844ccd138f38e1d9d803d1094e
+  data.tar.gz: 89ed6d300a9e4c804d3bfcb54516ec921ed741df1718d68b4d160b36e3fd2792
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f0bd26fe4c402e06f1f75ab4a5ccfad05834b78bcb125024967134149b7738b2cbd8121dd985332907644ecfd167f44fc3109b322099ebb9c0e988971f74f42e
+  data.tar.gz: c95f8931ce1538af9b0cb63d93fcc179016cfe05cd4920b7ead1bf6933f4712bfbaee3ae8d243e1112353c1c2b4af875c92b6e1e6373ad219aa0c409f7db117a
data/README.md
CHANGED
@@ -31,49 +31,47 @@ $ gem install palapala_pdf
 ```
 
 Palapala PDF connects to Chrome over a web socket connection.
-An external Chrome/Chromium is
-command (9222 is the default port):
+An external Chrome/Chromium is preferred. Start it with the following
+command (9222 is the default/expected port):
 
 ```sh
 /path/to/chrome --headless --disable-gpu --remote-debugging-port=9222
 ```
 
-###
+### Connecting to Chrome
 
-
+Palapala PDF will go through this process
 
-
-
-
+- check if a Chrome is running and exposing port 9222 (and if so, use it)
+- if `Palapala.headless_chrome_path` is defined, launch Chrome as a child process using that path
+- if **NPX** is available, install a **Chrome-Headless-Shell** variant locally and launch it as a child process. It will install the 'stable' version or the version identified by the `Palapala.chrome_headless_shell_version` setting (or from ENV `CHROME_HEADLESS_SHELL_VERSION`).
+- as a last fallback it will guess a Chrome path from the detected OS and try to launch Chrome with that
 
-
+A Chrome-Headless-Shell version gives the best performance and resource usage
 
-
+### Installing Chrome / Headless Chrome manually
+
+This is easiest using npx and some tooling provided by Puppeteer. Unfortunately it depends on node/npm, but it's worth it. E.g. install a specific version like this:
 
-```sh
-./chrome/mac_arm-127.0.6533.88/chrome-mac-arm64/Google\ Chrome\ for\ Testing.app/Contents/MacOS/Google\ Chrome\ for\ Testing --headless --disable-gpu --remote-debugging-port=9222
 ```
+npx @puppeteer/browsers install chrome@127.0.6533.88
+````
 
-
+This installs Chrome in a `chrome` folder in the current working dir. When it's finished, it outputs the path where Chrome is installed, which can then be started like this
+
+Currently we'd advise the `chrome-headless-shell` variant, a light version meant just for this use case. The chrome-headless-shell is a minimal, headless version of the Chrome browser designed specifically for environments where you need to run Chrome without a graphical user interface (GUI). This is particularly useful in scenarios like server-side rendering, automated testing, web scraping, or any situation where you need the power of the Chrome browser engine without the overhead of displaying a UI. Headless by design, reduced size and overhead but still the same engine.
 
 ```
 npx @puppeteer/browsers install chrome-headless-shell@stable
 ```
 
-It installs to a path like this `./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell`. As it's headless by design, it only needs one parameter
+It installs to a path like this `./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell`. As it's headless by design, it only needs one parameter:
 
 ```
 ./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell --remote-debugging-port=9222
 ```
 
-
-It guesses the path to Chrome, or you configure it like this:
-
-```ruby
-Palapala.setup do |config|
-  config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
-end
-```
+*Note: It seems the August 2024 release 128.0.6613.85 is seriously performance impacted. So to avoid regression issues, it's suggested to install a specific version of Chrome, test it and stick with it. The chrome-headless-shell does not seem to suffer from this though.*
 
 ### Installing Node/NPX
 
@@ -92,7 +90,6 @@ nvm --version
 nvm install node
 ````
 
-
 ## Usage Instructions
 
 To create a PDF from HTML content using the `Palapala` library, follow these steps:
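The README change above lists the order in which Palapala PDF looks for a Chrome to connect to. The decision order can be sketched roughly as follows. This is a minimal illustration, not the gem's code: `chrome_strategy`, its keyword arguments, and the returned symbols are hypothetical names chosen for this sketch.

```ruby
# Hypothetical helper, not part of palapala_pdf: returns a symbol naming
# which of the four README steps would be taken for the given inputs.
def chrome_strategy(port_in_use:, chrome_path:, npx_available:)
  # 1. A Chrome is already running and exposing port 9222: use it.
  return :use_running_chrome if port_in_use
  # 2. Palapala.headless_chrome_path is defined: launch that executable.
  return :spawn_from_configured_path if chrome_path
  # 3. npx is available: install chrome-headless-shell and launch it.
  return :install_and_spawn_with_npx if npx_available

  # 4. Last fallback: guess a Chrome path from the detected OS.
  :spawn_from_guessed_path
end
```

For example, with no Chrome listening, no configured path, and npx on the machine, `chrome_strategy(port_in_use: false, chrome_path: nil, npx_available: true)` selects the npx route.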
data/examples/headers_and_footers.rb
CHANGED
@@ -25,25 +25,20 @@ HEADER_HTML = <<~HTML
 HTML
 
 Palapala.setup do |config|
-  config.debug = true
-  config.headless_chrome_url = 'http://localhost:9222' # run against a remote Chrome instance
+  # config.debug = true
+  # config.headless_chrome_url = 'http://localhost:9222' # run against a remote Chrome instance
   # config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
 end
 
 result = Palapala::Pdf.new(
   # "<style>@page { size: A4 landscape; }</style><p>Hello world #{Time.now}</>",
   "<h1>Title</h1><p>Hello world #{Time.now}</>",
-
-
+  header_template: HEADER_HTML,
+  footer_template: '<div style="text-align: center; font-size: 12pt; width: 100%;">Generated with Palapala PDF</div>',
   scale: 0.75,
   prefer_css_page_size: false,
-
-  ).save('tmp/headers_and_footers.pdf'
-  generateDocumentOutline: false,
-  # marginTop: 1,
-  # paperWidth: 3,
-  displayHeaderFooter: true,
-  # landscape: false,
-  headerTemplate: HEADER_HTML)
+  margin_top: 3,
+  margin_bottom: 2).save('tmp/headers_and_footers.pdf')
 
 puts result
+`open tmp/headers_and_footers.pdf`
data/examples/js_based_rendering.rb
CHANGED
@@ -15,8 +15,9 @@ DOCUMENT = <<~HTML
 HTML
 
 Palapala.setup do |config|
-  config.debug = true
+  # config.debug = true
+  # config.defaults = { header_template: '<div></div>', footer_template: '<div></div>' }
 end
 
-
-
+Palapala::Pdf.new(DOCUMENT).save('tmp/js_based_rendering.pdf')
+`open tmp/js_based_rendering.pdf`
data/lib/palapala/chrome_process.rb
CHANGED
@@ -25,9 +25,9 @@ module Palapala
       end
     end
 
-    # Check if a Chrome is running
+    # Check if a Chrome is running locally
     def self.chrome_running?
-      port_in_use? || # Check if the port is in use
+      port_in_use? || # Check if the port is in use
         chrome_process_healthy? # Check if the process is still alive
     end
 
@@ -59,9 +59,9 @@ module Palapala
       system("which npx > /dev/null 2>&1")
     end
 
-    def self.
+    def self.spawn_chrome_headless_server_with_npx
       # Run the command and capture the output
-      puts "Installing
+      puts "Installing/launching chrome-headless-shell@#{Palapala.chrome_headless_shell_version}"
       output, status = Open3.capture2("npx --yes @puppeteer/browsers install chrome-headless-shell@#{Palapala.chrome_headless_shell_version}")
 
       if status.success?
@@ -82,28 +82,37 @@ module Palapala
         # Display the version
         system("#{chrome_path} --version") if Palapala.debug
         # Launch chrome-headless-shell with the --remote-debugging-port parameter
-
-
+        params = [ "--disable-gpu", "--remote-debugging-port=9222" ]
+        params.merge!(Palapala.chrome_params) if Palapala.chrome_params
+        pid = if Palapala.debug
+          spawn(chrome_path, *params)
         else
-          spawn(chrome_path,
+          spawn(chrome_path, *params, out: "/dev/null", err: "/dev/null")
         end
+        Palapala.headless_chrome_url = "http://localhost:9222"
+        pid
       else
         raise "Failed to install chrome-headless-shell"
       end
     end
 
+    def self.spawn_chrome_from_path
+      params = [ "--headless", "--disable-gpu", "--remote-debugging-port=9222" ]
+      params.merge!(Palapala.chrome_params) if Palapala.chrome_params
+      # Spawn an existing chrome with the path and parameters
+      Process.spawn(chrome_path, *params)
+    end
+
     # Spawn a Chrome child process
     def self.spawn_chrome
       return if chrome_running?
 
-
-
-
-
-
-
-      @chrome_process_id = Process.spawn(chrome_path, *params)
-    end
+      @chrome_process_id =
+        if Palapala.headless_chrome_path.nil? && self.npx_installed?
+          spawn_chrome_headless_server_with_npx
+        else
+          spawn_chrome_from_path
+        end
 
       # Wait until the port is in use
       sleep 0.1 until port_in_use?
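Both new spawn methods call `params.merge!` on an Array literal. Ruby's `Array` does not define `merge!` (it is a `Hash` method), so that line would raise `NoMethodError` as soon as it runs with `Palapala.chrome_params` set. A minimal sketch of the same parameter building with `Array#concat`, assuming the extra parameters arrive as an array of flag strings; `build_chrome_params` is a hypothetical name and not part of the gem:

```ruby
# Hypothetical helper, not the gem's code: combine the default flags from
# the diff with optional extra flags.
def build_chrome_params(extra = nil, headless: false)
  params = ["--disable-gpu", "--remote-debugging-port=9222"]
  # A full Chrome needs --headless; chrome-headless-shell is headless by design.
  params.unshift("--headless") if headless
  # Array(nil) is [], so missing extras leave the defaults untouched.
  params.concat(Array(extra))
  params
end
```

With this shape, `build_chrome_params(["--no-sandbox"], headless: true)` yields the three default flags followed by `--no-sandbox`, ready to be splatted into `Process.spawn(chrome_path, *params)`.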
data/lib/palapala/pdf.rb
CHANGED
@@ -42,20 +42,22 @@ module Palapala
                   scale: nil)
       @content = content || raise(ArgumentError, "Content is required and can't be nil")
       @opts = {}
-      @opts[:headerTemplate]
-      @opts[:footerTemplate]
-      @opts[:pageRanges]
-      @opts[:generateTaggedPDF]
-      @opts[:paperWidth]
-      @opts[:paperHeight]
-      @opts[:landscape]
-      @opts[:marginTop]
-      @opts[:marginLeft]
-      @opts[:marginBottom]
-      @opts[:marginRight]
-      @opts[:preferCSSPageSize]
-      @opts[:printBackground]
-      @opts[:scale]
+      @opts[:headerTemplate] = header_template || Palapala.defaults[:header_template]
+      @opts[:footerTemplate] = footer_template || Palapala.defaults[:footer_template]
+      @opts[:pageRanges] = page_ranges || Palapala.defaults[:page_ranges]
+      @opts[:generateTaggedPDF] = generate_tagged_pdf || Palapala.defaults[:generate_tagged_pdf]
+      @opts[:paperWidth] = paper_width || Palapala.defaults[:paper_width]
+      @opts[:paperHeight] = paper_height || Palapala.defaults[:paper_height]
+      @opts[:landscape] = landscape || Palapala.defaults[:landscape]
+      @opts[:marginTop] = margin_top || Palapala.defaults[:margin_top]
+      @opts[:marginLeft] = margin_left || Palapala.defaults[:margin_left]
+      @opts[:marginBottom] = margin_bottom || Palapala.defaults[:margin_bottom]
+      @opts[:marginRight] = margin_right || Palapala.defaults[:margin_right]
+      @opts[:preferCSSPageSize] = prefer_css_page_size || Palapala.defaults[:prefer_css_page_size]
+      @opts[:printBackground] = print_background || Palapala.defaults[:print_background]
+      @opts[:scale] = scale || Palapala.defaults[:scale]
+      @opts[:displayHeaderFooter] = true
+      @opts[:encoding] = :binary
       @opts.compact!
     end
 
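The new assignments in `pdf.rb` fall back to `Palapala.defaults` with `||`, and `compact!` then drops any option that is still `nil`. One consequence of `||` is that an explicit `false` is also replaced by the default. The sketch below reproduces that behaviour with a hypothetical `build_opts` helper and a plain hash standing in for `Palapala.defaults`; the `print_background: true` default is an assumption made only for illustration.

```ruby
# Hypothetical helper mirroring the fallback pattern in the diff.
# `defaults` stands in for Palapala.defaults.
def build_opts(defaults, header_template: nil, print_background: nil, scale: nil)
  opts = {}
  opts[:headerTemplate]  = header_template  || defaults[:header_template]
  opts[:printBackground] = print_background || defaults[:print_background]
  opts[:scale]           = scale            || defaults[:scale]
  opts.compact! # drop keys whose value is still nil
  opts
end

defaults = { header_template: "<div></div>", print_background: true }

# No arguments: defaults apply and the unset :scale is removed.
p build_opts(defaults)
# An explicit false falls through `||` and picks up the default instead.
p build_opts(defaults, print_background: false)
```

If an explicit `false` should win over a configured default, a `nil?` check such as `value.nil? ? defaults[key] : value` would keep it.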
data/lib/palapala/renderer.rb
CHANGED
@@ -22,6 +22,13 @@ module Palapala
       send_command_and_wait_for_result("Page.enable")
     end
 
+    def websocket_url
+      self.class.websocket_url
+    rescue Errno::ECONNREFUSED
+      ChromeProcess.spawn_chrome # Spawn a new Chrome process
+      self.class.websocket_url # Retry (once)
+    end
+
     # Create a thread-local instance of the renderer
     def self.thread_local_instance
       Thread.current[:renderer] ||= Renderer.new
@@ -102,16 +109,8 @@ module Palapala
       @client.close
     end
 
-    private
-
-    # Convert the HTML content to a data URL
-    def data_url_for_html(html)
-      "data:text/html;base64,#{Base64.strict_encode64(html)}"
-    end
-
     # Open a new tab in the remote chrome and return the WebSocket URL
-    def websocket_url
-      ChromeProcess.spawn_chrome
+    def self.websocket_url
       uri = URI("#{Palapala.headless_chrome_url}/json/new")
       http = Net::HTTP.new(uri.host, uri.port)
       request = Net::HTTP::Put.new(uri)
@@ -122,5 +121,12 @@ module Palapala
       puts "WebSocket URL: #{websocket_url}" if Palapala.debug
       websocket_url
     end
+
+    private
+
+    # Convert the HTML content to a data URL
+    def data_url_for_html(html)
+      "data:text/html;base64,#{Base64.strict_encode64(html)}"
+    end
   end
 end
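The renderer change moves Chrome spawning out of the normal path: the instance-level `websocket_url` first tries to connect, and only on `Errno::ECONNREFUSED` spawns Chrome and retries once. A minimal sketch of that retry-once shape, with hypothetical `connect` and `spawn` callables standing in for `self.class.websocket_url` and `ChromeProcess.spawn_chrome`:

```ruby
# Hypothetical stand-in for the renderer logic: `connect` and `spawn` are
# callables supplied by the caller, not palapala_pdf APIs.
def connect_with_one_retry(connect:, spawn:)
  connect.call
rescue Errno::ECONNREFUSED
  spawn.call   # start a Chrome, as ChromeProcess.spawn_chrome does in the diff
  connect.call # retry once; a second failure propagates to the caller
end
```

Because the retry happens inside the `rescue` clause, a second `Errno::ECONNREFUSED` is not caught again, so the method cannot loop.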
data/lib/palapala/version.rb
CHANGED
data/lib/palapala.rb
CHANGED
@@ -19,18 +19,20 @@ module Palapala
     # path to the headless Chrome executable when using the child process renderer
     attr_accessor :headless_chrome_path
 
-    # URL to the headless Chrome instance when using the remote renderer
+    # URL to the headless Chrome instance when using the remote renderer (priority)
     attr_accessor :headless_chrome_url
 
-    # Chrome headless shell version to use
+    # Chrome headless shell version to use (stable, beta, dev, canary, etc.)
+    # when launching a new Chrome instance using npx
     attr_accessor :chrome_headless_shell_version
   end
-  puts "setting defaults on palapala"
   self.debug = false
-  self.defaults = {
+  self.defaults = {
+    header_template: "<div></div>",
+    footer_template: "<div></div>"
+    # footer_template: '<div style="text-align: center; font-size: 12pt; width: 100%;">Generated with Palapala PDF</div>'
+  }
   self.headless_chrome_path = nil
-  self.headless_chrome_url = "http://localhost:9222"
+  self.headless_chrome_url = ENV.fetch("HEADLESS_CHROME_URL", "http://localhost:9222")
   self.chrome_headless_shell_version = ENV.fetch("CHROME_HEADLESS_SHELL_VERSION", "stable")
 end
-
-  puts "hoo"
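With this change the Chrome URL can be overridden through the `HEADLESS_CHROME_URL` environment variable, using the same `ENV.fetch(key, default)` pattern already applied to `CHROME_HEADLESS_SHELL_VERSION`. A small sketch of how that lookup behaves; `headless_chrome_url_from` is a hypothetical helper that accepts any object responding to `fetch`, so a plain Hash can stand in for `ENV`:

```ruby
# Hypothetical helper mirroring the ENV.fetch call in the diff.
# `env` defaults to the real ENV but can be a Hash for experimentation.
def headless_chrome_url_from(env = ENV)
  env.fetch("HEADLESS_CHROME_URL", "http://localhost:9222")
end

# Variable unset: the default URL is used.
puts headless_chrome_url_from({})
# Variable set: its value wins over the default.
puts headless_chrome_url_from({ "HEADLESS_CHROME_URL" => "http://chrome:9222" })
```

Note that `fetch` only falls back when the key is absent; a variable that is set to an empty string is returned as-is.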