palapala_pdf 0.1.6 → 0.1.7
- checksums.yaml +4 -4
- data/.rubocop.yml +3 -0
- data/.rubocop_todo.yml +1 -38
- data/README.md +55 -6
- data/assets/images/logo-variant2.webp +0 -0
- data/assets/images/logo.webp +0 -0
- data/examples/headers_and_footers.rb +49 -0
- data/examples/js_based_rendering.rb +22 -0
- data/examples/performance_benchmark.rb +50 -0
- data/exe/chrome-headless-server.sh +19 -0
- data/lib/palapala/chrome_process.rb +100 -0
- data/lib/palapala/pdf.rb +74 -54
- data/lib/palapala/renderer.rb +21 -93
- data/lib/palapala/version.rb +1 -3
- data/lib/palapala/web_socket_client.rb +2 -4
- data/lib/palapala.rb +19 -11
- data/lib/palapala_pdf.rb +1 -0
- data/palapala_pdf.gemspec +3 -3
- metadata +12 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 3a5601b10de0f7f98fd62c9dda59133ed4f4ce89c3545d6522e3eea2cdbbfbac
|
4
|
+
data.tar.gz: 82b6a5f0919e6e587e0b3e0cd02fe3547e4c6ac3e6f55ddea8db97f9b89729b5
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 8f848195ef97f03506d3847c26a996ef9c70bdb953349720a6cd39ae092aec08bdb915afbfbbae16a0866e7ee2b3ce907991f75353677fee795baada3828be52
|
7
|
+
data.tar.gz: 74663055627e1fab37a2884dfc74d591238c75d8677f16501f1f369b58c53a096e7adfa7f4ee390dd72afc81bff96eb10f46c23ccc7b364423b301a7a8592964
|
data/.rubocop.yml
CHANGED
@@ -1,4 +1,6 @@
|
|
1
1
|
inherit_from: .rubocop_todo.yml
|
2
|
+
# Omakase Ruby styling for Rails
|
3
|
+
inherit_gem: { rubocop-rails-omakase: rubocop.yml }
|
2
4
|
|
3
5
|
# This is a basic RuboCop configuration file
|
4
6
|
AllCops:
|
@@ -12,3 +14,4 @@ AllCops:
|
|
12
14
|
require:
|
13
15
|
- rubocop-minitest
|
14
16
|
- rubocop-rake
|
17
|
+
- rubocop-performance
|
data/.rubocop_todo.yml
CHANGED
@@ -1,44 +1,7 @@
|
|
1
1
|
# This configuration was generated by
|
2
2
|
# `rubocop --auto-gen-config`
|
3
|
-
# on 2024-08-
|
3
|
+
# on 2024-08-27 21:04:10 UTC using RuboCop version 1.65.1.
|
4
4
|
# The point is for the user to remove these configuration records
|
5
5
|
# one by one as the offenses are removed from the code base.
|
6
6
|
# Note that changes in the inspected code, or installation of new
|
7
7
|
# versions of RuboCop, may require this file to be generated again.
|
8
|
-
|
9
|
-
# Offense count: 1
|
10
|
-
# Configuration parameters: AllowedMethods, AllowedPatterns, CountRepeatedAttributes.
|
11
|
-
Metrics/AbcSize:
|
12
|
-
Max: 33
|
13
|
-
|
14
|
-
# Offense count: 1
|
15
|
-
# Configuration parameters: AllowedMethods, AllowedPatterns.
|
16
|
-
Metrics/CyclomaticComplexity:
|
17
|
-
Max: 13
|
18
|
-
|
19
|
-
# Offense count: 1
|
20
|
-
# Configuration parameters: CountComments, CountAsOne, AllowedMethods, AllowedPatterns.
|
21
|
-
Metrics/MethodLength:
|
22
|
-
Max: 19
|
23
|
-
|
24
|
-
# Offense count: 1
|
25
|
-
# Configuration parameters: CountKeywordArgs, MaxOptionalParameters.
|
26
|
-
Metrics/ParameterLists:
|
27
|
-
Max: 10
|
28
|
-
|
29
|
-
# Offense count: 1
|
30
|
-
# Configuration parameters: AllowedMethods, AllowedPatterns.
|
31
|
-
Metrics/PerceivedComplexity:
|
32
|
-
Max: 13
|
33
|
-
|
34
|
-
# Offense count: 2
|
35
|
-
Style/ClassVars:
|
36
|
-
Exclude:
|
37
|
-
- 'lib/palapala/pdf.rb'
|
38
|
-
|
39
|
-
# Offense count: 1
|
40
|
-
# This cop supports safe autocorrection (--autocorrect).
|
41
|
-
# Configuration parameters: AllowHeredoc, AllowURI, URISchemes, IgnoreCopDirectives, AllowedPatterns.
|
42
|
-
# URISchemes: http, https
|
43
|
-
Layout/LineLength:
|
44
|
-
Max: 121
|
data/README.md
CHANGED
@@ -1,17 +1,20 @@
|
|
1
1
|
# PDF Generation for your Rubies
|
2
2
|
|
3
|
+
<div align="center"><img src="https://raw.githubusercontent.com/palapala-app/palapala_pdf/main/assets/images/logo.webp" alt="Palapala PDF Logo" width="200"></div>
|
4
|
+
|
3
5
|
This project is a Ruby gem that provides functionality for generating PDF files from HTML using the Chrome browser. It allows you to easily convert HTML content into PDF documents, making it convenient for tasks such as generating reports, invoices, or any other printable documents. The gem provides a simple and intuitive API for converting HTML to PDF, and it leverages the power and flexibility of the Chrome browser's rendering engine to ensure accurate and high-quality PDF output. With this gem, you can easily integrate PDF generation capabilities into your Ruby applications.
|
4
6
|
|
5
7
|
At the core, this project leverages the same rendering engine as [Grover](https://github.com/Studiosity/grover), but with significantly reduced overhead and dependencies. Instead of relying on the full Grover/Puppeteer/NodeJS stack, this project uses a raw web socket to enable direct communication from Ruby to a headless Chrome or Chromium browser. This approach ensures efficieny while providing a streamlined alternative for rendering tasks without sacrificing performance or flexibility.
|
6
8
|
|
7
|
-
This is how easy
|
9
|
+
This is how easy PDF generation can be in Ruby:
|
8
10
|
|
9
11
|
```ruby
|
10
12
|
require "palapala"
|
11
13
|
Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
|
12
14
|
```
|
15
|
+
And this while having the most modern HTML/CSS/JS availlable to you: flex, grid, canvas, ...
|
13
16
|
|
14
|
-
|
17
|
+
A core goal of this project is performance, and it is designed to be exceptionally fast. By leveraging **direct communication** with a headless Chrome or Chromium browser via a **raw web socket**, the gem minimizes overhead and dependencies, enabling PDF generation at speeds that significantly outperform other solutions. Whether generating simple or complex documents, this gem ensures that your Ruby applications can handle PDF tasks efficiently and at scale.
|
15
18
|
|
16
19
|
## Installation
|
17
20
|
|
@@ -28,23 +31,68 @@ $ gem install palapala_pdf
|
|
28
31
|
```
|
29
32
|
|
30
33
|
Palapala PDF connects to Chrome over a web socket connection.
|
31
|
-
|
32
|
-
|
33
|
-
Just start it with the following command (9222 is the default port):
|
34
|
+
An external Chrome/Chromium is expected. Start it with the following
|
35
|
+
command (9222 is the default port):
|
34
36
|
|
35
37
|
```sh
|
36
38
|
/path/to/chrome --headless --disable-gpu --remote-debugging-port=9222
|
37
39
|
```
|
38
40
|
|
41
|
+
### Installing Chrome / Headless Chrome
|
42
|
+
|
43
|
+
Seems the august 2024 release 128.0.6613.85 is seriously performance impacted. So to avoid regression issues, it's suggested to install a specific version of Chrome, test it and stick with it. This is easiest using npx and some tooling provided by Puppeteer. Unfortunately it depends on node/npm, but it's worth it. E.g. install a specific version like this:
|
44
|
+
|
45
|
+
```
|
46
|
+
npx @puppeteer/browsers install chrome@127.0.6533.88
|
47
|
+
````
|
48
|
+
|
49
|
+
This installs chrome in a `chrome` folder in the current working dir and it outputs the path where it's installed when it's finished.
|
50
|
+
|
51
|
+
If you installed it using puppeteer from above
|
52
|
+
|
53
|
+
```sh
|
54
|
+
./chrome/mac_arm-127.0.6533.88/chrome-mac-arm64/Google\ Chrome\ for\ Testing.app/Contents/MacOS/Google\ Chrome\ for\ Testing --headless --disable-gpu --remote-debugging-port=9222
|
55
|
+
```
|
56
|
+
|
57
|
+
Currently i'd advise for the `chrome-headless-shell`variant that is a light version meant just for this use case. The chrome-headless-shell is a minimal, headless version of the Chrome browser designed specifically for environments where you need to run Chrome without a graphical user interface (GUI). This is particularly useful in scenarios like server-side rendering, automated testing, web scraping, or any situation where you need the power of the Chrome browser engine without the overhead of displaying a UI. Headless by design, reduced size and overhead but still the same engine.
|
58
|
+
|
59
|
+
```
|
60
|
+
npx @puppeteer/browsers install chrome-headless-shell@stable
|
61
|
+
```
|
62
|
+
|
63
|
+
It installs to a path like this `./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell`. As it's headless by design, it only needs one parameter
|
64
|
+
|
65
|
+
```
|
66
|
+
./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell --remote-debugging-port=9222
|
67
|
+
```
|
68
|
+
|
39
69
|
Alternatively, Palapala PDF will try to launch Chrome as a child process.
|
40
70
|
It guesses the path to Chrome, or you configure it like this:
|
41
71
|
|
42
72
|
```ruby
|
43
73
|
Palapala.setup do |config|
|
44
|
-
|
74
|
+
config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
|
45
75
|
end
|
46
76
|
```
|
47
77
|
|
78
|
+
### Installing Node/NPX
|
79
|
+
|
80
|
+
Using Brew
|
81
|
+
|
82
|
+
````
|
83
|
+
brew install node
|
84
|
+
```
|
85
|
+
|
86
|
+
Using NVM (Node Version Manager)
|
87
|
+
|
88
|
+
````
|
89
|
+
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.3/install.sh | bash
|
90
|
+
source ~/.nvm/nvm.sh
|
91
|
+
nvm --version
|
92
|
+
nvm install node
|
93
|
+
````
|
94
|
+
|
95
|
+
|
48
96
|
## Usage Instructions
|
49
97
|
|
50
98
|
To create a PDF from HTML content using the `Palapala` library, follow these steps:
|
@@ -146,6 +194,7 @@ Bug reports and pull requests are welcome on GitHub at https://github.com/palapa
|
|
146
194
|
|
147
195
|
- [Kenneth Geerts](https://github.com/kennethgeerts) - Your foundational contributions to simplicity are greatly appreciated.
|
148
196
|
- [Eugen Neagoe](https://github.com/eneagoe) - Thank you for your valuable input, feedback and opinions.
|
197
|
+
- [Radu Bogoevici](https://github.com/codenighter) - Thanks for test driving, and all help big and small.
|
149
198
|
|
150
199
|
## Sponsor This Project
|
151
200
|
|
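data/assets/images/logo-variant2.webp
ADDED
Binary file
|
data/assets/images/logo.webp
ADDED
Binary file
|
data/examples/headers_and_footers.rb
ADDED
@@ -0,0 +1,49 @@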
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
$LOAD_PATH.unshift File.expand_path('../lib', __dir__)
|
4
|
+
require 'palapala'
|
5
|
+
|
6
|
+
HEADER_HTML = <<~HTML
|
7
|
+
<style type="text/css">
|
8
|
+
.header {
|
9
|
+
-webkit-print-color-adjust: exact;
|
10
|
+
border-bottom: 1px solid lightgray;
|
11
|
+
color: black;
|
12
|
+
font-family: Arial, Helvetica, sans-serif;
|
13
|
+
font-size: 12pt;
|
14
|
+
margin: 0 auto;
|
15
|
+
padding: 5px;
|
16
|
+
text-align: center;
|
17
|
+
vertical-align: middle;
|
18
|
+
width: 100%;
|
19
|
+
border: 1px solid black;
|
20
|
+
}
|
21
|
+
</style>
|
22
|
+
<div class="header" style="text-align: center">
|
23
|
+
Page <span class="pageNumber"></span> of <span class="totalPages"></span>
|
24
|
+
</div>
|
25
|
+
HTML
|
26
|
+
|
27
|
+
Palapala.setup do |config|
|
28
|
+
config.debug = true
|
29
|
+
config.headless_chrome_url = 'http://localhost:9222' # run against a remote Chrome instance
|
30
|
+
# config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
|
31
|
+
end
|
32
|
+
|
33
|
+
result = Palapala::Pdf.new(
|
34
|
+
# "<style>@page { size: A4 landscape; }</style><p>Hello world #{Time.now}</>",
|
35
|
+
"<h1>Title</h1><p>Hello world #{Time.now}</>",
|
36
|
+
header_html: HEADER_HTML,
|
37
|
+
footer_html: '<div style="text-align: center;">Generated with Palapala PDF</div>',
|
38
|
+
scale: 0.75,
|
39
|
+
prefer_css_page_size: false,
|
40
|
+
margin: { top: 3, bottom: 2 }
|
41
|
+
).save('tmp/headers_and_footers.pdf',
|
42
|
+
generateDocumentOutline: false,
|
43
|
+
# marginTop: 1,
|
44
|
+
# paperWidth: 3,
|
45
|
+
displayHeaderFooter: true,
|
46
|
+
# landscape: false,
|
47
|
+
headerTemplate: HEADER_HTML)
|
48
|
+
|
49
|
+
puts result
|
data/examples/js_based_rendering.rb
ADDED
@@ -0,0 +1,22 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
$LOAD_PATH.unshift File.expand_path('../lib', __dir__)
|
4
|
+
require 'palapala'
|
5
|
+
|
6
|
+
DOCUMENT = <<~HTML
|
7
|
+
<html>
|
8
|
+
<script type="text/javascript">
|
9
|
+
document.addEventListener("DOMContentLoaded", () => {
|
10
|
+
document.body.innerHTML += "<p>Current time from JS: " + new Date().toLocaleString() + "</p>";
|
11
|
+
});
|
12
|
+
</script>
|
13
|
+
<body><p>Default body text.</p></body>
|
14
|
+
</html>
|
15
|
+
HTML
|
16
|
+
|
17
|
+
Palapala.setup do |config|
|
18
|
+
config.debug = true
|
19
|
+
end
|
20
|
+
|
21
|
+
result = Palapala::Pdf.new(DOCUMENT).save('tmp/js_based_rendering.pdf')
|
22
|
+
puts result
|
data/examples/performance_benchmark.rb
ADDED
@@ -0,0 +1,50 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
$LOAD_PATH.unshift File.expand_path('../lib', __dir__)
|
4
|
+
|
5
|
+
require 'benchmark'
|
6
|
+
require 'palapala'
|
7
|
+
|
8
|
+
debug = ARGV[0] == 'debug'
|
9
|
+
|
10
|
+
Palapala.setup do |config|
|
11
|
+
# config.headless_chrome_url = 'http://localhost:9222'
|
12
|
+
config.debug = debug
|
13
|
+
config.defaults.merge! scale: 0.75, format: :A4
|
14
|
+
end
|
15
|
+
|
16
|
+
# @param concurrency Number of concurrent threads
|
17
|
+
# @param iterations Number of iterations per thread
|
18
|
+
def benchmark(concurrency, iterations)
|
19
|
+
time = Benchmark.realtime do
|
20
|
+
threads = (1..concurrency).map do |i|
|
21
|
+
Thread.new do
|
22
|
+
iterations.times do |j|
|
23
|
+
doc = "Hello #{i}, world #{j}! #{Time.now}."
|
24
|
+
Palapala::Pdf.new(doc).save("tmp/benchmark_#{i}_#{j}.pdf")
|
25
|
+
end
|
26
|
+
end
|
27
|
+
end
|
28
|
+
threads.each(&:join)
|
29
|
+
end
|
30
|
+
puts "c:#{concurrency}, n:#{iterations} : Throughput = #{(concurrency * iterations / time).round(2)} docs/sec, Total time = #{time.round(4)} seconds"
|
31
|
+
time
|
32
|
+
end
|
33
|
+
|
34
|
+
puts 'warmup'
|
35
|
+
benchmark(1, 10)
|
36
|
+
|
37
|
+
puts 'benchmarking 20 docs: 1x20, 2x10, 4x5, 5x4, 20x1'
|
38
|
+
benchmark(1, 20)
|
39
|
+
benchmark(2, 10)
|
40
|
+
benchmark(4, 5)
|
41
|
+
# benchmark(5, 4)
|
42
|
+
# benchmark(20, 1)
|
43
|
+
|
44
|
+
puts 'benchmarking 320 docs'
|
45
|
+
benchmark(1, 320)
|
46
|
+
benchmark(2, 320 / 2)
|
47
|
+
benchmark(4, 320 / 4)
|
48
|
+
benchmark(8, 320 / 8)
|
49
|
+
# benchmark(20, 2)
|
50
|
+
# benchmark(40, 1)
|
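The benchmark above fans work out over threads and reports documents per second. The same measurement can be sketched without Chrome, assuming a stand-in workload; `measure_throughput` and the `sleep` call are hypothetical and only illustrate the arithmetic (jobs divided by wall-clock time).

```ruby
require "benchmark"

# Runs `concurrency` threads that each call `job` `iterations` times and
# returns [elapsed_seconds, jobs_per_second]. `job` is a stand-in for the
# PDF rendering call used in the benchmark.
def measure_throughput(concurrency, iterations, &job)
  elapsed = Benchmark.realtime do
    threads = (1..concurrency).map do |i|
      Thread.new { iterations.times { |j| job.call(i, j) } }
    end
    threads.each(&:join)
  end
  [elapsed, (concurrency * iterations) / elapsed]
end

completed = Queue.new # thread-safe collector for finished jobs
elapsed, rate = measure_throughput(4, 5) do |i, j|
  sleep 0.001 # hypothetical workload standing in for a render
  completed << [i, j]
end
puts "#{completed.size} jobs in #{elapsed.round(4)}s (#{rate.round(2)} jobs/sec)"
```

Collecting results in a `Queue` keeps the count correct even when several threads finish at the same time.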
data/exe/chrome-headless-server.sh
ADDED
@@ -0,0 +1,19 @@
|
|
1
|
+
#!/bin/bash
|
2
|
+
|
3
|
+
# Run the command and capture the output
|
4
|
+
echo "Installing latest stable chrome-headless-shell..."
|
5
|
+
output=$(npx @puppeteer/browsers install chrome-headless-shell@stable)
|
6
|
+
|
7
|
+
# Extract the path from the output
|
8
|
+
chrome_path=$(echo "$output" | grep "chrome-headless-shell@" | awk '{print $2}')
|
9
|
+
|
10
|
+
# Directory you want the relative path from (current working directory)
|
11
|
+
base_dir=$(pwd)
|
12
|
+
|
13
|
+
# Convert absolute path to relative path using Node.js
|
14
|
+
relative_path=$(node -e "console.log(require('path').relative('$base_dir', '$chrome_path'))")
|
15
|
+
|
16
|
+
echo "Launching chrome-headless-shell at $relative_path"
|
17
|
+
echo $("$chrome_path" --version)
|
18
|
+
# Launch chrome-headless-shell with the --remote-debugging-port parameter
|
19
|
+
"$chrome_path" --remote-debugging-port=9222
|
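The script above pulls the executable path out of the installer output with `grep` and `awk`, then makes it relative with Node.js. A Ruby sketch of the same two steps follows. It assumes each relevant output line has the form `<browser>@<build> <absolute path>`, which is what the `awk '{print $2}'` call implies; the sample line and the `/work` base directory are made up.

```ruby
require "pathname"

# Made-up sample of the installer output (format assumed, see above).
sample_output = "chrome-headless-shell@128.0.6613.84 " \
                "/work/chrome-headless-shell/linux-128.0.6613.84/" \
                "chrome-headless-shell-linux64/chrome-headless-shell\n"

# Returns the path portion of the first matching line, or nil if none matches.
def extract_chrome_path(output)
  line = output.each_line.find { |l| l.include?("chrome-headless-shell@") }
  return nil unless line

  # Split only once so a path containing spaces stays intact.
  line.chomp.split(" ", 2)[1]
end

chrome_path = extract_chrome_path(sample_output)
relative_path = Pathname.new(chrome_path).relative_path_from(Pathname.new("/work")).to_s
puts relative_path
```

Splitting on the first space only is a deliberate difference from `awk '{print $2}'`, which would cut a path at its first space.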
data/lib/palapala/chrome_process.rb
ADDED
@@ -0,0 +1,100 @@
|
|
1
|
+
module Palapala
|
2
|
+
# Manage the Chrome child process
|
3
|
+
module ChromeProcess
|
4
|
+
# Check if the port is in use
|
5
|
+
def self.port_in_use?(port = 9222, host = "127.0.0.1")
|
6
|
+
server = TCPServer.new(host, port)
|
7
|
+
server.close
|
8
|
+
false
|
9
|
+
rescue Errno::EADDRINUSE
|
10
|
+
true
|
11
|
+
end
|
12
|
+
|
13
|
+
# Check if the Chrome process is healthy
|
14
|
+
def self.chrome_process_healthy?
|
15
|
+
return false if @chrome_process_id.nil?
|
16
|
+
|
17
|
+
begin
|
18
|
+
Process.kill(0, @chrome_process_id) # Check if the process is alive
|
19
|
+
true
|
20
|
+
rescue Errno::ESRCH, Errno::EPERM
|
21
|
+
false
|
22
|
+
end
|
23
|
+
end
|
24
|
+
|
25
|
+
# Check if a Chrome is running
|
26
|
+
def self.chrome_running?
|
27
|
+
port_in_use? || # Check if the port is in use and Chrome is running externally
|
28
|
+
chrome_process_healthy? # Check if the process is still alive
|
29
|
+
end
|
30
|
+
|
31
|
+
# Kill the Chrome child process
|
32
|
+
def self.kill_chrome
|
33
|
+
return if @chrome_process_id.nil?
|
34
|
+
|
35
|
+
Process.kill("KILL", @chrome_process_id) # Kill the process
|
36
|
+
Process.wait(@chrome_process_id) # Wait for the process to finish
|
37
|
+
end
|
38
|
+
|
39
|
+
# Get the path to the Chrome executable, if it's not set, then guess based on the OS
|
40
|
+
def self.chrome_path
|
41
|
+
return Palapala.headless_chrome_path if Palapala.headless_chrome_path
|
42
|
+
|
43
|
+
case RbConfig::CONFIG["host_os"]
|
44
|
+
when /darwin/
|
45
|
+
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
|
46
|
+
when /linux/
|
47
|
+
"/usr/bin/google-chrome" # or "/usr/bin/chromium-browser"
|
48
|
+
when /win|mingw|cygwin/
|
49
|
+
"#{ENV.fetch("ProgramFiles(x86)", nil)}\\Google\\Chrome\\Application\\chrome.exe"
|
50
|
+
else
|
51
|
+
raise "Unsupported OS"
|
52
|
+
end
|
53
|
+
end
|
54
|
+
|
55
|
+
# Spawn a Chrome child process
|
56
|
+
def self.spawn_chrome
|
57
|
+
return if chrome_running?
|
58
|
+
|
59
|
+
# Define the path and parameters separately
|
60
|
+
# chrome_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
|
61
|
+
params = [ "--headless", "--disable-gpu", "--remote-debugging-port=9222" ]
|
62
|
+
params.merge!(Palapala.chrome_params) if Palapala.chrome_params
|
63
|
+
|
64
|
+
# Spawn the process with the path and parameters
|
65
|
+
@chrome_process_id = Process.spawn(chrome_path, *params)
|
66
|
+
|
67
|
+
# Wait until the port is in use
|
68
|
+
sleep 0.1 until port_in_use?
|
69
|
+
# Detach the process so it runs in the background
|
70
|
+
Process.detach(@chrome_process_id)
|
71
|
+
|
72
|
+
at_exit do
|
73
|
+
if @chrome_process_id
|
74
|
+
begin
|
75
|
+
Process.kill("TERM", @chrome_process_id)
|
76
|
+
Process.wait(@chrome_process_id)
|
77
|
+
puts "Child process #{@chrome_process_id} terminated."
|
78
|
+
rescue Errno::ESRCH
|
79
|
+
puts "Child process #{@chrome_process_id} is already terminated."
|
80
|
+
rescue Errno::ECHILD
|
81
|
+
puts "No child process #{@chrome_process_id} found."
|
82
|
+
end
|
83
|
+
end
|
84
|
+
end
|
85
|
+
|
86
|
+
# Handle when the process is killed
|
87
|
+
trap("SIGCHLD") do
|
88
|
+
while (@chrome_process_id = Process.wait(-1, Process::WNOHANG))
|
89
|
+
break if @chrome_process_id.nil?
|
90
|
+
|
91
|
+
puts "Process #{@chrome_process_id} was killed."
|
92
|
+
# Handle the error or restart the process if necessary
|
93
|
+
@chrome_process_id = nil
|
94
|
+
end
|
95
|
+
rescue Errno::ECHILD
|
96
|
+
@chrome_process_id = nil
|
97
|
+
end
|
98
|
+
end
|
99
|
+
end
|
100
|
+
end
|
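Two details of the module above can be tried in isolation: the bind-based port probe, and the way extra launch flags are combined. Ruby's `Array` does not define `merge!`, so the `params.merge!` line would raise `NoMethodError` whenever `chrome_params` is set; the sketch below uses `concat` instead. It is a standalone illustration, not the gem's code, and it assumes `chrome_params` is meant to be an array of flag strings.

```ruby
require "socket"

# Returns true when host:port cannot be bound, i.e. something is already
# listening there (same idea as port_in_use? in the diff).
def port_in_use?(port, host = "127.0.0.1")
  TCPServer.new(host, port).close
  false
rescue Errno::EADDRINUSE
  true
end

# Builds the launch flags; `extra` stands in for a user-supplied list
# (assumed to be an Array of strings, or nil).
def chrome_flags(extra = nil)
  flags = ["--headless", "--disable-gpu", "--remote-debugging-port=9222"]
  flags.concat(Array(extra)) # concat appends in place; Array(nil) is []
end

listener = TCPServer.new("127.0.0.1", 0) # port 0 lets the OS pick a free port
port = listener.addr[1]
busy = port_in_use?(port)
listener.close
free_again = !port_in_use?(port)

puts "busy=#{busy} free_again=#{free_again}"
puts chrome_flags(["--no-sandbox"]).inspect
```

The probe only shows that the port is taken, not that Chrome is the process holding it, which is worth keeping in mind when reading `chrome_running?`.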
data/lib/palapala/pdf.rb
CHANGED
@@ -1,67 +1,87 @@
|
|
1
|
-
|
1
|
+
require_relative "./renderer"
|
2
2
|
|
3
3
|
module Palapala
|
4
4
|
# Page class to generate PDF from HTML content using Chrome in headless mode in a thread-safe way
|
5
5
|
# @param page_ranges Empty string means all pages, e.g., "1-3, 5, 7-9"
|
6
6
|
class Pdf
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
7
|
+
# Initialize the PDF object with the HTML content and optional parameters.
|
8
|
+
#
|
9
|
+
# The options are passed to the renderer when generating the PDF.
|
10
|
+
# The options are the snakified version of the options from the Chrome DevTools Protocol to respect the Ruby conventions.
|
11
|
+
# (see https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF)
|
12
|
+
#
|
13
|
+
# @param content [String] the HTML content to convert to PDF
|
14
|
+
# @param footer_html [String] the HTML content for the footer
|
15
|
+
# @param generate_tagged_pdf [Boolean] whether to generate a tagged PDF
|
16
|
+
# @param header_html [String] the HTML content for the header
|
17
|
+
# @param landscape [Boolean] whether to use landscape orientation
|
18
|
+
# @param margin_bottom [Integer] the bottom margin in inches
|
19
|
+
# @param margin_left [Integer] the left margin in inches
|
20
|
+
# @param margin_right [Integer] the right margin in inches
|
21
|
+
# @param margin_top [Integer] the top margin in inches
|
22
|
+
# @param page_ranges [String] the page ranges to print, e.g., "1-3, 5, 7-9"
|
23
|
+
# @param paper_height [Integer] the paper height in inches
|
24
|
+
# @param paper_width [Integer] the paper width in inches
|
25
|
+
# @param prefer_css_page_size [Boolean] whether to prefer CSS page size (advised)
|
26
|
+
# @param print_background [Boolean] whether to print background graphics
|
27
|
+
# @param scale [Float] the scale of the PDF rendering
|
28
|
+
def initialize(content,
|
29
|
+
footer_template: nil,
|
30
|
+
generate_tagged_pdf: nil,
|
31
|
+
header_template: nil,
|
32
|
+
landscape: nil,
|
33
|
+
margin_bottom: nil,
|
34
|
+
margin_left: nil,
|
35
|
+
margin_right: nil,
|
36
|
+
margin_top: nil,
|
37
|
+
page_ranges: nil,
|
38
|
+
paper_height: nil,
|
39
|
+
paper_width: nil,
|
40
|
+
prefer_css_page_size: nil,
|
41
|
+
print_background: nil,
|
42
|
+
scale: nil)
|
43
|
+
@content = content || raise(ArgumentError, "Content is required and can't be nil")
|
44
|
+
@opts = {}
|
45
|
+
@opts[:headerTemplate] = header_template || Palapala.defaults[:header_template]
|
46
|
+
@opts[:footerTemplate] = footer_template || Palapala.defaults[:footer_template]
|
47
|
+
@opts[:pageRanges] = page_ranges || Palapala.defaults[:page_ranges]
|
48
|
+
@opts[:generateTaggedPDF] = generate_tagged_pdf || Palapala.defaults[:generate_tagged_pdf]
|
49
|
+
@opts[:paperWidth] = paper_width || Palapala.defaults[:paper_width]
|
50
|
+
@opts[:paperHeight] = paper_height || Palapala.defaults[:paper_height]
|
51
|
+
@opts[:landscape] = landscape || Palapala.defaults[:landscape]
|
52
|
+
@opts[:marginTop] = margin_top || Palapala.defaults[:margin_top]
|
53
|
+
@opts[:marginLeft] = margin_left || Palapala.defaults[:margin_left]
|
54
|
+
@opts[:marginBottom] = margin_bottom || Palapala.defaults[:margin_bottom]
|
55
|
+
@opts[:marginRight] = margin_right || Palapala.defaults[:margin_right]
|
56
|
+
@opts[:preferCSSPageSize] = prefer_css_page_size || Palapala.defaults[:prefer_css_page_size]
|
57
|
+
@opts[:printBackground] = print_background || Palapala.defaults[:print_background]
|
58
|
+
@opts[:scale] = scale || Palapala.defaults[:scale]
|
59
|
+
@opts.compact!
|
23
60
|
end
|
24
61
|
|
25
|
-
|
26
|
-
|
62
|
+
# Render the PDF content to a binary string.
|
63
|
+
#
|
64
|
+
# The params from the initializer are converted to the expected casing and merged with the options passed to this method.
|
65
|
+
# The options passed here are passed unchanged to the renderer and get priority over the options from the initializer.
|
66
|
+
# Chrome DevTools Protocol expects the options to be camelCase, see https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF.
|
67
|
+
#
|
68
|
+
# @param opts [Hash] the options to pass to the renderer
|
69
|
+
# @return [String] the PDF content as a binary string
|
70
|
+
def binary_data
|
71
|
+
puts "Rendering PDF with params: #{@opts}" if Palapala.debug
|
72
|
+
Renderer.html_to_pdf(@content, params: @opts)
|
73
|
+
rescue StandardError => e
|
74
|
+
puts "Error rendering PDF: #{e.message}"
|
75
|
+
Renderer.reset
|
76
|
+
raise
|
27
77
|
end
|
28
78
|
|
29
|
-
def save(path, **opts)
|
30
|
-
File.binwrite(path, pdf(**opts))
|
31
|
-
end
|
32
|
-
|
33
|
-
private
|
34
|
-
|
35
|
-
def renderer
|
36
|
-
Thread.current[:renderer] ||= Renderer.new
|
37
|
-
end
|
38
|
-
|
39
|
-
def pdf(**opts)
|
40
|
-
puts "Rendering PDF with options: #{opts}" if Palapala.debug
|
41
|
-
renderer.html_to_pdf(@content, params: opts_with_defaults.merge(opts))
|
42
|
-
end
|
43
|
-
|
44
|
-
def opts_with_defaults
|
45
|
-
opts = { scale: @scale,
|
46
|
-
printBackground: true,
|
47
|
-
displayHeaderFooter: true,
|
48
|
-
encoding: :binary,
|
49
|
-
preferCSSPageSize: @prefer_css_page_size }
|
50
79
|
|
51
|
-
|
52
|
-
|
53
|
-
|
54
|
-
|
55
|
-
|
56
|
-
opts[:format] = @format unless @format.nil?
|
57
|
-
# opts[:paperWidth] = @paper_width unless @paper_width.nil?
|
58
|
-
# opts[:paperHeight] = @paper_height unless @paper_height.nil?
|
59
|
-
opts[:landscape] = @landscape unless @landscape.nil?
|
60
|
-
opts[:marginTop] = @margin[:top] unless @margin[:top].nil?
|
61
|
-
opts[:marginLeft] = @margin[:left] unless @margin[:left].nil?
|
62
|
-
opts[:marginBottom] = @margin[:bottom] unless @margin[:bottom].nil?
|
63
|
-
opts[:marginRight] = @margin[:right] unless @margin[:right].nil?
|
64
|
-
opts
|
80
|
+
# Save the PDF content to a file
|
81
|
+
# @param path [String] the path to save the PDF file
|
82
|
+
# @return [void]
|
83
|
+
def save(path)
|
84
|
+
File.binwrite(path, binary_data)
|
65
85
|
end
|
66
86
|
end
|
67
87
|
end
|
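The new initializer maps snake_case keyword arguments to the camelCase names of `Page.printToPDF` and falls back to configured defaults with `||`. Because `false || default` evaluates to the default, an explicit `landscape: false` cannot override a default of `true` in that form. The sketch below shows one possible nil-only fallback; `CDP_OPTION_NAMES` and `build_print_params` are hypothetical names and the table covers only a subset of the options.

```ruby
# Hypothetical lookup table: Ruby-style names on the left, camelCase
# Page.printToPDF names on the right (subset, for illustration only).
CDP_OPTION_NAMES = {
  header_template: :headerTemplate,
  footer_template: :footerTemplate,
  landscape: :landscape,
  margin_top: :marginTop,
  prefer_css_page_size: :preferCSSPageSize,
  print_background: :printBackground,
  scale: :scale
}.freeze

# Merges per-call overrides with defaults and drops options that end up nil.
def build_print_params(overrides, defaults = {})
  CDP_OPTION_NAMES.each_with_object({}) do |(ruby_name, cdp_name), params|
    value = overrides[ruby_name]
    value = defaults[ruby_name] if value.nil? # only nil falls back; false is kept
    params[cdp_name] = value unless value.nil?
  end
end

params = build_print_params(
  { landscape: false, margin_top: 1 },
  { landscape: true, scale: 0.75 }
)
puts params.inspect
```

Skipping nil values plays the same role as `@opts.compact!` in the diff: options nobody set are not sent to Chrome.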
data/lib/palapala/renderer.rb
CHANGED
@@ -1,25 +1,38 @@
|
|
1
|
-
# frozen_string_literal: true
|
2
|
-
|
3
1
|
require "json"
|
4
2
|
require "net/http"
|
5
3
|
require "websocket/driver"
|
4
|
+
require_relative "./web_socket_client"
|
5
|
+
require_relative "./chrome_process"
|
6
6
|
|
7
7
|
module Palapala
|
8
8
|
# Render HTML content to PDF using Chrome in headless mode with minimal dependencies
|
9
9
|
class Renderer
|
10
10
|
def initialize
|
11
|
+
puts "Initializing a renderer" if Palapala.debug
|
11
12
|
# Create an instance of WebSocketClient with the WebSocket URL
|
12
13
|
@client = Palapala::WebSocketClient.new(websocket_url)
|
13
14
|
# Create the WebSocket driver
|
14
15
|
@driver = WebSocket::Driver.client(@client)
|
15
16
|
# Register the on_message callback
|
16
17
|
@driver.on(:message, &method(:on_message))
|
18
|
+
@driver.on(:close) { Thread.current[:renderer] = nil } # Reset the renderer on close
|
17
19
|
# Start the WebSocket handshake
|
18
20
|
@driver.start
|
19
21
|
# Initialize the protocol to get the page events
|
20
22
|
send_command_and_wait_for_result("Page.enable")
|
21
23
|
end
|
22
24
|
|
25
|
+
# Create a thread-local instance of the renderer
|
26
|
+
def self.thread_local_instance
|
27
|
+
Thread.current[:renderer] ||= Renderer.new
|
28
|
+
end
|
29
|
+
|
30
|
+
# Reset the thread-local instance of the renderer
|
31
|
+
def self.reset
|
32
|
+
puts "Clearing the thread local renderer" if Palapala.debug
|
33
|
+
Thread.current[:renderer] = nil
|
34
|
+
end
|
35
|
+
|
23
36
|
# Callback to handle the incomming WebSocket messages
|
24
37
|
def on_message(e)
|
25
38
|
puts "Received: #{e.data[0..64]}" if Palapala.debug
|
@@ -80,6 +93,10 @@ module Palapala
|
|
80
93
|
Base64.decode64(result["data"])
|
81
94
|
end
|
82
95
|
|
96
|
+
def self.html_to_pdf(html, params: {})
|
97
|
+
thread_local_instance.html_to_pdf(html, params: params)
|
98
|
+
end
|
99
|
+
|
83
100
|
def close
|
84
101
|
@driver.close
|
85
102
|
@client.close
|
@@ -87,6 +104,7 @@ module Palapala
|
|
87
104
|
|
88
105
|
private
|
89
106
|
|
107
|
+
# Convert the HTML content to a data URL
|
90
108
|
def data_url_for_html(html)
|
91
109
|
"data:text/html;base64,#{Base64.strict_encode64(html)}"
|
92
110
|
end
|
@@ -97,102 +115,12 @@ module Palapala
|
|
97
115
|
uri = URI("#{Palapala.headless_chrome_url}/json/new")
|
98
116
|
http = Net::HTTP.new(uri.host, uri.port)
|
99
117
|
request = Net::HTTP::Put.new(uri)
|
100
|
-
request[
|
118
|
+
request["Content-Type"] = "application/json"
|
101
119
|
response = http.request(request)
|
102
120
|
tab_info = JSON.parse(response.body)
|
103
121
|
websocket_url = tab_info["webSocketDebuggerUrl"]
|
104
122
|
puts "WebSocket URL: #{websocket_url}" if Palapala.debug
|
105
123
|
websocket_url
|
106
124
|
end
|
107
|
-
|
108
|
-
# Manage the Chrome child process
|
109
|
-
module ChromeProcess
|
110
|
-
def self.port_in_use?(port = 9222, host = "127.0.0.1")
|
111
|
-
server = TCPServer.new(host, port)
|
112
|
-
server.close
|
113
|
-
false
|
114
|
-
rescue Errno::EADDRINUSE
|
115
|
-
true
|
116
|
-
end
|
117
|
-
|
118
|
-
def self.chrome_process_healthy?
|
119
|
-
return false if @chrome_process_id.nil?
|
120
|
-
|
121
|
-
begin
|
122
|
-
Process.kill(0, @chrome_process_id) # Check if the process is alive
|
123
|
-
true
|
124
|
-
rescue Errno::ESRCH, Errno::EPERM
|
125
|
-
false
|
126
|
-
end
|
127
|
-
end
|
128
|
-
|
129
|
-
def self.kill_chrome
|
130
|
-
return if @chrome_process_id.nil?
|
131
|
-
|
132
|
-
Process.kill("KILL", @chrome_process_id) # Kill the process
|
133
|
-
Process.wait(@chrome_process_id) # Wait for the process to finish
|
134
|
-
end
|
135
|
-
|
136
|
-
def self.chrome_path
|
137
|
-
return Palapala.headless_chrome_path if Palapala.headless_chrome_path
|
138
|
-
|
139
|
-
case RbConfig::CONFIG["host_os"]
|
140
|
-
when /darwin/
|
141
|
-
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
|
142
|
-
when /linux/
|
143
|
-
"/usr/bin/google-chrome" # or "/usr/bin/chromium-browser"
|
144
|
-
when /win|mingw|cygwin/
|
145
|
-
"#{ENV["ProgramFiles(x86)"]}\\Google\\Chrome\\Application\\chrome.exe"
|
146
|
-
else
|
147
|
-
raise "Unsupported OS"
|
148
|
-
end
|
149
|
-
end
|
150
|
-
|
151
|
-
def self.spawn_chrome
|
152
|
-
return if port_in_use?
|
153
|
-
return if chrome_process_healthy?
|
154
|
-
|
155
|
-
# Define the path and parameters separately
|
156
|
-
# chrome_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
|
157
|
-
params = ["--headless", "--disable-gpu", "--remote-debugging-port=9222"]
|
158
|
-
|
159
|
-
# Spawn the process with the path and parameters
|
160
|
-
@chrome_process_id = Process.spawn(chrome_path, *params)
|
161
|
-
|
162
|
-
# Wait until the port is in use
|
163
|
-
until port_in_use?
|
164
|
-
sleep 0.1
|
165
|
-
end
|
166
|
-
# Detach the process so it runs in the background
|
167
|
-
Process.detach(@chrome_process_id)
|
168
|
-
|
169
|
-
at_exit do
|
170
|
-
if @chrome_process_id
|
171
|
-
begin
|
172
|
-
Process.kill("TERM", @chrome_process_id)
|
173
|
-
Process.wait(@chrome_process_id)
|
174
|
-
puts "Child process #{@chrome_process_id} terminated."
|
175
|
-
rescue Errno::ESRCH
|
176
|
-
puts "Child process #{@chrome_process_id} is already terminated."
|
177
|
-
rescue Errno::ECHILD
|
178
|
-
puts "No child process #{@chrome_process_id} found."
|
179
|
-
end
|
180
|
-
end
|
181
|
-
end
|
182
|
-
|
183
|
-
# Handle when the process is killed
|
184
|
-
trap("SIGCHLD") do
|
185
|
-
while (@chrome_process_id = Process.wait(-1, Process::WNOHANG))
|
186
|
-
break if @chrome_process_id.nil?
|
187
|
-
|
188
|
-
puts "Process #{@chrome_process_id} was killed."
|
189
|
-
# Handle the error or restart the process if necessary
|
190
|
-
@chrome_process_id = nil
|
191
|
-
end
|
192
|
-
rescue Errno::ECHILD
|
193
|
-
@chrome_process_id = nil
|
194
|
-
end
|
195
|
-
end
|
196
|
-
end
|
197
125
|
end
|
198
126
|
end
|
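The renderer now keeps one instance per thread in `Thread.current[:renderer]` and clears it on close or on error. A minimal sketch of that pattern, using a hypothetical `Connection` class in place of the renderer so it runs without Chrome:

```ruby
# Stand-in for an expensive per-connection object such as the renderer.
class Connection
  # Lazily creates one instance per thread. Note that Thread#[] storage is
  # scoped to the current fiber of the thread.
  def self.thread_local_instance
    Thread.current[:connection] ||= new
  end

  # Drops the cached instance so the next call builds a fresh one.
  def self.reset
    Thread.current[:connection] = nil
  end
end

first = Connection.thread_local_instance
again = Connection.thread_local_instance
other = Thread.new { Connection.thread_local_instance }.value
Connection.reset
fresh = Connection.thread_local_instance

puts "reused in same thread: #{first.equal?(again)}"
puts "shared across threads: #{first.equal?(other)}"
puts "replaced after reset:  #{!first.equal?(fresh)}"
```

Each thread gets its own object, so no locking is needed around the connection, at the cost of one connection per thread.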
data/lib/palapala/version.rb
CHANGED
data/lib/palapala.rb
CHANGED
@@ -1,22 +1,30 @@
|
|
1
|
-
|
1
|
+
require_relative "palapala/pdf"
|
2
|
+
require_relative "palapala/version"
|
2
3
|
|
3
|
-
require_relative 'palapala/version'
|
4
|
-
require_relative 'palapala/pdf'
|
5
|
-
require_relative 'palapala/web_socket_client'
|
6
|
-
require_relative 'palapala/renderer'
|
7
|
-
|
8
|
-
# Main module for the gem
|
9
4
|
module Palapala
|
10
5
|
def self.setup
|
11
6
|
yield self
|
12
7
|
end
|
13
8
|
|
14
9
|
class << self
|
15
|
-
|
10
|
+
# params to pass to Chrome when launched as a child process
|
11
|
+
attr_accessor :chrome_params
|
12
|
+
|
13
|
+
# debug mode
|
14
|
+
attr_accessor :debug
|
15
|
+
|
16
|
+
# default options for PDF generation
|
17
|
+
attr_accessor :defaults
|
18
|
+
|
19
|
+
# path to the headless Chrome executable when using the child process renderer
|
20
|
+
attr_accessor :headless_chrome_path
|
21
|
+
|
22
|
+
# URL to the headless Chrome instance when using the remote renderer
|
23
|
+
attr_accessor :headless_chrome_url
|
16
24
|
end
|
17
25
|
|
18
|
-
self.headless_chrome_url = 'http://localhost:9222'
|
19
|
-
self.headless_chrome_path = nil
|
20
|
-
self.defaults = {}
|
21
26
|
self.debug = false
|
27
|
+
self.defaults = { displayHeaderFooter: true, encoding: :binary }
|
28
|
+
self.headless_chrome_path = nil
|
29
|
+
self.headless_chrome_url = "http://localhost:9222"
|
22
30
|
end
|
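The module above exposes its settings through class-level accessors and a `setup` method that yields the module itself. A standalone sketch of that configuration pattern, using a hypothetical `MiniConfig` module and made-up attribute names:

```ruby
# Hypothetical module mirroring the setup pattern shown in the diff.
module MiniConfig
  class << self
    # Module-level settings, readable and writable from anywhere.
    attr_accessor :debug, :defaults, :chrome_url
  end

  # Initial values, assigned when the file is loaded.
  self.debug = false
  self.defaults = {}
  self.chrome_url = "http://localhost:9222"

  # Yields the module so callers can assign settings in a block.
  def self.setup
    yield self
  end
end

MiniConfig.setup do |config|
  config.debug = true
  config.defaults.merge!(scale: 0.75)
end

puts MiniConfig.debug
puts MiniConfig.defaults.inspect
puts MiniConfig.chrome_url
```

Settings left untouched in the block keep their initial values, which is why the defaults are assigned at load time.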
data/lib/palapala_pdf.rb
ADDED
@@ -0,0 +1 @@
|
|
1
|
+
require_relative "palapala"
|
data/palapala_pdf.gemspec
CHANGED
@@ -5,8 +5,8 @@ require_relative 'lib/palapala/version'
|
|
5
5
|
Gem::Specification.new do |spec|
|
6
6
|
spec.name = 'palapala_pdf'
|
7
7
|
spec.version = Palapala::VERSION
|
8
|
-
spec.authors = ['Koen Handekyn']
|
9
|
-
spec.email = ['github.com@handekyn.com']
|
8
|
+
spec.authors = [ 'Koen Handekyn' ]
|
9
|
+
spec.email = [ 'github.com@handekyn.com' ]
|
10
10
|
|
11
11
|
spec.summary = 'Convert HTML into PDF directly from Ruby using Chrome/Chromium.'
|
12
12
|
spec.description = 'This gem uses faw web sockets to render HTML into a PDF using Chrom(e)(ium) with minimal dependencies.'
|
@@ -31,7 +31,7 @@ Gem::Specification.new do |spec|
|
|
31
31
|
end
|
32
32
|
spec.bindir = 'exe'
|
33
33
|
spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
|
34
|
-
spec.require_paths = ['lib']
|
34
|
+
spec.require_paths = [ 'lib' ]
|
35
35
|
|
36
36
|
# Uncomment to register a new dependency of your gem
|
37
37
|
spec.add_dependency 'base64', '~> 0'
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: palapala_pdf
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.
|
4
|
+
version: 0.1.7
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Koen Handekyn
|
8
8
|
autorequire:
|
9
9
|
bindir: exe
|
10
10
|
cert_chain: []
|
11
|
-
date: 2024-08-
|
11
|
+
date: 2024-08-29 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: base64
|
@@ -42,7 +42,8 @@ description: This gem uses faw web sockets to render HTML into a PDF using Chrom
|
|
42
42
|
with minimal dependencies.
|
43
43
|
email:
|
44
44
|
- github.com@handekyn.com
|
45
|
-
executables:
|
45
|
+
executables:
|
46
|
+
- chrome-headless-server.sh
|
46
47
|
extensions: []
|
47
48
|
extra_rdoc_files: []
|
48
49
|
files:
|
@@ -52,11 +53,19 @@ files:
|
|
52
53
|
- LICENSE
|
53
54
|
- README.md
|
54
55
|
- Rakefile
|
56
|
+
- assets/images/logo-variant2.webp
|
57
|
+
- assets/images/logo.webp
|
58
|
+
- examples/headers_and_footers.rb
|
59
|
+
- examples/js_based_rendering.rb
|
60
|
+
- examples/performance_benchmark.rb
|
61
|
+
- exe/chrome-headless-server.sh
|
55
62
|
- lib/palapala.rb
|
63
|
+
- lib/palapala/chrome_process.rb
|
56
64
|
- lib/palapala/pdf.rb
|
57
65
|
- lib/palapala/renderer.rb
|
58
66
|
- lib/palapala/version.rb
|
59
67
|
- lib/palapala/web_socket_client.rb
|
68
|
+
- lib/palapala_pdf.rb
|
60
69
|
- palapala_pdf.gemspec
|
61
70
|
homepage: https://github.com/palapala-app/palapala_pdf
|
62
71
|
licenses:
|