palapala_pdf 0.1.1 → 0.1.3
- checksums.yaml +4 -4
- data/README.md +135 -30
- data/lib/palapala/pdf.rb +18 -60
- data/lib/palapala/renderer.rb +198 -0
- data/lib/palapala/version.rb +1 -1
- data/lib/palapala/web_socket_client.rb +29 -0
- data/lib/palapala.rb +8 -13
- data/palapala_pdf.gemspec +3 -2
- metadata +23 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 95d833844d730f058d59abd1eaaee467dde4fa8b4f3b9d01dd441646675c585b
|
4
|
+
data.tar.gz: b30cd15b438c71801ec9f8b0c0c61ae2aa47543cc59bd36975dd52b9974af98a
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: f34c4baea12a5f2577da4bc629319470f7f46dddd3b1ad05b142d0abba6598caefe2128d93035a6d58d65e9357cec32e461a5650fae99606eeae749ded843621
|
7
|
+
data.tar.gz: ac072fb99db23b9b3d7895f4cbd8fed028dd6b5555d1ea1b8ebbc341e4845f901f75a07d4df621691340d754b2d624fc5638d6e2725922fd248d7879441a86a3
|
data/README.md
CHANGED
@@ -2,13 +2,13 @@
|
|
2
2
|
|
3
3
|
This project is a Ruby gem that provides functionality for generating PDF files from HTML using the Chrome browser. It allows you to easily convert HTML content into PDF documents, making it convenient for tasks such as generating reports, invoices, or any other printable documents. The gem provides a simple and intuitive API for converting HTML to PDF, and it leverages the power and flexibility of the Chrome browser's rendering engine to ensure accurate and high-quality PDF output. With this gem, you can easily integrate PDF generation capabilities into your Ruby applications.
|
4
4
|
|
5
|
-
At the core, this project leverages the same rendering engine as [Grover](https://github.com/Studiosity/grover), but with significantly reduced overhead and dependencies. Instead of relying on the full Grover stack, this project
|
5
|
+
At the core, this project leverages the same rendering engine as [Grover](https://github.com/Studiosity/grover), but with significantly reduced overhead and dependencies. Instead of relying on the full Grover/Puppeteer/NodeJS stack, this project uses a raw web socket to enable direct communication from Ruby to a headless Chrome or Chromium browser. This approach ensures efficiency while providing a streamlined alternative for rendering tasks without sacrificing performance or flexibility.
|
6
6
|
|
7
|
-
This is how easy and powerfull PDF generation
|
7
|
+
This is how easy and powerful PDF generation can be in Ruby:
|
8
8
|
|
9
9
|
```ruby
|
10
10
|
require "palapala"
|
11
|
-
Palapala::
|
11
|
+
Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
|
12
12
|
```
|
13
13
|
|
14
14
|
All this with the most modern HTML/CSS/JS available to you: flex, grid, canvas, you name it.
|
@@ -27,15 +27,23 @@ If you are not using bundler to manage dependencies, you can install the gem by
|
|
27
27
|
$ gem install palapala_pdf
|
28
28
|
```
|
29
29
|
|
30
|
-
Palapala PDF
|
30
|
+
Palapala PDF connects to Chrome over a web socket connection.
|
31
31
|
|
32
|
-
|
32
|
+
An external Chrome/Chromium is expected.
|
33
|
+
Just start it with the following command (9222 is the default port):
|
33
34
|
|
34
35
|
```sh
|
35
|
-
chrome --headless --disable-gpu --remote-debugging-port=9222
|
36
|
+
/path/to/chrome --headless --disable-gpu --remote-debugging-port=9222
|
36
37
|
```
|
37
38
|
|
38
|
-
|
39
|
+
Alternatively, Palapala PDF will try to launch Chrome as a child process.
|
40
|
+
It guesses the path to Chrome, or you can configure it like this:
|
41
|
+
|
42
|
+
```ruby
|
43
|
+
Palapala.setup do |config|
|
44
|
+
config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
|
45
|
+
end
|
46
|
+
```
|
39
47
|
|
40
48
|
## Usage Instructions
|
41
49
|
|
@@ -43,21 +51,22 @@ To create a PDF from HTML content using the `Palapala` library, follow these ste
|
|
43
51
|
|
44
52
|
1. **Configuration**:
|
45
53
|
|
46
|
-
Configure the `Palapala` library with the necessary options, such as the URL for the
|
54
|
+
Configure the `Palapala` library with the necessary options, such as the URL for the browser and default settings like scale and format.
|
47
55
|
|
48
56
|
In a Rails context, this could be inside an initializer.
|
49
57
|
|
50
58
|
```ruby
|
51
59
|
Palapala.setup do |config|
|
52
60
|
# run against an external chrome/chromium or leave this out to run against a chrome that is started as a child process
|
53
|
-
config.
|
61
|
+
config.debug = true
|
62
|
+
config.headless_chrome_url = 'http://localhost:9222' # run against a remote Chrome instance
|
63
|
+
# config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
|
54
64
|
config.defaults = { scale: 1, format: :A4 }
|
55
65
|
end
|
56
66
|
```
|
67
|
+
1. **Create a PDF from HTML**:
|
57
68
|
|
58
|
-
|
59
|
-
|
60
|
-
Create a PDF file from HTML in IRB
|
69
|
+
Create a PDF file from HTML in `irb`
|
61
70
|
|
62
71
|
```sh
|
63
72
|
gem install palapala_pdf
|
@@ -65,22 +74,64 @@ gem install palapala_pdf
|
|
65
74
|
|
66
75
|
In IRB, load palapala and create a PDF from an HTML snippet:
|
67
76
|
|
68
|
-
```
|
69
|
-
|
77
|
+
```ruby
|
78
|
+
require "palapala"
|
79
|
+
Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
|
70
80
|
```
|
71
81
|
|
82
|
+
Instantiate a new Palapala::PDF object with your HTML content and generate the PDF binary data.
|
83
|
+
|
72
84
|
```ruby
|
73
85
|
require "palapala"
|
74
|
-
Palapala::
|
86
|
+
binary_data = Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
|
75
87
|
```
|
76
88
|
|
77
|
-
|
89
|
+
## Paged CSS
|
90
|
+
|
91
|
+
Paged CSS is a subset of CSS designed for styling printed documents. It extends standard CSS to handle pagination, page sizes, headers, footers, and other aspects of printed content. Paged CSS is commonly used in scenarios where web content needs to be converted to PDFs or other paginated formats.
|
92
|
+
|
93
|
+
### Headers and Footers
|
94
|
+
|
95
|
+
When using Chromium-based rendering engines, headers and footers are not controlled by the Paged CSS standard but are instead managed through specific settings in the rendering engine.
|
96
|
+
|
97
|
+
With Palapala PDF, headers and footers are defined using the `header_html` and `footer_html` options. These allow you to insert HTML content directly into the header or footer areas.
|
78
98
|
|
79
99
|
```ruby
|
80
|
-
|
81
|
-
|
100
|
+
Palapala::PDF.new(
|
101
|
+
"<p>Hello world</p>",
|
102
|
+
header_html: '<div style="text-align: center;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
|
103
|
+
footer_html: '<div style="text-align: center;">Generated with Palapala PDF</div>',
|
104
|
+
margin: { top: "2cm", bottom: "2cm"}
|
105
|
+
).save("test.pdf")
|
106
|
+
```
|
107
|
+
|
108
|
+
### Page size, orientation and margins
|
109
|
+
|
110
|
+
#### With CSS
|
111
|
+
|
112
|
+
todo example
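A minimal sketch of the CSS route, assuming the default `prefer_css_page_size: true` lets Chrome honour the `@page` rule. The helper name `paged_html` is hypothetical; the `Palapala::PDF.new(...).save(...)` call is the one used elsewhere in this README and only runs when the gem is loaded.

```ruby
# Hypothetical helper: wraps body markup in a document whose @page rule
# sets the paper size, orientation and margins.
def paged_html(body, size: "A4", orientation: "landscape", margin: "2cm")
  <<~HTML
    <html>
      <head>
        <style>
          @page { size: #{size} #{orientation}; margin: #{margin}; }
        </style>
      </head>
      <body>#{body}</body>
    </html>
  HTML
end

html = paged_html("<h1>Quarterly report</h1>")

# Only render when the gem (and a reachable Chrome) is available.
Palapala::PDF.new(html).save("report.pdf") if defined?(Palapala::PDF)
```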
|
113
|
+
|
114
|
+
#### As params
|
115
|
+
|
116
|
+
todo example
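A minimal sketch of the parameter route, assuming margins are handed to Chrome unchanged. The Chrome DevTools Protocol documents `Page.printToPDF` margins as numbers in inches; whether Palapala converts strings such as `"2cm"` is not visible here, so this sketch converts explicitly. The helper names `cm_to_inches` and `margins_in_inches` are hypothetical.

```ruby
# Page.printToPDF takes margins in inches, so convert from centimetres.
CM_PER_INCH = 2.54

def cm_to_inches(cm)
  (cm / CM_PER_INCH).round(3)
end

# Hypothetical helper: builds a margin hash in inches from centimetre values.
def margins_in_inches(top: 1, right: 1, bottom: 1, left: 1)
  { top: cm_to_inches(top), right: cm_to_inches(right),
    bottom: cm_to_inches(bottom), left: cm_to_inches(left) }
end

margin = margins_in_inches(top: 2, bottom: 2)

# Only render when the gem (and a reachable Chrome) is available.
if defined?(Palapala::PDF)
  Palapala::PDF.new("<p>Hello world</p>", margin: margin).save("margins.pdf")
end
```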
|
117
|
+
|
118
|
+
## JS based rendering
|
119
|
+
|
120
|
+
```html
|
121
|
+
<html>
|
122
|
+
<script type="text/javascript">
|
123
|
+
document.addEventListener("DOMContentLoaded", () => {
|
124
|
+
document.body.innerHTML += "<p>Current time from JS: " + new Date().toLocaleString() + "</p>";
|
125
|
+
});
|
126
|
+
</script>
|
127
|
+
<body><p>Default body text.</p></body>
|
128
|
+
</html>
|
82
129
|
```
|
83
130
|
|
131
|
+
## Raw parameters (Page.printToPDF)
|
132
|
+
|
133
|
+
See [Page.printToPDF](https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF)
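A minimal sketch of how raw parameters could end up in the command sent to Chrome, assuming per-call options are merged over the defaults as `pdf.rb` does with `opts_with_defaults.merge(opts)`. The defaults shown are abridged and the `id` is arbitrary.

```ruby
require "json"

# Defaults similar to those in pdf.rb's opts_with_defaults (abridged).
defaults = { scale: 1, printBackground: true, preferCSSPageSize: true }

# Raw Page.printToPDF parameters, named as in the CDP documentation.
overrides = { landscape: true, scale: 0.8 }

# Later keys win, so per-call options override the defaults.
params = defaults.merge(overrides)

# The command as it would be sent over the web socket.
message = JSON.generate({ id: 1, method: "Page.printToPDF", params: params })
puts message
```

With the gem loaded, the equivalent call would presumably be `Palapala::PDF.new(html).save("out.pdf", landscape: true, scale: 0.8)`, since `save` and `binary_data` forward extra keyword arguments into the parameter hash.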
|
134
|
+
|
84
135
|
## Development
|
85
136
|
|
86
137
|
After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
|
@@ -107,26 +158,80 @@ Your support is greatly appreciated and helps maintain the project!
|
|
107
158
|
|
108
159
|
## Findings
|
109
160
|
|
110
|
-
For Chrome, mode headless=new seems to be slower for pdf rendering cases.
|
161
|
+
- For Chrome, mode headless=new seems to be slower for pdf rendering cases.
|
162
|
+
- On a Mac M3 (Aug 2024), Chromium (`brew install chromium`) is about 3x slower than Chrome. Maybe the Chromium that gets installed is not ARM optimized?
|
111
163
|
|
112
164
|
## Primitive benchmark
|
113
165
|
|
114
|
-
On a macbook m3, the throughput for 'hello world' PDF generation can reach around
|
166
|
+
On a MacBook M3, the throughput for 'hello world' PDF generation can reach around 300 docs/second when allowing for some concurrency. Because Chrome itself is very efficient, this also scales well to complex documents. If you run this in Rails, concurrency is handled either by the front-end thread pool or by the workers, so you shouldn't have to think about it. (Measured using an external Chrome.)
|
167
|
+
|
168
|
+
Note: the benchmark renders `"Hello #{i}, world #{j}! #{Time.now}."`, where `i` is the thread index and `j` is the iteration counter within the thread, and writes each PDF to an SSD (which is very fast these days).
|
115
169
|
|
170
|
+
### benchmarking 20 docs: 1x20, 2x10, 4x5
|
116
171
|
|
172
|
+
```sh
|
173
|
+
c:1, n:20 : Throughput = 159.41 docs/sec, Total time = 0.1255 seconds
|
174
|
+
c:2, n:10 : Throughput = 124.91 docs/sec, Total time = 0.1601 seconds
|
175
|
+
c:4, n:5 : Throughput = 196.40 docs/sec, Total time = 0.1018 seconds
|
117
176
|
```
|
118
|
-
|
119
|
-
|
120
|
-
|
121
|
-
|
122
|
-
|
123
|
-
|
177
|
+
|
178
|
+
### benchmarking 320 docs: 1x320, 4x80, 8x40
|
179
|
+
|
180
|
+
```sh
|
181
|
+
c:1, n:320 : Throughput = 184.99 docs/sec, Total time = 1.7299 seconds
|
182
|
+
c:4, n:80 : Throughput = 302.50 docs/sec, Total time = 1.0578 seconds
|
183
|
+
c:8, n:40 : Throughput = 254.29 docs/sec, Total time = 1.2584 seconds
|
124
184
|
```
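The benchmark script itself is not shown here. A minimal sketch of how such a `c` threads x `n` documents measurement could be structured, assuming a stand-in `render` lambda in place of real PDF generation (so the numbers it prints say nothing about Chrome):

```ruby
require "benchmark"

# Stand-in for PDF rendering; swap in a real call when measuring.
render = ->(i, j) { "Hello #{i}, world #{j}! #{Time.now}." }

# Runs `concurrency` threads that each render `per_thread` documents and
# returns [documents rendered, throughput in docs/sec, elapsed seconds].
def run_benchmark(concurrency, per_thread, render)
  results = nil
  elapsed = Benchmark.realtime do
    threads = Array.new(concurrency) do |i|
      Thread.new { Array.new(per_thread) { |j| render.call(i, j) } }
    end
    # Thread#value waits for the thread and returns its block result.
    results = threads.flat_map(&:value)
  end
  [results.size, results.size / elapsed, elapsed]
end

[[1, 20], [2, 10], [4, 5]].each do |c, n|
  docs, throughput, elapsed = run_benchmark(c, n, render)
  puts format("c:%d, n:%d : Throughput = %.2f docs/sec, Total time = %.4f seconds (%d docs)",
              c, n, throughput, elapsed, docs)
end
```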
|
125
185
|
|
126
|
-
|
186
|
+
This is about 100x faster than what you typically get with Grover and still 10x faster than many alternatives. It is fast enough that, for many use cases, you can run it straight from e.g. your Ruby on Rails web worker in the controller on a single machine and still scale to lots of users.
|
127
187
|
|
128
|
-
|
188
|
+
## Rails
|
189
|
+
|
190
|
+
### `send_data` and `render_to_string`
|
191
|
+
|
192
|
+
The `send_data` method in Rails is used to send binary data as a file download to the user's browser. It allows you to send any type of data, such as PDF files, images, or CSV files, directly to the user without saving the file on the server.
|
193
|
+
|
194
|
+
The `render_to_string` method in Rails is used to render a view template to a string without sending it as a response to the user's browser. It allows you to generate HTML or other text-based content that can be used in various ways, such as sending it as an email, saving it to a file, or manipulating it further before sending it as a response.
|
195
|
+
|
196
|
+
Here's an example of how to use `render_to_string` to render a view template to a string and send the pdf using `send_data`:
|
197
|
+
|
198
|
+
```ruby
|
199
|
+
def download_pdf
|
200
|
+
html_string = render_to_string(template: "example/template", layout: "print", locals: { } )
|
201
|
+
pdf_data = Palapala::PDF.new(html_string).binary_data
|
202
|
+
send_data pdf_data, filename: "document.pdf", type: "application/pdf"
|
203
|
+
end
|
204
|
+
```
|
205
|
+
|
206
|
+
In this example, `pdf_data` is the binary data of the PDF file. The `filename` option specifies the name of the file that will be downloaded by the user, and the `type` option specifies the MIME type of the file.
|
207
|
+
|
208
|
+
## Docker
|
209
|
+
|
210
|
+
In Docker, when running as root, you must pass the `no-sandbox` browser option:
|
211
|
+
|
212
|
+
```ruby
|
213
|
+
Palapala.setup do |config|
|
214
|
+
config.opts = { 'no-sandbox': nil }
|
215
|
+
end
|
216
|
+
```
|
217
|
+
It has also been reported that the Chrome process repeatedly crashes when running inside a Docker container on an M1 Mac. Chrome should work as expected when deployed to a Docker container on a non-M1 Mac.
|
218
|
+
|
219
|
+
## Thread-safety
|
220
|
+
|
221
|
+
Behind the scenes, a websocket is opened and stored on Thread.current for subsequent requests. Hence, the code is
|
222
|
+
thread safe in the sense that every web socket gets a new tab in the underlying Chromium and gets an isolated context.
|
223
|
+
|
224
|
+
For performance reasons, the code uses a low level websocket connection that does all its work on the current thread
|
225
|
+
so we can avoid synchronisation penalties.
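The per-thread storage described above can be sketched without a browser. This is a minimal illustration, assuming a hypothetical `FakeRenderer` class in place of the real renderer and its web socket:

```ruby
# Stand-in for a renderer that owns one web socket connection.
class FakeRenderer; end

# Memoise one renderer per thread, the same pattern as
# `Thread.current[:renderer] ||= Renderer.new`.
def renderer
  Thread.current[:fake_renderer] ||= FakeRenderer.new
end

# Within a thread, repeated calls return the same object...
same_thread = renderer.equal?(renderer)

# ...while every other thread gets its own instance.
ids = Array.new(3) { Thread.new { renderer.object_id } }.map(&:value)

puts same_thread
puts ids.uniq.size
```

Note that `Thread.current[:key]` is fiber-local in Ruby, so code running in a separate fiber would also get its own instance.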
|
226
|
+
|
227
|
+
## Heroku
|
228
|
+
|
229
|
+
Possible buildpacks:
|
230
|
+
|
231
|
+
https://github.com/heroku/heroku-buildpack-chrome-for-testing
|
232
|
+
|
233
|
+
This buildpack installs Chrome and chromedriver; the latter is not actually needed, but the buildpack is maintained.
|
129
234
|
|
130
|
-
|
235
|
+
https://elements.heroku.com/buildpacks/heroku/heroku-buildpack-google-chrome
|
131
236
|
|
132
|
-
|
237
|
+
This buildpack installs Chrome, which is all we need, but it is deprecated.
|
data/lib/palapala/pdf.rb
CHANGED
@@ -1,23 +1,18 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
|
-
require 'ferrum'
|
4
|
-
|
5
3
|
module Palapala
|
6
4
|
# Page class to generate PDF from HTML content using Chrome in headless mode in a thread-safe way
|
7
|
-
|
5
|
+
# @param page_ranges Empty string means all pages, e.g., "1-3, 5, 7-9"
|
6
|
+
class PDF
|
8
7
|
def initialize(content = nil,
|
9
|
-
url: nil,
|
10
|
-
path: nil,
|
11
8
|
header_html: nil,
|
12
9
|
footer_html: nil,
|
13
|
-
generate_tagged_pdf: false,
|
14
|
-
prefer_css_page_size: true,
|
10
|
+
generate_tagged_pdf: Palapala.defaults.fetch(:generate_tagged_pdf, false),
|
11
|
+
prefer_css_page_size: Palapala.defaults.fetch(:prefer_css_page_size, true),
|
15
12
|
scale: Palapala.defaults.fetch(:scale, 1),
|
16
|
-
page_ranges: Palapala.defaults.fetch(:page_ranges,
|
13
|
+
page_ranges: Palapala.defaults.fetch(:page_ranges, nil),
|
17
14
|
margin: Palapala.defaults.fetch(:margin, {}))
|
18
15
|
@content = content
|
19
|
-
@url = url
|
20
|
-
@path = path
|
21
16
|
@header_html = header_html
|
22
17
|
@footer_html = footer_html
|
23
18
|
@generate_tagged_pdf = generate_tagged_pdf
|
@@ -27,82 +22,45 @@ module Palapala
|
|
27
22
|
@margin = margin
|
28
23
|
end
|
29
24
|
|
30
|
-
def pdf(**opts)
|
31
|
-
browser_context = browser.contexts.create
|
32
|
-
browser_page = browser_context.page
|
33
|
-
# # output console logs for this page
|
34
|
-
if opts[:debug]
|
35
|
-
browser_page.on('Runtime.consoleAPICalled') do |params|
|
36
|
-
params['args'].each { |r| puts(r['value']) }
|
37
|
-
end
|
38
|
-
end
|
39
|
-
# open the page
|
40
|
-
url = @url || data_url
|
41
|
-
browser_page.go_to(url)
|
42
|
-
# Wait for the page to load
|
43
|
-
browser_page.network.wait_for_idle
|
44
|
-
# Generate PDF
|
45
|
-
pdf_binary_data = browser_page.pdf(**opts_with_defaults.merge(opts))
|
46
|
-
# Dispose the context
|
47
|
-
browser_context.dispose
|
48
|
-
# Return the PDF data
|
49
|
-
pdf_binary_data
|
50
|
-
end
|
51
|
-
|
52
25
|
def binary_data(**opts)
|
53
26
|
pdf(**opts)
|
54
27
|
end
|
55
28
|
|
56
29
|
def save(path, **opts)
|
57
|
-
|
30
|
+
File.binwrite(path, pdf(**opts))
|
58
31
|
end
|
59
32
|
|
60
33
|
private
|
61
34
|
|
62
|
-
def
|
63
|
-
|
64
|
-
|
35
|
+
def renderer
|
36
|
+
Thread.current[:renderer] ||= Renderer.new
|
37
|
+
end
|
38
|
+
|
39
|
+
def pdf(**opts)
|
40
|
+
renderer.html_to_pdf(@content, params: opts_with_defaults.merge(opts))
|
65
41
|
end
|
66
42
|
|
67
43
|
def opts_with_defaults
|
68
44
|
opts = { scale: @scale,
|
69
45
|
printBackground: true,
|
70
46
|
displayHeaderFooter: true,
|
71
|
-
pageRanges: @page_ranges, # Empty string means all pages, e.g., "1-3, 5, 7-9"
|
72
47
|
encoding: :binary,
|
73
|
-
preferCSSPageSize:
|
74
|
-
headerTemplate: @header_html || '',
|
75
|
-
footerTemplate: @footer_html || '' }
|
48
|
+
preferCSSPageSize: @prefer_css_page_size }
|
76
49
|
|
50
|
+
opts[:headerTemplate] = @header_html unless @header_html.nil?
|
51
|
+
opts[:footerTemplate] = @footer_html unless @footer_html.nil?
|
52
|
+
opts[:pageRanges] = @page_ranges unless @page_ranges.nil?
|
77
53
|
opts[:path] = @path unless @path.nil?
|
78
54
|
opts[:generateTaggedPDF] = @generate_tagged_pdf unless @generate_tagged_pdf.nil?
|
79
55
|
opts[:format] = @format unless @format.nil?
|
80
|
-
opts[:paperWidth] = @paper_width unless @paper_width.nil?
|
81
|
-
opts[:paperHeight] = @paper_height unless @paper_height.nil?
|
56
|
+
# opts[:paperWidth] = @paper_width unless @paper_width.nil?
|
57
|
+
# opts[:paperHeight] = @paper_height unless @paper_height.nil?
|
82
58
|
opts[:landscape] = @landscape unless @landscape.nil?
|
83
59
|
opts[:marginTop] = @margin[:top] unless @margin[:top].nil?
|
84
60
|
opts[:marginLeft] = @margin[:left] unless @margin[:left].nil?
|
85
61
|
opts[:marginBottom] = @margin[:bottom] unless @margin[:bottom].nil?
|
86
62
|
opts[:marginRight] = @margin[:right] unless @margin[:right].nil?
|
87
|
-
|
88
63
|
opts
|
89
64
|
end
|
90
|
-
|
91
|
-
def browser
|
92
|
-
# accordng to the docs ferrum is thread safe, however, under heavy load
|
93
|
-
# we are seeing some issues, so we are using thread locals to have a
|
94
|
-
# browser per thread
|
95
|
-
Thread.current[:browser] ||= new_browser
|
96
|
-
# @@browser ||= new_browser
|
97
|
-
end
|
98
|
-
|
99
|
-
def new_browser
|
100
|
-
Ferrum::Browser.new(Palapala.ferrum_opts)
|
101
|
-
end
|
102
|
-
|
103
|
-
# # TODO use method from template class
|
104
|
-
# def cm_to_inches(value)
|
105
|
-
# value / 2.54
|
106
|
-
# end
|
107
65
|
end
|
108
66
|
end
|
data/lib/palapala/renderer.rb
ADDED
@@ -0,0 +1,198 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
require "json"
|
4
|
+
require "net/http"
|
5
|
+
require "websocket/driver"
|
6
|
+
|
7
|
+
module Palapala
|
8
|
+
# Render HTML content to PDF using Chrome in headless mode with minimal dependencies
|
9
|
+
class Renderer
|
10
|
+
def initialize
|
11
|
+
# Create an instance of WebSocketClient with the WebSocket URL
|
12
|
+
@client = Palapala::WebSocketClient.new(websocket_url)
|
13
|
+
# Create the WebSocket driver
|
14
|
+
@driver = WebSocket::Driver.client(@client)
|
15
|
+
# Register the on_message callback
|
16
|
+
@driver.on(:message, &method(:on_message))
|
17
|
+
# Start the WebSocket handshake
|
18
|
+
@driver.start
|
19
|
+
# Initialize the protocol to get the page events
|
20
|
+
send_command_and_wait_for_result("Page.enable")
|
21
|
+
end
|
22
|
+
|
23
|
+
# Callback to handle the incoming WebSocket messages
|
24
|
+
def on_message(e)
|
25
|
+
puts "Received: #{e.data[0..64]}" if Palapala.debug
|
26
|
+
@response = JSON.parse(e.data) # Parse the JSON response
|
27
|
+
end
|
28
|
+
|
29
|
+
# Update the current ID to the next ID (increment by 1)
|
30
|
+
def next_id = @id = (@id || 0) + 1
|
31
|
+
|
32
|
+
# Get the current ID
|
33
|
+
def current_id = @id
|
34
|
+
|
35
|
+
# Process the WebSocket messages until some state is true
|
36
|
+
def process_until(&block)
|
37
|
+
loop do
|
38
|
+
@driver.parse(@client.read)
|
39
|
+
return if block.call
|
40
|
+
return if @driver.state == :closed
|
41
|
+
end
|
42
|
+
end
|
43
|
+
|
44
|
+
# Method to send a message (text) and wait for a response
|
45
|
+
def send_and_wait(message, &)
|
46
|
+
puts "\nSending: #{message}" if Palapala.debug
|
47
|
+
@driver.text(message)
|
48
|
+
process_until(&)
|
49
|
+
end
|
50
|
+
|
51
|
+
# Method to send a CDP command and wait for some state to be true
|
52
|
+
def send_command(method, params: {}, &block)
|
53
|
+
send_and_wait(JSON.generate({ id: next_id, method:, params: }), &block)
|
54
|
+
end
|
55
|
+
|
56
|
+
# Method to send a CDP command and wait for the matching event to get the result
|
57
|
+
# @return [Hash] The result of the command
|
58
|
+
def send_command_and_wait_for_result(method, params: {})
|
59
|
+
send_command(method, params:) do
|
60
|
+
@response && @response["id"] == current_id
|
61
|
+
end
|
62
|
+
@response["result"]
|
63
|
+
end
|
64
|
+
|
65
|
+
# Method to send a CDP command and wait for a specific method to be called
|
66
|
+
def send_command_and_wait_for_event(method, event_name:, params: {})
|
67
|
+
send_command(method, params:) do
|
68
|
+
@response && @response["method"] == event_name
|
69
|
+
end
|
70
|
+
end
|
71
|
+
|
72
|
+
# Convert HTML content to PDF
|
73
|
+
# See https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF
|
74
|
+
# @param html [String] The HTML content to convert to PDF
|
75
|
+
# @param params [Hash] Additional parameters to pass to the CDP command
|
76
|
+
def html_to_pdf(html, params: {})
|
77
|
+
send_command_and_wait_for_event("Page.navigate", params: { url: data_url_for_html(html) },
|
78
|
+
event_name: "Page.frameStoppedLoading")
|
79
|
+
result = send_command_and_wait_for_result("Page.printToPDF", params:)
|
80
|
+
Base64.decode64(result["data"])
|
81
|
+
end
|
82
|
+
|
83
|
+
def close
|
84
|
+
@driver.close
|
85
|
+
@client.close
|
86
|
+
end
|
87
|
+
|
88
|
+
private
|
89
|
+
|
90
|
+
def data_url_for_html(html)
|
91
|
+
"data:text/html;base64,#{Base64.strict_encode64(html)}"
|
92
|
+
end
|
93
|
+
|
94
|
+
# Open a new tab in the remote chrome and return the WebSocket URL
|
95
|
+
def websocket_url
|
96
|
+
ChromeProcess.spawn_chrome
|
97
|
+
uri = URI("#{Palapala.headless_chrome_url}/json/new")
|
98
|
+
http = Net::HTTP.new(uri.host, uri.port)
|
99
|
+
request = Net::HTTP::Put.new(uri)
|
100
|
+
request['Content-Type'] = 'application/json'
|
101
|
+
response = http.request(request)
|
102
|
+
tab_info = JSON.parse(response.body)
|
103
|
+
websocket_url = tab_info["webSocketDebuggerUrl"]
|
104
|
+
puts "WebSocket URL: #{websocket_url}" if Palapala.debug
|
105
|
+
websocket_url
|
106
|
+
end
|
107
|
+
|
108
|
+
# Manage the Chrome child process
|
109
|
+
module ChromeProcess
|
110
|
+
def self.port_in_use?(port = 9222, host = "127.0.0.1")
|
111
|
+
server = TCPServer.new(host, port)
|
112
|
+
server.close
|
113
|
+
false
|
114
|
+
rescue Errno::EADDRINUSE
|
115
|
+
true
|
116
|
+
end
|
117
|
+
|
118
|
+
def self.chrome_process_healthy?
|
119
|
+
return false if @chrome_process_id.nil?
|
120
|
+
|
121
|
+
begin
|
122
|
+
Process.kill(0, @chrome_process_id) # Check if the process is alive
|
123
|
+
true
|
124
|
+
rescue Errno::ESRCH, Errno::EPERM
|
125
|
+
false
|
126
|
+
end
|
127
|
+
end
|
128
|
+
|
129
|
+
def self.kill_chrome
|
130
|
+
return if @chrome_process_id.nil?
|
131
|
+
|
132
|
+
Process.kill("KILL", @chrome_process_id) # Kill the process
|
133
|
+
Process.wait(@chrome_process_id) # Wait for the process to finish
|
134
|
+
end
|
135
|
+
|
136
|
+
def self.chrome_path
|
137
|
+
return Palapala.headless_chrome_path if Palapala.headless_chrome_path
|
138
|
+
|
139
|
+
case RbConfig::CONFIG["host_os"]
|
140
|
+
when /darwin/
|
141
|
+
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
|
142
|
+
when /linux/
|
143
|
+
"/usr/bin/google-chrome" # or "/usr/bin/chromium-browser"
|
144
|
+
when /win|mingw|cygwin/
|
145
|
+
"#{ENV["ProgramFiles(x86)"]}\\Google\\Chrome\\Application\\chrome.exe"
|
146
|
+
else
|
147
|
+
raise "Unsupported OS"
|
148
|
+
end
|
149
|
+
end
|
150
|
+
|
151
|
+
def self.spawn_chrome
|
152
|
+
return if port_in_use?
|
153
|
+
return if chrome_process_healthy?
|
154
|
+
|
155
|
+
# Define the path and parameters separately
|
156
|
+
# chrome_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
|
157
|
+
params = ["--headless", "--disable-gpu", "--remote-debugging-port=9222"]
|
158
|
+
|
159
|
+
# Spawn the process with the path and parameters
|
160
|
+
@chrome_process_id = Process.spawn(chrome_path, *params)
|
161
|
+
|
162
|
+
# Wait until the port is in use
|
163
|
+
until port_in_use?
|
164
|
+
sleep 0.1
|
165
|
+
end
|
166
|
+
# Detach the process so it runs in the background
|
167
|
+
Process.detach(@chrome_process_id)
|
168
|
+
|
169
|
+
at_exit do
|
170
|
+
if @chrome_process_id
|
171
|
+
begin
|
172
|
+
Process.kill("TERM", @chrome_process_id)
|
173
|
+
Process.wait(@chrome_process_id)
|
174
|
+
puts "Child process #{@chrome_process_id} terminated."
|
175
|
+
rescue Errno::ESRCH
|
176
|
+
puts "Child process #{@chrome_process_id} is already terminated."
|
177
|
+
rescue Errno::ECHILD
|
178
|
+
puts "No child process #{@chrome_process_id} found."
|
179
|
+
end
|
180
|
+
end
|
181
|
+
end
|
182
|
+
|
183
|
+
# Handle when the process is killed
|
184
|
+
trap("SIGCHLD") do
|
185
|
+
while (@chrome_process_id = Process.wait(-1, Process::WNOHANG))
|
186
|
+
break if @chrome_process_id.nil?
|
187
|
+
|
188
|
+
puts "Process #{@chrome_process_id} was killed."
|
189
|
+
# Handle the error or restart the process if necessary
|
190
|
+
@chrome_process_id = nil
|
191
|
+
end
|
192
|
+
rescue Errno::ECHILD
|
193
|
+
@chrome_process_id = nil
|
194
|
+
end
|
195
|
+
end
|
196
|
+
end
|
197
|
+
end
|
198
|
+
end
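The request and response handling in `renderer.rb` can be illustrated without a browser. This is a minimal sketch, assuming canned JSON messages in place of a live web socket; the `CommandBuilder` class, the sample `frameId` and the sample payload are hypothetical.

```ruby
require "base64"
require "json"

# Builds CDP commands with increasing ids, in the spirit of Renderer#next_id.
class CommandBuilder
  def initialize
    @id = 0
  end

  def build(method, params = {})
    @id += 1
    JSON.generate({ id: @id, method: method, params: params })
  end
end

# Encode HTML as a data URL so Page.navigate needs no web server.
def data_url_for_html(html)
  "data:text/html;base64,#{Base64.strict_encode64(html)}"
end

builder = CommandBuilder.new
navigate = builder.build("Page.navigate", { url: data_url_for_html("<h1>Hi</h1>") })
print_pdf = builder.build("Page.printToPDF", { printBackground: true })

# Incoming messages are either command results (they carry an "id")
# or events (they carry a "method"). These two are made-up samples.
incoming = [
  '{"method":"Page.frameStoppedLoading","params":{"frameId":"F1"}}',
  '{"id":2,"result":{"data":"JVBERi0="}}'
].map { |raw| JSON.parse(raw) }

event  = incoming.find { |msg| msg["method"] == "Page.frameStoppedLoading" }
result = incoming.find { |msg| msg["id"] == 2 }

puts navigate
puts Base64.decode64(result["result"]["data"])
```

Matching results by `id` and events by `method` is what lets a single connection carry both kinds of message.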
|
data/lib/palapala/version.rb
CHANGED
data/lib/palapala/web_socket_client.rb
ADDED
@@ -0,0 +1,29 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
require 'uri'
|
4
|
+
require 'socket'
|
5
|
+
|
6
|
+
module Palapala
|
7
|
+
# Create a socket wrapper that conforms to what the websocket-driver expects
|
8
|
+
class WebSocketClient
|
9
|
+
attr_reader :url
|
10
|
+
|
11
|
+
def initialize(url)
|
12
|
+
@url = url
|
13
|
+
uri = URI.parse(url)
|
14
|
+
@socket = TCPSocket.new(uri.host, uri.port)
|
15
|
+
end
|
16
|
+
|
17
|
+
def write(data)
|
18
|
+
@socket.write(data)
|
19
|
+
end
|
20
|
+
|
21
|
+
def read
|
22
|
+
@socket.readpartial(1024)
|
23
|
+
end
|
24
|
+
|
25
|
+
def close
|
26
|
+
@socket.close
|
27
|
+
end
|
28
|
+
end
|
29
|
+
end
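Because `read` returns at most 1024 bytes per call, a caller has to keep reading until a complete message has arrived. A minimal sketch of that behaviour, assuming an `IO.pipe` as a stand-in for the TCP connection to Chrome:

```ruby
# A pipe stands in for the socket; readpartial behaves the same on both.
reader, writer = IO.pipe

payload = "x" * 3000
writer.write(payload)
writer.close

# readpartial returns at most 1024 bytes per call, so keep reading until EOF.
chunks = []
begin
  loop { chunks << reader.readpartial(1024) }
rescue EOFError
  # The writer closed its end; all data has been read.
end
reader.close

puts chunks.size
puts chunks.sum(&:bytesize)
```

The websocket-driver gem only requires the wrapped object to expose `url` and `write`, which is why the wrapper class can stay this small.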
|
data/lib/palapala.rb
CHANGED
@@ -1,6 +1,8 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
3
|
require_relative 'palapala/pdf'
|
4
|
+
require_relative 'palapala/web_socket_client'
|
5
|
+
require_relative 'palapala/renderer'
|
4
6
|
|
5
7
|
# Main module for the gem
|
6
8
|
module Palapala
|
@@ -8,19 +10,12 @@ module Palapala
|
|
8
10
|
yield self
|
9
11
|
end
|
10
12
|
|
11
|
-
|
12
|
-
|
13
|
+
class << self
|
14
|
+
attr_accessor :defaults, :debug, :headless_chrome_url, :headless_chrome_path
|
13
15
|
end
|
14
16
|
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
def self.defaults=(defaults)
|
20
|
-
@defaults = defaults
|
21
|
-
end
|
22
|
-
|
23
|
-
def self.defaults
|
24
|
-
@defaults ||= {}
|
25
|
-
end
|
17
|
+
self.headless_chrome_url = 'http://localhost:9222'
|
18
|
+
self.headless_chrome_path = nil
|
19
|
+
self.defaults = {}
|
20
|
+
self.debug = false
|
26
21
|
end
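The configuration pattern used here (module-level accessors plus a `setup` block) can be tried in isolation. This is a minimal sketch, assuming a hypothetical `DemoConfig` module rather than the gem itself:

```ruby
# Hypothetical module mirroring the configuration pattern in palapala.rb.
module DemoConfig
  class << self
    attr_accessor :defaults, :debug, :headless_chrome_url

    # Yields the module itself so settings can be assigned in a block.
    def setup
      yield self
    end
  end

  # Defaults applied at load time.
  self.headless_chrome_url = "http://localhost:9222"
  self.defaults = {}
  self.debug = false
end

DemoConfig.setup do |config|
  config.debug = true
  config.defaults = { scale: 1, format: :A4 }
end

puts DemoConfig.headless_chrome_url
puts DemoConfig.defaults.fetch(:scale, 1)
```

Looking values up with `fetch(key, fallback)` means an option missing from `defaults` falls back to a built-in value instead of `nil`.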
|
data/palapala_pdf.gemspec
CHANGED
@@ -9,7 +9,7 @@ Gem::Specification.new do |spec|
|
|
9
9
|
spec.email = ['github.com@handekyn.com']
|
10
10
|
|
11
11
|
spec.summary = 'Convert HTML into PDF directly from Ruby using Chrome/Chromium.'
|
12
|
-
spec.description = 'This gem uses
|
12
|
+
spec.description = 'This gem uses raw web sockets to render HTML into a PDF using Chrom(e)(ium) with minimal dependencies.'
|
13
13
|
spec.homepage = 'https://github.com/palapala-app/palapala_pdf'
|
14
14
|
spec.required_ruby_version = '>= 3.1'
|
15
15
|
spec.license = 'MIT'
|
@@ -34,7 +34,8 @@ Gem::Specification.new do |spec|
|
|
34
34
|
spec.require_paths = ['lib']
|
35
35
|
|
36
36
|
# Uncomment to register a new dependency of your gem
|
37
|
-
spec.add_dependency '
|
37
|
+
spec.add_dependency 'base64', '~> 0'
|
38
|
+
spec.add_dependency 'websocket-driver', '~> 0'
|
38
39
|
|
39
40
|
# For more information and examples about making a new gem, check out our
|
40
41
|
# guide at: https://bundler.io/guides/creating_gem.html
|
metadata
CHANGED
@@ -1,31 +1,45 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: palapala_pdf
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.
|
4
|
+
version: 0.1.3
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Koen Handekyn
|
8
8
|
autorequire:
|
9
9
|
bindir: exe
|
10
10
|
cert_chain: []
|
11
|
-
date: 2024-08-
|
11
|
+
date: 2024-08-27 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
|
-
name:
|
14
|
+
name: base64
|
15
15
|
requirement: !ruby/object:Gem::Requirement
|
16
16
|
requirements:
|
17
17
|
- - "~>"
|
18
18
|
- !ruby/object:Gem::Version
|
19
|
-
version: '0
|
19
|
+
version: '0'
|
20
20
|
type: :runtime
|
21
21
|
prerelease: false
|
22
22
|
version_requirements: !ruby/object:Gem::Requirement
|
23
23
|
requirements:
|
24
24
|
- - "~>"
|
25
25
|
- !ruby/object:Gem::Version
|
26
|
-
version: '0
|
27
|
-
|
28
|
-
|
26
|
+
version: '0'
|
27
|
+
- !ruby/object:Gem::Dependency
|
28
|
+
name: websocket-driver
|
29
|
+
requirement: !ruby/object:Gem::Requirement
|
30
|
+
requirements:
|
31
|
+
- - "~>"
|
32
|
+
- !ruby/object:Gem::Version
|
33
|
+
version: '0'
|
34
|
+
type: :runtime
|
35
|
+
prerelease: false
|
36
|
+
version_requirements: !ruby/object:Gem::Requirement
|
37
|
+
requirements:
|
38
|
+
- - "~>"
|
39
|
+
- !ruby/object:Gem::Version
|
40
|
+
version: '0'
|
41
|
+
description: This gem uses raw web sockets to render HTML into a PDF using Chrom(e)(ium)
|
42
|
+
with minimal dependencies.
|
29
43
|
email:
|
30
44
|
- github.com@handekyn.com
|
31
45
|
executables: []
|
@@ -40,7 +54,9 @@ files:
|
|
40
54
|
- Rakefile
|
41
55
|
- lib/palapala.rb
|
42
56
|
- lib/palapala/pdf.rb
|
57
|
+
- lib/palapala/renderer.rb
|
43
58
|
- lib/palapala/version.rb
|
59
|
+
- lib/palapala/web_socket_client.rb
|
44
60
|
- palapala_pdf.gemspec
|
45
61
|
homepage: https://github.com/palapala-app/palapala_pdf
|
46
62
|
licenses:
|