palapala_pdf 0.1.2 → 0.1.4
- checksums.yaml +4 -4
- data/README.md +60 -87
- data/lib/palapala/pdf.rb +6 -48
- data/lib/palapala/renderer.rb +198 -0
- data/lib/palapala/version.rb +1 -1
- data/lib/palapala/web_socket_client.rb +29 -0
- data/lib/palapala.rb +6 -4
- data/palapala_pdf.gemspec +3 -2
- metadata +21 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 467a7808449c84b0d2ddbd56c16a791ce1e3148636d911e8010fd3804f306abf
|
4
|
+
data.tar.gz: 7680b71baec5030b0992383f77c3098ac12da223d88057a36a4a985f97755c35
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz: '
|
6
|
+
metadata.gz: 2533a247040eed4962a337ebd15e9bf51c875ed562242e44b791597d46dc69079e182980f9053cb3519525f6283c249f32dde0ee5668f9bccb2f9a10843d33f9
|
7
|
+
data.tar.gz: '07527904c55f1f0b99d98c1854ac369722c548f4626f5fe5ead3bf49d3bd573fc68ff9f4514cdbdf0acf4fb6c4bead8b80e10f34b3b55ee9fd3cc8f7ddf08436'
|
data/README.md
CHANGED
@@ -2,13 +2,13 @@
|
|
2
2
|
|
3
3
|
This project is a Ruby gem that provides functionality for generating PDF files from HTML using the Chrome browser. It allows you to easily convert HTML content into PDF documents, making it convenient for tasks such as generating reports, invoices, or any other printable documents. The gem provides a simple and intuitive API for converting HTML to PDF, and it leverages the power and flexibility of the Chrome browser's rendering engine to ensure accurate and high-quality PDF output. With this gem, you can easily integrate PDF generation capabilities into your Ruby applications.
|
4
4
|
|
5
|
-
At the core, this project leverages the same rendering engine as [Grover](https://github.com/Studiosity/grover), but with significantly reduced overhead and dependencies. Instead of relying on the full Grover/Puppeteer/NodeJS stack, this project
|
5
|
+
At the core, this project leverages the same rendering engine as [Grover](https://github.com/Studiosity/grover), but with significantly reduced overhead and dependencies. Instead of relying on the full Grover/Puppeteer/NodeJS stack, this project uses a raw web socket to enable direct communication from Ruby to a headless Chrome or Chromium browser. This approach ensures efficiency while providing a streamlined alternative for rendering tasks without sacrificing performance or flexibility.
|
6
6
|
|
7
|
-
This is how easy and powerfull PDF generation
|
7
|
+
This is how easy and powerful PDF generation can be in Ruby:
|
8
8
|
|
9
9
|
```ruby
|
10
10
|
require "palapala"
|
11
|
-
Palapala::
|
11
|
+
Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
|
12
12
|
```
|
13
13
|
|
14
14
|
And all this with the most modern HTML/CSS/JS available to you: flex, grid, canvas, you name it.
|
@@ -27,15 +27,23 @@ If you are not using bundler to manage dependencies, you can install the gem by
|
|
27
27
|
$ gem install palapala_pdf
|
28
28
|
```
|
29
29
|
|
30
|
-
Palapala PDF
|
30
|
+
Palapala PDF connects to Chrome over a web socket connection.
|
31
31
|
|
32
|
-
|
32
|
+
An external Chrome/Chromium is expected.
|
33
|
+
Just start it with the following command (9222 is the default port):
|
33
34
|
|
34
35
|
```sh
|
35
|
-
chrome --headless --disable-gpu --remote-debugging-port=9222
|
36
|
+
/path/to/chrome --headless --disable-gpu --remote-debugging-port=9222
|
36
37
|
```
|
37
38
|
|
38
|
-
|
39
|
+
Alternatively, Palapala PDF will try to launch Chrome as a child process.
|
40
|
+
It guesses the path to Chrome, or you can configure it like this:
|
41
|
+
|
42
|
+
```ruby
|
43
|
+
Palapala.setup do |config|
|
44
|
+
config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
|
45
|
+
end
|
46
|
+
```
|
39
47
|
|
40
48
|
## Usage Instructions
|
41
49
|
|
@@ -43,19 +51,20 @@ To create a PDF from HTML content using the `Palapala` library, follow these ste
|
|
43
51
|
|
44
52
|
1. **Configuration**:
|
45
53
|
|
46
|
-
Configure the `Palapala` library with the necessary options, such as the URL for the
|
54
|
+
Configure the `Palapala` library with the necessary options, such as the URL for the browser and default settings like scale and format.
|
47
55
|
|
48
56
|
In a Rails context, this could be inside an initializer.
|
49
57
|
|
50
58
|
```ruby
|
51
59
|
Palapala.setup do |config|
|
52
60
|
# run against an external chrome/chromium or leave this out to run against a chrome that is started as a child process
|
53
|
-
config.
|
61
|
+
config.debug = true
|
62
|
+
config.headless_chrome_url = 'http://localhost:9222' # run against a remote Chrome instance
|
63
|
+
# config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
|
54
64
|
config.defaults = { scale: 1, format: :A4 }
|
55
65
|
end
|
56
66
|
```
|
57
|
-
|
58
|
-
2. **Create a PDF from HTML**:
|
67
|
+
1. **Create a PDF from HTML**:
|
59
68
|
|
60
69
|
Create a PDF file from HTML in `irb`
|
61
70
|
|
@@ -67,14 +76,14 @@ in IRB, load palapala and create a PDF from an HTML snippet:
|
|
67
76
|
|
68
77
|
```ruby
|
69
78
|
require "palapala"
|
70
|
-
Palapala::
|
79
|
+
Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
|
71
80
|
```
|
72
81
|
|
73
|
-
Instantiate a new Palapala::
|
82
|
+
Instantiate a new Palapala::Pdf object with your HTML content and generate the PDF binary data.
|
74
83
|
|
75
84
|
```ruby
|
76
85
|
require "palapala"
|
77
|
-
binary_data = Palapala::
|
86
|
+
binary_data = Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
|
78
87
|
```
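As a rough illustration of how the configured `defaults` can feed into each document, here is a minimal sketch. `MiniConfig` and `resolve_option` are hypothetical stand-ins written for this example and are not part of the gem; only the `setup`/`defaults` shape and the `fetch`-with-fallback lookup mirror what the library's source shows.

```ruby
# Hypothetical stand-in for the gem's configuration module (illustration only).
module MiniConfig
  class << self
    attr_accessor :defaults

    # Yield the module itself so callers can assign settings in a block.
    def setup
      yield self
    end

    # Look up a default, falling back when the key was never configured.
    def resolve_option(key, fallback)
      defaults.fetch(key, fallback)
    end
  end

  self.defaults = {}
end

MiniConfig.setup do |config|
  config.defaults = { scale: 1, format: :A4 }
end

scale = MiniConfig.resolve_option(:scale, 1.0)                 # configured value
tagged = MiniConfig.resolve_option(:generate_tagged_pdf, false) # falls back
puts "scale=#{scale} tagged=#{tagged}"
```

The point of the `fetch` fallback is that a key left out of `config.defaults` still resolves to a sensible value instead of `nil`.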
|
79
88
|
|
80
89
|
## Paged CSS
|
@@ -88,7 +97,7 @@ When using Chromium-based rendering engines, headers and footers are not control
|
|
88
97
|
With palapala PDF headers and footers are defined using `header_html` and `footer_html` options. These allow you to insert HTML content directly into the header or footer areas.
|
89
98
|
|
90
99
|
```ruby
|
91
|
-
Palapala::
|
100
|
+
Palapala::Pdf.new(
|
92
101
|
"<p>Hello world</p>",
|
93
102
|
header_html: '<div style="text-align: center;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
|
94
103
|
footer_html: '<div style="text-align: center;">Generated with Palapala PDF</div>',
|
@@ -119,50 +128,9 @@ todo example
|
|
119
128
|
</html>
|
120
129
|
```
|
121
130
|
|
122
|
-
##
|
123
|
-
|
124
|
-
|
125
|
-
|
126
|
-
It is a clean, high-level Ruby API to Chrome. All you need is Ruby and
|
127
|
-
[Chrome](https://www.google.com/chrome/) or
|
128
|
-
[Chromium](https://www.chromium.org/). Ferrum connects to the browser by [CDP
|
129
|
-
protocol](https://chromedevtools.github.io/devtools-protocol/).
|
130
|
-
|
131
|
-
Highlighting some key Ferrum options in the context of PDF generation
|
132
|
-
|
133
|
-
* options `Hash`
|
134
|
-
* `:headless` (String | Boolean) - Set browser as headless or not, `true` by default. You can set `"new"` to support
|
135
|
-
[new headless mode](https://developer.chrome.com/articles/new-headless/).
|
136
|
-
* `:xvfb` (Boolean) - Run browser in a virtual framebuffer, `false` by default.
|
137
|
-
* `:extensions` (Array[String | Hash]) - An array of paths to files or JS
|
138
|
-
source code to be preloaded into the browser e.g.:
|
139
|
-
`["/path/to/script.js", { source: "window.secret = 'top'" }]`
|
140
|
-
* `:logger` (Object responding to `puts`) - When present, debug output is
|
141
|
-
written to this object.
|
142
|
-
* `:timeout` (Numeric) - The number of seconds we'll wait for a response when
|
143
|
-
communicating with browser. Default is 5.
|
144
|
-
* `:js_errors` (Boolean) - When true, JavaScript errors get re-raised in Ruby.
|
145
|
-
* `:pending_connection_errors` (Boolean) - When main frame is still waiting for slow responses while timeout is
|
146
|
-
reached `PendingConnectionsError` is raised. It's better to figure out why you have slow responses and fix or
|
147
|
-
block them rather than turn this setting off. Default is true.
|
148
|
-
* `:browser_path` (String) - Path to Chrome binary, you can also set ENV
|
149
|
-
variable as `BROWSER_PATH=some/path/chrome`.
|
150
|
-
* `:browser_options` (Hash) - Additional command line options,
|
151
|
-
[see them all](https://peter.sh/experiments/chromium-command-line-switches/)
|
152
|
-
e.g. `{ "ignore-certificate-errors" => nil }`
|
153
|
-
* `:ignore_default_browser_options` (Boolean) - Ferrum has a number of default
|
154
|
-
options it passes to the browser, if you set this to `true` then only
|
155
|
-
options you put in `:browser_options` will be passed to the browser,
|
156
|
-
except required ones of course.
|
157
|
-
* `:url` (String) - URL for a running instance of Chrome. If this is set, a
|
158
|
-
browser process will not be spawned.
|
159
|
-
* `:process_timeout` (Integer) - How long to wait for the Chrome process to
|
160
|
-
respond on startup.
|
161
|
-
* `:ws_max_receive_size` (Integer) - How big messages to accept from Chrome
|
162
|
-
over the web socket, in bytes. Defaults to 64MB. Incoming messages larger
|
163
|
-
than this will cause a `Ferrum::DeadBrowserError`.
|
164
|
-
|
165
|
-
More [details](https://github.com/rubycdp/ferrum#customization)
|
131
|
+
## Raw parameters (Page.printToPDF)
|
132
|
+
|
133
|
+
See [Page.printToPDF](https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF)
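To give a feel for what these raw parameters look like on the wire, here is a minimal sketch of a `Page.printToPDF` command as JSON. The helper name `cdp_command` is hypothetical and exists only for this example; the envelope fields (`id`, `method`, `params`) and the parameter names come from the Chrome DevTools Protocol, where paper sizes and margins are expressed in inches.

```ruby
require "json"

# Build the JSON text for a CDP command. The id lets the caller match
# the browser's reply to this particular request.
def cdp_command(id, method, params = {})
  JSON.generate({ id: id, method: method, params: params })
end

# A few Page.printToPDF parameters; sizes and margins are in inches.
print_params = {
  landscape: false,
  printBackground: true,
  scale: 1,
  paperWidth: 8.27,   # A4 width in inches
  paperHeight: 11.69, # A4 height in inches
  marginTop: 0.4,
  pageRanges: "1-3"
}

puts cdp_command(1, "Page.printToPDF", print_params)
```

In the gem itself, the `save(path, **opts)` signature suggests that such parameters can be passed per call and are merged over the defaults, although that reading is based on the source in this diff rather than on documented behaviour.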
|
166
134
|
|
167
135
|
## Development
|
168
136
|
|
@@ -190,66 +158,71 @@ Your support is greatly appreciated and helps maintain the project!
|
|
190
158
|
|
191
159
|
## Findings
|
192
160
|
|
193
|
-
For Chrome, mode headless=new seems to be slower for pdf rendering cases.
|
194
|
-
|
195
|
-
On mac m3 (aug 24), chromium (brew install chromium) is about 3x slower than chrome? Maybe the chromium that gets installed is not ARM optimized?
|
161
|
+
- For Chrome, mode headless=new seems to be slower for pdf rendering cases.
|
162
|
+
- On mac m3 (aug 24), chromium (brew install chromium) is about 3x slower than chrome? Maybe the chromium that gets installed is not ARM optimized?
|
196
163
|
|
197
164
|
## Primitive benchmark
|
198
165
|
|
199
|
-
On a macbook m3, the throughput for 'hello world' PDF generation can reach around
|
166
|
+
On a MacBook M3, the throughput for 'hello world' PDF generation can reach around 300 docs/second when allowing for some concurrency. Because Chrome itself is very efficient, this also scales well to complex documents. If you run this in Rails, concurrency is taken care of either by the front-end thread pool or by the workers, so you shouldn't have to think about it. (Measured using an external Chrome.)
|
167
|
+
|
168
|
+
Note: it renders `"Hello #{i}, world #{j}! #{Time.now}."` where i is the thread and j is the iteration counter within the thread and persists it to an SSD (which is very fast these days).
|
200
169
|
|
170
|
+
### benchmarking 20 docs: 1x20, 2x10, 4x5
|
201
171
|
|
172
|
+
```sh
|
173
|
+
c:1, n:20 : Throughput = 159.41 docs/sec, Total time = 0.1255 seconds
|
174
|
+
c:2, n:10 : Throughput = 124.91 docs/sec, Total time = 0.1601 seconds
|
175
|
+
c:4, n:5 : Throughput = 196.40 docs/sec, Total time = 0.1018 seconds
|
202
176
|
```
|
203
|
-
|
204
|
-
|
205
|
-
|
206
|
-
|
207
|
-
|
208
|
-
|
177
|
+
|
178
|
+
### benchmarking 320 docs: 1x320, 4x80, 8x40
|
179
|
+
|
180
|
+
```sh
|
181
|
+
c:1, n:320 : Throughput = 184.99 docs/sec, Total time = 1.7299 seconds
|
182
|
+
c:4, n:80 : Throughput = 302.50 docs/sec, Total time = 1.0578 seconds
|
183
|
+
c:8, n:40 : Throughput = 254.29 docs/sec, Total time = 1.2584 seconds
|
209
184
|
```
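To make the figures above easier to interpret, here is a minimal sketch of how such a throughput number can be measured: `c` threads each perform `n` units of work, and throughput is the total document count divided by wall-clock time. `measure_throughput` is a hypothetical helper written for this example, and the block body is a stand-in string build, not an actual PDF render.

```ruby
# Run `concurrency` threads that each call the block `iterations` times,
# and return [throughput in docs/sec, elapsed seconds].
def measure_throughput(concurrency:, iterations:, &work)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = Array.new(concurrency) do |i|
    Thread.new { iterations.times { |j| work.call(i, j) } }
  end
  threads.each(&:join)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [(concurrency * iterations) / elapsed, elapsed]
end

throughput, elapsed = measure_throughput(concurrency: 4, iterations: 5) do |i, j|
  "Hello #{i}, world #{j}! #{Time.now}." # stand-in for rendering and saving a PDF
end

puts format("c:4, n:5 : Throughput = %.2f docs/sec, Total time = %.4f seconds",
            throughput, elapsed)
```

As a sanity check on the reported numbers, the `c:4, n:80` row corresponds to 320 documents in 1.0578 seconds, which is about 302.5 docs/sec.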
|
210
185
|
|
186
|
+
This is roughly 100x faster than what you typically get with Grover and still 10x faster than many alternatives. It is fast enough that, for many use cases, you can run it straight from e.g. a Ruby on Rails controller in your web worker on a single machine and still scale to lots of users.
|
211
187
|
|
212
188
|
## Rails
|
213
189
|
|
214
|
-
### `send_data`
|
190
|
+
### `send_data` and `render_to_string`
|
215
191
|
|
216
192
|
The `send_data` method in Rails is used to send binary data as a file download to the user's browser. It allows you to send any type of data, such as PDF files, images, or CSV files, directly to the user without saving the file on the server.
|
217
193
|
|
218
|
-
Here's an example of how to use `send_data` to send a PDF file:
|
219
|
-
|
220
|
-
```ruby
|
221
|
-
def download_pdf
|
222
|
-
pdf_data = Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
|
223
|
-
send_data pdf_data, filename: "document.pdf", type: "application/pdf"
|
224
|
-
end
|
225
|
-
```
|
226
|
-
|
227
|
-
In this example, `pdf_data` is the binary data of the PDF file. The `filename` option specifies the name of the file that will be downloaded by the user, and the `type` option specifies the MIME type of the file.
|
228
|
-
|
229
|
-
### `render_to_string`
|
230
|
-
|
231
194
|
The `render_to_string` method in Rails is used to render a view template to a string without sending it as a response to the user's browser. It allows you to generate HTML or other text-based content that can be used in various ways, such as sending it as an email, saving it to a file, or manipulating it further before sending it as a response.
|
232
195
|
|
233
|
-
Here's an example of how to use `render_to_string` to render a view template to a string
|
196
|
+
Here's an example of how to use `render_to_string` to render a view template to a string and send the pdf using `send_data`:
|
234
197
|
|
235
198
|
```ruby
|
236
199
|
def download_pdf
|
237
200
|
html_string = render_to_string(template: "example/template", layout: "print", locals: { } )
|
238
|
-
pdf_data = Palapala::
|
201
|
+
pdf_data = Palapala::Pdf.new(html_string).binary_data
|
239
202
|
send_data pdf_data, filename: "document.pdf", type: "application/pdf"
|
240
203
|
end
|
241
204
|
```
|
242
205
|
|
206
|
+
In this example, `pdf_data` is the binary data of the PDF file. The `filename` option specifies the name of the file that will be downloaded by the user, and the `type` option specifies the MIME type of the file.
|
207
|
+
|
243
208
|
## Docker
|
244
209
|
|
245
210
|
In docker as root you must pass the no-sandbox browser option:
|
246
211
|
|
247
212
|
```ruby
|
248
213
|
Palapala.setup do |config|
|
249
|
-
config.
|
214
|
+
config.opts = { 'no-sandbox': nil }
|
250
215
|
end
|
251
216
|
```
|
252
|
-
|
217
|
+
It has also been reported that the Chrome process repeatedly crashes when running inside a Docker container on an M1 Mac. Chrome should work as expected when deployed to a Docker container on a non-M1 Mac.
|
218
|
+
|
219
|
+
## Thread-safety
|
220
|
+
|
221
|
+
Behind the scenes, a websocket is opened and stored on Thread.current for subsequent requests. Hence, the code is
|
222
|
+
thread safe in the sense that every web socket gets a new tab in the underlying chromium and gets an isolated context.
|
223
|
+
|
224
|
+
For performance reasons, the code uses a low level websocket connection that does all its work on the current thread
|
225
|
+
so we can avoid synchronisation penalties.
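The per-thread storage described above can be sketched as follows. This is an illustration under stated assumptions, not the gem's code: `FakeConnection` and `thread_connection` are hypothetical names, and the class stands in for a web socket bound to its own browser tab.

```ruby
# Hypothetical stand-in for an expensive per-thread resource, such as a
# web socket connection bound to its own browser tab.
class FakeConnection; end

# Return the calling thread's connection, creating it on first use.
def thread_connection
  Thread.current[:fake_connection] ||= FakeConnection.new
end

main_first = thread_connection
main_second = thread_connection
other = Thread.new { thread_connection }.value

puts main_first.equal?(main_second) # true: the same thread reuses its connection
puts main_first.equal?(other)       # false: another thread gets its own
```

Because each thread only ever touches its own connection, no lock is needed around it. Note that in Ruby the `Thread#[]` storage is fiber-local, so code running in a separate fiber would also see its own slot.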
|
253
226
|
|
254
227
|
## Heroku
|
255
228
|
|
data/lib/palapala/pdf.rb
CHANGED
@@ -1,13 +1,10 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
|
-
require 'ferrum'
|
4
|
-
|
5
3
|
module Palapala
|
6
4
|
# Page class to generate PDF from HTML content using Chrome in headless mode in a thread-safe way
|
7
5
|
# @param page_ranges Empty string means all pages, e.g., "1-3, 5, 7-9"
|
8
|
-
class
|
6
|
+
class Pdf
|
9
7
|
def initialize(content = nil,
|
10
|
-
url: nil,
|
11
8
|
header_html: nil,
|
12
9
|
footer_html: nil,
|
13
10
|
generate_tagged_pdf: Palapala.defaults.fetch(:generate_tagged_pdf, false),
|
@@ -16,7 +13,6 @@ module Palapala
|
|
16
13
|
page_ranges: Palapala.defaults.fetch(:page_ranges, nil),
|
17
14
|
margin: Palapala.defaults.fetch(:margin, {}))
|
18
15
|
@content = content
|
19
|
-
@url = url
|
20
16
|
@header_html = header_html
|
21
17
|
@footer_html = footer_html
|
22
18
|
@generate_tagged_pdf = generate_tagged_pdf
|
@@ -31,36 +27,17 @@ module Palapala
|
|
31
27
|
end
|
32
28
|
|
33
29
|
def save(path, **opts)
|
34
|
-
|
30
|
+
File.binwrite(path, pdf(**opts))
|
35
31
|
end
|
36
32
|
|
37
33
|
private
|
38
34
|
|
39
|
-
def
|
40
|
-
|
41
|
-
browser_page = browser_context.page
|
42
|
-
# # output console logs for this page
|
43
|
-
if Palapala.debug
|
44
|
-
browser_page.on('Runtime.consoleAPICalled') do |params|
|
45
|
-
params['args'].each { |r| puts(r['value']) }
|
46
|
-
end
|
47
|
-
end
|
48
|
-
# open the page
|
49
|
-
url = @url || data_url
|
50
|
-
browser_page.go_to(url)
|
51
|
-
# Wait for the page to load
|
52
|
-
# browser_page.network.wait_for_idle
|
53
|
-
# Generate PDF
|
54
|
-
pdf_binary_data = browser_page.pdf(**opts_with_defaults.merge(opts))
|
55
|
-
# Dispose the context
|
56
|
-
browser_context.dispose
|
57
|
-
# Return the PDF data
|
58
|
-
opts[:path] ? opts[:path] : pdf_binary_data
|
35
|
+
def renderer
|
36
|
+
Thread.current[:renderer] ||= Renderer.new
|
59
37
|
end
|
60
38
|
|
61
|
-
def
|
62
|
-
|
63
|
-
"data:text/html;base64,#{encoded_html}"
|
39
|
+
def pdf(**opts)
|
40
|
+
renderer.html_to_pdf(@content, params: opts_with_defaults.merge(opts))
|
64
41
|
end
|
65
42
|
|
66
43
|
def opts_with_defaults
|
@@ -83,26 +60,7 @@ module Palapala
|
|
83
60
|
opts[:marginLeft] = @margin[:left] unless @margin[:left].nil?
|
84
61
|
opts[:marginBottom] = @margin[:bottom] unless @margin[:bottom].nil?
|
85
62
|
opts[:marginRight] = @margin[:right] unless @margin[:right].nil?
|
86
|
-
|
87
|
-
puts "opts: #{opts}" if Palapala&.debug
|
88
63
|
opts
|
89
64
|
end
|
90
|
-
|
91
|
-
def browser
|
92
|
-
# according to the docs ferrum is thread safe, however, under heavy load
|
93
|
-
# we are seeing some issues, so we are using thread locals to have a
|
94
|
-
# browser per thread
|
95
|
-
Thread.current[:browser] ||= new_browser
|
96
|
-
# @@browser ||= new_browser
|
97
|
-
end
|
98
|
-
|
99
|
-
def new_browser
|
100
|
-
Ferrum::Browser.new(Palapala.ferrum_opts)
|
101
|
-
end
|
102
|
-
|
103
|
-
# # TODO use method from template class
|
104
|
-
# def cm_to_inches(value)
|
105
|
-
# value / 2.54
|
106
|
-
# end
|
107
65
|
end
|
108
66
|
end
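As a reading aid for the option handling in this file, here is a minimal sketch of the two ideas involved: mapping a `margin` hash onto the camelCase margin keys of `Page.printToPDF`, and letting per-call options override defaults through `Hash#merge`. `margin_params` is a hypothetical helper for this example; the gem does the mapping inline in `opts_with_defaults`.

```ruby
# Translate a { top:, left:, bottom:, right: } hash into the camelCase margin
# keys used by Page.printToPDF, skipping sides that were not given.
def margin_params(margin)
  {
    marginTop: margin[:top],
    marginLeft: margin[:left],
    marginBottom: margin[:bottom],
    marginRight: margin[:right]
  }.compact
end

defaults = { scale: 1 }.merge(margin_params({ top: 1, left: 0.5 }))
per_call = { scale: 0.8 }

# Hash#merge keeps the argument's value on key collisions, so per-call wins.
p defaults.merge(per_call)
```

Dropping `nil` sides matters because an omitted key lets the browser apply its own default margin, whereas an explicit `nil` would be sent along as a value.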
|
data/lib/palapala/renderer.rb
ADDED
@@ -0,0 +1,198 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
require "json"
|
4
|
+
require "net/http"
|
5
|
+
require "websocket/driver"
|
6
|
+
|
7
|
+
module Palapala
|
8
|
+
# Render HTML content to PDF using Chrome in headless mode with minimal dependencies
|
9
|
+
class Renderer
|
10
|
+
def initialize
|
11
|
+
# Create an instance of WebSocketClient with the WebSocket URL
|
12
|
+
@client = Palapala::WebSocketClient.new(websocket_url)
|
13
|
+
# Create the WebSocket driver
|
14
|
+
@driver = WebSocket::Driver.client(@client)
|
15
|
+
# Register the on_message callback
|
16
|
+
@driver.on(:message, &method(:on_message))
|
17
|
+
# Start the WebSocket handshake
|
18
|
+
@driver.start
|
19
|
+
# Initialize the protocol to get the page events
|
20
|
+
send_command_and_wait_for_result("Page.enable")
|
21
|
+
end
|
22
|
+
|
23
|
+
# Callback to handle the incoming WebSocket messages
|
24
|
+
def on_message(e)
|
25
|
+
puts "Received: #{e.data[0..64]}" if Palapala.debug
|
26
|
+
@response = JSON.parse(e.data) # Parse the JSON response
|
27
|
+
end
|
28
|
+
|
29
|
+
# Update the current ID to the next ID (increment by 1)
|
30
|
+
def next_id = @id = (@id || 0) + 1
|
31
|
+
|
32
|
+
# Get the current ID
|
33
|
+
def current_id = @id
|
34
|
+
|
35
|
+
# Process the WebSocket messages until some state is true
|
36
|
+
def process_until(&block)
|
37
|
+
loop do
|
38
|
+
@driver.parse(@client.read)
|
39
|
+
return if block.call
|
40
|
+
return if @driver.state == :closed
|
41
|
+
end
|
42
|
+
end
|
43
|
+
|
44
|
+
# Method to send a message (text) and wait for a response
|
45
|
+
def send_and_wait(message, &)
|
46
|
+
puts "\nSending: #{message}" if Palapala.debug
|
47
|
+
@driver.text(message)
|
48
|
+
process_until(&)
|
49
|
+
end
|
50
|
+
|
51
|
+
# Method to send a CDP command and wait for some state to be true
|
52
|
+
def send_command(method, params: {}, &block)
|
53
|
+
send_and_wait(JSON.generate({ id: next_id, method:, params: }), &block)
|
54
|
+
end
|
55
|
+
|
56
|
+
# Method to send a CDP command and wait for the matching event to get the result
|
57
|
+
# @return [Hash] The result of the command
|
58
|
+
def send_command_and_wait_for_result(method, params: {})
|
59
|
+
send_command(method, params:) do
|
60
|
+
@response && @response["id"] == current_id
|
61
|
+
end
|
62
|
+
@response["result"]
|
63
|
+
end
|
64
|
+
|
65
|
+
# Method to send a CDP command and wait for a specific method to be called
|
66
|
+
def send_command_and_wait_for_event(method, event_name:, params: {})
|
67
|
+
send_command(method, params:) do
|
68
|
+
@response && @response["method"] == event_name
|
69
|
+
end
|
70
|
+
end
|
71
|
+
|
72
|
+
# Convert HTML content to PDF
|
73
|
+
# See https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF
|
74
|
+
# @param html [String] The HTML content to convert to PDF
|
75
|
+
# @param params [Hash] Additional parameters to pass to the CDP command
|
76
|
+
def html_to_pdf(html, params: {})
|
77
|
+
send_command_and_wait_for_event("Page.navigate", params: { url: data_url_for_html(html) },
|
78
|
+
event_name: "Page.frameStoppedLoading")
|
79
|
+
result = send_command_and_wait_for_result("Page.printToPDF", params:)
|
80
|
+
Base64.decode64(result["data"])
|
81
|
+
end
|
82
|
+
|
83
|
+
def close
|
84
|
+
@driver.close
|
85
|
+
@client.close
|
86
|
+
end
|
87
|
+
|
88
|
+
private
|
89
|
+
|
90
|
+
def data_url_for_html(html)
|
91
|
+
"data:text/html;base64,#{Base64.strict_encode64(html)}"
|
92
|
+
end
|
93
|
+
|
94
|
+
# Open a new tab in the remote chrome and return the WebSocket URL
|
95
|
+
def websocket_url
|
96
|
+
ChromeProcess.spawn_chrome
|
97
|
+
uri = URI("#{Palapala.headless_chrome_url}/json/new")
|
98
|
+
http = Net::HTTP.new(uri.host, uri.port)
|
99
|
+
request = Net::HTTP::Put.new(uri)
|
100
|
+
request['Content-Type'] = 'application/json'
|
101
|
+
response = http.request(request)
|
102
|
+
tab_info = JSON.parse(response.body)
|
103
|
+
websocket_url = tab_info["webSocketDebuggerUrl"]
|
104
|
+
puts "WebSocket URL: #{websocket_url}" if Palapala.debug
|
105
|
+
websocket_url
|
106
|
+
end
|
107
|
+
|
108
|
+
# Manage the Chrome child process
|
109
|
+
module ChromeProcess
|
110
|
+
def self.port_in_use?(port = 9222, host = "127.0.0.1")
|
111
|
+
server = TCPServer.new(host, port)
|
112
|
+
server.close
|
113
|
+
false
|
114
|
+
rescue Errno::EADDRINUSE
|
115
|
+
true
|
116
|
+
end
|
117
|
+
|
118
|
+
def self.chrome_process_healthy?
|
119
|
+
return false if @chrome_process_id.nil?
|
120
|
+
|
121
|
+
begin
|
122
|
+
Process.kill(0, @chrome_process_id) # Check if the process is alive
|
123
|
+
true
|
124
|
+
rescue Errno::ESRCH, Errno::EPERM
|
125
|
+
false
|
126
|
+
end
|
127
|
+
end
|
128
|
+
|
129
|
+
def self.kill_chrome
|
130
|
+
return if @chrome_process_id.nil?
|
131
|
+
|
132
|
+
Process.kill("KILL", @chrome_process_id) # Kill the process
|
133
|
+
Process.wait(@chrome_process_id) # Wait for the process to finish
|
134
|
+
end
|
135
|
+
|
136
|
+
def self.chrome_path
|
137
|
+
return Palapala.headless_chrome_path if Palapala.headless_chrome_path
|
138
|
+
|
139
|
+
case RbConfig::CONFIG["host_os"]
|
140
|
+
when /darwin/
|
141
|
+
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
|
142
|
+
when /linux/
|
143
|
+
"/usr/bin/google-chrome" # or "/usr/bin/chromium-browser"
|
144
|
+
when /win|mingw|cygwin/
|
145
|
+
"#{ENV["ProgramFiles(x86)"]}\\Google\\Chrome\\Application\\chrome.exe"
|
146
|
+
else
|
147
|
+
raise "Unsupported OS"
|
148
|
+
end
|
149
|
+
end
|
150
|
+
|
151
|
+
def self.spawn_chrome
|
152
|
+
return if port_in_use?
|
153
|
+
return if chrome_process_healthy?
|
154
|
+
|
155
|
+
# Define the path and parameters separately
|
156
|
+
# chrome_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
|
157
|
+
params = ["--headless", "--disable-gpu", "--remote-debugging-port=9222"]
|
158
|
+
|
159
|
+
# Spawn the process with the path and parameters
|
160
|
+
@chrome_process_id = Process.spawn(chrome_path, *params)
|
161
|
+
|
162
|
+
# Wait until the port is in use
|
163
|
+
until port_in_use?
|
164
|
+
sleep 0.1
|
165
|
+
end
|
166
|
+
# Detach the process so it runs in the background
|
167
|
+
Process.detach(@chrome_process_id)
|
168
|
+
|
169
|
+
at_exit do
|
170
|
+
if @chrome_process_id
|
171
|
+
begin
|
172
|
+
Process.kill("TERM", @chrome_process_id)
|
173
|
+
Process.wait(@chrome_process_id)
|
174
|
+
puts "Child process #{@chrome_process_id} terminated."
|
175
|
+
rescue Errno::ESRCH
|
176
|
+
puts "Child process #{@chrome_process_id} is already terminated."
|
177
|
+
rescue Errno::ECHILD
|
178
|
+
puts "No child process #{@chrome_process_id} found."
|
179
|
+
end
|
180
|
+
end
|
181
|
+
end
|
182
|
+
|
183
|
+
# Handle when the process is killed
|
184
|
+
trap("SIGCHLD") do
|
185
|
+
while (@chrome_process_id = Process.wait(-1, Process::WNOHANG))
|
186
|
+
break if @chrome_process_id.nil?
|
187
|
+
|
188
|
+
puts "Process #{@chrome_process_id} was killed."
|
189
|
+
# Handle the error or restart the process if necessary
|
190
|
+
@chrome_process_id = nil
|
191
|
+
end
|
192
|
+
rescue Errno::ECHILD
|
193
|
+
@chrome_process_id = nil
|
194
|
+
end
|
195
|
+
end
|
196
|
+
end
|
197
|
+
end
|
198
|
+
end
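The request/response matching used by the renderer can be illustrated without a browser. The sketch below consumes messages until the reply carrying the expected `id` arrives, skipping events and older replies on the way. `wait_for_result` and the `frames` array are hypothetical stand-ins for this example; the real class reads frames from the web socket and keeps the latest message in `@response`.

```ruby
require "json"

# Consume raw JSON messages until the reply whose "id" equals the request id
# shows up, then return its "result". `incoming` is any enumerable of JSON
# strings (a stand-in for frames read from the web socket).
def wait_for_result(incoming, request_id)
  incoming.each do |raw|
    message = JSON.parse(raw)
    return message["result"] if message["id"] == request_id
  end
  nil # the stream ended without a matching reply
end

frames = [
  JSON.generate({ method: "Page.frameStartedLoading", params: {} }), # event, no id
  JSON.generate({ id: 1, result: {} }),                               # earlier reply
  JSON.generate({ id: 2, result: { data: "JVBERi0=" } })              # Base64 of "%PDF-"
]

p wait_for_result(frames, 2)
```

Events carry a `method` but no `id`, which is why waiting for `Page.frameStoppedLoading` after `Page.navigate` matches on the method name instead.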
|
data/lib/palapala/version.rb
CHANGED
data/lib/palapala/web_socket_client.rb
ADDED
@@ -0,0 +1,29 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
require 'uri'
|
4
|
+
require 'socket'
|
5
|
+
|
6
|
+
module Palapala
|
7
|
+
# Create a socket wrapper that conforms to what the websocket-driver expects
|
8
|
+
class WebSocketClient
|
9
|
+
attr_reader :url
|
10
|
+
|
11
|
+
def initialize(url)
|
12
|
+
@url = url
|
13
|
+
uri = URI.parse(url)
|
14
|
+
@socket = TCPSocket.new(uri.host, uri.port)
|
15
|
+
end
|
16
|
+
|
17
|
+
def write(data)
|
18
|
+
@socket.write(data)
|
19
|
+
end
|
20
|
+
|
21
|
+
def read
|
22
|
+
@socket.readpartial(1024)
|
23
|
+
end
|
24
|
+
|
25
|
+
def close
|
26
|
+
@socket.close
|
27
|
+
end
|
28
|
+
end
|
29
|
+
end
|
data/lib/palapala.rb
CHANGED
@@ -1,6 +1,9 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
|
+
require_relative 'palapala/version'
|
3
4
|
require_relative 'palapala/pdf'
|
5
|
+
require_relative 'palapala/web_socket_client'
|
6
|
+
require_relative 'palapala/renderer'
|
4
7
|
|
5
8
|
# Main module for the gem
|
6
9
|
module Palapala
|
@@ -9,12 +12,11 @@ module Palapala
|
|
9
12
|
end
|
10
13
|
|
11
14
|
class << self
|
12
|
-
attr_accessor :
|
13
|
-
attr_accessor :defaults
|
14
|
-
attr_accessor :debug
|
15
|
+
attr_accessor :defaults, :debug, :headless_chrome_url, :headless_chrome_path
|
15
16
|
end
|
16
17
|
|
17
|
-
self.
|
18
|
+
self.headless_chrome_url = 'http://localhost:9222'
|
19
|
+
self.headless_chrome_path = nil
|
18
20
|
self.defaults = {}
|
19
21
|
self.debug = false
|
20
22
|
end
|
data/palapala_pdf.gemspec
CHANGED
@@ -9,7 +9,7 @@ Gem::Specification.new do |spec|
|
|
9
9
|
spec.email = ['github.com@handekyn.com']
|
10
10
|
|
11
11
|
spec.summary = 'Convert HTML into PDF directly from Ruby using Chrome/Chromium.'
|
12
|
-
spec.description = 'This gem uses
|
12
|
+
spec.description = 'This gem uses raw web sockets to render HTML into a PDF using Chrom(e)(ium) with minimal dependencies.'
|
13
13
|
spec.homepage = 'https://github.com/palapala-app/palapala_pdf'
|
14
14
|
spec.required_ruby_version = '>= 3.1'
|
15
15
|
spec.license = 'MIT'
|
@@ -34,7 +34,8 @@ Gem::Specification.new do |spec|
|
|
34
34
|
spec.require_paths = ['lib']
|
35
35
|
|
36
36
|
# Uncomment to register a new dependency of your gem
|
37
|
-
spec.add_dependency '
|
37
|
+
spec.add_dependency 'base64', '~> 0'
|
38
|
+
spec.add_dependency 'websocket-driver', '~> 0'
|
38
39
|
|
39
40
|
# For more information and examples about making a new gem, check out our
|
40
41
|
# guide at: https://bundler.io/guides/creating_gem.html
|
metadata
CHANGED
@@ -1,17 +1,17 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: palapala_pdf
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.
|
4
|
+
version: 0.1.4
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Koen Handekyn
|
8
8
|
autorequire:
|
9
9
|
bindir: exe
|
10
10
|
cert_chain: []
|
11
|
-
date: 2024-08-
|
11
|
+
date: 2024-08-27 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
|
-
name:
|
14
|
+
name: base64
|
15
15
|
requirement: !ruby/object:Gem::Requirement
|
16
16
|
requirements:
|
17
17
|
- - "~>"
|
@@ -24,8 +24,22 @@ dependencies:
|
|
24
24
|
- - "~>"
|
25
25
|
- !ruby/object:Gem::Version
|
26
26
|
version: '0'
|
27
|
-
|
28
|
-
|
27
|
+
- !ruby/object:Gem::Dependency
|
28
|
+
name: websocket-driver
|
29
|
+
requirement: !ruby/object:Gem::Requirement
|
30
|
+
requirements:
|
31
|
+
- - "~>"
|
32
|
+
- !ruby/object:Gem::Version
|
33
|
+
version: '0'
|
34
|
+
type: :runtime
|
35
|
+
prerelease: false
|
36
|
+
version_requirements: !ruby/object:Gem::Requirement
|
37
|
+
requirements:
|
38
|
+
- - "~>"
|
39
|
+
- !ruby/object:Gem::Version
|
40
|
+
version: '0'
|
41
|
+
description: This gem uses raw web sockets to render HTML into a PDF using Chrom(e)(ium)
|
42
|
+
with minimal dependencies.
|
29
43
|
email:
|
30
44
|
- github.com@handekyn.com
|
31
45
|
executables: []
|
@@ -40,7 +54,9 @@ files:
|
|
40
54
|
- Rakefile
|
41
55
|
- lib/palapala.rb
|
42
56
|
- lib/palapala/pdf.rb
|
57
|
+
- lib/palapala/renderer.rb
|
43
58
|
- lib/palapala/version.rb
|
59
|
+
- lib/palapala/web_socket_client.rb
|
44
60
|
- palapala_pdf.gemspec
|
45
61
|
homepage: https://github.com/palapala-app/palapala_pdf
|
46
62
|
licenses:
|