palapala_pdf 0.1.1 → 0.1.2
- checksums.yaml +4 -4
- data/README.md +146 -14
- data/lib/palapala/pdf.rb +25 -25
- data/lib/palapala/version.rb +1 -1
- data/lib/palapala.rb +7 -13
- data/palapala_pdf.gemspec +1 -1
- metadata +4 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: f4a479307ef1a9d4ebe8aee6d8b3f2d7da1f96c854252ba69cc19bbaba45f6cf
+ data.tar.gz: d8f184150f5b43eb1abcbaceaf4e37a643b9a3fdfbcca7dff075f56ec1d5622b
  SHA512:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: a094985ed908279fac68ed43d7fbf83b333591d97ca723af840263425c94d530647388417bd935618680f402e617af6097c55eb8e09a98dd69df5a61a7eaa495
+ data.tar.gz: '07382dfb24841886ba2c09ae20b240701cb57d37963f16fd8e9e135f299f63962a5c4f09f6f14d4b2a0e86ae33a80cd2bc45426605b9137f3cdbc70c50cb34b2'
|
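The SHA256 and SHA512 values above are the digests of the `metadata.gz` and `data.tar.gz` members of the published `.gem` archive. A minimal sketch of how such a digest could be checked locally, assuming the member has already been extracted to disk; the helper name `digest_matches?` and the throwaway demo file are illustrative and not part of the gem or of RubyGems:

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: compare a file's SHA256 hex digest with an expected value.
def digest_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Demo on a throwaway file, since the real archive members are not available here.
demo = Tempfile.new("member")
demo.write("abc")
demo.flush

# Well-known SHA256 digest of the three bytes "abc".
abc_sha256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
puts digest_matches?(demo.path, abc_sha256)
```

For a real check, the path would point at the extracted `data.tar.gz` and the expected value would be the SHA256 line from the hunk above.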
data/README.md
CHANGED
@@ -2,13 +2,13 @@

  This project is a Ruby gem that provides functionality for generating PDF files from HTML using the Chrome browser. It allows you to easily convert HTML content into PDF documents, making it convenient for tasks such as generating reports, invoices, or any other printable documents. The gem provides a simple and intuitive API for converting HTML to PDF, and it leverages the power and flexibility of the Chrome browser's rendering engine to ensure accurate and high-quality PDF output. With this gem, you can easily integrate PDF generation capabilities into your Ruby applications.

- At the core, this project leverages the same rendering engine as [Grover](https://github.com/Studiosity/grover), but with significantly reduced overhead and dependencies. Instead of relying on the full Grover stack, this project builds on [Ferrum](https://github.com/rubycdp/ferrum) to enable direct communication from Ruby to a headless Chrome or Chromium browser. This approach ensures efficient, thread-safe operations, providing a streamlined alternative for rendering tasks without sacrificing performance or flexibility.
+ At the core, this project leverages the same rendering engine as [Grover](https://github.com/Studiosity/grover), but with significantly reduced overhead and dependencies. Instead of relying on the full Grover/Puppeteer/NodeJS stack, this project builds on [Ferrum](https://github.com/rubycdp/ferrum) to enable direct communication from Ruby to a headless Chrome or Chromium browser. This approach ensures efficient, thread-safe operations, providing a streamlined alternative for rendering tasks without sacrificing performance or flexibility.

- This is how easy and powerfull PDF generation should be:
+ This is how easy and powerfull PDF generation should be in Ruby:

  ```ruby
  require "palapala"
- Palapala::
+ Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
  ```

  And this while having the most modern HTML/CSS/JS availlable to you: flex, grid, canvas, you name it.
@@ -57,7 +57,7 @@ end

  2. **Create a PDF from HTML**:

- Create a PDF file from HTML in
+ Create a PDF file from HTML in `irb`

  ```sh
  gem install palapala_pdf
@@ -65,22 +65,105 @@ gem install palapala_pdf

  in IRB, load palapala and create a PDF from an HTML snippet:

- ```
-
+ ```ruby
+ require "palapala"
+ Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
  ```

+ Instantiate a new Palapala::PDF object with your HTML content and generate the PDF binary data.
+
  ```ruby
  require "palapala"
- Palapala::
+ binary_data = Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
  ```

-
+ ## Paged CSS
+
+ Paged CSS is a subset of CSS designed for styling printed documents. It extends standard CSS to handle pagination, page sizes, headers, footers, and other aspects of printed content. Paged CSS is commonly used in scenarios where web content needs to be converted to PDFs or other paginated formats.
+
+ ### Headers and Footers
+
+ When using Chromium-based rendering engines, headers and footers are not controlled by the Paged CSS standard but are instead managed through specific settings in the rendering engine.
+
+ With palapala PDF headers and footers are defined using `header_html` and `footer_html` options. These allow you to insert HTML content directly into the header or footer areas.

  ```ruby
-
-
+ Palapala::PDF.new(
+ "<p>Hello world</>",
+ header_html: '<div style="text-align: center;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
+ footer_html: '<div style="text-align: center;">Generated with Palapala PDF</div>',
+ margin: { top: "2cm", bottom: "2cm"}
+ ).save("test.pdf")
+ ```
+
+ ### Page size, orientation and margins
+
+ #### With CSS
+
+ todo example
+
+ #### As params
+
+ todo example
+
+ ## JS based rendering
+
+ ```html
+ <html>
+ <script type="text/javascript">
+ document.addEventListener("DOMContentLoaded", () => {
+ document.body.innerHTML += "<p>Current time from JS: " + new Date().toLocaleString() + "</p>";
+ });
+ </script>
+ <body><p>Default body text.</p></body>
+ </html>
  ```

+ ## Customisation
+
+ ### Ferrum
+
+ It is Ruby clean and high-level API to Chrome. All you need is Ruby and
+ [Chrome](https://www.google.com/chrome/) or
+ [Chromium](https://www.chromium.org/). Ferrum connects to the browser by [CDP
+ protocol](https://chromedevtools.github.io/devtools-protocol/).
+
+ Highlighting some key Ferrum options in the context of PDF generation
+
+ * options `Hash`
+ * `:headless` (String | Boolean) - Set browser as headless or not, `true` by default. You can set `"new"` to support
+ [new headless mode](https://developer.chrome.com/articles/new-headless/).
+ * `:xvfb` (Boolean) - Run browser in a virtual framebuffer, `false` by default.
+ * `:extensions` (Array[String | Hash]) - An array of paths to files or JS
+ source code to be preloaded into the browser e.g.:
+ `["/path/to/script.js", { source: "window.secret = 'top'" }]`
+ * `:logger` (Object responding to `puts`) - When present, debug output is
+ written to this object.
+ * `:timeout` (Numeric) - The number of seconds we'll wait for a response when
+ communicating with browser. Default is 5.
+ * `:js_errors` (Boolean) - When true, JavaScript errors get re-raised in Ruby.
+ * `:pending_connection_errors` (Boolean) - When main frame is still waiting for slow responses while timeout is
+ reached `PendingConnectionsError` is raised. It's better to figure out why you have slow responses and fix or
+ block them rather than turn this setting off. Default is true.
+ * `:browser_path` (String) - Path to Chrome binary, you can also set ENV
+ variable as `BROWSER_PATH=some/path/chrome`.
+ * `:browser_options` (Hash) - Additional command line options,
+ [see them all](https://peter.sh/experiments/chromium-command-line-switches/)
+ e.g. `{ "ignore-certificate-errors" => nil }`
+ * `:ignore_default_browser_options` (Boolean) - Ferrum has a number of default
+ options it passes to the browser, if you set this to `true` then only
+ options you put in `:browser_options` will be passed to the browser,
+ except required ones of course.
+ * `:url` (String) - URL for a running instance of Chrome. If this is set, a
+ browser process will not be spawned.
+ * `:process_timeout` (Integer) - How long to wait for the Chrome process to
+ respond on startup.
+ * `:ws_max_receive_size` (Integer) - How big messages to accept from Chrome
+ over the web socket, in bytes. Defaults to 64MB. Incoming messages larger
+ than this will cause a `Ferrum::DeadBrowserError`.
+
+ More [details](https://github.com/rubycdp/ferrum#customization)
+
  ## Development

  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
@@ -109,6 +192,8 @@ Your support is greatly appreciated and helps maintain the project!

  For Chrome, mode headless=new seems to be slower for pdf rendering cases.

+ On mac m3 (aug 24), chromium (brew install chromium) is about 3x slower then chrome? Maybe the chromium that get's installed is not ARM optimized?
+
  ## Primitive benchmark

  On a macbook m3, the throughput for 'hello world' PDF generation can reach around 25 docs/second when allowing for some concurrency. As Chrome is actually also very efficient, it scales really well for complex documents also. If you run this in Rails, the concurrency is being taken care of either by the front end thread pool or by the workers and you shouldn't have to think about this. (Using an external Chrome)
@@ -123,10 +208,57 @@ Total time c:5, n:4 = 0.72492800001055 seconds
  Total time c:20, n:1 = 0.7156629998935387 seconds
  ```

- ## Advanced stuf

-
+ ## Rails
+
+ ### `send_data`
+
+ The `send_data` method in Rails is used to send binary data as a file download to the user's browser. It allows you to send any type of data, such as PDF files, images, or CSV files, directly to the user without saving the file on the server.
+
+ Here's an example of how to use `send_data` to send a PDF file:
+
+ ```ruby
+ def download_pdf
+ pdf_data = Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
+ send_data pdf_data, filename: "document.pdf", type: "application/pdf"
+ end
+ ```
+
+ In this example, `pdf_data` is the binary data of the PDF file. The `filename` option specifies the name of the file that will be downloaded by the user, and the `type` option specifies the MIME type of the file.
+
+ ### `render_to_string`
+
+ The `render_to_string` method in Rails is used to render a view template to a string without sending it as a response to the user's browser. It allows you to generate HTML or other text-based content that can be used in various ways, such as sending it as an email, saving it to a file, or manipulating it further before sending it as a response.
+
+ Here's an example of how to use `render_to_string` to render a view template to a string:
+
+ ```ruby
+ def download_pdf
+ html_string = render_to_string(template: "example/template", layout: "print", locals: { } )
+ pdf_data = Palapala::PDF.new(html_string).binary_data
+ send_data pdf_data, filename: "document.pdf", type: "application/pdf"
+ end
+ ```
+
+ ## Docker
+
+ In docker as root you must pass the no-sandbox browser option:
+
+ ```ruby
+ Palapala.setup do |config|
+ config.ferrum_opts = { 'no-sandbox': nil }
+ end
+ ```
+ (from Ferrum) It has also been reported that the Chrome process repeatedly crashes when running inside a Docker container on an M1 Mac preventing Ferrum from working. Ferrum should work as expected when deployed to a Docker container on a non-M1 Mac.
+
+ ## Heroku
+
+ possible buildpacks
+
+ https://github.com/heroku/heroku-buildpack-chrome-for-testing
+
+ this buildpack install chrome and chromedriver, which is actually not needed, but it's maintained

-
+ https://elements.heroku.com/buildpacks/heroku/heroku-buildpack-google-chrome

-
+ this buildpack installs chrome, which is all we need, but it's deprecated
|
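The README hunks above leave both "Page size, orientation and margins" examples as `todo example`. One way the CSS variant could look is sketched here, assuming Chrome applies the `@page` size because `prefer_css_page_size` defaults to `true` in `pdf.rb`. The helper `page_html` and its `size:` parameter are illustrative only, and the final `Palapala::PDF` call is left commented out because it needs the gem and a local Chrome:

```ruby
# Hypothetical helper: wrap body markup in a document that carries a CSS @page rule.
def page_html(body, size: "A4 landscape")
  <<~HTML
    <html>
      <head>
        <style>
          @page { size: #{size}; }
        </style>
      </head>
      <body>#{body}</body>
    </html>
  HTML
end

html = page_html("<h1>Hello, world!</h1>")
puts html

# With the gem installed and Chrome available, the README's own API would take it from here:
# require "palapala"
# Palapala::PDF.new(html).save("landscape.pdf")
```

Keeping the page geometry in the stylesheet means the same HTML prints the same way from a browser's print dialog, which can make templates easier to preview.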
data/lib/palapala/pdf.rb
CHANGED
@@ -4,20 +4,19 @@ require 'ferrum'

  module Palapala
  # Page class to generate PDF from HTML content using Chrome in headless mode in a thread-safe way
-
+ # @param page_ranges Empty string means all pages, e.g., "1-3, 5, 7-9"
+ class PDF
  def initialize(content = nil,
  url: nil,
- path: nil,
  header_html: nil,
  footer_html: nil,
- generate_tagged_pdf: false,
- prefer_css_page_size: true,
+ generate_tagged_pdf: Palapala.defaults.fetch(:generate_tagged_pdf, false),
+ prefer_css_page_size: Palapala.defaults.fetch(:prefer_css_page_size, true),
  scale: Palapala.defaults.fetch(:scale, 1),
- page_ranges: Palapala.defaults.fetch(:page_ranges,
+ page_ranges: Palapala.defaults.fetch(:page_ranges, nil),
  margin: Palapala.defaults.fetch(:margin, {}))
  @content = content
  @url = url
- @path = path
  @header_html = header_html
  @footer_html = footer_html
  @generate_tagged_pdf = generate_tagged_pdf
@@ -27,11 +26,21 @@ module Palapala
  @margin = margin
  end

+ def binary_data(**opts)
+ pdf(**opts)
+ end
+
+ def save(path, **opts)
+ pdf(path:, **opts)
+ end
+
+ private
+
  def pdf(**opts)
  browser_context = browser.contexts.create
  browser_page = browser_context.page
  # # output console logs for this page
- if
+ if Palapala.debug
  browser_page.on('Runtime.consoleAPICalled') do |params|
  params['args'].each { |r| puts(r['value']) }
  end
@@ -40,25 +49,15 @@ module Palapala
  url = @url || data_url
  browser_page.go_to(url)
  # Wait for the page to load
- browser_page.network.wait_for_idle
+ # browser_page.network.wait_for_idle
  # Generate PDF
  pdf_binary_data = browser_page.pdf(**opts_with_defaults.merge(opts))
  # Dispose the context
  browser_context.dispose
  # Return the PDF data
- pdf_binary_data
+ opts[:path] ? opts[:path] : pdf_binary_data
  end

- def binary_data(**opts)
- pdf(**opts)
- end
-
- def save(path, **opts)
- pdf(path:, **opts)
- end
-
- private
-
  def data_url
  encoded_html = Base64.strict_encode64(@content)
  "data:text/html;base64,#{encoded_html}"
@@ -68,23 +67,24 @@ module Palapala
  opts = { scale: @scale,
  printBackground: true,
  dispayHeaderFooter: true,
- pageRanges: @page_ranges, # Empty string means all pages, e.g., "1-3, 5, 7-9"
  encoding: :binary,
- preferCSSPageSize:
- headerTemplate: @header_html || '',
- footerTemplate: @footer_html || '' }
+ preferCSSPageSize: @prefer_css_page_size }

+ opts[:headerTemplate] = @header_html unless @header_html.nil?
+ opts[:footerTemplate] = @footer_html unless @footer_html.nil?
+ opts[:pageRanges] = @page_ranges unless @page_ranges.nil?
  opts[:path] = @path unless @path.nil?
  opts[:generateTaggedPDF] = @generate_tagged_pdf unless @generate_tagged_pdf.nil?
  opts[:format] = @format unless @format.nil?
- opts[:paperWidth] = @paper_width unless @paper_width.nil?
- opts[:paperHeight] = @paper_height unless @paper_height.nil?
+ # opts[:paperWidth] = @paper_width unless @paper_width.nil?
+ # opts[:paperHeight] = @paper_height unless @paper_height.nil?
  opts[:landscape] = @landscape unless @landscape.nil?
  opts[:marginTop] = @margin[:top] unless @margin[:top].nil?
  opts[:marginLeft] = @margin[:left] unless @margin[:left].nil?
  opts[:marginBottom] = @margin[:bottom] unless @margin[:bottom].nil?
  opts[:marginRight] = @margin[:right] unless @margin[:right].nil?

+ puts "opts: #{opts}" if Palapala&.debug
  opts
  end

|
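The last `pdf.rb` hunk changes how the print options are assembled: `headerTemplate`, `footerTemplate` and `pageRanges` are now added only when a value was given, rather than always being placed in the hash. A standalone sketch of that pattern follows. `build_print_options` is a hypothetical stand-in for the gem's private method, not its actual code. Note that the hunk spells one key `dispayHeaderFooter`, whereas the Chrome DevTools Protocol parameter of `Page.printToPDF` is `displayHeaderFooter`; the sketch uses the protocol spelling.

```ruby
# Hypothetical stand-in for the private option builder shown in the diff.
# Optional keys are added only when a value was supplied, so the browser
# falls back to its own defaults for everything that is left out.
def build_print_options(scale: 1, prefer_css_page_size: true,
                        header_html: nil, footer_html: nil,
                        page_ranges: nil, margin: {})
  opts = { scale: scale,
           printBackground: true,
           displayHeaderFooter: true, # protocol spelling; the diff has "dispayHeaderFooter"
           preferCSSPageSize: prefer_css_page_size }

  optional = { headerTemplate: header_html,
               footerTemplate: footer_html,
               pageRanges: page_ranges,
               marginTop: margin[:top],
               marginBottom: margin[:bottom],
               marginLeft: margin[:left],
               marginRight: margin[:right] }

  # Hash#compact drops every nil entry before the merge.
  opts.merge(optional.compact)
end

p build_print_options(header_html: "<div>Header</div>", margin: { top: "2cm" })
```

Collecting the optional values in one hash and compacting it is only one way to express the series of `unless ... nil?` guards in the hunk; the resulting hash has the same keys either way.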
data/lib/palapala/version.rb
CHANGED
data/lib/palapala.rb
CHANGED
@@ -8,19 +8,13 @@ module Palapala
  yield self
  end

-
-
+ class << self
+ attr_accessor :ferrum_opts
+ attr_accessor :defaults
+ attr_accessor :debug
  end

-
-
-
-
- def self.defaults=(defaults)
- @defaults = defaults
- end
-
- def self.defaults
- @defaults ||= {}
- end
+ self.ferrum_opts = {}
+ self.defaults = {}
+ self.debug = false
  end
|
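This hunk replaces the hand-written `defaults` reader and writer with `attr_accessor` declarations on the singleton class and assigns the initial values eagerly. A self-contained sketch of the same pattern, using a made-up module name `ConfigSketch` so it is not mistaken for the gem itself; the shape of `setup` is assumed from the `yield self` context lines:

```ruby
# Made-up module illustrating singleton-level accessors with eager defaults.
module ConfigSketch
  # Assumed shape of the setup hook, based on the "yield self" context lines.
  def self.setup
    yield self
  end

  class << self
    attr_accessor :ferrum_opts, :defaults, :debug
  end

  self.ferrum_opts = {}
  self.defaults = {}
  self.debug = false
end

ConfigSketch.setup do |config|
  config.defaults = { scale: 0.75 }
  config.debug = true
end

# fetch falls back to its second argument when the key was never configured.
puts ConfigSketch.defaults.fetch(:scale, 1)
puts ConfigSketch.defaults.fetch(:prefer_css_page_size, true)
```

One behavioural difference worth noting: the removed reader used `@defaults ||= {}`, which tolerated `defaults` being set to `nil`. With plain accessors, assigning `nil` would make a later `defaults.fetch` call raise `NoMethodError`.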
data/palapala_pdf.gemspec
CHANGED
@@ -34,7 +34,7 @@ Gem::Specification.new do |spec|
  spec.require_paths = ['lib']

  # Uncomment to register a new dependency of your gem
- spec.add_dependency 'ferrum', '~> 0
+ spec.add_dependency 'ferrum', '~> 0'

  # For more information and examples about making a new gem, check out our
  # guide at: https://bundler.io/guides/creating_gem.html
metadata
CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: palapala_pdf
  version: !ruby/object:Gem::Version
- version: 0.1.
+ version: 0.1.2
  platform: ruby
  authors:
  - Koen Handekyn
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2024-08-
+ date: 2024-08-25 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: ferrum
@@ -16,14 +16,14 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '0
+ version: '0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '0
+ version: '0'
  description: This gem uses Ferrum to render HTML into a PDF using Chrom(e)(ium) with
  minimal dependencies.
  email: