palapala_pdf 0.1.1 → 0.1.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 8ceecc9fff4323ef8cdf26b9e20aeb35f21610e8b55f716a0b8e0c27c0e38613
-   data.tar.gz: 4f9f412514ad9a9b63e4b6484488ff245bb0769352374a652845749ac230b6e6
+   metadata.gz: f4a479307ef1a9d4ebe8aee6d8b3f2d7da1f96c854252ba69cc19bbaba45f6cf
+   data.tar.gz: d8f184150f5b43eb1abcbaceaf4e37a643b9a3fdfbcca7dff075f56ec1d5622b
  SHA512:
-   metadata.gz: 62c62530b7034a687012bea224b8d284616cef97a2e749ab26151ee2e925ffb3b72ab0d96b13e970ab9d2630bca188c1e134ca59e81bcb37f245dbb1b888f000
-   data.tar.gz: 352245729e62e3df55c23848b684af2deb642b3d506de83ef71f8cf0bc4964ac94219baac3e06e776957f15cb15fba41cf3f864ecb4a381c7e057bed75e598f6
+   metadata.gz: a094985ed908279fac68ed43d7fbf83b333591d97ca723af840263425c94d530647388417bd935618680f402e617af6097c55eb8e09a98dd69df5a61a7eaa495
+   data.tar.gz: '07382dfb24841886ba2c09ae20b240701cb57d37963f16fd8e9e135f299f63962a5c4f09f6f14d4b2a0e86ae33a80cd2bc45426605b9137f3cdbc70c50cb34b2'
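The digests above can be checked against a locally unpacked gem. A minimal sketch, assuming the archive member (for example `data.tar.gz`) has already been extracted to disk; `sha256_matches?` is a hypothetical helper, not part of palapala_pdf or RubyGems, and the demo hashes a throwaway file rather than the real archive:

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: compares a file's SHA256 hex digest with the value
# recorded under SHA256 in checksums.yaml.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex
end

# Demo on a temporary file so the sketch runs without downloading the gem.
Tempfile.create("sample") do |file|
  file.write("abc")
  file.flush
  expected = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
  puts sha256_matches?(file.path, expected)
end
```

For the SHA512 entries the same approach should work with `Digest::SHA512`.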
data/README.md CHANGED
@@ -2,13 +2,13 @@

  This project is a Ruby gem that provides functionality for generating PDF files from HTML using the Chrome browser. It allows you to easily convert HTML content into PDF documents, making it convenient for tasks such as generating reports, invoices, or any other printable documents. The gem provides a simple and intuitive API for converting HTML to PDF, and it leverages the power and flexibility of the Chrome browser's rendering engine to ensure accurate and high-quality PDF output. With this gem, you can easily integrate PDF generation capabilities into your Ruby applications.

- At the core, this project leverages the same rendering engine as [Grover](https://github.com/Studiosity/grover), but with significantly reduced overhead and dependencies. Instead of relying on the full Grover stack, this project builds on [Ferrum](https://github.com/rubycdp/ferrum) to enable direct communication from Ruby to a headless Chrome or Chromium browser. This approach ensures efficient, thread-safe operations, providing a streamlined alternative for rendering tasks without sacrificing performance or flexibility.
+ At the core, this project leverages the same rendering engine as [Grover](https://github.com/Studiosity/grover), but with significantly reduced overhead and dependencies. Instead of relying on the full Grover/Puppeteer/NodeJS stack, this project builds on [Ferrum](https://github.com/rubycdp/ferrum) to enable direct communication from Ruby to a headless Chrome or Chromium browser. This approach ensures efficient, thread-safe operations, providing a streamlined alternative for rendering tasks without sacrificing performance or flexibility.

- This is how easy and powerful PDF generation should be:
+ This is how easy and powerful PDF generation should be in Ruby:

  ```ruby
  require "palapala"
- Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
+ Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
  ```

  And this while having the most modern HTML/CSS/JS available to you: flex, grid, canvas, you name it.
@@ -57,7 +57,7 @@ end

  2. **Create a PDF from HTML**:

- Create a PDF file from HTML in IRB
+ Create a PDF file from HTML in `irb`

  ```sh
  gem install palapala_pdf
@@ -65,22 +65,105 @@ gem install palapala_pdf

  In IRB, load palapala and create a PDF from an HTML snippet:

- ```sh
- >irb
+ ```ruby
+ require "palapala"
+ Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
  ```

+ Instantiate a new Palapala::PDF object with your HTML content and generate the PDF binary data.
+
  ```ruby
  require "palapala"
- Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
+ binary_data = Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
  ```

- Instantiate a new Palapala::Pdf object with your HTML content and generate the PDF binary data.
+ ## Paged CSS
+
+ Paged CSS is a set of CSS features designed for styling printed documents. It extends standard CSS to handle pagination, page sizes, headers, footers, and other aspects of printed content. Paged CSS is commonly used in scenarios where web content needs to be converted to PDFs or other paginated formats.
+
+ ### Headers and Footers
+
+ When using Chromium-based rendering engines, headers and footers are not controlled by the Paged CSS standard but are instead managed through specific settings in the rendering engine.
+
+ With Palapala PDF, headers and footers are defined using the `header_html` and `footer_html` options. These allow you to insert HTML content directly into the header or footer areas.

  ```ruby
- require "palapala"
- binary_data = Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
+ Palapala::PDF.new(
+   "<p>Hello world</p>",
+   header_html: '<div style="text-align: center;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
+   footer_html: '<div style="text-align: center;">Generated with Palapala PDF</div>',
+   margin: { top: "2cm", bottom: "2cm"}
+ ).save("test.pdf")
+ ```
+
+ ### Page size, orientation and margins
+
+ #### With CSS
+
+ todo example
+
+ #### As params
+
+ todo example
+
+ ## JS based rendering
+
+ ```html
+ <html>
+   <script type="text/javascript">
+     document.addEventListener("DOMContentLoaded", () => {
+       document.body.innerHTML += "<p>Current time from JS: " + new Date().toLocaleString() + "</p>";
+     });
+   </script>
+   <body><p>Default body text.</p></body>
+ </html>
  ```

+ ## Customisation
+
+ ### Ferrum
+
+ Ferrum is a clean, high-level Ruby API to Chrome. All you need is Ruby and
+ [Chrome](https://www.google.com/chrome/) or
+ [Chromium](https://www.chromium.org/). Ferrum connects to the browser via the [CDP
+ protocol](https://chromedevtools.github.io/devtools-protocol/).
+
+ Highlighting some key Ferrum options in the context of PDF generation:
+
+ * options `Hash`
+   * `:headless` (String | Boolean) - Set browser as headless or not, `true` by default. You can set `"new"` to support
+       [new headless mode](https://developer.chrome.com/articles/new-headless/).
+   * `:xvfb` (Boolean) - Run browser in a virtual framebuffer, `false` by default.
+   * `:extensions` (Array[String | Hash]) - An array of paths to files or JS
+       source code to be preloaded into the browser e.g.:
+       `["/path/to/script.js", { source: "window.secret = 'top'" }]`
+   * `:logger` (Object responding to `puts`) - When present, debug output is
+       written to this object.
+   * `:timeout` (Numeric) - The number of seconds we'll wait for a response when
+       communicating with browser. Default is 5.
+   * `:js_errors` (Boolean) - When true, JavaScript errors get re-raised in Ruby.
+   * `:pending_connection_errors` (Boolean) - When the main frame is still waiting for slow responses when the timeout is
+       reached, `PendingConnectionsError` is raised. It's better to figure out why you have slow responses and fix or
+       block them rather than turn this setting off. Default is true.
+   * `:browser_path` (String) - Path to Chrome binary, you can also set ENV
+       variable as `BROWSER_PATH=some/path/chrome`.
+   * `:browser_options` (Hash) - Additional command line options,
+       [see them all](https://peter.sh/experiments/chromium-command-line-switches/)
+       e.g. `{ "ignore-certificate-errors" => nil }`
+   * `:ignore_default_browser_options` (Boolean) - Ferrum has a number of default
+       options it passes to the browser, if you set this to `true` then only
+       options you put in `:browser_options` will be passed to the browser,
+       except required ones of course.
+   * `:url` (String) - URL for a running instance of Chrome. If this is set, a
+       browser process will not be spawned.
+   * `:process_timeout` (Integer) - How long to wait for the Chrome process to
+       respond on startup.
+   * `:ws_max_receive_size` (Integer) - Maximum size, in bytes, of messages accepted from Chrome
+       over the web socket. Defaults to 64MB. Incoming messages larger
+       than this will cause a `Ferrum::DeadBrowserError`.
+
+ More [details](https://github.com/rubycdp/ferrum#customization)
+
  ## Development

  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
@@ -109,6 +192,8 @@ Your support is greatly appreciated and helps maintain the project!

  For Chrome, mode headless=new seems to be slower for pdf rendering cases.

+ On a Mac M3 (Aug 24), Chromium (`brew install chromium`) appears to be about 3x slower than Chrome. Maybe the Chromium that gets installed is not ARM optimized?
+
  ## Primitive benchmark

  On a macbook m3, the throughput for 'hello world' PDF generation can reach around 25 docs/second when allowing for some concurrency. As Chrome is actually also very efficient, it scales really well for complex documents also. If you run this in Rails, the concurrency is being taken care of either by the front end thread pool or by the workers and you shouldn't have to think about this. (Using an external Chrome)
@@ -123,10 +208,57 @@ Total time c:5, n:4 = 0.72492800001055 seconds
  Total time c:20, n:1 = 0.7156629998935387 seconds
  ```

- ## Advanced stuff

- ### Headers and Footers
+ ## Rails
+
+ ### `send_data`
+
+ The `send_data` method in Rails is used to send binary data as a file download to the user's browser. It allows you to send any type of data, such as PDF files, images, or CSV files, directly to the user without saving the file on the server.
+
+ Here's an example of how to use `send_data` to send a PDF file:
+
+ ```ruby
+ def download_pdf
+   pdf_data = Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
+   send_data pdf_data, filename: "document.pdf", type: "application/pdf"
+ end
+ ```
+
+ In this example, `pdf_data` is the binary data of the PDF file. The `filename` option specifies the name of the file that will be downloaded by the user, and the `type` option specifies the MIME type of the file.
+
+ ### `render_to_string`
+
+ The `render_to_string` method in Rails is used to render a view template to a string without sending it as a response to the user's browser. It allows you to generate HTML or other text-based content that can be used in various ways, such as sending it as an email, saving it to a file, or manipulating it further before sending it as a response.
+
+ Here's an example of how to use `render_to_string` to render a view template to a string:
+
+ ```ruby
+ def download_pdf
+   html_string = render_to_string(template: "example/template", layout: "print", locals: { } )
+   pdf_data = Palapala::PDF.new(html_string).binary_data
+   send_data pdf_data, filename: "document.pdf", type: "application/pdf"
+ end
+ ```
+
+ ## Docker
+
+ In Docker, when running as root, you must pass the no-sandbox browser option:
+
+ ```ruby
+ Palapala.setup do |config|
+   config.ferrum_opts = { 'no-sandbox': nil }
+ end
+ ```
+ (from Ferrum) It has also been reported that the Chrome process repeatedly crashes when running inside a Docker container on an M1 Mac preventing Ferrum from working. Ferrum should work as expected when deployed to a Docker container on a non-M1 Mac.
+
+ ## Heroku
+
+ Possible buildpacks:
+
+ https://github.com/heroku/heroku-buildpack-chrome-for-testing
+
+ This buildpack installs Chrome and chromedriver. The latter is not actually needed, but the buildpack is maintained.

- ### Title pages
+ https://elements.heroku.com/buildpacks/heroku/heroku-buildpack-google-chrome

- ### Page sizes in CSS
+ This buildpack installs Chrome, which is all we need, but it is deprecated.
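The new README section "Page size, orientation and margins" still reads "todo example". A minimal sketch of the CSS route, assuming the standard `@page` rule is honored because `prefer_css_page_size` defaults to `true` in this release; how Chrome combines the CSS margin with the `margin:` constructor option is not covered by the README, so treat that part as untested. The output file name is made up for illustration:

```ruby
# Sketch only: page geometry expressed with the CSS @page rule.
html = <<~HTML
  <html>
    <head>
      <style>
        @page {
          size: A4 landscape;
          margin: 2cm;
        }
      </style>
    </head>
    <body><h1>Hello, world!</h1></body>
  </html>
HTML

# Render only when the gem is loaded, so the sketch also runs without Chrome.
Palapala::PDF.new(html).save("a4-landscape.pdf") if defined?(Palapala::PDF)
```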
data/lib/palapala/pdf.rb CHANGED
@@ -4,20 +4,19 @@ require 'ferrum'

  module Palapala
    # Page class to generate PDF from HTML content using Chrome in headless mode in a thread-safe way
-   class Pdf
+   # @param page_ranges Empty string means all pages, e.g., "1-3, 5, 7-9"
+   class PDF
      def initialize(content = nil,
                     url: nil,
-                    path: nil,
                     header_html: nil,
                     footer_html: nil,
-                    generate_tagged_pdf: false,
-                    prefer_css_page_size: true,
+                    generate_tagged_pdf: Palapala.defaults.fetch(:generate_tagged_pdf, false),
+                    prefer_css_page_size: Palapala.defaults.fetch(:prefer_css_page_size, true),
                     scale: Palapala.defaults.fetch(:scale, 1),
-                    page_ranges: Palapala.defaults.fetch(:page_ranges, ''),
+                    page_ranges: Palapala.defaults.fetch(:page_ranges, nil),
                     margin: Palapala.defaults.fetch(:margin, {}))
        @content = content
        @url = url
-       @path = path
        @header_html = header_html
        @footer_html = footer_html
        @generate_tagged_pdf = generate_tagged_pdf
@@ -27,11 +26,21 @@ module Palapala
        @margin = margin
      end

+     def binary_data(**opts)
+       pdf(**opts)
+     end
+
+     def save(path, **opts)
+       pdf(path:, **opts)
+     end
+
+     private
+
      def pdf(**opts)
        browser_context = browser.contexts.create
        browser_page = browser_context.page
        # # output console logs for this page
-       if opts[:debug]
+       if Palapala.debug
          browser_page.on('Runtime.consoleAPICalled') do |params|
            params['args'].each { |r| puts(r['value']) }
          end
@@ -40,25 +49,15 @@ module Palapala
        url = @url || data_url
        browser_page.go_to(url)
        # Wait for the page to load
-       browser_page.network.wait_for_idle
+       # browser_page.network.wait_for_idle
        # Generate PDF
        pdf_binary_data = browser_page.pdf(**opts_with_defaults.merge(opts))
        # Dispose the context
        browser_context.dispose
        # Return the PDF data
-       pdf_binary_data
+       opts[:path] ? opts[:path] : pdf_binary_data
      end

-     def binary_data(**opts)
-       pdf(**opts)
-     end
-
-     def save(path, **opts)
-       pdf(path:, **opts)
-     end
-
-     private
-
      def data_url
        encoded_html = Base64.strict_encode64(@content)
        "data:text/html;base64,#{encoded_html}"
@@ -68,23 +67,24 @@ module Palapala
        opts = { scale: @scale,
                 printBackground: true,
                 displayHeaderFooter: true,
-                pageRanges: @page_ranges, # Empty string means all pages, e.g., "1-3, 5, 7-9"
                 encoding: :binary,
-                preferCSSPageSize: true,
-                headerTemplate: @header_html || '',
-                footerTemplate: @footer_html || '' }
+                preferCSSPageSize: @prefer_css_page_size }

+       opts[:headerTemplate] = @header_html unless @header_html.nil?
+       opts[:footerTemplate] = @footer_html unless @footer_html.nil?
+       opts[:pageRanges] = @page_ranges unless @page_ranges.nil?
        opts[:path] = @path unless @path.nil?
        opts[:generateTaggedPDF] = @generate_tagged_pdf unless @generate_tagged_pdf.nil?
        opts[:format] = @format unless @format.nil?
-       opts[:paperWidth] = @paper_width unless @paper_width.nil?
-       opts[:paperHeight] = @paper_height unless @paper_height.nil?
+       # opts[:paperWidth] = @paper_width unless @paper_width.nil?
+       # opts[:paperHeight] = @paper_height unless @paper_height.nil?
        opts[:landscape] = @landscape unless @landscape.nil?
        opts[:marginTop] = @margin[:top] unless @margin[:top].nil?
        opts[:marginLeft] = @margin[:left] unless @margin[:left].nil?
        opts[:marginBottom] = @margin[:bottom] unless @margin[:bottom].nil?
        opts[:marginRight] = @margin[:right] unless @margin[:right].nil?

+       puts "opts: #{opts}" if Palapala&.debug
        opts
      end

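The hunk above changes how optional print settings reach the browser: header, footer and page ranges were previously always sent (as empty strings when unset) and are now added only when a value was given. A stand-alone sketch of that pattern; `build_pdf_options` is a hypothetical helper written for illustration and is not a method of the gem:

```ruby
# Hypothetical helper mirroring the pattern in the hunk: fixed options go in
# first, optional ones are added only when they are not nil.
def build_pdf_options(scale: 1, header_html: nil, footer_html: nil, page_ranges: nil)
  opts = { scale: scale, printBackground: true }
  opts[:headerTemplate] = header_html unless header_html.nil?
  opts[:footerTemplate] = footer_html unless footer_html.nil?
  opts[:pageRanges] = page_ranges unless page_ranges.nil?
  opts
end

p build_pdf_options
p build_pdf_options(page_ranges: "1-3")
```

Leaving a key out lets the browser fall back to its own default for that setting, which is presumably the intent of switching the `page_ranges` default from `''` to `nil`.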
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Palapala
-   VERSION = '0.1.1'
+   VERSION = '0.1.2'
  end
data/lib/palapala.rb CHANGED
@@ -8,19 +8,13 @@ module Palapala
      yield self
    end

-   def self.ferrum_opts=(ferrum_opts)
-     @ferrum_opts = ferrum_opts
+   class << self
+     attr_accessor :ferrum_opts
+     attr_accessor :defaults
+     attr_accessor :debug
    end

-   def self.ferrum_opts
-     @ferrum_opts
-   end
-
-   def self.defaults=(defaults)
-     @defaults = defaults
-   end
-
-   def self.defaults
-     @defaults ||= {}
-   end
+   self.ferrum_opts = {}
+   self.defaults = {}
+   self.debug = false
  end
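The hunk above replaces hand-written module-level getters and setters with `attr_accessor` on the singleton class and eager initialization. A minimal sketch of the same pattern, using a stand-in module named `DemoConfig` (hypothetical) so it does not reopen the real `Palapala` module:

```ruby
# Stand-in for the configuration pattern shown in the hunk above.
module DemoConfig
  def self.setup
    yield self
  end

  class << self
    attr_accessor :ferrum_opts, :defaults, :debug
  end

  # Eager initialization, as in 0.1.2.
  self.ferrum_opts = {}
  self.defaults = {}
  self.debug = false
end

DemoConfig.setup do |config|
  config.defaults = { scale: 0.75 }
  config.debug = true
end

puts DemoConfig.defaults.fetch(:scale, 1)
puts DemoConfig.defaults.fetch(:page_ranges, nil).inspect
```

One behavioral difference worth noting: the old reader used `@defaults ||= {}`, so it never returned `nil`. With plain accessors, assigning `nil` to `defaults` would make later `fetch` calls raise `NoMethodError`.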
data/palapala_pdf.gemspec CHANGED
@@ -34,7 +34,7 @@ Gem::Specification.new do |spec|
    spec.require_paths = ['lib']

    # Uncomment to register a new dependency of your gem
-   spec.add_dependency 'ferrum', '~> 0.15'
+   spec.add_dependency 'ferrum', '~> 0'

    # For more information and examples about making a new gem, check out our
    # guide at: https://bundler.io/guides/creating_gem.html
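The constraint change from `~> 0.15` to `~> 0` widens the set of Ferrum versions the gem accepts. A small sketch using RubyGems' own `Gem::Requirement` to compare the two; the version numbers tested are arbitrary samples chosen for illustration, not a list of actual Ferrum releases:

```ruby
require "rubygems"

old_req = Gem::Requirement.new("~> 0.15")
new_req = Gem::Requirement.new("~> 0")

# Print, for each sample version, whether the old and new constraints accept it.
%w[0.14.0 0.15.0 0.16.0 1.0.0].each do |v|
  version = Gem::Version.new(v)
  puts format("%-7s old: %-5s new: %s", v, old_req.satisfied_by?(version), new_req.satisfied_by?(version))
end
```

Both constraints stop short of 1.0; the new one additionally admits releases older than 0.15.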
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: palapala_pdf
  version: !ruby/object:Gem::Version
-   version: 0.1.1
+   version: 0.1.2
  platform: ruby
  authors:
  - Koen Handekyn
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2024-08-23 00:00:00.000000000 Z
+ date: 2024-08-25 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: ferrum
@@ -16,14 +16,14 @@ dependencies:
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '0.15'
+         version: '0'
    type: :runtime
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '0.15'
+         version: '0'
  description: This gem uses Ferrum to render HTML into a PDF using Chrom(e)(ium) with
    minimal dependencies.
  email: