palapala_pdf 0.1.10 → 0.1.11

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: e06d55c5dca6e14014e1154d4cd4fdcdddcd61844ccd138f38e1d9d803d1094e
4
- data.tar.gz: 89ed6d300a9e4c804d3bfcb54516ec921ed741df1718d68b4d160b36e3fd2792
3
+ metadata.gz: 230cc525cd5e4bbc4d2ce9ddb0a418486bc27ad6163cef9834a67d7e23442b41
4
+ data.tar.gz: b283d90551ef07efe3384148b061ec0a6a18cb3bdbc9485c1e8717e6bc4979f3
5
5
  SHA512:
6
- metadata.gz: f0bd26fe4c402e06f1f75ab4a5ccfad05834b78bcb125024967134149b7738b2cbd8121dd985332907644ecfd167f44fc3109b322099ebb9c0e988971f74f42e
7
- data.tar.gz: c95f8931ce1538af9b0cb63d93fcc179016cfe05cd4920b7ead1bf6933f4712bfbaee3ae8d243e1112353c1c2b4af875c92b6e1e6373ad219aa0c409f7db117a
6
+ metadata.gz: cfcf738f7171f679d419349cce4ce0441bb3be54fc355fa9592222b04c63654d6646c6103310f8ddf56e7112a3ccc95b30ddb92db86e0ff48a12b221f8be039c
7
+ data.tar.gz: efa8277743d960b0d3e869ab970c8248a2f901b572d4451bf0c479c311eabbd2a810e678efad489913ced8ab2ada14e8ce7d4b4fbb7d3ccad61ce98f419ee1b2
data/README.md CHANGED
@@ -4,7 +4,9 @@
4
4
 
5
5
  This project is a Ruby gem that provides functionality for generating PDF files from HTML using the Chrome browser. It allows you to easily convert HTML content into PDF documents, making it convenient for tasks such as generating reports, invoices, or any other printable documents. The gem provides a simple and intuitive API for converting HTML to PDF, and it leverages the power and flexibility of the Chrome browser's rendering engine to ensure accurate and high-quality PDF output. With this gem, you can easily integrate PDF generation capabilities into your Ruby applications.
6
6
 
7
- At the core, this project leverages the same rendering engine as [Grover](https://github.com/Studiosity/grover), but with significantly reduced overhead and dependencies. Instead of relying on the full Grover/Puppeteer/NodeJS stack, this project uses a raw web socket to enable direct communication from Ruby to a headless Chrome or Chromium browser. This approach ensures efficieny while providing a streamlined alternative for rendering tasks without sacrificing performance or flexibility.
7
+ At the core, this project leverages the Chrome rendering engine, but with significantly reduced overhead and dependencies. Instead of relying on the full Grover/Puppeteer/NodeJS stack, this project uses a raw web socket to enable direct communication from Ruby to a headless Chrome or Chromium browser. This approach ensures efficiency while providing a streamlined alternative for rendering tasks without sacrificing performance or flexibility.
8
+
9
+ It leverages work from [Puppeteer](https://pptr.dev/browsers-api/) (@puppeteer/browsers) to install a local Chrome-Headless-Shell if no Chrome is running, but that requires node (npx) to be available.
8
10
 
9
11
  This is how easy PDF generation can be in Ruby:
10
12
 
@@ -16,85 +18,28 @@ And this while having the most modern HTML/CSS/JS availlable to you: flex, grid,
16
18
 
17
19
  A core goal of this project is performance, and it is designed to be exceptionally fast. By leveraging **direct communication** with a headless Chrome or Chromium browser via a **raw web socket**, the gem minimizes overhead and dependencies, enabling PDF generation at speeds that significantly outperform other solutions. Whether generating simple or complex documents, this gem ensures that your Ruby applications can handle PDF tasks efficiently and at scale.
18
20
 
19
- ## Installation
20
-
21
- To install the gem and add it to your application's Gemfile, execute the following command:
22
-
23
- ```
24
- $ bundle add palapala_pdf
25
- ```
26
-
27
- If you are not using bundler to manage dependencies, you can install the gem by running:
28
-
29
- ```
30
- $ gem install palapala_pdf
31
- ```
32
-
33
- Palapala PDF connects to Chrome over a web socket connection.
34
- An external Chrome/Chromium is preferred. Start it with the following
35
- command (9222 is the default/expected port):
36
-
37
- ```sh
38
- /path/to/chrome --headless --disable-gpu --remote-debugging-port=9222
39
- ```
40
-
41
- ### Connecting to Chrome
42
-
43
- Palapa PDF will go through this process
44
-
45
- - check if a Chrome is running and exposing port 9222 (and if so, use it)
46
- - if `Palapala.headless_chrome_path` is defined, launch Chrome as a child process using that path
47
- - if **NPX** is avalaillable, install a **Chrome-Headless-Shell** variant locally and launch it as a child process. It will install the 'stable' version or the version identified by `Palapala.chrome_headless_shell_version` setting (or from ENV `CHROME_HEADLESS_SHELL_VERSION`).
48
- - as a last fallback it will guess a chrome path from the detected OS and try to launch a Chrome with that
49
-
50
- A Chrome-Headless-Shell version gives the best performance and resource useage
51
-
52
- ### Installing Chrome / Headless Chrome manually
53
-
54
- This is easiest using npx and some tooling provided by Puppeteer. Unfortunately it depends on node/npm, but it's worth it. E.g. install a specific version like this:
21
+ ## Sponsor This Project
55
22
 
56
- ```
57
- npx @puppeteer/browsers install chrome@127.0.6533.88
58
- ````
23
+ If you find this project useful and would like to support its development, consider sponsoring or buying a coffee to help keep it going:
59
24
 
60
- This installs chrome in a `chrome` folder in the current working dir and it outputs the path where it's installed when it's finished which then could be started like this
25
+ - **GitHub Sponsors:** [Sponsor on GitHub](https://github.com/sponsors/koenhandekyn)
26
+ - **Buy Me a Coffee:** [Buy a Coffee](https://buymeacoffee.com/koenhandekyn)
61
27
 
62
- Currently we'd advise for the `chrome-headless-shell` variant that is a light version meant just for this use case. The chrome-headless-shell is a minimal, headless version of the Chrome browser designed specifically for environments where you need to run Chrome without a graphical user interface (GUI). This is particularly useful in scenarios like server-side rendering, automated testing, web scraping, or any situation where you need the power of the Chrome browser engine without the overhead of displaying a UI. Headless by design, reduced size and overhead but still the same engine.
28
+ Your support is greatly appreciated and helps maintain the project!
63
29
 
64
- ```
65
- npx @puppeteer/browsers install chrome-headless-shell@stable
66
- ```
30
+ ## Installation
67
31
 
68
- It installs to a path like this `./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell`. As it's headless by design, it only needs one parameter:
32
+ To install the gem and add it to your application's Gemfile, execute the following command:
69
33
 
70
34
  ```
71
- ./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell --remote-debugging-port=9222
72
- ```
73
-
74
- *Note: Seems the august 2024 release 128.0.6613.85 is seriously performance impacted. So to avoid regression issues, it's suggested to install a specific version of Chrome, test it and stick with it. The chrome-headless-shell does not seem to suffer from this though.*
75
-
76
- ### Installing Node/NPX
77
-
78
- Using Brew
79
-
80
- ````
81
- brew install node
35
+ $ bundle add palapala_pdf
82
36
  ```
83
37
 
84
- Using NVM (Node Version Manager)
85
-
86
- ````
87
- curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.3/install.sh | bash
88
- source ~/.nvm/nvm.sh
89
- nvm --version
90
- nvm install node
91
- ````
92
-
93
38
  ## Usage Instructions
94
39
 
95
40
  To create a PDF from HTML content using the `Palapala` library, follow these steps:
96
41
 
97
- 1. **Configuration**:
42
+ **Configuration from inside Ruby**
98
43
 
99
44
  Configure the `Palapala` library with the necessary options, such as the URL for the browser and default settings like scale and format.
100
45
 
@@ -102,76 +47,82 @@ In a Rails context, this could be inside an initializer.
102
47
 
103
48
  ```ruby
104
49
  Palapala.setup do |config|
105
- # run against an external chrome/chromium or leave this out to run against a chrome that is started as a child process
50
+ # debug mode
106
51
  config.debug = true
107
- config.headless_chrome_url = 'http://localhost:9222' # run against a remote Chrome instance
108
- # config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
109
- config.defaults = { scale: 1, format: :A4 }
52
+ # Chrome headless shell version to use (stable, beta, dev, canary, etc.) when launching a new Chrome instance
53
+ config.chrome_headless_shell_version = :stable
54
+ # run against an external chrome/chromium or leave this out to run against a chrome that is started as a child process
55
+ config.headless_chrome_url = 'http://localhost:9222'
56
+ # path to Chrome executable
57
+ config.headless_chrome_path = '/usr/bin/google-chrome-stable'
58
+ # default options for PDF generation
59
+ config.defaults = { scale: 1 }
60
+ # extra params to pass to Chrome when launched as a child process
61
+ config.chrome_params = []
110
62
  end
111
63
  ```
112
- 1. **Create a PDF from HTML**:
113
64
 
114
- Create a PDF file from HTML in `irb`
65
+ **Using environment variables**
115
66
 
116
67
  ```sh
117
- gem install palapala_pdf
68
+ CHROME_HEADLESS_SHELL_VERSION=canary ruby examples/performance_benchmark.rb
69
+ ```
70
+
71
+ ```sh
72
+ HEADLESS_CHROME_URL=http://192.168.1.1:9222 ruby examples/performance_benchmark.rb
118
73
  ```
119
74
 
120
- in IRB, load palapala and create a PDF from an HTML snippet:
75
+ ```sh
76
+ CHROME_HEADLESS_PATH=/var/to/chrome ruby examples/performance_benchmark.rb
77
+ ```
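
As a rough illustration of how the environment variables above could take precedence over built-in defaults, here is a minimal sketch. The `resolve_settings` helper, the two constants, and the default values are hypothetical and are not part of the gem's API; only the three variable names are taken from the examples above.

```ruby
# Hypothetical defaults; the URL reflects the port 9222 convention mentioned in this README.
SETTING_DEFAULTS = {
  chrome_headless_shell_version: "stable",
  headless_chrome_url: "http://localhost:9222",
  headless_chrome_path: nil
}.freeze

# Environment variable names as they appear in the shell examples above.
SETTING_ENV_KEYS = {
  chrome_headless_shell_version: "CHROME_HEADLESS_SHELL_VERSION",
  headless_chrome_url: "HEADLESS_CHROME_URL",
  headless_chrome_path: "CHROME_HEADLESS_PATH"
}.freeze

# Returns a settings hash in which a non-empty environment value wins over the default.
# `env` can be ENV or any Hash with string keys, which keeps the helper easy to test.
def resolve_settings(env = ENV, defaults = SETTING_DEFAULTS)
  defaults.each_with_object({}) do |(key, default), settings|
    value = env[SETTING_ENV_KEYS.fetch(key)]
    settings[key] = value.nil? || value.empty? ? default : value
  end
end

puts resolve_settings.inspect
```

Passing the environment in as an argument is a design choice for testability; the gem itself may read `ENV` differently.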
78
+
79
+ **Create a PDF from HTML**
80
+
81
+ Load palapala and create a PDF file from an HTML snippet:
121
82
 
122
83
  ```ruby
123
84
  require "palapala"
124
85
  Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
125
86
  ```
126
87
 
127
- Instantiate a new Palapala::Pdf object with your HTML content and generate the PDF binary data.
88
+ Instantiate a new Palapala::Pdf object with your HTML content and generate the PDF binary data:
128
89
 
129
90
  ```ruby
130
91
  require "palapala"
131
92
  binary_data = Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
132
93
  ```
133
94
 
134
- ## Paged CSS
95
+ ## Advanced Examples
135
96
 
136
- Paged CSS is a subset of CSS designed for styling printed documents. It extends standard CSS to handle pagination, page sizes, headers, footers, and other aspects of printed content. Paged CSS is commonly used in scenarios where web content needs to be converted to PDFs or other paginated formats.
97
+ - headers and footers
98
+ - paged CSS for paper sizes, paper margins, page breaks, etc.
99
+ - js based rendering
137
100
 
138
- ### Headers and Footers
101
+ ## Connecting to Chrome
139
102
 
140
- When using Chromium-based rendering engines, headers and footers are not controlled by the Paged CSS standard but are instead managed through specific settings in the rendering engine.
141
-
142
- With palapala PDF headers and footers are defined using `header_html` and `footer_html` options. These allow you to insert HTML content directly into the header or footer areas.
143
-
144
- ```ruby
145
- Palapala::Pdf.new(
146
- "<p>Hello world</>",
147
- header_html: '<div style="text-align: center;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
148
- footer_html: '<div style="text-align: center;">Generated with Palapala PDF</div>',
149
- margin: { top: "2cm", bottom: "2cm"}
150
- ).save("test.pdf")
151
- ```
103
+ Palapala PDF goes through the following steps, in order:
152
104
 
153
- ### Page size, orientation and margins
105
+ - check if a Chrome is running and exposing port 9222 (and if so, use it)
106
+ - if `Palapala.headless_chrome_path` is defined, launch Chrome as a child process using that path
107
+ - if **NPX** is available, install a **Chrome-Headless-Shell** variant locally and launch it as a child process. It will install the 'stable' version or the version identified by the `Palapala.chrome_headless_shell_version` setting (or from ENV `CHROME_HEADLESS_SHELL_VERSION`).
108
+ - as a last fallback it will guess a chrome path from the detected OS and try to launch a Chrome with that
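
The lookup order in the list above can be sketched as a small decision function. This is only an illustration under stated assumptions: `port_open?`, `executable_on_path?`, `chrome_strategy`, and the returned symbols are hypothetical names, not the gem's internals.

```ruby
require "socket"

# Hypothetical helper: true if something accepts TCP connections on host:port.
def port_open?(host, port, timeout: 0.5)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue StandardError
  false
end

# Hypothetical helper: true if an executable called `name` is found on the PATH.
def executable_on_path?(name, path: ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Pure decision function mirroring the four steps in the list, first match wins.
def chrome_strategy(port_open:, chrome_path:, npx_available:)
  return :use_running_chrome if port_open
  return :launch_configured_path if chrome_path && !chrome_path.empty?
  return :install_headless_shell_via_npx if npx_available

  :guess_path_from_os
end

strategy = chrome_strategy(
  port_open: port_open?("localhost", 9222),
  chrome_path: nil, # would come from the configuration
  npx_available: executable_on_path?("npx")
)
puts "Selected strategy: #{strategy}"
```

Keeping the decision separate from the probing makes the order easy to verify without a browser installed.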
154
109
 
155
- #### With CSS
110
+ In our experience, a Chrome-Headless-Shell version gives the best performance and resource usage.
156
111
 
157
- todo example
112
+ ### Installing Chrome / Headless Chrome manually
158
113
 
159
- #### As params
114
+ This is easiest using npx and the tooling provided by Puppeteer (it depends on node/npm, but it's worth it). The install command places the browser in a folder in the current working directory and prints the installation path when it finishes. We currently recommend the `chrome-headless-shell` variant, a minimal, headless build of the Chrome browser designed for environments where you need to run Chrome without a graphical user interface (GUI), such as server-side rendering, automated testing, or web scraping. It is headless by design, with reduced size and overhead, but still uses the same engine.
160
115
 
161
- todo example
116
+ ```sh
117
+ npx @puppeteer/browsers install chrome-headless-shell@stable
118
+ ```
162
119
 
163
- ## JS based rendering
120
+ It installs to a path like this `./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell`. As it's headless by design, it only needs one parameter:
164
121
 
165
- ```html
166
- <html>
167
- <script type="text/javascript">
168
- document.addEventListener("DOMContentLoaded", () => {
169
- document.body.innerHTML += "<p>Current time from JS: " + new Date().toLocaleString() + "</p>";
170
- });
171
- </script>
172
- <body><p>Default body text.</p></body>
173
- </html>
122
+ ```sh
123
+ ./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell --remote-debugging-port=9222
174
124
  ```
125
+ *Note: Chrome releases from 128.0.6613.85 (August 2024) onward seem to be seriously performance impacted for PDF generation. Chrome Headless Shell releases don't seem to suffer from this issue.*
175
126
 
176
127
  ## Raw parameters (Page.printToPDF)
177
128
 
@@ -193,15 +144,6 @@ Bug reports and pull requests are welcome on GitHub at https://github.com/palapa
193
144
  - [Eugen Neagoe](https://github.com/eneagoe) - Thank you for your valuable input, feedback and opinions.
194
145
  - [Radu Bogoevici](https://github.com/codenighter) - Thanks for test driving, and all help big and small.
195
146
 
196
- ## Sponsor This Project
197
-
198
- If you find this project useful and would like to support its development, consider sponsoring or buying a coffee to help keep it going:
199
-
200
- - **GitHub Sponsors:** [Sponsor on GitHub](https://github.com/sponsors/koenhandekyn)
201
- - **Buy Me a Coffee:** [Buy a Coffee](https://buymeacoffee.com/koenhandekyn)
202
-
203
- Your support is greatly appreciated and helps maintain the project!
204
-
205
147
  ## Findings
206
148
 
207
149
  - For Chrome, mode headless=new seems to be slower for pdf rendering cases.
@@ -209,24 +151,14 @@ Your support is greatly appreciated and helps maintain the project!
209
151
 
210
152
  ## Primitive benchmark
211
153
 
212
- On a macbook m3, the throughput for 'hello world' PDF generation can reach around 300 docs/second when allowing for some concurrency. As Chrome is actually also very efficient, it scales really well for complex documents also. If you run this in Rails, the concurrency is being taken care of either by the front end thread pool or by the workers and you shouldn't have to think about this. (Using an external Chrome)
154
+ On a MacBook M3, the throughput for 'hello world' PDF generation can reach around 500 to 800 docs/second when allowing for some concurrency (4 threads). Because Chrome is itself very efficient, it also scales well for complex documents. If you run this in Rails, concurrency is handled either by the front-end thread pool or by the workers, so you shouldn't have to think about it. (Measured using an external Chrome.)
213
155
 
214
156
  Note: it renders `"Hello #{i}, world #{j}! #{Time.now}."` where i is the thread and j is the iteration counter within the thread and persists it to an SSD (which is very fast these days).
215
157
 
216
- ### benchmarking 20 docs: 1x20, 2x10, 4x5
217
-
218
158
  ```sh
219
- c:1, n:20 : Throughput = 159.41 docs/sec, Total time = 0.1255 seconds
220
- c:2, n:10 : Throughput = 124.91 docs/sec, Total time = 0.1601 seconds
221
- c:4, n:5 : Throughput = 196.40 docs/sec, Total time = 0.1018 seconds
222
- ```
223
-
224
- ### benchmarking 320 docs: 1x320, 4x80, 8x40
225
-
226
- ```sh
227
- c:1, n:320 : Throughput = 184.99 docs/sec, Total time = 1.7299 seconds
228
- c:4, n:80 : Throughput = 302.50 docs/sec, Total time = 1.0578 seconds
229
- c:8, n:40 : Throughput = 254.29 docs/sec, Total time = 1.2584 seconds
159
+ c:1, n:10 : Throughput = 16.76 docs/sec, Total time = 0.5968 seconds
160
+ c:2, n:10 : Throughput = 170.41 docs/sec, Total time = 0.1174 seconds
161
+ c:4, n:80 : Throughput = 579.03 docs/sec, Total time = 0.5526 seconds
230
162
  ```
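
For readers who want to reproduce numbers of this shape, here is a minimal sketch of a throughput harness. It is not the project's benchmark script: `measure_throughput` is a hypothetical helper, and the block only builds the string mentioned in the note above instead of rendering a PDF.

```ruby
# Runs `concurrency` threads, each calling the given block `iterations` times,
# and returns the total count, elapsed seconds, and documents per second.
def measure_throughput(concurrency:, iterations:)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = Array.new(concurrency) do |i|
    Thread.new { iterations.times { |j| yield(i, j) } }
  end
  threads.each(&:join)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  total = concurrency * iterations
  { total: total, seconds: elapsed, docs_per_sec: total / elapsed }
end

# Stand-in workload: replace the block body with real PDF generation to benchmark it.
result = measure_throughput(concurrency: 4, iterations: 80) do |i, j|
  "Hello #{i}, world #{j}! #{Time.now}."
end

puts format("c:%d, n:%d : Throughput = %.2f docs/sec, Total time = %.4f seconds",
            4, 80, result[:docs_per_sec], result[:seconds])
```

A monotonic clock is used so that wall-clock adjustments do not distort the measurement.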
231
163
 
232
164
  This is about a factor of 100 faster than what you typically get with Grover, and still 10x faster than many alternatives. It is fast enough that, for many use cases, you can run it straight from e.g. your Ruby on Rails web worker in the controller on a single machine and still scale to lots of users.
@@ -253,25 +185,22 @@ In this example, `pdf_data` is the binary data of the PDF file. The `filename` o
253
185
 
254
186
  ## Docker
255
187
 
256
- In docker as root you must pass the no-sandbox browser option:
188
+ TODO
257
189
 
258
- ```ruby
259
- Palapala.setup do |config|
260
- config.opts = { 'no-sandbox': nil }
261
- end
262
- ```
263
- It has also been reported that the Chrome process repeatedly crashes when running inside a Docker container on an M1 Mac. Chrome should work as expected when deployed to a Docker container on a non-M1 Mac.
190
+ *It has also been reported that the Chrome process repeatedly crashes when running inside a Docker container on an M1 Mac. Chrome should work as expected when deployed to a Docker container on a non-M1 Mac.*
264
191
 
265
192
  ## Thread-safety
266
193
 
267
- Behind the scenes, a websocket is openend and stored on Thread.current for subsequent requests. Hence, the code is
268
- thread safe in the sense that every web socket get's a new tab in the underlying chromium and get an isolated context.
269
-
270
194
  For performance reasons, the code uses a low-level websocket connection that does all its work on the current thread
271
195
  so we can avoid synchronisation penalties.
272
196
 
197
+ Behind the scenes, a websocket is opened and stored on Thread.current for subsequent requests. Hence, the code is
198
+ thread safe in the sense that every web socket gets a new tab in the underlying Chromium and gets an isolated context.
199
+
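
The per-thread storage described in this section can be illustrated with a small sketch. `FakeConnection` and `current_connection` are hypothetical stand-ins; the real gem opens a web socket to Chrome, which is not reproduced here.

```ruby
# Stand-in for a web socket connection to Chrome.
class FakeConnection
  attr_reader :owner

  def initialize
    @owner = Thread.current
  end
end

# Lazily creates one connection per thread and reuses it on later calls.
def current_connection
  Thread.current[:fake_chrome_connection] ||= FakeConnection.new
end

# Each thread asks twice; both calls should return that thread's own connection.
results = Array.new(3) do
  Thread.new { [current_connection, current_connection] }
end.map(&:value)

results.each_with_index do |(first, second), index|
  puts "thread #{index}: reused=#{first.equal?(second)}"
end
```

Because no connection is shared between threads, no locking is needed around its use, which matches the performance argument made above.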
273
200
  ## Heroku
274
201
 
202
+ TODO
203
+
275
204
  possible buildpacks
276
205
 
277
206
  https://github.com/heroku/heroku-buildpack-chrome-for-testing
@@ -0,0 +1,16 @@
1
+ ### Installing Node (npx)
2
+
3
+ Using Brew
4
+
5
+ ```sh
6
+ brew install node
7
+ ```
8
+
9
+ Using NVM (Node Version Manager)
10
+
11
+ ```sh
12
+ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.3/install.sh | bash
13
+ source ~/.nvm/nvm.sh
14
+ nvm --version
15
+ nvm install node
16
+ ```
data/doc/paged_css.md ADDED
@@ -0,0 +1,167 @@
1
+ ## Paged CSS
2
+
3
+ Paged CSS is the part of CSS designed for styling printed documents. It extends standard CSS to handle pagination, page sizes, headers, footers, and other aspects of printed content. Paged CSS is commonly used in scenarios where web content needs to be converted to PDFs or other paginated formats.
4
+
5
+ Setting page size
6
+
7
+ ```css
8
+ @page {
9
+ /* set a standard page size */
10
+ size: A4 landscape;
11
+ /* Custom */
12
+ size: 8.5in 11in; /* Width x Height */
13
+ }
14
+ ```
15
+
16
+ Setting page margins
17
+
18
+ ```css
19
+ @page {
20
+ margin: 1in; /* 1 inch on all sides */
21
+ margin: 1in 0.5in 1in 0.5in; /* Top, Right, Bottom, Left */
22
+ }
23
+ ```
24
+
25
+ Forcing a Page Break before or after an Element
26
+
27
+ ```css
28
+ /* This ensures that every `h1` starts on a new page. */
29
+ h1 {
30
+ page-break-before: always;
31
+ }
32
+ /* This ensures that every `p` element ends with a page break, starting the next content on a new page. */
33
+ p {
34
+ page-break-after: always;
35
+ }
36
+ /* This prevents a table from being split across two pages. */
37
+ table {
38
+ page-break-inside: avoid;
39
+ }
40
+ ```
41
+
42
+ ### Headers and Footers
43
+
44
+ When using Chromium-based rendering engines, headers and footers are not controlled by the Paged CSS standard but are instead managed through specific settings in the rendering engine.
45
+
46
+ With palapala PDF headers and footers are defined using `header_template` and `footer_template` options. These allow you to insert HTML content directly into the header or footer areas.
47
+
48
+ It is critical to specify a font size, because by default Chrome uses a very small font.
49
+
50
+ ```ruby
51
+ Palapala::Pdf.new(
52
+ "<p>Hello world</p>",
53
+ header_template: '<div style="text-align: center; font-size: 12pt;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
54
+ footer_template: '<div style="text-align: center; font-size: 12pt;">Generated with Palapala PDF</div>',
55
+ ).save("test.pdf")
56
+ ```
57
+
58
+ ### Examples
59
+
60
+ #### Headers and Footers
61
+
62
+ TODO explain about headers and footers, font sizes, styles being independent, and how to insert current page, total pages, etc.
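
Until this section is written, the following sketch shows one way to build header and footer templates as plain strings. The helper names `placeholder` and `print_template` are hypothetical; the placeholder class names (`date`, `title`, `url`, `pageNumber`, `totalPages`) are the ones Chrome's print template fills in, and the explicit font size follows the advice given earlier in this document.

```ruby
# Class names that Chrome's header/footer template replaces with values.
CHROME_PLACEHOLDERS = %w[date title url pageNumber totalPages].freeze

# Returns an empty span that Chrome fills in at print time.
def placeholder(name)
  unless CHROME_PLACEHOLDERS.include?(name.to_s)
    raise ArgumentError, "unknown placeholder: #{name}"
  end

  %(<span class="#{name}"></span>)
end

# Wraps the content in a div with an explicit font size, since the
# default font size in headers and footers is very small.
def print_template(inner_html, font_size: "12pt")
  %(<div style="text-align: center; font-size: #{font_size}; width: 100%;">#{inner_html}</div>)
end

header_template = print_template("Page #{placeholder(:pageNumber)} of #{placeholder(:totalPages)}")
footer_template = print_template("#{placeholder(:title)} #{placeholder(:date)}", font_size: "10pt")

puts header_template
puts footer_template
```

The resulting strings can then be passed as the `header_template` and `footer_template` options shown in the example above.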
63
+
64
+ #### Page sizes and margins
65
+
66
+ Paged CSS, also known as @page CSS, is used to control the layout and appearance of printed documents. It allows you to define page-specific styles, such as sizes and margins, which are crucial for generating well-formatted PDFs.
67
+
68
+ You can specify the size of the page using predefined sizes or custom dimensions. Common predefined sizes include A4, A3, letter, etc. Margins can be set for the top, right, bottom, and left sides of the page. You can specify all four margins at once or individually. You can also define named pages for different sections of your document.
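
As a small illustration of the size and margin settings described above, the sketch below generates an `@page` rule and prepends it to an HTML snippet. `page_rule` and `with_page_rule` are hypothetical helpers, not part of the gem.

```ruby
# Builds an @page rule from a size (predefined or custom) and a margin shorthand.
def page_rule(size: "A4", margin: "1in")
  "@page { size: #{size}; margin: #{margin}; }"
end

# Prepends a <style> block with the @page rule to the given HTML.
def with_page_rule(body_html, **options)
  "<style>#{page_rule(**options)}</style>#{body_html}"
end

html = with_page_rule("<p>Hello world</p>", size: "A4 landscape", margin: "1in 0.5in")
puts html
```

The resulting string is ordinary HTML and could be passed to `Palapala::Pdf.new` like any other document.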
69
+
70
+ ##### Example: Different First Page
71
+
72
+ TODO Validate
73
+
74
+ ```css
75
+ @page first {
76
+ size: A4;
77
+ margin: 2in; /* Larger margin for the first page */
78
+ }
79
+
80
+ @page {
81
+ size: A4;
82
+ margin: 1in;
83
+ }
84
+
85
+ body {
86
+ counter-reset: page;
87
+ }
88
+
89
+ body:first {
90
+ page: first;
91
+ }
92
+ ```
93
+
94
+ #### Page breaks
95
+
96
+ Paged CSS allows you to control how content is divided across pages when printing or generating PDFs. Page breaks are an essential part of this, as they determine where a new page starts. You can control page breaks using the `page-break-before`, `page-break-after`, and `page-break-inside` properties.
97
+
98
+ ##### Page Break Properties
99
+
100
+ 1. **`page-break-before`**: Forces a page break before the element.
101
+ 2. **`page-break-after`**: Forces a page break after the element.
102
+ 3. **`page-break-inside`**: Prevents or allows a page break inside the element.
103
+
104
+ ##### Values
105
+
106
+ - `auto`: Default. Neither forces nor prevents a page break.
107
+ - `always`: Always forces a page break.
108
+ - `avoid`: Avoids a page break inside the element.
109
+ - `left`: Forces a page break so that the next page is a left page.
110
+ - `right`: Forces a page break so that the next page is a right page.
111
+
112
+ ##### Examples
113
+
114
+ ```css
115
+ /* This ensures that every `h1` starts on a new page. */
116
+ h1 {
117
+ page-break-before: always;
118
+ }
119
+ /* This ensures that every `p` element ends with a page break, starting the next content on a new page. */
120
+ p {
121
+ page-break-after: always;
122
+ }
123
+ /* This prevents a table from being split across two pages. */
124
+ table {
125
+ page-break-inside: avoid;
126
+ }
127
+ ```
128
+
129
+ ##### Practical Use Cases
130
+
131
+ - **Chapter Titles**: Use `page-break-before: always;` for chapter titles to ensure each chapter starts on a new page.
132
+ - **Sections**: Use `page-break-after: always;` for sections that should end with a page break.
133
+ - **Tables and Figures**: Use `page-break-inside: avoid;` to keep tables and figures from being split across pages.
134
+
135
+ #### Tables across Pages
136
+
137
+ TODO explain `display` property with the values `table-header-group` and `table-footer-group`
138
+
139
+ ##### Example
140
+
141
+ ```html
142
+ <table>
143
+ <thead>
144
+ <tr>
145
+ <th>Header 1</th>
146
+ <th>Header 2</th>
147
+ </tr>
148
+ </thead>
149
+ <tbody>
150
+ <tr>
151
+ <td>Data 1</td>
152
+ <td>Data 2</td>
153
+ </tr>
154
+ <!-- More rows -->
155
+ </tbody>
156
+ <tfoot>
157
+ <tr>
158
+ <td>Footer 1</td>
159
+ <td>Footer 2</td>
160
+ </tr>
161
+ </tfoot>
162
+ </table>
163
+ ```
164
+
165
+ In this example:
166
+ - The `<thead>` section will be repeated at the top of each page.
167
+ - The `<tfoot>` section will be repeated at the bottom of each page.
data/examples/all.rb ADDED
@@ -0,0 +1,9 @@
1
+ $LOAD_PATH.unshift File.expand_path('../lib', __dir__)
2
+ require 'palapala'
3
+
4
+ $debug = ARGV[0] == 'debug'
5
+ Palapala.debug = $debug
6
+
7
+ require_relative "headers_and_footers"
8
+ require_relative "paged_css"
9
+ require_relative "js_based_rendering"
@@ -0,0 +1,169 @@
1
+ <!--
2
+ OPTIONS AS PASSED IN THE C++ code
3
+ =================================
4
+ options.Set(kSettingHeaderFooterDate,
5
+ base::Time::Now().InMillisecondsFSinceUnixEpoch());
6
+ options.Set("width", static_cast<double>(page_size.width()));
7
+ options.Set("height", static_cast<double>(page_size.height()));
8
+ options.Set("topMargin", page_layout.margin_top);
9
+ options.Set("bottomMargin", page_layout.margin_bottom);
10
+ options.Set("leftMargin", page_layout.margin_left);
11
+ options.Set("rightMargin", page_layout.margin_right);
12
+ // `page_index` is 0-based, so 1 is added to get the page number.
13
+ options.Set("pageNumber", base::checked_cast<int>(page_index + 1));
14
+ options.Set("totalPages", base::checked_cast<int>(total_pages));
15
+ options.Set("url", params.url);
16
+ std::u16string title = source_frame.GetDocument().Title().Utf16();
17
+ options.Set("title", title.empty() ? params.title : title);
18
+ options.Set("headerTemplate", params.header_template);
19
+ options.Set("footerTemplate", params.footer_template);
20
+ options.Set("isRtl", base::i18n::IsRTL());
21
+ -->
22
+
23
+ <!doctype html>
24
+ <html>
25
+
26
+ <head>
27
+ <link rel="stylesheet" href="chrome://resources/css/text_defaults.css">
28
+ <style>
29
+ body {
30
+ display: flex;
31
+ flex-direction: column;
32
+ margin: 0;
33
+ }
34
+
35
+ #header,
36
+ #footer {
37
+ display: flex;
38
+ flex: none;
39
+ }
40
+
41
+ #header {
42
+ align-items: flex-start;
43
+ padding-top: 15pt;
44
+ }
45
+
46
+ #footer {
47
+ align-items: flex-end;
48
+ padding-bottom: 15pt;
49
+ }
50
+
51
+ #content {
52
+ flex: auto;
53
+ }
54
+
55
+ .left {
56
+ flex: none;
57
+ padding-left: 24pt;
58
+ /* csschecker-disable-line left-right */
59
+ padding-right: 6pt;
60
+ /* csschecker-disable-line left-right */
61
+ }
62
+
63
+ .center {
64
+ flex: auto;
65
+ padding-left: 24pt;
66
+ /* csschecker-disable-line left-right */
67
+ padding-right: 24pt;
68
+ /* csschecker-disable-line left-right */
69
+ text-align: center;
70
+ }
71
+
72
+ .right {
73
+ flex: none;
74
+ /* historically does not account for RTL */
75
+ padding-left: 6pt;
76
+ /* csschecker-disable-line left-right */
77
+ padding-right: 24pt;
78
+ /* csschecker-disable-line left-right */
79
+ }
80
+
81
+ .grow {
82
+ flex: auto;
83
+ }
84
+
85
+ .text {
86
+ font-size: 8pt;
87
+ overflow: hidden;
88
+ text-overflow: ellipsis;
89
+ white-space: nowrap;
90
+ }
91
+ </style>
92
+ <script>
93
+
94
+ function getComputedStyleAsFloat(style, value) {
95
+ return parseFloat(style.getPropertyValue(value).slice(0, -2));
96
+ }
97
+
98
+ function elementIntersects(element, topPos, bottomPos, leftPos, rightPos) {
99
+ const rect = element.getBoundingClientRect();
100
+ const style = window.getComputedStyle(element);
101
+
102
+ // Only consider the size of |element|, so remove the padding from |rect|.
103
+ // The padding is used for positioning.
104
+ rect.top += getComputedStyleAsFloat(style, 'padding-top');
105
+ rect.bottom -= getComputedStyleAsFloat(style, 'padding-bottom');
106
+ rect.left += getComputedStyleAsFloat(style, 'padding-left');
107
+ rect.right -= getComputedStyleAsFloat(style, 'padding-right');
108
+ return leftPos < rect.right && rightPos > rect.left && topPos < rect.bottom &&
109
+ bottomPos > rect.top;
110
+ }
111
+
112
+ function setupHeaderFooterTemplate(options) {
113
+ const body = document.querySelector('body');
114
+ const header = document.querySelector('#header');
115
+ const footer = document.querySelector('#footer');
116
+
117
+ body.style.width = `${options.width}px`;
118
+ body.style.height = `${options.height}px`;
119
+ header.style.height = `${options.topMargin}px`;
120
+ footer.style.height = `${options.bottomMargin}px`;
121
+
122
+ const topMargin = options.topMargin;
123
+ const bottomMargin = options.height - options.bottomMargin;
124
+ const leftMargin = options.leftMargin;
125
+ const rightMargin = options.width - options.rightMargin;
126
+
127
+ header.innerHTML = options['headerTemplate'] || `
128
+ <div class='date text left'></div>
129
+ <div class='title text center'></div>`;
130
+ footer.innerHTML = options['footerTemplate'] || `
131
+ <div class='url text left grow'></div>
132
+ <div class='text right'>
133
+ <span class='pageNumber'></span>/<span class='totalPages'></span>
134
+ </div>`;
135
+
136
+ const date = new Date(options.date);
137
+ const formatter =
138
+ new Intl.DateTimeFormat(
139
+ navigator.languages[0].split('@')[0],
140
+ { dateStyle: 'short', timeStyle: 'short' });
141
+ options.date = formatter.format(date);
142
+ for (const cssClass of ['date', 'title', 'url', 'pageNumber', 'totalPages']) {
143
+ for (const element of document.querySelectorAll(`.${cssClass}`)) {
144
+ element.textContent = options[cssClass];
145
+ }
146
+ }
147
+ for (const element of document.querySelectorAll(`.text`)) {
148
+ if (options.isRtl &&
149
+ !element.classList.contains('url') &&
150
+ !element.classList.contains('title')) {
151
+ element.dir = 'rtl';
152
+ }
153
+ if (elementIntersects(element, topMargin, bottomMargin, leftMargin,
154
+ rightMargin)) {
155
+ element.style.visibility = 'hidden';
156
+ }
157
+ }
158
+ }
159
+
160
+ </script>
161
+ </head>
162
+
163
+ <body>
164
+ <div id="header"></div>
165
+ <div id="content"></div>
166
+ <div id="footer"></div>
167
+ </body>
168
+
169
+ </html>
Binary file
@@ -3,42 +3,18 @@
  $LOAD_PATH.unshift File.expand_path('../lib', __dir__)
  require 'palapala'
 
- HEADER_HTML = <<~HTML
- <style type="text/css">
- .header {
- -webkit-print-color-adjust: exact;
- border-bottom: 1px solid lightgray;
- color: black;
- font-family: Arial, Helvetica, sans-serif;
- font-size: 12pt;
- margin: 0 auto;
- padding: 5px;
- text-align: center;
- vertical-align: middle;
- width: 100%;
- border: 1px solid black;
- }
- </style>
- <div class="header" style="text-align: center">
- Page <span class="pageNumber"></span> of <span class="totalPages"></span>
- </div>
- HTML
+ header_template =
+ '<div style="text-align: center; font-size: 12pt; padding: 1rem; width: 100%;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>'
 
- Palapala.setup do |config|
- # config.debug = true
- # config.headless_chrome_url = 'http://localhost:9222' # run against a remote Chrome instance
- # config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
- end
+ footer_template =
+ '<div style="text-align: center; font-size: 12pt; padding: 1rem; width: 100%;">Generated with Palapala PDF</div>'
 
- result = Palapala::Pdf.new(
- # "<style>@page { size: A4 landscape; }</style><p>Hello world #{Time.now}</>",
+ Palapala::Pdf.new(
  "<h1>Title</h1><p>Hello world #{Time.now}</>",
- header_template: HEADER_HTML,
- footer_template: '<div style="text-align: center; font-size: 12pt; width: 100%;">Generated with Palapala PDF</div>',
- scale: 0.75,
- prefer_css_page_size: false,
+ header_template:,
+ footer_template:,
  margin_top: 3,
- margin_bottom: 2).save('tmp/headers_and_footers.pdf')
+ margin_bottom: 3).save('headers_and_footers.pdf')
 
- puts result
- `open tmp/headers_and_footers.pdf`
+ puts "Generated headers_and_footers.pdf"
+ # `open headers_and_footers.pdf`
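The `header_template:,` and `footer_template:,` arguments in the new example use Ruby's shorthand keyword syntax (Ruby 3.1 and later), where an omitted value is read from a local variable of the same name. A minimal sketch of that behaviour, using a hypothetical `render` method as a stand-in for `Palapala::Pdf.new` (it is not part of the gem):

```ruby
# Hypothetical stand-in for Palapala::Pdf.new; it only echoes its keyword arguments.
def render(html, header_template: nil, footer_template: nil)
  { html: html, header_template: header_template, footer_template: footer_template }
end

header_template = "<div>Header</div>"
footer_template = "<div>Footer</div>"

# `header_template:` with no value is shorthand for `header_template: header_template`.
shorthand = render("<p>Hello</p>", header_template:, footer_template:)
explicit  = render("<p>Hello</p>", header_template: header_template,
                                   footer_template: footer_template)

puts shorthand == explicit
```

On Ruby versions older than 3.1 the shorthand form is a syntax error, so the explicit form is the portable one.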
Binary file
@@ -14,10 +14,8 @@ DOCUMENT = <<~HTML
  </html>
  HTML
 
- Palapala.setup do |config|
- # config.debug = true
- # config.defaults = { header_template: '<div></div>', footer_template: '<div></div>' }
- end
+ Palapala::Pdf.new(DOCUMENT).save('js_based_rendering.pdf')
 
- Palapala::Pdf.new(DOCUMENT).save('tmp/js_based_rendering.pdf')
- `open tmp/js_based_rendering.pdf`
+ puts "Generated js_based_rendering.pdf"
+
+ # `open tmp/js_based_rendering.pdf`
Binary file
@@ -0,0 +1,186 @@
+ # frozen_string_literal: true
+
+ $LOAD_PATH.unshift File.expand_path("../lib", __dir__)
+ require "palapala"
+
+ long_text = (1..30).map { "Demonstrate a paragraph that is not split across pages." }.join(" ")
+
+ def table(rows)
+ <<~HTML
+ <table>
+ <thead>
+ <tr>
+ <th>Header 1</th>
+ <th>Header 2</th>
+ </tr>
+ </thead>
+ <tbody>
+ #{ (1..rows).map { |i| "<tr><td>Row #{i}, Cell 1</td><td>Row #{i}, Cell 2</td></tr>" }.join }
+ </tbody>
+ <tfoot>
+ <tr>
+ <td>Footer 1</td>
+ <td>Footer 2</td>
+ </tr>
+ </tfoot>
+ </table>
+ HTML
+ end
+
+ big_table = table(35)
+ small_table = table(5)
+
+ document = <<~HTML
+ <html>
+ <style>
+ @page {
+ size: A4;
+ margin: 2cm;
+ margin-top: 3cm;
+ margin-bottom: 3cm;
+ }
+ body, html {
+ margin: 0;
+ padding: 0;
+ font-family: Arial, sans-serif;
+ }
+ h1 {
+ page-break-before: always;
+ border-bottom: 1px solid black;
+ }
+ h2 {
+ /* keep with next */
+ page-break-after: avoid;
+ }
+ @page:first {
+ size: A4 landscape;
+ margin: 0; /* no margin for the first page */
+ padding: 0;
+ }
+ div.titlepage {
+ background-color: black;
+ color: white;
+ font-size: 72pt;
+ text-align: center;
+ display: flex;
+ justify-content: center;
+ align-items: center;
+ height: 100%;
+ width: 100vw;
+ }
+ table {
+ font-size: 10pt;
+ width: 100%;
+ border-collapse: collapse;
+ td, th {
+ border: 1px solid black;
+ padding: 0.5rem;
+ }
+ & thead, & tfoot {
+ tr {
+ background-color: lightgray;
+ & th, & td {
+ padding-top: 0.5rem;
+ padding-bottom: 0.5rem;
+ }
+ }
+ }
+ }
+ /* Initialize counters */
+ body {
+ counter-reset: h1Counter h2Counter;
+ }
+ /* Numbering for H1 elements */
+ h1 {
+ counter-increment: h1Counter;
+ counter-reset: h2Counter; /* Reset h2 counter when a new h1 appears */
+ }
+ h1::before {
+ content: counter(h1Counter) ". ";
+ /* font-weight: bold; */
+ }
+ /* Numbering for H2 elements */
+ h2 {
+ counter-increment: h2Counter;
+ }
+ h2::before {
+ content: counter(h1Counter) "." counter(h2Counter) " ";
+ /* font-weight: bold; */
+ }
+ /* named pages */
+ @page addendum {
+ size: A5;
+ margin: 1cm;
+ margin-top: 3cm;
+ }
+ .addendum {
+ page: addendum;
+ counter-reset: h1Counter h2Counter;
+ }
+ </style>
+ <body>
+ <div class="titlepage">
+ <c-title>Title Page</c-title>
+ </div>
+ <h1>New Section</h1>
+ <h2>Subsection tables</h2>
+ <p>This demonstrates a table with a header and footer that spans multiple pages.</p>
+ #{big_table}
+ <h2>Subsection page break inside</h2>
+ <p style="page-break-inside: avoid; text-align: justify">
+ #{long_text}
+ </p>
+ <p>Note that the section title has moved to the second page because the paragraph above was moved to the second page.</p>
+ <h1>New Section</h1>
+ <p>Page 3 content</p>
+ <p>A small table</p>
+ #{small_table}
+ <h2>Subsection</h2>
+ <p>Some content</p>
+ <h2>Subsection</h2>
+ <p>Some content</p>
+ <div class="addendum">
+ This is an addendum and the page size is A5.
+ Headers are starting again from 1.
+ <h1>Some addendum header</h1>
+ <h2>Subsection</h2>
+ <h2>Subsection</h2>
+ <h1>Some addendum header</h1>
+ </div>
+ </body>
+ </html>
+ HTML
+
+ def debug(color: "red")
+ <<~HTML
+ <style>
+ /* this is a class chrome assigns to the header, footer and content in the main template */
+ #header, #content, #footer {
+ border: 1px dotted #{color}; /* uncomment to see the areas */
+ }
+ </style>
+ HTML
+ end
+
+ def header_footer_template(debug_color: nil)
+ <<~HTML
+ #{ debug(color: debug_color) if debug_color }
+ <div style="text-align: center; font-size: 12pt; padding: 1rem; width: 100%;">#{yield}</div>
+ HTML
+ end
+
+ footer_template = header_footer_template do
+ "Page <span class='pageNumber'></span> of <span class='totalPages'></span>"
+ end
+
+ header_template = header_footer_template do
+ "Generated with Palapala PDF"
+ end
+
+ Palapala::Pdf.new(document,
+ header_template:,
+ footer_template:).save("paged_css.pdf")
+
+ puts "Generated paged_css.pdf"
+
+ # `open paged_css.pdf`
@@ -5,14 +5,10 @@ $LOAD_PATH.unshift File.expand_path('../lib', __dir__)
  require 'benchmark'
  require 'palapala'
 
- debug = ARGV[0] == 'debug'
+ $debug = ARGV[0] == 'debug'
+ $save = ARGV[0] == 'save'
 
- Palapala.setup do |config|
- # config.headless_chrome_url = 'http://localhost:9222'
- config.debug = debug
- config.defaults.merge! scale: 0.75, format: :A4
- config.chrome_headless_shell_version = 'canary'
- end
+ Palapala.debug = $debug
 
  # @param concurrency Number of concurrent threads
  # @param iterations Number of iterations per thread
@@ -22,7 +18,8 @@ def benchmark(concurrency, iterations)
  Thread.new do
  iterations.times do |j|
  doc = "Hello #{i}, world #{j}! #{Time.now}."
- Palapala::Pdf.new(doc).save("tmp/benchmark_#{i}_#{j}.pdf")
+ pdf = Palapala::Pdf.new(doc)
+ $save ? pdf.save("tmp/benchmark_#{i}_#{j}.pdf") : pdf.binary_data
  end
  end
  end
@@ -32,18 +29,9 @@ def benchmark(concurrency, iterations)
  time
  end
 
- puts 'warmup'
+ puts "Warmup..."
+ benchmark(1, 5)
+ puts "Starting benchmark..."
  benchmark(1, 10)
-
- # benchmark(1, 20)
- benchmark(2, 10)
- # benchmark(4, 5)
- # benchmark(5, 4)
- # benchmark(20, 1)
-
- # benchmark(1, 320)
- # benchmark(2, 320 / 2)
+ benchmark(2, 20 / 2)
  benchmark(4, 320 / 4)
- # benchmark(8, 320 / 8)
- # benchmark(20, 2)
- # benchmark(40, 1)
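The benchmark hunks above show only fragments of the `benchmark` helper (worker threads, an inner `iterations.times` loop, and a returned `time`). The following is a rough sketch of that shape, not the gem's actual script: `run_benchmark` is a hypothetical name, and the PDF call is replaced by a caller-supplied block so the sketch runs without Chrome.

```ruby
require "benchmark"

# Hypothetical, simplified variant of the benchmark helper: spawn `concurrency`
# threads, let each call the block `iterations` times, and return the
# wall-clock time in seconds.
def run_benchmark(concurrency, iterations, &work)
  Benchmark.realtime do
    threads = (1..concurrency).map do |i|
      Thread.new do
        iterations.times { |j| work.call(i, j) }
      end
    end
    threads.each(&:join) # wait for every worker before the timer stops
  end
end

calls = Queue.new # thread-safe, so workers can record their calls
elapsed = run_benchmark(2, 3) { |i, j| calls << [i, j] }
puts format("%d calls in %.4fs", calls.size, elapsed)
```

In the real script the block body would be the `Palapala::Pdf.new(doc)` call followed by either `save` or `binary_data`, as in the diff.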
data/lib/palapala/pdf.rb CHANGED
@@ -56,7 +56,7 @@ module Palapala
  @opts[:preferCSSPageSize] = prefer_css_page_size || Palapala.defaults[:prefer_css_page_size]
  @opts[:printBackground] = print_background || Palapala.defaults[:print_background]
  @opts[:scale] = scale || Palapala.defaults[:scale]
- @opts[:displayHeaderFooter] = true
+ @opts[:displayHeaderFooter] = (@opts[:headerTemplate] || @opts[:footerTemplate]) ? true : false
  @opts[:encoding] = :binary
  @opts.compact!
  end
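The changed line turns `displayHeaderFooter` from a constant `true` into a value derived from the templates. A minimal sketch of that rule in isolation, assuming the options live in a plain Hash (`display_header_footer?` is a hypothetical helper, not part of the gem):

```ruby
# Hypothetical helper mirroring the changed line: header/footer rendering is
# enabled only when at least one template is present.
def display_header_footer?(opts)
  # `!!` collapses a truthy value (a template string) to true and nil to false.
  !!(opts[:headerTemplate] || opts[:footerTemplate])
end

puts display_header_footer?({})
puts display_header_footer?({ footerTemplate: "<div>Footer</div>" })
puts display_header_footer?({ headerTemplate: "" })
```

Note that an empty string is truthy in Ruby, so an empty template still enables the header and footer under this rule.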
@@ -44,6 +44,9 @@ module Palapala
  def on_message(e)
  puts "Received: #{e.data[0..64]}" if Palapala.debug
  @response = JSON.parse(e.data) # Parse the JSON response
+ if @response["error"] # Raise an error if the response contains an error
+ raise "#{@response["error"]["message"]}: #{@response["error"]["data"]} (#{@response["error"]["code"]})"
+ end
  end
 
  # Update the current ID to the next ID (increment by 1)
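The added lines in `on_message` raise when the parsed response carries an `error` object. A standalone sketch of the same check, assuming a hypothetical `parse_response` helper and a made-up error payload (the field names `message`, `data` and `code` are the ones read in the diff; the values are illustrative only):

```ruby
require "json"

# Hypothetical helper, not the gem's API: parse a raw JSON message and raise
# if it contains an "error" object, using the same message format as the diff.
def parse_response(raw)
  response = JSON.parse(raw)
  if (error = response["error"])
    raise "#{error["message"]}: #{error["data"]} (#{error["code"]})"
  end
  response
end

ok = parse_response('{"id":1,"result":{}}')

# Illustrative payload, not captured from a real Chrome session.
failing = '{"id":2,"error":{"code":-32602,"message":"Invalid parameters","data":"scale: out of range"}}'
begin
  parse_response(failing)
rescue RuntimeError => e
  message = e.message
end
puts message
```

A bare `raise` with a string produces a `RuntimeError`, so callers that want to distinguish protocol errors from other failures would need a dedicated exception class.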
@@ -1,3 +1,3 @@
  module Palapala
- VERSION = "0.1.10"
+ VERSION = "0.1.11"
  end
data/lib/palapala.rb CHANGED
@@ -27,12 +27,8 @@ module Palapala
  attr_accessor :chrome_headless_shell_version
  end
  self.debug = false
- self.defaults = {
- header_template: "<div></div>",
- footer_template: "<div></div>"
- # footer_template: '<div style="text-align: center; font-size: 12pt; width: 100%;">Generated with Palapala PDF</div>'
- }
- self.headless_chrome_path = nil
+ self.defaults = { print_background: true, prefer_css_page_size: true, margin_left: 0, margin_right: 0, margin_top: 0, margin_bottom: 0 }
+ self.headless_chrome_path = ENV.fetch("HEADLESS_CHROME_PATH", nil)
  self.headless_chrome_url = ENV.fetch("HEADLESS_CHROME_URL", "http://localhost:9222")
  self.chrome_headless_shell_version = ENV.fetch("CHROME_HEADLESS_SHELL_VERSION", "stable")
  end
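With this change all three Chrome settings are read from environment variables with a fallback. A small sketch of how those `fetch` calls resolve, assuming a hypothetical `chrome_settings` helper that accepts any Hash-like object in place of `ENV` so the behaviour can be shown without touching the real environment:

```ruby
# Hypothetical helper: resolves the three settings the same way the diff does,
# but from an injectable Hash-like `env` (defaults to the real ENV).
def chrome_settings(env = ENV)
  {
    path: env.fetch("HEADLESS_CHROME_PATH", nil),
    url: env.fetch("HEADLESS_CHROME_URL", "http://localhost:9222"),
    shell_version: env.fetch("CHROME_HEADLESS_SHELL_VERSION", "stable")
  }
end

defaults   = chrome_settings({})
overridden = chrome_settings({ "HEADLESS_CHROME_PATH" => "/usr/bin/chromium" })

puts defaults.inspect
puts overridden.inspect
```

The path `/usr/bin/chromium` is only an example value; with nothing set, the path stays `nil` and the URL and shell version fall back to the defaults shown in the diff.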
data/paged_css.pdf ADDED
Binary file
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: palapala_pdf
  version: !ruby/object:Gem::Version
- version: 0.1.10
+ version: 0.1.11
  platform: ruby
  authors:
  - Koen Handekyn
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2024-08-29 00:00:00.000000000 Z
+ date: 2024-08-30 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: base64
@@ -56,8 +56,16 @@ files:
  - assets/images/logo-variant2.webp
  - assets/images/logo.webp
  - bin/chrome-headless-server
+ - doc/installing_node.md
+ - doc/paged_css.md
+ - examples/all.rb
+ - examples/chrome_base_header_footer_template.html
+ - examples/headers_and_footers.pdf
  - examples/headers_and_footers.rb
+ - examples/js_based_rendering.pdf
  - examples/js_based_rendering.rb
+ - examples/paged_css.pdf
+ - examples/paged_css.rb
  - examples/performance_benchmark.rb
  - lib/palapala.rb
  - lib/palapala/chrome_process.rb
@@ -66,6 +74,7 @@ files:
  - lib/palapala/version.rb
  - lib/palapala/web_socket_client.rb
  - lib/palapala_pdf.rb
+ - paged_css.pdf
  - palapala_pdf.gemspec
  homepage: https://github.com/palapala-app/palapala_pdf
  licenses: