palapala_pdf 0.1.2 → 0.1.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: f4a479307ef1a9d4ebe8aee6d8b3f2d7da1f96c854252ba69cc19bbaba45f6cf
4
- data.tar.gz: d8f184150f5b43eb1abcbaceaf4e37a643b9a3fdfbcca7dff075f56ec1d5622b
3
+ metadata.gz: 467a7808449c84b0d2ddbd56c16a791ce1e3148636d911e8010fd3804f306abf
4
+ data.tar.gz: 7680b71baec5030b0992383f77c3098ac12da223d88057a36a4a985f97755c35
5
5
  SHA512:
6
- metadata.gz: a094985ed908279fac68ed43d7fbf83b333591d97ca723af840263425c94d530647388417bd935618680f402e617af6097c55eb8e09a98dd69df5a61a7eaa495
7
- data.tar.gz: '07382dfb24841886ba2c09ae20b240701cb57d37963f16fd8e9e135f299f63962a5c4f09f6f14d4b2a0e86ae33a80cd2bc45426605b9137f3cdbc70c50cb34b2'
6
+ metadata.gz: 2533a247040eed4962a337ebd15e9bf51c875ed562242e44b791597d46dc69079e182980f9053cb3519525f6283c249f32dde0ee5668f9bccb2f9a10843d33f9
7
+ data.tar.gz: '07527904c55f1f0b99d98c1854ac369722c548f4626f5fe5ead3bf49d3bd573fc68ff9f4514cdbdf0acf4fb6c4bead8b80e10f34b3b55ee9fd3cc8f7ddf08436'
data/README.md CHANGED
@@ -2,13 +2,13 @@
2
2
 
3
3
  This project is a Ruby gem that provides functionality for generating PDF files from HTML using the Chrome browser. It allows you to easily convert HTML content into PDF documents, making it convenient for tasks such as generating reports, invoices, or any other printable documents. The gem provides a simple and intuitive API for converting HTML to PDF, and it leverages the power and flexibility of the Chrome browser's rendering engine to ensure accurate and high-quality PDF output. With this gem, you can easily integrate PDF generation capabilities into your Ruby applications.
4
4
 
5
- At the core, this project leverages the same rendering engine as [Grover](https://github.com/Studiosity/grover), but with significantly reduced overhead and dependencies. Instead of relying on the full Grover/Puppeteer/NodeJS stack, this project builds on [Ferrum](https://github.com/rubycdp/ferrum) to enable direct communication from Ruby to a headless Chrome or Chromium browser. This approach ensures efficient, thread-safe operations, providing a streamlined alternative for rendering tasks without sacrificing performance or flexibility.
5
+ At the core, this project leverages the same rendering engine as [Grover](https://github.com/Studiosity/grover), but with significantly reduced overhead and dependencies. Instead of relying on the full Grover/Puppeteer/NodeJS stack, this project uses a raw web socket to enable direct communication from Ruby to a headless Chrome or Chromium browser. This approach ensures efficiency while providing a streamlined alternative for rendering tasks without sacrificing performance or flexibility.
6
6
 
7
- This is how easy and powerful PDF generation should be in Ruby:
7
+ This is how easy and powerful PDF generation can be in Ruby:
8
8
 
9
9
  ```ruby
10
10
  require "palapala"
11
- Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
11
+ Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
12
12
  ```
13
13
 
14
14
  And this while having the most modern HTML/CSS/JS available to you: flex, grid, canvas, you name it.
@@ -27,15 +27,23 @@ If you are not using bundler to manage dependencies, you can install the gem by
27
27
  $ gem install palapala_pdf
28
28
  ```
29
29
 
30
- Palapala PDF uses [Ferrum](https://github.com/rubycdp/ferrum) inside and that one is pretty good at finding your Chrome or Chromium.
30
+ Palapala PDF connects to Chrome over a web socket connection.
31
31
 
32
- If you want the highest throughput, then use an external Chrome/Chromium. Just start it with (9222 is the default port):
32
+ An external Chrome/Chromium is expected.
33
+ Just start it with the following command (9222 is the default port):
33
34
 
34
35
  ```sh
35
- chrome --headless --disable-gpu --remote-debugging-port=9222
36
+ /path/to/chrome --headless --disable-gpu --remote-debugging-port=9222
36
37
  ```
37
38
 
38
- Then you can run Palapala PDF against that Chrome/Chromium instance (see configuration).
39
+ Alternatively, Palapala PDF will try to launch Chrome as a child process.
40
+ It guesses the path to Chrome, or you can configure it like this:
41
+
42
+ ```ruby
43
+ Palapala.setup do |config|
44
+ config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
45
+ end
46
+ ```
39
47
 
40
48
  ## Usage Instructions
41
49
 
@@ -43,19 +51,20 @@ To create a PDF from HTML content using the `Palapala` library, follow these ste
43
51
 
44
52
  1. **Configuration**:
45
53
 
46
- Configure the `Palapala` library with the necessary options, such as the URL for the Ferrum browser and default settings like scale and format.
54
+ Configure the `Palapala` library with the necessary options, such as the URL for the browser and default settings like scale and format.
47
55
 
48
56
  In a Rails context, this could be inside an initializer.
49
57
 
50
58
  ```ruby
51
59
  Palapala.setup do |config|
52
60
  # run against an external chrome/chromium or leave this out to run against a chrome that is started as a child process
53
- config.ferrum_opts = { url: 'http://localhost:9222' }
61
+ config.debug = true
62
+ config.headless_chrome_url = 'http://localhost:9222' # run against a remote Chrome instance
63
+ # config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
54
64
  config.defaults = { scale: 1, format: :A4 }
55
65
  end
56
66
  ```
57
-
58
- 2. **Create a PDF from HTML**:
67
+ 1. **Create a PDF from HTML**:
59
68
 
60
69
  Create a PDF file from HTML in `irb`
61
70
 
@@ -67,14 +76,14 @@ in IRB, load palapala and create a PDF from an HTML snippet:
67
76
 
68
77
  ```ruby
69
78
  require "palapala"
70
- Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
79
+ Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
71
80
  ```
72
81
 
73
- Instantiate a new Palapala::PDF object with your HTML content and generate the PDF binary data.
82
+ Instantiate a new Palapala::Pdf object with your HTML content and generate the PDF binary data.
74
83
 
75
84
  ```ruby
76
85
  require "palapala"
77
- binary_data = Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
86
+ binary_data = Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
78
87
  ```
79
88
 
80
89
  ## Paged CSS
@@ -88,7 +97,7 @@ When using Chromium-based rendering engines, headers and footers are not control
88
97
  With palapala PDF headers and footers are defined using `header_html` and `footer_html` options. These allow you to insert HTML content directly into the header or footer areas.
89
98
 
90
99
  ```ruby
91
- Palapala::PDF.new(
100
+ Palapala::Pdf.new(
92
101
  "<p>Hello world</p>",
93
102
  header_html: '<div style="text-align: center;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
94
103
  footer_html: '<div style="text-align: center;">Generated with Palapala PDF</div>',
@@ -119,50 +128,9 @@ todo example
119
128
  </html>
120
129
  ```
121
130
 
122
- ## Customisation
123
-
124
- ### Ferrum
125
-
126
- It is Ruby clean and high-level API to Chrome. All you need is Ruby and
127
- [Chrome](https://www.google.com/chrome/) or
128
- [Chromium](https://www.chromium.org/). Ferrum connects to the browser by [CDP
129
- protocol](https://chromedevtools.github.io/devtools-protocol/).
130
-
131
- Highlighting some key Ferrum options in the context of PDF generation
132
-
133
- * options `Hash`
134
- * `:headless` (String | Boolean) - Set browser as headless or not, `true` by default. You can set `"new"` to support
135
- [new headless mode](https://developer.chrome.com/articles/new-headless/).
136
- * `:xvfb` (Boolean) - Run browser in a virtual framebuffer, `false` by default.
137
- * `:extensions` (Array[String | Hash]) - An array of paths to files or JS
138
- source code to be preloaded into the browser e.g.:
139
- `["/path/to/script.js", { source: "window.secret = 'top'" }]`
140
- * `:logger` (Object responding to `puts`) - When present, debug output is
141
- written to this object.
142
- * `:timeout` (Numeric) - The number of seconds we'll wait for a response when
143
- communicating with browser. Default is 5.
144
- * `:js_errors` (Boolean) - When true, JavaScript errors get re-raised in Ruby.
145
- * `:pending_connection_errors` (Boolean) - When main frame is still waiting for slow responses while timeout is
146
- reached `PendingConnectionsError` is raised. It's better to figure out why you have slow responses and fix or
147
- block them rather than turn this setting off. Default is true.
148
- * `:browser_path` (String) - Path to Chrome binary, you can also set ENV
149
- variable as `BROWSER_PATH=some/path/chrome`.
150
- * `:browser_options` (Hash) - Additional command line options,
151
- [see them all](https://peter.sh/experiments/chromium-command-line-switches/)
152
- e.g. `{ "ignore-certificate-errors" => nil }`
153
- * `:ignore_default_browser_options` (Boolean) - Ferrum has a number of default
154
- options it passes to the browser, if you set this to `true` then only
155
- options you put in `:browser_options` will be passed to the browser,
156
- except required ones of course.
157
- * `:url` (String) - URL for a running instance of Chrome. If this is set, a
158
- browser process will not be spawned.
159
- * `:process_timeout` (Integer) - How long to wait for the Chrome process to
160
- respond on startup.
161
- * `:ws_max_receive_size` (Integer) - How big messages to accept from Chrome
162
- over the web socket, in bytes. Defaults to 64MB. Incoming messages larger
163
- than this will cause a `Ferrum::DeadBrowserError`.
164
-
165
- More [details](https://github.com/rubycdp/ferrum#customization)
131
+ ## Raw parameters (Page.printToPDF)
132
+
133
+ See [Page.printToPDF](https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF)
166
134
 
167
135
  ## Development
168
136
 
@@ -190,66 +158,71 @@ Your support is greatly appreciated and helps maintain the project!
190
158
 
191
159
  ## Findings
192
160
 
193
- For Chrome, mode headless=new seems to be slower for pdf rendering cases.
194
-
195
- On mac m3 (aug 24), chromium (brew install chromium) is about 3x slower than chrome? Maybe the chromium that gets installed is not ARM optimized?
161
+ - For Chrome, mode headless=new seems to be slower for pdf rendering cases.
162
+ - On mac m3 (aug 24), chromium (brew install chromium) is about 3x slower than chrome? Maybe the chromium that gets installed is not ARM optimized?
196
163
 
197
164
  ## Primitive benchmark
198
165
 
199
- On a macbook m3, the throughput for 'hello world' PDF generation can reach around 25 docs/second when allowing for some concurrency. As Chrome is actually also very efficient, it scales really well for complex documents also. If you run this in Rails, the concurrency is being taken care of either by the front end thread pool or by the workers and you shouldn't have to think about this. (Using an external Chrome)
166
+ On a macbook m3, the throughput for 'hello world' PDF generation can reach around 300 docs/second when allowing for some concurrency. As Chrome is actually also very efficient, it scales really well for complex documents also. If you run this in Rails, the concurrency is being taken care of either by the front end thread pool or by the workers and you shouldn't have to think about this. (Using an external Chrome)
167
+
168
+ Note: it renders `"Hello #{i}, world #{j}! #{Time.now}."` where i is the thread and j is the iteration counter within the thread and persists it to an SSD (which is very fast these days).
200
169
 
170
+ ### benchmarking 20 docs: 1x20, 2x10, 4x5
201
171
 
172
+ ```sh
173
+ c:1, n:20 : Throughput = 159.41 docs/sec, Total time = 0.1255 seconds
174
+ c:2, n:10 : Throughput = 124.91 docs/sec, Total time = 0.1601 seconds
175
+ c:4, n:5 : Throughput = 196.40 docs/sec, Total time = 0.1018 seconds
202
176
  ```
203
- benchmarking 20 docs: 1x20, 2x10, 4x5, 5x4, 20x1 (c is concurrency, n is iterations)
204
- Total time c:1, n:20 = 1.2048690000083297 seconds
205
- Total time c:2, n:10 = 0.8969700000016019 seconds
206
- Total time c:4, n:5 = 0.7497870000079274 seconds
207
- Total time c:5, n:4 = 0.72492800001055 seconds
208
- Total time c:20, n:1 = 0.7156629998935387 seconds
177
+
178
+ ### benchmarking 320 docs: 1x320, 4x80, 8x40
179
+
180
+ ```sh
181
+ c:1, n:320 : Throughput = 184.99 docs/sec, Total time = 1.7299 seconds
182
+ c:4, n:80 : Throughput = 302.50 docs/sec, Total time = 1.0578 seconds
183
+ c:8, n:40 : Throughput = 254.29 docs/sec, Total time = 1.2584 seconds
209
184
  ```
210
185
 
186
+ This is about a factor of 100 faster than what you typically get with Grover and still 10x faster than many alternatives. It is fast enough that, for a lot of use cases, you can run it straight from e.g. your Ruby on Rails web worker in the controller on a single machine and still scale to lots of users.
211
187
 
212
188
  ## Rails
213
189
 
214
- ### `send_data`
190
+ ### `send_data` and `render_to_string`
215
191
 
216
192
  The `send_data` method in Rails is used to send binary data as a file download to the user's browser. It allows you to send any type of data, such as PDF files, images, or CSV files, directly to the user without saving the file on the server.
217
193
 
218
- Here's an example of how to use `send_data` to send a PDF file:
219
-
220
- ```ruby
221
- def download_pdf
222
- pdf_data = Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
223
- send_data pdf_data, filename: "document.pdf", type: "application/pdf"
224
- end
225
- ```
226
-
227
- In this example, `pdf_data` is the binary data of the PDF file. The `filename` option specifies the name of the file that will be downloaded by the user, and the `type` option specifies the MIME type of the file.
228
-
229
- ### `render_to_string`
230
-
231
194
  The `render_to_string` method in Rails is used to render a view template to a string without sending it as a response to the user's browser. It allows you to generate HTML or other text-based content that can be used in various ways, such as sending it as an email, saving it to a file, or manipulating it further before sending it as a response.
232
195
 
233
- Here's an example of how to use `render_to_string` to render a view template to a string:
196
+ Here's an example of how to use `render_to_string` to render a view template to a string and send the pdf using `send_data`:
234
197
 
235
198
  ```ruby
236
199
  def download_pdf
237
200
  html_string = render_to_string(template: "example/template", layout: "print", locals: { } )
238
- pdf_data = Palapala::PDF.new(html_string).binary_data
201
+ pdf_data = Palapala::Pdf.new(html_string).binary_data
239
202
  send_data pdf_data, filename: "document.pdf", type: "application/pdf"
240
203
  end
241
204
  ```
242
205
 
206
+ In this example, `pdf_data` is the binary data of the PDF file. The `filename` option specifies the name of the file that will be downloaded by the user, and the `type` option specifies the MIME type of the file.
207
+
243
208
  ## Docker
244
209
 
245
210
  In docker as root you must pass the no-sandbox browser option:
246
211
 
247
212
  ```ruby
248
213
  Palapala.setup do |config|
249
- config.ferrum_opts = { 'no-sandbox': nil }
214
+ config.opts = { 'no-sandbox': nil }
250
215
  end
251
216
  ```
252
- (from Ferrum) It has also been reported that the Chrome process repeatedly crashes when running inside a Docker container on an M1 Mac preventing Ferrum from working. Ferrum should work as expected when deployed to a Docker container on a non-M1 Mac.
217
+ It has also been reported that the Chrome process repeatedly crashes when running inside a Docker container on an M1 Mac. Chrome should work as expected when deployed to a Docker container on a non-M1 Mac.
218
+
219
+ ## Thread-safety
220
+
221
+ Behind the scenes, a websocket is opened and stored on Thread.current for subsequent requests. Hence, the code is
222
+ thread safe in the sense that every web socket gets a new tab in the underlying chromium and gets an isolated context.
223
+
224
+ For performance reasons, the code uses a low level websocket connection that does all its work on the current thread
225
+ so we can avoid synchronisation penalties.
253
226
 
254
227
  ## Heroku
255
228
 
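Note on the README benchmark: the c/n figures follow from throughput = (c × n) / total time, for example 320 docs / 1.0578 s ≈ 302.5 docs/sec. A minimal sketch of how such a harness could be structured, assuming a stand-in workload in place of a real render; `measure_throughput` is a hypothetical helper and not part of the gem:

```ruby
# Hypothetical benchmark helper (not part of palapala_pdf).
# Runs `concurrency` threads, each calling the block `iterations` times,
# and reports docs/sec over the wall-clock time.
def measure_throughput(concurrency:, iterations:)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = Array.new(concurrency) do |i|
    Thread.new do
      iterations.times { |j| yield(i, j) }
    end
  end
  threads.each(&:join)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  docs = concurrency * iterations
  { docs: docs, seconds: elapsed, docs_per_sec: docs / elapsed }
end

# Stand-in workload. With the gem installed and Chrome reachable, the block
# could instead render and save a PDF per (i, j) pair.
result = measure_throughput(concurrency: 4, iterations: 5) do |i, j|
  "Hello #{i}, world #{j}!"
end

puts format("c:4, n:5 : Throughput = %.2f docs/sec, Total time = %.4f seconds",
            result[:docs_per_sec], result[:seconds])
```

The numbers printed by this sketch only measure the stand-in block, so they say nothing about actual rendering speed.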
data/lib/palapala/pdf.rb CHANGED
@@ -1,13 +1,10 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'ferrum'
4
-
5
3
  module Palapala
6
4
  # Page class to generate PDF from HTML content using Chrome in headless mode in a thread-safe way
7
5
  # @param page_ranges Empty string means all pages, e.g., "1-3, 5, 7-9"
8
- class PDF
6
+ class Pdf
9
7
  def initialize(content = nil,
10
- url: nil,
11
8
  header_html: nil,
12
9
  footer_html: nil,
13
10
  generate_tagged_pdf: Palapala.defaults.fetch(:generate_tagged_pdf, false),
@@ -16,7 +13,6 @@ module Palapala
16
13
  page_ranges: Palapala.defaults.fetch(:page_ranges, nil),
17
14
  margin: Palapala.defaults.fetch(:margin, {}))
18
15
  @content = content
19
- @url = url
20
16
  @header_html = header_html
21
17
  @footer_html = footer_html
22
18
  @generate_tagged_pdf = generate_tagged_pdf
@@ -31,36 +27,17 @@ module Palapala
31
27
  end
32
28
 
33
29
  def save(path, **opts)
34
- pdf(path:, **opts)
30
+ File.binwrite(path, pdf(**opts))
35
31
  end
36
32
 
37
33
  private
38
34
 
39
- def pdf(**opts)
40
- browser_context = browser.contexts.create
41
- browser_page = browser_context.page
42
- # # output console logs for this page
43
- if Palapala.debug
44
- browser_page.on('Runtime.consoleAPICalled') do |params|
45
- params['args'].each { |r| puts(r['value']) }
46
- end
47
- end
48
- # open the page
49
- url = @url || data_url
50
- browser_page.go_to(url)
51
- # Wait for the page to load
52
- # browser_page.network.wait_for_idle
53
- # Generate PDF
54
- pdf_binary_data = browser_page.pdf(**opts_with_defaults.merge(opts))
55
- # Dispose the context
56
- browser_context.dispose
57
- # Return the PDF data
58
- opts[:path] ? opts[:path] : pdf_binary_data
35
+ def renderer
36
+ Thread.current[:renderer] ||= Renderer.new
59
37
  end
60
38
 
61
- def data_url
62
- encoded_html = Base64.strict_encode64(@content)
63
- "data:text/html;base64,#{encoded_html}"
39
+ def pdf(**opts)
40
+ renderer.html_to_pdf(@content, params: opts_with_defaults.merge(opts))
64
41
  end
65
42
 
66
43
  def opts_with_defaults
@@ -83,26 +60,7 @@ module Palapala
83
60
  opts[:marginLeft] = @margin[:left] unless @margin[:left].nil?
84
61
  opts[:marginBottom] = @margin[:bottom] unless @margin[:bottom].nil?
85
62
  opts[:marginRight] = @margin[:right] unless @margin[:right].nil?
86
-
87
- puts "opts: #{opts}" if Palapala&.debug
88
63
  opts
89
64
  end
90
-
91
- def browser
92
- # according to the docs ferrum is thread safe, however, under heavy load
93
- # we are seeing some issues, so we are using thread locals to have a
94
- # browser per thread
95
- Thread.current[:browser] ||= new_browser
96
- # @@browser ||= new_browser
97
- end
98
-
99
- def new_browser
100
- Ferrum::Browser.new(Palapala.ferrum_opts)
101
- end
102
-
103
- # # TODO use method from template class
104
- # def cm_to_inches(value)
105
- # value / 2.54
106
- # end
107
65
  end
108
66
  end
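Note on the new `renderer` method: it memoizes one `Renderer` per thread in `Thread.current`, so each thread keeps its own web socket. A minimal sketch of that pattern, using a hypothetical `StubRenderer` in place of the real class so it runs without Chrome:

```ruby
# StubRenderer is a hypothetical stand-in for Palapala::Renderer.
class StubRenderer; end

# Same memoization shape as Pdf#renderer: one instance per thread.
def thread_renderer
  Thread.current[:stub_renderer] ||= StubRenderer.new
end

main_first  = thread_renderer
main_second = thread_renderer
other       = Thread.new { thread_renderer }.value

puts main_first.equal?(main_second) # same thread reuses its renderer
puts main_first.equal?(other)       # another thread gets its own instance
```

Storage accessed through `Thread.current[]` is fiber-local in Ruby, so code running in separate fibers of the same thread would also end up with separate instances.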
data/lib/palapala/renderer.rb ADDED
@@ -0,0 +1,198 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "json"
4
+ require "net/http"
5
+ require "websocket/driver"
6
+
7
+ module Palapala
8
+ # Render HTML content to PDF using Chrome in headless mode with minimal dependencies
9
+ class Renderer
10
+ def initialize
11
+ # Create an instance of WebSocketClient with the WebSocket URL
12
+ @client = Palapala::WebSocketClient.new(websocket_url)
13
+ # Create the WebSocket driver
14
+ @driver = WebSocket::Driver.client(@client)
15
+ # Register the on_message callback
16
+ @driver.on(:message, &method(:on_message))
17
+ # Start the WebSocket handshake
18
+ @driver.start
19
+ # Initialize the protocol to get the page events
20
+ send_command_and_wait_for_result("Page.enable")
21
+ end
22
+
23
+ # Callback to handle the incoming WebSocket messages
24
+ def on_message(e)
25
+ puts "Received: #{e.data[0..64]}" if Palapala.debug
26
+ @response = JSON.parse(e.data) # Parse the JSON response
27
+ end
28
+
29
+ # Update the current ID to the next ID (increment by 1)
30
+ def next_id = @id = (@id || 0) + 1
31
+
32
+ # Get the current ID
33
+ def current_id = @id
34
+
35
+ # Process the WebSocket messages until some state is true
36
+ def process_until(&block)
37
+ loop do
38
+ @driver.parse(@client.read)
39
+ return if block.call
40
+ return if @driver.state == :closed
41
+ end
42
+ end
43
+
44
+ # Method to send a message (text) and wait for a response
45
+ def send_and_wait(message, &)
46
+ puts "\nSending: #{message}" if Palapala.debug
47
+ @driver.text(message)
48
+ process_until(&)
49
+ end
50
+
51
+ # Method to send a CDP command and wait for some state to be true
52
+ def send_command(method, params: {}, &block)
53
+ send_and_wait(JSON.generate({ id: next_id, method:, params: }), &block)
54
+ end
55
+
56
+ # Method to send a CDP command and wait for the matching event to get the result
57
+ # @return [Hash] The result of the command
58
+ def send_command_and_wait_for_result(method, params: {})
59
+ send_command(method, params:) do
60
+ @response && @response["id"] == current_id
61
+ end
62
+ @response["result"]
63
+ end
64
+
65
+ # Method to send a CDP command and wait for a specific method to be called
66
+ def send_command_and_wait_for_event(method, event_name:, params: {})
67
+ send_command(method, params:) do
68
+ @response && @response["method"] == event_name
69
+ end
70
+ end
71
+
72
+ # Convert HTML content to PDF
73
+ # See https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF
74
+ # @param html [String] The HTML content to convert to PDF
75
+ # @param params [Hash] Additional parameters to pass to the CDP command
76
+ def html_to_pdf(html, params: {})
77
+ send_command_and_wait_for_event("Page.navigate", params: { url: data_url_for_html(html) },
78
+ event_name: "Page.frameStoppedLoading")
79
+ result = send_command_and_wait_for_result("Page.printToPDF", params:)
80
+ Base64.decode64(result["data"])
81
+ end
82
+
83
+ def close
84
+ @driver.close
85
+ @client.close
86
+ end
87
+
88
+ private
89
+
90
+ def data_url_for_html(html)
91
+ "data:text/html;base64,#{Base64.strict_encode64(html)}"
92
+ end
93
+
94
+ # Open a new tab in the remote chrome and return the WebSocket URL
95
+ def websocket_url
96
+ ChromeProcess.spawn_chrome
97
+ uri = URI("#{Palapala.headless_chrome_url}/json/new")
98
+ http = Net::HTTP.new(uri.host, uri.port)
99
+ request = Net::HTTP::Put.new(uri)
100
+ request['Content-Type'] = 'application/json'
101
+ response = http.request(request)
102
+ tab_info = JSON.parse(response.body)
103
+ websocket_url = tab_info["webSocketDebuggerUrl"]
104
+ puts "WebSocket URL: #{websocket_url}" if Palapala.debug
105
+ websocket_url
106
+ end
107
+
108
+ # Manage the Chrome child process
109
+ module ChromeProcess
110
+ def self.port_in_use?(port = 9222, host = "127.0.0.1")
111
+ server = TCPServer.new(host, port)
112
+ server.close
113
+ false
114
+ rescue Errno::EADDRINUSE
115
+ true
116
+ end
117
+
118
+ def self.chrome_process_healthy?
119
+ return false if @chrome_process_id.nil?
120
+
121
+ begin
122
+ Process.kill(0, @chrome_process_id) # Check if the process is alive
123
+ true
124
+ rescue Errno::ESRCH, Errno::EPERM
125
+ false
126
+ end
127
+ end
128
+
129
+ def self.kill_chrome
130
+ return if @chrome_process_id.nil?
131
+
132
+ Process.kill("KILL", @chrome_process_id) # Kill the process
133
+ Process.wait(@chrome_process_id) # Wait for the process to finish
134
+ end
135
+
136
+ def self.chrome_path
137
+ return Palapala.headless_chrome_path if Palapala.headless_chrome_path
138
+
139
+ case RbConfig::CONFIG["host_os"]
140
+ when /darwin/
141
+ "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
142
+ when /linux/
143
+ "/usr/bin/google-chrome" # or "/usr/bin/chromium-browser"
144
+ when /win|mingw|cygwin/
145
+ "#{ENV["ProgramFiles(x86)"]}\\Google\\Chrome\\Application\\chrome.exe"
146
+ else
147
+ raise "Unsupported OS"
148
+ end
149
+ end
150
+
151
+ def self.spawn_chrome
152
+ return if port_in_use?
153
+ return if chrome_process_healthy?
154
+
155
+ # Define the path and parameters separately
156
+ # chrome_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
157
+ params = ["--headless", "--disable-gpu", "--remote-debugging-port=9222"]
158
+
159
+ # Spawn the process with the path and parameters
160
+ @chrome_process_id = Process.spawn(chrome_path, *params)
161
+
162
+ # Wait until the port is in use
163
+ until port_in_use?
164
+ sleep 0.1
165
+ end
166
+ # Detach the process so it runs in the background
167
+ Process.detach(@chrome_process_id)
168
+
169
+ at_exit do
170
+ if @chrome_process_id
171
+ begin
172
+ Process.kill("TERM", @chrome_process_id)
173
+ Process.wait(@chrome_process_id)
174
+ puts "Child process #{@chrome_process_id} terminated."
175
+ rescue Errno::ESRCH
176
+ puts "Child process #{@chrome_process_id} is already terminated."
177
+ rescue Errno::ECHILD
178
+ puts "No child process #{@chrome_process_id} found."
179
+ end
180
+ end
181
+ end
182
+
183
+ # Handle when the process is killed
184
+ trap("SIGCHLD") do
185
+ while (@chrome_process_id = Process.wait(-1, Process::WNOHANG))
186
+ break if @chrome_process_id.nil?
187
+
188
+ puts "Process #{@chrome_process_id} was killed."
189
+ # Handle the error or restart the process if necessary
190
+ @chrome_process_id = nil
191
+ end
192
+ rescue Errno::ECHILD
193
+ @chrome_process_id = nil
194
+ end
195
+ end
196
+ end
197
+ end
198
+ end
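Note on the new renderer: each CDP command is sent as a JSON object with an incrementing `id`. A reply carries the same `id`, while events carry a `method` and no `id`. A minimal sketch of that correlation and of the base64 data URL passed to `Page.navigate`; `CommandBuilder` and the sample messages are hypothetical and exist only for illustration:

```ruby
require "json"

# Hypothetical helper mirroring the id bookkeeping in Renderer.
class CommandBuilder
  attr_reader :current_id

  def initialize
    @current_id = 0
  end

  # Serialize a CDP command with the next id.
  def build(method, params = {})
    @current_id += 1
    JSON.generate({ id: @current_id, method: method, params: params })
  end

  # A reply matches when its id equals the id of the last command sent.
  def reply?(message)
    message["id"] == @current_id
  end
end

# pack("m0") produces strict base64 without line breaks, the same encoding
# that Base64.strict_encode64 returns.
def data_url_for_html(html)
  "data:text/html;base64,#{[html].pack('m0')}"
end

builder = CommandBuilder.new
command = builder.build("Page.navigate", { url: data_url_for_html("<h1>Hi</h1>") })

# Illustrative incoming messages, made up for this sketch (not captured from Chrome).
event = JSON.parse('{"method":"Page.frameStoppedLoading","params":{}}')
reply = JSON.parse('{"id":1,"result":{}}')

puts command
puts builder.reply?(event) # an event has no id, so it is not the reply
puts builder.reply?(reply) # the id matches the command that was sent
```

In the diff, `Renderer` keeps only the most recently parsed message in `@response`, so the wait conditions are evaluated against whichever message arrived last.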
data/lib/palapala/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Palapala
4
- VERSION = '0.1.2'
4
+ VERSION = '0.1.4'
5
5
  end
data/lib/palapala/web_socket_client.rb ADDED
@@ -0,0 +1,29 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'uri'
4
+ require 'socket'
5
+
6
+ module Palapala
7
+ # Create a socket wrapper that conforms to what the websocket-driver expects
8
+ class WebSocketClient
9
+ attr_reader :url
10
+
11
+ def initialize(url)
12
+ @url = url
13
+ uri = URI.parse(url)
14
+ @socket = TCPSocket.new(uri.host, uri.port)
15
+ end
16
+
17
+ def write(data)
18
+ @socket.write(data)
19
+ end
20
+
21
+ def read
22
+ @socket.readpartial(1024)
23
+ end
24
+
25
+ def close
26
+ @socket.close
27
+ end
28
+ end
29
+ end
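Note on `WebSocketClient#read`: `readpartial(1024)` returns at most 1024 bytes per call, so a large `Page.printToPDF` reply arrives in many chunks, which is why `Renderer#process_until` feeds the driver in a loop. A minimal sketch of that chunking, assuming a local `UNIXSocket` pair as a stand-in for the connection to Chrome (Unix-like systems only); `read_all_in_chunks` is a hypothetical helper:

```ruby
require "socket"

# Hypothetical helper: read from io in pieces of at most max_bytes
# until the peer closes the connection.
def read_all_in_chunks(io, max_bytes)
  chunks = []
  loop { chunks << io.readpartial(max_bytes) }
rescue EOFError
  # readpartial raises EOFError once the peer has closed and the buffer is drained
  chunks
end

# A connected socket pair stands in for the TCP connection to Chrome.
reader, writer = UNIXSocket.pair

payload = "x" * 3000
writer.write(payload)
writer.close # signal end of stream so the read loop can stop

chunks = read_all_in_chunks(reader, 1024)
reader.close

puts chunks.map(&:bytesize).inspect
puts chunks.join == payload
```

A larger read size would mean fewer loop iterations per PDF; whether that matters in practice would need measuring.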
data/lib/palapala.rb CHANGED
@@ -1,6 +1,9 @@
1
1
  # frozen_string_literal: true
2
2
 
3
+ require_relative 'palapala/version'
3
4
  require_relative 'palapala/pdf'
5
+ require_relative 'palapala/web_socket_client'
6
+ require_relative 'palapala/renderer'
4
7
 
5
8
  # Main module for the gem
6
9
  module Palapala
@@ -9,12 +12,11 @@ module Palapala
9
12
  end
10
13
 
11
14
  class << self
12
- attr_accessor :ferrum_opts
13
- attr_accessor :defaults
14
- attr_accessor :debug
15
+ attr_accessor :defaults, :debug, :headless_chrome_url, :headless_chrome_path
15
16
  end
16
17
 
17
- self.ferrum_opts = {}
18
+ self.headless_chrome_url = 'http://localhost:9222'
19
+ self.headless_chrome_path = nil
18
20
  self.defaults = {}
19
21
  self.debug = false
20
22
  end
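Note on the configuration change: `ferrum_opts` is replaced by the `headless_chrome_url` and `headless_chrome_path` module accessors. The `setup` method itself is not visible in this diff. A minimal sketch of how such block-style configuration typically works, using a hypothetical `DemoConfig` module and assuming `setup` simply yields the module:

```ruby
# DemoConfig is a hypothetical module for illustration; it is not the gem's code.
module DemoConfig
  class << self
    attr_accessor :defaults, :debug, :headless_chrome_url, :headless_chrome_path

    # Assumed implementation: hand the module itself to the block.
    def setup
      yield self
    end
  end

  self.headless_chrome_url = "http://localhost:9222"
  self.headless_chrome_path = nil
  self.defaults = {}
  self.debug = false
end

DemoConfig.setup do |config|
  config.headless_chrome_url = "http://chrome.internal:9222" # hypothetical host
  config.defaults = { scale: 1, format: :A4 }
end

puts DemoConfig.headless_chrome_url
puts DemoConfig.defaults.fetch(:format)
```

The README's Docker example in this release sets `config.opts`, which does not appear among the accessors shown in this hunk, so that snippet may need checking against the released code.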
data/palapala_pdf.gemspec CHANGED
@@ -9,7 +9,7 @@ Gem::Specification.new do |spec|
9
9
  spec.email = ['github.com@handekyn.com']
10
10
 
11
11
  spec.summary = 'Convert HTML into PDF directly from Ruby using Chrome/Chromium.'
12
- spec.description = 'This gem uses Ferrum to render HTML into a PDF using Chrom(e)(ium) with minimal dependencies.'
12
+ spec.description = 'This gem uses raw web sockets to render HTML into a PDF using Chrom(e)(ium) with minimal dependencies.'
13
13
  spec.homepage = 'https://github.com/palapala-app/palapala_pdf'
14
14
  spec.required_ruby_version = '>= 3.1'
15
15
  spec.license = 'MIT'
@@ -34,7 +34,8 @@ Gem::Specification.new do |spec|
34
34
  spec.require_paths = ['lib']
35
35
 
36
36
  # Uncomment to register a new dependency of your gem
37
- spec.add_dependency 'ferrum', '~> 0'
37
+ spec.add_dependency 'base64', '~> 0'
38
+ spec.add_dependency 'websocket-driver', '~> 0'
38
39
 
39
40
  # For more information and examples about making a new gem, check out our
40
41
  # guide at: https://bundler.io/guides/creating_gem.html
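Note on the new dependencies: `'~> 0'` is a pessimistic constraint that accepts any 0.x release and rejects 1.0 and later. A short check using RubyGems' own `Gem::Requirement`; the version numbers are arbitrary examples:

```ruby
require "rubygems"

requirement = Gem::Requirement.new("~> 0")

# Print whether each sample version satisfies the constraint.
%w[0.1.0 0.7.6 1.0.0].each do |version|
  ok = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{ok}"
end
```

If `base64` or `websocket-driver` ever publish a 1.0 release, this constraint would keep the gem on the 0.x series until the gemspec is updated.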
metadata CHANGED
@@ -1,17 +1,17 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: palapala_pdf
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.2
4
+ version: 0.1.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - Koen Handekyn
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2024-08-25 00:00:00.000000000 Z
11
+ date: 2024-08-27 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
- name: ferrum
14
+ name: base64
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
17
  - - "~>"
@@ -24,8 +24,22 @@ dependencies:
24
24
  - - "~>"
25
25
  - !ruby/object:Gem::Version
26
26
  version: '0'
27
- description: This gem uses Ferrum to render HTML into a PDF using Chrom(e)(ium) with
28
- minimal dependencies.
27
+ - !ruby/object:Gem::Dependency
28
+ name: websocket-driver
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ description: This gem uses raw web sockets to render HTML into a PDF using Chrom(e)(ium)
42
+ with minimal dependencies.
29
43
  email:
30
44
  - github.com@handekyn.com
31
45
  executables: []
@@ -40,7 +54,9 @@ files:
40
54
  - Rakefile
41
55
  - lib/palapala.rb
42
56
  - lib/palapala/pdf.rb
57
+ - lib/palapala/renderer.rb
43
58
  - lib/palapala/version.rb
59
+ - lib/palapala/web_socket_client.rb
44
60
  - palapala_pdf.gemspec
45
61
  homepage: https://github.com/palapala-app/palapala_pdf
46
62
  licenses: