palapala_pdf 0.1.1 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 8ceecc9fff4323ef8cdf26b9e20aeb35f21610e8b55f716a0b8e0c27c0e38613
4
- data.tar.gz: 4f9f412514ad9a9b63e4b6484488ff245bb0769352374a652845749ac230b6e6
3
+ metadata.gz: 95d833844d730f058d59abd1eaaee467dde4fa8b4f3b9d01dd441646675c585b
4
+ data.tar.gz: b30cd15b438c71801ec9f8b0c0c61ae2aa47543cc59bd36975dd52b9974af98a
5
5
  SHA512:
6
- metadata.gz: 62c62530b7034a687012bea224b8d284616cef97a2e749ab26151ee2e925ffb3b72ab0d96b13e970ab9d2630bca188c1e134ca59e81bcb37f245dbb1b888f000
7
- data.tar.gz: 352245729e62e3df55c23848b684af2deb642b3d506de83ef71f8cf0bc4964ac94219baac3e06e776957f15cb15fba41cf3f864ecb4a381c7e057bed75e598f6
6
+ metadata.gz: f34c4baea12a5f2577da4bc629319470f7f46dddd3b1ad05b142d0abba6598caefe2128d93035a6d58d65e9357cec32e461a5650fae99606eeae749ded843621
7
+ data.tar.gz: ac072fb99db23b9b3d7895f4cbd8fed028dd6b5555d1ea1b8ebbc341e4845f901f75a07d4df621691340d754b2d624fc5638d6e2725922fd248d7879441a86a3
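The digests in `checksums.yaml` cover the `metadata.gz` and `data.tar.gz` members of the `.gem` file, which is a plain tar archive. The sketch below recomputes them locally. It assumes the gem was downloaded with `gem fetch palapala_pdf -v 0.1.3`. The `gem_member_digests` helper is hypothetical.

```ruby
require "digest"
require "rubygems/package"

# Hypothetical helper: compute the SHA256 and SHA512 digests of the
# metadata.gz and data.tar.gz members of a .gem archive (a tar file).
def gem_member_digests(io)
  digests = {}
  Gem::Package::TarReader.new(io) do |tar|
    tar.each do |entry|
      next unless %w[metadata.gz data.tar.gz].include?(entry.full_name)

      content = entry.read
      digests[entry.full_name] = {
        sha256: Digest::SHA256.hexdigest(content),
        sha512: Digest::SHA512.hexdigest(content)
      }
    end
  end
  digests
end

# Usage, assuming the .gem file sits in the current directory:
# File.open("palapala_pdf-0.1.3.gem", "rb") { |f| pp gem_member_digests(f) }
```

If the printed values equal the `+` lines above, the downloaded archive matches the published checksums.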
data/README.md CHANGED
@@ -2,13 +2,13 @@
2
2
 
3
3
  This project is a Ruby gem that provides functionality for generating PDF files from HTML using the Chrome browser. It allows you to easily convert HTML content into PDF documents, making it convenient for tasks such as generating reports, invoices, or any other printable documents. The gem provides a simple and intuitive API for converting HTML to PDF, and it leverages the power and flexibility of the Chrome browser's rendering engine to ensure accurate and high-quality PDF output. With this gem, you can easily integrate PDF generation capabilities into your Ruby applications.
4
4
 
5
- At the core, this project leverages the same rendering engine as [Grover](https://github.com/Studiosity/grover), but with significantly reduced overhead and dependencies. Instead of relying on the full Grover stack, this project builds on [Ferrum](https://github.com/rubycdp/ferrum) to enable direct communication from Ruby to a headless Chrome or Chromium browser. This approach ensures efficient, thread-safe operations, providing a streamlined alternative for rendering tasks without sacrificing performance or flexibility.
5
+ At the core, this project leverages the same rendering engine as [Grover](https://github.com/Studiosity/grover), but with significantly reduced overhead and dependencies. Instead of relying on the full Grover/Puppeteer/Node.js stack, this project uses a raw web socket to communicate directly from Ruby with a headless Chrome or Chromium browser. This approach keeps things efficient and provides a streamlined alternative for rendering tasks without sacrificing performance or flexibility.
6
6
 
7
- This is how easy and powerfull PDF generation should be:
7
+ This is how easy and powerful PDF generation can be in Ruby:
8
8
 
9
9
  ```ruby
10
10
  require "palapala"
11
- Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
11
+ Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
12
12
  ```
13
13
 
14
14
  And all this with the most modern HTML/CSS/JS available to you: flex, grid, canvas, you name it.
@@ -27,15 +27,23 @@ If you are not using bundler to manage dependencies, you can install the gem by
27
27
  $ gem install palapala_pdf
28
28
  ```
29
29
 
30
- Palapala PDF uses [Ferrum](https://github.com/rubycdp/ferrum) inside and that one is pretty good at finding your Chrome or Chromium.
30
+ Palapala PDF connects to Chrome over a web socket connection.
31
31
 
32
- If you want the highest throughput, then use an external Chrome/Chromium. Just start it with (9222 is the default port):
32
+ For the highest throughput, use an external Chrome/Chromium.
33
+ Just start it with the following command (9222 is the default port):
33
34
 
34
35
  ```sh
35
- chrome --headless --disable-gpu --remote-debugging-port=9222
36
+ /path/to/chrome --headless --disable-gpu --remote-debugging-port=9222
36
37
  ```
37
38
 
38
- Then you can run Palapala PDF against that Chrome/Chromium instance (see configuration).
39
+ Alternatively, Palapala PDF will try to launch Chrome as a child process.
40
+ It guesses the path to Chrome, or you configure it like this:
41
+
42
+ ```ruby
43
+ Palapala.setup do |config|
44
+ config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
45
+ end
46
+ ```
39
47
 
40
48
  ## Usage Instructions
41
49
 
@@ -43,21 +51,22 @@ To create a PDF from HTML content using the `Palapala` library, follow these ste
43
51
 
44
52
  1. **Configuration**:
45
53
 
46
- Configure the `Palapala` library with the necessary options, such as the URL for the Ferrum browser and default settings like scale and format.
54
+ Configure the `Palapala` library with the necessary options, such as the URL for the browser and default settings like scale and format.
47
55
 
48
56
  In a Rails context, this could be inside an initializer.
49
57
 
50
58
  ```ruby
51
59
  Palapala.setup do |config|
52
60
  # run against an external chrome/chromium or leave this out to run against a chrome that is started as a child process
53
- config.ferrum_opts = { url: 'http://localhost:9222' }
61
+ config.debug = true
62
+ config.headless_chrome_url = 'http://localhost:9222' # run against a remote Chrome instance
63
+ # config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
54
64
  config.defaults = { scale: 1, format: :A4 }
55
65
  end
56
66
  ```
67
+ 1. **Create a PDF from HTML**:
57
68
 
58
- 2. **Create a PDF from HTML**:
59
-
60
- Create a PDF file from HTML in IRB
69
+ Create a PDF file from HTML in `irb`
61
70
 
62
71
  ```sh
63
72
  gem install palapala_pdf
@@ -65,22 +74,64 @@ gem install palapala_pdf
65
74
 
66
75
  In `irb`, load palapala and create a PDF from an HTML snippet:
67
76
 
68
- ```sh
69
- >irb
77
+ ```ruby
78
+ require "palapala"
79
+ Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
70
80
  ```
71
81
 
82
+ To get the PDF as binary data instead of saving it to a file, instantiate a new `Palapala::PDF` object with your HTML content and call `binary_data`:
83
+
72
84
  ```ruby
73
85
  require "palapala"
74
- Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
86
+ binary_data = Palapala::PDF.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
75
87
  ```
76
88
 
77
- Instantiate a new Palapala::Pdf object with your HTML content and generate the PDF binary data.
89
+ ## Paged CSS
90
+
91
+ Paged CSS (the CSS Paged Media module) is the part of CSS designed for styling printed documents. It extends standard CSS to handle pagination, page sizes, headers, footers, and other aspects of printed content, and is commonly used when web content needs to be converted to PDF or other paginated formats.
92
+
93
+ ### Headers and Footers
94
+
95
+ When using Chromium-based rendering engines, headers and footers are not controlled by the Paged CSS standard but are instead managed through specific settings in the rendering engine.
96
+
97
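+ With Palapala PDF, headers and footers are defined using the `header_html` and `footer_html` options. These allow you to insert HTML content directly into the header and footer areas. Chrome fills in elements with the classes `date`, `title`, `url`, `pageNumber` and `totalPages` when printing.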
78
98
 
79
99
  ```ruby
80
- require "palapala"
81
- binary_data = Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
100
+ Palapala::PDF.new(
101
+ "<p>Hello world</p>",
102
+ header_html: '<div style="text-align: center;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
103
+ footer_html: '<div style="text-align: center;">Generated with Palapala PDF</div>',
104
+ margin: { top: 0.79, bottom: 0.79 } # in inches (~2cm)
105
+ ).save("test.pdf")
106
+ ```
107
+
108
+ ### Page size, orientation and margins
109
+
110
+ #### With CSS
111
+
112
+ todo example
113
+
114
+ #### As params
115
+
116
+ todo example
117
+
118
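+ ## JS-based rendering
+
+ Scripts in the page run before the PDF is printed. After navigating, the renderer waits for the `Page.frameStoppedLoading` event, so content added in a `DOMContentLoaded` handler ends up in the PDF: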
119
+
120
+ ```html
121
+ <html>
122
+ <script type="text/javascript">
123
+ document.addEventListener("DOMContentLoaded", () => {
124
+ document.body.innerHTML += "<p>Current time from JS: " + new Date().toLocaleString() + "</p>";
125
+ });
126
+ </script>
127
+ <body><p>Default body text.</p></body>
128
+ </html>
82
129
  ```
83
130
 
131
+ ## Raw parameters (Page.printToPDF)
132
+
133
+ See [Page.printToPDF](https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF).
134
+
84
135
  ## Development
85
136
 
86
137
  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
@@ -107,26 +158,80 @@ Your support is greatly appreciated and helps maintain the project!
107
158
 
108
159
  ## Findings
109
160
 
110
- For Chrome, mode headless=new seems to be slower for pdf rendering cases.
161
+ - For Chrome, the `--headless=new` mode seems to be slower for PDF rendering.
162
+ - On a Mac M3 (August 2024), Chromium (`brew install chromium`) seems to be about 3x slower than Chrome. Possibly the Chromium that gets installed is not ARM-optimized.
111
163
 
112
164
  ## Primitive benchmark
113
165
 
114
- On a macbook m3, the throughput for 'hello world' PDF generation can reach around 25 docs/second when allowing for some concurrency. As Chrome is actually also very efficient, it scales really well for complex documents also. If you run this in Rails, the concurrency is being taken care of either by the front end thread pool or by the workers and you shouldn't have to think about this. (Using an external Chrome)
166
+ On a MacBook M3, the throughput for 'hello world' PDF generation can reach around 300 docs/second when allowing for some concurrency. Because Chrome is also very efficient, it scales well for complex documents too. If you run this in Rails, concurrency is handled either by the web server's thread pool or by the workers, so you shouldn't have to think about it. (Measured against an external Chrome.)
167
+
168
+ Note: the benchmark renders `"Hello #{i}, world #{j}! #{Time.now}."`, where `i` is the thread index and `j` the iteration counter within that thread, and writes each PDF to an SSD (which is very fast these days).
115
169
 
170
+ ### Benchmarking 20 docs: 1x20, 2x10, 4x5 (c is concurrency, n is iterations per thread)
116
171
 
172
+ ```sh
173
+ c:1, n:20 : Throughput = 159.41 docs/sec, Total time = 0.1255 seconds
174
+ c:2, n:10 : Throughput = 124.91 docs/sec, Total time = 0.1601 seconds
175
+ c:4, n:5 : Throughput = 196.40 docs/sec, Total time = 0.1018 seconds
117
176
  ```
118
- benchmarking 20 docs: 1x20, 2x10, 4x5, 5x4, 20x1 (c is concurrency, n is iterations)
119
- Total time c:1, n:20 = 1.2048690000083297 seconds
120
- Total time c:2, n:10 = 0.8969700000016019 seconds
121
- Total time c:4, n:5 = 0.7497870000079274 seconds
122
- Total time c:5, n:4 = 0.72492800001055 seconds
123
- Total time c:20, n:1 = 0.7156629998935387 seconds
177
+
178
+ ### Benchmarking 320 docs: 1x320, 4x80, 8x40
179
+
180
+ ```sh
181
+ c:1, n:320 : Throughput = 184.99 docs/sec, Total time = 1.7299 seconds
182
+ c:4, n:80 : Throughput = 302.50 docs/sec, Total time = 1.0578 seconds
183
+ c:8, n:40 : Throughput = 254.29 docs/sec, Total time = 1.2584 seconds
124
184
  ```
125
185
 
126
- ## Advanced stuf
186
+ This is roughly 100x faster than what you typically get with Grover, and still about 10x faster than many alternatives. It is fast enough that, for many use cases, you can generate PDFs straight from e.g. a Ruby on Rails controller in a web worker on a single machine and still scale to lots of users.
127
187
 
128
- ### Headers and Footers
188
+ ## Rails
189
+
190
+ ### `send_data` and `render_to_string`
191
+
192
+ The `send_data` method in Rails is used to send binary data as a file download to the user's browser. It allows you to send any type of data, such as PDF files, images, or CSV files, directly to the user without saving the file on the server.
193
+
194
+ The `render_to_string` method in Rails is used to render a view template to a string without sending it as a response to the user's browser. It allows you to generate HTML or other text-based content that can be used in various ways, such as sending it as an email, saving it to a file, or manipulating it further before sending it as a response.
195
+
196
+ Here's an example of how to use `render_to_string` to render a view template to a string and send the PDF using `send_data`:
197
+
198
+ ```ruby
199
+ def download_pdf
200
+ html_string = render_to_string(template: "example/template", layout: "print", locals: { } )
201
+ pdf_data = Palapala::PDF.new(html_string).binary_data
202
+ send_data pdf_data, filename: "document.pdf", type: "application/pdf"
203
+ end
204
+ ```
205
+
206
+ In this example, `pdf_data` is the binary data of the PDF file. The `filename` option specifies the name of the file that will be downloaded by the user, and the `type` option specifies the MIME type of the file.
207
+
208
+ ## Docker
209
+
210
+ When running as root inside Docker, you must pass the `no-sandbox` browser option:
211
+
212
+ ```ruby
213
+ Palapala.setup do |config|
214
+ config.opts = { 'no-sandbox': nil }
215
+ end
216
+ ```
217
+ It has also been reported that the Chrome process repeatedly crashes when running inside a Docker container on an M1 Mac. Chrome should work as expected when deployed to a Docker container on a non-M1 Mac.
218
+
219
+ ## Thread-safety
220
+
221
+ Behind the scenes, a web socket is opened and stored on `Thread.current` for subsequent requests. Hence, the code is
222
+ thread-safe in the sense that every web socket gets its own tab, and thus its own page, in the underlying Chromium.
223
+
224
+ For performance reasons, the code uses a low-level web socket connection that does all its work on the current thread,
225
+ which avoids synchronisation penalties.
226
+
227
+ ## Heroku
228
+
229
+ Possible buildpacks:
230
+
231
+ https://github.com/heroku/heroku-buildpack-chrome-for-testing
232
+
233
+ This buildpack installs Chrome and ChromeDriver. ChromeDriver is not needed, but the buildpack is maintained.
129
234
 
130
- ### Title pages
235
+ https://elements.heroku.com/buildpacks/heroku/heroku-buildpack-google-chrome
131
236
 
132
- ### Page sizes in CSS
237
+ This buildpack installs only Chrome, which is all that is needed, but it is deprecated.
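`prefer_css_page_size` defaults to `true` and is sent to Chrome as `preferCSSPageSize`. This means the page size and orientation can come from a CSS `@page` rule. The sketch below builds such a document. The `html_with_page_css` helper is hypothetical. Whether Chrome also applies the `@page` margin may depend on the margin parameters passed alongside it.

```ruby
# Hypothetical helper: wrap body HTML in a document whose CSS @page rule sets
# the paper size, orientation and page margin (CSS Paged Media).
def html_with_page_css(body, size: "A4", orientation: "portrait", margin: "2cm")
  <<~HTML
    <html>
      <head>
        <style>
          @page { size: #{size} #{orientation}; margin: #{margin}; }
        </style>
      </head>
      <body>#{body}</body>
    </html>
  HTML
end

html = html_with_page_css("<h1>Quarterly report</h1>", orientation: "landscape")

# With palapala_pdf installed and Chrome reachable, the @page size is used
# because prefer_css_page_size defaults to true:
# require "palapala"
# Palapala::PDF.new(html).save("landscape.pdf")
```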
data/lib/palapala/pdf.rb CHANGED
@@ -1,23 +1,18 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'ferrum'
4
-
5
3
  module Palapala
6
4
  # Page class to generate PDF from HTML content using Chrome in headless mode in a thread-safe way
7
- class Pdf
5
+ # @param page_ranges Empty string means all pages, e.g., "1-3, 5, 7-9"
6
+ class PDF
8
7
  def initialize(content = nil,
9
- url: nil,
10
- path: nil,
11
8
  header_html: nil,
12
9
  footer_html: nil,
13
- generate_tagged_pdf: false,
14
- prefer_css_page_size: true,
10
+ generate_tagged_pdf: Palapala.defaults.fetch(:generate_tagged_pdf, false),
11
+ prefer_css_page_size: Palapala.defaults.fetch(:prefer_css_page_size, true),
15
12
  scale: Palapala.defaults.fetch(:scale, 1),
16
- page_ranges: Palapala.defaults.fetch(:page_ranges, ''),
13
+ page_ranges: Palapala.defaults.fetch(:page_ranges, nil),
17
14
  margin: Palapala.defaults.fetch(:margin, {}))
18
15
  @content = content
19
- @url = url
20
- @path = path
21
16
  @header_html = header_html
22
17
  @footer_html = footer_html
23
18
  @generate_tagged_pdf = generate_tagged_pdf
@@ -27,82 +22,45 @@ module Palapala
27
22
  @margin = margin
28
23
  end
29
24
 
30
- def pdf(**opts)
31
- browser_context = browser.contexts.create
32
- browser_page = browser_context.page
33
- # # output console logs for this page
34
- if opts[:debug]
35
- browser_page.on('Runtime.consoleAPICalled') do |params|
36
- params['args'].each { |r| puts(r['value']) }
37
- end
38
- end
39
- # open the page
40
- url = @url || data_url
41
- browser_page.go_to(url)
42
- # Wait for the page to load
43
- browser_page.network.wait_for_idle
44
- # Generate PDF
45
- pdf_binary_data = browser_page.pdf(**opts_with_defaults.merge(opts))
46
- # Dispose the context
47
- browser_context.dispose
48
- # Return the PDF data
49
- pdf_binary_data
50
- end
51
-
52
25
  def binary_data(**opts)
53
26
  pdf(**opts)
54
27
  end
55
28
 
56
29
  def save(path, **opts)
57
- pdf(path:, **opts)
30
+ File.binwrite(path, pdf(**opts))
58
31
  end
59
32
 
60
33
  private
61
34
 
62
- def data_url
63
- encoded_html = Base64.strict_encode64(@content)
64
- "data:text/html;base64,#{encoded_html}"
35
+ def renderer
36
+ Thread.current[:renderer] ||= Renderer.new
37
+ end
38
+
39
+ def pdf(**opts)
40
+ renderer.html_to_pdf(@content, params: opts_with_defaults.merge(opts))
65
41
  end
66
42
 
67
43
  def opts_with_defaults
68
44
  opts = { scale: @scale,
69
45
  printBackground: true,
70
46
  displayHeaderFooter: !(@header_html.nil? && @footer_html.nil?),
71
- pageRanges: @page_ranges, # Empty string means all pages, e.g., "1-3, 5, 7-9"
72
47
  encoding: :binary,
73
- preferCSSPageSize: true,
74
- headerTemplate: @header_html || '',
75
- footerTemplate: @footer_html || '' }
48
+ preferCSSPageSize: @prefer_css_page_size }
76
49
 
50
+ opts[:headerTemplate] = @header_html unless @header_html.nil?
51
+ opts[:footerTemplate] = @footer_html unless @footer_html.nil?
52
+ opts[:pageRanges] = @page_ranges unless @page_ranges.nil?
77
53
  opts[:path] = @path unless @path.nil?
78
54
  opts[:generateTaggedPDF] = @generate_tagged_pdf unless @generate_tagged_pdf.nil?
79
55
  opts[:format] = @format unless @format.nil?
80
- opts[:paperWidth] = @paper_width unless @paper_width.nil?
81
- opts[:paperHeight] = @paper_height unless @paper_height.nil?
56
+ # opts[:paperWidth] = @paper_width unless @paper_width.nil?
57
+ # opts[:paperHeight] = @paper_height unless @paper_height.nil?
82
58
  opts[:landscape] = @landscape unless @landscape.nil?
83
59
  opts[:marginTop] = @margin[:top] unless @margin[:top].nil?
84
60
  opts[:marginLeft] = @margin[:left] unless @margin[:left].nil?
85
61
  opts[:marginBottom] = @margin[:bottom] unless @margin[:bottom].nil?
86
62
  opts[:marginRight] = @margin[:right] unless @margin[:right].nil?
87
-
88
63
  opts
89
64
  end
90
-
91
- def browser
92
- # accordng to the docs ferrum is thread safe, however, under heavy load
93
- # we are seeing some issues, so we are using thread locals to have a
94
- # browser per thread
95
- Thread.current[:browser] ||= new_browser
96
- # @@browser ||= new_browser
97
- end
98
-
99
- def new_browser
100
- Ferrum::Browser.new(Palapala.ferrum_opts)
101
- end
102
-
103
- # # TODO use method from template class
104
- # def cm_to_inches(value)
105
- # value / 2.54
106
- # end
107
65
  end
108
66
  end
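`opts_with_defaults` copies the `margin` values unchanged into `marginTop`, `marginLeft`, `marginBottom` and `marginRight`. The DevTools protocol documents these parameters as numbers in inches, so CSS-style strings such as `"2cm"` would need converting first. The removed Ferrum-era code still carried a commented-out `cm_to_inches` TODO. The sketch below shows one possible conversion. The `to_inches` and `margin_params` helpers and the set of accepted units are assumptions; they are not part of the gem.

```ruby
# Hypothetical conversion table: how many of each CSS unit make one inch.
UNITS_PER_INCH = { "in" => 1.0, "cm" => 2.54, "mm" => 25.4, "pt" => 72.0, "px" => 96.0 }.freeze

# Convert a CSS-style length ("2cm", "15mm", "0.5in") or a bare number
# (taken to be inches already) into inches for Page.printToPDF.
def to_inches(value)
  return value.to_f if value.is_a?(Numeric)

  match = /\A\s*(\d+(?:\.\d+)?)\s*(in|cm|mm|pt|px)\s*\z/.match(value.to_s)
  raise ArgumentError, "unsupported length: #{value.inspect}" unless match

  match[1].to_f / UNITS_PER_INCH.fetch(match[2])
end

# Map a palapala-style margin hash onto the CDP parameter names that
# opts_with_defaults uses, converting each given value to inches.
def margin_params(margin)
  { top: :marginTop, left: :marginLeft, bottom: :marginBottom, right: :marginRight }
    .each_with_object({}) do |(key, param), params|
      params[param] = to_inches(margin[key]) unless margin[key].nil?
    end
end

# Both values come out at roughly 0.787 inches.
margin_params(top: "2cm", bottom: "20mm")
```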
@@ -0,0 +1,198 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "json"
4
+ require "net/http"
5
+ require "websocket/driver"
6
+
7
+ module Palapala
8
+ # Render HTML content to PDF using Chrome in headless mode with minimal dependencies
9
+ class Renderer
10
+ def initialize
11
+ # Create an instance of WebSocketClient with the WebSocket URL
12
+ @client = Palapala::WebSocketClient.new(websocket_url)
13
+ # Create the WebSocket driver
14
+ @driver = WebSocket::Driver.client(@client)
15
+ # Register the on_message callback
16
+ @driver.on(:message, &method(:on_message))
17
+ # Start the WebSocket handshake
18
+ @driver.start
19
+ # Initialize the protocol to get the page events
20
+ send_command_and_wait_for_result("Page.enable")
21
+ end
22
+
23
+ # Callback to handle the incomming WebSocket messages
24
+ def on_message(e)
25
+ puts "Received: #{e.data[0..64]}" if Palapala.debug
26
+ @response = JSON.parse(e.data) # Parse the JSON response
27
+ end
28
+
29
+ # Update the current ID to the next ID (increment by 1)
30
+ def next_id = @id = (@id || 0) + 1
31
+
32
+ # Get the current ID
33
+ def current_id = @id
34
+
35
+ # Process the WebSocket messages until some state is true
36
+ def process_until(&block)
37
+ loop do
38
+ @driver.parse(@client.read)
39
+ return if block.call
40
+ return if @driver.state == :closed
41
+ end
42
+ end
43
+
44
+ # Method to send a message (text) and wait for a response
45
+ def send_and_wait(message, &)
46
+ puts "\nSending: #{message}" if Palapala.debug
47
+ @driver.text(message)
48
+ process_until(&)
49
+ end
50
+
51
+ # Method to send a CDP command and wait for some state to be true
52
+ def send_command(method, params: {}, &block)
53
+ send_and_wait(JSON.generate({ id: next_id, method:, params: }), &block)
54
+ end
55
+
56
+ # Method to send a CDP command and wait for the matching event to get the result
57
+ # @return [Hash] The result of the command
58
+ def send_command_and_wait_for_result(method, params: {})
59
+ send_command(method, params:) do
60
+ @response && @response["id"] == current_id
61
+ end
62
+ @response["result"]
63
+ end
64
+
65
+ # Method to send a CDP command and wait for a specific method to be called
66
+ def send_command_and_wait_for_event(method, event_name:, params: {})
67
+ send_command(method, params:) do
68
+ @response && @response["method"] == event_name
69
+ end
70
+ end
71
+
72
+ # Convert HTML content to PDF
73
+ # See https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF
74
+ # @param html [String] The HTML content to convert to PDF
75
+ # @param params [Hash] Additional parameters to pass to the CDP command
76
+ def html_to_pdf(html, params: {})
77
+ send_command_and_wait_for_event("Page.navigate", params: { url: data_url_for_html(html) },
78
+ event_name: "Page.frameStoppedLoading")
79
+ result = send_command_and_wait_for_result("Page.printToPDF", params:)
80
+ Base64.decode64(result["data"])
81
+ end
82
+
83
+ def close
84
+ @driver.close
85
+ @client.close
86
+ end
87
+
88
+ private
89
+
90
+ def data_url_for_html(html)
91
+ "data:text/html;base64,#{Base64.strict_encode64(html)}"
92
+ end
93
+
94
+ # Open a new tab in the remote chrome and return the WebSocket URL
95
+ def websocket_url
96
+ ChromeProcess.spawn_chrome
97
+ uri = URI("#{Palapala.headless_chrome_url}/json/new")
98
+ http = Net::HTTP.new(uri.host, uri.port)
99
+ request = Net::HTTP::Put.new(uri)
100
+ request['Content-Type'] = 'application/json'
101
+ response = http.request(request)
102
+ tab_info = JSON.parse(response.body)
103
+ websocket_url = tab_info["webSocketDebuggerUrl"]
104
+ puts "WebSocket URL: #{websocket_url}" if Palapala.debug
105
+ websocket_url
106
+ end
107
+
108
+ # Manage the Chrome child process
109
+ module ChromeProcess
110
+ def self.port_in_use?(port = 9222, host = "127.0.0.1")
111
+ server = TCPServer.new(host, port)
112
+ server.close
113
+ false
114
+ rescue Errno::EADDRINUSE
115
+ true
116
+ end
117
+
118
+ def self.chrome_process_healthy?
119
+ return false if @chrome_process_id.nil?
120
+
121
+ begin
122
+ Process.kill(0, @chrome_process_id) # Check if the process is alive
123
+ true
124
+ rescue Errno::ESRCH, Errno::EPERM
125
+ false
126
+ end
127
+ end
128
+
129
+ def self.kill_chrome
130
+ return if @chrome_process_id.nil?
131
+
132
+ Process.kill("KILL", @chrome_process_id) # Kill the process
133
+ Process.wait(@chrome_process_id) # Wait for the process to finish
134
+ end
135
+
136
+ def self.chrome_path
137
+ return Palapala.headless_chrome_path if Palapala.headless_chrome_path
138
+
139
+ case RbConfig::CONFIG["host_os"]
140
+ when /darwin/
141
+ "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
142
+ when /linux/
143
+ "/usr/bin/google-chrome" # or "/usr/bin/chromium-browser"
144
+ when /win|mingw|cygwin/
145
+ "#{ENV["ProgramFiles(x86)"]}\\Google\\Chrome\\Application\\chrome.exe"
146
+ else
147
+ raise "Unsupported OS"
148
+ end
149
+ end
150
+
151
+ def self.spawn_chrome
152
+ return if port_in_use?
153
+ return if chrome_process_healthy?
154
+
155
+ # Define the path and parameters separately
156
+ # chrome_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
157
+ params = ["--headless", "--disable-gpu", "--remote-debugging-port=9222"]
158
+
159
+ # Spawn the process with the path and parameters
160
+ @chrome_process_id = Process.spawn(chrome_path, *params)
161
+
162
+ # Wait until the port is in use
163
+ until port_in_use?
164
+ sleep 0.1
165
+ end
166
+ # Detach the process so it runs in the background
167
+ Process.detach(@chrome_process_id)
168
+
169
+ at_exit do
170
+ if @chrome_process_id
171
+ begin
172
+ Process.kill("TERM", @chrome_process_id)
173
+ Process.wait(@chrome_process_id)
174
+ puts "Child process #{@chrome_process_id} terminated."
175
+ rescue Errno::ESRCH
176
+ puts "Child process #{@chrome_process_id} is already terminated."
177
+ rescue Errno::ECHILD
178
+ puts "No child process #{@chrome_process_id} found."
179
+ end
180
+ end
181
+ end
182
+
183
+ # Handle when the process is killed
184
+ trap("SIGCHLD") do
185
+ while (@chrome_process_id = Process.wait(-1, Process::WNOHANG))
186
+ break if @chrome_process_id.nil?
187
+
188
+ puts "Process #{@chrome_process_id} was killed."
189
+ # Handle the error or restart the process if necessary
190
+ @chrome_process_id = nil
191
+ end
192
+ rescue Errno::ECHILD
193
+ @chrome_process_id = nil
194
+ end
195
+ end
196
+ end
197
+ end
198
+ end
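`Renderer` speaks the Chrome DevTools Protocol over the web socket. Each command is a JSON object with an increasing `id`, a `method` and `params`. The reply echoes the `id` and carries either a `result` or an `error` object. Events such as `Page.frameStoppedLoading` carry a `method` but no `id`.

`send_command_and_wait_for_result` keeps only the last parsed message in `@response`. As a result, an error reply makes it return `nil`, which `html_to_pdf` then indexes with `["data"]`.

The sketch below shows the same correlation without a socket, keeping replies and events apart. The `CdpSession` class is hypothetical and not part of the gem.

```ruby
require "json"

# Hypothetical, transport-free sketch of Chrome DevTools Protocol correlation:
# commands carry an increasing id, replies echo that id, events have no id.
class CdpSession
  attr_reader :events

  def initialize
    @last_id = 0
    @replies = {}
    @events = []
  end

  # Returns [id, json_text]; the text is what would be sent as a WebSocket
  # text frame (the gem's Renderer does this with @driver.text).
  def command(method, params = {})
    @last_id += 1
    [@last_id, JSON.generate({ id: @last_id, method: method, params: params })]
  end

  # Handles one incoming text frame: replies are indexed by id, events queued.
  def receive(text)
    message = JSON.parse(text)
    if message.key?("id")
      @replies[message["id"]] = message
    else
      @events << message
    end
  end

  # Returns the "result" hash for a command id, or nil if no reply arrived
  # yet; raises when Chrome answered with an "error" object instead.
  def result_for(id)
    reply = @replies[id]
    return nil if reply.nil?
    raise "CDP error #{reply["error"]["code"]}: #{reply["error"]["message"]}" if reply.key?("error")

    reply["result"]
  end
end

session = CdpSession.new
id, _frame = session.command("Page.printToPDF", { printBackground: true })
session.receive(%({"id":#{id},"result":{"data":"JVBERi0="}}))
```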
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Palapala
4
- VERSION = '0.1.1'
4
+ VERSION = '0.1.3'
5
5
  end
@@ -0,0 +1,29 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'uri'
4
+ require 'socket'
5
+
6
+ module Palapala
7
+ # Create a socket wrapper that conforms to what the websocket-driver expects
8
+ class WebSocketClient
9
+ attr_reader :url
10
+
11
+ def initialize(url)
12
+ @url = url
13
+ uri = URI.parse(url)
14
+ @socket = TCPSocket.new(uri.host, uri.port)
15
+ end
16
+
17
+ def write(data)
18
+ @socket.write(data)
19
+ end
20
+
21
+ def read
22
+ @socket.readpartial(1024)
23
+ end
24
+
25
+ def close
26
+ @socket.close
27
+ end
28
+ end
29
+ end
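Each `Renderer` owns one `WebSocketClient`, which means one TCP connection and one Chrome tab. `PDF#renderer` caches that renderer in `Thread.current[:renderer]`. Note that `Thread#[]` is fiber-local in Ruby, so every Fiber (not just every Thread) gets its own renderer.

The sketch below shows the caching pattern. A hypothetical `FakeRenderer` stands in for `Palapala::Renderer` so the code runs without Chrome.

```ruby
# Hypothetical stand-in for Palapala::Renderer: it only remembers which
# thread created it, so the caching behaviour can be observed without Chrome.
class FakeRenderer
  attr_reader :owner

  def initialize
    @owner = Thread.current
  end
end

# Same pattern as PDF#renderer: lazily create one renderer per thread
# (strictly, per fiber, because Thread#[] is fiber-local) and reuse it.
def renderer
  Thread.current[:renderer] ||= FakeRenderer.new
end

# Four threads each fetch the renderer twice.
pairs = Array.new(4) { Thread.new { [renderer, renderer] } }.map(&:value)
```

Nothing in this diff calls `Renderer#close`. With many short-lived threads, it may therefore be worth closing the renderer explicitly.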
data/lib/palapala.rb CHANGED
@@ -1,6 +1,8 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  require_relative 'palapala/pdf'
4
+ require_relative 'palapala/web_socket_client'
5
+ require_relative 'palapala/renderer'
4
6
 
5
7
  # Main module for the gem
6
8
  module Palapala
@@ -8,19 +10,12 @@ module Palapala
8
10
  yield self
9
11
  end
10
12
 
11
- def self.ferrum_opts=(ferrum_opts)
12
- @ferrum_opts = ferrum_opts
13
+ class << self
14
+ attr_accessor :defaults, :debug, :headless_chrome_url, :headless_chrome_path
13
15
  end
14
16
 
15
- def self.ferrum_opts
16
- @ferrum_opts
17
- end
18
-
19
- def self.defaults=(defaults)
20
- @defaults = defaults
21
- end
22
-
23
- def self.defaults
24
- @defaults ||= {}
25
- end
17
+ self.headless_chrome_url = 'http://localhost:9222'
18
+ self.headless_chrome_path = nil
19
+ self.defaults = {}
20
+ self.debug = false
26
21
  end
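`Palapala.defaults` is a plain hash. `PDF#initialize` reads `scale`, `page_ranges`, `margin`, `generate_tagged_pdf` and `prefer_css_page_size` from it with `fetch`. The README's example default `format: :A4` has no counterpart there. `Page.printToPDF` itself also has no `format` parameter: paper size is given as `paperWidth` and `paperHeight` in inches.

The sketch below translates a format name into those two parameters. The `paper_params` helper and its format table are hypothetical.

```ruby
# Hypothetical table of paper formats in millimetres (ISO 216 A-series and
# common US sizes); Page.printToPDF itself only knows paperWidth/paperHeight.
PAPER_FORMATS_MM = {
  A3: [297, 420],
  A4: [210, 297],
  A5: [148, 210],
  Letter: [215.9, 279.4],
  Legal: [215.9, 355.6]
}.freeze

MM_PER_INCH = 25.4

# Translate a format name such as :A4 or "Letter" into paperWidth/paperHeight
# in inches (rounded to 2 decimals), swapping the sides for landscape.
def paper_params(format, landscape: false)
  width, height = PAPER_FORMATS_MM.fetch(format.to_sym) do
    raise ArgumentError, "unknown paper format: #{format.inspect}"
  end
  width, height = height, width if landscape
  { paperWidth: (width / MM_PER_INCH).round(2), paperHeight: (height / MM_PER_INCH).round(2) }
end

a4 = paper_params(:A4)
```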
data/palapala_pdf.gemspec CHANGED
@@ -9,7 +9,7 @@ Gem::Specification.new do |spec|
9
9
  spec.email = ['github.com@handekyn.com']
10
10
 
11
11
  spec.summary = 'Convert HTML into PDF directly from Ruby using Chrome/Chromium.'
12
- spec.description = 'This gem uses Ferrum to render HTML into a PDF using Chrom(e)(ium) with minimal dependencies.'
12
+ spec.description = 'This gem uses raw web sockets to render HTML into a PDF using Chrom(e)(ium) with minimal dependencies.'
13
13
  spec.homepage = 'https://github.com/palapala-app/palapala_pdf'
14
14
  spec.required_ruby_version = '>= 3.1'
15
15
  spec.license = 'MIT'
@@ -34,7 +34,8 @@ Gem::Specification.new do |spec|
34
34
  spec.require_paths = ['lib']
35
35
 
36
36
  # Uncomment to register a new dependency of your gem
37
- spec.add_dependency 'ferrum', '~> 0.15'
37
+ spec.add_dependency 'base64', '~> 0'
38
+ spec.add_dependency 'websocket-driver', '~> 0'
38
39
 
39
40
  # For more information and examples about making a new gem, check out our
40
41
  # guide at: https://bundler.io/guides/creating_gem.html
metadata CHANGED
@@ -1,31 +1,45 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: palapala_pdf
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.1
4
+ version: 0.1.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Koen Handekyn
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2024-08-23 00:00:00.000000000 Z
11
+ date: 2024-08-27 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
- name: ferrum
14
+ name: base64
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
17
  - - "~>"
18
18
  - !ruby/object:Gem::Version
19
- version: '0.15'
19
+ version: '0'
20
20
  type: :runtime
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
24
  - - "~>"
25
25
  - !ruby/object:Gem::Version
26
- version: '0.15'
27
- description: This gem uses Ferrum to render HTML into a PDF using Chrom(e)(ium) with
28
- minimal dependencies.
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: websocket-driver
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ description: This gem uses raw web sockets to render HTML into a PDF using Chrom(e)(ium)
42
+ with minimal dependencies.
29
43
  email:
30
44
  - github.com@handekyn.com
31
45
  executables: []
@@ -40,7 +54,9 @@ files:
40
54
  - Rakefile
41
55
  - lib/palapala.rb
42
56
  - lib/palapala/pdf.rb
57
+ - lib/palapala/renderer.rb
43
58
  - lib/palapala/version.rb
59
+ - lib/palapala/web_socket_client.rb
44
60
  - palapala_pdf.gemspec
45
61
  homepage: https://github.com/palapala-app/palapala_pdf
46
62
  licenses: