palapala_pdf 0.1.24 → 0.1.26
- checksums.yaml +4 -4
- data/README.md +45 -16
- data/bin/chrome-headless-server +1 -1
- data/examples/performance_benchmark.rb +21 -3
- data/examples/test_large_file.rb +33 -0
- data/lib/palapala/asset_helper.rb +66 -0
- data/lib/palapala/html_preprocessor.rb +33 -0
- data/lib/palapala/persistent_server.rb +104 -0
- data/lib/palapala/railtie.rb +24 -0
- data/lib/palapala/renderer.rb +85 -20
- data/lib/palapala/version.rb +1 -1
- data/lib/palapala.rb +11 -1
- data/paged_css.pdf +0 -0
- data/palapala_pdf.gemspec +1 -0
- metadata +21 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: cf35b002c0b5f5b0f33c5bb3513e8127180c3ceaf4855452e891b9b5833e4b04
|
4
|
+
data.tar.gz: de883c2a261e227e427d5a29efeaffc3b18fb7285e5eb9c65e529e1554d40378
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 4803547a2761565e67a17bb2770d4e2e6a79f01e55ace46101a438450c729ad97a399245891bf791a84135c41e7f5c33c89809097ca5375e1850187d82871580
|
7
|
+
data.tar.gz: 484fc20cc82ce043cdeeef636faec03f261ba7e7e3cdc7d2c40e1a042250510bda84ccac1e54be5b3fadc48f7817e0376a98b25b6a58b4b7e52efd91e34a6697
|
data/README.md
CHANGED
@@ -40,8 +40,40 @@ bundle add palapala_pdf
|
|
40
40
|
|
41
41
|
## Usage Instructions
|
42
42
|
|
43
|
+
The gem can be used in a Rails context as well as in a plain Ruby context.
|
44
|
+
|
45
|
+
**Create a PDF from HTML**
|
46
|
+
|
47
|
+
Load palapala and create a PDF file from an HTML snippet:
|
48
|
+
|
49
|
+
```ruby
|
50
|
+
require "palapala"
|
51
|
+
Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
|
52
|
+
```
|
53
|
+
|
54
|
+
Instantiate a new Palapala::Pdf object with your HTML content and generate the PDF binary data:
|
55
|
+
|
56
|
+
```ruby
|
57
|
+
require "palapala"
|
58
|
+
binary_data = Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
|
59
|
+
```
|
60
|
+
|
43
61
|
To create a PDF from HTML content using the `Palapala` library, follow these steps:
|
44
62
|
|
63
|
+
**Render PDFs from Rails controllers**
|
64
|
+
|
65
|
+
Use the `pdf` renderer in Rails controllers to render a PDF from the current action.
|
66
|
+
Inspired by Chris Oliver's code shared at RailsWorld2025.
|
67
|
+
|
68
|
+
```ruby
|
69
|
+
def show
|
70
|
+
respond_to do |format|
|
71
|
+
format.html
|
72
|
+
format.pdf { render pdf: {}, disposition: :inline, filename: "example.pdf" }
|
73
|
+
end
|
74
|
+
end
|
75
|
+
```
|
76
|
+
|
45
77
|
**Configuration from inside Ruby**
|
46
78
|
|
47
79
|
Configure the `Palapala` library with the necessary options, such as the URL for the browser and default settings like scale and format.
|
@@ -79,22 +111,6 @@ HEADLESS_CHROME_URL=http://192.168.1.1:9222 ruby examples/performance_benchmark.
|
|
79
111
|
HEADLESS_CHROME_PATH=/var/to/chrome ruby examples/performance_benchmark.rb
|
80
112
|
```
|
81
113
|
|
82
|
-
**Create a PDF from HTML**
|
83
|
-
|
84
|
-
Load palapala and create a PDF file from an HTML snippet:
|
85
|
-
|
86
|
-
```ruby
|
87
|
-
require "palapala"
|
88
|
-
Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
|
89
|
-
```
|
90
|
-
|
91
|
-
Instantiate a new Palapala::Pdf object with your HTML content and generate the PDF binary data:
|
92
|
-
|
93
|
-
```ruby
|
94
|
-
require "palapala"
|
95
|
-
binary_data = Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").binary_data
|
96
|
-
```
|
97
|
-
|
98
114
|
## Advanced Examples
|
99
115
|
|
100
116
|
- headers and footers
|
@@ -230,6 +246,15 @@ $CHROME_PATH --disable-gpu --remote-debugging-port=9222 --disable-software-raste
|
|
230
246
|
|
231
247
|
*It has also been reported that the Chrome process repeatedly crashes when running inside a Docker container on an M1 Mac. Chrome should work as expected when deployed to a Docker container on a non-M1 Mac.*
|
232
248
|
|
249
|
+
The gem comes with an executable `chrome-headless-server` that installs headless Chrome (using npx) and runs it, so once the gem is installed you can simply run:
|
250
|
+
|
251
|
+
```
|
252
|
+
> chrome-headless-server
|
253
|
+
Installing/launching chrome-headless-shell@stable
|
254
|
+
Launching chrome-headless-shell at chrome-headless-shell/mac_arm-140.0.7339.80/chrome-headless-shell-mac-arm64/chrome-headless-shell
|
255
|
+
Google Chrome for Testing 140.0.7339.80
|
256
|
+
DevTools listening on ws://127.0.0.1:9222/devtools/browser/e2565da0-8bf0-45f0-9cfc-86211db70a99
|
257
|
+
```
|
233
258
|
|
234
259
|
## Thread-safety
|
235
260
|
|
@@ -278,3 +303,7 @@ Ensure the script is executable
|
|
278
303
|
```sh
|
279
304
|
chmod +x bin/start
|
280
305
|
```
|
306
|
+
|
307
|
+
# REFERENCES
|
308
|
+
|
309
|
+
Info to process : https://nathanfriend.com/2019/04/15/pdf-gotchas-with-headless-chrome.html
|
data/bin/chrome-headless-server
CHANGED
data/examples/performance_benchmark.rb
CHANGED
@@ -12,12 +12,16 @@ Palapala.debug = $debug
|
|
12
12
|
|
13
13
|
# @param concurrency Number of concurrent threads
|
14
14
|
# @param iterations Number of iterations per thread
|
15
|
-
def benchmark(concurrency, iterations)
|
15
|
+
def benchmark(concurrency, iterations, html_size: 1)
|
16
16
|
time = Benchmark.realtime do
|
17
17
|
threads = (1..concurrency).map do |i|
|
18
18
|
Thread.new do
|
19
19
|
iterations.times do |j|
|
20
|
-
doc = "Hello #{i}, world #{j}!
|
20
|
+
# doc = "Hello #{i}, <b>world</b> #{j}! <i>#{Time.now}</i>. 00001: 0123456789 The quick brown fox jumps over the lazy dog.\n"
|
21
|
+
doc = "Hello #{i}, world #{j}! <i>#{Time.now}</i>. 00001: 0123456789 The quick brown fox jumps over the lazy dog.\n"
|
22
|
+
# repeat doc until it's bigger than html_size
|
23
|
+
doc *= (html_size / doc.bytesize) + 1
|
24
|
+
doc = "<html><body><pre>#{doc}</pre></body></html>"
|
21
25
|
pdf = Palapala::Pdf.new(doc)
|
22
26
|
$save ? pdf.save("tmp/benchmark_#{i}_#{j}.pdf") : pdf.binary_data
|
23
27
|
end
|
@@ -34,4 +38,18 @@ benchmark(1, 5)
|
|
34
38
|
puts "Starting benchmark..."
|
35
39
|
benchmark(1, 10)
|
36
40
|
benchmark(2, 20 / 2)
|
37
|
-
benchmark
|
41
|
+
puts "Starting benchmark step 2 (small docs)..."
|
42
|
+
benchmark(2, 20)
|
43
|
+
benchmark(2, 40)
|
44
|
+
benchmark(2, 80)
|
45
|
+
benchmark(4, 40)
|
46
|
+
benchmark(8, 20)
|
47
|
+
puts "Starting benchmark step 3 (medium docs)..."
|
48
|
+
benchmark(2, 20, html_size: 10_000)
|
49
|
+
benchmark(2, 40, html_size: 10_000)
|
50
|
+
benchmark(2, 80, html_size: 10_000)
|
51
|
+
benchmark(4, 40, html_size: 10_000)
|
52
|
+
benchmark(8, 20, html_size: 10_000)
|
53
|
+
puts "Starting benchmark with html size 2 000 000 ..."
|
54
|
+
# benchmark(1, 1, html_size: 1_500_000)
|
55
|
+
benchmark(1, 1, html_size: 2_000_000)
|
data/examples/test_large_file.rb
ADDED
@@ -0,0 +1,33 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
# Test script to debug large file processing
|
3
|
+
|
4
|
+
$LOAD_PATH.unshift File.expand_path('../lib', __dir__)
|
5
|
+
|
6
|
+
require 'benchmark'
|
7
|
+
require 'palapala'
|
8
|
+
|
9
|
+
# Enable debug logging
|
10
|
+
Palapala.debug = true
|
11
|
+
|
12
|
+
puts "Testing large file processing..."
|
13
|
+
|
14
|
+
# Generate a large HTML file (> 2MB)
|
15
|
+
doc = "Hello, world! <i>#{Time.now}</i>. 00001: 0123456789 The quick brown fox jumps over the lazy dog.\n"
|
16
|
+
# repeat doc until it's bigger than the target size (2_000_000 bytes)
|
17
|
+
doc *= (2_000_000 / doc.bytesize) + 1
|
18
|
+
# doc *= (10_000 / doc.bytesize) + 1
|
19
|
+
large_html = "<html><body><pre>#{doc}</pre></body></html>"
|
20
|
+
|
21
|
+
# save the generated file as large_file.html
|
22
|
+
# File.write("large_file.html", large_html)
|
23
|
+
|
24
|
+
puts "Generated HTML: #{large_html.bytesize} bytes"
|
25
|
+
|
26
|
+
begin
|
27
|
+
puts "Starting PDF generation..."
|
28
|
+
pdf_data = Palapala::Renderer.html_to_pdf(large_html)
|
29
|
+
puts "Success! Generated PDF: #{pdf_data.length} bytes"
|
30
|
+
rescue => e
|
31
|
+
puts "Error: #{e.message}"
|
32
|
+
puts "Backtrace: #{e.backtrace.first(5).join("\n")}"
|
33
|
+
end
|
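The sizing line used in both example scripts, `doc *= (size / doc.bytesize) + 1`, repeats the seed string rather than doubling it. The arithmetic can be sketched in plain Ruby as follows; `pad_to_size` is a hypothetical helper name used only for illustration and is not part of the gem.

```ruby
# Hypothetical helper (not part of palapala_pdf): repeat `line` enough times
# that the result is strictly larger than `target_bytes`.
def pad_to_size(line, target_bytes)
  repeats = (target_bytes / line.bytesize) + 1 # integer division, then one extra copy
  line * repeats
end

seed = "0123456789 The quick brown fox jumps over the lazy dog.\n"
doc = pad_to_size(seed, 2_000_000)
puts doc.bytesize > 2_000_000 # the extra copy pushes the size past the target
```

Because of the extra copy, the result always exceeds the target by at most one seed length, which is what lets the script cross the 2 MB limit it is meant to exercise.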
data/lib/palapala/asset_helper.rb
ADDED
@@ -0,0 +1,66 @@
|
|
1
|
+
# sourced from ferrum_pdf (modified)
|
2
|
+
module Palapala
|
3
|
+
class BaseAsset
|
4
|
+
def initialize(asset)
|
5
|
+
@asset = asset
|
6
|
+
end
|
7
|
+
end
|
8
|
+
|
9
|
+
class PropshaftAsset < BaseAsset
|
10
|
+
def content_type
|
11
|
+
@asset.content_type.to_s
|
12
|
+
end
|
13
|
+
|
14
|
+
def content
|
15
|
+
@asset.content
|
16
|
+
end
|
17
|
+
end
|
18
|
+
|
19
|
+
class SprocketsAsset < BaseAsset
|
20
|
+
def content_type
|
21
|
+
@asset.content_type
|
22
|
+
end
|
23
|
+
|
24
|
+
def content
|
25
|
+
@asset.source
|
26
|
+
end
|
27
|
+
end
|
28
|
+
|
29
|
+
class AssetFinder
|
30
|
+
class << self
|
31
|
+
def find(path)
|
32
|
+
if Rails.application.assets.respond_to?(:load_path)
|
33
|
+
propshaft_asset(path)
|
34
|
+
elsif Rails.application.assets.respond_to?(:find_asset)
|
35
|
+
sprockets_asset(path)
|
36
|
+
else
|
37
|
+
nil
|
38
|
+
end
|
39
|
+
end
|
40
|
+
|
41
|
+
def propshaft_asset(path)
|
42
|
+
(asset = Rails.application.assets.load_path.find(path)) ? PropshaftAsset.new(asset) : nil
|
43
|
+
end
|
44
|
+
|
45
|
+
def sprockets_asset(path)
|
46
|
+
(asset = Rails.application.assets.find_asset(path)) ? SprocketsAsset.new(asset) : nil
|
47
|
+
end
|
48
|
+
end
|
49
|
+
end
|
50
|
+
|
51
|
+
module AssetsHelper
|
52
|
+
def palapala_pdf_inline_stylesheet(path)
|
53
|
+
(asset = AssetFinder.find(path)) ? "<style>#{asset.content}</style>".html_safe : nil
|
54
|
+
end
|
55
|
+
|
56
|
+
def palapala_pdf_inline_javascript(path)
|
57
|
+
(asset = AssetFinder.find(path)) ? "<script>#{asset.content}</script>".html_safe : nil
|
58
|
+
end
|
59
|
+
|
60
|
+
def palapala_pdf_base64_asset(path)
|
61
|
+
return nil unless (asset = AssetFinder.find(path))
|
62
|
+
|
63
|
+
"data:#{asset.content_type};base64,#{Base64.encode64(asset.content).gsub(/\s+/, "")}"
|
64
|
+
end
|
65
|
+
end
|
66
|
+
end
|
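The `palapala_pdf_base64_asset` helper above builds a `data:` URI from an asset's content type and Base64-encoded bytes. A minimal plain-Ruby sketch of the same construction, without Rails or an asset pipeline, is shown below. The name `data_uri` is hypothetical, and `Array#pack("m0")` is swapped in for `Base64.encode64(...).gsub(/\s+/, "")` since both yield Base64 without line breaks.

```ruby
# Illustrative only: `data_uri` is a hypothetical name, not the gem's helper.
def data_uri(content_type, bytes)
  # pack("m0") emits Base64 with no line breaks, which matches stripping
  # whitespace from Base64.encode64 output.
  "data:#{content_type};base64,#{[bytes].pack('m0')}"
end

puts data_uri("text/css", "body{margin:0}")
```

In a Rails view the gem's own helper would presumably be used inside an `src` or `href` attribute; the sketch only shows how the string is assembled.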
data/lib/palapala/html_preprocessor.rb
ADDED
@@ -0,0 +1,33 @@
|
|
1
|
+
module Palapala
|
2
|
+
# Helper module for preparing HTML for conversion
|
3
|
+
#
|
4
|
+
# Sourced from the PDFKit project (through ferrum_pdf)
|
5
|
+
# @see https://github.com/pdfkit/pdfkit
|
6
|
+
module HTMLPreprocessor
|
7
|
+
# Change relative paths to absolute, and relative protocols to absolute protocols
|
8
|
+
#
|
9
|
+
# process("Some HTML", "https://example.org")
|
10
|
+
#
|
11
|
+
def self.process(html, base_url)
|
12
|
+
return html if base_url.blank?
|
13
|
+
|
14
|
+
base_url += "/" unless base_url.end_with? "/"
|
15
|
+
protocol = base_url.split("://").first
|
16
|
+
html = translate_relative_paths(html, base_url) if base_url
|
17
|
+
html = translate_relative_protocols(html, protocol) if protocol
|
18
|
+
html
|
19
|
+
end
|
20
|
+
|
21
|
+
def self.translate_relative_paths(html, base_url)
|
22
|
+
# Try out this regexp using rubular http://rubular.com/r/hiAxBNX7KE
|
23
|
+
html.gsub(%r{(href|src)=(['"])/([^/"']([^"']*|[^"']*))?['"]}, "\\1=\\2#{base_url}\\3\\2")
|
24
|
+
end
|
25
|
+
private_class_method :translate_relative_paths
|
26
|
+
|
27
|
+
def self.translate_relative_protocols(body, protocol)
|
28
|
+
# Try out this regexp using rubular http://rubular.com/r/0Ohk0wFYxV
|
29
|
+
body.gsub(%r{(href|src)=(['"])//([^"']*|[^"']*)['"]}, "\\1=\\2#{protocol}://\\3\\2")
|
30
|
+
end
|
31
|
+
private_class_method :translate_relative_protocols
|
32
|
+
end
|
33
|
+
end
|
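To see what the two substitutions in `HTMLPreprocessor` do, here is a standalone sketch with equivalent, slightly simplified regular expressions. The function name `absolutize_urls` is hypothetical. Note one deliberate difference: the gem's version calls `blank?`, which comes from ActiveSupport, while this sketch uses `nil?` and `empty?` so that it runs in plain Ruby.

```ruby
# Standalone sketch (plain Ruby, no Rails); the name is illustrative,
# not the gem's API.
def absolutize_urls(html, base_url)
  return html if base_url.nil? || base_url.empty?

  base_url = base_url.end_with?("/") ? base_url : "#{base_url}/"
  protocol = base_url.split("://").first

  # href="/path" or src='/path'  ->  base_url + path
  html = html.gsub(%r{(href|src)=(['"])/([^/"'][^"']*)?['"]}, "\\1=\\2#{base_url}\\3\\2")
  # href="//host/path"           ->  protocol + "://host/path"
  html.gsub(%r{(href|src)=(['"])//([^"']*)['"]}, "\\1=\\2#{protocol}://\\3\\2")
end

html = %q(<link href="/app.css"><script src="//cdn.example.com/x.js"></script>)
puts absolutize_urls(html, "https://example.org")
```

The first pattern skips protocol-relative URLs because the character after the leading slash must not be another slash, so the two passes do not interfere with each other.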
data/lib/palapala/persistent_server.rb
ADDED
@@ -0,0 +1,104 @@
|
|
1
|
+
require 'webrick'
|
2
|
+
require 'securerandom'
|
3
|
+
require 'socket'
|
4
|
+
|
5
|
+
module Palapala
|
6
|
+
# Persistent server that stays running and serves HTML content from memory
|
7
|
+
# Eliminates the overhead of creating/destroying servers for each PDF
|
8
|
+
class PersistentServer
|
9
|
+
@@instance = nil
|
10
|
+
@@files = {}
|
11
|
+
@@mutex = Mutex.new
|
12
|
+
|
13
|
+
def self.instance
|
14
|
+
@@mutex.synchronize do
|
15
|
+
@@instance ||= new
|
16
|
+
end
|
17
|
+
end
|
18
|
+
|
19
|
+
def initialize(port: 9223)
|
20
|
+
@port = find_available_port(port)
|
21
|
+
@server = WEBrick::HTTPServer.new(
|
22
|
+
Port: @port,
|
23
|
+
Logger: WEBrick::Log.new("/dev/null"),
|
24
|
+
AccessLog: []
|
25
|
+
)
|
26
|
+
|
27
|
+
# Custom handler to serve files from memory
|
28
|
+
@server.mount_proc '/file' do |req, res|
|
29
|
+
file_key = req.path.sub('/file/', '')
|
30
|
+
|
31
|
+
@@mutex.synchronize do
|
32
|
+
if @@files.key?(file_key)
|
33
|
+
res.status = 200
|
34
|
+
res['Content-Type'] = 'text/html'
|
35
|
+
res.body = @@files[file_key]
|
36
|
+
else
|
37
|
+
res.status = 404
|
38
|
+
res.body = 'File not found'
|
39
|
+
end
|
40
|
+
end
|
41
|
+
end
|
42
|
+
|
43
|
+
# Start server in background thread
|
44
|
+
@thread = Thread.new { @server.start }
|
45
|
+
|
46
|
+
# Wait for server to be ready
|
47
|
+
sleep 0.1 until @server.status == :Running
|
48
|
+
end
|
49
|
+
|
50
|
+
# Serve HTML content and return URL
|
51
|
+
def serve_html(html)
|
52
|
+
puts "PersistentServer: Serving HTML content (#{html.bytesize} bytes)" if defined?(Palapala) && Palapala.debug
|
53
|
+
key = SecureRandom.hex
|
54
|
+
@@mutex.synchronize do
|
55
|
+
@@files[key] = html
|
56
|
+
puts "PersistentServer: Stored content with key #{key}" if defined?(Palapala) && Palapala.debug
|
57
|
+
end
|
58
|
+
url = "http://localhost:#{@port}/file/#{key}"
|
59
|
+
puts "PersistentServer: Returning URL #{url}" if defined?(Palapala) && Palapala.debug
|
60
|
+
url
|
61
|
+
end
|
62
|
+
|
63
|
+
# Clean up served content
|
64
|
+
def cleanup(key)
|
65
|
+
@@mutex.synchronize do
|
66
|
+
@@files.delete(key)
|
67
|
+
end
|
68
|
+
end
|
69
|
+
|
70
|
+
# Get current port
|
71
|
+
def port
|
72
|
+
@port
|
73
|
+
end
|
74
|
+
|
75
|
+
# Check if server is running
|
76
|
+
def running?
|
77
|
+
@server.status == :Running
|
78
|
+
end
|
79
|
+
|
80
|
+
# Stop the server
|
81
|
+
def stop
|
82
|
+
@server.shutdown
|
83
|
+
@thread.join
|
84
|
+
end
|
85
|
+
|
86
|
+
private
|
87
|
+
|
88
|
+
# Find an available port starting from the preferred port
|
89
|
+
def find_available_port(preferred_port)
|
90
|
+
port = preferred_port
|
91
|
+
loop do
|
92
|
+
begin
|
93
|
+
server = TCPServer.new(port)
|
94
|
+
server.close
|
95
|
+
return port
|
96
|
+
rescue Errno::EADDRINUSE
|
97
|
+
port += 1
|
98
|
+
# Prevent infinite loop - if we can't find a port within 100 attempts, raise
|
99
|
+
raise "Could not find available port starting from #{preferred_port}" if port > preferred_port + 100
|
100
|
+
end
|
101
|
+
end
|
102
|
+
end
|
103
|
+
end
|
104
|
+
end
|
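The core of `PersistentServer` is a mutex-guarded hash keyed by random tokens: content is stored before navigation and deleted after the PDF is produced. The sketch below isolates that part without WEBrick or any networking. `ContentStore` and its method names are hypothetical and chosen only for illustration.

```ruby
require "securerandom"

# Hypothetical stand-in for the in-memory part of the server shown above.
class ContentStore
  def initialize
    @files = {}
    @mutex = Mutex.new
  end

  # Store content under a random key and return the key.
  def put(html)
    key = SecureRandom.hex
    @mutex.synchronize { @files[key] = html }
    key
  end

  # Return the stored content, or nil when the key is unknown.
  def fetch(key)
    @mutex.synchronize { @files[key] }
  end

  # Remove the content once it is no longer needed.
  def delete(key)
    @mutex.synchronize { @files.delete(key) }
  end
end

store = ContentStore.new
key = store.put("<h1>Hello</h1>")
puts store.fetch(key)
store.delete(key)
puts store.fetch(key).inspect
```

Using instance variables instead of class variables, as this sketch does, is a design choice made here for testability; the gem's class keeps its state in `@@files` and `@@mutex`.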
data/lib/palapala/railtie.rb
ADDED
@@ -0,0 +1,24 @@
|
|
1
|
+
module Palapala
|
2
|
+
class Railtie < ::Rails::Railtie
|
3
|
+
initializer "palapala_pdf.assets_helper" do
|
4
|
+
ActiveSupport.on_load(:action_view) do
|
5
|
+
include Palapala::AssetsHelper
|
6
|
+
end
|
7
|
+
end
|
8
|
+
|
9
|
+
initializer "ferrum_pdf.controller" do
|
10
|
+
ActiveSupport.on_load(:action_controller) do
|
11
|
+
# render pdf: { pdf options }, template: "whatever", disposition: :inline, filename: "example.pdf"
|
12
|
+
ActionController.add_renderer :pdf do |pdf_options, options|
|
13
|
+
send_data_options = options.extract!(:disposition, :filename, :status)
|
14
|
+
url = pdf_options.delete(:url)
|
15
|
+
html = render_to_string(**options.with_defaults(formats: [ :html ])) if url.blank?
|
16
|
+
base_url = request.base_url
|
17
|
+
processed_html = Palapala::HTMLPreprocessor.process(html, base_url)
|
18
|
+
pdf = Palapala::Pdf.new(processed_html).binary_data
|
19
|
+
send_data(pdf, **send_data_options.with_defaults(type: :pdf))
|
20
|
+
end
|
21
|
+
end
|
22
|
+
end
|
23
|
+
end
|
24
|
+
end
|
data/lib/palapala/renderer.rb
CHANGED
@@ -3,6 +3,7 @@ require "net/http"
|
|
3
3
|
require "websocket/driver"
|
4
4
|
require_relative "./web_socket_client"
|
5
5
|
require_relative "./chrome_process"
|
6
|
+
require_relative "./persistent_server"
|
6
7
|
require 'tempfile'
|
7
8
|
require 'webrick'
|
8
9
|
|
@@ -44,7 +45,7 @@ module Palapala
|
|
44
45
|
|
45
46
|
# Callback to handle the incoming WebSocket messages
|
46
47
|
def on_message(e)
|
47
|
-
puts "Received: #{e.data[0..
|
48
|
+
puts "Received: #{e.data[0..128]}" if Palapala.debug
|
48
49
|
@response = JSON.parse(e.data) # Parse the JSON response
|
49
50
|
if @response["error"] # Raise an error if the response contains an error
|
50
51
|
raise "#{@response["error"]["message"]}: #{@response["error"]["data"]} (#{@response["error"]["code"]})"
|
@@ -60,15 +61,20 @@ module Palapala
|
|
60
61
|
# Process the WebSocket messages until some state is true
|
61
62
|
def process_until(&block)
|
62
63
|
loop do
|
63
|
-
|
64
|
-
|
65
|
-
|
64
|
+
begin
|
65
|
+
@driver.parse(@client.read)
|
66
|
+
return if block.call
|
67
|
+
return if @driver.state == :closed
|
68
|
+
rescue EOFError => e
|
69
|
+
puts "WebSocket connection lost: #{e.message}" if Palapala.debug
|
70
|
+
raise "Chrome process appears to have died. WebSocket connection lost."
|
71
|
+
end
|
66
72
|
end
|
67
73
|
end
|
68
74
|
|
69
75
|
# Method to send a message (text) and wait for a response
|
70
76
|
def send_and_wait(message, &)
|
71
|
-
puts "\nSending: #{message}" if Palapala.debug
|
77
|
+
puts "\nSending: #{message.to_s[0..128]}" if Palapala.debug
|
72
78
|
@driver.text(message)
|
73
79
|
process_until(&)
|
74
80
|
end
|
@@ -81,19 +87,52 @@ module Palapala
|
|
81
87
|
# Method to send a CDP command and wait for the matching event to get the result
|
82
88
|
# @return [Hash] The result of the command
|
83
89
|
def send_command_and_wait_for_result(method, params: {})
|
90
|
+
puts "Waiting for result of #{method}..." if Palapala.debug
|
84
91
|
send_command(method, params:) do
|
85
92
|
@response && @response["id"] == current_id
|
86
93
|
end
|
94
|
+
puts "Got result for #{method}" if Palapala.debug
|
87
95
|
@response["result"]
|
88
96
|
end
|
89
97
|
|
98
|
+
# Method to send a CDP command and wait for the result with timeout
|
99
|
+
# @return [Hash] The result of the command
|
100
|
+
def send_command_and_wait_for_result_with_timeout(method, params: {}, timeout: 300)
|
101
|
+
puts "Waiting for result of #{method} (timeout: #{timeout}s)..." if Palapala.debug
|
102
|
+
|
103
|
+
result = nil
|
104
|
+
error = nil
|
105
|
+
|
106
|
+
thread = Thread.new do
|
107
|
+
begin
|
108
|
+
send_command(method, params:) do
|
109
|
+
@response && @response["id"] == current_id
|
110
|
+
end
|
111
|
+
result = @response["result"]
|
112
|
+
rescue => e
|
113
|
+
error = e
|
114
|
+
end
|
115
|
+
end
|
116
|
+
|
117
|
+
unless thread.join(timeout)
|
118
|
+
thread.kill
|
119
|
+
raise "Timeout: #{method} took longer than #{timeout} seconds"
|
120
|
+
end
|
121
|
+
|
122
|
+
raise error if error
|
123
|
+
puts "Got result for #{method}" if Palapala.debug
|
124
|
+
result
|
125
|
+
end
|
126
|
+
|
90
127
|
# Method to send a CDP command and wait for a specific method to be called
|
91
128
|
def send_command_and_wait_for_event(method, event_name:, params: {})
|
129
|
+
puts "Waiting for event #{event_name} from #{method}..." if Palapala.debug
|
92
130
|
send_command(method, params:) do
|
93
131
|
# chrome refuses to load pages that are bigger than 2MB and returns a net::ERR_ABORTED error
|
94
132
|
raise "Page cannot be loaded" if @response.dig("result", "errorText") == "net::ERR_ABORTED"
|
95
133
|
@response && @response["method"] == event_name
|
96
134
|
end
|
135
|
+
puts "Got event #{event_name}" if Palapala.debug
|
97
136
|
end
|
98
137
|
|
99
138
|
# Convert HTML content to PDF
|
@@ -101,17 +140,35 @@ module Palapala
|
|
101
140
|
# @param html [String] The HTML content to convert to PDF
|
102
141
|
# @param params [Hash] Additional parameters to pass to the CDP command
|
103
142
|
def html_to_pdf(html, params: {})
|
104
|
-
|
143
|
+
puts "Starting PDF generation for #{html.bytesize} bytes" if Palapala.debug
|
144
|
+
|
145
|
+
# Use data URL for small content (< 2MB after base64 encoding), persistent server for larger content
|
146
|
+
# Base64 encoding increases size by ~33%, so check original size < 1.2MB to be safe
|
147
|
+
if html.bytesize < 1_200_000
|
148
|
+
puts "Using data URL for small content" if Palapala.debug
|
149
|
+
url = data_url_for_html(html)
|
150
|
+
cleanup_key = nil
|
151
|
+
else
|
152
|
+
puts "Using persistent server for large content" if Palapala.debug
|
153
|
+
server = PersistentServer.instance
|
154
|
+
url = server.serve_html(html)
|
155
|
+
cleanup_key = url.split('/').last
|
156
|
+
puts "Served content at URL: #{url}" if Palapala.debug
|
157
|
+
end
|
158
|
+
|
105
159
|
begin
|
106
|
-
|
107
|
-
|
108
|
-
|
109
|
-
|
110
|
-
|
160
|
+
puts "Navigating to URL..." if Palapala.debug
|
161
|
+
send_command_and_wait_for_event("Page.navigate", params: { url: url }, event_name: "Page.frameStoppedLoading")
|
162
|
+
puts "Page loaded, generating PDF..." if Palapala.debug
|
163
|
+
result = send_command_and_wait_for_result_with_timeout("Page.printToPDF", params:)
|
164
|
+
puts "PDF generated, decoding..." if Palapala.debug
|
111
165
|
Base64.decode64(result["data"])
|
112
166
|
ensure
|
113
|
-
|
114
|
-
|
167
|
+
# Clean up served content if using persistent server
|
168
|
+
if cleanup_key
|
169
|
+
puts "Cleaning up content key: #{cleanup_key}" if Palapala.debug
|
170
|
+
PersistentServer.instance.cleanup(cleanup_key)
|
171
|
+
end
|
115
172
|
end
|
116
173
|
end
|
117
174
|
|
@@ -122,7 +179,8 @@ module Palapala
|
|
122
179
|
|
123
180
|
def self.html_to_pdf(html, params: {})
|
124
181
|
thread_local_instance.html_to_pdf(html, params: params)
|
125
|
-
rescue StandardError
|
182
|
+
rescue StandardError => e
|
183
|
+
puts "PDF generation failed: #{e.message}" if Palapala.debug
|
126
184
|
reset # Reset the renderer on error, the websocket connection might be broken
|
127
185
|
thread_local_instance.html_to_pdf(html, params: params) # Retry (once)
|
128
186
|
end
|
@@ -143,19 +201,26 @@ module Palapala
|
|
143
201
|
request = Net::HTTP::Put.new(uri)
|
144
202
|
request["Content-Type"] = "application/json"
|
145
203
|
response = http.request(request)
|
146
|
-
tab_info = JSON.parse(response.body)
|
147
|
-
websocket_url = tab_info["webSocketDebuggerUrl"]
|
148
|
-
puts "WebSocket URL: #{websocket_url}" if Palapala.debug
|
149
|
-
websocket_url
|
150
|
-
end
|
151
204
|
|
152
|
-
|
205
|
+
# Check if response is valid JSON
|
206
|
+
begin
|
207
|
+
tab_info = JSON.parse(response.body)
|
208
|
+
websocket_url = tab_info["webSocketDebuggerUrl"]
|
209
|
+
puts "WebSocket URL: #{websocket_url}" if Palapala.debug
|
210
|
+
websocket_url
|
211
|
+
rescue JSON::ParserError => e
|
212
|
+
puts "Chrome response error: #{response.body}" if Palapala.debug
|
213
|
+
raise "Chrome is not responding properly. Response: #{response.body}"
|
214
|
+
end
|
215
|
+
end
|
153
216
|
|
154
217
|
# Convert the HTML content to a data URL
|
155
218
|
def data_url_for_html(html)
|
156
219
|
"data:text/html;base64,#{Base64.strict_encode64(html)}"
|
157
220
|
end
|
158
221
|
|
222
|
+
private
|
223
|
+
|
159
224
|
def start_local_server(html)
|
160
225
|
file = Tempfile.new(["html_content", ".html"])
|
161
226
|
file.write(html)
|
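The 1.2 MB cut-off in `html_to_pdf` follows from Base64 overhead: every 3 input bytes become 4 output characters, so the data URL is roughly a third larger than the HTML. A small sketch of that arithmetic is given below. The constant and function names are hypothetical, and the prefix is the one used by `data_url_for_html`.

```ruby
DATA_URL_PREFIX = "data:text/html;base64,"

# Size in bytes of the data URL for an HTML payload of `bytesize` bytes.
# Base64 emits 4 output characters per 3 input bytes, rounded up.
def data_url_bytesize(bytesize)
  DATA_URL_PREFIX.bytesize + 4 * ((bytesize + 2) / 3)
end

puts data_url_bytesize(1_200_000)
```

At the threshold the data URL comes to 1,600,022 bytes, which leaves headroom below the 2 MB limit mentioned in the renderer's comments.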
data/lib/palapala/version.rb
CHANGED
data/lib/palapala.rb
CHANGED
@@ -1,6 +1,16 @@
|
|
1
1
|
require_relative "palapala/pdf"
|
2
2
|
require_relative "palapala/helper"
|
3
|
+
require_relative "palapala/asset_helper"
|
3
4
|
require_relative "palapala/version"
|
5
|
+
require_relative "palapala/html_preprocessor"
|
6
|
+
|
7
|
+
# Only load railtie if Rails is present
|
8
|
+
begin
|
9
|
+
require "rails"
|
10
|
+
require_relative "palapala/railtie"
|
11
|
+
rescue LoadError
|
12
|
+
# Rails not available, skip railtie
|
13
|
+
end
|
4
14
|
|
5
15
|
module Palapala
|
6
16
|
def self.setup
|
@@ -34,7 +44,7 @@ module Palapala
|
|
34
44
|
self.chrome_headless_shell_version = ENV.fetch("CHROME_HEADLESS_SHELL_VERSION", "stable")
|
35
45
|
self.chrome_params = ENV.fetch("HEADLESS_CHROME_PARAMS", nil)&.split || []
|
36
46
|
|
37
|
-
if !ENV["DYNO"].nil? || File.exist?(
|
47
|
+
if !ENV["DYNO"].nil? || File.exist?("/.dockerenv")
|
38
48
|
self.chrome_params ||= []
|
39
49
|
self.chrome_params << "--no-sandbox"
|
40
50
|
end
|
data/paged_css.pdf
CHANGED
Binary file
|
data/palapala_pdf.gemspec
CHANGED
@@ -38,6 +38,7 @@ Gem::Specification.new do |spec|
|
|
38
38
|
spec.add_dependency 'websocket-driver', '~> 0'
|
39
39
|
spec.add_dependency 'combine_pdf', '~> 1'
|
40
40
|
spec.add_dependency 'webrick'
|
41
|
+
spec.add_dependency "rails", ">= 6.0.0"
|
41
42
|
|
42
43
|
# For more information and examples about making a new gem, check out our
|
43
44
|
# guide at: https://bundler.io/guides/creating_gem.html
|
metadata
CHANGED
@@ -1,13 +1,13 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: palapala_pdf
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.
|
4
|
+
version: 0.1.26
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Koen Handekyn
|
8
8
|
bindir: bin
|
9
9
|
cert_chain: []
|
10
|
-
date: 2025-
|
10
|
+
date: 2025-09-09 00:00:00.000000000 Z
|
11
11
|
dependencies:
|
12
12
|
- !ruby/object:Gem::Dependency
|
13
13
|
name: base64
|
@@ -65,6 +65,20 @@ dependencies:
|
|
65
65
|
- - ">="
|
66
66
|
- !ruby/object:Gem::Version
|
67
67
|
version: '0'
|
68
|
+
- !ruby/object:Gem::Dependency
|
69
|
+
name: rails
|
70
|
+
requirement: !ruby/object:Gem::Requirement
|
71
|
+
requirements:
|
72
|
+
- - ">="
|
73
|
+
- !ruby/object:Gem::Version
|
74
|
+
version: 6.0.0
|
75
|
+
type: :runtime
|
76
|
+
prerelease: false
|
77
|
+
version_requirements: !ruby/object:Gem::Requirement
|
78
|
+
requirements:
|
79
|
+
- - ">="
|
80
|
+
- !ruby/object:Gem::Version
|
81
|
+
version: 6.0.0
|
68
82
|
description: This gem uses raw web sockets to render HTML into a PDF using Chrom(e)(ium)
|
69
83
|
with minimal dependencies.
|
70
84
|
email:
|
@@ -94,10 +108,15 @@ files:
|
|
94
108
|
- examples/paged_css.pdf
|
95
109
|
- examples/paged_css.rb
|
96
110
|
- examples/performance_benchmark.rb
|
111
|
+
- examples/test_large_file.rb
|
97
112
|
- lib/palapala.rb
|
113
|
+
- lib/palapala/asset_helper.rb
|
98
114
|
- lib/palapala/chrome_process.rb
|
99
115
|
- lib/palapala/helper.rb
|
116
|
+
- lib/palapala/html_preprocessor.rb
|
100
117
|
- lib/palapala/pdf.rb
|
118
|
+
- lib/palapala/persistent_server.rb
|
119
|
+
- lib/palapala/railtie.rb
|
101
120
|
- lib/palapala/renderer.rb
|
102
121
|
- lib/palapala/version.rb
|
103
122
|
- lib/palapala/web_socket_client.rb
|