palapala_pdf 0.1.6 → 0.1.7

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 9e783562bb4503abf97bdeaa76e27b78873c6e558af513b44a80b36e951f1b83
4
- data.tar.gz: 6f5e7c7e2503eceb8de04c78374981bc291df974483b4153c76a00e29da34f8b
3
+ metadata.gz: 3a5601b10de0f7f98fd62c9dda59133ed4f4ce89c3545d6522e3eea2cdbbfbac
4
+ data.tar.gz: 82b6a5f0919e6e587e0b3e0cd02fe3547e4c6ac3e6f55ddea8db97f9b89729b5
5
5
  SHA512:
6
- metadata.gz: 2da370cf5549e8f5931411ddfb3ef9a857241d00ad571e560b10523a78f172ad8e876d9efa4aca442b9c29d6708c6a3008a59d6555ca02a15460a616bd6d321c
7
- data.tar.gz: 31e808ea15655699e2c903a324f178d5ad94ccca1cd4eb1a6c3e6fa8bb08056a47125e83622cf26430972b35d8f40ddaee39c334584725bbfc755150e8392a63
6
+ metadata.gz: 8f848195ef97f03506d3847c26a996ef9c70bdb953349720a6cd39ae092aec08bdb915afbfbbae16a0866e7ee2b3ce907991f75353677fee795baada3828be52
7
+ data.tar.gz: 74663055627e1fab37a2884dfc74d591238c75d8677f16501f1f369b58c53a096e7adfa7f4ee390dd72afc81bff96eb10f46c23ccc7b364423b301a7a8592964
data/.rubocop.yml CHANGED
@@ -1,4 +1,6 @@
1
1
  inherit_from: .rubocop_todo.yml
2
+ # Omakase Ruby styling for Rails
3
+ inherit_gem: { rubocop-rails-omakase: rubocop.yml }
2
4
 
3
5
  # This is a basic RuboCop configuration file
4
6
  AllCops:
@@ -12,3 +14,4 @@ AllCops:
12
14
  require:
13
15
  - rubocop-minitest
14
16
  - rubocop-rake
17
+ - rubocop-performance
data/.rubocop_todo.yml CHANGED
@@ -1,44 +1,7 @@
1
1
  # This configuration was generated by
2
2
  # `rubocop --auto-gen-config`
3
- # on 2024-08-23 11:11:08 UTC using RuboCop version 1.65.1.
3
+ # on 2024-08-27 21:04:10 UTC using RuboCop version 1.65.1.
4
4
  # The point is for the user to remove these configuration records
5
5
  # one by one as the offenses are removed from the code base.
6
6
  # Note that changes in the inspected code, or installation of new
7
7
  # versions of RuboCop, may require this file to be generated again.
8
-
9
- # Offense count: 1
10
- # Configuration parameters: AllowedMethods, AllowedPatterns, CountRepeatedAttributes.
11
- Metrics/AbcSize:
12
- Max: 33
13
-
14
- # Offense count: 1
15
- # Configuration parameters: AllowedMethods, AllowedPatterns.
16
- Metrics/CyclomaticComplexity:
17
- Max: 13
18
-
19
- # Offense count: 1
20
- # Configuration parameters: CountComments, CountAsOne, AllowedMethods, AllowedPatterns.
21
- Metrics/MethodLength:
22
- Max: 19
23
-
24
- # Offense count: 1
25
- # Configuration parameters: CountKeywordArgs, MaxOptionalParameters.
26
- Metrics/ParameterLists:
27
- Max: 10
28
-
29
- # Offense count: 1
30
- # Configuration parameters: AllowedMethods, AllowedPatterns.
31
- Metrics/PerceivedComplexity:
32
- Max: 13
33
-
34
- # Offense count: 2
35
- Style/ClassVars:
36
- Exclude:
37
- - 'lib/palapala/pdf.rb'
38
-
39
- # Offense count: 1
40
- # This cop supports safe autocorrection (--autocorrect).
41
- # Configuration parameters: AllowHeredoc, AllowURI, URISchemes, IgnoreCopDirectives, AllowedPatterns.
42
- # URISchemes: http, https
43
- Layout/LineLength:
44
- Max: 121
data/README.md CHANGED
@@ -1,17 +1,20 @@
1
1
  # PDF Generation for your Rubies
2
2
 
3
+ <div align="center"><img src="https://raw.githubusercontent.com/palapala-app/palapala_pdf/main/assets/images/logo.webp" alt="Palapala PDF Logo" width="200"></div>
4
+
3
5
  This project is a Ruby gem that provides functionality for generating PDF files from HTML using the Chrome browser. It allows you to easily convert HTML content into PDF documents, making it convenient for tasks such as generating reports, invoices, or any other printable documents. The gem provides a simple and intuitive API for converting HTML to PDF, and it leverages the power and flexibility of the Chrome browser's rendering engine to ensure accurate and high-quality PDF output. With this gem, you can easily integrate PDF generation capabilities into your Ruby applications.
4
6
 
5
7
  At the core, this project leverages the same rendering engine as [Grover](https://github.com/Studiosity/grover), but with significantly reduced overhead and dependencies. Instead of relying on the full Grover/Puppeteer/NodeJS stack, this project uses a raw web socket to enable direct communication from Ruby to a headless Chrome or Chromium browser. This approach ensures efficiency while providing a streamlined alternative for rendering tasks without sacrificing performance or flexibility.
6
8
 
7
- This is how easy and powerfull PDF generation can be in Ruby:
9
+ This is how easy PDF generation can be in Ruby:
8
10
 
9
11
  ```ruby
10
12
  require "palapala"
11
13
  Palapala::Pdf.new("<h1>Hello, world! #{Time.now}</h1>").save('hello.pdf')
12
14
  ```
15
+ And this while having the most modern HTML/CSS/JS available to you: flex, grid, canvas, ...
13
16
 
14
- And this while having the most modern HTML/CSS/JS availlable to you: flex, grid, canvas, you name it.
17
+ A core goal of this project is performance, and it is designed to be exceptionally fast. By leveraging **direct communication** with a headless Chrome or Chromium browser via a **raw web socket**, the gem minimizes overhead and dependencies, enabling PDF generation at speeds that significantly outperform other solutions. Whether generating simple or complex documents, this gem ensures that your Ruby applications can handle PDF tasks efficiently and at scale.
15
18
 
16
19
  ## Installation
17
20
 
@@ -28,23 +31,68 @@ $ gem install palapala_pdf
28
31
  ```
29
32
 
30
33
  Palapala PDF connects to Chrome over a web socket connection.
31
-
32
- An external Chrome/Chromium is expected.
33
- Just start it with the following command (9222 is the default port):
34
+ An external Chrome/Chromium is expected. Start it with the following
35
+ command (9222 is the default port):
34
36
 
35
37
  ```sh
36
38
  /path/to/chrome --headless --disable-gpu --remote-debugging-port=9222
37
39
  ```
38
40
 
41
+ ### Installing Chrome / Headless Chrome
42
+
43
+ It seems the August 2024 release 128.0.6613.85 has a serious performance regression. To avoid such regressions, it's suggested to install a specific version of Chrome, test it, and stick with it. This is easiest using npx and some tooling provided by Puppeteer. Unfortunately this depends on node/npm, but it's worth it. For example, install a specific version like this:
44
+
45
+ ```
46
+ npx @puppeteer/browsers install chrome@127.0.6533.88
47
+ ```
48
+
49
+ This installs chrome in a `chrome` folder in the current working dir and it outputs the path where it's installed when it's finished.
50
+
51
+ If you installed it using Puppeteer as described above, start it like this:
52
+
53
+ ```sh
54
+ ./chrome/mac_arm-127.0.6533.88/chrome-mac-arm64/Google\ Chrome\ for\ Testing.app/Contents/MacOS/Google\ Chrome\ for\ Testing --headless --disable-gpu --remote-debugging-port=9222
55
+ ```
56
+
57
+ Currently I'd advise using the `chrome-headless-shell` variant, a lightweight build meant for exactly this use case. `chrome-headless-shell` is a minimal, headless version of the Chrome browser designed for environments where you need to run Chrome without a graphical user interface (GUI). This is particularly useful for server-side rendering, automated testing, web scraping, or any situation where you need the Chrome browser engine without the overhead of displaying a UI. It is headless by design, with reduced size and overhead but the same engine.
58
+
59
+ ```
60
+ npx @puppeteer/browsers install chrome-headless-shell@stable
61
+ ```
62
+
63
+ It installs to a path like `./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell`. As it's headless by design, it only needs one parameter:
64
+
65
+ ```
66
+ ./chrome-headless-shell/mac_arm-128.0.6613.84/chrome-headless-shell-mac-arm64/chrome-headless-shell --remote-debugging-port=9222
67
+ ```
68
+
39
69
  Alternatively, Palapala PDF will try to launch Chrome as a child process.
40
70
  It guesses the path to Chrome, or you configure it like this:
41
71
 
42
72
  ```ruby
43
73
  Palapala.setup do |config|
44
- config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
74
+ config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
45
75
  end
46
76
  ```
47
77
 
78
+ ### Installing Node/NPX
79
+
80
+ Using Brew
81
+
82
+ ```
83
+ brew install node
84
+ ```
85
+
86
+ Using NVM (Node Version Manager)
87
+
88
+ ```
89
+ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.3/install.sh | bash
90
+ source ~/.nvm/nvm.sh
91
+ nvm --version
92
+ nvm install node
93
+ ```
94
+
95
+
48
96
  ## Usage Instructions
49
97
 
50
98
  To create a PDF from HTML content using the `Palapala` library, follow these steps:
@@ -146,6 +194,7 @@ Bug reports and pull requests are welcome on GitHub at https://github.com/palapa
146
194
 
147
195
  - [Kenneth Geerts](https://github.com/kennethgeerts) - Your foundational contributions to simplicity are greatly appreciated.
148
196
  - [Eugen Neagoe](https://github.com/eneagoe) - Thank you for your valuable input, feedback and opinions.
197
+ - [Radu Bogoevici](https://github.com/codenighter) - Thanks for test driving, and all help big and small.
149
198
 
150
199
  ## Sponsor This Project
151
200
 
Binary file
Binary file
@@ -0,0 +1,49 @@
1
+ # frozen_string_literal: true
2
+
3
+ $LOAD_PATH.unshift File.expand_path('../lib', __dir__)
4
+ require 'palapala'
5
+
6
+ HEADER_HTML = <<~HTML
7
+ <style type="text/css">
8
+ .header {
9
+ -webkit-print-color-adjust: exact;
10
+ border-bottom: 1px solid lightgray;
11
+ color: black;
12
+ font-family: Arial, Helvetica, sans-serif;
13
+ font-size: 12pt;
14
+ margin: 0 auto;
15
+ padding: 5px;
16
+ text-align: center;
17
+ vertical-align: middle;
18
+ width: 100%;
19
+ border: 1px solid black;
20
+ }
21
+ </style>
22
+ <div class="header" style="text-align: center">
23
+ Page <span class="pageNumber"></span> of <span class="totalPages"></span>
24
+ </div>
25
+ HTML
26
+
27
+ Palapala.setup do |config|
28
+ config.debug = true
29
+ config.headless_chrome_url = 'http://localhost:9222' # run against a remote Chrome instance
30
+ # config.headless_chrome_path = '/usr/bin/google-chrome-stable' # path to Chrome executable
31
+ end
32
+
33
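+ result = Palapala::Pdf.new(
34
+ # "<style>@page { size: A4 landscape; }</style><p>Hello world #{Time.now}</p>",
35
+ "<h1>Title</h1><p>Hello world #{Time.now}</p>",
36
+ header_template: HEADER_HTML, # keyword names follow Pdf#initialize in this release
37
+ footer_template: '<div style="text-align: center;">Generated with Palapala PDF</div>',
38
+ scale: 0.75,
39
+ prefer_css_page_size: false,
40
+ margin_top: 3,
41
+ margin_bottom: 2
42
+ ).save('tmp/headers_and_footers.pdf')
43
+ # Pdf#save takes only the path in this release, so the per-call options
44
+ # that used to be passed here (generateDocumentOutline, displayHeaderFooter,
45
+ # headerTemplate) are dropped. Header and footer are set via the initializer.
46
+ # Caveat: the initializer falls back with `||`, so an explicit `false`
47
+ # (as in prefer_css_page_size above) is treated like an omitted option.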
48
+
49
+ puts result
@@ -0,0 +1,22 @@
1
+ # frozen_string_literal: true
2
+
3
+ $LOAD_PATH.unshift File.expand_path('../lib', __dir__)
4
+ require 'palapala'
5
+
6
+ DOCUMENT = <<~HTML
7
+ <html>
8
+ <script type="text/javascript">
9
+ document.addEventListener("DOMContentLoaded", () => {
10
+ document.body.innerHTML += "<p>Current time from JS: " + new Date().toLocaleString() + "</p>";
11
+ });
12
+ </script>
13
+ <body><p>Default body text.</p></body>
14
+ </html>
15
+ HTML
16
+
17
+ Palapala.setup do |config|
18
+ config.debug = true
19
+ end
20
+
21
+ result = Palapala::Pdf.new(DOCUMENT).save('tmp/js_based_rendering.pdf')
22
+ puts result
@@ -0,0 +1,50 @@
1
+ # frozen_string_literal: true
2
+
3
+ $LOAD_PATH.unshift File.expand_path('../lib', __dir__)
4
+
5
+ require 'benchmark'
6
+ require 'palapala'
7
+
8
+ debug = ARGV[0] == 'debug'
9
+
10
+ Palapala.setup do |config|
11
+ # config.headless_chrome_url = 'http://localhost:9222'
12
+ config.debug = debug
13
+ config.defaults.merge! scale: 0.75, format: :A4
14
+ end
15
+
16
+ # @param concurrency Number of concurrent threads
17
+ # @param iterations Number of iterations per thread
18
+ def benchmark(concurrency, iterations)
19
+ time = Benchmark.realtime do
20
+ threads = (1..concurrency).map do |i|
21
+ Thread.new do
22
+ iterations.times do |j|
23
+ doc = "Hello #{i}, world #{j}! #{Time.now}."
24
+ Palapala::Pdf.new(doc).save("tmp/benchmark_#{i}_#{j}.pdf")
25
+ end
26
+ end
27
+ end
28
+ threads.each(&:join)
29
+ end
30
+ puts "c:#{concurrency}, n:#{iterations} : Throughput = #{(concurrency * iterations / time).round(2)} docs/sec, Total time = #{time.round(4)} seconds"
31
+ time
32
+ end
33
+
34
+ puts 'warmup'
35
+ benchmark(1, 10)
36
+
37
+ puts 'benchmarking 20 docs: 1x20, 2x10, 4x5, 5x4, 20x1'
38
+ benchmark(1, 20)
39
+ benchmark(2, 10)
40
+ benchmark(4, 5)
41
+ # benchmark(5, 4)
42
+ # benchmark(20, 1)
43
+
44
+ puts 'benchmarking 320 docs'
45
+ benchmark(1, 320)
46
+ benchmark(2, 320 / 2)
47
+ benchmark(4, 320 / 4)
48
+ benchmark(8, 320 / 8)
49
+ # benchmark(20, 2)
50
+ # benchmark(40, 1)
@@ -0,0 +1,19 @@
1
+ #!/bin/bash
2
+
3
+ # Run the command and capture the output
4
+ echo "Installing latest stable chrome-headless-shell..."
5
+ output=$(npx @puppeteer/browsers install chrome-headless-shell@stable)
6
+
7
+ # Extract the path from the output
8
+ chrome_path=$(echo "$output" | grep "chrome-headless-shell@" | awk '{print $2}')
9
+
10
+ # Directory you want the relative path from (current working directory)
11
+ base_dir=$(pwd)
12
+
13
+ # Convert absolute path to relative path using Node.js
14
+ relative_path=$(node -e "console.log(require('path').relative('$base_dir', '$chrome_path'))")
15
+
16
+ echo "Launching chrome-headless-shell at $relative_path"
17
+ echo $("$chrome_path" --version)
18
+ # Launch chrome-headless-shell with the --remote-debugging-port parameter
19
+ "$chrome_path" --remote-debugging-port=9222
@@ -0,0 +1,100 @@
1
+ module Palapala
2
+ # Manage the Chrome child process
3
+ module ChromeProcess
4
+ # Check if the port is in use
5
+ def self.port_in_use?(port = 9222, host = "127.0.0.1")
6
+ server = TCPServer.new(host, port)
7
+ server.close
8
+ false
9
+ rescue Errno::EADDRINUSE
10
+ true
11
+ end
12
+
13
+ # Check if the Chrome process is healthy
14
+ def self.chrome_process_healthy?
15
+ return false if @chrome_process_id.nil?
16
+
17
+ begin
18
+ Process.kill(0, @chrome_process_id) # Check if the process is alive
19
+ true
20
+ rescue Errno::ESRCH, Errno::EPERM
21
+ false
22
+ end
23
+ end
24
+
25
+ # Check if a Chrome is running
26
+ def self.chrome_running?
27
+ port_in_use? || # Check if the port is in use and Chrome is running externally
28
+ chrome_process_healthy? # Check if the process is still alive
29
+ end
30
+
31
+ # Kill the Chrome child process
32
+ def self.kill_chrome
33
+ return if @chrome_process_id.nil?
34
+
35
+ Process.kill("KILL", @chrome_process_id) # Kill the process
36
+ Process.wait(@chrome_process_id) # Wait for the process to finish
37
+ end
38
+
39
+ # Get the path to the Chrome executable, if it's not set, then guess based on the OS
40
+ def self.chrome_path
41
+ return Palapala.headless_chrome_path if Palapala.headless_chrome_path
42
+
43
+ case RbConfig::CONFIG["host_os"]
44
+ when /darwin/
45
+ "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
46
+ when /linux/
47
+ "/usr/bin/google-chrome" # or "/usr/bin/chromium-browser"
48
+ when /win|mingw|cygwin/
49
+ "#{ENV.fetch("ProgramFiles(x86)", nil)}\\Google\\Chrome\\Application\\chrome.exe"
50
+ else
51
+ raise "Unsupported OS"
52
+ end
53
+ end
54
+
55
+ # Spawn a Chrome child process
56
+ def self.spawn_chrome
57
+ return if chrome_running?
58
+
59
+ # Define the path and parameters separately
60
+ # chrome_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
61
+ params = [ "--headless", "--disable-gpu", "--remote-debugging-port=9222" ]
62
+ params.concat(Array(Palapala.chrome_params)) if Palapala.chrome_params
63
+
64
+ # Spawn the process with the path and parameters
65
+ @chrome_process_id = Process.spawn(chrome_path, *params)
66
+
67
+ # Wait until the port is in use
68
+ sleep 0.1 until port_in_use?
69
+ # Detach the process so it runs in the background
70
+ Process.detach(@chrome_process_id)
71
+
72
+ at_exit do
73
+ if @chrome_process_id
74
+ begin
75
+ Process.kill("TERM", @chrome_process_id)
76
+ Process.wait(@chrome_process_id)
77
+ puts "Child process #{@chrome_process_id} terminated."
78
+ rescue Errno::ESRCH
79
+ puts "Child process #{@chrome_process_id} is already terminated."
80
+ rescue Errno::ECHILD
81
+ puts "No child process #{@chrome_process_id} found."
82
+ end
83
+ end
84
+ end
85
+
86
+ # Handle when the process is killed
87
+ trap("SIGCHLD") do
88
+ while (@chrome_process_id = Process.wait(-1, Process::WNOHANG))
89
+ break if @chrome_process_id.nil?
90
+
91
+ puts "Process #{@chrome_process_id} was killed."
92
+ # Handle the error or restart the process if necessary
93
+ @chrome_process_id = nil
94
+ end
95
+ rescue Errno::ECHILD
96
+ @chrome_process_id = nil
97
+ end
98
+ end
99
+ end
100
+ end
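
The `port_in_use?` check above works by trying to bind a listening socket to the port: if the bind fails with `Errno::EADDRINUSE`, something else (presumably Chrome) already holds it. Below is a minimal standalone sketch of that technique using only Ruby's standard `socket` library. The helper name `port_taken?` and the ephemeral port used for the demonstration are illustrative assumptions, not part of the gem.

```ruby
require "socket"

# Returns true when something is already listening on host:port.
# Hypothetical helper mirroring the bind-and-rescue approach shown in the diff.
def port_taken?(port, host = "127.0.0.1")
  TCPServer.new(host, port).close # binding succeeded, so the port was free
  false
rescue Errno::EADDRINUSE
  true
end

# Demonstration: occupy an ephemeral port (port 0 lets the OS pick one),
# then probe it while it is held and again after it has been released.
listener = TCPServer.new("127.0.0.1", 0)
port = listener.addr[1]
held = port_taken?(port)
listener.close
released = port_taken?(port)
puts "while held: #{held}, after release: #{released}"
```

A successful bind only shows that the port was free at that instant; another process may claim it immediately afterwards, so the check is best treated as a hint rather than a guarantee.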
data/lib/palapala/pdf.rb CHANGED
@@ -1,67 +1,87 @@
1
- # frozen_string_literal: true
1
+ require_relative "./renderer"
2
2
 
3
3
  module Palapala
4
4
  # Page class to generate PDF from HTML content using Chrome in headless mode in a thread-safe way
5
5
  # @param page_ranges Empty string means all pages, e.g., "1-3, 5, 7-9"
6
6
  class Pdf
7
- def initialize(content = nil,
8
- header_html: nil,
9
- footer_html: nil,
10
- generate_tagged_pdf: Palapala.defaults.fetch(:generate_tagged_pdf, false),
11
- prefer_css_page_size: Palapala.defaults.fetch(:prefer_css_page_size, true),
12
- scale: Palapala.defaults.fetch(:scale, 1),
13
- page_ranges: Palapala.defaults.fetch(:page_ranges, nil),
14
- margin: Palapala.defaults.fetch(:margin, {}))
15
- @content = content
16
- @header_html = header_html
17
- @footer_html = footer_html
18
- @generate_tagged_pdf = generate_tagged_pdf
19
- @prefer_css_page_size = prefer_css_page_size
20
- @page_ranges = page_ranges
21
- @scale = scale
22
- @margin = margin
7
+ # Initialize the PDF object with the HTML content and optional parameters.
8
+ #
9
+ # The options are passed to the renderer when generating the PDF.
10
+ # The options are the snakified version of the options from the Chrome DevTools Protocol to respect the Ruby conventions.
11
+ # (see https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF)
12
+ #
13
+ # @param content [String] the HTML content to convert to PDF
14
+ # @param footer_template [String] the HTML content for the footer
15
+ # @param generate_tagged_pdf [Boolean] whether to generate a tagged PDF
16
+ # @param header_template [String] the HTML content for the header
17
+ # @param landscape [Boolean] whether to use landscape orientation
18
+ # @param margin_bottom [Integer] the bottom margin in inches
19
+ # @param margin_left [Integer] the left margin in inches
20
+ # @param margin_right [Integer] the right margin in inches
21
+ # @param margin_top [Integer] the top margin in inches
22
+ # @param page_ranges [String] the page ranges to print, e.g., "1-3, 5, 7-9"
23
+ # @param paper_height [Integer] the paper height in inches
24
+ # @param paper_width [Integer] the paper width in inches
25
+ # @param prefer_css_page_size [Boolean] whether to prefer CSS page size (advised)
26
+ # @param print_background [Boolean] whether to print background graphics
27
+ # @param scale [Float] the scale of the PDF rendering
28
+ def initialize(content,
29
+ footer_template: nil,
30
+ generate_tagged_pdf: nil,
31
+ header_template: nil,
32
+ landscape: nil,
33
+ margin_bottom: nil,
34
+ margin_left: nil,
35
+ margin_right: nil,
36
+ margin_top: nil,
37
+ page_ranges: nil,
38
+ paper_height: nil,
39
+ paper_width: nil,
40
+ prefer_css_page_size: nil,
41
+ print_background: nil,
42
+ scale: nil)
43
+ @content = content || raise(ArgumentError, "Content is required and can't be nil")
44
+ @opts = {}
45
+ @opts[:headerTemplate] = header_template || Palapala.defaults[:header_template]
46
+ @opts[:footerTemplate] = footer_template || Palapala.defaults[:footer_template]
47
+ @opts[:pageRanges] = page_ranges || Palapala.defaults[:page_ranges]
48
+ @opts[:generateTaggedPDF] = generate_tagged_pdf || Palapala.defaults[:generate_tagged_pdf]
49
+ @opts[:paperWidth] = paper_width || Palapala.defaults[:paper_width]
50
+ @opts[:paperHeight] = paper_height || Palapala.defaults[:paper_height]
51
+ @opts[:landscape] = landscape || Palapala.defaults[:landscape]
52
+ @opts[:marginTop] = margin_top || Palapala.defaults[:margin_top]
53
+ @opts[:marginLeft] = margin_left || Palapala.defaults[:margin_left]
54
+ @opts[:marginBottom] = margin_bottom || Palapala.defaults[:margin_bottom]
55
+ @opts[:marginRight] = margin_right || Palapala.defaults[:margin_right]
56
+ @opts[:preferCSSPageSize] = prefer_css_page_size || Palapala.defaults[:prefer_css_page_size]
57
+ @opts[:printBackground] = print_background || Palapala.defaults[:print_background]
58
+ @opts[:scale] = scale || Palapala.defaults[:scale]
59
+ @opts.compact!
23
60
  end
24
61
 
25
- def binary_data(**opts)
26
- pdf(**opts)
62
+ # Render the PDF content to a binary string.
63
+ #
64
+ # The params from the initializer are converted to the camelCase keys that the renderer expects.
65
+ # This method takes no options of its own; set them through the initializer or Palapala.defaults.
66
+ # Chrome DevTools Protocol expects the options to be camelCase, see https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF.
67
+ #
68
+ # @raise [StandardError] re-raises any rendering error after resetting the thread-local renderer
69
+ # @return [String] the PDF content as a binary string
70
+ def binary_data
71
+ puts "Rendering PDF with params: #{@opts}" if Palapala.debug
72
+ Renderer.html_to_pdf(@content, params: @opts)
73
+ rescue StandardError => e
74
+ puts "Error rendering PDF: #{e.message}"
75
+ Renderer.reset
76
+ raise
27
77
  end
28
78
 
29
- def save(path, **opts)
30
- File.binwrite(path, pdf(**opts))
31
- end
32
-
33
- private
34
-
35
- def renderer
36
- Thread.current[:renderer] ||= Renderer.new
37
- end
38
-
39
- def pdf(**opts)
40
- puts "Rendering PDF with options: #{opts}" if Palapala.debug
41
- renderer.html_to_pdf(@content, params: opts_with_defaults.merge(opts))
42
- end
43
-
44
- def opts_with_defaults
45
- opts = { scale: @scale,
46
- printBackground: true,
47
- displayHeaderFooter: true,
48
- encoding: :binary,
49
- preferCSSPageSize: @prefer_css_page_size }
50
79
 
51
- opts[:headerTemplate] = @header_html unless @header_html.nil?
52
- opts[:footerTemplate] = @footer_html unless @footer_html.nil?
53
- opts[:pageRanges] = @page_ranges unless @page_ranges.nil?
54
- opts[:path] = @path unless @path.nil?
55
- opts[:generateTaggedPDF] = @generate_tagged_pdf unless @generate_tagged_pdf.nil?
56
- opts[:format] = @format unless @format.nil?
57
- # opts[:paperWidth] = @paper_width unless @paper_width.nil?
58
- # opts[:paperHeight] = @paper_height unless @paper_height.nil?
59
- opts[:landscape] = @landscape unless @landscape.nil?
60
- opts[:marginTop] = @margin[:top] unless @margin[:top].nil?
61
- opts[:marginLeft] = @margin[:left] unless @margin[:left].nil?
62
- opts[:marginBottom] = @margin[:bottom] unless @margin[:bottom].nil?
63
- opts[:marginRight] = @margin[:right] unless @margin[:right].nil?
64
- opts
80
+ # Save the PDF content to a file
81
+ # @param path [String] the path to save the PDF file
82
+ # @return [void]
83
+ def save(path)
84
+ File.binwrite(path, binary_data)
65
85
  end
66
86
  end
67
87
  end
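
The new initializer maps snake_case keywords onto the camelCase keys of `Page.printToPDF` and falls back to `Palapala.defaults` with `||`. One consequence of `||` is that an explicit `false` is indistinguishable from an omitted option. The sketch below shows one possible alternative that falls back only on `nil`. The constant `CDP_KEYS` and the method `build_print_options` are hypothetical names for illustration and do not exist in the gem; only a subset of options is listed.

```ruby
# Hypothetical mapping from Ruby-style keywords to the camelCase keys
# used by Page.printToPDF; only a subset is listed for brevity.
CDP_KEYS = {
  header_template: :headerTemplate,
  footer_template: :footerTemplate,
  prefer_css_page_size: :preferCSSPageSize,
  print_background: :printBackground,
  scale: :scale
}.freeze

# Build the options hash: a value that was passed (including false) wins,
# a nil value falls back to the default, and anything still nil is dropped.
def build_print_options(given, defaults = {})
  CDP_KEYS.each_with_object({}) do |(ruby_key, cdp_key), opts|
    value = given[ruby_key]
    value = defaults[ruby_key] if value.nil?
    opts[cdp_key] = value unless value.nil?
  end
end

opts = build_print_options(
  { prefer_css_page_size: false, scale: 0.75 },
  { prefer_css_page_size: true, print_background: true }
)
p opts
```

Here the explicit `false` for `prefer_css_page_size` survives even though the default is `true`, while `print_background` is taken from the defaults because it was not passed.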
@@ -1,25 +1,38 @@
1
- # frozen_string_literal: true
2
-
3
1
  require "json"
4
2
  require "net/http"
5
3
  require "websocket/driver"
4
+ require_relative "./web_socket_client"
5
+ require_relative "./chrome_process"
6
6
 
7
7
  module Palapala
8
8
  # Render HTML content to PDF using Chrome in headless mode with minimal dependencies
9
9
  class Renderer
10
10
  def initialize
11
+ puts "Initializing a renderer" if Palapala.debug
11
12
  # Create an instance of WebSocketClient with the WebSocket URL
12
13
  @client = Palapala::WebSocketClient.new(websocket_url)
13
14
  # Create the WebSocket driver
14
15
  @driver = WebSocket::Driver.client(@client)
15
16
  # Register the on_message callback
16
17
  @driver.on(:message, &method(:on_message))
18
+ @driver.on(:close) { Thread.current[:renderer] = nil } # Reset the renderer on close
17
19
  # Start the WebSocket handshake
18
20
  @driver.start
19
21
  # Initialize the protocol to get the page events
20
22
  send_command_and_wait_for_result("Page.enable")
21
23
  end
22
24
 
25
+ # Create a thread-local instance of the renderer
26
+ def self.thread_local_instance
27
+ Thread.current[:renderer] ||= Renderer.new
28
+ end
29
+
30
+ # Reset the thread-local instance of the renderer
31
+ def self.reset
32
+ puts "Clearing the thread local renderer" if Palapala.debug
33
+ Thread.current[:renderer] = nil
34
+ end
35
+
23
36
  # Callback to handle the incoming WebSocket messages
24
37
  def on_message(e)
25
38
  puts "Received: #{e.data[0..64]}" if Palapala.debug
@@ -80,6 +93,10 @@ module Palapala
80
93
  Base64.decode64(result["data"])
81
94
  end
82
95
 
96
+ def self.html_to_pdf(html, params: {})
97
+ thread_local_instance.html_to_pdf(html, params: params)
98
+ end
99
+
83
100
  def close
84
101
  @driver.close
85
102
  @client.close
@@ -87,6 +104,7 @@ module Palapala
87
104
 
88
105
  private
89
106
 
107
+ # Convert the HTML content to a data URL
90
108
  def data_url_for_html(html)
91
109
  "data:text/html;base64,#{Base64.strict_encode64(html)}"
92
110
  end
@@ -97,102 +115,12 @@ module Palapala
97
115
  uri = URI("#{Palapala.headless_chrome_url}/json/new")
98
116
  http = Net::HTTP.new(uri.host, uri.port)
99
117
  request = Net::HTTP::Put.new(uri)
100
- request['Content-Type'] = 'application/json'
118
+ request["Content-Type"] = "application/json"
101
119
  response = http.request(request)
102
120
  tab_info = JSON.parse(response.body)
103
121
  websocket_url = tab_info["webSocketDebuggerUrl"]
104
122
  puts "WebSocket URL: #{websocket_url}" if Palapala.debug
105
123
  websocket_url
106
124
  end
107
-
108
- # Manage the Chrome child process
109
- module ChromeProcess
110
- def self.port_in_use?(port = 9222, host = "127.0.0.1")
111
- server = TCPServer.new(host, port)
112
- server.close
113
- false
114
- rescue Errno::EADDRINUSE
115
- true
116
- end
117
-
118
- def self.chrome_process_healthy?
119
- return false if @chrome_process_id.nil?
120
-
121
- begin
122
- Process.kill(0, @chrome_process_id) # Check if the process is alive
123
- true
124
- rescue Errno::ESRCH, Errno::EPERM
125
- false
126
- end
127
- end
128
-
129
- def self.kill_chrome
130
- return if @chrome_process_id.nil?
131
-
132
- Process.kill("KILL", @chrome_process_id) # Kill the process
133
- Process.wait(@chrome_process_id) # Wait for the process to finish
134
- end
135
-
136
- def self.chrome_path
137
- return Palapala.headless_chrome_path if Palapala.headless_chrome_path
138
-
139
- case RbConfig::CONFIG["host_os"]
140
- when /darwin/
141
- "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
142
- when /linux/
143
- "/usr/bin/google-chrome" # or "/usr/bin/chromium-browser"
144
- when /win|mingw|cygwin/
145
- "#{ENV["ProgramFiles(x86)"]}\\Google\\Chrome\\Application\\chrome.exe"
146
- else
147
- raise "Unsupported OS"
148
- end
149
- end
150
-
151
- def self.spawn_chrome
152
- return if port_in_use?
153
- return if chrome_process_healthy?
154
-
155
- # Define the path and parameters separately
156
- # chrome_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
157
- params = ["--headless", "--disable-gpu", "--remote-debugging-port=9222"]
158
-
159
- # Spawn the process with the path and parameters
160
- @chrome_process_id = Process.spawn(chrome_path, *params)
161
-
162
- # Wait until the port is in use
163
- until port_in_use?
164
- sleep 0.1
165
- end
166
- # Detach the process so it runs in the background
167
- Process.detach(@chrome_process_id)
168
-
169
- at_exit do
170
- if @chrome_process_id
171
- begin
172
- Process.kill("TERM", @chrome_process_id)
173
- Process.wait(@chrome_process_id)
174
- puts "Child process #{@chrome_process_id} terminated."
175
- rescue Errno::ESRCH
176
- puts "Child process #{@chrome_process_id} is already terminated."
177
- rescue Errno::ECHILD
178
- puts "No child process #{@chrome_process_id} found."
179
- end
180
- end
181
- end
182
-
183
- # Handle when the process is killed
184
- trap("SIGCHLD") do
185
- while (@chrome_process_id = Process.wait(-1, Process::WNOHANG))
186
- break if @chrome_process_id.nil?
187
-
188
- puts "Process #{@chrome_process_id} was killed."
189
- # Handle the error or restart the process if necessary
190
- @chrome_process_id = nil
191
- end
192
- rescue Errno::ECHILD
193
- @chrome_process_id = nil
194
- end
195
- end
196
- end
197
125
  end
198
126
  end
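
The renderer changes above keep one `Renderer` per thread in `Thread.current[:renderer]` and clear it on errors or when the socket closes. A minimal sketch of that per-thread caching pattern, with a hypothetical stand-in class (`ExpensiveClient` is not part of the gem) in place of the real WebSocket-backed renderer:

```ruby
# Stand-in for an expensive, connection-holding object (hypothetical class).
class ExpensiveClient
  # One instance per thread, created lazily on first use.
  def self.thread_local_instance
    Thread.current[:expensive_client] ||= new
  end

  # Drop the current thread's instance so the next call builds a fresh one.
  def self.reset
    Thread.current[:expensive_client] = nil
  end
end

main_first  = ExpensiveClient.thread_local_instance
main_second = ExpensiveClient.thread_local_instance
other       = Thread.new { ExpensiveClient.thread_local_instance }.value

same_in_thread  = main_first.equal?(main_second)
distinct_across = !main_first.equal?(other)
ExpensiveClient.reset
fresh_after_reset = !main_first.equal?(ExpensiveClient.thread_local_instance)
puts same_in_thread, distinct_across, fresh_after_reset
```

Because each thread gets its own instance, no locking is needed around the connection, at the cost of one connection per thread. Note that `Thread.current[]` storage is fiber-local in Ruby, so code running in a different fiber of the same thread would also get its own instance.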
@@ -1,5 +1,3 @@
1
- # frozen_string_literal: true
2
-
3
1
  module Palapala
4
- VERSION = '0.1.6'
2
+ VERSION = "0.1.7"
5
3
  end
@@ -1,7 +1,5 @@
1
- # frozen_string_literal: true
2
-
3
- require 'uri'
4
- require 'socket'
1
+ require "uri"
2
+ require "socket"
5
3
 
6
4
  module Palapala
7
5
  # Create a socket wrapper that conforms to what the websocket-driver expects
data/lib/palapala.rb CHANGED
@@ -1,22 +1,30 @@
1
- # frozen_string_literal: true
1
+ require_relative "palapala/pdf"
2
+ require_relative "palapala/version"
2
3
 
3
- require_relative 'palapala/version'
4
- require_relative 'palapala/pdf'
5
- require_relative 'palapala/web_socket_client'
6
- require_relative 'palapala/renderer'
7
-
8
- # Main module for the gem
9
4
  module Palapala
10
5
  def self.setup
11
6
  yield self
12
7
  end
13
8
 
14
9
  class << self
15
- attr_accessor :defaults, :debug, :headless_chrome_url, :headless_chrome_path
10
+ # params to pass to Chrome when launched as a child process
11
+ attr_accessor :chrome_params
12
+
13
+ # debug mode
14
+ attr_accessor :debug
15
+
16
+ # default options for PDF generation
17
+ attr_accessor :defaults
18
+
19
+ # path to the headless Chrome executable when using the child process renderer
20
+ attr_accessor :headless_chrome_path
21
+
22
+ # URL to the headless Chrome instance when using the remote renderer
23
+ attr_accessor :headless_chrome_url
16
24
  end
17
25
 
18
- self.headless_chrome_url = 'http://localhost:9222'
19
- self.headless_chrome_path = nil
20
- self.defaults = {}
21
26
  self.debug = false
27
+ self.defaults = { displayHeaderFooter: true, encoding: :binary }
28
+ self.headless_chrome_path = nil
29
+ self.headless_chrome_url = "http://localhost:9222"
22
30
  end
@@ -0,0 +1 @@
1
+ require_relative "palapala"
data/palapala_pdf.gemspec CHANGED
@@ -5,8 +5,8 @@ require_relative 'lib/palapala/version'
5
5
  Gem::Specification.new do |spec|
6
6
  spec.name = 'palapala_pdf'
7
7
  spec.version = Palapala::VERSION
8
- spec.authors = ['Koen Handekyn']
9
- spec.email = ['github.com@handekyn.com']
8
+ spec.authors = [ 'Koen Handekyn' ]
9
+ spec.email = [ 'github.com@handekyn.com' ]
10
10
 
11
11
  spec.summary = 'Convert HTML into PDF directly from Ruby using Chrome/Chromium.'
12
12
  spec.description = 'This gem uses raw web sockets to render HTML into a PDF using Chrom(e)(ium) with minimal dependencies.'
@@ -31,7 +31,7 @@ Gem::Specification.new do |spec|
31
31
  end
32
32
  spec.bindir = 'exe'
33
33
  spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
34
- spec.require_paths = ['lib']
34
+ spec.require_paths = [ 'lib' ]
35
35
 
36
36
  # Uncomment to register a new dependency of your gem
37
37
  spec.add_dependency 'base64', '~> 0'
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: palapala_pdf
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.6
4
+ version: 0.1.7
5
5
  platform: ruby
6
6
  authors:
7
7
  - Koen Handekyn
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2024-08-27 00:00:00.000000000 Z
11
+ date: 2024-08-29 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: base64
@@ -42,7 +42,8 @@ description: This gem uses faw web sockets to render HTML into a PDF using Chrom
42
42
  with minimal dependencies.
43
43
  email:
44
44
  - github.com@handekyn.com
45
- executables: []
45
+ executables:
46
+ - chrome-headless-server.sh
46
47
  extensions: []
47
48
  extra_rdoc_files: []
48
49
  files:
@@ -52,11 +53,19 @@ files:
52
53
  - LICENSE
53
54
  - README.md
54
55
  - Rakefile
56
+ - assets/images/logo-variant2.webp
57
+ - assets/images/logo.webp
58
+ - examples/headers_and_footers.rb
59
+ - examples/js_based_rendering.rb
60
+ - examples/performance_benchmark.rb
61
+ - exe/chrome-headless-server.sh
55
62
  - lib/palapala.rb
63
+ - lib/palapala/chrome_process.rb
56
64
  - lib/palapala/pdf.rb
57
65
  - lib/palapala/renderer.rb
58
66
  - lib/palapala/version.rb
59
67
  - lib/palapala/web_socket_client.rb
68
+ - lib/palapala_pdf.rb
60
69
  - palapala_pdf.gemspec
61
70
  homepage: https://github.com/palapala-app/palapala_pdf
62
71
  licenses: