RubyGems - bidi2pdf - Versions diffs - 0.1.7 → 0.1.8 - Mend

bidi2pdf 0.1.7 → 0.1.8

Files changed (21)

checksums.yaml +4 -4
data/CHANGELOG.md +35 -8
data/README.md +14 -0
data/docker/Dockerfile.chromedriver +8 -1
data/lib/bidi2pdf/bidi/browser_tab.rb +40 -0
data/lib/bidi2pdf/bidi/command_manager.rb +14 -26
data/lib/bidi2pdf/bidi/connection_manager.rb +3 -9
data/lib/bidi2pdf/bidi/event_manager.rb +1 -1
data/lib/bidi2pdf/bidi/navigation_failed_events.rb +41 -0
data/lib/bidi2pdf/bidi/session.rb +4 -1
data/lib/bidi2pdf/notifications.rb +1 -1
data/lib/bidi2pdf/test_helpers/matchers/contains_pdf_text.rb +50 -0
data/lib/bidi2pdf/test_helpers/matchers/have_pdf_page_count.rb +50 -0
data/lib/bidi2pdf/test_helpers/matchers/match_pdf_text.rb +45 -0
data/lib/bidi2pdf/test_helpers/pdf_reader_utils.rb +89 -0
data/lib/bidi2pdf/test_helpers/pdf_text_sanitizer.rb +232 -0
data/lib/bidi2pdf/test_helpers/testcontainers/chromedriver_container.rb +87 -0
data/lib/bidi2pdf/test_helpers.rb +13 -0
data/lib/bidi2pdf/version.rb +1 -1
data/lib/bidi2pdf.rb +32 -3
metadata +30 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 766b41f0ee642cd7316d0f72d8dd707b0f45aae4a315a46c2b27fb6bb2d176a6
-  data.tar.gz: aeec0549f82ff7bdd68d1aa658ea6ad2033e5310fd5936f40b94007b4ae6c38f
+  metadata.gz: d71c88a5941411b13770993de9b38f6321c263765ffce1a9bbd347fb960855ac
+  data.tar.gz: aa64333d4dc4de54f6e1b627287a5d11661634f3244a8e3a213428c243f155f1
 SHA512:
-  metadata.gz: cc7f1da58549b642521808b9ea2acc4b04068bdb7c877cf52943d2ae69bb989f2ade02601b8bfd0e409440ad8644206bba1e6e16603eb56099b8963a2136e350
-  data.tar.gz: 6258250ac5de22034cbb7816d3ff33c62680747a3eaee171fdd784d309bf1cd7880ca60e45342da8ee9195814ef1c78ca9b45b2c435f9fcbc4aaa43a8d7f95e6
+  metadata.gz: 1d598fe002552f46e53f803f46577adceeeb087b377a40b486d5d2ef7bf713463f429aa26b2687fb7d0b865d73aacf3262be71e17db154794ac82e1e4a245986
+  data.tar.gz: 3b7cb02b0e857e551c720a665ac31d3669a9a27e8c9e3e5c1cdc497517b8fbcd3e917d6b0735113e3b956b23ded042b44c72bfff637cfdbf2431642bd98aaa2b
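
Note: the digests recorded above can be recomputed locally with Ruby's standard `digest` library. The sketch below is only an illustration. The helper name `gem_digests` is an assumption (it is not part of RubyGems or bidi2pdf), and it hashes whatever bytes it is given; for a real check you would pass the contents of `metadata.gz` or `data.tar.gz` taken from the unpacked `.gem` archive.

```ruby
require "digest"

# Hypothetical helper: computes the two digest kinds that checksums.yaml records.
def gem_digests(bytes)
  {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

# Demo input; replace with File.binread("data.tar.gz") for a real archive.
digests = gem_digests("abc")
puts digests["SHA256"]
```

Comparing the printed value with the matching `+` line is enough to confirm that a downloaded archive corresponds to the published version.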

data/CHANGELOG.md CHANGED

@@ -7,8 +7,37 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+[unreleased]: https://github.com/dieter-medium/bidi2pdf/compare/v0.1.8..HEAD
 <!-- generated by git-cliff end -->
+## [0.1.8] - 2025-04-22
+### 🎨 Refactored
+- Modularize ChromedriverContainer implementation by @dieter-medium
+- Replace method calls for clarity and consistency by @dieter-medium
+- Namespace PDFTextSanitizer under Bidi2pdf::TestHelpers by @dieter-medium
+- Refactor command management with concurrent queues by @dieter-medium
+### 🐛 Fixed
+- Update CHANGELOG links to correct Markdown syntax by @dieter-medium
+### 📝 Docs
+- Add Rails integration section to README by @dieter-medium
+### 🚀 Added
+- Update Chromedriver container setup and default image by @dieter-medium
+- Add workflow for pushing Chromedriver Docker image by @dieter-medium
+- Return session status and add test coverage by @dieter-medium
+- Integrate concurrent-ruby for thread safety improvements by @dieter-medium
+- Add specific navigation error classes for better handling by @dieter-medium
+- Enhance navigation error handling in BrowserTab by @dieter-medium
+- Add test helpers and matchers for PDF validation by @dieter-medium
 ## [0.1.7] - 2025-04-17
 ### 🎨 Refactored
@@ -143,12 +172,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Initial release
-[unreleased]: https://github.com/dieter-medium/bidi2pdf/compare/v0.1.7..HEAD
-[unreleased]: https://github.com/dieter-medium/bidi2pdf/compare/v0.1.6..v0.1.7
-[0.1.6]: https://github.com/dieter-medium/bidi2pdf/compare/v0.1.5..v0.1.6
-[0.1.5]: https://github.com/dieter-medium/bidi2pdf/compare/v0.1.4..v0.1.5
-[0.1.4]: https://github.com/dieter-medium/bidi2pdf/compare/v0.1.3..v0.1.4
+- [unreleased](https://github.com/dieter-medium/bidi2pdf/compare/v0.1.8..HEAD)
+- [0.1.8](https://github.com/dieter-medium/bidi2pdf/compare/v0.1.7..v0.1.8)
+- [0.1.7](https://github.com/dieter-medium/bidi2pdf/compare/v0.1.6..v0.1.7)
+- [0.1.6](https://github.com/dieter-medium/bidi2pdf/compare/v0.1.5..v0.1.6)
+- [0.1.5](https://github.com/dieter-medium/bidi2pdf/compare/v0.1.4..v0.1.5)
+- [0.1.4](https://github.com/dieter-medium/bidi2pdf/compare/v0.1.3..v0.1.4)

data/README.md CHANGED

@@ -257,6 +257,20 @@ docker compose -f docker/docker-compose.yml down
 ---
+## 🚂 Rails Integration
+Rails integration is available as an additional gem:
+```ruby
+# In your Gemfile
+gem 'bidi2pdf-rails'
+```
+For full documentation and usage examples,
+visit: [https://github.com/dieter-medium/bidi2pdf-rails](https://github.com/dieter-medium/bidi2pdf-rails)
+---
 ## 🛠 Development
 ```bash

data/docker/Dockerfile.chromedriver CHANGED

@@ -7,7 +7,7 @@ ENV DEBIAN_FRONTEND=noninteractive
 # Install dependencies
 RUN apt-get update && apt-get upgrade -y && \
     apt-get install -y --no-install-recommends\
-    chromium \
+    chromium chromium-driver\
     libglib2.0-0 \
     libnss3 \
     libxss1 \
@@ -26,6 +26,13 @@ RUN groupadd -r appuser && useradd -r -g appuser -m -d /home/appuser appuser
 COPY ./docker/entrypoint.sh /usr/local/bin/entrypoint.sh
 RUN chmod +x /usr/local/bin/entrypoint.sh
+# ARM compatibility workaround:
+# On ARM architectures (such as Apple Silicon), downloading chromedriver via automated scripts may fail or cause ELF binary errors,
+# such as "rosetta error: failed to open elf at /lib64/ld-linux-x86-64.so.2".
+# To avoid these issues, we directly install 'chromium-driver' via the package manager and explicitly create a symlink in the expected location.
+RUN mkdir -p /home/appuser/.webdrivers && ln -s /usr/bin/chromedriver /home/appuser/.webdrivers/chromedriver
 # Set working directory
 WORKDIR /app

data/lib/bidi2pdf/bidi/browser_tab.rb CHANGED

@@ -4,6 +4,7 @@ require "base64"
 require_relative "network_events"
 require_relative "logger_events"
+require_relative "navigation_failed_events"
 require_relative "auth_interceptor"
 require_relative "add_headers_interceptor"
 require_relative "js_logger_helper"
@@ -32,6 +33,11 @@ require_relative "js_logger_helper"
 # @param [String] user_context_id The ID of the user context.
 module Bidi2pdf
   module Bidi
+    # Represents a browser tab for managing interactions and communication
+    # using the Bidi2pdf library. This class provides methods for creating
+    # browser tabs, managing cookies, navigating to URLs, executing scripts,
+    # handling network events, and general tab lifecycle management.
+    #
     class BrowserTab
       include JsLoggerHelper
@@ -56,6 +62,9 @@ module Bidi2pdf
       # @return [LoggerEvents] The logger events handler.
       attr_reader :logger_events
+      # @return [NavigationFailedEvents] The navigation failed events handler.
+      attr_reader :navigation_failed_events
       # Initializes a new browser tab.
       #
       # @param [Object] client The WebSocket client for communication.
@@ -68,6 +77,7 @@ module Bidi2pdf
         @tabs = []
         @network_events = NetworkEvents.new browsing_context_id
         @logger_events = LoggerEvents.new browsing_context_id
+        @navigation_failed_events = NavigationFailedEvents.new browsing_context_id
         @open = true
       end
@@ -154,8 +164,21 @@ module Bidi2pdf
       # Navigates the browser tab to a specified URL.
       #
+      # This method registers necessary event listeners and sends a navigation
+      # command to the browser tab, instructing it to load the specified URL.
+      # It validates that the URL is properly formatted before attempting navigation.
+      #
       # @param [String] url The URL to navigate to.
+      # @raise [NavigationError] If the URL is invalid or improperly formatted.
+      # @example
+      #   browser_tab.navigate_to("https://example.com")
       def navigate_to(url)
+        begin
+          URI.parse(url)
+        rescue URI::InvalidURIError => e
+          raise NavigationError, "Invalid URL: #{url} - #{e.message}"
+        end
         Bidi2pdf.notification_service.instrument("navigate_to.bidi2pdf", url: url) do
           navigate_with_listeners url
         end
@@ -389,6 +412,18 @@ module Bidi2pdf
         client.send_cmd_and_wait(cmd) do |response|
           Bidi2pdf.logger.debug "Navigated to page url: #{url} response: #{response}"
         end
+      rescue Bidi2pdf::CmdError => e
+        msg = e.response["message"]
+        case msg
+        when /^net::ERR_INVALID_AUTH_CREDENTIALS/
+          raise NavigationAuthError.new(url, msg)
+        when /^net::ERR_NAME_NOT_RESOLVED/
+          raise NavigationDNSError.new(url, msg)
+        when /^net::/
+          raise NavigationError, "Connection error: #{url} #{msg}"
+        else
+          raise e
+        end
       end
       def register_event_listeners
@@ -401,6 +436,8 @@ module Bidi2pdf
         client.on_event("log.entryAdded",
                         &logger_events.method(:handle_event))
+        client.on_event("browsingContext.navigationFailed", &navigation_failed_events.method(:handle_event))
       end
       def handle_injection_exception(response, url, exception_class)
@@ -536,6 +573,9 @@ module Bidi2pdf
         client.remove_event_listener "network.responseStarted", "network.responseCompleted", "network.fetchError",
                                      &network_events.method(:handle_event)
+        client.remove_event_listener("log.entryAdded",
+                                     &logger_events.method(:handle_event))
       end
       # Closes all tabs associated with the browser tab.
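
Note: the new `rescue` branch in the navigation code maps Chromium `net::` error messages to more specific error classes, checking the most specific patterns first and falling back to a generic navigation error. The sketch below restates that dispatch in isolation. The `Sketch*` classes and `classify_navigation_error` are stand-ins invented for illustration; the gem's own `NavigationError`, `NavigationAuthError` and `NavigationDNSError` are defined elsewhere and may take different constructor arguments.

```ruby
# Stand-in error classes for illustration only (assumed names).
class SketchNavigationError < StandardError; end
class SketchAuthError < SketchNavigationError; end
class SketchDNSError < SketchNavigationError; end

# Returns the error class for a "net::" message, most specific pattern first.
# Returns nil for anything else, so the caller can re-raise the original error.
def classify_navigation_error(message)
  case message
  when /\Anet::ERR_INVALID_AUTH_CREDENTIALS/ then SketchAuthError
  when /\Anet::ERR_NAME_NOT_RESOLVED/ then SketchDNSError
  when /\Anet::/ then SketchNavigationError
  end
end

dns_class = classify_navigation_error("net::ERR_NAME_NOT_RESOLVED")
generic_class = classify_navigation_error("net::ERR_CONNECTION_REFUSED")
other_class = classify_navigation_error("unknown command")
```

Because the specific classes inherit from the generic one in this sketch, a caller could rescue the base class to catch every navigation failure, or a subclass to treat DNS and authentication problems differently.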

data/lib/bidi2pdf/bidi/command_manager.rb CHANGED

@@ -5,11 +5,10 @@ module Bidi2pdf
     class CommandManager
       class << self
         def initialize_counter
-          @id = 0
-          @id_mutex = Mutex.new
+          @id = Concurrent::AtomicFixnum.new(0)
         end
-        def next_id = @id_mutex.synchronize { @id += 1 }
+        def next_id = @id.increment
       end
       initialize_counter
@@ -17,19 +16,14 @@ module Bidi2pdf
       def initialize(socket)
         @socket = socket
-        @pending_responses = {}
-        @initiated_cmds = {}
+        @pending_responses = Concurrent::Hash.new
       end
-      def send_cmd(cmd, store_response: false)
+      def send_cmd(cmd, result_queue: nil)
         id = next_id
         Bidi2pdf.notification_service.instrument("send_cmd.bidi2pdf", id: id, cmd: cmd) do |instrumentation_payload|
-          if store_response
-            init_queue_for id
-          else
-            @initiated_cmds[id] = true
-          end
+          init_queue_for id, result_queue
           payload = cmd.as_payload(id)
@@ -42,17 +36,20 @@ module Bidi2pdf
       end
       def send_cmd_and_wait(cmd, timeout: Bidi2pdf.default_timeout, &block)
+        result_queue = Thread::Queue.new
         Bidi2pdf.notification_service.instrument("send_cmd_and_wait.bidi2pdf", cmd: cmd, timeout: timeout) do |instrumentation_payload|
-          id = send_cmd(cmd, store_response: true)
+          id = send_cmd(cmd, result_queue: result_queue)
           instrumentation_payload[:id] = id
-          response = pop_response id, timeout: timeout
+          response = result_queue.pop(timeout: timeout)
           instrumentation_payload[:response] = response
           raise CmdTimeoutError, "Timeout waiting for response to command ID #{id}" if response.nil?
-          raise CmdError, "Error response: #{response["error"]} #{cmd.inspect}" if response["error"]
+          raise Bidi2pdf::CmdError.new(cmd, response) if response["error"]
           block ? block.call(response) : response
         ensure
@@ -60,14 +57,6 @@ module Bidi2pdf
         end
       end
-      def pop_response(id, timeout:)
-        raise CmdResponseNotStoredError, "No response stored for command ID #{id} or already popped or this command was not send" unless @pending_responses.key?(id)
-        @pending_responses[id].pop(timeout: timeout)
-      ensure
-        @pending_responses.delete(id)
-      end
       def handle_response(data)
         Bidi2pdf.notification_service.instrument("handle_response.bidi2pdf", data: data) do |instrumentation_payload|
           instrumentation_payload[:error] = data["error"] if data["error"]
@@ -78,9 +67,6 @@ module Bidi2pdf
             if @pending_responses.key?(id)
               @pending_responses[id]&.push(data)
-              return true
-            elsif @initiated_cmds.key?(id)
-              @initiated_cmds.delete(id)
               return true
             end
@@ -89,12 +75,14 @@ module Bidi2pdf
           instrumentation_payload[:handled] = false
           false
+        ensure
+          @pending_responses.delete id
         end
       end
       private
-      def init_queue_for(id) = @pending_responses[id] = Thread::Queue.new
+      def init_queue_for(id, result_queue) = @pending_responses[id] = result_queue
       def next_id = self.class.next_id
     end
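
Note: after this change each waiting caller owns its result queue. `send_cmd` registers the queue under the command id, `handle_response` pushes the response into it and drops the entry, and `send_cmd_and_wait` pops from its own queue with a timeout. The sketch below shows that flow with the standard library only. `SketchCommandManager` is an invented stand-in, and it swaps the gem's `Concurrent::Hash` and `Concurrent::AtomicFixnum` for a plain `Hash` guarded by a `Mutex`. `Thread::Queue#pop(timeout:)` assumes Ruby 3.2 or newer.

```ruby
# Simplified stand-in for the command/response bookkeeping; not the gem's class.
class SketchCommandManager
  def initialize
    @next_id = 0
    @pending = {}
    @lock = Mutex.new # the gem relies on concurrent-ruby structures instead
  end

  # Registers the caller's queue under a fresh id and returns that id.
  def send_cmd(result_queue)
    @lock.synchronize do
      @next_id += 1
      @pending[@next_id] = result_queue
      @next_id
    end
  end

  # Routes a response to the waiting queue, then forgets the id.
  def handle_response(data)
    queue = @lock.synchronize { @pending.delete(data["id"]) }
    return false unless queue

    queue.push(data)
    true
  end
end

manager = SketchCommandManager.new
queue = Thread::Queue.new
id = manager.send_cmd(queue)

# Simulate the socket thread delivering the response.
Thread.new { manager.handle_response({ "id" => id, "result" => "ok" }) }.join

response = queue.pop(timeout: 1) # nil would mean the wait timed out
```

Passing the queue in from the caller removes the need for a separate "pop by id" lookup, which is the method the diff deletes.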

data/lib/bidi2pdf/bidi/connection_manager.rb CHANGED

@@ -6,7 +6,7 @@ module Bidi2pdf
       def initialize(logger:)
         @logger = logger
         @connected = false
-        @connection_queue = Thread::Queue.new
+        @connection_latch = Concurrent::CountDownLatch.new(1)
       end
       def mark_connected
@@ -14,7 +14,7 @@ module Bidi2pdf
         @connected = true
         @logger.debug "WebSocket connection is open"
-        @connection_queue.push(true)
+        @connection_latch.count_down
       end
       def wait_until_open(timeout:)
@@ -22,13 +22,7 @@ module Bidi2pdf
         @logger.debug "Waiting for WebSocket connection to open"
-        begin
-          Timeout.timeout(timeout) do
-            @connection_queue.pop
-          end
-        rescue Timeout::Error
-          raise Bidi2pdf::WebsocketError, "WebSocket connection did not open in time #{timeout} sec."
-        end
+        raise Bidi2pdf::WebsocketError, "WebSocket connection did not open in time #{timeout} sec." unless @connection_latch.wait(timeout)
         true
       end
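
Note: the queue plus `Timeout.timeout` pair is replaced by a one-shot latch whose `wait(timeout)` result decides whether to raise. To make the semantics concrete without depending on concurrent-ruby, the sketch below implements a minimal latch with `Mutex` and `ConditionVariable`. `TinyLatch` is an invented stand-in, not `Concurrent::CountDownLatch`, and it only mirrors the behaviour the diff relies on: `wait` returns true once the count reaches zero and false if the timeout elapses first.

```ruby
# Minimal count-down latch for illustration; a stand-in, not concurrent-ruby.
class TinyLatch
  def initialize(count)
    @count = count
    @mutex = Mutex.new
    @cond = ConditionVariable.new
  end

  # Decrements the count and wakes all waiters when it reaches zero.
  def count_down
    @mutex.synchronize do
      @count -= 1 if @count.positive?
      @cond.broadcast if @count.zero?
    end
  end

  # Blocks until the count is zero; returns false if the timeout expires first.
  def wait(timeout = nil)
    deadline = timeout && (Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout)
    @mutex.synchronize do
      until @count.zero?
        remaining = deadline && (deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC))
        return false if remaining && remaining <= 0

        @cond.wait(@mutex, remaining)
      end
      true
    end
  end
end

latch = TinyLatch.new(1)
timed_out = !latch.wait(0.01)       # nothing has counted down yet
Thread.new { latch.count_down }.join
opened = latch.wait(0.01)           # already at zero, returns immediately
```

Unlike a queue, a latch that has reached zero stays open, so every later `wait` returns immediately instead of blocking on an empty queue.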

data/lib/bidi2pdf/bidi/event_manager.rb CHANGED

@@ -6,7 +6,7 @@ module Bidi2pdf
       attr_reader :type
       def initialize(type)
-        @listeners = Hash.new { |h, k| h[k] = [] }
+        @listeners = Concurrent::Hash.new { |h, k| h[k] = [] }
         @type = type
       end

data/lib/bidi2pdf/bidi/navigation_failed_events.rb ADDED

@@ -0,0 +1,41 @@
+# frozen_string_literal: true
+require_relative "browser_console_logger"
+module Bidi2pdf
+  module Bidi
+    class NavigationFailedEvents
+      attr_reader :context_id, :browser_console_logger
+      def initialize(context_id)
+        @context_id = context_id
+      end
+      def handle_event(data)
+        event = data["params"]
+        method = data["method"]
+        if event["context"] == context_id
+          handle_response(method, event)
+        else
+          Bidi2pdf.logger.debug2 "Ignoring Log event: #{method}, context_id: #{context_id}, params: #{event}"
+        end
+      end
+      def handle_response(_method, event)
+        url = event["url"]
+        navigation = event["navigation"]
+        timestamp = event["timestamp"]
+        Bidi2pdf.notification_service.instrument("navigation_failed_received.bidi2pdf",
+                                                 {
+                                                   url: url,
+                                                   timestamp: timestamp,
+                                                   navigation: navigation
+                                                 })
+        Bidi2pdf.logger.error "Navigation failed for URL: #{url}, Navigation: #{navigation}"
+      end
+    end
+  end
+end

data/lib/bidi2pdf/bidi/session.rb CHANGED

@@ -117,7 +117,10 @@ module Bidi2pdf
       # Retrieves the status of the session.
       def status
-        send_cmd(Bidi2pdf::Bidi::Commands::SessionStatus.new) { |resp| Bidi2pdf.logger.info "Session status: #{resp.inspect}" }
+        send_cmd(Bidi2pdf::Bidi::Commands::SessionStatus.new) do |resp|
+          Bidi2pdf.logger.info "Session status: #{resp["result"].inspect}"
+          resp["result"]
+        end
       end
       # Checks if the session has started.

data/lib/bidi2pdf/notifications.rb CHANGED

@@ -18,7 +18,7 @@ module Bidi2pdf
   module Notifications
     Thread.attr_accessor :bidi2pdf_notification_instrumenter
-    @subscribers = Hash.new { |h, k| h[k] = [] }
+    @subscribers = Concurrent::Hash.new { |h, k| h[k] = [] }
     class << self
       attr_reader :subscribers

data/lib/bidi2pdf/test_helpers/matchers/contains_pdf_text.rb ADDED

@@ -0,0 +1,50 @@
+# frozen_string_literal: true
+require_relative "../pdf_text_sanitizer"
+# Custom RSpec matcher for checking whether a PDF document contains specific text.
+#
+# This matcher allows you to assert that a certain string or regular expression
+# is present in the sanitized text of a PDF document.
+#
+# It supports chaining with `.at_page(n)` to limit the search to a specific page.
+#
+# ## Examples
+#
+#     expect(pdf_data).to contains_pdf_text("Total: 123.45")
+#     expect(pdf_data).to contains_pdf_text(/Invoice #\d+/).at_page(2)
+#
+# @param expected [String, Regexp] The text or pattern to match inside the PDF.
+#
+# @return [Boolean] true if the expected content is found (on the given page if specified)
+RSpec::Matchers.define :contains_pdf_text do |expected|
+  chain :at_page do |page_number|
+    @page_number = page_number
+  end
+  match do |actual|
+    Bidi2pdf::TestHelpers::PDFTextSanitizer.contains?(actual, expected, @page_number)
+  end
+  failure_message do |actual|
+    pages = Bidi2pdf::TestHelpers::PDFTextSanitizer.clean_pages(actual)
+    return "Document does not contain page #{@page_number}" if @page_number && !(@page_number && @page_number <= pages.size)
+    <<~MSG
+      PDF text did not contain expected content.
+      --- Expected (#{expected.inspect}) ---
+      On page #{@page_number || "any"}:
+      --- Actual ---
+      #{pages.each_with_index.map { |text, i| "Page #{i + 1}:\n#{text}" }.join("\n\n")}
+    MSG
+  end
+  description do
+    desc = "contain #{expected.inspect} in PDF"
+    desc += " on page #{@page_number}" if @page_number
+    desc
+  end
+end

data/lib/bidi2pdf/test_helpers/matchers/have_pdf_page_count.rb ADDED

@@ -0,0 +1,50 @@
+# frozen_string_literal: true
+require "pdf-reader"
+require "base64"
+# RSpec matcher to assert the number of pages in a PDF document.
+#
+# This matcher is useful for verifying the structural integrity of generated or uploaded PDFs,
+# especially in tests for reporting, invoice generation, or document exports.
+#
+# It supports a variety of input types:
+# - Raw PDF data as a `String`
+# - File paths (`String`)
+# - `StringIO` or `File` objects
+# - Even Base64-encoded strings, if your `pdf_reader_for` method handles it
+#
+# ## Example
+#
+#     expect(pdf_data).to have_pdf_page_count(5)
+#     expect(StringIO.new(pdf_data)).to have_pdf_page_count(3)
+#
+# If the PDF is malformed, the matcher will gracefully fail and show the error message.
+#
+# @param expected_count [Integer] The number of pages the PDF is expected to contain.
+# @return [RSpec::Matchers::Matcher] The matcher object for use in specs.
+#
+# @note This matcher depends on `Bidi2pdf::TestHelpers::PDFReaderUtils.pdf_reader_for`
+#   to extract the page count. Make sure it supports all your intended input formats.
+RSpec::Matchers.define :have_pdf_page_count do |expected_count|
+  match do |pdf_data|
+    reader = Bidi2pdf::TestHelpers::PDFReaderUtils.pdf_reader_for(pdf_data)
+    @actual_count = reader.page_count
+    @actual_count == expected_count
+  rescue PDF::Reader::MalformedPDFError => e
+    @error_message = e.message
+    false
+  end
+  failure_message do |_pdf_data|
+    if @error_message
+      "Expected a valid PDF with #{expected_count} pages, but encountered an error: #{@error_message}"
+    else
+      "Expected PDF to have #{expected_count} pages, but it has #{@actual_count} pages"
+    end
+  end
+  description do
+    "have #{expected_count} PDF pages"
+  end
+end

data/lib/bidi2pdf/test_helpers/matchers/match_pdf_text.rb ADDED

@@ -0,0 +1,45 @@
+# frozen_string_literal: true
+require_relative "../pdf_text_sanitizer"
+# Custom RSpec matcher to compare the **sanitized text content** of two PDF files.
+#
+# This matcher is useful for comparing PDF documents where formatting and metadata may differ,
+# but the actual visible text content should be the same. It uses `PDFTextSanitizer` internally
+# to normalize and clean the text before comparison.
+#
+# ## Example
+#
+#     expect(actual_pdf).to match_pdf_text(expected_pdf)
+#
+# If the texts don’t match, it prints a diff-friendly message showing cleaned text content.
+#
+# @param expected [String, StringIO, File] The expected PDF content (can be a file path, StringIO, or raw string).
+# @return [RSpec::Matchers::Matcher] An RSpec matcher to compare against an actual PDF.
+#
+# @note Ensure `PDFTextSanitizer.match?` and `PDFTextSanitizer.clean_pages` are implemented
+#   to handle your specific PDF processing logic.
+RSpec::Matchers.define :match_pdf_text do |expected|
+  match do |actual|
+    Bidi2pdf::TestHelpers::PDFTextSanitizer.match?(actual, expected)
+  end
+  failure_message do |actual|
+    cleaned_actual = Bidi2pdf::TestHelpers::PDFTextSanitizer.clean_pages(actual)
+    cleaned_expected = Bidi2pdf::TestHelpers::PDFTextSanitizer.clean_pages(expected)
+    <<~MSG
+      PDF text did not match.
+      --- Expected ---
+      #{cleaned_expected.join("\n")}
+      --- Actual ---
+      #{cleaned_actual.join("\n")}
+    MSG
+  end
+  description do
+    "match sanitized PDF text content"
+  end
+end
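
Note: the matcher compares sanitized text rather than raw bytes, so differences in ligatures, line-break hyphenation and whitespace do not cause a mismatch. The sketch below shows the general idea on plain strings. It is an approximation and not the gem's `PDFTextSanitizer`: the method names `sketch_clean` and `sketch_comparable` are invented, and it uses Ruby's built-in `String#unicode_normalize` where the gem uses the `unicode_utils` library.

```ruby
# Approximate text clean-up: decompose compatibility characters (for example
# the "fi" ligature), join words hyphenated across line breaks, and collapse
# runs of whitespace into a single space.
def sketch_clean(text)
  text.unicode_normalize(:nfkd)
      .gsub("-\n", "")
      .gsub(/\s+/, " ")
      .strip
end

# Form used for equality checks: cleaned text with all whitespace removed.
def sketch_comparable(text)
  sketch_clean(text).gsub(/\s+/, "")
end

cleaned = sketch_clean("of\uFB01ce  hyphen-\nated")
same = sketch_comparable("Total:  123.45\n") == sketch_comparable("Total: 123.45")
```

Keeping a readable cleaned form next to the whitespace-free comparison form lets the failure message print readable text while the equality check ignores layout.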

data/lib/bidi2pdf/test_helpers/pdf_reader_utils.rb ADDED

@@ -0,0 +1,89 @@
+# frozen_string_literal: true
+module Bidi2pdf
+  module TestHelpers
+    module PDFReaderUtils
+      class << self
+        # Extracts text content from a PDF document.
+        #
+        # This method accepts various PDF input formats and attempts to extract text content
+        # from all pages. If extraction fails due to malformed PDF data, it returns the original input.
+        #
+        # @param pdf_data [String, StringIO, File] The PDF data in one of the following formats:
+        #   * Base64-encoded PDF string
+        #   * Raw PDF data beginning with "%PDF-"
+        #   * StringIO object containing PDF data
+        #   * Path to a PDF file as String
+        #   * Raw PDF data as String
+        # @return [Array<String>] An array of strings, with each string representing the text content of a page
+        # @return [Object] The original input if PDF extraction fails
+        # @example Extract text from a PDF file
+        #   text_content = pdf_text('path/to/document.pdf')
+        #
+        # @example Extract text from Base64-encoded string
+        #   text_content = pdf_text(base64_encoded_pdf_data)
+        def pdf_text(pdf_data)
+          return pdf_data unless pdf_data.is_a?(String) || pdf_data.is_a?(StringIO) || pdf_data.is_a?(File)
+          begin
+            reader = pdf_reader_for pdf_data
+            reader.pages.map(&:text)
+          rescue PDF::Reader::MalformedPDFError
+            [pdf_data]
+          end
+        end
+        # Converts the input PDF data into an IO object and initializes a PDF::Reader.
+        #
+        # @param pdf_data [String, StringIO, File] The PDF data to be read.
+        # @return [PDF::Reader] A PDF::Reader instance for the given data.
+        # @raise [PDF::Reader::MalformedPDFError] If the PDF data is invalid.
+        def pdf_reader_for(pdf_data)
+          io = convert_data_to_io(pdf_data)
+          PDF::Reader.new(io)
+        end
+        # rubocop: disable Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity
+        # Converts various input formats into an IO object for PDF::Reader.
+        #
+        # @param pdf_data [String, StringIO, File] The PDF data to be converted.
+        # @return [IO] An IO object containing the PDF data.
+        def convert_data_to_io(pdf_data)
+          # rubocop:disable Lint/DuplicateBranch
+          if pdf_data.is_a?(String) && (pdf_data.start_with?("JVBERi") || pdf_data.start_with?("JVBER"))
+            StringIO.new(Base64.decode64(pdf_data))
+          elsif pdf_data.start_with?("%PDF-")
+            StringIO.new(pdf_data)
+          elsif pdf_data.is_a?(StringIO)
+            pdf_data
+          elsif pdf_data.is_a?(String) && File.exist?(pdf_data)
+            File.open(pdf_data, "rb")
+          else
+            StringIO.new(pdf_data)
+          end
+          # rubocop:enable Lint/DuplicateBranch
+        end
+      end
+      # rubocop: enable Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity
+      module InstanceMethods
+        def pdf_text(pdf_data)
+          PDFReaderUtils.pdf_text(pdf_data)
+        end
+        def pdf_reader_for(pdf_data)
+          PDFReaderUtils.pdf_reader_for(pdf_data)
+        end
+        def convert_data_to_io(pdf_data)
+          PDFReaderUtils.convert_data_to_io(pdf_data)
+        end
+      end
+      def self.included(base)
+        base.include(InstanceMethods)
+      end
+    end
+  end
+end
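
Note: `convert_data_to_io` picks an IO source by inspecting the input: a Base64 prefix (`JVBERi` is how `%PDF-` begins once encoded), a raw `%PDF-` header, an existing IO object, or a file path. One thing to watch when reading the added code is that `StringIO` does not respond to `start_with?`, so the order of the checks matters for non-String inputs. The sketch below is a hypothetical variant that tests the type first. `pdf_io_for` is an invented name, and it decodes Base64 with the core `String#unpack1("m")` rather than the `base64` library.

```ruby
require "stringio"

# Hypothetical input detection that checks the object type before calling
# String-only methods. Returns an IO-like object positioned at the PDF bytes.
def pdf_io_for(input)
  return input if input.is_a?(StringIO) || input.is_a?(File)
  raise ArgumentError, "unsupported input: #{input.class}" unless input.is_a?(String)

  if input.start_with?("JVBERi")          # Base64 of data beginning with "%PDF-"
    StringIO.new(input.unpack1("m"))
  elsif input.start_with?("%PDF-")        # raw PDF data
    StringIO.new(input)
  elsif !input.include?("\0") && File.file?(input) # path to an existing file
    File.open(input, "rb")
  else
    StringIO.new(input)                   # fall back to treating it as data
  end
end

raw = "%PDF-1.7\n"
encoded = [raw].pack("m0")
existing_io = StringIO.new(raw)

decoded = pdf_io_for(encoded).read
passthrough = pdf_io_for(existing_io)
```

The null-byte guard is there because `File.file?` raises on strings containing `\0`, which binary data passed as a String may well include.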

data/lib/bidi2pdf/test_helpers/pdf_text_sanitizer.rb ADDED

@@ -0,0 +1,232 @@
+# frozen_string_literal: true
+require "unicode_utils"
+require "diff/lcs"
+require "diff/lcs/hunk"
+module Bidi2pdf
+  module TestHelpers
+    # rubocop: disable Metrics/ModuleLength
+    # Provides utilities for sanitizing and comparing PDF text content.
+    # This module includes methods for cleaning text, comparing PDF content,
+    # and reporting differences between actual and expected PDF outputs.
+    #
+    # The sanitization process includes normalizing whitespace, replacing
+    # typographic ligatures, and handling other common text formatting issues.
+    #
+    # @example Cleaning text
+    #   sanitized_text = Bidi2pdf::TestHelpers::PDFTextSanitizer.clean("Some text")
+    #
+    # @example Comparing PDF content
+    #   match = Bidi2pdf::TestHelpers::PDFTextSanitizer.match?(actual_pdf, expected_pdf)
+    module PDFTextSanitizer
+      class << self
+        # Cleans the given text by replacing common typographic ligatures,
+        # normalizing whitespace, and removing unnecessary characters.
+        #
+        # @param [String] text The text to clean.
+        # @return [String] The cleaned text.
+        def clean(text)
+          text = UnicodeUtils.nfkd(text)
+          text.gsub("\uFB01", "fi")
+              .gsub("\uFB02", "fl")
+              .gsub("-\n", "")
+              .gsub(/["]/, '"')
+              .gsub(/[']/, "'")
+              .gsub("…", "...")
+              .gsub("—", "--")
+              .gsub("–", "-")
+              .gsub(/\s+/, " ") # Replace all whitespace sequences with a single space
+              .strip
+        end
+        # Cleans an array of PDF page texts by applying the `clean` method
+        # to each page's content.
+        #
+        # @param [Object] actual_pdf_thingy The PDF object to clean.
+        # @return [Array<String>] An array of cleaned page texts.
+        def clean_pages(actual_pdf_thingy)
+          Bidi2pdf::TestHelpers::PDFReaderUtils.pdf_text(actual_pdf_thingy).map { |text| clean(text) }
+        end
+        # Cleans the given text and removes all whitespace for comparison purposes.
+        #
+        # @param [String] text The text to clean and normalize.
+        # @return [String] The cleaned text without whitespace.
+        def normalize(text)
+          clean(text).gsub(/\s+/, "")
+        end
+        # Checks if the given PDF contains the expected text or pattern.
+        #
+        # @param [Object] actual_pdf_thingy The PDF object to search.
+        # @param [String, Regexp] expected The expected text or pattern.
+        # @param [Integer, nil] page_number The specific page to search (optional).
+        # @return [Boolean] `true` if the expected text is found, `false` otherwise.
+        def contains?(actual_pdf_thingy, expected, page_number = nil)
+          pages = Bidi2pdf::TestHelpers::PDFReaderUtils.pdf_text(actual_pdf_thingy)
+          cleaned_pages = clean_pages(pages)
+          return false if page_number && page_number > cleaned_pages.size
+          # Narrow to specific page if requested
+          if page_number
+            text = cleaned_pages[page_number - 1]
+            return match_expected?(text, expected)
+          end
+          # Search all pages
+          cleaned_pages.any? { |page| match_expected?(page, expected) }
+        end
+        # Matches the given text against the expected text or pattern.
+        #
+        # @param [String] text The text to match.
+        # @param [String, Regexp] expected The expected text or pattern.
+        # @return [Boolean] `true` if the text matches, `false` otherwise.
+        def match_expected?(text, expected)
+          return false unless text
+          expected.is_a?(Regexp) ? text.match?(expected) : text.include?(expected.to_s)
+        end
+        # Compares the content of two PDF objects for equality.
+        #
+        # @param [Object] actual_pdf_thingy The actual PDF object.
+        # @param [Object] expected_pdf_thingy The expected PDF object.
+        # @return [Boolean] `true` if the content matches, `false` otherwise.
+        def match?(actual_pdf_thingy, expected_pdf_thingy)
+          actual = Bidi2pdf::TestHelpers::PDFReaderUtils.pdf_text actual_pdf_thingy
+          expected = Bidi2pdf::TestHelpers::PDFReaderUtils.pdf_text expected_pdf_thingy
+          cleaned_actual = clean_pages(actual)
+          cleaned_expected = clean_pages(expected)
+          # Compare without whitespace for equality check
+          actual_for_comparison = cleaned_actual.map { |text| normalize(text) }
+          expected_for_comparison = cleaned_expected.map { |text| normalize(text) }
+          if actual_for_comparison == expected_for_comparison
+            true
+          else
+            report_content_mismatch(cleaned_actual, cleaned_expected)
+            false
+          end
+        end
+        # Reports differences between actual and expected PDF content.
+        #
+        # @param [Array<String>] actual The actual PDF content.
+        # @param [Array<String>] expected The expected PDF content.
+        # @return [void]
+        def report_content_mismatch(actual, expected)
+          puts "--- PDF content mismatch ---"
+          print_differences(actual, expected)
+        end
+        # Prints detailed differences between actual and expected PDF content.
+        #
+        # @param [Array<String>] actual The actual PDF content.
+        # @param [Array<String>] expected The expected PDF content.
+        # @return [void]
+        def print_differences(actual, expected)
+          max_pages = [actual.length, expected.length].max
+          (0...max_pages).each do |page_idx|
+            actual_page = actual[page_idx] || "(missing page)"
+            expected_page = expected[page_idx] || "(missing page)"
+            print_differences_for_page(actual_page, expected_page, page_idx)
+          end
+        end
+        # Prints the differences between actual and expected content for a specific page.
+        # This method compares the content ignoring whitespace and, if differences are found,
+        # outputs a formatted representation of those differences.
+        #
+        # @param [String] actual_page The actual page content.
+        # @param [String] expected_page The expected page content.
+        # @param [Integer] page_idx The zero-based index of the page being compared.
+        # @return [void]
+        def print_differences_for_page(actual_page, expected_page, page_idx)
+          # Compare without whitespace
+          actual_no_space = normalize(actual_page.to_s)
+          expected_no_space = normalize(expected_page.to_s)
+          return if actual_no_space == expected_no_space
+          puts "\nPage #{page_idx + 1} differences (ignoring whitespace):"
+          # Create diffs between the two pages
+          diffs = Diff::LCS.sdiff(expected_page, actual_page)
+          # Format and display the differences
+          puts format_diff_output(diffs, expected_page, actual_page)
+        end
+        # Formats the output of differences for display.
+        #
+        # @param [Array<Diff::LCS::ContextChange>] diffs The list of differences.
+        # @param [String] expected The expected text.
+        # @param [String] actual The actual text.
+        # @return [String] The formatted differences.
+        def format_diff_output(diffs, expected, actual)
+          output = []
+          changes = group_changed_diffs(diffs)
+          # Output each change with context
+          changes.each do |change|
+            output += format_change expected, actual, change
+          end
+          output.join("\n")
+        end
+        private
+        # Groups contiguous “real” diffs (added/removed/changed) into blocks,
+        # splitting whenever you hit an unchanged (“=”) diff.
+        def group_changed_diffs(diffs)
+          diffs
+            .chunk_while { |_prev, curr| curr.action != "=" }
+            .map { |chunk| chunk.reject { |elem| elem.action == "=" } }
+            .select(&:any?)
+            .map { |chunk| { diffs: chunk } }
+        end
+        def format_change(expected, actual, change)
+          pos = change[:diffs].first.old_position
+          snippets = extract_snippets(expected, actual, change, pos)
+          build_output(snippets, pos)
+        end
+        def extract_snippets(expected, actual, change, pos)
+          {
+            context_start: [0, pos - 20].max,
+            context: expected,
+            expected_snip: expected[pos, 50],
+            actual_snip: actual[change[:diffs].first.new_position, 50]
+          }
+        end
+        # Builds the final lines of output for a single change.
+        def build_output(snip_data, pos)
+          start = snip_data[:context_start]
+          ctx = snip_data[:context]
+          [
+            "  Context: ...#{ctx[start...pos]}",
+            "  Expected: #{snip_data[:expected_snip]}...",
+            "  Actual:   #{snip_data[:actual_snip]}...",
+            "  Expected (no spaces): #{normalize(snip_data[:expected_snip])}...",
+            "  Actual (no spaces):   #{normalize(snip_data[:actual_snip])}..."
+          ]
+        end
+      end
+    end
+    # rubocop:enable Metrics/ModuleLength
+  end
+end
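
The grouping step in `group_changed_diffs` can be tried in isolation. The following is a minimal sketch, not the gem's test code: `Change` is a hypothetical stand-in for `Diff::LCS::ContextChange` that carries only the fields used here, so the example runs without the diff-lcs gem.

```ruby
# Hypothetical stand-in for Diff::LCS::ContextChange (assumption: only
# action, old_position and new_position matter for the grouping step).
Change = Struct.new(:action, :old_position, :new_position)

# Same chain as in the diff above: start a new chunk at every unchanged
# ("=") element, drop the unchanged elements, and discard empty chunks.
def group_changed_diffs(diffs)
  diffs
    .chunk_while { |_prev, curr| curr.action != "=" }
    .map { |chunk| chunk.reject { |elem| elem.action == "=" } }
    .select(&:any?)
    .map { |chunk| { diffs: chunk } }
end

diffs = [
  Change.new("=", 0, 0),
  Change.new("!", 1, 1),
  Change.new("!", 2, 2),
  Change.new("=", 3, 3),
  Change.new("+", 4, 4)
]

groups = group_changed_diffs(diffs)
groups.each do |group|
  puts group[:diffs].map(&:old_position).inspect
end
```

Each group's first element supplies the `old_position` that `format_change` uses as the anchor for its context snippet.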

data/lib/bidi2pdf/test_helpers/testcontainers/chromedriver_container.rb ADDED

@@ -0,0 +1,87 @@
+# frozen_string_literal: true
+begin
+  require "testcontainers"
+rescue LoadError
+  warn "Missing testcontainers. Add it to your Gemfile if you're using Bidi2pdf test helpers."
+end
+module Bidi2pdf
+  module TestHelpers
+    module Testcontainers
+      class ChromedriverContainer < ::Testcontainers::DockerContainer
+        DEFAULT_CHROMEDRIVER_PORT = 3000
+        DEFAULT_IMAGE = "dieters877565/chromedriver"
+        attr_reader :docker_file, :build_dir
+        def initialize(image = DEFAULT_IMAGE, **options)
+          @docker_file = options.delete(:docker_file) || "Dockerfile"
+          @build_dir = options.delete(:build_dir) || options[:working_dir]
+          super
+          @wait_for ||= add_wait_for(:logs, /ChromeDriver was started successfully on port/)
+        end
+        def start
+          with_exposed_ports(port)
+          super
+        end
+        def port
+          DEFAULT_CHROMEDRIVER_PORT
+        end
+        # rubocop: disable Metrics/AbcSize
+        def build_local_image
+          old_timeout = Docker.options[:read_timeout]
+          Docker.options[:read_timeout] = 60 * 10
+          Docker::Image.build_from_dir(build_dir, { "t" => image, "dockerfile" => docker_file }) do |lines|
+            lines.split("\n").each do |line|
+              next unless (log = JSON.parse(line)) && log.key?("stream")
+              next unless log["stream"] && !(trimmed_stream = log["stream"].strip).empty?
+              timestamp = Time.now.strftime("[%Y-%m-%dT%H:%M:%S.%6N]")
+              $stdout.write "#{timestamp} #{trimmed_stream}\n"
+            end
+          end
+        ensure
+          Docker.options[:read_timeout] = old_timeout
+        end
+        # rubocop: enable  Metrics/AbcSize
+        # rubocop: disable Metrics/AbcSize
+        def start_local_image
+          build_local_image
+          with_exposed_ports(port)
+          @_container ||= Docker::Container.create(_container_create_options)
+          @_container.start
+          @_id = @_container.id
+          json = @_container.json
+          @name = json["Name"]
+          @_created_at = json["Created"]
+          @wait_for&.call(self)
+          self
+        rescue Docker::Error::NotFoundError => e
+          raise ::Testcontainers::NotFoundError, e.message
+        rescue Excon::Error::Socket => e
+          raise ::Testcontainers::ConnectionError, e.message
+        end
+        # rubocop: enable Metrics/AbcSize
+        def session_url(protocol: "http")
+          "#{protocol}://#{host}:#{mapped_port(port)}/session"
+        end
+      end
+    end
+  end
+end
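
The log handling inside `build_local_image` amounts to: split each chunk into lines, parse each line as JSON, and keep only non-empty `"stream"` messages. A minimal sketch of that filtering follows. It assumes every line in a chunk is a complete JSON document, and `stream_messages` is a hypothetical helper name, not part of the gem.

```ruby
require "json"

# Hypothetical helper: returns the trimmed, non-empty "stream" messages
# found in one chunk of Docker build output (one JSON document per line).
def stream_messages(chunk)
  chunk.split("\n").filter_map do |line|
    log = JSON.parse(line)
    next unless log.is_a?(Hash) && log.key?("stream")

    text = log["stream"].to_s.strip
    text unless text.empty?
  end
end

# Sample chunk (made up for illustration): one real message, one
# whitespace-only message, and one non-stream record.
chunk = [
  '{"stream":"Step 1/3 : FROM debian\n"}',
  '{"stream":"\n"}',
  '{"aux":{"ID":"sha256:abc"}}'
].join("\n")

messages = stream_messages(chunk)
messages.each do |message|
  timestamp = Time.now.strftime("[%Y-%m-%dT%H:%M:%S.%6N]")
  puts "#{timestamp} #{message}"
end
```

As in the method above, a line that is not valid JSON would raise `JSON::ParserError`; the sketch does not add handling for that case.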

data/lib/bidi2pdf/test_helpers.rb ADDED

@@ -0,0 +1,13 @@
+# frozen_string_literal: true
+%w[pdf-reader diff-lcs unicode_utils].each do |dep|
+  require dep
+rescue LoadError
+  warn "Missing #{dep}. Add it to your Gemfile if you're using Bidi2pdf test helpers."
+end
+require "bidi2pdf/test_helpers/pdf_text_sanitizer"
+require "bidi2pdf/test_helpers/pdf_reader_utils"
+require "bidi2pdf/test_helpers/matchers/match_pdf_text"
+require "bidi2pdf/test_helpers/matchers/contains_pdf_text"
+require "bidi2pdf/test_helpers/matchers/have_pdf_page_count"
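
The optional-dependency loop above relies on `rescue` being allowed directly inside a `do ... end` block (Ruby 2.5 and later), so one missing gem does not stop the remaining requires. A minimal sketch of the same pattern, using `json` from the standard library and a deliberately nonexistent gem name chosen for illustration:

```ruby
# Collect the names that fail to load instead of warning, so the outcome
# can be inspected. "definitely_not_installed_gem_xyz" is a made-up name.
missing = []

%w[json definitely_not_installed_gem_xyz].each do |dep|
  require dep
rescue LoadError
  missing << dep
end

puts "Missing optional dependencies: #{missing.join(", ")}" unless missing.empty?
```

Because the `rescue` is scoped to a single iteration, the loop continues with the next dependency after a failure.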

data/lib/bidi2pdf/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module Bidi2pdf
-  VERSION = "0.1.7"
+  VERSION = "0.1.8"
 end

data/lib/bidi2pdf.rb CHANGED

@@ -1,5 +1,8 @@
 # frozen_string_literal: true
+require "concurrent-ruby"
+require "logger"
 require_relative "bidi2pdf/process_tree"
 require_relative "bidi2pdf/launcher"
 require_relative "bidi2pdf/bidi/session"
@@ -8,8 +11,6 @@ require_relative "bidi2pdf/notifications"
 require_relative "bidi2pdf/notifications/logging_subscriber"
 require_relative "bidi2pdf/verbose_logger"
-require "logger"
 module Bidi2pdf
   PAPER_FORMATS_CM = {
     letter: { width: 21.59, height: 27.94 },
@@ -33,7 +34,16 @@ module Bidi2pdf
   class ClientError < WebsocketError; end
-  class CmdError < ClientError; end
+  class CmdError < ClientError
+    attr_reader :cmd, :response
+    def initialize(cmd, response)
+      @cmd = cmd
+      @response = response
+      super("Error response: #{response["error"]} #{cmd.inspect}")
+    end
+  end
   class CmdResponseNotStoredError < ClientError; end
@@ -55,6 +65,25 @@ module Bidi2pdf
     end
   end
+  class NavigationError < Error; end
+  class NavigationAuthError < NavigationError
+    attr_reader :url
+    def initialize(url, message = nil)
+      @url = url
+      super("Navigation to #{url} failed due to authentication error. #{message}")
+    end
+  end
+  class NavigationTimeoutError < NavigationError; end
+  class NavigationNotFoundError < NavigationError; end
+  class NavigationDNSError < NavigationError; end
+  # Global configuration for Bidi2pdf
   class << self
     attr_accessor :default_timeout, :enable_default_logging_subscriber
     attr_reader :logging_subscriber, :logger, :network_events_logger, :browser_console_logger, :notification_service
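
The new error classes carry structured data (`cmd`, `response`, `url`) in addition to a message, which lets callers branch on details when rescuing. The sketch below mirrors the class bodies from this diff inside a hypothetical `Sketch` module so it runs without the gem. The base hierarchy is simplified as an assumption: in the gem, `ClientError` inherits from `WebsocketError`, and the definition of `Error` is not shown in this diff.

```ruby
module Sketch
  class Error < StandardError; end   # assumption: stands in for Bidi2pdf::Error
  class ClientError < Error; end     # simplified; the gem uses WebsocketError here

  class CmdError < ClientError
    attr_reader :cmd, :response

    def initialize(cmd, response)
      @cmd = cmd
      @response = response
      super("Error response: #{response["error"]} #{cmd.inspect}")
    end
  end

  class NavigationError < Error; end

  class NavigationAuthError < NavigationError
    attr_reader :url

    def initialize(url, message = nil)
      @url = url
      super("Navigation to #{url} failed due to authentication error. #{message}")
    end
  end
end

# Rescue via the parent class and read the structured fields.
# The command and response hashes are made-up sample data.
cmd_error = begin
  raise Sketch::CmdError.new({ "method" => "browsingContext.navigate" },
                             { "error" => "unknown error" })
rescue Sketch::ClientError => e
  e
end

auth_error = begin
  raise Sketch::NavigationAuthError.new("https://example.test", "401")
rescue Sketch::NavigationError => e
  e
end

puts cmd_error.message
puts auth_error.message
```

Rescuing `NavigationError` catches all of the navigation subclasses added in this release, while the `url` reader is specific to `NavigationAuthError`.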

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: bidi2pdf
 version: !ruby/object:Gem::Version
-  version: 0.1.7
+  version: 0.1.8
 platform: ruby
 authors:
 - Dieter S.
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2025-04-17 00:00:00.000000000 Z
+date: 2025-04-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: base64
@@ -38,6 +38,26 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: concurrent-ruby
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.0'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 1.3.1
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.0'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 1.3.1
 - !ruby/object:Gem::Dependency
   name: json
   requirement: !ruby/object:Gem::Requirement
@@ -379,6 +399,7 @@ files:
 - lib/bidi2pdf/bidi/interceptor.rb
 - lib/bidi2pdf/bidi/js_logger_helper.rb
 - lib/bidi2pdf/bidi/logger_events.rb
+- lib/bidi2pdf/bidi/navigation_failed_events.rb
 - lib/bidi2pdf/bidi/network_event.rb
 - lib/bidi2pdf/bidi/network_event_formatters.rb
 - lib/bidi2pdf/bidi/network_event_formatters/network_event_console_formatter.rb
@@ -398,6 +419,13 @@ files:
 - lib/bidi2pdf/notifications/logging_subscriber.rb
 - lib/bidi2pdf/process_tree.rb
 - lib/bidi2pdf/session_runner.rb
+- lib/bidi2pdf/test_helpers.rb
+- lib/bidi2pdf/test_helpers/matchers/contains_pdf_text.rb
+- lib/bidi2pdf/test_helpers/matchers/have_pdf_page_count.rb
+- lib/bidi2pdf/test_helpers/matchers/match_pdf_text.rb
+- lib/bidi2pdf/test_helpers/pdf_reader_utils.rb
+- lib/bidi2pdf/test_helpers/pdf_text_sanitizer.rb
+- lib/bidi2pdf/test_helpers/testcontainers/chromedriver_container.rb
 - lib/bidi2pdf/verbose_logger.rb
 - lib/bidi2pdf/version.rb
 - sig/bidi2pdf.rbs
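
The new `concurrent-ruby` runtime dependency in the metadata combines two constraints, `~> 1.0` and `>= 1.3.1`, which together accept versions from 1.3.1 up to, but not including, 2.0. A short sketch using RubyGems' own `Gem::Requirement` shows how the combined requirement evaluates; the candidate version numbers are arbitrary samples.

```ruby
require "rubygems"

# The two constraints listed for concurrent-ruby in the gemspec metadata.
requirement = Gem::Requirement.new("~> 1.0", ">= 1.3.1")

# Sample versions, chosen only to probe both bounds.
candidates = %w[1.3.0 1.3.1 1.9.0 2.0.0]

results = candidates.to_h do |version|
  [version, requirement.satisfied_by?(Gem::Version.new(version))]
end

results.each { |version, ok| puts "#{version}: #{ok ? "accepted" : "rejected"}" }
```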